Behind the Scenes: Keeping our service running 24/7
Behind the Scenes is a regular feature of the Credo Reference Blog. Each post describes an aspect of our content, service or development process not normally seen by our customers.
In a typical week, credoreference.com is visited by thousands of users searching for information, viewing entries, looking at images, saving citations, and exploring information using our Concept Map. This happens at the same time that meta search systems like Webfeat are requesting result sets and sites such as the UK’s National Health Service are pulling information in XML format for use on their web site.
While our users are browsing, we are often adding new content, updating existing content or even updating system software. Occasionally, we even upgrade or replace servers. Many sites schedule downtime to perform updates like these. We’ve made it part of our standard practice to run our site continuously.
So how do we do that? It’s really pretty simple.
The most important technique we use is to run many complete copies of our service. Each copy runs on a separate server. When a user connects, they are assigned to one of the servers. For the rest of their session, all of their searches, entry views and other activities are handled by that server. This assignment is made by a load balancer, a special device that monitors all of our servers and assigns each new user to the one that is least loaded.
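The assignment logic can be sketched in a few lines of Python. This is a simplified illustration, not the actual load balancer (which is a dedicated device). The class and method names are hypothetical, and "load" is approximated here by the number of active sessions on each server.

```python
class LoadBalancer:
    """Toy model of least-loaded assignment with sticky sessions.

    Hypothetical sketch: load is measured as the number of sessions
    currently assigned to a server, which is an assumption.
    """

    def __init__(self, servers):
        # Map each server name to the set of session ids it is handling.
        self.sessions = {name: set() for name in servers}
        # Remember which server each session was assigned to.
        self.assigned = {}

    def route(self, session_id):
        """Return the server that should handle this session's request."""
        # A returning user stays on the server they were first given.
        if session_id in self.assigned:
            return self.assigned[session_id]
        # A new user goes to the server with the fewest sessions.
        server = min(self.sessions, key=lambda s: len(self.sessions[s]))
        self.sessions[server].add(session_id)
        self.assigned[session_id] = server
        return server


lb = LoadBalancer(["server-1", "server-2"])
first = lb.route("alice")   # new session, goes to the least loaded server
second = lb.route("bob")    # new session, goes to the other server
again = lb.route("alice")   # same session, same server as before
```

The key property is that a session sticks to its server once assigned, while each new session goes wherever the load is currently lightest.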
This architecture makes it easy to do updates without any downtime. We start by disconnecting the first server from the load balancer. Any users on that server are immediately reassigned to the least loaded of the remaining servers. We then update the disconnected server. Once updated, we reconnect it, and it begins to share the overall load as newly connecting users are assigned to it. We then repeat the process with the second, third and remaining servers. The whole sequence is run by an automated script, which avoids timing mistakes and other manual errors.
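The update sequence described above can be sketched as a short loop. This is only an illustration under simplifying assumptions, not the actual script: the function name, the `pool` dictionary (server name mapped to its active session ids) and the `update` callback are all hypothetical.

```python
def rolling_update(pool, update):
    """Update every server in `pool` one at a time, with no downtime.

    pool   -- dict mapping server name -> set of active session ids
              (a hypothetical stand-in for the load balancer's state)
    update -- callable that performs the update on one server
    """
    if len(pool) < 2:
        raise ValueError("need at least two servers to update without downtime")

    for server in list(pool):
        # 1. Disconnect the server from the pool.
        displaced = pool.pop(server)

        # 2. Move each of its users to the least loaded remaining server.
        for session_id in displaced:
            target = min(pool, key=lambda s: len(pool[s]))
            pool[target].add(session_id)

        # 3. Update the disconnected server.
        update(server)

        # 4. Reconnect it, empty; new users will be assigned to it over time.
        pool[server] = set()


pool = {"server-1": {"alice", "bob"}, "server-2": {"carol"}, "server-3": set()}
updated = []
rolling_update(pool, updated.append)  # here "updating" just records the name
```

At every point in the loop at least one server remains connected, so no user is ever left without a server to handle their session.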
While the update is occurring, some of our users will see the pre-update version of our service and some the post-update version. It’s even possible for someone to see both the pre-update and post-update versions in a single session. The most frequent change is the addition of new titles, so if you’re ever using Credo and a new title appears, you’ve probably experienced an update firsthand.
Besides updating the content and software, this architecture makes it easy to update the hardware. At one point, we actually did a major upgrade, replacing all servers with no downtime. We just replaced them one at a time.
All of this would be for nought if the servers themselves got disconnected, damaged or cut off from the Internet. To ensure that doesn’t happen, all of our servers are in a secure facility managed by Rackspace, a US-based company with facilities around the world. Their centers are staffed 24 hours a day and have redundant power, air conditioning, fire suppression, backup power, and fiber data lines that enter at separate points, so short of a truly major disaster they’ll keep running.


