Update:Friends and Clan Chat Rollback
This update was added on 21 September 2009.
During the migration of one of our storage systems, we have regrettably lost five days' worth of Friend List and Clan Chat updates for approximately one quarter of our players. Unfortunately, restoring from a newer backup is not an option in this situation, and the only course of action we have is to revert to the pre-migration data from last week.
As you may be aware, we have been moving our core systems from their current home in Cambridge to new, faster facilities located in central London. As part of this move, we are switching from a local server and disk storage system to a storage area network (SAN). The SAN is designed to allow us to scale our storage needs better in the future, and should have provided us with more resilience in the case of hardware or network failures.
The data in the SAN is replicated in real time from the primary hosting facility to a secondary facility, from which we can run our systems if the primary fails. Data is also backed up continuously from these two sites to our offices in Cambridge.
In this instance, however, during one of the system moves, we inadvertently triggered a flaw in the new storage system. This flaw caused the server hosting the fourth set of friend data to cease writing updates to the SAN at about midday last Wednesday. Because the data was cached in memory, and not flushed to the SAN, none was replicated to the second facility or backed up to our offices.
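The failure mode described above, where updates sit in memory and never reach the storage that replication and backups read from, can be illustrated with a minimal sketch. It is a generic illustration only and not a description of our actual storage code: the function name `durable_append` and the file name in the usage note are hypothetical. The point it shows is that a write counts as saved only once it has been explicitly flushed through to the storage device.

```python
import os


def durable_append(path: str, record: bytes) -> None:
    """Append a record and ask the OS to push it to storage before returning.

    Hypothetical helper for illustration; not taken from any real system.
    """
    with open(path, "ab") as f:
        f.write(record)        # data may still sit in the process buffer
        f.flush()              # hand the data to the operating system
        os.fsync(f.fileno())   # ask the OS to commit it to the storage device
```

For example, `durable_append("friends.log", b"add:alice\n")` returns only after the flush has been requested, so a later replication or backup pass reading from storage has something to copy. If the flush step is skipped or fails silently, everything downstream sees stale data.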
Despite this, the friend storage system appeared to be working normally, and we didn't spot the problem until this afternoon, when we restarted the server for other maintenance. It is important to note that we DO have comprehensive backups, but in this case they didn't help because the data was (invisibly) not saved to disk at all, so it wasn't there for us to roll back to.
We are currently investigating the root cause of the write failures, and will not migrate any further core systems until we have determined the cause of this problem and resolved it. We are also monitoring the systems we have already migrated more closely, so that if similar problems occur we can deal with them in a timely manner.
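As a rough illustration of the kind of check that closer monitoring could include, the sketch below flags a server that keeps accepting updates while nothing reaches shared storage. It is an assumption-laden example rather than our actual monitoring: the names `WriteStatus` and `is_write_stalled`, and the five-minute threshold, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WriteStatus:
    """Hypothetical snapshot of one storage server's write activity."""
    last_accepted_at: float   # epoch seconds when an update was last accepted
    last_persisted_at: float  # epoch seconds when an update last reached storage


def is_write_stalled(status: WriteStatus, max_lag_seconds: float = 300.0) -> bool:
    """Return True if accepted updates are running ahead of persisted ones
    by more than the allowed lag (an assumed threshold)."""
    lag = status.last_accepted_at - status.last_persisted_at
    return lag > max_lag_seconds
```

A server that last accepted an update at time 1000 but last persisted one at time 100 would be flagged, while one that persisted at time 900 would not. A check like this compares what the server believes it has saved with what storage actually holds, which is the gap that went unnoticed here.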
Please accept our sincerest apologies for any Friend List or Clan Chat updates you may have lost as a result of this failure.