This morning Amazon suffered a major outage in its Virginia data center. We’ve talked about the importance of backups here before, and we’ll share more technical details about our backup strategy in a future post. In the meantime, the Amazon outage is a great example of why disaster recovery plans have to account for prolonged data center outages and correlated failures. The availability zones in the US-EAST-1 region did not fail independently! A failure on one node, or in one portion of a data center, can cause load spikes and correlated failures elsewhere as traffic shifts onto the capacity that remains. What’s your plan for when disaster strikes? Does it involve multiple data centers? What about multiple data center operators?
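
To make the load-spike point concrete, here is a rough back-of-the-envelope sketch. It is a toy model, not a description of what actually happened at Amazon: it assumes identical zones, traffic that spreads evenly over whatever is still up, and a zone that fails outright once it is pushed past capacity. The function names `load_after_failure` and `cascade` are ours, made up for illustration.

```python
def load_after_failure(zones: int, failed: int, utilization: float) -> float:
    """Per-zone utilization once `failed` of `zones` zones are down.

    Assumes every zone started at the same `utilization` (1.0 = full
    capacity) and that traffic is spread evenly over the survivors.
    """
    if not 0 <= failed < zones:
        raise ValueError("at least one zone must survive")
    return utilization * zones / (zones - failed)


def cascade(zones: int, utilization: float, capacity: float = 1.0) -> int:
    """Number of zones down after a single initial failure.

    Hypothetical rule: any time the survivors are pushed past `capacity`,
    one more zone drops out and the load is redistributed again.
    """
    failed = 1
    while failed < zones and load_after_failure(zones, failed, utilization) > capacity:
        failed += 1
    return failed


if __name__ == "__main__":
    for utilization in (0.6, 0.7):
        down = cascade(zones=3, utilization=utilization)
        print(f"start at {utilization:.0%} per zone -> {down} of 3 zones down")
```

With three zones each running at 60% of capacity, losing one leaves the other two at about 90%: uncomfortable, but still up. At 70%, the same single failure pushes the survivors to 105%, and under this model the whole region goes down. The gap between those two outcomes is the headroom your disaster recovery plan needs to budget for.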