Two Lessons from the Amazon Outage This Week

This week, Amazon’s Simple Storage Service (S3) went down in its US East region, due to a typo by an engineer. The result was outages at a massive number of websites and services, from smaller companies to major brands such as Nest home thermostats and the work productivity software Slack. All told, more than 100,000 sites were affected by the outage. While little about the Amazon outage was ‘great’ for anyone, the story itself has done a great job of highlighting two important facts about cloud storage, and cloud-based networks in general.

First, even well-established cloud services can go down. It’s likely that many people using S3’s services simply assumed it was too big to fail – if Amazon Web Services has more than 1 million customers, what’s the likelihood of a major network outage? But a single point of failure doesn’t simply disappear by virtue of becoming a very big point of failure. While Amazon has likely invested a great deal of time and energy into ensuring uptime of services this large, it obviously couldn’t totally mitigate the possibility of failure.

Second, there’s a difference between data protection and data availability. Many people were surprised that S3 went down for this long because it’s advertised as having 99.999999999% durability. However, this durability statistic refers to the likelihood that your data will not be lost, not the amount of time that it will be available to you. Amazon advertises the S3 service as “designed for” 99.99% uptime, which means that the service may be down for nearly an hour every year. There hasn’t been a major outage since 2012, so based on their own estimates, they were due for some downtime. Unfortunately for everyone involved, they delivered.
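
To see where the "nearly an hour" figure comes from, here is a minimal Python sketch of the arithmetic. The function name and the extra uptime targets are our own, chosen for illustration, and the calculation assumes a 365-day year with downtime counted in minutes.

```python
# Minutes in a 365-day year (leap years ignored for simplicity).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Return the minutes per year a service can be down and still meet the uptime target."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


# Compare a few common uptime targets, including the 99.99% figure quoted above.
for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

At 99.99%, this works out to roughly 52.6 minutes per year. Note that the durability figure measures something else entirely: it says nothing about how long the service may be unreachable.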

Is it time to take a look at your own data loss and accessibility risks? Give WingSwept a call at 919.779.0954 or contact us online to discuss how we can help!