Massive AWS Cloud Outage May Frighten Enterprise Companies – Part 1

When you’re trying to promote a technological solution as a solid, fail-proof alternative to traditional technologies, it’s bad PR if your tech breaks down in front of the whole world. That’s what happened to Amazon Web Services yesterday when it was hit with a 4-hour outage in one of its largest and most important cloud “regions” in North America.

Amazon has structured its physical cloud assets into what are known as Global Infrastructure Regions, each of which has multiple Availability Zones. Each zone, in turn, comprises several datacenters that are home to Amazon’s servers.

On Tuesday, a large portion of AWS’s S3 cloud storage system went offline for about four hours, affecting numerous cloud-hosted websites.

The worst part is that the outage hit one of AWS’s most important regions in the United States, one that is much larger than its other US regions and is also where Amazon Web Services typically rolls out its new features.

The service apparently went down at 12:35 pm ET and was restored at 4:49 pm ET, according to Amazon. AWS’s US-EAST-1 was the global infrastructure region impacted by the outage.
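As a quick check on the "about four hours" figure, the two timestamps above can be subtracted directly. This is only a small illustrative sketch: the function name is hypothetical, and the times are simply the ET values quoted above written in 24-hour form.

```python
from datetime import datetime


def outage_duration_minutes(start: str, end: str) -> int:
    """Return the whole minutes between two same-day 24-hour 'HH:MM' times."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# 12:35 pm ET and 4:49 pm ET, as reported above
minutes = outage_duration_minutes("12:35", "16:49")
hours, remainder = divmod(minutes, 60)
print(f"{hours} h {remainder} min")  # prints "4 h 14 min"
```

By these figures the disruption ran a little over four hours.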

Market research company SimilarTech estimated that the S3 cloud storage service currently hosts data for 148,213 websites, and companies large and small were hit by the outage. One of them was Slack, one of the world’s fastest-growing team collaboration platforms.

 

What Does This Mean for Enterprise Cloud Adoption?

Thousands of businesses may have been directly affected by this outage, but the wider damage is to confidence: it’s a frightening experience for enterprise companies that are still mulling over their move to a cloud environment.




One of the bigger concerns for enterprise-level companies is business continuity. A technology outage cannot be allowed to disrupt the normal functioning of a business, which is why enterprises put measures in place to bridge such disruptions and keep the business running.

Cloud infrastructure has been promoted as one of the best countermeasures to such disruption. In fact, the way Amazon has structured its datacenters into availability zones and regions protects applications from total failure in a single location.

The problem here was that an entire region was affected. In such a case, it’s not easy for a company to fail over quickly to another region and maintain business continuity.
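To make that difficulty concrete, here is a minimal sketch of what region-level failover can look like from the application side. Everything in it is an assumption for illustration: the function and exception names are hypothetical, the in-memory dictionary stands in for real object storage, and it presumes the data was already copied to a second region before the outage, which is exactly the preparation many affected sites had not made.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple


class RegionUnavailable(Exception):
    """Raised by a fetcher when its region cannot serve the request."""


def read_with_failover(
    key: str,
    regions: Sequence[str],
    fetch: Callable[[str, str], bytes],
) -> Tuple[str, bytes]:
    """Try each region in order; return (region, data) from the first that answers."""
    errors = []
    for region in regions:
        try:
            return region, fetch(region, key)
        except RegionUnavailable as exc:
            errors.append(f"{region}: {exc}")
    raise RegionUnavailable("all regions failed: " + "; ".join(errors))


# Hypothetical stand-in for storage: None marks a region whose storage is down.
REPLICAS: Dict[str, Optional[Dict[str, bytes]]] = {
    "us-east-1": None,
    "us-west-2": {"logo.png": b"png-bytes"},
}


def fake_fetch(region: str, key: str) -> bytes:
    """Simulated fetch; a real one would call the storage service in that region."""
    store = REPLICAS.get(region)
    if store is None:
        raise RegionUnavailable("storage service not responding")
    return store[key]


served_by, body = read_with_failover("logo.png", ["us-east-1", "us-west-2"], fake_fetch)
print(served_by)  # prints "us-west-2"
```

The loop itself is trivial. The hard part is everything the sketch takes for granted: keeping a current copy of the data in the second region and paying for it when nothing is wrong.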




Here’s what Amazon says about its cloud asset structure:

“AWS Regions are comprised of Availability Zones, which refer to technology infrastructure in separate and distinct geographic locations with enough distance to significantly reduce the risk of a single event impacting availability, yet near enough for business continuity applications that require rapid failover.”

You can immediately see where the problem is: even though availability zones within a region are all interconnected (via private fiber optic networks), the regions themselves are distinct from each other.




Yesterday’s incident not only reveals a basic vulnerability of cloud datacenters, but also highlights a major pain point for enterprise companies: the inability to fail over at a moment’s notice when an entire region is affected.

Granted, this isn’t something that happens on a regular basis. The last major AWS outage was in September 2015, about a year and a half ago. Nevertheless, enterprise companies will be more wary of moving completely to the AWS cloud.

What this underlines is the need for backup options in case of a major outage, and that’s something we’ll be discussing in the next article in this series. Stay tuned.


