Amazon Says Employee Error Caused Tuesday’s Cloud Outage
(Bloomberg) — Amazon.com Inc. said efforts to fix a bug in its cloud-computing service caused prolonged disruptions Tuesday that affected thousands of websites and apps, from project-management and expense-reporting tools to commuter alerts.
An Amazon Web Services employee working on the issue accidentally switched off more computer servers than intended at 9:37 a.m. Seattle time, resulting in errors that cascaded through the company’s S3 service, Amazon said in a statement Thursday. S3 is used by nearly 150,000 sites, including ESPN.com and aol.com, to house data, manage apps and handle software downloads, according to SimilarTech.com.
“We are making several changes as a result of this operational event,” Amazon said in the statement. “While removal of capacity is a key operational practice, in this instance, the tool used allowed too much capacity to be removed too quickly. We have modified this tool to remove capacity more slowly and added safeguards to prevent capacity from being removed when it will take any subsystem below its minimum required capacity level.”
AWS is the company’s fastest-growing and most profitable division, generating $3.5 billion in revenue in the fourth quarter. It’s the biggest public cloud-services provider, with data centers around the world that handle the computing power for many large companies, such as Netflix Inc. and Capital One Financial Corp. Amazon and competitors like Microsoft Corp. and Alphabet Inc.’s Google are growing their cloud businesses as customers find it more efficient to shift their data storage and computing processes to the cloud rather than maintain those functions on their own. Widespread adoption also increases the likelihood that problems with one service can have sweeping ramifications online.