Amazon has released its official report on the recent AWS global outage, pinpointing the root cause as a race condition between two automated system components.
What Caused the AWS Outage?
The problem originated from a conflict between:
- DNS Planner - Creates DNS plans
- DNS Enactor - Applies changes through Amazon Route 53
How It Happened
A delay in the DNS Enactor triggered an unintended consequence: active DNS plans were inadvertently deleted. This deletion removed the IP addresses for the DynamoDB endpoint in the US-EAST-1 region, causing widespread service disruptions.
Amazon Route 53, which manages domain names and routes internet traffic, became the focal point of the failure when these critical addresses disappeared from the system.
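The failure mode described above can be illustrated with a small, deterministic simulation. This is only a sketch under stated assumptions, not Amazon's actual implementation: the names DnsState, apply_plan, cleanup_older_than, and simulate_race are hypothetical, the IP addresses are placeholders, and the model assumes two Enactor runs where the delayed one applies a stale plan just before the other run's cleanup deletes it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DnsState:
    """Hypothetical model: stored plans plus a pointer to the active one."""
    plans: Dict[int, List[str]] = field(default_factory=dict)  # plan id -> IPs
    active_plan: Optional[int] = None

    def resolve(self) -> List[str]:
        # The endpoint resolves to the active plan's IPs, if that plan still exists.
        return self.plans.get(self.active_plan, [])


def apply_plan(state: DnsState, plan_id: int) -> None:
    # Enactor step: point the endpoint at a plan, with no staleness check.
    state.active_plan = plan_id


def cleanup_older_than(state: DnsState, newest_id: int) -> None:
    # Cleanup step: delete every plan older than the newest applied plan,
    # without checking whether one of them is currently active.
    for plan_id in [p for p in state.plans if p < newest_id]:
        del state.plans[plan_id]


def simulate_race() -> List[str]:
    state = DnsState(plans={1: ["10.0.0.1"], 2: ["10.0.0.2"]})
    apply_plan(state, 2)          # up-to-date Enactor applies the newest plan
    apply_plan(state, 1)          # delayed Enactor overwrites it with a stale plan
    cleanup_older_than(state, 2)  # cleanup deletes plan 1, which is now active
    return state.resolve()


print(simulate_race())  # prints []
```

In this toy model the endpoint ends up with no IP addresses because the cleanup step removes a plan that the delayed step has just made active. A guard that refuses to apply a plan older than the active one, or refuses to delete the active plan, would avoid the empty result in this sketch.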
Amazon's Response
The company has temporarily disabled the automation responsible for the issue while implementing safeguards to prevent similar failures in the future.
Source: The Register


