On June 12, Google experienced a widespread outage affecting its cloud services across multiple regions. The company has now released a statement explaining the cause of the disruption, which was traced back to an internal software issue.
According to Google, the problem stemmed from a new version of the Service Control binary—a critical component responsible for enforcing API usage policies. A section of code in this update included a null pointer, which lacked proper error handling and was not protected by a feature flag. This flawed code was inadvertently activated due to a policy change, leading to cascading failures throughout the global infrastructure.
As multiple services restarted simultaneously in response to the failures, the infrastructure became overloaded, making it difficult to restore services promptly in some regions.
The incident highlights the complex nature of managing global cloud platforms and the importance of rigorous testing and safeguards in critical system updates. More technical details were reported by The Register.


