Google apologizes for cloud outage that was a 'comedy of errors'
It was not a good look when, on Monday, the Google Compute Engine cloud service went down for 18 minutes at 7 p.m. Pacific for just about all of its customers everywhere.
Compute Engine is the cloud service that Google launched to compete head-to-head with Amazon Web Services' EC2. It lets companies rent space on Google's computers and access it over the internet.
While the world did not spin into an apocalyptic frenzy because of the outage (it didn't impact Google's regular services like search or maps or Gmail), such a big outage was a black eye. Companies like BrightCove, DataStax, Evite, HTC and Zulily use Compute Engine, Google says.
More importantly, this is Google. Going dark for nearly 20 minutes just isn't supposed to happen. The company has systems and backup systems to prevent that.
So on Wednesday, Google published an apology and a lengthy explanation. It also offered to credit its customers with 10% to 25% of their monthly bill, more than the refund it promises in its service agreement.
The non-technical TLDR version: Someone was doing a semi-routine update to the network and hit a bug. Then the automated failsafe software that should have caught the problem and automatically fixed it also hit a bug. Then the software went nuts and sent the wrong technical information across the whole network and boom, the network went down.
All told, that nearly 20-minute outage caused Google to make "14 distinct engineering changes" to ensure its cloud won't go down like this again. And more changes are coming "as our engineering teams review the incident with other senior engineers across Google in the coming week."
The Google Cloud team says, "We recognize the severity of this outage, and we apologize to all of our customers for allowing it to occur."
But the whole thing still leaves a little egg on Google's face.
Wow, it turns out Google's GCE outage was a comedy of errors. Cloud platforms aren't as reliable as we think. https://t.co/8NV4aKwFOr
- Sean Byrnes (@sbyrnes) April 13, 2016