Summary
There are several lessons to be learned here:
1. Have a second email account with another provider, and make sure your email contacts know about it.
2. Ensure that you have a means other than email of communicating urgent messages.
3. Ensure that a robust, well-tested disaster recovery/contingency plan is in place.
Analysis
Most of Gmail's 150 million users were left without access to their mailboxes for almost two hours yesterday as a result of technical problems within Google.
It seems that Google had not made adequate contingency plans for the extra load on its remaining servers when it took some of its servers offline for maintenance.
This is the third outage this year for Google Gmail.
To add to Google's humiliation, it seems that attempts to fix the problem were hampered by the fact that Google's own engineers were unable to access their Gmail accounts.
There are several lessons to be learned here, both for Google (which should know better) and for Gmail's hapless 150 million users (of whom I am one):
1. Have a second email account with another provider, and make sure your email contacts know about it.
2. Ensure that you have a means other than email of communicating urgent messages.
Telephones, and even good "old-fashioned" face-to-face discussions, can be used in emergencies!
3. Ensure that a robust, well-tested disaster recovery/contingency plan is in place.
Google are more than welcome to use my pro forma guide, "Contingency Planning - The Basics", as a starting point.
Google are not alone, however, in failing to ensure that their IT systems can withstand peak load.
Her Majesty's Revenue and Customs (HMRC) suffered a humiliating systems crash on 31 January 2008, when people were trying to file their year-end tax returns online.
Contrary to earlier claims by Dave Hartnett (CEO of HMRC at the time) that the crash was down to hacking and other nefarious activity, the real cause was far more mundane.
It was in fact a hardware fault that left 15,000 people unable to file their tax returns.
HMRC and Capgemini (HMRC's IT supplier) were quoted in Computerworld UK:
"The technical issues were caused by a hardware problem which was triggered by a spike in logins.
Our systems had been thoroughly capacity tested but this hardware problem meant that we did not manage the January 31 peak as well as we would have liked."
Nonsense!
In reality, the "spike" was puny by the standards of professional, well-run IT systems. Had the system been properly tested and well designed, this would not have happened.
The fourth lesson to be drawn from the HMRC fiasco, and one that all companies should remember, is this: when you explain to your customers why a system failed, don't lie.
This author consults with leading institutions through GLG.
Analyses are solely the work of the authors and have not been edited or endorsed by GLG.


