Partly Cloudy: Outage Raises Resiliency Concerns

Jeff Weber, Managing Director Technology Strategy and Operation

Everyone needs a little downtime – critical IT infrastructure, not so much. Security and reliability have long been the two primary enterprise concerns when it comes to the cloud. And while security has been the dominant concern over the past couple of years, recent high-profile cloud outages have brought reliability front and center.

A recent outage affected almost 150,000 sites. In the not so distant, cloud-less past, most companies would have had in-house servers, and the disruption would have been limited and isolated. Included in the outage was an internet messaging and chat service popular among IT professionals, who were quick to notice and spread the word. More importantly, this service enables IT services and communication and impacted organizations in their ability to maintain service levels.

Even companies with on-premise enterprise systems could find themselves unexpectedly cut off from critical services, vendor portals and clients, in the event of a service interruption at a cloud-based communications provider.

Cloud functionality affects virtually everyone. These days, if any company thinks it doesn’t have significant cloud exposure, it needs to think again. Now is the time for companies to be asking themselves whether their risk management framework is robust enough to identify risk exposure they may not have thought about.

The worst time to discover a critical exposure to a cloud outage is…well, always. Protiviti recommends that companies act now to conduct a cloud risk assessment and impact analysis and develop an effective response plan. Key elements include:

  • Conducting a thorough process review to identify any hidden cloud exposures
  • Identifying and prioritizing “crown jewels” – in this case, critical functions that must be protected from disruption
  • Comparing exposures against the company’s risk appetite and establishing a remediation threshold – for example, frequency and duration of outage
  • Creating an awareness of susceptibilities and developing response procedures

Although for many companies this type of exercise is new when it comes to cloud computing, it is essentially the same process they have applied in the past to telecommunications, infrastructure and other “always-on” systems and applications. The chief information officer should lead, or at least be at the table for this discussion, and ensure that the right people are involved in the conversation. Furthermore, the discussion should be conducted in business-relevant terms (risk, effect on operations) rather than IT terms (systems downtime, for example).

Public reaction to cloud outages, to date, has been relatively muted. That is likely to change, and quickly, as connectivity increases and digitization and the Internet of Things transforms existing business models. No one is really shocked that cloud outages happen, but now that they are on the radar, it is important to plan for the occasional yet inevitable “inclement weather.”

3 comments