Availability: How Many 9s Are Enough?

By: Bob Landstrom

In today’s always-on world, it’s ironic that the most vital element powering the digital economy – the data centre – is invisible. Businesses and individuals are so accustomed to seamless 3G and 4G communication that they forget it is powered by bottom-of-the-stack technologies.

Unfortunately, it’s only when interruptions occur that the population understands how valuable data centres are to the digital supply chain. Recent international headlines announcing major networking outages, such as the United Airlines flight cancellations and the Singapore Exchange failure, provide a powerful lesson in how downtime translates into lost revenue.

The question then remains: how do businesses and their IT teams prevent downtime and, therefore, avoid jeopardising corporate reputation? Understanding the term “availability,” and the number of “nines” used to express it, in the context of data centres will provide clarity to businesses choosing the best ways to stay up and running.

Availability vs. Reliability

First, it’s important to understand the distinction between “availability” and its better-understood counterpart, “reliability.” Availability is the conditional probability that the system delivers the required service at the time it’s called upon to do so. Availability takes into account that when the system fails, it can be repaired and restored to service. That is, it includes the Mean Time to Repair (“MTTR”), which is a critical aspect of availability and an important performance metric to consider when negotiating a service-level agreement (“SLA”) with a service provider. Availability is often expressed in terms of some number of nines. “Three 9’s,” for example, is the same as saying “99.9 percent availability.”

Availability is different from “reliability,” which is simply the probability of the system performing its function for a given period of time, under certain conditions. Reliability is often expressed in terms of Mean Time Between Failures (“MTBF”) or Failure Rate. Unlike availability, reliability is not a conditional probability, and does not include maintenance or repair.
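To make the link between the two metrics concrete, the sketch below uses the commonly quoted steady-state relationship, availability = MTBF / (MTBF + MTTR). This formula is not stated in the article itself and is assumed here for illustration; the function name and the MTBF and MTTR figures are hypothetical.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return availability as a fraction between 0 and 1.

    Assumes the steady-state relationship A = MTBF / (MTBF + MTTR),
    an assumption of this sketch rather than a formula from the article.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: the same MTBF (reliability) with two repair times.
slow_repair = steady_state_availability(mtbf_hours=10_000, mttr_hours=10)
fast_repair = steady_state_availability(mtbf_hours=10_000, mttr_hours=1)

print(f"10-hour repair: {slow_repair:.4%}")
print(f" 1-hour repair: {fast_repair:.4%}")
```

Under these assumed numbers, cutting the repair time from ten hours to one moves the system from roughly three nines to roughly four nines without any change in MTBF, which is why MTTR deserves attention in an SLA.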

How Many 9s Are Enough?

Figure 1 below shows the amount of time annually that a system is down (or unavailable), given the number of nines of availability that system has. It shows that even a five-9s system has, on average, over five minutes of downtime annually.

% Availability    Amount of time unavailable annually
99                88 hours
99.9              8.8 hours
99.99             53 minutes
99.999            5.3 minutes
99.9999           32 seconds

 Figure 1
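The figures in Figure 1 follow from multiplying the unavailable fraction by the length of a year. The sketch below reproduces that arithmetic, assuming a 365-day year of 8,760 hours; the function names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def annual_downtime_hours(availability_percent: float) -> float:
    """Expected hours of unavailability per year for a given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


def format_duration(hours: float) -> str:
    """Render a duration in the largest unit that gives a value of at least 1."""
    if hours >= 1:
        return f"{hours:.1f} hours"
    minutes = hours * 60
    if minutes >= 1:
        return f"{minutes:.1f} minutes"
    return f"{minutes * 60:.1f} seconds"


for pct in (99, 99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {format_duration(annual_downtime_hours(pct))}")
```

The computed values (about 87.6 hours, 8.8 hours, 52.6 minutes, 5.3 minutes and 31.5 seconds) agree with Figure 1 to within rounding.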

Settling for anything less than five-9s availability could be the difference between revenue gain and loss. To maintain continuity, businesses must strive for service that is as close to 100 percent available as possible.

Availability in Real Life
The importance of availability in the data centre can best be understood when compared to real-life use cases. The following calculations all pertain to a three 9’s (99.9 percent) availability value:

  • 44 minutes of unsafe drinking water per month
  • 3 crash-landings per week at Heathrow
  • 3,000 letters lost by the Postal Service every hour
  • 2,000 surgical mistakes in the NHS every week
  • 9,000 incorrect banking debits per hour
  • 36,000 missed heartbeats per year (9 hours)

Like data centre operations, each preceding scenario has an exceedingly slim margin for error. Given how much real-world failure three 9’s availability still permits, it’s evident that this level of performance is unacceptable.
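Each of the three 9’s scenarios above applies the same 0.1 percent failure fraction to either a span of time or a volume of events. A minimal sketch of that arithmetic follows; the helper names and the one-million-event volume are hypothetical and are not taken from the article.

```python
def unavailable_fraction(availability_percent: float) -> float:
    """Fraction of time (or of events) that falls outside the availability."""
    return 1 - availability_percent / 100


def unavailable_minutes(window_minutes: float, availability_percent: float = 99.9) -> float:
    """Minutes of unavailability expected within a time window."""
    return window_minutes * unavailable_fraction(availability_percent)


def expected_failures(event_count: float, availability_percent: float = 99.9) -> float:
    """Expected number of failed events out of event_count."""
    return event_count * unavailable_fraction(availability_percent)


# An average month, assuming a 365-day year split into 12 equal parts.
minutes_per_month = 365 * 24 * 60 / 12
print(f"Per month: {unavailable_minutes(minutes_per_month):.1f} minutes")

# A full year, expressed in hours.
minutes_per_year = 365 * 24 * 60
print(f"Per year: {unavailable_minutes(minutes_per_year) / 60:.2f} hours")

# Hypothetical volume: one million events at three 9's.
print(f"Failures per million events: {expected_failures(1_000_000):.0f}")
```

Under these assumptions the monthly figure comes to just under 44 minutes and the yearly figure to just under 9 hours, in line with the drinking-water and heartbeat examples.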

Choosing Superior Availability

Modern computing demands make data centre availability as important today as ever. As businesses continue to shift toward hybrid IT environments, business-critical processes increasingly rely on applications from the cloud. Cloud services must be delivered from data centres offering five-9s availability and a proven track record over time.

It’s important to remember that availability depends on various characteristics of the data centre, such as design and operation, connectivity options, proximity to business and consumer centres, and scalability of power and space. Therefore, when selecting a data centre partner, businesses should consider multiple factors – a well-trained data centre operations team, mature Methods of Procedure (“MOPs”) and Standard Operating Procedures (“SOPs”), disciplined security processes, superior engineering – that all combine to ensure superior availability performance.

Hypothetical tier ratings are interesting, but a demonstrated track record of strong availability performance, along with evidence of mature and disciplined data centre operations, is important for minimising risk to a business.