How Redundant is Your Data Center...Part 1
Have you ever been out to dinner and needed to use the restroom? Of course! You make your way to the stall and guess what? Occupied. It's no big deal since there are two stalls and the other is free. The building codes require builders to provide these built-in redundancies of "critical systems."
Think about the next time you choose to complete a task in one way, but need to change course and complete that same task using a different means. It happens all the time, and that's redundancy!
In data center applications, there aren't codes per se that govern redundancy and performance, and there don't need to be. Data centers without robust infrastructure cannot advertise high uptime and can thus lose market share to competitors who can. So in a certain way, the free market dictates an acceptable level of performance.
Folks in the colocation and data center business might be familiar with an organization known as the Uptime Institute.
These guys are good. They are so good that they created standards for data center redundancy and performance over two decades ago that are still referenced and implemented today. They can also certify folks capable of rating your facility (or the design of a facility) based on redundancy levels.
So let's start by taking a look at the different redundancy levels The Uptime Institute has standardized, along with their requirements. Note: ANSI/BICSI uses a similar scale (F Classification), which we'll cover in the next article.
They call them Tiers and here's basically how they go...
The following italicized text is quoted from The Uptime Institute at:
(I have included inline commentary below in blue)
What are the Tiers?
Uptime Institute created the standard Tier Classification System to consistently evaluate various data center facilities in terms of potential site infrastructure performance, or uptime. The below is a summary and please see Tier Standard: Topology and accompanying Accredited Tier Designer Technical Papers.
The Tiers (I-IV) are progressive; each Tier incorporates the requirements of all the lower Tiers.
Tier I: Basic Capacity
A Tier I data center provides dedicated site infrastructure to support information technology beyond an office setting. Tier I infrastructure includes a dedicated space for IT systems; an uninterruptible power supply (UPS) to filter power spikes, sags, and momentary outages; dedicated cooling equipment that won’t get shut down at the end of normal office hours; and an engine generator to protect IT functions from extended power outages.
Commentary: This is most commonly seen in basic office building data rooms or small-scale data centers that are not selling digital real estate or providing a utility such as internet or digital media. In other words, the data processing is done for completely private purposes and is not generating revenue directly.
Tier II: Redundant Capacity Components
Tier II facilities include redundant critical power and cooling components to provide select maintenance opportunities and an increased margin of safety against IT process disruptions that would result from site infrastructure equipment failures. The redundant components include power and cooling equipment such as UPS modules, chillers or pumps, and engine generators.
Commentary: This scenario happens accidentally all the time. It's actually one of the hardest to picture. I like to call it System Wide N+0.5. Imagine N+1 UPS modules being fed from power infrastructure that has a single point of failure between the redundant power sources (utility and on-site generation). This could be switchgear, conductors, etc. Or picture chilled water computer room air conditioning units installed in an N+1 configuration, but with a single set of chilled water mains serving all of the equipment. If you shut a valve on that line, every unit loses its chilled water and stops cooling, even though you have N+1 equipment capacity available. So, N+1 equipment and an N distribution path, or N+0.5.
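To make the N+0.5 idea concrete, here is a minimal Python sketch. The `CoolingPlant` class, its field names, and the unit counts are all hypothetical, made up for illustration. It also assumes that any one remaining chilled water main could feed every unit, which a real design would have to prove.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoolingPlant:
    """Hypothetical model: identical cooling units fed by chilled water mains."""
    units_installed: int     # cooling units physically present
    units_required: int      # N: units needed to carry the IT load
    distribution_paths: int  # independent chilled water mains feeding the units

    def survives_unit_loss(self) -> bool:
        # Losing one unit is fine as long as at least N units remain.
        return self.units_installed - 1 >= self.units_required

    def survives_path_loss(self) -> bool:
        # Losing one main is fine only if at least one other main remains
        # (assumption: any remaining main can feed all of the units).
        return self.distribution_paths - 1 >= 1


# N+1 equipment (4 installed, 3 required) on a single set of mains.
plant = CoolingPlant(units_installed=4, units_required=3, distribution_paths=1)
print("Survives losing one unit:", plant.survives_unit_loss())
print("Survives losing one main:", plant.survives_path_loss())
```

The equipment check passes and the path check fails, which is the N+0.5 situation in a nutshell: the spare unit doesn't help when the one pipe feeding all of them is closed.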
Tier III: Concurrently Maintainable
A Tier III data center requires no shutdowns for equipment replacement and maintenance. A redundant delivery path for power and cooling is added to the redundant critical components of Tier II so that each and every component needed to support the IT processing environment can be shut down and maintained without impact on the IT operation.
Commentary: Tier III is Tier II, but now considering distribution paths. In my humble opinion, true N+1 includes some form of distribution redundancy, or at least provisions for back-feeding all equipment for cooling, power, and data transmission. To meet this requirement, both paths do not need to be active at all times, but they do need to exist.
Tier IV: Fault Tolerance
Tier IV site infrastructure builds on Tier III, adding the concept of Fault Tolerance to the site infrastructure topology. Fault Tolerance means that when individual equipment failures or distribution path interruptions occur, the effects of the events are stopped short of the IT operations.
Commentary: Tier IV is Tier III, but incorporating upstream considerations and redundant active distribution, so any single fault or failure within the infrastructure is isolated from critical operations.
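Because the Tiers are progressive, the four definitions above can be read as a chain of yes/no questions. The sketch below is only a rough reading aid: the function name `tier_like_level` and its three flags are hypothetical, and it is in no way the Uptime Institute's actual rating or certification method.

```python
def tier_like_level(redundant_components: bool,
                    redundant_paths: bool,
                    fault_tolerant: bool) -> str:
    """Return a rough Tier-like label from three simplified yes/no answers.

    Each level requires everything below it, so the checks run from the
    bottom up and stop at the first requirement that is not met.
    Assumes the Tier I basics (UPS, dedicated cooling, generator) exist.
    """
    if not redundant_components:
        return "I"    # basic capacity only
    if not redundant_paths:
        return "II"   # redundant components, single distribution path
    if not fault_tolerant:
        return "III"  # concurrently maintainable
    return "IV"       # fault tolerant


# N+1 equipment on a single distribution path (the "N+0.5" case).
print(tier_like_level(True, False, False))
# Fault tolerance claimed without redundant components still lands at the bottom.
print(tier_like_level(False, True, True))
```

The second call shows why the progression matters: a higher-tier feature doesn't count if a lower-tier requirement is missing.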
Table Comparison from Uptime Institute:
The tier classification directly addresses power and cooling within its general definitions, but there are a few other critical items that may need to be considered:
Utility power sources
Utility network reliability and hardness
Utility distribution network and substation configuration
On-Site distribution and reliability
Fire suppression systems
A lot more!
Each one of the above could warrant its own article. The list is presented to get your gears turning. The point is, there's a lot that goes into increasing uptime. And, of course, even if you spare no expense you still can't eliminate risk, but you sure can greatly reduce it.
As always, this article is far from exhaustive, but it is intended to get you to think about your data center infrastructure past your IT room walls, past your building walls, and maybe even past your property line. If you're an executive considering colocation or relying heavily on cloud computing, this is a good guide for developing questions to ask potential service providers.
For additional information or a consultation, feel free to reach out and contact us. Find out how we design and commission