Managing Data Center Resources Is about Managing Changes
by Dr. Raju Pandey, SynapSense Co-Founder and Chief Technology Officer
Data centers are complex systems to manage. Much of this complexity arises from managing a wide range of resources and assets, and the complex relationships among them. Most data centers have managed this operational complexity by overprovisioning both IT and facility resources: excess server capacity ensures that an application will always find an execution host, and excess cooling and power capacity ensures that IT loads will not cause overheating or circuit overloads. While overprovisioning as a management tool has ensured reliable and resilient operations, it has also meant significant and continuous investment in resources that are underutilized and in many cases wasted. Most data centers are overcooled; most have significant stranded power that could be better utilized; most significantly underutilize server capacity.
The key to managing resources is managing changes in resource requirements. These changes may arise because a data center needs to increase its compute capacity by adding IT resources. They may also occur because a data center virtualizes its computing resources and shifts load across public and private clouds based on specific policies. Changes in IT requirements then propagate through the entire resource chain: changes in IT load require more IT resources, which in turn require additional power and cooling resources. Overprovisioning addresses the change problem by allocating resources for worst-case load requirements, which occur rarely.
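To make the cost of worst-case provisioning concrete, the following minimal Python sketch compares a fixed, worst-case capacity against a varying load. The load samples and the names `hourly_it_load_kw`, `stranded_capacity`, and `mean_utilization` are hypothetical, chosen only for illustration; they are not measurements from any real data center.

```python
# Hypothetical hourly IT load samples in kW (illustrative values only).
hourly_it_load_kw = [120, 110, 105, 130, 180, 240, 300, 210]


def stranded_capacity(load_kw, provisioned_kw):
    """Return the unused capacity (kW) at each sample for a fixed provisioning level."""
    return [provisioned_kw - load for load in load_kw]


def mean_utilization(load_kw, provisioned_kw):
    """Return the average fraction of provisioned capacity that is actually used."""
    return sum(load_kw) / (len(load_kw) * provisioned_kw)


# Overprovisioning: size capacity for the worst-case (peak) load.
worst_case_kw = max(hourly_it_load_kw)
unused = stranded_capacity(hourly_it_load_kw, worst_case_kw)
utilization = mean_utilization(hourly_it_load_kw, worst_case_kw)

print(f"Provisioned for peak: {worst_case_kw} kW")
print(f"Unused capacity per hour (kW): {unused}")
print(f"Mean utilization: {utilization:.1%}")
```

In this made-up trace, capacity sized for the single peak hour sits partly idle in every other hour, which is the stranded capacity described above.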
What is needed is a set of DCIM automation tools that monitor for changes in resource requirements, characterize these changes in terms of resource usage, and gracefully adapt resource usage to match the changing requirements. Such adaptation will need to take place on both the IT side and the facility side. The automation tools on the IT side will adapt the IT resources to the actual IT load through consolidation and load shifting across public and private clouds. The automation tools on the facility side will adapt the cooling and power infrastructure to precisely match the heat generated by the IT load at any given instant. This will involve consolidating the cooling units by turning them on or off, increasing or decreasing the volume of air supplied, and/or raising or lowering the temperature. The two tools could work autonomously, each adapting the data center independently, or they could work cooperatively, each learning from the other through vital clues about impending changes in resource availability and additional hints about future changes.
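As a rough illustration of the facility-side adaptation described above, the sketch below picks how many cooling units to run and how hard to run them for a given IT load. It is a simplified sketch under stated assumptions, not a description of any particular DCIM product: the function `plan_cooling`, the `CoolingPlan` type, the 10% headroom, and the unit capacities are all hypothetical, and the model assumes that essentially all electrical power drawn by IT equipment ends up as heat to be removed.

```python
import math
from dataclasses import dataclass


@dataclass
class CoolingPlan:
    units_on: int            # number of cooling units switched on
    airflow_fraction: float  # share of each active unit's rated capacity in use


def plan_cooling(it_load_kw, unit_capacity_kw, total_units, headroom=0.1):
    """Match cooling to the current heat load instead of the worst case.

    Assumption: heat to remove (kW) is approximately the IT power draw (kW).
    `headroom` is a hypothetical safety margin added on top of that load.
    """
    if it_load_kw < 0 or unit_capacity_kw <= 0 or total_units < 1:
        raise ValueError("invalid load, unit capacity, or unit count")

    target_kw = it_load_kw * (1 + headroom)

    # Consolidate: run only as many units as the load needs (at least one).
    units_needed = max(1, math.ceil(target_kw / unit_capacity_kw))
    units_on = min(units_needed, total_units)

    # Scale air volume on the active units; cap at full rated capacity.
    airflow = min(1.0, target_kw / (units_on * unit_capacity_kw))
    return CoolingPlan(units_on=units_on, airflow_fraction=airflow)


# Example: eight 50 kW units, evaluated at a low, a high, and an excessive load.
for load in (120, 300, 1000):
    print(load, plan_cooling(load, unit_capacity_kw=50, total_units=8))
```

A cooperative arrangement could feed this function a forecast load supplied by the IT-side tool rather than the measured load, so that cooling units are brought online before a planned workload shift rather than after it.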