The EkkoSense® Guide to balancing increased workload demands with energy reduction goals.

Paul Milburn
Chief Product Officer

July 2021

Delivering proven machine learning and AI benefits for the M&E space.

Today’s data centers face a challenge that at first looks almost impossible to resolve. Their operations have never been busier, and analyst projections suggest that workload levels will only increase over the next five years, with some projecting a 21% compound annual growth rate (CAGR) between now and 2025. At the same time, critical facilities such as data centers are coming under increased pressure to reduce their energy consumption, particularly as corporate net zero commitments start to bite.

So how can data centers resolve what appear to be two conflicting demands: supporting escalating workloads while still cutting their carbon emissions?

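In this white paper, EkkoSense argues that traditional software toolsets, such as building management systems (BMS), electrical power management systems (EPMS), computational fluid dynamics (CFD) and data center infrastructure management (DCIM), can’t provide a credible answer because they don’t give operations teams a complete view of what’s happening in their data centers.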

Instead, the answer requires a fundamentally different and more innovative approach, combining much more comprehensive sensing, advanced software optimization tools and the power of machine learning and AI algorithms to provide a true real-time visualization of data center performance.

Direct benefits of EkkoSense AI and machine learning enabled software-based optimization:

Moving beyond traditional data center infrastructure tools.

Before unlocking these potential benefits, data center operations teams need to look beyond legacy critical infrastructure software toolsets, which don’t give them the answers they need to balance increased workloads successfully while delivering on their carbon reduction targets.

To understand this, it’s worth considering the findings of EkkoSense research into the performance of many of the typical software platforms currently used in data centers. Thermal-related issues still account for around a third of unplanned data center outages. Some 15% of racks remain outside ASHRAE guidelines for inlet temperatures, and there’s massive industry-wide over-provisioning of cooling, with average cooling utilization sitting at just 40%.
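To make the two headline figures concrete, the sketch below shows one way they could be computed from raw readings. It is a minimal illustration, not the method used in the EkkoSense research: it assumes ASHRAE's recommended inlet range of 18 to 27 °C (the allowable ranges are wider and vary by equipment class) and defines cooling utilization as total heat load divided by total rated cooling capacity. Function names and sample values are hypothetical.

```python
"""Sketch: two headline metrics from rack and cooling-unit readings.

Assumptions (illustrative, not taken from the EkkoSense research):
- compliance is checked against the ASHRAE recommended inlet range of 18-27 degC;
- cooling utilization = total heat load / total rated cooling capacity.
"""

ASHRAE_RECOMMENDED_C = (18.0, 27.0)  # recommended inlet range, degC


def share_outside_ashrae(inlet_temps_c, low=ASHRAE_RECOMMENDED_C[0], high=ASHRAE_RECOMMENDED_C[1]):
    """Fraction of racks whose inlet temperature lies outside [low, high]."""
    if not inlet_temps_c:
        raise ValueError("need at least one rack reading")
    outside = sum(1 for t in inlet_temps_c if t < low or t > high)
    return outside / len(inlet_temps_c)


def cooling_utilization(heat_load_kw, rated_capacity_kw):
    """Heat actually being removed as a fraction of installed cooling capacity."""
    total_capacity = sum(rated_capacity_kw)
    if total_capacity <= 0:
        raise ValueError("rated capacity must be positive")
    return sum(heat_load_kw) / total_capacity


if __name__ == "__main__":
    inlets = [21.5, 24.0, 28.3, 17.2, 22.9, 26.1, 19.8, 23.4, 25.0, 22.2]
    print(f"{share_outside_ashrae(inlets):.0%}")                         # 20%
    print(f"{cooling_utilization([120.0, 80.0], [250.0, 250.0]):.0%}")  # 40%
```

In this toy room, two of ten racks sit outside the range and 200 kW of heat load is served by 500 kW of installed cooling, which is the kind of 40% utilization the research describes.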

Unfortunately, while traditional data center infrastructure toolsets have solid use cases, they’re less flexible when it comes to directly supporting real-time optimization activities. BMS is of course a key platform, but it typically doesn’t feature any analytics and is usually designed to alert only on hard faults or SLA breaches, which is often too late to prevent outages. EPMS platforms are often treated as a BMS extension and share similar limitations.

While CFD systems can be great for supporting new build projects or major design changes, they typically only use data from a specific point in time under set parameters, which makes it overly complex to unlock real-time optimization opportunities. DCIM systems, too, tend to be driven largely by IT requirements, with very little focus on the M&E side. Many DCIM vendors originated on the IT side of the fence and, although they may have claimed comprehensive functionality ‘inside the rack’, none of them has yet properly addressed the very real M&E needs of data center operators, particularly when it comes to overall energy efficiency and capacity management.

Given these inefficiencies, it’s hardly surprising that the Uptime Institute reported that the global data center industry was on track to waste some 140 billion kilowatt-hours of energy in 2020, or about $18 billion, through inefficient cooling and poor airflow management. There’s no doubt that the size of the optimization prize is considerable. But capturing it will require a break from legacy infrastructure tools in favor of an AI and machine learning approach that doesn’t just highlight problems as they occur, but delivers intelligent insights in real time, along with recommended actions, before potential issues have a chance to develop.
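As a quick sanity check, the two Uptime Institute figures together imply an average cost of roughly 13 cents per kilowatt-hour of wasted energy (our arithmetic on the quoted numbers, not a figure stated in the report):

```latex
\[
\frac{\$18 \times 10^{9}}{140 \times 10^{9}\,\text{kWh}}
  = \frac{18}{140}\ \$/\text{kWh}
  \approx \$0.13\ \text{per kWh}
\]
```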

Setting out how machine learning and AI can really make a difference.

For several years, AI and machine learning have been used as buzzwords to signal the vision of an automated data center that’s more resilient and costs less to run. But the reality, as we have shown, is that most data center operators are still living in a reactive world, often spending most of their time chasing down problems and putting out fires. While data center teams recognize the value that machine learning and AI could bring to their operations, it’s important not to see it as a universal answer to all their concerns. Rather than treat AI as a kind of miracle data center infrastructure management plugin, one that can suddenly monitor, manage and optimize all of a data center’s power, cooling and capacity requirements, it makes more sense to focus initially on specific areas where machine learning can be applied and can deliver significant results.

The focus needs to be on those areas where machine learning and AI techniques can be applied quickly for immediate benefits. Cooling optimization and airflow management are proven areas where the technology can deliver tangible results.

Effective AI solutions rely on extensive access to accurate, granular sensor data, so that they can learn and adapt when exposed to new findings. Machine learning systems can analyze information from very large data sets, detect and extrapolate patterns in that data, and apply them to evolving scenarios. And because today’s IT platforms have access to massive computing power, they can now identify, analyze and act on that data. However, the key question remains: what data are they going to process to get the AI insights they require?

How machine learning and AI can support the five key stages of effective software-based optimization.

At EkkoSense we focus on five core optimization stages:

Taking things to the next level with 3D digital twins that help manage the machine learning process.

The only reliable way for data center teams to troubleshoot and optimize data center performance is to gather massive amounts of data from across the entire facility, without sampling. This removes the risk of visibility gaps, but it also creates a challenge: the sheer volume of real-time data being collected.

Combining the power of machine learning with real-time data from a ‘fully-sensed’ room enables the creation of a digital twin of an organization’s data center layout – one that not only visually represents current cooling, power and thermal conditions, but also provides tangible recommendations for optimization. This level of decision support can help operations teams to take things to the next level, as the software will recognise when changes have been made and even refresh the digital twin to reflect updates.

At EkkoSense, we address this through the powerful 3D visualization capabilities of our EkkoSoft Critical software, which lets users build, populate and edit their data center rooms and present them in first-person, plan and 3D views, as well as power schematics. The result is a fully realised digital twin of your data center layout that not only visually represents current thermal conditions, but also provides tangible recommendations for thermal, power and capacity optimization.

Driving software-based optimization with EkkoSense machine learning algorithms.

Instead of simply automating systems and trusting AI to get on with managing the sensitive security and controls needed for critical data center cooling duty performance, EkkoSense believes in a more productive approach.

That approach is to gather cooling, power and space data at a granular level, visualize that complex data so it is easier to compare changes and highlight anomalies, and then use machine learning algorithms to identify specific areas where action may be needed.

Using machine learning to define cooling Zones of Influence.

By capturing entirely new levels of data center cooling data (going beyond basic temperature measurements to include energy usage and airflow distribution), EkkoSense can map zone-by-zone cooling analytics within the data center to support resiliency and capacity planning decisions. This creates live airflow ‘zones of influence’ that group racks into clusters matched to the cooling unit that provisions them. By sharing these insights with Cooling Advisor, the industry’s first embedded AI-enabled advisory solution, EkkoSense can show precisely which critical equipment would be affected by changes to cooling configuration, and advise on the most efficient operational changes.
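The white paper does not publish the clustering algorithm behind Zones of Influence. As a rough illustration of the idea, the sketch below assigns each rack to the cooling unit whose supply temperature its inlet temperature follows most closely, using the Pearson correlation of step-to-step changes. This is a simplified, hypothetical stand-in; all function names, unit IDs and sample readings are invented for the example.

```python
"""Sketch: group racks into cooling 'zones' by how closely their inlet
temperatures track each cooling unit's supply temperature.

A simplified, hypothetical stand-in for a zones-of-influence model, not
EkkoSense's clustering algorithm. It compares changes (first differences)
rather than absolute values, so a constant sensor offset has no effect.
"""
from math import sqrt


def diffs(series):
    """First differences: how far each reading moved since the previous one."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def assign_zones(rack_inlets, unit_supplies):
    """Map each rack id to the unit whose supply-temperature moves it follows best."""
    unit_moves = {unit: diffs(s) for unit, s in unit_supplies.items()}
    zones = {}
    for rack, inlet in rack_inlets.items():
        rack_moves = diffs(inlet)
        zones[rack] = max(unit_moves, key=lambda unit: pearson(rack_moves, unit_moves[unit]))
    return zones


def group_by_unit(zones):
    """Invert rack -> unit into unit -> [racks], i.e. the cluster each unit serves."""
    clusters = {}
    for rack, unit in zones.items():
        clusters.setdefault(unit, []).append(rack)
    return clusters


if __name__ == "__main__":
    units = {
        "CRAC-1": [18.0, 18.5, 19.2, 18.8, 18.1, 18.6],
        "CRAC-2": [17.5, 17.4, 17.0, 17.6, 18.3, 17.9],
    }
    racks = {
        "A01": [22.0, 22.6, 23.1, 22.9, 22.2, 22.8],  # moves with CRAC-1
        "B07": [21.0, 20.9, 20.4, 21.1, 21.9, 21.4],  # moves with CRAC-2
    }
    print(group_by_unit(assign_zones(racks, units)))
    # {'CRAC-1': ['A01'], 'CRAC-2': ['B07']}
```

Comparing changes rather than raw temperatures means that a stable, uncalibrated sensor offset drops out of the calculation; a production model would also need to handle lag between supply and inlet, missing data and units that are switched off.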

With this kind of real-time thermal monitoring in place, data center teams can track cooling outputs and identify poorly performing cooling systems in advance, so that timely improvements can be made. EkkoSense supports granular monitoring at both rack and cooling unit level to find the hidden but easy-to-fix cooling and airflow problems that typical planned preventive maintenance (PPM) checks and BMS platforms fail to find or diagnose.

Using a powerful clustering algorithm, the Zones of Influence approach works particularly well for air-cooled data centers, whether cooling air is supplied under the floor or directly into the space, which covers most data centers worldwide, from legacy systems to new builds. Knowing which equipment relies on which cooling unit helps determine a risk profile for the operation of the data center. Because most data centers still have sizeable cooling redundancy, a single inactive unit may not be much of an issue, but a faulty or inefficient one may cause recirculation problems that directly affect the equipment within its zone.

Visualizing critical equipment performance within data centers.

Viewing data center rooms via real-time 3D visualizations provides a clear representation of temperatures, thanks to monitoring points on all the critical equipment. However, while this offers an immediate way of alerting and managing SLAs, the underlying cooling relationships remain hidden. Using the Zones of Influence machine learning cooling algorithm, a different view can show the same section of the room broken into cooling zones (as shown on the right side of the image). Here the cooling unit at the top of the image is off, meaning that the orange cooling unit next to it has to provide cooling to four rows of racks. This calls for a closer look at the data captured on individual racks.

Mapping racks with cooling units.

Taking two racks in the neighbouring cluster and bringing up some of their data, we can see that the thermal profile of the rack inlet temperatures doesn’t track the cooling unit supply temperature well. The circles marked on the graph below show periods where the rack inlet temperature falls even though the cooling is backing off. While this example shows quite dramatic temperature swings, current sensor technology makes it possible to deduce these patterns where the fluctuations are much smaller. In fact, a key strength of the Zones of Influence approach is that it depends less on how absolute temperatures match up than on the relative movement of the sensors. And given that the latest digital sensors don’t require any calibration, they can provide a critical source of machine learning data for automation systems.
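The mismatch described above, where a rack inlet temperature falls while its cooling unit backs off, can be flagged with a few lines of code. The sketch below is hypothetical: the 0.2 °C step threshold, the function names and the pairing of rack and unit are assumptions, and a real system would examine many racks and units at once.

```python
"""Sketch: flag steps where a rack's inlet temperature moves against the
cooling unit it is assumed to belong to, i.e. the inlet falls while the unit's
supply temperature rises because the unit is backing off.

Hypothetical illustration: min_step and the rack/unit pairing are assumptions,
not EkkoSense parameters. Because only step-to-step changes are compared, a
constant calibration offset on either sensor cancels out.
"""


def opposing_moves(rack_inlet, unit_supply, min_step=0.2):
    """Indices i where, from reading i-1 to i, the unit's supply temperature rose
    by at least min_step while the rack inlet fell by at least min_step."""
    flagged = []
    for i in range(1, min(len(rack_inlet), len(unit_supply))):
        supply_change = unit_supply[i] - unit_supply[i - 1]
        inlet_change = rack_inlet[i] - rack_inlet[i - 1]
        if supply_change >= min_step and inlet_change <= -min_step:
            flagged.append(i)
    return flagged


def mismatch_ratio(rack_inlet, unit_supply, min_step=0.2):
    """Share of steps flagged by opposing_moves; a persistently high value
    suggests the rack is really being cooled by a different unit."""
    steps = min(len(rack_inlet), len(unit_supply)) - 1
    if steps <= 0:
        return 0.0
    return len(opposing_moves(rack_inlet, unit_supply, min_step)) / steps


if __name__ == "__main__":
    supply = [18.0, 18.0, 18.6, 19.1, 19.1, 18.4]  # unit backs off, then recovers
    inlet = [23.0, 23.1, 22.5, 22.0, 22.1, 22.8]   # rack cools as the unit backs off
    print(opposing_moves(inlet, supply))           # [2, 3]
    print(mismatch_ratio(inlet, supply))           # 0.4
```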

Adapting to operational changes.

Taking advantage of EkkoSense’s machine learning algorithms and its Zones of Influence approach means that data center teams can be even more responsive to change. If, for example, a cooling unit is switched off, the EkkoSense model will recognize the change, adapt at the next iteration, and then take a few iterations to return a new cooling zone configuration. In this first example, although the rack temperatures are within ASHRAE guidelines, the blue cooling unit is not influencing many racks. This is a clear case of too much airflow being supplied to a space, which is what prompts the optimization advice.

Acting on this advice, the team puts the green cooling unit at the top into standby mode (see above), which has several interesting effects from a Zones of Influence perspective. First, on the left-hand side, the temperature distribution of the IT equipment becomes more evenly spread. On the right-hand side, average rack temperature has risen for the smaller row and passive equipment, but in an area of the room that’s less thermally at risk. The most obvious change, however, is in the center aisle, which now has a lower but more consistent inlet temperature. The cooling zones clearly show the blue zone advancing, as well as the red zone expanding to fill the area previously cooled by the green cooling unit.

Air is one of the poorest conductors of heat, so there is a long latency before changes can be validated and racks aligned to the appropriate zones, even where the change at a particular rack can be noticed quite early. Currently, the EkkoSense machine learning algorithm takes between one and four hours to adjust to such a change, depending on the complexity of the room and the rate of change within the data center. It is this latency, combined with the time a cooling change needs to ‘distribute’ a new balance across the remaining operational cooling units, that takes the most time: typically around one to two hours.
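One plausible way to avoid publishing a half-settled zone map during this adjustment period is to require the same result over several consecutive iterations before accepting it. The sketch below shows such a confirmation rule. It is an assumption for illustration, not a description of how the EkkoSense model works, and the class name and three-iteration threshold are hypothetical.

```python
"""Sketch: publish a new zone map only after the model has produced it for
several consecutive iterations, so one noisy run does not reshuffle the zones.

Hypothetical: the class and confirm_after value are illustrative. The white
paper only says the model adapts after a change and settles over a few
iterations; it does not describe this mechanism.
"""


class ZoneTracker:
    def __init__(self, initial_zones, confirm_after=3):
        self.published = dict(initial_zones)   # rack id -> cooling unit id
        self.confirm_after = confirm_after
        self._candidate = None                 # map waiting for confirmation
        self._streak = 0                       # consecutive iterations it has been seen

    def update(self, computed_zones):
        """Feed one iteration's zone map; return True if the published map changed."""
        if computed_zones == self.published:
            self._candidate, self._streak = None, 0  # model agrees with what we have
            return False
        if computed_zones == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = dict(computed_zones), 1
        if self._streak >= self.confirm_after:
            self.published = self._candidate
            self._candidate, self._streak = None, 0
            return True
        return False


if __name__ == "__main__":
    tracker = ZoneTracker({"A01": "CRAC-1", "A02": "CRAC-3"})
    after_switch_off = {"A01": "CRAC-1", "A02": "CRAC-2"}        # CRAC-3 switched off
    print([tracker.update(after_switch_off) for _ in range(3)])  # [False, False, True]
    print(tracker.published["A02"])                              # CRAC-2
```

A single contradictory iteration resets the streak, which trades a slightly slower response for a zone map that operators can trust not to flicker.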

Additional cooling zone benefits that can be derived from cluster monitoring.

Adopting the EkkoSense Zones of Influence approach can also deliver key benefits in terms of resilience testing, weighted resilience testing and enhanced capacity management.

Cooling zones make redundancy and resilience testing far more effective. These tests are time-consuming, so being able to identify and target key units provides a much higher degree of confidence in the ongoing resilience of the cooling plant within a facility.
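With a rack-to-unit zone map in place, a first-pass resilience test can be run on paper before any unit is switched off. The sketch below performs a simple N-1 check: for each cooling unit, it asks whether neighbouring units have enough spare capacity to absorb the heat load of its zone. The neighbour map, unit IDs and kW figures are hypothetical, and the assumption that load can be shared freely between neighbours is a deliberate simplification that ignores airflow limits.

```python
"""Sketch: an N-1 check over cooling zones.

For each cooling unit, assume it fails and ask whether the spare capacity of
its neighbouring units can absorb the heat load of the racks in its zone.
Hypothetical simplifications: loads and capacities are in kW, heat from a
failed zone can be shared freely among neighbours, and airflow limits are ignored.
"""


def zone_loads(zones, rack_kw):
    """Sum rack heat load (kW) per cooling unit from a rack -> unit map."""
    loads = {}
    for rack, unit in zones.items():
        loads[unit] = loads.get(unit, 0.0) + rack_kw[rack]
    return loads


def n_minus_1(zones, rack_kw, capacity_kw, neighbours):
    """Return {unit: shortfall_kw} for units whose failure neighbours cannot cover."""
    loads = zone_loads(zones, rack_kw)
    at_risk = {}
    for unit in capacity_kw:
        orphaned = loads.get(unit, 0.0)
        spare = sum(
            max(capacity_kw[n] - loads.get(n, 0.0), 0.0) for n in neighbours.get(unit, [])
        )
        if orphaned > spare:
            at_risk[unit] = orphaned - spare
    return at_risk


if __name__ == "__main__":
    zones = {"A01": "CRAC-1", "A02": "CRAC-1", "B01": "CRAC-2", "C01": "CRAC-3"}
    rack_kw = {"A01": 6.0, "A02": 6.0, "B01": 5.0, "C01": 4.0}
    capacity_kw = {"CRAC-1": 20.0, "CRAC-2": 15.0, "CRAC-3": 20.0}
    neighbours = {"CRAC-1": ["CRAC-2"], "CRAC-2": ["CRAC-1", "CRAC-3"], "CRAC-3": ["CRAC-2"]}
    print(n_minus_1(zones, rack_kw, capacity_kw, neighbours))  # {'CRAC-1': 2.0}
```

Units that appear in the result are the ones worth targeting first in a physical resilience test.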

Supporting enhanced capacity planning.

Having cooling zones in place also highlights the local cooling capacity within those zones – meaning that any new rack or equipment placement can be sense-checked prior to deployment.

In this example, a capacity request for a new rack requiring 3 kW of power and cooling is initially rejected due to insufficient cooling within the zone. Provided the model is kept up to date with existing allocated loads and future reserved loads, the cooling zones can be used to ensure that capacity is not breached at a later date by iterative but unconnected changes made over time. In the above example, you can see that the new rack was accepted in a new location within the room. Without the control offered by cooling zones, however, the decision might have been to place the new rack in the first location anyway, supported by additional cooling capacity, adding both significant cost and additional carbon load.
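The 3 kW example can be expressed as a headroom check per zone. In the minimal sketch below, each zone's headroom is its usable cooling capacity minus allocated and reserved loads. The `Zone` class, function names and figures are hypothetical, chosen only to mirror the scenario described above.

```python
"""Sketch: sense-check a new rack against the cooling headroom of its zone.

Hypothetical model: each zone has a usable cooling capacity, plus loads already
allocated and loads reserved for future deployments (all in kW). A request is
accepted only if it fits the remaining headroom; otherwise zones that could
take it are suggested instead.
"""
from dataclasses import dataclass


@dataclass
class Zone:
    capacity_kw: float
    allocated_kw: float = 0.0
    reserved_kw: float = 0.0

    @property
    def headroom_kw(self):
        return self.capacity_kw - self.allocated_kw - self.reserved_kw


def check_request(zones, target_zone, request_kw):
    """Return (accepted, alternatives); alternatives lists zones with enough
    headroom, largest first, when the target zone cannot take the load."""
    if zones[target_zone].headroom_kw >= request_kw:
        return True, []
    alternatives = sorted(
        (name for name, z in zones.items() if name != target_zone and z.headroom_kw >= request_kw),
        key=lambda name: zones[name].headroom_kw,
        reverse=True,
    )
    return False, alternatives


def allocate(zones, zone_name, request_kw):
    """Record an accepted request so later checks see the reduced headroom."""
    zones[zone_name].allocated_kw += request_kw


if __name__ == "__main__":
    zones = {
        "CRAC-1": Zone(capacity_kw=40.0, allocated_kw=36.0, reserved_kw=2.0),
        "CRAC-2": Zone(capacity_kw=40.0, allocated_kw=25.0, reserved_kw=5.0),
        "CRAC-3": Zone(capacity_kw=30.0, allocated_kw=24.0),
    }
    print(check_request(zones, "CRAC-1", 3.0))  # (False, ['CRAC-2', 'CRAC-3'])
    allocate(zones, "CRAC-2", 3.0)
    print(zones["CRAC-2"].headroom_kw)          # 7.0
```

Calling `allocate` after each accepted request is what keeps the model current, so a series of small, unconnected deployments cannot quietly exhaust a zone.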

Improving the data model.

Applying machine learning algorithms and Zones of Influence within the data center optimization process not only lets teams build the connections between their racks and cooling units, but also helps them start to see how these relationships affect airflow within spaces.

Applying clustering algorithms clearly helps to uncover previously hidden cooling configurations and relationships within the data; however, these algorithms can only be as good as the feature sets they are given. Adding rack power data, for example, can considerably enhance the model, making it possible to represent the full data center cooling cycle with twinned supply and return cooling zones.
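Adding rack power alongside temperature-based features raises a practical point: features on different scales need to be standardized before clustering, or kW values will dominate correlations that lie between -1 and 1. The sketch below builds such a feature matrix with z-scored columns. The feature choice and function names are assumptions for illustration, and the clustering step itself is left to whatever method is in use.

```python
"""Sketch: extend per-rack clustering features with rack power.

Hypothetical feature set: one column per cooling unit holding the rack's
inlet-vs-supply correlation, plus the rack's power draw in kW. Each column is
z-scored so that kW values do not swamp correlations that lie in [-1, 1].
"""
from math import sqrt


def zscore_columns(rows):
    """Standardize each column to mean 0 and population standard deviation 1;
    constant columns become all zeros."""
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        out_cols.append([0.0 if sd == 0 else (v - mean) / sd for v in col])
    return [list(r) for r in zip(*out_cols)]


def build_features(correlations, rack_kw, units):
    """Return (rack ids, rows) where each row is
    [corr(rack, unit_1), ..., corr(rack, unit_k), rack kW], standardized."""
    racks = sorted(rack_kw)
    rows = [[correlations[r][u] for u in units] + [rack_kw[r]] for r in racks]
    return racks, zscore_columns(rows)


if __name__ == "__main__":
    corr = {"A01": {"CRAC-1": 0.9, "CRAC-2": -0.2}, "B07": {"CRAC-1": -0.1, "CRAC-2": 0.8}}
    racks, features = build_features(corr, {"A01": 4.0, "B07": 8.0}, ["CRAC-1", "CRAC-2"])
    print(racks)     # ['A01', 'B07']
    print(features)  # every column now has mean 0 and standard deviation 1
```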

Enhance operating efficiencies by making the most of anomaly detection.

Machine learning also proves particularly valuable in terms of uncovering potential anomalies that are hidden within more complex data sets.

In the example featured, an outlier is much easier to pick up when presented in a 2D plot and 3D representation than when it is completely masked in raw time-series charts.

In this example, we see a cooling unit and a subsection of a data center across two consecutive days. At first glance things look relatively similar: some racks are hotter than others and some are colder. None of the racks are outside compliance thresholds, and the cooling unit would appear to engineers to be operating as before. Overlaying the images on each other, however, shows that the thermal ‘profile’ of the room has changed, with a significant change in performance over the past 24 hours.
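The day-over-day comparison can be approximated by looking at each rack's change between days rather than at each reading against a compliance limit. In the hypothetical sketch below, the inputs are per-rack mean inlet temperatures for two days; the function name, rack IDs, 1.5 °C change threshold and 27 °C limit are illustrative assumptions.

```python
"""Sketch: compare two days' thermal profiles instead of checking each
reading against a compliance limit.

Hypothetical inputs: per-rack mean inlet temperature (degC) for each day.
The 1.5 degC change threshold and 27 degC limit are illustrative assumptions.
"""


def profile_shift(day1, day2, change_threshold_c=1.5, limit_c=27.0):
    """Return (changed, all_compliant).

    changed maps rack id -> (day2 - day1) for racks present on both days whose
    mean inlet moved by at least change_threshold_c. all_compliant is True when
    every day-2 value is still at or below limit_c, i.e. a threshold alarm
    would stay silent."""
    changed = {}
    for rack in day1.keys() & day2.keys():
        delta = day2[rack] - day1[rack]
        if abs(delta) >= change_threshold_c:
            changed[rack] = delta
    all_compliant = all(t <= limit_c for t in day2.values())
    return changed, all_compliant


if __name__ == "__main__":
    monday = {"A01": 22.0, "A02": 23.5, "A03": 21.0, "A04": 24.0}
    tuesday = {"A01": 22.2, "A02": 25.5, "A03": 19.0, "A04": 24.1}
    changed, compliant = profile_shift(monday, tuesday)
    print(sorted(changed.items()), compliant)  # [('A02', 2.0), ('A03', -2.0)] True
```

In the example, every rack is still compliant, so a threshold alarm stays silent, yet two racks have shifted by 2 °C in opposite directions.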

Another example demonstrates the ability to pick up cooling unit failures. Consider an anomaly that points to an inefficient or failing cooling unit, which could lead to a thermal outage. Above, we see a unit running with a normal, steady cooling duty profile. At some stage, the unit starts operating outside its normal parameters. Fortunately, the EkkoSense machine learning algorithm detects this instability as it develops, along with the increased cooling load on nearby cooling units, before the original unit fails completely.

In this case, the BMS would not have raised an alert, because the cooling unit failure was not component-based and would not have triggered a hard SLA breach. Although the nearby cooling unit continued to cool, it would ultimately have been unable to prevent a localised hot spot. The failure was picked up by combining anomaly detection on the cooling units with granular rack-level sensing, allowing the faulty unit to be repaired quickly and brought back online under normal operating conditions.
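A simplified version of this 'soft failure' logic is sketched below: a unit is flagged when its cooling duty departs sharply from its own baseline while a neighbouring unit's duty rises. This is a hypothetical illustration using standard scores, not the EkkoSense algorithm; the function names, baseline length, z-score limit and neighbour threshold are assumed values.

```python
"""Sketch: flag a cooling unit that drifts away from its usual duty profile
while a neighbouring unit picks up extra load, before any hard fault occurs.

Hypothetical choices: duty is a percentage of rated output per interval, the
first `baseline` readings define normal behaviour, and z_limit and
neighbour_rise are illustrative thresholds, not EkkoSense values.
"""
from statistics import mean, pstdev


def zscores(series, baseline):
    """Standard score of every reading relative to the first `baseline` readings."""
    ref = series[:baseline]
    mu, sd = mean(ref), pstdev(ref)
    if sd == 0:
        sd = 1e-9  # a perfectly flat baseline would otherwise divide by zero
    return [(v - mu) / sd for v in series]


def soft_failure_alerts(unit_duty, neighbour_duty, baseline=6, z_limit=3.0, neighbour_rise=5.0):
    """Indices after the baseline where the unit is abnormal AND the neighbour's
    duty is at least `neighbour_rise` points above its own baseline mean."""
    unit_z = zscores(unit_duty, baseline)
    neighbour_ref = mean(neighbour_duty[:baseline])
    return [
        i
        for i in range(baseline, min(len(unit_duty), len(neighbour_duty)))
        if abs(unit_z[i]) > z_limit and neighbour_duty[i] - neighbour_ref >= neighbour_rise
    ]


if __name__ == "__main__":
    unit = [60, 61, 59, 60, 61, 59, 60, 52, 45, 38]       # duty starts to collapse
    neighbour = [50, 50, 51, 49, 50, 50, 50, 53, 58, 63]  # neighbour compensates
    print(soft_failure_alerts(unit, neighbour))           # [8, 9]
```

Neither condition depends on a component fault or an SLA breach, which is why a check of this kind can fire before a BMS would.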

Turning machine learning insights into AI-powered actions – Cooling Advisor.

We’ve seen how comprehensive monitoring combined with machine learning algorithms can uncover and continually track the cooling zones within a data center – presenting a core foundation for effective automated cooling within critical facilities.

However, there’s still considerable concern and reluctance when it comes to handing over control to what is essentially a machine learning powered black box.

To address this concern, EkkoSense believes it’s important to make the expertise behind our machine learning algorithms as intuitive and accessible as possible. Clear 3D visualizations and digital twin representations play a key role here, but it’s also necessary to give data center teams the human auditability that’s so important to them.

The 1st embedded advisory solution powered by AI and machine learning analytics.

That’s where Cooling Advisor fits in, providing data center operations teams with focused power, space and cooling performance recommendations and advisory actions. Built right into the heart of our EkkoSoft® Critical software, Cooling Advisor is the industry’s first embedded advisory solution. It helps organisations unlock significant data center optimization benefits, including substantial cooling energy savings, simply by acting on the software’s advice.

Cooling Advisor’s actionable changes – such as suggested optimum cooling unit setpoint adjustments, fan speed setpoints and standby settings, changes to floor grille layouts, checks that cooling units are running to specification, and advice on optimum rack locations – are based on EkkoSoft Critical’s powerful AI and machine learning analytics.

Each recommendation is presented for human auditability before data center operations team members make the suggested changes. Teams can then use EkkoSense’s data center performance optimization solution to confirm that Cooling Advisor recommendations are delivering the expected results. Cooling Advisor operation is particularly intuitive, with potential risk mitigated by clearly defined action steps, obvious back-out mechanisms, and the ability to flag and unflag items so that optimization suggestions aren’t repeatedly given for changes that cannot be implemented.
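The review workflow described above (clear action steps, a back-out path, human sign-off and flagging of advice that cannot be implemented) can be modelled as a small state machine. The sketch below is purely illustrative: every class, field and status name is hypothetical and does not describe Cooling Advisor's internal design or API.

```python
"""Sketch: a human-in-the-loop log for advisory recommendations.

Every class, field and status name here is hypothetical. It illustrates the
ideas described for Cooling Advisor (explicit action steps, a back-out plan,
human sign-off before anything changes, and flagging advice that cannot be
implemented so it is not suggested again), not its actual design.
"""
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    key: str        # stable identity, e.g. "CRAC-3:standby"
    steps: list     # actions the operator will carry out
    back_out: list  # actions that restore the previous state
    status: str = "proposed"  # proposed -> approved -> applied or backed_out


@dataclass
class AdvisorLog:
    items: dict = field(default_factory=dict)  # key -> Recommendation
    flagged: set = field(default_factory=set)  # keys operators marked as not implementable

    def propose(self, rec):
        """Record a recommendation unless it is flagged or already recorded."""
        if rec.key in self.flagged or rec.key in self.items:
            return False
        self.items[rec.key] = rec
        return True

    def approve(self, key):
        """Human sign-off: nothing can be applied before this step."""
        self.items[key].status = "approved"

    def apply(self, key):
        rec = self.items[key]
        if rec.status != "approved":
            raise RuntimeError("a recommendation must be approved before it is applied")
        rec.status = "applied"
        return rec.steps

    def back_out(self, key):
        rec = self.items[key]
        rec.status = "backed_out"
        return rec.back_out

    def flag(self, key):
        """Suppress this advice from now on and drop any recorded copy."""
        self.flagged.add(key)
        self.items.pop(key, None)

    def unflag(self, key):
        self.flagged.discard(key)
```

For example, an operator would `approve` a standby recommendation before calling `apply`, and `flag` a grille move that the room layout does not allow, so that later `propose` calls for the same key are ignored until it is unflagged.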

The latest EkkoSoft Critical software-based data center optimization solution is already unlocking data center cooling energy savings of up to 30% per annum. Cooling Advisor goes one step further, with clear recommended actions that draw on EkkoSense’s embedded PhD-level optimization expertise, in the form of cooling unit algorithms updated with new learnings, to equip data center teams with a powerful self-optimization capability.

By adopting analytics and machine learning, EkkoSense ensures that Cooling Advisor delivers advice that is specific to each data center room. This approach recognises that data centers never stay the same. Using Cooling Advisor and following its recommendations will allow operations teams to keep on unlocking savings on their cooling costs.

Five key takeaways for machine learning and AI-powered software-based optimization.

Optimizing data center performance with EkkoSense.

EkkoSense is a global leader in the provision of software-driven thermal optimization solutions for critical live environments.

With its powerful EkkoSoft Critical SaaS 3D visualization and analytics solution for data centers, EkkoSense is making it even easier for data center operations teams to collect granular real-time data, visualize airflow management improvements, manage complex capacity decisions, and quickly highlight any worrying trends in cooling performance.

The key difference with the EkkoSense approach is that the solutions not only pick up problems or underlying negative trends but also suggest best practice solutions based on EkkoSoft Critical’s extensive knowledge base and deep analytics capability. This effectively removes data center thermal risks and provides 100% rack-level ASHRAE thermal compliance. All this comes at a fraction of the cost of more expensive and complex legacy data center DCIM or CFD solutions, and offers a genuine ROI of less than 12 months in most cases.

EkkoSense has already helped its clients to reduce their cooling power-related carbon emissions by around 4,100 tonnes CO2-eq per year – equivalent to a cumulative 10 MW+ cooling power saving and a $10 million cooling energy cost saving. These totals are being added to on a daily basis.

Paul Milburn
Chief Product Officer, EkkoSense