Answer your key Liquid Cooling engineering questions
With data centers scaling up rapidly to meet accelerating AI, HPC, and big data analytics demands, there’s a huge requirement for data center cooling solutions that can keep pace with the many ultra high-density AI racks that regularly consume upwards of 100-125 kW per rack.
And with these IT racks worth potentially $10 million plus each in terms of hardware and cooling equipment, data center operations management teams are rightly concerned that they are cooled effectively and will continue to perform optimally under potentially challenging thermal conditions. Traditional air cooling alone is clearly no longer enough to meet AI computing cooling requirements, and this has led to a surge in demand for data center liquid cooling systems. Indeed, McKinsey estimates that liquid cooling systems will account for almost half of the global data center cooling market by 2030.
Liquid Cooling benefits
Integrating liquid cooling brings a number of clear thermal management benefits. Liquids offer higher thermal conductivity and heat capacity than air, so they absorb and then transfer heat more effectively. This enables operations teams to maintain lower overall operating temperatures. Given this efficiency, liquid cooling can deliver reductions in overall data center cooling energy usage, while its more compact size supports increases in power density.
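To make the heat-capacity point concrete, here is a minimal sketch of the steady-state heat balance Q = m_dot * cp * delta_T, comparing the coolant flow needed to carry away one rack's heat using water and using air. The 100 kW load comes from the rack figures quoted earlier; the 10 K coolant temperature rise and the approximate textbook fluid properties are assumptions chosen for illustration, not design values.

```python
# Approximate textbook fluid properties near room temperature (assumed values).
WATER = {"cp_j_per_kg_k": 4186.0, "density_kg_per_m3": 997.0}
AIR = {"cp_j_per_kg_k": 1005.0, "density_kg_per_m3": 1.2}


def volumetric_flow_m3_per_s(heat_w, delta_t_k, fluid):
    """Volume flow needed to absorb heat_w with a coolant temperature rise of delta_t_k.

    Uses the steady-state heat balance Q = m_dot * cp * delta_T.
    """
    if heat_w < 0 or delta_t_k <= 0:
        raise ValueError("heat_w must be non-negative and delta_t_k must be positive")
    mass_flow_kg_per_s = heat_w / (fluid["cp_j_per_kg_k"] * delta_t_k)
    return mass_flow_kg_per_s / fluid["density_kg_per_m3"]


rack_heat_w = 100_000.0  # 100 kW rack, the lower end of the range quoted above
delta_t_k = 10.0         # assumed coolant temperature rise across the rack

water_flow = volumetric_flow_m3_per_s(rack_heat_w, delta_t_k, WATER)
air_flow = volumetric_flow_m3_per_s(rack_heat_w, delta_t_k, AIR)

print(f"Water: {water_flow * 1000 * 60:.0f} L/min")
print(f"Air:   {air_flow:.1f} m^3/s")
print(f"Air needs roughly {air_flow / water_flow:.0f}x the volume flow of water")
```

Under these assumptions, air needs a volume flow a few thousand times larger than water to move the same heat, which is the practical reason liquid loops can serve much denser racks.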
Optimizing data center cooling strategies
Given that it’s not possible to run completely liquid-cooled data centers, the reality for most operators is that liquid cooling and air cooling will both have an important role to play in the cooling mix – most likely as part of an evolving hybrid cooling approach.
We’re already starting to see a mix of data center cooling options, ranging from traditional air cooling through enhanced air cooling options such as in-row, rear-door cooling, and high volume fan walls, to direct-to-chip and immersion liquid cooling. Not surprisingly, data center operations teams are busy considering how their plans to accommodate higher density AI racks will impact their cooling solutions strategy going forward.
Managing the transition to this kind of hybrid cooling approach will be challenging for data center technology teams looking to provide an effective and flexible cooling strategy to support potentially hundreds of millions of dollars of investment in AI compute hardware and cooling infrastructure.
Key engineering questions need answering
However, a number of key engineering questions still need answering before deploying liquid cooling – including analyzing the optimal blend of air and liquid required for dynamic IT loads.
Selecting the optimum approach also requires a careful assessment of risk, performance and long-term costs, whether that means managing leak risks, recognizing potential issues such as rising heat flux, or acknowledging the need to deliver consistent thermal performance under demanding AI workloads.
Key questions here that data center operations teams need to be asking include:
- What is the optimal hybrid liquid/air cooling mix across each of your rooms?
- How do you plan to keep this in sync? What temperatures can your new AI compute engines safely operate at?
- What steps have you taken to manage liquid cooling leak risks? Will you be able to pick up on potential leaks before they start to impact performance?
- How are you addressing potential issues such as rising heat flux, especially with dynamic IT loads?
- How do you know if your CDUs are delivering a thermally-uniform performance across their target racks running AI and HPC workloads?
- How do you establish the key points for the data monitoring of liquid cooling flows? How granular should liquid cooling monitoring be? What exactly are you looking to measure?
- How are you monitoring the key chilled water assets that support your data sites? Do you have visibility of the holistic performance of the system?
- How do you balance chiller staging, flow rates and temperatures with your fluctuating IT load requirements?
- How do you find the sweet spot where data collection works best for chillers in AI data centers?
- Who actually ‘owns’ the control and, crucially, the ongoing configuration piece for your liquid cooling deployments?
- How are you considering cooling pre-conditioning for the introduction of dynamic IT loads? Are you considering feedback triggers from systems to back off loads in the event of cooling anomalies with your liquid systems?
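Two of the questions above, thermal uniformity across the racks served by a CDU and feedback triggers that back off load during a cooling anomaly, can be sketched in a few lines. This is a generic illustration rather than any vendor's method: the function names, the rack IDs, the 3 °C spread tolerance and the flow floor are all hypothetical values chosen for the example.

```python
from statistics import mean


def uniformity_report(rack_return_temps_c, max_spread_c=3.0):
    """Summarize how evenly a CDU is cooling its racks.

    rack_return_temps_c maps a rack ID to its coolant return temperature (deg C).
    max_spread_c is a hypothetical tolerance, not a vendor or standards figure.
    """
    if not rack_return_temps_c:
        raise ValueError("need at least one rack reading")
    temps = list(rack_return_temps_c.values())
    avg = mean(temps)
    spread = max(temps) - min(temps)
    # Flag racks running hotter than the average by more than half the tolerance.
    hot_racks = sorted(
        rack for rack, t in rack_return_temps_c.items() if t - avg > max_spread_c / 2
    )
    return {
        "mean_c": avg,
        "spread_c": spread,
        "uniform": spread <= max_spread_c,
        "hot_racks": hot_racks,
    }


def should_back_off_load(flow_l_per_min, min_flow_l_per_min, report):
    """Hypothetical feedback trigger: request a load reduction when coolant
    flow drops below its floor or the racks are not uniformly cooled."""
    return flow_l_per_min < min_flow_l_per_min or not report["uniform"]


# Made-up readings for illustration only.
readings = {"rack-01": 41.0, "rack-02": 41.5, "rack-03": 46.2, "rack-04": 40.8}
report = uniformity_report(readings)
print(report)
print("Back off load:", should_back_off_load(140.0, 120.0, report))
```

In this made-up data set one rack returns noticeably warmer coolant than its neighbours, so the spread exceeds the tolerance and the trigger fires even though the flow rate itself is above its floor. A real deployment would need thresholds derived from the hardware's rated operating temperatures.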
EkkoSense gets you the Liquid Cooling answers you need
It’s entirely normal for data center operations teams to have concerns when they’re investing significantly in expensive areas of risk such as Liquid Cooling, particularly when those investments bring risks that they don’t yet fully understand.
The transition to hybrid air and liquid cooling to support AI computing needs careful design, planning, deployment and ongoing management. Placing new AI workloads demands smart capacity management, with careful consideration of space, power, and air & liquid cooling requirements. This increases the need for absolute real-time white and grey space visibility.
Here’s where EkkoSense makes the difference. Our EkkoSoft Critical software is the only AI-powered data center optimization platform that allows operations teams to manage the real-time performance of all their air-cooled, liquid-cooled and hybrid-cooled environments.
With liquid cooling deployments, there’s always an element of air cooling, which can be anywhere between 15% and 30% of a liquid cooling installation depending on the configuration and CDUs installed. So there’s huge potential to optimize this balance and minimize the air-cooled share. And because AI workload heat loads are so immense, you could actually end up using more air cooling than you ever did before the introduction of liquid cooling!
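The arithmetic behind that last point is easy to check. The sketch below applies the 15-30% air-cooled share to the 100-125 kW rack loads quoted earlier. The 10 kW figure for a legacy air-cooled rack is an assumption used only as a point of comparison, and the function name is hypothetical.

```python
def residual_air_load_kw(rack_power_kw, air_fraction):
    """Heat (kW) that air cooling must still remove from a liquid-cooled rack."""
    if not 0.0 <= air_fraction <= 1.0:
        raise ValueError("air_fraction must be between 0 and 1")
    return rack_power_kw * air_fraction


LEGACY_AIR_RACK_KW = 10.0  # assumed legacy air-cooled rack load, for comparison only

for rack_kw in (100.0, 125.0):
    for fraction in (0.15, 0.30):
        air_kw = residual_air_load_kw(rack_kw, fraction)
        print(
            f"{rack_kw:.0f} kW rack, {fraction:.0%} to air: {air_kw:.1f} kW "
            f"({air_kw / LEGACY_AIR_RACK_KW:.1f}x the assumed legacy rack)"
        )
```

Across these combinations the air-side load works out to between 15 kW and 37.5 kW per rack, so even the best case exceeds the assumed legacy rack's entire heat load.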
‘Single pane of glass’ optimization for air cooling, liquid cooling or hybrid environments
Without effective instrumentation you’ll never know if you have a problem
Successful liquid cooling deployments are all about understanding potential risks and having the right level of visibility into increasingly complex cooling installations. When you’re investing hundreds of millions in AI data centers, lack of visibility is simply too big a risk for most operators.
EkkoSense is the only company to put an extensive monitoring wrap around your liquid cooling operations. We understand that unless you effectively instrument all aspects of your cooling, then you’ll never know if you have a problem. And given the costs involved, it’s infinitely better to identify and resolve potential liquid cooling issues before they start impacting availability.
Absolute white and grey space visibility with EkkoSoft Critical
Working with EkkoSoft Critical is a great way for operations teams to take control of their high-density AI and HPC computing cooling requirements. We’ve designed our software to be cooling-agnostic, so whether you’re looking to first ensure that your air cooling performance is fully optimized, or you want to make sure your hybrid air and liquid cooling approach is performing optimally, EkkoSoft Critical is the platform to support your transition.
Extending EkkoSoft Critical support for Liquid Cooling means that EkkoSense is ideally placed to equip data center teams with the absolute real-time visibility needed to optimize hybrid cooling across their white and grey space operations. Our 3D visualization and analytics software can accommodate all cooling architectures, allowing operations teams to monitor, visualize and optimize all their different liquid and air cooling types and strategies.
Because EkkoSoft Critical lets data centers visualize cooling performance at a much more granular level, operations teams can ensure their air cooling, liquid cooling or hybrid environment remains fully optimized – particularly as AI and HPC workloads continue to scale upwards.
EkkoSense offers a broad portfolio of innovative optimization technologies for data center liquid cooling, and our solutions are showcased in a number of dedicated liquid cooling labs around the world. These include…
Auto Anomaly Detection
Machine learning-based cooling anomaly detection and alerting for the early detection of cooling, airflow, and flow-rate faults to prevent service-impacting issues.
Hybrid Cooling Advisor
Offering the world’s first in-room optimization engine that supports hybrid liquid and air cooling for rooms with high rack loads.
Comprehensive CDU Integration
Offering integration with immersion and CDU components for flow rate analytics, visuals and optimization.
EkkoChill
Vendor-agnostic optimization engine for chillers, pumps, and controls that provides a repeatable approach to liquid system optimization.
EkkoSensor
Low-cost air-side and direct-to-chip & immersion cooling liquid-side monitoring sensors for Temp/RH environmental compliance.
EkkoSim
Data center simulation software supports water-cooled chillers, liquid-cooled IT racks, chilled water, split DX units and immersion cooling.
Chiller Optimization
EkkoSense Chiller Optimization algorithms not only support grey space performance but also consider real-time data center activities to ensure true end-to-end cooling system optimization.
EkkoFlow
Universal retrofit monitoring device for liquid-based systems including CDUs, immersion cooling pods, and chillers.
Liquid Cooling 3D Optimization
Core data center optimization, 3D visualization and analytics software.
Contact EkkoSense to continue the conversation and discuss your data center liquid cooling optimization challenges. Try an instant free demo or read our customer case studies.