Data centres – Why ASHRAE needs to steer clear of oversimplifications.
Since it was first published in 2004, ASHRAE’s ‘Thermal Guidelines for Data Processing Environments’ has been a highly regarded, vendor-neutral reference resource for data centre designers, operators and managers. The latest 2021 5th edition has just been published and, while we welcome its continued focus on helping to optimise data centre thermal performance, we believe there are several areas where ASHRAE could have taken its best practice recommendations further.
At EkkoSense we have always been a strong supporter of ASHRAE’s mission to advance the arts and sciences of heating, ventilation, air conditioning and refrigeration. However, we also try to help our customers exceed ASHRAE guidelines wherever possible. So, here’s where we think ASHRAE’s 5th edition reference resource could do more – particularly by resolving some of the oversimplifications that, while well-intentioned, could cost energy and expose operators to potential risk. I’ve also flagged two updates where ASHRAE is keeping pace with today’s data centre realities.
Questioning variations in temperature
In the new 5th edition ASHRAE argues that IT failure rates go up from 15°C, implying that, were data centres run at even colder temperatures, reliability would get much better. I think this position is far too simplistic. In reality failures are rare. At extremes above 27°C, IT equipment typically enters self-protection modes and shuts down important functions, which affects service but does not damage the equipment. We agree, however, that this counts as a failure of the data centre’s operation even if it is not a failure of the equipment itself.
However, the regime where mechanical damage is more often sustained is, counterintuitively, under 20°C, particularly where rapid thermal cycling causes mechanical fatigue in components. From our observations over the last 20 years, the lower rack temperatures get, the more we tend to see failures from mechanical stress as well as water, humidity and corrosion damage. This effect shows up clearly when racks close to AHUs are subjected to more extreme low temperatures than those further away in the room.
Based on this experience, I find it hard to believe the information in the ASHRAE guidelines, which presents reliability degradation as a straight-line increase with temperature starting at 15°C, without thresholds, when we know that different processes kick in above and below certain temperatures. Given the lack of reliable data underpinning this commentary, I would suggest this only confirms the need to track data centre temperatures and equipment performance at a much more granular level. Only then can organisations make the right judgements about their critical data centre facilities.
The question of energy efficiency gains or losses approaching 27°C has always been a tricky one because internal equipment fans tend to ramp up with temperature. Since IT manufacturers are – not surprisingly – hesitant about releasing clear data on this, the picture will remain confused. I believe ASHRAE should simplify its position here: acknowledge the partial increase in energy use when IT equipment is run at higher temperatures, but make it very clear that this is more than offset by the mechanical efficiencies of the 25°C to 27°C inlet temperature range. A strong lead from ASHRAE in this area will do much to persuade IT manufacturers that the thermal environments in which we operate our IT systems are changing, and that higher temperature operation needs to be part of the IT equipment selection process. In fact, we would really like to see the ASHRAE guidelines publish a road map for further increases in inlet temperature, so that designers can plan for data centres coming into service in the mid-2020s that will operate beyond 2050. Let’s not forget the net zero commitments we have all signed up to.
Temperature ranges across the room
ASHRAE has kept recommended temperature bands largely unchanged in its 5th edition. Interestingly, we find no reference to how significantly temperatures vary across data centre rooms – one of the great challenges we face in optimisation. By keeping things simple and trusting that one temperature value can describe the overall inlet condition of an entire site, ASHRAE is underestimating the variance and complexities involved. At EkkoSense we frequently see temperature ranges of 10°C or more, which we can usually reduce to less than 5°C through our software-led optimisation approach. Perhaps ASHRAE could look at the standard deviation of the rack inlets as a useful measure, as we do? Similarly, we see no discussion of temperature changes over time, where significant variations can again go unnoticed. Often as many as four significant temperature swings per hour can occur as oversized AHUs modulate into and out of their cooling regime, leading to the thermal stresses mentioned previously.
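To make the idea of measuring spread concrete, here is a minimal sketch of how the range and standard deviation of rack inlet temperatures might be calculated. It is an illustration only, not EkkoSense's software: the function name `inlet_spread` and the sample readings are hypothetical, and it assumes one inlet reading per rack, treated as the whole population of the room.

```python
from statistics import mean, pstdev


def inlet_spread(inlet_temps_c):
    """Summarise how much rack inlet temperatures (in °C) vary across a room."""
    if not inlet_temps_c:
        raise ValueError("need at least one rack inlet reading")
    return {
        "mean_c": mean(inlet_temps_c),
        # Gap between the hottest and coldest rack inlet.
        "range_c": max(inlet_temps_c) - min(inlet_temps_c),
        # Population standard deviation: every rack in the room is included.
        "std_dev_c": pstdev(inlet_temps_c),
    }


# Hypothetical readings, one per rack, for illustration only.
readings = [18.0, 20.0, 22.0, 24.0, 26.0, 28.0]
summary = inlet_spread(readings)
print(summary)
```

A single room-average figure would report 23°C here and hide the 10°C gap between the coldest and hottest racks, which is exactly the information that matters for optimisation.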
Stop measuring temperatures on IT rack exhausts
Credit to ASHRAE for trying to stop people measuring temperatures on the exhausts of IT racks. Our experience of measuring thousands of racks confirms that you can draw no conclusions from this data – so please, invest instead in measuring the front of all your racks. However, ASHRAE still isn’t consistent in its measurement recommendations. It still advocates measuring just one out of every three racks. Does it believe the unmeasured racks have exactly the same temperature as their nearest measured neighbour? Surely it’s time we recognised that there’s no meaningful relationship between one rack and its neighbour. I don’t understand why ASHRAE doesn’t say that ‘ideally’ every rack should have its own sensor – maybe the 6th edition will take this logical step?
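The risk of measuring only one rack in three can be shown with a short sketch. The row of readings below is invented purely for illustration, and `sampling_gap` is a hypothetical helper name; the point is simply that a hot rack sitting between two sampled racks never appears in the data.

```python
def sampling_gap(inlet_temps_c, step=3):
    """Return how far the hottest sampled inlet falls below the true hottest inlet.

    Only every `step`-th rack in the row is treated as having a sensor.
    """
    sampled = inlet_temps_c[::step]
    return max(inlet_temps_c) - max(sampled)


# Hypothetical inlet temperatures (°C) along one row of six racks.
row = [21.0, 22.5, 29.0, 21.5, 23.0, 22.0]
gap = sampling_gap(row)
print(f"Hottest rack is {gap:.1f}°C above anything the sampled sensors report")
```

In this made-up row the sensors on the first and fourth racks report a maximum of 21.5°C, while the third rack is actually at 29.0°C.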
Encouraging the automated logging of HVAC data
The latest ASHRAE edition also recommends the automated logging of HVAC equipment parameters, suggesting that this can provide valuable insights into operational trends and may simplify data collection.
That’s certainly our view at EkkoSense, where we’ve been tracking data centre cooling duty performance in real time for several years. Monitoring ΔTs, humidity levels and air mass flow rates allows us to follow the operating status of each CRAC/AHU, and gives us the ability to capture actual changes in operating conditions between AHUs and their respective racks. It’s a powerful capability, and it’s good news that ASHRAE has picked up on this important activity.
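For readers who want to see how ΔT and air mass flow relate to cooling duty, the sketch below uses the standard sensible heat relationship, duty = mass flow × specific heat × ΔT. This is the textbook formula rather than a description of EkkoSense's own method. The function name, the example figures and the fixed specific heat of roughly 1.005 kJ/(kg·K) for dry air are all assumptions, and latent (humidity-related) load is ignored.

```python
# Approximate specific heat of dry air near room temperature, in kJ/(kg·K).
CP_AIR_KJ_PER_KG_K = 1.005


def cooling_duty_kw(mass_flow_kg_s, return_temp_c, supply_temp_c,
                    cp=CP_AIR_KJ_PER_KG_K):
    """Estimate the sensible cooling duty of one CRAC/AHU in kW.

    Sensible heat only: moisture removal (latent load) is not included.
    """
    delta_t = return_temp_c - supply_temp_c  # ΔT across the unit, in K
    return mass_flow_kg_s * cp * delta_t


# Hypothetical unit: 10 kg/s of air, returning at 32°C and supplied at 22°C.
duty = cooling_duty_kw(mass_flow_kg_s=10.0, return_temp_c=32.0, supply_temp_c=22.0)
print(f"Estimated sensible cooling duty: {duty:.1f} kW")
```

Logged automatically over time, a figure like this makes it easier to spot a unit whose duty has drifted even though its setpoint has not changed.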
As I said earlier, the ASHRAE ‘Thermal Guidelines for Data Processing Environments’ is a great resource, and a valuable source of best practice thinking. I’m sure others will have further comments and suggestions for how the new edition could be further improved – I’m looking forward to seeing what makes it to the 6th edition! I’d love to hear your thoughts!