This is the first of a series of URE articles about thermal management in data center environments—not theory, not “best practices,” but what actually happens when heat meets physics and scale.
Here’s a simple puzzle from two idle machines.
ai01 — home lab, Threadripper 32-core with 2× NVIDIA GPUs (NVLink), rack-level liquid cooling loop, used for ML training and vLLM inference:
Tctl: +33.0°C
Tccd1: +33.2°C
Tccd5: +31.5°C
nj01 — third-party datacenter (colo), Ryzen 12-core, air-cooled:
Tctl: +42.5°C
The colo server is nearly 10°C hotter at idle than a liquid-cooled GPU training rig sitting in a home lab.
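For reference, both sets of numbers look like standard lm-sensors output, and they're easy to collect yourself. Here's a minimal sketch for pulling the same readings on a Linux host, assuming lm-sensors 3.5+ (so `sensors -j` emits JSON) and an AMD platform that exposes Tctl/Tccd labels; chip and label names vary by board, so treat the parsing as illustrative:

```python
import json
import subprocess

def read_cpu_temps():
    """Return all Tctl/Tccd* readings reported by lm-sensors, in °C."""
    # `sensors -j` prints one JSON object keyed by chip name
    # (e.g. "k10temp-pci-00c3" on AMD platforms).
    raw = subprocess.run(["sensors", "-j"], capture_output=True, text=True, check=True)
    data = json.loads(raw.stdout)

    readings = {}
    for chip, features in data.items():
        if not isinstance(features, dict):
            continue
        for label, values in features.items():
            # Skip non-feature keys like "Adapter" (a plain string).
            if not isinstance(values, dict):
                continue
            if label.startswith(("Tctl", "Tccd")):
                # Each feature holds keys like "temp1_input"; take the *_input one.
                for key, value in values.items():
                    if key.endswith("_input"):
                        readings[f"{chip}/{label}"] = value
    return readings

if __name__ == "__main__":
    for name, celsius in sorted(read_cpu_temps().items()):
        print(f"{name}: +{celsius:.1f}°C")
```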
That sounds wrong until you remember what cooling really is.
Cooling Doesn’t “Create Cold”
HVAC doesn’t manufacture cold air. It removes heat from a loop.
In a hot aisle / cold aisle setup, the same air keeps circulating:
- Cold aisle gets supply air
- Servers ingest it and dump heat into it
- Hot aisle collects the exhaust
- CRAC/CRAH units (computer room air conditioners / air handlers) remove the heat and send it back as “cold” air
- Repeat forever
So the question isn’t “how cold is the air?” It’s: how many kilowatts of heat can you remove, continuously, without losing control of inlet temperature?
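You can put a rough number on that question with nothing fancier than the sensible-heat relationship: heat removed ≈ mass flow × specific heat × ΔT. A back-of-envelope sketch, where the CFM and ΔT figures are illustrative rather than measurements from either machine:

```python
# Rough sensible-heat capacity of an air stream: Q = m_dot * cp * dT.
AIR_DENSITY = 1.2         # kg/m^3, near sea level at ~20°C
AIR_CP = 1005.0           # J/(kg*K)
CFM_TO_M3S = 0.000471947  # 1 cubic foot per minute in m^3/s

def air_cooling_kw(cfm: float, delta_t_c: float) -> float:
    """Heat (kW) an air stream can carry away at a given flow and temperature rise."""
    mass_flow = cfm * CFM_TO_M3S * AIR_DENSITY      # kg/s
    return mass_flow * AIR_CP * delta_t_c / 1000.0  # kW

# Illustrative numbers: ~2,000 CFM through a rack with a 10°C rise across
# the servers carries away roughly 11 kW. A 40+ kW rack at the same ΔT
# needs roughly four times the airflow, or a much larger ΔT.
print(f"{air_cooling_kw(2000, 10):.1f} kW")   # ≈ 11.4 kW
print(f"{air_cooling_kw(8000, 10):.1f} kW")   # ≈ 45.5 kW
```

The relationship is linear: for a fixed ΔT, more heat means proportionally more air. Keep that in mind for the scale discussion below.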
Why Two Idle Machines Can Look So Different
ai01 is liquid-cooled at the component level. Heat leaves the CPU (and GPUs) through cold plates and coolant, then gets rejected at radiators. The critical path (silicon → coolant) is extremely effective.
nj01 is air-cooled, and it lives in an environment optimized for efficiency and density, not “minimum CPU temperature at idle.” Datacenters routinely run warmer supply air on purpose because it improves overall efficiency—as long as inlet temps stay within equipment limits.
So a higher idle Tctl in a colo isn’t automatically “bad.” It’s usually just the natural result of:
- warmer inlet air than a home lab loop delivers to the CPU socket area
- conservative fan curves / acoustic policies
- platform/BIOS sensor offsets and boosting behavior
- the fact that datacenters manage systems, not one CPU temp number
But here’s the important part:
Even if nj01 isn’t “over spec,” it’s clearly starting closer to the edge than ai01—just from the idle baseline.
The Part People Miss: Scale Changes Everything
It’s easy to “fix air” for:
- one server
- maybe one rack
You can add blanking panels, improve containment, tweak fan curves, tune an aisle, close bypass paths, and call it a day.
Now jump to:
- 10 racks
- 50 racks
- 100 racks
- 500 racks
At that point, heat generation is absurd. Not “warm.” Not “a little hot.” Industrial heat.
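To make “industrial heat” concrete, here's the arithmetic, using assumed per-rack power figures (the 10 kW and 50 kW numbers are illustrative, not from any specific facility, and essentially all electrical power drawn by IT gear leaves the rack as heat):

```python
# Back-of-envelope aggregate heat load: kW of IT load ≈ kW of heat to remove.
RACK_KW = {
    "legacy air-cooled rack": 10,  # assumed typical figure
    "dense GPU rack": 50,          # assumed; modern GPU racks can go higher
}

for label, kw_per_rack in RACK_KW.items():
    for racks in (10, 50, 100, 500):
        total_kw = racks * kw_per_rack
        print(f"{racks:>3} x {label}: {total_kw:>6,} kW (~{total_kw/1000:.1f} MW), continuously")
```

Even at the conservative end, a few hundred racks is a heat rejection problem measured in megawatts.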
This is the real story behind modern GPU infrastructure: we are taking power densities that used to be rare and making them normal.
And the world is adjusting in real time.
“Colo Will Handle It” Is the Assumption — Reality Is Harder
Most companies take it for granted that colo/data center providers handle cooling “like a charm.” And many do a great job—within the envelope they were designed for.
But the envelope is changing.
AI-era requirements are pushing:
- higher per-rack power
- more synchronized load spikes (training clusters don’t ramp smoothly)
- tighter performance expectations (throttling is a silent tax)
- more racks per deployment (clusters become buildings inside buildings)
So when you see something like nj01 idling above 40°C while a liquid-cooled system idles at 33°C, the takeaway isn’t “colo is bad.”
The takeaway is this:
Air is a shared, building-level resource. Liquid is a local, engineered heat removal path.
And the gap between those two worlds grows fast as power density rises.
Why This Matters (Even at Idle)
Idle temperature is not the full story. It’s just the baseline.
But baseline matters because it defines how much margin you have when reality hits:
- load ramps
- neighbors heat the aisle
- a containment leak appears
- airflow shifts
- a CRAH is offline
- a control loop lags
- a “normal” day becomes a “hot” day
If you start at 42.5°C idle, you simply have less room before you hit the points where clocks start dropping and performance becomes inconsistent.
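One toy way to see what the baseline costs you, assuming an illustrative 95°C throttle point (the real limit depends on the specific CPU and the platform's fan and boost policy):

```python
# Toy headroom comparison against an assumed throttle point.
# 95°C is a placeholder for illustration, not a spec value.
THROTTLE_C = 95.0

idle_tctl = {"ai01 (liquid, home lab)": 33.0, "nj01 (air, colo)": 42.5}

for host, idle_c in idle_tctl.items():
    headroom = THROTTLE_C - idle_c
    print(f"{host}: {headroom:.1f}°C of headroom before the assumed throttle point")
```

Roughly 10°C of margin is gone before any load arrives, and everything in the list above only subtracts from there.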
And in GPU fleets, inconsistency is the killer:
- training jobs get stragglers
- steps get jitter
- throughput drops without obvious “utilization” alarms
- tail latency gets ugly
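The throughput item is the sneaky one: GPU “utilization” can read high while SM clocks quietly sag, so nothing trips an alarm. A minimal way to catch it, assuming `nvidia-smi` is on the PATH; the query fields used here (utilization.gpu, clocks.sm, temperature.gpu) are standard, but verify them against `nvidia-smi --help-query-gpu` for your driver version:

```python
import subprocess
import time

# --query-gpu fields: GPU index, utilization %, current SM clock (MHz), temperature (°C).
FIELDS = "index,utilization.gpu,clocks.sm,temperature.gpu"

def sample():
    """One snapshot per GPU, parsed from nvidia-smi CSV output."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [
        tuple(int(v.strip()) for v in line.split(","))
        for line in out.strip().splitlines()
    ]

if __name__ == "__main__":
    peak_sm = {}  # highest SM clock seen per GPU, a rough "healthy" reference
    while True:
        for idx, util, sm, temp in sample():
            peak_sm[idx] = max(peak_sm.get(idx, 0), sm)
            # The telltale pattern: utilization looks fine, clocks quietly sag.
            if util > 90 and sm < 0.9 * peak_sm[idx]:
                print(f"GPU{idx}: util {util}%, SM {sm} MHz vs peak {peak_sm[idx]} MHz, "
                      f"{temp}C -> possible thermal/power throttling")
        time.sleep(10)
```

In practice you'd ship these samples to your metrics stack rather than print them, but the signal is the same: watch clocks against utilization, not utilization alone.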
Bottom Line
This article is intentionally simple: two idle machines, two environments, two very different thermals.
ai01 shows what happens when you pull heat away locally with liquid cooling and you own the thermal path.
nj01 shows the reality of air cooling in a shared, efficiency-optimized environment—where you don’t control all the variables, and where the global industry is actively wrestling with new density requirements.
Next in this series, we’ll go one step deeper: how airflow and ΔT set the real ceiling, why “more CFM” isn’t a magic answer at fleet scale, and how thermal headroom turns into performance headroom (or failure modes) when the cluster is big enough to behave like a single machine.