Power, not GPUs, is the limiting factor for AI data centers

March 31, 2026

As AI workloads drive extreme rack densities, traditional cooling architectures are breaking down—and thermal management OEMs must evolve.

Today, the real constraint for scaling artificial intelligence is electricity—but in two distinct ways. First, grid availability: securing sufficient power connections to data center sites remains an ongoing infrastructure bottleneck. Second, power density: once power reaches the facility, delivering, distributing, and dissipating unprecedented amounts within increasingly dense racks creates a separate architectural challenge. Global data center capacity is surging toward 165 GW by 2030, with AI driving a compound annual growth rate of 16%. While grid connection constraints limit where and how quickly facilities can scale, the power density challenge is creating a fundamental architectural crisis at the rack level: traditional AC power distribution, cooling strategies, and equipment designed for 10–40 kW racks are physically and economically incapable of supporting the 200+ kW AI factories now entering production. The companies that solve this rack-level challenge first will shape the competitive landscape for years to come.

As artificial intelligence computing demands explode beyond 200 kilowatts per rack, the infrastructure that powers these systems is reaching fundamental physical limits, forcing a complete architectural rethink of how data centers manage both power and heat.

The global data center market is racing toward 165 gigawatts of capacity by 2030, with AI workloads driving nearly half of that growth. But behind this expansion lie critical constraints that few outside the industry fully appreciate. Grid availability continues to be a fundamental bottleneck — securing sufficient utility power to data center sites limits where facilities can be built and how quickly they can scale. This is an ongoing infrastructure challenge that affects the entire industry. Power density within racks represents a separate but equally critical constraint. As NVIDIA's processor roadmap moves from 40-kilowatt Hopper racks toward anticipated 1-megawatt Feynman systems within the next few years, with 2 MW systems already being discussed at the latest GTC, data center operators face a stark reality. Traditional alternating current (AC) architectures and conventional air cooling simply cannot handle these extreme rack-level densities. The result is a fundamental shift in how the industry thinks about infrastructure — one that transforms the economics, design priorities, and competitive landscape for thermal management suppliers.

From Power Distribution to Heat Concentration at the Rack Level

The transition to 800-volt direct current (VDC) power distribution represents more than an incremental efficiency improvement. It fundamentally changes where heat is generated and concentrated within a data center. Traditional AC architectures distribute power through bulky copper cabling, requiring multiple conversion stages from the utility feed to the chip. Each conversion creates heat, scattering thermal loads across power distribution units, uninterruptible power supplies, and rack-mounted power shelves throughout the facility.

This dispersed heat generation made room-level cooling the dominant paradigm. Perimeter computer room air conditioning units and in-row handlers managed ambient temperatures across the data hall, with heat treated as a facility-wide challenge. But as rack densities climb past 100 kilowatts, this model breaks down. The sheer volume of waste heat overwhelms air's capacity to transport it efficiently, even with aggressive airflow management and containment strategies.

The shift to 800 VDC eliminates many of these distributed conversion losses. By reducing the number of power transformation stages, full-DC architectures can achieve up to 95% efficiency—but critically, they also concentrate heat generation at the source: the compute rack itself. Non-IT cooling loads, which historically consumed 10-15% of facility power, drop to an estimated 3-7% in full-DC designs. Room-level heat diminishes significantly, but the thermal intensity within each rack skyrockets.
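
To put those ranges in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the percentage ranges quoted above; the 100 MW IT load and the simplification that facility power is IT load plus cooling are assumptions for illustration, not figures from this article.

```python
# Illustrative only: compares non-IT cooling power for a hypothetical 100 MW
# IT load, using the percentage ranges quoted in the article (10-15% for
# traditional AC distribution, 3-7% for full-DC designs).

IT_LOAD_MW = 100.0  # assumed IT (compute) load, not a figure from the article

overheads = {
    "Traditional AC distribution": (0.10, 0.15),  # cooling share of facility power
    "Full-DC (800 VDC)": (0.03, 0.07),
}

for arch, (low, high) in overheads.items():
    for s in (low, high):
        # Simplification: facility power = IT load + cooling, so if cooling is a
        # share s of facility power, facility = IT / (1 - s).
        facility = IT_LOAD_MW / (1 - s)
        cooling = s * facility
        print(f"{arch}: share={s:.0%} -> facility ≈ {facility:.1f} MW, "
              f"cooling ≈ {cooling:.1f} MW")
```

Under these assumptions, the swing between the high end of the AC range and the low end of the DC range is roughly 15 MW of cooling power for a single large campus, which is why the reduction in conversion stages is an architectural change rather than an incremental efficiency gain.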

The End of Air as the Primary Cooling Medium

"Beyond 200kW, air can't transport heat fast enough. Direct-to-chip liquid cooling becomes mandatory."
Urs Neumair
Partner
Munich Office, Central Europe

This concentration of heat at the rack level exposes the physical limitations of air cooling. Air simply cannot remove heat fast enough when power densities exceed certain thresholds. Direct-to-chip liquid cooling—where cold plates capture heat directly from processors—becomes not just advantageous but mandatory. The transition is already underway, but it's accelerating as NVIDIA's roadmap pushes toward multi-hundred-kilowatt and eventually megawatt-scale racks.
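
The underlying physics is simple sensible-heat transport: Q = ṁ·cp·ΔT, so the required mass flow is ṁ = Q / (cp·ΔT). The sketch below uses illustrative assumptions (a 200 kW rack, a 15 K air-side temperature rise, a 10 K liquid-side rise) to show why the air volume becomes impractical while the equivalent water flow stays modest.

```python
# Sensible heat transport: Q = m_dot * cp * dT  =>  m_dot = Q / (cp * dT).
# All values below are illustrative assumptions, not figures from the article.

Q = 200_000.0  # rack heat load in watts (200 kW)

# Air: cp ≈ 1005 J/(kg·K), density ≈ 1.2 kg/m³, assumed 15 K temperature rise
cp_air, rho_air, dT_air = 1005.0, 1.2, 15.0
m_dot_air = Q / (cp_air * dT_air)   # kg/s of air required
vol_air = m_dot_air / rho_air       # m³/s
cfm = vol_air * 2118.88             # cubic feet per minute

# Water: cp ≈ 4186 J/(kg·K), density ≈ 1000 kg/m³, assumed 10 K temperature rise
cp_w, rho_w, dT_w = 4186.0, 1000.0, 10.0
m_dot_w = Q / (cp_w * dT_w)         # kg/s of water required
lps = m_dot_w / rho_w * 1000        # litres per second

print(f"Air:   {m_dot_air:.1f} kg/s ≈ {vol_air:.1f} m³/s ≈ {cfm:,.0f} CFM per rack")
print(f"Water: {m_dot_w:.2f} kg/s ≈ {lps:.2f} L/s per rack")
```

Under these assumptions a single 200 kW rack needs on the order of 23,000 CFM of air, far beyond what server fans and containment can realistically deliver, while the same heat rides out on roughly five litres of water per second through cold plates. That is the quantitative sense in which air "can't transport heat fast enough."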

For thermal management suppliers, this shift represents both disruption and opportunity. Equipment designed for room-level air management—perimeter CRAC units, fan walls, and even in-row coolers—faces declining relevance in new AI facilities. These systems remain essential for non-AI data centers and retrofit applications, but they are being displaced at the point of growth. Meanwhile, liquid cooling infrastructure—cold plates, coolant distribution units, pumps, manifolds, and valves—sees surging demand.

The equipment mix is changing, but so is the strategic position of different cooling modalities. Rear door heat exchangers offer a bridge solution, intercepting heat at the rack before it reaches room air. They're more effective than pure air cooling but still limited compared to direct-to-chip approaches for ultra-high-density applications. Immersion cooling, where entire servers are submerged in dielectric fluid, addresses extreme thermal loads but faces adoption constraints around cost, compatibility with existing hardware, and operational complexity. The emerging consensus points toward direct-to-chip liquid cooling as the default architecture for AI factories, with other modalities serving niche or transitional roles.

From Components to Integrated Systems

Perhaps the most strategic shift is the move from selling discrete cooling products to delivering integrated rack-level systems. In traditional data centers, operators or engineering, procurement, and construction (EPC) contractors purchased cooling components—chillers, pumps, distribution units—and integrated them on-site. Value derived primarily from unit cost and efficiency specifications. Suppliers competed on product performance within narrow categories.

AI data centers are changing this dynamic. The rack is becoming the atomic deployment unit—a self-contained module combining compute, power delivery, and thermal management. Modern AI racks, such as those based on open rack architectures, house high-voltage DC bus bars, power supplies, cold plates, and coolant manifolds alongside the compute hardware itself. Each rack functions as a modular "factory cell" that can be independently deployed, scaled, serviced, and upgraded.

This integration means power and cooling must be co-engineered with compute hardware from the outset. Suppliers can no longer optimize components in isolation. Instead, they must deliver pre-engineered, validated systems where thermal performance is guaranteed across the entire stack. The value proposition shifts from component specifications to system-level performance, reliability, and deployability.
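
As a simplified illustration of what "validated at the system level" means in practice, the hypothetical sketch below treats a rack as a single deployable unit and checks that its cold-plate capture ratio and coolant loop can actually absorb the heat its compute and power hardware will produce. All names, structures, and numbers here are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class RackModule:
    """Hypothetical self-contained AI rack: compute, power delivery, cooling."""
    compute_load_kw: float        # total processor/accelerator load
    power_conversion_eff: float   # efficiency of in-rack DC conversion stages
    cold_plate_capture: float     # fraction of heat captured by the liquid loop
    cdu_capacity_kw: float        # coolant distribution unit capacity
    air_handling_kw: float        # residual heat the air side can absorb

    def heat_load_kw(self) -> float:
        # Conversion losses inside the rack also end up as heat.
        conversion_loss = self.compute_load_kw * (1 / self.power_conversion_eff - 1)
        return self.compute_load_kw + conversion_loss

    def validate(self) -> bool:
        total = self.heat_load_kw()
        liquid_side = total * self.cold_plate_capture
        air_side = total - liquid_side
        return liquid_side <= self.cdu_capacity_kw and air_side <= self.air_handling_kw

rack = RackModule(compute_load_kw=200, power_conversion_eff=0.95,
                  cold_plate_capture=0.85, cdu_capacity_kw=200, air_handling_kw=40)
print(f"Total heat ≈ {rack.heat_load_kw():.0f} kW, deployable: {rack.validate()}")
```

The point is not the specific numbers but the unit of analysis: the budget is checked for the rack as a whole, before deployment, rather than for each component in isolation.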

For thermal management OEMs, this creates both pressure and opportunity. It raises barriers to entry, favoring suppliers with deep engineering capabilities and ecosystem partnerships—or vertical integration. But it also enables differentiation beyond price competition. Companies that can deliver turnkey rack solutions, certified and tested with specific GPU configurations, capture strategic advantage. Speed matters: being able to bring capacity online faster than competitors allows suppliers to build better performance models, secure scarce resources, and influence emerging industry standards.

The Rise of Operational Services

"Thermal OEMs must shift from equipment sales to operational partnerships with recurring services."
Daniel Tao
Principal
Chicago Office, North America

The complexity of liquid-cooled, high-voltage DC systems is creating demand for a new category of value: continuous operational support. AI workloads run 24/7 near nameplate capacity, unlike traditional enterprise applications with variable demand patterns. System uptime becomes paramount, and the tightly coupled nature of liquid cooling with power delivery means failures can cascade quickly.

Most data center operators lack deep in-house expertise in liquid cooling fault modes, control algorithms, and predictive maintenance for these new architectures. This creates an opening for thermal management suppliers to shift from product vendors to operational partners. Services that were once considered optional or discretionary — remote monitoring, predictive maintenance, performance optimization — are becoming strategically critical.

Several forms of service are gaining prominence. Monitoring and predictive services use real-time telemetry, fault detection algorithms, and digital twins to anticipate failures before they occur. Optimization services continuously tune flow rates, temperatures, and load balancing to maximize efficiency and headroom. Operations and reliability services provide rapid-response expertise, training, and redundancy planning. Lifecycle management handles everything from commissioning to upgrades and decommissioning.
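
To make the monitoring idea concrete, here is a deliberately minimal sketch of the kind of check a predictive service might run continuously on coolant-loop telemetry. The thresholds and field names are hypothetical; real fault-detection systems rely on much richer, vendor-specific telemetry and statistical models.

```python
# Minimal illustration of rule-based coolant-loop monitoring.
# Thresholds and field names are hypothetical, chosen only for this example.

def check_coolant_loop(sample: dict) -> list[str]:
    """Return warnings for one telemetry sample from a rack's liquid loop."""
    warnings = []
    delta_t = sample["return_temp_c"] - sample["supply_temp_c"]

    if sample["supply_temp_c"] > 45:
        warnings.append("supply temperature above facility-water design point")
    if delta_t > 12:
        warnings.append("high delta-T: flow may be restricted or load unbalanced")
    if sample["flow_lpm"] < 0.8 * sample["flow_setpoint_lpm"]:
        warnings.append("flow below 80% of setpoint: possible pump or valve fault")
    if sample["pressure_drop_kpa"] > 1.3 * sample["pressure_drop_baseline_kpa"]:
        warnings.append("rising pressure drop: possible fouling in a cold plate")
    return warnings

sample = {"supply_temp_c": 40.0, "return_temp_c": 54.0, "flow_lpm": 60.0,
          "flow_setpoint_lpm": 80.0, "pressure_drop_kpa": 95.0,
          "pressure_drop_baseline_kpa": 70.0}
for w in check_coolant_loop(sample):
    print("WARNING:", w)
```

A production service would layer statistical baselines, digital-twin models, and fleet-wide comparisons on top of simple rules like these, but the structure is the same: continuous telemetry in, early warnings out, before a fault cascades across the tightly coupled power and cooling path.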

These services generate recurring revenue and operational lock-in, fundamentally changing the economics for suppliers. They also align incentives differently: suppliers are rewarded for system uptime and performance, not just equipment sales. This shifts the competitive landscape toward companies that can build strong customer relationships, invest in software and analytics platforms, and maintain field service capabilities.

Strategic Implications for Industry Leaders

The convergence of 800 VDC power, liquid cooling, integrated systems, and operational services rewrites the strategic playbook for thermal management companies — and opens the door to new entrants into the ecosystem. Portfolio strategy must span multiple cooling modalities, addressing both the transitional state — where hybrid air-liquid systems dominate — and the future state, where direct-to-chip liquid cooling becomes standard. Companies locked into a single technology or product category face structural disadvantage.

Ecosystem positioning also matters more than before. As traditional OEM boundaries blur, success depends on collaboration across power delivery, compute hardware, and facility design. Some suppliers may pursue vertical integration, acquiring capabilities in adjacent domains. Others will rely on partnerships and open standards. Either way, the ability to deliver certified, interoperable solutions becomes a competitive differentiator.

Speed is another dimension of competition. The data center build-out race is creating urgency around deployment timelines. Suppliers that can compress installation, commissioning, and ramp-up schedules gain market share. This favors modular, pre-tested systems that minimize on-site integration risk, and it rewards companies that secure long-lead components early and build resilient supply chains. Component suppliers from other industries also have an opportunity to enter the ecosystem and differentiate by collaborating on new factory-built modular systems.

Finally, the shift toward services requires organizational capabilities that many traditional equipment suppliers lack. Building remote monitoring platforms, training field service teams, and developing predictive analytics all require investment in software, data science, and customer success functions. Companies that successfully make this transition can capture value across the equipment lifecycle, not just at the point of sale.

Looking Ahead: Infrastructure Architecture for AI at Scale

The transition to 800 VDC and rack-level liquid cooling is not a distant future scenario — it's unfolding now, driven by AI computing demands that are doubling or tripling year over year. For data center operators, the challenge is deploying infrastructure fast enough to capture AI workload growth while managing unprecedented capital intensity. For thermal management suppliers, the challenge is different: transforming business models from product sales to integrated systems and services while building capabilities across an expanding technology stack.

The winners will be those who move decisively — investing in liquid cooling capabilities, forging ecosystem partnerships, and developing service platforms before market leadership consolidates. The data center industry has seen technological shifts before, but rarely at this pace or scale. What's emerging is not just a new generation of equipment, but a fundamentally different architecture for how we build, power, and cool the infrastructure that underpins artificial intelligence.
