Data Center Energy Management: Why the AI Factory Requires Industrial Control Solutions

I'll admit I was resistant to the term "AI factory" when it first started making the rounds (after all, what's wrong with "AI Data Center"?), but that changed the first time I walked onto the floor of a modern AI data center under construction. Observing the sheer complexity and density of the fluid distribution systems for the technical water loops settled the debate instantly. When you combine those mechanical systems with racks approaching two tons and bus voltages pushing toward 800 volts, the reality sets in quickly: we are no longer operating in the pristine IT white spaces of the last decade. This is heavy industry now.

If we are indeed building factories for AI now, we must manage them as such. We need to embrace the concept of Manufacturing Execution Systems (MES) for data centers. An MES tracks and controls the transformation of raw materials into finished goods. For an AI data center, power is the raw material, and compute tokens are the output. For data center operators specifying commercial Building Management Systems (BMS) instead of Programmable Logic Controllers (PLCs), designing a truly energy-efficient AI data center is an uphill battle. They're trying to run a heavy industrial plant on systems that were never designed for this environment. However, this isn't only an issue for BMS: facilities running PLCs have faster, more precise control, but if those PLCs aren't publishing into a common data plane, you've still got fragmented intelligence. The polling problem is solved, but the integration problem persists.

Three Domains, Zero Handshakes
The shift from traditional enterprise workloads to GPU cluster deployments introduces a fundamental physics challenge. Air cooling has an inherent thermal flywheel in the ambient air mass, which buys you time between a load spike and a critical temperature event. Direct-to-chip liquid cooling compresses that buffer. Heat moves immediately, which is exactly the point, but it exposes something the industry hasn't fully reckoned with: the thermal stack is controlled by three separate domains that weren't designed to talk to each other.

The liquid cooling stack in a modern AI data center spans three distinct control domains. The IT layer manages GPU thermals and compute load. The CDU manages the immediate liquid loop: supply and return temps, flow rates, local pressure. The facility layer manages bulk heat rejection, whether that's chilled water, dry coolers, or cooling towers. Each domain generally manages itself competently in isolation. The problem is at the seams.
When a training job launches, the load does not ramp gracefully. In moderately sized clusters, synchronized GPU activity can produce sudden, millisecond-speed changes in power draw ranging from several hundred kilowatts to a few megawatts. In a direct-to-chip system, the thermal load tracks the electrical load closely, so the heat arrives on that same timescale.
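To put rough numbers on that, a steady-state heat balance (Q = m_dot * cp * delta_T) gives the extra coolant flow a load step demands. The sketch below is a back-of-the-envelope calculation only: the water-like coolant properties, the 10 K loop temperature rise, and the function name are illustrative assumptions, not figures from any specific facility.

```python
# Rough steady-state heat balance: Q = m_dot * cp * delta_T.
# Assumed values (illustrative only): water-like coolant, 10 K rise across the loop.
CP_WATER_J_PER_KG_K = 4186.0   # specific heat of water, J/(kg*K)
DENSITY_KG_PER_L = 1.0         # approximate density of water, kg/L


def required_flow_lpm(heat_load_w: float, delta_t_k: float) -> float:
    """Coolant flow (litres per minute) needed to carry heat_load_w at delta_t_k."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    mass_flow_kg_s = heat_load_w / (CP_WATER_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_s / DENSITY_KG_PER_L * 60.0


if __name__ == "__main__":
    for step_mw in (0.5, 1.0, 2.0):
        flow = required_flow_lpm(step_mw * 1e6, delta_t_k=10.0)
        print(f"{step_mw:.1f} MW step -> ~{flow:,.0f} L/min of additional flow")
```

Under these assumptions, a megawatt-scale step translates into well over a thousand litres per minute of additional flow that something in the loop has to deliver.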
The scheduler knows the job started. The signal exists, but it lives in the IT domain, speaks a different protocol, and crosses an organizational boundary. The facility layer has no direct line to it; the plant reacts to what its own sensors read, not to what the workload is about to demand. Facility capacity ends up pre-positioned against lagging data from the wrong layer. Because no one trusts the handoff, the facility compensates the only way it can: by running cold all the time. The architecture forces over-provisioning.

Built for Buildings, Not Factories
This is where commercial BMS architectures often hit their limits. Traditional BMS platforms were built for building automation, not for sub-second industrial control loops or integration with a unified data plane at scale. BACnet and its Change of Value subscriptions can move data faster than legacy polling, but the platform architecture above the protocol was designed for a different problem. If your control stack is updating state on intervals measured in seconds, it does not have a real-time digital twin. It has a historical record.
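The gap between a polled view and an event-driven one can be sketched with simple arithmetic. The example below assumes a fixed poll interval, a fixed transport latency, and a handful of made-up event times; all of the numbers and function names are hypothetical.

```python
import math


def polled_delay_s(event_time_s: float, poll_interval_s: float, latency_s: float = 0.0) -> float:
    """Time from an event until a poller sees it; polls occur at 0, T, 2T, ..."""
    next_poll_s = math.ceil(event_time_s / poll_interval_s) * poll_interval_s
    return (next_poll_s - event_time_s) + latency_s


def event_driven_delay_s(latency_s: float = 0.0) -> float:
    """A publish-on-change device is delayed only by transport latency."""
    return latency_s


if __name__ == "__main__":
    poll_s, latency_s = 5.0, 0.05   # assumed values for illustration
    events = [0.2, 1.7, 3.1, 4.9]   # hypothetical load-step times within one poll cycle
    for t in events:
        print(f"event at {t:.1f}s: polled sees it after "
              f"{polled_delay_s(t, poll_s, latency_s):.2f}s, "
              f"event-driven after {event_driven_delay_s(latency_s):.2f}s")
```

With a multi-second poll, the worst case approaches the full interval, which is a long time when the heat arrives in milliseconds.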
The electrical side has its own version of the same problem, and it isn't limited to AI deployments. At hyperscale, commercial BMS platforms hit an architectural ceiling that shows up in any large data center. Point counts, historian throughput, and integration models were designed for building automation, not for managing thousands of intelligent PDUs, busway segments, and EPMS nodes across a 100+ megawatt campus. Most BMS platforms either can't get there, or require architectural compromises that undermine unified visibility. Operators end up with segmented electrical monitoring, cooling systems running blind to actual load conditions, and no single operational picture that unifies the two. That inefficiency is baked into the facility from day one.

The Event-Driven AI Factory
You cannot optimize a physical process that you cannot monitor in real time. Managing these dynamic thermal and electrical loads requires an event-driven, edge-native architecture: a publish-and-subscribe framework using MQTT or similar lightweight messaging protocols.
When a coolant valve actuates or a CDU registers a pressure drop, the edge device does not wait to be interrogated by a centralized server. It publishes that state change into a Unified Namespace (UNS) as soon as it occurs. The UNS is the central data hub where the current state of the entire facility lives at any given moment. It is the same pattern factories use to coordinate MES, SCADA, and plant-floor controls; in other words, it is what running an AI data center as an industrial facility looks like in practice.
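The publish-on-change pattern can be sketched in a few lines. The example below is a toy, in-memory stand-in for a broker-backed UNS so that it runs without any external service; a real deployment would sit on an MQTT broker and client library. The class names, topic paths, and deadband value are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Handler = Callable[[str, Any], None]


def topic_matches(pattern: str, topic: str) -> bool:
    """MQTT-style match: '+' matches one level, '#' matches the remainder."""
    p_parts, t_parts = pattern.split("/"), topic.split("/")
    for i, part in enumerate(p_parts):
        if part == "#":
            return True
        if i >= len(t_parts):
            return False
        if part != "+" and part != t_parts[i]:
            return False
    return len(p_parts) == len(t_parts)


class InMemoryNamespace:
    """Keeps the last value per topic and fans each publish out to subscribers."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        self._subs: List[Tuple[str, Handler]] = []

    def subscribe(self, pattern: str, handler: Handler) -> None:
        self._subs.append((pattern, handler))

    def publish(self, topic: str, value: Any) -> None:
        self.state[topic] = value
        for pattern, handler in self._subs:
            if topic_matches(pattern, topic):
                handler(topic, value)


class EdgePoint:
    """Publishes only when the value moves by more than a deadband."""

    def __init__(self, uns: InMemoryNamespace, topic: str, deadband: float) -> None:
        self.uns, self.topic, self.deadband = uns, topic, deadband
        self._last: Optional[float] = None

    def update(self, value: float) -> bool:
        if self._last is None or abs(value - self._last) > self.deadband:
            self._last = value
            self.uns.publish(self.topic, value)
            return True
        return False


if __name__ == "__main__":
    uns = InMemoryNamespace()
    uns.subscribe("site1/hall2/+/supply_pressure_kpa",
                  lambda t, v: print(f"cooling controller saw {t} = {v}"))
    cdu = EdgePoint(uns, "site1/hall2/cdu07/supply_pressure_kpa", deadband=1.0)
    for reading in (250.0, 250.2, 243.0):   # only the first and last cross the deadband
        cdu.update(reading)
```

The subscriber never asks for data. It registers interest once, and the edge point decides when something is worth reporting.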

In this architecture, the fluid distribution controls, the electrical power management system (EPMS), and the digital twin all subscribe to the same real-time data stream. This eliminates point-to-point integrations. What replaces them is a single, instantaneous source of truth that supports real data center energy management. The control system reacts to a GPU compute spike in real time, matching cooling output precisely to the thermal load without energy-wasting safety margins.
Real-time reaction has limits. A 1000-ton chiller cannot shed load in milliseconds, regardless of how fast the data arrives. The full architecture has to extend past reaction into prediction: the scheduler publishing job intent to the UNS minutes before launch, giving the mechanical plant time to pre-position capacity. The UNS makes that possible because it is the same data plane, just consumed by different subscribers on different timescales.
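The pre-positioning arithmetic is simple enough to sketch. The example below assumes the scheduler publishes a job intent carrying an expected heat load and a time to launch, and that the plant can be characterized by a single linear ramp rate. The field names, the ramp model, and every number are illustrative assumptions; real chiller plants ramp in stages and far less tidily.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class JobIntent:
    start_in_s: float      # seconds until the scheduler launches the job
    added_heat_kw: float   # expected additional heat load from the job


def plan_preposition(intent: JobIntent, spare_capacity_kw: float,
                     ramp_kw_per_s: float) -> Tuple[float, float]:
    """Return (seconds to wait before ramping, kW still uncovered at launch)."""
    if ramp_kw_per_s <= 0:
        raise ValueError("ramp_kw_per_s must be positive")
    needed_kw = max(0.0, intent.added_heat_kw - spare_capacity_kw)
    ramp_time_s = needed_kw / ramp_kw_per_s
    wait_s = max(0.0, intent.start_in_s - ramp_time_s)
    reachable_kw = min(needed_kw, intent.start_in_s * ramp_kw_per_s)
    return wait_s, needed_kw - reachable_kw


if __name__ == "__main__":
    # Ten minutes of notice versus one minute, same job and same plant.
    for notice_s in (600.0, 60.0):
        wait_s, shortfall_kw = plan_preposition(
            JobIntent(start_in_s=notice_s, added_heat_kw=1500.0),
            spare_capacity_kw=300.0, ramp_kw_per_s=4.0)
        print(f"notice {notice_s:.0f}s: start ramp in {wait_s:.0f}s, "
              f"shortfall at launch {shortfall_kw:.0f} kW")
```

The same intent message is useless to a plant that receives it too late, which is why the notice period matters as much as the data path.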

An industrial SCADA platform like Ignition is what ties this together. It connects BMS, EPMS, and the broader data plane into a single environment where power draw, thermal state, and cooling response are visible together, alert logic is buildable without custom integration work, and conditions surface as they develop rather than after they escalate. That is the difference between data center energy management and data center energy monitoring. Monitoring is passive. Management is active.
An energy-efficient data center is built on feedback loops, with feed-forward control where possible: cooling output tracking thermal load in real time, power capacity allocated precisely, inefficiencies surfacing immediately. Over-provisioning and month-end PUE reports are not substitutes for those loops.

Fragmented Intelligence and the Procurement Gate
The industry actively purchases its own operational bottlenecks. Fragmented intelligence is rarely an accident. Every time a facility procures a cooling block, an intelligent PDU, or rack monitoring equipment with its own proprietary dashboard, the design team is building a new silo. The individual decisions make sense in isolation. The aggregate result is a facility where BMS, EPMS, and IT telemetry cannot talk to each other in real time.
AI-focused data center builds are heavy manufacturing plants. They need to be procured that way. No engineer would design a production facility where the power systems cannot inform the assembly line. Yet that disconnected approach is standard practice in data centers today, and it is a significant driver of unnecessary data center energy consumption across the industry.
Interoperability has to be a hard gate in the procurement process. Leadership needs to stop treating open standards as an optional feature at the RFP stage. The culture shift is straightforward: mandate that every physical asset on the floor is a live, communicating node in a central control platform. That means a documented interface that the platform can reach, with real-time capability wherever the control loop requires it. MQTT, OPC UA pub/sub, and Sparkplug B are the protocols built for event-driven architectures. BACnet, Modbus, and polled APIs can be brought into the UNS, but the data only moves as fast as the poll. That tradeoff belongs in the specification, not in the integration team's punch list.
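As a minimal sketch of what such a gate could look like when written down as a rule, the example below checks each asset's documented interfaces against the update rate its control loop needs. The protocol grouping follows the paragraph above; the asset fields, names, and thresholds are hypothetical, and a real specification would carry far more detail.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Grouping follows the text above; everything else here is illustrative.
EVENT_DRIVEN = {"mqtt", "opc ua pubsub", "sparkplug b"}
POLLED = {"bacnet", "modbus", "polled api"}


@dataclass
class AssetSpec:
    name: str
    protocols: List[str]                     # interfaces the vendor documents
    max_update_s: float                      # slowest update the consuming loop tolerates
    poll_interval_s: Optional[float] = None  # achievable poll period, if polled


def interoperability_gate(asset: AssetSpec) -> Tuple[bool, str]:
    """Return (passes, reason) for one asset in the RFP."""
    offered = {p.lower() for p in asset.protocols}
    if offered & EVENT_DRIVEN:
        return True, "event-driven interface available"
    if offered & POLLED:
        if asset.poll_interval_s is not None and asset.poll_interval_s <= asset.max_update_s:
            return True, "polled interface is fast enough for this loop"
        return False, "polled interface is slower than the loop requires"
    return False, "no documented open interface"


if __name__ == "__main__":
    candidates = [
        AssetSpec("cdu-07", ["Sparkplug B"], max_update_s=1.0),
        AssetSpec("pdu-112", ["Modbus"], max_update_s=1.0, poll_interval_s=5.0),
        AssetSpec("tower-2", ["BACnet"], max_update_s=60.0, poll_interval_s=10.0),
        AssetSpec("rack-sensor", ["vendor cloud portal"], max_update_s=5.0),
    ]
    for asset in candidates:
        ok, reason = interoperability_gate(asset)
        print(f"{asset.name}: {'PASS' if ok else 'FAIL'} ({reason})")
```

The point of encoding the rule is that a polled interface is not an automatic failure. It fails only where the loop that consumes it needs data faster than the poll can deliver.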
The contextualization, tag naming, and unit standardization are the platform's job, not the vendor's. What the platform cannot do is reach data that only exists behind a proprietary dashboard or a vendor cloud portal with no other path out.
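A small sketch of that platform-side work is below: raw vendor tags are mapped to a common path and converted to standard units before they land in the namespace. The vendor names, tag names, paths, and mapping table are all hypothetical; only the unit conversions are standard.

```python
from typing import Callable, Dict, Tuple

# Hypothetical vendor tag -> (UNS path suffix, source unit, target unit).
TAG_MAP: Dict[Tuple[str, str], Tuple[str, str, str]] = {
    ("vendor_a", "SupplyTempF"): ("supply_temp", "degF", "degC"),
    ("vendor_a", "LoopPress_psi"): ("loop_pressure", "psi", "kPa"),
    ("vendor_b", "flow_gpm"): ("flow", "gpm", "L/min"),
}

CONVERT: Dict[Tuple[str, str], Callable[[float], float]] = {
    ("degF", "degC"): lambda v: (v - 32.0) * 5.0 / 9.0,
    ("psi", "kPa"): lambda v: v * 6.894757,
    ("gpm", "L/min"): lambda v: v * 3.785412,  # US gallons
}


def normalize(site_path: str, vendor: str, tag: str, value: float) -> Tuple[str, float, str]:
    """Map a raw vendor reading to (UNS topic, converted value, unit)."""
    suffix, src_unit, dst_unit = TAG_MAP[(vendor, tag)]
    converted = CONVERT[(src_unit, dst_unit)](value)
    return f"{site_path}/{suffix}", converted, dst_unit


if __name__ == "__main__":
    topic, value, unit = normalize("site1/hall2/cdu07", "vendor_a", "SupplyTempF", 68.0)
    print(f"{topic} = {value:.1f} {unit}")
```

None of this requires anything from the vendor beyond a documented way to read the raw value, which is exactly what the procurement gate is there to guarantee.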
The counterargument is that most industrial and electrical vendors already support open protocols. That's largely true. BACnet, Modbus, and OPC UA have been table stakes for decades, and a well-architected SCADA platform can contextualize almost anything that arrives through them. There are exceptions. Certain high-end instrumentation is only fully accessible through the vendor's own software stack, and those tradeoffs should be made consciously at the specification stage rather than discovered at commissioning. The real problem is specification discipline. Equipment enters the facility through an RFP. If the RFP doesn't require documented, open access to every telemetry point that the platform will need, the integration team inherits a problem that should have been solved in procurement.
Every asset that fails this gate becomes another seam in the thermal stack. Every seam becomes another reason to run cold all the time.

Industrial Plants, Not Buildings
The industry has been treating data centers simply as buildings with servers in them: managed by building automation, monitored by dashboards, commissioned as real estate. The physics of the AI workloads say otherwise. Liquid cooling loops, megawatt step changes, two-ton racks, 800-volt buses: these are industrial plants that happen to produce tokens instead of steel or chemicals.
Running them that way means event-driven control, a unified data plane, and specification discipline that treats every asset as part of a single operational environment. None of this is exotic. It is how modern factories are being built.
Pretending AI data centers are anything else is actually the more exotic position. The industry can keep procuring these facilities as office buildings. They will still be factories.