NVIDIA's newly launched Vera Rubin POD is a state-of-the-art AI supercomputer designed to meet the growing demands of agentic AI systems. The architecture comprises five specialized rack-scale systems built on NVIDIA's MGX framework and is rated to process over 10 quadrillion tokens annually.
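To put the annual figure in perspective, a rough back-of-envelope conversion (assuming "over 10 quadrillion tokens annually" means a sustained rate of at least 10^16 tokens per year) gives the implied per-second throughput:

```python
# Rough sanity check: convert the quoted annual token figure to a
# sustained per-second rate. The 10-quadrillion figure is taken from
# the article; treating it as a continuous rate is an assumption.
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000
tokens_per_year = 10e15              # 10 quadrillion tokens

tokens_per_second = tokens_per_year / SECONDS_PER_YEAR
print(f"{tokens_per_second:.2e} tokens/s")  # on the order of 3e8 tokens/s
```

That works out to roughly 300 million tokens per second of sustained throughput across the system.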
Comprising 40 racks, the system integrates 1.2 quadrillion transistors across nearly 20,000 NVIDIA dies, including 1,152 NVIDIA Rubin GPUs. It is engineered to deliver 60 exaflops of compute and an aggregate bandwidth of 10 petabytes per second. The centerpiece of the architecture, the NVIDIA Vera Rubin NVL72 rack, combines 72 GPUs and 36 CPUs interconnected through an NVLink spine, allowing the rack to operate as a single, cohesive GPU.
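The aggregate figures above imply some per-unit numbers worth noting; the following sketch derives them directly from the quoted totals (all inputs are the article's marketing aggregates, so the results are approximations):

```python
# Per-unit figures implied by the stated aggregates: 60 exaflops,
# 1,152 GPUs, 1.2 quadrillion transistors, ~20,000 dies.
total_flops = 60e18        # 60 exaflops
gpus = 1152
transistors = 1.2e15       # 1.2 quadrillion
dies = 20_000              # "nearly 20,000" per the article

flops_per_gpu = total_flops / gpus        # ~52 petaflops per GPU
transistors_per_die = transistors / dies  # ~60 billion per die

print(f"{flops_per_gpu:.1e} FLOPS/GPU, {transistors_per_die:.1e} transistors/die")
```

In other words, the quoted totals correspond to roughly 52 petaflops per GPU and an average of about 60 billion transistors per die.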
Designed to address four key AI scaling laws, the NVL72 rack delivers up to four times the training performance and ten times the inference performance per watt of the previous generation. The system also includes dedicated inference accelerator racks equipped with NVIDIA Groq 3 LPX, each containing 256 language processing units, significantly increasing processing capacity and revenue opportunities for large-scale AI models.