
The NVIDIA Vera CPU is a next-generation data center processor announced at NVIDIA’s GTC 2026 conference. It features 88 custom-designed Olympus cores and represents NVIDIA’s boldest move yet into the general-purpose CPU market. Positioned as the first CPU purpose-built for reinforcement learning and agentic AI workloads, Vera delivers up to 1.2 terabytes per second of memory bandwidth, supports up to 1.5 TB of memory, and is the first CPU to support FP8 precision. The chip is built on a monolithic die architecture with full Armv9.2 compatibility, and it serves as the host CPU in NVIDIA’s new Vera Rubin platform – a rack-scale AI supercomputer that pairs 72 Rubin GPUs with 36 Vera CPUs to power the next wave of AI infrastructure.
In this article, we’ll discuss what makes the Vera CPU architecturally distinct from its competitors, why NVIDIA designed a processor specifically for agentic AI workloads, how the Vera CPU Rack and Vera Rubin platform fit into the broader AI factory vision, and what this launch means for the data center CPU market currently dominated by Intel and AMD.
TL;DR Snapshot
NVIDIA’s Vera CPU marks the company’s full-scale entry into the data center CPU market with an 88-core, Arm-based processor optimized for the compute demands of agentic AI and reinforcement learning. It debuts alongside the Vera Rubin platform and a dedicated 256-CPU rack system, signaling a fundamental shift in how AI infrastructure is designed: one where the CPU is no longer an afterthought but a co-equal partner to the GPU.
Key takeaways:
- Vera’s 88 Olympus cores deliver a claimed 1.5x IPC improvement over the previous generation, with 1.2 TB/s of memory bandwidth and what NVIDIA says is the fastest single-threaded performance available on the market today.
- The new Vera CPU Rack packs 256 liquid-cooled processors into a single rack, sustaining over 22,500 concurrent environments for reinforcement learning and agentic sandbox execution – a use case that GPUs alone cannot serve.
- Major cloud providers (AWS, Google Cloud, Azure, Oracle) and AI companies (OpenAI, Anthropic, Meta, Mistral) are already onboard, with Vera-based systems expected to ship in the second half of 2026.
Who should read this: Cloud architects, AI infrastructure engineers, data center strategists, semiconductor industry analysts, and technology investors.
The Olympus Core: Why NVIDIA Built Its Own CPU Architecture
At the heart of the Vera CPU sits the Olympus core – NVIDIA’s first fully custom data center CPU core. Unlike the Grace CPU, which used off-the-shelf Arm Neoverse cores, Vera represents a ground-up design effort. The Olympus core features a 10-wide instruction fetch and decode front end, built for sustained high instructions-per-cycle throughput on memory-intensive workloads with heavy control-flow logic. NVIDIA claims a 1.5x improvement in IPC over its predecessor – a massive generational jump in a market where competing architectures typically deliver single-digit or low-teens percentage gains per generation.
The design philosophy behind Olympus reflects a specific insight: agentic AI workloads need a hybrid of characteristics that no existing CPU delivers well. Individual AI environments require strong single-threaded performance to execute complex code quickly, similar to a workstation. At the same time, modern AI systems launch thousands of these environments concurrently, creating massive throughput demands typical of server infrastructure. NVIDIA designed Vera to bridge this gap, combining the high core count of hyperscale cloud CPUs with the single-thread performance of desktop chips and the power efficiency of mobile processors.
Vera also introduces NVIDIA Spatial Multithreading (SMT), which lets users choose between performance-per-thread and thread count at runtime. Unlike traditional SMT that relies on time-shared resources and introduces performance variability, NVIDIA’s approach aims to give each thread stable performance, stronger isolation, and predictable tail latency under heavy load – qualities that matter enormously when running thousands of concurrent sandbox environments.
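NVIDIA has not published an API for selecting Spatial Multithreading modes, but the contrast with traditional SMT is easy to observe on any Linux machine: logical CPUs that time-share one physical core are listed together in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. As an illustrative sketch (the function name is ours), here is a minimal parser for that file’s cpulist format:

```python
def parse_cpu_list(text: str) -> set[int]:
    """Parse a Linux cpulist string such as '0,44' or '0-3,8-11' into a set of CPU IDs."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

# On a machine with traditional 2-way SMT, core 0's siblings file might read "0,44",
# meaning logical CPUs 0 and 44 contend for the same execution resources.
print(parse_cpu_list("0,44"))   # {0, 44}
```

Under traditional SMT, the two IDs in a sibling set share pipeline resources, which is exactly the source of the performance variability NVIDIA says Spatial Multithreading avoids.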
Why Agentic AI Needs a Different Kind of CPU
The shift from chatbot-style AI to agentic AI represents a fundamental change in computing demands. A chatbot query might consume milliseconds of GPU time. An agentic system – one that reasons autonomously, writes and executes code, calls external tools, and iterates over hours or days – consumes enormous CPU resources for the tasks that happen between GPU inference steps (e.g. SQL queries, code compilation, key-value cache management, orchestration, etc.).
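To make the GPU/CPU division of labor concrete, here is a deliberately simplified sketch of one agent step (all function names are ours, not NVIDIA’s): the model “proposes” code, and the CPU-bound work of actually running it happens in a sandboxed subprocess between inference calls.

```python
import subprocess
import sys

def propose_code(task: str) -> str:
    """Stand-in for a GPU inference call that returns code for the given task."""
    return "print(sum(range(10)))"

def run_in_sandbox(code: str, timeout_s: float = 5.0) -> str:
    """CPU-side work: execute untrusted code in a separate interpreter process.
    A real sandbox would also drop privileges and restrict filesystem/network access."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout_s,
    )
    return result.stdout.strip()

code = propose_code("add the numbers 0..9")
observation = run_in_sandbox(code)
print(observation)  # "45" – fed back to the model as context for the next step
```

Every iteration of this loop spends its time in process creation, compilation or interpretation, and I/O – all CPU work that sits squarely between GPU inference steps.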
NVIDIA’s Ian Buck, VP of Hyperscale and HPC, put it plainly at a press briefing: agents don’t operate on GPUs alone; they need powerful CPUs to work efficiently. GPUs call out to CPUs for sandbox execution, and that execution is a critical part of both training and deploying agents across data centers. This is the core thesis behind Vera: as AI systems become more autonomous and tool-using, the CPU becomes a performance-critical component that deserves purpose-built silicon, rather than a bottleneck to be tolerated.
Vera’s monolithic die design is specifically tailored to these demands. Built on a single compute die with adjacent dielets for memory and I/O, the architecture ensures that every core has uniform, high-bandwidth access to resources. From an application’s perspective, every core sits at the same practical distance from other cores, caches, memory, and networking, eliminating the cross-chiplet latency penalties common in competing multi-die designs. For workloads where runtime paths are inherently unpredictable, this architectural consistency translates directly to more reliable performance.
The Vera CPU Rack and the Vera Rubin Platform
NVIDIA isn’t just selling chips; it’s selling complete systems now. The Vera CPU Rack packs 256 liquid-cooled Vera processors into a single rack, built on the NVIDIA MGX platform. NVIDIA claims this configuration can sustain more than 22,500 concurrent CPU environments, delivering a 6x gain in CPU throughput and twice the performance on agentic AI workloads compared with existing solutions. The rack is purpose-built for reinforcement learning post-training, where AI models learn by generating code, executing it in sandboxed environments, evaluating results, and iterating – a loop that’s intensely CPU-dependent.
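The RL post-training loop described above – generate, execute, score, iterate – parallelizes naturally across CPU cores. Here is a hedged sketch (toy scoring, names of our own invention) of fanning candidate programs out to concurrent sandbox processes:

```python
import subprocess
import sys
from concurrent.futures import ProcessPoolExecutor

def run_sandboxed(code: str) -> str:
    """Execute one candidate program in an isolated interpreter process."""
    out = subprocess.run([sys.executable, "-c", code],
                         capture_output=True, text=True, timeout=5)
    return out.stdout.strip()

def score(output: str, expected: str) -> float:
    """Toy reward signal: 1.0 if the program produced the expected answer, else 0.0."""
    return 1.0 if output == expected else 0.0

if __name__ == "__main__":
    # Candidate programs a model might generate for the task "compute 2**10".
    candidates = ["print(2**10)", "print(2*10)", "print(1 << 10)"]
    with ProcessPoolExecutor() as pool:
        outputs = list(pool.map(run_sandboxed, candidates))
    rewards = [score(o, "1024") for o in outputs]
    print(rewards)  # [1.0, 0.0, 1.0] – rewards steer the next round of generation
```

Scale the three candidates here to tens of thousands running simultaneously and you have the workload profile the Vera CPU Rack is built for: many independent, short-lived, CPU-bound sandbox executions.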

Then there’s the Vera Rubin NVL72, the flagship rack-scale AI supercomputer. This system unifies 72 Rubin GPUs and 36 Vera CPUs, connected via NVLink 6, alongside ConnectX-9 SuperNICs, BlueField-4 DPUs, and Spectrum-6 Ethernet switches. The company claims Vera Rubin NVL72 can train large mixture-of-experts models using one-quarter of the GPUs required on Blackwell-generation hardware, and deliver AI inference at one-tenth the cost per million tokens.
The broader message is clear. NVIDIA is treating the data center, not the individual chip, as the fundamental unit of compute. Vera Rubin is designed as a cohesive system where CPU, GPU, network, and storage work in concert, rather than a collection of discrete components. Third-generation MGX rack designs enable cable-free modular trays and a seamless upgrade path from prior-generation Blackwell systems, with more than 80 manufacturing partners building systems around the platform.
Early Benchmarks and Industry Adoption
Early third-party benchmarks are already surfacing. Redpanda, a high-performance data streaming platform, published results showing Vera delivering up to 5.5x lower latency than AMD EPYC Turin and more than 2.5x lower than Intel Xeon 6 Granite Rapids on Kafka-compatible streaming workloads. The benchmarks also showed up to 73% higher throughput than Turin in cross-core communication tests. Perhaps most notably, Redpanda found that Vera’s latency actually decreased as the system scaled across more cores and machines – the opposite of what the team observed on competing architectures.
On the adoption front, the customer list is formidable. Meta announced it will deploy multiple generations of NVIDIA CPU-only systems across its infrastructure. Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure have all committed to offering Vera Rubin–based systems, and AI companies like OpenAI, Anthropic, and Mistral AI are all backing the platform as well. The chips are now in full production, with commercial systems from OEM partners expected in the second half of 2026.
What This Means for the CPU Market
Vera’s launch marks a significant escalation of competition in the data center CPU market. NVIDIA introduced its first-generation Grace CPUs at GTC 2022, signaling long-term ambitions. With Vera, those ambitions are now fully realized. NVIDIA is no longer just a GPU company that happens to make CPUs; it’s entering direct competition with Intel and AMD for CPU sockets in AI data centers, while also challenging the custom Arm-based processors developed by hyperscalers like AWS (Graviton) and Google (Axion).
The competitive dynamics are especially stark given NVIDIA’s ability to co-design CPU and GPU together. Because Vera connects to Rubin GPUs via NVLink Chip-to-Chip (C2C) interconnects, NVIDIA can optimize the entire data path between CPU and GPU in ways that third-party CPU vendors simply can’t. This system-level integration, which NVIDIA is calling “extreme codesign,” is likely to be the company’s most enduring competitive advantage in the CPU market.
For enterprise buyers and cloud providers, the math is straightforward. If your workloads are increasingly agentic and GPU-intensive, a platform where CPU and GPU are designed as a unified system offers performance and efficiency gains that a mix-and-match approach can’t easily replicate. Intel and AMD will need compelling answers here; otherwise, they may struggle to maintain their current positions as AI-driven workloads continue to reshape how data center purchasing decisions are made.
Frequently Asked Questions
What is GTC?
GTC (GPU Technology Conference) is NVIDIA’s annual developer conference, where the company announces new hardware, software, and platform updates. It’s the primary venue where NVIDIA reveals its roadmap to developers, partners, cloud providers, and investors.
What is agentic AI?
Agentic AI refers to AI systems that can reason autonomously over extended periods, write and execute code, call external tools, and continuously improve without constant human intervention. Unlike chatbots that respond to a single prompt, agentic systems orchestrate multi-step workflows, such as debugging codebases, running simulations, or managing data pipelines. These workflows can run for hours or days at a time.
What is reinforcement learning?
Reinforcement learning (RL) is a training technique where an AI model learns by trial and error. It takes actions in an environment, receives feedback (rewards or penalties), and iterates to improve its behavior. In the context of Vera, RL post-training involves running thousands of sandboxed environments simultaneously where AI models generate code, execute it, evaluate results, and refine their approach. This process is heavily CPU-dependent.
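As a minimal illustration of the trial-and-error loop (the textbook idea, not anything specific to NVIDIA’s stack), here is an epsilon-greedy agent learning which of two actions pays off more:

```python
import random

def train_bandit(payouts, steps=2000, eps=0.1, seed=0):
    """Epsilon-greedy bandit: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    counts = [0] * len(payouts)
    values = [0.0] * len(payouts)   # running average reward per action
    for _ in range(steps):
        if rng.random() < eps:
            action = rng.randrange(len(payouts))                     # explore
        else:
            action = max(range(len(payouts)), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < payouts[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values

values = train_bandit([0.2, 0.8])   # action 1 pays off 80% of the time
print(max(range(2), key=values.__getitem__))  # index of the action the agent came to prefer
```

The sandboxed code-execution environments Vera targets follow the same act-observe-reward pattern, just with far richer actions (generated programs) and far heavier CPU cost per trial.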
What is Arm?
Arm is a processor architecture originally designed for mobile devices and known for its power efficiency. It has increasingly moved into data center and server markets as companies seek alternatives to x86 processors from Intel and AMD. NVIDIA’s Vera CPU is based on the Armv9.2 instruction set, which provides full compatibility with the broad Arm software ecosystem.
How is Vera different from the Grace CPU?
Grace, launched in 2022, was NVIDIA’s first-generation data center CPU and used standard Arm Neoverse cores. Vera is the second generation and uses NVIDIA’s fully custom-designed Olympus cores, delivering a claimed 1.5x IPC improvement. Vera also increases core count from 72 to 88, triples memory capacity to 1.5 TB, doubles memory bandwidth to 1.2 TB/s, and introduces features like Spatial Multithreading and FP8 precision support.
When will the Vera CPU be available?
NVIDIA has stated that Vera chips are now in full production. Commercial systems from OEM partners are expected to become available in the second half of 2026, with major cloud providers (AWS, Google Cloud, Azure, Oracle) planning to offer Vera Rubin–based services soon thereafter.
