NVIDIA AI Stack Overview

NVIDIA’s AI strategy is a vertically integrated stack from silicon to factory operations. Understanding it requires reading across four interdependent layers: infrastructure, platform, models, and applications.

The four layers

Applications
  └── FOX (factory ops) / AI-Q (enterprise research) / industry verticals
          ↓
Platform
  └── NeMo Agent Toolkit + NIM + OpenShell + AI-Q Blueprint
          ↓
Models
  └── Nemotron (enterprise) / Cosmos (physical AI) / Earth-2 / BioNeMo
          ↓
Infrastructure
  └── GB300 NVL72 / Vera Rubin / BlueField-4 STX / Spectrum-X

Infrastructure: AI factories

NVIDIA frames its hardware as AI factories — integrated systems of compute (GPU), interconnect (NVLink), networking (Spectrum-X), storage (BlueField-4/STX), DPUs, cooling, and operations software. The constraint is not GPU count alone; it is the whole system’s ability to feed data to the GPU without stalling.

Key hardware:

  • GB300 NVL72 — Blackwell Ultra rack-scale system for large reasoning and MoE inference
  • Vera Rubin — next-generation architecture in roadmap
  • BlueField-4 STX — moves KV-cache and context storage closer to compute
  • Spectrum-X — AI-optimized Ethernet fabric

See ContextBudget: the memory pressure that constrains software agents also constrains hardware. Inference is often bottlenecked by memory bandwidth rather than raw compute.
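To make the bandwidth point concrete, here is a back-of-envelope sketch of a roofline-style bound for single-stream decoding. Every number is a hypothetical placeholder chosen for round arithmetic, not a specification of any NVIDIA part, and the function name is illustrative.

```python
def decode_bound(weight_bytes: float, kv_cache_bytes: float,
                 mem_bandwidth_bytes_per_s: float,
                 flops_per_token: float, peak_flops_per_s: float) -> dict:
    """Roofline-style upper bound for batch-size-1 decoding.

    Each generated token streams the weights and the KV cache from memory
    once and performs flops_per_token of arithmetic. The slower of the two
    sets the bound.
    """
    memory_time = (weight_bytes + kv_cache_bytes) / mem_bandwidth_bytes_per_s
    compute_time = flops_per_token / peak_flops_per_s
    return {
        "memory_time_s": memory_time,
        "compute_time_s": compute_time,
        "bound": "memory" if memory_time >= compute_time else "compute",
        "tokens_per_s": 1.0 / max(memory_time, compute_time),
    }


# Hypothetical accelerator and model (assumed values, not vendor specs).
estimate = decode_bound(
    weight_bytes=70e9,               # 70B parameters at 1 byte each
    kv_cache_bytes=10e9,             # assumed KV cache for a long context
    mem_bandwidth_bytes_per_s=4e12,  # assumed 4 TB/s
    flops_per_token=2 * 70e9,        # roughly 2 FLOPs per parameter per token
    peak_flops_per_s=1e15,           # assumed 1 PFLOP/s
)
print(estimate["bound"], round(estimate["tokens_per_s"], 1))
```

Under these assumptions the memory term (0.02 s per token) is over a hundred times larger than the compute term, so adding FLOPs would not raise throughput. Larger batches change the balance because weight reads are amortized across requests.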

Platform: separation of concerns

NVIDIA’s agent platform separates four concerns:

Layer             Component           Role
Inference         NIM                 Optimized model serving as repeatable microservice
Workflow          NeMo Agent Toolkit  Orchestration, MCP, evaluation, observability
Data retrieval    AI-Q Blueprint      Grounded enterprise research over private data
Runtime security  OpenShell           Filesystem, network, credential, inference policy

This mirrors the Claude SDK’s separation of model / hooks / permissions / sessions. The pattern is convergent: production agents require infrastructure-level controls, not prompt-level controls.

See NeMoAgentToolkit and AgenticGovernance.
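The difference between an infrastructure-level and a prompt-level control can be sketched as follows: the runtime checks every tool call the model proposes against a policy object, so the control holds whatever the prompt says. This is a minimal illustration only. RuntimePolicy, run_tool_call, the tool names, and the host are hypothetical and do not reflect the OpenShell API or policy schema.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlparse


class PolicyViolation(Exception):
    """Raised when a proposed tool call falls outside the runtime policy."""


@dataclass(frozen=True)
class RuntimePolicy:
    # Hypothetical policy shape for illustration; not the OpenShell schema.
    allowed_hosts: frozenset
    writable_root: str

    def allows_request(self, url: str) -> bool:
        return urlparse(url).hostname in self.allowed_hosts

    def allows_write(self, path: str) -> bool:
        parts = PurePosixPath(path).parts
        root = PurePosixPath(self.writable_root).parts
        # Reject ".." segments so a path cannot climb out of the root.
        return ".." not in parts and parts[:len(root)] == root


def run_tool_call(policy: RuntimePolicy, call: dict) -> str:
    """Check a model-proposed tool call against policy before running it."""
    tool = call.get("tool")
    if tool == "http_get":
        allowed = policy.allows_request(call["url"])
    elif tool == "write_file":
        allowed = policy.allows_write(call["path"])
    else:
        allowed = False  # deny by default: unknown tools never run
    if not allowed:
        raise PolicyViolation(f"blocked {tool!r}")
    return f"executed {tool}"  # a real runtime would dispatch the tool here


policy = RuntimePolicy(frozenset({"api.internal.example"}), "/workspace")
print(run_tool_call(policy, {"tool": "write_file", "path": "/workspace/out.txt"}))
```

The model never sees or edits the policy, which is the point: a prompt injection can change what the model asks for, but not what the runtime permits.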

Models: vertical operating loops

Each model family targets a distinct vertical, with its own operating loop:

  • Nemotron — enterprise agents (reasoning, coding, multimodal, speech)
  • Cosmos — physical AI and world simulation (synthetic data, photoreal transfer, video reasoning)
  • Earth-2 — weather and climate forecasting
  • BioNeMo — biology and drug discovery

The key insight: the model is not the product. Durable value comes from the data pipeline, simulation, post-training, deployment, and integration into business workflows around the model.

Applications: physical AI and factory operations

Physical AI (see SimToReal) runs on:

  • NvidiaOmniverse — simulation and digital twin
  • NvidiaIsaac — robot learning and deployment
  • Metropolis/VSS — video intelligence
  • Holoscan — real-time sensor pipelines

Factory operations run on NvidiaFOX — a manager-agent-plus-specialist-agents architecture that connects machines, quality, SOPs, transport, energy, and video through governed APIs.
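A manager-plus-specialists arrangement of the kind described above can be sketched as a simple router. This is an illustrative assumption about the general pattern, not the FOX implementation: ManagerAgent, the domain names, and the stub specialists are all hypothetical.

```python
from typing import Callable, Dict

Specialist = Callable[[str], str]


class ManagerAgent:
    """Routes each request to the specialist registered for its domain."""

    def __init__(self) -> None:
        self._specialists: Dict[str, Specialist] = {}

    def register(self, domain: str, specialist: Specialist) -> None:
        self._specialists[domain] = specialist

    def handle(self, domain: str, request: str) -> str:
        specialist = self._specialists.get(domain)
        if specialist is None:
            # No specialist owns this domain, so hand off to a person.
            return f"escalate to human: no specialist for {domain}"
        return specialist(request)


# Stub specialists; real ones would call governed plant APIs.
def quality_agent(request: str) -> str:
    return f"quality: checked '{request}'"


def energy_agent(request: str) -> str:
    return f"energy: checked '{request}'"


manager = ManagerAgent()
manager.register("quality", quality_agent)
manager.register("energy", energy_agent)

print(manager.handle("quality", "defect rate on line 3"))
print(manager.handle("transport", "reroute pallet 12"))
```

Keeping the registry explicit means the manager can only reach domains that someone deliberately wired in, and unowned requests fall through to a human instead of to a best guess.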

Cross-stack convergence

The same governance principles appear at every layer:

  • Runtime controls must be infrastructural (not prompt-level)
  • Agent tool access must be minimal and auditable
  • Human approval remains required for consequential decisions
  • Data readiness precedes model capability
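The second and third principles (minimal, auditable tool access and human approval for consequential decisions) can be combined in one small gate. The sketch below is an assumption about how such a gate might look. GovernedToolbox, its fields, and the tool names are hypothetical and not drawn from any NVIDIA component.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GovernedToolbox:
    granted: frozenset        # the minimal set of tools this agent may use
    consequential: frozenset  # tools that need human sign-off first
    approver: Callable[[str, dict], bool]  # stands in for a human reviewer
    audit_log: List[dict] = field(default_factory=list)

    def invoke(self, tool: str, args: dict) -> str:
        if tool not in self.granted:
            decision = "denied"
        elif tool in self.consequential and not self.approver(tool, args):
            decision = "rejected_by_human"
        else:
            decision = "executed"  # a real system would run the tool here
        # Every attempt is recorded, including denied ones.
        self.audit_log.append({"tool": tool, "args": args, "decision": decision})
        return decision


toolbox = GovernedToolbox(
    granted=frozenset({"read_sensor", "stop_line"}),
    consequential=frozenset({"stop_line"}),
    approver=lambda tool, args: False,  # reviewer declines in this demo
)
print(toolbox.invoke("read_sensor", {"id": 7}))
print(toolbox.invoke("stop_line", {"line": 3}))
print(toolbox.invoke("delete_recipe", {}))
```

Logging denied and rejected attempts alongside executed ones is a deliberate choice here: the audit trail then shows what the agent tried to do, not only what it was allowed to do.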