NVIDIA Nemotron

Nemotron is NVIDIA’s open model family for enterprise agents, covering reasoning, coding, multimodal understanding, speech, safety, and retrieval. The flagship agent model is Nemotron 3 Ultra, positioned for long-running orchestration workflows.

Nemotron 3 Ultra

A 550B-parameter Mixture-of-Experts (MoE) model with 55B active parameters per token (10% of the total), released June 2026.

Architecture highlights:

  • MoE — large total capacity, smaller active inference footprint per token
  • Hybrid Mamba-Transformer — Mamba layers for long-sequence efficiency; Transformer layers for precise factual recall
  • NVFP4 — 4-bit floating-point quantization running across Hopper, Blackwell, and Ampere GPUs
  • LatentMoE + multi-token prediction — routing efficiency and faster generation in multi-turn workflows
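
The gap between total and active parameters in the first bullet can be made concrete with some arithmetic and a toy router. This is a minimal sketch under stated assumptions: the expert count, the top-2 choice, and the router scores are invented for illustration, `weight_bytes` ignores quantization scale factors, activations, and KV cache, and none of it reflects Nemotron's actual routing code.

```python
import math

def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Raw weight storage in bytes, ignoring scale factors and metadata."""
    return params * bits_per_weight / 8

def top_k_route(scores: list[float], k: int) -> dict[int, float]:
    """Generic top-k gating: keep the k highest-scoring experts and
    softmax-normalize their scores. Returns {expert_index: gate_weight}."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in chosen)          # subtract max for stability
    exps = {i: math.exp(scores[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

TOTAL_PARAMS = 550e9    # from the text
ACTIVE_PARAMS = 55e9    # from the text

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS            # share of weights used per token
fp4_weights_gb = weight_bytes(TOTAL_PARAMS, 4) / 1e9      # 4-bit storage, no scale overhead
bf16_weights_gb = weight_bytes(TOTAL_PARAMS, 16) / 1e9    # 16-bit storage for comparison

# Hypothetical router scores for one token over 8 experts, keeping the top 2.
gates = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.9], k=2)

print(f"active fraction: {active_fraction:.0%}")
print(f"weights: {fp4_weights_gb:.0f} GB at 4 bits, {bf16_weights_gb:.0f} GB at 16 bits")
print(f"experts used for this token: {sorted(gates)}")
```

All 550B weights still have to be resident in memory; the MoE saving is in compute per token, while the 4-bit format is what shrinks the storage figure.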

Training:

  • 10M new SFT samples, 1M new RL tasks, 15 net-new RL environments
  • 212B domain pretraining tokens (synthetic legal, Wiki-based, GitHub through 2025-09-30)
  • Multi-Teacher On-Policy Distillation (MOPD) — student generates attempts; specialist teachers provide dense feedback
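
The MOPD bullet describes a loop in which the student samples its own attempts and teachers score them densely. The source does not give the loss, so the sketch below substitutes a generic on-policy distillation signal: a per-token log-probability ratio between teacher and student, whose expectation is the negative reverse KL. The tiny vocabulary, the context-free distributions, and the idea of one teacher per domain are all assumptions made for illustration.

```python
import math
import random

# Hypothetical next-token distributions over a toy vocabulary (no context).
STUDENT = {"def": 0.25, "whereas": 0.25, "return": 0.25, "herein": 0.25}
TEACHERS = {
    "code":  {"def": 0.45, "whereas": 0.05, "return": 0.45, "herein": 0.05},
    "legal": {"def": 0.05, "whereas": 0.45, "return": 0.05, "herein": 0.45},
}

def student_attempt(student, length, rng):
    """On-policy step: the student samples tokens from its own distribution."""
    tokens = list(student)
    weights = [student[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=length)

def dense_feedback(attempt, student, teacher):
    """One signal per token: log p_teacher(tok) - log p_student(tok).
    Positive means the teacher rates the token higher than the student does."""
    return [math.log(teacher[t]) - math.log(student[t]) for t in attempt]

def reverse_kl(student, teacher):
    """KL(student || teacher), the expected value of -feedback per token."""
    return sum(p * (math.log(p) - math.log(teacher[t])) for t, p in student.items())

rng = random.Random(0)
attempt = student_attempt(STUDENT, length=6, rng=rng)
for domain, teacher in TEACHERS.items():
    scores = dense_feedback(attempt, STUDENT, teacher)
    print(domain, attempt, [round(s, 2) for s in scores],
          round(reverse_kl(STUDENT, teacher), 3))
```

Because every sampled token receives its own score, the feedback is dense, in contrast to a single pass/fail reward at the end of an attempt.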

Reported performance (NVIDIA claims; validate independently):

  • Up to 5× higher throughput vs comparable open models
  • Up to 30% lower cost for agentic tasks (SWE-bench and Terminal-Bench-style experiments)

Evaluation lens for agent models

NVIDIA argues that agent models should be evaluated on throughput, cost-to-task-completion, long-context behavior, domain adaptation path, and deployment control, rather than on single-turn benchmark score alone. This “cost-to-completion” lens matters most for multi-turn workflows, where token spend and latency accumulate across many model calls and retries. See ContextBudget.
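
One way to operationalize cost-to-task-completion is to divide total spend across all attempts, failures included, by the number of tasks that actually succeeded. The sketch below assumes a hypothetical `Attempt` record and made-up token prices; it is not an NVIDIA-defined metric or API.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    input_tokens: int     # prompt and context tokens summed over all turns
    output_tokens: int    # generated tokens summed over all turns
    wall_seconds: float   # end-to-end time for the attempt
    succeeded: bool       # did the task pass its own acceptance check?

def cost_to_completion(attempts, usd_per_m_input, usd_per_m_output):
    """Total spend over all attempts divided by the number of successes."""
    spend = sum(
        a.input_tokens / 1e6 * usd_per_m_input
        + a.output_tokens / 1e6 * usd_per_m_output
        for a in attempts
    )
    wins = sum(a.succeeded for a in attempts)
    return spend / wins if wins else float("inf")

def output_throughput(attempts):
    """Generated tokens per second across all attempts."""
    seconds = sum(a.wall_seconds for a in attempts)
    return sum(a.output_tokens for a in attempts) / seconds if seconds else 0.0

# Hypothetical log: three attempts at the same class of task, one failed.
log = [
    Attempt(400_000, 50_000, 100.0, True),
    Attempt(600_000, 50_000, 150.0, False),
    Attempt(1_000_000, 100_000, 250.0, True),
]
cost = cost_to_completion(log, usd_per_m_input=1.0, usd_per_m_output=4.0)
tps = output_throughput(log)
print(f"${cost:.2f} per completed task, {tps:.0f} output tokens/s")
```

Failed attempts still consume tokens, so a model with a lower per-token price can end up with a higher cost per completed task if it needs more retries or longer contexts.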

Broader model portfolio

Family      Domain
Nemotron    Enterprise agents
Cosmos 3    Physical AI and world simulation
Earth-2     Weather and climate intelligence
BioNeMo     Biology and drug discovery

Each family has a different operating loop and validation path. Generic benchmarks do not transfer between families.

Boundary

NVIDIA benchmark claims should be validated against the enterprise’s own workflows. Open weights and training recipes do not remove security, audit, and data-governance requirements. NVFP4 benefits depend on NVIDIA GPU availability and kernel support.

Related

  • NVIDIANeMoAgentToolkit — NIM serves Nemotron as production inference microservices
  • Cosmos3 — Cosmos 3 is the physical AI world model family; distinct operating loop from Nemotron
  • NVIDIAOmniverse — Cosmos feeds into Omniverse as the digital twin and simulation platform
  • NVIDIAAIPlatform — full NVIDIA model and platform synthesis