Source Snapshot
- Origin: NVIDIA Technical Blog
- Type: Product / technical launch note
- Author / org: Chris Alexiuk and Chintan Patel, NVIDIA
- One-line takeaway: Nemotron 3 Ultra is NVIDIA’s open reasoning model for long-running agents, combining stronger orchestration, lower token cost, and an open enterprise deployment path.
Garden Card
This note captures why Nemotron 3 Ultra matters for enterprise agents that run across many turns, tools, and sub-agents.
- Core question: How does NVIDIA position Nemotron 3 Ultra for long-running agent orchestration?
- Operational value: It shifts model choice from single-response quality toward cost-to-completion, context discipline, and secure runtime deployment.
- Best connection: Open Models & Industry Verticals, Core AI Platforms & Agents, Hardware Architecture & Computing Infrastructure
1. Executive Summary
NVIDIA released Nemotron 3 Ultra as an open 550B-parameter Mixture-of-Experts model with 55B active parameters, aimed at complex long-running agent workflows.
The important shift is not only benchmark accuracy. NVIDIA is arguing that agent models should be evaluated by throughput, cost-to-task-completion, long-context behavior, domain adaptation, and deployment control.
For enterprise AI, this makes Nemotron 3 Ultra less like a chatbot model and more like an orchestration layer for coding agents, research agents, validation workflows, and secure autonomous execution.
- Main idea: Long-running agents need a model system optimized for sustained reasoning, tool use, and efficient completion.
- Why now: Agent workflows are becoming longer, more token-heavy, and more expensive as they plan, call tools, delegate, and validate across many turns.
- Where it applies: Coding agents, research automation, engineering review, enterprise workflow orchestration, and secure agent execution.
Decision Signal
Evaluate Nemotron 3 Ultra by cost-to-completion and orchestration quality, not only by single-turn benchmark score.
2. Key Technical Terms
Use these terms when comparing Nemotron 3 Ultra with other frontier open models.
- Mixture-of-Experts (MoE): A model architecture where only selected expert subnetworks activate for a given token or task.
- 55B active parameters (of 550B total): Nemotron 3 Ultra has large total capacity but activates a smaller subset during inference, balancing capability against efficiency.
- Hybrid Mamba-Transformer: Mamba layers improve long-sequence efficiency, while Transformer layers help precise recall from long context.
- NVFP4 (NVIDIA 4-bit floating-point precision): A quantized checkpoint and kernel path designed to improve throughput across NVIDIA GPU generations.
- Multi-Teacher On-Policy Distillation (MOPD): A training method where the student model generates attempts and receives dense feedback from multiple specialized teacher models.
- Cost-to-completion: The total inference cost required to finish a benchmark task or real workflow, not just the cost of one model call.
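To make the cost-to-completion term concrete, here is a minimal sketch that sums the cost of every model call in one task rather than pricing a single call. The `ModelCall` type, the three-call workflow, and the per-million-token prices are all hypothetical values chosen for illustration; they are not NVIDIA pricing or measurements.

```python
from dataclasses import dataclass


@dataclass
class ModelCall:
    input_tokens: int
    output_tokens: int


def cost_to_completion(calls, usd_per_m_input, usd_per_m_output):
    """Sum the cost of every model call needed to finish one task."""
    total = 0.0
    for call in calls:
        total += call.input_tokens / 1_000_000 * usd_per_m_input
        total += call.output_tokens / 1_000_000 * usd_per_m_output
    return total


# Hypothetical workflow: plan, use a tool, validate. Token counts are made up.
calls = [ModelCall(2_000, 500), ModelCall(6_000, 800), ModelCall(9_000, 300)]

# Hypothetical prices in USD per million tokens.
single_call_cost = cost_to_completion(calls[:1], 1.0, 4.0)
task_cost = cost_to_completion(calls, 1.0, 4.0)

print(f"first call only: ${single_call_cost:.4f}")
print(f"whole task:      ${task_cost:.4f}")
```

Comparing models on `task_cost` rather than `single_call_cost` rewards a model that finishes in fewer or shorter calls, even if its per-token price is higher.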
3. Core Notes
3.1 Problem
Long-running agents generate large communication overhead. They plan, call tools, pass observations, invoke sub-agents, and feed reasoning traces back into the model across many turns.
- Token counts can grow quickly as the workflow becomes longer.
- Higher token volume increases cost and can create goal drift.
- A single large model is not always the best architecture; enterprises may need a system of orchestration and execution models.
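The token-growth point can be illustrated with a small sketch. It assumes the simplest possible agent loop, in which every turn re-sends the full history and appends a fixed number of new tokens; it ignores prompt caching, summarization, and context compaction, all of which a real system may use. The function name and the token figures are hypothetical.

```python
def cumulative_input_tokens(turns, system_tokens, tokens_added_per_turn):
    """Total input tokens when each turn re-sends the full history so far."""
    total = 0
    context = system_tokens
    for _ in range(turns):
        total += context                  # the model reads the whole context this turn
        context += tokens_added_per_turn  # tool output and reasoning are appended
    return total


# Hypothetical sizes: 1,000-token system prompt, 2,000 new tokens per turn.
for turns in (10, 20, 40):
    print(turns, cumulative_input_tokens(turns, 1_000, 2_000))
```

Under these assumptions the context grows linearly per turn, so total input tokens grow roughly with the square of the number of turns: doubling the turns about quadruples the tokens read.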
3.2 Mechanism
Nemotron 3 Ultra is built for the harder calls inside agent systems: orchestration, complex planning, architectural decisions, evidence synthesis, and constraint-heavy verification.
- The MoE design gives large capacity while keeping the parameters active at inference well below the total parameter count.
- Hybrid Mamba-Transformer layers support both long-context efficiency and factual recall.
- NVFP4 deployment can run across Hopper, Blackwell, and Ampere GPUs, reducing fragmentation in NVIDIA-based infrastructure.
- LatentMoE and multi-token prediction support more efficient expert routing and faster generation in multi-turn workflows.
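As a rough illustration of how an MoE layer keeps active compute below total capacity, the sketch below implements generic top-k gating in plain Python. This is a textbook routing scheme, not Nemotron 3 Ultra's actual LatentMoE router, whose details the note does not describe. The toy experts, logits, and `k` value are hypothetical.

```python
import math


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(top, exps)]


def moe_layer(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs; other experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.2]  # hypothetical router scores

routes = top_k_route(logits, k=2)
y = moe_layer(3.0, experts, logits, k=2)

# Ratio reported in this note: 55B active out of 550B total parameters.
active_fraction = 55 / 550
print(routes, y, active_fraction)
```

Only two of the eight toy experts execute per input here. The 55B/550B figure works out to about 10% of parameters active per token, though that ratio covers the whole model and not only its expert layers.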
3.3 Evidence
NVIDIA reports Nemotron 3 Ultra as a frontier open model with strong benchmark performance, faster inference, and lower task-completion cost.
- NVIDIA says the model achieves up to 5x higher throughput versus comparable open models in its class.
- NVIDIA reports up to 30% lower cost for agentic tasks in SWE-bench and Terminal-Bench-style experiments.
- The training release includes 10M new SFT samples, 1M new RL tasks, and 15 net-new RL environments.
- Domain pretraining adds 212B tokens across synthetic legal data, synthesized Wiki-based data, and refreshed GitHub data through September 30, 2025.
3.4 Boundary
Nemotron 3 Ultra is promising, but enterprise adoption still needs local validation, governance review, and infrastructure fit.
- NVIDIA’s benchmark claims should be validated against the enterprise’s own agent workflows.
- Open weights and recipes do not remove security, audit, and data-governance requirements.
- OpenClaw, OpenShell, and NemoClaw should be treated as evolving runtime components; production use requires checking the current documentation and completing a security review first.
- NVFP4 benefits depend on NVIDIA GPU availability, kernel support, and deployment stack maturity.
4. Concept Map
Use wikilinks to place this launch inside the broader NVIDIA agent stack.
- Related model strategy: Open Models & Industry Verticals
- Related platform layer: Core AI Platforms & Agents
- Related infrastructure layer: Hardware Architecture & Computing Infrastructure
- Related manufacturing lens: Physical AI & Industrial Manufacturing
flowchart LR
  A["Long-Running Agent Workflows"] --> B["Nemotron 3 Ultra"]
  B --> C["Frontier Reasoning"]
  B --> D["Higher Throughput"]
  B --> E["Lower Cost-to-Completion"]
  B --> F["Domain Adaptation"]
  C --> G["Agent Orchestration"]
  D --> H["NVFP4 Deployment"]
  E --> I["Token Discipline"]
  F --> J["MOPD and NeMo Recipes"]
Diagram labels stay in English for consistent rendering in Quartz, easier reuse across published pages, and quick recognition by technical readers.
5. My Take
Nemotron 3 Ultra is strategically important because it turns “open model” from a weights-only discussion into a full agent operating stack: model architecture, training recipes, runtime safety, inference partners, and deployment options.
For manufacturing and enterprise AI, the most useful lesson is to evaluate agent models by workflow economics. A model that spends fewer tokens and finishes tasks faster can matter more than a model that only wins isolated single-turn benchmarks.
- What changed my thinking: Agent model selection should include throughput, task-completion cost, runtime safety, and domain tuning path.
- What I may do next: Track Nemotron 3 Ultra as a candidate orchestration model for private agent workflows, especially coding, research, and engineering-review loops.
- What still needs verification: Real API availability, local deployment requirements, license terms, OpenShell security model, and actual cost on representative enterprise tasks.
Reuse Path
Convert this note into an enterprise agent-model evaluation checklist: accuracy, throughput, token cost, context retention, tool-use reliability, runtime security, and fine-tuning path.
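One possible starting shape for that checklist is a small scorecard, sketched below. The seven criteria come from the sentence above; the `Scorecard` class, the 0-to-1 scoring scale, the weights, and the example scores are hypothetical placeholders to be replaced with results from your own evaluation runs.

```python
from dataclasses import dataclass

CRITERIA = (
    "accuracy", "throughput", "token_cost", "context_retention",
    "tool_use_reliability", "runtime_security", "fine_tuning_path",
)


@dataclass
class Scorecard:
    model: str
    scores: dict  # criterion -> score in [0, 1], higher is better

    def missing(self):
        """Criteria that have not been scored yet."""
        return [c for c in CRITERIA if c not in self.scores]

    def weighted_total(self, weights):
        """Weighted average over all criteria; refuses incomplete scorecards."""
        if self.missing():
            raise ValueError(f"unscored criteria: {self.missing()}")
        total_weight = sum(weights[c] for c in CRITERIA)
        return sum(weights[c] * self.scores[c] for c in CRITERIA) / total_weight


# Hypothetical weighting that counts token cost double.
weights = {c: 1.0 for c in CRITERIA}
weights["token_cost"] = 2.0

# Placeholder scores, not measurements of any real model.
card = Scorecard("candidate-model", {c: 0.5 for c in CRITERIA})
card.scores["token_cost"] = 1.0
print(card.model, card.weighted_total(weights))
```

Raising an error on unscored criteria is a deliberate choice: it keeps a model from looking good simply because the harder checks, such as runtime security, were never run.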