NVIDIA Launches Nemotron 3 Ultra: The Most Capable Open-Weight AI Model From the US

June 4, 2026 · 4 min read

NVIDIA today released Nemotron 3 Ultra, a 550-billion-parameter open-weight AI model that immediately becomes the most capable open model ever produced by a US company. The release, which went live on Hugging Face, NVIDIA NIM, and OpenRouter, fulfills the promise CEO Jensen Huang made during his Computex keynote on June 1.

Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index, pulling well ahead of other US open-weight models such as Google's Gemma 4 31B (39 points) and NVIDIA's own Nemotron 3 Super (36 points). The model uses a hybrid Mamba-Transformer mixture-of-experts architecture with approximately 550 billion total parameters, of which about 55 billion are active per token. That 90% sparsity ratio, meaning only about one in ten parameters is used for any given token, underpins the model's efficiency.
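The sparsity figure follows directly from the two parameter counts. The short Python sketch below reproduces it from the article's rounded numbers and adds a rough per-token compute estimate using the common rule of thumb of about 2 FLOPs per parameter touched per token. That rule of thumb is an assumption on our part: it ignores the attention and Mamba state-update costs that depend on context length, and it says nothing about how NVIDIA's kernels actually execute the model.

```python
# Sketch: sparsity and rough per-token compute from the article's rounded figures.
TOTAL_PARAMS = 550e9   # "approximately 550 billion total parameters"
ACTIVE_PARAMS = 55e9   # "55 billion active per token"


def moe_sparsity(total: float, active: float) -> tuple[float, float]:
    """Return (active fraction, sparsity) for a mixture-of-experts model."""
    active_fraction = active / total
    return active_fraction, 1.0 - active_fraction


def forward_flops_per_token(params_used: float) -> float:
    """Rule of thumb (assumption): ~2 FLOPs (one multiply, one add) per parameter
    touched per token. Ignores context-length-dependent attention/state costs."""
    return 2.0 * params_used


active_fraction, sparsity = moe_sparsity(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"active fraction: {active_fraction:.0%}")  # 10%
print(f"sparsity:        {sparsity:.0%}")         # 90%

moe_gflops = forward_flops_per_token(ACTIVE_PARAMS) / 1e9
dense_gflops = forward_flops_per_token(TOTAL_PARAMS) / 1e9
print(f"~{moe_gflops:.0f} GFLOPs/token vs ~{dense_gflops:.0f} for a dense 550B model")
```

Under this approximation, each token costs roughly 110 GFLOPs instead of the roughly 1,100 GFLOPs a dense model of the same total size would need.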

Speed That Changes the Calculus

The standout feature of Nemotron 3 Ultra is its inference speed. On pre-release endpoints provided by DeepInfra, the model delivers over 300 tokens per second. By comparison, similarly sized Chinese open models from DeepSeek and Moonshot typically run at 50 to 100 tokens per second in production today. This 3x to 6x speed advantage could significantly shift deployment economics for enterprises evaluating open-weight models for agentic AI workloads.
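The 3x to 6x range is simply the ratio of the quoted throughputs. The Python sketch below reproduces it and translates it into wall-clock time for a hypothetical multi-step agent run. The step count and tokens per step are illustrative assumptions, and decoding is treated as a steady rate, ignoring prompt processing, time to first token, and network latency.

```python
# Sketch: compare quoted decode throughputs and convert them into wall-clock time.
NEMOTRON_TPS = 300.0            # "over 300 tokens per second" (pre-release DeepInfra endpoints)
COMPARISON_TPS = (50.0, 100.0)  # "50 to 100 tokens per second" for comparable models


def speedup_range(fast: float, slow_range: tuple[float, float]) -> tuple[float, float]:
    """Speedup of `fast` relative to the top and bottom of `slow_range`."""
    slowest, fastest = slow_range
    return fast / fastest, fast / slowest


def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` output tokens at a steady rate.
    Ignores prompt processing, time to first token, and network latency."""
    return tokens / tokens_per_second


low, high = speedup_range(NEMOTRON_TPS, COMPARISON_TPS)
print(f"speedup: {low:.0f}x to {high:.0f}x")  # speedup: 3x to 6x

# Hypothetical agent run: 20 steps, each emitting 1,500 output tokens.
total_tokens = 20 * 1_500
for label, tps in [("300 tok/s", NEMOTRON_TPS), ("100 tok/s", 100.0), ("50 tok/s", 50.0)]:
    print(f"{label}: {generation_seconds(total_tokens, tps):.0f} s")
```

At these assumed rates, the same 30,000-token run takes about 100 seconds at 300 tokens per second versus 300 to 600 seconds at 50 to 100 tokens per second, which is the kind of gap that matters for interactive agents.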

"Nemotron 3 is the most efficient family of open models with leading accuracy for agentic AI applications," NVIDIA said in the model's official announcement. The Ultra variant is the largest of three models in the Nemotron 3 family, joining the previously released Nano and Super models.

The Open-Weight Landscape: US vs. China

While Nemotron 3 Ultra takes the top spot among US open-weight models, it still trails the leading Chinese open model, Moonshot AI's Kimi K2.6, which scores 54 on the same intelligence index. The gap reflects China's aggressive investment in open-weight AI research, driven by companies like DeepSeek, Moonshot, and Alibaba's Qwen team. For context, the strongest closed model overall — Anthropic's Claude Opus 4.8 — scores 61 points.

The competitive pressure from Chinese labs has been a recurring theme at AI conferences this year, and NVIDIA's decision to ship Nemotron 3 Ultra as an open-weight model rather than a proprietary offering is widely seen as a strategic countermove. By making the weights freely available, NVIDIA hopes to accelerate the US open-source AI ecosystem and give enterprises a domestic alternative to Chinese foundation models.

Hardware Requirements and Availability

Nemotron 3 Ultra is not a model that runs on consumer hardware. Its 550 billion parameters require datacenter-grade GPUs for inference. NVIDIA is addressing this through the HP DGX Station, which will offer 775GB of unified memory and is scheduled to arrive in August 2026. In the meantime, developers can access the model through cloud platforms including Hugging Face (free weights), NVIDIA NIM at build.nvidia.com, OpenRouter, and ModelScope.
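A quick way to see why 550 billion parameters rule out consumer hardware is to estimate the memory needed just to hold the weights. The Python sketch below does that at a few common storage precisions. Which precision NVIDIA actually ships is not stated in this article, so the BF16, FP8, and 4-bit options are assumptions, and the estimate deliberately excludes the KV cache, recurrent state, activations, and runtime overhead.

```python
# Sketch: memory needed just to hold the weights of a ~550B-parameter model.
# In a mixture-of-experts model every expert must stay resident, so the
# *total* parameter count (not the 55B active per token) drives memory.
TOTAL_PARAMS = 550e9
STATION_MEMORY_GB = 775  # unified-memory figure quoted for the DGX Station

# Assumed storage precisions; the article does not say which one NVIDIA ships.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "4-bit": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weights only, in decimal GB (1 GB = 1e9 bytes). Excludes KV cache,
    recurrent state, activations, and runtime overhead."""
    return params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
    verdict = "weights fit" if gb < STATION_MEMORY_GB else "weights do not fit"
    print(f"{precision:>5}: {gb:,.0f} GB -> {verdict} in {STATION_MEMORY_GB} GB")
```

Under these assumptions, BF16 weights alone (about 1,100 GB) exceed 775 GB, while FP8 (about 550 GB) or 4-bit (about 275 GB) weights leave some headroom for the cache and activations that long-context serving also needs.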

The model supports a context length of up to 1 million tokens, placing it in the frontier class for long-context reasoning tasks. It was post-trained with multi-environment reinforcement learning, which NVIDIA says gives it superior accuracy across a broad range of tasks, including coding, reasoning, and agentic workflows.

What This Means for the AI Industry

The release of Nemotron 3 Ultra marks an important milestone in the open-weight AI race. For US enterprises that have been hesitant to adopt Chinese open models due to regulatory or geopolitical concerns, NVIDIA's offering provides a domestically sourced alternative with competitive — if not yet leading — intelligence scores. The speed advantage is particularly meaningful for real-time agentic applications where latency directly impacts user experience.

The model also arrives at a time when Washington is paying closer attention to frontier AI capabilities. President Trump signed an executive order on June 2 that creates a voluntary framework for developers to share advanced AI models with the federal government before public release. NVIDIA's decision to release Nemotron 3 Ultra as open weights on a global platform puts the model squarely in the middle of the ongoing debate about how to balance innovation with national security.

"Nemotron 3 Ultra is the first US open-weight model to credibly compete at the frontier. It doesn't top the charts against China's best, but the 300+ tokens per second throughput is a genuine engineering achievement that changes what's practical with open models." — Artificial Analysis, independent AI benchmarking platform

NVIDIA has also released the Nemotron 3 white paper and technical reports for the Nano and Super models, providing the research community with detailed documentation of the architecture, training methodology, and data pipeline. The company has additionally open-sourced training datasets including Nemotron-CC-v2.1, a 2.5-trillion-token dataset derived from Common Crawl, and Nemotron-CC-Code-v1, a 428-billion-token code corpus.

With Nemotron 3 Ultra now live, all three models in the Nemotron 3 family are available. The next frontier for NVIDIA will be closing the intelligence gap with Chinese open models while maintaining the speed advantage that currently sets its architecture apart.

Sources: NVIDIA Research, Artificial Analysis, NVIDIA Nemotron 3 official announcement, the-decoder.com.

