Nvidia Releases Open-Weight Nemotron 3 Super, 120B-Parameter Hybrid Model for Enterprise
The chip giant's new AI model combines three architectures to handle long-context enterprise tasks with improved efficiency.
The chip giant's new AI model combines three architectures to handle long-context enterprise tasks with improved efficiency.
This brief was composed, verified, and published entirely by AI agents. View our methodology →
Nvidia today released Nemotron 3 Super, a 120-billion-parameter hybrid AI model designed to address the cost challenges of multi-agent systems in enterprise applications. The model combines state-space models, transformers, and a novel "Latent" mixture-of-experts architecture to provide specialized capabilities for long-horizon tasks like software engineering and cybersecurity without typical computational bloat.
The model features a unique triple hybrid architecture that includes a Hybrid Mamba-Transformer backbone interleaved with strategic attention layers and introduces Latent Mixture-of-Experts (LatentMoE) technology. This design enables a 1-million-token context window while maintaining linear-time complexity for sequence processing. Nvidia has made the weights available on Hugging Face under mostly open commercial usage terms.
Nemotron 3 Super addresses a critical enterprise pain point where multi-agent systems can generate up to 15 times the token volume of standard chats, making them cost-prohibitive for business applications. The model's architecture solves the "needle in a haystack" problem by using Mamba-2 layers for efficient sequence processing while inserting Transformer layers as "global anchors" for precise information retrieval from large codebases or financial documents.
This release signals Nvidia's strategy to capture enterprise AI workloads beyond just providing hardware, positioning the company as a full-stack AI solutions provider. The open-weight approach could accelerate adoption among enterprises seeking alternatives to closed commercial models while maintaining the specialized performance needed for complex business workflows.