Does NVIDIA Nemotron 3 Ultra NVFP4 support zero data retention?

Not self-serve on Pay-as-you-go. Zero data retention for NVIDIA Nemotron 3 Ultra NVFP4 is available on request via Opper Enterprise — contact hello@opper.click.

Where is NVIDIA Nemotron 3 Ultra NVFP4 hosted?

NVIDIA Nemotron 3 Ultra NVFP4 runs on US hosting routes through the Opper gateway.

Is NVIDIA Nemotron 3 Ultra NVFP4 trained on prompts and outputs?

No. Every hosting route for NVIDIA Nemotron 3 Ultra NVFP4 is configured so customer prompts and outputs are not used to train models by default.

Is a GDPR DPA available for NVIDIA Nemotron 3 Ultra NVFP4?

Yes. A GDPR Data Processing Agreement is available on at least one route serving NVIDIA Nemotron 3 Ultra NVFP4. Contact hello@opper.click to receive it.

NVIDIA Nemotron 3 Ultra NVFP4

by NVIDIA

NVIDIA Nemotron 3 Ultra NVFP4 is the 4-bit floating-point build of NVIDIA's largest open-weight reasoning model. It pairs 550B total parameters with 55B active per token through a hybrid Mamba-Transformer LatentMoE architecture, and it carries Multi-Token Prediction layers for native speculative decoding. NVFP4 is NVIDIA's 4-bit floating-point format; this build is pre-trained directly in NVFP4 using a quantization-aware recipe that keeps the bulk of its layers in that compact representation while holding select layers in higher precision for stability. The result is a frontier-scale Mixture-of-Experts model that needs far less accelerator memory than a higher-precision build. It is supported on NVIDIA Hopper and Blackwell GPUs and deployable through frameworks like vLLM, SGLang, and TensorRT-LLM. The model targets demanding agentic workloads: complex multi-step agents, long-context analysis, and high-accuracy reasoning over code, math, and science. It is released under the permissive OpenMDW License (v1.1), alongside a BF16 build for different hardware configurations.
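To make the memory claim concrete, here is a rough back-of-the-envelope sketch rather than an official sizing figure. It assumes every weight is stored at the stated bit width and ignores NVFP4 block scaling factors, the select layers held in higher precision, and all runtime memory such as the KV cache and activations. The function name `weightGigabytes` is illustrative.

```javascript
// Rough weight-memory estimate: parameters x bits per parameter / 8 bits per byte.
// Assumption: every weight is stored at the stated width. Scaling factors and the
// layers kept in higher precision are ignored, so the 4-bit figure is a floor.
function weightGigabytes(parameterCount, bitsPerParameter) {
  const bytes = (parameterCount * bitsPerParameter) / 8;
  return bytes / 1e9; // decimal gigabytes
}

const TOTAL_PARAMETERS = 550e9; // 550B total parameters, from the description

const fp4Gb = weightGigabytes(TOTAL_PARAMETERS, 4); // 4-bit build
const bf16Gb = weightGigabytes(TOTAL_PARAMETERS, 16); // BF16 build

console.log(`4-bit weights: ~${fp4Gb} GB, BF16 weights: ~${bf16Gb} GB`);
```

Under these assumptions the weights come to roughly 275 GB at 4 bits versus roughly 1,100 GB at 16 bits, a 4x reduction. The estimate uses the total parameter count rather than the 55B active per token, on the assumption that all experts stay resident in accelerator memory.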


Key info

Context window

262K

Max output

Not specified

Input price

$0.60 /1M

Output price

$2.40 /1M

US residency available
Zero data retention via Enterprise
No training by default
GDPR DPA available
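As a quick illustration of how the listed per-token prices translate into a per-request cost, the sketch below multiplies token counts by the input and output rates. The function name `estimateCostUsd` and the example token counts are illustrative, and actual billing may differ (for example through rounding or other charges not listed here).

```javascript
// Listed prices in USD per 1M tokens, copied from the key info on this page.
const INPUT_USD_PER_MILLION = 0.6;
const OUTPUT_USD_PER_MILLION = 2.4;

// Estimate the cost of one request from its input and output token counts.
function estimateCostUsd(inputTokens, outputTokens) {
  const inputCost = inputTokens * INPUT_USD_PER_MILLION;
  const outputCost = outputTokens * OUTPUT_USD_PER_MILLION;
  return (inputCost + outputCost) / 1e6;
}

// Example: a long-context request with 200,000 input and 4,000 output tokens.
const cost = estimateCostUsd(200000, 4000);
console.log(`Estimated cost: $${cost.toFixed(4)}`);
```

For this example the input side contributes $0.12 and the output side about $0.01, so output tokens cost four times as much each but rarely dominate a long-context request.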

Available routes

NVIDIA Nemotron 3 Ultra NVFP4 runs on one route through the Opper gateway. The table below summarizes residency, zero data retention (ZDR), and training posture; full data-handling detail for each route follows.

Provider	Region	Zero data retention	Training	Input (per 1M tokens)	Output (per 1M tokens)
Fireworks	US	Enterprise	No	$0.60	$2.40

Training posture across routes: No training on prompts by default.

Data handling per route

Each route hosting NVIDIA Nemotron 3 Ultra NVFP4 has its own privacy posture, residency, and GDPR terms. Postures are maintained by Opper with a last-verification timestamp.

Fireworks — United States🇺🇸

Zero data retention is available via Opper Enterprise contract. Customer data is not used for training. Data is hosted in the US, transfers rely on Standard Contractual Clauses (SCCs), and a DPA is available.

Zero data retention: Available via Opper Enterprise contract.
Training: No training on customer data.
Logging: None
Third-party access: None disclosed
GDPR DPA: DPA available
Transfer mechanism: SCCs
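If you track route postures in your own tooling, the fields above can be captured as a plain record and filtered against compliance requirements. This is a minimal sketch: the record shape, field names, and the `matchingRoutes` helper are hypothetical and do not represent an Opper schema or API. The values are copied from the Fireworks route above.

```javascript
// Hypothetical record shape; field names are illustrative, not an Opper schema.
const routes = [
  {
    provider: "Fireworks",
    region: "US",
    zeroDataRetention: "enterprise", // available via Opper Enterprise contract
    trainsOnCustomerData: false,
    logging: "none",
    gdprDpa: true,
    transferMechanism: "SCCs",
  },
];

// Return the routes whose fields equal every requirement the caller specifies.
function matchingRoutes(allRoutes, requirements) {
  return allRoutes.filter((route) =>
    Object.entries(requirements).every(([field, wanted]) => route[field] === wanted)
  );
}

const usNoTraining = matchingRoutes(routes, { region: "US", trainsOnCustomerData: false });
const euOnly = matchingRoutes(routes, { region: "EU" });

console.log(usNoTraining.map((route) => route.provider), euOnly.length);
```

With the single route listed here, the US no-training query matches Fireworks and the EU query matches nothing, which mirrors the residency information on this page.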

Get started

Call NVIDIA Nemotron 3 Ultra NVFP4 through the Opper gateway with one API key. Let your coding agent set it up, or call it directly — Opper is drop-in compatible with the OpenAI, Anthropic, and Google AI SDKs.

Set it up with your agent

Copy this and paste it into your coding agent — Claude Code, Cursor, Codex, and more — and it'll wire up Opper for you.

Or call it directly

import OpenAI from "openai";

// Point the OpenAI SDK at the Opper gateway's OpenAI-compatible endpoint.
const client = new OpenAI({
  apiKey: process.env.OPPER_API_KEY,
  baseURL: "https://api.opper.click/v3/compat",
});

// Send a single-turn chat request to the Fireworks route.
const completion = await client.chat.completions.create({
  model: "fireworks/nemotron-3-ultra-nvfp4",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(completion.choices[0].message.content);
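
In application code it can help to wrap the call in a small function that guards against an empty response and returns the token counts alongside the text. The sketch below is one possible shape, not an Opper-provided helper: `askNemotron` is a hypothetical name, and it assumes the gateway returns the `usage` object in the OpenAI chat-completions format, which this page does not confirm. The `client` argument is any object exposing `chat.completions.create`, such as the client constructed above.

```javascript
// Hypothetical helper: `client` is any object exposing the OpenAI SDK's
// chat.completions.create method, such as the client constructed above.
async function askNemotron(client, prompt) {
  const completion = await client.chat.completions.create({
    model: "fireworks/nemotron-3-ultra-nvfp4",
    messages: [{ role: "user", content: prompt }],
  });

  const choice = completion.choices?.[0];
  if (!choice) {
    throw new Error("The response contained no choices");
  }

  return {
    text: choice.message.content,
    // Assumption: `usage` follows the OpenAI format; default to zero if absent.
    inputTokens: completion.usage?.prompt_tokens ?? 0,
    outputTokens: completion.usage?.completion_tokens ?? 0,
  };
}
```

Passing the client in as a parameter keeps the helper easy to test with a stub object and leaves API key handling in one place.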