What's the cheapest way to host NVIDIA-Nemotron-3-Super-120B-A12B-BF16?

The cheapest API option for NVIDIA-Nemotron-3-Super-120B-A12B-BF16 in nfer's index is AWS at $0.150/1M input + $0.650/1M output tokens. For self-hosted workloads, the cheapest GPU rental that fits is Azure on RTX PRO 6000 at $1.24/hour. The right choice depends on your daily token volume — see the break-even chart on this page.

How much VRAM does NVIDIA-Nemotron-3-Super-120B-A12B-BF16 need?

NVIDIA-Nemotron-3-Super-120B-A12B-BF16 needs 248 GB at its native precision (bf16), roughly 124 GB at int8, and roughly 62 GB at int4. Quantization roughly halves (int8) or quarters (int4) the VRAM footprint at a modest accuracy cost.
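The figures above can be approximated with a weights-only estimate: parameter count times bytes per parameter. The sketch below is an illustration, not nfer's own method. It assumes 1 GB = 1e9 bytes and ignores KV cache, activations, and framework overhead, and the function name is hypothetical. Its results land within about 1 GB of the numbers quoted on this page.

```python
# Weights-only VRAM estimate: parameters x bytes per parameter.
# Assumption of this sketch: KV cache, activations, and runtime
# overhead are ignored, so real usage will be somewhat higher.
PARAMS_BILLIONS = 123.611  # parameter count from this page's spec sheet

BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_vram_gb(params_billions: float, precision: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


for precision in BYTES_PER_PARAM:
    gb = weights_vram_gb(PARAMS_BILLIONS, precision)
    print(f"{precision}: ~{gb:.1f} GB")
```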

Can I use NVIDIA-Nemotron-3-Super-120B-A12B-BF16 commercially?

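The license is listed as "other" in nfer's index, meaning it is not one of the standard licenses the index tracks. Most permissive open-source licenses allow commercial use, but a custom license may add conditions; confirm the specific clauses with the publisher before deploying.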

What's the difference between API and GPU rental for NVIDIA-Nemotron-3-Super-120B-A12B-BF16?

Token-priced API providers (like AWS) bill per million input/output tokens, which suits low or bursty volume. Renting a GPU (e.g. Azure at $1.24/hour) costs a flat ~$895/month when it runs around the clock, regardless of usage, which gives better economics once you sustain enough tokens per day to justify the fixed cost. The break-even chart on this page shows the exact crossover point.

Is NVIDIA-Nemotron-3-Super-120B-A12B-BF16 available with EU data residency?

NVIDIA-Nemotron-3-Super-120B-A12B-BF16 is not a European-developed model, but two EU-owned providers in nfer's index offer hosting for it. Filter on EU sovereignty in the comparator to see them.

Deployment guide

Cheapest way to deploy NVIDIA-Nemotron-3-Super-120B-A12B-BF16 in 2026

Seven providers compared: API token pricing, dedicated capacity, and rented GPU costs side by side, normalized to monthly cost.



Cheapest API

$0.150 / 1M input tokens

at AWS



Cheapest GPU rental

$1.24 / hour

at Azure on RTX PRO 6000


Provider rate cards

7 providers compared

Provider	Region	Quantization		Input ($/1M tokens)	Output ($/1M tokens)		Source
AWS	Multiple regions	—	—	$0.150	$0.650	—	Source ↗
Baseten	United States	—	—	$0.300	$0.750	—	Source ↗

Provider	Hardware	GPUs	VRAM per GPU (GB)	Region	Price ($/hour)	Commitment	Source
Azure	RTX PRO 6000	4	96	Multiple regions	$1.24	On-demand	Source ↗
Nebius	H100 SXM	8	80	Netherlands	$2.95	On-demand	Source ↗
Nebius	H200 SXM	8	141	Netherlands	$3.50	On-demand	Source ↗
Verda	RTX A6000	8	48	Finland	$3.92	On-demand	Source ↗
Verda	A100 SXM	4	80	Finland	$5.16	On-demand	Source ↗
Nebius	B200 SXM	8	192	Netherlands	$5.50	On-demand	Source ↗
Nebius	B300 SXM	8	288	Netherlands	$6.10	On-demand	Source ↗
Verda	RTX 6000 Ada	8	48	Finland	$6.61	On-demand	Source ↗
Verda	RTX PRO 6000	4	96	Finland	$6.76	On-demand	Source ↗
Verda	L40S	8	48	Finland	$7.31	On-demand	Source ↗
Verda	H100 SXM	4	80	Finland	$9.16	On-demand	Source ↗
Verda	B200 SXM	2	192	Finland	$9.78	On-demand	Source ↗
CoreWeave	L40	8	48	United States	$10.00	On-demand	Source ↗
Verda	H200 SXM	4	141	Finland	$13.56	On-demand	Source ↗
Verda	B300 SXM	2	288	Finland	$13.98	On-demand	Source ↗
OVHcloud	H100 SXM	4	80	France	$14.01	On-demand	Source ↗
OVHcloud	A100 SXM	4	80	France	$14.39	On-demand	Source ↗
Azure	A100	4	80	Multiple regions	$14.69	On-demand	Source ↗
CoreWeave	L40S	8	48	United States	$18.00	On-demand	Source ↗
CoreWeave	RTX PRO 6000	8	96	United States	$20.00	On-demand	Source ↗
CoreWeave	A100	8	80	United States	$21.60	On-demand	Source ↗
AWS	A100 SXM	8	80	Multiple regions	$23.72	On-demand	Source ↗
Azure	A100 SXM	8	80	Multiple regions	$27.20	On-demand	Source ↗
AWS	L40S	8	48	Multiple regions	$30.13	On-demand	Source ↗
CoreWeave	GB200	4	192	United States	$42.00	On-demand	Source ↗
CoreWeave	H100 SXM	8	80	United States	$49.24	On-demand	Source ↗
CoreWeave	H200 SXM	8	141	United States	$50.44	On-demand	Source ↗
AWS	H100 SXM	8	80	Multiple regions	$55.04	On-demand	Source ↗
Azure	MI300X	8	192	Multiple regions	$57.60	On-demand	Source ↗
AWS	H200 SXM	8	141	Multiple regions	$63.30	On-demand	Source ↗
CoreWeave	B200 SXM	8	192	United States	$68.80	On-demand	Source ↗
Azure	H200 SXM	8	141	Multiple regions	$84.80	On-demand	Source ↗
Azure	H100 SXM	8	80	Multiple regions	$88.49	On-demand	Source ↗
Azure	GB200	2	192	Multiple regions	$108.16	On-demand	Source ↗

Break-even chart

API vs. GPU rental

The crossover point at which renting a GPU full-time becomes cheaper than paying per token: ~75M tokens/day.
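As a rough check on the ~75M figure, the crossover can be sketched as the monthly GPU cost divided by a blended per-token API price. The sketch below assumes a 50/50 input/output mix, 30 active days, and a GPU that runs around the clock and can actually serve that volume (throughput is not modeled). The function names are illustrative and not part of nfer's comparator.

```python
def blended_price_per_million(input_price: float, output_price: float,
                              input_share: float) -> float:
    """Average API price per 1M tokens for a given input/output mix."""
    return input_share * input_price + (1.0 - input_share) * output_price


def break_even_millions_per_day(gpu_monthly_cost: float,
                                price_per_million: float,
                                active_days: int = 30) -> float:
    """Daily volume (millions of tokens) where API spend equals GPU rent."""
    return gpu_monthly_cost / (price_per_million * active_days)


# AWS rates and the ~$895/month Azure figure are taken from this page;
# the 50/50 mix is an assumption of this sketch.
price = blended_price_per_million(0.150, 0.650, input_share=0.5)
crossover = break_even_millions_per_day(895, price)
print(f"Blended price: ${price:.2f} per 1M tokens")
print(f"Break-even: ~{crossover:.1f}M tokens/day")
```

Shifting the mix toward input tokens lowers the blended price and pushes the crossover higher; shifting toward output tokens does the opposite.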

Your monthly cost

Top 5 cheapest for your workload

Adjust the assumptions below — token volume, input/output ratio, days and hours of usage — to see how the cheapest options shift.

Your workload

Tokens vol.: 1M to 1B
Input / Output: 100% input to 100% output
Active days / mo: 1 to 30 days
Active hours / day: 1 to 24 hrs
Cache hit rate: 0% to 100%
Batch APIs

Rank	Provider	Pricing	Hardware	Monthly
#1	AWS	API	—	$825
#2	Azure	GPU · On-demand	RTX PRO 6000	$895
#3	Baseten	API	—	$990
#4	Nebius	GPU · On-demand	H100 SXM	$2.12k
#5	Nebius	GPU · On-demand	H200 SXM	$2.52k


Spec sheet

NVIDIA-Nemotron-3-Super-120B-A12B-BF16 at a glance

VRAM (native precision): 248 GB
Parameters: 123.611B
Native precision: bf16
Context length: —
License: other
Knowledge cutoff: —
Modalities: text
Access type: Open source
EU developed: No
Origin country: US

Why these numbers

A second opinion on the data

Hardware footprint

NVIDIA-Nemotron-3-Super-120B-A12B-BF16 is a 123.611B-parameter model. Self-hosted at its native bf16 precision, it needs 248 GB of VRAM, which takes at least four 80 GB H100s. Quantization to int8 typically halves the VRAM requirement and int4 quarters it, at a modest accuracy cost. Six GPU rental providers in nfer's index currently offer hardware that fits this model at native precision.
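The GPU count follows from dividing the model footprint by per-GPU memory and rounding up. The sketch below applies that to a few GPU types from the rental table on this page. The helper name is hypothetical, and the result is a lower bound: it leaves no headroom for KV cache or activations, so real deployments may need more GPUs.

```python
import math

MODEL_VRAM_GB = 248  # native bf16 footprint quoted on this page

# Per-GPU memory in GB, as listed in this page's rental table.
GPU_VRAM_GB = {
    "H100 SXM": 80,
    "RTX PRO 6000": 96,
    "H200 SXM": 141,
    "B200 SXM": 192,
    "B300 SXM": 288,
}


def min_gpus(model_vram_gb: float, gpu_vram_gb: float) -> int:
    """Smallest GPU count whose combined memory holds the model weights."""
    return math.ceil(model_vram_gb / gpu_vram_gb)


for name, vram in GPU_VRAM_GB.items():
    print(f"{name}: at least {min_gpus(MODEL_VRAM_GB, vram)} GPU(s)")
```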

Cheapest path today

For NVIDIA-Nemotron-3-Super-120B-A12B-BF16, the cheapest API offering is AWS at $0.15/1M input + $0.65/1M output tokens. The cheapest GPU rental that fits the model is Azure on RTX PRO 6000 at $1.24/hour. The break-even point between paying per token and renting a GPU depends on your daily volume; see the chart above.

Licensing and fit

NVIDIA-Nemotron-3-Super-120B-A12B-BF16 is released under a license listed as "other" in nfer's index, and its context length is not specified there. The open-source weights are publicly available.


Prices last updated · 2026-04-30