
Deployment guide

Cheapest way to deploy gemma-3-4b-it in 2026

11 providers compared: API token pricing, dedicated capacity, and rented GPU costs side by side, normalized to monthly cost.



Cheapest API

$0.040 / 1M input tokens

at DeepInfra



Cheapest GPU rental

$0.14 / hour

at Verda on V100


Provider rate cards

11 providers compared

Provider	Region	Quantization	Context	Input ($/1M tokens)	Output ($/1M tokens)		Source
DeepInfra	United States	—	128k	$0.040	$0.080	—	Source ↗
AWS	Multiple regions	—	—	$0.040	$0.080	—	Source ↗

Provider	Hardware	GPUs	VRAM per GPU (GB)	Region	Price ($/hour)	Commitment	Source
Verda	V100	1	16	Finland	$0.14	On-demand	Source ↗
Google Cloud Vertex AI	T4	1	16	United States	$0.35	On-demand	Source ↗
Azure	K80	1	24	Multiple regions	$0.40	On-demand	Source ↗
Verda	RTX A6000	1	48	Finland	$0.49	On-demand	Source ↗
Google Cloud Vertex AI	T4 Virtual Workstation	1	16	United States	$0.55	On-demand	Source ↗
Azure	RTX PRO 6000	1	96	United States	$0.55	On-demand	Source ↗
CoreWeave	RTX 5000	1	16	United States	$0.57	On-demand	Source ↗
CoreWeave	RTX A4000	1	16	United States	$0.61	On-demand	Source ↗
Lambda Labs	Quadro RTX 6000	1	24	United States	$0.69	On-demand	Source ↗
OVHcloud	RTX 5000	1	16	France	$0.70	On-demand	Source ↗
Azure	T4	1	16	Multiple regions	$0.75	On-demand	Source ↗
CoreWeave	RTX A5000	1	24	United States	$0.77	On-demand	Source ↗
Lambda Labs	V100	1	16	United States	$0.79	On-demand	Source ↗
CoreWeave	Tesla V100 NVLINK	1	32	United States	$0.80	On-demand	Source ↗
Verda	RTX 6000 Ada	1	48	Finland	$0.83	On-demand	Source ↗
Scaleway	L4	1	24	France	$0.88	On-demand	Source ↗
OVHcloud	V100	1	16	France	$0.90	On-demand	Source ↗
Verda	L40S	1	48	Finland	$0.91	On-demand	Source ↗
AWS	L4	1	24	Multiple regions	$0.98	On-demand	Source ↗
OVHcloud	V100S	1	32	France	$1.03	On-demand	Source ↗
AWS	T4	1	16	Multiple regions	$1.04	On-demand	Source ↗
Lambda Labs	RTX A6000	1	48	United States	$1.09	On-demand	Source ↗
AWS	A10	1	24	Multiple regions	$1.12	On-demand	Source ↗
OVHcloud	A10	1	24	France	$1.17	On-demand	Source ↗
OVHcloud	L4	1	24	France	$1.17	On-demand	Source ↗
CoreWeave	A40	1	48	United States	$1.28	On-demand	Source ↗
CoreWeave	RTX A6000	1	48	United States	$1.28	On-demand	Source ↗
Lambda Labs	A10	1	24	United States	$1.29	On-demand	Source ↗
Verda	A100 SXM	1	80	Finland	$1.29	On-demand	Source ↗
Azure	A10	1	24	Netherlands	$1.43	On-demand	Source ↗
Google Cloud Vertex AI	P100	1	16	United States	$1.46	On-demand	Source ↗
Google Cloud Vertex AI	P100 Virtual Workstation	1	16	United States	$1.66	On-demand	Source ↗
Verda	RTX PRO 6000	1	96	Finland	$1.69	On-demand	Source ↗
Nebius	L40S	1	48	Netherlands	$1.82	On-demand	Source ↗
Lambda Labs	A100	1	80	United States	$1.99	On-demand	Source ↗
CoreWeave	A100 40GB NVLINK	1	40	United States	$2.06	On-demand	Source ↗
CoreWeave	A100 40GB PCIe	1	40	United States	$2.06	On-demand	Source ↗
OVHcloud	L40S	1	48	France	$2.11	On-demand	Source ↗
CoreWeave	A100 80GB NVLINK	1	80	United States	$2.21	On-demand	Source ↗
CoreWeave	A100 80GB PCIe	1	80	United States	$2.21	On-demand	Source ↗
AWS	L40S	1	48	Multiple regions	$2.24	On-demand	Source ↗
Verda	H100 SXM	1	80	Finland	$2.29	On-demand	Source ↗
Lambda Labs	GH200	1	96	United States	$2.29	On-demand	Source ↗
Google Cloud Vertex AI	V100	1	16	United States	$2.48	On-demand	Source ↗
Lambda Labs	A100 SXM	1	80	United States	$2.79	On-demand	Source ↗
Nebius	H100 SXM	8	80	Netherlands	$2.95	On-demand	Source ↗
Azure	V100	1	16	Multiple regions	$3.06	On-demand	Source ↗
Lambda Labs	H100	1	80	United States	$3.29	On-demand	Source ↗
Verda	H200 SXM	1	141	Finland	$3.39	On-demand	Source ↗
Nebius	H200 SXM	8	141	Netherlands	$3.50	On-demand	Source ↗
OVHcloud	H100 SXM	1	80	France	$3.50	On-demand	Source ↗
OVHcloud	A100 SXM	1	80	France	$3.59	On-demand	Source ↗
Azure	A100	1	80	Multiple regions	$3.67	On-demand	Source ↗
Lambda Labs	H100 SXM	1	80	United States	$3.99	On-demand	Source ↗
Together AI	H100 SXM	1	80	United States	$3.99	On-demand	Source ↗
CoreWeave	H100	1	80	United States	$4.25	On-demand	Source ↗
Verda	B200 SXM	1	192	Finland	$4.89	On-demand	Source ↗
Together AI	H200 SXM	1	141	United States	$5.49	On-demand	Source ↗
Nebius	B200 SXM	8	192	Netherlands	$5.50	On-demand	Source ↗
Nebius	B300 SXM	8	288	Netherlands	$6.10	On-demand	Source ↗
CoreWeave	GH200	1	96	United States	$6.50	On-demand	Source ↗
Lambda Labs	B200 SXM	1	192	United States	$6.69	On-demand	Source ↗
AWS	H100 SXM	1	80	Multiple regions	$6.88	On-demand	Source ↗
Azure	H100	1	80	Multiple regions	$6.98	On-demand	Source ↗
Verda	B300 SXM	1	288	Finland	$6.99	On-demand	Source ↗
Together AI	B200 SXM	1	192	United States	$9.95	On-demand	Source ↗
CoreWeave	L40	8	48	United States	$10.00	On-demand	Source ↗
CoreWeave	L40S	8	48	United States	$18.00	On-demand	Source ↗
CoreWeave	RTX PRO 6000	8	96	United States	$20.00	On-demand	Source ↗
CoreWeave	A100	8	80	United States	$21.60	On-demand	Source ↗
AWS	A100 SXM	8	80	Multiple regions	$23.72	On-demand	Source ↗
Azure	A100 SXM	8	80	Multiple regions	$27.20	On-demand	Source ↗
CoreWeave	GB200	4	192	United States	$42.00	On-demand	Source ↗
CoreWeave	H100 SXM	8	80	United States	$49.24	On-demand	Source ↗
CoreWeave	H200 SXM	8	141	United States	$50.44	On-demand	Source ↗
Azure	MI300X	8	192	Multiple regions	$57.60	On-demand	Source ↗
AWS	H200 SXM	8	141	Multiple regions	$63.30	On-demand	Source ↗
CoreWeave	B200 SXM	8	192	United States	$68.80	On-demand	Source ↗
Azure	H200 SXM	8	141	Multiple regions	$84.80	On-demand	Source ↗
Azure	H100 SXM	8	80	Multiple regions	$88.49	On-demand	Source ↗
Azure	GB200	2	192	Multiple regions	$108.16	On-demand	Source ↗

Break-even chart

API vs. GPU rental

The crossover point at which renting a GPU full-time becomes cheaper than paying per token: ~56M tokens/day.
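The ~56M figure can be reproduced with simple arithmetic. The sketch below assumes an even 50/50 input/output split (that split is what makes the page's numbers line up: $0.14/hour × 24 hours = $3.36/day, and $3.36 ÷ $0.06 per 1M blended tokens = 56M tokens/day) and a GPU rented around the clock that can actually sustain that volume. Throughput is not modeled, and nfer's exact cost model is not documented here; the function and parameter names are illustrative.

```python
# Rates from the rate cards above (USD).
API_INPUT_PER_M = 0.040   # DeepInfra, $ per 1M input tokens
API_OUTPUT_PER_M = 0.080  # DeepInfra, $ per 1M output tokens
GPU_PER_HOUR = 0.14       # Verda V100, on-demand


def blended_price_per_m(input_share: float) -> float:
    """Average API price per 1M tokens when `input_share` of all tokens are input."""
    return input_share * API_INPUT_PER_M + (1 - input_share) * API_OUTPUT_PER_M


def break_even_tokens_per_day(input_share: float = 0.5, hours_per_day: float = 24) -> float:
    """Daily token volume at which the per-token bill equals the daily GPU rental bill.

    Assumes the rented GPU can serve this volume; throughput limits are ignored.
    """
    gpu_daily_cost = GPU_PER_HOUR * hours_per_day
    return gpu_daily_cost / (blended_price_per_m(input_share) / 1_000_000)


print(f"{break_even_tokens_per_day() / 1e6:.0f}M tokens/day")
```

Shifting the mix moves the crossover: an input-heavy workload (cheaper tokens) pushes it toward 84M tokens/day, an output-heavy one pulls it toward 42M.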

Your monthly cost

Top 5 cheapest for your workload

Adjust the assumptions below (token volume, input/output ratio, active days and hours, cache hit rate, batch APIs) to see how the cheapest options shift.

Your workload

Token volume: 1M to 1B
Input / output mix: 100% input to 100% output
Active days per month: 1 to 30
Active hours per day: 1 to 24
Cache hit rate: 0% to 100%
Batch APIs

Rank	Provider	Pricing	Hardware	Monthly
#1	Verda	GPU · On-demand	V100	$101
#2	DeepInfra	API	—	$108
#3	AWS	API	—	$108
#4	Google Cloud Vertex AI	GPU · On-demand	T4	$252
#5	Azure	GPU · On-demand	K80	$285
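The default ranking can be approximated with a small monthly-cost model. This is a sketch under assumptions inferred from the numbers, not nfer's documented method: 60M tokens/day at a 50/50 input/output mix over 30 days, with GPUs billed for every active hour. Those defaults reproduce the $101, $108, and $252 figures. Cache hit rate and batch discounts are not modeled, and Azure's K80 row is left out because its rounded $0.40/hour rate gives $288 rather than the $285 shown. `Offer`, `RATE_CARD`, and `rank` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    name: str
    kind: str                   # "api" (per token) or "gpu" (per hour)
    input_per_m: float = 0.0    # $ per 1M input tokens (API only)
    output_per_m: float = 0.0   # $ per 1M output tokens (API only)
    per_hour: float = 0.0       # $ per hour (GPU only)


# Rates copied from the rate cards above.
RATE_CARD = [
    Offer("Verda V100", "gpu", per_hour=0.14),
    Offer("DeepInfra", "api", input_per_m=0.040, output_per_m=0.080),
    Offer("AWS", "api", input_per_m=0.040, output_per_m=0.080),
    Offer("Google Cloud Vertex AI T4", "gpu", per_hour=0.35),
]


def monthly_cost(offer: Offer, tokens_per_day: float, input_share: float = 0.5,
                 days: int = 30, hours_per_day: float = 24) -> float:
    """Monthly cost in USD. GPUs are billed per active hour regardless of volume."""
    if offer.kind == "gpu":
        return offer.per_hour * hours_per_day * days
    monthly_m_tokens = tokens_per_day * days / 1_000_000
    blended = input_share * offer.input_per_m + (1 - input_share) * offer.output_per_m
    return monthly_m_tokens * blended


def rank(tokens_per_day: float, **workload) -> list[tuple[float, str]]:
    """(cost, name) pairs, cheapest first; stable sort keeps rate-card order on ties."""
    costs = [(monthly_cost(o, tokens_per_day, **workload), o.name) for o in RATE_CARD]
    return sorted(costs, key=lambda c: c[0])


for position, (cost, name) in enumerate(rank(60e6), start=1):
    print(f"#{position} {name}: ${cost:.0f}")
```

At low volume the order flips: at 1M tokens/day the API offers cost under $2/month while the GPU bills stay flat.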


Spec sheet

gemma-3-4b-it at a glance

VRAM (native precision): 9 GB
Parameters: 4.3001B
Native precision: bf16
Context length: —
License: gemma
Knowledge cutoff: —
Modalities: text, vision
Access type: Open source
EU developed: No
Origin country: US

Why these numbers

A second opinion on the data

Hardware footprint

gemma-3-4b-it is a 4.3B-parameter model that needs about 9 GB of VRAM when self-hosted at its native bf16 precision, so it fits on a single 24 GB L4 or A10. Quantizing to int8 typically halves the VRAM requirement and int4 quarters it, at a modest accuracy cost. 10 GPU rental providers in nfer's index currently offer hardware that fits this model at native precision.
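The VRAM figures follow from bytes per parameter: 2 for bf16, 1 for int8, 0.5 for int4. The sketch below computes the weights-only footprint; the page's 9 / 5 / 2 GB values appear to be rounded versions of these. Weights are not the whole story, since the KV cache, activations, and runtime overhead come on top, so the `headroom` factor of 1.2 is a hypothetical allowance, not a measured value.

```python
# Bytes per parameter at each precision (weights only).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

PARAMS = 4.3001e9  # parameter count from the spec sheet


def weight_gb(params: float, precision: str) -> float:
    """Memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def fits(vram_gb: float, params: float, precision: str, headroom: float = 1.2) -> bool:
    """True if the weights times an assumed headroom factor fit in `vram_gb`.

    headroom=1.2 is a hypothetical 20% allowance for KV cache, activations,
    and runtime overhead; real needs grow with context length and batch size.
    """
    return weight_gb(params, precision) * headroom <= vram_gb


for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_gb(PARAMS, precision):.1f} GB of weights, "
          f"fits a 16 GB card: {fits(16, PARAMS, precision)}")
```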

Cheapest path today

The cheapest API offering for gemma-3-4b-it is DeepInfra (tied with AWS) at $0.040/1M input + $0.080/1M output tokens. The cheapest GPU rental that fits the model is Verda on V100 at $0.14/hour. Which one is cheaper for you depends on your daily token volume; the break-even chart above puts the crossover at roughly 56M tokens/day.
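Finding the cheapest rental that fits amounts to filtering the GPU rate card by VRAM and taking the lowest hourly price. The sketch below uses a handful of rows copied from the table above and assumes the model must fit on a single GPU (no sharding across cards); `GpuOffer` and `cheapest_fit` are illustrative names.

```python
from typing import NamedTuple, Optional


class GpuOffer(NamedTuple):
    provider: str
    hardware: str
    gpus: int
    vram_gb: int          # VRAM per GPU, as listed in the rate card
    usd_per_hour: float


# A few rows copied from the GPU rate card above.
GPU_ROWS = [
    GpuOffer("Verda", "V100", 1, 16, 0.14),
    GpuOffer("Google Cloud Vertex AI", "T4", 1, 16, 0.35),
    GpuOffer("Verda", "RTX A6000", 1, 48, 0.49),
    GpuOffer("Scaleway", "L4", 1, 24, 0.88),
    GpuOffer("Verda", "A100 SXM", 1, 80, 1.29),
]


def cheapest_fit(offers: list[GpuOffer], required_gb: float) -> Optional[GpuOffer]:
    """Cheapest offer whose per-GPU VRAM covers `required_gb`, or None if none fits."""
    fitting = [o for o in offers if o.vram_gb >= required_gb]
    return min(fitting, key=lambda o: o.usd_per_hour, default=None)


print(cheapest_fit(GPU_ROWS, 9))  # 9 GB: the native-precision footprint quoted above
```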

Licensing and fit

Released under the Gemma license, gemma-3-4b-it ships with publicly available open weights; the spec sheet does not list a context length.

FAQ

Common questions

What's the cheapest way to host gemma-3-4b-it?
The cheapest API option for gemma-3-4b-it in nfer's index is DeepInfra (tied with AWS) at $0.040/1M input + $0.080/1M output tokens. For self-hosted workloads, the cheapest GPU rental that fits is Verda on V100 at $0.14/hour. The right choice depends on your daily token volume; see the break-even chart on this page.
How much VRAM does gemma-3-4b-it need?
gemma-3-4b-it needs about 9 GB of VRAM at its native bf16 precision, roughly 5 GB at int8, and about 2 GB at int4. Quantization roughly halves (int8) or quarters (int4) the VRAM footprint at a modest accuracy cost.
Can I use gemma-3-4b-it commercially?
Yes. gemma-3-4b-it is released under the Gemma license, which permits commercial use subject to Google's prohibited-use policy.
What's the difference between API and GPU rental for gemma-3-4b-it?
Token-priced API providers (such as DeepInfra) bill per million input and output tokens, which suits low or bursty volume. Renting a GPU (e.g. Verda at $0.14/hour) costs a flat ~$100.80/month (24 hours × 30 days) regardless of usage, which pays off once you sustain enough tokens per day to cover the fixed cost. The break-even chart on this page shows the crossover point, roughly 56M tokens/day.
Is gemma-3-4b-it available with EU data residency?
gemma-3-4b-it is not a European-developed model, but 3 EU-owned providers in nfer's index offer hosting for it. Filter on EU sovereignty in the comparator to see them.

Prices last updated · 2026-04-30