
Methodology

Open math, reproducible numbers. Every number on the comparator is derived from a public, provider-published source plus the assumptions you set on the home page. This page documents every formula and every limitation.

Last updated · 2026-05-05

Overview

nfer's pricing math is open and reproducible. Every number on the comparator is derived from a public, provider-published source plus the assumptions you set on the home page (monthly token volume, input/output ratio, cache hit rate, target utilization, quantization). This page documents every assumption, every formula, and every limitation.

Data sources & cadence

Every price is synced from the provider's own pricing page or public price API. There are no negotiated private rates in the dataset. Most prices refresh daily; tiers that aren't published on a clear public schedule (some reserved-capacity offerings, for example) are reviewed whenever the provider updates them.

You can find the latest update time on every model card in the comparator.

Pricing math

API providers (per-token): monthly cost is the blended per-token rate times your monthly volume.

monthly_cost
  = volume_tokens
  × ( input_share  × input_price_per_token
    + output_share × output_price_per_token × (1 − cache_hit_rate) )

Default assumptions: input_share = 0.20, output_share = 0.80, cache_hit_rate = 0. Each is editable in the home-page assumptions bar.
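
As a rough illustration, the per-token formula above can be sketched in Python. The function name, the check that the two shares sum to 1, and the example prices are assumptions made for this sketch, not part of the comparator; the cache discount is applied to the output term exactly as the formula is written.

```python
def monthly_api_cost(
    volume_tokens: float,
    input_price_per_token: float,
    output_price_per_token: float,
    input_share: float = 0.20,     # default from the assumptions bar
    output_share: float = 0.80,    # default from the assumptions bar
    cache_hit_rate: float = 0.0,   # default from the assumptions bar
) -> float:
    """Blended per-token rate times monthly volume."""
    # Assumption of this sketch: the two shares describe the whole volume.
    if abs(input_share + output_share - 1.0) > 1e-9:
        raise ValueError("input_share and output_share should sum to 1")
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate should be between 0 and 1")

    blended_price_per_token = (
        input_share * input_price_per_token
        + output_share * output_price_per_token * (1.0 - cache_hit_rate)
    )
    return volume_tokens * blended_price_per_token


# Hypothetical prices: $0.50 per 1M input tokens, $1.50 per 1M output tokens.
cost = monthly_api_cost(
    volume_tokens=100_000_000,
    input_price_per_token=0.50 / 1_000_000,
    output_price_per_token=1.50 / 1_000_000,
)
print(f"${cost:,.2f}")  # prints $130.00
```

Prices are usually listed per 1M tokens, so the example divides by 1,000,000 before calling the function.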

GPU rent (hourly): monthly cost is the hourly rate times the hours billed at your target utilization.

monthly_cost
  = hourly_price
  × hours_per_month
  × utilization

We use hours_per_month = 24 × 30 = 720. Utilization defaults to 100% but is editable; setting it to 50% halves the billed hours, not the GPU's throughput.
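
A minimal Python sketch of the GPU rent formula, assuming an illustrative $1.20/hour rate; the function and constant names are hypothetical and only mirror the variables in the formula.

```python
HOURS_PER_MONTH = 24 * 30  # 720, the convention stated above


def monthly_gpu_cost(
    hourly_price: float,
    utilization: float = 1.0,              # 100% by default
    hours_per_month: int = HOURS_PER_MONTH,
) -> float:
    """Hourly rate times the hours billed at the target utilization."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization should be between 0 and 1")
    return hourly_price * hours_per_month * utilization


# Illustrative rate of $1.20/hour at full and at half utilization.
print(f"${monthly_gpu_cost(1.20):,.2f}")                   # prints $864.00
print(f"${monthly_gpu_cost(1.20, utilization=0.5):,.2f}")  # prints $432.00
```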

Dedicated / reserved (hourly committed): same shape as GPU rent but with a fixed throughput ceiling. Where a provider doesn't publish a clear hourly equivalent, we approximate using their published reserved discount against on-demand list price; rows derived this way are flagged.
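
One plausible reading of that approximation is on-demand list price times (1 − reserved discount). The sketch below assumes that reading; the names, the flag field, and the example figures are hypothetical and not taken from the comparator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HourlyRate:
    price: float        # hourly price in the provider's native currency
    approximated: bool  # True when derived from a discount rather than published


def reserved_hourly_equivalent(
    on_demand_hourly: float, reserved_discount: float
) -> HourlyRate:
    """Approximate a reserved hourly rate from a published discount."""
    if not 0.0 <= reserved_discount < 1.0:
        raise ValueError("reserved_discount should be in [0, 1)")
    return HourlyRate(
        price=on_demand_hourly * (1.0 - reserved_discount),
        approximated=True,  # rows derived this way are flagged
    )


# Hypothetical figures: $2.00/hour on demand with a 30% reserved discount.
rate = reserved_hourly_equivalent(on_demand_hourly=2.00, reserved_discount=0.30)
print(f"${rate.price:.2f}/hour, approximated={rate.approximated}")
```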

API vs GPU rent break-even

The break-even point is the daily token volume at which continuously running a rented GPU starts costing less than paying per-token API rates.

tokens_per_day_break_even
  = (gpu_hourly_price × 24)
  / blended_token_price

Worked example

Consider a Llama-3-70B Q4 deployment on a single A100 at $1.20/hour, with a blended token price of $0.40 per 1M tokens:

tokens_per_day_break_even
  = ($1.20 × 24)
  / ($0.40 / 1,000,000)
  = $28.80 / $0.0000004
  = 72M tokens / day

Below ~72M tokens/day, the API is cheaper; above it, the rented GPU wins (assuming you can actually saturate it). A break-even calculator is on the roadmap; it will let you explore different scenarios (model size, hardware tier, target utilization) and find the threshold that matches your workload.
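
Until that feature ships, the break-even formula is easy to reproduce yourself. Here is a minimal Python sketch using the figures from the worked example; the function name and the per-1M-token parameter are assumptions of this sketch.

```python
def break_even_tokens_per_day(
    gpu_hourly_price: float,
    blended_price_per_million_tokens: float,
) -> float:
    """Daily token volume at which a continuously running GPU matches API cost."""
    if blended_price_per_million_tokens <= 0:
        raise ValueError("blended token price should be positive")
    blended_token_price = blended_price_per_million_tokens / 1_000_000
    gpu_cost_per_day = gpu_hourly_price * 24
    return gpu_cost_per_day / blended_token_price


# Figures from the worked example: $1.20/hour GPU, $0.40 per 1M tokens.
threshold = break_even_tokens_per_day(1.20, 0.40)
print(f"{threshold / 1_000_000:.1f}M tokens/day")  # prints 72.0M tokens/day
```

The sketch assumes the GPU runs around the clock and can actually serve the break-even volume; it does not model throughput limits.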

Currency conversion

Source prices are stored in their native currency (USD, EUR, GBP). Display currency is selected via the navbar picker. Conversion rates are pulled from authoritative public sources and refreshed regularly so prices stay close to live.

Limitations and known gaps

  • Reserved/committed pricing is approximated for providers that don't publish per-hour equivalents.
  • Self-hosted and co-location deployments aren't yet covered by the comparator - they're on the roadmap.
  • Fine-grained regional pricing (e.g. AWS region-by-region GPU rates) is collapsed to a representative region per provider; full per-region detail is on the roadmap.
  • Provider coverage is expanding continuously. If a provider you need is missing, email us and we'll prioritize it.

Contributing & corrections

If a price looks wrong, please email the provider URL and the figure you expected to [email protected]. Most corrections land within a day.