Why it exists
nfer started life as an internal tool. We were trying to reduce our own AI spend and pick the right deployment path for each model we worked with, and we kept hitting the same wall: there was no single place to compare API pricing, dedicated capacity, and rented GPUs side by side. So we built one for ourselves. It turned out other teams could use it too, so we opened it up.
There are plenty of tools for picking which model to use: a lot of good work goes into benchmarking model quality, routing API calls, and indexing the models themselves. But once a team has settled on Llama 3 70B, Mistral, Qwen, or any other model, there is no comparable tool for deciding where to deploy it.
Token-priced API providers, dedicated-throughput offerings, and hourly GPU rentals all advertise prices in different units. nfer normalizes them to a monthly cost for your specific workload (your token volume, your input/output ratio, your utilization) and surfaces the cheapest option at a glance, so picking a deployment takes a few clicks instead of an afternoon of spreadsheet wrangling.
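To make the normalization concrete, here is a minimal Python sketch of one way to turn the three pricing models into a single monthly figure. It is not nfer's implementation: the names (`Workload`, `TokenPriced`, `HourlyCapacity`, `cheapest`), the 730-hour month, encoding the I/O ratio as an input share, and treating utilization as the busy fraction of billed hours are all assumptions, and every price and throughput number is made up.

```python
import math
from dataclasses import dataclass

# Assumption: an average month of 8760 / 12 = 730 hours.
HOURS_PER_MONTH = 730


@dataclass
class Workload:
    tokens_per_month: float  # input + output tokens
    input_share: float       # hypothetical encoding of the I/O ratio: 0.75 means 3:1
    utilization: float       # assumed: fraction of billed capacity actually busy, in (0, 1]

    def __post_init__(self) -> None:
        if not 0 < self.utilization <= 1:
            raise ValueError("utilization must be in (0, 1]")


@dataclass
class TokenPriced:
    """Pay-per-token API: cost depends only on volume and the I/O split."""
    name: str
    usd_per_m_input: float
    usd_per_m_output: float

    def monthly_cost(self, w: Workload) -> float:
        input_tokens = w.tokens_per_month * w.input_share
        output_tokens = w.tokens_per_month - input_tokens
        return (input_tokens * self.usd_per_m_input
                + output_tokens * self.usd_per_m_output) / 1e6


@dataclass
class HourlyCapacity:
    """Hourly GPU rental or dedicated-throughput unit (hypothetical model).

    Assumes each unit sustains `tokens_per_second` at full load and is
    billed for every hour of the month, busy or not.
    """
    name: str
    usd_per_hour: float
    tokens_per_second: float

    def monthly_cost(self, w: Workload) -> float:
        # Tokens one unit actually serves per month at the given utilization.
        per_unit = self.tokens_per_second * 3600 * HOURS_PER_MONTH * w.utilization
        # Capacity comes in whole units, so round up (and pay for at least one).
        units = max(1, math.ceil(w.tokens_per_month / per_unit))
        return units * self.usd_per_hour * HOURS_PER_MONTH


def cheapest(options, w: Workload):
    """Return (option, monthly_cost) for the lowest-cost option."""
    costs = [(o, o.monthly_cost(w)) for o in options]
    return min(costs, key=lambda pair: pair[1])


# Hypothetical prices and throughputs, for illustration only.
workload = Workload(tokens_per_month=2e9, input_share=0.75, utilization=0.5)
options = [
    TokenPriced("api-a", usd_per_m_input=0.60, usd_per_m_output=0.80),
    HourlyCapacity("gpu-b", usd_per_hour=2.00, tokens_per_second=1500),
    HourlyCapacity("dedicated-c", usd_per_hour=5.00, tokens_per_second=3000),
]
for o in options:
    print(f"{o.name}: ${o.monthly_cost(workload):,.2f}/month")
best, cost = cheapest(options, workload)
print(f"cheapest: {best.name}")
```

With these made-up numbers the token-priced API comes out cheapest ($1,300 per month against $2,920 and $3,650), largely because the capacity-based options are rounded up to whole units and billed for idle hours. At higher volume or utilization the ordering can flip, which is why the comparison has to be run against your own workload rather than headline prices.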
nfer covers 25+ providers, more than 100 models, 30+ hardware SKUs, and 600+ price points, kept in sync with each provider's published pricing. You can filter by sovereignty, certifications, region, license, quantization, and pricing type, and new providers and models are added regularly.