LLM Cost Explorer
Compare estimated API costs across 25 models from Anthropic, OpenAI, Google, Groq, and open source — at every max_tokens setting from 256 to 8192. Darker green means cheaper; red means expensive. Hover any cell for a full cost breakdown. Use the filters below to highlight models best suited to your use case.
Model use cases
| Model | Vendor | Cost | Best used for |
|---|---|---|---|
| claude-opus-4-7 | Anthropic | $$$$ | Complex reasoning, hard coding, deep research, agentic tasks, nuanced long-form writing, science 🔬 |
| claude-opus-4-6 | Anthropic | $$$$ | Complex reasoning and analysis, research, difficult coding, detailed document generation, science 🔬 |
| claude-sonnet-4-6 | Anthropic | $$$ | Balanced capability and cost — coding, business analysis, content creation, everyday tasks |
| claude-haiku-4-5 | Anthropic | $$ | Fast and cheap — classification, summarisation, chatbots, simple Q&A, high-volume pipelines |
| gpt-4o | OpenAI | $$$ | General purpose — strong coding, vision/image input, real-time applications, tool use |
| gpt-4o-mini | OpenAI | $ | Lightweight tasks — quick answers, classification, low-latency apps, cost-sensitive workloads |
| o3 | OpenAI | $$$$ | Deep reasoning — complex maths, advanced science, hard logic problems, rigorous research 🔬 |
| gemini-2.5-pro | Google | $$$ | Long context and multimodal — large document analysis, video/image understanding, research, science 🔬 |
| gemini-2.5-flash | Google | $ | Fast and efficient — summarisation, structured extraction, balanced everyday tasks |
| gemini-1.5-pro | Google | $$ | Very long documents — book-length context, large codebase analysis, extended conversations |
| llama-4-scout | Groq | $ | High-volume cheap inference — chatbots, simple generation, experimentation, prototyping |
| llama-4-maverick | Groq | $ | Moderate complexity — general Q&A, content drafting, open source flexibility |
| mixtral-8x7b | Groq | $ | Open source general purpose — moderate tasks, multilingual, good baseline for fine-tuning |
| Open Source / Self-hosted — $0 API cost (compute not included) | | | |
| llama-3.3-70b | Meta | Free* | Strong general-purpose open weights — coding, analysis, Q&A, content generation. Best balance of capability and size in Llama 3 family |
| llama-3.1-405b | Meta | Free* | Largest open weights Llama — complex reasoning, hard coding, long-form tasks, 128K context. Closest open source rival to GPT-4o |
| deepseek-r1 | DeepSeek | Free* | Open weights reasoning model — hard maths, advanced science 🔬, logic, competitive coding. Rivals o3 on many benchmarks at zero API cost |
| deepseek-v3 | DeepSeek | Free* | Strong open weights general/coding model — instruction following, code generation, analysis. Top-tier for a self-hosted model |
| qwen-2.5-72b | Qwen/Alibaba | Free* | Excellent open weights for coding and multilingual tasks — code generation, translation, structured output, STEM |
| mistral-7b | Mistral | Free* | Lightweight and fast open weights — simple tasks, classification, high-volume pipelines, prototyping. Very resource-efficient |
| phi-4 | Microsoft | Free* | Small but capable open weights — strong reasoning and Q&A despite compact size. Ideal for edge or embedded inference |
$ = under $1/1M output tokens · $$ = $1–$6/1M · $$$ = $6–$18/1M · $$$$ = over $18/1M output tokens · Free* = self-hosted, $0 API cost (compute/hosting costs apply)
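The legend above can be read as a simple threshold lookup. The following is a minimal sketch, not the chart's actual code: the function name `cost_tier` is hypothetical, and because the legend lists $6 and $18 in adjacent bands, the handling of those exact boundary values is an assumption.

```python
from typing import Optional


def cost_tier(output_usd_per_million: Optional[float]) -> str:
    """Map an output-token price (USD per 1M tokens) to the legend's tier symbol.

    Pass None for a self-hosted model with no API fee.
    """
    if output_usd_per_million is None:
        return "Free*"  # self-hosted: $0 API cost, compute/hosting still applies
    if output_usd_per_million < 1:
        return "$"
    if output_usd_per_million < 6:
        return "$$"  # assumption: exactly $6 is counted as $$$
    if output_usd_per_million <= 18:
        return "$$$"  # "over $18" is $$$$, so exactly $18 stays in $$$
    return "$$$$"


# Hypothetical prices, for illustration only
for price in (0.5, 5.0, 10.0, 40.0, None):
    print(price, cost_tier(price))
```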
Columns — max_tokens reference guide
| max_tokens | Approx. words | Typical use cases |
|---|---|---|
| 256 | ~190 words | Short answer, single paragraph, yes/no with brief explanation, quick translation |
| 512 | ~380 words | Half a page, short email reply, basic code function, brief summary |
| 1024 | ~750 words | Full page, detailed explanation, short code file, structured list response |
| 2048 | ~1,500 words | 2–3 pages, short report, medium code review, multi-step reasoning answer |
| 4096 | ~3,000 words | Long article, full code module, detailed analysis, complex creative writing |
| 8192 | ~6,000 words | Book chapter, large codebase review, extensive research response, full document draft |
1 token ≈ 0.75 words in English. Word counts are approximate and vary by content type. max_tokens sets the maximum response length — the model may respond with fewer tokens.
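As a rough check of the word counts in the table, the 0.75 words-per-token rule of thumb can be applied directly. This is a sketch under that single assumption; the helper name `approx_words` is hypothetical, and the table rounds its figures down to friendlier numbers.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text, from the note above


def approx_words(max_tokens: int) -> int:
    """Estimate the English word count that fits in max_tokens tokens."""
    return round(max_tokens * WORDS_PER_TOKEN)


for max_tokens in (256, 512, 1024, 2048, 4096, 8192):
    print(f"{max_tokens:>5} tokens is roughly {approx_words(max_tokens):>5} words")
```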
Assumptions used in this chart
| Assumption | Value |
|---|---|
| Input tokens per run | 500 tokens (typical prompt size) |
| Claude effort rows | max effort applies a 1.8× output token multiplier vs high (default) |
| Adaptive thinking rows | Adds an estimated +1200 thinking tokens (billed as output tokens) |
| Effort multipliers | Representative estimates — Anthropic does not publish exact per-effort token counts |
| Open source models | Shown as $0.00000 / Free — no API fee when self-hosted. Actual cost depends on your compute (GPU cloud, local hardware, etc.) |
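The assumptions above suggest how a single cell's estimate might be computed. The sketch below is illustrative rather than the chart's actual formula: it assumes the response uses the full `max_tokens` budget, that the 1.8× effort multiplier is applied before the 1200 thinking tokens are added, and the function name and example prices are hypothetical.

```python
INPUT_TOKENS = 500            # typical prompt size assumed by the chart
MAX_EFFORT_MULTIPLIER = 1.8   # max effort vs high (default), a representative estimate
THINKING_TOKENS = 1200        # estimated adaptive-thinking overhead, billed as output


def estimate_cost(
    max_tokens: int,
    input_usd_per_million: float,
    output_usd_per_million: float,
    max_effort: bool = False,
    adaptive_thinking: bool = False,
    input_tokens: int = INPUT_TOKENS,
) -> float:
    """Estimate the USD cost of one run, assuming the full max_tokens is used."""
    output_tokens = float(max_tokens)
    if max_effort:
        output_tokens *= MAX_EFFORT_MULTIPLIER  # assumption: applied first
    if adaptive_thinking:
        output_tokens += THINKING_TOKENS        # assumption: added afterwards
    input_cost = input_tokens * input_usd_per_million / 1_000_000
    output_cost = output_tokens * output_usd_per_million / 1_000_000
    return input_cost + output_cost


# Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens
print(f"${estimate_cost(1024, 3.0, 15.0):.5f}")
print(f"${estimate_cost(1024, 3.0, 15.0, max_effort=True):.5f}")
print(f"${estimate_cost(1024, 3.0, 15.0, adaptive_thinking=True):.5f}")
# Self-hosted model: no API fee, so the estimate is zero
print(f"${estimate_cost(1024, 0.0, 0.0):.5f}")
```

Because the model may stop before reaching `max_tokens`, an estimate of this kind is best read as an upper bound for each column.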
⚠️ DISCLAIMER
Prices shown are estimates based on publicly available information and may be outdated or incorrect.
Actual costs vary depending on your usage, prompt caching, batch discounts, free tiers, and each vendor’s current pricing.
Always verify current pricing directly with each vendor before making any financial or architectural decisions.
This chart is for indicative comparison purposes only and should not be treated as a source of truth.