LLM Cost Explorer


Compare estimated API costs across 25 models from Anthropic, OpenAI, Google, Groq, and open-source projects at every max_tokens setting from 256 to 8192. Darker green means cheaper; red means more expensive. Hover over any cell for a full cost breakdown, and use the use-case filters to highlight the models best suited to a particular workload.

Model use cases

Model	Vendor	Cost tier	Best used for
claude-opus-4-7	Anthropic	$$$$	Complex reasoning, hard coding, deep research, agentic tasks, nuanced long-form writing, science 🔬
claude-opus-4-6	Anthropic	$$$$	Complex reasoning and analysis, research, difficult coding, detailed document generation, science 🔬
claude-sonnet-4-6	Anthropic	$$$	Balanced capability and cost — coding, business analysis, content creation, everyday tasks
claude-haiku-4-5	Anthropic	$$	Fast and cheap — classification, summarisation, chatbots, simple Q&A, high-volume pipelines
gpt-4o	OpenAI	$$$	General purpose — strong coding, vision/image input, real-time applications, tool use
gpt-4o-mini	OpenAI	$	Lightweight tasks — quick answers, classification, low-latency apps, cost-sensitive workloads
o3	OpenAI	$$$$	Deep reasoning — complex maths, advanced science, hard logic problems, rigorous research 🔬
gemini-2.5-pro	Google	$$$	Long context and multimodal — large document analysis, video/image understanding, research, science 🔬
gemini-2.5-flash	Google	$	Fast and efficient — summarisation, structured extraction, balanced everyday tasks
gemini-1.5-pro	Google	$$	Very long documents — book-length context, large codebase analysis, extended conversations
llama-4-scout	Groq	$	High-volume cheap inference — chatbots, simple generation, experimentation, prototyping
llama-4-maverick	Groq	$	Moderate complexity — general Q&A, content drafting, open source flexibility
mixtral-8x7b	Groq	$	Open source general purpose — moderate tasks, multilingual, good baseline for fine-tuning
Open Source / Self-hosted — $0 API cost (compute not included)
llama-3.3-70b	Meta	Free*	Strong general-purpose open-weights model: coding, analysis, Q&A, content generation. Best balance of capability and size in the Llama 3 family
llama-3.1-405b	Meta	Free*	Largest open-weights Llama: complex reasoning, hard coding, long-form tasks, 128K context. Closest open-source rival to GPT-4o
deepseek-r1	DeepSeek	Free*	Open-weights reasoning model: hard maths, advanced science 🔬, logic, competitive coding. Rivals o3 on many benchmarks at zero API cost
deepseek-v3	DeepSeek	Free*	Strong open-weights general and coding model: instruction following, code generation, analysis. Top-tier for a self-hosted model
qwen-2.5-72b	Qwen/Alibaba	Free*	Excellent open-weights model for coding and multilingual tasks: code generation, translation, structured output, STEM
mistral-7b	Mistral	Free*	Lightweight, fast open-weights model: simple tasks, classification, high-volume pipelines, prototyping. Very resource-efficient
phi-4	Microsoft	Free*	Small but capable open-weights model: strong reasoning and Q&A despite its compact size. Ideal for edge or embedded inference

$ = under $1/1M output tokens · $$ = $1–$6/1M · $$$ = $6–$18/1M · $$$$ = over $18/1M output tokens · Free* = self-hosted, $0 API cost (compute/hosting costs apply)
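The tier legend is a simple threshold rule on the output-token price. The following is a minimal Python sketch of that rule. It assumes prices are given in USD per 1M output tokens. The function name `price_tier` is hypothetical. The legend does not say which tier an exact boundary value ($1, $6, $18) belongs to, so the boundary handling here is an assumption:

```python
from typing import Optional


def price_tier(output_usd_per_million: Optional[float]) -> str:
    """Map an output-token price (USD per 1M tokens) to the chart's cost tiers.

    None or 0 stands for a self-hosted open-weights model with no API fee.
    Assumption: an exact boundary ($6, $18) falls into the cheaper tier.
    """
    if not output_usd_per_million:  # None or 0.0: no API fee
        return "Free*"
    if output_usd_per_million < 1:
        return "$"
    if output_usd_per_million <= 6:
        return "$$"
    if output_usd_per_million <= 18:
        return "$$$"
    return "$$$$"


# Arbitrary example prices, not taken from any vendor's price list:
for price in (None, 0.6, 5.0, 15.0, 40.0):
    print(price, price_tier(price))
```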

Columns — max_tokens reference guide

max_tokens	Approx. words	Typical use cases
256	~190 words	Short answer, single paragraph, yes/no with brief explanation, quick translation
512	~380 words	Half a page, short email reply, basic code function, brief summary
1 024	~750 words	Full page, detailed explanation, short code file, structured list response
2 048	~1 500 words	2–3 pages, short report, medium code review, multi-step reasoning answer
4 096	~3 000 words	Long article, full code module, detailed analysis, complex creative writing
8 192	~6 000 words	Book chapter, large codebase review, extensive research response, full document draft

1 token ≈ 0.75 words in English. Word counts are approximate and vary by content type. max_tokens sets the maximum response length; the model may stop earlier, and API vendors bill for the output tokens actually generated, not for the max_tokens cap.
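You can reverse the 0.75 words-per-token rule of thumb to pick a max_tokens setting for a target response length. Below is a rough sketch. It assumes the six settings from the table above. The helper names `approx_words`, `tokens_needed`, and `smallest_setting` are hypothetical. Real token counts depend on each model's tokenizer, so treat the result as a starting point:

```python
import math

WORDS_PER_TOKEN = 0.75  # English rule of thumb from the guide; varies by content
SETTINGS = (256, 512, 1024, 2048, 4096, 8192)  # max_tokens columns in the chart


def approx_words(tokens: int) -> int:
    """Rough number of English words that fit in `tokens` output tokens."""
    return int(tokens * WORDS_PER_TOKEN)


def tokens_needed(words: int) -> int:
    """Rough number of output tokens for a `words`-word response, rounded up."""
    return math.ceil(words / WORDS_PER_TOKEN)


def smallest_setting(words: int) -> int:
    """Smallest chart column whose max_tokens should fit a `words`-word answer."""
    needed = tokens_needed(words)
    for setting in SETTINGS:
        if setting >= needed:
            return setting
    raise ValueError(f"{words} words needs ~{needed} tokens, above {SETTINGS[-1]}")


print(approx_words(1024))      # 768 (the table rounds this to ~750)
print(smallest_setting(300))   # 300 words ~ 400 tokens, so 512
```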

Assumptions used in this chart

Input tokens per run	500 tokens (typical prompt size)
Claude effort rows	Max effort applies a 1.8× output-token multiplier relative to high (the default)
Adaptive thinking rows	Adds an estimated 1 200 thinking tokens (billed as output tokens)
Effort multipliers	Representative estimates — Anthropic does not publish exact per-effort token counts
Open source models	Shown as $0.00000 / Free — no API fee when self-hosted. Actual cost depends on your compute (GPU cloud, local hardware, etc.)
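The assumptions above combine into a per-run estimate: input tokens times the input price, plus output tokens times the output price. The following is a minimal sketch under several assumptions:

- Each run uses the full max_tokens budget. This is a worst case, because the model may stop earlier.
- The max-effort multiplier scales the response tokens before the adaptive-thinking tokens are added.

The `Pricing` class, the `run_cost` function, and the prices in the example are hypothetical placeholders, not figures from the chart:

```python
from dataclasses import dataclass

# Chart assumptions (from the table above)
INPUT_TOKENS = 500               # typical prompt size per run
MAX_EFFORT_MULTIPLIER = 1.8      # max effort vs high (default); an estimate
ADAPTIVE_THINKING_TOKENS = 1200  # extra thinking tokens, billed as output


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-model prices in USD per 1M tokens (0.0 if self-hosted)."""
    input_per_million: float
    output_per_million: float


def run_cost(price: Pricing, max_tokens: int,
             max_effort: bool = False, adaptive_thinking: bool = False) -> float:
    """Estimated USD cost of one run, assuming the full max_tokens budget is used."""
    output_tokens = max_tokens * (MAX_EFFORT_MULTIPLIER if max_effort else 1.0)
    if adaptive_thinking:
        # Assumption: thinking tokens are added after the effort multiplier.
        output_tokens += ADAPTIVE_THINKING_TOKENS
    input_cost = INPUT_TOKENS * price.input_per_million / 1_000_000
    output_cost = output_tokens * price.output_per_million / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Placeholder prices for illustration only; check each vendor's pricing page.
    example = Pricing(input_per_million=3.0, output_per_million=15.0)
    for setting in (256, 512, 1024, 2048, 4096, 8192):
        print(f"max_tokens={setting:>4}: ${run_cost(example, setting):.5f}")
```

The example formats costs to five decimal places to match the chart's `$0.00000` display. A self-hosted model corresponds to `Pricing(0.0, 0.0)`, which always yields 0.0 and excludes compute costs, as the chart notes.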

⚠️ DISCLAIMER

Prices shown are estimates based on publicly available information and may be outdated or incorrect.

Actual costs vary depending on your usage, prompt caching, batch discounts, free tiers, and each vendor’s current pricing.

Always verify current pricing directly with each vendor before making any financial or architectural decisions.

This chart is for indicative comparison purposes only and should not be treated as a source of truth.
