Understanding AI Tokens — What They Cost and How to Optimize

AI & Tech · March 13, 2026 · 8 min read

If you've ever used the OpenAI or Anthropic API, you've seen charges measured in "tokens." But what exactly is a token, how much does it cost, and how can you keep your bill under control? This guide breaks it all down.

What Is a Token?

A token is the fundamental unit that AI language models use to process text. It's not exactly a word, and not exactly a character — it's somewhere in between.

For English text, 1 token is roughly 4 characters or about 0.75 words. That means:

- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words
- a 1,500-word article ≈ 2,000 tokens

Code tends to use more tokens per line because symbols like {, =, and // are often separate tokens. Non-English languages (especially CJK) may use more tokens per word.
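These ratios are only heuristics — exact counts come from the model's own tokenizer (e.g. OpenAI's tiktoken library) — but a rough estimator is often enough for budgeting. A minimal sketch using the ~4-characters-per-token rule of thumb:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text (~4 characters per token).

    Real counts require the model's own tokenizer; this heuristic
    is only for ballpark budgeting, and will undercount for code
    and for many non-English languages.
    """
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello, world!"))  # 3
```

Expect the estimate to drift by 10-20% against real tokenizer output, which is fine for cost planning but not for hard context-window limits.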

2025 AI Model Pricing

Pricing is always quoted per 1 million tokens (1M). Here's how the major models compare:

Model               Input $/1M   Output $/1M   Context Window
GPT-4o              $2.50        $10.00        128K
GPT-4o Mini         $0.15        $0.60         128K
Claude 3.5 Sonnet   $3.00        $15.00        200K
Claude 3 Haiku      $0.25        $1.25         200K
Gemini 1.5 Pro      $1.25        $5.00         1M
Gemini 1.5 Flash    $0.075       $0.30         1M

Output tokens always cost more than input tokens (4-5x more in the table above) because the model does more computational work when generating text than when reading it.

Real-World Cost Examples

Let's put these numbers in perspective. A typical chatbot exchange — say 1,000 input tokens and 500 output tokens — costs about $0.0075 on GPT-4o ($0.0025 in, $0.0050 out), but only $0.00045 on GPT-4o Mini. Summarizing a 50,000-token document into a 1,000-token summary runs about $0.135 on GPT-4o versus $0.0081 on GPT-4o Mini.
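To do this arithmetic programmatically, a small helper can hard-code the rates from the table above (the dictionary keys here are illustrative labels, not official API model IDs, and the prices will change over time):

```python
# USD per 1M tokens: (input rate, output rate), from the pricing table above.
PRICES = {
    "gpt-4o":            (2.50, 10.00),
    "gpt-4o-mini":       (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3-haiku":    (0.25, 1.25),
    "gemini-1.5-pro":    (1.25, 5.00),
    "gemini-1.5-flash":  (0.075, 0.30),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed per-1M-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A typical chat turn: 1,000 tokens in, 500 tokens out.
print(f"{request_cost('gpt-4o', 1000, 500):.4f}")       # 0.0075
print(f"{request_cost('gpt-4o-mini', 1000, 500):.5f}")  # 0.00045
```

Running the same workload through both models like this makes the 15-20x price gap between the premium and budget tiers concrete.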

7 Ways to Reduce Token Costs

  1. Use the cheapest model that works. GPT-4o Mini and Claude 3 Haiku handle 80% of tasks at 10x lower cost. Only upgrade to premium models for complex reasoning.
  2. Trim your prompts. Remove unnecessary context, examples, and verbose instructions. Every word costs money.
  3. Cache repeated prompts. If you're sending the same system prompt with every request, look into prompt caching (Anthropic and OpenAI both offer this).
  4. Limit output length. Set max_tokens to prevent the model from generating unnecessarily long responses.
  5. Use structured output. Requesting JSON instead of prose typically produces shorter, more efficient responses.
  6. Batch similar requests. Process multiple items in a single API call instead of one call per item.
  7. Monitor usage. Set up billing alerts and track token usage per feature to find optimization opportunities.
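Tip 7 is easy to prototype in-house. A sketch of a per-feature usage tracker (the class and feature names are illustrative; in practice the token counts would come from the usage metadata most AI APIs return with each response):

```python
from collections import defaultdict

class UsageTracker:
    """Accumulates token usage per feature so costs can be attributed."""

    def __init__(self, input_rate: float, output_rate: float):
        # Rates in USD per 1M tokens.
        self.input_rate = input_rate
        self.output_rate = output_rate
        self.totals = defaultdict(lambda: [0, 0])  # feature -> [in, out]

    def record(self, feature: str, input_tokens: int, output_tokens: int) -> None:
        self.totals[feature][0] += input_tokens
        self.totals[feature][1] += output_tokens

    def cost(self, feature: str) -> float:
        """USD spent on one feature so far."""
        tin, tout = self.totals[feature]
        return (tin * self.input_rate + tout * self.output_rate) / 1_000_000

tracker = UsageTracker(input_rate=2.50, output_rate=10.00)  # GPT-4o rates
tracker.record("summarize", 50_000, 1_000)
tracker.record("chat", 1_000, 500)
print(f"{tracker.cost('summarize'):.3f}")  # 0.135
```

Even a simple breakdown like this usually reveals that one or two features dominate the bill, which tells you where to apply tips 1-6 first.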

Context Windows Explained

The context window is the maximum number of tokens a model can process in a single request — this includes both your input and the model's output combined.

A 128K context window means roughly 96,000 words (about 300 pages). Gemini's 1M context can handle entire codebases or book-length documents in one go.

Larger context windows give you flexibility, but filling them is expensive — a full 1M-token Gemini 1.5 Pro prompt costs $1.25 in input alone. For most tasks, you'll never need more than 16K-32K tokens.
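Because input and output share the window, it's worth checking the budget before making a call. A minimal sketch (the 128K default matches GPT-4o's window from the table; the function name and parameters are illustrative):

```python
def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int = 128_000) -> bool:
    """True if the input plus the planned output budget fits the window."""
    return input_tokens + max_output_tokens <= context_window

print(fits_context(120_000, 4_000))  # True  (124,000 <= 128,000)
print(fits_context(126_000, 4_000))  # False (130,000 > 128,000)
```

When the check fails, the usual fixes are truncating or summarizing the input, or switching to a larger-window model like Gemini 1.5 Pro.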

Check Your Token Count

Paste your prompt to see token count and cost estimate across all major AI models.

Try AI Token Counter →

Key Takeaways

- Tokens are the billing unit: roughly 4 characters or 0.75 words each in English.
- Output tokens cost several times more than input tokens, so cap response length where you can.
- The cheapest capable model (GPT-4o Mini, Claude 3 Haiku, Gemini 1.5 Flash) is usually the right default; reserve premium models for complex reasoning.
- Trim prompts, cache repeated system prompts, batch requests, and track usage per feature to keep costs down.