Understanding AI Tokens — What They Cost and How to Optimize
If you've ever used the OpenAI or Anthropic API, you've seen charges measured in "tokens." But what exactly is a token, how much does it cost, and how can you keep your bill under control? This guide breaks it all down.
What Is a Token?
A token is the fundamental unit that AI language models use to process text. It's not exactly a word, and not exactly a character — it's somewhere in between.
For English text, 1 token is roughly 4 characters or about 0.75 words. That means:
- "Hello" = 1 token
- "I love programming" = 3 tokens
- A typical paragraph (100 words) = ~130 tokens
- A full page (500 words) = ~650 tokens
- A short book (~75,000 words) = ~100,000 tokens
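The rules of thumb above can be wrapped in a quick estimator. This is only a heuristic (the function names are mine); billing-accurate counts come from the provider's own tokenizer, such as OpenAI's tiktoken library.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text: ~4 characters per token."""
    return max(1, round(len(text) / 4))

def estimate_tokens_by_words(word_count: int) -> int:
    """Alternative estimate: ~0.75 words per token."""
    return round(word_count / 0.75)
```

For example, `estimate_tokens_by_words(100)` lands near the ~130-token paragraph figure above.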
Code tends to use more tokens per line because symbols like {, =, and // are often separate tokens. Non-English languages (especially CJK) may use more tokens per word.
2025 AI Model Pricing
Pricing is always quoted per 1 million tokens (1M). Here's how the major models compare:
| Model | Input $/1M | Output $/1M | Context Window |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4o Mini | $0.15 | $0.60 | 128K |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
| Claude 3 Haiku | $0.25 | $1.25 | 200K |
| Gemini 1.5 Pro | $1.25 | $5.00 | 1M |
| Gemini 1.5 Flash | $0.075 | $0.30 | 1M |
Output tokens cost more than input tokens (typically 3-5x) because generation is sequential: the model produces output one token at a time, whereas input tokens are processed in a single parallel pass.
Real-World Cost Examples
Let's put these numbers in perspective:
- Summarizing a 10-page document with GPT-4o: ~3,000 input tokens + ~500 output tokens = $0.0125 (~1 cent)
- 100 customer support chats per day with Claude 3 Haiku: ~$0.50/day or $15/month
- Code review of a 500-line file with GPT-4o: ~2,000 tokens in + ~1,000 out = $0.015 (~1.5 cents)
- Processing 10,000 product descriptions with Gemini Flash: ~$0.50 total
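These worked examples are straightforward to reproduce. A minimal sketch, with per-1M rates copied from the pricing table above (the model keys are my own labels, not official API model IDs):

```python
# Per-1M-token prices (USD) from the pricing table above.
PRICES = {
    "gpt-4o":            {"input": 2.50,  "output": 10.00},
    "gpt-4o-mini":       {"input": 0.15,  "output": 0.60},
    "claude-3.5-sonnet": {"input": 3.00,  "output": 15.00},
    "claude-3-haiku":    {"input": 0.25,  "output": 1.25},
    "gemini-1.5-pro":    {"input": 1.25,  "output": 5.00},
    "gemini-1.5-flash":  {"input": 0.075, "output": 0.30},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-1M-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```

`request_cost("gpt-4o", 3000, 500)` reproduces the $0.0125 document-summary figure above.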
7 Ways to Reduce Token Costs
- Use the cheapest model that works. GPT-4o Mini and Claude 3 Haiku handle 80% of tasks at a tenth of the cost or less. Only upgrade to premium models for complex reasoning.
- Trim your prompts. Remove unnecessary context, examples, and verbose instructions. Every word costs money.
- Cache repeated prompts. If you're sending the same system prompt with every request, look into prompt caching (Anthropic and OpenAI both offer this).
- Limit output length. Set `max_tokens` to prevent the model from generating unnecessarily long responses.
- Use structured output. Requesting JSON instead of prose typically produces shorter, more efficient responses.
- Batch similar requests. Process multiple items in a single API call instead of one call per item.
- Monitor usage. Set up billing alerts and track token usage per feature to find optimization opportunities.
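Two of these tips, capping output length and batching, show up directly in how you construct a request. A hedged sketch using an OpenAI Chat Completions-style payload (payload construction only, no network call; the helper names are mine):

```python
def build_request(prompt: str, model: str = "gpt-4o-mini", max_tokens: int = 200) -> dict:
    """Chat Completions-style payload with a hard cap on billable output tokens."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # model stops generating once this cap is hit
    }

def batch_prompt(items: list[str], instruction: str) -> str:
    """Fold many small items into one request instead of one API call each,
    so the shared instruction is paid for once rather than N times."""
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(items, 1))
    return f"{instruction}\nAnswer for each numbered item:\n{numbered}"
```

Batching 50 product descriptions into one call pays for the instruction once instead of 50 times.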
Context Windows Explained
The context window is the maximum number of tokens a model can process in a single request — this includes both your input and the model's output combined.
A 128K context window means roughly 96,000 words, or about 190 pages at the 500-words-per-page figure above. Gemini's 1M context can handle entire codebases or book-length documents in one go.
Larger context windows give more flexibility but cost more per request. For most tasks, you'll never need more than 16K-32K tokens.
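Because input and output share the window, a pre-flight check is just an addition. A minimal sketch (the 128K default matches the GPT-4o row in the table; the function name is mine):

```python
def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int = 128_000) -> bool:
    """Input tokens and requested output must fit in the window together."""
    return prompt_tokens + max_output_tokens <= context_window
```

A 126K-token prompt that requests 4K output tokens will fail this check on a 128K model even though the prompt alone fits.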
Check Your Token Count
Paste your prompt to see token count and cost estimate across all major AI models.
Try AI Token Counter →
Key Takeaways
- 1 token ≈ 4 characters ≈ 0.75 words for English text
- Output tokens cost 3-5x more than input tokens
- Start with cheaper models and only upgrade when needed
- Trim prompts, cache system messages, and limit output length to save money
- Use our free AI Token Counter to estimate costs before making API calls