Understanding AI Tokens — What They Cost and How to Optimize

AI & Tech · March 13, 2026 · 8 min read

If you've ever used the OpenAI or Anthropic API, you've seen charges measured in "tokens." But what exactly is a token, how much does it cost, and how can you keep your bill under control? This guide breaks it all down.

What Is a Token?

A token is the fundamental unit that AI language models use to process text. It's not exactly a word, and not exactly a character — it's somewhere in between.

For English text, 1 token is roughly 4 characters or about 0.75 words. That means:

- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words
- a 1,500-word article ≈ 2,000 tokens

Code tends to use more tokens per line because symbols like {, =, and // are often separate tokens. Non-English languages (especially CJK) may use more tokens per word.
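These ratios are only heuristics — exact counts come from the model's own tokenizer (e.g. OpenAI's tiktoken library) — but a rough estimator is often enough for budgeting. A minimal sketch using the ~4-characters-per-token rule of thumb:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text (~4 characters per token).

    Real counts require the model's own tokenizer; this heuristic
    is only for ballpark budgeting, and will undercount for code
    and for many non-English languages.
    """
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello, world!"))  # 3
```

Expect the estimate to drift by 10-20% against real tokenizer output, which is fine for cost planning but not for hard context-window limits.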

2025 AI Model Pricing

Pricing is always quoted per 1 million tokens (1M). Here's how the major models compare:

Model               Input $/1M   Output $/1M   Context Window
GPT-4o              $2.50        $10.00        128K
GPT-4o Mini         $0.15        $0.60         128K
Claude 3.5 Sonnet   $3.00        $15.00        200K
Claude 3 Haiku      $0.25        $1.25         200K
Gemini 1.5 Pro      $1.25        $5.00         1M
Gemini 1.5 Flash    $0.075       $0.30         1M

Output tokens always cost more than input tokens (4-5x more in the table above) because the model does more computational work when generating text than when reading it.

Real-World Cost Examples

Let's put these numbers in perspective. A typical chatbot exchange — say 1,000 input tokens and 500 output tokens — costs about $0.0075 on GPT-4o ($0.0025 in, $0.0050 out), but only $0.00045 on GPT-4o Mini. Summarizing a 50,000-token document into a 1,000-token summary runs about $0.135 on GPT-4o versus $0.0081 on GPT-4o Mini.
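To do this arithmetic programmatically, a small helper can hard-code the rates from the table above (the dictionary keys here are illustrative labels, not official API model IDs, and the prices will change over time):

```python
# USD per 1M tokens: (input rate, output rate), from the pricing table above.
PRICES = {
    "gpt-4o":            (2.50, 10.00),
    "gpt-4o-mini":       (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3-haiku":    (0.25, 1.25),
    "gemini-1.5-pro":    (1.25, 5.00),
    "gemini-1.5-flash":  (0.075, 0.30),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed per-1M-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A typical chat turn: 1,000 tokens in, 500 tokens out.
print(f"{request_cost('gpt-4o', 1000, 500):.4f}")       # 0.0075
print(f"{request_cost('gpt-4o-mini', 1000, 500):.5f}")  # 0.00045
```

Running the same workload through both models like this makes the 15-20x price gap between the premium and budget tiers concrete.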

7 Ways to Reduce Token Costs

  1. Use the cheapest model that works. GPT-4o Mini and Claude 3 Haiku handle 80% of tasks at 10x lower cost. Only upgrade to premium models for complex reasoning.
  2. Trim your prompts. Remove unnecessary context, examples, and verbose instructions. Every word costs money.
  3. Cache repeated prompts. If you're sending the same system prompt with every request, look into prompt caching (Anthropic and OpenAI both offer this).
  4. Limit output length. Set max_tokens to prevent the model from generating unnecessarily long responses.
  5. Use structured output. Requesting JSON instead of prose typically produces shorter, more efficient responses.
  6. Batch similar requests. Process multiple items in a single API call instead of one call per item.
  7. Monitor usage. Set up billing alerts and track token usage per feature to find optimization opportunities.
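Tip 7 is easy to prototype in-house. A sketch of a per-feature usage tracker (the class and feature names are illustrative; in practice the token counts would come from the usage metadata most AI APIs return with each response):

```python
from collections import defaultdict

class UsageTracker:
    """Accumulates token usage per feature so costs can be attributed."""

    def __init__(self, input_rate: float, output_rate: float):
        # Rates in USD per 1M tokens.
        self.input_rate = input_rate
        self.output_rate = output_rate
        self.totals = defaultdict(lambda: [0, 0])  # feature -> [in, out]

    def record(self, feature: str, input_tokens: int, output_tokens: int) -> None:
        self.totals[feature][0] += input_tokens
        self.totals[feature][1] += output_tokens

    def cost(self, feature: str) -> float:
        """USD spent on one feature so far."""
        tin, tout = self.totals[feature]
        return (tin * self.input_rate + tout * self.output_rate) / 1_000_000

tracker = UsageTracker(input_rate=2.50, output_rate=10.00)  # GPT-4o rates
tracker.record("summarize", 50_000, 1_000)
tracker.record("chat", 1_000, 500)
print(f"{tracker.cost('summarize'):.3f}")  # 0.135
```

Even a simple breakdown like this usually reveals that one or two features dominate the bill, which tells you where to apply tips 1-6 first.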

Context Windows Explained

The context window is the maximum number of tokens a model can process in a single request — this includes both your input and the model's output combined.

A 128K context window means roughly 96,000 words (about 300 pages). Gemini's 1M context can handle entire codebases or book-length documents in one go.

Larger context windows give you flexibility, but filling them is expensive — a full 1M-token Gemini 1.5 Pro prompt costs $1.25 in input alone. For most tasks, you'll never need more than 16K-32K tokens.
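Because input and output share the window, it's worth checking the budget before making a call. A minimal sketch (the 128K default matches GPT-4o's window from the table; the function name and parameters are illustrative):

```python
def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int = 128_000) -> bool:
    """True if the input plus the planned output budget fits the window."""
    return input_tokens + max_output_tokens <= context_window

print(fits_context(120_000, 4_000))  # True  (124,000 <= 128,000)
print(fits_context(126_000, 4_000))  # False (130,000 > 128,000)
```

When the check fails, the usual fixes are truncating or summarizing the input, or switching to a larger-window model like Gemini 1.5 Pro.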

Check Your Token Count

Paste your prompt to see token count and cost estimate across all major AI models.

Try AI Token Counter →

Key Takeaways

- Tokens are the billing unit: roughly 4 characters or 0.75 words each in English.
- Output tokens cost several times more than input tokens, so cap response length where you can.
- The cheapest capable model (GPT-4o Mini, Claude 3 Haiku, Gemini 1.5 Flash) is usually the right default; reserve premium models for complex reasoning.
- Trim prompts, cache repeated system prompts, batch requests, and track usage per feature to keep costs down.