
How to count LLM tokens for free, and stop overpaying for API calls

Tokens, not words, decide what an LLM costs and whether your text fits. Here's how to count GPT tokens exactly, estimate cost per model, and trim or split text when it doesn't fit, all in your browser.

If you build anything on top of GPT, Claude or Gemini, two questions come up constantly: how much will this prompt cost, and will it even fit? Both are answered by the same number, the token count, and most people are guessing at it. Here is how to count tokens exactly, for free, without sending your text anywhere.

Tokens are not words

A token is a chunk of text the model treats as a unit, usually a short word or a piece of one. As a rough rule, English runs about 4 characters per token, or roughly ¾ of a word. But it's lumpy: common words are a single token, rare words split into several, and code, punctuation, emoji and non-Latin scripts all change the ratio. That's why "estimate by word count" is unreliable, especially near a context limit where being off by 10% means a failed call.
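
To make the rule of thumb concrete, here is a minimal sketch of a character-based estimate. The function names and the 10% safety margin are illustrative assumptions, and the 4-characters-per-token figure only holds loosely for English prose.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (English prose only)."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def probably_fits(text: str, limit: int, margin: float = 0.10) -> bool:
    """Assume the estimate may be `margin` too low before comparing to the limit."""
    return estimate_tokens(text) * (1 + margin) <= limit


prompt = "Summarize the attached report in three bullet points."
print(estimate_tokens(prompt), probably_fits(prompt, limit=8_000))
```

An estimate like this is fine for a quick sanity check far below the limit. Close to the limit, or for code and non-Latin text, it is not a substitute for running the real tokenizer.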

Count them exactly, in your browser

The LLM Token Counter runs the real GPT tokenizer (the same byte-pair encoding OpenAI uses) directly in your tab. Paste any text and you get the exact GPT token count, plus words, characters and lines. Because it's the actual tokenizer rather than a heuristic, the GPT number is exact; Claude, Gemini and Llama are shown as close approximations, since the tool runs only the GPT tokenizer and each of those models uses its own.
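
If you want the same exact count in a script rather than a browser tab, a rough equivalent is possible with OpenAI's open-source `tiktoken` package. This is a sketch, not how the counter itself is implemented: the helper name `count_gpt_tokens` is hypothetical, `o200k_base` is assumed here because it is the encoding used by GPT-4o, and the function returns `None` when the package or its encoding data is unavailable.

```python
from typing import Optional


def count_gpt_tokens(text: str, encoding_name: str = "o200k_base") -> Optional[int]:
    """Exact token count under the named encoding, or None if it cannot be loaded."""
    try:
        import tiktoken  # third-party package: pip install tiktoken

        encoding = tiktoken.get_encoding(encoding_name)
    except Exception:
        # tiktoken is not installed, or the encoding data could not be fetched.
        return None
    # disallowed_special=() counts strings such as "<|endoftext|>" as ordinary
    # text instead of raising an error, which suits pasted documents.
    return len(encoding.encode(text, disallowed_special=()))


print(count_gpt_tokens("Tokens, not words, decide what an LLM costs."))
```

Older GPT-4 and GPT-3.5 models use the `cl100k_base` encoding instead, so pass the encoding that matches the model you actually call.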

Nothing you paste is uploaded and there is no API call behind it, which matters when the text you're measuring is a customer export or an unreleased draft.

Turn tokens into dollars

Token count alone doesn't tell you the bill. The counter multiplies your tokens by each model's published input price, so you can see at a glance that the same prompt might be a fraction of a cent on GPT-4o mini and meaningfully more on a frontier model. If you call an API in a loop, that per-call difference is the difference between a viable feature and a surprise invoice. It also estimates a round-trip cost assuming the reply is about the same size as the input, a decent back-of-envelope for chat-style usage.
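
The arithmetic behind that estimate can be sketched as follows. The prices in the example are placeholders, not real rates, and the function names are illustrative; check each provider's pricing page for current per-million-token figures.

```python
def input_cost(tokens: int, usd_per_million_input: float) -> float:
    """Cost of sending `tokens` input tokens at a per-million-token price."""
    return tokens * usd_per_million_input / 1_000_000


def round_trip_cost(
    tokens: int, usd_per_million_input: float, usd_per_million_output: float
) -> float:
    """Input cost plus output cost, assuming the reply is as long as the input."""
    return tokens * (usd_per_million_input + usd_per_million_output) / 1_000_000


# Placeholder prices (USD per million tokens), for illustration only.
prompt_tokens = 2_000
per_call = round_trip_cost(prompt_tokens, 0.5, 1.5)
calls_per_day = 10_000
print(f"per call: ${per_call:.4f}, per day: ${per_call * calls_per_day:.2f}")
```

Multiplying by the number of calls is the step that matters for loops: a cost that rounds to zero per call can still add up over thousands of calls a day.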

Will it fit the context window?

Each model has a maximum context: 128K tokens for GPT-4o, 200K for Claude, and up to a million for Gemini 1.5 Pro. The counter shows a green dot when your text fits a given model and red when it doesn't, so you can pick the right model for the job instead of discovering the ceiling the hard way.
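
The same green/red check is easy to reproduce in code. This sketch uses the limits quoted above; the `reserved_for_reply` parameter is an added assumption, reflecting that the reply generally has to fit in the same window as the prompt.

```python
from typing import Dict, List

# Context limits as quoted in this post; verify against current provider docs.
CONTEXT_LIMITS: Dict[str, int] = {
    "GPT-4o": 128_000,
    "Claude": 200_000,
    "Gemini 1.5 Pro": 1_000_000,
}


def models_that_fit(token_count: int, reserved_for_reply: int = 0) -> List[str]:
    """Names of models whose context window holds the prompt plus the reply budget."""
    needed = token_count + reserved_for_reply
    return [name for name, limit in CONTEXT_LIMITS.items() if needed <= limit]


print(models_that_fit(150_000))
print(models_that_fit(127_000, reserved_for_reply=4_000))
```

Leaving headroom for the reply avoids the case where a prompt technically fits but the model has no room left to answer.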

When it doesn't fit

Two moves, depending on whether you can afford to lose detail:

  • You only need the gist. Run the text through the Text Summarizer to cut it to its key sentences, often a 60-80% reduction, then re-check the count.
  • You need all of it. Use the Text Chunker to split the text into token-bounded pieces with overlap, ideal for embeddings and retrieval-augmented generation where each chunk is stored and searched separately.
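
For the second move, splitting into token-bounded pieces with overlap can be sketched as a sliding window. This is an illustration of the general technique, not the Text Chunker's actual implementation; the function name is hypothetical, and the usage example splits on whitespace as a stand-in for real tokenizer output.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], max_tokens: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of at most max_tokens, each sharing
    `overlap` tokens with the previous window."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be at least 0 and smaller than max_tokens")
    step = max_tokens - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


# Whitespace-separated words stand in for tokens in this example.
words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_tokens(words, max_tokens=4, overlap=1):
    print(" ".join(chunk))
```

The overlap means a sentence that straddles a boundary appears in both neighbouring chunks, which helps retrieval find it whichever chunk is matched.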

A quick workflow

Starting from a file rather than plain text? Convert it first, with PDF to AI-ready text or Office to Text, both of which show the token count and per-model cost on the spot. Then count, trim if needed, and only then send. It takes under a minute and turns "I hope this fits and isn't too pricey" into a number you can see.
