When programmatically using an AI chatbot API, it is easy to run up big bills. To avoid this, monitor token usage carefully, but resist the urge to estimate it by simply dividing character count by a fixed ratio. That estimate will be inaccurate unless all of the following are true (a sketch of such an estimator follows this list):
- You use the same tokenizer. For example, Claude 3.5 Sonnet and Claude 3.5 Haiku share a tokenizer, but each vendor uses its own.
- Your inputs always have the same proportion of text and images. If they are always text, that is a good start. If each is a short text plus an image of the same dimensions, that might work.
- Your input prompts have low diversity. If some prompts are in English and others in Japanese, the ratios will be wrong. Likewise, if some are mostly English while others are mostly source code, your estimates will be poor.
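If those conditions do hold, the estimator is just division by an average characters-per-token ratio. A minimal sketch (the function name and the 3.5 default are my assumptions; the ratio is plausible for typical English prose, and as the table below shows, it does not transfer to code, CJK text, or images):

```python
def estimate_tokens(text: str, chars_per_token: float = 3.5) -> int:
    """Rough token estimate from character count.

    Only valid under the caveats listed above; the 3.5 default is an
    assumed English-prose ratio, not a universal constant.
    """
    return round(len(text) / chars_per_token)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # ~13
```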
Token counts for various documents
To make this more concrete, I tested two tokenizers across a variety of prompts:
| Text type | Characters | Amazon Nova Lite tokens | Nova chars/token | Claude 3 Haiku tokens | Claude vs. Nova tokens |
|---|---:|---:|---:|---:|---:|
| Python code | 14,914 | 4,331 | 3.4 | 4,648 | +7% |
| News article in English | 6,555 | 6,034 | 1.1 | 6,104 | +1% |
| Privacy policy in English and HTML | 90,179 | 31,436 | 2.9 | 32,937 | +5% |
| JPEG photo, width 500 px | 74,660 | 970 | 77.0 | 503 | -48% |
| Wikipedia article in Japanese | 1,793 | 2,754 | 0.7 | 2,014 | -27% |
| Different Wikipedia article in Chinese | 2,715 | 5,717 | 0.5 | 4,662 | -18% |
| Linux /var/log | 8,543 | 3,942 | 2.2 | 3,503 | -11% |
| Minified JavaScript | 20,856 | 9,294 | 2.2 | 9,366 | +1% |
| "The Raven" by Poe | 6,838 | 1,628 | 4.2 | 1,900 | +17% |
| Russian poem in Cyrillic | 591 | 277 | 2.1 | 274 | -1% |
| SQL DML | 9,265 | 5,843 | 1.6 | 4,851 | -17% |
| Assembly code | 10,621 | 4,265 | 2.5 | 4,842 | +14% |
| UTF-8 sampler | 10,467 | 5,338 | 2.0 | 5,979 | +12% |
| Thai Wikipedia article | 7,645 | 6,223 | 1.2 | 6,592 | +6% |
| Bengali text | 7,191 | 6,119 | 1.2 | 6,624 | +8% |
| Microsoft policy in Korean | 6,945 | 6,775 | 1.0 | 6,348 | -6% |
| Rare English words | 9,157 | 1,995 | 4.6 | 2,339 | +17% |
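The derived columns are simple arithmetic over the measured counts. For example, for the Python code row:

```python
# Reproduce the derived columns for the "Python code" row of the table above.
chars, nova_tokens, claude_tokens = 14_914, 4_331, 4_648

print(f"Nova chars/token: {chars / nova_tokens:.1f}")  # 3.4
print(f"Claude vs. Nova:  {(claude_tokens - nova_tokens) / nova_tokens:+.0%}")  # +7%
```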
Your results may vary, so run your own tests. Tip: your API vendor's "playground" is a quick way to benchmark token counts without writing any code.
Correct code
Here is example code that reads accurate token counts from an Amazon Bedrock Converse API response.
```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# full_model_id, conversation, max_tokens, temperature, and top_p
# are defined elsewhere in your application.
response = bedrock.converse(
    modelId=full_model_id,
    messages=conversation,
    inferenceConfig={"maxTokens": max_tokens, "temperature": temperature, "topP": top_p},
)

output_text = response["output"]["message"]["content"][0]["text"]
output_characters = len(output_text)

# The usage field reports the exact token counts you are billed for.
input_tokens = response["usage"]["inputTokens"]
output_tokens = response["usage"]["outputTokens"]
```
For OpenAI ChatGPT, check the usage field in the JSON structure.
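A minimal sketch with the openai Python SDK (the model name is illustrative, and an OPENAI_API_KEY environment variable is assumed):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": "The quick brown fox jumps over the lazy dog."}],
)

# The usage field reports the exact token counts you are billed for.
usage = response.usage
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
```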
And here is example code for Google Gemini:
```python
from google import genai

client = genai.Client()
prompt = "The quick brown fox jumps over the lazy dog."

# Count tokens before sending the request, using the client method.
total_tokens = client.models.count_tokens(
    model="gemini-2.0-flash", contents=prompt
)
print("total_tokens:", total_tokens)
# e.g., total_tokens: 10

response = client.models.generate_content(
    model="gemini-2.0-flash", contents=prompt
)

# usage_metadata provides detailed token counts for the actual call.
print(response.usage_metadata)
# e.g., prompt_token_count: 11, candidates_token_count: 73, total_token_count: 84
```