Token
When you type a message to an AI, it does not read your words the way you do. It breaks your text into smaller pieces called tokens. A token might be a whole word (“hello”), part of a word (“un” + “der” + “stand”), or a single character. The AI thinks in tokens, not words. This matters because AI pricing, speed, and memory limits are all measured in tokens.
A token is the atomic unit of text processed by a language model. Tokenization converts raw text into a sequence of integer IDs from a fixed vocabulary (typically 32K-100K entries) using subword algorithms.
Tokenization algorithms:
- Byte Pair Encoding (BPE): iteratively merges the most frequent adjacent symbol pairs into new vocabulary entries. Used by GPT models and Claude.
- SentencePiece: a language-agnostic tokenizer that operates on raw text without requiring whitespace pre-tokenization (it can use BPE or unigram models internally). Used by LLaMA, T5.
- WordPiece: similar to BPE but uses likelihood maximization. Used by BERT.
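The BPE merge step described above can be sketched in a few lines of plain Python. This is a minimal illustration of the training loop (learning merges from a toy corpus), not any production tokenizer; the function name `learn_bpe_merges` is made up for this example.

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    # Start with each word as a tuple of single characters,
    # weighted by how often the word appears in the corpus.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with one merged symbol.
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab
```

On the classic toy corpus ["low", "low", "lower", "newest", "newest", "newest"], the first learned merge is ("w", "e"), because "we" is the most frequent adjacent pair. Real tokenizers run thousands of merges over byte-level input.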
Token characteristics:
- Common English words are typically 1 token (“the”, “hello”, “code”)
- Less common words are split into subwords (“tokenization” = “token” + “ization”)
- Numbers are often split into several tokens (“2026” = “20” + “26” or “2” + “0” + “2” + “6”)
- Punctuation is usually its own token; a leading space is typically folded into the following word’s token rather than standing alone
- Code is tokenized differently than prose (variable names, operators, indentation)
Rough approximation: in English, 1 token is roughly 4 characters or 0.75 words, so 100 tokens is about 75 words.
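That rule of thumb is easy to turn into a quick estimator. This is only a heuristic sketch (the name `estimate_tokens` is invented here); real counts vary by tokenizer and by language, so use an actual tokenizer when precision matters.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English prose: ~4 characters per token."""
    return max(1, round(len(text) / 4))

# "The quick brown fox" is 19 characters, so roughly 5 tokens.
```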
Why tokens matter:
- Context window: measured in tokens (e.g., 200K tokens). Determines how much text the model can process at once.
- Pricing: API costs are per-token (input and output priced separately)
- Speed: generation speed is measured in tokens per second
- Limits: max output length is a token count, not a word count
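Since pricing is per token with input and output priced separately, a cost estimate is a two-term calculation. A minimal sketch, assuming illustrative prices of $3 per million input tokens and $15 per million output tokens (check your provider's current price sheet; `estimate_cost` is a hypothetical helper name):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float = 3.00,
                  output_price_per_m: float = 15.00) -> float:
    """USD cost of one request; prices are per million tokens (illustrative)."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# A 1000-token prompt with a 500-token response:
# $0.003 input + $0.0075 output = $0.0105
```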
Tokenization in practice
# Using the tiktoken library (OpenAI tokenizer)
import tiktoken
enc = tiktoken.get_encoding("cl100k_base")
text = "Kubernetes orchestrates containerized applications."
tokens = enc.encode(text)
print(f"Text: {text}")
print(f"Tokens: {tokens}")
print(f"Token count: {len(tokens)}")
print(f"Decoded: {[enc.decode([t]) for t in tokens]}")
# Output:
# Text: Kubernetes orchestrates containerized applications.
# Tokens: [42, 13789, 68898, 9059, 56930, 8522, 13]
# Token count: 7
# Decoded: ['K', 'ubernetes', ' orchestrates', ' container', 'ized', ' applications', '.']
# Cost calculation example:
# Claude Sonnet: $3/M input tokens, $15/M output tokens
# 1000-token prompt + 500-token response = $0.003 + $0.0075 ≈ $0.0105

Tokens are the currency of the AI world. Every API call, every chatbot response, every code completion is measured and billed in tokens. Understanding tokenization helps you write more efficient prompts (shorter prompts cost less and leave more room for output), estimate costs for production AI features, and understand why models sometimes produce unexpected behavior at word boundaries.

When building AI-powered applications, token counting is essential for staying within context window limits, and long documents must be chunked into token-sized pieces for processing. The tokenizer’s vocabulary also explains why models handle English better than many other languages: common English words often map to single tokens, while characters in languages like Chinese, Japanese, or Arabic may require multiple tokens each.
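Chunking a long document by token count can be written generically over any encode/decode pair. A sketch under that assumption: with tiktoken you could pass `enc.encode` and a wrapper around `enc.decode`; the function and parameter names here are invented for illustration.

```python
def chunk_by_tokens(text, encode, decode, max_tokens, overlap=0):
    """Split text into pieces of at most max_tokens tokens each.

    encode: str -> list of tokens; decode: list of tokens -> str.
    overlap tokens are repeated at the start of each chunk for context.
    """
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    ids = encode(text)
    step = max_tokens - overlap
    return [decode(ids[start:start + max_tokens])
            for start in range(0, len(ids), step)]

# Demo with a toy whitespace "tokenizer" instead of a real one:
chunk_by_tokens("a b c d e f", str.split, " ".join, max_tokens=2)
# -> ['a b', 'c d', 'e f']
```

Overlapping chunks (overlap > 0) are a common choice for retrieval pipelines, so that a sentence straddling a chunk boundary still appears whole in at least one chunk.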