Temperature
Temperature is a dial that controls how creative or predictable an AI’s responses are. At temperature 0, the AI always picks the most likely word, giving you the same answer every time (good for facts and code). At temperature 1, it considers less likely words too, producing more varied and creative responses (good for brainstorming and writing). Above 1, responses become increasingly random and potentially nonsensical.
Temperature is a hyperparameter applied during the sampling stage of LLM inference that scales the logits (raw prediction scores) before the softmax function converts them into probabilities.
Mathematically:
P(token_i) = exp(logit_i / T) / Σ_j exp(logit_j / T)

where T is the temperature.
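In code, this is an ordinary softmax with each logit divided by T first. A minimal sketch in pure Python (the logit values are made up for illustration):

```python
import math

def softmax_with_temperature(logits, t):
    """Divide logits by t, then apply softmax."""
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0))  # the model's natural distribution
print(softmax_with_temperature(logits, 0.5))  # sharper: top token gains probability
print(softmax_with_temperature(logits, 2.0))  # flatter: probability spreads out
```

Lowering T exaggerates the gaps between logits before softmax, so the top token absorbs more probability; raising T shrinks the gaps, spreading probability across alternatives.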
Effect on probability distribution:
| Temperature | Behavior | Distribution shape |
|---|---|---|
| T = 0 | Deterministic (greedy decoding; special-cased, since the formula is undefined at T = 0) | All probability on top token |
| T = 0.1-0.3 | Highly focused, minimal variation | Sharp peak |
| T = 0.5-0.7 | Balanced creativity and coherence | Moderate spread |
| T = 1.0 | Default; model’s natural distribution | As trained |
| T > 1.0 | Increasingly random | Flattened, uniform-like |
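The regimes in the table can be checked numerically: as T shrinks, probability concentrates on the top token (approaching greedy decoding), and as T grows, the distribution approaches uniform. A quick sketch with made-up logits:

```python
import math

def softmax_t(logits, t):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.0, 2.0, 1.0, 0.0]
for t in [0.1, 0.5, 1.0, 2.0, 10.0]:
    probs = softmax_t(logits, t)
    # Top-token probability approaches 1.0 as T -> 0
    # and 1/len(logits) = 0.25 as T grows large.
    print(f"T={t:>4}: top-token probability = {probs[0]:.3f}")
```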
Related sampling parameters:
- Top-p (nucleus sampling): only consider tokens whose cumulative probability exceeds p (e.g., top_p=0.9 considers the smallest set of tokens covering 90% probability)
- Top-k: only consider the k most probable tokens
- Temperature + top-p are often used together for fine-grained control
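A sketch of the two filters applied to an already-computed probability list (toy values; real samplers operate over full vocabularies and typically work on sorted logits):

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = {i: probs[i] for i in order[:k]}
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p (nucleus sampling), then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = {}, 0.0
    for i in order:
        kept[i] = probs[i]
        cum += probs[i]
        if cum >= p:
            break
    total = sum(kept.values())
    return {i: q / total for i, q in kept.items()}

probs = [0.5, 0.3, 0.1, 0.07, 0.03]
print(top_k_filter(probs, 2))    # keeps tokens 0 and 1, renormalized
print(top_p_filter(probs, 0.9))  # keeps tokens 0, 1, 2 (cumulative 0.9)
```

When both are used, temperature is usually applied first (reshaping the distribution), then top-p/top-k truncate its tail before sampling.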
Recommended settings by task:
| Task | Temperature | Reasoning |
|---|---|---|
| Code generation | 0-0.2 | Correctness matters; deterministic is safer |
| Factual Q&A | 0-0.3 | Accuracy over creativity |
| Summarization | 0.3-0.5 | Moderate variation in phrasing |
| Creative writing | 0.7-1.0 | Diverse, expressive output |
| Brainstorming | 0.8-1.2 | Maximum idea diversity |
Temperature comparison:

```python
import anthropic

client = anthropic.Anthropic()
prompt = "Name a network protocol used for secure remote access."

for temp in [0.0, 0.5, 1.0]:
    responses = []
    for _ in range(3):
        msg = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=50,
            temperature=temp,
            messages=[{"role": "user", "content": prompt}],
        )
        responses.append(msg.content[0].text.strip())
    print(f"Temperature {temp}: {responses}")

# Example output:
# Temperature 0.0: ['SSH', 'SSH', 'SSH']
#   (deterministic: same answer every time)
# Temperature 0.5: ['SSH', 'SSH', 'IPsec VPN']
#   (mostly consistent, occasional variation)
# Temperature 1.0: ['SSH', 'WireGuard', 'IPsec with IKEv2']
#   (diverse, all valid but different)
```

Temperature is one of the first parameters engineers tune when building AI applications. Customer support bots use low temperature (0.1-0.3) for consistent, accurate answers. Marketing content generators use medium temperature (0.5-0.7) for natural-sounding variation. Creative tools use high temperature (0.8-1.0) for diverse ideas. A common mistake is using high temperature for code generation, which produces syntactically creative but incorrect code. In production AI systems, temperature is typically exposed as a configuration option so product teams can tune the creativity/accuracy tradeoff without code changes. Claude Code uses low temperature for tool calls and code edits to maximize correctness.
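One common way to expose temperature as configuration is a per-task settings table, so the creativity/accuracy tradeoff lives in config rather than code. A hypothetical sketch (names and values are illustrative, loosely following the recommendations table above):

```python
from dataclasses import dataclass

@dataclass
class TaskConfig:
    temperature: float
    max_tokens: int = 1024

# Hypothetical per-task settings; in practice these would be
# loaded from a config file or environment, not hardcoded.
TASK_CONFIGS = {
    "support_bot": TaskConfig(temperature=0.2),
    "marketing_copy": TaskConfig(temperature=0.6),
    "brainstorm": TaskConfig(temperature=1.0),
}

def config_for(task: str) -> TaskConfig:
    # Fall back to a conservative default for unknown task types.
    return TASK_CONFIGS.get(task, TaskConfig(temperature=0.3))

print(config_for("support_bot").temperature)  # 0.2
print(config_for("unknown_task").temperature)  # 0.3
```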