Temperature
Temperature is a dial that controls how creative or predictable an AI’s responses are. At temperature 0, the AI always picks the most likely word, giving you the same answer every time (good for facts and code). At temperature 1, it considers less likely words too, producing more varied and creative responses (good for brainstorming and writing). Above 1, responses become increasingly random and potentially nonsensical.
Temperature is a hyperparameter applied during the sampling stage of LLM inference that scales the logits (raw prediction scores) before the softmax function converts them into probabilities.
Mathematically:
P(token_i) = exp(logit_i / T) / Σ_j exp(logit_j / T)

where T is the temperature.
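In code, this is an ordinary softmax with each logit divided by T first. A minimal sketch in pure Python (the logit values are made up for illustration):

```python
import math

def softmax_with_temperature(logits, t):
    """Divide logits by t, then apply softmax."""
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0))  # the model's natural distribution
print(softmax_with_temperature(logits, 0.5))  # sharper: top token gains probability
print(softmax_with_temperature(logits, 2.0))  # flatter: probability spreads out
```

Lowering T exaggerates the gaps between logits before softmax, so the top token absorbs more probability; raising T shrinks the gaps, spreading probability across alternatives.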
Effect on probability distribution:
| Temperature | Behavior | Distribution shape |
|---|---|---|
| T = 0 | Deterministic (greedy decoding; special-cased, since the formula is undefined at T = 0) | All probability on top token |
| T = 0.1-0.3 | Highly focused, minimal variation | Sharp peak |
| T = 0.5-0.7 | Balanced creativity and coherence | Moderate spread |
| T = 1.0 | Default; model’s natural distribution | As trained |
| T > 1.0 | Increasingly random | Flattened, uniform-like |
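The regimes in the table can be checked numerically: as T shrinks, probability concentrates on the top token (approaching greedy decoding), and as T grows, the distribution approaches uniform. A quick sketch with made-up logits:

```python
import math

def softmax_t(logits, t):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.0, 2.0, 1.0, 0.0]
for t in [0.1, 0.5, 1.0, 2.0, 10.0]:
    probs = softmax_t(logits, t)
    # Top-token probability approaches 1.0 as T -> 0
    # and 1/len(logits) = 0.25 as T grows large.
    print(f"T={t:>4}: top-token probability = {probs[0]:.3f}")
```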
Related sampling parameters:
- Top-p (nucleus sampling): only consider tokens whose cumulative probability exceeds p (e.g., top_p=0.9 considers the smallest set of tokens covering 90% probability)
- Top-k: only consider the k most probable tokens
- Temperature + top-p are often used together for fine-grained control
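A sketch of the two filters applied to an already-computed probability list (toy values; real samplers operate over full vocabularies and typically work on sorted logits):

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = {i: probs[i] for i in order[:k]}
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p (nucleus sampling), then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = {}, 0.0
    for i in order:
        kept[i] = probs[i]
        cum += probs[i]
        if cum >= p:
            break
    total = sum(kept.values())
    return {i: q / total for i, q in kept.items()}

probs = [0.5, 0.3, 0.1, 0.07, 0.03]
print(top_k_filter(probs, 2))    # keeps tokens 0 and 1, renormalized
print(top_p_filter(probs, 0.9))  # keeps tokens 0, 1, 2 (cumulative 0.9)
```

When both are used, temperature is usually applied first (reshaping the distribution), then top-p/top-k truncate its tail before sampling.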
Recommended settings by task:
| Task | Temperature | Reasoning |
|---|---|---|
| Code generation | 0-0.2 | Correctness matters; deterministic is safer |
| Factual Q&A | 0-0.3 | Accuracy over creativity |
| Summarization | 0.3-0.5 | Moderate variation in phrasing |
| Creative writing | 0.7-1.0 | Diverse, expressive output |
| Brainstorming | 0.8-1.2 | Maximum idea diversity |
Temperature comparison:

```python
import anthropic

client = anthropic.Anthropic()
prompt = "Name a network protocol used for secure remote access."

for temp in [0.0, 0.5, 1.0]:
    responses = []
    for _ in range(3):
        msg = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=50,
            temperature=temp,
            messages=[{"role": "user", "content": prompt}],
        )
        responses.append(msg.content[0].text.strip())
    print(f"Temperature {temp}: {responses}")

# Example output:
# Temperature 0.0: ['SSH', 'SSH', 'SSH']
#   (deterministic: same answer every time)
# Temperature 0.5: ['SSH', 'SSH', 'IPsec VPN']
#   (mostly consistent, occasional variation)
# Temperature 1.0: ['SSH', 'WireGuard', 'IPsec with IKEv2']
#   (diverse, all valid but different)
```

Temperature is one of the first parameters engineers tune when building AI applications. Customer support bots use low temperature (0.1-0.3) for consistent, accurate answers. Marketing content generators use medium temperature (0.5-0.7) for natural-sounding variation. Creative tools use high temperature (0.8-1.0) for diverse ideas. A common mistake is using high temperature for code generation, which produces syntactically creative but incorrect code. In production AI systems, temperature is typically exposed as a configuration option so product teams can tune the creativity/accuracy tradeoff without code changes. Claude Code uses low temperature for tool calls and code edits to maximize correctness.
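One common way to expose temperature as configuration is a per-task settings table, so the creativity/accuracy tradeoff lives in config rather than code. A hypothetical sketch (names and values are illustrative, loosely following the recommendations table above):

```python
from dataclasses import dataclass

@dataclass
class TaskConfig:
    temperature: float
    max_tokens: int = 1024

# Hypothetical per-task settings; in practice these would be
# loaded from a config file or environment, not hardcoded.
TASK_CONFIGS = {
    "support_bot": TaskConfig(temperature=0.2),
    "marketing_copy": TaskConfig(temperature=0.6),
    "brainstorm": TaskConfig(temperature=1.0),
}

def config_for(task: str) -> TaskConfig:
    # Fall back to a conservative default for unknown task types.
    return TASK_CONFIGS.get(task, TaskConfig(temperature=0.3))

print(config_for("support_bot").temperature)  # 0.2
print(config_for("unknown_task").temperature)  # 0.3
```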