
Temperature

Plain English

Temperature is a dial that controls how creative or predictable an AI’s responses are. At temperature 0, the AI always picks the most likely word, giving you the same answer every time (good for facts and code). At temperature 1, it considers less likely words too, producing more varied and creative responses (good for brainstorming and writing). Above 1, responses become increasingly random and potentially nonsensical.

Technical Definition

Temperature is a hyperparameter applied during the sampling stage of LLM inference that scales the logits (raw prediction scores) before the softmax function converts them into probabilities.

Mathematically:

P(token_i) = exp(logit_i / T) / Σ exp(logit_j / T)

Where T is the temperature value.
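The formula can be sketched directly in Python. This is an illustrative implementation with made-up logits, not the sampling code of any particular inference library; note the max-subtraction trick for numerical stability, which leaves the probabilities unchanged.

```python
import math

def softmax_with_temperature(logits, t):
    """Scale logits by 1/T, then apply softmax (T > 0)."""
    scaled = [l / t for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for three candidate tokens
logits = [4.0, 2.0, 1.0]
for t in [0.5, 1.0, 2.0]:
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running this shows the table's pattern: at T = 0.5 the top token's probability rises toward 1 (sharper peak), while at T = 2.0 the distribution flattens toward uniform. T = 0 is special-cased in practice as greedy decoding, since the division is undefined.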

Effect on probability distribution:

| Temperature | Behavior | Distribution shape |
| --- | --- | --- |
| T = 0 | Deterministic (greedy decoding; special-cased, since softmax is undefined at T = 0) | All probability on top token |
| T = 0.1-0.3 | Highly focused, minimal variation | Sharp peak |
| T = 0.5-0.7 | Balanced creativity and coherence | Moderate spread |
| T = 1.0 | Default; model's natural distribution | As trained |
| T > 1.0 | Increasingly random | Flattened, uniform-like |

Related sampling parameters:

  • Top-p (nucleus sampling): only consider tokens whose cumulative probability exceeds p (e.g., top_p=0.9 considers the smallest set of tokens covering 90% probability)
  • Top-k: only consider the k most probable tokens
  • Temperature + top-p are often used together for fine-grained control
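The two truncation strategies above can be sketched as post-softmax filters. This is a simplified illustration over a hand-written probability list, not any library's actual sampler; real implementations work on full vocabularies and usually combine these with temperature scaling first.

```python
def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

# Illustrative distribution over four candidate tokens
probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_filter(probs, 2))    # only the top two tokens survive
print(top_p_filter(probs, 0.9))  # smallest set covering >= 90% survives
```

With `top_p=0.9` the filter keeps three tokens here (0.5 + 0.3 + 0.15 = 0.95 ≥ 0.9), whereas `top_k=2` always keeps exactly two, which is the key practical difference: top-p adapts the cutoff to how concentrated the distribution is.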

Recommended settings by task:

| Task | Temperature | Reasoning |
| --- | --- | --- |
| Code generation | 0-0.2 | Correctness matters; deterministic is safer |
| Factual Q&A | 0-0.3 | Accuracy over creativity |
| Summarization | 0.3-0.5 | Moderate variation in phrasing |
| Creative writing | 0.7-1.0 | Diverse, expressive output |
| Brainstorming | 0.8-1.2 | Maximum idea diversity |
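One way to apply these recommendations in an application is a per-task lookup. The task names and values below are hypothetical defaults drawn from the table above, meant as starting points to tune, not fixed rules.

```python
# Hypothetical per-task defaults mirroring the table above; tune for your app.
TASK_TEMPERATURE = {
    "code_generation": 0.0,
    "factual_qa": 0.2,
    "summarization": 0.4,
    "creative_writing": 0.9,
    "brainstorming": 1.0,
}

def temperature_for(task, default=0.7):
    """Look up a task's temperature, falling back to a balanced default."""
    return TASK_TEMPERATURE.get(task, default)

print(temperature_for("code_generation"))  # 0.0
print(temperature_for("unknown_task"))     # 0.7
```

Keeping this mapping in configuration rather than hard-coding values per call site makes it easy to adjust the creativity/accuracy tradeoff later.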

Temperature comparison

```python
import anthropic

client = anthropic.Anthropic()

prompt = "Name a network protocol used for secure remote access."

for temp in [0.0, 0.5, 1.0]:
    responses = []
    for _ in range(3):
        msg = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=50,
            temperature=temp,
            messages=[{"role": "user", "content": prompt}]
        )
        responses.append(msg.content[0].text.strip())
    print(f"Temperature {temp}: {responses}")

# Temperature 0.0: ["SSH", "SSH", "SSH"]
#   (deterministic: same answer every time)
# Temperature 0.5: ["SSH", "SSH", "IPsec VPN"]
#   (mostly consistent, occasional variation)
# Temperature 1.0: ["SSH", "WireGuard", "IPsec with IKEv2"]
#   (diverse, all valid but different)
```
In the Wild

Temperature is one of the first parameters engineers tune when building AI applications. Customer support bots use low temperature (0.1-0.3) for consistent, accurate answers. Marketing content generators use medium temperature (0.5-0.7) for natural-sounding variation. Creative tools use high temperature (0.8-1.0) for diverse ideas. A common mistake is using high temperature for code generation, which produces syntactically creative but incorrect code. In production AI systems, temperature is typically exposed as a configuration option so product teams can tune the creativity/accuracy tradeoff without code changes. Claude Code uses low temperature for tool calls and code edits to maximize correctness.