Large Language Model (LLM)
A large language model is an AI system that has read an enormous amount of text (books, websites, code, conversations) and learned the patterns of language well enough to write, summarize, translate, answer questions, and generate code. It does not “understand” in the human sense; it predicts the most likely next word based on everything it has learned. The “large” refers to the billions of mathematical parameters that encode these patterns.
A Large Language Model (LLM) is a deep neural network, typically based on the Transformer architecture, trained on large-scale text corpora to perform autoregressive language generation. The model predicts the probability distribution over the next token given all preceding tokens.
Architecture:
- Based on the Transformer (Vaswani et al., 2017), specifically the decoder-only variant for generative models
- Attention mechanism: allows the model to weigh the relevance of every previous token when predicting the next one, enabling long-range dependencies
- Scale: modern LLMs have 7B to 400B+ parameters across 32 to 128+ Transformer layers
- Trained with next-token prediction (causal language modeling) on trillions of tokens
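The attention-with-causal-mask idea above can be sketched in plain Python. This is a toy single-head example with hand-picked 2-dimensional "embeddings"; a real Transformer applies learned query/key/value projections and many heads, both omitted here:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def causal_attention(embeddings):
    """Single-head scaled dot-product attention with a causal mask:
    token i may only attend to tokens 0..i (its own past)."""
    d = len(embeddings[0])
    out = []
    for i, q in enumerate(embeddings):
        # similarity scores against itself and every earlier token;
        # later tokens are simply never scored (the causal mask)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in embeddings[:i + 1]]
        weights = softmax(scores)
        # output = attention-weighted sum of the visible vectors
        out.append([sum(w * v[j] for w, v in zip(weights, embeddings[:i + 1]))
                    for j in range(d)])
    return out

# three toy token embeddings of dimension 2
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = causal_attention(tokens)
print(ctx[0])  # the first token can only attend to itself, so it is unchanged
```

Note how each position mixes in information from every earlier position; this is what enables the long-range dependencies mentioned above.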
Key concepts:
- Parameters: learnable weights in the neural network. More parameters generally means more capability but also more compute cost.
- Tokenization: text is split into subword tokens (BPE, SentencePiece). For example, “Understanding” might become [“Under”, “standing”], depending on the tokenizer. Each token maps to a numeric ID.
- Context window: the maximum number of tokens the model can process in a single forward pass (8K to 1M+ tokens depending on model).
- Temperature: controls randomness in token selection. 0 = deterministic (always pick highest probability), 1 = proportional sampling, >1 = more random.
- Inference: the process of generating output from a trained model. Computationally expensive due to the autoregressive loop (one token at a time).
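The temperature setting above is just a rescaling of the model's output logits before the softmax. A minimal sketch in plain Python, using made-up logits rather than a real model's output:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then apply softmax.
    T -> 0 approaches greedy decoding (all mass on the argmax),
    T = 1 samples proportionally, T > 1 flattens toward uniform."""
    if temperature <= 0:
        # greedy: put all probability on the highest logit
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    es = [math.exp(s - m) for s in scaled]
    total = sum(es)
    return [e / total for e in es]

logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits for a 3-token vocabulary
for t in (0.3, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Running this shows the top token's probability shrinking as temperature rises, which is why low temperatures give more repeatable output.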
Training pipeline:
- Pretraining: unsupervised learning on massive text corpora (internet, books, code)
- Supervised fine-tuning (SFT): training on human-curated instruction/response pairs
- RLHF/RLAIF: reinforcement learning from human or AI feedback to align outputs with human preferences
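The pretraining objective in the pipeline above is cross-entropy on the actual next token at every position. A toy illustration in plain Python, with invented probability distributions standing in for a model's softmax output:

```python
import math

def causal_lm_loss(predicted_probs, target_ids):
    """Average negative log-likelihood of each actual next token
    under the model's predicted distribution at that position.
    Lower is better; a perfect prediction gives 0."""
    nll = [-math.log(probs[target])
           for probs, target in zip(predicted_probs, target_ids)]
    return sum(nll) / len(nll)

# Hypothetical: at each of 3 positions the model outputs a distribution
# over a 4-token vocabulary; target_ids are the tokens that actually came next.
predicted = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
targets = [0, 1, 3]
print(round(causal_lm_loss(predicted, targets), 4))  # -> 0.4688
```

Training adjusts the parameters to push this loss down across trillions of tokens; SFT and RLHF then reshape the same model's behavior rather than its basic objective.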
Using an LLM via API (Anthropic Claude)
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Explain what a VLAN is in one sentence."
        }
    ],
    temperature=0.3,  # low temperature for more deterministic, consistent output
)

print(message.content[0].text)
# Example output:
# "A VLAN is a virtual partition of a physical network switch
# that isolates broadcast traffic between groups of ports."

LLMs have moved from research novelty to production tool in under three years. Claude (Anthropic), GPT-4 (OpenAI), and Gemini (Google) power coding assistants, customer support bots, document summarization, code review, and content generation across every industry. In IT specifically, LLMs are used for log analysis, incident summarization, infrastructure-as-code generation, and natural language querying of monitoring data. The key limitation is hallucination: LLMs can generate confident, plausible-sounding information that is factually wrong. This is why RAG (retrieval-augmented generation) and human review remain essential for production deployments. Running LLMs locally (Ollama, llama.cpp) on consumer GPUs is increasingly viable for smaller models (7B-13B parameters).
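The RAG pattern mentioned above reduces to: retrieve relevant text, then place it in the prompt so the model answers from evidence rather than memory. A bare-bones sketch of the retrieval step using word-overlap cosine similarity; production systems use learned embeddings and a vector database, and the documents and query here are invented:

```python
import math
from collections import Counter

def cosine_sim(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    qv = Counter(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: cosine_sim(qv, Counter(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = [
    "VLAN 30 carries guest wifi traffic and is isolated from the corp network",
    "The backup job runs nightly at 02:00 and writes to the NAS",
    "Switch port 12 is assigned to VLAN 30 for the lobby access point",
]
context = retrieve("which ports are on vlan 30", docs, k=2)

# Grounding: the retrieved text becomes part of the prompt, so the
# model can cite real documentation instead of hallucinating.
prompt = ("Answer using only this context:\n" + "\n".join(context) +
          "\n\nQuestion: which ports are on VLAN 30?")
print(context)
```

The backup-job document scores zero overlap with the query and is correctly excluded; only the VLAN-related snippets reach the prompt.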