Fine-Tuning
Fine-tuning is like teaching a general-purpose doctor to become a specialist. The doctor already knows medicine (the pre-trained model), but by studying thousands of cardiology cases (your specific dataset), they become an expert cardiologist. Fine-tuning takes an AI model that already understands language and trains it further on your data so it speaks your terminology, follows your formats, and excels at your specific tasks.
Fine-tuning is the process of continuing the training of a pre-trained model on a smaller, task-specific dataset to adapt it for a particular use case. The pre-trained model’s weights are updated (fully or partially) using the new data.
Fine-tuning vs. alternatives:
| Approach | Training? | Cost | Best for |
|---|---|---|---|
| Prompt engineering | No | Free | Format, style, simple tasks |
| RAG | No | Low (infra only) | Grounding in specific documents |
| Fine-tuning | Yes | Medium | Style, format, domain terminology, complex behavior patterns |
| Pre-training from scratch | Yes | Very high | New languages, entirely new domains |
When to fine-tune:
- Consistent output format that prompting cannot reliably achieve
- Domain-specific terminology or jargon the base model gets wrong
- Classification tasks with labeled data
- Style matching (writing like your brand voice)
- Reducing token usage (a fine-tuned model needs shorter prompts because the instructions are baked into its weights)
When NOT to fine-tune:
- You need the model to know specific facts (use RAG instead)
- You have fewer than 100 training examples
- Prompt engineering already achieves acceptable results
Fine-tuning approaches:
- Full fine-tuning: update all model parameters. Requires significant compute (GPUs).
- LoRA (Low-Rank Adaptation): freeze most weights; train small adapter matrices (see the sketch after this list). 90%+ fewer trainable parameters, much lower compute cost.
- QLoRA: LoRA on top of a quantized (4-bit) base model, making it feasible to fine-tune 70B-parameter models on a single GPU.
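A minimal LoRA sketch, assuming the Hugging Face transformers and peft libraries; the base model name, target modules, and hyperparameters below are illustrative assumptions, not fixed choices:
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType
base_model_name = "meta-llama/Llama-2-7b-hf"  # illustrative; any causal LM works
model = AutoModelForCausalLM.from_pretrained(base_model_name)
# Freeze the base weights and attach small trainable adapter matrices
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                      # rank of the adapter matrices
    lora_alpha=16,            # scaling factor for the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"]  # attention projections; model-dependent
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
Training then proceeds as usual (for example with the transformers Trainer); only the adapter weights are updated and saved, typically a few tens of megabytes.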
Training data format: typically JSONL with instruction/completion pairs:
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]} Fine-tuning with OpenAI API
from openai import OpenAI
client = OpenAI()
# 1. Prepare training data (JSONL format)
# training_data.jsonl:
# {"messages": [{"role": "system", "content": "You are a network engineer."},
# {"role": "user", "content": "What VLAN should guest Wi-Fi use?"},
# {"role": "assistant", "content": "Isolate guest Wi-Fi on VLAN 20..."}]}
# 2. Upload training file
file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune"
)
# 3. Create fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=file.id,
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={"n_epochs": 3}
)
# 4. Monitor progress
status = client.fine_tuning.jobs.retrieve(job.id)
print(f"Status: {status.status}") # queued -> running -> succeeded
# 5. Use the fine-tuned model
response = client.chat.completions.create(
    model=status.fine_tuned_model,  # ft:gpt-4o-mini:org:custom:id
    messages=[{"role": "user", "content": "Configure VLAN 30 for management"}]
)
Fine-tuning is used when prompting alone cannot achieve the required consistency. Companies fine-tune models to match their brand voice, follow internal documentation formats, or classify support tickets into categories. In cybersecurity, fine-tuned models triage alerts and analyze logs with domain-specific understanding.
The key decision is fine-tuning vs. RAG: fine-tuning teaches the model how to behave (style, format, reasoning patterns), while RAG teaches it what to know (specific facts and documents). Most production systems use both: a fine-tuned model for consistent behavior, with RAG for grounding in current data. LoRA has dramatically lowered the barrier: fine-tuning a 7B-parameter model on a single consumer GPU is now routine, and platforms like Hugging Face, Together AI, and Fireworks AI offer managed fine-tuning as a service.
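A minimal sketch of that combination, reusing the client and fine-tuned model from the example above; retrieve() is a hypothetical helper that returns relevant text chunks from your document store:
def answer(question: str) -> str:
    # retrieve() is hypothetical: any vector-store or keyword search would work here
    context = "\n\n".join(retrieve(question, top_k=3))
    response = client.chat.completions.create(
        model=status.fine_tuned_model,  # fine-tuned: consistent style and format
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content  # RAG supplies the facts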