Training & Fine-Tuning

Created 2 May 2025
Tags: learning, machine-learning, training, fine-tuning, rlhf

You’ve probably used a fine-tuned model today without knowing it. ChatGPT, Claude, Gemini — none of them ship as raw “base models.” They go through a careful multi-stage process that turns a pattern-matching engine into something that feels helpful, honest, and safe.

Understanding how this works explains why AI behaves the way it does — the hallucinations, the guardrails, the occasional weird refusal. It’s all a product of training.


The Three Stages

Building a modern AI model like Claude or GPT-4 isn’t one process. It’s three, stacked on top of each other.

Stage 1: Pre-training

What happens: The model reads a large slice of the internet. Trillions of tokens (sub-word pieces) from books, websites, code, and conversations. Its only job: predict the next token.

What it produces: A “base model” that’s incredibly good at continuing text, but has no idea how to be helpful. Ask it a question and it might just… keep writing the question in different ways.

What it costs: Millions of dollars. Thousands of NVIDIA GPUs. Weeks to months of continuous compute.

This is the expensive part. Everything after is comparatively cheap.
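
To make "predict the next token" concrete, here is a minimal sketch of the loss for a single position. The probabilities are made up for illustration; a real model outputs a distribution over its whole vocabulary, and the loss is averaged over every position in the training text.

```python
import math

def next_token_loss(probs, target):
    """Cross-entropy for one position: -log p(correct next token)."""
    return -math.log(probs[target])

# Hypothetical model output for the context "The cat sat on the"
predicted = {"mat": 0.6, "sofa": 0.3, "moon": 0.1}

# If the real next token was the one the model thought likely, loss is low.
loss_good = next_token_loss(predicted, "mat")
# If the real next token was one the model thought unlikely, loss is high.
loss_bad = next_token_loss(predicted, "moon")

print(round(loss_good, 3), round(loss_bad, 3))
```

Pre-training is this calculation repeated trillions of times, nudging the weights so the loss goes down.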

Stage 2: Supervised Fine-Tuning (SFT)

What happens: Humans write thousands of examples: “Here’s a question, here’s how you should answer.” The model learns the format of being helpful.

What it produces: A model that follows instructions, answers questions, and feels conversational. Most of the “personality” comes from here.
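
As a rough sketch of what one SFT example might look like once it is prepared for training: the "User:" / "Assistant:" template and the whitespace "tokeniser" here are hypothetical stand-ins, and masking the prompt so that only the response counts toward the loss is a common convention rather than something every pipeline does.

```python
def build_sft_example(prompt, response):
    """Join prompt and response, and mark which tokens count toward the loss."""
    # Whitespace splitting stands in for a real sub-word tokeniser.
    prompt_tokens = f"User: {prompt}".split() + ["Assistant:"]
    response_tokens = response.split()
    tokens = prompt_tokens + response_tokens
    # 0 = ignore this position (prompt), 1 = train on it (response).
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask

tokens, loss_mask = build_sft_example("What is 2 + 2?", "2 + 2 equals 4.")
for token, flag in zip(tokens, loss_mask):
    print(flag, token)
```

The training objective is still next-token prediction. The only change is the data: curated question-and-answer pairs instead of raw internet text.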

Stage 3: Alignment (RLHF / RLAIF / Constitutional AI)

What happens: The model generates responses, and raters (human or AI) rank them: “This answer is better than that one.” A reward model learns those preferences. The main model is optimised to match them.

What it produces: A model that’s not just capable, but aligned — less likely to be harmful, more likely to say “I don’t know” when uncertain.
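
The "reward model learns those preferences" step can be sketched with a pairwise loss. This assumes the Bradley-Terry style objective, which is a common choice for reward models; the reward scores below are invented numbers, not output from a real model.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), written as log(1 + e^-margin)."""
    margin = reward_chosen - reward_rejected
    return math.log(1.0 + math.exp(-margin))

# The reward model agrees with the rater: the preferred answer scores higher.
loss_agree = preference_loss(reward_chosen=2.0, reward_rejected=0.5)
# The reward model disagrees with the rater: the preferred answer scores lower.
loss_disagree = preference_loss(reward_chosen=0.5, reward_rejected=2.0)

print(round(loss_agree, 3), round(loss_disagree, 3))
```

Minimising this loss pushes the score of the preferred answer above the rejected one. The main model is then optimised to produce answers the reward model scores highly.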

Key variants:

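  • RLHF: preferences come from human raters (popularised by OpenAI’s InstructGPT work)
  • RLAIF: preferences come from an AI model guided by written principles (the basis of Anthropic’s Constitutional AI)
  • DPO: Direct Preference Optimisation, which trains on preference pairs directly (simpler, no separate reward model needed)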
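
To show why DPO needs no reward model, here is a minimal sketch of its loss for one preference pair. The log-probabilities are invented inputs; in practice they come from the model being trained (the policy) and a frozen copy of the starting model (the reference). The `beta` value of 0.1 is an assumed, commonly used setting.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses."""
    # How much more likely each response became relative to the reference.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(logits))
    return math.log(1.0 + math.exp(-logits))

# Policy identical to the reference: no preference learned yet.
loss_start = dpo_loss(-12.0, -11.0, -12.0, -11.0)
# Policy has moved probability toward the chosen response.
loss_later = dpo_loss(-9.0, -14.0, -12.0, -11.0)

print(round(loss_start, 3), round(loss_later, 3))
```

The preference comparison is built straight into the loss on the model's own probabilities, which is why the separate reward model drops out.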

Why Models Hallucinate

This is where understanding training actually helps you in practice.

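During pre-training, the model learned to produce fluent, plausible text. The objective rewards predicting the next token well; it has no term for whether a statement is true, and nothing in it tells the model what it does or doesn’t know. “I don’t know” was rarely the most likely continuation, so generating something plausible-sounding was almost always the better bet.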

Alignment tries to fix this, but it’s fighting against trillions of tokens of conditioning. The model’s first instinct is still to sound sure. That’s why RAG (grounding in real documents) and careful prompting matter so much.
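
One way to see the "first instinct" at work is to look at how a token gets picked. This is a toy sketch with made-up logits over a four-token vocabulary: the softmax always yields a valid distribution, and greedy decoding always emits its top token, even when the distribution is nearly flat.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract peak for stability
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Higher entropy = the probability is spread more evenly."""
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = softmax([8.0, 1.0, 0.5, 0.2])  # one clear winner
unsure = softmax([1.1, 1.0, 0.9, 1.0])     # almost a coin toss

# Greedy decoding picks the top token in both cases.
pick_confident = confident.index(max(confident))
pick_unsure = unsure.index(max(unsure))

print(pick_confident, round(entropy(confident), 3))
print(pick_unsure, round(entropy(unsure), 3))
```

The output text looks equally fluent either way. The uncertainty lives in the distribution, not in the words, unless training has taught the model to put it there.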


Fine-Tuning Your Own Models

You can’t realistically pre-train a model yourself (that takes a lab-scale budget). But you can fine-tune one. This is where open models like LLaMA shine: take a foundation model and specialise it for your use case.

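Method         | What it trains                                           | Cost     | Best for
Full fine-tune | Everything                                               | Massive  | Labs with resources
LoRA           | Small added adapter matrices (~0.5% of parameter count)  | Low      | Most practical use cases
QLoRA          | Same adapters, on top of a quantised base model          | Very low | Consumer GPUs (24GB)
Prefix tuning  | Just a learned prefix                                    | Minimal  | Quick experiments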
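
Some back-of-envelope arithmetic shows why quantising the base model matters for the 24GB row. This is a rough sketch for a hypothetical 7-billion-parameter model and counts only the weights; activations, gradients, optimiser state, and quantisation overhead all add to the real figure.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params_7b = 7_000_000_000  # hypothetical 7B-parameter model

fp16_gb = weight_memory_gb(params_7b, 16)  # 16-bit weights
int4_gb = weight_memory_gb(params_7b, 4)   # 4-bit quantised weights

print(fp16_gb, int4_gb)
```

Going from 16-bit to 4-bit cuts the weight footprint to a quarter, which leaves room on a single consumer card for the adapters and everything else training needs.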

LoRA (Low-Rank Adaptation) is the sweet spot for most people. You freeze the original weights and train small low-rank adapter matrices alongside them. The result is often close to full fine-tuning at a fraction of the cost.
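
Here is a toy sketch of the idea in plain Python, assuming the formulation from the LoRA paper: the effective weight is W + (alpha / r) · B·A, where W is frozen and only the small matrices A and B are trained. The 3×3 matrix, the rank, and the 4096×4096 layer size are illustrative choices, not values from any particular model.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_effective_weight(w, a, b, alpha, r):
    """Frozen weight plus the scaled low-rank update: W + (alpha / r) * B @ A."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def lora_trainable_fraction(d_out, d_in, r):
    """Adapter parameters relative to the frozen matrix they sit beside."""
    return r * (d_out + d_in) / (d_out * d_in)

# Toy frozen 3x3 weight with a rank-1 adapter.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
a = [[0.5, -0.5, 1.0]]     # shape (r, d_in) = (1, 3)
b = [[0.0], [0.0], [0.0]]  # shape (d_out, r) = (3, 1), zeros at the start

# With B at zero, the adapter changes nothing until training moves it.
w_init = lora_effective_weight(w, a, b, alpha=2, r=1)

# Rank-8 adapter next to a 4096 x 4096 matrix.
fraction = lora_trainable_fraction(4096, 4096, r=8)
print(w_init == w, f"{fraction:.2%}")
```

Because the update is a product of two thin matrices, the number of trainable values grows with r × (d_out + d_in) instead of d_out × d_in, which is where the savings come from.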


Key Vocabulary

Term             | Plain English
Loss             | How wrong the model is (lower = better)
Learning rate    | How big the correction steps are
Epoch            | One complete pass through all training data
Overfitting      | Model memorises examples instead of learning patterns
Tokenisation     | Breaking text into sub-word pieces before the model sees it
Gradient descent | The algorithm that makes all of this work: follow the slope downhill
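
Several of these terms fit into one tiny training loop. This is a deliberately small sketch: a "model" with a single weight learning y = 2x from three made-up data points, with the learning rate and epoch count chosen arbitrarily. Real training does the same thing with billions of weights.

```python
def train(data, learning_rate=0.05, epochs=20):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    history = []
    for _ in range(epochs):  # one epoch = one pass over all the data
        # Slope of the loss with respect to w, averaged over the data.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        # Step downhill; the learning rate sets the step size.
        w -= learning_rate * grad
        # Loss after the step: how wrong the model still is.
        loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
        history.append(loss)
    return w, history

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, history = train(data)
print(round(w, 3), history[0] > history[-1])
```

Try a learning rate of 0.5 to see the other failure mode: the steps overshoot and the loss grows instead of shrinking.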

What I’m Still Learning

  • The practical details of running a LoRA fine-tune end-to-end
  • How DPO compares to full RLHF in practice (it’s simpler but is it as good?)
  • Where the line is between “fine-tuning” and “just write a better prompt”

Go Deeper

Best Resources

  • Andrej Karpathy, “Let’s build GPT”: actually train a small Transformer
  • Hugging Face PEFT library: practical LoRA/QLoRA implementation
  • Sebastian Raschka, “Build a Large Language Model (From Scratch)”: thorough book-length treatment
  • “Training language models to follow instructions with human feedback” (InstructGPT paper): how SFT + RLHF was applied to instruction-following models