Neural Networks
If you want to understand modern AI at all, this is where you start. Nearly every AI system you have heard of, from chatbots and image generators to self-driving cars and many recommendation engines, has a neural network at its core. The rest is details.
The name sounds biological, and that’s intentional (if a bit misleading). Neural networks are loosely inspired by how brains work: layers of simple units, connected together, that collectively learn patterns from data.
The Simplest Version
A single “neuron” does this:
inputs × weights → add them up (plus a bias) → activation function → output
That’s it. Take some numbers in. Multiply each by a learned weight (how important is this input?). Add the results together, along with a bias. Apply a function that introduces non-linearity. Output a number.
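As a minimal sketch of that recipe, here is one neuron in plain Python. The sigmoid activation and the example numbers are assumptions chosen for illustration; real networks learn their weights and often use other activation functions.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Weighted sum of the inputs, plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Illustrative values only: in a real network the weights and bias are learned.
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)  # 0.5, because the weighted sum is 0.5 - 0.5 + 0.0 = 0
```

Without the activation function, stacking neurons would only ever produce another weighted sum, which is why the non-linearity matters.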
One neuron can’t do much on its own. Millions of them, arranged in layers, can recognise faces, write poetry, or predict protein structures.
How a Network Learns
This is the part that feels like magic until you see the maths, and then it feels like magic and maths.
1. Forward pass — Data flows through the network. It makes a prediction.
2. Loss — We compare that prediction to the right answer. The gap is the “loss.” The network is wrong (it always is, at first).
3. Backpropagation — We trace backward through the network, asking: “Which weights contributed most to being wrong?”
4. Update — Nudge those weights a tiny bit in the direction that reduces the error.
5. Repeat — Do this millions of times with millions of examples. Gradually, the network gets good.
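This loop is training by gradient descent (backpropagation is the step that works out the gradients), and it’s the core algorithm behind almost all of deep learning. It’s also why NVIDIA GPUs became so valuable: the matrix arithmetic inside each step is highly parallel, which is exactly what GPUs are built for.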
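The five steps above can be sketched on the smallest possible "network": one weight and one bias fitting a straight line. The toy data, the learning rate, and the step count are all assumptions made for this example. With only one layer there is nothing to trace backward through, so the gradient is written out by hand; backpropagation applies the same chain rule layer by layer in a deeper network.

```python
# Toy training data drawn from y = 2x + 1 (an assumption made for this sketch).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0        # start with parameters that are wrong
learning_rate = 0.05   # how big each nudge is


def mean_squared_error(w: float, b: float) -> float:
    """Average squared gap between predictions and right answers."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


initial_loss = mean_squared_error(w, b)

for step in range(2000):                     # 5. repeat
    grad_w = grad_b = 0.0
    for x, y in data:
        prediction = w * x + b               # 1. forward pass
        error = prediction - y               # 2. loss for this example is error ** 2
        grad_w += 2 * error * x / len(data)  # 3. gradient of the loss w.r.t. w
        grad_b += 2 * error / len(data)      #    and w.r.t. b
    w -= learning_rate * grad_w              # 4. update: step against the gradient
    b -= learning_rate * grad_b

final_loss = mean_squared_error(w, b)
print(round(w, 2), round(b, 2))  # should land close to 2.0 and 1.0
```

Real training differs mainly in scale: billions of parameters instead of two, and small random batches of examples per step instead of the whole dataset.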
The Family Tree
Neural networks come in different shapes for different problems:
| Architecture | What it’s good at | Key idea |
|---|---|---|
| Feedforward (MLP) | Simple patterns, tabular data | Layers stacked sequentially |
| CNN | Images, spatial patterns | Small learnable filters scan across data |
| RNN / LSTM | Sequential data (old approach) | Memory of previous inputs |
| Transformer | Text, and increasingly images, audio, and video | Self-attention: every position can attend to every other |
| GAN | Image generation (older) | Two networks compete (generator vs discriminator) |
| Diffusion | Image/video generation | Learn to gradually remove noise |
The Transformer has largely eaten RNNs and taken over much of what CNNs used to do. But understanding why CNNs and RNNs existed helps you appreciate what the Transformer solved.
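To make the Transformer row of the table concrete, here is a stripped-down sketch of self-attention in plain Python. It is a simplification under stated assumptions: real Transformers first pass each vector through learned query, key, and value projections and run several attention heads in parallel, all of which is omitted here. The three example vectors are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Replace each position with a weighted average of all positions."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this position against every position, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Blend all the vectors together according to those weights.
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three made-up two-dimensional "token" vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Every row of weights has a non-zero entry for every position, which is the "all positions see all others" idea. An RNN, by contrast, has to pass information along one step at a time.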
Why This Matters
You don’t need to implement a neural network to work with AI. But understanding this:
- Explains why training costs so much (billions of weight updates across billions of parameters)
- Explains why bias creeps in (networks learn patterns from data, including harmful ones)
- Explains why alignment is hard (you can’t just “tell” a network what to do — you shape it through examples)
- Makes everything else in this hub click at a deeper level
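To get a feel for the training-cost point, the rough sketch below counts the parameters in a small fully connected network. The layer sizes are hypothetical, picked only to show how quickly the numbers grow.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected (feedforward) network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # one weight per connection, one bias per unit
    return total


# Hypothetical layer sizes: 784 inputs, two hidden layers of 512, 10 outputs.
small_network = count_parameters([784, 512, 512, 10])
print(small_network)  # 669706
```

Even this toy network has well over half a million numbers to adjust on every update. Large language models have billions.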
Where to Go From Here
Ready to go deeper:
- Transformers — The specific architecture behind most modern AI
- Embeddings — How neural networks represent meaning as numbers
- Training & Fine-Tuning — The art (and expense) of making networks learn well
Want more intuition first:
- How LLMs Work — The full picture of how language models function
- Andrej Karpathy — His “Zero to Hero” series builds everything from scratch
Best Resources
- 3Blue1Brown “Neural Networks” (YouTube) — the best visual introduction that exists
- Andrej Karpathy “Neural Networks: Zero to Hero” — build a language model from first principles
- Michael Nielsen “Neural Networks and Deep Learning” — free online book, beautifully written