
Neural Networks

Created 2 May 2025

Tags: learning, machine-learning, neural-networks, fundamentals


If you want to understand modern AI at all, this is where you start. Almost every headline AI system today, from chatbots and image generators to self-driving cars and recommendation engines, has a neural network at its core. The rest is details.

The name sounds biological, and that’s intentional (if a bit misleading). Neural networks are loosely inspired by how brains work: layers of simple units, connected together, that collectively learn patterns from data.

The Simplest Version

A single “neuron” does this:

inputs × weights → add them up, plus a bias → activation function → output

That’s it. Take some numbers in. Multiply each by a learned weight (how important is this input?). Add a bias. Apply a function that introduces non-linearity. Output a number.
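To make that recipe concrete, here is a minimal Python sketch of one neuron. The sigmoid activation and the example numbers are assumptions chosen for illustration; real networks often use other activations such as ReLU, and learn their weights rather than having them typed in.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into (0, 1); one common choice of activation."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One artificial neuron: weighted sum of inputs, plus bias, through an activation."""
    # Multiply each input by its weight and add them up.
    z = sum(x * w for x, w in zip(inputs, weights))
    # Add the bias, then apply the non-linearity.
    return sigmoid(z + bias)

# Illustrative numbers only: 1*2 + 0*(-1) - 1 = 1, and sigmoid(1) ≈ 0.731.
print(neuron([1.0, 0.0], [2.0, -1.0], bias=-1.0))
```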

One neuron on its own can only learn very simple patterns. Millions of them, arranged in layers, can recognise faces, write poetry, or predict protein structures.
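A classic illustration of why layers matter is XOR (output 1 when exactly one of two inputs is 1). A single neuron with a threshold-style activation draws a straight-line boundary through its inputs, and no straight line separates XOR’s cases, but two small layers can. In this sketch the weights are hand-picked rather than learned, and ReLU is assumed as the hidden activation.

```python
def relu(z: float) -> float:
    """Rectified linear unit: pass positives through, clip negatives to zero."""
    return max(0.0, z)

def xor_network(x1: float, x2: float) -> float:
    """Two hidden ReLU neurons feeding one linear output neuron (hand-set weights)."""
    # Hidden layer: h1 counts how many inputs are on; h2 fires only when both are.
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer: subtract twice the "both on" signal from the count.
    return 1.0 * h1 - 2.0 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))  # 0 0 0.0 / 0 1 1.0 / 1 0 1.0 / 1 1 0.0
```

Each layer on its own is simple; it is the composition, with a non-linearity in between, that bends the boundary.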

How a Network Learns

This is the part that feels like magic until you see the maths, and then it feels like magic and maths.

1. Forward pass — Data flows through the network. It makes a prediction.

2. Loss — We compare that prediction to the right answer. The gap is the “loss.” The network is wrong (it always is, at first).

3. Backpropagation — We trace backward through the network, asking: “Which weights contributed most to being wrong?”

4. Update — Nudge those weights a tiny bit in the direction that reduces the error.

5. Repeat — Do this millions of times with millions of examples. Gradually, the network gets good.
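The five steps above fit in one short loop. This is a minimal sketch, assuming a single sigmoid neuron, a toy dataset (logical OR), binary cross-entropy as the loss, and a hypothetical learning rate; with just one neuron the "backward pass" is a single application of the chain rule, which deep-learning libraries automate for networks of any size.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Toy dataset, assumed for illustration: logical OR of two inputs.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

random.seed(0)
weights = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
bias = 0.0
learning_rate = 0.5  # size of each nudge (a hypothetical choice)
eps = 1e-12          # keeps log() away from zero
epoch_losses = []

def predict(x: list[float]) -> float:
    # 1. Forward pass: weighted sum plus bias, then sigmoid.
    return sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)

for epoch in range(2000):  # 5. Repeat over the data many times.
    total_loss = 0.0
    for x, y in data:
        p = predict(x)
        # 2. Loss: binary cross-entropy between prediction p and answer y.
        total_loss += -(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps)))
        # 3. Backpropagation: for sigmoid + cross-entropy, dLoss/dz = p - y,
        #    and the chain rule gives dLoss/dw_i = (p - y) * x_i.
        grad_z = p - y
        grad_w = [grad_z * xi for xi in x]
        # 4. Update: move every weight a small step against its gradient.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_z
    epoch_losses.append(total_loss / len(data))

for x, y in data:
    print(x, y, round(predict(x), 3))
```

Note that every weight gets nudged on every step, in proportion to how much it affects the loss; weights that barely matter simply receive tiny nudges.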

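This loop is called gradient descent (backpropagation supplies the gradients; the update step follows them downhill), and variants of it such as stochastic gradient descent and Adam are the core algorithm behind essentially all of deep learning. It’s also why NVIDIA GPUs became so valuable: each step is dominated by large matrix multiplications, which parallelise extremely well.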
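Written as a formula, the update in step 4 is usually stated as below. The notation is standard rather than defined elsewhere in this note: w is any weight, L the loss, η the learning rate (a small positive step size chosen by whoever trains the network), and for a single neuron z is its weighted sum before the activation. Backpropagation’s job is to supply ∂L/∂w for every weight.

```latex
w \leftarrow w - \eta \, \frac{\partial L}{\partial w},
\qquad
\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial z}\, x_i
\quad \text{where } z = \sum_i w_i x_i + b .
```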

The Family Tree

Neural networks come in different shapes for different problems:

Architecture	What it’s good at	Key idea
Feedforward (MLP)	Simple patterns, tabular data	Layers stacked sequentially
CNN	Images, spatial patterns	Small learnable filters scan across data
RNN / LSTM	Sequential data (old approach)	Memory of previous inputs
Transformer	Almost everything (modern)	Self-attention: each position can look at every other position (in text generators, every earlier one)
GAN	Image generation (older)	Two networks compete (generator vs discriminator)
Diffusion	Image/video generation	Learn to gradually remove noise
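The CNN row’s key idea, a small filter sliding across the data, fits in a few lines. This plain-Python sketch assumes a one-dimensional signal, stride 1 and no padding; the edge-detecting filter is hand-set for illustration, whereas a CNN learns its filter values by gradient descent and typically applies many filters to 2D images.

```python
def conv1d(signal: list[float], kernel: list[float]) -> list[float]:
    """Slide a small filter across the signal ('valid' positions only, stride 1).

    Like most deep-learning libraries, this computes cross-correlation
    (the kernel is not flipped).
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# A hand-set edge detector; in a CNN these 3 numbers would be learned.
edge_filter = [-1.0, 0.0, 1.0]
signal = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(conv1d(signal, edge_filter))  # [0.0, 1.0, 1.0, 0.0]
```

The same three weights are reused at every position, which is why a CNN needs far fewer parameters than a fully connected layer over the same input.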

The Transformer has largely eaten the others. But understanding why CNNs and RNNs existed helps you appreciate what the Transformer solved.
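The Transformer’s key idea from the table, self-attention, is compact enough to sketch. Below is single-head scaled dot-product attention in plain Python. The tiny hand-written vectors are assumptions, and for simplicity the queries, keys and values are the inputs themselves; a real Transformer produces them with learned weight matrices, uses many heads, and (in text generators) masks out later positions.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Turn scores into positive weights that sum to 1 (shifted by max for stability)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x: list[list[float]]) -> list[list[float]]:
    """Single-head scaled dot-product attention with Q = K = V = x (a simplification)."""
    d = len(x[0])
    out = []
    for q in x:  # every position issues a query...
        # ...scored against every position's key, scaled by sqrt(d)...
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # ...and receives a weighted average of every position's value.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(tokens):
    print([round(v, 3) for v in row])
```

Unlike an RNN, nothing here is processed one step at a time: every position’s output is computed directly from all the others, which is a large part of why Transformers train so efficiently on GPUs.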

Why This Matters

You don’t need to implement a neural network to work with AI. But understanding this:

Explains why training costs so much (billions of weight updates across billions of parameters)
Explains why bias creeps in (networks learn patterns from data, including harmful ones)
Explains why alignment is hard (you can’t just “tell” a network what to do — you shape it through examples)
Makes everything else in this hub click at a deeper level
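To put rough numbers on the first point, a widely used rule of thumb estimates the training compute of a large language model as about 6 × (number of parameters) × (number of training tokens) floating-point operations. The sketch below applies it; the model size, dataset size and per-GPU throughput are hypothetical values picked for illustration, not figures for any real system.

```python
def training_flops(parameters: float, tokens: float) -> float:
    """Rule-of-thumb training compute: about 6 FLOPs per parameter per token
    (roughly 2 for the forward pass and 4 for the backward pass)."""
    return 6.0 * parameters * tokens

# Hypothetical model and dataset sizes, chosen only for illustration.
params = 7e9    # 7 billion parameters
tokens = 2e12   # 2 trillion training tokens
flops = training_flops(params, tokens)

# Hypothetical sustained throughput of one accelerator, in FLOP/s.
gpu_flops_per_second = 4e14
gpu_days = flops / gpu_flops_per_second / 86_400

print(f"{flops:.2e} FLOPs ≈ {gpu_days:,.0f} GPU-days")
```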

Where to Go From Here

Ready to go deeper:

Transformers — The specific architecture behind all modern AI
Embeddings — How neural networks represent meaning as numbers
Training & Fine-Tuning — The art (and expense) of making networks learn well

Want more intuition first:

How LLMs Work — The full picture of how language models function
Andrej Karpathy — His “Zero to Hero” series builds everything from scratch

Best Resources

3Blue1Brown “Neural Networks” (YouTube) — the best visual introduction that exists
Andrej Karpathy “Neural Networks: Zero to Hero” — build a language model from first principles
Michael Nielsen “Neural Networks and Deep Learning” — free online book, beautifully written