Neural Networks

Created 2 May 2025
Tags: learning, machine-learning, neural-networks, fundamentals

If you want to understand modern AI at all, this is where you start. Nearly every modern AI system you have heard of (chatbots, image generators, the perception systems in self-driving cars, recommendation engines) has a neural network at its core. The rest is details.

The name sounds biological, and that’s intentional (if a bit misleading). Neural networks are loosely inspired by how brains work: layers of simple units, connected together, that collectively learn patterns from data.


The Simplest Version

A single “neuron” does this:

inputs × weights → add them up → add a bias → activation function → output

That’s it. Take some numbers in. Multiply each by a learned weight (how important is this input?). Sum the results and add a bias. Apply a function that introduces non-linearity. Output a number.
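The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names, the choice of sigmoid as the activation, and the input, weight, and bias values are all assumptions made for the example.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """Weighted sum of the inputs, plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Illustrative values only; in a real network the weights and bias are learned.
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)  # 0.5: the weighted sum is 0.5 - 0.5 + 0.0 = 0, and sigmoid(0) = 0.5
```

Sigmoid is only one possible activation. Without some non-linear function at this step, stacking neurons would collapse into a single linear calculation.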

One neuron can’t do much: on its own it can only separate its inputs with a straight line. Millions of them, arranged in layers, can recognise faces, write poetry, or predict protein structures.


How a Network Learns

This is the part that feels like magic until you see the maths, and then it feels like magic and maths.

1. Forward pass — Data flows through the network. It makes a prediction.

2. Loss — We compare that prediction to the right answer. The gap is the “loss.” The network is wrong (it always is, at first).

3. Backpropagation — We trace backward through the network, asking: “Which weights contributed most to being wrong?”

4. Update — Nudge those weights a tiny bit in the direction that reduces the error.

5. Repeat — Do this millions of times with millions of examples. Gradually, the network gets good.

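This loop is training by gradient descent, with backpropagation supplying the gradients, and it’s the core algorithm behind nearly all of deep learning. It’s also a large part of why NVIDIA GPUs became so valuable: the work is mostly large matrix multiplications, which parallelise extremely well.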
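To make the five steps concrete, here is a toy sketch that trains a single linear neuron (no activation function) to recover the line y = 2x + 1. The data, the learning rate, the step count, and the variable names are assumptions chosen for illustration. With only one neuron, "backpropagation" reduces to writing out the gradient by hand; real networks apply the same chain-rule idea layer by layer, usually through a library.

```python
# Toy data generated from y = 2x + 1: the "right answers" the model should recover.
data = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

w, b = 0.0, 0.0          # untrained parameters
learning_rate = 0.1      # size of each nudge (assumed value)

def mean_squared_error(w: float, b: float) -> float:
    """Average squared gap between predictions and right answers."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

initial_loss = mean_squared_error(w, b)

for step in range(200):                      # 5. repeat
    grad_w = grad_b = 0.0
    for x, y in data:
        prediction = w * x + b               # 1. forward pass
        error = prediction - y               # 2. the loss is error ** 2
        grad_w += 2 * error * x / len(data)  # 3. gradient of the loss w.r.t. w
        grad_b += 2 * error / len(data)      #    and w.r.t. b
    w -= learning_rate * grad_w              # 4. update: step against the gradient
    b -= learning_rate * grad_b

final_loss = mean_squared_error(w, b)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

The same loop scales up in principle: more parameters, more data, and gradients computed automatically rather than by hand.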


The Family Tree

Neural networks come in different shapes for different problems:

| Architecture | What it’s good at | Key idea |
| --- | --- | --- |
| Feedforward (MLP) | Simple patterns, tabular data | Layers stacked sequentially |
| CNN | Images, spatial patterns | Small learnable filters scan across data |
| RNN / LSTM | Sequential data (old approach) | Memory of previous inputs |
| Transformer | Everything (modern) | Self-attention: all positions see all others |
| GAN | Image generation (older) | Two networks compete (generator vs critic) |
| Diffusion | Image/video generation | Learn to gradually remove noise |
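As a rough illustration of the CNN idea, a small filter scanning across data, here is a one-dimensional sketch. The function name, the signal, and the hand-picked filter values are assumptions for the example; a real CNN learns its filter values during training and usually works on 2-D images.

```python
def convolve_1d(signal: list[float], kernel: list[float]) -> list[float]:
    """Slide the kernel across the signal, taking a dot product at each position.

    Strictly speaking this is cross-correlation (the kernel is not flipped),
    which is the operation deep learning libraries conventionally call convolution.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# A hand-picked filter that responds where the signal steps upward.
edge_filter = [-1.0, 1.0]
signal = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

response = convolve_1d(signal, edge_filter)
print(response)  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

The same two numbers are reused at every position, which is why a filter can detect a pattern wherever it appears in the input.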

The Transformer has largely eaten the others. But understanding why CNNs and RNNs existed helps you appreciate what the Transformer solved.
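The "all positions see all others" idea can be sketched as follows. This is a deliberately stripped-down version: it uses each input vector directly as query, key, and value, whereas a real Transformer first passes the inputs through learned projection matrices and uses multiple attention heads. The function names and the token vectors are made up for illustration.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors: list[list[float]]) -> list[list[float]]:
    """Each position's output is a weighted average of every position's vector."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Score this position against all positions, itself included.
        scores = [dot(query, key) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, vectors))
            for d in range(dim)
        ])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three made-up 2-d token vectors
mixed = self_attention(tokens)
print(mixed)
```

Every output row depends on every input row in a single step. An RNN, by contrast, has to carry information forward one position at a time.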


Why This Matters

You don’t need to implement a neural network to work with AI. But understanding this:

  • Explains why training costs so much (billions of weight updates across billions of parameters)
  • Explains why bias creeps in (networks learn patterns from data, including harmful ones)
  • Explains why alignment is hard (you can’t just “tell” a network what to do — you shape it through examples)
  • Makes everything else in this hub click at a deeper level

Where to Go From Here

Want more intuition first:

  • How LLMs Work — The full picture of how language models function
  • Andrej Karpathy — His “Zero to Hero” series builds everything from scratch

Best Resources

  • 3Blue1Brown “Neural Networks” (YouTube) — the best visual introduction that exists
  • Andrej Karpathy “Neural Networks: Zero to Hero” — build a language model from first principles
  • Michael Nielsen “Neural Networks and Deep Learning” — free online book, beautifully written