Embeddings

Created 2 May 2025

Tags: learning, machine-learning, embeddings, vectors, representation

Every time you search by meaning rather than exact keywords, embeddings are doing the work. They’re one of those ideas that sounds abstract until you see what they enable — then they’re everywhere.

An embedding turns something (a word, a sentence, an image, a concept) into a list of numbers — a point in space. Things that are similar end up close together. Things that are different end up far apart. That’s the whole idea.

The Intuition

Imagine you could place every English word on a map. “King” and “queen” would be near each other. “Dog” and “puppy” closer still. “Banana” would be way off in another direction.

Now imagine that map has hundreds of dimensions instead of two (768 is a common size). That’s an embedding space.

"king"   → [0.2, -0.4, 0.7, 0.1, ..., -0.3]
"queen"  → [0.21, -0.38, 0.69, 0.12, ..., -0.28]   ← very close!
"banana" → [-0.5, 0.8, -0.1, 0.9, ..., 0.4]         ← very far

The beautiful thing: nobody designed which dimension means what. The model learned these representations from patterns in data, so individual dimensions rarely have a human-readable meaning. Meaning emerges from structure.
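
"Close" and "far" are commonly measured with cosine similarity, the cosine of the angle between two vectors. As a minimal sketch, the snippet below compares only the first four numbers of the toy vectors shown above. Those values are illustrative, not output from any model, and a real comparison would use every dimension.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Assumption: truncated, made-up vectors copied from the illustration above.
king = [0.2, -0.4, 0.7, 0.1]
queen = [0.21, -0.38, 0.69, 0.12]
banana = [-0.5, 0.8, -0.1, 0.9]

print(f"king vs queen:  {cosine_similarity(king, queen):.3f}")   # close to 1
print(f"king vs banana: {cosine_similarity(king, banana):.3f}")  # below 0 for these values
```

Cosine similarity ignores vector length and looks only at direction, which is one reason it is a popular default for comparing embeddings.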

What Makes Them Powerful

Arithmetic with meaning. The famous example:

king - man + woman ≈ queen

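Directions in embedding space encode relationships. The “gender” direction and the “royalty” direction are not designed; they emerge from training. The arithmetic holds only approximately: the result lands near “queen” rather than exactly on it.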
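
The arithmetic can be tried on a toy example. The sketch below assumes hand-made 2-D vectors whose axes are deliberately chosen as (royalty, maleness). That is the opposite of how real embeddings work, where dimensions are learned and rarely interpretable, but it makes the mechanics visible. The `analogy` helper and all values are hypothetical.

```python
import math

# Assumption: hand-made 2-D vectors with axes (royalty, maleness).
# Real embedding dimensions are learned and rarely this interpretable.
vectors = {
    "king":   [0.9, 0.9],
    "queen":  [0.9, 0.1],
    "man":    [0.1, 0.9],
    "woman":  [0.1, 0.1],
    "banana": [-0.8, 0.5],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word nearest to a - b + c, skipping the three input words."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine_similarity(target, vectors[word]))

print(analogy("king", "man", "woman"))  # queen
```

Skipping the input words is a common convention when evaluating analogies, because with real embeddings the nearest vector to the result is often one of the inputs.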

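Context changes everything. Older static embeddings such as Word2Vec, where the king/queen example comes from, assign each word a single vector. Modern embeddings (from Transformers) give the same word different vectors depending on context: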

“bank” in “river bank” → nature, geography
“bank” in “bank account” → finance, money

This sensitivity to context is a large part of why AI feels like it “understands.”

Where You’ll Encounter Them

Use case	How embeddings help
RAG & Retrieval	Find relevant documents by meaning, not just keywords
Semantic search	“Show me articles about job loss” finds “unemployment” too
Recommendations	“Similar items” in any domain
Clustering	Automatically group related content
Inside every LLM	Tokens become embeddings as the first step inside a Transformer

If you’re building anything with AI that involves finding, matching, or comparing, you’re almost certainly using embeddings somewhere.
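
The last row of the table, tokens becoming embeddings, amounts to a table lookup: each token ID selects one row of a learned matrix. A minimal sketch, assuming a three-word vocabulary and invented 2-D values (the names `vocab`, `embedding_table`, and `embed_tokens` are hypothetical):

```python
# Assumption: a three-word vocabulary and made-up 2-D vectors.
# Real models use tens of thousands of tokens and far more dimensions.
vocab = {"river": 0, "bank": 1, "account": 2}
embedding_table = [
    [0.7, -0.2],   # row 0: "river"
    [0.4, 0.9],    # row 1: "bank"
    [-0.3, 0.5],   # row 2: "account"
]

def embed_tokens(tokens):
    """Look up one row of the table per token."""
    return [embedding_table[vocab[token]] for token in tokens]

river_bank = embed_tokens(["river", "bank"])
bank_account = embed_tokens(["bank", "account"])

# At this first step "bank" gets the same vector in both phrases;
# the context-dependent vectors come from the Transformer layers that follow.
print(river_bank[1] == bank_account[0])  # True
```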

The Practical Pipeline

This is how embeddings work in a real application (like RAG):

1. Take your documents → split into chunks
2. Run each chunk through an embedding model → get vectors
3. Store vectors in a vector database
4. User asks a question → embed the question too
5. Find the nearest vectors (most similar chunks)
6. Feed those chunks to an LLM as context → grounded answer
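
The six steps can be sketched end to end in plain Python. This is a toy under loud assumptions: `toy_embed` is a hypothetical stand-in that only counts words (it matches exact words, not meaning), a plain list stands in for the vector database, and no LLM is called. Swapping in a real embedding model and database would keep the same shape.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Stand-in for a real embedding model: word counts as a sparse vector.
    It matches exact words only, so it has no semantic behaviour."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

documents = [
    "Embeddings map text to vectors. Similar texts get nearby vectors.",
    "Bananas are rich in potassium. They grow in tropical climates.",
]

# Steps 1-3: chunk (one sentence per chunk), embed, store.
index = []
for doc in documents:
    for chunk in re.split(r"(?<=\.)\s+", doc):
        index.append((chunk, toy_embed(chunk)))

def retrieve(question, k=2):
    """Steps 4-5: embed the question and return the k most similar chunks."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

question = "In which climates do bananas grow?"
context = retrieve(question)

# Step 6: build the prompt that would be sent to an LLM.
prompt = "Answer using only this context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
print(prompt)
```

The brute-force scan in `retrieve` compares the question against every stored chunk. That is fine for a handful of chunks; vector databases exist largely to make this search fast at scale.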

Embedding Models Worth Knowing

OpenAI text-embedding-3 — High quality, paid API
BGE / Nomic — Open-source, excellent performance
all-MiniLM — Fast, lightweight, good for getting started
Cohere Embed — Strong multilingual support

Vector Databases

Pinecone — Managed, zero-ops
Weaviate — Open-source, hybrid search (vectors + keywords)
ChromaDB — Simplest option, great for prototyping
pgvector — PostgreSQL extension (use your existing DB)

What I’m Still Learning

How to choose the right chunk size (too small = no context, too big = noise)
When hybrid search (vectors + BM25 keyword matching) outperforms pure vector search
How multimodal embeddings (CLIP, ImageBind) unify text and image in one space
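
The chunk-size question in the first item is easier to experiment with when chunking is a small function with the size exposed as a parameter. A minimal sketch, assuming word-based chunks with an optional overlap between neighbours. Overlap is a common technique rather than something these notes prescribe, and the function name and parameters are hypothetical.

```python
def chunk_words(text, chunk_size, overlap=0):
    """Split text into chunks of `chunk_size` words, sharing `overlap`
    words between neighbouring chunks."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

text = "one two three four five six seven eight nine ten"
print(chunk_words(text, chunk_size=4))
# ['one two three four', 'five six seven eight', 'nine ten']
print(chunk_words(text, chunk_size=4, overlap=2))
# ['one two three four', 'three four five six', 'five six seven eight', 'seven eight nine ten']
```

Overlap trades extra storage for a lower chance that a relevant sentence is cut in half at a chunk boundary.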

Go Deeper

RAG & Retrieval — The main application of embeddings today
Transformers — How contextual embeddings are actually generated
Neural Networks — The foundation (if this page felt too fast)
How LLMs Work — Where embeddings fit in the full picture
Tools & Frameworks — Practical tools for working with embeddings

Best Resources

“Illustrated Word2Vec” (Jay Alammar) — Visual intuition for how embeddings form
Hugging Face MTEB Leaderboard — Compare embedding models by benchmarks
Pinecone Learning Centre — Practical guides for vector search