Your RAG System Is Just a Really Tall Tree (And Why That Matters)
You’ve probably called .similarity_search() a hundred times. You know it works. Your RAG pipeline retrieves relevant chunks, your LLM generates responses, and your users are happy.
But have you ever wondered what happens in those milliseconds between your query embedding and getting back the top-k results? Spoiler: you’re traversing a tree. A very clever, very tall tree.
Let me take you on a journey from Computer Science 101 to cutting-edge vector databases. Because understanding this evolution will change how you think about RAG performance.
The Basics: Binary Search Trees (The 1D World)
Let’s start simple. Imagine you have a list of numbers: [23, 45, 12, 67, 34, 89, 5, 78].
If someone asks “Is 34 in this list?”, you’d have to check each number one by one. That’s O(n) time. Slow and boring.
Now, what if we organize these numbers into a Binary Search Tree (BST)?
```
            45
           /  \
         23    67
        /  \     \
      12    34    89
     /           /
    5          78
```
Suddenly, finding 34 becomes a game:
- Start at 45. Is 34 smaller? Yes, go left.
- Now at 23. Is 34 smaller? No, go right.
- Found it at 34!
Three steps instead of eight. That's O(log n): each comparison halves the remaining search space, so the work grows with the logarithm of the data instead of linearly.
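Here's that walk in code. A minimal sketch in plain Python (no balancing, no error handling), just to make the mechanics concrete:

```python
# Minimal BST sketch: organize once, then search in O(log n) if the tree stays balanced.
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None    # everything smaller goes left
        self.right = None   # everything larger goes right

def insert(root, value):
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

def search(root, target):
    steps = 0
    node = root
    while node is not None:
        steps += 1
        if target == node.value:
            return True, steps
        node = node.left if target < node.value else node.right
    return False, steps

root = None
for value in [45, 23, 67, 12, 34, 89, 5, 78]:   # this insertion order produces the tree above
    root = insert(root, value)

print(search(root, 34))   # (True, 3): three comparisons instead of eight
```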
The magic trick? We traded organization upfront (building the tree) for speed later (searching). This is the fundamental insight behind all modern search systems.
Pro tip: Every time you hear “indexing” in databases, it’s this same tradeoff: organize now, search fast later.
The Problem: Your Embeddings Aren’t Numbers
Here’s where things get interesting.
That BST works great for numbers because numbers have a natural order. You can say “5 is less than 45” and everyone agrees.
But your RAG embeddings? They look like this:
```
[0.234, -0.891, 0.456, 0.123, ..., -0.234]  # 768 dimensions
```
How do you sort that? What does "less than" even mean for a 768-dimensional vector?
You can't alphabetize by 768 criteria at once; there's no single axis to sort along. Your brain breaks trying to imagine 700+ dimensions. This is where Computer Science had to get creative.
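Without a usable ordering, the fallback is brute force: compare the query against every stored vector. A rough NumPy sketch (the corpus here is random data, purely to show the shape of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(100_000, 768))                 # stand-in for your corpus embeddings
docs /= np.linalg.norm(docs, axis=1, keepdims=True)    # normalize so dot product = cosine similarity

query = rng.normal(size=768)
query /= np.linalg.norm(query)

scores = docs @ query                                  # one dot product per document: O(n * d)
top_k = np.argsort(scores)[-5:][::-1]                  # indices of the 5 most similar chunks
```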
The Evolution: K-D Trees Enter the Chat
In the 1970s, someone had a brilliant idea: What if we extend BST logic to multiple dimensions?
Enter the K-D Tree (k-dimensional tree).
Here’s how it works for 2D points (imagine storing geographical coordinates):
Points: [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
The K-D Tree splits alternately:

```
Level 0: Split on X-axis at x=7
├─ Left region (x < 7): All points with x-coordinate < 7
│     Level 1: Split on Y-axis
│     ├─ Points below this Y-value
│     └─ Points above this Y-value
└─ Right region (x >= 7): All points with x-coordinate >= 7
      Level 1: Split on Y-axis
      └─ (continue alternating...)
```

The beauty? At each level, we split the space in half along one dimension. First X, then Y, then back to X, cycling through all k dimensions.
Nearest neighbor search becomes possible: When someone asks “What’s closest to point (3,5)?”, we don’t check all points. We traverse the tree, pruning entire branches that are too far away.
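You can try this on the toy points above: SciPy ships a K-D tree out of the box, so this sketch uses the stock scipy.spatial.KDTree, nothing custom:

```python
from scipy.spatial import KDTree

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = KDTree(points)                      # "organize now": build the splits up front

# "Search fast later": the query prunes branches that can't contain the nearest point.
distance, index = tree.query((3, 5), k=1)
print(points[index], distance)
```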
Cool, right?
The Curse Arrives: High Dimensions Break Everything
K-D Trees work beautifully… until dimension 20 or so.
Then something weird happens. In high-dimensional spaces, every point is roughly equidistant from every other point. It’s called the curse of dimensionality, and it ruins everything.
Think about it:
- In 1D, half your data is “close” (one side of the split)
- In 2D, one quarter is close (a quadrant)
- In 3D, one eighth (an octant)
- In 768D? Splitting once along every dimension would carve space into 2^768 regions, vastly more regions than you have data points. Your data is so spread out that pruning branches barely helps.
K-D Trees degrade to basically linear search in high dimensions. And your RAG embeddings live in 768-1536+ dimensional space.
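You can watch the curse happen in a few lines of NumPy. For random points, the ratio between the nearest and farthest distance from a query creeps toward 1 as dimensions grow (exact numbers will vary from run to run):

```python
import numpy as np

rng = np.random.default_rng(42)

for dim in (2, 20, 200, 768):
    points = rng.uniform(size=(1_000, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    # Closer to 1.0 means "nearest" barely beats "farthest" -- pruning has nothing to grab onto.
    print(f"{dim:4d}D  nearest/farthest = {dists.min() / dists.max():.3f}")
```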
Warning: This is why you can’t just build a K-D tree for your embeddings and call it a day. It’ll be slower than you think.
The Modern Reality: Approximate Is Good Enough
Here’s where the plot twists: Modern vector databases gave up on finding the perfect nearest neighbor.
Instead, they find approximately the nearest neighbors. And they do it at blazing speed.
Techniques like:
- HNSW (Hierarchical Navigable Small World): Multi-layer graphs that skip across the space
- FAISS (Facebook AI Similarity Search): Clusters vectors and narrows down search space
- Annoy (Approximate Nearest Neighbors Oh Yeah): Random projection trees (yes, trees!)
- ScaNN (Scalable Nearest Neighbors): Quantization + tree structures
These are all sophisticated descendants of the K-D tree idea. They still partition space hierarchically. They still prune search regions. But they add:
- Approximation: Accept roughly 98% recall in exchange for a ~100x speedup
- Graph structures: Navigate the space like a highway system (fast lanes + local roads)
- Quantization: Compress vectors to fit more in memory
- Hybrid approaches: Combine multiple strategies
When you use Pinecone, Weaviate, Qdrant, or Chroma, they’re all using variations of these methods under the hood.
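If you want to poke at one of these locally, FAISS is a good starting point. Here's a minimal IVF sketch (cluster the vectors first, then only visit a few clusters per query); the sizes and parameter values are illustrative, not recommendations:

```python
import faiss
import numpy as np

d = 768
docs = np.random.rand(100_000, d).astype("float32")    # stand-in embeddings

nlist = 1024                                            # number of clusters ("cells")
quantizer = faiss.IndexFlatL2(d)                        # used to assign vectors to cells
index = faiss.IndexIVFFlat(quantizer, d, nlist)

index.train(docs)                                       # learn centroids: the "organize now" step
index.add(docs)

index.nprobe = 16                                       # cells to visit per query: recall vs speed knob
query = np.random.rand(1, d).astype("float32")
distances, ids = index.search(query, 5)                 # approximate top-5 neighbors
```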
The Through-Line: It’s Trees All The Way Down
Here’s what this journey teaches us:
Every search system is about organizing space hierarchically. From BSTs to K-D trees to HNSW graphs—it’s all the same core idea: divide, conquer, and prune.
Understanding this evolution helps you:
- Debug slow RAG queries: Maybe your vector DB isn’t properly indexed
- Choose the right vector database: HNSW for recall, IVF for speed, etc.
- Tune parameters: You'll understand what "nprobe" or "ef" actually control (there's a quick sketch after this list)
- Explain to stakeholders: Why indexing takes time but search is instant
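To make the tuning point concrete, here's roughly what `ef` looks like with the hnswlib library (the parameter values are placeholders, not tuned recommendations):

```python
import hnswlib
import numpy as np

dim = 768
docs = np.random.rand(50_000, dim).astype("float32")

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=len(docs), ef_construction=200, M=16)  # build-time knobs
index.add_items(docs)

index.set_ef(64)                    # query-time knob: higher ef = better recall, slower search
query = np.random.rand(1, dim).astype("float32")
labels, distances = index.knn_query(query, k=5)
```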
Next time you call .similarity_search(), picture a tree. Imagine your query vector descending through branches, eliminating millions of irrelevant chunks, until it lands on the few that matter.
That 50ms response time? That’s a tree traversal.
Pro tip: If your RAG is slow, check these in order:
- Is your vector DB actually indexed? (Some lazy-load indexes)
- What’s your index type? (HNSW vs IVF vs Flat)
- Are you searching too many dimensions? (PCA/dimensionality reduction helps)
- Is your top-k too large? (Fetching 100 results vs 5 matters)
Where to Go From Here
If this clicked for you, here are some rabbit holes worth exploring:
- Locality Sensitive Hashing (LSH): Another way to organize high-dimensional space
- Product Quantization: How to compress 768D to fit in RAM
- HNSW deep dive: The graph structure that powers most modern vector DBs
- The curse of dimensionality: Why high dimensions are weird (and beautiful)
The fundamental lesson? Computer Science solved the search problem decades ago. Modern AI just had to adapt those solutions to weirder, higher-dimensional spaces.
And now you know: beneath every RAG query is a tree, waiting to be traversed.
Until next time. ☕️
Did this change how you think about vector search? Let me know what you’d like to explore next—maybe HNSW internals, or how to actually benchmark different vector DBs?