“Just Add an Index” Is Terrible Advice. Here’s Why.

Let’s be honest. Your query is slow.

You ran EXPLAIN, you saw the execution plan, and you did the obvious thing: you added an index. But the result was disappointing. The speed didn’t improve much, and in some cases, it might have even gotten slower.
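If you want a concrete picture of that sequence, it usually looks something like this (a hypothetical users table stands in for your schema):

```sql
-- Hypothetical table and column names, standing in for your schema.
-- Step 1: inspect the plan for the slow query.
EXPLAIN SELECT * FROM users WHERE status = 'active';

-- Step 2: the "obvious" fix.
CREATE INDEX idx_users_status ON users (status);

-- Step 3: run EXPLAIN again... and the plan often refuses to change.
EXPLAIN SELECT * FROM users WHERE status = 'active';
```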

Why? Wasn’t an index supposed to be the silver bullet?

This is where the journey to find the ‘real cause’ of the problem begins. The problem isn’t your indexing strategy. The problem is a gap in your fundamental understanding of how data is physically accessed. By the time you finish this article, you’ll stop asking which index to add and start dictating how data is read. You will become an architect.

1. The Starting Point: The Unseen Enemy, ‘Disk I/O’

Why You Have to Look at the Disk Before You Look at Your Code.

Most developers assume performance degradation is a symptom of complex application code. That assumption misjudges the scale of the problem entirely.

Your code executes in the realm of nanoseconds on the CPU and in memory. But the moment you need to access data stored on a disk, you enter the realm of milliseconds. Between those two worlds lies a gap of roughly a factor of a million, a difference far too large for human intuition to grasp.

No matter how complex your application logic is, it’s mostly a battle fought in the nanosecond world. A single disk access, however, is a millisecond-scale time barrier that renders all that effort meaningless. The culprit behind a slow query is, almost without exception, this “disk access” process.

The B-Tree: A Hero Forged in the War Against the Disk

So how does a database win this fight against the notoriously slow disk? The answer is the index, and the data structure it employs: the B-Tree.

The B-Tree is pathologically obsessed with one thing: reducing the number of disk accesses to an absolute minimum.

Its genius lies in an almost magical property: the tree’s height grows logarithmically, so it barely increases even as the data multiplies. For a table with hundreds of millions of records, the B-Tree’s height is typically no more than 3 or 4 levels.

The implication is clear. With an index, you only need to read the disk 3-4 times to find a single piece of data among hundreds of millions. Without an index? You might have to read the disk millions or tens of millions of times while scanning the entire table.
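The arithmetic backs this up. With an illustrative fanout (real numbers depend on page size and key width): if each B-Tree node holds around 500 keys, a tree of height 3 can address roughly 500 × 500 × 500 = 125,000,000 records. Three page reads. 125 million rows.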

So, the first lesson in optimization is this “simple truth”: our goal is to reduce the number of disk accesses.

But what if we pushed this simple truth to its extreme? What if we could make disk access “zero”?

2. The Architect’s Questions: Hunting the Real Culprit

Following this “simple truth” leads us to a naive but seemingly powerful conclusion: the Covering Index. You put every column the query needs into the index itself, so the database never has to touch the table at all. It seems like the perfect optimization.
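As a sketch, again with hypothetical names (the INCLUDE syntax is PostgreSQL 11+; in MySQL you’d simply append the extra column to the index key):

```sql
-- The query we want to "cover":
--   SELECT email FROM users WHERE status = 'active';

-- PostgreSQL: status as the search key, email carried as payload.
CREATE INDEX idx_users_status_covering
    ON users (status)
    INCLUDE (email);

-- MySQL/InnoDB equivalent: a plain composite index.
-- CREATE INDEX idx_users_status_covering ON users (status, email);
```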

But reality, as you’ve likely discovered, is different.

You’ve created the perfect covering index, yet the database optimizer makes the “seemingly stupid” choice to ignore your index and read the entire table. Why?

Why does the optimizer make this “counter-intuitive” decision? You’ll understand its genius when you see the cost difference between ‘Random I/O’ and ‘Sequential I/O’.

Imagine your query needs to retrieve 90% of a one-million-row table (say, every ‘active user’).

  • Using the Index (Random I/O): The optimizer finds 900,000 locations in the index, then has to perform 900,000 inefficient “jumps” across the disk to fetch the actual data scattered everywhere.
  • Reading the Table (Sequential I/O): The optimizer ignores the index and, in one single, efficient “drive” from start to finish, reads the entire table into memory and filters for the data it needs.

The optimizer knows that ‘one long drive’ is far cheaper than ‘900,000 jumps’. So it abandons your index.
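You can watch this decision happen. A PostgreSQL-style plan for the 90% query might look something like this (cost numbers illustrative):

```sql
EXPLAIN SELECT * FROM users WHERE status = 'active';

--  Seq Scan on users  (cost=0.00..22950.00 rows=900000 width=72)
--    Filter: (status = 'active'::text)
--
-- The planner chose the "long drive": one sequential pass beats
-- 900,000 index probes plus 900,000 random fetches from the table.
```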

This dilemma gives us two core lessons in optimization.

First, “How little data can we read?” (the lesson of the Covering Index).
Second, “How effectively can we filter?” (the lesson of Cardinality).
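Cardinality, the second lesson, is easy to measure yourself (hypothetical names again):

```sql
-- How many distinct values does the column actually have?
SELECT COUNT(DISTINCT status) AS distinct_values,
       COUNT(*)               AS total_rows
FROM users;

-- If this returns 2 distinct values across 1,000,000 rows, an index
-- on status alone can barely filter anything, and the optimizer knows it.
```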

But a true architect’s journey doesn’t end here. On top of these two lessons, you need a few more compasses for making sophisticated decisions.

The Architect’s Checklist: The Devil is in the Details

  • The ‘Batting Order’ of Index Columns: A composite index follows a “left-to-right” rule. Which condition is most frequent? Which is searched with an = operator? The order of your cuts determines the flavor of the dish (see the sketch after this list).
  • The ‘Access Pattern’ of the Query: A leading wildcard such as LIKE '%text' defeats the index entirely (also covered in the sketch below). The same ingredient is useless if you use the wrong cooking method.
  • The ‘Distribution’ of Data: What if cardinality is high, but the data is heavily skewed to one specific value? The optimizer knows this via its ‘statistics’ and may choose another path.
  • The Balance of ‘Read vs. Write’ Load: Is this table hit with thousands of inserts/updates per second? If so, an index (an ‘investment’) becomes a ‘debt’ that degrades write performance.
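Here is the promised sketch for the first two items, with hypothetical names; the principles hold across MySQL and PostgreSQL:

```sql
-- 'Batting order': equality-tested columns first, range columns last.
-- Serves: WHERE country = ? AND status = ? AND created_at > ?
CREATE INDEX idx_users_lookup ON users (country, status, created_at);
-- A query filtering only on created_at skips the leftmost columns
-- and generally cannot use this index at all.

-- 'Access pattern': assume an index on name.
CREATE INDEX idx_users_name ON users (name);

SELECT * FROM users WHERE name LIKE 'Kim%';  -- left-anchored: index range scan
SELECT * FROM users WHERE name LIKE '%Kim';  -- leading wildcard: full scan
```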

3. Conclusion: You Will Now Ask Different Questions

The source of your slow query wasn’t your complex business logic. It was a more primal problem: how you approach data as a physical entity.

You will no longer blame the ORM. You will no longer tear apart your application logic.

Instead, you will open EXPLAIN and see the path your database takes to access data. You will check the cardinality, design a covering index, and weigh the trade-offs between Reads and Writes.

You will begin to command the path of data access.

That is the work of an architect.


