SQL SERVER INDEX (Part 2): Key Lookup

 Firstly, I recommend watching this How to Think Like the Engine - YouTube by Brent Ozar on YouTube (all parts). It is incredibly helpful foundational material before you begin your SQL performance tuning journey.

To demonstrate concepts about key lookups, I will use the StackOverflow2010 database. Feel free to follow along with my examples. If you haven't, please check out my post Tools I Used During Query Performance Troubleshooting to see how to setup environment to obtain execution plans and track logical reads.

Now, let's consider the following index and query:

CREATE INDEX IX_PostTypeID
  ON dbo.posts (PostTypeId);

SELECT ID,
       Title,
       LastEditDate,
       Score
FROM   dbo.Posts
WHERE  posttypeid = 3 

The index IX_PostTypeID has PostTypeID as its key column. To execute the query, the optimizer can choose to use the index to seek post where the PostTypeID is 3.

However, the index doesn't contain other columns required by the query, specifically, TitleLastEditdateScore. Therefore, additional key lookups on the clustered index PK_Post_ID are necessary.

That is how a key lookup works. Using the non-clustered index IX_PostTypeIDSQL Server seeks post ids with PostTypeID = 3. Once those post IDs are retrieved, the optimizer performs key lookups on the clustered index PK_Post_ID to fetch the additional column data (See the demonstration below).

However, key lookups can be extremely costly if there are a large number of rows to lookup.


When Key Lookups Are Acceptable

Key lookups are generally acceptable when there are only a few rows to retrieve. 

In this case, the optimizer estimates that 28 post IDs match the condition PostTypeID = 3. Therefore, 28 key lookups are required to retrieve TitleLastEditDate, and Score. The optimizer determines that this is manageable.

When Key Lookups Become a Problem

However, as we change the query to search for a PostTypeID that matches millions of rows, the optimizer decides using index seek with key lookups is no longer efficient, as it would require millions of key lookupsInstead, it chooses a clustered index scan - scanning the entire table and filter rows (with logical reads totaling 800,856).

When I force the query to use the previous plan (index seek with key lookups), the number of logical reads increases fivefold, reaching over 4 million.


A key point to remember is that if cardinality estimates are accurate, the optimizer usually makes the right choice—unless a query hint forces it otherwise.

However, when cardinality is underestimated, the optimizer might mistakenly choose an index seek with millions of key lookups, when a scan would have been more efficient.


Note: When comparing estimated vs. actual row counts, remember that misestimation isn’t always due to outdated statistics or skewed data distribution. It can also be caused by parameter sniffing, where a cached plan is reused and isn't optimal for different parameter values.
To rule out parameter sniffing, always run your query with OPTION (RECOMPILE) when analyzing cardinality estimates.

The Solution: Covering Index

What if we want the query to still benefit from the index seek, but avoid costly key lookups?

To achieve this, we create a covering index that includes all the columns needed by the query.

DROP INDEX IX_PostTypeID  ON dbo.Posts;

CREATE INDEX IX_PostTypeID
  ON dbo.posts (PostTypeID)
  include (Title, LastEditDate, Score); 

This index covers the query, eliminating the need for key lookups on the clustered index PK_Post_ID.

The number of logical reads is 17,809 (smaller than when scanning the clustered index).

Why Not Include All Columns in an Index?

The goal of IX_PostTypeID is to support faster seeks for queries filtered by PostTypeID. However, we don’t want the index to become bloated, which would lead to longer reads and writes. That’s why we include only the columns required by the query.

A useful rule of thumb I learned from DBA guru Brent Ozar:

A table should have no more than 5 non-clustered indexes, and each index should ideally include no more than 5 columns —unless you're running a high-performance machine (e.g., 12–16 CPU cores, 64GB RAM, 2TB SSD).

Fewer supporting indexes may require some queries to scan the table to retrieve data (while seeks would have read a smaller number of pages and search the data faster). More indexes can slow down updates, deletes, and inserts; potentially lead to slower storage, and reduce memory efficiency. It’s important to find the right balance so performance doesn't suffer (helpful tool for index tuning and query tuning: sp_blitzindex and sp_blitzcache)

Cleanup

DROP INDEX IF EXISTS ix_posttypeid ON dbo.posts 

Comments

Popular posts from this blog

SQL SERVER INDEX (Part 1): Index Seek vs Index Scan and Statistics

📐 SQL Server Performance: Troubleshooting Memory Grant Overestimation using SQL Diagnostic Manager and SQL Server execution plan

📐 SQL Server Performance: Solving Memory Grant Overestimation - the limitations of Memory Grant Feedback and the power of SQL Tuning