SQL SERVER INDEX (Part 5): Function-based Indexes

Firstly, I recommend watching this How to Think Like the Engine - YouTube by Brent Ozar on YouTube (all parts). It is incredibly helpful foundational material before you begin your SQL performance tuning journey.

Assume we have an index on column C. A query has a WHERE clause that filters rows based on the values of func(C). func() is a function. In this case, the optimizer can make use of the index, but it cannot seek on the index.

To demonstrate, I will use the StackOverflow2010 database. Feel free to follow along with my examples. If you haven't, please checkout Tools I Used During Query Performance Troubleshooting to setup to obtain execution plan and logical reads data.

Problem

We have the following index on CreateionDate and a query that queries data created in 2008:
CREATE NONCLUSTERED INDEX [IX_CreationDate]
  ON [dbo].[Posts] ( [creationdate] ASC )
  include (id, title, lasteditdate, score)

SELECT id,
       title,
       lasteditdate,
       score
FROM   dbo.Posts
WHERE  Year(creationdate) = 2008 

Here, the optimizer performs an index scan rather than an index seek. Logical reads 28573.

Because the index tree IX_CreationDate has the CreationDate (Datetime) as key value, we cannot perform index seek on the index to search for the year 2008.

To enable an index seek, we need an index on the Year. In SQL Server, we can accomplish this by creating a computed column and indexing it.

A computed column is a virtual column that isn't physically stored in the table.
-- Create computed column CreationYear
ALTER TABLE dbo.Posts
  ADD CreationYear AS Year(CreationDate)
-- Create an index on the column CreationYear
CREATE NONCLUSTERED INDEX IX_Year_CreationDate
  ON dbo.Posts(CreationYear)
  include (id, title, lasteditdate, score) 

Rerun the query. The optimizer performs an index seek, locating the rows where YEAR(CreationDate) = 2008 and scanning the qualifying rows. Logical reads 1670, significantly better than 28,573 reads.

When running the query, I can still keep the YEAR(CreationDate) and the optimizer still knows to use the computed column index.

Small tweak to the problem

Now, suppose we modify the query to include the AnswerCount column in the SELECT statement:

SELECT id,
       title,
       lasteditdate,
       score,
       answercount
FROM   dbo.Posts
WHERE Year(CreationDate) = 2008 

Running this modified query, the optimizer now scans the clustered index instead. Logical reads 800,000.

This is because AnswerCount was not included in the IX_Year_CreationDate index.

If we use IX_Year_CreationDate index, the optimizer needs key lookups to get the AnswerCount value for each record whose Creation Year is 2008.

However, because there are 280k rows whose Creation Year is 2008, 280k key lookups are required.

In this case, a clustered index scan is more efficient, which is why the optimizer chooses it.

Once again, we see that lookup operations are highly sensitive to cardinality estimates!


The solution is to include AnswerCount column in the IX_Year_CreationDate index. 
DROP INDEXIF EXISTS ix_year_creationdate ON dbo.posts;
CREATE NONCLUSTERED INDEX ix_year_creationdate
ON dbo.posts
                          (
                                                    creationyear
                          )
                          include
                          (
                                                    id,
                                                    title,
                                                    lasteditdate,
                                                    score,
                                                    answercount
                          )
Logical reads 1807.

Alternative solution:

Function-based index is one way to solve, but in many situations, we can rewrite the query so that the index uses the original columns instead.

In that way, the optimizer can use the index IX_CreationDate instead of the index IX_Year_CreationDate. Also, IX_CreationDate can serve a wider range of queries than the year specific IX_Year_CreationDate index.

Clean up

DROP INDEX [IX_CreationDate] ON [dbo].[Posts];

DROP INDEX [IX_Year_CreationDate] ON [dbo].[Posts]; 

Comments

Popular posts from this blog

SQL SERVER INDEX (Part 1): Index Seek vs Index Scan and Statistics

📐 SQL Server Performance: Troubleshooting Memory Grant Overestimation using SQL Diagnostic Manager and SQL Server execution plan

SQL SERVER INDEX (Part 4): Statistics