Troubleshoot SQL Server Cardinality Estimates

Today, I will show you why a good cardinality estimate is essential for an efficient execution plan and how I troubleshoot inaccurate cardinality estimates.

I will use the database StackOverflow2010 to demonstrate.

INSERT 50,000 rows

I insert 50,000 rows with OwnerUserID 26837 into the Posts table, simulating a real-life workload.

DECLARE @date AS DATETIME = Getdate();

INSERT INTO posts
            (body,
             lastactivitydate,
             creationdate,
             score,
             viewcount,
             posttypeid,
             owneruserid)
VALUES      ('Some question',
             @date,
             @date,
             0,
             4,
             1,
             26837)
GO 50000

Run a query

The following query will be run:

DROP TABLE IF EXISTS #tempposts;

CREATE TABLE #tempposts
  (
     id INT
  )

INSERT INTO #tempposts 
SELECT id FROM dbo.posts WHERE owneruserid = 26837 

SELECT title, u.displayname, pt.type 
FROM dbo.posts p 
INNER JOIN #tempposts temp ON p.id = temp.id 
INNER JOIN dbo.users u ON p.owneruserid = u.id 
INNER JOIN dbo.posttypes pt ON p.posttypeid = pt.id 
OPTION (RECOMPILE, USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_150'));


Examine the execution plan

Execution plan (before the statistics are updated)


The Cardinality Estimator (from now on, the CE) failed to accurately estimate the number of rows being processed. While 50,319 rows were returned from the join, the estimate was only 202 rows. This underestimated row count caused downstream operators in the execution plan (e.g., the Sort operator) to be underestimated as well.

The query performs 457,732 logical reads. Whether this is optimal or not, we may not know yet, but we do know that the estimates are significantly off.
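
To reproduce the read counts yourself, a minimal sketch is to turn on STATISTICS IO around the query; the per-table counts appear in the Messages tab:

SET STATISTICS IO ON;   -- print per-table logical reads for the following statements
SET STATISTICS TIME ON; -- optionally, also print CPU and elapsed time

-- ... re-run the INSERT INTO #tempposts and the three-table join from above ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;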

Additionally, the optimizer chose to perform seeks on the PostTypes and Users tables instead of scans, based on the assumption of only 202 rows being returned.

Check out the OptimizerStatsUsage property

Since SQL Server 2017, we can identify which statistics were used during compilation through the execution plan.

To see this property, I make sure the query recompiles and runs under the newer compatibility level (e.g., 150) through query hints.

The OptimizerStatsUsage property is located under the properties of the root node of the plan:
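
If you prefer to read this property with T-SQL instead of clicking through the plan in SSMS, a rough sketch against the plan cache is shown below. Note that plans compiled with OPTION (RECOMPILE) may not stay in cache, so for this demo it is often easier to inspect the actual plan or Query Store instead:

-- Sketch: extract OptimizerStatsUsage entries from cached plans.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT st.text                                   AS query_text,
       si.value('@Statistics', 'sysname')        AS statistics_name,
       si.value('@ModificationCount', 'bigint')  AS modification_count,
       si.value('@LastUpdate', 'datetime2(7)')   AS last_update
FROM   sys.dm_exec_cached_plans AS cp
       CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
       CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
       CROSS APPLY qp.query_plan.nodes('//OptimizerStatsUsage/StatisticsInfo') AS x(si)
WHERE  st.text LIKE '%#tempposts%';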

The underestimated row count from the inner join between the temp table and the Posts table comes from two statistics: PK_Posts__Id and _WA_Sys_00000001_A32EF8E3.

Notice there have been 50,000 modifications to the Posts table since the statistics were last updated, corresponding to the rows we just inserted. Updating the statistics might help improve the accuracy of the estimate.
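
To see the same modification count without opening a plan at all, one option is the sys.dm_db_stats_properties DMF, which reports the modification counter for each statistics object on the table; a minimal sketch:

-- Sketch: rows, last update time, and modifications since the last update
-- for every statistics object on dbo.Posts.
SELECT s.name AS statistics_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter
FROM   sys.stats AS s
       CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE  s.object_id = OBJECT_ID('dbo.Posts');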

Investigate with DBCC SHOW_STATISTICS 

But we can dive deeper to understand how the CE arrives at the estimate of 202 rows. We will use the command DBCC SHOW_STATISTICS.

DBCC SHOW_STATISTICS ('dbo.Posts', PK_Posts__Id);

Next are the statistics associated with the temp table:

DBCC SHOW_STATISTICS ('#tempPosts', _wa_sys_00000001_b7a2b873);

Both statistics used by the join are on the Id column. The left table has stale statistics, so its maximum histogram step (RANGE_HI_KEY) stops at 12,496,713. The right table, the temp table, had its statistics compiled during execution, with a maximum RANGE_HI_KEY of 12,596,712.

For simple joins with a single equality or inequality predicate, the new CE uses a coarse alignment algorithm that aligns the two histograms using their minimum and maximum matching boundaries to estimate the number of matching rows.

The maximum matching histogram step of the temporary table on the right could have extended to the 5th step but stopped at the 2nd step because the left table's maximum step was only at 12,496,713.

As a result, the 50,000 newly inserted rows were not included in the calculation.
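
If you would rather query the histogram than read the DBCC output, sys.dm_db_stats_histogram (available since SQL Server 2016 SP1 CU2) exposes the same steps; a sketch for the PK_Posts__Id histogram, where the last row shows the maximum RANGE_HI_KEY the CE can align on:

-- Sketch: list the histogram steps behind PK_Posts__Id in step order.
SELECT sh.step_number,
       sh.range_high_key,
       sh.range_rows,
       sh.equal_rows
FROM   sys.stats AS s
       CROSS APPLY sys.dm_db_stats_histogram(s.object_id, s.stats_id) AS sh
WHERE  s.object_id = OBJECT_ID('dbo.Posts')
       AND s.name = 'PK_Posts__Id'
ORDER  BY sh.step_number;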

Update statistics

Let's update statistics and review the execution plan and the performance change:

UPDATE STATISTICS dbo.Posts;

With the statistics updated, the CE now accounts for the 50,000 new rows in its calculation. The inner join's row estimate is much closer to the actual: 41,030 rows. The query plan changes, and the optimizer now chooses to scan the Users and PostTypes tables instead of seeking them.

The number of logical reads performed by the query drops to 221,217, roughly a 2x improvement.

Trace flag 2371

Trace flag 2371 changes the default auto-update statistics threshold, 20% of the table's rows, to a dynamic percentage based on the number of rows in the table: the more rows, the lower the threshold.
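
If you want to try it, the flag can be enabled globally with DBCC TRACEON (or as the -T2371 startup parameter). Note that on SQL Server 2016 and later, databases at compatibility level 130 or higher already use this dynamic threshold by default:

-- Sketch: enable trace flag 2371 globally (-1) on the instance.
DBCC TRACEON (2371, -1);

-- Verify which global trace flags are active.
DBCC TRACESTATUS (-1);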

The Posts table has 3.7 million rows. Without the trace flag, we would need 3.7 million * 20% = 740,000 modifications before the system updates the statistics.

With the trace flag enabled, the threshold drops to sqrt(3.7 million * 1000) ≈ 60,000 inserts.
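
As a quick sanity check of those two thresholds (the exact square root is about 60,827):

-- Sketch: static 20% threshold vs. the dynamic SQRT(1000 * rows) threshold
-- for a table of roughly 3.7 million rows.
SELECT CAST(3700000 * 0.20 AS INT)         AS static_threshold,   -- 740,000
       CAST(SQRT(1000.0 * 3700000) AS INT) AS dynamic_threshold;  -- 60,827
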
There are many insights we can draw from the chart:

- With the trace flag enabled, once the number of inserts hits roughly 60k, the statistics are updated. Using the up-to-date statistics, the query plan becomes more efficient, which reduces the number of logical reads.

- The more data the query reads, the worse the performance, as indicated by the linear growth in logical reads as the number of rows inserted into the table (and qualifying for the query) increases.

- With up-to-date statistics and an optimal query plan, the slope of the growth becomes smaller, indicating better scalability.

It is important to actively update statistics in addition to relying on the SQL Server auto-update feature. Overdoing it, however, can be problematic. Based on other DBA posts, the general advice is to update statistics weekly. Through your own tuning experience, you can also learn whether your current statistics update strategy is sufficient.
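
As a starting point for such a schedule, here is a minimal weekly maintenance sketch; sp_updatestats only refreshes statistics that have had modifications, and the FULLSCAN line is an optional extra for tables you know are heavily modified:

USE StackOverflow2010;
GO
-- Refresh only the statistics that have had modifications since their last update.
EXEC sys.sp_updatestats;
GO
-- Optional: for a heavily modified table, a full scan gives a more accurate
-- histogram than the default sampled update.
UPDATE STATISTICS dbo.Posts WITH FULLSCAN;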

To better understand cardinality estimates, check out the Microsoft article Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator | Microsoft Learn.
