📐 SQL Server Performance: Troubleshooting Memory Grant Overestimation using SQL Diagnostic Manager and SQL Server execution plan

Have you ever wondered how SQL Server uses memory? Have you ever encountered a memory issue in SQL Server? Did you know that SQL Server isn't smart sometimes and it can overestimate and grant a significant amount of memory just for a single query to run —up to one-fifth of the server’s total memory 😱? If you are curious to learn more, read on 🤓.

Introduction

In SQL Server, before executing a query, a compilation phase may be triggered to generate an execution plan. An execution plan determines 3 key aspects: the order in which source tables are accessed, the operations used to retrieve data (e.g. seek, index/table scan), and the methods for joining, filtering, sorting, or aggregating data (Execution plan overview - SQL Server | Microsoft Learn). SQL Server can bypass the compilation phase if it finds a reusable cached plan for the query.

During the compilation phase, SQL Server estimates how much data each operation (e.g. read, join, sort, aggregate) will handle and uses that info to estimate the memory required for processing. Depending on the accuracy of this estimation, SQL Server may grant just enough, too much, or too little memory. This blog focuses on a real-world example that I encountered when SQL Server grants more memory than the query actually needs. For information on the opposite scenario, refer to the article An Introduction to Query Memory - Brent Ozar Unlimited®.

When the memory grant significantly exceeds the actual memory used by the query:

  • SQL Server might flush a lot of useful cached data to allocate memory for the query
  • The query may experience prolonged waiting times for memory (you can investigate this by examining the Query Store Wait Starts)
  • Other queries may have to wait for memory, leading to high CPU

In this blog, I will share about my experience that I used to:

  • Identify the problem: determine if the issue is related to SQL Server's memory grant and identify the query causing the issue using tools like SQLDM IDERA, SQL Server's built-in Query Store, and execution plans.
  • Explore potential solutions and their exceptions.

Problem Identification

At 8:31AM, the page life expectancy dropped sharply from 28,000 seconds down to 325 seconds, signaling a cache issue where pages in cache were being flushed to make room for something else.

During this time, CPU usage was also elevated. The Processor Queue Length spiked significantly compared to the baseline.

The wait time statistics of the RESOURCE_SEMAPHORE also increased. According to SQLSkills, “the RESOURCE_SEMAPHORE wait type is when a thread is waiting for a query execution memory grant so it can begin executing” (SQL Server RESOURCE_SEMAPHORE Wait). Right after the PLE dropped, the RESOURCE_SEMAPHORE wait time increased, and not one query but multiple queries experienced the wait (indicated by multiple colors on the chart). This suggested an after effect, where memory contention from one query causing other subsequent queries to wait for memory.

During this period (8:31AM - 8:36AM), a significant portion of the cache (16GB) was dedicated to Sort, Hash, Indexing operations. This pointed a particular query consuming excessive memory for Sort, Hash, or Indexing. By narrowing down to that particular historical snapshot, I identified the problematic query responsible for the high memory consumption, which resulted in high CPU usage and forced other queries to wait for memory.

*Sidenote: to answer for the ealier question "how SQL Server uses memory?" - the answer lies in the Memory Area chart (below). SQL Server allocates memory for various tasks: procedure plan cache, Connections, Locks, Database, Optimizer, Sorting, Hashing, and Indexing. Of these, the database typically occupies the most memory (the pink shade), as it is used to store the tables. But if a query needs a large amount of memory for Sort, Hash, or Indexing, what does it do? The server purges cached data and tables to make room for these operations. In the near future, queries that use those purged tables sadly read from disk instead of the memory.

One key feature of the SQL Diagnostics Manager tool is the ability to narrow down to any historical snapshot. My approach typically involves first identifying an issue or outlier on the server (related to CPU, disk, cache, networks, etc.) by skimming through the Overview Dashboard; and then drilling down into the problematic historical snapshot to examine further details such as resources and queries. In this case, during the snapshot (6 AM - 9 AM), SQL Diagnostics Manager pinpointed a single problematic query. The query had a CPU time of 2.6 minutes and performed an approximately 152 million logical reads — a concerning number, as values exceeding 1 million often indicate a need for query optimization.

*Note: Although I am highlighting the large number of logical reads, it doesn’t necessarily mean that this alone causes the memory issue that we observed.

Let’s look at the query’s execution plan. Examining the query plan reveals a significant discrepancy: the memory grant was 15GB, while the query only used 311MB. What exactly is a memory grant? Query memory grant (a.k.a. query work buffer) is a part of server memory used to store temporary row data during sorting and joining rows (Understanding SQL server memory grant - Microsoft Community Hub). The execution plan demonstrated that there are 4 Sort opertions executed and Sort operation often requires a lot of memory.

The next question is, What caused the inaccurate memory grant estimation? Although the query appears simple, it accesses data from a complex view involving multiple operations (see below). Since row estimation for each operation is based on the prior operation, any misestimation early on can worsen subsequent estimations. Thus, the more complex the query is, the worsen the row estimations can be. The inaccurate row estimation then caused miscalculation in memory grant.

Conclusion and future post

In summary, the memory issue that we observed stemmed from running a complex query that misled the SQL Server optimizer, resulting a significant overestimation of row count and memory requirements for operations like joining, hashing, and sorting.

With the supports from tools like SQL Diagnostic Manager, and excecution plan, along the knowledge and experience to interpret the data presented to you, you can effectively identify the root cause of any problem rather than just guessing. And if you lack an accurate sense of what the problem is, you won't be able find a solution.

If you want to learn how I solved this particular issue, please check out my post SQL Server Performance: Solving Memory Grant Overestimation - the limitations of Memory Grant Feedback and the power of SQL Tuning (mydbadventure.blogspot.com). Also, in an upcoming post, I will share more about my experience with reading execution plans, a crucial skill that every DBA should have to diagnose query execution problems. For instance, in this issue, as you see, I used the memory grant or row estimation info from the execution plan to help me confirm the problematic query and identify the cause of memory grant overestimation.

Key takeaways

If you suspect a memory grant overestimation issue in your system, look for the following signs:

  • A sudden drop in Page Life Expectancy
  • High CPU usage
  • Presence of RESOURCE_SEMAPHORE wait
  • Numerous running queries waiting for memory
  • Identify problematic query by reading the top memory consumption queries in Query Store, and reading the memory grant information in the execution plan
  • If a query's row size estimation is significantly different from the actual value, and the query involves a lot of join, hash, sort operations, memory grant is likely to be overestimated or underestimated.

Comments

Popular posts from this blog

SQL SERVER INDEX (Part 1): Index Seek vs Index Scan and Statistics

SQL SERVER INDEX (Part 4): Statistics