Optimizing SQL Server Performance when Sorting with Left Join: A 20-Row Solution

Understanding the Issue

A Stack Overflow question describes a SQL Server performance problem: a query that sorts on a column from a LEFT JOINed table is slow even though it returns only 20 rows. The goal is to retrieve those first 20 rows in a reasonable amount of time.

The Query

SELECT    o.OrderId, 
          p.PaymentDate

FROM      dbo.Orders o                              -- 6 million records

LEFT JOIN dbo.Payments p ON p.OrderId = o.OrderId   -- 3.5 million records

WHERE     o.IsDeleted = 0                           -- There is an index on this column.

ORDER BY  p.PaymentDate DESC                        -- There is an index on this column and on OrderId
OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY

Execution Plan Analysis

The execution plan shows an index scan on both the Orders and Payments tables. The query reads approximately:

  • 6 million records from the Orders table (the LEFT JOIN preserves every order, and all of them are joined before the sort)
  • 3.5 million records from the Payments table

However, only 20 rows are returned.

Performance Insights

The execution plan suggests that the main bottleneck is the index scan on both tables. The sort key, PaymentDate, lives in the Payments table, while the LEFT JOIN starts from Orders, so the plan reads all 6 million orders, joins them to 3.5 million payments, and only then sorts to find the top 20. Because just 20 rows are needed, performance can improve substantially if fewer rows reach the join and the sort.
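
Before changing the query, it helps to capture a baseline. The following is a minimal sketch that uses SQL Server's SET STATISTICS IO and SET STATISTICS TIME session options to report logical reads per table and elapsed time for the original query; the table and column names are those of the query above.

```sql
-- Report per-table logical reads and CPU/elapsed time in the Messages output.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT    o.OrderId,
          p.PaymentDate
FROM      dbo.Orders AS o
LEFT JOIN dbo.Payments AS p ON p.OrderId = o.OrderId
WHERE     o.IsDeleted = 0
ORDER BY  p.PaymentDate DESC
OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY;

-- Turn the options off again so later statements are not affected.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running the same wrapper around each rewritten query makes it easy to compare logical reads before and after a change.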

Optimizing the Query

Reducing the Number of Records Retrieved

The initial query retrieves every non-deleted order together with its payments, if any. Since we only need the top 20 rows, one option is to narrow the candidate set before sorting, for example with a date range that is small but still certain to contain at least 20 payments.

Consider adding a condition to the WHERE clause that keeps only payments within a recent time frame (e.g., the last 30 days), plus orders that have no payment at all:

SELECT    o.OrderId, 
          p.PaymentDate

FROM      dbo.Orders o                              -- 6 million records

LEFT JOIN dbo.Payments p ON p.OrderId = o.OrderId   -- 3.5 million records

WHERE     o.IsDeleted = 0 AND 
          (p.PaymentDate >= DATEADD(day, -30, GETDATE()) OR p.PaymentDate IS NULL)

ORDER BY  p.PaymentDate DESC                        -- There is an index on this column and on OrderId
OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY

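This modification returns the same first page as the original query only if at least 20 non-deleted orders have a payment within the last 30 days. If fewer do, orders without a payment (NULL PaymentDate) fill the remaining rows and older payments are skipped. Adjust the time frame to match your data.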
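
Whether a given window is wide enough can be checked directly. The sketch below counts non-deleted orders with a payment inside the window; the variable name @Since and the 30-day value are illustrative assumptions, and the datetime type is a guess at the type of PaymentDate.

```sql
-- Assumption: a 30-day window; widen it if the count comes back below 20.
DECLARE @Since datetime = DATEADD(day, -30, GETDATE());

SELECT     COUNT(*) AS RecentPaidRows
FROM       dbo.Payments AS p
INNER JOIN dbo.Orders   AS o ON o.OrderId = p.OrderId
WHERE      o.IsDeleted = 0
  AND      p.PaymentDate >= @Since;
```

If RecentPaidRows is at least 20, the date filter does not change the first page of results.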

Removing Unnecessary Records from the LEFT JOIN

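As mentioned in the Stack Overflow post, the LEFT JOIN is not necessary for this page of results. SQL Server treats NULL as the lowest value when sorting, so ORDER BY p.PaymentDate DESC places orders without a payment last, and the first 20 rows contain only orders that have a payment (as long as at least 20 such rows exist). Changing the join type to an INNER JOIN excludes the orders without a matching payment record: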

SELECT    o.OrderId, 
          p.PaymentDate

FROM      dbo.Orders o                              -- 6 million records

INNER JOIN dbo.Payments p ON p.OrderId = o.OrderId   -- 3.5 million records

WHERE     o.IsDeleted = 0 AND
          p.PaymentDate >= DATEADD(day, -30, GETDATE())   -- an INNER JOIN yields no unmatched rows, so no IS NULL branch is needed

ORDER BY  p.PaymentDate DESC                        -- There is an index on this column and on OrderId
OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY
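
The switch to INNER JOIN relies on how SQL Server orders NULLs: they sort as the lowest value, so a descending sort puts them last. A small self-contained sketch shows this; the table variable @Demo and its three rows are made up for illustration.

```sql
-- Hypothetical sample data: OrderId 2 has no payment date.
DECLARE @Demo TABLE (OrderId int, PaymentDate date NULL);

INSERT INTO @Demo (OrderId, PaymentDate)
VALUES (1, '2023-07-01'),
       (2, NULL),
       (3, '2023-07-15');

-- Returns OrderId 3, then 1, then 2: the NULL row comes last under DESC.
SELECT   OrderId, PaymentDate
FROM     @Demo
ORDER BY PaymentDate DESC;
```

Note that the two join types stop being equivalent on later pages (larger OFFSET values), where the unpaid orders would eventually appear in the LEFT JOIN version.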

Additional Considerations

  • Indexing: An index on the Orders table's IsDeleted column alone is rarely selective, because most rows are likely to share the value 0. For this query, the more useful index is one on the Payments table with PaymentDate as the leading key and OrderId as a second key, so that SQL Server can read payments in sorted order instead of sorting them.
  • Statistics: Keep statistics up to date so that they reflect changes in data distribution. This helps SQL Server choose better execution plans.
  • Application side: If the query still misses its target, consider changes outside the database, such as caching results for longer in the frontend app or adding a retry mechanism.
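
The indexing and statistics points can be sketched as follows. The index name is hypothetical, and the source states that an index on PaymentDate and OrderId already exists, so the CREATE INDEX statement applies only if yours is missing or has a different key order.

```sql
-- Hypothetical index name. PaymentDate leads so rows can be read
-- in the order the query sorts by; OrderId makes the index covering.
CREATE NONCLUSTERED INDEX IX_Payments_PaymentDate_OrderId
    ON dbo.Payments (PaymentDate DESC, OrderId);

-- Periodic maintenance: refresh statistics on both tables.
UPDATE STATISTICS dbo.Payments;
UPDATE STATISTICS dbo.Orders;
```

Creating an index builds its statistics as part of the operation, so the UPDATE STATISTICS statements matter mainly as ongoing maintenance after large data changes.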

Conclusion

Retrieving the top 20 rows of a sorted join is fast only when SQL Server can avoid joining and sorting millions of rows first. A date-based filter narrows the candidate set, and changing the LEFT JOIN to an INNER JOIN drops orders without payments, which sort last under a descending order anyway. Both changes can alter the result in edge cases, so verify them against your data. Remember to maintain suitable indexes and keep statistics up to date for better query performance.


Last modified on 2023-07-16