Optimizing Subqueries with Select Count: A Better Approach to Complex Queries

Using a Select Count for Subquery Instead of an ‘And’ Filter

As developers, we often find ourselves dealing with complex queries that require multiple filters and calculations. In this post, we’ll explore a common scenario where using a subquery with an ‘and’ filter can be inefficient and how to replace it with a more efficient approach.

Background and Motivation

The original query is designed to retrieve documents created within the last year by users in specific libraries (DEN or HOU). The current implementation uses multiple filters, including du.LIBRARYNAME, dm.TYPE, and dh.ACTIVITY, which can lead to performance issues as the dataset grows.

The Problem with Subqueries

The subquery in the original query is:

inner join mhgroup.DOCUSERS du (nolock) on dh.DOCUSER=du.USERID
where (dh.activity_datetime >= dateadd(dd,-365, getutcdate()))
and du.LIBRARYNAME in ('DEN','HOU')
and dm.TYPE = 'd'
and dh.ACTIVITY = 'Create'

This subquery filters documents based on the user’s library and activity type. However, this approach has several drawbacks:

  • Performance: Using a subquery can slow down query performance, especially when dealing with large datasets.
  • Readability: The use of multiple and conditions in the filter can make it harder to read and understand.

A Better Approach: Using a Case Expression

One way to improve this query is by using a case expression with SUM. This approach allows us to count non-billable documents for each user without having to repeat the subquery.

select 
    du.USERID as 'Username', 
    du.LIBRARYNAME as 'Home Office', 
    count(*) as 'Documents created'
    , NonBillableDocs = sum(case when dm.C1ALIAS like '9999%' then 1 else 0 end)
from mhgroup.dochistory dh (nolock)
inner join mhgroup.DOCMASTER dm (nolock) on (dh.DOCNUM=dm.DOCNUM and dh.VERSION=dm.VERSION)
inner join mhgroup.DOCUSERS du (nolock) on dh.DOCUSER=du.USERID
where (dh.activity_datetime >= dateadd(dd,-365, getutcdate()))
    and du.LIBRARYNAME in ('DEN','HOU')
    and dm.TYPE = 'd'
    and dh.ACTIVITY = 'Create'
group by du.USERID 
    , du.LIBRARYNAME
order by count(*) desc

Understanding the Case Expression

The case expression is used to evaluate a condition and return either 0 or 1 based on that condition. In this example, we’re using it to filter non-billable documents (dm.C1ALIAS like '9999%').

  • case when dm.C1ALIAS like '9999%' then 1 else 0 end: If the value of dm.C1ALIAS matches the specified pattern ('9999%'), return 1; otherwise, return 0.
  • sum(...): Sum up all the values returned by the case expression.

Benefits and Best Practices

Using a case expression with SUM offers several benefits:

  • Improved performance: By avoiding repeated subqueries, we can reduce the computational overhead of the query.
  • Simplified readability: The use of a single condition in the filter makes it easier to understand and maintain.

Best practices for using case expressions include:

  • Avoid complex conditions: Break down complex conditions into smaller, more manageable parts.
  • Use meaningful aliases: Choose descriptive aliases for your columns to improve query readability.
  • Test thoroughly: Verify that your query produces the expected results before deploying it in production.

Last modified on 2023-07-29