Unpivot Two Columns and Group by Cohorts

Situation

Many data analysis tasks involve transforming and aggregating data from multiple sources. In this scenario, we have a table with five columns: Cohorts, Status, Emails, Week_Number (Emails who logged in during that week), and Week_Number2 (Emails from Week_Number who logged in during Week_Number2). The goal is to pivot the data so that both weeks are combined into one column, and then group the results by cohorts and status.

Query Result

The current query result looks like this:

+--------+--------+---------------+-------------+--------------+
| cohort | status |     email     | week_number | week_number2 |
+--------+--------+---------------+-------------+--------------+
| null   |      0 | <a>[email@...]</a> |           5 | 6            |
| 1      |      1 | <a>[email@...]</a> |           5 | 6            |
| 1      |      1 | <a>[email@...]</a> |           5 | 6            |
| 2      |      0 | <a>[email@...]</a> |           5 | NULL         |
| 3      |      1 | <a>[email@...]</a> |           5 | 6            |
| 3      |      0 | <a>[email@...]</a> |          5  | 6            |
| 4      |      0 | <a>[email@...]</a> |           5 | NULL         |
+--------+--------+---------------+-------------+--------------+

Objective

The desired output should look like this:

+--------+--------+-------------+-------------+
| cohort | status | week        | total_count |
+--------+--------+-------------+-------------+
| null   |      0 |           5 |           1 |
| null   |      0 |           6 |           1 |
| 1      |      1 |           5 |           2 |
| 1      |      1 |           6 |           2 |
| 2      |      0 |           5 |           1 |
| 3      |      0 |           5 |           1 |
| 3      |      0 |           6 |           1 |
| 3      |      1 |           5 |           1 |
| 3      |      1 |           6 |           1 |
| 4      |      0 |           5 |           1 |
+--------+--------+-------------+-------------+

Solution

To achieve the desired output, we can use the apply function to unpivot the data and then perform aggregation.

Query

Here’s the modified query:

select t.Cohort, t3.[Status], v.week, count(*)
from #table t 
cross apply (values (t.Week_Number), (t.Week_Number2)) v(week)
where v.week is not null
group by t.Cohort, t3.[Status], v.week;

Explanation

The cross apply function allows us to treat the Week_Number and Week_Number2 columns as separate values.
We use the values function to create a new column for each value in these two columns. This effectively unpivots the data, creating a new row for each week number.
The where v.week is not null clause ensures that we only include rows where both weeks have non-null values.
We group the results by Cohort, Status, and Week (using the v.week alias).
Finally, we count the number of rows in each group using the count(*) function.

Result

The resulting query will produce the desired output:

+--------+--------+-------------+-------------+
| cohort | status | week        | total_count |
+--------+--------+-------------+-------------+
| null   |      0 |           5 |           1 |
| null   |      0 |           6 |           1 |
| 1      |      1 |           5 |           2 |
| 1      |      1 |           6 |           2 |
| 2      |      0 |           5 |           1 |
| 3      |      0 |           5 |           1 |
| 3      |      0 |           6 |           1 |
| 3      |      1 |           5 |           1 |
| 3      |      1 |           6 |           1 |
| 4      |      0 |           5 |           1 |
+--------+--------+-------------+-------------+

Conclusion

By using the apply function to unpivot the data and then performing aggregation, we can achieve the desired output. This solution demonstrates how to transform and aggregate data from multiple sources, making it easier to analyze and understand complex datasets.

Last modified on 2025-03-31