Unpivot Two Columns and Group by Cohorts
Situation
Many data analysis tasks involve transforming and aggregating data from multiple sources. In this scenario, we have a table with five columns: Cohorts, Status, Emails, Week_Number (Emails who logged in during that week), and Week_Number2 (Emails from Week_Number who logged in during Week_Number2). The goal is to pivot the data so that both weeks are combined into one column, and then group the results by cohorts and status.
Query Result
The current query result looks like this:
+--------+--------+---------------+-------------+--------------+
| cohort | status | email | week_number | week_number2 |
+--------+--------+---------------+-------------+--------------+
| null | 0 | <a>[email@...]</a> | 5 | 6 |
| 1 | 1 | <a>[email@...]</a> | 5 | 6 |
| 1 | 1 | <a>[email@...]</a> | 5 | 6 |
| 2 | 0 | <a>[email@...]</a> | 5 | NULL |
| 3 | 1 | <a>[email@...]</a> | 5 | 6 |
| 3 | 0 | <a>[email@...]</a> | 5 | 6 |
| 4 | 0 | <a>[email@...]</a> | 5 | NULL |
+--------+--------+---------------+-------------+--------------+
Objective
The desired output should look like this:
+--------+--------+-------------+-------------+
| cohort | status | week | total_count |
+--------+--------+-------------+-------------+
| null | 0 | 5 | 1 |
| null | 0 | 6 | 1 |
| 1 | 1 | 5 | 2 |
| 1 | 1 | 6 | 2 |
| 2 | 0 | 5 | 1 |
| 3 | 0 | 5 | 1 |
| 3 | 0 | 6 | 1 |
| 3 | 1 | 5 | 1 |
| 3 | 1 | 6 | 1 |
| 4 | 0 | 5 | 1 |
+--------+--------+-------------+-------------+
Solution
To achieve the desired output, we can use the apply function to unpivot the data and then perform aggregation.
Query
Here’s the modified query:
select t.Cohort, t3.[Status], v.week, count(*)
from #table t
cross apply (values (t.Week_Number), (t.Week_Number2)) v(week)
where v.week is not null
group by t.Cohort, t3.[Status], v.week;
Explanation
- The
cross applyfunction allows us to treat theWeek_NumberandWeek_Number2columns as separate values. - We use the
valuesfunction to create a new column for each value in these two columns. This effectively unpivots the data, creating a new row for each week number. - The
where v.week is not nullclause ensures that we only include rows where both weeks have non-null values. - We group the results by Cohort, Status, and Week (using the
v.weekalias). - Finally, we count the number of rows in each group using the
count(*)function.
Result
The resulting query will produce the desired output:
+--------+--------+-------------+-------------+
| cohort | status | week | total_count |
+--------+--------+-------------+-------------+
| null | 0 | 5 | 1 |
| null | 0 | 6 | 1 |
| 1 | 1 | 5 | 2 |
| 1 | 1 | 6 | 2 |
| 2 | 0 | 5 | 1 |
| 3 | 0 | 5 | 1 |
| 3 | 0 | 6 | 1 |
| 3 | 1 | 5 | 1 |
| 3 | 1 | 6 | 1 |
| 4 | 0 | 5 | 1 |
+--------+--------+-------------+-------------+
Conclusion
By using the apply function to unpivot the data and then performing aggregation, we can achieve the desired output. This solution demonstrates how to transform and aggregate data from multiple sources, making it easier to analyze and understand complex datasets.
Last modified on 2025-03-31