Wednesday, July 19, 2023

Pivoting with SQL

Just like last month’s entry, the roots for this entry were formed from a recent discussion on an IBM TechXChange Community - the Db2 for i SQL Community. If you’re not yet part of this online community, it’s a good resource for getting your Db2 for i & SQL-related questions answered.

The discussion in this community was regarding two tables like the following which contained employee data and deductions data for employees.


An IBM i developer was looking for SQL support which would allow the deduction details to be returned alongside the employee data in the same row. Essentially, their goal was to take the data in these two tables above and produce the following result set. In their situation, an employee could have a maximum of four deductions. 

This transformation of the data that moves values from a single column into multiple columns is often referred to as a pivot operation. While Db2 for i SQL support doesn’t include a pivot operator, it is possible to perform this type of transformation using a combination of rich Db2 for i SQL syntax.

The first step is associating a unique identifier with each employee’s deduction using the ROW_NUMBER specification as shown in this query.

SELECT EmpID, Deductions,
ROW_NUMBER() OVER (PARTITION BY EmpID) AS seqnum FROM deduction

This query converts the data in the Deduction table into the following output.  The Partition By keyword on the Row_Number specification is what causes the seqnum column (i.e., the unique identifier) value to reset to 1 when it encounters the deductions for a different employee.




Now that each employee’s deduction has a unique identifier associated with it, the next step is getting those individual deduction column values into a single row for each employee. The Group By EmpID clause in the following query is used to consolidate all of the deduction entries for an employee into a single result row.


SELECT EmpID,
       MAX(CASE WHEN seqnum = 1 THEN Deductions END) AS Deduct1,
       MAX(CASE WHEN seqnum = 2 THEN Deductions END) AS Deduct2,
       MAX(CASE WHEN seqnum = 3 THEN Deductions END) AS Deduct3,
       MAX(CASE WHEN seqnum = 4 THEN Deductions END) AS Deduct4
   FROM (SELECT EmpID, Deductions,
ROW_NUMBER() OVER (PARTITION BY EmpID) AS seqnum FROM deduction) GROUP BY EmpID)

 As the group of deductions for an individual employee is processed, a Case expression is used to perform the pivot of the employee’s deductions values into separate output columns. The unique identifier (seqnum) for a deduction is checked to see which output column (Deduct1, Deduct2, Deduct3, Deduct4) will be assigned the value of one of the employee’s deduction values. The MAX function is used only as a trick to get the deduction output columns into the result set - the result set can only contain grouping columns and aggregate functions because the outer Select statement is using a grouping clause.  This grouping query produces the following result set.




As you can see in the complete query solution that follows, this grouping query is encapsulated within a common table expression (CTE) named pivot_deductions. The last step remaining step is to join the pivoted deductions with the employee table, so the employee name can be part of the the final pivoted result set. 


WITH pivot_deductions AS
  SELECT EmpID,
       MAX(CASE WHEN seqnum = 1 THEN Deductions END) AS Deduct1,
       MAX(CASE WHEN seqnum = 2 THEN Deductions END) AS Deduct2,
       MAX(CASE WHEN seqnum = 3 THEN Deductions END) AS Deduct3,
       MAX(CASE WHEN seqnum = 4 THEN Deductions END) AS Deduct4
   FROM (SELECT EmpID, Deductions,
ROW_NUMBER() OVER (PARTITION BY EmpID) AS seqnum FROM deduction) GROUP BY EmpID) SELECT e.EmpID, Name, Deduct1, Deduct2, Deduct3, Deduct4 FROM employee e INNER JOIN pivot_deductions p ON e.EmpID = p.EmpID

Hopefully, this article will give you confidence to leverage some of the SQL functionality highlighted here to more easily perform your own data transformations whether they involve pivoting or not.