Just like last month’s entry, the roots for this entry were formed from a recent discussion on an IBM TechXChange Community - the Db2 for i SQL Community. If you’re not yet part of this online community, it’s a good resource for getting your Db2 for i & SQL-related questions answered.
The discussion in this community was regarding two tables like the following which contained employee data and deductions data for employees.
An IBM i developer was looking for SQL support which would allow the deduction details to be returned alongside the employee data in the same row. Essentially, their goal was to take the data in these two tables above and produce the following result set. In their situation, an employee could have a maximum of four deductions.
This transformation of the data that moves values from a single column into multiple columns is often referred to as a pivot operation. While Db2 for i SQL support doesn’t include a pivot operator, it is possible to perform this type of transformation using a combination of rich Db2 for i SQL syntax.
The first step is associating a unique identifier with each employee’s deduction using the ROW_NUMBER specification as shown in this query.
SELECT EmpID, Deductions,
ROW_NUMBER() OVER (PARTITION BY EmpID) AS seqnum FROM deduction
This query converts the data in the Deduction table into the following output. The Partition By keyword on the Row_Number specification is what causes the seqnum column (i.e., the unique identifier) value to reset to 1 when it encounters the deductions for a different employee.
Now that each employee’s deduction has a unique identifier associated with it, the next step is getting those individual deduction column values into a single row for each employee. The Group By EmpID clause in the following query is used to consolidate all of the deduction entries for an employee into a single result row.
SELECT EmpID,
MAX(CASE WHEN seqnum = 1 THEN Deductions END) AS Deduct1,
MAX(CASE WHEN seqnum = 2 THEN Deductions END) AS Deduct2,
MAX(CASE WHEN seqnum = 3 THEN Deductions END) AS Deduct3,
MAX(CASE WHEN seqnum = 4 THEN Deductions END) AS Deduct4
FROM (SELECT EmpID, Deductions,
ROW_NUMBER() OVER (PARTITION BY EmpID) AS seqnum FROM deduction)
GROUP BY EmpID)
As you can see in the complete query solution that follows, this grouping query is encapsulated within a common table expression (CTE) named pivot_deductions. The last step remaining step is to join the pivoted deductions with the employee table, so the employee name can be part of the the final pivoted result set.
WITH pivot_deductions AS
SELECT EmpID,
MAX(CASE WHEN seqnum = 1 THEN Deductions END) AS Deduct1,
MAX(CASE WHEN seqnum = 2 THEN Deductions END) AS Deduct2,
MAX(CASE WHEN seqnum = 3 THEN Deductions END) AS Deduct3,
MAX(CASE WHEN seqnum = 4 THEN Deductions END) AS Deduct4
FROM (SELECT EmpID, Deductions,
ROW_NUMBER() OVER (PARTITION BY EmpID) AS seqnum FROM deduction)
GROUP BY EmpID)
SELECT e.EmpID, Name, Deduct1, Deduct2, Deduct3, Deduct4
FROM employee e
INNER JOIN pivot_deductions p ON e.EmpID = p.EmpID
Hopefully, this article will give you confidence to leverage
some of the SQL functionality highlighted here to more easily perform your own data
transformations whether they involve pivoting or not.
cool sql transformation tips, thanks Kent
ReplyDeleteGreat and very useful example. And thank you for the suggestion join the Db2 for i SQL Community.
ReplyDelete