Monday, June 6, 2022

Got Shared CTEs?

This entry title harkens back to the Got Milk? advertising campaign. Who would have guessed that this campaign goes back almost 30 years?!? As they say, time really flies… now, back to the topic at hand.

If you’re reading this entry, hopefully you already know that the CTE acronym in the SQL world stands for Common Table Expression.  Even if you already knew what a CTE is, you may not understand what the term “shared CTE” is referring to.  A shared CTE is the term used to describe any CTE that is referenced more than once on a query definition.

Here’s an example of a query that contains a shared CTE.  In this query, the staff CTE is referenced twice in the main query definition which qualifies the staff CTE to be categorized as a shared CTE.

WITH staff (deptno, empcount) AS
   (SELECT deptno, COUNT(*) FROM employee
    GROUP BY deptno)
SELECT deptno, empcount FROM staff
WHERE empcount = (SELECT MAX(empcount) FROM staff)

In contrast, the following query contains multiple CTES, but no shared CTEs.  Each of the CTEs (top10_2020 & top10_2021) are only referenced once in the main query, so they don’t meet the criteria of a shared CTE.

WITH top10_2020 (customer_name, total_sales
    (SELECT customer_name, SUM(sales_amt) FROM sales
     WHERE year=2020
     GROUP BY customer_name
     ORDER BY SUM(sales_amt) DESC
     FETCH FIRST 10 ROWS ONLY) ,
   top10_2021 (customer_name, total_sales) AS
     (SELECT customer_name, SUM(sales_amt) FROM sales
      WHERE YEAR=2021
      GROUP BY customer_name
      ORDER BY SUM(sales_amt) DESC
      FETCH FIRST 10 ROWS ONLY)
SELECT Y1.customer_name, Y1.total_sales AS sales2020, 
       Y2.total_sales AS sales2021
   FROM top10_2020 Y1 INNER JOIN top10_2021 Y2
       ON Y1.customer_name = Y2.customer_name                

Knowing whether or not a CTE is a shared CTE is significant because an SQL Standards compatibility fix was recently delivered in the IBM i 7.5 release for SQL statements with shared CTEs. This fix had to be made because there’s a possibility that some queries with shared CTEs may return incorrect results if the queries are run while the tables referenced by a shared CTE are being changed. The fix will guarantee as dictated by the SQL standards that each reference to the shared CTE generates the same result set. The fix delivered by IBM may cause some queries with shared CTE references to run slower and some queries with shared CTEs to run faster.

Now, you may be thinking that it will be a long time before your company installs the IBM i 7.5 release, so why pay attention to this change. The reason that you should pay attention is that the code fixes for shared CTEs that were made for the IBM i 7.5 release will eventually be delivered as PTFs for the IBM i 7.4 release (NOTE: Level 23 of the 7.4 Database Group PTF includes the shared CTE fix). IBM recommends analyzing queries with shared CTEs before the PTFs are delivered so that clients understand the possible impact of these PTFs and have time to change their SQL statements, if needed.

To help with this analysis effort, IBM in late 2021 delivered PTFs for IBM i 7.3 and 7.4 that flag SQL statements with shared CTEs in both Plan Cache Snapshot and SQL Performance Monitor collections. In addition, these flags classify whether or not a Shared CTE is estimated to generate a large result set. Shared CTEs that generate a large result set have a greater chance of having performance issues after the fix is delivered as compared to CTEs with a smaller result set size. However, all SQL statements using shared CTEs have the potential to perform differently once the PTFs are applied.

It is possible to predict the possible performance impact of the future PTFs on SQL requests containing Shared CTEs with a simple coding change.  This simple change involves adding the following predicate, AND RAND() IS NOT NULL, to the CTE that is shared (i.e., referenced multiple times) on the SQL request. This predicate forces the Db2 query optimizer to use the same CTE runtime implementation which is used by the fix in the IBM i 7.5 release.

The IBM development team has published a detailed writeup which includes details on the SQL statement flagging that will aid analysis on 7.3 & 7.4 along with possible coding changes that you may want to consider. Our Db2 team in IBM Systems Lab Services can also be engaged to provide additional assistance, so let me know if we can help.

Before I close, I also want to highlight that the licensing change that I highlighted in last month's entry is effective starting June 1, 2022.  This change converts Db2 Add-On features such as Db2 SMP and Db2 Multisystem into no additional charge features for all of the IBM i 7.x releases (7.1, 7.2, 7.3, 7.4, 7.5)

No comments:

Post a Comment