Wednesday, June 29, 2022

DBA/DBE Appreciation Day

Not long ago, I discovered that DataBase Administrator(DBA)  Appreciation Day is observed annually on the first Friday in July which happens to be today!  I’ve recently discussed that although some of the traditional DBA tasks don’t apply to Db2 for i, that there is a real need for a Database Engineer (DBE) in IBM i shops. 

Based on our Systems Lab Services team interactions with clients, I think it will be difficult for IBM i shops to show appreciation to their DBEs on July 1 because far too many shops do NOT have a DBE or a DBE team.  Our team often sees the negative impact of the missing DBE during IBM i client engagements where we are brought into assess and analyze their SQL performance & Db2 for i databases. 

Because there’s not a DBE focusing on the database objects and data access, our team regularly sees issues like:

  • Queries that are poor performers that can be easily fixed with the creation of an index
  • Large tables that have grown past 90% of the Db2 size/row maximum limit – when the limit is reached no more rows can be added by your application!
  • Overcommitment of system resources due to a system-wide parallel degree setting of *MAX for the Db2 SMP feature.
  • Incorrect usage of SQL routine or program settings that are unnecessarily slowing performance
  • Low SQL Plan Cache hit ratio due to runaway usage of QTEMP tables by developers
  • Queries being run against tables with 60-70% of the rows deleted because tables are not being reorganized on a regular basis or they’re not configured to reuse deleted rows

It can be a challenge to find a qualified Db2 for i DBE, but that’s where Systems Lab Services can help with our DBE skills enablement offerings.  These services can help grown an existing IBM i team member into the DBE role or help convert a DBA from another platform into a DBE. So if you don’t have a DBE on your IBM i team to appreciate today, then make a goal to have one by DBA/DBE Appreciation Day 2023! 

I also discovered that July 1 holds the designation of International Joke Day as well, so I’d be remiss if I didn’t close out this July 1 focused entry with a database-related joke: 

Did you hear about the two relational databases that walked into a NoSQL bar. They left after 5 minutes.... 

    because they couldn't find a table! 🙃

No comments on the quality of my joke unless you’re willing to include your own database joke in the comment ðŸ˜Š

Monday, June 6, 2022

Got Shared CTEs?

This entry title harkens back to the Got Milk? advertising campaign. Who would have guessed that this campaign goes back almost 30 years?!? As they say, time really flies… now, back to the topic at hand.

If you’re reading this entry, hopefully you already know that the CTE acronym in the SQL world stands for Common Table Expression.  Even if you already knew what a CTE is, you may not understand what the term “shared CTE” is referring to.  A shared CTE is the term used to describe any CTE that is referenced more than once on a query definition.

Here’s an example of a query that contains a shared CTE.  In this query, the staff CTE is referenced twice in the main query definition which qualifies the staff CTE to be categorized as a shared CTE.

WITH staff (deptno, empcount) AS
   (SELECT deptno, COUNT(*) FROM employee
    GROUP BY deptno)
SELECT deptno, empcount FROM staff
WHERE empcount = (SELECT MAX(empcount) FROM staff)

In contrast, the following query contains multiple CTES, but no shared CTEs.  Each of the CTEs (top10_2020 & top10_2021) are only referenced once in the main query, so they don’t meet the criteria of a shared CTE.

WITH top10_2020 (customer_name, total_sales
    (SELECT customer_name, SUM(sales_amt) FROM sales
     WHERE year=2020
     GROUP BY customer_name
     ORDER BY SUM(sales_amt) DESC
     FETCH FIRST 10 ROWS ONLY) ,
   top10_2021 (customer_name, total_sales) AS
     (SELECT customer_name, SUM(sales_amt) FROM sales
      WHERE YEAR=2021
      GROUP BY customer_name
      ORDER BY SUM(sales_amt) DESC
      FETCH FIRST 10 ROWS ONLY)
SELECT Y1.customer_name, Y1.total_sales AS sales2020, 
       Y2.total_sales AS sales2021
   FROM top10_2020 Y1 INNER JOIN top10_2021 Y2
       ON Y1.customer_name = Y2.customer_name                

Knowing whether or not a CTE is a shared CTE is significant because an SQL Standards compatibility fix was recently delivered in the IBM i 7.5 release for SQL statements with shared CTEs. This fix had to be made because there’s a possibility that some queries with shared CTEs may return incorrect results if the queries are run while the tables referenced by a shared CTE are being changed. The fix will guarantee as dictated by the SQL standards that each reference to the shared CTE generates the same result set. The fix delivered by IBM may cause some queries with shared CTE references to run slower and some queries with shared CTEs to run faster.

Now, you may be thinking that it will be a long time before your company installs the IBM i 7.5 release, so why pay attention to this change. The reason that you should pay attention is that the code fixes for shared CTEs that were made for the IBM i 7.5 release will eventually be delivered as PTFs for the IBM i 7.4 release (NOTE: Level 23 of the 7.4 Database Group PTF includes the shared CTE fix). IBM recommends analyzing queries with shared CTEs before the PTFs are delivered so that clients understand the possible impact of these PTFs and have time to change their SQL statements, if needed.

To help with this analysis effort, IBM in late 2021 delivered PTFs for IBM i 7.3 and 7.4 that flag SQL statements with shared CTEs in both Plan Cache Snapshot and SQL Performance Monitor collections. In addition, these flags classify whether or not a Shared CTE is estimated to generate a large result set. Shared CTEs that generate a large result set have a greater chance of having performance issues after the fix is delivered as compared to CTEs with a smaller result set size. However, all SQL statements using shared CTEs have the potential to perform differently once the PTFs are applied.

It is possible to predict the possible performance impact of the future PTFs on SQL requests containing Shared CTEs with a simple coding change.  This simple change involves adding the following predicate, AND RAND() IS NOT NULL, to the CTE that is shared (i.e., referenced multiple times) on the SQL request. This predicate forces the Db2 query optimizer to use the same CTE runtime implementation which is used by the fix in the IBM i 7.5 release.

The IBM development team has published a detailed writeup which includes details on the SQL statement flagging that will aid analysis on 7.3 & 7.4 along with possible coding changes that you may want to consider. Our Db2 team in IBM Systems Lab Services can also be engaged to provide additional assistance, so let me know if we can help.

Before I close, I also want to highlight that the licensing change that I highlighted in last month's entry is effective starting June 1, 2022.  This change converts Db2 Add-On features such as Db2 SMP and Db2 Multisystem into no additional charge features for all of the IBM i 7.x releases (7.1, 7.2, 7.3, 7.4, 7.5)