Monday, March 24, 2025

Db2 Dummy Table Replacement Simplified

I’ve written in the past about the performance advantages of using tableless queries instead of the Db2 dummy table – SYSDUMMY1. As our IBM Expert Labs team has helped clients convert their queries from using SYSDUMMY1 to tableless syntax, we’ve found that some conversions are easier than others.

Simple SYSDUMMY1 queries like this one are easy to change to use a VALUES statement and there’s very little risk of breaking the query. 

  SELECT CURRENT DATE FROM sysibm.sysdummy1
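For reference, the converted tableless version of this simple query is just the VALUES statement itself – no FROM clause is needed at all:

  VALUES CURRENT DATE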

 

The conversions get more interesting when SYSDUMMY1 is referenced in a query along with “real” tables that store your business data. Here’s an example of that type of query:

SELECT 1 INTO :hv1 FROM sysibm.sysdummy1
     WHERE EXISTS (SELECT 1 FROM table1 
                    WHERE col1=:hv2 AND col2=:hv3)

 

While the following converted version of this query is not super complex, the number of changes required to eliminate the SYSDUMMY1 reference does increase the risk of the modifications accidentally causing the query to return different results.

WITH return1 (rtnval) AS (VALUES 1)
SELECT 1 INTO :hv1 FROM return1
  WHERE EXISTS (SELECT 1 FROM table1 
                  WHERE col1=:hv2 AND col2=:hv3)

 

Even though the risk of breaking the query is still low, I’ve recently started to recommend a simpler conversion method. This method involves creating an SQL view and replacing the SYSDUMMY1 reference with the name of your view.

 

First, let’s look at the definition of this view. Effectively this view uses the VALUES statement to return the same column value and column name as Db2’s SYSDUMMY1 table. The earlier conversion examples don’t reference the column value or column name, but your application may have SYSDUMMY1 queries that are dependent on the column value or name.

CREATE VIEW mylib.mydummy1 AS
  WITH returny (ibmreqd) AS (VALUES 'Y')
  SELECT ibmreqd FROM returny
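For instance, an existing query that depends on the SYSDUMMY1 column value, like the one below, returns the same single row with an IBMREQD value of 'Y' whether its FROM clause names SYSDUMMY1 or this view:

  SELECT ibmreqd FROM mylib.mydummy1 WHERE ibmreqd = 'Y'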

 

Now that a view is in place providing a logical equivalent of SYSDUMMY1, converting a query from SYSDUMMY1 to tableless syntax is as simple as changing the FROM clause. Essentially, the Find & Replace option in your code editor is all that’s needed, as demonstrated in the new version of the query below.

SELECT 1 INTO :hv1 FROM mylib.mydummy1 
  WHERE EXISTS (SELECT 1 FROM table1 
                  WHERE col1=:hv2 AND col2=:hv3)

 

I think the only downside of using this view-based approach is that one may overlook whether the SYSDUMMY1 reference is even needed. I’ve recently worked with clients that used SYSDUMMY1 in the following manner. 

SELECT col2 FROM table1 
  WHERE col1 = (SELECT someUDF(:hv1) FROM sysibm.sysdummy1)

 

I’m not sure why the programmers thought the Db2 dummy table needed to be used – maybe because of the user-defined function reference. However, a tableless query is also not needed in this case. The better correction is to change the query to the following:

SELECT col2 FROM table1 WHERE col1 = someUDF(:hv1)


Hopefully, you are able to see how this view-based approach simplifies the conversion to tableless query syntax and eliminates the risk of accidentally breaking your query.

 

Tuesday, February 25, 2025

To Check or Not to Check (Uniqueness)

In Hamlet’s famous ‘To Be or Not To Be’ speech, he wrestles with thoughts about life and death. Obviously, slow application performance is not a life-or-death issue, but it can sometimes feel that way if you’re the developer whose program is slowing down a critical business process.

Clients frequently engage our IBM Expert Labs team to analyze applications to identify performance bottlenecks and to identify solutions to improve performance. A couple months ago, I was reviewing the SQL statements embedded within a client’s RPG program. The analysis of the SQL identified a coding pattern where the program would run a SELECT statement against a table and almost immediately turn around and run an INSERT against the same table. The code looked something like the following:


SELECT 1 INTO :outhv FROM sometab
  WHERE col1=:hv1 AND col2=:hv2 AND col3=:hv3 AND col4=:hv4;

If sqlCode = 100;
  INSERT INTO sometab
     VALUES(:hv1, :hv2, :hv3, :hv4);
Endif;

 

When I asked the customer about the purpose of this coding pattern, they shared that the columns referenced on the Where clause defined a unique key for the table. Thus, the SELECT statement was being run to verify if the specified key value already exists in the table. If the SELECT statement didn’t return a row, the program would know that there was no possibility of a duplicate key error – meaning that the INSERT statement would run successfully.

 

This explanation led the developer to ask if it would be more efficient to have Db2 just check for the duplicate key value on the INSERT statement. With this approach, the program would run a single SQL statement, as opposed to the coding pattern above that results in two SQL statements being executed in the cases where the new values are unique. In general, the fewer calls that you make to Db2 for i (or any database), the faster your application will run. 

 

I put together a small performance test to verify if less is more when it comes to inserting new rows that may result in a duplicate key error. Essentially, is it faster to check first to avoid the duplicate key exception or not to check by running the Insert statement and relying on Db2 to detect duplicate keys?

 

My performance test used an SQL stored procedure with a loop to insert 1,000 rows – the loop had logic that would cause every other INSERT statement to fail with a duplicate key error. For the “not to check” version of my test, the following condition handler was used to trap the duplicate key error and then allow the stored procedure to continue to run its loop inserting rows. Even with the condition handler in place, Db2 still writes a duplicate key error message into the job log.


DECLARE CONTINUE HANDLER FOR SQLSTATE '23505' 
   SET errFlag = 'Y' ;
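To give you a feel for the test, here’s a minimal sketch of the “not to check” version of the procedure. The table name (testtab) and its single unique key column (keycol) are illustrative assumptions rather than the actual test objects:

CREATE OR REPLACE PROCEDURE not_to_check_test()
LANGUAGE SQL
BEGIN
  DECLARE i INTEGER DEFAULT 1;
  DECLARE errFlag CHAR(1) DEFAULT 'N';
  -- Trap the duplicate key error and keep looping
  DECLARE CONTINUE HANDLER FOR SQLSTATE '23505'
    SET errFlag = 'Y';

  WHILE i <= 1000 DO
    -- (i + 1) / 2 repeats each key value twice (1,1,2,2,...), so every
    -- other INSERT fails with a duplicate key error handled above
    INSERT INTO testtab (keycol) VALUES((i + 1) / 2);
    SET i = i + 1;
  END WHILE;
END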

 

I ran each version of the stored procedure multiple times to get a consistent timing. In the end, the “not to check” version of the tests consistently ran 5-6% faster than the “check” version of the procedure which avoided the duplicate key error by first running a SELECT statement. The performance tests essentially showed that the overhead of running the second SQL statement was greater than the overhead of Db2 signaling an error back to the application.

 

These test results reinforce the earlier assertion that performance is usually best when your application program runs the fewest number of SQL statements possible. With this coding pattern related to the insertion of unique keys, the answer to the question posed at the start of this entry is: Not To Check!

Friday, January 24, 2025

QTEMP Performance Loses Again

In the past, I’ve written about the inefficiencies caused by referencing QTEMP tables in your query requests. Due to the performance overhead of this approach, our IBM Expert Labs team is often employed by clients to help them reduce the number of queries using QTEMP. 

While working with a customer recently, I was reminded of one reason why developers in the past were forced to use QTEMP queries with stored procedures. While Db2 for i has supported stored procedures that return result sets for a very long time, there were many releases where those result sets were not available to invokers using embedded SQL. If the invoking program wanted to consume a result set returned by a stored procedure, the program had to use an API-based interface like the SQL Call Level Interface (CLI) or JDBC. So, if you were an RPG or COBOL programmer, you could use embedded SQL to call a stored procedure, but you were out of luck when it came to using embedded SQL to consume the contents of a result set returned by a stored procedure.

 

This limitation was what drove some developers to use QTEMP queries. The basic idea is that if the stored procedure copied the contents of its result set into a QTEMP table, then the invoker of the stored procedure just needed to know the name of the QTEMP table in order to be able to consume the result set with a simple query.

 

Here’s an example of the logic from a stored procedure using the QTEMP technique to return a result set to the invoker.


...
-- Create QTEMP result set table if it doesn't exist in job yet
DECLARE CONTINUE HANDLER FOR SQLSTATE '42704'
  CREATE TABLE qtemp.results
   AS (SELECT ordID, ordCustID FROM orders
           WHERE ordDate = inputDate) WITH NO DATA;

-- Clear our result set rows from prior call
DELETE FROM qtemp.results;

-- Populate with result set rows
INSERT INTO qtemp.results
 (SELECT ordID, ordCustID FROM orders 
           WHERE ordDate = inputDate);
...

 

 

The invoker would then declare a cursor over the QTEMP table to consume the result set with this simple query: SELECT ordID, ordCustID FROM qtemp.results. This query is one reason that the QTEMP solution has slower performance. The application needs to run one query to populate the QTEMP table and a second query to unload the QTEMP table.

 

The good news is that there’s a simpler and faster solution which has been available since the IBM i 7.1 release. This solution is made possible through Db2’s support for result set locator variables. Think of result set locator variables like pointers. These result set locators enable the stored procedure invoker to simply point at the result set returned by the procedure in order to consume it. This approach is more efficient since it doesn’t require a copy of the result set to be made like the QTEMP approach.

 

Using the result set locators support, a stored procedure uses an SQL cursor to make the result set contents available to the invoker. The following SQL demonstrates how that is done.

CREATE PROCEDURE GetOrders(IN inputDate DATE)
LANGUAGE SQL
DYNAMIC RESULT SETS 1
BEGIN 

DECLARE ordCursor CURSOR WITH RETURN FOR
 (SELECT ordID, ordCustID FROM orders 
           WHERE ordDate = inputDate);

OPEN ordCursor;

END

 

The consumption of this stored procedure’s result set is enabled by result set locators as shown in the following snippet of SQL code.

DECLARE rs1 RESULT_SET_LOCATOR VARYING;  

CALL GetOrders(inputDate);

ASSOCIATE LOCATOR(rs1) WITH PROCEDURE GetOrders;

ALLOCATE rs1Cursor CURSOR FOR RESULT SET rs1;
   
FETCH rs1Cursor INTO ordIdVar, custIdVar;
...

 

After declaring variable rs1 as a result set locator and calling the GetOrders stored procedure, the ASSOCIATE LOCATOR statement is executed to point the rs1 variable at the result set returned by the procedure. The ALLOCATE CURSOR statement is the final step, since a cursor is required to retrieve the rows contained in the result set. In this example, a cursor named rs1Cursor is allocated to fetch the rows from the stored procedure result.

Now that the two different coding methodologies have been compared, let’s talk about performance. In my testing, I measured the performance of repeated calls to the stored procedure in a single job. With distributed applications, it’s very common for the same stored procedure to be called over and over again within a database connection or job. My performance tests showed the QTEMP-based solution to be 2-4.5x slower than the solution using result set locators.  The result set locator performance advantage grew as the number of stored procedure calls increased. When comparing the performance of 25 calls, the result set locator solution was 2x faster. The performance advantage of result set locators grew to 3.8x faster with 150 procedure calls and to 4.5x faster with 300 procedure calls.

 

In my performance tests, the stored procedure result set contained 25 rows. Thus, each call to the QTEMP-based stored procedure results in 25 rows being copied into the QTEMP table. The QTEMP solution performance would slow even further with result sets that are larger in size. In addition, if the stored procedures were being called in multiple jobs, the QTEMP solution would suffer from the Plan Cache inefficiencies that I highlighted in my original QTEMP article.

 

There are some cases where a QTEMP-based solution is the winning choice, but hopefully it’s now clear that using a QTEMP table to return a stored procedure result set is not one of those situations.

Tuesday, December 3, 2024

SQL Performance Center's Early Christmas Gift

If you have kids, there’s a pretty good chance they’ve asked you at some point to open Christmas gifts early. The anticipation and excitement associated with their gifts makes it difficult to wait until Christmas to open gifts. Why wait, when you can do it now!

 

Speaking of new and exciting things, the 1.1.9.7 version of IBM i Access Client Solutions (ACS) includes some significant enhancements to the SQL Performance Center. The good news is that you don’t have to wait until Christmas because this update is available now!

 

These enhancements enable you for the first time to perform all your SQL performance analysis and tuning using just the SQL Performance Center. In prior versions, the ACS Schemas tool had to be used any time that you wanted to use the Index Advisor. As you can see in the figure below, a launchpad (i.e., green box) has been added to the top of the SQL Performance Center that allows easy access to index advice.


[Figure: SQL Performance Center with the new launchpad options]

The red box in this figure highlights that other data related to SQL performance can also be readily accessed from the launchpad.  The Maintained Temporary Indexes option makes it simple to find all the temporary indexes on your system using the MTI_INFO service that I’ve highlighted in the past. The Active Query Info option streamlines the process of identifying active queries on your system by returning the results from the ACTIVE_QUERY_INFO service which also has been recommended on this blog.

The performance data returned by these two new options can be sorted by clicking on a column.  However, the output data cannot be filtered at all. To counteract this limitation, the output for these two new interfaces (as well as other ACS interfaces) has been enhanced to include a new SQL widget button highlighted in the figure below. This SQL widget option allows you to easily access the SQL statement used to return the output. This SQL statement text can be copied into a Run SQL Scripts window. Then, you can add a WHERE clause to the query to return a filtered set of performance data (e.g., by job or by table).

[Figure: SQL widget button in the output window]

The updated SQL Performance Center also has a more integrated solution when it comes to analyzing the Plan Cache entries at an ASP (auxiliary storage pool) level. A couple of years ago, I wrote about the need to consider Independent ASPs when reviewing Plan Cache entries or creating a Plan Cache Snapshot. The default setting was to only show queries run against tables created in the System ASP.  The default setting is still the same with the latest SQL Performance Center, but now the ASP selection only appears when you are accessing an IBM i server that has an Independent ASP attached. When an active Independent ASP exists, the ASP (i.e., Database) selection is displayed next to the button to display statements in the Plan Cache.

[Figure: ASP (Database) selection next to the Plan Cache Statements button]

Notice that the first figure above had no selection option next to the Plan Cache Statements button. I think this visual clue is a great reminder to consider which ASP you want to focus your query performance analysis efforts on.

Now that you have a better understanding of the SQL Performance Center enhancements, I hope you agree that Christmas did come early with this ACS update. I also hope that your corporate IT policies allow you to access this gift right away:)  Merry Christmas and I look forward to talking to you in 2025!


Wednesday, November 13, 2024

Taming Query Temporary Storage Usage

 According to the Collins Dictionary, the word taming can be defined as: to overcome the wildness or fierceness of; make gentle, docile, obedient, or spiritless; subdue. Wild and fierce are words that you normally would not associate with the temporary storage used by queries. However, if the temporary storage used by runaway queries starts to push your system storage levels to 90-95% of capacity, then you probably will feel the fear associated with a wild animal approaching your camp. There are probably a few administrators out there who have had nightmares about system crashes due to running out of disk space.

The good news is that over the last several years, Db2 for i has added features (e.g., Query Supervisor) that make it easier to monitor and control the temporary storage being used by queries. The bad news is that our IBM Technology Expert Labs team is seeing these newer Db2 features not being used by clients as they engage us to help deal with system and SQL performance issues. You know what they say: you can lead a horse to water but you can’t make him drink.

 

To make it easier to leverage some of these Db2 for i features to keep query temporary storage usage under control, the IBM Technology Expert Labs team recently made two new tools available. These new aids are a Configurable Query Supervisor exit program toolkit and a Work with Queries using Temporary Storage command.


The Query Supervisor is a very flexible solution because you can create an exit program to take a wide variety of actions whenever a query exceeds a threshold like 10 GB of temporary storage. What our Db2 Expert Labs team has discovered is that Database Engineer (DBE) and System Admin teams don’t have the time and/or skills to create and maintain an exit program. Thus, this new toolkit allows clients to use the Query Supervisor to control query temporary storage without any programming! The exit program logic is controlled with a configuration table.

 

After defining a Query Supervisor threshold, users just need to perform a simple insert operation on the configuration table to define which actions should be taken when a query exceeds a temporary storage threshold. The supported actions include notifying QSYSOPR, cancelling the query, logging the event on a data queue, and sending an email via the SNDDST command.  These actions can also be conditional based on the current usage percentage of the system storage pool. For example, you may want to cancel a query if the System Storage Pool usage percentage is at 90%. However, if the System Storage usage is only at 70%, then maybe you have the exit program just notify QSYSOPR and allow the query to continue to run. If the initial set of configured actions doesn’t meet your requirements, then you just need to update the settings in the configuration table. No changes to the exit program or recompiles are necessary!
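As a rough sketch of how this comes together, the threshold itself is registered with the QSYS2.ADD_QUERY_THRESHOLD procedure and the desired actions are then described with rows in the configuration table. The configuration table name and columns shown here (mylib.qs_config with its threshold name, storage percentage, and action columns) are purely illustrative assumptions, and I’m assuming the temporary storage threshold value is specified in megabytes:

-- Register a Query Supervisor threshold of roughly 10 GB of temporary storage
CALL QSYS2.ADD_QUERY_THRESHOLD(
       THRESHOLD_NAME  => 'TEMP STG 10GB',
       THRESHOLD_TYPE  => 'TEMPORARY STORAGE',
       THRESHOLD_VALUE => 10240);

-- Hypothetical toolkit configuration: cancel the query when the system
-- storage pool is at or above 90% used, otherwise just notify QSYSOPR
INSERT INTO mylib.qs_config (threshold_name, min_storage_pct, action)
  VALUES ('TEMP STG 10GB', 90, 'CANCEL'),
         ('TEMP STG 10GB',  0, 'NOTIFY');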

 

The Active_Query_Info service makes it pretty easy to identify queries on your system that are consuming temporary storage or rely on large maintained temporary indexes (MTIs). Once the query is identified, however, you’re probably going to have to run another command to take action on the query that is gobbling up temporary storage.  The new Work with Queries using Temporary Storage (WRKQTMPSTG) command enables you to take quick action on the queries that are returned by the Db2 service.
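For reference, the identification step with the Db2 service itself can be as simple as the following sketch run from Run SQL Scripts. I’m assuming the no-argument invocation returns the active queries you’re authorized to see, and I’ve left the column list open since the exact names of the temporary storage and MTI columns vary by release:

-- List currently active queries; sort or filter on the temporary storage
-- related columns in the result to find the biggest consumers
SELECT *
  FROM TABLE(QSYS2.ACTIVE_QUERY_INFO()) AS aqi;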

 

As you can see in the figure below, once an active query with high temporary storage usage is identified, you can either end or hold the job to prevent the query from consuming more temporary storage on the system. The command also allows you to log the offending query to a Plan Cache Snapshot for later review by a DBE or developer.  Another option would be to use the QRO Hash value from the display to analyze the query with Visual Explain by using this value as a filter in the Plan Cache viewer that’s available with the IBM i Access Client Solutions (ACS) SQL Performance Center.

[Figure: Work with Queries using Temporary Storage (WRKQTMPSTG) display]

This new support also provides filter and sort capabilities to make it faster for you to identify the queries causing the temporary storage headaches on your system. 

Neither of these new tools completely automates the taming of query temporary storage on your system, but hopefully you can see how they make it much simpler for you to monitor and control query temporary storage usage. Why live in fear of wild, runaway queries overrunning the storage on your system when there’s a better way? If you’re interested in getting more information on how to purchase these new tools from IBM Technology Expert Labs, then just shoot me an email.


Wednesday, October 9, 2024

IBM i TR Boosts Db2 for i Functionality

 Just like Fall means that wonderful colors will be added to our landscapes each year, a new IBM i Technology Refresh means new features in Db2 for i for you to leverage. New and enhanced function support is the primary benefit of this round of Db2 enhancements.

 The SQL built-in functions toolset was enhanced with the delivery of a new GENERATE_UUID function. This function makes it easy to generate a Universally Unique Identifier (UUID) for those IT processes that require it.
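As a quick illustration, a tableless query is all it takes to try the new function out (I’m assuming it’s invoked with no arguments):

-- Returns a single generated Universally Unique Identifier value
VALUES GENERATE_UUID()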

 

For those of you who create your own functions with SQL, you may be able to realize better performance with your user-defined table functions (UDTFs). Prior to this latest round of enhancements, Db2 for i always generated the same cursor name (i.e., SQL_TABLE_CURSOR) for every UDTF. This naming approach can cause a cursor name collision when multiple SQL table functions are used within a query. As a result, UDTFs often needed to be defined with the FENCED option, which adds overhead to the function execution. The new cursor name generation algorithm should allow the NOT FENCED option to be used in more cases, enabling your UDTFs to run faster.
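For reference, here’s a rough sketch of an SQL table function defined with the NOT FENCED option; the function name and the orders table with its columns are simply carried over from earlier examples as assumptions:

CREATE OR REPLACE FUNCTION orders_for_date(inDate DATE)
  RETURNS TABLE (ordID INTEGER, ordCustID INTEGER)
  LANGUAGE SQL
  READS SQL DATA
  NOT FENCED
  -- Returns the orders placed on the specified date
  RETURN SELECT ordID, ordCustID FROM orders WHERE ordDate = inDate;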

 

The MTI_INFO service has been a great addition to the Database Engineer (DBE) toolbox. This service will now return a more complete set of information for Maintained Temporary Indexes (MTIs) associated with a partitioned table by including the partition name.  Remember, IBM only recommends the usage of partitioned tables when a table is close to the maximum size or row limit for a Db2 for i table. Even in those cases, I strongly encourage engaging the IBM Technology Expert Labs team to guide you in the table partitioning process – the very large data sets that can be stored in partitioned tables come with performance and management challenges that you will want to address with best practices.

 

The ACTIVE_JOB_INFO service has always provided the ability to retrieve the statement text for the SQL currently running in a job or that last ran in a job. This service has been enhanced to also return the QRO Hash and Plan ID identifiers for the active SQL request within a job. These identifiers make it easier to analyze the SQL statement using the tooling provided by SQL Performance Center in the IBM i Access Client Solutions (ACS) client.
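A sketch of pulling that detail for active jobs might look like the following; I’m assuming the DETAILED_INFO => 'ALL' invocation is what surfaces the statement-level detail, and I’ve left the column list open since the exact names of the new QRO Hash and Plan ID columns may differ from what you’d guess:

-- Return detailed job information, including the SQL statement text and,
-- with this TR, the QRO Hash and Plan ID for the active SQL request
SELECT *
  FROM TABLE(QSYS2.ACTIVE_JOB_INFO(DETAILED_INFO => 'ALL')) AS aji;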

 

These are my highlights of some of the Db2 for i advancements in the latest IBM i TR, so I’d encourage you to review the complete list on the IBM i Technology Updates wiki.

Monday, September 16, 2024

Cache Identity If You Can

 This entry’s title is derived from the Catch Me If You Can movie – a good one if you haven’t seen it yet. This movie  is based on the true story of a con man who takes on the identity of a pilot, doctor and lawyer as he steals millions of dollars by cashing fraudulent checks.  You may be wondering how a story like this relates to Db2 for i since none of the con’s fake identities were IT professions like programmer or database engineer. However, Db2 for i database tables can take on an identity in the form of an identity column.

 An identity column allows you to have Db2 automatically generate sequential numeric values for the column as new rows are inserted into the table. Thus, identity columns are ideal to use as a surrogate primary key or for natural key columns like order number. Here’s a simple example of an employee table using an identity column to generate the employee number as new employees are inserted into the table.


CREATE TABLE employee (
     empno INTEGER GENERATED ALWAYS AS IDENTITY
           (START WITH 10 INCREMENT BY 10),
     name CHAR(30), deptno CHAR(4))

 

The prior example didn’t include the CACHE setting for the identity column which defaults to 20 when the CACHE option is not specified. Here’s the updated version of our simple example to include the default setting for the CACHE option.


CREATE TABLE employee (
     empno INTEGER GENERATED ALWAYS AS IDENTITY
           (START WITH 10 INCREMENT BY 10 CACHE 20),
     name CHAR(30), deptno CHAR(4))


The default cache setting of 20 results in Db2 generating the next 20 values for that column and storing those generated values in memory. As new rows are written to this employee table, the generated values for the identity column are taken from the cached values in memory. This means that after 20 rows are inserted into the table, Db2 will have to generate and cache another 20 values in memory.

 

If the table with the identity column has a low velocity rate of inserts, then the default cache of 20 generated values will perform pretty well.  However, what if the table has a high insert velocity?  In that case, Db2 will have to pause after 20 rows are inserted to generate values for the next 20 rows. If you have a batch process that inserts hundreds of thousands of rows, like one of our recent IBM Technology Expert Labs clients, pausing after every 20 rows is not very efficient.  The client that our IBM Technology Expert Labs team assisted was able to speed up the writes of all those rows to the table by increasing the identity column cache size to 1024. 

 

A larger cache size can be specified using the CACHE clause when a table is initially being created or after the fact with an ALTER TABLE statement like the following:


ALTER TABLE employee   
  ALTER COLUMN empno
    SET GENERATED ALWAYS AS IDENTITY
    (START WITH 10 INCREMENT BY 10 CACHE 100)

 

The obvious question to ask at this point is what’s the downside of using a larger cache size. One tradeoff is memory. However, most IBM i servers have plenty of memory to use these days … so you shouldn’t worry too much about the memory needed to store 20 generated numeric values versus 100 numeric values. Another consideration is the potential for cached generated values being lost with a system crash. System failures are a pretty rare event in the IBM i world, but if a crash did occur all the cached generated values for an identity column will be lost.

 

Let’s say that this simple employee table was created with an identity column cache size of 100 and that a system failure occurs after the first 50 employees have been added to the table. After the system IPL, the next employee number value will be 1010 and not 510.  This gap occurs because the unused generated values (i.e., 510-1000) cached in memory were lost during the system crash. 

 

This gap potential may seem concerning, but don’t forget your identity columns will often have gaps in their values without a system crash. For example, if a row is deleted, then that generated identity value is not reused. If an Insert statement running under commitment control gets rolled back, the generated identity value for the rolled back Insert is also not reused. 

 

Caching in the IT world can often provide benefits, so I hope that you’ve learned how caching can be beneficial with Db2 for i identity columns.  If you’re interested in learning about other SQL performance tuning techniques like this one, IBM is hosting a Db2 for i SQL Performance enablement workshop December 10-12 at its facility in Montpellier, France that I will be helping lead. Contact me if you're interested in additional details.