Wednesday, November 13, 2024

Taming Query Temporary Storage Usage

 According to the Collins Dictionary, the word taming can be defined as: to overcome the wildness or fierceness of; make gentle, docile, obedient, or spiritless; subdue. Wild and fierce are words that you normally would not associate with the temporary storage used by queries. However, if the temporary storage used by runaway queries starts to push your system storage levels to 90-95% of capacity, then you probably will feel the fear associated with a wild animal approaching your camp. There are probably a few administrators out there who have had nightmares about system crashes due to running out of disk space.

The good news is that over the last several years, Db2 for i has added features (e.g., Query Supervisor) that make it easier to monitor and control the temporary storage being used by queries. The bad news is that our IBM Technology Expert Labs team is seeing these newer Db2 features not being used by clients as they engage us to help deal with system and SQL performance issues. You know what they say: you can lead a horse to water but you can’t make him drink.

 

To make it easier to leverage some of these Db2 for i features to keep query temporary storage usage under control, the IBM Technology Expert Labs team recently made two new tools available. These new aids are a Configurable Query Supervisor exit program toolkit and a Work with Queries using Temporary Storage command.


The Query Supervisor is a very flexible solution because you can create an exit program that takes a wide variety of actions whenever a query exceeds a threshold such as 10 GB of temporary storage. What our Db2 Expert Labs team has discovered is that Database Engineer (DBE) and System Admin teams don’t have the time and/or skills to create and maintain an exit program. Thus, this new toolkit allows clients to use the Query Supervisor to control query temporary storage without any programming! The exit program logic is controlled with a configuration table.

 

After defining a Query Supervisor threshold, users just need to perform a simple insert operation on the configuration table to define which actions should be taken when a query exceeds a temporary storage threshold. The supported actions include notifying QSYSOPR, cancelling the query, logging the event on a data queue, and sending an email via the SNDDST command. These actions can also be conditional based on the current usage percentage of the system storage pool. For example, you may want to cancel a query if the System Storage Pool usage percentage is at 90%. However, if the System Storage usage is only at 70%, then maybe you have the exit program just notify QSYSOPR and allow the query to continue to run. If the initial set of configured actions doesn't meet your requirements, then you just need to update the settings in the configuration table. No changes to the exit program or recompiles are necessary!
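
For reference, a temporary storage threshold is registered with the Query Supervisor through the QSYS2.ADD_QUERY_THRESHOLD procedure. The call below is only a minimal sketch for the 10 GB example mentioned above – the threshold name is made up, the parameter list is abbreviated, and the assumption that the value is specified in megabytes should be verified against the IBM documentation for your release.

CALL QSYS2.ADD_QUERY_THRESHOLD(
       THRESHOLD_NAME  => 'TEMP STORAGE 10GB',     -- illustrative name
       THRESHOLD_TYPE  => 'TEMPORARY STORAGE',
       THRESHOLD_VALUE => 10240)                   -- 10 GB, assuming the value is in MB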

 

The ACTIVE_QUERY_INFO service makes it pretty easy to identify queries on your system that are consuming temporary storage or relying on large maintained temporary indexes (MTIs). Once the query is identified, however, you’re probably going to have to run another command to take action on the query that is gobbling up temporary storage. The new Work with Queries using Temporary Storage (WRKQTMPSTG) command enables you to take quick action on the queries that are returned by the Db2 service.
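
As a starting point for that identification step, a query along these lines can surface the heaviest temporary storage consumers. This is only a sketch – the name of the temporary storage column returned by the service is an assumption, so adjust it to match the columns documented for your release.

SELECT *
  FROM TABLE(QSYS2.ACTIVE_QUERY_INFO())
  ORDER BY TEMPORARY_STORAGE DESC   -- column name is an assumption
  LIMIT 10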

 

As you can see in the figure below, once an active query with high temporary storage usage is identified, you can either end or hold the job to prevent the query from consuming more temporary storage on the system. The command also allows you to log the offending query to a Plan Cache Snapshot for later review by a DBE or developer. Another option would be to use the QRO Hash value from the display to analyze the query with Visual Explain by using this value as a filter in the Plan Cache viewer that’s available with the IBM i Access Client Solutions (ACS) SQL Performance Center.

[Figure: Work with Queries using Temporary Storage (WRKQTMPSTG) display]

This new support also provides filter and sort capabilities to make it faster for you to identify the queries causing the temporary storage headaches on your system.

Neither of these new tools completely automates the taming of query temporary storage on your system, but hopefully you can see how they make it much simpler for you to monitor and control query temporary storage usage. Why live in fear of wild, runaway queries overrunning the storage on your system when there’s a better way? If you’re interested in getting more information on how to purchase these new tools from IBM Technology Expert Labs, then just shoot me an email.


Wednesday, October 9, 2024

IBM i TR Boosts Db2 for i Functionality

 Just like Fall means that wonderful colors will be added to our landscapes each year, a new IBM i Technology Refresh means new features in Db2 for i for you to leverage. New and enhanced function support is the primary benefit of this round of Db2 enhancements.

 The SQL built-in functions toolset was enhanced with the delivery of a new GENERATE_UUID function. This function makes it easy to generate a Universally Unique Identifier (UUID) for those IT processes that require it.
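
If you want a quick feel for the new function, a statement along these lines should do. This is a minimal sketch that assumes GENERATE_UUID is invoked with no arguments; check the documentation for the exact signature and the format of the returned value.

VALUES GENERATE_UUID()

A generated UUID could then be used anywhere a unique identifier is needed, such as the value for a key column on an INSERT.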

 

For those of you who create your own functions with SQL, you may be able to realize better performance with your user-defined table functions (UDTFs). Prior to this latest round of enhancements, Db2 for i always generated the same cursor name (i.e., SQL_TABLE_CURSOR) for every UDTF. This naming approach can cause a cursor name collision when multiple SQL table functions are used within a query. As a result, UDTFs often needed to be defined with the FENCED option, which adds overhead to the function execution. The new cursor name generation algorithm should allow the NOT FENCED option to be used in more cases, enabling your UDTFs to run faster.
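
Here's a minimal sketch of an SQL table function declared with NOT FENCED; the table and column names are made up for illustration, and whether NOT FENCED is appropriate still depends on what your function body actually does.

CREATE OR REPLACE FUNCTION top_clients (min_sales DECIMAL(11,2))
  RETURNS TABLE (clientName VARCHAR(50), totSales DECIMAL(11,2))
  LANGUAGE SQL
  NOT FENCED
  RETURN
    SELECT clientName, SUM(salesAmt)
      FROM salesHistory
      GROUP BY clientName
      HAVING SUM(salesAmt) >= min_sales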

 

The MTI_INFO service has been a great addition to the Database Engineer (DBE) toolbox. This service will now return a more complete set of information for Maintained Temporary Indexes (MTIs) associated with a partitioned table by including the partition name.  Remember, IBM only recommends the usage of partitioned tables when a table is close to the maximum size or row limit for a Db2 for i table. Even in those cases, I strongly encourage engaging the IBM Technology Expert Labs team to guide you in the table partitioning process – the very large data sets that can be stored in partitioned tables come with performance and management challenges that you will want to address with best practices.

 

The ACTIVE_JOB_INFO service has always provided the ability to retrieve the statement text for the SQL currently running in a job or that last ran in a job. This service has been enhanced to also return the QRO Hash and Plan ID identifiers for the active SQL request within a job. These identifiers make it easier to analyze the SQL statement using the tooling provided by SQL Performance Center in the IBM i Access Client Solutions (ACS) client.
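
Here's a sketch of what that lookup might look like. The DETAILED_INFO => 'ALL' parameter is needed to surface the SQL-level details, and the column names used for the QRO Hash and Plan ID are assumptions based on the description above, so confirm them in the documentation.

SELECT job_name, sql_statement_text,
       qro_hash, plan_identifier      -- these two column names are assumptions
  FROM TABLE(QSYS2.ACTIVE_JOB_INFO(DETAILED_INFO => 'ALL'))
  WHERE sql_statement_text IS NOT NULL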

 

These are my highlights of some of the Db2 for i advancements in the latest IBM i TR, so I’d encourage you to review the complete list on the IBM i Technology Updates wiki.

Monday, September 16, 2024

Cache Identity If You Can

This entry’s title is derived from the Catch Me If You Can movie – a good one if you haven’t seen it yet. The movie is based on the true story of a con man who takes on the identities of a pilot, doctor, and lawyer as he steals millions of dollars by cashing fraudulent checks. You may be wondering how a story like this relates to Db2 for i since none of the con man’s fake identities were IT professions like programmer or database engineer. However, Db2 for i database tables can take on an identity in the form of an identity column.

An identity column allows you to have Db2 automatically generate sequential numeric values for the column as new rows are inserted into the table. Thus, identity columns are ideal to use as a surrogate primary key or for natural key columns like order number. Here’s a simple example of an employee table using an identity column to generate the employee number as new employees are inserted into the table.


CREATE TABLE employee (
     empno INTEGER GENERATED ALWAYS AS IDENTITY
           (START WITH 10 INCREMENT BY 10),
     name CHAR(30), deptno CHAR(4))

 

The prior example didn’t include the CACHE setting for the identity column, which defaults to 20 when the CACHE option is not specified. Here’s the updated version of our simple example to include the default setting for the CACHE option.


CREATE TABLE employee (
     empno INTEGER GENERATED ALWAYS AS IDENTITY
           (START WITH 10 INCREMENT BY 10 CACHE 20),
     name CHAR(30), deptno CHAR(4))


The default cache setting of 20 results in Db2 generating the next 20 values for that column and storing those generated values in memory. As new rows are written to this employee table, the generated values for the identity column are taken from the cached values in memory. This means that after 20 rows are inserted into the table, Db2 will have to generate and cache another 20 values in memory.

 

If the table with the identity column has a low velocity rate of inserts, then the default cache of 20 generated values will perform pretty well. However, what if the table has a high insert velocity? In that case, Db2 will have to pause after every 20 rows are inserted to generate values for the next 20 rows. If you have a batch process that is inserting hundreds of thousands of rows, like one of our recent IBM Technology Expert Labs clients, pausing after every 20 rows is not very efficient. The client that our IBM Technology Expert Labs team assisted was able to speed up the writes of all those rows to the table by increasing the identity column cache size to 1024.

 

A larger cache size can be specified using the CACHE clause when a table is initially being created or after the fact with an ALTER TABLE statement like the following:


ALTER TABLE employee   
  ALTER COLUMN empno
    SET GENERATED ALWAYS AS IDENTITY
    (START WITH 10 INCREMENT BY 10 CACHE 100)

 

The obvious question to ask at this point is: what’s the downside of using a larger cache size? One tradeoff is memory. However, most IBM i servers have plenty of memory these days, so you shouldn’t worry too much about the memory needed to store 100 generated numeric values versus 20. Another consideration is the potential for cached generated values to be lost in a system crash. System failures are a pretty rare event in the IBM i world, but if a crash did occur, all the cached generated values for an identity column would be lost.

 

Let’s say that this simple employee table was created with an identity column cache size of 100 and that a system failure occurs after the first 50 employees have been added to the table. After the system IPL, the next employee number value will be 1010 and not 510. This gap occurs because the unused generated values (i.e., 510-1000) cached in memory were lost during the system crash.

 

This gap potential may seem concerning, but don’t forget that your identity columns will often have gaps in their values without a system crash. For example, if a row is deleted, then that generated identity value is not reused. If an Insert statement running under commitment control gets rolled back, the generated identity value for the rolled-back Insert is also not reused.
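
A quick way to see this for yourself with the employee table defined above (the values noted in the comments assume an empty table):

INSERT INTO employee (name, deptno) VALUES ('SMITH', 'A01')   -- empno 10 is generated
DELETE FROM employee WHERE name = 'SMITH'
INSERT INTO employee (name, deptno) VALUES ('JONES', 'A01')   -- empno 20 is generated; 10 is not reused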

 

Caching in the IT world can often provide benefits, so I hope that you’ve learned how caching can be beneficial with Db2 for i identity columns.  If you’re interested in learning about other SQL performance tuning techniques like this one, IBM is hosting a Db2 for i SQL Performance enablement workshop December 10-12 at its facility in Montpellier, France that I will be helping lead. Contact me if you're interested in additional details.

Wednesday, August 21, 2024

30 Years of Db2 on IBM i

This month marks the 30th anniversary of Db2 on IBM i and its predecessor platforms – iSeries & AS/400. Those of you who follow the platform closely might think that my math skills are lacking since IBM i celebrated its 35th anniversary last year in 2023. Obviously, that’s a gap of 5 years.

 

Does that mean there wasn’t a database for the first 5 years of the platform? Clearly not, since the integrated relational database was one of the AS/400’s hallmarks. However, that integrated database had no official name. Those newer to the IBM i platform, including some of our younger developers in the Rochester lab, are unaware that the database was nameless for a period of time.

 

AS/400 database was as close as the platform came to an official name. Without an official name, it was difficult for IBM Sales and Marketing teams to tout the advantages of the platform’s integrated relational database. This nameless database was so far under the radar for some AS/400 customers that they didn’t even know they had a relational database for their applications to use. I can remember a couple of IBM salespersons talking about AS/400 clients that actually went out and requested bids from DBMS vendors like Oracle because they believed the platform didn’t have a database!

 

This lack of awareness fed into IBM’s decision to christen the integrated relational database with the name DB2/400 in 1994 with the availability of the V3R1 release. Adopting the Db2 Family brand for the AS/400 database was a no-brainer given DB2’s strong reputation for running mission-critical workloads and its strong commitment to industry standards.

 

With an official name that could finally be marketed, ads like the following started to show up in industry magazines – remember when magazines were a thing…

[Image: DB2/400 magazine advertisement]

DB2/400 was more than just a name change – the V3R1 release added new features to the database such as referential integrity, triggers, and stored procedures, putting the database on the same level as the existing Db2 Family members.

 

Once it joined the Db2 Family, DB2/400 benefitted from the sharing of ideas, designs, and technology amongst the family members. Substantial jumps in the SQL functionality of DB2/400 soon followed. In fact, five years later in 1999, another significant milestone was achieved when enough new features were added to adopt the DB2 Universal Database (DB2 UDB) branding. The evolution to the new name of DB2 Universal Database for AS/400 was warranted by the addition of advanced capabilities such as user-defined functions, user-defined types, and the large object data types (e.g., BLOB & CLOB).










After that, the database name evolved over time with the platform name changes, moving on to DB2 UDB for iSeries before arriving at the current name of Db2 for IBM i, or Db2 for i for short.

Hopefully, this history lesson has been useful to understand how Db2 for i has progressed over time and the benefits that it continues to receive from being part of the Db2 Family. Happy 30th anniversary Db2 for i!

Monday, July 29, 2024

Selects With No Limits

There’s no limit on the number of rows that an SQL SELECT statement can return to an application by default. There are a couple of different techniques, however, that programmers can employ to change this default behavior. One direct option is adding a Fetch or Limit clause to set the maximum number of rows the query can return. Usage of these clauses can prevent accidentally sending millions of rows across the network or ensure a query only returns the number of rows that can be displayed on the end user interface.

Like many syntax options, there’s nothing that prevents these clauses from being used incorrectly. The following examples highlight a poor coding practice with these clauses that has recently emerged in SQL performance reviews performed by our Technology Expert Labs team. Both of these SQL examples would fail if the specified SELECT returned more than one row and did not contain a Fetch or Limit clause.

SELECT c1 INTO :hv1 FROM t1 WHERE c2=1 AND c3='A' 
  FETCH FIRST 1 ROWS ONLY

SET :hv1 = (SELECT c1 FROM t1 WHERE c2=1 AND c3='A' LIMIT 1)

While these limiting clauses prevent the SQL from failing with a More Than One Row error, there are several issues with this coding approach.

One big issue is data integrity. If these clauses are being used just as a safety net – meaning the query shouldn’t return more than one row, but the clause is added just in case that happens – then these clauses will prevent the application from ever detecting that the query is returning more than one row. Multiple rows being returned could be caused by a data integrity problem, but that potential problem is hidden from the application with this coding practice. Furthermore, if it’s normal for the query to return more than one row from time to time, these SQL requests have told Db2 it’s okay to randomly select which row is returned to the program. That potential randomness exists because these SELECT statements don’t have an ORDER BY clause. In this case, Db2 is allowed to return the selected rows in any order that it wants. The chances of randomness go up even higher if these queries are eligible for parallel processing with Db2 SMP. I think you’d agree it’s a much cleaner and safer approach to control which row is assigned to the host variable instead of assigning a value at random.
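
If a single row really is what the application needs, a deterministic version along these lines makes the choice explicit. This is just a sketch – the updated_ts column is made up for illustration; order by whichever column defines which row should win in your data.

SELECT c1 INTO :hv1 FROM t1
  WHERE c2=1 AND c3='A'
  ORDER BY updated_ts DESC
  FETCH FIRST 1 ROW ONLY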

The other issue is performance. These clauses require Db2 to track how many rows a query is returning, and that tracking takes time and resources. Some internal tests show these clauses adding 0.5% overhead to the execution of a query. That overhead may seem like nothing to worry about, but just multiply that little overhead by thousands or millions of statement executions.

The Sky’s the Limit when it comes to the scalability of your SQL applications… assuming that you’re utilizing good SQL coding practices. If your company needs help learning efficient SQL coding practices, then our IBM Technology Expert Labs team is here to help.

Thursday, June 20, 2024

Copied Cursor Name Chaos

 

As our IBM Technology Expert Labs team performs SQL performance assessments for clients, we get the opportunity to review a wide variety of SQL statements.  These SQL statements run the gamut from simple to head-scratching to inefficient. 

One inefficient SQL coding technique that I’ve recently seen way too much of is the same cursor name being used over and over again. I was a developer once, so I understand the common coding practice of copying existing code (i.e., a cursor declaration) and customizing the copied code to meet your needs. However, the customization of copied cursor code should include changing the cursor name, and I’ll explain why.

Inputting a cursor name like C1 or CUR1 is fast and easy, but there are a couple of reasons why you should avoid reusing the same cursor name in your application programs. The reasons really come down to avoiding chaos, which is defined as complete disorder and confusion. The cursor name chaos impacts both the Db2 for i engine and your IBM i team.

First, duplicate cursor names can add a small amount of overhead to the runtime execution of your cursors. Db2 creates runtime data structures at program execution time to support the usage of your cursors. Once these runtime data structures are created for a cursor, Db2 tries to reuse them on future usages of that cursor within a job or database connection. Db2 uses the cursor name to find these cached runtime data structures. If your program has a half dozen cursors all named C1, then Db2 must do more work to figure out which runtime data structures belong to each of the 6 different cursors named C1. If each of the cursors in the program had a unique name, Db2 would be able to more quickly find the associated runtime data structures to reuse.

Second, duplicate cursor names add overhead and complexity to usage of SQL analysis tools such as those found in the IBM i Access Client Solutions (ACS) SQL Performance Center. Below, you’ll find a screen capture from my recent usage of the Plan Cache Snapshot tooling on a client’s snapshot. 

[Figure: Plan Cache Snapshot statement list showing multiple cursors all named C1]

I think you’ll agree that the repeated cursor names make it difficult to differentiate between the long list of C1 cursors. At first glance, you can be tricked into thinking that they’re all the same cursor. The user must scan further into the cursor declaration to find the C1 cursor that they’re trying to analyze and tune. Making the Statement column wider helps, but I think it’s easier when the leading characters of a cursor declaration clearly identify the cursor.

At this point, hopefully you’re convinced that cursor names matter. If so, the next logical question to ask is: how should I name my cursors? I believe the best approach is to use a name that describes the cursor result set from a business perspective. Cursor names like LatePayments or LowInventory would be good examples of this approach. If you think the business perspective approach requires too much thinking, then consider using the program name as a suffix or prefix.  For example, the two cursors declared in PgmA might be C1_PgmA and C2_PgmA. 
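
In an embedded SQL program, the declarations might then look something like this sketch (the table and column names are made up for illustration):

EXEC SQL DECLARE LatePayments CURSOR FOR
  SELECT invoiceID, clientName, dueDate
    FROM invoices
    WHERE paidDate IS NULL AND dueDate < CURRENT DATE;

EXEC SQL DECLARE LowInventory_PgmA CURSOR FOR
  SELECT itemID, onHandQty
    FROM inventory
    WHERE onHandQty < reorderPoint;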

Either of these cursor naming approaches is going to eliminate the overhead of finding the cached runtime data structures. In addition, they provide context when you’re reviewing a list of queries using ACS tools to analyze the Plan Cache or a Plan Cache Snapshot. More descriptive cursor names definitely make it easier to understand the who and why related to the usage of a cursor.  

Now that you understand there’s more than meets the eye when it comes to cursor names, I hope that you’ll be updating the SQL coding standards at your shop to avoid the chaos.



Tuesday, May 7, 2024

Another Round of Db2 for i Enhancements

Today, IBM announced the content of its biannual delivery of new and enhanced IBM i capabilities via the upcoming TRs (Technology Refreshes) for the 7.4 and 7.5 releases. As always, there are some nice additions to Db2 for i to make your life as an application developer and/or Database Engineer (DBE) easier!

On the programming side of things, there’s a new aggregate function, ANY_VALUE, for your grouping queries. This addition should make your grouping queries easier to write and maintain by providing a vehicle to include non-aggregated values in your grouping results. 

Walking through an example is the best way to understand the flexibility that the new ANY_VALUE function offers. The following grouping query generates a list of the Top 10 clients in terms of sales revenue.

SELECT clientName, SUM(salesAmt) AS totSales
  FROM salesHistory
  GROUP BY clientName
  ORDER BY totSales DESC LIMIT 10

Let’s say the business needs the report modified to include the client’s email address to make it easier for them to contact and thank the client for their loyalty. Instinctively, a developer might just add that column to the Select List as follows:

SELECT clientName, SUM(salesAmt) AS totSales, clientEmail
  FROM salesHistory
  GROUP BY clientName
  ORDER BY totSales DESC LIMIT 10

However, this change results in an SQL syntax error because the GROUP BY clause requires each item in the Select List to be either a grouping column or an aggregate function – the clientEmail column clearly doesn’t fall into either category. Before the ANY_VALUE function, there were two possible workarounds as shown below – neither of them ideal.

-- Workaround #1, update grouping clause
SELECT clientName, SUM(salesAmt) AS totSales, clientEmail
  FROM salesHistory
  GROUP BY clientName, clientEmail
  ORDER BY totSales DESC LIMIT 10

-- Workaround #2, use MAX grouping function
SELECT clientName, SUM(salesAmt) AS totSales, MAX(clientEmail)
  FROM salesHistory
  GROUP BY clientName
  ORDER BY totSales DESC LIMIT 10 

The first workaround adds performance overhead to the grouping query. Adding the email address to the GROUP BY clause causes Db2 to have a larger grouping value that must be compared as part of finding all the rows that belong to each group. In addition, it reduces the number of indexes that could be used by Db2 to speed up the aggregation process. Now, the index key definition must start with clientName and clientEmail in order for it to be usable by Db2. Before the grouping clause change, any index that had clientName as the leading key could be used by Db2 to perform the aggregation.
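
To make that concrete, with the original GROUP BY an index like the first one below could feed the aggregation; once clientEmail is added to the grouping, only an index whose key leads with both columns qualifies. The index names are made up for illustration.

CREATE INDEX salesHistory_client_ix
  ON salesHistory (clientName)

CREATE INDEX salesHistory_client_email_ix
  ON salesHistory (clientName, clientEmail)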

The second alternative makes the query harder to understand. The next developer that comes along may wonder why the maximum function is being used when the client email address is a constant value. Usage of the maximum function also adds a little performance overhead because Db2 must go through the work of computing the maximum value – even though the email column value will be the same within a client group.

The ANY_VALUE aggregate function simplifies the query writing and eliminates the performance overhead by telling Db2 that any old value in the specified column (or expression) will do as shown in this example query.

SELECT clientName, SUM(salesAmt) AS totSales, 
       ANY_VALUE(clientEmail) 
  FROM salesHistory
  GROUP BY clientName
  ORDER BY totSales DESC LIMIT 10

If you started leveraging the GENERATE_SPREADSHEET service delivered in the last IBM i TR in your programs, then you may appreciate that service being enhanced to support more spreadsheet types and smarter column width generation for xlsx and xls files.
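
For context, here’s a rough sketch of what a call to that service looks like. The parameter names are from memory and the path is made up, so verify the full parameter list against the GENERATE_SPREADSHEET documentation for your release.

CALL SYSTOOLS.GENERATE_SPREADSHEET(
       PATH_NAME         => '/home/myuser/top_clients',
       SPREADSHEET_QUERY => 'SELECT clientName, SUM(salesAmt) FROM salesHistory GROUP BY clientName',
       SPREADSHEET_TYPE  => 'xlsx')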

The latest version of IBM i Access Client Solutions (ACS) also includes some additions to help developers looking for SQL syntax help. The relatively new SQL Generators for Update and Delete statements were enhanced to generate more efficient SQL for non-nullable columns.  In ACS Run SQL Scripts, the performance of the Content Assist feature was improved to make it faster for those wanting to prompt their way through building an SQL request. The Content Assist feature for SQL statement building is launched by using the F4 key or Ctrl+Space.

ACS also includes new capabilities for the DBE. Specifically, there are new toolbar options to make customizing and interpreting Visual Explain output easier. Instead of having to navigate the Actions pull-down menu in Visual Explain, these new toolbar options make it easy to change the highlighting, icon labels, or arrow labels as you attempt to optimize the performance of your queries.





On the server side, DBEs should also enjoy the enhancements to the DUMP_PLAN_CACHE and ACTIVE_QUERY_INFO services. In the past, I wrote about the challenges of using the DUMP_PLAN_CACHE service with servers that utilize Independent Auxiliary Storage Pools (IASPs). The good news is that this service is no longer IASP challenged! A new parameter, IASP_NAME, has been added to enable the seamless collection of plan cache entries related to Db2 databases in your IASP, as shown in this example.

CALL QSYS2.DUMP_PLAN_CACHE(FILESCHEMA=> 'SOMELIB',
                           FILENAME  => 'SNAP1',
                           IASP_NAME => '*ALL')

The usability of the ACTIVE_QUERY_INFO service was also greatly enhanced for those of you that are trying to monitor and track queries running in host server jobs like the QZDASOINIT jobs.  The original version of the service had a job user parameter, JOB_USER, to track queries for a specific user. However, the JOB_USER filter was not applied to the current user of host server jobs.  So essentially, there was no way to do user-level filtering for active queries running in host server jobs. That limitation is now gone with the addition of a new USER_NAME parameter that is applied to the current user of a host server job. The following example shows how you can easily target the active queries run by BADUSER in the QZDASOINIT server jobs.

SELECT * FROM
   TABLE(ACTIVE_QUERY_INFO(JOB_NAME=> 'QZDASOINIT', 
                           USER_NAME=> 'BADUSER'))

These are my highlights of some of the Db2 for i advancements in the latest IBM i TR, so I’d encourage you to review the complete list on the IBM i Technology Updates wiki.  Also, don’t forget that the Memorandum to Users (MTU) gets updated with each IBM i TR as well – many users mistakenly believe that the MTU only gets updated at the GA of a new release.  Here are the links to the MTU: