Thursday, March 21, 2024

Not All Db2 Functions Are Created Equal

If you’ve ever perused the list of built-in Db2 for i functions, you’ve probably noticed that there’s some overlap in the capabilities provided by a portion of the functions. In some cases, the functions have different names but result in exactly the same processing being performed – the COALESCE and IFNULL functions are good examples. This situation arises because COALESCE is the name assigned to the function by the SQL standard, while IFNULL is a proprietary name used by other DBMS products. In this situation, the overlapping functions are created equal - meaning performance will be exactly the same regardless of which function name your SQL request references.
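
For example, each of the following statements (a quick sketch you can run yourself) performs identical processing and returns 'x':

VALUES COALESCE(CAST(NULL AS VARCHAR(10)), 'x');

VALUES IFNULL(CAST(NULL AS VARCHAR(10)), 'x');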

There are other overlapping functions, however, where performance can be different depending on the function that you choose. This difference happens most often where one function provides a superset of capabilities of another function. For example, the VARCHAR and VARCHAR_FORMAT built-in functions both support the conversion of a datetime value into a character string. The VARCHAR function supports a much smaller list of string representations of a datetime value than VARCHAR_FORMAT.

The larger and richer set of string representations of datetime values supported by the VARCHAR_FORMAT function comes at a cost. That added performance cost exists because Db2 implemented the VARCHAR_FORMAT built-in function as an internal user-defined function. The user-defined function implementation means that each invocation of the VARCHAR_FORMAT function involves a call to an external program to perform the specified processing. Most Db2 for i built-in functions, in contrast, are implemented as inline functions, which means that all of the processing is performed without the cost of an external program call.

Each of the following two SQL statements will return the same string value, but the VARCHAR_FORMAT function will consume more system resources and time in the process. So, it’s helpful to understand that there may be performance differences when choosing between built-in Db2 functions.

VALUES VARCHAR(:HOST_VAR, ISO);

VALUES VARCHAR_FORMAT(:HOST_VAR, 'YYYY-MM-DD');

The next logical question is: How do I know if a Db2 built-in function is implemented as an internal user-defined function? The good news is that it is documented in the Db2 for i SQL Reference. The bad news is that this documentation is hard to find. The function implementation details are not covered anywhere in the Db2 built-in function documentation – instead, they are reported indirectly in the CREATE INDEX documentation?!?!?

Why on earth would function implementation details be covered with the CREATE INDEX statement? To answer that question, you need to remember that Db2 for i enables you to create derived key indexes like the following where derivations such as a built-in function can be specified as part of the key definition.

CREATE INDEX index1 ON sometable( DATE(TSTMPCOL) )

Derived key indexes, however, don’t support functions that are implemented as user-defined functions. Thus, the CREATE INDEX documentation needs to highlight those built-in functions that are not allowed in your key definition because they are implemented as internal user-defined functions.
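
To illustrate with the table from the earlier example (a hypothetical sketch), the first index below can be created because DATE is an inline built-in function, while the second is rejected because VARCHAR_FORMAT is implemented as an internal user-defined function.

CREATE INDEX index2 ON sometable( DATE(TSTMPCOL) )

CREATE INDEX index3 ON sometable( VARCHAR_FORMAT(TSTMPCOL, 'YYYY-MM-DD') )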

When using this index-based function documentation, you need to be aware that not every function name is documented in this list. For example, the VARCHAR_FORMAT function name is in the list. However, the TO_CHAR function does not appear in this list even though its usage results in the VARCHAR_FORMAT function – and its internal user-defined function implementation – being invoked.

Hopefully, you've learned a new criterion that should be applied when selecting a Db2 built-in function to use in your SQL - now that you know not all Db2 functions are created equal.

Friday, February 23, 2024

Oldie, but Goodie - Journal Minimal Data

On this blog, I often highlight recent Db2 for i enhancements to ensure that users are leveraging the latest Db2 functions and features to speed up performance, simplify app development, and ease management of their databases.

With all the great enhancements churned out by the IBM i development team in Rochester, it’s easy to forget about valuable features delivered in older releases. I think a good way to characterize these older features is the Oldies, but Goodies phrase coined by a Los Angeles DJ.

In the last several SQL Performance Assessments that I’ve performed for IBM i clients, I’ve come to realize that Journal Minimal Data is an older feature (first delivered almost 25 years ago with V5R1!) that clients have either forgotten about or are not aware of. This feature can speed up applications that update journaled tables or physical files by writing only the changed data to the journal instead of the entire record image. This performance boost can be substantial in cases where the update operation changes the value of a small number of columns. The following SQL statement, which I recently analyzed in a client’s SQL workload, definitely fell into that category.

UPDATE table1 SET column1 = 'New Value' WHERE column1 = ' '

In the client’s case, the definition of table1 contained 50+ columns with a total record length of 1530 bytes. The column being updated, column1, was a fixed-length character column with a length of 50 bytes. Those metrics mean that 1530 bytes were being written to the journal every time this Update statement was executed – even though only 50 bytes were being changed. With the Journal Minimal Data support enabled, roughly 30 times less data would be written to the journal.

There was a tradeoff with this initial Journal Minimal Data support because the journal entry wasn’t readable. As a result, you couldn’t use the journal entry for audit purposes to determine who changed a column to a particular value. Knowing that could be a potential showstopper for some users, IBM development came to the rescue and added a Field Boundary setting (*FLDBDY) for the MINENTDTA parameter. This *FLDBDY value provides the performance benefit of minimizing the amount of data written to the journal entry while keeping the journal entries readable for audit purposes. BTW, this functionality still falls into the Oldie but Goodie category since it’s been around since V5R4, which GA’d back in 2006.
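
Enabling the support is a single change to the journal definition. Here’s a minimal sketch (the journal name is hypothetical) that runs the CHGJRN CL command through SQL:

CALL QSYS2.QCMDEXC('CHGJRN JRN(MYLIB/MYJRN) MINENTDTA(*FLDBDY)');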

It's still a good idea to keep up with the latest Db2 for i enhancements with this blog and the Db2 Technology Updates site, but also don’t forget to mine the IBM i documentation for classic “old” features that could boost performance or simplify your application development.

Friday, January 12, 2024

The Long & Short of SQL Names

While crafting a title for this month’s blog entry, I discovered that the phrase this title is based on, the long and short of it, started out a little differently. The original phrase used to mean the essence of something was actually: the short and long of it.

This original ordering of the phrase mirrors the evolution of table names on Db2 for i. Initially, table names were limited to a short 10-character maximum length. That max table name length jumped to 128 characters with the V3R1 release. Based on the 128-character max length for table names, you may think that the only consideration in choosing a new table name is how descriptive you want your name to be. The descriptiveness of the table name is definitely a consideration, but there’s also a performance consideration you’re probably unaware of. 

As mentioned earlier, names were initially limited to a length of 10 characters. This limit was not just for Db2 table names; it was the limit for all object types (data queues, journals, etc.) supported by the operating system. As a result, the internal system function that is used to resolve unqualified object names, like the reference to table1 in the following SQL, was built to support 10-character names.

SELECT column1 FROM table1 

When Db2 for i was enhanced to support table names up to 128 characters in length, the plumbing used by that internal system function was only extended to support object names up to 30 characters in length. Does that mean you can’t have unqualified references to table names greater than 30 characters long? NO – it just means that it will take longer for the system to resolve unqualified references to tables with a name longer than 30 characters.

The next logical question is: how much longer does it take? Simple performance tests on my system (your mileage may vary) showed the response time of an SQL request being almost 1.5% slower with a 40-character table name compared with the same SQL statement using a 20-character table name.  This same performance overhead would also apply to unqualified SQL view and alias names greater than 30 characters long.

This overhead may not seem like a big performance difference, but if you multiply that across hundreds or thousands of SQL statement executions on your system it could be significant. As I’ve highlighted on this blog before, little things can add up when you’re talking about SQL performance.

George Washington, the first U.S. president, agreed with this thinking. According to the Daily Stoic,  one of Washington’s favorite sayings was “many mickles makes a muckle”. That saying was from a Scottish proverb that highlighted that things add up – even little ones.

And that’s the long and short of it when it comes to performance considerations for SQL table, view, and alias names. 

Wednesday, December 13, 2023

Db2 Gives the Gift of Cache

If you’re like me during the Christmas season, you’re devoting a fair amount of brain cycles and time trying to come up with that perfect gift idea for a family member. There are obvious Christmas gifts to avoid like clothes for little kids, a gym membership for your spouse, etc. Then, there’s the controversial gift of cash which can lead to the question of how much thought you put into the gift (if you don’t think cash is controversial, try your own internet search on: is cash a good Christmas gift).

Luckily, Db2 for i has avoided all that controversy and debate by giving you the gift of cache instead of cash. Even better, you don’t have to wait until Christmas Day for this gift since the Db2 query engine has already given you the gift of cache – way back in V5R2!

Several months ago, I highlighted one of the cache gifts – the Plan Cache. This month, I’m going to highlight some under-the-radar caching associated with the Plan Cache that Db2 performs automatically to improve the performance of frequently run queries on your system.

Occasionally, the access plan (aka, query plan) constructed by the query optimizer utilizes temporary data structures and access methods  to implement your query. For example, a temporary sorted list may be used to perform an in-memory sort if your query contains an ORDER BY clause like the following. 

SELECT ordID, ordQty FROM orders 
  WHERE ordCustID = :inpID ORDER BY ordDate

When a query is done running and the query plan includes a temporary data structure, Db2 sometimes decides to leave that temporary data structure on the system populated with the data from your table.  The reason that it decides to leave your table data cached in temporary data structures is that it can improve performance on the next run of that query.

If that query is run again and the data in the underlying table has not been changed, Db2 can skip the work of repopulating the temporary data structure with your table data. Depending on the amount of table data being copied, this can provide a nice performance boost to the next run of the query. If a query contains predicates that reference parameter markers or host variables, like ordCustID = :inpID in this example query, then Db2 will also validate that the query is searching for the same value, in addition to verifying that the table data is unchanged. If a different host variable or parameter value is being searched for by the query, then the cached data in the temporary data structure cannot be used.

The Db2 query engine provides feedback on a couple of different interfaces when it boosts your query performance with cached temporary data structures. If you’re analyzing a query plan with Visual Explain (VE), you can see below that the highlighted Final Select icon has a couple of fields in the Actual Runtime Information section providing feedback on the use of cached temporary data structures.

VE provides a count of the number of times a cached temporary result was used during the execution of the query. If Db2 was unable to reuse the cached data in a temporary data structure, then it provides the last reason that the cached data could not be used. For this query, the cached data could not be reused because rows in the table had changed (e.g., via Insert, Update, or Delete) since the temporary data structure had been populated.

The live Plan Cache viewer in the IBM i Access Client Solutions (ACS) SQL Performance Center can also be used to get feedback on the reuse of cached data from temporary data structures. After you launch the ACS SQL Performance Center, the live Plan Cache viewer is accessed by clicking on the Show Statements button. As you can see in the figure below, this interface allows you to see cached results feedback at a statement level. Most likely, you’ll have to use the Show Statements Columns editor to get your Total Cached Results Used column to display next to the query text.

In this case, half of the query runs were able to use the table data cached in a temporary data structure instead of spending time and resources copying the same data into the temporary data structure. The application did nothing but run the query 8 times and Db2 automatically boosted the performance on half of those runs with its temporary data structure caching.

As you’ve learned here, your mileage may vary on the frequency of cached results on your system, based on how often the underlying tables are changed and how often the same host variable and parameter marker values are searched. Our IBM Technology Expert Labs team has noticed several instances where 10-15% of some query executions on busy systems are benefitting from this cache as we’ve assisted IBM i clients with SQL performance tuning.

This Db2 gift of cached temporary results is not a new gift, but hopefully you have a better understanding of the query performance benefits that this type of caching can provide. Merry Christmas and I look forward to talking to you in 2024!


Monday, November 13, 2023

Can SMP stand for System May Plod?!?!

I'm guessing that you already know that the SMP abbreviation in Db2 for i SMP actually stands for Symmetric Multiprocessing and not System May Plod. However, I'm hoping a title that highlights a different meaning - especially one that implies slow performance - gets people's attention, because improper usage of Db2 SMP can have negative impacts.

One definition of plod that I found online is: walk doggedly and slowly with heavy steps. I'm not sure about you, but that's not how I'd want the performance of my IBM Power server described! How could slow ever be associated with the Db2 for i SMP feature?  When one thinks of parallel processing, it's easy to envision your queries receiving a performance boost like an F1 race car receives from its turbochargers. While this type of performance improvement can occur for your queries with Db2 SMP, it can also result in plodding system performance when parallel processing is not properly used.

The improper usage of Db2 SMP occurs when it's enabled systemwide instead of selectively applying parallel processing to long-running queries and database operations in specific jobs. A couple of years ago, I summarized parallel processing best practices for Db2 for i in an entry titled: The ABCs of Effective Db2 SMP Usage. The "C" stands for Controlled Usage - just another way of describing how parallel processing should only be applied to a subset of your workloads. Despite this strong recommendation from me and other IBMers, too many IBM i clients are taking the easy road of activating Db2 SMP systemwide looking for a quick fix for their query performance challenges.

This "easy" approach of enabling database parallel processing systemwide may enhance query performance, but it can be at the expense of other workloads running on the system. Below you'll find an IBM iDoctor for IBM i graph of CPU utilization (green line) from a customer's system. Notice how the CPU utilization is holding steady around a 60% rate. That steady, healthy CPU usage rate is then interrupted by a sharp jump in CPU utilization for several minutes in the 85-90% range.

When our IBM Technology Expert Labs team was engaged by the client to determine the root cause of these system performance spikes, they found that the high CPU usage peaks were the result of multiple queries on the system using parallel processing. The query users were probably happy, but it's easy to see how other users and workloads on the system were negatively impacted. This client's experience was not just a one-off - my colleagues in IBM Support and Technology Expert Labs have shared several stories with me about clients engaging them to explain their system slowdowns.

This poor practice has been compounded by the fact that since June 2022, IBM i clients can try Db2 SMP without paying for a license. In prior years, IBM i customers had to make a monetary investment before they could take the easy road of enabling Db2 SMP systemwide. In fact, IBM Support's official stance is that clients should only enable parallel processing systemwide on IBM i 7.5. This latest release of IBM i contains two new features, PARALLEL_MAX_SYSTEM_CPU and PARALLEL_MIN_TIME, which are very useful in preventing a system from being overrun by systemwide usage of Db2 SMP.
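
Assuming these controls are set like other QAQQINI query options (the library name and values below are placeholders - check the QAQQINI documentation for the supported formats), a sketch of applying them in a QAQQINI file copy looks like this:

UPDATE MYLIB.QAQQINI SET QQVAL = '50' WHERE QQPARM = 'PARALLEL_MAX_SYSTEM_CPU';

UPDATE MYLIB.QAQQINI SET QQVAL = '30' WHERE QQPARM = 'PARALLEL_MIN_TIME';

From their names, the first caps the system CPU that parallel query processing is allowed to consume, and the second sets a minimum runtime threshold before a query is allowed to go parallel.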

Some clients that have taken the easy path of enabling Db2 SMP for all queries run on the system have also discovered that some of their high-level language programs are not thread safe. Consider an RPG program which has been registered as an external user-defined function (UDF) with the SQL CREATE FUNCTION statement. If parallel processing is applied to a query that invokes this external UDF, then you need to remember that multiple parallel threads can now be invoking the RPG program.  If this RPG program is not thread safe, then issues can arise.  In this situation, the developer can compile their high-level language programs to be thread-safe or specify the DISALLOW PARALLEL option on the CREATE FUNCTION statement which will prevent the database engine from using parallel processing on any query that references the UDF. It's very easy to overlook program thread safety when turning on parallel processing for all the queries on your system.
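
Here's a minimal sketch of the DISALLOW PARALLEL approach (the function, program, and parameter names are hypothetical):

CREATE OR REPLACE FUNCTION MYLIB.CALC_BONUS (EMP_ID CHAR(6))
  RETURNS DECIMAL(9,2)
  LANGUAGE RPGLE
  PARAMETER STYLE GENERAL
  EXTERNAL NAME 'MYLIB/CALCBONUS'
  DISALLOW PARALLEL;

With DISALLOW PARALLEL specified, Db2 runs any query referencing CALC_BONUS in a single thread, so the RPG program's lack of thread safety never comes into play.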

Systemwide activation of Db2 SMP is also risky because it can mask the root cause of your query performance problems. With Db2 SMP, you're essentially trading system resources for faster response times. While your queries can return faster with Db2 SMP, sometimes there are more efficient methods to improve query performance - if you've enabled parallel processing systemwide, it can be harder to find the queries that would benefit from these more efficient methods. 

A couple of years ago, our Technology Expert Labs team was hired to help a client experiencing degraded system performance. As we investigated the problem, we discovered that Db2 SMP was enabled systemwide. This systemwide enablement of Db2 SMP was causing a frequently run query to be implemented with a parallel degree of 38. This meant that every time this query was run on the system - there were 38 threads running in parallel consuming CPU, memory, and disk I/O resources. Further analysis of this query and database identified that the creation of a new index would enable the query to run in the same amount of time but with less system resources (i.e., use a single thread). This client had used Db2 SMP systemwide to "fix" their query performance problems with hardware instead of addressing the root cause of a missing index - eventually they ran out of hardware resources.

Hopefully, you now have a better awareness of the risks of taking the easy road of activating Db2 SMP systemwide. Putting in the extra effort to selectively use parallel processing should result in healthier system performance. I stumbled upon the following quote from Robert Kiyosaki and think it summarizes things nicely: Remember the easy road often becomes hard, and the hard road often becomes easy.

I hope all my US readers have a great Thanksgiving holiday.

Thursday, October 12, 2023

Fall Into a Smorgasbord of Db2 Goodies

Last month's entry coincided with the start of Fall on the calendar. Another event that occurs each Fall is IBM announcing IBM i Technology Refresh content. This year's announcement happened on October 10th and, as you've come to expect, there are several Db2 for i enhancements. No major additions to Db2 this time, but several small Db2 enhancements that could make your life easier - thus, the usage of smorgasbord in this title is a good fit.

I'll illuminate my favorite Db2 enhancements in the latest IBM i 7.4 and 7.5 Technology Refreshes (TRs), but you'll want to review the complete list of enhancements. Feel free to add comments to this entry if you find other Db2 for i enhancements that you believe are worthy of highlighting.

Tightening security for the valuable business data stored on your IBM i systems is a top priority, so the latest TRs have a couple of interesting security enhancements. If you're using SQL to perform encryption, you can now perform 256-bit encryption with the new ENCRYPT_AES256 function. The original SQL function for AES encryption, ENCRYPT_AES, only supports 128-bit encryption. If you have applications using Dynamic SQL with interfaces that may be susceptible to SQL injection attacks, then you may want to consider using the enhanced PARSE_STATEMENT service to verify only the intended Db2 objects are being referenced by your Dynamic SQL statement strings.
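
Here's a sketch of the new function in action (the table and column names are hypothetical, and I'm assuming ENCRYPT_AES256 follows the same usage pattern as ENCRYPT_AES, including picking up the password set with the SET ENCRYPTION PASSWORD statement and requiring a FOR BIT DATA target column):

SET ENCRYPTION PASSWORD = 'MyStrongPassphrase';

UPDATE accounts SET card_number_enc = ENCRYPT_AES256(card_number);

SELECT DECRYPT_CHAR(card_number_enc) FROM accounts;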

The new HTTP functions in QSYS2 introduced 2 years ago have been a big boon to developers using SQL to integrate RESTful services into their IBM i applications. The latest TRs continue to enhance this HTTP services toolkit with the addition of new HTTP header response controls for the existing functions and two new functions. The new functions are HTML_ENTITY_ENCODE and HTML_ENTITY_DECODE. The encode function can be used to convert a character string into a UTF-8 string that is valid HTML. For example, if your input string contains the less than or greater than characters like <b>, the encode function will produce &lt;b&gt; - the decode function provides similar functionality, just in the other direction.
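
A quick sketch of the pair, based on the example above (I'm assuming the new functions live in QSYS2 alongside the other HTTP functions):

VALUES QSYS2.HTML_ENTITY_ENCODE('<b>');    -- returns '&lt;b&gt;'

VALUES QSYS2.HTML_ENTITY_DECODE('&lt;b&gt;');    -- returns '<b>'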

The usability of the SQL Error Logging Facility (SELF) is also improved with the ability to have Db2 monitor and log groupings of SQL codes - specifically, the warning codes (SQLCODE>0) and the error codes (SQLCODE<0). As shown below, the special values of *WARN & *ERROR can be specified instead of assigning multiple individual code values to the IBM-provided global variable.

CREATE OR REPLACE VARIABLE SYSIBMADM.SELFCODES VARCHAR(256) DEFAULT '*WARN';
SET SYSIBMADM.SELFCODES = '*ERROR';

When the logged SQL error data in the SELF_ERROR_LOG view in QSYS2 is analyzed, one may also want to leverage the new SQLCODE_INFO function in SYSTOOLS to return the message text associated with the logged SQLCODE value.

Developers that are using Code for IBM i as their IDE will also want to give the new Db2 for i Extension a try to enrich their usage of SQL in this graphical development environment.

The final Db2 for i enhancement to highlight is the GENERATE_SPREADSHEET function in SYSTOOLS. This function provides a way for you to programmatically drive the Data Transfer feature in IBM i Access Client Solutions (ACS) to generate a spreadsheet. The function can be used to populate the contents of a spreadsheet file in the IFS with the rows in a table or the rows produced by a query.

Here's an example of using the new function to generate a spreadsheet file containing the results of the specified query.

VALUES SYSTOOLS.GENERATE_SPREADSHEET(
   PATH_NAME => '/mydir/EmpQry_Spreadsheet', 
   SPREADSHEET_QUERY => 'SELECT fname, lname, workdept FROM emp WHERE role=''ANALYST'' ', 
   LIBRARY_NAME => NULL,  
   FILE_NAME => NULL, 
   SPREADSHEET_TYPE => 'csv', 
   COLUMN_HEADINGS => 'COLUMN' )


I hope you find a few tasty enhancements in the buffet of Db2 items delivered by these latest IBM i TRs that can make your IBM i life easier!

Friday, September 22, 2023

Jump-start your SQL Development with new ACS SQL Generators

With the first day of Fall quickly approaching, it won’t be too long before us Minnesotans need to make sure we have jumper cables in our vehicles for the cold winter months. Hopefully, that’s later rather than sooner…

The new SQL generators for Db2 tables (& physical files) in IBM i Access Client Solutions (ACS) 1.1.9.3 provide a different type of jump-start. These recently released generators should help developers who are new to SQL by providing the option of generating Insert, Update, and Delete statements along with Stored Procedures that perform the same data change operations.

The following figure highlights the new Generate options available for a table with ACS 1.1.9.3. Prior to this latest round of enhancements, ACS only provided the ability to generate the table definition or a query to retrieve all the columns from a table.

Here's an example of the Update and Delete statements generated for the Employee table in the Db2 sample database.

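The generated SQL looks roughly like the following sketch (an approximation with a trimmed column list - the actual generated statements include every column in the table and may differ in detail):

UPDATE EMPLOYEE SET FIRSTNME = :FIRSTNME, LASTNAME = :LASTNAME, WORKDEPT = :WORKDEPT WHERE EMPNO IS NOT DISTINCT FROM :EMPNO;

DELETE FROM EMPLOYEE WHERE EMPNO IS NOT DISTINCT FROM :EMPNO AND WORKDEPT IS NOT DISTINCT FROM :WORKDEPT;
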
Notice that the generated statements include host variable placeholders to make it easier to embed the SQL statements in your high-level language programs. Also, notice how the ACS generator utilizes the IS NOT DISTINCT FROM predicate in the WHERE clause. This predicate is handy with null-capable columns because it automatically accounts for null values - either both values being compared are null, or both non-null values match.

Here's an example of the SQL generated by the Procedure->Delete option.

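A rough sketch of the generated procedure (again an approximation - the actual generated source includes a parameter for each column and may differ in detail):

CREATE PROCEDURE EMPLOYEE_DELETE ( IN P_EMPNO CHAR(6) )
LANGUAGE SQL
BEGIN
  DELETE FROM EMPLOYEE
    WHERE EMPNO IS NOT DISTINCT FROM P_EMPNO;
END;

The generated procedure can then be invoked from any SQL interface, for example: CALL EMPLOYEE_DELETE('000010');
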
You may be wondering what’s the point of generating a stored procedure that wraps an Insert, Update, or Delete statement. The usage of stored procedures to support CRUD (Create, Read, Update, Delete) operations on database tables is a pretty common coding pattern/architecture. Some DBMS products advocate this approach because their stored procedures offer better performance for Create, Read, Update, and Delete operations. While this performance difference doesn’t apply to Db2 for i, the CRUD stored procedure approach does offer security and application architecture advantages.

ACS 1.1.9.3 currently only generates stored procedures for the create (i.e., insert), update, and delete operations. Adding a stored procedure generator for the read operation is on IBM’s list of possible future enhancements.

You should now see how these new SQL generators in ACS can provide a nice boost to the coding productivity of developers who are new to using SQL and/or stored procedures on IBM i.