Wednesday, July 7, 2021

Tale of the Tape - SYSDUMMY1 vs Tableless SQL Performance

 I thought about titling this entry “Tableless SQL for Dummies” to play off the old Dummies book series and the focus of this entry: the Db2 dummy table, SYSDUMMY1.  However, I thought that might be a little too abrasive for some readers.  Plus, that catchy title doesn’t reflect that SQL performance can be improved by simply replacing SYSDUMMY1 queries with a tableless SQL request. This Tale of the Tape comparison will help you better understand this performance difference.  

You may or may not know that Db2 provides a dummy table, SYSDUMMY1, in the SYSIBM schema.  The SYSDUMMY1 table contains a single row and column which as you can see from the following example makes it really easy for a developer to invoke an SQL function or retrieve the value of an SQL special register. 

SELECT UPPER(:hv), CURRENT TIMESTAMP
  INTO :hv1, :hv2
FROM sysibm.sysdummy1

Back when I was working with software vendors, the SYSDUMMY1 table was also quite handy when porting Oracle applications to the IBM i platform because some of those applications relied on using the Oracle dummy table - DUAL.  Database vendors often included a dummy table because all SQL DML (Data Manipulation Language) statements required a table reference. However, that’s no longer case because the SQL standard has been enhanced to support tableless queries.

The VALUES statement was the new addition to the SQL standard which enables you to run tableless queries.  Taking the previous example, you can see how it is implemented with the tableless support:

VALUES (UPPER(:hv), CURRENT TIMESTAMP)
  INTO :hv1, :hv2


These two examples are almost identical except for the FROM clause. Not only does the VALUES statement save you from having to type the FROM clause, it also will return the results faster because there is no FROM clause! Let’s dig into why the FROM clause causes a performance difference. 

 

Any time there is a table referenced on the FROM clause on an SQL request, the Db2 for i engine has to performance serialization actions to support concurrent access of the specified table and its rows. Even though the SYSDUMMY1 table is small and contains a single row, the required serialization steps makes it slower to process than the equivalent tableless query. 

 

To measure this performance difference, I created separate SQL procedures with each one of them running these two equivalent SQL statements 50,000 times. 


SELECT CURRENT TIMESTAMP INTO :hv1 FROM sysibm.sysdummy1

VALUES CURRENT TIMESTAMP INTO :hv1

The average run time for the SELECT statement procedure call was 1,430 milliseconds while the average run time for the VALUES statement call was 340 milliseconds – a 4x times difference in performance!  While you might be thinking that the average execution time of each method is very small when you consider each statement was executed fifty thousand times, remember the old proverb that “Little by little, a little becomes a lot”.

You might find that this type of SYSDUMMY1 request is being run over and over again on your system.  In fact, I recently reviewed a customer’s SQL Plan Cache Snapshot data and they had a SYSDUMMY1 query that has been run over 360 million times on their system.  So even if the tableless query approach was just 100 microseconds faster, the multiplicative impact of that small improvement will add up to make a difference on their system. 

The performance advantages that tableless SQL requests hold over SYSDUMMY1 queries should make it clear that tableless queries are the clear winner in this comparison.

3 comments:

  1. Good post, Kent. I've never really appreciated the seemingly pervasive use of SYSDUMMY1 in IBM examples/Documentation. Although I tend to jump to VALUES INTO as my first choice, my unscientific observations of regular RPG programmers is monkey-see, monkey-do, so they tend to use SELECT ... from sysdummy1. Hopefully they'll read and appreciate your recommendation.

    ReplyDelete
  2. Thanks - I hadn't really thought about the number of SQL examples in the IBM documentation that reference SYSDUMMY1. I'll have to look & see if those can be changed.

    ReplyDelete
  3. Makes me wish the syntax could be modified to read something like:
    Return Values x,y Into :a,:b when using Aggregate and Scalar functions for readability purposes.
    Now, to track down those pesky SYSDUMMY1 references...

    ReplyDelete