I thought about titling this entry “Tableless SQL for Dummies” to play off the old Dummies book series and the focus of this entry: the Db2 dummy table, SYSDUMMY1. However, I thought that might be a little too abrasive for some readers. Plus, that catchy title doesn’t reflect that SQL performance can be improved by simply replacing SYSDUMMY1 queries with a tableless SQL request. This Tale of the Tape comparison will help you better understand this performance difference.
You may or may not know that Db2 provides a dummy table,
SYSDUMMY1, in the SYSIBM schema. The
SYSDUMMY1 table contains a single row and column which as you can see from the
following example makes it really easy for a developer to invoke an SQL
function or retrieve the value of an SQL special register.
SELECT UPPER(:hv), CURRENT TIMESTAMP INTO :hv1, :hv2 FROM sysibm.sysdummy1
Back when I was working with software vendors, the SYSDUMMY1
table was also quite handy when porting Oracle applications to the IBM i
platform because some of those applications relied on using the Oracle dummy
table - DUAL. Database vendors often
included a dummy table because all SQL DML (Data Manipulation Language)
statements required a table reference. However, that’s no longer case because
the SQL standard has been enhanced to support tableless queries.
The VALUES statement was the new addition to the SQL standard which enables you to run tableless queries. Taking the previous example, you can see how it is implemented with the tableless support:
VALUES (UPPER(:hv), CURRENT TIMESTAMP) INTO :hv1, :hv2
These
two examples are almost identical except for the FROM clause. Not only does the
VALUES statement save you from having to type the FROM clause, it also will
return the results faster because there is no FROM clause! Let’s dig into why
the FROM clause causes a performance difference.
Any time
there is a table referenced on the FROM clause on an SQL request, the Db2 for i
engine has to performance serialization actions to support concurrent access of
the specified table and its rows. Even though the SYSDUMMY1 table is small and contains a single row, the
required serialization steps makes it slower to process than the equivalent tableless
query.
To
measure this performance difference, I created separate SQL procedures with
each one of them running these two equivalent SQL statements 50,000 times.
SELECT CURRENT TIMESTAMP INTO :hv1 FROM sysibm.sysdummy1 VALUES CURRENT TIMESTAMP INTO :hv1
The average run time for the SELECT statement procedure call
was 1,430 milliseconds while the average run time for the VALUES statement call
was 340 milliseconds – a 4x times difference in performance! While you might be thinking that the average
execution time of each method is very small when you consider each statement
was executed fifty thousand times, remember the old proverb that “Little by
little, a little becomes a lot”.
You might find that this type of SYSDUMMY1 request is being
run over and over again on your system.
In fact, I recently reviewed a customer’s SQL Plan Cache Snapshot data
and they had a SYSDUMMY1 query that has been run over 360 million times
on their system. So even if the
tableless query approach was just 100 microseconds faster, the multiplicative
impact of that small improvement will add up to make a difference on their
system.
The performance advantages that tableless
SQL requests hold over SYSDUMMY1 queries should make it clear that tableless queries are the clear winner in this comparison.
Good post, Kent. I've never really appreciated the seemingly pervasive use of SYSDUMMY1 in IBM examples/Documentation. Although I tend to jump to VALUES INTO as my first choice, my unscientific observations of regular RPG programmers is monkey-see, monkey-do, so they tend to use SELECT ... from sysdummy1. Hopefully they'll read and appreciate your recommendation.
ReplyDeleteThanks - I hadn't really thought about the number of SQL examples in the IBM documentation that reference SYSDUMMY1. I'll have to look & see if those can be changed.
ReplyDeleteMakes me wish the syntax could be modified to read something like:
ReplyDeleteReturn Values x,y Into :a,:b when using Aggregate and Scalar functions for readability purposes.
Now, to track down those pesky SYSDUMMY1 references...