In Hamlet’s famous ‘To Be or Not To Be’ speech, he wrestles with thoughts about life and death. Obviously, slow application performance is not a life-or-death issue, but it can sometimes feel that way if you’re the developer whose program is slowing down a critical business process.
Clients frequently engage our IBM Expert Labs team to analyze applications, identify performance bottlenecks, and recommend solutions to improve performance. A couple of months ago, I was reviewing the SQL statements embedded within a client's RPG program. The analysis of the SQL identified a coding pattern where the program would run a SELECT statement against a table and almost immediately turn around and run an INSERT against the same table. The code looked something like the following:
SELECT 1 INTO :outhv
  FROM sometab
  WHERE col1 = :hv1 AND col2 = :hv2 AND col3 = :hv3 AND col4 = :hv4;

If sqlCode = 100;
   INSERT INTO sometab VALUES(:hv1, :hv2, :hv3, :hv4, …);
Endif;
When I asked the customer about the purpose of this coding pattern, they shared that the columns referenced in the WHERE clause defined a unique key for the table. Thus, the SELECT statement was being run to check whether the specified key value already existed in the table. If the SELECT statement didn't return a row, the program would know that there was no possibility of a duplicate key error, meaning that the INSERT statement would run successfully.
This explanation led the developer to ask if it would be more efficient to have Db2 just check for the duplicate key value on the INSERT statement. With this approach, the program would run a single SQL statement, as opposed to the coding pattern above, which results in two SQL statements being executed whenever the new values are unique. In general, the fewer calls you make to Db2 for i (or any database), the faster your application will run.
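In embedded SQL, the "not to check" alternative collapses the pattern above into a single statement and simply tests for the duplicate key condition after the INSERT. A minimal sketch in the same pseudocode style as the earlier example (the sqlState test and handling shown here are illustrative, not the client's actual code):

INSERT INTO sometab VALUES(:hv1, :hv2, :hv3, :hv4, …);
If sqlState = '23505';        // duplicate key (SQLCODE -803)
   // The key value already exists; handle the duplicate here
Endif;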
I put together a small performance test to verify whether less is more when it comes to inserting new rows that may result in a duplicate key error. Essentially, is it faster to check first to avoid the duplicate key exception, or not to check by running the INSERT statement and relying on Db2 to detect duplicate keys?
My performance test used an SQL stored procedure with a loop to insert 1,000 rows; the loop had logic that would cause every other INSERT statement to fail with a duplicate key error. For the “not to check” version of my test, the following condition handler was used to trap the duplicate key error and then allow the stored procedure to continue running its loop inserting rows. Even with the condition handler in place, Db2 still writes a duplicate key error message to the job log.
DECLARE CONTINUE HANDLER FOR SQLSTATE '23505' SET errFlag = 'Y';
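A simplified sketch of the "not to check" test procedure follows; the KEYTEST table, its single key column, and the key-generation logic are illustrative placeholders rather than the actual test objects:

-- Hypothetical test table with a single-column unique key
CREATE TABLE keytest (keycol INTEGER NOT NULL PRIMARY KEY);

CREATE OR REPLACE PROCEDURE insert_not_to_check()
BEGIN
  DECLARE i INTEGER DEFAULT 0;
  DECLARE errFlag CHAR(1) DEFAULT 'N';
  -- Trap the duplicate key error and keep looping
  DECLARE CONTINUE HANDLER FOR SQLSTATE '23505' SET errFlag = 'Y';

  WHILE i < 1000 DO
    -- Integer division repeats each key value twice (0,0,1,1,...),
    -- so every other INSERT fails with a duplicate key error
    INSERT INTO keytest VALUES(i / 2);
    SET i = i + 1;
  END WHILE;
END;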
I ran each version of the stored procedure multiple times to get consistent timings. In the end, the “not to check” version of the test consistently ran 5-6% faster than the “check” version of the procedure, which avoided the duplicate key error by first running a SELECT statement. The performance tests essentially showed that the overhead of running the second SQL statement was greater than the overhead of Db2 signaling an error back to the application.
These test results reinforce the earlier assertion that performance is usually best when your application program runs the fewest SQL statements possible. With this coding pattern for inserting unique key values, the answer to the question posed at the start of this entry is: Not To Check!
Performance is one of the factors to consider, but I don't like a job log full of duplicate record messages either, so I consider a clean log just as important.
I can't count the number of hours people waste when troubleshooting an issue and looking at errors that aren't real errors.
Yes, there are often tradeoffs with choices made in IT.
Is the SQL MERGE statement better suited for this case? In native I/O this would be a CHAIN and a conditional WRITE, which is really compact and makes the intent clear. I would personally avoid advising anyone to rely on exception handling for conditions like duplicate keys; it is really bad to rely on that systematically, and the business logic should be clean and not resort to such a practice. It is like a monitored WRITE in native I/O: the intent is not clear, the job log records an exception, and so on. Doing that is really bad practice.
Couldn't a write from another job after the chain/setll cause this write to fail?
MERGE has less overhead than the SELECT/INSERT approach, but my tests showed INSERT was also a little faster than MERGE.
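For reference, a MERGE that handles this insert-if-not-exists case might be sketched as follows, reusing the hypothetical sometab columns and host variables from the example above (any non-key columns are omitted):

MERGE INTO sometab AS t
  USING (VALUES(:hv1, :hv2, :hv3, :hv4)) AS s (c1, c2, c3, c4)
    ON t.col1 = s.c1 AND t.col2 = s.c2 AND
       t.col3 = s.c3 AND t.col4 = s.c4
  WHEN NOT MATCHED THEN
    INSERT (col1, col2, col3, col4) VALUES(s.c1, s.c2, s.c3, s.c4);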
I would personally avoid suggesting that anyone leverage such exception handling behavior. Exceptions are for exceptional cases and are a red flag if someone has a logging facility or a similar system. It is not worth a 5% gain in performance (or even less against MERGE). Clear intent in code is better; known conditions and cases should be handled in a way that is clear to the reader. Any coder would find this approach "hackish" at best. If those 5% amount to 5 minutes, one should analyze the whole picture and design instead of optimizing a single local statement.
If the code documents why the duplicate key SQLSTATE/SQLCODE is being checked, then I don't agree that the "Check" code is cleaner.
5% may not seem much, but consider the impact if there are dozens or hundreds of jobs calling the program.
I'll point out an additional benefit of the single statement. With check-before-insert, two jobs could be adding the same key at the same time: both find that the coast is clear and proceed to insert, and one of them will be surprised. The single INSERT that catches the error inherently handles this case.
Absolutely not true under locking or a transaction, read for update, etc.; these are database guarantees.
I would like to see the performance results if the check is instead done with a SETLL.
SQL is doing the same index lookup as SETLL.