Tuesday, February 25, 2025

To Check or Not to Check (Uniqueness)

In Hamlet’s famous ‘To Be or Not To Be’ speech, he wrestles with thoughts about life and death. Obviously, slow application performance is not a life-or-death issue, but it can sometimes feel that way if you’re the developer whose program is slowing down a critical business process.

Clients frequently engage our IBM Expert Labs team to analyze applications, identify performance bottlenecks, and recommend solutions to improve performance. A couple of months ago, I was reviewing the SQL statements embedded within a client’s RPG program. The analysis identified a coding pattern where the program would run a SELECT statement against a table and almost immediately turn around and run an INSERT against the same table. The code looked something like the following:


SELECT 1 INTO :outhv FROM sometab
  WHERE col1=:hv1 AND col2=:hv2 AND col3=:hv3 AND col4=:hv4;

If sqlCode = 100;
  INSERT INTO sometab
     VALUES(:hv1, :hv2, :hv3, :hv4);
Endif;


When I asked the customer about the purpose of this coding pattern, they shared that the columns referenced in the WHERE clause defined a unique key for the table. Thus, the SELECT statement was being run to check whether the specified key value already existed in the table. If the SELECT statement didn’t return a row, the program would know that there was no possibility of a duplicate key error, meaning that the INSERT statement would run successfully.
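The check-first pattern can be sketched outside of RPG as well. Here is a minimal illustration in Python with SQLite (the table and column names mirror the pseudocode above and are purely hypothetical), where the four columns carry a unique constraint:

```python
import sqlite3

# Hypothetical stand-in for the client's table: col1..col4 form a unique key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sometab (col1, col2, col3, col4, "
    "UNIQUE (col1, col2, col3, col4))"
)

def insert_if_new(row):
    """'Check' pattern: SELECT first, INSERT only when no matching key exists."""
    found = conn.execute(
        "SELECT 1 FROM sometab WHERE col1=? AND col2=? AND col3=? AND col4=?",
        row,
    ).fetchone()
    if found is None:              # the SQLCODE = 100 case: no row returned
        conn.execute("INSERT INTO sometab VALUES (?, ?, ?, ?)", row)
        return True                # row inserted
    return False                   # key already present, INSERT skipped

insert_if_new((1, 2, 3, 4))        # inserted
insert_if_new((1, 2, 3, 4))        # duplicate key: skipped without an error
```

Note that each call costs two database requests whenever the key turns out to be new, which is exactly the overhead discussed below.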


This explanation led the developer to ask if it would be more efficient to have Db2 just check for the duplicate key value on the INSERT statement. With this approach, the program would run a single SQL statement, as opposed to the coding pattern above, which results in two SQL statements being executed whenever the new values are unique. In general, the fewer calls you make to Db2 for i (or any database), the faster your application will run.


I put together a small performance test to verify whether less is more when it comes to inserting new rows that may result in a duplicate key error. Essentially, is it faster to check first to avoid the duplicate key exception, or not to check by running the INSERT statement and relying on Db2 to detect duplicate keys?


My performance test used an SQL stored procedure with a loop to insert 1,000 rows; the loop had logic that would cause every other INSERT statement to fail with a duplicate key error. For the “not to check” version of my test, the following condition handler was used to trap the duplicate key error and allow the stored procedure to continue running its loop inserting rows. Even with the condition handler in place, Db2 still writes a duplicate key error message to the job log.


DECLARE CONTINUE HANDLER FOR SQLSTATE '23505' 
   SET errFlag = 'Y' ;
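In the same hypothetical Python/SQLite terms used earlier, the condition handler's job can be approximated with a try/except, where sqlite3.IntegrityError stands in for SQLSTATE '23505' (the errFlag name is borrowed from the handler above):

```python
import sqlite3

# Same hypothetical table as before: col1..col4 form a unique key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sometab (col1, col2, col3, col4, "
    "UNIQUE (col1, col2, col3, col4))"
)

def insert_trapping_duplicates(row):
    """'Not to check' pattern: run the INSERT and trap the duplicate-key
    error, letting the caller's loop continue, as the CONTINUE HANDLER does."""
    try:
        conn.execute("INSERT INTO sometab VALUES (?, ?, ?, ?)", row)
        return "N"                 # no error: errFlag stays 'N'
    except sqlite3.IntegrityError:
        return "Y"                 # duplicate key trapped: errFlag = 'Y'

# Mirror the test's shape: every other insert collides with an existing key.
for i in range(6):
    insert_trapping_duplicates((i // 2, 0, 0, 0))
```

One SQL request per row either way, which is where the savings over the check-first pattern come from.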


I ran each version of the stored procedure multiple times to get consistent timings. In the end, the “not to check” version consistently ran 5-6% faster than the “check” version, which avoided the duplicate key error by first running a SELECT statement. The performance tests essentially showed that the overhead of running the second SQL statement was greater than the overhead of Db2 signaling an error back to the application.
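For readers who want to experiment, here is a rough harness with the same shape as my test, again sketched in Python/SQLite rather than an SQL stored procedure. The cost of trapping sqlite3.IntegrityError differs from Db2 for i signaling SQLSTATE '23505', so the 5-6% result should not be expected to reproduce exactly:

```python
import sqlite3
import time

def run(check_first, n=1000):
    """Insert n rows where every other key is a duplicate, returning
    (elapsed_seconds, rows_actually_inserted)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY)")
    start = time.perf_counter()
    for i in range(n):
        key = i // 2                       # every other insert collides
        if check_first:
            # "Check" version: SELECT, then INSERT only when no row is found.
            if conn.execute("SELECT 1 FROM t WHERE k=?", (key,)).fetchone():
                continue
            conn.execute("INSERT INTO t VALUES (?)", (key,))
        else:
            # "Not to check" version: INSERT and trap the duplicate-key error.
            try:
                conn.execute("INSERT INTO t VALUES (?)", (key,))
            except sqlite3.IntegrityError:
                pass
    elapsed = time.perf_counter() - start
    rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    return elapsed, rows

elapsed_check, rows_check = run(check_first=True)
elapsed_trap, rows_trap = run(check_first=False)
print(f"check: {elapsed_check:.4f}s, not-to-check: {elapsed_trap:.4f}s")
```

Both versions end up with the same 500 unique rows; only the way the 500 duplicate keys are detected differs.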


These test results reinforce the earlier assertion that performance is usually best when your application program runs the fewest SQL statements possible. With this coding pattern for inserting unique keys, the answer to the question posed at the start of this entry is: Not To Check!

11 comments:

  1. Performance is one of the factors to check, but I don't like a joblog full of duplicate record messages either, so I consider a clean log just as important.

    I can't count the number of hours people waste when troubleshooting an issue and looking at errors that aren't real errors.

    ReplyDelete
  2. Is the SQL MERGE statement better suited for this case? In native I/O it would be a CHAIN and a conditional WRITE: really compact, and the intent is clear. I personally would avoid advising reliance on exception handling for conditions like duplicate keys; it is really bad to rely on that systematically. The business logic should be clean and not resort to such a practice. It is like a monitored WRITE in native I/O: the intent is not clear, the log would record an exception, etc. Really bad practice doing that.

    ReplyDelete
    Replies
    1. Couldn't a write from another job after the chain/setll cause this write to fail?

      Delete
    2. MERGE has less overhead than the SELECT/INSERT approach, but my tests showed INSERT was also a little faster than MERGE

      Delete
  3. I would personally avoid suggesting leveraging such exception handling behavior. Exceptions are for exceptional cases, and this is a red flag if someone has a logging facility or similar systems. Not worth it for a 5% gain in performance (or even less against MERGE). Clear intent in code is better; known conditions/cases should be handled and clear to the reader. Any coder would find this approach "hackish" at best. If those 5% are 5 minutes, one should analyze the whole picture and design instead of doing local single-statement optimization.

    ReplyDelete
    Replies
    1. If the code documents why the duplicate key SQLSTATE/SQLCODE is being checked, then I don't agree that the "Check" code is cleaner.

      5% may not seem much, but consider the impact if there are dozens or hundreds of jobs calling the program.

      Delete
  4. I'll point out an additional benefit of the single statement. With check before insert, two jobs could be adding the same key at the same time. Both find out the coast is clear and proceed to insert. One of them will be surprised. The single insert with catching the error inherently catches this issue.

    ReplyDelete
    Replies
    1. Absolutely not true if under locking or a transaction, read for update, etc. These are DB guarantees...

      Delete
  5. I would like to see the performance results if the check is instead done with a SETLL

    ReplyDelete