Tuesday, February 25, 2025

To Check or Not to Check (Uniqueness)

In Hamlet’s famous ‘To Be or Not To Be’ speech, he wrestles with thoughts about life and death. Obviously, slow application performance is not a life-or-death issue, but it can sometimes feel that way if you’re the developer whose program is slowing down a critical business process.

Clients frequently engage our IBM Expert Labs team to analyze applications to identify performance bottlenecks and recommend solutions to improve performance. A couple of months ago, I was reviewing the SQL statements embedded within a client’s RPG program. The analysis of the SQL identified a coding pattern where the program would run a SELECT statement against a table and almost immediately turn around and run an INSERT against the same table. The code looked something like the following:


Exec SQL SELECT 1 INTO :outhv FROM sometab
  WHERE col1=:hv1 AND col2=:hv2 AND col3=:hv3 AND col4=:hv4;

If sqlCode = 100;   // no row found, so the key value doesn't already exist
  Exec SQL INSERT INTO sometab
     VALUES(:hv1, :hv2, :hv3, :hv4);
Endif;


When I asked the client about the purpose of this coding pattern, they shared that the columns referenced in the WHERE clause defined a unique key for the table. Thus, the SELECT statement was being run to verify whether the specified key value already existed in the table. If the SELECT statement didn’t return a row, the program would know that there was no possibility of a duplicate key error, meaning that the INSERT statement would run successfully.
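
For context, the unique key in this scenario could be enforced with a constraint along the following lines. This DDL is purely illustrative; the column types and constraint name are made up, not taken from the client’s table:

-- Illustrative only: column types and constraint name are assumptions
CREATE TABLE sometab (
  col1 INTEGER  NOT NULL,
  col2 CHAR(10) NOT NULL,
  col3 CHAR(10) NOT NULL,
  col4 CHAR(10) NOT NULL,
  CONSTRAINT sometab_uk UNIQUE (col1, col2, col3, col4)
);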


This explanation led the developer to ask whether it would be more efficient to have Db2 just check for the duplicate key value on the INSERT statement. With this approach, the program would run a single SQL statement, as opposed to the coding pattern above, which results in two SQL statements being executed in the cases where the new values are unique. In general, the fewer calls you make to Db2 for i (or any database), the faster your application will run.
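
In other words, the single-statement approach would look something like the following sketch. It reuses the host variables from the earlier example and checks the SQLSTATE value set by the SQL precompiler to detect the duplicate key condition:

Exec SQL INSERT INTO sometab
   VALUES(:hv1, :hv2, :hv3, :hv4);

If sqlState = '23505';   // Db2 detected a duplicate key value
   // handle the duplicate however the application requires
Endif;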


I put together a small performance test to verify whether less is more when it comes to inserting new rows that may result in a duplicate key error. Essentially, is it faster to check first to avoid the duplicate key exception, or not to check by running the INSERT statement and relying on Db2 to detect duplicate keys?


My performance test used an SQL stored procedure with a loop to insert 1,000 rows; the loop contained logic that caused every other INSERT statement to fail with a duplicate key error. For the “not to check” version of my test, the following condition handler was used to trap the duplicate key error and allow the stored procedure to continue running its insert loop. Even with the condition handler in place, Db2 still writes a duplicate key error message into the job log.


DECLARE CONTINUE HANDLER FOR SQLSTATE '23505'   -- duplicate key value
   SET errFlag = 'Y';
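
To give a feel for the overall test, the “not to check” procedure had roughly the following shape. This is a simplified sketch rather than the exact test code; the procedure name, table layout, and key-generation logic are assumptions, but it shows how the condition handler above lets the loop keep running after each duplicate key error:

CREATE OR REPLACE PROCEDURE insert_no_check_test ()
BEGIN
  -- Simplified sketch; names and table layout are assumptions
  DECLARE i INT DEFAULT 0;
  DECLARE errFlag CHAR(1) DEFAULT 'N';
  DECLARE CONTINUE HANDLER FOR SQLSTATE '23505'
     SET errFlag = 'Y';

  WHILE i < 1000 DO
     -- i/2*2 produces each key value twice, so every other INSERT
     -- fails with a duplicate key error that the handler absorbs
     INSERT INTO sometab VALUES(i/2*2, 'A', 'B', 'C');
     SET i = i + 1;
  END WHILE;
END;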


I ran each version of the stored procedure multiple times to get consistent timings. In the end, the “not to check” version consistently ran 5-6% faster than the “check” version of the procedure, which avoided the duplicate key error by first running a SELECT statement. The performance tests essentially showed that the overhead of running the second SQL statement was greater than the overhead of Db2 signaling an error back to the application.


These test results reinforce the earlier assertion that performance is usually best when your application program runs the fewest SQL statements possible. With this coding pattern related to the insertion of unique keys, the answer to the question posed at the start of this entry is: Not To Check!