Monday, April 11, 2022

Look Before you Merge

The more common version of this title phrase is look before you leap. However, this same recommendation applies to driving a car – if you don’t look before you merge your vehicle into traffic, there’s a good chance that eventually you will either damage your car or cause a car accident. In this entry, you will also see why this is also good advice from a performance perspective when using the SQL MERGE statement.

 

The MERGE statement has been available for use with Db2 for i since the IBM i 7.1 release. The merge support was designed for logic where you’re taking data from a source table/query and using that data to perform either an insert or update operation on the target table. If the row already exists in the target table, then you want to “add” to the existing row by performing an update; if the row doesn’t exist, then a new row needs to be inserted into the target table. In fact, some DBMS products provide this feature with an UPSERT statement since it is a database operation statement that support both insert and update operations.

 

The following MERGE statement merges data into an account summary table by updating the balance from the set of transactions for an existing account and inserting a new balance for the new accounts. 

                 

MERGE INTO account_summary AS a
   USING (SELECT id, SUM(trans_amount) sum_amount FROM trans 
          GROUP BY id) AS t 
     ON a.id = t.id
   WHEN MATCHED THEN UPDATE SET balance = a.balance + t.sum_amount
   WHEN NOT MATCHED THEN
      INSERT (id, balance) VALUES (t.id, t.sum_amount)


While the Insert & Update combo is the sweet spot for the MERGE statement, the syntax actually supports many different use cases including the ability to delete rows from the target table. While recently assisting clients with SQL performance tuning, our Lab Services team has encountered clients that are using the MERGE statement as an alternative syntax for performing update operations. The same data changes could have been coded with an UPDATE statement, but the developers chose to use the MERGE statement instead. 

 

Developers at these shops made this choice because they found the MERGE statement syntax easier to use.  When you compare these two equivalent data change operations, you can see how some developers may find the MERGE version to be simpler to understand.


MERGE INTO employee 
  USING (SELECT jobcode, jobrate FROM jobinfo) j
      ON (empcode = j.jobcode)
    WHEN MATCHED THEN UPDATE SET emprate = j.jobrate
    ELSE IGNORE
 

UPDATE employee 
  SET emprate = (SELECT jobrate FROM jobinfo j 
                  WHERE j.jobcode = empcode)
  WHERE empcode IN (SELECT jobcode FROM jobinfo)

 

A developer should definitely consider how easy an SQL statement is to code and maintain during the coding process, but a developer also needs to make sure they “look” at all coding factors before taking the “leap”.  Performance was a factor that was overlooked during the development process in the client situations that our team was recently involved with.

 

To be fair, the IBM documentation currently does not clearly spell out the performance considerations for the MERGE statement. So here are some key performance factors based on the internal Db2 implementation that should be considered when you weigh using MERGE as an alternative for an UPDATE request.

       Query plan will create & populate a temporary data structure identifying the target rows in the target table

       Merge processing applies update locks to every target row in the target table based on the ON clause before performing any updates. 

       Merge update process results in extra read lock being applied to every updated row.

       With No Commit, Update & read row locks are held until the end of the Merge statement

 

The net of these performance factors is that a MERGE statement will acquire twice as many row locks as the equivalent UPDATE statement.  In addition, all of those row-level locks are held by Db2 until all of the target rows have been updated by the MERGE statement even with an isolation level of No Commit.  In contrast, an UPDATE statement running under No Commit will release the row locks as soon as the change to the target row has been completed. 

 

If your MERGE statement only updates a small number of rows in the target table, the overhead causing by the database locking is probably not going to be noticeable. However, MERGE statements that end up changing thousands of rows have the potential to generate significant performance overhead on your system. One client used a MERGE statement to update about 550,000 rows – the statement held over 1.1 million row locks at the time the last row was updated even with an isolation level of No Commit! The equivalent UPDATE statement running No Commit would have held only a single row lock at any point in time while it changed the 550,000 rows. 

 

Like many things in IT, there are tradeoffs with the solutions that you can implement.  Hopefully, you now have a much better understanding of the performance tradeoffs between a MERGE and the equivalent UPDATE statement.