The more common version of this
title phrase is look
before you leap. However, this same recommendation applies to driving a
car – if you don’t look before you merge your vehicle into traffic, there’s a
good chance that eventually you will either damage your car or cause a car
accident. In this entry, you will also see why this is also good advice from a
performance perspective when using the SQL MERGE statement.
The MERGE statement
has been available for use with Db2 for i since the IBM i 7.1 release. The merge
support was designed for logic where you’re taking data from a source
table/query and using that data to perform either an insert or update operation
on the target table. If the row already exists in the target table, then you
want to “add” to the existing row by performing an update; if the row doesn’t
exist, then a new row needs to be inserted into the target table. In fact, some
DBMS products provide this feature with an UPSERT statement since it is a
database operation statement that support both insert and update operations.
The following MERGE statement merges data into an account summary table by updating the balance from the set of transactions for an existing account and inserting a new balance for the new accounts.
MERGE INTO account_summary AS a USING (SELECT id, SUM(trans_amount) sum_amount FROM trans GROUP BY id) AS t ON a.id = t.id WHEN MATCHED THEN UPDATE SET balance = a.balance + t.sum_amount WHEN NOT MATCHED THEN INSERT (id, balance) VALUES (t.id, t.sum_amount)
While the
Insert & Update combo is the sweet spot for the MERGE statement, the syntax actually supports many different
use cases including the ability to delete rows from the target table. While
recently assisting clients with SQL performance tuning, our Lab Services team
has encountered clients that are using the MERGE statement as an alternative
syntax for performing update operations. The same data changes could have been
coded with an UPDATE statement, but the developers chose to use the MERGE
statement instead.
Developers at these shops made this choice because they found the MERGE statement syntax easier to use. When you compare these two equivalent data change operations, you can see how some developers may find the MERGE version to be simpler to understand.
MERGE INTO employee USING (SELECT jobcode, jobrate FROM jobinfo) j ON (empcode = j.jobcode) WHEN MATCHED THEN UPDATE SET emprate = j.jobrate ELSE IGNORE UPDATE employee SET emprate = (SELECT jobrate FROM jobinfo j WHERE j.jobcode = empcode) WHERE empcode IN (SELECT jobcode FROM jobinfo)
A developer should definitely consider how easy an SQL statement is to code and maintain during the coding process, but a developer also needs to make sure they “look” at all coding factors before taking the “leap”. Performance was a factor that was overlooked during the development process in the client situations that our team was recently involved with.
To be
fair, the IBM documentation currently does not clearly spell out the
performance considerations for the MERGE statement. So here are some key performance factors
based on the internal Db2 implementation that should be considered when you
weigh using MERGE as an alternative for an UPDATE request.
– Query
plan will create & populate a temporary data structure identifying the
target rows in the target table
– Merge
processing applies update locks to every target row in the target table based
on the ON clause before performing any updates.
– Merge
update process results in extra read lock being applied to every updated
row.
– With
No Commit, Update & read row locks are held until the end of
the Merge statement
The net
of these performance factors is that a MERGE statement will acquire twice as
many row locks as the equivalent UPDATE statement. In addition, all of those row-level locks are
held by Db2 until all of the target rows have been updated by the MERGE
statement even with an isolation level of No Commit. In contrast, an UPDATE statement running
under No Commit will release the row locks as soon as the change to the target
row has been completed.
If your
MERGE statement only updates a small number of rows in the target table, the
overhead causing by the database locking is probably not going to be
noticeable. However, MERGE statements that end up changing thousands of rows
have the potential to generate significant performance overhead on your system.
One client used a MERGE statement to update about 550,000 rows – the statement
held over 1.1 million row locks at the
time the last row was updated even with an isolation level of No Commit! The equivalent
UPDATE statement running No Commit would have held only a single
row lock at any point in time while it changed the 550,000 rows.
Like
many things in IT, there are tradeoffs with the solutions that you can implement. Hopefully, you now have a much better
understanding of the performance tradeoffs between a MERGE and the equivalent
UPDATE statement.