Wednesday, November 17, 2021

To Reorg or Not to Reorg

 After thinking about a “To Be Or Not to Be” type of title in last month’s entry on database masking, it was an easy title choice for this entry. I don’t think that a discussion of whether or not to perform Db2 table reorganizations should be as controversial as the topic of masking, but I've been surprised before…

The IBM Lab Services team recently received a question from client asking if it is still worth the time to reorganize tables in light of today’s faster disk technologies. The main benefit of running the RGZPFM command is to remove/compress deleted rows from a Db2 table. And given that disk technologies such as IBM Flash Storage can retrieve database rows from disk significantly faster than spinning drives, are my applications really going to notice deleted rows being paged into memory? 

The short answer is: yes, application and system performance can be negatively impacted. While today’s disk technology is faster, disk operations on deleted rows are wasteful and those deleted rows unnecessarily increase the memory working set size for an application.

Deleted rows can also waste CPU resources when Db2 performs a scan operation on a table with deleted rows. Assume that the customers table being referenced in the following query contains 1 million rows and there are 300,000 rows in the table that are deleted.

SELECT * FROM customers WHERE company_name LIKE '%INC.'

A Table Scan is the only access method that Db2 can use in the runtime implementation of this query because of the wild-card search criteria that looks for company names ending with ‘INC.’ (This statement is mostly true, but the IBM OmniFind Text Search Server is a topic for another day). The Table Scan method has to process every row in the table whether it’s deleted or not. Obviously, no deleted rows will be returned in the query result set, but this query will waste CPU resources on 300,000 rows checking if the rows are active or deleted. This unneeded processing is definitely something that can impact the scalability of your applications and systems.

 

Of course, it goes without saying that deleted rows affect disk storage requirements. In addition, deleted rows also count against the maximum storage size and row count for a Db2 table.  And yes, there are clients that have gotten close to the 1.7 TB size limit and the 4.2 billion rows limit for a table. 

Another reason to perform table reorgs is that IBM has enhanced the RGZPFM command through the years to minimize the time that the table is unavailable to your applications. Parallel processing, reorganize while active, and the ability to suspend a reorg request are all recent additions that make it easier/faster to perform reorgs on tables. Here are some useful links that provide more details regarding these different types of reorg options:

Assuming that you’re now sold on the benefits of reorgs, the next logical question is: When should a table be reorganized? A good rule of thumb is to wait until the table has a deleted row percentage of around 20%. This site outlines a method to determine the deleted row percentage for a table. It's also important to remember that the Reuse Deleted Rows feature can be used to reduce the number of times that a table has to be reorganized.

 

Happy RGZPFM’ing & Happy Thanksgiving to my US readers!