After thinking about a “To Be Or Not to Be” type of title in last month’s entry on database masking, it was an easy title choice for this entry. I don’t think that a discussion of whether or not to perform Db2 table reorganizations should be as controversial as the topic of masking, but I've been surprised before…
The IBM Lab Services team recently received a question from
client asking if it is still worth the time to reorganize tables in light of
today’s faster disk technologies. The main benefit of running the RGZPFM
command is to remove/compress deleted rows from a Db2 table. And given that disk
technologies such as IBM Flash Storage
can retrieve database rows from disk significantly faster than spinning
drives, are my applications really going to notice deleted rows being paged
into memory?
The short answer is: yes, application and system performance
can be negatively impacted. While today’s disk technology is faster, disk
operations on deleted rows are wasteful and those deleted rows unnecessarily
increase the memory working set size for an application.
Deleted rows can also waste CPU resources when Db2 performs
a scan operation on a table with deleted rows. Assume that the customers table
being referenced in the following query contains 1 million rows and there are
300,000 rows in the table that are deleted.
SELECT * FROM customers WHERE company_name LIKE '%INC.'
A Table
Scan is the only access method that Db2 can use in the runtime implementation
of this query because of the wild-card search criteria that looks for company
names ending with ‘INC.’ (This statement is mostly true, but the IBM OmniFind Text Search Server is a topic for another day). The Table Scan method has
to process every row in the table whether it’s deleted or not. Obviously, no
deleted rows will be returned in the query result set, but this query will
waste CPU resources on 300,000 rows checking if the rows are active or deleted.
This unneeded processing is definitely something that can impact the
scalability of your applications and systems.
Of course, it goes without saying that deleted rows affect disk storage requirements. In addition, deleted rows also count against the maximum storage size and row count for a Db2 table. And yes, there are clients that have gotten close to the 1.7 TB size limit and the 4.2 billion rows limit for a table.
Another reason to perform
table reorgs is that IBM has enhanced the RGZPFM command through the years to
minimize the time that the table is unavailable to your applications. Parallel processing, reorganize while active,
and the ability to suspend a reorg request are all recent additions that make
it easier/faster to perform reorgs on tables. Here are some useful links that provide
more details regarding these different types of reorg options:
Assuming that you’re now sold
on the benefits of reorgs, the next logical question is: When should a table be
reorganized? A good rule of thumb
is to wait until the table has a deleted row percentage of around 20%. This site
outlines a method to determine the deleted row percentage for a table. It's
also important to remember that the Reuse Deleted Rows feature can be used to reduce the number of times that a table
has to be reorganized.
Happy
RGZPFM’ing & Happy Thanksgiving to my US readers!