Assessing the performance of the SQL queries running on your IBM i systems is the most popular service that our Db2 for i team in Systems Expert Labs performs. One thing that I frequently notice when doing this work is that many of a client's queries and reports perform data transformations. These data transformations are required to get the data into a form that the end users can understand and digest. The transformations range from converting legacy data values into SQL dates, to eliminating leading blanks in character columns, to aggregating data.
The transformations themselves are not an issue. The issue
is the number of times that they are done and the number of queries that
contain the same transformation code. As a result, these repeated
transformations leave clients with both performance and code maintenance
headaches.
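To make this concrete, here's a minimal sketch of the kind of transformation logic I'm describing. The table and column names are hypothetical (a SALES table with a legacy DECIMAL(8,0) date column in YYYYMMDD form and a fixed-length customer name column), not from any particular client system:

    -- Hypothetical names; a sketch of common transformation patterns
    SELECT
       -- Convert a legacy YYYYMMDD numeric value into a true SQL DATE
       DATE(TIMESTAMP_FORMAT(CHAR(ORDER_DATE_NUM), 'YYYYMMDD')) AS ORDER_DATE,
       -- Remove the leading blanks in a fixed-length character column
       LTRIM(CUSTOMER_NAME) AS CUSTOMER_NAME,
       -- Aggregate the detail rows into a summary value
       SUM(SALE_AMOUNT) AS TOTAL_SALES
    FROM SALES
    GROUP BY ORDER_DATE_NUM, LTRIM(CUSTOMER_NAME)

Every query that embeds expressions like these pays the conversion cost on every run.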
This transformation overhead reared its head during a recent client engagement where the client was experiencing system performance issues due to reporting workloads straining their system. Our analysis showed that a significant number of reports referenced the same SQL view. One transformation in this view was the aggregation of millions of rows of sales data. This same expensive data transformation was occurring hundreds of times a day on their system. As it turned out, the analysts consuming these reports did not need the latest sales data being aggregated for their work. The client addressed their performance woes by scheduling the needed aggregation of the sales data and changing the SQL view to reference the pre-aggregated data set. System performance improved dramatically after this repeated data transformation was reduced to a single daily event. After this change, their Analytics SME remarked that the performance change in their reports was "quite literally astounding."
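A minimal sketch of that pattern, with assumed table and column names rather than the client's actual schema: a summary table populated by the scheduled aggregation, and the view repointed at it.

    -- Assumed names; the shape of the fix, not the client's code
    CREATE TABLE SALES_DAILY_SUMMARY AS
       (SELECT SALE_DATE, REGION, SUM(SALE_AMOUNT) AS TOTAL_SALES
          FROM SALES
          GROUP BY SALE_DATE, REGION)
       WITH NO DATA;

    -- The reporting view now reads the small pre-aggregated table
    -- instead of re-aggregating millions of detail rows on every query
    CREATE OR REPLACE VIEW SALES_REPORT_V AS
       SELECT SALE_DATE, REGION, TOTAL_SALES
         FROM SALES_DAILY_SUMMARY;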
This client avoided the code maintenance burden of having the same data transformation code duplicated across many queries through the use of an SQL view, but I see many clients with the same data transformation code repeated in different queries. While SQL views enable code reuse, they don't address the performance overhead of multiple queries performing the same data transformations over and over again on your system.
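For example, a view like the following (reusing the hypothetical columns from the earlier sketch) gives every report a single copy of the transformation code to maintain, but the date conversion and LTRIM are still executed at run time by every query that touches the view:

    -- Hypothetical view: one copy of the code, but the work still
    -- happens at run time for each referencing query
    CREATE OR REPLACE VIEW SALES_CLEAN_V AS
       SELECT DATE(TIMESTAMP_FORMAT(CHAR(ORDER_DATE_NUM), 'YYYYMMDD')) AS ORDER_DATE,
              LTRIM(CUSTOMER_NAME) AS CUSTOMER_NAME,
              SALE_AMOUNT
         FROM SALES;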
Again, you could free up a lot of your system resources by reducing the number of times that the data transformations have to be performed. Schedule these data transformations once a day (or a couple of times a day) to get a copy of your data into a format that is more easily consumed by your reporting and analytic workloads. This data copy that enables easier and faster analytics is often referred to as a data mart or data warehouse.
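A once-a-day refresh of the summary table from the earlier sketch could be as simple as the following (assumed names again); on IBM i, the job scheduler (ADDJOBSCDE) is one way to run it automatically:

    -- Simple full-refresh sketch, scheduled to run once a day
    DELETE FROM SALES_DAILY_SUMMARY;

    INSERT INTO SALES_DAILY_SUMMARY (SALE_DATE, REGION, TOTAL_SALES)
       SELECT SALE_DATE, REGION, SUM(SALE_AMOUNT)
         FROM SALES
         GROUP BY SALE_DATE, REGION;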
There’s a good chance you have heard about data marts and data warehouses, but I’m guessing you haven’t heard of DataMigrator. This tool, which is part of the Db2 Web Query product suite, can help generate the SQL for transformations such as the aggregations commonly used in analytical applications, and it can schedule those transformations to run. It even allows you to copy just the changed data in your tables with its support for journals and remote journals as a source. You can learn more about DataMigrator, moving your data, and analytics in general from a new blog started by my teammate John Westcott. Check it out and learn more about how you can avoid the costs of those repeated transformations on your system.
Dear Kent,
When you mentioned a pre-aggregated data set, I was thinking of using a Materialized Query Table (MQT). Is an MQT the most efficient tool to use in this case?
MQTs are definitely an option for pre-aggregating data sets. How much you can benefit from the optimizer's automatic rewrite of queries to use the MQT depends on your data currency requirements. There's a white paper on the MQT support located at this site: ibm.biz/db2iPapers
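For reference, a minimal sketch of what an MQT over the earlier sales example might look like (assumed names; see the white paper for the query rewrite and refresh details):

    -- Sketch of a deferred-refresh MQT on the assumed SALES columns
    CREATE TABLE SALES_SUMMARY_MQT AS
       (SELECT SALE_DATE, REGION, SUM(SALE_AMOUNT) AS TOTAL_SALES
          FROM SALES
          GROUP BY SALE_DATE, REGION)
       DATA INITIALLY DEFERRED
       REFRESH DEFERRED
       MAINTAINED BY USER
       ENABLE QUERY OPTIMIZATION;

    -- Repopulate the MQT on your own schedule
    REFRESH TABLE SALES_SUMMARY_MQT;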