This week we celebrate the Thanksgiving holiday in the US, so I decided to write about an overlooked Db2 enhancement that I’m grateful to have as a tool in the toolbox when our IBM Expert Labs team helps clients optimize their SQL performance.
One of
the great things about the SQL user-defined function (UDF) support is that it
allows programmers to use modular programming
as they develop queries and reports. If there’s a calculation commonly used by your
business, then good modular programming would dictate that the calculation is
coded in a single program and reused as opposed to embedding that calculation
in every application program that needs it.
UDFs enable this same type of code reuse, but at an SQL statement level.
Let’s use Costs of Goods Sold (COGS) as a common calculation or derivation that your reports frequently use. COGS is a simple calculation that adds together beginning inventory and purchases during the period and subtracts the ending inventory. Note: You may be thinking that the conversion of a legacy date value to an SQL Date value would be a better example of a common derivation that queries need - however, remember that I've previously covered a Date lookup table as a better solution for that.
Here’s the SQL UDF definition to support the COGS calculation:
CREATE FUNCTION CostOfGoodsSold (beginInv DECIMAL(8,2),
endInv DECIMAL(8,2),
purchasedInv DECIMAL(8,2)) RETURNS DECIMAL(8,2) LANGUAGE SQL NOT FENCED GLOBAL DETERMINISTIC RETURN( beginInv + purchasedInv — endInv)
While using
the UDF is easy from a programming perspective, there is a performance cost
since each call to the CostsOfGoodsSold UDF involves a call to an external
program object. Invoking an external program takes much more time and resources
than retrieving a column from a row in a table. Thus, the modular programming
approach of using a UDF adds overhead due to the program call.
IBM
knows that both performance and modular programming are important, so they
delivered SQL UDF inlining to allow SQL developers to get the best of both worlds. Inlining is a technique
that is used by program compilers to improve performance of the generated
program object. The compiler scans the program source code and notices that the
main body of a program calls the same procedure or routine multiple times to
perform a specific task. The compiler could decide to replace each procedure
call in the program’s main body with the procedure logic itself. This inline
replacement is done to eliminate the procedure call overhead to improve the
overall performance.
Db2 for
i takes a similar approach with the UDF logic. Instead of taking time to invoke
the external program object to perform the UDF logic, Db2 moves the function
inline to the query as demonstrated in the following graphic. The SELECT
statement coded in the application at the top of the figure is automatically
transformed into the bottom SELECT statement when it’s run by the application program.
First, the Db2 inline support enables the application delivers to employ modular programming because their SQL statements reuse the calculation by invoking the UDF. Second, the Db2 inline support improves performance because when the SQL statement is run the UDF calculation is moved inline to the SQL statement which eliminates the overhead of the external program call.
A simple performance test of the inline and non-inline versions of a UDF against a table with a half million rows showed a 4x performance improvement for the query using the inline version of the UDF. This performance result reinforces how expensive it is for the database engine to perform external program calls.
When the
function is created, Db2 for i reviews the function definition to determine if
the function is inline eligible. The main requirement is that the function
logic must be simple – the function body must contain a RETURN statement or a
RETURN statement wrapped within a BEGIN … END
pairing. If the calculation
requires multiple assignments and SQL statements, then that logic cannot be
moved into another SQL statement. If the
UDF is a scalar user-defined function, the function must be defined with the GLOBAL DETERMINISTIC setting. A
user-defined table function must be defined with the NO EXTERNAL ACTION
setting. These are the main
requirements, but the complete list of the inline requirements is documented
for both scalar UDFs and table UDFs.
At this
point, you’re probably wondering how to verify if your existing SQL UDFs are
being inlined. The SYSFUNCS view in QSYS2 contains all the details about UDFs
on your system, so it’s easy to determine the inline status by running a query
like the following:
SELECT routine_schema, routine_name, inline
FROM qsys2.sysfuncs WHERE external_language IS NULL
Here’s an example of the output that you’ll receive:
The YES and NO values
returned as inline status above are obvious, but what does mean if the INLINE status
is Null (i.e., -) like the INT_ARRAY function in this example? The Null value means that Db2 for i hasn’t
had a chance to determine if your function is inline eligible. Meaning
that the UDF was created before Db2 had the inline feature that was first made
available on the IBM i 7.3 release via a Database Group PTF.
One of
the strengths of IBM i is that programs like an SQL UDF keep on running when
you install a new release. No program
recompiles or recreations are needed. One downside of this run forever approach
is that your program can miss out on new performance optimization like UDF
inlining. Because of this, I recommend
that clients recreate all their SQL functions, SQL procedures, and SQL triggers
after installing a new release of the IBM i operating system. With this approach, you guarantee that your
SQL procedural objects are leveraging all the latest optimizations delivered by
the Db2 team in Rochester.
You should now have a firm understanding of how UDF inlining delivers the best of both worlds in terms
of UDF performance and using UDFs for modular programming. And hopefully, you're thankful for it like me!


