Monday, July 29, 2024

Selects With No Limits

There’s no limit on the number of rows that an SQL SELECT statement can return to an application by default. There are a couple of different techniques, however, that programmers can employ to change this default behavior. One direct option is adding a Fetch or Limit clause to set the maximum numbers of rows the query can return. Usage of these clauses can prevent accidentally sending millions of rows across the network or ensuring a query only returns the number of rows that can be displayed on the end user interface.

Like many syntax options, there’s nothing that prevents these clauses from being used incorrectly. The following examples highlight a poor coding practice with these clauses that has recently emerged in SQL performance reviews performed by our Technology Expert Labs team. Both of these SQL examples would fail if the specified SELECT returns more than one row and they didn't contain a Fetch or Limit clause.

SELECT c1 INTO :hv1 FROM t1 WHERE c2=1 AND c3='A' 
  FETCH FIRST 1 ROWS ONLY

SET :hv1 = (SELECT c1 FROM t1 WHERE c2=1 AND c3='A' LIMIT 1)

While these limiting clauses prevent the SQL from failing with a More Than One Row error, there are several issues with this coding approach.

One big issue is data integrity. If these clauses are being used just as a safety net – meaning the query shouldn’t return more than row but it’s added just in case that happens, then these clauses will prevent the application from ever detecting that the query is returning more than one row. Multiple rows being returned could be caused by a data integrity problem, but that potential problem is hidden from the application with this coding practice. Furthermore, if it’s normal that the query could return more than one row from time to time, these SQL requests have told Db2 it’s okay to randomly select which row is returned to the program. That potential randomness exists because these SELECT statements don’t have an ORDER BY clause. In this case, Db2 is allowed to return the selected rows in any order that it wants. The chances of the randomness goes up even higher if these queries are eligible for parallel processing with Db2 SMP. I think you’d agree it’s a much cleaner and safer approach to control which row is assigned to the host variable instead of assigning a value at random.

The other issue is performance. These clauses require Db2 to track how many rows a query is returning and that tracking takes time and resources. Some internal tests show these clauses adding 0.5% overhead to the execution of a query. That overhead may seem like nothing to worry about, but just multiply that little overhand times thousands or millions of statement executions.

The Sky’s the Limit when it comes to the scalability of your SQL applications… assuming that you’re utilizing good SQL coding practices. If you’re company needs help learning efficient SQL coding practices, then our IBM Technology Expert Labs team is here to help.