Monday, July 29, 2024

Selects With No Limits

There’s no limit on the number of rows that an SQL SELECT statement can return to an application by default. There are a couple of different techniques, however, that programmers can employ to change this default behavior. One direct option is adding a Fetch or Limit clause to set the maximum numbers of rows the query can return. Usage of these clauses can prevent accidentally sending millions of rows across the network or ensuring a query only returns the number of rows that can be displayed on the end user interface.

Like many syntax options, there’s nothing that prevents these clauses from being used incorrectly. The following examples highlight a poor coding practice with these clauses that has recently emerged in SQL performance reviews performed by our Technology Expert Labs team. Both of these SQL examples would fail if the specified SELECT returns more than one row and they didn't contain a Fetch or Limit clause.

SELECT c1 INTO :hv1 FROM t1 WHERE c2=1 AND c3='A' 
  FETCH FIRST 1 ROWS ONLY

SET :hv1 = (SELECT c1 FROM t1 WHERE c2=1 AND c3='A' LIMIT 1)

While these limiting clauses prevent the SQL from failing with a More Than One Row error, there are several issues with this coding approach.

One big issue is data integrity. If these clauses are being used just as a safety net – meaning the query shouldn’t return more than row but it’s added just in case that happens, then these clauses will prevent the application from ever detecting that the query is returning more than one row. Multiple rows being returned could be caused by a data integrity problem, but that potential problem is hidden from the application with this coding practice. Furthermore, if it’s normal that the query could return more than one row from time to time, these SQL requests have told Db2 it’s okay to randomly select which row is returned to the program. That potential randomness exists because these SELECT statements don’t have an ORDER BY clause. In this case, Db2 is allowed to return the selected rows in any order that it wants. The chances of the randomness goes up even higher if these queries are eligible for parallel processing with Db2 SMP. I think you’d agree it’s a much cleaner and safer approach to control which row is assigned to the host variable instead of assigning a value at random.

The other issue is performance. These clauses require Db2 to track how many rows a query is returning and that tracking takes time and resources. Some internal tests show these clauses adding 0.5% overhead to the execution of a query. That overhead may seem like nothing to worry about, but just multiply that little overhand times thousands or millions of statement executions.

The Sky’s the Limit when it comes to the scalability of your SQL applications… assuming that you’re utilizing good SQL coding practices. If you’re company needs help learning efficient SQL coding practices, then our IBM Technology Expert Labs team is here to help.

8 comments:

  1. Wait? Are you saying that "LIMIT 1" doesn't cause the query processor to stop after 1 row but rather continues for N rows and then throws away the N-1 rows?

    ReplyDelete
    Replies
    1. No, that's not what I'm saying. The query engine doesn't normally pay attention to how many rows are returned to by the query. So when you add LIMIT 1/FETCH FIRST 1 ROW ONLY to a query, the query engine has to count the number of rows returned. Counting rows take time & resources.

      Delete
  2. Very good point made about "hiding" the real issue. This is akin to fixing the symptom instead of the problem - but without visibility to even the symptom you can fix neither! Cheers :)

    ReplyDelete
  3. You write "Both of these SQL examples will fail if the specified SELECT returns more than one row." but I don't see how the two examples can return more than one row.

    ReplyDelete
    Replies
    1. Good catch - I meant to say they would fail if there was not a Fetch or Limit clause. I've updated the text.

      Delete
  4. How would you rewrite these two statement then?

    ReplyDelete
    Replies
    1. The Fetch/Limit clause should only be used when the query can return more than one row and there's an Order By clause to control which row gets returned.

      Delete
  5. Thanks for all your great content, much appreciated.

    ReplyDelete