One of my tasks is to diagnose and optimize sql statements. One of the challenges I face is when do you not try to optimize the statement and just push it back to development with the request to "start over"? Perhaps even as far as looking at their data model?
There are certainly some red flags
- Multiple accesses of the same table
- UNIONs where the tables and columns are the same, but predicates are different
- 2+ levels of nested subqueries
- Queries that take several pages to display
It is the last red flag that is of interest today. How long is too long for a sql statement? I have seen some monsters (especially from Oracle's own applications/code). One that I am currently looking at is over 400 lines (with the column lists concatenated into single lines). The problem with these long statements is where do you begin the diagnosis process?
Answer - you begin by going back to the developer and ask 'What are you trying to achieve with this query?'.One metric I came up with (at lunch...so it's totally untested or thought out...should make for good debate fodder) is to compare the number of items in the column list with the number of lines.
- Each column is on a separate line
- Expressions (CASE, DECODE) are considered a single column
- Each table in the FROM clause is on a separate line
- Each predicate clause is on a separate line
- Views are expanded into the statement
- ORDER BY and GROUP BY clauses are ignored for the purposes of calculation
Let's say that each column comes from a different table and has a single filter condition. Add in the join condition and you have 4 lines per column per sql statement. Of course, many statements pull multiple columns from a single table and have compound filters or joins. This leaves me still at 4 lines per column per statement. This means that 75% of the statement are in the FROM and WHERE/HAVING clauses.
If there are more than 3 lines in the FROM and WHERE/HAVING clauses in a sql statement for every column/expression in the column list, the statement is a prime candidate for returning to development with a request to "start over".The one question that is unanswered and may be unanswerable is "Is a long sql statement an indication of a bad data model?"