One of my tasks is to diagnose and optimize sql statements. One of the challenges I face is when do you not try to optimize the statement and just push it back to development with the request to "start over"? Perhaps even as far as looking at their data model?
There are certainly some red flags
- Multiple accesses of the same table
- UNIONs where the tables and columns are the same, but predicates are different
- 2+ levels of nested subqueries
- Queries that take several pages to display
It is the last red flag that is of interest today. How long is too long for a sql statement? I have seen some monsters (especially from Oracle's own applications/code). One that I am currently looking at is over 400 lines (with the column lists concatenated into single lines). The problem with these long statements is where do you begin the diagnosis process?
Answer - you begin by going back to the developer and ask 'What are you trying to achieve with this query?'.One metric I came up with (at lunch...so it's totally untested or thought out...should make for good debate fodder) is to compare the number of items in the column list with the number of lines.
- Each column is on a separate line
- Expressions (CASE, DECODE) are considered a single column
- Each table in the FROM clause is on a separate line
- Each predicate clause is on a separate line
- Views are expanded into the statement
- ORDER BY and GROUP BY clauses are ignored for the purposes of calculation
Let's say that each column comes from a different table and has a single filter condition. Add in the join condition and you have 4 lines per column per sql statement. Of course, many statements pull multiple columns from a single table and have compound filters or joins. This leaves me still at 4 lines per column per statement. This means that 75% of the statement are in the FROM and WHERE/HAVING clauses.
If there are more than 3 lines in the FROM and WHERE/HAVING clauses in a sql statement for every column/expression in the column list, the statement is a prime candidate for returning to development with a request to "start over".The one question that is unanswered and may be unanswerable is "Is a long sql statement an indication of a bad data model?"
7 comments:
One problem with that (or any metric) is people start coding to the metric rather than the problem. In this case, it is as simple as adding columns to the SELECT list that you don't intend to use, just to meet the metric.
Also, people will start to write selects that 'save' lines of code rather than are optimal for performance or maintenance/clarity. Complex case statements, using PL/SQL functions....
Ah, one sympathises, but as a notorous producer of Very Long SQL statement (only 400 lines? Hah!) I have to add a couple of comments on how to make it more acceptable.
Firstly, as the business logic being implemented gets more complek the need for comments in the code rises. Inline comments are a handy technique, but they can add a lot of length to the statement, so I often prefer adding markers for code points of interest (eg. "-- see note 1 below") and following up on the note in a comment block below.
Secondly, both the development and the maintenance of long statements with complex business logic can be aided by using subquery factoring clauses instead of inline views. This allows blocks of code to be isolated almost in the way that procedures and functions are used in a procedural language. It can also allow these sections of code to be more easily executed and tested in isolation from other code.
For example,
With candidate_items as
(...),
static_data as
(...),
daily_changes as
(...)
select ...
from candidate_items, static_data, daily_changes
where ...
Lastly, one of the unfortunate side-effects that I've seen in returning code to developers for rewrite based on the statement being allegedly too long is that they simply start creating temporary tables and populating them, then joining the results. Alternatively you can end up with a bunch of views that are only used by one query, with the intent of making the query look "acceptbale". This, of course, makes the code "look better" and it passes the line-count test, but at the expense of performance and/or maintainability. Being more on the development side myself I've personally refused to rewrite code on the basis of it being too long, but only in the context of explaining exactly why it would be a Bad Thing.
Anyway, there's no clear-cut answers here unfortunately, but commenting of code and formatting appropriately ought to be a necessary precursor to deployment for any application.
Size doesn't matter?
Neatness and clarity of thought are where it's at.
A sql statement should be as long as it needs to be to do its job efficiently and that means without resorting to unnecessary possibly performance-killing function lookups and views as Gary and David mention.
But, when you need to tune a statement that won't fit on one screen in any font, then that in itself is a challenge.
And that's something that subquery factoring can really help with, I find. Great for working out what was the original intention, finding unneccessary duplication of tables, and for encouraging set thinking whilst breaking stuff out into logical sets of information that need to be joined.
As a developer, I think this is, in fact, a nice little metric. Obviously, as was noted, your SQL has to be as long as it has to be. However, if you finish a query, look down and go "Dear God What Have I Wrought???", this isn't a bad way to sort of feel your way through, to have a flag go up when you are reviewing your work.
I find usingthe taller than I am test a good start on this, if the query printed out in close type is taller than I am it probably needs reviewed to make sure.
a: its doing what Is expected
b: there isn't a simpler way to perform the same process.
ideally if you can get an old daisywheel linefeed printer that way you don't have to tape the bits of A4 together..
One risk with breaking up a complex SQL statement into multiple SQLs is that read consistency may also "break up".
Oracle implements Statement Level Read Consistency. Thus, even a complex SQL gets a read-consistent view of all the tables it references.
Rewrite the code as seperate SQLs (probably storing the intermediate results in variables in PL/SQL or Pro*C or in temporary tables) and Oracle may return slightly different views of the referenced tables each time you query them !
Not all developers understand the implications of Statement Level Read Consistency.
Agree with Dom firstly: size doesnt matter, as long as I (and some maintenance body) can easily understand what is going on.
But in case of doubt: the metric is a good, reasoned-out start. Although my kids would probably know how to add a few dummy columns in the select-clase to stay inside the acceptable limit.
Any chance of factoring in the nr of actual tables and the nr of (PK-)columns that cause join-lines ?
My own subjective measure is: Can someone explain it to me within reasonable time. If I dont get it, it is probably too complicated.
I try not to be too clever, and I often suspect the maintenance crew is dumber then me.
Another simple measures: If we cant (get CBO to) run it acceptably, it is too complicated: Break it up - please.
And I like a cleverly done qry: logic inside queries is often, funny enough, more transparent and more efficient then logic in code.
@Dave, comments: I've reduced adding comments when my comments started to cut out lines after accidental editing. But that was just us. Normally, comments are Good.
@Dave, subquery factoring: Yep, but it shouldnt be an excuse to leave out proper joins. too much factoring can point to sloppy coding.
Post a Comment