**IMPORTANT NOTE - This applies only to 10gR1 and older databases **
I recently reviewed a couple of statspack reports from a 9i database to determine the differences between a 'good' and 'bad' period. After identifying several key differences (more logical reads, more redo, more disk reads), I set about looking at the underlying activity in the form of the sql statements. In the past, I have used this data to find statements consuming large amount of resource and causing contention issues with other processes. I must now question the accuracy of that approach.
At UKOUG, I learned that in releases prior to 10.2, the statistical information for a given sql statement was not updated until the successful completion of the session's execution of the statement. While I found that interesting, I did not grasp the real significance in regards to Statspack until just now.
Let's say that you have a high resource consumption query that runs 5 hours (and you are taking hourly snapshots). All of the sql statistics (disk reads, buffer gets, time) will be reported for the snapshot immediately following the statement completion (unless it gets purged from the shared pool before that can happen...but that is another issue for another blog). Even though the statement was consuming resource during snaps 1, 2, 3 and 4, it is only reported for snap 5! If you are trying to diagnose issues that occur during the other snapshots, you will completely miss this query...and focus on queries that may have completed well before the snapshot and had little if any impact on the time period being examined!
This does not mean that all SQL metrics should be ignored (despite the title of this blog). Statements with multiple executions might be accurately reflected if they are of reasonably short duration (shorter than your report window). And the bottom line is that a query that runs for 5 hours to process less than 20 rows needs to be looked at more closely.
As with all tools, you need to understand the shortcomings. That does not make the tool useless, just not a wonder tool.
Thursday, March 22, 2007
Thursday, March 15, 2007
What would like to see in a Statspack report?
One of my frustrations with the supplied statspack report is that it provides too much information which makes it hard to see the relevant information needed to diagnose problems. I do wish that there was at least a 132 character wide version...
You have seen some of my recent 'customizations' on this blog or on my site. I am currently working on scripts that will compare statistics/events between two (or more) time ranges and generate information about sql statements with multiple execution plans and compare statement performance across time ranges as well.
So, my dear blog readers, what data would you like to see in a statspack report? Respond with your ideas, suggestions, thoughts and I'll see what I can do about putting together the sql to generate that data and post it to my website (www.optimaldba.com) for all to use.
You have seen some of my recent 'customizations' on this blog or on my site. I am currently working on scripts that will compare statistics/events between two (or more) time ranges and generate information about sql statements with multiple execution plans and compare statement performance across time ranges as well.
So, my dear blog readers, what data would you like to see in a statspack report? Respond with your ideas, suggestions, thoughts and I'll see what I can do about putting together the sql to generate that data and post it to my website (www.optimaldba.com) for all to use.
Tuesday, March 06, 2007
Creeping snapshots
One of the things I have noticed with Statspack is that the timing of the snapshots is not always what is intended. This is the result of the interval logic of dbms_job. The 'next' time a job is submitted is the 'interval' time from the end of the job, not from the beginning. So a snapshot that is scheduled to run at the top of each hour may begin to be scheduled to run later and later past the hour as the snapshot begins to take more time.
Here is a small example where a level 10 snapshot started running longer than 1 minute.
3192 12/05/06 11:00:06
3200 12/05/06 15:00:33
3208 12/05/06 19:00:57
3216 12/05/06 23:01:16
3224 12/06/06 03:01:38
3232 12/06/06 07:01:59
3240 12/06/06 11:02:27
3248 12/06/06 15:02:55
3256 12/06/06 19:03:19
3264 12/06/06 23:03:37
3272 12/07/06 03:03:58
3280 12/07/06 07:04:21
3284 12/07/06 09:04:32 <== In less than 2 days, the time had creeped by over 4 minutes
3285 12/07/06 09:30:04 <== Interval logic changed
3286 12/07/06 10:00:02
3287 12/07/06 10:30:06
3288 12/07/06 11:00:04
3297 12/07/06 15:00:06
3305 12/07/06 19:00:02
3313 12/07/06 23:00:03
3321 12/08/06 03:00:02
3329 12/08/06 07:00:04
3337 12/08/06 11:00:02
Here is a small example where a level 10 snapshot started running longer than 1 minute. The interval logic was changed to round the next time to the nearest 1/2 hour.
interval=>'trunc(sysdate, ''HH24'') + ((floor(to_number(to_char(sysdate, ''MI''))/30)+1)*30) / (24*60)'
Here is a small example where a level 10 snapshot started running longer than 1 minute.
3192 12/05/06 11:00:06
3200 12/05/06 15:00:33
3208 12/05/06 19:00:57
3216 12/05/06 23:01:16
3224 12/06/06 03:01:38
3232 12/06/06 07:01:59
3240 12/06/06 11:02:27
3248 12/06/06 15:02:55
3256 12/06/06 19:03:19
3264 12/06/06 23:03:37
3272 12/07/06 03:03:58
3280 12/07/06 07:04:21
3284 12/07/06 09:04:32 <== In less than 2 days, the time had creeped by over 4 minutes
3285 12/07/06 09:30:04 <== Interval logic changed
3286 12/07/06 10:00:02
3287 12/07/06 10:30:06
3288 12/07/06 11:00:04
3297 12/07/06 15:00:06
3305 12/07/06 19:00:02
3313 12/07/06 23:00:03
3321 12/08/06 03:00:02
3329 12/08/06 07:00:04
3337 12/08/06 11:00:02
Here is a small example where a level 10 snapshot started running longer than 1 minute. The interval logic was changed to round the next time to the nearest 1/2 hour.
interval=>'trunc(sysdate, ''HH24'') + ((floor(to_number(to_char(sysdate, ''MI''))/30)+1)*30) / (24*60)'
Subscribe to:
Posts (Atom)