Wednesday, August 22, 2007

More fun with SQL

Have you ever wondered if you can dynamically assign columns to a sql statement...using only sql and SQL*Plus commands?

Well...obviously it can be done or this would be one of the shortest blog entries in history!

I have been working on some statspack reports and have been frustrated by the change in sql statement tracking from 9i to 10g. A new column was added to one of the tables and this is the foreign key to another table. Faced with changing structures, the normal process is to create and maintain two separate versions of the code. Of course, since this was going to be a script used extensively and with a lot of information (think of it as my custom version of spreport) maintaining two (or more) versions was not going to be fun (especially when I lose track of what I am working on...).

I began to ponder using SQL*Plus variables as column names...and sure enough it can be done! A SQL*Plus variable can be dynamically assigned using a query and the NEW_VALUE clause of the COLUMN command.

First, issue the COLUMN command with the NEW_VALUE clause to assign the returned value to the variable. I also use the NOPRINT command to suppress any output.
COLUMN sql_summ_col_name NEW_VALUE sql_summ_col_name NOPRINT

Next, query the data dictionary to see if the 'new' column exists. Using the scalar subquery in the decode enables the query to always return a value. 1 if the column exists, 0 if it does not.

SELECT DECODE((SELECT COUNT(1)
FROM user_tab_columns
WHERE table_name = 'STATS$SQL_SUMMARY'
AND column_name = 'OLD_HASH_VALUE'),
1,'old_hash_value', 'hash_value')
sql_summ_col_name
FROM dual
/

Finally, use the variable in a query.

SQL> SELECT s1.hash_value         c_sql_hash_value,
2 s1.&sql_summ_col_name c_sql_old_hash_value,
3 s1.snap_id c_snap_id
4 FROM stats$sql_summary s1
5 WHERE rownum < 5
6 /
old 2: s1.&sql_summ_col_name c_sql_old_hash_value,
new 2: s1.old_hash_value c_sql_old_hash_value,

C_SQL_HASH_VALUE C_SQL_OLD_HASH_VALUE C_SNAP_ID
---------------- -------------------- ----------
3159716790 3694268570 16
864012087 4274598960 16
1471956217 2065408759 16
454710966 3665763022 16


You can see where the variable name is used in the column list and the result of the substitution of the value.

Let's see an example from a 9i database. Same setup, same query up to this one.
SQL> SELECT s1.hash_value         c_sql_hash_value,
2 s1.&sql_summ_col_name c_sql_old_hash_value,
3 s1.snap_id c_snap_id
4 FROM stats$sql_summary s1
5 WHERE rownum < 5
6 /
old 2: s1.&sql_summ_col_name c_sql_old_hash_value,
new 2: s1.hash_value c_sql_old_hash_value,

C_SQL_HASH_VALUE C_SQL_OLD_HASH_VALUE C_SNAP_ID
---------------- -------------------- ----------
1353454894 1353454894 4533
1355497416 1355497416 4533
1356705287 1356705287 4533
1356823499 1356823499 4533


As you can see, the column substituted is hash_value as old_hash_value does not exist in this version.

There is one more wrinkle to add to this. In 9i, the c_sql_hash_value and c_sql_old_hash_value columns are going to contain identical data. So...why not suppress the display of the redundant data! Once again, the COLUMN command comes to our rescue.

First define another column
COLUMN old_hv_print NEW_VALUE -
old_hv_print NOPRINT


Change the data dictionary query slightly to include a call to PRINT or NOPRINT the column.
SELECT DECODE((SELECT COUNT(1)
FROM user_tab_columns
WHERE table_name = 'STATS$SQL_SUMMARY'
AND column_name = 'OLD_HASH_VALUE'),
1,'old_hash_value', 'hash_value')
sql_summ_col_name,
DECODE((SELECT COUNT(1)
FROM user_tab_columns
WHERE table_name = 'STATS$SQL_SUMMARY'
AND column_name = 'OLD_HASH_VALUE'),
1,'PRINT', 'NOPRINT') old_hv_print
FROM dual
/


Now define the c_sql_old_hash_value COLUMN to include the PRINT/NOPRINT
COLUMN c_sql_old_hash_value  FORMAT 999999999 -
HEADING 'Old Hash Value' &old_hv_print


Finally, reexcute the queries.
10g - The old_hash_value column will be displayed
SQL> SELECT s1.hash_value         c_sql_hash_value,
2 s1.&sql_summ_col_name c_sql_old_hash_value,
3 s1.snap_id c_snap_id
4 FROM stats$sql_summary s1
5 WHERE rownum < 5
6 /
old 2: s1.&sql_summ_col_name c_sql_old_hash_value,
new 2: s1.old_hash_value c_sql_old_hash_value,

C_SQL_HASH_VALUE C_SQL_OLD_HASH_VALUE C_SNAP_ID
---------------- -------------------- ----------
3159716790 3694268570 16
864012087 4274598960 16
1471956217 2065408759 16
454710966 3665763022 16


9i - the old_hash_value column will not be displayed

SQL> SELECT s1.hash_value         c_sql_hash_value,
2 s1.&sql_summ_col_name c_sql_old_hash_value,
3 s1.snap_id c_snap_id
4 FROM stats$sql_summary s1
5 WHERE rownum < 5
6 /
old 2: s1.&sql_summ_col_name c_sql_old_hash_value,
new 2: s1.hash_value c_sql_old_hash_value,

C_SQL_HASH_VALUE C_SNAP_ID
---------------- ----------
1353454894 4533
1355497416 4533
1356705287 4533
1356823499 4533

Friday, August 10, 2007

Outer Joins, Nulls and Filters

In honor of "Mr. Null" Lex de Haan. A dear friend who is sorely missed by all those touched by his intelligence and humanity. He would have been 53 on Saturday August 11th.

Recently a developer needed assistance on a query with an outer join and a filter on a column in the outer table (the one where nulls are returned). The issue was that the data being returned did not match what was expected. We looked at the data, ran a couple of test cases and validated that the data returned was correct.

This led me to think about nulls, outer joins and filters. This is not a common request, but it can be a head scratcher. If the filter column is NOT NULL, it is pretty straight forward...but what happens when the filter column can be NULL? This makes it hard to distinguish between a NULL value being returned from the table and a NULL value being returned as the result of the outer join.

Let's set up a test case. The business requirement is to return all claims with payment information, if it exists. Any record in the PAYMENT table that is not a payment (PAYMENT_TYPE = 'P') is to be excluded. The PAYMENT_TYPE column is NULLable and NULL PAYMENT_TYPES should be excluded.

Here are the tables and data. The DDL calls to create/populate the tables is at the end of the blog entry.


SELECT * FROM claim
/
CLAIM_NUM CLAIM_AMT CLAIM_BAL
---------- ---------- ----------
1 100 0
2 200 99
3 300 0
4 400 350
5 500 0
6 600 295
7 700 700
8 800 800

8 rows selected.

SELECT * FROM payment
/
PAYMENT_NUM CLAIM_NUM P PAYMENT_AMT
----------- ---------- - -----------
1 1 P 100
2 2 A 1
3 2 P 100
4 3 A 50
5 3 P 100
6 3 P 150
7 4 P 50
8 5 P 500
9 6 A 105
10 6 P 100
11 6 P 100
12 7 100

12 rows selected.


Let's do a simple join on the tables.


SELECT c.claim_num, c.claim_amt, c.claim_bal,
p.payment_num, p.claim_num, p.payment_type, p.payment_amt
FROM claim c,
payment p
WHERE c.claim_num = p.claim_num
/


CLAIM_NUM CLAIM_AMT CLAIM_BAL PAYMENT_NUM CLAIM_NUM P PAYMENT_AMT
---------- ---------- ---------- ----------- ---------- - -----------
1 100 0 1 1 P 100
2 200 99 2 2 A 1
2 200 99 3 2 P 100
3 300 0 4 3 A 50
3 300 0 5 3 P 100
3 300 0 6 3 P 150
4 400 350 7 4 P 50
5 500 0 8 5 P 500
6 600 295 9 6 A 105
6 600 295 10 6 P 100
6 600 295 11 6 P 100
7 700 700 12 7 100

12 rows selected.

There are three problems with this result set.
1) CLAIM_NUM 8 is not included as it does not have a related record in the payment table.
2) CLAIM_NUM 7 is included even though the PAYMENT_TYPE is NULL.
3) PAYMENT_TYPEs of A (for adjustment) are included.

Adding in the outer join to return NULL values for CLAIMs without any PAYMENTs resolves the first problem.


SELECT c.claim_num, c.claim_amt, c.claim_bal,
p.payment_num, p.claim_num, p.payment_type, p.payment_amt
FROM claim c,
payment p
WHERE c.claim_num = p.claim_num (+)
/
CLAIM_NUM CLAIM_AMT CLAIM_BAL PAYMENT_NUM CLAIM_NUM P PAYMENT_AMT
---------- ---------- ---------- ----------- ---------- - -----------
1 100 0 1 1 P 100
2 200 99 2 2 A 1
2 200 99 3 2 P 100
3 300 0 4 3 A 50
3 300 0 5 3 P 100
3 300 0 6 3 P 150
4 400 350 7 4 P 50
5 500 0 8 5 P 500
6 600 295 9 6 A 105
6 600 295 10 6 P 100
6 600 295 11 6 P 100
7 700 700 12 7 100
8 800 800

13 rows selected.


The query is still returning too many rows as we are not filtering based on PAYMENT_TYPE of 'P'. This is where it begins to get complicated.


SELECT c.claim_num, c.claim_amt, c.claim_bal,
p.payment_num, p.claim_num, p.payment_type, p.payment_amt
FROM claim c,
payment p
WHERE c.claim_num = p.claim_num (+)
AND p.payment_type = 'P'
/
CLAIM_NUM CLAIM_AMT CLAIM_BAL PAYMENT_NUM CLAIM_NUM P PAYMENT_AMT
---------- ---------- ---------- ----------- ---------- - -----------
1 100 0 1 1 P 100
2 200 99 3 2 P 100
3 300 0 6 3 P 150
3 300 0 5 3 P 100
4 400 350 7 4 P 50
5 500 0 8 5 P 500
6 600 295 11 6 P 100
6 600 295 10 6 P 100

8 rows selected.


With the filter condition added, the non-P PAYMENT_TYPEs have been excluded, but so has the CLAIM_NUM of 8. As the PAYMENT_TYPE will be NULL for the outer join, changing the filter condition to use NVL should resolve the issue.



SELECT c.claim_num, c.claim_amt, c.claim_bal,
p.payment_num, p.claim_num, p.payment_type, p.payment_amt
FROM claim c,
payment p
WHERE c.claim_num = p.claim_num (+)
AND NVL(p.payment_type, 'P') = 'P'
/

CLAIM_NUM CLAIM_AMT CLAIM_BAL PAYMENT_NUM CLAIM_NUM P PAYMENT_AMT
---------- ---------- ---------- ----------- ---------- - -----------
1 100 0 1 1 P 100
2 200 99 3 2 P 100
3 300 0 5 3 P 100
3 300 0 6 3 P 150
4 400 350 7 4 P 50
5 500 0 8 5 P 500
6 600 295 10 6 P 100
6 600 295 11 6 P 100
7 700 700 12 7 100
8 800 800

10 rows selected.


1 step forward, CLAIM_NUM 8 is back.
1 step backward, PAYMENT_NUM 12 for CLAIM_NUM 7 is back.

The underlying issue is that the query does not differentiate between NULL PAYMENT_TYPEs in the PAYMENT table and NULL PAYMENT records being 'returned' because of the outer join. It is important to understand that if the filter column on the outer table is NOT NULL, you don't have to deal with this scenario. If the filter column contains NULLs, you need to use a CASE or DECODE to properly filter out the outer table NULLs. In this case, PAYMENT_NUM does not contain any NULL values, so it can be used.



SELECT c.claim_num, c.claim_amt, c.claim_bal,
p.payment_num, p.claim_num, p.payment_type, p.payment_amt
FROM claim c,
payment p
WHERE c.claim_num = p.claim_num (+)
AND (CASE WHEN p.payment_num IS NULL THEN 'P'
ELSE p.payment_type
END
) = 'P'
/

CLAIM_NUM CLAIM_AMT CLAIM_BAL PAYMENT_NUM CLAIM_NUM P PAYMENT_AMT
---------- ---------- ---------- ----------- ---------- - -----------
1 100 0 1 1 P 100
2 200 99 3 2 P 100
3 300 0 5 3 P 100
3 300 0 6 3 P 150
4 400 350 7 4 P 50
5 500 0 8 5 P 500
6 600 295 10 6 P 100
6 600 295 11 6 P 100
8 800 800

9 rows selected.


Now we have the data set we want.

And here's the DDL...


CREATE TABLE claim
(claim_num NUMBER,
claim_amt NUMBER,
claim_bal NUMBER
)
/

INSERT INTO claim VALUES (1, 100.00, 0)
/
INSERT INTO claim VALUES (2, 200.00, 99.00)
/
INSERT INTO claim VALUES (3, 300.00, 0)
/
INSERT INTO claim VALUES (4, 400.00, 350.00)
/
INSERT INTO claim VALUES (5, 500.00, 0)
/
INSERT INTO claim VALUES (6, 600.00, 295.00)
/
INSERT INTO claim VALUES (7, 700.00, 700.00)
/
INSERT INTO claim VALUES (8, 800.00, 800.00)
/

COMMIT
/


CREATE TABLE payment
(payment_num NUMBER,
claim_num NUMBER,
payment_type CHAR(1),
payment_amt NUMBER
)
/

INSERT INTO payment VALUES (1,1,'P', 100.00)
/
INSERT INTO payment VALUES (2,2,'A', 1.00)
/
INSERT INTO payment VALUES (3,2,'P', 100.00)
/
INSERT INTO payment VALUES (4,3,'A', 50.00)
/
INSERT INTO payment VALUES (5,3,'P', 100.00)
/
INSERT INTO payment VALUES (6,3,'P', 150.00)
/
INSERT INTO payment VALUES (7,4,'P', 50.00)
/
INSERT INTO payment VALUES (8,5,'P', 500.00)
/
INSERT INTO payment VALUES (9,6,'A', 105.00)
/
INSERT INTO payment VALUES (10,6,'P', 100.00)
/
INSERT INTO payment VALUES (11,6,'P', 100.00)
/
INSERT INTO payment VALUES (12,7,'', 100.00)
/

COMMIT
/

Followers

Blog Archive

About Me