Friday, August 10, 2007

Outer Joins, Nulls and Filters

In honor of "Mr. Null" Lex de Haan. A dear friend who is sorely missed by all those touched by his intelligence and humanity. He would have been 53 on Saturday August 11th.

Recently a developer needed assistance on a query with an outer join and a filter on a column in the outer table (the one where nulls are returned). The issue was that the data being returned did not match what was expected. We looked at the data, ran a couple of test cases and validated that the data returned was correct.

This led me to think about nulls, outer joins and filters. This is not a common request, but it can be a head scratcher. If the filter column is NOT NULL, it is pretty straight forward...but what happens when the filter column can be NULL? This makes it hard to distinguish between a NULL value being returned from the table and a NULL value being returned as the result of the outer join.

Let's set up a test case. The business requirement is to return all claims with payment information, if it exists. Any record in the PAYMENT table that is not a payment (PAYMENT_TYPE = 'P') is to be excluded. The PAYMENT_TYPE column is NULLable and NULL PAYMENT_TYPES should be excluded.

Here are the tables and data. The DDL calls to create/populate the tables is at the end of the blog entry.


SELECT * FROM claim
/
CLAIM_NUM CLAIM_AMT CLAIM_BAL
---------- ---------- ----------
1 100 0
2 200 99
3 300 0
4 400 350
5 500 0
6 600 295
7 700 700
8 800 800

8 rows selected.

SELECT * FROM payment
/
PAYMENT_NUM CLAIM_NUM P PAYMENT_AMT
----------- ---------- - -----------
1 1 P 100
2 2 A 1
3 2 P 100
4 3 A 50
5 3 P 100
6 3 P 150
7 4 P 50
8 5 P 500
9 6 A 105
10 6 P 100
11 6 P 100
12 7 100

12 rows selected.


Let's do a simple join on the tables.


SELECT c.claim_num, c.claim_amt, c.claim_bal,
p.payment_num, p.claim_num, p.payment_type, p.payment_amt
FROM claim c,
payment p
WHERE c.claim_num = p.claim_num
/


CLAIM_NUM CLAIM_AMT CLAIM_BAL PAYMENT_NUM CLAIM_NUM P PAYMENT_AMT
---------- ---------- ---------- ----------- ---------- - -----------
1 100 0 1 1 P 100
2 200 99 2 2 A 1
2 200 99 3 2 P 100
3 300 0 4 3 A 50
3 300 0 5 3 P 100
3 300 0 6 3 P 150
4 400 350 7 4 P 50
5 500 0 8 5 P 500
6 600 295 9 6 A 105
6 600 295 10 6 P 100
6 600 295 11 6 P 100
7 700 700 12 7 100

12 rows selected.

There are three problems with this result set.
1) CLAIM_NUM 8 is not included as it does not have a related record in the payment table.
2) CLAIM_NUM 7 is included even though the PAYMENT_TYPE is NULL.
3) PAYMENT_TYPEs of A (for adjustment) are included.

Adding in the outer join to return NULL values for CLAIMs without any PAYMENTs resolves the first problem.


SELECT c.claim_num, c.claim_amt, c.claim_bal,
p.payment_num, p.claim_num, p.payment_type, p.payment_amt
FROM claim c,
payment p
WHERE c.claim_num = p.claim_num (+)
/
CLAIM_NUM CLAIM_AMT CLAIM_BAL PAYMENT_NUM CLAIM_NUM P PAYMENT_AMT
---------- ---------- ---------- ----------- ---------- - -----------
1 100 0 1 1 P 100
2 200 99 2 2 A 1
2 200 99 3 2 P 100
3 300 0 4 3 A 50
3 300 0 5 3 P 100
3 300 0 6 3 P 150
4 400 350 7 4 P 50
5 500 0 8 5 P 500
6 600 295 9 6 A 105
6 600 295 10 6 P 100
6 600 295 11 6 P 100
7 700 700 12 7 100
8 800 800

13 rows selected.


The query is still returning too many rows as we are not filtering based on PAYMENT_TYPE of 'P'. This is where it begins to get complicated.


SELECT c.claim_num, c.claim_amt, c.claim_bal,
p.payment_num, p.claim_num, p.payment_type, p.payment_amt
FROM claim c,
payment p
WHERE c.claim_num = p.claim_num (+)
AND p.payment_type = 'P'
/
CLAIM_NUM CLAIM_AMT CLAIM_BAL PAYMENT_NUM CLAIM_NUM P PAYMENT_AMT
---------- ---------- ---------- ----------- ---------- - -----------
1 100 0 1 1 P 100
2 200 99 3 2 P 100
3 300 0 6 3 P 150
3 300 0 5 3 P 100
4 400 350 7 4 P 50
5 500 0 8 5 P 500
6 600 295 11 6 P 100
6 600 295 10 6 P 100

8 rows selected.


With the filter condition added, the non-P PAYMENT_TYPEs have been excluded, but so has the CLAIM_NUM of 8. As the PAYMENT_TYPE will be NULL for the outer join, changing the filter condition to use NVL should resolve the issue.



SELECT c.claim_num, c.claim_amt, c.claim_bal,
p.payment_num, p.claim_num, p.payment_type, p.payment_amt
FROM claim c,
payment p
WHERE c.claim_num = p.claim_num (+)
AND NVL(p.payment_type, 'P') = 'P'
/

CLAIM_NUM CLAIM_AMT CLAIM_BAL PAYMENT_NUM CLAIM_NUM P PAYMENT_AMT
---------- ---------- ---------- ----------- ---------- - -----------
1 100 0 1 1 P 100
2 200 99 3 2 P 100
3 300 0 5 3 P 100
3 300 0 6 3 P 150
4 400 350 7 4 P 50
5 500 0 8 5 P 500
6 600 295 10 6 P 100
6 600 295 11 6 P 100
7 700 700 12 7 100
8 800 800

10 rows selected.


1 step forward, CLAIM_NUM 8 is back.
1 step backward, PAYMENT_NUM 12 for CLAIM_NUM 7 is back.

The underlying issue is that the query does not differentiate between NULL PAYMENT_TYPEs in the PAYMENT table and NULL PAYMENT records being 'returned' because of the outer join. It is important to understand that if the filter column on the outer table is NOT NULL, you don't have to deal with this scenario. If the filter column contains NULLs, you need to use a CASE or DECODE to properly filter out the outer table NULLs. In this case, PAYMENT_NUM does not contain any NULL values, so it can be used.



SELECT c.claim_num, c.claim_amt, c.claim_bal,
p.payment_num, p.claim_num, p.payment_type, p.payment_amt
FROM claim c,
payment p
WHERE c.claim_num = p.claim_num (+)
AND (CASE WHEN p.payment_num IS NULL THEN 'P'
ELSE p.payment_type
END
) = 'P'
/

CLAIM_NUM CLAIM_AMT CLAIM_BAL PAYMENT_NUM CLAIM_NUM P PAYMENT_AMT
---------- ---------- ---------- ----------- ---------- - -----------
1 100 0 1 1 P 100
2 200 99 3 2 P 100
3 300 0 5 3 P 100
3 300 0 6 3 P 150
4 400 350 7 4 P 50
5 500 0 8 5 P 500
6 600 295 10 6 P 100
6 600 295 11 6 P 100
8 800 800

9 rows selected.


Now we have the data set we want.

And here's the DDL...


CREATE TABLE claim
(claim_num NUMBER,
claim_amt NUMBER,
claim_bal NUMBER
)
/

INSERT INTO claim VALUES (1, 100.00, 0)
/
INSERT INTO claim VALUES (2, 200.00, 99.00)
/
INSERT INTO claim VALUES (3, 300.00, 0)
/
INSERT INTO claim VALUES (4, 400.00, 350.00)
/
INSERT INTO claim VALUES (5, 500.00, 0)
/
INSERT INTO claim VALUES (6, 600.00, 295.00)
/
INSERT INTO claim VALUES (7, 700.00, 700.00)
/
INSERT INTO claim VALUES (8, 800.00, 800.00)
/

COMMIT
/


CREATE TABLE payment
(payment_num NUMBER,
claim_num NUMBER,
payment_type CHAR(1),
payment_amt NUMBER
)
/

INSERT INTO payment VALUES (1,1,'P', 100.00)
/
INSERT INTO payment VALUES (2,2,'A', 1.00)
/
INSERT INTO payment VALUES (3,2,'P', 100.00)
/
INSERT INTO payment VALUES (4,3,'A', 50.00)
/
INSERT INTO payment VALUES (5,3,'P', 100.00)
/
INSERT INTO payment VALUES (6,3,'P', 150.00)
/
INSERT INTO payment VALUES (7,4,'P', 50.00)
/
INSERT INTO payment VALUES (8,5,'P', 500.00)
/
INSERT INTO payment VALUES (9,6,'A', 105.00)
/
INSERT INTO payment VALUES (10,6,'P', 100.00)
/
INSERT INTO payment VALUES (11,6,'P', 100.00)
/
INSERT INTO payment VALUES (12,7,'', 100.00)
/

COMMIT
/

1 comments:

dzothang said...

Hello,
Thank for your sharing.
I have a problem
when c.claim_num = null and p.claim_num = null and I want to get all, what can I do?
for example:
where (c.claim_num is null or
c.claim_num = p.claim_num(+))
please help!
thank alot

Followers

Blog Archive

About Me