Saturday 2 August 2014

IN / NOT IN vs EXISTS / NOT EXISTS vs INTERSECT / EXCEPT vs JOIN (Part 1)

There are a number of ways of filtering your data according what exists or doesn't exist in another table or result set.  These are:

  1. IN and NOT IN
  2. EXISTS and NOT EXISTS
  3. INTERSECT and EXCEPT
  4. LEFT JOIN

In this series I will address the pros and cons of each, and how they handle nulls. I won't be discussing how you use them, but rather what they do and when to use them.

Let's start by prepping the tables that we'll use in the examples:

create table #orderItems ( orderItemID integer identity (1,1), productID integer, productName varchar(100), itemTotal decimal(6,2));
create table #products ( productID integer identity (1,1), productName varchar(100), price decimal(6,2));
insert into #products (productName,price) values ('Big Widget',39.99), ('Small Widget',12.95), ('Widget Multipack',72.95), ('Black Widget',14.99), ('Big Widget',39.99),  --Duplicate ('Super Widget',99.99);
insert into #orderItems (productID,productName,itemTotal) values (1,'Big Widget',35), (2,'Small Widget',12.95), (1,'Big Widget',39.99), (2,'Small Widget',12.95), (1,null,35), (5,'Super Widget',99.99);

The examples below will all be attempting to find the products that have had an order, or products that have not.  Note we have one duplicate product and we have one orderItem that has a null productName.  It's not very normalised, but for the purpose of this exercise we'll be matching on productName.

IN and NOT IN

These essentially just create a long list of AND or OR clauses and will of course return all rows, regardless of duplicates, from the containing query that match (or do not match) the resulting values from the subquery.  Lets look at IN first:

The following two queries are synonymous.  The second query is just to illustrate the logic of how the IN operator works.

select * 
from #products
where productName in (select productName from #orderItems);

select * 
from #products
where productName = 'Big Widget'
or productName = 'Small Widget'
or productName = 'Big Widget'
or productName = 'Small Widget'
or productName = null
or productName = 'Super Widget';  

The IN operator essentially creates a bunch of OR EQUAL TO conditions and a result is returned.  Now let's try NOT IN.

Again, the following two queries are synonymous:

select * 
from #products
where productName not in (select productName from #orderItems)

select * 
from #products
where productName <> 'Big Widget'
and productName <> 'Small Widget'
and productName <> 'Big Widget'
and productName <> 'Small Widget'
and productName <> null
and productName <> 'Super Widget';

Now for a NOT IN, it creates AND NOT EQUAL TO conditions.  Because these are now AND conditions, all of the results have to equate to true.  What happens when you run either of these queries?  What did you expect? Instead of seeing the two products that have no orders as you might expect, there are no results at all.  Can you see why? Remember that a null is an unknown value.  Nothing can equal, or not equal, an unknown.  So "productName <> null" does not return as true and because all results in the subquery set must be true, no result is returned.

Summary:

  • Be wary of nulls when doing a NOT IN.  Know your data if you plan on using NOT IN.
  • All rows will be returned. If you wanted distinct product names you would would have to use something like a distinct statement.
  • You can only match on one column.


Next:  EXISTS and NOT EXISTS

No comments:

Post a Comment

Deploying out of hours shouldn't be a thing

Getting code into production should not be hard. Nor should it have to be scheduled for special times. I have recently been involved with th...