Monday, June 4, 2012

Count Distinct In Access: Part 4

Crosstab Method (Reader Submitted)

Note: In the comments section of Part 3: User-Defined Function, Patrick mentioned another method that uses a Crosstab Query.  Even though I didn't develop it, I'm including it in this series for completeness.  So, thank you, Patrick

SQL Server has a nice built-in function called COUNT DISTINCT, which is missing in Access SQL.

What does COUNT DISTINCT do?  Well, there are times when you want to count distinct values in a query, that is, a count of values without duplicates.  For instance, given the following table, how many distinct customers have orders?

ORDERS
OrderIDOrderDateCustomerAmount
11/17/2008Ajax Inc.$310.00
21/17/2008Ajax Inc.$510.50
31/17/2008Ajax Inc.$311.00
41/17/2008Baker Corp.$5,144.00
51/17/2008Baker Corp.$61.00
61/17/2008Baker Corp.$110.50
71/17/2008Baker Corp.$11.00
81/17/2008Crystal & Co.$111.85
91/17/2008Crystal & Co.$511.00
101/18/2008Baker Corp.$711.95
111/18/2008Baker Corp.$810.00
121/18/2008Baker Corp.$310.59
131/18/2008Crystal & Co.$311.00
141/18/2008Crystal & Co.$811.50
151/18/2008Ajax Inc.$512.00
161/18/2008D&D LLC$211.00
171/18/2008D&D LLC$3,311.50
181/19/2008Ajax Inc.$410.00
191/19/2008Ajax Inc.$610.50
201/19/2008Baker Corp.$4,411.00
211/19/2008Baker Corp.$511.50
221/19/2008Baker Corp.$611.50

In SQL Server, I can do this:

SELECT COUNT(DISTINCT Customer) AS CountOfCustomer FROM Orders

Which will give me the following:

CountOfCustomer
4

In Access, if I use the Distinct predicate with the count:

SELECT DISTINCT Count(Customer) AS CountOfCustomer FROM Orders;

I get:

CountOfCustomer
22

Since Access SQL does not have the Count Distinct function, what can I do?
There are actually four different methods for simulating the Count Distinct:
  1. Subqueries the FROM Clause
  2. Subqueries in the Field List
  3. User-Defined Function
  4. Crosstab Query (this post)
Each of these methods have advantages and disadvantages, and I'll address each in turn. 

 Crosstab Query Method

A crosstab query presents aggregated data in an easy-to-understand grid. 
To create a simple Count Distinct as above, use the following SQL:

TRANSFORM Count(*) AS Cell
SELECT Count(cell) AS CountOfCustomer
FROM Orders
GROUP BY "Anything"
PIVOT Customer In (null);

To get the following result:

CountOfCustomer <>
4

More Complex Queries

The Crosstab can also be modified to produce a customer count grouped by the OrderDate:

TRANSFORM Count(*) AS Cell
SELECT OrderDate, Count(cell) AS CountOfCustomer
FROM Orders
GROUP BY OrderDate
PIVOT Customer In (null);


OrderDate CountOfCustomer <>
1/17/2008 3
1/18/2008 4
1/19/2008 2

Problem: Additional Aggregation

However, in the other methods I've discussed, I was able to add additional aggregates to the query like this:

OrderDate CountOfCustomer CountOfOrderID SumOfAmount
1/17/2008 3 9 $7,080.85
1/18/2008 4 8 $6,989.54
1/19/2008 2 5 $6,554.50

I can't figure out how to do that with the Crosstab query, so I'd say this method, although fast, is somewhat limited in terms of flexibility. 

Patrick insists this method has MUCH better performance than the UDF method.  Next time, I'll look at each of the methods and discuss their pros and cons, so we'll see.

2 comments:

Anonymous said...

To add parameters onto a crosstab query, you need to create 2 separate crosstab queries with the same row labels, and then run a 3rd query to combine the columns from the 2 crosstabs.

Anonymous said...

Fantastic! I am querying unique item codes in from list of 300,000 line items in 27 warehouses to get the unique item code count per warehouse as the item code can be repeated many times per warehouse....Works just perfect as I don't need to aggregate up to any higher group level. so a unique count by warehouse is all I needed. I had been trying for ages to do this!! Thank you