Sunday, March 11, 2018

How do I create a Running Sum Query?

SQL is a very powerful query language that is built into Microsoft Access. But one thing SQL does not do very well is refer to a previous row to calculate values. As a result, some things that are simple to do in a spreadsheet are difficult in a query. One of those is creating a running sum.
A running sum adds the value of a field in a record to the value of the same field in the previous record.
Running Sums are fairly easy in an Access Report, but more difficult in a query.  Difficult, but not impossible. This article will show how to create a running sum query.
Running Sum in a Report
Create a running sum in a report with an unbound textbox in the Detail section. This time, however, in the control source, put the name of the field you want the running sum on. And just like the number query, set the Running Sum property to Over All.
A running sum adds the value of a field in a record to the value of the same field in the previous record. image
Figure 1: Shows the property dialog box for a text box on a report that will create a running sum.
Running Sum in Query
But suppose I don't want to do it in a report. Suppose I want to do it directly in a query. There are two different ways to accomplish this. The first uses the Domain Aggregate Function DCount and the second uses a Correlated Subquery.
Both of these methods require a unique column in the table to create the sequence on. This could be the Primary Key field or any field that has a Unique Index. In the following example, the Customers table has two such columns, CustID (Customer ID), which is the primary key, and CustName (Customer Name), which has a unique index.
I’ve already discussed the Domain Function method in a previous article here: Domain Function Example: Running Sum with DSum, so in this article, I’ll address the Sub Query method.
Two Types: OverAll and OverGroup
There are two common types of running sums. The first is a running sum over the entire resultset. see Figure 2 for RunningSumOverAll 
image
Figure 2 illustrates the running sum of the Price field in an Order Details table.
The second is a running sum over a group. See Figure 3 RunningSumOverGroup
image
Figure 3 shows a running sum over a group.
In this case, the group is the OrderID field, and you'll see that the query starts the summing over as the group number value changes.
Correlated Sub Query Method
When used in either the FROM clause or Field List, a subquery must be correlated to the main query. What does that mean? Well, as the name implies, correlated means co-related, that is the records in the subquery must be related to a single record in the main query. In this way, the subquery can be executed for each record in the main query.
Let me take this in small steps.  Suppose I have an Order Details table that looks like this:
image
The Main Query
The table is sorted on OrderDetailID, so the first thing is to create a placeholder in the query:
image
SELECT tblOrderDetails.OrderDetailID,
     tblOrderDetails.OrderID,
     tblOrderDetails.ProductID,
     tblOrderDetails.Price,
     Null AS RunningSumFROM tblOrderDetails;

It is very important that the main query should be sorted on the field the sequence will be created on.  In this case OrderDetailID.
The Sub Query
Now I just need to replace the NULL with the running sum value.  But how? In Excel, I’d reference the cell to the left (Price) and add it to the cell above (RunningSum).  So cell B5 would have a formula of =A5+B4.
image
Figure 5: Method that WILL NOT work in Access.
But tempting as this is, Access can’t reference individual cells.  I need a different method.
Since addition is cumulative, I can sum A1 through A4, in other words, B4 = SUM(A1:A4)
image
Figure 6:This method CAN be translated to Access
This method I can use. 
Running Sum Over All
I can find the value of any individual record (say OrderDetailID=4) with a query like this:
image
Figure 7
or in SQL
SELECT Sum([Price])
FROM tblOrderDetails
WHERE tblOrderDetails.OrderDetailID <= 4

The result of which will be:
image
Figure 8:
Adding that to my main query, I could try something like this:
SELECT tblOrderDetails.OrderDetailID,
     tblOrderDetails.OrderID,
     tblOrderDetails.ProductID,
     tblOrderDetails.Price,
     (SELECT Sum(Price) FROM tblOrderDetails
        WHERE OrderDetailID <= 4)
AS RunningSumFROM tblOrderDetails;
but this will create a couple of problems. 
The first is a syntax issue.  I can’t directly reference the same table in two separate FROM clauses in a single SQL statement.  I need to Alias the table name in the subquery
Select Sum(Price) FROM tblOrderDetails as OD WHERE OD.OrderDetailID <=4
so I’d get:
SELECT tblOrderDetails.OrderDetailID,
     tblOrderDetails.OrderID,
     tblOrderDetails.ProductID,
     tblOrderDetails.Price,
     (SELECT Sum(Price) FROM tblOrderDetails as OD
        WHERE OD.OrderDetailID <= 4)
AS RunningSum
FROM tblOrderDetails;
The second problem is that this will return a $12.00 for every record.
image
Figure 9:

So I need to link the subquery with a particular record in the main query.

Pulling Them Together

The last piece is to reference the OrderDetailID in the main query rather than the hard-coded 4.

SELECT tblOrderDetails.OrderDetailID,
     tblOrderDetails.OrderID,
     tblOrderDetails.ProductID,
     tblOrderDetails.Price,
    (SELECT Sum(Price) FROM tblOrderDetails as OD
        WHERE OD.OrderDetailID <= tblOrderDetails.OrderDetailID)
          
AS RunningSum
FROM tblOrderDetails;
Which gives me this:
image
Figure 10: Results of the completed Running Sum Over All query.
Here's how it works.

For each record in the main query, Access runs the subquery. The subquery returns the number of records where the OrderDetailID in the subquery is less than or equal to the OrderDetailID in that record of the main query.

So in the first record, the OrderDetailID is 1. So the subquery opens the OrderDetails table again and sees that there is only 1 record whose OrderDetailID is less than or equal to 1. So it sums the value of that one field, which simply returns that value ($5.00).

Then it processes the second record. The OrderDetailID of that record is 2, and the subquery sees that there are only 2 records which have an OrderDetailID whose value is less than or equal to 2. So it sums those records ie. $5.00 + (-$2.00) = $3.00.

And so on.
Running Sum Over Group
Instead of creating a running sum over the entire resultset as we did above, you might want to create a running sum for a grouping of records. In the case of the Order Details table above, it might make more sense to give a running sum for each OrderID group.
Creating a running sum over a group of records rather than the whole resultset is a little more complicated, but not much. Again, I can use subquery. But I need to add an additional criterion to the Where condition to include the value that we want to group on.
And again, the only addition to the subquery method is to sum only those values that have the same OrderID. Thus the subquery looks like this:
SELECT Sum(Price) FROM tblOrderDetails as OD
        WHERE OD.OrderDetailID <= tblOrderDetails.OrderDetailID AND
             OD.OrderID=tblOrderDetails.OrderID

And the whole query, like this:
SELECT tblOrderDetails.OrderDetailID,
     tblOrderDetails.OrderID,
     tblOrderDetails.ProductID,
     tblOrderDetails.Price,
    (SELECT Sum(Price) FROM tblOrderDetails as OD
        WHERE OD.OrderDetailID <= tblOrderDetails.OrderDetailID
           AND OD.OrderID=tblOrderDetails.OrderID) AS RunningSum
FROM tblOrderDetails;
The result looks like Figure 11.
image
Figure 11: Result of the Running Sum query over a group. Notice how the running sum resets when the OrderID changes from 1234 to 1235.
You can find a sample called

RunningSumInQuery

which illustrates both Over All and Over Group as well as the corresponding DSum Method (domain aggregate function).










No comments: