Monday, March 30, 2009
In this series, I have tried to explain in non-mathematical terms what the first three Normal Forms mean and how they determine database design.
This is not the most useful method of learning normalization. In fact, many expert developers never learn the formal definition of the normal forms. If you haven't already, I suggest you read the following series:
However, I think it is useful to know what the Normal Forms are. Sometimes when you get stuck in a design, you can go back to the definitions to get yourself out of trouble.
So, in summary:
First Normal Form (1NF) says that each record must be unique, that is, it has a primary key. There are some additional restrictions on how such uniqueness is maintained, such as no positional referencing and no repeated columns.
Second Normal Form (2NF) says that each field in the record must depend on the whole primary key, not just a part of it.
Third Normal Form (3NF) says that no field may depend on any field other than the primary key.
William Kent, author of A Simple Guide to Five Normal Forms in Relational Database Theory, once abbreviated the first three normal forms like this:
"The Key, the whole Key, and nothing but the Key, so help me Codd."
Thursday, March 26, 2009
by A.D. Tejpal
Sample Db - Explanatory Notes
This sample db demonstrates movement of tab control to any desired location, along with all its controls located on various pages.
The conventional approach to moving a control does not suit a tab control, as the controls contained in it stay put at their current locations. As a result, one can end up with a badly deformed tab control in an effort not to lose any control.
A workaround for ensuring correct movement of tab control, along with all its controls, without resorting to custom built collection or array, is demonstrated in this sample db.
Version: Access 2000 File Format
You can find the sample here: http://www.rogersaccesslibrary.com/forum/forum_posts.asp?TID=448
Wednesday, March 25, 2009
Last time, in The Normal Forms: Second Normal Form, I discussed how to remove redundant data by identifying fields which are not functionally dependent on the entire primary key. Figure 1 shows the results.
Figure 1: Order table Decomposed into Orders and Order Details
This corrected some of the anomalies in my data. However, data anomalies are still possible under 2NF. To prevent them, I need an additional rule: Third Normal Form (3NF).
A table is said to be in Third Normal Form (3NF) if:
- It is in Second Normal Form and
- All non-key fields are mutually independent, that is, all fields are functionally dependent ONLY on the primary key field(s).
There are two main sources of data anomalies that 3NF corrects: 1) Redundant Data and 2) Calculated fields.
Although I removed some of the redundant data when I split the Order table into Orders and OrderDetails, there is still some redundancy left, namely ProductNum and Item. Both of these fields are dependent on the entire primary key, so they comply with 2NF. However, the ProductNum and Item fields are mutually dependent, that is, they depend upon each other. The product number determines the item description and the item description determines the product number.
Just as we saw in 2NF, redundancy can lead to inconsistent data being entered into the database or correct information being changed after the fact. Figure 2 shows some data anomalies possible under 2NF as a result of redundant data.
Figure 2: 2NF Data Anomalies Due to Redundant Data
Product A7S has two different items associated with it: either a wrench or a nail. Which is it?
Also, two product numbers (B7G and B7H) are associated with an Item called "saw". Is this the same saw or not?
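A pair of grouping queries can surface both kinds of conflict. What follows is only a minimal sketch using Python's standard-library sqlite3 module, not the Access tables from this series. The table layout, order numbers, quantities, and the extra C3P row are made up for illustration; only the A7S, B7G, B7H product numbers and their items come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderDetails ("
    " OrderNum INTEGER, ProductNum TEXT, Item TEXT, Quantity INTEGER,"
    " PRIMARY KEY (OrderNum, ProductNum))"
)
# Hypothetical rows: order numbers and quantities are invented.
conn.executemany(
    "INSERT INTO OrderDetails VALUES (?, ?, ?, ?)",
    [
        (112, "A7S", "wrench", 1),
        (113, "A7S", "nail", 50),
        (113, "B7G", "saw", 1),
        (114, "B7H", "saw", 2),
        (114, "C3P", "hammer", 1),
    ],
)

# Product numbers that carry more than one item description.
products_with_many_items = [
    row[0]
    for row in conn.execute(
        "SELECT ProductNum FROM OrderDetails"
        " GROUP BY ProductNum HAVING COUNT(DISTINCT Item) > 1"
        " ORDER BY ProductNum"
    )
]

# Item descriptions that appear under more than one product number.
items_with_many_products = [
    row[0]
    for row in conn.execute(
        "SELECT Item FROM OrderDetails"
        " GROUP BY Item HAVING COUNT(DISTINCT ProductNum) > 1"
        " ORDER BY Item"
    )
]

print(products_with_many_items)  # ['A7S']
print(items_with_many_products)  # ['saw']
```

Queries like these only find the damage after the fact. They cannot say which value is the right one, which is why the design itself has to change.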
Mutual dependency is also an issue with storing calculated values. Suppose I had a Quantity and Price field and I decided to calculate the ExtendedPrice by multiplying them and storing the result in the table. This is a common database error made by novices.
The problem is one of dependency. The Extended Price calculation depends on the Quantity and Price fields for its value. 3NF says that no field should depend on any field except those making up the primary key.
If I store that calculation and later go back and change one of the dependent fields (either the Quantity or the Price), my calculation will be incorrect. Figure 3 shows some calculated values anomalies.
Figure 3: Anomalies with Calculated Values
First of all, if the user is manually calculating and typing in the value of the Extended Price, the value could be anything, even a calculation from a different row. So let's assume I have an automated process, a formula in a form which calculates the value.
The problem is that you must depend on programming to maintain your data integrity, not the database itself. If the integrity is maintained at the database level, it cannot be subverted.
In the case of the table above, the first anomalous record was caused by changing the Quantity from 1 to 2 after the fact. But because I didn't have a process to re-calculate the value if Quantity changed, the Extended Price is now wrong.
In the second case, an Update Query was used to raise the price of nails from $0.09 to $0.10. Unfortunately, the query did not include a new calculation, so all Extended Price calculations for nails are now wrong.
The problem of calculated values is a simple one to solve. Don't. As a general rule, I just don't store calculations. There are minor exceptions, but in most cases, I'll be safe by just leaving them out. When I need these values, I'll calculate them as output in either a query, form, or report.
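Here is a minimal sketch of calculating the value as query output, written in Python with the standard-library sqlite3 module rather than Access. The table layout and the price of 9.50 are assumptions for illustration. The point is that the Extended Price is never stored, so it cannot fall out of step when Quantity changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderDetails ("
    " OrderNum INTEGER, ProductNum TEXT, Quantity INTEGER, Price REAL)"
)
# Hypothetical row: the price is invented for this example.
conn.execute("INSERT INTO OrderDetails VALUES (112, 'A7S', 1, 9.50)")

# ExtendedPrice exists only in the query output, never in the table.
QUERY = (
    "SELECT Quantity, Price, Quantity * Price AS ExtendedPrice"
    " FROM OrderDetails WHERE OrderNum = 112"
)

before = conn.execute(QUERY).fetchone()
# Change the Quantity from 1 to 2 after the fact.
conn.execute("UPDATE OrderDetails SET Quantity = 2 WHERE OrderNum = 112")
after = conn.execute(QUERY).fetchone()

print(before)  # (1, 9.5, 9.5)
print(after)   # (2, 9.5, 19.0)
```

No re-calculation step was needed after the update, because there is no stored value to go stale.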
As with 2NF, the solution to redundant data is to remove it to a separate table, leaving one field to join back to the original. In this case, the ProductNum, Item, and Price fields will go into the Products table. I'll leave ProductNum in the Order Detail table to maintain the relationship. Figure 4 is the result.
Figure 4: Decomposing Order Details to Remove Redundant Data
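To show how this decomposition lets the database itself guard the data, here is a minimal sketch in Python with the standard-library sqlite3 module, not the Access design from the figures. The prices, the Z9Z product number, and the UNIQUE constraint on Item are my assumptions for illustration. The UNIQUE constraint is one way to express that an item description identifies a single product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE Products (
        ProductNum TEXT PRIMARY KEY,
        Item       TEXT NOT NULL UNIQUE,  -- assumption: one product per description
        Price      REAL NOT NULL
    );
    CREATE TABLE OrderDetails (
        OrderNum   INTEGER NOT NULL,
        ProductNum TEXT NOT NULL REFERENCES Products (ProductNum),
        Quantity   INTEGER NOT NULL,
        PRIMARY KEY (OrderNum, ProductNum)
    );
    """
)
# Hypothetical prices and quantities.
conn.execute("INSERT INTO Products VALUES ('A7S', 'wrench', 9.50)")
conn.execute("INSERT INTO Products VALUES ('B7G', 'saw', 12.00)")
conn.execute("INSERT INTO OrderDetails VALUES (112, 'A7S', 1)")


def try_insert(sql):
    """Run one statement and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    # A second description for product A7S.
    try_insert("INSERT INTO Products VALUES ('A7S', 'nail', 0.10)"),
    # A second product number for the item 'saw'.
    try_insert("INSERT INTO Products VALUES ('B7H', 'saw', 12.00)"),
    # An order line for a product that does not exist.
    try_insert("INSERT INTO OrderDetails VALUES (112, 'Z9Z', 3)"),
    # A valid order line.
    try_insert("INSERT INTO OrderDetails VALUES (112, 'B7G', 2)"),
]
print(results)  # ['rejected', 'rejected', 'rejected', 'accepted']
```

Each product is now described in exactly one row, so the wrench-or-nail and which-saw questions cannot come up again.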
So now I've removed as much redundant data as possible. There's still a little left. There always will be in order to maintain the relationships between tables. But none of the redundancy will result in data anomalies, so I can say with confidence that my tables are now normalized to Third Normal Form. Figure 5 shows the final design.
Figure 5: Final Design
In my next and final post, The Normal Forms: In A Nutshell, I'll wrap it all up.
Thursday, March 19, 2009
by A.D. Tejpal
Sample Add-In - Explanatory Notes
General: This Add-In is meant to utilize a set of common reports belonging to the Add-In, in various db's. Queries in host db are used as record source for such reports. On activation of the Add-In, names of reports belonging to the Add-In are shown in a list box at left, while names of queries available in host db are displayed in list box at right.
For a given report selected in left list box, the user can select an appropriate query in right list box. The selected report will open, using selected query as its record source.
Version: Access 2000 File format
Reference Required: DAO 3.6
You can find the sample here: http://www.rogersaccesslibrary.com/forum/forum_posts.asp?TID=447
Wednesday, March 18, 2009
Last time, in The Normal Forms: First Normal Form, I discussed the rules for the basic arrangement of data in a table. If you don't follow those rules, called the First Normal Form (1NF), you don't even have a table. But even if a table is normalized to 1NF, that doesn't mean it's perfect. Figure 1 shows a table normalized to 1NF.
Figure 1: Order Table - 1NF
The problem here is the danger of introducing errors, called data anomalies, into the table. Data anomalies can be introduced by operator error or through programming. Once you have a single data anomaly in your table, all of your data is suspect, so the remaining normal forms work to remove such data anomalies. Figure 2 shows the same table with data anomalies present.
Figure 2: Order Table with Data Anomalies Present
As you can see, Order 112 has two different customer numbers (444 and 445). Which is correct? It is impossible to tell. In addition, both product numbers B7G and B7H are identified as a 'saw'. Are these the same product with different product numbers or different products with the same description? Again, I can't know based on the data in the database.
The root cause of these data anomalies is redundant data, that is, data that is repeated in multiple rows. So we need to minimize this redundant data as much as possible.
Now wait a second! Didn't I just say in the last post that I HAD to repeat the values? Yes I did. But that was to comply with 1NF, which is not the end of the story.
So let's look at the definition of Second Normal Form (2NF). A table is said to be in 2NF if:
- It is in 1NF.
- Every field is functionally dependent on the entire primary key, that is, it depends on the entire primary key for its value.
Before I can continue, I have to talk a bit about functional dependencies, because all of the remaining normal forms rely on this concept. Functional dependency speaks to the relationship that fields in a table have to each other. It is perhaps best explained by example.
Suppose there is an Employee table, and I am an entity in that table. There is a row that represents all the information about Roger Carlson with Social Security Number (SSN) acting as the primary key. Since all the fields in my row are information about me, and I am represented by the SSN, we can say that each field depends on SSN for its value. Another way to say it is that SSN implies the value of each of the other fields in my record.
If a different row is selected, with a different SSN, the values of all the other fields will change to represent that entity.
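The idea can be tested mechanically: one set of fields determines another field if the same key value never shows up with two different values for that field. Below is a minimal sketch in plain Python. The determines function, the field names, and the employee rows are all hypothetical, made up just to illustrate the check.

```python
def determines(rows, key_fields, field):
    """Return True if every combination of key_fields maps to one value of field."""
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in key_fields)
        # Remember the first value seen for this key; any later mismatch
        # means the field does not depend on the key alone.
        if seen.setdefault(key, row[field]) != row[field]:
            return False
    return True


# Hypothetical employee rows; the SSNs, names, and departments are invented.
employees = [
    {"SSN": "111-11-1111", "LastName": "Carlson", "Dept": "IT"},
    {"SSN": "222-22-2222", "LastName": "Smith", "Dept": "IT"},
    {"SSN": "333-33-3333", "LastName": "Smith", "Dept": "Sales"},
]

print(determines(employees, ["SSN"], "LastName"))   # True
print(determines(employees, ["SSN"], "Dept"))       # True
print(determines(employees, ["LastName"], "Dept"))  # False
```

A check like this can only disprove a dependency from sample data. Whether a dependency really holds is a fact about the business, not about the rows that happen to be in the table.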
Second Normal Form says that all of the fields in a table must depend on the ENTIRE primary key. When the primary key is a single field (like SSN), it is pretty simple. Each field must be a fact about the record. But when there is a compound primary key, it's possible that some fields may depend on just part of the primary key.
Going back to our Order Table example, Figure 3 shows these partial dependencies.
Figure 3: 1NF Orders Table showing dependencies
In order to uniquely identify the record, the primary key of this table is a combination of OrderNum and ProductNum (or Item, but a number is a better choice).
2NF says that each field must depend on the ENTIRE primary key. This is true for some fields: Quantity depends on both OrderNum and ProductNum, and so does Item. However, some fields do not.
Order 112 will be for customer 444 regardless of the product selected. The order date does not change when the product number changes either. These fields depend ONLY on the OrderNum field.
Since some fields do not depend on the entire primary key, it is not in Second Normal Form. So what do I do about it?
The solution is to remove those fields that do not depend on the entire primary key to a separate table where they do. In the process, I remove the redundant or repeated data so there is just a single record for each. Figure 4 shows the process of decomposing the table into two tables.
Figure 4: 1NF Orders table Decomposed to 2NF
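The decomposition can be sketched as two queries against the original table. This is a minimal illustration in Python with the standard-library sqlite3 module, not the Access process behind the figure. The names OrderTable, CustomerNum, and OrderDate, along with the dates and quantities, are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderTable ("
    " OrderNum INTEGER, CustomerNum INTEGER, OrderDate TEXT,"
    " ProductNum TEXT, Item TEXT, Quantity INTEGER,"
    " PRIMARY KEY (OrderNum, ProductNum))"
)
# Hypothetical 1NF rows: customer and date repeat on every line of an order.
conn.executemany(
    "INSERT INTO OrderTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (112, 444, "2009-03-02", "A7S", "wrench", 1),
        (112, 444, "2009-03-02", "B7G", "saw", 2),
        (113, 445, "2009-03-03", "A7S", "wrench", 4),
    ],
)

conn.executescript(
    """
    -- Fields that depend only on OrderNum move to their own table.
    CREATE TABLE Orders (
        OrderNum INTEGER PRIMARY KEY, CustomerNum INTEGER, OrderDate TEXT
    );
    INSERT INTO Orders
        SELECT DISTINCT OrderNum, CustomerNum, OrderDate FROM OrderTable;

    -- Fields that depend on the whole key stay with the order line.
    CREATE TABLE OrderDetails (
        OrderNum INTEGER REFERENCES Orders (OrderNum),
        ProductNum TEXT, Item TEXT, Quantity INTEGER,
        PRIMARY KEY (OrderNum, ProductNum)
    );
    INSERT INTO OrderDetails
        SELECT OrderNum, ProductNum, Item, Quantity FROM OrderTable;
    """
)

orders = conn.execute("SELECT * FROM Orders ORDER BY OrderNum").fetchall()
detail_count = conn.execute("SELECT COUNT(*) FROM OrderDetails").fetchone()[0]
print(orders)        # [(112, 444, '2009-03-02'), (113, 445, '2009-03-03')]
print(detail_count)  # 3
```

If the original rows already disagreed about the customer for an order, the SELECT DISTINCT would return two rows for that OrderNum and the primary key on Orders would reject the insert, so the conflict has to be resolved before the split.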
This corrects the data anomaly with the two different customers for the same order. However, I still have the problem of the product number and the item description. It's still possible for the same product to have different descriptions or different items sharing the same ProductNum, as Figure 5 illustrates.
Figure 5: Remaining data anomalies.
Product A7S is either a wrench or a nail, and a saw is either product number B7G or B7H.
To correct these problems, I need to add yet another normal form: Third Normal Form. I'll talk about that next.
Tuesday, March 17, 2009
by A.D. Tejpal
Sample Db - Explanatory Notes
This sample db demonstrates Undo / Redo features while entering / editing data through an unbound form. Use of collections for storing control and field values respectively, facilitates the process.
Adding a new record:
Clicking the Add command button presents a simulated new record in distinct back color (indicative of new record). As soon as it is made dirty, Save and Undo command buttons become enabled. Clicking Save adds the record to source table and back color changes confirming the process. If, instead of Save, Undo is clicked, the existing entries get ignored, and fresh blank new record is presented.
Editing an existing record:
Based upon selection in combo box, the controls get populated with the contents of desired record, with a back color distinct from that for new record. As soon as the record is made dirty (by making change in any of the controls), Save and Undo buttons get enabled.
Clicking the Undo button restores the original values and simultaneously, Redo button gets enabled (accompanied by disabling of Undo button). Clicking the Redo button restores the values back to recently edited ones. Simultaneously, Undo button gets enabled (accompanied by disabling of Redo button). The user can shuttle between the Undo & Redo buttons as often as desired. Different back colors confirm Undo / Redo actions.
Clicking Save button saves the record as edited.
Clicking the Delete button deletes the selected record. At the same time, autonumber for source table gets reset suitably so that next one will get generated in sequence, without leaving any holes.
Version: Access 2000 File Format
You can find the sample here: http://www.rogersaccesslibrary.com/forum/forum_posts.asp?TID=446