ACC: Database Normalization Basics

This article has been archived. It is offered "as is" and will no longer be updated.
Novice: Requires knowledge of the user interface on single-user computers.

This article explains the basics of database normalization terminology. A basic understanding of this terminology is helpful when discussing the design of a relational database.

NOTE: Microsoft also offers a WebCast that discusses the basics of database normalization. To view this WebCast, please visit the following Microsoft Web site:

NOTE: To see this information for Microsoft Access 2000, please see the following article in the Microsoft Knowledge Base:
209534 ACC2000: Database Normalization Basics
More information

Description of Normalization

Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating two factors: redundancy and inconsistent dependency.

Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations. A customer address change is much easier to implement if that data is stored only in the Customers table and nowhere else in the database.

What is an "inconsistent dependency"? While it is intuitive for a user to look in the Customers table for the address of a particular customer, it may not make sense to look there for the salary of the employee who calls on that customer. The employee's salary is related to, or dependent on, the employee and thus should be moved to the Employees table. Inconsistent dependencies can make data difficult to access; the path to find the data may be missing or broken.

There are a few rules for database normalization. Each rule is called a "normal form." If the first rule is observed, the database is said to be in "first normal form." If the first three rules are observed, the database is considered to be in "third normal form." Although other levels of normalization are possible, third normal form is considered the highest level necessary for most applications.

As with many formal rules and specifications, real-world scenarios do not always allow for perfect compliance. In general, normalization requires additional tables, and some customers find this cumbersome. If you decide to violate one of the first three rules of normalization, make sure that your application anticipates any problems that could occur, such as redundant data and inconsistent dependencies.

NOTE: The following descriptions include examples.

First Normal Form

  • Eliminate repeating groups in individual tables.
  • Create a separate table for each set of related data.
  • Identify each set of related data with a primary key.

Do not use multiple fields in a single table to store similar data. For example, to track an inventory item that may come from two possible sources, an inventory record may contain fields for Vendor Code 1 and Vendor Code 2.

But what happens when you add a third vendor? Adding a field is not the answer; it requires program and table modifications and does not smoothly accommodate a dynamic number of vendors. Instead, place all vendor information in a separate table called Vendors, then link inventory to vendors with an item number key, or vendors to inventory with a vendor code key.
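
One possible way to sketch this design is with Python's built-in sqlite3 module. The table and column names below (Inventory, Vendors, ItemVendors, item_number, vendor_code) and the sample rows are assumptions made for illustration, and the separate link table is only one reading of "link inventory to vendors"; the article does not prescribe a schema.

```python
import sqlite3

# Illustrative sketch only: table names, column names, and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Inventory (
        item_number INTEGER PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE Vendors (
        vendor_code TEXT PRIMARY KEY,
        vendor_name TEXT NOT NULL
    );
    -- One row per (item, vendor) pair replaces the Vendor Code 1 / Vendor Code 2 fields.
    CREATE TABLE ItemVendors (
        item_number INTEGER NOT NULL REFERENCES Inventory(item_number),
        vendor_code TEXT NOT NULL REFERENCES Vendors(vendor_code),
        PRIMARY KEY (item_number, vendor_code)
    );
    """
)

conn.execute("INSERT INTO Inventory VALUES (1, 'Widget')")
conn.executemany(
    "INSERT INTO Vendors VALUES (?, ?)",
    [("V1", "Acme"), ("V2", "Bolt"), ("V3", "Cog")],
)
# Adding a third vendor is just another row; no table or program change is needed.
conn.executemany(
    "INSERT INTO ItemVendors VALUES (?, ?)",
    [(1, "V1"), (1, "V2"), (1, "V3")],
)

vendors_for_item = [
    row[0]
    for row in conn.execute(
        "SELECT vendor_code FROM ItemVendors WHERE item_number = ? ORDER BY vendor_code",
        (1,),
    )
]
```

With this layout, the number of vendors per item is limited only by the number of rows, not by the number of fields in the inventory record.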

Second Normal Form

  • Create separate tables for sets of values that apply to multiple records.
  • Relate these tables with a foreign key.

Records should not depend on anything other than a table's primary key (a compound key, if necessary). For example, consider a customer's address in an accounting system. The address is needed by the Customers table, but also by the Orders, Shipping, Invoices, Accounts Receivable, and Collections tables. Instead of storing the customer's address as a separate entry in each of these tables, store it in one place, either in the Customers table or in a separate Addresses table.
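
The following is a minimal sketch of that idea, assuming a simplified schema in which only Customers and Orders are modeled. The column names (customer_id, order_id, address) and the sample data are hypothetical. It shows that when the address lives only in Customers, a single UPDATE is reflected everywhere the address is looked up through the foreign key.

```python
import sqlite3

# Illustrative sketch only: schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL
    );
    -- Orders stores only the foreign key, never a copy of the address.
    CREATE TABLE Orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES Customers(customer_id)
    );
    """
)

conn.execute("INSERT INTO Customers VALUES (1, 'Contoso', '1 Old Street')")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(100, 1), (101, 1)])

# The address changes in exactly one place.
conn.execute(
    "UPDATE Customers SET address = ? WHERE customer_id = ?",
    ("9 New Road", 1),
)

# Every order sees the new address through the join.
order_addresses = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.address
        FROM Orders AS o
        JOIN Customers AS c ON c.customer_id = o.customer_id
        ORDER BY o.order_id
        """
    )
]
```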

Third Normal Form

  • Eliminate fields that do not depend on the key.

Values in a record that are not part of that record's key do not belong in the table. In general, any time the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table.

For example, in an Employee Recruitment table, a candidate's university name and address may be included. But you need a complete list of universities for group mailings. If university information is stored in the Candidates table, there is no way to list universities with no current candidates. Create a separate Universities table and link it to the Candidates table with a university code key.
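
As a rough sketch of the Universities example, again using sqlite3 with assumed column names (university_code, candidate_id) and made-up sample rows: once universities have their own table, the full mailing list is available even for universities that currently have no candidates.

```python
import sqlite3

# Illustrative sketch only: schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Universities (
        university_code TEXT PRIMARY KEY,
        name            TEXT NOT NULL,
        address         TEXT NOT NULL
    );
    CREATE TABLE Candidates (
        candidate_id    INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        university_code TEXT REFERENCES Universities(university_code)
    );
    """
)

conn.executemany(
    "INSERT INTO Universities VALUES (?, ?, ?)",
    [
        ("U1", "State University", "1 Campus Way"),
        ("U2", "Tech Institute", "2 College Ave"),
    ],
)
conn.execute("INSERT INTO Candidates VALUES (1, 'Pat', 'U1')")

# Complete mailing list, independent of who is currently a candidate.
mailing_list = [
    row[0]
    for row in conn.execute("SELECT name FROM Universities ORDER BY university_code")
]

# Universities with no current candidates are still reachable.
without_candidates = [
    row[0]
    for row in conn.execute(
        """
        SELECT u.name
        FROM Universities AS u
        LEFT JOIN Candidates AS c ON c.university_code = u.university_code
        WHERE c.candidate_id IS NULL
        ORDER BY u.university_code
        """
    )
]
```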

EXCEPTION: Adhering to the third normal form, while theoretically desirable, is not always practical. If you have a Customers table and you want to eliminate all possible interfield dependencies, you must create separate tables for cities, ZIP codes, sales representatives, customer classes, and any other factor that may be duplicated in multiple records. In theory, normalization is worth pursuing; however, many small tables may degrade performance or exceed open file and memory capacities.

It may be more feasible to apply third normal form only to data that changes frequently. If some dependent fields remain, design your application to require the user to verify all related fields when any one is changed.

Other Normalization Forms

Boyce-Codd normal form (BCNF), which is a stricter variant of third normal form, as well as fourth and fifth normal forms do exist, but are rarely considered in practical design. Disregarding these rules may result in less than perfect database design, but should not affect functionality.

Examples of Normalized Tables

Normalization Examples:

Unnormalized table:

    Student#   Advisor   Adv-Room   Class1   Class2   Class3
    -------------------------------------------------------
    1022       Jones     412        101-07   143-01   159-02
    4123       Smith     216        201-01   211-02   214-01

  1. First Normal Form: NO REPEATING GROUPS

    Tables should have only two dimensions. Since one student has several classes, these classes should be listed in a separate table. Fields Class1, Class2, & Class3 in the above record are indications of design trouble.

    Spreadsheets often use the third dimension, but tables should not. Another way to look at this problem: with a one-to-many relationship, do not put the one side and the many side in the same table. Instead, create another table in first normal form by eliminating the repeating group (Class#), as shown below:

        Student#   Advisor   Adv-Room   Class#
        ---------------------------------------
        1022       Jones     412        101-07
        1022       Jones     412        143-01
        1022       Jones     412        159-02
        4123       Smith     216        201-01
        4123       Smith     216        211-02
        4123       Smith     216        214-01
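
The step from the unnormalized table to first normal form can be sketched in plain Python. The function name to_first_normal_form and the use of dictionaries and tuples to stand in for table rows are assumptions made for illustration; the data values are those in the tables above.

```python
# Illustrative sketch: rows are modeled as Python dicts and tuples, not real tables.
unnormalized = [
    {"Student#": 1022, "Advisor": "Jones", "Adv-Room": 412,
     "Class1": "101-07", "Class2": "143-01", "Class3": "159-02"},
    {"Student#": 4123, "Advisor": "Smith", "Adv-Room": 216,
     "Class1": "201-01", "Class2": "211-02", "Class3": "214-01"},
]

REPEATING_GROUP = ("Class1", "Class2", "Class3")


def to_first_normal_form(rows):
    """Return one (Student#, Advisor, Adv-Room, Class#) row per class."""
    result = []
    for row in rows:
        for column in REPEATING_GROUP:
            class_number = row.get(column)
            if class_number:  # skip empty class slots
                result.append(
                    (row["Student#"], row["Advisor"], row["Adv-Room"], class_number)
                )
    return result


first_nf = to_first_normal_form(unnormalized)
for record in first_nf:
    print(record)
```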

  2. Second Normal Form: ELIMINATE REDUNDANT DATA

    Note the multiple Class# values for each Student# value in the above table. Class# is not functionally dependent on Student# (primary key), so this relationship is not in second normal form.

    The following two tables demonstrate second normal form:

    Students:

        Student#   Advisor   Adv-Room
        ------------------------------
        1022       Jones     412
        4123       Smith     216

    Registration:

        Student#   Class#
        ------------------
        1022       101-07
        1022       143-01
        1022       159-02
        4123       201-01
        4123       211-02
        4123       214-01
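
Continuing in the same spirit, the split into Students and Registration might be sketched as follows. The function name to_second_normal_form and the dictionary and list representations are hypothetical; the input rows repeat the first normal form table shown earlier.

```python
# Illustrative sketch: the first-normal-form rows from the earlier table.
first_nf = [
    (1022, "Jones", 412, "101-07"),
    (1022, "Jones", 412, "143-01"),
    (1022, "Jones", 412, "159-02"),
    (4123, "Smith", 216, "201-01"),
    (4123, "Smith", 216, "211-02"),
    (4123, "Smith", 216, "214-01"),
]


def to_second_normal_form(rows):
    """Split rows into Students (keyed by Student#) and Registration pairs."""
    students = {}       # Student# -> (Advisor, Adv-Room), stored once per student
    registration = []   # (Student#, Class#) pairs
    for student, advisor, room, class_number in rows:
        students[student] = (advisor, room)
        registration.append((student, class_number))
    return students, registration


students, registration = to_second_normal_form(first_nf)
print(students)
print(registration)
```

The advisor and room now appear once per student instead of once per class, which is the redundancy the second table set removes.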

  3. Third Normal Form: ELIMINATE DATA NOT DEPENDENT ON KEY

    In the last example, Adv-Room (the advisor's office number) is functionally dependent on the Advisor attribute. The solution is to move that attribute from the Students table to the Faculty table, as shown below:

    Students:

        Student#   Advisor
        -------------------
        1022       Jones
        4123       Smith

    Faculty:

        Name    Room   Dept
        --------------------
        Jones   412    42
        Smith   216    42

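
A comparable sketch for this last step moves the room out of Students and into a Faculty mapping keyed by advisor name. The function name to_third_normal_form is hypothetical, and the Dept column is left out of the sketch because it cannot be derived from the earlier tables.

```python
# Illustrative sketch: the Students table from the second-normal-form step.
students_2nf = {1022: ("Jones", 412), 4123: ("Smith", 216)}


def to_third_normal_form(students):
    """Split Students into Student# -> Advisor and Advisor -> Room."""
    student_advisor = {}
    faculty_room = {}
    for student, (advisor, room) in students.items():
        student_advisor[student] = advisor
        # Room depends on the advisor, so store it once per advisor and
        # reject input that gives one advisor two different rooms.
        if faculty_room.setdefault(advisor, room) != room:
            raise ValueError(f"conflicting rooms for advisor {advisor!r}")
    return student_advisor, faculty_room


student_advisor, faculty_room = to_third_normal_form(students_2nf)
print(student_advisor)
print(faculty_room)
```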
For additional information about designing a database, see the following article in the Microsoft Knowledge Base:
234208 ACC2000: "Understanding Relational Database Design" Document Available in Download Center
"FoxPro 2 A Developer's Guide," Hamilton M. Ahlo Jr. et al., pages 220-225, M & T Books, 1991

"Using Access for Windows," Roger Jennings, pages 799-800, Que Corporation, 1993

Article ID: 100139 - Last Review: 12/04/2015 09:30:07 - Revision: 3.0

Microsoft Access 1.0 Standard Edition, Microsoft Access 1.1 Standard Edition, Microsoft Access 2.0 Standard Edition, Microsoft Access 97 Standard Edition
