












Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Community
Ask the community for help and clear up your study doubts
Discover the best universities in your country according to Docsity users
Free resources
Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors
Hand written notes for database management system
Typology: Schemes and Mind Maps
1 / 20
This page cannot be seen from the preview
Don't miss anything!
Tables that contain redundant data can suffer from update anomalies, which can introduce inconsistencies into a database. The rules associated with the most commonly used normal forms, namely first (1NF), second (2NF), and third (3NF). The identification of various types of update anomalies such as insertion, deletion, and modification anomalies can be found when tables that break the rules of 1NF, 2NF, and 3NF and they are likely to contain redundant data and suffer from update anomalies. Normalization is a technique for producing a set of tables with desirable properties that support the requirements of a user or company. Major aim of relational database design is to group columns into tables to minimize data redundancy and reduce file storage space required by base tables. Take a look at the following example:
StdSSN StdCity StdClass OfferNo OffTerm OffYear EnrGrade CourseNo CrsDesc S1 SEATTLE JUN O1 FALL 2006 3.5 C1 DB S1 SEATTLE JUN O2 FALL 2006 3.3 C2 VB S2 BOTHELL JUN O3 SPRING 2007 3.1 C3 OO S2 BOTHELL JUN O2 FALL 2006 3.4 C2 VB
The insertion anomaly : Occurs when extra data beyond the desired data must be added to the database. For example, to insert a course (CourseNo), it is necessary to know a student (StdSSN) and offering (OfferNo) because the combination of StdSSN and OfferNo is the primary key. Remember that a row cannot exist with NULL values for part of its primary key. The update anomaly : Occurs when it is necessary to change multiple rows to modify ONLY a single fact. For example, if we change the StdClass of student S1 (JUN), two rows, row 1 and 2 must be changed. If S1 was enrolled in 10 classes, 10 rows must be changed. The deletion anomaly : Occurs whenever deleting a row inadvertently causes other data to be deleted. For example, if we delete the enrollment (EnrGrade) of S2 in O3 (third row), we lose the information about offering O3 and course C3 because these values are unique to the table (cell). Furthermore O3 is a primary key.
RECAP
Problems associated with data redundancy are illustrated by comparing the Staff and Branch tables with the StaffBranch table. Tables that have redundant data may have problems called update anomalies, which are classified as insertion, deletion, or modification anomalies. See the following Figure for an example of a table with redundant data called StaffBranch. There are two main types of insertion anomalies, which we illustrate using this table.
Insertion anomalies
Deletion anomalies
If we delete a record from the StaffBranch table that represents the last member of staff located at a branch, the details about that branch are also lost from the database. For example, if we delete the record for staff Art Peters (S0415) from the StaffBranch table, the details relating to branch B003 are lost from the database. The design of the tables that separate the Staff and Branch table avoids this problem because branch records are stored separately from staff records and only the column branchNo relates the two tables. If we delete the record for staff Art Peters (S0415) from the Staff table, the details on branch B003 in the Branch table remain unaffected.
Modification anomalies
If we want to change the value of one of the columns of a particular branch in the StaffBranch table, for example the telephone number for branch B001, we must update the records of all staff located at that branch (row 1 and 2). If this modification is not carried out on all the appropriate records of the StaffBranch table, the database will become inconsistent. In this example, branch B001 would have different telephone numbers in different staff records. The above examples illustrate that the Staff and Branch tables have more desirable properties than the StaffBranch table.
AIRCRAFT_1 Table
If we use the AIRCRAFT_1 table as shown in the above Figure, a change in hourly rental rates (AC_RENT_CHG) for the Cessna 172 Skyhawk must be made four times; if we forget to change just one of those rates, we have a data integrity problem. How much better it would be to have critical data in only one place. Then, if a change must be made, it need be made only once. In contrast, table structures are good when they preclude the possibility of producing uncontrolled data redundancies. We can produce such a happy circumstance by splitting the AIRCRAFT_1 table as shown in the following two Figures, connecting the two resulting tables through the AIRCRAFT_1 table's foreign key MOD_CODE.
Note that a rental rate change need be made in only one place, a description is given in only one place, and so on. No more data update and delete anomalies and no more data integrity problems. The relational schema in the following Figure shows how the two tables are related.
The First normal form (1NF)
A table in which the intersection of every column and record contains only one value. It prohibits nesting or repeating groups in table. The intersection must be atomic. For example the telNos column contains multiple values.
The Second normal form (2NF)
2NF ONLY applies to tables with composite primary keys (more than one primary key). A table that is in 1NF and in which the values of each non-primary-key column can be worked out from the values in ALL the columns that make up the primary key.
X (functionally) determines Y or Y is functionally dependent on X. X: left-hand-side (LHS) or determinant. For each X value, there is at most one Y value. Similar to candidate keys.
For example (take note regarding the arrow flow!):
StdSSN StdCity StdClass OfferNo OffTerm OffYear CourseNo CrsDesc EnrGrade
The FDs are (another notation used to write FDs):
StdSSN → StdCity, StdClass OfferNo → OffTerm, OffYear, CourseNo, CrsDesc CourseNo → CrsDesc StdSSN, OfferNo → EnrGrade
Formal definition of 2NF is a table that is in 1NF and every non-primary-key column is fully functional dependent on the primary key. Full functional dependency indicates that if A and B are columns of a table, B is fully dependent on A if B is functionally dependent on A but not on any proper subset of A. Consider the following examples.
Identifying Functional Dependencies
Database designers must be able to identify FD when collecting database requirements. In problem narratives, some FD can be identified by statements about uniqueness. For example a user may state that each course offering has a unique offering number along with the year and term of the offering. From this statement, the designer should assert that OfferNo → OffYear and OffTerm. You can also identify functional dependencies in a table design resulting from the conversion of an ERD. FD would be asserted for each unique column (PK or other candidate key) with the unique column as the LHS and other columns in the table on the RHS. Although FD derived from statements about 1-M relationships can be identify, FD derived from statements about 1-M relationship can be confusing to identify. When you see a statement about a 1-M relationship, the FD is derived from the child-to-parent direction, not the parent-to-child direction.
You should not write:
StdSSN, Email → StdCity, StdClass
This is because these FDs imply that the combination of StdSSN and Email is the determinant. Thus you should write FDs so that the LHS does not contain unneeded columns. The prohibition against unneeded columns for determinants is the same as the prohibition against unneeded columns in candidate keys. Both determinants and candidate keys must be minimal. A FD cannot be proven to exist by examining the rows of a table. However you can falsify a FD (i.e. prove that a FD does not exist) by examining the contents of a table. For example, in the university database, we can conclude that StdClass does not determine StdCity because there are two rows with the same value for StdClass but a different value for StdCity. Thus, it is sometimes helpful to examine sample rows in a table to eliminate potential functional dependencies. There are several commercial database design tools that automate the process of eliminating dependencies through examination of sample rows. Ultimately, the database designer must make the final decision about FDs that exist in a table.
Third normal form (3NF)
A table that is in 1NF and 2NF and in which all non-primary-key column can be worked out from only the primary key column(s) and no other columns. At this level, the combined definition of 2NF and 3NF is a table is in 3NF if each non-key column depends on all candidate keys, whole candidate keys and nothing but candidate keys. For 2NF we should remove partial dependency and for 3NF we should remove transitive dependency. For example the StaffBranch table is not in 3NF.
The formal definition of 3NF is a table that is in 1NF and 2NF and in which no non-primary- key column is transitively dependent on the primary key. For example, consider a table with A, B, and C. If B is functional dependent on A (A → B) and C is functional dependent on B (B → C), then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C). If a transitive dependency exists on the primary key, the table is not in 3NF.
First normal form (1NF) is a table in which the intersection of every column and record contains only one value.
Second normal form applies only to tables with composite primary keys, that is, tables with a primary key composed of two or more columns. A 1NF table with a single column primary key is automatically in at least 2NF. A second normal form (2NF) is a table that is already in 1NF and in which the values in each non-primary-key column can be worked out from the values in all the columns that makes up the primary key. A table in 1NF can be converted into 2NF by removing the columns that can be worked out from only part of the primary key. These columns are placed in a new table along with a copy of the part of the primary key that they can be worked out from.
Second normal form (2NF) is a table that is already in 1NF and in which the values in each non- primary-key column can only be worked out from the values in all the columns that make up the primary key.
The formal definition of second normal form (2NF) is a table that is in first normal form and every non-primary-key column is fully functionally dependent on the primary key. Full functional dependency indicates that if A and B are columns of a table, B is fully functionally dependent on A, if B is not dependent on any subset of A. If B is dependent on a subset of A, this is referred to as a partial dependency. If a partial dependency exists on the primary key, the table is not in 2NF. The partial dependency must be removed for a table to achieve 2NF.
Third normal form (3NF) is a table that is already in 1NF and 2NF, and in which the values in all non-primary-key columns can be worked out from only the primary key (or candidate key) column(s) and no other columns.
The formal definition for third normal form (3NF) is a table that is in first and second normal forms and in which no non-primary-key column is transitively dependent on the primary key. Transitive dependency is a type of functional dependency that occurs when a particular type of relationship holds between columns of a table. For example, consider a table with columns A, B, and C. If B is functionally dependent on A (A → B) and C is functionally dependent on B (B → C), then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C). If a transitive dependency exists on the primary key, the table is not in 3NF. The transitive dependency must be removed for a table to achieve 3NF.
branchNo branchAddress telNos B001 8 Jefferson Way, Portland, OR 97201 503-555-3618, 503-555-2727, 503-555- B002 City Center Plaza, Seattle, WA 98122 206-555-6756, 206-555- B003 14 – 8th Avenue, New York, NY 10012 212-371- B004 16 – 14th Avenue, Seattle, WA 98128 206-555-3131, 206-555-
(a) Why is this table not in 1NF? (b) Describe and illustrate the process of normalizing the data shown in this table to third normal form (3NF). (c) Identify the primary, alternate and foreign keys in your 3NF relations.
Answer:
Branch
branchNo branchAddress telNos B001 8 Jefferson Way, Portland, OR 97201 503-555-3618, 503-555-2727, 503-555- B002 City Center Plaza, Seattle, WA 98122 206-555-6756, 206-555- B003 14 – 8th Avenue, New York, NY 10012 212-371- B004 16 – 14th Avenue, Seattle, WA 98128 206-555-3131, 206-555-
Primary key Alternate key
More than one value, so not in 1NF
staffNo branchNo branchAddress name position hoursPerWeek S4555 B002 City Center Plaza, Seattle, WA 98122 Ellen Layman Assistant 16 S4555 B004 16 – 14th Avenue, Seattle, WA 98128 Ellen Layman Assistant 9 S4612 B002 City Center Plaza, Seattle, WA 98122 Dave Sinclair Assistant 14 S4612 B004 16 – 14th Avenue, Seattle, WA 98128 Dave Sinclair Assistant 10
(a) Why is this table not in 2NF? (b) Describe and illustrate the process of normalizing the data shown in this table to third normal form (3NF). (c) Identify the primary, (alternate) and foreign keys in your 3NF relations.
Answer:
TempStaffAllocation staffNo branchNo branchAddress name position hoursPerWeek S4555 B002 City Center Plaza, Seattle, WA 98122 Ellen Layman Assistant 16 S4555 B004 16 – 14th Avenue, Seattle, WA 98128 Ellen Layman Assistant 9 S4612 B002 City Center Plaza, Seattle, WA 98122 Dave Sinclair Assistant 14 S4612 B004 16 – 14th Avenue, Seattle, WA 98128 Dave Sinclair Assistant 10
Composite primary key
Values in branchAddress column can be worked out from only branchNo, so table not in 2NF
Values in name and position columns can be worked out from only staffNo, so table not in 2NF
Values in hoursPerWeek column can only be worked out from staffNo and branchNo
TempStaffAllocation
Composite primary key
staffNo branchNo branchAddress name position hoursPerWeek S4555 B002 City Center Plaza, Seattle, WA 98122 Ellen Layman Assistant 16 S4555 B004 16 – 14th Avenue, Seattle, WA 98128 Ellen Layman Assistant 9 S4612 B002 City Center Plaza, Seattle, WA 98122 Dave Sinclair Assistant 14 S4612 B004 16 – 14th Avenue, Seattle, WA 98128 Dave Sinclair Assistant 10
Branch TempStaff branchNo branchAddress staffNo name position B002 City Center Plaza, Seattle, WA 98122 S4555 Ellen Layman Assistant B004 16 – 14th Avenue, Seattle, WA 98128 S4612 Dave Sinclair Assistant
Take copy of staffNo
Take copy of branchNo
Remove name column to new table
Remove branchAddress column to new table
Remove position column to new table
TempStaffAllocation
Becomes primary key
Becomes primary key
staffNo branchNo hoursPerWeek S4555 B002 16 S4555 B004 9 S4612 B002 14 S4612 B004 10
Composite primary key
Becomes foreign key
Becomes foreign key
BranchManager branchNo branchAddress telNo mgrStaffNo name B001 8 Jefferson Way, Portland, OR 97201 503-555-3618 S1500 Tom Daniels B002 City Center Plaza, Seattle, WA 98122 206-555-6756 S0010 Mary Martinez B003 14 – 8th Avenue, New York, NY 10012 212-371-3000 S0415 Art Peters B004 16 – 14th Avenue, Seattle, WA 98128 206-555-3131 S2250 Sally Stern
Branch ManagerStaff
branchNo branchAddress telNo mgrStaffNo mgrStaffNo name B001 8 Jefferson Way, Portland, OR 97201 503-555-3618 S1500 S1500 Tom Daniels B002 City Center Plaza, Seattle, WA 98122 206-555-6756 S0010 S0010 Mary Martinez B003 14 – 8th Avenue, New York, NY 10012 212-371-3000 S0415 S0415 Art Peters B004 16 – 14th Avenue, Seattle, WA 98128 206-555-3131 S2250 S2250 Sally Stern
Remove name column to new table
BranchManger table is renamed Branch
Primary key Take copy of mgrStaffNo
Becomes foreign key
Primary key Primary key
More Questions and Answers
Normalization is the process for assigning attributes to entities. Properly executed, the normalization process eliminates uncontrolled data redundancies, thus eliminating the data anomalies and the data integrity problems that are produced by such redundancies. Normalization does not eliminate data redundancy; instead, it produces the carefully controlled redundancy that lets us properly link database tables.
A table is in 1NF when all the key attributes are defined (no repeating groups in the table) and when all remaining attributes are dependent on the primary key. However, a table in 1NF still may contain partial dependencies, i.e., dependencies based on only part of the primary key.
A table is in 2NF when it is in 1NF and it includes no partial dependencies. However, a table in 2NF may still have transitive dependencies, i.e., dependencies based on attributes that are not part of the primary key.
A table is in 3NF when it is in 2NF and it contains no transitive dependencies.
A table is in Boyce-Codd Normal Form (BCNF) when it is in 3NF and every determinant in the table is a candidate key. For example, if the table is in 3NF and it contains a nonprime attribute that determines a prime attribute, the BCNF requirements are not met. This description clearly yields the following conclusions:
If a table is in 3NF and it contains only one candidate key, 3NF and BCNF are equivalent. BCNF can be violated only if the table contains more than one candidate key. Putting it another way, there is no way that the BCNF requirement can be violated if there is only one candidate key.
Figure: The Dependency Diagram
a. Identify and discuss each of the indicated dependencies.
C1 → C2 represents a partial dependency, because C2 depends only on C1, rather than on the entire primary key composed of C1 and C3. C4 → C5 represents a transitive dependency, because C5 depends on an attribute (C4) that is not part of a primary key. C1, C3 → C2, C4, C5 represents a functional dependency, because C2, C4, and C depend on the primary key composed of C1 and C3.