Chapter 3 - Personal.kent.edu

Download Report

Transcript Chapter 3 - Personal.kent.edu

Database Processing

Chapter 3

Normalization

David M. Kroenke DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-1

Chapter Premise

• We have one or more tables of data • The data is to be stored in a new database • QUESTION: Should the data be stored as received, or should it be transformed for storage?

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-2

Important Relational Model Terms

• Entity • Relation • Functional Dependency • Determinant • Candidate Key • Composite Key • Primary Key • Surrogate Key • Foreign Key • Referential integrity constraint • Normal Form DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-3

Entity

• An

entity

is some identifiable thing that users want to track: – Customers – Computers – Sales • Rows contain data about an entity • Columns contain data about attributes of the entity DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-4

Relation

• Relational DBMS products store data about entities in relations, which are a special type of table • A

relation

is a two-dimensional table that has the following characteristics: – All entries in a column are of the same kind – Each column has a unique name – Cells of the table hold a single value – The order of the columns and rows is unimportant – No two rows may be identical DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-5

A Relation

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-6

Tables That Are Not Relations: Multiple Entries per Cell

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-7

Functional Dependency

• A

functional dependency

occurs when the value of one (a set of) attribute(s) determines the value of a second (set of) attribute(s):

StudentID

StudentName StudentID

(DormName, DormRoom, Fee)

• The attribute on the left side of the functional dependency is called the

determinant

• Functional dependencies may be based on equations:

ExtendedPrice = Quantity X UnitPrice (Quantity, UnitPrice)

ExtendedPrice

• Function dependencies are not equations!

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-8

Functional Dependencies Are Not Equations

ObjectColor

ObjectColor

ObjectColor

Weight Shape (Weight, Shape)

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-9

Composite Determinants

Composite determinant

: A determinant of a functional dependency that consists of more than one attribute

(StudentName, ClassName)

(Grade)

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-10

Functional Dependency Rules

• If A  (B, C), then A  B and A  C • If (A,B)  C, then neither A nor B determines C by itself DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-11

Functional Dependencies in the SKU_DATA Table

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-12

Functional Dependencies in the SKU_DATA Table

SKU

(SKU_Description, Department, Buyer) SKU_Description

(SKU, Department, Buyer) Buyer

Department

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-13

What Makes Determinant Values Unique?

• A determinant is unique in a relation if, and only if, it determines every other column in the relation • You cannot find the determinants of all functional dependencies simply by looking for unique values in one column: – Data set limitations DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-14

Keys

• A

key

is a combination of one or more columns that is used to identify rows in a relation • A

composite key

is a key that consists of two or more columns DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-15

Candidate and Primary Keys

• A

candidate key

is a key that determines all of the other columns in a relation • A

primary key

is a candidate key selected as the primary means of identifying rows in a relation: – There is one and only one primary key per relation – The primary key may be a composite key – The ideal primary key is short, numeric and never changes DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-16

Surrogate Keys

• A

surrogate key

as an artificial column added to a relation to serve as a primary key: – DBMS supplied – Short, numeric and never changes – an ideal primary key!

– Has artificial values that are meaningless to users – Normally hidden in forms and reports DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-17

Surrogate Keys Relation Descriptions:

• RENTAL_PROPERTY without surrogate key

:

RENTAL_PROPERTY (Street, City, State/Province, Zip/PostalCode, Rental_Rate)

• RENTAL_PROPERTY with surrogate key:

RENTAL_PROPERTY (PropertyID, Street, City, State/Province, Zip/PostalCode, Rental_Rate)

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-18

Foreign Keys

• A

foreign key

is the primary key of one relation that is placed in another relation to form a link between the relations: – A foreign key can be a single column or a composite key – The term refers to the fact that key values are

foreign

to the relation in which they appear as foreign key values DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-19

Foreign Keys

Relation Descriptions:

DEPARTMENT (DeptName, BudgetCode, ManagerName) EMPLOYEE (EmpNumber, EmpName, DeptName)

Or

DEPARTMENT (DeptName, BudgetCode, ManagerName) EMPLOYEE (EmpNumber, EmpName, DeptName)

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-20

The Referential Integrity Constraint

• A

referential integrity constraint

is a statement that limits the values of the foreign key to those already existing as primary key values in the corresponding relation DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-21

Foreign Key with a Referential Integrity Constraint

SKU_DATA (SKU, SKU_Description, Department, Buyer) ORDER_ITEM (OrderNumber, SKU, Quantity, Price, ExtendedPrice) Referential Integrity Constraint: ORDER_ITEM.SKU must first exist in SKU_DATA.SKU

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-22

Modification Anomalies

An anomaly is an undesirable consequence of data modification – Deletion anomaly – Insertion anomaly – Update anomaly DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-23

Normal Forms

• Relations are categorized as a

normal form

based on which modification anomalies or other problems that they are subject to: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-24

Normal Forms

• • • •

1NF

– A table that qualifies as a relation is in 1NF

2NF

– A relation is in 2NF if all of its nonkey attributes are dependent on all of the primary key

3NF

– A relation is in 3NF if it is in 2NF and has no transitive dependencies

Boyce-Codd Normal Form (BCNF)

– A relation is in BCNF if every determinant is a candidate key

“I swear to construct my tables so that all nonkey columns are dependent on the key, the whole key and nothing but the key, so help me Codd.”

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-25

First Normal Form (1NF)

• To be in

First Normal Form (1NF)

a relation must have only single-valued attributes -- neither repeating groups nor arrays are permitted DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-26

Violation of 1NF

PlayerID 123-45 6789 234-56 7890 Name Alomar, R Alomar, S Team Cleveland, New York Cleveland, Chicago Key: PlayerID Determinants: PlayerID  Name PlayerID, Team  AtBats, Hits DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall AtBats Hits 200, 120 60, 150 50, 58 17, 40 3-27

Second Normal Form (2NF)

• To be in

Second Normal Form (2NF)

the relation must be in 1NF and each nonkey attribute must be dependent on the whole key (not a subset of the key) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-28

Violation of 2NF

ItemNo CustomerID 12 34 57 679 Quantity 25 3 CreditRtg OK Poor Key: ItemNo, CustomerID Determinants: ItemNo, CustomerID  Quantity CustomerID  CreditRtg DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-29

Third Normal Form (3NF)

• To be in

Third Normal Form (3NF)

the relation must be in 2NF and no transitive dependencies may exist within the relation.

• A

transitive dependency

is when an attribute is indirectly functionally dependent on the key (that is, the dependency is through another nonkey attribute) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-30

Violation of 3NF

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-31

Boyce-Codd Normal Form (BCNF)

To be in

Boyce-Codd Normal Form (BCNF)

the relation must be in 3NF and every determinant must be a candidate key.

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-32

Steps for BCNF

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-33

Violation of BCNF

Client Alpha Alpha Beta Omega ProbType Marketing Production Marketing Marketing Consultant Gomez Raginski Gomez Taylor Key: Client, ProbType Candidate Key: Client, Consultant Determinants: Consultant  ProbType * No two consultants have same name * Consultant specializes in just one problem type * Consultant can be assigned to multiple clients DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-34