Transcript - xPrEsS123

What is normalization ?
 Proposed by Codd in 1972
 Takes a relation through a series of steps to certify whether it satisfies a
certain normal form
 Initially Codd proposed three normal forms
 Boyce-Codd normal form is introduced by Boyce and Codd
 Based on functional dependencies between attributes of a relation
 Later 4th and 5th normal forms were introduced based on multi-valued
dependencies and join dependencies
 Normalization is the process of efficiently organizing
data in a database
 There are two goals of the normalization process:


Eliminating redundant data
 For example, storing the same data in more than one table
Ensuring data dependencies make sense
 Only storing related data in a table
 Reduce the amount of space a database consumes and
ensure that data is logically stored
 Through normalization we want to design for our
relational database a set of files that





Contain all the data necessary for the purposes that the
database is to serve
Have as little redundancy as possible
Accommodate multiple values for types of data that require
them
Permit efficient updates of the data in the database
Avoid the danger of losing data unknowingly
 Normalization Avoids




Duplication of Data
 The same data is listed in multiple lines of the database
Insert Anomaly
 A record about an entity cannot be inserted into the table
without first inserting information about another entity –
Cannot enter a customer without a sales order
Delete Anomaly
 A record cannot be deleted without deleting a record about
a related entity. Cannot delete a sales order without
deleting all of the customer’s information.
Update Anomaly
 Cannot update information without changing information
in many places. To update customer information, it must be
updated for each sales order the customer has placed
The Normal Forms
 Guidelines for ensuring that databases are normalized
 Numbered from
1
through
5
1NF, 2NF, 3NF, 4NF and 5NF
 In practical applications,



We often see first three normal forms
Occasionally we can see 4th normal form
And 5th normal form is rarely seen
 Normalization is a three stage process –
 After the first stage, the data is said to be in first normal
form
 After the second, it is in second normal form
 After the third, it is in third normal form
Before Normalization
 Begin with a list of all of the fields that must appear in the
database. Think of this as one big table.
 Do not include computed fields
 One place to begin getting this information is from a
printed document used by the system.
 Additional attributes besides those for the entities
described on the document can be added to the database.
 ORDERS
SalesOrderNo,
Date,
CustomerNo,
CustomerName,
CustomerAdd,
ClerkNo,
ClerkName,
ItemNo,
Description,
Qty,
UnitPrice
Some definitions:
 Functional Dependency
 The value of one attribute in a table is determined entirely by
the value of the primary key
 Partial Dependency
 A type of functional dependency where an attribute is
functionally dependent on only part of the primary key
(primary key must be a composite key).
 Transitive Dependency
 A type of functional dependency where an attribute is
functionally dependent on an attribute other than the primary
key. Thus its value is only indirectly determined by the
primary key.
First Normal Form
 To disallow multi-valued attributes, composite
attributes and complex attributes
 Domain of an attribute must include only atomic
values (simple and indivisible)
 Disallows ‘relations within relations’ or ‘relations
as attribute values within tuples’
Example 1:
DNAME
DNUMBER
DMGRENO
DLOCATIONS
Research
5
333445555
{Bangalore, New Delhi, Hyderabad}
Administration
4
987654321
{Chennai}
Headquarters
1
888665555
{Hyderabad}
DLOCATION is
not an atomic
attribute
 The domain of DLOCATIONS contain atomic values
 The domain of DLOCATIONS contain sets of values
(nonatomic)
 Techniques to achieve 1NF
1.
Remove the attribute DLOCATIONS that violates 1NF and
place it in a separate relation DEPT_LOCATIONS
DEPARTMENT
DNAME
DNUMBER
Research
5
333445555
Administration
4
987654321
Headquarters
1
888665555
DNUMBER
DEPT_LOCATIONS
DMGRENO
DLOCATION
5
Bangalore
5
New Delhi
5
Hyderabad
4
Chennai
1
Hyderabad
2.
Expand the key so that there will be separate tuple in the
original DEPARTMENT for each location
disadvantage : introduces redundancy in relation
DNAME
DNUMBER
DMGRENO
DLOCATIONS
Research
5
333445555
Bangalore
Research
5
333445555
New Delhi
Research
5
333445555
Hyderabad
Administration
4
987654321
Chennai
Headquarters
1
888665555
Hyderabad
3.
If the maximum number of values is known for the
attribute, replace the attribute by number of atomic
attributes
disadvantage : introduces null values
DNAME
DNUMBER
DMGRENO
DLOCATIO
N1
DLOCATIO
N2
DLOCATIO
N3
Research
5
333445555
Bangalore
New Delhi
Hyderabad
Administration
4
987654321
Chennai
Headquarters
1
888665555
Hyderabad
Example 2 :
ENO
ENAME
PROJS
PNUMBER
HOURS
EMP_PROJ( ENO, ENAME, {PROJS ( PNUMBER,
HOURS ) } )
ENO is the primary key and PNUMBER is partial key of
relation
Example 3 :
PERSON ( IDNO, ENAME, ADDRESS, AGE,
PROFESSION, {CAR_LIC}, {PHONE} )
Second Normal Form
 The relation should be in first normal form
 Based on full functional dependency
 A functional dependency X Y is a full functional
dependency if removal of any attribute A from X
means that the dependency does not hold any more
 A є X, (X - {A}) does not functional determine Y
 A partial dependency X Y is a partial dependency if
some attribute A є X, (X – {A})  Y
Example 1 :
ENO
PNO
HOURS
ENAME
PNAME
PLOCATION
 A relation R is in 2NF if every non-prime attribute A in
R is fully functionally dependent on the primary key of
R
 If primary key contains one attribute, the test need not
be applied at all
ENAME
ENO DOB
ADDRESS
DNUMBER DNAME DMERGENO
 If the relation is not in 2NF, it can be ‘second
normalized’ in to a number of 2NF relations in which
non-prime attributes are associated only with the part
of the primary key on which they are fully functionally
dependent
ENO
PNO
HOURS
ENO PNUMBER HOURS
ENAME
ENO ENAME
PNAME
PLOCATION
PNUMBER PNAME
PLOCATION
Third Normal Form
 Relation should be in second normal form
 Based on transitive dependency
 A functional dependency A  Y in a relation R is a
transitive dependency if there is a set of attributes Z
that is neither a candidate key nor a subset of any key
of R, and X Z and Z  Y hold
ENAME
ENO DOB
ADDRESS
DNUMBER DNAME DMERGENO
 The dependency ENO  DNUMBER and
DNUMBER  DMGRENO hold and DNUMBER is
neither a key nor a subset of a key
 A relation is in 3NF if it satisfies 2NF and no non-
prime attribute of R is transitively dependent on the
primary key
ENAME
ENAME
ENO DOB
ENO DOB
ADDRESS
ADDRESS
DNUMBER DNAME DMERGENO
DNUMBER
DNUMBER
DNAME DMGRENO
 In what normal form this relation is ????
GRADES (StudentID, Course#, Semester#, Grade)
 Suppose you are given a relation R = (A,B,C,D,E) with the
following functional dependencies: {CE ! D,D ! B,C ! A}.
a.
Find all candidate keys.
b.
Identify the best normal form that R satisfies
( 1NF , 2NF , 3NF )
 What is normalization ????

A relational database is basically composed of tables that contain
related data. The process of organizing this data is called as
normalization
 What is 1 NF (Normal Form)????

The domain of attribute must include only atomic (simple,
indivisible) values.
 What is 2NF????

A relation schema R is in 2NF if it is in 1NF and every non-prime
attribute A in R is fully functionally dependent on primary key.
 What is 3NF?


A relation schema R is in 3NF if it is in 2NF and for every FD X A
either of the following is true
 X is a Super-key of R.
 A is a prime attribute of R.
In other words, if every non prime attribute is non-transitively
dependent on primary key.
NORMAL TEST
FORM
REMEDY
1NF
Relation should have no non-atomic
attributes or nested relations
Form new relation for each non-atomic
attribute or nested relation
2NF
For relations where primary key
contains multiple attributes, no nonkey attribute should be functionally
dependent on a part of the primary
key
Decompose and set up a new relation
for each partial key with its dependent
attribute(s). Make sure to keep relation
with the original primary key and any
attributes that are fully functional
dependent on it
3NF
Relation should not have a non-key
attribute functionally determined by
another non-key attribute. There
should be no transitive dependency
of a non-key attribute on the primary
key
Decompose and set up a relation that
includes the non-key attribute(s) that
functionally determine other non-key
attribute(s)
NID
Name Age ContactDetails
Ward WardInCharge
WardLocation
Address TelePhone
Conceptual Design :
Patient ( NID , Name , Age , {CotactDetails ( Address , {Telephone})}
, Ward , WardInCharge, WardLocation)
Convert this relation into 1st Normal Form, 2nd Normal Form, 3rd
Normal Form
Boyce-Codd Normal Form
 BCNF
 Simpler form of 3NF
 Stricter than 3NF
 Every relation in BCNF is also in 3NF
 Relation in 3NF is not necessarily in BCNF
 A relation schema R is in BCNF if whenever a nontrivial functional dependency X  A holds in
relation R, then X is a super key of R
  is trivial (i.e.,   )
 is a superkey for R
 

 R = (A, B, C)
F = {A  B
B  C}
Key = {A}
 R is not in BCNF
 Decomposition R1 = (A, B), R2 = (B, C)
 R1 and R2 in BCNF
 Lossless-join decomposition
 Dependency preserving
Example
PROPERTY_ID
LOCATION
PROVINCE
AREA
PRICE
TAX_RATE
Example :
Patient No
Patient
Name
Appointme
nt Id
Time
Doctor
1
John
0
09:00
Zorro
2
Kerr
0
09:00
Killer
3
Adam
1
10:00
Zorro
4
Robert
0
13:00
Killer
5
Zane
1
14:00
Zorro
 Patno --> PatName
Patno,appNo --> Time,doctor
Time --> appNo
Example :
Grade_report (
StudNo, StudName,
Grade (
Major,
Advisor,
Grade (
CourseNo,
Ctitle,
InstrucName,
InstructLocn,
Grade
)
)
)