aliyazici.atilim.edu.tr

Download Report

Transcript aliyazici.atilim.edu.tr

Normalization of Database
Relations : Manual,
Automatic or None
Ali Yazici
Software Engineering Department
Atılım University, Ankara, Turkey
Agenda
1.
2.
3.
4.
5.
6.
7.
8.
9.
Introduction
Normalization (briefly)
Normalizing XML
Auto normalization (earlier work)
Normalization algorithms
JMathNorm
Demo
Conclusions
References
20.07.2015
ICADIWT 2009 – London
Metropolitan University
2
1. Introduction



Stages of RDB design:
•
•
•
•
Determination of user requirements
Conceptual design
Logical design
Physical design
Conceptual design
•
Data Modelling with
•
•
UML
ER or EER
Logical design
•
•
Mapping of UML/EER to Relational Schema
Normalization
20.07.2015
ICADIWT 2009 – London
Metropolitan University
3
Normality: For and Against?
20.07.2015
ICADIWT 2009 – London
Metropolitan University
4
2. Normalization (briefly)




R={R1,R2,R3,…,Rn}
For each Ri designate a primary key (PK)
Define FD’s among Ri’s based on the PK
Normalize 1NF → 8NF !
•
•
•
1NF: A relational database table is in1NF if it is free of
repeating groups.
2NF: A 1NF table is in 2NF if and only if , given any
candidate key and any attribute that is not a constituent of a
candidate key, the non-key attribute depends upon the
whole of the candidate key rather than just a part of it.
3NF: A table is in 3NF if and only if both of the following
conditions hold:
•
•
The relation R (table) is in second normal form (2NF)
Every non-prime attribute of R is non-transitively dependent on
every key of R.
20.07.2015
ICADIWT 2009 – London
Metropolitan University
5
...Normalization (briefly)
•
•
•
•
BCNF: A table is in Boyce-Codd normal form if and only if , for
every one of its non-trivial functional dependencies X → Y, X is a
superkey—that is, X is either a candidate key or a superset
thereof.
4NF: A table is in 4NF if and only if , for every one of its non-trivial
multivalued dependencies X →→ Y, X is a superkey—that is, X is
either a candidate key or a superset thereof.
5NF: A table is said to be in the 5NF if and only if every join
dependency in it is implied by the candidate keys.
• A join dependency *{A, B, … Z} on R is implied by the
candidate key(s) of R if and only if each of A, B, …, Z is a
superkey for R
6NF: A table is in 6NF if and only if it satisfies no nontrivial join
dependencies at all — where, as before, a join dependency is
trivial if and only if at least one of the projections (possibly
U_projections) involved is taken over the set of all attributes of R
concerned.
20.07.2015
ICADIWT 2009 – London
Metropolitan University
6
...Cause of violations




Multi-valued attributes cause violation of
1NF
Partial dependency violates 2NF
Transitive dependency violates 3NF
Dependence of a prime attribute on a
non-prime attribute violates BCNF
20.07.2015
ICADIWT 2009 – London
Metropolitan University
7
…Automation



Normalization is a process of decomposing
each Ri into well-formed! subrelations in a
systematic way
Can we automate this process?
Commercial design tools (CASE tools) do not
have a complete solution
•
•
•
•
Oracle Designer
Oracle SQL Data Modeller
ER-Win
...
20.07.2015
ICADIWT 2009 – London
Metropolitan University
8
Why normalize?

If db schema is not normalized
• As a result of certain transactions, data is lost
• Database integrity is lost!
• Example:
• COMPANY(InvoNo, Item, CustId, CustName,
CustPhoto, CustNextKin, (CustProperty) )
• COMPANY is not normalized!
• Consequences!
20.07.2015
ICADIWT 2009 – London
Metropolitan University
9
..if not normalized!
InNo
Item
CustId Cust Cust Cust
Name Photo NextKin
0345
PC
108
Jane
0390
Laptop 109
John
0455
Printer 108
Jane
20.07.2015
ICADIWT 2009 – London
Metropolitan University
Property Property
1
2
10
Example

Sample DB Schema
• PURCHASE-ITEM = {orderNo,
•
partNo,partDescription, custNo, custName,
quantity, price}
FD1
{orderNo, partNo}→{partDescription, quantity, price}
• FD2
partNo → partDescription
• FD3
(Partial dependency)
custNo → custName (Transitive dependency)
20.07.2015
ICADIWT 2009 – London
Metropolitan University
11
...Example

Normalization into 2NF by decomposing
PURCHASE_ITEM into P1 and P2 as follows:
• P1={orderNo, partNo,quantity, custNo,
custName, price}
• P2={partNo,partDescription}

Normalization of P1 into 3NF
• P11={orderNo, partNo,quantity, custNo, price}
• P12={custNo, custName}
20.07.2015
ICADIWT 2009 – London
Metropolitan University
12
3. Normalizing XML

Relational normal forms will partly apply to XML codes to
eliminate by using keyref and key tags on XML schemas
•
•
•
•
•
Ambiguity
Minimize redundancy
Preserve data consistency
Allow for rational data maintenance
Some references:
•
•
•
W. Provost,
http://www.xml.com/pub/a/2002/11/13/normalizing.html
M. Arenas & L. Libkin, “A normal form for XML Documents”,
Proceedings of the 21st. ACM SIGMOD-SIGACT-SIGART
Symposium on Principles of Database Systems, M. (2002)
J.J.Lu & S. Renjen, “Normalizing XML schemas through
relational databases”, Proceedings of the 43rd ACM Southeast
Regional Conference, Vol.1 (2005) 220 - 221
20.07.2015
ICADIWT 2009 – London
Metropolitan University
13
4. Auto Normalization (Earlier
work)

Prolog based work (complex programming and data
structures)
•
•


Ceri, S. and Gottlob, G.: Normalization of Relations & Prolog,
Communications of the ACM, Vol.29, No.6 (1986)
Welzer, W., Rozman, I. and Gyrks, J.G.: Automated Normalization
Tool, Microprocessing & Microprogramming, Vol.25 (1989) 375-380.
A prototype tool
•
Du, H., and Werry, L.: Micro: A Normalization Tool for Relational
Database Designers, J. Of Network and Computer Applications, Vo.22
(1999) 215-232
Object Constraint Object
•
Akehurst, D.H., Bordbar, B., Rodgers, P.J., and Dalgliesh, N.T.G.:
Automatic Normalization via Metamodelling, Proc. of the ASE 2002
Workshop on Declarative Meta Programming to Support Software
Development (2002)
20.07.2015
ICADIWT 2009 – London
Metropolitan University
14
...Earlier work



A web-based tool
•
Kung, H-J. and Tung, H-L.: A Web-based Tool to
Enhance Teaching/Learning Database Normalization,
Proc. of the 2006 Southern Association for Information
Systems Conference (2006) 251-258.
Symbolic Approach
•
Yazici, A. and Karakaya, Z.: Normalizing Relational
Database Schemas Using Mathematica, LNCS,
Springer-Verlag, Vol.3992 (2006) 375-382
Dependency graph approach
•
Bahmani, A.H., Naghibzadeh and Bahmani, B,
CCECE 2008, Canada
20.07.2015
ICADIWT 2009 – London
Metropolitan University
15
5. Normalization Algorithms
20.07.2015
ICADIWT 2009 – London
Metropolitan University
16
ClosureX

Algorithm ClosureX: Determining X+ under FD
Given a DB schema R, a set of attributes X and FD set F
calculate X+
1. X+ := X;
2. repeat
tempX+ := X+
for each FD Y → Z in F do
if X+ covers Y then X+ := X+ U Z
until ( X+ = tempX+ )
20.07.2015
ICADIWT 2009 – London
Metropolitan University
17
FullClosureX


FullClosureX, X++, is yet another function similar to ClosureX which
returns all attributes that are fully dependent on X with respect to FD
set. This function is used to remove partial dependencies for
transforming a relation into 2NF.
Algorithm FullClosureX(X: attribute set; F: FD set): return closure in
tempX ;
1. tempX := X;
2. repeat
oldX := tempX;
for each FD Y → Z in F do
if Y subset tempX then if not(Y subset X)
then tempX := Z U tempX
else if Y = X then tempX := Z U tempX;
until (length(oldX) = length(tempX));
3. return tempX;
20.07.2015
ICADIWT 2009 – London
Metropolitan University
18
Some Definitions


Given a set of FD’s F, an attribute B is said to be extraneous in
X→A w.r.t. F if X = ZB, X ≠ Z, and A ε Z+.
A set of FD’s H is called a minimal cover for a set F if each
dependency in H as exactly one attribute on the right-hand
side, if no attribute on the left-hand side is extraneous, and if no
dependency in H can be derived from the other dependencies
in H.
•

Uniqueness problem!
Actually, the calculation of a minimal cover consists of
”Elimination of ExtraneousAttributes” followed by the
”Elimination of Redundant Dependencies”. Normalization
algorithms considered in this study makes use of the minimal
cover of a given set of FD’s. Moreover, they are computationally
efficient with at most O(d2) operations where d is the number of
FD’s in the schema.
20.07.2015
ICADIWT 2009 – London
Metropolitan University
19
2NF Algorithm

Algorithm 2NF(X: attribute set; F: FD set): return decomposed
relation set ; (insert, first, next, mark, and end are list functions)
1. P := ø;
2. G := MinimalCover(F);
3. F := First(G);
4. While f ≠ end(G) do begin
XP := fullClosureX (f.determinant, G)
mark(XP,R);
insert(XP,P);
repeat ff := f; f := next(f);
until ff.determinant ≠ f.determinant
5. for all attributes in X do
if attribute is not marked then insert (attribute,XP);
if XP ≠ ø then insert(XP,P);
6. Return P;
20.07.2015
ICADIWT 2009 – London
Metropolitan University
20
3NF Algorithm

Algorithm 3NF: Bernstein’s Synthesis Algorithm:Given a
DB schema R and FD set F, find 3NF decompositions D
1. Find a Minimal Cover G for F
2. For each a left-hand side X of a FD that appears in G,
create a relation schema in D with attributes {XU{A1}U{A2}U.
. .U{Ak}}, where X→A1, X→A2,...,X→Ak are the only
dependencies in G with X as the left-hand side (X is the
key)
3. Place any remaining attributes (that have not been placed
in any relation) in a single relation schema to ensure the
attribute preservation
20.07.2015
ICADIWT 2009 – London
Metropolitan University
21
BCNF Algorithm

Algorithm BCNF(R: attribute set in 3NF;
F: FD set): return Q in BCNF;
1. D := R;
2. while there is a left-hand side X of a
FD X → Y in F do
if X → Y violatesBCNF then
decomp R into Rm and Rn
Rm := D - Y ; and Rn := X U Y ;
3. return Q := Rm U Rn;
20.07.2015
ICADIWT 2009 – London
Metropolitan University
22
Properties of 3NF and BCNF
algorithms



Bernstein’s Synthesis Algorithm
• It guarantees dependency preserving property.
• However, it does not guarantee nonadditive join property.
• An algorithm exists which satisfy both (Elmasri and
Navathe, Fundamentals of Database Systems, 5th Ed.,
Addison-Wesley, (2007) 379-380).
• Algorithm requires the relation to be in 1NF (not 2NF) only!
BCNF Algorithm
• It guarantees nonadditive join property.
Both algorithms are quadratic (O(d2)) in the number of
attributes=degree of the relation.
20.07.2015
ICADIWT 2009 – London
Metropolitan University
23
6. JMathNorm

An interactive GUI tool
• IntelliJ Java for GUI and JLink library
• Mathematica JLink facility to derive the kernel
• Mathematica implementation of the abstract
normalization algorithms
20.07.2015
ICADIWT 2009 – London
Metropolitan University
24
Data Structures

List structure and basic built-in list
functions such as
•


Union[], Complement[], Intersection[], MemberQ[],
Extract[], Append[], Length[], and Sort[].
For a db schema R left/right side of the
FD’s are represented as lists.
For example:
•
R = {a, b, c, d, e, f}, (trivial) functional dependencies
for example are shown as two lists:
• FL = {{a,b}, {a,b},{a,b},{a,b}}
• FR = {c, d, e, f}
20.07.2015
ICADIWT 2009 – London
Metropolitan University
25
BCNF in Mathematica
BCNF[GL_,GR_,U_] := Module[{i,X,D,Q,DIF,REL},
D = U; Q = {};
For[i = 1,i <= Length[GL], i++,
If[Length[GL[[i]]]>1,X = Sort[GL[[i]]],X={GL[[i]]}];
flag = violatesBCNF[GL,GR,X,GR[[i]],U];
If[flag == 1,REL = Union[X,{GR[[i]]}];
Q = Union[Q,{REL}];
RC = Complement[U,{GR[[i]]}];
DIF = Intersection[U,RC];
Print["Q=",Q," RC=",RC," DIF=",DIF];
Q = Union[Q,{DIF}];];];
Return[Q];];
violatesBCNF[GL_,GR_,X_,Y_,U_] := Module[{XP},
XP = Sort[ClosureX[GL,GR,X]];
If[XP == Sort[U],flag = 0,flag = 1];
Return[flag];];
20.07.2015
ICADIWT 2009 – London
Metropolitan University
26
IntelliJ Main Menu
20.07.2015
ICADIWT 2009 – London
Metropolitan University
27
Launch Mathematica
20.07.2015
ICADIWT 2009 – London
Metropolitan University
28
Functional Dependency
submenu
20.07.2015
ICADIWT 2009 – London
Metropolitan University
29
Operations submenu
20.07.2015
ICADIWT 2009 – London
Metropolitan University
30
Normalization submenu
20.07.2015
ICADIWT 2009 – London
Metropolitan University
31
Results submenu
20.07.2015
ICADIWT 2009 – London
Metropolitan University
32
Demo
20.07.2015
ICADIWT 2009 – London
Metropolitan University
33
Defining FD’s
20.07.2015
ICADIWT 2009 – London
Metropolitan University
34
BCNF Decomposition
20.07.2015
ICADIWT 2009 – London
Metropolitan University
35
Output in table form
20.07.2015
ICADIWT 2009 – London
Metropolitan University
36
8. Conclusions




Symbolic power of Mathematica
JLink facility of Mathematica
OOP with Java to produce a GUI
Things to be done
•
•
•
•
•
•
Physical Schema Generation geared towards a
specific DBMS
Progressive approach between normal forms
Tree representation of the decomposition process
Input FD’s from a text file and parse
Make it interactive and intelligent
Web application!
20.07.2015
ICADIWT 2009 – London
Metropolitan University
37
9. References
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
Yazici, A. and Karakaya, Z.: Normalizing Relational Database Schemas Using Mathematica, LNCS,
Springer-Verlag, Vol.3992 (2006) 375-382.
Elmasri, R. and Navathe, S.B.: Fundamentals of Database Systems, 5th Ed., Addison Wesley (2007).
Kung, H. and Case, T.: Traditional and Alternative Database Normalization Techniques:Their Impacts on
IS/IT Students’ Perceptions and Performance, International Journal of Information Technology Education,
Vol.1, No.1 (2004) 53-76.
Ceri, S. and Gottlob, G.: Normalization of Relations and Prolog, Communications of the ACM, Vol.29, No.6
(1986)
Welzer, W., Rozman, I. and Gyrks, J.G.: Automated Normalization Tool, Microprocessing and
Microprogramming, Vol.25 (1989) 375-380.
Akehurst, D.H., Bordbar, B., Rodgers, P.J., and Dalgliesh, N.T.G.: Automatic Normalization via
Metamodelling, Proc. of the ASE 2002 Workshop on Declarative Meta Programming to Support Software
Development (2002)
Kung, H-J. and Tung, H-L.: A Web-based Tool to Enhance Teaching/Learning Database Normalization,
Proc. of the 2006 Southern Association for Information Systems Conference (2006) 251-258.
Wolfram, S.: The Mathematica Book, 4th Ed., Cambridge University Press (1999).
Du, H. and Wery, L.: Micro: A Normalization Tool for Relational Database Designers, Journal of Network
and Computer Applications, Vol.22 (1999) 215-232.
Manning, M.V.: Database Design, Application Development and Administration, 2nd. Ed., McGraw-Hill
(2004).
Diederich, J. and Milton, J.: New Methods and Fast Algorithms for Database Normalization, ACM Trans.
on Database Systems, Vol.13, No.3(1988) 339-365.
Ozharahan, E.: Database Management: Concepts, Design and Practice, Prentice Hall (1990).
Bernstein, P.A.: Synthesizing Third Norm Relations from Functional Dependencies, ACM Trans. on
Database Systems, Vol.1, No.4 (1976) 277-298.
20.07.2015
ICADIWT 2009 – London
Metropolitan University
38

20.07.2015
ICADIWT 2009 – London
Metropolitan University
Thanks!
39