Normalization DB Tuning CS186 Final Review Session Plan • • • • Functional Dependencies, Rules of Inference Candidate Keys Normal forms (BCNF/3NF) Decomposition – – – – BCNF Lossless Dependency preserving 3NF + Minimal cover • DB Tuning.

Download Report

Transcript Normalization DB Tuning CS186 Final Review Session Plan • • • • Functional Dependencies, Rules of Inference Candidate Keys Normal forms (BCNF/3NF) Decomposition – – – – BCNF Lossless Dependency preserving 3NF + Minimal cover • DB Tuning.

Normalization DB Tuning

CS186 Final Review Session

Plan

• Functional Dependencies, Rules of Inference • Candidate Keys • Normal forms (BCNF/3NF) • Decomposition – BCNF – Lossless – Dependency preserving – 3NF + Minimal cover • DB Tuning

Functional Dependencies

• A functional dependency X  Y holds over relation schema R if, for every allowable instance

r t1

r, t2

r,

implies p

X Y

( (

t1 t1

) = ) = p

X Y

( (

t2 t2

) )

(where t1 and t2 are tuples;X and Y are sets of attributes)

of R: • In other words: X  Y means Given any two tuples in

r

, if the X values are the same, then the Y values must also be the same. (but not vice versa!!) • Can read “  ” as “determines”

Rules of Inference

• Armstrong’s Axioms – – – (X, Y, Z are sets of attributes):

Reflexivity

: If X  Y, then X 

Augmentation

: If X 

Transitivity

: If X  Y, then XZ Y and Y  Y  YZ for any Z Z, then X  Z • Some additional rules (that follow from AA): – –

Union

: If X  Y and X 

Decomposition

: If X  Z, then X YZ, then X   YZ Y and X  Z

Candidate Keys

• R = {A, B, C, D, E} • F = { B  CD, D  E, B  A, E  C, AD  B } • Is B  E in F + ?

Is AD a key for R? AD + = AD

B +

B + B + B +

= B

AD + = ABD and B is a key, so Yes!

= BCD • Is AD a

candidate key for R?

= BCDA A + = A, D+ = DEC … A,D not keys, so Yes!

= BCDAE … Yes! and B is a key for R too!

• Is D a key for R?

… No! AD is a key, so ADE is a superkey, but not a candidate key D + = D D + = DE D + = DEC … Nope!

Boyce-Codd Normal Form (BCNF)

• Reln R with FDs

F

is in BCNF if, for all X  A in F + – A  X (called a

trivial

FD), or – X is a superkey for R.

• In other words: “R is in BCNF if the only non trivial FDs over R are

key constraints

.”

Third Normal Form (3NF)

• Reln R with FDs

F

A  X (called a

trivial

is in 3NF FD), or if, for all X  A in F + X is a superkey of R, or A is part of some candidate key (not superkey!) for R. (sometimes stated as “A is

prime ”

) • If R is in BCNF, obviously in 3NF.

BCNF Decomposition

• For each FD in F+ that violates BCNF, X  A • Decompose R into R-A and XA • If either R-A or XA is not in BCNF, decompose recursively • Guaranteed to be lossless but not dependency preserving

Lossless Decomposition

• •

The decomposition of R into X and Y is

lossless with respect to F

if and only if the closure of F contains:

X  Y  X  Y  X, Y

or Useful result : If W

Z holds over R and W

Z is empty, then decomposition of R into R-Z and WZ is loss-less.

Dependency Preserving Decompositions • Decomposition of R into X and Y is

dependency preserving

if (F X  F Y ) + = F + – i.e., if we consider only dependencies in the closure F + that can be checked in X without considering Y, and in Y without considering X, these imply all dependencies in F + .

• • • Important to consider F + in this definition: – ABC, A  B, B  • F+ also contains B  C, C  A, A  A, decomposed into AB and BC.

C, C  B …

F AB contains A

B and B

C

B So, (F AB

F BC) + A contains C

; F A BC contains B

C and

Example BCNF Decomposition

• CSJDPQV, candidate key C • JP  C, SD  P, J  S • Using SD  • Using J  P, we get SDP, CSJDQV S, we get JS and CJDQV • Result: SDP, JS and CJDQV . All are in BCNF • Lossless decomposition.

• Not dependency preserving. We did not preserve JP  C

Minimal Cover for a Set of FDs

Minimal cover

G for a set of FDs F: – Closure of F = closure of G.

– Right hand side of each FD in G is a single attribute.

– If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes.

• Intuitively, every FD in G is needed, and ``

as small as possible

’’ in order to get the same closure as F.

• e.g., A  B, ABCD  E, EF  GH, ACDF  EG has the following minimal cover: – A  B, ACD  E, EF  G and EF  H – Do we need ACDF EF   EG? It can be derived from ACD  G. Same for ACDF  E, ACDF  G E and

3NF Decomposition

• Decompose to BCNF • For each FD X  A in minimal cover that is not preserved – Add relation XA • Guaranteed to be lossless AND dependency preserving

Tuning the Schema

Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val) Depts (Did, Budget, Report) Suppliers (Sid, Address) Parts (Pid, Cost) Projects (Jid, Mgr) • We will concentrate on Contracts , denoted as CSJDPQV . The following ICs are given to hold:   is the primary key .

– C and JP are candidate keys – 3NF normal form

BCNF Decomposition

• Use SD  P, we get SDP and CSJDQV • Lossless but not dependency-preserving (JP  C) • Three options – Leave it in 3NF without decomposition – Create an assertion to enforce JP  C • Acceptable when updates are infrequent – Add JPC as an extra table (redundancy across relations)

Check Assertion (JP

C)

PartInfo: SDP ContractInfo: CSJDQV CREATE ASSERTION checkDep CHECK (NOT EXISTS (SELECT * Lossless join on SD FROM PartInfo PI, ContractInfo CI Group By JP WHERE PI.supplierid=CI.supplierid AND PI.deptid = CI.deptid

) GROUP BY CI.projectid, PI.partid

HAVING COUNT (cid) > 1 Count C )

Dealing with CC Hotspots

• Consider relation R: ABC • Frequent Queries: – Update B – Read C • Tuple-granularity locking • Options – Lossless decomposition to AB and AC (Not good if there are queries that reads BC) – Others?

Other stuff

• Partitioning (Vertical and Horizontal) • Physical DB Design – Choice of index • Whether to index? (Factor in costs of index maintenance) • Choice of search key(s) • Clustered / Unclustered?

• Index-only scans