Document 7200555

Download Report

Transcript Document 7200555

The Relational Data Model (Based on Chapter 5)

1

1. Relational Model Concepts BASIS OF THE MODEL

• The relational Model of Data is based on the concept of a

Relation.

A Relation is a mathematical concept based on the ideas of sets.

• The strength of the relational approach to data management comes from the formal foundation provided by the theory of relations.

2

INFORMAL DEFINITIONS •

RELATION:

A table of values • A relation may be thought of as a

set of rows.

• A relation may alternately be though of as a

columns.

set of •

Each row of the relation may be given an identifier.

• Each column typically is called by its column name or column header or attribute name.

3

FORMAL DEFINITIONS

• A

Relation

may be defined in multiple ways.

• The

Schema

of a Relation: R (A1, A2, .....An) Relation R is defined over

attributes

A1, A2, .....An

For Example CUSTOMER (Cust-id, Cust-name, Address, Phone#) Here, CUSTOMER is a relation defined over the four attributes Cust-id, Cust-name, Address, Phone#, each of which has a domain or a set of valid values.

4

For example, the

domain

• A

tuple

of Cust-id is 6 digit numbers.

is an ordered set of values • Each value is derived from an appropriate domain.

• Each row in the CUSTOMER table may be called as a tuple in the table and would consist of four values.

<632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000"> is a triple belonging to the CUSTOMER relation.

• A relation may be regarded as a set of tuples (rows).

• Columns in a table are also called as attributes of the relation.

5

FORMAL DEFINITIONS (contd..)

• The relation is formed over the cartesian product of the sets; each set has values from a domain; that domain is used in a specific role which is conveyed by the attribute name.

• For example, attribute Cust-name is defined over the domain of strings of 25 characters. The role these strings play in the CUSTOMER relation is that of the name of customers.

• Formally, Given R(A1, A2, .........., An) r(R) subset-of dom (A1) X dom (A2) X ....X dom(An) • R: schema of the relation r of R: a specific "value" or population of R.

6

R is also called the

intension

of a relation r is also called the

extension

of a relation Let S1 = {0,1} Let S2 = {a,b,c} Let R be a subset-of S1 X S2 for example: r(R) = {<0.a> , <0,b> , <1,c> } 7

Informal Terms Table Column Row Values in a column Table Definition Populated Table

DEFINITION SUMMARY

Formal Terms Relation Attribute/Domain Tuple Domain Schema of Relation Extension 8

Figure 7.1

The attributes and tuples of a relation STUDENT.

9

2 Characteristics of Relations Ordering of tuples in a relation r(R)

: The tuples are

not

considered to be ordered, even though they appear to be in the tabular form.

Ordering of attributes in a relation schema R

(and of values within each tuple): We will consider the attributes in R(A1, A2, ..., An) and the values in t= to be

ordered

.

10

Values in a tuple

: All values are considered

atomic

(indivisible). A special

null

value is used to represent values that are unknown or inapplicable to certain tuples.

Notation: We refer to

component values

of a tuple t by t[Ai] = vi (the value of attribute Ai for tuple t).

- Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of attributes Au, Av, ..., Aw, respectively.

11

Figure 7.2

The relation STUDENT from Figure 7.1, with a different order of tuples 12

3 Relational Integrity Constraints

Constraints are

conditions

that must hold on

all

valid relation instances. There are three main types of constraints: • • •

Key

constraints

Entity integrity

constraints,

Referential integrity

constraints 13

3.1 Key Constraints Superkey

of R: A set of attributes SK of R such that no two tuples

in any valid relation instance r(R)

will have the same value for SK. That is, for any distinct tuples t1 and t2 in r(R), t1[SK]

<>

t2[SK].

Key

of R: A "minimal" superkey; that is, a superkey K such that removal of any attribute from K results in a set of attributes that is not a superkey.

14

Example: The CAR relation schema: CAR(State, Reg#, SerialNo, Make, Model, Year) has two keys Key1 = {State, Reg#}, Key2 = {SerialNo}, which are also superkeys. {SerialNo, Make} is a superkey but

not

a key.

If a relation has several candidate keys , one is chosen arbitrarily to be the

primary key

. The primary key attributes are

underlined

.

15

Figure 7.4

The CAR relation with two candidate keys: LicenseNumber and EngineSerialNumber.

16

Figure 7.5

Schema diagram for the COMPANY relational database schema; the primary keys are underlined.

17

Figure 7.5

Schema diagram for the COMPANY relational database schema; the primary keys are underlined.

18

Figure 7.6

(continued) 19

3.2 Entity Integrity Relational Database Schema

: A set S of relation schemas that belong to the same database. S is the

database

.

name

of the S = {R1, R2, ..., Rn}

Entity Integrity

: The

primary key attributes

PK of each relation schema R in S cannot have null values in any tuple of r(R). This is because primary key values are used to tuples.

identify

the individual t[PK]

<>

null for any tuple t in r(R) Note: Other attributes of R may be similarly constrained to disallow null values, even though they are not members of the primary key.

20

3.3 Referential Integrity

A constraint involving

two

relations (the previous constraints involve a

single

relation).

Used to specify a

relationship

referencing relation

among tuples in two relations: the and the

referenced relation

.

Tuples in the

referencing relation

foreign key

R1 have attributes FK (called attributes) that reference the primary key attributes PK of the

referenced relation

R2. A tuple t1 in R1 is said to

reference

t2[PK].

a tuple t2 in R2 if t1[FK] = A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2.

21

Figure 7.7

Referential integrity constraints displayed on the COMPANY relational database schema diagram.

22

Figure 7.6

One possible relational database state corresponding to the company schema.

23

-

4 Update Operations on Relations

INSERT a tuple.

DELETE a tuple.

MODIFY a tuple.

Integrity constraints should not be violated by the update operations.

Several update operations may have to be grouped together.

Updates may

propagate

to cause other updates automatically. This may be necessary to maintain integrity constraints.

24

In case of integrity violation, several actions can be taken: - cancel the operation that causes the violation (REJECT optiom) - perform the operation but inform the user of the violation - trigger additional updates so the violation is corrected (CASCADE option, SET NULL option) - execute a user-specified error-correction routine 25

5 The Relational Algebra

Operations to manipulate relations.

Used to specify retrieval requests (queries).

Query result is in the form of a relation.

Relational Operations:

5.1 SELECT s and PROJECT P operations.

5.2 Set operations: These include UNION U, INTERSECTION | |, DIFFERENCE -, CARTESIAN PRODUCT X.

5.3 JOIN operations X.

5.4 Other relational operations: DIVISION, OUTER JOIN, AGGREGATE FUNCTIONS.

26

5.1 SELECT

s

and PROJECT

P

SELECT

operation (denoted by

s

): Selects the tuples (rows) from a relation R that satisfy a certain

selection condition

c Form of the operation: s c(R) The condition c is an arbitrary Boolean expression on the attributes of R Resulting relation has the

same attributes

as R Resulting relation includes each tuple in r(R) whose attribute values satisfy the condition ‘c’ 27

Examples: s DNO=4 s SALARY>30000 (EMPLOYEE) (EMPLOYEE) s (DNO=4 AND SALARY>25000) OR DNO=5 (EMPLOYEE) 28

PROJECT

operation (denoted by P ): Keeps only certain attributes (columns) from a relation R specified in an

attribute list

L Form of operation: P L (R) L Resulting relation has only those attributes of R specified in Example: P FNAME,LNAME,SALARY (EMPLOYEE) The PROJECT operation

eliminates duplicate tuples

resulting relation so that it remains a mathematical set (no in the duplicate elements) 29

Example: P SEX,SALARY (EMPLOYEE) If several male employees have salary 30000, only a single tuple is kept in the resulting relation.

Duplicate tuples are eliminated by the

P

operation

.

30

Sequences of operations

: Several operations can be combined to form a

relational algebra expression

(query) Example: Retrieve the names and salaries of employees who work in department 4: P FNAME,LNAME,SALARY ( s DNO=4(EMPLOYEE) ) -

Alternatively

, we specify explicit intermediate relations for each step: DEPT4_EMPS < s R < P DNO=4(EMPLOYEE) FNAME,LNAME,SALARY(DEPT4_EMPS) 31

Attributes can optionally be

renamed

in the resulting left-hand-side relation (this may be required for some operations that will be presented later): DEPT4_EMPS < s DNO=4 (EMPLOYEE) R(FIRSTNAME,LASTNAME,SALARY) < P FNAME,LNAME,SALARY (DEPT4_EMPS) 32

(a) s

Figure 7.8

Results of SELECT and PROJECT operations.

(DNO=4 AND SALARY>25000) OR (DNO=5 AND SALARY>30000) (EMPLOYEE).

(b)  LNAME, FNAME, SALARY (EMPLOYEE). (c)  SEX, SALARY (EMPLOYEE).

33

Class Number – CS 304 Class Name DBMS

Instructor – Sanjay Madria Lesson Title – Relational Algebra – 3rd July

34

5.2 Set Operations

Binary operations from mathematical set theory:

UNION

: R1 U R2,

INTERSECTION

: R1 | | R2,

SET DIFFERENCE

: R1 - R2,

CARTESIAN PRODUCT

: R1 X R2.

For U, | |, -, the operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must have the same number of attributes, and the domains of corresponding attributes must be compatible; that is, dom(Ai)=dom(Bi) for i=1, 2, ..., n. This condition is called

union compatibility

.

The resulting relation for U, | |, or - has the same attribute names as the

first

operand relation R1 (by convention).

35

Figure 7.10

Query result after the UNION operation: RESULT  RESULT1  RESULT2 36

Figure 7.11

Illustrating the set operations union, intersection and difference.

(b) STUDENT (a) Two union compatible relations.

 INSTRUCTOR. (c) STUDENT  INSTRUCTOR (d) STUDENT - INSTRUCTOR (e) INSTRUCTOR - STUDENT 37

CARTESIAN PRODUCT

R(A1, A2, ..., Am, B1, B2, ..., Bn) < R1(A1, A2, ..., Am) X R2 (B1, B2, ..., Bn) A tuple t exists in R for each combination of tuples t1 from R1 and t2 from R2 such that: t[A1, A2, ..., Am]=t1 and t[B1, B2, ..., Bn]=t2 - If R1 has n1 tuples and R2 has n2 tuples, then R will have n1*n2 tuples.

CARTESIAN PRODUCT can

combine related tuples

from two relations

if followed by the appropriate SELECT operation

.

38

Example: Combine each DEPARTMENT tuple with the EMPLOYEE tuple of the manager.

DEP_EMP <-DEPARTMENT X EMPLOYEE DEPT_MANAGER < s MGRSSN=SSN (DEP_EMP) 39

5.3 JOIN Operations THETA JOIN

: Similar to a CARTESIAN PRODUCT followed by a SELECT. The condition c is called a

join condition

.

R(A1, A2, ..., Am, B1, B2, ..., Bn) <-R1(A1, A2, ..., Am) X c (B1, B2, ..., Bn) Here c can be <, >, =, <=, >= R2

EQUIJOIN

: The join condition c includes one or more

equality comparisons

involving attributes from R1 and R2. That is, c is of the form: (Ai=Bj) AND ... AND (Ah=Bk); 1

join attributes

of R1 Bj, ..., Bk are called the

join attributes

of R2 40

Example of using EQUIJOIN: Retrieve each DEPARTMENT's name and its manager's name: T <-DEPARTMENT X MGRSSN=SSN RESULT < P EMPLOYEE DNAME,FNAME,LNAME (T) 41

NATURAL JOIN

(*): In an EQUIJOIN R <- R1 X c R2, the join attribute of R2 appear

redundantly

in the result relation R. In a NATURAL JOIN, the

redundant join attributes

of R2 are

eliminated

from R. The equality condition is

implied

and need not be specified.

R <- R1 *(join attributes of R1),(join attributes of R2) R2 Example: Retrieve each EMPLOYEE's name and the name of the DEPARTMENT he/she works for: T<- EMPLOYEE *(DNO),(DNUMBER) DEPARTMENT RESULT < P FNAME,LNAME,DNAME (T) 42

If the join attributes

have the same names

relations, they

need not be specified

write R <- R1 * R2.

in both and we can Example: Retrieve each EMPLOYEE's name and the name of his/her SUPERVISOR: SUPERVISOR(SUPERSSN,SFN,SLN)< P SSN,FNAME,LNAME(EMPLOYEE) T<-EMPLOYEE * SUPERVISOR RESULT < P FNAME,LNAME,SFN,SLN (T) 43

Figure 7.13

Illustrating the JOIN operation.

44

Figure 7.14

An illustration of the NATURAL JOIN operation. (a) PROJ_DEPT  PROJECT * DEPT.

(b) DEPT_LOCS  DEPARTMENT * DEPT_LOCATIONS.

45

Note: In the

original definition

join attributes were

required

both relations.

of NATURAL JOIN, the to have the same names in There can be a

more than one set of join attributes different meaning

example: with a between the same two relations. For JOIN ATTRIBUTES EMPLOYEE.SSN= RELATIONSHIP EMPLOYEE

manages

DEPARTMENT.MGRSSNthe DEPARTMENT EMPLOYEE.DNO= DEPARTMENT.DNUMBER

EMPLOYEE

works for

the DEPARTMENT 46

A relation can have a

set of join attributes itself

: to join it

with

JOIN ATTRIBUTES EMPLOYEE(1).SUPERSSN= EMPLOYEE(2).SSN

RELATIONSHIP EMPLOYEE(2)

supervises

EMPLOYEE(1) One can

think of this

as joining

two distinct copies

relation, although only one relation actually exists of the In this case,

renaming

can be useful 47

Figure 7.15

Illustrating the division operation.

(a) Dividing SSN_PNOS by SMITH_PNOS. (b) T  R  S.

48

Complete Set of Relational Algebra Operations

: All the operations discussed so far can be described as a sequence of

only

the operations SELECT, PROJECT, UNION, SET DIFFERENCE, and CARTESIAN PRODUCT. Hence, the set { s , P , U, - , X } is called a

complete set

of relational algebra operations. Any query language

equivalent to

these operations is called

relationally complete

.

For database applications, additional operations are needed that were not part of the

original

relational algebra. These include: 1. Aggregate functions and grouping.

2. OUTER JOIN.

49

5.4 Additional Relational Operations AGGREGATE FUNCTIONS

Functions such as SUM, COUNT, AVERAGE, MIN, MAX are often applied to sets of values or sets of tuples in database applications

F

(R) The grouping attributes are optional Example 1: Retrieve the average salary of all employees (no grouping needed): R(AVGSAL) <-

F

AVERAGE SALARY (EMPLOYEE) 50

Example 2: For each department, retrieve the department number, the number of employees, and the average salary (in the department): R(DNO,NUMEMPS,AVGSAL) < DNO

F

COUNT SSN , AVERAGE SALARY (EMPLOYEE) DNO is called the

grouping attribute

in the above example 51

Figure 7.16

SALARY An illustration of the AGGREGATE FUNCTION operation.

(a) R(DNO, NO_OF_EMPLOYEES, AVERAGE_SAL) (EMPLOYEE). (b) DNO (C)   COUNT SSN, AVERAGE SALARY  DNO  (EMPLOYEE).

COUNT SSN, AVERAGE COUNT SSN, AVERAGE SALARY (EMPLOYEE).

52

OUTER JOIN

In a regular EQUIJOIN or NATURAL JOIN operation, tuples in R1 or R2 that do not have matching tuples in the other relation

do not appear in the result

Some queries require all tuples in R1 (or R2 or both) to appear in the result When no matching tuples are found,

null

s are placed for the missing attributes 53

LEFT OUTER JOIN

: R1 X R2 lets every tuple in R1 appear in the result -

RIGHT OUTER JOIN

: R1 X R2 lets every tuple in R2 appear in the result -

FULL OUTER JOIN

: R1 X R2 lets every tuple in R1 or R2 appear in the result 54

Figure 7.18

The LEFT OUTER JOIN operation.

55

RENAME Operator  S(B1, B2,…Bn) (R) – Renaming relation R as S and renaming attributes of R as Bi’s.  S (R) – Renaming R as S  (B1, B2,…Bn) (R) – Renaming attributes of R as Bi’s. 56

Some Queries Q. Retrieve the SSNs of all the employees who either work in dept. 5 or supervise an employee who works in dept 5.

Dept-Emps s DNO = 5 (Employee) Result1   SSN (Dept-Emps) Result 2(SSN)   SuperSSN (Dept-Emps) Result = Result 1  Result 2 57

Q. Find for each female employee, a list of names of her dependents.

Female-Emp s SEX = F (Employee) Empname   FNAME, LNAME, SSN (Female-Emp) Emp-dep  Empname  Dependent Actual-dep s SSN = ESSN (Emp-dep) Result   FNAME, LNAME, Dependent-name (Actual-dep) Q. Find the name of the manager of each department.

Dept-mgr  Department  MGRSSN = SSN (Employee) Result   DNAME, LNAME, FNAME (Dept-mgr) 58

• List names of managers who have atleast one dependent • Find names of employees who have no dependents • List names of all employees with two or more dependents 59

Figure 7.12

An illustration of the CARTESIAN PRODUCT operation.

60

Figure 7.12 (continued) 61

The

Insert

Operation

1. Insert <‘Cecilia’, ‘F”, “Kolonsky’, null, ‘1960-04 05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, null, 4> into EMPLOYEE This insertion violates the entity integrity constraint (null for the primary key SSN), so it is rejected.

2. Insert <‘Alicia’, ‘J’, ‘Zelaya’, 999887777, 1960 04-05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, 987654321’, 4> into EMPLOYEE This insertion violates the key constraint because another tuple with the same SSN Value already exists in the EMPLOYEE relation, and so it is rejected.

62

The

Insert

Operation (contd.)

3. Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘67768989’, 1960 04-05’, ‘6357 Windswept, katy, TX’, F, 28000, ‘987654321’, 7> into EMPLOYEE This insertion violates the referential integrity constraint specified on DNO because no DEPARTMENT tuple exists with DNUMBER = 7 4. Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘67768989’, 1960 04-05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, null, 4> into EMPLOYEE.

This insertion satisfies all constraints, so it is acceptable.

63

The

Delete

Operation

1. Delete the WORKS_ON tuple with ESSN = ‘999887777’ and PNO = 10.

This deletion is acceptable 2. Delete the EMPLOYEE tuple with SSN = ‘999887777’ This deletion is not acceptable, because tuples in WORKS_ON refer to this tuple. Hence, if the tuple is deleted, referential integrity violations will result.

64

The

Delete

Operation (contd.)

3. Delete the EMPLOYEE tuple with SSN= ‘333445555’ This deletion will result in even worse referential integrity violations, because the tuple involved is referenced by tuples from the EMPLOYEE, DEPARTMENT, WORKS_ON and DEPENDENT relations.

• Options • Reject deletion • Propagate deletion by deleting other tuples • Modify the referencing attributes that cause the violations • Or combination of three above 65

The

Update

Operation

1. Update the SALARY of the EMPLOYEE tuple with SSN = ‘999887777; to 28000 Acceptable 2. Update the DNO of the EMPLOYEE tuple with SSN = ‘999887777’ to 1 Acceptable 3. Update the DNO of the EMPLOYEE tuple with SSN = ‘999887777’ to 7 Unacceptable, because it violates referential integrity.

66

The

Update

Operation (contd.)

4. Update the SSN of the EMPLOYEE tuple with SSN = ‘999887777’ to ‘987654321’ Unacceptable, because it violates primary key and referential integrity constraints.

Updating an attribute which is neither primary key or foreign key usually causes no problems, new value should be of correct data type and domain. 67