
Relational Algebra - Chapter 6.1-6.5

Relational Algebra

• Theoretical basis for SQL (E.F. Codd)
• Relational algebra (algebraic notation) and relational calculus (logical notation)
• Created to demonstrate the potential for a query language of the relational model
• Algebra and calculus are equivalent in expressive power
• Can represent complex queries compactly, but too mathematical for the average person

What is an “Algebra”?

• Mathematical system consisting of:
– Operands --- variables or values from which new values can be constructed.

– Operators --- symbols denoting procedures that construct new values from given values.

What does it do?

• Provide DML and DDL
• In relational algebra, a series of operations is combined to form a relational algebra expression (query)

Set Theoretic Operations

Set theoretic operations

• Union, Intersection, Difference - Binary
– Applied to 2 sets (relations) - no duplicates in result - mathematical set
– Must be same type of tuples - Compatibility
  • same degree n
  • dom(Ai) = dom(Bi)
– Fig. 6.4
– Resulting relation - same attribute names as first relation
– Which operations are:
  • Commutative?
    – R ∪ S = S ∪ R, R ∩ S = S ∩ R
    – R - S = S - R ?
  • Associative?
    – R ∪ (S ∪ T) = (R ∪ S) ∪ T, R ∩ (S ∩ T) = (R ∩ S) ∩ T
    – (R - S) - T ?
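One way to explore the commutativity and associativity questions is to model relations as Python sets of tuples and compare both sides directly. This is only a sketch: the relations R, S and T and their contents are made-up sample data, and a single example can disprove a law but cannot prove one.

```python
# Relations modeled as Python sets of tuples; the data is illustrative only.
R = {(1, "a"), (2, "b")}
S = {(2, "b"), (3, "c")}
T = {(2, "b"), (4, "d")}

# Commutativity checks: does swapping the operands change the result?
union_commutes = (R | S) == (S | R)
intersection_commutes = (R & S) == (S & R)
difference_commutes = (R - S) == (S - R)

# Associativity checks: does regrouping the operands change the result?
union_associates = (R | (S | T)) == ((R | S) | T)
intersection_associates = (R & (S & T)) == ((R & S) & T)
difference_associates = ((R - S) - T) == (R - (S - T))

print(union_commutes, intersection_commutes, difference_commutes)
print(union_associates, intersection_associates, difference_associates)
```

For this sample data the union and intersection checks hold, while both difference checks fail.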

Cartesian Product X - Binary

• Also binary, but does not require union compatibility
– R(A1, A2, …, An) X S(B1, B2, …, Bm)
• Creates a tuple with the combined attributes of 2 tables
– Q(A1, A2, …, An, B1, B2, …, Bm)
• Fig. 6.5
• Degree of resulting relation?
– n+m
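The degree rule can be seen in a small sketch that builds the Cartesian product with `itertools.product`. The two relations below are hypothetical sample data with degrees n = 2 and m = 3.

```python
from itertools import product

# Hypothetical relations: R has degree n = 2, S has degree m = 3.
R = [(1, "x"), (2, "y")]
S = [("a", 10, True), ("b", 20, False), ("c", 30, True)]

# Each result tuple is an R tuple concatenated with an S tuple.
Q = [r + s for r, s in product(R, S)]

degree = len(Q[0])      # n + m attributes per tuple
cardinality = len(Q)    # |R| * |S| tuples in total

print(degree, cardinality)
```

Every tuple of R is paired with every tuple of S, so the number of result tuples is the product of the two cardinalities.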

Relational Algebra Operations

Select Operation

σ
• σ - unary operation (WHERE in SQL)
• A subset of tuples satisfying a selection condition
• Selects rows
• Equivalent to select condition in WHERE clause
• Form of operation: σ <selection condition> (R)
σ dno=4 (Employee)
σ salary>30000 (Employee)

Select Operation

• Select condition is a Boolean expression
– comparison op - =, <, <=, etc.
• You can use boolean conditions to connect clauses
• Can combine cascade of selects into single select with ANDs
σ (dno=4 and salary>25000) or (dno=5 and salary>30000) (Employee)
Fig. 6.1

Select Operation

• Degree of resulting relation?
• Selectivity - the fraction of tuples selected
– number of tuples ≤ total tuples
• Is Select commutative?
σc1 (σc2(R)) = σc2 (σc1(R))
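The select properties above can be sketched in Python by treating a relation as a list of dictionaries and a condition as a predicate function. The `select` helper and the Employee rows are assumptions made for illustration, not part of the slides.

```python
# Hypothetical Employee relation; attribute names follow the slides.
Employee = [
    {"lname": "Smith", "dno": 5, "salary": 30000},
    {"lname": "Wong", "dno": 5, "salary": 40000},
    {"lname": "Wallace", "dno": 4, "salary": 43000},
    {"lname": "Zelaya", "dno": 4, "salary": 25000},
]

def select(condition, relation):
    """Keep only the tuples for which the condition is true."""
    return [t for t in relation if condition(t)]

c1 = lambda t: t["dno"] == 4
c2 = lambda t: t["salary"] > 30000

a = select(c1, select(c2, Employee))   # apply c2 first, then c1
b = select(c2, select(c1, Employee))   # apply c1 first, then c2

# A cascade of selects collapses into one select with AND.
combined = select(lambda t: c1(t) and c2(t), Employee)

# Selectivity: fraction of the original tuples that survive.
selectivity = len(combined) / len(Employee)

print(a == b, combined == a, selectivity)
```

Select never changes the attributes of a tuple, so the result has the same degree as the input relation.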

Project Operation

π
• π - unary operation
• Equivalent to the SELECT clause in SQL
• Keeps only certain attributes (columns) from a relation
• Selects columns
• Form of operation: π <attr_list> (R)
π fname, lname, salary (Employee)
Fig 6.1
• Resulting relation has only those attributes specified
• Degree of relation?
– number of attributes in attr_list

Project Operation

• The project operation eliminates duplicate tuples in the resulting relation so that it remains a mathematical set
π sex, salary (Employee)
Fig 6.1
• If several male employees have salary $30,000, only a single tuple is kept in the resulting relation.
• Is the project operation commutative?
– No, why?
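Duplicate elimination in project can be illustrated with a short sketch. The `project` helper and the Employee rows are hypothetical; the helper keeps the first occurrence of each projected tuple so the result stays a set.

```python
# Hypothetical Employee relation: three male employees share one salary.
Employee = [
    {"fname": "John", "sex": "M", "salary": 30000},
    {"fname": "Franklin", "sex": "M", "salary": 30000},
    {"fname": "Ahmad", "sex": "M", "salary": 30000},
    {"fname": "Alicia", "sex": "F", "salary": 25000},
]

def project(attrs, relation):
    """Keep only the listed attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:          # duplicate tuples are kept only once
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result

result = project(["sex", "salary"], Employee)
print(result)
```

Four input tuples collapse to two, and the degree of the result equals the length of the attribute list.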

Sequences of operations

• Several relational algebra operations can be combined to form a relational algebra expression (query).
• Retrieve the names and salaries of employees who work in department 5.
Q ← π fname, lname, salary (σ dno=5 (Employee))
• Alternatively, explicit intermediate relations can be specified for each step:
Dept5 ← σ dno=5 (Employee)
R ← π fname, lname, salary (Dept5)
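The nested and step-by-step forms of the department 5 query can be compared in a minimal sketch. The `select` and `project` helpers and the sample rows are assumptions for illustration only.

```python
# Hypothetical Employee relation.
Employee = [
    {"fname": "John", "lname": "Smith", "salary": 30000, "dno": 5},
    {"fname": "Jennifer", "lname": "Wallace", "salary": 43000, "dno": 4},
    {"fname": "Franklin", "lname": "Wong", "salary": 40000, "dno": 5},
]

def select(condition, relation):
    return [t for t in relation if condition(t)]

def project(attrs, relation):
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:        # keep the result a set
            result.append(row)
    return result

# Single nested expression.
Q = project(["fname", "lname", "salary"],
            select(lambda t: t["dno"] == 5, Employee))

# Same query with an explicit intermediate relation.
Dept5 = select(lambda t: t["dno"] == 5, Employee)
R = project(["fname", "lname", "salary"], Dept5)

print(Q == R, Q)
```

Both forms produce the same relation; the intermediate name only makes the steps easier to read.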

Write the following in Relational Algebra

• Select * from Employee
• Select bdate from Employee
• Select * from Employee where sex='F'

Write the following in Relational Algebra

• List SSN of employees who do not have dependents
(Select ssn from employee) minus (Select essn from dependent)
• Now write SSN, lname of employees who do not have dependents

Renaming

• Attributes can optionally be renamed in the resulting relation:
Dept5 ← σ dno=5 (Employee)
T(firstname, lastname, salary) ← π fname, lname, salary (Dept5)
Fig. 6.2
• Alternative notation in textbook, can rename attributes and/or table
ρ R(firstname, lastname, salary) (π fname, lname, salary (Dept5))

Operations

• DML Operations:
– set theory operations
  • Union
  • Intersection
  • Difference
– relational DB operations
  • Select
  • Project
  • Anything else?
  • Join

The Join operation |X|

• Similar to a Cartesian Product followed by a select
• Form of operation: R |X| <join condition> S
• Result is: Q(A1, A2, …, An, B1, B2, …, Bm)
• A1, A2, … are the attributes of R
• B1, B2, … are the attributes of S
• For all tuples that satisfy the join condition
• join condition: <condition> and <condition> and … <condition>
• Fig. 6.6

Resulting number of tuples?

Different types of joins - theta join, natural join, equijoin
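The description of join as a Cartesian product followed by a select can be written out directly. In this sketch the helper names and the Department and Employee rows are hypothetical, and the two relations are assumed to have no attribute names in common.

```python
from itertools import product

# Hypothetical relations with disjoint attribute names.
Department = [
    {"dname": "Research", "dnumber": 5, "mgrssn": "333"},
    {"dname": "Administration", "dnumber": 4, "mgrssn": "987"},
]
Employee = [
    {"ssn": "123", "lname": "Smith"},
    {"ssn": "333", "lname": "Wong"},
    {"ssn": "987", "lname": "Wallace"},
]

def cartesian_product(R, S):
    """Pair every tuple of R with every tuple of S."""
    return [{**r, **s} for r, s in product(R, S)]

def select(condition, relation):
    return [t for t in relation if condition(t)]

def join(R, S, condition):
    """Join expressed as a select applied to the Cartesian product."""
    return select(condition, cartesian_product(R, S))

all_pairs = cartesian_product(Department, Employee)
T = join(Department, Employee, lambda t: t["mgrssn"] == t["ssn"])
managers = [(t["dname"], t["lname"]) for t in T]

print(len(all_pairs), len(T), managers)
```

The join result can hold anywhere from zero tuples up to the full size of the Cartesian product, depending on how many pairs satisfy the condition.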

Equijoin

• R |X| Ai=Bi S
• requires identical values in every tuple for each pair of join attributes (one or more equality comparisons)
• Join conditions are all of the form Ai = Bi and Aj = Bj …
• Retrieve each department's name and manager's name.
T ← Department |X| mgrssn=ssn Employee
Result ← π dname, fname, lname (T)

Theta Join

• R |X| Ai θ Bi S
• where the join condition is of the form: Ai θ Bi
• θ is a comparison operator: =, <, etc.
• Example:
Scholarship(SName, GPA_Req, Desc)
Student(Name, CWID, GPA, Major)
Select Name, SName From Student, Scholarship Where GPA >= GPA_Req
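The scholarship example can be sketched as a theta join where the comparison operator is passed in as a parameter. The `theta_join` helper and the Student and Scholarship rows are made up for illustration; `operator.ge` stands in for >=.

```python
import operator
from itertools import product

# Hypothetical sample data for the two relations in the example.
Student = [
    {"Name": "Ann", "GPA": 3.9},
    {"Name": "Bob", "GPA": 3.6},
    {"Name": "Cy", "GPA": 3.0},
]
Scholarship = [
    {"SName": "Merit", "GPA_Req": 3.5},
    {"SName": "Honors", "GPA_Req": 3.8},
]

def theta_join(R, a, theta, S, b):
    """Combine tuples r, s whenever r[a] theta s[b] holds."""
    return [{**r, **s} for r, s in product(R, S) if theta(r[a], s[b])]

joined = theta_join(Student, "GPA", operator.ge, Scholarship, "GPA_Req")
eligible = [(t["Name"], t["SName"]) for t in joined]

print(eligible)
```

Passing `operator.eq` instead would turn the same helper into an equijoin.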

Natural join

• We will use the * notation (some others use |X| without subscript)
• Like an equijoin, except attributes for the equijoin in the second relation are deleted from result (Why have 2 columns with the same value?)
• Q ← R * (<list1>),(<list2>) S
• Fig 6.7
• Equivalent to equijoin but keep only list1
• If attributes have the same name in both relations, list1 and list2 are not needed.
• In the original definition of natural join, the join attributes are required to have the same names in both relations.
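A natural join over same-named attributes can be sketched as follows. The `natural_join` helper and the sample rows are assumptions; the helper joins on every attribute name the two relations share and keeps a single copy of each.

```python
# Hypothetical relations sharing the attribute name dnumber.
Department = [
    {"dname": "Research", "dnumber": 5},
    {"dname": "Headquarters", "dnumber": 1},
]
Dept_locations = [
    {"dnumber": 5, "dlocation": "Houston"},
    {"dnumber": 5, "dlocation": "Bellaire"},
    {"dnumber": 1, "dlocation": "Houston"},
]

def natural_join(R, S):
    """Equijoin on all shared attribute names, keeping one copy of each."""
    if not R or not S:
        return []
    common = [a for a in R[0] if a in S[0]]
    return [
        {**r, **s}                     # merging dicts keeps one dnumber column
        for r in R
        for s in S
        if all(r[a] == s[a] for a in common)
    ]

Q = natural_join(Department, Dept_locations)
print(Q)
```

An equijoin would carry both join columns; here each result tuple has three attributes instead of four.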

Order of precedence

• Unary:
– Select, project (highest precedence)
• Binary:
– Joins, Cartesian product
– Intersection
– Union, minus
• Use lots of parentheses!

Write the following in Relational Algebra

• Select * from Employee, Department where dno=dnumber
• List employee SSNs who are female and work for the research department
• Select * From Employee, dept_locations Where dno = dnumber and dlocation = 'Houston'
• List locations of 'Research' department (use natural join)
• List SSN, lname of employees who do not have dependents

When renaming is needed

• A relation can have a set of join attributes with itself
• List all employee names and their supervisor names
S(soc, first, last) ← π ssn, fname, lname (Employee)
Temp ← Employee |X| superssn=soc S
Result ← π fname, lname, first, last (Temp)
• Usually, don't see qualification of attributes in relational algebra

Complete Set of Relational Algebra Operations

{σ, π, ∪, -, ×}
All other relational algebra operations can be expressed as a sequence of operations from this set. Other operations are for convenience.
R |X| <condition> S = σ <condition> (R X S)
R ∩ S = (R ∪ S) - ((R - S) ∪ (S - R))
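The claim that intersection is expressible with union and difference alone can be spot-checked with Python sets. The two relations are hypothetical sample data, and a check on one example is a sanity test rather than a proof.

```python
# Hypothetical union-compatible relations as sets of tuples.
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (3, "c"), (4, "d")}

# Intersection computed directly.
direct = R & S

# Intersection rebuilt from union and difference only.
rebuilt = (R | S) - ((R - S) | (S - R))

print(direct == rebuilt, sorted(rebuilt))
```

The inner union collects the tuples that belong to exactly one relation; removing them from the full union leaves the tuples common to both.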

• Do we need anymore relational algebra operations to satisfy queries?

• How about?

Select COUNT(*) From Project
Select pname, COUNT(ssn) From Project, Works_on Where pnumber=pno Group by pname

Additional relation algebra Operations

• Aggregate function - SUM, COUNT, AVG, MIN, MAX
[<grouping attributes>] ℱ <function list> (R)
R ← ℱ count ssn, avg salary (Employee)
• The following uses the optional grouping attribute
R ← dno ℱ count ssn, avg salary (Employee)
Fig. 6.9

• The attributes returned from an aggregate function are the attributes in the function list and any grouping attributes listed
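Grouping and aggregation can be sketched with a small helper. The `aggregate` function, its argument layout, the output attribute names and the Employee rows are all assumptions made for this illustration.

```python
from collections import defaultdict

# Hypothetical Employee relation.
Employee = [
    {"ssn": "1", "dno": 5, "salary": 30000},
    {"ssn": "2", "dno": 5, "salary": 40000},
    {"ssn": "3", "dno": 4, "salary": 20000},
]

def aggregate(relation, group_attrs, functions):
    """Group tuples by group_attrs, then apply each (attribute, function) pair."""
    groups = defaultdict(list)
    for t in relation:
        groups[tuple(t[a] for a in group_attrs)].append(t)
    result = []
    for key, rows in groups.items():
        out = dict(zip(group_attrs, key))        # grouping attributes come first
        for name, (attr, fn) in functions.items():
            out[name] = fn([r[attr] for r in rows])
        result.append(out)
    return result

functions = {
    "count_ssn": ("ssn", len),
    "avg_salary": ("salary", lambda v: sum(v) / len(v)),
}

overall = aggregate(Employee, [], functions)        # no grouping attribute
by_dept = aggregate(Employee, ["dno"], functions)   # grouped by dno

print(overall)
print(by_dept)
```

Without grouping attributes the whole relation forms a single group; with dno as the grouping attribute there is one result tuple per department.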

Outer Join

• Extension of join and union
• In a regular equijoin or natural join, tuples in R1 or R2 that do not have matching tuples in the other relation do not appear in the result.
• Some queries require all tuples in R1 (or R2 or both) to appear in the result
• When no matching tuples are found, nulls are placed for the missing attributes.

Outer Join

• Left outer join: R1 ]X| R2 keeps every tuple in R1 in result.
• List all employees and if they are a manager, list dname
Temp ← (Employee ]X| ssn=mgrssn Department)
R ← π fname, minit, lname, dname (Temp)
Fig. 6.11
• Right outer join: R1 |X[ R2 keeps every tuple in R2 in result.
• Full outer join: R1 ]X[ R2 keeps every tuple in R1 and R2 in result.
• Think about how this is different from R1 X R2.
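A left outer join can be sketched by padding unmatched left tuples with nulls, represented here by Python's `None`. The `left_outer_join` helper and the sample rows are hypothetical.

```python
# Hypothetical relations: only Wong manages a department.
Employee = [
    {"ssn": "123", "lname": "Smith"},
    {"ssn": "333", "lname": "Wong"},
]
Department = [
    {"dname": "Research", "mgrssn": "333"},
]

def left_outer_join(R, S, condition, s_attrs):
    """Keep every tuple of R; pad with None when no S tuple matches."""
    result = []
    for r in R:
        matches = [s for s in S if condition(r, s)]
        if matches:
            result.extend({**r, **s} for s in matches)
        else:
            padding = {a: None for a in s_attrs}   # nulls for missing attributes
            result.append({**r, **padding})
    return result

Temp = left_outer_join(
    Employee, Department,
    lambda e, d: e["ssn"] == d["mgrssn"],
    ["dname", "mgrssn"],
)
R = [(t["lname"], t["dname"]) for t in Temp]

print(R)
```

Smith still appears in the result, with a null department name, whereas a regular join would drop that tuple.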

Division operation

• Part of original relational algebra
• T(Y) = R(Z) ÷ S(X)
• tuple t is in result if t is in R for every tuple in S
• More generally, result is a relation T(Y) that includes t if t appears in R with the value of X for every tuple in S. Fig. 6.8
• The attributes Y in table T = attributes of R in Z - attributes of S in X, where Y is the set of attributes in R not in S.
Result ← R ÷ S

Division operation

• For example, retrieve the ssn of employees who work on all projects John Smith works on.

smith_pnos ← π pno (Works_on |X| ssn=essn σ fname='John' and lname='Smith' (Employee))
ssn_pnos ← π essn, pno (Works_on)
ssns ← ssn_pnos ÷ smith_pnos
// if wanted names would have to do another join
results ← π fname, lname (ssns |X| ssn=essn Employee)

// Using minus
Select ssn from Employee where lname <> 'Smith' and not exists (
(Select pno from Works_on, Employee where ssn=essn and lname = 'Smith')
minus
(Select pno from Works_on where ssn=essn));
// if result of minus is empty, work on same projects
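The division example and its minus formulation can be compared in a short sketch. The helper names and the (essn, pno) pairs are made up for illustration; R is a set of (y, x) pairs and S is a set of x values.

```python
# Hypothetical data: (essn, pno) pairs and the projects John Smith works on.
ssn_pnos = {
    ("123", 1), ("123", 2),
    ("453", 1), ("453", 2), ("453", 3),
    ("999", 1),
}
smith_pnos = {1, 2}

def divide(R, S):
    """Return every y that is paired in R with all x values in S."""
    candidates = {y for (y, _) in R}
    return {y for y in candidates if all((y, x) in R for x in S)}

def divide_with_minus(R, S):
    """Same result: y qualifies when S minus y's own x values is empty."""
    candidates = {y for (y, _) in R}
    return {
        y for y in candidates
        if not (S - {x for (y2, x) in R if y2 == y})
    }

ssns = divide(ssn_pnos, smith_pnos)
print(sorted(ssns), ssns == divide_with_minus(ssn_pnos, smith_pnos))
```

Employee 999 works on only one of the two projects, so the difference is not empty and that ssn is excluded.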

Write the following in Relational Algebra

• For every project located in 'Stafford' list the project number, the controlling department number and department manager's last name, address and birthdate.

Select pnumber, dnum, lname, bdate, address From Project, Department, Employee Where dnum = dnumber and mgrssn = ssn and plocation = 'Stafford'

Write in Relational Algebra

• For each project on which more than two employees work, retrieve the project number, project name, and the number of employees who work on that project.

Select pnumber, pname, COUNT(*) From Project, Works_on Where pnumber =pno Group By pnumber, pname Having COUNT(*) > 2

Write the following in Relational Algebra

• Compute the average number of dependents over employees with dependents
• Think about how you would write this:
Select * From Employee Where salary > all (Select salary From Employee Where sex = 'F')

DDL - Also provided

• Declare Schema for database
• Declare Relation for Schema
• Insert into Relation
• Delete Relation tuple with specified condition
• Modify col. of Relation tuple with specified condition