Transcript Document

Concepts of DB(3)
Relational Operation & SQL
한국기술대학교
인터넷미디어공학부
민준기
Spring, 2006
KUT
Page 1
 Relational Data Operation
• Procedural language
– The user instructs the system to perform a sequence of
operations on the database to compute the desired
result what & how
– Relational algebra
• Nonprocedural language
– The user describes the information desired without a specific
procedure for obtaining the information  what
– Relational calculus
• 1.Tuple Relational Calculus
• 2.Domain Relational Calculus
• Relational Algebra and Relational Calculus
have same expression/computing power
Spring, 2006
KUT
Page 2
2
 Relational Algebra
•
Relational Algebra consists of several groups of operations
– Relational Algebra Operations From Set Theory
• UNION (  )
• INTERSECTION (  )
• DIFFERENCE (or MINUS, – )
• CARTESIAN PRODUCT ( x )
– Relational Operations
• Unary Relational Operations
– SELECT (symbol:  (sigma))
– PROJECT (symbol:  (pi))
– RENAME (symbol:  (rho))
• Binary Relational Operations
– JOIN (several variations of JOIN exist,
– DIVISION ( ÷ )
)
– Additional Relational Operations
• OUTER JOINS,
• OUTER UNION
• AGGREGATE FUNCTIONS (These compute summary of information: for
example, SUM, COUNT, AVG, MIN, MAX)
•
closure property
– Operand and operation results are relation
– Support nested expressions
Spring, 2006
KUT
Page 3
3
UNION
INTERSECT
DIFFERENCE
10
PRODUCT
RESTRICTION
DIVIDE
a
x
a
x
a
x
x
a
b
y
a
y
a
y
y
b
b
x
b
x
b
y
b
y
c
x
c
x
c
y
c
y
c
c
PROJECTION
JOIN
a1
b1
b1
c1
a1
b1
c1
a2
b1
b2
c2
a2
b1
c1
a3
b2
b3
c3
a3
b2
c2
Spring, 2006
KUT
Page 4
Relational Algebra Operations
From Set Theory
ⅰ. union,∪
R∪S = { t | t∈R ∨ t∈S }
|R∪S| ≤ |R| + |S|
ⅱ. intersect,∩
R∩S = { t | t∈R ∧ t∈S }
|R∩S| ≤ min{ |R|, |S| }
ⅲ. difference,RS = { t | t∈R ∧ t
|RS| ≤ |R|
S}
ⅳ. Cartesian product,×
R×S = { r·s | r∈R ∧ s∈S }
· : concatenation
|R×S| = |R|×|S|
degree = R’s degree + S’s degree
Spring, 2006
KUT
Page 5
5
▶Relational Operations
• Relation : R(X) = R(A1, ... , An)
• R’s tuple r : <a1, ... , an>
R={r | r = <a1, ... , an> }
– ai : tuple r’s attribute Ai value
– ai = r.Ai = r[Ai]
• In general,
– <r.A1 , r.A2 ,…, r.An > = < r[A1], r[A2], …, r[An] >
= r[A1, A2, … An] = r[X]
Spring, 2006
KUT
Page 6
6
Unary Relational Operations: SELECT
•
The SELECT operation (denoted by  (sigma)) is used to select a
subset of the tuples from a relation based on a selection condition.
– The selection condition acts as a filter
– Keeps only those tuples that satisfy the qualifying condition
– Tuples satisfying the condition are selected whereas the
other tuples are discarded (filtered out)  horizontal subset
•
Av(R) = { r | r∈R ∧ r.Aθv }
AB(R) = { r | r∈R ∧ r.Aθr.B }
where, θ(theta) : <, >, ≤, ≥, =, ≠
Examples:
– Select the EMPLOYEE tuples whose department number is 4:
 DNO = 4 (EMPLOYEE)
– Select the employee tuples whose salary is greater than
$30,000:
 SALARY > 30,000 (EMPLOYEE)
Spring, 2006
KUT
Page 7
Slide 6- 7
Unary Relational Operations: SELECT
(contd.)
• SELECT Operation Properties
– The SELECT operation  <selection condition>(R) produces a relation S
that has the same schema (same attributes) as R
– SELECT  is commutative:
  <condition1>( < condition2> (R)) =  <condition2> ( < condition1> (R))
– Because of commutativity property, a cascade (sequence) of
SELECT operations may be applied in any order:
 <cond1>(<cond2> (<cond3> (R)) = <cond2> (<cond3> (<cond1> ( R)))
– A cascade of SELECT operations may be replaced by a single
selection with a conjunction of all the conditions:
 <cond1>(< cond2> (<cond3>(R)) =  <cond1> AND < cond2> AND < cond3>(R)))
– The number of tuples in the result of a SELECT is less than (or
equal to) the number of tuples in the input relation R
– Selectivity
Spring, 2006
KUT
Page 8
Slide 6- 8
Join
• 세타조인 (theta-join)
R(X), S(Y), A∈X, B∈Y 에 대하여
R AθB S = { r · s | r∈R ∧ s∈S ∧ ( r.Aθs.B) }
– A, B : joining attribute
– 결과 차수 = R의 차수 + S의 차수
• example
– 학생
학번=학번
등록
• 동일조인 (equi-join)
세타조인에서 θ가 "="인 경우
R A=BS = { r·s | r∈R ∧ s∈S ∧ ( r.A=s.B ) }
Spring, 2006
KUT
Page 9
Unary Relational Operations: PROJECT
• PROJECT Operation is denoted by  (pi)
• In Relation R(X),
if Y⊆X and Y={B1,B2, … ,Bm},
Y(R)={ <r.B1, ... , r.Bm> | r∈R }
 vertical subset
• This operation keeps certain columns (attributes)
from a relation and discards the other columns.
– PROJECT creates a vertical partitioning
• The list of specified columns (attributes) is kept in each tuple
• The other attributes in each tuple are discarded
• Example: To list each employee’s first and last
name and salary, the following is used:
LNAME, FNAME,SALARY(EMPLOYEE)
Spring, 2006
KUT
Page 10
Slide 6- 10
Binary Relational Operations: JOIN
• JOIN Operation (denoted by
)
– The sequence of CARTESIAN PRODECT followed by
SELECT is used quite commonly to identify and select
related tuples from two relations
– A special operation, called JOIN combines this
sequence into a single operation
– This operation is very important for any relational
database with more than a single relation, because it
allows us combine related tuples from various relations
– The general form of a join operation on two relations
R(A1, A2, . . ., An) and S(B1, B2, . . ., Bm) is:
R <join condition>S
– where R and S can be any relations that result from
general relational algebra expressions.
Spring, 2006
KUT
Page 11
Slide 6- 11
Some properties of JOIN
• Consider the following JOIN operation:
– R(A1, A2, . . ., An)
S(B1, B2, . . ., Bm)
R.Ai=S.Bj
– Result is a relation Q with degree n + m attributes:
• Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
– The resulting relation state has one tuple for each
combination of tuples—r from R and s from S, but only
if they satisfy the join condition r[Ai]=s[Bj]
– Hence, if R has nR tuples, and S has nS tuples, then the
join result will generally have less than nR * nS tuples.
– Only related tuples (based on the join condition) will
appear in the result
Spring, 2006
KUT
Page 12
Slide 6- 12
Theta JOIN
For R(X), S(Y), A∈X, B∈Y ,
R AθB S = { r · s | r∈R ∧ s∈S ∧ ( r.Aθs.B) }
– A, B : join attribute
– θ can be any general boolean expression on the
attributes of R and S; for example:
• R.Ai<S.Bj AND (R.Ak=S.Bl OR R.Ap<S.Bq)
Spring, 2006
KUT
Page 13
13
NATURAL JOIN Operation
•
NATURAL JOIN Operation
– Another variation of JOIN called NATURAL JOIN — denoted by N— was
created to get rid of the second (superfluous) attribute in an EQUIJOIN
condition.
• because one of each pair of attributes with identical values is superfluous
• natural join:
N)
If R(X), S(Y)’s join attribute is Z(=X∩Y)
R NS
= {<r · s>[X∪Y] | r∈R∧s∈S∧r[Z]=s[Z] }
= X∪Y( Z=Z(R×S))
= X∪Y(R Z=ZS)
– The standard definition of natural join requires that the two join attributes, or
each pair of corresponding join attributes, have the same name in both
relations
Spring, 2006
KUT
Page 14
Slide 6- 14
NATURAL JOIN Operation
• example: Q  R(A,B,C,D)
N
S(C,D,E)
– The implicit join condition includes each pair of attributes with
the same name, “AND”ed together:
• R.C=S.C AND R.D.S.D
– Result keeps only one attribute of each such pair:
• Q(A,B,C,D,E)
Spring, 2006
KUT
Page 15
DIVISION: ÷ (1)
• DIVISION Operation
– The division operation is applied to two relations R(X), S(Y)
–
R÷S={ t | t∈ D(R) ∧ t · s∈R for all s∈S }, where Y X. Let D =
X-Y(and hence X =D  Y); that is, let D be the set of attributes of R
that are not attributes of S.
– The result of DIVISION is a relation T(Y) that includes a tuple t if
tuples tR appear in R with tR [Y] = t, and with
• tR [X] = ts for every tuple ts in S.
– For a tuple t to appear in the result T of the DIVISION, the values in
t must appear in R in combination with every tuple in S.
• Note : ((R ÷ S) × S) ⊆ R
Spring, 2006
KUT
Page 16
16
학과목(SC)
학번
(SNO)
100
100
200
300
300
300
400
400
400
400
500
Spring, 2006
과목번호
(CNO)
C413
E412
C123
C312
C324
C413
C312
C324
C413
E412
C312
과목1(C1)
과목2(C2)
과목3(C3)
과목번호
(CNO)
C413
과목번호
(CNO)
C312
C413
과목번호
(CNO)
C312
C413
E412
SC ÷ C1
SC ÷ C2
SC ÷ C3
학번
(SNO)
100
300
400
학번
(SNO)
300
400
KUT
학번
(SNO)
400
Page 17
Extension of Relational Algebra(1)
ⅰ. semijoin:
–R
S: R’s tuples that can be natual join with S
• Let R(X), S(Y)’s join attribute be Z(=X∩Y),
R
S=R
N( Z(S))
=  X(R
NS)
• Property
–R
–R
Spring, 2006
S≠S R
S)
NS = (R
NS
= (S
R)
KUT
NR
Page 18
18
 Natual Join & SemiJoin
R
X∩Y(S)
S
A
B
C
B
C
D
B
C
a1
a2
a3
a4
b1
b1
b1
b2
c1
c1
c2
c3
b1
b1
b2
c1
c1
c3
d1
d2
d3
b1
b2
c1
c3
R
N
S
A
N
B
C
D
a1
a1
a2
a2
a4
b1
b1
b1
b1
b2
c1
c1
c1
c1
c3
d1
d2
d1
d2
d3
N
N
R
S
A
B
C
a1
a2
a4
b1
b1
b2
c1
c1
c3
(세미 조인)
(자연 조인)
Spring, 2006
KUT
Page 19
19
Extension of Relational Algebra(2)
• The OUTER JOIN Operation
– In NATURAL JOIN and EQUIJOIN, tuples without a
matching (or related) tuple are eliminated from the join
result
• Tuples with null in the join attributes are also eliminated
• This amounts to loss of information.
– A set of operations, called OUTER joins, can be used
when we want to keep all the tuples in R, or all those in
S, or all those in both relations in the result of the join,
regardless of whether or not they have matching tuples
in the other relation.
ⅱ. outerjoin,
Spring, 2006
+
KUT
Page 20
20
Extension of Relational Algebra(3)
• The left outer join operation keeps every tuple
in the first or left relation R in R
S; if no
matching tuple is found in S, then the
attributes of S in the join result are filled or
“padded” with null values.
• A similar operation, right outer join, keeps
every tuple in the second or right relation S in
the result of R
S.
• A third operation, full outer join, denoted by
or + , keeps all tuples in both the left and
the right relations when no matching tuples
are found, padding them with null values as
needed.
Spring, 2006
KUT
Page 21
 Natural Join and Outer Join
S
R
A
B
C
B
C
D
a1
a2
a3
a4
b1
b1
b1
b2
c1
c1
c2
c3
b1
b1
b2
b3
c1
c1
c3
c3
d1
d2
d3
d3
+
+S
R
Spring, 2006
N
A
B
C
D
a1
a1
a2
a2
a3
a4
b1
b1
b1
b1
b1
b2
b3
c1
c1
c1
c1
c2
c3
c3
d1
d2
d1
d2
(Outer Join)
R
S
A
a1
a1
a2
a2
a4
d3
d3
KUT
22
N
B
C
D
b1
b1
b1
b1
b2
c1
c1
c1
c1
c3
d1
d2
d1
d2
d3
(Natural Join)
Page 22
Extension of Relational Algebra(4)
ⅲ. outer-union, ∪+
– The outer union operation was developed to take
the union of tuples from two relations if the
relations are not type compatible.
– This operation will take the union of tuples in two
relations R(X, Y) and S(X, Z) that are partially
compatible, meaning that only some of their
attributes, say X, are type compatible.
– The attributes that are type compatible are
represented only once in the result, and those
attributes that are not type compatible from either
relation are also kept in the result relation T(X, Y,
Z).
Spring, 2006
KUT
Page 23
23
 Outer Union
R
S
A
B
C
B
C
D
a1
a2
a3
a4
b1
b1
b1
b2
c1
c1
c2
c3
b1
b1
b2
c1
c1
c2
d1
d2
d3
∪+
Spring, 2006
A
B
C
D
a1
a2
a3
a4
b1
b1
b1
b2
b1
b1
b2
c1
c1
c2
c3
c1
c1
c2
d1
d2
d3
KUT
Page 24
24
Extension of Relational Algebra(5)
•
•
•
•
A type of request that cannot be expressed in the basic relational algebra is to specify
mathematical aggregate functions on collections of values from the database.
Examples of such functions include retrieving the average or total salary of all employees or the
total number of employee tuples.
– These functions are used in simple statistical queries that summarize information from the
database tuples.
Common functions applied to collections of numeric values include
– SUM, AVERAGE, MAXIMUM, and MINIMUM.
The COUNT function is used for counting tuples or values.
– AVGgrade(enroll)
• retrieves the average score value from the enroll relation
– GROUPyear(student)
• Grouping student into subgroups with respect to year
– GROUPcnoAVGscore(enroll)
• Retrieve the average score of each cno group of enroll
– General Form : GAFB(E)
• E : Relational Algebra expression
• F : aggregation fuction ( SUM, AVG, MAX, MIN, COUNT)
• B : Aggregate attribute
• G : GROUP Function
• A :Group attribute
Spring, 2006
KUT
Page 25
25
▶ Algebra Expression
• Retrieve all students’ name and dept
 sname,dept (student)
• Retrieve name and score of a studuent who register C413
course
 sname,score( cno='C413' (student
NRegister))
• Retrieve name of a professor who teaches ‘database’
 profname( cname=‘database'(course))
Spring, 2006
KUT
Page 26
26
SQL
• SQL
– A standard language for RDB
– Based on relational algebra and calculus
• Features of SQL
– No-procedural language
• SQL language : handling a set of data satisfying the conditions
– Interactive or embedded
Spring, 2006
KUT
Page 27
Sample Table
학생
(STUDENT)
과목
(COURSE)
Spring, 2006
학번
(SNO)
100
200
300
400
500
이름
(SNANE)
나연묵
이찬영
정기태
송병호
박종화
학년
(YEAR)
4
3
1
4
2
학과
(DEPT)
컴퓨터
전기
컴퓨터
컴퓨터
산공
과목번호
과목이름
학점
(CNO)
(CNANE)
(CREDIT)
C123
프로그래밍
3
C312
자료 구조
3
C324
파일 처리
3
C413 데이타 베이스
3
C412
반도체
3
KUT
학과
(DEPT)
컴퓨터
컴퓨터
컴퓨터
컴퓨터
전자
담당교수
(PRNAME)
김성기
황수찬
이규철
이석호
홍봉희
Page 28
Basic of SQL
• SELECT statement
– Retrieve data from table
– SELECT clause and FROM clause are mandatory
in SELECT statement
– WHERE clause is optional to describe the
condition
SELECT SNAME
이름
학년
학과
FROM학생
STUDENT학번
(STUDENT)
Spring, 2006
(SNO)
100
200
300
400
500
(SNANE)
나연묵
이찬영
정기태
송병호
박종화
KUT
(YEAR)
4
3
1
4
2
(DEPT)
컴퓨터
전기
컴퓨터
컴퓨터
산공
Page 29
SQL의 기초
SELECT SNAME
FROM STUDENT
WHERE YEAR =4
학생
(STUDENT)
Spring, 2006
학번
(SNO)
100
200
300
400
500
이름
(SNANE)
나연묵
이찬영
정기태
송병호
박종화
KUT
학년
(YEAR)
4
3
1
4
2
학과
(DEPT)
컴퓨터
전기
컴퓨터
컴퓨터
산공
Page 30
SQL
• UNION :
– SQL ex : SELECT a FROM R UNION SELECT b FROM S;
• INTERSECT :
– SQL ex : SELECT a FROM R INTERSECT SELECT b FROM S;
• DIFFERENCE :
– SQL ex : SELECT a FROM R MINUS SELECT b FROM S;
• PRODUCT :
– SQL ex : SELECT a, b FROM R, S;
• RESTRICTION(SELECTION) :
– SQL ex : SELECT * FROM R WHERE r.A=10;
• PROJECTION :
– SQL ex : SELECT r.A1, r.A2 FROM R;
• JOIN
– SQL ex : SELECT r.A, r.B FROM R, S WHERE r.A = s.B;
Spring, 2006
KUT
Page 31
Query Processing
• Index
– Random data(tuple) access
–  inefficient
– Additional data structure
Spring, 2006
KUT
Page 32
▶Index method
• indexed file consists of
– index file
– data file
Data File
Index file
keyaddress
K1
K2
K3
Spring, 2006
KUT
Page 33
33
Traditional Index Structure
• B-Tree
• B+-Tree
Spring, 2006
KUT
Page 34
index : (1) B-tree
•
B-tree (degree = m)
–
m-way search tree
1. Except root and leaf, the number of subtrees of internal
node is at least ⌈m/2⌉, at most, m
1. at most, the number of key is ⌈m/2⌉-1
2. if root is not a leaf, root has two subtree ats least.
3. all leaf is same level
•
balanced tree
Note: degree is the maximum number of subtrees
Spring, 2006
KUT
Page 35
35
 3-nary B-tree
a
69
^
b
c
19
d
128
e
16
^
j
7
43
k
15
18
Spring, 2006
20
f
26
40
l
m
30 36
g
60
n
42
^
o
50 58
h
100
p
62 65
^
q
70
138
132
r
110 120 130
KUT
i
s
^
145
t
136
u
140
Page 36
36
^
v
150
▶ Operation(1)
• B-tree
– random access : branch by search key
– sequential access : in order traversal
– Insert/delete : keep balance
• split : by node overflow
• merge : by node underflow
• Insert
– Insert done at leaf node
• has free space : simple insertion
• overflow(no free space)  there m keys in a leaf node
1) split
2) m/2 th key  insert parent node
3) remains  left, right
Spring, 2006
KUT
Page 37
37
 Example
•
59 insert
b
b
f

60
f


^
o
 58
50
•
p
58

o
o’
50
59
60

p
57 insert
o’
o
 57
50
50
Spring, 2006
KUT
Page 38
 Example
• splite by insert 54
o
50 57
o’
o’’

50

57
54 goes
to parent
node f
• in Parent node f, insert 54
f

58
o
Spring, 2006
f’
f

o’
60


p
o
54

^
o’’

o’
KUT
60

p
^
58 goes
to parent
node b
Page 39
39
 Example
• parent node b, insert 58
b


19 43
d
b’
b
e


f
19
^

d
e

f
58

f’
^
43 goes
to parent a
• parent node a, insert 43
a

69
b
Spring, 2006
a

c

^
43

69

b’
b
c
KUT
Page 40
40
▶ Example (2)
•
Delete
– Delete is done at leaf node
– Deletion key is not in leaf node
•
•
swap with following key
deletion
– if # of key < ⌈ m/2 ⌉ -1, underflow
1. redistribution
– sibling node having keys whose number >=⌈m/2⌉
(parent node key → underflow node key)
(sibling node key →parent node key)
2. merge
– can not redistribution
(sibling node + parent node + underflow node)
Spring, 2006
KUT
Page 41
41
 Example
• delete 60
b
f
b
f
60
o
50
p
6265
• delete 20
e
l
20
Spring, 2006
26
b
f
62
o
50
p
6065
o
50
b
p
65
b
e
40
m
3036
62
n
42
l
26
KUT
30
40
m
36
n
42
Page 42
B-Tree insertion (m =5)
• Insert 77
2
7 40
72 84
74 75 76 78
89 90 91
Split
72 76 84
2 7
40
Spring, 2006
74 75
77 78
KUT
89 90 91
Page 43
B-Tree Deletion (m = 5)
• Delete 84
2 7
40
72 76 84
74 75
77 78
89 90 91
77 78
84 90 91
Swap & Delete
72 76 89
2 7 40
Spring, 2006
74 75
KUT
Page 44
B-Tree Deletion
• Delete 74
72 76 89
2 7 40
74 75
77 78
90 91
77 78
90 91
72 76 89
2 7
40
74 75
Underflow 발생
Spring, 2006
KUT
Page 45
B-Tree Deletion
Redistribution Using A Adjacent Sibling whose number of key greater than or
equal to ceiling(m/2)
72 76 89
2 7
40
74 75
77 78
90 91
,7, 40, 72, 75} is redistributed , [m/2] th value (즉, 40) go to pare
40 76 89
2 7
Spring, 2006
72 75
77 78
KUT
90 91
Page 46
B-Tree Deleton
• Delete 40
40 76 89
2 7
72 75
77 78
90 91
77 78
90 91
72 76 89
2 7
40 75
Swap
cannot Redistribution
Spring, 2006
KUT
Page 47
B-Tree Deletion
72 76 89
2 7
40 75
77 78
90 91
erge with right sibling and parent
72 89
2 7
Spring, 2006
75 76 77 78
KUT
90 91
Page 48
(2) B+-Tree
•
B+-tree consists of index set and sequence set
1. index set
–
–
–
consists of internal node
support access path to leaf nodes
support direct access
2. sequence set
–
–
consists of leaf nodes
leaf nodes store whole keys
 support sequential access
–
Spring, 2006
leaf node and internal node has different structures
KUT
Page 49
49
▶ B+-Tree(2)
• B+-tree with degree m
– node structure<n, P0, K1, P1, K2, P2, … , Pn-1, Kn,
Pn>
• n : # of keys ( 1≤n<m )
• P0, …, Pn :pointer to subtree
• K1, …, Kn : key value
– root has 0, 2~m subtrees
– Except root and leaf, internal node has ⌈m/2⌉~ m
subtrees
– All leaf nodes are same level
– key values in nodes is ascending order
Spring, 2006
KUT
Page 50
50
Difference between B-tree and B+-tree
• In a B-tree, pointers to data records exist at all
levels of the tree
• In a B+-tree, all pointers to data records exists
at the leaf-level nodes
• A B+-tree can have less levels (or higher
capacity of search values) than the
corresponding B-tree
Spring, 2006
KUT
Page 51
Slide 14-
▶ B+-tree operation
• search
– B+-tree index set : m-nary search tree
– Record is obtained at leaf node
• Insert
– similar to B-tree
• Delete
– done at leaf (when redistribution/merge)
• key in index set is not deleted
∵ it act as seperator access path
– redristribution: change key in index set
– merge : delete key in index set
Spring, 2006
KUT
Page 52
52
 B+-tree with degree 3
a
69
Index
set
b
sequence
set
d
1520
Spring, 2006
20
c
43
e
354043
f
5569
110
g
h
7090110 120125
KUT
Page 53
53
 Example
• B+-Tree , delete 43 43 in index set is not removed
69
20
1520
43
3540
110
5569
7090110 120125
• delete 125 (underflow  redistribution)
69
20
1520
Spring, 2006
3540
43
90
5569
KUT
7090
110120
Page 54
 Exapme
• delete 55(under flow merge)
69
20
1520
Spring, 2006
90
354069
7090
KUT
110120
Page 55
55