Distributed Query Processing using different Semijoin


Distributed Query Processing using
different Semijoin operations.
Presented By:
Jamal Uddin Ahamed
Friday, March 12, 2004
1
Presentation Outline:
1. Overview.
2. Semijoin operation.
3. Different semijoin operations.
a. 2-way semijoin.
b. Hash semijoin.
c. Domain-specific semijoin.
d. Composite semijoin.
4. References.
5. Questions and Answers.
2
1.1 What is a distributed database system?
• A distributed database system is
characterized by the distribution of its
system components: hardware, control,
and data. For this research, a distributed
system is a collection of independent
computers interconnected via point-to-point
communication lines.
3
1.2 Node Characteristics:
Each computer, known as a node in the
network, has a processing capability and a
data storage capability, and can operate
autonomously in the system.
Each node contains a version of a
distributed DBMS.
4
1.3 What is distributed query processing?
• The retrieval of data from different sites in a
network is known as distributed query
processing.
5
1.4 Phases of distributed query
processing with a semijoin operator.
1. Initial local processing (selections and
projections are processed at each site).
2. Semijoin processing (a semijoin program
is derived from the remaining join
operations and executed to reduce the size
of the relations in a cost-effective way).
3. Final processing (all relations involved are
transmitted to the final site and all joins are
performed there).
6
2.1 Semijoin:

A semijoin from Ri to Rj on attribute A can be
denoted as Rj ⋉ Ri. It is used to reduce the data
transmission cost.
Computing steps:
1) Project Ri on attribute A (Ri[A]) and
ship this projection (a semijoin
projection) from the site of Ri to the site
of Rj;
2) Reduce Rj to Rj' by eliminating tuples
whose values of attribute A do not match any
value in Ri[A].
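The two computing steps above can be sketched in a few lines of Python. This is a minimal illustration, not a distributed implementation: relations are modeled as lists of dictionaries, "shipping" is just passing a value, and the function name `semijoin` and the sample relations are assumptions made for the sketch.

```python
def semijoin(r_i, r_j, attr):
    """Semijoin from r_i to r_j on attr: return (projection, reduced r_j)."""
    # Step 1: project r_i on the join attribute; this set is what gets shipped.
    projection = {row[attr] for row in r_i}
    # Step 2: keep only the tuples of r_j whose attr value appears in the projection.
    reduced = [row for row in r_j if row[attr] in projection]
    return projection, reduced


# Illustrative sample relations (assumed data).
r1 = [{"A": 1, "B": 4}, {"A": 2, "B": 5}, {"A": 3, "B": 6}]
r2 = [{"A": 3, "C": 7}, {"A": 4, "C": 8}, {"A": 5, "C": 9}]

projection, r2_reduced = semijoin(r1, r2, "A")
print(sorted(projection))  # [1, 2, 3]
print(r2_reduced)          # [{'A': 3, 'C': 7}]
```

Only the projection crosses the network in this scheme, which is why the operation pays off when the projection is small compared with the tuples it eliminates.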
7
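2.2 Example:
Semijoin s: R1 -A-> R2 (sizes are counted in attribute values)

R1 (Site 1)      R2 (Site 2)
A  B             A  C
1  4             3  7
2  5             4  8
3  6             5  9

Projection: R1[A] = {1, 2, 3} is shipped from Site 1 to Site 2: Ship(3).
Reduce: R2' = {(3, 7)}, because only A = 3 matches a value in R1[A].
Without the semijoin, R2 is shipped to the query site qs in full: Ship(6).
With the semijoin, only R2' is shipped to qs: Ship(2).

Benefit B(s) = 6 - 2 = 4
Cost C(s) = 3
Cost effectiveness D(s) = B(s) - C(s) = 4 - 3 = 1 > 0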
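The benefit and cost figures of this example can be recomputed with a short Python sketch. It assumes, as the Ship(3), Ship(6) and Ship(2) labels suggest, that size is counted in attribute values (tuples times attributes); the helper name `size` is hypothetical.

```python
def size(relation):
    """Size in attribute values (assumed unit): sum of tuple widths."""
    return sum(len(row) for row in relation)


r1 = [(1, 4), (2, 5), (3, 6)]  # R1(A, B)
r2 = [(3, 7), (4, 8), (5, 9)]  # R2(A, C)

# Semijoin projection R1[A] and the reduced relation R2'.
projection = {a for a, _ in r1}
r2_reduced = [t for t in r2 if t[0] in projection]

cost = len(projection)                  # shipping R1[A]
benefit = size(r2) - size(r2_reduced)   # data no longer shipped to the query site
profit = benefit - cost                 # D(s) = B(s) - C(s)

print(benefit, cost, profit)  # 4 3 1
```

Because D(s) is positive, the semijoin is cost-effective for this data.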
8
3.a.1 Definition of 2-way semijoin.
2-way semijoin: an extended version of the
semijoin.

Definition: A 2-way semijoin (t) of Ri and Rj on
attribute A can be denoted as
Ri <-A-> Rj = { Ri -A-> Rj, Rj -A-> Ri }
So t reduces Ri and Rj to Ri' and Rj'
respectively.
9
3.a.2 Properties of 2-way semijoin.

Computing steps:
1) Send Ri[A] from site i to site j;
2) Reduce Rj to Rj' by eliminating tuples whose values of attribute A
do not match any value in Ri[A], and at the same time
partition Ri[A] into Ri[A]m (values that match some value in Rj[A])
and Ri[A]nm (Ri[A] - Ri[A]m);
3) Send the smaller of Ri[A]m and Ri[A]nm back to site i;
4) Reduce Ri to Ri' using Ri[A]m (or Ri[A]nm).
Evaluation:
– Benefit: B(t) = [S(Ri) - S(Ri')] + [S(Rj) - S(Rj')]
– Cost: C(t) = S(Ri[A]) + min[S(Ri[A]m), S(Ri[A]nm)]
– If the benefit exceeds the cost (D(t) = B(t) - C(t) > 0), t is called a
cost-effective 2-way semijoin.
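The four steps and the evaluation formulas can be traced with the following Python sketch. It is a single-process illustration under stated assumptions: relations are lists of dictionaries, size S is counted in attribute values, and the names `two_way_semijoin` and `size` are hypothetical.

```python
def size(relation):
    """S(R) in attribute values (assumed unit)."""
    return sum(len(row) for row in relation)


def two_way_semijoin(r_i, r_j, attr):
    """Return (reduced r_i, reduced r_j, transmission cost)."""
    # Step 1: ship the projection Ri[A] to site j.
    proj_i = {row[attr] for row in r_i}
    # Step 2: reduce Rj and partition Ri[A] into matched and unmatched values.
    r_j_reduced = [row for row in r_j if row[attr] in proj_i]
    matched = proj_i & {row[attr] for row in r_j}
    unmatched = proj_i - matched
    # Steps 3 and 4: ship back the smaller partition and reduce Ri with it.
    if len(matched) <= len(unmatched):
        shipped_back = matched
        r_i_reduced = [row for row in r_i if row[attr] in matched]
    else:
        shipped_back = unmatched
        r_i_reduced = [row for row in r_i if row[attr] not in unmatched]
    cost = len(proj_i) + len(shipped_back)
    return r_i_reduced, r_j_reduced, cost


# Illustrative sample relations (assumed data).
r1 = [{"A": 1, "B": 4}, {"A": 2, "B": 5}, {"A": 3, "B": 6}]
r2 = [{"A": 3, "C": 7}, {"A": 4, "C": 8}, {"A": 5, "C": 9}]

r1_reduced, r2_reduced, cost = two_way_semijoin(r1, r2, "A")
benefit = (size(r1) - size(r1_reduced)) + (size(r2) - size(r2_reduced))
print(r1_reduced, r2_reduced)        # [{'A': 3, 'B': 6}] [{'A': 3, 'C': 7}]
print(benefit, cost, benefit - cost)  # 8 4 4
```

Sending back the unmatched values instead of the matched ones gives the same reduced Ri; the choice only affects how many values travel in step 3.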
10
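3.a.3 2-way semijoin example.
R1 (Site 1)      R2 (Site 2)
A  B             A  C
1  4             3  7
2  5             4  8
3  6             5  9

1. Projection: R1[A] = {1, 2, 3} is shipped from Site 1 to Site 2: Ship(3).
2. Reduce and partition at Site 2: R2' = {(3, 7)}; R1[A]m = {3}, R1[A]nm = {1, 2}.
3. R1[A]m, the smaller partition, is shipped back to Site 1: Ship(1).
4. Reduce at Site 1: R1' = {(3, 6)}.
R1' and R2' are then each shipped to the query site qs: Ship(2) and Ship(2).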
11
3.a.4 Semijoin Vs 2-way semijoin.
– The 2-way semijoin is an extended version of the semijoin.
– It has more reduction power than the semijoin.
– The 2-way semijoin propagates reduction effects further than the semijoin does.
12
3.b.1 Hash-semijoin operator.
Main idea: use a search filter that represents the
semijoin projection as a small bit array.
Definition:
The hash-semijoin of Ri and Rj is denoted Rj ∝ Ri.
It is computed as follows:
– The semijoin projection of Ri is represented as
a bit array;
– This bit array is shipped to the site of Rj;
– Finally, the tuples of Rj are screened by the
search filter.
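A minimal Python sketch of these three steps follows. The hash function (value modulo the array length), the function names `build_filter` and `hash_semijoin`, and the sample data are assumptions chosen for illustration; a real system would pick its own hash function and array size.

```python
def build_filter(values, num_bits):
    """Represent a semijoin projection as a bit array (assumed hash: v mod num_bits)."""
    bits = [0] * num_bits
    for v in values:
        bits[v % num_bits] = 1
    return bits


def hash_semijoin(r_i, r_j, attr, num_bits):
    """Screen r_j with the bit array built from r_i's projection on attr."""
    bits = build_filter((row[attr] for row in r_i), num_bits)  # shipped instead of Ri[A]
    return [row for row in r_j if bits[row[attr] % num_bits]]


# Illustrative sample relations (assumed data).
r1 = [{"S#": 1}, {"S#": 3}, {"S#": 4}, {"S#": 8}]
r2 = [{"S#": 2}, {"S#": 3}, {"S#": 4}, {"S#": 5}, {"S#": 6}]

wide = [row["S#"] for row in hash_semijoin(r1, r2, "S#", 16)]
narrow = [row["S#"] for row in hash_semijoin(r1, r2, "S#", 4)]
print(wide)    # [3, 4]
print(narrow)  # [3, 4, 5]
```

With the shorter array, 5 collides with 1 and slips through the filter. The filter never drops a matching tuple, but it may keep non-matching ones, so the saving in transmission is traded against weaker reduction.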
13
3.b.2 Hash-semijoin example.
R1               R2
S#  Name         S#  Phone
1   Cindy        2   222
3   Jemal        3   333
4   Sunny        4   444
8   Maggie       5   555
                 6   666

Projection: S#(R1) = {1, 3, 4, 8}
Hash function: H(x) = x
Bit array Bij (positions 1 to 8): 1 0 1 1 0 0 0 1
Ship(Bij) to the site of R2.
Reduce: R2' = {(3, 333), (4, 444)}
14
3.b.3 Semijoin Vs Hash Semijoin.
• Advantages:
– Hash-semijoin is more cost-effective than semijoin.
– The search filter in the hash-semijoin achieves
considerable savings in the cost of a semijoin operation.
• Limitations:
– Only works on an execution tree.
– Closely tied to the choice of hash function.
15
3.c.1 What is a horizontally partitioned database?
A distributed database system is called
horizontally partitioned (or fragmented) if
its relations can be split horizontally into
several disjoint sets of tuples, which are
called horizontal fragments.
16
3.c.2 Horizontally partitioned database
system (Example)
EMP
E-no  E-name   D-no
101   johnson  01
103   jordan   03
105   erving   01
109   jabbar   12
110   sampson  14
141   chang    16

EMP1: 1 ≤ D-no ≤ 10
E-no  E-name   D-no
101   johnson  01
103   jordan   03
105   erving   01

EMP2: 11 ≤ D-no ≤ 20
E-no  E-name   D-no
109   jabbar   12
110   sampson  14
141   chang    16
17
3.c.3 Horizontally partitioned database
system (Properties)
• A fragmented relation Ri can be reconstructed by performing
a union operation on all its fragments:
$R_i = \bigcup_k R_{ik}$
• There is a commutative rule between the binary operations
join and union for fragmented relations: a join between two
fragmented relations R1 and R2 is equivalent to a union over
the joins between each fragment of R1 and each fragment
of R2.
Mathematically:
$\left(\bigcup_k R_{1k}\right)\,[A=B]\,\left(\bigcup_m R_{2m}\right) = \bigcup_{k,m} \left(R_{1k}\,[A=B]\,R_{2m}\right)$
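The rule can be checked on a small case. The Python sketch below joins two fragmented relations as a whole and piecewise, then compares the results; the fragments, attribute names and the helper `join` are illustrative assumptions.

```python
from itertools import product


def join(r, s, a, b):
    """Equijoin of r and s on r.a = s.b, returned as a set of hashable rows."""
    return {frozenset({**x, **y}.items()) for x in r for y in s if x[a] == y[b]}


# Two horizontally fragmented relations (assumed data).
r1_fragments = [
    [{"A": 1, "X": "p"}, {"A": 2, "X": "q"}],
    [{"A": 3, "X": "r"}],
]
r2_fragments = [
    [{"B": 1, "Y": "u"}, {"B": 3, "Y": "v"}],
    [{"B": 2, "Y": "w"}, {"B": 4, "Y": "z"}],
]

# Left-hand side: rebuild each relation by union, then join.
r1 = [row for frag in r1_fragments for row in frag]
r2 = [row for frag in r2_fragments for row in frag]
whole = join(r1, r2, "A", "B")

# Right-hand side: join every pair of fragments, then take the union.
piecewise = set()
for f1, f2 in product(r1_fragments, r2_fragments):
    piecewise |= join(f1, f2, "A", "B")

print(whole == piecewise)  # True
print(len(whole))          # 3
```

Note that every fragment of R1 has to be paired with every fragment of R2; skipping a pair can lose result tuples.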
18
3.c.4 Why can't we use a regular semijoin
between two fragments to reduce the size
of fragments? (Continued)
We consider a join Ri [A=B] Rj between two fragmented
relations Ri and Rj. We want to reduce the size of Rik, a
fragment of Ri, by a semijoin before it is sent to the final
processing site. We cannot perform the semijoin
Rik <A=B] Rjm
between Rik and a fragment Rjm of Rj without considering
the other fragments of Rj, because the join operation
dictates that no tuple of a relation can be eliminated before it
is compared with all tuples of the other joining relation that
may contribute to the join.
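A small Python sketch makes the problem concrete. It reduces one fragment with a regular semijoin against a single fragment of the other relation and shows that a result tuple is lost. The data and the helper names `semijoin` and `join` are assumptions for illustration.

```python
def semijoin(restricting, restricted, attr):
    """Regular semijoin: keep restricted tuples whose attr value occurs in restricting."""
    values = {row[attr] for row in restricting}
    return [row for row in restricted if row[attr] in values]


def join(r, s, attr):
    """Equijoin of r and s on a shared attribute name."""
    return [{**x, **y} for x in r for y in s if x[attr] == y[attr]]


# One fragment of a salary relation and two fragments of EMP (assumed data).
sal1 = [{"E-no": 101, "Sal": 1000, "D-no": 12},
        {"E-no": 102, "Sal": 2000, "D-no": 3}]
emp1 = [{"E-name": "jordan", "D-no": 3}]    # fragment with 1 <= D-no <= 10
emp2 = [{"E-name": "jabbar", "D-no": 12}]   # fragment with 11 <= D-no <= 20

correct = join(sal1, emp1 + emp2, "D-no")

# Reducing sal1 against emp1 alone drops the D-no = 12 tuple,
# although it joins with a tuple in emp2.
sal1_wrong = semijoin(emp1, sal1, "D-no")
lossy = join(sal1_wrong, emp1 + emp2, "D-no")

print(len(correct), len(lossy))  # 2 1
```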
19
Example:
sal: 101 ≤ E-no ≤ 105
E-no  Sal   D-no
101   1000  12
102   2000  03
105   3000  11

sal: 105 ≤ E-no ≤ 110
E-no  Sal   D-no
107   1000  12
107   2000  03
110   3000  11

EMP1: 1 ≤ D-no ≤ 10
E-no  E-name   D-no
101   johnson  01
103   jordan   03
135   erving   01

EMP2: 11 ≤ D-no ≤ 20
E-no  E-name   D-no
109   jabbar   12
110   sampson  14
141   chang    16

Dno: 01, 03, 12, 14, 16
20
3.c.5 Definition of Domain Specific Semijoin.
The domain-specific semijoin operation, Rik (A=B] Rjm,
where A and B are the joining attributes and Rik, Rjm are
fragments of the joining relations Ri and Rj respectively, is
defined as follows:
$R_{ik}\,(A=B]\,R_{jm} = \{\, r \mid r \in R_{ik};\ r.A \in R_{jm}[B] \cup (\mathrm{Dom}[R_j.B] - \mathrm{Dom}[R_{jm}.B]) \,\}$
where Rik is the restricted fragment and Rjm is the
restricting fragment. We also call Ri the restricted
relation and Rj the restricting relation of the domain-specific semijoin.
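The definition can be read as: a tuple survives if its join value either matches the restricting fragment or lies outside that fragment's domain, where only other fragments could match it. The Python sketch below illustrates this under stated assumptions: the fragment's domain is given as a predicate, every join value is assumed to belong to Dom[Rj.B], and the function name and data are hypothetical.

```python
def domain_specific_semijoin(r_ik, a, r_jm, b, in_domain_of_r_jm):
    """Keep r in r_ik if r.a matches r_jm[b] or r.a is outside Dom[r_jm.b]."""
    values = {row[b] for row in r_jm}
    return [row for row in r_ik
            if row[a] in values or not in_domain_of_r_jm(row[a])]


# Restricted fragment (assumed data) and restricting fragment with 1 <= D-no <= 10.
sal1 = [{"E-no": 101, "D-no": 12},   # outside the fragment's domain: kept
        {"E-no": 102, "D-no": 3},    # matches a tuple in emp1: kept
        {"E-no": 105, "D-no": 5}]    # inside the domain, no match: eliminated
emp1 = [{"E-name": "johnson", "D-no": 1}, {"E-name": "jordan", "D-no": 3}]

reduced = domain_specific_semijoin(
    sal1, "D-no", emp1, "D-no", lambda d: 1 <= d <= 10
)
print([row["E-no"] for row in reduced])  # [101, 102]
```

Only the tuple with D-no = 5 is safe to remove: it falls in the range covered by the restricting fragment and finds no partner there, so no other fragment can match it either.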
21
3.d.1 Definition of Composite Semijoin.
• Composite semijoin: a semijoin in which
the projection and the transmission involve
multiple columns (attributes).
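The difference from single-column semijoins can be seen in a short Python sketch. It compares a composite semijoin on two columns with two successive one-column semijoins; the function name `semijoin_on` and the sample relations are assumptions for illustration.

```python
def semijoin_on(r_i, r_j, attrs):
    """Reduce r_j using the projection of r_i on the given list of attributes."""
    keys = {tuple(row[a] for a in attrs) for row in r_i}
    return [row for row in r_j if tuple(row[a] for a in attrs) in keys]


# Two relations joined on both A1 and A2 (assumed data).
r1 = [{"A1": 1, "A2": "aa"}, {"A1": 1, "A2": "bb"},
      {"A1": 2, "A2": "cc"}, {"A1": 3, "A2": "cc"}]
r2 = [{"A1": 1, "A2": "cc"}, {"A1": 1, "A2": "aa"},
      {"A1": 2, "A2": "bb"}, {"A1": 3, "A2": "bb"}]

# Two single-column semijoins: every A1 value and every A2 value of r1
# occurs somewhere in r2, so nothing is eliminated.
one_by_one = semijoin_on(r2, semijoin_on(r2, r1, ["A1"]), ["A2"])

# One composite semijoin on (A1, A2): only exact pairs survive.
composite = semijoin_on(r2, r1, ["A1", "A2"])

print(len(one_by_one))  # 4
print(composite)        # [{'A1': 1, 'A2': 'aa'}]
```

The composite projection is wider, so it costs more to ship, but here it removes three tuples that the single-column semijoins leave in place.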
22
3.d.2 Example of Composite Semijoin.
R1                        R2
A1  A2  Non-join Attr     A1  A2  Non-join Attr
1   aa  -                 1   cc  -
1   bb  -                 1   aa  -
2   cc  -                 2   bb  -
3   cc  -                 3   bb  -

Result of the composite semijoin on (A1, A2):
A1  A2  Non-join Attr
1   aa

No False loop!!
23
3.d.3 Semijoin Vs Composite Semijoin.
• Using composite semijoins in a query processing
algorithm is likely to result in a substantial
reduction in response time (RT).
• Composite semijoins should not always be
used. If a composite semijoin results in a greater RT, it is ignored.
• A strategy with composite semijoins is therefore at least
as good as one without composite
semijoins.
24
References:
1. Using 2-way semijoin in distributed query processing. By Hyunchul
Kang and Nick Roussopoulos.
2. Improving distributed query processing by hash-semijoins. By Judy
Tseng and Arbee Chen.
3. Domain Specific Semijoin: A new operation for distributed query
processing. By Jason Chen and Victor Li.
4. Composite Semijoin in distributed query processing. By William
Perrizo and Chun Chen.
25
Comments & Questions?
Thank You!
26