Session – 10 QUERY OPTIMIZATION Matakuliah : M0184 / Pengolahan Data Distribusi

Download Report

Transcript Session – 10 QUERY OPTIMIZATION Matakuliah : M0184 / Pengolahan Data Distribusi

Matakuliah
Tahun
Versi
: M0184 / Pengolahan Data Distribusi
: 2005
:
Session – 10
QUERY OPTIMIZATION
OBJECTIVE
• QUERY Optimization process
• Distribute query optimization algorithms
DISTRIBUTED QUERY
OPTIMIZATION ALGORITHMS
Algorith
ms
DIST.
INGRES
Optm.
Timing
Dynami
c
Obj.
Function
Resp.Time
or Total
Cost
R*
Static
Total Cost
SDD – 1
Static
Msg Size
Optm.
Factors
Msg Size,
Proj Cost
Networ Semi
k
Joins
General
General Yes
General No
/
broadca
st
#Msg,
General No
Msg Size, / Local
IO, CPU
Source : Principles Of Distributed Database Systems
DISTRIBUTED INGRES
Algorithm
• Consist of dynamically optimizing the processing
strategy of a given query.
• The objective function of the algorithm is to
minimize response time and cost
• General and broadcast network are consider,
therefore the data unit can be transmitted from
one site to all the other sites in a single transfer
• This algorithm is executed by the site, called
Master site where the query is initiated
R * Algorithm
• Uses compilation approach where an exhaustive
search of all alternative strategies is performed
in order to choose the one with the least cost.
• Query compilation is a distributed task in R*,
coordinated by master site, where the query is
initiated.
• To join two relation, there are three candidate
sites : site for 1st relation, site for 2nd relation and
3rd site.
• Two method for intersite data transfers : Ship
Whole and Fetch as Needed
SDD-1 Algorithm
• Refinements of an initial feasible solution
are recursively computed until no more
cost improvement can be made.
• It is devised for wide area point-to-point
network
• The cost of transferring the result to final
site is ignored
Method of Performing JOIN
• On the basic relational operation
(SELECT, JOIN , PROJECT) the most
expensive in term in both time and money
is JOIN. It causes more page swaps.
• The frequency of page swaps is often
used as the cost measure when modeling
these system
SEMI-JOIN Method
Symbol : Semi Join of Relation R1 and Relation R2
R1
R1
R2 =
π 1 (R1
R2 = R1
R2)
(πR2)
………… (1.1)
………… (1.2)
The second example has potential advantage
over the first if R1 and R2 are at different sites
A and B, and the execution of full JOIN
between R1 and R2 is to be done at the 3rd Site
C – Refer to SDD-1 algorithm system
SEMI-JOIN Method Cont’d
Procedure SEMI-JOIN
• Project the JOIN attributes from R2 at B (
= πR2), after applying any required
SELECTIONs
• Transmit πR2 to A
• Compute Semi Join of R1 at A
• Move the result to C
Non SEMI-JOIN Methods
• In R*, execution is generated at compilation time
and therefore it uses a static optimization
algorithms.
• Query can be compiled during times when the
system is slack and the optimal access plan for
their execution are determined then.
• The nested loop and merge scan method are
used for join in R*
• The nested loop method scans one relation
called outer relation for each tuple of the other
relation, called the inner relation with a
matching JOIN Value
Non SEMI-JOIN Methods
•
1.
2.
3.
4.
Preceding query execution, four stages of
“query preparation” are completed in R*, they
are :
Analyze the SQL query and perform object
name resolution
Look up the catalog to check authorization and
view integration
Plan the minimum cost access strategy for the
global query
Generate the access modules and store them
at a master site
Specific Issues for Modern Query
Optimization
• Is a static or a dynamic algorithm used
• Are independence and uniformity assumptions
made
• What JOIN method is used
• Is account taken of the effects of other system
modules on optimization
• Is simultaneous query traffic taken into account
• Is the search for an optimum exhaustive
• Is any semantic query transformation carried out
• Is the influence of heterogeneity considered