Document 7477056
Download
Report
Transcript Document 7477056
Welcome to the
Workshop on
Query Optimization
I. I. T. Bombay
Jointly organized with and sponsored
by:
PSPL Ltd, Pune
October 1 and 2, 1999
1
Workshop Overview
Goal:
tutorial on research in query optimization
foster R&D in query optimization in India
Programme:
5 research sessions
2 discussion sessions
free time after dinner on Friday for discussions
lunch and dinner
sponsored by PSPL
2
Workshop Highlights
Extensions of query optimization
parametric query optimization, dynamic reordering of
query plans, ..
Multi-query optimization
Caching
Semi-structured data and information retrieval
including XML
Directories and databases
Discussion sessions: VLDB99 update, and
Future Directions
3
Introductions….
Please introduce yourself
(in less than 10 seconds!)
4
A Brief History of Query
Optimization
S. Sudarshan
CSE Dept, IIT Bombay
5
The Dark Ages ...
Till late 70s:
Before relational databases
Dark ages, all optimization by hand
Needed sorcerers and wizards (the human kind, not
the Microsoft kind!)
6
Let there be light ...
Late 70’s to mid 80s
SQL: declarative syntax, optimization is job of system
System R
Seminal work on join order optimization, set
stage for all later query optimization
Distributed database query optimization (System R*)
7
Why Query Optimization?
SQL is declarative
Up to system to decide how to execute query
Disk access was and is slow compared to
memory access
Join order etc. can have major impact on speed
e.g. query ran for days on MS SQL-Server 6.5 due to
bad optimization decision, half an hour with good
decision
Different from optimization in other domains
e.g. linear programming, compiler optimizations
8
System R
Join order selection
A1
A2
A3
..
Left deep join trees
An
An
Ai
Dynamic programming
Best plan computed for each subset of relations
• Best plan (A1, .., An) = min cost plan of(
Best plan(A2, .., An)
A1
Best plan(A1, A3, .., An)
A2
….
Best plan(A1, .., An-1))
An
9
System R (cont)
Selects and projects pushed down to lowest
possible place
Sort order
join may be cheaper if inputs are sorted on join
attributes
=> Best plan(set-of-relations, sort-order)
Heuristic handling of other SQL features
views, nested queries, aggregates, selects, unions, ...
Many other databases used heuristic optimizers
10
Brief History (Contd)
Mid 80’s to early 90s
Extensible query optimization
Exodus, Volcano
New data models
Nested relational model
Object oriented data model
Foreign functions
Recursive queries / deductive databases
11
Volcano Extensible Query
Optimizer Generator
General purpose cost based query optimizer,
based on equivalence rules on algebras
e.g. equivalences: join associativity, select push
down, aggregate push down, etc
extensible: new operations and equivalences can be
easily added
notion of physical properties generalizes “interesting
sort order” idea of System R
Developed by Graefe and McKenna 1993
12
Key Ideas in Volcano
DAG representation of query
Equivalence node
and operation nodes
Compactly represents set of all evaluation plans
choose one child of each equivalence node, and
all children of operation nodes
ABC
AB
AC
BC
A
B
C
13
Main Benefits of Volcano
Extensible
can handle arbitrary algebraic expressions
new operators and equivalence rules easy to add
must be careful of search space though
Yet (reasonably) efficient
generalizes the dynamic programming idea of
System-R optimizer
Optimizations of Pellenkroft et al. [VLDB 97]
eliminate redundant derivations for joins
Used in MS SQL Server and Tandem
14
Optimization in OODB/ORDBs
Major issues
Path expressions:
e.g. forall ( p in person) print (p->spouse>name)
can convert pointer dereferences to joins
can “assemble objects” in a clever sequence to
minimize I/O (Graefe 93, Blakeley et al, Open
OODB optimizer 95)
Path indices
e.g. forall (p in person suchthat
p->spouse->name = “Rabri”) …
15
Optimization in ORDBs
Expensive predicates/functions in
selects/projects
e.g. selects based on image manipulation
usual heuristic of “push select predicates to lowest
possible level’’ does not work
Extended ADTs with methods
optimizer order of applying methods, and ordering of
method evaluation and joins
16
Brief History (Contd.)
Mid 90’s to late 90s
Materialized views, view and index selection
Web, data warehouses, virtual databases
OLAP, data mining, …
Approximate query answering
Improvements on earlier techniques
handling outerjoins, aggregates
PLUS: all the topics covered in this workshop
17
Materialized Views
Can materialize (precompute and store) views
to speed up queries
Incremental maintenance
when database is updated, propagate updates to
materialized view without complete recomputation
Deciding when to use materialized views
even if query does not refer to materialized view,
optimizer can figure out it can be used
18
Deciding What to
Materialize
maintenance cost and query cost
workload:
queries and update transactions
weights for each component of workload
workload cost depends on what is materialized
Goal: find set of views that gives minimum
cost if materialized, subject to space constraints
Note: materializing views can reduce even
update costs
indices, and SQL assertions
19
Data Warehouses
Characteristics:
Very large
typical schema: very large fact table, small
dimension tables
typical query: aggregate on join of fact table and
dimension tables
Can exploit above characteristics for optimizing
queries
e.g., join dimension tables (even if cross product),
build in memory index, scan fact table, probe index.
Summarize if required and output
20
Data Warehouses
(Cont)
Synchronized scans
multiple queries can share a scan of fact table
slow some queries down so others catch up
Bit map indices
for selections on low cardinality attributes
e.g.: M 10011100011001
F 01100011100110
idea: and-ing of bit maps is very efficient, use on
bitmaps to filter to relevant tuples, retrieve them
Quass and O’Neill [Sigmod 1997]
21
Virtual Warehouses and
Databases
Data sources are numerous and distributed
may be accessible only via HTML / XML
=> wrappers needed
may support only limited number of access types
through forms interfaces
site descriptions: describe what data is contained at
a site Levy et al [1995].
Query sent only to relevant sites.
Stanford TSIMMIS project, Junglee, and others
22
And on to the workshop ...
23
Decorrelation
Idea: convert nested subqueries to joins
Consider
select * from emp E
where E.numchildren <>
(select count(*) from person
where person.parent = E.name
Can’t always express using basic rel. algebra
Long history:
special cases: Kim 88, Dayal 88, Muralikrishna 93
general case: P. Seshadri et al 95: use outerjoin
24
Incremental View
Maintenance
E.g. R
S
(R U ir)
S=R
S U ir
S
(R - dr)
S=R
S - dr
S
similar techniques for selection, projection
(must maintain multiplicity counters though)
and aggregation
Blakeley et al. [SIGMOD 87], Gupta and Mumick
survey [DE Bulletin 95].
25
Deciding What to
Materialize
History
Roussopolous [1982]: exhaustive A* algorithm
Ross, Srivastava and Sudarshan [SIGMOD 96]
suggest materializing views can reduce update costs,
give heuristics
Labio et al. [1997], Gupta [1997], Sellis et al [1997],
Yang, Karlapalem and Li [1997] give various
exhaustive/heuristic/greedy algorithms
Chaudhuri and Narsayya [1998] considers only
indices, being introduced in SQL server
Exhaustive algos are all doubly exponential!
26