Document 7477056

Download Report

Transcript Document 7477056

Welcome to the
Workshop on
Query Optimization
I. I. T. Bombay
Jointly organized with and sponsored
by:
PSPL Ltd, Pune
October 1 and 2, 1999
1
Workshop Overview
Goal:
tutorial on research in query optimization
foster R&D in query optimization in India
Programme:
5 research sessions
2 discussion sessions
free time after dinner on Friday for discussions
lunch and dinner
 sponsored by PSPL
2
Workshop Highlights
Extensions of query optimization
parametric query optimization, dynamic reordering of
query plans, ..
Multi-query optimization
Caching
Semi-structured data and information retrieval
including XML
Directories and databases
Discussion sessions: VLDB99 update, and
Future Directions
3
Introductions….
Please introduce yourself
(in less than 10 seconds!)
4
A Brief History of Query
Optimization
S. Sudarshan
CSE Dept, IIT Bombay
5
The Dark Ages ...
Till late 70s:
Before relational databases
Dark ages, all optimization by hand
Needed sorcerers and wizards (the human kind, not
the Microsoft kind!)
6
Let there be light ...
Late 70’s to mid 80s
SQL: declarative syntax, optimization is job of system
System R
Seminal work on join order optimization, set
stage for all later query optimization
Distributed database query optimization (System R*)
7
Why Query Optimization?
SQL is declarative
Up to system to decide how to execute query
Disk access was and is slow compared to
memory access
Join order etc. can have major impact on speed
e.g. query ran for days on MS SQL-Server 6.5 due to
bad optimization decision, half an hour with good
decision
Different from optimization in other domains
e.g. linear programming, compiler optimizations
8
System R
Join order selection
A1
A2
A3
..
Left deep join trees
An
An
Ai
Dynamic programming
Best plan computed for each subset of relations
• Best plan (A1, .., An) = min cost plan of(
Best plan(A2, .., An)
A1
Best plan(A1, A3, .., An)
A2
….
Best plan(A1, .., An-1))
An
9
System R (cont)
Selects and projects pushed down to lowest
possible place
Sort order
join may be cheaper if inputs are sorted on join
attributes
=> Best plan(set-of-relations, sort-order)
Heuristic handling of other SQL features
views, nested queries, aggregates, selects, unions, ...
Many other databases used heuristic optimizers
10
Brief History (Contd)
Mid 80’s to early 90s
Extensible query optimization
Exodus, Volcano
New data models
Nested relational model
Object oriented data model
Foreign functions
Recursive queries / deductive databases
11
Volcano Extensible Query
Optimizer Generator
General purpose cost based query optimizer,
based on equivalence rules on algebras
e.g. equivalences: join associativity, select push
down, aggregate push down, etc
extensible: new operations and equivalences can be
easily added
notion of physical properties generalizes “interesting
sort order” idea of System R
Developed by Graefe and McKenna 1993
12
Key Ideas in Volcano
DAG representation of query
Equivalence node
and operation nodes
Compactly represents set of all evaluation plans
choose one child of each equivalence node, and
all children of operation nodes
ABC
AB
AC
BC
A
B
C
13
Main Benefits of Volcano
Extensible
can handle arbitrary algebraic expressions
new operators and equivalence rules easy to add
must be careful of search space though
Yet (reasonably) efficient
generalizes the dynamic programming idea of
System-R optimizer
Optimizations of Pellenkroft et al. [VLDB 97]
eliminate redundant derivations for joins
Used in MS SQL Server and Tandem
14
Optimization in OODB/ORDBs
Major issues
Path expressions:
e.g. forall ( p in person) print (p->spouse>name)
can convert pointer dereferences to joins
can “assemble objects” in a clever sequence to
minimize I/O (Graefe 93, Blakeley et al, Open
OODB optimizer 95)
Path indices
e.g. forall (p in person suchthat
p->spouse->name = “Rabri”) …
15
Optimization in ORDBs
Expensive predicates/functions in
selects/projects
e.g. selects based on image manipulation
usual heuristic of “push select predicates to lowest
possible level’’ does not work
Extended ADTs with methods
optimizer order of applying methods, and ordering of
method evaluation and joins
16
Brief History (Contd.)
Mid 90’s to late 90s
Materialized views, view and index selection
Web, data warehouses, virtual databases
OLAP, data mining, …
Approximate query answering
Improvements on earlier techniques
handling outerjoins, aggregates
PLUS: all the topics covered in this workshop
17
Materialized Views
Can materialize (precompute and store) views
to speed up queries
Incremental maintenance
when database is updated, propagate updates to
materialized view without complete recomputation
Deciding when to use materialized views
even if query does not refer to materialized view,
optimizer can figure out it can be used
18
Deciding What to
Materialize
maintenance cost and query cost
workload:
queries and update transactions
weights for each component of workload
workload cost depends on what is materialized
Goal: find set of views that gives minimum
cost if materialized, subject to space constraints
Note: materializing views can reduce even
update costs
indices, and SQL assertions
19
Data Warehouses
Characteristics:
Very large
typical schema: very large fact table, small
dimension tables
typical query: aggregate on join of fact table and
dimension tables
Can exploit above characteristics for optimizing
queries
e.g., join dimension tables (even if cross product),
build in memory index, scan fact table, probe index.
Summarize if required and output
20
Data Warehouses
(Cont)
Synchronized scans
multiple queries can share a scan of fact table
slow some queries down so others catch up
Bit map indices
for selections on low cardinality attributes
e.g.: M 10011100011001
F 01100011100110
idea: and-ing of bit maps is very efficient, use on
bitmaps to filter to relevant tuples, retrieve them
Quass and O’Neill [Sigmod 1997]
21
Virtual Warehouses and
Databases
Data sources are numerous and distributed
may be accessible only via HTML / XML
=> wrappers needed
may support only limited number of access types
through forms interfaces
site descriptions: describe what data is contained at
a site Levy et al [1995].
Query sent only to relevant sites.
Stanford TSIMMIS project, Junglee, and others
22
And on to the workshop ...
23
Decorrelation
Idea: convert nested subqueries to joins
Consider
select * from emp E
where E.numchildren <>
(select count(*) from person
where person.parent = E.name
Can’t always express using basic rel. algebra
Long history:
special cases: Kim 88, Dayal 88, Muralikrishna 93
general case: P. Seshadri et al 95: use outerjoin
24
Incremental View
Maintenance
E.g. R
S
(R U ir)
S=R
S U ir
S
(R - dr)
S=R
S - dr
S
similar techniques for selection, projection
(must maintain multiplicity counters though)
and aggregation
Blakeley et al. [SIGMOD 87], Gupta and Mumick
survey [DE Bulletin 95].
25
Deciding What to
Materialize
History
Roussopolous [1982]: exhaustive A* algorithm
Ross, Srivastava and Sudarshan [SIGMOD 96]
suggest materializing views can reduce update costs,
give heuristics
Labio et al. [1997], Gupta [1997], Sellis et al [1997],
Yang, Karlapalem and Li [1997] give various
exhaustive/heuristic/greedy algorithms
Chaudhuri and Narsayya [1998] considers only
indices, being introduced in SQL server
Exhaustive algos are all doubly exponential!
26