Access Path Selection in a Relation Database Management System (summarized in section 2)

Download Report

Transcript Access Path Selection in a Relation Database Management System (summarized in section 2)

Access Path Selection in a Relation Database Management System

(summarized in section 2)

Processing an SQL statement

• parsing, optimization, code generation, execution • an SQL statement may have many query blocks (nesting)

Optimizer

• validates parsed query • collects statistics on referenced relations & columns • discovers available access paths for each relation • checks for type errors in expressions • Access path selection: – determines order of evaluation of query blocks – a tree of alternate path choices is created for each query block with more than one relation

RSS (Research Storage System)

• storage manager for System R • Maintains physical storage, access paths, locking, logging, and recovery • Relations are stored as a collection of tuples • tuples are stored on 4K pages; pages are organized into segments • segments completely contain one or more relations • tuples are accessed via a scan: sequential scan or index scan

Cost computation

• cost = page fetches + W*(RSI calls) – cost = IO costs + W * CPU costs • an index that matches a boolean factor of the query is an efficient access path

Statistics

• NCARD(T): cardinality of the relation T • TCARD(T): number of pages used for T • P(T): fraction of pages in a segment used for T • ICARD(I): number of distinct keys in index I • NINDX(I): number of pages in index I

Selectivity

• •

column = value

: F = 1/ICARD(column) if there is an index. F = 1/10 otherwise •

column1 = column2

: F = 1/MAX(ICARD(column1), ICARD(column2)); F = 1/ICARD(column i); F = 1/10 •

column > value

: F = (high key value value) / (high key - low key)

column between value1 and value2

: F= (value2 - value1)/ (high key - low key)

QCARD

• QCARD is (  card. of all relations) * (  F(pred i)) • RSICARD is the expected number of calls to RSI (  card. of all relations) * (  F(sargable pred i)) • An “interesting order” is an order specified by the GROUP BY or ORDER BY clause • Single relation cost: cheapest access path which produces the “interesting order” or

Cost Table (p. 515)

• index pages fetched plus data pages fetched plus W times RSI tuple retrieval calls.

• unique index matching an equal predicate: 1+1+W • clustered index I matching one or more boolean factors: F(preds) *(NINDX(I) + TCARD) + W * RSICARD • etc…

Joins

• nested loops and merging scans • merging scans require sorts on the join column -- another “interesting order” • n-way joins can be done by a succession of 2-way joins; not necessarily using the same technique. Results may be pipelined if a sort is not required.

Join ordering

• n! permutations of relation join orders • join of (k+1) relation with previous k relations is independent of first k join order • avoid Cartesian products when possible; make them as late as possible

Construct a tree

• construct a tree of possible join orderings: keep the cheapest order that produces an interesting ordering.

• First find the best way to access each single relation for each interesting ordering and unordered.

• Next, find the best way of joining any relation to each of these.

• Repeat until all relations have been added to each branch