C-Store: Self-Organizing Tuple Reconstruction
Jianlin Feng
School of Software
SUN YAT-SEN UNIVERSITY
Apr. 17, 2009
Review of Tuple Reconstruction
Stitch together separate column values of the
same logical tuple.
Join on Tuple IDs/positions.
Two Strategies
Early materialization
Late materialization
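A minimal sketch of the two strategies, assuming columns are plain Python lists stored in the same order and the predicate is an inclusive range; the function names are hypothetical and chosen only for illustration.

```python
# Two columns of the same relation, stored in the same (insertion) order.
col_a = [7, 2, 9, 4, 1]
col_b = ["g", "b", "i", "d", "a"]

def early_materialization(a, b, lo, hi):
    """Stitch full tuples first, then filter them."""
    tuples = list(zip(a, b))                      # reconstruct every tuple
    return [(x, y) for x, y in tuples if lo <= x <= hi]

def late_materialization(a, b, lo, hi):
    """Filter on one column first, then fetch the other column by position."""
    positions = [i for i, x in enumerate(a) if lo <= x <= hi]
    return [(a[i], b[i]) for i in positions]      # positional join

early = early_materialization(col_a, col_b, 2, 7)
late = late_materialization(col_a, col_b, 2, 7)
print(early == late)
```

Both functions return the same tuples; they differ only in when the values of the second column are touched.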
Motivation
Tuple Reconstruction is easy
if columns are sorted in the same order
However, this prerequisite cannot always be
preserved.
During query processing, many operators (joins,
group by, order by, etc.) are not tuple order-preserving.
The Ultimate Access Pattern
For each relation R, we have one copy for each
attribute in R.
each copy is pre-sorted on the corresponding attribute.
All tuple reconstruction initiated by a restriction on
an attribute R.a, can be done using the copy that is
sorted on R.a.
The Limitations:
Space constraint
Idle time needed for the pre-sorting.
The Proposed Solution
Partial Sideways Cracking
Uses auxiliary self-organizing data structures to
materialize mappings between pairs of attributes
used together for tuple reconstruction.
Background
MonetDB
Selection-based Cracking
MonetDB: http://monetdb.cwi.nl/
Every relation table
is represented as a collection of Binary Association Tables
(BATs).
Each BAT is a set of two columns
For a relation R of k attributes, there exists k BATs.
Each BAT stores (key, attr) pairs.
In each BAT, keys are system generated tuple IDs.
For base BAT, the key column is typically virtual.
Like STORAGE KEY in Read Store of C-Store.
MonetDB’s Basic Operators (1)
select(A, v1, v2)
Searches all (key, attr) pairs in base column A for
attribute values between v1 and v2.
Output:
A list of keys/positions.
In the output, the tuple order is usually preserved.
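As a rough illustration (not MonetDB's actual implementation), a BAT can be modeled as a Python list of (key, attr) pairs. The half-open range v1 <= attr < v2 is an assumption, since the slide only says "between".

```python
def select(bat, v1, v2):
    """Return the keys of all (key, attr) pairs with v1 <= attr < v2.

    The scan visits pairs in stored order, so the output keeps tuple order.
    """
    return [key for key, attr in bat if v1 <= attr < v2]

# Base BAT for attribute A; the virtual keys 0..n-1 are made explicit here.
bat_a = list(enumerate([12, 3, 5, 9, 15, 22, 7]))
keys = select(bat_a, 4, 13)
print(keys)
```

Because the base column is scanned front to back, the resulting keys come out in ascending order.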
MonetDB’s Basic Operators (2)
join(j1, j2)
Performs a join between attr1 of j1 and attr2 of j2.
Output:
A list of (key1, key2) pairs.
In the output, the tuple order is mainly preserved
for outer join.
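A small sketch of the join operator over the same list-of-pairs model of a BAT. A hash join is assumed here purely for illustration; the slides do not say which join algorithm is used.

```python
def join(j1, j2):
    """Join attr of j1 with attr of j2; return a list of (key1, key2) pairs."""
    index = {}
    for key2, attr2 in j2:                        # build side
        index.setdefault(attr2, []).append(key2)
    # Probe side: j1 is scanned in stored order, so key1 order is kept,
    # while key2 values appear in whatever order the matches dictate.
    return [(key1, key2) for key1, attr1 in j1 for key2 in index.get(attr1, [])]

j1 = [(0, 5), (1, 8), (2, 5)]
j2 = [(0, 8), (1, 5), (2, 3)]
pairs = join(j1, j2)
print(pairs)
```

In this sketch only the keys of the probed input stay ordered; the keys of the other input do not, which is why later reconstruction on that side needs random access.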
Outer Join
An outer join does not require each record in
the two joined tables to have a matching
record.
The joined table retains each record—even if
no other matching record exists.
Left outer join
Right outer join
Full outer join
Left Outer Join
MonetDB’s Basic Operators (3)
reconstruct(A, r)
Output:
All (key, attr) pairs of base column A at the positions
specified by r.
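A minimal sketch of reconstruct, assuming the base column is a Python list whose index is the (virtual) key and r is a list of keys/positions.

```python
def reconstruct(base_column, r):
    """Fetch the (key, attr) pairs of a base column at the positions in r."""
    return [(key, base_column[key]) for key in r]

base_b = ["a", "b", "c", "d", "e"]
ordered = reconstruct(base_b, [0, 2, 3])      # sequential access pattern
unordered = reconstruct(base_b, [3, 0, 2])    # random access pattern
print(ordered, unordered)
```

When r is sorted, the lookups walk the column front to back; when an earlier operator has destroyed the order, the same call jumps around the column.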
Selection-Based Cracking
Cracker column
The first time an attribute A is required by a query, a copy
of column A is created, called the cracker column CA of A.
Each selection operator on A triggers a range-based
physical reorganization of CA.
Each cracker column has a cracker index (AVL-tree) to
maintain partitioning information.
Future queries benefit from the physically clustered data
and do not need to access the whole column.
AVL-Tree
An AVL tree is a self-balancing binary search
tree.
In an AVL tree, the heights of the two child
subtrees of any node differ by at most one.
An example of an unbalanced non-AVL
tree
The same tree after being height-balanced
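The height condition can be checked with a few lines of code. This sketch only tests the balance property stated above (not the search-tree ordering), and the Node class is a hypothetical helper.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    """Number of nodes on the longest path from node down to a leaf."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))

def is_height_balanced(node):
    """True if, at every node, the child subtree heights differ by at most one."""
    if node is None:
        return True
    return (abs(height(node.left) - height(node.right)) <= 1
            and is_height_balanced(node.left)
            and is_height_balanced(node.right))

chain = Node(1, right=Node(2, right=Node(3)))     # degenerate, list-like tree
balanced = Node(2, Node(1), Node(3))              # same keys, height-balanced
print(is_height_balanced(chain), is_height_balanced(balanced))
```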
Order for Tuple Reconstruction
The order in which tuples are inserted is used
for tuple reconstruction.
Physical reorganization happens only on cracker
columns.
The crackers.select Operator
crackers.select(A, v1, v2)
First, it creates CA if it does not exist.
It searches the index of CA for the area where v1 and
v2 fall.
If the bounds do not exist, i.e., no query used them in
the past, then CA is physically reorganized to cluster
all qualifying tuples into a contiguous area.
Output:
A list of keys/positions.
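The steps of crackers.select can be sketched as follows. Several simplifications are assumed: the cracker index is a pair of sorted Python lists instead of an AVL-tree, the range is half-open (v1 <= value < v2), and pieces are reorganized with a stable partition rather than in-place swaps. The class name CrackerColumn is hypothetical.

```python
import bisect

class CrackerColumn:
    def __init__(self, base_values):
        # Copy of the base column; each value keeps its original key/position.
        self.data = [(v, k) for k, v in enumerate(base_values)]
        self.bounds = []   # sorted crack values (stand-in for the AVL-tree)
        self.starts = []   # starts[i]: first position holding a value >= bounds[i]

    def _crack(self, v):
        """Ensure a piece boundary at value v and return its position."""
        i = bisect.bisect_left(self.bounds, v)
        if i < len(self.bounds) and self.bounds[i] == v:
            return self.starts[i]                 # bound was used by a past query
        lo = self.starts[i - 1] if i > 0 else 0
        hi = self.starts[i] if i < len(self.starts) else len(self.data)
        piece = self.data[lo:hi]                  # only this piece is touched
        smaller = [e for e in piece if e[0] < v]
        larger = [e for e in piece if e[0] >= v]
        self.data[lo:hi] = smaller + larger
        pos = lo + len(smaller)
        self.bounds.insert(i, v)
        self.starts.insert(i, pos)
        return pos

    def select(self, v1, v2):
        """Keys of all values with v1 <= value < v2, as one contiguous slice."""
        start, end = self._crack(v1), self._crack(v2)
        return [k for _, k in self.data[start:end]]

col = CrackerColumn([12, 3, 5, 9, 15, 22, 7])
first = col.select(4, 13)     # cracks the column at 4 and at 13
second = col.select(8, 20)    # only refines the pieces that contain 8 and 20
print(first, second, col.bounds)
```

After the two queries the index knows four bounds, and a repeated query on any of them is answered without touching the data.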
Cracker Map
A cracker map MAB is defined as a two-column table over two attributes A and B of a
relation R.
Values of A are stored in the left column, called
head.
Values of B are stored in the right column, called
tail.
Values of A and B in the same position of MAB
belong to the same tuple.
Maps Are Created on Demand Only
When a query q needs access to attribute B
based on a restriction on attribute A and MAB
does not exist,
then q will create MAB by performing a scan
over base columns A and B.
For each cracker map MAB , there is a
cracker index (AVL-tree) that maintains
information about how A values are
distributed over MAB.
Queries Trigger Cracking
Query Style
Access B based on A.
Each such query triggers cracking (physical
reorganization) of MAB based on the
restriction applied to A.
Cracking
All tuples with values of A that satisfy the
restriction are in a contiguous area in MAB.
Realized by splitting a piece of MAB into two or
three new pieces.
The sideways.select(A, v1, v2, B) Operator
Returns tuples of attribute B of relation R based on a
predicate on attribute A of R as follows:
(1) If there is no cracker map MAB , then create one.
(2) Search the index of MAB to find the contiguous area w of the
pieces related to the restriction σ on A.
If σ does not match existing piece boundaries,
(3) Physically reorganize w to move false hits out of the
contiguous area of qualifying tuples.
(4) Update the cracker index of MAB accordingly.
(5) Return a non-materialized view of the tail of w.
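Steps (1) to (5) can be sketched as below. The assumptions: a relation is a dict of column lists, the cracker index is a pair of sorted lists rather than an AVL-tree, the range is half-open (v1 <= A < v2), and the result is returned as a materialized list instead of a view. CrackerMap and sideways_select are hypothetical names.

```python
import bisect

class CrackerMap:
    """Two-column table (head = A values, tail = B values) plus a crack index."""
    def __init__(self, head_col, tail_col):
        self.rows = list(zip(head_col, tail_col))     # scan of base columns A and B
        self.bounds, self.starts = [], []

    def crack(self, v):
        """Ensure a piece boundary at head value v and return its position."""
        i = bisect.bisect_left(self.bounds, v)
        if i < len(self.bounds) and self.bounds[i] == v:
            return self.starts[i]
        lo = self.starts[i - 1] if i > 0 else 0
        hi = self.starts[i] if i < len(self.starts) else len(self.rows)
        piece = self.rows[lo:hi]
        smaller = [r for r in piece if r[0] < v]
        self.rows[lo:hi] = smaller + [r for r in piece if r[0] >= v]
        self.bounds.insert(i, v)                      # step (4): update the index
        self.starts.insert(i, lo + len(smaller))
        return lo + len(smaller)

def sideways_select(relation, maps, a, v1, v2, b):
    if (a, b) not in maps:                            # step (1): create on demand
        maps[(a, b)] = CrackerMap(relation[a], relation[b])
    m = maps[(a, b)]
    start, end = m.crack(v1), m.crack(v2)             # steps (2) and (3)
    return [tail for _, tail in m.rows[start:end]]    # step (5), materialized here

R = {"A": [12, 3, 5, 9, 15, 22, 7],
     "B": ["b0", "b1", "b2", "b3", "b4", "b5", "b6"]}
maps = {}
result = sideways_select(R, maps, "A", 4, 13, "B")
print(result)
```

Since head and tail values move together, the B values of the qualifying tuples end up contiguous without any positional lookup into the base column.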
Multi-Projection Queries
A single-selection query q that projects n
attributes requires n maps, one for each
attribute to be projected.
Select B, C
From R
Where A < 4;
For this query, we need 2 maps, MAB and MAC.
All maps that have been created using A as
head are collected in the map set SA.
Adaptive Alignment
The Problem
Naïve use of the sideways.select operator may
lead to non-aligned cracker maps.
The Solution
Extend the sideways.select operator with an
alignment step to keep the maps aligned.
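A small demonstration of how maps can drift apart. For brevity this sketch reorganizes the whole map with a stable three-way partition instead of cracking single pieces, and it assumes half-open ranges; crack_range is a hypothetical helper.

```python
def crack_range(rows, v1, v2):
    """Partition a map on its head into < v1, [v1, v2) and >= v2.

    Returns the reorganized rows and the (start, end) of the qualifying area.
    """
    low = [r for r in rows if r[0] < v1]
    mid = [r for r in rows if v1 <= r[0] < v2]
    high = [r for r in rows if r[0] >= v2]
    return low + mid + high, (len(low), len(low) + len(mid))

A = [12, 3, 5, 9, 15, 22, 7]
B = [f"b{k}" for k in range(len(A))]
C = [f"c{k}" for k in range(len(A))]
m_ab, m_ac = list(zip(A, B)), list(zip(A, C))

# q1: select B where 8 <= A < 20  (cracks M_AB only)
m_ab, _ = crack_range(m_ab, 8, 20)
# q2: select B, C where 4 <= A < 13  (cracks both maps)
m_ab, (s, e) = crack_range(m_ab, 4, 13)
m_ac, (s2, e2) = crack_range(m_ac, 4, 13)

heads_ab = [h for h, _ in m_ab[s:e]]
heads_ac = [h for h, _ in m_ac[s2:e2]]
print(heads_ab, heads_ac)
```

Both maps hold the same qualifying tuples in the same area, but in a different order, so pairing the tails of M_AB and M_AC by position would combine B and C values of different tuples.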
The Basic Idea
Is to apply all physical reorganizations, due to
selections on an attribute A, in the same order to
all maps in the map set SA.
Cracker Tape
For each map set SA, introduce a cracker tape TA.
TA logs (in order of their occurrence) all selections on attribute A
that trigger cracking of any map in SA.
Each map MAx is equipped with a cursor pointing to the entry in
TA that represents the last crack on MAx.
Given a tape TA, a map MAx is aligned (synchronized)
by successively forwarding its cursor towards the end of
TA
and incrementally cracking MAx according to all
selections it passes on its way.
All maps whose cursors point to the same position in TA ,
are physically aligned.
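A sketch of a map set with a cracker tape, under the following assumptions: the cracker index is a pair of sorted lists rather than an AVL-tree, ranges are half-open, cracking uses a deterministic stable partition, and the cursor is stored as the index of the next tape entry to apply. Map and MapSet are hypothetical names.

```python
import bisect

class Map:
    def __init__(self, head, tail):
        self.rows = list(zip(head, tail))
        self.bounds, self.starts = [], []   # crack index
        self.cursor = 0                     # next tape entry not yet applied

    def crack(self, v):
        i = bisect.bisect_left(self.bounds, v)
        if i < len(self.bounds) and self.bounds[i] == v:
            return self.starts[i]
        lo = self.starts[i - 1] if i > 0 else 0
        hi = self.starts[i] if i < len(self.starts) else len(self.rows)
        piece = self.rows[lo:hi]
        smaller = [r for r in piece if r[0] < v]
        self.rows[lo:hi] = smaller + [r for r in piece if r[0] >= v]
        self.bounds.insert(i, v)
        self.starts.insert(i, lo + len(smaller))
        return lo + len(smaller)

class MapSet:
    def __init__(self, relation, a):
        self.relation, self.a = relation, a
        self.tape, self.maps = [], {}       # T_A and the maps with head A

    def align(self, m):
        """Replay every logged selection the map has not seen yet."""
        while m.cursor < len(self.tape):
            v1, v2 = self.tape[m.cursor]
            m.crack(v1)
            m.crack(v2)
            m.cursor += 1

    def select(self, v1, v2, b):
        if b not in self.maps:
            self.maps[b] = Map(self.relation[self.a], self.relation[b])
        m = self.maps[b]
        self.align(m)
        if v1 not in m.bounds or v2 not in m.bounds:
            self.tape.append((v1, v2))      # log only selections that crack
        start, end = m.crack(v1), m.crack(v2)
        m.cursor = len(self.tape)
        return [tail for _, tail in m.rows[start:end]]

n = 7
R = {"A": [12, 3, 5, 9, 15, 22, 7],
     "B": [f"b{k}" for k in range(n)],
     "C": [f"c{k}" for k in range(n)]}
s_a = MapSet(R, "A")
s_a.select(4, 13, "B")
s_a.select(8, 20, "B")
res_c = s_a.select(4, 13, "C")   # M_AC is created, then replays both tape entries
res_b = s_a.select(4, 13, "B")
print(list(zip(res_b, res_c)))
```

Because M_AC replays the same cracks in the same order as M_AB, the two results can be combined position by position.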
The Extended sideways.select Operator
Map Set Choice: Self-organizing Histograms
Following the “cracking philosophy”:
In an unpredictable environment with no idle
system time, always perform the minimum
investment.
For a query q, a set SA is chosen
such that the restriction on A is the most
selective in q,
yielding a minimal bit vector.
The most selective restriction can be found
using the cracker indices.
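One way the cracker indices could serve as histograms is sketched below. It assumes each index is available as sorted crack values with their start positions and that ranges are half-open; the estimate is only an upper bound derived from known piece boundaries, and all names are hypothetical.

```python
import bisect

def estimate_qualifying(bounds, starts, n, v1, v2):
    """Upper bound on the number of tuples with v1 <= value < v2.

    bounds: sorted crack values; starts[i]: first position with value >= bounds[i];
    n: number of tuples in the map.
    """
    i = bisect.bisect_right(bounds, v1)        # bounds[:i] are <= v1
    lo = starts[i - 1] if i > 0 else 0         # everything before lo is < v1
    j = bisect.bisect_left(bounds, v2)         # bounds[j:] are >= v2
    hi = starts[j] if j < len(bounds) else n   # everything from hi on is >= v2
    return hi - lo

def choose_map_set(indices, n, restrictions):
    """Pick the attribute whose restriction has the smallest estimate."""
    return min(restrictions,
               key=lambda a: estimate_qualifying(*indices[a], n, *restrictions[a]))

indices = {"A": ([4, 13], [1, 5]), "B": ([50], [6])}
restrictions = {"A": (4, 13), "B": (0, 50)}
chosen = choose_map_set(indices, 7, restrictions)
print(chosen)
```

The estimates get tighter as more queries crack the maps, without any separate statistics-gathering step.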
Complex Queries
No (relational) operator other than
tuple reconstruction depends on tuple
insertion order.
Joins, aggregations, groupings, etc.
Potentially many operators can exploit the
clustering information in the maps.
A MAX operator can consider only the last piece
of a map.
Such directions are for future work.
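To illustrate the MAX idea only, here is a sketch that assumes the head column is a list and the cracker index is given as the sorted start positions of the pieces; max_head is a hypothetical helper, not an operator from the slides.

```python
def max_head(heads, starts):
    """Maximum head value, scanning only the last non-empty piece."""
    end = len(heads)
    for start in reversed([0] + starts):
        if start < end:
            return max(heads[start:end])   # every earlier piece holds smaller values
        end = start
    raise ValueError("empty map")

# Map cracked at 4, 8, 13 and 20: pieces start at positions 1, 3, 5 and 6.
heads = [3, 5, 7, 12, 9, 15, 22]
starts = [1, 3, 5, 6]
largest = max_head(heads, starts)
print(largest)
```

Only one value is inspected here instead of all seven, because the pieces are ordered by value range.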
Experimental Analysis
Compare the implementations of selection cracking and
sideways cracking on top of MonetDB
against the latest non-cracking version of
MonetDB
and against MonetDB on presorted data.
Results
Sideways cracking achieves performance similar
to that of presorted data,
but without the heavy initial cost and the
restrictions on updates and workload prediction.
Partial Sideways Cracking
Takes storage restrictions into account.
Partial Maps
Maps are only partially materialized, driven by the
workload.
A map consists of several chunks.
Each chunk is a separate two-column table.
Each chunk contains a given value range of the
head attribute of this map.
Each chunk is cracked separately.
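A rough sketch of workload-driven chunk materialization. It assumes fixed, predefined head value ranges for the chunks (the slides do not say how ranges are chosen), half-open query ranges, and a plain filter inside each chunk in place of per-chunk cracking. PartialMap is a hypothetical name.

```python
class PartialMap:
    def __init__(self, head_col, tail_col, chunk_ranges):
        self.head_col, self.tail_col = head_col, tail_col
        self.chunk_ranges = chunk_ranges   # assumed fixed value ranges [lo, hi)
        self.chunks = {}                   # only chunks some query has needed

    def _chunk(self, rng):
        """Materialize the chunk for a head value range on first use."""
        if rng not in self.chunks:
            lo, hi = rng
            self.chunks[rng] = [(h, t) for h, t in zip(self.head_col, self.tail_col)
                                if lo <= h < hi]
        return self.chunks[rng]

    def select(self, v1, v2):
        """Tail values of all tuples with v1 <= head < v2."""
        out = []
        for lo, hi in self.chunk_ranges:
            if lo < v2 and v1 < hi:        # chunk overlaps the query range
                out += [t for h, t in self._chunk((lo, hi)) if v1 <= h < v2]
        return out

A = [12, 3, 5, 9, 15, 22, 7]
B = [f"b{k}" for k in range(len(A))]
pm = PartialMap(A, B, [(0, 10), (10, 20), (20, 30)])
result = pm.select(4, 9)
print(result, len(pm.chunks))
```

After this query only the chunk for head values in [0, 10) exists; the other two ranges consume no storage until a query asks for them.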
A Research Direction
Improving performance by compression
C-Store uses compression heavily.
Can we integrate compression with cracking?
References
S. Idreos, M. L. Kersten, S. Manegold. Self-organizing
Tuple Reconstruction in Column-stores.
In Proceedings of the ACM SIGMOD International
Conference on Management of Data, Providence,
RI, USA, June 2009 (accepted for publication).
D. J. Abadi, D. S. Myers, D. J. DeWitt,
S. R. Madden.
Materialization Strategies in a Column-Oriented
DBMS. In Proceedings of ICDE, Istanbul,
Turkey, April 2007.