Interactive Dynamic Aggregate Queries Kenneth A. Ross, Junyan Ding

Download Report

Transcript Interactive Dynamic Aggregate Queries Kenneth A. Ross, Junyan Ding

Interactive Dynamic
Aggregate Queries
Kenneth A. Ross,
Junyan Ding
Columbia University
April 5, 2002
DGRC Evaluation Board Meeting
Research Experience



Complex query processing
Data Warehousing
Main memory databases
Students: Kazi Zaman, Junyan Ding
April 5, 2002
DGRC Evaluation Board Meeting
Outline





Past work on Datacubes
Scenario Overview
Performance Goals
Technical Details
Conclusion
April 5, 2002
DGRC Evaluation Board Meeting
Datacubes
State Year Grade Sales
State
Year
Grade
Sales
CA
NY
CA
1997 Regular 90
1997 Premium 70
1998 Premium 65
NY
1998 Premium 95
CUBE BY
(sum Sales)
Large increase in total Size,
especially with many dimensions
April 5, 2002
CA
CA
ALL
CA
ALL
ALL
ALL
CA
ALL
1997
1997
1997
ALL
1997
1997
ALL
ALL
ALL
Additional
records
…….
DGRC Evaluation Board Meeting
Regular 90
ALL
90
Regular 90
Regular 90
Regular 90
ALL
160
Regular 90
ALL
155
ALL
320
Two-Level Framework
Level-1 Store
Level-2 Store
Finest granularity
cuboid
Query q
records in linked
lists
Selected coarse
records in hash
table
April 5, 2002
Slot directory
DGRC Evaluation Board Meeting
Datacube Implementation Issues


Fast query response (2-4 ms/query), but
Rigid:
– Need to know dimensions and aggregate
functions in advance for precomputation.
– Not appropriate when hundreds (PUMS) or
thousands (NHANES) of dimensions are
available.
April 5, 2002
DGRC Evaluation Board Meeting
Scenario Outline
User
Data Request
Dynamic Query
Graphical User
Interface
Mediator
Web
...
Unified
Results
Traditional
DBMS
Dynamic Query
Engine
April 5, 2002
Data Files
e.g., PUMS
DGRC Evaluation Board Meeting
Engine Decoupled from Interface
•Can use a variety of interfaces
•Multiple connections to one server
•Can “do one thing well”
•Client/Server parallelism
•Abstract interaction via API
April 5, 2002
DGRC Evaluation Board Meeting
Engine Performance Goals
•Interactive data exploration
•Millions of records
•Thousands of columns (but look at ten
or so at a time)
•Aggregates and statistical measures
•Fine adjustments at 30 answers/second.
April 5, 2002
DGRC Evaluation Board Meeting
Technical Details
•Main Memory Implementation
•Multidimensional tree structures
•Cache consciousness
•Branch Misprediction
•SIMD
•Asynchronous work
April 5, 2002
DGRC Evaluation Board Meeting
Initial Results



Preliminary implementation
tens of answers/second on PUMS 1%
data (>2 million records)
“Vanilla” user interface under
construction
April 5, 2002
DGRC Evaluation Board Meeting
Conclusions and Plan



First step towards dynamic aggregate
queries
More general dependence on
parameters
More ambitious user interfaces
April 5, 2002
DGRC Evaluation Board Meeting
Plan




Integrate with user interface to generate
dynamic queries.
Self-tuning capability.
Multiple data sets.
Work with agencies to generate value
– For intra-agency analysis
– For enhanced data dissemination
April 5, 2002
DGRC Evaluation Board Meeting