Query Processing and Networking Infrastructures

Day 2 of 2
Joe Hellerstein
UC Berkeley
September 27, 2002
Outline

Day 1: Query Processing Crash Course
- Intro
- Queries as indirection
- How do relational databases run queries?
- How do search engines run queries?
- Scaling up: cluster parallelism and distribution

Day 2: Research Synergies w/ Networking
- Queries as indirection, revisited
- Useful (?) analogies to networking research
- Some of our recent research at the seams
- Some of your research?
- Directions and collective discussion
Indirections

Standard: Spatial Indirection
- Allows the referent to move without changes to referrers
  - It doesn't matter where the object is; we can find it.
- Alternative: copying
  - Works if updates are managed carefully, or don't exist
Temporal Indirection
- Asynchronous communication is indirection in time
  - It doesn't matter when the object arrives; you can find it.
- Analogy to space:
  - Sender → referrer
  - Recipient → referent
Generalizing
- Indirection in Space
  - x-to-one or x-to-many?
  - Physical or logical mapping?
- Indirection in Time
  - Persistence model: storage or re-xmission (retransmission)
  - Persistence role: sender or receiver
Indirection in Space, Redux
- One-to-one, one-to-many, many-to-many?
  - The standard relational issue
  - E.g. a virtual address is many-to-one
  - E.g. an email distribution list is one-to-many
- Physical or logical?
  - Physical: a mapping table
    - E.g. page tables, mailing lists, DNS, multicast group lists
  - Logical
    - E.g. queries, subscriptions, interests
Indirection in Time, Redux
- Persistence model: storage or re-xmission
  - Storage: e.g. DB, heap, stack, NW buffer, mail queue
  - Re-xmission: e.g. polling, retries ("Joe is so persistent")
- Persistence of put or get
  - Put: e.g. DB insert, email, retry
  - Get: e.g. subscription, polling
- (The four combinations are sketched below.)
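To make the taxonomy concrete, here is a minimal Python sketch of the four time-indirection combinations; all of the names and APIs are invented for illustration, not drawn from any real system.

```python
import queue, time

# Stored put: the sender's message persists in storage (e.g. a mailqueue)
# until the receiver comes looking for it.
mailbox = queue.Queue()
def stored_put(msg):
    mailbox.put(msg)            # message persists; sender moves on

# Re-xmitted put: nothing persists, so the sender retries until acked.
def rexmit_put(send, msg, retries=5, backoff=0.1):
    for i in range(retries):
        if send(msg):           # send() returns True on ack
            return True
        time.sleep(backoff * 2 ** i)
    return False

# Stored get: the receiver's request persists (a callback/subscription)
# and fires whenever a message arrives.
callbacks = []
def stored_get(handler):
    callbacks.append(handler)   # the request persists with the putter

def notify(msg):                # putter side: stored gets fire on arrival
    for handler in callbacks:
        handler(msg)

# Re-xmitted get: the receiver polls repeatedly until data shows up.
def rexmit_get(poll, interval=0.1):
    while True:
        msg = poll()
        if msg is not None:
            return msg
        time.sleep(interval)
```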
Examples: Storage Systems
- Virtual Memory System
  - Space: 1-to-1, physical
  - Time: synchronous (no indirection)
- Database System
  - Space: many-to-many, logical
  - Time: synchronous (no indirection)
- Broadcast Disks
  - Space: 1-to-1
  - Time: re-xmitted put
Examples: Split-Phase APIs
- Polling
  - Space: no indirection
  - Time: re-xmitted get
- Callbacks
  - Space: no indirection
  - Time: stored get
- Active Messages
  - Space: no indirection
  - Time: stored get
    - The app stores a get with the putter, which tags it on messages
Examples: Communication
- Email
  - Space: one-to-many, physical
    - The mapping is one-to-many; delivery is one-to-one (copies)
  - Time: stored put
- Multicast
  - Space: one-to-many, physical
    - Both the mapping and delivery are one-to-many
  - Time: roughly synchronous?
Examples: Distributed APIs
- RPC
  - Space: 1-to-1, physical (can be 1-to-many)
  - Time: synchronous (no indirection)
- Messaging systems
  - Space: 1-to-1, physical (often 1-to-many)
  - Time: depends!
    - Transactional messaging is stored put: exactly-once transmission guaranteed
    - Other schemes are re-xmitted put: at-least-once transmission, so idempotency of messages becomes important! (See the sketch below.)
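Why idempotency matters under at-least-once delivery, as a minimal sketch (the class and message ids are hypothetical): the receiver deduplicates by message id, so a retransmitted put has its effect exactly once.

```python
import uuid

class IdempotentReceiver:
    """Receiver that tolerates at-least-once (duplicate) delivery."""
    def __init__(self):
        self.seen = set()       # ids of already-processed messages
        self.state = 0

    def deliver(self, msg_id, amount):
        if msg_id in self.seen: # duplicate from a retry: ignore it
            return
        self.seen.add(msg_id)
        self.state += amount    # the effect happens exactly once

rx = IdempotentReceiver()
msg = (uuid.uuid4().hex, 10)
rx.deliver(*msg)   # first delivery applies the effect
rx.deliver(*msg)   # retransmission is a no-op; state is still 10
assert rx.state == 10
```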
Examples: Logic-based APIs
- Publish-Subscribe
  - Space: one-to-many, logical
  - Time: stored receiver (the subscription persists)
- Tuplespaces
  - Space: one-to-many, logical
  - Time: stored sender (the tuple persists)
- (Both are sketched below.)
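A minimal sketch contrasting the two (hypothetical classes, not any real pub/sub or tuplespace API): in pub/sub the receiver's predicate persists and waits for data; in a tuplespace the sender's tuple persists and waits for a matching read.

```python
# Publish-subscribe: the *receiver* persists, as a stored predicate.
class PubSub:
    def __init__(self):
        self.subs = []                      # (predicate, handler) pairs
    def subscribe(self, pred, handler):
        self.subs.append((pred, handler))   # stored get
    def publish(self, item):                # one-to-many, logical match
        for pred, handler in self.subs:
            if pred(item):
                handler(item)

# Tuplespace: the *sender* persists, as a stored tuple.
class TupleSpace:
    def __init__(self):
        self.tuples = []
    def out(self, tup):                     # stored put
        self.tuples.append(tup)
    def rd(self, pred):                     # non-destructive matching read
        return [t for t in self.tuples if pred(t)]

ps = PubSub()
ps.subscribe(lambda t: t[0] == "temp", print)
ps.publish(("temp", 72))                    # subscriber fires now

ts = TupleSpace()
ts.out(("temp", 72))                        # tuple waits in the space
print(ts.rd(lambda t: t[0] == "temp"))      # reader finds it later
```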

Indirection Summary
- 2 binary indirection variables for space, 2 for time
- Can have indirection in one without the other
- Leads to 24 indirection options
  - 16 joint space/time indirections, 4 space-only, 4 time-only
  - And few lessons about the tradeoffs!
- Note: the issues here are in performance and SW engineering and …
  - E.g. "Are tuplespaces better than pub/sub?"
  - Not a unidimensional question!
Rendezvous
- Indirection on both the sender and receiver side
  - In time and/or space on each side
- Most general case: neither sender nor receiver knows where or when the rendezvous will happen!
  - Each chases a reference for where
  - Each must persist for when
Join as Rendezvous
- Recall the pipelining hash join: combine all blue and gray tuples that match
- A batch rendezvous (sketched below)
  - In space: the data items are not stored in a fixed location; they are copied into the hash tables
  - In time: both sides do a put-persist in the join algorithm, via storage
- A hint of things to come:
  - In parallel DBs, the hash table is content-addressed (via the Exchange routing function)
  - What if the hash table is distributed?
  - If a tuple in the join is doing a "get", is there a distinction between sender and recipient? Between query and data?
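A minimal Python sketch of the pipelining (symmetric) hash join described above, assuming two in-memory inputs: each arriving tuple is persisted into its own side's hash table and immediately probes the other side's, so matching is exactly a batch rendezvous.

```python
from collections import defaultdict
from itertools import zip_longest

def symmetric_hash_join(left, right, key_l, key_r):
    """Pipelining hash join: tuples from either input arrive in any
    order; each is built into its own hash table and probed against
    the other's, so matches stream out as soon as both sides exist."""
    ht_l, ht_r = defaultdict(list), defaultdict(list)
    for side, tup in interleave(left, right):   # any interleaving works
        if side == "L":
            k = key_l(tup)
            ht_l[k].append(tup)                 # put-persist in storage
            for match in ht_r.get(k, []):       # rendezvous with the past
                yield (tup, match)
        else:
            k = key_r(tup)
            ht_r[k].append(tup)
            for match in ht_l.get(k, []):
                yield (match, tup)

def interleave(left, right):
    # Simple round-robin arrival order for the sketch.
    pairs = zip_longest((("L", t) for t in left), (("R", t) for t in right))
    return (x for pair in pairs for x in pair if x is not None)

R = [(1, "a"), (2, "b")]
S = [(2, "x"), (1, "y"), (1, "z")]
print(list(symmetric_hash_join(R, S, key_l=lambda t: t[0], key_r=lambda t: t[0])))
```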
Some resonances
- We said that query systems are an indirection mechanism
  - Logical, many-to-many, but synchronous: query-response
- And some dataflow techniques inside query engines seem to provide useful indirection mechanisms
- If we add a network into the picture, life gets very interesting
  - Indirection in space is very useful
  - Indirection in time is critical
  - Rendezvous is a basic operation
More Resonance
More Interaction: CS262
- An experiment with Eric Brewer: merge the OS & DBMS grad classes over a year
  - Eric/Joe, point/counterpoint
- Some tie-ins were obvious:
  - Memory mgmt, storage, scheduling, concurrency
- Surprising: QP and networks go well side by side
  - E.g. eddies and TCP congestion control
  - Both use back-pressure and simple control theory to "learn" in an unpredictable dataflow environment
Scout
- Paths are the key to a comm-centric OS
  - "Making Paths Explicit in the Scout Operating System", David Mosberger and Larry L. Peterson. OSDI '96.
[Figure 3: Example Router Graph]
CLICK
- A NW router is a query plan!
  - With a twist: flow-based context
- An opportunity for "autonomous" query optimization
Revisiting a NW Classic with DB Goggles
- Clark & Tennenhouse, "Architectural Considerations for a New Generation of Protocols", SIGCOMM '90
- I love it for two reasons:
  - It tries to capture the essence of what networks do
    - Great for people who need the 10,000-foot view!
    - I'm a fan of doing this (witness last week)
  - It tries to move the community up the food chain
- Resonances everywhere!!
C&T Overview (for amateurs like me)
- The core function of protocols: data xfer
  - Data Manipulation: buffering, checksums, encryption, xfer to/from app space, presentation
  - Transfer Control: flow/congestion ctl, detecting transmission problems, acks, muxing, timestamps, framing
[Slide annotation: "Query Opt!"]
C & T's Wacky Ideas
- Thesis: nets are good at xfer control, not so good at data manipulation
- Some C&T wacky ideas for better data manipulation:
  - Xfer semantic units, not packets (ALF): data modeling!
  - Auto-rewrite layers to flatten them (ILP)
  - Minimize cross-layer ordering constraints
  - Control delivery in parallel via packet content: Exchange!
DB People Should Be Experts!
- BUT… remember the basic Internet assumption:
  - "a network of unknown topology and with an unknown, unknowable and constantly changing population of competing conversations" (Van Jacobson)
- This spoils the whole optimize-then-execute architecture of query optimization
  - What happens when d(environment)/dt < query length??
  - What about the competing conversations?
  - How do we handle the unknown topology?
  - What about partial failure?
- Ideally, we'd like:
  - the semantics and optimization of DB dataflow
  - with the agility and efficiency of NW dataflow
The Cosmic Convergence
[Figure: DATABASE RESEARCH (Data Models, Query Opt, Data Scalability) and NETWORKING RESEARCH (Adaptivity, Federated Control, Geo-Scalability) converging on: Continuous Queries & Streams, P2P Query Engines, Adaptive Query Engines, Sensor Query Processing, XML Routing, Router Toolkits, Content Addressing and DHTs, Directed Diffusion]
What does the QP perspective add?
- In terms of high-level languages?
- In terms of a reusable set of operators?
- In terms of optimization opportunities?
- In terms of batch-I/O tricks?
- In terms of approximate answers?
- A "safe" route to Active Networks?
  - Not computationally complete
  - Optimizable and reconfigurable: data independence applies
- Fun to be had here!
  - Addressing a few fronts at Berkeley…
Some of our work at the seams
- Starting with a centralized engine for remote data sets and streams
  - Telegraph: eddies, SteMs, FLuX
  - The "Deep Web", filesharing systems, sensor streams
- More recently, querying sensor networks
  - TinyDB/TAG: in-network queries
- And DHT-based overlay networks
  - PIER
Telegraph Overview

Telegraph: An Adaptive Dataflow System
- Themes: adaptivity and sharing
  - Adaptivity encapsulated in operators:
    - Eddies for order of operations
    - State Modules (SteMs) for transient state
    - FLuX for parallel load-balance and availability
  - Work- and state-sharing across flows
    - Unlike traditional relational schemes, try to share physical structures
- Franklin, Hellerstein, Hong and students (to follow)
Telegraph Architecture
[Architecture diagram: SQL and explicit dataflows enter through request parsing and metadata (XML catalog); online query processing modules (Join, Select, Project, Group/Aggregate, Transitive Closure, DupElim, Juggle); adaptive routing and optimization (Eddy, SteM, FLuX); ingress operators (File Reader, Sensor Proxy, P2P Proxy, TeSS); inter-module communication and scheduling via Fjords]
Continuous Adaptivity: Eddies
[Diagram: an eddy routing tuples among pipelined operators]
- A little more state per tuple
  - Ready/done bits (extensible a la Volcano/Starburst)
- Minimal state in the Eddy itself
  - A queue, plus the parameters being learned
  - Decisions: which tuple in the queue goes to which operator
- Query processing = dataflow routing!!
Ron Avnur
Two Key Observations
- Break the set-oriented boundary
  - Usual DB model: algebra expressions, e.g. (R ⋈ S) ⋈ T
  - Common DB implementation: pipelining operators!
    - Subexpressions needn't be materialized
    - The typical implementation is more flexible than the algebra: we can reorder in-flight operators
- Don't rewrite the graph; impose a router
  - A graph edge = the absence of a routing constraint
  - Observe operator consumption/production rates
    - Consumption: cost. Production: cost × selectivity
    - Could break these down per tuple values
- So fun!
  - Simple, incremental, general
  - Brings all of query optimization online (see the routing-loop sketch below)
    - And hence a bridge to ML, control theory, queueing theory
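A minimal sketch of an eddy's routing loop, assuming boolean filter operators and a simple lottery over learned weights; this illustrates the idea, not Telegraph's actual implementation. Tuples carry a done bitmap, and a tuple is emitted once every operator has signed off.

```python
import random
from collections import deque

class Eddy:
    def __init__(self, ops):
        self.ops = ops                       # ops[i](tuple) -> bool (pass?)
        self.tickets = [1.0] * len(ops)      # learned routing weights
        self.queue = deque()

    def run(self, source):
        for t in source:
            self.queue.append((t, 0))        # (tuple, done bitmap)
        while self.queue:
            t, done = self.queue.popleft()
            pending = [i for i in range(len(self.ops)) if not done >> i & 1]
            if not pending:
                yield t                      # all done bits set: emit
                continue
            # Lottery scheduling: favor operators observed to be selective.
            weights = [self.tickets[i] for i in pending]
            i = random.choices(pending, weights=weights)[0]
            if self.ops[i](t):
                self.tickets[i] *= 0.99      # passed: op filters little
                self.queue.append((t, done | 1 << i))
            else:
                self.tickets[i] *= 1.01      # dropped: reward selectivity

ops = [lambda t: t % 2 == 0, lambda t: t < 50]
print(list(Eddy(ops).run(range(100))))      # evens below 50
```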
State Modules (SteMs)
- Goal: further adaptivity, through competition over static dataflows
  - Multiple mirrored sources (AMs): handle rate changes, failures, parallelism
  - Multiple alternate operators
- Join = routing + state (see the sketch below)
  - The SteM operator manages the tradeoffs
  - A State Module unifies caches, rendezvous buffers, and join state
  - Competing sources/operators share the building and probing of SteMs
  - Join algorithm hybridization!
- Eddies + SteMs tackle the full (single-site) query optimization problem online
[Diagrams: a static dataflow vs. an eddy vs. an eddy + SteMs]
Vijayshankar Raman, Amol Deshpande
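A minimal, hypothetical sketch of the SteM idea: a single build/probe state module, with a join reduced to a routing policy (build into your own side, probe the other). The same structure could serve as a cache or rendezvous buffer; the names are invented.

```python
from collections import defaultdict

class SteM:
    """State Module: a half-join's worth of state behind a build/probe
    API. The same structure can act as a cache or rendezvous buffer."""
    def __init__(self, key):
        self.key = key                    # key extractor for this side
        self.table = defaultdict(list)
    def build(self, tup):                 # store the tuple
        self.table[self.key(tup)].append(tup)
    def probe(self, k):                   # look up matches by key
        return self.table.get(k, [])

# A join is just a routing policy over two SteMs: build, then probe.
stem_r = SteM(lambda t: t[0])
stem_s = SteM(lambda t: t[0])

def route(side, tup):
    mine, other = (stem_r, stem_s) if side == "R" else (stem_s, stem_r)
    mine.build(tup)
    return [(tup, m) for m in other.probe(mine.key(tup))]

print(route("R", (1, "a")))   # no matches yet
print(route("S", (1, "x")))   # rendezvous: [((1, 'x'), (1, 'a'))]
```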
FLuX: Routing Across a Cluster
- Fault-tolerant, Load-balancing eXchange
- Continuous/long-running flows need high availability
- Big flows need parallelism
  - Adaptive load balancing is required
- The FLuX operator: Exchange plus…
  - Adaptive flow partitioning (River)
  - Transient state replication & migration (see the sketch below)
    - Replication & checkpointing for SteMs
    - Note: set-based, not sequence-based!
- Needs to be extensible to different ops:
  - Content-sensitivity
  - History-sensitivity
- Dataflow semantics
  - Optimize based on edge semantics
  - A networking tie-in again: at-least-once delivery? Exactly-once delivery? In/out of order?
Mehul Shah
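A minimal sketch of the Exchange-plus-indirection idea behind FLuX (names invented, and this is not the FLuX protocol): tuples are content-partitioned through a movable partition map, so a partition and its transient state can migrate or fail over without the producers changing.

```python
import hashlib

class FluxPartitioner:
    """Content-based routing through a movable partition map."""
    def __init__(self, n_partitions, nodes):
        # partition -> node indirection; moving state = editing this map
        self.pmap = {p: nodes[p % len(nodes)] for p in range(n_partitions)}
        self.n = n_partitions

    def partition(self, key):
        h = hashlib.sha1(repr(key).encode()).digest()
        return int.from_bytes(h[:4], "big") % self.n

    def route(self, tup, key):
        return self.pmap[self.partition(key)]    # current owner node

    def migrate(self, part, new_node):
        # Load balancing / failover: repoint the partition. In a real
        # system the partition's transient state moves (or a replica
        # takes over) before new tuples are routed there.
        self.pmap[part] = new_node

fx = FluxPartitioner(8, ["node0", "node1"])
print(fx.route(("key42", "payload"), "key42"))
fx.migrate(fx.partition("key42"), "node1")       # adapt under load/failure
print(fx.route(("key42", "payload"), "key42"))   # now routed to node1
```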
Continuously Adaptive Continuous Queries (CACQ)
- Continuous queries clearly need all this stuff!
  - A natural application of the Telegraph infrastructure
- 4 ideas in CACQ:
  - Use eddies to allow reordering of ops
    - But one eddy will serve for all queries
  - Queries are data: join them with a Grouped Filter (sketched below)
    - A la stored get!
    - This idea is extended in PSoup (Chandrasekaran & Franklin)
  - Explicit tuple lineage
    - Mark each tuple with per-op ready/done bits
    - Mark each tuple with per-query completed bits
  - Joins via SteMs, shared across all queries
    - Note: mixed-lineage tuples in a SteM, i.e. shared state is not shared algebraic expressions!
    - Delete a tuple from the flow only if it matches no query
Sam Madden, Mehul Shah, Vijayshankar Raman, Sirish Chandrasekaran
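A minimal sketch of the "queries are data" idea (hypothetical API): subscriptions persist as stored gets in a grouped filter, and each arriving tuple joins against the stored predicates, collecting the ids of the queries it satisfies, i.e. part of its lineage.

```python
class GroupedFilter:
    """Treat queries as data: one stream tuple joins against many
    stored predicates, instead of running one plan per query."""
    def __init__(self):
        self.preds = {}                       # query_id -> predicate

    def add_query(self, qid, pred):           # a stored get
        self.preds[qid] = pred

    def apply(self, tup):
        # Returns the tuple's query lineage: ids of satisfied queries.
        return {qid for qid, p in self.preds.items() if p(tup)}

gf = GroupedFilter()
gf.add_query("q1", lambda t: t["temp"] > 70)
gf.add_query("q2", lambda t: t["room"] == 522)

for tup in [{"temp": 75, "room": 522}, {"temp": 60, "room": 410}]:
    lineage = gf.apply(tup)
    if lineage:                               # drop tuples matching no query
        print(tup, "->", sorted(lineage))
```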
Sensor QP: TinyDB/TAG

Wireless Sensor Networks
[Figure: a spectrum of devices, from Smart Dust motes running TinyOS to Palm devices running Linux]
- A spectrum of devices
  - Varying degrees of power and network constraints
  - The fun is on the small side!
- Our current platform: Mica and TinyOS
  - 4 MHz Atmel CPU, 4 KB RAM, 40 kbit/s radio, 512 KB EEPROM, 128 KB Flash
  - Sensors: temp, light, accelerometer, magnetometer, mic, etc.
- Wireless, single-ported, multi-hop ad-hoc network
  - Spanning-tree communication through a "root"
TinyDB
- A query/trigger engine for motes
- A declarative (SQL-like) language, for optimizability
  - Data independence arguments apply in spades here!
  - Non-programmers can deal with it
- Lots of challenges at the seams of queries and routing
  - Query plans over a dynamic multi-hop network
  - With power and bandwidth consumption as the key metrics
Sam Madden (w/ Hellerstein, Hong, Franklin)
Focus: Hierarchical Aggregation
[Diagram: a query over the sensor network routing tree]
- Aggregation is natural in sensornets
  - The "big picture" is typically what's interesting
  - Aggregation can smooth noise and loss
    - E.g. signal-processing aggs like wavelets
  - Provides data reduction
- Power/network reduction: in-network aggregation
  - A hierarchical version of parallel aggregation
- A tricky design space:
  - power vs. quality
  - topology selection
  - value-based routing
  - a dynamic environment requires adaptivity
TinyDB Sample Apps
- Habitat monitoring: what is the average humidity in the populated petrel burrows on Great Duck Island right now? (See the example query below.)
- Smart office: find me the conference rooms that have been reserved but unoccupied for 5 minutes.
- Home automation: lower the blinds when light intensity is above a threshold.
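For flavor, here is roughly how the habitat-monitoring request might look in TinyDB's SQL-like continuous query language, wrapped in a hypothetical Python client; the schema and client API are invented, and the exact TinyDB syntax may differ.

```python
# Hypothetical client API; the query text is TinyDB-flavored SQL.
query = """
    SELECT AVG(humidity)
    FROM sensors
    WHERE occupied = 1          -- populated burrows only
    SAMPLE PERIOD 2048          -- one sample epoch every ~2 seconds
"""

def submit(q):
    # Stand-in for injecting the query at the root of the routing tree;
    # real TinyDB parses it on the desktop proxy and sends a compact
    # plan down to the motes.
    print("would run:", " ".join(q.split()))

submit(query)
```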
Performance in SensorNets
- Power consumption
  - Communication >> computation
    - METRIC: radio wake time
  - Send > receive
    - METRIC: messages generated
  - "Run for 5 years" vs. "burn power for critical events" vs. "run my experiment"
- Bandwidth constraints
  - Internal >> external
    - Volume >> surface area
- Result quality
  - Noisy sensors
  - Discrete sampling of continuous phenomena
  - Lossy communication channel
TinyDB
- An SQL-like language for specifying continuous queries and triggers
  - Schema management, etc.
- A proxy on the desktop, a small query engine per mote
  - Plug and play (query snooping)
- To keep the engine "tiny", use an eddy-style architecture
  - One explicit copy of each iterator's code image
  - Adaptive dataflow in the network
- An alpha is available for download on SourceForge
Some of the Optimization Issues
- An extensible aggregation API:
  - Init(), Iter(), SplitFlow(), Close() (sketched below)
- Properties:
  - Amount of intermediate state
  - Duplicate sensitivity
  - Monotonicity
  - Exemplary vs. summary
- Hypothesis testing
- Snooping and suppression
- Compression, presumption, interpolation
- Generally, QP and NW issues intertwine!
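A minimal sketch of the aggregation API for AVERAGE, following the method names on the slide (SplitFlow() omitted); the partial-state design, a (sum, count) record merged up the routing tree, is standard for in-network aggregation, but the code itself is illustrative, not TinyDB's.

```python
class Average:
    """AVG as in-network aggregation state: each mote holds a
    (sum, count) partial-state record small enough for one message."""
    def Init(self, value):
        self.s, self.c = value, 1          # leaf: state from one reading

    def Iter(self, other):
        self.s += other.s                  # merge a child's partial state
        self.c += other.c

    def Close(self):
        return self.s / self.c             # finalize at the root

# One level of the routing tree: a parent merges its children's
# partial states and forwards a single message upward.
children = []
for reading in [70.0, 74.0, 69.0]:
    a = Average()
    a.Init(reading)
    children.append(a)

parent = children[0]
for child in children[1:]:
    parent.Iter(child)
print(parent.Close())                      # 71.0, one message per link
```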
PIER: Querying the Internet

Querying the Internet
- As opposed to querying over the Internet
- Have to deal with Internet realities
  - Scale, dynamics, federated admin, partial failure, etc.
  - Standard distributed DBs won't work
- Applications
  - Start with real-time, distributed network monitoring
    - Traffic monitoring, intrusion/spam detection, software deployment detection (e.g. via TBIT), etc.
  - Use PIER's SQL as a workload generator for networks?
    - Virtual "tables" determine the load produced by each site
    - "Queries" become a way of specifying site-to-site communication
  - Move to infect the network more deeply?
    - E.g. indirection schemes like i3, rendezvous mechanisms, etc.
    - Overlays only?
And p2p QP, Obviously
- Gnutella done right
  - And it's so easy! :-)
- Crawler-free web search
- Bring WYGIWIGY queries to the people
  - Ranking, recommenders, etc.
- There's got to be more fun here
  - If p2p takes off in a big way, queries have to be a big piece
- Why a p2p DB, anyway?
  - Schema integration and transactions and … ?? No!
  - No good reason I can think of! :-)
- Focus on the grassroots nature of p2p
  - Work with what you've got: query the data that's out there
  - Nothing complicated for users will fly
  - Avoid the "DB" word: p2p QP, not p2p DB
Approach: Leverage DHTs
- "Distributed Hash Tables"
  - A family of distributed content-routing schemes
    - CAN, Chord, Pastry, Tapestry, etc.
  - An Internet-scale "hash table"
    - A la a wide-area, adaptive Exchange routing table
    - With some notion of storage
- Leverage DHTs aggressively:
  - As distributed indexes on stored data
  - As state modules for query processing
    - E.g. use DHTs as the hash tables in a hash join (sketched below)
  - As rendezvous points for exchanging info
    - E.g. Bloom filters
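A minimal sketch of "DHTs as hash tables in a hash join", with a toy in-process DHT standing in for CAN or Chord and an invented put/get-with-namespace API: both relations rehash their tuples into a shared join namespace, and matches rendezvous at whichever node owns the key.

```python
from collections import defaultdict

class ToyDHT:
    """Stand-in for a real DHT (CAN/Chord): key -> owning node's storage."""
    def __init__(self, n_nodes=4):
        self.nodes = [defaultdict(list) for _ in range(n_nodes)]
    def _owner(self, ns, key):
        return self.nodes[hash((ns, key)) % len(self.nodes)]
    def put(self, ns, key, value):
        self._owner(ns, key)[(ns, key)].append(value)
    def get(self, ns, key):
        return self._owner(ns, key)[(ns, key)]

def dht_join(dht, ns, R, S, kr, ks):
    # Rehash both inputs into the join namespace (content-addressing),
    # then each key's matches rendezvous at one node.
    for t in R:
        dht.put(ns, kr(t), ("R", t))
    for t in S:
        dht.put(ns, ks(t), ("S", t))
    for t in S:
        for side, match in dht.get(ns, ks(t)):
            if side == "R":
                yield (match, t)

dht = ToyDHT()
R = [(1, "a"), (2, "b")]
S = [(2, "x"), (1, "y")]
print(list(dht_join(dht, "join42", R, S, kr=lambda t: t[0], ks=lambda t: t[0])))
```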
PIER: P2P Information Exchange and Retrieval
- A relational-style query executor
  - With front-ends for SQL and catalogs
  - Standard and continuous queries
- With access to DHT APIs
  - Currently CAN and Chord; working on Tapestry
  - A common DHT API would help
- Currently simulating queries running on tens of thousands of nodes
  - Look ma, it scales!
    - The widest-scale relational engine ever looks feasible
  - Most of the simulator code will live on in the implementation
    - On Millennium and PlanetLab this fall/winter
Ryan Huebsch and Boon Thau Loo (w/ Hellerstein, Shenker, Stoica)
PIER Challenges
- How does this batch workload stress DHTs?
- How does republishing of soft state interact with dataflow?
  - And with the semantics of query answers?
- Materialization/precomputation/caching
  - Physical tuning meets SteMs meets materialized views
- How to do query optimization in this context?
  - Distributed eddies!
- Partial failure is a reality
  - At storage nodes? At query execution nodes?
  - Impact on results, and mitigation
- What about aggregation?
  - Similarities/differences with TAG?
  - With Astrolabe [Birman et al.]?
- The "usual" CQ and data-stream query issues, distributed
  - Analogous to work in Telegraph, and at Brown, Wisconsin, Stanford…
All together now?
- I thought about changing the names:
  - Telegraph*, Teletiny…?
  - The group didn't like the branding
    - Teletubby!
- Seriously: integration?
  - It's a plausible need
    - Sensor data + map data + historical sensor logs + …
    - Filesharing + Web
  - We have done both of these cheesily
  - But there are fun questions in doing it right
    - E.g. pushing predicates and data into the sensor net, or not?
References & Resources

Database Texts
- Undergrad textbooks
  - Ramakrishnan & Gehrke, Database Management Systems
  - Silberschatz, Korth & Sudarshan, Database System Concepts
  - Garcia-Molina, Ullman & Widom, Database Systems: The Complete Book
  - O'Neil & O'Neil, Database: Principles, Programming, and Performance
  - Abiteboul, Hull & Vianu, Foundations of Databases
- Graduate texts
  - Stonebraker & Hellerstein, Readings in Database Systems (a.k.a. "The Red Book")
  - Brewer & Hellerstein, Readings book (e-book?) in progress. Fall 2003?
Research Links
- DB group at Berkeley: db.cs.berkeley.edu
- GiST: gist.cs.berkeley.edu
- Telegraph: telegraph.cs.berkeley.edu
- TinyDB: telegraph.cs.berkeley.edu/tinydb and berkeley.intel-research.net/tinydb
- Red Book: redbook.cs.berkeley.edu