Querying Sensor Networks
Sam Madden
UC Berkeley
October 2, 2002 @ UCLA
Introduction
• Programming Sensor Networks Is Hard
– Especially if you want to build a “real”
application
• Declarative Queries Are Easy
– And, can be faster and more robust than
most applications!
Overview
• Overview of Declarative Systems
• TinyDB
– Features
– Demo
• Challenges + Research Issues
• Language
• Optimizations
• The Next Step
Declarative Queries: SQL
• SQL is the traditional declarative language
used in databases
SELECT {sel-list}
FROM {tables}
WHERE {pred}
GROUP BY {pred}
HAVING {pred}
SELECT dept.name, AVG(emp.salary)
FROM emp,dept
WHERE emp.dno = dept.dno
AND (dept.name=“Accounting”
OR dept.name=“Marketing”)
GROUP BY dept.name
Declarative Queries for
Sensor Networks
• Examples:

1. SELECT nodeid, light
   FROM sensors
   WHERE light > 400
   SAMPLE PERIOD 1s

2. Rooms w/ volume > 200:
   SELECT AVG(volume)
   FROM sensors
   WHERE light > 400
   GROUP BY roomNo
   HAVING AVG(volume) > 200
   SAMPLE PERIOD 1s for 10

3. [Coming soon!]
   ON EVENT bird_detect(loc) AS bd
   SELECT AVG(s.light), AVG(s.temp)
   FROM sensors AS s
   WHERE dist(bd.loc, s.loc) < 10m
General Declarative
Advantages
• Data Independence
– Not required to specify how or where, just
what.
» Of course, can specify specific addresses when
needed
• Transparent Optimization
– System is free to explore different
algorithms, locations, orders for operations
Data Independence In Sensor
Networks
• Vastly simplifies execution for large
networks
– Since locations are described by
predicates
– Operations are over groups
• Enables tolerance to faults
– Since system is free to choose where and
when operations happen
Optimization In Sensor
Networks
• Optimization Goal : Power!
• Where to process data
– In network
– Outside network
– Hybrid
• How to process data
– Predicate & Join Ordering
– Index Selection
• How to route data
– Semantically Driven Routing
Overview
• Overview of Declarative Systems
• TinyDB
– Features
– Demo
• Challenges + Research Issues
• Language
• Optimizations
• The Next Step
TinyDB
• A distributed query processor for networks of
Mica motes
– Available today!
• Goal: Eliminate the need to write C code for
most TinyOS users
• Features
– Declarative queries
– Temporal + spatial operations
– Multihop routing
– In-network storage
TinyDB @ 10000 Ft
• (Almost) all queries are continuous and periodic
• Written in SQL, with extensions for:
– Sample rate
– Offline delivery
– Temporal aggregation
[Figure: a query enters at the root and is disseminated down a multihop routing tree of motes A through F ({A,B,C,D,E,F}, then {B,D,E,F}, then {D,E,F}); results flow back up the tree.]
TinyDB Demo
Applications + Early Adopters
• Some demo apps (demo!):
– Network monitoring
– Vehicle tracking
• “Real” future deployments:
– Environmental monitoring @ GDI (and James Reserve?)
– Generic Sensor Kit
– Parking Lot Monitor
TinyDB Architecture (Per
node)
[Figure: per-node component stack: SelOperator and AggOperator above the TupleRouter, which sits on the Schema, TinyAlloc, and the Radio Stack / Network.]
TupleRouter:
• Fetches readings (for ready queries)
• Builds tuples
• Applies operators
• Delivers results (up tree)
AggOperator:
• Combines local & neighbor readings
SelOperator:
• Filters readings
Schema:
• “Catalog” of commands & attributes (more later)
TinyAlloc:
• Reusable memory allocator!
TinyAlloc
• Handle-Based Compacting Memory Allocator
• Used for the catalog and queries

Handle h;
call MemAlloc.alloc(&h,10);
…
(*h)[0] = “Sam”;
call MemAlloc.lock(h);
tweakString(*h);
call MemAlloc.unlock(h);
call MemAlloc.free(h);

[Figure: the user program holds handles into a master pointer table; the heap is tracked by a free bitmap and compacted as needed.]
Schema
• Attribute & Command interface
– At INIT(), components register the attributes and commands they support
» Commands implemented via wiring
» Attributes fetched via accessor command
– Catalog API allows local and remote queries over known attributes / commands.
• Demo of adding an attribute, executing a command.
Overview
• Overview of Declarative Systems
• TinyDB
– Features
– Demo
• Challenges + Research Issues
– Language
– Optimizations
– Quality
3 Questions
• Is this approach expressive enough?
• Can this approach be efficient enough?
• Are the answers this approach gives
good enough?
Q1: Expressiveness
• Simple data collection satisfies most users
• How much of what people want to do is just
simple aggregates?
– Anecdotally, most of it
– EE people want filters + simple statistics (unless
they can have signal processing)
• However, we’d like to satisfy everyone!
Query Language
• New Features:
– Joins
– Event-based triggers
» Via extensible catalog
– In network & nested queries
– Split-phase (offline) delivery
» Via buffers
Sample Query 1
Bird counter:
CREATE BUFFER birds(uint16 cnt)
SIZE 1
ON EVENT bird-enter(…)
SELECT b.cnt+1
FROM birds AS b
OUTPUT INTO b
ONCE
Sample Query 2
Birds that entered and left within time t of
each other:
ON EVENT bird-leave AND bird-enter WITHIN t
SELECT bird-leave.time, bird-leave.nest
WHERE bird-leave.nest = bird-enter.nest
ONCE
Sample Query 3
Delta compression:
SELECT light
FROM buf, sensors AS s
WHERE |s.light – buf.light| > t
OUTPUT INTO buf
SAMPLE PERIOD 1s
Sample Query 4
Offline Delivery + Event Chaining
CREATE BUFFER equake_data( uint16 loc, uint16 xAccel,
uint16 yAccel)
SIZE 1000
PARTITION BY NODE
SELECT xAccel, yAccel
FROM SENSORS
WHERE xAccel > t OR yAccel > t
SIGNAL shake_start(…)
SAMPLE PERIOD 1s
ON EVENT shake_start(…)
SELECT loc, xAccel, yAccel
FROM sensors
OUTPUT INTO BUFFER equake_data(loc, xAccel, yAccel)
SAMPLE PERIOD 10ms
Event Based Processing
• Enables internal and chained actions
• Language Semantics
– Events are inter-node
– Buffers can be global
• Implementation plan
– Events and buffers must be local
– Since n-to-n communication not (well) supported
• Next: operator expressiveness
Operator Expressiveness:
Aggregate Framework
• Standard SQL supports “the basic 5”:
– MIN, MAX, SUM, AVERAGE, and COUNT
• We support any function conforming to:

Agg_n = {f_merge, f_init, f_evaluate}
f_merge{<a1>, <a2>} → <a12>
f_init{a0} → <a0>
f_evaluate{<a1>} → aggregate value

(<a> is a partial aggregate; merge must be associative and commutative!)
Example: Average
AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
AVG_init{v} → <v, 1>
AVG_evaluate{<S1, C1>} → S1/C1
From Tiny AGgregation (TAG), Madden, Franklin, Hellerstein, Hong. OSDI 2002 (to appear).
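As a concrete reading of the framework above, here is a minimal C sketch of AVERAGE expressed as f_init, f_merge, and f_evaluate over a <sum, count> partial state record. The struct and function names are illustrative, not TinyDB's actual interfaces.

#include <stdio.h>

/* Partial state record (PSR) for AVERAGE: a <sum, count> pair. */
typedef struct { long sum; long count; } avg_psr;

/* f_init: turn a single sensor reading into a PSR. */
static avg_psr avg_init(int reading) {
    avg_psr p = { reading, 1 };
    return p;
}

/* f_merge: combine two PSRs; associative and commutative. */
static avg_psr avg_merge(avg_psr a, avg_psr b) {
    avg_psr p = { a.sum + b.sum, a.count + b.count };
    return p;
}

/* f_evaluate: produce the final aggregate value from a PSR. */
static double avg_evaluate(avg_psr p) {
    return (double)p.sum / (double)p.count;
}

int main(void) {
    /* A parent merges its own reading with PSRs from two children. */
    avg_psr local  = avg_init(40);
    avg_psr child1 = avg_init(50);
    avg_psr child2 = avg_merge(avg_init(60), avg_init(70));

    avg_psr tree = avg_merge(local, avg_merge(child1, child2));
    printf("AVG = %.1f over %ld readings\n", avg_evaluate(tree), tree.count);
    return 0;
}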
Isobar Finding
Temporal Aggregates
• TAG was about “spatial” aggregates
– Inter-node, at the same time
• Want to be able to aggregate across time as well
• Two types:
– Windowed: AGG(size, slide, attr), e.g. size = 4, slide = 2 over readings … R1 R2 R3 R4 R5 R6 … (a minimal sketch follows below)
– Decaying: AGG(comb_func, attr)
– Demo!
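To make the windowed case concrete, here is a small C sketch of AVG(size=4, slide=2, attr) over a ring buffer: keep the last size readings and emit an average every slide new samples. This is an illustrative sketch of the semantics only, not TinyDB's implementation; all names are made up.

#include <stdio.h>

#define WIN_SIZE  4   /* window size in samples        */
#define WIN_SLIDE 2   /* emit every WIN_SLIDE samples  */

static int window[WIN_SIZE];   /* ring buffer of the last WIN_SIZE readings */
static int next_slot = 0;      /* write position in the ring buffer         */
static int total = 0;          /* total readings seen so far                */

/* Feed one reading; print the windowed average whenever the window slides. */
static void windowed_avg(int reading) {
    window[next_slot] = reading;
    next_slot = (next_slot + 1) % WIN_SIZE;
    total++;

    /* Emit once the window is full and we have advanced by a full slide. */
    if (total >= WIN_SIZE && (total - WIN_SIZE) % WIN_SLIDE == 0) {
        long sum = 0;
        for (int i = 0; i < WIN_SIZE; i++) sum += window[i];
        printf("samples %d..%d: avg = %.1f\n",
               total - WIN_SIZE + 1, total, (double)sum / WIN_SIZE);
    }
}

int main(void) {
    int readings[] = { 10, 20, 30, 40, 50, 60, 70, 80 };  /* R1..R8 */
    for (int i = 0; i < 8; i++) windowed_avg(readings[i]);
    return 0;
}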
Expressiveness Review
• Internal & nested queries
– With logging of results for offline delivery
• Event based processing
• Extensible aggregates
– Spatial & temporal
• On to Question 2: What about efficiency?
Q2: Efficiency
• Metric: power consumption
• Goal: reduce communication, which
dominates cost
– 800 instrs/bit!
• Standard approach: in-network
processing, sleeping whenever you
can…
But that’s not good enough…
• What else can we do to bring down costs?
• Sleep Even More?
– Events are key
• Apply automatic optimization!
– Semantically driven routing
– …and topology construction
– Operator placement + ordering
– Adaptive data delivery
TAG
• In-network processing
– Reduces costs depending on type of
aggregates
• Exploitation of operator semantics
Tiny AGgregation (TAG), Madden, Franklin, Hellerstein, Hong. OSDI 2002 (to appear).
Illustration: Pipelined Aggregation

SELECT COUNT(*)
FROM sensors

[Figure, shown over five slides: a routing tree of sensors 1 through 5 with depth d. Each slide adds one epoch to a table of (sensor #, epoch #, partial count); partial counts move one hop up the tree per epoch, so the count reported at the root grows from 1 in epoch 1 to 3, 4, and finally 5 by epochs 4 and 5.]
Simulation Results

[Chart: Total Bytes Xmitted vs. Aggregation Function, for 2500 nodes on a 50x50 grid, depth ~10, ~20 neighbors. Functions compared: EXTERNAL, MAX, AVERAGE, COUNT, MEDIAN. Some aggregates require dramatically more state!]
Taxonomy of Aggregates
• TAG insight: classify aggregates according to various functional properties
– Yields a general set of optimizations that can automatically be applied

Property / Examples / Affects:
– Partial State: MEDIAN is unbounded, MAX is 1 record. Affects effectiveness of TAG.
– Duplicate Sensitivity: MIN is dup. insensitive, AVG is dup. sensitive. Affects routing redundancy.
– Exemplary vs. Summary: MAX is exemplary, COUNT is summary. Affects applicability of sampling.
– Monotonicity: COUNT is monotonic, AVG is non-monotonic. Affects hypothesis testing and snooping.
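One way to realize "tag aggregates with their functional properties" is a small descriptor the query processor consults when deciding which optimizations apply. This is a hedged C sketch; the field names, sizes, and the simplified rules printed at the end are assumptions for illustration, not TinyDB's catalog format.

#include <stdbool.h>
#include <stdio.h>

/* Per-aggregate property tags consulted by the optimizer (illustrative). */
typedef struct {
    const char *name;
    int   psr_records;          /* partial state size; < 0 means unbounded */
    bool  duplicate_sensitive;  /* AVG yes, MIN no                         */
    bool  exemplary;            /* MAX/MIN/MEDIAN return an actual reading */
    bool  monotonic;            /* COUNT yes, AVG no                       */
} agg_props;

static const agg_props AGGS[] = {
    { "MAX",     1, false, true,  true  },
    { "COUNT",   1, true,  false, true  },
    { "AVG",     1, true,  false, false },
    { "MEDIAN", -1, true,  true,  false },
};

int main(void) {
    /* Simplified rules: multiple parents need duplicate insensitivity,
       hypothesis testing needs monotonicity. */
    for (int i = 0; i < (int)(sizeof AGGS / sizeof AGGS[0]); i++) {
        const agg_props *a = &AGGS[i];
        printf("%-6s: multiple parents %s, hypothesis testing %s\n",
               a->name,
               a->duplicate_sensitive ? "no " : "yes",
               a->monotonic ? "yes" : "no ");
    }
    return 0;
}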
Optimization: Channel Sharing
• Insight: Shared channel enables optimizations
• Suppress messages that won’t affect aggregate
– E.g., in a MAX query, sensor with value v hears a neighbor with value ≥
v, so it doesn’t report
– Applies to all exemplary, monotonic aggregates
• Learn about query advertisements it missed
– If a sensor shows up in a new environment, it can learn about queries
by looking at its neighbors' messages.
» Root doesn’t have to explicitly rebroadcast query!
Optimization: Hypothesis
Testing
• Insight: Root can provide information that will
suppress readings that cannot affect the final
aggregate value.
– E.g. Tell all the nodes that the MIN is definitely < 50;
nodes with value ≥ 50 need not participate.
– Depends on monotonicity
• How is hypothesis computed?
– Blind guess
– Statistically informed guess
– Observation over first few levels of tree / rounds of aggregate
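A minimal sketch of the idea for MIN, assuming the root piggybacks a threshold guess on the query: a node reports only if its reading could still lower the answer. The names and the fixed guess of 50 are illustrative.

#include <stdbool.h>
#include <stdio.h>

/* Root's hypothesis, e.g. "the MIN is definitely below 50",
   disseminated with the query (illustrative). */
static int min_hypothesis = 50;

/* A node participates only if its reading could lower the answer. */
static bool should_report_min(int reading) {
    return reading < min_hypothesis;
}

int main(void) {
    int readings[] = { 72, 49, 55, 31 };
    for (int i = 0; i < 4; i++)
        printf("reading %2d -> %s\n", readings[i],
               should_report_min(readings[i]) ? "report" : "suppress");
    return 0;
}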
Optimization: Use Multiple
Parents
• For duplicate-insensitive aggregates
• Or aggregates that can be expressed as a linear combination of parts (see the sketch below)
– Send (part of) aggregate to all parents
– Decreases variance
» Dramatically, when there are lots of parents
[Figure: node A sends half of its partial aggregate (1/2) to each of parents B and C, rather than the whole value (1) to one parent.]
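For an aggregate that is a linear combination of its parts, such as SUM or COUNT, the "send part to each parent" step can be sketched as an even split: a single lost link then costs only a fraction of the node's contribution. An illustrative C sketch, assuming two parents.

#include <stdio.h>

/* Split a partial SUM evenly across n parents (illustrative).
   Each parent forwards its share; a single lost link now costs
   only value/n instead of the whole value. */
static void split_sum(int value, int n_parents, int *shares) {
    int base = value / n_parents;
    int rem  = value % n_parents;
    for (int i = 0; i < n_parents; i++)
        shares[i] = base + (i < rem ? 1 : 0);
}

int main(void) {
    int shares[2];
    split_sum(9, 2, shares);   /* node A's partial sum, split for B and C */
    printf("parent B gets %d, parent C gets %d\n", shares[0], shares[1]);
    return 0;
}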
TAG Summary
• In-network query processing is a win for many aggregate functions
• By exploiting general functional properties of
operators, many optimizations are possible
– Requires new aggregates to be tagged with their
properties
• Up next: non-aggregate query processing
optimizations – a flavor of things to come!
Attribute Driven Topology
Selection
• Observation: internal queries often over local
area*
– Or some other subset of the network
» E.g. regions with light value in [10,20]
• Idea: build topology for those queries based on
values of range-selected attributes
– Requires range attributes, connectivity to be
relatively static
* Heidemann et al., Building Efficient Wireless Sensor Networks with Low-Level
Naming. SOSP, 2001.
Attribute Driven Query Propagation
SELECT …
WHERE a > 5 AND a < 12
[Figure: node 4 forwards the query only to children whose precomputed attribute intervals overlap the predicate: children 1 [1,10] and 3 [7,15] receive it; child 2 [20,40] is pruned. The precomputed intervals form a “Query Dissemination Index”.]
Attribute Driven Parent Selection
[Figure: node 4, with local range [3,6], chooses among parents 1, 2, and 3, which advertise intervals [1,10], [7,15], and [20,40]:
[3,6] ∩ [1,10] = [3,6]
[3,7] ∩ [7,15] = ø
[3,7] ∩ [20,40] = ø ]
Even without intervals, expect that sending to the parent with the closest value will help. (A minimal intersection sketch follows below.)
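The dissemination and parent-selection steps above reduce to interval intersection: forward a range query only where the advertised interval overlaps the query's range. A minimal C sketch using the intervals from the figure; the function names are illustrative.

#include <stdio.h>
#include <stdbool.h>

typedef struct { int lo, hi; } interval;

/* Intersection of two closed intervals; empty if lo > hi. */
static interval intersect(interval a, interval b) {
    interval r = { a.lo > b.lo ? a.lo : b.lo,
                   a.hi < b.hi ? a.hi : b.hi };
    return r;
}

static bool is_empty(interval r) { return r.lo > r.hi; }

int main(void) {
    interval query = { 6, 11 };              /* WHERE a > 5 AND a < 12 */
    interval children[] = { {1,10}, {20,40}, {7,15} };

    for (int i = 0; i < 3; i++) {
        interval r = intersect(query, children[i]);
        printf("child %d [%d,%d]: %s\n", i + 1,
               children[i].lo, children[i].hi,
               is_empty(r) ? "prune" : "forward query");
    }
    return 0;
}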
Hot off the press…

[Chart: Nodes Visited vs. Range Query Size for Different Index Policies. Y-axis: number of nodes visited (400 = max); X-axis: query size as % of value range (0.001 to 1). Policies: Best Case (Expected), Closest Parent, Nearest Value, Snooping. Random value distribution, 20x20 grid, ideal connectivity to (8) neighbors.]
Operator Placement &
Ordering
• Observation: Nested queries, triggers, and joins
can often be re-ordered
• Ordering can dramatically affect the amount of
work you do
• Lots of standard database tricks here
Operator Ordering Example 1
SELECT light, mag
FROM sensors
WHERE pred1(mag)
AND pred2(light)
SAMPLE INTERVAL 1s
• Cost (in J) of sampling mag >> cost of sampling light
• Correct ordering (unless pred1 is very selective):
1. Sample light
2. Apply pred2
3. Sample mag
4. Apply pred1
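A minimal C sketch of that ordering, with made-up sensor stubs and costs: the cheap light sample and its predicate run first, and the expensive magnetometer is sampled only when pred2 already holds.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-ins for sensor acquisition; real costs differ. */
static int sample_light(void) { return 410; }   /* cheap sample     */
static int sample_mag(void)   { return 180; }   /* expensive sample */

static bool pred2_light(int light) { return light > 400; }
static bool pred1_mag(int mag)     { return mag > 200; }

/* Evaluate the WHERE clause cheapest-first: the magnetometer is only
   sampled when pred2(light) already holds. */
static bool where_clause(void) {
    int light = sample_light();             /* 1. sample light */
    if (!pred2_light(light)) return false;  /* 2. apply pred2  */
    int mag = sample_mag();                 /* 3. sample mag   */
    return pred1_mag(mag);                  /* 4. apply pred1  */
}

int main(void) {
    printf("tuple %s\n", where_clause() ? "passes" : "filtered out");
    return 0;
}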
Operator Ordering Example 2
ON EVENT bird-enter(…)
WHERE pred1(event)
SELECT light
WHERE pred2(light)
FROM sensors
SAMPLE INTERVAL 5s
FOR 30s
“Every time an event occurs that satisfies pred1, sample light once every 5 seconds for 30 seconds and report the samples that satisfy pred2.”
Note: makes all samples in the sample window in phase.

Equivalently, rewritten as a join against an event stream:

SELECT s.light
FROM bird-enter-events[30s] AS e,
sensors AS s
WHERE e.time < s.time
AND pred1(e) AND pred2(s.light)
SAMPLE INTERVAL 5s

“Sample light once every 5 seconds. For every sample that satisfies pred2, check and see if any events that satisfy pred1 have occurred in the last 30 seconds.”
Adaptivity For Contention
• Observation: Under high contention, radios deliver
fewer total packets than under low contention.
• Insight: Don’t allow radios to be highly contested.
Drop or aggregate instead.
– Higher throughput
– Choice over what gets lost
» Based on semantics!
Adaptivity for Power
Conservation
• For many applications, exact sample rate
doesn’t matter
• But network lifetime does!
• Idea: adaptively adjust sample rate & extent
of aggregation based on lifetime goal and
observed power consumption
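A rough sketch of the idea: periodically compare the energy being spent per epoch against the budget implied by the lifetime goal, and stretch (or shrink) the sample period to match. The linear adjustment and all names here are assumptions for illustration, not TinyDB's policy.

#include <stdio.h>

/* Adjust the sample period so remaining energy lasts the remaining
   lifetime, assuming a roughly fixed energy cost per epoch. */
static double adapt_period(double period_s,
                           double joules_left,
                           double joules_per_epoch,
                           double seconds_left) {
    double epochs_affordable = joules_left / joules_per_epoch;
    double epochs_needed     = seconds_left / period_s;
    /* Slow down (or speed up) in proportion to the shortfall or surplus. */
    return period_s * (epochs_needed / epochs_affordable);
}

int main(void) {
    /* 1 s sampling, 500 J left, 0.02 J per epoch, 30 days to go. */
    double p = adapt_period(1.0, 500.0, 0.02, 30.0 * 24 * 3600);
    printf("new sample period: %.2f s\n", p);
    return 0;
}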
Efficiency Summary
• Power is the important metric
• TAG
– In-network processing
– Exploit semantics of network and operators
» Channel sharing
» Hypothesis testing
» Using multiple parents
• Indexing for dissemination + collection of data
• Placement and Operator Ordering
• Adaptive Sampling
Q3: Answer Quality
• Lots of possibilities for improving quality
– Multi-path routing
» When applicable
– Transactional delivery
» a.k.a. custody transfer
– Link-layer retransmission
– Caching
• Failure still possible in all modes
• Open question: what’s the right quality
metric?
Diffusion as TinyDB
Foundation?
• Claim: diffusion is an infrastructure upon which TinyDB could be
built
• Via declarative language, TinyDB is able to provide semantic
guarantees and transparent optimization
– Operators can be reordered
– Any tuple can be routed to any operator
– No (important) duplicates will be produced
• At what cost? Diffusion can:
– Adjust better to loss
– Exploit well-connected networks
– Provide n-to-m routing, instead of n-to-1 routing
– Might allow global buffers, events, etc.
Summary
• Declarative queries are the right interface for data
collection in sensor nets!
– In network processing and optimization make approach
viable
• Big query language improvements coming soon…
– Event driven & internal queries
– Adaptive sampling + query indexes for performance!
• TinyDB Available Today –
http://telegraph.cs.berkeley.edu/tinydb
Questions?
Grouping
• GROUP BY expr
– expr is an expression over one or more attributes
» Evaluation of expr yields a group number
» Each reading is a member of exactly one group
Example: SELECT max(light) FROM sensors
GROUP BY TRUNC(temp/10)

Sensor ID   Light   Temp   Group
1           45      25     2
2           27      28     2
3           66      34     3
4           68      37     3

Result:
Group   max(light)
2       45
3       68
Having
• HAVING preds
– preds filters out groups that do not satisfy
predicate
– versus WHERE, which filters out tuples that do not
satisfy predicate
– Example:
SELECT max(temp) FROM sensors
GROUP BY light
HAVING max(temp) < 100
Yields all groups whose max temperature is under 100
Group Eviction
• Problem: Number of groups in any one iteration may exceed
available storage on sensor
• Solution: Evict!
– Choose one or more groups to forward up tree
– Rely on nodes further up tree, or root, to recombine groups properly
– What policy to choose?
» Intuitively: least popular group, since don’t want to evict a group that will
receive more values this epoch.
» Experiments suggest:
• Policy matters very little
• Evicting as many groups as will fit into a single message is good
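A small C sketch of the eviction step under the "least popular group" intuition above: when group storage fills, pick the group that has merged the fewest readings this epoch and forward its partial state up the tree. The structures and sizes are illustrative.

#include <stdio.h>

#define MAX_GROUPS 4

typedef struct {
    int group_no;      /* value of the GROUP BY expression        */
    int partial_max;   /* partial state, e.g. for MAX(light)      */
    int members;       /* readings merged this epoch (popularity) */
} group_slot;

static group_slot groups[MAX_GROUPS] = {
    { 2, 45, 7 }, { 3, 68, 1 }, { 5, 33, 4 }, { 7, 51, 2 },
};

/* Evict the least popular group: forward it up the tree, free the slot. */
static void evict_one(void) {
    int victim = 0;
    for (int i = 1; i < MAX_GROUPS; i++)
        if (groups[i].members < groups[victim].members) victim = i;
    printf("evicting group %d (max=%d, %d members) up the tree\n",
           groups[victim].group_no, groups[victim].partial_max,
           groups[victim].members);
    groups[victim].members = 0;   /* reuse this slot for the next group */
}

int main(void) { evict_one(); return 0; }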
Simulation Environment
• Java-based simulation & visualization for
validating algorithms, collecting data.
• Coarse grained event based simulation
– Sensors arranged on a grid, radio connectivity by Euclidean distance
– Communication model
» Lossless: All neighbors hear all messages
» Lossy: Messages lost with probability that increases with
distance
» Symmetric links
» No collisions, hidden terminals, etc.
Simulation Screenshot
Experiment: Basic TAG
[Chart: Avg. Bytes / Epoch vs. Network Diameter (10 to 50), for COUNT, MAX, AVERAGE, MEDIAN, EXTERNAL, and DISTINCT. Dense packing, ideal communication.]
Experiment: Hypothesis
Testing
[Chart: Messages / Epoch vs. Network Diameter (10 to 50), for No Guess, Guess = 50, Guess = 90, and Snooping. Uniform value distribution, dense packing, ideal communication.]
Experiment: Effects of Loss
[Chart: Percent Error From Single Loss vs. Network Diameter (10 to 50), for AVERAGE, COUNT, MAX, and MEDIAN.]
Experiment: Benefit of Cache
[Chart: Percentage of Network Involved vs. Network Diameter (10 to 50), for No Cache and 5, 9, and 15 Rounds of Cache.]
Pipelined Aggregates
• After query propagates, during each epoch:
– Each sensor samples local sensors once
– Combines them with PSRs from children
– Outputs a PSR representing aggregate state in the previous epoch
• After (d-1) epochs, the PSR for the whole tree is output at the root
– d = depth of the routing tree
– If desired, partial state from the top k levels could be output in the kth epoch
• To avoid combining PSRs from different epochs, sensors must cache values from children (a minimal sketch follows below)
[Figure: routing tree of nodes 1 through 5. A value from node 2 produced at time t arrives at node 1 at time (t+1); a value from node 5 produced at time t arrives at node 1 at time (t+3).]
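A minimal C sketch of that caching rule for COUNT: remember the last PSR heard from each child together with its epoch, and when emitting a PSR for epoch e merge in only the cached child records tagged with e. All structures and names are illustrative.

#include <stdio.h>

#define MAX_CHILDREN 4

typedef struct { int epoch; int count; } psr;   /* partial COUNT */

static psr child_cache[MAX_CHILDREN];           /* last PSR per child */

static void hear_from_child(int child, int epoch, int count) {
    child_cache[child].epoch = epoch;
    child_cache[child].count = count;
}

/* Output this node's PSR for `epoch`: its own reading plus cached
   child PSRs from the same epoch (older or newer ones are not mixed in). */
static psr emit(int epoch) {
    psr out = { epoch, 1 };                     /* local sample */
    for (int c = 0; c < MAX_CHILDREN; c++)
        if (child_cache[c].epoch == epoch)
            out.count += child_cache[c].count;
    return out;
}

int main(void) {
    hear_from_child(0, 7, 2);    /* child 0 reported 2 nodes in epoch 7 */
    hear_from_child(1, 6, 3);    /* stale PSR from epoch 6: ignored     */
    psr p = emit(7);
    printf("epoch %d PSR: count=%d\n", p.epoch, p.count);
    return 0;
}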
Pipelining Example

[Figure, shown over six slides: the five-node routing tree, each node keeping a table of <sensor id, epoch, partial count> entries that grows by one epoch per slide. The partial state records sent up the tree are:
Epoch 0: <4,0,1>, <5,0,1>
Epoch 1: <2,0,2>, <3,0,2>, <4,1,1>, <5,1,1>
Epoch 2: <1,0,3>, <2,0,4>, <3,1,2>, <4,2,1>, <5,2,1>
Epoch 3: <1,0,5>, <2,1,4>, <3,2,2>, <4,3,1>, <5,3,1>
Epoch 4: <1,1,5>, <2,2,4>, <3,3,2>, <4,4,1>, <5,4,1>
By epoch 3 the root has the complete COUNT of 5 for epoch 0; thereafter it emits one complete epoch's count per epoch.]
Our Stream Semantics
• One stream, ‘sensors’
• We control data rates
• Joins between that stream and buffers are allowed
• Joins are always landmark, forward in time, one tuple at a time
– Result of queries over ‘sensors’ is either a single tuple (at time of query) or a stream
• Easy to interface to more sophisticated systems
• Temporal aggregates enable fancy window
operations
Formal Spec.
ON EVENT <event> [<boolop> <event>... WITHIN <window>]
[SELECT {<expr>|agg(<expr>)|temporalagg(<expr>)}
FROM [sensors | <buffer> | events]]
[WHERE {<pred>}]
[GROUP BY {<expr>}]
[HAVING {<pred>}]
[ACTION [<command> [WHERE <pred>] |
BUFFER <bufname>
SIGNAL <event>({<params>}) |
(SELECT ... ) [INTO BUFFER <bufname>]]]
[SAMPLE PERIOD <seconds>
[FOR <nrounds>]
[INTERPOLATE <expr>]
[COMBINE {temporal_agg(<expr>)}] |
ONCE]
Buffer Commands
[AT <pred>:]
CREATE [<type>] BUFFER <name> ({<type>})
PARTITION BY [<expr>]
SIZE [<ntuples>,<nseconds>]
[AS SELECT ...
[SAMPLE PERIOD <seconds>]]
DROP BUFFER <name>