Transcript: Querying Sensor Networks
Sam Madden, UC Berkeley. November 18th, 2002 @ Madison.
Introduction
• What are sensor networks?
• Programming sensor networks is hard
  – Especially if you want to build a "real" application
    » Example: a vehicle tracking application took 2 grad students 2 weeks to build and hundreds of lines of code.
• Declarative queries are easy
  – And can be faster and more robust than most applications!
    » Vehicle tracking query: took 2 minutes to write, worked just as well!

  SELECT MAX(mag)
  FROM sensors
  WHERE mag > thresh
  SAMPLE INTERVAL 64ms
Overview
• Sensor Networks
• Why Queries in Sensor Nets
• TinyDB
  – Features
  – Demo
• Focus: Tiny Aggregation
• The Next Step
Device Capabilities
• "Mica Motes"
  – 8-bit, 4 MHz processor
    » Roughly a PC AT
  – 40 kbit radio
    » Time to send 1 bit = 800 instructions
    » Reducing communication is good
  – 4 KB RAM, 128 KB flash, 512 KB EEPROM
  – Sensor board expansion slot
    » Standard board has light & temperature sensors, accelerometer, magnetometer, microphone, & buzzer
• Other more powerful platforms exist
  – E.g. Sensoria WINS nodes
• Trend towards smaller devices
  – "Smart Dust" – Kris Pister, et al.
Sensor Net Sample Apps
Habitat monitoring: storm petrels on Great Duck Island, microclimates on the James Reserve.
Vehicle detection: sensors dropped from a UAV along a road collect data about passing vehicles and relay it back to the UAV.
Earthquake monitoring in shake-test sites.
(Contrast: traditional monitoring apparatus.)
Key Constraint: Power
• Lifetime from one pair of AA batteries
  – 2-3 days at full power
  – 6 months at 2% duty cycle
• Communication dominates cost
  – Because it takes so long (~30 ms) to send / receive a message
TinyOS
• Operating system from David Culler's group at Berkeley
• C-like programming environment
• Provides messaging layer, abstractions for major hardware components
  – Split-phase, highly asynchronous, interrupt-driven programming model

ASPLOS 2000. See http://webs.cs.berkeley.edu/tos
Communication In Sensor Nets
• Radio communication has high link-level losses
  – Typically about 20% @ 5m
• Newest versions of TinyOS provide link-level acknowledgments
• No end-to-end acknowledgments
• Ad-hoc neighbor discovery
• Two major routing techniques: tree-based hierarchy and geographic
(Diagram: nodes A-F arranged in a routing tree, with tree addresses 00, 10, 11, 12, 20, 21, 22.)
Overview
• Sensor Networks
• Why Queries in Sensor Nets
• TinyDB
  – Features
  – Demo
• Focus: Tiny Aggregation
• The Next Step
Declarative Queries for Sensor Networks
• Examples:

1. SELECT nodeid, light
   FROM sensors
   WHERE light > 400
   SAMPLE PERIOD 1s

   Epoch | Light | Temp | Accel | ...
   T-2   |  453  |  245 |  512  | ...
   T-1   |  442  |  406 |  278  | ...
   T     |  335  |  513 |  511  | ...

2. SELECT roomNo, AVG(volume)
   FROM sensors
   GROUP BY roomNo
   HAVING AVG(volume) > 200

   (Rooms w/ volume > 200)
Declarative Benefits In Sensor Networks
• Vastly simplifies execution for large networks
  – Since locations are described by predicates
  – Operations are over groups
• Enables tolerance to faults
  – Since system is free to choose where and when operations happen
• Data independence
  – System is free to choose where data lives, how it is represented
Computing In Sensor Nets Is Hard
• Why?
  – Limited power (must optimize for it!)
  – Lossy communication
  – Zero administration
  – Limited processing capabilities, storage, bandwidth
• In power-based optimization, we choose:
  » Where data is processed
  » How data is routed
    • Exploit operator semantics!
    • Avoid dead nodes
  » How to order operators, sampling, etc.
  » What kinds of indices to apply, which data to prioritize
Overview
• Sensor Networks
• Why Queries in Sensor Nets
• TinyDB
  – Features
  – Demo
• Focus: Tiny Aggregation
• The Next Step
TinyDB
• A distributed query processor for networks of Mica motes
  – Available today!
• Goal: eliminate the need to write C code for most TinyOS users
• Features
  – Declarative queries
  – Temporal + spatial operations
  – Multihop routing
  – In-network storage
TinyDB @ 10000 Ft
(Diagram: a query over nodes A-F; partial results {D,E,F} and {B,D,E,F} flow up the routing tree to the root.)
(Almost) all queries are continuous and periodic. Written in a SQL-like language with extensions for:
• Sample rate
• Offline delivery
• Temporal aggregation
TinyDB Demo
Applications + Early Adopters
• Some demo apps:
  – Network monitoring
  – Vehicle tracking (demo!)
• "Real" future deployments:
  – Environmental monitoring @ GDI (and James Reserve?)
  – Generic sensor kit
  – Building monitoring
TinyDB Architecture (per node)

Components: TupleRouter, AggOperator, SelOperator, Network.
• TupleRouter:
  – Fetches readings (for ready queries)
  – Builds tuples
  – Applies operators
  – Delivers results (up tree)
• AggOperator: combines local & neighbor readings
• SelOperator: applies selection predicates

~10,000 lines C code; ~5,000 lines Java; ~3,200 bytes RAM (w/ 768-byte heap); ~58 kB compiled code (3x larger than the 2nd-largest TinyOS program)
Overview
• Sensor Networks
• Why Queries in Sensor Nets
• TinyDB
  – Features
  – Demo
• Focus: Tiny Aggregation
• The Next Step
TAG
• In-network processing of aggregates
  – Aggregates are a common operation
  – Reduces costs, depending on type of aggregate
  – Focus on "spatial aggregation" (versus "temporal aggregation")
• Exploitation of operator, functional semantics

Tiny AGgregation (TAG). Madden, Franklin, Hellerstein, Hong. OSDI 2002 (to appear).
Aggregation Framework
• As in extensible databases, we support any aggregation function conforming to:

  Agg = {f_init, f_merge, f_evaluate}

  f_init(v)         -> <x>    (a partial state record, PSR)
  f_merge(<x>, <y>) -> <z>
  f_evaluate(<x>)   -> aggregate value

• Example: AVERAGE
  f_init(v)                   = <v, 1>
  f_merge(<S1, C1>, <S2, C2>) = <S1 + S2, C1 + C2>
  f_evaluate(<S, C>)          = S / C
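The AVERAGE instance above can be written out directly. This is a minimal sketch of the {f_init, f_merge, f_evaluate} interface; the function names follow the talk, while the tuple encoding of the partial state record is an illustrative choice.

```python
# TAG aggregate interface, instantiated for AVERAGE.
def f_init(v):
    return (v, 1)                      # PSR: <sum, count>

def f_merge(a, b):
    return (a[0] + b[0], a[1] + b[1])  # <S1+S2, C1+C2>

def f_evaluate(psr):
    return psr[0] / psr[1]             # S / C

# Each node initializes a PSR from its reading; parents merge children's
# PSRs on the way up; only the root evaluates.
readings = [10, 20, 30, 40]
psr = f_init(readings[0])
for r in readings[1:]:
    psr = f_merge(psr, f_init(r))
print(f_evaluate(psr))  # 25.0
```

Because f_merge is associative and commutative, it can be applied in any order along the routing tree and still yield the same answer at the root.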
Query Propagation Review
SELECT AVG(light)…
(Animation: the query floods down from root A through nodes B-F, establishing the routing tree.)
Pipelined Aggregates
• After the query propagates, during each epoch:
  – Each sensor samples local sensors once
  – Combines them with PSRs from children
  – Outputs a PSR representing aggregate state in the previous epoch
• After (d-1) epochs, the PSR for the whole tree is output at the root
  – d = depth of the routing tree
  – If desired, partial state from the top k levels could be output in the kth epoch
• To avoid combining PSRs from different epochs, sensors must cache values from children

(Diagram: a value from node 2 produced at time t arrives at node 1 at time t+1; a value from node 5 produced at time t arrives at node 1 at time t+3.)
Illustration: Pipelined Aggregation

SELECT COUNT(*) FROM sensors

(Animation over a 5-node routing tree of depth d. Each frame shows the per-sensor partial counts; the count output at the root, epoch by epoch, is 1, 3, 4, 5, 5, ... — converging to the true count of 5 after d-1 epochs.)
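The animation can be reproduced with a few lines of code. The 5-node topology below is a hypothetical reconstruction from the frames (root 1 with children 2 and 4, 3 under 2, 5 under 3); the essential mechanism is that a child's PSR arrives at its parent one epoch later.

```python
# Pipelined TAG evaluation of SELECT COUNT(*) FROM sensors.
# Hypothetical topology: child -> parent.
parent = {2: 1, 4: 1, 3: 2, 5: 3}
nodes = [1, 2, 3, 4, 5]

def run(epochs):
    inbox = {n: [] for n in nodes}   # PSRs received during the previous epoch
    root_outputs = []
    for _ in range(epochs):
        outbox = {n: [] for n in nodes}
        for n in nodes:
            psr = 1 + sum(inbox[n])  # own reading + cached child PSRs
            if n == 1:
                root_outputs.append(psr)
            else:
                outbox[parent[n]].append(psr)
        inbox = outbox               # messages arrive one epoch later
    return root_outputs

print(run(5))  # [1, 3, 4, 5, 5]
```

Note how the count "fills in" one tree level per epoch: the root reports 1 while only its own reading is known, then 3, 4, and finally the complete count of 5.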
Grouping
• If the query is grouped, sensors apply the grouping predicate on each epoch
• PSRs are tagged with a group
• When a PSR (with group) is received:
  – If it belongs to a stored group, merge with the existing PSR
  – If not, just store it
• At the end of each epoch, transmit one PSR per group
Group Eviction
• Problem: the number of groups in any one iteration may exceed available storage on the sensor
• Solution: evict! (Partial Preaggregation*)
  – Choose one or more groups to forward up the tree
  – Rely on nodes further up the tree, or the root, to recombine groups properly
  – What policy to choose?
    » Intuitively: least popular group, since we don't want to evict a group that will receive more values this epoch
    » Experiments suggest:
      • Policy matters very little
      • Evicting as many groups as will fit into a single message is good
* Per-Åke Larson. Data Reduction by Partial Preaggregation. ICDE 2002.
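A hypothetical sketch of the eviction policy described above: evict least-popular groups first, and once any eviction is needed, pad the outgoing message up to its capacity. The data shapes (popularity counts, scalar PSR payloads, slot counts) are illustrative, not TinyDB's actual representation.

```python
# Group eviction: least-popular first, filling a whole message once
# eviction is triggered.
def evict(groups, capacity, msg_slots):
    """groups: {group_id: (psr, popularity)}. Returns evicted (id, psr) pairs."""
    evicted = []
    while groups and len(evicted) < msg_slots and (len(groups) > capacity or evicted):
        gid = min(groups, key=lambda g: groups[g][1])  # least popular group
        evicted.append((gid, groups.pop(gid)[0]))
    return evicted

groups = {'a': (10, 5), 'b': (7, 1), 'c': (3, 9), 'd': (8, 2), 'e': (1, 7)}
print(evict(groups, capacity=4, msg_slots=2))  # [('b', 7), ('d', 8)]
```

Evicted PSRs are forwarded up the tree, where an ancestor (or the root) merges them back into the right groups.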
TAG Advantages
• In-network processing reduces communication
  – Important for power and contention
• Continuous stream of results
  – In the absence of faults, will converge to the right answer
• Lots of optimizations
  – Based on shared radio channel
  – Semantics of operators
Simulation Environment
• Chose to simulate to allow 1000s of nodes and control of topology, connectivity, loss
• Java-based simulation & visualization for validating algorithms, collecting data
• Coarse-grained, event-based simulation
  – Sensors arranged on a grid; radio connectivity by Euclidean distance
  – Communication model
    » Lossless: all neighbors hear all messages
    » Lossy: messages lost with probability that increases with distance
    » Symmetric links
    » No collisions, hidden terminals, etc.
Simulation Result

(Chart: total bytes transmitted vs. aggregation function — EXTERNAL, MAX, AVERAGE, COUNT, MEDIAN — for 2500 nodes on a 50x50 grid, depth ~10, ~20 neighbors. Some aggregates require dramatically more state!)
Taxonomy of Aggregates
• TAG insight: classify aggregates according to various functional properties
  – Yields a general set of optimizations that can automatically be applied

  Property              | Examples                                   | Affects
  Partial State         | MEDIAN: unbounded, MAX: 1 record           | Effectiveness of TAG
  Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive | Routing Redundancy
  Exemplary vs. Summary | MAX: exemplary, COUNT: summary             | Applicability of Sampling, Effect of Loss
  Monotonic             | COUNT: monotonic, AVG: non-monotonic       | Hypothesis Testing, Snooping
Optimization: Channel Sharing (“Snooping”)
• Insight: the shared channel enables optimizations
• Suppress messages that won't affect the aggregate
  – E.g., in a MAX query, a sensor with value v hears a neighbor with value >= v, so it doesn't report
  – Applies to all exemplary, monotonic aggregates
    » Can be applied to summary aggregates also if imprecision is allowed
• Learn about query advertisements it missed
  – If a sensor shows up in a new environment, it can learn about queries by looking at neighbors' messages
    » Root doesn't have to explicitly rebroadcast the query!
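The suppression rule for a MAX query can be sketched in a simplified single-radio-cell model, where every node overhears every earlier report. The reporting order and values are illustrative assumptions.

```python
# Snooping for a MAX query: a node that has overheard a value >= its own
# suppresses its own report.
def reports(values):
    sent = []
    best_heard = None
    for v in values:                 # nodes report in some order on a shared channel
        if best_heard is None or v > best_heard:
            sent.append(v)
            best_heard = v
    return sent

print(reports([3, 7, 5, 9, 2]))  # [3, 7, 9] — the 5 and the 2 are suppressed
```

The aggregate is exemplary and monotonic, so suppressed reports provably cannot change the final MAX.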
Optimization: Hypothesis Testing
• Insight: the root can provide information that will suppress readings that cannot affect the final aggregate value
  – E.g., tell all the nodes that the MIN is definitely < 50; nodes with value >= 50 need not participate
  – Depends on monotonicity
• How is the hypothesis computed?
  – Blind guess
  – Statistically informed guess
  – Observation over first few levels of tree / rounds of the aggregate
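In miniature, hypothesis testing for a MAX query looks like this: the root broadcasts a guess, and only nodes whose reading can beat it respond. The values and the guess are illustrative assumptions.

```python
# Hypothesis testing for SELECT MAX: only readings above the root's guess
# can affect the answer, so only those nodes report.
def respondents(values, guess):
    return [v for v in values if v > guess]

vals = [12, 95, 47, 88, 91, 30]
print(respondents(vals, 90))  # [95, 91] — 2 of 6 nodes report
```

If nobody responds, the guess was too high and the root must lower it and re-ask, which is why a statistically informed guess (or observing the top few tree levels) beats a blind one.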
Experiment: Hypothesis Testing

(Chart: messages per epoch vs. network diameter, 10-50, for SELECT MAX(attr) with R(attr) = [0,100]. Series: No Guess, Guess = 50, Guess = 90, Snooping. Uniform value distribution, dense packing, ideal communication.)
Optimization: Use Multiple Parents
• For duplicate-insensitive aggregates
• Or aggregates that can be expressed as a linear combination of parts
  – Send (part of) the aggregate to each parent
  – Decreases variance
    » Dramatically, when there are lots of parents

  No splitting:    E(count) = c * p          Var(count) = c^2 * p * (1-p)
  With splitting:  E(count) = 2 * (c/2) * p  Var(count) = 2 * (c/2)^2 * p * (1-p)
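A quick Monte Carlo check of the variance formulas above, under assumed values c = 100 and p = 0.8 (not from the talk). With two parents and independent losses, the variance should drop by a factor of 2 while the expectation is unchanged.

```python
import random

random.seed(42)
c, p, trials = 100, 0.8, 20000

def sample(split):
    # A count-c PSR is either sent whole to one parent (lost w.p. 1-p),
    # or split into two halves sent to two parents with independent losses.
    if not split:
        return c if random.random() < p else 0
    return (c // 2 if random.random() < p else 0) + \
           (c // 2 if random.random() < p else 0)

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

no_split = [sample(False) for _ in range(trials)]
split = [sample(True) for _ in range(trials)]
print(var(split) / var(no_split))  # close to 0.5: splitting halves the variance
```

With k parents the splitting variance is k * (c/k)^2 * p * (1-p) = (c^2/k) * p * (1-p), hence "dramatically, when there are lots of parents".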
Multiple Parents Results
• Interestingly, this technique works much better than the analysis predicted!
• Losses aren't independent!
• Critical: instead of focusing data on a few critical links, splitting spreads data over many links

(Chart: benefit of result splitting for a COUNT query, splitting vs. no splitting; 2500 nodes, lossy radio model, 6 parents per node.)
Fun Stuff
• Sophisticated, sensor-network-specific aggregates
• Temporal aggregates
Temporal Aggregates
• TAG was about "spatial" aggregates
  – Inter-node, at the same time
• Want to be able to aggregate across time as well
• Two types:
  – Windowed: AGG(size, slide, attr)
    (Diagram: a window of size 4 sliding by 2 over readings R1 ... R6.)
  – Decaying: AGG(comb_func, attr)
  – Demo!
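The windowed form AGG(size, slide, attr) can be sketched over a finite list of readings. The function name and the use of max as the default aggregate are illustrative; TinyDB's actual windowed operators run over the live stream.

```python
# Windowed temporal aggregate: apply agg to windows of `size` readings,
# advancing the window start by `slide` each time.
def windowed_agg(readings, size, slide, agg=max):
    out = []
    for start in range(0, len(readings) - size + 1, slide):
        out.append(agg(readings[start:start + size]))
    return out

print(windowed_agg([3, 1, 4, 1, 5, 9, 2, 6], 4, 2))  # [4, 9, 9]
```

With size == slide the windows tile the stream without overlap (as in the talk's WINMAX(light, 8s, 8s) example); with slide < size they overlap.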
Isobar Finding
TAG Summary
• In-network query processing is a big win for many aggregate functions
• By exploiting general functional properties of operators, optimizations are possible
  – Requires new aggregates to be tagged with their properties
• Up next: non-aggregate query processing optimizations – a flavor of things to come!
Overview
• Sensor Networks
• Why Queries in Sensor Nets
• TinyDB
  – Features
  – Demo
• Focus: Tiny Aggregation
• The Next Step
Acquisitional Query Processing
• Cynical question: what's really different about sensor networks?
  – Low power? Laptops!
  – Lots of nodes? Distributed DBs!
  – Limited processing capabilities? Moore's Law!
Answer
• Long-running queries on physically embedded devices that control when, and with what frequency, data is collected!
• Versus traditional systems, where data is provided a priori
  – Next: an acquisitional teaser...
ACQP: What’s Different?
• How does the user control acquisition?
  – Specify rates or lifetimes
  – Trigger queries in response to events
• Which nodes have relevant data?
  – Need a node index
  – Construct topology such that nodes that are queried together route together
• What sensors should be sampled?
  – Treat sampling as an operator
  – Sample cheapest sensors first
• Which samples should be transmitted?
  – Not all of them, if bandwidth or power is limited
  – Those that are most "valuable"?
Operator Ordering: Interleave Sampling + Selection

SELECT light, mag
FROM sensors
WHERE pred1(mag) AND pred2(light)
SAMPLE INTERVAL 1s

• Energy cost of sampling mag >> cost of sampling light
  – 1500 uJ vs. 90 uJ – comparable to the cost of running the processor itself!
• Candidate plans: sample both attributes and then filter; sample mag first and apply pred1 before sampling light; or sample light first and apply pred2 before sampling mag.
• Correct ordering (unless pred1 is very selective): sample light, apply pred2, and only then sample mag and apply pred1.
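The expected per-tuple energy of each plan makes the ordering argument concrete. The sampling costs are the talk's; the predicate selectivities are assumed values for illustration.

```python
# Expected per-tuple sampling energy (uJ) for each operator ordering.
C_LIGHT, C_MAG = 90.0, 1500.0
sel1, sel2 = 0.5, 0.5  # assumed selectivities of pred1(mag), pred2(light)

both_first  = C_LIGHT + C_MAG          # sample both, then filter
mag_first   = C_MAG + sel1 * C_LIGHT   # sample light only if pred1 passes
light_first = C_LIGHT + sel2 * C_MAG   # sample mag only if pred2 passes

print(light_first, mag_first, both_first)  # 840.0 1545.0 1590.0
```

Sampling light first wins unless pred1 is so selective that skipping the light sample (sel1 * C_LIGHT) outweighs paying C_MAG up front — exactly the caveat on the slide.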
Optimizing in ACQP
• Model sampling as an "expensive predicate"
• Some subtleties:
  – Attributes referenced in multiple predicates: which to "charge"?
  – Attributes must be fetched before operators that use them can be applied
• Solution:
  – Treat sampling as a separate task
  – Build a partial order on sampling and predicates
  – Solve for the cheapest schedule using a series-parallel scheduling algorithm (Monma & Sidney, 1979), as in other optimization work (e.g. Ibaraki & Kameda, TODS, 1984, or Hellerstein, TODS, 1998)
Exemplary Aggregate Pushdown
SELECT WINMAX(light,8s,8s) FROM sensors WHERE mag > x SAMPLE INTERVAL 1s
Unless mag > x is very selective, the correct ordering is:
1. Sample light
2. Check if it's the maximum
3. If it is: sample mag and check the predicate; if satisfied, update the maximum
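A sketch of that pushdown, with illustrative data. For simplicity the mag readings are passed in as a list, but the inner branch is the only place a real node would actually pay to sample mag.

```python
# Exemplary aggregate pushdown for
# SELECT WINMAX(light, ...) WHERE mag > x:
# the cheap "could this reading win?" check gates the expensive mag sample.
def winmax_pushdown(light_samples, mag_samples, x):
    best = None
    for light, mag in zip(light_samples, mag_samples):
        if best is None or light > best:  # cheap check first
            if mag > x:                   # expensive predicate only now
                best = light
    return best

print(winmax_pushdown([3, 7, 5, 9], [10, 2, 10, 10], x=5))  # 9
```

Note `best` is only updated when the predicate holds, so a large light reading whose mag fails the predicate never blocks a smaller qualifying one.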
Summary
• Declarative queries are the right interface for data collection in sensor nets!
• Aggregation is a fundamental operation for which there are many possible optimizations
  – Network-aware techniques
• Current research: acquisitional query processing
  – Framework for addressing lots of the new issues that arise in sensor networks, e.g.
    » Order of sampling and selection
    » Languages, indices, approximations that give the user control over which data enters the system

TinyDB release available: http://telegraph.cs.berkeley.edu/tinydb
Questions?
Simulation Screenshot
TinyAlloc
• Handle-based compacting memory allocator
• For catalog, queries

  Handle h;
  call MemAlloc.alloc(&h, 10);
  ...
  (*h)[0] = "Sam";
  call MemAlloc.lock(h);
  tweakString(*h);
  call MemAlloc.unlock(h);
  call MemAlloc.free(h);

(Diagram: user-program handles indirect through a master pointer table, which enables compaction.)
Schema
• Attribute & command IF
  – At INIT(), components register attributes and commands they support
    » Commands implemented via wiring
    » Attributes fetched via accessor command
  – Catalog API allows local and remote queries over known attributes / commands
• Demo of adding an attribute, executing a command
Q1: Expressiveness
• Simple data collection satisfies most users
• How much of what people want to do is just simple aggregates?
  – Anecdotally, most of it
  – EE people want filters + simple statistics (unless they can have signal processing)
• However, we'd like to satisfy everyone!
Query Language
• New features:
  – Joins
  – Event-based triggers
    » Via extensible catalog
  – In-network & nested queries
  – Split-phase (offline) delivery
    » Via buffers
Sample Query 1 – Bird counter:

CREATE BUFFER birds(uint16 cnt)
  SIZE 1

ON EVENT bird-enter(...)
SELECT b.cnt + 1
FROM birds AS b
OUTPUT INTO b
ONCE
Sample Query 2 – Birds that entered and left within time t of each other:

ON EVENT bird-leave AND bird-enter WITHIN t
SELECT bird-leave.time, bird-leave.nest
WHERE bird-leave.nest = bird-enter.nest
ONCE
Sample Query 3 – Delta compression:

SELECT s.light
FROM buf, sensors AS s
WHERE | s.light – buf.light | > t
OUTPUT INTO buf
SAMPLE PERIOD 1s
Sample Query 4 – Offline Delivery + Event Chaining

CREATE BUFFER equake_data(
    uint16 loc,
    uint16 xAccel,
    uint16 yAccel)
  SIZE 1000
  PARTITION BY NODE

SELECT xAccel, yAccel
FROM sensors
WHERE xAccel > t OR yAccel > t
SIGNAL shake_start(...)
SAMPLE PERIOD 1s

ON EVENT shake_start(...)
SELECT loc, xAccel, yAccel
FROM sensors
OUTPUT INTO BUFFER equake_data(loc, xAccel, yAccel)
SAMPLE PERIOD 10ms
Event Based Processing
• Enables internal and chained actions
• Language semantics
  – Events are inter-node
  – Buffers can be global
• Implementation plan
  – Events and buffers must be local
  – Since n-to-n communication is not (well) supported
• Next: operator expressiveness
Attribute Driven Topology Selection
• Observation: internal queries are often over a local area*
  – Or some other subset of the network
    » E.g. regions with light value in [10,20]
• Idea: build topology for those queries based on values of range-selected attributes
  – Requires range attributes, connectivity to be relatively static

* Heidemann et al. Building Efficient Wireless Sensor Networks with Low-Level Naming. SOSP, 2001.
Attribute Driven Query Propagation
SELECT ... WHERE a > 5 AND a < 12

Precomputed intervals == "Query Dissemination Index"

(Diagram: the query descends only into subtrees whose precomputed intervals – e.g. [1,10], [7,15], [20,40] – overlap the query range.)
Attribute Driven Parent Selection
Even without intervals, we expect that sending to the parent with the closest value will help.

(Diagram: a node with values in [3,6] chooses among parents advertising [1,10], [7,15], and [20,40]:
  [3,6] ∩ [1,10]  = [3,6]
  [3,6] ∩ [7,15]  = ø
  [3,6] ∩ [20,40] = ø)
Hot off the press…
(Chart: nodes visited vs. range query size, for query sizes from 0.001 to 1 as a fraction of the value range, under different index policies: Best Case (Expected), Closest Parent, Nearest Value, Snooping. Random value distribution, 20x20 grid, ideal connectivity to 8 neighbors.)
Grouping

• GROUP BY expr
  – expr is an expression over one or more attributes
    » Evaluation of expr yields a group number
    » Each reading is a member of exactly one group
• Example: SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10)

  Sensor ID | Light | Temp | Group
      1     |  45   |  25  |   2
      2     |  27   |  28  |   2
      3     |  66   |  34  |   3
      4     |  68   |  37  |   3

  Result:
  Group | max(light)
    2   |    45
    3   |    68
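The example table can be evaluated in miniature; the readings are the slide's.

```python
# SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10), in miniature.
readings = [(1, 45, 25), (2, 27, 28), (3, 66, 34), (4, 68, 37)]  # (id, light, temp)
groups = {}
for _, light, temp in readings:
    g = temp // 10                              # TRUNC(temp/10): the group number
    groups[g] = max(groups.get(g, light), light)
print(groups)  # {2: 45, 3: 68}
```

In TAG this dictionary is exactly the per-group PSR store: each entry is a group-tagged partial state record, merged as child PSRs arrive.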
Having
• HAVING preds
  – preds filters out groups that do not satisfy the predicate
  – Versus WHERE, which filters out tuples that do not satisfy the predicate
  – Example: SELECT max(temp) FROM sensors GROUP BY light HAVING max(temp) < 100
    Yields all groups with max temperature under 100
Experiment: Basic TAG

(Chart: bytes per epoch vs. network diameter, 10-50, for COUNT, MAX, AVERAGE, MEDIAN, EXTERNAL, DISTINCT. Dense packing, ideal communication.)
Experiment: Effects of Loss

(Chart: percent error from a single loss vs. network diameter, 10-50, for AVERAGE, COUNT, MAX, MEDIAN; errors range roughly 0-3.5%.)
Experiment: Benefit of Cache

(Chart: percentage of network involved vs. network diameter, 10-50, comparing No Cache with 5-, 9-, and 15-round caches.)
Pipelining Example

(Animation over the 5-node tree: each frame shows per-node tables of partial state records <sensor id, epoch, count> and the messages in flight, e.g. <5,0,1> and <4,0,1> in epoch 0. PSRs for a given epoch climb one tree level per epoch; by epoch 4 the root emits <1,1,5>, the complete count of 5.)
Our Stream Semantics
• One stream, 'sensors'
• We control data rates
• Joins between that stream and buffers are allowed
• Joins are always landmark, forward in time, one tuple at a time
  – Result of queries over 'sensors' is either a single tuple (at time of query) or a stream
• Easy to interface to more sophisticated systems
• Temporal aggregates enable fancy window operations
Formal Spec.
ON EVENT
Buffer Commands
[ AT