
Introduction to cloud computing
Jiaheng Lu
Department of Computer Science
Renmin University of China
www.jiahenglu.net

Advanced MapReduce Application
Reference: Jimmy Lin
http://www.umiacs.umd.edu/~jimmylin/cloud-2008-Fall/schedule.html
Managing Dependencies

Remember: Mappers run in isolation
• You have no idea in what order the mappers run
• You have no idea on what node the mappers run
• You have no idea when each mapper finishes

Tools for synchronization:
• Ability to hold state in the reducer across multiple key-value pairs
• Sorting function for keys
• Partitioner
• Cleverly-constructed data structures
Motivating Example

Term co-occurrence matrix for a text collection
• M = N x N matrix (N = vocabulary size)
• M_ij: number of times terms i and j co-occur in some context
  (for concreteness, let’s say context = sentence)

Why?
• Distributional profiles as a way of measuring semantic distance
• Semantic distance useful for many language processing tasks,
  e.g., Mohammad and Hirst (EMNLP, 2006)
MapReduce: Large Counting Problems

Term co-occurrence matrix for a text collection = a specific instance of a large counting problem
• A large event space (number of terms)
• A large number of events (the collection itself)
• Goal: keep track of interesting statistics about the events

Basic approach
• Mappers generate partial counts
• Reducers aggregate partial counts
First Try: “Pairs”

Each mapper takes a sentence:
• Generate all co-occurring term pairs
• For all pairs, emit (a, b) → count

Reducers sum up the counts associated with these pairs
Use combiners!
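As a concrete illustration, here is a minimal Hadoop (new-API) sketch of the “pairs” approach, assuming one sentence per input line, whitespace tokenization, and a pair key encoded as the Text "a,b"; the driver/job configuration is omitted.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// "Pairs": emit ((a, b), 1) for every co-occurring pair in the sentence.
public class PairsCooccurrence {

  public static class PairsMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text pair = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] terms = value.toString().split("\\s+");   // one input line = one "sentence"
      for (int i = 0; i < terms.length; i++) {
        for (int j = 0; j < terms.length; j++) {
          if (i == j || terms[i].isEmpty() || terms[j].isEmpty()) continue;
          pair.set(terms[i] + "," + terms[j]);            // encode the pair as "a,b"
          context.write(pair, ONE);                       // partial count
        }
      }
    }
  }

  // Also usable as the combiner, since summing is associative and commutative.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable sum = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int total = 0;
      for (IntWritable v : values) total += v.get();
      sum.set(total);
      context.write(key, sum);                            // ((a, b), count)
    }
  }
}

Registering SumReducer as the combiner is exactly the “use combiners!” point above.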
“Pairs” Analysis

Advantages
• Easy to implement, easy to understand

Disadvantages
• Lots of pairs to sort and shuffle around (upper bound?)
Another Try: “Stripes”

Idea: group together pairs into an associative array

(a, b) → 1
(a, c) → 2
(a, d) → 5
(a, e) → 3
(a, f) → 2

a → { b: 1, c: 2, d: 5, e: 3, f: 2 }

Each mapper takes a sentence:
• Generate all co-occurring term pairs
• For each term, emit a → { b: count_b, c: count_c, … }
Another Try: “Stripes”

Reducers perform an element-wise sum of associative arrays:

  a → { b: 1,       d: 5, e: 3       }
+ a → { b: 1, c: 2, d: 2,       f: 2 }
-------------------------------------
  a → { b: 2, c: 2, d: 7, e: 3, f: 2 }
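A matching sketch of the “stripes” approach, under the same assumptions (one sentence per line, whitespace tokens), using Hadoop's MapWritable as the associative array; the reducer performs the element-wise sum shown above and is also usable as a combiner.

import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// "Stripes": emit (a, { b: count, c: count, ... }) per term; reducers sum stripes element-wise.
public class StripesCooccurrence {

  public static class StripesMapper extends Mapper<LongWritable, Text, Text, MapWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] terms = value.toString().split("\\s+");
      for (int i = 0; i < terms.length; i++) {
        if (terms[i].isEmpty()) continue;
        MapWritable stripe = new MapWritable();           // per-term aggregation within the sentence
        for (int j = 0; j < terms.length; j++) {
          if (i == j || terms[j].isEmpty()) continue;
          Text neighbor = new Text(terms[j]);
          IntWritable count = (IntWritable) stripe.get(neighbor);
          stripe.put(neighbor, new IntWritable(count == null ? 1 : count.get() + 1));
        }
        context.write(new Text(terms[i]), stripe);
      }
    }
  }

  // Usable as both combiner and reducer: element-wise sum of associative arrays.
  public static class StripesReducer extends Reducer<Text, MapWritable, Text, MapWritable> {
    @Override
    protected void reduce(Text key, Iterable<MapWritable> stripes, Context context)
        throws IOException, InterruptedException {
      MapWritable merged = new MapWritable();
      for (MapWritable stripe : stripes) {
        for (Map.Entry<Writable, Writable> e : stripe.entrySet()) {
          IntWritable sum = (IntWritable) merged.get(e.getKey());
          int add = ((IntWritable) e.getValue()).get();
          merged.put(e.getKey(), new IntWritable(sum == null ? add : sum.get() + add));
        }
      }
      context.write(key, merged);
    }
  }
}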
“Stripes” Analysis

Advantages
• Far less sorting and shuffling of key-value pairs
• Can make better use of combiners

Disadvantages
• More difficult to implement
• Underlying object is more heavyweight
• Fundamental limitation in terms of the size of the event space

(Performance comparison figure) Cluster size: 38 cores. Data source: Associated Press Worldstream (APW) of the English Gigaword Corpus (v3), which contains 2.27 million documents (1.8 GB compressed, 5.7 GB uncompressed).
Conditional Probabilities

How do we compute conditional probabilities from counts?

P(B | A) = count(A, B) / count(A) = count(A, B) / Σ_B' count(A, B')

• Why do we want to do this?
• How do we do this with MapReduce?
P(B|A): “Pairs”

(a, *) → 32          ← Reducer holds this value in memory
(a, b1) → 3          emit (a, b1) → 3 / 32
(a, b2) → 12         emit (a, b2) → 12 / 32
(a, b3) → 7          emit (a, b3) → 7 / 32
(a, b4) → 1          emit (a, b4) → 1 / 32
…

For this to work:
• Must emit an extra (a, *) for every bn in the mapper
• Must make sure all a’s get sent to the same reducer (use a Partitioner)
• Must make sure (a, *) comes first (define the sort order)
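A sketch of those three requirements, assuming the mapper emits pair keys as Text of the form "a,b" plus an extra "a,*" per pair (mapper not shown): a partitioner that routes on the left term only, and a reducer that holds count(A) in memory across key-value pairs, relying on '*' sorting before letters in the default Text byte order.

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;

// P(B|A) with "pairs": the mapper also emits ("a,*", 1) for every pair "a,b" (not shown here).
public class PairsConditionalProbability {

  // Partition on the left term only, so every key starting with "a," reaches the same reducer.
  public static class LeftTermPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
      String left = key.toString().split(",")[0];
      return (left.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
  }

  // Assumes keys sort so that "a,*" arrives before "a,b1", "a,b2", ...
  // ('*' < letters in ASCII, so the default Text ordering already does this).
  public static class ConditionalReducer
      extends Reducer<Text, IntWritable, Text, DoubleWritable> {
    private long marginal = 0;   // state held across key-value pairs: count(A)

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (IntWritable v : values) sum += v.get();
      if (key.toString().endsWith(",*")) {
        marginal = sum;                                  // remember count(A)
      } else {
        context.write(key, new DoubleWritable((double) sum / marginal));  // count(A,B) / count(A)
      }
    }
  }
}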
P(B|A): “Stripes”

a → { b1: 3, b2: 12, b3: 7, b4: 1, … }

Easy!
• One pass to compute (a, *)
• Another pass to directly compute P(B|A)
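A sketch of the stripes version, reusing the MapWritable stripe format from the earlier example: one pass over the merged stripe computes the marginal count(A), a second pass emits the conditional probabilities.

import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Reducer;

// P(B|A) with "stripes": first pass computes count(A), second pass emits P(B|A).
public class StripesConditionalReducer
    extends Reducer<Text, MapWritable, Text, DoubleWritable> {

  @Override
  protected void reduce(Text key, Iterable<MapWritable> stripes, Context context)
      throws IOException, InterruptedException {
    // Element-wise sum of the incoming stripes (as in the co-occurrence reducer).
    MapWritable merged = new MapWritable();
    for (MapWritable stripe : stripes) {
      for (Map.Entry<Writable, Writable> e : stripe.entrySet()) {
        IntWritable sum = (IntWritable) merged.get(e.getKey());
        int add = ((IntWritable) e.getValue()).get();
        merged.put(e.getKey(), new IntWritable(sum == null ? add : sum.get() + add));
      }
    }
    // First pass: count(A) = sum over all co-occurring B.
    long marginal = 0;
    for (Writable v : merged.values()) marginal += ((IntWritable) v).get();
    // Second pass: emit (A, B) -> count(A, B) / count(A).
    for (Map.Entry<Writable, Writable> e : merged.entrySet()) {
      double p = (double) ((IntWritable) e.getValue()).get() / marginal;
      context.write(new Text(key + "," + e.getKey()), new DoubleWritable(p));
    }
  }
}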
Synchronization in Hadoop

Approach 1: turn synchronization into an ordering problem
• Sort keys into the correct order of computation
• Partition the key space so that each reducer gets the appropriate set of partial results
• Hold state in the reducer across multiple key-value pairs to perform the computation

Approach 2: construct data structures that “bring the pieces together”
• Each reducer receives all the data it needs to complete the computation
Issues and Tradeoffs

Number of key-value pairs
• Object creation overhead
• Time for sorting and shuffling pairs across the network

Size of each key-value pair
• De/serialization overhead

Combiners make a big difference!
• RAM vs. disk and network
• Arrange data to maximize opportunities to aggregate partial results
Data Types in Hadoop

Writable: Defines a de/serialization protocol. Every data type in Hadoop is a Writable.
WritableComparable: Defines a sort order. All keys must be of this type (but not values).
IntWritable, LongWritable, Text, …: Concrete classes for different data types.
Complex Data Types in Hadoop

How do you implement complex data types?

The easiest way:
• Encode them as Text, e.g., (a, b) = “a:b”
• Use regular expressions to parse and extract the data

The hard way:
• Define a custom implementation of WritableComparable
• Must implement: readFields, write, compareTo
• Computationally efficient, but slow for rapid prototyping
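For the “hard way”, here is a sketch of a custom WritableComparable key for a term pair, with the three required methods (readFields, write, compareTo) plus hashCode/equals so the default partitioner and grouping behave sensibly.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// A custom key type for a term pair (a, b).
public class TextPair implements WritableComparable<TextPair> {
  private String left = "";
  private String right = "";

  public TextPair() {}                       // Hadoop needs a no-arg constructor

  public TextPair(String left, String right) {
    this.left = left;
    this.right = right;
  }

  @Override
  public void write(DataOutput out) throws IOException {    // serialization
    out.writeUTF(left);
    out.writeUTF(right);
  }

  @Override
  public void readFields(DataInput in) throws IOException { // deserialization
    left = in.readUTF();
    right = in.readUTF();
  }

  @Override
  public int compareTo(TextPair other) {                    // defines the sort order
    int cmp = left.compareTo(other.left);
    return cmp != 0 ? cmp : right.compareTo(other.right);
  }

  @Override
  public int hashCode() {                    // used by the default HashPartitioner
    return left.hashCode() * 31 + right.hashCode();
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof TextPair)) return false;
    TextPair p = (TextPair) o;
    return left.equals(p.left) && right.equals(p.right);
  }

  @Override
  public String toString() {
    return left + "," + right;
  }
}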
Yahoo! PNUTS and Hadoop

Yahoo! Cloud Stack (layers, each built on horizontal cloud services):
• EDGE: YCPI, Brooklyn, …
• WEB: VM/OS, yApache, PHP, …
• APP: VM/OS, Serving Grid, …
• STORAGE: Sherpa, MObStor, …
• BATCH: Hadoop, App Engine, …
Cross-cutting: Data Highway, Monitoring/Metering/Security, Provisioning (Self-serve), YCS
Yahoo! CCDI Thrust Areas

• Fast Provisioning and Machine Virtualization: On demand, deliver a set of hosts imaged with desired software and configured against standard services. Multiple hosts may be multiplexed onto the same physical machine.
• Batch Storage and Processing: Scalable data storage optimized for batch processing, together with computational capabilities.
• Operational Storage: Persistent storage that supports low-latency updates and flexible retrieval.
• Edge Content Services: Support for dealing with network topology, communication protocols, caching, and BCP.

(Rest of today’s talk)
Web Data Management

Large data analysis (Hadoop)
• Scan-oriented workloads
• Focus on sequential disk I/O
• $ per CPU cycle

Structured record storage (PNUTS/Sherpa)
• CRUD
• Point lookups and short scans
• Index-organized table and random I/Os
• $ per latency

Blob storage (SAN/NAS)
• Object retrieval and streaming
• Scalable file storage
• $ per GB
The World Has Changed

Web serving applications need:
• Scalability! (preferably elastic)
• Flexible schemas
• Geographic distribution
• High availability
• Reliable storage

Web serving applications can do without:
• Complicated queries
• Strong transactions
PNUTS / SHERPA
To Help You Scale Your Mountains of Data

Yahoo! Serving Storage Problem
• Small records – 100KB or less
• Structured records – lots of fields, evolving
• Extreme data scale – tens of TB
• Extreme request scale – tens of thousands of requests/sec
• Low latency globally – 20+ datacenters worldwide
• High availability – outages cost $millions
• Variable usage patterns – as applications and users change
What is PNUTS/Sherpa?

A parallel database with a structured, flexible schema:

CREATE TABLE Parts (
  ID VARCHAR,
  StockNumber INT,
  Status VARCHAR
  …
)

(Diagram) Example table, with rows A–F:
  A  42342  E
  B  42521  W
  C  66354  W
  D  12352  E
  E  75656  C
  F  15677  E

Geographic replication: the same table is replicated across regions.
Hosted, managed infrastructure.
What Will It Become?

(Diagram) The same replicated table, extended with indexes and views.
Design Goals

Scalability
• Thousands of machines
• Easy to add capacity
• Restrict the query language to avoid costly queries

Geographic replication
• Asynchronous replication around the globe
• Low-latency local access

High availability and fault tolerance
• Automatically recover from failures
• Serve reads and writes despite failures

Consistency
• Per-record guarantees
• Timeline model
• Option to relax if needed

Multiple access paths
• Hash table, ordered table
• Primary, secondary access

Hosted service
• Applications plug and play
• Share operational cost
Technology Elements

Applications
• Tabular API / PNUTS API
• YCA: Authorization

PNUTS
• Query planning and execution
• Index maintenance

Distributed infrastructure for tabular data
• Data partitioning
• Update consistency
• Replication

YDOT FS: Ordered tables
YDHT FS: Hash tables
Tribble: Pub/sub messaging
Zookeeper: Consistency service
Data Manipulation

Per-record operations
• Get
• Set
• Delete

Multi-record operations
• Multiget
• Scan
• Getrange

Web service (RESTful) API
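To make the per-record operations concrete, here is a hypothetical client sketch over a RESTful interface; the base URL and paths are invented for illustration only and are not the actual Sherpa API.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical REST calls against a PNUTS-style table (endpoint and paths are assumptions).
public class RecordStoreClient {
  private static final String BASE = "http://sherpa.example.com/tables/Parts"; // assumed URL
  private final HttpClient http = HttpClient.newHttpClient();

  // Get: point lookup of a single record by primary key.
  public String get(String key) throws Exception {
    HttpRequest req = HttpRequest.newBuilder(URI.create(BASE + "/records/" + key)).GET().build();
    return http.send(req, HttpResponse.BodyHandlers.ofString()).body();
  }

  // Set: insert or replace a record (body carries the field values, e.g. as JSON).
  public int set(String key, String json) throws Exception {
    HttpRequest req = HttpRequest.newBuilder(URI.create(BASE + "/records/" + key))
        .header("Content-Type", "application/json")
        .PUT(HttpRequest.BodyPublishers.ofString(json))
        .build();
    return http.send(req, HttpResponse.BodyHandlers.ofString()).statusCode();
  }

  // Delete: remove a record by primary key.
  public int delete(String key) throws Exception {
    HttpRequest req = HttpRequest.newBuilder(URI.create(BASE + "/records/" + key)).DELETE().build();
    return http.send(req, HttpResponse.BodyHandlers.ofString()).statusCode();
  }
}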
Tablets—Hash Table

The hash space (0x0000 … 0x2AF3 … 0x911F … 0xFFFF) is split into tablets; each record lands in the tablet that owns its hash value.

Name        Description                            Price
Grape       Grapes are good to eat                 $12
Lime        Limes are green                        $9
Apple       Apple is wisdom                        $1
Strawberry  Strawberry shortcake                   $900
Orange      Arrgh! Don’t get scurvy!               $2
Avocado     But at what price?                     $3
Lemon       How much did you pay for this lemon?   $1
Tomato      Is this a vegetable?                   $14
Banana      The perfect fruit                      $2
Kiwi        New Zealand                            $8
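A sketch of how a router might map a key to a hash-table tablet: hash the key into the 16-bit space shown above and pick the tablet whose interval contains it. The boundary values are taken from the slide; the hash function and interval semantics are assumptions.

import java.util.Arrays;

// Hash-table routing sketch: fold the key's hash into [0x0000, 0xFFFF] and
// binary-search the tablet boundaries to find the owning tablet.
public class HashTabletRouter {
  // Tablet upper bounds (from the slide).
  private static final int[] BOUNDARIES = {0x2AF3, 0x911F, 0xFFFF};

  public static int tabletFor(String key) {
    int h = key.hashCode() & 0xFFFF;              // fold the hash into the 16-bit space
    int idx = Arrays.binarySearch(BOUNDARIES, h);
    return idx >= 0 ? idx : -idx - 1;             // index of the first boundary >= h
  }

  public static void main(String[] args) {
    for (String name : new String[] {"Grape", "Lime", "Apple", "Kiwi"}) {
      System.out.println(name + " -> tablet " + tabletFor(name));
    }
  }
}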
Tablets—Ordered Table

The key space is split alphabetically into tablets (boundaries A, H, Q, Z); records are stored in key order.

Name        Description                            Price
[A]
Apple       Apple is wisdom                        $1
Avocado     But at what price?                     $3
Banana      The perfect fruit                      $2
Grape       Grapes are good to eat                 $12
[H]
Kiwi        New Zealand                            $8
Lemon       How much did you pay for this lemon?   $1
Lime        Limes are green                        $9
[Q]
Orange      Arrgh! Don’t get scurvy!               $2
Strawberry  Strawberry shortcake                   $900
Tomato      Is this a vegetable?                   $14
[Z]
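The ordered-table analogue, sketched with a TreeMap of tablet lower bounds; the tablet-to-storage-unit assignment here is illustrative, not the one in the figure.

import java.util.TreeMap;

// Ordered-table routing sketch: floorEntry finds the tablet whose lower bound
// is the largest one not exceeding the key.
public class OrderedTabletRouter {
  private final TreeMap<String, Integer> tabletByLowerBound = new TreeMap<>();

  public OrderedTabletRouter() {
    tabletByLowerBound.put("A", 1);   // [A, H) -> storage unit 1 (illustrative)
    tabletByLowerBound.put("H", 2);   // [H, Q) -> storage unit 2
    tabletByLowerBound.put("Q", 3);   // [Q, Z] -> storage unit 3
  }

  public int storageUnitFor(String key) {
    return tabletByLowerBound.floorEntry(key).getValue();
  }

  public static void main(String[] args) {
    OrderedTabletRouter router = new OrderedTabletRouter();
    for (String name : new String[] {"Apple", "Lime", "Strawberry"}) {
      System.out.println(name + " -> storage unit " + router.storageUnitFor(name));
    }
  }
}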
Flexible Schema

Posted date   Listing id   Item    Price
6/1/07        424252       Couch   $570
6/1/07        763245       Bike    $86
6/3/07        211242       Car     $1123
6/5/07        421133       Lamp    $15

Some records also carry optional fields that others lack, e.g. Color (Red) and Condition (Good, Fair).
Detailed Architecture

(Diagram) Local region: Clients → REST API → Routers → Storage units, coordinated by a Tablet Controller; Tribble connects the local region to the remote regions.
Tablet Splitting and Balancing

• Each storage unit has many tablets (horizontal partitions of the table)
• A storage unit may become a hotspot: shed load by moving tablets to other servers
• Tablets may grow over time: overfull tablets split
QUERY PROCESSING

Accessing Data

(Diagram) 1. The client sends “Get key k” to a router; 2. the router forwards “Get key k” to the storage unit (SU) holding the key; 3. the SU returns the record for key k; 4. the router returns the record for key k to the client.
Bulk Read

(Diagram) 1. The client sends the key set {k1, k2, … kn} to a scatter/gather server; 2. the server issues Get k1, Get k2, Get k3, … in parallel to the storage units.
Range Queries in YDOT

Clustered, ordered retrieval of records.

(Diagram) A range query such as “Grapefruit…Pear?” goes to the router, which splits it by tablet boundaries into sub-ranges (“Grapefruit…Lime?”, “Lime…Pear?”) and sends each sub-range to the storage unit holding that tablet.
Updates

(Diagram) The write path for key k: the client sends “Write key k” to a router (1), which forwards it to the storage unit holding the record’s master copy (2–3); the write is published to the message brokers (4–5), which return SUCCESS and a sequence number for key k (6–7); the sequence number is returned to the client (8), and the brokers asynchronously propagate the update to the other replicas.
ASYNCHRONOUS REPLICATION AND CONSISTENCY

Asynchronous Replication
Consistency Model

Goal: Make it easier for applications to reason about updates and cope with asynchrony

What happens to a record with primary key “Alice”?

(Timeline) Record inserted → Update v. 1 → Update v. 2 → … → Update v. 8 → Delete (Generation 1)

As the record is updated, copies may get out of sync.
Example: Social Alice

(Diagram) Alice’s status record is replicated across the East and West regions. One region updates the status to “Busy”, another to “Free”; until the copies converge, a reader in either region may see a stale value (“???”).

Record Timeline: ___ → Busy → Free → Free
Consistency Model

(Timeline) Read: the returned copy may be a stale version (e.g. v. 5) rather than the current version (v. 8).

In general, reads are served using a local copy.
Consistency Model

(Timeline) Read up-to-date: returns the current version (v. 8) rather than a stale one.

But the application can request and get the current version.
Consistency Model

(Timeline) Read ≥ v.6: returns any copy at least as new as v. 6.

Or variations such as “read forward”: while copies may lag the master record, every copy goes through the same sequence of changes.
Consistency Model

(Timeline) Write: produces the next version of the record (here v. 8).

Achieved via a per-record primary copy protocol.
(To maximize availability, record masterships are automatically transferred if a site fails.)
Can be selectively weakened to eventual consistency (local writes that are reconciled using version vectors).
Consistency Model

(Timeline) Write if = v.7: the write succeeds only if the record’s current version is v. 7; otherwise it returns ERROR.

Test-and-set writes facilitate per-record transactions.
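A sketch of a test-and-set write against an in-memory record store, assuming each record carries a version number: the write is applied only if the caller’s expected version matches the current one, otherwise it is rejected (the slide’s ERROR).

import java.util.concurrent.ConcurrentHashMap;

// Test-and-set write sketch: a per-record version check guards the update.
public class TestAndSetStore {
  static final class Versioned {
    final int version;
    final String value;
    Versioned(int version, String value) { this.version = version; this.value = value; }
  }

  private final ConcurrentHashMap<String, Versioned> records = new ConcurrentHashMap<>();

  public void insert(String key, String value) {
    records.put(key, new Versioned(1, value));
  }

  // Returns true on success, false if the record's version has already moved on.
  public boolean writeIfVersion(String key, int expectedVersion, String newValue) {
    Versioned current = records.get(key);
    if (current == null || current.version != expectedVersion) {
      return false;                                   // stale write: reject
    }
    // Atomic replace keyed on the identity of the current record object.
    return records.replace(key, current, new Versioned(expectedVersion + 1, newValue));
  }
}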
Consistency Techniques

Per-record mastering
• Each record is assigned a “master region”
  • May differ between records
  • Updates to the record are forwarded to the master region
  • Ensures consistent ordering of updates

Tablet-level mastering
• Each tablet is assigned a “master region”
• Inserts and deletes of records are forwarded to the master region
• The master region decides tablet splits

These details are hidden from the application
• Except for the latency impact!
Mastering

(Diagram) The example table replicated across regions; the E/W/C column records each record’s master region, and one copy of each tablet acts as the tablet master.
Bulk Insert/Update/Replace

(Diagram: Client, Source Data, Bulk manager, storage units)
1. Client feeds records to the bulk manager
2. Bulk loader transfers records to SUs in batches
• Bypass routers and message brokers
• Efficient import into the storage unit
Bulk Load in YDOT

YDOT bulk inserts can cause performance hotspots
Solution: preallocate tablets

Index Maintenance

How to have lots of interesting indexes and views without killing performance?
Solution: Asynchrony!
Indexes/views are updated asynchronously when the base table is updated
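A sketch of the asynchrony idea, with a simple in-process queue standing in for the pub/sub layer (an assumption, not the Sherpa implementation): base-table writes enqueue a change, and a background applier updates the secondary index off the write path.

import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Asynchronous index maintenance sketch: the index lags the base table but
// eventually catches up, keeping the write path cheap.
public class AsyncIndexMaintainer {
  record Change(String primaryKey, String indexedValue) {}

  private final Map<String, String> secondaryIndex = new ConcurrentHashMap<>(); // value -> key
  private final BlockingQueue<Change> queue = new LinkedBlockingQueue<>();

  // Called on the write path: cheap, just enqueue the change.
  public void onBaseTableUpdate(String primaryKey, String indexedValue) {
    queue.add(new Change(primaryKey, indexedValue));
  }

  // Background applier thread drains the queue and updates the index.
  public void startApplier() {
    Thread applier = new Thread(() -> {
      try {
        while (true) {
          Change c = queue.take();
          secondaryIndex.put(c.indexedValue(), c.primaryKey());
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    applier.setDaemon(true);
    applier.start();
  }
}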
SHERPA IN CONTEXT
Types of Record Stores

Query expressiveness (simple → feature rich):
S3 (object retrieval) → PNUTS (retrieval from a single table of objects/records) → Oracle (SQL)
Types of Record Stores

Consistency model (best effort → strong guarantees):
S3 (eventual consistency) → PNUTS (timeline consistency, object-centric consistency) → Oracle (ACID, program-centric consistency)
Types of Record Stores

Data model (flexibility, schema evolution → optimized for fixed schemas):
PNUTS and CouchDB (object-centric consistency) → Oracle (consistency spans objects)
Types of Record Stores

Elasticity (ability to add resources on demand), inelastic → elastic:
Oracle (limited, via data distribution) → PNUTS and S3 (VLSD: very large scale distribution/replication)
Data Stores Comparison Versus PNUTS

User-partitioned SQL stores (Microsoft Azure SDS, Amazon SimpleDB)
• More expressive queries
• Users must control partitioning
• Limited elasticity

Multi-tenant application databases (Salesforce.com, Oracle on Demand)
• Highly optimized for complex workloads
• Limited flexibility to evolving applications
• Inherit limitations of the underlying data management system

Mutable object stores (Amazon S3)
• Object storage versus record management
Application Design Space

(Diagram) Systems placed along two axes, access pattern (get a few things vs. scan everything) and data granularity (records vs. files): Sherpa, MySQL, Oracle, BigTable, Everest, MObStor, YMDB, Filer, Hadoop.

Alternatives Matrix

(Matrix) Sherpa, Y! UDB, MySQL, Oracle, HDFS, BigTable, Dynamo, and Cassandra compared on: elasticity, operability, availability, global low latency, structured access, updates, consistency model, and SQL/ACID.
Further Reading

Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008)
Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan

PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008)
Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni

Asynchronous View Maintenance for VLSD Databases (SIGMOD 2009, to appear)
Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Raghu Ramakrishnan

Cloud Storage Design in a PNUTShell
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava
Beautiful Data, O’Reilly Media, 2009 (to appear)
QUESTIONS?