
Introduction to cloud computing
Jiaheng Lu
Department of Computer Science
Renmin University of China
www.jiahenglu.net
Yahoo! Cloud computing

Yahoo! Cloud Stack
[Stack diagram: horizontal cloud services at every layer]
EDGE: YCPI, Brooklyn, …
WEB: VM/OS, yApache, PHP
APP: VM/OS, serving grid, …
STORAGE: Sherpa, MObStor, …
BATCH: Hadoop
Cross-cutting: App Engine, Data Highway, Monitoring/Metering/Security, Provisioning (self-serve), YCS
Web Data Management

Large data analysis (Hadoop)
• Scan-oriented workloads
• Focus on sequential disk I/O
• $ per CPU cycle

Structured record storage (PNUTS/Sherpa)
• CRUD
• Point lookups and short scans
• Index-organized table and random I/Os
• $ per latency

Blob storage (SAN/NAS)
• Object retrieval and streaming
• Scalable file storage
• $ per GB
The World Has Changed

Web serving applications need:
 Scalability! Preferably elastic
 Flexible schemas
 Geographic distribution
 High availability
 Reliable storage

Web serving applications can do without:
 Complicated queries
 Strong transactions
PNUTS / SHERPA
To Help You Scale Your Mountains of Data
Yahoo! Serving Storage Problem
 Small records – 100KB or less
 Structured records – lots of fields, evolving
 Extreme data scale – tens of TB
 Extreme request scale – tens of thousands of requests/sec
 Low latency globally – 20+ datacenters worldwide
 High availability – outages cost $millions
 Variable usage patterns – as applications and users change
What is PNUTS/Sherpa?
[Diagram: a six-record table (keys A–F with values) spread across servers]
Parallel database

CREATE TABLE Parts (
  ID VARCHAR,
  StockNumber INT,
  Status VARCHAR,
  …
)
Structured, flexible schema

[Diagram: the same table replicated across three regions]
Geographic replication

Hosted, managed infrastructure
What Will It Become?
[Diagram: the same geographically replicated table, now augmented with indexes and views maintained over the base table]
Indexes and views
Design Goals
Scalability
 Thousands of machines
 Easy to add capacity
 Restrict query language to avoid costly queries
Geographic replication
 Asynchronous replication around the globe
 Low-latency local access
High availability and fault tolerance
 Automatically recover from failures
 Serve reads and writes despite failures
Consistency
 Per-record guarantees
 Timeline model
 Option to relax if needed
Multiple access paths
 Hash table, ordered table
 Primary, secondary access
Hosted service
 Applications plug and play
 Share operational cost
Technology Elements
Applications
Tabular API: PNUTS API, YCA (authorization)
PNUTS
• Query planning and execution
• Index maintenance
Distributed infrastructure for tabular data
• Data partitioning
• Update consistency
• Replication
YDOT FS
• Ordered tables
YDHT FS
• Hash tables
Tribble
• Pub/sub messaging
Zookeeper
• Consistency service
Data Manipulation
 Per-record operations
 Get
 Set
 Delete
 Multi-record operations
 Multiget
 Scan
 Getrange
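As a rough sketch, the operations above map onto a small client interface. This is hypothetical Java with invented names, not the actual Sherpa client API:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a PNUTS-style record API; all names are invented.
public interface RecordStore {
    // Per-record operations
    Map<String, String> get(String table, String key);
    void set(String table, String key, Map<String, String> fields);
    void delete(String table, String key);

    // Multi-record operations
    List<Map<String, String>> multiget(String table, List<String> keys);
    Iterable<Map<String, String>> scan(String table);
    Iterable<Map<String, String>> getrange(String table, String lowKey, String highKey);
}
```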
Tablets—Hash Table
Records are stored in hash order; tablet boundaries fall at hash values 0x0000, 0x2AF3, 0x911F, and 0xFFFF.

Name        Description                           Price
Grape       Grapes are good to eat                $12
Lime        Limes are green                       $9
Apple       Apple is wisdom                       $1
Strawberry  Strawberry shortcake                  $900
Orange      Arrgh! Don’t get scurvy!              $2
Avocado     But at what price?                    $3
Lemon       How much did you pay for this lemon?  $1
Tomato      Is this a vegetable?                  $14
Banana      The perfect fruit                     $2
Kiwi        New Zealand                           $8
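A minimal sketch of the lookup this implies: hash the key into the 0x0000–0xFFFF space and pick the tablet whose interval covers it. The boundary values come from the slide; the class, the hash folding, and the storage-unit names are assumptions.

```java
import java.util.TreeMap;

// Sketch of hash-tablet routing; not Sherpa's actual code.
public class HashTabletRouter {
    // Tablet lower bound (inclusive) -> storage unit holding that tablet.
    private final TreeMap<Integer, String> tablets = new TreeMap<>();

    public HashTabletRouter() {
        tablets.put(0x0000, "SU-1");   // covers [0x0000, 0x2AF3)
        tablets.put(0x2AF3, "SU-2");   // covers [0x2AF3, 0x911F)
        tablets.put(0x911F, "SU-3");   // covers [0x911F, 0xFFFF]
    }

    public String lookup(String key) {
        int h = key.hashCode() & 0xFFFF;          // fold the hash into the tablet space
        return tablets.floorEntry(h).getValue();  // tablet whose interval contains h
    }

    public static void main(String[] args) {
        System.out.println(new HashTabletRouter().lookup("Grape"));
    }
}
```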
Tablets—Ordered Table
The same records in primary-key order; tablet boundaries fall at keys A, H, Q, and Z.

Name        Description                           Price
Apple       Apple is wisdom                       $1
Avocado     But at what price?                    $3
Banana      The perfect fruit                     $2
Grape       Grapes are good to eat                $12
Kiwi        New Zealand                           $8
Lemon       How much did you pay for this lemon?  $1
Lime        Limes are green                       $9
Orange      Arrgh! Don’t get scurvy!              $2
Strawberry  Strawberry shortcake                  $900
Tomato      Is this a vegetable?                  $14
Flexible Schema

Posted date  Listing id  Item   Price
6/1/07       424252      Couch  $570
6/1/07       763245      Bike   $86
6/3/07       211242      Car    $1123
6/5/07       421133      Lamp   $15

Some records also carry later-added fields such as Color (e.g., Red) and Condition (e.g., Good, Fair); rows that never set them simply lack those fields.
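One way to picture flexible schemas is each record as a field map, so a new field exists only on the records that set it. A small illustrative sketch; which rows carry the optional Color/Condition values is invented here:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Records in one table need not share columns; missing fields are simply absent.
public class FlexibleRecords {
    public static void main(String[] args) {
        Map<String, String> couch = new LinkedHashMap<>();
        couch.put("Posted date", "6/1/07");
        couch.put("Listing id", "424252");
        couch.put("Item", "Couch");
        couch.put("Price", "$570");
        couch.put("Condition", "Good");   // later-added field, set on this record only

        Map<String, String> lamp = new LinkedHashMap<>();
        lamp.put("Posted date", "6/5/07");
        lamp.put("Listing id", "421133");
        lamp.put("Item", "Lamp");
        lamp.put("Price", "$15");
        // no Color or Condition here: the fields are absent, not NULL columns

        System.out.println(couch);
        System.out.println(lamp);
    }
}
```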
Detailed Architecture
[Diagram, local region: clients call a REST API that reaches the routers; the routers consult the tablet controller and read/write the storage units; Tribble message brokers carry updates between the local region and remote regions]
Tablet Splitting and Balancing
Each storage unit has many tablets (horizontal partitions of the table)
Tablets may grow over time; overfull tablets split
A storage unit may become a hotspot; it sheds load by moving tablets to other servers
QUERY PROCESSING
Accessing Data
[Diagram: single-record get]
1. Client sends "get key k" to a router.
2. Router forwards "get key k" to the storage unit holding k's tablet.
3. The storage unit returns the record for key k to the router.
4. The router returns the record for key k to the client.
Bulk Read
[Diagram: scatter/gather]
1. Client sends the key set {k1, k2, … kn} to a scatter/gather server.
2. The scatter/gather server issues the individual gets (get k1, get k2, get k3, …) to the owning storage units in parallel and merges the results.
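The scatter/gather server can be sketched as a thread pool that fans the per-unit gets out in parallel and merges the answers; every name here is hypothetical, and the per-unit RPC is a stand-in function.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.BiFunction;

// Sketch of scatter/gather over storage units; not the actual PNUTS code.
public class ScatterGatherServer {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    public Map<String, String> multiget(
            Map<String, List<String>> keysByUnit,                       // routing already done
            BiFunction<String, List<String>, Map<String, String>> rpc)  // stand-in per-unit get
            throws InterruptedException, ExecutionException {
        List<Future<Map<String, String>>> futures = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : keysByUnit.entrySet()) {
            futures.add(pool.submit(() -> rpc.apply(e.getKey(), e.getValue())));  // scatter
        }
        Map<String, String> results = new HashMap<>();
        for (Future<Map<String, String>> f : futures) {
            results.putAll(f.get());                                              // gather
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ScatterGatherServer sg = new ScatterGatherServer();
        Map<String, List<String>> plan = Map.of("SU-1", List.of("k1", "k3"),
                                                "SU-2", List.of("k2"));
        Map<String, String> out = sg.multiget(plan, (unit, keys) -> {
            Map<String, String> m = new HashMap<>();
            for (String k : keys) m.put(k, "value of " + k + " from " + unit);
            return m;
        });
        System.out.println(out);
        sg.pool.shutdown();
    }
}
```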
Range Queries in YDOT
 Clustered, ordered retrieval of records
[Diagram: the router's interval map (Apple… → storage unit 1, Canteloupe… → storage unit 3, Lime… → storage unit 2, Strawberry… → storage unit 1) splits the query "Grapefruit…Pear?" into "Grapefruit…Lime?", served by the tablet on storage unit 3, and "Lime…Pear?", served by the tablet on storage unit 2]
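The router's interval map can be sketched as a sorted map from each tablet's low key to its storage unit; the boundaries mirror the diagram, while the class and method names are invented.

```java
import java.util.Collection;
import java.util.TreeMap;

// Sketch of splitting an ordered-table range scan across tablets.
public class OrderedTabletRouter {
    private final TreeMap<String, String> tabletsByLowKey = new TreeMap<>();

    public OrderedTabletRouter() {
        tabletsByLowKey.put("Apple", "Storage unit 1");
        tabletsByLowKey.put("Canteloupe", "Storage unit 3");
        tabletsByLowKey.put("Lime", "Storage unit 2");
        tabletsByLowKey.put("Strawberry", "Storage unit 1");
    }

    // Storage units serving the scan [low, high); assumes low >= the first boundary.
    public Collection<String> unitsFor(String low, String high) {
        String first = tabletsByLowKey.floorKey(low);  // tablet containing 'low'
        return tabletsByLowKey.subMap(first, true, high, false).values();
    }

    public static void main(String[] args) {
        // "Grapefruit...Pear" spans the Canteloupe and Lime tablets:
        // prints [Storage unit 3, Storage unit 2], matching the diagram's split.
        System.out.println(new OrderedTabletRouter().unitsFor("Grapefruit", "Pear"));
    }
}
```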
Updates
[Diagram: write path, numbered 1–8]
1. Client sends "write key k" to a router.
2. Router forwards the write to the storage unit holding the record's master copy.
3. That storage unit publishes "write key k" to the message brokers.
4. The brokers log and commit the message.
5. SUCCESS returns to the storage unit, which applies the write.
6. The brokers asynchronously deliver "write key k" to the other replicas' storage units.
7. The storage unit returns the sequence number for key k to the router.
8. The router returns the sequence number for key k to the client.
ASYNCHRONOUS REPLICATION AND CONSISTENCY
Asynchronous Replication
[Diagram: an update applied in one region propagates asynchronously, via the message brokers, to the replicas in other regions]
Consistency Model
 Goal: make it easier for applications to reason about updates and cope with asynchrony
 What happens to a record with primary key "Alice"?
[Timeline: the record is inserted, then a series of updates produces versions v. 1 through v. 8; an insert-to-delete span forms a generation (Generation 1 shown)]
As the record is updated, copies may get out of sync.
Example: Social Alice
[Diagram: Alice's status record replicated in the East and West regions. Successive updates (___ → Busy → Free) reach the two copies at different times, so for a while one region shows "Busy" while the other shows "Free", and a reader cannot tell which value (???) is current]
Record timeline: ___ → Busy → Free → Free
Consistency Model
[Timeline: versions v. 1 through v. 8 of the record (Generation 1); a read may return a stale version such as v. 1 or v. 5 rather than the current version v. 8]
In general, reads are served using a local copy
Consistency Model
[Timeline: a "read up-to-date" skips past stale versions and returns the current version v. 8]
But the application can request and get the current version
Consistency Model
[Timeline: a "read ≥ v. 6" may return v. 6 or anything later, but nothing earlier]
Or variations such as "read forward": while copies may lag the master record, every copy goes through the same sequence of changes
Consistency Model
[Timeline: a write advances the record to the next version, here v. 8]
Achieved via a per-record primary copy protocol
(To maximize availability, record masterships are automatically transferred if a site fails)
Can be selectively weakened to eventual consistency
(local writes that are reconciled using version vectors)
Consistency Model
[Timeline: a "write if = v. 7" fails with ERROR because the record has already advanced to v. 8]
Test-and-set writes facilitate per-record transactions
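A minimal sketch of the test-and-set idea (the mechanism, not Sherpa's wire protocol): the writer presents the version it read, and the store applies the update only if the record is still at that version.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Test-and-set write: succeed only if the record is still at the expected version.
public class VersionedStore {
    static final class Versioned {
        final int version;
        final String value;
        Versioned(int version, String value) { this.version = version; this.value = value; }
    }

    private final Map<String, Versioned> records = new ConcurrentHashMap<>();

    public boolean writeIfVersion(String key, int expectedVersion, String newValue) {
        Versioned cur = records.get(key);
        if (cur == null || cur.version != expectedVersion) {
            return false;  // the slide's ERROR: someone already produced a newer version
        }
        // Atomic swap: applies only if the very same object is still in place.
        return records.replace(key, cur, new Versioned(cur.version + 1, newValue));
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.records.put("alice", new Versioned(7, "Busy"));
        System.out.println(store.writeIfVersion("alice", 7, "Free"));  // true: v.7 -> v.8
        System.out.println(store.writeIfVersion("alice", 7, "Away"));  // false: record is at v.8
    }
}
```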
Consistency Techniques
 Per-record mastering
 Each record is assigned a "master region"
 May differ between records
 Updates to the record are forwarded to the master region
 Ensures consistent ordering of updates
 Tablet-level mastering
 Each tablet is assigned a "master region"
 Inserts and deletes of records are forwarded to the master region
 Master region decides tablet splits
 These details are hidden from the application
 Except for the latency impact!
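That latency impact is visible in a sketch of per-record mastering: a write issued where the record is mastered applies locally, while any other region pays a cross-region forward. All names here are hypothetical, and prints stand in for real RPCs.

```java
// Hypothetical sketch of per-record mastering.
public class MasteredWriter {
    static final class Record {
        final String key;
        final String masterRegion;
        Record(String key, String masterRegion) { this.key = key; this.masterRegion = masterRegion; }
    }

    private final String localRegion;

    MasteredWriter(String localRegion) { this.localRegion = localRegion; }

    void write(Record r, String newValue) {
        if (r.masterRegion.equals(localRegion)) {
            // The master applies the update, then replicates it asynchronously.
            System.out.println("apply " + r.key + "=" + newValue + " locally");
        } else {
            // Non-master regions forward: this is the latency the slide warns about.
            System.out.println("forward " + r.key + "=" + newValue + " to " + r.masterRegion);
        }
    }

    public static void main(String[] args) {
        MasteredWriter west = new MasteredWriter("west");
        west.write(new Record("alice", "west"), "Busy");  // local master: fast
        west.write(new Record("bob", "east"), "Free");    // forwarded: cross-region round trip
    }
}
```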
Mastering
[Diagram: three geographic replicas of the six-record table; each record carries a mastership marker (E, W, or C), so different records are mastered in different regions, and each tablet likewise has a tablet master]
Bulk Insert/Update/Replace
[Diagram: client → bulk manager → storage units]
1. Client feeds records to the bulk manager
2. The bulk loader transfers records to storage units in batches
• Bypasses routers and message brokers
• Efficient import into the storage unit
Bulk Load in YDOT
 YDOT bulk inserts can cause performance hotspots
 Solution: preallocate tablets
Index Maintenance
 How do you support lots of interesting indexes and views without killing performance?
 Solution: asynchrony!
 Indexes/views are updated asynchronously when the base table is updated
SHERPA IN CONTEXT
Types of Record Stores
 Query expressiveness (simple → feature rich)
 S3: object retrieval
 PNUTS: retrieval from a single table of objects/records
 Oracle: SQL
Types of Record Stores
 Consistency model (best effort → strong guarantees)
 S3: eventual consistency
 PNUTS: timeline consistency (object-centric)
 Oracle: ACID (program-centric consistency)
Types of Record Stores
 Data model (flexibility and schema evolution → optimized for fixed schemas)
 CouchDB: flexible schemas, object-centric consistency
 PNUTS: in between
 Oracle: fixed schemas, consistency spans objects
Types of Record Stores
 Elasticity, i.e., ability to add resources on demand (inelastic → elastic)
 Oracle: inelastic
 PNUTS: limited (via data distribution)
 S3: VLSD (very large scale distribution/replication)
Data Stores Comparison Versus PNUTS
 User-partitioned SQL stores (Microsoft Azure SDS, Amazon SimpleDB)
 More expressive queries
 Users must control partitioning
 Limited elasticity
 Multi-tenant application databases (Salesforce.com, Oracle on Demand)
 Highly optimized for complex workloads
 Limited flexibility for evolving applications
 Inherit limitations of the underlying data management system
 Mutable object stores (Amazon S3)
 Object storage versus record management
Application Design Space
[Quadrant chart: access pattern (get a few things vs. scan everything) against granularity (records vs. files). Record-oriented systems shown: Sherpa, MySQL, Oracle, BigTable, Everest; file-oriented systems: MObStor, YMDB, Filer, Hadoop]
Alternatives Matrix
[Matrix comparing Sherpa, Y! UDB, MySQL, Oracle, HDFS, BigTable, Dynamo, and Cassandra along eight dimensions: SQL/ACID, consistency model, updates, structured access, global low latency, availability, operability, elasticity]
Further Reading
Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan. Efficient Bulk Insertion into a Distributed Ordered Table. SIGMOD 2008.
Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni. PNUTS: Yahoo!'s Hosted Data Serving Platform. VLDB 2008.
Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Raghu Ramakrishnan. Asynchronous View Maintenance for VLSD Databases. SIGMOD 2009 (to appear).
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data, O'Reilly Media, 2009 (to appear).
QUESTIONS?
Hadoop

Problem
 How do you scale up applications?
 Run jobs processing 100's of terabytes of data
 Takes 11 days to read that much on 1 computer
 Need lots of cheap computers
 Fixes the speed problem (15 minutes on 1,000 computers), but…
 Reliability problems: in large clusters, computers fail every day, and cluster size is not fixed
 Need common infrastructure
 Must be efficient and reliable
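A quick sanity check on those numbers, assuming roughly 100 MB/s of sequential read bandwidth per machine: 100 TB / (100 MB/s) ≈ 10^6 seconds ≈ 11.5 days on one computer, and one-thousandth of that, about 1,000 seconds (roughly 17 minutes), when spread across 1,000 computers; both are in line with the figures above.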
Solution
 Open source Apache project
 Hadoop Core includes:
 Distributed file system: distributes data
 Map/Reduce: distributes application
 Written in Java
 Runs on Linux, Mac OS/X, Windows, and Solaris
 Commodity hardware
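To make "Map/Reduce distributes application" concrete, here is the canonical WordCount job in Java. It uses the org.apache.hadoop.mapreduce API of later Hadoop releases, so treat it as an illustrative sketch rather than the exact code of this talk's era; input and output paths come from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: runs in parallel over file blocks, emitting (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) sum += val.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```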

Hardware Cluster of Hadoop
 Typically a 2-level architecture
 Nodes are commodity PCs
 40 nodes/rack
 Uplink from each rack is 8 gigabit
 Rack-internal is 1 gigabit
Distributed File System
 Single namespace for the entire cluster
 Managed by a single namenode
 Files are single-writer and append-only
 Optimized for streaming reads of large files
 Files are broken into large blocks
 Typically 128 MB
 Replicated to several datanodes for reliability
 Access from Java, C, or the command line
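As a small illustration of access from Java, this sketch streams a file out of HDFS with the FileSystem API; the path /data/example.txt is made up, and the cluster configuration is picked up from core-site.xml on the classpath.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml etc.
        FileSystem fs = FileSystem.get(conf);       // the cluster's single namespace
        try (FSDataInputStream in = fs.open(new Path("/data/example.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);           // blocks stream in from datanodes
            }
        }
    }
}
```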
Block Placement
 Default is 3 replicas, but settable
 Blocks are placed (writes are pipelined):
 On the same node
 On a different rack
 On the other rack
 Clients read from the closest replica
 If the replication for a block drops below target, it is automatically re-replicated
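"Settable" works at two levels: dfs.replication is the default for newly created files, and the FileSystem API changes the factor per file afterwards. A minimal sketch with an illustrative path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");   // default replication for files this client creates
        FileSystem fs = FileSystem.get(conf);
        // Raise one hot file to 5 replicas; the namenode re-replicates in the background.
        fs.setReplication(new Path("/data/example.txt"), (short) 5);
    }
}
```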

How is Yahoo using Hadoop?
 Started with building better applications
 Scale up web-scale batch applications (search, ads, …)
 Factor out common code from existing systems, so new applications will be easier to write
 Manage the many clusters
Running Production WebMap
 Search needs a graph of the “known” web
 Invert edges, compute link text, whole graph heuristics
 Periodic batch job using Map/Reduce
 Uses a chain of ~100 map/reduce jobs
 Scale
 1 trillion edges in the graph
 Largest shuffle is 450 TB
 Final output is 300 TB compressed
 Runs on 10,000 cores
 Raw disk used 5 PB
Terabyte Sort Benchmark
 Started by Jim Gray at Microsoft in 1998
 Sorting 10 billion 100-byte records
 Hadoop won the general category in 209 seconds
 910 nodes
 2 quad-core Xeons @ 2.0 GHz / node
 4 SATA disks / node
 8 GB RAM / node
 1 gigabit Ethernet / node
 40 nodes / rack
 8 gigabit Ethernet uplink / rack
 Previous record was 297 seconds
Hadoop clusters
 We have ~20,000 machines running Hadoop
 Our largest clusters are currently 2,000 nodes
 Several petabytes of user data (compressed, unreplicated)
 We run hundreds of thousands of jobs every month
Research Cluster Usage
[Chart: research cluster utilization]
Who Uses Hadoop?
 Amazon/A9
 AOL
 Facebook
 Fox Interactive Media
 Google / IBM
 New York Times
 PowerSet (now Microsoft)
 Quantcast
 Rackspace/Mailtrust
 Veoh
 Yahoo!
 More at http://wiki.apache.org/hadoop/PoweredBy
Q&A
 For
more information:
 Website:
http://hadoop.apache.org/core
 Mailing lists:
 [email protected][email protected]
QUESTIONS?
64