An Overview of Cloud Computing @ Yahoo!
Raghu Ramakrishnan
Chief Scientist, Audience and Cloud Computing
Research Fellow, Yahoo! Research
Reflects many discussions with:
Eric Baldeschwieler, Jay Kistler, Chuck Neerdaels, Shelton Shugar, and Raymie Stata
and joint work with the Sherpa team, in particular:
Brian Cooper, Utkarsh Srivastava, Adam Silberstein, Rodrigo Fonseca and Nick Puz in Y! Research
Chuck Neerdaels, P.P. Suryanarayanan and many others in CCDI
Questions
• What is cloud computing?
– Horizontal and functional services
• What’s it going to change?
– Software business models, science, life
• How many clouds will there be?
– 1, 2, 3, infinity
• What’s new in cloud computing?
– HPC grids, ASPs, hosted services, Multics (!)
– Emerging “cloud stack” to support a broad class of
programs, including data intensive applications
Pie-in-the-sky
SCENARIOS
Living in the Clouds
• We want to start a new website, FredsList.com
• Our site will provide listings of items for sale, jobs,
etc.
• As time goes on, we’ll add more features
– And illustrate how more cloud capabilities (and
corresponding infrastructure components) are used as
needed
• List of capabilities/components is illustrative, not exhaustive
• Our cloud provides a “dataset” abstraction
– FredsList doesn’t worry about the underlying
components
Step 1: Listings Scenario
FredsList wants to store listings as (key, category, description)
FredsList.com application
(1234323, transportation, For sale: one bicycle, barely used)
(5523442, childcare, Nanny available in San Jose)
(215534, wanted, Looking for issue 1 of Superman comic book)
DECLARE DATASET Listings AS
( ID String PRIMARY KEY,
Category String,
Description Text )
Simple Web Service APIs
PNUTS (Database)
Step 2: System Evolution
Fred belatedly realizes prices are useful information!
FredsList.com application
(1234323, transportation, For sale: one bicycle, barely used)
(5523442, childcare, Nanny available in San Jose)
(215534, wanted, Looking for issue 1 of Superman comic book)
(32138, camera, Nikon D40, USD 300)
ALTER DATASET Listings
ADD (Price Float)
Simple Web Service APIs
vs.
PNUTS
Database
Not every record in a dataset has values defined for all fields declared for the dataset
Schemas are flexible, and evolve
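The flexible-schema behaviour in Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Dataset` class and its `add_field`, `put`, and `get` methods are hypothetical names, not the PNUTS API, and records are plain dictionaries held in memory.

```python
class Dataset:
    """Toy stand-in for a cloud "dataset" with a declared, evolvable schema."""

    def __init__(self, name, fields, primary_key):
        self.name = name
        self.fields = set(fields)
        self.primary_key = primary_key
        self.records = {}

    def add_field(self, field):
        # Analogue of ALTER DATASET ... ADD: existing records are left untouched.
        self.fields.add(field)

    def put(self, record):
        # Records may omit declared fields, but may not use undeclared ones.
        unknown = set(record) - self.fields
        if unknown:
            raise ValueError(f"undeclared fields: {sorted(unknown)}")
        self.records[record[self.primary_key]] = dict(record)

    def get(self, key):
        return self.records.get(key)


listings = Dataset("Listings", ["ID", "Category", "Description"], primary_key="ID")
listings.put({"ID": "1234323", "Category": "transportation",
              "Description": "For sale: one bicycle, barely used"})
listings.add_field("Price")
listings.put({"ID": "32138", "Category": "camera",
              "Description": "Nikon D40", "Price": 300.0})
```

The older bicycle listing simply has no `Price` value after the schema change, which is the point of the note on this slide.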
Step 3: Search
FredsList’s customers quickly ask for keyword search
FredsList.com application
“dvd’s”, “bicycle”, “nanny”
ALTER Listings
SET Description SEARCHABLE
Simple Web Service APIs
PNUTS (Database), Vespa (Search), Tribble (Messaging)
Federation of systems offering different capabilities
Step 4: Photos
FredsList decides to add photos/videos to listings
FredsList.com application
ALTER Listings
ADD Photo BLOB
Simple Web Service APIs
PNUTS (Database), MObStor (Storage), Vespa (Search), Tribble (Messaging)
Foreign key: photo → listing
Federation of systems offering different performance points
Step 5: Data Analysis
FredsList wants to analyze its listings to get statistics about category, do geocoding, etc.
FredsList.com application
• Pig query to analyze categories
• Hadoop program to generate fancy pages for listings
• Hadoop program to geocode data
ALTER Listings
MAKE ANALYZABLE
Simple Web Service APIs
Grid (Compute), PNUTS (Database), MObStor (Storage), Vespa (Search), Tribble (Messaging)
Foreign key: photo → listing
Batch export
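The category statistics in this step amount to a group-and-count over exported listings. The sketch below shows that map/reduce shape in plain Python over in-memory sample records; it is not Pig or Hadoop code, and the function names are illustrative.

```python
from collections import defaultdict

listings = [
    {"ID": "1234323", "Category": "transportation"},
    {"ID": "5523442", "Category": "childcare"},
    {"ID": "215534", "Category": "wanted"},
    {"ID": "32138", "Category": "camera"},
]


def map_phase(records):
    # Emit one (category, 1) pair per listing.
    for record in records:
        yield record["Category"], 1


def reduce_phase(pairs):
    # Sum the counts for each category.
    totals = defaultdict(int)
    for category, count in pairs:
        totals[category] += count
    return dict(totals)


category_counts = reduce_phase(map_phase(listings))
```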
Step 6: Performance
FredsList wants to reduce its data access latency
And by now, Fred is global, and wants geo-replication!
FredsList.com application
ALTER Listings
MAKE CACHEABLE
Simple Web Service APIs
Grid (Compute), PNUTS (Database), MObStor (Storage), Vespa (Search), memcached (Caching), Tribble (Messaging)
Foreign key: photo → listing
Batch export
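One common way a cache reduces read latency is the cache-aside pattern, sketched below. The slides do not say how MAKE CACHEABLE is implemented, so this is an assumption for illustration: plain dictionaries stand in for both the database and memcached, and `CachedStore` is a hypothetical name.

```python
class CachedStore:
    """Cache-aside reads over a backing store; both are plain dicts here."""

    def __init__(self, backing):
        self.backing = backing  # stands in for the database
        self.cache = {}         # stands in for memcached
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def set(self, key, value):
        # Write to the store, then invalidate so the next read refetches.
        self.backing[key] = value
        self.cache.pop(key, None)


store = CachedStore({"32138": "Nikon D40, USD 300"})
first = store.get("32138")   # miss: read from the store and fill the cache
second = store.get("32138")  # hit: served from the cache
```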
Data Serving vs. Analysis
• Very different workloads, requirements
• Data from serving system is one of many
kinds of data (click streams are another
common kind, as are syndicated feeds)
to be analyzed and integrated
• The result of analysis often goes right
back into serving system
Motherhood-and-Apple-Pie
EYES TO THE SKIES
Why Clouds?
• On-demand infrastructure to
create a fundamental shift
in the OE curve:
– Do things we can’t do
– Build more robustly, more
efficiently, more globally,
more completely, more
quickly, for a given budget
• Cloud services should do the heavy lifting
of scaling & high-availability
– Today, this is done at the app level, which is not productive
Requirements for Cloud Services
• Multitenant. A cloud service must support multiple, organizationally
distant customers.
• Elasticity. Tenants should be able to negotiate and receive
resources/QoS on-demand.
• Resource Sharing. Ideally, spare cloud resources should be
transparently applied when a tenant’s negotiated QoS is insufficient, e.g.,
due to spikes.
• Horizontal scaling. It should be possible to add cloud capacity in small
increments; this should be transparent to the tenants of the service.
• Metering. A cloud service must support accounting that reasonably
ascribes operational and capital expenditures to each of the tenants of the
service.
• Security. A cloud service should be secure in that tenants are not made
vulnerable because of loopholes in the cloud.
• Availability. A cloud service should be highly available.
• Operability. A cloud service should be easy to operate, with few
operators. Operating costs should scale linearly or better with the capacity
of the service.
Types of Cloud Services
• Two kinds of cloud services:
– Horizontal (“Platform”) Cloud Services
• Functionality enabling tenants to build applications or new
services on top of the cloud
– Functional Cloud Services
• Functionality that is useful in and of itself to tenants. E.g.,
various SaaS instances, such as Salesforce.com; Google
Analytics and Yahoo!’s IndexTools; Yahoo! properties aimed
at end-users and small businesses, e.g., flickr, Groups, Mail,
News, Shopping
• Could be built on top of horizontal cloud services or from
scratch
• Yahoo! has been offering these for a long while (e.g., Mail for
SMB, Groups, Flickr, BOSS, Ad exchanges)
Opening Up Yahoo! Search
Phase 1
Giving site owners and developers
control over the appearance of Yahoo!
Search results.
Phase 2
BOSS takes Yahoo!’s open strategy to
the next level by providing Yahoo!
Search infrastructure and technology to
developers and companies to help them
build their own search experiences.
BOSS Offerings
BOSS offers two options for companies and developers and has partnered with top technology
universities to drive search experimentation, innovation and research into next generation
search.
API
A self-service, web services model for
developers and start-ups to quickly
build and deploy new search
experiences.
CUSTOM
Working with 3rd parties to build a
more relevant, brand/site specific
web search experience.
This option is jointly built by Yahoo!
and select partners.
ACADEMIC
Working with the following
universities to allow for wide-scale
research in the search field:
• University of Illinois Urbana Champaign
• Carnegie Mellon University
• Stanford University
• Purdue University
• MIT
• Indian Institute of Technology Bombay
• University of Massachusetts
(Slide courtesy Prabhakar Raghavan)
Partner Examples
Horizontal Cloud Services
• Horizontal cloud services are foundations on which
tenants build applications or new services. They should
be:
– Semantics-free. Must be “generic infrastructure,” and not tied to
specific app-logic.
• May provide the ability to inject application logic through well-defined
APIs
– Broadly applicable. Must be broadly applicable (i.e., it can't be
intended for just one or two properties).
– Fault-tolerant over commodity hardware. Must be built using
inexpensive commodity hardware, and should mask component
failures.
• While each cloud service provides value, the power of the
cloud paradigm will depend on a collection of well-chosen,
loosely coupled services that collectively make it easy to
quickly develop and operate innovative web applications.
Yahoo! Cloud Stack
[Diagram: the Yahoo! cloud stack. Layers: EDGE, WEB, APP, STORAGE, BATCH, each labeled “Horizontal Cloud Services”. Components shown: YCS, YCPI, Brooklyn, VM/OS, yApache, PHP, App Engine, Serving Grid, Data Highway, Sherpa, MOBStor, Hadoop. Spanning all layers: Provisioning (Self-serve) and Monitoring/Metering/Security.]
Yahoo! CCDI Thrust Areas
• Fast Provisioning and Machine Virtualization: On
demand, deliver a set of hosts imaged with desired
software and configured against standard services
– Multiple hosts may be multiplexed onto the same physical
machine.
• Batch Storage and Processing: Scalable data storage
optimized for batch processing, together with
computational capabilities
• Operational Storage: Persistent storage that supports
low-latency updates and flexible retrieval
• Edge Content Services: Support for dealing with
network topology, communication protocols, caching, and
BCP
Rest of today’s talk
Web Data Management
Large data analysis (Hadoop)
• Scan oriented workloads
• Focus on sequential disk I/O
• $ per cpu cycle
Structured record storage (PNUTS/Sherpa)
• CRUD
• Point lookups and short scans
• Index organized table and random I/Os
• $ per latency
Blob storage (SAN/NAS)
• Object retrieval and streaming
• Scalable file storage
• $ per GB
Hadoop: Batch Storage/Analysis
Why is batch processing important?
[Workflow]
High-level query layer (Pig)
Map-Reduce
HDFS
• Whether it’s
– response-prediction for advertising,
– machine-learned relevance for Search, or
– content optimization for audience,
data-intensive computing is increasingly central to everything Yahoo! does
– Hadoop is central to addressing this need
• Hadoop is a case-study in our cloud vision
– Processes enormous amounts of data
– Provides horizontal scaling and fault-tolerance for our users
– Allows those users to focus on their app logic
The World Has Changed
• Web serving applications need:
– Scalability!
• Preferably elastic
– Flexible schemas
– Geographic distribution
– High availability
– Reliable storage
• Web serving applications can do without:
– Complicated queries
– Strong transactions
MObStor
• Yahoo!’s next-generation globally replicated, virtualized
media object storage service
• Better provisioning, easy migration, replication, better
BCP, and performance
• New features (Evergreen URLs, CDN integration, REST
API, …)
• The object metadata problem addressed using Sherpa,
though MObStor is focused on blob storage.
Storage & Delivery Stack
PNUTS / SHERPA
To Help You Scale Your Mountains of Data
CCDI—Research Collaboration
Yahoo! Research
• Raghu Ramakrishnan
• Brian Cooper
• Utkarsh Srivastava
• Adam Silberstein
• Rodrigo Fonseca
CCDI
• Chuck Neerdaels
• P.P.S. Narayan
• Kevin Athey
• Toby Negrin
• Plus Dev/QA teams
Yahoo! Serving Storage Problem
– Small records – 100KB or less
– Structured records – lots of fields, evolving
– Extreme data scale - Tens of TB
– Extreme request scale - Tens of thousands of requests/sec
– Low latency globally - 20+ datacenters worldwide
– High Availability - outages cost $millions
– Variable usage patterns - as applications and users change
What is PNUTS/Sherpa?
Parallel database
A  42342  E
B  42521  W
C  66354  W
D  12352  E
E  75656  C
F  15677  E
Structured, flexible schema
CREATE TABLE Parts (
ID VARCHAR,
StockNumber INT,
Status VARCHAR
…
)
Geographic replication (the figure shows the table above replicated three times)
Hosted, managed infrastructure
What Will It Become?
[Figure: the replicated table, shown three times, now with indexes and views]
A  42342  E
B  42521  W
C  66354  W
D  12352  E
E  75656  C
F  15677  E
Indexes and views
Design Goals
Scalability
• Thousands of machines
• Easy to add capacity
• Restrict query language to avoid costly queries
Consistency
• Per-record guarantees
• Timeline model
• Option to relax if needed
Geographic replication
• Asynchronous replication around the globe
• Low-latency local access
Multiple access paths
• Hash table, ordered table
• Primary, secondary access
High availability and fault tolerance
• Automatically recover from failures
• Serve reads and writes despite failures
Hosted service
• Applications plug and play
• Share operational cost
Technology Elements
Applications
Tabular API
PNUTS API
YCA: Authorization
PNUTS
• Query planning and execution
• Index maintenance
Distributed infrastructure for tabular data
• Data partitioning
• Update consistency
• Replication
YDOT FS
• Ordered tables
YDHT FS
• Hash tables
Tribble
• Pub/sub messaging
Zookeeper
• Consistency service
Data Manipulation
• Per-record operations
– Get
– Set
– Delete
• Multi-record operations
– Multiget
– Scan
– Getrange
• Web service (RESTful) API
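To make the operation list concrete, here is a minimal in-memory sketch of the six calls. It is not the RESTful PNUTS API: the `RecordStore` class, its method signatures, and the half-open range semantics of `getrange` are assumptions chosen for illustration.

```python
class RecordStore:
    """In-memory stand-in exposing the per-record and multi-record calls."""

    def __init__(self):
        self._data = {}

    # Per-record operations
    def get(self, key):
        return self._data.get(key)

    def set(self, key, record):
        self._data[key] = record

    def delete(self, key):
        self._data.pop(key, None)

    # Multi-record operations
    def multiget(self, keys):
        # Keys that do not exist are simply absent from the result.
        return {k: self._data[k] for k in keys if k in self._data}

    def scan(self):
        # Full scan in key order.
        return sorted(self._data.items())

    def getrange(self, start, end):
        # Assumed semantics: records with start <= key < end, in key order.
        return [(k, v) for k, v in sorted(self._data.items()) if start <= k < end]


store = RecordStore()
for name, price in [("Apple", 1), ("Grape", 12), ("Lime", 9), ("Tomato", 14)]:
    store.set(name, {"Price": price})
store.delete("Tomato")
```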
Tablets—Hash Table
Name        Description                           Price
Grape       Grapes are good to eat                $12
Lime        Limes are green                       $9
Apple       Apple is wisdom                       $1
Strawberry  Strawberry shortcake                  $900
Orange      Arrgh! Don’t get scurvy!              $2
Avocado     But at what price?                    $3
Lemon       How much did you pay for this lemon?  $1
Tomato      Is this a vegetable?                  $14
Banana      The perfect fruit                     $2
Kiwi        New Zealand                           $8
Tablet boundaries in hash space: 0x0000, 0x2AF3, 0x911F, 0xFFFF
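Locating the tablet for a key in a hash-organized table can be sketched as a binary search over the boundary values shown on this slide. The hash function below (the first two bytes of an MD5 digest) is an assumption for illustration only; the slides do not specify the real one.

```python
import bisect
import hashlib

# Boundary values from the slide; the hash space runs from 0x0000 to 0xFFFF.
BOUNDARIES = [0x0000, 0x2AF3, 0x911F, 0xFFFF]


def hash16(key):
    # Assumption: a 16-bit hash taken from MD5; the real hash function is not given.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def tablet_for_hash(h):
    # Tablet i covers BOUNDARIES[i] <= h < BOUNDARIES[i + 1];
    # the last tablet also takes the top value 0xFFFF.
    index = bisect.bisect_right(BOUNDARIES, h) - 1
    return min(index, len(BOUNDARIES) - 2)


def tablet_for_key(key):
    return tablet_for_hash(hash16(key))
```

Because neighbouring keys hash to unrelated values, this layout spreads load evenly but cannot serve ordered range scans, which is what the ordered table on the next slide is for.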
Tablets—Ordered Table
Name        Description                           Price
A
Apple       Apple is wisdom                       $1
Avocado     But at what price?                    $3
Banana      The perfect fruit                     $2
Grape       Grapes are good to eat                $12
H
Kiwi        New Zealand                           $8
Lemon       How much did you pay for this lemon?  $1
Lime        Limes are green                       $9
Orange      Arrgh! Don’t get scurvy!              $2
Q
Strawberry  Strawberry shortcake                  $900
Tomato      Is this a vegetable?                  $14
Z
Flexible Schema
Posted date  Listing id  Item   Price
6/1/07       424252      Couch  $570
6/1/07       763245      Bike   $86
6/3/07       211242      Car    $1123
6/5/07       421133      Lamp   $15
Additional columns Color and Condition are filled in for only some records (values shown: Good, Red, Fair)
Detailed Architecture
Remote regions
Local region
Clients
REST API
Routers
Tribble
Tablet Controller
Storage units
Tablet Splitting and Balancing
• Each storage unit has many tablets (horizontal partitions of the table)
• Tablets may grow over time; overfull tablets split
• A storage unit may become a hotspot; shed load by moving tablets to other servers
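Splitting and load shedding can be sketched as two small functions. The threshold, the midpoint split, and the choice of which tablet to move are all assumptions made for illustration; the slides do not describe the actual policy.

```python
MAX_RECORDS = 4  # hypothetical split threshold


def split_if_overfull(tablet):
    """Split a sorted list of keys at its midpoint when it exceeds the threshold."""
    if len(tablet) <= MAX_RECORDS:
        return [tablet]
    mid = len(tablet) // 2
    return [tablet[:mid], tablet[mid:]]


def shed_load(units):
    """Move one tablet from the most loaded storage unit to the least loaded one."""
    def load(name):
        return sum(len(tablet) for tablet in units[name])

    busiest = max(units, key=load)
    idlest = min(units, key=load)
    if busiest != idlest and len(units[busiest]) > 1:
        units[idlest].append(units[busiest].pop())
    return units


units = {
    "su1": [["Apple", "Avocado", "Banana", "Grape", "Kiwi", "Lemon"]],
    "su2": [["Lime"]],
}
units["su1"] = [part for t in units["su1"] for part in split_if_overfull(t)]
units = shed_load(units)
```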
QUERY PROCESSING
Accessing Data
1. Client to router: Get key k
2. Router to storage unit (SU): Get key k
3. SU to router: Record for key k
4. Router to client: Record for key k
Bulk Read
1. Client sends {k1, k2, … kn} to the scatter/gather server
2. Scatter/gather server issues Get k1, Get k2, Get k3 to the SUs that hold them
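The scatter/gather step can be sketched as grouping the requested keys by owning storage unit, fetching each group in parallel, and merging the answers. The storage units, the `locate` lookup, and the `fetch` call below are hypothetical stand-ins, not the real components.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical storage units, each holding a slice of the table.
STORAGE_UNITS = {
    "su1": {"k1": "record-1"},
    "su2": {"k2": "record-2"},
    "su3": {"k3": "record-3"},
}


def locate(key):
    # Assumption: a simple directory lookup; a real router would use the tablet map.
    for name, data in STORAGE_UNITS.items():
        if key in data:
            return name
    return None


def fetch(unit, keys):
    # One batched request per storage unit.
    return {k: STORAGE_UNITS[unit][k] for k in keys}


def bulk_read(keys):
    # Scatter: group the keys by the unit that owns them.
    by_unit = defaultdict(list)
    for key in keys:
        unit = locate(key)
        if unit is not None:
            by_unit[unit].append(key)
    # Gather: issue the per-unit requests in parallel and merge the results.
    results = {}
    with ThreadPoolExecutor() as pool:
        for part in pool.map(lambda item: fetch(*item), by_unit.items()):
            results.update(part)
    return results
```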
Range Queries in YDOT
• Clustered, ordered retrieval of records
Router table (first key of each tablet → storage unit):
Apple → Storage unit 1
Cantaloupe → Storage unit 3
Lime → Storage unit 2
Strawberry → Storage unit 1
Storage unit 1: Apple, Avocado, Banana, Blueberry; Strawberry, Tomato, Watermelon
Storage unit 2: Lime, Mango, Orange
Storage unit 3: Cantaloupe, Grape, Kiwi, Lemon
The query “Grapefruit…Pear?” is split by the router into “Grapefruit…Lime?” and “Lime…Pear?”
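The router's job for a range query can be sketched as an interval lookup over the sorted tablet start keys, using the example on this slide. The function name and the half-open `[low, high)` convention are assumptions; only the key-to-unit mapping comes from the slide.

```python
import bisect

# Interval map from the slide: each tablet starts at this key and runs to the next one.
TABLET_STARTS = ["Apple", "Cantaloupe", "Lime", "Strawberry"]
TABLET_UNITS = ["Storage unit 1", "Storage unit 3", "Storage unit 2", "Storage unit 1"]


def route_range(low, high):
    """Split the key range [low, high) into per-tablet sub-ranges."""
    first = max(bisect.bisect_right(TABLET_STARTS, low) - 1, 0)
    pieces = []
    for i in range(first, len(TABLET_STARTS)):
        start = TABLET_STARTS[i]
        if start >= high:
            break
        end = TABLET_STARTS[i + 1] if i + 1 < len(TABLET_STARTS) else None
        piece_low = max(low, start)
        piece_high = high if end is None else min(high, end)
        pieces.append((TABLET_UNITS[i], piece_low, piece_high))
    return pieces


pieces = route_range("Grapefruit", "Pear")
```

For the slide's query this yields one sub-range on storage unit 3 (Grapefruit up to Lime) and one on storage unit 2 (Lime up to Pear), matching the split shown in the figure.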
Updates
[Diagram: an update flows in eight numbered steps among the client, routers, storage units (SU), and message brokers. Messages shown: “Write key k” (four times), “SUCCESS”, and “Sequence # for key k” (twice).]
ASYNCHRONOUS REPLICATION AND CONSISTENCY
Asynchronous Replication
Consistency Model
• Goal: Make it easier for applications to reason about updates
and cope with asynchrony
• What happens to a record with primary key “Alice”?
Timeline (Generation 1): Record inserted (v. 1), then a series of Updates (v. 2 through v. 8), then Delete
As the record is updated, copies may get out of sync.
Example: Social Alice
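[Figure: East and West replicas of a (User, Status) record for Alice, shown at successive moments]
• Initially: Alice, ___
• After an update: Alice, Busy
• The replicas then disagree: one shows Alice, Busy while the other shows Alice, Free
• What should each replica show next? Alice, ??? / Alice, ???
Record Timeline: ___, Busy, Free, Free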
Consistency Model
Read
Timeline (Generation 1): v. 1 through v. 8 over time; v. 8 is the current version, earlier versions are stale
In general, reads are served using a local copy
Consistency Model
Read up-to-date
Timeline (Generation 1): v. 1 through v. 8 over time; v. 8 is the current version, earlier versions are stale
But application can request and get current version
Consistency Model
Read ≥ v.6
Timeline (Generation 1): v. 1 through v. 8 over time; v. 8 is the current version, earlier versions are stale
Or variations such as “read forward”—while copies may lag the
master record, every copy goes through the same sequence of changes
Consistency Model
Write
Timeline (Generation 1): v. 1 through v. 8 over time; v. 8 is the current version, earlier versions are stale
Achieved via per-record primary copy protocol
(To maximize availability, record masterships are automatically
transferred if a site fails)
Can be selectively weakened to eventual consistency
(local writes that are reconciled using version vectors)
Consistency Model
Write if = v.7: ERROR
Timeline (Generation 1): v. 1 through v. 8 over time; v. 8 is the current version, earlier versions are stale
Test-and-set writes facilitate per-record transactions
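The read and write variants on these consistency slides can be sketched against a single record with one master timeline and one lagging local copy. The class and method names below are illustrative, not the actual API, and asynchronous replication is modeled as an explicit `replicate` call.

```python
class VersionMismatch(Exception):
    pass


class TimelineRecord:
    """One record with a master copy and a lagging local replica."""

    def __init__(self, value):
        self.master = [(1, value)]  # full timeline of (version, value)
        self.replica_version = 1    # how far the local copy has caught up

    def _at(self, version):
        return self.master[version - 1]

    def current_version(self):
        return self.master[-1][0]

    def write(self, value):
        # All writes go through the master copy, so versions form one timeline.
        version = self.current_version() + 1
        self.master.append((version, value))
        return version

    def test_and_set_write(self, required_version, value):
        # "Write if = v.N": fail unless the current version is exactly N.
        if self.current_version() != required_version:
            raise VersionMismatch(f"current version is {self.current_version()}")
        return self.write(value)

    def replicate(self, upto):
        # The local copy only moves forward, and never past the master.
        self.replica_version = min(max(self.replica_version, upto),
                                   self.current_version())

    def read_any(self):
        # Served from the local copy; may be stale.
        return self._at(self.replica_version)

    def read_latest(self):
        return self.master[-1]

    def read_at_least(self, min_version):
        # "Read >= v.N": use the local copy if fresh enough, else the master.
        if self.replica_version >= min_version:
            return self._at(self.replica_version)
        if self.current_version() < min_version:
            raise VersionMismatch(f"no copy has reached v.{min_version}")
        return self.master[-1]


alice = TimelineRecord("v1-data")
for n in range(2, 9):
    alice.write(f"v{n}-data")  # creates versions 2 through 8
alice.replicate(upto=5)        # the local copy lags at version 5
```

With the master at v. 8, a test-and-set write that expects v. 7 fails, which is the ERROR case shown on this slide.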
Consistency Techniques
• Per-record mastering
– Each record is assigned a “master region”
• May differ between records
– Updates to the record forwarded to the master region
– Ensures consistent ordering of updates
• Tablet-level mastering
– Each tablet is assigned a “master region”
– Inserts and deletes of records forwarded to the master region
– Master region decides tablet splits
• These details are hidden from the application
– Except for the latency impact!
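Per-record mastering can be sketched as follows: each key has one master region that assigns sequence numbers, and a write arriving elsewhere is forwarded there. Two simplifications are assumptions of this sketch, not statements about the real system: the first region to write a key becomes its master, and replicas are updated immediately rather than asynchronously.

```python
class Region:
    def __init__(self, name):
        self.name = name
        self.copies = {}    # key -> (sequence number, value)
        self.forwarded = 0  # writes this region had to forward elsewhere


class Cluster:
    """Each key has one master region that orders all of its updates."""

    def __init__(self, region_names):
        self.regions = {n: Region(n) for n in region_names}
        self.master_of = {}  # key -> name of the master region

    def write(self, origin, key, value):
        # Assumption: the first region to write a key becomes its master.
        master = self.master_of.setdefault(key, origin)
        if master != origin:
            # Forwarding costs a cross-region hop: the latency impact.
            self.regions[origin].forwarded += 1
        seq = self.regions[master].copies.get(key, (0, None))[0] + 1
        # The master assigns the sequence number; every copy applies it in order.
        for region in self.regions.values():
            region.copies[key] = (seq, value)
        return seq


cluster = Cluster(["east", "west"])
first_seq = cluster.write("east", "alice", "Busy")
second_seq = cluster.write("west", "alice", "Free")
```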
Mastering
[Figure: three replicas of the same table; each row carries a letter E, W, or C, and one replica is labeled “Tablet master”]
A  42342  E
B  42521  W
C  66354  W
D  12352  E
E  75656  C
F  15677  E
Bulk Insert/Update/Replace
Client
Source Data
Bulk manager
1. Client feeds records to bulk manager
2. Bulk loader transfers records to SUs in batches
• Bypass routers and message brokers
• Efficient import into storage unit
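The batching step can be sketched as grouping incoming records by destination storage unit and cutting each group into fixed-size batches. The placement rule and the batch size are hypothetical; the slides do not say how the bulk loader chooses them.

```python
from collections import defaultdict


def batches_per_unit(records, unit_for_key, batch_size):
    """Group (key, value) records by storage unit and cut each group into batches."""
    grouped = defaultdict(list)
    for key, value in records:
        grouped[unit_for_key(key)].append((key, value))
    plan = {}
    for unit, items in grouped.items():
        plan[unit] = [items[i:i + batch_size]
                      for i in range(0, len(items), batch_size)]
    return plan


def unit_for_key(key):
    # Hypothetical placement rule: keys before "M" live on su1, the rest on su2.
    return "su1" if key < "M" else "su2"


records = [("Apple", 1), ("Banana", 2), ("Grape", 12), ("Orange", 2), ("Tomato", 14)]
plan = batches_per_unit(records, unit_for_key, batch_size=2)
```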
Bulk Load in YDOT
• YDOT bulk inserts can cause performance
hotspots
• Solution: preallocate tablets
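One simple way to preallocate tablets is to pick split keys from a sample of the incoming data so that each tablet receives a similar share. The sketch below uses evenly spaced positions in the sorted sample; this is an illustrative assumption and not the planning algorithm from the bulk insertion paper listed under Further Reading.

```python
def plan_splits(sample_keys, num_tablets):
    """Choose num_tablets - 1 split keys that divide the sorted sample evenly."""
    keys = sorted(set(sample_keys))
    if num_tablets < 2 or len(keys) < num_tablets:
        return []
    step = len(keys) / num_tablets
    return [keys[int(i * step)] for i in range(1, num_tablets)]


incoming = ["Apple", "Avocado", "Banana", "Grape", "Kiwi", "Lemon",
            "Lime", "Orange", "Strawberry", "Tomato"]
splits = plan_splits(incoming, 3)
```

Creating the tablets at these split keys before the load starts spreads the inserts over several storage units instead of funnelling them into one.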
Index Maintenance
• How to have lots of interesting indexes
and views, without killing performance?
• Solution: Asynchrony!
– Indexes/views updated asynchronously when
base table updated
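Asynchronous index maintenance can be sketched with a queue: a write updates the base table and returns, and a separate step applies the queued changes to the index later. The `Table` class below is a hypothetical illustration, and the in-memory queue stands in for whatever delivers updates in the real system.

```python
from collections import deque


class Table:
    """Base table whose secondary index is brought up to date by a deferred step."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.rows = {}
        self.index = {}         # field value -> set of primary keys
        self.pending = deque()  # queued index updates

    def put(self, key, row):
        old = self.rows.get(key)
        self.rows[key] = row
        # The write returns immediately; index work is only enqueued.
        self.pending.append((key, old, row))

    def apply_pending(self):
        # In a real system this would run in the background.
        while self.pending:
            key, old, new = self.pending.popleft()
            if old is not None:
                self.index.get(old.get(self.indexed_field), set()).discard(key)
            self.index.setdefault(new.get(self.indexed_field), set()).add(key)

    def lookup(self, value):
        return sorted(self.index.get(value, set()))


table = Table("Category")
table.put("32138", {"Category": "camera"})
before = table.lookup("camera")  # the index has not caught up yet
table.apply_pending()
after = table.lookup("camera")
```

The gap between `before` and `after` is the price of asynchrony: index reads can briefly lag the base table.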
SHERPA IN CONTEXT
Types of Record Stores
• Query expressiveness (from simple to feature rich)
– S3: Object retrieval
– PNUTS: Retrieval from single table of objects/records
– Oracle: SQL
Types of Record Stores
• Consistency model (from best effort to strong guarantees)
– S3: Eventual consistency
– PNUTS: Timeline consistency
– Oracle: ACID
– The scale runs from object-centric consistency to program-centric consistency
Types of Record Stores
• Data model
– PNUTS, CouchDB: Flexibility, schema evolution; object-centric consistency
– Oracle: Optimized for fixed schemas; consistency spans objects
Types of Record Stores
• Elasticity (ability to add resources on demand), from inelastic to elastic
– Oracle: Limited (via data distribution)
– PNUTS, S3: VLSD (Very Large Scale Distribution/Replication)
Data Stores Comparison
• User-partitioned SQL stores (Microsoft Azure SDS, Amazon SimpleDB)
– Versus PNUTS: more expressive queries; users must control partitioning; limited elasticity
• Multi-tenant application databases (Salesforce.com, Oracle on Demand)
– Versus PNUTS: highly optimized for complex workloads; limited flexibility to evolving applications; inherit limitations of underlying data management system
• Mutable object stores (Amazon S3)
– Versus PNUTS: object storage versus record management
Application Design Space
[Chart: systems placed along two axes, “Get a few things” versus “Scan everything” and “Records” versus “Files”. Systems shown: Sherpa, MySQL, Oracle, BigTable, Everest, MObStor, YMDB, Filer, Hadoop.]
Alternatives Matrix
[Matrix comparing Sherpa, Y! UDB, MySQL, Oracle, HDFS, BigTable, Dynamo, and Cassandra on: Elastic, Operability, Availability, Global low latency, Structured access, Updates, Consistency model, SQL/ACID]
Further Reading
Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008)
Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee,
Ramana Yerneni, Raghu Ramakrishnan
PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008)
Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava,
Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen,
Nick Puz, Daniel Weaver, Ramana Yerneni
Asynchronous View Maintenance for VLSD Databases (SIGMOD 2009, to appear)
Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava,
Raghu Ramakrishnan
Cloud Storage Design in a PNUTShell (Beautiful Data, O’Reilly Media, 2009, to appear)
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava
QUESTIONS?