Transcript Slide 1

Introduction to Cloud Computing
http://net.pku.edu.cn/~course/cs502/
彭波
[email protected]
School of Electronics Engineering and Computer Science, Peking University
5/25/2009
Outline


What is Cloud Computing?
Build a big cloud
Cloud Computing
What is Cloud Computing?
1. First write down your own opinion about “cloud computing”, whatever comes to mind.
2. Questions: What? Who? Why? How? Pros and cons?
3. The most important question is: what does it have to do with me?
Cloud Computing is…




No software
access everywhere by Internet
power -- Large-scale data processing
Appeal for startups




Cost efficiency
It is simply so convenient
Software as platform
Cons


Security
Data lock-in
SaaS
PaaS
Utility Computing
Software as a Service (SaaS)

a model of software deployment whereby a
provider licenses an application to customers for
use as a service on demand.
Platform as a Service (PaaS)

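For developing Web applications and services, PaaS provides a complete,
Internet-based integrated environment that covers development, testing,
deployment, operation, and maintenance. In particular, it is built on a
multitenant architecture from the start: users do not have to deal with
multi-user concurrency, because the platform handles it, including concurrency
management, scalability, failure recovery, and security.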
Utility Computing

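“Pay-as-you-go” is like plugging into a wall socket: the voltage you get is the
same as what Microsoft gets; you just use less and pay less. The goal of
utility computing is to give computing resources the same kind of service
capability: users can draw on the computing resources that a Fortune 500
company has, and simply use less, pay less. This is an important aspect of
cloud computing.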
Cloud Computing is…
Key Characteristics
Illusion of infinite computing resources available on demand
Elimination of an up-front commitment by Cloud users
  Startup launch costs
Ability to pay for use of computing resources on a short-term basis as needed
  Fine-grained billing; the report notes that utility computing failed in practice on this point
Very large datacenters
Large-scale software infrastructure
Operational expertise


Why now?


Experience with very large-scale datacenters,
driven by new technology trends and business models

pay-as-you-go computing
Key Players



Amazon Web Services
Google App Engine
Microsoft Windows Azure
Key Applications




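Mobile interactive applications. Tim O’Reilly believes the future belongs to
services that deliver information to users in real time. Mobile is bound to be
key. Running the back end in a datacenter is the natural model, especially for
mashup-style services.
Parallel batch processing. Using cloud computing for large-scale data
processing is natural; MapReduce and Hadoop play an important role here.
Moving data into and out of the cloud is a major cost, and Amazon has begun
to host large public datasets for free.
The rise of analytics. Transaction-based database applications are still
growing, while analytics applications are growing rapidly, driven strongly by
data mining, user behavior analysis, and similar applications.
Extension of compute-intensive desktop applications. For compute-intensive
tasks, even MATLAB and Mathematica now have cloud computing extensions. Woo~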
Cloud Computing = Silver Bullet?

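On March 7, Google Docs suffered an incident in which a large number of user
files were exposed. A US privacy advocacy group asked the government to take
action against Google so that it strengthens the security of its cloud
computing products.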

Problem of Data Lock-in
Challenges
Some other Voices
The interesting thing about Cloud Computing is that we’ve redefined
Cloud Computing to include everything that we already do. . . . I
don’t understand what we would do differently in the light of Cloud
Computing other than change the wording of some of our ads.
Larry Ellison, quoted in the Wall Street Journal, September 26, 2008
It’s stupidity. It’s worse than stupidity: it’s a marketing hype
campaign. Somebody is saying this is inevitable — and
whenever you hear somebody saying that, it’s very likely to be
a set of businesses campaigning to make it true.
Richard Stallman, quoted in The Guardian, September 29,
2008
What does it have to do with ME?!

What would you want to do with 1,000 PCs, or even
100,000 PCs?
Cloud is coming…
Build a big “Cloud”
Example: Wikipedia Anthropology
Kittur, Suh, Pendleton (UCLA, PARC), “He Says,
She Says: Conflict and Coordination in Wikipedia”
CHI, 2007
Increasing fraction of edits are for
work indirectly related to articles

Experiment



Download entire revision
history of Wikipedia
4.7 M pages, 58 M revisions,
800 GB
Analyze editing patterns &
trends

Computation

Hadoop on 20-machine
cluster
Example: Scene Completion
Hays, Efros (CMU), “Scene Completion Using
Millions of Photographs” SIGGRAPH, 2007

Image Database Grouped by
Semantic Content





30 different Flickr.com groups
2.3 M images total (396 GB).

Select Candidate Images Most
Suitable for Filling Hole



Classify images with gist scene
detector [Torralba]
Color similarity
Local context matching
Computation


Index images offline
50 min. scene matching, 20
min. local matching, 4 min.
compositing
Reduces to 5 minutes total by
using 5 machines
Extension

Flickr.com has over 500 million
images …
Example: Web Page Analysis
Fetterly, Manasse, Najork, Wiener (Microsoft, HP),
“A Large-Scale Study of the Evolution of Web
Pages,” Software-Practice & Experience, 2004

Experiment


Use web crawler to gather
151M HTML pages weekly
11 times
  Generated 1.2 TB log
  information
Analyze page statistics and
change frequencies

Systems Challenge
“Moreover, we experienced a
catastrophic disk failure
during the third crawl,
causing us to lose a quarter
of the logs of that crawl.”
Let’s build a big Computer…


Given datacenter with
tens of thousands of pcs,
can you make all these
tasks easier and run faster?
What are the key components of the
software infrastructure?
Distributed storage system
Distributed Computing Framework
Challenges

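Difficulties in large-scale data processing
  Scaling a large PC cluster reliably is hard!
    On 1000s of nodes, MTBF < 1 day
    With so many disks, nodes, switches, something is always broken
  Developing and debugging parallel/distributed programs is hard!
    How to partition the data
    How to schedule tasks
    Communication between tasks
    Error handling, fault tolerance…
Storage System & Computing Framework
  Good scalability
  Good fault tolerance
Programming Model
  Sufficient expressive power
  Simple and easy to use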


Cluster-Based Distributed File Systems



Observation: When dealing with very large data
collections, following a simple client-server approach is
not going to work.
Solution 1: For speeding up file accesses, apply striping
techniques by which files can be fetched in parallel:
(a) whole-file distribution, (b) file-striped system
A natural DFS design

File striping into chunks
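To make the striping idea concrete, here is a minimal sketch of how a byte offset in a file maps to a chunk. The 64 MB chunk size and the helper names `chunk_index` and `chunks_for_range` are assumptions for illustration, not part of any particular DFS.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # assumed chunk size in bytes (illustrative)


def chunk_index(offset: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Return the index of the chunk that holds the byte at `offset`."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // chunk_size


def chunks_for_range(offset: int, length: int, chunk_size: int = CHUNK_SIZE) -> list:
    """Return the indices of all chunks touched by reading `length` bytes at `offset`."""
    if length <= 0:
        return []
    first = chunk_index(offset, chunk_size)
    last = chunk_index(offset + length - 1, chunk_size)
    return list(range(first, last + 1))
```

A read that crosses a chunk boundary touches two chunks, which can then be fetched from different servers in parallel.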
Master of DFS

Functions
  Metadata management
    inode: file -> <filename, timestamp, size, owner, chunklist…>
  Runtime data management
    Chunkserver info: map(chunk, chunkserver)
    Client info: locks, open files, etc.
Issues
  Performance bottleneck?
  Master failure?
  Master recovery?
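The master's two tables can be sketched as plain in-memory structures. This is a minimal sketch under assumptions: the class and method names (`FileMeta`, `Master`, `locate`, `server_failed`) are hypothetical, and locks, open-file tracking, and the operation log are left out.

```python
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    """Per-file metadata kept by the master (the slide's "inode")."""
    filename: str
    timestamp: float
    size: int
    owner: str
    chunklist: list = field(default_factory=list)  # ordered chunk ids


class Master:
    def __init__(self):
        self.files = {}            # filename -> FileMeta
        self.chunk_locations = {}  # chunk id -> set of chunkserver names

    def create(self, filename, owner, timestamp=0.0):
        self.files[filename] = FileMeta(filename, timestamp, 0, owner)

    def add_chunk(self, filename, chunk_id, servers):
        self.files[filename].chunklist.append(chunk_id)
        self.chunk_locations[chunk_id] = set(servers)

    def locate(self, filename, chunk_number):
        """Return (chunk id, sorted server names) for the n-th chunk of a file."""
        chunk_id = self.files[filename].chunklist[chunk_number]
        return chunk_id, sorted(self.chunk_locations[chunk_id])

    def server_failed(self, server):
        """Forget a failed server; return chunk ids left with no replica."""
        lost = []
        for chunk_id, servers in self.chunk_locations.items():
            servers.discard(server)
            if not servers:
                lost.append(chunk_id)
        return lost
```

Because every lookup goes through these two dictionaries, the sketch also shows why the master is a potential bottleneck and a single point of failure.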
ChunkServer of DFS

Functions
  Manage chunk data: chunkid -> local file
Issues
  Performance bottleneck?
  Chunkserver failure -> data lost?
Review on DFS design

Workload
  Large data
  Mostly sequential reads and appends
Goal
  Reliability, availability, scalability…
  Tolerance to hardware failures
  Managing numerous files of large size
  Optimizing commonly performed operations
Strategies
  Chunk replication (fault tolerance and performance)
  Large chunk size (MB)
  All metadata in memory on Master, with operation log
Data Replications in DFS
[Figure: Master (/foo/bar.dat), Client, and six Chunkservers]
Data Mutations
Two kinds of data mutations are supported
  Random writes
  Record appends
Leases used to maintain consistent mutation order
[Figure: chunks A and B, each stored on two replicas]
Primary-based Consistency Protocol
What if a mutation operation fails in the middle?
[Figure: Client, Master (/foo/bar.dat), and Chunkservers acting as one primary replica and two secondary replicas]
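A toy model helps show what a mid-operation failure leaves behind. This is a minimal sketch, not the protocol of any real system: the `Replica` and `Primary` classes and the `fail_on` parameter are hypothetical. The primary picks one serial order for all mutations and forwards each one to the secondaries.

```python
class Replica:
    def __init__(self, name, fail_on=None):
        self.name = name
        self.data = []          # applied mutations, in serial order
        self.fail_on = fail_on  # hypothetical: serial number at which this replica fails

    def apply(self, serial, mutation):
        """Apply one mutation; return False to simulate a failure."""
        if serial == self.fail_on:
            return False
        self.data.append((serial, mutation))
        return True


class Primary(Replica):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def mutate(self, mutation):
        """Assign a serial number, apply locally, forward to all secondaries."""
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, mutation)
        results = [s.apply(serial, mutation) for s in self.secondaries]
        return all(results)  # False tells the client the mutation failed somewhere
```

When `mutate` returns False, some replicas hold the mutation and others do not, so the region differs between replicas until the client retries.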
Relaxed Consistency Model

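State of a file region after a mutation

Consistent
  All clients see the same data, regardless of which replica they read from
Defined
  Consistent, and all clients see everything the mutation wrote
Undefined
  Consistent, but may not reflect what any single mutation wrote
Inconsistent
  Clients see different data at different times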
Consistency Model (contd)



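Strict consistency is not provided [3]
Applications handle the inconsistent data regions that arise under this
relaxed consistency
Atomic append is provided, guaranteeing that each append happens at least once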
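At-least-once append means a retried append can leave the same record in the file twice, so readers have to cope with duplicates. Here is a minimal sketch of one way an application might do that, assuming each record carries a unique identifier chosen by the writer; the `(record_id, payload)` layout and the function name are assumptions for illustration.

```python
def read_unique_records(records):
    """Yield each record once, skipping duplicates left by retried appends.

    Each record is assumed to be a (record_id, payload) tuple, where
    record_id is a unique identifier chosen by the writer.
    """
    seen = set()
    for record_id, payload in records:
        if record_id in seen:
            continue  # duplicate produced by a retried append
        seen.add(record_id)
        yield record_id, payload
```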
Summary for DFS





Architecture: master-worker
File striping: large chunk size
Scalability & Availability: Chunk replication
Primary-based consistency protocol
Relaxed consistency model
Distributed Computing

How do we compute on a large cluster with reliable storage (DFS)?
  Programming
  Running
  Debugging
Example: Web Page Analysis
A simple solution

M: extract page length, merge data by domain
A possible solution


M: extract page length, merge data by domain
R: merge data by domain
A More difficult Problem

Count the number of occurrences of each word in a document collection?
Shuffle Implementation
Partition and Sort Group
Partition function: hash(key)%reducer number
Group function: sort by key
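The two functions above can be sketched in a few lines. This is a minimal single-process sketch; using CRC32 as the hash is an assumption made here so the partition is stable across runs (Python's built-in `hash` for strings is randomized per process), and the function names are illustrative.

```python
import zlib
from itertools import groupby


def partition(key: str, num_reducers: int) -> int:
    """hash(key) % reducer number, with CRC32 standing in as a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Route (key, value) pairs to reducers, then sort and group each bucket by key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    grouped = []
    for bucket in buckets:
        bucket.sort(key=lambda kv: kv[0])  # group function: sort by key
        grouped.append([(k, [v for _, v in group])
                        for k, group in groupby(bucket, key=lambda kv: kv[0])])
    return grouped
```

Every value for a given key ends up in exactly one reducer's list, which is the property the reduce phase relies on.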
A Distributed Computing Framework

Parallel/Distributed Computing Programming
Model
[Figure: MapReduce framework. Labels: input, split, shuffle (“I’m the shuffle”), output]
Typical problem solved by MapReduce


Read input: data in key/value record format
Map: extract something from each record
  map (in_key, in_value) -> list(out_key, intermediate_value)
    Processes an input key/value pair
    Emits intermediate key/value pairs
Shuffle: exchange data
  Brings intermediate results with the same key to the same node
Reduce: aggregate, summarize, filter, etc.
  reduce (out_key, list(intermediate_value)) -> list(out_value)
    Merges all values for one key and computes on them
    Emits the merged result (usually just one)
Write output
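The whole pipeline can be imitated in one process to see how the pieces fit. This is a minimal sketch, not a distributed implementation: `run_mapreduce` is a hypothetical helper, the reduce function returns a single value rather than a list, and the sample job (total page length per domain) uses made-up input.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, shuffle, and reduce sequentially in one process."""
    intermediate = defaultdict(list)
    for in_key, in_value in records:                  # map phase
        for out_key, value in map_fn(in_key, in_value):
            intermediate[out_key].append(value)       # shuffle: group values by key
    output = {}
    for out_key in sorted(intermediate):              # reduce phase
        output[out_key] = reduce_fn(out_key, intermediate[out_key])
    return output


# Illustrative job: total page length per domain.
def map_page_length(url, contents):
    domain = url.split("/")[2]  # "http://a.com/x" -> "a.com"
    yield domain, len(contents)


def reduce_sum(domain, lengths):
    return sum(lengths)
```

The user supplies only `map_page_length` and `reduce_sum`; everything else belongs to the framework.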
Mapreduce Framework
[Figure: input key/value pairs from data stores 1 to n are processed by map tasks, which emit (key, values) pairs for keys 1, 2, and 3. == Barrier == : aggregates intermediate values by output key. Each reduce task takes one key with its intermediate values and produces the final values for that key.]
Word Frequencies in Web pages


Input: one document per record
The user implements the map function, whose input is
  key = document URL
  value = document contents
map outputs (potentially many) key/value pairs
  For each word occurrence in the document, emit a record <word, “1”>
Example continued:


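The MapReduce runtime (library) collects all records with the same key
together (shuffle/sort)
The user implements the reduce function, which computes over the values for one key
  Sum the values
  Reduce outputs <key, sum>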
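The word-frequency job can be written out as a pair of small functions. This is a minimal sketch: the names `wc_map`, `wc_reduce`, and `word_count` are hypothetical, words are split on whitespace only, and the integer 1 stands in for the slide's “1” string. The grouping loop stands in for the framework's shuffle.

```python
from collections import defaultdict


def wc_map(url, contents):
    """Emit <word, 1> for every word occurrence in the document."""
    for word in contents.split():
        yield word, 1


def wc_reduce(word, counts):
    """Sum the counts collected for one word and emit <word, sum>."""
    return word, sum(counts)


def word_count(documents):
    """Stand-in for the runtime: group map output by key, then reduce."""
    groups = defaultdict(list)
    for url, contents in documents:
        for word, one in wc_map(url, contents):
            groups[word].append(one)
    return dict(wc_reduce(word, counts) for word, counts in groups.items())
```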
Model is Widely Applicable
MapReduce Programs In Google Source Tree
Example uses:
distributed grep
distributed sort
web link-graph reversal
term-vector / host
web access log stats
inverted index construction
document clustering
machine learning
statistical machine translation
...
...
...
Algorithms Fit in MapReduce

Algorithms with implementations reported in the literature





K-Means, EM, SVM, PCA, Linear Regression, Naïve
Bayes, Logistic Regression, Neural Network
PageRank
Word Co-occurrence Matrices,Pairwise Document
Similarity
Monte Carlo simulation
……
Capability of MapReduce
Could MapReduce become the main approach
for most parallel computing needs?

Parallel algorithms that are hard to implement efficiently in MapReduce [2]






Dense/Sparse Linear Algebra
N-Body Problems
Dynamic Programming
Graph Traversal
Combinational Logic
"The landscape of parallel computing
research: a view from Berkeley," 2006
...
Google MapReduce Architecture
Single Master node
Many worker bees
Many worker bees
MapReduce Operation
Initial data split
into 64MB blocks
Computed, results
locally stored
Master informed of
result locations
M sends data
location to R workers
Final output written
Fault Tolerance

Fault tolerance is achieved through re-execution
  Periodic heartbeats detect failures
  Re-execute both completed and in-progress map tasks on the failed node
    Why?
  Re-execute in-progress reduce tasks on the failed node
  Task completion committed through master
Robust: once lost 1600 of 1800 machines, but finished OK
Master failure?
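The re-execution rule can be written as a small selection function. In the design described in [4], map output is stored on the worker's local disk and is lost with the machine, while completed reduce output already sits in the global file system; that is why completed map tasks are rerun but completed reduce tasks are not. This is a minimal sketch, and the task record layout (dicts with `id`, `kind`, `worker`, `state`) is an assumption for illustration.

```python
def tasks_to_reexecute(tasks, failed_worker):
    """Select tasks to reschedule after `failed_worker` stops sending heartbeats.

    `tasks` is a list of dicts with keys: id, kind ("map" or "reduce"),
    worker, and state ("idle", "in_progress", or "completed").
    """
    rerun = []
    for task in tasks:
        if task["worker"] != failed_worker:
            continue
        if task["kind"] == "map" and task["state"] in ("in_progress", "completed"):
            rerun.append(task["id"])  # map output lived on the failed machine
        elif task["kind"] == "reduce" and task["state"] == "in_progress":
            rerun.append(task["id"])  # completed reduce output is already safe
    return rerun
```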
Refinement: Redundant Execution

Slow workers significantly delay completion time
  Other jobs consuming resources on machine
  Bad disks w/ soft errors transfer data slowly
Solution: Near end of phase, spawn backup tasks
  Whichever one finishes first "wins"
Dramatically shortens job completion time
Refinement: Locality Optimization

Master scheduling policy:
  Asks GFS for locations of replicas of input file blocks
  Map task inputs typically split into 64 MB pieces (GFS block size)
  Map tasks scheduled so GFS input block replicas are on same machine or same rack
Effect
  Thousands of machines read input at local disk speed
    Without this, rack switches limit read rate
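The scheduling preference can be sketched as a three-step search: same machine first, then same rack, then anywhere. This is a minimal sketch under assumptions; the function name `pick_worker` and the `rack_of` mapping are hypothetical, and a real scheduler would weigh more than locality.

```python
def pick_worker(replica_hosts, idle_workers, rack_of):
    """Pick an idle worker for a map task, preferring data locality.

    replica_hosts: hosts storing a replica of the task's input block
    idle_workers:  hosts currently free to run a task
    rack_of:       dict mapping host -> rack name
    """
    for worker in idle_workers:                 # 1. same machine as a replica
        if worker in replica_hosts:
            return worker
    replica_racks = {rack_of[host] for host in replica_hosts}
    for worker in idle_workers:                 # 2. same rack as a replica
        if rack_of[worker] in replica_racks:
            return worker
    return idle_workers[0] if idle_workers else None  # 3. any idle worker
```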
Refinement: Skipping Bad Records

Map/Reduce functions sometimes fail for particular inputs
  Best solution is to debug & fix
    Not always possible ~ third-party source libraries
  On segmentation fault:
    Send UDP packet to master from signal handler
    Include sequence number of record being processed
  If master sees two failures for same record:
    Next worker is told to skip the record
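The master's side of this mechanism is just a counter per record. This is a minimal sketch of that bookkeeping only; the class name `BadRecordTracker` and its methods are hypothetical, and the signal handler and UDP transport are left out.

```python
from collections import Counter


class BadRecordTracker:
    """Master-side bookkeeping for records that crash a map or reduce function."""

    def __init__(self, threshold=2):
        self.failures = Counter()   # record sequence number -> crash reports
        self.threshold = threshold  # two failures, as on the slide

    def report_failure(self, seq_no):
        """Called when a worker reports a crash while processing record seq_no."""
        self.failures[seq_no] += 1

    def should_skip(self, seq_no):
        """True once the same record has failed `threshold` times."""
        return self.failures[seq_no] >= self.threshold
```

Before re-executing a task, the master would consult `should_skip` and tell the next worker which records to leave out.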
Other Refinements


Compression of intermediate data
Combiner
  “Combiner” functions can run on same machine as a mapper
  Causes a mini-reduce phase to occur before the real reduce phase, to save bandwidth
Local execution for debugging/testing
User-defined counters
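The bandwidth saving from a combiner is easy to see on the word-count job. This is a minimal sketch with hypothetical function names; it compares what one mapper would send over the network with and without local pre-aggregation.

```python
from collections import Counter


def map_without_combiner(contents):
    """Plain map output: one <word, 1> pair per word occurrence."""
    return [(word, 1) for word in contents.split()]


def map_with_combiner(contents):
    """Map one document and pre-aggregate its <word, 1> pairs locally."""
    combined = Counter()
    for word in contents.split():
        combined[word] += 1  # mini-reduce on the mapper's machine
    return sorted(combined.items())
```

For the input "a b a a", the plain mapper emits four pairs and the combined one emits two. Summing is associative and commutative, so the final reduce result is unchanged.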
Summary

Cloud computing brings
  The possibility of using unlimited resources on demand, anytime and anywhere
  The possibility of constructing and deploying applications that automatically scale to tens of thousands of computers
  The possibility of constructing and running programs that deal with prodigious volumes of data
  …
How to make it real?



Distributed File System
Distributed Computing
Framework
…………………………………
Q&A
References





[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the Clouds: A Berkeley View of Cloud Computing," EECS Department, University of California, Berkeley, UCB/EECS-2009-28, February 10, 2009.
[2] K. Asanovic, R. Bodik, B. Catanzaro, J. Gebis, P. Husbands, K. Keutzer, D. Patterson, W. Plishker, J. Shalf, S. Williams, and K. Yelick, "The Landscape of Parallel Computing Research: A View from Berkeley," 2006.
[3] S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google File System," in Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. Bolton Landing, NY, USA: ACM Press, 2003.
[4] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in OSDI, 2004, pp. 137-150.
Google App Engine

App Engine handles HTTP(S) requests, nothing else
  Think RPC: request in, processing, response out
  Works well for the web and AJAX; also for other services
App configuration is dead simple
  No performance tuning needed
Everything is built to scale
  “infinite” number of apps, requests/sec, storage capacity
APIs are simple, stupid
App Engine Architecture
[Figure: req/resp flows into a Python VM process (stdlib, app) with a read-only file system (R/O FS); stateless APIs: urlfetch, mail, images; stateful APIs: memcache, datastore]
Microsoft Windows Azure
Amazon Web Services






Amazon’s infrastructure (auto scaling, load
balancing)
Elastic Compute Cloud (EC2) – scalable virtual
private server instances
Simple Storage Service (S3)
Simple Queue Service (SQS) – messaging
SimpleDB - database
Flexible Payments Service, Mechanical Turk,
CloudFront, etc.
Amazon Web Services





Very flexible, lower-level offering (closer to
hardware) = more possibilities, higher performing
Runs platform you provide (machine images)
Supports all major web languages
Industry-standard services (move off AWS easily)
Requires much more work, longer time-to-market
  Deployment scripts, configuring images, etc.
  Various libraries and GUI plug-ins do help
Price of Amazon EC2