Transcript Slides

Distributed Systems
CS 15-440
Programming Models - Part IV
Lecture 16, Nov 4, 2013
Mohammad Hammoud
Today…
 Last Session:
 Programming Models – Part III: MapReduce
 Today’s Session:
 Programming Models – Part IV: Pregel & GraphLab
 Announcements:
 Project 3 is due on Saturday Nov 9, 2013 by midnight
 PS3 is due on Wednesday Nov 13, 2013 by midnight
 Quiz 2 is on Nov 20, 2013
 Final Exam is on Dec 8, 2013 at 9:00AM in room # 2051
 Last day of classes will be Wednesday Dec 4, 2013 (we will hold an overview session)
Objectives
Discussion on Programming Models
 Why parallelize our programs?
 Parallel computer architectures
 Traditional models of parallel programming
 Types of parallel programs
 Message Passing Interface (MPI)
 MapReduce, Pregel and GraphLab (last 3 sessions; cont'd today)
The Pregel Analytics Engine
Pregel:
 Motivation & Definition
 The Computation & Programming Models
 Input and Output
 Architecture & Execution Flow
 Fault Tolerance
Motivation for Pregel
 How to implement algorithms to process Big Graphs?
 Create a custom distributed infrastructure for each new algorithm (Difficult!)
 Rely on existing distributed analytics engines like MapReduce (Inefficient and cumbersome!)
 Use a single-computer graph algorithm library like BGL, LEDA, NetworkX, etc. (Usually Big Graphs are too large to fit on a single machine!)
 Use a parallel graph system like Parallel BGL or CGMGraph (Not suited for large-scale graph processing on distributed systems!)
What is Pregel?
 Pregel is a large-scale graph-parallel distributed analytics engine
 Some Characteristics:
• In-Memory (opposite to MapReduce)
• High scalability
• Automatic fault-tolerance
• Flexibility in expressing graph algorithms
• Message-Passing programming model
• Tree-style, master-slave architecture
• Synchronous
 Pregel is inspired by Valiant's Bulk Synchronous Parallel (BSP) model
The Pregel Analytics Engine
The BSP Model
[Figure: The BSP model. Computation proceeds as a sequence of iterations called super-steps (Super-Step 1, 2, 3). Within each super-step, CPU 1, CPU 2 and CPU 3 process their data in parallel; a synchronization barrier separates consecutive super-steps.]
Entities and Super-Steps
 The computation is described in terms of vertices, edges and a sequence of super-steps
 You give Pregel a directed graph consisting of vertices and edges
 Each vertex is associated with a modifiable user-defined value
 Each edge is associated with a source vertex, a value and a destination vertex
 During a super-step S:
 A user-defined function F is executed at each vertex V
 F can read messages sent to V in super-step S – 1 and send messages to other vertices that will be received at super-step S + 1
 F can modify the state of V and its outgoing edges
 F can alter the topology of the graph
Topology Mutations
 The graph structure can be modified during any super-step
 Vertices and edges can be added or deleted
 Mutating graphs can create conflicting requests where multiple vertices at a super-step might try to alter the same edge/vertex
 Conflicts are avoided using partial ordering and handlers
 Partial orderings:
 Edges are removed before vertices
 Vertices are added before edges
 Mutations performed at super-step S are only effective at super-step S + 1
 All mutations precede calls to actual computations
 Handlers:
 Among multiple conflicting requests, one request is selected arbitrarily
Algorithm Termination
 Algorithm termination is based on every vertex voting to halt
 In super-step 0, every vertex is active
 All active vertices participate in the computation of any given super-step
 A vertex deactivates itself by voting to halt and enters an inactive state
 A vertex can return to the active state if it receives an external message
 A Pregel program terminates when all vertices are simultaneously inactive and there are no messages in transit
[Figure: Vertex state machine. "Vote to Halt" moves a vertex from Active to Inactive; "Message Received" moves it back to Active.]
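For illustration, a minimal C++ sketch of this vertex state machine (the types and message handling here are assumptions for exposition, not Pregel's actual implementation):

#include <queue>

enum class VertexState { Active, Inactive };

struct VertexRuntime {
  VertexState state = VertexState::Active;  // every vertex starts active (super-step 0)
  std::queue<double> inbox;                 // messages to be read in the next super-step

  void VoteToHalt() { state = VertexState::Inactive; }

  void Deliver(double msg) {
    inbox.push(msg);
    state = VertexState::Active;            // an incoming message reactivates the vertex
  }

  bool IsActive() const { return state == VertexState::Active; }
};

// The program as a whole terminates when every vertex is inactive
// and no messages remain in transit.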
Finding the Max Value in a Graph
[Figure: example run on a graph of four vertices with initial values 3, 6, 2 and 1. Blue arrows are messages; blue vertices have voted to halt. In each super-step (S, S + 1, S + 2, S + 3) every vertex sends its current value to its neighbors and adopts the largest value it has seen; after a few super-steps all vertices hold the maximum value 6 and vote to halt.]
The Programming Model
 Pregel adopts the message-passing programming model
 Messages can be passed from any vertex to any other vertex in the graph
 Any number of messages can be passed
 The message order is not guaranteed
 Messages will not be duplicated
 Combiners can be used to reduce the number of messages passed between super-steps
 Aggregators are available for reduction operations (e.g., sum, min, and max)
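As an example of the combiner idea, the sketch below merges all pending messages bound for one vertex into a single message carrying their maximum. The Combiner interface shown here is an assumption for illustration; Pregel's real combiner API is not shown on these slides.

#include <algorithm>
#include <vector>

// Assumed combiner interface for illustration only.
template <typename MessageValue>
class Combiner {
 public:
  virtual ~Combiner() = default;
  // Precondition: msgs is non-empty.
  virtual MessageValue Combine(const std::vector<MessageValue>& msgs) const = 0;
};

// Collapses all messages destined for the same vertex into their maximum,
// reducing the network traffic between super-steps.
class MaxCombiner : public Combiner<double> {
 public:
  double Combine(const std::vector<double>& msgs) const override {
    return *std::max_element(msgs.begin(), msgs.end());
  }
};

Combining is safe here because the max-value computation only cares about the largest incoming message, not about individual messages.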
The Pregel API in C++
 A Pregel program is written by sub-classing the Vertex class:
// The template arguments define the types for vertices, edges and messages
template <typename VertexValue,
          typename EdgeValue,
          typename MessageValue>
class Vertex {
 public:
  // Override the compute function to define the computation at each super-step
  virtual void Compute(MessageIterator* msgs) = 0;

  const string& vertex_id() const;
  int64 superstep() const;

  // Get the value of the current vertex
  const VertexValue& GetValue();
  // Modify the value of the vertex
  VertexValue* MutableValue();
  OutEdgeIterator GetOutEdgeIterator();

  // Pass messages to other vertices
  void SendMessageTo(const string& dest_vertex,
                     const MessageValue& message);
  void VoteToHalt();
};
Pregel Code for Finding the Max Value

class MaxFindVertex
    : public Vertex<double, void, double> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    // Current value of this vertex (VertexValue is double)
    double currMax = GetValue();
    SendMessageToAllNeighbors(currMax);
    // Take the maximum over all messages received from the previous super-step
    for ( ; !msgs->Done(); msgs->Next()) {
      if (msgs->Value() > currMax)
        currMax = msgs->Value();
    }
    if (currMax > GetValue())
      *MutableValue() = currMax;
    else
      VoteToHalt();
  }
};
The Pregel Analytics Engine
Input, Graph Flow and Output
 The input graph in Pregel is stored in a distributed storage layer (e.g., GFS or Bigtable)
 The input graph is divided into partitions consisting of vertices and their outgoing edges
 The default partitioning function is hash(ID) mod N, where N is the # of partitions
 Partitions are stored in node memories for the duration of the computation (hence, an in-memory model and not a disk-based one)
 Outputs in Pregel are typically graphs isomorphic to (or mutated from) the input graphs
 Yet, outputs can also be aggregated statistics mined from the input graphs (depending on the graph algorithm)
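For concreteness, the default partitioning rule above can be written as a one-line function; the use of std::hash over string IDs is an assumption for illustration.

#include <cstdint>
#include <functional>
#include <string>

// Default Pregel-style partitioning: hash(ID) mod N, where N is the number of partitions.
uint32_t PartitionOf(const std::string& vertex_id, uint32_t num_partitions) {
  return static_cast<uint32_t>(std::hash<std::string>{}(vertex_id) % num_partitions);
}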
The Pregel Analytics Engine
The Architectural Model
 Pregel assumes a tree-style network topology and a master-slave architecture
[Figure: a Master and five workers (Worker1 to Worker5) connected through rack switches and a core switch. The master pushes work (i.e., partitions) to all workers; workers send completion signals back to the master.]
 When the master receives the completion signal from every worker in super-step S, it starts super-step S + 1
The Execution Flow
 Steps of Program Execution in Pregel:
1. Copies of the program code are distributed across all machines
   1.1 One copy is designated as the master and every other copy is deemed a worker/slave
2. The master partitions the graph and assigns partition(s) to each worker, along with portions of the input "graph data"
3. Every worker executes the user-defined function on each vertex
4. Workers can communicate with one another
The Execution Flow
 Steps of Program Execution in Pregel:
5. The master coordinates the execution of super-steps
6. The master calculates the number of inactive vertices after each super-step and signals workers to terminate if all vertices are inactive (and no messages are in transit)
7. Each worker may be instructed to save its portion of the graph
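A hedged sketch of the coordination loop implied by steps 5 and 6; all types and names here are hypothetical, not Pregel's actual code.

#include <cstdint>
#include <vector>

// Hypothetical worker handle; a real worker would execute Compute() on each
// active vertex of its partitions during RunSuperStep().
struct WorkerHandle {
  uint64_t active_vertices = 0;   // stub state for illustration
  uint64_t pending_messages = 0;

  uint64_t RunSuperStep(uint64_t /*superstep*/) { return active_vertices; }
  uint64_t PendingMessages() const { return pending_messages; }
};

// The master's barrier loop: run a super-step on every worker, then stop once
// all vertices are inactive and no messages are in transit.
void MasterLoop(std::vector<WorkerHandle>& workers) {
  uint64_t superstep = 0;
  while (true) {
    uint64_t active = 0, in_transit = 0;
    for (auto& w : workers) active += w.RunSuperStep(superstep);
    for (auto& w : workers) in_transit += w.PendingMessages();
    if (active == 0 && in_transit == 0) break;   // global termination condition
    ++superstep;
  }
}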
The Pregel Analytics Engine
Fault Tolerance in Pregel
 Fault-tolerance is achieved through checkpointing
 At the start of every super-step, the master may instruct the workers to save the states of their partitions in stable storage
 The master uses "ping" messages to detect worker failures
 If a worker fails, the master re-assigns the corresponding vertices and input graph data to another available worker, and restarts the super-step
 The available worker re-loads the partition state of the failed worker from the most recent available checkpoint
How Does Pregel Compare to MapReduce?
Pregel versus MapReduce
Aspect                       | Hadoop MapReduce                                        | Pregel
Programming Model            | Shared-Memory (abstraction)                             | Message-Passing
Computation Model            | Synchronous                                             | Synchronous
Parallelism Model            | Data-Parallel                                           | Graph-Parallel
Architectural Model          | Master-Slave                                            | Master-Slave
Task/Vertex Scheduling Model | Pull-Based                                              | Push-Based
Application Suitability      | Loosely-Connected/Embarrassingly Parallel Applications  | Strongly-Connected Applications
Objectives
Discussion on Programming Models
 Why parallelize our programs?
 Parallel computer architectures
 Traditional models of parallel programming
 Types of parallel programs
 Message Passing Interface (MPI)
 MapReduce, Pregel and GraphLab (cont'd)
The GraphLab Analytics Engine
GraphLab:
 Motivation & Definition
 Input, Output & Components
 The Architectural Model
 The Programming Model
 The Computation Model
 Fault Tolerance
Motivation for GraphLab
 There is an exponential growth in the scale of Machine Learning and Data Mining (MLDM) algorithms
 Designing, implementing and testing MLDM algorithms at large scale is challenging due to:
 Synchronization
 Deadlocks
 Scheduling
 Distributed state management
 Fault-tolerance
 The interest in analytics engines that can execute MLDM algorithms automatically and efficiently is increasing
 MapReduce is inefficient with iterative jobs (common in MLDM algorithms)
 Pregel cannot run asynchronous problems (common in MLDM algorithms)
What is GraphLab?
 GraphLab is a large-scale graph-parallel distributed analytics engine
 Some Characteristics:
• In-Memory (opposite to MapReduce and similar to Pregel)
• High scalability
• Automatic fault-tolerance
• Flexibility in expressing arbitrary graph algorithms (more flexible than Pregel)
• Shared-Memory abstraction (opposite to Pregel but similar to MapReduce)
• Peer-to-peer architecture (dissimilar to Pregel and MapReduce)
• Asynchronous (dissimilar to Pregel and MapReduce)
The GraphLab Analytics Engine
Input, Graph Flow and Output
 GraphLab assumes problems modeled as graphs
 It adopts two phases, the initialization and the execution phases
[Figure: GraphLab phases. Initialization phase: raw graph data in a distributed file system is fed to a (MapReduce) graph builder that performs parsing + partitioning, producing an atom collection; after index construction, the resulting atom index and atom files are written back to the distributed file system. GraphLab execution phase: a cluster of GL engine instances communicates via TCP RPC; monitoring + atom placement load the atom index and atom files from the distributed file system onto the engines.]
Components of the GraphLab Engine:
The Data-Graph
 The GraphLab engine incorporates three main parts:
1. The data-graph, which represents the user program state at a cluster machine
[Figure: a data-graph, with data attached to each vertex and edge.]
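As a rough illustration, a data-graph can be pictured as user-defined data attached to every vertex and edge. The C++ layout below is an assumption for exposition, not GraphLab's internal representation.

#include <cstddef>
#include <vector>

// Illustrative data-graph: arbitrary user data lives on every vertex and edge.
template <typename VertexData, typename EdgeData>
struct DataGraph {
  struct Edge {
    std::size_t source, target;
    EdgeData data;
  };
  std::vector<VertexData> vertex_data;             // indexed by vertex id
  std::vector<Edge> edges;
  std::vector<std::vector<std::size_t>> adjacency; // per-vertex list of incident edge indices
};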
Components of the GraphLab Engine:
The Update Function
 The GraphLab engine incorporates three main parts:
2. The update function, which involves two main functions:
2.1- Altering data within a scope of a vertex
2.2- Scheduling future update functions at neighboring vertices
[Figure: vertex v and its scope Sv.] The scope of a vertex v (i.e., Sv) is the data stored in v and in all of v's adjacent edges and vertices.
Components of the GraphLab Engine:
The Update Function (Cont'd)
[Figure: the GraphLab execution engine. Scheduled vertices (a, b, c, ..., k) wait in a scheduler; CPU 1 and CPU 2 repeatedly pull vertices from the scheduler and apply the update function to them, which may schedule further update functions at neighboring vertices. The process repeats until the scheduler is empty.]
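A minimal C++ sketch of the update-function idea; the Scope and Scheduler types are hypothetical. The function alters data within the scope of a vertex and schedules neighbors for future updates when its value changes.

#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical scope and scheduler types for illustration only.
struct Scope {
  double& vertex_data;                  // data stored on the vertex itself
  std::vector<double*> neighbor_data;   // data on adjacent vertices
  std::vector<std::size_t> neighbor_ids;
};

struct Scheduler {
  std::deque<std::size_t> queue;
  void Schedule(std::size_t v) { queue.push_back(v); }
};

// An update function: alter data within the scope, then schedule neighbors if needed.
void ExampleUpdate(std::size_t v, Scope& scope, Scheduler& sched) {
  double old_value = scope.vertex_data;
  double sum = 0.0;
  for (double* nd : scope.neighbor_data) sum += *nd;
  scope.vertex_data = sum / (scope.neighbor_data.empty() ? 1 : scope.neighbor_data.size());
  if (scope.vertex_data != old_value)
    for (std::size_t n : scope.neighbor_ids) sched.Schedule(n);
}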
Components of the GraphLab Engine:
The Sync Operation
 The GraphLab engine incorporates three main parts:
3. The sync operation, which maintains global statistics describing the data stored in the data-graph
 Global values maintained by the sync operation can be written by all update functions across the cluster machines
 The sync operation is similar to Pregel's aggregators
 A mutual exclusion mechanism is applied by the sync operation to avoid write-write conflicts
 For scalability reasons, the sync operation is not enabled by default
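A small sketch of a sync-style global aggregate protected by mutual exclusion; this is illustrative only, not GraphLab's actual sync implementation.

#include <mutex>

// A global running sum over vertex data; the mutex plays the role of the
// mutual-exclusion mechanism mentioned above, preventing write-write conflicts.
class GlobalSum {
 public:
  void Add(double x) {
    std::lock_guard<std::mutex> lock(mu_);
    sum_ += x;
  }
  double Get() {
    std::lock_guard<std::mutex> lock(mu_);
    return sum_;
  }
 private:
  std::mutex mu_;
  double sum_ = 0.0;
};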
The GraphLab Analytics Engine
The Architectural Model
 GraphLab adopts a peer-to-peer architecture
 All engine instances are symmetric
 Engine instances communicate using the Remote Procedure Call (RPC) protocol over TCP/IP
 The first triggered engine has the additional responsibility of being a monitoring/master engine
 Advantages:
 Highly scalable
 Precludes centralized bottlenecks and single points of failure
 Main disadvantage:
 Complexity
The GraphLab Analytics Engine
The Programming Model
 GraphLab offers a shared-memory programming model
 It allows scopes to overlap and vertices to read/write from/to their scopes
Consistency Models in GraphLab
 GraphLab guarantees sequential consistency
 Provides the same result as a sequential execution of the computational steps
 User-defined consistency models:
 Full Consistency
 Vertex Consistency
 Edge Consistency
Consistency Models in GraphLab
[Figure: read/write extents of the three consistency models on a chain of vertices 1-5, with vertex data D1-D5 and edge data D1↔2, D2↔3, D3↔4, D4↔5, centered on a vertex v. Full Consistency Model: an update may read and write all data in v's scope (v, its adjacent edges and its adjacent vertices). Edge Consistency Model: an update may write v's data and its adjacent edges, and read the adjacent vertices. Vertex Consistency Model: an update may write only v's own data.]
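To summarize the figure, the sketch below records which parts of the scope an update may write under each model; the types are illustrative, not GraphLab's API.

// Illustrative only: the data an update on vertex v may WRITE under each model
// (everything else in the scope is read-only).
enum class Consistency { Full, Edge, Vertex };

struct WriteSet {
  bool vertex_data;        // data on v itself
  bool adjacent_edges;     // data on edges incident to v
  bool adjacent_vertices;  // data on v's neighbors
};

WriteSet WritableUnder(Consistency model) {
  switch (model) {
    case Consistency::Full:   return {true, true,  true };  // whole scope writable
    case Consistency::Edge:   return {true, true,  false};  // neighbors read-only
    case Consistency::Vertex: return {true, false, false};  // only v's own data writable
  }
  return {false, false, false};
}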
The GraphLab Analytics Engine
The Computation Model
 GraphLab employs an asynchronous computation model
 It suggests two asynchronous engines:
 Chromatic Engine
 Locking Engine
 The chromatic engine executes vertices partially asynchronously
 It applies vertex coloring (i.e., no two adjacent vertices share the same color)
 All vertices with the same color are executed before proceeding to a different color
 The locking engine executes vertices fully asynchronously
 Data on vertices and edges are susceptible to corruption
 It applies a permission-based distributed mutual exclusion mechanism to avoid read-write and write-write hazards
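A hedged sketch of the chromatic scheduling idea; how the coloring is computed and how updates actually run in parallel is elided.

#include <cstddef>
#include <vector>

// Vertices are grouped by color so that no two adjacent vertices run concurrently;
// a barrier separates consecutive colors.
void RunChromaticSweep(const std::vector<std::vector<std::size_t>>& vertices_by_color,
                       void (*update)(std::size_t vertex)) {
  for (const auto& same_color : vertices_by_color) {
    // All vertices of this color are independent and may be updated in parallel.
    for (std::size_t v : same_color) update(v);
    // Implicit barrier here before moving on to the next color.
  }
}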
The GraphLab Analytics Engine
Fault-Tolerance in GraphLab
 GraphLab uses distributed checkpointing to recover from machine failures
 It suggests two checkpointing mechanisms
 Synchronous checkpointing (it suspends the entire execution of GraphLab)
 Asynchronous checkpointing
How Does GraphLab Compare to MapReduce and Pregel?
GraphLab vs. Pregel vs. MapReduce
Aspect                       | Hadoop MapReduce                                        | Pregel                           | GraphLab
Programming Model            | Shared-Memory                                           | Message-Passing                  | Shared-Memory
Computation Model            | Synchronous                                             | Synchronous                      | Asynchronous
Parallelism Model            | Data-Parallel                                           | Graph-Parallel                   | Graph-Parallel
Architectural Model          | Master-Slave                                            | Master-Slave                     | Peer-to-Peer
Task/Vertex Scheduling Model | Pull-Based                                              | Push-Based                       | Push-Based
Application Suitability      | Loosely-Connected/Embarrassingly Parallel Applications  | Strongly-Connected Applications  | Strongly-Connected Applications (more precisely, MLDM apps)
Next Class
Fault-Tolerance
Back-up Slides
PageRank
 PageRank is a link analysis algorithm
 The rank value indicates the importance of a particular web page
 A hyperlink to a page counts as a vote of support
 A page that is linked to by many pages with high PageRank receives a high rank itself
 A PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PageRank
PageRank (Cont’d)
 Iterate:

    R[i] = α + (1 − α) · Σ_{j ∈ Nbrs(i)} W_ji · R[j],   with W_ji = 1 / L[j]

 Where:
 α is the random reset probability
 L[j] is the number of links on page j
[Figure: example graph of six web pages (1-6) linked by hyperlinks.]
PageRank Example in GraphLab
 The PageRank algorithm is defined as a per-vertex operation working on the scope of the vertex

pagerank(i, scope) {
  // Get neighborhood data
  (R[i], W_ji, R[j]) ← scope;

  // Update the vertex data
  R[i] ← α + (1 − α) · Σ_{j ∈ N[i]} W_ji · R[j];

  // Reschedule neighbors if needed (dynamic computation)
  if R[i] changes then
    reschedule_neighbors_of(i);
}
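For concreteness, a self-contained C++ translation of the pseudocode above under assumed data structures; this is a sketch, not GraphLab's API.

#include <cmath>
#include <cstddef>
#include <deque>
#include <vector>

// Assumed layout for illustration: in_nbrs[i] lists the pages linking to page i,
// w_in[i][k] is the transition weight from in_nbrs[i][k] to i (i.e., 1 / L[j]),
// and out_nbrs[i] lists the pages that i links to.
void PageRankUpdate(std::size_t i,
                    std::vector<double>& R,
                    const std::vector<std::vector<std::size_t>>& in_nbrs,
                    const std::vector<std::vector<double>>& w_in,
                    const std::vector<std::vector<std::size_t>>& out_nbrs,
                    std::deque<std::size_t>& scheduler,
                    double alpha = 0.15, double tol = 1e-6) {
  // R[i] <- alpha + (1 - alpha) * sum over in-neighbors j of W_ji * R[j]
  double sum = 0.0;
  for (std::size_t k = 0; k < in_nbrs[i].size(); ++k)
    sum += w_in[i][k] * R[in_nbrs[i][k]];
  double new_rank = alpha + (1.0 - alpha) * sum;

  // Dynamic computation: reschedule dependent (out-)neighbors only if R[i] changed.
  if (std::fabs(new_rank - R[i]) > tol)
    for (std::size_t j : out_nbrs[i]) scheduler.push_back(j);
  R[i] = new_rank;
}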