BOINC Workshop 10
Enabling interprocess communication for BOINC applications
Hien Nguyen, Eshwar Rohit
University of Houston
Supervisors:
Dr. Jaspal Subhlok
University of Houston
Dr. David P. Anderson
SSL, U.C. Berkeley
RESEARCH GOAL
•Enable BOINC to efficiently support apps
that require interprocess communication.
Goals:
Easier programming for communicating
applications
Reduce execution time (not increase
throughput)
Hien Nguyen
University of Houston
Example Applications
REMD Protein Folding application
Each process runs a standard molecular simulation at a different
temperature
Temperature assigned to each process at each step:
         P1   P2   P3   P4   P5   P6   P7   P8
STEP-1   270  280  290  300  310  320  330  340
STEP-2   280  270  290  300  320  310  340  330
STEP-3   280  290  270  300  320  340  310  330
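The temperature swaps between steps match the usual replica-exchange pattern, in which neighbouring processes may trade temperatures at each step. The following is a minimal sketch, assuming the standard Metropolis acceptance test, which the slides do not spell out. The function names, the energy values, and the unit choice for the Boltzmann constant are hypothetical.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); the unit choice is an assumption


def swap_probability(temp_i, temp_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def exchange_step(temps, energies, first_pair, rng):
    """Attempt swaps between neighbours (first_pair, first_pair + 1), then every second pair.

    temps[p] is the temperature currently assigned to process p, and
    energies[p] is the energy of the configuration held by process p.
    Returns the new temperature assignment.
    """
    new_temps = list(temps)
    for p in range(first_pair, len(temps) - 1, 2):
        prob = swap_probability(new_temps[p], new_temps[p + 1],
                                energies[p], energies[p + 1])
        if rng.random() < prob:
            new_temps[p], new_temps[p + 1] = new_temps[p + 1], new_temps[p]
    return new_temps
```

Calling `exchange_step` with `first_pair=0` and then `first_pair=1` alternates between the pairs (P1,P2), (P3,P4), ... and (P2,P3), (P4,P5), ..., which is consistent with the swaps visible between STEP-1, STEP-2 and STEP-3. Each exchange needs the energies of both neighbours, which is where interprocess communication comes in.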
Example Applications
Or many other applications:
Differential equation solvers (grid) (synchronous)
Game playing with alpha/beta pruning
(asynchronous)
Search application.
…..
Suitable applications: moderate amount of
communication.
DIFFICULTIES
[Figure: job execution timelines of a fast host and a slow host meeting at a synchronization point; X marks appear on the timelines]
Slow hosts slow down overall execution speed
The slowdown gets worse as the number of
hosts increases
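The effect of a synchronization point can be put in numbers with a small model: each step lasts as long as its slowest host, so every faster host idles. This is an illustrative sketch only; the function names and the per-host step times are hypothetical.

```python
def step_time(host_step_times):
    """Time for one synchronized step: everyone waits for the slowest host."""
    return max(host_step_times)


def job_time(per_step_host_times):
    """Total job time when every step ends at a synchronization point."""
    return sum(step_time(times) for times in per_step_host_times)


def wasted_fraction(host_step_times):
    """Fraction of total host time spent idle at the synchronization point."""
    slowest = step_time(host_step_times)
    busy = sum(host_step_times)
    return 1.0 - busy / (slowest * len(host_step_times))
```

Adding a host can never shorten `step_time`, and it lengthens the step whenever the new host is slower than all the others, which is one way to read the point about growing host counts.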
OUTLINE
1. Volpex Dataspace Overview
•IPC for volunteer environment
2. Integration With BOINC
•Process management
•Host selection
3. Future and Related Work
Volpex Dataspace
•Dataspace: global shared space that processes
can use for information exchange without a
temporal or spatial coupling.
[Figure: one process sends Put(ABC, 800) to the Volpex Dataspace Server; another sends Get(ABC, ?) and receives (800)]
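The Put/Get exchange can be illustrated with an in-memory stand-in for the server. This is a sketch under stated assumptions, not the Volpex API: the class name, the method signatures, and the choice to make a Get wait until a matching Put arrives are all hypothetical.

```python
import threading


class Dataspace:
    """In-memory stand-in for a dataspace server (hypothetical, single process)."""

    def __init__(self):
        self._items = {}
        self._changed = threading.Condition()

    def put(self, key, value):
        """Store value under key and wake any readers waiting for it."""
        with self._changed:
            self._items[key] = value
            self._changed.notify_all()

    def get(self, key, timeout=None):
        """Return the value for key, waiting until some process has put it.

        Raises TimeoutError if the key is still missing after `timeout` seconds.
        """
        with self._changed:
            if not self._changed.wait_for(lambda: key in self._items, timeout):
                raise TimeoutError(key)
            return self._items[key]
```

The writer and the reader never address each other (no spatial coupling), and the reader may ask before or after the writer has stored the item (no temporal coupling).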
Volpex Dataspace – Fault Tolerance
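[Figure: replicated process instances send Put(ABC, 800) and Get(ABC, ?) to the dataspace server, which answers (800); one element is marked X]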
Unlike Linda and its variants, Volpex DSS is designed
to support redundant Put/Get operations
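The slides do not describe how redundant operations are handled. One possible approach, shown here purely as an assumption, is to tag each operation with the logical process id and an operation counter so that replicas of the same process issue identical tags. All names below are hypothetical.

```python
class RedundantDataspace:
    """Dataspace that tolerates the same logical operation arriving from several replicas."""

    def __init__(self):
        self._items = {}       # key -> current value
        self._applied = set()  # (proc_id, op_number) of Puts already applied
        self._get_log = {}     # (proc_id, op_number) -> value returned the first time

    def put(self, proc_id, op_number, key, value):
        """Apply a Put once; a repeat from a replica of the same process is ignored."""
        op = (proc_id, op_number)
        if op in self._applied:
            return False
        self._applied.add(op)
        self._items[key] = value
        return True

    def get(self, proc_id, op_number, key):
        """Return the value the first replica saw for this Get, even if key changed since."""
        op = (proc_id, op_number)
        if op not in self._get_log:
            self._get_log[op] = self._items[key]
        return self._get_log[op]
```

With this scheme a slow replica that issues its Get late still receives the same answer as the fast replica did, so the replicas stay consistent with one another.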
Volpex Dataspace
•Why centralized? A central server has scale issues
•But: volunteer hosts are behind firewalls and accept no incoming connections
INTEGRATION WITH BOINC
•Mechanics: Process management
Simultaneous process starting
Fault tolerance
Checkpoint/restart
•Policy: Host selection.
Job execution scheme
[Figure: hosts get work from the BOINC Scheduler and exchange data with the Volpex Dataspace Server (Put data item, Get data item, Put checkpoint, Get checkpoint); one element is marked X]
PROCESS MANAGEMENT
•Simultaneous process starting:
All processes start computation together:
this reduces the resources wasted while processes
wait for each other.
Volpex jobs have highest (infinite) priority:
uninterruptible by other jobs.
While waiting for all processes of a Volpex job to
be ready, the host can run other finite-priority
volunteer jobs.
Use of boinc_temporary_exit()
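The start-together rule can be sketched as a readiness check that each host consults before computing. This is a hypothetical illustration in Python rather than the actual BOINC or Volpex code; the class, the function, and the returned action strings are invented for the example.

```python
class StartCoordinator:
    """Tracks which processes of one job are ready (hypothetical server-side helper)."""

    def __init__(self, num_processes):
        self.num_processes = num_processes
        self.ready = set()

    def report_ready(self, proc_id):
        """Record that a host is prepared to run proc_id."""
        self.ready.add(proc_id)

    def may_start(self):
        """True once every process of the job has a ready host."""
        return len(self.ready) == self.num_processes


def next_action(coordinator, proc_id):
    """Decide what the host holding proc_id should do right now."""
    coordinator.report_ready(proc_id)
    if coordinator.may_start():
        return "start"
    # Not everyone is ready: give up the CPU so the host can run other jobs
    # and ask again later. The slides mention boinc_temporary_exit() in this
    # context; this sketch only returns a label instead of calling it.
    return "exit_and_retry"
```

Returning control instead of busy-waiting is what lets the host run finite-priority work while the remaining processes of the Volpex job are still being placed.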
PROCESS MANAGEMENT
•Fault tolerance
Dead instance spotted by heartbeat
mechanism: process instances regularly
send heartbeat to Volpex DSS.
BOINC scheduler recruits a new
volunteer host to replace the dead one.
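A heartbeat check of this kind can be sketched as follows. The timeout value, the class name, and the use of explicit timestamps are assumptions made for illustration; the slides do not give the actual interval or data structures.

```python
class HeartbeatMonitor:
    """Server-side record of the latest heartbeat from each process instance."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # instance id -> time of latest heartbeat

    def heartbeat(self, instance_id, now):
        """Record a heartbeat received at time `now` (seconds)."""
        self.last_seen[instance_id] = now

    def dead_instances(self, now):
        """Return ids whose last heartbeat is older than the timeout."""
        return sorted(
            instance_id
            for instance_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

Each id returned by `dead_instances` would then be handed to the scheduler so that a replacement host can be recruited.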
PROCESS MANAGEMENT
•Hot spare policy:
Fast replacement
[Figure: BOINC Scheduler, Volpex Dataspace Server, and a hot spare group; one element is marked X]
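A hot spare group can be pictured as a queue of hosts that are already waiting, so a failed instance is replaced without a fresh scheduling round. The sketch below is an assumption about the general idea, not the project's implementation; the class and method names are hypothetical.

```python
from collections import deque


class HotSparePool:
    """Hosts that are already prepared and waiting to take over a failed instance."""

    def __init__(self):
        self._spares = deque()

    def add_spare(self, host_id):
        """Register a host as a waiting spare."""
        self._spares.append(host_id)

    def replace(self, failed_proc_id):
        """Assign the failed process id to the longest-waiting spare.

        Returns (host_id, proc_id), or None when no spare is available and a
        new host has to be recruited through the scheduler instead.
        """
        if not self._spares:
            return None
        return (self._spares.popleft(), failed_proc_id)
```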
PROCESS MANAGEMENT
•Checkpointing:
Process instance commits and uploads
checkpoints to Volpex DSS (which stores only the
latest checkpoint for each process).
Restarted process instance requests
checkpoint from Volpex DSS.
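The keep-only-the-latest rule for checkpoints can be sketched with a small store keyed by process id. The sequence number used to decide which checkpoint is newer is an assumption added for the example, as are all names.

```python
class CheckpointStore:
    """Keeps one checkpoint per process: the most recent one committed."""

    def __init__(self):
        self._latest = {}  # proc_id -> (sequence number, checkpoint bytes)

    def commit(self, proc_id, sequence, data):
        """Keep the checkpoint only if it is newer than the stored one."""
        stored = self._latest.get(proc_id)
        if stored is None or sequence > stored[0]:
            self._latest[proc_id] = (sequence, data)
            return True
        return False

    def restore(self, proc_id):
        """Return (sequence, data) for a restarted instance, or None to start fresh."""
        return self._latest.get(proc_id)
```

A replacement host would call `restore` with the process id of the failed instance and resume from the returned state.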
HOST SELECTION
•Volpex job: a set of processes that together
form one job, submitted by a scientist.
•Has requirements on:
Deadline
Number of processes
•Has estimates of:
Total flops per process.
Flops between 2 consecutive
checkpoints.
Memory usage, disk usage
HOST SELECTION POLICY
Criteria for selecting volunteer hosts to
assign to a Volpex job: Speed and
Availability.
Availability: the length of the interval during which a host is
available without interruption (BOINC client
allowed to compute).
HOST SELECTION POLICY
Job’s minimum requirements:
•Minimum speed: fast enough to meet the job’s deadline
Min speed = Total flops / Deadline
•Minimum expected availability: the host is continuously
available for x hours, long enough to commit at least 1 checkpoint.
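The two minimum requirements can be worked through numerically. The speed formula is the one given above. The availability formula (time to reach one checkpoint equals the flops between checkpoints divided by the host speed) is an assumption about how x might be derived, and the function names are hypothetical.

```python
def minimum_speed(total_flops, deadline_seconds):
    """Min speed = Total flops / Deadline, in flops per second."""
    return total_flops / deadline_seconds


def minimum_availability_hours(checkpoint_flops, host_flops_per_second):
    """Hours a host of the given speed needs to reach its next checkpoint (assumed model)."""
    return checkpoint_flops / host_flops_per_second / 3600.0


def host_qualifies(host_speed, expected_available_hours,
                   total_flops, deadline_seconds, checkpoint_flops):
    """Check both minimum requirements for one candidate host."""
    fast_enough = host_speed >= minimum_speed(total_flops, deadline_seconds)
    stable_enough = expected_available_hours >= minimum_availability_hours(
        checkpoint_flops, host_speed)
    return fast_enough and stable_enough
```

For example, a process needing 3.6e15 flops with a 10-hour deadline requires at least 1e11 flops per second. A host running at 2e11 flops per second with 7.2e14 flops between checkpoints must stay available for at least one hour.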
Evaluate Host's Availability
•We want to predict the length of a host's
availability interval.
•Method based on: “Exploiting Non-Dedicated Resources for Cloud Computing”,
Artur Andrzejak, Derrick Kondo, David P.
Anderson (NOMS 2010).
Evaluate Host's Availability
Last value predictor: simplistic predictor
which uses the availability value in the last
hourly interval before prediction as the
prediction of availability for the next x-hour
interval.
Combined with ranking hosts by
predictability: number of availability changes
per week.
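The predictor and the ranking can be sketched together. The sketch assumes that availability is recorded as one boolean per hour and that each history covers one week; both are simplifications chosen for illustration, and the function names are hypothetical.

```python
def last_value_prediction(hourly_availability):
    """Predict the next interval from the most recent hourly observation.

    hourly_availability: list of booleans, oldest first, one per hour.
    """
    return hourly_availability[-1]


def availability_changes(hourly_availability):
    """Count transitions between available and unavailable in the history."""
    return sum(
        1
        for prev, cur in zip(hourly_availability, hourly_availability[1:])
        if prev != cur
    )


def rank_hosts(histories):
    """Order host ids from most to least predictable (fewest changes first).

    histories: dict of host id -> one week of hourly availability values.
    Hosts that the last-value predictor expects to be unavailable are dropped.
    """
    candidates = [h for h, hist in histories.items() if last_value_prediction(hist)]
    return sorted(candidates, key=lambda h: (availability_changes(histories[h]), h))
```

Hosts at the front of the ranking changed state least often during the week, so the last observed value is most likely to hold for the next interval.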
Evaluate Host's Availability
In essence: select hosts which change
availability very rarely.
A process assigned to a host with high
predictability does not necessarily need
to be replicated.
IMPLEMENTATION STATUS
•Volpex utilities: for scientists to submit,
abort or query status of a Volpex job.
•Modified BOINC scheduler: includes
host selection for Volpex jobs.
•Modified Volpex DSS: handles new
types of requests.
IMPLEMENTATION STATUS
[Figure: architecture diagram. Components: Scientist, BOINC Server (Scheduler, Database), Volpex Dataspace Server, hot spare group. Labeled interactions: submit job specs; create job & WU; scheduler request; scheduler reply; dynamically create result from WU; replacement; heartbeat; get procID; checkpoint; request procID of failed instance. One element is marked X.]
FUTURE WORK
Experiment and evaluate with different degrees of
freedom:
•Number of processes 10-1M
•Communication pattern (local/global,
synch/asynch)
•Size and frequency of communication
Goal: study (via live experiment or simulation) the
performance of Volpex/BOINC over this space
Applications: Eratosthenes, REMD Protein Folding
Enhance host selection policy
OTHER WORK
Volpex MPI:
•An MPI library designed for executing parallel
applications in a volunteer environment.
•Direct communication between processes.
•Key Features
Controlled redundancy
Receiver based direct communication
Distributed sender based logging
More detail: “VolpexMPI: an MPI Library for Execution of
Parallel Applications on Volatile Nodes” by Troy LeBlanc,
Rakhi Anand, Edgar Gabriel, and Jaspal Subhlok.
Do you have an application?
We would be happy to cooperate with you.
Our team contacts:
•Dr. Jaspal Subhlok: [email protected]
•Dr. David Anderson: [email protected]
•Dr. Edgar Gabriel: [email protected]
•Hien Nguyen: [email protected]
•Eshwar Rohit: [email protected]
•Rakhi Anand: [email protected]
Our Website:
http://www2.cs.uh.edu/~jsteach/volpex/index.htm