Citizen CyberScience in China



CAS@home
Wenjing Wu
[email protected]
Computer Center,
Institute of High Energy Physics
Chinese Academy of Sciences, Beijing
2015/7/16
BOINC workshop 2013 @Grenoble
1
Outline
• CAS@home project
• Applications:
– LAMMPS: molecular dynamics simulation
– treeThreader: protein structure prediction
• Remote Job Submission
CAS@home
• First and only volunteer computing project in mainland China
• Launched in June 2010, hosted by the Computer Center of IHEP, CAS
• Supports scientific computing for the Chinese Academy of Sciences and other research institutes
• Hosts multiple applications from various research fields, including nanotechnology, bioinformatics and physics
CAS@home status
Since the launch in June 2010:
• 10K active users
• 23K active hosts
• 1.3 TFLOPS (real-time computing power)
• 1/3 are Chinese
• Peak: 1M validated CPU hours per month
• 7M CPU hours since Nov 2012
• Hosting 3 applications: LAMMPS, treeThreader, Aevol
• Other ongoing applications: BOSS (VBoxwrapper based)
Some project Statistics
Application 1: LAMMPS
• Software for molecular dynamics simulation, widely used by scientists in various research fields.
• Restartable; developed in C++ by an international group; can be compiled on both Windows and Linux with some effort.
• Input/output: 3 mandatory input files (<10 MB) / 1 compressed output file (hundreds of MB)
• Running time: 0.5 to 800 hours (it depends on a random number that determines the number of simulation steps)
Problems
• Results are numerical and can differ between runs of the same job, for 2 reasons:
– floating-point calculations give different results on different platforms
– checkpoints lose precision because values are printed to a text file
• Solutions
– Homogeneous Redundancy, or Homogeneous Application Version
• Running problems:
– Some long jobs (hundreds of hours) crash midway, and the volunteer gets no credit.
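The checkpoint problem can be illustrated with a minimal Python sketch. This is not LAMMPS code: the value and the six-digit text format are assumptions chosen only for illustration. A floating-point value written to a text file with limited digits does not read back exactly, so a restarted run can drift away from an uninterrupted one.

```python
# Minimal sketch (not LAMMPS code): a value written to a text checkpoint
# with limited digits does not read back exactly.
x = 1.0 / 3.0

truncated = float("%.6f" % x)  # assumed 6-digit text format
exact = float(repr(x))         # repr() emits enough digits to round-trip

print(truncated == x)          # False: precision was lost in the text file
print(exact == x)              # True: a round-trip-safe format avoids the loss
print(abs(truncated - x))      # size of the error introduced by the checkpoint
```

Writing checkpoints in a round-trip-safe text format or in binary would remove this second source of discrepancy; the platform-dependent floating-point differences would remain.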
Application 2: treeThreader
• For protein structure prediction
• Written in C by local scientists; restartable; compiles easily on both Windows and Linux
• Computing task: compare a protein sequence file against all existing protein templates.
• Input files: configuration files, a protein sequence file, ~50k protein templates (about 4 GB)
• Output files: one text file per template file
• It needs about 42 GFLOPS/hour to compare one sequence file against all templates.
Computing task
[Diagram: on one host, a protein sequence is compared against Protein Template 1, Protein Template 2, Protein Template 3, ..., Protein Template 150,000]
• Each comparison takes 6 s
• It takes about 84 hours on a single core
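The single-core figure can be checked with a short calculation. The sketch below assumes 50,000 templates, taken from the approximate "~50k" quoted for the treeThreader input files; the function name is made up for illustration.

```python
# Back-of-the-envelope check of the single-core running time.
SECONDS_PER_COMPARISON = 6  # from the slide
NUM_TEMPLATES = 50_000      # "~50k" templates, an approximate figure


def single_core_hours(num_templates, seconds_per_comparison):
    """Total time, in hours, to compare one sequence against all templates."""
    return num_templates * seconds_per_comparison / 3600


hours = single_core_hours(NUM_TEMPLATES, SECONDS_PER_COMPARISON)
print(f"{hours:.1f} hours")  # roughly 83 hours, in line with "about 84 hours"
```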
Running it on BOINC
• Locality scheduling (the job goes to where the data is)
• Each comparison takes 6 s; each sub package takes 9000 s on a host
• It takes 9000 s (2.5 hours) to finish the task
[Diagram: the protein templates are grouped into sub packages (sticky files) that stay on the hosts, and a protein sequence is sent to the hosts holding them. Sub Package 1 = Protein Templates 1 to 1500 (Host A1); Sub Package 2 = Protein Templates 1501 to 3000 (Host A2); ...; Sub Package 32 = Protein Templates 46501 to 48000 (Host Am). A host can hold several sub packages: Host An holds Sub Packages 14, 15 and 16.]
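The numbers in this diagram fit together as follows. The Python sketch below only reproduces the arithmetic (48,000 templates, 1500 per sub package, 6 s per comparison); the function name is hypothetical, and the code is not part of the BOINC locality scheduler.

```python
# Sketch of splitting the templates into fixed-size sub packages.
TEMPLATES_PER_PACKAGE = 1500
NUM_TEMPLATES = 48_000
SECONDS_PER_COMPARISON = 6


def sub_packages(num_templates, package_size):
    """Return (package_number, first_template, last_template), all 1-based."""
    packages = []
    for start in range(1, num_templates + 1, package_size):
        end = min(start + package_size - 1, num_templates)
        packages.append((len(packages) + 1, start, end))
    return packages


packages = sub_packages(NUM_TEMPLATES, TEMPLATES_PER_PACKAGE)
seconds_per_job = TEMPLATES_PER_PACKAGE * SECONDS_PER_COMPARISON

print(len(packages))    # 32 sub packages, hence 32 jobs per sequence
print(packages[0])      # (1, 1, 1500)
print(packages[-1])     # (32, 46501, 48000)
print(seconds_per_job)  # 9000 s, i.e. 2.5 hours per job
```

With all 32 jobs running in parallel on different hosts, one sequence finishes in about 2.5 hours instead of about 84 hours on a single core.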
Problems
• Long-tail batches
– A front-end server submits the batches and does the pre-processing and post-processing of each sequence, so it can only maintain/watch a limited number of active batches (batches in progress) in parallel (300)
– A whole batch is delayed by its slowest job
– Once that limit is reached, no new batches are submitted to the BOINC server because some batches are still “in progress” (waiting for their slowest jobs)
– Many hosts end up “starving”
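The long-tail effect can be sketched in a few lines of Python. The job times below are made up, and the two helper functions are hypothetical; only the limit of 300 active batches comes from the slide.

```python
# Illustrative sketch of the long-tail effect; the job times are made up.
MAX_ACTIVE_BATCHES = 300  # front-end limit from the slide


def batch_completion_hours(job_hours):
    """A batch is finished only when its slowest job is finished."""
    return max(job_hours)


def can_submit_new_batch(active_batches):
    """The front end submits only while it is below its batch limit."""
    return active_batches < MAX_ACTIVE_BATCHES


# 31 jobs finish in 2.5 hours; one straggler needs 40 hours (hypothetical).
job_hours = [2.5] * 31 + [40.0]
print(batch_completion_hours(job_hours))  # 40.0: the straggler sets the pace
print(can_submit_new_batch(300))          # False: hosts wait for new work
print(can_submit_new_batch(299))          # True
```

A single slow job keeps its batch slot occupied for the full 40 hours, which is how a few stragglers can block submission and leave otherwise idle hosts without work.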
Remote Job Submission
• CAS@home hosts multiple applications
• Each application has multiple users
• Application users have no privileges to submit jobs directly on the CAS@home server
• Remote job submission is therefore required: it allows authenticated and authorized users to submit jobs from remote machines.
• Basic remote job submission functions: batch submit / check status / retire / abort / download results
• BOINC provides a fairly rich set of APIs for remote batch operations (a batch is a set of jobs based on the same input files), but each application still needs its own server-side CGI code and client-side code for remote job submission
– Some operations (batch retire/abort/status check) are generic and can use the BOINC API directly
– Other operations, such as batch submission and result downloading, are application specific and need to be customized
– Extra functions such as “test running” and “estimate running time” can be added
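The split between generic and application-specific operations can be sketched as a small Python class hierarchy. All class and method names here are hypothetical and are not the BOINC API; an in-memory dictionary stands in for the server so that the sketch runs on its own.

```python
from abc import ABC, abstractmethod


class RemoteBatchClient(ABC):
    """Hypothetical client: generic operations here, specific ones in subclasses."""

    def __init__(self):
        self.batches = {}  # batch id -> status; stands in for the server

    # Generic operations, identical for every application.
    def check_status(self, batch_id):
        return self.batches[batch_id]

    def abort(self, batch_id):
        self.batches[batch_id] = "aborted"

    def retire(self, batch_id):
        self.batches[batch_id] = "retired"

    # Application-specific operations, customized per application.
    @abstractmethod
    def submit(self, inputs):
        """Create a batch from application-specific inputs; return its id."""

    @abstractmethod
    def download_results(self, batch_id):
        """Return the location of the application-specific result package."""


class ToyClient(RemoteBatchClient):
    """Toy subclass used only to exercise the interface."""

    def submit(self, inputs):
        batch_id = len(self.batches) + 1
        self.batches[batch_id] = "in progress"
        return batch_id

    def download_results(self, batch_id):
        return f"results/batch_{batch_id}.zip"  # made-up path


client = ToyClient()
batch_id = client.submit(["input1", "input2"])
print(client.check_status(batch_id))  # in progress
client.abort(batch_id)
print(client.check_status(batch_id))  # aborted
```

Only `submit` and `download_results` would need to be rewritten for a new application under this design; the status, abort and retire paths would be shared.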
LAMMPS Job Submission
• Jobs are created in batches.
• A batch = 1 set of input files + different parameter-value pairs
• A batch comprises hundreds to thousands of jobs
• Remote Job Submission: batches are submitted through a web portal by authenticated and authorized users
• Authenticated and authorized users can “operate” on the batches through the web portal (retire, abort, check status, download results)
Batch A (input file 1, input file 2)
Job 1: Ka1=Va1 Kb1=Vb1
Job 2: Ka2=Va2 Kb2=Vb2
...
Job N: KaN=VaN KbN=VbN
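The batch layout above (one shared set of input files, one list of parameter values per job) can be sketched in Python as follows. The function name, the dictionary layout and the sample values are assumptions for illustration, not the format used by the CAS@home portal.

```python
def make_batch(input_files, para_list, value_lists):
    """Build one job per value list; all jobs share the same input files."""
    jobs = []
    for number, values in enumerate(value_lists, start=1):
        if len(values) != len(para_list):
            raise ValueError(f"job {number}: expected {len(para_list)} values")
        jobs.append({
            "job": number,
            "input_files": list(input_files),
            "parameters": dict(zip(para_list, values)),
        })
    return jobs


batch_a = make_batch(
    ["input_file_1", "input_file_2"],
    ["Ka", "Kb"],              # parameter names (placeholders)
    [[1.0, 10], [2.0, 20]],    # one made-up value list per job
)
for job in batch_a:
    print(job["job"], job["parameters"])
```

Checking the length of each value list before creating the jobs catches malformed submissions early, before anything is sent to volunteer hosts.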
LAMMPS
[Diagram: the CAS user interface offers four functions: File Sandbox, Test a Job, Submit a Batch, and Check Batch Status / Get Output. These use the File Sandbox Service and the LAMMPS CGI on CAS@home. A batch is a list of jobs (Job1: Para List, Value List1; Job2: Para List, Value List2; Job3: Para List, Value List3; ...; JobN: Para List, Value ListN) that CAS@home sends to volunteer hosts.]
[Diagram: the user works through the web portal, which talks over HTTP to the LAMMPS CGI on the CAS@home server; jobs run on volunteer hosts.
– Sandbox: File1, File2
– Job Tester: test a job with chosen input files (syntax check, GFLOPS, output size estimation)
– Batch Creator: once the test is passed, submit a batch
– Batch Monitor, Job Monitor
– Batch Operations (operations on a batch): abort/retire a batch, zip results, download results]
BOINC Sandbox
• A file cannot be uploaded more than once
• Files used by a running batch cannot be deleted
LAMMPS Job Testing
• LAMMPS-specific!
• Test the job
• Submit the batch to the server
Batch Monitoring
• Admin can see the status of all batches
• Batch status: In progress, Completed, Aborted, Retired
Admin all batches
Job Status
• Input files associated with this job
• Results can be downloaded individually
Batch Operations
• Abort an unfinished batch
• Download the results of a work unit
• Download the results of this batch
• Retire a batch
TreeThreader Job Submission
• Jobs are created in batches: 1 protein sequence corresponds to 1 batch (32 jobs)
• Remote Job Submission:
– Client side: a set of PHP APIs allows authenticated and authorized users to submit batches and operate on them remotely (check status, retire, abort, get output)
– Server side:
• Generic operations such as batch abort/retire/status check are already included in the BOINC code
• Operations such as batch submission and result downloading are application specific and are implemented in a CGI program on the server side
TreeThreader Job Submission CGI
• Batch submission
– Takes the sequence and configuration files uploaded by the client
– Creates a batch of jobs based on the input files and all the template files, which are already stored on the server side
– Returns a batch ID
• Batch result downloading
– Uncompresses all output files of the batch
– Puts the uncompressed output files into a single directory and compresses it
– Returns the download URL of the batch result file
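The result-downloading steps can be sketched in Python. This is not the CGI program described here: it assumes each job output is a gzip file and packs the merged directory as a tar.gz archive, and the function name and file naming are hypothetical. It returns a local path; a real server would turn that path into a download URL.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def merge_batch_results(output_files, work_dir, batch_id):
    """Gunzip every job output into one directory and pack it as a tar.gz."""
    merged_dir = Path(work_dir) / f"batch_{batch_id}"
    merged_dir.mkdir(parents=True, exist_ok=True)

    for gz_path in map(Path, output_files):
        target = merged_dir / gz_path.stem  # "job_1.out.gz" -> "job_1.out"
        with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)

    archive = Path(work_dir) / f"batch_{batch_id}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(merged_dir, arcname=merged_dir.name)
    return archive
```

Called with the 32 output files of one batch, it would produce a single archive, so the client needs only one download per sequence.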
TreeThreader Job Submission
[Diagram: ICT Web Services call the API (Submit a sequence, Status Check, Get Output). The TreeThreader CGI on CAS@home takes the sequence, creates jobs for Template P1, Template P2, Template P3, Template P4, ..., Template P32, and returns the merged results.]
Thoughts on a more generic job submission interface
• The server side still requires application-specific functions for batch creation, result merging, testing and estimation
• On the client side, the job submission and result downloading functions can be generalized
• Use an XML file on the client side to describe the input files and their types
<jobdesc>
  <file_info>
    <number>0</number>
    <type>upload</type> <!-- file needs to be uploaded to BOINC server -->
  </file_info>
  <file_info>
    <number>1</number>
    <type>online</type> <!-- file already stored on BOINC server -->
  </file_info>
  <file_ref>
    <file_number>0</file_number>
    <open_name>MySEQ.tar.gz</open_name>
  </file_ref>
  <file_ref>
    <file_number>1</file_number>
    <open_name>Templates</open_name>
  </file_ref>
</jobdesc>
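A generic client could read such a description and decide which files to upload. The Python sketch below is only an illustration: it assumes the description is well-formed XML with `file_info` elements, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the job description.
JOB_DESC = """
<jobdesc>
  <file_info><number>0</number><type>upload</type></file_info>
  <file_info><number>1</number><type>online</type></file_info>
  <file_ref><file_number>0</file_number><open_name>MySEQ.tar.gz</open_name></file_ref>
  <file_ref><file_number>1</file_number><open_name>Templates</open_name></file_ref>
</jobdesc>
"""


def files_by_type(xml_text):
    """Map each file type ("upload", "online") to a list of open names."""
    root = ET.fromstring(xml_text)
    types = {
        info.findtext("number").strip(): info.findtext("type").strip()
        for info in root.findall("file_info")
    }
    result = {}
    for ref in root.findall("file_ref"):
        number = ref.findtext("file_number").strip()
        name = ref.findtext("open_name").strip()
        result.setdefault(types[number], []).append(name)
    return result


print(files_by_type(JOB_DESC))
# {'upload': ['MySEQ.tar.gz'], 'online': ['Templates']}
```

Files listed under "upload" would be sent along with the submission, while files listed under "online" would only be referenced by name.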
The End!