
The Grid for Particle Physic(ist)s
What is it and how do I use it?

Steve Lloyd
Queen Mary, University of London
IoP Dublin, March 2005
LHC Data Challenge
• ~100,000,000 electronic channels
• 800,000,000 proton-proton interactions per second
• 0.0002 Higgs per second
• 10 PBytes of data a year (10 million GBytes = 14 million CDs)

Starting from this event… we are looking for this “signature”.
Selectivity: 1 in 10¹³. Like looking for 1 person in a thousand world populations, or for a needle in 20 million haystacks!
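A quick back-of-envelope check reproduces these figures (assuming roughly 0.7 GB per CD, an assumption not on the slide):

awk 'BEGIN { print 10e6/0.7, "CDs"; print 0.0002/800e6, "Higgs per interaction" }'
# prints ~1.4e7 (14 million) CDs and 2.5e-13, i.e. a selectivity of order 1 in 10^13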
Computing Solution: The Grid
The Grid
'Grid' means different things to different people. Ian Foster and Carl Kesselman: "A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities."
All agree it’s a funding opportunity!
Electricity Grid
Analogy with the electricity power grid: power stations, a distribution infrastructure and a 'standard interface'.
Computing Grid
Computing and data centres connected by the fibre optics of the Internet.
Middleware
[Diagram: on a single PC, programs (Word/Excel, email/web, games, your program) run on an operating system that manages the CPU and disks; on the Grid, your program runs on middleware, which ties together a User Interface machine, Resource Broker, Information Service, Bookkeeping Service, Replica Catalogue, disk servers and CPU clusters.]

Middleware is the Operating System of a distributed computing system.
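To make these roles concrete, the components above map roughly onto the commands used later in this talk; the sketch below uses only commands that appear on the following slides.

grid-proxy-init                                      # security: create a short-lived proxy from your certificate
edg-job-submit --vo atlas -o jobIDfile athena.jdl    # Resource Broker matches the job to a site via the Information Service
edg-job-status -i jobIDfile                          # Bookkeeping Service tracks the job through its states
edg-job-get-output -dir . -i jobIDfile               # retrieve the output sandbox once the job is Done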
GridPP
19 UK Universities, CCLRC (RAL & Daresbury) and CERN.
Funded by the Particle Physics and Astronomy Research Council (PPARC).
GridPP1 (2001-2004, £17m): "From Web to Grid".
GridPP2 (2004-2007, £16m): "From Prototype to Production".
Not planning GridPP3 – the aim is to incorporate Grid activities and facilities into the baseline programme.
International Collaboration
• EU DataGrid (EDG) 2001-2004
  – Middleware development project
• LHC Computing Grid (LCG)
  – Grid deployment project for the LHC
• EU Enabling Grids for e-Science (EGEE) 2004-2006
  – Grid deployment project for all disciplines
• US and other Grid projects
  – Interoperability
[Diagram: the relationship between GridPP, LCG and EGEE]
GridPP Support
Manpower for experiments (some experiments are not directly supported but use LCG).
Manpower for Middleware Development:
• Metadata
• Storage
• Workload Management
• Security
• Information and Monitoring
• Networking
Hardware and Manpower at RAL (LHC Tier-1, BaBar Tier-A)
Manpower for System Support at Institutes (Tier-2s)
Manpower for LCG at CERN (under discussion)
Paradigm Shift?
LHCb Monte Carlo production during 2004: the Grid share grew month by month, with the remainder produced on conventional (non-Grid) resources.
May: 11% Grid, 89% non-Grid
Jun: 20% Grid, 80% non-Grid
Jul: 23% Grid, 77% non-Grid
Aug: 73% Grid, 27% non-Grid
Tier Structure
[Diagram: the LHC computing tier hierarchy]
Tier 0: CERN computer centre (online system and offline farm)
Tier 1: national centres – RAL (UK), plus centres in the USA, Germany, Italy and France
Tier 2: regional groups – ScotGrid, NorthGrid, SouthGrid, London
Tier 3: institutes – e.g. Glasgow, Edinburgh, Durham
Tier 4: workstations
Resource Discovery at Tier-1
[Chart: RAL Linux CSF weekly CPU utilisation (platform-related CPU hours), financial year 2000/01, pre-Grid, with new capacity marked at 1 July 2000 and 1 October 2000. Inset: Grid load, 21-28 July 2004 – full again in 8 hours!]
UK Tier-2 Centres
Mostly funded by HEFCE.
ScotGrid: Durham, Edinburgh, Glasgow
NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
London: Brunel, Imperial, QMUL, RHUL, UCL
Tier-2 Resources
Committed resources at each Tier-2 in 2007 / experiments' requirement of a Tier-2 in 2008:

                    CPU                           Disk
            ALICE  ATLAS   CMS  LHCb      ALICE  ATLAS   CMS  LHCb
London        0.0    1.0   0.8   0.4        0.0    0.2   0.3  11.0
NorthGrid     0.0    2.5   0.0   0.3        0.0    1.3   0.0  12.1
ScotGrid      0.0    0.2   0.0   0.2        0.0    0.0   0.0  39.6
SouthGrid     0.2    0.5   0.2   0.3        0.0    0.1   0.0   6.8

Doesn't include SRIF3. Experiment shares determined by the Institutes who bought the kit.
Need SRIF3 resources! The overall LCG shortfall is ~30% in CPU and ~50% in disk (all Tiers).
The LCG Grid
123 Sites
33 Countries
10,314 CPUs
3.3 PBytes Disk
Grid Demo
http://www.hep.ph.ic.ac.uk/e-science/projects/demo/index.html
Getting Started
1. Get a digital certificate.
   Authentication – who you are.
   http://ca.grid-support.ac.uk/
2. Join a Virtual Organisation (VO). For the LHC, join LCG and choose a VO.
   Authorisation – what you are allowed to do.
   http://lcg-registrar.cern.ch/
3. Get access to a local User Interface machine (UI) and copy your files and certificate there (a sketch follows below).
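Where the certificate goes on the UI follows the usual Globus conventions; a minimal sketch, assuming the standard ~/.globus layout (check your own site's instructions):

mkdir -p ~/.globus
cp usercert.pem userkey.pem ~/.globus/
chmod 644 ~/.globus/usercert.pem   # the certificate may be world-readable
chmod 400 ~/.globus/userkey.pem    # the private key must be readable only by you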
Job Preparation
Prepare a file of Job Description Language (JDL):

############# athena.jdl #################
Executable = "athena.sh";
StdOutput = "athena.out";
StdError = "athena.err";
InputSandbox = {"athena.sh", "MyJobOptions.py", "MyAlg.cxx", "MyAlg.h", "MyAlg_entries.cxx",
                "MyAlg_load.cxx", "login_requirements", "requirements", "Makefile"};
OutputSandbox = {"athena.out", "athena.err", "ntuple.root", "histo.root", "CLIDDBout.txt"};
Requirements = Member("VO-atlas-release-9.0.4",
                      other.GlueHostApplicationSoftwareRunTimeEnvironment);
################################################

Executable is the script to run; the InputSandbox carries the job options and my C++ code; the OutputSandbox lists the output files to bring back; Requirements selects the ATLAS version (satisfied by ~32 sites).
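The athena.sh wrapper itself is not shown here; a minimal, hypothetical sketch might look like the following (the real script would also set up the site's ATLAS 9.0.4 release and build MyAlg, both of which are site- and release-specific):

#!/bin/sh
# Hypothetical wrapper, not from the slides. The InputSandbox files arrive
# in the job's working directory on the worker node.
ls -l
# ... set up the ATLAS release provided by the site and build MyAlg here ...
# Run Athena on the supplied job options; stdout and stderr are captured by
# the StdOutput/StdError attributes in the JDL.
athena MyJobOptions.py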
Job Submission
Make a copy of your certificate to send out (~ once a day):
[lloyd@lcgui ~/atlas]$ grid-proxy-init
Your identity: /C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=steve lloyd
Enter GRID pass phrase for this identity:
Creating proxy .............................. Done
Your proxy is valid until: Thu Mar 17 03:25:06 2005
[lloyd@lcgui ~/atlas]$
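To check how much lifetime the proxy has left, the standard Globus companion command can be used (a sketch; not shown on the original slide):

grid-proxy-info -timeleft   # prints the remaining proxy lifetime in seconds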
Submit the job (--vo gives the VO, -o names the file that will hold the job IDs, and the final argument is the JDL file):
[lloyd@lcgui ~/atlas]$ edg-job-submit --vo atlas -o jobIDfile athena.jdl
Selected Virtual Organisation name (from --vo option): atlas
Connecting to host lxn1188.cern.ch, port 7772
Logging to host lxn1188.cern.ch, port 9002
================================ edg-job-submit Success ====================================
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
- https://lxn1188.cern.ch:9000/0uDjtwbBbj8DTRetxYxoqQ
The edg_jobId has been saved in the following file:
/home/lloyd/atlas/jobIDfile
============================================================================================
[lloyd@lcgui ~/atlas]$
Job Status
Find out its status:

[lloyd@lcgui ~/atlas]$ edg-job-status -i jobIDfile
------------------------------------------------------------------
1 : https://lxn1188.cern.ch:9000/tKlZHxqEhuroJUhuhEBtSA
2 : https://lxn1188.cern.ch:9000/IJhkSObaAN5XDKBHPQLQyA
3 : https://lxn1188.cern.ch:9000/BMEOq90zqALvkriHdVeN7A
4 : https://lxn1188.cern.ch:9000/l6wist7SMq6jVePwQjHofg
5 : https://lxn1188.cern.ch:9000/wHl9Yl_puz9hZDMe1OYRyQ
6 : https://lxn1188.cern.ch:9000/PciXGNuAu7vZfcuWiGS3zQ
7 : https://lxn1188.cern.ch:9000/0uDjtwbBbj8DTRetxYxoqQ
a : all
q : quit
------------------------------------------------------------------
Choose one or more edg_jobId(s) in the list - [1-7]all:7

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://lxn1188.cern.ch:9000/0uDjtwbBbj8DTRetxYxoqQ
Current Status:   Done (Success)
Exit code:        0
Status Reason:    Job terminated successfully
Destination:      lcg00125.grid.sinica.edu.tw:2119/jobmanager-lcgpbs-short
reached on:       Wed Mar 16 17:45:41 2005
*************************************************************
[lloyd@lcgui ~/atlas]$

(Callouts on the slide mark where the listed jobs ran: Valencia, RAL, CERN and Taiwan; this job ran in Taiwan.)
Job Retrieval
Retrieve the Output:
[lloyd@lcgui ~/atlas]$ edg-job-get-output -dir . -i jobIDfile
Retrieving files from host: lxn1188.cern.ch ( for
https://lxn1188.cern.ch:9000/0uDjtwbBbj8DTRetxYxoqQ )
*********************************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
- https://lxn1188.cern.ch:9000/0uDjtwbBbj8DTRetxYxoqQ
have been successfully retrieved and stored in the directory:
/home/lloyd/atlas/lloyd_0uDjtwbBbj8DTRetxYxoqQ
*********************************************************************************
[lloyd@lcgui ~/atlas]$ ls -lt /home/lloyd/atlas/lloyd_0uDjtwbBbj8DTRetxYxoqQ
total 11024
-rw-r--r--  1 lloyd  hep       224 Mar 17 10:47 CLIDDBout.txt
-rw-r--r--  1 lloyd  hep     69536 Mar 17 10:47 ntuple.root
-rw-r--r--  1 lloyd  hep      5372 Mar 17 10:47 athena.err
-rw-r--r--  1 lloyd  hep  11185282 Mar 17 10:47 athena.out
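From here the output can be inspected locally in the usual way, for example (assuming ROOT is installed on the UI or your desktop):

cd /home/lloyd/atlas/lloyd_0uDjtwbBbj8DTRetxYxoqQ
root -l ntuple.root    # open the returned ntuple and histograms interactively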
Conclusions
• The Grid is here – it works!
• Currently it is difficult to install and maintain the middleware and the experiments' software
• It is straightforward to use
• There are huge resources available: last week LXBATCH had 6,500 ATLAS jobs queued, while LCG had 3,017 free CPUs
• Need to scale to full size: ~10,000 → 100,000 CPUs
• Need stability, robustness, security (Hackers' Paradise!), etc.
• Need continued funding beyond the start of the LHC!
Use it!
Further Info
http://www.gridpp.ac.uk