Tutorial: Technology of the Grid
1. Definition
2. Components
3. Infrastructure
Kento Aida
Tokyo Institute of Technology
Kento Aida, Tokyo Institute of Technology
Goal of the Tutorial
What is the grid?
definition
What technology is needed to create the grid?
component technology
How is the grid environment constructed?
infrastructure
1. Definition
Definition of the Grid
Definition [http://www.jpgrid.org/about/index.html]
The grid is an infrastructure to dynamically
organize a virtual organization (or a virtual
computer) on demand by virtualizing and
integrating resources such as computers, data,
experimental devices, sensors, people.
(The original definition is written in Japanese.)
What is the grid? A three point checklist
[http://www.gridtoday.com/02/0722/100136.html]
coordinates resources that are not subject to centralized
control
using standard, open, general-purpose protocols and
interfaces
to deliver nontrivial qualities of service
What can we do using the grid?
We can use information resources (services) on the network
securely (to guarantee security),
stably (to use required resources on demand), and
easily (without knowledge of networks, computers, …).
[Figure: a virtual organization formed over the Internet]
Examples of Virtual Organizations
Members in a collaborative research project
Researchers in a collaborative research project share
resources distributed over their sites, e.g. universities,
institutes, laboratories, ….
large-scale scientific computing
large-scale distributed database
Project team in a company
Members in a project team share resources distributed
over multiple branches in a company.
business transactions
Definition of the Grid (again)
Definition
The grid is an infrastructure to dynamically
organize a virtual organization (or a virtual
computer) on demand by virtualizing and
integrating resources … .
What is the grid? A three point checklist
coordinates resources that are not subject to centralized
control
dynamic organization of VO
using standard, open, general-purpose protocols and
interfaces
access to resources by standardized protocols
to deliver nontrivial qualities of service
Users do not need knowledge of networks, computers, etc.
Grid?
Grid = supercomputer + network?
Grid = idle PCs + network?
Grid = large-scale parallel processing on the
internet?
If we connect our resources to the grid, anonymous
users’ jobs will run on our resources without owners’
knowledge?
If we submit jobs to the grid, our jobs will run on
resources at unknown sites?
Classification of the Grid
Computing Grid (high-performance computing)
Data Grid (high-performance data processing)
Sensor Grid (advanced sensing)
Access Grid (support for collaboration)
Business Grid (advanced web service)
PC Grid (utilization of idle PCs)
[Figure: the grid classes are placed between the labels "science" and "business"]
Computing Grid
Grid computing
high-performance computing service to utilize computers
on the grid
Merits for users
reducing computation time
expanding problem size
receiving computation service
Component technology
security, resource management, job management,
programming, problem solving environment (PSE), …
Data Grid
Large-scale data processing/computing
large-scale distributed database on the internet
data processing service to access distributed data
Merits for users
high-speed access to distributed data
high-performance and reliable processing using large-scale data
Component technology
security, high-speed data transfer, replica management,
scheduling
Access Grid
Communication support on the grid
Example
remote conference
virtual laboratory
remote medical service
SARS Grid (NCHC)
entertainment
“KARAOKE” Grid (AIST)
Sensor Grid
Advanced Monitoring
coordination of autonomous sensors connected by
network
wired network, wireless network, satellite, …
advanced sensing, analysis, forecasting
Example
meteorology (weather forecast), ecology, agriculture, …
Technical Issues of the Grid
Component technology
security, information service, resource management
job management, scheduling
data management
programming
problem solving environment (PSE)
Infrastructure
production grid
Application
applying to big science
applying to business
2. Components
Component Technology of the Grid
[Figure: layered stack of component technologies, top to bottom]
application
problem solving environment
programming
information service, job management, data management
resource management
security
infrastructure (computer, network, experimental device, …)
Security
Issues
authentication, encryption of communication
Single sign-on
The user authenticates on one host.
Authentication on other hosts is performed automatically.
[Figure: the user authenticates once over the internet; authentication at Org. A, Org. B, and Org. C is then performed automatically]
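The single sign-on idea can be sketched in a few lines. This is a toy model only, not the Globus security implementation: a shared HMAC key stands in for the certificate authority, and the names `CA_KEY`, `issue_proxy`, and `host_accepts` are hypothetical. Real grids use X.509 certificates and public-key signatures.

```python
import hashlib
import hmac

# Assumption: a shared secret stands in for the CA's signing key.
CA_KEY = b"toy-ca-key"


def sign(message: str) -> str:
    """Return a hex signature of the message under the toy CA key."""
    return hmac.new(CA_KEY, message.encode(), hashlib.sha256).hexdigest()


def issue_proxy(user: str, expires_at: int) -> dict:
    """Sign-on step: performed once by the user, yields a short-lived credential."""
    body = f"{user}|{expires_at}"
    return {"user": user, "expires_at": expires_at, "sig": sign(body)}


def host_accepts(proxy: dict, now: int) -> bool:
    """Each host verifies the credential itself; the user is not prompted again."""
    body = f"{proxy['user']}|{proxy['expires_at']}"
    valid_signature = hmac.compare_digest(proxy["sig"], sign(body))
    return valid_signature and now < proxy["expires_at"]


# The user signs on once; every organization then checks the same credential.
proxy = issue_proxy("alice", expires_at=1000)
results = {org: host_accepts(proxy, now=10) for org in ("Org. A", "Org. B", "Org. C")}
```

The credential carries an expiry time so that a stolen credential is only useful for a limited period.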
Resource Management
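Common interfaces to the grid
wrapping differences of commands/operations among different machines
[Figure: the user issues a common command over the internet; a gateway (GW) at each organization maps it to the local command (com. a, com. b, com. c) of that site's OS (OS A at Org. A, OS B at Org. B, OS C at Org. C)]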
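A minimal sketch of the wrapping idea, assuming each gateway holds a template for its local command. The command names `com_a`, `com_b`, `com_c` mirror the figure labels and are placeholders, not real commands.

```python
# Assumption: one local command template per site, as in the figure labels.
LOCAL_COMMANDS = {
    "Org. A": "com_a {job}",
    "Org. B": "com_b {job}",
    "Org. C": "com_c {job}",
}


def gateway_translate(site: str, job: str) -> str:
    """Gateway side: turn the common request into the site's local command line."""
    return LOCAL_COMMANDS[site].format(job=job)


def common_submit(job: str, sites: list) -> dict:
    """User side: one common command, regardless of the machines behind it."""
    return {site: gateway_translate(site, job) for site in sites}


commands = common_submit("job.sh", ["Org. A", "Org. B", "Org. C"])
```

The user only ever calls `common_submit`; adding a new kind of machine means adding one entry at its gateway.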
Information Service
Information about resources on the grid
resource attributes: CPU: …, memory: …, OS: …
network monitoring
[Figure: an info. service collects resource information over the internet from the gateways (GW) of Org. A, Org. B, and Org. C]
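An information service can be pictured as a registry that sites publish to and clients query. The sketch below is a hypothetical in-memory model (the class and field names are assumptions), not the interface of any real grid information service.

```python
from dataclasses import dataclass


@dataclass
class ResourceInfo:
    site: str
    cpu_ghz: float
    memory_mb: int
    os: str


class InfoService:
    """Toy registry: sites register their attributes, clients filter on them."""

    def __init__(self):
        self._records = {}

    def register(self, info: ResourceInfo) -> None:
        # A later registration from the same site replaces the older record.
        self._records[info.site] = info

    def query(self, min_cpu_ghz=0.0, min_memory_mb=0, os=None) -> list:
        return sorted(
            r.site
            for r in self._records.values()
            if r.cpu_ghz >= min_cpu_ghz
            and r.memory_mb >= min_memory_mb
            and (os is None or r.os == os)
        )


service = InfoService()
service.register(ResourceInfo("Org. A", cpu_ghz=2.4, memory_mb=1024, os="Linux"))
service.register(ResourceInfo("Org. B", cpu_ghz=1.0, memory_mb=256, os="Linux"))
service.register(ResourceInfo("Org. C", cpu_ghz=3.0, memory_mb=2048, os="Solaris"))
```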
Big picture of the GT2
[Figure: overview of GT2. Component labels: CA; User Cert.; grid-proxy-init; Proxy Cert.; Client; GIIS; GRIS; gatekeeper; GridFTP Server; process; Site B; Site C. Arrow labels: Query Resource Status; Process invocation; Data Transfer; Return result.]
[source: Yoshio Tanaka, AIST]
Job Management
Resource selection, Scheduling, Job control
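[Figure: the user, a resource broker, and the info. service are connected over the internet to the gateways (GW) of Org. A, Org. B, and Org. C. Step labels in the figure: (0), (1,3,4), (2), (4).]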
Condor
High Throughput Computing
matching jobs and resources by the ClassAds mechanism
fault tolerance by checkpointing
Implementation on the Globus Toolkit
Condor-G
[Figure: Client, job, Schedd, Match maker, Startd]
Example job description in the figure:
owner: aaa
CPU: 2 GHz or more
Memory: 512 MB or more
Disk: 10 GB or more
:
[ http://www.cs.wisc.edu/condor/ ]
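The matching step can be illustrated with the job description shown on the slide. This is a simplified sketch: real ClassAds are a two-way expression language in which machines also state requirements on jobs, whereas here only minimum thresholds on the job side are checked. The machine names and attribute keys are hypothetical.

```python
# Job description from the slide: CPU 2 GHz or more, memory 512 MB or more,
# disk 10 GB or more. The attribute keys are assumptions for this sketch.
job_ad = {
    "owner": "aaa",
    "requirements": {"cpu_ghz": 2.0, "memory_mb": 512, "disk_gb": 10},
}

# Hypothetical machine descriptions advertised by resources.
machine_ads = [
    {"name": "node1", "cpu_ghz": 2.4, "memory_mb": 1024, "disk_gb": 40},
    {"name": "node2", "cpu_ghz": 1.0, "memory_mb": 2048, "disk_gb": 80},
    {"name": "node3", "cpu_ghz": 3.0, "memory_mb": 512, "disk_gb": 10},
]


def matches(job: dict, machine: dict) -> bool:
    """True if the machine meets every minimum stated by the job."""
    return all(machine.get(key, 0) >= minimum
               for key, minimum in job["requirements"].items())


def match_make(job: dict, machines: list) -> list:
    """Return the names of all machines the job could run on."""
    return [m["name"] for m in machines if matches(job, m)]


candidates = match_make(job_ad, machine_ads)
```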
Scheduling
Application scheduling
Scheduling of a single application (job) on resources
How do we decompose an application program into tasks?
Where do we allocate tasks?
When do we start execution of tasks?
Job scheduling
Scheduling of multiple jobs on resources
Where do we dispatch jobs on resources?
When do we start execution of jobs?
Goal
minimizing the execution time, meeting the deadline,
minimizing the cost, preserving fairness, …
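As one concrete instance of job scheduling with the goal of minimizing the execution time, the sketch below dispatches the longest jobs first, each to the resource that becomes free earliest. This greedy rule is a well-known heuristic chosen here for illustration; the slides do not prescribe a particular algorithm, and the job names and run times are made up.

```python
import heapq


def schedule(job_times: dict, n_resources: int):
    """Longest-job-first greedy dispatch.

    Returns ({job: (resource, start_time)}, makespan).
    """
    # Heap of (time at which the resource becomes free, resource id).
    free_at = [(0.0, r) for r in range(n_resources)]
    heapq.heapify(free_at)
    assignment = {}
    for job, run_time in sorted(job_times.items(), key=lambda kv: -kv[1]):
        start, resource = heapq.heappop(free_at)   # "where" and "when"
        assignment[job] = (resource, start)
        heapq.heappush(free_at, (start + run_time, resource))
    makespan = max(t for t, _ in free_at)
    return assignment, makespan


assignment, makespan = schedule({"a": 4, "b": 3, "c": 2, "d": 1}, n_resources=2)
```

Other goals listed above, such as meeting a deadline or minimizing cost, would change the dispatch rule but keep the same two questions: where and when.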
Nimrod
Job management system for parameter-survey applications
computational economy
deadline scheduling
Implementation on the Globus Toolkit
Nimrod/G
[source: D. Abramson, et al., “High Performance Parametric Modeling with Nimrod/G: Killer Application for the Global Grid?,” IPDPS2000, 2000 ]
[ http://www.csse.monash.edu.au/~davida/nimrod.html/ ]
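The combination of a parameter survey, a deadline, and a cost per resource can be sketched as follows. This is not the Nimrod/G algorithm; it is a simple cheapest-first heuristic under the assumptions that all jobs are the same size and that each resource has a known throughput and price. All names and numbers are hypothetical.

```python
from itertools import product


def parameter_sweep(params: dict) -> list:
    """One job per combination of parameter values."""
    names = sorted(params)
    return [dict(zip(names, values))
            for values in product(*(params[n] for n in names))]


def select_resources(n_jobs: int, resources: list, deadline_hours: float):
    """Add resources cheapest-first until the estimated finish time meets the deadline.

    resources: list of (name, jobs_per_hour, cost_per_job).
    Returns the chosen names, or None if the deadline cannot be met.
    """
    chosen, rate = [], 0.0
    for name, jobs_per_hour, _cost in sorted(resources, key=lambda r: r[2]):
        chosen.append(name)
        rate += jobs_per_hour
        if n_jobs / rate <= deadline_hours:
            return chosen
    return None


jobs = parameter_sweep({"x": [1, 2, 3], "y": [0.1, 0.2]})
resources = [("fast", 10, 5.0), ("cheap", 1, 1.0), ("mid", 2, 2.0)]
plan = select_resources(len(jobs), resources, deadline_hours=2.0)
```

A tighter deadline forces the selection to include more expensive resources, which is the trade-off a computational economy exposes to the user.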
Data Management
Distributed file management, High-speed file transfer, Replica management
[Figure: data management over the internet; a file is replicated among Org. A, Org. B, and Org. C through their gateways (GW) by high-speed file transfer, and the user accesses it]
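Replica management can be illustrated by a catalog that maps a logical file name to the sites holding a copy, plus a rule for picking one. The catalog contents, the bandwidth figures, and the "highest bandwidth wins" rule are assumptions made for this sketch.

```python
# Hypothetical replica catalog: logical file name -> sites holding a copy.
replica_catalog = {"lfn://data/run42.dat": ["Org. A", "Org. B"]}

# Hypothetical measured bandwidth from each site to the user, in Mbps.
bandwidth_mbps = {"Org. A": 100.0, "Org. B": 950.0, "Org. C": 500.0}


def best_replica(lfn: str) -> str:
    """Pick the replica site with the highest bandwidth to the user."""
    return max(replica_catalog[lfn], key=lambda site: bandwidth_mbps[site])


def transfer_seconds(size_mbytes: float, site: str) -> float:
    """Estimated transfer time: megabytes * 8 bits per byte / megabits per second."""
    return size_mbytes * 8 / bandwidth_mbps[site]


def replicate(lfn: str, site: str) -> None:
    """Record a new copy so later reads can choose it."""
    if site not in replica_catalog[lfn]:
        replica_catalog[lfn].append(site)


source = best_replica("lfn://data/run42.dat")
estimate = transfer_seconds(950, source)
```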
Data Grid Applications
High Energy Physics
Earth Science, Astronomical Observation
Bioinformatics
[source: Osamu Tatebe, AIST]
Grid Datafarm
• Peta-to-Exascale Global Filesystem on unified CPU/storage cluster
• Parallel I/O and parallel processing with local I/O scalability
[source: Osamu Tatebe, AIST]
Trans-Pacific Gfarm Datafarm testbed: Network and cluster configuration
SuperSINET Trans-Pacific theoretical peak 3.9 Gbps
[Figure: network map of the testbed; the labels that survive extraction are listed below]
Header labels: Gfarm disk capacity, disk read/write
Cluster labels: 147 nodes, 16 TBytes, 4 GB/sec (next to Titech); 10 nodes, 1 TBytes, 300 MB/sec (next to Univ Tsukuba); 16 nodes, 11.7 TBytes, 1 GB/sec (next to AIST); 7 nodes, 3.7 TBytes, 200 MB/sec; 16 nodes, 11.7 TBytes, 1 GB/sec; 32 nodes, 23.3 TBytes, 2 GB/sec; 70 TBytes, 13 GB/sec
Sites and networks: Titech, Univ Tsukuba, NII, KEK, Maffin, AIST, Tsukuba WAN, APAN Tokyo XP, SuperSINET, APAN/TransPAC, Abilene, New York, Chicago, Los Angeles, Indiana Univ, SDSC, SC2003 Phoenix, Kasetsart Univ (Thailand)
Link labels: 10G, 5G, 2.4G, 2.4G(1G), 1G, OC-12 ATM 622M
Bracketed labels: [950 Mbps], [2.34 Gbps], [500 Mbps]
[source: Osamu Tatebe, AIST]
Programming
MPI
programming with Message Passing Interface
MPICH-G2, GridMPI, …
GridRPC
programming with Remote Procedure Call (RPC) mechanism
Ninf-G, OmniRPC, NetSolve, …
Master Worker Template
template to develop master-worker programs
MW, AMWAT, …
GridRPC
[Figure: the user program (master) calls library programs on workers over the internet, sending input data and receiving output data]
Call loop in the user program:
for (…) {
grpc_call_async( )
}
GridRPC (cont’d)
Ninf-G [ http://ninf.apgrid.org/ ]
reference implementation of GridRPC
implementation on the Globus Toolkit
using the security functions of Globus (authentication,
encrypted communication)
Original sequential loop:
for (i = start; i <= end; i++) {
    SDP_search(argv[1], i, &value[i]);
}
GridRPC version (arguments elided as on the slide):
grpc_function_handle_init(&hdl, …, "SDP/search");
for (i = start; i <= end; i++) {
    grpc_call_async(&hdl, argv[1], i, &value[i]);
}
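The change from a sequential loop to asynchronous calls can be mimicked locally in Python. This is an analogy only: a thread pool stands in for the remote workers, `pool.submit` plays the role of the asynchronous call, and `sdp_search` is a hypothetical stand-in whose body is invented for the example. It does not use Ninf-G or any GridRPC library.

```python
from concurrent.futures import ThreadPoolExecutor


def sdp_search(problem: str, i: int) -> int:
    """Stand-in for the remote library routine; the computation is made up."""
    return len(problem) * i


def run_sequential(problem: str, start: int, end: int) -> dict:
    """Counterpart of the original loop: one call after another."""
    return {i: sdp_search(problem, i) for i in range(start, end + 1)}


def run_async(problem: str, start: int, end: int, workers: int = 4) -> dict:
    """Counterpart of the asynchronous loop: issue all calls, then collect results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each submit returns immediately with a future, like an async call handle.
        futures = {i: pool.submit(sdp_search, problem, i)
                   for i in range(start, end + 1)}
        # Waiting on every future corresponds to waiting for all outstanding calls.
        return {i: future.result() for i, future in futures.items()}


values = run_async("abc", 1, 3)
```

Both versions produce the same results; the asynchronous one lets independent iterations overlap on different workers.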
Problem Solving Environment (PSE)
Portal
frontend to search, run, monitor, and control applications
on the grid
Web page
cooperation with a workflow system
Workflow
mechanism to run multiple applications following their dependencies
representing dependencies among applications by a graph
The workflow engine starts applications in the order given by the workflow.
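A workflow engine in its simplest form is a topological traversal of the dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the application names and the `launch` callback are hypothetical.

```python
from graphlib import TopologicalSorter

# Dependency graph: each application maps to the set of applications
# that must finish before it can start.
workflow = {
    "preprocess": set(),
    "simulate": {"preprocess"},
    "analyze": {"simulate"},
    "visualize": {"simulate"},
    "report": {"analyze", "visualize"},
}


def run_workflow(graph: dict, launch) -> list:
    """Start every application after all of its predecessors; return the order used."""
    order = []
    for app in TopologicalSorter(graph).static_order():
        launch(app)
        order.append(app)
    return order


started = []
order = run_workflow(workflow, started.append)
```

A production engine would additionally start independent applications, such as `analyze` and `visualize` here, in parallel and handle failures; the ordering constraint stays the same.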
Example of PSE (UNICORE)
[source: http://www.unicore.org/unicore.htm]
3. Infrastructure
Resources in Grid Infrastructure
Computer
PC, PC cluster,
supercomputer, …
Storage
HDD, RAID, …
[source: http://www.gsic.titech.ac.jp/Japanese/Service/R_System/Overview/index.html]
[source: Matsuoka Lab, TITECH]
Resources in Grid Infrastructure (cont’d)
Experimental device
microscope, accelerator, …
Sensor
thermometer, camera, …
Large Hadron Collider, CERN [source: Osamu Tatebe, AIST]
Ultra-High Voltage Electron Microscope, Osaka University [source: http://www.biogrid.jp/]
EcoGrid, NCHC [source: Fang Pang Lin, NCHC]
Resources in Grid Infrastructure (cont’d)
Network
LAN, WAN, internet, …
[source: http://www.noc.titech.ac.jp/titanet/supertitanet/index.ja.shtml]
[ source: http://www.apan.net/]
Grid Infrastructure
Classification by objectives
test bed
a grid environment constructed to perform experiments
temporarily available
production grid
a grid environment for production use, or to run practical applications
permanently available
Resources are operated 24 hours a day.
Classification by geographic sites
department grid, campus grid, national grid, international grid
ACT-JST Testbed
Grid testbed for running applications that solve large-scale optimization problems
construction of a 1000-CPU-scale testbed
application development
collaboration among Grid researchers and application scientists
Sites: AIST, TDU, TITECH, Tokushima U.
Grid Challenge Federation (GCF)
Test bed constructed for the Grid Challenge event,
a programming contest on the grid
Resources
Grid Technology Research Center, AIST
HPCS Lab., U. Tsukuba
Yuba-Honda Lab., UEC
Matsuoka Lab., TITECH
Aida Lab., TITECH
Ono Lab., Tokushima U.
Hiraki Lab., U. Tokyo
Chikayama-Taura Lab., U. Tokyo
ApGrid/PRAGMA
Grid partnership in the Asia-Pacific region
[ source: http://www.apgrid.org/]
Titech Grid
[source: http://www.gsic.titech.ac.jp/index-j.html]
NAREGI
[source: http://www.naregi.org/ ]
TeraGrid
A 40 Gbps network connects the sites.
20 TeraFlops, 1 PB of resources
Caltech, ANL, SDSC, NCSA, PSC
[source: http://www.teragrid.org/]
Operation of Infrastructure
Objectives
An organization/staff is required to stably provide a grid infrastructure to users.
The current internet is operated by expert organizations for network operation.
Network Operation Center (NOC)
Grid Operation Center
organization to operate a grid infrastructure
providing information on grid resources
resources in VO
load on computing resources, traffic on networks, …
user support
accounting, document archives, help desk, troubleshooting, …
PRAGMA GOC
Network Weather Map
http://mrtg.koganei.itrc.net/mmap/grid.html
Thanks: Dr. Hirabaru and APAN Tokyo NOC team