Transcript Document

1
Tutorial: Technology of the Grid
1. Definition
2. Components
3. Infrastructure
Kento Aida
Tokyo Institute of Technology
Kento Aida, Tokyo Institute of Technology
2
Goal of the Tutorial
• What is the grid?
  → definition
• What technology is needed to create the grid?
  → component technology
• How is the grid environment constructed?
  → infrastructure
3
1. Definition
4
Definition of the Grid
Definition [http://www.jpgrid.org/about/index.html]
The grid is an infrastructure to dynamically
organize a virtual organization (or a virtual
computer) on demand by virtualizing and
integrating resources such as computers, data,
experimental devices, sensors, and people.
(The original definition is written in Japanese.)
• What is the grid? A three-point checklist
  [http://www.gridtoday.com/02/0722/100136.html]
  - coordinates resources that are not subject to centralized control
  - using standard, open, general-purpose protocols and interfaces
  - to deliver nontrivial qualities of service
5
What can we do using the grid?
We can use information resources (services) on the network securely (with guaranteed security), stably (using the required resources on demand), and easily (without knowledge of the network, computers, …).
[Figure: a virtual organization formed over the internet]
6
Examples of Virtual Organizations
• Members of a collaborative research project
  - Researchers in a collaborative research project share resources distributed over their sites, e.g., universities, institutes, laboratories, …
    → large-scale scientific computing
    → large-scale distributed database
• A project team in a company
  - Members of a project team share resources distributed over multiple branches of the company.
    → business
    → transaction
7
Definition of the Grid (again)
Definition
The grid is an infrastructure to dynamically
organize a virtual organization (or a virtual
computer) on demand by virtualizing and
integrating resources … .
• What is the grid? A three-point checklist
  - coordinates resources that are not subject to centralized control
    → dynamic organization of a VO
  - using standard, open, general-purpose protocols and interfaces
    → access to resources through standardized protocols
  - to deliver nontrivial qualities of service
    → Users need no knowledge of the network, computers, etc.
8
Grid?
• Grid = supercomputer + network?
• Grid = idle PCs + network?
• Grid = large-scale parallel processing on the internet?
• If we connect our resources to the grid, will anonymous users' jobs run on our resources without the owners' knowledge?
• If we submit jobs to the grid, will our jobs run on resources at unknown sites?
9
Classification of the Grid
[Figure: grid types arranged between science-oriented and business-oriented uses]
• Computing Grid (high-performance computing)
• Data Grid (high-performance data processing)
• Sensor Grid (advanced sensing)
• Access Grid (support for collaboration)
• Business Grid (advanced web service)
• PC Grid (utilization of idle PCs)
10
Computing Grid
• Grid computing
  - high-performance computing service that utilizes computers on the grid
• Merits for users
  → reducing computation time
  → expanding problem size
  → receiving computation service
• Component technology
  → security, resource management, job management, programming, problem solving environment (PSE), …
11
Data Grid
• Large-scale data processing/computing
  - large-scale distributed database on the internet
  - data processing service to access distributed data
• Merits for users
  → high-speed access to distributed data
  → high-performance and reliable processing using large-scale data
• Component technology
  → security, high-speed data transfer, replica management, scheduling
12
Access Grid
• Communication support on the grid
• Examples
  - remote conference
  - virtual laboratory
  - remote medical service
    → SARS Grid (NCHC)
  - entertainment
    → “KARAOKE” Grid (AIST)
13
Sensor Grid
• Advanced monitoring
  - coordination of autonomous sensors connected by a network
    → wired network, wireless network, satellite, …
  - advanced sensing, analysis, forecasting
• Examples
  → meteorology (weather forecasting), ecology, agriculture, …
14
Technical Issues of the Grid
• Component technology
  → security, information service, resource management
  → job management, scheduling
  → data management
  → programming
  → problem solving environment (PSE)
• Infrastructure
  → production grid
• Application
  → applying to big science
  → applying to business
15
2. Components
16
Component Technology of the Grid
[Figure: layers of grid component technology, from top to bottom]
application
problem solving environment
programming
information service, job management, data management
resource management
security
infrastructure (computer, network, experimental device, …)
17
Security
• Issues
  - authentication, encryption of communication
• Single sign-on
  - The user authenticates on one host.
  - Authentication on the other hosts is then performed automatically.
[Figure: a user authenticates once; authentication at Org. A, Org. B, and Org. C across the internet is then performed automatically]
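A toy model of single sign-on may help: the user authenticates once to obtain a short-lived proxy credential, and each site then checks the chain proxy → user → trusted CA instead of asking the user again. This is a sketch under loud assumptions: the `credential` struct, `make_proxy`, and `site_accepts` are hypothetical, and comparing issuer names stands in for the X.509 certificates and cryptographic signature checks that real grid security (e.g., proxy certificates in the Globus Toolkit) relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Toy credential (hypothetical): "issuer" stands in for a real signature,
   so only the trust logic of single sign-on is modeled, not the crypto. */
typedef struct {
    char subject[64];
    char issuer[64];
    time_t not_after;   /* expiry time */
} credential;

/* The user authenticates once (e.g., by passphrase) and derives a
   short-lived proxy credential issued by the user's own credential.
   The proxy never outlives the user credential. */
static credential make_proxy(const credential *user, time_t now, long lifetime_s) {
    assert(lifetime_s > 0);
    credential p;
    snprintf(p.subject, sizeof p.subject, "%s/proxy", user->subject);
    snprintf(p.issuer, sizeof p.issuer, "%s", user->subject);
    p.not_after = now + lifetime_s;
    if (p.not_after > user->not_after) p.not_after = user->not_after;
    return p;
}

/* Each site accepts the proxy without asking the user again if the chain
   proxy -> user -> trusted CA holds and neither credential has expired. */
static bool site_accepts(const credential *proxy, const credential *user,
                         const char *trusted_ca, time_t now) {
    return strcmp(proxy->issuer, user->subject) == 0
        && strcmp(user->issuer, trusted_ca) == 0
        && now <= proxy->not_after
        && now <= user->not_after;
}
```

The short proxy lifetime is the design point: if a proxy leaks, it is useful only until it expires, while the long-lived user credential never leaves the client.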
18
Resource Management
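• Common interfaces to the grid
  - hiding differences in commands/operations among different machines
[Figure: the user issues a common command over the internet; a gateway (GW) at each organization converts it into the local command (com. a on OS A at Org. A, com. b on OS B at Org. B, com. c on OS C at Org. C)]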
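The gateway idea resembles the adapter pattern: one common command on the user side, one translator per site. A minimal sketch, assuming hypothetical local commands (`com_a`, `com_b`, `com_c` only echo the slide's labels and are not real programs); `gateway` and `common_submit` are likewise illustrative names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One gateway per site: it translates the common request into the
   site's local command. All command strings are hypothetical. */
typedef struct {
    const char *site;
    const char *os;
    int (*translate)(const char *job, char *out, size_t outlen);
} gateway;

static int translate_a(const char *job, char *out, size_t n) {
    return snprintf(out, n, "com_a --run %s", job);
}
static int translate_b(const char *job, char *out, size_t n) {
    return snprintf(out, n, "com_b -f %s", job);
}
static int translate_c(const char *job, char *out, size_t n) {
    return snprintf(out, n, "com_c submit %s", job);
}

static const gateway gateways[] = {
    {"Org. A", "OS A", translate_a},
    {"Org. B", "OS B", translate_b},
    {"Org. C", "OS C", translate_c},
};

/* The user issues one common "submit" for any site; the matching gateway
   writes the local command into out. Returns 0, or -1 for an unknown site. */
static int common_submit(const char *site, const char *job, char *out, size_t n) {
    assert(out != NULL && n > 0);
    for (size_t i = 0; i < sizeof gateways / sizeof gateways[0]; i++) {
        if (strcmp(gateways[i].site, site) == 0) {
            gateways[i].translate(job, out, n);
            return 0;
        }
    }
    return -1;
}
```

Adding a new kind of machine then means adding one translator and one table entry; the user's command does not change.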
19
Information Service
• Information about resources on the grid
[Figure: an information service collects resource information (CPU: …, memory: …, OS: …) and network monitoring data from gateways (GW) at Org. A, Org. B, and Org. C over the internet]
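An information service can be pictured as a table of resource records that clients query by attribute. A minimal sketch, assuming a hypothetical `resource_info` record with fixed fields taken from the slide (CPU, memory, OS); real grid information services (e.g., the Globus MDS) use their own schemas and query protocols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical resource record reported by a site's gateway. */
typedef struct {
    const char *site;
    double cpu_ghz;
    int memory_mb;
    const char *os;
} resource_info;

/* Query: store in out[] the indices of records with at least min_memory_mb
   of memory and, if os is not NULL, a matching OS name.
   Returns the number of matches (at most out_cap). */
static size_t query(const resource_info *db, size_t n,
                    int min_memory_mb, const char *os,
                    size_t *out, size_t out_cap) {
    assert(out != NULL || out_cap == 0);
    size_t k = 0;
    for (size_t i = 0; i < n && k < out_cap; i++) {
        if (db[i].memory_mb >= min_memory_mb &&
            (os == NULL || strcmp(db[i].os, os) == 0)) {
            out[k++] = i;
        }
    }
    return k;
}
```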
20
Big Picture of GT2 (Globus Toolkit 2)
[Figure: the components shown are a CA, a user certificate, grid-proxy-init, proxy certificates, the client, GIIS, GRIS, gatekeepers, a GridFTP server, and processes at Site B and Site C; the flows shown are resource-status queries, process invocation, data transfer, and return of results]
[source: Yoshio Tanaka, AIST]
21
Job Management
• Resource selection, scheduling, job control
[Figure: numbered steps (0) to (4) among the user, a resource broker, the information service, and gateways (GW) at Org. A, Org. B, and Org. C over the internet]
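The broker's resource-selection step needs some policy. One simple choice, sketched here as an assumption (not the policy of any particular broker), is to pick the least-loaded resource that has enough free CPUs; `resource_state` and `select_resource` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of a resource, as a broker might obtain it
   from the information service. */
typedef struct {
    int free_cpus;
    double load;  /* 0.0 = idle, 1.0 = fully busy */
} resource_state;

/* Among resources with at least cpus_needed free CPUs, return the index
   of the one with the lowest load, or -1 if no resource fits. */
static int select_resource(const resource_state *r, size_t n, int cpus_needed) {
    assert(cpus_needed > 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (r[i].free_cpus < cpus_needed) continue;
        if (best < 0 || r[i].load < r[best].load) best = (int)i;
    }
    return best;
}
```

After selection, the broker would dispatch the job to the chosen site's gateway and keep monitoring it (job control).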
22
Condor
• High Throughput Computing
  - matching jobs and resources by the ClassAds mechanism
  - fault tolerance by checkpointing
• Implementation on the Globus Toolkit
  → Condor-G
[Figure: a client submits a job to the Schedd; the job's description (owner: aaa, CPU: 2 GHz or more, Memory: 512 MB or more, Disk: 10 GB or more, …) is matched by the Match maker with a machine managed by a Startd]
[ http://www.cs.wisc.edu/condor/ ]
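The matching step can be sketched with the job requirements from the figure (CPU 2 GHz or more, memory 512 MB or more, disk 10 GB or more). This is a simplification: real ClassAds are free-form attribute lists whose Requirements and Rank expressions are evaluated for both the job and the machine, whereas the fixed structs and first-fit rule below are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fixed-field stand-ins for ClassAds (hypothetical). */
typedef struct {
    double min_cpu_ghz;
    int min_memory_mb;
    int min_disk_gb;
} job_ad;

typedef struct {
    double cpu_ghz;
    int memory_mb;
    int disk_gb;
    bool idle;  /* the Startd advertises whether its machine is free */
} machine_ad;

/* A machine matches if it is idle and meets every job requirement. */
static bool matches(const job_ad *j, const machine_ad *m) {
    return m->idle && m->cpu_ghz >= j->min_cpu_ghz &&
           m->memory_mb >= j->min_memory_mb && m->disk_gb >= j->min_disk_gb;
}

/* Matchmaker (first fit): index of the first matching machine, or -1. */
static int match(const job_ad *j, const machine_ad *ms, size_t n) {
    assert(j != NULL);
    for (size_t i = 0; i < n; i++)
        if (matches(j, &ms[i])) return (int)i;
    return -1;
}
```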
23
Scheduling
• Application scheduling
  - scheduling of a single application (job) on resources
    → How do we decompose an application program into tasks?
    → Where do we allocate the tasks?
    → When do we start executing the tasks?
• Job scheduling
  - scheduling of multiple jobs on resources
    → To which resources do we dispatch the jobs?
    → When do we start executing the jobs?
• Goals
  - minimizing execution time, meeting deadlines, minimizing cost, preserving fairness, …
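As one concrete answer to the "where" and "when" questions of job scheduling, here is a sketch of the greedy minimum-completion-time (MCT) heuristic: each job, in order, goes to the resource on which it would finish earliest, given each resource's relative speed. MCT is a standard textbook heuristic chosen here for illustration; it does not guarantee the minimum execution time, and the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* work[j]: amount of work in job j; speed[r]: work units per time unit on
   resource r. On return, assign[j] is the resource chosen for job j and
   ready[r] is the time resource r becomes free. Returns the makespan. */
static double schedule_mct(const double *work, size_t njobs,
                           const double *speed, size_t nres,
                           double *ready, int *assign) {
    assert(nres > 0);
    for (size_t r = 0; r < nres; r++) ready[r] = 0.0;
    double makespan = 0.0;
    for (size_t j = 0; j < njobs; j++) {
        size_t best = 0;
        double best_finish = ready[0] + work[j] / speed[0];
        for (size_t r = 1; r < nres; r++) {
            double finish = ready[r] + work[j] / speed[r];
            if (finish < best_finish) { best = r; best_finish = finish; }
        }
        assign[j] = (int)best;      /* where: the earliest-finishing resource */
        ready[best] = best_finish;  /* when: job starts at the old ready[best] */
        if (best_finish > makespan) makespan = best_finish;
    }
    return makespan;
}
```

For example, jobs of size 4, 4, 4, 2 on two resources with speeds 2 and 1 are placed on resources 0, 0, 1, 0, finishing at time 5.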
24
Nimrod
• Job management system for parameter-survey applications
  - computational economy
  - deadline scheduling
• Implementation on the Globus Toolkit
  → Nimrod/G
[source: D. Abramson, et al., “High Performance Parametric Modeling with Nimrod/G: Killer Application for the Global Grid?,” IPDPS2000, 2000]
[ http://www.csse.monash.edu.au/~davida/nimrod.html/ ]
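The computational-economy and deadline ideas can be illustrated with a small cost model: each resource offers a throughput and a price per job, and a plan must finish all jobs of a parameter survey by the deadline as cheaply as possible. The sketch fills the cheapest resources first; the `offer` struct, `plan_cost`, and all prices are hypothetical, and this is not the scheduling algorithm of Nimrod/G (see the cited paper for that).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resource offer in a computational economy. */
typedef struct {
    int jobs_per_hour;  /* throughput for this application's jobs */
    int price_per_job;  /* cost per job, arbitrary units */
    int planned;        /* output: jobs planned on this resource */
} offer;

/* qsort comparator: ascending price. */
static int by_price(const void *a, const void *b) {
    const offer *x = a, *y = b;
    return (x->price_per_job > y->price_per_job) -
           (x->price_per_job < y->price_per_job);
}

/* Fill the cheapest resources first, each up to the number of jobs it can
   finish before the deadline. Sorts the offers by price in place.
   Returns the total cost, or -1 if the deadline cannot be met. */
static long plan_cost(offer *o, size_t n, int njobs, int deadline_hours) {
    assert(njobs >= 0 && deadline_hours >= 0);
    qsort(o, n, sizeof *o, by_price);
    long cost = 0;
    int left = njobs;
    for (size_t i = 0; i < n; i++) {
        int capacity = o[i].jobs_per_hour * deadline_hours;
        int take = left < capacity ? left : capacity;
        o[i].planned = take;
        cost += (long)take * o[i].price_per_job;
        left -= take;
    }
    return left == 0 ? cost : -1;
}
```

With offers of (40 jobs/h, 1 unit), (20 jobs/h, 3 units), and (10 jobs/h, 5 units), 100 jobs with a 2-hour deadline cost 80 × 1 + 20 × 3 = 140 units, while a 1-hour deadline is infeasible (total capacity 70 jobs). Tightening the deadline pushes jobs onto more expensive resources, which is the cost/deadline trade-off a user negotiates.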
25
Data Management
• Distributed file management, high-speed file transfer, replica management
[Figure: a data management service manages files at Org. A, Org. B, and Org. C; files are replicated among the organizations and moved to the user by high-speed file transfer over the internet]
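With replicas, a client can choose which copy of a file to fetch. A sketch of one plausible selection rule, assuming hypothetical per-site bandwidth and latency estimates (e.g., from network monitoring): estimated time = latency + size / bandwidth, and the smallest estimate wins. The names `replica` and `best_replica` are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* One replica of a file, with hypothetical network estimates to the user. */
typedef struct {
    const char *site;
    double bandwidth_mbps;  /* estimated bandwidth, Mbit/s */
    double latency_s;       /* estimated startup cost, seconds */
} replica;

/* Estimated seconds to fetch size_mb megabytes (1 MB = 8 Mbit). */
static double transfer_time(const replica *r, double size_mb) {
    return r->latency_s + size_mb * 8.0 / r->bandwidth_mbps;
}

/* Index of the replica with the smallest estimated transfer time. */
static size_t best_replica(const replica *rs, size_t n, double size_mb) {
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (transfer_time(&rs[i], size_mb) < transfer_time(&rs[best], size_mb))
            best = i;
    return best;
}
```

Small files favor the replica with the lowest latency, large files the one with the highest bandwidth, so the best replica can change with file size.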
26
Data Grid Applications
• High Energy Physics
• Earth Science, Astronomical Observation
• Bioinformatics
[source: Osamu Tatebe, AIST]
27
Grid Datafarm
• Peta-to-Exascale Global Filesystem on a unified CPU/storage cluster
• Parallel I/O and parallel processing with local I/O scalability
[source: Osamu Tatebe, AIST]
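The idea behind local I/O scalability can be sketched as follows: if the process for each file fragment runs on the node that stores that fragment, every node reads from its own disk in parallel, so aggregate read bandwidth grows with the number of nodes used. The data layout, names, and the idealized sum (which ignores contention and load imbalance) are assumptions for illustration, not Gfarm's actual scheduler.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64

/* Hypothetical fragment of a large file and the node that stores it. */
typedef struct {
    int fragment_id;
    int node;
} fragment;

/* "Owner computes": place each fragment's process on the node holding it. */
static void place_processes(const fragment *f, size_t n, int *process_node) {
    for (size_t i = 0; i < n; i++) process_node[i] = f[i].node;
}

/* Idealized aggregate read bandwidth with local reads only: the sum of the
   disk bandwidths of the nodes in use, each node counted once. */
static double aggregate_read_mbps(const int *process_node, size_t n,
                                  const double *node_disk_mbps, size_t nnodes) {
    assert(nnodes <= MAX_NODES);
    int used[MAX_NODES] = {0};
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        int v = process_node[i];
        assert(v >= 0 && (size_t)v < nnodes);
        if (!used[v]) { used[v] = 1; total += node_disk_mbps[v]; }
    }
    return total;
}
```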
28
Trans-Pacific Gfarm Datafarm testbed: network and cluster configuration
SuperSINET Trans-Pacific theoretical peak: 3.9 Gbps
[Figure: network map linking Gfarm clusters at Titech, Univ. Tsukuba, AIST, KEK, NII, and Maffin in Japan, through SuperSINET, Tsukuba WAN, APAN Tokyo XP, APAN/TransPAC, and Abilene, to New York, Chicago, Los Angeles, Indiana Univ., SDSC, SC2003 (Phoenix), and Kasetsart Univ. (Thailand), including an OC-12 ATM (622 Mbps) link. Each cluster is annotated with its node count, Gfarm disk capacity, and disk read/write rate (e.g., Titech: 147 nodes, 16 TBytes, 4 GB/sec; Univ. Tsukuba: 10 nodes, 1 TBytes, 300 MB/sec; AIST: 16 nodes, 11.7 TBytes, 1 GB/sec), and links are labeled with their capacities (622M to 10G) and, in brackets, measured rates ([950 Mbps], [2.34 Gbps], [500 Mbps]).]
[source: Osamu Tatebe, AIST]
29
Programming
• MPI
  - programming with the Message Passing Interface
    → MPICH-G2, GridMPI, …
• GridRPC
  - programming with the Remote Procedure Call (RPC) mechanism
    → Ninf-G, OmniRPC, NetSolve, …
• Master-worker template
  - template for developing master-worker programs
    → MW, AMWAT, …
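A master-worker template separates the application-specific parts (what one task computes, how results are combined) from task distribution, which the template owns. A minimal sketch with hypothetical names (`mw_app`, `mw_run`, and the example callbacks); it is not the API of MW or AMWAT, and the workers are emulated sequentially with round-robin assignment, whereas a grid version would hand each task to whichever remote worker is idle (e.g., through GridRPC or MPI).

```c
#include <assert.h>
#include <stddef.h>

/* The application supplies only these two callbacks (hypothetical API). */
typedef struct {
    double (*work)(int task, const void *input);          /* runs on a worker */
    void (*collect)(int task, double result, void *acc);  /* runs on the master */
} mw_app;

/* Master loop: assign tasks 0..ntasks-1 to workers in round-robin order,
   run each task, and pass every result to the collector. */
static void mw_run(const mw_app *app, int ntasks, int nworkers,
                   const void *input, void *acc, int *worker_of_task) {
    assert(nworkers > 0);
    for (int t = 0; t < ntasks; t++) {
        worker_of_task[t] = t % nworkers;   /* which worker gets task t */
        double r = app->work(t, input);     /* emulated worker execution */
        app->collect(t, r, acc);            /* master gathers the result */
    }
}

/* Example application: task t computes (t * scale)^2; the master sums. */
static double square_task(int t, const void *input) {
    double x = t * *(const double *)input;
    return x * x;
}

static void sum_results(int t, double r, void *acc) {
    (void)t;
    *(double *)acc += r;
}
```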
30
GridRPC
[Figure: the master runs the user program, which calls grpc_call_async( ) inside a for (…) loop; input data go over the internet to several workers, each running the library program, and output data return to the master]
31
GridRPC (cont’d)
• Ninf-G [ http://ninf.apgrid.org/ ]
  - reference implementation of GridRPC
  - implementation on the Globus Toolkit
    → uses the security functions of the Globus Toolkit (authentication, encrypted communication)
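/* sequential: each call runs locally, one after another */
for (i = start; i <= end; i++) {
    SDP_search(argv[1], i, &value[i]);
}
/* GridRPC: bind a handle to the remote "SDP/search" function, then call it asynchronously */
grpc_function_handle_init(&hdl, …, "SDP/search");
for (i = start; i <= end; i++) {
    grpc_call_async(&hdl, &id[i], argv[1], i, &value[i]); /* id[i]: session ID */
}
grpc_wait_all(); /* block until every value[i] has arrived */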
32
Problem Solving Environment (PSE)
• Portal
  - frontend to search, run, monitor, and control applications on the grid
  - web page
    → cooperation with a workflow system
• Workflow
  - mechanism to run multiple applications according to their dependencies
  - dependencies among applications represented by a graph
  - applications started by the workflow engine following the workflow
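A workflow engine must start each application only after the applications it depends on have finished. With the dependencies stored as a graph, a topological sort (Kahn's algorithm here) yields a valid start order and detects cycles. A minimal sketch, assuming a hypothetical adjacency-matrix representation with at most `MAX_APPS` applications; `workflow_order` is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_APPS 16

/* dep[i][j] == true means application j depends on application i
   (i must finish before j starts). Writes a valid start order to order[]
   and returns how many applications were ordered; a result smaller than n
   means the graph has a cycle, so no valid run order exists. */
static int workflow_order(int n, bool dep[][MAX_APPS], int *order) {
    assert(n >= 0 && n <= MAX_APPS);
    int indeg[MAX_APPS] = {0};
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (dep[i][j]) indeg[j]++;

    /* queue of applications whose dependencies are all satisfied */
    int ready[MAX_APPS], head = 0, tail = 0, count = 0;
    for (int j = 0; j < n; j++)
        if (indeg[j] == 0) ready[tail++] = j;

    while (head < tail) {
        int i = ready[head++];
        order[count++] = i;  /* the engine would start application i here */
        for (int j = 0; j < n; j++)
            if (dep[i][j] && --indeg[j] == 0) ready[tail++] = j;
    }
    return count;
}
```

For a diamond-shaped workflow (0 before 1 and 2, both before 3), the order is 0, 1, 2, 3; adding an edge from 3 back to 0 creates a cycle and no application can start.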
33
Example of PSE (UNICORE)
[source: http://www.unicore.org/unicore.htm]
34
3. Infrastructure
35
Resources in Grid Infrastructure
• Computer
  - PC, PC cluster, supercomputer, …
• Storage
  - HDD, RAID, …
[source: http://www.gsic.titech.ac.jp/Japanese/Service/R_System/Overview/index.html]
[source: Matsuoka Lab, TITECH]
36
Resources in Grid Infrastructure (cont’d)
• Experimental device
  - microscope, accelerator, …
• Sensor
  - thermometer, camera, …
[Figure: Large Hadron Collider, CERN] [source: Osamu Tatebe, AIST]
[Figure: Ultra-High Voltage Electron Microscope, Osaka University] [source: http://www.biogrid.jp/]
[Figure: EcoGrid, NCHC] [source: Fang Pang Lin, NCHC]
37
Resources in Grid Infrastructure (cont’d)
• Network
  - LAN, WAN, internet, …
[source: http://www.noc.titech.ac.jp/titanet/supertitanet/index.ja.shtml]
[ source: http://www.apan.net/]
38
Grid Infrastructure
• Classification by objectives
  - testbed
    a grid environment constructed to perform experiments
    → temporarily available
  - production grid
    a grid environment for production use, or to run practical applications
    → permanently available
    → Resources are operated 24 hours a day.
• Classification by geographic scope
  - department grid, campus grid, national grid, international grid
39
ACT-JST Testbed
• Grid testbed for running applications that solve large-scale optimization problems
  - construction of a 1000-CPU-scale testbed
  - application development
  - collaboration among grid researchers and application scientists
[Figure: testbed sites AIST, TDU, TITECH, Tokushima U.]
40
Grid Challenge Federation (GCF)
• Testbed constructed for the Grid Challenge event, a programming contest on the grid
• Resources
  - Grid Technology Research Center, AIST
  - HPCS Lab., U. Tsukuba
  - Yuba-Honda Lab., UEC
  - Matsuoka Lab., TITECH
  - Aida Lab., TITECH
  - Ono Lab., Tokushima U.
  - Hiraki Lab., U. Tokyo
  - Chikayama-Taura Lab., U. Tokyo
41
ApGrid/PRAGMA
• Grid partnership in the Asia-Pacific region
[ source: http://www.apgrid.org/]
42
Titech Grid
[source: http://www.gsic.titech.ac.jp/index-j.html]
43
NAREGI
[source: http://www.naregi.org/ ]
44
TeraGrid
• A 40 Gbps network connects the sites.
• 20 TeraFlops, 1 PB of resources
[Figure: sites Caltech, ANL, SDSC, NCSA, PSC]
[source: http://www.teragrid.org/]
45
Operation of Infrastructure
• Objectives
  - An organization and staff are required to provide a grid infrastructure to users stably.
  - The current internet is operated by expert organizations dedicated to network operation.
    → Network Operation Center (NOC)
• Grid Operation Center
  - organization that operates a grid infrastructure
  - providing information about grid resources
    → resources in a VO
    → load on computing resources, traffic on networks, …
  - user support
    → accounting, document archives, help desk, troubleshooting, …
46
PRAGMA GOC
47
Network Weather Map
http://mrtg.koganei.itrc.net/mmap/grid.html
Thanks: Dr. Hirabaru and APAN Tokyo NOC team
48