The LDCM Grid Prototype
Jeff Lubelczyk & Beth Weinstein
January 4, 2005
Prototype Introduction
A Grid infrastructure allows scientists at resource-poor sites access to remote resource-rich sites
• Enables greater scientific research
• Maximizes existing resources
• Limits the expense of building new facilities
The objective of the LDCM Grid Prototype (LGP) is
to assess the applicability and effectiveness of a
data grid to serve as the infrastructure for
research scientists to generate virtual Landsat-like
data products
Sponsored by NASA LDCM, NASA/GSFC Code 580
Team: 586/585/SGT/QSS/Aerospace Corp/USGS
LGP Key POCs
Sponsors
• LDCM - Bill Ochs, Matt Schwaller
• Code 500/580 - Peter Hughes,
Julie Loftis
LGP Team members
• Jeff Lubelczyk (Lead)
• Gail McConaughy (SDS Lead
Technologist)
• Beth Weinstein (Software Lead)
• Ben Kobler (Hardware, Networks)
• Eunice Eng (Software Dev, Data)
• Valerie Ward (Software Dev, Apps)
• Ananth Rao ([SGT] Software
Arch/Dev, Grid Expertise)
• Brooks Davis ([Aerospace Corp]
Globus/Grid Admin Expert)
• Glenn Zenker ([QSS] System
Admin)
USGS
• Stu Doescher (Mgmt)
• Chris Doescher (POC)
• Mike Neiers (Systems
Support)
Science Input
• Jeff Masek, 923 (Blender)
• Robert Wolfe, 922
(Blender, Data)
• Ed Masuoka, 922 (MODIS,
Grid)
LDCM Prototype Liaison
• Harper Prior (SAIC)
CEOS grid working group (CA)
• Ken McDonald
• Yonsook Enloe [SGT]
Grid - A Layer of Abstraction
• Grid Middleware packages the underlying infrastructure into defined APIs
• A common package is the Globus Toolkit
  – An open-source, low-cost, flexible solution
[Diagram: a User Client application calls Grid Middleware services (security: authentication and authorization; resource discovery; storage management; scheduling and job management), which front compute and storage resources at distributed sites: West Coast/Platform A, On Campus/Platform A, East Coast/Platform C.]
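The layer of abstraction described above can be sketched as a small interface (hypothetical names and methods for illustration only; the real Globus APIs are far richer):

```python
# Hypothetical middleware interface: applications program against
# these operations rather than against site-specific systems.
class GridMiddleware:
    def authenticate(self, credential):
        """Establish identity once for the whole virtual organization."""
        self.user = credential

    def discover(self, resources):
        """Find resources currently available in the VO."""
        return [r for r in resources if r.get("free")]

    def transfer(self, src, dst):
        """Move data between sites without exposing local storage details."""
        return f"{src} -> {dst}"

    def submit(self, job, site):
        """Run a job on a remote resource through one common API."""
        return f"{job} submitted to {site}"
```

An application written against this interface runs unchanged whether the resources are on campus or across the country; that is the value of the middleware layer.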
What the current data grid provides
Security Infrastructure
• Globus Gate Keeper
• Authentication (PKI)
• Authorization
Resource Discovery
• Monitoring and Discovery Service (MDS) [LDAP
like]
Storage Management and Brokering
• Metadata catalogs
• Replica Location Service
  – Allows use of logical file names; physical locations are hidden
• Storage Resource Management
• GridFTP
  – Retrieves data using physical file names
• Data formats and subsetting
[Diagram: Globus Toolkit 2.4.2, comprising the Globus Gatekeeper, GridFTP, and GRAM]
Job Scheduling and Resource Allocation
• GRAM (Globus Resource Allocation Manager) – provides a single common API for requesting and using remote system resources
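The Replica Location Service idea, resolving a logical file name to one of several physical copies, can be illustrated with a toy catalog (the hostnames and paths below are hypothetical; the real Globus RLS is a distributed service, not a dict):

```python
# Toy replica catalog: a logical file name maps to physical replicas.
# Hostnames/paths are illustrative, not real LGP locations.
replica_catalog = {
    "lfn://modis/MOD09GHK.h11v05.hdf": [
        "gsiftp://lgp23.example.gov/data/MOD09GHK.h11v05.hdf",
        "gsiftp://edclxs66.example.gov/cache/MOD09GHK.h11v05.hdf",
    ],
}

def resolve(lfn, prefer=None):
    """Return a physical file name for a logical name.

    The caller never needs to know where the data actually lives;
    `prefer` optionally selects a replica on a given host.
    """
    replicas = replica_catalog[lfn]
    if prefer:
        for pfn in replicas:
            if prefer in pfn:
                return pfn
    return replicas[0]
```

GridFTP then retrieves the data using the physical name that the lookup returns.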
Note: Portions of the Globus Toolkit are used in Capability 1.
Capability 1
High Level Schedule
Major Milestones
• 12/03 - Prototype start
• 6/04 - Demo of Capability 1 grid infrastructure
• Demonstrate simple file transfers and remote application
execution at multiple GSFC labs and USGS EDC
• Ready to build application on top of basic infrastructure
• 12/04 - Demo of Capability 1
• Provide and demonstrate a grid infrastructure that enables a
user program to access and process remote heterogeneous
instrument data at multiple GSFC labs and USGS EDC
• 3/05 - Demo of Capability 2 grid infrastructure
• Demonstrate file transfers and remote application execution
at multiple GSFC labs, USGS EDC, and ARC/GSFC commodity
resources to assess scalability
• 6/05 - Demo of Capability 2
• Enable the data fusion (blender) algorithm to obtain datasets,
execute, and store the results on any resource within the
Virtual Organization (GSFC labs, USGS EDC, ARC/GSFC)
The LDCM Demonstration …
Prepares two heterogeneous data sets at
different remote locations for like “footprint”
comparison from a science user’s home site
• The MODIS Reprojection Tool (MRT) serves as
our “typical science application” developed at the
science users site (Building 32 in demo)
• mrtmosaic and resample (subset and reproject)
• Operates on MODIS and LEDAPS (Landsat) surface
reflectance scenes
• Data distributed at remote facilities
• Building 23 (MODIS scenes)
• USGS/EDC (LEDAPS scenes)
Solves a realistic scientific scenario using grid-enabled resources
Capability 1 Virtual Organization
[Diagram: Capability 1 installed equipment]
• edclxs66 (USGS EDC, Sioux Falls, SD): Dell/Linux server, dual Xeon processors, 8 GB memory, 438 GB disk storage
• LGP23 (GSFC, B23/W316): Dell/Linux server, quad Xeon processors, 16 GB memory, 438 GB disk storage
• LGP32 (Science User_1, GSFC, B32/C101): Dell/Linux server, dual Xeon processors, 8 GB memory, 438 GB disk storage
Network path: 1 Gbps site links; USGS/EDC backbone; OC12 (622 Mbps) links, one shared with DREN; vBNS+ (Chicago) OC48 (2.4 Gbps) backbone; MAX (College Park) OC48 (2.4 Gbps) backbone; GSFC SEN 1 Gbps backbone
SEN: Science and Engineering Network
MAX: Mid-Atlantic Crossroads
DREN: Defense Research and Engineering Network
vBNS+: very high performance Backbone Network Service
A “Typical Science Application”
MODIS Reprojection Tool (MRT)
• Software suite distributed by LP DAAC
• Applications used include
• mrtmosaic.exe
– Create 1 scene from adjacent scenes
• resample.exe (Subset)
– Geographic
– Band/Channel
– Projection
• Both operate on MODIS and LEDAPS scene data
Visualization Tool -- Software to display scenes
• HDFLook
Data
MODIS - MOD09GHK
• MODIS/Terra Surface Reflectance Daily L2G Global 500m SIN
Grid V004
• Sinusoidal projection
• 7 Scenes
• Washington D.C.
(H = 11,12, V = 5)
• Pacific NW
(H = 9, V = 4)
• Obtained from LP DAAC ECS Data Pool
LEDAPS - L7ESR
• LEDAPS Landsat-7 Corrected Surface Reflectance
• UTM projection
• 2 Scenes
• Washington D.C.
(Path = 15, Row = 33)
• Pacific NW areas
(Path = 48, Row = 26)
• Obtained from LEDAPS website
Both compatible with the MRT
All like-area scenes are as temporally coincident as possible
4 Scenarios to Illustrate Grid Flexibility
Data Services (Move application to data)
• Transfer the MRT to the remote hosts and process the
data remotely, sending the results back to the science
facility
Batch Execution (Parallel computing)
• Demonstrate the execution of the MRT in a parallel batch
environment
Local Processing (User prefers to process locally)
• Transfer the selected data sets to the science user site
for processing
Third Party Processing (No local resource usage)
• Perform a third party data transfer and process the data
remotely
Grid flexibility maximizes science resources
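The four patterns differ only in where the data and the application meet. That decision can be sketched as a small planner (illustrative stubs; the real LGP driver invokes Globus services rather than returning tuples):

```python
# Sketch of the four grid usage patterns: each scenario yields a
# different (transfer, execute) plan over the same sites.
def plan(scenario, app_site, data_site, compute_site=None):
    """Return the ordered transfer/execution steps for one scenario."""
    if scenario == "data_services":      # move application to data
        return [("copy_app", app_site, data_site),
                ("run", data_site),
                ("copy_results", data_site, app_site)]
    if scenario == "batch":              # parallel remote execution
        return [("copy_app", app_site, data_site),
                ("run_batch", data_site)]
    if scenario == "local":              # move data to application
        return [("copy_data", data_site, app_site),
                ("run", app_site)]
    if scenario == "third_party":        # no local resource usage
        return [("copy_data", data_site, compute_site),
                ("run", compute_site),
                ("copy_results", compute_site, app_site)]
    raise ValueError(scenario)
```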
How we make this happen
Command line interface to execute the LDCM
Grid Prototype (LGP) driver program
The LGP Driver
• Manages the execution of a specified application
• Transfers the application and data as needed
• Uses configuration files as inputs to describe:
• The executable and its location
• The data sets and their location
• The location of the resulting output file(s)
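A configuration file of the kind described might look like the following (the file name, keys, and paths are hypothetical; the deck does not show the actual LGP format):

```
# lgp-scenario1.cfg (hypothetical example)
executable      = /home/lgp/mrt/resample.exe
executable_host = B32
input_data      = L7ESR_p015r033.hdf
input_host      = EDC
output_file     = ledaps_subset.hdf
output_host     = B32
```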
Capability 1 Software Framework
LDCM Grid Prototype (LGP) Driver
• Provides a generic software system
architecture based on Globus services
• LGP Driver high-level services
• Session Manager – grid session initiation
and user authentication using proxy
certificates
• Data Manager – file transfer using
GridFTP
• Job Manager – job submission and status
in a grid environment
Utilizes the Java Commodity Grid Kits
(CoGs)
• Supplies a layer of abstraction from
underlying Globus services
• Simplifies the programming interface
[Diagram: the LGP Driver (Java 1.4.2) with its Session, Data, and Job managers sits on Java CoG 1.1, which in turn sits on Globus Toolkit 2.4.2 (Globus Gatekeeper, GridFTP, GRAM)]
Demo Scenario 1: Input Data
[Images: sample input scenes, LEDAPS L7ESR and MODIS MOD09GHK]
Demo Scenario 1: Data Services
Within the LDCM VO, each grid node (GSFC B32, GSFC B23, USGS EDC) runs GT 2.4.3 with a GridFTP server. The LEDAPS data resides at USGS EDC, the MOD09GHK data at GSFC B23, and the MRT at the science user's site, GSFC B32:
1) Move resample.exe from B32 to EDC
2) Run resample.exe on 1 LEDAPS file
3) Move the LEDAPS resample output from EDC to B32
4) Move mrtmosaic.exe from B32 to B23
5) Run mrtmosaic.exe with 2 MOD09GHK files
6) Move resample.exe from B32 to B23
7) Run resample.exe on the mrtmosaic output
8) Move the mrtmosaic resample output from B23 to B32
9) Display the LEDAPS and MODIS resampled output using HDFLook
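The nine numbered steps can be expressed as a small scripted workflow. The sketch below stubs out the transfer and execution calls; in the actual prototype these are GridFTP transfers and GRAM job submissions made through the Java CoG:

```python
# Demo Scenario 1 as an ordered workflow; move/run are stubs
# standing in for GridFTP transfers and GRAM job submissions.
log = []

def move(item, src, dst):
    log.append(f"move {item}: {src} -> {dst}")

def run(cmd, host):
    log.append(f"run {cmd} @ {host}")

move("resample.exe", "B32", "EDC")                 # 1
run("resample.exe <LEDAPS file>", "EDC")           # 2
move("LEDAPS resample output", "EDC", "B32")       # 3
move("mrtmosaic.exe", "B32", "B23")                # 4
run("mrtmosaic.exe <2 MOD09GHK files>", "B23")     # 5
move("resample.exe", "B32", "B23")                 # 6
run("resample.exe <mosaic output>", "B23")         # 7
move("mrtmosaic resample output", "B23", "B32")    # 8
run("HDFLook <resampled outputs>", "B32")          # 9
```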
Demo Scenario 1: Data Services
[Data-flow diagram: the same nine steps shown as data movement. The MRT's resample and mrtmosaic executables move from B32 to EDC and B23; the LEDAPS L7ESR scene is resampled at EDC; the MODIS MOD09GHK scenes are mosaicked and subset at B23; and the resampled outputs return to B32 for display with HDFLook.]
Capability 1 Task Requirements
Completed
✓ Science user is at B32 and the data is at EDC and B23
✓ 2 - 3 instrument types
✓ 10 - 20 scenes
✓ Spatially and temporally coincident data
✓ Algorithm must run on B23, B32, and EDC
✓ Command-line invocation from client side
✓ Perform distributed computation
✓ Share distributed data
Verified by executing the 4 scenarios
Next Steps -- Capability 2
Capability 2 (C2)
• Integrate with the Blender team
• Collaborate to identify meaningful C2 data sets
• Demonstrate blender algorithm
• Assess Grid performance
• Expand the VO to include ARC supercomputing if
available
• Performance Goals
– Demonstrate the processing of 1 day’s worth of data
in the grid environment (~250 scenes)
• Grid Workflow -- increase automation
Grid Workflow
Our current capabilities allow us to submit jobs only to a
specified resource
The goal of the next phase will be to provide the ability
to submit a job to the “Grid” Virtual Organization
• Grid resource management
• Scheduling policy
• Maximize grid resources
• Manage sub tasks
• Reliable job completion
• Checkpointing and job migration
• Leverage wasted CPU cycles
Next step: Examine Condor and Pegasus open source
Globus toolkit workflow extensions
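Whatever tool is chosen, workflow submission ultimately reduces to running jobs in dependency order across whatever resources are free. A minimal sketch of that core idea (not Condor or Pegasus code; cycle detection omitted for brevity):

```python
# Minimal dependency-ordered job dispatch: the core idea behind
# DAG-based grid workflow tools. `deps` maps each job to its
# prerequisite jobs; the result is a valid execution order.
def schedule(deps):
    """Return jobs in an order that respects dependencies (topological sort)."""
    done, order = set(), []

    def visit(job):
        if job in done:
            return
        for prereq in deps.get(job, []):
            visit(prereq)
        done.add(job)
        order.append(job)

    for job in deps:
        visit(job)
    return order
```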
Concept of a Future Grid Architecture - LDCM example
[Diagram: concept of a future grid architecture, LDCM example]
• Scientist site: Science Product Interface and Session Manager, with product definition and product status & recovery, driving the <Blender> application
• VO grid management: Grid Workflow Engine, Job Manager, Grid Resource Manager, and Data Manager; operator's resource status interface; failure recovery; VO grid configuration; overall VO grid management; data product distribution
• Data sites: USGS/EDC data node/manager (Landsat data, reflectance products); NASA/GSFC data node/manager (MODIS DAAC, VIRS, and other data)
• Research community: Research Community Data Server and Research Archive (Univ. of MD)
Legend: existing C1 grid infrastructure; proposed C2 grid infrastructure; future grid components
Acronym List
FTP      File Transfer Protocol
LDCM     Landsat Data Continuity Mission
LEDAPS   Landsat Ecosystem Disturbance Adaptive Processing System
LGP      LDCM Grid Prototype
LP DAAC  Land Processes Distributed Active Archive Center
MODIS    Moderate Resolution Imaging Spectroradiometer
MRT      MODIS Reprojection Tool
Condor, Condor-G, DAGman
Condor addresses many workflow challenges for Grid
applications.
• Managing sets of subtasks
• Getting the tasks done reliably and efficiently
• Managing computational resources
Similar to a distributed batch processing system, but with some
interesting twists.
• Scheduling policy
• ClassAds
• DAGman
• Checkpointing and Migration
• Grid-aware & Grid-enabled
• Flocking (linking pools of resources) & Glide-ins
See http://www.cs.wisc.edu/condor/ for more details
Chart author: Lee Liming, Argonne National Laboratory
Pegasus Workflow Transformation
Converts Abstract Workflow (AW) into
Concrete Workflow (CW).
• Uses Metadata to convert user
request to logical data sources
• Obtains AW from Chimera
• Uses replication data to locate
physical files
• Delivers CW to DAGman
• Executes using Condor
• Publishes new replication and
derivation data in RLS and
Chimera (optional)
See http://pegasus.isi.edu/ for
details
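The abstract-to-concrete conversion can be illustrated schematically. This toy sketch binds each logical step to a site and to physical file locations (the catalogs and hostnames are invented; real Pegasus consults Chimera and the RLS):

```python
# Toy abstract -> concrete workflow conversion: each abstract step
# names logical files, which a replica catalog resolves to physical
# URLs, and a site catalog binds to an execution host.
replicas = {"scene.hdf": "gsiftp://hostA/data/scene.hdf"}  # hypothetical
sites = {"resample": "hostB"}                              # hypothetical

def concretize(abstract_wf):
    """Bind logical files and transformations to physical resources."""
    concrete = []
    for step in abstract_wf:
        concrete.append({
            "exe": step["transform"],
            "site": sites[step["transform"]],
            "inputs": [replicas[f] for f in step["inputs"]],
        })
    return concrete
```

The concrete workflow produced this way is what gets handed to DAGman for execution.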
[Diagram: Pegasus obtains the abstract workflow from the Chimera Virtual Data Catalog, consults the Metadata Catalog and Replica Location Service, and delivers the concrete workflow to DAGman, which executes it through Condor on compute servers and storage systems.]
Chart author: Lee Liming, Argonne National Laboratory