GridLab Project - A Grid Application Toolkit and Testbed

2020/4/26 • January 1, 2002 to January 1, 2005

Project objectives

Build components for Grid applications, and realistic testbeds for their development:

• Co-development of Infrastructure and Applications

• Dynamic Grid Computing: simulation and visualization codes that are self-aware of the changing Grid environment and able to fully exploit dynamic resources for fundamentally new and innovative application scenarios.

These elements are included in developing innovative, yet practical, Grid computing technologies, which will then be immediately and easily adopted and exploited by applications from many different research and engineering fields. Specific key objectives are:

• Design and develop a Grid Application Toolkit (GAT),

• Simultaneously enhance real applications for the Grid,

• Test Grid-enabled applications on real testbeds spanning Europe.

GridLab Aims

• Get Computational Scientists using the “Grid” and Grid services for real, everyday, production work (AEI Relativists, EU Network, Gravity Wave Data Analysis, Cactus User Community).

• Make it easier for applications to make flexible, efficient, and robust use of the resources available to their virtual organizations.

• Dream up, prototype, and test new application scenarios which make adaptive, dynamic, wild, and futuristic uses of resources.

What GridLab Isn't …

• Don't want to develop low level Grid Infrastructure (just want to nudge it)

• Don't want to repeat work which has already been done (want to incorporate and assimilate it … Globus APIs, ASC Portal (GridSphere/Orbiter), GPDK, GridPort, DataGrid, …)

Important Issues for GridLab

• Apps are configurable!

– Can be anything
– Each module may (or may not!) know what it needs
– Estimate total resource needs FOR THAT CONFIG!!

• Launch Job
– Where? Best choice? Second best? High Priority to get this!
– Go!

• Job queries info server (or multiple servers…)
– What is out there?

• Job publishes to server
– Here is what I am doing

• Infrastructure: – Is it ubiquitous? Is it reliable? Does it work?

– Probably not: need abstraction and redundancy!

• Security: – How does user pass proxy from site to site?

– Firewalls? Ports?

• How does user/application get information about Grid?

– Need reliable, ubiquitous Grid information services
– Portal, Cell phone, PDA

• What is a file? Where does it live?
– Crazy Grid apps will leave pieces of files all over the world

• Tracking
– How does user track the Grid simulation hierarchies?

Innovation

The GridLab project will significantly advance the current state-of-the-art by developing

• key components necessary for application oriented Grid computing (resource estimators and brokers, platform independent portals accessible even from mobile devices, security infrastructure, monitoring tools, etc.);

• interfaces to functionally similar components developed by others,

• a Grid Application Toolkit (GAT), for both infrastructure and applications, enabling new generations of Grid enabled applications, and

• innovative new Grid computing scenarios to dramatically increase the scale or throughput of possible applications.

How GridLab Advances the State-of-the-Art

• Developing a Grid Laboratory for (i) developers to gridify their applications, employing the best available tools through a high level, flexible Grid Application Toolkit (GAT); (ii) users to deploy these applications through a simulation portal on different resources and testbeds, visualizing, interacting with, and analyzing their simulations both in the work and mobile environments; and (iii) developers to easily include new Grid tools for testing and production use.

• Enhancing a variety of existing, resource intensive, applications with the capabilities of the GAT, including Triana and applications already using Cactus. The applications will be tested and deployed on different testbeds to ensure grid interoperability.

• Building communities through collaborative Grid technology. Cactus was designed from the beginning to enable collaborative work between application and computer science communities. Our portal will enable geographically distributed communities to jointly interact with and analyze results of their Grid simulations. Deployment across different transatlantic testbeds encourages joint development work crossing different project, discipline, and national boundaries. These technologies/testbeds will act as catalysts for new organizational and collaborative practices in these communities.

• Disseminating results and distributing software through (i) our application communities, (ii) the gluing together of different US and EU research projects, (iii) our active participation in the GGF, (iv) our close association with leading computing centers worldwide, and (v) our partnerships with leading computer vendors, who will bring this technology directly to their industrial customers.

Introduction to Work Packages

• WP1 Grid Application Toolkit
• WP2 Cactus Grid Application Toolkit
• WP3 Work-Flow Application Toolkit
• WP4 Grid Portals
• WP5 Testbed Management
• WP6 Security
• WP7 Adaptive Application Components
• WP8 Data handling and Visualization
• WP9 Resource Management
• WP10 Information Services
• WP11 Monitoring
• WP12 Access for mobile users
• WP13 Information Dissemination and Exploitation
• WP14 Project Management

Broadly, there are three classes of work packages (WPs):

• Grid Application Toolkits (GATs), forming the central part of GridLab,

• Component WPs that create tools which plug into the GATs providing the underlying functionality, and

• Other WPs for creating interfaces, testbeds, dissemination and management.

WPs for Grid Application Toolkits. The central components of GridLab, which form its cornerstone, are the generic Grid Application Toolkit (GAT), and its associated application based Toolkits, the Cactus GAT (CGAT) and the Work-Flow Application Toolkit (TGAT). Each of these three components will be developed as a complete work package. The GAT will consist of generic APIs at both the Grid infrastructure level and application level that will form the connecting glue between components on both sides.

• Component WPs for creating functional pluggable GAT components. In addition to WP2 (CGAT) and WP3 (TGAT), other WPs developing components that will be incorporated through APIs developed by the GAT include the sequence of seven WPs: Security (WP6), Adaptive Application Components (WP7), Data Handling and Visualization (WP8), Resource Management (WP9), Information Services (WP10), Monitoring (WP11), and Access for Mobile Users (WP12). Each of these WPs develops a particular set of components that can be …

• Other WPs. While most WPs are relatively independent, connecting primarily through the GAT, there is one other component that connects broadly to most other WPs: namely the Portal, WP4. The portal will provide uniform, web-based interfaces to the Grid, providing information about the Grid and the applications running on it (for example, their performance and location), as well as services such as authentication, visualization, and data tracking and management. Hence, virtually all other work packages connect to WP4, requiring collaboration towards functionality and interfaces.

Solution

“Grid Application Toolkit”

Provides a layer between applications and emerging grid technologies. Provides an application developer orientated API, allowing the flexible use of different tools and services, as well as providing protection from developing software.

“GridLab Testbed/VO”

Diverse controllable environment for developing and testing applications and tools, software maintained by people who know it.

Continuous dialogue between GAT tool developers, end users, GAT-API developers, and Grid infrastructure developers.

Architecture

• Layered structure: user access, middleware (services) and system

• Access to Grid only through services using well defined interfaces

• User/developer API provides access to / operates on available services

• Moving "intelligence" from client side to grid (distributed blobs of intelligence)

Architecture: General View

[Figure: general view of the architecture. Components: User, Portal, Grid Services API, Grid Services Interfaces, Grid Services, Grid Core Services API, Grid Core Services. Layers: User Access Layer, Services Layer, System Layer.]

Implementation Issues

• It is impossible (and there is no need) to build a communication environment from scratch

• Choosing existing technology which provides:
– Platform and programming language neutrality
– Interoperability
– Scalability

What is the GAT ?

• Set of application developer APIs for Grid tools and services, and example implementations

• Usable from any high level “application” (any generic code, Cactus, Triana, Portals, Scripts, …)

• More or less …
– Set of calls GAT_ToolOrService(arguments)
– Your chosen tools: Resource broker, information server, application manager, monitoring, data manager, notification, etc, etc
– Set of APIs for dealing with the GAT (registration, information, errors, fault tolerance)

(Very Rough) Generic Code Example

    #include "GAT.h"

    int main(void)
    {
      GATState *gat;
      RetState *ret;

      gat = GAT_Init();
      ret = GAT_Notify(gat, "*:Simulation Started");

      /* The end of the loop header and the start of the resource query
         were lost in the transcript; the gap is marked with "…". */
      for (t = tinitial; t … 56MB,cost=0", machine)) {
          ret = GAT_Spawn(gat, "Analysis", "*:machine");
        }
      }

      ret = GAT_MoveFile(gat, "OutDir", "msftp://modi4.ncsa.uiuc.edu");
      ret = GAT_Notify(gat, "*:Simulation Finished");
      ret = GAT_Terminate(gat);
    }

GAT Initialization & Termination

GAT_Init()
• Initializes the GAT engine
• Activates all (local) adaptors
• Returns an initial GAT state

GAT_Terminate()
• Deactivates all adaptors
• Shuts down the GAT engine

Finding a Grid Resource

GAT_FindResource()

• Pass in a list of resource requirements
resource_type=compute, mem>10M, diskspace>1M, job_type=interactive

• GAT goes out and queries a remote resource broker service
What does it expect from the GAT? What information would it pass back?

• Return (a list of) available resource(s) matching the above requirements
– Simplest example: return “localhost”
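The matching step described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `Resource`, `Requirements`, and `find_resource` are hypothetical names (not the GAT API), the broker is replaced by an in-memory table, and the fallback mirrors the slide's simplest case of returning "localhost".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical resource record; field names are assumptions for illustration. */
typedef struct {
    const char *host;
    const char *type;   /* e.g. "compute" */
    long mem_mb;
    long disk_mb;
} Resource;

/* Requirements in the spirit of: resource_type=compute, mem>10M, diskspace>1M. */
typedef struct {
    const char *type;
    long min_mem_mb;
    long min_disk_mb;
} Requirements;

/*
 * Writes up to max_out matching host names into out and returns the count.
 * With no broker data at all (n == 0) it falls back to the simplest case
 * from the slide and returns "localhost".
 */
size_t find_resource(const Resource *known, size_t n, const Requirements *req,
                     const char **out, size_t max_out)
{
    size_t found = 0;

    assert(req != NULL && out != NULL);
    if (max_out == 0)
        return 0;
    if (n == 0) {
        out[0] = "localhost";
        return 1;
    }
    for (size_t i = 0; i < n && found < max_out; i++) {
        if (strcmp(known[i].type, req->type) == 0 &&
            known[i].mem_mb > req->min_mem_mb &&
            known[i].disk_mb > req->min_disk_mb) {
            out[found++] = known[i].host;
        }
    }
    return found;
}
```

A caller would fill a `Requirements` value, pass whatever resource list an information service returned, and iterate over the host names written to `out`.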

Starting a New Grid Job

GAT_StartJob()

• User information about the job
– Executable name and path
– Command line arguments
– Shell environment settings
– Working directory (defaulting to ${SCRATCH})

• The resource where to start the new job
– Also describes how to start it (what "run" command to use)

• Return a job ID
How will Grid Jobs be identified (URL, GSH)?
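As a rough sketch of the job information listed above, the C fragment below models a job description with a working directory that defaults to the `SCRATCH` environment variable. Everything here is an assumption made for illustration: the `JobDescription` layout, the helper names, the fallback to "." when `SCRATCH` is unset, and the `gat://` identifier format. The slide leaves open whether job IDs are URLs or GSHs; a URL-like string is shown as one possibility only.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical job description; not the actual GAT data structure. */
typedef struct {
    const char *executable;        /* executable name and path */
    const char *const *argv;       /* command line arguments, NULL-terminated */
    const char *const *envp;       /* shell environment settings, NULL-terminated */
    const char *workdir;           /* NULL means "use the default" */
} JobDescription;

/* Returns the working directory, defaulting to $SCRATCH (or "." if unset). */
const char *job_workdir(const JobDescription *job)
{
    const char *scratch;

    assert(job != NULL);
    if (job->workdir != NULL)
        return job->workdir;
    scratch = getenv("SCRATCH");
    return scratch != NULL ? scratch : ".";
}

/* Formats a URL-like job identifier; the "gat://" scheme is made up. */
int make_job_id(char *buf, size_t len, const char *resource, unsigned long seq)
{
    return snprintf(buf, len, "gat://%s/jobs/%lu", resource, seq);
}
```

Keeping the default inside `job_workdir` means callers can leave `workdir` empty and still get sensible behavior, in line with the "sensible default behavior" requirement stated later in the slides.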

Querying the Status of Jobs

GAT_QueryResource()

• Pass in the job ID and the query tag(s)
resource_type=job, resource_id=, query=status
How to identify Grid resources? What query tags can be used?

• GAT contacts a Monitoring Tool and/or Resource Management Tool

• Return the current status of the job
– exit code if job has finished

Transferring Files

GAT_CopyFile(), GAT_MoveFile()

• Pass in source and destination locations

• Pick a specific file transfer method, or let the GAT try all registered ones
cp, scp, Globus File Transfer (GASS, GSIFtp, ...)

• Should the GAT provide both synchronous and asynchronous services?

• How to describe and handle data objects other than plain files (distributed files, file collections, etc.)?
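The idea of picking one transfer method or letting the GAT try all registered ones can be sketched as follows. This is a minimal illustration, not the GAT implementation: `TransferMethod`, `copy_file`, and the stub transfer functions are hypothetical, and the stubs only simulate success or failure instead of moving data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A transfer function returns 0 on success, non-zero on failure. */
typedef int (*TransferFn)(const char *src, const char *dst);

/* Hypothetical registry entry for one transfer method. */
typedef struct {
    const char *name;
    TransferFn copy;
} TransferMethod;

/*
 * Tries the registered methods in order and returns the name of the first
 * one that succeeds, or NULL if none does. If preferred is non-NULL, only
 * the method with that name is tried.
 */
const char *copy_file(const TransferMethod *methods, size_t n,
                      const char *preferred, const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (preferred != NULL && strcmp(methods[i].name, preferred) != 0)
            continue;
        if (methods[i].copy(src, dst) == 0)
            return methods[i].name;
    }
    return NULL;
}

/* Stand-in transfer functions so the sketch runs without any Grid software. */
static int stub_unavailable(const char *src, const char *dst)
{
    (void)src; (void)dst;
    return -1;
}

static int stub_ok(const char *src, const char *dst)
{
    (void)src; (void)dst;
    return 0;
}
```

Registering methods from most to least capable (for example a Grid transfer first, plain `cp` last) gives the fall-through behavior the slide hints at, while the `preferred` argument covers the case where the caller insists on one protocol.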

GAT Requirements

• (High Level) APIs designed for application programmers
– Simple, short, clear (concentrate on what is needed now)
– Callable from F90, (F77?), C, C++, Java

• Make use of Remote Services and Local Utilities
– For example: Cactus has its own Email thorn “Notify”, or use external mail server

• Extensible parameterizations
– For example: execute a command
– Which service/protocol to use? Which machine or how to choose it? Interactive or batch? Which queue?
– Sensible default behavior

• Plug-in architecture
– Multiple implementations (can use all or one)
– Empty implementations when no service exists
– Dynamically add implementations

Proposed GAT Design (1)

• (Fortran-friendly) GAT-API function calls for both high and low level application orientated Grid programming

• “Adaptors” connect the GAT-API with applications and Grid “tools”

• “Tools” can be remote services, or implemented in the GAT library, or in the application itself

• Plug-in architecture: “adaptors” (& tools) are dynamically loaded libraries (not possible on all platforms, e.g. T3E)

Proposed GAT Design (2)

• Application adaptors describe the capabilities of an application and provide a GAT specific callback function:
– I can checkpoint with this function call …
– I can provide a parameter file for migration with this function call …

• Tool adaptors describe the capabilities of a tool:
– I can implement GAT_Migrate using this function call …
– I can implement GAT_Transfer using this function call …

• Default adaptors provide “empty” implementations (really, really, really important for us on apps side … solves our fundamental problem with Grid computing)
– GAT_FindMachine returns “localhost”
– GAT_Transfer uses “cp”
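One way to picture tool adaptors with default "empty" implementations is a table of function pointers where any missing entry falls back to a default. The sketch below assumes this design; the slides do not show the real data structures. `ToolAdaptor`, `gat_find_machine`, and `gat_transfer` are hypothetical stand-ins for the GAT calls named above, and the default transfer only builds a `cp` command string instead of running it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tool adaptor: a NULL entry means "capability not provided". */
typedef struct {
    const char *name;
    const char *(*find_machine)(void);
    int (*transfer)(const char *src, const char *dst, char *cmd, size_t len);
} ToolAdaptor;

/* Default ("empty") implementation: the only known machine is localhost. */
static const char *default_find_machine(void)
{
    return "localhost";
}

/* Default transfer: build a plain cp command into cmd (it is not executed). */
static int default_transfer(const char *src, const char *dst,
                            char *cmd, size_t len)
{
    int n = snprintf(cmd, len, "cp %s %s", src, dst);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

static const ToolAdaptor default_adaptor = {
    "default", default_find_machine, default_transfer
};

/* Uses the tool's capability if it has one, otherwise the default. */
const char *gat_find_machine(const ToolAdaptor *tool)
{
    if (tool != NULL && tool->find_machine != NULL)
        return tool->find_machine();
    return default_adaptor.find_machine();
}

int gat_transfer(const ToolAdaptor *tool, const char *src, const char *dst,
                 char *cmd, size_t len)
{
    assert(src != NULL && dst != NULL && cmd != NULL);
    if (tool != NULL && tool->transfer != NULL)
        return tool->transfer(src, dst, cmd, len);
    return default_adaptor.transfer(src, dst, cmd, len);
}
```

With this arrangement an application written against the two calls keeps working on a single machine with no Grid services installed, which appears to be the point the slide stresses for the applications side.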

Proposed GAT Design (3)

• Implement for different programming languages:
– Fortran friendly – strings, function pointers, structures
– ANSI C version (also for C++) with Fortran wrappers
– Java version (if easier for development start with Java wrappers?)

• Extensibility/dynamics for parameters for tools provided by heavy use of string arguments

Proposed GAT Design (4)

• Opaque object “GATState” carries around global information, such as credentials, as well as information for each transaction:
– All GAT calls use GATState: GAT_Transfer(gatstate, …)
– Each thread has its own GATState (thread safe)
– First thing a GAT call does is destroy the return object from the last call, clone GATState for persistent objects

• GATState handling:
– gatstate = GAT_Init();
– newgatstate = GAT_CloneContext(gatstate); (threaded)
– object = GAT_Terminate(gatstate);
– Set gatstate for each transaction: GAT_SetState(gatstate, "resource broker") = whatever;
– GAT_WORLD for legacy code
– if (GAT_IsError(gatstate,object) == GAT_ERROR) {then failed}

Migration: Do-It-Yourself

    gat = GAT_Init();

    Loop over evolution step:

      if (GAT_ContractViolation(gat)) {
        GetCurrentRequirements
        GAT_GetMachineList(with these resources in ½ hour time)
        ChooseMachine(balance network with machine speed)
        GAT_StageExecutable(to this machine)
        GAT_Checkpoint(whatever);
        GAT_Transfer(checkpoint file to chosen machine using some protocol);
        Create new parameter file
        GAT_Transfer(parameter file to chosen machine using some protocol)
        object = GAT_Exec(run job on new machine)
        if (!GAT_IsError(object)) terminate this job
      } else {
        just continue on this machine
      }

[Figure labels: the calls above are served by the Application server, the Resource broker, the Simulation itself, Globus gsiftp, Globus gsissh, and GAT internal code.]

Migration: High Level Call

    gat = GAT_Init()

    Loop over evolution step

      If (GAT_ContractViolation(gat)) {
        object = GAT_Migrate("do whatever is best");
      } Else {
        just continue on this machine
      }

For example, the Poznan resource broker can do everything to migrate a simulation: the adaptor for the resource broker provides a pointer to its migration function.
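The high-level migration loop can be made concrete with a small, self-contained C sketch. It rests on assumptions that are not in the slides: the "contract" is reduced to a maximum time per evolution step, `GatSketch` and `evolve` are hypothetical names, and the migration function is a stub standing in for the pointer a resource broker adaptor would supply.

```c
#include <assert.h>
#include <stddef.h>

/* Migration callback supplied by an adaptor; returns 0 on success. */
typedef int (*MigrateFn)(const char *hint);

/* Hypothetical stand-in for the GAT state used by this sketch. */
typedef struct {
    double max_step_seconds;  /* assumed contract term: time per step */
    MigrateFn migrate;        /* may be NULL if no adaptor offers migration */
} GatSketch;

/* The contract is violated when a step takes longer than allowed. */
static int contract_violation(const GatSketch *gat, double step_seconds)
{
    return step_seconds > gat->max_step_seconds;
}

/*
 * Loops over evolution steps. Returns the index of the step at which the
 * run was handed over to another machine, or -1 if it finished locally.
 */
int evolve(const GatSketch *gat, const double *step_seconds, int nsteps)
{
    assert(gat != NULL && step_seconds != NULL);
    for (int step = 0; step < nsteps; step++) {
        if (contract_violation(gat, step_seconds[step]) &&
            gat->migrate != NULL &&
            gat->migrate("do whatever is best") == 0) {
            return step;  /* this job ends; the run continues elsewhere */
        }
        /* otherwise: just continue on this machine */
    }
    return -1;
}

/* Stub playing the role of a resource broker's migration function. */
static int broker_migrate(const char *hint)
{
    (void)hint;
    return 0;
}
```

When no adaptor provides a migration function, the loop simply keeps running locally, which matches the "just continue on this machine" branch.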

GAT Architecture: General APP

[Figure: an Application with its App Adaptor calls the GAT through the GAT-API; several Adaptors connect the GAT to Tools in the Virtual Organization. Transfer protocol: XML-RPC, SOAP, …]

GAT Architecture: Grid Service Provider

[Figure: a Cactus App with HTTPD, Notify, and App Adaptors calls the GAT through the GAT-API; a GAT Adaptor connects to the Grid Service Provider (Service Coordinator, RSB), which reaches Tools in the Virtual Organization via gsissh and gsiscp.]

Tools for a VO are probably going to be “packaged” together.

GAT Architecture: Grid Service Provider Application and Portal

[Figure: a Cactus App (with HTTPD, Notify, and App Adaptors) and a Portal each call their own GAT through the GAT-API; GAT Adaptors connect both to the Grid Service Provider (Service Coordinator, RSB), which reaches Tools in the Virtual Organization via gsissh and gsiscp.]

GAT Architecture

• Applications: Cactus, Triana, Generic Codes, Portals, Scripts

• GAT-API: High level calls based on functionality (GAT_FileCopy, GAT_ResourceFind)

• GAT Services and Grid Core Services: Resource Broker, Monitoring System, GRAM, GridFTP, GridSSH, MDS

Proposed GAT Implementation

• Applications: Cactus, Triana, Generic Codes, Portals, Scripts

• GAT-API and GAT Engine: Library containing registration routines and empty implementations

• Adaptors (.so files): Layer providing access to services, shared libraries, dynamically loadable at run time

• Grid Infrastructure: Resource brokers, basic grid services (GRAM, GSIFTP, MDS)

GAT Example

“Application” (Cactus Migration Tool, User Portal, Grid Script) wants to move a file between two other machines.

How to move the file:

• Available software

• User authentication

• Disk properties (disk size, user quota, inodes)

• Network bandwidth

[Figure: Machine A offers Gridftp and Gridscp; Machine B offers Gridscp.]

GAT Example

GAT_FileCopy(GAT, "ThisMachine:ThisFile", "ThatMachine:ThatFile")

• GAT API call from Cactus Application, or Portal, or Python script

• Engine decides (by different means) how to respect this request

• GridFTP Adaptor says it knows how to both Copy and Move files between machines

• GridSSH Adaptor only knows how to Copy files

[Figure: the GAT Engine with GAT_GFTPAdaptor.so (IGAT_FileCopy, IGAT_FileMove, IGAT_MkDir) and GAT_GSCPAdaptor.so (IGAT_FileCopy).]
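The engine's choice between adaptors can be illustrated with a capability table. This sketch makes assumptions beyond the slides: the selection policy (first registered adaptor that supports the operation) is only one of the "different means" the engine might use, and `Adaptor`, `engine_select`, and the `OP_*` constants are hypothetical. The adaptor names and their capabilities follow the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Operations an adaptor may implement (mirrors IGAT_FileCopy etc.). */
enum { OP_FILE_COPY, OP_FILE_MOVE, OP_MKDIR, OP_COUNT };

/* Hypothetical adaptor record: one flag per operation it can perform. */
typedef struct {
    const char *name;
    int supports[OP_COUNT];
} Adaptor;

/*
 * Returns the first registered adaptor that supports op, or NULL.
 * "First match wins" is an assumed policy for this sketch.
 */
const Adaptor *engine_select(const Adaptor *adaptors, size_t n, int op)
{
    assert(op >= 0 && op < OP_COUNT);
    for (size_t i = 0; i < n; i++) {
        if (adaptors[i].supports[op])
            return &adaptors[i];
    }
    return NULL;
}

/* Capabilities as described above: GSCP can only copy, GFTP can do all three. */
static const Adaptor registered[] = {
    {"GAT_GSCPAdaptor.so", {1, 0, 0}},
    {"GAT_GFTPAdaptor.so", {1, 1, 1}},
};
```

With this registration order a copy request goes to the GridSSH-based adaptor, while a move request skips it and lands on the GridFTP adaptor, the only one that advertises that capability.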

W4: Grid Portals

• Building Web-enabled End-User Environments for accessing Grid Services

• By Michael Paul Russell

• Dept Computer Science

• University of Chicago

Main Goals

• Evaluate GridSphere (developed by the ASC project, soon to be open-sourced)

• Extend/Adapt GridSphere to:
– Build/Support (Cactus) Simulation Portal
– Build/Support Grid Administrative Portal
– Build/Support (Generic) User Portal

Tasks

• Setup our development environment.

• Determine where/how production environments will be supported.

• Determine Functional and User Interface Requirements for proposed Grid Portals.

• Work with W1-W3 in order to unify architectures, APIs, and so forth.

• Work with W5-W12 to develop plans for enabling access to the tools these teams are developing.

Tasks: Continued

• Coordinate development of GridSphere with Astrophysics Simulation Collaboratory and other portal groups.

• Stay on top of GGF standards.

• Develop relationships with Grid Computing Environments and other Grid Portal working groups.

Input to GAT Architecture

• Well, not exactly input to GAT …

• More likely to combine efforts with TGAT.

• Remember also, Grid Portals are “always up”, multi-user environments for providing access to a variety of Grid Services, to coordinate and monitor user activity.

Relationships with other WPs

• Our relationship with W1-W3 is defined through our collaboration on GAT.

• We will work with W5-W11 through email correspondence, phone conferences, and onsite visits. It is important that we maintain communication with these groups, as they are building tools that we want to utilize in our Grid Portals!

GridLab Portal / Work Package Relationships

• Transient Services that GridLab Portals must support (W1, W2, W3, W7, W11): Grid applications will be built with tools developed by these WPs. We must support applications compiled with these tools, particularly as a central or coordinating service.

• Standing Services that will support GridSphere (W5, W7, W8, W11): GridLab Portals should be able to utilize standing Grid Services developed by these WPs.

• W6: W6 will provide recommendations for how W4 should handle security.

ASC Portal Architecture

[Figure: ASC Portal architecture. Client code downloaded into Internet browser (DHTML, Java Applets); HTTPS front-end (Apache + mod_ssl); Java Servlets Container (Tomcat); Orbiter Servlets presentation framework (JSP-based page handling); GridSphere Application Server Framework (Users, Sessions, Security, Resources, Services, Cactus); Java CoG, JDBC, and other public libraries; Java CoG services (GSI, MyProxy, GRAM, GSIFTP, MDS); JDBC/RDBMS (MySQL) data; backend processes (Resource Manager, Cactus Application Monitor).]

GridLab Portal Development

• GridLab Portals will be developed with GridSphere/Orbiter, from which we plan to develop an open-source project.

• We will extend GridSphere with an OGSA communication framework.

• OGSA will provide us with a communication infrastructure with which we can construct Grid portals as collections of distinct services, where each service may be administered from distinct host environments.

• This will make it easier to “plug” services developed by other Work Packages into our Grid application server framework.

The end

Thank you!
