DISCWorld – A Java-Based Metacomputing Environment


DISCWorld, Virtual Data Grids and Grid Applications

Paul Coddington

Distributed & High Performance Computing Group, Department of Computer Science, University of Adelaide, Adelaide SA 5005, Australia. Email: [email protected]

December 2002

Background

Distributed and High-Performance Computing Group

– Started 1996 at University of Adelaide, PDC joined Aug 1997
– Ken Hawick, Andrew Wendelborn, Francis Vaughan, Kevin Maciunas
– Originally part of Research Data Networks CRC
– Research into metacomputing middleware and on-line data archives
– DHPC Bangor started in 2000 by Ken Hawick

Research areas:

– Metacomputing (grid computing)
– Java for High-Performance Computing
– Parallel computing and parallel algorithms
– Cluster computing
– Scientific applications of DHPC

DISCWorld Project

• Distributed Information Systems Control World (DISCWorld) is a metacomputing middleware system being developed by DHPC.

• Mainly a vehicle for research into metacomputing systems, but also developing software and applications.

• Object-oriented, written in Java; provides access to well-known services.

• Still work in progress - design work done, various modules in different states of completion.

• DISCWorld - high-level, but ideas not fully implemented.

• Globus - very low-level, limited capabilities, but the de facto standard.

• Would like to use high-level DISCWorld ideas, but utilize the grid tools, protocols, etc. being developed around Globus.

• Current work includes:
– chains or process networks of services for remote distributed processing (sketched below)
– transparent access to "active" distributed hierarchical file systems
– integration with Globus tools (using Java CoG)
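To make the service-chaining idea concrete, here is a minimal Java sketch of composing remote services into a pipeline. The Service interface, the ServiceChain class and their method names are invented for illustration and are not the actual DISCWorld API.

    import java.util.List;

    // Hypothetical sketch of chaining services into a process network.
    // None of these types are the real DISCWorld classes.
    interface Service {
        byte[] execute(byte[] input) throws Exception;   // runs on some remote host
    }

    class ServiceChain implements Service {
        private final List<Service> stages;

        ServiceChain(List<Service> stages) { this.stages = stages; }

        // Each stage's output becomes the next stage's input; in a real system
        // intermediate results could stay on the remote hosts rather than
        // flowing back through the client between stages.
        public byte[] execute(byte[] input) throws Exception {
            byte[] data = input;
            for (Service stage : stages) {
                data = stage.execute(data);
            }
            return data;
        }
    }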

Virtual Data Grids

• Data grids - where storing, searching and accessing large (e.g. Pbyte) distributed data sets are at least as important as processing them.

– High-energy physics, astronomy, satellite data, biological data, …

• DHPC area of interest - On-Line Data Archives (OLDA) project.

• Distributed "active" data archives - or virtual data grids:
– Servers don't just provide "passive" static data from files.
– Can provide smart data pre-processing services (data reduction, conversion, etc).
– Server(s) generate data on-the-fly, or access cached copy.
– Specify data services or requirements (e.g. metadata), not filenames or URLs.
– Transparently access "best" copy from distributed replicas.
– Distributed Active Resource arChitecture (DARC)
– International Virtual Data Grid Laboratory (IVDGL) work on virtual data grids

• Example:
– user specifies required satellite image using metadata (time, region, satellite)
– DARC node searches distributed archives, finds "nearest" copy, requests data
– server does format conversion, georectification, cropping, ...
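As a rough illustration of specifying data by metadata rather than by filename, the sketch below builds a request like the satellite-image example. The DataRequest class, its methods and the particular metadata values (satellite name, time, region) are hypothetical, not part of DARC or IVDGL.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical illustration: a request describes *what* data is wanted
    // (metadata) and what processing to apply, never a filename or URL.
    public class DataRequest {
        private final Map<String, String> metadata = new HashMap<>();
        private final Map<String, String> processing = new HashMap<>();

        public DataRequest where(String key, String value) {
            metadata.put(key, value);
            return this;
        }

        public DataRequest process(String step, String args) {
            processing.put(step, args);
            return this;
        }

        public static void main(String[] args) {
            // e.g. "the image covering this region at this time, georectified
            // and cropped" - the grid decides which replica actually serves it.
            DataRequest req = new DataRequest()
                .where("satellite", "GMS-5")              // illustrative values only
                .where("time", "2002-12-01T06:00Z")
                .where("region", "138E,35S,139E,34S")
                .process("georectify", "UTM zone 54")
                .process("crop", "region");
            System.out.println("request: " + req.metadata + " " + req.processing);
        }
    }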

Active Data Using GASS

[Diagram: Active GASS Client (Host A), Active GASS Server (Host B), Job Manager and Remote GASS Server with Host Table (Host C)]

Example request URLs (handled by servlets using HTTPS):
https://host:port/filename?program=Truncation&offset=1&length=100
https://host:port/Truncation?filename=myfile&offset=&length=100

• Legacy applications can access data grid resources using filenames (URLs)
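The URL forms above are the whole client-side interface: a legacy application simply opens an HTTPS URL that names a file and a processing program, and a servlet returns the processed bytes. A minimal plain-Java sketch follows; the host, port and parameter values are placeholders taken from the slide, not a real endpoint.

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Sketch: fetch "active" data by naming the file plus the processing
    // program in the URL, as on the slide above. host/port are placeholders.
    public class ActiveGassFetch {
        public static void main(String[] args) throws Exception {
            URL url = new URL("https://host:8443/myfile"
                    + "?program=Truncation&offset=1&length=100");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                // The bytes returned are the *processed* data (here, a slice
                // produced by the Truncation program), not the raw file.
                byte[] processed = in.readAllBytes();
                System.out.println("received " + processed.length + " bytes");
            }
        }
    }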

Distributed Active Resource Architecture (DARC)

[Diagram: DARC peer-to-peer network - Hosts A, B and C each run a DARC Node hosting Data Resources (DR), connected via TCP]

– Allows building of distributed storage devices that support active data
– Peer-to-peer approach, each machine runs a DARC node
– User (or system) supplied Data Resources (DR)
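The "user (or system) supplied Data Resources" point suggests a plug-in interface roughly like the sketch below. The interface, class and method names are hypothetical illustrations, not the real DARC API.

    // Hypothetical sketch of a plug-in Data Resource (DR) hosted by a DARC
    // node; the names are illustrative, not the real DARC interface.
    interface DataResource {
        boolean canServe(java.util.Map<String, String> metadata);
        byte[] retrieve(java.util.Map<String, String> metadata,
                        java.util.Map<String, String> processing) throws Exception;
    }

    // A node keeps a registry of DRs; requests it cannot satisfy locally
    // would be forwarded to peer nodes listed in its host table.
    class DarcNodeSketch {
        private final java.util.List<DataResource> resources = new java.util.ArrayList<>();

        void register(DataResource dr) { resources.add(dr); }

        byte[] handle(java.util.Map<String, String> metadata,
                      java.util.Map<String, String> processing) throws Exception {
            for (DataResource dr : resources) {
                if (dr.canServe(metadata)) return dr.retrieve(metadata, processing);
            }
            throw new Exception("no local DR matches; would forward to a peer node");
        }
    }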

Active Data Using DARC

[Diagram: Active GASS Client (Host A), GASS Server proxy and DARC Node with File System and Host Table (Host B), DARC Node with File System and Host Table (Host C); nodes communicate using GSI]

Integration of DARC with Globus tools

– Allows DARC to use GASS, GridFTP, Replica Catalog
– Allows Globus grid applications to access DARC data resources

Mobile Metacomputing Middleware

• m.Net 3G mobile network testbed in Adelaide city (North Terrace).
• Collaborative project to provide metacomputing middleware for mobile devices (e.g. iPAQ, phones) - starting next year.

• DISCWorld metacomputing ideas fit well to a mobile environment:
– provide thin client with access to a set of well-known remote services
– provide resource brokering in a dynamic environment
– Java implementation
• Middleware handles low-level network details:
– dynamic network environment - mobile user, dropouts, handovers
– 3G, regular mobile, 802.11 wireless, docking station
– quality of service (adding a software layer to interface to the IP stack)
• Allow users (or applications) to specify policies for services, tasks, priorities, costs (see the sketch below).
• Provide adaptation for dynamic network, user policies, QoS, cost.
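As a rough sketch of what such a user policy might look like, assuming it is expressed as a simple Java object consulted by the middleware (the class name, fields and units below are invented for illustration, not part of the planned middleware):

    // Hypothetical illustration of a user/application policy the mobile
    // middleware could consult when choosing networks and remote services.
    public class ClientPolicy {
        enum Priority { COST, LATENCY, BANDWIDTH }

        Priority prefer = Priority.COST;        // what to optimise for
        double maxCostPerMb = 0.10;             // budget cap (illustrative units)
        int minBandwidthKbps = 64;              // below this, defer bulk transfers
        boolean allowHandoverMidTask = true;    // keep a task alive across a 3G/802.11 handover

        // The middleware would re-check the policy whenever the network
        // changes, e.g. moving a large transfer from 3G to 802.11 when in range.
        boolean acceptNetwork(double costPerMb, int bandwidthKbps) {
            return costPerMb <= maxCostPerMb && bandwidthKbps >= minBandwidthKbps;
        }
    }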

Campus Computational Grid

• Many different compute resources available on campus:
– Supercomputers (SGI box, PC cluster with Ethernet, Sun and PC clusters with Myrinet)
– Several small clusters (Sun Netra, Alpha, Linux PC, Javastation, …)
– Student labs (Windows PC, iMac with OS X)
– Desktop workstations
• Student labs are probably the largest computational resource!

• A mixture of non-interoperable cluster management systems (CMS), each with pros and cons, significant effort to install and maintain:
– Condor - desktop workstations and Windows PCs (but not good for parallel machines)
– Proprietary CMS (e.g. on Sun cluster)
– PBS - other parallel computers
– Only Sun Grid Engine currently ported to Mac OS X
• How to integrate this heterogeneous mix of compute resources and management systems to make them easily and transparently accessible by a variety of users?

Problems with Campus Grids

• Could integrate with Globus, but how to submit jobs?
• Not globusrun - too low level.

• Ideally users would submit jobs as they do now - with shell scripts, PBS job scripts, Condor job submission files - and have them run on any resource (whether or not it runs PBS, Condor, etc.).

• But currently this requires two translations: CMS job script -> RSL/globusrun -> target CMS/scheduler.
• Globus (mostly) handles the second translation (RSL to the local scheduler), but not the first (see the sketch below).
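The missing first translation could be sketched roughly as below: a few PBS directives are mapped onto standard Globus RSL attributes (executable, arguments, count, maxWallTime). The PbsToRsl class and its naive parsing are hypothetical, not an existing tool.

    // Hypothetical sketch: translate a minimal PBS-style job script into a
    // Globus RSL string that could be passed to globusrun. Only a few
    // directives are handled; real scripts are much richer.
    public class PbsToRsl {
        public static String translate(String pbsScript) {
            int nodes = 1;
            String walltime = null;
            StringBuilder command = new StringBuilder();
            for (String line : pbsScript.split("\n")) {
                line = line.trim();
                if (line.startsWith("#PBS -l nodes=")) {
                    nodes = Integer.parseInt(
                        line.substring("#PBS -l nodes=".length()).split("[:,]")[0]);
                } else if (line.startsWith("#PBS -l walltime=")) {
                    walltime = line.substring("#PBS -l walltime=".length());  // hh:mm:ss
                } else if (!line.startsWith("#") && !line.isEmpty()) {
                    command.append(line);   // naive: assume a single command line
                }
            }
            String[] parts = command.toString().split("\\s+", 2);
            StringBuilder rsl = new StringBuilder("&(executable=" + parts[0] + ")");
            if (parts.length > 1) rsl.append("(arguments=" + parts[1] + ")");
            rsl.append("(count=" + nodes + ")");
            if (walltime != null) {
                String[] hms = walltime.split(":");
                int minutes = Integer.parseInt(hms[0]) * 60 + Integer.parseInt(hms[1]);
                rsl.append("(maxWallTime=" + minutes + ")");
            }
            return rsl.toString();
        }

        public static void main(String[] args) {
            String script = "#PBS -l nodes=4\n#PBS -l walltime=01:30:00\n/home/user/sim input.dat\n";
            System.out.println(translate(script));
            // prints: &(executable=/home/user/sim)(arguments=input.dat)(count=4)(maxWallTime=90)
        }
    }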

• Non-trivial - a CMS combines job specification/execution and resource request/brokering, but Globus separates the two.
• How to match jobs with appropriate resources (e.g. which jobs need shared memory, Myrinet, Ethernet, no comms)?
• How to interface and cycle-share with external grid resources?

Grid Applications

• Lattice Gauge Theory
– Centre for the Subatomic Structure of Matter (CSSM)
– International Lattice Data Grid for sharing simulation data
• Bioinformatics
– National Centre for Plant Functional Genomics
– Molecular Biosciences department
– APGrid biogrid project
• High-energy physics
– Collaboration between CSSM and Jefferson Lab in US
– Data analysis and results
– Access Grid
• Computational chemistry
• CANGAROO gamma ray telescope
– Collaboration between Australia and Japan
– Link to national/international Virtual Observatory projects