Rapid Prototyping and Deployment of
Distributed Web / Grid Services in a
Service Oriented Architecture using
Scripting
Thesis Proposal
Harshawardhan Gadgil
[email protected]
http://www.hpsearch.org
Outline
Motivation
Literature Survey
Research Issues
HPSearch Architecture
Contributions and Milestones
Applications
Summary
Motivation
Critical Infrastructure systems connect disparate data
sources, high-performance computing applications
and visualization services for real-time data
processing.
Real-time data processing
Results are required in real time; data is available as streams.
Requires pre-processing (e.g. filtering data to remove unwanted parts).
Scalability
Potentially large number of data sources (static or dynamic) or data processing elements (services)
Unpredictable behavior
Fault tolerance is a key factor, e.g. incorporating new data sources or processing units on the fly
Motivation (contd.)
System Management
Increasing complexity of application implies
more metadata.
Proper management required to ensure
smooth functioning of the system.
Require easy access to manage system
characteristics.
Motivation
Streaming data processing
Critical infrastructure systems (scientific applications)
Real-time streaming sources exist, e.g. sensors, satellite stations
OR static data sources (databases containing previously warehoused observations)
Data filtering / transformation is essential in most cases for converting data to the proper format for the processing application
Real-time processing required; crucial for critical infrastructure applications
Audio/video applications
Real-time sources, e.g. collaborative sessions
OR static data source (stored A/V files)
Pre-processing required to modify A/V characteristics: format (encoding), bit rate (quality), etc.
Real-time processing crucial for collaborative environments
Literature Survey
Services (Web / Grid)
Scripting Languages
Benefits
Possible problems
Handling data flow in applications
File-based vs. Streaming
Workflow Systems
Enable gluing of high-performance components
GUI-based building and programming flavors
Component-based architectures
Messaging systems (for high-throughput data transfer)
System Management
Service
“Service is a logical manifestation of a logical / physical resource (DB, programs, devices, humans etc) and/or some application logic exposed to network”
- Web Service Grids: An Evolutionary Approach (2004)
Web Services
Simple mechanism for distributed computing
Language independent, firewall friendly
Grid Services
Are essentially Web Services
Transient – can be created, destroyed, or die naturally
Stateful – state is maintained between calls to the service
Scripting Languages
Benefits
Enables rapid prototyping (smaller code size and shorter development time)
Less effort to
Perform complex tasks
Interface with OS (hosting environment)
Glue code to tie programs
Usually portable
Primarily for Plugging existing components together
However, some disadvantages too
Weak typing
Less structure, difficult to maintain
Some examples
Rhino – JavaScript for Java
Perl, VBScript, Python / Jython
Scripting vs GUI builders
GUI builders – Easier for a novice design engineer to get involved
Scripting – Provides more flexibility through direct access
Scripting Environments
Hosting Services
OGSI:Lite & WSRF:Lite
Based on Perl
Rapidly deploy grid services
Matlab / Jython from GEODISE
GEODISE – Suite of CAD integrated with distributed
grid-enabled computing, data, analysis and knowledge
resources
Uses Matlab to provide programmatic access to GEODISE functions along with an existing suite of Matlab tools
Jython is used to provide a hosting environment using the Java CoG Kit.
Data flow in applications
Real-time processing required.
Typically data transfer involves temporarily storing data. This data may be transferred using files (e.g. GridFTP).
Every component of the chain reads data from an input file and writes processed data to an output file.
Time and space are critical in real-time applications, hence file-based transfer is undesirable for them.
Tools exist to automate data transfer and invoke applications (e.g. GridAnt, Karajan)
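The difference between file-based and streaming transfer can be sketched in plain JavaScript. This is only an illustration, not HPSearch code: the stage names `source`, `dropNegatives`, and `scale` are hypothetical, and generators stand in for network streams. Each block travels through the whole chain before the next one is read, so no intermediate file is written between stages.

```javascript
// Hypothetical stages; generators stand in for network streams.
function* source(blocks) {
  for (const block of blocks) yield block;
}

// Filter stage: removes unwanted (negative) values from each block.
function* dropNegatives(stream) {
  for (const block of stream) yield block.filter((v) => v >= 0);
}

// Processing stage: transforms each block as soon as it arrives.
function* scale(stream, factor) {
  for (const block of stream) yield block.map((v) => v * factor);
}

// Chain the stages; nothing runs until the consumer pulls a block.
function runPipeline(blocks) {
  return Array.from(scale(dropNegatives(source(blocks)), 10));
}

console.log(runPipeline([[1, -2, 3], [-4, 5]]));
```

A file-based chain would instead run each stage to completion and store its full output before the next stage starts, which is where the extra time and space go.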
Workflow Architectures
Triana – Graphical PSE to compose scientific
applications
Composed of one or more Triana engines.
Distributed version
Data transfer takes place using JXTA pipes.
Taverna
Can interact with arbitrary services.
Plugins to mediate / operate the service in each case
Uses XScufl (derived from WSFL) workflow language.
Kepler
Java packages for designing and executing workflows.
Has a graphical interface for composing complex workflows
Can wrap existing code written in different languages, e.g. a Perl or Matlab script
Component Architectures
XCAT @ IU-Extreme
Connects components (Provides and Uses ports)
Jython based scripting to do application
management tasks (create application, set
properties, invoke application)
Data transfer by GridFTP between components,
Globus Reliable File Transfer (fault tolerance).
Many other systems
Focus mainly on invocation of services as in a
Workflow
Messaging systems
JXTA – P2P middleware, JMS for communication
Pastry
Fault tolerant P2P middleware
Based on Distributed Hash tables
No real-time routing possible
NaradaBrokering @ IU – http://www.naradabrokering.org
Event-brokering system designed to run on a large network of cooperating brokers.
Implements high-performance protocols (message transit time < 1 ms per broker)
Order-preserving optimized message transport
Interface with reliable storage for persistent events
Fault tolerant data transport
Support for different underlying transport implementations such as
TCP, UDP, Multicast, SSL, RTP
System Management
Increasing complexity of systems implies an increasing amount of metadata to be managed
Provide access to the system and management of system metadata - WS-Management
E.g. performance metrics, logs, service metadata
Require the ability to query system data and take actions affecting the characteristics of the system.
E.g. Perl provides hooks to query system data
Research Issues
Support for streaming data processing.
Data transfer and processing in real-time
Data transfer to be carried out between the end-points (sender and recipient) without the flow engine mediating - Grid Services Flow Language
Design a run-time system that allows merging data sources,
data filtering and processing applications and visualization tools
in a service-oriented architecture
Assume all components available as Web (Grid) services.
Scalability an issue – Addition of data sources or processing
applications (Services) should not degrade the system
performance
Fault-tolerance – Services and data sources may be lost.
Allow system to detect faults and discover and incorporate
new components.
Research Issues
System Management Interface - Allow access to
system and manipulate the characteristics of system
by querying system metadata
Create Virtual topology for application deployment
Query performance metrics to design policies to
change routing substrate characteristics (E.g. Add new
brokers or links between existing brokers to aid
efficient routing)
Discover Services / brokers / topics of interest.
To dynamically rewire components with data
streams.
Replay events
Useful for achieving recovery after failure
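Replay for recovery can be pictured with a small sequence-numbered event log. This is a minimal sketch under stated assumptions: `createEventLog`, `publish`, and `replayAfter` are hypothetical names, and an in-memory array stands in for the reliable storage a real broker would use.

```javascript
// Hypothetical event log: every published event gets a sequence number so a
// recovering subscriber can ask for everything after the last one it processed.
function createEventLog() {
  const events = [];
  return {
    // Store the payload and return the sequence number assigned to it.
    publish(payload) {
      const seq = events.length + 1;
      events.push({ seq, payload });
      return seq;
    },
    // Return the payloads of all events newer than lastSeenSeq, in order.
    replayAfter(lastSeenSeq) {
      return events.filter((e) => e.seq > lastSeenSeq).map((e) => e.payload);
    },
  };
}
```

A subscriber that crashed after processing event 1 would call `replayAfter(1)` on restart and receive only the events it missed.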
HPSearch
Binds URI to a scripting language
We use Mozilla Rhino (a JavaScript implementation; see http://www.mozilla.org/rhino), but the principles may be applied to any other scripting language
Every Resource may be identified by a URI and HPSearch allows us to manipulate the
resource using the URI.
For example, read from a web address and write it to a local file:
x = "http://trex.ucs.indiana.edu/data.txt";
y = "file:///u/hgadgil/data.txt";
r = new Resource("Copier");
r.port[0].subscribeFrom(x);   /* read from */
r.port[0].publishTo(y);       /* write to */
f = new Flow();
f.addStartActivities(r);
f.start("1");
Support for WS-Addressing constructs is under investigation
HPSearch (contd.)
Currently provides bindings for the following
file://
socket://ip:port
http://, ftp://
topic://
jdbc:
Host objects to do specific tasks
WSDL – Invoke web services using SOAP
PerfMetrics – Bind NaradaBrokering performance metrics. Store published metrics and allow querying
Resource – Every data source / filter / sink is a resource.
Flow – Create a data flow between resources.
For more information, visit http://www.hpsearch.org
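The URI bindings listed above can be pictured as a dispatch on the URI scheme. The sketch below is an assumption for illustration only: the `handlers` table and the `resolve` function are hypothetical and do not show how HPSearch implements its bindings. It covers three of the listed schemes.

```javascript
// Hypothetical registry mapping a URI scheme to a handler description.
// The real HPSearch handlers are not shown in the source.
const handlers = {
  file: (rest) => ({ kind: "file", path: rest.replace(/^\/\//, "") }),
  socket: (rest) => {
    const [host, port] = rest.replace(/^\/\//, "").split(":");
    return { kind: "socket", host, port: Number(port) };
  },
  topic: (rest) => ({ kind: "topic", name: rest.replace(/^\/\//, "") }),
};

// Split "scheme:rest" and dispatch to the handler registered for the scheme.
function resolve(uri) {
  const idx = uri.indexOf(":");
  if (idx < 0) throw new Error("Not a URI: " + uri);
  const scheme = uri.slice(0, idx);
  const handler = handlers[scheme];
  if (!handler) throw new Error("No binding for scheme: " + scheme);
  return handler(uri.slice(idx + 1));
}

console.log(resolve("file:///u/hgadgil/data.txt"));
```

Adding a new binding then amounts to registering one more entry in the table, which is one way to keep the set of supported schemes extensible.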
Architecture
Consists of
SHELL – Front end to scripting.
TASK_SCHEDULER (FLOW_ENGINE) – Distributes tasks among co-operating engines for load-balancing purposes.
WSPROXY
An AXIS web service that wraps an actual service. The behavior of the service can be controlled by making simple WS calls to this proxy.
Can be controlled by any workflow engine
WSProxy handles streaming data communication on behalf of the service.
The service only sees input and output streams. These could be files, a remote data stream, a file transferred via HTTP / FTP, or results from a database query
Can be deployed in standard Web Service containers (such as Tomcat)
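The task scheduler's job of spreading tasks over co-operating engines can be sketched as follows. The source does not state the distribution policy, so round-robin is an assumption here, and `distribute` along with the engine and task names is hypothetical.

```javascript
// Assumed policy: round-robin. Task i goes to engine (i mod number of engines).
function distribute(tasks, engines) {
  if (engines.length === 0) throw new Error("No engines available");
  const assignment = {};
  engines.forEach((engine) => { assignment[engine] = []; });
  tasks.forEach((task, i) => {
    assignment[engines[i % engines.length]].push(task);
  });
  return assignment;
}

// Example with 2 engines and 4 tasks (names are placeholders).
console.log(distribute(["t1", "t2", "t3", "t4"], ["engineA", "engineB"]));
```

A scheduler that tracked engine load could replace the modulo step with a choice of the least-loaded engine without changing the rest of the sketch.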
Architecture
WSProxy - Interfaces
Runnable
More control over execution (start, suspend, resume, stop…)
Basic idea (read block of data, process it, write it out)
Ideal for designing quick filtering applications that process data
in streams.
Wrapped
Wrap an existing service (Executables [*.exe], Matlab scripts,
shell / Perl scripts etc…)
Less control: can only start and stop
Ideal for wrapping existing programs / services to expose as a
pluggable component / web service
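The Runnable style described above (read a block, process it, write it out, with start, suspend, resume, and stop) can be sketched in plain JavaScript. This is not the WSProxy interface itself: `createRunnableFilter` and its method names are hypothetical, and blocks are pushed in by the caller rather than read from a stream.

```javascript
// Hypothetical stand-in for a "Runnable" style filter: it is driven one block
// at a time and can be suspended and resumed between blocks.
function createRunnableFilter(processBlock) {
  let state = "created";
  const output = [];
  return {
    start() { state = "running"; },
    suspend() { if (state === "running") state = "suspended"; },
    resume() { if (state === "suspended") state = "running"; },
    stop() { state = "stopped"; },
    // Process one block and append the result; refused unless running.
    push(block) {
      if (state !== "running") return false;
      output.push(processBlock(block));
      return true;
    },
    getState() { return state; },
    getOutput() { return output.slice(); },
  };
}
```

A Wrapped service, by contrast, would expose only the `start` and `stop` operations, since the proxy cannot pause an external executable between blocks.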
HPSearch
Architecture Overview
[Figure: HPSearch architecture overview. Each HPSearch kernel contains a request handler, a JavaScript shell, a task scheduler, a flow handler, and handlers (URIHandler, DBHandler, WSDLHandler, WSProxyHandler, other objects) that reach files, sockets, topics, a database, and web service endpoints. Kernels are linked through the broker network to other HPSearch kernels and to WSProxy instances, each wrapping a service.]
So what is the overhead?
Partial results as of now
Measured on a 1.6 GHz Pentium 4 machine with 256 MB RAM running Java 1.4.1_02, NB version 0.98 rc2, Rhino 1.5R3
Shell init: 2085 ms (average)
RDAHMM script (26 lines, a small script): about 15 ms per line to execute (average)
Task distribution (2 engines, 4 tasks): 3897.645 ms
WSProxy init (depends on the number of streams to initialize): 700 – 2000 ms (approximate values measured using System.currentTimeMillis)
Contribution of this Thesis
Stream and Service Management - Program data-flows
Incorporate static and dynamic data sources
WSProxy ensures that data flows directly between components (services) without the HPSearch engine mediating. Useful for streaming large amounts of data without overloading the controller.
Scalable?
We use NB (NaradaBrokering) as our messaging substrate, which can handle a large number of clients
All components (data sources, data processing and visualization applications) are clients. HPSearch manages streams and connects and steers components.
Fault-tolerant?
Failure of a data source or data filter (processing application) is possible.
HPSearch can use the discovery service to invoke new services (in lieu of failed services) and reconnect components via streams to continue data flow
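The recovery step described above, replacing a failed service and reconnecting the streams, can be sketched as follows. Everything here is hypothetical: `discover` is an in-memory stand-in for a discovery service, the `alive` flag stands in for fault detection, and the topic names are placeholders.

```javascript
// Hypothetical in-memory discovery: returns the first live candidate service.
function discover(candidates) {
  return candidates.find((service) => service.alive) || null;
}

// Rewire a flow: if the current filter has failed, swap in a replacement and
// keep the same input and output streams so data flow can continue.
function recover(flow, candidates) {
  if (flow.filter.alive) return flow;
  const replacement = discover(candidates);
  if (!replacement) throw new Error("No replacement service available");
  return { input: flow.input, filter: replacement, output: flow.output };
}
```

Because the streams are named independently of the service that consumes them, the replacement can subscribe to the same input and publish to the same output without the upstream and downstream components being reconfigured.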
Contribution of this Thesis (contd.)
System Management - Scripting admin tasks
Creating network (virtual broker network) topology
Querying Performance metrics
Topic / Broker discovery
Rapid deployment of applications
Deploy Network topology
Set Application properties
Deploy Application
In short:
Provide alternative programmatic (scripting) access to
remote services / resources
Milestones
Implement WS front-end to shell
Remotely submit a script for execution, possibly through a portal
WSProxy / Handler: Fault tolerance to handle situations when
The machine hosting the WSProxy dies
The broker which is used by the proxy dies
The HPSearch Engine dies
Design Application Interface
Allow users to create applications using this interface
Set Application properties, Allow modification of application
properties at runtime using scripting
NB Admin objects
NaradaBroker, PerfMetrics, NBDiscovery,
ReplayService
Milestones (contd.)
Design stream negotiation module to allow
WSProxy to negotiate stream characteristics
Select best possible transport and other QoS
elements for data transfer between two
services (for a particular stream)
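One simple form of stream negotiation is to pick the most preferred transport that both endpoints support. The sketch below assumes this rule; the source does not describe the negotiation algorithm, and `negotiateTransport` and the preference order are hypothetical. The transport names are the ones NaradaBrokering is said to support.

```javascript
// Hypothetical negotiation: pick the first transport in the preference list
// that both endpoints support. Returns null when there is no common transport.
function negotiateTransport(preference, senderSupports, receiverSupports) {
  for (const transport of preference) {
    if (senderSupports.includes(transport) && receiverSupports.includes(transport)) {
      return transport;
    }
  }
  return null;
}

// Assumed preference order for a latency-sensitive stream.
console.log(negotiateTransport(["UDP", "TCP", "SSL"], ["TCP", "UDP"], ["TCP", "SSL"]));
```

Other QoS elements could be negotiated the same way, with one preference list per element.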
Applications - To demonstrate the use
Audio / Video mixer application
Multiple data sources and data filtering
applications joined in a chain.
Applications
Streaming Data Filtering
[Figure: streaming data filtering application. Elements shown: a sensor source providing GPS data; a data filter that keeps only the estimate and error values from the input data; RDAHMM, which analyzes the data; a Matlab plotting script that produces a graph; and distributed HPSearch kernels (TSE) managing the services.]
Applications
Creating Virtual Broker Network for deploying
applications
b = new NaradaBroker("school.cs.indiana.edu");
b.create(""); /* OR b.create("file:///u/hgadgil/alternateConfig.conf"); */
b.connectTo("156.56.104.170", "5045", "t", "");
b.requestNodeAddress("156-56-104-170.bl-dhcp.indiana.edu:5045", "0");
c = new NaradaBroker("trex.ucs.indiana.edu");
c.create("");
c.connectTo("156.56.104.170", "5045", "t", "");
c.requestNodeAddress("tcp://156-56-104-170.bl-dhcp.indiana.edu:5045", "0");
[Figure: the HPSearch shell creating brokers on school.cs.indiana.edu and trex.ucs.indiana.edu and connecting each to the broker at 156.56.104.170.]
Applications
Invoking Arbitrary Web Services
approved = false;
userID = "111-22-3333";
/* loanAmt is assumed to have been set earlier in the script */
if (loanAmt < 10000)
    approved = true;
else {
    wsRA = new WSDL("http://www.riskAssessor.com/services/RiskAssessor");
    risk = wsRA.invoke("assessRisk", userID, loanAmt);
    if (risk > 50)
        approved = false;
    else
        approved = true;
}
print("Loan Approved: " + approved);
[Figure: flowchart of the same logic. If loanAmt < 10000, approved = true; otherwise risk = WS_riskAssessor(userID, loanAmt), and risk > 50 gives approved = false, else approved = true. The result is then printed.]
Summary
This thesis addresses
Managing data streams (Dynamic and static)
Connecting data sources and data processing components (available as Web Services) to process data in real time for critical infrastructure applications
Developing a general-purpose scripting architecture (like Perl) for a multitude of tasks
Goal is to create an architecture that is
Pluggable / Extensible
Manageable - Programmable
Similar to the UNIX Pipe-Filter Architecture, but
implemented on a Distributed scale