The California Institute for Telecommunications and


OptIPuter System Software

Andrew A. Chien
SAIC Chair Professor, Computer Science and Engineering, UCSD
Director, Center for Networked Systems
September 2003

OptIPuter System Software Team

• Challenge
  – ~20 Lead Researchers, Many More in Entire Team
  – Diverse Researcher Backgrounds and Focus
  – Broad Research Agenda, Abstract Shared Perspective
• Process
  – Innumerable Phone Calls and 1-on-1 Meetings, Fall 2002-Spring 2003
  – Team Meeting with UCSD and UCI Teams (October 4, 2002)
  – Straw Man OptIPuter System Software Architecture (January 2003)
    – Goals, Context, Organization, Relationship of Efforts
  – OptIPuter All Hands Meeting, February 6-7, 2003
    – First Presentation to Entire Team
    – Feedback, Revision, Improvement, Deeper Understanding, Shared Perspective
  – Optical Signaling and Network Management Meeting (May 22, 2003)
    – Mambretti Organized
  – OptIPuter Software Architecture Version 1.0 (July 2003)
    – Structure Stabilized, Interfaces Becoming Concrete

λ’s Transform Distributed Systems

• Key Technology Changes
  – Massive Bandwidth: 100-1000x Increases
  – Wide-Area Systems: “End To End” λ-Connections; Private Networks, Guaranteed Bandwidth
  – Endpoints are Parallel Clusters
  – Large-Scale Network-Attached Storage, Instruments, Displays, Other Peripherals
• Challenge is Abstractions, Technologies, and Protocols (SOFTWARE!)
  – Grids and Flexible Wide-Area Sharing to Deliver these Opportunities
  – Communication Capabilities to Applications
  – Tight Wide-area Resource Coupling
  – Simpler Distributed Applications
  – Proactive Computing and Communication

Towards Middleware for λ-Networked Systems

Globus Architecture (layers, top to bottom)
Application
Collective: DUROC, GARA, Replica Catalogs, Metadata Servers, Brokers, Workflow
Resource: GRAM, GridFTP, GRIS, Co-allocation
Connectivity: Globus_IO/XIO & GSI
Fabric: Resource Access and Control: Computers, Storage, Networks

• Leverage Investment and Capabilities (e.g. Globus 2.2 and 3.0)
  – Carl Kesselman, OptIPuter Participant
  – Ian Foster, OptIPuter Frontier Advisory Board
• Explore What Must Change
  – New Software/Protocols for Managing Lambdas
  – Simplify, Deliver Higher Performance and New Capabilities

OptIPuter Software Architecture for Distributed Virtual Computers v1.1

[Figure: layered software architecture, top to bottom]
OptIPuter Applications; Visualization
DVC/Middleware: DVC #1, DVC #2, DVC #3; Higher Level Grid Services; Security Models; Data Services: DWTP; Real-Time
Grid and Web Middleware (Globus/OGSA/WebServices/J2EE)
Transport: Layer 5: SABUL, RBUDP, Fast, GTP; Node Operating Systems; Layer 4: XCP
Optical Signaling/Mgmt: λ-configuration, Net Management
Physical Resources

OptIPuter Links Three Major Sets of Technology Activities

• Distributed Virtual Computers
  – Provide Simple Abstractions
  – Aggregate Component Technology Capabilities
  – Surface Novel Capabilities
• High Speed Transport Protocols [Bannister’s Talk]
  – Long Thread of High Bandwidth-Delay Product Network Protocols
  – Span The Range “Reach” For Dedicated Optical Connections
    – Complete Integration with IP Network Management
    – Hybrid – to Local Packet-Switched Networks
    – Separate – End-to-end
• Optical Network Signaling and Management [Mambretti’s Talk]
  – Single Domain and Inter-Domain
  – Hybrid Circuit and Packet-Switched Networks
  – Planning and Execution
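The bandwidth-delay product named on this slide is the amount of data that must be in flight to keep a long path full. A short worked example follows; the link rates and the 60 ms round-trip time are illustrative assumptions, not figures from the talk.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound for a window-based sender: at most one window per RTT."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    rtt = 0.060  # assumed wide-area round-trip time, in seconds
    for gbps in (1, 10):  # assumed lambda rates
        megabytes = bdp_bytes(gbps * 1e9, rtt) / 1e6
        print(f"{gbps} Gb/s x {rtt * 1000:.0f} ms RTT: {megabytes:.1f} MB in flight")
    # A 64 KiB window is the classic TCP limit without window scaling.
    cap = window_limited_throughput_bps(64 * 1024, rtt)
    print(f"64 KiB window caps throughput near {cap / 1e6:.2f} Mb/s")
```

Under these assumptions a small fixed window uses only a tiny fraction of a dedicated lambda, which is one way to read the motivation for the transport protocols listed above.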

Distributed Virtual Computers


Exploiting λ’s for an Application
• Network View: Ad Hoc Connections
  – Applications Request λ-Connections
  – Network Recognizes High BW Flows and Configures
• System View: Enclave of Resources and Connections
  – a Distributed Virtual Computer (a SYSTEM)
  – How to Specify, Implement, and Exploit?

DVC Examples
[Figure labels: SDSC; UCI or UIC; UCSD CSE; SIO/NCMIR]
• Virtual Cluster (Hide Complexity of Grid; Resource Flexibility)
  – Shared Single Domain (Spans Multiple)
  – Private Connections; Simple Network Naming
  – Simple Resource Discovery and Access
  – Uniform Performance Characteristics
  – Direct Access to Everything (Storage, Displays, etc.)
• Real-Time Virtual Cluster for Distributed Collaborative Visualization
  – Grid Resources + Real-Time (TMO)
• Collaborative Visualization Cluster
  – Grid Resources + Photonic Multicast or LambdaRAM (Leigh)

Realizing Distributed Virtual Computers

Research Challenges
• Application-driven Definition of Abstractions
  – Useful Collections which Match Application Paradigms and Needs
  – Incorporates New Collective Models
  – DVC Description
    – Namespaces, Communication, Performance, Real Time, …
    – Standard Specifications; Most Applications Parameterize
  – Integration of Component Technologies
• Executing the DVC on a Grid
  – Planner That Identifies Resources
    – Selects from Virtual Grid Resources
    – Negotiates with Resource Managers and Brokers
  – Executor and Monitor for DVC
    – Acquires and Configures
    – Monitors for Failures and Performance
    – Adapts and Reconfigures
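The plan, monitor, and reconfigure cycle listed on this slide can be sketched as follows. This is a minimal illustration, not the OptIPuter design: `Resource`, `DVCSpec`, `plan_dvc`, the greedy selection rule, and the resource names and rates are all hypothetical; only the site names come from the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    site: str
    kind: str           # "compute", "storage", "display", ...
    lambda_gbps: float  # rate of the dedicated connection it can terminate


@dataclass(frozen=True)
class DVCSpec:
    needs: tuple        # ((kind, count), ...) requested by the application
    min_gbps: float     # performance requirement on every connection


def plan_dvc(spec, candidates, failed=frozenset()):
    """Greedy planner: per kind, take the fastest usable resources.

    Returns the chosen resources, or None if the spec cannot be met.
    Passing the names of failed resources re-plans around them.
    """
    usable = [r for r in candidates
              if r.name not in failed and r.lambda_gbps >= spec.min_gbps]
    chosen = []
    for kind, count in spec.needs:
        fit = sorted((r for r in usable if r.kind == kind),
                     key=lambda r: (-r.lambda_gbps, r.name))
        if len(fit) < count:
            return None
        chosen.extend(fit[:count])
    return chosen


# Hypothetical resource pool; names and rates are made up for illustration.
CANDIDATES = [
    Resource("cluster-a", "UCSD CSE", "compute", 10.0),
    Resource("cluster-b", "SDSC", "compute", 1.0),
    Resource("cluster-c", "UIC", "compute", 10.0),
    Resource("store-a", "SDSC", "storage", 10.0),
    Resource("wall-a", "SIO/NCMIR", "display", 2.5),
]
SPEC = DVCSpec(needs=(("compute", 1), ("storage", 1), ("display", 1)),
               min_gbps=2.0)

if __name__ == "__main__":
    print([r.name for r in plan_dvc(SPEC, CANDIDATES)])
    # A monitor that detects a failure would call the planner again:
    print([r.name for r in plan_dvc(SPEC, CANDIDATES, failed={"cluster-a"})])
```

A real planner would also negotiate with resource managers and brokers, which this sketch leaves out.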

OptIPuter Component Technologies


Current Storage Views

• Network-attached Storage (NAS)
  – Filesystem Protocols; Integrated Access-Control and Security
  – Low Performance; Little Aggregation and Parallelism
• Grid View: High-Level Storage Federation
  – GridFTP (Distributed File Sharing)
  – GSI-based Access/Authentication
  – Put/Get, Third-Party Transfers, Whole File and Segments
• Single-System View: Lower-Level Storage Federation
  – Secure Single System View
  – SAN – Block Level Disk and Controller Protocols
  – High Performance, Efficient Sharing
• Research Areas
  – Network-Attached Secure Disk
  – Direct Access File Systems

We Need a Distributed Storage Solution for e-Science
• Distributed Data Generators
• BIRN: Distributed Data, Intensive Analysis
  – 100GB Data Elements; Petabyte Data Sets
  – Comparative and Collective Analysis across Data Elements
  – Visualization of Multi-Scale Data Objects

Storage Research Directions

• From Performance to Performability
  – Manage and Exploit Multi-Latency Performance
  – Parallel Performance, Stability, and Isolation
  – Integration of Device, Network, Site Reliability Concerns
• OptIPuter Storage Directions
  – Application-Driven Design
    – Needs, Performance, Device/Site/Network Flexibility, Coding and Selection
  – Integrate Dynamic λ’s and SAN Networks
    – Peering, Protocol Interfacing, Performance
  – Performance Robust Storage
    – Erasure/Other Redundancy; Large-Scale Parallelism; Statistical Approaches to Performance Isolation
  – Secure Shared Storage: Threshold Cryptography Approach
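As one concrete instance of the erasure redundancy mentioned above, the sketch below uses single XOR parity, the simplest erasure code. The slide does not say which codes the project intends to use, so this choice, the function names, and the sample blocks are assumptions for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Append one parity block; the result tolerates the loss of any one block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored, missing_index):
    """Rebuild the block at missing_index from all the surviving blocks."""
    survivors = [b for i, b in enumerate(stored) if i != missing_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    blocks = [b"opti", b"pute", b"rsys"]  # e.g. stripes held at three sites
    stored = encode(blocks)               # a fourth site holds the parity
    print(recover(stored, 1))             # rebuilds the second stripe
```

Because any three of the four blocks suffice, a reader can also skip the slowest site rather than wait for it, which is one way redundancy can serve performance as well as reliability.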

OptIPuter Security Considerations

• OptIPuter as a Computing Platform
  – Information Assurance and Security Needed for Applications
  – Current Plan: Use Globus Security Infrastructure
• OptIPuter as a Research Platform
  – Current Efforts
    – Distributed Security Services (Goodrich & Tamassia)
    – Incremental IP Trace-Back via Packet Marking for DOS Defense (Goodrich)
    – Enhanced Forensic Analysis By Design (Karin & Peisert)
  – Planned Efforts
    – Minimum Round Trip Latency Control (Goodrich)
    – Hardening Against Attacks by Multi-Path Routing (Goodrich, Karin)
    – End-to-End Application and Session Security Through Dedicated Lambdas (Karin)
Source: Karin, UCSD and Goodrich, UCI

Multi-Lambda Security Opportunities

• Security Frequently Defined Through Three Measures:
  – Integrity, Confidentiality, and Reliability (“Uptime”)
• Can These Measures be Enhanced by Employing Multiple Lambdas?
• Can Confidentiality be Improved by Dividing the Transmission Over Multiple Lambdas?
  – Fundamentally or Using “Cheap” Encryption?
• Can Integrity be Ensured or Reliability Improved by Exploiting Redundancy?
  – Source Coding and Performance
  – Adaptive Techniques
Source: Goodrich, Karin
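One possible reading of "dividing the transmission over multiple lambdas" is XOR secret splitting, sketched below. The slide poses this as an open question and names no scheme, so the technique and the function names here are assumptions for illustration.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split(message: bytes, n: int):
    """Split message into n shares (n >= 2), one per lambda.

    The first n-1 shares are random pads; the last is the message
    XORed with all of them.
    """
    pads = [secrets.token_bytes(len(message)) for _ in range(n - 1)]
    last = message
    for pad in pads:
        last = xor_bytes(last, pad)
    return pads + [last]


def combine(shares):
    """XOR all shares together to recover the message."""
    out = bytes(len(shares[0]))
    for share in shares:
        out = xor_bytes(out, share)
    return out


if __name__ == "__main__":
    shares = split(b"start the chorus at 2pm", 4)  # four assumed lambdas
    print(combine(shares))
```

An eavesdropper who taps fewer than all n lambdas sees only uniformly random bytes. The cost is that losing any one lambda loses the message, so this form trades reliability for confidentiality; a threshold scheme would relax that.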

Vision – Real-Time Tightly Coupled Wide-Area Distributed Computing

Real Time Object Network
Dynamically Formed Distributed Virtual Computer
Goals
• High-precision Timings of Critical Actions
• Tight Bounds on Response Times
• Ease of Programming
  – High-Level Prog
  – Top-Down Design
• Ease of Timing Analysis
Source: Kim, UCI

Real-Time: from LAN to WAN

• Time-Triggered Message-Triggered Object (TMO) Middleware Subsystem Model that can be Easily Implemented on Both Windows and Linux Platforms
• Developed a Global Time-Based Coordination for use in Fair and Efficient Distributed On-Line Game Systems and LAN Feasibility Demonstration
  – a Step towards Distributed OptIPuter Environment Demonstration
  – Paper will be Presented at IDPT 2003 Conference, December 2003
[Figure: Components of a C++ object: var; AAC; TT Method 1, TT Method 2; Service Method 1, Service Method 2; Deadlines. No Thread, No Priority; High-level Programming Style]
Source: Kim, UCI
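The object structure in the figure (time-triggered methods, service methods, deadlines, no explicit threads or priorities) can be mimicked with a toy simulated clock. This is a sketch of the programming style only; it is not the TMO middleware API, and every class, method, and parameter name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TTMethod:
    name: str
    period: int    # activate every `period` ticks of the simulated clock
    deadline: int  # ticks allowed between activation and completion
    body: Callable[[dict, int], None]


class TMOSketch:
    """Toy time-triggered, message-triggered object on a simulated clock."""

    def __init__(self):
        self.store = {}     # object data shared by all methods
        self.tt = []        # time-triggered methods
        self.services = {}  # message-triggered (service) methods
        self.pending = []   # (arrival_tick, service_name, payload)
        self.log = []       # (tick, method_name, deadline_tick or None)

    def add_tt(self, method):
        self.tt.append(method)

    def add_service(self, name, body):
        self.services[name] = body

    def send(self, arrival_tick, name, payload=None):
        self.pending.append((arrival_tick, name, payload))

    def run(self, ticks):
        for now in range(ticks):
            # Time-triggered methods run first in every tick, so an
            # incoming message never delays a scheduled action.
            for m in self.tt:
                if now % m.period == 0:
                    m.body(self.store, now)
                    self.log.append((now, m.name, now + m.deadline))
            due = [p for p in self.pending if p[0] == now]
            self.pending = [p for p in self.pending if p[0] != now]
            for _, name, payload in due:
                self.services[name](self.store, payload)
                self.log.append((now, name, None))


def demo():
    obj = TMOSketch()
    obj.store.update(samples=0, replies=[])

    def sample(store, now):      # time-triggered: runs at ticks 0, 2, 4
        store["samples"] += 1

    def read(store, payload):    # service method: runs when a message arrives
        store["replies"].append(store["samples"])

    obj.add_tt(TTMethod("sample", period=2, deadline=1, body=sample))
    obj.add_service("read", read)
    obj.send(3, "read")
    obj.run(5)
    return obj


if __name__ == "__main__":
    print(demo().log)
```

Method bodies take zero simulated time here, so the deadline is only recorded in the log; a real-time system would have to enforce it.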

TMO and OptIPuter Software

• TMO will be Integrated into the Overall OptIPuter Software Architecture
• Begin Design of TMO Programming Framework for the OptIPuter
• Prototype Implementation of TMO Support on Linux Platforms, Including OptIPuter Visualization Cluster (UIC – Leigh, UCI – Jenks)
• An API Wrapping the Services of the RT Middleware Enables High-Level RT Programming Without a New Compiler
[Figure: two nodes, each with Middleware, FT Support, TMOSM, Kernel, and Lambda mux/demux, exchanging data; example message: "Let us start a chorus at 2pm"; label: e-Science]
Source: Kim, UCI

Prophesy: Application Performance Modeling

• Performance Modeling of Applications on OptIPuter
• Cross Platform Comparison (vs. Traditional Grid & Parallel)
• Yr1: Completed Data Analysis Module
• Yr2: Work with Applications and High Speed Transport Protocols
• Target Applications Include:
  – SIO Geophysical Data Visualization
  – NCMIR/BIRN Neuroscience Applications
[Figure: DATA COLLECTION: Profiling & Instrumentation, Actual Execution; DATABASES: Template Database, Performance Database, Systems Database; DATA ANALYSIS: Model Builder, Symbolic Predictor; Web-based GUI]
Source: Taylor, TAMU
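The flow from performance database to model builder to predictor can be illustrated with a least-squares fit. The slide does not describe Prophesy's actual modeling techniques, so ordinary least squares is swapped in here as a stand-in; the function names and the measurements are made up for illustration.

```python
def fit_linear(sizes, times):
    """Least-squares fit of time = intercept + slope * size."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, size):
    """Evaluate a fitted (intercept, slope) model at a new problem size."""
    intercept, slope = model
    return intercept + slope * size


if __name__ == "__main__":
    # Hypothetical measured runs: problem size vs. run time in seconds.
    sizes = [1, 2, 3, 4]
    times = [3.0, 5.0, 7.0, 9.0]
    model = fit_linear(sizes, times)
    print(model, predict(model, 10))
```

Fitting one such model per platform gives a simple basis for the cross-platform comparison listed above.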

Summary

• OptIPuter System Software Team Organization
  – Development of a Concrete, Shared Perspective
  – Organization into Tightly-Coupled Teams
• OptIPuter Software Architecture 1.0 (July 2003)
  – Provides Focus on Key Problems, Clusters Related Activities
  – Framework for Integrating Diverse Capabilities, Identifying Gaps, Integrating and Delivering Solutions
• Research Activity Clusters
  – Distributed Virtual Computers
    – Including Real-Time, Security, Storage, Performance Modeling
  – High Speed Transport Protocols
  – Optical Signaling and Network Management