
HENP Grids and Networks: Global Virtual Organizations

Harvey B. Newman, Professor of Physics; LHCNet PI; US CMS Collaboration Board Chair
WAN in Lab Site Visit, Caltech: Meeting the Advanced Network Needs of Science
March 5, 2003

Computing Challenges: Petabytes, Petaflops, Global VOs

  

Geographical dispersion: of people and resources
Complexity: the detector and the LHC environment
Scale: Tens of Petabytes per year of data
5000+ Physicists; 250+ Institutes; 60+ Countries

Major challenges associated with:
Communication and collaboration at a distance
Managing globally distributed computing & data resources
Cooperative software development and physics analysis

New Forms of Distributed Systems: Data Grids

Next Generation Networks for Experiments: Goals and Needs

Large data samples explored and analyzed by thousands of globally dispersed scientists, in hundreds of teams

Providing rapid access to event samples, subsets and analyzed physics results from massive data stores

From Petabytes by 2002, ~100 Petabytes by 2007, to ~1 Exabyte by ~2012.

Providing analyzed results with rapid turnaround, by coordinating and managing the large but LIMITED computing, data handling and NETWORK resources effectively

Enabling rapid access to the data and the collaboration

Across an ensemble of networks of varying capability

Advanced integrated applications, such as Data Grids, rely on seamless operation of our LANs and WANs

With reliable, monitored, quantifiable high performance

LHC
First Beams: April 2007; Physics Runs: July 2007

pp: √s = 14 TeV, L_design = 10^34 cm^-2 s^-1
Heavy ions (e.g. Pb-Pb at √s ~ 1000 TeV)
27 km ring; 1232 dipoles; B = 8.3 T (NbTi at 1.9 K)

CMS and ATLAS: pp, general purpose
LHCb: pp, B-physics
ALICE: heavy ions, p-ions
TOTEM

US LHC Collaborations: ATLAS and CMS

The US provides about 20-25% of the author list in both experiments

[Maps: US ATLAS, US CMS and US LHC Accelerator institutions]

Four LHC Experiments: The Petabyte to Exabyte Challenge

ATLAS, CMS, ALICE, LHCB Higgs + New particles; Quark-Gluon Plasma; CP Violation

Data stored: ~40 Petabytes/Year and UP (2008); 0.1 to 1 Exabyte (1 EB = 10^18 Bytes) by ~2013(?) for the LHC Experiments
CPU: 0.30 Petaflops and UP

LHC: Higgs Decay into 4 muons (Tracker only, +30 minimum bias events); 1000X LEP Data Rate
Reconstructed tracks with pt > 25 GeV; all charged tracks with pt > 2 GeV
10^9 events/sec; selectivity: 1 in 10^13 (1 person in a thousand world populations)

LHC Data Grid Hierarchy

CERN/Outside Resource Ratio ~1:2; Tier0 : (Sum of Tier1) : (Sum of Tier2) ~ 1:1:1

Experiment Online System feeds the Tier 0 +1 centre at CERN at ~100-1500 MBytes/sec (~PByte/sec from the detector): 700k SI95; ~1 PB Disk; Tape Robot
Tier 1 centres (IN2P3 Center, RAL Center, INFN Center, FNAL: 200k SI95; 600 TB), linked at ~2.5-10 Gbps
Tier 2 centres, linked at ~2.5-10 Gbps
Tier 3: Institutes
Tier 4: physics data caches and workstations, at 0.1–10 Gbps

Physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels

Transatlantic Net WG (HN, L. Price) Bandwidth Requirements [*]

Bandwidth requirements (Mbps):

              2001    2002    2003    2004    2005    2006
CMS            100     200     300     600     800    2500
ATLAS           50     100     300     600     800    2500
BaBar          300     600    1100    1600    2300    3000
CDF            100     300     400    2000    3000    6000
D0             400    1600    2400    3200    6400    8000
BTeV            20      40     100     200     300     500
DESY           100     180     210     240     270     300
CERN BW [*]  155-310   622    2500    5000   10000   20000

[*] Installed BW. Maximum Link Occupancy of 50% Assumed.
See http://gate.hep.anl.gov/lprice/TAN

History – One large Research Site

Much of the Traffic: SLAC to IN2P3/RAL/INFN, via ESnet+France; Abilene+CERN
Current Traffic ~400 Mbps; ESnet Limitation
Projections: 0.5 to 24 Tbps by ~2012

Progress: Max. Sustained TCP Thruput on Transatlantic and US Links

       

8-9/01: 105 Mbps in 30 Streams SLAC-IN2P3; 102 Mbps in 1 Stream CIT-CERN
11/5/01: 125 Mbps in One Stream (modified kernel) CIT-CERN
1/09/02: 190 Mbps for One Stream shared on 2 x 155 Mbps links
3/11/02: 120 Mbps Disk-to-Disk with One Stream on a 155 Mbps link (Chicago-CERN)
5/20/02: 450-600 Mbps SLAC-Manchester on OC12 with ~100 Streams
6/1/02: 290 Mbps Chicago-CERN, One Stream on OC12 (modified kernel)
9/02: 850, 1350, 1900 Mbps Chicago-CERN with 1, 2, 3 GbE Streams on an OC48 Link
11-12/02: FAST: 940 Mbps in 1 Stream SNV-CERN; 9.4 Gbps in 10 Flows SNV-Chicago

Also see http://www-iepm.slac.stanford.edu/monitoring/bulk/; and the Internet2 E2E Initiative: http://www.internet2.edu/e2e
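For context on why these records took modified kernels and many parallel streams: the data a single TCP sender must keep in flight scales with the bandwidth-delay product. The quick calculation below assumes an illustrative 120 ms transatlantic RTT (not a figure from the slide).

```python
# Why the records above required kernel tuning and parallel streams: the TCP
# window needed to keep a transatlantic path full is the bandwidth-delay
# product. The 120 ms RTT is an illustrative transatlantic figure.

def window_needed(throughput_gbps: float, rtt_s: float) -> float:
    """Bandwidth-delay product, returned in megabytes."""
    return throughput_gbps * 1e9 * rtt_s / 8 / 1e6

for gbps in (0.125, 0.94, 9.4):
    print(f"{gbps:5.3f} Gbps -> {window_needed(gbps, 0.120):6.1f} MB of in-flight data")

# By contrast, a default 64 KB TCP window at 120 ms RTT sustains only ~4.4 Mbps.
```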

US-CERN OC48 Deployment

Phase One: a Cisco 7609 at the Caltech (DOE) PoP in Chicago linked to a Cisco 7609 at CERN, Geneva, serving the American and European partners over OC12 (Production) and OC48 (Develop and Test) circuits.

Phase Two: a Cisco 7606 and Cisco 7609 at the Caltech (DOE) PoP in Chicago linked to a Cisco 7606 and Cisco 7609 at CERN, Geneva, again over OC48 (Develop and Test) and OC12 (Production) circuits.

OC48 Deployment (Cont’d)

Phase Three (Late 2002): an Alcatel 7770, Cisco 7606 and Juniper M10 at the Caltech (DoE) PoP connected over OC48 (2.5 Gbps) to the corresponding Alcatel 7770, Cisco 7606 and Juniper M10 at DataTAG (CERN); a Cisco 7609 (Caltech, DoE) connected over OC12 (622 Mbps) to a Cisco 7609 (DataTAG, CERN), reaching the European partners via CERN.

 

Separate environments for tests and production
Transatlantic testbed dedicated to advanced optical network research and intensive data access applications

DataTAG Project

[Network map: Geneva, StarLight (Chicago) and New York interconnecting GEANT, Abilene, ESnet, STAR-TAP and CALREN-2 with UK SuperJANET4, IT GARR-B, NL SURFnet and FR INRIA (Atrium, VTHD)]

EU-Solicited Project. CERN, PPARC (UK), Amsterdam (NL), and INFN (IT); and US (DOE/NSF: UIC, NWU and Caltech) partners

Main Aims:

 

Ensure maximum interoperability between US and EU Grid Projects
Transatlantic Testbed for advanced network research

2.5 Gbps Wavelength Triangle from 7/02; to 10 Gbps Triangle by Early 2003

LHCnet Network : March 2003

CERN, Geneva: Cisco 7606 (CERN switch); Linux PCs for performance tests & monitoring; Alcatel 7770, Cisco 7609 and Juniper M10 (DataTAG, CERN); Alcatel 1670 optical mux/demux; links to GEANT, IN2P3 and WHO

Caltech/DoE PoP, StarLight Chicago: Cisco 7606, Alcatel 7770, Cisco 7609 and Juniper M10 (Caltech, DoE); Linux PCs for performance tests & monitoring; Alcatel 1670 optical mux/demux; links to Abilene, ESnet, NASA, MREN and STARTAP (development and tests)

Transatlantic links: OC12 (622 Mbps) and OC48 (2.5 Gbps)

FAST (Caltech): A Scalable, “Fair” Protocol for Next-Generation Networks: from 0.1 To 100 Gbps

[Chart: Internet2 Land Speed Record progression: I2 LSR 29.3.00; multiple flows 9.4.02; 1 flow 22.8.02; IPv6, 1-flow, 2-flow and 10-flow records at SC2002 (11/02)]

Highlights of FAST TCP

  

Standard Packet Size
940 Mbps single flow with one GE card: 9.4 petabit-m/sec, 1.9 times the LSR
9.4 Gbps with 10 flows: 37.0 petabit-m/sec, 6.9 times the LSR
22 TB transferred in 6 hours, in 10 flows

Implementation
Sender-side (only) modifications
Delay (RTT) based; Stabilized Vegas

[Block diagram: the Internet as a distributed feedback system (theory and experiment), with TCP sources and AQM links coupled through the forward and backward routing R_f(s), R_b'(s) and the congestion measure p]

URL: netlab.caltech.edu/FAST

[Map: Sunnyvale (3000 km path), Geneva, Baltimore]
Next: 10GbE; 1 GB/sec disk to disk
C. Jin, D. Wei, S. Low, FAST Team & Partners
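To make the delay-based approach above concrete, the sketch below implements the window update rule published by the FAST TCP authors, w ← min{2w, (1-γ)w + γ(baseRTT/RTT · w + α)}. The parameter values and the toy queueing model are illustrative assumptions, not the Caltech sender-side kernel implementation.

```python
# A minimal sketch of a delay-based congestion window update in the spirit of
# FAST TCP (Jin, Wei, Low). Parameters and the toy RTT trace are illustrative.

def fast_window_update(w, base_rtt, rtt, alpha=200.0, gamma=0.5):
    """Return the next congestion window (in packets)."""
    target = (base_rtt / rtt) * w + alpha   # aim to keep ~alpha packets queued
    return min(2.0 * w, (1.0 - gamma) * w + gamma * target)

if __name__ == "__main__":
    base_rtt = 0.100          # 100 ms propagation delay (illustrative)
    w = 10.0                  # initial window, in packets
    for step in range(30):
        # Toy queueing model: RTT grows with the standing queue (illustrative).
        rtt = base_rtt + w * 1e-5
        w = fast_window_update(w, base_rtt, rtt)
    print(f"window after 30 updates: {w:.0f} packets")
```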

TeraGrid (www.teragrid.org) NCSA, ANL, SDSC, Caltech, PSC

A Preview of the Grid Hierarchy and Networks of the LHC Era

[Network map: Chicago, Urbana, Indianapolis, Caltech and San Diego sites linked by OC-48 (2.5 Gb/s, Abilene), multiple 10 GbE (Qwest) and multiple 10 GbE (I-WIRE Dark Fiber); I-WIRE connects UIC, ANL, Starlight/NW Univ, multiple carrier hubs, Ill Inst of Tech, Univ of Chicago, NCSA/UIUC and Indianapolis (Abilene NOC). Source: Charlie Catlett, Argonne]

National Light Rail Footprint

[Map: NLR fiber routes, with 15808 Terminal, Regen or OADM sites at SEA, POR, SAC, SVL, FRE, LAX, SDG, PHO, OLG, OGD, DEN, KAN, DAL, STR, CHI, CLE, PIT, NYC, BOS, WDC, RAL, NAS, WAL, ATL and JAC]

Buildout Started November 2002

Initially 4 10G Wavelengths; to 40 10G Waves in the Future

Transition now to optical, multi-wavelength R&E networks: US, Europe and Intercontinental (US-China-Russia) Initiatives; Efficient use of Wavelengths is an Essential Part of this Picture

HENP Major Links: Bandwidth Roadmap (Scenario) in Gbps

Year   Production              Experimental              Remarks
2001   0.155                   0.622-2.5                 SONET/SDH
2002   0.622                   2.5                       SONET/SDH; DWDM; GigE Integ.
2003   2.5                     10                        DWDM; 1 + 10 GigE Integration
2005   10                      2-4 X 10                  λ Switch; λ Provisioning
2007   2-4 X 10                ~10 X 10; 40 Gbps λ       1st Gen. λ Grids
2009   ~10 X 10 or 1-2 X 40    ~5 X 40 or ~20-50 X 10    40 Gbps λ Switching
2011   ~5 X 40 or ~20 X 10     ~25 X 40 or ~100 X 10     2nd Gen λ Grids; Terabit Networks
2013   ~Terabit                ~MultiTbps                ~Fill One Fiber

Continuing the Trend: ~1000 Times Bandwidth Growth Per Decade; We are Rapidly Learning to Use and Share Multi-Gbps Networks

       

HENP Lambda Grids: Fibers for Physics

Problem: Extract “Small” Data Subsets of 1 to 100 Terabytes from 1 to 1000 Petabyte Data Stores

Survivability of the HENP Global Grid System, with hundreds of such transactions per day (circa 2007), requires that each transaction be completed in a relatively short time.

Example: Take 800 secs to complete the transaction. Then:

Transaction Size (TB)   Net Throughput (Gbps)
  1                       10
  10                      100
  100                     1000 (Capacity of Fiber Today)

Summary: Providing Switching of 10 Gbps wavelengths within ~3-5 years, and Terabit Switching within 5-8 years, would enable “Petascale Grids with Terabyte transactions”, as required to fully realize the discovery potential of major HENP programs, as well as other data-intensive fields.
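A quick sanity check of the scaling in the table above, as a minimal sketch assuming decimal terabytes (1 TB = 10^12 bytes) and the 800-second transaction window:

```python
# Throughput needed to move one "small" transaction in 800 seconds,
# assuming decimal units (1 TB = 10**12 bytes).

def required_gbps(terabytes: float, seconds: float = 800.0) -> float:
    return terabytes * 1e12 * 8 / seconds / 1e9

for tb in (1, 10, 100):
    print(f"{tb:4d} TB in 800 s -> {required_gbps(tb):6.0f} Gbps")
# -> 10, 100 and 1000 Gbps, matching the Transaction Size / Net Throughput table.
```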

Emerging Data Grid User Communities

NSF Network for Earthquake Engineering Simulation (NEES)

Integrated instrumentation, collaboration, simulation

Grid Physics Network (GriPhyN)

ATLAS, CMS, LIGO, SDSS

Access Grid; VRVS: supporting group-based collaboration

   

And

Genomics, Proteomics, ...

The Earth System Grid and EOSDIS
Federating Brain Data
Computed MicroTomography
…

Virtual Observatories

COJAC: CMS ORCA Java Analysis Component. Java3D, Objectivity, JNI, Web Services. Demonstrated Caltech-Rio de Janeiro (2/02) and Chile (5/02)

CAIGEE

CMS Analysis – an Integrated Grid Enabled Environment

 

CIT, UCSD, Riverside, Davis; + UCLA, UCSB
NSF ITR – 50% so far
Lightweight, functional, making use of existing software AFAP

 

Plug-in Architecture based on Web Services
Expose the Grid “Global System” to physicists, at various levels of detail, with Feedback

Supports Request, Preparation, Production, Movement and Analysis of Physics Object Collections

Initial Target: Californian US-CMS physicists

Future: Whole US CMS and CMS

Clarens (HTTP/SOAP/RPC)

The Clarens Remote (Light Client) Dataserver: a WAN system for remote data analysis

Clarens servers are deployed at Caltech, Florida, UCSD, FNAL, Bucharest; Extend to UERJ in Rio (CHEPREO)

SRB now installed as Clarens service on Caltech Tier2 (Oracle backend)


NSF ITR: Globally Enabled Analysis Communities

Develop and build Dynamic Workspaces

Build Private Grids to support scientific analysis communities

Using Agent Based Peer-to-peer Web Services

Construct Autonomous Communities Operating Within Global Collaborations

Empower small groups of scientists (Teachers and Students) to profit from and contribute to int’l big science

Drive the democratization of science via the deployment of new technologies

NSF ITR: Key New Concepts

   

Dynamic Workspaces: Provide the capability for an individual or sub-community to request and receive expanded, contracted or otherwise modified resources, while maintaining the integrity and policies of the Global Enterprise

Private Grids: Provide the capability for an individual or community to request, control and use a heterogeneous mix of Enterprise-wide and community-specific software, data, meta-data and resources

Build on a Globally Managed End-to-end Grid Services Architecture and Monitoring System: Autonomous, Agent-Based, Peer-to-Peer

Private Grids and P2P Sub Communities in Global CMS

A Global Grid Enabled Collaboratory for Scientific Research (GECSR)

 

Caltech (HN: PI, JB: Co-PI)

Michigan (Co-PI, Co-PI)

Maryland (Co-PI)

and Senior Personnel from

Lawrence Berkeley Lab

Oklahoma

Fermilab

Arlington (U. Texas)

Iowa

Florida State

The first Grid-enabled Collaboratory: tight integration between:

Science of Collaboratories,

Globally scalable working environment

A Sophisticated Set of Collaborative Tools (VRVS, VNC; Next-Gen)

Agent based monitoring and decision support system (MonALISA)

GECSR

 

Initial targets are the global HENP collaborations, but GECSR is expected to be widely applicable to other large scale collaborative scientific endeavors

“Giving scientists from all world regions the means to function as full partners in the process of search and discovery”

The importance of Collaboration Services is highlighted in the Cyberinfrastructure report of Atkins et al. 2003

Current Grid Challenges: Secure Workflow Management and Optimization

Maintaining a Global View of Resources and System State

Coherent end-to-end System Monitoring

Adaptive Learning: new algorithms and strategies for execution optimization (increasingly automated)

Workflow: Strategic Balance of Policy Versus Moment-to-moment Capability to Complete Tasks

Balance High Levels of Usage of Limited Resources Against Better Turnaround Times for Priority Jobs

Goal-Oriented Algorithms; Steering Requests According to (Yet to be Developed) Metrics

Handling User-Grid Interactions: Guidelines; Agents

Building Higher Level Services, and an Integrated Scalable User Environment for the Above

14,000+ Hosts; 8,000+ Registered Users in 64 Countries; 56 (7 I2) Reflectors; Annual Growth 2 to 3X

MonaLisa:

A Globally Scalable Grid Monitoring System

    

By I. Legrand (Caltech)
Deployed on the US CMS Grid
Agent-based dynamic information / resource discovery mechanism
Talks with other monitoring systems
Implemented in Java/Jini; SNMP; WSDL/SOAP with UDDI
Part of a Global Grid Control Room Service

    

Distributed System Services Architecture (DSSA): CIT/Romania/Pakistan

Agents: Autonomous, auto-discovering, self-organizing, collaborative, using Lookup and Discovery Services

“Station Servers” (static) host mobile “Dynamic Services”
Lookup Services let the Station Servers interconnect dynamically; they form a robust fabric in which mobile agents travel, with a payload of (analysis) tasks
Adaptable to Web services (OGSA) and many platforms
Adaptable to ubiquitous, mobile working environments

Managing Global Systems of Increasing Scope and Complexity, in the Service of Science and Society, Requires a New Generation of Scalable, Autonomous, Artificially Intelligent Software Systems
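As an illustration of the DSSA pattern just described (static Station Servers, a lookup/discovery service, and mobile agents carrying a payload of analysis tasks), here is a toy, single-process sketch. The real system is built on Java/Jini, so every class and method name below is illustrative, not the actual implementation.

```python
# Toy, in-process sketch of the DSSA pattern: static "Station Servers"
# register with a lookup service, and a mobile agent carries a payload of
# (analysis) tasks from server to server. All names are illustrative.

class LookupService:
    def __init__(self):
        self._stations = {}

    def register(self, station):
        self._stations[station.name] = station

    def discover(self):
        return list(self._stations.values())


class StationServer:
    def __init__(self, name, lookup):
        self.name = name
        lookup.register(self)          # auto-discovery via the lookup service

    def host(self, agent):
        agent.run_one_task(self)       # execute part of the agent's payload


class MobileAgent:
    def __init__(self, tasks):
        self.tasks = list(tasks)       # payload of analysis tasks

    def run_one_task(self, station):
        if self.tasks:
            task = self.tasks.pop(0)
            print(f"{station.name}: running {task}")

    def travel(self, lookup):
        # Visit stations until the payload is exhausted.
        while self.tasks:
            for station in lookup.discover():
                station.host(self)
                if not self.tasks:
                    break


if __name__ == "__main__":
    lookup = LookupService()
    for name in ("CIT", "Romania", "Pakistan"):
        StationServer(name, lookup)
    MobileAgent([f"analysis-task-{i}" for i in range(5)]).travel(lookup)
```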

MONARC SONN: 3 Regional Centres Learning to Export Jobs (Day 9)

[Simulation snapshot at Day 9: CERN (30 CPUs), CALTECH (25 CPUs) and NUST (20 CPUs), connected by 1 MB/s links with 150 ms RTT; the per-centre values shown are 0.83, 0.73 and 0.66]

By I. Legrand

Networks, Grids, HENP and WAN-in-Lab

Current generation of 2.5-10 Gbps network backbones arrived in the last 15 Months in the US, Europe and Japan

 

Major transoceanic links also at 2.5 - 10 Gbps in 2003
Capability Increased ~4 Times, i.e. 2-3 Times Moore's Law

Reliable high End-to-end Performance of network applications (large file transfers; Grids) is required. Achieving this requires:

A deep understanding of Protocol Issues, for efficient use of wavelengths in the 1 to 10 Gbps range now, and higher speeds (e.g. 40 to 80 Gbps) in the near future
Getting high performance (TCP) toolkits into users’ hands
End-to-end monitoring; a coherent approach

Removing Regional and Last Mile Bottlenecks, and Compromises in Network Quality, is now on the critical path in all regions

We will Work in Concert with AMPATH, Internet2, Terena, APAN; DataTAG, the Grid projects and the Global Grid Forum

A WAN in Lab facility, available to the Community, is a Key Element in achieving these revolutionary goals

Some Extra Slides Follow

Global Networks for HENP

National and International Networks, with sufficient (rapidly increasing) capacity and capability, are essential for

The daily conduct of collaborative work in both experiment and theory

Detector development & construction on a global scale; Data analysis involving physicists from all world regions

The formation of worldwide collaborations

The conception, design and implementation of next generation facilities as “global networks”

“Collaborations on this scale would never have been attempted, if they could not rely on excellent networks”

The Large Hadron Collider (2007-)

The Next-generation Particle Collider

The largest superconductor installation in the world

Bunch-bunch collisions at 40 MHz, Each generating ~20 interactions

Only one in a trillion may lead to a major physics discovery

Real-time data filtering: Petabytes per second to Gigabytes per second

Accumulated data of many Petabytes/Year

Education and Outreach

QuarkNet has 50 centers nationwide (60 planned)

Each center has:
2-6 physicist mentors
2-12 teachers*

* Depending on year of the program and local variations

A transatlantic testbed

 

Multiplexing of optical signals into a single OC-48 transatlantic optical channel

Multi platforms

Vendor independent

Interoperability tests

Performance tests

Layer 2 services:

Circuit-Cross-Connect (CCC)

Layer 2 VPN

IP services:

Multicast

IPv6

QoS

Future: GMPLS

GMPLS is an extension of MPLS

Service Implementation example

[Diagram: Hosts 1-3 at StarLight (Chicago) and at CERN (Geneva), connected through optical multiplexers over 2.5 Gb/s links; VLAN 600 and VLAN 601 traffic reaches Abilene & ESnet and GEANT via BGP/IBGP, a GbE Circuit-Cross-Connect (CCC), and OSPF within the CERN Network]

Logical view of previous implementation

[Logical view: over the 2.5 Gb/s POS link, Abilene & ESnet and GEANT traffic runs as IP over POS; Host 1 (VLAN 600) as Ethernet over MPLS over POS; Host 2 (VLAN 601) as IP over MPLS over POS; Host 3 as Ethernet over POS, into the CERN Network]

Using Web Services for Tag Data (Example)

Use ~180,000 Tag objects derived from di-jet ORCA events

Each Tag: run & event number, OID of ORCA event object, then E,Phi,Theta,ID for 5 most energetic particles, and E,Phi,Theta for 5 most energetic jets

These Tag events have been used in various performance tests and demonstrations, e.g. SC2000, SC2001, comparison of Objy vs RDBMS query speeds (GIOD), as source data for COJAC, etc.
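For concreteness, here is a sketch of what a Tag record with the fields listed above might look like, together with a toy selection over a collection of them. The class and field names are illustrative stand-ins, not the actual ORCA/Objectivity schema.

```python
# Sketch of the Tag objects described above: run/event number, the OID of the
# ORCA event object, and (E, phi, theta) for the most energetic particles and
# jets. Names are illustrative; the real Tags came from di-jet ORCA events.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Tag:
    run: int
    event: int
    orca_oid: str                                       # OID of the ORCA event object
    particles: List[Tuple[float, float, float, int]]    # (E, phi, theta, ID) for 5 particles
    jets: List[Tuple[float, float, float]]              # (E, phi, theta) for 5 jets

def select_hard_events(tags: List[Tag], min_jet_energy: float) -> List[Tag]:
    """Toy selection: keep Tags whose most energetic jet exceeds a cut."""
    return [t for t in tags if t.jets and max(j[0] for j in t.jets) > min_jet_energy]
```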

DB-Independent Access to Object Collections: Middleware Prototype

First layer ODBC provides database vendor abstraction, allowing any relational (SQL) database to be plugged into the system.

Next layer OTL provides an encapsulation of the results of a SQL query in a form natural to C++, namely STL (standard template library) map and vector objects.

Higher levels map C++ object collections to the client’s required format, and transport the results to the client.
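The prototype itself is C++ (ODBC for vendor abstraction, OTL mapping SQL results into STL map/vector objects, higher layers producing the client format). As an analogy only, the same three-layer idea is sketched below in Python using the standard DB-API and sqlite3; this is not the prototype's code.

```python
# Analogous three-layer sketch of the middleware described above, in Python.

import json
import sqlite3

# Layer 1: vendor abstraction. Any DB-API connection could be plugged in here
# (the analogue of ODBC letting any SQL database be used).
def open_connection(dsn: str = ":memory:"):
    return sqlite3.connect(dsn)

# Layer 2: encapsulate query results in native containers (dicts and lists),
# the Python analogue of OTL's STL map and vector objects.
def run_query(conn, sql: str, params=()):
    cursor = conn.execute(sql, params)
    columns = [c[0] for c in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

# Layer 3: map the object collection to the client's required format.
def to_client_format(rows, fmt: str = "json") -> str:
    return json.dumps(rows) if fmt == "json" else "\n".join(map(str, rows))

if __name__ == "__main__":
    conn = open_connection()
    conn.execute("CREATE TABLE jets (run INTEGER, event INTEGER, e REAL)")
    conn.execute("INSERT INTO jets VALUES (1, 42, 55.3)")
    print(to_client_format(run_query(conn, "SELECT * FROM jets WHERE e > ?", (50,))))
```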

Reverse Engineer and Ingest the CMS JETMET nTuples

From the nTuple description:

Derived an ER diagram for the content

An AOD for the JETMET analysis

We then wrote tools to:

Automatically generate a set of SQL CREATE TABLE commands to create the RDBMS tables

Generate SQL INSERT and bulk load scripts that enable population of the RDBMS tables

We imported the ntuples into

 

SQL Server at Caltech

Oracle 9i at Caltech

Oracle 9i at CERN

PostgreSQL at Florida

In the future:

Generate a “Tag” table of data that captures the most often used data columns in the nTuple
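A minimal sketch of the kind of generator tool described above: given a column description derived from the nTuple, emit the CREATE TABLE statement and an INSERT template for bulk loading. The column names and types here are hypothetical stand-ins for the JETMET content, not the actual schema.

```python
# Sketch: generate CREATE TABLE and INSERT statements from a column
# description. The columns below are hypothetical placeholders.

COLUMNS = [("run", "INTEGER"), ("event", "INTEGER"),
           ("njets", "INTEGER"), ("jet1_et", "FLOAT"), ("met", "FLOAT")]

def create_table_sql(table: str, columns) -> str:
    cols = ",\n  ".join(f"{name} {sqltype}" for name, sqltype in columns)
    return f"CREATE TABLE {table} (\n  {cols}\n);"

def insert_sql(table: str, columns) -> str:
    names = ", ".join(name for name, _ in columns)
    placeholders = ", ".join(["?"] * len(columns))
    return f"INSERT INTO {table} ({names}) VALUES ({placeholders});"

if __name__ == "__main__":
    print(create_table_sql("jetmet_aod", COLUMNS))
    print(insert_sql("jetmet_aod", COLUMNS))
```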

GAE Demonstrations

iGrid2002 (Amsterdam)

Clarens (Continued)

Servers

Multi-process (based on Apache), using XML/RPC

Similar, but using SOAP

Lightweight (using select loop) single-process

Functionality: file access (read/download/part/whole), directory listing, file selection, file checksumming, access to SRB, security with PKI/VO infrastructure, RDBMS data selection/analysis, MonaLisa integration

Clients

ROOT client … browsing remote file repositories, files

Platform-independent Python-based client for rapid prototyping

Browser based Java/Javascript client (in planning) … for use when no other client package is desired

Some clients of historical interest e.g. Objectivity

 

Source/Discussion http://clarens.sourceforge.net
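As a sketch of how a light client might talk to a Clarens-style XML-RPC server over HTTP using only the Python standard library: the endpoint URL and the file.* method names below are assumptions for illustration (only the standard XML-RPC introspection call is generic); the real API is documented at the link above.

```python
# Minimal sketch of a light client calling a Clarens-style XML-RPC server.
# The URL and the file.* method names are hypothetical.

import xmlrpc.client

SERVER_URL = "http://tier2.example.org:8080/clarens/"   # hypothetical endpoint

def main():
    proxy = xmlrpc.client.ServerProxy(SERVER_URL)
    # Standard XML-RPC introspection, if the server exposes it.
    print(proxy.system.listMethods())
    # Hypothetical calls in the spirit of the functionality listed above
    # (directory listing, partial file read):
    print(proxy.file.ls("/store/tags"))
    print(proxy.file.read("/store/tags/dijet.root", 0, 1024))

if __name__ == "__main__":
    main()
```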

Future

Expose POOL as remote data source in Clarens

Clarens peer-to-peer discovery and communication

Globally Scalable Monitoring Service

[Architecture diagram (I. Legrand): Farm Monitors gather data via push & pull (rsh & ssh scripts; SNMP) and register with Lookup Services; a Proxy provides discovery, the RC Monitor Service, a Component Factory, GUI marshaling, Code Transport and RMI data access to clients and other services]

NSF/ITR: A Global Grid-Enabled Collaboratory for Scientific Research (GECSR) and Grid Analysis Environment (GAE)

CHEP 2001, Beijing

Harvey B. Newman, California Institute of Technology, September 6, 2001

GECSR Features

Persistent Collaboration : desktop, small and large conference rooms, halls, Virtual Control Rooms

Hierarchical, Persistent, ad-hoc peer groups (using Virtual Organization management tools)

“ Language of Access ”: an ontology and terminology for users to control the GECSR.

Examples: the cost of interrupting an expert; virtual open and partly-open doors.

Support for Human-System (Agent)-Human interactions, as well as Human-Human interactions

Evaluation, Evolution and Optimisation of the GECSR
Agent-based decision support for users

The GECSR will be delivered in “packages” over the course of the (four-year) project

GECSR: First Year Package

NEESGrid: unifying interface and tool launch system

CHEF Framework: portlets for file transfer using GridFTP, teamlets, announcements, chat, shared calendar, role-based access, threaded discussions, document repository

www.chefproject.org

GIS-GIB: a geographic-information-systems-based Grid information broker

VO Management tools (developed for PPDG)

Videoconferencing and shared desktop: VRVS and VNC (www.vrvs.org and www.vnc.org)

MonaLisa: real time system monitoring and user control

The above tools already exist: the effort is in integrating and identifying the missing functionality.

GECSR: Second Year Package

Enhancements, including:

Detachable windows

Web Services Definition (WSDL) for CHEF

Federated Collaborative Servers

Search capabilities

Learning Management Components

Grid Computational Portal Toolkit

Knowledge Book, HEPBook

Software Agents for intelligent searching etc.

Flexible Authentication

Beyond Traditional Architectures: Mobile Agents

“Agents are objects with rules and legs” -- D. Taylor

Mobile Agents: (Semi)-Autonomous, Goal Driven, Adaptive

Execute Asynchronously

    

Reduce Network Load: Local Conversations
Overcome Network Latency; Some Outages
Adaptive

Robust, Fault Tolerant
Naturally Heterogeneous
Extensible
Concept: Coordinated Agent Architectures