HENP Grids and Networks Global Virtual Organizations
Harvey B Newman, Professor of Physics LHCNet PI, US CMS Collaboration Board Chair WAN In Lab Site Visit, Caltech Meeting the Advanced Network Needs of Science March 5, 2003
Computing Challenges: Petabytes, Petaflops, Global VOs
Geographical dispersion: of people and resources
Complexity: the detector and the LHC environment
Scale: tens of Petabytes per year of data
5000+ physicists, 250+ institutes, 60+ countries
Major challenges associated with:
Communication and collaboration at a distance
Managing globally distributed computing & data resources
Cooperative software development and physics analysis
New forms of distributed systems: Data Grids
Next Generation Networks for Experiments: Goals and Needs
Large data samples explored and analyzed by thousands of globally dispersed scientists, in hundreds of teams
Providing rapid access to event samples, subsets and analyzed physics results from massive data stores
From Petabytes by 2002, ~100 Petabytes by 2007, to ~1 Exabyte by ~2012.
Providing analyzed results with rapid turnaround, by coordinating and managing the large but LIMITED computing, data handling and NETWORK resources effectively
Enabling rapid access to the data and the collaboration
Across an ensemble of networks of varying capability
Advanced integrated applications, such as Data Grids, rely on seamless operation of our LANs and WANs
With reliable, monitored, quantifiable high performance
LHC: first beams April 2007; physics runs from July 2007
pp collisions at √s = 14 TeV, design luminosity L = 10^34 cm^-2 s^-1; heavy ions (e.g. Pb-Pb at √s ~ 1000 TeV)
27 km ring; 1232 dipoles; B = 8.3 T (NbTi at 1.9 K)
Experiments: CMS and ATLAS (pp, general purpose); LHCb (pp, B-physics); ALICE (heavy ions, p-ions); TOTEM
US LHC Collaborations: US ATLAS, US CMS, and US LHC Accelerator
The US provides about 20-25% of the author list in both experiments
US LHC institutions [US map]
Four LHC Experiments: The Petabyte to Exabyte Challenge
ATLAS, CMS, ALICE, LHCB Higgs + New particles; Quark-Gluon Plasma; CP Violation
Data stored: ~40 Petabytes/year and up (2008), reaching 0.1 to 1 Exabyte (1 EB = 10^18 bytes) by ~2013(?); CPU: 0.30 Petaflops and up, for the LHC experiments
LHC: Higgs decay into 4 muons (tracker only), superimposed on 30 minimum-bias events; 1000X the LEP data rate. Shown: reconstructed tracks with pT > 25 GeV against all charged tracks with pT > 2 GeV. Event rate 10^9 events/sec; selectivity 1 in 10^13 (like picking 1 person out of a thousand world populations)
LHC Data Grid Hierarchy
Experiment Online System: ~PByte/sec off the detector
CERN/outside resource ratio ~1:2; Tier0/(ΣTier1)/(ΣTier2) ~1:1:1
Tier 0+1 (CERN): 700k SI95; ~1 PB disk; tape robot; fed at ~100-1500 MBytes/sec from the online system
Tier 1: IN2P3 Center, RAL Center, INFN Center, FNAL (200k SI95; 600 TB), linked at ~2.5-10 Gbps
Tier 2: Tier2 centers, linked at ~2.5-10 Gbps
Tier 3: institutes
Tier 4: physics data caches and workstations, at 0.1-10 Gbps
Physicists work on analysis “channels” Each institute has ~10 physicists working on one or more channels
Transatlantic Net WG (HN, L. Price): Bandwidth Requirements [*] (Mbps)

Experiment    2001   2002  2003  2004   2005   2006
CMS            100    200   300   600    800   2500
ATLAS           50    100   300   600    800   2500
BaBar          300    600  1100  1600   2300   3000
CDF            100    300   400  2000   3000   6000
D0             400   1600  2400  3200   6400   8000
BTeV            20     40   100   200    300    500
DESY           100    180   210   240    270    300
CERN BW    155-310    622  2500  5000  10000  20000

[*] Installed BW; maximum link occupancy of 50% assumed. See http://gate.hep.anl.gov/lprice/TAN
History – One large Research Site
Much of the traffic is SLAC to IN2P3/RAL/INFN, via ESnet + France, and Abilene + CERN
Current traffic ~400 Mbps (an ESnet limitation); projections: 0.5 to 24 Tbps by ~2012
Progress: Max. Sustained TCP Thruput on Transatlantic and US Links
8-9/01: 105 Mbps in 30 streams SLAC-IN2P3; 102 Mbps in 1 stream CIT-CERN
11/5/01: 125 Mbps in one stream (modified kernel) CIT-CERN
1/09/02: 190 Mbps for one stream shared on two 155 Mbps links
3/11/02: 120 Mbps disk-to-disk with one stream on a 155 Mbps link (Chicago-CERN)
5/20/02: 450-600 Mbps SLAC-Manchester on OC12 with ~100 streams
6/1/02: 290 Mbps Chicago-CERN, one stream on OC12 (modified kernel)
9/02: 850, 1350, 1900 Mbps Chicago-CERN with 1, 2, 3 GbE streams on an OC48 link
11-12/02 (FAST): 940 Mbps in 1 stream SNV-CERN; 9.4 Gbps in 10 flows SNV-Chicago
Also see http://www-iepm.slac.stanford.edu/monitoring/bulk/; and the Internet2 E2E Initiative: http://www.internet2.edu/e2e
US-CERN OC48 Deployment
Phase one: Cisco 7609 routers at CERN (Geneva) and at the Caltech (DOE) PoP in Chicago link the European and American partners over OC12 (production) and OC48 (development and test) circuits.
Phase two: a Cisco 7606 is added at each end alongside the 7609, separating the OC48 (development and test) and OC12 (production) paths.
OC48 deployment (cont'd)
Phase three (late 2002): Alcatel 7770 and Juniper M10 routers (DataTAG/CERN in Geneva, Caltech/DoE in Chicago) join the Cisco 7606s on the OC48 (2.5 Gbps) development/test path, while the Cisco 7609s carry the OC12 (622 Mbps) production path between CERN and the European and American partners.
Separate environments for tests and production: a transatlantic testbed dedicated to advanced optical network research and intensive data-access applications
DataTAG Project
Interconnected networks: SuperJANET4 (UK), GARR-B (IT), SURFnet (NL), INRIA Atrium/VTHD (FR), GEANT, Abilene, ESnet, STAR-TAP, CALREN-2; hubs at New York, STARLIGHT (Chicago) and Geneva
An EU-solicited project: CERN, PPARC (UK), Amsterdam (NL) and INFN (IT), plus US partners funded by DOE/NSF: UIC, NWU and Caltech
Main Aims:
Ensure maximum interoperability between US and EU Grid Projects Transatlantic Testbed for advanced network research
2.5 Gbps Wavelength Triangle from 7/02; to 10 Gbps Triangle by Early 2003
LHCnet Network : March 2003
At CERN (Geneva): a Cisco 7606 (CERN) connects the CERN switch, with OC12 (622 Mbps) peerings to GEANT, IN2P3 and WHO; an Alcatel 1670 optical mux/demux, Alcatel 7770, Cisco 7609 and Juniper M10 (all DataTAG/CERN) terminate the OC48 (2.5 Gbps) circuit; a Linux PC is attached for performance tests and monitoring.
At the Caltech/DoE PoP (StarLight, Chicago): a matching Alcatel 1670 optical mux/demux and Alcatel 7770 (DataTAG/CERN), plus a Cisco 7606, Cisco 7609 and Juniper M10 (Caltech/DoE), with a Linux PC for performance tests and monitoring; peerings to Abilene (development and tests), ESnet, NASA, MREN and STARTAP.
FAST (Caltech): A Scalable, "Fair" Protocol for Next-Generation Networks: from 0.1 to 100 Gbps
Internet2 Land Speed Record (LSR) milestones: 29.3.00; 9.4.02 (multiple streams); 22.8.02 (single stream); and at SC2002 (11/02): IPv6, 10-flow, 2-flow and 1-flow records
Highlights of FAST TCP (standard packet size):
940 Mbps in a single flow per GE card: 9.4 petabit-m/sec, 1.9 times the LSR
9.4 Gbps with 10 flows: 37.0 petabit-m/sec, 6.9 times the LSR
22 TB transferred in 6 hours, in 10 flows
Implementation: sender-side (only) modifications; delay (RTT) based; stabilized Vegas
The Internet is treated as a distributed feedback control system, with theory and experiment developed together: forward and backward routing R_f(s), R_b(s), TCP sources, and AQM marking probability p
Demonstrations between Sunnyvale (~3000 km paths), Geneva and Baltimore. Next: 10GbE; 1 GB/sec disk to disk
C. Jin, D. Wei, S. Low, FAST Team & Partners. URL: netlab.caltech.edu/FAST
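The delay-based control described above can be sketched with the FAST window-update rule published by Jin, Wei and Low: w ← min(2w, (1−γ)w + γ((baseRTT/RTT)·w + α)). The sketch below is illustrative only: the γ and α values and the crude single-bottleneck queue model are assumptions, not FAST's production configuration.

```python
# Illustrative sketch of a FAST-style delay-based window update (not the
# production FAST TCP code). gamma and alpha are arbitrary example values.

def fast_window_update(w, base_rtt, rtt, alpha=100.0, gamma=0.5):
    """One update of the congestion window w (in packets).
    At equilibrium the flow keeps ~alpha packets queued at the bottleneck."""
    target = (base_rtt / rtt) * w + alpha
    return min(2.0 * w, (1.0 - gamma) * w + gamma * target)

# Crude single-bottleneck fluid model: RTT grows once the window exceeds
# the bandwidth-delay product. 150 ms is a typical transatlantic RTT.
base_rtt = 0.150            # propagation RTT in seconds
bottleneck = 8_000          # bottleneck rate in packets/sec (~100 Mbps at 1500 B)
w = 10.0
for _ in range(200):
    rtt = max(base_rtt, w / bottleneck)
    w = fast_window_update(w, base_rtt, rtt)

# The window settles at the bandwidth-delay product plus alpha queued packets
print(round(w, 3))          # ~1300 = 0.150 * 8000 + 100
```

Unlike loss-based TCP, the window converges smoothly to a fixed point instead of oscillating, which is what makes single multi-Gbps flows feasible on long paths.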
TeraGrid (www.teragrid.org) NCSA, ANL, SDSC, Caltech, PSC
A Preview of the Grid Hierarchy and Networks of the LHC Era
Sites: Caltech, San Diego, Urbana (NCSA/UIUC), Argonne (ANL), UIC, Univ of Chicago, Ill Inst of Tech, Starlight / NW Univ, Chicago and Indianapolis (Abilene NOC), with multiple carrier hubs
Links: OC-48 (2.5 Gb/s, Abilene); multiple 10 GbE (Qwest); multiple 10 GbE (I-WIRE dark fiber)
Source: Charlie Catlett, Argonne
National Light Rail Footprint
[Map: NLR fiber routes and 15808 terminal, regen or OADM sites spanning ~20 US cities, from SEA/POR through CHI/NYC/BOS to LAX/SDG and ATL/JAC]
Buildout started November 2002; initially four 10G wavelengths, growing to forty 10G waves in the future
Transition now to optical, multi-wavelength R&E networks: US, European and intercontinental (US-China-Russia) initiatives; efficient use of wavelengths is an essential part of this picture
HENP Major Links: Bandwidth Roadmap (Scenario) in Gbps
Year  Production             Experimental              Remarks
2001  0.155                  0.622-2.5                 SONET/SDH
2002  0.622                  2.5                       SONET/SDH; DWDM; GigE integration
2003  2.5                    10                        DWDM; 1 + 10 GigE integration
2005  10                     2-4 X 10                  λ switch; λ provisioning
2007  2-4 X 10               ~10 X 10; 40 Gbps λ       1st gen. λ grids
2009  ~10 X 10 or 1-2 X 40   ~5 X 40 or ~20-50 X 10    40 Gbps λ switching
2011  ~5 X 40 or ~20 X 10    ~25 X 40 or ~100 X 10     2nd gen. λ grids; terabit networks
2013  ~Terabit               ~Multi-Tbps               ~Fill one fiber

Continuing the trend: ~1000 times bandwidth growth per decade; we are rapidly learning to use and share multi-Gbps networks
HENP Lambda Grids: Fibers for Physics
Problem: extract "small" data subsets of 1 to 100 Terabytes from 1 to 1000 Petabyte data stores. Survivability of the HENP global Grid system, with hundreds of such transactions per day (circa 2007), requires that each transaction be completed in a relatively short time.

Example: take 800 seconds to complete the transaction. Then:
Transaction size (TB):   1    10    100
Net throughput (Gbps):  10   100   1000 (the capacity of a fiber today)

Summary: providing switching of 10 Gbps wavelengths within ~3-5 years, and terabit switching within 5-8 years, would enable "Petascale Grids with Terabyte transactions", as required to fully realize the discovery potential of major HENP programs, as well as other data-intensive fields.
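The numbers above follow from simple bandwidth accounting; a small sketch (decimal terabytes assumed) reproduces them:

```python
def required_gbps(transaction_tb, seconds):
    """Throughput in Gbps needed to move transaction_tb terabytes within
    the given time window (1 TB = 8e12 bits, decimal units assumed)."""
    return transaction_tb * 8e12 / seconds / 1e9

# The slide's 800-second transaction window:
for tb in (1, 10, 100):
    print(f"{tb:>3} TB -> {required_gbps(tb, 800):6.0f} Gbps")
# 1 TB -> 10 Gbps; 10 TB -> 100 Gbps; 100 TB -> 1000 Gbps
```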
Emerging Data Grid User Communities
NSF Network for Earthquake Engineering Simulation (NEES)
Integrated instrumentation, collaboration, simulation
Grid Physics Network (GriPhyN)
ATLAS, CMS, LIGO, SDSS
Access Grid; VRVS: supporting group-based collaboration
And
Genomics, Proteomics, ...
The Earth System Grid and EOSDIS Federating Brain Data Computed MicroTomography …
Virtual Observatories
COJAC: CMS ORCA Java Analysis Component (Java3D, Objectivity, JNI, Web Services). Demonstrated Caltech-Rio de Janeiro (2/02) and Chile (5/02)
CAIGEE
CMS Analysis – an Integrated Grid-Enabled Environment
CIT, UCSD, Riverside, Davis; + UCLA, UCSB. NSF ITR – 50% funded so far. Lightweight and functional, making use of existing software AFAP (as far as possible)
Plug-in architecture based on Web Services; exposes the Grid "global system" to physicists at various levels of detail, with feedback
Supports request, preparation, production, movement and analysis of physics object collections
Initial target: Californian US-CMS physicists. Future: all of US CMS, then CMS
Clarens: the Clarens Remote (Light Client) Dataserver, a WAN system for remote data analysis over HTTP/SOAP/RPC
Clarens servers are deployed at Caltech, Florida, UCSD, FNAL and Bucharest; to be extended to UERJ in Rio (CHEPREO)
SRB is now installed as a Clarens service on the Caltech Tier2 (Oracle backend)
NSF ITR: Globally Enabled Analysis Communities
Develop and build Dynamic Workspaces
Build Private Grids to support scientific analysis communities
Using Agent Based Peer-to-peer Web Services
Construct Autonomous Communities Operating Within Global Collaborations
Empower small groups of scientists (Teachers and Students) to profit from and contribute to int’l big science
Drive the democratization of science via the deployment of new technologies
NSF ITR: Key New Concepts
Dynamic Workspaces
Provide the capability for individuals and sub-communities to request and receive expanded, contracted or otherwise modified resources, while maintaining the integrity and policies of the global enterprise
Private Grids
Provide the capability for individuals and communities to request, control and use a heterogeneous mix of enterprise-wide and community-specific software, data, metadata and resources. Built on a globally managed end-to-end Grid services architecture and monitoring system: autonomous, agent-based, peer-to-peer
Private Grids and P2P Sub Communities in Global CMS
A Global Grid Enabled Collaboratory for Scientific Research (GECSR)
Caltech (HN PI, JB CoPI), Michigan (2 CoPIs), Maryland (CoPI), with senior personnel from Lawrence Berkeley Lab, Oklahoma, Fermilab, Arlington (U. Texas), Iowa and Florida State
The first Grid-enabled Collaboratory: tight integration between
the Science of Collaboratories,
a globally scalable working environment,
a sophisticated set of collaborative tools (VRVS, VNC; next-generation), and
an agent-based monitoring and decision support system (MonALISA)
GECSR
Initial targets are the global HENP collaborations, but GECSR is expected to be widely applicable to other large-scale collaborative scientific endeavors
“Giving scientists from all world regions the means to function as full partners in the process of search and discovery”
The importance of Collaboration Services is highlighted in the Cyberinfrastructure report of Atkins et al. 2003
Current Grid Challenges: Secure Workflow Management and Optimization
Maintaining a Global View of Resources and System State
Coherent end-to-end System Monitoring
Adaptive Learning: new algorithms and strategies for execution optimization (increasingly automated)
Workflow: Strategic Balance of Policy Versus Moment-to-moment Capability to Complete Tasks
Balance High Levels of Usage of Limited Resources Against Better Turnaround Times for Priority Jobs
Goal-Oriented Algorithms; Steering Requests According to (Yet to be Developed) Metrics
Handling User-Grid Interactions: Guidelines; Agents
Building Higher Level Services, and an Integrated Scalable User Environment for the Above
VRVS: 14,000+ hosts; 8,000+ registered users in 64 countries; 56 reflectors (7 on Internet2); annual growth 2 to 3X
MonALISA: A Globally Scalable Grid Monitoring System
By I. Legrand (Caltech). Deployed on the US CMS Grid. Agent-based dynamic information/resource discovery mechanism; talks with other monitoring systems. Implemented in Java/Jini with SNMP, and WSDL/SOAP with UDDI. Part of a global Grid Control Room service
Distributed System Services Architecture (DSSA): CIT/Romania/Pakistan
Agents: autonomous, auto-discovering, self-organizing, collaborative; registered and found via Lookup/Discovery Services
"Station Servers" (static) host mobile "Dynamic Services"; the servers interconnect dynamically to form a robust fabric in which mobile agents travel, with a payload of (analysis) tasks
Adaptable to Web services (OGSA) and many platforms; adaptable to ubiquitous, mobile working environments
Managing global systems of increasing scope and complexity, in the service of science and society, requires a new generation of scalable, autonomous, artificially intelligent software systems
MONARC SONN: 3 Regional Centres Learning to Export Jobs (Day 9)
Simulation: 1 MB/s links; 150 ms RTT; CERN 30 CPUs, Caltech 25 CPUs, NUST 20 CPUs (Day 9 shown). By I. Legrand
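The slide's parameters (1 MB/s links, 150 ms RTT, 30/25/20 CPUs) are enough to sketch the kind of export decision such a system learns. The sketch below is a plain cost comparison, not the actual MONARC self-organizing neural network; the queue contents and the job are invented for illustration.

```python
# Illustrative export decision (NOT the actual MONARC SONN algorithm):
# a centre exports a job when a remote centre's expected completion time,
# including transfer over the 1 MB/s, 150 ms RTT link, beats running locally.

LINK_MBPS = 1.0      # 1 MB/s inter-centre links (from the slide)
RTT_S = 0.150        # 150 ms round-trip time (from the slide)

def transfer_time_s(job_mb):
    return RTT_S + job_mb / LINK_MBPS

def best_centre(job_mb, job_cpu_s, queues, cpus, local):
    """Pick the centre minimising expected completion time.
    queues: pending CPU-seconds per centre; cpus: CPU counts per centre."""
    def eta(c):
        wait = queues[c] / cpus[c]                        # crude queue-drain estimate
        xfer = 0.0 if c == local else transfer_time_s(job_mb)
        return wait + xfer + job_cpu_s
    return min(queues, key=eta)

cpus = {"CERN": 30, "CALTECH": 25, "NUST": 20}
queues = {"CERN": 90_000, "CALTECH": 1_000, "NUST": 50_000}  # invented load snapshot
print(best_centre(job_mb=500, job_cpu_s=3600,
                  queues=queues, cpus=cpus, local="CERN"))   # exports to CALTECH
```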
Networks, Grids, HENP and WAN-in-Lab
The current generation of 2.5-10 Gbps network backbones arrived in the last 15 months in the US, Europe and Japan
Major transoceanic links are also at 2.5-10 Gbps in 2003; capability increased ~4 times, i.e. 2-3 times Moore's law
Reliably high end-to-end performance of network applications (large file transfers; Grids) is required. Achieving this requires:
A deep understanding of protocol issues, for efficient use of wavelengths in the 1 to 10 Gbps range now, and at higher speeds (e.g. 40 to 80 Gbps) in the near future
Getting high-performance (TCP) toolkits into users' hands
End-to-end monitoring, with a coherent approach
Removing regional and last-mile bottlenecks and compromises in network quality is now on the critical path, in all regions
We will Work in Concert with AMPATH, Internet2, Terena, APAN; DataTAG, the Grid projects and the Global Grid Forum
A WAN in Lab facility, available to the Community, is a Key Element in achieving these revolutionary goals
Some Extra Slides Follow
Global Networks for HENP
National and International Networks, with sufficient (rapidly increasing) capacity and capability, are essential for
The daily conduct of collaborative work in both experiment and theory
Detector development & construction on a global scale; Data analysis involving physicists from all world regions
The formation of worldwide collaborations
The conception, design and implementation of next generation facilities as “global networks”
“Collaborations on this scale would never have been attempted, if they could not rely on excellent networks”
The Large Hadron Collider (2007-)
The Next-generation Particle Collider
The largest superconductor installation in the world
Bunch-bunch collisions at 40 MHz, Each generating ~20 interactions
Only one in a trillion may lead to a major physics discovery
Real-time data filtering: Petabytes per second to Gigabytes per second
Accumulated data of many Petabytes/Year
Education and Outreach
QuarkNet has 50 centers nationwide (60 planned)
Each center has 2-6 physicist mentors and 2-12 teachers, depending on the year of the program and local variations
A transatlantic testbed
Multiplexing of optical signals into a single OC-48 transatlantic optical channel
Multiple platforms; vendor independent
Interoperability tests
Performance tests
Layer 2 services:
Circuit-Cross-Connect (CCC)
Layer 2 VPN
IP services:
Multicast
IPv6
QoS
Future: GMPLS (Generalized MPLS, an extension of MPLS)
Service Implementation example
[Diagram: StarLight (Chicago) and CERN (Geneva) joined by 2.5 Gb/s links through optical multiplexers. Abilene & ESnet (10 GbE) and GEANT peer via BGP, with IBGP across the link; the CERN network runs OSPF. Host pairs in VLAN 600 and VLAN 601 are joined across the link, and a CCC circuit carries GbE between the Host 3 pair.]
Logical view of previous implementation
Abilene & ESnet and GEANT connect via IP over POS at 2.5 Gb/s. VLAN 600 (Host 1 pair) is carried as Ethernet over MPLS over POS; VLAN 601 (Host 2 pair) as IP over MPLS over POS; the Host 3 pair as Ethernet over POS.
Using Web Services for Tag Data (Example)
Use ~180,000 Tag objects derived from di-jet ORCA events
Each Tag holds: run & event number; the OID of the ORCA event object; then E, Phi, Theta and ID for the 5 most energetic particles; and E, Phi, Theta for the 5 most energetic jets
These Tag events have been used in various performance tests and demonstrations, e.g. SC2000, SC2001, comparison of Objy vs RDBMS query speeds (GIOD), as source data for COJAC, etc.
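The Tag layout described above can be sketched as a record type; the field names below are hypothetical, not the actual ORCA/Objectivity schema.

```python
# A sketch of the Tag record layout described above. Field names are
# illustrative; the actual ORCA/Objectivity schema may differ.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Tag:
    run: int
    event: int
    orca_oid: str                 # OID of the full ORCA event object
    # (E, phi, theta, ID) for the 5 most energetic particles:
    particles: List[Tuple[float, float, float, int]] = field(default_factory=list)
    # (E, phi, theta) for the 5 most energetic jets:
    jets: List[Tuple[float, float, float]] = field(default_factory=list)

def select(tags, min_jet_e):
    """Typical Tag-level cut: keep events whose leading jet exceeds min_jet_e."""
    return [t for t in tags if t.jets and t.jets[0][0] > min_jet_e]

tags = [
    Tag(1, 1, "oid-1", jets=[(120.0, 0.3, 1.1)]),
    Tag(1, 2, "oid-2", jets=[(45.0, 2.0, 0.7)]),
]
print([t.event for t in select(tags, min_jet_e=100.0)])   # leading-jet cut keeps event 1
```

The point of Tags is exactly this: cuts run over a compact index, and only selected events are fetched via their OIDs.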
DB-Independent Access to Object Collections: Middleware Prototype
The first layer, ODBC, provides database vendor abstraction, allowing any relational (SQL) database to be plugged into the system.
The next layer, OTL, encapsulates the results of a SQL query in a form natural to C++, namely STL (Standard Template Library) map and vector objects.
Higher levels map the C++ object collections to the client's required format and transport the results to the client.
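The layering described above is a C++ stack (ODBC under OTL); the same pattern can be illustrated in Python, where the DB-API plays ODBC's vendor-abstraction role and dicts stand in for OTL's STL containers. A sketch only, with an invented table:

```python
# The prototype's layering - vendor abstraction at the bottom, SQL results
# mapped into native containers above it - illustrated with Python's DB-API.
# The 'tag' table and its columns are invented for illustration.
import sqlite3

def query_as_objects(conn, sql, params=()):
    """Run a SQL query and return rows as dicts keyed by column name,
    analogous to OTL mapping result sets into STL maps/vectors."""
    cur = conn.execute(sql, params)
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")      # any DB-API driver could be swapped in
conn.execute("CREATE TABLE tag (run INTEGER, event INTEGER, jet_e REAL)")
conn.executemany("INSERT INTO tag VALUES (?, ?, ?)",
                 [(1, 1, 120.0), (1, 2, 45.0)])
rows = query_as_objects(conn, "SELECT event, jet_e FROM tag WHERE jet_e > ?", (100.0,))
print(rows)   # [{'event': 1, 'jet_e': 120.0}]
```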
Reverse Engineer and Ingest the CMS JETMET nTuples
From the nTuple description, we derived:
An ER diagram for the content
An AOD for the JETMET analysis
We then wrote tools to:
Automatically generate a set of SQL CREATE TABLE commands to create the RDBMS tables
Generate SQL INSERT and bulk load scripts that enable population of the RDBMS tables
We imported the nTuples into:
SQLServer at Caltech
Oracle 9i at Caltech
Oracle 9i at CERN
PostgreSQL at Florida
In the future:
Generate a “Tag” table of data that captures the most often used data columns in the nTuple
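The table-generation tools described above can be sketched as follows; the column list is invented for illustration, not the real JETMET nTuple schema.

```python
# Sketch of generating SQL DDL/DML from an nTuple-style column description.
# Column names and SQL types here are illustrative only.

def create_table_sql(table, columns):
    """Emit a CREATE TABLE statement from (name, sqltype) pairs."""
    cols = ",\n  ".join(f"{name} {sqltype}" for name, sqltype in columns)
    return f"CREATE TABLE {table} (\n  {cols}\n);"

def insert_sql(table, columns):
    """Emit a parameterised INSERT suitable for bulk loading."""
    names = ", ".join(name for name, _ in columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({names}) VALUES ({marks})"

columns = [("run", "INTEGER"), ("event", "INTEGER"),
           ("jet_e", "FLOAT"), ("jet_phi", "FLOAT")]
print(create_table_sql("jetmet_aod", columns))
print(insert_sql("jetmet_aod", columns))
```

With the column description parsed once, the same metadata can drive DDL for each target RDBMS (SQLServer, Oracle, PostgreSQL), with only the type names varying.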
GAE Demonstrations
iGrid2002 (Amsterdam)
Clarens (Continued)
Servers
Multi-process (based on Apache), using XML-RPC
A similar server using SOAP
A lightweight single-process server (using a select loop)
Functionality: file access (read/download of part or whole files), directory listing, file selection, file checksumming, access to SRB, security with PKI/VO infrastructure, RDBMS data selection/analysis, MonALISA integration
Clients
ROOT client … browsing remote file repositories, files
Platform-independent Python-based client for rapid prototyping
Browser-based Java/JavaScript client (in planning), for use when no other client package is installed
Some clients of historical interest e.g. Objectivity
Source/Discussion http://clarens.sourceforge.net
Future
Expose POOL as remote data source in Clarens
Clarens peer-to-peer discovery and communication
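A Clarens-style exchange can be sketched with Python's standard XML-RPC support. The method names and the in-memory "catalogue" below are illustrative assumptions; the real Clarens server runs under Apache with PKI/VO security, none of which is shown.

```python
# Minimal sketch of a Clarens-style XML-RPC file service and remote client.
# Method names and the in-memory catalogue are illustrative, not the
# actual Clarens API; security and real file I/O are omitted.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

class FileService:
    def __init__(self, catalogue):
        self.catalogue = catalogue          # path -> contents; stands in for a data store
    def list_files(self, directory):
        return sorted(p for p in self.catalogue if p.startswith(directory))
    def read_file(self, path):
        return self.catalogue[path]

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_instance(FileService({"/data/run1.root": "...",
                                      "/data/run2.root": "..."}))
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
client = ServerProxy(f"http://127.0.0.1:{port}")
print(client.list_files("/data"))           # remote call over HTTP/XML-RPC
```

Because the protocol is plain HTTP plus XML, any of the clients listed above (ROOT, Python, a browser) can talk to the same server.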
Globally Scalable Monitoring Service
[Diagram, by I. Legrand: Farm Monitors register with Lookup Services, which clients (or other services) discover via a proxy to reach the RC Monitor Service. Data collection uses push & pull (rsh & ssh scripts; SNMP); the service provides a component factory, GUI marshaling, code transport, and RMI data access.]
NSF/ITR: A Global Grid-Enabled Collaboratory for Scientific Research (GECSR) and Grid Analysis Environment (GAE)
CHEP 2001, Beijing
Harvey B Newman California Institute of Technology September 6, 2001
GECSR Features
Persistent Collaboration : desktop, small and large conference rooms, halls, Virtual Control Rooms
Hierarchical, Persistent, ad-hoc peer groups (using Virtual Organization management tools)
"Language of Access": an ontology and terminology for users to control the GECSR
Example: the cost of interrupting an expert; virtual open and partly-open doors
Support for Human-System(Agent)-Human as well as Human-Human interactions
Evaluation, evolution and optimisation of the GECSR; agent-based decision support for users
The GECSR will be delivered in “packages” over the course of the (four-year) project
GECSR: First Year Package
NEESGrid: unifying interface and tool launch system
CHEF Framework: portlets for file transfer using GridFTP, teamlets, announcements, chat, shared calendar, role-based access, threaded discussions, document repository
www.chefproject.org
GIS-GIB: a geographic-information-systems-based Grid information broker
VO Management tools (developed for PPDG)
Videoconferencing and shared desktop: VRVS and VNC (www.vrvs.org and www.vnc.org)
MonALISA: real-time system monitoring and user control
The above tools already exist: the effort is in integrating and identifying the missing functionality.
GECSR: Second Year Package
Enhancements, including:
Detachable windows
Web Services Definition (WSDL) for CHEF
Federated Collaborative Servers
Search capabilities
Learning Management Components
Grid Computational Portal Toolkit
Knowledge Book, HEPBook
Software Agents for intelligent searching etc.
Flexible Authentication
Beyond Traditional Architectures: Mobile Agents
"Agents are objects with rules and legs" -- D. Taylor
Mobile agents are (semi-)autonomous, goal driven and adaptive. They:
Execute asynchronously
Reduce network load (local conversations)
Overcome network latency, and some outages
Are adaptive, robust and fault tolerant
Are naturally heterogeneous
Form an extensible concept: coordinated agent architectures
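"Objects with rules and legs" can be made concrete with a toy sketch: an agent that carries a task between station servers, runs it where the needed data is local, and keeps the result as its payload. The station and dataset names below are invented for illustration.

```python
# Toy mobile-agent sketch: the agent's task is its payload ("rules"),
# and visiting stations stands in for migration ("legs"). Purely
# illustrative; real mobile-agent platforms ship code, not closures.
class Agent:
    def __init__(self, task, wanted_dataset):
        self.task = task                  # the analysis to run (the payload)
        self.wanted = wanted_dataset
        self.result = None
    def visit(self, stations):
        for station in stations:          # "legs": move between station servers
            data = station["datasets"].get(self.wanted)
            if data is not None:          # "rules": run where the data lives
                self.result = self.task(data)
                return station["name"]
        return None                       # no station held the data

stations = [
    {"name": "CERN", "datasets": {}},
    {"name": "FNAL", "datasets": {"jetmet": [3, 1, 2]}},
]
agent = Agent(task=lambda data: sum(data), wanted_dataset="jetmet")
print(agent.visit(stations), agent.result)   # runs at FNAL with result 6
```

Moving the computation to the data, rather than the data to the computation, is what reduces network load and tolerates link outages.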