
HENP Grids and Networks: Global Virtual Organizations. Harvey B. Newman

FAST Meeting, Caltech July 1, 2002

http://l3www.cern.ch/~newman/HENPGridsNets_FAST070202.ppt

Computing Challenges: Petabytes, Petaflops, Global VOs

  

Geographical dispersion: of people and resources
Complexity: the detector and the LHC environment
Scale: Tens of Petabytes per year of data; 5000+ Physicists, 250+ Institutes, 60+ Countries

Major challenges associated with:
Communication and collaboration at a distance
Managing globally distributed computing & data resources
Cooperative software development and physics analysis

New Forms of Distributed Systems: Data Grids

Four LHC Experiments: The Petabyte to Exabyte Challenge

ATLAS, CMS, ALICE, LHCb: Higgs + New particles; Quark-Gluon Plasma; CP Violation

Data stored: ~40 Petabytes/Year and UP (2007), reaching 0.1 to 1 Exabyte (1 EB = 10^18 Bytes) (~2012 ?) for the LHC Experiments
CPU: 0.30 Petaflops and UP

LHC: Higgs Decay into 4 muons (Tracker only; +30 minimum bias events); 1000X LEP Data Rate
Reconstructed tracks with pt > 25 GeV; all charged tracks with pt > 2 GeV
10^9 events/sec, selectivity: 1 in 10^13 (1 person in a thousand world populations)

LHC Data Grid Hierarchy

Experiment Online System (~PByte/sec raw) → Tier 0 +1 at ~100-400 MBytes/sec

Tier 0 +1 (CERN): 700k SI95; ~1 PB Disk; Tape Robot

Tier 1 centers (IN2P3 Center, RAL Center, INFN Center, FNAL: 200k SI95, 600 TB): linked to Tier 0 at ~2.5-10 Gbps

Tier 2 centers: linked to Tier 1 at ~2.5-10 Gbps

Tier 3 (Institutes): physics data caches, linked at 0.1-10 Gbps

Tier 4: Workstations

CERN/Outside Resource Ratio ~1:2; Tier0 : (sum of Tier1s) : (sum of Tier2s) ~1:1:1

Physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels
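As a back-of-the-envelope check of the scales in this hierarchy, the short Python sketch below converts the quoted online rate (100-400 MBytes/sec) into a yearly data volume, and estimates how long a ~2.5 Gbps Tier0-Tier1 link would need to replicate one Petabyte. The 10^7 seconds of effective running per year is an illustrative assumption, not a figure from the slide.

```python
# Back-of-the-envelope scales for the LHC Data Grid hierarchy (illustrative sketch).

ONLINE_RATE_MB_S = (100, 400)          # MBytes/sec out of the online system (from the slide)
LIVE_SECONDS_PER_YEAR = 1e7            # assumed effective running time per year

for rate in ONLINE_RATE_MB_S:
    petabytes_per_year = rate * 1e6 * LIVE_SECONDS_PER_YEAR / 1e15
    print(f"{rate} MB/s for 1e7 s/yr -> ~{petabytes_per_year:.0f} PB/year")

# Time to replicate 1 PB over a 2.5 Gbps Tier0-Tier1 link at 50% occupancy
LINK_GBPS = 2.5
OCCUPANCY = 0.5
seconds = 1e15 * 8 / (LINK_GBPS * 1e9 * OCCUPANCY)
print(f"1 PB over {LINK_GBPS} Gbps at {OCCUPANCY:.0%} occupancy: ~{seconds/86400:.0f} days")
```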

Emerging Data Grid User Communities

Grid Physics Network (GriPhyN)

ATLAS, CMS, LIGO, SDSS

  

Particle Physics Data Grid (PPDG) Int’l Virtual Data Grid Lab (iVDGL)

NSF Network for Earthquake Engineering Simulation (NEES)

Integrated instrumentation, collaboration, simulation Access Grid; VRVS: supporting group-based collaboration

    

And: Genomics, Proteomics, ...
The Earth System Grid and EOSDIS; Federating Brain Data; Computed MicroTomography; …; Virtual Observatories

HENP Related Data Grid Projects

Project        Country/Region   Funding                Period
PPDG I         USA              DOE $2M                1999-2001
GriPhyN        USA              NSF $11.9M + $1.6M     2000-2005
EU DataGrid    EU               EC €10M                2001-2004
PPDG II (CP)   USA              DOE $9.5M              2001-2004
iVDGL          USA              NSF $13.7M + $2M       2001-2006
DataTAG        EU               EC €4M                 2002-2004
GridPP         UK               PPARC >$15M            2001-2004
LCG (Ph1)      CERN             CERN MS 30 MCHF        2002-2004

Many Other Projects of interest to HENP

Initiatives in US, UK, Italy, France, NL, Germany, Japan, …

Networking initiatives: DataTAG, AMPATH, CALREN-XD, …

US Distributed Terascale Facility: ($53M, 12 TeraFlops, 40 Gb/s network)

Daily, Weekly, Monthly and Yearly Statistics on 155 Mbps US-CERN Link

20 - 100 Mbps Used Routinely in ’01; BaBar: 600 Mbps Throughput in ‘02
BW Upgrades Quickly Followed by Upgraded Production Use

Tier A

"Physicists have indeed foreseen to test the GRID principles starting first from the Computing Centres in Lyon and Stanford (California). A first step towards the ubiquity of the GRID." Pierre Le Hir

Le Monde, 12 April 2001 (quoted 3/2002, D. Linglin: LCG Workshop)

Two centers are trying to work as one: data not duplicated; internationalization (transparent access), etc.

CERN-US Line + Abilene Renater + ESnet

RNP Brazil (to 20 Mbps) FIU Miami/So. America (to 80 Mbps)

Transatlantic Net WG (HN, L. Price): Bandwidth Requirements [*] (Mbps)

              2001     2002    2003    2004    2005     2006
CMS            100      200     300     600     800     2500
ATLAS           50      100     300     600     800     2500
BaBar          300      600    1100    1600    2300     3000
CDF            100      300     400    2000    3000     6000
D0             400     1600    2400    3200    6400     8000
BTeV            20       40     100     200     300      500
DESY           100      180     210     240     270      300
CERN BW [*]  155-310    622    2500    5000   10000    20000

See http://gate.hep.anl.gov/lprice/TAN
[*] Installed BW. Maximum Link Occupancy of 50% Assumed
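A minimal reading of the footnote: with the assumed 50% maximum link occupancy, the usable sustained throughput is half the installed bandwidth. The sketch below applies that rule to the CERN BW row (taking the upper 2001 figure of 310 Mbps); it is an illustration of the factor of two, not part of the table itself.

```python
# Installed capacity vs. usable sustained throughput, assuming the table's
# maximum link occupancy of 50%.

MAX_OCCUPANCY = 0.5
installed_mbps = {2001: 310, 2002: 622, 2003: 2500, 2004: 5000, 2005: 10000, 2006: 20000}

for year, installed in installed_mbps.items():
    usable = installed * MAX_OCCUPANCY
    print(f"{year}: installed {installed} Mbps -> ~{usable:.0f} Mbps sustained")
```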

MONARC: CMS Analysis Process. Hierarchy of Processes (Experiment, Analysis Groups, Individuals)

Experiment-Wide Activity (10^9 events), on RAW data:
Reconstruction: 3000 SI95-sec/event, 1 job per year
Re-processing (3 times per year; new detector calibrations or understanding): 3000 SI95-sec/event, 3 jobs per year
Monte Carlo: 5000 SI95-sec/event

~20 Groups’ Activity (10^9 → 10^7 events):
Selection (iterative selection, once per month; trigger-based and physics-based refinements): 25 SI95-sec/event, ~20 jobs per month

~25 Individuals per Group Activity (10^6 - 10^7 events):
Analysis (different physics cuts & MC comparison, ~once per day; algorithms applied to data to get results): 10 SI95-sec/event, ~500 jobs per day
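A rough cross-check of the MONARC numbers above: one reconstruction pass at 3000 SI95-sec/event over 10^9 events implies a sizeable average CPU capacity. The sketch assumes the pass is spread over roughly one calendar year of wall-clock time, which is my assumption, not a statement from the slide.

```python
# Rough CPU-capacity cross-check for the MONARC reconstruction pass (illustrative).

SI95_SEC_PER_EVENT = 3000
EVENTS = 1e9
SECONDS_PER_YEAR = 3.15e7          # wall-clock seconds in a year (assumption)

total_si95_sec = SI95_SEC_PER_EVENT * EVENTS
average_capacity = total_si95_sec / SECONDS_PER_YEAR
print(f"One reconstruction pass: {total_si95_sec:.1e} SI95-sec")
print(f"Average capacity if spread over a year: ~{average_capacity/1e3:.0f}k SI95")
# ~95k SI95, the same order as the Tier1 centres quoted earlier (e.g. FNAL: 200k SI95).
```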

Tier0-Tier1 Link Requirements Estimate: for Hoffmann Report 2001

1) Tier1 ↔ Tier0 Data Flow for Analysis: 0.5 - 1.0 Gbps
2) Tier2 ↔ Tier0 Data Flow for Analysis: 0.2 - 0.5 Gbps
3) Interactive Collaborative Sessions (30 Peak): 0.1 - 0.3 Gbps
4) Remote Interactive Sessions (30 Flows Peak): 0.1 - 0.2 Gbps
5) Individual (Tier3 or Tier4) data transfers (limit to 10 flows of 5 Mbytes/sec each): 0.8 Gbps

TOTAL per Tier0 - Tier1 Link: 1.7 - 2.8 Gbps

NOTE:

Adopted by the LHC Experiments; given in the Steering Committee Report on LHC Computing: “1.5 - 3 Gbps per experiment”

Corresponds to ~10 Gbps Baseline BW Installed on US-CERN Link

Report also discussed the effects of higher bandwidths

For example all-optical 10 Gbps Ethernet + WAN by 2002-3
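The total in the table above is just the sum of the five components; a quick check of the arithmetic, with the component values taken directly from the slide:

```python
# Verify the Tier0-Tier1 link total as the sum of the five components (Gbps).
components_gbps = {
    "Tier1-Tier0 analysis flow":       (0.5, 1.0),
    "Tier2-Tier0 analysis flow":       (0.2, 0.5),
    "Interactive collaborative (30)":  (0.1, 0.3),
    "Remote interactive (30 flows)":   (0.1, 0.2),
    "Tier3/4 transfers (10 x 5 MB/s)": (0.8, 0.8),
}
low = sum(lo for lo, hi in components_gbps.values())
high = sum(hi for lo, hi in components_gbps.values())
print(f"Total per Tier0-Tier1 link: {low:.1f} - {high:.1f} Gbps")   # 1.7 - 2.8 Gbps
```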

Tier0-Tier1 BW Requirements Estimate: for Hoffmann Report 2001

Does Not Include more recent ATLAS Data Estimates:
270 Hz at 10^33 and 400 Hz at 10^34, instead of 100 Hz
2 MB/Event instead of 1 MB/Event ?

Does Not Allow Fast Download to Tier3+4 of “Small” Object Collections

Example: Download 10^7 Events of AODs (10^4 Bytes each)
= 100 GBytes; At 5 MBytes/sec per person (above) that’s ~6 Hours! (See the arithmetic sketch at the end of this slide.)

This is still a rough, bottom-up, static, and hence Conservative Model.

A Dynamic distributed DB or “Grid” system with Caching, Co-scheduling, and Pre-Emptive data movement may well require greater bandwidth

Does Not Include “Virtual Data” operations; Derived Data Copies; Data-description overheads

Further MONARC Model Studies are Needed
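A minimal check of the AOD-download example above, with all numbers taken from the slide:

```python
# Check the "fast download" example: 10^7 AOD events of 10^4 bytes each,
# pulled by a single user at the assumed 5 MBytes/sec individual-transfer limit.

events = 1e7
bytes_per_event = 1e4
rate_bytes_per_sec = 5e6

total_bytes = events * bytes_per_event                 # 1e11 bytes = 100 GBytes
hours = total_bytes / rate_bytes_per_sec / 3600
print(f"Collection size: {total_bytes/1e9:.0f} GBytes")
print(f"Download time at 5 MB/s: ~{hours:.1f} hours")  # ~5.6 hours, the "6 hours" on the slide
```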

Maximum Throughput on Transatlantic Links (155 Mbps) *

      

8/10/01: 105 Mbps reached with 30 Streams: SLAC-IN2P3
9/1/01: 102 Mbps in One Stream: CIT-CERN
11/5/01: 125 Mbps in One Stream (modified kernel): CIT-CERN
1/09/02: 190 Mbps for One Stream shared on two 155 Mbps links
3/11/02: 120 Mbps Disk-to-Disk with One Stream on a 155 Mbps link (Chicago-CERN)
5/20/02: 450 Mbps SLAC-Manchester on OC12 with ~100 Streams
6/1/02: 290 Mbps Chicago-CERN, One Stream on OC12 (modified kernel)
Also see http://www-iepm.slac.stanford.edu/monitoring/bulk/; and the Internet2 E2E Initiative: http://www.internet2.edu/e2e

Some Recent Events: Reported 6/1/02 to ICFA/SCIC

Progress in High Throughput: 0.1 to 1 Gbps

Land Speed Record: SURFNet - Alaska (IPv6) (0.4+ Gbps)

SLAC - Manchester (Les C. and Richard H-J) (0.4+ Gbps)

 

Tsunami (Indiana) (0.8 Gbps UDP)
Tokyo - KEK (0.5 - 0.9 Gbps)

Progress in Pre-Production and Production Networking

10 Mbytes/sec FNAL-CERN (Michael Ernst)

15 Mbytes/sec disk-to-disk Chicago-CERN (Sylvain Ravot)

KPNQwest files for Chapter 11; stopped running its network yesterday.

Near Term Pricing of Competitor (DT) ok.

Unknown impact on prices and future planning in the medium and longer term

Baseline BW for the US-CERN Link: HENP Transatlantic WG (DOE+NSF)
Transoceanic networking integrated with the Abilene, TeraGrid, regional nets and continental network infrastructures in US, Europe, Asia, South America
Baseline evolution typical of major HENP links 2001-2006:
FY2001: 310 Mbps; FY2002: 622 Mbps; FY2003: 2500 Mbps; FY2004: 5000 Mbps; FY2005: 10000 Mbps; FY2006: 20000 Mbps

  

US-CERN Link: 622 Mbps this month
DataTAG 2.5 Gbps Research Link in Summer 2002
10 Gbps Research Link by Approx. Mid-2003

Total U.S. Internet Traffic

[Chart: U.S. Internet traffic, 1970-2010. Voice crossover: August 2000; ARPA & NSF data to 1990; growth 2.8X/year historically, ~4X/year in new measurements, projected at 3X/year; limited by the same % of GDP as voice. Source: Roberts et al., 2001]

Internet Growth Rate Fluctuates Over Time

U.S. Internet Edge Traffic Growth Rate

[Chart: U.S. Internet edge traffic growth rate, 6-month lagging measure, Jan 00 - Jan 02. Growth reported at 3.6/year and 4.0/year over 10/00 - 4/01; average 3.0/year. Source: Roberts et al., 2002]

AMS-IX Internet Exchange Throughput Accelerating Growth in Europe (NL) Monthly Traffic 2X Growth from 8/00 - 3/01; 2X Growth from 8/01 - 12/01

[Chart: AMS-IX hourly traffic on 3/22/02, with peaks up to ~6.0 Gbps]

ICFA SCIC Meeting March 9 at CERN: Updates from Members

Abilene Upgrade

from 2.5 to 10 Gbps; additional scheduled lambdas planned for targeted applications: Pacific and National Light Rail

US-CERN

Upgrade On Track: to 622 Mbps in July; Setup and Testing Done in STARLIGHT

 

2.5G Research Lambda by this Summer: STARLIGHT-CERN
2.5G Triangle between STARLIGHT (US), SURFNet (NL), CERN

SLAC + IN2P3 (BaBar)

Getting 100 Mbps over 155 Mbps CERN-US Link

 

50 Mbps Over RENATER 155 Mbps Link, Limited by ESnet
600 Mbps Throughput is BaBar Target for this Year

FNAL

Expect ESnet Upgrade to 622 Mbps this Month

Plans for dark fiber to STARLIGHT underway, could be done in ~4 Months; Railway or Electric Co. provider

ICFA SCIC: A&R Backbone and International Link Progress

GEANT Pan-European Backbone

 

(http://www.dante.net/geant): Now interconnects 31 countries; includes many trunks at 2.5 and 10 Gbps

UK

2.5 Gbps NY-London, with 622 Mbps to ESnet and Abilene

SuperSINET (Japan): 10 Gbps IP and 10 Gbps Wavelength

 

Upgrade to Two 0.6 Gbps Links, to Chicago and Seattle
Plan upgrade to 2 X 2.5 Gbps Connection to US West Coast by 2003

CA*net4 (Canada):

Interconnect customer-owned dark fiber nets across Canada at 10 Gbps, starting July 2002; “Lambda-Grids” by ~2004-5

GWIN (Germany): Connection to Abilene Upgraded to 2 X 2.5 Gbps early in 2002

Russia

Start 10 Mbps link to CERN and ~90 Mbps to US Now

2.5 → 10 Gbps Backbone; 210 Primary Participants; All 50 States, D.C. and Puerto Rico; 80 Partner Corporations and Non-Profits; 22 State Research and Education Nets; 15 “GigaPoPs” Support 70% of Members; Caltech Connection with GbE to New Backbone

National R&E Network Example, Germany: DFN TransAtlantic Connectivity Q1 2002

2 X OC12 Now: NY-Hamburg and NY-Frankfurt

ESNet peering at 34 Mbps

Upgrade to 2 X OC48 expected in Q1 2002

Direct Peering to Abilene and CANARIE expected

UCAID will add another 2 OC48’s; Proposing a Global Terabit Research Network (GTRN)


FSU Connections via satellite: Yerevan, Minsk, Almaty, Baikal

Speeds of 32 - 512 kbps

SILK Project (2002): NATO funding

Links to Caucasus and Central Asia (8 Countries)

Currently 64-512 kbps

Propose VSAT for 10-50 X BW: NATO + State Funding

National Research Networks in Japan

SuperSINET

Started operation January 4, 2002

Support for 5 important areas: HEP, Genetics, Nano-Technology, Space/Astronomy, GRIDs

Provides 10 λ’s: 10 Gbps IP connection; 7 direct intersite GbE links; some connections to 10 GbE in JFY2002

HEPnet-J

Will be re-constructed with MPLS-VPN in SuperSINET

Proposal: Two TransPacific 2.5 Gbps Wavelengths, and Japan-CERN Grid Testbed by ~2003

[Network map: SuperSINET sites include NIFS, Nagoya U, Osaka U, Kyoto U, ICR Kyoto-U, KEK, NII Chiba, NII Hitot., U Tokyo, IMS, ISAS, NAO and NIG, connected by IP routers and WDM paths, with Internet access at Tokyo]

DataTAG Project

[Network map: New York, STARLIGHT, STAR-TAP, GENEVA; UK SuperJANET4, IT GARR-B, NL SURFnet, FR Renater; GEANT, Abilene, ESnet, CALREN]

EU-Solicited Project. CERN , PPARC (UK), Amsterdam (NL), and INFN (IT); and US (DOE/NSF: UIC, NWU and Caltech) partners

Main Aims:

 

Ensure maximum interoperability between US and EU Grid Projects
Transatlantic Testbed for advanced network research

2.5 Gbps Wavelength Triangle 7/02 (10 Gbps Triangle in 2003)

TeraGrid (www.teragrid.org) NCSA, ANL, SDSC, Caltech

A Preview of the Grid Hierarchy and Networks of the LHC Era
[Map: Abilene connecting Chicago, Indianapolis, Urbana, Caltech, San Diego; OC-48 (2.5 Gb/s, Abilene); multiple 10 GbE (Qwest); multiple 10 GbE (I-WIRE Dark Fiber)]

[I-WIRE map: UIC, ANL, Starlight / NW Univ, multiple carrier hubs, Ill Inst of Tech, Univ of Chicago, NCSA/UIUC, Indianapolis (Abilene NOC)]

Idea to extend the TeraGrid to CERN. Source: Charlie Catlett, Argonne

CA ONI, CALREN-XD + Pacific Light Rail Backbones (Proposed)

Also: LA-Caltech Metro Fiber; National Light Rail

Key Network Issues & Challenges

Net Infrastructure Requirements for High Throughput

Packet Loss must be ~Zero (at and below 10^-6)

I.e. No “Commodity” networks

Need to track down uncongested packet loss

No Local infrastructure bottlenecks

Multiple Gigabit Ethernet “clear paths” between selected host pairs are needed now
To 10 Gbps Ethernet paths by 2003 or 2004

TCP/IP stack configuration and tuning Absolutely Required

Large Windows; Possibly Multiple Streams (see the window-sizing sketch at the end of this slide)

New Concepts of Fair Use Must then be Developed

Careful Router, Server, Client, Interface configuration

Sufficient CPU, I/O and NIC throughput

 

End-to-end monitoring and tracking of performance
Close collaboration with local and “regional” network staffs

TCP Does Not Scale to the 1-10 Gbps Range
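As a rough illustration of why large TCP windows matter on these paths, the sketch below computes the bandwidth-delay product for a transatlantic link. The ~120 ms round-trip time is an assumed, typical CERN-US value, not a figure from the slides.

```python
# Bandwidth-delay product: the TCP window needed to keep a long fat pipe full
# with a single stream (window >= bandwidth * RTT).

def window_bytes(bandwidth_gbps: float, rtt_ms: float) -> float:
    """Minimum TCP window (bytes) to sustain `bandwidth_gbps` over `rtt_ms` RTT."""
    return bandwidth_gbps * 1e9 / 8 * (rtt_ms / 1e3)

RTT_MS = 120.0   # assumed transatlantic round-trip time (illustrative)
for gbps in (0.155, 0.622, 2.5, 10.0):
    mb = window_bytes(gbps, RTT_MS) / 1e6
    print(f"{gbps:>6} Gbps x {RTT_MS:.0f} ms RTT -> window of ~{mb:.1f} MBytes")
# Default TCP windows of ~64 KBytes fall far short, hence the kernel tuning
# and multiple streams mentioned above.
```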

A Short List: Revolutions in Information Technology (2002-7)

  

Managed Global Data Grids (As Above)

Scalable Data-Intensive Metro and Long Haul Network Technologies

DWDM: 10 Gbps then 40 Gbps per λ; 1 to 10 Terabits/sec per fiber

10 Gigabit Ethernet (See www.10gea.org) 10GbE / 10 Gbps LAN/WAN integration

 

Metro Buildout and Optical Cross Connects

Dynamic Provisioning → Dynamic Path Building: “Lambda Grids”

Defeating the “Last Mile” Problem (Wireless; or Ethernet in the First Mile)

3G and 4G Wireless Broadband (from ca. 2003); and/or Fixed Wireless “Hotspots”

Fiber to the Home

Community-Owned Networks

A Short List: Coming Revolutions in Information Technology

Storage Virtualization

Grid-enabled Storage Resource Middleware (SRM)

iSCSI (Internet Small Computer Systems Interface); Integrated with 10 GbE

Global File Systems

Internet Information Software Technologies

Global Information “Broadcast” Architecture

E.g the Multipoint Information Distribution Protocol ([email protected])

Programmable Coordinated Agent Architectures

E.g. Mobile Agent Reactive Spaces (MARS)

by Cabri et al., University of Modena

The “Data Grid” - Human Interface

Interactive monitoring and control of Grid resources

By authorized groups and individuals

By Autonomous Agents

HENP Major Links: Bandwidth Roadmap (Scenario) in Gbps

Year   Production               Experimental              Remarks
2001   0.155                    0.622-2.5                 SONET/SDH
2002   0.622                    2.5                       SONET/SDH; DWDM; GigE Integ.
2003   2.5                      10                        DWDM; 1 + 10 GigE Integration
2005   10                       2-4 X 10                  λ Switch; λ Provisioning
2007   2-4 X 10                 ~10 X 10; 40 Gbps         1st Gen. λ Grids
2008   ~10 X 10 or 1-2 X 40     ~5 X 40 or ~20-50 X 10    40 Gbps λ Switching
2010   ~5 X 40 or ~20 X 10      ~25 X 40 or ~100 X 10     2nd Gen. λ Grids
2012   ~Terabit                 ~MultiTerabit             Terabit Networks: ~Fill One Fiber or Use a Few Fibers
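Read as a compound growth rate, the production column above implies roughly a doubling every year. A quick check, taking ~1000 Gbps as the "~Terabit" 2012 entry (my reading of the table, not a number stated on the slide):

```python
# Implied compound annual growth of the production-bandwidth roadmap above.
start_year, start_gbps = 2001, 0.155
end_year, end_gbps = 2012, 1000.0     # "~Terabit" read as ~1000 Gbps (assumption)

years = end_year - start_year
growth_per_year = (end_gbps / start_gbps) ** (1.0 / years)
print(f"~{growth_per_year:.1f}x per year over {years} years")   # roughly 2.2x/year
```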

      

One Long Range Scenario (Ca. 2008-12): HENP As a Driver of Optical Networks; Petascale Grids with TB Transactions

Problem: Extract “Small” Data Subsets of 1 to 100 Terabytes from 1 to 1000 Petabyte Data Stores. Survivability of the HENP Global Grid System, with hundreds of such transactions per day (circa 2007), requires that each transaction be completed in a relatively short time.
Example: Take 800 secs to complete the transaction. Then:

Transaction Size (TB)    Net Throughput (Gbps)
1                        10
10                       100
100                      1000 (Capacity of Fiber Today)

Summary: Providing Switching of 10 Gbps wavelengths within ~3 years; and Terabit Switching within 5-10 years would enable “Petascale Grids with Terabyte transactions”, as required to fully realize the discovery potential of major HENP programs, as well as other data-intensive fields.
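The throughput column in the table above follows directly from the 800-second completion target; a minimal check of that conversion:

```python
# Required network throughput to move a transaction of a given size in 800 seconds
# (the example completion time used above).

TRANSACTION_SECONDS = 800

def required_gbps(terabytes: float) -> float:
    """Sustained throughput (Gbps) to move `terabytes` in TRANSACTION_SECONDS."""
    return terabytes * 1e12 * 8 / TRANSACTION_SECONDS / 1e9

for tb in (1, 10, 100):
    print(f"{tb:>4} TB in {TRANSACTION_SECONDS} s -> {required_gbps(tb):.0f} Gbps")
# 1 TB -> 10 Gbps, 10 TB -> 100 Gbps, 100 TB -> 1000 Gbps, matching the table.
```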

Internet2 HENP WG [*]

Mission: To help ensure that the required

National and international network infrastructures (end-to-end)

Standardized tools and facilities for high performance and end-to-end monitoring and tracking, and

Collaborative systems

are developed and deployed in a timely manner, and used effectively to meet the needs of the US LHC and other major HENP Programs, as well as the at-large scientific community.

To carry out these developments in a way that is broadly applicable across many fields

Formed an Internet2 WG as a suitable framework: Oct. 26 2001

[*] Co-Chairs: S. McKee (Michigan), H. Newman (Caltech); Sec’y J. Williams (Indiana)

Website: http://www.internet2.edu/henp ; also see the Internet2 End-to-end Initiative: http://www.internet2.edu/e2e

True End to End Experience

User perception

Application

Operating system

Host IP stack

Host network card

Local Area Network

Campus backbone network

Campus link to regional network/GigaPoP

GigaPoP link to Internet2 national backbones

International connections


HENP Scenario Limitations: Technologies and Costs

Router Technology and Costs (Ports and Backplane)

Computer CPU, Disk and I/O Channel Speeds to Send and Receive Data

Link Costs: Unless Dark Fiber (?)

MultiGigabit Transmission Protocols

End-to-End “100 GbE” Ethernet (or something else) by ~2006: for LANs to match WAN speeds

Throughput quality improvements: BW_TCP < MSS / (RTT * sqrt(loss)) [*]

80% Improvement/Year: a Factor of 10 in 4 Years
Eastern Europe Far Behind; China Improves But Far Behind
[*] See “Macroscopic Behavior of the TCP Congestion Avoidance Algorithm,” Mathis, Semke, Mahdavi, Ott, Computer Communication Review 27(3), 7/1997
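A minimal sketch of the Mathis et al. bound quoted above, evaluated for an illustrative transatlantic path. The 1500-byte MSS, 120 ms RTT and loss rates are assumed example values, not figures from the slides, and the small constant factor in the formula is ignored.

```python
import math

# Mathis et al. (1997) throughput bound for a single TCP stream:
#   BW_TCP < MSS / (RTT * sqrt(loss))
# (ignoring the small constant factor; good enough for order-of-magnitude estimates)

def tcp_throughput_mbps(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Upper bound on single-stream TCP throughput in Mbps."""
    return mss_bytes * 8 / (rtt_s * math.sqrt(loss)) / 1e6

MSS = 1500      # bytes, typical Ethernet MSS (assumption)
RTT = 0.120     # seconds, assumed transatlantic round-trip time

for loss in (1e-4, 1e-6, 1e-8):
    print(f"loss={loss:.0e}: <= {tcp_throughput_mbps(MSS, RTT, loss):,.0f} Mbps")
# Loss must be pushed to ~1e-6 and below before a single standard TCP stream
# can approach the Gbps range on a path like this, as argued earlier.
```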

11,900 Hosts; 6,620 Registered Users in 61 Countries; 43 (7 I2) Reflectors; Annual Growth 2 to 3X

Networks, Grids and HENP

Next generation 10 Gbps network backbones are almost here: in the US, Europe and Japan

First stages arriving, starting now

 

Major transoceanic links at 2.5 - 10 Gbps in 2002-3

Network improvements are especially needed in Southeast Europe, So. America; and some other regions:

Romania, Brazil; India, Pakistan, China; Africa

Removing regional, last mile bottlenecks and compromises in network quality are now all on the critical path

Getting high (reliable; Grid) application performance across networks means:

End-to-end monitoring; a coherent approach

Getting high performance (TCP) toolkits in users’ hands

Working in concert with AMPATH, Internet E2E, I2 HENP WG, DataTAG; the Grid projects and the GGF