CHEETAH: An optical circuit-switched network for eScience

NSF CHEETAH project
“End-To-End Provisioned Optical
Network Testbed for Large-Scale
eScience Applications”
Xuan Zheng & Malathi Veeraraghavan
Univ. of Virginia
{xuan, mv}@cs.virginia.edu
Team & Acknowledgment
• Team (PI/Co-PIs):
– Malathi Veeraraghavan, Univ. of Virginia
– Nagi Rao, Bill Wing, Tony Mezzacappa, ORNL
– John Blondin, NCSU
– Ibrahim Habib, CUNY
• UVA funding sources:
– NSF EIN grant ANI-0335190 “End-To-End
Provisioned Optical Network Testbed for
Large-Scale eScience Applications”
– NSF ITR small grant ANI-0312376
– DOE FG02-04ER25640
Outline
→ Background
• CHEETAH
– Concept
– Testbed and peering
– Software
– Applications: GridFTP, web application
• Conclusions
Background: eScience application
requirements for the network
• Our eScience partner: TSI project
– High bandwidth end-to-end for terabyte sized
file transfers
– End-to-end QoS assurance
• remote visualization
• remote computational steering
• CHEETAH: Circuit-Switched High-speed
End-to-End Transport Architecture
TSI: Terascale Supernova Initiative
QoS: Quality of Service
Outline
• Background
→ CHEETAH
– Concept
– Testbed and peering
– Software
– Applications: GridFTP, web application
• Conclusions
CHEETAH: Circuit-Switched High-speed
End-to-End Transport Architecture
• Optical circuit-switched networks
– Provide high-speed, end-to-end circuit
connectivity to end hosts on a dynamic,
call-by-call basis (the flavor of sharing)
• Meets TSI needs
– High-bandwidth connections for file
transfers
– End-to-end QoS for remote visualization
CHEETAH concept
• Use off-the-shelf circuit-based gateways
– that support GMPLS routing and signaling protocols for
dynamic circuit setup/release
• Control-plane functionality to be distributed to
network switches
– enables the creation of large-scale shared connection-oriented (CO) networks
• Requires software upgrade on end hosts
[Figure: Each end host sends a BW-request to the bandwidth manager of a connection-oriented switch. Callouts: distributed bandwidth management scales to large networks; the complete bandwidth on a link is used for one connection.]
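The distributed bandwidth management in the figure can be sketched roughly as follows. This is only an illustration, assuming one bandwidth manager per switch that tracks the free capacity of each outgoing link and admits a BW-request only when enough capacity remains. The class, method, and link names are hypothetical and are not taken from the CHEETAH software.

```python
class BandwidthManager:
    """Per-switch admission control (hypothetical model, not CHEETAH code)."""

    def __init__(self, link_capacity_mbps):
        # Free capacity per outgoing link, keyed by an assumed link name.
        self.free = dict(link_capacity_mbps)

    def request(self, link, mbps):
        """Admit the BW-request only if the link has enough free capacity."""
        if self.free.get(link, 0) >= mbps:
            self.free[link] -= mbps
            return True
        return False

    def release(self, link, mbps):
        """Return bandwidth to the link when the circuit is released."""
        self.free[link] += mbps


def setup_circuit(path, mbps):
    """Reserve bandwidth hop by hop; roll back if any switch rejects."""
    reserved = []
    for manager, link in path:
        if not manager.request(link, mbps):
            # Undo the reservations made on earlier hops.
            for earlier_manager, earlier_link in reserved:
                earlier_manager.release(earlier_link, mbps)
            return False
        reserved.append((manager, link))
    return True
```

Because each switch decides locally, no central server has to know the state of every link, which is the property the slide credits for scaling to large networks.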
CHEETAH as an “add-on” service
to the Internet
• Use second NICs at hosts for circuit connectivity
leaving primary NIC for Internet access
[Figure: End host I and End host II are connected through both the connectionless Internet and the circuit-switched network.]
• Two paths available
• Attempt circuit setup; if rejected,
fall back to using TCP/IP
• Can talk to non-CHEETAH end hosts
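The fallback rule on this slide can be written as a small decision function. This is a minimal sketch: `far_end_on_cheetah` and the `setup_circuit` callable are hypothetical stand-ins for whatever checks and signaling the end-host software actually performs.

```python
def choose_path(far_end_on_cheetah, setup_circuit):
    """Return "circuit" if a circuit could be set up, otherwise "tcp/ip".

    setup_circuit is a zero-argument callable (assumed interface) that
    returns True when the network accepts the circuit request.
    """
    # A non-CHEETAH far end is reached over the Internet without any attempt.
    if far_end_on_cheetah and setup_circuit():
        return "circuit"   # use the second NIC
    return "tcp/ip"        # primary NIC, always available
```

Since the Internet path is always present, a rejected circuit request delays the transfer but never blocks it.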
CHEETAH topology & equipment
[Figure: CHEETAH topology. Three PoPs, each with a Sycamore SN16000: Raleigh PoP (MCNC), ORNL PoP, and Atlanta PoP (SoX/SLR), connected by OC192 circuits (NLR, SLR). Enterprise networks attach through Ethernet switches and hosts; site labels: NCSU, Cray X1, G. Tech. Switch labels: Gbps and 10Gbps Ethernet interface cards; time-division multiplexing optical interface card; control card; 5 GbEs; maps GbE to equivalent SONET circuit; bandwidth manager: dynamic distributed sharing.]
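The figure notes that the switch maps a GbE port to an equivalent SONET circuit. As a rough worked example, the arithmetic below estimates how many STS-1 time slots one GbE needs and how many such circuits fit on an OC192. It assumes the mapping is built from STS-1 units with a payload capacity of 48.384 Mb/s each; the slides do not state the actual mapping granularity, so treat the numbers as illustrative.

```python
import math

STS1_PAYLOAD_MBPS = 48.384   # payload capacity of one STS-1 (assumed mapping unit)
OC192_STS1_SLOTS = 192       # an OC192 carries 192 STS-1 time slots


def sts1_needed(rate_mbps):
    """Number of STS-1 slots needed to carry the given Ethernet rate."""
    return math.ceil(rate_mbps / STS1_PAYLOAD_MBPS)


gbe_slots = sts1_needed(1000)                        # slots for one GbE
circuits_per_oc192 = OC192_STS1_SLOTS // gbe_slots   # GbE circuits per OC192
```

Under these assumptions one GbE occupies 21 slots, so a single OC192 can carry at most 9 full-rate GbE circuits at a time.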
CHEETAH peering
[Figure: CHEETAH peering map. Labels: DC DRAGON network; DC PoP; McLean, VA; George Mason University; Seattle; PNNL; Chicago; NERSC; LBL; Sunnyvale; ANL; FNAL; SLAC; ORNL; Raleigh; CalTech; Atlanta; VORTEX; University of Virginia; Virginia Commonwealth University; College of William and Mary; Virginia Tech; Old Dominion University; link to CHEETAH Raleigh PoP.]
Cheetah software on end hosts
• Implement CHEETAH software to run on end hosts
• Integrate with host applications
– applications generate requests for bandwidth as needed
– circuits are short-lived, which increases sharing
– hold a circuit for a few seconds/minutes, then release it
End-host CHEETAH software components:
• Applications: remote viz. (Ensight), Web, GridFTP server
• Fixed-Rate Transport Protocol (FRTP), designed for circuits
• DNS query: checks whether the far-end host is also on CHEETAH
• Routing decision: checks whether to use the TCP/IP path or attempt a CHEETAH circuit setup
• Signaling client: requests a circuit
[Figure: Applications sit above the DNS lookup, routing decision, and signaling client. TCP uses NIC I (primary TCP/IP path); FRTP uses NIC II (end-to-end CHEETAH circuit).]
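The "hold briefly, then release" behavior can be illustrated with a context manager that guarantees the circuit is released even if the transfer fails. This is a sketch under assumptions: the `setup`/`release` interface and the `FakeSignalingClient` class are hypothetical stand-ins, not the API of the CHEETAH signaling client.

```python
from contextlib import contextmanager


@contextmanager
def held_circuit(signaling_client, dest, mbps):
    """Hold a circuit only for the duration of one transfer."""
    circuit = signaling_client.setup(dest, mbps)
    try:
        yield circuit
    finally:
        # Always free the bandwidth so other calls can share it.
        signaling_client.release(circuit)


class FakeSignalingClient:
    """Stand-in for the signaling client (hypothetical interface)."""

    def __init__(self):
        self.active = set()

    def setup(self, dest, mbps):
        circuit = (dest, mbps)
        self.active.add(circuit)
        return circuit

    def release(self, circuit):
        self.active.discard(circuit)
```

An application would wrap each transfer in `with held_circuit(client, dest, mbps):` so that the circuit exists only while data is flowing.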
Transport protocol for
end-to-end dedicated circuits
• Requirements & solution:
– No contention for bandwidth resources in network during user
data flow (bandwidth already reserved)
• No congestion control
– Contention at end-hosts due to multitasking
• Flow control: null or window based
– Reliable transfer: error control
• Detect/recover from drops in receiver buffer
– High circuit utilization
• Keep sending rate fixed to match circuit rate
• Hence the name Fixed-Rate Transport Protocol (FRTP)
• Receive rate selection important
• FRTP Implementation
– X. Zheng, A. P. Mudambi, and M. Veeraraghavan, “FRTP: Fixed
Rate Transport Protocol -- A modified version of SABUL for
end-to-end circuits,” Pathnets2004 Workshop in the
Broadnet2004 Conference, Sept. 2004, San Jose, CA.
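To make the fixed-rate idea concrete, the sketch below computes the spacing between packets that matches a circuit rate, and uses a deliberately simple one-second-step model to show why the chosen rate matters: if the sender's fixed rate exceeds what the receiving host can drain, packets are dropped in the receiver buffer. This is not FRTP or SABUL code; the function names and the buffer model are assumptions made for illustration.

```python
def inter_packet_gap(circuit_rate_bps, packet_bytes):
    """Seconds between packet starts so the sender matches the circuit rate."""
    return packet_bytes * 8 / circuit_rate_bps


def receiver_drops(send_rate_pps, drain_rate_pps, buffer_packets, seconds):
    """Count packets dropped at the receiver buffer under a fixed sending rate.

    Simplified model: each second the sender adds send_rate_pps packets,
    the receiving application removes up to drain_rate_pps, and anything
    beyond buffer_packets is dropped.
    """
    queued = 0
    dropped = 0
    for _ in range(seconds):
        queued += send_rate_pps
        queued -= min(queued, drain_rate_pps)
        if queued > buffer_packets:
            dropped += queued - buffer_packets
            queued = buffer_packets
    return dropped
```

For example, `inter_packet_gap(1e9, 1500)` gives a spacing of about 12 microseconds for 1500-byte packets on a 1 Gb/s circuit.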
Applications
• GridFTP
• Web applications
GridFTP application
• Disk-to-disk transfer considerations
– Hardware solution: RAID striping
• expensive solution
– Split large file into small pieces and store small
files on disks of different hosts in a cluster
• need “collaboration” between each transfer
• not user-friendly
– GridFTP striping with PVFS2 - striping across
disks of different hosts of a cluster
• best solution, but both GridFTP and PVFS2 code need
modifications to use on dedicated circuits
GridFTP striped transfer
over PVFS2
• PVFS2 (Parallel Virtual File System)
– Three kinds of roles for nodes in PVFS2
• Compute node/client: on which applications are run
• Metadata server: handles metadata operations
• I/O nodes/server: stores file data for PVFS2 file systems
– Stripes a file across multiple servers like RAID0
• But
– PVFS2 stripes files starting with a random I/O server (done
with the PINT_cached_config_get_next_io() function call in
file src/common/misc/pint-cached-config.c)
• Change PVFS2 code:
– jitter = (rand() % num_io_servers);
– Change it to jitter = -1 to get a fixed order of data
distribution
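The effect of a random versus fixed starting server can be shown with a small model of round-robin striping. This is a simplified sketch, not PVFS2's implementation: the function name is hypothetical, and it assumes that a fixed order means starting at server 0, which the slides do not spell out.

```python
import random


def stripe_servers(num_blocks, num_io_servers, start=None, rng=random):
    """Return the I/O server index that holds each block of a striped file.

    start=None models the random starting server; an integer fixes it.
    """
    if start is None:
        start = rng.randrange(num_io_servers)   # random first server
    return [(start + block) % num_io_servers for block in range(num_blocks)]
```

With `start=0`, six blocks over three servers always land on servers 0, 1, 2, 0, 1, 2, so the host holding a given block is known before the transfer begins. With a random start the same file can begin on any server, which makes the layout unpredictable from one file to the next.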
But the current GridFTP does not work in this ideal way. The data
channel connections between the sending and receiving sides are arbitrary
because the processing of the SPAS and SPOR commands is nondeterministic.
• Does not match the dedicated circuit model
• Code is being modified
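[Figure: globus-url-copy controls a Mode E striped transfer over two control connections. One side receives SPAS (Listen), which returns a list of host:port pairs, followed by STOR <FileName>. The other side receives SPOR (Connect), which connects to the host-port pairs, followed by RETR <FileName>. Ideal data flow: Blocks 1 and 4 travel between Host X1 and Host A1; Blocks 2 and 5 travel between Host X2 and Host A2.]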
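The ideal pairing in the figure, where the i-th host on one side always exchanges data with the i-th host on the other, can be sketched as below. This is an illustration only: the function names are hypothetical, and the third host pair used in the usage note is an assumption inferred from the block numbering, not something shown on the slide.

```python
def pair_stripes(x_hosts, a_hosts):
    """Pair the i-th host on one side with the i-th host on the other."""
    if len(x_hosts) != len(a_hosts):
        raise ValueError("need the same number of hosts on both sides")
    return dict(zip(x_hosts, a_hosts))


def block_route(block_number, x_hosts, a_hosts):
    """Return the host pair that carries a 1-based block number.

    Blocks are assumed to be assigned to stripes in round-robin order.
    """
    index = (block_number - 1) % len(x_hosts)
    return x_hosts[index], a_hosts[index]
```

With three hosts per side, `block_route(4, ["X1", "X2", "X3"], ["A1", "A2", "A3"])` returns the X1 and A1 pair, matching the figure, so a dedicated circuit between each pair carries exactly the blocks stored on those two hosts.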
Web Application
[Figure: Web client: web browser (e.g. Mozilla), CHEETAH FT receiver, RSVP-TE interface, RSVP-TE daemon. Web server: web server (e.g. Apache), download.cgi, CHEETAH FT sender, RSVP-TE interface, RSVP-TE daemon. The browser sends the URL and receives the response; RSVP-TE messages pass between the two RSVP-TE daemons; the data transfer runs over FRTP from the CHEETAH FT sender to the CHEETAH FT receiver.]
• At the web server side
– Hyperlink to file is a CGI script (download.cgi); filename embedded in hyperlink
– download.cgi is started automatically at the server when the user clicks the hyperlink, which
triggers the CHEETAH FT sender
– CHEETAH FT sender initiates CHEETAH circuit setup by calling the RSVP-TE client
– CHEETAH FT sender starts data transfer on FRTP/circuit
• At the web client side
– An RSVP-TE client runs as a daemon to accept the circuit setup request
– A CHEETAH FT receiver runs as a daemon to receive the user data
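The server-side sequence above can be sketched as a single handler. This is a minimal illustration, assuming the filename arrives in a query parameter named `file`; the `setup_circuit` and `send_file` callables are hypothetical placeholders for the RSVP-TE client call and the FRTP transfer, and none of these names come from the CHEETAH code.

```python
import os
from urllib.parse import parse_qs


def handle_download(query_string, client_host, setup_circuit, send_file):
    """Sketch of what download.cgi triggers on the server side."""
    params = parse_qs(query_string)
    # The filename is embedded in the hyperlink; "file" is an assumed key.
    # basename() discards any directory components supplied in the link.
    filename = os.path.basename(params["file"][0])
    if not setup_circuit(client_host):     # assumed to wrap the RSVP-TE client
        return "circuit setup rejected"
    send_file(filename, client_host)       # assumed to send over FRTP
    return "sent " + filename
```

Passing the signaling and transfer steps in as callables keeps the sketch testable without a real RSVP-TE daemon or circuit.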
Conclusions
• End-to-end dedicated connections appear
to be the right answer for many eScience
applications
– But, many networking problems need to be
solved to achieve cost reduction through scaling
• Utilization concerns: bandwidth sharing + FRTP
• Specific concerns of TSI: TB file handling
– PVFS2 and GridFTP
• Web site: http://cheetah.cs.virginia.edu