UVA Presentation - Electrical and Computer Engineering


Resource Optimization in Hybrid
Core Networks with 100G Links
Malathi Veeraraghavan and Zhenzhen Yan
University of Virginia
Date: Oct. 5-6, 2010
(Collaborator: Admela Jukan)
• Outline
– HNTES System Architecture
– HNTES Demo on Tabletop testbed
– Details of OFAT: Offline Flow Analysis Tool
– PerfSONAR OWAMP data analysis
– Simulation study
– Special issue of IEEE Communications Magazine
– Summary
Sponsored by DOE ASCR grant DE-SC0002350
1
Two web sites
• External project web site:
– http://www.ece.virginia.edu/mv/research/DOE09/index.html
– Software, documents, links to papers
posted
• Collaboration web site:
– https://collab.itc.virginia.edu/portal/site/e121f110-7b37-4021-8ac14d61197c067a
2
Hybrid Network Traffic
Engineering Software (HNTES)
• MFDB: Monitored Flow Data Base
• Some components can be centralized and rest
distributed
3
Components
• Offline Analysis Tool (OFAT): statistical
R programs to identify heavy-hitter flows
– Most challenging component
– Leverage “human knowledge” about large file
transfer servers and applications
– Populate MFDB
• Monitored flow data base (MFDB): fields of each entry
– 5-tuple of monitored flow (not all fields are required for each flow):
source IP address, destination IP address, protocol, source port,
destination port
– Status: Monitored, Redirected, Disabled
– Bypass circuit endpoints: IP addresses and VLAN IDs
– Circuit duration
– Circuit rate
4
Components contd.
• Flow Monitoring Module
– receives packets for flows in MFDB mirrored to it from the
routers (mirroring pre-configured for MFDB flows)
– initiates the reservation and provisioning of a circuit
– initiates circuit release when packet flow “ends”
• IDC interface
– interfaces with Inter-Domain Controller (IDC)
• IDC (not part of HNTES software)
– sets up circuit on request
– sets PBR route in IP router to redirect packets from default
IP-routed path to newly established circuits
– removes PBR entry when circuit is released
5
Components contd.
• MFDB-interface module
– supports human and programmatic interface
to MFDB
• Router control interface module
– Configures mirroring for MFDB flows
6
Four video files
recording experiments
• Demonstration of the MFDB in
MySQL and MFDB-interface modules
• Demonstration of the flow monitoring
module (FMM)
– Flow gets redirected
– Deactivate interface to “prove”
redirection
• IDCIM
7
OFAT output file
• After analysis of Netflow data, an
ASCII file is output
• Format
srcIP dstIP srcport dstport prot
140.90.192.0 129.186.248.0 20 -1 6
165.112.0.0 128.193.216.0 20 -1 6
165.112.0.0 169.230.88.0 20 -1 6
• -1 is used to indicate “don’t match” (a don’t-care field)
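The format above can be read into records with a few lines of code. The following is a minimal sketch, assuming whitespace-separated fields in the order shown on the slide; the names `FlowRule` and `parse_ofat_line` are hypothetical and are not part of OFAT.

```python
from typing import NamedTuple, Optional

WILDCARD = -1  # marker used in the OFAT output for a field that is not matched


class FlowRule(NamedTuple):
    """One line of the OFAT output file (hypothetical record type)."""
    src_ip: str
    dst_ip: str
    src_port: Optional[int]  # None means "don't match"
    dst_port: Optional[int]
    protocol: Optional[int]


def _field(text: str) -> Optional[int]:
    """Convert a numeric field, mapping -1 to None."""
    value = int(text)
    return None if value == WILDCARD else value


def parse_ofat_line(line: str) -> FlowRule:
    """Parse 'srcIP dstIP srcport dstport prot' (assumed whitespace-separated)."""
    src_ip, dst_ip, src_port, dst_port, prot = line.split()
    return FlowRule(src_ip, dst_ip, _field(src_port), _field(dst_port), _field(prot))


# The three sample lines shown on the slide.
sample = """140.90.192.0 129.186.248.0 20 -1 6
165.112.0.0 128.193.216.0 20 -1 6
165.112.0.0 169.230.88.0 20 -1 6"""

rules = [parse_ofat_line(line) for line in sample.splitlines()]
for rule in rules:
    print(rule)
```

Mapping -1 to `None` keeps the wildcard distinct from any real port or protocol number.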
8
MFDB and MFDB interface
module on tabletop testbed
• Monitored Flow DataBase (MFDB) uses
MySQL
• MySQL was installed by ESnet team
on Diskpt-1
• We implemented and tested MFDB on
Diskpt-1 installation
• FIRST DEMO VIDEO
9
FMM demo on the
ANI Tabletop Testbed
[Figure: hosts Diskpt-1 and Diskpt-2 and routers North-rt1 and South-rt1;
components shown are BWdetail send, BWdetail recv, MFDB, FMM and tcpdump;
mirrored packets arrive on interface A]
• Step 1a: Configure router to mirror packets destined to Diskpt-2 to
interface A
• Step 1b: Enter BWdetail flow 5-tuple in MFDB
• Step 2: Start BWdetail (extended version of iperf)
• Step 3: When FMM receives packets, it checks the 5-tuple
against the MFDB
FMM DEMO VIDEO 1
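Step 3, checking a mirrored packet's 5-tuple against the MFDB, can be sketched as follows. This is an illustration only: the dictionary layout, the function names, and the use of `None` for fields that are not required are assumptions, the addresses are documentation-range placeholders, and the real FMM reads its entries from MySQL.

```python
from typing import Dict, List, Optional

FIELDS = ("src_ip", "dst_ip", "protocol", "src_port", "dst_port")


def entry_matches(entry: Dict[str, object], packet: Dict[str, object]) -> bool:
    """True if every field the entry specifies equals the packet's field.

    A field set to None in the entry is treated as "not required".
    """
    return all(entry.get(f) is None or entry[f] == packet[f] for f in FIELDS)


def find_monitored_flow(mfdb: List[Dict[str, object]],
                        packet: Dict[str, object]) -> Optional[Dict[str, object]]:
    """Return the first MFDB entry that matches the packet, or None."""
    for entry in mfdb:
        if entry_matches(entry, packet):
            return entry
    return None


# Hypothetical MFDB content: one entry with the source port left open.
mfdb = [
    {"src_ip": "192.0.2.10", "dst_ip": "198.51.100.20", "protocol": 6,
     "src_port": None, "dst_port": 5001, "status": "Monitored"},
]

mirrored = {"src_ip": "192.0.2.10", "dst_ip": "198.51.100.20", "protocol": 6,
            "src_port": 40312, "dst_port": 5001}
other = dict(mirrored, dst_port=22)

print(find_monitored_flow(mfdb, mirrored))  # matching entry
print(find_monitored_flow(mfdb, other))     # None
```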
10
Tabletop experiment
[Figure: hosts Diskpt-1 and Diskpt-2 and routers North-rt1 and South-rt1;
components shown are BWdetail send, BWdetail recv, MFDB, FMM and tcpdump]
Interface labels in the figure:
– eth6: 192.168.100.50
– xe-0/0/1: 192.168.100.49
– eth1: 192.168.255.50
– lo: 172.16.0.17
– eth2: 192.168.100.54
– ge-1/0/2: 192.168.100.53
– xe-0/0/2: 192.168.100.17
– xe-1/2/0: 192.168.100.33
– xe-0/0/0: 192.168.100.18
– xe-1/2/0: 192.168.100.34
– GRE tunnel: Diskpt-1:gtun (192.168.100.78) to
North-rt1:gr-0/3/0.0 (192.168.100.77)
FMM DEMO VIDEO 2
11
Four video files
recording experiments
• Demonstration of the MFDB in MySQL
and MFDB-interface modules
• Demonstration of the flow monitoring
module (FMM)
– Flow gets redirected
– Deactivate interface to “prove” redirection
• IDCIM
12
IDCIM demo
• IDC Interface Module (IDCIM) run on host zelda2
at UVA
• Started with OSCARS Java Client
• Connect to OSCARS test IDC server and
subscribe for notifications
– ./run.sh subscribe -repo conf/axis-tomcat -url https://oscarsdev.es.net/axis2/services/OSCARSNotify -producer https://oscarsdev.es.net/axis2/services/OSCARS -consumer http://128.143.10.221:8070 -topics idc:INFO
• Automatic signaling
13
IDC-Interface Module (IDCIM)
interaction with FMM and IDC
14
Threaded design
(C++)
(Java)
15
IDCIM demo
• Run client (submodule to be integrated into
FMM)
• Send request for circuit from client to
IDCIM
• IDCIM (Java) packages per IDCP and
sends to IDC with “now” request and
automatic signaling
• When PathSetup is confirmed, IDCIM
notifies client (FMM)
• IDCIM DEMO VIDEO
16
Outline check
• HNTES System Architecture
• HNTES Demo on Tabletop testbed
• Details of OFAT: Offline Flow Analysis Tool
• Discussion of what flows to redirect to circuits
• PerfSONAR OWAMP data analysis
• Elephant-vs-mice simulation study
• Special issue of IEEE Communications Magazine
• Summary
17
OFAT demonstration
Not yet terminated
http://www.ece.virginia.edu/mv/research/DOE09/index.html
http://www.ece.virginia.edu/mv/research/DOE09/software/software.html
18
OFAT: Offline
Flow Analysis Tool
• Design:
– Download Netflow data from router
– Use flow-export tools to get ASCII file
– Shows 5-tuple, bytes, timestamps of first and
last packet in flow
– Statistical package R programs:
• Find flow lengths and isolate flows of length 59 sec
from each 5-min file (FindLongFlow.R)
• Concatenate flows from all 5-minute files in one day
(one week): (Concatenate.R)
– gaps (1-in-100 sampling): 5-minute gaps acceptable
– “definition” of “long flow”: >= 10 minutes
– Output: all flows longer than 10 minutes
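The concatenation step can be sketched in Python (the project's own programs are in R, so this is an illustration rather than the OFAT code). It assumes that all records passed in belong to one 5-tuple and that "length 59 sec" means a record lasting at least 59 seconds. The timestamps are the firstunix/lastunix values from the Unidata example elsewhere in this deck.

```python
from typing import List, Tuple

GAP_THRESHOLD_S = 5 * 60   # gaps of up to 5 minutes are acceptable
LONG_FLOW_S = 10 * 60      # "long flow": >= 10 minutes
MIN_RECORD_S = 59          # assumed reading of "flows of length 59 sec"

Record = Tuple[int, int]   # (first_unix, last_unix) for ONE 5-tuple


def concatenate(records: List[Record]) -> List[Record]:
    """Merge records of one 5-tuple whose gaps do not exceed the threshold."""
    kept = sorted(r for r in records if r[1] - r[0] >= MIN_RECORD_S)
    merged: List[Record] = []
    for first, last in kept:
        if merged and first - merged[-1][1] <= GAP_THRESHOLD_S:
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


def long_flows(records: List[Record]) -> List[Record]:
    """Concatenated flows that last at least LONG_FLOW_S seconds."""
    return [(f, l) for f, l in concatenate(records) if l - f >= LONG_FLOW_S]


unidata_records = [
    (1246923494, 1246923553), (1246923734, 1246923794),
    (1246923794, 1246923854), (1246923854, 1246923914),
    (1246923914, 1246923973), (1246923974, 1246924034),
    (1246924154, 1246924214), (1246924214, 1246924274),
    (1246924274, 1246924334), (1246924334, 1246924394),
    (1246924394, 1246924454), (1246924455, 1246924514),
    (1246924514, 1246924573), (1246924574, 1246924633),
]

for first, last in long_flows(unidata_records):
    print(first, last, last - first)  # one flow spanning 1139 seconds
```

The largest gap in this sample is 181 seconds, so all 14 records merge into a single flow that passes the 10-minute test.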
19
Methodology contd.
• Sort long flows by protocol number and save only
TCP, GRE, ESP, and AH flows (ICMP and UDP removed),
and print statistics (Protocol.R)
• Sort on IP protocol field and src and dst ports, and
separate out flows for different applications into
different files (Port.R)
• Match long flow IDs and short flow IDs
(LongShortMatch1.R and *2.R)
• Identify long flows that did not occur as short
flows (AddFlowLength.R)
• Output: MFDB_Long_Only_Flowlength (5-tuple
and duration; use -1 for don’t-care fields)
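The protocol and port steps can be illustrated with a short Python sketch (again, not the R code itself). The IP protocol numbers are the standard IANA values, and the port-to-application names follow the application list used in this deck. The fallback label "other" and the rule that both ports must fall in 49152-65535 to count as dynamic-and-private are assumptions.

```python
# IP protocol numbers kept by the Protocol.R step (ICMP=1 and UDP=17 are dropped).
KEPT_PROTOCOLS = {6: "TCP", 47: "GRE", 50: "ESP", 51: "AH"}

# Port-to-application names as used in the deck's statistics.
PORT_NAMES = {
    20: "ftp", 22: "ssh", 25: "smtp", 80: "http", 119: "nntp",
    143: "imap", 388: "unidata", 443: "https", 554: "rtsp", 873: "rsync",
}

DYNAMIC_START = 49152  # start of the IANA dynamic/private port range


def keep_protocol(protocol: int) -> bool:
    """True for the protocols retained after the protocol sort."""
    return protocol in KEPT_PROTOCOLS


def classify(src_port: int, dst_port: int) -> str:
    """Label a flow by the first well-known port found on either side."""
    for port in (src_port, dst_port):
        if port in PORT_NAMES:
            return PORT_NAMES[port]
    if src_port >= DYNAMIC_START and dst_port >= DYNAMIC_START:
        return "dynamic-and-private"
    return "other"  # assumption: everything else is lumped together


print(keep_protocol(6), keep_protocol(17))   # True False
print(classify(45677, 22))                   # ssh
print(classify(388, 42707))                  # unidata
print(classify(50000, 60000))                # dynamic-and-private
```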
20
Data files and R files
• Data files from KANS I2 router, July 7,
2009: to demonstrate
• AllLongFlow.txt
• Statistics.txt
• One example: rsync.txt
• File used to populate MFDB:
MFDB_Long_Only_Flowlength
• Example R files: FindLongFlow.R, Concatenate.R
21
Three-track approach to
understanding long flows
I. Netflow data
(analysis with R programs)
– Long flows separated by apps
II. Network requirements
workshop reports
(“human knowledge” mining)
– IP addresses for scientific computing
(data transfers) servers
III. Understanding applications
(with tcpdump, talking to
developers: SCP, SFTP,
GridFTP, BBCP)
Goal: Identify suitable
candidate flows for the MFDB
22
Track I: Netflow analysis
• CHIC and LOSA routers of Internet2
• One-day data analysis
• 5-day (Mon-Fri) analysis
23
Unidata one-day (HYNES)
srcIP          dstIP     srcPort  dstPort  protocol  firstunix   lastunix    flowlength
128.117.136.0  35.8.8.0  388      38650    6         1246923494  1246923553  59.00099993
128.117.136.0  35.8.8.0  388      38650    6         1246923734  1246923794  59.53600001
128.117.136.0  35.8.8.0  388      38650    6         1246923794  1246923854  59.56599998
128.117.136.0  35.8.8.0  388      38650    6         1246923854  1246923914  59.69000006
128.117.136.0  35.8.8.0  388      38650    6         1246923914  1246923973  59.398
128.117.136.0  35.8.8.0  388      38650    6         1246923974  1246924034  59.648
128.117.136.0  35.8.8.0  388      38650    6         1246924154  1246924214  59.80200005
128.117.136.0  35.8.8.0  388      38650    6         1246924214  1246924274  59.68599987
128.117.136.0  35.8.8.0  388      38650    6         1246924274  1246924334  59.61099982
128.117.136.0  35.8.8.0  388      38650    6         1246924334  1246924394  59.45600009
128.117.136.0  35.8.8.0  388      38650    6         1246924394  1246924454  59.48799992
128.117.136.0  35.8.8.0  388      38650    6         1246924455  1246924514  59.08899999
128.117.136.0  35.8.8.0  388      38650    6         1246924514  1246924573  59.00699997
128.117.136.0  35.8.8.0  388      38650    6         1246924574  1246924633  59.36100006
14 minutes
Between NCAR and Michigan State University
24
Top ten fat flows in one day
bytes      srcIP          dstIP          srcport  dstport  protocol  firstunix   lastunix    flow length
542582720  198.108.24.0   204.228.64.0   0        0        50        1246879655  1246891060  11405.383
210132404  131.225.192.0  129.114.48.0   45677    22       6         1246879655  1246890520  10865.413
186519604  128.135.64.0   131.142.152.0  22       58942    6         1246891541  1246893460  1919.008
165567747  198.32.8.0     198.32.8.0     0        0        47        1246874013  1246874133  119.8869998
146416660  208.100.88.0   141.142.24.0   43094    22       6         1246861049  1246868372  7322.738
127799228  208.100.88.0   141.142.24.0   43094    22       6         1246882716  1246890100  7384.865
113470332  128.117.136.0  128.255.24.0   388      42707    6         1246861049  1246879356  18306.468
106577448  198.108.24.0   204.228.64.0   0        0        50        1246921790  1246923291  1500.479
101287624  131.225.192.0  129.114.48.0   45677    22       6         1246873833  1246878456  4622.681
91662040   152.46.0.0     128.255.56.0   873      1934     6         1246861049  1246869092  8042.826
encapsulated:3; ssh:5; Unidata:1; rsync:1
Two long ssh flows go from Fermilab (131.225.192.0) to the Texas Advanced
Computing Center at the University of Texas at Austin (129.114.48.0).
141.142.24.0 corresponds to NCSA (National Center for Supercomputing
Applications) for two other ssh flows.
The Unidata LDM flow is from NCAR (National Center for Atmospheric
Research) with address 128.117.136.0.
25
Per-day data for a five-weekday
period (July 6-10, 2009)
CHIC
date     longest: seconds (bytes)   fattest: bytes (seconds)
July 06  18306 (113470332 bytes)    542582720 (11405 seconds)
July 07  25326 (112768756 bytes)    867174480 (10443 seconds)
July 08  21307 (80492044 bytes)     185912032 (4562 seconds)
July 09  15364 (30196708 bytes)     241363080 (3360 seconds)
July 10  23825 (1544080 bytes)      310908448 (779 seconds)
LOSA
date     longest: seconds (bytes)   fattest: bytes (seconds)
July 06  26882 (58425675 bytes)     187357504 (20402 seconds)
July 07  23220 (216722816 bytes)    216722816 (23220 seconds)
July 08  18004 (26475856 bytes)     349049492 (5223 seconds)
July 09  19986 (40811386 bytes)     385387604 (7201 seconds)
July 10  15964 (142172096 bytes)    164962944 (2041 seconds)
Fattest data: remember it is 1-in-100 sampled data
26882 sec ≈ 7.5 hours
26
Data for a five-weekday period
(July 6-10, 2009)
• Number of flows
– CHIC: 841062272
– LOSA: 268933244
• Number (%) of long flows
– CHIC: 35632 (0.00424%)
– LOSA: 32660 (0.012%)
• Total number of bytes
– CHIC: 3.11946E+12
– LOSA: 1.17578E+12
• Number and % of bytes in long flows
– CHIC: 1.111436E+11 (3.563%)
– LOSA: 101783460037 (8.66%)
• Number of long flows of different types
– CHIC: 33241 (TCP), 0 (IP), 260 (GRE), 211 (ESP), 0 (AH)
– LOSA: 26618 (TCP), 67 (IP), 206 (GRE), 320 (ESP), 7 (AH)
• Number of long flows of different applications
(without counting long ACK flows)
– CHIC: 447 (20: ftp), 3023 (22: ssh), 4 (25: smtp), 3078 (80: http),
63 (119: nntp), 1 (143: imap), 1690 (388: unidata), 142 (443: https),
25 (554: rtsp), 560 (873: rsync), 2545 (unassigned),
1134 (dynamic-and-private), SUM = 12712
– LOSA: 1202 (20: ftp), 3109 (22: ssh), 8 (25: smtp), 3528 (80: http),
1 (119: nntp), 0 (143: imap), 426 (388: unidata), 307 (443: https),
15 (554: rtsp), 570 (873: rsync), 1068 (unassigned),
574 (dynamic-and-private), SUM = 10808
• Fattest flow (bytes)
– CHIC: 867174480 (10443 seconds)
– LOSA: 1779002612 (7381 seconds)
• Longest flow (seconds)
– CHIC: 25326 (112768756 bytes)
– LOSA: 26882 (58425675 bytes)
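The long-flow percentages in this table follow directly from the counts. A small Python check, using only numbers from the slide (the variable names are illustrative):

```python
# (long flows, all flows) and (bytes in long flows, total bytes) per router,
# copied from the five-weekday table.
flow_counts = {"CHIC": (35632, 841062272), "LOSA": (32660, 268933244)}
byte_counts = {"CHIC": (1.111436e11, 3.11946e12), "LOSA": (101783460037, 1.17578e12)}


def percent(part: float, whole: float) -> float:
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


flow_pct = {router: percent(*pair) for router, pair in flow_counts.items()}
byte_pct = {router: percent(*pair) for router, pair in byte_counts.items()}

for router in ("CHIC", "LOSA"):
    print(f"{router}: {flow_pct[router]:.5f}% of flows, "
          f"{byte_pct[router]:.3f}% of bytes are in long flows")
```

The contrast is the point of the slide: long flows are a tiny fraction of the flow count but several percent of the bytes.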
27
Repeat customers:
ssh long flows
Number of ssh long flows that occurred on multiple days in that
one 5-day period (candidates for MFDB)
Router  2 days           3 days        4 days    5 days
LOSA    59+60+55+55=229  38+42+37=117  32+30=62  28
CHIC    62+67+54+53=236  41+32+33=106  20+23=43  17
28
Findings from track II
• Teragrid servers: 15 sites (server names
and IP addresses found)
• ESG data grid servers found
• So far: NP and BES reports studied
• Number of servers found so far: 51
(BES: single) + 15 (BES: ranges) + 39 (NP)
• Some IP address ranges used for
participating institutions
29
“Match” rate between
Track I and Track II
       CHIC             LOSA
NP     10060 (29.84%)   1128 (4.14%)
BES    3706 (9.12%)     1254 (4.6%)
Percent of flows for which the src or dst IP
address matches one of the server addresses
found from the Track II study of science projects
Number of long flows in CHIC is 33717
Number of long flows in LOSA is 27279
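A match rate of this kind can be computed with Python's standard `ipaddress` module. The sketch below is illustrative: the server addresses, the range, and the flows are placeholders from the documentation address blocks, not Track II data, and the function names are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Set, Tuple

IPv4 = ipaddress.IPv4Address
Net = ipaddress.IPv4Network


def is_server(addr: str, singles: Set[IPv4], ranges: Iterable[Net]) -> bool:
    """True if addr is a listed server address or falls in a listed range."""
    ip = ipaddress.ip_address(addr)
    return ip in singles or any(ip in net for net in ranges)


def match_rate(flows: List[Tuple[str, str]], singles: Set[IPv4],
               ranges: List[Net]) -> Tuple[int, float]:
    """Count flows whose src OR dst address matches; return (count, percent)."""
    matched = sum(
        1 for src, dst in flows
        if is_server(src, singles, ranges) or is_server(dst, singles, ranges)
    )
    return matched, 100.0 * matched / len(flows)


# Placeholder data (documentation address blocks).
single_servers = {ipaddress.ip_address("192.0.2.10")}
server_ranges = [ipaddress.ip_network("203.0.113.0/28")]
long_flow_endpoints = [
    ("192.0.2.10", "203.0.113.200"),    # src is a listed server
    ("198.51.100.7", "198.51.100.8"),   # no match
    ("203.0.113.9", "192.0.2.200"),     # src falls in the listed range
]

count, pct = match_rate(long_flow_endpoints, single_servers, server_ranges)
print(count, f"{pct:.2f}%")
```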
30
ESnet Netflow data analysis
• Sent OFAT R programs to Chris
Tracy
• Chris ran these programs on one day
data from an ESnet’s “busy” router
• Here are preliminary results
31
Preliminary ESnet results
• Took 1.5 hours to analyze one day’s data from busy ESnet
router on Aug. 17, 2010
• Used gap threshold of 5 minutes and 10 minutes as “long
flow” definition
• LongFlow.txt: 69403 (59 sec: before concatenation)
• All_Long_Flows: 157 (after concatenation)
• MFDB_Long_Only_Flowlength: 24
• Duration in this “Long Flow Only” file:
– 659 (min) to 9723 (max)
• Can use All Long Flows and have HNTES wait to see multiple
packets before triggering circuit setup (to check that it is
not a short flow)
32
ESnet busy router one-day
Statistics
• Number of long flows: 193 (157+36 non-terminated)
• Bytes in long flows: 1101995002 (~1TB)
• Number of flows: 3339659
• Bytes in all flows: 57901985032 (~50TB)
• Protocol Statistics: TCP (155) UDP (2), IP, GRE,
AH, ESP (all 0)
• Port Statistics: FTP, SMTP, HTTPs,NNTP, IMAP,
RTSP, rsync (all 0), ssh (32), HTTP (2) Unidata (5),
Unassigned (9), Dynamic_private (4)
33
Question: “What kinds of flows
should we want to move to SDN?”
• Joe Metzger: “For the most part, it isn't worth it to us to touch
anything less than 100Mbps, or possibly even less than 500Mbps.
To some extent, it depends on duration. If somebody is going to
nail up 100Mbps flows for months, yes, we would probably want to
move that to a circuit. But if it is a 100Mbps flow that lasts a few
hours, it isn't a big concern. However, if it is 100 100Mbps parallel
flows between a group of hosts in one location, and a group of hosts
at another location -- then yes, we would want to put that traffic
onto an SDN circuit. Most of the very large flows that have
already been moved to SDN are in the 1-10Gbps range and lasting
for hours, for example:”
– https://stats1.es.net/graphite/render/?width=586&height=303&_salt=1282161064.328&target=fnal-mr2.interface.xe7_0_0%403503.out&from=10%3A30_20100817&until=20%3A30_20100817
• Chris or Joe: Think of a curve where one axis is bandwidth and the
other axis is time. We could define points, such that if a flow (or
group of flows) falls below it -- we don't worry about moving it to
SDN. But if it is above some threshold, we do want to move it...
34
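One way to picture such a bandwidth-versus-time curve is a simple rule on aggregate rate and duration. The sketch below is purely illustrative: the rate floor of 100 Mbps echoes the figure mentioned in the quote, but the curve shape (aggregate rate times duration) and the constant `VOLUME_THRESHOLD_MBPS_HOURS` are assumptions chosen so that the quoted examples come out as described, not thresholds stated by ESnet.

```python
from typing import Sequence

MIN_RATE_MBPS = 100.0                 # below this, not worth touching (from the quote)
VOLUME_THRESHOLD_MBPS_HOURS = 2000.0  # hypothetical curve: rate * duration >= constant


def worth_moving(flow_rates_mbps: Sequence[float], duration_hours: float) -> bool:
    """Decide whether a flow, or a group of parallel flows, lies above the curve."""
    aggregate = sum(flow_rates_mbps)
    if aggregate < MIN_RATE_MBPS:
        return False
    return aggregate * duration_hours >= VOLUME_THRESHOLD_MBPS_HOURS


print(worth_moving([100.0], 3))            # 100 Mbps for a few hours: False
print(worth_moving([100.0], 24 * 60))      # 100 Mbps nailed up for two months: True
print(worth_moving([100.0] * 100, 1))      # 100 parallel 100 Mbps flows: True
print(worth_moving([1000.0], 4))           # 1 Gbps for hours: True
```

Taking a list of rates rather than a single rate reflects the point that groups of flows between the same sites should be judged together.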
Next steps
• Rethinking definition of heavy-hitter
– Our focus was on duration owing to long circuit setup
delay
• Lan and Heidemann 2006 paper:
– Elephants vs mice (Size: number of bytes)
– Cheetah vs snail (Rate)
– Tortoises vs dragonfly (Duration)
– Porcupine vs stingrays (Burstiness)
• Two dimensions
– Size, rate, burstiness dimension of “heavy hitter”
– Group flows together instead of single flows
35
PerfSONAR OWAMP
data analysis
36
PerfSONAR OWAMP
data analysis
• One-way Active Measurement
Protocol(OWAMP)
• Packet interval: 0.1 sec
– 10 packets per sec
– 600 packets per minute
• Use perl programs provided by
perfSONAR
• Sample columns of the OWAMP
data file:
– endTime loss maxError max_delay
min_delay sent startTime
37
PerfSONAR OWAMP
data analysis
• One day’s data from 23:00 Sep 14 to 23:00
Sep 15, 2010 for ELPA to BOIS
• minDelay ≈ 15ms
• Inter-Quartile Range (IQR) of maxDelay:
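As a rough illustration of the IQR statistic named above, the sketch below computes it with Python's standard `statistics.quantiles` (the project itself uses the perl programs from perfSONAR). The record values are invented placeholders, the field names follow the sample columns of the OWAMP data file, and delays are assumed to be in milliseconds.

```python
import statistics
from typing import Sequence

PACKET_INTERVAL_S = 0.1
PACKETS_PER_MINUTE = round(60 / PACKET_INTERVAL_S)  # 10 packets/sec -> 600 per minute


def iqr(values: Sequence[float]) -> float:
    """Inter-quartile range: third quartile minus first quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Hypothetical one-minute records; delays assumed to be in milliseconds.
records = [
    {"sent": 600, "loss": 0, "min_delay": 15.0, "max_delay": 15.4},
    {"sent": 600, "loss": 0, "min_delay": 15.0, "max_delay": 15.9},
    {"sent": 600, "loss": 2, "min_delay": 15.1, "max_delay": 21.5},
    {"sent": 600, "loss": 0, "min_delay": 15.0, "max_delay": 16.3},
    {"sent": 600, "loss": 0, "min_delay": 15.0, "max_delay": 35.0},
]

max_delays = [r["max_delay"] for r in records]
loss_fraction = sum(r["loss"] for r in records) / sum(r["sent"] for r in records)

print("packets per minute:", PACKETS_PER_MINUTE)
print("IQR of max_delay:", iqr(max_delays))
print("loss fraction:", loss_fraction)
```

The IQR is less sensitive than the range to a single outlier such as the 35.0 placeholder above, which is one reason to use it for maximum-delay data.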
38
PerfSONAR OWAMP
data analysis
• Max delay plot:
– ELPA-BOIS
– ALBU-DENV
• Overlapping
paths
• Appears to be
due to traffic
not host issues
39
Simulation study of
heavy hitter flows
• ns2 simulations ongoing
• Create background load at varying levels of
utilization
• Run heavy-hitter flow (size, rate, burstiness)
• Experiment 1: Delay impact
– Run RTP/UDP flow
– Measure and quantify delay impact on RTP/UDP flow at
different levels of utilization
• Experiment 2: Fairness impact
– Run mice transfers (size)
– Raj Jain fairness ratio: throughput, response
time/service time
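The fairness measure referred to in Experiment 2 is commonly computed as Jain's index, J = (sum of x_i)^2 / (n * sum of x_i^2), where x_i is the per-flow metric (for example throughput). A minimal Python sketch with made-up throughput values follows; the simulations themselves are run in ns2.

```python
from typing import Sequence


def jain_index(values: Sequence[float]) -> float:
    """Jain's fairness index: 1.0 when all values are equal, 1/n in the worst case."""
    n = len(values)
    total = sum(values)
    squares = sum(v * v for v in values)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero value")
    return (total * total) / (n * squares)


# Hypothetical per-flow throughputs (any consistent unit).
print(jain_index([10.0, 10.0, 10.0, 10.0]))  # equal shares
print(jain_index([40.0, 0.0, 0.0, 0.0]))     # one flow takes everything
print(jain_index([20.0, 10.0, 5.0, 5.0]))    # in between
```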
40
Special issue of IEEE
Communications Magazine
• Topic:
– Hybrid Networking: Evolution Towards Combined IP Services
and Dynamic Circuit Network Capabilities
• Tentative schedule:
– Manuscripts due: Nov 1, 2010
– Acceptance notification: Jan 15, 2011
– Tentative Issue of the Feature Topic: May 2011
• Guest editors
– Admela Jukan, Technische Universität Carolo-Wilhelmina zu
Braunschweig
– Malathi Veeraraghavan, University of Virginia
– Masum Hasan, Cisco Systems
• http://dl.comsoc.org/ci1/info/cfp/cfpcommag0511.htm
41
Summary
• Project has several parallel tracks
– HNTES software development
– Testing software on ANI Tabletop testbed
– Netflow data analysis and quantify “value” of
redirection
– Simulation study to identify best candidate flows for
redirection
– PerfSONAR OWAMP data analysis to characterize
delay distribution across ESnet paths
– IEEE Communications Magazine special issue
42