Networking for ATLAS Remote Farms
Richard Hughes-Jones
The University of Manchester
DataGrid WP7 – Dante Tests on the GÉANT Core
End-2-End Measurements from the 4th Year VLBI Project at Manchester
New TCP stacks – the effect on throughput
Some Simple Network Tests CERN-Manchester
Meeting on ATLAS Remote Farms. Copenhagen 11 May 2004
R. Hughes-Jones Manchester
DataGrid WP7 – Dante Tests on the GÉANT Core
- Set-up
- Supermicro PC in:
  - London GEANT PoP
  - Amsterdam GEANT PoP
- Smartbits in:
  - London GEANT PoP
  - Frankfurt GEANT PoP
- Long link: UK-SE-DE2-IT-CH-FR-BE-NL
- Short link: UK-FR-BE-NL
Tests GÉANT Core: UDP throughput
- UDP Throughput
  - London-Amsterdam
  - Available BW to packet on wire, then 1/t
  - Wire rate 998 Mbit/s for packets > 1400 bytes
- Dips in BW linked to packet loss
- SysKonnect NIC int. per packet
- CPU load important
- Packet Loss
  - None for large packets
[Plot uk-nl_20tg4-hs-w100_01Oct03: Recv wire rate (Mbit/s) vs. spacing between frames (µs), packet sizes 50 to 1472 bytes]
[Plot: % packet loss vs. spacing between frames (µs), packet sizes 50 to 1472 bytes]
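The 1/t fall-off and the plateau near line rate noted on the UDP throughput slide follow from simple framing arithmetic. The sketch below is illustrative only: it assumes Gigabit Ethernet and the usual 66 bytes of per-frame overhead for UDP over IPv4, neither of which is stated on the slide, and the function names are hypothetical.

```python
# Assumed per-frame overhead for UDP/IPv4 over Ethernet (not from the slides):
# 8 UDP + 20 IP + 14 Ethernet header + 4 FCS + 8 preamble + 12 inter-frame gap.
OVERHEAD_BYTES = 8 + 20 + 14 + 4 + 8 + 12  # 66 bytes
LINE_RATE_MBIT = 1000.0  # Gigabit Ethernet (assumption)


def min_spacing_us(payload_bytes: int) -> float:
    """Time one frame occupies the wire at line rate, in microseconds."""
    return (payload_bytes + OVERHEAD_BYTES) * 8 / LINE_RATE_MBIT


def offered_wire_rate_mbit(payload_bytes: int, spacing_us: float) -> float:
    """Wire rate in Mbit/s when one frame is sent every spacing_us microseconds."""
    if spacing_us <= 0:
        return LINE_RATE_MBIT
    bits_on_wire = (payload_bytes + OVERHEAD_BYTES) * 8
    # Bits per microsecond is numerically Mbit/s; the link caps the rate.
    return min(LINE_RATE_MBIT, bits_on_wire / spacing_us)


for spacing in (5, 10, 15, 20, 30, 40):
    rate = offered_wire_rate_mbit(1472, spacing)
    print(f"spacing {spacing:2d} us -> {rate:7.1f} Mbit/s")
```

Under these assumptions a 1472-byte payload occupies the wire for about 12.3 µs, so any requested spacing below that is limited by the link, and above it the rate falls as 1/t.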
Tests GÉANT Core: Packet re-ordering
- Effect of Packet size
  - London-Amsterdam
  - Packets at 10 µs – line speed
  - 10,000 sent
  - Packet Loss ~ 0.1%
- Re-order Distribution
[Plot "Packet re-order uk-nl 10,000 BE sent wait 10 us 01 Oct 03": out of order % vs. packet size (bytes), with a detail panel for 1400 to 1404 bytes]
[Plot "Packet re-order uk-nl 10,000 sent wait 10 us": No. packets vs. length out-of-order, for 1400, 1401 and 1402 byte packets]
Tests GÉANT Core: Packet re-ordering
- Effect of LBE background
  - Amsterdam-London
  - BE Test flow
  - Packets at 10 µs – line speed
  - 10,000 sent
  - Packet Loss ~ 0.1%
- Re-order Distributions:
[Plot "UDP 1472 bytes NL-UK-lbexxx_7nov03": % out of order vs. total offered rate (Gbit/s); legend: hstcp, Standard TCP, line speed, 90% line speed]
[Plot "Packet re-order 1472 bytes uk-nl 21 Oct 03 10,000 sent wait 10 us": No. packets vs. length out-of-order, for 0 % to 80 % lbe]
[Plot "Packet re-order 1400 bytes uk-nl 21 Oct 03 10,000 sent wait 10 us": No. packets vs. length out-of-order, for 0 % to 80 % lbe]
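The re-ordering slides report a percentage of out-of-order packets and a histogram of "length out-of-order". The slides do not define either quantity, so the sketch below uses one plausible definition, labelled as an assumption: a packet is out of order if its sequence number is lower than the highest already received, and its length is the gap to that highest number. The function name and return keys are hypothetical.

```python
from collections import Counter


def reorder_stats(received_seq, num_sent):
    """Summarise loss and re-ordering from the sequence numbers in arrival order.

    Assumed definitions (not taken from the slides): a packet is out of order
    when it arrives after a packet with a higher sequence number; its length is
    how far behind the highest sequence number seen so far it is.
    """
    highest = -1
    seen = set()
    out_of_order = 0
    lengths = Counter()
    for seq in received_seq:
        if seq in seen:
            continue  # ignore duplicates
        seen.add(seq)
        if seq < highest:
            out_of_order += 1
            lengths[highest - seq] += 1
        else:
            highest = seq
    received = len(seen)
    return {
        "pct_lost": 100.0 * (num_sent - received) / num_sent,
        "pct_out_of_order": 100.0 * out_of_order / received if received else 0.0,
        "length_histogram": dict(lengths),
    }


# Toy arrival order: 10 packets sent, 8 received, two of them late.
demo = reorder_stats([0, 1, 3, 2, 4, 6, 7, 5], num_sent=10)
print(demo)
```

Other definitions of re-ordering extent exist, so numbers from this sketch are comparable to the plots only if the measurement tool used the same convention.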
Tests GÉANT Core: Packet Jitter
- Amsterdam-London
- Packet spacing 80 µs
- BE Test flow
- BE Test flow + Background: 60% BE 1.4Gbit + 40% LBE 780Mbit
- IPPremium Test flow
- IPPremium Test flow + Background
[Four histograms of frequency vs. packet jitter (µs), 0 to 140 µs:
 Flow: BE, Background: none;
 Flow: BE, Background: 60% BE 1.4Gbit + 40% LBE 780Mbit;
 Flow: IPP, Background: none;
 Flow: IPP, Background: 60% BE 1.4Gbit + 40% LBE 780Mbit]
Tests GÉANT Core: 1-way Delay
- Amsterdam-London
- Packet spacing 80 µs
- IPPremium Test flow
- BE Test flow + Background: 60% BE 1.4Gbit + 40% LBE 780Mbit
[Three plots of 1-way latency (µs) vs. Packet No. (0 to 10000):
 Flow: IPP, Background: none;
 Flow: IPP, Background: 60% BE 1.4Gbit + 40% LBE 780Mbit;
 Flow: BE, Background: 60% BE 1.4Gbit + 40% LBE 780Mbit]
VLBI Project: Test Topology
[Topology diagram; labelled sites and networks: Manchester, Jodrell, SuperJANET4, SURFnet, JIVE, Dwingeloo]
Adam Mathews
Steve O’Toole
Univ of Manchester
VLBI Project: Throughput
- Manchester to Dwingeloo
- 2.0 GHz Xeon, 1.2 GHz PIII
- Re-ordering vs Offered Load
[Plot "Gnt5-DwMk5 11Nov03/DwMk5-Gnt5 13Nov03-1472bytes": Recv wire rate (Mbit/s) vs. spacing between frames (µs), for Gnt5-DwMk5 and DwMk5-Gnt5]
[Plot "Gnt5-DwMk5 11Nov03-1472 bytes": % packet loss vs. spacing between frames (µs), for Gnt5-DwMk5 and DwMk5-Gnt5]
[Plot "Gnt5-DwMk5 11Nov03 1472 bytes": sender % Kernel vs. spacing between frames (µs)]
[Plot "Gnt5-DwMk5 11Nov03 1472 bytes": receiver % Kernel vs. spacing between frames (µs)]
VLBI Project: Jitter & 1-way Delay
- 1472 byte Packets man -> JIVE
- FWHM 22 µs (B2B 3 µs)
[Plot "1472 bytes w=50 jitter Gnt5-DwMk5 28Oct03": N(t) vs. jitter (µs), 0 to 140 µs, on linear and logarithmic count scales]
- 1-way Delay – note the packet loss (points with zero 1-way delay)
[Plot "1472 bytes w12 Gnt5-DwMk5 21Oct03": 1-way delay (µs) vs. Packet No., full run and a detail of packets 2000 to 3000]
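The jitter slide quotes a full width at half maximum (FWHM) for the jitter histogram. The sketch below shows one simple way such a figure could be obtained from packet arrival times. It assumes that jitter means the inter-arrival time of consecutive packets and uses a crude FWHM estimate that is only meaningful for a single-peaked histogram; both are assumptions, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def inter_arrival_us(arrival_times_us):
    """Differences between consecutive arrival times, in microseconds."""
    return [b - a for a, b in zip(arrival_times_us, arrival_times_us[1:])]


def histogram(values, bin_width_us=1.0):
    """Count values per bin; keys are integer bin indices."""
    return Counter(int(v // bin_width_us) for v in values)


def fwhm_us(hist, bin_width_us=1.0):
    """Crude FWHM: span of the bins holding at least half the peak count.

    Only sensible for a single-peaked histogram.
    """
    peak = max(hist.values())
    above = [b for b, count in hist.items() if count >= peak / 2]
    return (max(above) - min(above) + 1) * bin_width_us


# Toy data: packets nominally 50 us apart with a little spread and one outlier.
gaps = [50] * 6 + [49] * 3 + [51] * 3 + [60]
arrivals = list(accumulate([0] + gaps))
width = fwhm_us(histogram(inter_arrival_us(arrivals)))
print(f"FWHM = {width} us")
```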
VLBI Project: Packet Loss – Long Range Effects?
- Aggregated Variance Method
  - Divide time series of length N into blocks of size m
  - Calc mean of each block Xm(k), k = 1 … N/m
  - Calc variance VXm of these Xm(k)
  - Vary the block size m
  - Plot on log-log & fit slope β
  - Hurst parameter H: β = 2H - 2
- Measure: β = -0.355, which gives H = 0.822
- H = 0.5: no long range dependence; 0.5 < H < 1: long range dependence
[Plot: aggregate variance Log10( X(m) ) vs. sub-sample size Log10( m ); fitted line y = -0.355x + 2.8826]
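The aggregated variance steps listed on this slide can be sketched in a few lines. This is a minimal illustration using only the standard library, not the analysis code behind the plot; the function names and the choice of block sizes are assumptions. It checks that the quoted slope reproduces the quoted H, and that an uncorrelated series gives a slope near -1, that is H near 0.5.

```python
import math
import random
import statistics


def aggregated_variance(series, block_sizes):
    """Return (log10 m, log10 variance of block means) for each block size m."""
    points = []
    n = len(series)
    for m in block_sizes:
        k = n // m
        if k < 2:
            continue  # need at least two blocks to form a variance
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(k)]
        var = statistics.pvariance(means)
        if var > 0:
            points.append((math.log10(m), math.log10(var)))
    return points


def fit_slope(points):
    """Least-squares slope of y against x."""
    xs, ys = zip(*points)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def hurst_from_slope(beta):
    """Invert beta = 2H - 2."""
    return 1.0 + beta / 2.0


print(hurst_from_slope(-0.355))  # slope quoted on the slide

# Uncorrelated test series: expect a slope near -1, i.e. H near 0.5.
random.seed(1)
iid = [random.random() for _ in range(20000)]
beta_iid = fit_slope(aggregated_variance(iid, [1, 2, 5, 10, 20, 50, 100]))
h_iid = hurst_from_slope(beta_iid)
print(beta_iid, h_iid)
```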
Traffic Flows
- Manchester – NetNorthWest – SuperJANET Access links
- Two 1 Gbit/s
- Access links:
  - SJ4 to GÉANT
  - GÉANT to SurfNet
High Performance TCP – DataTAG
Different TCP stacks tested on the DataTAG Network
128 ms round trip time
Drop 1 in 10^6
High-Speed: rapid recovery
Scalable: very fast recovery
Standard: recovery would take ~ 10 mins
High Performance TCP – MB-NG
Drop 1 in 25,000
Rtt 6.2 ms
Recover in 1.6 s
[Plots for three TCP stacks: Standard, HighSpeed, Scalable]
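The recovery times quoted for standard TCP on the two networks (about 10 minutes at 128 ms round trip time, 1.6 s at 6.2 ms) are consistent with simple additive-increase arithmetic: after one loss the congestion window halves and then grows by one packet per round trip. The sketch below is a back-of-envelope check, assuming a 1 Gbit/s path and 1500-byte packets; neither value is stated on the slides, and the function name is hypothetical.

```python
def standard_tcp_recovery_s(rtt_s, link_bit_s=1e9, packet_bytes=1500):
    """Time for standard TCP to regain full rate after a single loss.

    The window needed to fill the path is rate * rtt / packet size (in packets).
    After a loss it is halved and grows by one packet per round trip, so
    recovery takes window / 2 round trips.
    """
    window_pkts = link_bit_s * rtt_s / (packet_bytes * 8)
    return (window_pkts / 2) * rtt_s


mbng_s = standard_tcp_recovery_s(0.0062)    # MB-NG, rtt 6.2 ms
datatag_s = standard_tcp_recovery_s(0.128)  # DataTAG, rtt 128 ms
print(f"MB-NG:   {mbng_s:.2f} s")
print(f"DataTAG: {datatag_s / 60:.1f} min")
```

Because the recovery time scales with the square of the round trip time, a 20-fold increase in rtt turns seconds into minutes, which is the motivation for the HighSpeed and Scalable stacks compared on these slides.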
Some Network Tests
- TCP Request – Response
[Message exchange diagram, request -> reply:]
  Zero stats -> OK done
  Request event -> Send event data
  Request-Response time (Histogram)
  ... (request and reply repeated)
  Get remote statistics -> Send statistics: CPU load & no. int
  1-way delay
Lab Test: TCP Request-Response Histograms
- PC – router – PC
- BE Test flow
- Request spacing 0 µs
- Request spacing 10 ms
[Six histograms man02-3_7may04 of N(t) vs. latency (µs), for request spacing 0 µs and 10 ms ("w 10ms"):
 0.5 M bytes, latency axis 4280 to 4460 µs;
 1.0 M bytes, latency axis 8580 to 8760 µs;
 2.0 M bytes, latency axis 17080 to 17260 µs]
Man-CERN: TCP Request-Response Latency
- DataTAG PC – backup link
- BE Tests
- Request spacing 0 µs
- Win size 2.5 Mbytes
- Rtt of 20 ms: delay*bw = 2.5 Mbytes
- 1 Mbyte data = 690 pkts
- Interesting bursts!
- Compare with UDP latency
- Large differences
[Plot w05gva-gnt5_7May04_TCP: TCP request-response latency (µs; ave, min and max time) vs. message length (bytes)]
[Plot: req-resp UDP latency (µs; ave time) vs. message length (bytes)]
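The window size and packet count quoted for the Manchester-CERN test can be checked with two lines of arithmetic. The sketch below assumes a 1 Gbit/s path and a 1448-byte TCP payload per packet (1500-byte MTU less IP, TCP and timestamp-option headers); both are assumptions, since the slide gives only the 20 ms rtt and the results.

```python
def bandwidth_delay_bytes(rate_bit_s, rtt_s):
    """Bytes in flight needed to keep a path of this rate and rtt full."""
    return rate_bit_s * rtt_s / 8


def packets_for_message(message_bytes, payload_bytes=1448):
    """Number of payload-sized packets in a message (fractional)."""
    return message_bytes / payload_bytes


bdp = bandwidth_delay_bytes(1e9, 0.020)  # 1 Gbit/s assumed, rtt 20 ms
pkts = packets_for_message(1_000_000)    # 1 Mbyte taken as 10**6 bytes
print(f"delay*bw = {bdp / 1e6:.1f} Mbytes")
print(f"1 Mbyte  = {pkts:.1f} packets")
```

Under these assumptions the results agree with the slide: 2.5 Mbytes of window and roughly 690 packets per Mbyte.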
Man-CERN: UDP Throughput & Packet Loss
- DataTAG PC – backup link
- BE Tests
- Throughput
[Plot w05gva-gnt5_7May04_UDP: Recv wire rate (Mbit/s) vs. spacing between frames (µs), packet sizes 50 to 1472 bytes]
- Packet loss
[Plot w05gva-gnt5_7May04_UDP: % packet loss vs. spacing between frames (µs), packet sizes 50 to 1472 bytes]
Traffic Flows
- Manchester – NetNorthWest – SuperJANET Access links
- Link to PC in M/c
- Access links:
  - 1 GE Man to NNW
  - Total Man to NNW
  - NNW to SuperJANET4
VLBI Project: Packet Loss – Is it Poisson?
- Divide time series of packets into 1000 slices of 50 packets
- Total lost packets 1410
- Average number / slice = 1.4
- Calc Poisson Probability
  $P(n, \mu) = \dfrac{\mu^{n} e^{-\mu}}{n!}$
- Curves close but not exact
- Could be more than 1 process
[Plot: N(n) vs. n, the number lost in a sub-sample; legend: run12b, 1, 1.3, 1.4, 1.8]
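The Poisson comparison on this slide can be reproduced from the two numbers it gives: 1410 lost packets spread over 1000 slices. The sketch below computes the expected number of slices with n losses under a single Poisson process; the function names are illustrative and the measured histogram itself is not reproduced here.

```python
import math


def poisson_pmf(n, mu):
    """Probability of exactly n events when the mean per slice is mu."""
    return mu ** n * math.exp(-mu) / math.factorial(n)


def expected_slice_counts(total_lost, num_slices, max_n):
    """Expected number of slices containing n = 0..max_n lost packets."""
    mu = total_lost / num_slices
    return [num_slices * poisson_pmf(n, mu) for n in range(max_n + 1)]


counts = expected_slice_counts(total_lost=1410, num_slices=1000, max_n=10)
for n, expected in enumerate(counts):
    print(f"n = {n:2d}: {expected:7.1f} slices expected")
```

Plotting these expected counts against the measured N(n) is the comparison the slide describes; a systematic excess at large n would point to bursty loss or more than one loss process.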
Traffic QoS Classes on GÉANT Backbone
- Normal Traffic + Radio Astronomy Data + Less Than Best Effort
Max Throughput on 2.5 G PoS
- 2.0 Gbit/s
- Normal Traffic + Less Than Best Effort
- 2.0 Gbit/s
- Normal Traffic + Radio Astronomy Data
- 500 Mbit/s
- Normal Traffic
Some Measurements made during ER2002
[Plot "No LBE": num_badorder (No. out of order) and num_lost (No. lost) vs. transfer number]
[Plot "With 1.8Gbit LBE": num_badorder (No. out of order) and num_lost (No. lost) vs. transfer number]