FAST v3 - California Institute of Technology

FAST Protocols for Ultrascale Networks
Internet: distributed feedback control system
TCP: adapts sending rate to congestion
AQM: feeds back congestion information


Feedback loop: TCP -> x -> R_f(s) -> y -> AQM -> p -> R_b'(s) -> q -> TCP

Control laws shown on the poster
- TCP (stabilized Vegas): source law in $\dot{w}_i$ built from $\tan^{-1}$, $T_i(t)$ and $x_i(t) q_i(t) / (\alpha_i d_i)$ [exact form lost in extraction]
- AQM: $\dot{p}_l = \frac{1}{c_l}\,(y_l(t) - c_l)$

Poster sections: Theory, Experiment, Implementation, People

Experiment
- WAN in Lab (Caltech); research & production networks
- Sites and networks: Caltech, StarLight (Chicago), CERN (Geneva), SURFNet (Amsterdam), Calren2/Abilene
- Multi-Gbps, 50-200ms delay
- Link labels: 155Mb/s, 10Gb/s

Implementation (state diagram)
- slow start, equilibrium, FAST retransmit, FAST recovery, time out

People
- Faculty: Doyle (CDS,EE,BE), Low (CS,EE), Newman (Physics), Paganini (UCLA)
- Staff/Postdoc: Bunn (CACR), Jin (CS), Ravot (Physics), Singh (CACR)
- Students: Choe (Postech/CIT), Hu (Williams), J. Wang (CDS), Z. Wang (UCLA), Wei (CS)
- Industry: Doraiswami (Cisco), Yip (Cisco)
- Partners: CERN, Internet2, CENIC, StarLight/UI, SLAC, AMPATH, Cisco
netlab.caltech.edu/FAST
FAST project
- Protocols for ultrascale networks
  - >100 Gbps throughput, 50-200ms delay
  - Theory, algorithms, design, implementation, demo, deployment
- Faculty
  - Doyle (CDS, EE, BE): complex systems theory
  - Low (CS, EE): PI, networking
  - Newman (Physics): application, deployment
  - Paganini (EE, UCLA): control theory
- Research staff
  - 3 postdocs, 3 engineers, 8 students
- Collaboration
  - Cisco, Internet2/Abilene, CERN, DataTAG (EU), …
- Funding
  - NSF, DoE, Lee Center (AFOSR, ARO, Cisco)
netlab.caltech.edu
Outline
- Motivation
- Theory
  - Web layout
  - Content distribution
  - TCP/AQM (Jin, poster)
  - TCP/IP (poster)
  - Enforcing & inducing fairness
  - Optical switching (future)
(poster)
High Energy Physics
- Large global collaborations
  - 2000 physicists from 150 institutions in >30 countries
  - 300-400 physicists in US from >30 universities & labs
- SLAC has 500TB data by 4/2002, world’s largest database
- Typical file transfer ~1 TB
  - At 622Mbps: ~4 hrs
  - At 2.5Gbps: ~1 hr
  - At 10Gbps: ~15min
  - Gigantic elephants!
- LHC (Large Hadron Collider) at CERN, to open 2007
  - Generate data at PB (10^15 B)/sec
  - Filtered in realtime by a factor of 10^6 to 10^7
  - Data stored at CERN at 100MB/sec
  - Many PB of data per year
  - To rise to Exabytes (10^18 B) in a decade
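The transfer times quoted for a ~1 TB file can be checked with a few lines of arithmetic. The sketch below assumes a decimal terabyte (10^12 bytes) and a fully utilised link with no protocol overhead; the function name and the rate table are illustrative, not part of the original slides.

```python
# Ideal transfer time for a file over a link, assuming the link is
# fully utilised (no protocol overhead, no loss, no slow start).

def transfer_time_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Time in seconds to move size_bytes over a link of rate_bits_per_s."""
    return size_bytes * 8 / rate_bits_per_s

ONE_TB = 1e12  # assumption: decimal terabyte (10^12 bytes)

# Link rates taken from the slide.
RATES = {"622 Mbps": 622e6, "2.5 Gbps": 2.5e9, "10 Gbps": 10e9}

for label, rate in RATES.items():
    seconds = transfer_time_seconds(ONE_TB, rate)
    print(f"{label}: {seconds / 3600:.2f} h ({seconds / 60:.0f} min)")
```

Under these assumptions the ideal times come out at roughly 3.6 hours, 0.9 hours and 13 minutes, which matches the rounded figures on the slide.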
HEP high speed network
… that must change
HEP Network (DataTAG)
[Map labels: New York, ABILENE, ESNET, CALREN, STARLIGHT, STAR-TAP, GENEVA, GEANT, UK SuperJANET4, It GARR-B, NL SURFnet, Fr Renater]
- 2.5 Gbps Wavelength Triangle 2002
- 10 Gbps Triangle in 2003
Newman (Caltech)
Network upgrade 2001-06
Year       ’01        ’02        ’03        ’04      ’05
Capacity   155 Mbps   622 Mbps   2.5 Gbps   5 Gbps   10 Gbps
Projected performance
[Figure: projected performance by year and capacity: ’01 155 Mbps, ’02 622 Mbps, ’03 2.5 Gbps, ’04 5 Gbps, ’05 10 Gbps]
Ns-2: capacity = 155Mbps, 622Mbps, 2.5Gbps, 5Gbps, 10Gbps
100 sources, 100 ms round trip propagation delay
J. Wang (Caltech)
Projected performance
[Figure: FAST compared with TCP/RED]
Ns-2: capacity = 10Gbps
100 sources, 100 ms round trip propagation delay
J. Wang (Caltech)
Protocol Decomposition
- Applications (WWW, Email, Napster, FTP, …)
  - HOT (Doyle et al): minimize user response time; heavy-tailed file sizes
- TCP/AQM
  - Duality model: maximize aggregate utility
- IP
  - Shortest-path routing: minimize path costs
- Transmission (Ethernet, ATM, POS, WDM, …)
  - Power control: maximize channel capacity
Congestion control
[Diagram labels: source rates x_i(t), congestion measures p_l(t)]
Example congestion measure p_l(t)
- Loss (Reno)
- Queueing delay (Vegas)
TCP/AQM
[Diagram labels: source rates x_i(t), congestion measures p_l(t)]
TCP: Reno, Vegas
AQM: DropTail, RED, REM/PI, AVQ
- Congestion control is a distributed asynchronous algorithm to share bandwidth
- It has two components
  - TCP: adapts sending rate (window) to congestion
  - AQM: adjusts & feeds back congestion information
- They form a distributed feedback control system
  - Equilibrium & stability depend on both TCP and AQM
  - And on delay, capacity, routing, #connections
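Network model
[Block diagram: TCP sources F_1 … F_N -> x -> R_f(s) -> y -> AQM links G_1 … G_L -> p -> R_b'(s) -> q -> TCP sources]
$[R_f(s)]_{li} = e^{-\tau^f_{li} s}$ if source i uses link l (and 0 otherwise)
$[R_b(s)]_{li} = e^{-\tau^b_{li} s}$ if source i uses link l (and 0 otherwise)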
Vegas model
for every RTT
{
    if W/RTTmin - W/RTT < α then W ++
    if W/RTTmin - W/RTT > α then W --
}
[Annotation: queue size]
F_i:
  $\dot{x}_i = \frac{1}{T_i^2(t)}$   if $x_i(t) q_i(t) < \alpha_i d_i$
  $\dot{x}_i = -\frac{1}{T_i^2(t)}$  if $x_i(t) q_i(t) > \alpha_i d_i$
  $\dot{x}_i = 0$   else
G_l:
  $\dot{p}_l = \frac{1}{c_l}\,(y_l(t) - c_l)$
q_i: E2E queueing delay
p_l: Link queueing delay
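The per-RTT rule above can be tried out on a toy model. The sketch below is a minimal illustration, assuming a single source on a single link with a fluid queue; `simulate_single_link`, the floor of one packet on the window, and all parameter values are hypothetical choices made for this example, not part of the slides.

```python
def vegas_update(window: float, rtt: float, rtt_min: float, alpha: float) -> float:
    """One per-RTT Vegas step: compare (expected - actual) rate with alpha."""
    diff = window / rtt_min - window / rtt  # expected rate minus actual rate
    if diff < alpha:
        return window + 1
    if diff > alpha:
        return max(window - 1, 1.0)  # assumption: never shrink below one packet
    return window


def simulate_single_link(capacity: float, prop_delay: float, alpha: float,
                         steps: int = 100):
    """Hypothetical toy model: one source, one link, fluid queue, no loss."""
    window = 1.0
    for _ in range(steps):
        # Packets that do not fit in the pipe (capacity * delay) sit in the queue.
        queue = max(window - capacity * prop_delay, 0.0)
        rtt = prop_delay + queue / capacity
        window = vegas_update(window, rtt, prop_delay, alpha)
    queue = max(window - capacity * prop_delay, 0.0)
    return window, queue


# capacity in pkts/ms, propagation delay in ms, alpha in pkts/ms
window, queue = simulate_single_link(capacity=5.0, prop_delay=10.0, alpha=0.5)
print(window, queue)
```

In this toy setting the window settles where the backlog equals alpha times the propagation delay (5 packets here), which is the condition x_i q_i = α_i d_i that separates the increase and decrease branches of F_i.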
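Vegas model
[Block diagram: TCP sources F_1 … F_N -> x -> R_f(s) -> y -> AQM links G_1 … G_L -> p -> R_b'(s) -> q -> TCP sources]
$F_i = \frac{1}{T_i^2(t)}\,\mathrm{sgn}\!\left(1 - \frac{x_i(t) q_i(t)}{\alpha_i d_i}\right)$
$G_l = \frac{y_l(t)}{c_l} - 1$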
Methodology
Protocol (Reno, Vegas, RED, REM/PI…)
  $x(t+1) = F(p(t), x(t))$
  $p(t+1) = G(p(t), x(t))$
Equilibrium
- Performance
  - Throughput, loss, delay
- Fairness
- Utility
Dynamics
- Local stability
- Cost of stabilization
Summary: duality model
- Flow control problem
    $\max_{x_s \ge 0} \sum_s U_s(x_s)$ subject to $Rx \le c$
- Primal-dual algorithm
    $x(t+1) = F(p(t), x(t))$   Reno, Vegas
    $p(t+1) = G(p(t), x(t))$   DropTail, RED, REM
- TCP/AQM
  - Maximize utility with different utility functions
- Theorem (Low 00): $(x^*, p^*)$ primal-dual optimal iff $y_l^* \le c_l$ with equality if $p_l^* > 0$
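The primal-dual iteration can be made concrete with a small numerical sketch. The code below is not Reno, Vegas, RED or REM; it assumes log utilities U_i(x_i) = w_i log x_i, a gradient-projection price update, and a hypothetical two-link, three-source network, all chosen for illustration. The function name, step size and rate cap are likewise assumptions.

```python
def dual_gradient(R, c, w, step=0.1, iters=2000, x_max=10.0):
    """Iterate x = F(p), p = G(p, x) for log utilities (illustrative choice)."""
    num_links, num_sources = len(R), len(R[0])

    def source_rates(p):
        # q_i = sum_l R_li p_l; maximizing w_i log x_i - x_i q_i gives x_i = w_i / q_i.
        q = [sum(R[l][i] * p[l] for l in range(num_links)) for i in range(num_sources)]
        return [min(w[i] / q[i], x_max) if q[i] > 0 else x_max
                for i in range(num_sources)]

    def link_loads(x):
        # y_l = sum_i R_li x_i
        return [sum(R[l][i] * x[i] for i in range(num_sources)) for l in range(num_links)]

    p = [1.0] * num_links
    for _ in range(iters):
        y = link_loads(source_rates(p))
        # Raise the price of overloaded links, lower it otherwise, keep p >= 0.
        p = [max(p[l] + step * (y[l] - c[l]), 0.0) for l in range(num_links)]
    x = source_rates(p)
    return x, p, link_loads(x)


# Hypothetical network: source 0 uses both links, sources 1 and 2 use one link each.
R = [[1, 1, 0],
     [1, 0, 1]]
c = [1.0, 1.0]        # link capacities
w = [1.0, 1.0, 1.0]   # utility weights

x, p, y = dual_gradient(R, c, w)
print(x, p, y)
```

For this example the iteration approaches x = (1/3, 2/3, 2/3) with both prices near 1.5. Both links carry load equal to capacity and both prices are positive, in line with the optimality condition of the theorem.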
Equilibrium of Vegas
Network
- Link queueing delays: $p_l$
- Queue length: $c_l p_l$
Sources
- Throughput: $x_i$
- E2E queueing delay: $q_i$
- Packets buffered: $x_i q_i = \alpha_i d_i$
- Utility function: $U_i(x) = \alpha_i d_i \log x$
- Proportional fairness
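A worked single-link example shows how these quantities fit together. The sketch assumes all sources share one bottleneck of capacity c, so they see the same queueing delay q; then x_i q = α_i d_i and the rates summing to c give q = (Σ α_i d_i) / c. The function name and the numbers are illustrative only.

```python
def vegas_equilibrium(capacity, alphas, delays):
    """Single-bottleneck Vegas equilibrium (illustrative assumption: one shared link)."""
    backlog = [a * d for a, d in zip(alphas, delays)]  # alpha_i * d_i packets per source
    q = sum(backlog) / capacity                        # common queueing delay
    rates = [b / q for b in backlog]                   # x_i = alpha_i d_i / q
    queue_length = capacity * q                        # c_l p_l
    return q, rates, queue_length


# capacity in pkts/ms, alpha in pkts/ms, propagation delays in ms
q, rates, queue_length = vegas_equilibrium(6.0, [1.0, 1.0], [10.0, 20.0])
print(q, rates, queue_length)
```

With these inputs the queueing delay is 5 ms, the rates are 2 and 4 pkts/ms, and the queue holds 30 packets, which is the sum of the per-source backlogs α_i d_i.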
Persistent congestion
- Vegas exploits buffer process to compute prices (queueing delays)
- Persistent congestion due to
  - Coupling of buffer & price
  - Error in propagation delay estimation
- Consequences
  - Excessive backlog
  - Unfairness to older sources
Theorem (Low, Peterson, Wang ’02)
A relative error of $\epsilon_i$ in propagation delay estimation distorts the utility function to
  $\hat{U}_i(x_i) = (1 + \epsilon_i)\,\alpha_i d_i \log x_i + \epsilon_i d_i x_i$
Validation
(L. Wang, Princeton)

Source rates (pkts/ms)
#   src1          src2          src3          src4          src5
1   5.98 (6)
2   2.05 (2)      3.92 (4)
3   0.96 (0.94)   1.46 (1.49)   3.54 (3.57)
4   0.51 (0.50)   0.72 (0.73)   1.34 (1.35)   3.38 (3.39)
5   0.29 (0.29)   0.40 (0.40)   0.68 (0.67)   1.30 (1.30)   3.28 (3.34)

#   queue (pkts)   baseRTT (ms)
1   19.8 (20)      10.18 (10.18)
2   59.0 (60)      13.36 (13.51)
3   127.3 (127)    20.17 (20.28)
4   237.5 (238)    31.50 (31.50)
5   416.3 (416)    49.86 (49.80)
TCP/RED stability
Small effect on queue
- AIMD
- Mice traffic
- Heterogeneity
Big effect on queue
- Stability!
Stable: 20ms delay
[Figure, two panels: "Window" plots individual window and average window (pkts) against time (ms); "Queue" plots instantaneous queue (pkts) against time (ms)]
Ns-2 simulations, 50 identical FTP sources, single link 9 pkts/ms, RED marking
Unstable: 200ms delay
[Figure, two panels: "Window" plots individual window and average window (pkts) against time (10ms); "Queue" plots instantaneous queue (pkts) against time (10ms)]
Ns-2 simulations, 50 identical FTP sources, single link 9 pkts/ms, RED marking
Other effects on queue
[Figure: panels of instantaneous queue (pkts) against time for 20ms and 200ms delay; panel titles "Instantaneous queue", "30% noise" and "Instantaneous queue (50% noise)"; annotations "avg delay 16ms" and "avg delay 208ms"]
Stability: Reno/RED
[Block diagram: TCP sources F_1 … F_N -> x -> R_f(s) -> y -> AQM links G_1 … G_L -> p -> R_b'(s) -> q -> TCP sources]
TCP:
- Small $\tau$
- Small c
- Large N
RED:
- Small $\rho$
- Large delay
Theorem (Low et al, Infocom’02)
Reno/RED is stable if
  $\frac{\rho\, c^3 \tau^3}{2 N^3}\,(c\tau + N) \le \frac{\pi (1-\beta)^2}{\sqrt{4\beta^2 + \pi^2 (1-\beta)^2}}$
Stability: scalable control
[Block diagram: TCP sources F_1 … F_N -> x -> R_f(s) -> y -> AQM links G_1 … G_L -> p -> R_b'(s) -> q -> TCP sources]
TCP: $x_i(t) = \bar{x}_i\, e^{-\frac{\alpha_i}{\tau_i m_i}\, q_i(t)}$
AQM: $\dot{p}_l(t) = \frac{1}{c_l}\,\big(y_l(t) - c_l\big)$
Theorem (Paganini, Doyle, Low, CDC’01)
Provided R is full rank, feedback loop is locally stable for arbitrary delay, capacity, load and topology
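Stability: Vegas
[Block diagram: TCP sources F_1 … F_N -> x -> R_f(s) -> y -> AQM links G_1 … G_L -> p -> R_b'(s) -> q -> TCP sources]
TCP: $\dot{x}_i = \frac{1}{T_i^2(t)}\,\mathrm{sgn}\!\left(1 - \frac{x_i(t) q_i(t)}{\alpha_i d_i}\right)$
AQM: $\dot{p}_l(t) = \frac{1}{c_l}\,\big(y_l(t) - c_l\big)$
Theorem (Choe & Low, Infocom’03)
Provided R is full rank, feedback loop is locally stable if $\max_i x_i T_i$ satisfies a bound with parameters $(\,\cdot\,; M, k_0^2)$ [relation and function symbols lost in extraction]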
Stability: Stabilized Vegas
[Block diagram: TCP sources F_1 … F_N -> x -> R_f(s) -> y -> AQM links G_1 … G_L -> p -> R_b'(s) -> q -> TCP sources]
TCP: $\dot{x}_i = \frac{1}{T_i^2(t)}\,\tan^{-1}(\cdots)$, with an argument built from $1 - \frac{x_i(t) q_i(t)}{\alpha_i d_i}$ and a further term in $q_i(t)$ [coefficients lost in extraction]
AQM: $\dot{p}_l(t) = \frac{1}{c_l}\,\big(y_l(t) - c_l\big)$
Theorem (Choe & Low, Infocom’03)
Provided R is full rank, feedback loop is locally stable if $\max_i x_i T_i$ satisfies a bound with parameters $(a, \cdot)$ [relation and function symbols lost in extraction]
Application
- Stabilized TCP with current routers
- Queueing delay as congestion measure has right scaling
- Incremental deployment with ECN
Network model
[Block diagram: TCP sources F_1 … F_N -> x -> R -> y -> AQM links G_1 … G_L -> p -> R^T -> q -> TCP sources]
$R_{li} = 1$ if source i uses link l (IP routing)
$x(t+1) = F(R^T p(t), x(t))$   Reno, Vegas
$p(t+1) = G(p(t), R x(t))$   DT, RED, …
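The role of the routing matrix can be seen in a few lines: links observe aggregate rates y = Rx, and sources observe end-to-end prices q = R^T p. The sketch below uses a hypothetical two-link, three-source topology with made-up rates and prices; the helper names are illustrative.

```python
def link_rates(R, x):
    """y_l = sum_i R_li x_i: aggregate source rate seen by each link."""
    return [sum(r_li * x_i for r_li, x_i in zip(row, x)) for row in R]


def path_prices(R, p):
    """q_i = sum_l R_li p_l: sum of link prices along each source's path."""
    num_sources = len(R[0])
    return [sum(R[l][i] * p[l] for l in range(len(R))) for i in range(num_sources)]


# Rows are links, columns are sources; R[l][i] = 1 if source i uses link l.
R = [[1, 1, 0],
     [1, 0, 1]]
x = [1.0, 2.0, 3.0]   # source rates (hypothetical values)
p = [0.5, 0.25]       # link prices (hypothetical values)

y = link_rates(R, x)
q = path_prices(R, p)
print(y, q)
```

Source 0 crosses both links, so its end-to-end price is the sum of both link prices, while each link's load is the sum of the rates of the sources routed through it.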
Duality model of TCP/AQM
- Primal-dual algorithm
    $x(t+1) = F(R^T p(t), x(t))$   Reno, Vegas
    $p(t+1) = G(p(t), R x(t))$   DT, RED, REM/PI, AVQ
- Flow control problem
    $\max_{x \ge 0} \sum_i U_i(x_i)$ subject to $Rx \le c$
- TCP/AQM
  - Maximize utility with different utility functions
- Theorem (Low 00): $(x^*, p^*)$ primal-dual optimal iff $y_l^* \le c_l$ with equality if $p_l^* > 0$
Motivation
Primal: $\max_{R} \max_{x \ge 0} \sum_i U_i(x_i)$ subject to $Rx \le c$
Dual: $\min_{p \ge 0} \Big( \sum_i \max_{R_i} \max_{x_i \ge 0} \big( U_i(x_i) - x_i \sum_l R_{li}\, p_l \big) + \sum_l p_l c_l \Big)$
Shortest path routing! (For given prices p, the maximization over R_i selects the route with the smallest total price.)
Can TCP/IP maximize utility?
TCP-AQM/IP
Theorem (Wang et al, Infocom’03)
Primal problem is NP-hard
- Proof
  Reduce integer partition to primal problem
  - Given: integers {c_1, …, c_n}
  - Find: set A s.t. $\sum_{i \in A} c_i = \sum_{i \notin A} c_i$
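For readers unfamiliar with the integer partition problem used in the reduction, the sketch below states it as code. It is a brute-force search over subsets, exponential in n, and is meant only to make the problem statement concrete; it is not the reduction from the paper, and the function name is illustrative.

```python
from itertools import combinations


def find_partition(values):
    """Return a set A of indices with sum over A equal to sum over the rest, or None."""
    total = sum(values)
    if total % 2:
        return None  # an odd total can never be split into two equal halves
    indices = range(len(values))
    for size in range(len(values) + 1):
        for subset in combinations(indices, size):
            # Sum over A equals sum over the complement iff twice the sum equals the total.
            if 2 * sum(values[i] for i in subset) == total:
                return set(subset)
    return None


example = [3, 1, 1, 2, 2, 1]
A = find_partition(example)
print(A, find_partition([1, 2, 4]))
```

The search tries every subset in the worst case, which is the behaviour one expects for an NP-hard problem when no structure is exploited.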
TCP-AQM/IP
Theorem (Wang et al, Infocom’03)
Primal problem is NP-hard
- Achievable utility of TCP/IP?
- Stability?
- Duality gap?
Conclusion: Inevitable tradeoff between
- achievable utility
- routing stability
General network
[Figure: achievable utility; random graph, 20 nodes, 200 links]
Conclusion: Inevitable tradeoff between
- achievable utility
- routing stability
Coming together …
- Clear & present need
- Resources
- FAST Protocols