Improved BGP convergence via Ghost Flushing

Yehuda Afek
Anat Bremler-Barr
Interdisciplinary Center Herzliya
Shemer Schwarz
Problem: BGP Convergence


[Labovitz,Ahuja,Bose,Jahanian]
BGP may take up to 15 minutes to converge.
Here: Reduce the worst case from minutes to
seconds, in a practical way
Problem: BGP Convergence
[Labovitz,Ahuja,Bose,Jahanian]

Event        Time (sec, minRouteAdver=30)   Notes
E-Down       30·n                           n  10,000, up to 15 minutes
E-Up         30·d                           d  30, d = diameter
E-Longer     2·30·l                         l = path length
E-Shorter    30·d

Here: E-Down = l time units (unit = link delay), E-Longer = 30·d
Agenda





BGP overview
The BGP convergence problem
Ghost buster rule
Ghost flushing rule
Simulation results
BGP protocol

Distance (path) vector protocol

Receives AS-paths from its neighbors

Chooses the best one (shortest)

Eliminates routing loops using the AS-path

Two kinds of messages: announcements and withdrawals
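
A minimal sketch of the selection step described on this slide, assuming a path is a tuple of AS numbers ending at the origin AS and that ties go to the lowest neighbor id. The names `select_best` and `heard` are hypothetical, and real BGP applies many more tie-breaking rules.

```python
from typing import Dict, Optional, Tuple

Path = Tuple[int, ...]  # AS path as announced by a neighbor, ending at the origin AS

def select_best(my_as: int, heard: Dict[int, Optional[Path]]) -> Optional[Path]:
    """Pick the shortest loop-free AS path among those heard from neighbors."""
    # Loop elimination: drop withdrawn entries (None) and paths that already contain us.
    usable = [p for _, p in sorted(heard.items()) if p is not None and my_as not in p]
    # Shortest AS path wins; ties go to the lowest neighbor id (an assumption of this sketch).
    return min(usable, key=len) if usable else None

# AS 3 hears three paths to AS 0; the one through AS 2 already contains AS 3.
heard_by_3 = {1: (1, 2, 0), 2: (2, 3, 0), 4: (4, 1, 0)}
print(select_best(3, heard_by_3))  # (1, 2, 0)
```

The looping path is rejected before the length comparison, which is how the AS-path doubles as a loop detector.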
Problem: Ghost information
One ghost (old information) spawns many,
and the process continues recursively through the network
[Figure, t=0: clique of nodes 1 to 4 around dst (node 0). Every node routes directly (dst: 0); node 0 sends a withdrawal.]
[Figure, t=1: node 0 has dst: {}. Nodes 2, 3 and 4 switch to the ghost path 1 0 and node 1 switches to 2 0; each announces its new path (annc: 1 0, annc: 2 0).]
[Figure, t=2: node 1 has no loop-free path left (dst: {}) and sends a withdrawal; node 2 holds 3 1 0; nodes 3 and 4 hold 1 2 0.]
minRouteAdver: wait 30 sec before sending the next announcement (BGP)
[Figure, animation frames t=3 to t=31: node 1 has dst: {}; node 2 holds 3 1 0 and nodes 3 and 4 hold 2 1 0; they announce these paths (annc: 3 1 0, annc: 2 1 0) only once the timer allows.]
E_Down convergence
            BGP
Time        30n
Messages    nE
In the clique (size 4) example the scenario ends after 62 sec (≈ 30(n-2) with n = 4).
Without MinRouteAdver

Avalanche of Messages O(n!)

Explore all possible paths of length 1, 2 …
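[Figure, t=0: clique of nodes 1 to 4 around dst (node 0). Each node uses the direct path (dst: 0) and also stores the path heard from every neighbor (for example 1: 1 0, 3: 3 0, 4: 4 0); node 0 has dst: {} and sends a withdrawal.]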
[Figure, animation frames t=0.1 to t=0.4: with no minRouteAdver delay, each node keeps switching among the ghost paths stored in its table and announces every change at once. The chosen paths move through 1 0, 2 0, 3 0 and 4 0 and then on to longer ones such as 1 2 0, while the tables fill with entries like 2: 2 1 0 and 3: 3 1 0. From t=0.3 the slide bullet reads "Explore all possible paths of length 2, 3 …".]
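
To give a feel for the O(n!) avalanche, the sketch below enumerates every loop-free AS path that one node of an n-node clique could explore toward dst (AS 0). It assumes nodes are numbered 1 to n as in the figures; `ghost_paths` and `count_ghost_paths` are hypothetical names, and the count is per node, not the total message count on the slide.

```python
from itertools import permutations
from math import factorial

def ghost_paths(node: int, n: int):
    """Yield every loop-free AS path from `node` to dst (AS 0) in an n-node clique."""
    others = [v for v in range(1, n + 1) if v != node]
    for k in range(len(others) + 1):
        # k intermediate ASes, each used at most once, in every possible order
        for mid in permutations(others, k):
            yield (node, *mid, 0)

def count_ghost_paths(n: int) -> int:
    # Closed form of the enumeration: sum over k of (n-1)! / (n-1-k)!
    return sum(factorial(n - 1) // factorial(n - 1 - k) for k in range(n))

paths = list(ghost_paths(1, 4))
print(len(paths), count_ghost_paths(4))  # 16 16
```

For n = 5 the count is already 65, and the last term of the sum is (n-1)!, which is where the factorial growth comes from.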
E_Down convergence
            BGP with MinRouteAdver   BGP without MinRouteAdver
Time        30n                      hn
Messages    nE                       n!E
h = one link delay
Related Work

Introducing the problem [Labovitz,Ahuja,Bose,Jahanian], [Labovitz,Wattenhofer,Venkatachary,Ahuja]
    real-life evidence
    theoretical analysis
Experimental analysis [Griffin,Premore]
Solutions
    Work on counting to infinity:
        Adding states [Garcia-Luna-Aceves], EIGRP-like
        Route poisoning with hold-down [Cisco:Rutgers], IGRP-like
    Route consistency [Pei,Zhao,Wang,Massey,Mankin,Wu,Zhang]
Ghost flushing rule


If the ASpath to dst becomes longer
and
the announcement cannot be sent
(due to the minRouteAdver rule)
then
send a withdrawal
Motivation: flush the ghost information ASAP
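
The rule can be written as a small decision function. This is only a sketch: `next_message` is a hypothetical name, `old` stands for the path the router last advertised, and the model assumes withdrawals are never held back by the timer, as in the slides' examples.

```python
from typing import Optional, Tuple

Path = Tuple[int, ...]

def next_message(old: Optional[Path], new: Optional[Path], timer_running: bool) -> str:
    """Decide what a router sends when its best AS path changes from old to new."""
    if new == old:
        return "nothing"
    if new is None:
        return "withdraw"   # route lost; in this model withdrawals are not delayed
    if not timer_running:
        return "announce"   # minRouteAdver allows an announcement right now
    if old is not None and len(new) > len(old):
        return "withdraw"   # ghost flushing: flush the stale, shorter path
    return "wait"           # not longer: announce when the timer expires

print(next_message((1, 0), (3, 1, 0), timer_running=True))   # withdraw
print(next_message((1, 0), (3, 1, 0), timer_running=False))  # announce
```

Only the fourth branch is new; without it, a router with a running timer would leave its peers holding the shorter ghost path until the timer expired.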
Ghost Flushing example
[Figure, t=0: clique of nodes 1 to 4 around dst (node 0). Every node routes directly (dst: 0); node 0 sends a withdrawal.]
[Figure, t=1: node 0 has dst: {}. Nodes 2, 3 and 4 switch to the ghost path 1 0 and node 1 switches to 2 0; each announces its new path.]
Longer ASpath & minRouteAdver timer → send “flushing” withdrawal
[Figure, t=2: node 1 has dst: {} and withdraws; node 2 (now 3 1 0) and nodes 3 and 4 (now 1 2 0) hold longer paths that they cannot announce yet, so each sends a flushing withdrawal.]
[Figure, t=3: the withdrawals have arrived; every node shows dst: {}.]
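
The clique example can be replayed with a small synchronous simulation. The sketch below rests on several assumptions that are not in the slides: every link delay is exactly one time unit, there is one minRouteAdver timer per node (not per peer), ties are broken by the lowest neighbor id, and all names (`e_down_time`, `heard`, `told`) are hypothetical. It is an illustration of the rule, not the authors' simulator.

```python
from collections import defaultdict

MRAI = 30  # minRouteAdver in time units; one unit = one link delay (assumption)

def e_down_time(n, flushing, limit=10_000):
    """Time at which all n clique nodes have lost their route to dst (AS 0)."""
    nodes = range(1, n + 1)
    # heard[i][j]: path last announced to i by neighbor j (None = withdrawn)
    heard = {i: {j: (j, 0) for j in nodes if j != i} for i in nodes}
    for i in nodes:
        heard[i][0] = (0,)                      # direct route to dst
    best = {i: (0,) for i in nodes}             # path i currently uses
    told = dict(best)                           # path i last told its peers
    next_ok = {i: 0 for i in nodes}             # earliest next announcement
    inbox = defaultdict(list)                   # arrival time -> messages
    inbox[1] = [(0, i, None) for i in nodes]    # dst withdraws at t=0
    t = 0
    while any(best.values()) and t < limit:
        t += 1
        for sender, receiver, path in inbox.pop(t, []):
            # loop elimination: a path containing the receiver is unusable
            looped = path is not None and receiver in path
            heard[receiver][sender] = None if looped else path
        for i in nodes:
            usable = [p for _, p in sorted(heard[i].items()) if p]
            best[i] = min(usable, key=len) if usable else None
            if best[i] == told[i]:
                continue
            if best[i] is None:
                msg = None                      # plain withdrawal
            elif t >= next_ok[i]:
                msg = (i,) + best[i]            # announcement, restarts the timer
                next_ok[i] = t + MRAI
            elif flushing and told[i] and len(best[i]) > len(told[i]):
                msg = None                      # flushing withdrawal
            else:
                continue                        # wait for the timer
            told[i] = best[i] if msg else None
            for j in nodes:
                if j != i:
                    inbox[t + 1].append((i, j, msg))
    return t

print(e_down_time(4, flushing=False))  # 62
print(e_down_time(4, flushing=True))   # 3
```

Under these assumptions the plain-BGP run on the size-4 clique ends at t=62 and the ghost flushing run at t=3, matching the frames above.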
Analysis: convergence time of the
ghost flushing rule, E_down

In each time unit (= h, the maximum link delay), ghost information is erased to a distance greater by one.
After k time units, every ghost ASpath of length < k has disappeared.
The longest ghost ASpath has length n (in theory).
Hence the (worst-case) convergence time is nh.
E_Down convergence
            BGP with MinRouteAdver   BGP without MinRouteAdver   Ghost flushing
Time        30n                      hn                          hn
Messages    nE                       n!E                         2Ehn/30
h = one link delay
Ghost Buster Rule


The convergence time is better than expected!
Explanation:
The minRouteAdver timer blocks the propagation of ghost information, while the flushing withdrawals “eat” the ghost information.

Bad (wrong) news propagates slowly.
Analysis: Ghost buster rule

Add to the ghost flushing rule:
A router sends an announcement only after delta time.

MinRouteAdver is similar to delta:
Common implementation: MinRouteAdver per peer.
And the timer is almost always on (lots of BGP announcements!).
Analysis: convergence time
of the ghost buster rule

The ghost information disappears at time t, where
d + t/(delta+h) = t/h

Every delta+h time units, the length of the maximum ghost ASpath increases by one.
Every h time units, the length of the minimum ghost ASpath increases by one.
After the failure, the length of the maximum ghost ASpath is d (the diameter).
Hence t = kdh/(k-1) ≈ dh for large k,
where k = (delta+h)/h is the rate of the algorithm.
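
As a check on the algebra, the sketch below computes t from the closed form and verifies that it satisfies the balance equation of this slide. Exact fractions are used to avoid rounding; `ghost_buster_time` is a hypothetical name and the sample values are arbitrary.

```python
from fractions import Fraction

def ghost_buster_time(d, delta, h):
    """Return the t that solves d + t/(delta+h) = t/h."""
    d, delta, h = Fraction(d), Fraction(delta), Fraction(h)
    k = (delta + h) / h                      # rate of the algorithm
    t = k * d * h / (k - 1)                  # closed form from the slide
    assert d + t / (delta + h) == t / h      # t satisfies the balance equation
    return t

print(ghost_buster_time(d=10, delta=30, h=1))  # 31/3
```

With delta = 30 and h = 1, k = 31 and t = 31/3, only slightly above dh = 10, which is the sense in which t ≈ dh when delta is much larger than h.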
E_Down convergence
            BGP with        BGP without     Ghost flushing   Ghost flushing
            MinRouteAdver   MinRouteAdver                    with ghost buster
Time        30n             hn              hn               kdh/(k-1) ≈ dh
Messages    nE              n!E             2Ehn/30          2Ehkd/(30(k-1))
h = one link delay, d = diameter, k = (delta+h)/h
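
To compare the four columns for concrete numbers, the expressions in the table can be evaluated directly. This is a sketch only: `e_down_bounds` is a hypothetical helper, E is taken exactly as it appears on the slide (the slide does not define it), and the sample inputs are arbitrary.

```python
from math import factorial

def e_down_bounds(n, E, h, d, delta, mrai=30):
    """Evaluate the (time, messages) expressions of the E_Down table."""
    k = (delta + h) / h
    return {
        "BGP with MinRouteAdver": (mrai * n, n * E),
        "BGP without MinRouteAdver": (h * n, factorial(n) * E),
        "Ghost flushing": (h * n, 2 * E * h * n / mrai),
        "Ghost flushing with ghost buster": (
            k * d * h / (k - 1),
            2 * E * h * k * d / (mrai * (k - 1)),
        ),
    }

bounds = e_down_bounds(n=4, E=10, h=1, d=2, delta=30)
for name, (time, messages) in bounds.items():
    print(f"{name}: time={time:.2f}, messages={messages:.2f}")
```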
The effect on E_longer

BGP: convergence time is dominated by:
1. Time until ghost information vanishes
2. Time until backup path propagates in

Ghost flushing:
helps the first factor
[Figure: example topology with nodes 1 to 7 and dst (node 0).]
The effect on E_longer


Original BGP may err:

MinRouteAdver: the peer stores a wrong ASpath

BGP may err and send the packet in the wrong direction
Ghost flushing:
sends a withdrawal to the peer. Perhaps, by chance, the peer has an alternative path.
Simulation: BGP code

Shortest path metric

Link delay between 0.2 and 2 sec

MinRouteAdver chosen randomly between 0 and 30 sec
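
One way such parameters could be drawn is sketched below. The slide gives only the ranges, so the uniform distribution, the per-router timer, the fixed seed and the name `draw_parameters` are all assumptions of this sketch.

```python
import random

def draw_parameters(links, routers, seed=0):
    """Draw a delay per link and a minRouteAdver value per router."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    delay = {link: rng.uniform(0.2, 2.0) for link in links}   # seconds
    mrai = {r: rng.uniform(0.0, 30.0) for r in routers}       # seconds
    return delay, mrai

links = [(1, 2), (1, 3), (2, 3)]
delay, mrai = draw_parameters(links, routers=[1, 2, 3])
print(delay, mrai)
```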
Simulation: Clique E-down
[Figure: time (sec, 0 to 600) versus number of nodes (1 to 18), for Original BGP and Modified BGP.]
Simulation: ISP topology
[Figure: ISP topology (nodes 1, 3, 4, 5, 7, 8, 9 and dst) and a plot of percentage (0 to 100) versus convergence time (10 to 120).]
Example: Core Internet (ASes)
      Out-degree   In-degree   BGP     Ghost Flushing
1     45           10          963     22
2     52           17          898     51
3     3            4           1031    36
4     112          27          1017    50
5     61           11          1034    36
6     20           24          920     33
7     1            6           2       2.5
8     18           2           1111    54
9     1            1111        981     62
10    1            98          4       5.1
E_longer: Convergence Time
[Figure: example topology with nodes 1 to 7 and dst (node 0), and a plot of time (0 to 600) versus number of nodes (10 to 44), for Original BGP and Modified BGP.]
Conclusion

Reduced convergence time from minutes to seconds.

Does not hurt in other cases.

Ghost flushing: no change to BGP messages.

Ghost buster: a new solution to counting to infinity.

BGP is very sensitive to minor modifications.