YATOB - North American Network Operators' Group

Transient BGP Loops
Do they matter, and what can be done about them?
Nate Kushman
MIT/Akamai
Srikanth Kandula, Dina Katabi and John Wroclawski
What causes “Transient BGP Loops”?
[Figure, built up over seven slides: a topology with nodes labeled Sprint, AT&T, Joe, Bob and MIT. The frames show a "Maintenance" event, a "Withdraw MIT" update, and a "Routing Loop" label.]
How common are “Transient Inter-domain Routing Loops”?
• Sprint Study (IMC 2003, IMW 2002):
– Looked at packet traces from the Sprint backbone
– Up to 90% of the observed packet loss was caused by routing loops
– 60-100% of the loops attributable to BGP
Routing Loop Damage
• Our Study:
– 20 vantage points with BGP feeds
– 2 Months
– 70,000 unique prefixes
– Pinged once every 2 minutes
– Trace-routed once every 30 minutes
– TTL Exceeded responses to detect loops
– Additional pings and traceroutes when loops detected
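The traceroute-based detection step above can be illustrated with a small sketch. It is not the study's tooling: the function name `loop_in_traceroute` and the rule "a responding address reappears after a different hop" are assumptions chosen for illustration, and the addresses are made up.

```python
def loop_in_traceroute(hops):
    """Return the repeating hop segment if the trace revisits a router, else None.

    `hops` is the list of responding addresses in TTL order; None marks a
    TTL that drew no response. Heuristic (an assumption, not the study's
    method): an address that reappears after a different hop signals a loop.
    """
    responding = [h for h in hops if h is not None]
    last_seen = {}
    for i, hop in enumerate(responding):
        # A back-to-back repeat is ignored; only a revisit after some
        # other hop counts as a loop.
        if hop in last_seen and i - last_seen[hop] > 1:
            return responding[last_seen[hop]:i]
        last_seen[hop] = i
    return None


# Hypothetical trace in which packets bounce between two routers.
trace = ["10.0.0.1", "10.0.1.1", "10.0.2.1", "10.0.1.1", "10.0.2.1"]
cycle = loop_in_traceroute(trace)
```

Here `cycle` holds the two addresses the packets bounce between. A trace that simply ends without repeats yields `None`.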
Routing Loop Damage
10-15% of updates cause routing loops
Collateral Damage
[Figure, two slides: a topology of AS A, AS B, AS C, AS D, AS E and AS F. The second frame adds an "X" mark and a "Collateral Damage" label.]
Collateral Damage
[Figure: percentage of packet loss (y-axis, 0 to 20) over the range -1000 to 1000 (x-axis), in 100 second windows around sharing a loopy link.]
Prefixes sharing a loopy link see 19% loss
What should be done?
We should prevent forwarding loops
A loop occurs because:
One AS pushes a route update to the data plane, but other ASes, not yet aware of the move, try to send packets on the old route
How can we avoid Routing Loops?
[Figure, built up over three slides: topology with Sprint, AT&T, Joe, Bob and MIT, showing the "Maintenance" event and the "Withdraw MIT" update.]
• AT&T still thinks Joe is routing through Bob
• What if: AT&T knew about Joe’s change before making its own?
Suspension
• Continue to route traffic
• Tell control system not to propagate the route
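A minimal sketch of the two bullets above, assuming a toy router model. The `Route` and `Router` classes and their method names are hypothetical and do not correspond to any real BGP implementation. A suspended route stays in the forwarding path but is held back from advertisement.

```python
from dataclasses import dataclass


@dataclass
class Route:
    next_hop: str
    suspended: bool = False  # hypothetical flag for illustration


class Router:
    def __init__(self):
        self.routes = {}  # prefix -> Route

    def install(self, prefix, next_hop):
        self.routes[prefix] = Route(next_hop)

    def suspend(self, prefix):
        # Keep the route in place, but mark it as not to be propagated.
        self.routes[prefix].suspended = True

    def forward(self, prefix):
        # Data plane: suspended routes continue to carry traffic.
        return self.routes[prefix].next_hop

    def advertisable(self):
        # Control plane: suspended routes are not offered to neighbors.
        return [p for p, r in self.routes.items() if not r.suspended]


router = Router()
router.install("MIT", "Bob")
router.suspend("MIT")
```

After `suspend`, `router.forward("MIT")` still returns the old next hop, while `router.advertisable()` no longer lists the prefix.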
How can we avoid Routing Loops?
[Figure, built up over six slides: topology with Sprint, AT&T, Joe, Bob and MIT, showing the "Maintenance" event and the "Withdraw MIT" update.]
• What if: Joe sends its update before changing its forwarding table?
• And also waits for an Ack from AT&T before updating its forwarding table?
• Then we can be sure that AT&T knows about the path change before it happens and will not use the path
• Instead, AT&T will move immediately to the Sprint path and the loop is avoided.
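The ordering argument in these slides can be checked on a toy model. The sketch below is an illustration under stated assumptions, not the authors' protocol: the next-hop tables in `BEFORE` are a guess at the pictured topology (Joe reaches MIT via Bob, AT&T via Joe, Sprint directly), and all function names are hypothetical. It compares changing the forwarding table first against sending the update, waiting for the Ack, and only then switching.

```python
def find_loop(next_hop, src, dst):
    """Follow next hops from src; return the cycle if one is hit before dst."""
    path = [src]
    node = src
    while node != dst:
        node = next_hop.get(node)
        if node is None:
            return None  # no route: a drop, not a loop
        if node in path:
            return path[path.index(node):] + [node]
        path.append(node)
    return None


def loops_toward(next_hop, dst):
    """Collect the loops seen from every node that has a forwarding entry."""
    found = []
    for src in next_hop:
        loop = find_loop(next_hop, src, dst)
        if loop is not None:
            found.append(loop)
    return found


# Assumed forwarding state before the maintenance event.
BEFORE = {"Joe": "Bob", "Bob": "MIT", "AT&T": "Joe", "Sprint": "MIT"}


def switch_then_announce(tables):
    t = dict(tables)
    t["Joe"] = "AT&T"            # Joe changes its forwarding table first
    during = loops_toward(t, "MIT")
    t["AT&T"] = "Sprint"         # the update reaches AT&T only afterwards
    return during, loops_toward(t, "MIT")


def announce_then_switch(tables):
    t = dict(tables)
    # Joe sends the update but keeps forwarding via Bob, which is assumed
    # to remain usable until the planned maintenance actually begins.
    t["AT&T"] = "Sprint"         # AT&T moves to the Sprint path and Acks
    during = loops_toward(t, "MIT")
    t["Joe"] = "AT&T"            # only now does Joe change its table
    return during, loops_toward(t, "MIT")


loop_during, loop_after = switch_then_announce(BEFORE)
safe_during, safe_after = announce_then_switch(BEFORE)
```

In the first ordering, `loop_during` contains the Joe/AT&T cycle. In the second, no intermediate state contains a loop. The model relies on the old path staying usable while Joe waits, which matches the planned-maintenance case and not necessarily an unplanned link failure.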
More Generally
• We have proven:
– Loops are prevented in the general case
– Convergence properties similar to normal BGP
• All sorts of good proofs and stuff:
– http://nms.lcs.mit.edu/~nkushman/
Your feedback
• Clearly:
– Planned Maintenance events
• 20% of update events caused by planned maintenance
– Link up events
• What about?
– Unplanned Link down events
– Trade-off between loss on current path and collateral damage
In Short
• Routing loops cause significant performance problems
• Even prefixes with no BGP updates are significantly affected by loops
• A simple change to BGP can avoid all routing loops
Questions?