Towards an Accurate AS-level Traceroute Tool
Internet Routing: Measurement,
Modeling, and Analysis
ACM Sigmetrics 2005 Tutorial
Dr. Jia Wang
AT&T Labs Research
Florham Park, NJ 07932, USA
http://www.research.att.com/~jiawang/
Prof. Zhuoqing Morley Mao
Department of EECS
University of Michigan
Ann Arbor, MI 48109, USA
http://www.eecs.umich.edu/~zmao/
Outline
1. Overview of Inter-domain routing
2. Measuring inter-domain paths
3. BGP Measurement
4. BGP Modeling
Our opinions should not be taken to represent AT&T policies
2
Part I: Overview of Interdomain Routing
Internet
Loose cooperative effort of Internet
Service Providers (ISPs)
E.g., AT&T, Sprint, UUNet, AOL
Best effort service
Connectedness
Anyone connected to the Internet can
exchange traffic with anyone else
connected to the Internet
4
Internet routing
routes
Control plane:
exchange routes
Internet
: Routing session
Data plane:
forward traffic
IP traffic
Fail over to alternate route
rusty.cs.berkeley.edu
www.cnn.com
IP=169.229.62.116
Prefix=169.229.0.0/16
IP=64.236.16.52
Prefix=64.236.16.0/20
5
Internet routing domain
Autonomous routing domain
Network devices under same technical and administrative
control
Common routing policy
E.g., ISPs, enterprise networks
Autonomous system
Autonomous routing domain with an AS number (ASN)
AS numbers: 16-bit integers
Public AS number: 1 – 64511
Private AS number: 64512 – 65535
Examples
AT&T: 7018, 6431, …
Sprint: 1239, 1240, …
MIT: 3
6
More than 20,000 ASes today
Internet
Autonomous
System
ISP
Level3
Calren
Berkeley
ISP
ISP
Qwest
ISP
business
ISP
ISP
AT&T
Sprint UUnet
ISP
ISP
IP traffic
University
company
GNN
CNN
7
Internet routing architecture
Intra-domain
routing
Calren
Berkeley
Level3
IP traffic
Internet
Inter-domain
routing
GNN
CNN
8
Intra-domain routing
Run within a certain network infrastructure
Optimize routes taken between points within
a network
Interior Gateway Protocols (IGPs)
Metrics based
OSPF (Open Shortest Path First)
RIP (Routing Information Protocol)
IS-IS (Intermediate System to Intermediate
System)
9
Inter-domain routing
Run between networks
Provide full connectivity of entire
Internet
Exterior Gateway Protocol (EGP)
Policy based
BGP (Border Gateway Protocol)
10
Link state protocols
Examples: OSPF, IS-IS
Based on Dijkstra’s shortest path computation
Each router periodically floods immediate
reachability information to other routers
Fast convergence
High communication and computation
overhead
Not scalable for large networks
Requires periodic refreshes
11
Vectoring protocols
Distance vs. Path Vector
Distance: hop count (RIP)
Path: entire path (BGP)
Helps identify loops
Supports policy-based routing based on path
Minimal communication overhead
Takes longer to converge, i.e., in
proportion to the maximum path length
12
Link state vs. vectoring
Link state Vectoring
IGP
EGP
OSPF
IS-IS
RIP
BGP
BGP is a path vector protocol
13
Classful addressing
IPv4: 32 bits
Five classes of networks
Class  Address  Mask           # of networks  # of hosts
A      0*       255.0.0.0      128            ~16.7M
B      10*      255.255.0.0    16384          65534
C      110*     255.255.255.0  ~2.1M          254
D      Used for multicast
E      Reserved and currently unused
Improve scaling factor of routing in the Internet => classless
14
CIDR: Classless Inter-domain
Routing (RFC1519)
No implicit mask based on the class of the network
Explicit masks passed in the routing protocol
Allow aggregation and hierarchical routing
IP address: 12.70.0.0
Mask: 255.255.252.0
Address: 00001100 01000110 00000000 00000000
Mask:    11111111 11111111 11111100 00000000
Network prefix: the first 22 bits
Host identifier: the remaining 10 bits
CIDR representation: 12.70.0.0/22
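The relationship between the dotted mask and the /22 notation can be checked with a minimal sketch using Python's standard `ipaddress` module. The helper name `to_bits` is only illustrative and is not part of the tutorial.

```python
import ipaddress

# The slide's example: address 12.70.0.0 with mask 255.255.252.0.
network = ipaddress.ip_network("12.70.0.0/255.255.252.0")

def to_bits(addr):
    """Render an IPv4 address as four space-separated groups of 8 bits."""
    return " ".join(f"{octet:08b}" for octet in addr.packed)

prefix_len = network.prefixlen   # number of leading 1 bits in the mask
host_bits = 32 - prefix_len      # bits left over for the host identifier

print("address:", to_bits(network.network_address))
print("mask:   ", to_bits(network.netmask))
print("CIDR:   ", network.with_prefixlen)
print("host bits:", host_bits, "->", network.num_addresses, "addresses")
```

The mask is never inferred from the first bits of the address, which is the point of classless routing: the prefix length travels with the route.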
15
Address aggregation
12.70.0.0/24
12.70.1.0/24
12.70.2.0/24
12.70.3.0/24
Internet
ISP A
12.71.0.0/16
ISP B
12.70.0.0/22
12.71.0.0/16
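As a rough illustration of the aggregation shown on this slide, the sketch below collapses the four contiguous /24 prefixes into a single /22 using the standard `ipaddress` module. The variable names are illustrative, and real routers aggregate according to configured policy rather than automatically.

```python
import ipaddress

# The four customer prefixes from the slide.
customer_routes = [
    ipaddress.ip_network(p)
    for p in ("12.70.0.0/24", "12.70.1.0/24", "12.70.2.0/24", "12.70.3.0/24")
]

# collapse_addresses merges adjacent or overlapping networks into supernets.
aggregate = list(ipaddress.collapse_addresses(customer_routes))
print(aggregate)

# A prefix that is not contiguous with the others stays separate.
other = ipaddress.ip_network("12.71.0.0/16")
announced = list(ipaddress.collapse_addresses(customer_routes + [other]))
print(announced)
```

Only one route instead of four has to be carried beyond the aggregating ISP, which is how CIDR improves routing scalability.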
16
Routing and forwarding
Routing
The decision process of choosing optimal
path that is consistent with the
administrative or technical policy
Forwarding
The act of receiving a packet, doing a
lookup, and copying a packet to the next
hop
17
Classless forwarding
Internet
12.70.0.20
10.20.128.10
10.20.128.1
10.20.0.1
IP traffic
10.20.1.1
135.120.0.1
Prefix          Next hop
12.70.0.0/24    10.20.0.1
12.70.0.0/16    10.20.1.1
12.0.0.0/8      10.20.128.1
0.0.0.0         10.20.128.10
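Classless forwarding picks the most specific (longest) matching prefix. The sketch below is a simple linear-scan version of that lookup, not how a production router implements it. It assumes the slide's prefixes pair with the next hops in the order listed and that the `0.0.0.0` entry is the default route `0.0.0.0/0`; both are assumptions.

```python
import ipaddress

# Assumed pairing of the slide's prefixes and next hops (in listed order).
forwarding_table = [
    ("12.70.0.0/24", "10.20.0.1"),
    ("12.70.0.0/16", "10.20.1.1"),
    ("12.0.0.0/8", "10.20.128.1"),
    ("0.0.0.0/0", "10.20.128.10"),  # assumed to be the default route
]

def lookup(destination, table=forwarding_table):
    """Return the next hop of the longest prefix that contains destination."""
    dest = ipaddress.ip_address(destination)
    best = None  # (network, next_hop) of the most specific match so far
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

print(lookup("12.70.0.20"))   # matches the /24, the /16, the /8 and the default
print(lookup("135.120.0.1"))  # matches only the default route
```

Real routers use tries or TCAMs for this lookup; the linear scan is only meant to make the matching rule explicit.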
18
Inter-domain routing with CIDR
support
BGP-4 [RFC1771]
De facto EGP
Carry routing information between ASes
Path vector protocol
Policy based routing
Run on top of TCP for reliability
Basic operations
Set up BGP session
Exchange all candidate routes
Send incremental updates
19
Establish BGP session
Establish neighboring session
between 12.10.0.1 and 12.10.0.2
12.10.0.1
Prefix
135.120.0.0/24
68.35.0.0/16
TCP 179
Next hop
10.128.0.1
10.192.1.1
12.10.0.2
Prefix
12.70.0.0/24
12.9.0.0/16
Next hop
10.20.0.1
10.20.1.1
20
Exchange all candidate routes
12.70.0.0/24
12.9.0.0/16
10.20.0.1
10.20.1.1
12.10.0.1
12.10.0.2
135.120.0.0/24
68.35.0.0/16
Prefix
135.120.0.0/24
68.35.0.0/16
12.70.0.0/24
12.9.0.0/16
Next hop
10.128.0.1
10.192.1.1
10.20.0.1
10.20.1.1
10.128.0.1
10.192.1.1
Prefix
12.70.0.0/24
12.9.0.0/16
135.120.0.0/24
68.35.0.0/16
Next hop
10.20.0.1
10.20.1.1
10.128.0.1
10.192.1.1
21
Send incremental updates
Withdraw 12.9.0.0/16
12.10.0.1
Prefix
135.120.0.0/24
68.35.0.0/16
12.70.0.0/24
12.9.0.0/16
12.10.0.2
Next hop
10.128.0.1
10.192.1.1
10.20.0.1
10.20.1.1
Prefix
12.70.0.0/24
12.9.0.0/16
135.120.0.0/24
68.35.0.0/16
Next hop
10.20.0.1
10.20.1.1
10.128.0.1
10.192.1.1
22
BGP messages
OPEN: set up a peering session
UPDATE: announce new routes or
withdraw previously announced routes
NOTIFICATION: shut down a peering
session
KEEPALIVE: confirm active connection
at regular interval
23
Internal vs. external BGP
Internet
I-BGP
E-BGP
update
E-BGP
I-BGP
update
AS B
AS C
AS A
24
Scaling I-BGP for large AS
Route reflectors
Confederations
AS 1000
E-BGP
update
EBGP
RR
RR
IBGP
AS 65010
IBGP EBGP
AS 65020
Only best paths
being sent by
RR
25
Establish connectivity
Prefix
135.120.0.0/16
AS 3
Next hop AS path
12.10.0.5 2 1
Prefix
135.120.0.0/16
IBGP
Next hop AS path
12.10.0.1 1
12.10.0.6
EBGP
12.10.0.5
AS 2
AS 1
135.120.0.0/16
IBGP
EBGP
12.10.0.1
12.10.0.2
IBGP
Prefix
135.120.0.0/16
Next hop AS path
12.10.0.1 1
26
IGP and BGP working together
Prefix
135.120.0.0/16
AS 3
IBGP
Next hop AS path
12.10.0.1 1
Prefix
12.10.0.0/30
135.120.0.0/16
12.10.0.6
Next hop
10.10.0.1
10.10.0.1
EBGP
12.10.0.5
AS 1
12.10.0.1
135.120.0.0/16
EBGP
AS 2
12.10.0.2
10.10.0.1
IBGP
IBGP
Prefix
135.120.0.0/16
Next hop AS path
12.10.0.1 1
27
Policy routing
ISP2
ISP1
traffic
Connectivity DOES NOT
imply reachability!
ISP3
ISP4
traffic
Cust1
Cust2
Policy determines how traffic
can flow on the Internet
28
BGP routing process
Routes received from peers -> Apply input policy -> Select best route -> Best routes -> Apply output policy -> Routes advertised to peers
Best routes feed the routing table and the forwarding table
BGP is not shortest path routing!
29
Best route selection
Highest local preference
Shortest AS path
Lowest MED (Multi-Exit-Discriminator)
I-BGP < E-BGP
Lowest I-BGP cost to E-BGP egress
Tie breaking rules
30
Best route selection
Highest local preference
To enforce economical relationships
between domains
Shortest AS path
Lowest MED (Multi-Exit-Discriminator)
I-BGP < E-BGP
Lowest I-BGP cost to E-BGP egress
Tie breaking rules
31
Best route selection
Highest local preference
Shortest AS path
Compare the quality of routes, assuming
shorter AS-path length is better
Lowest MED (Multi-Exit-Discriminator)
I-BGP < E-BGP
Lowest I-BGP cost to E-BGP egress
Tie breaking rules
32
Best route selection
Highest local preference
Shortest AS path
Lowest MED (Multi-Exit-Discriminator)
To implement “cold potato” routing
between neighboring domains
I-BGP < E-BGP
Lowest I-BGP cost to E-BGP egress
Tie breaking rules
33
Best route selection
Highest local preference
Shortest AS path
Lowest MED (Multi-Exit-Discriminator)
I-BGP < E-BGP
Prefer EBGP routes to IBGP routes
Lowest I-BGP cost to E-BGP egress
Tie breaking rules
34
Best route selection
Highest local preference
Shortest AS path
Lowest MED (Multi-Exit-Discriminator)
I-BGP < E-BGP
Lowest I-BGP cost to E-BGP egress
Prefer routes via the nearest IGP neighbor
To implement “hot potato” routing
Tie breaking rules
35
Best route selection
Highest local preference
Shortest AS path
Lowest MED (Multi-Exit-Discriminator)
I-BGP < E-BGP
Lowest I-BGP cost to E-BGP egress
Tie breaking rules
Router ID based: lowest router ID
Age based: oldest route
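The ordered steps above can be sketched as a lexicographic comparison. This is a simplified illustration, not a vendor implementation: the `Route` fields and function names are hypothetical, MED is compared across all routes (real BGP normally compares it only among routes from the same neighboring AS), and only the router-ID tie-breaker is modeled.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Route:
    local_pref: int          # higher is preferred
    as_path: List[int]       # shorter is preferred
    med: int                 # lower is preferred
    learned_via_ebgp: bool   # E-BGP routes preferred over I-BGP routes
    igp_cost: int            # lower I-BGP cost to the egress is preferred
    router_id: int           # final tie-breaker: lowest router ID

def preference_key(route):
    """Tuple that sorts the most preferred route first."""
    return (
        -route.local_pref,
        len(route.as_path),
        route.med,
        0 if route.learned_via_ebgp else 1,
        route.igp_cost,
        route.router_id,
    )

def select_best(routes):
    return min(routes, key=preference_key)

short_path = Route(100, [2, 1], 0, True, 10, 1)
preferred = Route(200, [4, 3, 2, 1], 0, True, 10, 2)
# Local preference is checked first, so the longer AS path wins here.
print(select_best([short_path, preferred]).as_path)
```

The example shows why BGP is not shortest-path routing: a policy attribute outranks AS-path length.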
36
BGP route propagation
Not all possible routes propagate
Commercial relationships determine policies
for
Route import
Route selection
Route export
37
Typical AS relationships
Provider-customer
Customer pays the provider for transit
Peer-peer
typically exchange respective customers’ traffic for
free
Siblings
Mutual transit agreement
Provide connectivity to the rest of the Internet for
each other
38
AS relationships translate
into BGP export rules
Export to a provider or a peer
Allowed: its routes and routes of its
customers and siblings
Disallowed: routes learned from other
providers or peers
Export to a customer or a sibling
Allowed: its routes, the routes of its
customers and siblings, and routes learned
from its providers and peers
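The export rules above reduce to a small predicate. The sketch below is one way to write it; the string labels and the function name `may_export` are illustrative choices, not taken from the tutorial.

```python
EXPORT_TARGETS = ("provider", "peer", "customer", "sibling")
ROUTE_SOURCES = ("self", "customer", "sibling", "provider", "peer")

def may_export(learned_from, export_to):
    """Return True if a route learned from `learned_from` may be exported
    to a neighbor whose relationship to us is `export_to`."""
    if learned_from not in ROUTE_SOURCES or export_to not in EXPORT_TARGETS:
        raise ValueError("unknown relationship")
    if export_to in ("customer", "sibling"):
        # Customers and siblings receive everything.
        return True
    # Providers and peers only receive our own, customer, and sibling routes.
    return learned_from in ("self", "customer", "sibling")

print(may_export("customer", "peer"))   # True
print(may_export("provider", "peer"))   # False
```

The second case is the one that prevents an AS from giving free transit between its providers and peers.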
39
Which AS paths are legal?
Valley-free:
After traversing a provider-customer or
peer-peer edge, cannot traverse a
customer-provider or peer-peer edge
Invalid path: >= 2 peer links, downhill-uphill, downhill-peer, peer-uphill
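A valley-free path is a run of uphill (customer-provider) edges, then at most one peer-peer edge, then a run of downhill (provider-customer) edges. The sketch below checks that property; the edge labels `"up"`, `"peer"`, and `"down"` are an illustrative encoding.

```python
def is_valley_free(edges):
    """edges: sequence of 'up' (customer->provider), 'peer', or
    'down' (provider->customer), in the order the path traverses them."""
    descending = False  # True once a peer or downhill edge has been traversed
    for edge in edges:
        if edge not in ("up", "peer", "down"):
            raise ValueError(f"unknown edge type: {edge}")
        if descending and edge in ("up", "peer"):
            return False  # uphill or peer edge after the path has turned over
        if edge in ("peer", "down"):
            descending = True
    return True

print(is_valley_free(["up", "up", "peer", "down"]))  # True
print(is_valley_free(["down", "up"]))                # False: downhill-uphill
print(is_valley_free(["peer", "peer"]))              # False: two peer links
```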
40
Example of valley-free paths
[1 2 3], [1 2 6 3] are valley-free
[1 4 3], [1 4 5 3] are not valley-free
41
Inferring AS relationships
Identify the AS-level hierarchy of Internet
Not shortest path routing
Predict AS-level paths
Traffic engineering
Understand the Internet better
Correlate with and interpret BGP updates
Identify BGP misconfigurations
E.g., errors in BGP export rules
42
Existing approaches
On inferring Autonomous Systems Relationships in
the Internet, by L. Gao, IEEE Global Internet, 2000.
Characterizing the Internet hierarchy from multiple
vantage points, by L. Subramanian, S. Agarwal, J.
Rexford, and R. Katz, IEEE Infocom, 2002.
Computing the Types of the Relationships between
Autonomous Systems, by G. Battista, M. Patrignani,
and M. Pizzonia, IEEE Infocom, 2003.
On AS-level Path Inference, by Z. Mao, L. Qiu, J.
Wang, and Y. Zhang, ACM Sigmetrics, 2005.
43
Policy routing causes path
inflation
End-to-end paths are significantly longer than
necessary
Why?
Topology and routing policy choices within an ISP,
between pairs of ISPs, and across the global
Internet
Peering policies and interdomain routing lead to
significant inflation
Interdomain path inflation is due to lack of BGP
policy to provide convenient engineering of good
paths across ISPs
44
Path inflation
Based on [Mahajan03]: comparing actual Internet paths with a hypothetical “direct” link
45
Part II: Measuring Interdomain Forwarding Paths
Why do we care?
Characterize end-to-end network paths
Latency
Capacity
Link utilization
Loss rate.
Diagnose routing anomalies
Forwarding loops, blackholes, routing changes, unexpected paths, main component of end-to-end latency.
Discover Internet topology
Server placement
47
Key challenge
Need to understand how packets flow
through the Internet without real-time
access to proprietary routing data from
each domain.
Identify accurate packet forwarding paths
Characterize the performance metrics of
each hop along the paths
48
Existing approaches
With access to the source
AS-level traceroute
Towards an Accurate AS-Level Traceroute Tool, by Z.
Mao, J. Rexford, J. Wang, and R. Katz, ACM Sigcomm,
2003.
Scalable and Accurate Identification of AS-Level
Forwarding Paths, by Z. Mao, D. Johnson, J. Rexford, J.
Wang, and R. Katz, IEEE Infocom, 2004.
Without access to the source
Routescope
On AS-level Path Inference, by Z. Mao, L. Qiu, J. Wang,
and Y. Zhang, ACM Sigmetrics, 2005.
49
AS-Level Traceroute
Traceroute gives IP level forwarding
path
IP address of the router interfaces on a
forwarding path
RTT statistics for each hop along the way
50
Traceroute from AT&T Research
to www.cnn.com
traceroute to cnn.com (64.236.24.12), 30 hops max, 40 byte packets
 1 oden (135.207.16.1) 1 ms 1 ms 1 ms
 2 * * *
 3 attlr-gate (192.20.225.1) 2 ms 2 ms 2 ms
 4 12.119.155.157 (12.119.155.157) 3 ms 4 ms 4 ms
 5 gbr6-p52.n54ny.ip.att.net (12.123.192.18) 4 ms 4 ms 4 ms
 6 tbr2-p012401.n54ny.ip.att.net (12.122.11.29) 4 ms (ttl=249!) 5 ms (ttl=249!) 5 ms (ttl=249!)
 7 ggr2-p390.n54ny.ip.att.net (12.123.3.62) 4 ms 5 ms 4 ms
 8 att-gw.ny.aol.net (192.205.32.218) 4 ms 4 ms 4 ms
 9 bb2-nye-P1-0.atdn.net (66.185.151.66) 4 ms 4 ms 4 ms
10 bb2-vie-P8-0.atdn.net (66.185.152.201) 13 ms (ttl=245!) 12 ms (ttl=245!) 12 ms (ttl=245!)
11 bb1-vie-P11-0.atdn.net (66.185.152.206) 10 ms 10 ms 10 ms
12 bb1-cha-P7-0.atdn.net (66.185.152.28) 20 ms 20 ms 20 ms
13 bb1-atm-P6-0.atdn.net (66.185.152.182) 25 ms 25 ms 25 ms
14 pop1-atl-P4-0.atdn.net (66.185.136.17) 25 ms (ttl=243!) 24 ms (ttl=243!) 24 ms (ttl=243!)
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
Destination
unreachable!
Who is responsible for
the forwarding problem?
51
Need to know Inter-domain level
path
Obtain AS level paths
BGP AS path
Traceroute AS path
52
BGP AS path
Signaling path: control traffic
d: path=[BC]
d: path=[C]
Prefix d
Forwarding path: data traffic
Prefix
d
…
AS path
ABC
…
Is BGP AS path the answer? No!
53
BGP AS path is not the
answer
Requires timely access to BGP data
Signaling path may differ from
forwarding path
Route aggregation and filtering
Routing anomalies: e.g., deflections, loops
[Griffin2002]
BGP misconfigurations: e.g., incorrect AS
prepending
Two paths may differ precisely when operators
most need accurate data to diagnose a problem!
54
Traceroute AS path
Obtain IP level path using traceroute
Map IP addresses to ASes
a
b
c
d
Source
e
Destination
AS A
AS B
AS C
AS D
Is traceroute AS path the answer? NO!
55
Traceroute AS path is not the
answer
Identifying ASes along forwarding path
is surprisingly difficult!
Internet route registry
Origin AS in BGP routes
56
Internet route registry
Whois database
E.g. NANOG traceroute, prtraceroute
Out-of-date, incomplete
Address allocation to customers
Acquisition, mergers, break-ups
57
Origin AS in BGP routes
Last AS in the AS path for each prefix
Prefix
AS path
d
ABC
…
…
More accurate and complete than whois data
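The naive traceroute-to-AS conversion described here (take the origin AS of the longest matching BGP prefix for each hop, then merge consecutive hops in the same AS) can be sketched as follows. The BGP table, prefixes, and AS numbers are made up from documentation ranges, and all function names are hypothetical. Unmapped and unresponsive hops are simply skipped, which is a simplification.

```python
import ipaddress

def origin_as_table(bgp_table):
    """Map each prefix to its origin AS: the last AS in the AS path."""
    return {
        ipaddress.ip_network(prefix): as_path[-1]
        for prefix, as_path in bgp_table.items()
        if as_path  # an empty AS path has no origin to report
    }

def ip_to_as(ip, origins):
    """Origin AS of the longest prefix covering ip, or None if unmapped."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in origins if addr in net]
    if not matches:
        return None
    return origins[max(matches, key=lambda net: net.prefixlen)]

def traceroute_as_path(hops, origins):
    """Convert IP-level hops to an AS-level path, merging repeated ASes."""
    path = []
    for hop in hops:
        asn = None if hop == "*" else ip_to_as(hop, origins)
        if asn is not None and (not path or path[-1] != asn):
            path.append(asn)
    return path

# Hypothetical BGP table: prefix -> AS path as seen at the vantage point.
bgp_table = {
    "192.0.2.0/24": [64500, 64501],
    "198.51.100.0/24": [64500, 64502],
    "203.0.113.0/24": [64500, 64502, 64503],
}
origins = origin_as_table(bgp_table)
hops = ["192.0.2.1", "192.0.2.9", "*", "198.51.100.7", "203.0.113.5"]
print(traceroute_as_path(hops, origins))
```

This baseline is exactly what breaks when the IP-to-AS mapping is inaccurate, since a single wrong prefix entry changes the AS attributed to a hop.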
58
Limitations of BGP origin AS
Multiple Origin AS (MOAS)
Multi-homing
misconfiguration
Internet eXchange Points (IXPs)
Infrastructure addresses may not be advertised
Need not be announced publicly
Security concerns
Addresses announced by someone else
Static routed customers
Shared equipment at boundaries between ASes
Need accurate IP-to-AS mapping!
59
Accurate AS-level traceroute
Combine BGP and traceroute data
to find a better answer!
60
Assumptions
IP-to-AS mapping
Mappings from BGP tables are mostly
correct.
Change slowly
BGP paths and forwarding paths mostly
match.
About 70% of BGP paths match the traceroute paths
61
BGP path and traceroute path
could differ!
Inaccurate IP-to-AS mapping
Traceroute problems
Legitimate mismatches
62
BGP path and traceroute path
could differ!
Inaccurate IP-to-AS mapping
Internet eXchange Points (IXPs)
Sibling ASes
Unannounced infrastructure addresses
Traceroute problems
Legitimate mismatches
63
Internet eXchange Points (IXPs)
Shared infrastructure connected to multiple
service providers
Exchange BGP routes and data traffic
May have its own AS number or announced
by participating ASes
Dedicated BGP sessions between pairs of
participating ASes
E.g., Mae-East, Mae-West, PAIX.
64
IXPs cause extra AS hop
Extra AS hop in traceroute path
Large number of fan-in and fan-out ASes
Non transit AS, small address block, likely
MOAS
A
B
C
D
E
A
E
F
B
F
G
C
G
Traceroute AS path
BGP AS path
65
Sibling ASes
Single organization owns and manages multiple ASes
May share address space
Cause extra AS hop
Large fan-in and fan-out for the “sibling AS pair”
A
B
C
H
D
E
A
F
B
G
C
Traceroute AS path
E
D
F
G
BGP AS path
66
Unannounced infrastructure
addresses
ASes do not necessarily announce
infrastructure via BGP
Lead to “unmapped” addresses
Sometimes fall into supernet announced
by AS’s provider or sibling
67
Unannounced infrastructure
addresses
AS loop in
traceroute path
AS A
4. A,C,A
3. B,A
AS B
Substitute AS hop
2. A
Missing AS hop in
traceroute path
Extra AS hop in
traceroute path
AS C
1. A,C
68
BGP path and traceroute path
could differ!
Inaccurate IP-to-AS mapping
Traceroute problems
Forwarding path changing during
traceroute
Interface numbering at AS boundaries
ICMP response refers to outgoing interface
Legitimate mismatches
69
Forwarding path changing during
traceroute
AS D
AS E
Route flaps between A
B C and A D E
AS A
AS A
AS B
AS D
AS C
AS C
AS hop B is substituted by AS D in the traceroute path
70
Interface numbering at AS
boundaries
AS A
AS A
AS C
AS B
AS C
Missing AS hop B in traceroute path
71
ICMP response refers to outgoing
interface
AS A
AS C
ICMP
message
AS B
Extra AS hop B in traceroute path
72
BGP path and traceroute path
could differ!
Inaccurate IP-to-AS mapping
Traceroute problems
Legitimate mismatches
Route aggregation and filtering
Routing anomalies, e.g., deflections
73
Route aggregation/filtering
AS A
8.0.0.0/8 B C
AS B
AS C
8.0.0.0/8
C
8.64.0.0/16 C D
Extended traceroute path due to filtering by AS B
74
Mismatch patterns and causes
Extra
AS
Miss
AS
AS
Loop
Subst
AS
IXP
X
Sibling ASes
X
X
X
X
Unannounced IP
X
X
X
X
Aggregation/ filtering
Other
X
Inter-AS interface
X
ICMP source address
X
X
Routing anomaly
X
X
X
X
X
X
X
X
75
BGP and traceroute data
collection
Initial mappings from
origin AS of a large set of BGP tables
(Ignoring unstable paths)
For each location: Local BGP paths
For each location:
Combine all locations:
Traceroute paths
from multiple locations
Traceroute AS paths
•Compare
•Look for known causes of mismatches
(e.g., IXP, sibling ASes)
•Edit IP-to-AS mappings
(a single change explaining a large number of mismatches)
76
Measurement setup
Eight vantage points
Upstream providers: US-centric tier-1 ISPs
Sweep all routable IP address space
About 200,000 IP addresses, 160,000
prefixes, 15,000 destination ASes
77
Preprocessing BGP paths
Discard prefixes with BGP paths
containing
Routing changes based on BGP updates
Private AS numbers (64512 - 65535)
Empty AS paths (local destinations)
AS loops from misconfiguration
AS SET instead of AS sequence
Less than 1% prefixes affected
78
Preprocessing traceroute
paths
Resolving incomplete traceroute paths
Unresolved hops within a single AS map to that AS
Unmapped hops between ASes
Try to match to neighboring AS using DNS, Whois
Trim unresponsive (*) hops at the end
Compare with the beginning of local BGP paths
MOAS at the end of paths
Assume multi-homing without BGP
Validation using AT&T router configurations
More than 98% cases validated
79
Initial IP-to-AS Mapping
          Whois   Combined BGP tables   Resolving incompletes
Match     44.7%   73.2%                 78.0%
Mismatch  29.4%   8.3%                  9.0%
Ratio     1.5     8.8                   9.0
80
Heuristics to improve mappings
Overall modification to mappings
10% IP-to-AS mappings modified
25 IXPs identified
28 pairs of sibling ASes found
1150 of the /24 prefixes shared
          IXP     Sibling ASes   Unannounced address space
Match     84.4%   85.9%          90.6%
Mismatch  8.7%    7.8%           3.5%
Ratio     9.7     11.0           26.0
81
Systematic optimization
Dynamic-programming and iterative
improvement
Initial IP-to-AS mapping derived from BGP
routing tables
Identify a small number of modifications
that significantly improve the match rate.
95% match ratio, less than 3% changes,
very robust
82
Optimization results
Input mapping                                               Mismatch
Full initial mapping                                        5.23%
Heuristically optimized mapping                             3.08%
Omit 10% initial mapping                                    6.57%
Omit 4 probing sources                                      6.34%
Omit probing destinations (one probe per unique BGP path)   7.12%
83
AS-level path inference
Without access to the source
Challenges
Asymmetric routes: 60%
Complicated routing policies
Multihomed networks
Find the shortest policy path that conforms with AS relationships
84
Routescope
Assumptions
Explicit AS relationship
Peer-peer
Provider-customer
Shortest policy AS path preferred
Valley-free
Uniform routing policy within an AS
AS destination based uniform routing
Stability
These assumptions are mostly correct.
85
AS path inference algorithm
Compose AS graph based on BGP tables
Infer AS relationship
Classify edges based on AS relationship
Customer-provider (UP) link
Provider-customer (DOWN) link
Peering (FLAT) link
Compute the shortest policy path conforming to the “valley-free” rule using a modified Dijkstra’s algorithm
Infer the first AS hop if multiple paths returned
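The shortest-policy-path step can be sketched as a search over (AS, phase) states, where the phase records whether the path has already used a FLAT or DOWN link. Because every AS hop costs the same, this sketch uses breadth-first search in place of the modified Dijkstra's algorithm named above. The topology and all function names are hypothetical.

```python
from collections import deque

def add_provider_customer(links, provider, customer):
    links[(customer, provider)] = "UP"
    links[(provider, customer)] = "DOWN"

def add_peering(links, a, b):
    links[(a, b)] = "FLAT"
    links[(b, a)] = "FLAT"

def shortest_valley_free_path(links, src, dst):
    """links maps a directed pair (a, b) to 'UP', 'DOWN', or 'FLAT'."""
    neighbors = {}
    for (a, b), kind in links.items():
        neighbors.setdefault(a, []).append((b, kind))
    start = (src, False)  # (AS, descending): no FLAT/DOWN link used yet
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, descending = state
        if node == dst:
            path = []
            while state is not None:  # walk parents back to the source
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt, kind in neighbors.get(node, []):
            if descending and kind in ("UP", "FLAT"):
                continue  # would violate the valley-free rule
            nxt_state = (nxt, descending or kind in ("FLAT", "DOWN"))
            if nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None

# Hypothetical topology: ASes 1 and 2 peer; 1 serves 3, 2 serves 4;
# AS 5 is a customer of both 3 and 4.
links = {}
add_peering(links, 1, 2)
add_provider_customer(links, 1, 3)
add_provider_customer(links, 2, 4)
add_provider_customer(links, 3, 5)
add_provider_customer(links, 4, 5)
print(shortest_valley_free_path(links, 3, 4))
```

In this topology the two-hop route 3, 5, 4 is shorter but forms a valley through the shared customer, so the search returns the three-hop route over the peering link instead.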
86
AS path inference accuracy
Total
Match
Match
length
Exact
match
Shorter
Longer
AS7018
(tier-1)
18085
82%
83%
35%
17%
0%
AS2152
(tier-2)
11990
64%
64%
10%
35%
0%
AS8121
(tier-3)
15757
16%
27%
3%
69%
4%
All BGP
gateways
2457
70%
73%
30%
22%
4%
US BGP
gateways
1907
60%
62%
27%
34%
4%
If the first hop is known, 15% of mismatches can be eliminated.
87
First hop inference
AS T1
AS S
AS D
Transition point T1
Source
Only have
access to
D
T2
AS T2
Destination
AS C
Gather candidate first hop ASes from S by launching
traceroutes to S from multiple vantage points
Identify the transition point T that is likely to be on
the path from S to D by testing
hop_count(S,T) + hop_count(T,D) = hop_count(S,D)
88
Hop count inference
Hop_count(S,T) = hop_count(T,S)
To infer hop_count(H,D): H = T or S
Send ping packet to H
Guess the initial TTL value TTL0 set by H
Get TTL value TTL1 in ICMP response packet received from
H
Hop_count(H,D) = TTL0 - TTL1 + 1
Common value for TTL0:
32 (Win95/98/Me)
64 (Linux, Compaq Tru64)
128 (Win NT/2000/XP)
255 (most UNIX systems)
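The TTL-based estimate above can be sketched as follows. The guessing rule (take the smallest common initial TTL that is not below the observed value) is an assumption of this sketch and only holds while paths are shorter than the gap between neighboring candidates. The function names are illustrative.

```python
# Common initial TTL values listed on the slide.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)

def guess_initial_ttl(observed_ttl):
    """Guess TTL0 as the smallest common initial TTL >= the observed TTL."""
    for ttl0 in COMMON_INITIAL_TTLS:
        if observed_ttl <= ttl0:
            return ttl0
    raise ValueError("TTL larger than 255 is not valid for IPv4")

def infer_hop_count(observed_ttl):
    """Apply the slide's formula: hop count = TTL0 - TTL1 + 1."""
    ttl0 = guess_initial_ttl(observed_ttl)
    return ttl0 - observed_ttl + 1

# A reply arriving with TTL 243 most plausibly started at 255.
print(guess_initial_ttl(243), infer_hop_count(243))
```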
89
Improvement with known first AS
hop
                   Total   Match length   Improvement
AS7018 (tier-1)    18085   86%            3%
AS2152 (tier-2)    11990   76%            12%
AS8121 (tier-3)    15757   48%            21%
All BGP gateways   2457    88%            15%
US BGP gateways    1907    70%            8%
90
Possible causes of inaccuracy
Complicated AS relationships: 15% paths
Two consecutive FLAT links
DOWN link followed by a FLAT link
FLAT link followed by UP link
Routing policies
Shortest path vs. customer routes
Inconsistent advertisement to different peering
locations
BGP tie-breaking rules
AS prepending: > 28% of ASes
91
Part III: BGP Measurement
BGP routing updates
Route updates at prefix level
No activity in “steady state”
Routing messages indicate changes, no
refreshes
93
Internet routing instability
Large # of BGP updates
Failures
Policy changes
Redundant messages
Routing instability
Route keeps changing, e.g., routes keep
going up and down
94
Implications
Router overhead
Transient delay and loss
Unreachable hosts
High loss rate
High jitter
Long delays
Significant packet reordering
Poor predictability of traffic flow
How do we know if the instability is due to
routing or network congestion?
95
Measure BGP stability
First work by Labovitz et al.
Methodology
Collect routing messages from five public
exchange points
BGP information considered
AS path
Next hop: next hop to reach a network
Two routes are the same if they have the same AS
path and next hop
Other attributes (e.g., MED, communities) ignored
Focus on forwarding path stability
96
Measurement methodology
97
BGP information exchange
Announcements: a router has either
Learned of a new route, or
Made a policy decision that it prefers a new route
Withdrawals: a router concludes that a
network is no longer reachable
Explicit: associated to the withdrawal message
Implicit: (in effect an announcement) when a
route is replaced as a result of an announcement
message
In steady state BGP updates should be only
the result of infrequent policy changes
BGP is stateful, requires no refreshes
Update rate: indication of network stability
98
Example of delayed convergence
stage
0
1
4
2: [1] [41] [431]
node 3: [1] [41] [241]
4: [1] [31] --
Example topology:
9
----
d
1
2
4
3
Assuming node 1 has a route to a destination, and it withdraws the route:
Stage (msg processed)     Msg queued
0:                        1->{2,3,4}W
1: 1->{2,3,4}W            2->{3,4}A[241], 3->{2,4}A[341], 4->{2,3}A[431]
2: 2->{3,4}A[241]         3->{2,4}A[341], 4->{2,3}A[431]
3: 3->{2,4}A[341]         4->{2,3}A[431], 4->{2,3}W
4: 4->{2,3}A[431]         MinRouteAdver timer expires: 4->{2,3}W, 3->{2,4}A[3241], 2->{3,4}A[2431]
… (omitted)
9: 3->{2,4}W
Note: In response to a withdrawal from 1, node 3 sends out 3 messages:
3->{2,4}A[341], 3->{2,4}A[3241], 3->{2,4}W
99
Types of
inter-domain routing updates
Forwarding instability
may reflect topology changes
Policy fluctuations (routing instability)
may reflect changes in routing policy information
Pathological updates
redundant updates that are neither routing nor
forwarding instability
Instability
forwarding instability and policy fluctuation
change forwarding path
100
Routing successive events
(instability)
WADiff
W: a route is explicitly withdrawn as it becomes
unreachable
A: is later replaced with an alternative route
Forwarding instability
AADiff
A: a route is implicitly withdrawn
A: then replaced by an alternative route as the
original route becomes unavailable or a new
preferred route becomes available
Forwarding instability
101
Routing successive events
(pathological instability)
WADup
W: a route is explicitly withdrawn
A: then reannounced later
forwarding instability or pathological behavior
AADup
A: a route is implicitly withdrawn
A: then replaced with a duplicate of the original route
pathological behavior or policy fluctuation
WWDup
The repeated transmission of BGP withdrawals for a prefix
that is currently unreachable (pathological behavior)
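The taxonomy of successive events can be sketched as a small classifier over the update stream for one (peer, prefix) pair. Following the earlier methodology slide, two routes count as the same when their AS path and next hop match. The tuple encoding and the function name are assumptions of this sketch.

```python
def classify_updates(updates):
    """updates: list of ('A', route) or ('W', None) for one peer and prefix,
    where route = (as_path_tuple, next_hop). Returns one label per update
    (None when the update does not complete a classified pair)."""
    labels = []
    last_route = None  # most recently announced route, kept across withdrawals
    prev_kind = None
    for kind, route in updates:
        label = None
        if kind == "W":
            if prev_kind == "W":
                label = "WWDup"  # repeated withdrawal of an unreachable prefix
        elif kind == "A":
            if prev_kind == "W" and last_route is not None:
                label = "WADup" if route == last_route else "WADiff"
            elif prev_kind == "A":
                label = "AADup" if route == last_route else "AADiff"
            last_route = route
        else:
            raise ValueError(f"unknown update type: {kind}")
        labels.append(label)
        prev_kind = kind
    return labels

r1 = ((1, 2, 3), "10.0.0.1")
r2 = ((1, 4, 3), "10.0.0.2")
stream = [("A", r1), ("A", r1), ("A", r2), ("W", None),
          ("W", None), ("A", r2), ("W", None), ("A", r1)]
print(classify_updates(stream))
```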
102
Measurement findings:
overview
Year 2000
BGP updates more than one order of
magnitude larger than expected
Routing information dominated by
pathological updates
Implementation problems
BGP self-synchronization
Unconstrained routing policies
103
Routing problem findings
Implementation problems
Redundant updates
Routers do not maintain the history of the
announcements sent to neighbors
Self-synchronization
BGP routers exchange information simultaneously
may lead to periodic link/router failures
Unconstrained routing policies may lead to
persistent route oscillations
104
Instability measurement
Instability and redundant updates
exhibit strong correlation with load
(30-second, 24-hour, and seven-day
periods)
Instability usually exhibits high
frequency
Pathological updates exhibit both high
and low frequencies
105
Non-localized instability
No single AS dominates instability
statistics
No correlation between size of AS and
its impact on instability statistics
There is no small set of paths that
dominate instability statistics
106
Measurement conclusions
Routing in the Internet exhibits many
undesirable behaviors
Instability over a wide range of time scales
Asymmetric routes
Network outages
Problem seems to worsen
Many problems are due to software
bugs or inefficient router architectures
107
Lessons
Even after decades of experience routing in
the Internet is not a solved problem
This attests to the difficulty and complexity of
building distributed algorithms in the Internet,
i.e., in a heterogeneous environment with
products from various vendors
Simple protocols may increase the chance to
be
Understood
Implemented right
108
Better understanding of BGP
dynamics
Difficulties
Multiple administrative domains
Unknown information (policies, topologies)
Unknown operational practices
Ambiguous protocol specs
Proposal: a controlled active measurement infrastructure
for continuous BGP monitoring – BGP Beacons.
109
What is a BGP Beacon?
An unused, globally visible prefix with known
Announce/Withdrawal schedule
For long-term, public use
110
Who will benefit from BGP
Beacon?
Researchers: study BGP dynamics
To calibrate and interpret BGP updates
To study convergence behavior
To analyze routing and data plane interaction
Network operators
Serve to debug reachability problems
Test effects of configuration changes:
E.g., flap damping setting
111
Related work
Differences from Labovitz’s “BGP fault injector”
Long-term, publicly documented
Varying advertisement schedule
Beacon sequence number (AGG field)
Enabler for much research in routing dynamics
RIPE RIS Beacons
Set up at 9 exchange points
112
Active measurement
infrastructure
Many Observation points:
1:Oregon RouteViews
Internet
ISP
2. RIPE
ISP
3.AT&T
ISP
Send
route update
Upstream
provider
Stub AS
BGP Beacon #1
198.133.206.0/24
ISP
ISP
ISP
ISP
Upstream
provider
ISP
ISP
ISP
4. Verio
5. MIT
6.Berkeley
113
Deployed PSG Beacons
Prefix             Src AS   Start date   Upstream provider AS   Beacon host     Beacon location
198.133.206.0/24   3130     8/10/02      2914, 1239             Randy Bush      WA, US
192.135.183.0/24   5637     9/4/02       3701, 2914             Dave Meyer      OR, US
203.10.63.0/24     1221     9/25/02      1221                   Geoff Huston    Australia
198.32.7.0/24      3944     10/24/02     2914, 8001             Andrew Partan   MD, US
192.83.230.0/24    3130     06/12/03     2914, 1239             Randy Bush      WA, US
114
Deployed PSG Beacons
B1, 2, 3, 5:
Announced and withdrawn with a fixed period
(2 hours) between updates
1st daily ANN: 3:00AM GMT
1st daily WD: 1:00AM GMT
B4: varying period
B5: fail-over experiments
Software available at:
http://www.psg.com/~zmao
115
Beacon 5 schedule
Live host behind
the beacon for
data analysis
Study fail-over
Behavior for
multi-homed
customers
116
Beacon terminology
Internet
Beacon
AS
Beacon prefix:
198.133.206.0/24
Input signal:
Beacon-injected change
3:00:00 GMT: Announce (A0)
5:00:00 GMT: Withdrawal (W)
Output signal:
RouteView
AT&T
5:00:10 A1
5:00:40 W
5:01:10 A2
Signal length: number of updates in
output signal (3 updates)
Signal duration: time between first
and last update in the signal (5:00:10 - 5:01:10, 60 seconds)
Inter-arrival time: time between
consecutive updates
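The three definitions can be made concrete with a minimal sketch that computes them from the update timestamps in the slide's example output signal. The function name and the returned dictionary keys are illustrative.

```python
from datetime import datetime

def signal_metrics(timestamps):
    """Compute signal length, duration, and inter-arrival times (seconds)
    from the H:M:S timestamps of the updates in one output signal."""
    times = sorted(datetime.strptime(t, "%H:%M:%S") for t in timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "length": len(times),
        "duration_s": (times[-1] - times[0]).total_seconds() if times else 0.0,
        "inter_arrival_s": gaps,
    }

# Output signal from the slide: A1, W, A2 following the 5:00:00 withdrawal.
metrics = signal_metrics(["5:00:10", "5:00:40", "5:01:10"])
print(metrics)
```

For the slide's example this gives a signal length of 3 and a duration of 60 seconds, with two 30-second inter-arrival gaps.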
117
Process Beacon data
Identify output signals, ignore external events
Data cleaning
Anchor prefix as reference
Same origin AS as beacon prefix
Statically nailed down
Minimize interference between consecutive
input signals
Beacon period is set to 2 hours
Time stamp and sequence number
Attach additional information in the BGP updates
Make use of a transitive attribute: Aggregator fields
118
Beacon data cleaning process
Goal
Clearly identify
updates associated
with injected routing
change
Discard beacon
events influenced by
external routing
changes
119
Cumulative Beacon statistics:
significant noise
Current observation points:
111 peers: RIPE, Route-View, Berkeley,
MIT, MIT-RON nodes, ATT-Research, AT&T,
AMS-IXP, Verio
Avg expansion: 2*0.2+1*0.8=1.2
120
Cumulative Beacon statistics:
significant noise
Example response to ANN-beacon at peer p
R1: ASpath= 286 209 1 3130 3927
No. transient routes=2
R2: ASpath= 286 209 2914 3130 3927
Out-signal length=1
100 events: 20: R1 R2, 80: R2
Beacon   Max no. transient routes   Max ANN-out-signal length   Max WD-out-signal length   Max ANN-avg expansion   Max WD-avg expansion
1        186                        11                          14                         9.7                     11.2
2        179                        9                           15                         7.0                     10.8
3        117                        16                          13                         5.8                     11.4
4        307                        18                          15                         8.8                     16.3
121
Cisco vs. Juniper
update rate-limiting
Known last-hop Cisco
and Juniper routers
from the same AS and
location
Average signal length:
average number of
updates observed for a
single beacon-injected
change
122
“Cisco-like” last-hop routers
Linear increase in
signal duration wrt
signal length
Slope=30 second
Due to Cisco’s
default rate-limiting
setting
123
“Juniper-like” last-hop routers
Signal duration
relatively stable
wrt increase in
signal length
Shorter signal
duration compared
to “Cisco-like”
last-hops
124
Route flap damping
A mechanism to punish unstable routes
by suppressing them
Reduce router processing load due to
instability
Prevent sustained routing oscillations
Do not sacrifice convergence times for
well-behaved routes
There is conjecture a single announcement can
cause route suppression.
125
RFC2439: Route flap damping
Scope
Inbound external routes
Per neighbor, per destination
Penalty
Flap: route change
Increases for each flap
Decays exponentially: P(t') = P(t) e^{-\lambda (t' - t)}
[Figure: Cisco default setting. Exponentially decayed penalty (0 to 3000) versus time (0 to 32 min), with the suppress threshold at 2000 and the reuse threshold at 750]
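The penalty dynamics can be simulated in a few lines. The thresholds come from the slide's figure. The penalty of 1000 per flap and the 15-minute half-life are assumed here as the usual Cisco defaults and are not stated on the slide. Function names are illustrative.

```python
import math

PENALTY_PER_FLAP = 1000    # assumed Cisco default, not stated on the slide
SUPPRESS_THRESHOLD = 2000  # from the figure
REUSE_THRESHOLD = 750      # from the figure
HALF_LIFE_MIN = 15.0       # assumed Cisco default, not stated on the slide
DECAY = math.log(2) / HALF_LIFE_MIN  # lambda in P(t') = P(t) e^{-lambda (t' - t)}

def decayed(penalty, elapsed_min):
    """Penalty after elapsed_min minutes without further flaps."""
    return penalty * math.exp(-DECAY * elapsed_min)

def simulate(flap_times_min):
    """Return (time, penalty, suppressed) right after each flap."""
    penalty, last_t, suppressed = 0.0, 0.0, False
    history = []
    for t in flap_times_min:
        penalty = decayed(penalty, t - last_t)
        if suppressed and penalty < REUSE_THRESHOLD:
            suppressed = False  # penalty decayed enough to reuse the route
        penalty += PENALTY_PER_FLAP
        if penalty > SUPPRESS_THRESHOLD:
            suppressed = True
        history.append((t, penalty, suppressed))
        last_t = t
    return history

def minutes_until_reuse(penalty):
    """Time for a penalty to decay down to the reuse threshold."""
    if penalty <= REUSE_THRESHOLD:
        return 0.0
    return math.log(penalty / REUSE_THRESHOLD) / DECAY

history = simulate([0, 2, 4])  # three flaps, two minutes apart
for t, penalty, suppressed in history:
    print(t, round(penalty), suppressed)
print(round(minutes_until_reuse(history[-1][1]), 1), "minutes until reuse")
```

Under these assumed parameters the third flap at minute 4 pushes the penalty past the suppress threshold, and the route stays suppressed for roughly 28 more minutes, which lines up with the 32-minute mark on the figure's time axis.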
126
Route flap damping analysis
Strong evidence for
withdrawal- and
announcement-triggered
suppression.
127
Distinguish between
announcement and withdrawal
Summary:
•WD-triggered suppression is more likely than ANN-triggered suppression
•Cisco: overall more likely to trigger suppression than Juniper (AAAW pattern)
•Juniper: more aggressive for the AWAW pattern
128
Convergence analysis
Summary:
•Withdrawals converge more slowly than announcements
•Most beacon events converge within 3 minutes
129
Output signal duration
[Figure: output signal duration; axis ticks at 30, 60, 90, 120]
130
Beacon 1’s upstream change
Single-homed (AS2914)
Multi-homed (AS1, 2914)
Multi-homed (AS1239, 2914)
131
Beacon for identifying router
behavior
Rate-limiting timer: 30 seconds
Beacon 2
seen from
RouteView data
Different
rate-limiting
behavior:
Cisco vs.
Juniper
132
Inter-arrival time analysis
133
Inter-arrival time modeling
Geometric distribution (body):
Update rate-limiting behavior: every 30 sec
Prob(missing update train) independent of how many
already missed
Mass at 1:
Discretization of timestamps for times<1
Shifted exponential distribution (tail):
Most likely due to route flap damping
134
Motivation
A backbone network is vulnerable to routing changes that occur in other domains.
[Figure: path from a source to a destination across AS1, AS2, AS3, and AS4, with border routers (BR); callouts: Failure, Disruption, Congestion, Mitigation]
135
Goal
Identify important routing anomalies
Lost reachability
Persistent flapping
Large traffic shifts
Contributions:
•Build a tool to identify a small number of
important routing disruptions from a large
volume of raw BGP updates in real time.
•Use the tool to characterize routing
disruptions in an operational network
136
Capturing Routing Changes
A large operational network
(8/16/2004 – 10/10/2004)
[Figure: network diagram with nodes labeled BR and C, and a BGP monitor (labels: BGP, CPE, Monitor)]
137
Challenges
Large volume of BGP updates
Millions daily, very bursty
Too much for an operator to manage
Different from root-cause analysis
Identify changes and their effects
Focus on actionable events rather than
diagnosis
Diagnose causes in/near the AS
138
System Architecture
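BGP Updates (10^6) -> BGP Update Grouping -> Events (10^5) -> Event Classification -> “Typed” Events -> Event Correlation -> Clusters (10^3) -> Traffic Impact Prediction -> Large Disruptions (10^1)
Side output of BGP Update Grouping: Persistent Flapping Prefixes (10^1)
Side output of Event Correlation: Frequent Flapping Prefixes (10^1)
Inputs: BGP updates and Netflow data collected from routers (figure labels: BR, E)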
From millions of updates to a few dozen reports
139
Grouping BGP Updates into Events
Challenge: A single routing change
leads to multiple update messages
affects routing decisions at multiple routers
[Figure: BGP Updates -> BGP Update Grouping -> Events, with Persistent Flapping Prefixes as a side output]
Approach:
•Group together all updates for a prefix with inter-arrival < 70 seconds
•Flag prefixes with changes lasting > 10 minutes.
140
Grouping Thresholds
Based on our understanding of BGP and data
analysis
Event timeout: 70 seconds
2 * MRAI timer + 10 seconds
98% inter-arrival time < 70 seconds
Convergence timeout: 10 minutes
BGP usually converges within a few minutes
99.9% events < 10 minutes
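The grouping rule can be sketched as follows. This is an offline simplification of what the slides describe as an online system; the function name, the `(timestamp, prefix)` input format, and the example prefixes are assumptions made for illustration. Only the two thresholds come from the slides.

```python
from collections import defaultdict

EVENT_TIMEOUT = 70          # seconds; 2 * MRAI timer + 10 seconds
CONVERGENCE_TIMEOUT = 600   # seconds; 10 minutes


def group_updates(updates):
    """Group (timestamp_seconds, prefix) updates into per-prefix events.

    Returns (events, persistent), where events is a list of
    (prefix, start, end, n_updates) and persistent is the set of prefixes
    with at least one event lasting longer than CONVERGENCE_TIMEOUT.
    """
    by_prefix = defaultdict(list)
    for ts, prefix in updates:
        by_prefix[prefix].append(ts)

    events = []
    for prefix, times in by_prefix.items():
        times.sort()
        start = prev = times[0]
        count = 1
        for ts in times[1:]:
            if ts - prev < EVENT_TIMEOUT:
                count += 1                      # same event continues
            else:
                events.append((prefix, start, prev, count))
                start, count = ts, 1            # gap too long: new event
            prev = ts
        events.append((prefix, start, prev, count))

    persistent = {p for p, s, e, _ in events if e - s > CONVERGENCE_TIMEOUT}
    return events, persistent


updates = [(0, "192.0.2.0/24"), (30, "192.0.2.0/24"), (200, "192.0.2.0/24")]
updates += [(t, "198.51.100.0/24") for t in range(0, 661, 60)]
events, persistent = group_updates(updates)
```

Here the first prefix yields two events (updates at 0 and 30 seconds, then a separate one at 200 seconds), and the second prefix, which keeps changing every 60 seconds for 11 minutes, is flagged as persistently flapping.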
141
Persistent Flapping Prefixes
A surprising finding:
15.2% of updates were caused by
persistent-flapping prefixes even though
flap damping is enabled.
Types of persistent flapping
Conservative damping parameters (78.6%)
Protocol oscillations due to MED (18.3%)
Unstable interfaces or BGP sessions (3.0%)
142
Example: Unstable eBGP Session
AE
ISP
DE
Peer
BE
CE
p
Customer
Flap damping parameters are session-based
Damping not implemented for iBGP sessions
143
Event Classification
Challenge: Major concerns in network management
Changes in reachability
Heavy load of routing messages on the routers
Change of flow of the traffic through the network
Events
Event
Classification
“Typed” Events,
e.g., Loss/Gain of Reachability
Solution: classify events by severity of their impact
144
Event Category – “No Disruption”
“No Disruption”: no border routers have any traffic shift. (50.3%)
[Figure: prefix p reachable through AS1 and AS2; ISP border routers AE, BE, CE, DE, EE; annotation: No Traffic Shift]
145
Event Category – “Internal Disruption”
“Internal Disruption”: all traffic shifts are internal. (15.6%)
[Figure: prefix p reachable through AS1 and AS2; ISP border routers AE, BE, CE, DE, EE; annotation: Internal Traffic Shift]
146
Event Category – “Single External Disruption”
“Single External Disruption”: only one of the traffic shifts is external (20.7%)
[Figure: prefix p reachable through AS1 and AS2; ISP border routers AE, BE, CE, DE, EE; annotation: External Traffic Shift]
147
Statistics on Event Classification
Category                      Events  Updates
No Disruption                 50.3%   48.6%
Internal Disruption           15.6%   3.4%
Single External Disruption    20.7%   7.9%
Multiple External Disruption  7.4%    18.2%
Loss/Gain of Reachability     6.0%    21.9%
The first 3 categories have significant day-to-day variations
The number of updates per event depends on the type of event
and the number of affected routers
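The category definitions above can be read as a small decision procedure. The sketch below is one possible reading, not the tool's actual logic: the input format (a per-border-router label of "none", "internal", or "external") is hypothetical, "Multiple External Disruption" is assumed to mean more than one external shift, and checking reachability first is also an assumption.

```python
def classify_event(shifts, reachability_changed=False):
    """Classify one routing event.

    shifts: dict mapping border router name -> "none" | "internal" | "external"
            (hypothetical representation of the traffic shift at each router).
    """
    if reachability_changed:
        return "Loss/Gain of Reachability"
    external = sum(1 for s in shifts.values() if s == "external")
    internal = sum(1 for s in shifts.values() if s == "internal")
    if external == 0 and internal == 0:
        return "No Disruption"
    if external == 0:
        return "Internal Disruption"
    if external == 1:
        return "Single External Disruption"
    return "Multiple External Disruption"


example = classify_event({"AE": "none", "BE": "internal", "CE": "external"})
print(example)  # Single External Disruption
```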
148
Event Correlation
Challenge: A single routing change
affects multiple destination prefixes
“Typed”
Events
Event
Correlation
Clusters
Solution:
group the same-type, close-occurring events
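One simple way to group same-type, close-occurring events is a single pass over events sorted by time. The slides do not give the time window, so the `max_gap` of 60 seconds below is purely a placeholder, and the event tuple layout is an assumption.

```python
def correlate(events, max_gap=60):
    """Cluster events of the same type that occur close together in time.

    events: list of (start_time_seconds, event_type, prefix).
    max_gap: hypothetical window; an event joins the open cluster of its
             type if it starts within max_gap seconds of that cluster's
             latest event.
    """
    clusters = []
    open_by_type = {}
    for start, etype, prefix in sorted(events):
        current = open_by_type.get(etype)
        if current is not None and start - current["end"] <= max_gap:
            current["end"] = start
            current["prefixes"].append(prefix)
        else:
            current = {"type": etype, "start": start, "end": start,
                       "prefixes": [prefix]}
            clusters.append(current)
            open_by_type[etype] = current
    return clusters


clusters = correlate([
    (0, "internal", "p1"),
    (10, "internal", "p2"),
    (15, "external", "p3"),
    (500, "internal", "p4"),
])
```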
149
EBGP Session Reset
Caused most of “single external disruption”
events
Check if the number of prefixes using that
session as the best route changes dramatically
[Figure: number of prefixes versus time, with the session failure and session recovery marked]
Validation with Syslog router report (95%)
150
Hot-Potato Changes
“Hot-potato routing” = route to closest egress point
[Figure: ISP with egress routers AE, BE, CE toward prefix P; internal path costs 11, 9, 10]
Caused “internal disruption” events
Validation with OSPF measurement (95%)
[Teixeira et al., SIGMETRICS ’04]
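Hot-potato selection amounts to picking the egress point with the smallest internal (IGP) cost among egress points with equally good BGP routes. In the sketch below, the assignment of the costs 11, 9, and 10 to AE, BE, and CE is an assumption (the figure's layout is not recoverable from the transcript), and the cost change is hypothetical.

```python
def closest_egress(igp_cost):
    """Return the egress router with the lowest IGP cost."""
    return min(igp_cost, key=igp_cost.get)


# Assumed mapping of the figure's costs to egress routers.
costs = {"AE": 11, "BE": 9, "CE": 10}
before = closest_egress(costs)

# Hypothetical internal change: the path to BE becomes more expensive,
# so traffic shifts to another egress without any external BGP change.
costs["BE"] = 12
after = closest_egress(costs)
print(before, after)  # BE CE
```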
151
Traffic Impact Prediction
Challenge: Routing changes have different impacts on the network,
depending on the popularity of the affected destinations
[Figure: Clusters -> Traffic Impact Prediction -> Large Disruptions, with Netflow data collected from routers (labels: BR, E) as a second input]
Solution: weigh each cluster by traffic volume
152
Traffic Impact Prediction
Traffic weight
Per-prefix measurement from netflow
10% of prefixes account for 90% of traffic
Traffic weight of a cluster
the sum of “traffic weight” of the prefixes
A small number of large clusters have large
traffic weight
Mostly session resets and hot-potato changes
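The cluster weight defined above is a plain sum over prefixes. A minimal sketch, assuming per-prefix weights are available as a dictionary of traffic fractions (the names and numbers are illustrative only):

```python
def cluster_weight(prefixes, traffic_weight):
    """Sum of the per-prefix traffic weights; unknown prefixes count as 0."""
    return sum(traffic_weight.get(p, 0.0) for p in prefixes)


def rank_clusters(clusters, traffic_weight):
    """Order clusters (lists of prefixes) from largest to smallest weight."""
    return sorted(clusters,
                  key=lambda c: cluster_weight(c, traffic_weight),
                  reverse=True)


# Illustrative weights: fraction of total traffic per prefix.
weights = {"p1": 0.5, "p2": 0.25, "p3": 0.125}
ranked = rank_clusters([["p3"], ["p1", "p2"], ["p4"]], weights)
print(ranked[0])  # ['p1', 'p2']
```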
153
Performance Evaluation
Memory
Static memory: “current routes”, 600 MB
Dynamic memory: “clusters”, 300 MB
Speed
99% of 1-second intervals of updates can be
processed within 1 second
Occasional execution lag
Every interval of 70 seconds of updates can be
processed within 70 seconds
Measurements were based on 900MHz CPU
154
Conclusion of BGP
Troubleshooting Tool
BGP troubleshooting system
Fast, online fashion
Operators’ concerns (reachability, flapping, traffic)
Significant information reduction
millions of updates -> a few dozen large disruptions
Uncovered important network behavior
Hot-Potato changes
Session resets
Persistent-flapping prefixes
155
Part IV: BGP Modeling
BGP Is Not Guaranteed to
Converge!
BGP is not guaranteed to converge to a
stable routing. Policy inconsistencies
can lead to “livelock” protocol
oscillations.
Goal:
Design a simple, tractable and complete
model of BGP
Example application: sufficient condition to
guarantee convergence.
157
BGP is Solving What Problem?
Underlying problem  Distributed means of computing a solution
Shortest Paths      RIP, OSPF, IS-IS
X?                  BGP
X can
aid in the design of policy analysis algorithms and
heuristics,
aid in the analysis and design of BGP and
extensions,
help explain some BGP routing anomalies,
provide a fun way of thinking about the protocol
158
Separate Dynamic and Static
Semantics
Static semantics:
BGP policies -> Stable Paths Problem
Dynamic semantics:
BGP -> SPVP
SPVP: Simple Path Vector Protocol
A distributed algorithm for solving the Stable Paths
Problem
159
What is Stable Paths
Problem?
A graph of nodes and edges,
Node 0, called the origin,
For each non-zero node, a set of permitted paths to the origin. This set always contains the “null path”.
A ranking of permitted paths at each node. Null path is always least preferred.
Example (permitted paths at each node, from most preferred to least preferred, null path not shown):
Node 1: 130, 10
Node 2: 210, 20
Node 3: 30
Node 4: 420, 430
Node 5: 5210
160
A Solution to SPP
A solution is an assignment of permitted
paths to each node such that
node u’s assigned path is either the null
path or is a path uwP, where wP is
assigned to node w and {u,w} is an edge
in the graph,
each node is assigned the highest ranked
path among those consistent with the
paths assigned to its neighbors
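The two conditions can be checked mechanically. The sketch below assumes paths are tuples of node ids, each node's permitted paths are listed from most to least preferred, and the null path is the empty tuple; these representation choices and the function names are illustrative. The instance is the five-node example from the previous slide.

```python
def best_consistent(u, permitted, assignment):
    """Highest-ranked permitted path at u that extends the path
    currently assigned to its next hop; the null path () otherwise."""
    for path in permitted[u]:              # most preferred first
        next_hop = path[1]
        if assignment.get(next_hop) == path[1:]:
            return path
    return ()


def is_solution(permitted, assignment):
    """True if every node holds its best path consistent with its neighbors."""
    return all(assignment[u] == best_consistent(u, permitted, assignment)
               for u in permitted)


permitted = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 0)],
    4: [(4, 2, 0), (4, 3, 0)],
    5: [(5, 2, 1, 0)],
}
# Node 0 is the origin and always holds the trivial path (0,).
stable = {0: (0,), 1: (1, 3, 0), 2: (2, 0), 3: (3, 0), 4: (4, 2, 0), 5: ()}
shortest = {0: (0,), 1: (1, 0), 2: (2, 0), 3: (3, 0), 4: (4, 2, 0), 5: ()}
print(is_solution(permitted, stable), is_solution(permitted, shortest))
```

The first assignment is stable even though node 1 does not use its shortest path and node 5 is left with the null path; the second is not stable because node 1 would switch to its preferred path 130.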
161
A Solution to SPP
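A solution need not represent a shortest path tree or a spanning tree
[Figure: the example SPP with a solution marked; permitted paths, most preferred first: node 1: 130, 10; node 2: 210, 20; node 3: 30; node 4: 420, 430; node 5: 5210]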
162
There can be Multiple
Solutions to an SPP
[Figure: DISAGREE, with nodes 0, 1, 2. Permitted paths, most preferred first: node 1: 120, 10; node 2: 210, 20. Two panels show the first solution and the second solution.]
163
Multiple Solutions Can Occur
Due to Recovery:
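[Figure: three panels of a network with nodes 0, 1, 2, 3, a primary link, and a backup link; panel captions: “Remove primary link” and “Restore primary link”. Permitted paths shown in the figure include 10 and 1230 at node 1, 230 and 210 at node 2, and 3210 and 30 at node 3.]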
164
Ranking BGP Paths
Highest local preference
Shortest AS path length
Lowest origin type: IGP < EGP < INCOMPLETE
Lowest MED value
EBGP preferred over IBGP
Lowest IGP cost
Tie breaking
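The ranking steps can be expressed as a lexicographic sort key. This is a simplified sketch, not a router implementation: it assumes the standard rule that EBGP-learned routes are preferred over IBGP-learned ones, compares MED across all routes (real routers normally compare MED only among routes from the same neighbor AS), and uses a hypothetical `router_id` field as the final tie-breaker.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


@dataclass
class Route:
    local_pref: int
    as_path: tuple
    origin: str
    med: int
    learned_via_ebgp: bool
    igp_cost: int
    router_id: int  # hypothetical tie-breaker: lowest value wins


def preference_key(r):
    """Smaller key means more preferred; fields follow the slide's order."""
    return (
        -r.local_pref,                  # highest local preference
        len(r.as_path),                 # shortest AS path
        ORIGIN_RANK[r.origin],          # IGP < EGP < INCOMPLETE
        r.med,                          # lowest MED (simplified comparison)
        0 if r.learned_via_ebgp else 1, # EBGP over IBGP
        r.igp_cost,                     # lowest IGP cost
        r.router_id,                    # tie breaking
    )


def best_route(routes):
    return min(routes, key=preference_key)


r1 = Route(100, (7018, 1239, 3), "IGP", 0, True, 10, 1)
r2 = Route(200, (7018, 1239, 701, 3), "IGP", 0, False, 50, 2)
r3 = Route(100, (7018, 3), "INCOMPLETE", 5, False, 50, 3)
```

Here `best_route([r1, r2, r3])` picks `r2` because local preference is compared first, and `best_route([r1, r3])` picks `r3` because its AS path is shorter.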
165
Bad Gadget: No Solution
Stage 1:
1: [10]
2: [210]
3: [30]
Stage 2:
1:[130]
2:[20]
3:[320]
Back to stage 1
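[Figure: BAD GADGET, with nodes 0, 1, 2, 3, 4. Permitted paths, most preferred first: node 1: 130, 10; node 2: 210, 20; node 3: 320, 30]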
166
Bad Gadget: No Solution
Stage 1:
1: [10]
2: [20]
3: [320]
Stage 2:
1:[130]
2:[210]
3:[30]
Back to stage 1
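[Figure: BAD GADGET, with nodes 0, 1, 2, 3, 4. Permitted paths, most preferred first: node 1: 130, 10; node 2: 210, 20; node 3: 320, 30]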
167
Has A Solution, But Can Get
Trapped:
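[Figure: permitted paths, most preferred first: node 1: 120, 10; node 2: 210, 20; node 3: 310, 3120; node 4: 4310, 453120, 43120; node 5: 5310, 563120, 53120; node 6: 6310, 643120, 63120]
Nodes 3 to 6: this part has a solution only when node 1 is assigned the direct path (1 0).
Nodes 0 to 2: as with DISAGREE, this part has two distinct solutions
168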
How To Solve An SPP?
Exponential complexity
Just enumerate all path assignments,
And check stability of each….
NP-complete
3-SAT can be reduced to SPP
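The enumerate-and-check approach is easy to write down for small instances. The sketch below uses an assumed representation (paths as tuples, permitted paths listed most preferred first, `()` as the null path) and tries every combination of permitted paths, which is exponential in the number of nodes. BAD GADGET is encoded with only the three nodes whose path lists appear on the slides.

```python
from itertools import product


def best_consistent(u, permitted, assignment):
    """Highest-ranked permitted path at u consistent with its next hop."""
    for path in permitted[u]:
        if assignment.get(path[1]) == path[1:]:
            return path
    return ()


def solutions(permitted):
    """Enumerate all path assignments and keep the stable ones."""
    nodes = sorted(permitted)
    found = []
    for choice in product(*[permitted[u] + [()] for u in nodes]):
        assignment = dict(zip(nodes, choice))
        assignment[0] = (0,)               # the origin's trivial path
        if all(assignment[u] == best_consistent(u, permitted, assignment)
               for u in nodes):
            found.append({u: assignment[u] for u in nodes})
    return found


DISAGREE = {1: [(1, 2, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)]}
BAD_GADGET = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 2, 0), (3, 0)],
}
print(len(solutions(DISAGREE)), len(solutions(BAD_GADGET)))  # 2 0
```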
170
Distributed Algorithms to
Solve SPP
OSPF-like
Distribute topology and path ranks
Solve SPP locally
Exponential worst case
How to avoid loops if multiple solutions exist?
RIP-like:
Pick the best path from neighbors’ paths
Tell neighbors about changes
Can diverge
Not guaranteed to find a solution even if it exists
No bound on convergence time
171
SPVP Protocol
Pick the best path available at any time
process spvp[u] {
  receive P from w {
    rib-in(u <- w) := u P
    if rib(u) != best(u) {
      rib(u) := best(u)
      foreach v in peers(u) {
        send rib(u) to v
      }
    }
  }
}
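The behavior of this protocol can be explored with a small simulation. The sketch below is a simplification under stated assumptions: instead of exchanging messages, nodes read each other's current path directly, and they are activated in a fixed round-robin order. Path tuples, the `()` null path, and all function names are illustrative. Under this schedule DISAGREE settles on one of its two solutions, while the three-node BAD GADGET returns to a state it has already visited and therefore never stabilizes.

```python
def best_path(u, permitted, rib):
    """Highest-ranked permitted path at u that extends the next hop's path."""
    for path in permitted[u]:              # most preferred first
        if rib.get(path[1]) == path[1:]:
            return path
    return ()


def run_spvp(permitted, max_rounds=50):
    """Round-robin activation; returns (outcome, final rib)."""
    rib = {u: () for u in permitted}
    rib[0] = (0,)                          # origin
    seen = set()
    for _ in range(max_rounds):
        changed = False
        for u in sorted(permitted):
            new = best_path(u, permitted, rib)
            if new != rib[u]:
                rib[u] = new               # a real node would now notify peers
                changed = True
        if not changed:
            return "converged", rib
        state = tuple(sorted(rib.items()))
        if state in seen:                  # deterministic schedule: cycle found
            return "oscillates", rib
        seen.add(state)
    return "undecided", rib


DISAGREE = {1: [(1, 2, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)]}
BAD_GADGET = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 2, 0), (3, 0)],
}
print(run_spvp(DISAGREE)[0], run_spvp(BAD_GADGET)[0])
```

Other activation orders can behave differently; in particular, DISAGREE can oscillate when both nodes always update simultaneously.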
172
SPVP and SPP
SPVP wanders around assignment
space
SPP Solvable
must converge
SPVP Can Diverge
must diverge
173
A sufficient condition for
sanity
If an instance of SPP has an acyclic dispute digraph, then
Static (SPP)                        Dynamic (SPVP)
solvable                            safe (can’t diverge)
unique solution                     predictable restoration
all sub-problems uniquely solvable  robust with respect to link/node failures
174
Dispute Digraph Example
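[Figure: BAD GADGET II, with nodes 0, 1, 2, 3, 4. Permitted paths, most preferred first: node 1: 130, 10; node 2: 210, 20; node 3: 3420, 30; node 4: 420, 430. Its dispute digraph over the paths 10, 20, 30, 130, 210, 420, 430, 3420 contains a CYCLE.]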
175
Dispute Wheels
[Figure: dispute wheel with pivot nodes u_0, u_1, ..., u_k, spoke paths Q_0, ..., Q_k, and rim paths R_0, ..., R_k, where R_i runs from u_i to u_(i+1)]
•At u_i, rank of Q_i is less than or equal to rank of R_i Q_(i+1)
•There exists a dispute wheel iff there exists a cycle in the dispute digraph
176
Dispute Wheel Example
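[Figure: dispute wheel example with nodes 0, 1, 2, 3. Permitted paths, most preferred first: node 1: 1230, 120, 10; node 2: 2310, 230, 20; node 3: 3120, 310, 30]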
177
A Dynamic Solution
Extend SPVP with a history attribute,
A route’s history contains a path in the
dispute digraph that “explains” how the route
was obtained,
A route history will contain a dispute cycle if
and only if a policy dispute is dynamically
realized.
If a route’s history contains a cycle, then
suppress it ….
178