Towards an Accurate AS-level Traceroute Tool


Internet Routing: Measurement,
Modeling, and Analysis
ACM Sigmetrics 2005 Tutorial
Dr. Jia Wang
[email protected]
AT&T Labs Research
Florham Park, NJ 07932, USA
http://www.research.att.com/~jiawang/
Prof. Zhuoqing Morley Mao
[email protected]
Department of EECS
University of Michigan
Ann Arbor, MI 48109, USA
http://www.eecs.umich.edu/~zmao/
Outline
1. Overview of Inter-domain routing
2. Measuring inter-domain paths
3. BGP Measurement
4. BGP Modeling
Our opinions should not be taken to represent AT&T policies
2
Part I: Overview of Interdomain Routing
Internet
 Loose cooperative effort of Internet
Service Providers (ISPs)
 E.g., AT&T, Sprint, UUNet, AOL
 Best effort service
 Connectedness
 Anyone connected to the Internet can
exchange traffic with anyone else
connected to the Internet
4
Internet routing
routes
Control plane:
exchange routes
Internet
: Routing session
Data plane:
forward traffic
IP traffic
Fail over to alternate route
rusty.cs.berkeley.edu
www.cnn.com
IP=169.229.62.116
Prefix=169.229.0.0/16
IP=64.236.16.52
5
Prefix=64.236.16.0/20
Internet routing domain
 Autonomous routing domain
 Network devices under same technical and administrative
control
 Common routing policy
 E.g., ISPs, enterprise networks
 Autonomous system
 Autonomous routing domain with an AS number (ASN)
 AS numbers: 16-bit integers
 Public AS number: 1 – 64511
 Private AS number: 64512 – 65535
 Examples
 AT&T: 7018, 6431, …
 Sprint: 1239, 1240, …
 MIT: 3
6
More than 20,000 ASes today
Internet
Autonomous
System
ISP
Level3
Calren
Berkeley
ISP
ISP
Qwest
ISP
business
ISP
ISP
AT&T
Sprint UUnet
ISP
ISP
IP traffic
University
company
GNN
CNN
7
Internet routing architecture
Intra-domain
routing
Calren
Berkeley
Level3
IP traffic
Internet
Inter-domain
routing
GNN
CNN
8
Intra-domain routing
 Run within a certain network infrastructure
 Optimize routes taken between points within
a network
 Interior Gateway Protocols (IGPs)
 Metrics based
 OSPF (Open Shortest Path First)
 RIP (Routing Information Protocol)
 IS-IS (Intermediate System to Intermediate System)
9
Inter-domain routing
 Run between networks
 Provide full connectivity of entire
Internet
 Exterior Gateway Protocol (EGP)
 Policy based
 BGP (Border Gateway Protocol)
10
Link state protocols
 Examples: OSPF, IS-IS
 Based on Dijkstra’s shortest path computation
 Each router periodically floods immediate
reachability information to other routers
 Fast convergence
 High communication and computation
overhead
 Not scalable for large networks
 Requires periodic refreshes
11
Vectoring protocols
 Distance vs. Path Vector
 Distance: hop count (RIP)
 Path: entire path (BGP)
 Helps identify loops
 Supports policy-based routing based on path
 Minimal communication overhead
 Takes longer to converge, i.e., in
proportion to the maximum path length
12
Link state vs. vectoring
        Link state      Vectoring
IGP     OSPF, IS-IS     RIP
EGP                     BGP
BGP is a path vector protocol
13
Classful addressing
 IPv4: 32 bits
 Five classes of networks
Class  Address  Mask            # of networks  # of hosts
A      0*       255.0.0.0       128            ~16.7M
B      10*      255.255.0.0     16384          65535
C      110*     255.255.255.0   ~2.1M          255
D      Used for multicast
E      Reserved and currently unused
Improve scaling factor of routing in the Internet => classless
14
CIDR: Classless Inter-domain
Routing (RFC1519)
 No implicit mask based on the class of the network
 Explicit masks passed in the routing protocol
 Allow aggregation and hierarchical routing
IP address: 12.70.0.0
Mask: 255.255.252.0
Address: 00001100 01000110 00000000 00000000
Mask:    11111111 11111111 11111100 00000000
Network prefix: the first 22 bits (where the mask is 1)
Host identifier: the remaining 10 bits
CIDR representation: 12.70.0.0/22
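The mask-to-prefix-length conversion on this slide can be checked with a short sketch. It uses Python's standard `ipaddress` module; the helper name `to_bits` is an illustrative assumption, not part of the tutorial.

```python
import ipaddress

# The slide's example: address 12.70.0.0 with explicit mask 255.255.252.0.
network = ipaddress.ip_network("12.70.0.0/255.255.252.0")


def to_bits(addr):
    """Render an IPv4 address as four space-separated groups of eight bits."""
    return " ".join(f"{octet:08b}" for octet in addr.packed)


print(network)                           # CIDR representation
print(network.prefixlen)                 # number of leading one-bits in the mask
print(to_bits(network.network_address))  # address bits
print(to_bits(network.netmask))          # mask bits
print(network.num_addresses)             # 2 ** (32 - prefixlen) host identifiers
```

The prefix length is simply the count of leading one-bits in the mask, which is why no class-based implicit mask is needed.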
15
Address aggregation
12.70.0.0/24
12.70.1.0/24
12.70.2.0/24
12.70.3.0/24
Internet
ISP A
12.71.0.0/16
ISP B
12.70.0.0/22
12.71.0.0/16
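As a rough illustration of the aggregation shown here, the four customer /24 prefixes can be collapsed into the single /22 that ISP A announces. This is a sketch using the standard `ipaddress` module; the variable names are assumptions for illustration.

```python
import ipaddress

# The four contiguous /24 prefixes from the slide.
customer_routes = [
    ipaddress.ip_network(prefix)
    for prefix in ("12.70.0.0/24", "12.70.1.0/24", "12.70.2.0/24", "12.70.3.0/24")
]

# collapse_addresses merges adjacent networks into the fewest covering prefixes.
aggregated = list(ipaddress.collapse_addresses(customer_routes))
print(aggregated)
```

Aggregation only works when the prefixes are contiguous and aligned on a power-of-two boundary, as they are in this example.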
16
Routing and forwarding
 Routing
 The decision process of choosing optimal
path that is consistent with the
administrative or technical policy
 Forwarding
 The act of receiving a packet, doing a
lookup, and copying a packet to the next
hop
17
Classless forwarding
Internet
12.70.0.20
10.20.128.10
10.20.128.1
10.20.0.1
IP traffic
10.20.1.1
135.120.0.1
Prefix          Next hop
12.70.0.0/24    10.20.0.1
12.70.0.0/16    10.20.1.1
12.0.0.0/8      10.20.128.1
0.0.0.0         10.20.128.10
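Classless forwarding picks the most specific (longest) matching prefix. The sketch below applies that rule to the table on this slide. It assumes the `0.0.0.0` entry is the default route `0.0.0.0/0`, and it uses a simple linear scan rather than the trie structures real routers use; the names `FORWARDING_TABLE` and `lookup` are illustrative.

```python
import ipaddress

# Forwarding table from the slide; "0.0.0.0" is assumed to mean 0.0.0.0/0.
FORWARDING_TABLE = {
    "12.70.0.0/24": "10.20.0.1",
    "12.70.0.0/16": "10.20.1.1",
    "12.0.0.0/8": "10.20.128.1",
    "0.0.0.0/0": "10.20.128.10",
}


def lookup(destination, table=FORWARDING_TABLE):
    """Return the next hop of the longest prefix containing the destination."""
    dest = ipaddress.ip_address(destination)
    best = None  # (network, next_hop) of the most specific match so far
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


print(lookup("12.70.0.20"))   # matches the /24, the /16, the /8 and the default
print(lookup("135.120.0.1"))  # matches only the default route
```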
18
Inter-domain routing with CIDR
support
 BGP-4 [RFC1771]
 De facto EGP
 Carry routing information between ASes
 Path vector protocol
 Policy based routing
 Run on top of TCP for reliability
 Basic operations
 Set up BGP session
 Exchange all candidate routes
 Send incremental updates
19
Establish BGP session
Establish neighboring session
between 12.10.0.1 and 12.10.0.2
12.10.0.1
Prefix
135.120.0.0/24
68.35.0.0/16
TCP 179
Next hop
10.128.0.1
10.192.1.1
12.10.0.2
Prefix
12.70.0.0/24
12.9.0.0/16
Next hop
10.20.0.1
10.20.1.1
20
Exchange all candidate routes
12.70.0.0/24
12.9.0.0/16
10.20.0.1
10.20.1.1
12.10.0.1
12.10.0.2
135.120.0.0/24
68.35.0.0/16
Prefix
135.120.0.0/24
68.35.0.0/16
12.70.0.0/24
12.9.0.0/16
Next hop
10.128.0.1
10.192.1.1
10.20.0.1
10.20.1.1
10.128.0.1
10.192.1.1
Prefix
12.70.0.0/24
12.9.0.0/16
135.120.0.0/24
68.35.0.0/16
Next hop
10.20.0.1
10.20.1.1
10.128.0.1
10.192.1.1
21
Send incremental updates
Withdraw 12.9.0.0/16
12.10.0.1
Prefix
135.120.0.0/24
68.35.0.0/16
12.70.0.0/24
12.9.0.0/16
12.10.0.2
Next hop
10.128.0.1
10.192.1.1
10.20.0.1
10.20.1.1
Prefix
12.70.0.0/24
12.9.0.0/16
135.120.0.0/24
68.35.0.0/16
Next hop
10.20.0.1
10.20.1.1
10.128.0.1
10.192.1.1
22
BGP messages
 OPEN: set up a peering session
 UPDATE: announce new routes or
withdraw previously announced routes
 NOTIFICATION: shut down a peering
session
 KEEPALIVE: confirm active connection
at regular interval
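To make the four message types concrete, here is a minimal sketch of parsing the fixed BGP message header: a 16-octet marker, a 2-octet length, and a 1-octet type code. The type codes 1 to 4 follow the BGP-4 specification; the all-ones marker is the common unauthenticated case, and the function name `parse_header` is an assumption for illustration.

```python
import struct

MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}

# Marker (16 octets), length (2 octets), type (1 octet), network byte order.
HEADER = struct.Struct("!16sHB")


def parse_header(data):
    """Return (message type name, total message length) from raw bytes."""
    marker, length, msg_type = HEADER.unpack(data[:HEADER.size])
    if marker != b"\xff" * 16:
        raise ValueError("unexpected marker")
    return MESSAGE_TYPES.get(msg_type, "UNKNOWN"), length


# A KEEPALIVE carries no body, so the whole message is just the 19-octet header.
keepalive = HEADER.pack(b"\xff" * 16, HEADER.size, 4)
print(parse_header(keepalive))
```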
23
Internal vs. external BGP
Internet
I-BGP
E-BGP
update
E-BGP
I-BGP
update
AS B
AS C
AS A
24
Scaling I-BGP for large AS
 Route reflectors
 Confederations
AS 1000
E-BGP
update
EBGP
RR
RR
IBGP
AS 65010
IBGP EBGP
AS 65020
Only best paths
being sent by
RR
25
Establish connectivity
Prefix
135.120.0.0/16
AS 3
Next hop AS path
12.10.0.5 2 1
Prefix
135.120.0.0/16
IBGP
Next hop AS path
12.10.0.1 1
12.10.0.6
EBGP
12.10.0.5
AS 2
AS 1
135.120.0.0/16
IBGP
EBGP
12.10.0.1
12.10.0.2
IBGP
Prefix
135.120.0.0/16
Next hop AS path
12.10.0.1 1
IGP and BGP working together
Prefix
135.120.0.0/16
AS 3
IBGP
Next hop AS path
12.10.0.1 1
Prefix
12.10.0.0/30
135.120.0.0/16
12.10.0.6
Next hop
10.10.0.1
10.10.0.1
EBGP
12.10.0.5
AS 1
12.10.0.1
135.120.0.0/16
EBGP
AS 2
12.10.0.2
10.10.0.1
IBGP
IBGP
Prefix
135.120.0.0/16
Next hop AS path
12.10.0.1 1
27
Policy routing
ISP2
ISP1
traffic
Connectivity DOES NOT
imply reachability!
ISP3
ISP4
traffic
Cust1
Cust2
Policy determines how traffic
can flow on the Internet
28
BGP routing process
Routes
received
from peers
Apply
input
policy
Select
best
route
Best
routes
Apply
output
policy
Routes
advertised
to peers
Routing Forwarding
table
table
BGP is not shortest path routing!
29
Best route selection
 Highest local preference
 Shortest AS path
 Lowest MED (Multi-Exit-Discriminator)
 I-BGP < E-BGP
 Lowest I-BGP cost to E-BGP egress
 Tie breaking rules
30
Best route selection
 Highest local preference
 To enforce economical relationships between domains
 Shortest AS path
 Lowest MED (Multi-Exit-Discriminator)
 I-BGP < E-BGP
 Lowest I-BGP cost to E-BGP egress
 Tie breaking rules
31
Best route selection
 Highest local preference
 Shortest AS path
 Compare the quality of routes, assuming shorter AS-path length is better
 Lowest MED (Multi-Exit-Discriminator)
 I-BGP < E-BGP
 Lowest I-BGP cost to E-BGP egress
 Tie breaking rules
32
Best route selection
 Highest local preference
 Shortest AS path
 Lowest MED (Multi-Exit-Discriminator)
 To implement “cold potato” routing
between neighboring domains
 I-BGP < E-BGP
 Lowest I-BGP cost to E-BGP egress
 Tie breaking rules
33
Best route selection
 Highest local preference
 Shortest AS path
 Lowest MED (Multi-Exit-Discriminator)
 I-BGP < E-BGP
 Prefer EBGP routes to IBGP routes
 Lowest I-BGP cost to E-BGP egress
 Tie breaking rules
34
Best route selection
 Highest local preference
 Shortest AS path
 Lowest MED (Multi-Exit-Discriminator)
 I-BGP < E-BGP
 Lowest I-BGP cost to E-BGP egress
 Prefer routes via the nearest IGP neighbor
 To implement “hot potato” routing
 Tie breaking rules
35
Best route selection
 Highest local preference
 Shortest AS path
 Lowest MED (Multi-Exit-Discriminator)
 I-BGP < E-BGP
 Lowest I-BGP cost to E-BGP egress
 Tie breaking rules
 Router ID based: lowest router ID
 Age based: oldest route
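The selection steps can be sketched as a lexicographic comparison, where each later criterion only matters when all earlier ones tie. This is a simplified illustration, not a router implementation: the `Route` fields are assumptions, MED is compared across all candidates (real BGP normally compares MED only among routes from the same neighboring AS), and ties are broken by lowest router ID only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    local_pref: int   # higher is preferred
    as_path: tuple    # shorter is preferred
    med: int          # lower is preferred
    is_ebgp: bool     # E-BGP-learned routes beat I-BGP-learned routes
    igp_cost: int     # lower I-BGP cost to the E-BGP egress is preferred
    router_id: int    # final tie-breaker: lowest router ID


def preference_key(route):
    """Smaller keys are better; tuple comparison applies the steps in order."""
    return (
        -route.local_pref,
        len(route.as_path),
        route.med,
        0 if route.is_ebgp else 1,
        route.igp_cost,
        route.router_id,
    )


def best_route(candidates):
    return min(candidates, key=preference_key)


short_path = Route(100, (2, 1), 0, True, 10, 1)
preferred = Route(200, (4, 3, 2, 1), 0, False, 50, 2)
# Local preference is checked first, so the longer path wins here.
print(best_route([short_path, preferred]))
```

The example shows why BGP is not shortest-path routing: a higher local preference overrides a shorter AS path.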
36
BGP route propagation
 Not all possible routes propagate
 Commercial relationships determine policies
for
 Route import
 Route selection
 Route export
37
Typical AS relationships
 Provider-customer
 Customer pays the provider for transit
 Peer-peer
 typically exchange respective customers’ traffic for
free
 Siblings
 Mutual transit agreement
 Provide connectivity to the rest of the Internet for
each other
38
AS relationships translate
into BGP export rules
 Export to a provider or a peer
 Allowed: its routes and routes of its
customers and siblings
 Disallowed: routes learned from other
providers or peers
 Export to a customer or a sibling
 Allowed: its routes, the routes of its
customers and siblings, and routes learned
from its providers and peers
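The export rules above reduce to a small decision function. The sketch below is one way to encode them; the string labels (`"self"`, `"customer"`, `"sibling"`, `"provider"`, `"peer"`) and the function name `may_export` are assumptions chosen for illustration.

```python
def may_export(learned_from, export_to):
    """Decide whether a route may be exported to a neighbor.

    learned_from: "self", "customer", "sibling", "provider", or "peer"
    export_to:    "customer", "sibling", "provider", or "peer"
    """
    if export_to in ("customer", "sibling"):
        # Customers and siblings receive everything.
        return True
    # Providers and peers only receive own routes and customer/sibling routes.
    return learned_from in ("self", "customer", "sibling")


for source in ("self", "customer", "sibling", "provider", "peer"):
    print(source, "-> peer:", may_export(source, "peer"))
```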
39
Which AS paths are legal?
 Valley-free:
 After traversing a provider-customer or
peer-peer edge, cannot traverse a
customer-provider or peer-peer edge
 Invalid path: >= 2 peer links, downhill-uphill, downhill-peer, peer-uphill
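The valley-free rule can be checked with a single pass over the edges of a path. In this sketch each edge is labeled `"up"` (customer-provider), `"down"` (provider-customer), or `"peer"`; these labels and the function name are illustrative assumptions, and sibling edges are not modeled.

```python
def is_valley_free(edges):
    """Return True if a sequence of edge types obeys the valley-free rule."""
    descending = False  # set once a provider-customer or peer-peer edge is seen
    for edge in edges:
        if descending and edge in ("up", "peer"):
            return False  # uphill or a second peer link after the peak
        if edge in ("down", "peer"):
            descending = True
    return True


print(is_valley_free(["up", "peer", "down"]))  # uphill, one peer link, downhill
print(is_valley_free(["down", "up"]))          # downhill-uphill is a valley
```

Each invalid pattern listed on the slide (two peer links, downhill-uphill, downhill-peer, peer-uphill) trips the same check.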
40
Example of valley-free paths
[1 2 3], [1 2 6 3] are valley-free
X
X
[1 4 3], [1 4 5 3] are not valley free
41
Inferring AS relationships
 Identify the AS-level hierarchy of Internet
 Not shortest path routing
 Predict AS-level paths
 Traffic engineering
 Understand the Internet better
 Correlate with and interpret BGP updates
 Identify BGP misconfigurations
 E.g., errors in BGP export rules
42
Existing approaches
 On inferring Autonomous Systems Relationships in
the Internet, by L. Gao, IEEE Global Internet, 2000.
 Characterizing the Internet hierarchy from multiple
vantage points, by L. Subramanian, S. Agarwal, J.
Rexford, and R. Katz, IEEE Infocom, 2002.
 Computing the Types of the Relationships between
Autonomous Systems, by G. Battista, M. Patrignani,
and M. Pizzonia, IEEE Infocom, 2003.
 On AS-level Path Inference, by Z. Mao, L. Qiu, J.
Wang, and Y. Zhang, ACM Sigmetrics, 2005.
43
Policy routing causes path
inflation
 End-to-end paths are significantly longer than
necessary
 Why?
 Topology and routing policy choices within an ISP,
between pairs of ISPs, and across the global
Internet
 Peering policies and interdomain routing lead to
significant inflation
 Interdomain path inflation is due to lack of BGP
policy to provide convenient engineering of good
paths across ISPs
44
Path inflation
 Based on
[Mahajan03]
 Comparing
actual
Internet
paths with
hypothetical
“direct” link
45
Part II: Measuring Interdomain Forwarding Paths
Why do we care?
 Characterize end-to-end network paths
 Latency
 Capacity
 Link utilization
 Loss rate
 Diagnose routing anomalies
 Forwarding loop, blackholes, routing changes,
unexpected paths, main component of end-to-end
latency.
 Discover Internet topology
 Server placement
47
Key challenge
 Need to understand how packets flow
through the Internet without real-time
access to proprietary routing data from
each domain.
 Identify accurate packet forwarding paths
 Characterize the performance metrics of
each hop along the paths
48
Existing approaches
 With access to the source
 AS-level traceroute
 Towards an Accurate AS-Level Traceroute Tool, by Z.
Mao, J. Rexford, J. Wang, and R. Katz, ACM Sigcomm,
2003.
 Scalable and Accurate Identification of AS-Level
Forwarding Paths, by Z. Mao, D. Johnson, J. Rexford, J.
Wang, and R. Katz, IEEE Infocom, 2004.
 Without access to the source
 Routescope
 On AS-level Path Inference, by Z. Mao, L. Qiu, J. Wang,
and Y. Zhang, ACM Sigmetrics, 2005.
49
AS-Level Traceroute
 Traceroute gives IP level forwarding
path
 IP address of the router interfaces on a
forwarding path
 RTT statistics for each hop along the way
50
Traceroute from AT&T Research
to www.cnn.com
traceroute to cnn.com (64.236.24.12), 30 hops max, 40 byte packets
 1 oden (135.207.16.1) 1 ms 1 ms 1 ms
 2 * * *
 3 attlr-gate (192.20.225.1) 2 ms 2 ms 2 ms
 4 12.119.155.157 (12.119.155.157) 3 ms 4 ms 4 ms
 5 gbr6-p52.n54ny.ip.att.net (12.123.192.18) 4 ms 4 ms 4 ms
 6 tbr2-p012401.n54ny.ip.att.net (12.122.11.29) 4 ms (ttl=249!) 5 ms (ttl=249!) 5 ms (ttl=249!)
 7 ggr2-p390.n54ny.ip.att.net (12.123.3.62) 4 ms 5 ms 4 ms
 8 att-gw.ny.aol.net (192.205.32.218) 4 ms 4 ms 4 ms
 9 bb2-nye-P1-0.atdn.net (66.185.151.66) 4 ms 4 ms 4 ms
10 bb2-vie-P8-0.atdn.net (66.185.152.201) 13 ms (ttl=245!) 12 ms (ttl=245!) 12 ms (ttl=245!)
11 bb1-vie-P11-0.atdn.net (66.185.152.206) 10 ms 10 ms 10 ms
12 bb1-cha-P7-0.atdn.net (66.185.152.28) 20 ms 20 ms 20 ms
13 bb1-atm-P6-0.atdn.net (66.185.152.182) 25 ms 25 ms 25 ms
14 pop1-atl-P4-0.atdn.net (66.185.136.17) 25 ms (ttl=243!) 24 ms (ttl=243!) 24 ms (ttl=243!)
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
Destination
unreachable!
Who is responsible for
the forwarding problem?
51
Need to know Inter-domain level
path
 Obtain AS level paths
 BGP AS path
 Traceroute AS path
52
BGP AS path
Signaling path: control traffic
d: path=[BC]
d: path=[C]
Prefix d
Forwarding path: data traffic
Prefix
d
…
AS path
ABC
…
Is BGP AS path the answer? No!
53
BGP AS path is not the
answer
 Requires timely access to BGP data
 Signaling path may differ from
forwarding path
 Route aggregation and filtering
 Routing anomalies: e.g., deflections, loops
[Griffin2002]
 BGP misconfigurations: e.g., incorrect AS
prepending
Two paths may differ precisely when operators
most need accurate data to diagnose a problem!
54
Traceroute AS path
 Obtain IP level path using traceroute
 Map IP addresses to ASes
a
b
c
d
Source
e
Destination
AS A
AS B
AS C
AS D
Is traceroute AS path the answer? NO!
55
Traceroute AS path is not the
answer
 Identifying ASes along forwarding path
is surprisingly difficult!
 Internet route registry
 Origin AS in BGP routes
56
Internet route registry
 Whois database
 E.g. NANOG traceroute, prtraceroute
 Out-of-date, incomplete
 Address allocation to customers
 Acquisition, mergers, break-ups
57
Origin AS in BGP routes
 Last AS in the AS path for each prefix
Prefix
AS path
d
ABC
…
…
 More accurate and complete than whois data
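A minimal sketch of this mapping step: take the origin AS (last AS in the AS path) for each prefix, map each traceroute hop to an AS by longest-prefix match, and collapse consecutive hops in the same AS. The BGP table below is hypothetical: AS 7018 is AT&T as listed earlier in the tutorial, while the prefix `66.185.128.0/19` and AS 64500 are placeholders invented for illustration. The hop addresses come from the traceroute shown earlier.

```python
import ipaddress

# Hypothetical BGP table from one vantage point: prefix -> AS path.
BGP_TABLE = {
    "12.0.0.0/8": [7018],
    "66.185.128.0/19": [7018, 64500],  # placeholder prefix and origin AS
}


def origin_as_table(bgp_table):
    """Map each prefix to its origin AS, i.e. the last AS in the AS path."""
    return {ipaddress.ip_network(p): path[-1] for p, path in bgp_table.items()}


def ip_to_as(ip, origins):
    """Longest-prefix match of one address; None if no prefix covers it."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in origins if addr in net]
    if not matches:
        return None
    return origins[max(matches, key=lambda net: net.prefixlen)]


def traceroute_as_path(hops, origins):
    """Collapse an IP-level path into an AS-level path, skipping unmapped hops."""
    path = []
    for hop in hops:
        asn = ip_to_as(hop, origins)
        if asn is not None and (not path or path[-1] != asn):
            path.append(asn)
    return path


origins = origin_as_table(BGP_TABLE)
hops = ["12.123.192.18", "12.122.11.29", "12.123.3.62",
        "192.205.32.218", "66.185.151.66", "66.185.152.201"]
print(traceroute_as_path(hops, origins))
```

Note that `192.205.32.218` is covered by no prefix in this toy table, so it is silently skipped; unmapped boundary hops like this are exactly the kind of case that needs the extra handling discussed in the tutorial.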
58
Limitations of BGP origin AS
 Multiple Origin AS (MOAS)
 Multi-homing
 misconfiguration
 Internet eXchange Points (IXPs)
 Infrastructure addresses may not be advertised
 Does not require to be announced publicly
 Security concerns
 Addresses announced by someone else
 Static routed customers
 Shared equipments at boundary between ASes
Need accurate IP-to-AS mapping!
59
Accurate AS-level traceroute
Combine BGP and traceroute data
to find a better answer!
60
Assumptions
 IP-to-AS mapping
 Mappings from BGP tables are mostly
correct.
 Change slowly
 BGP paths and forwarding paths mostly
match.
 70% of the BGP path and traceroute path
match
61
BGP path and traceroute path
could differ!
 Inaccurate IP-to-AS mapping
 Traceroute problems
 Legitimate mismatches
62
BGP path and traceroute path
could differ!
 Inaccurate IP-to-AS mapping
 Internet eXchange Points (IXPs)
 Sibling ASes
 Unannounced infrastructure addresses
 Traceroute problems
 Legitimate mismatches
63
Internet eXchange Points (IXPs)
 Shared infrastructure connected to multiple
service providers
 Exchange BGP routes and data traffic
 May have its own AS number or announced
by participating ASes
 Dedicated BGP sessions between pairs of
participating ASes
 E.g., Mae-East, Mae-West, PAIX.
64
IXPs cause extra AS hop
 Extra AS hop in traceroute path
 Large number of fan-in and fan-out ASes
 Non transit AS, small address block, likely
MOAS
A
B
C
D
E
A
E
F
B
F
G
C
G
Traceroute AS path
BGP AS path
65
Sibling ASes




Single organization owns and manages multiple ASes
May share address space
Cause extra AS hop
Large fan-in and fan-out for the “sibling AS pair”
A
B
C
H
D
E
A
F
B
G
C
Traceroute AS path
E
D
F
G
BGP AS path
66
Unannounced infrastructure
addresses
 ASes do not necessarily announce
infrastructure via BGP
 Lead to “unmapped” addresses
 Sometimes fall into supernet announced
by AS’s provider or sibling
67
Unannounced infrastructure
addresses
AS loop in
traceroute path
AS A
4. A,C,A
3. B,A
AS B
Substitute AS hop
2. A
Missing AS hop in
traceroute path
Extra AS hop in
traceroute path
AS C
1. A,C
68
BGP path and traceroute path
could differ!
 Inaccurate IP-to-AS mapping
 Traceroute problems
 Forwarding path changing during
traceroute
 Interface numbering at AS boundaries
 ICMP response refers to outgoing interface
 Legitimate mismatches
69
Forwarding path changing during
traceroute
AS D
AS E
Route flaps between A
B C and A D E
AS A
AS A
AS B
AS D
AS C
AS C
AS hop B is substituted by AS D in the traceroute path
70
Interface numbering at AS
boundaries
AS A
AS A
AS C
AS B
AS C
Missing AS hop B in traceroute path
71
ICMP response refers to outgoing
interface
AS A
AS C
ICMP
message
AS B
Extra AS hop B in traceroute path
72
BGP path and traceroute path
could differ!
 Inaccurate IP-to-AS mapping
 Traceroute problems
 Legitimate mismatches
 Route aggregation and filtering
 Routing anomalies, e.g., deflections
73
Route aggregation/filtering
AS A
8.0.0.0/8 B C
AS B
AS C
8.0.0.0/8
C
8.64.0.0/16 C D
Extended traceroute path due to filtering by AS B
74
Mismatch patterns and causes
Extra
AS
Miss
AS
AS
Loop
Subst
AS
IXP
X
Sibling ASes
X
X
X
X
Unannounced IP
X
X
X
X
Aggregation/ filtering
Other
X
Inter-AS interface
X
ICMP source address
X
X
Routing anomaly
X
X
X
X
X
X
X
X
75
BGP and traceroute data
collection
Initial mappings from
origin AS of a large set of BGP tables
(Ignoring unstable paths)
For each location: Local BGP paths
For each location:
Combine all locations:
Traceroute paths
from multiple locations
Traceroute AS paths
•Compare
•Look for known causes of mismatches
(e.g., IXP, sibling ASes)
•Edit IP-to-AS mappings
(a single change explaining a large number of mismatches)
76
Measurement setup
 Eight vantage points
 Upstream providers: US-centric tier-1 ISPs
 Sweep all routable IP address space
 About 200,000 IP addresses, 160,000
prefixes, 15,000 destination ASes
77
Preprocessing BGP paths
 Discard prefixes with BGP paths containing
 Routing changes based on BGP updates
 Private AS numbers (64512 - 65535)
 Empty AS paths (local destinations)
 AS loops from misconfiguration
 AS SET instead of AS sequence
 Less than 1% prefixes affected
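The filtering criteria above could be applied roughly as follows. This is a sketch under stated assumptions: an AS path is a Python list of AS numbers, an AS SET element is represented as a `set` or `frozenset`, the set of prefixes with observed routing changes is supplied by the caller, and repeated adjacent ASes (prepending) are not treated as loops. The function names are illustrative.

```python
def has_loop(path):
    """True if an AS reappears after a different AS (prepending is allowed)."""
    seen = set()
    previous = None
    for asn in path:
        if asn != previous and asn in seen:
            return True
        seen.add(asn)
        previous = asn
    return False


def keep_path(prefix, path, unstable_prefixes=frozenset()):
    """Return True if the prefix and its BGP path survive preprocessing."""
    if prefix in unstable_prefixes:  # routing changes seen in BGP updates
        return False
    if not path:                     # empty AS path: local destination
        return False
    if any(isinstance(element, (set, frozenset)) for element in path):
        return False                 # AS SET instead of AS sequence
    if any(64512 <= asn <= 65535 for asn in path):
        return False                 # private AS number
    return not has_loop(path)


print(keep_path("12.0.0.0/8", [7018]))
print(keep_path("12.0.0.0/8", [7018, 64512]))
```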
78
Preprocessing traceroute
paths
 Resolving incomplete traceroute paths
 Unresolved hops within a single AS map to that AS
 Unmapped hops between ASes
 Try match to neighboring AS using DNS, Whois
 Trim unresponsive (*) hops at the end
 Compare with the beginning of local BGP paths
 MOAS at the end of paths
 Assume multi-homing without BGP
 Validation using AT&T router configurations
 More than 98% cases validated
79
Initial IP-to-AS Mapping
          Whois    Combined BGP tables    Resolving incompletes
Match     44.7%    73.2%                  78.0%
Mismatch  29.4%    8.3%                   9.0%
Ratio     1.5      8.8                    9.0
80
Heuristics to improve mappings
 Overall modification to mappings
 10% IP-to-AS mappings modified
 25 IXPs identified
 28 pairs of sibling ASes found
 1150 of the /24 prefixes shared

          IXP      Sibling ASes    Unannounced address space
Match     84.4%    85.9%           90.6%
Mismatch  8.7%     7.8%            3.5%
Ratio     9.7      11.0            26.0
81
Systematic optimization
 Dynamic-programming and iterative
improvement
 Initial IP-to-AS mapping derived from BGP
routing tables
 Identify a small number of modifications
that significantly improve the match rate.
 95% match ratio, less than 3% changes,
very robust
82
Optimization results
Input mapping                                               Mismatch
Full initial mapping                                        5.23%
Heuristically optimized mapping                             3.08%
Omit 10% initial mapping                                    6.57%
Omit 4 probing sources                                      6.34%
Omit probing destinations (one probe per unique BGP path)   7.12%
83
AS-level path inference
 Without access to the source
 Challenges
 Asymmetric routes: 60%
 Complicated routing policies
 Multihomed networks
Find the shortest policy path that conforms with AS relationships
84
Routescope
 Assumptions
 Explicit AS relationship
 Peer-peer
 Provider-customer
 Shortest policy AS path preferred
 Valley-free
 Uniform routing policy within an AS
 AS destination based uniform routing
 Stability
These assumptions are mostly correct.
85
AS path inference algorithm
 Compose AS graph based on BGP tables
 Infer AS relationship
 Classify edges based on AS relationship
 Customer-provider (UP) link
 Provider-customer (DOWN) link
 Peering (FLAT) link
 Compute shortest policy path conforming to the “valley-free” rule using modified Dijkstra’s algorithm
 Infer the first AS hop if multiple paths returned
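The core of the algorithm, finding the shortest path that obeys the valley-free rule, can be sketched as a search over (AS, phase) states, where the phase records whether the path has already gone FLAT or DOWN. Because every AS hop costs the same, this sketch substitutes a breadth-first search for the modified Dijkstra's algorithm named on the slide. The five-AS topology and all function names are hypothetical.

```python
from collections import deque


def add_provider_customer(links, provider, customer):
    links[(customer, provider)] = "UP"
    links[(provider, customer)] = "DOWN"


def add_peering(links, a, b):
    links[(a, b)] = "FLAT"
    links[(b, a)] = "FLAT"


def shortest_valley_free_path(links, source, destination):
    """Breadth-first search over (AS, descending) states; None if unreachable."""
    neighbors = {}
    for (a, b), kind in links.items():
        neighbors.setdefault(a, []).append((b, kind))
    start = (source, False)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, descending = state
        if node == destination:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt, kind in neighbors.get(node, []):
            if descending and kind != "DOWN":
                continue  # after FLAT or DOWN, only DOWN links are allowed
            nxt_state = (nxt, descending or kind != "UP")
            if nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None


# Hypothetical topology: AS 5 is a customer of both AS 1 and AS 4,
# so the two-hop path 1-5-4 exists but forms a valley.
links = {}
add_provider_customer(links, provider=2, customer=1)
add_provider_customer(links, provider=3, customer=4)
add_provider_customer(links, provider=1, customer=5)
add_provider_customer(links, provider=4, customer=5)
add_peering(links, 2, 3)
print(shortest_valley_free_path(links, 1, 4))
```

In this toy graph the unconstrained shortest path would be 1-5-4, but AS 5 does not provide transit between its providers, so the policy path climbs to AS 2, crosses the peering link to AS 3, and descends to AS 4.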
86
AS path inference accuracy
                    Total   Match   Match length   Exact match   Shorter   Longer
AS7018 (tier-1)     18085   82%     83%            35%           17%       0%
AS2152 (tier-2)     11990   64%     64%            10%           35%       0%
AS8121 (tier-3)     15757   16%     27%            3%            69%       4%
All BGP gateways    2457    70%     73%            30%           22%       4%
US BGP gateways     1907    60%     62%            27%           34%       4%
If the first hop is known, 15% of mismatches can be eliminated.
87
First hop inference
AS T1
AS S
AS D
Transition point T1
Source
Only have
access to
D
T2
AS T2
Destination
AS C
 Gather candidate first-hop ASes from S by launching traceroutes to S from multiple vantage points
 Identify the transition point T that is likely to be on the path from S to D by testing
hop_count(S,T) + hop_count(T,D) = hop_count(S,D)
Hop count inference
 Hop_count(S,T) = hop_count(T,S)
 To infer hop_count(H,D): H = T or S
 Send ping packet to H
 Guess the initial TTL value TTL0 set by H
 Get TTL value TTL1 in ICMP response packet received from
H
 Hop_count(H,D) = TTL0 - TTL1 + 1
 Common values for TTL0:
 32 (Win95/98/Me)
 64 (Linux, Compaq Tru64)
 128 (Win NT/2000/XP)
 255 (most UNIX systems)
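The TTL-based estimate can be sketched in a few lines. The slide only says to guess TTL0; picking the smallest common initial value that is not below the observed TTL is an assumed heuristic here, and the function names are illustrative. The transition-point test is the equality from the first-hop inference slide.

```python
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def guess_initial_ttl(observed_ttl):
    """Assumed heuristic: smallest common initial TTL >= the observed TTL."""
    for ttl0 in COMMON_INITIAL_TTLS:
        if observed_ttl <= ttl0:
            return ttl0
    raise ValueError("TTL out of range")


def hop_count(observed_ttl):
    """Hop count from the slide's formula: TTL0 - TTL1 + 1."""
    return guess_initial_ttl(observed_ttl) - observed_ttl + 1


def is_transition_point(hops_s_t, hops_t_d, hops_s_d):
    """T is likely on the S-to-D path if the hop counts add up exactly."""
    return hops_s_t + hops_t_d == hops_s_d


# A response arriving with TTL 249, as annotated in the traceroute output.
print(hop_count(249))
```

The guess fails for hosts whose initial TTL is not one of the common values, or for paths longer than the gap between two adjacent common values, so the result is an estimate rather than a measurement.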
89
Improvement with known first AS
hop
                    Total   Match length   Improvement
AS7018 (tier-1)     18085   86%            3%
AS2152 (tier-2)     11990   76%            12%
AS8121 (tier-3)     15757   48%            21%
All BGP gateways    2457    88%            15%
US BGP gateways     1907    70%            8%
90
Possible causes of inaccuracy
 Complicated AS relationships: 15% paths
 Two consecutive FLAT links
 DOWN link followed by a FLAT link
 FLAT link followed by UP link
 Routing policies
 Shortest path vs. customer routes
 Inconsistent advertisement to different peering
locations
 BGP tie-breaking rules
 AS prepending:>28% ASes
91
Part III: BGP Measurement
BGP routing updates
 Route updates at prefix level
 No activity in “steady state”
 Routing messages indicate changes, no
refreshes
93
Internet routing instability
 Large # of BGP updates
 Failures
 Policy changes
 Redundant messages
 Routing instability
 Route keeps changing, e.g., routes keep
going up and down
94
Implications
 Router overhead
 Transient delay and loss
 Unreachable hosts
 High loss rate
 High jitter
 Long delays
 Significant packet reordering
 Poor predictability of traffic flow
How do we know if the instability is due to
routing or network congestion?
95
Measure BGP stability
 First work by Labovitz et al.
 Methodology
 Collect routing messages from five public
exchange points
 BGP information considered
 AS path
 Next hop: next hop to reach a network
 Two routes are the same if they have the same AS
path and next hop
 Other attributes (e.g., MED, communities) ignored
 Focus on forwarding path stability
96
Measurement methodology
97
BGP information exchange
 Announcements: a router has either
 Learned of a new route, or
 Made a policy decision that it prefers a new route
 Withdrawals: a router concludes that a
network is no longer reachable
 Explicit: associated to the withdrawal message
 Implicit: (in effect an announcement) when a
route is replaced as a result of an announcement
message
 In steady state BGP updates should be only
the result of infrequent policy changes
 BGP is stateful, requires no refreshes
 Update rate: indication of network stability
98
Example of delayed convergence
[Figure: example topology with destination d and nodes 1, 2, 3, 4]

Routing table at each node, by stage:
Stage     0     1      4       9
Node 2:   [1]   [41]   [431]   --
Node 3:   [1]   [41]   [241]   --
Node 4:   [1]   [31]   --      --

Assuming node 1 has a route to a destination, and it withdraws the route:
Stage (msg processed)     Msg queued
0:                        1->{2,3,4}W
1: 1->{2,3,4}W            2->{3,4}A[241], 3->{2,4}A[341], 4->{2,3}A[431]
2: 2->{3,4}A[241]         3->{2,4}A[341], 4->{2,3}A[431]
3: 3->{2,4}A[341]         4->{2,3}A[431], 4->{2,3}W
4: 4->{2,3}A[431]         MinRouteAdver timer expires: 4->{2,3}W, 3->{2,4}A[3241], 2->{3,4}A[2431]
… (omitted)
9: 3->{2,4}W
Note: In response to a withdrawal from 1, node 3 sends out 3 messages:
3->{2,4}A[341], 3->{2,4}A[3241], 3->{2,4}W
99
Types of
inter-domain routing updates
 Forwarding instability
 may reflect topology changes
 Policy fluctuations (routing instability)
 may reflect changes in routing policy information
 Pathological updates
 redundant updates that are neither routing nor
forwarding instability
 Instability
 forwarding instability and policy fluctuation 
change forwarding path
100
Routing successive events
(instability)
 WADiff
 W: a route is explicitly withdrawn as it becomes
unreachable
 A: is later replaced with an alternative route
 Forwarding instability
 AADiff
 A: a route is implicitly withdrawn
 A: then replaced by an alternative route as the
original route becomes unavailable or a new
preferred route becomes available
 Forwarding instability
101
Routing successive events
(pathological instability)
 WADup
 W: a route is explicitly withdrawn
 A: then reannounced later
 forwarding instability or pathological behavior
 AADup
 A: a route is implicitly withdrawn
 A: then replaced with a duplicate of the original route
 pathological behavior or policy fluctuation
 WWDup
 The repeated transmission of BGP withdrawals for a prefix
that is currently unreachable (pathological behavior)
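The taxonomy on these two slides can be sketched as a classifier over the update stream for one prefix from one peer. It follows the earlier definition that two routes are the same if they have the same AS path and next hop. The input encoding (`("A", route)` and `("W", None)` tuples) and the function name are assumptions for illustration.

```python
def classify(updates):
    """Label successive update pairs as WADiff, AADiff, WADup, AADup or WWDup.

    updates: list of ("A", (as_path, next_hop)) or ("W", None), in arrival order.
    """
    labels = []
    last_route = None     # most recently announced route
    previous_kind = None  # kind of the previous update, "A" or "W"
    for kind, route in updates:
        if kind == "W":
            if previous_kind == "W":
                labels.append("WWDup")
        else:
            if previous_kind is not None and last_route is not None:
                pair = "WA" if previous_kind == "W" else "AA"
                labels.append(pair + ("Dup" if route == last_route else "Diff"))
            last_route = route
        previous_kind = kind
    return labels


r1 = ((7018, 1239), "10.0.0.1")  # hypothetical (AS path, next hop) pairs
r2 = ((7018, 3130), "10.0.0.2")
stream = [("A", r1), ("A", r2), ("W", None), ("A", r2),
          ("W", None), ("W", None), ("A", r1), ("A", r1)]
print(classify(stream))
```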
102
Measurement findings:
overview
 Year 2000
 BGP updates more than one order of
magnitude larger than expected
 Routing information dominated by
pathological updates
 Implementation problems
 BGP self-synchronization
 Unconstrained routing policies
103
Routing problem findings
 Implementation problems
 Redundant updates
 Routers do not maintain the history of the
announcements sent to neighbors
 Self-synchronization
 BGP routers exchange information simultaneously
 may lead to periodic link/router failures
 Unconstrained routing policies may lead to
persistent route oscillations
104
Instability measurement
 Instability and redundant updates exhibit strong correlation with load
 (30-second, 24-hour, and seven-day periods)
 Instability usually exhibits high frequency
 Pathological updates exhibit both high and low frequencies
105
Non-localized instability
 No single AS dominates instability
statistics
 No correlation between size of AS and
its impact on instability statistics
 There is no small set of paths that
dominate instability statistics
106
Measurement conclusions
 Routing in the Internet exhibits many undesirable behaviors
 Instability over a wide range of time scales
 Asymmetric routes
 Network outages
 Problem seems to worsen
 Many problems are due to software
bugs or inefficient router architectures
107
Lessons
 Even after decades of experience routing in
the Internet is not a solved problem
 This attests to the difficulty and complexity of building distributed algorithms in the Internet, i.e., in a heterogeneous environment with products from various vendors
 Simple protocols may increase the chance to
be
 Understood
 Implemented right
108
Better understanding of BGP
dynamics
 Difficulties
 Multiple administrative domains
 Unknown information (policies, topologies)
 Unknown operational practices
 Ambiguous protocol specs
Proposal: a controlled active measurement infrastructure
for continuous BGP monitoring – BGP Beacons.
109
What is a BGP Beacon?
 An unused, globally visible prefix with known
Announce/Withdrawal schedule
 For long-term, public use
110
Who will benefit from BGP
Beacon?
 Researchers: study BGP dynamics
 To calibrate and interpret BGP updates
 To study convergence behavior
 To analyze routing and data plane interaction
 Network operators
 Serve to debug reachability problems
 Test effects of configuration changes:
 E.g., flap damping setting
111
Related work
 Differences from Labovitz’s “BGP fault injector”
 Long-term, publicly documented
 Varying advertisement schedule
 Beacon sequence number (AGG field)
 Enabler for much research in routing dynamics
 RIPE RIS Beacons
 Set up at 9 exchange points
112
Active measurement
infrastructure
Many Observation points:
1:Oregon RouteViews
Internet
ISP
2. RIPE
ISP
3.AT&T
ISP
Send
route update
Upstream
provider
Stub AS
BGP Beacon #1
198.133.206.0/24
ISP
ISP
ISP
ISP
Upstream
provider
ISP
ISP
ISP
4. Verio
5. MIT
6.Berkeley
113
Deployed PSG Beacons
Prefix             Src AS   Start date   Upstream provider AS   Beacon host      Beacon location
198.133.206.0/24   3130     8/10/02      2914, 1239             Randy Bush       WA, US
192.135.183.0/24   5637     9/4/02       3701, 2914             Dave Meyer       OR, US
203.10.63.0/24     1221     9/25/02      1221                   Geoff Huston     Australia
198.32.7.0/24      3944     10/24/02     2914, 8001             Andrew Partan    MD, US
192.83.230.0/24    3130     06/12/03     2914, 1239             Randy Bush       WA, US
114
Deployed PSG Beacons
 B1, 2, 3, 5:
 Announced and withdrawn with a fixed period
 (2 hours) between updates
 1st daily ANN: 3:00AM GMT
 1st daily WD: 1:00AM GMT
 B4: varying period
 B5: fail-over experiments
 Software available at:
http://www.psg.com/~zmao
115
Beacon 5 schedule
Live host behind
the beacon for
data analysis
Study fail-over
Behavior for
multi-homed
customers
116
Beacon terminology
Internet
Beacon
AS
Beacon prefix:
198.133.206.0/24
Input signal:
Beacon-injected change
3:00:00 GMT: Announce (A0)
5:00:00 GMT: Withdrawal (W)
Output signal:
RouteView
AT&T
5:00:10 A1
5:00:40 W
5:01:10 A2
Signal length: number of updates in
output signal (3 updates)
Signal duration: time between first
and last update in the signal (5:00:10 - 5:01:10, 60 seconds)
Inter-arrival time: time between
consecutive updates
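The three terms can be computed directly from the timestamps of the output signal. The sketch below uses the example from this slide (updates at 5:00:10, 5:00:40, and 5:01:10); the function name and the dictionary keys are illustrative assumptions, and the input is assumed to be a non-empty, time-ordered list.

```python
from datetime import datetime


def signal_stats(timestamps):
    """Signal length, duration (s) and inter-arrival times (s) of one signal."""
    times = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    return {
        "length": len(times),                                  # number of updates
        "duration": (times[-1] - times[0]).total_seconds(),    # first to last
        "inter_arrival": gaps,                                 # consecutive gaps
    }


# Output signal observed after the 5:00:00 GMT withdrawal: A1, W, A2.
output_signal = ["5:00:10", "5:00:40", "5:01:10"]
stats = signal_stats(output_signal)
print(stats)
```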
117
Process Beacon data
 Identify output signals, ignore external events
 Data cleaning
 Anchor prefix as reference
 Same origin AS as beacon prefix
 Statically nailed down
 Minimize interference between consecutive
input signals
 Beacon period is set to 2 hours
 Time stamp and sequence number
 Attach additional information in the BGP updates
 Make use of a transitive attribute: Aggregator fields
118
Beacon data cleaning process
 Goal
 Clearly identify
updates associated
with injected routing
change
 Discard beacon
events influenced by
external routing
changes
119
Cumulative Beacon statistics:
significant noise
 Current observation points:
 111 peers: RIPE, Route-View, Berkeley,
MIT, MIT-RON nodes, ATT-Research, AT&T,
AMS-IXP, Verio
Avg expansion: 2*0.2+1*0.8=1.2
120
Cumulative Beacon statistics:
significant noise
 Example response to ANN-beacon at peer p
 R1: ASpath= 286 209 1 3130 3927
No. transient routes=2
 R2: ASpath= 286 209 2914 3130 3927
Out-signal length=1
 100 events: 20: R1 R2, 80: R2
Beacon   Max no. transient routes   Max ANN-out-signal length   Max WD-out-signal length   Max ANN-avg expansion   Max WD-avg expansion
1        186                        11                          14                         9.7                     11.2
2        179                        9                           15                         7.0                     10.8
3        117                        16                          13                         5.8                     11.4
4        307                        18                          15                         8.8                     16.3
121
Cisco vs. Juniper
update rate-limiting
Known last-hop Cisco
and Juniper routers
from the same AS and
location
Average signal length:
average number of
updates observed for a
single beacon-injected
change
122
“Cisco-like” last-hop routers
Linear increase in
signal duration wrt
signal length
Slope = 30 seconds
per update
Due to Cisco’s
default rate-limiting
setting
123
“Juniper-like” last-hop routers
Signal duration
relatively stable
wrt increase in
signal length
Shorter signal
duration compared
to “Cisco-like”
last-hops
124
Route flap damping
 A mechanism to punish unstable routes
by suppressing them
 Reduce router processing load due to
instability
 Prevent sustained routing oscillations
 Do not sacrifice convergence times for
well-behaved routes
It has been conjectured that a single announcement can
cause route suppression.
125
RFC2439: Route flap damping
Cisco default setting
 Scope
 Inbound external routes
 Per neighbor, per destination
 Penalty
 Flap: route change
 Increases for each flap
 Decays exponentially: $P(t') = P(t)\,e^{-\lambda (t' - t)}$
[Figure: exponentially decayed penalty vs. time (min); penalty axis ticks at 750, 1000, 2000, 3000, with the suppress threshold at 2000 and the reuse threshold at 750; time axis ticks at 0, 2, 4, 32]
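The penalty curve on this slide can be reproduced numerically. The sketch below assumes a penalty of 1000 per flap and a 15-minute half-life (commonly cited Cisco defaults that are not stated on the slide), together with the suppress threshold 2000 and reuse threshold 750. It is a simplification: once a route is marked suppressed it stays so until the final reuse time is computed.

```python
import math

HALF_LIFE_MIN = 15.0        # assumed default half-life, in minutes
PENALTY_PER_FLAP = 1000.0   # assumed penalty added per flap
SUPPRESS = 2000.0           # suppress threshold
REUSE = 750.0               # reuse threshold
DECAY = math.log(2) / HALF_LIFE_MIN   # lambda in P(t') = P(t) e^{-lambda (t'-t)}

def simulate(flap_times):
    """Return (penalty after last flap, suppressed?, reuse time in min or None)."""
    penalty, last, suppressed = 0.0, None, False
    for t in flap_times:
        if last is not None:
            penalty *= math.exp(-DECAY * (t - last))   # exponential decay
        penalty += PENALTY_PER_FLAP                     # each flap adds penalty
        last = t
        if penalty > SUPPRESS:
            suppressed = True
    reuse_time = None
    if suppressed:
        # Time at which the decaying penalty reaches the reuse threshold
        reuse_time = last + math.log(penalty / REUSE) / DECAY
    return penalty, suppressed, reuse_time

penalty, suppressed, reuse_time = simulate([0, 2, 4])
print(round(penalty), suppressed, round(reuse_time, 1))
```

Under these assumptions, flaps at 0, 2, and 4 minutes push the penalty above the suppress threshold on the third flap, and the route becomes reusable at roughly minute 32, which is consistent with the tick marks on the slide's time axis.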
126
Route flap damping analysis
Strong evidence for
withdrawal- and
announcement-triggered
suppression.
127
Distinguish between
announcement and withdrawal
Summary:
•WD-triggered suppression is more likely than ANN-triggered suppression
•Cisco: overall more likely to trigger suppression than Juniper (AAAW pattern)
•Juniper: more aggressive for the AWAW pattern
128
Convergence analysis
Summary:
•Withdrawals
converge
slower than
announcements
•Most beacon
events converge
within 3 minutes
129
Output signal duration
[Figure: output signal duration; axis ticks at 30, 60, 90, 120]
130
Beacon 1’s upstream change
Single-homed
(AS2914)
Multi-homed
(AS1,2914)
Multi-homed
(AS1239, 2914)
131
Beacon for identifying router
behavior
Rate-limiting timer:
30 seconds
Beacon 2
seen from
RouteView data
Different
rate-limiting
behavior:
Cisco vs.
Juniper
132
Inter-arrival time analysis
133
Inter-arrival time modeling
 Geometric distribution (body):
 Update rate-limiting behavior: every 30 sec
 Prob(missing update train) independent of how many
already missed
 Mass at 1:
 Discretization of timestamps for times<1
 Shifted exponential distribution (tail):
 Most likely due to route flap damping
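The geometric body of the model can be illustrated as follows: if updates are released only at 30-second timer ticks and each tick is missed independently with the same probability, the inter-arrival time is 30 seconds times a geometric number of ticks. This is a sketch of that reading only; the miss probability 0.3 is a made-up value, and the mass at 1 and the shifted exponential tail are not modeled.

```python
import random

TIMER = 30.0   # rate-limiting timer in seconds (from the slide)

def geometric_pmf(k, miss_prob):
    """P(update arrives at the k-th tick) = miss_prob^(k-1) * (1 - miss_prob)."""
    return miss_prob ** (k - 1) * (1 - miss_prob)

def sample_inter_arrival(miss_prob, rng):
    """Draw one inter-arrival time; each tick is missed independently."""
    ticks = 1
    while rng.random() < miss_prob:
        ticks += 1
    return ticks * TIMER

rng = random.Random(0)                     # fixed seed for repeatability
samples = [sample_inter_arrival(0.3, rng) for _ in range(1000)]
pmf_total = sum(geometric_pmf(k, 0.3) for k in range(1, 200))
print(min(samples), pmf_total)
```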
134
Motivation
A backbone network is vulnerable to routing
changes that occur
in other domains.
[Figure: network diagram with source, destination, AS1 to AS4, routers labeled BR and C, and annotations Failure, Disruption, Congestion, Mitigation]
135
Goal
 Identify important routing anomalies
 Lost reachability
 Persistent flapping
 Large traffic shifts
Contributions:
•Build a tool to identify a small number of
important routing disruptions from a large
volume of raw BGP updates in real time.
•Use the tool to characterize routing
disruptions in an operational network
136
Capturing Routing Changes
A large operational network
(8/16/2004 – 10/10/2004)
[Figure: network diagram with routers labeled BR and C, a BGP Monitor, and CPE]
137
Challenges
 Large volume of BGP updates
 Millions daily, very bursty
 Too much for an operator to manage
 Different from root-cause analysis
 Identify changes and their effects
 Focus on actionable events rather than
diagnosis
 Diagnose causes in/near the AS
138
System Architecture
[Figure: processing pipeline fed by BGP updates and Netflow data from routers labeled BR and E]
BGP Updates (10^6)
→ BGP Update Grouping → Events (10^5); Persistent Flapping Prefixes (10^1)
→ Event Classification → “Typed” Events
→ Event Correlation → Clusters (10^3); Frequent Flapping Prefixes (10^1)
→ Traffic Impact Prediction (with Netflow Data) → Large Disruptions (10^1)
From millions of updates to a few dozen reports
139
Grouping BGP Updates into Events
Challenge: A single routing change
 leads to multiple update messages
 affects routing decisions at multiple routers
Approach:
BGP Updates → BGP Update Grouping → Events; Persistent Flapping Prefixes
•Group together all updates for a prefix with inter-arrival < 70 seconds
•Flag prefixes with changes lasting > 10 minutes.
140
Grouping Thresholds
 Based on our understanding of BGP and data
analysis
 Event timeout: 70 seconds
 2 * MRAI timer (30 seconds) + 10 seconds
 98% of inter-arrival times < 70 seconds
 Convergence timeout: 10 minutes
 BGP usually converges within a few minutes
 99.9% of events last < 10 minutes
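The two thresholds can be applied in a single pass over the update stream. Below is a minimal sketch, assuming updates arrive as `(timestamp_seconds, prefix)` pairs; the function name, the tuple layout, and the example prefixes are illustrative and not the tool's actual data structures.

```python
EVENT_TIMEOUT = 70           # seconds: gap that closes an event
CONVERGENCE_TIMEOUT = 600    # seconds: events longer than this are flagged

def group_updates(updates):
    """Group per-prefix updates into events (prefix, start, end, n_updates)
    and collect prefixes whose events last longer than 10 minutes."""
    by_prefix = {}
    for ts, prefix in sorted(updates):
        by_prefix.setdefault(prefix, []).append(ts)
    events, persistent = [], set()
    for prefix, times in by_prefix.items():
        start = prev = times[0]
        count = 1
        for ts in times[1:]:
            if ts - prev < EVENT_TIMEOUT:
                count += 1                      # same event continues
            else:
                events.append((prefix, start, prev, count))
                start, count = ts, 1            # a new event begins
            prev = ts
            if prev - start > CONVERGENCE_TIMEOUT:
                persistent.add(prefix)          # persistent flapping
        events.append((prefix, start, prev, count))
    return events, persistent

updates = [(t, "10.0.0.0/8") for t in (0, 30, 50, 200)]
updates += [(t, "192.0.2.0/24") for t in range(0, 661, 60)]
events, persistent = group_updates(updates)
print(events, persistent)
```

In this made-up input, the first prefix yields two events (the 150-second gap exceeds the event timeout), while the second prefix keeps updating every 60 seconds for 11 minutes and is flagged.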
141
Persistent Flapping Prefixes
A surprising finding:
15.2% of updates were caused by
persistent-flapping prefixes even though
flap damping is enabled.
 Types of persistent flapping
 Conservative damping parameters (78.6%)
 Protocol oscillations due to MED (18.3%)
 Unstable interfaces or BGP sessions (3.0%)
142
Example: Unstable eBGP Session
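[Figure: ISP with routers AE, BE, CE, DE; a Peer and a Customer are attached, and prefix p is reached through an unstable eBGP session]
 Flap damping parameters are session-based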
 Damping not implemented for iBGP sessions
143
Event Classification
Challenge: Major concerns in network management
 Changes in reachability
 Heavy load of routing messages on the routers
 Changes in the flow of traffic through the network
Events → Event Classification → “Typed” Events,
e.g., Loss/Gain of Reachability
Solution: classify events by severity of their impact
144
Event Category – “No Disruption”
[Figure: ISP with routers AE, BE, CE, DE, EE, neighbors AS1 and AS2, and prefix p; label: No Traffic Shift]
“No Disruption”:
no border routers have any traffic shift. (50.3%)
145
Event Category – “Internal Disruption”
[Figure: ISP with routers AE, BE, CE, DE, EE, neighbors AS1 and AS2, and prefix p; label: Internal Traffic Shift]
“Internal Disruption”:
all traffic shifts are internal. (15.6%)
146
Event Category – “Single External
Disruption”
[Figure: ISP with routers AE, BE, CE, DE, EE, neighbors AS1 and AS2, and prefix p; label: External Traffic Shift]
“Single External Disruption”:
only one of the traffic shifts is external (20.7%)
147
Statistics on Event Classification
Category                      Events   Updates
No Disruption                 50.3%    48.6%
Internal Disruption           15.6%    3.4%
Single External Disruption    20.7%    7.9%
Multiple External Disruption  7.4%     18.2%
Loss/Gain of Reachability     6.0%     21.9%
 First 3 categories have significant day-to-day
variations
 The number of updates per event depends on the type of event
and the number of affected routers
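One way to read the table is to divide each category's share of updates by its share of events, which gives the updates per event relative to the overall average. This ratio is an illustrative derived quantity, not a statistic reported on the slide.

```python
# (share of events %, share of updates %) per category, from the table
shares = {
    "No Disruption": (50.3, 48.6),
    "Internal Disruption": (15.6, 3.4),
    "Single External Disruption": (20.7, 7.9),
    "Multiple External Disruption": (7.4, 18.2),
    "Loss/Gain of Reachability": (6.0, 21.9),
}

# A ratio above 1 means more updates per event than the overall average
relative = {name: updates / events for name, (events, updates) in shares.items()}
for name, ratio in sorted(relative.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.2f}")
```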
148
Event Correlation
Challenge: A single routing change
 affects multiple destination prefixes
“Typed” Events → Event Correlation → Clusters
Solution:
group events of the same type that occur close together in time
149
EBGP Session Reset
 Caused most of “single external disruption”
events
 Check if the number of prefixes using that
session as the best route changes dramatically
[Figure: number of prefixes using the session vs. time, with session failure and session recovery marked]
 Validation with Syslog router report (95%)
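The check described above can be sketched as a scan over a time series of per-session prefix counts. The 50% relative-change threshold and the function name are assumptions chosen for illustration; the slides do not state how "dramatically" is quantified.

```python
def detect_session_events(prefix_counts, threshold=0.5):
    """Flag sample indices where the number of prefixes whose best route
    uses the session drops or rises by at least `threshold` (relative)."""
    events = []
    for i in range(1, len(prefix_counts)):
        prev, cur = prefix_counts[i - 1], prefix_counts[i]
        base = max(prev, cur)
        if base == 0:
            continue                       # nothing routed before or after
        change = (cur - prev) / base
        if change <= -threshold:
            events.append((i, "failure"))
        elif change >= threshold:
            events.append((i, "recovery"))
    return events

# Hypothetical counts sampled at regular intervals
found = detect_session_events([1000, 990, 5, 4, 995, 1000])
print(found)
```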
150
Hot-Potato Changes
[Figure: ISP with routers AE, BE, CE and prefix P; link labels 11, 9, 10]
“Hot-potato routing” =
route to closest egress point
 Caused “internal disruption” events
 Validation with OSPF measurement (95%)
[Teixeira et al – SIGMETRICS’ 04]
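Hot-potato selection reduces to choosing the egress with the smallest IGP distance, so an internal cost change alone can move traffic. A minimal sketch follows; the assignment of the costs 9, 10, 11 to particular routers is a guess, since the figure's layout is not recoverable, and the changed cost 12 is invented.

```python
def closest_egress(igp_cost):
    """Pick the egress router with the lowest IGP cost (ties broken by name)."""
    return min(sorted(igp_cost), key=igp_cost.get)

# Hypothetical IGP distances from one ingress router to each egress point
before = {"AE": 9, "BE": 10, "CE": 11}
after = dict(before, AE=12)        # an internal link cost increases

egress_before = closest_egress(before)
egress_after = closest_egress(after)
print(egress_before, "->", egress_after)
```

No BGP route changed in this example, yet the chosen egress moves, which is why such events show up as internal disruptions.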
151
Traffic Impact Prediction
Challenge: Routing changes have different
impacts on the network, depending
on the popularity of the destinations
Clusters + Netflow Data → Traffic Impact Prediction → Large Disruptions
Solution: weigh each cluster by traffic volume
152
Traffic Impact Prediction
 Traffic weight
 Per-prefix measurement from Netflow
 10% of prefixes account for 90% of traffic
 Traffic weight of a cluster
 the sum of the traffic weights of its prefixes
 A small number of large clusters have large
traffic weight
 Mostly session resets and hot-potato changes
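The cluster weight defined above is a plain sum over per-prefix weights. A minimal sketch with made-up prefixes and weights (fractions of total traffic); prefixes without a Netflow measurement are assumed to contribute zero.

```python
def cluster_weight(prefixes, prefix_weight):
    """Traffic weight of a cluster = sum of its prefixes' traffic weights."""
    return sum(prefix_weight.get(p, 0.0) for p in prefixes)

def rank_clusters(clusters, prefix_weight):
    """Return (cluster name, weight) pairs, heaviest first."""
    weighted = [(name, cluster_weight(p, prefix_weight)) for name, p in clusters.items()]
    return sorted(weighted, key=lambda item: item[1], reverse=True)

# Hypothetical per-prefix weights and clusters
prefix_weight = {"192.0.2.0/24": 0.40, "198.51.100.0/24": 0.10, "203.0.113.0/24": 0.01}
clusters = {
    "session-reset": ["192.0.2.0/24", "198.51.100.0/24"],
    "other": ["203.0.113.0/24", "10.1.0.0/16"],
}
ranking = rank_clusters(clusters, prefix_weight)
print(ranking)
```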
153
Performance Evaluation
 Memory
 Static memory: “current routes”, 600 MB
 Dynamic memory: “clusters”, 300 MB
 Speed
 99% of 1-second intervals of updates can be
processed within 1 second
 Occasional execution lag
 Every 70-second interval of updates can be
processed within 70 seconds
Measurements were based on a 900 MHz CPU
154
Conclusion of BGP
Troubleshooting Tool
 BGP troubleshooting system
 Fast, online fashion
 Operators’ concerns (reachability, flapping, traffic)
 Significant information reduction
 millions of updates → a few dozen large disruptions
 Uncovered important network behavior
 Hot-Potato changes
 Session resets
 Persistent-flapping prefixes
155
Part IV: BGP Modeling
BGP Is Not Guaranteed to
Converge!
 BGP is not guaranteed to converge to a
stable routing. Policy inconsistencies
can lead to “livelock” protocol
oscillations.
 Goal:
 Design a simple, tractable and complete
model of BGP
 Example application: sufficient condition to
guarantee convergence.
157
BGP is Solving What Problem?
Underlying problem    Distributed means of computing a solution
Shortest Paths        RIP, OSPF, IS-IS
X?                    BGP
 X can
 aid in the design of policy analysis algorithms and
heuristics,
 aid in the analysis and design of BGP and
extensions,
 help explain some BGP routing anomalies,
 provide a fun way of thinking about the protocol
158
Separate Dynamic and Static
Semantics
 Static semantics:
 BGP policies  Stable Paths Problem
 Dynamic semantics:
 BGP  SPVP
 SPVP: Simple Path Vector Protocol
 A distributed algorithm for solving Stable Paths
Problem
159
What is Stable Paths
Problem?
Example:
 A graph of nodes and
edges,
 Node 0, called the origin,
 For each non-zero node, a
set of permitted paths to
the origin. This set always
contains the “null path”.
 A ranking of permitted
paths at each node. Null
path is always least
preferred.
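[Figure: example graph with origin 0 and nodes 1 to 5; each node's permitted paths are listed from most preferred to least preferred (not null)]
1: 130, 10
2: 210, 20
3: 30
4: 420, 430
5: 5210
160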
A Solution to SPP
 A solution is an assignment of permitted
paths to each node such that
 node u’s assigned path is either the null
path or is a path uwP, where wP is
assigned to node w and {u,w} is an edge
in the graph,
 each node is assigned the highest ranked
path among those consistent with the
paths assigned to its neighbors
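The two conditions above can be checked mechanically. The sketch below encodes the example instance from the slides as per-node lists of permitted paths, assuming the first path listed is the most preferred and that the empty tuple stands for the null path; the names `best` and `is_solution` are illustrative.

```python
# Permitted paths per node, most preferred first (transcribed from the example)
PERMITTED = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 0)],
    4: [(4, 2, 0), (4, 3, 0)],
    5: [(5, 2, 1, 0)],
}

def best(u, permitted, assignment):
    """Highest-ranked permitted path u w P such that w is assigned w P."""
    for path in permitted[u]:
        if assignment[path[1]] == path[1:]:
            return path
    return ()   # null path: nothing consistent with the neighbors

def is_solution(permitted, assignment):
    """Every node holds its best path given its neighbors' assigned paths."""
    return all(assignment[u] == best(u, permitted, assignment) for u in permitted)

stable = {0: (0,), 1: (1, 3, 0), 2: (2, 0), 3: (3, 0), 4: (4, 2, 0), 5: ()}
direct = {0: (0,), 1: (1, 0), 2: (2, 0), 3: (3, 0), 4: (4, 2, 0), 5: ()}
print(is_solution(PERMITTED, stable), is_solution(PERMITTED, direct))
```

In the stable assignment node 5 ends up with the null path, because its only permitted path goes through a route that node 2 does not select. The second assignment fails because node 1 would rather use 1 3 0, which is available to it.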
161
A Solution to SPP
 A solution
need not
represent a
shortest path
tree or a
spanning tree
[Figure: the example graph (origin 0, nodes 1 to 5) with permitted paths 1: 130, 10; 2: 210, 20; 3: 30; 4: 420, 430; 5: 5210]
162
There can be Multiple
Solutions to an SPP
[Figure: DISAGREE, a graph with origin 0 and nodes 1 and 2; permitted paths 1: 120, 10 and 2: 210, 20; shown three times: the instance, First solution, Second solution]
163
Multiple Solutions Can Occur
Due to Recovery:
[Figure: graph with origin 0 and nodes 1, 2, 3, joined by a primary link and a backup link, with per-node permitted path lists (as extracted: 10, 1230; 230, 210; 3210, 30); panels: Remove primary link, Restore primary link]
164
Ranking BGP Paths
 Highest local preference
 Shortest AS path length
 Lowest origin type: IGP < EGP < INCOMPLETE
 Lowest MED value
 EBGP preferred over IBGP
 Lowest IGP cost
 Tie breaking
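The ranking steps amount to a lexicographic comparison, which can be written as a sort key. This is a simplified sketch: the `Route` fields are hypothetical, a router ID stands in for the final tie-breaking step, and MED is compared across all routes, whereas BGP normally compares MED only among routes from the same neighboring AS.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}

@dataclass(frozen=True)
class Route:
    local_pref: int
    as_path: tuple
    origin: str
    med: int
    via_ebgp: bool
    igp_cost: int
    router_id: int    # hypothetical stand-in for the final tie-break

def preference_key(r):
    """Smaller key = more preferred; one tuple element per ranking step."""
    return (-r.local_pref, len(r.as_path), ORIGIN_RANK[r.origin],
            r.med, 0 if r.via_ebgp else 1, r.igp_cost, r.router_id)

def best_route(routes):
    return min(routes, key=preference_key)

a = Route(100, (7018, 3356), "IGP", 0, True, 10, 1)
b = Route(200, (1239, 701, 3356), "IGP", 0, False, 5, 2)   # higher local pref
c = Route(100, (7018, 3356), "IGP", 0, False, 5, 3)        # same as a, but iBGP
print(best_route([a, b, c]) is b, best_route([a, c]) is a)
```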
165
Bad Gadget: No Solution
Stage 1:
1: [10]
2: [210]
3: [30]
Stage 2:
1:[130]
2:[20]
3:[320]
Back to stage 1
[Figure: BAD GADGET, a graph with origin 0 and nodes 1 to 4; permitted paths 1: 130, 10; 2: 210, 20; 3: 320, 30]
166
Bad Gadget: No Solution
Stage 1:
1: [10]
2: [20]
3: [320]
Stage 2:
1:[130]
2:[210]
3:[30]
Back to stage 1
[Figure: BAD GADGET, a graph with origin 0 and nodes 1 to 4; permitted paths 1: 130, 10; 2: 210, 20; 3: 320, 30]
167
Has A Solution, But Can Get
Trapped:
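[Figure: graph with origin 0 and nodes 1 to 6, with per-node permitted paths]
1: 120, 10
2: 210, 20
3: 310, 3120
4: 4310, 453120, 43120
5: 5310, 563120, 53120
6: 6310, 643120, 63120
This part (nodes 3 to 6) has a solution only
when node 1 is assigned the
direct path (1 0).
As with DISAGREE, this part (nodes 1 and 2)
has two distinct solutions
168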
How To Solve An SPP?
 Exponential complexity
 Just enumerate all path assignments,
and check the stability of each.
 NP-complete
 3-SAT can be reduced to SPP
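The brute-force approach can be sketched directly: try every combination of permitted (or null) paths and keep the stable ones. The instances below are DISAGREE and the three listed nodes of BAD GADGET, transcribed from the earlier slides with the first path taken as most preferred; the function names are illustrative. The search space is the product of the per-node path counts, hence exponential in the number of nodes.

```python
from itertools import product

def best(u, permitted, assignment):
    """Highest-ranked permitted path consistent with the neighbor's path."""
    for path in permitted[u]:
        if assignment[path[1]] == path[1:]:
            return path
    return ()   # null path

def solutions(permitted):
    """Enumerate all path assignments and return the stable ones."""
    nodes = sorted(permitted)
    choices = [permitted[u] + [()] for u in nodes]   # permitted paths or null
    found = []
    for combo in product(*choices):
        assignment = dict(zip(nodes, combo))
        assignment[0] = (0,)                          # the origin
        if all(assignment[u] == best(u, permitted, assignment) for u in nodes):
            found.append(assignment)
    return found

DISAGREE = {1: [(1, 2, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)]}
BAD_GADGET = {1: [(1, 3, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)], 3: [(3, 2, 0), (3, 0)]}

n_disagree = len(solutions(DISAGREE))
n_bad_gadget = len(solutions(BAD_GADGET))
print(n_disagree, n_bad_gadget)
```

The enumeration finds two stable assignments for DISAGREE and none for BAD GADGET, in line with the slide titles.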
170
Distributed Algorithms to
Solve SPP
 OSPF-like
 Distribute topology, path ranks
 Solve SPP locally
 Exponential worst case
 How to avoid loops if multiple solutions exist?
 RIP-like:
 Pick the best path from neighbors’ paths
 Tell neighbors about changes
 Can diverge
 Not guaranteed to find a solution even if it exists
 No bound on convergence time
171
SPVP Protocol
 Pick the best path available at any time
process spvp[u] {
  receive P from w ->
  { rib-in(u <= w) := u P
    if rib(u) != best(u) {
      rib(u) := best(u)
      foreach v in peers(u) {
        send rib(u) to v
      }
    }
  }
}
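The pseudocode above can be turned into a small simulation. This is a sketch under stated assumptions: messages are delivered one at a time from a single FIFO queue (one possible activation order among many), paths that already contain the receiver are discarded, and a step limit stands in for detecting divergence. The instances are transcribed from the earlier example slides.

```python
from collections import deque

def spvp(permitted, peers, max_steps=1000):
    """Run SPVP with FIFO message delivery; return (rib, converged)."""
    rib = {u: () for u in permitted}
    rib[0] = (0,)
    rib_in = {(u, w): () for u in permitted for w in peers[u]}
    queue = deque((v, 0, (0,)) for v in peers[0])    # origin announces itself
    steps = 0
    while queue and steps < max_steps:
        u, w, path = queue.popleft()                 # u receives path from w
        steps += 1
        if u == 0:
            continue                                 # origin ignores updates
        # rib-in(u <= w) := u P, dropping looping or withdrawn paths
        rib_in[(u, w)] = (u,) + path if path and u not in path else ()
        best = next((p for p in permitted[u]
                     if p == rib_in.get((u, p[1]), ())), ())
        if rib[u] != best:
            rib[u] = best
            queue.extend((v, u, best) for v in peers[u])
    return rib, not queue

EXAMPLE = {1: [(1, 3, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)], 3: [(3, 0)],
           4: [(4, 2, 0), (4, 3, 0)], 5: [(5, 2, 1, 0)]}
EXAMPLE_PEERS = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 4, 5],
                 3: [0, 1, 4], 4: [2, 3], 5: [2]}
BAD_GADGET = {1: [(1, 3, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)], 3: [(3, 2, 0), (3, 0)]}
BAD_PEERS = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}

rib, converged = spvp(EXAMPLE, EXAMPLE_PEERS)
_, bad_converged = spvp(BAD_GADGET, BAD_PEERS)
print(rib, converged, bad_converged)
```

With this delivery order the five-node example settles on the stable assignment, while BAD GADGET is still exchanging messages when the step limit is reached, since it has no stable state to settle into.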
172
SPVP and SPP
 SPVP wanders around assignment
space
[Figure: regions of SPP instances labeled SPP Solvable, SPVP Can Diverge, must converge, must diverge]
173
A sufficient condition for
sanity
If an instance of SPP has an
acyclic dispute digraph, then
Static (SPP)                         Dynamic (SPVP)
solvable                             safe (can’t diverge)
unique solution                      predictable restoration
all sub-problems uniquely solvable   robust with respect to link/node failures
174
Dispute Digraph Example
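[Figure: BAD GADGET II (origin 0, nodes 1 to 4) and its dispute digraph, with a CYCLE marked]
Permitted paths: 1: 130, 10; 2: 210, 20; 3: 3420, 30; 4: 420, 430
Dispute digraph nodes: 10, 20, 30, 130, 210, 420, 430, 3420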
175
Dispute Wheels
[Figure: dispute wheel with nodes u_0, u_1, u_2, ..., u_i, u_(i+1), ..., u_k, paths Q_0, Q_1, Q_2, ..., Q_i, Q_(i+1), ..., Q_k, and paths R_0, R_1, ..., R_i, ..., R_k]
•At u_i, rank of Q_i is less than or equal to rank of R_i Q_(i+1)
•There exists a dispute wheel iff there exists a cycle in the dispute digraph
176
Dispute Wheel Example
[Figure: graph with origin 0 and nodes 1, 2, 3, and the corresponding dispute wheel]
Permitted paths: 1: 1230, 120, 10; 2: 2310, 230, 20; 3: 3120, 310, 30
177
A Dynamic Solution
 Extend SPVP with a history attribute,
 A route’s history contains a path in the
dispute digraph that “explains” how the route
was obtained,
 A route history will contain a dispute cycle if
and only if a policy dispute is dynamically
realized.
 If a route’s history contains a cycle, then
suppress it ….
178