Computer Networks (Graduate level) Lecture 10: Inter-domain Routing


University of Tehran, Dept. of EE and Computer Engineering
By: Dr. Nasser Yazdani

Inter-Domain Routing

Border Gateway Protocol (BGP)

• Assigned reading
  - [LAB00] Delayed Internet Routing Convergence
• Sources
  - RFC1771: main BGP RFC
  - RFC1772-3-4: application, experiences, and analysis of BGP
  - RFC1965: AS confederations for BGP
  - Christian Huitema: “Routing in the Internet”, chapters 8 and 9
  - John Stewart III: “BGP4 - Inter-domain routing in the Internet”

Outline

• External BGP (E-BGP)
• Internal BGP (I-BGP)
• Multi-Homing
• Stability Issues
• Scalability Issues

Internet Routing

• Internet organized as a two-level hierarchy
• First level: autonomous systems (AS’s)
  - AS: region of network under a single administrative domain
  - Each AS is assigned a unique ID
  - AS’s peer at network exchanges to exchange routing information
• Within an AS, routers run an intra-domain routing protocol
  - Distance Vector, e.g., RIP
  - Link State, e.g., OSPF
• Between AS’s, routers run an inter-domain routing protocol, e.g., Border Gateway Protocol (BGP)
  - De facto standard today: BGP-4

Example

[Figure: example topology with AS-1, AS-2, and AS-3; legend: interior router, BGP router]

Inter-domain Routing basics

• The Internet is composed of over 16,000 autonomous systems
• BGP = Border Gateway Protocol
  - Is a policy-based routing protocol
  - Is the de facto inter-domain routing protocol of today’s global Internet
  - Relatively simple, but configuration is complex, and the entire world can see, and be impacted by, your mistakes

History

• Mid-80s: EGP
  - Reachability protocol (no shortest path)
  - Did not accommodate cycles (tree topology)
  - Evolved when all networks connected to NSF backbone
• Result: BGP introduced as routing protocol
  - Latest version = BGP-4
  - BGP-4 supports CIDR
• Primary objective: connectivity, not performance

Choices

• Link state or distance vector?
  - No universal metric: policy decisions
• Problems with distance vector:
  - Bellman-Ford algorithm may not converge
• Problems with link state:
  - Metric used by routers is not the same, which can cause loops
  - LS database too large (entire Internet)
  - May expose policies to other AS’s

Solution: Distance Vector with Path

• Each routing update carries the entire path
• Loops are detected as follows:
  - When an AS gets a route, it checks whether it is already in the path
  - If yes, reject the route
  - If no, add self and (possibly) advertise the route further
• Advantage:
  - Metrics are local: the AS chooses the path, the protocol ensures no loops
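The loop check on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than a BGP implementation: the function name `process_route` and the use of plain integers for AS numbers are assumptions made for the example, and the AS numbers in the usage lines are borrowed from the ASPATH example later in the lecture.

```python
def process_route(local_as, prefix, as_path):
    """Apply the path-vector loop check to a received route.

    Returns the (prefix, as_path) pair to advertise further, or None
    if the route is rejected because local_as is already in the path.
    """
    if local_as in as_path:
        # Our own AS number is already in the path: accepting would form a loop.
        return None
    # No loop: prepend our own AS number before advertising further.
    return (prefix, [local_as] + list(as_path))


# AS 7018 hears the prefix from its origin, AS 6341, and prepends itself.
accepted = process_route(7018, "135.207.0.0/16", [6341])
# AS 6341 later hears its own route back and rejects it.
rejected = process_route(6341, "135.207.0.0/16", [1239, 7018, 6341])
```

Whether an accepted route is actually advertised further is a policy decision, which is why the slide says "possibly"; the sketch only covers the loop check.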

   

Interconnecting BGP Peers

• BGP uses TCP to connect peers
• AS’s exchange reachability information through their BGP routers, only when routes change
• Advantages:
  - Simplifies BGP
  - No need for periodic refresh: routes are valid until withdrawn, or the connection is lost
  - Incremental updates
• Disadvantages:
  - Congestion control on a routing protocol?
  - Poor interaction during high load

BGP Operations (Simplified)

[Figure: BGP session between AS1 and AS2]

• Establish session on TCP port 179
• Exchange all active routes
• Exchange incremental updates
• While the connection is ALIVE, exchange route UPDATE messages

Customers and Providers

[Figure: provider and customer networks; legend: provider, customer, IP traffic]

Customer pays provider for access to the Internet

The “Peering” Relationship

[Figure: peer and provider/customer links; legend: traffic allowed, traffic NOT allowed]

• Peers provide transit between their respective customers
• Peers do not provide transit between peers
• Peers (often) do not exchange $$$

Peering Provides Shortcuts

Peering also allows connectivity between the customers of “Tier 1” providers.

[Figure: peering links between providers; legend: peer, provider, customer]

Peering Wars

Peer
• Reduces upstream transit costs
• Can increase end-to-end performance
• May be the only way to connect your customers to some part of the Internet (“Tier 1”)

Don’t Peer
• You would rather have customers
• Peers are usually your competition
• Peering relationships may require periodic renegotiation

Peering struggles are by far the most contentious issues in the ISP world!

AS Categories

• Stub: an AS that has only a single connection to one other AS; carries only local traffic.
• Multi-homed: an AS that has connections to more than one AS, but does not carry transit traffic.
• Transit: an AS that has connections to more than one AS, and carries both transit and local traffic (under certain policy restrictions).

[Figure: three example topologies: Stub (AS1, AS2), Multi-homed (AS1, AS2, AS3), Transit (AS1, AS2, AS3)]

Policy with BGP

• BGP provides capability for enforcing various policies
• Policies are not part of BGP: they are provided to BGP as configuration information
• BGP enforces policies by choosing paths from multiple alternatives and controlling advertisement to other AS’s

Examples of BGP Policies

• A multi-homed AS refuses to act as transit
  - Limit path advertisement
• A multi-homed AS can become transit for some AS’s
  - Only advertise paths to some AS’s
• An AS can favor or disfavor certain AS’s for traffic transit from itself

Routing Information Bases (RIB)

• Routes are stored in RIBs
• Adj-RIBs-In: routing info that has been learned from other routers (unprocessed routing info)
• Loc-RIB: local routing information selected from Adj-RIBs-In (routes selected locally)
• Adj-RIBs-Out: info to be advertised to peers (routes to be advertised)

Architecture of Dynamic Routing

[Figure: AS 1 running OSPF and AS 2 running EIGRP, connected by BGP]

• IGP = Interior Gateway Protocol
  - Metric based: OSPF, IS-IS, RIP, EIGRP (Cisco)
• EGP = Exterior Gateway Protocol
  - Policy based: BGP
• The routing domain of BGP is the entire Internet

Four Types of BGP Messages

• Open: establishes a peering session.
• Keep Alive: handshake at regular intervals.
• Notification: shuts down a peering session.
• Update: announces new routes or withdraws previously announced routes.

announcement = prefix + attribute values

Two Types of BGP Neighbor Relationships

[Figure: AS1 and AS2 with eBGP and iBGP sessions]

• External Neighbor (eBGP): in a different Autonomous System
• Internal Neighbor (iBGP): in the same Autonomous System

iBGP is routed using Interior Gateway Protocol (IGP)!

iBGP Peers Must be Fully Meshed

[Figure: an eBGP update relayed as iBGP updates]

• iBGP is needed to avoid routing loops within an AS
• Injecting external routes into IGP does not scale and causes BGP policy information to be lost
• BGP does not provide “shortest path” routing

iBGP neighbors do not announce routes received via iBGP to other iBGP neighbors.

Important BGP attributes

• LocalPREF
  - Local preference policy to choose “most” preferred route
• Multi-exit Discriminator (MED)
  - Which peering point to choose?
• Import Rules
  - What route advertisements do I accept?
• Export Rules
  - Which routes do I forward to whom?

Implementing Customer/Provider and Peer/Peer relationships

Two parts:
• Enforce transit relationships
  - Outbound route filtering
• Enforce order of route preference
  - provider < peer < customer

Import Routes

[Figure: routes imported by an ISP from its providers, peers, and customers; legend: provider route, peer route, customer route, ISP route]

Export Routes

[Figure: routes exported by an ISP to its providers, peers, and customers, with filters blocking some routes; legend: provider route, peer route, customer route, ISP route]
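The two parts named earlier (outbound filtering and route preference) can be sketched as follows. The export rule encoded here is the commonly used one that matches the peering slides: routes learned from customers, and locally originated routes, are exported to everyone, while routes learned from a peer or a provider are exported only to customers. The names `may_export` and `LOCAL_PREF` and the string labels are assumptions for the example; the local-preference values are illustrative, and only their ordering (provider < peer < customer) comes from the slides.

```python
# Illustrative local-preference values; only the ordering matters.
LOCAL_PREF = {"customer": 100, "peer": 90, "provider": 80}


def may_export(learned_from, send_to):
    """Outbound filter that enforces the transit relationships.

    learned_from: "customer", "peer", "provider", or "self" (locally originated)
    send_to:      "customer", "peer", or "provider"
    """
    if learned_from in ("customer", "self"):
        # Customer routes and our own routes are announced to every neighbor.
        return True
    # Peer and provider routes are passed on to customers only,
    # so we never provide free transit between peers or providers.
    return send_to == "customer"


# A peer's routes reach our customers but not our other peers or providers.
peer_to_customer = may_export("peer", "customer")
peer_to_peer = may_export("peer", "peer")
```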

BGP Common Header

• Marker (security and message delineation): 16 bytes
• Length: 2 bytes
• Type: 1 byte

Types: OPEN, UPDATE, NOTIFICATION, KEEPALIVE
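As a rough sketch of how this 19-byte header could be decoded, the snippet below uses Python's standard `struct` module. The numeric type codes (1 to 4) follow the BGP-4 RFC cited in the sources; the function name `parse_header` and the hand-built sample message are assumptions for illustration. The sample marker is all ones, which is what the RFC prescribes when no authentication is in use.

```python
import struct

# Marker (16 bytes), Length (2 bytes), Type (1 byte), network byte order.
HEADER = struct.Struct("!16sHB")
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def parse_header(data):
    """Decode the common header at the start of a BGP message."""
    marker, length, type_code = HEADER.unpack_from(data)
    return marker, length, MESSAGE_TYPES.get(type_code, "UNKNOWN")


# A KEEPALIVE is just the header: all-ones marker, length 19, type 4.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
marker, length, kind = parse_header(keepalive)
```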

BGP OPEN message

Fields after the common header (Marker, Length, Type: OPEN):
• Version
• My autonomous system
• Hold time
• BGP identifier
• Parameter length
• Optional parameters

Notes:
• My AS: ID assigned to that AS
• Hold time: maximum interval between KEEPALIVE or UPDATE messages; a zero interval implies no keep_alive
• BGP ID: IP address of one interface (same for all messages)

BGP UPDATE message

Fields after the common header (Marker, Length, Type: UPDATE):
• Withdrawn routes length
• Withdrawn routes (variable)
• Path attribute length
• Path attributes (variable)
• Network layer reachability information (NLRI) (variable)

Notes:
• Many prefixes may be included in an UPDATE, but they must share the same attributes.
• An UPDATE message may report multiple withdrawn routes.

BGP UPDATE Message

• List of withdrawn routes
• Network layer reachability information
  - List of reachable prefixes
• Path attributes
  - Origin
  - Path
  - Metrics
• All prefixes advertised in a message have the same path attributes

NLRI

• Network Layer Reachability Information
  - List of IP address prefixes, each encoded as: Length (1 byte), Prefix (variable)
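A small sketch of decoding this encoding for IPv4 follows. It relies on two details from the BGP-4 RFC rather than from the slide: the length byte counts bits, and the prefix occupies just enough whole bytes to hold that many bits. The function name `parse_nlri` and the sample bytes are assumptions for the example.

```python
import ipaddress


def parse_nlri(data):
    """Decode a sequence of (length, prefix) entries into IPv4 networks."""
    prefixes = []
    i = 0
    while i < len(data):
        bits = data[i]  # prefix length in bits
        if bits > 32:
            raise ValueError("prefix length out of range for IPv4")
        nbytes = (bits + 7) // 8  # whole bytes needed to hold the prefix
        raw = data[i + 1:i + 1 + nbytes]
        if len(raw) < nbytes:
            raise ValueError("truncated NLRI entry")
        packed = raw + bytes(4 - nbytes)  # pad to a full 4-byte address
        prefixes.append(ipaddress.IPv4Network((packed, bits)))
        i += 1 + nbytes
    return prefixes


# Two entries: 135.207.0.0/16 (2 prefix bytes) and 192.0.2.0/24 (3 prefix bytes).
nlri = bytes([16, 135, 207, 24, 192, 0, 2])
parsed = [str(p) for p in parse_nlri(nlri)]
```

The variable-length prefix is what keeps the encoding compact: a /16 costs three bytes on the wire instead of five.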

Path attributes

• Type-Length-Value encoding:
  - Attribute type (2 bytes)
  - Attribute length (1-2 bytes)
  - Attribute value (variable length)
• Attribute type field:
  - Attribute flags (1 byte)
  - Attribute type code (1 byte)
• Flags: optional vs. well-known, transitive, partial, extended length

BGP NOTIFICATION message

Fields after the common header (Marker, Length, Type: NOTIFICATION):
• Error code
• Error sub-code
• Data

Notes:
• Used for error notification
• TCP connection is closed immediately after notification

BGP KEEPALIVE message

Fields: Marker, Length, Type: KEEPALIVE

• Sent periodically to peers to ensure connectivity.
• If hold_time is zero, messages are not sent.
• Sent in place of an UPDATE message

Path Selection Criteria

• Information based on path attributes
• Attributes + external (policy) information
• Examples:
  - Hop count
  - Policy considerations
    - Preference for AS
    - Presence or absence of certain AS
  - Path origin
  - Link dynamics

Route Selection Summary

Criteria are applied in order:
• Highest Local Preference (enforce relationships)
• Shortest ASPATH (traffic engineering)
• Lowest MED (traffic engineering)
• i-BGP < e-BGP, i.e., prefer routes learned via e-BGP (traffic engineering)
• Lowest IGP cost to BGP egress (traffic engineering)
• Lowest router ID (throw up hands and break ties)
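One way to read this summary is as a lexicographic comparison: each criterion is consulted only when all earlier ones tie. The sketch below encodes that reading as a sort key. The `Route` fields, names, and sample values are assumptions for illustration, and the sketch simplifies real BGP, where, for example, MEDs are normally compared only between routes from the same neighboring AS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    local_pref: int
    as_path: tuple
    med: int
    learned_via_ebgp: bool
    igp_cost: int
    router_id: int


def preference_key(route):
    """Smaller key means more preferred, so 'highest' criteria are negated."""
    return (
        -route.local_pref,           # highest local preference
        len(route.as_path),          # shortest AS path
        route.med,                   # lowest MED
        not route.learned_via_ebgp,  # e-BGP preferred over i-BGP
        route.igp_cost,              # lowest IGP cost to the egress
        route.router_id,             # lowest router ID as final tie break
    )


def best_route(routes):
    return min(routes, key=preference_key)


candidates = [
    Route("customer route, far egress", 100, (2, 1), 0, True, 56, 3),
    Route("peer route, short path", 90, (1,), 0, True, 15, 1),
    Route("customer route, near egress", 100, (2, 1), 0, True, 15, 2),
]
best = best_route(candidates)
```

Here the peer route loses despite its shorter AS path because local preference is checked first, and the two customer routes are separated only by IGP cost.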

Back to Frank …

[Figure: AS 1 (13.13.0.0/16), AS 2, AS 3, and AS 4 connected by peer and provider/customer links; routes are labeled local pref = 80, local pref = 90, and local pref = 100]

• Local preference is only used in iBGP
• Higher local preference values are more preferred

Backup Links with Local Preference (Outbound Traffic)

[Figure: AS 65000 connected to AS 1 by a primary link and a backup link]

• Primary link: set Local Pref = 100 for all routes from AS 1
• Backup link: set Local Pref = 50 for all routes from AS 1

Forces outbound traffic to take the primary link, unless the link is down.

We’ll talk about inbound traffic soon …

ASPATH Attribute

[Figure: prefix 135.207.0.0/16 propagating from its origin through several ASes, including AS 12654 (RIPE NCC RIS project)]

• AS 6341 (AT&T Research) originates the prefix: 135.207.0.0/16, AS Path = 6341
• AS 7018 (AT&T): 135.207.0.0/16, AS Path = 7018 6341
• AS 1239 (Sprint): 135.207.0.0/16, AS Path = 1239 7018 6341
• AS 3549 (Global Crossing): 135.207.0.0/16, AS Path = 3549 7018 6341
• AS 1755 (Ebone): 135.207.0.0/16, AS Path = 1755 1239 7018 6341
• AS 1129 (Global Access): 135.207.0.0/16, AS Path = 1129 1755 1239 7018 6341

COMMUNITY Attribute to the Rescue!

[Figure: customer AS 2 announces 192.0.2.0/24 to providers AS 1 and AS 3 over a primary and a backup link; one announcement carries ASPATH = 2, the other ASPATH = 2 with COMMUNITY = 3:70]

AS 3: normal customer local pref is 100, peer local pref is 90

Customer import policy at AS 3:
• If 3:90 in COMMUNITY then set local preference to 90
• If 3:80 in COMMUNITY then set local preference to 80
• If 3:70 in COMMUNITY then set local preference to 70
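The import policy at AS 3 amounts to a lookup from community value to local preference. A minimal sketch, assuming communities are represented as "AS:value" strings: the names below are hypothetical, and taking the lowest value when several communities match is an assumption, since the slide does not say how conflicts are resolved.

```python
DEFAULT_CUSTOMER_LOCAL_PREF = 100  # "normal customer local pref" at AS 3
COMMUNITY_TO_LOCAL_PREF = {"3:90": 90, "3:80": 80, "3:70": 70}


def customer_import_local_pref(communities):
    """Return the local preference AS 3 assigns to a customer route."""
    matches = [
        COMMUNITY_TO_LOCAL_PREF[c]
        for c in communities
        if c in COMMUNITY_TO_LOCAL_PREF
    ]
    # Assumption: if several communities match, honor the lowest request.
    return min(matches) if matches else DEFAULT_CUSTOMER_LOCAL_PREF


# The announcement tagged 3:70 ends up below AS 3's peer preference of 90,
# so AS 3 uses that link only when it has no other route to the prefix.
tagged_pref = customer_import_local_pref(["3:70"])
untagged_pref = customer_import_local_pref([])
```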

Hot Potato Routing: Go for the Closest Egress Point

[Figure: a router with two BGP routes to 192.44.78.0/24; IGP distances: 15 to egress 1, 56 to egress 2]

This router has two BGP routes to 192.44.78.0/24.

Hot potato: get traffic off of your network as soon as possible. Go for egress 1!

Getting Burned by the Hot Potato

[Figure: a high-bandwidth provider backbone and a low-bandwidth customer backbone, with sites SFF, NYC, and San Diego, a heavy-content web farm, and IGP distances 2865, 17, 15, and 56; a tiny http request is answered by a huge http reply]

Many customers want their provider to carry the bits!

Cold Potato Routing with MEDs (Multi-Exit Discriminator Attribute)

[Figure: 192.44.78.0/24 (heavy-content web farm) announced with MED = 15 on one link and MED = 56 on the other; IGP distances 2865, 17, 15, and 56]

• Prefer lower MED values
• This means that MEDs must be considered BEFORE IGP distance!

Note 1: some providers will not listen to MEDs
Note 2: MEDs need not be tied to IGP distance

MED

[Figure: ISP1 and ISP2 interconnected at SF and NY]

• MED is typically used in provider/subscriber scenarios
• It can lead to unfairness if used between ISPs because it may force one ISP to carry more traffic:
  - ISP1 ignores MED from ISP2
  - ISP2 obeys MED from ISP1
  - ISP2 ends up carrying traffic most of the way

Policies Can Interact Strangely (“Route Pinning” Example)

[Figure: four numbered panels showing a customer and its backup link; panel 1 has no caption]

• Panel 2: Install backup link using community
• Panel 3: Disaster strikes primary link and the backup takes over
• Panel 4: Primary link is restored but some traffic remains pinned to backup

Path Attributes

• Categories (recall flags):
  - well-known mandatory (passed on)
  - well-known discretionary (passed on)
  - optional transitive (passed on)
  - optional non-transitive (if unrecognized, not passed on)
• Optional attributes allow for BGP extensions

Path attribute message format (repeated)

• Attribute flags: O T P E 0
  - O: optional or well-known
  - T: transitive or local
  - P: partially evaluated
  - E: length in 1 or 2 bytes
• Attribute type code: Origin, AS_path, Next hop, etc.
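The flag bits can be decoded with simple masks. In the sketch below the bit positions (O = 0x80, T = 0x40, P = 0x20, E = 0x10) and the type codes 1 to 3 follow the BGP-4 RFC; the function name, the dictionary layout, and the sample bytes are assumptions for illustration.

```python
FLAG_OPTIONAL = 0x80         # O: optional (1) or well-known (0)
FLAG_TRANSITIVE = 0x40       # T: transitive (1) or local (0)
FLAG_PARTIAL = 0x20          # P: partial
FLAG_EXTENDED_LENGTH = 0x10  # E: length field is 2 bytes (1) or 1 byte (0)

TYPE_CODES = {1: "ORIGIN", 2: "AS_PATH", 3: "NEXT_HOP"}


def decode_attribute(data):
    """Decode one path attribute found at the start of data."""
    flags, type_code = data[0], data[1]
    if flags & FLAG_EXTENDED_LENGTH:
        length = int.from_bytes(data[2:4], "big")
        header_len = 4
    else:
        length = data[2]
        header_len = 3
    return {
        "optional": bool(flags & FLAG_OPTIONAL),
        "transitive": bool(flags & FLAG_TRANSITIVE),
        "partial": bool(flags & FLAG_PARTIAL),
        "type": TYPE_CODES.get(type_code, type_code),
        "value": data[header_len:header_len + length],
    }


# ORIGIN: well-known and transitive (flags 0x40), type 1, length 1, value 0.
origin = decode_attribute(bytes([0x40, 1, 1, 0]))
```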

ORIGIN path attribute

• Well-known, mandatory attribute.
• Describes how a prefix was generated at the origin AS. Possible values:
  - IGP: prefix learned from IGP
  - EGP: prefix learned through EGP
  - INCOMPLETE: none of the above (often seen for static routes)

Next hop path attribute

• Well-known, mandatory attribute
• NEXT_HOP: IP address of border router to be used as next hop
• Usually, next hop is the router sending the UPDATE message
• Useful when some routers do not speak BGP

Other Attributes

• ORIGIN
  - Source of route (IGP, EGP, other)
• NEXT_HOP
  - Address of next hop router to use
  - Used to direct traffic to non-BGP router
• Check out http://www.cisco.com for full explanation


I-BGP


Internal BGP (I-BGP)

• Same messages as E-BGP
• Different rules about re-advertising prefixes:
  - Prefix learned from E-BGP can be advertised to I-BGP neighbor and vice-versa, but
  - Prefix learned from one I-BGP neighbor cannot be advertised to another I-BGP neighbor
  - Reason: no AS PATH within the same AS and thus danger of looping.
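The re-advertisement rule reduces to a single condition, sketched below. The function name and the "ebgp"/"ibgp" string labels are assumptions for the example, and policy filters that may further restrict advertisement are left out.

```python
def may_readvertise(learned_via, neighbor_type):
    """Can a prefix learned over one session type be sent to a neighbor?

    Both arguments are "ebgp" or "ibgp". Only the iBGP-to-iBGP case
    is forbidden, because there is no AS path inside an AS to catch loops.
    """
    return not (learned_via == "ibgp" and neighbor_type == "ibgp")


# A border router passes external routes inward and internal routes outward,
# but never relays between two internal neighbors.
ebgp_to_ibgp = may_readvertise("ebgp", "ibgp")
ibgp_to_ebgp = may_readvertise("ibgp", "ebgp")
ibgp_to_ibgp = may_readvertise("ibgp", "ibgp")
```

Because of the forbidden case, every I-BGP speaker has to hear each prefix directly from the router that learned it, which is what forces the full mesh.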

Internal BGP (I-BGP)

[Figure: AS1 with routers R1, R2, and R3 connected by I-BGP; R3 has an E-BGP session with R4 in AS2]

• R3 can tell R1 and R2 prefixes from R4
• R3 can tell R4 prefixes from R1 and R2
• R3 cannot tell R2 prefixes from R1
• R2 can only find these prefixes through a direct connection to R1
• Result: I-BGP routers must be fully connected (via TCP)!
  - contrast with E-BGP sessions that map to physical links

Link Failures

• Two types of link failures:
  - Failure on an E-BGP link
  - Failure on an I-BGP link
• These failures are treated completely differently in BGP
• Why?

Failure on an E-BGP Link

[Figure: R1 in AS1 and R2 in AS2 joined by a physical link that carries the E-BGP session; interface addresses 138.39.1.1/30 and 138.39.1.2/30]

• If the link R1-R2 goes down
  - The TCP connection breaks
  - BGP routes are removed
• This is the desired behavior

Failure on an I-BGP Link

[Figure: R1 and R2 joined by a physical link (138.39.1.1/30, 138.39.1.2/30) and an I-BGP connection, with R3 offering an indirect path]

• If link R1-R2 goes down, R1 and R2 should still be able to exchange traffic
• The indirect path through R3 must be used
• Thus, E-BGP and I-BGP must use different conventions with respect to TCP endpoints


Multi-homing

• With multi-homing, a single network has more than one connection to the Internet.
• Improves reliability and performance:
  - Can accommodate link failure
  - Bandwidth is sum of links to Internet
• Challenges
  - Getting policy right (MED, etc.)
  - Addressing

Multi-homing to a Single Provider: Case 1

[Figure: ISP (R1) and customer (R2)]

• Easy solution:
  - Use IMUX or Multi-link PPP
• Hard solution:
  - Use BGP
  - Makes assumptions about traffic (same amount of prefixes can be reached from both links)

Multi-homing to a single provider: Case 2

[Figure: ISP (R1) and customer (R2, R3); prefixes 138.39/16 and 204.70/16]

• If multiple prefixes, may use MED
  - good if traffic load from prefixes is equal
• If single prefix, load may be unequal
  - break down the prefix and advertise different prefixes over different links

Multi-homing to a single provider: Case 3

[Figure: ISP (R1, R2) and customer (R3); prefixes 138.39/16 and 204.70/16]

• For ISP -> customer traffic, same as before:
  - use MED
  - good if traffic load to prefixes is equal
• For customer -> ISP traffic:
  - R3 alternates links
  - multiple default routes

Multi-homing to a single provider: Case 4

[Figure: ISP (R1, R2) and customer (R3, R4); prefixes 138.39/16 and 204.70/16]

• Most reliable approach
  - no equipment sharing
• Customer -> ISP:
  - same as case 2
• ISP -> customer:
  - same as case 3

Multi-homing to Multiple Providers

[Figure: ISP1, ISP2, ISP3, and the customer]

• Major issues:
  - Addressing
  - Aggregation
• Customer address space:
  - Delegated by ISP1
  - Delegated by ISP2
  - Delegated by ISP1 and ISP2
  - Obtained independently
• Advantages and disadvantages?

Address Space from one ISP

[Figure: ISP1 (138.39/16), ISP2, ISP3, and the customer (138.39.1/24)]

• Customer uses address space from one ISP, e.g., ISP1
• ISP1 advertises /16 aggregate
• Customer advertises /24 route to ISP2
• ISP2 relays route to ISP1 and ISP3
• ISP2 and ISP3 use the /24 route
• ISP1 routes directly
• Problems with traffic load?

Pitfalls

[Figure: ISP1 (138.39/16, with 138.39.0/19 internally), ISP2, ISP3, and the customer (138.39.1/24)]

• ISP1 aggregates to a /19 at border router to reduce internal tables.
• ISP1 still announces /16.
• ISP1 hears /24 from ISP2.
• ISP1 routes packets for customer to ISP2!
• Workaround: ISP1 must inject /24 into I-BGP.

Address Space from Both ISPs

[Figure: ISP1, ISP2, ISP3, and the customer with prefixes 138.39.1/24 and 204.70.1/24]

• ISP1 and ISP2 continue to announce aggregates
• Load sharing depends on traffic to two prefixes
• Lack of reliability: if ISP1 link goes down, part of customer becomes inaccessible.
• Customer may announce prefixes to both ISPs, but still problems with longest match as in case 1.

Address Space Obtained Independently

[Figure: ISP1, ISP2, ISP3, and the customer]

• Offers the most control, but at the cost of aggregation.
• Still need to control paths
  - suppose ISP1 large, ISP2-3 small
  - customer advertises long path to ISP1, but local-pref attribute used to override
  - ISP3 learns shorter path from ISP2


Convergence in the real-world?

• [Labovitz99] Experimental results from a two-year study which measured 150,000 BGP faults injected into peering sessions at several IXPs
  - Found Internet averages 3 minutes to converge after failover
  - Some multihomed failovers (short to long ASPath) require 15 minutes

Signs of Routing Instability

• Record of BGP messages at major exchanges; packet loss up 30 times and delay up 4 times.
• Discovered orders of magnitude more updates than expected
  - Bulk were duplicate withdrawals
    - Stateless implementation of BGP: did not keep track of information passed to peers
    - Impact of few implementations
  - Strong frequency (30/60 sec) components
    - Interaction with other local routing/links etc.

30 Second Bursts

[Figure: plot]

How Long Does BGP Take to Adapt to Changes?

[Figure: plot with x-axis "Seconds Until Convergence" (0 to 160), y-axis 0 to 100, and curves Tup, Tshort, Tlong, Tdown]

Thanks to Abha Ahuja and Craig Labovitz for this plot.

Route Flap Storm

• Overloaded routers fail to send Keep_Alive messages and are marked as down
• I-BGP peers find alternate paths
• Overloaded router re-establishes peering session
• Must send large updates
• Increased load causes more routers to fail!

Route Flap Dampening

• Routers now give higher priority to BGP/Keep_Alive to avoid problem
• Associate a penalty with each route
  - Increase when route flaps
  - Exponentially decay penalty with time
• When penalty reaches threshold, suppress route
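The penalty mechanism can be sketched as a small class. All numeric parameters below (penalty per flap, half-life, suppress threshold) are illustrative assumptions, not values from the lecture. The separate reuse threshold, below which a suppressed route is released again, is likewise an assumption added to make the sketch complete, since the slide only describes suppression.

```python
class FlapDamper:
    """Per-route flap penalty with exponential decay (illustrative values)."""

    def __init__(self, penalty_per_flap=1000.0, half_life=900.0,
                 suppress_limit=2000.0, reuse_limit=750.0):
        self.penalty_per_flap = penalty_per_flap
        self.half_life = half_life            # seconds for the penalty to halve
        self.suppress_limit = suppress_limit  # suppress at or above this
        self.reuse_limit = reuse_limit        # release once below this
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Halve the penalty for every half_life seconds that have elapsed.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def flap(self, now):
        """Record one flap at time `now` (seconds)."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress_limit:
            self.suppressed = True

    def is_suppressed(self, now):
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return self.suppressed


damper = FlapDamper()
damper.flap(0.0)
after_one_flap = damper.is_suppressed(0.0)
damper.flap(0.0)
after_two_flaps = damper.is_suppressed(0.0)
```

Using a lower reuse threshold than the suppress threshold keeps a route from bouncing in and out of suppression while its penalty hovers near a single limit.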

BGP Limitations: Oscillations

[Figure sequence, 7 frames: AS 0, AS 1, and AS 2 each hold candidate routes to destination R, written as a triple in which * marks the selected route and - a missing one. Frames 2 to 7 show two triples per AS, listed here in the order they appear on the slide, plus the message labels in flight.]

• Frame 1: AS 0 (*R,1R,2R); AS 1 (0R,*R,2R); AS 2 (0R,1R,*R)
• Frame 2: AS 0 (-,*1R,2R) (*R,1R,2R); AS 1 (*0R,-,2R) (0R,*R,2R); AS 2 (0R,1R,*R) (*0R,1R,-); messages: W
• Frame 3: AS 0 (-,*1R,2R) (-,*1R,2R); AS 1 (-,-,*2R) (*0R,-,2R); AS 2 (*0R,1R,-) (01R,*1R,-); messages: 01R
• Frame 4: AS 0 (-,-,*2R) (-,*1R,2R); AS 1 (-,-,*2R) (-,-,*2R); AS 2 (01R,*1R,-) (*01R,10R,-); messages: 10R
• Frame 5: AS 0 (-,-,-) (-,-,*2R); AS 1 (-,-,*20R) (-,-,*2R); AS 2 (*01R,10R,-) (*01R,10R,-); messages: 20R
• Frame 6: AS 0 (-,*12R,-) (-,-,-); AS 1 (-,-,*20R) (-,-,*20R); AS 2 (*01R,10R,-) (*01R,-,-); messages: 12R
• Frame 7: AS 0 (-,*12R,21R) (-,*12R,-); AS 1 (-,-,-) (-,-,*20R); AS 2 (*01R,-,-) (*01R,-,-); messages: 21R

BGP Oscillations

• Can possibly explore every possible path through the network: (n-1)! combinations
• Limit between update messages (MinRouteAdver) reduces exploration
  - Forces router to process all outstanding messages
• Typical Internet failover times
  - New/shorter link: 60 seconds
    - Results in simple replacement at nodes
  - Down link: 180 seconds
    - Results in search of possible options
  - Longer link: 120 seconds
    - Results in replacement or search based on length

Problems

• Routing table size
  - Need an entry for all paths to all networks
• Required memory = O((N + M*A) * K)
  - N: number of networks
  - M: mean AS distance (in terms of hops)
  - A: number of AS’s
  - K: number of BGP peers

Routing Table Size

Networks   Mean AS Distance   Number of AS’s   BGP Peers/Net   Memory
2,100      5                  59               3               27,000
4,000      10                 100              6               108,000
10,000     15                 300              10              490,000
100,000    20                 3,000            20              1,040,000

• Problem reduced with CIDR


Big and Getting Bigger

• Scaling the iBGP mesh
  - Confederations
  - Route Reflectors
• BGP Table Growth
  - Address aggregation (CIDR)
  - Address allocation
• AS number allocation and use
• Dynamics of BGP
  - Inherent vs. accidental oscillation
  - Rate limiting and route flap dampening
  - Lots and lots of noise
  - Slow convergence time

iBGP Mesh Does Not Scale

[Figure: an eBGP update relayed as iBGP updates]

• N border routers means N(N-1)/2 peering sessions
• Each router must have N-1 iBGP sessions configured
• The addition of a single iBGP speaker requires configuration changes to all other iBGP speakers
• Size of iBGP routing table can be order N larger than number of best routes (remember alternate routes!)
• Each router has to listen to update noise from each neighbor

Currently four solutions:
(0) Buy bigger routers!
(1) Break AS into smaller ASes
(2) BGP Route reflectors
(3) BGP confederations
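To get a feel for the growth, the two counts from this slide can be computed directly. The function name and the sample sizes are arbitrary choices for illustration.

```python
def ibgp_full_mesh(n):
    """Session counts for a full iBGP mesh of n routers."""
    return {
        "sessions_total": n * (n - 1) // 2,  # one session per router pair
        "sessions_per_router": n - 1,        # each router peers with all others
    }


# Total sessions grow quadratically: 10 routers need 45, 100 need 4,950.
small = ibgp_full_mesh(10)
large = ibgp_full_mesh(100)
```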

BGP Table Growth

[Figure: plot of BGP table growth]

Thanks to Geoff Huston. http://www.telstra.net/ops/bgptable.html on August 8, 2001

Large BGP Tables Considered Harmful

• Routing tables must store best routes and alternate routes
• Burden can be large for routers with many alternate routes (route reflectors for example)
• Routers have been known to die
• Increases CPU load, especially during session reset

Moore’s Law may save us in theory. But in practice it means spending money to upgrade equipment …

Deaggregation Due to Multihoming May be a Leading Cause

[Figure: customer AS 2 (12.2.0.0/16) connected to provider AS 1 (12.0.0.0/8) and provider AS 3; 12.2.0.0/16 is announced through the providers]

• AS 2 is “punching a hole” in the CIDR block of AS 1
• If AS 1 does not announce the more specific prefix, then most traffic to AS 2 will go through AS 3 because it is a longer match
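The "longer match" effect can be reproduced with Python's standard `ipaddress` module. The table below models a remote router that hears only AS 1's /8 aggregate and the /16 announced via AS 3; the function name `longest_match`, the table layout, and the destination addresses are assumptions for illustration.

```python
import ipaddress


def longest_match(destination, table):
    """Return the next hop of the most specific prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matching = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matching:
        return None
    # The prefix with the largest prefix length is the longest match.
    return max(matching, key=lambda entry: entry[0].prefixlen)[1]


# AS 1 announces only its aggregate; the /16 is heard via AS 3.
table = {"12.0.0.0/8": "AS 1", "12.2.0.0/16": "AS 3"}
to_customer = longest_match("12.2.5.1", table)  # inside AS 2's /16
to_other = longest_match("12.9.0.1", table)     # elsewhere in the /8
```

Traffic for AS 2's addresses follows the /16 through AS 3 even though AS 1's aggregate also covers them, which is why AS 1 ends up announcing the more specific prefix too and the global table grows.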

How Many ASNs are there?

[Figure: plot of the number of ASNs]

Thanks to Geoff Huston. http://www.telstra.net/ops on June 23, 2001

When will we run out of ASNs?

[Figure: plot of ASN growth with a limit marked at 64,511 and candidate exhaustion dates marked "2005?" and "2007?"]

What is to be done?

• Make ASNs larger than 16 bits
  - How about 32 bits? See Internet Draft: “BGP support for four-octet AS number space” (draft-ietf-idr-as4bytes-03.txt)
  - Requires protocol change and wide deployment
• Change the way ASNs are used
  - Allow multihomed, non-transit networks to use private ASNs
  - Uses ASE (AS number Substitution on Egress)
    - See Internet Draft: “Autonomous System Number Substitution on Egress” (draft-jhaas-ase-00.txt)
    - Works at edge, requires protocol change (for loop prevention)

AS Graphs Can Be Fun

[Figure: AS graph]

The subgraph showing all ASes that have more than 100 neighbors in the full graph of 11,158 nodes. July 6, 2001. Point of view: AT&T route-server

AS Graphs Do Not Show Topology!

BGP was designed to throw away information!

[Figure: two panels, captioned "The AS graph may look like this." and "Reality may be closer to this…"]

BGP Dynamics

• How many updates are flying around the Internet?
• How long does it take routes to change?

The goals of (1) fast convergence, (2) minimal updates, and (3) path redundancy are at odds

Daily Update Count

[Figure: plot]

What is the Sound of One Route Flapping?

[Figure: plot]

A Few Bad Apples …

[Figure: plot; axis label: percent of BGP table prefixes; data source: RIPE NCC]

• Most prefixes are stable most of the time. On this day, about 83% of the prefixes were not updated.
• Typically, 80% of the updates are for less than 5% of the prefixes.

Two BGP Mechanisms for Squashing Updates

• Rate limiting on sending updates
  - Send batch of updates every MinRouteAdvertisementInterval seconds (+/- random fuzz)
  - Default value is 30 seconds
  - A router can change its mind about best routes many times within this interval without telling neighbors
  - Effective in dampening oscillations inherent in the vectoring approach
• Route Flap Dampening
  - Punish routes for misbehaving
  - Must be turned on with configuration

Two Main Factors in Delayed Convergence

• Rate limiting timer slows everything down
• BGP can explore many alternate paths before giving up or arriving at a new path
  - No global knowledge in vectoring protocols

Q: Why All the Updates?

• Networks come, networks go
• There’s always a router rebooting somewhere
• Hardware failure, flaky interface cards, backhoes digging, floods in Houston, …

This is “normal”: exactly what dynamic routing is designed for…

Q: Why All the Updates?

• Misconfiguration
• Route flap dampening not widely used
• BGP exploring many alternate paths
• Software bugs in implementation of routing protocols
• BGP session resets due to congestion or lack of interoperability: BGP sessions are brittle. One malformed update is enough to reset session and flap 100K routes. (Consequence of incremental approach)
• IGP instability exported by use of MEDs or IGP tie breaker
• Sub-optimal vendor implementation choices
• Secret sauce routing algorithms attempting fancy-dancy tricks
• Weird policy interactions (MED oscillation, BAD GADGETS??)
• Gnomes, sprites, and fairies ….

A: NO ONE REALLY KNOWS …

IGP Tie Breaking Can Export Internal Instability to the Whole Wide World

[Figure: AS 1, AS 2, AS 3, and AS 4 with prefix 192.44.78.0/24, advertised with ASPATH = 4 2 1 or ASPATH = 4 3 1; IGP distances 15, 10, and 56, with FLAP markers]

MEDs Can Export Internal Instability

[Figure: heavy-content web farm announcing 192.44.78.0/24 with MED = 15 on one link and MED = 56 OR 10 on the other; IGP distances 2865, 17, 15, 10, and 56, with FLAP markers]

Implementation Does Matter!

[Figure: plot with two annotations: "stateless withdraws widely deployed" and "stateful withdraws widely deployed"]

Thanks to Abha Ahuja and Craig Labovitz for this plot.

How Long Will Interdomain Routing Continue to Scale?

A quote from some recent email:

... the existing interdomain routing infrastructure is rapidly nearing the end of its useful lifetime. It appears unlikely that mere tweaks of BGP will stave off fundamental scaling issues, brought on by growth, multihoming and other causes.

Is this true or false? How can we tell?

Research required…

Next Lecture: Multicasting

• How to send data to a group of receivers.
• Assigned reading
  - Multicast Routing in Datagram Internetworks and Extended LANs
  - Next Branch Multicast (NBM) routing protocol
  - Chapter 4, multicasting.