
Corso di Reti di Calcolatori II (Computer Networks II course)
Simon Pietro Romano
Inter-domain routing with BGP4
Copyright notes…
● This is a shortened version of a tutorial taught by Prof. Olivier Bonaventure from Université catholique de Louvain (UCL), Belgium
● You can obtain an HTML or OpenOffice version of this tutorial with the hypertext links by sending an email to the author.
● This work is licensed under a Creative Commons License:
– http://creativecommons.org/licenses/by-sa/2.0/
● The updated versions of the slides may be found on:
– http://totem.info.ucl.ac.be/BGP
● Tim Griffin maintains a very long and up-to-date list of references on BGP; see:
– http://www.cambridge.intel-research.net/~griffin/interdomain/
Outline
● Organization of the global Internet
– Example of domains
● BGP basics
● BGP in large networks
How to route IP packets in the global Internet?
● A map of the global Internet in 2000 (source: http://research.lumeta.com/ches/map/gallery/index.html)
Organization of the Internet
● The Internet is composed of more than 10,000 autonomous routing domains (AS – Autonomous System)
– A domain is a set of routers, links, hosts and local area networks under the same administrative control
● A domain can be very large...
– AS568: SUMNET-AS DISO-UNRRA contains 73,154,560 IP addresses
● A domain can be very small...
– AS2111: IST-ATRIUM TE Experiment, a single PC running Linux...
● Domains are interconnected in various ways
● The interconnection of all domains should in theory allow packets to be sent anywhere
– Usually a packet will need to cross a few ASes to reach its destination
Types of domains
● Transit domain
– A transit domain allows external domains to use its own infrastructure to send packets to other domains
(Figure: stub domains S1-S4 interconnected through transit domains T1-T3)
● Examples
– UUNet, OpenTransit, GEANT, Internet2, RENATER, EQUANT, BT, Telia, Level3,...
Types of domains (2)
● Stub domain
– A stub domain does not allow external domains to use its infrastructure to send packets to other domains
● A stub is connected to at least one transit domain
– Single-homed stub: connected to one transit domain
– Dual-homed stub: connected to two transit domains
(Figure: stub domains S1-S4 attached to transit domains T1-T3)
● Content-rich stub domain
– Large web servers: Yahoo, Google, MSN, TF1, BBC,...
● Access-rich stub domain
– ISPs providing Internet access via CATV, ADSL, ...
A stub domain: Belnet (http://www.belnet.be)
Note well: other maps of ISPs may be found at:
http://www.cs.washington.edu/research/networking/rocketfuel/interactive/
A transit domain: Easynet
http://www.easynet.be/home/index.cfm?id=15&l=1
A transit domain: GEANT
(source: http://www.dante.net)
A transit domain: BT/IGnite
Source: http://www.ignite.net/info/maps.shtml
A large transit domain: UUNet
Source: http://www.uu.net
Outline
● Organization of the global Internet
– Example of domains
● BGP basics
● BGP in large networks
Architecture of a normal IP router
(Figure: the control part runs the routing protocol and builds the routing table; on the data path, IP packets go through classification (Class.), policing (Pol), forwarding based on the forwarding table, and shaping (Shap.))
● The "best" paths selected from the routing table built by the routing protocols are installed in the forwarding table
● Forwarding decision based on longest match
● Update of TTL and checksum fields in IP packets
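The forwarding steps listed above can be sketched in Python. This is a minimal illustration only. The forwarding table contents (prefixes and next-hop names such as `default-gw`) are hypothetical. The lookup is a linear scan, not the trie or TCAM a real router would use. The checksum is recomputed from scratch rather than updated incrementally.

```python
import ipaddress
from typing import Optional

# Hypothetical forwarding table (prefix -> next hop), for illustration only.
FORWARDING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "default-gw",
    ipaddress.ip_network("194.100.0.0/16"): "R2",
    ipaddress.ip_network("194.100.2.0/23"): "R3",
}


def lookup(dst: str) -> str:
    """Longest-prefix match: among the prefixes containing dst, keep the longest."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FORWARDING_TABLE if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum (one's complement sum of 16-bit words) of a header."""
    total = sum(int.from_bytes(header[i:i + 2], "big")
                for i in range(0, len(header), 2))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward(header: bytearray) -> Optional[str]:
    """Decrement the TTL, recompute the checksum and return the next hop.

    Returns None when the packet must be dropped because its TTL expires.
    """
    if header[8] <= 1:           # byte 8 of the IPv4 header is the TTL
        return None
    header[8] -= 1
    header[10:12] = b"\x00\x00"  # checksum field must be zero while computing
    header[10:12] = ipv4_checksum(bytes(header)).to_bytes(2, "big")
    dst = ".".join(str(b) for b in header[16:20])  # destination address
    return lookup(dst)
```

For example, 194.100.3.5 matches all three prefixes. It is forwarded to `R3` because /23 is the longest match.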
Internet routing
(Figure: four interconnected domains, Domain1 to Domain4)
● Interior Gateway Protocol (IGP)
– Routing of IP packets inside each domain
– Only knows the topology of its domain
● Exterior Gateway Protocol (EGP)
– Routing of IP packets between domains
– Each domain is considered as a black box
Intra-domain routing
● Goal
– Allow routers to transmit IP packets along the best path towards their destination
  ● best usually means the shortest path
    – Shortest measured in seconds or as number of hops
  ● sometimes best means the least loaded path
– Allow routers to find alternate routes in case of failures
● Behavior
– All routers exchange routing information
  ● Each domain router can obtain routing information for the whole domain
– The network operator or the routing protocol selects the cost of each link
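The IGP behaviour described above can be made concrete with a minimal Dijkstra shortest-path computation. This is the kind of computation a link-state IGP (e.g. OSPF or IS-IS) runs on each router. The topology and link costs below are hypothetical. Flooding of link-state information, equal-cost multipath and incremental recomputation are not shown.

```python
import heapq

# Hypothetical intra-domain topology: router -> {neighbor: link cost}.
# Costs are symmetric here, as if chosen by the network operator.
TOPOLOGY = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}


def shortest_paths(graph, source):
    """Dijkstra: return {destination: (total cost, first hop from source)}."""
    result = {}
    # Heap entries: (cost so far, destination, first hop used when leaving source).
    heap = [(cost, neigh, neigh) for neigh, cost in graph[source].items()]
    heapq.heapify(heap)
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node == source or node in result:
            continue  # already settled with a cost at most as low
        result[node] = (cost, first_hop)
        for neigh, link_cost in graph[node].items():
            if neigh != source and neigh not in result:
                heapq.heappush(heap, (cost + link_cost, neigh, first_hop))
    return result


print(shortest_paths(TOPOLOGY, "A"))
# {'B': (1, 'B'), 'C': (2, 'B'), 'D': (3, 'B')}
```

Router A reaches C through B (cost 2) rather than over its direct link (cost 4). The cost of each link therefore determines which path is "best".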
Outline
● Organization of the global Internet
● BGP basics
– Routing policies
– The Border Gateway Protocol
– How to prefer some routes over others
● BGP in large networks
Inter-domain routing
● Goals
– Allow IP packets to be transmitted along the best path towards their destination through several transit domains, taking into account the routing policies of each domain without knowing the detailed topology of those domains
● From an inter-domain viewpoint, best path often means cheapest path
● Each domain is free to specify inside its routing policy the domains for which it agrees to provide a transit service and the method it uses to select the best path to reach each destination
Domains versus Autonomous Systems
● The BGP inter-domain routing protocol deals with Autonomous Systems (AS)
– An AS is defined as "a set of routers under a single technical administration ... that presents a consistent picture of what destinations are reachable through it."
– Each AS is identified by its AS number
● In practice
– A domain is often equivalent to an AS
– A domain may be composed of several ASes
  ● Ex: Worldcom uses AS701, AS702, ...
– Many domains do not have an AS number
  ● Ex: small networks connected to one provider without using BGP
Useful links
● Each AS on the Internet has been assigned a 16-bit number by the Regional Internet Registries
● For a current list of assigned AS numbers:
– http://www.cidr-report.org/autnums.html
● More information:
– http://whois.ripe.net
– http://www.radb.net
Types of inter-domain links
● Two types of inter-domain links
– Private link
  ● Usually a leased line between two routers belonging to the two connected domains
  (Figure: router R1 of Domain A connected to router R2 of Domain B)
– Connection via a public interconnection point
  ● Usually a Gigabit (or faster) Ethernet switch that interconnects routers belonging to different domains
  (Figure: routers R1 to R4 attached to the switch; physical links versus interdomain links)
Routing policies
● In theory BGP allows each domain to define its own routing policy...
● In practice there are two common policies
– customer-provider peering
  ● Customer C buys Internet connectivity from provider P
– shared-cost peering
  ● Domains x and y agree to exchange packets by using a direct link or through an interconnection point
Customer-provider peering
(Figure: AS1, AS2, AS3, AS4 and AS7 connected by customer-provider ($) links)
● Principle
– Customer sends to its provider its internal routes and the routes learned from its own customers
  ● Provider will advertise those routes to the entire Internet to allow anyone to reach the Customer
– Provider sends to its customers all known routes
  ● Customer will be able to reach anyone on the Internet
Shared-cost peering
(Figure: AS1, AS2, AS3, AS4 and AS7 connected by shared-cost and customer-provider ($) links)
● Principle
– PeerX sends to PeerY its internal routes and the routes learned from its own customers
  ● PeerY will use the shared link to reach PeerX and PeerX's customers
  ● PeerX's providers are not reachable via the shared link
– PeerY sends to PeerX its internal routes and the routes learned from its own customers
  ● PeerX will use the shared link to reach PeerY and PeerY's customers
  ● PeerY's providers are not reachable via the shared link
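The customer-provider and shared-cost rules above can be condensed into a single export decision. This sketch assumes each route is tagged with the relationship of the neighbor it was learned from. The tags are `customer`, `peer`, `provider`, or `internal` for the AS's own prefixes. These labels are illustrative; they are not BGP attributes.

```python
# Relationship of a neighbor, seen from the local AS (illustrative labels).
CUSTOMER, PEER, PROVIDER, INTERNAL = "customer", "peer", "provider", "internal"


def may_export(learned_from: str, send_to: str) -> bool:
    """Apply the customer-provider and shared-cost export rules."""
    if send_to == CUSTOMER:
        # A provider sends all its known routes to its customers.
        return True
    # Towards a provider or a shared-cost peer, only the AS's own routes
    # and the routes learned from its customers are advertised.
    return learned_from in (INTERNAL, CUSTOMER)
```

Routes learned from a peer or a provider are therefore only passed on to customers. This is why PeerX's providers are not reachable through the shared link.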
Routing policies
● A domain specifies its routing policy by defining, on each BGP router, two sets of filters for each peer
– Import filter
  ● Specifies which routes can be accepted by the router among all the routes received from a given peer
– Export filter
  ● Specifies which routes can be advertised by the router to a given peer
● Filters can be defined in RPSL
– Routing Policy Specification Language (RFC 2622)
Note well: Internet Routing Registries contain the routing policies of various ISPs, see: http://www.ripe.net/ripencc/pub-services/whois.html, http://www.arin.net/whois/index.html, http://www.apnic.net/apnic-bin/whois.pl
Routing policies
Simple example with RPSL
(Figure: AS1, AS2, AS3, AS4 and AS7 connected by shared-cost and customer-provider ($) links)
Import policy for AS4
import: from AS3 accept AS3
import: from AS7 accept AS7
import: from AS1 accept ANY
import: from AS2 accept ANY
Export policy for AS4
export: to AS3 announce AS4 AS7
export: to AS7 announce ANY
export: to AS1 announce AS4 AS7
export: to AS2 announce AS4 AS7
Import policy for AS7
import: from AS4 accept ANY
Export policy for AS7
export: to AS4 announce AS7
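To see how such RPSL lines act as import and export filters, the toy sketch below parses the simple `import:`/`export:` forms of the AS4 policy and evaluates them. It only understands `accept`/`announce` followed by `ANY` or a list of AS numbers, interpreted as "routes originated by these ASes". The function names are hypothetical. Real RPSL (RFC 2622) is far richer.

```python
import re

AS4_POLICY = """
import: from AS3 accept AS3
import: from AS7 accept AS7
import: from AS1 accept ANY
import: from AS2 accept ANY
export: to AS3 announce AS4 AS7
export: to AS7 announce ANY
export: to AS1 announce AS4 AS7
export: to AS2 announce AS4 AS7
"""

LINE = re.compile(
    r"^(import|export):\s+(?:from|to)\s+(AS\d+)\s+(?:accept|announce)\s+(.+)$")


def parse_policy(text):
    """Return {(direction, peer): filter}; filter is "ANY" or a set of origin ASes."""
    policy = {}
    for line in text.strip().splitlines():
        m = LINE.match(line.strip())
        if m is None:
            raise ValueError(f"unsupported policy line: {line!r}")
        direction, peer, flt = m.groups()
        policy[(direction, peer)] = "ANY" if flt.strip() == "ANY" else set(flt.split())
    return policy


def permits(policy, direction, peer, origin_as):
    """True if a route originated by origin_as passes the filter for this peer."""
    flt = policy.get((direction, peer))
    if flt is None:
        return False  # no matching line: the route is rejected
    return flt == "ANY" or origin_as in flt
```

For example, AS4 accepts from its peer AS3 only the routes originated by AS3. It announces to AS3 only its own routes and those of its customer AS7.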
Outline
● Organization of the global Internet
● BGP basics
– Routing policies
– The Border Gateway Protocol
– How to prefer some routes over others
● BGP in large networks
The Border Gateway Protocol
● Principle
– Path vector protocol
  ● BGP router advertises its best route to each destination
  (Figure: AS1 originates prefix 1.0.0.0/8; as the advertisement propagates towards AS2, AS4 and AS5, each AS prepends its own number, producing ASPaths such as AS1, AS4:AS1 and AS2:AS4:AS1)
– ... with incremental updates
  ● Advertisements are only sent when their content changes
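The path-vector behaviour can be sketched in a few lines. Before re-advertising a route, a BGP router prepends its own AS number to the ASPath. A received route whose ASPath already contains the local AS is discarded, which prevents routing loops. The data structures and names below are illustrative assumptions. Where the slides write the ASPath with ':' separators, this sketch uses a Python tuple.

```python
from typing import NamedTuple, Optional


class Advertisement(NamedTuple):
    prefix: str
    as_path: tuple  # leftmost element = last AS that advertised the route


def process_received(local_as: str, adv: Advertisement) -> Optional[Advertisement]:
    """Loop check on reception, then prepend the local AS for re-advertisement."""
    if local_as in adv.as_path:
        return None  # the local AS is already on the path: discard the route
    return Advertisement(adv.prefix, (local_as,) + adv.as_path)


def incremental_update(previous: dict, current: dict):
    """Only prefixes whose best route changed are sent; vanished ones are withdrawn."""
    announce = {p: path for p, path in current.items() if previous.get(p) != path}
    withdraw = sorted(p for p in previous if p not in current)
    return announce, withdraw


origin = Advertisement("1.0.0.0/8", ("AS1",))
via_as4 = process_received("AS4", origin)   # ASPath AS4:AS1
via_as2 = process_received("AS2", via_as4)  # ASPath AS2:AS4:AS1
```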
“Origin” of the routes announced by BGP
● Where do the routes announced by a BGP router come from?
– Learned from other BGP routers
  ● BGP router only propagates the received routes
– Static configuration
  ● BGP router is configured to advertise some prefixes
  ● Drawback: requires manual configuration
  ● Advantage: stable set of advertised prefixes
– Learned from an Interior Gateway Protocol
  ● The prefixes received from the IGP are advertised by the BGP router, usually as an aggregate
  ● Advantage
    – BGP advertisements follow the network state: a prefix is automatically withdrawn by BGP if it is not reachable via the IGP
  ● Drawback
    – BGP announcements will be unstable if the IGP is unstable...
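The aggregation of IGP-learned prefixes can be illustrated with Python's standard `ipaddress` module. Its `collapse_addresses` function merges adjacent or overlapping prefixes. The prefix list is a hypothetical example. Which prefixes a router actually aggregates is a configuration choice.

```python
import ipaddress

# Hypothetical internal prefixes learned from the IGP.
igp_prefixes = [
    ipaddress.ip_network("194.100.0.0/24"),
    ipaddress.ip_network("194.100.1.0/24"),
    ipaddress.ip_network("194.100.2.0/24"),
    ipaddress.ip_network("194.100.3.0/25"),
    ipaddress.ip_network("194.100.3.128/25"),
]

# Merge the prefixes into the smallest equivalent set of aggregates.
aggregates = list(ipaddress.collapse_addresses(igp_prefixes))
print(aggregates)  # [IPv4Network('194.100.0.0/22')]
```

The BGP router would then announce the single prefix 194.100.0.0/22 instead of five separate ones.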
Policies and BGP
● Two mechanisms to support policies in BGP
– Each domain decides by itself which is the best route to reach each destination, based on the routes learned from its peers
  ● Unlike with IGPs, the chosen best route is not necessarily the "shortest" route
  ● Only the best route towards each destination can be announced to external peers
– Each domain determines, on its own, which routes can be advertised to each peer
  ● An AS does not necessarily advertise to all its neighbors all the routes that it knows
Conceptual model of a BGP router
Legend:
Adj-RIB-In: Adjacency Routing Information Base for incoming messages
Adj-RIB-Out: Adjacency Routing Information Base for outgoing messages
Loc-RIB: Local Routing Information Base
(Figure: BGP Msgs from Peer[1]...Peer[N] enter the BGP Adj-RIB-In and go through the import filter and attribute manipulation into the BGP Loc-RIB (all acceptable routes). The BGP decision process selects one best route to each destination. That route goes through the export filter and attribute manipulation into the BGP Adj-RIB-Out and is sent as BGP Msgs to Peer[1]...Peer[N])
● Import filter(Peer[i])
– Determines which BGP Msgs are acceptable from Peer[i]
● BGP Routing Information Base
– Contains all the acceptable routes learned from all Peers + internal routes
● BGP decision process
– Selects the best route towards each destination
● Export filter(Peer[i])
– Determines which routes can be sent to Peer[i]
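The conceptual model can be mirrored in a small, hypothetical data structure. It holds per-peer Adj-RIB-In tables, an import filter, a Loc-RIB with one best route per prefix, and per-peer export filters that produce the Adj-RIB-Out. The filters, the "shortest ASPath" decision rule and the choice never to send a route back to the peer it came from are placeholders chosen for illustration. Attribute manipulation (e.g. ASPath prepending) is omitted.

```python
from collections import defaultdict


class BgpRouter:
    """Toy model of the Adj-RIB-In -> Loc-RIB -> Adj-RIB-Out pipeline."""

    def __init__(self, import_filter, export_filter):
        self.import_filter = import_filter   # (peer, prefix, as_path) -> bool
        self.export_filter = export_filter   # (peer, prefix, as_path) -> bool
        self.adj_rib_in = defaultdict(dict)  # peer -> {prefix: as_path}

    def receive(self, peer, prefix, as_path):
        """Store a route received from peer in its Adj-RIB-In."""
        self.adj_rib_in[peer][prefix] = as_path

    def loc_rib(self):
        """Apply import filters, then keep one best route (shortest ASPath) per prefix."""
        best = {}
        for peer, routes in self.adj_rib_in.items():
            for prefix, as_path in routes.items():
                if not self.import_filter(peer, prefix, as_path):
                    continue
                if prefix not in best or len(as_path) < len(best[prefix][1]):
                    best[prefix] = (peer, as_path)
        return best

    def adj_rib_out(self, peer):
        """Best routes that the export filter allows to be advertised to peer."""
        return {prefix: as_path
                for prefix, (src, as_path) in self.loc_rib().items()
                # assumption: a route is never sent back to the peer it came from
                if src != peer and self.export_filter(peer, prefix, as_path)}
```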
BGP: Principles of operation
● Principles
– BGP relies on the incremental exchange of path vectors
(Figure: router R1 in AS3 and router R2 in AS4 exchange BGP Msgs over a BGP session)
– BGP session established over a TCP connection between peers
– Each peer sends all its active routes
– As long as the BGP session remains up, incrementally update BGP routing tables
BGP: Principles of operation (2)
● Simplified model of BGP
– 2 types of BGP path vectors
– UPDATE
  ● Used to announce a route towards one prefix
  ● Content of UPDATE
    – Destination address/prefix
    – Inter-domain path used to reach destination (AS-Path)
    – Next-hop (address of the router advertising the route)
– WITHDRAW
  ● Used to indicate that a previously announced route is no longer reachable
  ● Content of WITHDRAW
    – Unreachable destination address/prefix
Events during a BGP session
1. Addition of a new route to RIB
– A new internal route was added on the local router
  ● Static route added by configuration
  ● Dynamic route learned from IGP
– Reception of UPDATE message announcing a new or modified route
2. Removal of a route from RIB
– Removal of an internal route
  ● Static route is removed from router configuration
  ● Intra-domain route declared unreachable by IGP
– Reception of WITHDRAW message
3. Loss of BGP session
– All routes learned from this peer removed from RIB
The BGP messages
● Variable length messages with fixed size header
(Figure: common header, 32 bits wide: Marker (16 bytes): all ones; Length: 16 bits; Type (8 bits))
● Max length of BGP messages: 4096 bytes
● OPEN
– used to establish a BGP session
● UPDATE
– used to send new routes and to remove unusable routes
● NOTIFICATION
– used to inform the remote peer of an error
– BGP session is closed upon transmission or reception of a NOTIFICATION message
● KEEPALIVE
– one message must be sent at least every 30 seconds on each BGP session
● ROUTE_REFRESH
– used to ask a peer to re-advertise its routes (e.g. after a change of import policy) without resetting the BGP session
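The fixed-size header can be encoded with Python's `struct` module. Per RFC 4271, the header is 19 bytes: a 16-byte marker (all ones), a 2-byte length covering the whole message, and a 1-byte type. The types are 1 = OPEN, 2 = UPDATE, 3 = NOTIFICATION and 4 = KEEPALIVE; ROUTE-REFRESH (RFC 2918) is type 5. A KEEPALIVE is just this header. The helper names are illustrative.

```python
import struct

MARKER = b"\xff" * 16
HEADER_LEN = 19          # marker (16) + length (2) + type (1)
MAX_MESSAGE_LEN = 4096
OPEN, UPDATE, NOTIFICATION, KEEPALIVE, ROUTE_REFRESH = 1, 2, 3, 4, 5


def encode_message(msg_type: int, body: bytes = b"") -> bytes:
    """Prefix body with the BGP common header."""
    length = HEADER_LEN + len(body)
    if length > MAX_MESSAGE_LEN:
        raise ValueError("BGP message longer than 4096 bytes")
    return MARKER + struct.pack("!HB", length, msg_type) + body


def decode_header(data: bytes):
    """Return (length, type) after checking the marker."""
    if len(data) < HEADER_LEN or data[:16] != MARKER:
        raise ValueError("invalid BGP header")
    return struct.unpack("!HB", data[16:HEADER_LEN])


keepalive = encode_message(KEEPALIVE)  # 19 bytes: header only
```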
The OPEN message
● Used to establish a BGP session between two BGP peers
(Figure: OPEN message layout, 32 bits wide: Version, My AS Number, Hold Time, BGP Identifier, Opt. Len, Optional Parameters (variable length))
● Version: currently version 4
● My AS Number: AS # of the BGP peer sending the message
● Hold Time: maximum delay between successive KEEPALIVE and/or UPDATE messages
● BGP Id: usually the IPv4 loopback address of the BGP peer
● Optional field: encoded in TLV format; used notably for capabilities negotiation
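Following the field layout above, an OPEN message body can be packed as shown below. This sketch assumes no optional parameters (Opt. Len = 0). It uses a 2-byte AS number as in the original BGP-4 header format. The AS number, hold time and identifier in the usage line are example values. The 19-byte common header that precedes the body is not shown.

```python
import ipaddress
import struct


def encode_open_body(my_as: int, hold_time: int, bgp_id: str) -> bytes:
    """Version (1 byte), My AS (2), Hold Time (2), BGP Identifier (4), Opt. Len (1)."""
    identifier = ipaddress.IPv4Address(bgp_id).packed
    return struct.pack("!BHH4sB", 4, my_as, hold_time, identifier, 0)


def decode_open_body(body: bytes) -> dict:
    """Unpack the fixed fields and return the raw optional parameters."""
    version, my_as, hold_time, identifier, opt_len = struct.unpack("!BHH4sB", body[:10])
    return {
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_id": str(ipaddress.IPv4Address(identifier)),
        "optional_parameters": body[10:10 + opt_len],
    }


body = encode_open_body(65001, 90, "192.0.2.1")
```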
Establishment of a BGP session
Usually, a BGP session can only be established between two manually configured peers. Each peer needs to be configured with the IP address and the AS number of the remote peer.
(Figure: time-sequence diagram. The TCP connection to port 179 is opened with SYN, SYN+ACK and ACK (CONNECT.req/ind/resp/conf). Once the TCP connection is established, each peer sends a BGP OPEN message (DATA.req(OPEN)). The BGP session is established when the OPEN messages have been exchanged)
The UPDATE message
– Single message type used to carry both IPv4 route announcements and route withdrawals
(Figure: UPDATE message layout, 32 bits wide: Withdrawn routes length, Withdrawn routes (variable length), Tot. Path Attr. Len, Path attributes (variable length), Network Layer Reachability Information (variable length))
● Each withdrawn or advertised prefix is encoded as LEN (prefix length in bits) followed by the prefix (1-4 octets)
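The prefix encoding is used in both the withdrawn-routes and NLRI fields. Each prefix is a length byte in bits, followed by only as many prefix octets as needed, i.e. ceil(length/8). It can be sketched as follows; the function names are illustrative.

```python
import ipaddress


def encode_prefix(prefix: str) -> bytes:
    """Encode a prefix as <length in bits><ceil(length / 8) prefix octets>."""
    net = ipaddress.IPv4Network(prefix)
    n_octets = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:n_octets]


def decode_prefixes(data: bytes) -> list:
    """Decode a sequence of encoded prefixes back to strings."""
    prefixes, i = [], 0
    while i < len(data):
        length = data[i]
        n_octets = (length + 7) // 8
        raw = data[i + 1:i + 1 + n_octets].ljust(4, b"\x00")  # pad to 4 octets
        prefixes.append(f"{ipaddress.IPv4Address(raw)}/{length}")
        i += 1 + n_octets
    return prefixes
```

For instance, 194.100.2.0/23 needs 3 octets, so it is encoded as the 4 bytes 23, 194, 100, 2.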
The KEEPALIVE and NOTIFICATION messages
● The KEEPALIVE message
– BGP message containing only the default header
– Every HoldTime/3 seconds, send a KEEPALIVE message if no recent BGP message was sent
● The NOTIFICATION message
– Indicates a problem in the processing of a BGP message
  ● BGP session is released upon transmission/reception of NOTIFICATION
– Format: Err Code, SubCode, Additional data (variable length)
– Example errors:
  ● 2: OPEN Message Error
    ● Unsupported Version, Unsupported Optional Parameter, ...
  ● 3: UPDATE Message Error
    ● Malformed Attribute List, ...
  ● 4: Hold Timer Expired
  ● 5: Finite State Machine Error
  ● 6: Cease
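Two details from this slide can be shown in code. The first is the KEEPALIVE interval derived from the Hold Time: per RFC 4271, both peers use the smaller of the two proposed values, and a Hold Time of 0 disables KEEPALIVEs. The second is the encoding of a NOTIFICATION (error code, subcode, optional data) behind the 19-byte common header. The error-code table lists only the codes named above. Function names are illustrative.

```python
import struct
from typing import Optional

ERROR_CODES = {
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}


def keepalive_interval(local_hold: int, remote_hold: int) -> Optional[float]:
    """KEEPALIVEs are sent every HoldTime/3 seconds, using the smaller Hold Time."""
    hold_time = min(local_hold, remote_hold)
    if hold_time == 0:
        return None  # hold time 0: no KEEPALIVE messages are exchanged
    return hold_time / 3


def encode_notification(code: int, subcode: int, data: bytes = b"") -> bytes:
    """Common header (type 3 = NOTIFICATION) followed by code, subcode and data."""
    body = struct.pack("!BB", code, subcode) + data
    header = b"\xff" * 16 + struct.pack("!HB", 19 + len(body), 3)
    return header + body
```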
BGP and IP
A first example
– Initial updates
(Figure: AS10 (router R1, prefix 194.100.0.0/24), AS20 (R2), AS30 (R3, prefix 194.100.1.0/24) and AS40 (R4), interconnected by BGP sessions)
UPDATE sent by R1 (AS10) to its BGP peers in AS20 and AS40
● prefix: 194.100.0.0/24
● NextHop: R1
● ASPath: AS10
UPDATE sent by R2 (AS20) to AS30
● prefix: 194.100.0.0/24
● NextHop: R2
● ASPath: AS20:AS10
UPDATE sent by R4 (AS40) to AS20
● prefix: 194.100.0.0/24
● NextHop: R4
● ASPath: AS40:AS10
– What happens if link AS10-AS20 goes down?
BGP and IP
A first example (2)
● If link AS10-AS20 goes down, AS20 will no longer consider the path learned from AS10
● AS20 will thus remove this path from its routing table and will instead select the path learned from AS40
● This will force AS20 to send the following UPDATE to AS30:
UPDATE
● prefix: 194.100.0.0/24
● NextHop: R2
● ASPath: AS20:AS40:AS10
BGP and IP
A second example
(Figure: AS10 (router R1, 195.100.0.1, network 194.100.0.0/24), AS20 (router R2, 195.100.0.2 and 195.100.0.5, network 194.100.2.0/23) and AS30 (router R3, 195.100.0.6, network 194.100.1.0/24); link R1-R2 on 195.100.0.0/30 and link R2-R3 on 195.100.0.4/30, each carrying a BGP session)
UPDATE from R1 to R2
● prefix: 194.100.0.0/24
● NextHop: 195.100.0.1
● ASPath: AS10
UPDATE from R2 to R1
● prefix: 194.100.2.0/23
● NextHop: 195.100.0.2
● ASPath: AS20
– In this example, we only consider the BGP messages concerning the following IP networks: 194.100.0.0/24, 194.100.1.0/24 and 194.100.2.0/23
Main path attributes of UPDATE message
● NextHop: IP address of router used to reach destination
● ASPath: path followed by the route advertisement
BGP and IP
A second example (2)
(Figure: AS10 (R1, 195.100.0.1), AS20 (R2, 195.100.0.2 and 195.100.0.5) and AS30 (R3, 195.100.0.6); link R1-R2 on 195.100.0.0/30, link R2-R3 on 195.100.0.4/30)
UPDATE from R2 to R3
● prefix: 194.100.0.0/24
● NextHop: 195.100.0.5
● ASPath: AS20:AS10
UPDATE from R2 to R3
● prefix: 194.100.2.0/23
● NextHop: 195.100.0.5
● ASPath: AS20
UPDATE from R2 to R1
● prefix: 194.100.1.0/24
● NextHop: 195.100.0.2
● ASPath: AS20:AS30
UPDATE from R3 to R2
● prefix: 194.100.1.0/24
● NextHop: 195.100.0.6
● ASPath: AS30
BGP and IP
A second example (3)
(Figure: AS10 (R1), AS20 (R2) and AS30 (R3), with networks 194.100.0.0/24, 194.100.1.0/24 and 194.100.2.0/23)
WITHDRAW
● prefix: 194.100.1.0/24
Outline
● Organization of the global Internet
● BGP basics
– Routing policies
– The Border Gateway Protocol
– How to prefer some routes over others
● BGP in large networks
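How to prefer some routes over others?
(Figure: AS1 (router R1) connected to routers RA and RB of AS2 by a primary 34 Mbps link and a backup 2 Mbps link)
● How to ensure that packets will flow on the primary link?
(Figure: AS1 (router R1) connected to AS2 (router RA) by an expensive link and to AS4 (router R2) by a cheap link; AS3 (R3) and AS5 (R5) are also shown)
● How to prefer the cheap link over the expensive link?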
How to prefer some routes over others (2)?
(Figure: conceptual model of a BGP router. BGP Msgs from Peer[1]...Peer[N] go through the import filter and attribute manipulation into the BGP RIB (all acceptable routes). The BGP decision process selects one best route to each destination, which goes through the export filter and attribute manipulation before being sent as BGP Msgs to Peer[1]...Peer[N])
Simplified BGP Decision Process
Import filter
● Selection of acceptable routes
● Addition of local-pref attribute inside received BGP Msg
  ● Normal quality route: local-pref=100
  ● Better than normal route: local-pref=200
  ● Worse than normal route: local-pref=50
Decision process
● Select routes with highest local-pref
● If there are several routes, choose routes with the shortest ASPath
● If there are still several routes, tie-breaking rule
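The simplified decision process translates directly into code. This sketch assumes that routes carry a local-pref set by the import filter and an ASPath. The slide does not specify the tie-breaking rule, so the lowest peer identifier is used here as a hypothetical final tie-breaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple
    local_pref: int = 100   # "normal quality" value set by the import filter
    peer_id: str = ""       # only used by the hypothetical tie-breaker


def best_route(routes):
    """Highest local-pref, then shortest ASPath, then a tie-breaking rule."""
    top_pref = max(r.local_pref for r in routes)
    candidates = [r for r in routes if r.local_pref == top_pref]
    shortest = min(len(r.as_path) for r in candidates)
    candidates = [r for r in candidates if len(r.as_path) == shortest]
    return min(candidates, key=lambda r: r.peer_id)  # assumed tie-breaker
```

Because local-pref is compared first, a route with local-pref 200 wins even over a shorter ASPath with local-pref 100.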
How to prefer some routes over others (3)?
(Figure: AS1 (router R1) connected to AS2 by a backup 2 Mbps link (to RA) and a primary 34 Mbps link (to RB))
RPSL-like policy for AS1
aut-num: AS1
import: from AS2 RA at R1 set localpref=100;
        from AS2 RB at R1 set localpref=200;
        accept ANY
export: to AS2 RA at R1 announce AS1
        to AS2 RB at R1 announce AS1
RPSL-like policy for AS2
aut-num: AS2
import: from AS1 R1 at RA set localpref=100;
        from AS1 R1 at RB set localpref=200;
        accept AS1
export: to AS1 R1 at RA announce ANY
        to AS1 R1 at RB announce ANY
How to prefer some routes over others (4)?
(Figure: AS1 (router R1) connected to AS2 (router RA) by an expensive link and to AS4 (router R2) by a cheap link; AS3 (R3) and AS5 (R5) are also shown)
RPSL policy for AS1
aut-num: AS1
import: from AS2 RA at R1 set localpref=100;
        from AS4 R2 at R1 set localpref=200;
        accept ANY
export: to AS2 RA at R1 announce AS1
        to AS4 R2 at R1 announce AS1
● AS1 will prefer to send packets over the cheap link
● But the flow of the packets destined to AS1 will depend on the routing policy of the other domains
Limitations of local-pref
– In theory
  ● Each domain is free to define its order of preference for the routes learned from external peers
(Figure: AS1 owns 1.0.0.0/8 and is connected to AS3 and AS4, which are also connected to each other)
Preferred paths for AS3
1. AS4:AS1
2. AS1
Preferred paths for AS4
1. AS3:AS1
2. AS1
● How to reach 1.0.0.0/8 from AS3 and AS4?
Limitations of local-pref (2)
● AS1 sends its UPDATE messages ...
(Figure: AS1 (1.0.0.0/8) connected to AS3 and AS4, which are also connected to each other)
UPDATE from AS1 to AS3 and to AS4
● Prefix: 1.0.0.0/8
● ASPath: AS1
Preferred paths for AS3
1. AS4:AS1
2. AS1
Preferred paths for AS4
1. AS3:AS1
2. AS1
Routing table for AS3
1.0.0.0/8 ASPath: AS1 (best)
Routing table for AS4
1.0.0.0/8 ASPath: AS1 (best)
Limitations of local-pref (3)
● First possibility
– AS3 sends its UPDATE first...
(Figure: AS1 (1.0.0.0/8) connected to AS3 and AS4, which are also connected to each other)
Preferred paths for AS3
1. AS4:AS1
2. AS1
Preferred paths for AS4
1. AS3:AS1
2. AS1
UPDATE from AS3 to AS4
● Prefix: 1.0.0.0/8
● ASPath: AS3:AS1
Routing table for AS3
1.0.0.0/8 ASPath: AS1 (best)
Routing table for AS4
1.0.0.0/8 ASPath: AS1
1.0.0.0/8 ASPath: AS3:AS1 (best)
● Stable route assignment
Limitations of local-pref (4)
● Second possibility
– AS4 sends its UPDATE first...
(Figure: AS1 (1.0.0.0/8) connected to AS3 and AS4, which are also connected to each other)
Preferred paths for AS3
1. AS4:AS1
2. AS1
Preferred paths for AS4
1. AS3:AS1
2. AS1
UPDATE from AS4 to AS3
● Prefix: 1.0.0.0/8
● ASPath: AS4:AS1
Routing table for AS3
1.0.0.0/8 ASPath: AS1
1.0.0.0/8 ASPath: AS4:AS1 (best)
Routing table for AS4
1.0.0.0/8 ASPath: AS1 (best)
● Another (but different) stable route assignment
Limitations of local-pref (5)
● Third possibility
– AS3 and AS4 send their UPDATE together...
(Figure: AS1 (1.0.0.0/8) connected to AS3 and AS4, which are also connected to each other)
Preferred paths for AS3
1. AS4:AS1
2. AS1
Preferred paths for AS4
1. AS3:AS1
2. AS1
UPDATE from AS3 to AS4
● Prefix: 1.0.0.0/8
● ASPath: AS3:AS1
UPDATE from AS4 to AS3
● Prefix: 1.0.0.0/8
● ASPath: AS4:AS1
● AS3 prefers the indirect path and will thus send a WITHDRAW to AS4, since its chosen best path is now via AS4 and cannot be offered back to AS4
● AS4 prefers the indirect path and will thus send a WITHDRAW to AS3, since its chosen best path is now via AS3 and cannot be offered back to AS3
Limitations of local-pref (6)
● Third possibility (cont.)
– AS3 and AS4 send their UPDATE together...
(Figure: AS1 (1.0.0.0/8) connected to AS3 and AS4, which are also connected to each other)
Preferred paths for AS3
1. AS4:AS1
2. AS1
Preferred paths for AS4
1. AS3:AS1
2. AS1
WITHDRAW from AS3 to AS4
● Prefix: 1.0.0.0/8
WITHDRAW from AS4 to AS3
● Prefix: 1.0.0.0/8
● AS3 learns that the indirect route is no longer available
– AS3 will reannounce its direct route...
● AS4 learns that the indirect route is no longer available
– AS4 will reannounce its direct route...
More limitations of local-pref
● Unfortunately, inter-domain routing may not converge at all in some cases...
(Figure: AS1, AS3 and AS4 are each connected to AS0 and to one another)
Preferred paths for AS1
1. AS3:AS0
2. AS0
Preferred paths for AS3
1. AS4:AS0
2. AS0
Preferred paths for AS4
1. AS1:AS0
2. AS0
● How to reach a destination inside AS0 in this case?
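The non-convergent case can be reproduced with a small synchronous simulation. It rests on assumptions not stated on the slides. All ASes update simultaneously, in rounds. Each AS only announces its current best path. A path not listed in an AS's preferences is rejected. The indirect path via a neighbor X is therefore available only while X itself uses its direct path to AS0.

```python
# Preferences from the slide: each AS first tries the path via one neighbor,
# then falls back to its direct path to AS0 (represented by None).
PREFERENCES = {
    "AS1": ["AS3", None],
    "AS3": ["AS4", None],
    "AS4": ["AS1", None],
}


def step(state):
    """One synchronous round: every AS picks its most preferred available path."""
    new_state = {}
    for asn, prefs in PREFERENCES.items():
        for choice in prefs:
            # The direct path is always available; the path via a neighbor only
            # exists if that neighbor currently advertises its direct path.
            if choice is None or state[choice] is None:
                new_state[asn] = choice
                break
    return new_state


def simulate(max_rounds=10):
    """Return the visited states, stopping at the first repeated one."""
    state = {asn: None for asn in PREFERENCES}  # start: everyone uses AS0 directly
    history = [state]
    for _ in range(max_rounds):
        state = step(state)
        history.append(state)
        if state in history[:-1]:
            break
    return history
```

After one round every AS switches to its indirect path. After the next round all indirect paths have disappeared and every AS is back where it started, so the cycle repeats forever. One can also check by hand that no assignment of paths is stable for these preferences.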
local-pref and economic relationships
● In practice, local-pref is often used to enforce economic relationships
(Figure: AS1 with providers Prov1 and Prov2, shared-cost peers Peer1 to Peer4, and customers Cust1 and Cust2)
Local-pref values used by AS1
– > 1000 for the routes received from a Customer
– 500 to 999 for the routes learned from a Peer
– < 500 for the routes learned from a Provider
● Since AS1 is paid to carry packets towards Cust1 and Cust2, it will select a route through its customers towards those networks whenever possible
● Since AS1 does not pay to carry packets towards Peer1-4, AS1 will select a route through the shared-cost links towards those networks whenever possible
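The sketch below shows how AS1's import filter could assign local-pref from the business relationship. It uses values inside the ranges quoted above; the exact numbers 1100, 750 and 250 are arbitrary choices. Because local-pref is compared before ASPath length, a customer route wins even when its ASPath is longer. The function names and example ASPaths are hypothetical.

```python
# Hypothetical local-pref values, chosen inside the ranges given on the slide.
LOCAL_PREF = {"customer": 1100, "peer": 750, "provider": 250}


def import_route(relationship: str, as_path: tuple) -> dict:
    """Import filter: tag the route with a local-pref based on the neighbor type."""
    return {"as_path": as_path, "local_pref": LOCAL_PREF[relationship]}


def select(routes):
    """Highest local-pref first, shortest ASPath second."""
    return min(routes, key=lambda r: (-r["local_pref"], len(r["as_path"])))


via_provider = import_route("provider", ("Prov1", "AS9"))
via_peer = import_route("peer", ("Peer1", "AS9"))
via_customer = import_route("customer", ("Cust1", "AS8", "AS9"))
best = select([via_provider, via_peer, via_customer])  # the (longer) customer route
```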
Consequence of this utilization of local-pref
(Figure: AS1 to AS8 connected by shared-cost and customer-provider ($) links)
● Which route will be used by AS1 to reach AS5, and how will AS5 reach AS1?
– Internet paths are often asymmetrical
Guidelines for a safe utilization of local-pref
● The directed graph composed of the customer->provider links is loop-free
– An AS cannot (directly or indirectly) be a provider of one of its own providers
(Figure: AS1, AS2 and AS3 connected by customer-provider ($) links)
● An AS always prefers a route via a customer over a route via a provider or a peer
● With some restrictions on the graph composed of peer-to-peer relationships, it is also possible to allow an AS to give the same preference to a route via a customer or via a peer
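The first guideline (a loop-free customer->provider graph) can be checked mechanically. The sketch below uses Python's standard `graphlib` module (Python 3.9+). Its `TopologicalSorter.prepare()` raises `CycleError` when the graph contains a cycle. The example graphs are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def customer_provider_graph_is_safe(providers_of: dict) -> bool:
    """providers_of maps each AS to the set of its providers (customer->provider edges)."""
    try:
        TopologicalSorter(providers_of).prepare()
    except CycleError:
        return False
    return True


# AS1 is a customer of AS2, which is a customer of AS3: no loop.
ok = {"AS1": {"AS2"}, "AS2": {"AS3"}}
# If AS3 also bought transit from AS1, the customer->provider links would form a loop.
loop = {"AS1": {"AS2"}, "AS2": {"AS3"}, "AS3": {"AS1"}}
```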
The Organization of the Internet
– Tier-1 ISPs
  ● A dozen large ISPs interconnected by shared-cost peering
  ● Provide transit service
  ● Uunet, Level3, OpenTransit, ...
– Tier-2 ISPs
  ● Regional or national ISPs
  ● Customer of T1 ISP(s)
  ● Provider of T3 ISP(s)
  ● Shared-cost with other T2 ISPs
  ● France Telecom, BT, Belgacom
– Tier-3 ISPs
  ● Smaller ISPs, corporate networks, content providers
  ● Customers of T2 or T1 ISPs
  ● Shared-cost with other T3 ISPs
Composition of Internet paths
● Most Internet paths contain a sequence of
– 0 or more Customer->Provider relationships
– 0 or 1 Peer-to-Peer relationships
– 0 or more Provider->Customer relationships
(Figure: AS1 to AS4 and AS7 to AS9 connected by shared-cost and customer-provider ($) links)
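The path pattern above can be checked with a small state machine. Each hop of an AS path is labelled with the relationship it crosses: `c2p` (customer to provider), `p2p` (shared-cost peering) or `p2c` (provider to customer). These labels and the function name are illustrative.

```python
def follows_pattern(hops) -> bool:
    """Accept c2p* p2p? p2c*: uphill, at most one peering link, then downhill."""
    phase = "up"
    for hop in hops:
        if hop == "c2p":
            if phase != "up":
                return False  # going up again after a peering or downhill hop
        elif hop == "p2p":
            if phase != "up":
                return False  # at most one peering link, before any downhill hop
            phase = "peered"
        elif hop == "p2c":
            phase = "down"
        else:
            raise ValueError(f"unknown hop label: {hop!r}")
    return True
```

For example, `["c2p", "c2p", "p2p", "p2c", "p2c"]` is accepted. A path that goes down to a customer and then up to another provider is rejected.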
Outline
● Organization of the global Internet
● BGP basics
● BGP in large networks
– The needs for iBGP
– Confederations and Route Reflectors
– The dynamics of BGP
– Inter-domain traffic engineering with BGP
BGP and IP
Second example
(Figure: AS10 (router R1, 195.100.0.1), AS20 (routers R2 and R4) and AS30 (router R3, 195.100.0.6). R1-R2 link: 195.100.0.0/30 (R2: 195.100.0.2). R3-R4 link: 195.100.0.4/30 (R4: 195.100.0.5). R2-R4 link: 195.100.0.8/30 (R2: 195.100.0.10, R4: 195.100.0.9). BGP sessions R1-R2 and R3-R4; networks 194.100.0.0/23, 194.100.2.0/23 and 194.100.4.0/23)
● Problem
– How can R2 (resp. R4) advertise to R4 (resp. R2) the routes learned from AS10 (resp. AS30)?
BGP and IP
Second example (2)
(Figure: AS10 (R1), AS20 (R2, R4) and AS30 (R3) as in the previous slide, with an IGP running inside AS20 between R2 and R4)
● First solution
– Use IGP (OSPF/IS-IS, RIP) to carry BGP routes
● Drawbacks
– IGP may not be able to support so many routes
– IGP does not carry BGP attributes like ASPath!
The AS7007 incident
(Figure: router R1 of AS7007 learns 4.0.0.0/8 with ASPath AS x:AS3:AS6 from RX in AS x; router R2 of AS7007 announces 4.0.0.0/8 with ASPath AS7007 to RY in AS Y)
● A single configuration error in two routers
– All routes learned from AS x on R1 were redistributed to R2 via the IGP, and R2 announced them to AS Y
● Consequence
– AS7007 advertised routes claiming that almost all IP addresses belonged to AS7007
– These routes were shorter than the real routes...
● Two hours of disruption for large parts of the Internet!
http://answerpointe.cctec.com/maillists/nanog/historical/9704/msg00342.html
iBGP and eBGP
(Figure: AS10 (R1), AS20 (R2, R4) and AS30 (R3); eBGP sessions R1-R2 and R3-R4, iBGP session R2-R4)
● Solution
– Use BGP to carry routes between all routers of the domain
● Two different types of BGP sessions
– eBGP between routers belonging to different ASes
– iBGP between each pair of routers belonging to the same AS
  ● Each BGP router inside ASx maintains an iBGP session with all other BGP routers of ASx (full iBGP mesh)
  ● Note that the iBGP sessions do not necessarily follow the physical topology
iBGP versus eBGP
● Differences between iBGP and eBGP
– local-pref attribute is only carried inside messages sent over iBGP sessions
– Over an eBGP session, a router only advertises its best route towards each destination
  ● Usually, import and export filters are defined for each eBGP session
– Over an iBGP session, a router advertises only its best routes learned over eBGP sessions
  ● A route learned over an iBGP session is never advertised over another iBGP session
  ● Usually, no filter is applied on iBGP sessions
iBGP and eBGP: Example
(Figure: AS10 (R1, 195.100.0.1), AS20 (R2, R4) and AS30 (R3); eBGP sessions R1-R2 and R4-R3, iBGP session R2-R4)
UPDATE (via eBGP) from R1 to R2
● Prefix: 194.100.0.0/23
● NextHop: 195.100.0.1
● ASPath: AS10
UPDATE (via iBGP) from R2 to R4
● Prefix: 194.100.0.0/23
● NextHop: 195.100.0.1
● Local-pref: 1000
● ASPath: AS10
UPDATE (via eBGP) from R4 to R3
● Prefix: 194.100.0.0/23
● NextHop: 195.100.0.5
● ASPath: AS20:AS10
● Note that the next-hop and the AS-Path of BGP update messages are only updated when sent over an eBGP session
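The iBGP/eBGP rules of the last two slides can be combined into one export function. It reproduces the three UPDATEs of the example. The session labels (`ibgp`, `ebgp`), the dictionary-based route format and the function name are assumptions of this sketch. Export filters, which usually apply on eBGP sessions, are left out.

```python
from typing import Optional


def export_route(route: dict, learned_over: str, send_over: str,
                 local_as: str, local_address: str) -> Optional[dict]:
    """Build the UPDATE attributes for route on an iBGP or eBGP session.

    learned_over / send_over: "ibgp" or "ebgp" (labels chosen for this sketch).
    """
    if send_over == "ibgp":
        if learned_over == "ibgp":
            return None  # an iBGP-learned route is never re-advertised over iBGP
        # NextHop and ASPath are left unchanged; local-pref is carried.
        return dict(route)
    # eBGP: update NextHop and ASPath, and strip local-pref.
    out = {k: v for k, v in route.items() if k != "local_pref"}
    out["next_hop"] = local_address
    out["as_path"] = (local_as,) + route["as_path"]
    return out


learned = {"prefix": "194.100.0.0/23", "next_hop": "195.100.0.1", "as_path": ("AS10",)}
at_r2 = dict(learned, local_pref=1000)                              # after R2's import filter
to_r4 = export_route(at_r2, "ebgp", "ibgp", "AS20", "195.100.0.10")  # iBGP UPDATE
to_r3 = export_route(to_r4, "ibgp", "ebgp", "AS20", "195.100.0.5")   # eBGP UPDATE
```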
iBGP and eBGP
Packet Forwarding
(Figure: AS10 (R1, 195.100.0.1), AS20 (R2: 195.100.0.2 and 195.100.0.10; R4: 195.100.0.5 and 195.100.0.9) and AS30 (R3, 195.100.0.6); eBGP sessions R1-R2 and R4-R3, iBGP session R2-R4)
BGP routing table of R2
194.100.0.0/23 via 195.100.0.1
BGP routing table of R4
194.100.0.0/23 via 195.100.0.1
IGP routing table of R2
195.100.0.0/30 West
195.100.0.4/30 via 195.100.0.9
195.100.0.8/30 South
194.100.4.0/23 via 195.100.0.9
194.100.2.0/23 North
IGP routing table of R4
195.100.0.0/30 via 195.100.0.10
195.100.0.4/30 East
195.100.0.8/30 North
194.100.2.0/23 via 195.100.0.10
194.100.4.0/23 West
iBGP and eBGP
Packet Forwarding (2)
(Figure: AS10 (R1, 195.100.0.1), AS20 (R2, R4) and AS30 (R3); eBGP sessions R1-R2 and R4-R3, iBGP session R2-R4)
BGP routing table of R4
194.100.0.0/23 via 195.100.0.1
IGP routing table of R4
195.100.0.0/30 via 195.100.0.10
195.100.0.4/30 East
195.100.0.8/30 North
194.100.2.0/23 via 195.100.0.10
194.100.4.0/23 West
Forwarding table of R4
194.100.0.0/23 via 195.100.0.10
195.100.0.0/30 via 195.100.0.10
195.100.0.4/30 East
195.100.0.8/30 North
194.100.2.0/23 via 195.100.0.10
194.100.4.0/23 West
● The forwarding table of a router is thus built based on both the IGP and the BGP tables
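The way R4 combines its two tables can be sketched as a recursive lookup. The BGP next hop (195.100.0.1) is itself resolved through the IGP table, and the resulting IGP next hop is installed in the forwarding table. The tables below are copied from the slide. The resolution function is an illustrative sketch that uses longest-prefix match on the IGP table.

```python
import ipaddress

BGP_TABLE_R4 = {"194.100.0.0/23": "195.100.0.1"}
IGP_TABLE_R4 = {
    "195.100.0.0/30": "via 195.100.0.10",
    "195.100.0.4/30": "East",
    "195.100.0.8/30": "North",
    "194.100.2.0/23": "via 195.100.0.10",
    "194.100.4.0/23": "West",
}


def igp_lookup(address: str) -> str:
    """Longest-prefix match in the IGP table."""
    addr = ipaddress.ip_address(address)
    matches = [p for p in IGP_TABLE_R4 if addr in ipaddress.ip_network(p)]
    best = max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)
    return IGP_TABLE_R4[best]


def build_forwarding_table() -> dict:
    """Internal routes come from the IGP; BGP routes are resolved through it."""
    fib = dict(IGP_TABLE_R4)
    for prefix, bgp_next_hop in BGP_TABLE_R4.items():
        # An inter-domain prefix inherits the IGP route towards its BGP next hop.
        fib[prefix] = igp_lookup(bgp_next_hop)
    return fib


fib = build_forwarding_table()  # 194.100.0.0/23 -> "via 195.100.0.10"
```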
Using non-BGP routers
(Figure: AS20 now contains an internal backbone router R5 between the BGP routers R2 and R4; eBGP sessions R1-R2 (AS10) and R4-R3 (AS30), iBGP session R2-R4; networks 194.100.0.0/23, 194.100.2.0/23, 194.100.4.0/23 and 12.0.0.0/8)
● Problem
– What happens when there are internal backbone routers between BGP routers inside an AS?
  ● An iBGP session between BGP routers is easily established when the IGP is running, since iBGP runs over a TCP connection
  ● How to populate the routing table of the backbone routers to ensure that they will be able to route any IP packet?
Using non-BGP routers (2)
(Figure: same topology, with internal backbone router R5 inside AS20)
● First solution
– Use tunnels between BGP routers to encapsulate inter-domain packets
  ● GRE tunnel
    – Needs static configuration; be careful with MTU issues
  ● MPLS tunnel
    – Can be dynamically established in an MPLS-enabled backbone
MPLS in large ISP networks
● Only one BGP table lookup inside the AS
– Use a hierarchy of labels
  ● top label is used to reach the egress router
  ● second label is used to reach the eBGP peer
(Figure: AS1 with ingress and egress border routers and internal routers, connected to external routers RA to RH)
● Ingress border router
– Maintains the full BGP routing table
– Attaches two labels based on the routing table
● Egress border router
– Packets are label switched
Using non-BGP routers (3)
(Figure: same topology, with internal backbone router R5 inside AS20)
● Second solution
– Use IGP (OSPF/IS-IS, RIP) to redistribute inter-domain routes to internal backbone routers
– Drawbacks
  ● Size of BGP tables may completely overload the IGP
  ● Make sure that BGP routes learned by R2 and injected inside the IGP will not be re-injected inside BGP by R4!
Using non-BGP routers (4)
(Figure: same topology; R5 now has iBGP sessions with R2 and R4)
● Third solution
– Run BGP on internal backbone routers
– Internal backbone routers need to participate in the iBGP full mesh
  ● Internal backbone routers receive BGP routes via iBGP but never advertise any routes
    – Remember: a route learned over an iBGP session is never advertised over another iBGP session
The roles of IGP and BGP
(Figure: AS20 with routers R2, R4 and R5 connected by iBGP sessions; eBGP sessions to R1 (AS10) and R3 (AS30))
● Role of the IGP inside AS20
– Distribute internal topology and internal addresses (R2-R4-R5)
● Role of BGP inside AS20
– Distribute the routes towards external destinations
● IGP must run to allow BGP routers to establish iBGP sessions
The iBGP full mesh
● Drawback
– N*(N-1)/2 iBGP sessions for N routers
(Figure: routers R, each connected by an iBGP session to every other router)
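The quadratic growth is easy to quantify. The router counts below are arbitrary examples.

```python
def full_mesh_sessions(n_routers: int) -> int:
    """Number of iBGP sessions in a full mesh of n routers: N*(N-1)/2."""
    return n_routers * (n_routers - 1) // 2


for n in (5, 10, 100, 500):
    print(n, full_mesh_sessions(n))
# 5 10
# 10 45
# 100 4950
# 500 124750
```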
Outline
● Organization of the global Internet
● BGP basics
● BGP in large networks
– The needs for iBGP
– Confederations and Route Reflectors
– The dynamics of BGP
– Inter-domain traffic engineering with BGP
How to scale iBGP in large domains?
● Confederations
– Divide the large domain into smaller sub-domains
  ● Use iBGP full mesh inside each sub-domain
  ● Use eBGP between sub-domains
(Figure: confederation AS20 split into Member-AS AS65001 and Member-AS AS65002; iBGP sessions inside each Member-AS, eBGP sessions between them)
– Each router is configured with two AS numbers
  ● Its confederation AS number
  ● Its Member-AS AS number
– Usually, a single IGP covers the whole domain
Confederations: example
[Figure: confederation AS20 with Member-AS AS65020 (including R1 and R2) and Member-AS AS65021 (including R5 and R6); R2 has an eBGP session with RX in AS10, R5 with RY in AS30, R1 with R6; RX sends R2 an UPDATE (via eBGP): Prefix: 1.0.0.0/8, ASPath: AS10]
● On the eBGP session between R2 and RX, R2 belongs to AS20
● On the eBGP session between R5 and RY, R5 belongs to AS20
● On the eBGP session between R1 and R6, R1 belongs to AS65020 and R6 belongs to AS65021
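The three cases above follow one rule: a router presents its Member-AS number to peers inside its confederation and its confederation AS number to everyone else. A minimal sketch; the `Router` class and `as_used_on_ebgp` are hypothetical, and placing R5 in AS65021 is an assumption consistent with the following examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    name: str
    confed_as: int   # confederation AS number, visible outside
    member_as: int   # Member-AS number, used only inside the confederation


def as_used_on_ebgp(local: Router, peer_confed_as: int) -> int:
    """AS number `local` presents on an eBGP session to a peer whose
    confederation AS (or plain AS, for an external peer) is peer_confed_as."""
    if peer_confed_as == local.confed_as:
        return local.member_as   # eBGP between Member-ASes of the same confederation
    return local.confed_as       # eBGP towards a neighbouring AS


r1 = Router("R1", confed_as=20, member_as=65020)
r2 = Router("R2", confed_as=20, member_as=65020)
r5 = Router("R5", confed_as=20, member_as=65021)
r6 = Router("R6", confed_as=20, member_as=65021)

print(as_used_on_ebgp(r2, 10))  # 20 (session with RX in AS10)
print(as_used_on_ebgp(r5, 30))  # 20 (session with RY in AS30)
print(as_used_on_ebgp(r1, 20), as_used_on_ebgp(r6, 20))  # 65020 65021
```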
Confederations: example (2)
[Figure: R2 sends R1 an UPDATE (via iBGP): Prefix: 1.0.0.0/8, ASPath: AS10; R1 sends R6 an UPDATE (via eBGP): Prefix: 1.0.0.0/8, ASPath: [AS65020]:AS10]
● When propagating an UPDATE via eBGP to another router of the same confederation, R1 inserts its Member-AS number in the AS_PATH
Confederations: example (3)
● When propagating an UPDATE via eBGP to a router outside its confederation, R5 removes the internal path from the AS_PATH and inserts its confederation AS number in the AS_PATH
[Figure: R6 sends R5 an UPDATE (via iBGP): Prefix: 1.0.0.0/8, ASPath: [AS65020]:AS10; R5 sends RY in AS30 an UPDATE (via eBGP): Prefix: 1.0.0.0/8, ASPath: AS20:AS10]
● In practice, BGP confederations are particularly useful when two companies, or two distinct ASes of the same company, must be merged into a single AS
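The AS_PATH handling in the three confederation examples can be reproduced with two small transformations. This is a simplified sketch: the AS_PATH is modelled as a list of segments, where "confed" stands for an AS_CONFED_SEQUENCE (RFC 5065, shown in brackets on the slides) and "seq" for an ordinary AS_SEQUENCE; AS_SETs are not modelled, and the function names are hypothetical.

```python
from typing import List, Tuple

# An AS_PATH as a list of segments: ("confed", [...]) or ("seq", [...]).
Segment = Tuple[str, List[int]]


def to_member_as(path: List[Segment], member_as: int) -> List[Segment]:
    """eBGP to another Member-AS of the same confederation:
    prepend our Member-AS number inside a confederation segment."""
    if path and path[0][0] == "confed":
        return [("confed", [member_as] + path[0][1])] + path[1:]
    return [("confed", [member_as])] + path


def to_outside(path: List[Segment], confed_as: int) -> List[Segment]:
    """eBGP to a router outside the confederation: drop the internal
    (confederation) segments, then prepend the confederation AS number."""
    public = [seg for seg in path if seg[0] != "confed"]
    if public and public[0][0] == "seq":
        return [("seq", [confed_as] + public[0][1])] + public[1:]
    return [("seq", [confed_as])] + public


def show(path: List[Segment]) -> str:
    parts = []
    for kind, asns in path:
        text = ":".join(f"AS{a}" for a in asns)
        parts.append(f"[{text}]" if kind == "confed" else text)
    return ":".join(parts)


# iBGP hops (R2 -> R1, R6 -> R5) leave the AS_PATH unchanged.
from_rx = [("seq", [10])]              # received by R2 from RX
at_r6 = to_member_as(from_rx, 65020)   # R1 -> R6, eBGP inside AS20
at_ry = to_outside(at_r6, 20)          # R5 -> RY, eBGP outside AS20
print(show(from_rx))  # AS10
print(show(at_r6))    # [AS65020]:AS10
print(show(at_ry))    # AS20:AS10
```

The outside world only ever sees AS20, so the internal split into Member-ASes stays invisible.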
Route reflectors
An alternative to confederations
● Route reflectors (RFC 2796, since obsoleted by RFC 4456)
– A route reflector is a special router that is allowed to propagate the routes learned over iBGP sessions on other iBGP sessions
[Figure: normal iBGP full mesh versus iBGP with one route reflector (RR), where R1, R2 and R3 each keep a single iBGP session with RR]
Behavior of a Route Reflector
● Two types of iBGP peers of a route reflector
– RR client peers (R1, R2, ..., RN): do not participate in the iBGP full mesh
– Non-client peers (RX, RY, RZ): participate in the iBGP full mesh
[Figure: RR with iBGP sessions to its client peers R1 ... RN and to its non-client peers RX, RY, RZ]
Behavior of a Route Reflector
● Route received from an eBGP session or a client peer
– Select best path
– Advertise to
   ● All client peers
   ● All non-client peers
● Route received from a non-client peer
– Select best path
– Advertise to
   ● All client peers
[Figure: RR with client peers R1, R2, ..., RN and non-client peers RX, RY, RZ]
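The two reflection rules above can be written as a function returning the iBGP peers that receive the route reflector's best route. A minimal sketch: the function name and peer sets are hypothetical, eBGP neighbours and policies are left out, and not sending a route back to the peer it came from follows the later example in which RR1 does not re-advertise R2's route to R2.

```python
def advertise_to(learned_from: str, sender, clients, non_clients) -> set:
    """iBGP peers to which a route reflector sends its best route.

    learned_from: "ebgp", "client" or "non-client"
    sender: the iBGP peer the route came from (None for eBGP-learned routes)
    eBGP neighbours and policy filters are outside this sketch.
    """
    if learned_from in ("ebgp", "client"):
        targets = set(clients) | set(non_clients)
    elif learned_from == "non-client":
        targets = set(clients)   # never reflected to other non-clients
    else:
        raise ValueError(f"unknown origin: {learned_from!r}")
    targets.discard(sender)      # do not send the route back to its sender
    return targets


CLIENTS = {"R1", "R2", "RN"}
NON_CLIENTS = {"RX", "RY", "RZ"}
print(sorted(advertise_to("client", "R1", CLIENTS, NON_CLIENTS)))
# ['R2', 'RN', 'RX', 'RY', 'RZ']
print(sorted(advertise_to("non-client", "RX", CLIENTS, NON_CLIENTS)))
# ['R1', 'R2', 'RN']
```

Non-client routes go only to clients because non-clients already receive them directly through the full mesh.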
Fault tolerance of route reflectors
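● How to avoid having the RR as a single point of failure?
– Solution: allow each client peer to be connected to 2 RRs
● Issue: configuration errors may cause redistribution loops
– ORIGINATOR_ID is used to carry the router ID of the originator of the route
– CLUSTER_LIST contains the list of RRs (cluster IDs) that reflected the UPDATE message inside the current AS
– A router ignores a route whose ORIGINATOR_ID is its own router ID, and an RR ignores a route whose CLUSTER_LIST already contains its own cluster ID
[Figure: client peers R1, R2, ..., RN, each with iBGP sessions to both RR1 and RR2]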
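The two loop checks based on ORIGINATOR_ID and CLUSTER_LIST can be sketched as follows. This is a simplified illustration: router and cluster IDs are written as names rather than 32-bit identifiers, attributes are a plain dictionary, and `accept_route` is a hypothetical name.

```python
def accept_route(attrs: dict, my_router_id: str, my_cluster_id=None) -> bool:
    """Loop checks for reflected routes.

    attrs: received path attributes; ORIGINATOR_ID and CLUSTER_LIST may be absent.
    my_cluster_id: set only on route reflectors.
    """
    if attrs.get("ORIGINATOR_ID") == my_router_id:
        return False   # the route came back to the router that injected it
    if my_cluster_id is not None and my_cluster_id in attrs.get("CLUSTER_LIST", []):
        return False   # the route has already been reflected by this cluster
    return True


update = {"ORIGINATOR_ID": "R2", "CLUSTER_LIST": ["RR1"]}
print(accept_route(update, "R2"))                        # False: R2 is the originator
print(accept_route(update, "RR1", my_cluster_id="RR1"))  # False: loop through RR1
print(accept_route(update, "RR2", my_cluster_id="RR2"))  # True
```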
Route reflectors: an example
[Figure: inside AS20, R2 receives from RX and R3 receives from RZ an UPDATE (via eBGP): Prefix: 1.0.0.0/8, ASPath: AS10; R5 has an eBGP session with RY in AS30]
● R2 and R3 are clients of route reflector RR1
● RR1 and RR6 are in an iBGP full mesh
● R5 is a client of route reflector RR6
Route reflectors: an example (2)
[Figure: R2 sends RR1 an UPDATE (via iBGP): Prefix: 1.0.0.0/8, ASPath: AS10, Nexthop: RX; R3 sends RR1 an UPDATE (via iBGP): Prefix: 1.0.0.0/8, ASPath: AS10, Nexthop: RZ]
● RR1 will select its best path towards 1.0.0.0/8 and will re-advertise it, adding the ORIGINATOR_ID and its CLUSTER_ID (carried in the CLUSTER_LIST attribute)
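What RR1 adds when reflecting can be sketched as a function on the received attributes. This is a minimal sketch, assuming attributes held in a dictionary and names instead of real identifiers; `reflect` is hypothetical. Following RFC 4456, an existing ORIGINATOR_ID is kept and each RR prepends its CLUSTER_ID, so after RR6 the list reads RR6, RR1 (the slides write the same two IDs in traversal order).

```python
def reflect(update: dict, peer_router_id: str, cluster_id: str) -> dict:
    """Attributes a route reflector adds when reflecting an iBGP-learned route.

    update: attributes as received from the iBGP peer `peer_router_id`.
    The next hop and AS path are left untouched.
    """
    out = dict(update)                               # keep the received copy intact
    out.setdefault("ORIGINATOR_ID", peer_router_id)  # only set if not already present
    out["CLUSTER_LIST"] = [cluster_id] + list(update.get("CLUSTER_LIST", []))
    return out


from_r2 = {"Prefix": "1.0.0.0/8", "ASPath": "AS10", "Nexthop": "RX"}
at_r3 = reflect(from_r2, "R2", "RR1")   # RR1 -> R3 and RR6
at_r5 = reflect(at_r3, "RR1", "RR6")    # RR6 -> R5
print(at_r3["ORIGINATOR_ID"], at_r3["CLUSTER_LIST"])   # R2 ['RR1']
print(at_r5["ORIGINATOR_ID"], at_r5["CLUSTER_LIST"])   # R2 ['RR6', 'RR1']
```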
Route reflectors: an example (3)
[Figure: RR1 sends R3 and RR6 an UPDATE (via iBGP): Prefix: 1.0.0.0/8, ASPath: AS10, Nexthop: RX, ORIGINATOR_ID: R2, CLUSTER_ID: RR1]
● RR1 prefers the path to 1.0.0.0/8 via RX-R2
– RR1 advertises this path to its client peer (R3)
   ● The path is not advertised to R2, since R2 already received it
– RR1 advertises this path to its non-client peer (RR6)
Route reflectors: an example (4)
[Figure: RR6 sends R5 an UPDATE (via iBGP): Prefix: 1.0.0.0/8, ASPath: AS10, Nexthop: RX, ORIGINATOR_ID: R2, CLUSTER_ID: RR1:RR6]
● RR6 advertises the path to 1.0.0.0/8 via RX-R2 to its client peer R5
– R5 will remove the ORIGINATOR_ID and CLUSTER_ID (CLUSTER_LIST) attributes before advertising the path to RY via eBGP
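R5's export step can be sketched as building the eBGP UPDATE from the iBGP-learned route: drop the route-reflection attributes, prepend the local AS and set the next hop to the local address. A simplified sketch with dictionary attributes and string AS paths; `to_ebgp_peer` and the next-hop value "R5" (standing in for R5's address) are hypothetical.

```python
RR_ONLY_ATTRIBUTES = ("ORIGINATOR_ID", "CLUSTER_LIST")


def to_ebgp_peer(update: dict, local_as: int, local_nexthop: str) -> dict:
    """Build the UPDATE sent over eBGP from an iBGP-learned route:
    drop the route-reflection attributes, prepend the local AS to the
    AS path and use the local address as next hop."""
    out = {k: v for k, v in update.items() if k not in RR_ONLY_ATTRIBUTES}
    out["ASPath"] = f"AS{local_as}:{update['ASPath']}"
    out["Nexthop"] = local_nexthop
    return out


received_by_r5 = {
    "Prefix": "1.0.0.0/8", "ASPath": "AS10", "Nexthop": "RX",
    "ORIGINATOR_ID": "R2", "CLUSTER_LIST": ["RR6", "RR1"],
}
print(to_ebgp_peer(received_by_r5, 20, "R5"))
# {'Prefix': '1.0.0.0/8', 'ASPath': 'AS20:AS10', 'Nexthop': 'R5'}
```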
Hierarchy of route reflectors
● In large domains, a hierarchy of route reflectors can be built
● R1, R2 and R3 are clients of route reflectors RR1 and RR2
● RR1 and RR2 are clients of route reflectors RRA and RRB
● R4 and R5 are clients of route reflector RRA
● R6 is a client of route reflectors RR4 and RR5
● RRA, RRB and RRC are in a full iBGP mesh
[Figure: the route reflector hierarchy described above (R1 to R6, RR1, RR2, RR4, RR5, RRA, RRB, RRC); lines represent iBGP sessions]
Confederations versus Route reflectors
● Confederations
– Solves iBGP scaling
– Redundancy with iBGP full mesh inside each Member-AS
– Possible to run one IGP per Member-AS
– Requires manual router configuration
– Can be used when merging domains
– Can lead to some routing oscillations
● Route reflectors
– Solves iBGP scaling
– Redundancy by using redundant RRs
– Usually a single IGP for the whole AS
– Requires manual router configuration
– Can lead to some routing oscillations