CSE 524: Lecture 8 - Tamkang University

Computer Networks
Network layer (Part 4)
1
Network layer (so far)
• Network layer functions
• Network layer implementation (IP)
• Today
– Network layer devices (routers)
• Network processors
• Input/output port functions
• Forwarding functions
• Switching fabric
– Advanced network layer topics
• Routing problems
• Routing metric selection
• Overlay networks
2
NL: Router Architecture Overview
Key router functions:
– Run routing algorithms/protocol (RIP, OSPF, BGP) and construct
routing table
– Switch/forward datagrams from incoming to outgoing link based on
route
3
NL: Routing vs. Forwarding
• Routing: process by which the forwarding table is
built and maintained
– One or more routing protocols
– Procedures (algorithms) to convert routing info to
forwarding table.
• Forwarding: the process of moving packets from
input to output
– The forwarding table
– Information in the packet
4
NL: What Does a Router Look Like?
• Network processor/controller
– Handles routing protocols, error conditions
• Line cards
– Network interface cards
• Forwarding engine
– Fast path routing (hardware vs. software)
• Backplane
– Switch or bus interconnect
5
NL: Network Processor
• Runs routing protocol and downloads forwarding table
to forwarding engines
– Use two forwarding tables per engine to allow easy
switchover (double buffering)
• Typically performs “slow” path processing
– ICMP error messages
– IP option processing
– IP fragmentation
– IP multicast packets
6
NL: Fast-path router processing
• Packet arrives at inbound line card
• Header transferred to forwarding engine
• Forwarding engine determines output interface
• Forwarding engine signals result to line card
• Packet copied to outbound line card
7
NL: Input Port Functions
Physical layer:
bit-level reception
Data link layer:
e.g., Ethernet
see chapter 5
Decentralized switching:
• given datagram dest., lookup output port using
routing table in input port memory
• goal: complete input port processing at ‘line
speed’
• queuing: if datagrams arrive faster than
forwarding rate into switch fabric
8
NL: Input Port Queuing
• Fabric slower than input ports combined => queuing may
occur at input queues
• Head-of-the-Line (HOL) blocking: queued datagram at
front of queue prevents others in queue from moving
forward
• queueing delay and loss due to input buffer overflow!
9
NL: Input Port Queuing
• Possible solution
– Virtual output queueing (buffering)
• Maintain a separate buffer per output at each input
• Solves the head-of-line blocking problem
• Each of the MxN input buffers places a bid for its output
– Crossbar connect
– Challenge: mapping the bids to a schedule for the crossbar (see the sketch below)
10
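A minimal sketch (not from the lecture) of virtual output queueing: each input keeps one queue per output, and a simple one-pass greedy matching stands in for the crossbar scheduler (real switches use algorithms such as iSLIP). The port count and packet names are illustrative.

from collections import deque

N = 4                                                  # ports on an N x N switch
voq = [[deque() for _ in range(N)] for _ in range(N)]  # voq[input][output]

def enqueue(inp, outp, pkt):
    """Place an arriving packet in its input's per-output queue."""
    voq[inp][outp].append(pkt)

def schedule_one_slot():
    """Greedy matching: each output grants to the first input bidding for it,
    and each input accepts at most one grant per slot."""
    matched_inputs = set()
    transfers = []
    for outp in range(N):
        for inp in range(N):
            if inp in matched_inputs or not voq[inp][outp]:
                continue
            transfers.append((inp, outp, voq[inp][outp].popleft()))
            matched_inputs.add(inp)
            break                                      # this output is matched
    return transfers

# Two inputs contend for output 0; input 1 loses that bid, but its packet for
# output 1 is not stuck behind the head-of-line packet.
enqueue(0, 0, "p1"); enqueue(1, 0, "p2"); enqueue(1, 1, "p3")
print(schedule_one_slot())    # [(0, 0, 'p1'), (1, 1, 'p3')]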
NL: Forwarding Engine
• General purpose processor + software
• Packet trains help the route-cache hit rate (see the sketch below)
– Packet train = sequence of packets for same/similar flows
– Similar to idea behind IP switching (ATM/MPLS) where long-lived
flows map into single label
• Example
– Partridge et al., “A 50-Gb/s IP Router”, IEEE/ACM Transactions on Networking, Vol. 6, No. 3, June 1998
– 8KB L1 Icache
• Holds full forwarding code
– 96KB L2 cache
• Forwarding table cache
– 16MB L3 cache
• Full forwarding table x 2 - double buffered for updates
11
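A minimal sketch (my own, not the router from the paper above) of why packet trains improve the route-cache hit rate: back-to-back packets for the same destination hit a small cache placed in front of the full forwarding-table lookup.

from collections import OrderedDict

class RouteCache:
    def __init__(self, capacity, full_lookup):
        self.cache = OrderedDict()        # destination -> next hop, in LRU order
        self.capacity = capacity
        self.full_lookup = full_lookup    # slow path: full forwarding-table lookup
        self.hits = self.misses = 0

    def lookup(self, dest):
        if dest in self.cache:
            self.hits += 1
            self.cache.move_to_end(dest)  # mark as most recently used
            return self.cache[dest]
        self.misses += 1
        nexthop = self.full_lookup(dest)
        self.cache[dest] = nexthop
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
        return nexthop

rc = RouteCache(capacity=1024, full_lookup=lambda dest: "if0")
# A "packet train": many consecutive packets for the same flow, then another flow.
for dest in ["10.1.1.1"] * 8 + ["10.2.2.2"] * 4:
    rc.lookup(dest)
print(rc.hits, rc.misses)    # 10 2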
NL: Binary trie
Prefix table used in the trie figures on this and the following slides:
A: 0*      F: 1100*
B: 01000*  G: 1101*
C: 011*    H: 1110*
D: 1*      I: 1111*
E: 100*
[Figure: binary (one-bit-at-a-time) trie built from these prefixes; each left branch tests a 0 bit, each right branch a 1 bit, and a route is stored at the node its prefix leads to]
12
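A minimal sketch of longest-prefix match on the binary trie above, built from the slide's prefixes; the class and function names are my own, not from the lecture.

class Node:
    def __init__(self):
        self.child = [None, None]   # child[0] = '0' branch, child[1] = '1' branch
        self.route = None           # route stored at this node, if any

def insert(root, prefix, route):
    """Walk the trie one bit at a time and store the route at the prefix's node."""
    node = root
    for bit in prefix:
        b = int(bit)
        if node.child[b] is None:
            node.child[b] = Node()
        node = node.child[b]
    node.route = route

def longest_prefix_match(root, addr_bits):
    """Return the route of the longest stored prefix matching the address bits."""
    node, best = root, None
    for bit in addr_bits:
        node = node.child[int(bit)]
        if node is None:
            break
        if node.route is not None:
            best = node.route
    return best

root = Node()
for route, prefix in [("A", "0"), ("B", "01000"), ("C", "011"), ("D", "1"),
                      ("E", "100"), ("F", "1100"), ("G", "1101"),
                      ("H", "1110"), ("I", "1111")]:
    insert(root, prefix, route)

print(longest_prefix_match(root, "0100010"))   # B (01000* is longer than 0*)
print(longest_prefix_match(root, "1010000"))   # D (only 1* matches)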
NL: Path-compressed binary trie
• Eliminate single-branch-point nodes
• Variants include PATRICIA and BSD tries
[Figure: path-compressed trie for the same prefixes (A–I); chains of one-child nodes are skipped, and each internal node is labeled with the bit position to test (Bit=1, Bit=2, Bit=3, Bit=4)]
13
NL: Patricia tries and variable prefix match
• Patricia tree
• Arrange route entries into a series of bit tests
• Worst case = 32 bit tests
• Problem: memory speed is a bottleneck
• Used in older BSD Unix routing implementations
[Figure: Patricia tree in which each internal node names the bit to test (0 = left child, 1 = right child), e.g. bits 0, 10, 16, and 19, over the entries default (0/0), 128.2/16, 128.32/16, 128.32.130/24, and 128.32.150/24]
14
NL: Multi-bit tries
• Compare multiple bits at a time
– Reduces memory accesses
– Forces table expansion for prefixes falling in between strides
– Variable-length multi-bit tries
– Fixed-length multi-bit tries
• Most route entries are Class C
• Cut prefix tree at 16-bit depth (see the sketch below)
– Many prefixes are 8, 16, or 24 bits in length
– 64K bit mask
– Bit = 1 if tree continues below cut (root head)
– Bit = 1 if leaf at depth 16 or less (genuine head)
– Bit = 0 if part of range covered by leaf
15
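A minimal sketch of a fixed-stride multi-bit trie in the spirit of the 16-bit cut above, using assumed strides of 16/8/8 bits and controlled prefix expansion in place of the 64K bitmask encoding; the routes and names are illustrative.

import ipaddress

STRIDES = [16, 8, 8]             # 16-bit first level as on the slide, then 8 + 8

def new_table(stride):
    # each slot holds (best_prefix_len, route, child_table) for that index
    return [(-1, None, None) for _ in range(1 << stride)]

root = new_table(STRIDES[0])

def insert(prefix, route):
    net = ipaddress.ip_network(prefix)
    addr, plen = int(net.network_address), net.prefixlen
    node, consumed = root, 0
    for level, stride in enumerate(STRIDES):
        index = (addr >> (32 - consumed - stride)) & ((1 << stride) - 1)
        if plen <= consumed + stride:
            # prefix ends inside this level: expand it over every slot it covers
            span = 1 << (consumed + stride - plen)
            base = index & ~(span - 1)
            for i in range(base, base + span):
                old_len, _, child = node[i]
                if plen > old_len:
                    node[i] = (plen, route, child)
            return
        # prefix continues: descend, creating the next-level table if needed
        old_len, old_route, child = node[index]
        if child is None:
            child = new_table(STRIDES[level + 1])
            node[index] = (old_len, old_route, child)
        node, consumed = child, consumed + stride

def lookup(addr_str):
    addr = int(ipaddress.ip_address(addr_str))
    node, consumed, best = root, 0, None
    for stride in STRIDES:
        index = (addr >> (32 - consumed - stride)) & ((1 << stride) - 1)
        plen, route, child = node[index]
        if route is not None:
            best = route
        if child is None:
            break
        node, consumed = child, consumed + stride
    return best

insert("128.2.0.0/16", "A")
insert("128.32.130.0/24", "B")
insert("0.0.0.0/0", "default")
print(lookup("128.2.7.9"), lookup("128.32.130.5"), lookup("10.1.1.1"))  # A B default

Each lookup touches at most three tables, which is the memory-access saving the slide refers to; the price is that short prefixes are copied into many slots.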
NL: Variable stride multi-bit trie
• A single level can use different stride lengths
[Figure: variable-stride multi-bit trie for the same prefixes (A–I); the first level uses a 2-bit stride (00, 01, 10, 11) and lower levels use 1-bit or 2-bit strides, with short prefixes expanded into all the slots they cover]
16
NL: Fixed stride multi-bit trie
• Every level uses the same stride length
[Figure: fixed-stride multi-bit trie for the same prefixes (A–I), with a 3-bit first stride (000–111) and 2-bit second-level strides; prefixes shorter than a stride (e.g., A = 0*, D = 1*) are expanded across all the slots they cover]
17
NL: Other data structures
• Ruiz-Sanchez, Biersack, Dabbous, “Survey and Taxonomy of IP Address Lookup Algorithms”, IEEE Network, Vol. 15, No. 2, March 2001
– LC trie
– Lulea trie
– Full expansion/compression
– Binary search on prefix lengths
– Binary range search
– Multiway range search
– Multiway range trees
– Binary search on hash tables (Waldvogel – SIGCOMM 97)
18
NL: Prefix Match issues
• Scaling
– IPv6
• Stride choice
– Tuning stride to route table
– Bit shuffling
19
NL: Speeding up Prefix Match - Alternatives
• Route caches
– Temporal locality
– Many packets to same destination
• Protocol acceleration
– Add clue (5 bits) to IP header
– Indicate where IP lookup ended on previous node (Bremler-Barr
SIGCOMM 99)
• Content addressable memory (CAM)
– Hardware based route lookup
– Input = tag, output = value associated with tag
– Requires exact match with tag
• Multiple cycles (1 per prefix searched) with single CAM
• Multiple CAMs (1 per prefix) searched in parallel
– Ternary CAM (see the sketch below)
• 0, 1, don’t-care values in tag match
• Priority (i.e., longest prefix) determined by order of entries in CAM
20
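A minimal software sketch of how a ternary CAM resolves longest-prefix match: every entry carries don't-care bits via a mask, all entries are compared in parallel (a plain loop here), and priority comes from entry order, so entries are stored longest prefix first. The table contents are illustrative.

import ipaddress

def tcam_entry(prefix, next_hop):
    net = ipaddress.ip_network(prefix)
    value = int(net.network_address)
    mask = int(net.netmask)          # 1 bits must match; 0 bits are don't-care
    return (value, mask, next_hop)

# Ordered by decreasing prefix length so the first hit is the longest match.
tcam = [
    tcam_entry("128.32.130.0/24", "if2"),
    tcam_entry("128.32.0.0/16",   "if1"),
    tcam_entry("0.0.0.0/0",       "if0"),   # default route
]

def tcam_lookup(addr_str):
    addr = int(ipaddress.ip_address(addr_str))
    for value, mask, next_hop in tcam:       # the hardware does this in one cycle
        if addr & mask == value:
            return next_hop
    return None

print(tcam_lookup("128.32.130.7"))   # if2
print(tcam_lookup("128.32.1.1"))     # if1
print(tcam_lookup("10.0.0.1"))       # if0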
NL: Types of network switching fabrics
Memory
Multistage interconnection
Crossbar interconnection
Bus
21
NL: Types of network switching fabrics
• Issues
– Switch contention
• Packets arrive faster than switching fabric can switch
• Speed of switching fabric versus line card speed determines input
queuing vs. output queuing
22
NL: Switching Via Memory
First generation routers:
• packet copied by system’s (single) CPU
• 2 bus crossings per datagram
• speed limited by memory bandwidth
Modern routers:
• input port processor performs lookup, copy into memory
• Cisco Catalyst 8500
[Figure: datagram path from the input port over the system bus into memory, and back over the bus to the output port]
23
NL: Switching Via Bus
• Datagram from input port memory
to output port memory via a shared bus
• Bus contention: switching speed limited by bus
bandwidth
• 1 Gbps bus, Cisco 1900: sufficient speed for access
and enterprise routers (not regional or backbone)
24
NL: Switching Via An Interconnection Network
• Overcome bus bandwidth limitations
• Crossbar networks
– Fully connected (n² elements)
– All one-to-one, invertible permutations supported
25
NL: Switching Via An Interconnection Network
• Crossbar with N² elements is hard to scale
• Multi-stage interconnection networks (Banyan); see the sketch below
– Initially developed to connect processors in a multiprocessor
– Typically O(n log n) elements
– Datagram fragmented into fixed-length cells
– Cells switched through the fabric
– Cisco 12000: switches Gbps through an interconnection network
– Blocking possible (not all one-to-one, invertible permutations supported)
[Figure: Banyan network connecting inputs A–D to outputs W–Z through stages of small switching elements]
26
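A minimal sketch of the self-routing property that makes Banyan-style fabrics attractive: at stage k, a 2x2 element needs only bit k of the cell's destination output to decide between its upper and lower port. The function name and the 8x8 size are assumptions for illustration.

def banyan_route(dest_port, num_stages):
    """Return the per-stage up/down decisions for a cell headed to dest_port."""
    path = []
    for stage in range(num_stages):
        # use destination bits most-significant first, one bit per stage
        bit = (dest_port >> (num_stages - 1 - stage)) & 1
        path.append("lower" if bit else "upper")
    return path

# 8x8 fabric = 3 stages of 2x2 elements; output 5 is binary 101.
print(banyan_route(5, 3))   # ['lower', 'upper', 'lower']

Because two cells can need the same internal link at once, the fabric can block, which matches the last bullet above (not all permutations are supported).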
NL: Output Ports
• Output contention
– Datagrams arrive from fabric faster than output port’s transmission rate
– Buffering required
– Scheduling discipline chooses among queued datagrams for transmission
27
NL: Output port queueing
• buffering when arrival rate via switch exceeds output line speed
• queueing (delay) and loss due to output port buffer overflow!
28
NL: Advanced topics
• Routing synchronization
• Routing instability
• Routing metrics
• Overlay networks
29
NL: Routing Update Synchronization
• Another interesting robustness issue to consider...
• Even apparently independent processes can eventually
synchronize
– Intuitive assumption that independent streams will not
synchronize is not always valid
– Periodic routing protocol messages from different routers
– Abrupt transition from unsynchronized to synchronized
system states
30
NL: Examples/Sources of Synchronization
• TCP congestion windows
– Cyclical behavior shared by flows through gateway
• Periodic transmission by audio/video applications
• Periodic downloads
• Synchronized client restart
– After a catastrophic failure
• Periodic routing messages
– Manifests itself as periodic packet loss on pings
• Pendulum clocks on same wall
• Automobile traffic patterns
31
NL: How Synchronization Occurs
[Figure: timelines of routers A and B with update period T; weak coupling arises when A's behavior is triggered off of B's message arrival, and weak coupling can result in eventual synchronization]
32
NL: Routing Source of Synchronization
• Router resets timer after processing its own and incoming
updates
• Creates weak coupling among routers
• Solutions
– Set timer based on clock event that is not a function of processing
other routers’ updates, or
– Add randomization, or reset the timer before processing the update (see the sketch below)
• With increasing randomization, abrupt transition from predominantly
synchronized to predominantly unsynchronized
• Most protocols now incorporate some form of randomization
33
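A minimal sketch of the randomization fix described above: pick each update interval from an independent random window instead of resetting the timer off the processing of other routers' updates. The period and jitter fraction are illustrative, not values from any particular protocol.

import random

BASE_PERIOD = 30.0    # seconds between routing updates (illustrative)
JITTER = 0.5          # spread each firing over +/- 50% of the base period

def next_update_interval():
    """Choose the next interval independently of peers' update arrivals,
    so routers cannot drift into lock-step."""
    return BASE_PERIOD * random.uniform(1.0 - JITTER, 1.0 + JITTER)

print([round(next_update_interval(), 1) for _ in range(5)])   # e.g. [24.3, 41.7, ...]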
NL: Routing Instability
• References
– C. Labovitz, R. Malan, F. Jahanian, “Internet Routing Stability”, SIGCOMM 1997
• Record of BGP messages at major exchanges
• Discovered orders of magnitude more updates than expected
– Bulk were duplicate withdrawals
• Stateless implementation of BGP – did not keep track of information
passed to peers
• Impact of a few implementations
– Strong frequency (30/60 sec) components
• Interaction with other local routing/links etc.
34
NL: Route Flap Storm
• Overloaded routers fail to send Keep_Alive messages and are marked as down
• BGP peers find alternate paths
• Overloaded router re-establishes peering session
• Must send large updates
• Increased load causes more routers to fail!
35
NL: Route Flap Dampening
• Routers now give higher priority to BGP Keep_Alive messages to avoid the problem
• Associate a penalty with each route
– Increase the penalty when the route flaps
– Exponentially decay the penalty with time
• When the penalty reaches a threshold, suppress the route (see the sketch below)
36
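A minimal sketch of route-flap dampening bookkeeping as described above: each flap adds to a penalty, the penalty decays exponentially with time, and the route is suppressed above one threshold and reused once it decays below another. The constants are illustrative, not the standard defaults.

import math

FLAP_PENALTY   = 1000.0   # added on each flap
HALF_LIFE      = 15.0     # seconds for the penalty to decay by half
SUPPRESS_LIMIT = 2000.0   # suppress the route above this penalty
REUSE_LIMIT    = 750.0    # re-advertise once the penalty decays below this

class DampenedRoute:
    def __init__(self):
        self.penalty, self.last_update, self.suppressed = 0.0, 0.0, False

    def _decay(self, now):
        # exponential decay of the penalty since it was last touched
        self.penalty *= math.exp(-(now - self.last_update) * math.log(2) / HALF_LIFE)
        self.last_update = now

    def flap(self, now):
        self._decay(now)
        self.penalty += FLAP_PENALTY
        if self.penalty >= SUPPRESS_LIMIT:
            self.suppressed = True

    def usable(self, now):
        self._decay(now)
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False
        return not self.suppressed

r = DampenedRoute()
for t in (0, 1, 2):      # three quick flaps push the penalty past the limit
    r.flap(t)
print(r.usable(3))       # False: route is suppressed
print(r.usable(60))      # True: the penalty has decayed below the reuse limit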
NL: BGP Oscillations
• Can possibly explore every path through the network → (n-1)! combinations
• Limit between update messages (MinRouteAdver) reduces exploration
– Forces router to process all outstanding messages
• Typical Internet failover times
– New/shorter link → 60 seconds
• Results in simple replacement at nodes
– Down link → 180 seconds
• Results in search of possible options
– Longer link → 120 seconds
• Results in replacement or search based on length
37
NL: Routing Metrics
• Choice of link cost defines traffic load
– Low cost = high probability link belongs to SPT and will
attract traffic, which increases cost
• Main problem: convergence
– Avoid oscillations
– Achieve good network utilization
38
NL: Metric Choices
• Static metrics (e.g., hop count)
– Good only if links are homogeneous
– Definitely not the case in the Internet
• Static metrics do not take into account
– Link delay
– Link capacity
– Link load (hard to measure)
39
NL: Original ARPANET Metric
• Cost proportional to queue size
– Instantaneous queue length as delay estimator
• Problems
– Did not take into account link speed
– Poor indicator of expected delay due to rapid fluctuations
– Delay may be longer even if queue size is small due to
contention for other resources
40
NL: Metric 2 - Delay Shortest Path Tree
• Delay = (depart time - arrival time) + transmission
time + link propagation delay
– (Depart time - arrival time) captures queuing
– Transmission time captures link capacity
– Link propagation delay captures the physical length of the
link
• Measurements averaged over 10 seconds (see the sketch below)
– Update sent if the difference exceeds a threshold, or every 50 seconds
41
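A minimal sketch of the metric-2 delay estimate: per-packet delay is queueing (depart minus arrival) plus transmission plus propagation, averaged over the 10-second window and reported only when it moves past a threshold. The link speed, packet size, sample times, and threshold are illustrative.

def packet_delay(arrival_time, depart_time, packet_bits, link_bps, prop_delay):
    """Delay = queueing (depart - arrival) + transmission + propagation."""
    queueing = depart_time - arrival_time
    transmission = packet_bits / link_bps
    return queueing + transmission + prop_delay

# Per-packet delays over one 10-second window: 1000-byte packets, 56 Kbps link.
samples = [packet_delay(a, d, 8000, 56_000, 0.02)
           for a, d in [(0.0, 0.05), (1.0, 1.01), (2.0, 2.30)]]
avg_delay = sum(samples) / len(samples)

THRESHOLD = 0.05
last_reported = 0.10
if abs(avg_delay - last_reported) > THRESHOLD:
    print(f"send routing update: link cost = {avg_delay:.3f} s")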
NL: Performance of Metric 2
• Works well for light to moderate load
– Static values dominate
• Oscillates under heavy load
– Queuing dominates
42
NL: Specific Problems
• Range is too wide
– 9.6 Kbps highly loaded link can appear 127 times costlier
than 56 Kbps lightly loaded link
– Can make a 127-hop path look better than 1-hop
• No limit to change between reports
• All nodes calculate routes simultaneously
– Triggered by link update
43
NL: Example
[Figure: routers A and B connected by two alternative networks, Net X and Net Y; traffic initially concentrates on the lower-cost path]
44
NL: Example
After everyone re-calculates routes:
[Figure: the same topology, routers A and B with Net X and Net Y, after all traffic has shifted to the other network]
.. Oscillations!
45
NL: Consequences
• Low network utilization (50% in example)
• Congestion can spread elsewhere
• Routes could oscillate between short and long paths
• Large swings lead to frequent route updates
– More messages
– Frequent SPT re-calculation
46
NL: Revised Link Metric
• Better metric: packet delay = f(queueing,
transmission, propagation)
• When lightly loaded, transmission and propagation
are good predictors
• When heavily loaded queueing delay is dominant and
so transmission and propagation are bad predictors
47
NL: Normalized Metric
• If a loaded link looks very bad then everyone will
move off of it
• Want some to stay on to load balance and avoid
oscillations
• It is still an OK path for some
• Hop normalized metric diverts routes that have an
alternate that is not too much longer
• Also limited relative values and range of values advertised → gradual change
48
NL: Revised Metric
• Limits on relative change
– Measured link delay is taken over a 10-second period
– Link utilization is computed as 0.5*current sample + 0.5*last average
– Max change limited to slightly more than ½ hop
– Min change limited to slightly less than ½ hop
– Bounds oscillations (see the sketch below)
• Normalized according to link type
– Satellite should look good when queueing on other
links increases
49
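A minimal sketch of the smoothing and clamping above: utilization is averaged as 0.5*current sample + 0.5*last average, and the advertised cost may move only about half a hop per update within a bounded range. The hop-equivalent constants are illustrative, not the exact ARPANET values.

HOP = 1.0                        # cost of one "hop" in routing units
MAX_STEP = 0.55 * HOP            # per-update change limited to about half a hop
MIN_COST, MAX_COST = 1.0, 3.0    # cost never exceeds ~3x that of an idle link

def smoothed_utilization(current_sample, last_average):
    return 0.5 * current_sample + 0.5 * last_average

def next_cost(previous_cost, raw_cost):
    """Clamp the per-update change, then keep the cost inside its allowed range."""
    step = max(-MAX_STEP, min(MAX_STEP, raw_cost - previous_cost))
    return max(MIN_COST, min(MAX_COST, previous_cost + step))

# Even if the utilization-derived cost jumps from 1.0 to 3.0 at once, the
# advertised cost creeps toward it roughly half a hop per reporting period.
cost = 1.0
for _ in range(4):
    cost = next_cost(cost, 3.0)
    print(round(cost, 2))        # 1.55, 2.1, 2.65, 3.0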
NL: Routing Metric vs. Link Utilization
[Figure: new routing metric (in routing units, axis 0–225) as a function of link utilization (25%–100%) for four link types: 9.6 Kbps satellite, 9.6 Kbps terrestrial, 56 Kbps satellite, and 56 Kbps terrestrial]
50
NL: Observations
• Utilization effects
– High load never increases cost more than 3*cost of idle link
– Cost = f(link utilization) only at moderate to high loads
• Link types
– Most expensive link is 7 * least expensive link
– High-speed satellite link is more attractive than low-speed
terrestrial link
• Allows routes to be gradually shed from link
51
NL: Idealized Network Response Maps
• Load of “average” link as a function of that link’s cost
• Created empirically
[Figure: mean load on a link (0.0–1.0) vs. link cost (0.5–4.0), with one curve for each level of increasing applied network load]
52
NL: Equilibrium Calculation
• Combine the utilization-to-cost and cost-to-utilization maps
• Equilibrium points lie at the intersections
[Figure: mean load on a link (0.0–1.0) vs. link cost (0.5–4.0); the network response curves for increasing applied load intersect the D-SPF and HN-SPF metric maps, and the intersections are the equilibrium operating points]
53
NL: Routing Dynamics
• Limiting the maximum metric change bounds the oscillation
[Figure: metric map and network response curve plotted as utilization (0–1.0) vs. link reported cost (0.5–4.0); the region where they interact is marked “bounded oscillation”]
54
NL: Routing Dynamics
[Figure: metric map and network response curve (utilization 0–1.0 vs. reported cost 0.5–4.0) illustrating easing in a new link]
55
NL: Overlay Routing
• Basic idea:
– Treat multiple hops through IP network as one hop in an
overlay network
– Run routing protocol on overlay nodes
• Why?
– For performance – can run more clever protocol on overlay
– For efficiency – can make core routers very simple
– For functionality – can provide new features such as
multicast, active processing, IPv6
56
NL: Overlay for Performance
• References
– Savage et al., “The End-to-End Effects of Internet Path Selection”, SIGCOMM 99
– Andersen et al., “Resilient Overlay Networks”, SOSP 2001
• Why would IP routing not give good performance?
– Policy routing – limits selection/advertisement of routes
– Early exit/hot-potato routing – local not global incentives
– Lack of performance based metrics – AS hop count is the
wide area metric
• How bad is it really?
– Look at performance gain an overlay provides
57
NL: Quantifying Performance Loss
• Measure round trip time (RTT) and loss rate between
pairs of hosts
– ICMP rate limiting
• Alternate path characteristics
– 30-55% of hosts had lower latency
– 10% of alternate routes have 50% lower latency
– 75-85% have lower loss rates
58
NL: Bandwidth Estimation
• RTT & loss for multi-hop path
– RTT by addition
– Loss is either the worst hop or a combination across hops – why?
• Large number of flows → combination of probabilities
• Small number of flows → worst hop
• Bandwidth calculation (see the sketch below)
– TCP bandwidth is based primarily on loss and RTT
• 70-80% of paths have better bandwidth
• 10-20% of paths have 3x improvement
59
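A minimal sketch of composing path RTT and loss as described above, plus a rough TCP bandwidth estimate from the well-known simplified relation rate ≈ (MSS/RTT) * C/sqrt(p) with C ≈ 1.22; all of the link numbers are illustrative.

import math

def path_rtt(hop_rtts):
    """RTT of a multi-hop overlay path: the sum of the per-hop RTTs."""
    return sum(hop_rtts)

def path_loss(hop_losses, many_flows=True):
    """Combine independent per-hop loss probabilities when many flows share the
    hops; with only a few flows, the worst hop dominates."""
    if many_flows:
        return 1.0 - math.prod(1.0 - p for p in hop_losses)
    return max(hop_losses)

def tcp_bandwidth(mss_bytes, rtt_s, loss):
    return (mss_bytes / rtt_s) * (1.22 / math.sqrt(loss))   # bytes per second

# Compare a direct path (80 ms, 3% loss) with a two-hop overlay alternative.
alt_rtt   = path_rtt([0.030, 0.025])
alt_loss  = path_loss([0.010, 0.005])
direct    = tcp_bandwidth(1460, 0.080, 0.030)
alternate = tcp_bandwidth(1460, alt_rtt, alt_loss)
print(f"direct ~{direct/1e3:.0f} KB/s, alternate ~{alternate/1e3:.0f} KB/s")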
NL: Overlay for Efficiency
• Multi-path routing
– More efficient use of links or QOS
– Need to be able to direct packets based on more than just the destination address → can be computationally expensive
– What granularity? Per source? Per connection? Per packet?
• Per packet → re-ordering
• Per source, per flow → coarse grain vs. fine grain
– Take advantage of relative duration of flows
• Most bytes on long flows
60
NL: Overlay for Features
• How do we add new features to the network?
– Does every router need to support new feature?
– Choices
• Reprogram all routers → active networks
• Support new feature within an overlay
– Basic technique: tunnel packets
• Tunnels
– IP-in-IP encapsulation (see the sketch below)
– Poor interaction with firewalls, multi-path routers, etc.
61
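A minimal sketch of IP-in-IP encapsulation (outer protocol number 4): an overlay node prepends an outer IPv4 header addressed to the next overlay hop. The header is hand-built with struct, the checksum is left at zero for brevity, and all addresses are illustrative.

import socket
import struct

def ipv4_header(src, dst, payload_len, proto):
    ver_ihl = (4 << 4) | 5                       # IPv4, 5 x 32-bit words, no options
    total_len = 20 + payload_len
    return struct.pack("!BBHHHBBH4s4s",
                       ver_ihl, 0, total_len,
                       0, 0,                     # identification, flags/fragment
                       64, proto, 0,             # TTL, protocol, checksum (0 here)
                       socket.inet_aton(src), socket.inet_aton(dst))

def encapsulate(inner_packet, overlay_src, overlay_dst):
    """Wrap an existing IP packet inside an outer header (protocol 4 = IP-in-IP)."""
    return ipv4_header(overlay_src, overlay_dst, len(inner_packet), proto=4) + inner_packet

# Inner packet: a (fake) 64-byte UDP datagram between the real endpoints.
inner = ipv4_header("10.0.0.1", "10.0.9.9", 44, proto=17) + b"\x00" * 44
tunneled = encapsulate(inner, "192.0.2.1", "192.0.2.2")
print(len(inner), len(tunneled))    # 64 84 (a 20-byte outer header was added)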
NL: Examples
• IPv6 & IP multicast
– Tunnels between routers supporting feature
• Mobile IP
– Home agent tunnels packets to mobile host’s location
– http://www.rfc-editor.org/rfc/rfc2002.txt
• QOS
– Needs some support from intermediate routers
62
NL: Overlay Challenges
• How do you build an efficient overlay?
– Probably don’t want all N² links – which links to create?
– Without direct knowledge of the underlying topology, how do you know what’s nearby and what is efficient?
63
NL: Future of Overlay
• Application specific overlays
– Why should overlay nodes only do routing?
• Caching
– Intercept requests and create responses
• Transcoding
– Changing content of packets to match available bandwidth
• Peer-to-peer applications
64
NL: Network layer summary
• Network layer functions
• Specific network layers (IPv4, IPv6)
• Specific network layer devices (routers)
• Advanced network layer topics
65
NL: Network trace
• http://www.cse.ogi.edu/class/cse524/trace.txt
66
NL: End of material for midterm
• Midterm next Monday 10/29/2001 covering….
– Technical material in lectures
– Chapters 1, 4, and 5
• Chapter 1
• Chapter 4.1-4.7
• Chapter 5
– Review questions at end of chapters
67