csci5211: Computer Networks and Data Communications


Router Design
• Overview of Generic Router Architecture
• Input-Queued Switches (Routers)
• IP Address Look-up Algorithms
• Packet Classification Algorithms
Readings: Do required readings (you can skip the
math in Section V and Appendix in [Mc+99]);
Also do some of the optional readings if
interested
Q: Any volunteers for scribes?
CSci5221:
Router Design
1
Routers in a Network
Sample Routers and Switches
• Cisco 12416 Router
– up to 160 Gb/s throughput
– up to 10 Gb/s ports
• Juniper Networks T640 Router
– up to 160 Gb/s throughput
– up to 10 Gb/s ports
• 3Com 4950
– 24 port gigabit Ethernet switch
High Capacity Router
• Cisco CRS-1
– up to 46 Tb/s thruput
• two rack types
• line card rack
– 640 Gb/s thruput
– up to 16 line cards
• up to 40 Gb/s each
– up to 72 racks
• switch rack
– central switch stage
– up to 8 racks
• in-service scaling
Components of a Basic Router
• Input/Output Interfaces (II, OI)
– convert between optical signals and electronic signals
– extract timing from received signals
– encode (decode) data for transmission
• Input Port Processor (IPP)
– synchronize signals
– determine required OI or OIs from routing table
• Output Port Processor (OPP)
– queue outgoing cells
• shared bus interconnects IPPs and OPPs
• Control Processor (CP)
– configures routing tables
– coordinates end-to-end channel setup together with neighboring routers
[Figure: block diagram showing II, IPP (routing table), OPP (output queue), OI and CP]
Generic Router Architecture
[Figure: N line cards, each with header processing (IP address lookup in an address table, then header update) followed by packet queueing in buffer memory]
Switch Fabric: Three Design Approaches
Switch Fabric: First Generation Routers
• Traditional computers with switching under direct
control of the CPU
• Packet copied to the system’s memory
• Speed limited by the memory bandwidth (two bus
crossings per packet)
Input
Port
Memory
Output
Port
System Bus
Shared Memory (1st Generation)
[Figure: CPU, route table and buffer memory on a shared backplane with three line interfaces (MACs)]
• Typically < 0.5 Gb/s aggregate capacity
• Limited by rate of shared memory
Switch Fabric: Switching Via a Bus
• Packet from input port
memory to output port
memory via a shared bus
• Bus contention: switching speed
limited by bus bandwidth
• 1 Gbps bus, Cisco 1900: sufficient speed
for access and enterprise
routers (not regional or backbone)
Shared Bus (2nd Generation)
[Figure: CPU with route table and buffer memory, connected by a shared bus to line cards, each with its own buffer memory, forwarding cache and MAC]
• Typically < 5 Gb/s aggregate capacity
• Limited by shared bus
Switch Fabric: Interconnection Network
• Banyan networks, other interconnection nets
initially created for multiprocessors
• Advanced design: fragmenting packet into fixed
length cells to send through the fabric
• Cisco 12000: switches Gbps through the
interconnection network
Point-to-Point Switch (3rd Generation)
[Figure: switched backplane connecting a CPU card (routing table) and line cards, each with local buffer memory, forwarding table and MAC]
• Typically < 50 Gb/s aggregate capacity
Buffer Placement: Output Port Queuing
• Buffering when the aggregate arrival rate exceeds the
output line speed
• Memory must operate at very high speed
Simple model of output queued switch
[Figure: four ingress links and four egress links, each with link rate R; the queues sit at the egress links]
Characteristics of an output
queued (OQ) switch
• arriving packets immediately written into output queue,
without intermediate buffering
• flow of packets to one output does not affect flow to
another output
• OQ switch is work conserving: output line always busy when
there is a packet in switch for it
• OQ switch has highest throughput, lowest average delay
Switching Speed-up Needed
[Figure: generic router architecture with N line cards; with output queueing, each packet queue and its buffer memory must run at N times the line rate]
Buffer Placement: Input Port Queuing
• Fabric slower than input ports combined
– So, queuing may occur at input queues
• Head-of-the-Line (HOL) blocking
– Queued packet at the front of the queue prevents
others in queue from moving forward
Simple model of input queued switch
[Figure: four ingress links and four egress links, each with link rate R; the queues sit at the ingress links, holding packets destined for different egress links]
Head-of-line Blocking
• Packet at the head of an input queue cannot be transferred, thus blocking the following packets (or cells – packets of fixed size)
[Figure: three inputs and three outputs; one head packet cannot be transferred because the output buffer is full, and the packet behind it cannot be transferred because it is blocked by that (red) head packet]
Characteristics of an input
queued (IQ) switch
• arriving packets written into input queue
• only one packet can be sent to output link at a
time
• head-of-line blocking
• IQ switch cannot keep output links fully utilized
Buffer Placement: Design Trade-offs
• Output queues
– Pro: work-conserving, so maximizes throughput
– Con: memory must operate at speed N*R
• Input queues
– Pro: memory can operate at speed R
– Con: head-of-line blocking for access to output
• Work-conserving: output line is always busy when
there is a packet in the switch for it
• Head-of-line blocking: head packet in a FIFO
cannot be transmitted, forcing others to wait
What is capacity of IQ: Model
[optional: Karol et al Globecom’86]
• Large input-queued switch with
– single FIFO at each input
– packet destinations i.i.d. (independent, identically distributed), uniform across outputs
– HoL blocked packets not flushed
• throughput analysis
– saturated switch (i.e., always an arrival at each input queue)
– balls/urns model: N balls, N urns
– focus on first urn
– X_t: number of balls in urn at time t
– D_t: number of balls removed from all urns at end of time t
– D_t is switch throughput
Model (cont’d)
• A_{t+1}: no. of balls dropped into urn 1 at t+1
• X_{t+1} = (X_t - 1)^+ + A_{t+1}
• where
$$P(A_{t+1} = k) = \binom{D_t}{k} (1/N)^k (1 - 1/N)^{D_t - k}$$
• E(D_t) = ρN where ρ is output throughput
• for large N, binomial distribution can be approximated by Poisson distribution,
$$P(A_t = k) = \frac{\rho^k}{k!} e^{-\rho}$$
Model (cont’d)
$$EX = \frac{E(A^2) + EA - 2(EA)^2}{2(1 - EA)}$$
where EA = ρ, E(A^2) = ρ + ρ^2, therefore
$$EX = \frac{2\rho - \rho^2}{2(1 - \rho)}$$
EX = 1, therefore
$$\frac{2\rho - \rho^2}{2(1 - \rho)} = 1$$
and ρ = 2 − √2 ≈ 58.6%
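As a quick numerical check of the last step, the condition EX = 1 rearranges to ρ^2 − 4ρ + 2 = 0. The sketch below solves that quadratic and plugs the root back into the expression for EX; the function names are illustrative only and do not come from the slides.

```python
import math


def mean_urn_occupancy(rho):
    """EX as a function of rho, using EA = rho and E(A^2) = rho + rho^2."""
    return (2 * rho - rho ** 2) / (2 * (1 - rho))


def hol_saturation_throughput():
    """Solve rho^2 - 4*rho + 2 = 0 and keep the root that is a valid throughput."""
    a, b, c = 1.0, -4.0, 2.0
    disc = math.sqrt(b * b - 4 * a * c)
    roots = [(-b - disc) / (2 * a), (-b + disc) / (2 * a)]
    return next(r for r in roots if 0.0 <= r <= 1.0)


rho = hol_saturation_throughput()
print(round(rho, 4), round(mean_urn_occupancy(rho), 4))
```

The other root of the quadratic is larger than 1, so it cannot be a throughput and is discarded.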
A Router with Input Queues
[Figure: delay vs. load (0% to 100%); with head-of-line blocking the delay curve blows up at 2 − √2 ≈ 58% load, compared with the best that any queueing system can achieve]
Solution to Avoid Head-of-line Blocking
• How to improve capacity without increasing switching fabric
speed ?
• Maintain at each input N virtual queues, i.e., one per output
– use non-FIFO scheduler, matching input/output
[Figure: three inputs, each holding one virtual queue per output, connected to three outputs]
Virtual Output Queueing
• assume fixed length packets
• each input manages separate queue per output
• at each time, matching scheduler finds best possible packets from inputs to send to outputs
• maximum-weight matching
[Figure: N inputs with per-output queues, a matching scheduler, and N outputs]
Matching
• Lij(t): no. of packets at input i for output j at t
• bipartite graph (V1 ∪ V2, E), E ⊆ V1 × V2
– V1, V2: inputs, outputs; (i,j) ∈ E iff Lij(t) > 0
• matching: subset of E such that no two edges are adjacent
• maximal matching: no more edges can be added
An aside: stability (of a [queueing] system):
• Assuming the arrival rate (i.e., # of arrivals per unit of time) is less than or equal to the system capacity
• The system is stable if and only if no queue grows infinitely (under any arrival patterns) as t → ∞
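To make the difference between a maximal matching and a larger matching concrete, here is a minimal sketch of a greedy maximal matching over the occupancy matrix Lij(t). The greedy input-by-input order and the helper names are assumptions chosen for illustration; practical schedulers use other arbitration schemes.

```python
def greedy_maximal_matching(L):
    """L[i][j] = packets queued at input i for output j.

    Scan inputs in order and give each one the first free output it has
    traffic for. The result is maximal, but not necessarily maximum size.
    """
    matched_outputs = set()
    matching = {}  # input -> output
    for i, row in enumerate(L):
        for j, backlog in enumerate(row):
            if backlog > 0 and j not in matched_outputs:
                matching[i] = j
                matched_outputs.add(j)
                break
    return matching


def is_maximal(L, matching):
    """True if no edge (i, j) with Lij > 0 can still be added."""
    used_outputs = set(matching.values())
    for i, row in enumerate(L):
        for j, backlog in enumerate(row):
            if backlog > 0 and i not in matching and j not in used_outputs:
                return False
    return True


L = [[2, 1, 0],
     [3, 0, 0],
     [0, 0, 4]]
m = greedy_maximal_matching(L)
print(m, is_maximal(L, m))
```

In this example the greedy result matches only two inputs, although a matching of size three exists (input 0 to output 1, input 1 to output 0, input 2 to output 2). No edge can be added to the greedy result, so it is maximal without being maximum.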
Matching problems
• maximum size matching
– matching with largest number of edges
– when traffic uniform, provides 100% utilization
– network flow problem, O(N^{5/2})
• maximum weight matching
– add weight wij to edge from i to j
• e.g., wij: # of packets from input i to output j in the queue
– matching with highest weight
– when wij = Lij(t), provides 100% utilization
– equivalent to a network flow problem, O(N^3)
– MWM algorithms involve backtracking: i.e., edges laid down in one iteration may be removed later
– algorithm not amenable to pipelining
Scheduling Algorithms
[Figure: the same weighted bipartite request graph scheduled three ways]
• Practical maximal matchings: not stable
• Max size matching: not stable
• Max wt matching: stable
Switch Algorithms
[Figure: weighted bipartite request graph; one arrow labelled "better performance" and one labelled "easier to implement" run across the three algorithms]
• Maximal matching: not stable
• Max size matching: not stable
• Max wt matching: stable, low backlogs
Better Matching Algorithms
• Need simple algorithms that perform well
– efficient packet processing at line speeds
– high throughput
– low latencies/backlogs
• Randomized algorithms with linear complexity available
– Tassiulas’ Randomized Algorithm
– LAURA
– SERENA
• Use randomization, history, problem structure and arrival information
For more details, see optional reading [SGP02]: “Efficient Randomized Algorithms for Input-Queued Switch Scheduling” by Shah, Giaccone and Prabhakar, IEEE Micro Vol 22, Issue 1, Jan 2002
Combined Input-Output Queued (CIOQ) Routers
• Both input and output interfaces store packets
• Advantages
– Easy to build
– Utilization 1 can be achieved with limited input/output speedup (<= 2)
• Disadvantages
– Harder to design algorithms
– Two congestion points
– Need to design flow control
[Figure: input interfaces and output interfaces connected through a backplane]
Output Queue Emulation using CIOQ (with Speed-up)
Stable Marriage Problem -- Gale-Shapley Algorithm (GSA)
• As long as there is a free man m
– m proposes to the highest ranked woman w on his list to whom he hasn’t proposed yet
– If w is free, m and w are engaged
– If w is engaged to m’ and w prefers m to m’, w releases m’ and becomes engaged to m
– Otherwise m remains free
• A stable matching exists for every set of preference lists
• Complexity: worst-case O(N^2)
Stable Marriage Problem
• Consider N women and N men
• Each woman/man ranks each man/woman in the
order of their preferences
• Stable matching, a matching with no blocking pairs
• Blocking pair; let p(i) denote the pair of i
– There are matched pairs (k, p(k)) and (j, p(j)) such that k
prefers p(j) to p(k), and p(j) prefers k to j
Example
men   pref. list     women   pref. list
1     2 4 3 1        1       1 4 3 2
2     1 4 3 2        2       3 1 4 2
3     4 3 2 1        3       1 2 3 4
4     1 2 4 3        4       2 1 4 3
• If men propose to women, the stable matching is
– 1st round: (1,2), (2,1), (3,4), (4,1) -> w1 releases m2
– 2nd round: (2,4) -> w4 releases m3
– 3rd round: (3,3)
– final match: (1,2), (2,4), (3,3), (4,1)
• What is the stable matching if women propose to men?
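The proposal rounds in this example can be reproduced with a short sketch of the Gale-Shapley procedure. The function and variable names below are illustrative, and the order in which free proposers are processed is an implementation choice; the preference lists are the ones from the example.

```python
from collections import deque


def gale_shapley(proposer_prefs, acceptor_prefs):
    """Return {proposer: acceptor} for the proposer-optimal stable matching."""
    # rank[a][p] = position of proposer p in acceptor a's list (lower is better)
    rank = {a: {p: r for r, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next acceptor to try
    engaged_to = {}                               # acceptor -> proposer
    free = deque(sorted(proposer_prefs))
    while free:
        p = free.popleft()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(a)
        if current is None:
            engaged_to[a] = p            # a was free: engage
        elif rank[a][p] < rank[a][current]:
            engaged_to[a] = p            # a prefers p: release current partner
            free.append(current)
        else:
            free.append(p)               # a rejects p: p stays free
    return {p: a for a, p in engaged_to.items()}


men = {1: [2, 4, 3, 1], 2: [1, 4, 3, 2], 3: [4, 3, 2, 1], 4: [1, 2, 4, 3]}
women = {1: [1, 4, 3, 2], 2: [3, 1, 4, 2], 3: [1, 2, 3, 4], 4: [2, 1, 4, 3]}

men_propose = gale_shapley(men, women)
print(sorted(men_propose.items()))
```

Swapping the two arguments, `gale_shapley(women, men)`, runs the same procedure with the women proposing.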
OQ Emulation with a Speedup of 2
• Each input and output maintains a preference list
• Input preference list: list of cells at that input
ordered in the inverse order of their arrival
• Output preference list: list of all input cells to be
forwarded to that output ordered by the times
they would be served in an Output Queueing
schedule
• Use GSA to match inputs to outputs
– Outputs initiate the matching
• Can emulate all work-conserving schedulers
For more info, see the optional reading [C+99] “Matching Output
Queueing with a Combined Input Output Queued Switch.”
Line Cards
• Interfacing
– Physical link
– Switching fabric
• Packet handling
– Packet forwarding (FIB)
– Packet filtering (ACLs)
– Buffer management
– Link scheduling
– Rate-limiting
– Packet marking
– Measurement
[Figure: line card with receive and transmit paths, to/from the link on one side and to/from the switch on the other, sharing a FIB]
Line Card: Abstract view
[Figure: header processing (look up the IP address in the address table to get the next hop, then update the header), after which the packet is queued in buffer memory]
Line Cards: Longest-Prefix Match
Forwarding
• Forwarding Information Base in IP routers
– Maps each IP prefix to next-hop link(s)
• Destination-based forwarding
– Packet has a destination address
– Router identifies longest-matching prefix
– Pushing complexity into forwarding decisions
FIB: 4.0.0.0/8, 4.83.128.0/17, 12.0.0.0/8, 12.34.158.0/24, 126.255.103.0/24
Example: destination 12.34.158.5 matches both 12.0.0.0/8 and 12.34.158.0/24; the longer prefix, 12.34.158.0/24, is chosen, giving outgoing link Serial0/0.1
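A minimal sketch of longest-prefix matching over this FIB, using a plain linear scan with Python's standard `ipaddress` module. It only returns the winning prefix, since the slide gives an outgoing link for one entry only. The scan is for illustration; line cards use dedicated lookup structures rather than a loop like this.

```python
import ipaddress

# Prefixes from the FIB on the slide.
FIB_PREFIXES = [
    "4.0.0.0/8",
    "4.83.128.0/17",
    "12.0.0.0/8",
    "12.34.158.0/24",
    "126.255.103.0/24",
]


def longest_prefix_match(destination, prefixes):
    """Return the longest prefix containing the destination, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for text in prefixes:
        net = ipaddress.ip_network(text)
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return None if best is None else str(best)


print(longest_prefix_match("12.34.158.5", FIB_PREFIXES))
```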
Line Cards: Packet Forwarding
Evolution
• Software on the router CPU
– Central processor makes forwarding decision
– Not scalable to large aggregate throughput
• Route cache on the line card
– Maintain a small FIB cache on each line card
– Store (destination, output link) mappings
– Cache misses handled by the router CPU
• Full FIB on each line card
– Store the entire FIB on each line card
– Apply dedicated hardware for longest-prefix match
Line Cards: Packet Filtering With
Access Control Lists
Should arriving
packet be allowed
in? Departing packet
let out?
• “Five tuple” for access control lists (ACLs)
– Source and destination IP addresses
– TCP/UDP source and destination ports
– Protocol (e.g., UDP vs. TCP)
ACL Examples
• Filter packets based on source address
– Customer access link to the service provider
– Source address should fall in customer prefix
• Filter packets based on port number
– Block traffic for unwanted applications
– Known security vulnerabilities, peer-to-peer, …
• Block pairs of hosts from communicating
– Protect access to special servers
– E.g., block the dorms from the grading server
Line Cards: Mapping Traffic to
Classes
• Gold traffic
– All traffic to/from President’s IP address
– All traffic to/from the port number for DNS
• Silver traffic
– All traffic to/from academic and administrative buildings
• Bronze traffic
– All traffic on the public wireless network
• Then, schedule resources accordingly
– 50% for gold, 30% for silver, and 20% for bronze
Addressing and Look-up
• Flat address
– Ethernet: 48 bit MAC address
– ATM: 28 bit VPI/VCI
– DS-0: timeslot location
– Limited scalability
– High speed lookup
• Hierarchical address
– IP: <network>.<subnet>.<host>
– Telephone: country.area.home
– Scalable
– Easy lookup if boundary is fixed (telephony)
– Difficult lookup if boundary is flexible (longest prefix match for IP)
Lookups Must be Fast
Year   Line       40-byte packets (Mpkt/s)
1997   622 Mb/s   1.94
1999   2.5 Gb/s   7.81
2001   10 Gb/s    31.25
2003   40 Gb/s    125
1. lookup mechanism must be simple, easy to implement
2. memory access time long-term bottleneck
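The packet rates in the table follow from dividing the line rate by the size of a minimum-length (40-byte) packet. The sketch below redoes that arithmetic and also prints the resulting time budget per lookup; the per-packet budget is an added illustration, not a figure from the slide.

```python
def packets_per_second(line_rate_bps, packet_bytes=40):
    """Back-to-back packets per second at a given line rate."""
    return line_rate_bps / (packet_bytes * 8)


LINES = [(1997, 622e6), (1999, 2.5e9), (2001, 10e9), (2003, 40e9)]

for year, rate in LINES:
    pps = packets_per_second(rate)
    # Million packets per second, and nanoseconds available per packet.
    print(year, round(pps / 1e6, 2), round(1e9 / pps, 1))
```

At 40 Gb/s this leaves 8 ns per packet, which is why memory access time matters so much in the discussion that follows.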
Memory Technology (2003-04)
Technology        Single chip density   $/chip ($/MByte)         Access speed   Watts/chip   2012
Networking DRAM   64 MB                 $30-$50 ($0.50-$0.75)    40-80ns        0.5-2W       ~$0.005 per Mbyte
SRAM              4 MB                  $20-$30 ($5-$8)          4-8ns          1-3W         ~$0.02 per Mbyte
TCAM              1 MB                  $200-$250 ($200-$250)    4-8ns          15-30W       ~$0.2 per Mbit
Note: price, speed, power manufacturer and market dependent
Lookup Mechanism is Protocol Dependent
Protocol              Mechanism                     Techniques
MPLS, ATM, Ethernet   Exact match search            Direct lookup; associative lookup; hashing; binary/multi-way search trie/tree
IPv4, IPv6            Longest-prefix match search   Radix trie and variants; compressed trie; binary search on prefix intervals
Exact Matches in Ethernet Switches
• layer-2 addresses usually 48-bits long
• address global, not just local to link
• range/size of address not “negotiable”
• 2^48 > 10^12, therefore cannot hold all addresses in table and use direct lookup
Exact Matches in Ethernet Switches (Associative Lookup)
• associative memory (aka Content Addressable Memory, CAM) compares all entries in parallel against incoming data
[Figure: the 48-bit network address is presented to the associative memory (“CAM”); the location of the match indexes a “normal” memory that returns the port]
Exact Matches in Ethernet Switches: Hashing
[Figure: the 48-bit network address is fed to a hashing function that produces a (say) 16-bit memory address; the entry holds a pointer to a list/bucket of the network addresses that hash to it]
• use pseudo-random hash function (relatively insensitive to actual function)
• bucket linearly searched (or could be binary search, etc.)
• unpredictable number of memory references
Exact Matches Using Hashing
Number of memory references
Expected number of memory references:
$$ER = \frac{1}{2}\left(1 + (\text{Expected length of list} \mid \text{list not empty})\right) = \frac{1}{2}\left[1 + \frac{\alpha}{1 - (1 - 1/N)^M}\right]$$
Where:
ER = Expected number of memory references
M = Number of memory addresses in table
N = Number of linked lists
α = M/N
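A small sketch that evaluates the expected number of memory references for a few table sizes may help build intuition. It assumes the closed form ER = ½[1 + α / (1 − (1 − 1/N)^M)] with α = M/N; the function name and the sample sizes are illustrative.

```python
def expected_memory_references(M, N):
    """ER for M stored addresses hashed into N linked lists (buckets)."""
    alpha = M / N                              # average list length
    p_nonempty = 1.0 - (1.0 - 1.0 / N) ** M    # P(a given list is not empty)
    return 0.5 * (1.0 + alpha / p_nonempty)


for M, N in [(1, 1), (1000, 65536), (65536, 65536), (262144, 65536)]:
    print(M, N, round(expected_memory_references(M, N), 3))
```

With far more buckets than addresses the expected cost approaches one reference per lookup, and it grows roughly linearly in α once the table is heavily loaded.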
Exact Matches in Ethernet Switches: Perfect Hashing
[Figure: 48-bit network address → hashing function → (say) 16-bit address → memory → port]
• A perfect hash function always exists
• Goal: with a perfect hash function, memory lookup always takes O(1) memory references
• Problem:
– finding a perfect hash function is very complex
– updates?
Exact Matches in Ethernet
Switches: Hashing
• advantages:
– simple
– expected lookup time is small
• disadvantages
– inefficient use of memory
– non-deterministic lookup time
=> attractive for software-based switches, but decreasing
use in hardware platforms
IP Address Lookup
• routing tables contain (prefix, next hop) pairs
• address in packet compared to stored prefixes,
starting at left
• prefix that matches largest number of address
bits is desired match
• packet forwarded to specified next hop
Problem: a large router may have 100,000 prefixes in its list
routing table
prefix        next hop
10*           7
01*           5
110*          3
1011*         5
0001*         0
0101 1*       7
0001 0*       1
0011 00*      2
1011 001*     3
1011 010*     5
0100 110*     6
0100 1100*    4
1011 0011*    8
1001 1000*    10
0101 1001*    9
address: 1011 0010 1000
For this address the matching prefixes are 10*, 1011* and 1011 001*; the longest, 1011 001*, gives next hop 3.
Longest Prefix Match Harder than
Exact Match
• destination address of arriving packet does not
carry information to determine length of longest
matching prefix
• need to search space of all prefix lengths; as well
as space of prefixes of given length
LPM in IPv4: exact match
Use 32 exact match algorithms
[Figure: the network address is fed in parallel to 32 exact-match units, one per prefix length (1, 2, ..., 32); a priority encoder picks among the matches and outputs the port]
Address Lookup Using Tries
Prefix   Value    Next hop
P1       111*     H1
P2       10*      H2
P3       1010*    H3
P4       10101    H4
• prefixes “spelled” out by following path from root
• to find best prefix, spell out address in tree
• last green node marks longest matching prefix
• adding prefix easy
Trie node: next-hop-ptr (if prefix), left-ptr, right-ptr
[Figure: binary trie with nodes A to I storing P1–P4; examples shown: lookup 10111, add P5 = 1110*]
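The trie walk described above can be sketched in a few lines. The class and method names are illustrative; each node mirrors the slide's node layout (next-hop pointer plus left and right pointers), and the stored prefixes are P1 to P4 from the table.

```python
class TrieNode:
    def __init__(self):
        self.children = [None, None]  # [left-ptr (bit 0), right-ptr (bit 1)]
        self.next_hop = None          # set only if this node ends a prefix


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        """Spell the prefix bits out from the root, creating nodes as needed."""
        node = self.root
        for bit in prefix:
            b = int(bit)
            if node.children[b] is None:
                node.children[b] = TrieNode()
            node = node.children[b]
        node.next_hop = next_hop

    def lookup(self, address):
        """Walk the address bits and remember the last prefix node seen."""
        node = self.root
        best = node.next_hop
        for bit in address:
            node = node.children[int(bit)]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best


trie = BinaryTrie()
for prefix, hop in [("111", "H1"), ("10", "H2"), ("1010", "H3"), ("10101", "H4")]:
    trie.insert(prefix, hop)

print(trie.lookup("10111"))  # walk stops below 101; last prefix seen is P2
```

Adding P5 = 1110* is a single `trie.insert("1110", "H5")` call, which is the sense in which adding a prefix is easy.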
Binary Tries
• W-bit prefixes: O(W) lookup, O(NW) storage and O(W) update complexity
Advantages
– Simplicity
– Extensible to wider fields
Disadvantages
– Worst case lookup slow
– Wastage of storage space in chains
Leaf Condensing in Binary Trie
Prefixes: P1 = 111* (H1), P2 = 10* (H2), P3 = 1010* (H3), P4 = 10101 (H4)
Trie node: next-hop-ptr (if prefix), left-ptr or next-hop, right-ptr or next-hop
[Figure: binary trie for P1–P4 in which leaf nodes are replaced by next-hop entries held in the parent's pointer fields]
Leaf-pushed Binary Trie
Prefixes: P1 = 111* (H1), P2 = 10* (H2), P3 = 1010* (H3), P4 = 10101 (H4)
Trie node: left-ptr or next-hop, right-ptr or next-hop
[Figure: binary trie for P1–P4 with next hops pushed down to the leaves, so that internal nodes carry no next-hop pointer]
PATRICIA
• PATRICIA (practical algorithm to retrieve coded information in alphanumeric)
– Eliminate internal nodes with only one descendant
– Encode bit position for determining (right) branching
– A match must be performed at the end
– May need to backtrack
• Used in the Unix BSD implementation
Prefixes: P1 = 111* (H1), P2 = 10* (H2), P3 = 1010* (H3), P4 = 10101 (H4)
Node: next-hop-ptr (if prefix), bit-position (skip count), left-ptr or next-hop ptr, right-ptr or next-hop ptr
[Figure: PATRICIA tree for P1–P4 with nodes A, B, C labelled by the bit position they test; example lookups: 10111, 10101, 00101, 100101, 110101, 101101]
PATRICIA
• W-bit prefixes: O(W) lookup, O(N) storage and O(W) update complexity
Advantages
– decreased storage
– extensible to wider fields
Disadvantages
– worst case lookup slow
– backtracking makes implementation complex
See the URL below for a simple and clear illustration of PATRICIA for dictionary indexing and search:
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/PATRICIA/
Path-compressed Trie
Prefixes: P1 = 111* (H1), P2 = 10* (H2), P3 = 1010* (H3), P4 = 10101 (H4)
Path-compressed tree node structure: variable-length bitstring, next-hop (if prefix present), bit-position, left-ptr, right-ptr
[Figure: path-compressed trie with internal nodes A (1, _, 2), B (10, P2, 4), D (1010, P3, 5) and leaves C (P1), E (P4); example lookup: 10111]
Path-compressed Trie
• W-bit prefixes: O(W) lookup, O(N) storage and O(W) update complexity
Advantages
– decreased storage
Disadvantages
– worst case lookup slow
Multi-bit Tries
• Binary trie: depth = W, degree = 2, stride = 1 bit
• Multi-ary trie: depth = W/k, degree = 2^k, stride = k bits
Prefix Expansion with Multi-bit Tries
If stride = k bits, prefix lengths that are not a multiple of k need to be expanded
E.g., k = 2:
Prefix   Expanded prefixes
0*       00*, 01*
11*      11*
Maximum number of expanded prefixes corresponding to one non-expanded prefix = 2^{k-1}
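Prefix expansion as described above can be sketched as padding each prefix up to the next multiple of the stride with every possible bit combination. The function name is illustrative, and prefixes are written as bit strings without the trailing "*".

```python
from itertools import product


def expand_prefix(prefix, k):
    """Expand a bit-string prefix to the next length that is a multiple of k."""
    pad = (-len(prefix)) % k  # number of bits missing to reach a multiple of k
    return [prefix + "".join(bits) for bits in product("01", repeat=pad)]


print(expand_prefix("0", 2))   # ['00', '01']
print(expand_prefix("11", 2))  # ['11']
```

The worst case is a prefix whose length is one more than a multiple of k: it needs k − 1 padding bits, hence 2^{k-1} expanded prefixes.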
4-ary Trie (k=2)
Prefixes: P1 = 111* (H1), P2 = 10* (H2), P3 = 1010* (H3), P4 = 10101 (H4)
A four-ary trie node: next-hop-ptr (if prefix), ptr00, ptr01, ptr10, ptr11
[Figure: 4-ary trie for P1–P4 with nodes A to H; P1 is expanded into P11 and P12, and P4 into P41 and P42; example lookup: 10111]
Prefix Expansion Increases Storage Consumption
• replication of next-hop ptr
• greater number of unused (null) pointers in a node
Time ~ W/k
Storage ~ (NW/k) * 2^{k-1}
Generalization: Different Strides at Each Trie Level
• 16-8-8 split
• 4-10-10-8 split
• 24-8 split
• 21-3-8 split
Choice of Strides: Controlled Prefix Expansion
Given forwarding table and desired number of memory accesses in worst case (i.e., maximum tree depth, D)
A dynamic programming algorithm to compute optimal sequence of strides that minimizes storage requirements: runs in O(W^2 D) time
Advantages
– Optimal storage under these constraints
Disadvantages
– Updates lead to suboptimality anyway
– Hardware implementation difficult
More Fast IP Address Lookup Algorithms
• Binary Search on Levels [Waldvogel et al, SIGCOMM 1997]
– Represent look-up tree as array of hash tables
– Notion of “marker” to guide binary search
– Prefix expansion to reduce size of array (thus memory
accesses)
• Lulea Algorithm [Degermark et al, SIGCOMM 1997]
– Key goal: compactly represent routing table in small memory
(hopefully, within cache size), to minimize memory access
– Use a three-level data structure
• Cut the look-up tree at level 16 and level 24
– Clever ways to design compact data structures to represent
routing look-up info at each level
• Other methods …
Packet Classification
• general router mechanism
– firewalls
– network address translation
– web server load balancing
– special processing for selected flows
• common form: classification based on 5 IP header fields
– source/dest. addr. – either/both specified by prefixes
– protocol field - may be “wild-card”
– source/dest. port #s (TCP/UDP) - may be port ranges
• no ideal design
– exhaustive search - slow links, few filters
– ternary content-addressable memory – exhaustive search
– efficient special cases - exact match, one or two address
prefixes
Packet Classification
         Field 1 (L3-DA)   Field 2 (L3-SA)   …   Field k (L4-PROT)   Action
Rule 1   5.3.40.0/21       2.13.8.11/32      …   UDP                 A1
Rule 2   5.168.3.0/24      152.133.0.0/16    …   TCP                 A2
…        …                 …                 …   …                   …
Rule N   5.168.0.0/16      152.0.0.0/8       …   ANY                 AN
Example: packet (5.168.3.32, 152.133.171.71, …, TCP)
Packet Classification: find action associated with highest priority rule matching incoming packet header
Formal Problem Definition
Given classifier C with N rules, Rj, 1 ≤ j ≤ N, where Rj consists of three entities:
1) a regular expression Rj[i], 1 ≤ i ≤ d, on each of the d header fields,
2) a number, pri(Rj), indicating the priority of the rule in the classifier, and
3) an action, referred to as action(Rj).
For incoming packet P with header considered as d-tuple of points (P1, P2, …, Pd), the d-dimensional packet classification problem is to find rule Rm with highest priority among all rules Rj matching the d-tuple; i.e., pri(Rm) > pri(Rj), ∀ j ≠ m, 1 ≤ j ≤ N, such that Pi matches Rj[i], 1 ≤ i ≤ d. Rule Rm is the best matching rule for packet P.
Routing Lookup: Instance of 1D Classification
• one-dimension (destination address)
• forwarding table ≡ classifier
• routing table entry ≡ rule
• outgoing interface ≡ action
• prefix-length ≡ priority
Example 4D Classifier
Rule   L3-DA                            L3-SA                             L4-DP         L3-PROT   Action
R1     152.163.190.69/255.255.255.255   152.163.80.11/255.255.255.255     *             *         Deny
R2     152.168.3/255.255.255            152.163.200.157/255.255.255.255   eq www        udp       Deny
R3     152.168.3/255.255.255            152.163.200.157/255.255.255.255   range 20-21   udp       Permit
R4     152.168.3/255.255.255            152.163.200.157/255.255.255.255   eq www        tcp       Deny
R5     *                                *                                 *             *         Deny
Example Classification Results
Pkt Hdr   L3-DA            L3-SA             L4-DP   L3-PROT   Rule, Action
P1        152.163.190.69   152.163.80.11     www     tcp       R1, Deny
P2        152.168.3.21     152.163.200.157   www     udp       R2, Deny
Geometric Interpretation
Packet classification problem: find the highest priority rectangle containing an incoming point
[Figure: rules R1–R7 drawn as rectangles in the plane (Dimension 1 vs. Dimension 2), with packets P1 and P2 as points; annotations: e.g. (128.16.46.23, *) and e.g. (144.24/24, 64/16)]
Linear Search
• keep rules in a linked list
• O(N) storage, O(N) lookup time, O(1) update
complexity
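A minimal sketch of linear-search classification follows. The rule set is the small two-field bit-string classifier R1 to R7 used in the trie examples of these slides; treating list order as priority (R1 highest) is an assumption made here for illustration.

```python
# (rule name, destination-address prefix, source-address prefix)
# An empty string stands for the wildcard "*".
RULES = [
    ("R1", "0", "10"),
    ("R2", "0", "01"),
    ("R3", "0", "1"),
    ("R4", "00", "1"),
    ("R5", "00", "11"),
    ("R6", "10", "1"),
    ("R7", "", "00"),
]


def classify(da, sa, rules):
    """Return the first (assumed highest-priority) rule matching both fields."""
    for name, da_prefix, sa_prefix in rules:
        if da.startswith(da_prefix) and sa.startswith(sa_prefix):
            return name
    return None


print(classify("000", "010", RULES))
```

Lookup cost grows with the number of rules scanned, which is the O(N) lookup time stated above; appending a rule to the end of the list is the O(1) update.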
Ternary Match Operation
• Each TCAM entry stores a value, V, and mask, M
• Hence, two bits (Vi and Mi) for each bit position i (i=1..W)
• For an incoming packet header, H = {Hi}, the TCAM entry outputs a match if Hi matches Vi in each bit position for which Mi equals ‘1’.
Vi   Mi   Match in bit position i?
X    0    Yes
0    1    iff (Hi==0)
1    1    iff (Hi==1)
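In software the ternary match rule reduces to a masked comparison. The sketch below models one TCAM entry as a (value, mask) pair of integers and models the priority encoder as "lowest matching index wins"; the 8-bit entries and the function names are made up for illustration.

```python
def ternary_match(header, value, mask):
    """Match iff header equals value in every bit position where mask is 1."""
    return (header & mask) == (value & mask)


def tcam_lookup(header, entries):
    """Return the index of the first matching entry, or None."""
    for index, (value, mask) in enumerate(entries):
        if ternary_match(header, value, mask):
            return index
    return None


# Hypothetical 8-bit entries: 1011xxxx, then 10xxxxxx.
ENTRIES = [
    (0b10110000, 0b11110000),
    (0b10000000, 0b11000000),
]

print(tcam_lookup(0b10110010, ENTRIES))
```

A real TCAM evaluates all entries in parallel in hardware; the loop here only reproduces the result, not the timing.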
Lookups/Classification with Ternary CAM
[Figure: the packet header (e.g. 1.23.11.3, tcp) is compared against all M entries of the TCAM memory array (e.g. entry 1.23.x.x, x); a priority encoder selects among the matching entries and indexes the action memory (RAM) to return the action]
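Lookups/Classification with Ternary CAM: For LPM
[Figure: prefixes are stored in the TCAM memory array ordered by length (P32, P31, ..., P8); for header 1.23.11.3 the entry 1.23.x.x matches, and the priority encoder indexes the action memory (RAM) to return the action of the longest matching prefix]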
Range-to-prefix Blowup
• prefixes easier to handle than ranges
• can transform ranges to prefixes
Range-to-prefix blowup problem
Range-to-prefix Blowup
Maximum memory blowup = factor of (2W-2)^d
Rule   Range    Maximal Prefixes
R1     [3,11]   0011, 01**, 10**
R2     [2,7]    001*, 01**
R3     [4,11]   01**, 10**
R4     [4,7]    01**
R5     [1,14]   0001, 001*, 01**, 10**, 110*, 1110
Luckily, real-life does not see too many arbitrary ranges.
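The table can be reproduced with a short sketch that greedily covers a range with the largest aligned power-of-two blocks. The function name is illustrative; W is the field width in bits.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the integer range [lo, hi] with maximal prefixes of a width-bit field."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = lo & -lo if lo > 0 else 1 << width
        # ... and still fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1   # number of wildcard bits
        fixed = width - wild           # number of fixed leading bits
        head = format(lo >> wild, "0{}b".format(fixed)) if fixed > 0 else ""
        prefixes.append(head + "*" * wild)
        lo += size
    return prefixes


print(range_to_prefixes(1, 14, 4))
```

For W = 4 the range [1,14] yields six prefixes, which is the 2W − 2 worst case for a single field; with d range fields the factors multiply.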
TCAMs
Advantages
– extensible to multiple fields
– fast: 10-16 ns today (66-100 M searches per second), going to 250 Msps
– simple to understand and use
Disadvantages
– inflexible: range-to-prefix blowup
– high power, cost
– low density: largest available in 2003-4 is ~2MB, i.e., 128K x 128 (can be cascaded)
Example Classifier
Rule   Destination Address   Source Address
R1     0*                    10*
R2     0*                    01*
R3     0*                    1*
R4     00*                   1*
R5     00*                   11*
R6     10*                   1*
R7     *                     00*
Hierarchical Tries
[Figure: a trie on dimension DA whose prefix nodes each point to a trie on dimension SA holding the rules with that DA prefix, built from the example classifier R1–R7; example search: (000,010)]
• O(NW) memory
• O(W^2) lookup
Set-pruning Tries
[Figure: DA trie whose nodes point to SA tries, built from the example classifier R1–R7; each SA trie also holds copies of the rules with shorter matching DA prefixes, so no backtracking is needed; example search: (000,010)]
• O(N^2) memory
• O(2W) lookup
Grid-of-Tries
[Figure: DA trie whose nodes point to SA tries, built from the example classifier R1–R7; switch pointers link the SA tries so that a failed search continues in the next trie without rule replication or backtracking; example search: (000,010)]
• O(NW) memory
• O(2W) lookup
Grid-of-Tries
20K 2D rules: 2MB, 9 memory accesses (with prefix-expansion)
Advantages
– good solution for two dimensions
Disadvantages
– updates are difficult to carry out
– not easily extensible to more than two dimensions
Classification Algorithms: Speed vs. Storage Tradeoff
Lower bounds for Point Location in N regions with d dimensions from Computational Geometry
• O(log N) time with O(N^d) storage, or
• O(log^{d-1} N) time with O(N) storage
E.g., N = 100, d = 4: N^d = 100 MBytes and log^{d-1} N = 350 memory accesses
Packet Classification Summary
• Algorithms discussed so far
– good for two fields, doesn’t scale to more than two fields,
OR
– good for very small classifiers (< 50 rules) only, OR
– have non-deterministic classification time, etc.
• Heuristic-Based Algorithms
– Recursive Flow Classification (RFC)
• Exploit structure of classifiers, recursively reduce rule space
– Hierarchical Intelligent Cuttings (HiCuts)
• Use heuristics to reduce d-dim search space into sub-spaces
– Tuple Space Search
• decompose query into a number of exact match queries
• store rules into hash table
Example of Packet Flow in RFC
RFC Example
• Four fields → six chunks
– Source and destination IP addresses → two chunks each
– Protocol number → one chunk
– Destination port number → one chunk
Lookup: What’s Used Out
There?
• overwhelming majority of routers:
– modifications of multi-bit tries (h/w optimized trie
algorithms)
– DRAM (sometimes SRAM) based, large number of routes
(>0.25M)
– parallelism required for speed/storage becomes an issue
• others mostly TCAM based
– for smaller number of routes (256K)
– used more frequently in L2/L3 switches
– power and cost main bottlenecks
Classification: What’s Used
Out There?
• majority of hardware platforms: TCAMs
– High performance, cost, power, deterministic worst-case
• some others: Modifications of RFC
– Low speed, low cost DRAM-based, heuristic
– Works well in software platforms
• some others: nothing/linear search/simulated-parallel-search etc.