csci5211: Computer Networks and Data Communications


Router Design
• Overview of Generic Router Architecture
• Input-Queued Switches (Routers)
• IP Address Look-up Algorithms
• Packet Classification Algorithms
Readings: do required and optional readings if
interested
winter 2008
Router Design
1
Routers in a Network
Sample Routers and Switches
Cisco 12416 Router
up to 160 Gb/s throughput
up to 10 Gb/s ports
Juniper Networks T640 Router
up to 160 Gb/s throughput
up to 10 Gb/s ports
3Com 4950
24-port gigabit Ethernet switch
High Capacity Router
• Cisco CRS-1
– up to 46 Tb/s throughput
• two rack types
• line card rack
– 640 Gb/s throughput
– up to 16 line cards
• up to 40 Gb/s each
– up to 72 racks
• switch rack
– central switch stage
– up to 8 racks
• in-service scaling
Components of a Basic Router
• Input/Output Interfaces (II, OI)
– convert between optical signals and electronic signals
– extract timing from received signals
– encode (decode) data for transmission
• Input Port Processor (IPP)
– synchronize signals
– determine required OI or OIs from routing table
• Output Port Processor (OPP)
– queue outgoing cells
• shared bus interconnects IPPs and OPPs
• Control Processor (CP)
» configures routing tables
» coordinates end-to-end channel setup together with neighboring routers
[Figure: block diagram with II, IPP, OPP (with output queue), OI, routing table, and CP]
Generic Router Architecture
[Figure: line cards 1 to N. On each, a packet (Data, Hdr) goes through header processing (look up IP address in the address table, update header) and is then queued in buffer memory; the packet queues run at N times line rate.]
Switch Fabric: Three Design Approaches
Switch Fabric: First Generation Routers
• Traditional computers with switching under direct
control of the CPU
• Packet copied to the system’s memory
• Speed limited by the memory bandwidth (two bus
crossings per packet)
[Figure: input port, memory, and output port connected by the system bus]
Shared Memory (1st Generation)
[Figure: CPU, route table, and buffer memory on a shared backplane with three line interfaces (MAC)]
Typically < 0.5 Gb/s aggregate capacity
Limited by rate of shared memory
Switch Fabric: Switching Via a Bus
• Packet from input port
memory to output port
memory via a shared bus
• Bus contention: switching speed
limited by bus bandwidth
• 1 Gbps bus, Cisco 1900: sufficient speed
for access and enterprise
routers (not regional or backbone)
Shared Bus (2nd Generation)
Typically < 5 Gb/s aggregate capacity; limited by shared bus
[Figure: CPU with route table and buffer memory, and three line cards, each with buffer memory, forwarding cache, and MAC]
Switch Fabric: Interconnection Network
• Banyan networks, other interconnection nets
initially created for multiprocessors
• Advanced design: fragmenting packet into fixed
length cells to send through the fabric
• Cisco 12000: switches Gbps through the
interconnection network
Point-to-Point Switch (3rd Generation)
[Figure: switched backplane connecting a CPU card (routing table) and line cards, each with local buffer memory, forwarding table, and MAC]
Typically < 50 Gb/s aggregate capacity
Buffer Placement: Output Port Queuing
• Buffering when the aggregate arrival rate exceeds the
output line speed
• Memory must operate at very high speed
Simple model of output queued switch
[Figure: links 1 to 4, each with an ingress and an egress side at link rate R; queues sit at the egress side]
Characteristics of an output
queued (OQ) switch
• arriving packets immediately written into output queue,
without intermediate buffering
• flow of packets to one output does not affect flow to
another output
• OQ switch is work conserving: output line always busy when
there is a packet in switch for it
• OQ switch has highest throughput, lowest average delay
Switching Speed-up Needed
[Figure: the generic router architecture again (header processing, address table, packet queue, buffer memory on each of N line cards), highlighting that the packet queues must run at N times line rate]
Buffer Placement: Input Port Queuing
• Fabric slower than input ports combined
– So, queuing may occur at input queues
• Head-of-the-Line (HOL) blocking
– Queued packet at the front of the queue prevents
others in queue from moving forward
Simple model of input queued switch
[Figure: links 1 to 4, each with an ingress and an egress side at link rate R; queues sit at the ingress side, and the queue on link 1 holds packets for links 2, 1, 3, and 4]
Head-of-line Blocking
• Packet at the head of an input queue cannot be
transferred, thus blocking the following
packets (or cells – packets of fixed size)
[Figure: three inputs and three outputs. One head-of-line packet cannot be transferred because the output buffer is full; a packet behind it cannot be transferred because it is blocked by the red packet.]
Characteristics of an input
queued (IQ) switch
• arriving packets written into input queue
• only one packet can be sent to output link at a
time
• head-of-line blocking
• IQ switch cannot keep output links fully utilized
Buffer Placement: Design Trade-offs
• Output queues
– Pro: work-conserving, so maximizes throughput
– Con: memory must operate at speed N*R
• Input queues
– Pro: memory can operate at speed R
– Con: head-of-line blocking for access to output
• Work-conserving: output line is always busy when
there is a packet in the switch for it
• Head-of-line blocking: head packet in a FIFO
cannot be transmitted, forcing others to wait
What is capacity of IQ: Model
[optional: Karol et al Globecom’86]
• Large input-queued switch with
– single FIFO at each input
– packet destinations i.i.d. (independent, identically distributed), uniform across outputs
– HoL blocked packets not flushed
• throughput analysis
– saturated switch (i.e., always an arrival at each input queue)
– balls/urns model: N balls, N urns
– focus on first urn
– $X_t$: number of balls in urn at time t
– $D_t$: number of balls removed from all urns at end of time t
– $D_t$ is switch throughput
Model (cont’d)
• $A_{t+1}$: no. of balls dropped into urn 1 at t+1
• $X_{t+1} = (X_t - 1)^+ + A_{t+1}$
• where $P(A_{t+1} = k) = \binom{D_t}{k} (1/N)^k (1 - 1/N)^{D_t - k}$
• $E(D_t) = \rho N$ where ρ is output throughput
• for large N, binomial distribution can be approximated by Poisson distribution, $P(A_t = k) = \frac{\rho^k}{k!} e^{-\rho}$
Model (cont’d)
$EX = \frac{E(A^2) + EA - 2(EA)^2}{2(1 - EA)}$
where $EA = \rho$, $E(A^2) = \rho + \rho^2$
therefore
$EX = \frac{2\rho - \rho^2}{2(1 - \rho)}$
EX = 1, therefore
$\frac{2\rho - \rho^2}{2(1 - \rho)} = 1$
and $\rho = 2 - \sqrt{2} \approx 58.6\%$
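The 58.6% limit can be checked numerically. The sketch below is a minimal Monte Carlo simulation under the model's assumptions (saturated inputs, one FIFO per input, uniform i.i.d. destinations, blocked cells stay at the head of the line). The function name `hol_throughput` and the port count, slot count, and seed are illustrative choices, not from the slides.

```python
import math
import random


def hol_throughput(n_ports=32, n_slots=4000, seed=1):
    """Fraction of output capacity used by a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Destination of the head-of-line (HOL) cell at each input; queues never empty.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group inputs by the output that their HOL cell requests.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each requested output serves exactly one contender, chosen at random.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # next cell moves to the head
            delivered += 1
    return delivered / (n_slots * n_ports)


if __name__ == "__main__":
    print("simulated throughput:", round(hol_throughput(), 3))
    print("analytic limit 2 - sqrt(2):", round(2 - math.sqrt(2), 3))
```

For a finite number of ports the simulated value sits slightly above the large-N limit, so a small gap between the two printed numbers is expected.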
A Router with Input Queues
Head of Line Blocking
[Figure: delay versus load (0% to 100%). One curve is labeled "The best that any queueing system can achieve"; the load $2 - \sqrt{2} \approx 58\%$ is marked on the load axis.]
Solution to Avoid Head-of-line Blocking
• How to improve capacity without increasing switching fabric
speed ?
• Maintain at each input N virtual queues, i.e., one per output
– use non-FIFO scheduler, matching input/output
[Figure: inputs 1 to 3, each holding virtual queues, connected to outputs 1 to 3]
Virtual Output Queueing
• assume fixed length packets
• each input manages separate queue per output
• at each time, matching scheduler finds best possible packets from inputs to send to outputs
• maximum-weight matching
[Figure: inputs 1 to N, each with per-output queues, connected to outputs 1 to N under control of a matching scheduler]
Matching
• $L_{ij}(t)$: no. of packets at input i for output j at t
• bipartite graph $(V_1 \cup V_2, E)$, $E \subseteq V_1 \times V_2$
– $V_1, V_2$: inputs, outputs
– $(i,j) \in E$ iff $L_{ij}(t) > 0$
• matching: subset of E such that no two edges are adjacent
[Figure: bipartite graph with inputs on one side and outputs on the other]
Matching problems
• maximum size matching
– matching with largest number of edges
– when traffic uniform, provides 100% utilization
– network flow problem, $O(N^{5/2})$
• maximum weight matching
– add weight $w_{ij}$ to edge from i to j
– matching with highest weight
– when $w_{ij} = L_{ij}(t)$ provides 100% utilization
– equivalent to a network flow problem, $O(N^3)$
– MWM algorithms involve backtracking: i.e. edges laid down in one iteration may be removed later
⇒ algorithm not amenable to pipelining
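The three kinds of matching can give different answers on the same queue state. The sketch below is illustrative only: `L[i][j]` plays the role of $L_{ij}(t)$, the function names are made up for this example, the maximum size matching uses simple augmenting paths, and the maximum weight matching is a brute-force search over permutations (fine for tiny N, not the $O(N^3)$ algorithm mentioned above).

```python
from itertools import permutations


def maximal_matching(L):
    """Greedy: take any edge whose input and output are both still free."""
    used_out, match = set(), {}
    for i, row in enumerate(L):
        for j, backlog in enumerate(row):
            if backlog > 0 and j not in used_out:
                match[i] = j
                used_out.add(j)
                break
    return match


def maximum_size_matching(L):
    """Augmenting-path bipartite matching (largest number of edges)."""
    n = len(L)
    out_to_in = {}

    def try_assign(i, seen):
        for j in range(n):
            if L[i][j] > 0 and j not in seen:
                seen.add(j)
                # Output j is free, or its current input can be moved elsewhere.
                if j not in out_to_in or try_assign(out_to_in[j], seen):
                    out_to_in[j] = i
                    return True
        return False

    for i in range(n):
        try_assign(i, set())
    return {i: j for j, i in out_to_in.items()}


def maximum_weight_matching(L):
    """Brute force over all permutations, with weights w_ij = L_ij."""
    n = len(L)
    best = max(permutations(range(n)),
               key=lambda p: sum(L[i][p[i]] for i in range(n)))
    return {i: j for i, j in enumerate(best) if L[i][j] > 0}


if __name__ == "__main__":
    L = [[9, 1],   # input 0: 9 cells for output 0, 1 cell for output 1
         [2, 0]]   # input 1: 2 cells for output 0
    print(maximal_matching(L))         # one edge
    print(maximum_size_matching(L))    # two edges, total weight 3
    print(maximum_weight_matching(L))  # one edge, weight 9
```

On this input the maximum size matching serves two short queues while the maximum weight matching serves the single longest queue, which is the distinction the slide draws between the two.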
Scheduling Algorithms
[Figure: example bipartite graph with edge weights 19, 3, 4, 1, 21, 18, 7 and the edges selected by each kind of matching]
Practical Maximal Matchings ⇒ Not stable
Max Size Matching ⇒ Not stable
Max Wt Matching ⇒ Stable
Switch Algorithms
Maximal matching: not stable
Max Size Matching: not stable
Max Wt Matching: stable, low backlogs
[Figure: the three matchings on the weighted example graph, placed along two arrows labeled "Better performance" and "Easier to implement"]
Better Matching Algorithms
• Need simple algorithms that perform well
– efficient packet processing at line speeds
– high throughput
– low latencies/backlogs
• Randomized algorithms with linear complexity available
– Tassiulas’ Randomized Algorithm
– LAURA
– SERENA
Use randomization, history, problem structure and arrival information
For more details, see “Efficient Randomized Algorithms for Input-Queued Switch Scheduling” by Shah, Giaccone and Prabhakar, IEEE Micro Vol 22, Issue 1, Jan 2002
Combined Input-Output Queued
(CIOQ) Routers
• Both input and output interfaces store packets
• Advantages
– Easy to build
• Utilization 1 can be achieved with limited input/output speedup (<= 2)
• Disadvantages
– Harder to design algorithms
• Two congestion points
• Need to design flow control
[Figure: input interface, backplane, and output interface]
Output Queue Emulation
using CIOQ (with Speed-up)
Stable Marriage Problem
-- Gale-Shapley Algorithm (GSA)
• As long as there is a free man m
– m proposes to the highest-ranked woman w in his list he hasn’t proposed to yet
– If w is free, m and w are engaged
– If w is engaged to m’ and w prefers m to m’, w releases m’ and becomes engaged to m
• Otherwise m remains free
• A stable matching exists for every set of preference lists
• Complexity: worst-case $O(N^2)$
Stable Marriage Problem
• Consider N women and N men
• Each woman/man ranks each man/woman in the
order of their preferences
• Stable matching, a matching with no blocking pairs
• Blocking pair; let p(i) denote the pair of i
– There are matched pairs (k, p(k)) and (j, p(j)) such that k
prefers p(j) to p(k), and p(j) prefers k to j
Example
men | pref. list
1 | 2 4 3 1
2 | 1 4 3 2
3 | 4 3 2 1
4 | 1 2 4 3
women | pref. list
1 | 1 4 3 2
2 | 3 1 4 2
3 | 1 2 3 4
4 | 2 1 4 3
• If men propose to women, the stable matching is
– 1st round: (1,2), (2,1), (3,4), (4,1) -> w1 releases m2
– 2nd round: (2,4) -> w4 releases m3
– 3rd round: (3,3)
– final match: (1,2), (2,4), (3,3), (4,1)
• What is the stable matching if women propose to men?
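The example can be replayed in code. The following is a minimal sketch of the proposing side of the Gale-Shapley algorithm using the preference lists from the example; the function name `gale_shapley` and the dictionary layout are assumptions made for illustration. It processes one free proposer at a time rather than in rounds, which yields the same proposer-optimal result.

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    """Return {proposer: acceptor} for the proposer-optimal stable matching."""
    # rank[a][p] = position of proposer p in acceptor a's list (lower is better).
    rank = {a: {p: r for r, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next acceptor to try
    engaged_to = {}                               # acceptor -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(a)
        if current is None:
            engaged_to[a] = p                     # a was free
        elif rank[a][p] < rank[a][current]:
            engaged_to[a] = p                     # a prefers p and releases current
            free.append(current)
        else:
            free.append(p)                        # a rejects p; p stays free
    return {p: a for a, p in engaged_to.items()}


MEN = {1: [2, 4, 3, 1], 2: [1, 4, 3, 2], 3: [4, 3, 2, 1], 4: [1, 2, 4, 3]}
WOMEN = {1: [1, 4, 3, 2], 2: [3, 1, 4, 2], 3: [1, 2, 3, 4], 4: [2, 1, 4, 3]}

if __name__ == "__main__":
    print(sorted(gale_shapley(MEN, WOMEN).items()))
```

Calling `gale_shapley(WOMEN, MEN)` swaps the roles and returns a woman-to-man mapping, which is one way to check an answer to the question above.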
OQ Emulation with a Speedup of 2
• Each input and output maintains a preference list
• Input preference list: list of cells at that input
ordered in the inverse order of their arrival
• Output preference list: list of all input cells to be
forwarded to that output ordered by the times
they would be served in an Output Queueing
schedule
• Use GSA to match inputs to outputs
– Outputs initiate the matching
• Can emulate all work-conserving schedulers
Line Cards
• Interfacing
– Physical link
– Switching fabric
• Packet handling
– Packet forwarding (FIB)
– Packet filtering (ACLs)
– Buffer management
– Link scheduling
– Rate-limiting
– Packet marking
– Measurement
[Figure: line card with receive and transmit paths between the link side (to/from link) and the switch side (to/from switch), with the FIB]
Line Card: Abstract view
[Figure: a packet (Data, Hdr) enters header processing, where the IP address is looked up in the address table to obtain the next hop and the header is updated; the packet is then queued in buffer memory]
Line Cards: Longest-Prefix Match
Forwarding
• Forwarding Information Base in IP routers
– Maps each IP prefix to next-hop link(s)
• Destination-based forwarding
– Packet has a destination address
– Router identifies longest-matching prefix
– Pushing complexity into forwarding decisions
FIB prefixes:
4.0.0.0/8
4.83.128.0/17
12.0.0.0/8
12.34.158.0/24
126.255.103.0/24
[Figure: a packet with destination 12.34.158.5 is looked up in the FIB and sent on outgoing link Serial0/0.1]
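As a rough illustration of destination-based longest-prefix matching, the sketch below scans the FIB from the slide with Python's standard `ipaddress` module. Only the link `Serial0/0.1` appears in the slide, and it is assumed here to belong to 12.34.158.0/24; the other link names are hypothetical placeholders, and a linear scan is used only for clarity (real line cards use the lookup structures discussed later).

```python
import ipaddress

# Prefix -> outgoing link. Names other than Serial0/0.1 are hypothetical.
FIB = {
    "4.0.0.0/8": "link-a",
    "4.83.128.0/17": "link-b",
    "12.0.0.0/8": "link-c",
    "12.34.158.0/24": "Serial0/0.1",
    "126.255.103.0/24": "link-d",
}


def lookup(destination):
    """Return (longest matching prefix, outgoing link), or None if no match."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, link in FIB.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, link)
    return None if best is None else (str(best[0]), best[1])


if __name__ == "__main__":
    print(lookup("12.34.158.5"))  # matches both 12.0.0.0/8 and 12.34.158.0/24
```

The destination 12.34.158.5 matches two prefixes; the /24 wins because it is longer.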
Line Cards: Packet Forwarding
Evolution
• Software on the router CPU
– Central processor makes forwarding decision
– Not scalable to large aggregate throughput
• Route cache on the line card
– Maintain a small FIB cache on each line card
– Store (destination, output link) mappings
– Cache misses handled by the router CPU
• Full FIB on each line card
– Store the entire FIB on each line card
– Apply dedicated hardware for longest-prefix match
Line Cards: Packet Filtering With
Access Control Lists
Should arriving packet be allowed in? Departing packet let out?
• “Five tuple” for access control lists (ACLs)
– Source and destination IP addresses
– TCP/UDP source and destination ports
– Protocol (e.g., UDP vs. TCP)
ACL Examples
• Filter packets based on source address
– Customer access link to the service provider
– Source address should fall in customer prefix
• Filter packets based on port number
– Block traffic for unwanted applications
– Known security vulnerabilities, peer-to-peer, …
• Block pairs of hosts from communicating
– Protect access to special servers
– E.g., block the dorms from the grading server
Line Cards: Mapping Traffic to
Classes
• Gold traffic
– All traffic to/from President’s IP address
– All traffic to/from the port number for DNS
• Silver traffic
– All traffic to/from academic and administrative buildings
• Bronze traffic
– All traffic on the public wireless network
• Then, schedule resources accordingly
– 50% for gold, 30% for silver, and 20% for bronze
Addressing and Look-up
• Flat address
– Ethernet: 48 bit MAC address
– ATM: 28 bit VPI/VCI
– DS-0: timeslot location
• Limited scalability
• High speed lookup
• Hierarchical address
– IP <network>.<subnet>.<host>
– Telephone: country.area.home
• Scalable
• Easy lookup if boundary is fixed
– telephony
• Difficult lookup if boundary is flexible
– longest prefix match for IP
Lookups Must be Fast
Year | Line | 40-byte packets (Mpkt/s)
1997 | 622 Mb/s | 1.94
1999 | 2.5 Gb/s | 7.81
2001 | 10 Gb/s | 31.25
2003 | 40 Gb/s | 125
1. lookup mechanism must be simple, easy to implement
2. memory access time is the long-term bottleneck
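The packet rates in the table follow directly from dividing the line rate by the size of a minimum-length packet (40 bytes = 320 bits). A small sketch, with illustrative function names:

```python
LINE_RATES_BPS = {1997: 622e6, 1999: 2.5e9, 2001: 10e9, 2003: 40e9}


def mpkts_per_second(line_rate_bps, packet_bytes=40):
    """Million packets per second when every packet is packet_bytes long."""
    return line_rate_bps / (packet_bytes * 8) / 1e6


def lookup_budget_ns(line_rate_bps, packet_bytes=40):
    """Time available per packet (and so per lookup) in nanoseconds."""
    return packet_bytes * 8 / line_rate_bps * 1e9


if __name__ == "__main__":
    for year, rate in LINE_RATES_BPS.items():
        print(year, round(mpkts_per_second(rate), 2), "Mpkt/s,",
              round(lookup_budget_ns(rate), 1), "ns per packet")
```

At 40 Gb/s the budget is 8 ns per packet, which is why memory access time is called out as the bottleneck.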
Memory Technology (2003-04)
Technology | Single chip density | $/chip ($/MByte) | Access speed | Watts/chip
Networking DRAM | 64 MB | $30-$50 ($0.50-$0.75) | 40-80 ns | 0.5-2 W
SRAM | 4 MB | $20-$30 ($5-$8) | 4-8 ns | 1-3 W
TCAM | 1 MB | $200-$250 ($200-$250) | 4-8 ns | 15-30 W
Note: price, speed, and power are manufacturer and market dependent
Lookup Mechanism is Protocol
Dependent
Protocol | Mechanism | Techniques
MPLS, ATM, Ethernet | Exact match search | Direct lookup; Associative lookup; Hashing; Binary/Multi-way Search Trie/Tree
IPv4, IPv6 | Longest-prefix match search | Radix trie and variants; Compressed trie; Binary search on prefix intervals
Exact Matches in Ethernet
Switches
• layer-2 addresses usually 48 bits long
• address global, not just local to link
• range/size of address not “negotiable”
• $2^{48} > 10^{12}$, therefore cannot hold all addresses in table and use direct lookup
Exact Matches in Ethernet
Switches (Associative Lookup)
• associative memory (aka Content Addressable Memory, CAM) compares all entries in parallel against incoming data
[Figure: the 48-bit network address is presented to the associative memory (“CAM”); the matching location indexes a “normal” memory that returns the port]
Exact Matches in Ethernet Switches
Hashing
[Figure: the 48-bit network address goes through a hashing function to give a 16-bit (say) memory address; the memory entry holds a pointer to a list/bucket of the network addresses in this bucket, with their data]
• use pseudo-random hash function (relatively insensitive to actual function)
• bucket linearly searched (or could be binary search, etc.)
• unpredictable number of memory references
Exact Matches Using Hashing
Number of memory references
Expected number of memory references:
$ER = \frac{1}{2}\left(1 + (\text{Expected length of list} \mid \text{list not empty})\right) = \frac{1}{2}\left(1 + \frac{\alpha}{1 - (1 - 1/N)^M}\right)$
Where:
ER = Expected number of memory references
M = Number of memory addresses in table
N = Number of linked lists
α = M/N
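A sketch of hashing with chained buckets, counting memory references per lookup. It is illustrative only: the class name, the use of `mac % n_buckets` as a stand-in hash function, and the port values are assumptions (a real switch would use a pseudo-random hash). `expected_refs` evaluates $\frac{1}{2}(1 + \alpha / (1 - (1 - 1/N)^M))$, where $\alpha / (1 - (1 - 1/N)^M)$ is the expected length of a non-empty list and a successful linear search of a list of length L takes (L + 1)/2 references on average.

```python
class HashedAddressTable:
    """Exact-match table: hash to a bucket, then search the bucket linearly."""

    def __init__(self, n_buckets):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, mac):
        # Placeholder hash; chosen so that the example below is deterministic.
        return mac % len(self.buckets)

    def learn(self, mac, port):
        bucket = self.buckets[self._index(mac)]
        for k, (addr, _) in enumerate(bucket):
            if addr == mac:
                bucket[k] = (mac, port)   # refresh an existing entry
                return
        bucket.append((mac, port))

    def lookup(self, mac):
        """Return (port or None, number of bucket entries examined)."""
        refs = 0
        for addr, port in self.buckets[self._index(mac)]:
            refs += 1
            if addr == mac:
                return port, refs
        return None, refs


def expected_refs(M, N):
    """Expected references for M addresses spread over N lists."""
    alpha = M / N
    return 0.5 * (1 + alpha / (1 - (1 - 1 / N) ** M))


if __name__ == "__main__":
    table = HashedAddressTable(n_buckets=4)
    table.learn(0x1, port=1)
    table.learn(0x5, port=2)       # collides with 0x1 in bucket 1
    print(table.lookup(0x5))       # found after two references
    print(expected_refs(M=100000, N=65536))
```

The second lookup shows why the number of memory references is unpredictable: it depends on where the address sits in its bucket.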
Exact Matches in Ethernet Switches
Perfect Hashing
[Figure: the 48-bit network address goes through a hashing function to give a 16-bit (say) memory address; the memory returns the port directly]
There always exists a perfect hash function
Goal: with a perfect hash function, memory lookup always takes O(1) memory references
Problem:
- finding perfect hash function very complex
- updates?
Exact Matches in Ethernet
Switches: Hashing
• advantages:
– simple
– expected lookup time is small
• disadvantages
– inefficient use of memory
– non-deterministic lookup time
⇒ attractive for software-based switches, but decreasing use in hardware platforms
IP Address Lookup
• routing tables contain (prefix, next hop) pairs
• address in packet compared to stored prefixes,
starting at left
• prefix that matches largest number of address
bits is desired match
• packet forwarded to specified next hop
Problem: large router may have 100,000 prefixes in its list
routing table:
prefix | next hop
10* | 7
01* | 5
110* | 3
1011* | 5
0001* | 0
0101 1* | 7
0001 0* | 1
0011 00* | 2
1011 001* | 3
1011 010* | 5
0100 110* | 6
0100 1100* | 4
1011 0011* | 8
1001 1000* | 10
0101 1001* | 9
address: 1011 0010 1000
Longest Prefix Match Harder than
Exact Match
• destination address of arriving packet does not
carry information to determine length of longest
matching prefix
• need to search space of all prefix lengths; as well
as space of prefixes of given length
LPM in IPv4: exact match
Use 32 exact match algorithms
[Figure: the network address is fed to 32 exact-match units, one against prefixes of each length 1, 2, ..., 32; a priority encoder picks among the matches and outputs the port]
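The idea of one exact-match table per prefix length can be sketched in software with one dictionary per length, probed from longest to shortest (hardware would probe all lengths in parallel and let the priority encoder pick the longest). The sketch below uses the bit-string routing table and the address 1011 0010 1000 from the IP Address Lookup slide; the function names are illustrative.

```python
from collections import defaultdict

# Prefix (as a bit string) -> next hop, from the IP Address Lookup slide.
ROUTES = {
    "10": 7, "01": 5, "110": 3, "1011": 5, "0001": 0,
    "01011": 7, "00010": 1, "001100": 2, "1011001": 3, "1011010": 5,
    "0100110": 6, "01001100": 4, "10110011": 8, "10011000": 10,
    "01011001": 9,
}


def build_tables(routes):
    """Group prefixes into one exact-match table per prefix length."""
    by_length = defaultdict(dict)
    for prefix, hop in routes.items():
        by_length[len(prefix)][prefix] = hop
    return by_length


def lookup(by_length, addr_bits):
    """Return (longest matching prefix, next hop), or None if nothing matches."""
    for length in sorted(by_length, reverse=True):
        hop = by_length[length].get(addr_bits[:length])
        if hop is not None:          # next hop 0 is a valid result
            return addr_bits[:length], hop
    return None


if __name__ == "__main__":
    tables = build_tables(ROUTES)
    print(lookup(tables, "101100101000"))
```

The address matches 10*, 1011*, and 1011 001*; the 7-bit prefix is the longest, so next hop 3 is returned.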
Address Lookup Using Tries
Prefix | Value | Next hop
P1 | 111* | H1
P2 | 10* | H2
P3 | 1010* | H3
P4 | 10101 | H4
• prefixes “spelled” out by following path from root
• to find best prefix, spell out address in tree
• last green node marks longest matching prefix
• adding prefix easy
Trie node: next-hop-ptr (if prefix), left-ptr, right-ptr
[Figure: binary trie with nodes A to I; nodes holding P1, P2, P3, P4 are marked; examples: Lookup 10111, add P5=1110*]
Binary Tries
• W-bit prefixes: O(W) lookup, O(NW) storage and
O(W) update complexity
Advantages
– Simplicity
– Extensible to wider fields
Disadvantages
– Worst case lookup slow
– Wastage of storage space in chains
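A minimal binary trie sketch using the prefixes P1 to P4 from the trie slide. Each node has a left (0) and right (1) child and an optional next hop, mirroring the trie node layout described there; the class and method names are illustrative.

```python
class TrieNode:
    __slots__ = ("child", "next_hop")

    def __init__(self):
        self.child = [None, None]   # child[0] = left-ptr, child[1] = right-ptr
        self.next_hop = None        # set only if this node ends a prefix


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        """Spell out the prefix bit by bit, creating nodes as needed."""
        node = self.root
        for bit in prefix:
            b = int(bit)
            if node.child[b] is None:
                node.child[b] = TrieNode()
            node = node.child[b]
        node.next_hop = next_hop

    def lookup(self, addr_bits):
        """Walk the address; remember the last prefix node seen on the way."""
        node = self.root
        best = node.next_hop
        for bit in addr_bits:
            node = node.child[int(bit)]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best


if __name__ == "__main__":
    trie = BinaryTrie()
    for prefix, hop in [("111", "H1"), ("10", "H2"), ("1010", "H3"), ("10101", "H4")]:
        trie.insert(prefix, hop)
    print(trie.lookup("10111"))   # walk stops after 101; last prefix seen is P2
    trie.insert("1110", "H5")     # adding P5 = 1110* only extends one path
    print(trie.lookup("11100"))
```

Lookup visits at most W nodes for a W-bit address, which is the O(W) bound quoted above.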
Leaf-pushed Binary Trie
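Prefix | Value | Next hop
P1 | 111* | H1
P2 | 10* | H2
P3 | 1010* | H3
P4 | 10101 | H4
Trie node: left-ptr or next-hop, right-ptr or next-hop
[Figure: leaf-pushed binary trie with internal nodes A to G; next hops appear only at the leaves, labeled P1, P2, P3, P4, with P2 pushed down to more than one leaf]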
PATRICIA
• PATRICIA (practical algorithm to retrieve coded information in alphanumeric)
– Eliminate internal nodes with only one descendant
– Encode bit position for determining (right) branching
Patricia tree internal node: bit-position, left-ptr, right-ptr
Prefix | Value | Next hop
P1 | 111* | H1
P2 | 10* | H2
P3 | 1010* | H3
P4 | 10101 | H4
[Figure: Patricia tree whose internal nodes carry bit positions 2, 3, and 5, with leaves P1, P2, P3, P4; example: Lookup 10111, Bitpos 12345]
PATRICIA
• W-bit prefixes: $O(W^2)$ lookup, O(N) storage and O(W) update complexity
Advantages
– decreased storage
– extensible to wider fields
Disadvantages
– worst case lookup slow
– backtracking makes implementation complex
Path-compressed Tree
1, , 2
0
P1
111*
H1
P2
10*
H2
10,P2,4
P3
1010*
H3
P4
10101
H4
1
1010,P3,5
A
1
B
P1
C
D
1
E
Path-compressed tree node structure
P4
Lookup 10111
variable-length next-hop (if
prefix present)
bitstring
left-ptr
winter 2008
Router Design
bit-position
right-ptr
64
Path-compressed Tree
• W-bit prefixes: O(W) lookup, O(N)
storage and O(W) update complexity
Advantages
– decreased storage
Disadvantages
– worst case lookup slow
Multi-bit Tries
Binary trie: Depth = W, Degree = 2, Stride = 1 bit
Multi-ary trie: Depth = W/k, Degree = $2^k$, Stride = k bits
Prefix Expansion with Multi-bit Tries
If stride = k bits, prefix lengths that are not a multiple of k need to be expanded
E.g., k = 2:
Prefix | Expanded prefixes
0* | 00*, 01*
11* | 11*
Maximum number of expanded prefixes corresponding to one non-expanded prefix = $2^{k-1}$
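Prefix expansion itself is a short computation: pad the prefix with every bit combination needed to reach the next multiple of the stride. A sketch, with an illustrative function name and prefixes written as bit strings without the trailing `*`:

```python
def expand_prefix(prefix, k):
    """Expand a bit-string prefix to the next length that is a multiple of k."""
    pad = (-len(prefix)) % k        # bits missing to reach a multiple of k
    if pad == 0:
        return [prefix]             # already aligned, nothing to expand
    return [prefix + format(i, "b").zfill(pad) for i in range(2 ** pad)]


if __name__ == "__main__":
    print(expand_prefix("0", 2))    # 0*  -> 00*, 01*
    print(expand_prefix("11", 2))   # 11* -> 11*
    print(len(expand_prefix("1", 3)))
```

The worst case is a prefix one bit longer than a multiple of k, which needs k-1 padding bits and so expands into $2^{k-1}$ prefixes.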
4-ary Trie (k=2)
A four-ary trie node: next-hop-ptr (if prefix), ptr00, ptr01, ptr10, ptr11
Prefix | Value | Next hop
P1 | 111* | H1
P2 | 10* | H2
P3 | 1010* | H3
P4 | 10101 | H4
[Figure: 4-ary trie with nodes A to H; P2 and P3 are stored as given, and the figure shows expanded entries P11, P12 and P41, P42; example: Lookup 10111]
Prefix Expansion Increases
Storage Consumption
• replication of next-hop ptr
• greater number of unused (null) pointers in a node
Time ~ W/k
Storage ~ $NW/k \cdot 2^{k-1}$
Generalization: Different
Strides at Each Trie Level
• 16-8-8 split
• 4-10-10-8 split
• 24-8 split
• 21-3-8 split
Choice of Strides: Controlled
Prefix Expansion
Given forwarding table and desired number of memory accesses in worst case (i.e., maximum tree depth, D)
A dynamic programming algorithm computes the optimal sequence of strides that minimizes storage requirements: runs in $O(W^2 D)$ time
Advantages
– Optimal storage under these constraints
Disadvantages
– Updates lead to suboptimality anyway
– Hardware implementation difficult
Fast IP Address Lookup Algorithms
• Lulea’s Algorithm (SIGCOMM 1997)
– Key goal: compactly represent routing table in small
memory (hopefully, within cache size), to minimize
memory access
– Use a three-level data structure
• Cut the look-up tree at level 16 and level 24
– Clever ways to design compact data structures to
represent routing look-up info at each level
• Binary Search on Levels (SIGCOMM 1997)
– Represent look-up tree as array of hash tables
– Notion of “marker” to guide binary search
– Prefix expansion to reduce size of array (thus memory
accesses)
Packet Classification
• general router mechanism
– firewalls
– network address translation
– web server load balancing
– special processing for selected flows
• common form is based on 5 IP header fields
– source/dest. addr. – either/both specified by prefixes
– protocol field - may be “wild-card”
– source/dest. port #s (TCP/UDP) - may be port ranges
• no ideal design
– exhaustive search - slow links, few filters
– ternary content-addressable memory – exhaustive search
– efficient special cases - exact match, one or two address
prefixes
Packet Classification
Rule | Field 1 (L3-DA) | Field 2 (L3-SA) | … | Field k (L4-PROT) | Action
Rule 1 | 5.3.40.0/21 | 2.13.8.11/32 | … | UDP | A1
Rule 2 | 5.168.3.0/24 | 152.133.0.0/16 | … | TCP | A2
… | … | … | … | … | …
Rule N | 5.168.0.0/16 | 152.0.0.0/8 | … | ANY | AN
Example: packet (5.168.3.32, 152.133.171.71, …, TCP)
Packet Classification: find action associated with highest priority rule matching incoming packet header
Formal Problem Definition
Given classifier C with N rules, $R_j$, $1 \le j \le N$, where $R_j$ consists of three entities:
1) a regular expression $R_j[i]$, $1 \le i \le d$, on each of the d header fields,
2) a number, $pri(R_j)$, indicating the priority of the rule in the classifier, and
3) an action, referred to as $action(R_j)$.
For incoming packet P with header considered as d-tuple of points $(P_1, P_2, \ldots, P_d)$, the d-dimensional packet classification problem is to find rule $R_m$ with highest priority among all rules $R_j$ matching d-tuple; i.e., $pri(R_m) > pri(R_j)$, $\forall j \ne m$, $1 \le j \le N$, such that $P_i$ matches $R_j[i]$, $1 \le i \le d$. Rule $R_m$ is best matching rule for packet P.
Routing Lookup: Instance of 1D
Classification
• one-dimension (destination address)
• forwarding table ↔ classifier
• routing table entry ↔ rule
• outgoing interface ↔ action
• prefix-length ↔ priority
Example 4D Classifier
Rule | L3-DA | L3-SA | L4-DP | L3-PROT | Action
R1 | 152.163.190.69/255.255.255.255 | 152.163.80.11/255.255.255.255 | * | * | Deny
R2 | 152.168.3/255.255.255 | 152.163.200.157/255.255.255.255 | eq www | udp | Deny
R3 | 152.168.3/255.255.255 | 152.163.200.157/255.255.255.255 | range 20-21 | udp | Permit
R4 | 152.168.3/255.255.255 | 152.163.200.157/255.255.255.255 | eq www | tcp | Deny
R5 | * | * | * | * | Deny
Example Classification Results
Pkt Hdr | L3-DA | L3-SA | L4-DP | L3-PROT | Rule, Action
P1 | 152.163.190.69 | 152.163.80.11 | www | tcp | R1, Deny
P2 | 152.168.3.21 | 152.163.200.157 | www | udp | R2, Deny
Geometric Interpretation
Packet classification problem: find the highest priority rectangle containing an incoming point
[Figure: rules R1 to R7 drawn as rectangles in the plane of Dimension 1 and Dimension 2, e.g. (144.24/24, 64/16) and (128.16.46.23, *), with packets P1 and P2 as points]
Linear Search
• keep rules in a linked list
• O(N) storage, O(N) lookup time, O(1) update
complexity
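Linear search is easy to sketch against the example 4D classifier shown earlier. The code below is illustrative and rests on several assumptions that are not in the slides: rules are listed in priority order, `152.168.3/255.255.255` is read as 152.168.3.0/24, `www` is taken to mean port 80, and a wildcard is stored as `None` (or 0.0.0.0/0 for addresses).

```python
import ipaddress

# (name, dst prefix, src prefix, dst port range or None, protocol or None, action)
RULES = [
    ("R1", "152.163.190.69/32", "152.163.80.11/32", None, None, "Deny"),
    ("R2", "152.168.3.0/24", "152.163.200.157/32", (80, 80), "udp", "Deny"),
    ("R3", "152.168.3.0/24", "152.163.200.157/32", (20, 21), "udp", "Permit"),
    ("R4", "152.168.3.0/24", "152.163.200.157/32", (80, 80), "tcp", "Deny"),
    ("R5", "0.0.0.0/0", "0.0.0.0/0", None, None, "Deny"),
]


def classify(dst, src, dport, proto):
    """Return (rule name, action) of the first matching rule: O(N) per packet."""
    d = ipaddress.ip_address(dst)
    s = ipaddress.ip_address(src)
    for name, rdst, rsrc, ports, rproto, action in RULES:
        if d not in ipaddress.ip_network(rdst):
            continue
        if s not in ipaddress.ip_network(rsrc):
            continue
        if ports is not None and not (ports[0] <= dport <= ports[1]):
            continue
        if rproto is not None and proto != rproto:
            continue
        return name, action
    return None


if __name__ == "__main__":
    print(classify("152.163.190.69", "152.163.80.11", 80, "tcp"))    # packet P1
    print(classify("152.168.3.21", "152.163.200.157", 80, "udp"))    # packet P2
```

Adding a rule is a constant-time list operation, but every packet may have to be compared against all N rules.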
Ternary Match Operation
• Each TCAM entry stores a value, V, and mask, M
• Hence, two bits ($V_i$ and $M_i$) for each bit position i (i = 1..W)
• For an incoming packet header, H = {$H_i$}, the TCAM entry outputs a match if $H_i$ matches $V_i$ in each bit position for which $M_i$ equals ‘1’.
$V_i$ | $M_i$ | Match in bit position i?
X | 0 | Yes
0 | 1 | Iff ($H_i$ == 0)
1 | 1 | Iff ($H_i$ == 1)
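The ternary match rule reduces to one bitwise expression: an entry matches when the header and the value agree on every bit where the mask is 1. The sketch below models a small TCAM as an ordered list and a priority encoder that returns the lowest matching index. The entries and action strings are hypothetical, loosely modeled on an exact entry 1.23.11.3 and a wildcard entry 1.23.x.x, and only the address field is matched.

```python
import ipaddress


def ip(text):
    """Dotted-quad IPv4 address as a 32-bit integer."""
    return int(ipaddress.ip_address(text))


def entry_matches(header, value, mask):
    """True if header equals value on all bit positions where mask is 1."""
    return ((header ^ value) & mask) == 0


# (value, mask, action); earlier entries have higher priority.
TCAM = [
    (ip("1.23.11.3"), 0xFFFFFFFF, "action-0"),   # exact match on all 32 bits
    (ip("1.23.0.0"), 0xFFFF0000, "action-1"),    # 1.23.x.x: low 16 bits don't care
]


def tcam_lookup(header):
    """Priority encoder: return (index, action) of the first matching entry."""
    for index, (value, mask, action) in enumerate(TCAM):
        if entry_matches(header, value, mask):
            return index, action
    return None


if __name__ == "__main__":
    print(tcam_lookup(ip("1.23.11.3")))   # both entries match; index 0 wins
    print(tcam_lookup(ip("1.23.99.1")))   # only the wildcard entry matches
```

Hardware compares all entries in parallel; the loop here only imitates the priority encoder's choice. Storing longer prefixes at lower indices makes the same structure perform longest-prefix match.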
Lookups/Classification with
Ternary CAM
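[Figure: the packet header is presented to the TCAM memory array (locations 0 to M), which holds entries such as "1.23.11.3, tcp" and "1.23.x.x, x"; each location outputs a match bit, a priority encoder selects one matching location, and that location indexes the action memory (RAM) to return the action]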
Lookups/Classification with
Ternary CAM
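[Figure: TCAM for LPM. The memory array (locations 0 to M) holds prefixes grouped by length from P32 and P31 down to P8, e.g. 1.23.11.3 and 1.23.x.x; the packet header is compared with all entries, a priority encoder selects one matching location, and the action memory (RAM) returns the action]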
Range-to-prefix Blowup
• prefixes easier to handle than ranges
• can transform ranges to prefixes
Range-to-prefix blowup problem
Range-to-prefix Blowup
Maximum memory blowup = factor of $(2W-2)^d$
Rule | Range | Maximal Prefixes
R1 | [3,11] | 0011, 01**, 10**
R2 | [2,7] | 001*, 01**
R3 | [4,11] | 01**, 10**
R4 | [4,7] | 01**
R5 | [1,14] | 0001, 001*, 01**, 10**, 110*, 1110
Luckily, real-life does not see too many arbitrary ranges.
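The conversion from a range to maximal prefixes can be sketched as a greedy split: repeatedly take the largest power-of-two block that is aligned at the current lower bound and still fits inside the range. The function name is illustrative; the ranges and width W = 4 are the ones in the table above.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the integer range [lo, hi] with maximal prefixes of `width` bits."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... and does not extend past hi.
        while size > hi - lo + 1:
            size //= 2
        wild = size.bit_length() - 1      # number of trailing wildcard bits
        fixed = width - wild              # number of fixed leading bits
        head = format(lo >> wild, "b").zfill(fixed) if fixed else ""
        prefixes.append(head + "*" * wild)
        lo += size
    return prefixes


if __name__ == "__main__":
    for lo, hi in [(3, 11), (2, 7), (4, 11), (4, 7), (1, 14)]:
        print([lo, hi], range_to_prefixes(lo, hi, 4))
```

The range [1,14] needs six prefixes, which is the 2W-2 worst case for W = 4; with d range fields the factors multiply, giving the $(2W-2)^d$ bound.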
TCAMs
Advantages
– extensible to multiple fields
– fast: 10-16 ns today (66-100 M searches per second), going to 250 Msps
– simple to understand and use
Disadvantages
– inflexible: range-to-prefix blowup
– high power, cost
– low density: largest available in 2003-4 is ~2 MB, i.e., 128K x 128 (can be cascaded)
Example Classifier
Rule | Destination Address | Source Address
R1 | 0* | 10*
R2 | 0* | 01*
R3 | 0* | 1*
R4 | 00* | 1*
R5 | 00* | 11*
R6 | 10* | 1*
R7 | * | 00*
Hierarchical Tries
Search (000,010)
Rules (DA, SA): R1 (0*, 10*), R2 (0*, 01*), R3 (0*, 1*), R4 (00*, 1*), R5 (00*, 11*), R6 (10*, 1*), R7 (*, 00*)
[Figure: trie on dimension DA; DA prefix nodes point to tries on dimension SA that hold rules R1 to R7]
O(NW) memory
$O(W^2)$ lookup
Set-pruning Tries
Search (000,010)
Rules (DA, SA): R1 (0*, 10*), R2 (0*, 01*), R3 (0*, 1*), R4 (00*, 1*), R5 (00*, 11*), R6 (10*, 1*), R7 (*, 00*)
[Figure: trie on dimension DA; DA prefix nodes point to tries on dimension SA, and rules such as R1, R2, and R7 appear in more than one SA trie]
$O(N^2)$ memory
O(2W) lookup
Grid-of-Tries
Search (000,010)
Rules (DA, SA): R1 (0*, 10*), R2 (0*, 01*), R3 (0*, 1*), R4 (00*, 1*), R5 (00*, 11*), R6 (10*, 1*), R7 (*, 00*)
[Figure: trie on dimension DA; DA prefix nodes point to tries on dimension SA, which are linked to each other by switch pointers]
O(NW) memory
O(2W) lookup
Grid-of-Tries
20K 2D rules: 2MB, 9 memory accesses (with prefix-expansion)
Advantages
– good solution for two dimensions
Disadvantages
– difficult to carry out updates
– not easily extensible to more than two dimensions
Classification Algorithms: Speed
vs. Storage Tradeoff
Lower bounds for Point Location in N regions with
d dimensions from Computational Geometry
$O(\log N)$ time with $O(N^d)$ storage, or
$O(\log^{d-1} N)$ time with $O(N)$ storage
N = 100, d = 4: $N^d$ = 100 MBytes and $\log^{d-1} N$ = 350 memory accesses
Packet Classification Summary
• Algorithms discussed so far
– good for two fields, doesn’t scale to more than two fields,
OR
– good for very small classifiers (< 50 rules) only, OR
– have non-deterministic classification time, etc.
• Heuristic-Based Algorithms
– Recursive Flow Classification (RFC)
• Exploit structure of classifiers, recursively reduce rule space
– Hierarchical Intelligent Cuttings (HiCuts)
• Use heuristics to reduce d-dim search space into sub-spaces
– Tuple Space Search
• decompose query into a number of exact match queries
• store rules into hash table
Example of Packet Flow in RFC
RFC Example
• Four fields → six chunks
– Source and destination IP addresses → two chunks each
– Protocol number → one chunk
– Destination port number → one chunk
Lookup: What’s Used Out
There?
• overwhelming majority of routers:
– modifications of multi-bit tries (h/w optimized trie
algorithms)
– DRAM (sometimes SRAM) based, large number of routes
(>0.25M)
– parallelism required for speed/storage becomes an issue
• others mostly TCAM based
– for smaller number of routes (256K)
– used more frequently in L2/L3 switches
– power and cost main bottlenecks
Classification: What’s Used Out
There?
• majority of hardware platforms: TCAMs
– High performance, cost, power, deterministic worst-case
• some others: Modifications of RFC
– Low speed, low cost DRAM-based, heuristic
– Works well in software platforms
• some others: nothing/linear search/simulated-parallel-search etc.