Networking Virtualization Using FPGAs
Russell Tessier, Deepak Unnikrishnan,
Dong Yin, and Lixin Gao
Reconfigurable Computing Group
Department of Electrical and Computer Engineering
University of Massachusetts, Amherst, USA
Outline
 Network virtualization
 Advantage of FPGA for virtualization
 Architecture of FPGA-based virtualization system
 Partially-reconfigurable implementation
 Results
Virtual Networking
 Internet growing to include new applications and services
 Cloud computing, Data center networking
 Challenges
 Lack of innovation at network core – Fixed internet routers
 Coupled infrastructure-service provider model
 Solution
 Network virtualization
 Router hardware shared
across multiple networks
Network Virtualization
  Many logical networks over a physical infrastructure
  Virtual nodes
  Shared network resources among multiple virtual networks
  Reduces costs
  Independent routing policies
[Figure: network topologies drawn over nodes A to F]
Virtual Router
 Independent routing policies
for each virtual router
 Key challenges
• Isolation
• Performance
• Flexibility
• Scalability
[Figure: a physical router hosting virtual routers A and B, each with its own routing control and forwarding table, placed between a DEMUX and a MUX]
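The DEMUX, per-router forwarding table, and MUX arrangement can be illustrated with a minimal software sketch. Everything named here (`VirtualRouter`, `PhysicalRouter`, the virtual network identifiers, the port numbers) is hypothetical and chosen only for illustration; it is not the authors' implementation.

```python
from dataclasses import dataclass, field
import ipaddress


@dataclass
class VirtualRouter:
    """One virtual router with its own, independent forwarding table."""
    name: str
    # Maps destination prefix -> output port (hypothetical layout).
    table: dict = field(default_factory=dict)

    def add_route(self, prefix: str, port: int) -> None:
        self.table[ipaddress.ip_network(prefix)] = port

    def lookup(self, dst: str):
        """Longest-prefix match over this router's table only."""
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.table if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.table[best]


class PhysicalRouter:
    """Shared substrate: DEMUX to a virtual router, then MUX onto shared ports."""

    def __init__(self):
        self.vrouters = {}

    def attach(self, vnet_id: str, vrouter: VirtualRouter) -> None:
        self.vrouters[vnet_id] = vrouter

    def forward(self, vnet_id: str, dst: str):
        vrouter = self.vrouters.get(vnet_id)  # DEMUX: pick the virtual router
        if vrouter is None:
            return None
        port = vrouter.lookup(dst)            # independent routing policy
        if port is None:
            return None
        return (vrouter.name, port)           # MUX: result goes to a shared port
```

Two virtual routers can hold the same prefix with different next hops, which is the isolation property the slide lists as a key challenge.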
Traditional Network Virtualization Techniques
 Software
 Full virtualization
 Container virtualization
 Limitations
 Limited performance (~100Mbps)
 Limited isolation
 ASIC
  Supercharging PlanetLab Platform [1]
 Juniper E series
 Possible Limitations
 Flexibility
 Scalability
[Figure: full virtualization (VMs on shared HW) and container virtualization (OS instances on shared HW)]
[1] “Supercharging PlanetLab – A High Performance, Multi-Application, Overlay Network Platform”, J. Turner et al., SIGCOMM 2007
Virtualization using FPGAs
 A novel network virtualization substrate which
• Uses FPGA to implement high performance virtual routers
• Introduces scalability through virtual routers in host software
• Exploits reconfiguration to customize hardware virtual routers
[Figure: an FPGA hosting Virtual Routers 1 to 4]
System Overview
[Figure: system overview. A Linux host runs software virtual routers behind a software bridge and kernel driver, with NICs, and connects over PCI to the NetFPGA. The NetFPGA holds the hardware virtual routers (HW VRouter), SDRAM, SRAM, and the 1G Ethernet interface with PHY.]
Architecture
[Figure: architecture. On the NetFPGA, PHY/MAC ports with MAC RX and CPU RX queues feed an input arbiter, followed by Dynamic Design Select, hardware virtual routers 1 and 2, and an output queue that drains to the MAC TX and CPU TX queues. A CPU transceiver and control block connect to the PCI bus. On the host workstation, the NetFPGA driver, NetFPGA PCI I/F, bridge, and PC NIC sit in kernel space; software virtual routers 1 and 2 run in user space. The PC NIC links to and from a switch.]

Dynamic Design Select table:
VIP   TYPE   ID
0     HW     0
1     HW     1
2     SW     0
3     SW     1
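The Dynamic Design Select table in the architecture figure (VIP 0 to 3 mapped to TYPE HW, HW, SW, SW and ID 0, 1, 0, 1) can be read as a small lookup that steers each packet either to a hardware virtual router or toward the host. The sketch below models that lookup in software. The interpretation of VIP as a per-virtual-network index, the destination labels, and the `retarget` helper are assumptions made for illustration.

```python
from enum import Enum


class RouterType(Enum):
    HW = "HW"
    SW = "SW"


def default_table():
    """Table contents as shown on the slide: VIP -> (TYPE, ID)."""
    return {
        0: (RouterType.HW, 0),
        1: (RouterType.HW, 1),
        2: (RouterType.SW, 0),
        3: (RouterType.SW, 1),
    }


def select_design(table, vip):
    """Return a hypothetical destination label for a packet's VIP entry."""
    entry = table.get(vip)
    if entry is None:
        return "drop"  # assumption: unknown virtual networks are discarded
    rtype, rid = entry
    if rtype is RouterType.HW:
        return f"hw_vrouter_{rid}"
    # Software routers are reached through the CPU queues and the PCI bus.
    return f"cpu_queue:sw_vrouter_{rid}"


def retarget(table, vip, rtype, rid):
    """Point a VIP at a different router, e.g. while hardware is reconfigured."""
    updated = dict(table)
    updated[vip] = (rtype, rid)
    return updated
```

Rewriting one table entry is enough to move a virtual network between hardware and software in this model, which is why a table-driven selector suits dynamic reconfiguration.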
Scalable Network Virtualization
 Approaches for implementing scalable virtual routers
• Single receiver approach
• All packets routed through NetFPGA hardware
• Multi-receiver approach
• Use a switch to separate packets to NetFPGA and software
Virtual Router Customization
 Multiple virtual routers share the substrate
 Individual virtual routers may need customization
 Challenge
• Minimize the impact on traffic in shared hardware
virtual routers during modification
Multi Receiver Approach
[Figure: multi-receiver approach. Switches between source and destination separate packets for the NetFPGA hardware virtual routers 1 and 2 from packets for software virtual routers 1 and 2, which are reached through the PC NIC and bridge in the host OS. The NetFPGA datapath shows the input arbiter, Dynamic Design Select (VIP 0 to 3 mapped to HW 0, HW 1, SW 0, SW 1), and output queue.]
Dynamic Reconfiguration
[Figure: dynamic reconfiguration in the multi-receiver setup. Labels: Reconfigure FPGA, New HW Virtual Router, Hardware Virtual Router 2, Address Remap, Software Virtual Routers 1 and 2, Switch, Source, Destination. The Dynamic Design Select table maps VIP 0 to 3 to HW 0, HW 1, SW 0, SW 1.]
Single receiver approach
[Figure: single-receiver approach. All packets enter through the NetFPGA, where the input arbiter and Dynamic Design Select (VIP 0 to 3 mapped to HW 0, HW 1, SW 0, SW 1) pass them to hardware virtual routers 1 and 2 or, over the PCI bus, NetFPGA driver, and bridge, to software virtual routers 1 and 2 on the host workstation.]
Partial Reconfiguration
 Use partial reconfiguration to independently configure
virtual routers
Partial Reconfiguration
 Up to 2 partially reconfigurable virtual routers in Virtex 2
 Up to 20 partially reconfigurable virtual routers in Virtex 5
Experimental Approach
 Metrics
• Throughput
• Latency
[Figure: test setup. Source (NetFPGA Pktgen/iPerf) to virtual router over 1 Gbps; virtual router to sink (NetFPGA Pktcap/iPerf) over 1 Gbps]
 Packet generation
• NetFPGA packet generator
• iPerf
 Ping utility used for latency measurements
  Software virtual routers run on a 3 GHz AMD X2 6000+ processor with 2 GB RAM and an Intel E1000 GbE NIC in a PCIe slot
Dynamic Reconfiguration Overhead
 Full FPGA reconfiguration
• Source pings destination through a single h/w virtual router
• Migrate traffic to software
• Reconfigure FPGA
• Migrate traffic to hardware
• 12 seconds reconfiguration overhead
 Partial FPGA reconfiguration
• 0.6 seconds reconfiguration
• Remaining virtual routers in FPGA remain active
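The two reconfiguration modes differ in both downtime and in which hardware virtual routers are disturbed. The sketch below restates the slide's figures (12 s and 0.6 s) and the migration steps in code. The function names and router labels are hypothetical, and the model is a simplification rather than the measured system.

```python
FULL_RECONFIG_S = 12.0    # full FPGA reconfiguration overhead, from the slide
PARTIAL_RECONFIG_S = 0.6  # partial FPGA reconfiguration time, from the slide


def full_reconfiguration_steps():
    """Ordered steps for a full reconfiguration, as listed on the slide."""
    return [
        "migrate traffic to software",
        "reconfigure FPGA",
        "migrate traffic to hardware",
    ]


def disrupted_routers(mode, hw_routers, target):
    """Hardware virtual routers taken out of the FPGA during an update.

    mode is "full" or "partial"; target is the router being modified.
    """
    if mode == "full":
        return list(hw_routers)  # whole device is rewritten
    if mode == "partial":
        return [r for r in hw_routers if r == target]  # others stay active
    raise ValueError(f"unknown mode: {mode}")


def downtime_ratio(full_s=FULL_RECONFIG_S, partial_s=PARTIAL_RECONFIG_S):
    """How many times shorter the partial reconfiguration window is."""
    return full_s / partial_s


if __name__ == "__main__":
    print(disrupted_routers("full", ["A", "B"], "B"))
    print(disrupted_routers("partial", ["A", "B"], "B"))
    print(round(downtime_ratio()))
```

With the slide's numbers, 12 s divided by 0.6 s gives the factor of 20 quoted later in the deck.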
Throughput of Single Hardware Virtual Router
  Hardware virtual router throughput is 1 to 2 orders of magnitude higher than that of the software virtual router
  Consistent forwarding rates vs packet size
[Figure: throughput (Mbps, log scale from 1 to 1000) versus packet size (64, 128, 512, 1024, 1470 bytes) for the hardware virtual router and the software virtual router]
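Small packets stress a router because the packet rate at a given bit rate rises as frames shrink. As a rough aid for reading the throughput plot, the sketch below computes the theoretical maximum frame rate on a 1 Gbps Ethernet link. It assumes the sizes are Ethernet frame sizes and adds the standard 20 bytes of per-frame overhead (8-byte preamble and start delimiter plus 12-byte inter-frame gap). The slide does not say whether its packet sizes are frame or payload sizes, so treat the mapping as an assumption.

```python
ETH_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def max_packets_per_second(frame_bytes, link_bps=1_000_000_000):
    """Upper bound on frames per second for back-to-back Ethernet frames."""
    if frame_bytes <= 0:
        raise ValueError("frame size must be positive")
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def wire_efficiency(frame_bytes):
    """Fraction of link bandwidth carrying frame bytes rather than overhead."""
    return frame_bytes / (frame_bytes + ETH_OVERHEAD_BYTES)


if __name__ == "__main__":
    for size in (64, 128, 512, 1024, 1470):
        pps = max_packets_per_second(size)
        print(f"{size:5d} bytes: {pps:12.0f} pkt/s, efficiency {wire_efficiency(size):.3f}")
```

A forwarding path that holds line rate at 64-byte frames must handle roughly 1.49 million packets per second under these assumptions, which is where software routers typically fall behind.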
Full FPGA Reconfiguration
 Two virtual routers (A, B) initially in FPGA
  During reconfiguration, router A is migrated to software and router B is removed
  After reconfiguration two virtual routers (A, B’) again in FPGA
[Figure: throughput during full FPGA reconfiguration, annotated "Reduced Throughput"]
Partial FPGA Reconfiguration
 A remains in hardware and operates at full speed
  20X reduction in reconfiguration downtime due to partial reconfiguration
[Figure: throughput during partial FPGA reconfiguration, annotated "Sustained Throughput"]
Scalability - Average Throughput of Virtual Routers
  Aggregate throughput of 1 Gbps is sustained with up to 4 h/w virtual routers
  Adding software virtual routers causes drop in aggregate throughput
[Figure: average throughput (Mbps, log scale from 1 to 1000) versus number of virtual routers (1 to 15); series: average of hardware and software, software only]
Scalability – Average Latency of Virtual Routers
  Latency of h/w virtual router is substantially lower than that of the software virtual router
[Figure: average latency (ms, 0 to 1.2) versus number of virtual routers (1 to 15); series: software only, average of hardware and software]
Throughput Effect of Reconfiguration Frequency
 Partial reconfiguration beneficial during frequent updates
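One way to see why update frequency matters is a time-weighted average: each update period contains one window of degraded throughput, and the rest runs at line rate. The sketch below is a simplified model under stated assumptions, not the authors' measurement. It assumes a 1000 Mbps line rate, a degraded rate of about 100 Mbps while traffic runs in software (taken from the software figure quoted earlier in the deck), and the 12 s and 0.6 s downtimes from the slides.

```python
def average_throughput(line_rate_mbps, degraded_mbps, downtime_s, period_s):
    """Time-weighted mean throughput with one degraded window per period."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    down = min(downtime_s, period_s)  # window cannot exceed the period
    up = period_s - down
    return (degraded_mbps * down + line_rate_mbps * up) / period_s


if __name__ == "__main__":
    # Assumed values: 1000 Mbps line rate, 100 Mbps while in software.
    for period in (15.0, 60.0, 600.0):
        full = average_throughput(1000.0, 100.0, 12.0, period)
        partial = average_throughput(1000.0, 100.0, 0.6, period)
        print(f"update every {period:5.0f} s: full {full:6.1f} Mbps, partial {partial:6.1f} Mbps")
```

In this model the gap between the two modes widens as the update period shrinks, which matches the slide's claim that partial reconfiguration helps most under frequent updates.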
Average throughput for combined hardware/software
  Rigid placement constraints limit the number of hardware virtual routers
 Larger FPGAs sustain high throughputs for more virtual routers
Altera DE4
Throughput Measurements on DE-4 Reference Router
  Consistent packet forwarding rates of 1 Gbps for all packet sizes (64-1024 bytes)
  Packet throughput matches NetFPGA reference router in all cases
  Packet buffering in on-chip SRAM (300 Kbit)
Resource Utilization
Logic Utilization
  14% LUTs
  19% Logic registers
  9% Memory bits
  13% IOs

                     Altera DE4                        NetFPGA
                     Used        Available    %        Used     Available   %
Combinational LUTs   24,727      182,400      14%      19,783   47,232      41%
Logic Registers      34,218      182,400      19%      14,682   47,232      31%
Memory bits          1,448,612   14,625,792   9%       24       232         10.3%
IOs                  118         888          13%      356      692         51%
Reconfiguration Time
  20X faster virtual router updates with partial reconfiguration
[Figure: reconfiguration time, full reconfiguration versus partial reconfiguration]
Resource Usage and Power Consumption
 Virtex 2 can support
• 5 h/w virtual routers (without support for software routers)
• 4 h/w virtual routers (with support for software routers)
• 2 partially reconfigurable hardware routers
 Virtex 5 can support up to 32 virtual routers
 A single h/w virtual router consumes
• 156mW static power
• 688mW dynamic power
 Clock gating saves 10% power
Dynamic Virtual Network Allocation
 Assign virtual networks to software/hardware virtual
routers based on bandwidth-resource requirements
 15-20% more successful network upgrades with dynamic
allocation
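A simple way to picture bandwidth-driven allocation is a greedy rule: rank virtual networks by bandwidth demand and place the top ones in the limited hardware slots, leaving the rest in software. The sketch below is a minimal illustration of that idea only. The function name, the demand figures, and the tie-breaking by name are hypothetical, and the actual algorithm presented may weigh FPGA resources differently.

```python
def allocate(demands_mbps, hw_slots):
    """Greedy split of virtual networks into hardware and software routers.

    demands_mbps maps a virtual network name to its bandwidth demand.
    hw_slots is the number of hardware virtual routers available.
    """
    if hw_slots < 0:
        raise ValueError("hw_slots must be non-negative")
    # Highest demand first; ties broken by name for a deterministic result.
    ranked = sorted(demands_mbps.items(), key=lambda kv: (-kv[1], kv[0]))
    hardware = [name for name, _ in ranked[:hw_slots]]
    software = [name for name, _ in ranked[hw_slots:]]
    return hardware, software


if __name__ == "__main__":
    demands = {"vnet-a": 50, "vnet-b": 900, "vnet-c": 300, "vnet-d": 10}
    hw, sw = allocate(demands, hw_slots=2)
    print("hardware:", hw)
    print("software:", sw)
```

Re-running the allocation whenever demands change gives a dynamic policy in which a network whose traffic grows can be promoted to hardware.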
Conclusion
 FPGAs are ideal for a variety of networking applications
• Partial reconfiguration growing in importance
 Virtual networking is an emerging networking area
• Combined FPGA and software system gives high performance and
scalability
• A novel heterogeneous network virtualization substrate using
FPGAs
 NetFPGA platform used to implement up to five virtual routers
• Two orders of magnitude throughput increase versus software
  Partial FPGA reconfiguration reduces the number of implemented routers but cuts reconfiguration time to 0.6 s
• Allocation algorithm migrates highest-throughput routers to hardware