NetLord:
A Scalable Multi-tenant Network Architecture
for Virtualized Cloud Datacenters
Jayaram Mudigonda
Praveen Yalagandula, Jeff Mogul, Bryan Stiekes, Yanick Pouffary
Hewlett-Packard Company
© Copyright 2010 Hewlett-Packard Development Company, L.P.
The Goal
Build the right network for a cloud datacenter?
Cloud Datacenter
• Provides Infrastructure as a Service
− Shared across multiple tenants
• Virtualized
− Tenants run Virtual Machines (VMs)
The Right Network
• Virtualization + Multi-tenancy
− A virtual network for each tenant
− Flexible abstraction: no restrictions on addressing or protocols
• Scale
− Tenants, servers: 10s of 1000s; VMs: 100s of 1000s
− Adequate bandwidth
• Inexpensive
− Capital: cheap COTS components
− Operational: ease of management, high bisection BW
The Challenge
• Basic COTS switching gear
− Limited functionality
− Limited resources: not enough Forwarding Information Base (FIB) space
• COTS switches + bisection BW → multipathing
• Multipathing and/or scale → more switch resources
The Challenge
Assume no switch support and conserve switch resources (FIB),
while simultaneously achieving: scale, multi-tenancy, bisection BW, and ease of config.
State of the Art – Scale
• Most prior work is limited by one or more of:
− New protocols
− Modified control and/or data planes
− Preferred topologies
− Resources (such as table space) on switches
State of the Art – Multi-tenancy
• Traditional VLANs
− Single tenant
− No L3 abstraction
− Cannot scale beyond 4K
• Commercial: Amazon EC2 and Microsoft Azure
• Most recent work is on segregation, not virtualization
• Diverter [WREN 2010]
− Carves a SINGLE IP address space among tenants
− Restricts VM placement and IP address assignment
− Can be limited by the FIB space in switches
NetLord
• An encapsulation scheme and a complementary switch configuration, which together give:
• Scalable multi-tenancy
• Ease of configuration
• An order-of-magnitude reduction in FIB requirements
• High bisection BW
Why Encapsulate?
• Unmodified VM packets onto the network
− Excessive FIB pressure and no L3 abstraction
• Alternative: rewrite headers
− Cannot assume an L3 header → rewrite only the MAC header
• Rewriting with server MACs → somewhat reduced FIB
• But cannot identify the right VM on the destination server
NetLord Encapsulation
• Two headers: a MAC and an IP
• Reduced FIB pressure
− Outer src MAC = MAC of the source edge switch
− Outer dst MAC = MAC of the destination edge switch
• Correct delivery
− Right edge switch: the outer MAC header
− Right server: the port # in the outer destination IP address
− Right VM: the tenant ID from the outer destination IP + the inner destination MAC
• Clean abstraction
− No assumptions about VM protocols and/or addressing
(A sketch of the outer-IP encoding follows below.)
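The outer destination IP address packs the destination edge-switch port and the tenant ID into a single 32-bit value: a leading 0 bit, a 7-bit port number, and a 24-bit tenant ID. Below is a minimal Python sketch of how a NetLord Agent could construct that address; the function name and helper are illustrative assumptions, not the authors' implementation.

import ipaddress

def outer_dst_ip(egress_port: int, tenant_id: int) -> str:
    # Pack <0 | 7-bit egress-switch port | 24-bit tenant ID> into an IPv4 address.
    # Illustrative sketch of the encoding described on this slide.
    assert 0 <= egress_port < 2**7 and 0 <= tenant_id < 2**24
    value = (egress_port << 24) | tenant_id   # leading bit stays 0 because port < 128
    return str(ipaddress.IPv4Address(value))

# Example from the figure that follows: destination port P2 = 2, tenant 15 -> "2.0.0.15"
print(outer_dst_ip(2, 15))

The egress edge switch can then L3-forward on the first octet (the port number), and the destination hypervisor can recover the tenant ID from the low 24 bits.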
NetLord Encapsulation

[Figure: VM1 (tenant 15) on Server 1 sends to VM2 (tenant 15) on Server 2 across a COTS Ethernet fabric. The NetLord Agent in each hypervisor encapsulates the packet: the outer MAC header carries the edge-switch MACs (ES-MAC1 → ES-MAC2), and the outer IP header carries P1.0.0.15 → P2.0.0.15. IP address format: a leading 0 bit, the 7-bit edge-switch port number (P1, P2), and the 24-bit tenant ID (15).]
Switch Configuration
• The outer MAC header takes the packet to the egress edge switch
• A switch that receives a packet MAC-addressed to itself
− Strips the MAC header and forwards based on the IP header inside
− Standard behavior
• Correct forwarding
− Configure the L3 forwarding tables correctly
− Make sure they match the server configurations (see the table and sketch below)
Switch Configuration

[Figure: every edge switch carries the same L3 forwarding table, one entry per downlink port; the server on port p is configured with prefix p.*.*.* and gateway address p.0.0.1.]

Prefix      Gateway, port
1.*.*.*     1.0.0.1, 1
2.*.*.*     2.0.0.1, 2
…           …
48.*.*.*    48.0.0.1, 48
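Since every edge switch uses the same table, the configuration can be generated mechanically: one L3 entry per downlink, mapping prefix p.*.*.* to gateway p.0.0.1 out of port p. A sketch under that assumption follows; the entry format is generic, not any particular vendor's CLI.

def edge_switch_l3_entries(num_downlinks: int = 48):
    # One forwarding entry per downlink port, as in the table above:
    # prefix p.0.0.0/8 -> gateway p.0.0.1, out of port p.
    return [
        {"prefix": f"{p}.0.0.0/8", "gateway": f"{p}.0.0.1", "port": p}
        for p in range(1, num_downlinks + 1)
    ]

for entry in edge_switch_l3_entries()[:3]:   # first three of the 48 entries
    print(entry)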
NL-ARP
• Builds and maintains the mapping <VM-MAC, Subnet, Tenant-ID> → <SWITCH-MAC, PORT>
• Query–response, similar to regular ARP
• The NL-Agent broadcasts on VM boot-up and migration (sketch below)
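A minimal sketch of the NL-ARP mapping as a lookup table kept by each NetLord Agent; the class and method names (and the example subnet) are assumptions for illustration, and the real protocol additionally handles queries, responses, and the boot-up/migration broadcasts mentioned above.

class NLArpTable:
    # Maps <VM-MAC, Subnet, Tenant-ID> -> <SWITCH-MAC, PORT>.
    def __init__(self):
        self._entries = {}

    def learn(self, vm_mac, subnet, tenant_id, switch_mac, port):
        self._entries[(vm_mac, subnet, tenant_id)] = (switch_mac, port)

    def lookup(self, vm_mac, subnet, tenant_id):
        return self._entries.get((vm_mac, subnet, tenant_id))

# Example: tenant 15's VM2 sits behind edge switch ES-MAC2, port 2
table = NLArpTable()
table.learn("00:16:3e:00:00:02", "10.1.0.0/16", 15, "ES-MAC2", 2)
print(table.lookup("00:16:3e:00:00:02", "10.1.0.0/16", 15))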
Goals Achieved: Abstraction
• No assumptions about VM-to-VM packet contents
• No restrictions on addressing schemes
• No restrictions on L3 protocols
Goals Achieved: Scale
• 24-bit tenant ID → millions of tenants
• Order-of-magnitude reduction in FIB requirements (rough arithmetic below)
− Non-edge switches see only edge-switch MACs
− Edge switches additionally see their local server MACs
• Small number of L3 table entries in edge switches
− One per downlink
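A rough illustration of the FIB reduction; the population sizes are assumed for the example, not taken from the slides. With flat L2 forwarding a core switch can learn every VM MAC, with server-MAC rewriting every server MAC, and under NetLord only the edge-switch MACs.

# Assumed example sizes, for illustration only
vms, servers, ports_per_edge_switch = 200_000, 3_000, 48

fib_flat_l2    = vms                                    # every VM MAC visible
fib_server_mac = servers                                # rewritten to server MACs
fib_netlord    = -(-servers // ports_per_edge_switch)   # edge-switch MACs only (~63)

print(fib_flat_l2, fib_server_mac, fib_netlord)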
Goals Achieved: Ease of Config
• Automatic
• Local
• Same across all switches → easy to debug
• Does not change → easy to debug
• Easy per-tenant traffic engineering in the network
− Tenant ID readily available in the standard IP header (sketch below)
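Because the tenant ID occupies the low 24 bits of the outer destination IP address, per-tenant classification needs nothing beyond standard IP header parsing. A minimal sketch; the helper name is an assumption.

import ipaddress

def tenant_id_from_outer_dst(dst_ip: str) -> int:
    # The low 24 bits of the outer destination IP carry the tenant ID.
    return int(ipaddress.IPv4Address(dst_ip)) & 0xFFFFFF

print(tenant_id_from_outer_dst("2.0.0.15"))   # -> 15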
Bisection Bandwidth
• Reduced FIB pressure → SPAIN can be utilized effectively
J. Mudigonda, P. Yalagandula, M. Al-Fares, and J. C. Mogul. SPAIN: COTS Data-Center Ethernet for Multipathing over Arbitrary Topologies. In Proc. USENIX NSDI, 2010.
SPAIN in One Slide
• Multipath forwarding in container/pod datacenters
− Few thousand servers (or unique MAC addresses)
− COTS Ethernet (with VLAN support)
− Arbitrary topologies
• A central, off-line manager:
− Discovers the topology
− Computes paths, exploiting redundancy in the topology
− Merges paths into a minimal set of trees
− Installs the trees as VLANs into switches
• End-host (NIC driver):
− Downloads VLAN maps from the manager
− Spreads flows over VLANs (i.e., paths); see the sketch below
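A sketch of the last step, spreading flows over the VLANs (paths) that reach a destination: hash the flow's 5-tuple and pick one of the usable VLANs. The data layout and function name are assumptions for illustration, not SPAIN's actual driver code.

import hashlib

def pick_vlan(flow, vlans_to_dst):
    # `flow` is the 5-tuple; `vlans_to_dst` is the list of VLANs (paths)
    # toward this destination, taken from the VLAN maps the manager installed.
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return vlans_to_dst[digest[0] % len(vlans_to_dst)]

flow = ("10.0.0.1", "10.0.0.2", 6, 34567, 80)   # src IP, dst IP, proto, sport, dport
print(pick_vlan(flow, vlans_to_dst=[10, 20, 30]))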
SPAIN & Extreme Scalability
• Critical resource: forwarding table (FIB) entries
− Each MAC address appears on multiple VLANs
− Worst-case FIB usage = #VLANs × #MACs
− Overflowing the FIB → floods and poor goodput
• 64K entries → only a few thousand unique MACs (arithmetic below)
• NetLord’s ability to hide MACs helps greatly
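The arithmetic behind "64K entries → only a few thousand unique MACs": worst-case FIB usage is #VLANs × #MACs, so dividing the FIB size by the VLAN count bounds the number of MACs the fabric can carry without flooding. The VLAN count below is an assumption for illustration.

FIB_ENTRIES = 64 * 1024   # FIB size from the slide
NUM_VLANS   = 16          # assumed number of SPAIN VLANs, for illustration

# Worst case: every MAC address is learned on every VLAN
max_unique_macs = FIB_ENTRIES // NUM_VLANS
print(max_unique_macs)    # 4096 -> "only a few 1000 unique MACs"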
Putting It All Together

[Figure: VM1 on Server 1 sends a packet to VM2 on Server 2. The NetLord Agent in Server 1's hypervisor wraps the VM1→VM2 packet in an outer MAC header (macA → macB, the edge-switch MACs) and an outer IP header (IP1 → IP2, drawn from the prefixes P1.*.*.* and P2.*.*.*). The packet crosses the COTS Ethernet fabric to the egress edge switch, which strips the MAC header and L3-forwards on the prefix to Server 2, where the NetLord Agent decapsulates and delivers the inner VM1→VM2 packet to VM2.]
Evaluation
• Analytical arguments:
− Max VMs: ~650K using 48-port switches with 64K-entry FIBs
− NL-ARP overheads: < 1% of link BW, ~0.5% of server memory
• Prototype + VM emulation
− NLA and encapsulation overheads are negligible
− Substantial goodput improvements
Experimental Setup: Topology
• 74 servers in a 2-level fat-tree topology
Prototype, VM Emulation
• NLA kernel module
• VM Emulator
− A thin module above the NLA
− Exports a virtual device
− Re-writes MAC addresses
• A unique MAC pair for each TCP connection
• Up to 3K VMs per server
− 200K VMs in all

[Figure: per-server stack — user-level processes, protocol stack, VM Emulator, NetLord Agent, NIC driver, physical NIC, network.]
Experimental Setup: Workload
• Parallel shuffles
− Emulating the shuffle phase of Map-Reduce jobs
− Each shuffle: 74 mappers and 74 reducers, one per server
− Each mapper transfers 10 MB of data to all reducers
Goodput

[Figure: goodput in Gbps (0–25) for Traditional, SPAIN, and NetLord as the number of VMs grows from 74 to 222K.]

Floods

[Figure: flooded packets in millions (0–30) for Traditional, SPAIN, and NetLord over the same range of VM counts, 74 to 222K.]
Summary
NetLord combines simple existing primitives in a novel fashion to achieve several outsized benefits of practical importance:
• Scale
• Multi-tenancy
• Ease of use
• Bisection BW