
Data Center Networking
CS 6250: Computer Networking
Fall 2011
Cloud Computing
• Elastic resources
– Expand and contract resources
– Pay-per-use
– Infrastructure on demand
• Multi-tenancy
– Multiple independent users
– Security and resource isolation
– Amortize the cost of the (shared) infrastructure
• Flexible service management
– Resiliency: isolate failures of servers and storage
– Workload movement: move work to other locations
Trends in Data Centers
Sources: J. Nicholas Hoover, InformationWeek, June 17, 2008;
http://gigaom.com/cloud/google-builds-megadata-center-in-finland/ (a 200 million Euro facility);
http://www.datacenterknowledge.com
Data centers will grow larger and larger in the cloud computing era to benefit from economies of scale:
tens of thousands of servers today, moving toward hundreds of thousands.
Most important things in data center management
- Economies of scale
- High utilization of equipment
- Maximize revenue
- Amortize administration cost
- Low power consumption (http://www.energystar.gov/ia/partners/prod_development/downloads/EPA_Datacenter_Report_Congress_Final1.pdf)
Cloud Service Models
• Software as a Service
– Provider licenses applications to users as a service
– e.g., customer relationship management, email, …
– Avoid costs of installation, maintenance, patches, …
• Platform as a Service
– Provider offers software platform for building applications
– e.g., Google’s App-Engine
– Avoid worrying about scalability of platform
• Infrastructure as a Service
– Provider offers raw computing, storage, and network
– e.g., Amazon’s Elastic Compute Cloud (EC2)
– Avoid buying servers and estimating resource needs
Multi-Tier Applications
• Applications consist of tasks
– Many separate components
– Running on different machines
• Commodity computers
– Many general-purpose computers
– Not one big mainframe
– Easier scaling
[Figure: a front-end server fans requests out to several aggregators, which in turn fan out to many workers.]
Enabling Technology: Virtualization
• Multiple virtual machines on one physical machine
• Applications run unmodified, as on a real machine
• VMs can migrate from one physical machine to another
Status Quo: Virtual Switch in Server
Top-of-Rack Architecture
• Rack of servers
– Commodity servers
– And top-of-rack switch
• Modular design
– Preconfigured racks
– Power, network, and storage cabling
• Aggregate to the next level
Modularity
• Containers
• Many containers
Data Center Challenges
• Traffic load balancing
• Support for VM migration
• Achieving high bisection bandwidth
• Power savings / cooling
• Network management (provisioning)
• Security (dealing with multiple tenants)
Data Center Costs (Monthly Costs)
• Servers: 45%
– CPU, memory, disk
• Infrastructure: 25%
– UPS, cooling, power distribution
• Power draw: 15%
– Electrical utility costs
• Network: 15%
– Switches, links, transit
http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx
Common Data Center Topology
[Figure: the Internet connects to a core tier of Layer-3 routers, then an aggregation tier of Layer-2/3 switches, then an access tier of Layer-2 switches, with the servers at the bottom of the data center.]
Data Center Network Topology
[Figure: the Internet connects to core routers (CR), which connect to access routers (AR), which connect to Ethernet switches (S) and racks of application servers (A); each pod contains roughly 1,000 servers.]
Key
• CR = Core Router
• AR = Access Router
• S = Ethernet Switch
• A = Rack of app. servers
Requirements for Future Data Centers
• To keep up with the trend toward mega data centers, DCN technology should meet the following requirements:
– High scalability
– Transparent VM migration (high agility)
– Easy deployment requiring little human administration
– Efficient communication
– Loop-free forwarding
– Fault tolerance
• Current DCN technology cannot meet these requirements.
– Layer-3 protocols cannot support transparent VM migration.
– Current Layer-2 protocols do not scale, due to forwarding-table size and the reliance on broadcast for address resolution.
Problems with Common Topologies
• Single point of failure
• Oversubscription of links higher up in the topology
• Tradeoff between cost and provisioning
Capacity Mismatch
[Figure: oversubscription in the conventional tree grows toward the top: roughly 5:1 at the top-of-rack level, roughly 40:1 at the aggregation level, and roughly 200:1 toward the core routers.]
Data-Center Routing
[Figure: the same tree, split into a DC-Layer 3 portion (core routers and access routers) and a DC-Layer 2 portion (Ethernet switches down to the racks); each pod of roughly 1,000 servers corresponds to one IP subnet.]
Key
• CR = Core Router (L3)
• AR = Access Router (L3)
• S = Ethernet Switch (L2)
• A = Rack of app. servers
Reminder: Layer 2 vs. Layer 3
• Ethernet switching (layer 2)
– Cheaper switch equipment
– Fixed addresses and auto-configuration
– Seamless mobility, migration, and failover
• IP routing (layer 3)
– Scalability through hierarchical addressing
– Efficiency through shortest-path routing
– Multipath routing through equal-cost multipath
• So, as in enterprises…
– Data centers often connect layer-2 islands by IP routers
Need for Layer 2
• Certain monitoring apps require servers with the same role to be on the same VLAN
• Using the same IP address on dual-homed servers
• Allows organic growth of server farms
• Migration is easier
Review of Layer 2 & Layer 3
• Layer 2
– One spanning tree for entire network
• Prevents loops
• Ignores alternate paths
• Layer 3
– Shortest path routing between source and destination
– Best-effort delivery
Fat-Tree-Based Solution
• Connect end hosts together using a fat-tree topology
– Infrastructure consists of cheap devices
• Each port supports the same speed as an end host
– All devices can transmit at line speed if packets are distributed along existing paths
– A k-port fat tree can support k³/4 hosts
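To make the k³/4 figure concrete, here is a small Python sketch (mine, not from the slides; the function name and output format are illustrative) of the fat-tree counting argument: k pods, each with k/2 edge and k/2 aggregation switches, and k/2 hosts per edge switch.

```python
# A minimal sketch of fat-tree sizing, assuming the standard k-port layout:
# k pods * (k/2 edge switches per pod) * (k/2 hosts per edge switch) = k^3/4 hosts.

def fat_tree_capacity(k: int) -> dict:
    """Return host and switch counts for a k-port fat tree (k must be even)."""
    assert k % 2 == 0, "fat-tree port count k must be even"
    hosts = k ** 3 // 4            # k * (k/2) * (k/2)
    core_switches = (k // 2) ** 2  # (k/2)^2 core switches
    pod_switches = k * k           # k pods * (k/2 edge + k/2 aggregation)
    return {"hosts": hosts, "core_switches": core_switches, "pod_switches": pod_switches}

if __name__ == "__main__":
    # e.g. 48-port commodity switches give 48^3/4 = 27,648 hosts
    print(fat_tree_capacity(48))
```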
Fat-Tree Topology
Problems with a Vanilla Fat-tree
• Layer 3 will only use one of the existing equal-cost paths
• Packet reordering occurs if layer 3 blindly takes advantage of path diversity
Modified Fat Tree
• Enforce a special addressing scheme in the DC
– Allows hosts attached to the same switch to route only through that switch
– Allows intra-pod traffic to stay within the pod
– Address format: unused.PodNumber.SwitchNumber.EndHost
• Use two-level lookups to distribute traffic and maintain packet ordering.
Two-Level Lookups
• First level is a prefix lookup
– Used to route down the topology to an end host
• Second level is a suffix lookup
– Used to route up toward the core
– Diffuses and spreads out traffic
– Maintains packet ordering by using the same port for the same end host
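The prefix-then-suffix idea can be sketched in a few lines of Python. This is my own simplified illustration, not the paper's switch implementation: the table contents and switch address are hypothetical, and the "suffix" here is simply the destination's last octet.

```python
# A minimal sketch of a two-level lookup: try longest-prefix match first
# (routes *down* toward a host in this pod); otherwise fall back to a suffix
# match on the host byte (routes *up* toward the core, spreading destinations
# across uplink ports while keeping the same port for the same end host).

import ipaddress

def two_level_lookup(dst: str, prefix_table, suffix_table):
    """prefix_table: list of (network, port); suffix_table: list of (host_suffix, port)."""
    addr = ipaddress.IPv4Address(dst)
    # First level: prefix lookup (down the topology)
    for net, port in prefix_table:
        if addr in ipaddress.IPv4Network(net):
            return port
    # Second level: suffix lookup on the last octet (up toward the core)
    last_octet = int(dst.split(".")[-1])
    for suffix, port in suffix_table:
        if last_octet == suffix:
            return port
    return None

# Hypothetical tables for a pod-2 aggregation switch in a k=4 fat tree
prefix_table = [("10.2.0.0/24", 0), ("10.2.1.0/24", 1)]   # down to edge switches
suffix_table = [(2, 2), (3, 3)]                           # up to core, by host ID
print(two_level_lookup("10.2.0.3", prefix_table, suffix_table))  # -> 0 (down)
print(two_level_lookup("10.3.1.2", prefix_table, suffix_table))  # -> 2 (up)
```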
Diffusion Optimizations
• Flow classification
– Eliminates local congestion
– Assigns traffic to ports on a per-flow basis instead of a per-host basis
• Flow scheduling
– Eliminates global congestion
– Prevents long-lived flows from sharing the same links
– Assigns long-lived flows to different links
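As a rough illustration of per-flow (rather than per-host) assignment, the sketch below hashes a flow's 5-tuple to pick an uplink. The function and uplink names are my assumptions, and the paper's actual classifier reassigns flows dynamically rather than hashing statically; the point is only that one flow stays on one path while different flows spread out.

```python
# A minimal sketch (an assumption, not the paper's implementation) of flow
# classification: hash the 5-tuple so each flow sticks to one uplink
# (preserving packet order) while different flows spread across uplinks.

import hashlib

def pick_uplink(src_ip, dst_ip, src_port, dst_port, proto, uplinks):
    """Assign a flow to an uplink deterministically by hashing its 5-tuple."""
    key = f"{src_ip}:{dst_ip}:{src_port}:{dst_port}:{proto}".encode()
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return uplinks[digest % len(uplinks)]

uplinks = ["core-0", "core-1", "core-2", "core-3"]
print(pick_uplink("10.2.0.3", "10.3.1.2", 5000, 80, "tcp", uplinks))
```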
Drawbacks
• No inherent support for VLAN traffic
• The data center has a fixed size
• Ignores connectivity to the Internet
• Wastes address space
– Requires NAT at the border
Data Center Traffic Engineering: Challenges and Opportunities
Wide-Area Network
[Figure: several data centers, each with servers behind a router, connect to the Internet; a DNS server performs DNS-based site selection to direct clients to a data center.]
Wide-Area Network: Ingress Proxies
[Figure: as above, but clients reach the data centers through ingress proxies rather than direct DNS-based site selection.]
Traffic Engineering Challenges
• Scale
– Many switches, hosts, and virtual machines
• Churn
– Large number of component failures
– Virtual Machine (VM) migration
• Traffic characteristics
– High traffic volume and dense traffic matrix
– Volatile, unpredictable traffic patterns
• Performance requirements
– Delay-sensitive applications
– Resource isolation between tenants
Traffic Engineering Opportunities
• Efficient network
– Low propagation delay and high capacity
• Specialized topology
– Fat tree, Clos network, etc.
– Opportunities for hierarchical addressing
• Control over both network and hosts
– Joint optimization of routing and server placement
– Can move network functionality into the end host
• Flexible movement of workload
– Services replicated at multiple servers and data centers
– Virtual Machine (VM) migration
PortLand: Main Idea
[Figure panels: adding a new host; transferring a packet.]
Key features
- Layer-2 protocol based on a tree topology
- PMACs (pseudo MACs) encode position information
- Data forwarding proceeds based on the PMAC
- Edge switches are responsible for the mapping between PMAC and AMAC (actual MAC)
- The fabric manager is responsible for address resolution
- Edge switches make the PMAC invisible to end hosts
- Each switch can identify its own position by itself
- The fabric manager keeps information about the overall topology; when a fault occurs, it notifies the affected nodes.
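The PMAC encoding can be sketched compactly. The pod.position.port.vmid field widths below follow PortLand's 48-bit layout; the function names and the example AMAC-to-PMAC table are my own illustration, not PortLand's code.

```python
# A minimal sketch of PortLand-style PMAC encoding: a 48-bit pseudo MAC laid
# out as pod(16).position(8).port(8).vmid(16). The AMAC<->PMAC dict stands in
# for the edge switch's rewriting state.

def encode_pmac(pod: int, position: int, port: int, vmid: int) -> int:
    """Pack position information into a 48-bit pseudo MAC."""
    return (pod << 32) | (position << 24) | (port << 16) | vmid

def pmac_to_str(pmac: int) -> str:
    return ":".join(f"{(pmac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))

# Edge switch example: host in pod 1, edge-switch position 2, switch port 0, VM id 1
pmac = encode_pmac(pod=1, position=2, port=0, vmid=1)
amac_to_pmac = {"00:19:b9:fa:88:e2": pmac}   # actual MAC -> pseudo MAC (hypothetical entry)
print(pmac_to_str(pmac))                     # -> 00:01:02:00:00:01
```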
Questions from the discussion board
• Question about the Fabric Manager
o How do we make the Fabric Manager robust?
o How do we build a scalable Fabric Manager?
 Redundant deployment or a clustered Fabric Manager could be a solution.
• Question about the realism of the baseline tree topology
o Is the tree topology common in the real world?
 Yes, the multi-rooted tree has been a traditional data center topology.
[A Scalable, Commodity Data Center Network Architecture, Mohammad Al-Fares et al., SIGCOMM '08]
• Discussion points
o Is PortLand applicable to other topologies?
 The idea of central ARP management could be applicable.
 To solve the forwarding-loop problem, a TRILL-like header would be necessary.
 What are the benefits of PMAC then? It would require a larger forwarding table.
• Question about the benefits of VM migration
o VM migration helps reduce traffic going through aggregation/core switches.
o What about changes in user requirements?
o What about power consumption?
• Etc.
o The feasibility of mimicking PortLand with a Layer-3 protocol.
o What about using pseudo IP addresses?
o The delay to boot up a whole data center.
Status Quo: Conventional DC Network
[Figure: the conventional DC network: core routers (CR, L3) and access routers (AR, L3) form the DC-Layer 3 portion; Ethernet switches (S, L2) form the DC-Layer 2 portion; each pod of roughly 1,000 servers is one IP subnet, with racks of application servers (A) at the bottom.]
Reference: "Data Center: Load Balancing Data Center Services", Cisco, 2004
Conventional DC Network Problems
[Figure: the same tree, with oversubscription of roughly 5:1, 40:1, and 200:1 at successively higher levels.]
• Dependence on high-cost proprietary routers
• Extremely limited server-to-server capacity
And More Problems …
[Figure: servers are partitioned into IP subnets (VLAN #1, VLAN #2) under different access routers, with roughly 200:1 oversubscription above them.]
• Resource fragmentation, significantly lowering utilization (and cost-efficiency)
And More Problems …
[Figure: the same partitioned topology; moving capacity between subnets requires complicated manual L2/L3 reconfiguration.]
• Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency)
All We Need Is Just a Huge L2 Switch, or an Abstraction of One
[Figure: the entire multi-tier topology collapses conceptually into a single enormous Layer-2 switch connecting all servers.]
VL2 Approach
• Layer-2 based, using future commodity switches
• Two-level hierarchy:
– access switches (top of rack)
– load-balancing switches
• Eliminate spanning tree
– Flat routing
– Allows the network to take advantage of path diversity
• Prevent MAC address learning
– 4D architecture to distribute data-plane information
– ToR switches only need to learn addresses of the intermediate switches
– Core switches only learn addresses of the ToR switches
• Support efficient grouping of hosts (VLAN replacement)
VL2
VL2 Components
• Top-of-Rack switch
– Aggregates traffic from the 20 end hosts in a rack
– Performs IP-to-MAC translation
• Intermediate switch
– Disperses traffic
– Balances traffic among switches
– Used for Valiant load balancing
• Decision element
– Places routes in switches
– Maintains a directory service of IP-to-MAC mappings
• End host
– Performs IP-to-MAC lookups
Routing in VL2
• The end host checks its flow cache for the MACs of the flow
– If not found, it asks the agent to resolve them
– The agent returns a list of MACs for the server and MACs for the intermediate switches
• Traffic is sent to the top-of-rack switch
– Traffic is triple-encapsulated
• Traffic is sent to an intermediate switch
• Traffic is sent to the destination's top-of-rack switch
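A rough Python sketch of this sending path follows. The directory contents, the set of intermediate switches, and the dictionary-based "encapsulation" are illustrative assumptions of mine, not VL2's actual agent or shim-header format.

```python
# A minimal sketch of the VL2 sending path: the host agent resolves the
# destination's ToR address (from its flow cache or the directory), picks a
# random intermediate switch, and wraps the packet in outer headers for the
# intermediate and the destination ToR (inner packet + two headers = "triple").

import random

directory = {"10.0.0.7": "20.0.3.1"}          # application addr -> dest ToR locator (hypothetical)
intermediates = ["30.0.0.1", "30.0.0.2"]      # intermediate switches used for indirection
flow_cache = {}

def send(src_aa, dst_aa, payload):
    dst_tor = flow_cache.get(dst_aa) or directory[dst_aa]   # ask the agent/directory on a miss
    flow_cache[dst_aa] = dst_tor
    intermediate = random.choice(intermediates)             # Valiant load balancing
    return {"outer_dst": intermediate,                      # header toward an intermediate switch
            "mid_dst": dst_tor,                             # header toward the destination ToR
            "inner": {"src": src_aa, "dst": dst_aa, "payload": payload}}

print(send("10.0.0.3", "10.0.0.7", b"hello"))
```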
The Illusion of a Huge L2 Switch
1. L2 semantics
2. Uniform high capacity
3. Performance isolation
[Figure: all servers appear to hang off one giant Layer-2 switch offering these three properties.]
Objectives and Solutions
Objective | Approach | Solution
1. Layer-2 semantics | Employ flat addressing | Name-location separation & resolution service
2. Uniform high capacity between servers | Guarantee bandwidth for hose-model traffic | Flow-based random traffic indirection (Valiant LB)
3. Performance isolation | Enforce hose model using existing mechanisms only | TCP
Name/Location Separation
Cope with host churn with very little overhead.
• Switches run link-state routing and maintain only switch-level topology
– Allows the use of low-cost switches
– Protects the network and hosts from host-state churn
– Obviates host and switch reconfiguration
[Figure: servers use flat names (x, y, z); a directory service maps each name to its ToR (x -> ToR2, y -> ToR3, z -> ToR3, updated to ToR4 after migration). Senders perform a lookup & response exchange with the directory and address packets to the ToR, e.g. "ToR3 | y | payload".]
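A minimal sketch of name-location separation under stated assumptions: the directory is a plain dict and a "packet" is a tuple. The point is only that a VM move updates one directory entry while server names and switch state stay unchanged.

```python
# Illustrative only, not VL2's code: servers keep flat names, and only the
# directory learns that z moved from ToR3 to ToR4, so neither hosts nor
# switches need reconfiguration.

directory = {"x": "ToR2", "y": "ToR3", "z": "ToR3"}   # flat name -> current ToR

def resolve(name):
    """Lookup & response, as in the figure."""
    return directory[name]

def send(dst_name, payload):
    """The sender addresses the packet to the destination's ToR, not to the host."""
    return (resolve(dst_name), dst_name, payload)      # e.g. ("ToR3", "y", payload)

print(send("z", b"data"))      # -> ('ToR3', 'z', b'data')
directory["z"] = "ToR4"        # VM z migrates; only the directory entry changes
print(send("z", b"data"))      # -> ('ToR4', 'z', b'data')
```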
Clos Network Topology
Offers huge aggregate capacity and multiple paths at modest cost.
[Figure: intermediate switches at the top, K aggregation switches with D 10G ports in the middle, ToR switches below, and 20 servers per ToR, for 20*(DK/4) servers in total.]
D (# of 10G ports) | Max DC size (# of servers)
48 | 11,520
96 | 46,080
144 | 103,680
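The table's numbers follow directly from the 20*(DK/4) formula; here is a small sketch of the arithmetic (the function name and the equal-port-count assumption are mine).

```python
# A minimal sketch of the Clos sizing on this slide: D*K/4 ToRs, 20 servers
# per ToR, hence a maximum of 20*(D*K/4) servers.

def clos_max_servers(d_ports: int, k_ports: int = None, servers_per_tor: int = 20) -> int:
    k = k_ports if k_ports is not None else d_ports   # assume aggregation and intermediate port counts match
    return servers_per_tor * (d_ports * k) // 4

for d in (48, 96, 144):
    print(d, clos_max_servers(d))    # -> 11,520 / 46,080 / 103,680 servers
```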
Valiant Load Balancing: Indirection
Cope with arbitrary traffic matrices with very little overhead [ECMP + IP Anycast].
• Harness huge bisection bandwidth
• Obviate esoteric traffic engineering or optimization
• Ensure robustness to failures
• Work with switch mechanisms available today
[Figure: a sender encapsulates packets to an anycast address IANY shared by the intermediate switches; some links are used for up paths and others for down paths. Equal-cost multi-path forwarding (1) must spread traffic and (2) must ensure destination independence.]
Properties of Desired Solutions
• Backwards compatible with existing infrastructure
– No changes in applications
– Support for Layer 2 (Ethernet)
• Cost-effective
– Low power consumption and heat emission
– Cheap infrastructure
• Allows host communication at line speed
• Zero configuration
• No loops
• Fault tolerant
Research Questions
• What topology to use in data centers?
– Reducing wiring complexity
– Achieving high bisection bandwidth
– Exploiting capabilities of optics and wireless
• Routing architecture?
– Flat layer-2 network vs. hybrid switch/router
– Flat vs. hierarchical addressing
• How to perform traffic engineering?
– Over-engineering vs. adapting to load
– Server selection, VM placement, or optimizing routing
• Virtualization of NICs, servers, switches, …
Research Questions
• Rethinking TCP congestion control?
– Low propagation delay and high bandwidth
– “Incast” problem leading to bursty packet loss
• Division of labor for TE, access control, …
– VM, hypervisor, ToR, and core switches/routers
• Reducing energy consumption
– Better load balancing vs. selective shutting down
• Wide-area traffic engineering
– Selecting the least-loaded or closest data center
• Security
– Preventing information leakage and attacks