VL2: A Scalable and Flexible Data Center Network

Networking the Cloud
Presenter: b97901184, Electrical Engineering, 3rd year, 姜慧如





Data Centers holding tens to hundreds of
thousands of servers.
Concurrently supporting a large number of
distinct services.
Economies of scale.
Dynamically reallocate servers among
services as workload pattern changes.
High utilization is needed.  Agility!


Agility:
Any machine to be able to play any role.
“Any Service, any Server.”


Ugly Secret: 30% utilization is considered good.
Uneven application fit:
-- Each server has CPU, memory, disk: most applications exhaust one
resource, stranding the others.

Long provisioning timescales:
-- New servers purchased quarterly at best.

Uncertainty in demand:
-- Demand for a new service can spike quickly.

Risk management:
-- Not having spare servers to meet demand brings failure just when
success is at hand.

Session state and storage constraints:
-- If the world were stateless servers….

Workload management
-- Means for rapidly installing a service’s code on a
server.  Virtual Machines, disk images.

Storage management
-- Means for a server to access persistent data.
Distributed filesystems.

Network
-- Means for communicating with other servers,
regardless of where they are in the data center.
But:
 Today’s data center networks prevent agility in
several ways.
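[Figure: conventional data center network, drawn as a tree of core routers (CR), access routers (AR), and switches (S) above the servers, with oversubscription ratios labeled 1:5, 1:80, and 1:240. Callouts: one group of servers says "I want more" while another says "I have spare ones, but…".]
Static network assignment
Fragmentation of resources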
Poor server-to-server connectivity
Traffic flows affect each other
Poor reliability and utilization
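
To make the oversubscription ratios in the figure (1:5, 1:80, 1:240) concrete, the sketch below computes the bandwidth left per server in the worst case, when every server sends across an oversubscribed layer at once. The 1 Gbps NIC rate and the function name are assumptions for illustration and do not come from the slides.

```python
def worst_case_mbps(nic_mbps: float, oversubscription: int) -> float:
    """Bandwidth left per server when all servers send across a layer
    that is oversubscribed 1:oversubscription."""
    return nic_mbps / oversubscription


# Assumed NIC rate: 1000 Mbps (1 Gbps); the ratios are the ones in the figure.
for ratio in (5, 80, 240):
    print(f"1:{ratio} -> {worst_case_mbps(1000, ratio):.1f} Mbps per server")
```

Under these assumptions, a server whose traffic must cross the most oversubscribed layer is left with only a few Mbps, which is why server placement matters so much in the conventional design.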

Scale is achieved by assigning servers
topologically related IP addresses and dividing
servers among VLANs.
 VMs have limited utility: they cannot migrate out of the
original VLAN while keeping the same IP address.
 The address space becomes fragmented.
 Reconfiguration is needed when servers are reassigned to different
services.

[Figure: the CR / AR / S hierarchy annotated with the three objectives: 1. L2 semantics, 2. Uniform high capacity, 3. Performance isolation.]
Developers want a mental model where all
their servers, and only their servers, are
plugged into an Ethernet switch.
1. Layer-2 semantics
-- Flat addressing, so any server can have any IP Address.
-- Server configuration is the same as in a LAN.
-- VM keeps the same IP address even after migration
2. Uniform high capacity
-- Capacity between servers limited only by their NICs.
-- No need to consider topology when adding servers.
3. Performance isolation
-- Traffic of one service should be unaffected by others.
Objective                      Approach                       Solution
1. Layer-2 semantics           Employ flat addressing         Name-location separation
                                                              & resolution service
2. Uniform high capacity       Guarantee bandwidth for        Flow-based random traffic
   between servers             hose-model traffic             indirection (Valiant LB)
3. Performance isolation       Enforce hose model using       TCP
                               existing mechanisms only
“Hose”: each node has ingress/egress bandwidth constraints
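
The hose model can be illustrated with a short check: a traffic matrix is admissible if no server sends or receives more than its own bandwidth limit. This is a minimal sketch; the function name, the single shared `nic_rate` limit, and the example matrices are assumptions for illustration.

```python
def satisfies_hose(tm, nic_rate):
    """Return True if traffic matrix tm respects the hose constraints.

    tm[i][j] is the rate from server i to server j, in the same units
    as nic_rate. Each server's total egress (row sum) and total ingress
    (column sum) must not exceed nic_rate.
    """
    n = len(tm)
    for i in range(n):
        egress = sum(tm[i])
        ingress = sum(tm[j][i] for j in range(n))
        if egress > nic_rate or ingress > nic_rate:
            return False
    return True


# Hypothetical 3-server examples with a NIC rate of 1.0 (e.g. 1 Gbps).
ok_tm = [[0, 0.5, 0.5], [0.3, 0, 0.2], [0.4, 0.4, 0]]
bad_tm = [[0, 0.9, 0], [0, 0, 0], [0, 0.9, 0]]  # server 1 would receive 1.8

print(satisfies_hose(ok_tm, 1.0))   # admissible
print(satisfies_hose(bad_tm, 1.0))  # violates the ingress limit of server 1
```

The check only looks at per-server totals, never at which pairs are talking, which is what makes the hose model a good fit for volatile traffic.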

Ethernet switching (layer 2)
 Cheaper switch equipment
 Fixed addresses and auto-configuration
 Seamless mobility, migration, and failover

IP routing (layer 3)
 Scalability through hierarchical addressing
 Efficiency through shortest-path routing
 Multipath routing through equal-cost multipath

So, like in enterprises…
 Data centers often connect layer-2 islands by IP routers


Data center traffic analysis:
DC traffic != Internet traffic
 The ratio of traffic between servers to traffic entering/leaving the data center is 4:1
 Demand for bandwidth between servers is growing faster
 The network is the bottleneck of computation

Flow distribution analysis:
 The majority of flows are small; the largest flows are about 100 MB
 The distribution of internal flows is simpler and more uniform
 About 50% of the time a machine has roughly 10 concurrent flows; 5% of the
time it has more than 80 concurrent flows

Traffic matrix analysis:
 Traffic patterns are hard to summarize
 Traffic patterns are unstable

Failure characteristics:
 Duration of networking equipment failures: 95% < 1 min, 98% < 1 hr,
99.6% < 1 day, 0.09% > 10 days
 No obvious way to eliminate all failures from the top of the hierarchy

Flat Addressing:
Allow service instances (e.g., virtual machines) to be placed
anywhere in the network.

Valiant Load Balancing:
(Randomly) Spread network traffic uniformly across network
paths.

End-system based address resolution:
To scale to large server pools, without introducing complexity
to the network control plane.

Design principles:
 Randomizing to cope with volatility:
▪ Using Valiant Load Balancing (VLB) to do destination independent
traffic spreading across multiple intermediate nodes
 Building on proven networking technology:
▪ Using IP routing and forwarding technologies available in commodity
switches
 Separating names from locators:
▪ Using directory system to maintain the mapping between names and
locations
 Embracing end systems:
▪ A VL2 agent at each server
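Offer huge aggregate capacity & multiple paths at modest cost
[Figure: Clos topology with three switch layers: intermediate (Int) switches at the top, K aggregation (Aggr) switches with D ports each in the middle, and top-of-rack (TOR) switches at the bottom, each TOR connecting 20 servers.]
Total: 20*(DK/4) servers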
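
The server count 20*(DK/4) can be worked through in a few lines. The sketch assumes that each aggregation switch uses half of its D ports toward the TORs and that each TOR has two uplinks, which is one way to arrive at DK/4 TORs; the function name and the example port counts are illustrative assumptions.

```python
def vl2_server_count(d_ports: int, k_aggr: int, servers_per_tor: int = 20) -> int:
    """Servers supported by K aggregation switches with D ports each.

    Assumed derivation: each aggregation switch has D/2 ports facing TORs,
    and each TOR connects to 2 aggregation switches, so
    TORs = K * (D / 2) / 2 = D * K / 4.
    """
    tors = d_ports * k_aggr // 4
    return servers_per_tor * tors


# Hypothetical sizing: 144 aggregation switches with 144 ports each.
print(vl2_server_count(144, 144))
```

With these example numbers the formula gives 20 * 5184 = 103,680 servers, so a network built from modest port counts already reaches the "hundreds of thousands of servers" scale mentioned at the start.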
Cope with arbitrary TMs with very little overhead
[ ECMP + IP Anycast ]
• Harness huge bisection bandwidth
• Obviate esoteric traffic engineering or optimization
• Ensure robustness to failures
• Work with switch mechanisms available today
1. Must spread traffic
2. Must ensure dst independence
Equal Cost Multi Path Forwarding
[Figure: ToR switches T1 to T6 below a layer of intermediate switches that all share the anycast address IANY; links are marked as used for up paths or for down paths. Sender x reaches y and z with packets whose header stack reads IANY, then the destination ToR (T3 or T5 in the overlaid labels), then the destination server (y or z), then the payload.]
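
The two requirements in the figure (spread traffic, and do so independently of the destination) can be sketched as a per-flow hash that picks an intermediate switch. In the design described here the spreading is done by ECMP in the switches toward the anycast address; the code below only emulates that choice in software. The switch names, the 5-tuple layout, and the use of SHA-256 are assumptions for illustration.

```python
import hashlib
from collections import Counter


def pick_intermediate(flow, intermediates):
    """Map a flow 5-tuple to one intermediate switch.

    All packets of one flow hash to the same switch (no reordering),
    while different flows spread roughly uniformly across switches.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(intermediates)
    return intermediates[index]


intermediates = ["I1", "I2", "I3"]  # hypothetical intermediate switches
# Hypothetical flows: (src IP, dst IP, protocol, src port, dst port)
flows = [("10.0.0.1", "10.0.1.1", 6, 40000 + i, 80) for i in range(3000)]

load = Counter(pick_intermediate(f, intermediates) for f in flows)
print(load)
```

Even though every flow in the example has the same source and destination, the flows still spread over all intermediate switches, because the choice depends on the whole flow identifier rather than on the destination alone.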

How smart servers use dumb switches: encapsulation.


Commodity switches have simple forwarding primitives.
Complexity moved to servers -- computing the headers.
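
The header computation done at the server can be sketched as nested packets: the outermost header targets the intermediate anycast address, the next targets the destination ToR, and the innermost carries the destination server's own address. The `Packet` class and function names are hypothetical, and real encapsulation would use IP-in-IP headers rather than Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dst: str         # address the network forwards on
    payload: object  # inner packet or application data


def encapsulate(data, dst_server, dst_tor, anycast="IANY"):
    """Wrap data in three headers: anycast -> destination ToR -> server."""
    inner = Packet(dst=dst_server, payload=data)
    to_tor = Packet(dst=dst_tor, payload=inner)
    return Packet(dst=anycast, payload=to_tor)


def decapsulate(packet):
    """Strip the outermost header, as each hop in the path does."""
    return packet.payload


pkt = encapsulate(b"payload", dst_server="y", dst_tor="T3")
at_intermediate = decapsulate(pkt)      # intermediate switch removes IANY
at_tor = decapsulate(at_intermediate)   # ToR T3 removes its own header
print(pkt.dst, at_intermediate.dst, at_tor.dst)
```

Each switch only needs to forward on the outermost destination and strip one header, which matches the simple forwarding primitives of commodity switches.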
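[Figure: directory system. Agents on the servers send “Lookup” and “Update” requests to directory servers (DS), which are backed by replicated state machine (RSM) servers.
Lookup path: 1. Lookup, 2. Reply.
Update path: 1. Update, 2. Set, 3. Replicate, 4. Ack, 5. Ack, (6. Disseminate).]

Switches run link-state routing and maintain only switch-level topology (LAs).
Servers use flat names (AAs).
[Figure: servers x, y, z attached to ToR switches (ToR1 to ToR4). A sender's packets are encapsulated as "ToR3 | y | payload" and "ToR3 or ToR4 | z | payload" (the slide overlays both labels for z) after a Lookup & Response exchange with the Directory Service.]
Directory Service:
x -> ToR2
y -> ToR3
z -> ToR3 / ToR4 (labels overlaid in the slide)
Lookup & Response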
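
The lookup and update flow can be sketched as a name-to-location map with a cache in the per-server agent. This is a minimal single-process sketch: replication across RSM servers, acknowledgements, and dissemination are left out, and the class and method names are hypothetical. The move of z from ToR4 to ToR3 is an assumed example of a migration.

```python
class DirectoryService:
    """Maps a server's flat name (AA) to the ToR it sits behind (LA)."""

    def __init__(self):
        self._aa_to_la = {}

    def update(self, aa, tor_la):
        self._aa_to_la[aa] = tor_la

    def lookup(self, aa):
        return self._aa_to_la.get(aa)


class Agent:
    """Per-server agent that resolves names and caches the answers."""

    def __init__(self, directory):
        self._directory = directory
        self._cache = {}

    def resolve(self, aa):
        if aa not in self._cache:
            self._cache[aa] = self._directory.lookup(aa)
        return self._cache[aa]

    def invalidate(self, aa):
        self._cache.pop(aa, None)


directory = DirectoryService()
for name, tor in (("x", "ToR2"), ("y", "ToR3"), ("z", "ToR4")):
    directory.update(name, tor)

agent = Agent(directory)
first = agent.resolve("z")       # cached after the first lookup
directory.update("z", "ToR3")    # assumed migration of z to another rack
stale = agent.resolve("z")       # still the cached location
agent.invalidate("z")
fresh = agent.resolve("z")       # re-resolved after invalidation
print(first, stale, fresh)
```

The stale answer in the middle shows why an update path needs some way to reach caches; in the figure that role is played by the optional "Disseminate" step.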


Data center OSes are already heavily modified for
VMs, storage clouds, etc.
No change to apps or clients outside DC.

Uniform high capacity:
 All-to-all data shuffle stress test:
▪ 75 servers, each delivering 500 MB to every other server
▪ Maximal achievable goodput is 62.3 Gbps
▪ VL2 network efficiency is 58.8/62.3 = 94%
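
The shuffle figures can be checked with simple arithmetic. The sketch assumes that "deliver 500MB" means each of the 75 servers sends 500 MB to each of the other 74, and that both goodput numbers are in the same units; the variable names are illustrative.

```python
servers = 75
mb_per_pair = 500  # assumed: 500 MB from each server to each other server

# Total data moved in the all-to-all shuffle, in decimal terabytes.
total_tb = servers * (servers - 1) * mb_per_pair / 1e6

# Efficiency as measured goodput over maximal achievable goodput.
efficiency = 58.8 / 62.3

print(f"total shuffled: {total_tb:.3f} TB")
print(f"efficiency: {efficiency:.1%}")
```

Under that assumption the shuffle moves a little under 2.8 TB in total, and the ratio rounds to the 94% quoted above.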

Performance isolation:
 Two types of services:
▪ Service one: 18 servers each run a single TCP transfer all the time
▪ Service two (first experiment): 19 servers each start an 8 GB transfer over TCP every 2 seconds
▪ Service two (second experiment): 19 servers generate bursts of short TCP connections

Convergence after link failures
 75 servers
 All-to-all data shuffle
 Disconnect links between intermediate and aggregation switches