Model-Based Resource Provisioning
for a Web Service Utility
Ron Doyle*, Jeff Chase, Omer Asad,
Wei Jin, Amin Vahdat
Internet Systems and Storage Group
Department of Computer Science
Duke University
Internet Service Utilities
Shared server cluster
• Web hosting centers
• Shared reserve capacity to
handle surges and failures.
• Service/load multiplexing
• Dynamic provisioning
Service is contractual
• Performance isolation
• Differentiated service
• SLAs
Utility Resource Management
Goal: meet contractual service quality (SLA) targets
under changing load; use resources efficiently.
Approach: assign each hosted service a dynamic
“slice” of resources.
• Combine “slivers” of shared servers, i.e., CPU time
and memory.
Resource containers [Banga99], VMware ESX
[Waldspurger02], PlanetLab
• Assign shares of storage server I/O throughput.
Given the mechanisms for performance isolation and
proportional sharing, how do we set the knobs?
Adaptive Multi-Resource Provisioning
This work addresses resource allocation policy for
multiple resources, with a focus on memory & storage.
1. Provisioning: how much? [Muse SOSP01]
2. Assignment: which servers and storage units?
[Diagram: clients drive a utility data center; a Utility OS executive or service manager collects observations through a Monitor and issues directives through an Actuator.]
Model-Based Provisioning
Resources interact in complex ways to determine
overall service performance.
[Diagram: the resource manager submits candidate allotments to application models and receives performance predictions; the models draw on workload profiles (e.g., access locality) and storage models.]
• Incorporate a model of
application behavior.
• Model predicts effects of
candidate allotments.
• Plan allotments that are
predicted to yield desired
behavior.
• Monitor load and adapt as
load intensity varies.
Goals
Research question: how can a resource manager
incorporate these models when they exist?
Manage multiple resources for diverse system goals:
• Meet SLA targets for response time.
• Use surplus to optimize global average response time, yield, or value.
• Adjust to constraints discovered during assignment (storage-aware caching [Forney03]).
Demonstrate that even simple models are a powerful basis for dynamic resource management.
Non-goals
We are NOT trying to:
• build better models (you can plug in your favorite)
• parameterize or adapt models online from system
observations
• manage network bandwidth
• schedule resources within each slice
• solve the assignment problem (bin-packing)
• allocate resources across the wide area
• make probabilistic performance guarantees
Assume stable average case behavior at each load level,
and provision for average response time.
System Context
[Diagram: clients offer load λ per service through a reconfigurable redirecting switch (Muse [SOSP01]) to a pool of stateless, interchangeable servers backed by a storage tier; MBRP takes load and performance measures as input and issues configuration commands.]
Enforcing Slices
Our prototype uses the Dash Web server [Asad02] to
enforce resource control for slices at user level.
• Based on Flash [Pai99], using DAFS network storage.
Asynchronous I/O from user space to a user-level cache.
Low overhead (zero-copy, etc.) and user-level control.
• Fully asynchronous, event-driven server: “SEDA meets Click.”
• Independently size caches for co-hosted services.
• Request Windows [Jin03]: control the number of
outstanding I/Os on a per-service basis (see the sketch below).
• Dash is part of the utility’s trusted computing base.
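A minimal sketch of the Request Windows idea in Python, assuming a semaphore-based window; RequestWindow and the I/O stand-in are illustrative names, not Dash's actual interfaces:

# Sketch of the Request Windows idea [Jin03]: cap the number of
# outstanding storage I/Os per hosted service, so each service's
# storage throughput is bounded by its window.
import asyncio

class RequestWindow:
    def __init__(self, window_size: int):
        # At most window_size I/Os from this service in flight at once.
        self._slots = asyncio.Semaphore(window_size)

    async def submit(self, io_op):
        async with self._slots:      # block while the window is full
            return await io_op()     # issue the I/O once a slot frees up

async def demo():
    win = RequestWindow(window_size=4)

    async def fake_io():
        await asyncio.sleep(0.01)    # stand-in for a disk/DAFS read
        return b"block"

    # 100 concurrent requests, but at most 4 I/Os outstanding at a time.
    results = await asyncio.gather(*(win.submit(fake_io) for _ in range(100)))
    print(len(results), "I/Os completed")

asyncio.run(demo())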
A Simple Web Service Model
• Streams of requests with stable average-case behavior per request class.
• Varying load intensity λ.
• Provision each stage, and the object cache size M.
• M yields hit rate H; the miss stream flows downstream to storage at rate λS = λ(1 − H).
• Downstream demand grows and shrinks inversely with M.
• Bottlenecks limit demand downstream.
• Generalize to stages or tiers.
[Diagram: requests arrive at rate λ and flow through the CPU and an object cache of size M; misses continue to storage at rate λS.]
Web Cache Model
• Footprint T objects
• Average size S
• Size is independent of
popularity
• Cache M objects
• Given Zipf popularity with parameter α.
• LFU approximation: integrate over the Zipf PDF to get the hit rate

H = (1 − M^(1−α)) / (1 − T^(1−α))

[Plot: hit rate H vs. cache size M (up to 20000 objects), for α = 0.7, 0.9, 1.1.]
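As a quick illustration, here is a Python sketch of the hit-rate formula above, with M and T in cache objects; hit_rate() is a hypothetical helper name, and the α = 1 branch uses the standard ln(M)/ln(T) limit of the formula:

import math

# Hit rate of an LFU cache of M objects over a footprint of T objects
# with Zipf(α) popularity, per the formula above.
def hit_rate(M: float, T: float, alpha: float) -> float:
    if alpha == 1.0:
        return math.log(M) / math.log(T)   # limit of the formula as α → 1
    return (1 - M ** (1 - alpha)) / (1 - T ** (1 - alpha))

# Hit rate flattens as the cache grows; higher α (more skew) hits more.
for alpha in (0.7, 0.9, 1.1):
    print(alpha, round(hit_rate(M=5000, T=20000, alpha=alpha), 3))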
Storage Arrival Rate (IOPS)
• Each miss requires S
I/O operations.
• S determines intensity of
bulk I/O in this service’s
storage load.
• Model predicts storage response time RS for load λS, given an IOPS share φ per service.
• Account for prefetching and sequential locality indirectly.
λS = λ · S · (1 − H)

[Plot: storage arrival rate λS (IOPS) vs. cache size M, for α = 0.7, 0.9, 1.1.]
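A companion sketch for the storage stage: λS from the formula above, plus a simple open-queueing approximation for RS given a share φ. The slide does not spell out the paper's storage model, so the response-time formula here is an assumption for illustration:

# Storage arrival rate: miss fraction (1 - H) times S I/Os per miss.
def storage_iops(lam: float, S: float, H: float) -> float:
    return lam * S * (1 - H)

# Assumed response-time model: per-I/O device service time D (seconds),
# share phi (IOPS); waiting grows as utilization rho = lam_s/phi nears 1.
def storage_response_time(lam_s: float, phi: float, D: float = 0.005) -> float:
    rho = lam_s / phi
    if rho >= 1.0:
        return float("inf")          # share saturated: no stable response time
    return D / (1 - rho)             # simple queueing approximation

lam_s = storage_iops(lam=100.0, S=4.0, H=0.8)         # 80 IOPS of misses
print(lam_s, storage_response_time(lam_s, phi=120.0))  # ~0.015 s per I/O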
An Example using Dash
• IBM 2001 trace segment; load λ grows during the segment.
• Dynamic cache resizing.
• Storage IOPS demand λS matches the model prediction (squint).
• A few transient shifts in request locality.
[Plot: allotted and consumed memory (thousands of MB, left axis) and predicted vs. measured storage IOPS (right axis) over a 50-minute segment.]
A Model-Based Allocator
MBRP is a package of three primitives that coordinate
with an assignment planner.
• Candidate
Plan an initial allotment vector with CPU share and [M, φ]
• LocalAdjust
Adjust a vector to adapt to a resource constraint or
surplus, while staying on target for response time.
• GroupAdjust
Modify a set of vectors to adapt to a fixed resource
constraint or surplus exposed during assignment.
Use any surplus to meet system-wide goals.
Candidate
There is a large space of possible allotment vectors
to meet a given response time target.
Simplify the search space with a simple principle:
Build a balanced system.
• Set the CPU share and storage allotment φ to hit a
preconfigured target utilization level ρ.
The ρ determines response time at storage and CPU.
• Select the minimum M and H that can hit the SLA
target for overall response time.
Refine φ based on M and H and the resulting λS.
Converges quickly.
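A sketch of the Candidate recipe, reusing the hit_rate, storage_iops, and storage_response_time helpers from the earlier sketches; the CPU share is elided and all constants are illustrative, not the paper's algorithm verbatim:

# Candidate: pick phi to hold storage utilization at rho_target, and find
# the smallest cache size M whose hit rate meets the response-time target.
def candidate(lam, S, T, alpha, rho_target, R_target, mem_unit=100):
    for M in range(mem_unit, int(T) + 1, mem_unit):
        H = hit_rate(M, T, alpha)
        lam_s = storage_iops(lam, S, H)     # miss-stream IOPS at this M
        phi = lam_s / rho_target            # refine phi for the new lam_s
        # Per-request storage time: miss fraction * I/Os * per-I/O time.
        R = (1 - H) * S * storage_response_time(lam_s, phi)
        if R <= R_target:
            return {"M": M, "phi": phi, "H": round(H, 3)}
    return None                             # target unreachable: flag it

print(candidate(lam=100, S=4, T=20000, alpha=0.9,
                rho_target=0.7, R_target=0.020))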
LocalAdjust
• LocalAdjust adapts to a constraint in one resource by adding more of another.
• Take as much as you can of the constrained resource, then rebalance to meet the SLA target.
• E.g., in this graph it grows memory to respond to an IOPS constraint. Note: it’s not linear.
[Plot: memory allotment (MB) and storage allotment (φ) vs. arrival rate (λ), comparing the Candidate and LocalAdjust vectors.]
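A sketch of LocalAdjust for the case the graph shows, again reusing the model helpers above: cap φ, then grow M until the predicted response time is back on target. An illustrative reconstruction, not the paper's code:

# LocalAdjust: take what we can of the constrained resource (phi), then
# rebalance by trading memory for the scarce IOPS.
def local_adjust(vec, lam, S, T, alpha, R_target, phi_cap, mem_unit=100):
    phi = min(vec["phi"], phi_cap)          # constrained storage share
    M = vec["M"]
    while M < T:
        H = hit_rate(M, T, alpha)
        lam_s = storage_iops(lam, S, H)
        R = (1 - H) * S * storage_response_time(lam_s, phi)
        if R <= R_target:
            return {"M": M, "phi": phi}
        M += mem_unit                       # grow the cache; misses fall
    return {"M": int(T), "phi": phi}        # whole footprint cached

vec = candidate(lam=100, S=4, T=20000, alpha=0.9,
                rho_target=0.7, R_target=0.020)
print(local_adjust(vec, lam=100, S=4, T=20000, alpha=0.9,
                   R_target=0.020, phi_cap=50))  # more memory, less phi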
GroupAdjust
Input: a set of allotment vectors, with a group constraint or surplus.
Adapt the vectors to conform to the constraint, or use the surplus to meet a global goal.
E.g., the planner mapped all vectors to a shared server, leaving surplus memory.
E.g., for services with the same profiles (α, S, T), prefer the service with the heaviest load.
[Plot: memory allotment (MB) per service vs. available memory (MB), for loads λ = 25, 50, 75, 100.]
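A sketch of GroupAdjust's surplus case, reusing hit_rate from the cache-model sketch (with memory in cache objects rather than MB): greedily give each memory unit to the service that gains the most predicted hits, which for identical profiles favors the heaviest load, as in the plot. Illustrative, not the paper's exact policy:

# GroupAdjust (surplus): spend available memory one unit at a time on the
# service whose predicted hit rate benefits most, weighted by its load.
def group_adjust(services, total_mem, mem_unit=10):
    allot = {name: mem_unit for name in services}   # one unit each to start
    spent = mem_unit * len(services)
    while spent + mem_unit <= total_mem:
        def marginal_hits(name):
            s, M = services[name], allot[name]
            dH = (hit_rate(M + mem_unit, s["T"], s["alpha"])
                  - hit_rate(M, s["T"], s["alpha"]))
            return s["lam"] * dH        # extra hits/sec from one more unit
        best = max(allot, key=marginal_hits)
        allot[best] += mem_unit
        spent += mem_unit
    return allot

# Same profiles (alpha, S, T), different loads, as in the slide's plot.
services = {f"svc-{lam}": {"lam": lam, "alpha": 0.9, "T": 20000}
            for lam in (25, 50, 75, 100)}
print(group_adjust(services, total_mem=100))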
Example: Differentiated Service
Four identical services:
- same load λ
- same profiles (α, S, T)
- same storage units
Different SLA targets.
Provision memory to meet targets first, then optimize global response time. (Give the next unit of surplus memory to the most constrained service.)
[Plot: memory allotment (MB) per service as total memory grows from 45 to 85 MB, with services ranked from highest to lowest SLA response-time target.]
Some Other Results in the Paper
1. GroupAdjust for services with different profiles
and equivalent loads: prefer higher-locality
services.
2. Simple dynamic example to optimize for global
response time in a storage-aware fashion.
3. “Putting it all together” experiment: adjust to
changes in locality, SLA targets, and available
resources as well as changes in load.
4. Handle overload by shifting a co-hosted service to
another server (bin-packing assignment).
5. Preliminary evaluation of storage model.
Conclusion
Models are important for self-managing systems.
MBRP shows how to use models to adapt proactively.
Respond to the changing load signal rather than
reacting to off-target performance measures.
It’s easy to plug better models into the framework.
It seems clear that we can generalize this to a broader
class of systems (e.g., multi-tier) and system goals
(e.g., availability).
But: models may be brittle or just plain wrong (HAL).
Self-managing systems will combine proactive and
reactive mechanisms.
http://issg.cs.duke.edu
http://www.cs.duke.edu/~chase
Assignment Planning
Map services to servers and storage units.
Allocator primitives work in concert with assignment planning.
Bin-packing services, balancing affinity, migration costs, and local constraints/surplus.
[Diagram: a utility center with distributed servers and storage; assignment maps service slices onto them.]
Related Work
Proportional-share schedulers: mechanisms to enforce provisioning
policies.
• Resource Containers [Banga99], Cluster Reserves [Aron00]
Response-time schedulers: meet SLA targets without explicit
partitioning/provisioning.
• Neptune [Shen02], Facade [Lumb03]
Adaptive resource management for servers: reactive, feedback-based
adjustment of server resources.
• Web Server Performance Guarantees [Abdelzaher02], Predictable
Web Server QoS [Aron-PhD], SEDA [Welsh01]
Memory/storage management: goal-directed allotment of resources to
services.
• Storage-Aware Caching [Forney02], Value-Sensitive Caching [Kelly99],
Hippodrome [Anderson02]
Multiple Shared Resources
Bottleneck Behavior
Non-bottleneck resource adjustments have little
effect.
Global Constraints
Services compete for resources in a zero-sum game.
Local Constraints
Service assignment to nodes exposes local resource
constraints.
Caching
Memory allotment affects the storage load for a single
service, impacting the resources available to other
services.
Adaptive Resource Provisioning
Utility OS Services
• Predictable average-case response time
• Resource intensive
Workload Models predict
• Resource Demand
• Resource Interaction
• Effect of allotment decisions
The framework reacts to changes in workload
characteristics, enabling dynamic adaptation.
Outline
Overview
Resource control mechanisms
Web Service Models
Model-Based Allocator
Conclusions