System Performance & Scalability
i206 Fall 2010
John Chuang
http://bits.blogs.nytimes.com/2007/11/26/yahoos-cybermonday-meltdown/index.html
Computing Trends
Multi-core CPUs
Data centers
Cloud computing
What are the drivers?
- scalability, availability, cost-effectiveness
[Figure: clients accessing a service provided by multiple servers]
Lecture Outline
Performance Metrics
Availability
Queuing theory
- M/M/1 queue
Scalability
- M/M/m queue
What is Performance?
Users want fast response time and high
availability
Managers want happy users, and many of
them, while minimizing cost
What are standard measures of system
performance?
Performance Metrics
Response time (seconds)
Throughput (MIPS, Mbps, TPS, ...)
Resource utilization (%)
Availability (%)
Availability
Availability = MTTF / (MTTF + MTTR)
- Mean time to failure (MTTF)
- Mean time to recover (MTTR)
Availability    Down-time per year    One hour down-time per:
90%             36 days               9 hours
99%             3.7 days              4.1 days
99.9%           9 hours               41.6 days
99.99%          53 minutes            1.14 years
99.999%         5 minutes             11.41 years
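The availability formula and the down-time column can be reproduced with a few lines of Python. This is a minimal sketch: the function names `availability` and `downtime_per_year_hours` are illustrative (not from the lecture), and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def availability(mttf: float, mttr: float) -> float:
    """Availability = MTTF / (MTTF + MTTR); both in the same time unit."""
    return mttf / (mttf + mttr)


def downtime_per_year_hours(avail: float) -> float:
    """Expected hours of down-time per year at a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    for a in (0.90, 0.99, 0.999, 0.9999, 0.99999):
        hours = downtime_per_year_hours(a)
        print(f"{a:.3%} available -> {hours:8.2f} hours down per year")
```

For example, a system that runs 99 hours between failures and takes 1 hour to recover has `availability(99, 1)` of about 0.99.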
Response Time
[Figure: timeline of one request across client, network, and server]
Client: formulate request
Network: message latency
Server: queuing time, then processing time
Network: message latency
Client: interpret response
Adapted from: David Messerschmitt
[Figure: M/M/1 queue (μ = 100), response time (s) versus utilization]
Queuing Theory
1. Arrival Process
2. Service Time Distribution
3. Number of Servers
4. System Capacity
5. Customer Population
6. Service Discipline
Source: Raj Jain
Kendall’s Notation (1953)
A/B/c/k/N/D
- A: arrival process
- B: service time distribution
- c: number of servers
- k: system capacity
- N: population size
- D: service discipline
Distributions (A and B):
- M: Markov (exponential, memoryless, random, Poisson)
- D: deterministic
- E: Erlang
- H: hyper-exponential
- G: general
Service disciplines (D):
- FCFS: first come first served
- FCLS: first come last served
- RR: round-robin
- etc.
Example Systems
M/M/1/∞/∞/FCFS (simplified as M/M/1)
- Markovian (Poisson, memoryless) arrival
- Markovian service time
- 1 server
- Infinite system capacity
- Infinite arrival stream
- First-come-first-served discipline
Other examples:
- M/M/1/k (finite capacity)
- M/M/m (m servers)
- G/D/1 (arbitrary arrival, deterministic service time)
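The shorthand convention (omitted trailing fields default to infinite capacity, infinite population, and FCFS) can be sketched as a small parser. The names `Kendall` and `parse_kendall` are hypothetical, and the string "inf" is just a stand-in for ∞ in this illustration.

```python
from typing import NamedTuple


class Kendall(NamedTuple):
    arrival: str      # A: arrival process
    service: str      # B: service time distribution
    servers: str      # c: number of servers
    capacity: str     # k: system capacity
    population: str   # N: population size
    discipline: str   # D: service discipline


def parse_kendall(notation: str) -> Kendall:
    """Expand a shorthand such as 'M/M/1' to all six fields."""
    fields = notation.split("/")
    if not 3 <= len(fields) <= 6:
        raise ValueError(f"expected 3 to 6 fields, got {len(fields)}")
    defaults = ["inf", "inf", "FCFS"]  # capacity, population, discipline
    # Fields given beyond the first three replace the defaults in order.
    tail = fields[3:] + defaults[len(fields) - 3:]
    return Kendall(*fields[:3], *tail)
```

With this sketch, `parse_kendall("M/M/1")` and `parse_kendall("M/M/1/inf/inf/FCFS")` return the same tuple, while `parse_kendall("M/M/1/k")` keeps `k` as the capacity field.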
M/M/1 Queue
Poisson arrival, with average arrival rate of λ jobs/sec
Poisson service, with average service rate of μ jobs/sec
Single server with infinite queue
System utilization (hopefully < 1):
ρ = λ/μ
Average number of jobs in system:
N = Σ n·p_n = ρ/(1 - ρ)
System throughput (if ρ < 1):
X = λ
Average response time (from Little's Law):
R = N/X = 1/(μ - λ)
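The closed forms for N and R follow from the steady-state distribution of the M/M/1 queue. The derivation below is the standard one, added here as a sketch. It assumes utilization ρ = λ/μ < 1 (λ is the arrival rate, μ the service rate) and writes p_n for the probability of finding n jobs in the system.

```latex
\begin{align*}
p_n &= (1-\rho)\,\rho^n, \qquad n = 0, 1, 2, \ldots \\
N &= \sum_{n=0}^{\infty} n\,p_n
   = (1-\rho)\sum_{n=0}^{\infty} n\,\rho^n
   = (1-\rho)\,\frac{\rho}{(1-\rho)^2}
   = \frac{\rho}{1-\rho} \\
R &= \frac{N}{X}
   = \frac{\rho}{\lambda\,(1-\rho)}
   = \frac{1/\mu}{1-\lambda/\mu}
   = \frac{1}{\mu-\lambda}
\end{align*}
```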
Example: Web Server
Web server receives 40 requests/second
Web server can process 100 requests/second
What is server utilization?
At any given time, how many requests are at
server (waiting plus being processed)?
What is the mean total delay at server (waiting
plus processing)?
What happens when traffic rate doubles?
Example: Web Server
λ = 40 requests/second
μ = 100 requests/second
Utilization = ρ = λ/μ = 40/100 = 40%
# of requests = N = ρ/(1 - ρ) = 0.67
Average time spent at server = R = N/X = 0.67/40 = 17ms
Example: Traffic Doubled
λ = 80 requests/second
μ = 100 requests/second
Utilization = ρ = λ/μ = 80/100 = 80%
# of requests = N = ρ/(1 - ρ) = 4
Average time spent at server = R = N/X = 4/80 = 50ms (more than doubled!)
Approaching Congestion
λ = 99 requests/second
μ = 100 requests/second
Utilization = ρ = λ/μ = 99/100 = 99%
# of requests = N = ρ/(1 - ρ) = 99
Average time spent at server = R = N/X = 99/99 = 1 second!
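The three web-server scenarios (40, 80, and 99 requests/second against a service rate of 100) can be recomputed with a short script. This is a minimal sketch of the M/M/1 formulas; the function name `mm1` and its parameter names are illustrative.

```python
def mm1(arrival_rate: float, service_rate: float) -> tuple[float, float, float]:
    """Return (utilization, jobs in system, response time in seconds) for M/M/1."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    rho = arrival_rate / service_rate
    n = rho / (1.0 - rho)
    r = n / arrival_rate  # Little's Law with throughput X = arrival rate
    return rho, n, r


if __name__ == "__main__":
    for lam in (40, 80, 99):
        rho, n, r = mm1(lam, 100)
        print(f"lambda={lam:3d}/s  rho={rho:.0%}  N={n:6.2f}  R={r * 1000:7.1f} ms")
```

Running the loop shows the response time climbing from about 17 ms to 50 ms to 1 second as utilization moves from 40% to 80% to 99%.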
Utilization Affects Performance
[Figure: M/M/1 queue (μ = 100), response time (s) versus utilization]
M/M/1/k Queue (Finite Capacity)
ρ = λ/μ
N = ρ/(1 - ρ) - (k + 1)ρ^(k+1)/(1 - ρ^(k+1))
R = N/X = N/λ_eff
- where λ_eff = λ(1 - P_k) = effective arrival rate
- and P_k = ρ^k (1 - ρ)/(1 - ρ^(k+1)) = probability of a full queue
Loss rate = λ - λ_eff
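The finite-capacity formulas translate directly into code. The sketch below assumes k counts every job in the system (waiting plus in service). The function name `mm1k` and the dictionary keys are illustrative. The special case for ρ = 1 is an addition, since the closed forms are 0/0 at that point.

```python
def mm1k(arrival_rate: float, service_rate: float, k: int) -> dict[str, float]:
    """Metrics for an M/M/1/k queue, where k is the total system capacity."""
    rho = arrival_rate / service_rate
    if rho == 1.0:
        # Limiting values: all k + 1 states are equally likely.
        p_full = 1.0 / (k + 1)
        n = k / 2.0
    else:
        p_full = rho**k * (1 - rho) / (1 - rho ** (k + 1))
        n = rho / (1 - rho) - (k + 1) * rho ** (k + 1) / (1 - rho ** (k + 1))
    effective_rate = arrival_rate * (1 - p_full)  # jobs actually admitted
    return {
        "rho": rho,
        "jobs_in_system": n,
        "throughput": effective_rate,
        "response_time": n / effective_rate,
        "loss_rate": arrival_rate - effective_rate,
    }
```

With k = 1 there is no waiting room, so the response time equals the service time 1/μ at any load. As k grows, the results approach those of the unbounded M/M/1 queue.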
M/M/1/k Response Time
[Figure: M/M/1 and M/M/1/k queues (μ = 100), response time (s) versus utilization, with curves for M/M/1, M/M/1/1, M/M/1/2, M/M/1/10, and M/M/1/100]
M/M/1/k Throughput
[Figure: throughput (jobs/sec) versus utilization given service rate μ = 100 jobs/sec, with curves for M/M/1, M/M/1/1, M/M/1/2, M/M/1/10, and M/M/1/100]
Lecture Outline
Performance Metrics
Availability
Queuing theory
- M/M/1 queue
Scalability
- M/M/m queue
Scalability
The capability of a system to increase
total throughput under an increased load
when resources (typically hardware) are
added
- Cost of additional resource
- Performance degradation under increased
load
Scalability Example
Original web server: can process μ
requests/sec; accepts requests at λ/sec
Now request rate increases to 10λ/sec
and web server is swamped (ρ = 10λ/μ)!
Need to add new hardware!
Which is better?
Option 1: One big web server that can process
10μ requests/sec
Option 2: Ten web servers, each can process μ
requests/sec; each accepts 10% of requests
(λ/sec per server)
Option 3: Ten web servers, each can process μ
requests/sec; share single queue (load
balancer) that accepts requests at 10λ/sec
[Figure: the three configurations]
Option 1: M/M/1 queue with big server (arrival rate 10λ, service rate 10μ)
Option 2: ten M/M/1 queues (each with arrival rate λ, service rate μ)
Option 3: M/M/10 queue (arrival rate 10λ into one queue, ten servers of rate μ each)
M/M/m Queue (m Servers)
ρ = λ/(mμ)
N = mρ + ρφ/(1 - ρ)
where φ is the probability that an arriving job has to queue:
φ = (mρ)^m / (m! (1 - ρ)) · p_0
and p_0 is the probability of an empty system:
p_0 = 1 / [ Σ_{n=0}^{m-1} (mρ)^n / n! + (mρ)^m / (m! (1 - ρ)) ]
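A minimal sketch of the M/M/m computation follows. It assumes the standard Erlang C expression for the probability that an arrival must queue, and it takes the arrival rate to be the total rate into the shared queue. The function name `mmm` and its parameter names are illustrative.

```python
from math import factorial


def mmm(arrival_rate: float, service_rate: float, servers: int) -> tuple[float, float, float]:
    """Return (utilization, jobs in system, response time) for an M/M/m queue.

    arrival_rate is the total rate into the shared queue;
    service_rate is the rate of each individual server.
    """
    a = arrival_rate / service_rate          # offered load, equal to m * rho
    rho = a / servers
    if rho >= 1.0:
        raise ValueError("unstable queue: need arrival_rate < servers * service_rate")
    queued_term = a**servers / (factorial(servers) * (1.0 - rho))
    p0 = 1.0 / (sum(a**n / factorial(n) for n in range(servers)) + queued_term)
    p_wait = queued_term * p0                # Erlang C: probability an arrival must queue
    n_jobs = servers * rho + rho * p_wait / (1.0 - rho)
    return rho, n_jobs, n_jobs / arrival_rate  # Little's Law gives the response time
```

With ten servers of 100 requests/sec each and 500 requests/sec arriving, `mmm(500, 100, 10)` gives roughly N = 5.04 and R = 10.07 ms. With a single server the function reduces to the M/M/1 result.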
Which is Better?
m = 10; μ = 100; λ = 50

                          Option 1       Option 2       Option 3
                          (M/M/1 big)    (ten M/M/1)    (M/M/10)
Utilization (ρ)           0.5            0.5            0.5
Number of requests (N)    1              1*10           5.036
Response Time (R)         2ms            20ms           10.07ms
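The entries can be checked by hand from the queue formulas, taking the total arrival rate to be 10λ = 500 requests/sec. The arithmetic below is a worked check. The value N = 5.036 for Option 3 is taken from the comparison itself rather than re-derived.

```latex
\begin{align*}
\text{Option 1:}\quad \rho &= \frac{10\lambda}{10\mu} = \frac{500}{1000} = 0.5, &
N &= \frac{\rho}{1-\rho} = 1, &
R &= \frac{N}{10\lambda} = \frac{1}{500}\,\text{s} = 2\,\text{ms} \\
\text{Option 2:}\quad \rho &= \frac{\lambda}{\mu} = \frac{50}{100} = 0.5, &
N &= 10 \times \frac{\rho}{1-\rho} = 10, &
R &= \frac{1}{\mu-\lambda} = \frac{1}{50}\,\text{s} = 20\,\text{ms} \\
\text{Option 3:}\quad \rho &= \frac{10\lambda}{m\mu} = \frac{500}{1000} = 0.5, &
N &= 5.036, &
R &= \frac{N}{10\lambda} = \frac{5.036}{500}\,\text{s} \approx 10.07\,\text{ms}
\end{align*}
```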
Remember: Scalability is not just about performance!