18_performance

System Performance & Scalability
i206 Fall 2010
John Chuang
http://bits.blogs.nytimes.com/2007/11/26/yahoos-cybermonday-meltdown/index.html
Computing Trends
• Multi-core CPUs
• Data centers
• Cloud computing
• What are the drivers?
- scalability, availability, cost-effectiveness
[Figure: a Service, with Client and Server nodes]
Lecture Outline
• Performance Metrics
• Availability
• Queuing theory
- M/M/1 queue
• Scalability
- M/M/m queue
What is Performance?
• Users want fast response time and high availability
• Managers want happy users, and many of them, while minimizing cost
• What are standard measures of system performance?
Performance Metrics
• Response time (seconds)
• Throughput (MIPS, Mbps, TPS, ...)
• Resource utilization (%)
• Availability (%)
Availability
Availability = MTTF / (MTTF + MTTR)
- Mean time to failure (MTTF)
- Mean time to recover (MTTR)
Availability   Down-time per year   One hour down-time per:
90%            36 days              9 hours
99%            3.7 days             4.1 days
99.9%          9 hours              41.6 days
99.99%         53 minutes           1.14 years
99.999%        5 minutes            11.41 years
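The availability formula and the down-time column can be checked with a few lines of Python. This is a minimal sketch: the function names and the 365-day year are assumptions made for illustration, not part of the slides.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 8760 hours, ignoring leap years


def availability(mttf: float, mttr: float) -> float:
    """Availability = MTTF / (MTTF + MTTR); both arguments in the same time unit."""
    return mttf / (mttf + mttr)


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours of down-time in one year at the given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# One hour of repair after every 999 hours of operation gives "three nines".
a = availability(mttf=999.0, mttr=1.0)
print(f"availability = {a:.3%}, down-time = {downtime_hours_per_year(a):.2f} hours/year")
```

With MTTR fixed at one hour, an MTTF of 999 hours (about 41.6 days) corresponds to the 99.9% row of the table.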
Response Time
1. Client: formulate request
2. Network: message latency
3. Server: queuing time
4. Server: processing time
5. Network: message latency
6. Client: interpret response
Adapted from: David Messerschmitt
[Figure: M/M/1 queue (μ = 100), response time (s) vs. utilization]
Queuing Theory
1. Arrival process
2. Service time distribution
3. Number of servers
4. System capacity
5. Customer population
6. Service discipline
Source: Raj Jain
Kendall’s Notation (1953)
• A/B/c/k/N/D
- A: arrival process
- B: service time distribution
- c: number of servers
- k: system capacity
- N: population size
- D: service discipline
• Distributions (for A and B):
- M: Markov (exponential, memoryless, random, Poisson)
- D: deterministic
- E: Erlang
- H: hyper-exponential
- G: general
• Service disciplines (for D):
- FCFS: first come first served
- FCLS: first come last served
- RR: round-robin
- etc.
Example Systems
• M/M/1/∞/∞/FCFS (simplified as M/M/1)
- Markovian (Poisson, memoryless) arrival
- Markovian service time
- 1 server
- Infinite system capacity
- Infinite arrival stream
- First-come-first-served discipline
• Other examples:
- M/M/1/k (finite capacity)
- M/M/m (m servers)
- G/D/1 (arbitrary arrival, deterministic service time)
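The shorthand rule (omitted trailing fields default to infinite capacity, infinite population, and FCFS) can be sketched as a small parser. The function name `parse_kendall`, the dictionary keys, and the `"inf"` spelling are hypothetical choices for illustration only.

```python
import math


def parse_kendall(notation: str) -> dict:
    """Split an A/B/c/k/N/D string into its six fields, filling in defaults."""
    fields = notation.split("/")
    if not 3 <= len(fields) <= 6:
        raise ValueError(f"expected 3 to 6 fields, got {len(fields)}: {notation!r}")
    # Defaults for omitted trailing fields: infinite capacity,
    # infinite population, first-come-first-served.
    defaults = ["inf", "inf", "FCFS"]
    fields += defaults[len(fields) - 3:]

    def size(text: str):
        # Symbolic values such as "m" or "k" are kept as strings.
        if text in ("inf", "∞"):
            return math.inf
        return int(text) if text.isdigit() else text

    return {
        "arrival": fields[0],
        "service": fields[1],
        "servers": size(fields[2]),
        "capacity": size(fields[3]),
        "population": size(fields[4]),
        "discipline": fields[5],
    }


print(parse_kendall("M/M/1"))
print(parse_kendall("M/M/1/10"))
```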
M/M/1 Queue
• Poisson arrival, with average arrival rate of λ jobs/sec
• Poisson service, with average service rate of μ jobs/sec
• Single server with infinite queue
• System utilization (hopefully < 1): ρ = λ/μ
• Average number of jobs in system: N = Σ n·p_n = ρ/(1 - ρ)
• System throughput (if ρ < 1): X = λ
• Average response time (from Little's Law): R = N/X = 1/(μ - λ)
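The four M/M/1 quantities above translate directly into code. A minimal sketch, assuming rates are given in jobs per second; the function name and dictionary keys are illustrative, not from the slides.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state M/M/1 metrics for arrival rate lambda and service rate mu."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative (arrival) and positive (service)")
    rho = arrival_rate / service_rate          # utilization
    if rho >= 1:
        raise ValueError("queue is unstable: utilization must be below 1")
    jobs_in_system = rho / (1 - rho)           # N = rho / (1 - rho)
    throughput = arrival_rate                  # X = lambda when rho < 1
    # R = N / X by Little's Law, written as 1 / (mu - lambda) so that
    # an arrival rate of zero does not divide by zero.
    response_time = 1.0 / (service_rate - arrival_rate)
    return {
        "utilization": rho,
        "jobs_in_system": jobs_in_system,
        "throughput": throughput,
        "response_time": response_time,
    }


print(mm1_metrics(arrival_rate=40, service_rate=100))
```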
Example: Web Server
• Web server receives 40 requests/second
• Web server can process 100 requests/second
• What is server utilization?
• At any given time, how many requests are at server (waiting plus being processed)?
• What is the mean total delay at server (waiting plus processing)?
• What happens when traffic rate doubles?
Example: Web Server
• λ = 40 requests/second
• μ = 100 requests/second
• Utilization = ρ = λ/μ = 40/100 = 40%
• # of requests = N = ρ/(1 - ρ) = 0.67
• Average time spent at server = R = N/X = 0.67/40 = 17 ms
Example: Traffic Doubled
• λ = 80 requests/second
• μ = 100 requests/second
• Utilization = ρ = λ/μ = 80/100 = 80%
• # of requests = N = ρ/(1 - ρ) = 4
• Average time spent at server = R = N/X = 4/80 = 50 ms (more than doubled!)
Approaching Congestion
• λ = 99 requests/second
• μ = 100 requests/second
• Utilization = ρ = λ/μ = 99/100 = 99%
• # of requests = N = ρ/(1 - ρ) = 99
• Average time spent at server = R = N/X = 99/99 = 1 second!
Utilization Affects Performance
[Figure: M/M/1 queue (μ = 100), response time (s) vs. utilization]
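The response-time curve for this slide can be tabulated from R = 1/(μ - λ), rewritten in terms of utilization as R = 1/(μ(1 - ρ)). A rough sketch with μ = 100 as on the slide; the sample utilization values and the function name are arbitrary choices for illustration.

```python
def mm1_response_time(service_rate: float, utilization: float) -> float:
    """M/M/1 mean response time, R = 1 / (mu * (1 - rho))."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (service_rate * (1.0 - utilization))


MU = 100.0  # service rate in jobs/sec, as on the slide
for rho in (0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95, 0.99):
    print(f"rho = {rho:4.2f}   R = {mm1_response_time(MU, rho):.4f} s")
```

Response time stays close to the bare service time 1/μ at low utilization and grows without bound as ρ approaches 1.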
M/M/1/k Queue (Finite Capacity)
• ρ = λ/μ
• N = ρ/(1 - ρ) - (k+1)ρ^(k+1)/(1 - ρ^(k+1))
• R = N/X = N/λ_eff
- where λ_eff = λ(1 - P_k) = effective arrival rate
- and P_k = ρ^k(1 - ρ)/(1 - ρ^(k+1)) = probability of a full queue
• Loss rate = λ - λ_eff
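The finite-capacity formulas can be sketched the same way. The function and key names are illustrative. The ρ = 1 branch is not on the slide: it uses the standard limiting values (P_k = 1/(k+1), N = k/2), added here as an assumption to avoid a 0/0 in the closed forms.

```python
def mm1k_metrics(arrival_rate: float, service_rate: float, k: int) -> dict:
    """Steady-state M/M/1/k metrics; k is the system capacity (jobs in system)."""
    if arrival_rate <= 0 or service_rate <= 0 or k < 1:
        raise ValueError("rates must be positive and k must be at least 1")
    rho = arrival_rate / service_rate
    if rho == 1.0:
        # Assumption (not on the slide): limiting case, all k+1 states equally likely.
        p_full = 1.0 / (k + 1)
        jobs_in_system = k / 2.0
    else:
        p_full = rho**k * (1 - rho) / (1 - rho ** (k + 1))
        jobs_in_system = rho / (1 - rho) - (k + 1) * rho ** (k + 1) / (1 - rho ** (k + 1))
    effective_rate = arrival_rate * (1 - p_full)   # lambda_eff: arrivals that get in
    return {
        "utilization": rho,
        "p_full": p_full,
        "jobs_in_system": jobs_in_system,
        "effective_arrival_rate": effective_rate,
        "response_time": jobs_in_system / effective_rate,  # R = N / lambda_eff
        "loss_rate": arrival_rate - effective_rate,
    }


for k in (1, 2, 10, 100):
    r = mm1k_metrics(arrival_rate=80, service_rate=100, k=k)
    print(f"k = {k:3d}   R = {r['response_time']:.4f} s   loss = {r['loss_rate']:.2f} jobs/sec")
```

A small k caps response time by turning jobs away; as k grows, the results approach the plain M/M/1 values.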
M/M/1/k Response Time
[Figure: M/M/1 and M/M/1/k queues (μ = 100), response time (s) vs. utilization; curves for M/M/1, M/M/1/1, M/M/1/2, M/M/1/10, M/M/1/100]
M/M/1/k Throughput
[Figure: throughput (jobs/sec) vs. utilization, given service rate μ = 100 jobs/sec; curves for M/M/1, M/M/1/1, M/M/1/2, M/M/1/10, M/M/1/100]
Lecture Outline
• Performance Metrics
• Availability
• Queuing theory
- M/M/1 queue
• Scalability
- M/M/m queue
Scalability
• The capability of a system to increase total throughput under an increased load when resources (typically hardware) are added
- Cost of additional resource
- Performance degradation under increased load
Scalability Example
• Original web server: can process μ requests/sec; accepts requests at λ/sec
• Now request rate increases to 10λ/sec and web server is swamped (ρ = 10λ/μ)!
• Need to add new hardware!
Which is better?
• Option 1: One big web server that can process 10μ requests/sec
• Option 2: Ten web servers, each can process μ requests/sec; each accepts 10% of requests (λ/sec per server)
• Option 3: Ten web servers, each can process μ requests/sec; share single queue (load balancer) that accepts requests at 10λ/sec
[Figure: queue diagrams for the three options]
Option 1: M/M/1 queue with big server (arrival rate 10λ, service rate 10μ)
Option 2: ten M/M/1 queues (each with arrival rate λ, service rate μ)
Option 3: M/M/10 queue (arrival rate 10λ, ten servers each with service rate μ)
M/M/m Queue (m Servers)
• ρ = λ/(mμ)
• N = mρ + ρφ/(1 - ρ)
where [equation image not recovered] and [equation image not recovered]
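The definitions that accompany N on this slide were images and are not available here. The sketch below assumes they are the standard M/M/m (Erlang C) results: φ is the probability that an arriving job has to queue, computed from the empty-system probability p0. The function name, key names, and that choice of formulas are assumptions, not recovered from the slide.

```python
from math import factorial


def mmm_metrics(arrival_rate: float, service_rate: float, m: int) -> dict:
    """Steady-state M/M/m metrics; arrival_rate is the total rate into the shared queue."""
    if arrival_rate <= 0 or service_rate <= 0 or m < 1:
        raise ValueError("rates must be positive and m must be at least 1")
    rho = arrival_rate / (m * service_rate)    # per-server utilization
    if rho >= 1:
        raise ValueError("queue is unstable: utilization must be below 1")
    a = m * rho                                # offered load, lambda / mu
    # Assumed standard Erlang C terms (not taken from the slide):
    tail = a**m / (factorial(m) * (1 - rho))
    p0 = 1.0 / (sum(a**n / factorial(n) for n in range(m)) + tail)
    p_queue = tail * p0                        # phi: probability an arrival must wait
    jobs_in_system = a + rho * p_queue / (1 - rho)   # N = m*rho + rho*phi/(1 - rho)
    return {
        "utilization": rho,
        "p_queue": p_queue,
        "jobs_in_system": jobs_in_system,
        "response_time": jobs_in_system / arrival_rate,  # Little's Law, X = lambda
    }


# Ten servers at 100 requests/sec each, sharing a total load of 500 requests/sec.
print(mmm_metrics(arrival_rate=500, service_rate=100, m=10))
```

With m = 1 the function reduces to the M/M/1 results, which is a quick sanity check on the assumed formulas.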
Which is Better?
m = 10; μ = 100; λ = 50
                          Option 1       Option 2       Option 3
                          (M/M/1 big)    (ten M/M/1)    (M/M/10)
Utilization (ρ)           0.5            0.5            0.5
Number of requests (N)    1              1*10           5.036
Response Time (R)         2ms            20ms           10.07ms
Remember: Scalability is not just about performance!
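The tenfold gap between Options 1 and 2 follows from the M/M/1 response time R = 1/(μ - λ). A short worked check with the values on this slide (μ = 100, λ = 50), assuming the `amsmath` package for `align*`:

```latex
\begin{align*}
R_{\text{Option 1}} &= \frac{1}{10\mu - 10\lambda}
                     = \frac{1}{10}\cdot\frac{1}{\mu - \lambda}
                     = \frac{1}{1000 - 500} = 2\ \text{ms} \\
R_{\text{Option 2}} &= \frac{1}{\mu - \lambda}
                     = \frac{1}{100 - 50} = 20\ \text{ms} \\
R_{\text{Option 3}} &= \frac{N}{X} = \frac{5.036}{10\lambda}
                     = \frac{5.036}{500} \approx 10.07\ \text{ms}
\end{align*}
```

Option 3 uses Little's Law with the N given in the table; all three options run at the same utilization of 0.5.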