SOSP-yuan.ppt

Energy-Efficient Soft Real-Time CPU Scheduling for Mobile Multimedia Systems
GRACE
Wanghong Yuan, Klara Nahrstedt
Department of Computer Science
University of Illinois at Urbana-Champaign
Mobile Multimedia Devices
Challenges
– Demanding QoS requirements
– Limited system resources
 Energy
Operating system
– Manage resources to conserve energy while supporting quality
 CPU
Opportunities
Hardware adapts performance for energy
– Dynamic frequency/voltage scaling (DVS)
 Energy ∝ speed^2
Applications: soft real-time nature
– Release job periodically
 Soft deadline
– Meet deadline statistically (e.g., 95%)
GRACE-OS
Enhanced CPU scheduler
– Soft real-time scheduling + DVS
 Which app, when, how long and how fast
– Stochastic scheduling decisions
 Minimize energy while providing soft guarantees
Part of the GRACE cross-layer adaptation framework
Overview
[Figure: GRACE-OS sits between the multimedia applications and the CPU. Components: Profiler, SRT Scheduler, Speed Adaptor. Arrow labels: monitoring, stochastic requirements, scheduling, demand distribution, time constraint, speed scaling]
Demand Prediction
1. Online profiling
– Count number of cycles used by each job
2. Online estimation
– Group and count occurrence frequency
[Figure: cumulative probability F(x) = P[X ≤ x], estimated as P[X ≤ bk] at the group boundaries Cmin = b0, b1, b2, ..., bk, ..., br-1, br = Cmax]
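The two steps above can be sketched as follows. This is only an illustration: it assumes equal-width groups between Cmin and Cmax (the slide does not say how groups are chosen), and `estimate_cdf` and its parameters are hypothetical names.

```python
from bisect import bisect_right


def estimate_cdf(cycle_counts, num_groups):
    """Estimate F(b_k) = P[X <= b_k] from profiled per-job cycle counts.

    Returns (boundaries, cdf) with boundaries[0] = Cmin and
    boundaries[-1] = Cmax. Equal-width groups are an assumption.
    """
    c_min, c_max = min(cycle_counts), max(cycle_counts)
    width = (c_max - c_min) / num_groups
    boundaries = [c_min + k * width for k in range(num_groups)] + [c_max]
    ordered = sorted(cycle_counts)
    total = len(ordered)
    # bisect_right counts how many profiled jobs used at most b cycles.
    cdf = [bisect_right(ordered, b) / total for b in boundaries]
    return boundaries, cdf


# Hypothetical profile: cycles (in millions) used by five decoded frames.
boundaries, cdf = estimate_cdf([4, 5, 5, 6, 8], num_groups=4)
```

By construction the last entry of `cdf` is 1, since every profiled job used at most Cmax cycles.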
Observations
[Figure: demand distributions of mpgplay and tmndec. Cumulative probability vs. frame cycles (millions), plotted for the first 50 frames, the first 100 frames, and all frames]
Demand distribution is stable or changes slowly
Stochastic Allocation
How many cycles to allocate per job?
– Worst-case vs. stochastic
– Just enough to meet statistical requirements
Requiring ρ percent of deadlines
 Each job meets deadline with probability ρ
 Allocate C cycles, such that F(C) = P[X ≤ C] ≥ ρ
Find the C
Set C to smallest bk with F(bk) ≥ ρ
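[Figure: cumulative probability F(x) over the group boundaries b0, b1, b2, ..., bk, ..., br-1, br. The stochastic requirement on the probability axis maps to the stochastic allocation C on the cycle axis]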
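The lookup described above (the smallest group boundary whose cumulative probability reaches the required fraction of deadlines) can be sketched as follows. The function name, the argument `rho` for the required probability, and the fallback to Cmax are assumptions made for illustration.

```python
def stochastic_allocation(boundaries, cdf, rho):
    """Return the smallest boundary b_k with F(b_k) >= rho.

    boundaries: increasing group boundaries b_0 .. b_r (cycles)
    cdf:        cdf[k] = F(b_k) = P[X <= b_k]
    rho:        required probability of meeting a deadline, e.g. 0.95
    """
    for b, f in zip(boundaries, cdf):
        if f >= rho:
            return b
    # Assumption: if no boundary qualifies, fall back to the worst case Cmax.
    return boundaries[-1]


# Hypothetical distribution over cycle counts (in millions).
boundaries = [4, 5, 6, 7, 8]
cdf = [0.2, 0.6, 0.8, 0.8, 1.0]
c_95 = stochastic_allocation(boundaries, cdf, 0.95)  # needs the worst case here
c_80 = stochastic_allocation(boundaries, cdf, 0.80)  # a smaller allocation suffices
```

With a coarse distribution like this one, a 95% requirement still selects Cmax; finer groups are what let the allocation drop below the worst case.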
Scheduling
Earliest deadline first (EDF) scheduling
1. Allocate cycle budget per job
2. Execute job with earliest deadline and positive budget
3. Charge budget by number of cycles consumed
 Preempt if budget is exhausted
Which job to execute, when, how long
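A toy simulation of the three steps above, not the kernel implementation. It assumes all jobs are released at time 0 and that the scheduler runs in fixed quanta; `Job`, `run_edf`, and `quantum` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline: float  # absolute deadline
    budget: int      # allocated cycles still available
    demand: int      # cycles the job still needs (not known in advance in practice)


def run_edf(jobs, quantum):
    """Budget-enforced EDF; returns a trace of (job name, cycles run)."""
    trace = []
    while True:
        # Only jobs with a positive budget and remaining work are eligible.
        ready = [j for j in jobs if j.budget > 0 and j.demand > 0]
        if not ready:
            return trace
        job = min(ready, key=lambda j: j.deadline)  # earliest deadline first
        used = min(quantum, job.budget, job.demand)
        job.budget -= used   # charge the budget by cycles consumed
        job.demand -= used
        trace.append((job.name, used))
        # A job whose budget reaches 0 drops out of `ready`: it is preempted.


jobs = [Job("A", deadline=10, budget=3, demand=5),
        Job("B", deadline=20, budget=4, demand=2)]
trace = run_edf(jobs, quantum=2)
```

In this run job A overruns its budget and is preempted with work left, so it cannot delay job B.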
How Fast ?
Intuitively, uniform speed
– Minimum energy if jobs use exactly the allocated cycles
However, jobs use cycles statistically
– Often complete before using up the allocated cycles
– Potential to save more energy
 Stochastic DVS
Stochastic DVS
For each job
1. Allocate time
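2. Find speed Sx for each allocated cycle x
 Time is 1/Sx and energy is (1 - F(x)) Sx^2
 such that the expected energy, the sum over x of (1 - F(x)) Sx^2, is minimized
 while the total time, the sum over x of 1/Sx, fits in the allocated time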
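One way to compute such speeds, sketched under stated assumptions. The problem is read here as: minimize the expected energy (each cycle weighted by the probability that the job reaches it) subject to the total time fitting in the allocated time. A Lagrange-multiplier argument then gives a speed proportional to 1 / (weight)^(1/3). The sketch works per group of cycles rather than per cycle, assumes the boundaries start at cycle 0 and end at the allocation C, and assumes F is below 1 at the start of every group. All names are hypothetical.

```python
def speed_schedule(boundaries, cdf, allocated_time):
    """Return [(start_cycle, speed)] for each group [b_i, b_{i+1}).

    boundaries:     b_0 = 0 < b_1 < ... < b_k = C (cycles)
    cdf:            cdf[i] = F(b_i); must be < 1 for every group start
    allocated_time: time available for the job (seconds)
    Speeds are in cycles per second.
    """
    sizes = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]
    # Probability that the job runs past the start of each group.
    weights = [1.0 - f for f in cdf[:-1]]
    # Minimizing sum(n_i * w_i * S_i**2) subject to sum(n_i / S_i) = T
    # gives S_i = scale / w_i**(1/3) with the scale below.
    scale = sum(n * w ** (1 / 3) for n, w in zip(sizes, weights)) / allocated_time
    return [(b, scale / w ** (1 / 3)) for b, w in zip(boundaries, weights)]


# Hypothetical job: 2 million allocated cycles, 15 ms allocated time,
# and 87.5% of jobs finish within the first million cycles.
schedule = speed_schedule([0, 1_000_000, 2_000_000], [0.0, 0.875, 1.0], 0.015)
```

The schedule starts slow and speeds up, since later cycles are less likely to be executed and so cost less energy in expectation.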
Speed Schedule
Piece-wise approximation over the group boundaries b0, ..., bk
Speed schedule
– List of points (cycle bi, speed Sbi)
– Change speed to Sbi at bi cycles
Example
Speed schedule
cycle:  0         1 x 10^6   2 x 10^6
speed:  100 MHz   200 MHz    400 MHz
Job 1 = 2.6 x 10^6 cycles: 10^6 cycles at 100 MHz, 10^6 cycles at 200 MHz, 6 x 10^5 cycles at 400 MHz
Job 2 = 1.4 x 10^6 cycles: 10^6 cycles at 100 MHz, 4 x 10^5 cycles at 200 MHz
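The example can be replayed with a short sketch that walks a speed schedule and adds up the execution time of a job. `run_time` is a hypothetical helper, and it ignores the cost of changing speed.

```python
def run_time(schedule, cycles_used):
    """Seconds needed to execute `cycles_used` cycles under a speed schedule.

    schedule: list of (start_cycle, speed_hz), sorted, first entry at cycle 0.
    """
    total = 0.0
    for i, (start, speed) in enumerate(schedule):
        if cycles_used <= start:
            break  # the job finished before reaching this speed
        # The segment ends at the next change point, or when the job finishes.
        end = schedule[i + 1][0] if i + 1 < len(schedule) else cycles_used
        total += (min(end, cycles_used) - start) / speed
    return total


# Speed schedule from the example: 100 MHz, then 200 MHz, then 400 MHz.
schedule = [(0, 100e6), (1_000_000, 200e6), (2_000_000, 400e6)]
job1 = run_time(schedule, 2_600_000)
job2 = run_time(schedule, 1_400_000)
```

Job 1 takes about 16.5 ms (10 + 5 + 1.5) and job 2 about 12 ms (10 + 2). Job 2 never reaches the 400 MHz segment.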
Implementation
Hardware: HP N5470 laptop
– Athlon CPU (300, 500, 600, 700, 800, 1000 MHz)
 Round each speed in the schedule up to the next supported speed
GRACE-OS: extension to Linux kernel 2.4.18
[Figure: kernel structure. System calls, the process control block, and the standard Linux scheduler connect through a hook to the SRT-DVS modules (soft real-time scheduling, PowerNow speed scaling)]
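Rounding a computed speed up to the discrete speeds listed for the Athlon CPU can be sketched as follows. The function name and the clamping of requests above 1000 MHz are assumptions.

```python
from bisect import bisect_left

# Speeds supported by the Athlon CPU, as listed on the slide (MHz).
SUPPORTED_MHZ = [300, 500, 600, 700, 800, 1000]


def round_up_speed(mhz, supported=SUPPORTED_MHZ):
    """Return the lowest supported speed that is >= mhz."""
    i = bisect_left(supported, mhz)
    # Assumption: requests above the maximum are clamped to the maximum.
    return supported[min(i, len(supported) - 1)]


rounded = [round_up_speed(s) for s in (100, 500, 501, 1200)]
```

Every request below 300 MHz is served at 300 MHz.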
Evaluation
Compare with deterministic allocation or DVS
allocation    DVS: uniform   DVS: reclamation   DVS: stochastic
worst-case    wrsUni         wrsRec             wrsSto
stochastic    stoUni         stoRec             GRACE-OS
Metrics
Quality support
– Deadline miss ratio
 Applications require meeting 95% of deadlines
Energy conservation
– CPU time distribution at speeds [Flautner02]
 More time at low speeds is better
– Normalized energy
Normalized Energy
[Figure: normalized energy of mpgplay, tmndec, toast, madplay, and the concurrent run under wrsUni, wrsRec, wrsSto, stoUni, stoRec, and GRACE-OS]
GRACE-OS consumes least energy
However, the savings are limited by the few speed options
Time Distribution (concurrent run)
[Figure: percentage of CPU time spent at 300, 500, 600, 700, 800, and 1000 MHz under wrsUni, wrsRec, wrsSto, stoUni, stoRec, and GRACE-OS]
GRACE-OS spends most busy time at the lowest speed
Deadline Miss Ratio
[Figure: deadline miss ratio (%) of mpgplay and the concurrent run under wrsUni, wrsRec, wrsSto, stoUni, stoRec, and GRACE-OS]
GRACE-OS bounds miss ratio
Conclusion
Lessons
– Effective for multimedia applications
 Periodic
 Stable demand distribution
– Limited by few speed options
Future work
– Impact on other resources
– GRACE http://rsim.cs.uiuc.edu/grace
Backup
Power P(s) ∝ s^3
Energy E(s) ∝ s^2
Run at speed 1 for time t, then idle until the deadline at 2t:
E = P(1) x t = t
Run at speed 1/2 for time 2t, finishing at the deadline:
E = P(1/2) x 2t = (1/2)^3 x 2t = t/4
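The arithmetic above can be checked with a few lines. This sketch takes P(s) = s^3 in normalized units and assumes the idle CPU draws no power.

```python
def energy(speed, busy_time):
    """Energy = power x time, with normalized power P(s) = s**3."""
    return speed ** 3 * busy_time


t = 1.0
full_speed = energy(1.0, t)      # finish at t, then idle until the deadline 2t
half_speed = energy(0.5, 2 * t)  # stretch the job to finish exactly at 2t
saving = 1 - half_speed / full_speed
```

Halving the speed doubles the run time but cuts power by a factor of 8, so energy falls to a quarter.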
SRT + DVS
Context switch
1. Store speed for switched-out task
2. Set new speed for switched-in task
[Figure: speed over the execution of jobs A1, B1, and A2, marking a speed-up within a job, a context switch, and the start of a new job]
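The two context-switch steps can be sketched as follows. This is a user-space illustration, not the kernel code: `SpeedContext`, the `set_cpu_speed` callback, and the default speed for a task with no stored speed are hypothetical.

```python
class SpeedContext:
    """Keeps one CPU speed per task across context switches."""

    def __init__(self, set_cpu_speed):
        self.set_cpu_speed = set_cpu_speed  # callback that changes the CPU speed
        self.saved = {}                     # task -> speed when it last ran

    def switch(self, out_task, current_speed, in_task, default_speed):
        # 1. Store the speed for the switched-out task.
        self.saved[out_task] = current_speed
        # 2. Set the speed for the switched-in task; a task with no stored
        #    speed (a new job) starts at default_speed (an assumption).
        new_speed = self.saved.get(in_task, default_speed)
        self.set_cpu_speed(new_speed)
        return new_speed


applied = []
ctx = SpeedContext(applied.append)
ctx.switch("A", 200, "B", default_speed=100)  # B starts at its default speed
ctx.switch("B", 400, "A", default_speed=100)  # A resumes at its stored speed
```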