Migration Cost Aware Task Scheduling
18-743 Milestone
Shraddha Joshi, Brian Osbun
10/24/2013
Outline
Problem description
Approach
Milestone I
Methodology
Results
Observations
Future Work
Problem Description
Dynamic schedulers allow task migration during program execution
Migrating threads to new cores can have hidden costs
Cold cache misses
Overhead of moving architectural state
Congestion on interconnect during transfer
Most task schedulers ignore migration overhead
Problem statement: quantify task migration cost and account for it when evaluating scheduling options
Approach
Architectural transfer is essentially fixed cost
Can be modeled independently
Cache effects have a larger and variable cost
Published theory requires static analysis*
Determination of “useful” blocks in cache
Reaching memory blocks (RMB): all blocks that may be in cache at a point
Live memory blocks (LMB): all blocks that may be referenced before eviction
Usefulness is intersection of RMB and LMB
Dynamic analysis
Need prediction!
For now: everything’s useful!
*Source: Hardy et al., "Estimation of Cache Related Migration Delays for Multi-Core Processors with Shared Instruction Caches"
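A minimal sketch of the usefulness definition above, using Python sets. The block addresses are hypothetical; in the published approach the RMB and LMB sets come from static analysis, which this sketch does not perform. The function names are illustrative only.

```python
# Hypothetical block addresses, for illustration only.
reaching_blocks = {0x100, 0x140, 0x180, 0x1C0}  # RMB: may be in cache at the migration point
live_blocks = {0x140, 0x1C0, 0x200}             # LMB: may be referenced before eviction


def useful_blocks(rmb, lmb):
    """Blocks that may be cached AND may be reused: the intersection of RMB and LMB."""
    return rmb & lmb


def everything_useful(rmb):
    """Conservative placeholder ("everything's useful"): treat every cached block as useful."""
    return set(rmb)


useful = useful_blocks(reaching_blocks, live_blocks)
print(sorted(hex(b) for b in useful))
print(len(everything_useful(reaching_blocks)), "blocks under the conservative assumption")
```

The conservative assumption can only overestimate the number of blocks that must be refetched after a migration.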
Milestone I
Goal: quantify migration cost with a concrete metric
Our predictor: past cache accesses
Our prediction: future cache misses
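One way to picture the metric is as "additional misses": misses on a cold cache after migration minus misses the same accesses would have seen on the warm cache. The sketch below assumes a fully associative LRU cache and hypothetical traces of block addresses. It is a toy model, not the Sniper setup used for the measurements.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache of block addresses (a simplifying assumption)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()

    def access(self, block):
        """Return True on a hit, False on a miss, and update the LRU order."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        self.blocks[block] = None
        if len(self.blocks) > self.num_blocks:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


def count_misses(cache, trace):
    return sum(0 if cache.access(block) else 1 for block in trace)


def additional_misses(before, after, num_blocks):
    """Cold-cache misses minus warm-cache misses for the post-migration accesses."""
    warm = LRUCache(num_blocks)
    count_misses(warm, before)  # warm the cache with the pre-migration accesses
    warm_misses = count_misses(warm, after)
    cold_misses = count_misses(LRUCache(num_blocks), after)
    return cold_misses - warm_misses


# Hypothetical traces: block addresses touched before and after the migration.
before = [0, 1, 2, 3] * 4
after = [0, 1, 2, 3, 4, 5]
print(additional_misses(before, after, num_blocks=8))
print(additional_misses(before, after, num_blocks=2))
```

In this toy example the 8-block cache loses four hits to the migration, while the 2-block cache loses none because its contents are evicted before they can be reused.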
Methodology
Sniper simulation framework
Clustered architecture
Homogeneous cores
Clustered L1 caches (shared by 4 cores)
Shared L2 cache
Two benchmarks
Radix (SPLASH-2) (compute intensive)
x264 (PARSEC) (memory intensive)
Variables
Time interval before and after migration
Cache size and parameters
Results
Benchmark Comparison
[Figure: Additional Misses (y-axis, -50 to 350) vs. Previous Accesses (x-axis, 0 to 90,000). Series: Radix, x264, each with a logarithmic trendline: Log. (Radix), Log. (x264)]
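The "Log." series in the legend suggest logarithmic trendlines of the form y = a ln(x) + b, with x as previous accesses and y as additional misses. A minimal sketch of such a least-squares fit follows. The data points are synthetic, generated from a known curve, and are NOT the measured results; the function name is hypothetical.

```python
import math


def fit_log(xs, ys):
    """Least-squares fit of y = a * ln(x) + b. Returns (a, b)."""
    ts = [math.log(x) for x in xs]
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    a = sxy / sxx
    b = mean_y - a * mean_t
    return a, b


# Synthetic points generated from y = 40 ln(x) - 150 (illustrative, not measured).
xs = [10000, 20000, 40000, 80000]
ys = [40.0 * math.log(x) - 150.0 for x in xs]

a, b = fit_log(xs, ys)
print(round(a, 6), round(b, 6))
```

Because the points lie exactly on the generating curve, the fit recovers its coefficients; measured data would scatter around the trendline.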
Results
Cache Comparison
[Figure: Additional Misses (y-axis, -50 to 450) vs. Previous Accesses (x-axis, 0 to 90,000). Series: 32k 4way, 32k 8way, 16k 4way, 16k 8way, each with a logarithmic trendline]
Observations
Large cost variance between applications
Compute-intensive vs. memory-intensive
Radix: 0.115 accesses per instruction
x264: 0.368 accesses per instruction
Some sensitivity to cache parameters
A smaller cache may evict the contents anyway, with or without migration
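A quick arithmetic check of the access rates above. The rates are taken from the slide; the 100,000-instruction window is a hypothetical interval chosen only for illustration.

```python
# Accesses per instruction, as reported on the slide.
accesses_per_instruction = {"Radix": 0.115, "x264": 0.368}


def expected_accesses(benchmark, instructions):
    """Expected accesses over a window of the given number of instructions."""
    return accesses_per_instruction[benchmark] * instructions


ratio = accesses_per_instruction["x264"] / accesses_per_instruction["Radix"]
print(f"x264 issues {ratio:.1f}x as many accesses per instruction as Radix")

# Hypothetical window of 100,000 instructions.
for name in accesses_per_instruction:
    print(name, round(expected_accesses(name, 100_000)))
```

Over the same instruction window x264 issues about 3.2 times as many accesses as Radix.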
Future Work
Refine accuracy of cost mechanism
Correlate future cost with past predictors
Incorporate migration cost metric into scheduling algorithm
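A minimal sketch of how a migration cost metric might enter a scheduling decision, assuming a simple policy that picks the core with the lowest load plus migration penalty. The function, the load values, and the cost unit are all hypothetical; this is not the algorithm planned for the project.

```python
def pick_core(current_core, core_loads, migration_cost):
    """Return the core that minimizes queued load plus migration penalty.

    core_loads: dict mapping core id -> queued work (hypothetical unit, e.g. cycles)
    migration_cost: penalty in the same unit, charged only if the task moves
    """
    def total_cost(core):
        penalty = 0 if core == current_core else migration_cost
        return core_loads[core] + penalty

    # Ties favor staying on the current core, then the lowest core id.
    return min(core_loads, key=lambda c: (total_cost(c), c != current_core, c))


# Hypothetical loads for three cores; the task currently runs on core 0.
loads = {0: 500, 1: 300, 2: 450}
print(pick_core(0, loads, migration_cost=100))  # moving to core 1 still pays off
print(pick_core(0, loads, migration_cost=250))  # penalty outweighs the lighter load
```

A migration-unaware scheduler corresponds to migration_cost=0 and would always move the task to the least loaded core.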
Q&A