Milestone Slides

Migration Cost Aware Task Scheduling
18-743 Milestone
Shraddha Joshi, Brian Osbun
10/24/2013
Outline

Problem description
Approach
Milestone I
Methodology
Results
Observations
Future Work

Problem Description


Dynamic schedulers allow task migration during program execution
Migrating threads to new cores can have hidden costs
  - Cold cache misses
  - Overhead of moving architectural state
  - Congestion on the interconnect during transfer
Most task schedulers ignore migration overhead
Problem statement: quantify and consider the task migration cost when evaluating scheduling possibilities

Approach

Architectural state transfer is essentially a fixed cost
  - Can be modeled independently
Cache effects have a larger and variable cost
Published theory requires static analysis*
Determination of “useful” blocks in cache
Reaching memory blocks (RMB): all blocks that may be in cache at a given program point
Live memory blocks (LMB): all blocks that may be referenced before eviction
Usefulness is the intersection of RMB and LMB
Dynamic analysis
  - Need prediction!
  - For now: everything’s useful!
*Source: Hardy et al., "Estimation of Cache Related Migration Delays for Multi-Core Processors with Shared Instruction Caches"
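The usefulness definition above can be sketched with plain set operations. This is only an illustration: the block addresses are hypothetical, and `useful_blocks` and `conservative_useful` are names invented here, not part of the cited analysis. The second function mirrors the "everything's useful" placeholder, which treats every reaching block as useful and so gives an upper bound.

```python
def useful_blocks(reaching, live):
    """Blocks that may be in cache now AND may be referenced before eviction."""
    return set(reaching) & set(live)


def conservative_useful(reaching):
    """'Everything's useful': assume every reaching block will be reused."""
    return set(reaching)


# Hypothetical block addresses at one program point (not from the slides).
rmb = {0x100, 0x140, 0x180, 0x1C0}
lmb = {0x140, 0x1C0, 0x200}

useful = useful_blocks(rmb, lmb)
upper_bound = conservative_useful(rmb)

print(sorted(hex(b) for b in useful))
print(len(useful), "useful blocks, at most", len(upper_bound), "under the conservative assumption")
```

The exact intersection can never be larger than the conservative set, so the placeholder can overestimate the cache-related migration cost but never underestimate it.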
Milestone I

Goal: quantify migration cost in terms of a metric
  - Our predictor: past cache accesses
  - Our prediction: future cache misses
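One way to relate the predictor to the prediction is a logarithmic fit, which is the trend-line shape used in the result charts of this deck. The sketch below is an assumption about how such a fit could be done, not the authors' tooling: it fits `misses = a * ln(accesses) + b` by ordinary least squares, and the sample points are hypothetical, not values read from the charts.

```python
import math


def fit_log(xs, ys):
    """Least-squares fit of y = a * ln(x) + b. Returns (a, b). Requires x > 0."""
    ts = [math.log(x) for x in xs]
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    a = sxy / sxx
    b = mean_y - a * mean_t
    return a, b


def predict_misses(a, b, previous_accesses):
    """Predicted additional misses after migration for a given access count."""
    return a * math.log(previous_accesses) + b


# Hypothetical (previous accesses, additional misses) samples.
samples = [(1000, 40), (5000, 95), (20000, 150), (80000, 210)]
a, b = fit_log([x for x, _ in samples], [y for _, y in samples])
print(predict_misses(a, b, 40000))
```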
Methodology


Sniper simulation framework
Clustered architecture
  - Homogeneous cores
  - Clustered L1 caches (each shared by 4 cores)
  - Shared L2 cache

Two benchmarks
  - Radix (SPLASH-2), compute intensive
  - x264 (PARSEC), memory intensive

Variables
  - Time interval before and after migration
  - Cache size and parameters
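The "additional misses" measurement can be illustrated with a toy model. The sketch below is not Sniper and does not reproduce the clustered configuration. It assumes a single set-associative LRU cache with 64-byte lines, a destination cache that starts empty, and a hypothetical looping address trace. Additional misses are counted as misses in a window after migration on the cold cache, minus misses in the same window had the task stayed on its warm cache.

```python
from collections import OrderedDict


class SetAssocCache:
    """Toy set-associative cache with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes, ways, line_bytes=64):
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (ways * line_bytes)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss, updating LRU order."""
        block = addr // self.line_bytes
        lines = self.sets[block % self.num_sets]
        if block in lines:
            lines.move_to_end(block)
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[block] = True
        return False


def count_misses(cache, trace):
    return sum(0 if cache.access(addr) else 1 for addr in trace)


def additional_misses(trace, migrate_at, window, size_bytes, ways):
    """Misses caused by migrating at `migrate_at`, measured over `window` accesses."""
    before = trace[:migrate_at]
    after = trace[migrate_at:migrate_at + window]
    warm = SetAssocCache(size_bytes, ways)
    count_misses(warm, before)        # warm up on the source core
    stay = count_misses(warm, after)  # misses if the task had not moved
    cold = SetAssocCache(size_bytes, ways)
    moved = count_misses(cold, after)  # misses on an empty destination cache
    return moved - stay


# Hypothetical trace: repeated sweep over an 8 KiB working set.
trace = [(i * 64) % 8192 for i in range(4096)]
print(additional_misses(trace, migrate_at=2048, window=1024,
                        size_bytes=32 * 1024, ways=4))  # prints 128
```

In this toy trace the working set is 128 lines and fits in the cache, so migration costs exactly one cold miss per line. Varying `migrate_at`, `window`, `size_bytes`, and `ways` corresponds to the variables listed above.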
Results
Benchmark Comparison
[Figure: additional misses (y-axis, -50 to 350) versus previous accesses (x-axis, 0 to 90,000) for Radix and x264, each with a logarithmic trend line]
Results
Cache Comparison
[Figure: additional misses (y-axis, -50 to 450) versus previous accesses (x-axis, 0 to 90,000) for four cache configurations (32k 4way, 32k 8way, 16k 4way, 16k 8way), each with a logarithmic trend line]
Observations

Large cost variance between applications
  - Compute-intensive vs. memory-intensive
  - Radix: 0.115 accesses per instruction
  - x264: 0.368 accesses per instruction
Some sensitivity to cache parameters
  - A smaller cache may evict contents anyway

Future Work



Refine accuracy of cost mechanism
Correlate future cost with past predictors
Incorporate migration cost metric into scheduling algorithm
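A minimal sketch of how a migration cost metric might enter a scheduling decision follows. It assumes the cost is a fixed architectural-state transfer term plus a per-miss penalty on the predicted additional misses. The function names and all cycle counts are hypothetical placeholders, not measured values or the authors' algorithm.

```python
def migration_cost_cycles(predicted_misses, miss_penalty_cycles, state_transfer_cycles):
    """Fixed state-transfer cost plus a variable cache-related cost."""
    return state_transfer_cycles + predicted_misses * miss_penalty_cycles


def should_migrate(expected_gain_cycles, predicted_misses,
                   miss_penalty_cycles=100, state_transfer_cycles=1000):
    """Migrate only if the expected benefit exceeds the estimated cost.

    The default penalties are illustrative assumptions.
    """
    cost = migration_cost_cycles(predicted_misses, miss_penalty_cycles,
                                 state_transfer_cycles)
    return expected_gain_cycles > cost


# Hypothetical candidates: (expected gain in cycles, predicted additional misses).
for gain, misses in [(50_000, 200), (10_000, 200)]:
    print(gain, misses, should_migrate(gain, misses))
```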
Q&A