
DISSERTATION RESEARCH PLAN
Mitesh Meswani

Outline

- Dissertation Research Update
- Previous Approach and Results
- Modified Research Plan
- Identifying Resources
- Identifying Signatures
- Performance Counters for Profiling
- Representative Tracing and Validation

Previous Methodology

- Trace selection: trace the steady-state execution of the benchmark suite, using CPI to measure representativeness; one trace per benchmark.
- Simulate the traces under different SMT knob settings, recording the best setting for each pair.
- Use regression modeling techniques to generate an analytical prediction model that predicts the best settings for a pair.
- Prove model effectiveness by predicting settings for traces from other benchmarks.

Recap of Previous Results

- Models using decision trees for SPEC CPU2000 and Stream (a training sketch follows below):
  - Prediction of SMT mode: 97.5%
  - Prediction of SMT thread priority: 83%
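
A minimal sketch of this decision-tree modeling step, using scikit-learn. The synthetic arrays below only stand in for the real counter-derived features and best-setting labels obtained from trace simulation; the feature count, depth bound, and labeling rule are assumptions, not the dissertation's actual data.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.random((200, 5))                   # stand-in: 5 counter-derived features per trace
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # stand-in: best SMT mode label

    model = DecisionTreeClassifier(max_depth=5)  # depth bound is an assumption
    acc = cross_val_score(model, X, y, cv=10).mean()
    print(f"SMT-mode prediction accuracy: {acc:.1%}")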

Modified Plan Summary

- Represent a benchmark's use of the relevant shared resources.
- Identify signatures of shared-resource usage within benchmarks using performance counters.
- Use traces that represent the signatures of shared-resource usage and that cover 80% of the benchmark's execution.
- Finally, identify the best SMT knob settings for the representative traces.

Shared Resources

- Shared resources (seven): TLB, cache memory (L2, L3), branch unit, FP unit, FXU (fixed-point unit), compare-register unit, and branch prediction hardware (history table).
- How many resources to consider? Analyze the current traces to eliminate resources that contribute less than a threshold value to the cycles spent in shared resources:
  - The compare-register unit is not significant.
  - The branch unit is also not significant.

Signatures

- How many? A resource may make a mild, moderate, or high contribution to the cycles spent in shared resources.
- Idea: with five resources, equal contribution would mean 100/5 = 20% of cycles per resource; using this as the basis (see the sketch after this list):
  - Mild: 1% to 15%
  - Moderate: 16% to 24%
  - High: greater than 24%
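
The banding rule above, as a small Python sketch. The resource names and example shares are illustrative; each share is a resource's percentage of the cycles spent in shared resources.

    def band(share_pct):
        # Thresholds from the slide: mild 1-15%, moderate 16-24%, high > 24%.
        if share_pct > 24:
            return "high"
        if share_pct >= 16:
            return "moderate"
        return "mild"

    def signature(shares):
        # A signature is the tuple of bands over the five monitored resources.
        return tuple(band(shares[r]) for r in ("TLB", "L2/L3", "FPU", "FXU", "BHT"))

    print(signature({"TLB": 10, "L2/L3": 40, "FPU": 20, "FXU": 25, "BHT": 5}))
    # -> ('mild', 'high', 'moderate', 'high', 'mild')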

Finding Signatures

- Profile the benchmark execution to find the cycles spent in the monitored shared resources.
- Sample the performance counters periodically.
- Categorize the benchmark execution (SPEC CPU2000) into one of the possible band permutations (with five resources and three bands there are at most 3^5 = 243 nominal signatures, though the constraint that contributions sum to roughly 100% rules many out).

Finding Signatures (continued)

- Profiling benchmark execution:
  - Only six counters are allowed per execution.
  - How to obtain the counts for a sample period? Merge them from different executions (one possible merge is sketched below)?
  - Use the highest sampling rate?
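
One way the per-period counts could be assembled, sketched in Python: run the benchmark once per six-counter group and merge the sampled series by sample index. This assumes the runs are sampled at the same rate and behave similarly enough to align, which is itself an approximation, since separate runs are not bit-identical.

    def merge_runs(runs):
        # runs: list of {counter_name: [per-period counts]} from separate executions.
        n = min(len(series) for run in runs for series in run.values())
        merged = {}
        for run in runs:
            for counter, series in run.items():
                merged[counter] = series[:n]  # truncate all series to the shortest run
        return merged

    run_a = {"CMPLU_STALLS_FPU": [5, 7, 6], "CMPLU_STALLS_FXU": [9, 8, 9]}
    run_b = {"LSU_STALLS_TOTAL": [20, 22, 21, 23]}  # hypothetical counter name; extra sample is dropped
    print(merge_runs([run_a, run_b]))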

Perf Counters to Collect Data

- Identified counters:
  - FP: completion stalls due to the FPU (CMPLU_STALLS_FPU)
  - FXU: completion stalls due to the FXU (CMPLU_STALLS_FXU)
- Derived counters (spelled out below):
  - LSU stalls = total stalls in the LSU - stalls due to d-cache misses - stalls due to d-TLB misses
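
The derived counter as arithmetic per sample period (a sketch; the argument names are placeholders for the corresponding POWER5 counts):

    def lsu_stalls(total_lsu_stalls, dcache_miss_stalls, dtlb_miss_stalls):
        # LSU stalls attributable to the LSU itself, excluding the d-cache-miss
        # and d-TLB-miss components, which are counted separately.
        return total_lsu_stalls - dcache_miss_stalls - dtlb_miss_stalls

    print(lsu_stalls(120, 45, 15))  # -> 60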

Perf Counters to Collect Data (continued)

- Unsolved: TLB
  - Total d-TLB misses and total i-TLB misses are known, but the miss resolution sites are not.
  - Total cycles spent accessing the d-TLB are known, but include the cost of both hits and misses.
- Caches:
  - L2 and L3 hits for data and instructions are known.
  - The measured cost may be greater than the actual penalty: execution overlaps misses, or misses occur down mispredicted branches.
  - Maybe use the d-cache and i-cache miss penalties on POWER5, which are counted only when completion is stalled.

Branch History

- Affects prediction; a counter is available to count the cycles in which a misprediction stalls completion.

Representative Traces

- Collect traces, if required, that represent the signatures found in benchmark profiling (a selection sketch follows this list).
- Use the performance data from simulating single traces to verify the signatures.
- Collect data for evaluating SMT knobs on the representative traces.
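
A sketch of how the representative set could be chosen: greedily take a benchmark's most frequent signatures until they cover 80% of its sampled periods, then trace one interval per chosen signature. The greedy rule and the toy two-resource data are assumptions, not the dissertation's fixed procedure.

    from collections import Counter

    def cover_80(period_signatures):
        counts = Counter(period_signatures)
        chosen, covered = set(), 0
        target = 0.80 * len(period_signatures)
        for sig, n in counts.most_common():
            if covered >= target:
                break
            chosen.add(sig)
            covered += n
        return chosen

    periods = [("mild", "high")] * 6 + [("high", "mild")] * 3 + [("moderate", "moderate")]
    print(cover_80(periods))  # two signatures already cover 90% of the periods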

Validation

- Use scientific applications to verify whether they are covered by the signatures for 80% of their execution (sketched below).
- TO DO: Identify test applications.
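
The validation test itself reduces to a coverage check, sketched here under the same assumptions as above: a scientific application passes if the signatures already found during SPEC profiling account for at least 80% of its sampled execution periods.

    def is_covered(app_periods, known_signatures, threshold=0.80):
        hits = sum(sig in known_signatures for sig in app_periods)
        return hits / len(app_periods) >= threshold

    known = {("mild", "high"), ("high", "mild")}
    app = [("mild", "high")] * 9 + [("moderate", "moderate")]
    print(is_covered(app, known))  # True: 90% of periods match known signatures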