Fun Size Your Data: Using Statistical Techniques to Efficiently Compress and Exploit Benchmarking Results
David J. Lilja
Electrical and Computer Engineering
University of Minnesota
[email protected]
The Problem
Benchmark programs
We can generate heaps of data
Heaps o’ data
445 446 397 226
388 3445 188 1002
47762 432 54 12
98 345 2245 8839
77492 472 565 999
1 34 882 545 4022
827 572 597 364 …
But it’s noisy
Too much to understand or use efficiently
A Solution
Statistical design of experiments techniques
Compress complex benchmark results
Exploit the results in interesting ways
Extract new insights
Demonstrate using
Microarchitecture-aware floorplanning
Benchmark classification
Why Do We Need Statistics?
Draw meaningful conclusions in the presence of noisy measurements
Noise filtering
Aggregate data into meaningful information
Data compression
Design of Experiments for Data Compression
[Figure: grid of inputs A, B, C against values V1 to V4, with check marks for the combinations measured; the raw measurements are the heaps o’ data shown earlier]
Effects of each input
A, B, C
Effects of interactions
AB, AC, BC, ABC
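As a rough illustration of how a design of experiments turns raw measurements into effects, here is a minimal sketch for three two-level inputs A, B, C. The response values are made up for the example (they are not from the talk), and `estimate_effects` is a hypothetical helper name.

```python
from itertools import product
from math import prod

FACTORS = "ABC"

def estimate_effects(responses):
    """Estimate main and interaction effects from a 2^3 full factorial.

    `responses` maps each run, a tuple of -1/+1 levels for (A, B, C),
    to its measured value.
    """
    runs = list(product((-1, +1), repeat=len(FACTORS)))
    effects = {}
    # Each non-zero bit mask selects one term: A, B, AB, C, AC, BC, ABC.
    for mask in range(1, 2 ** len(FACTORS)):
        idx = [i for i in range(len(FACTORS)) if (mask >> i) & 1]
        name = "".join(FACTORS[i] for i in idx)
        # Contrast: sign of the term in each run times that run's response.
        contrast = sum(prod(run[i] for i in idx) * responses[run] for run in runs)
        # Effect = average response at "+" minus average response at "-".
        effects[name] = contrast / (len(runs) / 2)
    return effects

# Made-up responses (assumption): y = 10 + 3*A + 1*B + 0.5*A*B, no C dependence.
responses = {
    run: 10 + 3 * run[0] + run[1] + 0.5 * run[0] * run[1]
    for run in product((-1, +1), repeat=3)
}
effects = estimate_effects(responses)
print(effects)
```

Eight measurements collapse into seven numbers, one per input or interaction, and the terms that do not matter (here everything involving C) come out as zero.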
Types of Designs of Experiments
Full factorial design with replication
O(v^m) experiments = O(4^3)
Fractional factorial designs
O(2^m) experiments = O(2^3)
Multifactorial design (P&B)
O(m) experiments = O(3)
Main effects only – no interactions
m-factor resolution x designs
k × O(2^m) experiments = k × O(2^3)
Selected interactions
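The experiment counts on this slide can be checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and the Plackett and Burman (P&B) count assumes the standard property that such designs use a run count that is a multiple of 4 and cover at most one factor fewer than the number of runs.

```python
def full_factorial_runs(values_per_factor, factors):
    """Every combination of v values for each of m factors: v^m runs."""
    return values_per_factor ** factors

def two_level_runs(factors):
    """Two values (low, high) per factor: 2^m runs."""
    return 2 ** factors

def plackett_burman_runs(factors):
    """Smallest multiple of 4 strictly greater than the number of factors."""
    return 4 * (factors // 4 + 1)

# The slide's example: m = 3 factors (A, B, C), v = 4 values (V1..V4).
print(full_factorial_runs(4, 3))   # 4^3
print(two_level_runs(3))           # 2^3
print(plackett_burman_runs(3))     # linear in m
```

The gap widens quickly with more factors, which is why the main-effects-only designs are attractive when each experiment is a slow simulation.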
Example: Architecture-Aware Floor-Planner
V. Nookala, S. Sapatnekar, D. Lilja, DAC’05.
Motivation
Imbalance between device and wire delays
Global wire delays > system clock cycle in nanometer technology
[Figure: layout with a long global wire]
Solution
Wire-pipelining
If delay > a clock cycle → insert flip-flops (FFs) along a wire
[Figure: layout with FFs inserted along a wire]
Several methods for optimal FF insertion on a wire
• Li et al. [DATE 02]
• Cocchini et al. [ICCAD 02]
• Hassoun et al. [ICCAD 02]
But what about the performance impact of the pipeline delays?
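The insertion rule on this slide (pipeline a wire once its delay exceeds a clock cycle) can be sketched as below. This is a deliberately simple model and not one of the cited insertion methods: it ignores flip-flop setup and clock-to-output overhead, and the delay values are hypothetical.

```python
import math

def wire_cycles(wire_delay_ns, clock_period_ns):
    """Clock cycles a signal needs to cross the wire (at least one)."""
    return max(1, math.ceil(wire_delay_ns / clock_period_ns))

def flipflops_needed(wire_delay_ns, clock_period_ns):
    """Flip-flops to insert so that each wire segment fits in one cycle."""
    return wire_cycles(wire_delay_ns, clock_period_ns) - 1

# Hypothetical wires at a 0.5 ns clock period.
print(flipflops_needed(0.4, 0.5))  # fits in one cycle, no pipelining
print(flipflops_needed(1.3, 0.5))  # needs three cycles, so two flip-flops
```

Each inserted flip-flop adds one cycle of latency to whatever communication uses that wire, which is the performance question the slide raises.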
Impact on Performance
Execution time = num-instr * cycles/instr (CPI) * cycle-time
Wire-pipelining
Key idea
Some buses are critical
Some can be freely pipelined without (much) penalty
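To make the trade-off concrete, the execution-time formula can be evaluated for a few cases. All numbers below are hypothetical and chosen only to show the direction of the effect: wire-pipelining allows a shorter cycle time but raises CPI, and how much CPI rises depends on whether the pipelined bus is critical.

```python
def execution_time_ns(num_instr, cpi, cycle_time_ns):
    """Execution time = num-instr * cycles/instr (CPI) * cycle-time."""
    return num_instr * cpi * cycle_time_ns

NUM_INSTR = 1_000_000  # hypothetical instruction count

# No wire-pipelining: the clock is limited by the slowest global wire.
no_pipelining = execution_time_ns(NUM_INSTR, cpi=1.0, cycle_time_ns=1.0)

# Faster clock, extra latency on a non-critical bus: CPI barely moves.
noncritical_bus = execution_time_ns(NUM_INSTR, cpi=1.05, cycle_time_ns=0.5)

# Faster clock, extra latency on a critical bus: CPI more than doubles.
critical_bus = execution_time_ns(NUM_INSTR, cpi=2.2, cycle_time_ns=0.5)

print(no_pipelining, noncritical_bus, critical_bus)
```

With these made-up values, pipelining the non-critical bus is a clear win, while pipelining the critical bus ends up slower than not pipelining at all.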
Change Objective Function
Execution time = num-instr * cycles/instr (CPI) * cycle-time
Wire-pipelining
Traditional physical design objectives
Minimize area, total wire length, etc.
New objective
Optimize only throughput-critical wires to maximize overall performance
Conventional Microarchitecture Interaction with Floor Planner
[Flow diagram boxes: µ-arch, Benchmarks, Simulation Methodology, CPI info, Frequency, Physical Design]
Microarchitecture-aware Physical Design
[Flow diagram boxes: µ-arch, Benchmarks, Simulation Methodology, CPI info, Frequency, Physical Design, Layout]
Incorporate wire-pipelining models into the simulator
Extra pipeline stages in processor
Simulator needs to adjust operation latencies
But There are Problems
[Flow diagram boxes: µ-arch, Benchmarks, Simulation Methodology, CPI info, Frequency, Physical Design, Layout]
Simulation is too slow
2000-3000 host instructions per simulated instruction
Numerous benchmark programs to consider
Exponential search space
Thousands of combinations tried in physical design step
Design of Experiments Methodology
[Flow diagram boxes: µ-arch, benchmarks, Design of Experiments based Simulation Methodology, bus and interaction weights, Frequency, Floorplanning, Layout, Validation]
Benchmarks: MinneSPEC reduced input sets
Number of simulations is linear in the number of buses (if no interactions)
Related Floorplanning Work
Simulated Annealing (SA)
CPI look up table [Liao et al, DAC 04]
Bus access ratios from simulation profiles
Minimize the weighted sum of bus latencies
[Ekpanyapong et al, DAC 04]
Throughput sensitivity models for a selected few critical paths
Limited sampling for a large solution space
[Jagannathan et al, ASPDAC 05]
Our approach
Design of experiments to identify criticality of each bus
Microarchitecture and factors
22 buses → 19 factors in experimental design
Some factors model multiple buses
[Block diagram units: Fetch, Decode, RUU, REG, LSQ, BPRED, IL1, DL1, L2, ITLB, DTLB, IADD1, IADD2, IADD3, IMULT, FADD, FMULT]
2-level Resolution III Design
2-levels for each factor
Lowest and highest possible values (range)
Latency range of buses
Min = 0
Max = Chip corner-corner wire latency
19 factors → 32 simulations (next power of 2 above the number of factors)
Captured by a design matrix (32x19)
• 32 rows - 32 simulations
• 19 columns - Factor values
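One way to build a 32x19 two-level design matrix is sketched below. It is an assumption that a regular fractional factorial is used here; the talk does not say how its matrix was generated. Five base factors take every low/high combination (2^5 = 32 rows), and each of the 19 columns is the product of a distinct subset of those base factors, so main effects are never aliased with each other.

```python
from itertools import combinations, product

def resolution_iii_design(num_factors, num_runs):
    """Return num_runs rows of -1/+1 levels, one column per factor."""
    base = num_runs.bit_length() - 1
    if 2 ** base != num_runs:
        raise ValueError("num_runs must be a power of 2")
    if num_factors > num_runs - 1:
        raise ValueError("too many factors for this many runs")
    # Distinct non-empty subsets of the base factors, smallest first.
    subsets = [s for size in range(1, base + 1)
               for s in combinations(range(base), size)]
    chosen = subsets[:num_factors]
    rows = []
    for levels in product((-1, +1), repeat=base):
        row = []
        for subset in chosen:
            value = 1
            for i in subset:
                value *= levels[i]
            row.append(value)
        rows.append(row)
    return rows

design = resolution_iii_design(num_factors=19, num_runs=32)
print(len(design), len(design[0]))
```

In each row, -1 would stand for the minimum bus latency (0) and +1 for the maximum (chip corner-to-corner wire latency); each row is one simulation.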
Experimental setup
Nine SPEC 2000 benchmarks
MinneSPEC reduced input sets
SimpleScalar simulator
Floorplanner -- PARQUET
Simulated annealing based
Objective function
Minimize the weighted sum of bus latencies
Secondarily minimize aspect ratio and area
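The primary objective above, a weighted sum of bus latencies, can be sketched as follows. The bus names, weights, and latencies are hypothetical; in the flow described in these slides the weights would come from the design-of-experiments simulations, and this is not PARQUET's actual interface.

```python
def weighted_bus_latency(weights, latencies):
    """Cost of a floorplan: sum over buses of weight * latency in cycles."""
    return sum(weights[bus] * latencies[bus] for bus in weights)

# Hypothetical criticality weights: a larger weight means a more critical bus.
weights = {"Fetch-IL1": 9, "DL1-L2": 2, "FADD-REG": 1}

# Two hypothetical floorplans, given as extra pipeline cycles per bus.
plan_a = {"Fetch-IL1": 0, "DL1-L2": 2, "FADD-REG": 3}
plan_b = {"Fetch-IL1": 2, "DL1-L2": 0, "FADD-REG": 0}

cost_a = weighted_bus_latency(weights, plan_a)
cost_b = weighted_bus_latency(weights, plan_b)
print(cost_a, cost_b)
```

Plan A has more pipelined cycles in total but keeps the critical bus short, so it scores better. A simulated annealing floorplanner would evaluate a cost like this for each candidate layout.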
Comparisons
Case    Description
SFP     Our “statistical floorplanner”
acc     Access ratios from [Ekpanyapong et al, DAC 04]
minWL   Traditional floorplanning
Typical Results for Single Benchmark
Averaged Over All Benchmarks
Compared to acc: 3-7 percentage point improvement
Better improvements over acc at higher frequencies
SFP-comb ≈ SFP (within about 1-3 percentage points)
Summary
Use statistical design of experiments
Compress benchmark data into critical bus weights
Used by microarchitecture-aware floorplanner
Optimizes insertion of pipeline delays on wires to maximize performance
Extend methodology for other critical objectives
Power consumption
Heat distribution
Collaborators and Funders
Vidyasagar Nookala
Joshua J. Yi
Sachin Sapatnekar
Semiconductor Research Corporation (SRC)
Intel
IBM