Fun Size Your Data: Using Statistical Techniques to Efficiently Compress and Exploit Benchmarking Results
David J. Lilja
Electrical and Computer Engineering
University of Minnesota
[email protected]
The Problem
Benchmark programs
• We can generate heaps of data
Heaps o’ data
445 446 397 226
388 3445 188 1002
47762 432 54 12
98 345 2245 8839
77492 472 565 999
1 34 882 545 4022
827 572 597 364 …
But it’s noisy
Too much to understand or use efficiently
A Solution
Statistical design of experiments techniques
 Compress complex benchmark results
 Exploit the results in interesting ways
 Extract new insights
Demonstrate using
 Microarchitecture-aware floorplanning
 Benchmark classification
Why Do We Need Statistics?
• Draw meaningful conclusions in the presence of noisy measurements
  • Noise filtering
• Aggregate data into meaningful information
  • Data compression
[Figure: heaps o’ data reduced to a few summary statistics]
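As a minimal sketch of the noise-filtering and data-compression idea, the Python below reduces repeated measurements of one benchmark to a sample mean and an approximate confidence interval. The measurement values, the function name `summarize`, and the normal approximation (a t-distribution would be more appropriate for so few samples) are illustrative assumptions, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def summarize(measurements, confidence=0.95):
    """Compress repeated noisy measurements into (mean, half_width).

    Uses a normal approximation for the interval; this is an assumption
    made for brevity, not the method stated in the slides.
    """
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two measurements")
    center = mean(measurements)
    std_err = stdev(measurements) / sqrt(n)          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided critical value
    return center, z * std_err

# Hypothetical execution times in seconds for one benchmark.
times = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]
center, half_width = summarize(times)
print(f"{center:.2f} +/- {half_width:.2f} s")
```

Six raw numbers become two: an estimate and a measure of how much the noise lets us trust it.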
Design of Experiments for Data Compression
[Figure: table of inputs A, B, C against values V1 to V4, with check marks selecting the configurations to run; the runs produce the heap of raw measurements]
Effects of each input
• A, B, C
Effects of interactions
• AB, AC, BC, ABC
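The effects listed above can be estimated from a sign table. The sketch below assumes a two-level full factorial design (each input is either low, -1, or high, +1), which is simpler than the four-valued table on the slide; the function name `factorial_effects` and the synthetic response are hypothetical.

```python
from itertools import product
from math import prod

def factorial_effects(responses, names="ABC"):
    """Estimate main and interaction effects from a 2-level full factorial.

    `responses` maps each tuple of levels (-1 or +1 per factor) to the
    measured response. Each effect is the mean response where the
    contrast column is +1 minus the mean where it is -1.
    """
    k = len(names)
    runs = list(product((-1, +1), repeat=k))
    effects = {}
    for mask in range(1, 2 ** k):                    # every non-empty factor subset
        members = [i for i in range(k) if mask & (1 << i)]
        label = "".join(names[i] for i in members)
        contrast = sum(prod(run[i] for i in members) * responses[run]
                       for run in runs)
        effects[label] = contrast / (len(runs) / 2)
    return effects

# Synthetic response: A matters a lot, A and B interact a little, C not at all.
responses = {(a, b, c): 10 + 2 * a + 0.5 * a * b
             for a, b, c in product((-1, +1), repeat=3)}
for label, effect in factorial_effects(responses).items():
    print(f"{label:>3}: {effect:+.2f}")
```

Eight measurements are compressed into seven effects, and only the nonzero ones (here A and AB) need to be carried forward.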
Types of Designs of Experiments
• Full factorial design with replication
  • O(v^m) experiments = O(4^3)
• Fractional factorial designs
  • O(2^m) experiments = O(2^3)
• Multifactorial design (Plackett and Burman, P&B)
  • O(m) experiments = O(3)
  • Main effects only, no interactions
• m-factor resolution x designs
  • k O(2^m) experiments = k O(2^3)
  • Selected interactions
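To make the growth rates concrete, the sketch below counts experiments for the slide's example (m = 3 inputs, v = 4 values each) and for a larger m. The function names are hypothetical. The Plackett and Burman count relies on the standard property that such designs use a number of runs that is a multiple of 4 and larger than the number of factors.

```python
def full_factorial_runs(values_per_factor, num_factors, replications=1):
    """Every combination of every value, optionally repeated: O(v^m)."""
    return replications * values_per_factor ** num_factors

def two_level_runs(num_factors):
    """Only a low and a high value per factor: O(2^m)."""
    return 2 ** num_factors

def plackett_burman_runs(num_factors):
    """Smallest multiple of 4 strictly greater than the factor count: O(m)."""
    return 4 * (num_factors // 4 + 1)

for m in (3, 19):
    print(f"m = {m}: full = {full_factorial_runs(4, m)}, "
          f"two-level = {two_level_runs(m)}, "
          f"P&B = {plackett_burman_runs(m)}")
```

For m = 3 the three counts are 64, 8, and 4; the gap widens quickly as m grows, which is why the cheaper designs matter when each experiment is a slow simulation.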
Example: Architecture-Aware Floor-Planner
V. Nookala, S. Sapatnekar, D. Lilja, DAC’05.
Motivation
Imbalance between device and wire delays
Global wire delays > system clock cycle in nanometer technology
[Figure: layout with a wire]
Solution
• Wire-pipelining
  • If delay > a clock cycle → insert flip-flops (FFs) along a wire
  • Several methods for optimal FF insertion on a wire
    • Li et al. [DATE 02]
    • Cocchini et al. [ICCAD 02]
    • Hassoun et al. [ICCAD 02]
• But what about the performance impact of the pipeline delays?
[Figure: layout with flip-flops (FF) inserted along a wire]
Impact on Performance
Execution time = num-instr * cycles/instr (CPI) * cycle-time
Wire-pipelining
• Key idea
  • Some buses are critical
  • Some can be freely pipelined without (much) penalty
Change Objective Function
Execution time = num-instr * cycles/instr (CPI) * cycle-time
Wire-pipelining
• Traditional physical design objectives
  • Minimize area, total wire length, etc.
• New objective
  • Optimize only throughput-critical wires to maximize overall performance
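The trade-off in the execution-time equation can be worked through numerically. The sketch below is illustrative only: the instruction count, CPI values, and cycle times are made-up numbers chosen to show the effect, not measurements from this work.

```python
def execution_time(num_instr, cpi, cycle_time_ns):
    """Execution time in seconds = num-instr * CPI * cycle-time."""
    return num_instr * cpi * cycle_time_ns * 1e-9

NUM_INSTR = 1_000_000_000  # hypothetical dynamic instruction count

# scenario name -> (CPI, cycle time in ns); all values are hypothetical
scenarios = {
    "no wire-pipelining": (1.00, 1.00),           # clock limited by the slowest wire
    "pipeline non-critical buses": (1.05, 0.50),  # faster clock, small CPI penalty
    "pipeline critical buses too": (2.20, 0.50),  # faster clock, large CPI penalty
}

for name, (cpi, cycle_time_ns) in scenarios.items():
    seconds = execution_time(NUM_INSTR, cpi, cycle_time_ns)
    print(f"{name}: {seconds:.3f} s")
```

With these numbers, pipelining the non-critical buses nearly halves execution time, while adding latency on critical buses costs more in CPI than the faster clock gains.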
Conventional Microarchitecture Interaction with Floor Planner
[Figure: flow diagram with boxes labeled µ-arch, Benchmarks, Simulation Methodology, CPI info, Frequency, Physical Design]
Microarchitecture-aware Physical Design
[Figure: flow diagram with boxes labeled µ-arch, Benchmarks, Simulation Methodology, CPI info, Frequency, Physical Design, Layout]
• Incorporate wire-pipelining models into the simulator
  • Extra pipeline stages in processor
  • Simulator needs to adjust operation latencies
But There Are Problems
[Figure: flow diagram with boxes labeled µ-arch, Benchmarks, Simulation Methodology, CPI info, Frequency, Physical Design, Layout]
• Simulation is too slow
  • 2000-3000 host instructions per simulated instruction
• Numerous benchmark programs to consider
• Exponential search space
  • Thousands of combinations tried in physical design step
Design of Experiments Methodology
[Figure: flow diagram with boxes labeled µ-arch, benchmarks (MinneSPEC, reduced input sets), Design of Experiments based Simulation Methodology, Bus and interaction weights, Frequency, Floorplanning, Layout, Validation]
• Number of simulations is linear in the number of buses (if no interactions)
Related Floorplanning Work
• Simulated Annealing (SA)
  • CPI look-up table [Liao et al, DAC 04]
  • Bus access ratios from simulation profiles
    • Minimize the weighted sum of bus latencies [Ekpanyapong et al, DAC 04]
  • Throughput sensitivity models for a selected few critical paths
    • Limited sampling for a large solution space [Jagannathan et al, ASPDAC 05]
• Our approach
  • Design of experiments to identify criticality of each bus
Microarchitecture and factors
• 22 buses → 19 factors in experimental design
• Some factors model multiple buses
[Figure: processor block diagram with blocks Fetch, Decode, IADD1, IADD2, IADD3, IMULT, FADD, FMULT, RUU, REG, LSQ, BPRED, IL1, DL1, L2, ITLB, DTLB]
2-level Resolution III Design
• 2 levels for each factor
  • Lowest and highest possible values (range)
Latency range of buses
• Min = 0
• Max = chip corner-to-corner wire latency
19 factors → 32 simulations (next power of 2)
• Captured by a design matrix (32x19)
  • 32 rows - 32 simulations
  • 19 columns - Factor values
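A 32x19 two-level design matrix of this kind can be generated in a few lines. The sketch below uses one textbook construction: a fractional factorial with five base columns, where the remaining 14 columns are products of base columns. The slides do not say which matrix was actually used, so this is an assumption. The names `two_level_design` and `to_latencies` and the value passed as `max_latency` are hypothetical.

```python
from itertools import combinations, product
from math import prod

def two_level_design(num_factors=19, num_base=5):
    """Return a 2**num_base x num_factors matrix of -1/+1 levels.

    The first num_base columns enumerate all combinations of the base
    factors; every further column is the product of a distinct subset of
    base columns. All columns are balanced and mutually orthogonal, but
    main effects are confounded with two-factor interactions
    (resolution III).
    """
    subsets = [s for size in range(1, num_base + 1)
               for s in combinations(range(num_base), size)]
    if num_factors > len(subsets):
        raise ValueError("too many factors for this number of runs")
    chosen = subsets[:num_factors]
    return [[prod(run[i] for i in s) for s in chosen]
            for run in product((-1, +1), repeat=num_base)]

def to_latencies(design, max_latency):
    """Map -1 to the minimum bus latency (0) and +1 to the maximum."""
    return [[0 if level < 0 else max_latency for level in row]
            for row in design]

design = two_level_design()
print(len(design), "simulations x", len(design[0]), "factors")
latencies = to_latencies(design, max_latency=4)  # 4 cycles is a made-up value
```

Each row of `latencies` describes one simulation: which buses get zero extra latency and which get the corner-to-corner maximum.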
Experimental setup
 Nine SPEC 2000 benchmarks
 MinneSPEC reduced input sets
 SimpleScalar simulator
 Floorplanner -- PARQUET
 Simulated annealing based
 Objective function
Minimize the weighted sum of bus latencies
 Secondarily minimize aspect ratio and area
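The primary objective, a weighted sum of bus latencies, can be sketched as a plain cost function. This is not PARQUET code; the function name, the bus names, the weights, and the latencies below are all hypothetical, and the secondary terms (aspect ratio and area) are left out.

```python
def weighted_bus_latency_cost(bus_weights, bus_latencies):
    """Sum over buses of (criticality weight) * (latency in cycles)."""
    return sum(weight * bus_latencies[bus]
               for bus, weight in bus_weights.items())

# Hypothetical weights: the first bus is far more critical than the second.
weights = {"DL1-LSQ": 0.9, "FADD-REG": 0.1}

# Hypothetical extra pipeline stages per bus in two candidate floorplans.
plan_a = {"DL1-LSQ": 0, "FADD-REG": 3}
plan_b = {"DL1-LSQ": 1, "FADD-REG": 0}

cost_a = weighted_bus_latency_cost(weights, plan_a)
cost_b = weighted_bus_latency_cost(weights, plan_b)
print(f"plan A: {cost_a:.2f}, plan B: {cost_b:.2f}")
```

Plan A has more pipeline stages in total yet a lower cost, because its latency sits on the bus with the small weight. That is the behavior the weights are meant to induce in the annealer.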
Comparisons
Case     Description
SFP      Our “statistical floorplanner”
acc      Access ratios from [Ekpanyapong et al, DAC 04]
minWL    Traditional floorplanning
Typical Results for Single Benchmark
Averaged Over All Benchmarks
• Compared to acc
  • 3-7 percentage point improvement
  • Better improvements over acc at higher frequencies
• SFP-comb ≈ SFP (within about 1-3 percentage points)
Summary
Use statistical design of experiments
 Compress benchmark data into critical bus weights
Used by microarchitecture-aware floorplanner
 Optimizes insertion of pipeline delays on wires to
maximize performance
Extend methodology for other critical objectives
 Power consumption
 Heat distribution
Collaborators and Funders
Vidyasagar Nookala
Joshua J. Yi
Sachin Sapatnekar
Semiconductor Research Corporation (SRC)
Intel
IBM