4-Expoiting-Asymmetr..
-Sam Ganzfried
-Ryan Sukauye
-Aniket Ponkshe
Outline
Effects of asymmetry and how to handle them
Design Space Exploration for Core Architecture
Accelerating ‘Critical Sections’
Asymmetric Chip Multiprocessors
Most current programs assume that all computation cores are equal
When cores are not equal, application stability and scalability can suffer
Sources of asymmetry:
Process Variation
Frequency Scaling
Explicit in processor design
Ways to Improve Stability
Asymmetry-aware scheduler
Asymmetry-aware applications
Fine grained threading
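To illustrate why fine-grained threading helps on unequal cores, here is a minimal sketch comparing an equal static split against a shared work queue. The core speeds, task counts, and function names are hypothetical and are not taken from the slides or from any simulation reported in them.

```python
import heapq

def static_makespan(core_speeds, task_costs):
    """Give every core one equal-sized contiguous chunk of tasks."""
    n = len(core_speeds)
    chunk = -(-len(task_costs) // n)  # ceiling division
    finish_times = []
    for i, speed in enumerate(core_speeds):
        work = sum(task_costs[i * chunk:(i + 1) * chunk])
        finish_times.append(work / speed)
    return max(finish_times)

def dynamic_makespan(core_speeds, task_costs):
    """Each core takes the next task from a shared queue once it is idle."""
    # Heap entries are (time at which the core becomes free, core index).
    free_at = [(0.0, i) for i in range(len(core_speeds))]
    heapq.heapify(free_at)
    end = 0.0
    for cost in task_costs:
        t, i = heapq.heappop(free_at)
        t += cost / core_speeds[i]
        end = max(end, t)
        heapq.heappush(free_at, (t, i))
    return end

speeds = [1.0, 1.0, 1.0, 0.5]  # assumption: one core runs at half speed
tasks = [1.0] * 400            # assumption: 400 equal, fine-grained tasks
static_t = static_makespan(speeds, tasks)
dynamic_t = dynamic_makespan(speeds, tasks)
```

With the static split, the slow core determines the finish time; with many small tasks pulled on demand, faster cores simply end up doing more of them.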
Simulation Results
Prior Heterogeneous Approaches
Architecture given:
Existing architectures
Different generations of same processor family
Scaled editions of same processor (e.g., Balakrishnan et al., ’05)
Monotonicity:
Total ordering among the cores in terms of performance that remains the same for all applications (e.g., EV6 vs. EV5).
Greatly outperformed homogeneous CMPs.
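The monotonicity property can be checked mechanically. The sketch below assumes a hypothetical table `perf[core][app]` (higher is better) and tests whether every pair of cores is ordered the same way on all applications; the core names and numbers are invented for illustration.

```python
from itertools import combinations

def is_monotonic(perf):
    """Return True if the cores can be totally ordered by performance
    and that order holds for every application.

    perf[core][app] is a performance figure where higher is better.
    """
    for a, b in combinations(list(perf), 2):
        apps = perf[a].keys()
        a_never_worse = all(perf[a][x] >= perf[b][x] for x in apps)
        b_never_worse = all(perf[b][x] >= perf[a][x] for x in apps)
        if not (a_never_worse or b_never_worse):
            return False  # the two cores swap rank on some application
    return True

# Hypothetical numbers: "big" beats "small" on every application.
scaled_family = {
    "big":   {"app1": 2.0, "app2": 1.8},
    "small": {"app1": 1.0, "app2": 0.9},
}

# Hypothetical numbers: each core wins on a different application.
specialized = {
    "wide_issue":  {"app1": 2.0, "app2": 0.8},
    "large_cache": {"app1": 1.1, "app2": 1.6},
}

monotonic_result = is_monotonic(scaled_family)
specialized_result = is_monotonic(specialized)
```

The second table is the kind of design a monotonic family cannot express: neither core dominates the other.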
Increasing the Design Space
[Kumar et al., ’06]
Full space of heterogeneous processors is huge:
Can change various architectural parameters on single processor
Combined performance of multiple different cores on arbitrary permutations of the applications.
Simplifying assumptions:
Separability: performance is sum of individual performances.
Good static scheduling of threads to cores.
Only consider 4-core processors.
Private L2 caches.
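Under the separability and static-scheduling assumptions above, scoring a 4-core design for a 4-thread workload reduces to picking the best one-to-one mapping of threads to cores. A minimal sketch, assuming a hypothetical table `perf[thread][core]` with invented values:

```python
from itertools import permutations

def best_static_mapping(perf, threads, cores):
    """Try every one-to-one thread-to-core mapping and keep the best.

    Separability assumption: the score of a mapping is the plain sum
    of the per-thread performances, with no interaction terms.
    """
    best_score, best_map = float("-inf"), None
    for order in permutations(cores):
        score = sum(perf[t][c] for t, c in zip(threads, order))
        if score > best_score:
            best_score, best_map = score, list(zip(threads, order))
    return best_score, best_map

# Hypothetical per-thread performance on each of four cores.
perf = {
    "A": {"c0": 1.0, "c1": 0.6, "c2": 0.5, "c3": 0.4},
    "B": {"c0": 0.9, "c1": 0.8, "c2": 0.4, "c3": 0.3},
    "C": {"c0": 0.7, "c1": 0.5, "c2": 0.6, "c3": 0.2},
    "D": {"c0": 0.5, "c1": 0.4, "c2": 0.3, "c3": 0.3},
}
score, mapping = best_static_mapping(
    perf, ["A", "B", "C", "D"], ["c0", "c1", "c2", "c3"]
)
```

With only four cores there are 24 mappings, so exhaustive enumeration is cheap; this is one reason restricting the study to 4-core processors keeps the evaluation tractable.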
Methodology
480 possible cores: over 2.2 billion 4-core MPs.
Wide range of area and power budgets.
10 benchmarks for constructing workloads:
E.g., chemistry, chess, combinatorial optimization.
Considered all possible 4-threaded combinations.
250 million cycles of each application on each core.
Evaluated using weighted speedup.
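As a rough check on the numbers above: counting unordered 4-core selections from 480 core types, with repeats allowed, gives a figure consistent with "over 2.2 billion". That this is how the authors counted is an assumption. The sketch also shows weighted speedup in its commonly used form, the sum of each thread's IPC divided by its IPC on a baseline configuration; the choice of baseline and the sample values are hypothetical.

```python
from math import comb

def multiset_count(n_types, k):
    """Number of unordered selections of k items from n_types, repeats allowed."""
    return comb(n_types + k - 1, k)

# Assumption: a 4-core design is an unordered choice of 4 of the 480 core types.
designs = multiset_count(480, 4)

def weighted_speedup(ipc_on_cmp, ipc_baseline):
    """Sum over threads of IPC on the evaluated CMP relative to a baseline IPC."""
    return sum(ipc_on_cmp[t] / ipc_baseline[t] for t in ipc_on_cmp)

# Hypothetical IPC values for a two-thread example.
example = weighted_speedup({"t0": 1.0, "t1": 2.0}, {"t0": 2.0, "t1": 2.0})
```

Normalizing each thread by its own baseline keeps a high-IPC application from dominating the total.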
Experimental results
For a particular given 4-thread workload:
Best CMP has all cores different.
7% higher throughput over best homogeneous CMP.
16.7% improvement with dynamic mapping.
Workloads under a given budget:
Diversity is an advantage even for all-same workloads!
Significant benefit to diversity if either area or power is reasonably constrained.
The best heterogeneous CMP is not constructed of cores that make good general-purpose uniprocessors.
Experiments cont’d
Quantifying inefficiency due to monotonicity
Best non-monotonic design outperformed best monotonic design by 7.5%.
Outperformed best homogeneous CMP design by 15.4%.
Search techniques
Mostly brute-force search was used (~2.2 billion options).
Used hill-climbing to speed up search.
11% better than best homogeneous CMP
4.5% worse than exhaustive search.
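A hill-climbing search over core combinations might look like the sketch below. The neighborhood (swap one core for any other core type), the random start, and the toy scoring function are all assumptions made for illustration; the slides do not say how the actual search was set up.

```python
import random

def hill_climb(n_types, k, score, rng, max_steps=1000):
    """Steepest-ascent local search over k-core designs.

    A design is a sorted tuple of core-type indices. A neighbor differs
    from the current design in exactly one core.
    """
    current = tuple(sorted(rng.randrange(n_types) for _ in range(k)))
    current_score = score(current)
    for _ in range(max_steps):
        best_neighbor, best_score = None, current_score
        for slot in range(k):
            for new_type in range(n_types):
                if new_type == current[slot]:
                    continue
                cand = list(current)
                cand[slot] = new_type
                cand = tuple(sorted(cand))
                s = score(cand)
                if s > best_score:
                    best_neighbor, best_score = cand, s
        if best_neighbor is None:
            break  # no neighbor improves the score: a local optimum
        current, current_score = best_neighbor, best_score
    return current, current_score

# Toy objective (hypothetical): distance to a fixed "ideal" design.
TARGET = (3, 7, 7, 12)

def toy_score(design):
    return -sum((d - t) ** 2 for d, t in zip(design, TARGET))

design, value = hill_climb(16, 4, toy_score, random.Random(0))
```

This toy objective has no misleading local optima, so the search always reaches the target. A real design space does have them, which is consistent with hill-climbing landing somewhat below the exhaustive-search result.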
Accelerating Critical Sections
Questions
Critical Sections vs. Serial Bottleneck
What would a traditional CMP do on encountering a critical section?
What does ACS do?
ACS
Advantage:
Lock and shared data reside on cache hierarchy of large core
Downside:
Transfer private data from small core to large core on demand
False serialization
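The idea can be approximated in software, as an analogy only and not as the hardware mechanism: worker threads stand in for small cores and ship each critical section to one dedicated thread standing in for the large core. The class and function names below are hypothetical.

```python
import queue
import threading

class CriticalSectionServer:
    """One dedicated thread (the stand-in for the large core) runs
    every critical section that other threads send to it."""

    def __init__(self):
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def _serve(self):
        while True:
            item = self._requests.get()
            if item is None:  # shutdown marker
                break
            func, done, box = item
            box.append(func())  # shared data is touched only on this thread
            done.set()

    def execute(self, func):
        """Send a critical section to the server and wait for its result."""
        done, box = threading.Event(), []
        self._requests.put((func, done, box))
        done.wait()
        return box[0]

    def shutdown(self):
        self._requests.put(None)
        self._worker.join()

counter = {"value": 0}  # shared data, only modified by the server thread

def increment():
    counter["value"] += 1
    return counter["value"]

def small_core_work(server, n):
    for _ in range(n):
        server.execute(increment)

server = CriticalSectionServer()
workers = [threading.Thread(target=small_core_work, args=(server, 1000))
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
server.shutdown()
```

Because a single queue feeds the server, two critical sections that guard unrelated data still run one after the other. That is the false serialization listed as a downside above.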
Critical Sections vs. Serial Bottleneck
[Figure: a) Serial, Parallel and Critical Parts; b) On a CMP; c) With ACS]
Some results…
# of cores above which ACS gives better performance
Performance Trade Offs in ACS
Access private data vs. shared data
Faster Critical Sections vs. Fewer Threads
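The "faster critical sections vs. fewer threads" trade-off can be sketched with a deliberately crude analytical model. Every parameter here is an assumption rather than a figure from the slides: the chip has an area budget counted in small-core slots, the large core occupies `large_area` slots and runs critical sections `cs_speedup` times faster, critical sections are fully serialized, and the remaining work parallelizes perfectly.

```python
def time_symmetric(n_slots, crit_frac):
    """All slots hold small cores; critical sections run serially at 1x."""
    return (1 - crit_frac) / n_slots + crit_frac

def time_acs(n_slots, crit_frac, large_area=4, cs_speedup=2.0):
    """One large core replaces `large_area` small cores and runs the
    critical sections; fewer small cores remain for the parallel work."""
    small_cores = n_slots - large_area
    if small_cores <= 0:
        raise ValueError("area budget too small for a large core plus small cores")
    return (1 - crit_frac) / small_cores + crit_frac / cs_speedup

def crossover(crit_frac, large_area=4, cs_speedup=2.0, max_slots=1024):
    """Smallest area budget at which the ACS layout is faster in this model."""
    for n in range(large_area + 1, max_slots + 1):
        if time_acs(n, crit_frac, large_area, cs_speedup) < time_symmetric(n, crit_frac):
            return n
    return None

heavy = crossover(0.05)  # hypothetical: 5% of the work is in critical sections
light = crossover(0.01)  # hypothetical: 1% of the work is in critical sections
```

In this model the break-even point moves to larger chips as critical sections take up less of the run time, since giving up threads costs more than faster critical sections return.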
ACS…
Provides performance benefits as the number of cores increases
Increases scalability
Issues:
False Serialization: Bit Vector at each small core
Fine grained locks: Problem on Saturation
Future Research:
Accommodating Multiple Large Cores
Either for different critical sections
Or for different Operations
More than one application