sim-alpha: A Validated Alpha 21264 Simulator
sim-alpha: A Validated,
Execution-Driven Alpha 21264
Simulator
Rajagopalan Desikan, Doug Burger,
Stephen Keckler, Todd Austin
Introduction
sim-alpha: execution-driven simulator
Execution-driven simulation is the most
accurate simulation technique
Detailed simulation of the memory system and the
processor pipeline is done simultaneously.
It models the implementation constraints and
the low-level performance features of the Alpha
21264.
The sim-alpha goals
Extend the SimpleScalar tool set to model an
existing microprocessor (EV6
microarchitecture)
Compare the simulator against actual
hardware for accurate modeling
Release the simulator for use by researchers
studying extensions to existing
implementations
Code overview
Code structure
Code for each pipeline stage in a separate .c
file
Each .c file has corresponding .h file
containing function prototypes, constants,
and extern statements for global variables.
Microprocessor features
Issue width of 6 instructions (4 integer and 2 floating
point) during each CPU cycle from a 20-entry integer
issue queue and a 15-entry floating point issue queue.
80-entry reorder buffer.
4 integer units with an 80-entry register file. These units
are called sub-clusters and operate on specific classes
of instructions.
2 floating-point units with a 72-entry register file.
32-entry load queue
32-entry store queue
Alpha 21264 tournament predictor with local, global and
choice predictors
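The tournament scheme named above can be sketched as follows. This is a minimal illustration, not sim-alpha's code: the table sizes follow the commonly published 21264 organization and are assumptions here, and all names (`tournament_t`, `tp_predict`, `tp_update`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes (not taken from the slides). */
#define LOCAL_HIST_ENTRIES 1024  /* per-branch history registers, indexed by PC */
#define LOCAL_HIST_BITS    10
#define LOCAL_PRED_ENTRIES 1024  /* 3-bit counters, indexed by local history */
#define GLOBAL_ENTRIES     4096  /* 2-bit counters, indexed by global history */
#define GLOBAL_HIST_BITS   12

typedef struct {
    uint16_t local_hist[LOCAL_HIST_ENTRIES];
    uint8_t  local_pred[LOCAL_PRED_ENTRIES]; /* 0..7, predicts taken if >= 4 */
    uint8_t  global_pred[GLOBAL_ENTRIES];    /* 0..3, predicts taken if >= 2 */
    uint8_t  choice[GLOBAL_ENTRIES];         /* 0..3, selects global if >= 2 */
    uint16_t global_hist;
} tournament_t;

static void tp_init(tournament_t *tp) { memset(tp, 0, sizeof *tp); }

/* Alpha instructions are 4 bytes, so the low two PC bits carry no information. */
static unsigned pc_index(uint64_t pc) {
    return (unsigned)((pc >> 2) % LOCAL_HIST_ENTRIES);
}

/* Saturating counter update in the range 0..max. */
static uint8_t sat_update(uint8_t ctr, int up, uint8_t max) {
    if (up) return (uint8_t)(ctr < max ? ctr + 1 : ctr);
    return (uint8_t)(ctr > 0 ? ctr - 1 : ctr);
}

/* Returns 1 for predicted taken, 0 for predicted not taken. */
static int tp_predict(const tournament_t *tp, uint64_t pc) {
    unsigned lh = tp->local_hist[pc_index(pc)];
    int local_taken  = tp->local_pred[lh] >= 4;
    int global_taken = tp->global_pred[tp->global_hist] >= 2;
    return tp->choice[tp->global_hist] >= 2 ? global_taken : local_taken;
}

/* Trains all three components with the resolved branch outcome. */
static void tp_update(tournament_t *tp, uint64_t pc, int taken) {
    unsigned pi = pc_index(pc);
    unsigned lh = tp->local_hist[pi];
    unsigned gh = tp->global_hist;
    assert(lh < LOCAL_PRED_ENTRIES && gh < GLOBAL_ENTRIES);
    taken = (taken != 0);

    int local_taken  = tp->local_pred[lh] >= 4;
    int global_taken = tp->global_pred[gh] >= 2;

    /* The chooser moves only when the two components disagree. */
    if (local_taken != global_taken)
        tp->choice[gh] = sat_update(tp->choice[gh], global_taken == taken, 3);

    tp->local_pred[lh]  = sat_update(tp->local_pred[lh], taken, 7);
    tp->global_pred[gh] = sat_update(tp->global_pred[gh], taken, 3);

    /* Shift the outcome into both history registers. */
    tp->local_hist[pi] = (uint16_t)(((lh << 1) | (unsigned)taken)
                                    & ((1u << LOCAL_HIST_BITS) - 1));
    tp->global_hist    = (uint16_t)(((gh << 1) | (unsigned)taken)
                                    & ((1u << GLOBAL_HIST_BITS) - 1));
}
```

A caller would invoke `tp_predict` when a control instruction is slotted and `tp_update` once the branch resolves. The sketch updates history non-speculatively for simplicity, which real hardware may not do.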
Microarchitectural features
Line predictor
Predicts I-cache line to be accessed in next cycle
Way predictor
Predicts which set is being accessed
Partitioned integer execution core
2 clusters: each one has a copy of integer register file and
2 subclusters (lower and upper)
Static slotting
Instructions are statically assigned to the 2 subclusters (slot
stage) and then the 21264 dynamically chooses the cluster
during issue.
Microarchitectural features (cont.)
Load use speculation
Issuing of instructions dependent on a load assuming a
load hit. If the load misses, the instructions are squashed and reissued.
Different memory traps
Load-load trap: when a newer load issues before an earlier load
to the same address
Load-store trap: when a newer load issues before an earlier
store to the same address
stWait table
1024-entry, one-bit table, indexed by PC, to stall issue of loads
causing order traps. The processor does not issue a load for
which the stWait bit is set until previous stores have issued.
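The stWait mechanism can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the index function (PC shifted right by two, then masked) and all names (`stwait_t`, `stwait_mark`, `load_may_issue`) are hypothetical and are not taken from sim-alpha.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STWAIT_ENTRIES 1024  /* one bit per entry, as stated on the slide */

typedef struct { uint8_t bit[STWAIT_ENTRIES]; } stwait_t;

/* Assumed index function: drop the two always-zero PC bits, keep ten bits. */
static unsigned stwait_index(uint64_t pc) {
    return (unsigned)((pc >> 2) & (STWAIT_ENTRIES - 1));
}

static void stwait_clear(stwait_t *t) {
    assert(t != NULL);
    memset(t->bit, 0, sizeof t->bit);
}

/* Called when the load at `pc` has caused an order trap. */
static void stwait_mark(stwait_t *t, uint64_t pc) {
    t->bit[stwait_index(pc)] = 1;
}

/* A load may issue if its bit is clear, or once all older stores have issued. */
static int load_may_issue(const stwait_t *t, uint64_t pc, int older_stores_pending) {
    return !t->bit[stwait_index(pc)] || !older_stores_pending;
}
```

Because the table is indexed by a hash of the PC, two loads that alias to the same entry share one bit; the sketch accepts that false sharing, as a small untagged table must.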
Basic structures
Fetch queue
Slot queue
Mapping table (logical to physical register)
Reorder buffer
Issue queue
Load queue
Store queue
Ready queue
Event queue (events to free issue queue entry 2
cycles after issue and signal completion of execution
of instruction)
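One way to picture the event queue is as a small time-ordered list that the issue stage fills and later cycles drain. The sketch below is a hypothetical illustration, not the simulator's data structure: the names (`event_queue_t`, `schedule_issue`, `eq_pop_due`) and the fixed capacity are assumptions; only the 2-cycle release delay comes from the slide.

```c
#include <assert.h>
#include <stddef.h>

enum event_kind { EV_FREE_IQ_ENTRY, EV_COMPLETE };

typedef struct {
    unsigned long   when;     /* cycle at which the event fires */
    enum event_kind kind;
    int             inst_id;  /* hypothetical instruction handle */
} event_t;

#define MAX_EVENTS    64      /* assumed capacity */
#define IQ_FREE_DELAY 2       /* issue queue entry is freed 2 cycles after issue */

typedef struct { event_t ev[MAX_EVENTS]; size_t n; } event_queue_t;

static void eq_init(event_queue_t *q) { q->n = 0; }

/* Insert while keeping the array sorted by firing time. Returns 0 if full. */
static int eq_push(event_queue_t *q, event_t e) {
    assert(q->n <= MAX_EVENTS);
    if (q->n == MAX_EVENTS) return 0;
    size_t i = q->n++;
    while (i > 0 && q->ev[i - 1].when > e.when) {
        q->ev[i] = q->ev[i - 1];
        i--;
    }
    q->ev[i] = e;
    return 1;
}

/* Pop the earliest event due at or before `now`. Returns 0 if none is due. */
static int eq_pop_due(event_queue_t *q, unsigned long now, event_t *out) {
    if (q->n == 0 || q->ev[0].when > now) return 0;
    *out = q->ev[0];
    for (size_t i = 1; i < q->n; i++) q->ev[i - 1] = q->ev[i];
    q->n--;
    return 1;
}

/* At issue time, schedule both events for one instruction. */
static int schedule_issue(event_queue_t *q, int inst_id,
                          unsigned long now, unsigned long latency) {
    event_t release  = { now + IQ_FREE_DELAY, EV_FREE_IQ_ENTRY, inst_id };
    event_t complete = { now + latency,       EV_COMPLETE,      inst_id };
    return eq_push(q, release) && eq_push(q, complete);
}
```

Each simulated cycle would call `eq_pop_due` in a loop until it returns 0, handling every event that has become due.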
Sim-alpha internals
Sim-alpha is execution-driven, so it executes
instructions down the mis-speculated path in
the same way an actual processor would
execute them.
This captures the behavior of mis-speculated
instructions, but the correct path is known
only at commit time and cannot be simulated
easily.
EV6 Pipeline
Pipeline stages
Fetch stage:
Instruction cache access
Fetch_width instructions are fetched per cycle
(default: 4)
Slot stage:
Static assignment of instructions to either upper or
lower subclusters.
Control instructions access the branch predictor.
Pipeline stages (cont.)
Map stage:
Identifies the input and output registers
Checks for availability of reorder buffer entry, integer or
floating point issue queue entry, physical output register and
load or store queue entry (if instruction is load or store).
If input physical registers are ready, instruction is placed in
ready queue.
Issue stage:
Picks instructions from ready queues, checks the availability of
functional units and issues the instruction to FUs. Register
read latency is charged here. Events are set up for queue
entry release and instruction completion.
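The resource checks and renaming performed by the map stage can be sketched as below. This is a simplified, integer-only illustration under stated assumptions: 32 logical registers, a free list managed as a stack, and all type and function names (`map_state_t`, `map_inst`) are hypothetical. The queue and register file sizes reuse the figures quoted earlier in the slides.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LOGICAL  32   /* assumed number of architectural integer registers */
#define NUM_PHYSICAL 80   /* integer register file size from the slides */
#define NO_REG       (-1)

typedef struct {
    int  map[NUM_LOGICAL];         /* logical -> physical mapping table */
    bool ready[NUM_PHYSICAL];      /* has the physical register been written? */
    int  free_list[NUM_PHYSICAL];  /* stack of free physical registers */
    int  free_count;
    int  rob_free, iq_free, lq_free, sq_free;  /* free entries per structure */
} map_state_t;

typedef struct { int src1, src2, dest; bool is_load, is_store; } inst_t;
typedef struct { int psrc1, psrc2, pdest; bool ready; } mapped_t;

static void map_init(map_state_t *s) {
    for (int i = 0; i < NUM_LOGICAL; i++) s->map[i] = i;
    for (int p = 0; p < NUM_PHYSICAL; p++) s->ready[p] = p < NUM_LOGICAL;
    s->free_count = 0;
    for (int p = NUM_PHYSICAL - 1; p >= NUM_LOGICAL; p--)
        s->free_list[s->free_count++] = p;
    s->rob_free = 80; s->iq_free = 20; s->lq_free = 32; s->sq_free = 32;
}

/* Returns false (stall) if any required resource is unavailable. */
static bool map_inst(map_state_t *s, const inst_t *in, mapped_t *out) {
    bool needs_dest = in->dest != NO_REG;
    assert(!needs_dest || (in->dest >= 0 && in->dest < NUM_LOGICAL));

    if (s->rob_free == 0 || s->iq_free == 0) return false;
    if (needs_dest && s->free_count == 0)    return false;
    if (in->is_load  && s->lq_free == 0)     return false;
    if (in->is_store && s->sq_free == 0)     return false;

    /* Look up sources before renaming the destination, so an instruction
       that reads and writes the same register sees the old mapping. */
    out->psrc1 = in->src1 == NO_REG ? NO_REG : s->map[in->src1];
    out->psrc2 = in->src2 == NO_REG ? NO_REG : s->map[in->src2];
    out->ready = (out->psrc1 == NO_REG || s->ready[out->psrc1]) &&
                 (out->psrc2 == NO_REG || s->ready[out->psrc2]);

    out->pdest = NO_REG;
    if (needs_dest) {
        out->pdest = s->free_list[--s->free_count];
        s->ready[out->pdest] = false;
        s->map[in->dest] = out->pdest;
    }
    s->rob_free--;
    s->iq_free--;
    if (in->is_load)  s->lq_free--;
    if (in->is_store) s->sq_free--;
    return true;
}
```

An instruction whose `ready` flag comes back true would go straight to the ready queue; otherwise it waits in the issue queue until writeback wakes it up.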
Pipeline stages (cont.)
Writeback stage:
Wakes up the dependent instructions when a producing
instruction completes.
Load instructions access the D-cache
Mispredictions are indicated in the corresponding reorder
buffer entry
Commit stage:
Retires instructions from the reorder buffer
Examines the head of the reorder buffer for mispredictions and
traps, and flushes the pipeline in these cases.
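The commit behavior can be illustrated with a circular reorder buffer. The sketch below is hypothetical and makes one policy assumption that the slides do not state: a mispredicted branch retires and everything younger is flushed, while a trapping instruction is flushed together with everything younger so that it can be refetched. All names (`rob_t`, `commit_one`) are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define ROB_SIZE 80   /* reorder buffer size from the slides */

typedef struct { bool completed, mispredicted, trapped; } rob_entry_t;
typedef struct { rob_entry_t e[ROB_SIZE]; int head, count; } rob_t;

typedef enum { COMMIT_NONE, COMMIT_RETIRED, COMMIT_FLUSHED } commit_result_t;

static void rob_init(rob_t *r) { r->head = 0; r->count = 0; }

/* Allocate an entry at the tail. Returns false if the buffer is full. */
static bool rob_alloc(rob_t *r, rob_entry_t e) {
    if (r->count == ROB_SIZE) return false;
    r->e[(r->head + r->count) % ROB_SIZE] = e;
    r->count++;
    return true;
}

/* Examine the head entry and retire or flush as needed. */
static commit_result_t commit_one(rob_t *r) {
    assert(r->count >= 0 && r->count <= ROB_SIZE);
    if (r->count == 0) return COMMIT_NONE;

    const rob_entry_t *h = &r->e[r->head];
    if (!h->completed) return COMMIT_NONE;   /* head not done: nothing retires */

    if (h->trapped) {                        /* assumed: replay from the head */
        r->count = 0;
        return COMMIT_FLUSHED;
    }

    bool mispredicted = h->mispredicted;
    r->head = (r->head + 1) % ROB_SIZE;      /* retire the head */
    r->count--;

    if (mispredicted) {                      /* squash all younger entries */
        r->count = 0;
        return COMMIT_FLUSHED;
    }
    return COMMIT_RETIRED;
}
```

A fuller model would also restore the register mapping table and free the queue entries of the squashed instructions on a flush; those steps are omitted here.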
Simulator Performance
The simulator has been validated against a
hardware 21264 implementation, and has achieved
a 2% error across a suite of microbenchmarks
designed to stress various microarchitectural
features.
The error across the 10 SPECINT 2000 benchmarks
is 6.6%, and across the 12 SPECFP 2000 benchmarks
it is 21%.
The greater error in the floating-point benchmarks is due
to insufficient modeling of the floating-point pipeline and
inaccuracies in the memory system implementation.
Summary
sim-alpha provides a flexible, validated
baseline for researchers to evaluate new
architectural enhancements
Options to turn off constraints and change
parameters such as fetch width, issue queue
sizes, and reorder buffer size allow users to
study their influence.