sim-alpha: A Validated Alpha 21264 Simulator


sim-alpha: A Validated, Execution-Driven Alpha 21264 Simulator
Rajagopalan Desikan, Doug Burger,
Stephen Keckler, Todd Austin
Introduction


sim-alpha: an execution-driven simulator
Execution-driven simulation is among the most accurate simulation techniques, because it also executes instructions down mis-speculated paths.


The memory system and the processor pipeline are simulated together, in detail.
It models the implementation constraints and the low-level performance features of the Alpha 21264.
sim-alpha goals
Extend the SimpleScalar tool set to model an existing microprocessor (the EV6 microarchitecture).
Compare the simulator against actual hardware to ensure accurate modeling.
Release the simulator for use by researchers studying extensions to existing implementations.
Code overview
Code structure


The code for each pipeline stage is in a separate .c file.
Each .c file has a corresponding .h file containing function prototypes, constants, and extern declarations for global variables.
Microprocessor features







Issue width of 6 instructions per cycle (4 integer and 2 floating point), from a 20-entry integer issue queue and a 15-entry floating-point issue queue.
80-entry reorder buffer.
4 integer units with an 80-entry register file. These units are called subclusters, and each operates on specific classes of instructions.
2 floating-point units with a 72-entry register file.
32-entry load queue.
32-entry store queue.
Alpha 21264 tournament predictor with local, global, and choice predictors.
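To make the tournament arrangement concrete, here is a minimal C sketch of a local/global/choice predictor. It is not sim-alpha's code. The table sizes (a 1024-entry local history table of 10-bit histories, 1024 3-bit local counters, and 4096-entry global and choice tables of 2-bit counters indexed by 12 bits of global history) are the commonly cited 21264 figures and should be treated as assumptions, and the names `tp_predict` and `tp_update` are hypothetical. Histories are updated at branch resolution rather than speculatively, to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Sizes follow commonly cited 21264 figures; treat them as assumptions. */
#define LHT_ENTRIES 1024              /* local history table, indexed by PC      */
#define LHIST_BITS  10                /* local history bits per LHT entry        */
#define GHIST_BITS  12                /* global history bits                     */
#define GTABLE      (1 << GHIST_BITS) /* global and choice table entries (4096) */

typedef struct {
    uint16_t local_hist[LHT_ENTRIES];    /* per-branch taken/not-taken history    */
    uint8_t  local_ctr[1 << LHIST_BITS]; /* 3-bit counters (0..7), by local hist  */
    uint8_t  global_ctr[GTABLE];         /* 2-bit counters (0..3), by global hist */
    uint8_t  choice_ctr[GTABLE];         /* 2-bit: >= 2 means "trust global"      */
    uint16_t ghist;                      /* global history register               */
} tournament_t;

static void tp_init(tournament_t *tp) { memset(tp, 0, sizeof *tp); }

/* Hypothetical PC hash: drop the 2 low bits of a 4-byte Alpha instruction. */
static unsigned lht_index(uint64_t pc) { return (unsigned)((pc >> 2) % LHT_ENTRIES); }

static bool tp_predict(const tournament_t *tp, uint64_t pc) {
    uint16_t lh = tp->local_hist[lht_index(pc)];
    assert(lh < (1u << LHIST_BITS) && tp->ghist < GTABLE);
    bool local_taken  = tp->local_ctr[lh] >= 4;
    bool global_taken = tp->global_ctr[tp->ghist] >= 2;
    return tp->choice_ctr[tp->ghist] >= 2 ? global_taken : local_taken;
}

static void sat_update(uint8_t *ctr, bool up, uint8_t max) {
    if (up && *ctr < max) (*ctr)++;
    else if (!up && *ctr > 0) (*ctr)--;
}

/* Train all three components with the resolved outcome. */
static void tp_update(tournament_t *tp, uint64_t pc, bool taken) {
    unsigned li = lht_index(pc);
    uint16_t lh = tp->local_hist[li];
    bool local_taken  = tp->local_ctr[lh] >= 4;
    bool global_taken = tp->global_ctr[tp->ghist] >= 2;

    /* The chooser only learns when the two components disagree. */
    if (local_taken != global_taken)
        sat_update(&tp->choice_ctr[tp->ghist], global_taken == taken, 3);
    sat_update(&tp->local_ctr[lh], taken, 7);
    sat_update(&tp->global_ctr[tp->ghist], taken, 3);

    /* Shift the outcome into both histories, keeping only the low bits. */
    tp->local_hist[li] = (uint16_t)(((lh << 1) | taken) & ((1u << LHIST_BITS) - 1));
    tp->ghist = (uint16_t)(((tp->ghist << 1) | taken) & (GTABLE - 1));
}
```

Because the chooser is trained only on disagreement, it gradually learns, per global history, which component to trust.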
Microarchitectural features

Line predictor: predicts the I-cache line to be accessed in the next cycle.
Way predictor: predicts which set (that is, which way of the set-associative I-cache) is being accessed.
Partitioned integer execution core: 2 clusters, each with its own copy of the integer register file and 2 subclusters (lower and upper).
Static slotting: instructions are statically assigned to the upper or lower subcluster in the slot stage, and the 21264 then dynamically chooses the cluster during issue.
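A line predictor can be sketched as a table that maps the current fetch line to the line expected in the next cycle. The sketch below is a hypothetical illustration: the table size, the 16-byte line granularity, the sequential fallback for untrained entries, and the function names are assumptions, and way prediction is left out.

```c
#include <assert.h>
#include <stdint.h>

#define LP_ENTRIES 1024  /* hypothetical table size */
#define LINE_BYTES 16    /* assumed fetch-line size: four 4-byte instructions */

typedef struct {
    uint64_t next_line[LP_ENTRIES];  /* 0 means "untrained" in this sketch */
} line_pred_t;

static unsigned lp_index(uint64_t line_addr) {
    assert(line_addr % LINE_BYTES == 0);  /* callers pass aligned line addresses */
    return (unsigned)((line_addr / LINE_BYTES) % LP_ENTRIES);
}

/* Line to fetch next cycle; falls back to the sequential line if untrained. */
static uint64_t lp_predict(const line_pred_t *lp, uint64_t line_addr) {
    uint64_t p = lp->next_line[lp_index(line_addr)];
    return p != 0 ? p : line_addr + LINE_BYTES;
}

/* Called once the real next line is known (e.g. from the branch predictor
   in the slot stage). Retrains on a mismatch and returns 1 if the line
   prediction was wrong, 0 otherwise. */
static int lp_train(line_pred_t *lp, uint64_t line_addr, uint64_t actual_next) {
    int wrong = lp_predict(lp, line_addr) != actual_next;
    if (wrong)
        lp->next_line[lp_index(line_addr)] = actual_next;
    return wrong;
}
```

The point of the design is speed: the lookup needs only the current line address, so a next-line guess is available every cycle without waiting for the slower branch predictor.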
Microarchitectural features (cont.)

Load-use speculation: instructions that depend on a load are issued assuming that the load hits. If the load misses, those instructions are squashed and reissued.
Memory order traps:
Load-load trap: a newer load issues before an older load to the same address.
Load-store trap: a newer load issues before an older store to the same address.
stWait table: a 1024-entry, one-bit table, indexed by PC, used to stall the issue of loads that have caused order traps. The processor does not issue a load whose stWait bit is set until all prior stores have issued.
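The stWait mechanism can be sketched as a packed 1024-bit table. The PC hash (dropping the two low bits of a 4-byte Alpha instruction), the function names, and the count of older unissued stores passed in by the caller are assumptions for illustration; the sketch also omits any periodic clearing of the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STWAIT_ENTRIES 1024  /* 1024 one-bit entries, indexed by PC */

typedef struct {
    uint8_t bits[STWAIT_ENTRIES / 8];  /* one bit per entry, packed into bytes */
} stwait_t;

/* Assumed hash: drop the 2 low PC bits, then take the entry index. */
static unsigned stwait_index(uint64_t pc) { return (unsigned)((pc >> 2) % STWAIT_ENTRIES); }

/* Called when a load causes an order trap: later instances must wait. */
static void stwait_mark(stwait_t *t, uint64_t pc) {
    unsigned i = stwait_index(pc);
    t->bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

static bool stwait_is_set(const stwait_t *t, uint64_t pc) {
    unsigned i = stwait_index(pc);
    return (t->bits[i / 8] >> (i % 8)) & 1u;
}

/* Issue-time check: a load whose bit is set may issue only once no
   older store is still waiting to issue. */
static bool load_may_issue(const stwait_t *t, uint64_t load_pc, int older_unissued_stores) {
    assert(older_unissued_stores >= 0);
    return !stwait_is_set(t, load_pc) || older_unissued_stores == 0;
}
```

Loads that have never trapped keep issuing speculatively; only the PCs that have already caused a trap pay the cost of waiting.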
Basic structures









Fetch queue
Slot queue
Mapping table (logical to physical register)
Reorder buffer
Issue queue
Load queue
Store queue
Ready queue
Event queue (events that free an issue queue entry 2 cycles after issue and that signal the completion of an instruction's execution)
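The event queue can be pictured as a list of (cycle, event) pairs kept in cycle order. This hypothetical C sketch schedules the two events named above when an instruction issues; the fixed capacity, the sorted-array representation, and all names are assumptions, not sim-alpha's own data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { EV_FREE_IQ_ENTRY, EV_COMPLETE } ev_kind_t;

typedef struct {
    uint64_t  cycle;  /* cycle at which the event fires */
    ev_kind_t kind;
    int       inst;   /* hypothetical instruction id */
} event_t;

#define EQ_CAP 256

typedef struct { event_t ev[EQ_CAP]; size_t n; } event_queue_t;

/* Insert in cycle order; events with equal cycles stay in FIFO order. */
static void eq_schedule(event_queue_t *q, uint64_t cycle, ev_kind_t kind, int inst) {
    assert(q->n < EQ_CAP);
    size_t i = q->n;
    while (i > 0 && q->ev[i - 1].cycle > cycle) { q->ev[i] = q->ev[i - 1]; i--; }
    q->ev[i] = (event_t){cycle, kind, inst};
    q->n++;
}

/* Pop the earliest event due at or before `now`; returns 0 if none is due. */
static int eq_pop_due(event_queue_t *q, uint64_t now, event_t *out) {
    if (q->n == 0 || q->ev[0].cycle > now) return 0;
    *out = q->ev[0];
    for (size_t i = 1; i < q->n; i++) q->ev[i - 1] = q->ev[i];
    q->n--;
    return 1;
}

/* At issue: release the queue entry 2 cycles later and signal completion
   after the execution latency. */
static void on_issue(event_queue_t *q, uint64_t now, int inst, unsigned exec_latency) {
    eq_schedule(q, now + 2, EV_FREE_IQ_ENTRY, inst);
    eq_schedule(q, now + exec_latency, EV_COMPLETE, inst);
}
```

Each simulated cycle would drain every due event with `eq_pop_due`; a heap or a per-cycle bucket array would scale better, but the sorted array keeps the ordering easy to see.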
Sim-alpha internals

Sim-alpha is execution-driven, so it executes
instructions down the mis-speculated path in
the same way an actual processor would
execute them.


This captures the behavior of mis-speculated instructions, but the correct path is known only at commit time and cannot be simulated easily.
EV6 Pipeline
Pipeline stages

Fetch stage:



Instruction cache access
Fetch_width instructions fetched per cycle (default: 4)
Slot stage:


Static assignment of instructions to either upper or
lower subclusters.
Control instructions access the branch predictor.
Pipeline stages (cont.)


Map stage:
Identifies the input and output registers.
Checks for the availability of a reorder buffer entry, an integer or floating-point issue queue entry, a physical output register, and a load or store queue entry (if the instruction is a load or store).
If the input physical registers are ready, the instruction is placed in the ready queue.
Issue stage:
Picks instructions from the ready queues, checks the availability of functional units, and issues the instructions to the FUs. Register read latency is charged here. Events are set up for queue entry release and instruction completion.
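Issue selection can be sketched as a scan of the ready queue that issues each instruction whose functional-unit class still has a free unit this cycle. Oldest-first priority, the two FU classes, the `max_issue` cap, and all names are assumptions; register read latency and event setup are left to the caller.

```c
#include <assert.h>

#define NUM_FU_CLASSES 2  /* hypothetical: e.g. upper and lower subcluster units */

typedef struct {
    int id;        /* instruction identifier */
    int fu_class;  /* functional-unit class it needs (fixed at slotting) */
} ready_inst_t;

/* Scans the age-ordered ready queue and issues each instruction whose FU
   class still has a free unit, up to max_issue per cycle. Issued ids go
   to `issued`; the rest are compacted to the front of `rq`. */
static int issue_select(ready_inst_t *rq, int *rq_len, int fu_free[NUM_FU_CLASSES],
                        int max_issue, int *issued) {
    int n_issued = 0, kept = 0;
    for (int i = 0; i < *rq_len; i++) {
        ready_inst_t in = rq[i];
        assert(in.fu_class >= 0 && in.fu_class < NUM_FU_CLASSES);
        if (n_issued < max_issue && fu_free[in.fu_class] > 0) {
            fu_free[in.fu_class]--;       /* unit is busy for the rest of this cycle */
            issued[n_issued++] = in.id;   /* caller charges register read latency */
        } else {
            rq[kept++] = in;              /* waits in the ready queue for a later cycle */
        }
    }
    *rq_len = kept;
    return n_issued;
}
```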
Pipeline stages (cont.)

Writeback stage:




Wakes up the dependent instructions when a producing instruction completes.
Load instructions access the D-cache.
Mispredictions are recorded in the corresponding reorder buffer entry.
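Wakeup can be sketched with a per-register list of waiting instructions and a count of each instruction's outstanding operands. The sizes and names below are hypothetical; the sketch only shows the bookkeeping that moves an instruction to the ready queue once its last operand is written back.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSTS 64  /* hypothetical window size for the sketch */
#define MAX_PREGS 80
#define MAX_DEPS  8

typedef struct {
    int  pending_srcs;    /* input registers not yet produced */
    bool in_ready_queue;
} wb_inst_t;

typedef struct {
    bool      ready[MAX_PREGS];
    int       waiters[MAX_PREGS][MAX_DEPS];  /* instructions waiting on each register */
    int       n_waiters[MAX_PREGS];
    wb_inst_t inst[MAX_INSTS];
} wb_state_t;

/* Record that instruction `inst` needs physical register `preg`. */
static void add_waiter(wb_state_t *s, int preg, int inst) {
    assert(s->n_waiters[preg] < MAX_DEPS);
    s->waiters[preg][s->n_waiters[preg]++] = inst;
    s->inst[inst].pending_srcs++;
}

/* Writeback of `preg`: mark it ready and wake every dependent whose last
   outstanding operand this was. Returns the number of instructions woken. */
static int writeback(wb_state_t *s, int preg) {
    int woken = 0;
    s->ready[preg] = true;
    for (int k = 0; k < s->n_waiters[preg]; k++) {
        wb_inst_t *w = &s->inst[s->waiters[preg][k]];
        if (--w->pending_srcs == 0) {
            w->in_ready_queue = true;
            woken++;
        }
    }
    s->n_waiters[preg] = 0;
    return woken;
}
```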
Commit stage:
Retires instructions from the reorder buffer.
Examines the head of the reorder buffer for mispredictions and traps, and flushes the pipeline when either is found.
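In-order retirement with a flush can be sketched over a circular reorder buffer. This hypothetical C sketch assumes that a mispredicted branch itself retires and everything younger is discarded, while a trapping instruction is discarded too and refetched from its own PC. Restoring the mapping table and clearing the other queues is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ROB_SIZE 80  /* 80-entry reorder buffer, per the feature list */

typedef struct {
    bool     done;        /* execution completed (set at writeback) */
    bool     mispredict;  /* branch misprediction (set at writeback) */
    bool     trap;        /* e.g. a load-load or load-store order trap */
    uint64_t pc;          /* this instruction's address */
    uint64_t correct_pc;  /* resolved next PC, valid when mispredict is set */
} rob_entry_t;

typedef struct { rob_entry_t e[ROB_SIZE]; int head, count; } rob_t;

/* Retires up to `width` completed instructions in program order. Returns
   the number retired; sets *flushed and *refetch_pc on a flush. */
static int commit(rob_t *rob, int width, bool *flushed, uint64_t *refetch_pc) {
    int retired = 0;
    *flushed = false;
    while (retired < width && rob->count > 0) {
        rob_entry_t *h = &rob->e[rob->head];
        if (!h->done) break;            /* in order: stop at the first unfinished entry */
        if (h->trap) {                  /* squash the trapping instruction as well */
            *refetch_pc = h->pc;
            rob->count = 0;             /* every remaining entry is discarded */
            *flushed = true;
            break;
        }
        bool mis = h->mispredict;
        uint64_t next = h->correct_pc;
        rob->head = (rob->head + 1) % ROB_SIZE;  /* retire the head */
        rob->count--;
        retired++;
        if (mis) {                      /* everything younger is on the wrong path */
            *refetch_pc = next;
            rob->count = 0;
            *flushed = true;
            break;
        }
    }
    assert(rob->count >= 0 && rob->count <= ROB_SIZE);
    return retired;
}
```

Checking for mispredictions only at the head is what lets wrong-path instructions run until commit, matching the execution-driven behavior described earlier.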
Simulator Performance



The simulator has been validated against a hardware 21264 implementation and achieves a 2% error across a suite of microbenchmarks designed to stress various microarchitectural features.
The error is 6.6% across the 10 SPECINT 2000 benchmarks and 21% across the 12 SPECFP 2000 benchmarks.
The larger error on the floating-point benchmarks is due to insufficient modeling of the floating-point pipeline and to inaccuracies in the memory system implementation.
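The slides do not define the error metric. As an assumption only, a common choice is the relative difference between simulated and measured IPC for each benchmark i, averaged over a suite of N benchmarks:

```latex
e_i = \frac{\left|\mathrm{IPC}^{\mathrm{sim}}_i - \mathrm{IPC}^{\mathrm{hw}}_i\right|}{\mathrm{IPC}^{\mathrm{hw}}_i} \times 100\%,
\qquad
\bar{e} = \frac{1}{N}\sum_{i=1}^{N} e_i
```

For example, a hypothetical simulated IPC of 1.05 against a measured 1.00 would be a 5% error on that benchmark.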
Summary


sim-alpha provides a flexible, validated baseline for researchers to evaluate new architectural enhancements.
Options to turn off constraints and to change parameters such as fetch width, issue queue sizes, and reorder buffer size allow users to study their influence.
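As a hedged illustration of such a sensitivity study, the C sketch below groups the parameters named above in a hypothetical struct, with baseline values taken from the feature list (fetch width 4, issue queues of 20 and 15 entries, an 80-entry reorder buffer). The field names, including the `enforce_slotting` switch standing in for a constraint that can be turned off, are invented and are not sim-alpha's actual options.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical parameter block; field names are illustrative only. */
typedef struct {
    int  fetch_width;       /* instructions fetched per cycle */
    int  int_iq_size;       /* integer issue queue entries */
    int  fp_iq_size;        /* floating-point issue queue entries */
    int  rob_size;          /* reorder buffer entries */
    bool enforce_slotting;  /* made-up example of a constraint that can be turned off */
} sim_params_t;

/* Baseline values taken from the feature list above. */
static sim_params_t baseline_params(void) {
    return (sim_params_t){ .fetch_width = 4, .int_iq_size = 20, .fp_iq_size = 15,
                           .rob_size = 80, .enforce_slotting = true };
}

/* One study variant: double the window resources and drop the constraint. */
static sim_params_t wide_variant(void) {
    sim_params_t p = baseline_params();
    p.fetch_width *= 2;
    p.int_iq_size *= 2;
    p.fp_iq_size *= 2;
    p.rob_size *= 2;
    p.enforce_slotting = false;
    assert(p.rob_size >= p.int_iq_size);
    return p;
}
```

Changing one parameter group at a time against the validated baseline isolates each feature's influence on performance.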