HASim
Implementing a Functional/Timing
Partitioned
Microprocessor Simulator with an FPGA
Nirav Dave*, Michael Pellauer*, Joel Emer†*, & Arvind*
Massachusetts Institute of Technology*
Computer Science and Artificial Intelligence Lab
Cambridge, MA
Intel Corporation VSSAD†
Hudson, MA
HASim - Why?
Micro-architectural Simulations are
Important
Better estimates for expected outcomes
SW Simulations are slow to run
100s of KIPS (thousands of instructions per second)
HW “simulations” take a long time to
design
Underlying Beliefs
Modeling something is generally easier
than designing it
A model need not be totally faithful to the
design to tell you what you need
It's easy to make modeling mistakes
Need to insert checks to ensure you didn't
cheat
Appropriate partitioning improves reuse
Split computational aspects from timing
HASim – What?
HASim is a partitioned hardware
simulation framework
Two Partitions:
Functional (FP) – Executes instructions
Timing (TP) – Responsible for determining the
timing of the emulated machine
HASim: The Picture
[Figure: block diagram of the two partitions]
Timing Partition
Functional Partition
Stages: Token Gen, Fet, Dec, Exe, Mem, LCom, GCom
Shared units: Memory, Bypassing Unit, RegFile
Functional Partition Zoom-In
[Figure: one functional partition stage, built around the Decoder Unit]
TP request to do instruction i: <Token>
Response to TP's request: <Token, DependencyInfo>
Info from previous stage: <Token, Inst>
Information to next stage: <Token, DecodedInst>
Units: Token Table, Decoder Unit
Side output: to MapTable (in BypassUnit)
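Functional Partition - Execute
[Figure: the Execute stage, same layout as the zoom-in slide]
TP request: <Token>
Response to TP: <Token, Result Value>
From previous stage: <Token, DecodedInst>
To next stage: <Token, ExecedInst>
Units: Token Table, Execute
Side output: to RegFile (in BypassUnit)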
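The token handshake on the two slides above can be sketched in software. This is only an illustration, not the HASim hardware: the class name, the method names, the tuple instruction format, the add/sub-only ISA, and the fixed one-cycle stage cost are all assumptions made for the example. What it keeps from the slides is the shape of the interface: the TP sends a <Token>, the FP does the work and keeps the per-instruction state in its token table.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionalPartition:
    """Hypothetical FP: every stage is a request keyed by a token."""
    program: list                                     # (op, dest, src1, src2) tuples
    regfile: dict = field(default_factory=dict)
    token_table: dict = field(default_factory=dict)   # token -> per-instruction state
    next_token: int = 0

    def token_gen(self):
        tok = self.next_token
        self.next_token += 1
        # Assumption: straight-line code, so the token doubles as the PC.
        self.token_table[tok] = {"pc": tok}
        return tok

    def fetch(self, tok):
        entry = self.token_table[tok]
        entry["inst"] = self.program[entry["pc"]]
        return tok, entry["inst"]                     # <Token, Inst>

    def decode(self, tok):
        op, dest, src1, src2 = self.token_table[tok]["inst"]
        self.token_table[tok]["decoded"] = (op, dest, src1, src2)
        # <Token, DependencyInfo>
        return tok, {"dest": dest, "srcs": (src1, src2)}

    def execute(self, tok):
        op, dest, src1, src2 = self.token_table[tok]["decoded"]
        a, b = self.regfile[src1], self.regfile[src2]
        result = a + b if op == "add" else a - b      # only add/sub in this sketch
        self.regfile[dest] = result
        return tok, result                            # <Token, Result Value>


def run_unpipelined(fp, cycles_per_stage=1):
    """Toy TP: one instruction at a time, fixed model cost per stage."""
    model_cycles = 0
    for _ in fp.program:
        tok = fp.token_gen()
        for stage in (fp.fetch, fp.decode, fp.execute):
            stage(tok)
            model_cycles += cycles_per_stage
    return model_cycles
```

The toy TP never looks at instruction semantics. It only decides when each stage request is issued and how many model cycles to charge, which is the split between computation and timing that the partitioning is after.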
Automated Checks
We'd like our model to:
Obey Causality of data usage
No reading values before they're created
Meet expected times for different stages
e.g. Decode of an instruction takes at
least 1 cycle
Decode should not take more than two cycles
Want these very simple checks
Let's have the FP verify these!
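A stage-latency check of this kind could look roughly like the sketch below. The names `StageTimingChecker` and `TimingViolation`, and the idea that the TP reports an emulated cycle number with each request and response, are assumptions for illustration. The decode bounds (at least 1 cycle, at most 2) come from the slide.

```python
class TimingViolation(Exception):
    """Raised when a stage finishes outside its allowed latency window."""


class StageTimingChecker:
    def __init__(self, bounds):
        self.bounds = bounds      # stage name -> (min_cycles, max_cycles)
        self.started = {}         # (token, stage) -> emulated start cycle

    def start(self, token, stage, cycle):
        self.started[(token, stage)] = cycle

    def finish(self, token, stage, cycle):
        elapsed = cycle - self.started.pop((token, stage))
        lo, hi = self.bounds[stage]
        if not (lo <= elapsed <= hi):
            raise TimingViolation(
                f"token {token}: {stage} took {elapsed} cycles, "
                f"expected {lo}..{hi}"
            )
        return elapsed


checker = StageTimingChecker({"decode": (1, 2)})
checker.start(token=0, stage="decode", cycle=10)
checker.finish(token=0, stage="decode", cycle=11)   # 1 cycle: within bounds
```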
Verifying Causality
All execution interactions with the functional
model are provided
Annotate all data with the emulated clock cycle it
was created on
FP checks time on accesses of data
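As a rough sketch of the idea, a register file can store each value together with the emulated cycle that produced it and refuse reads that come too early. The class and exception names are hypothetical, and allowing a read in the same cycle as the write is an assumption of this example, not something the slides state.

```python
class CausalityViolation(Exception):
    """Raised when data is read before the emulated cycle that created it."""


class TimestampedRegFile:
    def __init__(self):
        self._values = {}   # reg -> (value, emulated cycle of creation)

    def write(self, reg, value, cycle):
        self._values[reg] = (value, cycle)

    def read(self, reg, cycle):
        value, created = self._values[reg]
        # Assumption: a read in the same cycle as the write is legal.
        if cycle < created:
            raise CausalityViolation(
                f"{reg} read at cycle {cycle}, created at cycle {created}"
            )
        return value


rf = TimestampedRegFile()
rf.write("r1", 42, cycle=7)
rf.read("r1", cycle=8)      # fine: the value already exists at cycle 8
# rf.read("r1", cycle=5) would raise CausalityViolation
```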
Leveraging FP structure for Timing
Sometimes the best way to model
something is to just make it
Use the target design's cache structure
as the FP's cache structure
Can just measure the number of target ticks
May need to record some more information
(where misses occurred) to get appropriate
timing
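One way to picture this: the FP's cache is a real cache with the target's geometry, it logs where misses occurred, and the timing side turns hits and misses into target ticks. The sketch below assumes a direct-mapped cache and made-up latencies (1 tick per hit, 10 per miss); neither the organization nor the numbers come from the slides.

```python
class DirectMappedCache:
    """Hypothetical stand-in for the target design's cache structure."""

    def __init__(self, num_lines, line_bytes):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines
        self.miss_log = []            # addresses where misses occurred

    def access(self, addr):
        line = addr // self.line_bytes
        index = line % self.num_lines
        tag = line // self.num_lines
        hit = self.tags[index] == tag
        if not hit:
            self.tags[index] = tag    # fill the line on a miss
            self.miss_log.append(addr)
        return hit


def target_ticks(cache, addrs, hit_ticks=1, miss_ticks=10):
    """Charge assumed latencies per access; returns total target ticks."""
    return sum(hit_ticks if cache.access(a) else miss_ticks for a in addrs)


cache = DirectMappedCache(num_lines=4, line_bytes=16)
ticks = target_ticks(cache, [0, 4, 64, 0])   # miss, hit, miss (evicts), miss
```

In this trace addresses 0 and 64 map to the same line, so the final access to 0 misses again. The miss log is the kind of extra information the slide says may be needed to get appropriate timing.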
Similar Ideas - FAST
FAST – similar underlying beliefs
Differences:
SW vs. HW functional partitions
decoupled partitions vs. tight coupling
some additional correctness checks needed
Not clear which approach is more
effective
Similar Ideas - UNUM
UNUM – Another parameterized HW
framework
Much more emphasis on HW quality and
structure
Much more work to generate
“Believable” low-level values
Aimed later in the design selection cycle
Current Progress
Initial functional partition
Single-issue (scalar) OOO design
Simple RISC ISA
Physical Reg File
Fast branch rewinds
Simple Pipeline Timing Partition
Future Progress
Porting a real ISA to the design
x86 (w/ µops)
More complicated timing models
Reorder Buffer designs
Large Cache simulations
Thanks!
{ndave, pellauer, arvind}@csail.mit.edu
[email protected]