Building a Synthesizable x86
Eriko Nurvitadhi, James C. Hoe, Babak Falsafi
{enurvita,jhoe,babak}@ece.cmu.edu
SIMFLEX/PROTOFLEX
Computer Architecture Lab at
http://www.ece.cmu.edu/~simflex/
Motivation
• Build a synthesizable x86 functional model for prototyping
most widely-used ISA
Intel won’t give out theirs
• Problem: a very complicated ISA
many instructions
482 instructions total (e.g., ADD has 14 variations)
many individually complicated instructions
PUSHAD – push all GP registers to stack
many under-specified instructions
LOADALL inst; BCD operation flag updates
• Also must be maintainable & extensible
return on investment
June 22, 2006
Overcoming Complexity
• 4 key ingredients in our approach
working SW simulator as design spec
simplified multi-cycle datapath
high-level HDL
HW-SW co-simulation validation & evaluation
• What we have today...
an x86 functional model in Bluespec
all real-mode general-purpose insts
includes I/O instructions!
boots FreeDOS OS in co-simulation testbench
synthesizes to 85% of a Virtex II Pro 70 FPGA
Max 10 MIPS (based on synthesis + simulation)
Outline
• Introduction
• Our Approach
• Status and Results
• Discussions and Future work
Functional View of an ISA
[Figure: functional model. Labels: instructions Inst_1, Inst_2, ..., Inst_n; behaviors beh_1, beh_2, ..., beh_m; actions (ACT)]
• ISA = architectural states + instructions
• instruction = set of alternate behaviors
e.g., due to different addressing modes
x86 has 482 insts but ~1000 behaviors
• behavior = sequence of actions that read & alter states
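The ISA view above can be sketched as a small data model. This is a toy illustration, not the authors' Bluespec code: the names `Behavior`, `Instruction`, and `add_reg_imm` are hypothetical, the state holds only `AX` and `ZF`, and a real x86 ADD updates several more flags.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]             # architectural state: name -> value
Action = Callable[[State], None]   # an action reads and alters state


@dataclass
class Behavior:
    """One behavior = an ordered sequence of actions."""
    name: str
    actions: List[Action]

    def run(self, state: State) -> None:
        for act in self.actions:
            act(state)


@dataclass
class Instruction:
    """One instruction = a set of alternate behaviors."""
    mnemonic: str
    behaviors: Dict[str, Behavior] = field(default_factory=dict)


def add_reg_imm(reg: str, imm: int) -> Behavior:
    """Hypothetical 'ADD reg, imm' behavior for a 16-bit register."""
    def do_add(s: State) -> None:
        s[reg] = (s[reg] + imm) & 0xFFFF   # 16-bit wraparound

    def set_zf(s: State) -> None:
        s["ZF"] = int(s[reg] == 0)         # only ZF is modeled here

    return Behavior(f"ADD {reg}, imm", [do_add, set_zf])


state: State = {"AX": 0xFFFF, "ZF": 0}
add = Instruction("ADD", {"reg_imm": add_reg_imm("AX", 1)})
add.behaviors["reg_imm"].run(state)
```

Adding another addressing mode in this model means adding another entry to `behaviors`, which is the sense in which one instruction has several behaviors.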
SW x86 Sim as ISA Spec
• Simulator source code =
precise and executable design spec
• We use Bochs
(http://bochs.sourceforge.net/)
open-source
code structure fits our high-level ISA view
i.e., explicit architecture state declaration
one C++ function per instruction behavior
(Essentially) complete x86 functionalities
simulate complete PC system
run various OSs (e.g., Linux, Win XP)
support 386 through Pentium Pro
Multi-cycle Implementation
• Sequential, multi-cycle execution
Start → Fetch → Decode → Execute → Commit → Finish
• Top-level view
[Figure: x86 functional model. Blocks: arch and aux states, decoder, functional units (FU), memory accesses, I/O operations]
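A sequential, multi-cycle execution loop can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the two-operation toy ISA (`mov`, `add`), the `run` function, and the fixed one-cycle-per-stage timing are all hypothetical; in the actual model a stage may take more cycles, for example while waiting on memory.

```python
from enum import Enum


class Stage(Enum):
    FETCH = 0
    DECODE = 1
    EXECUTE = 2
    COMMIT = 3


def run(program, regs):
    """Advance a toy machine one stage per cycle; return the cycle count."""
    pc, cycles, stage = 0, 0, Stage.FETCH
    raw = decoded = result = None
    while pc < len(program):
        cycles += 1
        if stage is Stage.FETCH:
            raw = program[pc]
            stage = Stage.DECODE
        elif stage is Stage.DECODE:
            op, dst, imm = raw
            decoded = (op, dst, imm)
            stage = Stage.EXECUTE
        elif stage is Stage.EXECUTE:
            op, dst, imm = decoded
            # "add" adds the immediate; anything else is treated as "mov"
            result = regs[dst] + imm if op == "add" else imm
            stage = Stage.COMMIT
        else:
            # COMMIT is the only stage that alters architectural state
            regs[decoded[1]] = result
            pc += 1
            stage = Stage.FETCH
    return cycles


regs = {"ax": 0}
cycles = run([("mov", "ax", 5), ("add", "ax", 3)], regs)
```

With no overlap between instructions, the cycle count is simply the sum of per-instruction stage latencies, which keeps the datapath simple at the cost of a high CPI.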
Bluespec Design Capture
• Explicit state declaration
x86 architectural states
auxiliary simulation states used by Bochs
• Predicated atomic rules
one rule per action in our ISA view
• Maintainability & extensibility
new behavior: add rules
changing behavior: add/modify rules
• Optimizations (low-level)
reduce logic: reuse + combine rules
reduce critical path delay: split rules
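The idea of predicated atomic rules can be imitated in a few lines of Python. This is only an analogy, not Bluespec: the `Rule` class, the `step` scheduler, and the `ustep` micro-step counter are hypothetical, and the sketch fires one rule per step, whereas the Bluespec compiler may schedule several non-conflicting rules in the same cycle. The example splits a PUSH of AX into two rules, one per action; the 16-bit word is stored as a single memory entry for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    guard: Callable[[dict], bool]   # predicate: may this rule fire?
    body: Callable[[dict], None]    # atomic state update


def step(rules, state) -> Optional[str]:
    """Fire the first rule whose guard holds; return its name, or None."""
    for rule in rules:
        if rule.guard(state):
            rule.body(state)
            return rule.name
    return None


def dec_sp(s):
    s["sp"] = (s["sp"] - 2) & 0xFFFF
    s["ustep"] = 1


def store(s):
    s["mem"][s["sp"]] = s["ax"]
    s["ustep"] = 2   # no guard matches ustep == 2, so execution stops


rules = [
    Rule("push_dec_sp", lambda s: s["op"] == "push" and s["ustep"] == 0, dec_sp),
    Rule("push_store", lambda s: s["op"] == "push" and s["ustep"] == 1, store),
]

state = {"op": "push", "ustep": 0, "sp": 0x100, "ax": 0x1234, "mem": {}}
fired = []
while True:
    name = step(rules, state)
    if name is None:
        break
    fired.append(name)
```

Supporting a new behavior amounts to appending rules with new guards, and splitting a long rule into two shorter ones mirrors the critical-path optimization listed above.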
HW-SW Co-simulation for Validation and Evaluation
• Virtually “plug-in” our model into a PC
execute Bochs to provide reference behavior
simulate RTL alongside the simulated Bochs PC
• For validation and performance (CPI) eval
[Figure: two panels. Validation: RTL CPU checked (==) against Bochs CPU, with Bochs MEM and I/Os. Performance Evaluation: RTL CPU with Bochs MEM and I/Os]
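The validation setup amounts to running a reference model and the model under test in lockstep and comparing state after every instruction. The sketch below assumes that shape; `cosim`, `ref_inc`, and `buggy_inc` are hypothetical stand-ins for the Bochs CPU and the RTL CPU, and the single-register state is a toy.

```python
def cosim(ref_step, dut_step, ref_state, dut_state, n_insts):
    """Run both models in lockstep.

    Returns the index of the first instruction after which the two
    states differ, or None if they agree for all n_insts instructions.
    """
    for i in range(n_insts):
        ref_step(ref_state)
        dut_step(dut_state)
        if ref_state != dut_state:
            return i
    return None


def ref_inc(s):
    # Reference behavior: 16-bit increment with wraparound
    s["ax"] = (s["ax"] + 1) & 0xFFFF


def buggy_inc(s):
    # Deliberately wrong model: forgets the 16-bit wraparound
    s["ax"] = s["ax"] + 1


first_bad = cosim(ref_inc, buggy_inc, {"ax": 0xFFFE}, {"ax": 0xFFFE}, 4)
clean = cosim(ref_inc, ref_inc, {"ax": 0xFFFE}, {"ax": 0xFFFE}, 4)
```

Reporting the first divergent instruction, rather than only a final pass/fail, is what makes this kind of check useful for debugging a model against a long boot trace.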
Co-Simulation Testbench
[Figure: testbench flow]
Bochs src code → Bluespec x86 (manual coding)
Bluespec x86 → Verilog x86 (Bluespec compilation) → C++ x86 (C++ conversion, Verilator); automated
Workloads on Bochs: Bochs simulation + C++ x86 → co-simulation → traces
Traces → validation and performance evaluation results
Outline
• Introduction
• Our Approach
• Status and Results
• Discussions and Future work
Implementation Progress
• Implemented ISA subset
all real-mode general-purpose instructions
166 insts, 369 inst behaviors
compared to complete x86
482 insts, ~1000 inst behaviors
• Synthesis
convert Bluespec to synthesizable Verilog
Xilinx ISE 7.1, Virtex II Pro 70 (FPGA on BEE2)
results: 98 MHz, 28K Slices (85% util)
Co-simulation Results
• Validation
validated our model w/ FreeDOS bootup traces
tested first 140M dynamic instructions
exercised 183 inst behaviors
• Performance Evaluation
also with FreeDOS bootup traces
[Figure: two plots versus memory latency (1, 5, 10 cycles): CPI (axis 0 to 20) and MIPS @ 98 MHz (axis 0 to 12)]
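The two plots are related by the usual identity MIPS = clock frequency in MHz / CPI. The sketch below works that arithmetic for the 98 MHz clock reported earlier; the function names are hypothetical, and the implied CPI is derived from the slide's "Max 10 MIPS" figure rather than read from the plot.

```python
def mips(freq_mhz: float, cpi: float) -> float:
    """Millions of instructions per second at a given clock and CPI."""
    return freq_mhz / cpi


def cpi_for(freq_mhz: float, target_mips: float) -> float:
    """CPI needed to reach target_mips at the given clock."""
    return freq_mhz / target_mips


CLOCK_MHZ = 98.0                          # synthesis result from the slides
implied_cpi = cpi_for(CLOCK_MHZ, 10.0)    # CPI implied by "Max 10 MIPS"
check_mips = mips(CLOCK_MHZ, implied_cpi)
```

Under this identity, a peak of about 10 MIPS at 98 MHz corresponds to a CPI of roughly 9.8, and any increase in CPI from longer memory latency lowers MIPS proportionally.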
A Complete x86?
• To finish the x86 model
can be done, but takes effort
consumes a lot of FPGA resources
• Do we really need all of it?
a workload uses only a subset of the ISA
some insts used more often than others
parts of the ISA are never or rarely used
• PROTOFLEX migration
combine FPGA & simulation
model necessary subset in HW, the rest in SW
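The split between a hardware subset and a software fallback can be pictured as a dispatcher. This is a toy sketch and not how PROTOFLEX implements migration: `make_hybrid`, the instruction names, and the handlers are hypothetical, and both "hardware" and "software" are plain Python functions here.

```python
def make_hybrid(hw_behaviors, sw_fallback):
    """Build an executor that prefers the HW subset and counts each path."""
    stats = {"hw": 0, "sw": 0}

    def execute(inst, state):
        handler = hw_behaviors.get(inst)
        if handler is not None:
            stats["hw"] += 1          # common case: modeled in hardware
            handler(state)
        else:
            stats["sw"] += 1          # rare case: handed to the simulator
            sw_fallback(inst, state)

    return execute, stats


def hw_inc(s):
    s["ax"] = (s["ax"] + 1) & 0xFFFF


def sw_fallback(inst, s):
    # Stand-in for a full software simulator handling everything else
    if inst == "neg":
        s["ax"] = (-s["ax"]) & 0xFFFF
    else:
        raise ValueError(f"unsupported instruction: {inst}")


execute, stats = make_hybrid({"inc": hw_inc}, sw_fallback)
state = {"ax": 0}
for inst in ["inc", "inc", "neg", "inc"]:
    execute(inst, state)
```

The `stats` counters illustrate the trade-off on the slide: the more often a workload stays within the hardware subset, the less the software path matters for overall speed.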
Future Work
• Short-term (Fall’06)
implement protected-mode support
validate/evaluate w/ more workloads
Linux, SPEC-CPU, commercial apps (DB2)
deployment on the BEE2 board
• Long-term
full-system prototype execution
architectural exploration