Building a Synthesizable x86
Eriko Nurvitadhi, James C. Hoe, Babak Falsafi
{enurvita,jhoe,babak}@ece.cmu.edu
SIMFLEX/PROTOFLEX
Computer Architecture Lab at Carnegie Mellon
http://www.ece.cmu.edu/~simflex/
Motivation
• Build a synthesizable x86 functional model for prototyping
  - x86 is the most widely used ISA
  - Intel won't give out theirs
• Problem: a very complicated ISA
  - many instructions
    - 482 instructions total (ADD alone has 14 variations)
  - many individually complicated instructions
    - PUSHAD: push all general-purpose registers onto the stack
  - many under-specified instructions
    - e.g., the LOADALL instruction; flag updates after BCD operations
• Also must be maintainable & extensible
  - for return on investment
June 22, 2006
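To make the PUSHAD example concrete, here is a minimal C++ sketch of its 32-bit behavior, following the operation order in Intel's documentation: the original ESP is saved, then EAX, ECX, EDX, EBX, the saved ESP, EBP, ESI and EDI are pushed in that order. The `Cpu` struct, the flat byte-array memory and the `push32`/`load32` helpers are hypothetical simplifications (no segmentation, paging, faults, or the 16-bit PUSHA form), not code from the model.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified CPU state: the eight 32-bit GP registers only.
enum Reg { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

struct Cpu {
    std::array<uint32_t, 8> gpr{};   // indexed by Reg
    std::vector<uint8_t> mem;        // flat memory, no segmentation (assumption)
};

// Push one 32-bit value: decrement ESP by 4, then store it little-endian.
void push32(Cpu& cpu, uint32_t value) {
    cpu.gpr[ESP] -= 4;
    const uint32_t addr = cpu.gpr[ESP];
    assert(addr + 4 <= cpu.mem.size());
    for (int i = 0; i < 4; ++i)
        cpu.mem[addr + i] = static_cast<uint8_t>(value >> (8 * i));
}

// PUSHAD: one instruction, eight stack writes; ESP is pushed with the
// value it had *before* the instruction started.
void pushad(Cpu& cpu) {
    const uint32_t original_esp = cpu.gpr[ESP];
    push32(cpu, cpu.gpr[EAX]);
    push32(cpu, cpu.gpr[ECX]);
    push32(cpu, cpu.gpr[EDX]);
    push32(cpu, cpu.gpr[EBX]);
    push32(cpu, original_esp);
    push32(cpu, cpu.gpr[EBP]);
    push32(cpu, cpu.gpr[ESI]);
    push32(cpu, cpu.gpr[EDI]);
}

// Helper to read back a pushed 32-bit value (little-endian).
uint32_t load32(const Cpu& cpu, uint32_t addr) {
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= uint32_t(cpu.mem[addr + i]) << (8 * i);
    return v;
}
```

Starting from ESP = 128, the sketch leaves ESP at 96, with EAX stored at address 124 and EDI at address 96.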
Overcoming Complexity
• 4 key ingredients in our approach
  - a working SW simulator as the design spec
  - a simplified multi-cycle datapath
  - a high-level HDL
  - HW-SW co-simulation for validation & evaluation
• What we have today...
  - an x86 functional model in Bluespec
  - all real-mode general-purpose instructions
    - includes I/O instructions!
  - boots FreeDOS in a co-simulation testbench
  - synthesizes to 85% of a Virtex II Pro 70 FPGA
  - up to ~10 MIPS (estimated from the synthesized clock rate and simulated CPI)
Outline
• Introduction
• Our Approach
• Status and Results
• Discussions and Future work
Functional View of an ISA
[Figure: functional model of an ISA. Instructions Inst_1 ... Inst_n each map to alternate behaviors beh_1 ... beh_m; each behavior is a sequence of actions (ACT).]
• ISA = architectural states + instructions
• instruction = set of alternate behaviors
  - e.g., due to different addressing modes
  - x86 has 482 instructions but ~1000 behaviors
• behavior = sequence of actions that read & alter states
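The three definitions above map directly onto data types. A minimal C++ sketch, assuming a toy state of named registers plus word-addressed memory: an `Action` reads or alters the state, a `Behavior` is an ordered list of actions, and an `Instruction` holds its alternate behaviors (here, two addressing-mode variants of ADD, without flag updates). All names, including the auxiliary `tmp` register, are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Architectural (plus auxiliary) state, heavily reduced for illustration.
struct State {
    std::map<std::string, uint32_t> regs;
    std::map<uint32_t, uint32_t> mem;   // word-addressed for simplicity
};

// An action reads and/or alters the state.
using Action = std::function<void(State&)>;

// A behavior is an ordered sequence of actions.
struct Behavior {
    std::string name;
    std::vector<Action> actions;
    void run(State& s) const { for (const auto& a : actions) a(s); }
};

// An instruction is a set of alternate behaviors (e.g., one per addressing mode).
struct Instruction {
    std::string mnemonic;
    std::vector<Behavior> behaviors;
};

// Hypothetical example: ADD with a register-register and a register-memory behavior.
Instruction make_add() {
    Behavior reg_reg{"ADD eax, ebx", {
        [](State& s) { s.regs["tmp"] = s.regs["eax"] + s.regs["ebx"]; },  // compute
        [](State& s) { s.regs["eax"] = s.regs["tmp"]; },                   // write back
    }};
    Behavior reg_mem{"ADD eax, [esi]", {
        [](State& s) { s.regs["tmp"] = s.mem[s.regs["esi"]]; },  // load operand
        [](State& s) { s.regs["tmp"] += s.regs["eax"]; },         // add
        [](State& s) { s.regs["eax"] = s.regs["tmp"]; },          // write back
    }};
    return {"ADD", {reg_reg, reg_mem}};
}
```

Counting `behaviors.size()` over every `Instruction` is what separates the "482 instructions" and "~1000 behaviors" figures.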
SW x86 Sim as ISA Spec
• Simulator source code = precise and executable design spec
• We use Bochs (http://bochs.sourceforge.net/)
  - open-source
  - code structure fits our high-level ISA view, i.e.,
    - explicit architectural state declaration
    - one instruction behavior → one C++ function
  - (essentially) complete x86 functionality
    - simulates a complete PC system
    - runs various OSs (e.g., Linux, Win XP)
    - supports 386 through Pentium Pro
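The snippet below mimics the code organization described above in a few lines: architectural state declared explicitly in one structure, and each instruction behavior written as its own function. All names (`ArchState`, `set_add_flags`, `add_rm32_r32_regform`, `add_eax_imm32`) are hypothetical and are not Bochs's actual classes or handlers. Only CF, ZF, SF and OF are computed; AF, PF and the EIP update are omitted for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Explicitly declared architectural state (hypothetical, reduced).
struct ArchState {
    uint32_t gpr[8] = {};                                // EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI
    bool cf = false, zf = false, sf = false, of = false; // subset of EFLAGS
};

// Shared helper: flag updates common to every 32-bit ADD behavior.
void set_add_flags(ArchState& s, uint32_t a, uint32_t b, uint32_t r) {
    s.cf = r < a;                              // unsigned carry out
    s.zf = r == 0;
    s.sf = (r >> 31) & 1;
    s.of = (((a ^ r) & (b ^ r)) >> 31) & 1;    // signed overflow
}

// Behavior 1: ADD r/m32, r32 with a register destination (ModRM mod == 3).
void add_rm32_r32_regform(ArchState& s, unsigned dst, unsigned src) {
    const uint32_t a = s.gpr[dst], b = s.gpr[src], r = a + b;
    s.gpr[dst] = r;
    set_add_flags(s, a, b, r);
}

// Behavior 2: ADD EAX, imm32 (short-form encoding).
void add_eax_imm32(ArchState& s, uint32_t imm) {
    const uint32_t a = s.gpr[0], r = a + imm;
    s.gpr[0] = r;
    set_add_flags(s, a, imm, r);
}
```

Because both ADD behaviors call the same flag helper, a correction to ADD's flag semantics lands in one place.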
Multi-cycle Implementation
• Sequential, multi-cycle execution
  - Start → Fetch → Decode → Execute → Commit → Finish
• Top-level view
  [Figure: x86 functional model. Architectural and auxiliary states, a decoder, and several functional units (FUs), with external memory accesses and I/O operations.]
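A cycle-level software sketch of the sequential Start → Fetch → Decode → Execute → Commit → Finish flow. Each call to `tick()` advances one clock, and only one instruction is in flight at a time, so CPI is simply total cycles over retired instructions. The per-stage costs (a configurable number of fetch cycles standing in for memory latency, one cycle for Start, Decode and Commit, and a per-instruction count of Execute cycles) are assumptions for illustration, not the timing of the actual Bluespec model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The six control states of the sequential execution flow.
enum class Stage { Start, Fetch, Decode, Execute, Commit, Finish };

// Hypothetical multi-cycle controller: exactly one instruction in flight.
struct MultiCycleCpu {
    unsigned mem_latency;               // cycles spent in Fetch (assumed >= 1)
    std::vector<unsigned> exec_steps;   // Execute cycles per instruction (assumed >= 1)
    Stage stage = Stage::Start;
    unsigned wait = 0;                  // cycles left in the current multi-cycle stage
    std::size_t pc = 0;                 // index of the current instruction
    uint64_t cycles = 0, retired = 0;

    // Advance the controller by exactly one clock cycle.
    void tick() {
        ++cycles;
        switch (stage) {
        case Stage::Start:
            assert(mem_latency >= 1);
            stage = Stage::Fetch;
            wait = mem_latency;
            break;
        case Stage::Fetch:              // wait for the instruction bytes
            if (--wait == 0) stage = Stage::Decode;
            break;
        case Stage::Decode:             // select the behavior, set up Execute
            assert(exec_steps[pc] >= 1);
            stage = Stage::Execute;
            wait = exec_steps[pc];
            break;
        case Stage::Execute:            // one action per cycle (assumption)
            if (--wait == 0) stage = Stage::Commit;
            break;
        case Stage::Commit:             // update architectural state and retire
            ++retired;
            ++pc;
            stage = (pc < exec_steps.size()) ? Stage::Start : Stage::Finish;
            break;
        case Stage::Finish:
            break;                      // halted
        }
    }

    // Run until every instruction has retired.
    void run() {
        if (exec_steps.empty()) { stage = Stage::Finish; return; }
        while (stage != Stage::Finish) tick();
    }

    double cpi() const { return retired ? double(cycles) / retired : 0.0; }
};
```

In this sketch an instruction costs 3 + `mem_latency` + its Execute cycles, so two one-step instructions take 10 cycles with a 1-cycle fetch and 28 cycles with a 10-cycle fetch.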
Bluespec Design Capture
• Explicit state declaration
  - x86 architectural states
  - auxiliary simulation states used by Bochs
• Predicated atomic rules
  - one rule → one action in our ISA view
• Maintainability & extensibility
  - new behavior: add rules
  - changed behavior: add/modify rules
• Optimizations (low-level)
  - reduce logic: reuse + combine rules
  - reduce critical-path delay: split rules
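Bluespec rules are guarded atomic actions: a rule may fire only when its predicate holds, its reads see the state from before it fires, and its updates take effect together. The C++ sketch below emulates that idea in software with a deliberately simple scheduler that fires at most one enabled rule per cycle, in priority order. The real Bluespec compiler can fire several non-conflicting rules in the same cycle, so this is an analogy rather than its exact semantics; the `Regs` state and the rule names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical state touched by the rules.
struct Regs {
    int stage = 0;              // which action of the current behavior is next
    int a = 0, b = 0, acc = 0;
};

// A predicated (guarded) atomic rule.
struct Rule {
    std::string name;
    std::function<bool(const Regs&)> guard;           // predicate
    std::function<void(const Regs&, Regs&)> action;   // reads cur, writes next
};

// Fire the first enabled rule in priority order. The action reads only the
// pre-rule state and writes a copy, so all of its updates commit together.
// Returns false if no rule is enabled.
bool step(Regs& state, const std::vector<Rule>& rules, std::string* fired = nullptr) {
    for (const Rule& r : rules) {
        if (!r.guard(state)) continue;
        Regs next = state;
        r.action(state, next);
        state = next;
        if (fired) *fired = r.name;
        return true;
    }
    return false;
}

// Example: one rule per action. `swap` shows atomicity: both writes use the
// old values of a and b, as reads inside a single Bluespec rule would.
std::vector<Rule> example_rules() {
    return {
        {"add",  [](const Regs& s) { return s.stage == 0; },
                 [](const Regs& c, Regs& n) { n.acc = c.a + c.b; n.stage = 1; }},
        {"swap", [](const Regs& s) { return s.stage == 1; },
                 [](const Regs& c, Regs& n) { n.a = c.b; n.b = c.a; n.stage = 2; }},
    };
}
```

Adding a behavior in this style means appending rules to the list, which mirrors the "new behavior: add rules" point on the slide.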
HW-SW Co-simulation for Validation and Evaluation
• Virtually “plug” our model into a PC
  - execute Bochs to provide the reference behavior
  - simulate the RTL alongside the simulated Bochs PC
• For validation and performance (CPI) evaluation
  [Figure: Validation: the RTL CPU and the Bochs CPU run side by side in the Bochs-simulated PC (CPU, MEM, I/Os) and their results are compared (==). Performance Evaluation: the RTL CPU runs with the Bochs-simulated MEM and I/Os.]
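The validation mode can be pictured as a lockstep loop: step the reference and the model under test by one instruction each, compare architectural state, and stop at the first divergence. In the testbench described here, the model under test would be the Verilator-generated C++ x86; in this sketch both sides are hypothetical toy models behind a common `CpuModel` interface, and the compared state is reduced to EIP plus the eight GP registers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Architectural state compared after every instruction (reduced; hypothetical).
struct ArchSnapshot {
    uint32_t eip = 0;
    uint32_t gpr[8] = {};
    bool operator==(const ArchSnapshot& o) const {
        if (eip != o.eip) return false;
        for (int i = 0; i < 8; ++i)
            if (gpr[i] != o.gpr[i]) return false;
        return true;
    }
};

// Common interface for the reference CPU and the model under test.
struct CpuModel {
    virtual ~CpuModel() = default;
    virtual void step() = 0;                     // execute one instruction
    virtual ArchSnapshot snapshot() const = 0;
};

// Run both models in lockstep; return the index of the first diverging
// instruction, or std::nullopt if all n instructions matched.
std::optional<uint64_t> lockstep(CpuModel& ref, CpuModel& dut, uint64_t n) {
    for (uint64_t i = 0; i < n; ++i) {
        ref.step();
        dut.step();
        if (!(ref.snapshot() == dut.snapshot())) return i;
    }
    return std::nullopt;
}

// Toy stand-in: each "instruction" adds 1 to EAX and advances EIP by 1.
// An optional injected bug makes it add 2 when EIP equals bug_at.
struct ToyCpu : CpuModel {
    ArchSnapshot s;
    uint64_t bug_at;
    explicit ToyCpu(uint64_t bug = UINT64_MAX) : bug_at(bug) {}
    void step() override {
        s.gpr[0] += (s.eip == bug_at) ? 2 : 1;
        s.eip += 1;
    }
    ArchSnapshot snapshot() const override { return s; }
};
```

Reporting the first divergent instruction, rather than just pass/fail, is what makes a mismatch deep into a long boot trace debuggable.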
Co-Simulation Testbench
[Figure: co-simulation testbench flow]
• Bochs src code → Bluespec x86 (manual coding)
• Bluespec x86 → Verilog x86 (Bluespec compilation, automated)
• Verilog x86 → C++ x86 (C++ conversion with Verilator, automated)
• Workloads on Bochs → Bochs simulation → traces
• C++ x86 + traces → co-simulation → validation and performance evaluation results
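In the trace-based flow, Bochs runs the workload and records per-instruction state, and the co-simulation checks the model against those records. A minimal C++ sketch of that check, assuming a hypothetical text trace with one line per instruction holding EIP, EAX and EBX in hex; the actual trace format used by the testbench is not given in these slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One per-instruction record (hypothetical format: "eip eax ebx" in hex).
struct TraceRecord {
    uint32_t eip = 0, eax = 0, ebx = 0;
    bool operator==(const TraceRecord& o) const {
        return eip == o.eip && eax == o.eax && ebx == o.ebx;
    }
};

// Parse a whole trace; lines that do not hold three hex fields are skipped.
std::vector<TraceRecord> parse_trace(const std::string& text) {
    std::vector<TraceRecord> records;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        TraceRecord r;
        if (fields >> std::hex >> r.eip >> r.eax >> r.ebx) records.push_back(r);
    }
    return records;
}

// Index of the first record where the model disagrees with the reference,
// or std::nullopt if the traces match; a length difference counts as a mismatch.
std::optional<std::size_t> first_mismatch(const std::vector<TraceRecord>& ref,
                                          const std::vector<TraceRecord>& model) {
    const std::size_t n = std::min(ref.size(), model.size());
    for (std::size_t i = 0; i < n; ++i)
        if (!(ref[i] == model[i])) return i;
    if (ref.size() != model.size()) return n;
    return std::nullopt;
}
```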
Implementation Progress
• Implemented ISA subset
  - all real-mode general-purpose instructions
    - 166 instructions, 369 instruction behaviors
  - compared to the complete x86
    - 482 instructions, ~1000 instruction behaviors
• Synthesis
  - convert Bluespec to synthesizable Verilog
  - Xilinx ISE 7.1, Virtex II Pro 70 (FPGA on BEE2)
  - results: 98 MHz, 28K slices (85% utilization)
Co-simulation Results
• Validation
  - validated our model with FreeDOS bootup traces
    - tested the first 140M dynamic instructions
    - exercised 183 instruction behaviors
• Performance Evaluation
  - also with FreeDOS bootup traces
  [Figure: CPI and MIPS (@ 98 MHz) for FreeDOS bootup, each plotted against memory latency of 1, 5, and 10 cycles]
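Because the datapath is sequential and multi-cycle, throughput follows directly from the clock rate and CPI. As a worked example of the relation behind the MIPS plot (arithmetic only, not a measured value), reaching 10 MIPS at 98 MHz corresponds to a CPI of 9.8:

```latex
\[
\mathrm{CPI} = \frac{\text{cycles}}{\text{instructions}},
\qquad
\mathrm{MIPS} = \frac{f_{\mathrm{clk}}\,[\mathrm{MHz}]}{\mathrm{CPI}},
\qquad
\mathrm{CPI} = \frac{98\ \mathrm{MHz}}{10\ \mathrm{MIPS}} = 9.8
\]
```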
A Complete x86?
• To finish the x86 model
  - can be done, but takes effort
  - consumes a lot of FPGA resources
• Do we really need all of it?
  - a workload uses only a subset of the ISA
    - some instructions are used more often than others
    - parts of the ISA are never or rarely used
• PROTOFLEX migration
  - combine FPGA & simulation
  - model the necessary subset in HW, the rest in SW
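The hardware/software split can be sketched as a dispatcher: dynamic instructions whose behavior is implemented on the FPGA run there, and the rest fall back to the software simulator. A minimal C++ sketch with hypothetical names, where `run_hw` and `run_sw` are stand-ins for the two execution engines. It ignores the cost of moving architectural state between the two sides, which a real hybrid scheme would have to account for.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical hybrid executor: behaviors listed in `hw_supported` run on the
// FPGA model; everything else is handed to the software simulator.
struct HybridExecutor {
    std::unordered_set<std::string> hw_supported;
    std::function<void(const std::string&)> run_hw;   // stand-in for the FPGA model
    std::function<void(const std::string&)> run_sw;   // stand-in for the SW simulator
    uint64_t hw_count = 0, sw_count = 0;

    // Dispatch one dynamic instruction behavior to HW or SW.
    void execute(const std::string& behavior) {
        if (hw_supported.count(behavior)) { run_hw(behavior); ++hw_count; }
        else                              { run_sw(behavior); ++sw_count; }
    }

    // Fraction of dynamic instructions that stayed in hardware.
    double hw_fraction() const {
        const uint64_t total = hw_count + sw_count;
        return total ? double(hw_count) / total : 0.0;
    }
};
```

Tracking `hw_fraction()` over a workload is one way to judge whether the subset placed in hardware covers the instructions that matter for that workload.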
Future Work
• Short-term (Fall ’06)
  - implement protected-mode support
  - validate/evaluate with more workloads
    - Linux, SPEC CPU, commercial apps (DB2)
  - deployment on the BEE2 board
• Long-term
  - full-system prototype execution
  - architectural exploration