A Synthesizeable VHDL Model of the 1750A Integer Subset
Robert B. Reese / Vince Sanders
Microsystems Prototyping Lab
Mississippi State/NSF Engineering Research Center
[email protected], http://www.erc.msstate.edu/mpl
Introduction
Robert Reese, PhD EE, TAMU 1985
– MCC CAD Program ’85-88
– Joined MSU ECE faculty 09/89
Microsystems Prototyping Lab
– Associated with MSU/NSF ERC
– 3 faculty, 2 full-time engineers, MS/PhD grad students
– Current projects: mixed-signal VLSI, VHDL modeling, ECAD via the WWW
2/98 BR/VS - MPL -1750A 2
Presentation Outline
Goals
1750A Overview
F9450 Implementation
Model Structure
Model Testing
Synthesis Results
What is Left?
Goals
Synthesizeable Model of 1750A for Legacy Replacement Experiments
Constraints
– Not much time (start 8/15/97, finish 12/31/97)
– Unfamiliarity with 1750A
Decided on
– Use F9450 implementation (D. Barker)
– No floating point
– Would not try to duplicate F9450 instruction cycle counts
Environment
Sun SparcStations, Solaris OS
Model Tech VHDL Simulator (Mentor qhdl/qhsim)
Synopsys synthesis
SPARCserver with 8 x 250 MHz CPUs and 2 GB RAM, used for regression tests
Mil-Std-1750A
True CISC (Complex Instruction Set Computer)
– Large number of instructions
– Many addressing modes (e.g., load has 8 addressing modes)
– 16-bit (single precision) and 32-bit (double precision) operations
– 16 General Purpose Registers; Status, Fault, Pending Interrupt, and Interrupt Mask registers
– Optional Extended Addressing capability
1750A (cont.)
Separate set of IO instructions
– 13 required, 40 optional IO instructions
– Access optional assets such as timers, MMU
Console Mode Operation
– Allows external hardware access to internal registers
– 14 console operations
16 Interrupt Sources
Mil-Std-1750A does not specify the bus interface
Fairchild F9450
Complete Mil-Std-1750A implementation
Shared External Address/Data Bus (IB)
– Arbitrated bus access
– Wait signals for both address and data
Shortest instruction: 4 clocks (logic ops); longest integer op: 245 clocks (double-precision integer divide)
Microprogrammed control
1750A VHDL Model
Implements 173 opcodes
– All integer operations
– Required IO + 2 optional IO
– Console mode: Enter, Examine Register, Continue
Implements F9450 Bus functionality
Instruction cycle counts same or less than F9450
Synthesizeable
[Figure: Bus Diagram for Model. Blocks: Control FSM, Decode Logic, ALU, Inc Logic, RF Core, Constant Gen Logic. Registers: IR, IR_exe, IBADDR, MDR, IC, IC_old, MAR, A, B, C, FLT, SW, PI, FMK, IMK, XH, XL, YH, YL. Buses: ABUS, BBUS, IB.]
Register Definition (all registers 16 bits)
IR        instruction register; destination for the instruction being fetched
IR_exe    instruction currently being executed
MDR       memory data register; data buffer to/from IB
IBADDR    address register; drives IB
IC        instruction counter of the next instruction
IC_old    instruction counter of the currently executing instruction
MAR       memory address register; operand address
A, B      buffer registers for data read from RF core
C         buffer register for data written to RF core
FLT       fault register
SW        status word register
PI        pending interrupt register
FMK       fault mask register
IMK       interrupt mask register
XH, XL    temporary registers used in Mult, Div, IO ops
YH, YL    temporary registers used in Mult, Div, IO ops
Some smaller miscellaneous registers are not shown on the diagram; they are used for RF address computation, constant block addressing, etc.
Entity Hierarchy
cpu1750a - tristate signals to pads here
cpucore - structural
biu - external bus interface
aproc - structural
incdec - increment, decrement
aproclogic - IC, MAR
dpath - structural
rf - structural
rfcore - 16 GPR regs (latches)
rflogic - buffer regs for in/out data, RF addressing
alulogic - all ALU functions except +/-
addsub - ALU adder/subtractor
constants - constants generation
fault - interrupt logic
ioproc - temp regs for IO, mul/div ops
decode - opcode decode
control - structural
fsm0 - fsm next-state logic
fsm1 - fsm next-state logic
fsm2 - fsm next-state logic
…
fsm6 - fsm next-state logic
merge - merge for fsm0:fsm6 outputs
cstate - fsm state registers
Comments on Model Hierarchy
Often created separate entities for purposes of hardware mapping:
– ALU split into alulogic and addsub
  • alulogic is random logic
  • addsub implementation is technology dependent (e.g., X4000 fast-carry chain versus standard-cell CLA implementation)
– Register file split into rfcore (16 GPRs) and rflogic
  • X4000E CLB DPRAM is well suited to the rfcore implementation
[Figure: Control (#states = 547). FSM0 to FSM6 feed MERGE (OR). Outputs: Dpath Ctrl (unregistered; 6 to BIU, 2 to Fault), Dpath Ctrl (registered, 83 signals), FSM State, Flags (registered, 53 signals).]
Comments on Control
547 states in FSM (3.2 states average per opcode)
– States NOT distributed equally (logic ops use < 1.0 unique states per opcode; VIO required 24 states)
Unregistered signals from MERGE either go to shallow logic (BIU FSM) or are registered immediately at their destination (FAULT)
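One way to see why MERGE can be a plain OR, assuming each sub-FSM owns a disjoint set of states and drives all of its outputs low in states it does not own: ORing every sub-FSM's output vector then recovers exactly the outputs of the one sub-FSM that is active. The minimal Python sketch below illustrates this; the two sub-FSMs, their states, and the signal names (`LOAD_A`, `ALU_ADD`, etc.) are hypothetical stand-ins for fsm0 to fsm6, not taken from the model.

```python
# Hypothetical control-signal bits (not the model's real signal names)
LOAD_A = 1 << 0
LOAD_B = 1 << 1
ALU_ADD = 1 << 2
WRITE_RF = 1 << 3


def make_fsm(table):
    """table maps an owned state -> (control bits, next state)."""
    def step(state):
        if state not in table:
            return 0, 0            # state owned elsewhere: drive all outputs low
        ctrl, nxt = table[state]
        return ctrl, 1 << nxt      # next state reported one-hot
    return step


# Two hypothetical sub-FSMs standing in for fsm0..fsm6; their state sets are disjoint
fsm0 = make_fsm({0: (LOAD_A | LOAD_B, 1), 1: (ALU_ADD, 2)})
fsm1 = make_fsm({2: (WRITE_RF, 0)})


def merge(fsms, state):
    """MERGE block: bitwise OR of every sub-FSM's outputs."""
    ctrl, onehot = 0, 0
    for fsm in fsms:
        c, n = fsm(state)
        ctrl |= c
        onehot |= n
    return ctrl, onehot.bit_length() - 1   # one-hot -> state index


# `state` plays the role of the cstate registers
state, trace = 0, []
for _ in range(4):
    ctrl, next_state = merge([fsm0, fsm1], state)
    trace.append((state, ctrl))
    state = next_state

print(trace)  # [(0, 3), (1, 4), (2, 8), (0, 3)]
```

The OR is only correct under the disjoint-ownership assumption; if two sub-FSMs asserted outputs in the same state, their control words would simply be combined.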
Comments on Control (cont.)
FSM split for easier synthesis
Efficient synthesis requires a two-step process:
– Synthesis of individual blocks to gates
– Flatten the gate netlist from CONTROL down and resynthesize to remove gates due to the MERGE block
FSM implements a one-level subroutine capability to increase state sharing
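A one-level subroutine lets several opcodes share one sequence of states by saving a single return state, with no stack. The Python sketch below assumes exactly that (one return-state register); the opcodes A and B and the shared `FETCH0`/`FETCH1` operand-fetch states are hypothetical, chosen only to show how the sharing works.

```python
# Each state maps to an action: ("goto", next), ("call", sub_start, return_to),
# ("ret",) or ("done",). All state names are hypothetical.
TABLE = {
    # hypothetical opcode A: shared operand fetch, then its own state
    "A0": ("call", "FETCH0", "A1"),
    "A1": ("goto", "DONE"),
    # hypothetical opcode B: same shared operand fetch, then its own state
    "B0": ("call", "FETCH0", "B1"),
    "B1": ("goto", "DONE"),
    # shared subroutine states, used by both opcodes
    "FETCH0": ("goto", "FETCH1"),
    "FETCH1": ("ret",),
    "DONE": ("done",),
}


def run(start):
    """Step the FSM from `start`, returning the sequence of visited states."""
    state, ret_reg, visited = start, None, []
    while True:
        visited.append(state)
        action = TABLE[state]
        if action[0] == "goto":
            state = action[1]
        elif action[0] == "call":
            # One level only: a nested call would overwrite ret_reg.
            ret_reg = action[2]
            state = action[1]
        elif action[0] == "ret":
            state = ret_reg
        else:  # "done"
            return visited


print(run("A0"))  # ['A0', 'FETCH0', 'FETCH1', 'A1', 'DONE']
print(run("B0"))  # ['B0', 'FETCH0', 'FETCH1', 'B1', 'DONE']
```

Both opcodes pass through the same two FETCH states, so those states are implemented once; because there is only one return register, a subroutine cannot itself call another subroutine.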
Model Testing
VHDL Testbench has cpu1750a + memory + stimulus
Unix-based sim1750/as1750 for producing golden results
K. Hill provided SEAFAC* VSW 1750A assembly tests (circa 1984)
– 272 non-floating-point tests
* SEAFAC VSW: Systems Engineering Avionics FACility Verification SoftWare
SEAFAC Tests
Separate ASM file for each instruction/addressing mode
– lubi5131.asm: load from upper byte, memory indirect indexed
– lubi5130.asm: load from upper byte, memory indirect
Multiple operand data sets
– lubi5130 contains 18 operand sets
Result, flags, and interrupt bits checked
Regression Test System
Perl script that would:
– Read the original SEAFAC ASM file and convert it to be compatible with as1750/sim1750
– Run sim1750 to produce the golden result
– Run the VHDL simulation to produce the test result
– Indicate pass/fail; if fail, indicate the operand set(s) that failed
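The pass/fail step of this flow can be sketched as follows. This is Python rather than the original Perl, and it assumes both the golden sim1750 run and the VHDL run have already been parsed into one record per operand set; the `OperandResult` layout and the sample values are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class OperandResult(NamedTuple):
    result: int      # 16-bit result word
    flags: int       # condition status bits (hypothetical encoding)
    interrupts: int  # pending interrupt bits


def compare(golden: Dict[int, OperandResult],
            test: Dict[int, OperandResult]) -> Tuple[bool, List[int]]:
    """Return (passed, sorted list of failing operand-set numbers).

    An operand set missing from the test run counts as a failure.
    """
    failed = [idx for idx, expected in golden.items()
              if test.get(idx) != expected]
    return (not failed, sorted(failed))


# Hypothetical data for one test file with three operand sets
golden = {1: OperandResult(0x1234, 0b0010, 0),
          2: OperandResult(0x0000, 0b0100, 0),
          3: OperandResult(0xFFFF, 0b0001, 0)}
test = {1: OperandResult(0x1234, 0b0010, 0),
        2: OperandResult(0x0000, 0b1000, 0),   # flag mismatch
        3: OperandResult(0xFFFF, 0b0001, 0)}

passed, failed_sets = compare(golden, test)
print("PASS" if passed else f"FAIL: operand set(s) {failed_sets}")
# FAIL: operand set(s) [2]
```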
Regression Test Results
Regression tests were run against both the behavioral model and the synthesized netlist model
Of the 272 tests:
– 220 passed
– 48 could not be automatically converted or were incompatible with sim1750
– 4 failed because the reference simulator produced an incorrect ‘C’ flag value (bug identified in the 1750A simulator C code)
Sample Execution Times
drqqa110 (double-precision divide), 242 operand sets
– 110 min (gate-level)
– 4 min 50 s (behavioral)
ddxqa220 (double-precision divide), 198 operand sets
– 85 min (gate-level)
– 3 min 39 s (behavioral)
dmrq9210 (double-precision multiply), 198 operand sets
– 80 min (gate-level)
– 3 min 51 s (behavioral)
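These times imply a gate-level slowdown of roughly 21x to 23x. The short Python calculation below derives those ratios, and the gate-level time per operand set, directly from the numbers on this slide.

```python
runs = {
    # test: (operand sets, gate-level seconds, behavioral seconds)
    "drqqa110": (242, 110 * 60, 4 * 60 + 50),
    "ddxqa220": (198, 85 * 60, 3 * 60 + 39),
    "dmrq9210": (198, 80 * 60, 3 * 60 + 51),
}

ratios = {}
for name, (sets, gate_s, behav_s) in runs.items():
    ratios[name] = gate_s / behav_s
    print(f"{name}: gate-level is {ratios[name]:.1f}x slower, "
          f"{gate_s / sets:.1f} s per operand set at gate level")
# drqqa110: gate-level is 22.8x slower, 27.3 s per operand set at gate level
# ddxqa220: gate-level is 23.3x slower, 25.8 s per operand set at gate level
# dmrq9210: gate-level is 20.8x slower, 24.2 s per operand set at gate level
```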
Synthesis Results
Module       SCMOS Cells   X4000 CLBs
addsub       192           44
alulogic     390           174
aproclogic   202           133
biu          379           170
constants    244           142
control      2902          1718
decode       563           317
fault        594           246
inc          45            12
ioproc       298           168
rfcore       899           362*
rflogic      562           289
Total        7270          3775
*RFCORE could be implemented in < 50 CLBs using X4000E DPRAM
Comments on Synthesis Results
Model synthesized to:
– MSU SCMOS standard cell library
– X4000 CLB netlist
Only the SCMOS netlist was simulated
Synthesized for area, with a maximum fanout constraint
No attempt to make use of special X4000 features (ROM, DPRAM, fast carry, etc.)
Comments on Synthesis (cont)
Synopsys DesignWare Library used for Incrementer, Add/Sub blocks
– CLA architecture specified
Synthesis time for entire design < 2 hours
– Will increase if more constraints are specified
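For reference, the sketch below shows what a carry-lookahead adder computes for a 16-bit add/sub such as the addsub block: bit generate/propagate terms, 4-bit group generate/propagate terms, and carries formed directly as sums of products rather than rippled. It is a generic two-level CLA written in Python for illustration, not the DesignWare implementation; the 4-bit grouping and the function names are assumptions.

```python
WIDTH = 16
GROUP = 4  # assumed group size


def bits(x):
    """Little-endian list of the WIDTH low-order bits of x."""
    return [(x >> i) & 1 for i in range(WIDTH)]


def lookahead(g, p, c0):
    """Carries c[0..n] computed directly as sums of products (no ripple):
    c[i] = g[i-1] | p[i-1]&g[i-2] | ... | p[i-1]&...&p[0]&c0."""
    n = len(g)
    carries = []
    for i in range(n + 1):
        c = c0
        for j in range(i):
            c &= p[j]              # p[0] & ... & p[i-1] & c0
        for j in range(i):
            term = g[j]
            for k in range(j + 1, i):
                term &= p[k]       # g[j] propagated through p[j+1..i-1]
            c |= term
        carries.append(c)
    return carries


def cla_add(a, b, cin=0):
    """16-bit two-level CLA. Returns (sum mod 2**16, carry out)."""
    A, B = bits(a), bits(b)
    g = [x & y for x, y in zip(A, B)]   # bit generate
    p = [x ^ y for x, y in zip(A, B)]   # bit propagate
    starts = range(0, WIDTH, GROUP)
    # Level 1: group generate (carry out with carry-in 0) and group propagate
    G = [lookahead(g[s:s + GROUP], p[s:s + GROUP], 0)[GROUP] for s in starts]
    P = [int(all(p[s:s + GROUP])) for s in starts]
    # Level 2: carry into every group, again by lookahead
    C = lookahead(G, P, cin)
    total = 0
    for idx, s in enumerate(starts):
        c = lookahead(g[s:s + GROUP], p[s:s + GROUP], C[idx])
        for k in range(GROUP):
            total |= (p[s + k] ^ c[k]) << (s + k)
    return total, C[len(G)]


def cla_sub(a, b):
    """a - b as a + ~b + 1; carry out is 1 when no borrow occurs."""
    return cla_add(a, ~b & 0xFFFF, 1)


print(hex(cla_add(0x7FFF, 0x0001)[0]))  # 0x8000
```

The sum-of-products carries are what make the structure shallow but wide, which is the kind of tradeoff behind mapping addsub differently on the X4000 (dedicated carry chain) and in standard cells.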
Synthesis Tweaking
RFCORE for Xilinx used 1 FF per bit (16x16 bits)
– 128 CLBs for storage, rest for decoding
– If X4000E DPRAM (1 write port, 2 read ports) is used, RFCORE < 50 CLBs
Decode/Constant blocks are basically ROMs; CLB count can be lowered if the ROM capability is used.
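The storage figure follows from the register file size, given two flip-flops per XC4000 CLB. The DPRAM line is only an assumption about one way to reach the < 50 CLB figure (a 16x1 dual-port RAM per CLB, with one copy of the 16x16 file per read port); the slide does not say how that estimate was made.

```python
REGS, WIDTH = 16, 16        # 16 GPRs x 16 bits (from the slide)
FFS_PER_CLB = 2             # XC4000 CLB: two flip-flops
RFCORE_CLBS = 362           # synthesized rfcore, from the results table

storage_bits = REGS * WIDTH                  # 256 flip-flops
storage_clbs = storage_bits // FFS_PER_CLB   # 128 CLBs for storage
other_clbs = RFCORE_CLBS - storage_clbs      # 234 CLBs left for decoding

# Assumption (not from the slides): one 16x1 dual-port RAM per X4000E CLB,
# and one copy of the 16x16 register file per read port (2 read ports).
READ_PORTS = 2
dpram_clbs = READ_PORTS * WIDTH              # 32 CLBs before any address logic

print(storage_clbs, other_clbs, dpram_clbs)  # 128 234 32
```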
Synthesis Tweaking (cont.)
Two-stage synthesis for Control significantly reduced cell count
– Xilinx
  • After 1st phase: 2653 CLBs
  • After 2nd phase: 1718 CLBs
– SCMOS
  • After 1st phase: 4649 cells
  • After 2nd phase: 2902 cells
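The relative effect of the second (flatten and resynthesize) phase can be computed from the two pairs of numbers on this slide:

```python
phases = {
    "Xilinx (CLBs)": (2653, 1718),
    "SCMOS (cells)": (4649, 2902),
}

reduction = {}
for target, (first, second) in phases.items():
    reduction[target] = 100 * (first - second) / first
    print(f"{target}: {reduction[target]:.1f}% smaller after phase 2")
# Xilinx (CLBs): 35.2% smaller after phase 2
# SCMOS (cells): 37.6% smaller after phase 2
```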
What is Left?
Hardware mapping improvements
Better testing of bus interface, interrupts
Add floating point
– Estimate control will increase by 50%
– After tweaking and addition of FP, estimate approx. 4100 CLBs
Also add optional IO, console mode
– Add 2 timers to datapath
Projected CLBs if FP added, RF/Decode/Constants optimized
Module       CLBs (now)   CLBs (projected)
addsub       44           44
alulogic     174          174
aproclogic   133          133
biu          170          170
constants    142          71
control      1718         2577
decode       317          159
fault        246          246
inc          12           12
ioproc       168          168
rfcore       362          50
rflogic      289          289
Total        3775         4093
Projection assumptions: constants + decode decreased 50%, rfcore decreased, control +50%
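As a consistency check, the projected column can be regenerated from the current column using the assumptions stated with the table. The sketch assumes the slide rounded 317 / 2 = 158.5 up to 159 and used 50 CLBs for the DPRAM-based rfcore.

```python
now = {"addsub": 44, "alulogic": 174, "aproclogic": 133, "biu": 170,
       "constants": 142, "control": 1718, "decode": 317, "fault": 246,
       "inc": 12, "ioproc": 168, "rfcore": 362, "rflogic": 289}

projected = dict(now)
projected["constants"] = (now["constants"] + 1) // 2   # halved, rounded up: 71
projected["decode"] = (now["decode"] + 1) // 2         # halved, rounded up: 159
projected["control"] = now["control"] * 3 // 2         # +50%: 2577
projected["rfcore"] = 50                               # assumed DPRAM estimate

print(sum(now.values()), sum(projected.values()))      # 3775 4093
```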
Would Microcode Reduce Control Size?
Using current CLB/state numbers:
– Of 1718 Control CLBs, only 83 are used for CSTATE
– 1635 * 32 bits/CLB = 52320 uCode bits
– Will guesstimate an average of 3 uWords per instruction (based on the average number of machine cycles per instruction)
uCode width = datapath control lines + next uAddress selection = 91 + (9 bits direct address + 5 bits condition) = 105 bits
Estimate: 178 opcodes * 3 uWords * 105 bits = 56070 bits
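The arithmetic on this slide can be re-derived as below. The 32 bits/CLB figure corresponds to the two 16-bit function generators of an X4000 CLB used as ROM; splitting the 91 datapath control lines into 83 registered plus 6 + 2 unregistered signals is an inference from the Control diagram, not something the slide states.

```python
CONTROL_CLBS = 1718
CSTATE_CLBS = 83
BITS_PER_CLB = 32                  # two 16x1 function generators per CLB
fsm_equiv_bits = (CONTROL_CLBS - CSTATE_CLBS) * BITS_PER_CLB

DATAPATH_LINES = 83 + 6 + 2        # inferred breakdown of the 91 lines
UWORD_WIDTH = DATAPATH_LINES + 9 + 5   # + direct next address + condition select
OPCODES, UWORDS_PER_OPCODE = 178, 3
ucode_bits = OPCODES * UWORDS_PER_OPCODE * UWORD_WIDTH

print(fsm_equiv_bits, UWORD_WIDTH, ucode_bits)                         # 52320 105 56070
print(f"uCode / FSM-equivalent ROM: {ucode_bits / fsm_equiv_bits:.2f}")  # 1.07
```

The microcode estimate comes out about 7% larger than the ROM-equivalent capacity of the current FSM logic, which is why the comparison is close for Xilinx.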
Would Microcode Reduce Control Size? (cont.)
Not a clear winner between FSM and uCode for Xilinx
Clever use of machine cycles could reduce the average number of microcode words per opcode
Vertical encoding of datapath signals could reduce uCode width (and could also reduce gate count in the FSM)
uCode versus FSM tradeoffs are technology dependent
What about other FPGA technologies besides Xilinx?
Not clear.
uCode would give a more predictable delay path for control.
Further investigation may be warranted.
In Closing
CD-ROM contains all VHDL (behavioral and netlist) and the regression tests
Regression Perl scripts depend on qhsim, but conversion to a different simulator should not be difficult
For questions:
– reese,[email protected]