A Synthesizeable VHDL Model of the 1750A Integer Subset

Robert B. Reese / Vince Sanders

Microsystems Prototyping Lab

Mississippi State/NSF Engineering Research Center

[email protected], http://www.erc.msstate.edu/mpl

Introduction

Robert Reese, PhD EE, TAMU 1985

MCC Cad Program ‘85-88

Joined MSU ECE faculty 09/89

Microsystems Prototyping Lab

Associated with MSU/NSF ERC

3 Faculty, 2 full time engineers, MS/PhD grad students

Current projects: Mixed Signal VLSI, VHDL modeling, ECAD via the WWW

2/98 BR/VS - MPL -1750A 2

Presentation Outline

Goals

1750A Overview

F9450 Implementation

Model Structure

Model Testing

Synthesis Results

What’s left….

Goals

Synthesizeable Model of 1750A for Legacy Replacement Experiments

Constraints

– Not much time (start 8/15/97, finish 12/31/97)

– Unfamiliarity with 1750A

Decided on

– Use F9450 Implementation (D. Barker)

– No floating point

– Would not try to duplicate F9450 instruction cycle counts

Environment

Sun SparcStations, Solaris OS

Model Tech VHDL Simulator (Mentor qhdl/qhsim)

Synopsys synthesis

SPARCserver with 8 x 250 MHz CPUs, 2 GB RAM used for regression tests

Mil-Std-1750A

True CISC (Complicated Inst Set Comp)

Large number of instructions

Many addressing modes (e.g., load has 8 addressing modes)

16 bit (single precision) and 32 bit (double precision) operations

16 General Purpose Registers, Status, Fault, Pending Interrupt, Interrupt Mask

Optional Extended Addressing capability

1750A (cont.)

Separate set of IO instructions

– 13 required, 40 optional

– IO instructions access optional assets such as timers, MMU

Console Mode Operation

Allows external hardware access to internal registers

14 console operations

16 Interrupt Sources

Mil-Std 1750 does not specify bus interface

Fairchild F9450

Complete Mil-Std-1750A implementation

Shared External Addr/Data Bus (IB)

Arbitrated bus access

wait signals for both addr & data

Shortest instruction 4 clocks (logic ops), longest integer op 245 clks (dbl precision integer divide)

Microprogrammed control

1750A VHDL Model

Implements 173 opcodes

All integer operations

Required IO + 2 optional IO

Console Mode Enter, Examine Register, Continue

Implements F9450 Bus functionality

Instruction cycle counts same or less than F9450

Synthesizeable

Bus Diagram for Model

[Figure: block diagram. Blocks shown: CONTROL FSM, IR, IR_exe, Decode Logic, IBADDR, FLT, SW, PI, FMK, IMK, ALU, IC, IC_old, MAR, MDR, Inc Logic, RF CORE with buffer registers A, B, C, Constant Gen Logic, XH, XL, YH, YL. Buses shown: ABUS, BBUS, IB.]

Register Definition (all registers 16-bits)

IR        instruction register, dest for instr being fetched
IR_exe    instruction currently being executed
MDR       memory data register, data buffer to/from IB
IBADDR    address register, drives IB
IC        instruction counter of next instruction
IC_old    instruction counter for currently exe inst
MAR       memory address register, operand address
A,B       buffer registers for data read from RF core
C         buffer register for data written to RF core
FLT       fault register
SW        status word register
PI        pending interrupt register
FMK       fault mask register
IMK       interrupt mask register
XH,XL     temporary registers used in Mult, Div, IO ops
YH,YL     temporary registers used in Mult, Div, IO ops

Some smaller misc registers not shown on diagram. Used for RF address computation, constant block addressing, etc.

Entity Hierarchy

cpu1750a      - tristate signals to pads here
cpucore       - structural
biu           - external bus interface
aproc         - structural
incdec        - increment, decrement
aproclogic    - IC, MAR
dpath         - structural
rf            - structural
rfcore        - 16 GPR regs (latches)
rflogic       - buffer regs for in/out data, RF addressing
alulogic      - all ALU functions except +/-
addsub        - ALU adder/subtractor
constants     - constants generation
fault         - interrupt logic
ioproc        - temp regs for IO, mul/div ops
decode        - opcode decode
control       - structural
fsm0          - fsm nstate logic
fsm1          - fsm nstate logic
fsm2          - fsm nstate logic
…..
fsm6          - fsm nstate logic
merge         - merge for fsm0:fsm6 outputs
cstate        - fsm state registers

Comments on Model Hierarchy

Often created separate entities for purposes of hardware mapping

ALU split into alulogic and addsub

alulogic is random logic

addsub implementation is technology dependent (e.g., X4000 fast carry chain versus standard cell CLA implementation)

Register file split into rfcore (16 GPRs) and rflogic

X4000E CLB DPRAM good for RFCORE impl.

Control (#states = 547)

[Figure: FSM0 through FSM6 feed a MERGE (OR) block. MERGE outputs: Dpath Ctrl (unregistered, 6 to BIU, 2 to Fault); Dpath Ctrl (registered, 83 signals); FSM State, Flags (registered, 53 signals).]

Comments on Control

547 states in FSM (3.2 states average per opcode)

states NOT distributed equally (logic ops < 1.0 unique states per opcode, VIO required 24 states)

Unregistered signals from MERGE go to shallow logic (BIU fsm) or immediately registered (FAULT) at destination

Comments on Control (cont.)

FSM split for easier synthesis

Efficient synthesis requires 2 step process

synthesis of indiv. blocks to gates

flatten gate netlist from CONTROL down and resynthesize to remove gates due to MERGE block

FSM implemented 1-level subroutine capability to increase state sharing
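
The effect of a one-level subroutine on state sharing can be illustrated with a small software model. This is only a sketch under stated assumptions: the state names, the two example opcodes, and the `ret_state` variable are hypothetical and are not taken from the model's VHDL. It shows how a single return-state register lets two opcode sequences reuse the same operand-fetch states.

```python
# Hypothetical sketch of an FSM with a one-deep "subroutine" return register.
# State names and transitions are illustrative only, not from the VHDL model.

SHARED_FETCH = ["FETCH_ADDR", "FETCH_DATA"]  # states shared by several opcodes


def run_opcode(opcode):
    """Return the list of states visited while executing one opcode."""
    trace = []
    ret_state = None                      # single return register: 1 level only
    state = f"{opcode}_START"
    while state != "DONE":
        trace.append(state)
        if state == f"{opcode}_START":
            ret_state = f"{opcode}_EXEC"  # remember where to resume
            state = SHARED_FETCH[0]       # "call" the shared states
        elif state == SHARED_FETCH[0]:
            state = SHARED_FETCH[1]
        elif state == SHARED_FETCH[1]:
            state, ret_state = ret_state, None  # "return" to the caller
        else:                             # opcode-specific execute state
            state = "DONE"
    return trace


add_trace = run_opcode("ADD")
sub_trace = run_opcode("SUB")
unique_states = set(add_trace) | set(sub_trace)
print(add_trace)
print(len(unique_states), "unique states for", len(add_trace) + len(sub_trace), "state visits")
```

In this toy case the two opcodes visit eight states in total but need only six unique ones, because the two fetch states are shared.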

Model Testing

VHDL Testbench has cpu1750a + memory + stimulus

Unix-based sim1750/as1750 for producing golden results

K. Hill provided SEAFAC* VSW 1750A assembly tests (circa 1984)

272 non-floating point tests

* Systems Engineering Avionics FACility Verification SoftWare

SEAFAC Tests

Separate ASM file for each instruction/addressing mode

lubi5131.asm : load from upper byte, memory indirect indexed

lubi5130.asm: load from upper byte, memory indirect

Multiple operand data sets

lubi5130 contains 18 operand sets

Result, flags, interrupt bits checked

Regression Test System

Perl script which would

Read original SEAFAC ASM file, convert to be compatible with as1750/sim1750

Run sim1750 to produce golden result

Run VHDL simulation to produce test result

Indicate pass/fail, if fail, indicate operand set(s) which failed
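
The final comparison step could look roughly like the following. This is a sketch in Python rather than the Perl actually used, and the function name `compare_runs`, the record layout (result, flags, interrupt bits), and the sample values are all hypothetical. It only shows the idea of comparing golden and test output per operand set and reporting which sets failed.

```python
# Hypothetical sketch: compare golden results with VHDL simulation results,
# one record per operand set. Sample values below are made up.


def compare_runs(golden, test):
    """Return (passed, failed_sets); failed_sets holds operand-set indices."""
    failed = [i for i, (g, t) in enumerate(zip(golden, test)) if g != t]
    shorter, longer = sorted((len(golden), len(test)))
    # Operand sets present in only one run are treated as failures.
    failed.extend(range(shorter, longer))
    return (not failed, failed)


# Each record: (result, flags, interrupt bits)
golden = [(0x1234, 0b0100, 0), (0x0000, 0b0010, 0), (0xFFFF, 0b0001, 0)]
test = [(0x1234, 0b0100, 0), (0x0000, 0b0010, 0), (0xFFFF, 0b1001, 0)]

passed, failed_sets = compare_runs(golden, test)
print("PASS" if passed else f"FAIL: operand set(s) {failed_sets}")
```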

Regression Test Results

Regression Tests run against behavioral model and synthesized netlist model

Of the 272 Tests:

220 passed

48 could not be automatically converted or incompatible with sim1750

4 failed because simulator produced incorrect ‘C’ flag value (bug identified in 1750A simulator C code).

Sample Execution Times

drqqa110 (dbl prec div), 242 op sets

– 110 min (gate-level)

– 4 min 50 s (behavioral)

ddxqa220 (dbl prec div), 198 op sets

– 85 min (gate-level)

– 3 min 39 s (behavioral)

dmrq9210 (dbl prec mul), 198 op sets

– 80 min (gate-level)

– 3 min 51 s (behavioral)
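
For a rough sense of scale, the gate-level to behavioral slowdown can be computed from the times above. The ratios are derived here and were not reported in the talk; they come out at roughly 21 to 23 times.

```python
# Run times from the slide, converted to seconds: (gate-level, behavioral).
runs = {
    "drqqa110": (110 * 60, 4 * 60 + 50),
    "ddxqa220": (85 * 60, 3 * 60 + 39),
    "dmrq9210": (80 * 60, 3 * 60 + 51),
}

# Slowdown factor of the gate-level netlist relative to the behavioral model.
slowdown = {name: gate / beh for name, (gate, beh) in runs.items()}

for name, ratio in slowdown.items():
    print(f"{name}: gate-level is about {ratio:.1f}x slower")
```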

Synthesis Results

Module        SCMOS Cells   X4000 CLBs
addsub             192           44
alulogic           390          174
aproclogic         202          133
biu                379          170
constants          244          142
control           2902         1718
decode             563          317
fault              594          246
inc                 45           12
ioproc             298          168
rfcore             899          362*
rflogic            562          289
Total             7270         3775

*RFCORE could be implemented in < 50 CLBs using X4000E DPRAM

Comments on Synthesis Results

Model Synthesized to:

MSU SCMOS standard cell library

X4000 CLB Netlist

Only SCMOS Netlist simulated

Synthesized for area, used a max fanout constraint

No attempt to make use of special X4000 features (ROM, DPRAM, fast carry, etc).

Comments on Synthesis (cont)

Synopsys DesignWare Library used for Incrementer, Add/Sub blocks

CLA architecture specified

Synthesis time for entire design < 2 hours

will increase if more constraints specified

Synthesis Tweaking

RFCORE for Xilinx used 1 FF per bit (16x16 bits)

128 CLBs for storage, rest for decoding

If X4000E DPRAM (1 write port, 2 read ports) is used, RFCORE < 50 CLBs

Decode/Constant blocks basically ROMs, CLB count can be lower if ROM capability used.

Synthesis Tweaking (cont.)

Two stage synthesis for Control significantly reduced cell count

Xilinx

after 1st phase 2653 CLBs

after 2nd phase 1718 CLBs

SCMOS

after 1st phase 4649 cells

after 2nd phase 2902 cells

What is Left?

Hardware mapping improvements

Better testing of bus interface, interrupts

Add floating point

Estimate control increase by 50%

After tweaking, addition of FP, estimate approx 4100 CLBs.

Also add optional IO, console mode

2 timers to datapath

Projected CLBs if FP added, RF/Decode/Constants optimized

Module        CLBs (Now)   CLBs (projected)
addsub             44            44
alulogic          174           174
aproclogic        133           133
biu               170           170
constants         142            71
control          1718          2577
decode            317           159
fault             246           246
inc                12            12
ioproc            168           168
rfcore            362            50
rflogic           289           289
Total            3775          4093

Projected: constants+decode decreased 50%, rfcore decreased, control + 50%
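
The projected column can be reproduced from the current CLB counts with a few lines of arithmetic. This is a sketch: the helper `scale` and its round-half-up rule are assumptions chosen to match the table (decode 317 halves to 159), and the value of 50 for rfcore is the upper estimate quoted for the DPRAM implementation.

```python
# Current X4000 CLB counts per module, from the synthesis results.
now = {
    "addsub": 44, "alulogic": 174, "aproclogic": 133, "biu": 170,
    "constants": 142, "control": 1718, "decode": 317, "fault": 246,
    "inc": 12, "ioproc": 168, "rfcore": 362, "rflogic": 289,
}


def scale(count, factor):
    """Scale a CLB count, rounding half up (rounding rule is an assumption)."""
    return int(count * factor + 0.5)


projected = dict(now)
projected["constants"] = scale(now["constants"], 0.5)  # ROM capability used
projected["decode"] = scale(now["decode"], 0.5)        # ROM capability used
projected["control"] = scale(now["control"], 1.5)      # floating point added
projected["rfcore"] = 50                               # DPRAM estimate (< 50 CLBs)

print(sum(now.values()), sum(projected.values()))  # prints 3775 4093
```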

Would Microcode Reduce Control Size?

Using current CLB/State numbers:

Of 1718 Control CLBs, only 83 used for CSTATE

1635 * 32 bits/CLB = 52320 uCode bits

Will guesstimate an average of 3 uWords per instruction (would be based on average # of machine cycles per instruction).

uCode width = datapath control lines + next uAddress selection = 91 + (9 bits direct address + 5 bits condition) = 105 bits

estimate 178 opcodes * 3 uWords * 105 = 56070 bits
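
The estimate above can be restated as a short calculation. The variable names are hypothetical, and splitting the 91 datapath control lines into 83 registered plus 6 and 2 unregistered signals is an assumption read off the control diagram; the remaining figures are the ones quoted on the slide.

```python
# FSM side: control CLBs that are not state registers, counted as 32 bits each.
control_clbs = 1718
cstate_clbs = 83
bits_per_clb = 32
fsm_equiv_bits = (control_clbs - cstate_clbs) * bits_per_clb

# Microcode side: word width = datapath control lines + next-address selection.
registered_ctrl = 83        # registered datapath control signals (assumed split)
unregistered_ctrl = 6 + 2   # unregistered signals to BIU and Fault (assumed split)
next_addr_bits = 9 + 5      # direct address bits + condition bits
ucode_width = registered_ctrl + unregistered_ctrl + next_addr_bits

opcodes = 178
uwords_per_opcode = 3       # the talk's guesstimate
ucode_bits = opcodes * uwords_per_opcode * ucode_width

print(fsm_equiv_bits, ucode_width, ucode_bits)  # prints 52320 105 56070
```

The two totals differ by about 7%, which is within the uncertainty of the 3 uWords per opcode guess.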

Would Microcode Reduce Control Size? (cont.)

Not a clear winner between FSM and uCode for Xilinx

Clever use of machine cycles could reduce average microcode words per opcode

Vertical encoding of datapath signals could reduce uCode width (could also reduce gate count in FSM)

uCode versus FSM tradeoff is technology dependent.

What about other FPGA technologies besides Xilinx?

Not Clear…..

uCode would give a more predictable delay path for control.

Further investigation may be warranted.

In Closing

CDROM has all VHDL (behavioral and netlist) and regression tests

Regression Perl scripts depend on qhsim, but conversion to a different simulator should not be difficult

For questions:

reese,[email protected]
