HW/SW Co-design
Lecture 4: Lab 2 – Passive HW Accelerator Design
Course material designed by Professor Yarsun Hsu, EE Dept, NTHU
RA: Yi-Chiun Fang, EE Dept, NTHU
Outline
Introduction to AMBA Bus System
Passive Hardware Design
Interrupt Service Routine
Environment Configuration
Co-designed System with GHDL Simulation
Co-designed System on FPGA
INTRODUCTION TO AMBA BUS SYSTEM
AMBA 2.0 Bus System (1/7)
Established by ARM
Advanced High-performance Bus (AHB)
For high-performance, high clock frequency system modules such
as embedded processor, DMA controller, and memory controller
Advanced Peripheral Bus (APB)
Optimized for minimal power consumption and reduced interface
complexity to support peripheral functions
For more details, please refer to the following documents
AMBA 2.0 Specification
Introduction to AMBA Bus System
GRLIB AHBCTRL - AMBA AHB controller with plug&play support
AMBA 2.0 Bus System (2/7)
Slave on AHB
The only master on APB
AMBA 2.0 Bus System (3/7)
AMBA AHB is designed to be used with a central multiplexor interconnection scheme
This avoids the need for a tri-state bus
AMBA 2.0 Bus System (4/7)
An AHB transfer consists of two distinct sections
The address phase, which lasts only a single cycle
The data phase, which may require several cycles
The data phase is extended using the HREADY signal
AMBA 2.0 Bus System (5/7)
A slave may insert wait states into any transfer
For write operations, the bus master will hold the
data stable throughout the extended cycles
For read transfers, the slave does not have to
provide valid data until the transfer is about to
complete
wait states
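The effect of wait states on transfer time can be illustrated with a small cycle-count model. This is only a sketch under simplifying assumptions: transfers are issued back to back by a single master, each address phase overlaps the previous data phase, and arbitration is ignored. The function name ahb_burst_cycles is hypothetical and is not part of AMBA or GRLIB.

```c
#include <stddef.h>

/* Hypothetical helper (not part of AMBA or GRLIB): counts the bus cycles
 * needed for n back-to-back AHB transfers. wait_states[i] is the number of
 * cycles the slave holds HREADY low during the data phase of transfer i. */
unsigned ahb_burst_cycles(const unsigned *wait_states, size_t n)
{
    if (n == 0)
        return 0;

    unsigned cycles = 1;              /* address phase of the first transfer */
    for (size_t i = 0; i < n; i++)
        cycles += 1 + wait_states[i]; /* data phase plus inserted wait states */
    return cycles;
}
```

Under this model, four transfers to a slave with no wait states take five cycles, while a single transfer with two wait states takes four.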
AMBA 2.0 Bus System (6/7)
GRLIB implements AMBA AHB with slight
modifications
Please refer to the GRLIB User's Manual and
GRLIB IP Cores Manual for detailed information
AMBA 2.0 Bus System (7/7)
The GRLIB implementation of AHB includes a
mechanism to provide plug&play support
The implementation is located at grlib-gpl-1.0.19-b3188/lib/grlib/amba/
The configuration record from each AHB unit is sent to
the AHB bus controller via the HCONFIG signal
Plug&play support covers:
identification of attached units
interrupt routing
address mapping of slaves
type ahb_config_type is array (0 to NAHBCFG-1) of amba_config_word;
PASSIVE HARDWARE DESIGN
Passive HW Accelerators
The accelerator (bus slave) does not actively
send signals to the bus
It only responds to the master
The master gives commands to the slave via its
control registers and probes its status registers
master
slave
Passive 1-D IDCT HW Acc. (1/4)
A simple 2-stage design
Gate delay
Stage 1: ~1 mult
Stage 2: ~3 add
Action register
Write ‘1’ to start; the accelerator resets it to 0 automatically when done
Mode register
Row/column mode
No wait states
Immediate response
action
mode
Passive 1-D IDCT HW Acc. (2/4)
Data packing
Since the 8x8 blocks are of type short (16-bit), each
value occupies only half of the data bus (32-bit)
We pack two values together to increase data bus
utilization and reduce the communication overhead
The action bit and mode bit are also packed together
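Data register layout (32 bits, MSB first): bits 31..16 hold Y2n / x2n, bits 15..0 hold Y2n+1 / x2n+1
Control register layout (32 bits): bits 31..2 UNUSED, bit 1 mode, bit 0 action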
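The packing described above can be sketched in C as follows. The helper names (pack_pair, unpack_even, unpack_odd, make_control) are hypothetical and do not come from the lab package; the sketch assumes the even-indexed value sits in the upper half-word, as the MSB-first layout suggests, and a two's complement target.

```c
#include <stdint.h>

/* Pack two 16-bit samples into one 32-bit bus word: the even-indexed
 * sample goes in the upper half (MSB side), the odd-indexed one in the
 * lower half. */
uint32_t pack_pair(int16_t even, int16_t odd)
{
    return ((uint32_t)(uint16_t)even << 16) | (uint16_t)odd;
}

/* Recover the two signed samples from a packed word. */
int16_t unpack_even(uint32_t word) { return (int16_t)(uint16_t)(word >> 16); }
int16_t unpack_odd(uint32_t word)  { return (int16_t)(uint16_t)(word & 0xFFFFu); }

/* Build the control word: bit 1 = mode, bit 0 = action (set to start),
 * bits 31..2 unused. */
uint32_t make_control(unsigned mode)
{
    return ((uint32_t)(mode & 1u) << 1) | 1u;
}
```

With this layout, eight 16-bit inputs fit in four 32-bit writes, which matches the four transfers needed to load the Y registers.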
Passive 1-D IDCT HW Acc. (3/4)
1-D IDCT calculation
STEP1: Write Y registers (4 transfers)
STEP2: Write mode bit & action bit
STEP3: Poll the action bit
STEP4: Read x registers after action bit reset
Passive 1-D IDCT HW Acc. (4/4)
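static void
hw_idct_1d(short *dst, short *src, unsigned int mode)
{
    long *long_ptr = (long *)src;

    /* STEP1: write the packed Y registers */
    Y_array_base[0] = long_ptr[0];
    Y_array_base[1] = long_ptr[1];
    ...
    /* STEP2: set the mode bit (bit 1) and the action bit (bit 0) */
    *c_reg = (long)((mode << 1) | 0x1);
    /* STEP3: busy-wait until the accelerator clears the action bit */
    while (*c_reg & 0x1) {
    }
    /* STEP4: read back the x registers */
    dst[ 0] = ((short *)x_array_base)[0];
    dst[ 8] = ((short *)x_array_base)[1];
    ...
}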
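The busy-waiting loop spins forever if the accelerator never clears the action bit. A bounded variant is sketched below; the function name wait_action_clear and the spin limit are hypothetical additions, not part of the lab code. The register pointer is declared volatile so that the compiler re-reads the hardware register on every iteration.

```c
#include <stdint.h>

/* Hypothetical helper: poll the action bit (bit 0) of the control register
 * until it clears, giving up after max_spins reads.
 * Returns 0 when the accelerator reports done, -1 on timeout. */
int wait_action_clear(volatile uint32_t *c_reg, unsigned long max_spins)
{
    while (max_spins-- > 0) {
        if ((*c_reg & 0x1u) == 0)
            return 0;   /* action bit cleared: results are ready */
    }
    return -1;          /* still busy after max_spins reads */
}
```

A caller could report an error on timeout instead of hanging, which may help when debugging the hardware in simulation.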
INTERRUPT SERVICE ROUTINE
GRLIB GPTIMER (1/2)
General Purpose Timer Unit
Timers are present in almost any electronic device which
needs timing functions (e.g. timekeeping & time
measurement)
Acts as a slave on AMBA APB
Provides a common decrementing prescaler (clocked by
the system clock) and decrementing timers
Capable of asserting
interrupt on timer
underflow
We initialize timer 2 for
1ms resolution (i.e. an
interrupt will be asserted
every 1ms)
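As a rough illustration of how a 1 ms period could be derived, the sketch below computes reload values for the prescaler and the timer. It assumes that a down-counter reloaded with N underflows every N+1 decrements, and that the system clock is a whole number of MHz; both assumptions, and the 40 MHz value in the note that follows, should be checked against the GRLIB IP Cores Manual and the actual board. The function names are hypothetical.

```c
#include <stdint.h>

/* Prescaler reload value that yields one timer tick per microsecond.
 * Assumes sysclk_hz is a whole number of MHz (at least 1 MHz). */
uint32_t prescaler_reload_1us(uint32_t sysclk_hz)
{
    return sysclk_hz / 1000000u - 1u;
}

/* Timer reload value that yields one underflow (interrupt) every
 * period_us microseconds, given a 1 us tick. Assumes period_us >= 1. */
uint32_t timer_reload_us(uint32_t period_us)
{
    return period_us - 1u;
}
```

For an assumed 40 MHz system clock, this gives a prescaler reload of 39 and a timer reload of 999 for a 1 ms interrupt period.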
GRLIB GPTIMER (2/2)
Please refer to the GRLIB IP Cores Manual
for detailed information
eCos ISR (1/3)
When an interrupt occurs, the processor jumps to
a specific address for execution of the Interrupt
Service Routine (ISR)
One of the key concerns in embedded systems
with respect to interrupts is latency, which is the
interval of time from when an interrupt occurs
until the ISR begins to execute
interrupt
latency
eCos ISR (2/3)
Basic API for implementing ISR
Please refer to the eCos Reference Manual for
detailed information
#include <cyg/kernel/kapi.h>
void cyg_interrupt_create(cyg_vector_t vector, cyg_priority_t priority,
                          cyg_addrword_t data, cyg_ISR_t* isr, cyg_DSR_t* dsr,
                          cyg_handle_t* handle, cyg_interrupt* intr);
void cyg_interrupt_delete(cyg_handle_t interrupt);
void cyg_interrupt_attach(cyg_handle_t interrupt);
void cyg_interrupt_detach(cyg_handle_t interrupt);
void cyg_interrupt_acknowledge(cyg_vector_t vector);
void cyg_interrupt_mask(cyg_vector_t vector);
void cyg_interrupt_unmask(cyg_vector_t vector);
eCos ISR (3/3)
An ISR is a C function which takes the
following form
An ISR should complete as soon as
possible
cyg_uint32
isr_function(cyg_vector_t vector, cyg_addrword_t data)
{
    ...
    /* do the service routine */
    return CYG_ISR_HANDLED;
}
Program Profiling (1/2)
We use GPTIMER for time measurement
Every time the timer asserts an interrupt, the
timer ISR will increase a global variable
time_tick
cyg_uint32
timer_isr(cyg_vector_t vector, cyg_addrword_t data)
{
    unsigned long *time_tick = (unsigned long *) data;
    (*time_tick)++;
    cyg_interrupt_acknowledge(vector);
    return CYG_ISR_HANDLED;
}
Program Profiling (2/2)
We record the latency of every function block by monitoring the time_tick variable
void
func()
{
    unsigned long local_timer = time_tick;
    ...
    time_elapsed += (time_tick - local_timer);
}
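The subtraction in the profiling code stays correct even if the tick counter wraps around, as long as unsigned arithmetic is used. The sketch below isolates that step with a fixed-width type; the helper name ticks_elapsed is hypothetical. It is also worth declaring the shared counter volatile in real code, since it is modified by the ISR, although the slides do not show its declaration.

```c
#include <stdint.h>

/* Hypothetical helper: number of timer ticks between two samples of a
 * free-running 32-bit counter. The result is taken modulo 2^32, so it is
 * still correct if the counter wrapped once between the two samples. */
uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}
```

With a 1 ms tick, the returned value is the elapsed time in milliseconds.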
ENVIRONMENT CONFIGURATION
Build SW Application
Copy the files in lab_pkg/lab2/sw to your
original Lab 1 directory
Replace the Makefile and modify the path for
ECOSDIR in Makefile
Type “make” to build
-D_HW_ACC_ flag will link the co-designed
version of hw_idct_2d() in idct_hw.c with the
testbench
Without this flag, hw_idct_2d() will be identical to
sw_idct_2d()
-D_PROFILING_ flag will enable profiling using the
timer interrupt, and report the results at the end
Install IDCT Accelerator
Copy lab_pkg/lab2/hw/devices.vhd to
grlib-gpl-1.0.19-b3188/lib/grlib/amba/
and replace the original file
Copy lab_pkg/lab2/hw/libs.txt and the
whole lab_pkg/lab2/hw/esw folder to grlib-gpl-1.0.19-b3188/lib/
The 1-D IDCT passive accelerator is located at
lab_pkg/lab2/hw/esw/idct_acc/idct_1x8.vhd
Copy lab_pkg/lab2/hw/leon3mp.vhd to
grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ and replace the original file
CO-DESIGNED SYSTEM WITH GHDL SIMULATION
GHDL Simulation (1/6)
We compile our program as a virtual
SDRAM for LEON3 processor
LEON3 will fetch the instructions and
perform the corresponding operations
All the hardware signals can be recorded
and dumped by GHDL
GHDL Simulation (2/6)
To perform GHDL simulation, the program must be built
without linking against eCos
Remove -D__ECOS &
-I$(ECOSDIR)/include from CFLAGS
Remove -Ttarget.ld, -nostdlib, &
-L$(ECOSDIR)/lib from LFLAGS
Remove -D_PROFILING_ flag
You can remove -D_VERBOSE_ for faster simulation
You can modify the NUM_BLKS macro in idct_test.c to
reduce the number of testbench iterations
Type “make” to build
You should see a file named sdram.srec
GHDL Simulation (3/6)
Start Cygwin
cd grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/
make distclean
make soft
Copy sdram.srec we
built into this directory
and replace the
original one
make ghdl
You can check for
syntax errors through
GHDL
GHDL Simulation (4/6)
Type “./testbench.exe --vcd=waveform.vcd”
after compilation to begin simulation
You should see an AHB slave with “Unknown
vendor” appear, which is our IDCT accelerator
GHDL Simulation (5/6)
The dump file waveform.vcd can be
viewed on-the-fly using GTKWave
Drag waveform.vcd and drop it over the
gtkwave.exe icon to open
You can also open it from the Windows command prompt
Use “File → Reload Waveform” in GTKWave to
reload the dump file after it has been updated
GHDL Simulation (6/6)
[Figure: GTKWave waveform annotated with the AHB address phase and data phase, IDCT stage 1 and stage 2, and the probe of the control register]
CO-DESIGNED SYSTEM ON FPGA
Build FPGA Bitstream (1/2)
Type “make ise | tee ise_log” under
grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/
after you install the accelerator
It is strongly suggested that you verify the
hardware with GHDL simulation first
It is also suggested that you take a look at
ise_log for more information
Configure your FPGA with leon3mp.bit
after generating the bitstream
Build FPGA Bitstream (2/2)
After entering GRMON, check the system
configuration using “info sys”
You should see a device with “Unknown
vendor” appear
Profiling Results
Build the program with the -D_PROFILING_
flag enabled
Compare the computation results of
sw_idct_2d() and hw_idct_2d()
Compare the computation results with and without the -D_VERBOSE_ flag