A methodology for
design space exploration of
on-chip networks
Luciano Lavagno
Politecnico di Torino, Italy
Cadence Berkeley Labs, CA
http://polimage.polito.it/~lavagno
Laura Vanzago
STMicroelectronics
MPSOC summer school, Chateau de Pizay, July 2002
Outline
• System-on-chip design flow
– Functional and architectural modeling
– Mapping
– Performance simulation
– Communication refinement
– Implementation
• Case study: wireless LAN architectural exploration
– functional model
– on-chip communication architectural model
– design space exploration
Luciano Lavagno ©
MPSOC 2002 - 2
The System-On-Chip Design Flow
• Specify:
– What does the customer really want?
• Architect:
– What is the most cost- and performance-effective architecture to implement it?
– What existing components can I adapt and re-use?
• Evaluate:
– What is the performance impact of a cheaper architecture?
• Implement:
– What can I generate automatically from libraries and customization?
• Idea: separate computation, communication and performance
The System-On-Chip Design Flow
[Figure: design flow diagram with numbered steps 1 to 4 and boxes System Behavior, Behavior Simulation, System Architecture, Mapping, Performance Simulation, Communication Refinement, Flow To Implementation.]
The System-On-Chip Design Flow
[Figure: performance simulation runs the behavior annotated with architectural effects, obtained by annotating architectural timing and energy onto the behavior; the results are then analyzed and visualized.]
Functional Modeling
[Figure: MPEG decoder functional model with blocks VLD, IZ,IQ, IDCT, MC, DISPLAY and MEM; connections are labeled M and BA.]
Communication Refinement
[Figure: the MPEG decoder blocks (VLD, IZ,IQ, IDCT, MC, DISPLAY, MEM) before and after communication refinement; the refined version adds SEG and REAS blocks and a shared BUS.]
Optimization
[Figure: the refined MPEG decoder (VLD, IZ,IQ, IDCT, MC, DISPLAY, MEM, with SEG and REAS blocks on a shared BUS) after optimization.]
Functional modeling
Architectural Modeling
Mapping
[Figure, shown on each of these three slides: the numbered design flow diagram (System Behavior, Behavior Simulation, System Architecture, Mapping, Performance Simulation, Communication Refinement, Flow To Implementation).]
Performance Modeling
[Figure: the numbered design flow diagram.]
Separate Delay Model
Functional Model:
  my_ip() {
    f = x.read();
    r = f * k;
    y.write(r); }
Delay Script:
  // HW implem
  delay() {
    input(x);
    run();
    delay(2.0 / cps);
    output(y); }
Inline Delay Model
IP Functional Model:
  my_ip() {
    f = x.read();
    r = f * k;
    y.write(r); }
Annotated IP Functional Model:
  my_ip() {
    f = x.read();
    r = f * k;
    __DelayCycles(2);
    y.write(r); }
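The two delay-modeling styles on this slide can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `SimClock`, `DelayScript` and the function names are hypothetical stand-ins, not the VCC API, and the ports `x` and `y` are replaced by ordinary arguments and return values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simulation clock: accumulates the cycles charged by models.
struct SimClock {
    std::uint64_t cycles = 0;
    // Stands in for the slide's __DelayCycles(n).
    void delayCycles(std::uint64_t n) { cycles += n; }
};

// Untimed functional model: r = f * k, as on the slide.
inline int my_ip_functional(int f, int k) { return f * k; }

// Inline delay model: the functional code itself carries the annotation.
inline int my_ip_annotated(int f, int k, SimClock& clk) {
    int r = f * k;
    clk.delayCycles(2);  // the 2-cycle annotation from the slide
    return r;
}

// Separate delay model: a wrapper runs the untimed function and
// charges 2.0 / cps seconds, mirroring the slide's delay script.
struct DelayScript {
    double cps;            // clock cycles per second (assumed parameter)
    double elapsed = 0.0;  // simulated seconds consumed so far

    explicit DelayScript(double cyclesPerSecond) : cps(cyclesPerSecond) {
        assert(cps > 0.0);
    }

    int run(int x, int k) {
        int y = my_ip_functional(x, k);  // run()
        elapsed += 2.0 / cps;            // delay(2.0 / cps)
        return y;                        // output(y)
    }
};
```

In the separate style the functional model stays untouched, so the same code can be re-timed for another implementation by swapping the script; the inline style keeps the timing next to the statements it describes.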
Software Performance Estimation
• Specify behavior and I/O (ANSI C input)
• Analyse basic blocks and compute delays in terms of virtual machine instructions (ld, li, op, ts, -br, ...), based on the architecture characterization
• Generate new C with delay annotations
• Compile generated C and run natively to obtain the performance estimation
ANSI C input:
  v__st_tmp = v__st;
  startup(proc);
  if (events[proc][0] & 1)
    goto L16;
Generated C with delay annotations:
  v__st_tmp = v__st;
  __DELAY(LI+LI+LI+LI+LI+LI+OPc);
  startup(proc);
  if (events[proc][0] & 1) {
    __DELAY(OPi+LD+LI+OPc+LD+OPi+OPi+IF);
    goto L16;
  }
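The annotation scheme above can be sketched as a small native C++ program fragment. The cycle costs in `cost` are invented for illustration (the slides do not give the characterization values), and `CycleCounter` and `annotated_fragment` are hypothetical names; only the shape of the two delay sums follows the slide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cycle cost per virtual-machine instruction.
// Real values would come from characterizing the target processor.
namespace cost {
constexpr std::uint64_t LD = 2, LI = 1, OPc = 1, OPi = 1, IF = 3;
}

// Accumulates the estimated cycles while the annotated code runs natively.
struct CycleCounter {
    std::uint64_t cycles = 0;
    void delay(std::uint64_t c) { cycles += c; }  // plays the role of __DELAY
};

// A fragment shaped like the one on the slide: one basic block that is
// always executed, then a conditional block. Returns true if the branch
// (the slide's "goto L16") is taken.
inline bool annotated_fragment(CycleCounter& cc, unsigned event_flags) {
    using namespace cost;
    cc.delay(LI + LI + LI + LI + LI + LI + OPc);  // first basic block
    if (event_flags & 1u) {
        cc.delay(OPi + LD + LI + OPc + LD + OPi + OPi + IF);  // taken path
        return true;
    }
    return false;
}
```

With these assumed costs the first block charges 7 cycles and the taken path another 12, so the estimate depends on which basic blocks the input data actually exercises.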
Communication refinement
Abstraction levels between P and C, from most to least abstract:
• APP: delay-independent API, e.g. unbounded FIFO; Write, Read (vector of «any» type)
• SYS: HW/SW-independent system communications, e.g. bounded FIFO, between process and module
• VCI: bus-independent Virtual Component Interface; Write, Read (address, bus-able data chunk...)
• PHY: physical bus transfers, e.g. arbitrated PIBus protocol; a wrapper adapts the module interface from VCI to the physical bus
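The first refinement step, from the unbounded FIFO of the APP level to the bounded FIFO of the SYS level, can be sketched as below. The class names and the non-blocking `write`/`read` signatures are assumptions made for a single-threaded illustration; they are not the API used in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Abstract channel: application code depends only on this interface.
template <typename T>
class Channel {
public:
    virtual ~Channel() = default;
    virtual bool write(const T& v) = 0;  // false if the data cannot be accepted now
    virtual bool read(T& out) = 0;       // false if no data is available
};

// APP-level model: writes always succeed.
template <typename T>
class UnboundedFifo : public Channel<T> {
    std::deque<T> q_;
public:
    bool write(const T& v) override { q_.push_back(v); return true; }
    bool read(T& out) override {
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop_front();
        return true;
    }
};

// SYS-level refinement: finite storage, so a write can be refused.
template <typename T>
class BoundedFifo : public Channel<T> {
    std::deque<T> q_;
    std::size_t capacity_;
public:
    explicit BoundedFifo(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }
    bool write(const T& v) override {
        if (q_.size() >= capacity_) return false;
        q_.push_back(v);
        return true;
    }
    bool read(T& out) override {
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop_front();
        return true;
    }
};

// Producer written once against Channel; returns how many items were accepted.
inline int produce(Channel<int>& ch, int n) {
    int accepted = 0;
    for (int i = 0; i < n; ++i)
        if (ch.write(i)) ++accepted;
    return accepted;
}
```

The producer is unchanged across the two levels; only the channel object is swapped, which is the point of refining communication separately from computation.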
Communication refinement
[Figure: a SYS-level channel between a producer and a filter, one a software task and the other a hardware task, refined down to the VCI and PHY levels through channel reader/writer and FIFO reader/writer services, an RTOS queue, a wake-up from an interrupt service routine, the RTOS board support package, and a wrapper; architecture resources shown are CPU, ITC and MEM.]
Communication Refinement
[Figure: Semaphore Protected communication between behaviors A and B, which use Post(5) and Value(), shown in three layers. Computation: A and B. Comm. Services: SemProt_Send (mutex_lck, setEnabled, memcpy, signal) and SemProt_Recv (wait, memcpy, signal). Arch. Services: RTOS, SwMutexes, MemoryAccess (write, read), BusMaster, SlaveAdapter and BusArbiter (busRequest, busResponse, arbiterReq/Release), on CPU and Mem.]
Mapping communication links to a pattern
[Figures on two consecutive slides; not preserved in the transcript.]
[Figure: the numbered design flow diagram (System Behavior, Behavior Simulation, System Architecture, Mapping, Performance Simulation, Communication Refinement, Flow To Implementation), illustrated by three sub-figures.]
[Figure 1, system behavior: Protocol Stack (Control, Data) and blocks Speech In, RPE-LTP Encoder, Channel Coder, Interleaver, Multiplexer, Modulator, To RF, From RF, Demodulator, Demultiplexer, De-interleaver, Channel Decoder, RPE-LTP Decoder, Speech Out.]
[Figure 2, system architecture: DSP with internal RAM running the Protocol Stack, Microprocessor, Dedicated Hardware, RAM, Driver and Address decoder on a BUS, with HW and SW parts marked.]
[Figure 3, communication abstraction layer refinement: behaviors A and B use Post(5) and Value() through the Semaphore Protected pattern. User-visible pattern services: SemProt_Send (mutex_lock; setEnabled; memcpy; signal) and SemProt_Recv (wait; memcpy; signal). Architecture services: RTOS, SwMutexes, MemoryAccess (write, read), BusMaster, SlaveAdapter and BusArbiter (busRequest, busIndication, arbiterRequest/Release), on CPU and Mem.]
Performance simulation by mapping
[Figure: functions F1, F2, F3 are mapped onto architecture resources A1, A2, A3 and an Arbiter; the resulting performance simulation model places F1, F2, F3 on those resources with Comm blocks between them.]
Performance simulation
[Figure: the numbered design flow diagram, with two performance simulation outputs: Software Gantt Charts and Architecture Analysis.]
Exploring Design Trade Offs
• Iteration through different mapping experiments
• Gradual refinement of the design
• Evaluation
– of the "refined" design
– of system performance after implementation
• Export implementation to
– Testbench and top-level netlist
– Hardware netlist
– Software RTOS customization
Implementation by mapping
[Figure: functions F1, F2, F3 are mapped onto architecture resources A1, A2, A3; in the resulting implementation model they attach through interface blocks (Intfc) to the BUS.]
Flow to Implementation
Export refined design to co-verification and implementation tools
[Figure: the numbered design flow diagram; System Exploration and Communication Refinement lead to Flow To Implementation, whose outputs are labeled Hardware, Top-level, System Test Bench and Software on RTOS.]
Flow to Implementation
[Figure: behavior blocks A to J mapped onto an architecture of CPU, ROM, RAM and ASIC on a bus; on the CPU, software runs as Task X, Task Y and ISR W under the RTOS.]
Customizing RTOS
[Figure: behavior blocks A to E; Task X and Task Y run on the RTOS on the CPU.]
RTOS initialization template:
  $<StandardHeader,'RTOS rootialization'>
  $<RtosAndCpuIncludes>
  $<BoardSupportPackageIncludes>
  $<LynxSwIncludes>
  /* Device Driver includes/device handle decls */
  $<LynxDriverIncludes>
  $<LynxDeviceHandleDecs>
  /* Mutex semaphore per protected data-buffer */
  $<MutexVariableDefinitions>
  /* Define an identifier for each task */
  $<TaskIdDefinitions>
  void root(void ) {
  /* Mutex semaphore per protected data-buffer */
  $<CreateMutexes>
  /* Create each software task */
  $<CreateTasks>
  /* Register interrupt service routines */
  $<RegisterInterrupts>
  /* Schedule each software task. */
  $<StartTasks>
  /* Delete or suspend the root task. */
  $<DeleteSelf>
Generated code (lines truncated on the slide are left as shown):
  #include <psos.h>
  #include "init.h"
  #include "tasks.h"
  #include "drivers.h"
  /* Device Driver includes/device handle decls */
  /* Mutex semaphore per protected data-buffer */
  unsigned long I_24_I_50_MainDisp_mutex;
  unsigned long I_24_I_50_SubDisp_mutex;
  /* Define an identifier for each task */
  unsigned long task_I_13_I_6__ready;
  unsigned long task_I_26__ready;
  void root(void ) {
  /* Mutex semaphore per protected data-buffer */
  k_fatal(0x20000004, K_LOCAL);
  k_fatal(0x20000004, K_LOCAL);
  /* Create each software task */
  if (t_create("T0", 10, 1024, 1024, T_LOCAL|,
  &task_I_13_I_6__ready)) k_fatal(0x20000001,
  if (t_create("T1", 11, 1024, 1024, T_LOCAL|,
  &task_I_26__ready)) k_fatal(0x20000001,
  …
Creating SW Communication Code
[Figure: behavior blocks A to E; Task X and Task Y run on the RTOS on the CPU.]
RTOS-specific macro definitions:
  #include <psos.h>
  #define LYNX_BEGIN_ATOMIC() OSDisableInt()
  #define LYNX_END_ATOMIC() OSEnableInt()
  #define LYNX_SET_PENDING(taskEventName) ev_receive(allevents, \
  (EV_ANY | EV_NOWAIT), 0, events_r)
  #define LYNX_SET_READY(taskEventName) ev_send(taskEventName, allevents)
  #define LYNX_MUTEX_REQUEST(mutex) sm_p(mutex, SM_WAIT, 0)
  #define LYNX_MUTEX_RELEASE(mutex) sm_v(mutex)
  #define LYNX_ISR_ENTER() OSEnterISR()
  #define LYNX_ISR_EXIT() OSExitISR()
Behavior code for block D:
  void lynx_Run(lynx_inst_ident_t inst_id)
  {
    char buffinput[10] = "";
    if (lynx_Enabled(inst_id,in)){
      lynx_Value(inst_id, in, &buffinput);
      /* ... behaviour D functionality ... */
      lynx_Post(inst_id, out, &buffinput);
    }
  }
Generated communication macro:
  #define I_31_I_64_Value_MainDisp(inst_id, buff_p) \
  ((\
  (LYNX_MUTEX_REQUEST(I_3_DM_1_X_mutex)), \
  (LYNX_MEMCPY(buff_p,&I_3_DM_1_X,sizeof(I_3_DM_1_X))), \
  (Probe_I_31_I_64_Value_MainDisp), \
  (LYNX_MUTEX_RELEASE(I_3_DM_1_X_mutex)) ), &I_3_DM_1_X\
  )
Creating HW Communication Code
[Figure: hardware communication between ASIC blocks (D, G, H, I, J) and the CPU software (Task Y and ISR W under the RTOS) through a bus interface. Links are marked Interrupt or Register Mapped, with widths of 8, 16, 32 and 64 bits; the buses shown are a 16-bit data bus, an address bus and an interrupt bus. Legend entries: LYNX, memcpy (64 bit read), data buffer, frozen data buffer, presence bit.]
Creating Testbench
[Figure: tests Test1 and Test2 act as sources for blocks A, B, C, D, once in system-level simulation and once in co-verification; each run writes to a results DB.]
Comparing Results
Case study: wireless LAN physical layer
Protocol stack: Application, Network, MAC, OFDM Physical Layer/Digital BB (OFDM TX, OFDM RX, dynamic reconfiguration)
• HiperLan/2: multimedia wireless networks; high rate: 10 Mb/sec; low power: 10-100 mW
• PicoRadio: ad hoc networks; low rate: b/sec - kb/sec; low power: 100mW
Design Flow
• Application Specification: English (UML, …)
• Algorithm Exploration: COSSAP/C (Matlab/Simulink, …)
• Functional Simulation and Refinement: VCC (SystemStudio, …)
• Architecture Exploration, Performance Simulation: VCC (SystemStudio, …)
• Architecture Refinement
• Implementation
[Figure: functional partitioning separates the OFDM physical layer (OFDM TX, OFDM RX) from the MAC, network and higher layers; functional IP in C is reused, and TX and RX are each mapped.]
Top-level Hiperlan/2 Functional Model
[Figure not preserved in the transcript.]
Hiperlan/2 OFDM Transmitter
[Figure not preserved in the transcript.]
Hiperlan/2 OFDM Receiver
[Figure not preserved in the transcript.]
Heterogeneous Behavior
[Figure: a Control-FSM, commanded by the MAC through GoT/GoR, with states including Idle, Sync, Preamble, TX and RX (labels CostToT, CostToR), controls the TX and RX datapaths. TX is modeled as static dataflow; RX is modeled as dynamic dataflow.]
Example of functional block
[Figure: FFT block imported from the Cossap environment, with parameter LenghtPar = 64, inputs Real and Imag (64 samples each) and outputs OutReal and OutImag.]
void CPP_MODEL_IMPLEMENTATION::Init()
{
  ….; Length = LenghtPar.Value(); // read parameter
  // Set data rate on 2 input ports: Real and Imag
  Real.SetDataRate(Length);
  Imag.SetDataRate(Length);
}
// Run() is executed every time the firing rule is satisfied
void CPP_MODEL_IMPLEMENTATION::Run()
{
  for (i=0; i<Real.GetDataRate(); i++) {
    // Read data from the input ports
    data[i][0] = Real.Value();
    data[i][1] = Imag.Value();
  }
  // Call the FFT procedure (C functional model)
  fft_cns_rot_bfp(data,….);
  // Write data to two output ports (OutReal, OutImag)
  for (i=0; i<Length; i++) {
    OutReal.Post(data[i][0]);
    OutImag.Post(data[i][1]);
  }
}
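The firing rule mentioned in the comment above ("Run() is executed every time the firing rule is satisfied") can be illustrated with a small stand-alone sketch. `InPort`, `Block` and `schedule` are hypothetical names, not the VCC classes, and the computation is a placeholder (squared magnitude) rather than an FFT.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical input port: a token queue plus a declared data rate.
struct InPort {
    std::deque<double> tokens;
    std::size_t rate = 1;
    void setDataRate(std::size_t r) { rate = r; }
    bool ready() const { return tokens.size() >= rate; }
    double value() {  // consume one token
        double v = tokens.front();
        tokens.pop_front();
        return v;
    }
};

// Two-input block in the style of the slide: it may fire only when both
// ports hold at least `rate` tokens.
struct Block {
    InPort real, imag;
    std::deque<double> out;

    explicit Block(std::size_t length) {  // mirrors Init()
        real.setDataRate(length);
        imag.setDataRate(length);
    }
    bool canFire() const { return real.ready() && imag.ready(); }

    void run() {  // mirrors Run(): consume `rate` tokens per port, then post
        assert(canFire());
        for (std::size_t i = 0; i < real.rate; ++i) {
            double re = real.value();
            double im = imag.value();
            out.push_back(re * re + im * im);  // placeholder computation
        }
    }
};

// Minimal scheduler: fire the block while its firing rule holds.
inline std::size_t schedule(Block& b) {
    std::size_t firings = 0;
    while (b.canFire()) {
        b.run();
        ++firings;
    }
    return firings;
}
```

Because the rates are fixed at initialization, the number of firings for a given amount of input is known in advance, which is what makes a static dataflow block schedulable at compile time.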
Wireless LAN physical layer SOC architecture
[Figure: SoC block diagram. Blocks shown: FPGA with configuration memory, FFT, FIR, UART, BUFFER, crossbar (XBAR) with XBAR interfaces, Int. bridge, DPR2/SPS2, Bridge, processor bus with processor bus interfaces, Micro with I/D caches, SPS2 (instruction/data RAM), Datapath, JTAG interface and clock generator. Pins shown: Reset, CK1, CK2, MCK, VDD, VSS, TEST(0..2).]
Crossbar features
• The crossbar model is flexible in the number of supported masters and slaves (evaluated at simulation initialization time)
• A prioritized FIFO is used to arbitrate multiple master requests for each slave
• The number of parallel slave accesses is defined through a parameter
• A transmission can be suspended by higher-priority requests (preemptive)
• Arbitration overhead and slave access delays are parameterized
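The per-slave arbitration described above (prioritized FIFO plus preemption) can be sketched as follows. This is an assumed, simplified model: `SlavePort`, `Request` and the convention that a larger number means higher priority are illustrative choices, and arbitration overhead, access delays and the parallel-access parameter are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

struct Request {
    int master;
    int priority;       // assumption: larger value = higher priority
    std::uint64_t seq;  // arrival order, used for FIFO among equals
};

// Orders the queue so that the top is the highest priority request,
// and the oldest one among requests of equal priority.
struct ByPriority {
    bool operator()(const Request& a, const Request& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.seq > b.seq;
    }
};

// One arbiter instance per slave.
class SlavePort {
    std::priority_queue<Request, std::vector<Request>, ByPriority> pending_;
    bool busy_ = false;
    Request current_{};
    std::uint64_t nextSeq_ = 0;

public:
    // Returns true if this request preempted the transfer in progress.
    bool request(int master, int priority) {
        Request r{master, priority, nextSeq_++};
        if (!busy_) {
            current_ = r;
            busy_ = true;
            return false;
        }
        if (priority > current_.priority) {
            pending_.push(current_);  // suspended transfer keeps its arrival order
            current_ = r;
            return true;
        }
        pending_.push(r);
        return false;
    }

    // Ends the current transfer and grants the next pending request.
    // Returns the master that now owns the slave, or -1 if it is idle.
    int release() {
        if (pending_.empty()) {
            busy_ = false;
            return -1;
        }
        current_ = pending_.top();
        pending_.pop();
        return current_.master;
    }

    int owner() const { return busy_ ? current_.master : -1; }
};
```

A suspended transfer re-enters the queue with its original sequence number, so it resumes before later requests of the same priority; whether the original crossbar model behaves this way is not stated in the slides.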
Crossbar Architecture Service Structure
[Figure: DFSHAREDMEMORY (HW->HW) communication pattern between behaviors Beh1 and Beh2 of the behavior network, which communicate through Post() and Value() and are mapped onto the architecture network (FPGA, FFT, MEM, XBAR). The sender and receiver service stacks use the XBarMaster FPGA port, the XBarMaster FFT port, the XBarSlave MEM port (memory MEM with Mem0, Mem1) and the XBarArbiter on the BUS.]
Crossbar Architecture Service Structure
[Figure: DFREGISTERDIRECT (HW->HW) communication pattern between behaviors Beh1 and Beh2 of the behavior network, which communicate through Post() and Value() and are mapped onto the architecture network (FPGA, FFT, MEM, XBAR). The sender and receiver service stacks use the XBarMaster FPGA port, the XBarMaster FFT port and the XBarArbiter on the BUS.]
Design Space Exploration
• Explored several computation/communication architecture configurations
– FFT throughput (1 item / 4 clock cycles vs 1 item / 1 clock cycle)
– Number of buffer ports between FFT and FIR on the crossbar
– Communication pattern between FPGA and FFT: Shared Memory or Register Direct
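The three parameters listed above span a small design space of 2 x 2 x 2 = 8 configurations. A minimal sketch of enumerating it is shown below; the `Config` type, the labels and the enumeration order are assumptions for illustration and are not claimed to match the simulation names used in the results.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Pattern { SharedMemory, RegisterDirect };

// One point in the design space (field names are illustrative).
struct Config {
    int fftCyclesPerItem;  // 4 or 1 clock cycles per item
    Pattern pattern;       // communication pattern between FPGA and FFT
    int bufferPorts;       // 1 or 2 buffer ports on the crossbar
};

// Enumerates the full cross product of the three parameters.
inline std::vector<Config> enumerateConfigs() {
    std::vector<Config> out;
    for (int fft : {4, 1})
        for (Pattern p : {Pattern::SharedMemory, Pattern::RegisterDirect})
            for (int ports : {1, 2})
                out.push_back(Config{fft, p, ports});
    return out;
}

// Short label in the style of the result chart (SM/RD, 1P/2P).
inline std::string label(const Config& c) {
    return "FFT:1/" + std::to_string(c.fftCyclesPerItem) + " " +
           (c.pattern == Pattern::SharedMemory ? "SM" : "RD") + " " +
           std::to_string(c.bufferPorts) + "P";
}
```

Each configuration would then be run through a performance simulation of the mapped design; that step depends on the modeling tool and is not sketched here.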
Exploration Results
[Figure: bar chart (vertical axis from 0 to 4000) for eight simulations, SimA1 to SimD1 and SimA2 to SimD2, covering FFT:1/4 and FFT:1/1, each with the SM and RD patterns and 1 or 2 ports (1P, 2P); the legend lists FPGA, FFT and FIR.]
BitRate (Mb/s), in the order extracted from the slide: 5.8, 7.2, 8.2, 8.2, 9.6, 13.7, 12.5, 15.6. Hiperlan/2 spec.: 12.
Conclusion
• System-On-Chip Design requires methodology, tools and libraries
• Separate computation, communication and architecture
– computation: compiled and scheduled
– communication: refined via patterns
• Map computation and communication onto platform
– simulate performance
– generate implementation model for HW, SW and communication