
WP4: Relationship between workpackages
MODERN 1st Year Review
June 30, 2010
CONFIDENTIAL
WP4 Presentation Outline
Resources and funding
Deliverables in the review period
Task Structure
Task activity review
Summary
Jun. 22, 2010
WP4 Resources and Funding
Partner | Total effort (m/m) | Effort 2009 (m/m) | Effort 2010, planned (m/m) | Contract signed | Advance payment
LETI | 36 | 11.5 (9 T4.1 / 2.5 T4.2) | 12 (8 T4.1 / 4 T4.2) | Y | Y
CSEM | 24 | 0 (T4.2) | 12 (T4.2) | N | N
TMPO | 57.6 | 18.5 (16 T4.2 / 2.5 T4.4) | 22.3 (15.8 T4.2 / 6.5 T4.4) | Y | Y
ELX | 36 | 12 (T4.2) | 12 (T4.2) | Y | Y
TEKL | 12 | 9 (T4.2) | 3 (T4.2) | N | Y
ST I | 72 | 20 (4 T4.2 / 16 T4.4) | 30 (9 T4.2 / 21 T4.4) | N | N
ST F | 66 | 5.5 (T4.3) | 30 (T4.3) | Y | Y
ISD | 123 | 55 (T4.3) | 40 (T4.3) | Y | N
LIRM | 30 | 14 (T4.5) | 16 (T4.5) | Y | Y
NMX | 30 | 5 (T4.3) | 12 (T4.3) | N | N
THL | 60 | 14 (13 T4.3 / 1 T4.5) | 15 (12 T4.3 / 3 T4.5) | Y | Y
UPC | 42 | 12 (3 T4.1 / 9 T4.4) | 15 (10 T4.1 / 5 T4.4) | Y | Y
UNBO | 23 | 8 (T4.4) | 8 (T4.4) | N | N
POLI | 36 | 15.3 (T4.2) | 12 (T4.2) | N | N
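The per-task split in the 2009 column can be cross-checked mechanically. The sketch below transcribes that column, verifies that each partner's split adds up to its total, and sums the effort per task; the cell-parsing rule (a single task in parentheses receives the whole effort) is an assumption based on the table layout.

```python
import re
from collections import defaultdict

# 2009 effort per partner (m/m), transcribed from the WP4 resources table.
EFFORT_2009 = {
    "LETI": "11.5 (9 T4.1 / 2.5 T4.2)",
    "CSEM": "0 (T4.2)",
    "TMPO": "18.5 (16 T4.2 / 2.5 T4.4)",
    "ELX": "12 (T4.2)",
    "TEKL": "9 (T4.2)",
    "ST I": "20 (4 T4.2 / 16 T4.4)",
    "ST F": "5.5 (T4.3)",
    "ISD": "55 (T4.3)",
    "LIRM": "14 (T4.5)",
    "NMX": "5 (T4.3)",
    "THL": "14 (13 T4.3 / 1 T4.5)",
    "UPC": "12 (3 T4.1 / 9 T4.4)",
    "UNBO": "8 (T4.4)",
    "POLI": "15.3 (T4.2)",
}


def split_by_task(cell):
    """Parse a cell such as '11.5 (9 T4.1 / 2.5 T4.2)' into (total, {task: effort})."""
    total_str, inner = re.fullmatch(r"\s*([\d.]+)\s*\((.*)\)\s*", cell).groups()
    total = float(total_str)
    parts = [p.strip() for p in inner.split("/")]
    if len(parts) == 1 and " " not in parts[0]:
        return total, {parts[0]: total}  # single task: it receives the whole effort
    shares = {}
    for part in parts:
        amount, task = part.split()
        shares[task] = float(amount)
    return total, shares


def effort_per_task(table):
    """Sum partner efforts per task, checking that each split matches its total."""
    per_task = defaultdict(float)
    for partner, cell in table.items():
        total, shares = split_by_task(cell)
        if abs(sum(shares.values()) - total) > 1e-9:
            raise ValueError(f"{partner}: split does not add up to {total}")
        for task, amount in shares.items():
            per_task[task] += amount
    return dict(per_task)


if __name__ == "__main__":
    per_task = effort_per_task(EFFORT_2009)
    for task in sorted(per_task):
        print(f"{task}: {per_task[task]:.1f} m/m")
    print(f"WP4 total 2009: {sum(per_task.values()):.1f} m/m")
```

With the transcribed figures every split is consistent, and the 2009 effort comes to 12.0 (T4.1), 58.8 (T4.2), 78.5 (T4.3), 35.5 (T4.4) and 15.0 (T4.5) m/m, i.e. 199.8 m/m in total.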
WP4 Meetings
Face-to-face meetings
– ST I, Agrate Brianza, Italy, April 3rd, 2009
• Participants: all WP4 partners
Web meetings
– November 20th, 2009
• Participants: all WP4 partners
– February 26th, 2010
• Participants: all WP4 task leaders and several partners
– May 21st, 2010
• Participants: all WP4 partners
WP4 Deliverables in the Review Period (1/2)
No. | Planned | Status | Explanation
D4.1.1 | M24 | On track | Reports on PV-aware (self-)adaptive compensation and optimization techniques, including on-chip monitors
D4.1.2 | M30 | | Tape-out of prototype on-chip sensors and level shifter circuits for (self-)adaptive design
D4.1.3 | M36 | | Report on trade-off metrics for (self-)adaptive compensation and optimization techniques
D4.2.1 | M12 | Delivered | Reports on PV-tolerant asynchronous blocks and on ultra low-power circuits/architectures. Prototype asynchronous/de-synchronization flow
D4.2.2 | M24 | On track | Reports on PV-tolerant noise and EMI reduction techniques, and on asynchronous and de-synchronized communication scheme benchmarking
D4.2.3 | M24 | On track | Advanced asynchronous/de-synchronization flow. Delivery of the first de-synchronized design, and ultra low-power circuits/architectures
D4.2.4 | M36 | | Report on PV-tolerant architectures and circuit performance analysis and current profile estimation. Synthesis and simulation of ultra low-power circuits/architectures
D4.2.5 | M36 | | High-level asynchronous synthesis tool and exploitation on a high-performance, advanced industrial de-synchronized design. Advanced power shaping methodology and tool for low-EMI design
WP4 Deliverables in the Review Period (2/2)
No. | Planned | Status | Explanation
D4.3.1 | M12 | Delivered | Robust architecture design specification, and SystemC model for a multi-core SoC virtual platform
D4.3.2 | M24 | On track | High-level models for robust and predictable blocks and architectures, also including NVM design, and robustness assessment report
D4.3.3 | M24 | On track | Functional and test specs for a validated controller for ADC and PLL components. Fault-tolerant on-chip global communication scheme on a multi-core SoC virtual platform
D4.3.4 | M36 | | Validated macro blocks for controller implementation on the multi-core SoC virtual platform. Report on signal coding for robust NVM design
D4.4.1 | M24 | On track | Report on yield prediction tool and regular structures for PV-tolerant blocks
D4.4.2 | M24 | On track | Report on customizable and regular architectures for homogeneous multi-threading and signal processing. Design flow for mapping on mask-programmable blocks
D4.4.3 | M30 | | Tape-out of a chip based on regular transistor arrays
D4.4.4 | M36 | | Exploitation of the design flow for signal processing application mapping on the proposed regular architectures. Report on regular design impact on yield improvement
D4.5.1 | M24 | On track | Report on programming methods and tools for PV-tolerant, reliable, and predictable MPSoC architectures
WP4 Task Structure
Task T4.1: Variability-aware design
– Partners: LETI, UPC, STF
– Definition and development of (self-)adaptive compensation and optimization
techniques to cope with the increasing impact of process variations (PV)
– New adaptive voltage and frequency scaling (AVFS) techniques, which can be
exploited either after testing or at run-time, will be developed
Task T4.2: Variation-tolerant, robust, low-noise and low-EMI
architectures/micro-architectures
– Partners: ELX, CSEM, TMPO, LETI, POLI, ST I, TEKL
– Development and design of advanced macro-blocks for robust and reliable
systems
– Adaptive architectures based on asynchronous and de-synchronization
techniques
– On-chip communication schemes (GALS paradigm)
– Synthesis of PV-tolerant asynchronous/de-synchronized functional blocks and
architectures for low-EMI design
– Design of ultra low-power applications
WP4 Task Structure
Task T4.3: Design of reliable systems
– Partners: ISD, THL, NMX, ST F
– Design of highly reliable analog, mixed-mode, digital, and Non-Volatile Memory
(NVM) systems based on unreliable foundations subject to large PV variations
and degradation
Task T4.4: Design of regular architectures and circuits for high
manufacturability and yield
– Partners: ST I, TMPO, UPC, UNBO
– Development of customizable circuits, macro-blocks, and architectures based
on regular structures, in order to improve manufacturability and predictability
Task T4.5: Distributed reconfigurable PV-robust architectures
– Partners: THL, LIRM
– Development of MPSoC design and distributed and reconfigurable PV-tolerant
architectures
– Programming methods and tools for predictable and PV-robust computing
architectures
T4.1: Local Adaptive Voltage and Frequency Scaling
(LETI, STF)
A Local Adaptive Voltage and Frequency Scaling approach is proposed:
- Allowing local variability management
- Requiring isolated voltage islands
- Requiring isolated frequency islands
► Variability is managed at fine grain by dynamically tuning V/F (WP4.1) according to on-chip diagnostics (WP3)
► A Globally Asynchronous, Locally Synchronous (GALS) architecture is proposed (WP4.2)
[Figure: control loop around a digital block: sensors (S) feed the diagnostic, a decision maker performs parameter control, and actuators (K) apply the resulting action to the circuit]
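A minimal sketch of the sense/decide/act loop in the figure, assuming a single island whose sensors report critical-path timing slack and whose actuator sets VDD in 25 mV steps at a fixed clock frequency. The `Island` slack model, the slack thresholds and the voltage range are hypothetical illustration values, not LETI or ST F parameters.

```python
from dataclasses import dataclass


@dataclass
class Island:
    """Toy model of one voltage/frequency island; all numbers are hypothetical."""
    fail_mv: int             # VDD below which the critical path misses Fclk (set by PV)
    ps_per_mv: float = 0.4   # slack gained per extra millivolt of supply

    def sense_slack_ps(self, vdd_mv: int) -> float:
        """Sensor (S): timing slack of the island's critical path at this VDD."""
        return self.ps_per_mv * (vdd_mv - self.fail_mv)


def decide_vdd(vdd_mv: int, slack_ps: float, low_ps: float = 20.0, high_ps: float = 80.0,
               step_mv: int = 25, vmin_mv: int = 800, vmax_mv: int = 1200) -> int:
    """Decision maker: one step of a closed-loop adaptive supply-voltage controller."""
    if slack_ps < low_ps:        # too close to a timing failure: raise VDD
        vdd_mv += step_mv
    elif slack_ps > high_ps:     # unused margin costs power: lower VDD
        vdd_mv -= step_mv
    return min(max(vdd_mv, vmin_mv), vmax_mv)   # actuator (K) range limits


def run_loop(island: Island, vdd_mv: int = 1200, iterations: int = 40):
    """Iterate sense -> decide -> actuate and return the settled (VDD, slack)."""
    for _ in range(iterations):
        slack = island.sense_slack_ps(vdd_mv)
        vdd_mv = decide_vdd(vdd_mv, slack)
    return vdd_mv, island.sense_slack_ps(vdd_mv)


if __name__ == "__main__":
    islands = {"fast island": Island(fail_mv=850), "slow island": Island(fail_mv=950)}
    for name, island in islands.items():
        vdd, slack = run_loop(island)
        print(f"{name}: VDD = {vdd} mV, slack = {slack:.0f} ps")
```

Because each island settles at its own VDD, a slow (PV-affected) island ends up at a higher supply than a fast one, which is the benefit of managing variability locally instead of applying one worst-case global setting.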
T4.1: Design of efficient Level Shifters (UPC, LETI)
► Isolated voltage islands require efficient level shifters, developed by UPC:
- Early work achieved, with some delay due to unavailable first-year funding
- A test-chip design is planned for the second year so that deliverables will be completed on time
► Dynamic tuning of voltage and frequency requires new local actuators, developed by LETI:
- Reduce the dynamic power by DVFS
- Serve as a regulator using an adaptive technique, to trade timing margins against the power budget
[Figure: static + dynamic power (P stat + dyn) at the operating points {Vhigh, Fhigh}, {Vlow, Flow} and {Vlow, 0}, reached through DFS (reduced clock Fclk_reduite) and DVS from the nominal clock Fclk]
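The benefit of combining DVS with DFS follows from the standard CMOS switching-power relation P_dyn = α·C·VDD²·f. The sketch compares the {Vhigh, Fhigh} and {Vlow, Flow} operating points of the figure; the activity factor, capacitance, voltages and frequencies are hypothetical values, and static (leakage) power is deliberately left out.

```python
def dynamic_power(alpha: float, c_eff: float, vdd: float, f_clk: float) -> float:
    """Switching power of a CMOS block: P_dyn = alpha * C_eff * VDD^2 * f_clk."""
    return alpha * c_eff * vdd ** 2 * f_clk


def energy_per_cycle(alpha: float, c_eff: float, vdd: float) -> float:
    """Dynamic energy per clock cycle: independent of f_clk, quadratic in VDD."""
    return alpha * c_eff * vdd ** 2


# Hypothetical block: activity factor 0.2, 1 nF effective switched capacitance.
ALPHA, C_EFF = 0.2, 1e-9
V_HIGH, F_HIGH = 1.1, 400e6   # {Vhigh, Fhigh}
V_LOW, F_LOW = 0.9, 200e6     # {Vlow, Flow}

p_nominal = dynamic_power(ALPHA, C_EFF, V_HIGH, F_HIGH)
p_dfs = dynamic_power(ALPHA, C_EFF, V_HIGH, F_LOW)   # DFS only: lower f, same VDD
p_dvfs = dynamic_power(ALPHA, C_EFF, V_LOW, F_LOW)   # DVFS: lower f and lower VDD

if __name__ == "__main__":
    print(f"DFS : {p_dfs / p_nominal:.3f} of nominal dynamic power")
    print(f"DVFS: {p_dvfs / p_nominal:.3f} of nominal dynamic power")
    e_ratio = energy_per_cycle(ALPHA, C_EFF, V_LOW) / energy_per_cycle(ALPHA, C_EFF, V_HIGH)
    print(f"DVFS energy per cycle: {e_ratio:.3f} of nominal")
```

In this model DFS alone halves the dynamic power but leaves the energy per cycle unchanged, whereas also lowering VDD cuts the energy per cycle quadratically (to about two thirds here), so the DVFS point draws roughly a third of the nominal dynamic power.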
T4.2: Architectures to mitigate PV (CSEM)
Block architecture: for an adder, which architecture (ripple carry, carry lookahead) and which VDD best reduce the effect of PV?
At 500 mV, the RCA adder is about 2X slower than the CSL adder, but the σ/μ of its delay is about 28% smaller, due to its longer critical path.
However, when comparing CSL at 500 mV with RCA at 600 mV, the RCA at 600 mV is about 4% faster, less power-hungry, and 1.8X less sensitive to intra-die device-to-device process variations.
Comparing RCA at 500 mV with RCA at 600 mV shows that raising VDD alone reduces σ/μ by about 21% (from 5.8% to 4.6%).

Effect of power supply on circuit variability (EPO and delay normalized to CSL @ 500 mV)
Metric | CSL @ 500 mV | RCA @ 500 mV | RCA @ 600 mV
EPO | 1 | 0.67 | 0.94
Delay | 1 | 2.09 | 0.96
σ/μ of delay | 8.1% | 5.8% | 4.6%
CSL: Carry Select Adder; RCA: Ripple Carry Adder; EPO: Energy Per Operation
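The percentages quoted above follow directly from the table; the sketch recomputes them. It also adds the usual first-order explanation for why a longer critical path lowers σ/μ: if the path is modeled as n independent, identical stage delays (an idealization, since real intra-die variations are partly correlated), the mean grows as n while σ grows only as √n.

```python
# Transcribed from the CSEM table: EPO and delay normalized to CSL @ 500 mV,
# sigma_mu is the sigma/mu of the delay.
TABLE = {
    "CSL@500mV": {"epo": 1.00, "delay": 1.00, "sigma_mu": 0.081},
    "RCA@500mV": {"epo": 0.67, "delay": 2.09, "sigma_mu": 0.058},
    "RCA@600mV": {"epo": 0.94, "delay": 0.96, "sigma_mu": 0.046},
}


def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction from `before` to `after` (0.28 means 28% smaller)."""
    return (before - after) / before


def path_sigma_over_mu(stage_sigma_over_mu: float, n_stages: int) -> float:
    """Idealized model (assumption): n independent, identical stage delays.
    The mean grows as n and sigma as sqrt(n), so sigma/mu scales as 1/sqrt(n)."""
    return stage_sigma_over_mu / n_stages ** 0.5


csl, rca500, rca600 = (TABLE[k] for k in ("CSL@500mV", "RCA@500mV", "RCA@600mV"))

slowdown = rca500["delay"] / csl["delay"]                                   # ~2X slower
sigma_gain_arch = relative_reduction(csl["sigma_mu"], rca500["sigma_mu"])   # ~28%
speedup = relative_reduction(csl["delay"], rca600["delay"])                 # ~4% faster
variability_ratio = csl["sigma_mu"] / rca600["sigma_mu"]                    # ~1.8X
sigma_gain_vdd = relative_reduction(rca500["sigma_mu"], rca600["sigma_mu"]) # ~21%

if __name__ == "__main__":
    print(f"RCA@500 vs CSL@500 delay: {slowdown:.2f}X")
    print(f"sigma/mu reduction from architecture: {sigma_gain_arch:.1%}")
    print(f"RCA@600 speed-up vs CSL@500: {speedup:.1%}")
    print(f"variability ratio CSL@500 / RCA@600: {variability_ratio:.2f}X")
    print(f"sigma/mu reduction from VDD: {sigma_gain_vdd:.1%}")
```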
T4.2: Desynchronization flow and EMI reduction
(ELX, POLI, ST I) 1/2
Set up and tested an automatic flow for de-synchronization
– Inherit the properties of asynchronous circuits with little effort
– Use a mixture of EDA vendors
• Magma for the back end and Synopsys for the front end and sign-off
Apply the paradigm for EMI reduction
– Analyzed supply current to estimate EMI improvement
– Additionally, to improve the EMI reduction, multiplexed delay lines were used to introduce
local clock jitter in the Elastic Clocks
Fully integrated desynchronization into the ST implementation flow:
– SOC Encounter for physical design
– PrimeTime for sign-off
– Apache RedHawk for power rail analysis
Tested on two designs, an H.264 video encoder circuit and an ST7 8-bit microcontroller:
– 10-15 dB EMI improvement
– Increased robustness
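Why local clock jitter helps can be illustrated with an idealized model in which every clock edge draws an identical current impulse. For a perfectly periodic clock the impulses add coherently at each clock harmonic; randomizing the edge times (here uniformly over half a period, a hypothetical stand-in for the multiplexed delay lines) spreads that energy across the spectrum and lowers the peaks. This toy model is not the flow used to obtain the 10-15 dB figure above.

```python
import cmath
import math
import random


def clock_edges(period_s: float, n: int, jitter_s: float = 0.0, seed: int = 0):
    """Clock edge times with uniform random jitter in [-jitter_s/2, +jitter_s/2].

    Hypothetical stand-in for delay lines selecting a different delay each cycle;
    jitter_s = 0 gives a perfectly periodic clock.
    """
    rng = random.Random(seed)
    return [k * period_s + rng.uniform(-jitter_s / 2, jitter_s / 2) for k in range(n)]


def harmonic_level_db(edge_times, freq_hz: float) -> float:
    """Spectral level (dB) of identical current impulses, one per clock edge, at
    freq_hz, normalized so that fully coherent impulses give 0 dB."""
    acc = sum(cmath.exp(-2j * math.pi * freq_hz * t) for t in edge_times)
    return 20 * math.log10(abs(acc) / len(edge_times))


if __name__ == "__main__":
    period = 10e-9                                   # hypothetical 100 MHz clock
    periodic = clock_edges(period, 2000)
    jittered = clock_edges(period, 2000, jitter_s=period / 2)
    for h in (1, 3):
        f = h / period
        print(f"harmonic {h}: periodic {harmonic_level_db(periodic, f):6.1f} dB, "
              f"jittered {harmonic_level_db(jittered, f):6.1f} dB")
```

For uniform jitter over half a period the expected attenuation is |sin(πh/2)/(πh/2)|, i.e. about -3.9 dB at the fundamental and about -13.5 dB at the third harmonic; the simulated values land close to these, with small random deviations.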
T4.2: Desynchronization flow and EMI reduction
(ELX, POLI, ST I) 2/2
[Figure: Elastix design flow alongside the customer flow. Customer flow: RTL, synthesis, floorplan, placement, CTS, routing, chip finish, sign-off. Elastix steps: identify regions to elasticize (RTL, UPF, voltage domains), Elastix circuit transformations, multi-corner STA, delay line synthesis, ECO and Elastix timing closure, driven by TCL scripts and SDC constraints and exchanging netlist, DEF and timing data with the tool database]
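In a de-synchronized circuit, each local controller is clocked through a delay line that must cover the critical path of its region in every analysed corner. A minimal sketch of how a delay-line length could be chosen from multi-corner STA results, and how a multiplexer could add a random number of extra taps each cycle to create the local jitter mentioned earlier. The corner names, delays, 10% margin and both function names are hypothetical, not the Elastix implementation.

```python
import random


def min_taps(path_ps: dict, elem_ps: dict, margin_pct: int = 10) -> int:
    """Smallest number of delay elements covering the critical path plus a relative
    margin in every corner (integer picoseconds, exact ceiling division)."""
    taps = 0
    for corner, path in path_ps.items():
        need = -(-(path * (100 + margin_pct)) // (100 * elem_ps[corner]))  # ceil
        taps = max(taps, need)
    return taps


def jittered_tap_sequence(base_taps: int, extra_taps: int, cycles: int, seed: int = 0):
    """Hypothetical mux control: each cycle select base_taps..base_taps+extra_taps
    elements. Extra taps only lengthen the local period, so timing stays safe while
    the clock edges are spread in time."""
    rng = random.Random(seed)
    return [base_taps + rng.randint(0, extra_taps) for _ in range(cycles)]


if __name__ == "__main__":
    critical_path = {"ss": 1000, "tt": 700, "ff": 500}   # ps, from multi-corner STA
    element_delay = {"ss": 100, "tt": 70, "ff": 40}      # ps per delay element
    base = min_taps(critical_path, element_delay)
    print("minimum taps:", base)
    print("tap sequence:", jittered_tap_sequence(base, extra_taps=3, cycles=8))
```

Sizing against every corner matters because the delay elements and the logic do not necessarily scale identically; in this made-up example the fast corner, not the slow one, sets the minimum length.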
T4.2: Variability-tolerant low-EMI asynchronous
circuits (TMPO)
Tiempo's contribution is to enable the design of variability-tolerant, low-EMI
asynchronous circuits and to evaluate/predict their EMC behavior at design time
Tiempo first-year achievements:
► Set up a flow to design PVT-tolerant asynchronous cells
► Set up a flow to estimate the current consumption profile and the resulting EMI
► Demonstrated both flows on Tiempo's asynchronous AES ciphering IP
[Figure: current curve of the asynchronous AES circuit (extracted post-P&R) and corresponding current spectrum, below -30 dB]
T4.2: Robust asynchronous QDI communication
for NoC (LETI)
Within a Local Adaptive Voltage and Frequency Scaling architecture for
dynamic variation compensation:
- Isolated voltage islands are required (T4.1 work)
- Isolated frequency islands are also required, and a GALS architecture is proposed
In this GALS context, an asynchronous NoC is being developed by CEA in T4.2:
- During the first year, an asynchronous cell library has been developed
- 32 nm technology, about 40 cells, fully characterized
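QDI links typically rely on dual-rail, return-to-zero signalling: each bit travels on two wires, and the receiver detects completion when every bit has exactly one rail high, so correctness does not depend on wire delays. The sketch below models one 4-phase transfer with wires switching in random order; it illustrates the general technique only and is not a description of the LETI cell library or NoC protocol.

```python
import random
from typing import List, Optional, Tuple

Rail = Tuple[int, int]      # (true wire, false wire) of one dual-rail bit
SPACER: Rail = (0, 0)       # NULL token separating data tokens (return-to-zero)


def encode(bits: List[int]) -> List[Rail]:
    """Dual-rail encoding: 1 -> (1, 0), 0 -> (0, 1)."""
    return [(1, 0) if b else (0, 1) for b in bits]


def is_complete(word: List[Rail]) -> bool:
    """Completion detection: every bit has exactly one of its two rails high."""
    return all(t + f == 1 for t, f in word)


def is_empty(word: List[Rail]) -> bool:
    """Spacer detection: all rails are low again."""
    return all(r == SPACER for r in word)


def decode(word: List[Rail]) -> Optional[List[int]]:
    """Return the data once the token is complete, None while it is still arriving."""
    if any(r == (1, 1) for r in word):
        raise ValueError("illegal dual-rail code (1, 1)")
    return [t for t, _ in word] if is_complete(word) else None


def transfer(bits: List[int], rng: random.Random) -> Optional[List[int]]:
    """One 4-phase return-to-zero transfer; wires switch in random order to mimic
    arbitrary wire delays."""
    word = [SPACER] * len(bits)
    target = encode(bits)
    order = list(range(len(bits)))
    rng.shuffle(order)
    received = None
    for i in order:                    # data phase: rails rise one by one
        word[i] = target[i]
        if received is None:
            received = decode(word)    # receiver latches (and acks) only when complete
    rng.shuffle(order)
    for i in order:                    # return-to-zero phase
        word[i] = SPACER
    assert is_empty(word)              # receiver sees the spacer and releases the ack
    return received


if __name__ == "__main__":
    print(transfer([1, 0, 1, 1], random.Random(0)))
```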
T4.2: Integration of Power Shaping technology into
EDA flows (TEKL, STI)
TEKL has integrated its power shaping technology into both a Cadence
Encounter-based and a Synopsys ICC-based ASIC back-end flow. Part of this
work has been done in close collaboration with ST I.
Seamless integration into mainstream flows is achieved by analysing a given
design in standard industry formats such as Verilog, SDF and SDC, and
exporting modified Verilog together with flow-specific clock tree synthesis
directives.
[Figure: FloorDirector® placed between Synthesis and Place&Route]
T4.3: SystemC virtual platform for multicore SoC
(ISD, THL) 1/2
ISD develops a clock-accurate, transaction-level SystemC virtual
platform (VP) of a multicore SoC.
Once validated, the VP is extended to incorporate fault tolerance.
ISD also designs highly reliable AMS blocks, e.g., PLLs.
[Figure: VP block diagram: a cluster of PEs (Thales), CPE1, CPE2, …, CPEN, connected through the NoC (ISD) and an interconnect to a shared memory (ISD) composed of SE1, SE2, …, SEN]
T4.3: SystemC virtual platform for multicore SoC
(ISD) 2/2
Multilayered fault-tolerant approach to diagnose and recover from
permanent and transient node/link faults.
Methodology includes:
– packet encoding/retransmission
– fault-tolerant routing
– offline static reconfiguration
Study of performance degradation due to static and dynamic faults.
[Figure: speedup (0.8 to 1.5) vs hypercube size for parallel sorting with problem sizes 8192, 16384 and 32768, using message passing (mp) and shared memory (sh_mem) (no faults, 1st version of the virtual platform)]
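Fault-tolerant routing on a hypercube can be sketched as follows: the baseline e-cube scheme corrects differing address bits in a fixed order, and when nodes or links are faulty a breadth-first search over the healthy part of the hypercube finds a shortest detour. BFS source routing is a generic stand-in chosen for clarity; the slide does not say which routing algorithm ISD implements.

```python
from collections import deque


def ecube_route(src: int, dst: int) -> list:
    """Dimension-order (e-cube) routing: flip differing address bits, lowest first."""
    path, node, diff, bit = [src], src, src ^ dst, 0
    while diff:
        if diff & 1:
            node ^= 1 << bit
            path.append(node)
        diff >>= 1
        bit += 1
    return path


def fault_tolerant_route(src, dst, dim, faulty_nodes=frozenset(), faulty_links=frozenset()):
    """Shortest path on a dim-dimensional hypercube avoiding faulty nodes and links.
    Links are given as frozenset({a, b}); returns None if dst is unreachable."""
    if src in faulty_nodes or dst in faulty_nodes:
        return None
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:        # walk predecessors back to src
                path.append(node)
                node = prev[node]
            return path[::-1]
        for bit in range(dim):
            nxt = node ^ (1 << bit)        # neighbours differ in exactly one bit
            if nxt in prev or nxt in faulty_nodes or frozenset((node, nxt)) in faulty_links:
                continue
            prev[nxt] = node
            queue.append(nxt)
    return None


if __name__ == "__main__":
    print(ecube_route(0, 7))
    print(fault_tolerant_route(0, 7, 3, faulty_links={frozenset((1, 3))}))
```

Because the hypercube offers several disjoint shortest paths between most node pairs, a single faulty link usually costs no extra hops, which is consistent with the goal of studying degradation under faults rather than outright failure.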
T4.3: Multi-Core Architecture (THL) 1/2
Goal
– Definition and development of a flexible, highly parameterized, user-friendly framework for exploring performance, power consumption and
reliability trade-offs (different architectural and algorithmic solutions, and
technology process variations) in future multi-core systems
Results
– Integration of the Thales customized processor tile, consistent with the fault-tolerance scenarios selected for the preliminary platform reliability evaluation
– Preliminary VP models and specifications exchanged between ISD and
THL
– Experimentation in Thales with processor model integration and platform
simulation based on preliminary test-benches
T4.3: Multi-Core Architecture (THL) 2/2
The Network Interface Module (NIM) is in charge of network protocol translation. Because the
iNoC and the NoC may not use the same protocol or run at the same frequency, the tile
must be isolated from the NoC.
Interfaces between modules are defined so that the SystemC model of this architecture
allows any module to be tested. The simulator is based on OCP TL2.
[Figure: architecture overview showing clusters, tiles, a cluster switch, shared memory, I/O and a DDR controller interconnected through NoC interfaces (NOCI); tile components include LMEM, DMA, CCI, DXI, NIM, ACC, IP, CTR and the internal NoC (INOC)]
T4.3: Fault Tolerant NoC architecture (STF)
Extended Spidergon STNoC to support fault-tolerant routing through
direction and destination reprogramming
Both node and link faults have been considered
Industrial application in STMicroelectronics products using SSTNoC
technology
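Spidergon connects n nodes (n a multiple of 4) in a bidirectional ring plus an "across" link from node i to node i + n/2, and is commonly routed "across-first": move along the ring if the destination is within n/4 hops, otherwise take the across link first. The sketch adds a simple hypothetical fallback when the preferred link is faulty (try the other neighbours, closest first, never revisiting a node); it only illustrates the idea of changing direction around a fault and is not ST F's reprogramming mechanism.

```python
def spidergon_next_hop(cur: int, dst: int, n: int) -> int:
    """Across-first routing decision on an n-node Spidergon (n a multiple of 4)."""
    d = (dst - cur) % n
    if d <= n // 4:
        return (cur + 1) % n          # clockwise along the ring
    if d >= 3 * n // 4:
        return (cur - 1) % n          # counter-clockwise along the ring
    return (cur + n // 2) % n         # across link


def neighbours(cur: int, n: int) -> list:
    return [(cur + 1) % n, (cur - 1) % n, (cur + n // 2) % n]


def hops_to(a: int, b: int, n: int) -> int:
    """Fault-free across-first hop count from a to b."""
    d = (b - a) % n
    if d <= n // 4:
        return d
    if d >= 3 * n // 4:
        return n - d
    c = (a + n // 2) % n
    return 1 + min((b - c) % n, (c - b) % n)


def route(src: int, dst: int, n: int, faulty_links=frozenset()):
    """Across-first route with a hypothetical local detour around faulty links."""
    path, cur, visited = [src], src, {src}
    while cur != dst:
        preferred = spidergon_next_hop(cur, dst, n)
        others = sorted((x for x in neighbours(cur, n) if x != preferred),
                        key=lambda x: hops_to(x, dst, n))
        for nxt in [preferred] + others:
            if nxt not in visited and frozenset((cur, nxt)) not in faulty_links:
                break
        else:
            return None               # dead end: needs global reprogramming
        path.append(nxt)
        visited.add(nxt)
        cur = nxt
    return path


if __name__ == "__main__":
    print(route(0, 8, 16))
    print(route(0, 8, 16, faulty_links={frozenset((0, 8))}))
```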
T4.3: Design of Highly-reliable Non-Volatile Memory
systems (NUM) 1/2
Shrinking technology nodes and an increasing number of bits per cell make memories more prone to errors.
Design of highly reliable Non-Volatile Memory (NVM) systems:
– To exploit advanced coding (AD) techniques and signal processing (SP) for robust NVM design
– To identify architecture solutions for an efficient implementation of AD and SP for NVM products
Advanced coding techniques: convolutional codes, turbo codes, Trellis Coded Modulation, LDPC
Signal processing: cell-to-cell interference, trapping, de-trapping
Key enabler for advanced coding and DSP: soft decisions, i.e., an analog or quantized read of the memory cell (with more levels than the actual ones)
Rule of thumb: for N bits/cell, N+2 quantization bits are enough
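The rule of thumb can be made concrete: N+2 quantization bits give 2^(N+2) read bins, i.e. four soft-read bins per programmed level regardless of N. The uniform quantizer and the read window used below are simplifying assumptions for illustration; the function names are hypothetical.

```python
def soft_read_plan(bits_per_cell: int, extra_bits: int = 2):
    """Program levels and soft-read quantization for an N bit/cell NVM, using the
    rule of thumb that N + 2 quantization bits are enough."""
    program_levels = 2 ** bits_per_cell
    quant_bits = bits_per_cell + extra_bits
    read_bins = 2 ** quant_bits
    return program_levels, quant_bits, read_bins


def quantize(v: float, v_min: float, v_max: float, quant_bits: int) -> int:
    """Uniform quantizer (assumption): map an analog cell read to one of
    2**quant_bits bins, clamping reads outside the window."""
    n_bins = 2 ** quant_bits
    idx = int((v - v_min) / (v_max - v_min) * n_bins)
    return min(max(idx, 0), n_bins - 1)


if __name__ == "__main__":
    for n in range(1, 5):
        levels, q, bins = soft_read_plan(n)
        print(f"{n} b/c: {levels} levels, {q} quantization bits -> {bins} read bins "
              f"({bins // levels} bins per level)")
```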
T4.3: Design of Highly-reliable Non-Volatile Memory
systems (NUM) 2/2
Preliminary analysis: concatenation of two codes
– Inner code: to improve the “channel” reliability
– Outer code: to “crunch” all the errors
Comparison of different concatenation schemes to:
– Minimize parity check bits
– Achieve a high reliability target (UBER < 10^-15)
– Keep the latency overhead reasonable
Current best solution: LDPC+BCH
Next step: high-level architecture of the decoder
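The UBER target translates into a required correction capability for the outer code. A first-order sketch, assuming independent bit errors with residual BER p after the inner code, a binary BCH code over GF(2^13) with 13·t parity bits protecting 4096 data bits, and UBER approximated as the uncorrectable-codeword probability divided by the number of data bits. All of these parameters are hypothetical, and the code length limit 2^13 - 1 is not enforced.

```python
import math


def codeword_failure_prob(n: int, t: int, p: float, extra_terms: int = 100) -> float:
    """P(more than t bit errors in an n-bit codeword) for independent errors with BER p.
    Summed in the log domain so large binomial coefficients do not overflow; the tail
    is truncated after `extra_terms` terms, which is ample when n*p is small."""
    total = 0.0
    for i in range(t + 1, min(n, t + extra_terms) + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return total


def uber(k: int, t: int, p: float, m: int = 13) -> float:
    """Approximate UBER of a t-error-correcting binary BCH code with m*t parity bits."""
    n = k + m * t
    return codeword_failure_prob(n, t, p) / k


def min_correction(k: int, p: float, target: float = 1e-15, t_max: int = 200) -> int:
    """Smallest correction capability t that meets the UBER target."""
    for t in range(t_max + 1):
        if uber(k, t, p) < target:
            return t
    raise ValueError("target not reachable with t <= t_max")


if __name__ == "__main__":
    k = 4096                        # hypothetical 512-byte data block
    for p in (1e-5, 1e-4, 1e-3):    # hypothetical residual BER after the inner code
        t = min_correction(k, p)
        print(f"BER {p:g}: t = {t}, parity = {13 * t} bits, UBER = {uber(k, t, p):.1e}")
```

The steep growth of t (and hence parity) with the residual BER is what makes the inner code worthwhile: improving the "channel" first keeps the outer code's overhead within the parity and latency constraints listed above.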
T4.4: Design of Mask Programmable IPs for Fast
SoC Development (STI, UNBO)
Two customization solutions:
Regular Transistor Array: customization through metal layers (M1/M2 connections)
• Base-cell developed (logic)
• P&R to be achieved with standard CAD
• 2-metal customization (M1 + M2)
• Standard CORE library compliance
Regular Tile Datapath: customization through via connections (VIA 4 connections)
• Base-tile developed (logic + routing)
• Tile logic fully synthesizable
• 1-via customization (Via 4)
• All tiles identical
Customization flow (same front-end for both solutions):
– Front-end: pseudo-C code (Griffy) with syntax checks and pipelined architecture distillation, producing an HDL netlist, ANSI C emulation and a .dot file for GraphViz. Front-end flow already implemented
– Back-end, transistor array: standard P&R CAD flow and signoff
– Back-end, tile datapath: automatic Via 4 layer generation for the GDS view and standard signoff
– Result: IP ready for integration. Back-end flow under development
T4.4: Definition of customizable MP architecture
(UNBO, STI)
Programming model for application mapping on a
regular multiprocessor architecture:
– Results:
• Implementation of a compilation flow based on the CUDA programming model
– Future activities:
• High-level management of memory transfers through automated programming of DMA channels
Hardware/software design methodology for
mapping accelerators on a customizable
multiprocessor architecture:
– Results:
• Implementation of a scalable and parametric SystemC (TLM) multiprocessor architecture
– Future activities:
• Integration of the accelerators' emulation function with the SystemC model
• High-level management of heterogeneous distributed hardware acceleration
[Figures: template architecture; design flow]
T4.4: Development of a via-configurable regular
transistor array (UPC, STI)
Main Target:
To develop a via-configurable regular transistor array (VCTA). The performance-area-power trade-offs of this approach to regular design will be evaluated,
along with its impact on random defectivity, parametric yield, and
manufacturability
Highlights:
• VCTA basic architecture studied and implemented
• Basic elements and blocks implemented
• Regularity evaluation (part of D4.4.1, M24) using verification tools to
compute geometrical regularity characteristics
A paper has been submitted to the VLSI-SoC 2010 conference
Lowlights: funding delays have slowed the progress of the activities
Work plan and ongoing activities:
Integrate the regular cell fabric into placement and routing as automatically
as possible, using state-of-the-art CAD tools
T4.4: Regular structures for variability-tolerant
asynchronous circuits (TMPO, STI)
Main Target:
Study regular structures of variability-tolerant asynchronous circuits and evaluate
their benefits on manufacturability and yield.
Highlights:
• Study for characterization of asynchronous cells and macro-blocks
completed
• Study of effect of variability on asynchronous circuits completed
• Set up of a flow to characterize asynchronous building blocks: design of
about 40 different cells with different drive strengths
• Characterization of the designed asynchronous cells has been completed
• CAD views (.lib, functional, Verilog, schematic, symbols, layout) defined
Work plan and ongoing activities:
• Characterization based on technology data to evaluate the benefits of the designed circuits
in terms of manufacturability (support from the industrial partner for
technology data access)
T4.5: Distributed reconfigurable PV-robust
architectures (THL)
– Definition of fault scenarios
– Definition of specifications for handling the faults described in the fault scenarios:
• Fault-tolerance operating library allowing the user to detect faults and to
solve the detected problems.
• Definition of a set of functions (called through an API) that are used by the
operating system running on the architecture to detect faults, and by the
user to receive fault reports
– From the given information, the user computes a new tile mapping for running
processes
• After a reset of the chip, the new mapping will be used and the chip shall
continue working as before the fault, without using the faulty part
• The new mapping implies new communication schemes
– Next step includes the development of re-mapping generation tools
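The remapping step ("the user computes a new tile mapping for running processes") can be illustrated with a simple greedy policy: tasks on healthy tiles stay where they are, and tasks from faulty tiles move to the least-loaded healthy tile within a per-tile capacity. The function `remap`, the capacity model and the tie-breaking rule are hypothetical; the slide does not describe the fault-tolerance library's API or mapping algorithm.

```python
from typing import Dict, Iterable, Optional, Set


def remap(mapping: Dict[str, int], faulty_tiles: Set[int], tiles: Iterable[int],
          capacity: int) -> Optional[Dict[str, int]]:
    """Greedy re-mapping that avoids faulty tiles.

    mapping: task -> tile before the fault. Returns the new mapping, or None if the
    healthy tiles cannot host every task within `capacity` tasks per tile.
    """
    healthy = [t for t in tiles if t not in faulty_tiles]
    load = {t: 0 for t in healthy}
    new_mapping: Dict[str, int] = {}
    displaced = []
    for task, tile in sorted(mapping.items()):
        if tile in load and load[tile] < capacity:
            new_mapping[task] = tile          # unaffected task keeps its tile
            load[tile] += 1
        else:
            displaced.append(task)
    for task in displaced:
        if not healthy:
            return None
        target = min(healthy, key=lambda t: (load[t], t))   # least-loaded healthy tile
        if load[target] >= capacity:
            return None
        new_mapping[task] = target
        load[target] += 1
    return new_mapping


if __name__ == "__main__":
    before = {"a": 0, "b": 0, "c": 1, "d": 2}
    print(remap(before, faulty_tiles={0}, tiles=range(4), capacity=2))
```

Keeping unaffected tasks in place limits how many communication schemes change, since every moved task implies new routes to and from its new tile.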
T4.5: Distributed reconfigurable PV-robust
architectures (LIRM) 1/2
► Distributed, homogeneous MPSoC architecture (HS-Scale architecture), from model to hardware
► Run-time task remapping (self-adaptive task migration)
► Distributed OS developed
► Monitors (e.g., CPU load) used
[Figure: HS-Scale organization: network layer (packet switching) on top of the hardware processing layer; each node combines a MIPS R3000 processor, RAM and a network interface (NI)]
The Network Processor Unit:
– 32-bit MIPS R3000-type CPU
– No MMU, OS kernel…
– Simple memory interface
– gcc 4.0.1 cross-compiler
T4.5: Distributed reconfigurable PV-robust
architectures (LIRM) 2/2
Validation
[Figure: task migration performance: throughput (KB/s) and FIFO monitoring (% fill of the IVLC, IQ and IDCT FIFOs) vs time (s), with a reference (REF) curve]
Exploration
– SystemC model, architecture model (game theory, for instance)
[Figure: exploration results plotted against the threshold value]
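A threshold-driven migration decision of the kind explored above can be sketched as follows: when a task's input FIFO fills beyond a threshold (it cannot keep up), the task is moved to the least-loaded processing element, with at most one migration per PE per monitoring period. The policy, the function name and the loads in the example are hypothetical; only the FIFO names (IVLC, IQ, IDCT) come from the figure.

```python
from typing import Dict, List, Tuple


def migration_decisions(fifo_fill: Dict[str, float], cpu_load: Dict[int, float],
                        placement: Dict[str, int],
                        threshold: float = 0.8) -> List[Tuple[str, int, int]]:
    """Return (task, src_pe, dst_pe) moves for one monitoring period.

    fifo_fill: task -> input FIFO occupancy in [0, 1]; cpu_load: PE -> load in [0, 1].
    """
    moves: List[Tuple[str, int, int]] = []
    busy = set()
    for task in sorted(fifo_fill, key=lambda t: fifo_fill[t], reverse=True):
        if fifo_fill[task] <= threshold:
            break                          # remaining tasks are keeping up
        src = placement[task]
        if src in busy:
            continue
        candidates = [pe for pe in cpu_load if pe not in busy and pe != src]
        if not candidates:
            continue
        dst = min(candidates, key=lambda pe: (cpu_load[pe], pe))
        if cpu_load[dst] >= cpu_load[src]:
            continue                       # no less-loaded PE: migrating would not help
        moves.append((task, src, dst))
        busy.update((src, dst))            # at most one migration per PE per period
    return moves


if __name__ == "__main__":
    fill = {"IVLC": 0.95, "IQ": 0.5, "IDCT": 0.85}
    load = {0: 0.9, 1: 0.6, 2: 0.7, 3: 0.1}
    where = {"IVLC": 0, "IQ": 1, "IDCT": 2}
    print(migration_decisions(fill, load, where))
```

The threshold is the key tuning knob: a low value reacts quickly but risks needless migrations, a high value tolerates longer stalls, which is why it is worth exploring systematically.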
WP4 Summary
All WP4 activities are on track and progressing according to milestones
D4.2.1 and D4.3.1 delivered on time (M12)
All other deliverables on track
The funding situation is poor: several national public authorities have not
signed the contract or granted the expected funding
Many WP4 partners are affected by this situation; although some
activities were initially delayed, the strong commitment of WP4 partners to
MODERN has kept all WP4 activities and deliverables on track
However, if the lack of funding from national public authorities (PAs) persists in 2010, it will
impact WP4 activities and deliverables
WP4 is delivering innovative and outstanding scientific work, with prompt
industrial exploitation and good cooperation among the partners