Synchronous Design - Vienna University of Technology

Advanced Digital Design
GALS Design
Andreas Steininger
Vienna University of Technology
Outline
Global synchrony & clock distribution
types of synchrony
The GALS approach
communication
synchronization
Muller C-Element, Mutex & Arbiter
data-driven clock & pausable clock
TMR example with pausable clock
Lecture "Advanced Digital Design"
© A. Steininger & M. Delvai / TU Vienna
Even/Odd Synchronizer
works only for two periodic clocks whose frequency ratio lies within a certain range
avoids the performance penalty of synchronizers
largely eliminates the potential for metastability
for details see [Dally & Tell, The Even/Odd Synchronizer, ASYNC 2010]
Types of Synchrony
synchronous
  - identical frequency, constant phase relation
  - classical synchronous system driven by one clock source
mesochronous (= multisynchronous)
  - identical frequency (no accumulating drift) but unknown, possibly varying phase relationship (bounded)
  - example: different PLLs driven by the same source
plesiochronous
  - same nominal clock frequency, mutual (low) drift
  - independent clock sources with same nominal frequency
heterochronous (= multisynchronous)
  - clocks totally unrelated
  - independent clock sources with different nominal frequency
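The four classes can be told apart by three yes/no questions about a pair of clocks. The following is a minimal sketch of that decision; the function name and its three parameters are illustrative assumptions, not terms from the lecture, and the case of one source with different derived frequencies is deliberately left out.

```python
from enum import Enum


class Synchrony(Enum):
    SYNCHRONOUS = "synchronous"
    MESOCHRONOUS = "mesochronous"
    PLESIOCHRONOUS = "plesiochronous"
    HETEROCHRONOUS = "heterochronous"


def classify(same_source: bool, same_nominal_freq: bool, constant_phase: bool) -> Synchrony:
    """Map three (hypothetical) properties of a clock pair to a synchrony class."""
    if not same_nominal_freq:
        return Synchrony.HETEROCHRONOUS   # different nominal frequencies
    if not same_source:
        return Synchrony.PLESIOCHRONOUS   # same nominal frequency, mutual drift
    if constant_phase:
        return Synchrony.SYNCHRONOUS      # one source, fixed phase relation
    return Synchrony.MESOCHRONOUS         # one source, unknown (bounded) phase
```

For example, two PLLs driven by the same source would be classified with `classify(True, True, False)`.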
Global Synchrony?
Problem 1: Clock distribution
  - Low-skew clock distribution becomes difficult for large chips and high frequencies
  - Clock networks consume a considerable share of the power
Problem 2: Clock selection
  - SoC contains many IPs, each specified for its own frequency
  - specific frequencies required for some functions (interface standards, e.g.)
  - dynamic local changes due to voltage & frequency scaling, clock & power gating
Clock Distribution
synchronous approach:
[Figure: timing diagram of source register TRGsrc and sink register TRGsnk; data becomes valid after tCO and tpd; case "clock skew 1" leads to a setup violation]
Clock Distribution
synchronous approach:
[Figure: timing diagram of source register TRGsrc and sink register TRGsnk; data becomes valid after tCO and tpd; case "clock skew 2" leads to a hold violation]
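The effect of clock skew on a synchronous register-to-register path can be checked with the two textbook inequalities for setup and hold. The sketch below is an illustration under assumptions: the clock period `t_clk`, the register requirements `t_setup` and `t_hold`, and the sign convention for `skew` do not appear on the slides; only tCO and tpd do.

```python
def check_timing(t_clk, t_co, t_pd, t_setup, t_hold, skew):
    """Return the list of violations for one register-to-register path.

    skew = (clock arrival at TRGsnk) - (clock arrival at TRGsrc).
    All times must use the same unit (e.g. ps).
    """
    violations = []
    # Setup: data launched at edge n must settle before edge n+1 at the sink.
    if t_co + t_pd + t_setup > t_clk + skew:
        violations.append("setup")
    # Hold: new data must not reach the sink before its own edge n has been served.
    if t_co + t_pd < t_hold + skew:
        violations.append("hold")
    return violations
```

With this convention a sink clock that arrives early (negative skew) eats into the setup margin, while a sink clock that arrives late (positive skew) endangers hold.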
Clock Distribution
asynchronous approach:
[Figure: timing diagram of TRGsrc and TRGsnk coupled by a REQ/ACK handshake with completion detection (delays tCO, tpd); highlighted case: REQ delay]
Clock Distribution
asynchronous approach:
[Figure: timing diagram of TRGsrc and TRGsnk coupled by a REQ/ACK handshake with completion detection (delays tCO, tpd); highlighted case: ACK delay]
Clock Distribution
asynchronous approach:
[Figure: timing diagram of TRGsrc and TRGsnk coupled by a REQ/ACK handshake with completion detection (delays tCO, tpd); highlighted case: data delay]
The GALS Approach
SoC is clearly structured into IPs anyway
run each at its desired individual frequency => synchronous islands
  - efficient, well understood
communication between IPs
  - has to bridge clock boundaries
  - may run over larger distances
=> asynchronous paradigm (handshake-based) better suited for composition
Globally Asynchronous Locally Synchronous (GALS)
First mention in PhD thesis by Chapiro / Stanford 84
A GALS Example
[Figure: synchronous islands with individual clocks: CPU (2 GHz), DSP (2.7 GHz), PCI-IF (533 MHz), USB-IF (24 MHz)]
Communication in GALS
Boundary Synchronizers
  - direct data exchange
  - controlled by handshake => synchronizer
Shared Memory
  - data exchange decoupled through memory
  - shared memory needs arbitration
Dual-Clock FIFOs
  - data exchange buffered through FIFO queue
  - status flags need synchronization
Local Clock Stretching
  - direct data exchange
  - sender can halt receiver clock while data in transition
Boundary Synchronizers
[Figure: CPU (2 GHz) and DSP (2.7 GHz) exchanging data across the clock domain boundary through synchronizers (S)]
data moving over clock domain boundary
metastability problems
=> need to insert handshake … with synchronizers and (optional) buffers
control flow sender / receiver strongly coupled
handshake loop limits speed
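A rough feel for why the handshake loop limits speed comes from counting synchronizer delays around one complete handshake. The sketch below is a back-of-the-envelope model under stated assumptions that are not on the slide: a four-phase handshake, two-flop synchronizers on both REQ and ACK, one destination clock period per synchronizer stage, and no other logic delay.

```python
def handshake_period_ns(f_tx_ghz, f_rx_ghz, sync_stages=2, phases=4):
    """Estimate the time for one complete handshake between two clock domains.

    Each REQ transition is synchronized in the receiver domain and each ACK
    transition in the sender domain; a four-phase protocol has two of each.
    """
    t_tx = 1.0 / f_tx_ghz          # sender clock period in ns
    t_rx = 1.0 / f_rx_ghz          # receiver clock period in ns
    crossings_per_direction = phases // 2
    return crossings_per_direction * sync_stages * (t_tx + t_rx)
```

For the 2 GHz / 2.7 GHz pair of the example this model gives about 3.5 ns per transfer, that is, several clock cycles of either domain for a single data item.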
Shared Memory
[Figure: CPU (2 GHz) and DSP (2.7 GHz) accessing a shared memory through arbitration logic]
perfect decoupling of data path
potential metastability problems at arbitration logic
potential blocking through arbitration
low speed, high effort => rarely used
Clocked FIFO
[Figure: CPU (2 GHz) and DSP (2.7 GHz) connected through a register array (reg array); pointer management (Pointer mgmt) with synchronizer (S) and full / empty flags]
good decoupling of data path
potential metastability problems with pointer management
potential blocking through full / empty
high speed, high effort (reg array)
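The slide only states that pointer management is where metastability can arise. One widely used countermeasure, which is an addition here and not named in the lecture, is to pass the read and write pointers across the clock boundary in Gray code, so that only one bit changes per increment and a mis-sampled pointer is off by at most one position. A minimal sketch of the conversion:

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary pointer value to Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert a Gray-coded pointer back to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which two code words differ."""
    return bin(a ^ b).count("1")
```

For a 16-entry FIFO, every step of the pointer, including the wrap from 15 back to 0, changes exactly one bit of the Gray-coded value.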
Pausable Clocking
[Figure: CPU (2 GHz) and DSP (2.7 GHz) connected through a data latch; pausable clock]
SRC: request SNK to stop clock
SNK: acknowledge stopping of clock, open data latch (safe now!)
SRC: release SNK clock blocking
SNK: release ACK, close data latch, start clocking (data stable now!)
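The four steps can be read as a small protocol between source (SRC) and sink (SNK). The sketch below models only the ordering of events; the class and method names are hypothetical, and delays and metastability are not modelled.

```python
class PausableSink:
    """Behavioral model of the sink side (SNK) of a pausable-clock transfer."""

    def __init__(self):
        self.clock_running = True
        self.latch_open = False
        self.ack = False
        self.data = None

    def on_request(self):
        # SNK: stop the clock, acknowledge, then open the data latch.
        self.clock_running = False
        self.ack = True
        self.latch_open = True

    def on_data(self, value):
        # Data may only pass while the latch is open (clock stopped).
        if not self.latch_open:
            raise RuntimeError("latch closed: clock has not been paused")
        self.data = value

    def on_release(self):
        # SNK: release ACK, close the latch, then restart the clock.
        self.ack = False
        self.latch_open = False
        self.clock_running = True


def transfer(sink, value):
    """SRC side: pause the sink clock, hand over one value, release."""
    sink.on_request()
    sink.on_data(value)
    sink.on_release()
```

After `transfer(sink, 42)` the sink holds the value, its latch is closed and its clock is running again.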
Pausable Clocking
[Figure: CPU (2 GHz) and DSP (2.7 GHz) connected through a data latch; pausable clock]
coupling of data path
potential metastability problems with pausable clock
potential blocking through handshake for pausing
high speed, moderate effort (pausable clock)
receiver clock distribution delay may cause problems
Fundamental Asynchronous Building Blocks
will be needed for pausable clocking (and others) …
Muller C-Element
IF a = b
THEN y = a
ELSE hold y
[Figure: C-element symbol (C) with inputs a, b and output y; timing diagram of a, b and y; implementation with an RS latch (set / reset)]
David Eugene Muller (1924 – 2008), Professor at Univ. of Illinois:
Muller, D. E.; Bartky, W. S. (1959), "A Theory of Asynchronous Circuits", Proc. Int'l Symp. Theory of Switching, Part 1 (Harvard Univ. Press): 204–243
Function of a MCE
consider a MCE with n inputs
AND for transitions
  - need a ↑ on all inputs for a ↑ output
  - need a ↓ on all inputs for a ↓ output
n-of-n threshold gate
  - change output only if all inputs agree on changing
voter
  - keep old state until agreement on change
memory element
  - storage loop like D-latch, different input stack
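The behavior of an n-input Muller C-Element (take over the value when all inputs agree, otherwise hold) can be written down in a few lines. This is a behavioral sketch only; the class name and its interface are illustrative assumptions.

```python
class CElement:
    """Behavioral model of an n-input Muller C-Element."""

    def __init__(self, n_inputs=2, initial=0):
        self.n = n_inputs
        self.y = initial          # stored output value

    def update(self, *inputs):
        if len(inputs) != self.n:
            raise ValueError("wrong number of inputs")
        # Output changes only if all inputs agree; otherwise the old state is kept.
        if all(v == inputs[0] for v in inputs):
            self.y = inputs[0]
        return self.y
```

With two inputs, the sequence (0,1), (1,1), (1,0), (0,0) yields the outputs 0, 1, 1, 0: the output rises only after both inputs have risen and falls only after both have fallen.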
Muller C-Element: Circuit
[Figure: C-element circuit implementations attributed to Sutherland, Martin and van Berkel]
Mutual Exclusion
[Figure: MUTEX symbol with request inputs r1, r2 and grant outputs g1, g2]
purpose:
  - decide order of asynchronous events
function:
  - handle pairs of request_in / grant_out
  - requests may arrive in any order
  - MUTEX must activate only one grant_out at a time (respond to the first requester)
problem:
  - resolve concurrent requests => metastability problem
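The required function of a MUTEX can be captured in a small behavioral model. The sketch below is an illustration with assumed names; it replaces the metastable decision for truly concurrent requests by a random choice and does not model how long that decision takes.

```python
import random


class Mutex:
    """Behavioral two-input MUTEX: at most one grant is active at any time."""

    def __init__(self, rng=None):
        self.owner = None                     # index of the granted side, or None
        self.rng = rng or random.Random()

    def set_requests(self, r1, r2):
        req = (bool(r1), bool(r2))
        # The grant is withdrawn once its request has been withdrawn.
        if self.owner is not None and not req[self.owner]:
            self.owner = None
        if self.owner is None:
            pending = [i for i in (0, 1) if req[i]]
            if len(pending) == 1:
                self.owner = pending[0]       # first requester wins
            elif len(pending) == 2:
                # Concurrent requests: stands in for metastability resolution.
                self.owner = self.rng.choice(pending)
        return (self.owner == 0, self.owner == 1)
```

A request that arrives while the other side holds the grant simply waits until that side withdraws its request.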
MUTEX: Circuit
[Figure: SR-latch with inputs r1, r2 and internal outputs g1', g2', followed by inverters producing g1, g2; waveforms of Vout,latch (with levels Vmeta and Vth,inv) and Vout,inv over time t]
"Metastability filter": e.g., low-threshold inverter
BUT: Doesn't a low-threshold inverter produce glitches?
[from D. J. Kinniment "Synchronization and Arbitration in Digital Systems", Wiley]
Popular MUTEX Implementation
[Figure: SR-latch followed by a metastability filter]
C.L. Seitz, Ideas about arbiters, Lambda, 1 (first quarter):10–14, 1980.
MUTEX: Operation
[Figure: MUTEX circuit (r1, r2, g1', g2', g1, g2) with waveform of Vout,FF relative to Vmeta and Vth,inv over time t; timing diagram of r1, r2, g1, g2 for the 4-phase protocol]
MUTEX vs. Synchronizer
Synchronizer
  - purpose: synchronize asynchronous input
  - important: fast resolution in both directions
  - freedom: final decision not important
  - circuit: flip flop (special design)
MUTEX
  - purpose: serialize concurrent requests
  - important: never activate both grants
  - freedom: infinite resolution time
  - circuit: SR-latch plus metastability filter
Arbiter: Principle
purpose:
  - manage access of clients to shared resource(s)
method:
  - handle pairs of request_in / grant_out
      - on the client side
      - on the resource side
  - client requests may arrive in any order
  - arbiter must assign one resource to only one client at a time (respond to the first requester)
=> needs Mutual Exclusion (MUTEX)
Arbiter: Function
[Figure: arbiter between Client 1 (C1r, C1g), Client 2 (C2r, C2g) and Common Resource 1 (R1r, R1g)]
can have more than two clients: "multiway arbiter"
can have more than one resource
PhD Naqvi: Fault Tolerant NoC (incl. Arbiter)
Arbiter: Operation
[Figure: timing diagram of C1r, C2r, R1r, R1g, C1g, C2g]
Arbiter: Circuit
[Figure: arbiter circuit built from a MUTEX (r1, r2, g1, g2) and two C-elements (C), connecting client 1 (C1r, C1g) and client 2 (C2r, C2g) to the common resource (R1r, R1g)]
Annotations:
  - allow one request at a time only
  - merge requests
  - delay request until previous cycle finished
  - relay grant to requester
  - keep grant alive until resource disables it
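The annotated functions of the arbiter circuit can be tried out in a behavioral model. The sketch below assumes one commonly published structure (MUTEX, two AND gates with an inverted input from the opposite grant, an OR gate, two C-elements); whether the lecture's figure uses exactly this gate arrangement is an assumption. The class and its tie-breaking rule for simultaneous requests are simplifications for illustration.

```python
class Arbiter:
    """Behavioral two-client arbiter: MUTEX, two AND gates, OR gate, two C-elements."""

    def __init__(self):
        self.owner = None              # MUTEX state: client holding g1/g2, or None
        self.c1g = self.c2g = False    # C-element outputs (grants to the clients)
        self.r1r = False               # merged request towards the resource

    @staticmethod
    def _c(a, b, y):
        """Two-input C-element: follow the inputs if they agree, else hold y."""
        return bool(a) if bool(a) == bool(b) else y

    def step(self, c1r, c2r, r1g):
        # MUTEX: allow one request at a time only.
        req = (bool(c1r), bool(c2r))
        if self.owner is not None and not req[self.owner]:
            self.owner = None
        if self.owner is None:
            if req[0]:
                self.owner = 0         # tie broken for client 1 (model simplification)
            elif req[1]:
                self.owner = 1
        g1, g2 = self.owner == 0, self.owner == 1
        while True:                    # iterate until the gate outputs are stable
            # AND gates: delay a request until the previous cycle has finished.
            y1 = g1 and not self.c2g
            y2 = g2 and not self.c1g
            # C-elements: relay the grant and keep it until the resource drops it.
            new = (self._c(y1, r1g, self.c1g), self._c(y2, r1g, self.c2g))
            if new == (self.c1g, self.c2g):
                break
            self.c1g, self.c2g = new
        self.r1r = y1 or y2            # OR gate: merge requests
        return self.r1r, self.c1g, self.c2g
```

Calling `step(c1r, c2r, r1g)` with a four-phase sequence shows that the second client's request is forwarded to the resource only after the first client's grant has been released.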
Tree Arbiter
[Figure: two-level tree of arbiters; Clients 1 to 4 connect in pairs (C1r/C1g, C2r/C2g) to first-level arbiters, whose resource ports (R1r/R1g) feed a second-level arbiter in front of the common resource]
can add further tree levels to handle more clients
Data-Driven Clocking
Principle:
  - as soon as new data arrive => start clocking
  - determine number k of clock cycles required to process new data
  - stop clocking after k cycles, wait for next data
Properties:
  - need to switch clock on and off => beware spurious clock pulses!
  - no metastability problem: data stable as soon as consumer clock starts
  - potential for power saving
  - useful for very specific applications only (no pipe!)
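The principle (start on arrival, run k cycles, stop) can be mimicked in software with a generator that emits one entry per clock pulse. This is a sketch of the control flow only; the function name and the `cycles_needed` callback are hypothetical, and real clock gating issues such as spurious pulses are not modelled.

```python
def data_driven_clock(arrivals, cycles_needed):
    """Yield (data, cycle_index) once per generated clock cycle.

    arrivals      -- iterable of incoming data items
    cycles_needed -- function returning the number k of cycles for an item
    """
    for item in arrivals:              # new data arrive => start clocking
        k = cycles_needed(item)        # determine number k of required cycles
        for cycle in range(k):
            yield item, cycle          # one clock pulse while processing item
        # clock is stopped here until the next item arrives
```

For example, `list(data_driven_clock(["a", "bb"], len))` produces one pulse for the first item and two for the second.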
Data-Driven Clock: Circuit
[Figure: oscillator loop with delay elements (D) generating CLK out; waveform of CLK out]
→ CLK half period determined by D
Data-Driven Clock: Circuit
[Figure: clock generator with C-element (C), delay elements (D), REQ input and ACK output, producing CLK out; timing diagram of CLK out, REQ, ACK]
→ transition on REQ answered by transition on CLK out
→ min CLK half period determined by D
Pausable Clocking
Principle:
  - producer requests consumer's clock to pause
  - data are provided to input register during idle time
  - consumer's clock may resume
      - free running ("pausable clock")
      - with one cycle only ("stoppable clock")
Properties:
  - need to switch clock on and off => beware spurious clock pulses! => beware of clock tree delays!
  - producer controls consumer's clock (blocking!)
  - applications must be able to cope with paused clock
Pausable Clock: Circuit
[Figure: clock generator with C-element (C), delay elements (D), REQ and ACK, producing CLK out; timing diagram of CLK out, REQ, ACK]
→ inverter generates next REQ from ACK
→ self-oscillation
Pausable Clock: Circuit
[Figure: clock generator with C-element (C), delay elements (D) and a Mutex with external REQ' / ACK', producing CLK out; timing diagram of CLK out, REQ', ACK']
→ external unit can safely stop CLK by activating REQ'
→ … and gets ACK' as a response
Pausable Clock: n Clients
[Figure: clock generator with C-element (C), delay element (D), Mutex and Arbiter serving REQ1/ACK1 and REQ2/ACK2, producing CLK out]
→ for more external sources an arbiter can be added before the Mutex
→ the two inverters can be eliminated by using a Muller C-Element with inverting output
R. Mullins and S. Moore, "Demystifying Data-Driven and Pausible Clocking Schemes", Proc. 13th Intl. Symp. on Advanced Research in Asynchronous Circuits and Systems (ASYNC), 2007, pp. 175–185
Conventional TMR
Advantages:
  - mask all single faults
Drawbacks:
  - single clock source
  - no recovery
GALS-TMR
→ use independent clock => avoid single point of failure
→ cannot do concurrent voting, since operation not in sync
→ use voting over FF state at predefined intervals instead
PhD Lechner: Fault Tolerant GALS Architecture
GALS-TMR Details
every nth clock cycle
  - stop own clock
  - synchronize with others
  - perform recovery step
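Voting over the flip-flop state and writing the result back can be illustrated with a bitwise 2-of-3 majority. The sketch below is a simplification under assumptions: the state of each replica is represented as an integer, all three replicas advance in lockstep inside the loop (the real replicas run on independent clocks and only meet at the voting point), and the function names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three state words."""
    return (a & b) | (a & c) | (b & c)


def recovery_step(states):
    """Vote over the three replicas' state and write the result back to all."""
    voted = majority(*states)
    return [voted, voted, voted]


def run(states, step, n, cycles):
    """Advance all replicas; every n-th cycle pause, synchronize and recover."""
    for cycle in range(1, cycles + 1):
        states = [step(s) for s in states]
        if cycle % n == 0:             # clocks stopped, replicas synchronized
            states = recovery_step(states)
    return states
```

A replica whose state was corrupted between two voting points is overwritten with the majority value at the next recovery step.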
Pausable Clock vs. Crystal
pros:
  - cheap to implement internally
  - no extra pins
  - no mechanical issues (acceleration)
  - stoppable
cons:
  - arbiter is no standard cell
  - frequency is not as stable (PVT)
  - frequency is not as high
  - lacking tool support
Summary (1)

The generally used MTBU formula does not assume
any knowledge about the input signal and its relation
to the clock. In practice, such knowledge can often
be exploited to optimize the synchronizer.

Synchrony is not a binary property: there is a range of globally synchronous, mesochronous, plesiochronous and heterochronous systems.

Asynchronous systems are tolerant against delays, while synchronous systems are not. The GALS approach therefore makes long-distance communication asynchronous, while retaining the efficient and well-proven synchronous paradigm for locally restricted islands.
Summary (2)

GALS allows choosing the most appropriate clock for
each island.

Communication in GALS can be based on synchronizers, shared memory, FIFO or pausable clocking.

A data-driven clock is activated only on demand, when data arrive to be processed.

A pausable clock can be stopped on demand. This is
useful in GALS when moving data from one domain
to the other, as it confines the potential for
metastability to the arbiter.

Even a fault-tolerant TMR solution based on
pausable clocks can be implemented that avoids the
clock source as a single point of failure.
Summary (3)

For the Muller C-Element, if both inputs match, the output will assume the same value; otherwise it holds its previous value.

The purpose of a MUTEX element is to select one among two (or more) possibly concurrent client requests. It may remain undecided for an arbitrary time, but never selects more than one client.

The purpose of an arbiter is to grant access to one (or
more) resource(s) shared between two (or more)
clients. Again access must be granted to one client at a
time only.