
Synchronization Ideas
Charles E. Dike
Intel Corporation
R
®
Charles Dike
1
Introduction
• Tutorial
• Share some ideas about synchronization and metastability
• Introduce a NEW, IMPROVED theory of metastability
• Charles Dike ([email protected])
Why and where synchronize?
• Reduce latency between independent clock domains.
• Asynchronous domain to synchronous clock.
• Synchronous clock to an independent synchronous clock.
• Benefit: higher performance in critical circuits.
[Figure: example clock domains: synchronous clocks at 1.5 GHz and 3.0 GHz, an asynchronous circuit, and a pausable clock at 1.8 GHz.]
Design Direction
[Figure: design direction by decade: the 80s towards 100 MHz, the 90s towards 1 GHz, and the 00s multi-GHz, each drawn as FPU, MEM, and ALU blocks, with "value added" marked.]
Chip Area Networks
[Figure: late 00s, multi-GHz.]
I believe…
• We must be able to synchronize all domains to a PLL-controlled clock
• Interconnect on chip will be asynchronous (GALS: globally asynchronous, locally synchronous)
• We need to minimize latency
• There will be two basic synchronizer uses: near neighbor and the chip net
Topics of Discussion
• Generic synchronizer of the type used in the TeraFlops computer
• Simple synchronizer of the type used in StrongARM
• The Myrinet pipeline synchronization scheme
• Latest understanding of metastability
Generic Synchronizer
• Handles self-timed to synchronous interfaces and vice versa
• Supports synchronous to synchronous interfaces
• Can handle streaming data
• Adaptable to any speed range
• Possibly used over the chip network
Two-flop synch
[Figure: two-flop synchronizer: VALID enters flop #1 (D to Q), whose output feeds flop #2; both flops are clocked by CLK.]
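A cycle-level sketch of the two-flop synchronizer, assuming ideal flops that always resolve (metastability itself is not modeled): both flops load on the same CLK edge, so a VALID level captured by flop #1 reaches the output of flop #2 one clock later. That extra clock period is the resolution time flop #1 is given. The function name `two_flop_sync` is hypothetical.

```python
def two_flop_sync(valid_samples):
    """Return the output of flop #2 after each CLK edge.

    valid_samples[i] is the VALID level present at flop #1's D input at edge i.
    Ideal flops are assumed: metastability is not modeled.
    """
    q1 = q2 = 0
    outputs = []
    for d in valid_samples:
        # Both flops load on the same edge: #2 takes #1's old value, #1 takes VALID.
        q1, q2 = d, q1
        outputs.append(q2)
    return outputs


print(two_flop_sync([0, 1, 1, 1, 0, 0]))  # [0, 0, 1, 1, 1, 0]
```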
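Single latch synch
[Figure: single-latch synchronizer: an S/R latch (S = Write Valid, R = Read Valid) between the sender clock (CLK1) and receiver clock (CLK2) domains, with D flip-flops on the REQ and ACK paths; a timing diagram shows SENDER CLOCK, LATCH OUTPUT, and RECEIVER CLOCK.]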
Multi latch synch
[Figure: multi-latch synchronizer: two copies of the single-latch structure, each an S/R latch (Write Valid, Read Valid) with REQ/ACK flip-flops clocked by CLK1 and CLK2.]
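General Case
[Figure: general-case FIFO synchronizer on a ten-entry ring. A one-hot WRITE POINTER (example 0000010000) and a one-hot READ POINTER (example 1000000000) index a STATUS REGISTER (example 1111100000) whose bits mark filled entries; the bits pass through SYNCHRONIZERS before generating FULL and EMPTY, adding sync padding latency. Signals: Write Clock, Read Clock, Write Enable, EN.]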
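A behavioral sketch of the general case, assuming a ring of entries indexed by one-hot write and read pointers with one status bit per entry. FULL and EMPTY are read from the status bit that each pointer selects. In hardware each status bit would also pass through a synchronizer before use in the other clock domain (the sync padding latency); that delay is not modeled here. The class and method names are hypothetical.

```python
class StatusRingFifo:
    """Ring FIFO with one-hot write/read pointers and a per-entry status register.

    Hypothetical sketch: the synchronizer latency on each status bit is not modeled.
    """

    def __init__(self, depth):
        self.depth = depth
        self.data = [None] * depth
        self.status = [0] * depth  # 1 = entry holds data not yet read
        self.wr = 0                # position of the hot bit in the write pointer
        self.rd = 0                # position of the hot bit in the read pointer

    @staticmethod
    def _one_hot(position, depth):
        return [1 if i == position else 0 for i in range(depth)]

    def write_pointer(self):
        return self._one_hot(self.wr, self.depth)

    def read_pointer(self):
        return self._one_hot(self.rd, self.depth)

    def full(self):
        # FULL: the entry selected by the write pointer is still occupied.
        return self.status[self.wr] == 1

    def empty(self):
        # EMPTY: the entry selected by the read pointer holds no data.
        return self.status[self.rd] == 0

    def write(self, value):
        if self.full():
            raise OverflowError("FIFO full")
        self.data[self.wr] = value
        self.status[self.wr] = 1
        self.wr = (self.wr + 1) % self.depth  # rotate the one-hot write pointer

    def read(self):
        if self.empty():
            raise IndexError("FIFO empty")
        value = self.data[self.rd]
        self.status[self.rd] = 0
        self.rd = (self.rd + 1) % self.depth  # rotate the one-hot read pointer
        return value


fifo = StatusRingFifo(10)
for item in range(5):
    fifo.write(item)
print(fifo.write_pointer())  # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(fifo.status)           # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(fifo.read_pointer())   # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

With ten entries and five writes, the pointer and status patterns match the example bit patterns in the figure.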
Empty case
[Figure: empty-case detail: WRITE POINTER, READ POINTER, and STATUS REGISTER, showing Write Pointer a/b, Write Enable, and Write Clock on the write side; Read Pointer a/b and Read Clock on the read side; flip-flops with enable (EN) and reset (R) inputs; a SYNCHRONIZER; and the EMPTY output.]
Topics of Discussion
• Generic synchronizer of the type used in the TeraFlops computer
• Simple synchronizer of the type used in the StrongARM microprocessor
• The Myrinet pipeline synchronization scheme
• Latest understanding of metastability
Simple Synchronizer
• Constrained by frequency ratio
• Supports synchronous to synchronous interfaces
• Does it support asynch to synch? Yes, with restrictions.
• Possibly used in local neighbor synchronizers
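The frequency-ratio constraint can be illustrated with a small sketch, assuming both clocks are exactly periodic, rise together at t = 0, and that slow-domain data changes on slow rising edges. Because the ratio is fixed, the fast-clock edges that fall too close to a slow edge can be computed in advance instead of being resolved at run time. This is a generic illustration, not the StrongARM circuit; `safe_fast_edges` and its keep-out `window` parameter are hypothetical.

```python
def safe_fast_edges(t_fast, t_slow, window, n_edges):
    """Indices of fast-clock edges at least `window` away from every slow rising edge.

    Assumes exactly periodic clocks that both rise at t = 0; times are integers
    (picoseconds) to avoid floating-point rounding.
    """
    safe = []
    for i in range(n_edges):
        phase = (i * t_fast) % t_slow          # time since the last slow edge
        distance = min(phase, t_slow - phase)  # distance to the nearest slow edge
        if distance >= window:
            safe.append(i)
    return safe


# Hypothetical 3:2 ratio: fast period 200 ps, slow period 300 ps, 50 ps keep-out window.
print(safe_fast_edges(200, 300, 50, 6))  # [1, 2, 4, 5]
```

Edges 0 and 3 coincide with slow edges, so a transfer would be scheduled only on the others; the pattern repeats every least common multiple of the two periods.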
Simple Synchronizer
[Figure: simple synchronizer circuit: flops A, A1, A2, and A3 (nodes w, x, y, z) between SLOW CLK and FAST CLK, a divide-by-2 stage, and the SYNC output. MI* = Metastable Immune.]
timing1
[Figure: timing of the simple synchronizer circuit over FAST CLOCK cycles 1 to 6, showing SLOW CLOCK, A, A1, A2, A3, and SYNC.]
timing2
[Figure: timing of the simple synchronizer circuit over FAST CLOCK cycles 1 to 6, showing SLOW CLOCK, SYNC, and CHEATER CLOCK.]
timing3
[Figure: a further timing case over FAST CLOCK cycles 1 to 6, showing SLOW CLOCK, SYNC, and CHEATER CLOCK.]
timing4
[Figure: two copies of the simple synchronizer circuit, with timing over FAST CLOCK cycles 1 to 6 showing SLOW CLOCK with its SYNC and SLOW CLOCK# with its SYNC.]
transfers
[Figure: slow-to-fast and fast-to-slow transfers: timing over FAST CLOCK cycles 1 to 6 with SLOW CLOCK, SYNC, and CHEATER CLOCK, and the flip-flop pairs (D, Q) used in each direction.]
Topics of Discussion
• Generic synchronizer of the type used in the TeraFlops computer
• Simple synchronizer of the type used in StrongARM
• The Myrinet pipeline synchronization scheme
• Latest understanding of metastability
Pipeline Synchronizer
• Supports synchronous to synchronous interfaces
• Supports asynch to synch and vice versa
• Possibly used in local neighbor synchronizers
• Essentially a distributed FIFO and synchronizer
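A simplified model of how the pipeline scheme can trade latency for reliability, assuming each stage gives its synchronizing element roughly half a clock period (one f0 or f1 phase) to resolve, and plugging the accumulated time into the basic MTBF formula MTBF = e^{t/τ}/(Tw fd fc). This is an assumption-laden sketch, not Seizovic's analysis; the function names and parameter values are hypothetical.

```python
import math


def pipeline_resolution_time(n_stages, clock_period):
    # Simplifying assumption: each stage adds half a clock period (one f0/f1 phase)
    # of resolution time for its synchronizing element.
    return n_stages * clock_period / 2


def pipeline_mtbf(n_stages, clock_period, tau, t_w, f_d, f_c):
    """MTBF = e^(t/tau) / (Tw*fd*fc), with t taken from the stage count."""
    t = pipeline_resolution_time(n_stages, clock_period)
    return math.exp(t / tau) / (t_w * f_d * f_c)


# Hypothetical values: 1 GHz clock, tau = 20 ps, Tw = 15 ps, 100 MHz data rate.
for n in (1, 2, 4):
    latency = pipeline_resolution_time(n, 1e-9)
    print(n, f"latency {latency * 1e9:.1f} ns", f"MTBF {pipeline_mtbf(n, 1e-9, 20e-12, 15e-12, 1e8, 1e9):.3g} s")
```

Under this model each added stage multiplies the MTBF by e^{T/(2τ)} and adds half a period of latency, while a new item can still enter every cycle.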
Pipeline Synchronizer
[Figure: pipeline synchronizer: a chain of FIFO stages with handshake ports Ri/Ro and Ai/Ao and data ports Di/Do, with synchronizing elements S on clock phases f0 and f1.]
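ME element
[Figure: the synchronizing element S: an ME (mutual exclusion) element with request/acknowledge pairs R0/A0 and R1/A1, inputs REQ and f0, and output X.]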
FIFO element
[Figure: FIFO element: C-elements on the handshake signals (Ri, Ro, Ai, Ao) control the data path from Di to Do.]
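The FIFO element's control can be sketched in the style of Sutherland's two-phase micropipelines (an assumption about the drawing): each stage's Muller C-element combines the request from the left with the inverted acknowledge from the right, and an item is held wherever adjacent control wires differ. Data latches are not modeled; `c_element`, `settle`, and `occupancy` are hypothetical helpers.

```python
def c_element(a, b, prev):
    """Muller C-element: the output follows the inputs when they agree, else holds."""
    return a if a == b else prev


def settle(c, r_in, a_out, max_passes=100):
    """Settle a two-phase control chain; c[i] is stage i's C-element output (its Ro)."""
    c = list(c)
    n = len(c)
    for _ in range(max_passes):
        changed = False
        for i in range(n):
            left = r_in if i == 0 else c[i - 1]         # request from the left
            right = a_out if i == n - 1 else c[i + 1]   # acknowledge from the right
            new = c_element(left, 1 - right, c[i])      # acknowledge is inverted
            if new != c[i]:
                c[i] = new
                changed = True
        if not changed:
            return c
    raise RuntimeError("control chain did not settle")


def occupancy(c, a_out):
    """Items held: adjacent control wires that differ (two-phase signaling)."""
    wires = list(c) + [a_out]
    return sum(wires[i] != wires[i + 1] for i in range(len(c)))


state = settle([0, 0, 0, 0], r_in=1, a_out=0)   # first item enters
print(state, occupancy(state, 0))               # [1, 1, 1, 1] 1
state = settle(state, r_in=0, a_out=0)          # second item enters
print(state, occupancy(state, 0))               # [0, 0, 0, 1] 2
```

Each toggle of the sender's request adds an item that moves toward the output; toggling the receiver's acknowledge removes one.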
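Async to sync
[Figure: asynchronous-to-synchronous interface: a pipeline synchronizer chain (stages with Ri/Ro, Ai/Ao, Di/Do and synchronizing elements S on clock phases f0 and f1) between the asynchronous side and the synchronous side.]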
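Sync to async
[Figure: synchronous-to-asynchronous interface: the mirror image of the async-to-sync chain, with stages (Ri/Ro, Ai/Ao, Di/Do) and synchronizing elements S on clock phases f0 and f1 between the synchronous side and the asynchronous side.]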
Points to ponder #1
• All synchronizing interfaces have one thing in common: a latching element that holds data while metastabilities are being resolved.
• There is no way to avoid the latency required to resolve metastabilities.
• To minimize latency, the characteristics of the latching element can be improved.
• We will be required to understand and use this knowledge. This is the future of digital design.
Topics of Discussion
• Generic synchronizer of the type used in the TeraFlops computer
• Simple synchronizer of the type used in StrongARM
• The Myrinet pipeline synchronization scheme
• Latest understanding of metastability
Role of the Synchronizing Flop
• Reorients incoming information to a clock edge
• Its performance determines system failure rate or latency
Real Life
• There is no magic bullet
• There is a lot of misinformation about metastability
• To date, many circuits have been overdesigned, through planning and luck
• Whenever a circuit fails because its frequency is too high, the ultimate cause of failure is metastability
• There is no way to synchronize a signal in less than about the time it takes to pass a signal through six static gates
Metastability is…
[Figure: latch with SET and RESET inputs, internal nodes NODE A and NODE B, and OUT outputs.]
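Technical terms
• Tw (window size): a measure of the likelihood of entering a metastable state, in units of time
• Tau (τ): the rate at which metastability resolves, expressed as a time constant
• MTBF (Mean Time Between Failures): $\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$, where t is the time allowed for resolution, f_d is the data transition frequency, and f_c is the clock frequency
• Thermal noise: $\langle V_n^2 \rangle = 4kT/C$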
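The MTBF formula above can be evaluated directly. A minimal sketch with hypothetical parameter values (20 ps τ, 15 ps window, 1 GHz clock, 100 MHz data rate), chosen only for illustration; the function name `mtbf` is hypothetical.

```python
import math


def mtbf(t, tau, t_w, f_d, f_c):
    """MTBF = e^(t/tau) / (Tw * fd * fc); all quantities in SI units (s, Hz)."""
    return math.exp(t / tau) / (t_w * f_d * f_c)


# Hypothetical parameters, for illustration only.
tau = 20e-12   # resolution time constant, 20 ps
t_w = 15e-12   # window, 15 ps
f_c = 1e9      # clock frequency, 1 GHz
f_d = 1e8      # data transition rate, 100 MHz

for n_tau in (10, 20, 30):
    print(f"t = {n_tau} tau -> MTBF = {mtbf(n_tau * tau, tau, t_w, f_d, f_c):.3g} s")
```

Each additional τ of resolution time multiplies the MTBF by e, which is why synchronizer budgets are naturally counted in units of τ.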
Simple jamb latch
[Figure: jamb latch (DATA, CLOCK, RESET inputs; internal NODE A and NODE B; output OUT) with a plot of propagation delay versus Δ time of data after clock.]
Simple jamb latch
[Figure: the same jamb latch and plot of propagation delay versus Δ time of data after clock, annotated with the ~RC time constant.]
Rough Histogram
[Figure: propagation delay versus Δ time of data after clock (log scale); the window Tw is marked, and the slope is τ.]
$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$
Why is the theory a problem?
• It assumes a uniform distribution of data about the clock
– What happens when data always violates the setup/hold window?
• It is not detailed enough
– Doesn’t consider a deterministic region
– Doesn’t account for thermal noise
• People tend to extrapolate the theory improperly
$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$
Overview of refined theory
• Not everything past a normal propagation delay is a metastable event
• The Tw window can’t be improved by input edge rates
• Tw has a complex relationship to τ, depending on load
• The MTBF formula needs to be modified because data is not uniformly distributed about the clock input
Schematic
Simulation of a typical latching device
[Figure: window width (ps, log scale, 1 to 1000) versus propagation delay (ns, 0.1 to 0.35); fitted τ = 29.9 ps, Tw = 211.9 ps, normal propagation delay = 189.2 ps; points labeled 4.8 ps, 2.8 ps, 1.8 ps, and 0.8 ps.]
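Extracting τ and Tw from such a plot amounts to a straight-line fit on a log scale. A minimal sketch, assuming the window of data arrival times that yields a delay longer than d shrinks as w(d) = Tw·e^{-d/τ} (the exponential behind the basic MTBF formula). The data below are synthetic, generated from chosen values, not the simulated or measured results; `fit_tau_tw` is hypothetical, and the fitted Tw depends on where d is measured from.

```python
import math


def fit_tau_tw(delays, windows):
    """Least-squares fit of ln(w) = ln(Tw) - d/tau; returns (tau, Tw)."""
    n = len(delays)
    ys = [math.log(w) for w in windows]
    mean_x = sum(delays) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in delays)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(delays, ys))
    slope = sxy / sxx                  # equals -1/tau for exponential data
    intercept = mean_y - slope * mean_x
    return -1.0 / slope, math.exp(intercept)


# Synthetic data generated from tau = 30 ps, Tw = 200 ps (not measured values).
true_tau, true_tw = 30e-12, 200e-12
delays = [0.0, 50e-12, 100e-12, 150e-12]
windows = [true_tw * math.exp(-d / true_tau) for d in delays]
tau, tw = fit_tau_tw(delays, windows)
print(f"tau = {tau * 1e12:.1f} ps, Tw = {tw * 1e12:.1f} ps")  # tau = 30.0 ps, Tw = 200.0 ps
```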
Test case
[Figure: test setup: pulse generators #1 and #2 with delays, the device under test (D, R, Q), a TEK 11801-B oscilloscope with trigger and input connections, and a PC.]
Measuring real data
[Figure: measured data on a log scale (0.1 to 10^7) versus advancing time (-300 ps to +100 ps).]
Histogram
[Figure: histogram versus time, with the inflection point marked; scale 0.6 mV/0.1 ps.]
Measured versus Basic
[Figure: measured data (0.6 mV/0.1 ps) compared with the basic model: propagation delay versus Δ time of data after clock (log scale), with Tw marked and slope τ.]
$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$
τ Simulated…
[Figure: τ simulation setup: a battery and a voltage-controlled switch, with R1 = 100 Ω and R1 = 100 MΩ.]
Tau Simulated 2
[Figure: top, latch outputs at nodes 1 and 2 (volts, 0.0 to 1.5) versus time (1.0 to 1.4 ns); bottom, semilog difference between the latch outputs (volts, 10^-6 to 100) versus time, with times t1 and t2 marked.]
$\tau = \frac{|t_1 - t_2|}{\ln(V_2/V_1)}$
where V1 is the voltage at time t1 and V2 is the voltage at time t2.
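The two-point formula can be checked on a synthetic exponential divergence. A minimal sketch, assuming the latch output difference is still growing exponentially between t1 and t2 (t2 later than t1, so V2 > V1); the voltages are generated from a chosen τ, not taken from the simulation, and `tau_from_two_points` is hypothetical.

```python
import math


def tau_from_two_points(t1, v1, t2, v2):
    """tau = |t1 - t2| / ln(V2/V1), with V2 measured later than V1."""
    return abs(t1 - t2) / math.log(v2 / v1)


# Synthetic exponential divergence with tau = 20 ps (illustrative only).
tau_true = 20e-12
v0 = 1e-6                      # initial difference voltage, 1 uV
t1, t2 = 100e-12, 200e-12
v1 = v0 * math.exp(t1 / tau_true)
v2 = v0 * math.exp(t2 / tau_true)
print(f"{tau_from_two_points(t1, v1, t2, v2) * 1e12:.1f} ps")
```

Reading the two points off the straight part of the semilog plot keeps the estimate away from the region where the outputs saturate.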
$\langle V_n^2 \rangle = 4kT/C = 4kTBR$
k = 1.38 × 10^-23 J/K
τ = 20 picoseconds
B = 1/τ = 5 × 10^10 Hz
R ≈ 400 Ω
T = 300 K
V_n ≈ 0.6 mV
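A quick check of this arithmetic, using only the constants listed: with B = 1/τ and C = τ/R (from τ = RC), 4kTBR and 4kT/C are the same quantity. The helper name `thermal_noise_rms` is hypothetical.

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K, the value used above


def thermal_noise_rms(temp_k, bandwidth_hz, resistance_ohm):
    """RMS noise voltage from <Vn^2> = 4kTBR."""
    return math.sqrt(4 * K_BOLTZMANN * temp_k * bandwidth_hz * resistance_ohm)


tau = 20e-12                      # 20 ps
vn = thermal_noise_rms(300, 1 / tau, 400)
print(f"Vn = {vn * 1e3:.2f} mV")  # Vn = 0.58 mV

# Same result through 4kT/C, with C = tau / R.
c = tau / 400
print(math.isclose(vn ** 2, 4 * K_BOLTZMANN * 300 / c))  # True
```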
Putting it all together
[Figure, built up over panels A to E: a log scale from 0.18 fs to 1.80 ns plotted against time from -50 to 250 ps. A: normal propagation. B: a possible deterministic region ("deterministic ?"). C: the deterministic region, with a voltage scale in decades (1.80 V, 180 mV, 18.0 mV, 1.80 mV) and the thermal noise point marked. D: the true metastability region, with τ = 19 ps. E: Tw = 15 ps.]
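Simple case: $\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$
Worst case: $\mathrm{MTBF} = \frac{e^{(t-\mathrm{deter})/\tau}}{T_w f_d f_c}$
Expected: $\mathrm{MTBF} = \frac{e^{(t-0.5\,\mathrm{deter})/\tau}}{T_w f_d f_c}$
where deter is the length of the deterministic region.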
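The three MTBF cases differ only in how much of the resolution time t is lost to the deterministic region: none, all of it, or half of it. A minimal sketch with hypothetical parameter values; `mtbf_variants` is a hypothetical name.

```python
import math


def mtbf_variants(t, deter, tau, t_w, f_d, f_c):
    """Return (simple, worst, expected) MTBF; deter is the deterministic-region length."""
    denom = t_w * f_d * f_c
    simple = math.exp(t / tau) / denom
    worst = math.exp((t - deter) / tau) / denom           # whole region lost
    expected = math.exp((t - 0.5 * deter) / tau) / denom  # half the region lost
    return simple, worst, expected


# Hypothetical values: 200 ps resolution time, 40 ps deterministic region, tau = 20 ps.
simple, worst, expected = mtbf_variants(200e-12, 40e-12, 20e-12, 15e-12, 1e8, 1e9)
print(simple / worst, simple / expected)  # approximately e**2 and e**1
```

A deterministic region of length deter therefore costs a factor of e^{deter/τ} in the worst case and e^{deter/(2τ)} in the expected case.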
Points to ponder #2
Jakov Seizovic postulated a “malicious” asynchronous signal: no matter how we position the sampling window, and no matter how small we make the sampling window, the asynchronous transition will appear in that window.
This case has to be assumed when interfacing to a signal of unknown probability distribution.
We know something about just how malicious a signal can be.
Exploring
Worst case bound
Not worst case bound
[Figure: uniform distribution with 12 ps jitter; annotation < 0.1 ps.]
Final comments
• With the proper synchronizing device it may be possible to synchronize a signal within a single clock cycle. The constraints are:
– You require about 35 τ to get the MTBF out to about one century.
– Each typical static gate delay is equivalent to about 5 τ in a properly designed synchronizing flop.
– The metastability MTBF of a device should probably be an order of magnitude better than its mechanical MTBF.
– You must assume a ‘malicious’ input to the synchronizer. Nevertheless, this only adds about 5 τ to the delay.
– Standard flop designs are generally very poor synchronizers. Use a jamb structure; it has the best transconductance.
– You should never require more than two synchronizing flops in series.
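These rules of thumb can be combined into a small calculator, assuming the basic formula MTBF = e^{t/τ}/(Tw fd fc) and the figure of about 5 τ per static gate. Solving for t/τ gives ln(MTBF·Tw·fd·fc). The operating point and the function `resolution_budget` are hypothetical, so the printed numbers are illustrations, not the author's results.

```python
import math


def resolution_budget(target_mtbf_s, t_w, f_d, f_c, tau_per_gate=5.0, malicious_margin_tau=0.0):
    """Return (taus_needed, gate_delays_needed) for a target MTBF in seconds.

    Solves MTBF = e^(t/tau) / (Tw*fd*fc) for t/tau, adds an optional margin for a
    'malicious' input, and converts the result to static gate delays.
    """
    n_tau = math.log(target_mtbf_s * t_w * f_d * f_c) + malicious_margin_tau
    return n_tau, n_tau / tau_per_gate


CENTURY_S = 100 * 365.25 * 24 * 3600
# Hypothetical operating point: Tw = 15 ps, 100 MHz data rate, 1 GHz clock.
n_tau, gates = resolution_budget(CENTURY_S, 15e-12, 1e8, 1e9, malicious_margin_tau=5.0)
print(f"{n_tau:.1f} tau, about {gates:.1f} gate delays")
```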
Conclusion
• There are several ways to communicate between independent domains
• I believe more asynchronous domains will appear that are embedded within synchronous designs
– Latency must be reduced to maximize the use of asynchronous designs.
– This is a burden that asynch designers must bear.
– We need to know the limitations of synchronization and metastability.
• Chip area networks are coming, and they will open up opportunities for asynchronous design
References
• T. Sakurai, “Optimization of CMOS Arbiter and Synchronizer Circuits with Submicrometer MOSFET’s,” IEEE J. Solid-State Circuits, vol. 23, no. 4, pp. 901-906, Aug. 1988.
• L. Kleeman and A. Cantoni, “Metastable Behavior in Digital Systems,” IEEE Design & Test of Computers, pp. 4-19, Dec. 1987.
• I. E. Sutherland, “Micropipelines,” Turing Award Lecture, Communications of the ACM, vol. 32, no. 6, pp. 720-738, 1989.
• J. N. Seizovic, “Pipeline Synchronization,” Proc. Int’l Symp. Advanced Research in Asynchronous Circuits and Systems, CS Press, 1994.
• C. Dike and E. Burton, “Miller and Noise Effects in a Synchronizing Flip-Flop,” IEEE J. Solid-State Circuits, vol. 34, no. 6, pp. 849-855, June 1999.
• A. Van der Ziel, Noise in Measurements. New York: Wiley, 1976.
Overview of present theory
• Everything past a normal propagation delay is considered a metastable event
• A deterministic region doesn’t exist
• Tw has no fixed relationship to τ
• The MTBF formula assumes a uniform distribution of data about the clock input
$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$