Mobile System Considerations
for SDRAM Interface Trends
Andrew B. Kahng†‡, Vaishnav Srinivas‡¥
June 5th, 2011
CSE† and ECE‡ Departments
University of California, San Diego
Qualcomm Inc. ¥
Outline
• SDRAM Memory Interfaces: Today and Tomorrow
• Motivation
• Trends in DRAM Density and Data Rate
• Trends in Mobile Processor Requirements
• Memory Interface Calculator
• Exploration Using the Calculator
• Summary and Next Steps
(2/13)
SDRAM Memory Interfaces Today and Tomorrow
• Various interconnect and signaling options exist:
o Interconnect: Die stack/MCP, POP, DIMM, 3D-Stack
o Signaling: DDR, XDR, Serial, Wide IO
• Exploring these options against the primary bounds (Capacity, Throughput, Power and Latency) is required to make the correct tradeoffs
(3/13)
Motivation
• The memory interface calculator includes:
o IO switching, bias and termination power
o IO/PHY and interconnect latencies
o Input parameters for exploration:
• Termination values
• Loading
• Number of data and strobe pins
• Memory timing parameters
• IO/PHY “retiming” power
• Predict gaps between offerings and requirements
• Integrating into CACTI can help exploration of system metrics
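The IO switching power term listed above is commonly modeled with the textbook dynamic-power relation P = alpha * C * V^2 * f. The following is a minimal sketch assuming that relation. The function name and the example values (2 pF load, 0.5 activity factor, 1.0 V swing) are hypothetical and are not taken from the calculator.

```python
def switching_power_mw(activity, load_pf, swing_v, freq_mhz, num_pins=1):
    """Dynamic switching power in mW, assuming P = alpha * C * V^2 * f per pin."""
    load_farads = load_pf * 1e-12
    freq_hz = freq_mhz * 1e6
    watts = activity * load_farads * swing_v ** 2 * freq_hz * num_pins
    return watts * 1e3


# Hypothetical example: 32 data pins, 2 pF each, 1.0 V swing, 1000 MHz toggle rate.
print(switching_power_mw(activity=0.5, load_pf=2.0, swing_v=1.0,
                         freq_mhz=1000.0, num_pins=32))
```

Lower loading and lower swing both reduce this term, which is why loading and termination values appear among the exploration inputs.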
(4/13)
Trends in DRAM Capabilities
• DRAM densities to double every 3 years
• Projections for DRAM densities revised downwards over time
• Current densities at 4Gb/die
[Figure: DRAM Density (Gbits/die) by year, log scale from 1 to 1000 (Source: ITRS ITWG)]
[Figure: DRAM I/O Rate (Gb/s) by year (Source: ITRS ITWG)]
• DRAM data rates to double every 4-5 years
• Projections for DRAM data rates revised upwards over time
• Current data rates at 2.2 Gb/s
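The doubling periods quoted on this slide can be turned into rough projections. The sketch below assumes simple exponential growth from the quoted current values (4 Gb/die and 2.2 Gb/s). The function name is hypothetical, and treating the quoted values as the starting point is an assumption.

```python
def project(current_value, doubling_period_years, years_ahead):
    """Exponential projection: the value doubles every doubling_period_years."""
    return current_value * 2 ** (years_ahead / doubling_period_years)


# Starting points quoted on the slide (assumed to be the values at the time of the talk).
DENSITY_GBIT_PER_DIE = 4.0
DATA_RATE_GBPS = 2.2

for years_ahead in (3, 6, 9):
    density = project(DENSITY_GBIT_PER_DIE, 3, years_ahead)
    rate_fast = project(DATA_RATE_GBPS, 4, years_ahead)  # doubling every 4 years
    rate_slow = project(DATA_RATE_GBPS, 5, years_ahead)  # doubling every 5 years
    print(f"+{years_ahead} yr: {density:.0f} Gb/die, "
          f"{rate_slow:.1f} to {rate_fast:.1f} Gb/s")
```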
(5/13)
Trends in Mobile Processor Requirements
• Trends for mobile processor requirements
o Capacity to scale 3-4x every 3 years
o Throughput to double every 3 years
• The requirements are very dynamic!
• Quick exploration and projection for compatible memories is useful
Capacity Requirements in GB (Source: IDC)
Market | 2010 | 2011 | 2012 | 2013 | 2014
Desktop | 3.0 | 4.2 | 5.6 | 7.4 | 10.2
Laptop | 2.0 | 3.3 | 4.6 | 6.3 | 8.0
Mobile | 0.3 | 0.5 | 0.8 | 1.0 | 1.3
[Figure: Mobile Handset Throughput Requirements in GB/s, 2008 to 2018, y-axis from 0 to 20 (Source: Qualcomm)]
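The stated capacity trend can be checked against the Mobile row of the IDC capacity figures: from 2010 to 2013 the requirement goes from 0.3 GB to 1.0 GB, roughly 3.3x in 3 years. A short sketch of that arithmetic follows; the helper names are hypothetical.

```python
# Mobile capacity requirements in GB, from the IDC figures on this slide.
MOBILE_CAPACITY_GB = {2010: 0.3, 2011: 0.5, 2012: 0.8, 2013: 1.0, 2014: 1.3}


def scaling_factor(series, start_year, end_year):
    """Ratio of the end-year requirement to the start-year requirement."""
    return series[end_year] / series[start_year]


def annual_growth(series, start_year, end_year):
    """Compound annual growth rate implied by the two endpoints."""
    years = end_year - start_year
    return scaling_factor(series, start_year, end_year) ** (1 / years) - 1


print(scaling_factor(MOBILE_CAPACITY_GB, 2010, 2013))
print(annual_growth(MOBILE_CAPACITY_GB, 2010, 2013))
```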
(6/13)
Memory Interface Calculator
Primary Bound | Parameters affected
Capacity | Number of ranks and channels; Memory Density; Capacitive loading
Throughput | Data-rate, number of data lanes; Timing parameters; Signal Integrity skew and jitter
Power | Termination scheme; Supply voltage; Activity factor
Latency | Number of pipeline stages; Interconnect delay; Memory access time
(7/13)
Memory Interface Calculator Summary
Bound | LPDDR2 | TSS-Wide IO | DDR3 | Serial | Mobile-XDR
Clock Speed (MHz) | 300-533, DDR | 200-333, SDR | 400-800, DDR | 4-8 GHz, Serial | 400-533, Octal
Throughput (GB/s) | 3-4.3 | 12-24 | 6-13 | 12-17 | 12-17
Peak IO Power Efficiency (mW/GBps) | ~40 | ~10 | ~120 | ~60 | ~20
Peak Core Power Efficiency (mW/GBps) | ~50 | ~35 | ~100 | ~50 | ~50
Total Peak Power Efficiency (mW/GBps) | ~90 | ~45 | ~220 | ~110 | ~70
Active Idle IO Power (mW) | ~6-10 | ~2-4 | ~500-600 | ~450 | ~200
Active Idle Core Power (mW) | ~20 | ~20 | ~150 | ~20 | ~20
Capacity (GB) (Current trends) | 0.5-1 for x32 dual rank | 0.5-2 through multi-die stacking | 2-8 for dual-rank DIMM | 0.5-1 for x32 dual rank | 0.5-1 for x32 dual rank
Latency from MC-DRAM-MC | ~50ns | ~40ns | ~45ns, but penalty if DLL is off (~512 Tck) | ~65ns, PLL lock penalty if off | ~60ns, DLL penalty if off
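The power-efficiency rows of this summary are given in mW/GBps, so an approximate peak power follows from multiplying by throughput. The sketch below uses the approximate IO and core efficiency values and the upper end of each throughput range. The assignment of values to interfaces assumes the column order LPDDR2, TSS-Wide IO, DDR3, Serial, Mobile-XDR, and the function name is hypothetical.

```python
# (IO mW/GBps, core mW/GBps, upper-end throughput in GB/s), approximate slide values.
INTERFACES = {
    "LPDDR2": (40, 50, 4.3),
    "TSS-Wide IO": (10, 35, 24),
    "DDR3": (120, 100, 13),
    "Serial": (60, 50, 17),
    "Mobile-XDR": (20, 50, 17),
}


def peak_power_mw(io_eff, core_eff, throughput_gbytes_per_s):
    """Approximate peak power: total efficiency (IO + core) times throughput."""
    return (io_eff + core_eff) * throughput_gbytes_per_s


for name, (io_eff, core_eff, throughput) in INTERFACES.items():
    print(f"{name}: {peak_power_mw(io_eff, core_eff, throughput):.0f} mW "
          f"at {throughput} GB/s")
```

In each case the IO and core efficiencies add up to the total peak efficiency listed for that interface (for example, 40 + 50 = 90 for LPDDR2).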
(8/13)
Memory Interface Calculator Summary
• The spider chart highlights the design space covered
o Wide IO covers the largest space for lower capacities
o Large capacity systems still need DDR3/DDR4
• Alternatives to be explored outside the existing space?
[Figure: Memory Interface Design Space (spider chart). Axes: Throughput (2,25), Power Efficiency (0.002,0.04), Capacity (0,8), 1/Latency (0.01,0.04). Legend: Max, LPDDR2, DDR3, LPDDR3, DDR4, M-XDR, Serial, Wide IO]
[Figure: Throughput in GB/s by year, 2008 to 2018, y-axis from 0 to 30. Series: LPDDR2 (2 x32), WideIO (4 x128), Serial (x32), Mobile Req]
• Before LPDDR3 came up in JEDEC, Wide-IO and Serial Memory were being explored.
• LPDDR3 was brought up as a way to fill this gap in 2012-2014 timeframe
(9/13)
Exploration using the calculator
• How fast can LPDDR3 operate?
o With terminations?
o With DLL/better retiming?
o With lower loading?
o With better packaging?
o POP versus MCP
• Wide IO exploration?
o Transition to DDR for Wide IO?
o Number of data lanes per strobe – 8, 16 or 32?
o When do interface timing and signal/power integrity become an issue for Wide IO?
• High-capacity memory alternatives to DDR3/DDR4?
o MCP with larger number of wire-bonded dies?
o TSS with large number of stacks (8?)
o TSS-MCP if stacking with processor is a thermal risk?
(10/13)
LPDDR3 Exploration
Inputs to the calculator | Value | Units
Number of memories on data pin | 1 |
Number of memories on add pin | 1 |
Number of memories on clk pin | 1 |
Frequency of clock | 1250 | MHz
Retiming current | 25 | mA
Number of data pins | 32 |
Number of DQS pairs | 8 |
Termination RTT on DQ & DQS | 60 | ohms
Termination RTT on CA | 60 | ohms
Memory density for each memory core | 4 | Gb
TDS | 100 | ps
TDH | 100 | ps
TDQSQ | 100 | ps
TQHS | 100 | ps
Outputs of the calculator | Value | Units
Signal Swing on DQ&DQS, Vsw.DQ | 0.80 | V
Switching Power on DQ | 52.80 | mW
Switching Power on DQS | 52.80 | mW
Switching Power on CLK + CLK diff termination | 12.78 | mW
Bias and Static Power | 30.00 | mW
Signal Swing on CA, Vsw.DQ | 0.65 | V
Switching Power on CA | 19.24 | mW
Termination Power | 225.45 | mW
I/O power for CPU chip | 393.07 | mW
Throughput | 10 | GB/s
Capacity | 0.5 | GB
Latency | 38.6 | ns
Tskew | 41 | ps
Tjitter | 29 | ps
Terror | 20 | ps
Timing margin WRITE | 60 | ps
Timing margin READ | -5 | ps
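Several of the calculator outputs on this slide can be cross-checked from the inputs with simple arithmetic. The sketch below assumes double-data-rate transfers (two per clock, as in LPDDR3), 1 GB = 10^9 bytes for throughput, and that the I/O power for the CPU chip is the sum of the listed switching, bias and termination components. The function names are hypothetical.

```python
def throughput_gbytes_per_s(clock_mhz, data_pins, transfers_per_clock=2):
    """Peak throughput: clock rate x transfers per clock x data pins, in GB/s."""
    bits_per_second = clock_mhz * 1e6 * transfers_per_clock * data_pins
    return bits_per_second / 8 / 1e9


def capacity_gbytes(density_gbit, num_memories):
    """Capacity in GB from per-memory density in Gb."""
    return density_gbit * num_memories / 8


# Power components in mW, as listed in the calculator outputs.
POWER_COMPONENTS_MW = {
    "switching_dq": 52.80,
    "switching_dqs": 52.80,
    "switching_clk_and_diff_termination": 12.78,
    "bias_and_static": 30.00,
    "switching_ca": 19.24,
    "termination": 225.45,
}

print(throughput_gbytes_per_s(clock_mhz=1250, data_pins=32))  # slide lists 10 GB/s
print(capacity_gbytes(density_gbit=4, num_memories=1))        # slide lists 0.5 GB
print(round(sum(POWER_COMPONENTS_MW.values()), 2))            # slide lists 393.07 mW
```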
(11/13)
LPDDR3 Exploration
Maximum speeds for: | Preliminary Answers from the calculator
POP, Unterminated LPDDR3 with ~150ps memory timing parameters (tDS/tDH/tDQSQ/tQHS)? | 800MHz for single-rank; 800MHz for dual-rank will need careful architecture and design
POP, Terminated LPDDR3 with ~100ps memory timing parameters? | 1250MHz
External (MCP), Unterminated LPDDR3 | Even 533MHz for dual-rank is challenging and may need sophisticated retiming
External (MCP), Terminated LPDDR3? | 1066MHz
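One way to see why the memory timing parameters bound the clock rate: on a double-data-rate interface the unit interval (UI) is half the clock period, and setup and hold requirements consume part of it. The sketch below assumes the simplification that the budget left for skew and jitter is UI minus tDS minus tDH. The calculator's actual margin formula is not given in these slides, and the function names are hypothetical.

```python
def unit_interval_ps(clock_mhz, transfers_per_clock=2):
    """Unit interval in ps: clock period divided by transfers per clock."""
    clock_period_ps = 1e6 / clock_mhz
    return clock_period_ps / transfers_per_clock


def remaining_budget_ps(clock_mhz, t_setup_ps, t_hold_ps):
    """Simplified write budget left for skew and jitter (an assumption)."""
    return unit_interval_ps(clock_mhz) - t_setup_ps - t_hold_ps


# 800 MHz with ~150 ps tDS/tDH versus 1250 MHz with ~100 ps tDS/tDH.
print(remaining_budget_ps(800, 150, 150))
print(remaining_budget_ps(1250, 100, 100))
```

Under this simplification, 800 MHz with 150 ps parameters leaves 325 ps of the 625 ps UI, while 1250 MHz with 100 ps parameters leaves 200 ps of the 400 ps UI.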
(12/13)
Summary and Next Steps
• A simple framework to model interconnect and IO/PHY timing and power for existing and upcoming SDRAM memory interfaces
• Helps explore standards and design space
• Helps identify gaps between DRAM and SOCs
• Next Steps:
o Integrate the memory interface models within CACTI
o Challenge the calculator with future use cases for mobile products
o Include more parameters, including silicon area, packaging options and number of data lanes per strobe pin
(13/13)