Radiation Tolerant Intelligent Memory Stack (RTIMS)

Tak-kwong Ng, Jeffrey Herath
Electronics Systems Branch, Systems Engineering Directorate
NASA Langley Research Center (LaRC)
[email protected], [email protected]
757-864-1097 (Tak), 757-864-1098 (Jeff)

MAPLD 2005 / A208

Agenda

• What is it?
• Goals
• Components selection
• FPGA SEU mitigation
• XTMR tools
• Status
• Future work
• Points to ponder

What is it?

• Radiation tolerant
  – Use commercial-off-the-shelf (COTS) components
    • Reprogrammable FPGA
    • High performance
    • Lower cost
  – Pick parts with applicable mitigation techniques
    • Shielding, over-current protection, triple module redundancy, FPGA configuration scrubbing
• Intelligent
  – Reprogrammable FPGA
    • SDRAM controller
    • Capacity to add custom logic
• Memory
  – Large capacity
    • SDRAM
• Stack
  – 3D vs 2D, board space saving

Goals

• Large memory capacity
  – 256 MB EDAC
• Single +3.3V power supply
• Simple interface, LVTTL compatible
• Throughput
  – 32 MWord write
  – 16 MWord read
• Reprogram via the JTAG interface
• Spare FPGA gate capacity for user application
• Radiation characteristics
  – Total ionizing dose of 100 krad (Si) at 25 °C
  – SEU: best practice
  – SEL of 60 MeV-cm²/mg requirement
• Operating temperature: -40 °C / +85 °C

Components Selection (1/3)

• FPGA
  – Reprogrammable
  – Xilinx Virtex, Virtex-II
    • XQR2V1000
      – Total ionizing dose of 200 krad (Si) (data sheet)
      – SEL of 160 MeV-cm²/mg (data sheet)
      – Current limiters
    • Limited SEFI
      – POR, SelectMAP, JTAG
      – 1.5E-6 upsets/device/day (data sheet)
    • SOFT
      – Mitigation techniques: TMR, configuration scrubbing
  – XQ2V1000-4BG575
    • Military version for lower cost
      – SEL may not be as good as XQR2V1000
      – SEL of 124 MeV-cm²/mg
    • Capacity of 1 M gates
    • 328 signal I/Os

Components Selection (2/3)

• EEPROM
  – Xilinx XQR18V04
    • Total ionizing dose of 10 krad (Si) (data sheet)
      – 30 krad (Si) for read only (data sheet)
    • SEL of 120 MeV-cm²/mg (data sheet)
    • SEU of 120 MeV-cm²/mg (data sheet)
• SDRAM
  – Elpida EDS5108ABTA (512Mb)
    • Total ionizing dose of 50 krad (Si)
    • SEL of 80 MeV-cm²/mg at 85 °C, 100 °C, 125 °C
    • SEU
      – Bit error rate of 6.96E-12 errors/bit-day
      – SEFI error rate of 1.3E-4 errors/device-day
• Linear regulator
  – Texas Instruments TPS75715 (1.5V LDO regulator)
    • Total ionizing dose of 10 krad (Si)
    • SEL of 60 MeV-cm²/mg
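To put the SDRAM figures on a per-device footing, the per-bit rate can be scaled by the device size. The Python sketch below is only an illustration: it assumes that 512 Mb means 512 x 2^20 bits, and the variable names are not from the slides.

```python
BITS_PER_DEVICE = 512 * 2**20   # assumption: 512 Mb = 512 x 2^20 bits
BIT_ERROR_RATE = 6.96e-12       # errors/bit-day, from the slide
SEFI_RATE = 1.3e-4              # errors/device-day, from the slide

# Expected bit upsets per SDRAM device per day
upsets_per_device_day = BITS_PER_DEVICE * BIT_ERROR_RATE

# Mean number of days between events for a single device
days_between_upsets = 1 / upsets_per_device_day
days_between_sefi = 1 / SEFI_RATE

print(f"{upsets_per_device_day:.2e} bit upsets/device-day")
print(f"~{days_between_upsets:.0f} days between bit upsets")
print(f"~{days_between_sefi:.0f} days between SEFIs")
```

Under these assumptions, bit upsets in one device would be a few tens of times more frequent than SEFIs, which is consistent with handling bit upsets by EDAC/TMR and scrubbing while treating SEFI separately.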

Components Selection (3/3)

• Current limiters
  – Maxim-IC MAX893L (1.2A), MAX891L (0.5A)
    • Total ionizing dose SEL of 30 krad (Si)
• Power-On-Reset circuit
  – Maxim-IC MAX803
    • Total ionizing dose of 20 krad (Si)
• Stacking technology
  – Provided by 3D Plus

Radiation Mitigation

• Total ionizing dose
  – Local shielding
  – Package shielding, thickness depends on requirement
• SEL
  – Current limiting device
• SEU
  – Memory contents
    • TMR, EDAC
  – FPGA SEU
    • Configuration scrubbing, TMR
• SEFI
  – Best effort to minimize the SEFI rate
  – Mitigate at higher level
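The TMR entries above rest on 2-of-3 majority voting across three copies of a value. A minimal Python sketch of a bitwise voter follows; the function name `vote` and the 32-bit example word are illustrative assumptions, not taken from the slides.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (b & c) | (a & c)


word = 0xDEADBEEF
upset = word ^ (1 << 7)          # simulate a single-event upset in one copy
recovered = vote(word, upset, word)
```

A single upset in any one copy is masked. Two upsets in the same bit position of different copies are not, which is why the stored copies also need periodic scrubbing.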

Block Diagram

FPGA SEU Mitigation (1/5)

• Input
  – Xilinx recommendation
    • Use 3 pins per signal, connected on the board
    • Bus signals: use one pin per signal, add EDAC, save pins
      – The sending side must generate EDAC check bits
    • Pins can be used up quickly
  – Implementation
    • Module interface
      – Use 3 pins per signal for address/controls
      – Use 1 pin per signal for Din
        • EDAC is optional
        • Single point failure rate increases without EDAC
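The slides do not say which EDAC code protects the single-pin data bus. As a stand-in, the sketch below uses a textbook Hamming(7,4) single-error-correcting code to show what "the sending side must generate EDAC check bits" involves. The function names and the 4-bit word size are assumptions for illustration; a real memory interface would more likely use a wider SEC-DED code.

```python
def encode(data):
    """Sender side: add three check bits to four data bits (Hamming(7,4))."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4            # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4            # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4            # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Receiver side: correct at most one flipped bit, return the data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # 1-based index of the bad bit, 0 if none
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


sent = encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1                      # one bit corrupted on the single-pin bus
assert decode(received) == [1, 0, 1, 1]
```

Without check bits, the same corrupted bit would pass through undetected, which is the increase in single point failure rate noted above.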

FPGA SEU Mitigation (2/5)

• Output
  – Xilinx recommendation
    • Use 3 pins per signal, connected on the board
      – Not glitch-free
      – Signal integrity
    • Bus signals: use one pin per signal, add EDAC, save pins
      – The receiving side must also implement EDAC
    • Pins can be used up quickly
  – Implementation
    • Module interface
      – Use 3 pins per signal for controls
      – Use 1 pin per signal for Dout
        • EDAC is optional
        • Single point failure rate increases without EDAC

FPGA SEU Mitigation (3/5)

• Output
  – Implementation …
    • SDRAM interface
      – Clock, Address
        • 3 sets, equivalent signals are not connected together on the board
        • Each set drives two SDRAMs
      – Controls
        • 4 sets, equivalent signals are not connected together on the board
        • Two of the sets each drive two SDRAMs
        • The other two sets each drive one SDRAM
        • Switch EDAC/TMR configured SDRAM

FPGA SEU Mitigation (4/5)

• Bi-directional
  – Xilinx recommendation
    • Use 1 pin per signal
    • Path from voter to the pin becomes a possible single point failure
  – Implementation
    • SDRAM interface
      – TMR configured SDRAMs
        • 3 sets of data bus
      – EDAC configured SDRAMs
        • Use 1 pin per signal

FPGA SEU Mitigation (5/5)

• Implication on data integrity of the SDRAM contents
  – EDAC configured SDRAMs
    • 256 MB
    • Output drivers and input receivers are possible single points of failure
  – TMR configured SDRAMs
    • 128 MB
    • No single point failure
• Background SDRAM content scrubbing
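The two capacities are consistent with simple arithmetic if the stack holds six 512 Mb SDRAMs. That device count is an inference from the SDRAM interface description (3 clock/address sets, each driving two SDRAMs), not a number stated on the slides, and the split of the remaining EDAC storage into check bits is likewise an assumption.

```python
DEVICE_MB = 512 // 8          # one 512 Mb SDRAM holds 64 MB
NUM_DEVICES = 6               # assumption: 3 address sets x 2 SDRAMs each

raw_mb = DEVICE_MB * NUM_DEVICES             # total raw storage in the stack
tmr_capacity_mb = raw_mb // 3                # three copies of every word
edac_capacity_mb = 256                       # figure quoted on the slide
edac_check_mb = raw_mb - edac_capacity_mb    # storage left over for check bits

print(raw_mb, tmr_capacity_mb, edac_check_mb)
```

Seen this way, switching from EDAC to TMR configuration trades half of the usable capacity for the removal of the single point failures at the data pins.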

XTMR Tool (1/4)

• Fairly fast
• Gates utilized
  – Average utilization cost of TMR is ~3.2x
  – RTIMS actual
    • 4.3x
• Gates multiplier = 3 + 3 * (fraction of flops + fraction of I/Os)
  – It is closer to 3x for a design that is mostly gates
  – It is closer to 6x for a design that is mostly flops
  – RTIMS actual: 36% flops
• Additional multiplier for designs with SRL16
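The gates multiplier formula can be evaluated directly. The sketch below is only a worked example of the formula as written on the slide; the function name is hypothetical, and the I/O fraction for RTIMS is not given in the slides, so it is set to zero here.

```python
def tmr_gate_multiplier(frac_flops: float, frac_ios: float) -> float:
    """Slide formula: 3 + 3 * (fraction of flops + fraction of I/Os)."""
    return 3 + 3 * (frac_flops + frac_ios)


mostly_gates = tmr_gate_multiplier(0.0, 0.0)       # lower bound of the formula
mostly_flops = tmr_gate_multiplier(1.0, 0.0)       # upper bound of the formula
rtims_flops_only = tmr_gate_multiplier(0.36, 0.0)  # 36% flops, I/Os ignored

print(mostly_gates, mostly_flops, round(rtims_flops_only, 2))
```

With 36% flops alone the formula gives roughly 4.1x. The remaining gap to the reported 4.3x would have to come from the I/O fraction and the SRL16 multiplier, neither of which the slides quantify.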

XTMR Tool (2/4)

• Internal performance degradation
  – Average performance impact of TMR is ~10%
  – RTIMS actual
    • ~20%
    • 6 logic levels original
      – Add a voter, 7 levels
      – ~15% performance impact
    • Longer routing
      – 3.8x gates
      – ~5% performance impact

XTMR Tool (3/4)

• I/O performance degradation
• Input pin
  – TMR
    • Voters after the FF
    • Lock the FF in the IOB
  – No TMR on input pin
    • 3 FFs after the input receiver
    • Can't lock the FF in the IOB
    • Performance penalty
    • RTIMS actual: increased from 1.8 ns to 3.6 ns

XTMR Tool (4/4)

• Output pin
  – Triplicate pin, tied together on board
    • Add voter before the output driver
    • Glitch
    • Can't lock the FF in the IOB
    • Performance penalty
    • Signal integrity
  – Not triplicating pin
    • Add voter before the output driver
    • Glitch
    • Can't lock the FF in the IOB
    • Performance penalty
  – RTIMS actual: increased from 4.5 ns to 6.4 ns

Storage state

• Correct an SEU on storage state before the next SEU makes it uncorrectable
• Memory content
  – Scrubbing
• Flop state
  – Basic Xilinx flop: FDCPE(PRE, D, CE, C, CLR, Q)
  – Inputs of the flop are corrected
  – Unless CE is active, the flop state is not corrected
  – 3 minority voters and 3 OR gates can be added to force a CE when an error is detected
  – Expensive to apply this universally
  – For an “almost” static flop, the following flop is used (figure on the original slide)
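The CE-forcing idea in the list above can be sketched behaviorally. This is not the flop from the original slide; it is a simplified Python model of three 1-bit flop copies, and it assumes that a copy whose CE is forced by its minority voter reloads the majority-voted state. All names are hypothetical.

```python
def majority(a, b, c):
    """2-of-3 vote on 1-bit values."""
    return (a & b) | (b & c) | (a & c)


def minority(own, other1, other2):
    """1 when this copy disagrees with two neighbours that agree with each other."""
    return int(own != other1 and other1 == other2)


def clock(q, d, ce, force_ce_on_error):
    """One clock edge for three flop copies; returns the new list of states."""
    voted_q = majority(*q)
    next_q = []
    for i in range(3):
        others = [q[j] for j in range(3) if j != i]
        error = minority(q[i], *others) if force_ce_on_error else 0
        if ce:
            next_q.append(d)          # normal load of the (voted) D input
        elif error:
            next_q.append(voted_q)    # forced CE: assumed to reload voted state
        else:
            next_q.append(q[i])       # CE inactive: state held, upset persists
    return next_q


upset_state = [1, 0, 1]               # middle copy hit by an SEU, CE inactive
held = clock(upset_state, d=0, ce=0, force_ce_on_error=False)
repaired = clock(upset_state, d=0, ce=0, force_ce_on_error=True)
```

In the plain case the upset copy stays wrong until CE is next asserted, so a second upset in another copy would defeat the voter. With the forced CE the copy is repaired on the next clock edge.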

A few other things (1/4)

• Digital Clock Manager
  – Use 3 DCMs for each DCM that is in the original design
  – DCM is a unit
    • SEU on a flop in the DCM
      – Corrected by configuration scrubbing
      – Reset only
  – 3 counters, each counter is clocked by a DCM
  – When one of the counter values differs from the other two, we know which DCM is operating differently from the others
  – Each counter is TMR so that an SEU on the counter other than on the clock path will not produce an error
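The counter comparison described above amounts to finding the odd one out among three values. A minimal Python sketch, assuming the three counts are sampled at the same instant and compared for exact equality (a simplification; the function name is hypothetical):

```python
def faulty_dcm(counts):
    """Return the index (0-2) of the counter that disagrees, or None if all agree."""
    a, b, c = counts
    if a == b == c:
        return None
    if a == b:
        return 2
    if a == c:
        return 1
    if b == c:
        return 0
    raise ValueError("no two counters agree; the faulty DCM cannot be identified")


suspect = faulty_dcm((1000, 1000, 997))   # third DCM has drifted
```

The index identifies which DCM to reset. If no two counters agree, more than one clock path is affected and the comparison cannot single out a DCM.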

A few other things (2/4)

• Configuration scrubbing
  – Similar to Virtex
  – Virtex-II
    • Whole configuration is loaded with 1 type 2 command
    • The order of configuration loading is
      – GCLK, CLB and IOB, Memory Content, and Memory Control
  – Script to split the loading into three type 2 commands
    • GCLK, CLB, IOB
    • Memory control
    • Memory content
  – On power up the whole configuration is loaded
  – On scrubbing, only GCLK, CLB, IOB, and memory control are loaded
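The split into three type 2 commands lets the loader choose which segments to write for each event. The Python sketch below only models that selection; the segment labels mirror the slide, while the function and event names are hypothetical. The comment on why memory content is skipped is an assumption, since the slides do not give the reason.

```python
# Segment labels follow the three type 2 commands listed on the slide.
SEGMENTS = ("GCLK, CLB, IOB", "Memory control", "Memory content")


def segments_to_load(event):
    """Return the configuration segments written for a given event."""
    if event == "power_up":
        return list(SEGMENTS)                      # whole configuration
    if event == "scrub":
        # Assumption: memory content is skipped so live RAM data is preserved.
        return [s for s in SEGMENTS if s != "Memory content"]
    raise ValueError(f"unknown event: {event}")


power_up_plan = segments_to_load("power_up")
scrub_plan = segments_to_load("scrub")
```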

A few other things (3/4)

• Configuration scrubbing
  – Scrubber logic is TMR and is part of the FPGA code
  – Master SelectMAP is used for configuration, with the configuration clock continuing to run after the initial load
  – Scrubber logic is clocked by the configuration clock
    • The generation of the configuration clock becomes a possible single point failure
    • Can switch to Slave SelectMAP and add an external oscillator

A few other things (4/4)

• SelectMAP interface SEFI detection
  – Implement a 16x1 distributed memory as SRL16 with an initial value of all zeros
  – Instruct XTMR not to convert it to registers
  – Write a signature into this memory prior to configuration scrubbing
  – This memory should be cleared by the reloading of the CLB during configuration scrubbing
  – Read the memory content after configuration scrubbing
  – A non-zero content indicates scrubbing failure
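The detection sequence above can be walked through with a small behavioral model. This Python sketch is illustrative only: the class, the `selectmap_ok` flag, and the 0xA5A5 signature are assumptions, and the model simply treats a successful scrub as restoring the SRL16 to its initial zeros.

```python
class Srl16Canary:
    """Hypothetical model of the 16x1 SRL16 used as a scrub canary."""

    def __init__(self):
        self.bits = 0                       # initial value: all zeros

    def write_signature(self, signature=0xA5A5):
        # Any non-zero 16-bit pattern works; 0xA5A5 is an arbitrary choice.
        self.bits = signature & 0xFFFF

    def scrub(self, selectmap_ok):
        # A successful scrub reloads the CLB, restoring the initial zeros.
        if selectmap_ok:
            self.bits = 0

    def scrub_failed(self):
        return self.bits != 0


def run_scrub_cycle(selectmap_ok):
    """Write the signature, scrub, then report whether the scrub failed."""
    canary = Srl16Canary()
    canary.write_signature()
    canary.scrub(selectmap_ok)
    return canary.scrub_failed()
```

If the SelectMAP interface has suffered a SEFI, the scrub never reaches the CLB and the signature survives, so `run_scrub_cycle(False)` reports a failure.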

Stack

[Figure: module stack, with layers labeled SDRAM and MISC]

Status

• 20 modules
  – Related paper: "Radiation Tolerant and Intelligent Memory for Space" (P1025)
  – 144-lead QFP package
  – Dimensions: 42.5 mm x 42.5 mm x 13.0 mm
  – Mass: 70 g with radiation shielding
  – Power: ~4.0 W peak
  – To be verified / analyzed
    • Total ionizing dose > 100 krad (Si)
    • SEU in GEO less than 1.5E-6 per day
    • Latch-up immune to 60 MeV-cm²/mg

Future Work

• VHDL and Place & Route
  – Work in progress
    • Minimize SEFI
    • Error detection and recording
    • Error recovery
    • What is the SEFI rate of RTIMS?
• Environment testing
  – Life test (accelerated component life testing)
  – 100 krad (Si) TID radiation tests
  – SEL and SEU radiation tests
  – Vacuum and temperature tests
  – Mechanical stress tests
  – Electrostatic discharge tests

Points to ponder

• XTMR
  – Not a turnkey process
    • Scrub memory content
    • Almost static flop
    • DCM failure detection and reset
    • Glitch-free output is no longer glitch-free
    • Signal integrity with dotted output
  – IO
    • 3 pins for one signal, EDAC
    • Tie the triplicate IO together vs carry three signals on the board with the voter implemented on the receiving side
  – One size does not fit all
