Fault modeling

Fault Models and Injection Strategies in SystemC
Specifications
ANTONIO MIELE
Dipartimento di Elettronica e Informazione
2
Agenda
• Introduction
• Related work
• The considered simulator: ReSP
• Injection strategies
• Fault modeling
• Case studies
• Conclusions and future work
3
Introduction
• SystemC and TLM are the de facto standard for HW/SW system
specifications. Goals:
 Simplify the design flow
 Increase simulation speed
 Allow an early evaluation of the system
• Although there is extensive literature on different aspects of the
design and analysis of TLM specifications, little work has been done
on reliability assessment
 Need for new fault injection strategies to be integrated in the novel
design frameworks
 Need for suitable fault models for high level specifications
4
Goal
• Propose a fault injection and analysis tool as an extension of ReSP, a
simulation platform for SystemC models targeted at the design and
analysis of multiprocessor systems
 Development of the necessary support within ReSP to ease the
injection and the analysis
 Definition of a fault modeling methodology for high level
descriptions specified with SystemC and TLM
5
Related Work
• Many fault injection tools and strategies have been defined for VHDL
descriptions
• Classical injection strategies:
 Mutants – the nominal component is replaced with an
instrumented one able to simulate the nominal and faulty
behaviors
 Saboteurs – a fault injection module is inserted on the line to be
corrupted
 Simulation commands – the simulation console provides features
for modifying the value of the signals
6
Related Work (2)
• Few approaches for SystemC descriptions:
 [Fin et al. 2001] adopts the mutant approach for injecting faults in
descriptions at different levels of abstraction considering only the
stuck-at fault for testing purposes
 [Misera et al. 2006] proposes a multi-language framework
exploiting saboteurs
 [Chang et al. 2007] explores through some examples the injection
at different levels of abstraction using saboteurs and analyzing
controllability and observability
 [Misera et al. 2007] investigates the three approaches in SystemC
descriptions and, in particular, simulation commands, proposing a
patch to the SystemC kernel
7
Related Work (3)
• [Moraes et al. 2003] and [Martins et al. 2000] propose two frameworks
for fault injection in Java and C++ models using simulation commands
implemented by means of reflection features
• Background:
 Reflection (or introspection) is a feature offered by modern
programming languages such as Java or Python to discover,
access and modify the structure of a program at runtime
 E.g.:
>>> dir(MyComponent)
['port1', 'port2', 'register', . . . ]
>>> getattr(MyComponent, 'register')
3
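A self-contained variant of the reflection example could look like the following sketch. The class MyComponent and its attributes are hypothetical stand-ins, not ReSP components; only Python built-ins (dir, getattr, setattr) are used.

```python
class MyComponent:
    """Hypothetical component; attribute names mirror the example above."""

    def __init__(self):
        self.port1 = None
        self.port2 = None
        self.register = 3


comp = MyComponent()

# Discover the public attributes of the instance at run time.
public = [name for name in dir(comp) if not name.startswith("_")]

# Read and then overwrite an attribute selected by its name.
before = getattr(comp, "register")
setattr(comp, "register", before ^ 0b1)  # flip the least significant bit
after = getattr(comp, "register")
```

Because the attribute is addressed by name, the same few lines work for any component without touching its description.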
8
ReSP Simulation Platform
• ReSP [Beltrame et al. 2008] is a simulator for hardware/software
system specifications
 Particularly targeted for transaction-level multiprocessor systems
• Built in SystemC and Python
 SystemC is the standard in hardware/software system
specification
• C++ extended with hardware modeling concepts
• C++ execution speed
 Python offers introspection and scripting capabilities
• Offers non-intrusive visibility into all the platform elements
• Run-time composition and dynamic management of the specification
• Provides an enhanced simulation control (asynchronous
pause/resume commands)
9
ReSP Simulation Platform (2)
• Integration-oriented top-down design approach with a systematic
reuse of IP cores
 An IP repository is provided for building the system
 The designer can build the system by instantiating and connecting
components at runtime
• Advantages:
 Easy integration of new IPs
 Fine grain control of system-level simulation
 Effortless development of tools for system analysis and design space exploration
10
The proposed extension to the platform
• The ReSP simulator has been adopted and extended to implement a fault
injection environment
 Definition of fault models and injection strategies
• Required qualities:
 Transparency
• No instrumentation of the components’ descriptions
 Dynamism
• Definition and planning of the fault injections at run-time
 Independence from the system description
• In particular from the abstraction level of the descriptions
11
Injection strategies
• Simulation commands
 The simulation platform has been extended for providing injection
commands based on Python reflection and scripting
• Saboteurs
 Saboteur models have been included in the IP repository
 They can be instantiated and connected to the components of the
considered system
• The mutant approach is supported but not pursued here because of its
intrusiveness
 However, mutants can be included in the IP repository
12
Injection strategies (2)
• In SystemC, components are C++ objects
 Component internal state (registers, memories, ...) is modeled with
object attributes (variables)
• Transient faults can be modeled as a value change in an object
variable
• Simulation commands are implemented for transient faults by using
Python reflection
 Component methods and attributes can be retrieved with
introspection
 Object state modification is performed dynamically thanks to
introspection and scripting
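As a minimal sketch of a transient-fault simulation command, the snippet below flips one bit of an attribute selected by name. RegisterFile, inject_bit_flip and random_fault are hypothetical names introduced for illustration; they are not part of ReSP.

```python
import random


class RegisterFile:
    """Hypothetical component: internal state is held in plain attributes."""

    def __init__(self):
        self.pc = 0x0040
        self.status = 0b0101


def inject_bit_flip(component, attr_name, bit):
    """Flip one bit of an integer attribute chosen by name; return old and new values."""
    old = getattr(component, attr_name)
    new = old ^ (1 << bit)
    setattr(component, attr_name, new)
    return old, new


def random_fault(component, width, rng):
    """Pick a random integer attribute and a random bit position below `width`."""
    candidates = [n for n, v in vars(component).items() if isinstance(v, int)]
    return rng.choice(candidates), rng.randrange(width)


rf = RegisterFile()
old, new = inject_bit_flip(rf, "status", 1)   # 0b0101 -> 0b0111
target, bit = random_fault(rf, width=16, rng=random.Random(0))
```

The component class contains no injection code, which is the transparency property the platform aims for.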
13
Injection strategies (3)
• E.g.:
• Saboteurs are used for injecting faults on interconnections
[Figure: a saboteur (sab) placed on the interconnection between the processor (proc) and the memory (mem)]
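The saboteur idea can be sketched as a pass-through object that exposes the same interface as the target and corrupts the data crossing it. The Memory and Saboteur classes and the XOR-mask fault are assumptions made for this illustration, not the saboteur models of the ReSP IP repository.

```python
class Memory:
    """Hypothetical target: a word-addressed memory."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, data):
        self.cells[addr] = data

    def read(self, addr):
        return self.cells.get(addr, 0)


class Saboteur:
    """Sits between initiator and target, forwarding calls and optionally corrupting data."""

    def __init__(self, target):
        self.target = target
        self.mask = 0          # XOR mask applied to data; 0 means transparent

    def write(self, addr, data):
        self.target.write(addr, data ^ self.mask)

    def read(self, addr):
        return self.target.read(addr) ^ self.mask


mem = Memory()
sab = Saboteur(mem)           # proc -> sab -> mem
sab.write(0x10, 0xAB)         # transparent while mask == 0
clean = sab.read(0x10)
sab.mask = 0x01               # activate the fault on the line
faulty = sab.read(0x10)
```

The initiator only sees the read/write interface, so it cannot tell whether it is connected to the memory or to the saboteur.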
14
Injection strategies (4)
• Permanent faults require a register or an interconnection to be blocked
to a specified value
• The implementation of permanent faults as simulation commands is
based on the event watching mechanism
 Every time the faulty location is changed, it is updated with the
faulty value
• Saboteurs are used for injecting permanent faults on interconnections
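A stuck-at fault built on an event-watching hook could be sketched as follows. The Watchable base class is a hypothetical stand-in for the platform's event watching mechanism, implemented here with Python's `__setattr__`; the actual ReSP mechanism may differ.

```python
class Watchable:
    """Hypothetical base class: calls registered watchers after every attribute write."""

    def __init__(self):
        object.__setattr__(self, "_watchers", {})

    def watch(self, name, callback):
        self._watchers.setdefault(name, []).append(callback)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        for callback in self._watchers.get(name, []):
            callback(self, name)


class Counter(Watchable):
    def __init__(self):
        super().__init__()
        self.value = 0

    def tick(self):
        self.value = self.value + 1


def stuck_at(component, name, bit, level):
    """Force one bit of an attribute to `level` now and after every later write."""
    def enforce(obj, attr):
        current = getattr(obj, attr)
        forced = current | (1 << bit) if level else current & ~(1 << bit)
        # Bypass __setattr__ so the watcher does not trigger itself.
        object.__setattr__(obj, attr, forced)
    component.watch(name, enforce)
    enforce(component, name)


c = Counter()
stuck_at(c, "value", bit=0, level=0)   # least significant bit stuck at 0
for _ in range(3):
    c.tick()
```

Each write to the faulty location is immediately followed by the watcher, so the component never observes the fault-free value.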
15
Analysis and Execution Strategies
• Interactive simulation mode:
 The user controls the simulation through the console enhanced
with asynchronous execution pause and resume
• Batch simulation mode:
 Fault injection campaigns can be performed
• Provided facilities:
 Automatic instantiation of saboteurs
 Automatic instantiation of golden models
 Generation and execution of fault campaigns
 Collection and analysis of results
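A batch campaign can be pictured as one golden run followed by one faulty run per fault, with the outcomes compared at the end. The toy model, the fault encoding (step, bit) and the two outcome classes are assumptions made for this sketch.

```python
import itertools


def run_model(data, fault=None):
    """Hypothetical system under test: sums a list in an 8-bit accumulator.

    `fault` is (step, bit): flip `bit` of the accumulator before `step` is processed.
    Only the low nibble of the accumulator is observable at the output.
    """
    acc = 0
    for step, item in enumerate(data):
        if fault is not None and fault[0] == step:
            acc ^= 1 << fault[1]
        acc = (acc + item) & 0xFF
    return acc & 0x0F


def campaign(data, faults):
    """Run the golden model once, then one faulty run per fault, and classify."""
    golden = run_model(data)
    results = {}
    for fault in faults:
        outcome = run_model(data, fault)
        results[fault] = "no effect" if outcome == golden else "failure"
    return golden, results


workload = [1, 2, 3, 4]
fault_list = list(itertools.product(range(len(workload)), range(8)))
golden, results = campaign(workload, fault_list)
failures = sum(1 for r in results.values() if r == "failure")
```

In this toy model, flips in the upper four accumulator bits never reach the output, so half of the injected faults are classified as having no effect.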
16
Analysis and Execution Strategies (2)
• System execution monitoring:
 Step by step execution
• Execute for a while and analyze system status
 Probes on interconnections
• Monitor events on interconnections
 Event watching mechanism
• Monitor events inside the components
17
Case Study: simulation of fault tolerant architectures
• Four different fault tolerant versions of the same architecture have
been considered for fault injection experiments
 Loosely synchronized dual processor
 Lock-step dual processor
 Processor running hardened SW
 TMR processor
Demo after the presentation…
18
Fault modeling
• Which corruptions of internal state and interconnections represent real
physical misbehaviors?
• Definition of a methodology for modeling faults in IP descriptions
modeled at the different levels of abstraction supported by SystemC
• An IP description can be provided as:
 TLM description
• It can be used “as is”
 RTL description
• It has to be abstracted to a TLM model
• This analysis focuses on single event upsets (SEUs)
19
SystemC and TLM Abstraction Levels
• SystemC and TLM abstraction levels can be described in terms of
computation and communication aspects
20
Mutation Models
• A mutation model identifies all the elements of the specification that
can be manipulated to model the effects of a physical fault
• Mutation models have to be defined for each abstraction level
• The mutation model is the basis for the definition of a fault model
21
Mutation Models (2)
• RTL
 Every register of the circuit is represented as an object
attribute and all I/O is specified
 All actual fault locations can be accessed and corrupted
• BCA
 Core functionality is expressed algorithmically
 Mutation models are:
• Corruption of attributes that represent the internal state of the
component between different elaborations
• Corruption of all I/O
22
Mutation Models (3)
• AT and LT
 Core functionality is expressed algorithmically
 Communication is expressed with blocking/non-blocking function
calls and standard data structures
 Mutation models are:
• Corruption of attributes representing the internal state of the
component
• Interface function misbehavior:
23
Fault modeling
• The idea is to list the misbehaviors affecting the considered IP and
represent them in terms of mutation models
 Generation of a fault dictionary for each IP
• Fault modeling specifies how to use available mutation models
 Fault modeling gives a semantic meaning to the mutation models
• Two different approaches:
 IPs without RTL implementation
• Make assumptions during analysis due to the lack of implementation
details
 IPs with RTL implementation
• Define and abstract all physical misbehaviors
24
Fault modeling without RTL implementation
• The fault modeling strategy relies on assumptions on the possible
misbehaviors
• Misbehaviors are derived from the analysis of the LT (or AT) specification
• Three main aspects have to be considered:
 Component attributes
• Can be corrupted according to their actual domain
 Interface functions
• Can be mutated as previously described
 Functional algorithmic description
• Each basic task composing the algorithmic description is mutated as:
– The task is executed erroneously
– The task is executed when not requested
– The task is not executed
• The obtained fault models can be refined with the available information on
the implementation or on the synthesis process
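The three task mutations can be sketched by rebinding the methods of a component instance at run time. The Dma class and the helper names are hypothetical and only illustrate the idea on a component whose behavior is split into two basic tasks.

```python
class Dma:
    """Hypothetical component whose behavior is split into two basic tasks."""

    def __init__(self):
        self.copied = []
        self.irq = False

    def copy(self, word):
        self.copied.append(word)

    def raise_irq(self):
        self.irq = True

    def transfer(self, words, notify):
        for word in words:
            self.copy(word)
        if notify:
            self.raise_irq()


def skip_task(component, name):
    """The task is not executed."""
    setattr(component, name, lambda *args, **kwargs: None)


def corrupt_task(component, name, corrupt):
    """The task is executed erroneously: its argument is corrupted first."""
    original = getattr(component, name)
    setattr(component, name, lambda arg: original(corrupt(arg)))


def spurious_task(component, trigger, name):
    """The task `name` is executed after `trigger` even when not requested."""
    original = getattr(component, trigger)

    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)
        getattr(component, name)()
        return result

    setattr(component, trigger, wrapper)


a = Dma(); skip_task(a, "raise_irq"); a.transfer([1, 2], notify=True)
b = Dma(); corrupt_task(b, "copy", lambda w: w ^ 1); b.transfer([2, 4], notify=False)
c = Dma(); spurious_task(c, "transfer", "raise_irq"); c.transfer([7], notify=False)
```

Rebinding on the instance leaves the class untouched, so fault-free instances of the same component are unaffected.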
25
Fault modeling with RTL implementation
• A complete taxonomy of physical misbehaviors can be defined for the
RTL model and abstracted to LT (or AT) level
• During component abstraction, we keep track of how fault locations
(i.e., storage elements) change between abstraction levels
• Computation abstraction:
 Registers used for storing data among different elaborations are
mapped on component attributes
 Registers used for storing data in the same multi-cycle elaboration
are removed since elaboration is performed atomically
• Communication abstraction:
 Registers used for multi-cycle communication protocols are
removed since transactions are executed atomically
26
Fault modeling with RTL implementation (2)
• Fault modeling:
 Registers used for storing data among different elaborations are
mapped on component attributes
• Corruption of attributes
 Registers used for storing data in the same multi-cycle elaboration
are removed since elaboration is performed atomically
• The effects of the faulty functionality on the component's internal
state and outputs are represented with the available mutation models
27
Fault modeling with RTL implementation (3)
• Fault modeling:
 Registers used for multi-cycle communication protocols are
removed since transactions are executed atomically
• The effects of the faulty communication are represented in terms of
mutation models
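The mapping from RTL registers to LT (or AT) fault models can be summarized as a small lookup. The Role names, the register inventory and the result strings are hypothetical; the sketch only restates the three cases described for computation and communication abstraction.

```python
from enum import Enum


class Role(Enum):
    """Role of an RTL register, following the three cases of the abstraction."""
    STATE = "stores data across elaborations"
    TEMPORARY = "stores data within one multi-cycle elaboration"
    PROTOCOL = "supports a multi-cycle communication protocol"


def abstract_fault(role):
    """Return how a fault in a register with this role is modeled at LT/AT level."""
    if role is Role.STATE:
        # The register survives abstraction as a component attribute.
        return "attribute corruption"
    if role is Role.TEMPORARY:
        # The register disappears; only the effects on state and outputs remain.
        return "effect on internal state and outputs"
    # Protocol registers disappear because transactions are atomic.
    return "effect of faulty communication"


# Hypothetical register inventory of an RTL component.
rtl_registers = {
    "fifo_head": Role.STATE,
    "mul_partial": Role.TEMPORARY,
    "handshake_ack": Role.PROTOCOL,
}

fault_dictionary = {name: abstract_fault(role) for name, role in rtl_registers.items()}
```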
28
Case study: the NoC switch
• An LT model of a NoC switch has been considered
 5 I/O directions
 Alternating Bit Protocol used for flow control
 Data transmitted in units called flits (header, body and tail)
• Fault modeling on the LT description:
 Attributes can be corrupted according to their domains
• std::queue<struct flit> buffers[5]
• int reservation[5]
• int next
 The control flow RTL implementation is analyzed and faults abstracted
• Only phantom and fake calls correspond to physical failures
 Two functions representing reception and transmission can be
behaviorally corrupted
• An RTL model has been implemented and categorized in terms of faults
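Corrupting the switch attributes "according to their domains" might look like the sketch below, which mirrors the three attributes in Python. The value domains (port indices 0 to 4, -1 for a free reservation) and the flit layout are assumptions; the slides do not state them.

```python
import random
from collections import deque

PORTS = 5          # the switch has 5 I/O directions
FREE = -1          # assumed encoding for "output not reserved"


class NocSwitch:
    """Python stand-in for the attributes of the LT switch model."""

    def __init__(self):
        self.buffers = [deque() for _ in range(PORTS)]   # std::queue<struct flit> buffers[5]
        self.reservation = [FREE] * PORTS                # int reservation[5]
        self.next = 0                                    # int next


def corrupt_in_domain(value, domain, rng):
    """Return a legal value from `domain` that differs from the current one."""
    return rng.choice([v for v in domain if v != value])


rng = random.Random(1)
sw = NocSwitch()
sw.buffers[0].append({"kind": "header", "payload": 0x2A})   # hypothetical flit layout

# Corrupt each attribute within its assumed domain.
sw.next = corrupt_in_domain(sw.next, range(PORTS), rng)
sw.reservation[2] = corrupt_in_domain(sw.reservation[2], range(FREE, PORTS), rng)
sw.buffers[0][0]["payload"] ^= 1 << 3                       # flip one payload bit
```

Restricting the corrupted value to the attribute's domain avoids injecting values that no fault in the implementation could produce.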
29
Case study: the NoC switch (2)
 Two functions representing reception and transmission can be
behaviorally mutated
30
Case study: the NoC switch (3)
• An RTL description of the circuit has been implemented:
• Results:
 The RTL model contains 1798 fault locations:
 the corruption of only 30 is not represented by fault models
 23 fault models have been defined for the LT model:
 14 are effective, 7 are redundant and only 2 do not correspond to
any physical misbehavior
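The reported counts translate into percentages as follows. The variable names are illustrative; the numbers are the ones given in the results, which yield about 98.3% of RTL fault locations represented and about 60.9% of the LT fault models effective.

```python
rtl_locations = 1798          # fault locations in the RTL model
unrepresented = 30            # locations whose corruption no fault model represents
lt_models = {"effective": 14, "redundant": 7, "non-physical": 2}

# Share of RTL fault locations covered by the LT fault models.
coverage = (rtl_locations - unrepresented) / rtl_locations

# Share of LT fault models that are effective.
total_models = sum(lt_models.values())
effective_share = lt_models["effective"] / total_models

summary = f"coverage {coverage:.1%}, effective models {effective_share:.1%}"
```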
31
Conclusions and Future Work
• A fault injection environment has been implemented as an extension of
the ReSP simulation platform for SystemC/TLM specifications
• A fault modeling methodology has been proposed for SystemC/TLM
specifications
• Future work:
• Use of the fault injection environment and the fault modeling
methodology to define a methodology for the reliability
assessment of SystemC/TLM specifications