Dependability Benchmarking of SoC-embedded Control Systems
Juan-Carlos Ruiz
Technical University of Valencia, Spain
3rd SIGDeB Workshop on Dependability Benchmarking
ISSRE 2005, Chicago, Illinois USA, November 2005
Outline
• Benchmarking Context
• Benchmark Specification
• Benchmark Prototype
• Ongoing Work
SoC-embedded Automotive Control Applications

Powertrain Electronic Control Units (ECUs) handle, among others:
• Per-cylinder knock
• Electronic fuel injection
• Electronic valve timing
• Electronic ignition
• Electronically controlled automatic transmission
• Electronic combustion
Other vehicle domains: Entertainment & Communications, Body & Chassis.
Any Engine ECU handles (at least) …
• Fuel injection
  - Angle → When is the fuel injected?
  - Timing → How long is the injection?
• Air management
  - Flow shape → How does the air enter the cylinder?
  - Air mass → How much air enters the cylinder?
Engine ECU model

[Diagram] The Electronic Control Unit runs the control software (optionally on top of an RTOS) on µController-/DSP-based hardware. Sensors feed the throttle position and the engine internal variables into the ECU control loops; the outputs from the ECU drive the actuators: fuel injection (angle, timing) and air management (air mass, flow shape).
Outline
• Benchmarking Context
• Benchmark Specification
• Benchmark Prototype
• Ongoing Work
Measures for powertrain system integrators (I)

ECU control outputs:
• Fuel injection: Angle, Timing
• Air management: Volume, Flow Shape

ECU failure modes:
• No new data (stuck-at failure)
• Delayed data (missed deadline)
• Output control deviation: close to nominal / far from nominal

Powertrain system failure modes:
• Unpredictable (e.g. the engine stops, the engine is damaged)
• Predictable: Noise & Vibrations, Non-optimal behavior
Measures for powertrain system integrators (II)

Each measure is a ratio: {number of failures of this type in the considered output / total number of experiments}

Unsafety levels, from most to least unsafe: unpredictable, noise & vibration, non-optimal. An Engine ECU reset is unpredictable for all control outputs.

Failure modes vs. ECU control outputs (1):

| Failure mode            | Fuel injection: Angle | Fuel injection: Timing | Air management: Air Mass | Air management: Flow shape |
|-------------------------|-----------------------|------------------------|--------------------------|----------------------------|
| Value: No new data      | unpredictable         | non-optimal            | noise & vibration        | noise & vibration          |
| Value: Close to nominal | non-optimal           | noise & vibration      | non-optimal              | non-optimal                |
| Value: Far from nominal | unpredictable         | non-optimal            | noise & vibration        | noise & vibration          |
| Time: Delayed data      | non-optimal           | noise & vibration      | non-optimal              | non-optimal                |

(1) Table referred to a diesel engine PSA DW12
System Under Benchmarking (SUB) & Dependability Benchmark Target (DBT)

[Diagram] The DBT is the ECU itself: the control loops and RTOS running on µController-/DSP-based hardware, producing the outputs from the ECU (fuel injection angle and timing; air management volume and flow shape). For economical and safety reasons, the throttle and the engine are replaced by two models (a Throttle Model and an Engine Model) that provide the sensor inputs and engine internal variables. These models must be customised according to each specific engine. The SUB comprises the DBT together with these models.
Tools supporting the definition of Engine models

• If available, we can use a synthetic model of the engine
• Otherwise, the model can be obtained from the real engine:
  1. Run the workload
  2. Trace the engine behaviour
  3. The resulting traces define a model of the engine behaviour in the absence of faults
Workload

• Workload = Engine internal variables + Throttle
• Engine internal variables are generated by the engine
• Throttle inputs are computed according to one of the following driving cycles:
  - Acceleration-Deceleration cycle
  - Urban Driving Cycle
  - Extra-Urban Driving Cycle
The Urban and Extra-Urban cycles come from the emission certification of light duty vehicles in Europe (EEC Directive 90/C81/01)
Workload detail

| Driving cycle             | Speed average | Time      | Distance  | Maximum speed |
|---------------------------|---------------|-----------|-----------|---------------|
| Urban Driving Cycle       | 18.35 km/h    | 13.00 min | 3976.11 m | 50.00 km/h    |
| Extra-urban Driving Cycle | 61.40 km/h    | 6.67 min  | 6822.22 m | 120.00 km/h   |
Faultload

“About 80% of hardware faults are transient, and as VLSI implementation use smaller geometries and lower power levels, the importance of transients increases” [Cunha DSN2002] [Somani Computer1997] [DBench ETIE2 report]

• Transient physical (hardware) faults affecting the ECU memory, software-emulated using the single bit-flip fault model
Benchmark Conduct

[Diagram] A benchmark run consists of a golden run followed by a sequence of experiments (Experiment 1 … Experiment N). Each experiment follows the timeline: start-up → fault injection → fault activation (error) → error detection → failure, with the ECU's activity observed throughout. Event-oriented measures are taken at these events; time-oriented measures include the error detection latency and the observation time.
Technical considerations

• Engine ECUs are typically manufactured as SoCs
  - Control software is stored and executed inside a chip (observability and controllability issue)
  - Running faultloads without introducing spatial and temporal intrusion is a challenge

• Our advice:
  - Exploit on-chip debugging (OCD) features currently supplied by most automotive embedded microcontrollers: on-the-fly memory access, program and data tracing facilities
  - To increase portability, select standard and processor-independent OCD mechanisms, like the ones defined by Nexus
Outline
• Benchmarking Context
• Benchmark Specification
• Benchmark Prototype
• Ongoing Work
Benchmark Prototype

[Diagram] The ECU software runs on an MPC565 Evaluation Board, connected through a Nexus Adapter to an In-Circuit Debugger for Nexus, which is linked to the host over a USB link.
Benchmark Analysis

[Diagram] Probes collect raw measures at three points and feed the analysis:
1. Internals (memory positions, error handlers): error activation, error detection, error latencies, error coverage & distribution
2. Externals (periodic outputs, non-periodic outputs, events): failure observation
3. Failure classification: no failure, no new data, missed deadline, close to nominal, far from nominal; each failure induces either a predictable engine behavior (non-optimal, vibrations) or an unpredictable one
These raw measures are combined into the dependability measures.
Case Study: Diesel ECUs (DECUs)

[Diagram] Engine ECU control loops:
• Inputs from sensors: intake air pressure, common rail pressure, crankshaft angle, camshaft angle, throttle position (reference speed), current engine speed (in rpm)
• Outputs to actuators: Fuel Timing (20 ms), Fuel Angle (50 ms), Air Shape (20 ms), Air Volume (500 ms)
• Actuators: Injector1 … InjectorN, common rail compressor discharge valve, swirl valve, waste gate valve

ECU version 1
• Implemented on a RTOS called μC/OS II
• Each control output computed in a different OS task
• Tasks use the semaphores & waiting facilities of the OS
• OS scheduling policy is rate-monotonic
ECU version 2
• Implemented without OS
• Each control output computed in a different program procedure
• The main program schedules the execution of each program procedure
• The scheduling policy is computed off-line
Some Results

| DECU with RTOS       | Acc.-Dec. Cycle | Urban Driving Cycle | Extra-Urban Cycle |
|----------------------|-----------------|---------------------|-------------------|
| Failure Ratio        | 1.775 %         | 5.76 %              | 5.10 %            |
| - Unpredictable      | 0.598 %         | 1.35 %              | 2.72 %            |
| - Noise & Vibrations | 0.539 %         | 2.48 %              | 0.00 %            |
| - Non-Optimal        | 0.638 %         | 1.93 %              | 2.38 %            |

| DECU without RTOS    | Acc.-Dec. Cycle | Urban Driving Cycle | Extra-Urban Cycle |
|----------------------|-----------------|---------------------|-------------------|
| Failure Ratio        | 10.659 %        | 2.38 %              | 5.76 %            |
| - Unpredictable      | 2.52 %          | 0.34 %              | 1.28 %            |
| - Noise & Vibrations | 3.517 %         | 1.36 %              | 1.60 %            |
| - Non-Optimal        | 4.622 %         | 0.68 %              | 2.88 %            |

(Results obtained from a 5-day benchmark execution; 300 experiments per driving cycle)
Practical considerations

• The observation time after fault injection is limited by the trace memory of the debugger connected to the debugging ports
• The number of probes that can be connected to a debugging port is limited; thus, obtaining the benchmarking measures may require running the same golden run or experiment several times
Outline
• Benchmarking Context
• Benchmark Specification
• Benchmark Prototype
• Ongoing Work
Current Working Context

• ARTEMIS workshop, June-July 2005, Paris
  - Increasing interest of the industrial community in the use of SW components in (SoC-)embedded systems
  - Need for benchmarking other types of components in control systems (RTOS, middlewares, etc.)
• To what extent can what we know be applied to this type of research?
Ongoing Research

• SoC systems = compound of components
  - Component = Interface + Implementation
• Parameter corruption techniques are of major interest to evaluate component robustness
  - New technique for parameter corruption in SoCs using OCD mechanisms [PRDC11 (to appear)]
• The key issue here is not to reinvent the wheel but rather to explore to what extent what exists can be applied to SoCs
Thanks for your attention!!
Any questions, comments, or suggestions?
Benchmark measures

• Failure modes in control outputs
  - Time failures (out-of-time control delivery)
  - Value failures (no new value, value within tolerable bounds, value out of tolerable bounds)
• Impact of failures on the system and users (unsafety levels)
  - Without consequences
  - With consequences, but non-catastrophic
  - With catastrophic consequences

Benchmark performers must correlate, for each control output, failure modes and their impact on the system and users.
Some Results

Number of BEs (benchmark experiments): 300 per driving cycle

| DECU with RTOS       | Urban Driving Cycle | Extra-Urban Driving Cycle |
|----------------------|---------------------|---------------------------|
| Failure Ratio        | 5.76 %              | 5.10 %                    |
| - Unpredictable      | 1.35 %              | 2.72 %                    |
| - Noise & Vibrations | 2.48 %              | 0.00 %                    |
| - Non-Optimal        | 1.93 %              | 2.38 %                    |

| DECU without RTOS    | Urban Driving Cycle | Extra-Urban Driving Cycle |
|----------------------|---------------------|---------------------------|
| Failure Ratio        | 2.38 %              | 5.76 %                    |
| - Unpredictable      | 0.34 %              | 1.28 %                    |
| - Noise & Vibrations | 1.36 %              | 1.60 %                    |
| - Non-Optimal        | 0.68 %              | 2.88 %                    |
Experimental Set-up

[Diagram] The Benchmark Manager coordinates three controllers that interact with the System Under Benchmarking (SUB) through dedicated interfaces: the Workload Controller (workload interface) exercises the SoC-embedded components; the Faultload Controller (faultload interface) drives the fault injection process; and the SUB Monitor (monitoring interface) observes the Benchmark Target activity. The COTS software components are the potential benchmark targets. Experimental measurements are stored in the Experiment Repository, from which the Experiment Analyzer derives the dependability measures.
Results

3000 experiments. SW configuration: RTOS (μC/OS II). Workload: Acceleration-Deceleration.

• Fault activation: error 31 %, no error 69 %
• Detected errors: 26.7 %; non-detected errors: 73.3 %
• Failure ratio: 40 %
• Distribution of detected errors over detection mechanisms: SEE 38 %, MCE 31 %, FPASE 13 %, CHSTP 6 %, ALE 6 %, OTHER 4 %, FPUVE 2 %, DTLBER 0 %, ITLBER 0 %, SYSE 0 %
Fault Injection Procedure

[Flowchart] FI experiment setup: on start, the SoC is power-up reset, the SoC software is loaded in memory and starts execution. The injection trigger is then armed: either a temporal trigger (set an external timer, e.g. in a PC, and loop until the timer expires) or a spatial trigger (set a watchpoint and loop until the watchpoint message arrives). When the trigger fires, the fault injection process runs: read memory, bit-flip, write fault; then the experiment ends.
Hardware Fault models

• Transient faults (single & multiple bit-flip), e.g. on memory location 0x00014B04:
  1. Read memory (e.g. bit7 1000.0011 bit0)
  2. Bit-flip = Memory XOR Mask (e.g. mask: bit7 0011.1000 bit0)
  3. Write fault (e.g. bit7 1011.1011 bit0)

• Permanent faults (stuck-at model): continuous monitoring of the location where the fault must be introduced:
  - stuck-at “1” → bit-flip = Memory OR Mask (bits to flip at “1”)
  - stuck-at “0” → bit-flip = Memory AND Mask (bits to flip at “0”)
Technical considerations

• The number of probes that can be connected to a debugging port is limited; thus, studying the system activity in the presence of faults may require running a fault injection experiment several times
• The observation time after a fault injection is limited by the trace memory of the components connected to the debugging ports
FI campaign

[Diagram] A campaign comprises several golden runs and, for each golden run, a set of FI experiments (1 fault per experiment): Experiment 1 … Experiment N, Experiment 1´ … Experiment N´. Within each experiment the sequence is: fault injection → fault activation (error) → error detection → failure.
INERTE: Integrated NExus-based Real-Time fault injection tool for Embedded systems

[Diagram] For each fault injection campaign, the Experiment Generator Module reads a Configuration File and drives the Fault Injector. For each fault injection experiment, the Fault Injector produces a Golden Run Trace and an FI Trace, stored in the Trace Repository; the Analysis Tool processes them and produces the FI Campaign report.
Experiment Generator Module

Configuration files define where (memory address) and when (injection instant) faults are injected:

521 200000.ms
1 ROD 0x00015FB4 0x40 88265.ms OSUnMapTbl
2 ROD 0x00015F6D 0x10 70262.ms OSUnMapTbl
3 ROD 0x00015FB5 0x40 103116.ms OSUnMapTbl
4 COD 0x00014A85 0x02 57053.ms ConvertirDatosInyeccion
5 COD 0x00014A21 0x80 129717.ms ConvertirDatosInyeccion
6 COD 0x00014B04 0x01 115127.ms ConvertirDatosInyeccion
7 RWD 0x00070B25 0x10 77078.ms ConsignaPresionRail
8 RWD 0x00070B46 0x10 97479.ms ConsignaPresionRail
9 COD 0x00014419 0x02 139488.ms Interp2d
10 COD 0x00014138 0x20 79351.ms Interp2d
11 COD 0x000143F6 0x08 85503.ms Interp2d
12 COD 0x0001457A 0x40 59389.ms Interp2d
13 COD 0x000141D7 0x01 96898.ms Interp2d
14 COD 0x0001416B 0x01 146757.ms Interp2d
15 COD 0x000143C7 0x08 58150.ms Interp2d
16 COD 0x000141C4 0x20 128517.ms Interp2d
17 COD 0x00013FAA 0x80 76006.ms Interp2d
18 COD 0x000140BA 0x04 61788.ms Interp2d
19 COD 0x000140FF 0x08 136874.ms Interp2d
20 COD 0x0001427C 0x08 97722.ms Interp2d
…
Fault Injector

[Diagram] The Fault Injection Script (written in PRACTICE) drives a commercial Nexus debugging tool from Lauterbach®, which observes the SoC application inputs & outputs, the SoC application tasks, and the SoC internal registers during both golden run processing and fault injection processing.

For the time being, multi-bit flip is not considered.
Analysis Tool

Fault activation vs non-activation: error 1173, no error 1786

Error syndrome:
• Detected errors: 431 (failure before error detection: 15)
• Non-detected errors: 742 (errors not provoking a failure: 454; errors leading to failure: 288)

Failures:
• Data close to expected output: 116
• Data far from expected output: 172

Trace Repository sample (Lauterbach trace listing):
B::Trace.List (-50000.)--(0.) address data ti.back mark
record      | address      | d.l      | ti.back | mark
-0000001128 | D:00070BB8 00000000
-0000001127 | D:00070BBC 00000000 0.540us
-0000001126 | D:00070BC0 00000000 0.700us
…
-0000001120 | D:00070BB8 000003E8 1.026s
-0000001119 | D:00070BBA 00000014 1.760us
-0000001118 | D:00070BBC 0000000F 1.740us
…

Error detection mechanism counts: IBRK 0, LBRK 0, DTLBER 0, ITLBER 0, SEE 234, FPASE 26, SYSE 11, FPUVE 11, ALE 25, MCE 97, CHSTP 31, OTHER 5

Error detection latency (s): min 0.000008620, max 0.002321500, avg 0.000097840

Analysis completed: 3000 experiments analyzed; 41 dropped, 11 due to multi-bit flips.
Anatomy of a SoC-based control system

• A SoC is a chip-embedded computer

[Diagram] Within the SoC internal memory, the Control Component (Task1 … TaskN) takes sensor readings as inputs, computes control outputs for the actuators, and interacts with the RTOS Component through the RTOS interface; everything executes on a µController or DSP.