Mars pathfinder failure Dedicated Systems Experts 2005 - Martin TIMMERMAN p. 1

Documentation
• Speech by Dave Wilner, recorded by Mike Jones
• Comments by Glenn Reeves, Mars Pathfinder Flight Software
Cognizant Engineer
– Hereafter called JPL (Jet Propulsion Lab)
• Talk by Ian A. Mason, University of New England, Australia
– http://mcs.une.edu.au/~iam/Data/threads/threads.html
Pathfinder mission
• Launch: 4 December 1996
• Mars Pathfinder was originally
designed as a
technology demonstration
of a way to deliver an
instrumented lander and a
free-ranging robotic rover to
the surface of the red planet.
• Pathfinder not only
accomplished this goal but also
returned an unprecedented
amount of data and outlived
its primary design life.
Budget
• Due to limited funds, Pathfinder’s
development had to be dramatically
different from the way in which previous
spacecraft had been developed.
• Instead of the traditional 8- to 10-year
schedule and $1-billion-plus budget,
Pathfinder was developed in three years
for less than $150 million
= the cost of some Hollywood movies!
Pathfinder exploration
• Landing: 4 July 1997
Last transmission: 27 September 1997
• Pathfinder & Sojourner
Lander
• The lander was controlled by a derivative of the
commercially available
IBM RS/6000 computer (the RAD6000),
radiation-hardened to survive the flight.
• The computer featured a computing speed of
20 MIPS and 128 MB of DRAM
for storage of flight software and engineering
and science data, including images and rover
information.
• 6 MB ROM
stored flight software and time-critical data.
Rover Sojourner
• The rover, capable of autonomous
navigation and performance of tasks,
communicated with Earth via the lander.
• Sojourner’s control system was built
around an
Intel 80C85,
with a computing speed of
0.1 MIPS and 500 KB of RAM.
• ? ROM
The lander
hardware and software
[Diagram: the RS6000 CPU sits on the VMEbus together with the radio, the camera controls and the Mil 1553 bus interface. The Mil 1553 bus connects the cruise stage (thrusters, valves, a sun sensor, a star scanner) and the lander (interface to accelerometers, a radar altimeter, and an instrument for meteorological science known as the ASI/MET).]
Mil 1553 came with a specific usage paradigm:
the software schedules activity at an 8 Hz rate.
This **feature** dictated the architecture of the software
which controls both the 1553 bus and the devices attached to it.
The software
• VxWorks 5.x (x = 3 or 4?)
• 2 tasks to control the 1553 bus and the
attached instruments.
• bc_sched task (called the bus scheduler)
– controls the setup of transactions on
the 1553 bus
• bc_dist task (for distribution),
also referred to as the “communication task”
– handles the collection of the transaction
results, i.e. the data.
Marsrobot general communication pattern
• t1 - bus hardware starts via hardware control on the 8 Hz boundary.
The transactions for this cycle had been set up by the previous execution
of the bc_sched task.
• t2 - 1553 traffic is complete and the bc_dist task is awakened.
• t3 - bc_dist task has completed all of the data distribution
• t4 - bc_sched task is awakened to set up transactions for the next cycle
• t5 - bc_sched activity is complete
[Timing diagram: one 125 ms (8 Hz) cycle along the time axis, from t1 through t2, t3, t4 and t5 to the next t1, showing the Mil 1553 transaction, then bc-dist, then bc-sched. Priorities: bc-sched HIGH; bc-dist MEDIUM (slide note: "Check order!"); spacecraft functions LOW; science functions (ASI/MET, …) LOWEST.]
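The timing constraints of one cycle can be written down as a small check. This is only an illustrative sketch: the function name check_cycle and the millisecond values are hypothetical, and the only figures taken from the slides are the 8 Hz rate and the ordering of t1 to t5.

```python
CYCLE_HZ = 8
CYCLE_MS = 1000 / CYCLE_HZ  # 125.0 ms between two t1 boundaries

def check_cycle(t2, t3, t4, t5):
    """Check one cycle's event times, given in ms after the t1 boundary.

    Returns "ok" or a short description of the first violated constraint.
    """
    if not 0 <= t2 <= t3:
        return "event times out of order"
    if t3 > t4:
        # bc_dist must finish its distribution before bc_sched wakes up.
        return "bc_dist still running when bc_sched woke up"
    if not t4 <= t5 <= CYCLE_MS:
        return "bc_sched did not finish inside the cycle"
    return "ok"

# Hypothetical timings: a normal cycle and one where bc_dist overruns.
normal = check_cycle(t2=40, t3=70, t4=100, t5=110)
overrun = check_cycle(t2=40, t3=105, t4=100, t5=110)
print(CYCLE_MS, normal, overrun)
```

The second constraint is the one that matters for the rest of the case study: bc_dist finishing after bc_sched has been awakened.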
1553 communication
• Powered 1553 devices deliver data.
• Tasks in the system that access the information
collected over the 1553 do so via a
double buffered shared memory
mechanism into which the bc_dist task places
the latest data.
• The exception to this is the ASI/MET task which
is delivered its information via an interprocess
communication mechanism (IPC). The IPC
mechanism uses the VxWorks pipe() facility.
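A double buffer can be sketched in a few lines. This is a generic illustration of the idea, not the flight code: the class name DoubleBuffer and the sample fields are hypothetical, and a real implementation would have to make the index flip atomic with respect to readers.

```python
class DoubleBuffer:
    """Two slots: the writer fills the back slot, then publishes it."""

    def __init__(self, initial=None):
        self._slots = [initial, initial]
        self._front = 0  # index of the slot readers may use

    def write(self, data):
        back = 1 - self._front
        self._slots[back] = data  # readers never see a half-written slot
        self._front = back        # publish by flipping one index

    def read(self):
        return self._slots[self._front]

# The distribution task writes the latest sample; other tasks read it.
buf = DoubleBuffer()
buf.write({"cycle": 1, "altitude_m": 1200})
buf.write({"cycle": 2, "altitude_m": 1150})
latest = buf.read()
print(latest)
```

Readers always get the most recent complete sample and never wait for the writer, which is why this path does not need the pipe-based IPC.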
[Diagram: the lander hardware diagram (VMEbus, RS6000 CPU, radio, camera controls, Mil 1553 bus to the cruise stage and lander devices) with the memory (MEM) contents added: three double buffers (D-Buffer), a packed buffer, the IPC pipe and the file descriptor list.]
Dedicated Systems’ tasking graphics
model - example
[Diagram legend: thread (P3), started thread (P4), mailbox, message queue, shared data (D, D4), thread data usage.]
[Task diagram: bc-sched handles the Mil 1553 transaction setup and bc-dist the Mil 1553 data. Shared data blocks connect them to the spacecraft function tasks and the science function tasks. bc-dist feeds the ASI/MET task through a pipe, which goes through the file descriptor table guarded by System_mutex.]
IPC mechanism
• Tasks wait on one or more IPC "queues" for
messages to arrive using the
VxWorks select()
mechanism to wait for message arrival.
• Multiple queues are used when both high and
lower priority messages are required.
• Most of the IPC traffic in the system is not for
the delivery of real-time data. The exception to
this is the use of the IPC mechanism with the
ASI/MET task.
• The cause of the reset on Mars was in the use
and configuration of the IPC mechanism.
VxWorks select()
• Pending on multiple file descriptors:
this routine permits a task to pend until one of a set of
file descriptors becomes available
• Wait for multiple I/O devices (task level and driver level)
• file descriptors
– pReadFds, pWriteFds
• Bits set in pReadFds will cause select() to pend until data
becomes available on
any of the corresponding file descriptors.
• Bits set in pWriteFds will cause select() to pend until
any of the corresponding file descriptors
becomes writable.
• http://www.eelab.usyd.edu.au/tornado/docs/vxworks/ref/selectLib.html
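The pattern of one task pending on several pipe-backed queues can be tried out on a Unix host. The sketch below uses Python's select module as a stand-in for the VxWorks select() call, so it shows the idea rather than the VxWorks API; the two queues, the messages and the helper next_message are hypothetical.

```python
import os
import select

# Two pipe-backed "queues": one for urgent and one for routine messages.
high_r, high_w = os.pipe()
low_r, low_w = os.pipe()

os.write(low_w, b"routine telemetry")
os.write(high_w, b"urgent command")

def next_message(timeout=1.0):
    """Pend until a queue is readable, then serve the urgent queue first."""
    readable, _, _ = select.select([high_r, low_r], [], [], timeout)
    for fd in (high_r, low_r):
        if fd in readable:
            return fd, os.read(fd, 1024)
    return None  # timed out with nothing to read

first = next_message()
second = next_message()
print(first[1], second[1])

for fd in (high_r, high_w, low_r, low_w):
    os.close(fd)
```

Although the routine message was written first, the urgent one is delivered first, which is the reason for keeping separate high and low priority queues.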
Marsrobot design
[Diagram: three threads (labelled Thread A, Thread B and Thread C on the slide): the middle-priority, long-lasting comm thread bc_dist; a low-priority thread; and the lowest-priority, sporadic meteo thread ASI/MET. They use different I/O channels but share one resource for communication using select(), guarded by System_mutex.]
The problem
• Priority inversion
– Bounded
– Unbounded
Priority Inversion
• Priority inversion occurs when a thread of low
priority blocks the execution of threads of higher
priority.
• Priority inversion comes in two flavours:
– bounded priority inversion (common & relatively
harmless)
– unbounded priority inversion (insidious & potentially
disastrous)
• Priority inversion is not new
– the earliest mention of it that I've found dates back to
the Burroughs MCP (Master Control Program) of the
early 1970's.
Bounded Priority Inversion
• Suppose a high priority thread becomes blocked
waiting for an event to happen. A low priority
thread then starts to run and in doing so obtains
(i.e. locks) a mutex for a shared resource. While
the mutex is locked by the low priority thread,
the event occurs waking up the high priority
thread.
• Inversion takes place when the high priority
thread tries to lock the mutex held by the low
priority thread. In effect the high priority thread
must wait for the low priority thread to finish.
• It is called bounded inversion since the inversion
is limited by the duration of the critical section.
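The scenario can be replayed with a toy scheduler. This is a minimal simulation under stated assumptions, not VxWorks code: one CPU, one mutex, every step costs one tick, and a larger number means a more urgent task (the slides' 40 and 30). The Task class and simulate function are hypothetical helpers.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Task:
    name: str
    priority: int   # larger number = more urgent, as on the slides
    release: int    # tick at which the task becomes ready
    script: list    # one step per tick: "lock", "work" or "unlock"
    pc: int = 0     # index of the next step

def simulate(tasks, ticks):
    """One CPU, one mutex, fixed-priority preemptive scheduling."""
    owner = None
    timeline = ""
    for now in range(ticks):
        ready = [t for t in tasks if t.release <= now and t.pc < len(t.script)]
        # A task about to lock cannot run while another task owns the mutex.
        runnable = [t for t in ready
                    if not (t.script[t.pc] == "lock"
                            and owner is not None and owner is not t)]
        if not runnable:
            timeline += "-"  # idle tick
            continue
        task = max(runnable, key=lambda t: t.priority)
        step = task.script[task.pc]
        if step == "lock":
            owner = task
        elif step == "unlock":
            owner = None
        task.pc += 1
        timeline += task.name
    return timeline

bounded = simulate(
    [Task("A", 40, 1, ["lock", "work", "unlock"]),
     Task("C", 30, 0, ["lock", "work", "work", "unlock"])],
    ticks=8,
)
print(bounded)  # CCCCAAA-
```

High priority task A becomes ready at tick 1 but only starts at tick 4: it waits exactly as long as the rest of C's critical section, and no longer.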
Bounded priority inversion
[Timing diagram (states: run, blocked, ready): low-priority TASK C (30) locks MUTEX (m); ISR A makes high-priority TASK A (40) ready; TASK A tries to lock MUTEX (m) and blocks until TASK C unlocks it. The bounded inversion time is the interval during which TASK A waits.]
Unbounded Priority Inversion
• This is a simple elaboration on bounded
inversion. Here the high priority thread can be
blocked indefinitely by a medium priority thread.
While the medium priority thread runs, the
low priority thread cannot release the lock. All
that is required for this to happen is that while
the low priority thread has locked the mutex, the
medium priority thread becomes unblocked,
preempting the low priority thread. The medium
priority thread then runs indefinitely.
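A toy scheduler makes the "unbounded" part visible: the delay of the high priority task grows with the run time of the medium priority task. This is a simulation sketch under simple assumptions (one CPU, one mutex, one step per tick, larger number = more urgent); Task, simulate and a_start are hypothetical helpers, not VxWorks calls.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Task:
    name: str
    priority: int   # larger number = more urgent, as on the slides
    release: int    # tick at which the task becomes ready
    script: list    # one step per tick: "lock", "work" or "unlock"
    pc: int = 0     # index of the next step

def simulate(tasks, ticks):
    """One CPU, one mutex, fixed-priority preemptive scheduling."""
    owner = None
    timeline = ""
    for now in range(ticks):
        ready = [t for t in tasks if t.release <= now and t.pc < len(t.script)]
        # A task about to lock cannot run while another task owns the mutex.
        runnable = [t for t in ready
                    if not (t.script[t.pc] == "lock"
                            and owner is not None and owner is not t)]
        if not runnable:
            timeline += "-"
            continue
        task = max(runnable, key=lambda t: t.priority)
        step = task.script[task.pc]
        if step == "lock":
            owner = task
        elif step == "unlock":
            owner = None
        task.pc += 1
        timeline += task.name
    return timeline

def a_start(b_ticks):
    """Tick at which high priority task A first runs, given B's run time."""
    tasks = [Task("A", 40, 1, ["lock", "work", "unlock"]),
             Task("B", 35, 2, ["work"] * b_ticks),
             Task("C", 30, 0, ["lock", "work", "work", "unlock"])]
    return simulate(tasks, ticks=20 + b_ticks).index("A")

print(a_start(3), a_start(30))  # 7 34
```

Task A never touches anything that B uses, yet A's start is pushed back by every tick B chooses to run.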
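Unbounded priority inversion
[Timing diagram (states: run, blocked, ready): low-priority TASK C (30) locks MUTEX (m); ISR A makes high-priority TASK A (40) ready, and TASK A blocks trying to lock MUTEX (m); ISR B makes middle-priority TASK B (35) ready, and TASK B preempts TASK C. The unbounded inversion time lasts for as long as TASK B keeps TASK C from releasing the mutex.]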
Mission failure
• The failure was identified by the spacecraft as a
failure of the bc_dist task to complete its
execution before the bc_sched task started.
• The reaction to this by the spacecraft was to
reset the computer.
• This reset reinitializes all of the hardware and
software. It also terminates the execution of the
current ground commanded activities. No
science or engineering data is lost that has
already been collected (the data in RAM is
recovered so long as power is not lost).
• The remainder of the activities for that day were
not accomplished until the next day.
Marsrobot normal operation
[Timing diagram (states: run, blocked, ready) over one cycle, with four rows: HIGH bus thread bc_sched; MIDDLE comm thread bc_dist; LOW tasks; LOWEST meteo thread. A lower-priority thread locks and unlocks SystemMUTEX (m), the comm thread preempts lower-priority work, and at the end of the cycle bc_sched finds everything in order (OK!).]
Marsrobot priority inversion
[Timing diagram (states: run, blocked, ready) over one cycle, with four rows: HIGH bus thread bc_sched; MIDDLE comm thread bc_dist; LOW tasks; LOWEST meteo thread. The meteo thread locks SystemMUTEX (m) and is preempted before it can unlock it; the comm thread blocks trying to lock SystemMUTEX (m); at the end of the cycle bc_sched finds bc_dist unfinished (NOK!) and a system reset follows.]
Priority inversion
• The higher priority bc_dist task was blocked by the much
lower priority ASI/MET task that was holding a shared
resource.
• The ASI/MET task had acquired this resource and then
been preempted by several of the medium priority tasks.
• When the bc_sched task was activated, to setup the
transactions for the next 1553 bus cycle, it detected that
the bc_dist task had not completed its execution.
• The resource that caused this problem was a mutex
(here called system_mutex) used within the select()
mechanism to control access to the list of file descriptors
that the select() mechanism was to wait on.
• The select() mechanism creates a system_mutex to protect the
"wait list" of file descriptors for those devices which support select().
• The VxWorks pipe mechanism is such a device and the IPC
mechanism used is based on using pipes.
• The ASI/MET task had called select(), which had called pipeIoctl(),
which had called selNodeAdd(), which was in the process of giving
the system_mutex.
• The ASI/MET task was preempted and semGive() was not
completed.
• Several medium priority tasks ran until the bc_dist task was
activated.
• The bc_dist task attempted to send the newest ASI/MET data via
the IPC mechanism which called pipeWrite().
• pipeWrite() blocked, taking the system_mutex. More of the
medium priority tasks ran, still not allowing the ASI/MET task to run,
until the bc_sched task was awakened.
• At that point, the bc_sched task determined that the bc_dist task
had not completed its cycle (a hard deadline in the system) and
declared the error that initiated the reset.
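The sequence above can be replayed in a toy model of one bus cycle. Everything numeric here is an assumption made for illustration: the priorities (larger = more urgent, which is the opposite of the VxWorks numbering), the tick counts and the release times. The function run_cycle is hypothetical, and its inherit switch models a mutex created with priority inheritance.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Task:
    name: str
    priority: int   # larger number = more urgent (illustrative values)
    release: int    # tick at which the task becomes ready
    script: list    # one step per tick: "lock", "work" or "unlock"
    pc: int = 0     # index of the next step

def run_cycle(inherit, ticks=12):
    """Replay one cycle; return "reset" if bc_dist misses its deadline."""
    met = Task("ASI/MET", 10, 0, ["lock", "unlock"])
    medium = Task("medium", 20, 1, ["work"] * 8)
    dist = Task("bc_dist", 30, 3, ["lock", "unlock"])
    sched = Task("bc_sched", 40, 8, ["work"])
    tasks = [met, medium, dist, sched]
    owner = None
    for now in range(ticks):
        if now == sched.release and dist.pc < len(dist.script):
            return "reset"  # bc_sched wakes up and finds bc_dist unfinished
        ready = [t for t in tasks if t.release <= now and t.pc < len(t.script)]
        blocked = [t for t in ready if t.script[t.pc] == "lock"
                   and owner is not None and owner is not t]
        runnable = [t for t in ready if t not in blocked]
        if not runnable:
            continue

        def effective(t):
            # With inheritance the owner runs at its best waiter's priority.
            if inherit and t is owner and blocked:
                return max(t.priority, max(b.priority for b in blocked))
            return t.priority

        task = max(runnable, key=effective)
        step = task.script[task.pc]
        if step == "lock":
            owner = task
        elif step == "unlock":
            owner = None
        task.pc += 1
    return "ok"

print(run_cycle(inherit=False), run_cycle(inherit=True))  # reset ok
```

Without inheritance the medium priority work starves ASI/MET, bc_dist stays blocked on the mutex and the deadline check fires. With inheritance ASI/MET is boosted as soon as bc_dist blocks, releases the mutex one tick later, and the cycle completes.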
Debug the problem
• On a replica on Earth
• Total tracing on
– Context switches
– Uses of synchronisation objects
– Interrupts
• Took time to reproduce the error
• Trace analysis => priority inversion
problem
Bug Detection
• The software that flies on Mars Pathfinder has
several debug features within it that are used in
the lab but are not used on the flight spacecraft
(not used because some of them produce more
information than we can send back to Earth).
• These features remain in the software by design
because JPL strongly believes in the
"test what you fly and fly what you test"
philosophy.
• One of these tools is a trace/log facility which was
originally developed to find a bug in an early version of
the VxWorks port (Wind River ported VxWorks to the
RS6000 processor for us for this mission).
• This trace/log facility was built by David Cummings who
was one of the software engineers on the task. Lisa
Stanley, of Wind River, took this facility and instrumented
the pipe services, msgQ services, interrupt handling,
select services, and the tExec task.
• The facility initializes at startup and continues to collect
data (in ring buffers) until told to stop. The facility
produces a voluminous dump of information when
asked.
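The behaviour described here (collect continuously into a ring buffer, stop on request, then dump) can be sketched as follows. The class TraceLog and the event names are hypothetical; the real facility is not shown in the source.

```python
from collections import deque

class TraceLog:
    """Keep only the most recent events; stop and dump on request."""

    def __init__(self, capacity):
        self._events = deque(maxlen=capacity)  # oldest entries fall off
        self._running = True

    def record(self, tick, event):
        if self._running:
            self._events.append((tick, event))

    def stop_and_dump(self):
        self._running = False
        return list(self._events)

trace = TraceLog(capacity=4)
events = ["semTake", "context switch", "pipeWrite",
          "semGive", "context switch", "select"]
for tick, event in enumerate(events):
    trace.record(tick, event)

dump = trace.stop_and_dump()       # e.g. triggered when an error is declared
trace.record(99, "ignored after stop")
print(dump)
```

Because the buffer is stopped at the moment the error is declared, the dump holds the events leading up to the failure rather than whatever happened afterwards.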
System tracing
• Traces system call or OS events
• Uses circular buffer
• Overhead
• RT if ......
[Diagram: layered view with the tasks (ticker routine, ASI/MET, bc_dist, bc_sched, Tx, Ty) running on VxWorks 5.x with the TRACE facility, above the physical I/O (BIOS) and the hardware.]
• After the problem occurred on Mars, JPL ran
the same set of activities over and over again in
the lab.
• The bc_sched was already coded so as to stop
the trace/log collection and dump the data (even
though JPL knew they could not get the dump in
flight) for this error.
• So, when JPL went into the lab to test it they did
not have to change the software.
• In less than 18 hours JPL was able to cause the
problem to occur. Once they were able to
reproduce the failure the priority inversion
problem was obvious.
Problem correction (1)
• Once JPL understood the problem the fix
appeared obvious:
change the creation flags for the
semaphore so as to enable the priority
inheritance.
• The Wind River folks, for many of their services,
supply global configuration variables for
parameters such as the "options" parameter for
the semMCreate used by the select service
(although this is not documented and those who do not have VxWorks
source code or have not studied the source code might be unaware of this
feature).
Problem correction (2)
• However, the fix is not so obvious for several reasons
1. The code for this is in selectLib and is common for all
device creations. When you change this global variable all of the
select semaphores created after that point will be created with
the new options. There was no easy way in our initialization
logic to only modify the semaphore associated with the pipe
used for bc_dist task to ASI/MET task communications.
2. If you make this change, and it is applied on a global basis, how
will this change the behavior of the rest of the system?
3. The priority inheritance option was deliberately left out by Wind
River in the default selectLib service for optimum
performance.
How will performance degrade if we turn priority inheritance
on?
4. Was there some intrinsic behavior of the select mechanism itself
that would change if priority inheritance was enabled?
Problem correction (3)
• JPL did end up modifying the global variable to include priority
inheritance. This corrected the problem.
• JPL asked Wind River to analyze the potential impacts for (3) and
(4).
• They concluded that the performance impact would be minimal and
that the behavior of select() would not change so long as there was
always only one task waiting for any particular file descriptor.
This is true in our system. JPL believes that the debate at Wind
River still continues on whether the priority inheritance option should
be on as the default.
• For (1) and (2) the change did alter the characteristics of all of the
select mutexes. JPL concluded, both by analysis and test, that there
was no adverse behavior. JPL tested the system extensively before
they changed the software on the spacecraft.
Changing the software on the spacecraft
• JPL did not use the VxWorks shell to change the
software
(although the shell is usable on the spacecraft).
• The process of "patching" the software on the
spacecraft is a specialized process. It involves
sending the differences between what you have
onboard and what you want (and have on Earth)
to the spacecraft.
• Custom software on the spacecraft (with a
whole bunch of validation) modifies the onboard
copy.
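The idea of sending only the differences and validating the result on board can be sketched as below. The actual patch format and validation used by JPL are not described in the source, so the byte-level diff, the SHA-256 check and the names make_patch and apply_patch are all assumptions chosen for illustration.

```python
import hashlib

def make_patch(onboard, wanted):
    """Ground side: list (offset, new_byte) pairs plus a digest of the goal."""
    if len(onboard) != len(wanted):
        raise ValueError("this sketch only handles equal-length images")
    diffs = [(i, new) for i, (old, new) in enumerate(zip(onboard, wanted))
             if old != new]
    return diffs, hashlib.sha256(wanted).hexdigest()

def apply_patch(onboard, diffs, expected_digest):
    """Spacecraft side: patch a copy and accept it only if it validates."""
    image = bytearray(onboard)
    for offset, value in diffs:
        image[offset] = value
    patched = bytes(image)
    if hashlib.sha256(patched).hexdigest() != expected_digest:
        raise ValueError("validation failed; keeping the onboard copy")
    return patched

onboard = bytes([0x00, 0x01, 0x02, 0x03])   # what is on the spacecraft
wanted = bytes([0x00, 0x09, 0x02, 0x03])    # what the ground wants
diffs, digest = make_patch(onboard, wanted)
patched = apply_patch(onboard, diffs, digest)
print(diffs, patched == wanted)  # [(1, 9)] True
```

Only one (offset, value) pair has to be uplinked here, and a patch that does not reproduce the intended image is rejected before it replaces the onboard copy.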
Why didn’t JPL catch it before launch?
• The problem would only manifest itself when
ASI/MET data was being collected and
intermediate tasks were heavily loaded.
• Our pre-launch testing was limited to the
"best case" high data rates and science
activities.
• The fact that data rates from the surface were
higher than anticipated and the amount of
science activities proportionally greater served to
aggravate the problem.
• We did not expect nor test the "better than we
could have ever imagined" case.
Lessons learned
• Only detailed traces of actual system behavior enabled
the faulty execution sequence to be captured and
identified.
• Leaving the "debugging" facilities in the system
saved the day. Without the ability to modify the system
in the field, the problem could not have been corrected.
• Finally, the engineer's initial analysis that
"the data bus task executes very frequently and is
time-critical -- we shouldn't spend the extra time in it to
perform priority inheritance"
was exactly wrong.
• It is precisely in such time critical and important
situations where correctness is essential, even at
some additional performance cost.
Lessons learned – human factors
• JPL engineers later confessed that one or two system
resets had occurred in their months of pre-flight testing.
They had never been reproducible or explainable, and so
the engineers, in a very human-nature response of
denial, decided that they probably weren't important,
using the rationale "it was probably caused by a
hardware glitch".
• Part of it too was the engineers' focus. They were
extremely focused on ensuring the quality and flawless
operation of the landing software. Should it have failed,
the mission would have been lost. It is entirely
understandable for the engineers to discount occasional
glitches in the less-critical land-mission software,
particularly given that a spacecraft reset was a viable
recovery strategy at that phase of the mission.
Priority inversion solution
implementations
• History
• Priority Inheritance – pros and cons
• Priority ceiling – pros and cons
History
• Theory provides (at least) two simple solutions to priority inversion:
– Priority Inheritance Protocol
– Priority Ceiling Protocol
• The first is the simplest, while the second has nicer theoretical
properties.
• The theoretical results (about both) date back to about 1987, while
the actual protocols date back considerably earlier.
• Burroughs MCP implemented a version of Priority Ceiling Protocol in
the 1970's
• Lampson & Redell suggest the Priority Ceiling Protocol (in Mesa) in
the late 1970's
• Important IEEE paper by L. Sha, R. Rajkumar & P. Lehoczky. Priority
Inheritance Protocols: An Approach to Real-Time Synchronization.
IEEE Transactions on Computers, vol. 39, pp. 1175-1185, Sep 1990
Priority Inheritance Protocol
• Priority Inheritance means that when a thread waits on a mutex
owned by a lower priority thread, the priority of the owner is
increased to that of the waiter. In the priority inheritance protocol
when a thread locks a mutex its priority is not changed. The action
takes place when a thread attempts to lock a mutex owned by
another thread.
• In this situation the priority of the thread owning the mutex is raised
to the priority of the blocked thread (if higher).
• When the thread releases the mutex its old priority (i.e. prior to
locking this mutex) is restored.
• This prevents unbounded priority inversion since the low priority
thread gets a high priority and thus cannot be pre-empted by
a medium priority thread.
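The effect of the protocol can be checked in a toy scheduler. This is a simulation sketch, not an RTOS implementation: one CPU, one mutex, one step per tick, larger number = more urgent. Task, simulate and a_start are hypothetical helpers; the inherit flag switches the protocol on.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Task:
    name: str
    priority: int   # larger number = more urgent
    release: int    # tick at which the task becomes ready
    script: list    # one step per tick: "lock", "work" or "unlock"
    pc: int = 0     # index of the next step

def simulate(tasks, ticks, inherit):
    """Fixed-priority scheduling of one mutex, optionally with inheritance."""
    owner = None
    timeline = ""
    for now in range(ticks):
        ready = [t for t in tasks if t.release <= now and t.pc < len(t.script)]
        blocked = [t for t in ready if t.script[t.pc] == "lock"
                   and owner is not None and owner is not t]
        runnable = [t for t in ready if t not in blocked]
        if not runnable:
            timeline += "-"
            continue

        def effective(t):
            # The owner is raised to the priority of its highest waiter.
            if inherit and t is owner and blocked:
                return max(t.priority, max(b.priority for b in blocked))
            return t.priority

        task = max(runnable, key=effective)
        step = task.script[task.pc]
        if step == "lock":
            owner = task
        elif step == "unlock":
            owner = None  # the boost ends here: no owner, no inheritance
        task.pc += 1
        timeline += task.name
    return timeline

def a_start(b_ticks, inherit):
    """Tick at which high priority task A first runs."""
    tasks = [Task("A", 40, 1, ["lock", "work", "unlock"]),
             Task("B", 35, 2, ["work"] * b_ticks),
             Task("C", 30, 0, ["lock", "work", "work", "unlock"])]
    return simulate(tasks, 20 + b_ticks, inherit).index("A")

print(a_start(30, inherit=False), a_start(30, inherit=True))  # 34 4
```

With inheritance, A's delay is again just the remainder of C's critical section, however long the medium priority task B wants to run.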
Priority Inheritance Protocol (cont'd)
• The theoretical results concerning the Priority
Inheritance Protocol are:
– A thread can only be blocked once by each thread of
lower priority, and the duration of each blockage is
limited to one critical section.
– If there are n mutexes (which can block a thread), the
thread can be blocked at most n times.
– It does not prevent deadlock.
– Blocking can be prolonged
(i.e. all blocking can be chained together).
• The practical aspects of the Priority Inheritance Protocol
are:
– It is easy to program using the protocol.
– It is complicated to implement.
Priority Ceiling Protocol
• Priority Ceiling means that while a thread owns the
mutex it runs at a priority higher than any other thread
that may acquire the mutex.
• In the priority ceiling solution each shared mutex is
initialised to a priority ceiling.
• Whenever a thread locks this mutex, the priority of the
thread is raised to the priority ceiling.
• This works as long as the priority ceiling is greater than
the priorities of any thread that may lock the mutex
(hence its name).
• Note again how this solves the unbounded priority
inversion problem.
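The rule "raise the locker to the ceiling as soon as it locks" can be tried in a toy scheduler. This sketch assumes one CPU, a single mutex, one step per tick and larger number = more urgent; with only one mutex it does not exercise any additional locking rules a full protocol may need. Task, scenario and simulate are hypothetical helpers, and the ceiling of 41 is chosen to sit above the highest locker (40).

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Task:
    name: str
    priority: int   # larger number = more urgent
    release: int    # tick at which the task becomes ready
    script: list    # one step per tick: "lock", "work" or "unlock"
    pc: int = 0     # index of the next step

def scenario():
    """Fresh task set: B arrives while C holds the mutex, A arrives later."""
    return [Task("A", 40, 5, ["lock", "work", "unlock"]),
            Task("B", 35, 1, ["work"] * 3),
            Task("C", 30, 0, ["lock", "work", "work", "unlock"])]

def simulate(tasks, ticks, ceiling=None):
    """Fixed-priority scheduling; the mutex owner runs at the ceiling."""
    owner = None
    timeline = ""
    for now in range(ticks):
        ready = [t for t in tasks if t.release <= now and t.pc < len(t.script)]
        runnable = [t for t in ready
                    if not (t.script[t.pc] == "lock"
                            and owner is not None and owner is not t)]
        if not runnable:
            timeline += "-"
            continue

        def effective(t):
            if ceiling is not None and t is owner:
                return ceiling  # raised for as long as the mutex is held
            return t.priority

        task = max(runnable, key=effective)
        step = task.script[task.pc]
        if step == "lock":
            owner = task
        elif step == "unlock":
            owner = None
        task.pc += 1
        timeline += task.name
    return timeline

plain = simulate(scenario(), ticks=10)
with_ceiling = simulate(scenario(), ticks=10, ceiling=41)
print(plain, with_ceiling)  # CBBBCCCAAA CCCCBAAABB
```

Unlike inheritance, the boost applies from the moment C locks the mutex, before any higher priority task is waiting: B cannot preempt C inside the critical section, so the mutex is already free when A arrives at tick 5.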
Priority Ceiling Protocol (cont'd)
• The theoretical results concerning the Priority
Ceiling Protocol also require a scheduling rider:
– A thread can lock a mutex only if its priority is higher
than the ceilings of all other locked mutexes.
• The theoretical results are then:
– The protocol prevents deadlock.
– A thread can only be blocked by one other thread's (maximal)
critical section.
– The protocol introduces a new form of blocking (i.e. the
scheduling rider).
• The practical results are:
– It is easy to implement (modulo the scheduling rider).
– It requires careful programming (i.e. correct choices of
ceilings).
POSIX Solutions
• POSIX provides both solutions:
– the priority ceiling protocol
(although it doesn't seem to require the scheduling rider)
– the priority inheritance protocol
• It also allows for three scheduling algorithms on a
thread by thread basis:
– FIFO (used for high priority threads)
– Round Robin (used for routine threads), and
– other (a standard way to be non-standard).
• It also allows for the tweaking of which threads
compete with which other threads (contention scope)
Java's Solutions
• Java doesn't specify a scheduling policy:
Each thread has a priority that is used by the Java runtime in scheduling
threads for execution. A thread that has a higher priority than another
thread is typically scheduled ahead of the other thread. However, the way
thread priorities precisely affect scheduling is platform dependent. In some
systems, priority-based scheduling is guaranteed, while in others, priorities
act only as hints to the scheduler. Therefore you should not depend on
priorities in designing your program.
from page 1359 of The Java Class Libraries
• The Java Library fails to even specify protocols for avoiding priority
inversion.
• The reason for this total cop-out on the part of Java is perhaps
its desire for platform independence, since it has to run on a variety of
operating systems (pre-emptive (Microsoft) vs non-pre-emptive (Apple)).
• Consequently, multithreaded Java programs can run well on one
operating system, and not run at all on another.
Case study Conclusions
Conclusions
• The Pathfinder Problem was fixed by
simply flicking a switch.
• The system_mutex merely had to be
initialized with priority inheritance turned
on.
• Priority inversion is not new but a lot of
designers ignore it.
• There are undocumented features in an
RTOS.
Application Design Advice
• Rules
– No other function call between the P and V operations
– Critical code should be as short as possible – in most
cases people don’t know what they are really doing
and they lock too much in order to be “sure”
– Never use a mutex inside a lock of another mutex
• Multi-processor situation is much more difficult
• Both protocols (priority inheritance and priority ceiling) are OK to solve the problem
• Priority ceiling is simpler to implement in the OS
but needs design attention.
Other interesting info
• Streamlined Design Approach Lands Mars
Pathfinder
– Steven A. Stolper, ComTier