Transcript [Slides]

Learning Reactive Behavior
in Autonomous Vehicles:
SAMUEL
• Sanaa Kamari
SAMUEL
• Computer system that learns reactive
behavior for autonomous vehicles.
– Reactive behavior is the set of actions taken
by an AV as a reaction to sensor readings.
• Uses a genetic algorithm to improve
decision-making rules.
• Each individual in SAMUEL is an entire
rule set or strategy.
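The idea that one individual is a whole rule set, not a single rule, can be sketched as a small data structure. This is only an illustration: the names Rule, Strategy, conditions, action and fitness are assumptions made for this sketch and are not taken from SAMUEL itself.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    # Hypothetical encoding: sensor name -> (low, high) range the reading must fall in.
    conditions: dict
    # Hypothetical encoding: (control name, value to set).
    action: tuple
    strength: float = 0.5

@dataclass
class Strategy:
    # One individual in the population is a complete rule set.
    rules: list = field(default_factory=list)
    fitness: float = 0.0

# A population is a list of competing strategies, each carrying its own rules.
population = [
    Strategy(rules=[Rule({"range": (35, 45)}, ("turn", -24), 0.8)])
    for _ in range(4)
]
```

With this layout, the genetic algorithm selects and recombines Strategy objects as units, while rule strengths live inside each strategy.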
Motivation for SAMUEL
• Learning facilitates extraction of rules from
the expert.
• Rules are context based => impossible to
account for every situation.
– Given a set of conditions, the system is able
to learn the rules of operation by observing
and recording its own actions.
• SAMUEL uses a simulation environment to
learn.
SAMUEL
• Problem specific module.
– The world model and its
interface.
– Set of internal and external
sensors
– Controllers that control the
AV simulator
– Critic component that
evaluates the success or
failure of the AV.
[1]
SAMUEL (cont)
• Performance module.
– Matches the rules.
– Performs conflict resolution.
– Assigns strength values
to the rules.
• Learning module.
– Uses GA to develop reactive
behavior, as a set of
condition-reaction rules.
• GA searches for the behavior that
exhibits the best performance.
– Behaviors are evaluated in
real world model.
– Behaviors are selected for
duplication and modification.
[1]
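The evaluate, select, duplicate and modify cycle described on this slide can be sketched as a generic genetic algorithm. This is not SAMUEL's actual implementation: the strategy encoding (a list of turn values, one per rule), the toy evaluate function standing in for the simulator and critic, and the crossover and mutation operators are all assumptions chosen to keep the sketch short.

```python
import random

def evaluate(strategy):
    """Stand-in for the critic: a real system would run the strategy in the simulator."""
    # Toy score (assumption): reward turn values close to an arbitrary target of -24.
    return -sum(abs(turn + 24) for turn in strategy) / len(strategy)

def crossover(a, b, rng):
    """Combine two parent strategies at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(strategy, rng, rate=0.2):
    """Modify some rule actions by a small random amount."""
    return [t + rng.randint(-5, 5) if rng.random() < rate else t for t in strategy]

def evolve(pop_size=20, n_rules=4, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(-45, 45) for _ in range(n_rules)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the better half survives unchanged (duplication).
        ranked = sorted(population, key=evaluate, reverse=True)
        parents = ranked[: pop_size // 2]
        # Modification: children are recombined and mutated copies of parents.
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=evaluate)

best = evolve()
```

Because the better half of each generation is kept unchanged, the best score in the population never gets worse from one generation to the next.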
Experiment Domain:
Autonomous Underwater
Vehicle navigation and
collision avoidance
• Training the AUV simulator by virtually
positioning it in the center of a field with 25
mines, and an objective outside the field.
• 2D AUV must navigate through a dense mine
field toward a stationary object.
• AUV Actions: set speed and direction each
decision cycle.
• System does not learn path, but a set of rules
that reactively decide a move at each step.
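A reactive decision cycle, in which the vehicle chooses speed and direction from its current sensor readings and does not follow a stored path, might look like the sketch below. Everything here is hypothetical: the sensor names, the thresholds, and the hand-written policy in react are illustrative stand-ins for rules that SAMUEL would learn.

```python
import math

def sense(pos, heading, mines, goal):
    """Hypothetical sensors: range and bearing to nearest mine, bearing to goal."""
    def bearing_to(p):
        ang = math.degrees(math.atan2(p[1] - pos[1], p[0] - pos[0])) - heading
        return (ang + 180) % 360 - 180  # normalise to [-180, 180)
    nearest = min(mines, key=lambda m: math.dist(pos, m))
    return {"mine_range": math.dist(pos, nearest),
            "mine_bearing": bearing_to(nearest),
            "goal_bearing": bearing_to(goal)}

def react(readings):
    """Hand-written reactive policy (not learned): returns (speed, turn in degrees)."""
    if readings["mine_range"] < 10 and abs(readings["mine_bearing"]) < 45:
        # A mine is close and roughly ahead: slow down and turn away from it.
        return 1.0, (-30 if readings["mine_bearing"] >= 0 else 30)
    # Otherwise steer toward the goal, limited to 30 degrees per cycle.
    return 2.0, max(-30, min(30, readings["goal_bearing"]))

def step(pos, heading, speed, turn):
    """Apply one decision cycle of motion in the 2D plane."""
    heading = (heading + turn) % 360
    rad = math.radians(heading)
    return (pos[0] + speed * math.cos(rad), pos[1] + speed * math.sin(rad)), heading

readings = sense((0, 0), 0, mines=[(5, 0)], goal=(100, 0))
speed, turn = react(readings)
```

No path is stored anywhere: each cycle calls sense, then react, then step, so the route emerges from repeated reactions to the current readings.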
Experiment Results
• Great improvement in
performance against both
static and moving mines.
• SAMUEL shows that
reactive behavior can be
learned.
[1]
Domain: ROBOT
Continuous and embedded
learning
• To create Autonomous systems that
continue to learn throughout their lives.
• To adapt a robot’s behavior in response to
changes in its operating environment and
capabilities.
• Experiment: robot learns to adapt to failure
in its sonar sensors.
Continuous and Embedded
learning Model
• Execution module:
controls the robot’s
interaction with its
environment.
• Learning module:
continuously tests new
strategies for the robot
against a simulation
model of its environment.
[2]
Execution Module
• Includes a rule-based system that operates on
reactive (stimulus-response) rules.
– IF range = [35, 45] AND front sonar < 20 AND right
sonar > 50 THEN SET turn = -24 (Strength 0.8)
• Monitor: Identifies symptoms of sonar failure.
– Measures the output of each sonar and compares it
to recent readings and direction of motion.
– Modifies the simulation used by the learning system
to replicate the failure.
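Matching stimulus-response rules against sensor readings can be sketched as follows, using the rule from this slide as the first entry. The second rule is hypothetical and exists only to create a conflict. Picking the strongest matching rule is an assumption made for this sketch; the slides say only that conflict resolution uses rule strengths.

```python
RULES = [
    # IF range = [35, 45] AND front sonar < 20 AND right sonar > 50
    # THEN SET turn = -24 (Strength 0.8)
    {"conditions": [("range", "in", (35, 45)),
                    ("front_sonar", "<", 20),
                    ("right_sonar", ">", 50)],
     "action": ("turn", -24), "strength": 0.8},
    # Hypothetical competing rule, added only to illustrate conflict resolution.
    {"conditions": [("front_sonar", "<", 20)],
     "action": ("turn", 30), "strength": 0.5},
]

def holds(condition, readings):
    """Check one condition against the current sensor readings."""
    sensor, op, value = condition
    x = readings[sensor]
    if op == "in":
        return value[0] <= x <= value[1]
    if op == "<":
        return x < value
    if op == ">":
        return x > value
    raise ValueError(f"unknown operator: {op}")

def select_action(readings, rules=RULES):
    """Return the action of the strongest matching rule, or None if none match."""
    matching = [r for r in rules
                if all(holds(c, readings) for c in r["conditions"])]
    if not matching:
        return None
    return max(matching, key=lambda r: r["strength"])["action"]

action = select_action({"range": 40, "front_sonar": 10, "right_sonar": 60})
```

Here both rules match the readings, and the slide's rule wins because its strength (0.8) is higher than that of the competing rule (0.5).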
Learning Module
• Uses SAMUEL, which uses a genetic algorithm to
improve decision-making rules.
Experiment
• Task requires Robot to go
from one side of a room to
the other through an
opening.
• Robot placed randomly 4 ft
from back wall.
• Location of opening is
random.
• Center of front wall is 12.5 ft
from back wall.
Experiment (cont)
• Robot begins with a set of default rules for moving toward the
goal.
• Learning starts with a simulation in which all sonars are
working.
• After an initial period, one or more sonars are blinded.
• Monitor detects failed sonars, learning simulation is adjusted
to reflect failure.
• Population of competing strategies is re-initialized and
learning continues.
• The online robot uses the best rules discovered by the
learning system since the last change to the learning
simulation model.
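The control flow of this experiment (monitor detects a failure, the simulation is adjusted, the population is re-initialized, learning continues) can be sketched as below. All names are hypothetical, strategies are placeholder strings, and the failure heuristic in detect_failed (a sonar whose readings do not change while the robot moves) is an assumption made for this sketch, not the monitor described in [2].

```python
def detect_failed(history, moving, tolerance=0.0):
    """Hypothetical monitor: flag sonars whose recent readings stay flat while moving."""
    if not moving:
        return set()
    return {name for name, readings in history.items()
            if len(readings) >= 3 and max(readings) - min(readings) <= tolerance}

class LearningSystem:
    def __init__(self, pop_size=10):
        self.pop_size = pop_size
        self.blinded = set()  # sonars disabled in the learning simulation
        self.reset()

    def reset(self):
        """Re-initialize the population of competing strategies."""
        self.population = [f"strategy-{i}" for i in range(self.pop_size)]
        self.generation = 0
        self.best_since_change = None

    def report_failure(self, sonars):
        """Adjust the simulation to reflect newly failed sonars, then restart learning."""
        new = set(sonars) - self.blinded
        if new:
            self.blinded |= new
            self.reset()

    def learn_one_generation(self):
        """Placeholder for one GA generation run against the current simulation."""
        self.generation += 1
        self.best_since_change = f"best-gen-{self.generation}"

learner = LearningSystem()
learner.learn_one_generation()
failed = detect_failed({"front": [0, 0, 0], "left": [30, 28, 25]}, moving=True)
learner.report_failure(failed)
```

After report_failure, best_since_change is empty until the next generation completes, which mirrors the rule that the robot only uses strategies found since the last change to the simulation model.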
Experiment Results
• Robot in motion with all
sensors intact:
– a) during run and b) at goal.
• Robot in motion after
adapting to loss of three
sensors: front, front right
and right:
– a) during run, and b) at goal.
[2]
Experiment Results
[2]
• a) Robot with full sensors passing directly through
doorway.
• b) Robot with front sonar covered.
• c) Robot after adapting to covered sonar. It uses side
sonar to find opening, and then turns into the opening.
References
• [1]. A. C. Schultz and J. J. Grefenstette, “Using a
genetic algorithm to learn reactive behavior for
autonomous vehicles,” in Proceedings of the
AIAA Guidance, Navigation, and Control
Conference, (Hilton Head, SC), 1992.
• [2]. A. C. Schultz and J. J. Grefenstette,
“Continuous and Embedded Learning in
Autonomous Vehicles: Adapting to Sensor
Failures,” in Proceedings of SPIE, vol. 4024, pp.
55-62, 2000.