Transcript [Slides]
Learning Reactive Behavior in Autonomous Vehicles: SAMUEL
• Sanaa Kamari

SAMUEL
• A computer system that learns reactive behavior for autonomous vehicles.
– Reactive behavior is the set of actions taken by an AV in reaction to its sensor readings.
• Uses a genetic algorithm (GA) to improve decision-making rules.
• Each individual in SAMUEL is an entire rule set, or strategy.

Motivation for SAMUEL
• Learning eases the extraction of rules from the expert.
• Rules are context based => impossible to account for every situation in advance.
– Given a set of conditions, the system learns the rules of operation by observing and recording its own actions.
• SAMUEL uses a simulation environment to learn.

SAMUEL
• Problem-specific module:
– The world model and its interface.
– A set of internal and external sensors.
– Controllers that control the AV simulator.
– A critic component that evaluates the success or failure of the AV. [1]

SAMUEL (cont)
• Performance module:
– Matches the rules.
– Performs conflict resolution.
– Assigns strength values to the rules.
• Learning module:
– Uses the GA to develop reactive behavior as a set of condition-action rules.
– The GA searches for the behavior that exhibits the best performance.
– Behaviors are evaluated in the world model.
– Behaviors are selected for duplication and modification. [1]

Experiment
• Domain: Autonomous Underwater Vehicle (AUV) navigation and collision avoidance.
• The AUV simulator is trained by virtually positioning it in the center of a field with 25 mines, with an objective outside the field.
• The 2D AUV must navigate through a dense mine field toward a stationary object.
• AUV actions: set speed and direction each decision cycle.
• The system does not learn a path, but a set of rules that reactively decide a move at each step.

Experiment Results
• Great improvement with both static and moving mines.
• SAMUEL shows that reactive behavior can be learned.
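The core idea above (each GA individual is a complete rule set, evaluated as a whole in a simulated world, then selected and modified) can be sketched as follows. The rule encoding, fitness stand-in, and operators here are illustrative assumptions, not SAMUEL's actual implementation:

```python
import random

def random_rule():
    # Hypothetical rule: a sensor-reading interval (condition) and a turn (action).
    lo = random.uniform(0, 50)
    return {"range": (lo, lo + random.uniform(5, 30)),
            "turn": random.choice([-30, -15, 0, 15, 30])}

def random_strategy(n_rules=5):
    # One GA individual = an entire rule set (strategy), not a single rule.
    return [random_rule() for _ in range(n_rules)]

def evaluate(strategy):
    # Stand-in for running the whole strategy in the simulated world model:
    # here we simply count rules whose condition covers a probe reading.
    reading = 25.0
    return sum(1.0 for r in strategy if r["range"][0] <= reading <= r["range"][1])

def mutate(strategy):
    # Modification operator: copy the strategy and perturb one rule's action.
    child = [dict(r) for r in strategy]
    random.choice(child)["turn"] = random.choice([-30, -15, 0, 15, 30])
    return child

def evolve(pop_size=20, generations=10):
    population = [random_strategy() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=evaluate, reverse=True)
        survivors = population[: pop_size // 2]          # selection for duplication
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return max(population, key=evaluate)

best = evolve()
```

The key design point mirrored from the slides is the "Pitt approach": fitness is assigned to a whole strategy from its simulated performance, never to an individual rule in isolation.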
[1]

Continuous and Embedded Learning
• Domain: robot.
• Goal: to create autonomous systems that continue to learn throughout their lives.
• To adapt a robot's behavior in response to changes in its operating environment and capabilities.
• Experiment: the robot learns to adapt to failures in its sonar sensors.

Continuous and Embedded Learning Model
• Execution module: controls the robot's interaction with its environment.
• Learning module: continuously tests new strategies for the robot against a simulation model of its environment. [2]

Execution Model
• Includes a rule-based system that operates on reactive (stimulus-response) rules:
– IF range = [35, 45] AND front sonar < 20 AND right sonar > 50 THEN SET turn = -24 (Strength 0.8)
• Monitor: identifies symptoms of sonar failure.
– Measures each sonar's output and compares it with recent readings and the direction of motion.
– Modifies the simulation used by the learning system to replicate the failure.

Learning Module
• Uses SAMUEL, which uses a genetic algorithm to improve decision-making rules.

Experiment
• The task requires the robot to go from one side of a room to the other through an opening.
• The robot is placed randomly 4 ft from the back wall.
• The location of the opening is random.
• The center of the front wall is 12.5 ft from the back wall.

Experiment (cont)
• The robot begins with a set of default rules for moving toward the goal.
• Learning starts with a simulation in which all sonars are working.
• After an initial period, one or more sonars are blinded.
• The monitor detects the failed sonars, and the learning simulation is adjusted to reflect the failure.
• The population of competing strategies is re-initialized, and learning continues.
• The online robot uses the best rules discovered by the learning system since the last change to the learning simulation model.

Experiment Results
• Robot in motion with all sensors intact: a) during the run and b) at the goal.
• Robot in motion after adapting to the loss of three sensors (front, front right, and right): a) during the run and b) at the goal.
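The execution module's rule matching can be sketched using the slide's example rule. Resolving conflicts by picking the strongest matching rule is an assumption consistent with the performance module's strength values; the default "go straight" rule is hypothetical:

```python
rules = [
    # IF range in [35, 45] AND front sonar < 20 AND right sonar > 50
    # THEN SET turn = -24 (strength 0.8) -- the rule from the slide.
    {"cond": lambda s: 35 <= s["range"] <= 45
                       and s["front"] < 20 and s["right"] > 50,
     "turn": -24, "strength": 0.8},
    # Hypothetical low-strength default: go straight when nothing else fires.
    {"cond": lambda s: True, "turn": 0, "strength": 0.1},
]

def decide(sensors):
    # Match every rule whose condition holds, then resolve conflicts
    # by choosing the matching rule with the highest strength.
    matched = [r for r in rules if r["cond"](sensors)]
    return max(matched, key=lambda r: r["strength"])["turn"]

print(decide({"range": 40, "front": 15, "right": 60}))  # → -24
print(decide({"range": 40, "front": 30, "right": 60}))  # → 0
```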
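The monitor's idea of comparing a sonar's current output with recent readings can be sketched as below. The window size, threshold, and "frozen output while moving" heuristic are illustrative assumptions, not the paper's actual failure detector:

```python
from collections import deque

class SonarMonitor:
    """Flags a sonar whose output stops changing even though the robot moves."""

    def __init__(self, window=5, threshold=1e-3):
        self.history = deque(maxlen=window)   # recent readings from one sonar
        self.threshold = threshold

    def update(self, reading, moving):
        # Record the reading; once the window is full, a near-zero spread
        # during motion suggests the sonar is stuck (blinded/covered).
        self.history.append(reading)
        if moving and len(self.history) == self.history.maxlen:
            spread = max(self.history) - min(self.history)
            return spread < self.threshold
        return False

monitor = SonarMonitor()
failed = False
for r in [30.0, 30.0, 30.0, 30.0, 30.0]:   # frozen readings while moving
    failed = monitor.update(r, moving=True) or failed
print(failed)  # → True
```

On a detected failure, the corresponding sensor would be disabled in the learning simulation, mirroring the slides' "modifies the simulation to replicate the failure" step.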
[2]

Experiment Results [2]
• a) Robot with full sensors passing directly through the doorway.
• b) Robot with its front sonar covered.
• c) Robot after adapting to the covered sonar: it uses a side sonar to find the opening, then turns into the opening.

References
• [1] A. C. Schultz and J. J. Grefenstette, "Using a genetic algorithm to learn reactive behavior for autonomous vehicles," in Proceedings of the AIAA Guidance, Navigation, and Control Conference, Hilton Head, SC, 1992.
• [2] A. C. Schultz and J. J. Grefenstette, "Continuous and embedded learning in autonomous vehicles: Adapting to sensor failures," in Proceedings of SPIE, vol. 4024, pp. 55-62, 2000.