Option & Constraint - Cognitive Engineering Center


Option and Constraint Generation
using Work Domain Analysis
Presenter: Guliz Tokadli
Dr. Karen Feigh
Introduction
Interactive Machine Learning
Reinforcement Learning
Work Domain Analysis
Micro-world of Pac-Man
Introduction
• Goal: Develop a method to exploit experienced humans' knowledge to improve machine learning algorithms, using work domain analysis techniques from cognitive engineering.
• Current practice: Using programmer-derived or auto-derived primitives, options, and constraints, or inferring these from human coaching or demonstration.
• How our approach is new: It provides a systematic way to mine a human's knowledge about a domain and to translate it into a hierarchical goal structure.
• How we will know we've succeeded: We will be able to generate better machine learning algorithms than current practice, in a shorter period of time and with less expert intervention.
Interactive Machine Learning
Our objective: robots and other machines that can learn from people unfamiliar with machine learning algorithms.
Reinforcement Learning is a method to generate policies for an agent tasked with making decisions; it maximizes the reward to be gained. Learning yields an action policy: state → action.

Reinforcement Learning – Option & Constraint
GOAL is the ultimate purpose.
CONSTRAINTS are defined as the set of all state-action pairs that the agent should not take, e.g., "Don't do X," "Don't move onto an unscared ghost."
OPTIONS are used to generalize primitive actions to include temporally extended courses of action. An option O: <I, π, β>.
PRIMITIVES are the set of fundamental actions the agent can effect, such as Up, Right, Left, Down. (A code sketch of these definitions follows.)
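To make the four definitions above concrete, here is a minimal, illustrative Python sketch, not the presenters' implementation: an option is represented as the tuple <I, π, β>, and the constraint set masks forbidden state-action pairs during ε-greedy action selection. The names (State, allowed_actions, epsilon_greedy) are our own assumptions.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

State = Tuple[int, int]                       # e.g. Pac-Man's (x, y) cell; illustrative
PRIMITIVES = ["Up", "Right", "Left", "Down"]  # fundamental actions from the slide

@dataclass
class Option:
    """An option O = <I, pi, beta> (Sutton & Barto)."""
    initiation_set: Set[State]                # I: states where the option may start
    policy: Callable[[State], str]            # pi: state -> primitive action
    termination: Callable[[State], float]     # beta: state -> P(terminate here)

# CONSTRAINTS: state-action pairs the agent should never take,
# e.g. ((x, y), "Up") when an unscared ghost sits in the cell above.
constraints: Set[Tuple[State, str]] = set()

def allowed_actions(state: State) -> list:
    """Primitives not ruled out by the constraint set."""
    return [a for a in PRIMITIVES if (state, a) not in constraints]

def epsilon_greedy(q: Dict[Tuple[State, str], float],
                   state: State, eps: float = 0.1) -> str:
    """Choose among *allowed* actions only, so constraints prune exploration."""
    acts = allowed_actions(state) or PRIMITIVES  # fall back if over-constrained
    if random.random() < eps:
        return random.choice(acts)
    return max(acts, key=lambda a: q.get((state, a), 0.0))
```

Constraints shrink the state-action space the learner must explore, which is where the expected speed-up over unassisted RL comes from.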
(5) A. J. Irani, J. A. Rosalia, K. Subramanian, C. L. Isbell Jr., and A. L. Thomaz, "Using both positive and negative instructions from humans to decompose reinforcement learning problems," Georgia Institute of Technology, Tech. Rep., 2014.
(6) R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. The MIT Press, 2012.
Work Domain Analysis Method: Abstraction Hierarchy
The Abstraction Hierarchy (AH) is a 5-level functional decomposition used for modeling complex sociotechnical systems. The system is described at different levels of abstraction with a "why is how" relationship: each level answers "why" for the level below it and "how" for the level above it.

Levels of AH, their meaning for Pac-Man, and examples:
• Functional Purposes (FP): the goals of Pac-Man. Example: Staying Alive.
• Abstract Functions (AF): what criteria are required to judge whether Pac-Man is achieving the purposes. Example: Avoid Ghosts.
• Generalized Functions (GF): what functions are required to accomplish Pac-Man's abstract functions. Example: Maneuver Pac-Man.
• Physical Functions (PF): the limitations and capabilities of the system. Example: Distance to Nearest Ghost.
• Physical Objects (PO): the objects and characters on the maze. Example: Ghosts, Pac-Man, Walls.
(A small data-structure sketch of this hierarchy follows the references.)
(1) N. Naikar, R. Hopcroft, and A. Moylan, "Work domain analysis: Theoretical concepts and methodology," pp. 5–35, 2005.
(2) N. Naikar, Work Domain Analysis: Concepts, Guidelines, and Cases. CRC Press, 2013.
(3) J. Rasmussen, Information Processing and Human-Machine Interaction: An Approach to Cognitive Engineering. Elsevier Science, 1986.
(4) N. A. Stanton, P. M. Salmon, G. H. Walker, C. Baber, and D. P. Jenkins, Human Factors Methods, Ch. 4: Cognitive Task Analysis Methods, 2005.
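As an illustration of the "why is how" structure, here is a tiny Python encoding of the Pac-Man AH above. The level names and examples come from the slide; the encoding itself is our assumption.

```python
# Abstraction Hierarchy for Pac-Man, from top (purposes) to bottom (objects).
# Each level answers "why" for the level below and "how" for the level above.
AH_LEVELS = [
    ("Functional Purposes (FP)",   ["Staying Alive"]),
    ("Abstract Functions (AF)",    ["Avoid Ghosts"]),
    ("Generalized Functions (GF)", ["Maneuver Pac-Man"]),
    ("Physical Functions (PF)",    ["Distance to Nearest Ghost"]),
    ("Physical Objects (PO)",      ["Ghosts", "Pac-Man", "Walls"]),
]

def why(level_idx: int) -> list:
    """The level above says *why* this level's nodes exist."""
    return AH_LEVELS[level_idx - 1][1] if level_idx > 0 else []

def how(level_idx: int) -> list:
    """The level below says *how* this level's nodes are achieved."""
    return AH_LEVELS[level_idx + 1][1] if level_idx < len(AH_LEVELS) - 1 else []

# Example: for "Avoid Ghosts" (index 1), why(1) -> ["Staying Alive"]
# and how(1) -> ["Maneuver Pac-Man"].
```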
Micro-World of Pac-Man
Classic arcade game, invented in 1980.
Why we chose Pac-Man:
Helps to understand the research problem
Helps to build the AH
• Maps the links between the goals
• Provides insight into RL policies
Used for many ML and RL studies
• Allows comparison of results
• Helps find what is needed quickly and conveniently
Results of Pac-Man Study
Familiarization & Interview Phase
Modeling Phase
Option & Constraint Set Generation Phase
Pac-Man Study - Method
Familiarization Phase: 16 participants played Pac-Man for 10 minutes.
Interview Phase: researchers interviewed the participants using a structured interview script designed to generate abstraction hierarchies.
Modeling Phase: researchers created abstraction hierarchies for each player, plus composite abstraction hierarchies for high- and low-performing players.
Option & Constraint Set Generation Phase: researchers translated the composite abstraction hierarchies into sets of options and constraints (an illustrative sketch of this translation follows).
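The final phase, translating a composite AH into option and constraint sets, can be pictured with a simple rule: negatively phrased statements become constraints, while positively phrased sub-goals with means-ends links become candidate options. The sketch below is our hedged illustration of such a rule, not the study's actual procedure; the AHNode fields are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class AHNode:
    label: str                  # e.g. "Avoid Ghosts"
    level: str                  # e.g. "AF"
    negative: bool = False      # True for "don't ..." statements
    children: List["AHNode"] = field(default_factory=list)  # means-ends links

def to_oc_sets(root: AHNode) -> Tuple[List[str], List[str]]:
    """Walk the composite AH; negative statements become constraints,
    positively phrased sub-goals with children become candidate options."""
    options: List[str] = []
    constraints: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            if child.negative:
                constraints.append(child.label)  # e.g. "Don't move onto unscared ghost"
            elif child.children:
                options.append(child.label)      # temporally extended sub-goal
            stack.append(child)
    return options, constraints
```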
Participant Performance
[Chart: scores of High Performers vs. Low Performers.]
Modeling Phase - Procedure
1. Creation of an individual AH per player
2. Harmonization of the statements across AHs
3. Combination of the AHs of high and low performers → performance-based AH
(A sketch of steps 2 and 3 follows.)
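A rough way to picture steps 2 and 3: map each player's wording onto a shared vocabulary, then take the per-level union within each performance group. The synonym table and helper below are purely illustrative assumptions, not the researchers' procedure.

```python
# Step 2 (harmonization): map each player's wording to a shared vocabulary.
# The synonym table here is invented for illustration.
HARMONIZE = {
    "stay away from ghosts": "Avoid Ghosts",
    "don't get caught": "Avoid Ghosts",
}

def combine_ahs(player_ahs: list) -> dict:
    """Step 3 (combination): per-level union of harmonized statements,
    yielding one composite AH for a performance group (high or low)."""
    composite: dict = {}
    for ah in player_ahs:                    # each ah: {level: set of statements}
        for level, statements in ah.items():
            canon = {HARMONIZE.get(s.lower(), s) for s in statements}
            composite.setdefault(level, set()).update(canon)
    return composite
```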
WDA: Performance-Based AH
Aggregated AH: the Low- and High-Performance AHs are represented as a single AH.
WDA: Performance-Based AH
High Performers:
• Player perspective is that of a 'Competitor': maximum score in minimum time
• Both defensive and offensive actions
• Dual level of protection for Pac-Man, as a proactive and reactive strategy
• 'Clearing the current quadrant' is used as a tactic
• Quick observation, to act in a minimum time interval
Low Performers:
• Player perspective is that of an 'Exploratory spirit': learning tricks
• Only defensive actions
• 'Clearing the current quadrant' is used as a strategy
• Observation of the quadrant is used for 'learning tricks'
Generation of Option and Constraint Sets
High Performer-Defined OC Set
Low Performer-Defined OC Set
High Performer-Defined OC Set Creation
Low Performer-Defined OC Set Creation
Conclusion
Our goal is to develop a method to improve machine learning algorithms based on human experience and knowledge, using Work Domain Analysis.
• Implemented the option and constraint sets separately on RL algorithms; evaluation of the performers was able to show the differences between performers.
• Now we are able to generate option and constraint sets for different levels of performance (Low Performers and High Performers).
• Will work to auto-generate option and constraint sets using state-of-the-art methods.
Thank you!
Questions and Comments?