Computational Discovery of Communicable Knowledge


Challenges in Learning Plan Knowledge
Pat Langley
School of Computing and Informatics
Arizona State University
Tempe, Arizona USA
Institute for the Study of Learning and Expertise
Palo Alto, California USA
Thanks to D. Choi, T. Konik, U. Kuter, N. Li, D. Nau, N. Nejati, and D. Shapiro for
their many contributions. This talk reports research funded by grants from DARPA
IPTO, which is not responsible for its contents.
Outline of the Talk
1. Brief review of learning plan knowledge
2. Learning from different sources
3. Learning for new performance tasks
4. Learning in different scenarios
5. Learning with novel representations
6. Some responses to these challenges
7. Concluding remarks
The Problem: Learning Plan Knowledge
• Given: Basic knowledge about some action-oriented domain (e.g., state/goal representation, operators).
• Given: A set of training problems (e.g., initial states, goals, and possibly more).
• Given: Some performance task that the system must carry out.
• Given: A performance mechanism that can use knowledge to carry out that task.
• Learn: Knowledge that will let the system improve its ability to perform new tasks from the same or similar domain.
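To make these ingredients concrete, here is a minimal sketch in Python; the names (Operator, Problem, PlanLearner) are illustrative, not drawn from any existing system.

from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: FrozenSet[tuple]   # literals that must hold to apply it
    effects: FrozenSet[tuple]         # literals made true by applying it

@dataclass(frozen=True)
class Problem:
    initial_state: FrozenSet[tuple]   # ground literals describing the start
    goals: FrozenSet[tuple]           # ground literals to be achieved

class PlanLearner:
    def __init__(self, operators: List[Operator]):
        self.operators = operators    # given: basic domain knowledge
        self.plan_knowledge = []      # learned: rules, macros, or methods

    def train(self, problems: List[Problem]) -> None:
        """Acquire plan knowledge from the training problems."""
        raise NotImplementedError

    def perform(self, problem: Problem):
        """Use the learned knowledge to carry out a new task."""
        raise NotImplementedError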
Topics Not Covered
This talk will range widely, but I will not cover issues related to:
• Learning with impoverished representations
  – Interested in human-like, intelligent behavior
  – Most work on reinforcement learning is irrelevant
• Acquiring basic knowledge about a domain
  – Interested in building on such knowledge
  – Most work on learning action models is too basic
• Nonincremental learning from large data sets
  – Interested in human-like incremental learning
  – This rules out most data-mining approaches
Historical Topics
There has been a long history of work on learning plan knowledge:
• Forming macro-operators
  – Fikes et al. (1972), Iba (1988), Mooney (1989), Botea et al. (2005)
• Inducing forward-chaining control rules
  – Anzai & Simon (1978), Mitchell et al. (1981), Langley (1982)
• Learning control rules analytically
  – Laird et al. (1986), Mitchell et al. (1986), Minton (1988)
• Problem solving by analogy
  – Veloso (1994), Jones & Langley (1995), VanLehn & Jones (1994)
• Inducing control rules for partial-order planning
  – Katukam & Kambhampati (1994), Estlin & Mooney (1997)
Historical Trends
Work on learning plan knowledge has seen many shifts in fashion:
• Early hope for improving problem solvers/planners (1978–1985)
• Excitement/confusion introduced by the EBL movement (1986–1992)
• Some doubts raised by the “utility problem” (1988–1993)
• Mass migration to the reinforcement learning paradigm (1993–2003)
• Resurgence of interest in learning plan knowledge (2004–present)
Throughout these changes, the problems and potential of learning
plan knowledge have remained.
Traditional Sources of Information
Most research on learning for planning has assumed the system
uses search to generate:
• Successful paths that achieve the goals (positive instances)
• Failed paths that do not achieve the goals (negative instances)
• Alternative paths of different desirability (preferred instances)
But humans learn from other sources of information and our AI
systems should as well.
Challenge: Learn from Many Sources
There has been relatively little research on plan learning from:
• Demonstrations of solved problems (Nejati et al., 2006)
• Explicit instruction from a teacher (Blythe et al., 2007)
• Advice or hints from a teacher (Mostow, 1983)
• Mental simulations or daydreaming (Mueller, 1985)
• Undesirable side effects during execution
Humans learn from all of these sources, and our learning systems
should support the same capabilities.
Moreover, we should develop single systems that integrate plan
knowledge learned from all of them (Oblinger, 2006).
Traditional Performance Tasks
Most research on learning for planning has assumed the system
aims to improve:
• The efficiency of plan generation (nodes expanded, time)
• The quality of generated plans (path length, utility)
• The coverage of plan knowledge (problems solved)
But humans learn and use plan knowledge for other purposes
that are just as valid.
Challenge: Learn for Plan Execution
Many important domains require executing plan knowledge in
some environment that includes:
• operators with likely but nonguaranteed effects
• external events not directly under the agent’s control
• other agents that are pursuing their own goals
Urban driving is one setting that raises all three of these issues.
Complex board games like chess, although deterministic, still
require interleaving of planning and execution.
We need more research on plan learning in contexts of this sort
(e.g., Benson, 1995; Fern et al., 2004).
Challenge: Learn for Plan Understanding
Another understudied problem is learning for plan understanding.
• Given: A partially observed sequence of states influenced by another agent’s actions.
• Given: Learned knowledge about how to achieve goals.
• Find: The other agent’s goals and the plans it is pursuing to achieve them.
Plan understanding is important not only in complex games, but
in military planning, politics, and other settings.
This performance task suggests new learning problems, methods,
and evaluation criteria.
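As a rough illustration, the hypothetical sketch below ranks candidate goals by how well the observed actions match plans that learned knowledge would generate for them; recognize_goals and its plans_for interface are inventions for this example, not part of any published system.

def recognize_goals(observed_actions, candidate_goals, plans_for):
    """Rank candidate goals by agreement between the observed action
    sequence and the plans our learned knowledge generates for them.
    plans_for(goal) -> list of action sequences (assumed interface)."""
    def score(goal):
        best = 0.0
        for plan in plans_for(goal):
            # fraction of the observed prefix that this plan reproduces
            matched = sum(1 for o, p in zip(observed_actions, plan) if o == p)
            best = max(best, matched / max(len(observed_actions), 1))
        return best
    return sorted(candidate_goals, key=score, reverse=True)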
Traditional Learning Scenarios
Most research on learning for planning has assumed the system:
• Trains on problems from a given distribution / domain
• Tests on problems from the same distribution / domain
Success depends on the extent to which the learner generalizes
well to new problems from the same domain.
But humans also use their learned plan knowledge in other, more
flexible ways to improve performance.
Challenge: Cumulative Learning
In complex domains, humans learn plan knowledge gradually:
• Starting with small, relatively easy problems
• Moving to complex problems after mastering simpler ones
Later acquisitions build naturally on earlier experience, leading to cumulative learning.
Our education system depends heavily on such “vertical transfer”
of learned knowledge.
We need more learning systems that demonstrate this form of
cumulative improvement (e.g., Reddy & Tadepalli, 1997).
Challenge: Cross-Domain Transfer
In other cases, humans exhibit a form of transfer that involves:
• Learning to solve problems in one domain
• Reusing this knowledge to solve problems in another domain that is superficially quite different
Such cross-domain transfer is related to within-domain analogical
reasoning, but it is far more challenging.
In its extreme form, the two domains support similar solutions but
have no shared symbols or predicates.
We need more learning systems that demonstrate this radical form
of knowledge reuse.
Traditional Learned Representations
Most research on learning for planning has focused on learning:
• Control rules that reduce the effective branching factor
• Macro-operators that reduce the effective solution depth
These grew naturally from representations used to create handcrafted expert problem solvers.
But now we have other representations of plan knowledge that suggest new learning tasks and methods.
Here I do not mean POMDPs, workflows, or other highly constrained formalisms.
Challenge: Learn HTNs
Hierarchical task networks (HTNs) support the most effective planners available, but they are expensive to build manually.
HTNs provide an ideal target for learning because they have:
• the modularity and flexibility of search-control rules
• the large-scale structure of macro-operators
Machine learning has automated the creation of expert classifiers. We should do the same for HTNs, which are effectively expert planning systems.
Challenge: Learn HTNs
We can define the task of learning hierarchical task networks as:
• Given: Basic knowledge about some action-oriented domain.
• Given: A set of training problems (initial states and goals).
• Given: Some performance task the system must carry out.
• Given: Some module that uses HTNs to perform this task.
• Learn: An HTN that lets the system improve its performance on new tasks from the same or similar domain.
We need more research on this important topic (e.g., Reddy & Tadepalli, 1997; Ilghami et al., 2005).
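In code, the target of this task might look like the structure below; the field names are illustrative rather than taken from any particular system, and learn_htn is only a skeleton.

from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple

@dataclass
class HTNMethod:
    goal: Tuple                        # head: the goal literal it achieves
    precondition: FrozenSet[Tuple]     # concept that must hold to use it
    subgoals: List[Tuple] = field(default_factory=list)  # ordered subgoals
    operator: Optional[Tuple] = None   # or a primitive operator to apply

def learn_htn(operators, concepts, training_problems):
    """Skeleton of the learning task: analyze solutions to the training
    problems and return methods that improve later performance."""
    methods: List[HTNMethod] = []
    # ... trace analysis would populate `methods` (sketched later in the talk)
    return methods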
Some Responses
Our recent research attempts to respond to these challenges by
developing methods that:
• acquire a constrained but important class of HTNs
• that one can use for both planning and reactive control
• from both successful problem solving and expert traces
• that extend naturally to support cross-domain transfer
Moreover, these ideas are embedded in an integrated architecture that supports many capabilities: ICARUS (Langley, 2006).
Conceptual Knowledge in ICARUS
[Figure: fragment of a concept hierarchy, in which the nonprimitive concept (patient-form-filled ?patient) is defined in terms of primitive concepts such as (assigned-mission ?patient ?mission).]
• Conceptual knowledge is cast as Horn clauses that specify relevant relations in the environment
• Memory is organized hierarchically
• Concepts divide into primitive and nonprimitive predicates
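A minimal sketch of how such a hierarchy might be evaluated at run time, assuming ground (head, body) rules and reducing Horn-clause matching to set containment; a real matcher must also handle variable bindings.

def infer_beliefs(percepts, primitive_rules, nonprimitive_rules):
    """Bottom-up inference over a concept hierarchy: primitive concepts
    match percepts directly, and nonprimitive ones build on beliefs
    that have already been inferred."""
    beliefs = set()
    for head, body in primitive_rules:
        if body <= percepts:             # every body literal was observed
            beliefs.add(head)
    changed = True
    while changed:                       # propagate up the hierarchy
        changed = False
        for head, body in nonprimitive_rules:
            if head not in beliefs and body <= beliefs:
                beliefs.add(head)
                changed = True
    return beliefs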
HTN Methods in ICARUS
[Figure: HTN methods are indexed by the goal concepts they achieve; each has a precondition concept and decomposes its goal into subgoals, bottoming out in operators.]
• Similar to SHOP2, but methods are indexed by the goals they achieve
• Each method decomposes a goal into subgoals
• If a method’s goal is active and its precondition is satisfied, then try to achieve its subgoals or apply its operators
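The retrieval rule in the last bullet can be rendered roughly as follows; the Method structure and set-based matching are simplifications made for this sketch.

from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple

@dataclass
class Method:
    goal: Tuple
    precondition: FrozenSet[Tuple]
    subgoals: List[Tuple] = field(default_factory=list)
    operator: Optional[Tuple] = None   # terminal methods name an operator

def pursue(goal, beliefs, methods):
    """Expand a goal through goal-indexed methods, returning the
    sequence of primitive operators to apply."""
    for m in methods:
        # a method applies when its goal is active and its precondition holds
        if m.goal == goal and m.precondition <= beliefs:
            if m.operator is not None:
                return [m.operator]
            plan = []
            for sub in m.subgoals:     # decompose into ordered subgoals
                plan.extend(pursue(sub, beliefs, methods))
            return plan
    return []                          # impasse: no applicable method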
Operators in ICARUS
[Figure: the operator (get-arrival-time ?patient ?from ?to), with the precondition concept (patient ?p) and (travel-from ?p ?from) and (travel-to ?p ?to) and the effects concept (arrival-time ?patient).]
• Operators describe low-level actions that agents can execute directly in the environment
• Preconditions: legal conditions for action execution
• Effects: expected changes when the action is executed
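A textbook-style sketch of these semantics, assuming add and delete lists; in a nondeterministic environment the returned state is only the expected outcome, as noted earlier.

def apply_operator(state, preconditions, add_list, delete_list):
    """Return the expected successor state, or None when the action
    is not legal in the current state."""
    if not preconditions <= state:            # preconditions gate execution
        return None
    return (state - delete_list) | add_list   # expected, not guaranteed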
Training Input: Expert Traces and Goals
[Figure: a demonstration trace, showing operator instances such as (get-arrival-time P2), belief states given as sets of concept instances such as (assigned-flight P1 M1), and the goal concept (all-patients-arranged).]
• Expert demonstration traces
  – Operators the expert uses and the resulting belief state
  – State: a set of concept instances
• The goal is a concept instance in the final state
• ICARUS learns generalized skills that achieve similar goals
Learning Plan Knowledge from Demonstration
[Figure: architecture of the learning system. An expert supplies demonstration traces (states and actions); the LIGHT learner combines these with background knowledge (operators and concept definitions) to produce HTN plan knowledge; a reactive executor applies the learned knowledge to new problems (initial state and goal) and signals an impasse when it cannot proceed.]
Learning HTNs by Trace Analysis
[Figure: a solution trace with observed actions along the bottom and the concept instances they produce above. One view highlights operator chaining, in which a goal is explained through the operator that achieved it; another highlights concept chaining, in which a goal concept is explained through the subconcepts in its definition.]
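A coarse sketch of the two cases, in the spirit of Nejati et al. (2006): a goal produced directly by an operator in the trace is regressed through that operator's preconditions, while a goal that is a defined concept is decomposed through its definition. Literals are tuples whose first element is the predicate; the trace indexing, dict-based method format, and omission of variable generalization are all simplifications.

def explain(goal, achieved_by, initial_state, concept_defs, methods):
    """Recursively explain how `goal` became true in a solution trace,
    emitting one method per explained goal. `achieved_by` maps each
    literal to the operator (a dict) that produced it, if any."""
    op = achieved_by.get(goal)
    if op is not None:                            # operator chaining
        for p in op["preconditions"]:
            if p not in initial_state:            # explain earlier steps too
                explain(p, achieved_by, initial_state, concept_defs, methods)
        methods.append({"goal": goal,
                        "precondition": op["preconditions"],
                        "operator": op["name"]})
    elif goal[0] in concept_defs:                 # concept chaining
        subs = concept_defs[goal[0]]
        for s in subs:
            if s not in initial_state:
                explain(s, achieved_by, initial_state, concept_defs, methods)
        methods.append({"goal": goal,
                        "precondition": None,
                        "subgoals": list(subs)})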
[Figure: an explanation structure for one trace in a medical-evacuation domain. The top-level concept instance (transfer-hospital patient1 hospital2) at time 3 is explained through subconcepts such as (arrange-ground-transportation SFO hospital2 1pm), (close-airport hospital2 SFO), (location patient1 SFO 1pm), (assigned patient1 NW32), (dest-airport patient1 SFO), (arrival-time NW32 1pm), (scheduled NW32), and (flight-available), grounding out in the operator instances (assign patient1 NW32) at time 1 and (query-arrival-time) at time 2.]
[Figure: the hierarchical task network obtained by generalizing this explanation, with variablized heads such as (transfer-hospital ?patient ?hospital), (arrange-ground-transportation ?loc ?hospital ?time), (location ?patient ?loc ?time), (assigned ?patient ?flight), and (arrival-time ?flight ?time), terminating in the operators (assign ?patient ?flight) and (query-arrival-time).]
Transfer by Representation Mapping
[Figure: predicate mappings link the concepts and actions of a source domain to those of a target domain, letting structures learned in one carry over to the other.]
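Once a predicate mapping is in hand, the reuse step might look like the sketch below; discovering the mapping itself is the hard, analogical part that this example presupposes. Methods use the same illustrative dict format as the earlier trace-analysis sketch.

def translate(literal, predicate_map):
    """Rewrite one literal's predicate symbol into the target domain."""
    return (predicate_map.get(literal[0], literal[0]),) + tuple(literal[1:])

def transfer_methods(source_methods, predicate_map):
    """Carry learned methods from the source domain into the target."""
    transferred = []
    for m in source_methods:
        transferred.append({
            "goal": translate(m["goal"], predicate_map),
            "precondition": {translate(l, predicate_map)
                             for l in (m.get("precondition") or [])},
            "subgoals": [translate(s, predicate_map)
                         for s in m.get("subgoals", [])]})
    return transferred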
Challenge: Learn with Richer Goals
HTNs are more expressive than classical plans (Erol et al., 1994).
Our approach loses this advantage because it assumes the head of each method is a goal it achieves, but we can:
• Extend goal concepts to describe temporal behavior
• Revise the execution module to handle these structures
• Augment trace analysis to reason about temporal goals
• Learn new methods with temporal goals in their heads
This scheme should acquire the full class of HTNs while still
retaining the tractability of goal-directed learning.
Challenge: Extend Conceptual Vocabularies
Our approach to learning HTNs relies on the concept hierarchy used to explain solution traces.
The method would depend less on handcrafted knowledge if it could extend this hierarchy itself:
• Given: A set of concepts used in goals, states, and methods
• Given: New methods acquired from sample solution traces
• Find: New concepts that produce improved performance as the result of future method learning
This would support a bootstrapped learner that invents predicates to describe states, goals, and methods.
Challenge: Extend Conceptual Vocabularies
Our approach to predicate invention has three steps:
1. Define a new concept for the precondition of each method learned by chaining off a concept definition.
2. Check traces for states in which this concept becomes true and learn methods to achieve it.
3. During performance, treat each method’s precondition as its first subgoal, which it can achieve if submethods are known.
This technique would make an HTN more complete by growing it downward, introducing nonterminal symbols as necessary.
We have partially implemented this scheme and hope to report results at the next meeting.
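A sketch of the first step, under the same simplified method format as the earlier sketches; the gensym-style names stand in for the nonterminal symbols the text says must be introduced.

import itertools

_fresh = itertools.count()

def invent_precondition_concept(method, concept_defs):
    """Name the method's precondition with a new predicate, record its
    definition, and rewrite the method to refer to the new concept so
    that later learning can target it as a goal."""
    new_pred = "invented-%d" % next(_fresh)
    concept_defs[new_pred] = frozenset(method["precondition"])
    method["precondition"] = {(new_pred,)}
    return new_pred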
Concluding Remarks: Research Style
Clearly, there remain many open problems to address in learning
plan knowledge.
These involve new abilities, not improvements on existing ones, which suggests that we:
• Look at human behavior for ideas on how to proceed
• Develop integrated systems rather than component algorithms
• Demonstrate their behavior on challenging domains
These strategies will help us extend the reach of our learning
systems, not just strengthen their grasp.
Concluding Remarks: Evaluation
We must evaluate our new plan learners, but this does not mean:
• Measuring their speed in generating plans
• Showing they run faster than existing systems
• Entering them in planning competitions
More appropriate experiments would revolve around:
• Demonstrating entirely new functionalities
• Running lesion studies to show new features are required
• Using performance measures appropriate to the task
These steps will produce conceptual advances and scientific
understanding far more than will mindless bake-offs.
Concluding Remarks: Summary
Learning plan knowledge is a key area with many open problems:
• Learning from traces, advice, and other sources
• Transferring knowledge within and across domains
• Learning and extending rich structures like HTNs
These challenges will benefit from earlier work on plan learning,
but they also require new ideas.
Together, they should lead us toward learning systems that rival
humans in their flexibility and power.
End of Presentation
ICARUS Concepts for In-City Driving
; Each concept definition pairs a head with the :percepts it matches
; and the :relations (lower-level concepts) or :tests it requires.

((in-rightmost-lane ?self ?clane)
 :percepts  ((self ?self) (segment ?seg)
             (line ?clane segment ?seg))
 :relations ((driving-well-in-segment ?self ?seg ?clane)
             (last-lane ?clane)
             (not (lane-to-right ?clane ?anylane))))

((driving-well-in-segment ?self ?seg ?lane)
 :percepts  ((self ?self) (segment ?seg) (line ?lane segment ?seg))
 :relations ((in-segment ?self ?seg) (in-lane ?self ?lane)
             (aligned-with-lane-in-segment ?self ?seg ?lane)
             (centered-in-lane ?self ?seg ?lane)
             (steering-wheel-straight ?self)))

((in-lane ?self ?lane)
 :percepts ((self ?self segment ?seg)
            (line ?lane segment ?seg dist ?dist))
 :tests    ((> ?dist -10) (<= ?dist 0)))
Representing Short-Term Beliefs/Goals
(current-street me A)
(lane-to-right g599 g601)
(last-lane g599)
(at-speed-for-u-turn me)
(steering-wheel-not-straight me)
(in-lane me g599)
(on-right-side-in-segment me)
(building-on-left g288)
(building-on-left g427)
(building-on-left g431)
(building-on-right g287)
(increasing-direction me)
(current-segment me g550)
(first-lane g599)
(last-lane g601)
(slow-for-right-turn me)
(centered-in-lane me g550 g599)
(in-segment me g550)
(intersection-behind g550 g522)
(building-on-left g425)
(building-on-left g429)
(building-on-left g433)
(building-on-right g279)
(buildings-on-right g287 g279)
ICARUS Skills for In-City Driving
; Each skill's head names the goal it achieves; :start gives the
; condition for initiating it, and the body bottoms out in ordered
; :subgoals or directly executable :actions.

((in-rightmost-lane ?self ?line)
 :percepts ((self ?self) (line ?line))
 :start    ((last-lane ?line))
 :subgoals ((driving-well-in-segment ?self ?seg ?line)))

((driving-well-in-segment ?self ?seg ?line)
 :percepts ((segment ?seg) (line ?line) (self ?self))
 :start    ((steering-wheel-straight ?self))
 :subgoals ((in-segment ?self ?seg)
            (centered-in-lane ?self ?seg ?line)
            (aligned-with-lane-in-segment ?self ?seg ?line)
            (steering-wheel-straight ?self)))

((in-segment ?self ?endsg)
 :percepts ((self ?self speed ?speed) (intersection ?int cross ?cross)
            (segment ?endsg street ?cross angle ?angle))
 :start    ((in-intersection-for-right-turn ?self ?int))
 :actions  ((steer 1)))
ICARUS Interleaves Execution and Problem Solving
[Figure: control flow in ICARUS. Reactive execution applies the skill hierarchy, down to primitive skills, to the current problem; if no applicable skill is found (an impasse), control passes to problem solving, and the executed plan feeds back into the skill hierarchy.]
This organization reflects the psychological distinction between automatized
and controlled behavior.