Transcript Document
CISC453 Winter 2010
Planning & Acting in the Real World
AIMA3e Ch 11
Time & Resources
Hierarchical Techniques
Relaxing Environmental Assumptions
Overview
extending planning language & algorithms
1. allow actions that have durations & resource constraints
yields a new "scheduling problem" paradigm
incorporating action durations & timing, required resources
2. hierarchical planning techniques
control the complexity of large scale plans by hierarchical
structuring of actions
3. uncertain environments
non-deterministic domains
4. multiagent environments
Scheduling versus Planning
recall from classical planning (Ch 10)
PDDL representations only allowed us to decide the relative
ordering among planning actions
up till now we've concentrated on what actions to do, given
their PRECONDs & EFFECTs
in the real world, other properties must be considered
actions occur at particular moments in time, have a beginning
and an end, occupy or require a certain amount of time
for a new category of Scheduling Problems we need to consider
the absolute times when an event or action will occur & the
durations of the events or actions
typically these are solved in 2 phases: planning then scheduling
a planning phase selects actions, respecting ordering constraints
this might be done by a human expert, and automated planners are
suitable if they yield minimal ordering constraints
then a scheduling phase incorporates temporal information so that
the result meets resource & deadline constraints
Time, Schedules & Resources
the Job-Shop Scheduling (JSS) paradigm includes
the requirement to complete a set of jobs
each job consists of a sequence of actions with ordering
constraints
each action
has a given duration and may also require some resources
resource constraints indicate the type of resource, the number of
it that are required, and whether the resource is consumed in the
action or is reusable
the goal is to determine a schedule
one that minimizes the total time required to complete all jobs,
(the makespan)
while respecting resource requirements & constraints
Job-Shop Scheduling Problem (JSSP)
JSSP involves a list of jobs to do
where a job is a fixed sequence of actions
actions have quantitative time durations & ordering constraints
actions use resources (which may be shared among jobs)
to solve the JSSP: find a schedule that
determines a start time for each action
1. that obeys all hard constraints
e.g. no temporal overlap between mutex actions (those using the same
one-action-at-a-time resource)
2. for our purposes, we'll operationalize cost as the total time to
perform all actions and jobs
note that the cost function could be more complex (it could include the
resources used, time delays incurred, ...)
our example: automobile assembly scheduling
the jobs: assemble two cars
each job has 3 actions: add the engine, add the wheels, inspect the
whole car
a resource constraint is that we do the engine & wheel actions at a
special one-car-only work station
Ex: Car Construction Scheduling
the job shop scheduling problem of assembling 2 cars
includes required times & resource constraints
notation: A < B indicates action A must precede action B
Jobs({AddEngine1 < AddWheels1 < Inspect1},
{AddEngine2 < AddWheels2 < Inspect2})
Resources (EngineHoists(1), WheelStations(1), Inspectors(2), LugNuts(500))
Action(AddEngine1, DURATION: 30,
USE: EngineHoists(1))
Action(AddEngine2, DURATION: 60,
USE: EngineHoists(1))
Action(AddWheels1, DURATION:30,
CONSUME: LugNuts(20), USE: WheelStations(1))
Action(AddWheels2, DURATION:15,
CONSUME: LugNuts(20), USE: WheelStations(1))
Action(Inspecti, DURATION: 10,
USE: Inspectors(1))
Car Construction Scheduling
note that the action schemas
list resources as numerical quantities, not named entities
so Inspectors(2), rather than Inspector(I1) & Inspector(I2)
this process of aggregation is a general one
it groups objects that are indistinguishable with respect to the
current purpose
this can help reduce complexity of the solution
for example, a candidate schedule that requires (concurrently)
more than the number of aggregated resources can be rejected
without having to exhaustively try assignments of individuals to
actions
Planning + Scheduling for JSSP
Planning + Scheduling for Job-Shop Problems
scheduling differs from standard planning problem
considers when an action starts and when it ends
so in addition to order (planning), duration is also considered
we begin with ignoring the resource constraints, solving the
temporal domain issues to minimize the makespan
this requires finding the earliest start times for all actions
consistent with the problem's ordering constraints
we create a partially-ordered plan, representing ordering
constraints in a directed graph of actions
then we apply the critical path method to determine the start and
end times for each action
Graph of POP + Critical Path
the critical path is the path with longest total duration
it is "critical" in that it sets the duration for the whole plan and
delaying the start of any action on it extends the whole plan
it is the sequence of actions, each of which has no slack
each must begin at a particular time, otherwise the whole plan is
delayed
actions off the critical path have a window of time given by the
earliest possible start time ES & the latest possible start time LS
the illustrated solution assumes no resource constraints
note that the 2 engines are being added simultaneously
the figure shows [ES, LS] for each action, & slack is LS - ES
the time required is indicated below the action name & bold links
mark the critical path
JSSP: (1) Temporal Constraints
schedule for the problem
is given by ES & LS times for all actions
note the 15 minutes slack for each action in the top job, versus 0
(by definition) in the critical path job
formulas for ES & LS also outline a dynamic-programming
algorithm for computing them
A, B are actions, A < B indicates A must come before B
ES(Start) = 0
ES(B) = max_{A<B} [ES(A) + Duration(A)]
LS(Finish) = ES(Finish)
LS(A) = min_{B>A} LS(B) - Duration(A)
complexity is O(Nb) where N is number of actions and b is the
maximum branching factor into or out of an action
so without resource constraints, given a partial ordering of actions,
finding the minimum duration schedule is (a pleasant surprise!)
computationally easy
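the ES/LS recurrences can be run directly on the car example; here is a minimal Python sketch (durations & orderings from the slides, resource constraints ignored, function names illustrative):

```python
# a minimal sketch of the ES/LS recurrences on the car example
# (durations & orderings from the slides; resource constraints ignored)
durations = {"AddEngine1": 30, "AddWheels1": 30, "Inspect1": 10,
             "AddEngine2": 60, "AddWheels2": 15, "Inspect2": 10}
preds = {"AddEngine1": [], "AddWheels1": ["AddEngine1"], "Inspect1": ["AddWheels1"],
         "AddEngine2": [], "AddWheels2": ["AddEngine2"], "Inspect2": ["AddWheels2"]}

def ES(a):  # earliest start: max over predecessors of their ES + duration
    return max((ES(p) + durations[p] for p in preds[a]), default=0)

finish = max(ES(a) + durations[a] for a in durations)  # the makespan

succs = {a: [b for b in durations if a in preds[b]] for a in durations}

def LS(a):  # latest start: min over successors of their LS, minus own duration
    return min((LS(b) for b in succs[a]), default=finish) - durations[a]

for a in durations:
    print(a, ES(a), LS(a), LS(a) - ES(a))   # [ES, LS] & slack = LS - ES
print("makespan:", finish)
```

running this reproduces the 85-minute makespan, with 15 minutes of slack on each top-job action and zero slack along the critical path.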
JSSP: (1) Temporal Constraints
timeline for the
solution
grey rectangles
give intervals for
actions
empty portions
show slack
Solution from POP + Critical Path
1. the partially-ordered plan (above)
2. the schedule from the critical-path method (below)
notice that this solution still omits resource constraints
for example, the 2 engines are being added simultaneously
Scheduling with Resources
including resource constraints
critical path calculations involve conjunctions of linear
inequalities over action start & end times
they become more complicated when resource constraints are
included (for example, each AddEngine action requires the 1
EngineHoist, so they cannot overlap)
they introduce disjunctions of linear inequalities for possible
orderings & as a result, complexity becomes NP-hard!!
here's a solution accounting for resource constraints
reusable resources are in the left column, actions align with resources
this shortest solution schedule requires 115 minutes
Scheduling with Resources
including resource constraints
notice
that the shortest solution is 30 minutes longer than the critical
path without resource constraints
that multiple inspector resource units are not needed for this job,
indicating the possibility for reallocation of this resource
that the "critical path" now is: AddEngine1, AddEngine2,
AddWheels2, Inspect2.
the remaining actions have considerable slack time, they can begin
much later without affecting the total plan time
Scheduling with Resources
for including resource constraints
a variety of solution techniques have been tested
one simple approach uses the minimum slack heuristic
at each step schedule next the unscheduled action that has its
predecessors scheduled & has the least slack
update ES & LS for impacted actions & repeat
note the similarity to minimum-remaining values (MRV)
heuristic of CSPs
applied to this example, it yields a 130 minute solution
15 minutes longer than the optimal solution
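the greedy loop can be sketched in Python (simplified: each resource has one reusable unit, and slack values are taken from the resource-free critical-path numbers rather than recomputed after every step; all names illustrative):

```python
# a hedged sketch of the minimum-slack scheduler (simplified: one reusable
# unit per resource; slack fixed from the resource-free critical-path numbers
# instead of being updated after each scheduling step)
durations = {"AddEngine1": 30, "AddWheels1": 30, "Inspect1": 10,
             "AddEngine2": 60, "AddWheels2": 15, "Inspect2": 10}
preds = {"AddEngine1": [], "AddWheels1": ["AddEngine1"], "Inspect1": ["AddWheels1"],
         "AddEngine2": [], "AddWheels2": ["AddEngine2"], "Inspect2": ["AddWheels2"]}
resource = {"AddEngine1": "EngineHoist", "AddEngine2": "EngineHoist",
            "AddWheels1": "WheelStation", "AddWheels2": "WheelStation",
            "Inspect1": None, "Inspect2": None}   # 2 inspectors: no conflict
slack = {"AddEngine1": 15, "AddWheels1": 15, "Inspect1": 15,
         "AddEngine2": 0, "AddWheels2": 0, "Inspect2": 0}

start, finish, free_at = {}, {}, {}
while len(start) < len(durations):
    ready = [a for a in durations
             if a not in start and all(p in start for p in preds[a])]
    a = min(ready, key=lambda x: slack[x])          # least-slack action next
    t = max([finish[p] for p in preds[a]] or [0])   # wait for predecessors
    if resource[a]:                                 # wait for the shared resource
        t = max(t, free_at.get(resource[a], 0))
        free_at[resource[a]] = t + durations[a]
    start[a], finish[a] = t, t + durations[a]
print("makespan:", max(finish.values()))
```

even this simplified variant yields the 130-minute schedule mentioned above: it greedily commits the hoist to AddEngine2 first, delaying the whole top job.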
difficult scheduling problems may require a different approach
they may involve reconsidering actions & constraints, integrating
the planning & scheduling phases by including durations &
overlaps in constructing the POP
this approach is a focus of current research interest
Time & Resource Constraints
summary
alternative approaches to planning with time & resource
constraints
1. serial: plan, then schedule
use a partial or full-order planner
then schedule to determine actual start times
2. interleaved: mix planning and scheduling
for example, include resource constraints during partial planning
these can determine conflicts between actions
notes:
remember that so far we are still working in classical planning
environments
so, fully observable, deterministic, static and discrete
Hierarchical Planning
next
we add techniques to handle the plan complexity issue
HTN: hierarchical task network planning
this works in a top-down fashion
similar to the stepwise refinement approach to programming
plans that are built from a fixed set of small atomic actions
will become unwieldy as the planning problem grows large
we need to plan at a higher level of abstraction
reduce complexity by hierarchical decomposition of plan steps
at each level of the hierarchy a planning task is reduced to a
small number of activities at the next lower level
the low number of activities
means the computational cost of arranging these activities can
be lowered
Hierarchical Planning
an example: the Hawaiian vacation plan
recall: the AIMA authors live/work in San Francisco Bay area
go to SFO airport
take flight to Honolulu
do vacation stuff for 2 weeks
take flight back to SFO
go Home
each action in this plan actually embodies another planning
task
for example: the go to SFO airport action might be expanded
drive to long term parking at SFO
park
take shuttle to passenger terminal
& each action can be decomposed until the level consists of
actions that can be executed without deliberation
note: some component actions might not be refined until plan
execution time (interleaving: a somewhat different topic)
Hierarchical Planning
basic approach
at each level, each component is reduced to a small number of
activities at the next lower level
this keeps the computational cost of arranging them low
otherwise, there are too many individual atomic actions for
non-trivial problems (yielding high branching factor & depth)
the formalism is HTN planning
Hierarchical Task Network planning
notes
we retain the basic environmental assumptions as for classical
planning
what we previously simply called actions are now "primitive
actions"
we add HLAs: High Level Actions (like go to SFO airport)
each has 1 or more possible refinements
refinements are sequences of actions, either HLAs or primitive
actions
Hierarchical Task Network
alternative refinements: notation
for the HLA: Go(Home, SFO)
Refinement (Go(Home, SFO),
STEPS: [Drive(Home, SFOLongTermParking),
Shuttle(SFOLongTermParking, SFO)])
Refinement (Go(Home, SFO),
STEPS: [Taxi(Home, SFO)])
the HLAs and their refinements
capture knowledge about how to do things
terminology: if the HLA refines to only primitive actions
it is called an implementation
the implementation of a high-level plan (sequence of HLAs)
concatenates the implementations for each HLA
the preconditions/effects representation of primitive action
schemas allows a decision about whether an implementation of
a high-level plan achieves the goal
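the refinement library and the notion of an implementation can be sketched as Python data (step names from the slides; the generator is an illustrative helper):

```python
# a sketch of the refinement library as Python data (steps from the slides);
# an "implementation" is any refinement down to primitive actions only
REFINEMENTS = {
    "Go(Home, SFO)": [
        ["Drive(Home, SFOLongTermParking)", "Shuttle(SFOLongTermParking, SFO)"],
        ["Taxi(Home, SFO)"],
    ],
}

def implementations(plan):
    """yield every primitive action sequence a high-level plan refines to"""
    if not plan:
        yield []
        return
    first, rest = plan[0], plan[1:]
    if first in REFINEMENTS:                  # an HLA: expand each refinement
        for steps in REFINEMENTS[first]:
            yield from implementations(steps + rest)
    else:                                     # a primitive action: keep it
        for tail in implementations(rest):
            yield [first] + tail

for impl in implementations(["Go(Home, SFO)"]):
    print(impl)
```

note how the implementation of a high-level plan is just the concatenation of implementations of its HLAs, as described above.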
Hierarchical Task Network
HLAs & refinements & plan goals
in the HTN approach, the goal is achieved if any
implementation achieves it
this is the case since an agent may choose the
implementation to execute (unlike non-deterministic
environments where "nature" chooses)
in the simplest case there's a single implementation of an HLA
we get preconds/effects from the implementation, and then treat
the HLA as a primitive action
where there are multiple implementations, either
1. search over implementations for 1 that solves the problem
OR
2. reason over HLAs directly
derive provably correct abstract plans independent of the specific
implementations
Search Over Implementations
1. the search approach
this involves generation of refinements by replacing an HLA in
the current plan with a candidate refinement until the plan
achieves the goal
the algorithm on the next slide shows a version using
breadth-first tree search, considering plans in the order of the
depth of nesting of refinements
note that other search versions (graph-search) and strategies
(depth-first, iterative deepening) may be formulated by redesigning the algorithm
explores the space of sequences derived from knowledge in the
HLA library re: how things should be done
the action sequences of refinements & their preconditions code
knowledge about the planning domain
HTN planners can generate very large plans with little search
Search Over Implementations
the search algorithm for refinements of HLAs
function HIERARCHICAL-SEARCH(problem, hierarchy) returns a solution or failure
frontier ← a FIFO queue with [Act] as the only element
loop do
if EMPTY?(frontier) then return failure
plan ← POP(frontier)
/* chooses the shallowest plan in frontier */
hla ← the first HLA in plan, or null if none
prefix, suffix ← the action subsequences before and after hla in plan
outcome ← RESULT(problem.INITIAL-STATE, prefix)
if hla is null then
/* so plan is primitive & outcome is its result */
if outcome satisfies problem.GOAL then return plan
/* insert all refinements of the current hla into the queue */
else for each sequence in REFINEMENTS(hla, outcome, hierarchy) do
frontier ← INSERT(APPEND(prefix, sequence, suffix), frontier)
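a runnable Python sketch of this search, under simplifying assumptions: states are strings, each primitive action is a small transition table, and the outcome argument to REFINEMENTS is ignored (all names illustrative):

```python
# a runnable sketch of breadth-first search over refinements, assuming:
# states are strings, primitive actions are transition tables, and
# refinements do not depend on the outcome state
from collections import deque

PRIMITIVE = {"Drive": {"Home": "Parking"},
             "Shuttle": {"Parking": "SFO"},
             "Taxi": {"Home": "SFO"}}
REFINEMENTS = {"Act": [["Go"]],               # top-level task, as in [Act]
               "Go": [["Drive", "Shuttle"], ["Taxi"]]}

def result(state, actions):                   # apply a primitive sequence
    for a in actions:
        state = PRIMITIVE[a].get(state)       # None if precondition fails
    return state

def hierarchical_search(initial, goal):
    frontier = deque([["Act"]])               # FIFO: shallowest plan first
    while frontier:
        plan = frontier.popleft()
        hla = next((a for a in plan if a in REFINEMENTS), None)
        if hla is None:                       # plan is all-primitive: test it
            if result(initial, plan) == goal:
                return plan
        else:                                 # replace hla by each refinement
            i = plan.index(hla)
            for seq in REFINEMENTS[hla]:
                frontier.append(plan[:i] + seq + plan[i + 1:])

print(hierarchical_search("Home", "SFO"))
```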
HTN Examples
O-PLAN: an example of a real-world system
the O-PLAN system does both planning & scheduling, and has
been used commercially by the Hitachi company
one specific sample problem concerns a product line of 350
items involving 35 machines and 2000+ different operations
for this problem, the planner produces a 30-day schedule of
3x8-hour shifts, with 10s of millions of steps
a major benefit of the hierarchical structure with the HTN
approach is the results are often easily understood by humans
abstracting away from excessive detail
(1) makes large scale planning/scheduling feasible
(2) enhances comprehensibility
HTN Efficiency
computational comparisons for a hypothetical domain
assumption 1: a non-hierarchical progression planner with d
primitive actions, b possibilities at each state: O(b^d)
assumption 2: an HTN planner with r refinements of each
non-primitive, each with k actions at each level
how many different refinement trees does this yield?
depth: number of levels below the root = log_k d
then the number of internal refinement nodes = 1 + k + k^2 + … +
k^(log_k d - 1) = (d - 1)/(k - 1)
each internal node has r possible refinements, so r^((d - 1)/(k - 1))
possible regular decomposition trees
the message: keeping r small & k large yields big savings
(roughly kth root of non-hierarchical cost if b & r are
comparable)
nice as a goal, but long action sequences that are useful over a
range of problems are rare
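the internal-node identity can be checked numerically (a quick sketch; internal_nodes is an illustrative helper that assumes d is an exact power of k, so the tree is regular):

```python
# a quick numeric check of 1 + k + ... + k^(log_k d - 1) = (d - 1)/(k - 1)
# (assumes d is an exact power of k, so the refinement tree is regular)
import math

def internal_nodes(d, k):
    levels = round(math.log(d, k))            # depth below the root, log_k d
    return sum(k**i for i in range(levels))   # geometric sum of node counts

assert internal_nodes(16, 2) == (16 - 1) // (2 - 1)   # 15
assert internal_nodes(27, 3) == (27 - 1) // (3 - 1)   # 13
r = 4                                          # refinements per internal node
print(r ** internal_nodes(16, 2))              # number of decomposition trees
```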
HTN Efficiency
HTN computational efficiency
building the plan library is critically important to achieving
efficiency gains in HTN planning
so, might the refinements be learned?
as one example, an agent could build plans conventionally then
save them as a refinement of an HLA defined as the current
task/problem
one goal is "generalizing" the methods that are built, eliminating
problem-instance specific detail, keeping only key plan
components
Hierarchical Planning
we've just looked at the approach of searching over
fully refined plans
that is, full implementations
the algorithm refines plans to primitive actions in order to
check whether they achieve the problem goal
now we move on to searching for abstract solutions
the checking occurs at the level of HLAs
possibly with preconditions/effects descriptions for HLAs
the result is that search is in the much smaller HLA space,
after which we refine the resulting plan
Hierarchical Planning
searching for abstract solutions
this approach will require that HLA descriptions have the
downward refinement property
every high level plan that apparently solves the problem (from the
description of its steps) has at least 1 implementation that
achieves the goal
since search is not at the level of sequences of primitive
actions, a core issue is the describing of effects of actions
(HLAs) with multiple implementations
assuming a problem description with only +ve preconds & goals,
we might describe an HLA's +ve effects in terms of those achieved
by every implementation, and its -ve effects in terms of those
resulting from any implementation
this would satisfy the downward refinement property
however, requiring an effect to be true for every implementation is
too restrictive, it assumes that an adversary chooses the
implementation (assumes an underlying non-deterministic model)
Plan Search in HLA Space
plan search in HLA space
there are alternative models for which implementation is
chosen, either
(1) demonic non-determinism where some adversary makes the
choice
(2) angelic non-determinism, where the agent chooses
if we adopt angelic semantics for HLA descriptions
the resulting notation uses simple set operations/notation
the key concept is that of the reachable set for some HLA h &
state s, notation: Reach(s, h)
this is the set of states reachable by any implementation of h
(since under angelic semantics, the agent gets to choose)
for a sequence of HLAs [h1, h2] the reachable set is the union
of all reachable sets from applying h2 in each state in the
reachable set of h1 (for notation details see p 411)
a sequence of HLAs forming a high level plan is a solution if
its reachable set intersects the set of goal states
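the reachable-set calculation can be sketched in a few lines of Python, assuming states are frozensets of fluents and an HLA is the list of its implementations, each a state-to-state function (the fluents "A", "B", "G" are made up for illustration):

```python
# a sketch of angelic reachable sets: a state is a frozenset of fluents,
# an HLA is the list of its implementations (state -> state functions)
def reach(states, hla):
    # union over the current states and over the agent's possible choices
    return {impl(s) for s in states for impl in hla}

def reach_seq(s0, hlas):          # Reach(s0, [h1, h2, ...])
    states = {s0}
    for h in hlas:
        states = reach(states, h)
    return states

h1 = [lambda s: s | {"A"}, lambda s: s | {"B"}]            # two choices
h2 = [lambda s: s | {"G"} if "A" in s else s]              # one choice
reachable = reach_seq(frozenset(), [h1, h2])
goal = lambda s: "G" in s
print(any(goal(s) for s in reachable))   # the plan [h1, h2] is a solution
```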
Plan Search in HLA Space
illustration of reachable sets, sequences of HLAs
dots are states, shaded areas = goal states
darker arrows: possible implementations of h1
lighter arrows: possible implementations of h2
(a) reachable set for HLA h1
(b) reachable set for the sequence [h1, h2]
circled dots show the sequence achieving the goal
Planning in HLA Space
using this model
planning consists of searching in HLA space for a sequence
with a reachable set that intersects the goal, then refining
that abstract plan
note: we haven't considered yet the issue of
representing reachable sets as the effects of HLAs
our basic planning model has states as conjunctions of fluents
if we treat the fluents of a planning problem as state
variables, then under angelic semantics an HLA controls the
values of these variables, depending on which implementation
is actually selected
an HLA may have 9 different effects on a given variable
if it starts true, it can always keep it true, always make it false,
or have a choice & similarly for a variable that is initially false
any combination of the 3 choices for each case is possible,
yielding 3^2 = 9 effects
Planning in HLA Space
using this model
so there are 9 possible combinations of choices for the effects
on variables
we introduce some additional notation to capture this idea
note some slight formatting differences between the details of
the notation used here versus in the textbook
~ indicates possibility, the dependence on the agent's choice of
implementation
~+A indicates the possibility of adding A
~-A represents the possible deleting of A
~±A stands for possibly adding or deleting A
Planning in HLA Space
possible effects of HLAs
a simple example uses the HLA for going to the airport
Go(Home, SFO)
Refinement (Go(Home, SFO),
STEPS: [Drive(Home, SFOLongTermParking), Shuttle(SFOLongTermParking, SFO)])
Refinement (Go(Home, SFO),
STEPS: [Taxi(Home, SFO)])
this HLA has ~-Cash as a possible effect, since the agent may choose
the refinement of going by taxi & have to pay
we can use this notation & angelic reachable state semantics to
illustrate how an HLA sequence [h1, h2] reaches a goal
it's often the case that an HLA's effects can only be approximated
(since it may have infinitely many implementations & produce
arbitrarily "wiggly" reachable sets)
we use approximate descriptions of result states of HLAs that are
optimistic: REACH+(s, h) or pessimistic: REACH-(s, h)
one may overestimate, the other underestimate
here's the definition of the relationship
REACH-(s, h) ⊆ REACH(s, h) ⊆ REACH+(s, h)
Planning in HLA Space
possible effects of HLAs using approximate descriptions
of result states
with approximate descriptions, we need to reconsider how to
apply/interpret the goal test
(1) if the optimistic reachable set for a plan does not intersect the
goal, then the plan is not a solution
(2) if the pessimistic reachable set for a plan intersects the goal,
then the plan is a solution
(3) if the optimistic set intersects but the pessimistic set does not,
the goal test is not decided & we need to refine the plan to resolve
residual ambiguity
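the three cases amount to a small decision procedure over sets of states; a sketch using plain Python sets (names illustrative):

```python
# the three-way goal test over approximate reachable sets (plain Python sets)
def classify(reach_opt, reach_pess, goal):
    if not (reach_opt & goal):
        return "not a solution"      # case (1): optimistic set misses the goal
    if reach_pess & goal:
        return "solution"            # case (2): even the pessimistic set hits it
    return "refine further"          # case (3): undecided, keep refining
```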
Planning in HLA Space
illustration
shading shows the set of goal states
reachable sets: R+ (optimistic) shown by dashed boundary, R- (pessimistic) by solid boundary
in (a) the plan shown by a dark arrow achieves the goal & the
plan shown by the lighter arrow does not
in (b), the plan needs further refinement since the R+
(optimistic) set intersects the goal but the R- (pessimistic) does
not
Planning in HLA Space
the algorithm
hierarchical planning with approximate angelic descriptions
function ANGELIC-SEARCH(problem, hierarchy, initialPlan) returns solution or fail
frontier ← a FIFO queue with initialPlan as the only element
loop do
if EMPTY?(frontier) then return fail
plan ← POP(frontier)
/* chooses shallowest node in frontier */
if REACH+(problem.INITIAL-STATE, plan) intersects problem.GOAL then
/* opt'c*/
if plan is primitive then return plan
/* REACH+ is exact for primitive plans */
guaranteed ← REACH-(problem.INITIAL-STATE, plan) ∩ problem.GOAL /* pess'c*/
/* pessimistic set includes a goal state & we're not in infinite regress of refinements */
if guaranteed ≠ {} and MAKING-PROGRESS(plan, initialPlan) then
finalState ← any element of guaranteed
return DECOMPOSE(hierarchy, problem.INITIAL-STATE, plan, finalState)
hla ← some HLA in plan
prefix, suffix ← the action subsequences before & after hla in plan
outcome ← RESULT(problem.INITIAL-STATE, prefix)
for each sequence in REFINEMENTS(hla, outcome, hierarchy) do
frontier ← INSERT(APPEND(prefix, sequence, suffix), frontier)
Planning in HLA Space
the decompose function
mutually recursive with ANGELIC-SEARCH
regress from goal to generate successful plan at next level of
refinement
function DECOMPOSE(hierarchy, s0, plan, sf) returns a solution
solution ← an empty plan
while plan is not empty do
action ← REMOVE-LAST(plan)
si ← a state in REACH-(s0, plan) such that sf ∈ REACH-(si, action)
problem ← a problem with INITIAL-STATE = si and GOAL = sf
solution ← APPEND(ANGELIC-SEARCH(problem, hierarchy, action), solution)
sf ← si
return solution
Planning in HLA Space
notes
ANGELIC-SEARCH has the same basic structure as the
previous algorithm (BFS in space of refinements)
the algorithm detects plans that are or aren't solutions by
checking intersections of optimistic & pessimistic reachable
sets with the goal
when it finds a workable abstract plan, it decomposes the
original problem into subproblems, one for each step of the
plan
the initial state & goal for each subproblem are derived by
regressing the guaranteed reachable goal state through the
action schemas for each step of the plan
ANGELIC-SEARCH has a computational advantage over the
previous hierarchical search algorithm, which in turn may
have a large advantage over plain old exhaustive search
Least Cost & Angelic Search
the same approach can be adapted to find a least cost
solution
this generalizes the reachable set concept so that a state,
instead of being reachable or not, has costs for the most
efficient way of getting to it (∞ for unreachable states)
then optimistic & pessimistic descriptions bound the costs
the holy grail of hierarchical planning
this revision may allow finding a provably optimal abstract plan
without checking all implementations
extensions: the approach can also be applied to online search
in the form of hierarchical lookahead algorithms (recall LRTA*)
the resulting algorithm resembles the human approach to problems
like the vacation plan
initially consider alternatives at the abstract level, over long time
scales
leave parts of the plan abstract until execution time, though other
parts are expanded into detail (flights, lodging) to guarantee feasibility
of the plan
Nondeterministic Domains
finally, we'll relax some of the environment
assumptions of the classical planning model
in part, these parallel the extensions of our earlier (CISC352)
discussions of search
we'll consider the issues in 3 sub-categories
(1) sensorless planning (conformant planning)
completely drop the observability property for the environment
(2) contingency planning
for partially observable & nondeterministic environments
(3) online planning & replanning
for unknown environments
however, we begin with some background
BKGD: Nondeterministic Domains
note some distinct differences from the search
paradigms
the factored representation of states allows an alternative belief
state representation
plus, we have the availability of the domain-independent
heuristics developed for classical planning
as usual, we explore issues using a prototype problem
this time it's the task of painting a chair & table so that their
colors match
in the initial state, the agent has 2 cans of paint, colors unknown,
likewise the chair & table colors are unknown, & only the table is
visible
plus there are actions to remove the lid of a can, & to paint from
an open can (see the next slide)
The Furniture Painting Problem
the furniture painting problem
Init(Object(Table) ∧ Object(Chair) ∧ Can(C1) ∧ Can(C2) ∧ InView(Table))
Goal(Color(Chair, c) ∧ Color(Table, c))
Action(RemoveLid(can),
PRECOND: Can(can)
EFFECT: Open(can))
Action(Paint(x, can),
PRECOND: Object(x) ∧ Can(can) ∧ Color(can, c) ∧ Open(can)
EFFECT: Color(x, c))
BKGD: Nondeterministic Domains
the environment
since it may not be fully observable, we'll allow action
schemas to have variables in preconditions & effects that
aren't in the action's variable list
Paint(x, can) omits the variable c representing the color of the
paint in can
the agent may not know what color is in a can
in some variants, the agent will have to use percepts it gets
while executing the plan, so planning needs to model sensors
the mechanism: Percept Schemas
Percept(Color(x, c),
PRECOND: Object(x) ∧ InView(x))
Percept(Color(can, c),
PRECOND: Can(can) ∧ InView(can) ∧ Open(can))
when an object is in view, the agent will perceive its color
if an open can is in view, the agent will perceive the paint color
BKGD: Nondeterministic Domains
we still need an Action Schema for inspecting objects
Action(LookAt(x),
PRECOND: InView(y) ∧ (x ≠ y)
EFFECT: InView(x) ∧ ¬InView(y))
in a fully observable environment, we include a percept axiom
with no preconds for each fluent
of course, a sensorless agent has no percept axioms
note: it can still coerce the table & chair to the same color to
solve the problem (though it won't know what color that is)
a contingent planning agent with sensors can do better
inspect the objects, & if they're the same color, done
otherwise check the paint cans & if one is the same color as an
object, paint the other object with it
otherwise paint both objects any color
an online agent produces contingent plans with few branches
handling problems as they occur by replanning
BKGD: Nondeterministic Domains
a contingent planner assumes that the effects of an action are
successful
a replanning agent checks results, generating new plans to fix
any detected flaws
in the real world we find combinations of approaches
Sensorless Planning Belief States
unobservable environment = Sensorless Planning
these problems are belief state planning problems with physical
transitions represented by action schemas
we assume a deterministic environment
we represent belief states as logical formulas rather than the
explicit sets of atomic states we saw for sensorless search
for the prototype planning problem: furniture painting
1. we omit the InView fluents
2. some fluents hold in all belief states, so we can omit them for
brevity: (Object(Table), Object(Chair), Can(C1), Can(C2))
3. the agent knows things have a color (∀x ∃c Color(x, c)), but
doesn't know the color of anything or the open vs closed state of
cans
4. yields an initial belief state b0 = Color(x, C(x)), where C(x) is a
Skolem function to replace the existentially quantified variable
5. we drop the closed-world assumption of classical planning, so
states may contain +ve & -ve fluents & if a fluent does not appear,
its value is unknown
Sensorless Planning Belief States
belief states
specify how the world could be
they are represented as logical formulas
each is a set of possible worlds that satisfy the formula
in a belief state b, actions available to the agent are those
with their preconds satisfied in b
given the initial belief state b0 = Color(x, C(x)), a simple
solution for the painting problem plan is:
[RemoveLid(Can1), Paint(Chair, Can1), Paint(Table, Can1)]
we'll update belief states as actions are taken, using the rule
b' = RESULT(b, a) = {s' : s' = RESULTP(s, a) and s ∈ b}
where RESULTP defines the physical transition model
Sensorless Planning Belief States
updating belief states
we assume that the initial belief state is 1-CNF form, that is, a
conjunction of literals
b' is derived based on what happens for the literals l in the
physical states s that are in b when a is applied
if the truth value of a literal is known in b then in b' it is given by
the current value, plus the add list of a & the delete list of a
if a literal's truth value is unknown, 1 of 3 cases applies
1. a adds l so it must be true in b'
2. a deletes l so it must be false in b'
3. a does not affect l so it remains unknown (thus is not in b')
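the three cases can be sketched as a dictionary update, assuming a 1-CNF belief state is a dict from fluent name to truth value with absent fluents unknown (fluent names here are illustrative strings):

```python
# a sketch of the 1-CNF belief-state update: a belief state is a dict from
# fluent name to truth value (absent = unknown); actions carry add & delete
# lists of fluent names (all names illustrative)
def update(belief, action):
    b2 = dict(belief)                 # start from the current belief
    for f in action["add"]:
        b2[f] = True                  # case 1: added, so true in b'
    for f in action["delete"]:
        b2[f] = False                 # case 2: deleted, so false in b'
    return b2                         # case 3: untouched fluents keep status

b0 = {}                               # everything unknown
b1 = update(b0, {"add": ["Open(Can1)"], "delete": []})
print(b1)
```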
Sensorless Planning Belief States
updating belief states: the example plan
recall the sensorless agent's solution plan for the furniture
painting problem
[RemoveLid(Can1), Paint(Chair, Can1), Paint(Table, Can1)]
apply RemoveLid(Can1) to b0 = Color(x, C(x))
(1) b1 = Color(x, C(x)) ∧ Open(Can1)
apply Paint(Chair, Can1) to b1
precondition Color(Can1, c) is satisfied by Color(x, C(x)) with the
binding {x/Can1, c/C(Can1)}
(2) b2 = Color(x, C(x)) ∧ Open(Can1) ∧ Color(Chair, C(Can1))
now apply the last action to get the next belief state, b3
(3) b3 = Color(x, C(x)) ∧ Open(Can1) ∧ Color(Chair, C(Can1)) ∧ Color(Table, C(Can1))
note that this satisfies the plan goal Goal(Color(Chair, c) ∧
Color(Table, c)) with c bound to C(Can1)
Sensorless Planning Belief States
50
the painting problem solution
this illustrates that the family of belief states given as
conjunctions of literals is closed under updates defined by
PDDL action schemas
so given n total fluents, any belief state is represented as a
conjunction of size O(n) (despite the O(2^n) states in the
world)
however, this is only the case when action schemas have the
same effects for all states in which their preconds are satisfied
if an action's effects depend on the state, dependencies among
fluents are introduced & the 1-CNF property does not apply
illustrated by an example from the simple vacuum world on the
next slides
Planning & Acting in the Real World
51
Recall Vacuum World
the simple vacuum world state space
Planning & Acting in the Real World
Sensorless Planning Belief States
52
if an action's effects depend on the state
dependencies among fluents are introduced & the 1-CNF
property does not apply
the effect of the Suck action depends on where it is done
(CleanL if agent is AtL, but CleanR if agent is AtR)
this requires conditional effects for action schemas:
when condition: effect, or for the vacuum world
Action(Suck,
EFFECT: when AtL: CleanL ∧ when AtR: CleanR)
considering conditional effects & belief states
applying the conditional action to the initial belief state yields a
result belief state
(AtL ∧ CleanL) ∨ (AtR ∧ CleanR)
so the belief state formula is no longer 1-CNF, and in the
worst case may be exponential in size
Planning & Acting in the Real World
Sensorless Planning Belief States
53
broadly speaking, the available options are
(1) use conditional effects for actions & deal with the loss of
the belief state representational simplicity
(2) use a conventional action representation, treating an action
whose preconditions are unsatisfied as inapplicable & leaving the
resulting state undefined
for sensorless planning, conditional effects are preferable
they yield "wiggly" belief states (& maybe that's inevitable
anyway for non-trivial problems)
an alternative is a conservative approximation of belief states (all
literals whose truth values can be determined, with the others
treated as unknown)
this yields planning that is sound but incomplete (if the problem
requires interactions among literals)
Planning & Acting in the Real World
Sensorless Planning Belief States
54
another alternative
the agent (algorithm) could attempt to use action
sequences that keep the belief state simple (1-CNF) as in
this vacuum world example
the target is a plan consisting of actions that will yield the simple
belief state representation, for example:
[Right, Suck, Left, Suck]
b0 = True
b1 = AtR
b2 = AtR ∧ CleanR
b3 = AtL ∧ CleanR
b4 = AtL ∧ CleanR ∧ CleanL
note that some alternative sequences (e.g. those beginning with
the Suck action) would break the 1-CNF representation
simpler belief states are attractive, as even human behaviour
shows: we frequently carry out small actions to reduce
uncertainty (keeping the belief state manageable)
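The plan above can be checked by brute force, tracking the belief state as an explicit set of physical states rather than as a formula. This is a sketch, not AIMA's code; the state encoding (loc, cleanL, cleanR) is our own.

```python
from itertools import product

def result(state, action):
    """Physical transition model for the simple (deterministic) vacuum world."""
    loc, cleanL, cleanR = state
    if action == "Right": return ("R", cleanL, cleanR)
    if action == "Left":  return ("L", cleanL, cleanR)
    if action == "Suck":  # conditional effect: cleans the square the agent is on
        return (loc, True, cleanR) if loc == "L" else (loc, cleanL, True)

def predict(belief, action):
    """Sensorless update: b' = {RESULT(s, a) : s in b}."""
    return {result(s, action) for s in belief}

b = set(product("LR", [True, False], [True, False]))   # b0 = True: all 8 states
for a in ["Right", "Suck", "Left", "Suck"]:
    b = predict(b, a)
# the belief set collapses to the single state ("L", True, True),
# i.e. AtL ∧ CleanL ∧ CleanR, matching b4 on the slide
```

Each intermediate value of b corresponds to one of the 1-CNF formulas b1..b4 above; a plan starting with Suck instead would leave a belief set not describable by a single conjunction.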
Sensorless Planning Belief States
55
yet another alternative for representing belief states
under the relaxed observability
we might represent belief states in terms of an initial belief
state + a sequence of actions, yielding an O(n + m) bound on
belief state size
a world of n literals, with a maximum of m actions in a sequence
if so, the issues relate to the difficulty of calculating when an
action is applicable or a goal is satisfied
we might use an entailment test: b0 ∧ Am ⊨ Gm, where
b0 is the initial belief state
Am are the successor state axioms for the actions in the
sequence, and Gm states the goal is achieved after m actions
so we want to show b0 ∧ Am ∧ ¬Gm is unsatisfiable
a good SAT solver may be able to determine this quite efficiently
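The unsatisfiability test can be illustrated with a tiny brute-force checker (a real planner would call a SAT solver). Everything here is a toy of our own making: formulas are Python predicates over a truth assignment, and the single successor-state axiom for a hypothetical RemoveLid step is invented for the example.

```python
from itertools import product

def unsatisfiable(formula, symbols):
    """True iff no truth assignment over symbols satisfies formula."""
    return not any(formula(dict(zip(symbols, vals)))
                   for vals in product([True, False], repeat=len(symbols)))

# toy encoding, one action step: Open1 holds after RemoveLid
symbols = ["Open0", "Lid0", "Open1"]
b0  = lambda m: not m["Open0"] and m["Lid0"]             # initial belief state
a_m = lambda m: m["Open1"] == (m["Open0"] or m["Lid0"])  # successor-state axiom
g_m = lambda m: m["Open1"]                               # goal after 1 action

# b0 ∧ Am ⊨ Gm holds iff b0 ∧ Am ∧ ¬Gm is unsatisfiable
entailed = unsatisfiable(lambda m: b0(m) and a_m(m) and not g_m(m), symbols)
# entailed == True: the lid must be open after the action
```

Enumerating assignments is exponential in the number of symbols, which is exactly why the slide points to efficient SAT solvers for realistic instances.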
Planning & Acting in the Real World
Sensorless Planning Heuristics
as a last consideration
we return to the question of the use of heuristics to prune the
search space
notice that for belief states, solving for a subset of the belief
state must be easier than solving it entirely
if b1 ⊆ b2 then h*(b1) ≤ h*(b2)
thus an admissible heuristic for a subset of states in the belief
state is an admissible heuristic for the belief state
candidate subsets include singletons, the individual states
assuming we adopt 1 of the admissible heuristics we saw for
classical planning, and that s1, ..., sN is a random selection of
states in belief state b, an accurate admissible heuristic is
H(b) = max{h(s1), ..., h(sN)}
still other alternatives involve converting to planning graph form,
where the initial state layer is derived from b
just its literals if b is 1-CNF or potentially derived from a non-CNF
representation
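The sampled-max heuristic H(b) can be sketched directly. The helper below is hypothetical (not from AIMA); the per-state heuristic h used in the example, counting dirty squares, is admissible for the unit-cost vacuum world.

```python
import random

def belief_heuristic(belief, h, n_samples=3, seed=0):
    """H(b) = max h(s_i) over a random sample of states from belief state b.
    Each h(s_i) is admissible for b, since solving s_i alone is no harder
    than solving all of b, so the max is admissible too."""
    rng = random.Random(seed)
    sample = rng.sample(sorted(belief), min(n_samples, len(belief)))
    return max(h(s) for s in sample)

# toy per-state heuristic: number of dirty squares; s = (loc, cleanL, cleanR)
h = lambda s: (not s[1]) + (not s[2])
b = {("L", False, False), ("R", True, False)}
# belief_heuristic(b, h) == 2: the all-dirty state dominates the sample
```

Taking the max over more samples tightens the estimate without losing admissibility, since every h(s_i) is a lower bound on h*(b).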
56
Contingent Planning
we relax some of the environmental assumptions of
classical planning to deal with environments that are
partially observable and/or non-deterministic
for such environments, a plan includes branching based on
percepts (recall percept schemas from the introduction)
Percept(Color(x, c),
PRECOND: Object(x) ∧ InView(x))
Percept(Color(can, c),
PRECOND: Can(can) ∧ InView(can) ∧ Open(can))
at plan execution, we represent a belief state as logical
formulas
the plan includes contingent/conditional branches
check branch conditions: does the current belief state entail
the condition or its negation
the conditions include first order properties (existential
quantification), so they may have multiple substitutions
an agent gets to choose one, applying it to the remainder of
the plan
57
Contingent Planning
a contingent plan solution for the painting problem
[LookAt(Table), LookAt(Chair),
if Color(Table, c) ∧ Color(Chair, c) then NoOp
else [RemoveLid(Can1), LookAt(Can1), RemoveLid(Can2), LookAt(Can2)
if Color(Table, c) ∧ Color(can, c) then Paint(Chair, can)
else if Color(Chair, c) ∧ Color(can, c) then Paint(Table, can)
else [Paint(Chair, Can1), Paint(Table, Can1)]]]
note: Color(Table, c) ∧ Color(can, c)
this might be satisfied under both {can/Can1} and {can/Can2} if
both cans are the same color as the table
the previous-to-new belief state calculation occurs in 2 stages
(1) after an action, a, as with the sensorless agent
b^ = (b − DEL(a)) ∪ ADD(a), where b^ is the predicted belief state,
represented as a conjunction of literals
(2) then in the percept stage, determine which percept axioms
hold in the now partially updated belief state, and add their
percepts + preconditions
58
Contingent Planning
59
(2) updating the belief state from the percept axioms
Percept(p, PRECOND: c), where c is a conjunction of literals
suppose percept literals p1, ..., pk are received
for a given percept p, there's either a single percept axiom or there
may be more than 1
if just 1, add its percept literal & preconditions to the belief state
if > 1, then we have to deal with multiple candidate preconditions
add p & the disjunction of the preconditions that may hold in the
predicted belief state b^
if this is the case, we've given up the 1-CNF form for belief state
representation and similar issues arise as for conditional effects for the
sensorless planner
given a way to generate exact or approximate belief states
(1) the algorithm for contingent search may generate contingent
plans
(2) actions with nondeterministic effects (disjunctive EFFECTs) can
be handled with minor changes to belief state updating
(3) heuristics, including those that were suggested for sensorless
planning, are available
Contingent Planning
the AND-OR-GRAPH-SEARCH algorithm
AND nodes arise from non-determinism & all outcomes must be handled,
while OR nodes indicate choices of actions from states
the algorithm
is depth first, mutually recursive, & returns a conditional plan
notation: [x | l] is the list formed by prepending x to the list l
function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
return OR-SEARCH(problem.INITIAL-STATE, problem, [])
function OR-SEARCH(state, problem, path) returns a conditional plan or failure
if problem.GOAL-TEST(state) then return the empty plan
if state is on path then return failure
/* repeated state on this path */
for each action in problem.ACTIONS(state) do
plan ← AND-SEARCH(RESULTS(state, action), problem, [state | path])
if plan ≠ failure then return [action | plan]
return failure
function AND-SEARCH(states, problem, path) returns a conditional plan or failure
for each si in states do
plani ← OR-SEARCH(si, problem, path)
if plani = failure then return failure
return [if s1 then plan1 else if s2 then plan2 else ... if sn-1 then plann-1 else plann]
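The pseudocode above translates directly into Python. Below is a runnable sketch tried on the erratic vacuum world; the dict-based problem encoding, the plan representation (a list whose conditional steps are ("if", branches) tuples), and the particular erratic-Suck model are our own simplifications, not AIMA's code.

```python
def or_search(state, problem, path):
    if problem["goal_test"](state): return []
    if state in path: return None                      # repeated state: failure
    for action in problem["actions"](state):
        plan = and_search(problem["results"](state, action), problem, [state] + path)
        if plan is not None:
            return [action] + plan
    return None

def and_search(states, problem, path):
    branches = {}
    for s in states:
        plan = or_search(s, problem, path)
        if plan is None: return None                   # one outcome unhandled: failure
        branches[s] = plan
    if len(branches) == 1: return next(iter(branches.values()))
    return [("if", branches)]                          # conditional step in the plan

# erratic vacuum world: Suck on a dirty square may also clean the other square,
# and (in this toy model) Suck on a clean square may deposit dirt
def results(s, a):
    loc, cL, cR = s
    if a == "Right": return {("R", cL, cR)}
    if a == "Left":  return {("L", cL, cR)}
    if loc == "L":
        return {("L", True, cR), ("L", True, True)} if not cL else {("L", cL, cR), ("L", False, cR)}
    return {("R", cL, True), ("R", True, True)} if not cR else {("R", cL, cR), ("R", cL, False)}

problem = {"goal_test": lambda s: s[1] and s[2],
           "actions": lambda s: ["Suck", "Right", "Left"],
           "results": results}
plan = or_search(("L", False, False), problem, [])
# plan == ["Suck", ("if", {...})]: Suck, then branch on whether the
# erratic action also cleaned the right-hand square
```

The path argument is what makes the search terminate: revisiting a state on the current path returns failure rather than looping forever.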
60
Online Replanning
61
replanning
this approach uses/captures knowledge about what the agent is
trying to do
some form of execution monitoring triggers replanning
it interleaves executing & planning, dealing with some
contingencies by including Replan branches in the plan
if the agent encounters a Replan during plan execution, it returns
to planning mode
why Replan?
may be error or omission in the world model used to build the plan
e.g. no state variable to represent the quantity of paint in a can (so
it could even be empty), or exogenous events (a can wasn't
properly sealed & the paint dried up), or a goal may be changed
environment monitoring by the online agent
(1) action monitoring: check preconds before executing an action
(2) plan monitoring: check that the remaining plan will still work
(3) goal monitoring: before executing, ask: "Is a better set of
goals available?"
62
Online Replanning
a replanning example
action monitoring indicates the agent's state is not as
planned, so it should try to get back to a state in the original
plan, minimizing total cost
when the agent finds it is not in the expected state, E, but
observes that it is instead in O, it Replans
Planning & Acting in the Real World
Online Replanning
63
replanning in the furniture painting problem
[LookAt(Table), LookAt(Chair),
if Color(Table, c) ∧ Color(Chair, c) then NoOp
else [RemoveLid(Can1), LookAt(Can1),
if Color(Table, c) ∧ Color(Can1, c) then Paint(Chair, Can1)
else REPLAN]]
the online planning agent, having painted the Chair, checks the
preconds for the remaining empty plan: that the table & chair
are the same colour
suppose the new paint didn't cover well & the old colour still shows
the agent needs to determine where in the whole plan to return to,
& what repair action sequence to use to get there
given that the current state matches that before Paint(Chair, Can1), an
empty repair sequence & new plan of the same [Paint] sequence is OK
the agent resumes execution monitoring, retries the Paint action &
loops like this until colours match
note that the loop is online: plan-execute-replan, not explicit in the
plan
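The plan-execute-replan loop above can be caricatured in a few lines. Everything here is a made-up toy: the world model behind the agent's plan omits the fact that one coat of paint doesn't cover, so the agent only succeeds by monitoring & retrying.

```python
def execute_with_monitoring(goal_colour):
    """Online loop: execute Paint, check the goal, replan (retry) on failure."""
    chair = {"colour": "white", "coats": 0}
    attempts = 0
    while chair["colour"] != goal_colour:   # plan monitoring: does the goal hold?
        attempts += 1                       # REPLAN: same one-step [Paint] plan
        chair["coats"] += 1
        if chair["coats"] >= 2:             # hidden dynamics: 2 coats needed to cover
            chair["colour"] = goal_colour
    return attempts

# execute_with_monitoring("green") == 2: the first coat fails to cover,
# the agent notices the colours still differ, replans & paints again
```

The loop lives in the agent, not in the plan: the plan itself never mentions repetition, exactly as the slide notes.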
64
Online Replanning
replan
the original plan doesn't handle all contingencies, the REPLAN
step could generate an entirely new plan
a plan monitoring agent may detect faults earlier, before the
corresponding actions are executed: when the current state
means that the remaining plan won't work
so it checks preconditions for success of the remaining plan
for each of its steps, except those contributed by some other step in
the remaining plan
the goal is to detect future failure as early as possible, & replan
note: in (rare) cases it might even detect serendipitous success
action monitoring by checking preconditions is relatively easy
to include but plan monitoring is more difficult
partial order & planning graph structures include information that
may support the plan monitoring approach
Planning & Acting in the Real World
65
Online Replanning
with replanning, plans will always succeed, right?
still there can be "dead ends", states from which no repair is
possible
a flawed model can lead the plan into dead ends
an example of a flawed model: the general assumption of
unlimited resources (for example, bottomless paint cans)
however, if we assume there are no dead ends, there will be a
plan to reach a goal from any state
and if we further assume that the environment is truly nondeterministic (that there's always a non-zero chance of success)
then a replanning agent will eventually achieve the goal
Planning & Acting in the Real World
66
Online Replanning
when replanning fails
another problem is that actions may not really be nondeterministic - instead, they may depend on preconditions the
agent does not know about
for example, that painting from an empty paint can has no effect
& will never lead to the goal
there are alternative approaches to cope with such failures
(1) the agent might randomly select a candidate repair plan
(open another can?)
(2) the agent also might learn a better model
modifying the world model to match percepts when predictions fail
Planning & Acting in the Real World
Multiagent Planning
67
the next relaxation of environmental assumptions
there may be multiple agents whose actions need to be taken
into account in formulating our plans
background: distinguish several slightly different paradigms
(1) multieffector planning
this is what we might call multitasking, really a single central agent
but with multiple ways of interacting with the environment,
simultaneously (like a multiarmed robot)
(2) multibody planning
here we consider multiple detached units moving separately, but
sharing percepts to generate a common representation of the
world state that is the basis of the plan
one version of the multibody scenario has central plan formulation
but somewhat decoupled execution
for example, a fleet/squadron of reconnaissance robots that are
sometimes out of communications range
multibody subplans for each individual body include communication
actions
68
Multiagent Planning
variations on the theme
with a central planning agent, there's a shared goal
it's also possible for distinct agents, each generating plans, to
have a shared goal
the latter paradigm suggests the new prototypical problem:
planning for a tennis doubles team
so shared goal situations can be either multibody (1 central
plan) or multiagent (each developing a plan, but with a
requirement for coordination mechanisms)
a system could even be some hybrid of centralized &
multiagent planning
as an example, the package delivery company develops
centralized routing plans but each truck driver may respond to
unforeseen weather, traffic issues with independent planning
Planning & Acting in the Real World
Multiagent Planning
69
our first model involves multiple simultaneous actions
the terminology is multiactor settings
we merge aspects of the multieffector, multibody, & multiagent
paradigms, then consider issues related to transition models,
correctness of plans, efficiency/complexity of planning algorithms
correctness: a correct plan, if carried out by the actors, will
achieve the goal
note that in a true multiagent situation, they might not agree
synchronization: a simplifying assumption we apply that all
actions require the same length of time, & multiple actions at a
step in the plan are simultaneous
under a deterministic environment assumption, the transition
model is given by the function: Result(s, a)
a single agent has b action choices, & b may be quite large
in the multiactor model with n actors, an action is now a joint action
written <a1, ..., an>, where ai is the action for the ith actor
Multiactor Scenario
70
complexity implications of the transition model
now with b^n joint actions we have a b^n branching factor for
planning
since planning algorithm complexity was already an issue, a shared
target for multiactor planning systems is to treat the actors as
decoupled so that complexity is linear in n rather than exponential
the loose coupling of actors may allow an approximately linear
improvement
this is analogous to issues we've encountered before: additive
heuristics for independent subproblems in planning, reducing of a
CSP graph to a tree (or multiple trees) to apply efficient
algorithms, ...
in multiactor planning: for loosely coupled problems, we treat them
as decoupled & then apply fixes as required to handle any
interactions
so the action schemas of the transition model treat actors as
independent
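The b^n blow-up is easy to see concretely; the numbers below are arbitrary for illustration.

```python
from itertools import product

b, n = 4, 3                                    # b actions per actor, n actors
joint_actions = list(product(range(b), repeat=n))
# len(joint_actions) == b ** n == 64: the joint branching factor grows
# exponentially in the number of actors, which is why multiactor planners
# aim to treat loosely coupled actors as (nearly) independent
```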
Multiactor Scenario
71
prototype problem: doubles tennis
the problem is formulated as returning a ball hit to the team,
while retaining court coverage
there are 2 players on the team, each is either at the net or
baseline, on the right side or left side of the court
actions are the moving of a player (actor) or the hitting of the
ball by a player
Planning & Acting in the Real World
72
Doubles Tennis Problem
here's the conventional (independence assumption)
multiactor problem setup for doubles tennis
Actors(A, B)
Init(At(A, LeftBaseline) ∧ At(B, RightNet) ∧
Approaching(Ball, RightBaseline) ∧ Partner(A, B) ∧ Partner(B, A))
Goal(Returned(Ball) ∧ (At(a, RightNet) ∨ At(a, LeftNet)))
Action(Hit(actor, Ball),
PRECOND: Approaching(Ball, loc) ∧ At(actor, loc)
EFFECT: Returned(Ball))
Action(Go(actor, to),
PRECOND: At(actor, loc) ∧ to ≠ loc
EFFECT: At(actor, to) ∧ ¬At(actor, loc))
Planning & Acting in the Real World
Multiactor Tennis Doubles Scenario
73
for the multiactor tennis problem
here is a joint plan given the problem description
Plan 1:
A:[Go(A, RightBaseline), Hit(A, Ball)]
B:[NoOp(B), NoOp(B)]
what issues arise given the current problem representation?
a legal and apparently successful plan could still have both
players hitting the ball at the same time (though that really won't
work)
the preconditions don't include constraints to preclude
interference of this type
a solution: revise the action schemas to include concurrent
action lists that can explicitly state actions are or are not
concurrent
Planning & Acting in the Real World
Controlling Concurrent Actions
74
a revised Hit action requires that it be done by only 1 actor
this is represented by including a concurrent action list
Action(Hit(a, Ball),
CONCURRENT: b ≠ a ⇒ ¬Hit(b, Ball)
PRECOND: Approaching(Ball, loc) ∧ At(a, loc)
EFFECT: Returned(Ball))
some actions might require concurrency for success
apparently tennis players require large coolers full of
refreshing drinks & 2 actors are required to carry the cooler
Action(Carry(a, cooler, here, there),
CONCURRENT: b ≠ a ∧ Carry(b, cooler, here, there)
PRECOND: At(a, here) ∧ At(cooler, here) ∧ Cooler(cooler)
EFFECT: At(a, there) ∧ At(cooler, there) ∧ ¬At(a, here) ∧ ¬At(cooler, here))
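A planner using such schemas must check each candidate joint action against the concurrency constraints. The hand-rolled checker below is a sketch of ours, not part of any planner; it encodes just the two constraints from these slides.

```python
def valid_joint_action(joint):
    """joint maps each actor to an action term, e.g. {"A": ("Hit", "Ball")}.
    Enforces: Hit forbids any concurrent Hit; Carry requires a partner."""
    hitters  = [a for a, act in joint.items() if act[0] == "Hit"]
    carriers = [a for a, act in joint.items() if act[0] == "Carry"]
    if len(hitters) > 1:                  # CONCURRENT: b ≠ a ⇒ ¬Hit(b, Ball)
        return False
    if carriers and len(carriers) < 2:    # CONCURRENT: Carry needs a second actor
        return False
    return True

# valid_joint_action({"A": ("Hit", "Ball"), "B": ("NoOp",)})            -> True
# valid_joint_action({"A": ("Hit", "Ball"), "B": ("Hit", "Ball")})      -> False
# valid_joint_action({"A": ("Carry", "Cooler"), "B": ("NoOp",)})        -> False
```

In a search algorithm this test would prune illegal joint actions at node expansion, which keeps the concurrency handling cheap when subplans are loosely coupled.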
Planning & Acting in the Real World
75
Multiactor Scenario
given appropriately revised action schemas
including concurrent action lists
it becomes relatively simple to adapt the classical planning
algorithms for multiactor planning
it depends on there being loose coupling of subplans
so the plan search algorithm does not encounter concurrency
constraints too frequently
further, HTN approaches & the techniques for partial
observability, contingency & replanning may also be adapted
for loosely coupled multiactor problems
next: full blown multiagent scenarios
each agent makes independent plans
Planning & Acting in the Real World
76
Multiple Agents
cooperation & coordination
each agent formulates its own plan, but based on shared
goals & a shared knowledge base
we continue with the doubles tennis example problem
Plan 1:
A:[Go(A, RightBaseline), Hit(A, Ball)]
B:[NoOp(B), NoOp(B)]
Plan 2:
A:[Go(A, LeftNet), NoOp(A)]
B:[Go(B, RightBaseline), Hit(B, Ball)]
either of these plans may work if both agents use it, but if A
does 1 & B does 2 (or vice versa), both or neither returns the
ball
so there has to be some mechanism that results in agents
agreeing on a single plan
Planning & Acting in the Real World
Multiple Agents
77
techniques for agreement on a single plan
(A) convention: adopt or agree upon some constraint on the
selection of joint plans, for example in doubles tennis, "stay on
your side of the court"
or a baseball center fielder takes fly balls hit "in the gap"
conventions are observable at more global levels among multiple
agents, when, for example, drivers agree to drive on a particular
side of the road
in higher order contexts, the conventions become "social laws"
(B) communication: between agents, as when 1 doubles player
yells "mine" to a teammate
the signal indicates which is the preferred joint plan
see similar examples in other team sports as when a baseball fielder
calls for the catch on a popup
note that the communication could be non-verbal
plan recognition applies when 1 agent begins execution & the initial
actions unambiguously indicate which plan to follow
78
Multiple Agents
the AIMA authors discuss natural world conventions
these may be the outcome of evolutionary processes
in harvester ant colonies - there is no central control
yet they execute elaborate "plans" where each individual ant
takes on 1 of multiple roles based on its current local conditions
convention or communication?
planning & "spontaneous" human social events (Aberdeen)?
another example from the natural world is the flocking
behaviour of birds
this can be seen as a cooperative multiagent process
successful simulations of flocking behaviour algorithmically over
a collection of agents ("boids") are possible if each observes its
neighbours & maximizes a weighted sum of 3 elements
(1) cohesion: +ve for closer to average position of neighbours
(2) separation: -ve for too close to a neighbour
(3) alignment: +ve for closer to the average heading of neighbours
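The three-element weighted sum can be sketched for a single boid. This is a 1-D toy of our own (Reynolds' boids are 2-D/3-D with vector arithmetic); the weights & the "too close" radius are arbitrary illustration values.

```python
def boid_velocity(pos, vel, neighbours, w=(0.1, 0.25, 0.05)):
    """One steering update for a boid, given observable (position, velocity)
    pairs for its neighbours; returns the new velocity."""
    wc, ws, wa = w
    avg_pos = sum(p for p, _ in neighbours) / len(neighbours)
    avg_vel = sum(v for _, v in neighbours) / len(neighbours)
    cohesion   = avg_pos - pos                                   # toward the centre
    separation = sum(pos - p for p, _ in neighbours
                     if abs(pos - p) < 1.0)                      # away if too close
    alignment  = avg_vel - vel                                   # match the heading
    return vel + wc * cohesion + ws * separation + wa * alignment

# a boid left of its neighbours, none too close, steers right (positive);
# a boid crowded by a neighbour at distance 0.5 steers away (negative)
```

Each boid runs only this local rule over its neighbours, yet iterating it over a whole population produces the flock-level behaviour described on the next slide.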
Planning & Acting in the Real World
Multiple Agents
79
convention & emergent behaviour
where complex global behavior can arise from the interaction of
simple local rules
in the boids example, the result is a pseudorigid "flock" that has
approximately constant density, does not disperse over time, &
makes occasional swooping motions
each agent operates without having any joint plan to explicitly
indicate actions of other agents
see some boids background & a demo at: boids online
UMP! (ultimate multiagent problems)
these involve cooperation within a team & competition against
another team, without central planning/control
robot soccer is an example, as are other similar dynamic team
sports (hockey, basketball)
this may be less true of, say, baseball or football, where some
central control is possible & there's a high degree of convention + communication
Summary
80
moving away from the limits of classical planning
(1) actions consume (& possibly produce) resources which we
treat as aggregates to control complexity
formulate partial plans, taking resource constraints into account,
then refine them
(2) time is a resource that can be considered with dedicated
scheduling algorithms or perhaps integrated with planning
(3) a HTN (Hierarchical Task Network) approach captures
knowledge in HLAs (High Level Actions) that may have multiple
implementations as sequences of lower level actions
angelic semantics for interpreting the effects of HLAs allows
planning in the space of HLAs without refinement into primitive
actions
HTN systems can create large, real-world plans
(4) classical planning's environment assumptions are too
rigid/optimistic for many problem domains
full observability, deterministic actions, a single agent
Summary
81
relaxing the assumptions of classical planning
(5) contingent & sensorless planning
contingent planning uses percepts during execution to conditionally
branch to appropriate subplans
sensorless/conformant planning may succeed in coercing the world
to a goal state without any percepts
for contingent & sensorless paradigms, plans are built by search in
the belief space, for which the techniques must address
representational & computational issues
(6) online planning agents interleave execution & planning
they monitor for problems & repair plans to recover from
unplanned states, allowing them to deal with nondeterministic
actions, exogenous events, & poor models of the environment
(7) multiple agents might be cooperative or competitive
the keys to success are in mechanisms for coordination
(8) future chapters will cover
probabilistic non-determinism, learning from experience to acquire
strategies