Transcript Document
CISC453 Winter 2010
Planning & Acting in the Real World
AIMA3e Ch 11
Time & Resources
Hierarchical Techniques
Relaxing Environmental Assumptions
Overview
extending planning language & algorithms
1. allow actions that have durations & resource constraints
yields a new "scheduling problem" paradigm
incorporating action durations & timing, required resources
2. hierarchical planning techniques
control the complexity of large scale plans by hierarchical
structuring of actions
3. uncertain environments
non-deterministic domains
4. multiagent environments
Scheduling versus Planning
recall from classical planning (Ch 10)
PDDL representations only allowed us to decide the relative
ordering among planning actions
up till now we've concentrated on what actions to do, given
their PRECONDs & EFFECTs
in the real world, other properties must be considered
actions occur at particular moments in time, have a beginning
and an end, occupy or require a certain amount of time
for a new category of Scheduling Problems we need to consider
the absolute times when an event or action will occur & the
durations of the events or actions
typically these are solved in 2 phases: planning then scheduling
a planning phase selects actions, respecting ordering constraints
this might be done by a human expert, and automated planners are
suitable if they yield minimal ordering constraints
then a scheduling phase incorporates temporal information so that
the result meets resource & deadline constraints
Time, Schedules & Resources
the Job-Shop Scheduling (JSS) paradigm includes
the requirement to complete a set of jobs
each job consists of a sequence of actions with ordering
constraints
each action
has a given duration and may also require some resources
resource constraints indicate the type of resource, the number of
it that are required, and whether the resource is consumed in the
action or is reusable
the goal is to determine a schedule
one that minimizes the total time required to complete all jobs,
(the makespan)
while respecting resource requirements & constraints
Job-Shop Scheduling Problem (JSSP)
JSSP involves a list of jobs to do
where a job is a fixed sequence of actions
actions have quantitative time durations & ordering constraints
actions use resources (which may be shared among jobs)
to solve the JSSP: find a schedule that
determines a start time for each action
1. that obeys all hard constraints
e.g. no temporal overlap between mutex actions (those using the same
one-action-at-a-time resource)
2. for our purposes, we'll operationalize cost as the total time to
perform all actions and jobs
note that the cost function could be more complex (it could include the
resources used, time delays incurred, ...)
our example: automobile assembly scheduling
the jobs: assemble two cars
each job has 3 actions: add the engine, add the wheels, inspect the
whole car
a resource constraint is that we do the engine & wheel actions at a
special one-car-only work station
Ex: Car Construction Scheduling
the job shop scheduling problem of assembling 2 cars
includes required times & resource constraints
notation: A < B indicates action A must precede action B
Jobs({AddEngine1 < AddWheels1 < Inspect1},
{AddEngine2 < AddWheels2 < Inspect2})
Resources (EngineHoists(1), WheelStations(1), Inspectors(2), LugNuts(500))
Action(AddEngine1, DURATION: 30,
USE: EngineHoists(1))
Action(AddEngine2, DURATION: 60,
USE: EngineHoists(1))
Action(AddWheels1, DURATION:30,
CONSUME: LugNuts(20), USE: WheelStations(1))
Action(AddWheels2, DURATION:15,
CONSUME: LugNuts(20), USE: WheelStations(1))
Action(Inspecti, DURATION: 10,
USE: Inspectors(1))
Car Construction Scheduling
note that the action schemas
list resources as numerical quantities, not named entities
so Inspectors(2), rather than Inspector(I1) & Inspector(I2)
this process of aggregation is a general one
it groups objects that are indistinguishable with respect to the
current purpose
this can help reduce complexity of the solution
for example, a candidate schedule that requires (concurrently)
more than the number of aggregated resources can be rejected
without having to exhaustively try assignments of individuals to
actions
Planning + Scheduling for JSSP
Planning + Scheduling for Job-Shop Problems
scheduling differs from standard planning problem
considers when an action starts and when it ends
so in addition to order (planning), duration is also considered
we begin with ignoring the resource constraints, solving the
temporal domain issues to minimize the makespan
this requires finding the earliest start times for all actions
consistent with the problem's ordering constraints
we create a partially-ordered plan, representing ordering
constraints in a directed graph of actions
then we apply the critical path method to determine the start and
end times for each action
Graph of POP + Critical Path
the critical path is the path with longest total duration
it is "critical" in that it sets the duration for the whole plan and
delaying the start of any action on it extends the whole plan
it is the sequence of actions, each of which has no slack
each must begin at a particular time, otherwise the whole plan is
delayed
actions off the critical path have a window of time given by the
earliest possible start time ES & the latest possible start time LS
the illustrated solution assumes no resource constraints
note that the 2 engines are being added simultaneously
the figure shows [ES, LS] for each action, & slack is LS - ES
the time required is indicated below the action name & bold links
mark the critical path
JSSP: (1) Temporal Constraints
schedule for the problem
is given by ES & LS times for all actions
note the 15 minutes slack for each action in the top job, versus 0
(by definition) in the critical path job
formulas for ES & LS also outline a dynamic-programming
algorithm for computing them
A, B are actions, A < B indicates A must come before B
ES(Start) = 0
ES(B) = max_{A<B} [ES(A) + Duration(A)]
LS(Finish) = ES(Finish)
LS(A) = min_{B>A} LS(B) - Duration(A)
complexity is O(Nb) where N is number of actions and b is the
maximum branching factor into or out of an action
so without resource constraints, given a partial ordering of actions,
finding the minimum duration schedule is (a pleasant surprise!)
computationally easy
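the ES/LS recurrences can be run directly on the car example; here is a minimal Python sketch (durations & orderings from the slides, resource constraints ignored, function names illustrative):

```python
# a minimal sketch of the ES/LS recurrences on the car example
# (durations & orderings from the slides; resource constraints ignored)
durations = {"AddEngine1": 30, "AddWheels1": 30, "Inspect1": 10,
             "AddEngine2": 60, "AddWheels2": 15, "Inspect2": 10}
preds = {"AddEngine1": [], "AddWheels1": ["AddEngine1"], "Inspect1": ["AddWheels1"],
         "AddEngine2": [], "AddWheels2": ["AddEngine2"], "Inspect2": ["AddWheels2"]}

def ES(a):  # earliest start: max over predecessors of their ES + duration
    return max((ES(p) + durations[p] for p in preds[a]), default=0)

finish = max(ES(a) + durations[a] for a in durations)  # the makespan

succs = {a: [b for b in durations if a in preds[b]] for a in durations}

def LS(a):  # latest start: min over successors of their LS, minus own duration
    return min((LS(b) for b in succs[a]), default=finish) - durations[a]

for a in durations:
    print(a, ES(a), LS(a), LS(a) - ES(a))   # [ES, LS] & slack = LS - ES
print("makespan:", finish)
```

running this reproduces the 85-minute makespan, with 15 minutes of slack on each top-job action and zero slack along the critical path.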
JSSP: (1) Temporal Constraints
timeline for the
solution
grey rectangles
give intervals for
actions
empty portions
show slack
Solution from POP + Critical Path
1. the partially-ordered plan (above)
2. the schedule from the critical-path method (below)
notice that this solution still omits resource constraints
for example, the 2 engines are being added simultaneously
Scheduling with Resources
including resource constraints
critical path calculations involve conjunctions of linear
inequalities over action start & end times
they become more complicated when resource constraints are
included (for example, each AddEngine action requires the 1
EngineHoist, so they cannot overlap)
they introduce disjunctions of linear inequalities for possible
orderings & as a result, complexity becomes NP-hard!!
here's a solution accounting for resource constraints
reusable resources are in the left column, actions align with resources
this shortest solution schedule requires 115 minutes
Scheduling with Resources
including resource constraints
notice
that the shortest solution is 30 minutes longer than the critical
path without resource constraints
that multiple inspector resource units are not needed for this job,
indicating the possibility for reallocation of this resource
that the "critical path" now is: AddEngine1, AddEngine2,
AddWheels2, Inspect2.
the remaining actions have considerable slack time, they can begin
much later without affecting the total plan time
Scheduling with Resources
for including resource constraints
a variety of solution techniques have been tested
one simple approach uses the minimum slack heuristic
at each step schedule next the unscheduled action that has its
predecessors scheduled & has the least slack
update ES & LS for impacted actions & repeat
note the similarity to minimum-remaining values (MRV)
heuristic of CSPs
applied to this example, it yields a 130 minute solution
15 minutes longer than the optimal solution
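the greedy loop can be sketched in Python (simplified: each resource has one reusable unit, and slack values are taken from the resource-free critical-path numbers rather than recomputed after every step; all names illustrative):

```python
# a hedged sketch of the minimum-slack scheduler (simplified: one reusable
# unit per resource; slack fixed from the resource-free critical-path numbers
# instead of being updated after each scheduling step)
durations = {"AddEngine1": 30, "AddWheels1": 30, "Inspect1": 10,
             "AddEngine2": 60, "AddWheels2": 15, "Inspect2": 10}
preds = {"AddEngine1": [], "AddWheels1": ["AddEngine1"], "Inspect1": ["AddWheels1"],
         "AddEngine2": [], "AddWheels2": ["AddEngine2"], "Inspect2": ["AddWheels2"]}
resource = {"AddEngine1": "EngineHoist", "AddEngine2": "EngineHoist",
            "AddWheels1": "WheelStation", "AddWheels2": "WheelStation",
            "Inspect1": None, "Inspect2": None}   # 2 inspectors: no conflict
slack = {"AddEngine1": 15, "AddWheels1": 15, "Inspect1": 15,
         "AddEngine2": 0, "AddWheels2": 0, "Inspect2": 0}

start, finish, free_at = {}, {}, {}
while len(start) < len(durations):
    ready = [a for a in durations
             if a not in start and all(p in start for p in preds[a])]
    a = min(ready, key=lambda x: slack[x])          # least-slack action next
    t = max([finish[p] for p in preds[a]] or [0])   # wait for predecessors
    if resource[a]:                                 # wait for the shared resource
        t = max(t, free_at.get(resource[a], 0))
        free_at[resource[a]] = t + durations[a]
    start[a], finish[a] = t, t + durations[a]
print("makespan:", max(finish.values()))
```

even this simplified variant yields the 130-minute schedule mentioned above: it greedily commits the hoist to AddEngine2 first, delaying the whole top job.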
difficult scheduling problems may require a different approach
they may involve reconsidering actions & constraints, integrating
the planning & scheduling phases by including durations &
overlaps in constructing the POP
this approach is a focus of current research interest
Time & Resource Constraints
summary
alternative approaches to planning with time & resource
constraints
1. serial: plan, then schedule
use a partial or full-order planner
then schedule to determine actual start times
2. interleaved: mix planning and scheduling
for example, include resource constraints during partial planning
these can determine conflicts between actions
notes:
remember that so far we are still working in classical planning
environments
so, fully observable, deterministic, static and discrete
Hierarchical Planning
next
we add techniques to handle the plan complexity issue
HTN: hierarchical task network planning
this works in a top-down fashion
similar to the stepwise refinement approach to programming
plans that are built from a fixed set of small atomic actions
will become unwieldy as the planning problem grows large
we need to plan at a higher level of abstraction
reduce complexity by hierarchical decomposition of plan steps
at each level of the hierarchy a planning task is reduced to a
small number of activities at the next lower level
the low number of activities
means the computational cost of arranging these activities can
be lowered
Hierarchical Planning
an example: the Hawaiian vacation plan
recall: the AIMA authors live/work in San Francisco Bay area
go to SFO airport
take flight to Honolulu
do vacation stuff for 2 weeks
take flight back to SFO
go Home
each action in this plan actually embodies another planning
task
for example: the go to SFO airport action might be expanded
drive to long term parking at SFO
park
take shuttle to passenger terminal
& each action can be decomposed until the level consists of
actions that can be executed without deliberation
note: some component actions might not be refined until plan
execution time (interleaving: a somewhat different topic)
Hierarchical Planning
basic approach
at each level, each component is reduced to a small number of
activities at the next lower level
this keeps the computational cost of arranging them low
otherwise, there are too many individual atomic actions for
non-trivial problems (yielding high branching factor & depth)
the formalism is HTN planning
Hierarchical Task Network planning
notes
we retain the basic environmental assumptions as for classical
planning
what we previously simply called actions are now "primitive
actions"
we add HLAs: High Level Actions (like go to SFO airport)
each has 1 or more possible refinements
refinements are sequences of actions, either HLAs or primitive
actions
Hierarchical Task Network
alternative refinements: notation
for the HLA: Go(Home, SFO)
Refinement (Go(Home, SFO),
STEPS: [Drive(Home, SFOLongTermParking),
Shuttle(SFOLongTermParking, SFO)])
Refinement (Go(Home, SFO),
STEPS: [Taxi(Home, SFO)])
the HLAs and their refinements
capture knowledge about how to do things
terminology: if the HLA refines to only primitive actions
it is called an implementation
the implementation of a high-level plan (sequence of HLAs)
concatenates the implementations for each HLA
the preconditions/effects representation of primitive action
schemas allows a decision about whether an implementation of
a high-level plan achieves the goal
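the refinement library and the notion of an implementation can be sketched as Python data (step names from the slides; the generator is an illustrative helper):

```python
# a sketch of the refinement library as Python data (steps from the slides);
# an "implementation" is any refinement down to primitive actions only
REFINEMENTS = {
    "Go(Home, SFO)": [
        ["Drive(Home, SFOLongTermParking)", "Shuttle(SFOLongTermParking, SFO)"],
        ["Taxi(Home, SFO)"],
    ],
}

def implementations(plan):
    """yield every primitive action sequence a high-level plan refines to"""
    if not plan:
        yield []
        return
    first, rest = plan[0], plan[1:]
    if first in REFINEMENTS:                  # an HLA: expand each refinement
        for steps in REFINEMENTS[first]:
            yield from implementations(steps + rest)
    else:                                     # a primitive action: keep it
        for tail in implementations(rest):
            yield [first] + tail

for impl in implementations(["Go(Home, SFO)"]):
    print(impl)
```

note how the implementation of a high-level plan is just the concatenation of implementations of its HLAs, as described above.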
Hierarchical Task Network
HLAs & refinements & plan goals
in the HTN approach, the goal is achieved if any
implementation achieves it
this is the case since an agent may choose the
implementation to execute (unlike non-deterministic
environments where "nature" chooses)
in the simplest case there's a single implementation of an HLA
we get preconds/effects from the implementation, and then treat
the HLA as a primitive action
where there are multiple implementations, either
1. search over implementations for 1 that solves the problem
OR
2. reason over HLAs directly
derive provably correct abstract plans independent of the specific
implementations
Search Over Implementations
1. the search approach
this involves generation of refinements by replacing an HLA in
the current plan with a candidate refinement until the plan
achieves the goal
the algorithm on the next slide shows a version using
breadth-first tree search, considering plans in the order of the
depth of nesting of refinements
note that other search versions (graph-search) and strategies
(depth-first, iterative deepening) may be formulated by redesigning the algorithm
explores the space of sequences derived from knowledge in the
HLA library re: how things should be done
the action sequences of refinements & their preconditions code
knowledge about the planning domain
HTN planners can generate very large plans with little search
Search Over Implementations
the search algorithm for refinements of HLAs
function HIERARCHICAL-SEARCH(problem, hierarchy) returns a solution or failure
frontier ← a FIFO queue with [Act] as the only element
loop do
if EMPTY?(frontier) then return failure
plan ← POP(frontier)
/* chooses the shallowest plan in frontier */
hla ← the first HLA in plan, or null if none
prefix, suffix ← the action subsequences before and after hla in plan
outcome ← RESULT(problem.INITIAL-STATE, prefix)
if hla is null then
/* so plan is primitive & outcome is its result */
if outcome satisfies problem.GOAL then return plan
/* insert all refinements of the current hla into the queue */
else for each sequence in REFINEMENTS(hla, outcome, hierarchy) do
frontier ← INSERT(APPEND(prefix, sequence, suffix), frontier)
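a runnable Python sketch of this search, under simplifying assumptions: states are strings, each primitive action is a small transition table, and the outcome argument to REFINEMENTS is ignored (all names illustrative):

```python
# a runnable sketch of breadth-first search over refinements, assuming:
# states are strings, primitive actions are transition tables, and
# refinements do not depend on the outcome state
from collections import deque

PRIMITIVE = {"Drive": {"Home": "Parking"},
             "Shuttle": {"Parking": "SFO"},
             "Taxi": {"Home": "SFO"}}
REFINEMENTS = {"Act": [["Go"]],               # top-level task, as in [Act]
               "Go": [["Drive", "Shuttle"], ["Taxi"]]}

def result(state, actions):                   # apply a primitive sequence
    for a in actions:
        state = PRIMITIVE[a].get(state)       # None if precondition fails
    return state

def hierarchical_search(initial, goal):
    frontier = deque([["Act"]])               # FIFO: shallowest plan first
    while frontier:
        plan = frontier.popleft()
        hla = next((a for a in plan if a in REFINEMENTS), None)
        if hla is None:                       # plan is all-primitive: test it
            if result(initial, plan) == goal:
                return plan
        else:                                 # replace hla by each refinement
            i = plan.index(hla)
            for seq in REFINEMENTS[hla]:
                frontier.append(plan[:i] + seq + plan[i + 1:])

print(hierarchical_search("Home", "SFO"))
```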
HTN Examples
O-PLAN: an example of a real-world system
the O-PLAN system does both planning & scheduling, and has
been used commercially by the Hitachi company
one specific sample problem concerns a product line of 350
items involving 35 machines and 2000+ different operations
for this problem, the planner produces a 30-day schedule of
3x8-hour shifts, with 10s of millions of steps
a major benefit of the hierarchical structure with the HTN
approach is the results are often easily understood by humans
abstracting away from excessive detail
(1) makes large scale planning/scheduling feasible
(2) enhances comprehensibility
HTN Efficiency
computational comparisons for a hypothetical domain
assumption 1: a non-hierarchical progression planner with d
primitive actions, b possibilities at each state: O(b^d)
assumption 2: an HTN planner with r refinements of each
non-primitive, each with k actions at each level
how many different refinement trees does this yield?
depth: number of levels below the root = log_k d
then the number of internal refinement nodes = 1 + k + k^2 + … +
k^(log_k d - 1) = (d - 1)/(k - 1)
each internal node has r possible refinements, so r^((d - 1)/(k - 1))
possible regular decomposition trees
the message: keeping r small & k large yields big savings
(roughly kth root of non-hierarchical cost if b & r are
comparable)
nice as a goal, but long action sequences that are useful over a
range of problems are rare
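the internal-node identity can be checked numerically (a quick sketch; internal_nodes is an illustrative helper that assumes d is an exact power of k, so the tree is regular):

```python
# a quick numeric check of 1 + k + ... + k^(log_k d - 1) = (d - 1)/(k - 1)
# (assumes d is an exact power of k, so the refinement tree is regular)
import math

def internal_nodes(d, k):
    levels = round(math.log(d, k))            # depth below the root, log_k d
    return sum(k**i for i in range(levels))   # geometric sum of node counts

assert internal_nodes(16, 2) == (16 - 1) // (2 - 1)   # 15
assert internal_nodes(27, 3) == (27 - 1) // (3 - 1)   # 13
r = 4                                          # refinements per internal node
print(r ** internal_nodes(16, 2))              # number of decomposition trees
```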
HTN Efficiency
HTN computational efficiency
building the plan library is critically important to achieving
efficiency gains in HTN planning
so, might the refinements be learned?
as one example, an agent could build plans conventionally then
save them as a refinement of an HLA defined as the current
task/problem
one goal is "generalizing" the methods that are built, eliminating
problem-instance specific detail, keeping only key plan
components
Hierarchical Planning
we've just looked at the approach of searching over
fully refined plans
that is, full implementations
the algorithm refines plans to primitive actions in order to
check whether they achieve the problem goal
now we move on to searching for abstract solutions
the checking occurs at the level of HLAs
possibly with preconditions/effects descriptions for HLAs
the result is that search is in the much smaller HLA space,
after which we refine the resulting plan
Hierarchical Planning
searching for abstract solutions
this approach will require that HLA descriptions have the
downward refinement property
every high level plan that apparently solves the problem (from the
description of its steps) has at least 1 implementation that
achieves the goal
since search is not at the level of sequences of primitive
actions, a core issue is the describing of effects of actions
(HLAs) with multiple implementations
assuming a problem description with only +ve preconds & goals,
we might describe an HLA's +ve effects in terms of those achieved
by every implementation, and its -ve effects in terms of those
resulting from any implementation
this would satisfy the downward refinement property
however, requiring an effect to be true for every implementation is
too restrictive, it assumes that an adversary chooses the
implementation (assumes an underlying non-deterministic model)
Plan Search in HLA Space
plan search in HLA space
there are alternative models for which implementation is
chosen, either
(1) demonic non-determinism where some adversary makes the
choice
(2) angelic non-determinism, where the agent chooses
if we adopt angelic semantics for HLA descriptions
the resulting notation uses simple set operations/notation
the key concept is that of the reachable set for some HLA h &
state s, notation: Reach(s, h)
this is the set of states reachable by any implementation of h
(since under angelic semantics, the agent gets to choose)
for a sequence of HLAs [h1, h2] the reachable set is the union
of all reachable sets from applying h2 in each state in the
reachable set of h1 (for notation details see p 411)
a sequence of HLAs forming a high level plan is a solution if
its reachable set intersects the set of goal states
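the reachable-set calculation can be sketched in a few lines of Python, assuming states are frozensets of fluents and an HLA is the list of its implementations, each a state-to-state function (the fluents "A", "B", "G" are made up for illustration):

```python
# a sketch of angelic reachable sets: a state is a frozenset of fluents,
# an HLA is the list of its implementations (state -> state functions)
def reach(states, hla):
    # union over the current states and over the agent's possible choices
    return {impl(s) for s in states for impl in hla}

def reach_seq(s0, hlas):          # Reach(s0, [h1, h2, ...])
    states = {s0}
    for h in hlas:
        states = reach(states, h)
    return states

h1 = [lambda s: s | {"A"}, lambda s: s | {"B"}]            # two choices
h2 = [lambda s: s | {"G"} if "A" in s else s]              # one choice
reachable = reach_seq(frozenset(), [h1, h2])
goal = lambda s: "G" in s
print(any(goal(s) for s in reachable))   # the plan [h1, h2] is a solution
```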
Plan Search in HLA Space
illustration of reachable sets, sequences of HLAs
dots are states, shaded areas = goal states
darker arrows: possible implementations of h1
lighter arrows: possible implementations of h2
(a) reachable set for HLA h1
(b) reachable set for the sequence [h1, h2]
circled dots show the sequence achieving the goal
Planning in HLA Space
using this model
planning consists of searching in HLA space for a sequence
with a reachable set that intersects the goal, then refining
that abstract plan
note: we haven't considered yet the issue of
representing reachable sets as the effects of HLAs
our basic planning model has states as conjunctions of fluents
if we treat the fluents of a planning problem as state
variables, then under angelic semantics an HLA controls the
values of these variables, depending on which implementation
is actually selected
an HLA may have 9 different effects on a given variable
if it starts true, it can always keep it true, always make it false,
or have a choice & similarly for a variable that is initially false
any combination of the 3 choices for each case is possible,
yielding 3^2 = 9 effects
Planning in HLA Space
using this model
so there are 9 possible combinations of choices for the effects
on variables
we introduce some additional notation to capture this idea
note some slight formatting differences between the details of
the notation used here versus in the textbook
~ indicates possibility, the dependence on the agent's choice of
implementation
~+A indicates the possibility of adding A
~-A represents the possible deleting of A
~±A stands for possibly adding or deleting A
Planning in HLA Space
possible effects of HLAs
a simple example uses the HLA for going to the airport
Go(Home, SFO)
Refinement (Go(Home, SFO),
STEPS: [Drive(Home, SFOLongTermParking), Shuttle(SFOLongTermParking, SFO)])
Refinement (Go(Home, SFO),
STEPS: [Taxi(Home, SFO)])
this HLA has ~-Cash as a possible effect, since the agent may choose
the refinement of going by taxi & have to pay
we can use this notation & angelic reachable state semantics to
illustrate how an HLA sequence [h1, h2] reaches a goal
it's often the case that an HLA's effects can only be approximated
(since it may have infinitely many implementations & produce
arbitrarily "wiggly" reachable sets)
we use approximate descriptions of result states of HLAs that are
optimistic: REACH+(s, h) or pessimistic: REACH-(s, h)
one may overestimate, the other underestimate
here's the definition of the relationship
REACH-(s, h) ⊆ REACH(s, h) ⊆ REACH+(s, h)
Planning in HLA Space
possible effects of HLAs using approximate descriptions
of result states
with approximate descriptions, we need to reconsider how to
apply/interpret the goal test
(1) if the optimistic reachable set for a plan does not intersect the
goal, then the plan is not a solution
(2) if the pessimistic reachable set for a plan intersects the goal,
then the plan is a solution
(3) if the optimistic set intersects but the pessimistic set does not,
the goal test is not decided & we need to refine the plan to resolve
residual ambiguity
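the three cases amount to a small decision procedure over sets of states; a sketch using plain Python sets (names illustrative):

```python
# the three-way goal test over approximate reachable sets (plain Python sets)
def classify(reach_opt, reach_pess, goal):
    if not (reach_opt & goal):
        return "not a solution"      # case (1): optimistic set misses the goal
    if reach_pess & goal:
        return "solution"            # case (2): even the pessimistic set hits it
    return "refine further"          # case (3): undecided, keep refining
```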
Planning in HLA Space
illustration
shading shows the set of goal states
reachable sets: R+ (optimistic) shown by dashed boundary, R- (pessimistic) by solid boundary
in (a) the plan shown by a dark arrow achieves the goal & the
plan shown by the lighter arrow does not
in (b), the plan needs further refinement since the R+
(optimistic) set intersects the goal but the R- (pessimistic) does
not
Planning in HLA Space
the algorithm
hierarchical planning with approximate angelic descriptions
function ANGELIC-SEARCH(problem, hierarchy, initialPlan) returns solution or fail
frontier ← a FIFO queue with initialPlan as the only element
loop do
if EMPTY?(frontier) then return fail
plan ← POP(frontier)
/* chooses shallowest node in frontier */
if REACH+(problem.INITIAL-STATE, plan) intersects problem.GOAL then
/* opt'c*/
if plan is primitive then return plan
/* REACH+ is exact for primitive plans */
guaranteed ← REACH-(problem.INITIAL-STATE, plan) ∩ problem.GOAL /* pess'c*/
/* pessimistic set includes a goal state & we're not in infinite regress of refinements */
if guaranteed ≠ {} and MAKING-PROGRESS(plan, initialPlan) then
finalState ← any element of guaranteed
return DECOMPOSE(hierarchy, problem.INITIAL-STATE, plan, finalState)
hla ← some HLA in plan
prefix, suffix ← the action subsequences before & after hla in plan
outcome ← RESULT(problem.INITIAL-STATE, prefix)
for each sequence in REFINEMENTS(hla, outcome, hierarchy) do
frontier ← INSERT(APPEND(prefix, sequence, suffix), frontier)
Planning in HLA Space
the decompose function
mutually recursive with ANGELIC-SEARCH
regress from goal to generate successful plan at next level of
refinement
function DECOMPOSE(hierarchy, s0, plan, sf) returns a solution
solution ← an empty plan
while plan is not empty do
action ← REMOVE-LAST(plan)
si ← a state in REACH-(s0, plan) such that sf ∈ REACH-(si, action)
problem ← a problem with INITIAL-STATE = si and GOAL = sf
solution ← APPEND(ANGELIC-SEARCH(problem, hierarchy, action), solution)
sf ← si
return solution
Planning in HLA Space
notes
ANGELIC-SEARCH has the same basic structure as the
previous algorithm (BFS in space of refinements)
the algorithm detects plans that are or aren't solutions by
checking intersections of optimistic & pessimistic reachable
sets with the goal
when it finds a workable abstract plan, it decomposes the
original problem into subproblems, one for each step of the
plan
the initial state & goal for each subproblem are derived by
regressing the guaranteed reachable goal state through the
action schemas for each step of the plan
ANGELIC-SEARCH has a computational advantage over the
previous hierarchical search algorithm, which in turn may
have a large advantage over plain old exhaustive search
Least Cost & Angelic Search
the same approach can be adapted to find a least cost
solution
this generalizes the reachable set concept so that a state,
instead of being reachable or not, has costs for the most
efficient way of getting to it (∞ for unreachable states)
then optimistic & pessimistic descriptions bound the costs
the holy grail of hierarchical planning
this revision may allow finding a provably optimal abstract plan
without checking all implementations
extensions: the approach can also be applied to online search
in the form of hierarchical lookahead algorithms (recall LRTA*)
the resulting algorithm resembles the human approach to problems
like the vacation plan
initially consider alternatives at the abstract level, over long time
scales
leave parts of the plan abstract until execution time, though other
parts are expanded into detail (flights, lodging) to guarantee feasibility
of the plan
Nondeterministic Domains
finally, we'll relax some of the environment
assumptions of the classical planning model
in part, these parallel the extensions of our earlier (CISC352)
discussions of search
we'll consider the issues in 3 sub-categories
(1) sensorless planning (conformant planning)
completely drop the observability property for the environment
(2) contingency planning
for partially observable & nondeterministic environments
(3) online planning & replanning
for unknown environments
however, we begin with some background
BKGD: Nondeterministic Domains
note some distinct differences from the search
paradigms
the factored representation of states allows an alternative belief
state representation
plus, we have the availability of the domain-independent
heuristics developed for classical planning
as usual, we explore issues using a prototype problem
this time it's the task of painting a chair & table so that their
colors match
in the initial state, the agent has 2 cans of paint, colors unknown,
likewise the chair & table colors are unknown, & only the table is
visible
plus there are actions to remove the lid of a can, & to paint from
an open can (see the next slide)
The Furniture Painting Problem
the furniture painting problem
Init(Object(Table) ∧ Object(Chair) ∧ Can(C1) ∧ Can(C2) ∧ InView(Table))
Goal(Color(Chair, c) ∧ Color(Table, c))
Action(RemoveLid(can),
PRECOND: Can(can)
EFFECT: Open(can))
Action(Paint(x, can),
PRECOND: Object(x) ∧ Can(can) ∧ Color(can, c) ∧ Open(can)
EFFECT: Color(x, c))
BKGD: Nondeterministic Domains
the environment
since it may not be fully observable, we'll allow action
schemas to have variables in preconditions & effects that
aren't in the action's variable list
Paint(x, can) omits the variable c representing the color of the
paint in can
the agent may not know what color is in a can
in some variants, the agent will have to use percepts it gets
while executing the plan, so planning needs to model sensors
the mechanism: Percept Schemas
Percept(Color(x, c),
PRECOND: Object(x) ∧ InView(x))
Percept(Color(can, c),
PRECOND: Can(can) ∧ InView(can) ∧ Open(can))
when an object is in view, the agent will perceive its color
if an open can is in view, the agent will perceive the paint color
BKGD: Nondeterministic Domains
we still need an Action Schema for inspecting objects
Action(LookAt(x),
PRECOND: InView(y) ∧ (x ≠ y)
EFFECT: InView(x) ∧ ¬InView(y))
in a fully observable environment, we include a percept axiom
with no preconds for each fluent
of course, a sensorless agent has no percept axioms
note: it can still coerce the table & chair to the same color to
solve the problem (though it won't know what color that is)
a contingent planning agent with sensors can do better
inspect the objects, & if they're the same color, done
otherwise check the paint cans & if one is the same color as an
object, paint the other object with it
otherwise paint both objects any color
an online agent produces contingent plans with few branches
handling problems as they occur by replanning
BKGD: Nondeterministic Domains
a contingent planner assumes that the effects of an action are
successful
a replanning agent checks results, generating new plans to fix
any detected flaws
in the real world we find combinations of approaches
Sensorless Planning Belief States
unobservable environment = Sensorless Planning
these problems are belief state planning problems with physical
transitions represented by action schemas
we assume a deterministic environment
we represent belief states as logical formulas rather than the
explicit sets of atomic states we saw for sensorless search
for the prototype planning problem: furniture painting
1. we omit the InView fluents
2. some fluents hold in all belief states, so we can omit them for
brevity: (Object(Table), Object(Chair), Can(C1), Can(C2))
3. the agent knows things have a color (∀x ∃c Color(x, c)), but
doesn't know the color of anything or the open vs closed state of
cans
4. yields an initial belief state b0 = Color(x, C(x)), where C(x) is a
Skolem function to replace the existentially quantified variable
5. we drop the closed-world assumption of classical planning, so
states may contain +ve & -ve fluents & if a fluent does not appear,
its value is unknown
Sensorless Planning Belief States
belief states
specify how the world could be
they are represented as logical formulas
each is a set of possible worlds that satisfy the formula
in a belief state b, actions available to the agent are those
with their preconds satisfied in b
given the initial belief state b0 = Color(x, C(x)), a simple
solution for the painting problem plan is:
[RemoveLid(Can1), Paint(Chair, Can1), Paint(Table, Can1)]
we'll update belief states as actions are taken, using the rule
b' = RESULT(b, a) = {s' : s' = RESULTP(s, a) and s ∈ b}
where RESULTP defines the physical transition model
Sensorless Planning Belief States
updating belief states
we assume that the initial belief state is 1-CNF form, that is, a
conjunction of literals
b' is derived based on what happens for the literals l in the
physical states s that are in b when a is applied
if the truth value of a literal is known in b then in b' it is given by
the current value, plus the add list of a & the delete list of a
if a literal's truth value is unknown, 1 of 3 cases applies
1. a adds l so it must be true in b'
2. a deletes l so it must be false in b'
3. a does not affect l so it remains unknown (thus is not in b')
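the three cases can be sketched as a dictionary update, assuming a 1-CNF belief state is a dict from fluent name to truth value with absent fluents unknown (fluent names here are illustrative strings):

```python
# a sketch of the 1-CNF belief-state update: a belief state is a dict from
# fluent name to truth value (absent = unknown); actions carry add & delete
# lists of fluent names (all names illustrative)
def update(belief, action):
    b2 = dict(belief)                 # start from the current belief
    for f in action["add"]:
        b2[f] = True                  # case 1: added, so true in b'
    for f in action["delete"]:
        b2[f] = False                 # case 2: deleted, so false in b'
    return b2                         # case 3: untouched fluents keep status

b0 = {}                               # everything unknown
b1 = update(b0, {"add": ["Open(Can1)"], "delete": []})
print(b1)
```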
Sensorless Planning Belief States
updating belief states: the example plan
recall the sensorless agent's solution plan for the furniture
painting problem
[RemoveLid(Can1), Paint(Chair, Can1), Paint(Table, Can1)]
apply RemoveLid(Can1) to b0 = Color(x, C(x))
(1) b1 = Color(x, C(x)) ∧ Open(Can1)
apply Paint(Chair, Can1) to b1
precondition Color(Can1, c) is satisfied by Color(x, C(x)) with the
binding {x/Can1, c/C(Can1)}
(2) b2 = Color(x, C(x)) ∧ Open(Can1) ∧ Color(Chair, C(Can1))
now apply the last action to get the next belief state, b3
(3) b3 = Color(x, C(x)) ∧ Open(Can1) ∧ Color(Chair, C(Can1)) ∧ Color(Table, C(Can1))
note that this satisfies the plan goal Goal(Color(Chair, c) ∧
Color(Table, c)) with c bound to C(Can1)
Sensorless Planning Belief States
50
the painting problem solution
this illustrates that the family of belief states given as
conjunctions of literals is closed under updates defined by
PDDL action schemas
so given n total fluents, any belief state is represented as a
conjunction of size O(n) (despite the O(2^n) states in the
world)
however, this is only the case when action schemas have the
same effects for all states in which their preconds are satisfied
if an action's effects depend on the state, dependencies among
fluents are introduced & the 1-CNF property does not apply
illustrated by an example from the simple vacuum world on the
next slides
Planning & Acting in the Real World
51
Recall Vacuum World
the simple vacuum world state space
Planning & Acting in the Real World
Sensorless Planning Belief States
52
if an action's effects depend on the state
dependencies among fluents are introduced & the 1-CNF
property does not apply
the effect of the Suck action depends on where it is done
(CleanL if agent is AtL, but CleanR if agent is AtR)
this requires conditional effects for action schemas:
when condition: effect, or for the vacuum world
Action(Suck,
EFFECT: when AtL: CleanL ∧ when AtR: CleanR)
considering conditional effects & belief states
applying the conditional action to the initial belief state yields a
result belief state
(AtL ∧ CleanL) ∨ (AtR ∧ CleanR)
so the belief state formula is no longer 1-CNF, and in the
worst case may be exponential in size
Planning & Acting in the Real World
Sensorless Planning Belief States
53
broadly speaking, the available options are
(1) use conditional effects for actions & deal with the loss of
the belief state representational simplicity
(2) use a conventional action representation, treating an action
whose preconditions are unsatisfied as inapplicable & leaving the
resulting state undefined
for sensorless planning, conditional effects are preferable
they yield "wiggly" belief states (& maybe that's inevitable
anyway for non-trivial problems)
an alternative is a conservative approximation of belief states (all
literals whose truth values can be determined, with the others
treated as unknown)
this yields planning that is sound but incomplete (if the problem
requires interactions among literals)
Planning & Acting in the Real World
Sensorless Planning Belief States
54
another alternative
the agent (algorithm) could attempt to use action
sequences that keep the belief state simple (1-CNF) as in
this vacuum world example
the target is a plan consisting of actions that will yield the simple
belief state representation, for example:
[Right, Suck, Left, Suck]
b0 = True
b1 = AtR
b2 = AtR ∧ CleanR
b3 = AtL ∧ CleanR
b4 = AtL ∧ CleanR ∧ CleanL
note that some alternative sequences (e.g. those beginning with
the Suck action) would break the 1-CNF representation
simpler belief states are attractive, as even human behaviour
shows: we frequently carry out small actions to reduce
uncertainty (keeping the belief state manageable)
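The plan above can be checked by brute force, tracking the belief state as an explicit set of physical states rather than as a formula. This is a sketch, not AIMA's code; the state encoding (loc, cleanL, cleanR) is our own.

```python
from itertools import product

def result(state, action):
    """Physical transition model for the simple (deterministic) vacuum world."""
    loc, cleanL, cleanR = state
    if action == "Right": return ("R", cleanL, cleanR)
    if action == "Left":  return ("L", cleanL, cleanR)
    if action == "Suck":  # conditional effect: cleans the square the agent is on
        return (loc, True, cleanR) if loc == "L" else (loc, cleanL, True)

def predict(belief, action):
    """Sensorless update: b' = {RESULT(s, a) : s in b}."""
    return {result(s, action) for s in belief}

b = set(product("LR", [True, False], [True, False]))   # b0 = True: all 8 states
for a in ["Right", "Suck", "Left", "Suck"]:
    b = predict(b, a)
# the belief set collapses to the single state ("L", True, True),
# i.e. AtL ∧ CleanL ∧ CleanR, matching b4 on the slide
```

Each intermediate value of b corresponds to one of the 1-CNF formulas b1..b4 above; a plan starting with Suck instead would leave a belief set not describable by a single conjunction.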
Sensorless Planning Belief States
55
yet another alternative for representing belief states
under the relaxed observability
we might represent belief states in terms of an initial belief
state + a sequence of actions, yielding an O(n + m) bound on
belief state size
a world of n literals, with a maximum of m actions in a sequence
if so, the issues relate to the difficulty of calculating when an
action is applicable or a goal is satisfied
we might use an entailment test: b0 ∧ Am ⊨ Gm, where
b0 is the initial belief state
Am are the successor state axioms for the actions in the
sequence, and Gm states the goal is achieved after m actions
so we want to show b0 ∧ Am ∧ ¬Gm is unsatisfiable
a good SAT solver may be able to determine this quite efficiently
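The unsatisfiability test can be illustrated with a tiny brute-force checker (a real planner would call a SAT solver). Everything here is a toy of our own making: formulas are Python predicates over a truth assignment, and the single successor-state axiom for a hypothetical RemoveLid step is invented for the example.

```python
from itertools import product

def unsatisfiable(formula, symbols):
    """True iff no truth assignment over symbols satisfies formula."""
    return not any(formula(dict(zip(symbols, vals)))
                   for vals in product([True, False], repeat=len(symbols)))

# toy encoding, one action step: Open1 holds after RemoveLid
symbols = ["Open0", "Lid0", "Open1"]
b0  = lambda m: not m["Open0"] and m["Lid0"]             # initial belief state
a_m = lambda m: m["Open1"] == (m["Open0"] or m["Lid0"])  # successor-state axiom
g_m = lambda m: m["Open1"]                               # goal after 1 action

# b0 ∧ Am ⊨ Gm holds iff b0 ∧ Am ∧ ¬Gm is unsatisfiable
entailed = unsatisfiable(lambda m: b0(m) and a_m(m) and not g_m(m), symbols)
# entailed == True: the lid must be open after the action
```

Enumerating assignments is exponential in the number of symbols, which is exactly why the slide points to efficient SAT solvers for realistic instances.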
Planning & Acting in the Real World
Sensorless Planning Heuristics
as a last consideration
we return to the question of the use of heuristics to prune the
search space
notice that for belief states, solving for a subset of the belief
state must be easier than solving it entirely
if b1 ⊆ b2 then h*(b1) ≤ h*(b2)
thus an admissible heuristic for a subset of states in the belief
state is an admissible heuristic for the belief state
candidate subsets include singletons, the individual states
assuming we adopt 1 of the admissible heuristics we saw for
classical planning, and that s1, ..., sN is a random selection of
states in belief state b, an accurate admissible heuristic is
H(b) = max{h(s1), ..., h(sN)}
still other alternatives involve converting to planning graph form,
where the initial state layer is derived from b
just its literals if b is 1-CNF or potentially derived from a non-CNF
representation
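The sampled-max heuristic H(b) can be sketched directly. The helper below is hypothetical (not from AIMA); the per-state heuristic h used in the example, counting dirty squares, is admissible for the unit-cost vacuum world.

```python
import random

def belief_heuristic(belief, h, n_samples=3, seed=0):
    """H(b) = max h(s_i) over a random sample of states from belief state b.
    Each h(s_i) is admissible for b, since solving s_i alone is no harder
    than solving all of b, so the max is admissible too."""
    rng = random.Random(seed)
    sample = rng.sample(sorted(belief), min(n_samples, len(belief)))
    return max(h(s) for s in sample)

# toy per-state heuristic: number of dirty squares; s = (loc, cleanL, cleanR)
h = lambda s: (not s[1]) + (not s[2])
b = {("L", False, False), ("R", True, False)}
# belief_heuristic(b, h) == 2: the all-dirty state dominates the sample
```

Taking the max over more samples tightens the estimate without losing admissibility, since every h(s_i) is a lower bound on h*(b).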
56
Contingent Planning
we relax some of the environmental assumptions of
classical planning to deal with environments that are
partially observable and/or non-deterministic
for such environments, a plan includes branching based on
percepts (recall percept schemas from the introduction)
Percept(Color(x, c),
PRECOND: Object(x) ∧ InView(x))
Percept(Color(can, c),
PRECOND: Can(can) ∧ InView(can) ∧ Open(can))
at plan execution, we represent a belief state as logical
formulas
the plan includes contingent/conditional branches
check branch conditions: does the current belief state entail
the condition or its negation
the conditions include first order properties (existential
quantification), so they may have multiple substitutions
an agent gets to choose one, applying it to the remainder of
the plan
57
Contingent Planning
a contingent plan solution for the painting problem
[LookAt(Table), LookAt(Chair),
if Color(Table, c) ∧ Color(Chair, c) then NoOp
else [RemoveLid(Can1), LookAt(Can1), RemoveLid(Can2), LookAt(Can2)
if Color(Table, c) ∧ Color(can, c) then Paint(Chair, can)
else if Color(Chair, c) ∧ Color(can, c) then Paint(Table, can)
else [Paint(Chair, Can1), Paint(Table, Can1)]]]
note: Color(Table, c) ∧ Color(can, c)
this might be satisfied under both {can/Can1} and {can/Can2} if
both cans are the same color as the table
the previous-to-new belief state calculation occurs in 2 stages
(1) after an action, a, as with the sensorless agent
b^ = (b − DEL(a)) ∪ ADD(a), where b^ is the predicted belief state,
represented as a conjunction of literals
(2) then in the percept stage, determine which percept axioms
hold in the now partially updated belief state, and add their
percepts + preconditions
58
Contingent Planning
59
(2) updating the belief state from the percept axioms
Percept(p, PRECOND: c), where c is a conjunction of literals
suppose percept literals p1, ..., pk are received
for a given percept p, there's either a single percept axiom or there
may be more than 1
if just 1, add its percept literal & preconditions to the belief state
if > 1, then we have to deal with multiple candidate preconditions
add p & the disjunction of the preconditions that may hold in the
predicted belief state b^
if this is the case, we've given up the 1-CNF form for belief state
representation and similar issues arise as for conditional effects for the
sensorless planner
given a way to generate exact or approximate belief states
(1) the algorithm for contingent search may generate contingent
plans
(2) actions with nondeterministic effects (disjunctive EFFECTs) can
be handled with minor changes to belief state updating
(3) heuristics, including those that were suggested for sensorless
planning, are available
Contingent Planning
the AND-OR-GRAPH-SEARCH algorithm
AND nodes arise from non-determinism & all outcomes must be handled,
while OR nodes indicate choices of actions from states
the algorithm
is depth first, mutually recursive, & returns a conditional plan
notation: [x | l] is the list formed by prepending x to the list l
function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
return OR-SEARCH(problem.INITIAL-STATE, problem, [])
function OR-SEARCH(state, problem, path) returns a conditional plan or failure
if problem.GOAL-TEST(state) then return the empty plan
if state is on path then return failure
/* repeated state on this path */
for each action in problem.ACTIONS(state) do
plan ← AND-SEARCH(RESULTS(state, action), problem, [state | path])
if plan ≠ failure then return [action | plan]
return failure
function AND-SEARCH(states, problem, path) returns a conditional plan or failure
for each si in states do
plani ← OR-SEARCH(si, problem, path)
if plani = failure then return failure
return [if s1 then plan1 else if s2 then plan2 else ... if sn-1 then plann-1 else plann]
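The pseudocode above translates directly into Python. Below is a runnable sketch tried on the erratic vacuum world; the dict-based problem encoding, the plan representation (a list whose conditional steps are ("if", branches) tuples), and the particular erratic-Suck model are our own simplifications, not AIMA's code.

```python
def or_search(state, problem, path):
    if problem["goal_test"](state): return []
    if state in path: return None                      # repeated state: failure
    for action in problem["actions"](state):
        plan = and_search(problem["results"](state, action), problem, [state] + path)
        if plan is not None:
            return [action] + plan
    return None

def and_search(states, problem, path):
    branches = {}
    for s in states:
        plan = or_search(s, problem, path)
        if plan is None: return None                   # one outcome unhandled: failure
        branches[s] = plan
    if len(branches) == 1: return next(iter(branches.values()))
    return [("if", branches)]                          # conditional step in the plan

# erratic vacuum world: Suck on a dirty square may also clean the other square,
# and (in this toy model) Suck on a clean square may deposit dirt
def results(s, a):
    loc, cL, cR = s
    if a == "Right": return {("R", cL, cR)}
    if a == "Left":  return {("L", cL, cR)}
    if loc == "L":
        return {("L", True, cR), ("L", True, True)} if not cL else {("L", cL, cR), ("L", False, cR)}
    return {("R", cL, True), ("R", True, True)} if not cR else {("R", cL, cR), ("R", cL, False)}

problem = {"goal_test": lambda s: s[1] and s[2],
           "actions": lambda s: ["Suck", "Right", "Left"],
           "results": results}
plan = or_search(("L", False, False), problem, [])
# plan == ["Suck", ("if", {...})]: Suck, then branch on whether the
# erratic action also cleaned the right-hand square
```

The path argument is what makes the search terminate: revisiting a state on the current path returns failure rather than looping forever.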
60
Online Replanning
61
replanning
this approach uses/captures knowledge about what the agent is
trying to do
some form of execution monitoring triggers replanning
it interleaves executing & planning, dealing with some
contingencies by including Replan branches in the plan
if the agent encounters a Replan during plan execution, it returns
to planning mode
why Replan?
may be error or omission in the world model used to build the plan
e.g. no state variable to represent the quantity of paint in a can (so
it could even be empty), or exogenous events (a can wasn't
properly sealed & the paint dried up), or a goal may be changed
environment monitoring by the online agent
(1) action monitoring: check preconds before executing an action
(2) plan monitoring: check that the remaining plan will still work
(3) goal monitoring: before executing, ask: "Is a better set of
goals available?"
62
Online Replanning
a replanning example
action monitoring indicates the agent's state is not as
planned, so it should try to get back to a state in the original
plan, minimizing total cost
when the agent finds it is not in the expected state, E, but
observes that it is instead in O, it Replans
Planning & Acting in the Real World
Online Replanning
63
replanning in the furniture painting problem
[LookAt(Table), LookAt(Chair),
if Color(Table, c) ∧ Color(Chair, c) then NoOp
else [RemoveLid(Can1), LookAt(Can1),
if Color(Table, c) ∧ Color(Can1, c) then Paint(Chair, Can1)
else REPLAN]]
the online planning agent, having painted the Chair, checks the
preconds for the remaining empty plan: that the table & chair
are the same colour
suppose the new paint didn't cover well & the old colour still shows
the agent needs to determine where in the whole plan to return to,
& what repair action sequence to use to get there
given that the current state matches that before Paint(Chair, Can1), an
empty repair sequence & new plan of the same [Paint] sequence is OK
the agent resumes execution monitoring, retries the Paint action &
loops like this until colours match
note that the loop is online: plan-execute-replan, not explicit in the
plan
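The plan-execute-replan loop above can be caricatured in a few lines. Everything here is a made-up toy: the world model behind the agent's plan omits the fact that one coat of paint doesn't cover, so the agent only succeeds by monitoring & retrying.

```python
def execute_with_monitoring(goal_colour):
    """Online loop: execute Paint, check the goal, replan (retry) on failure."""
    chair = {"colour": "white", "coats": 0}
    attempts = 0
    while chair["colour"] != goal_colour:   # plan monitoring: does the goal hold?
        attempts += 1                       # REPLAN: same one-step [Paint] plan
        chair["coats"] += 1
        if chair["coats"] >= 2:             # hidden dynamics: 2 coats needed to cover
            chair["colour"] = goal_colour
    return attempts

# execute_with_monitoring("green") == 2: the first coat fails to cover,
# the agent notices the colours still differ, replans & paints again
```

The loop lives in the agent, not in the plan: the plan itself never mentions repetition, exactly as the slide notes.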
64
Online Replanning
replan
the original plan doesn't handle all contingencies, the REPLAN
step could generate an entirely new plan
a plan monitoring agent may detect faults earlier, before the
corresponding actions are executed: when the current state
means that the remaining plan won't work
so it checks preconditions for success of the remaining plan
for each of its steps, except those contributed by some other step in
the remaining plan
the goal is to detect future failure as early as possible, & replan
note: in (rare) cases it might even detect serendipitous success
action monitoring by checking preconditions is relatively easy
to include but plan monitoring is more difficult
partial order & planning graph structures include information that
may support the plan monitoring approach
Planning & Acting in the Real World
65
Online Replanning
with replanning, plans will always succeed, right?
still there can be "dead ends", states from which no repair is
possible
a flawed model can lead the plan into dead ends
an example of a flawed model: the general assumption of
unlimited resources (for example, bottomless paint cans)
however, if we assume there are no dead ends, there will be a
plan to reach a goal from any state
and if we further assume that the environment is truly nondeterministic (that there's always a non-zero chance of success)
then a replanning agent will eventually achieve the goal
Planning & Acting in the Real World
66
Online Replanning
when replanning fails
another problem is that actions may not really be nondeterministic - instead, they may depend on preconditions the
agent does not know about
for example, that painting from an empty paint can has no effect
& will never lead to the goal
there are alternative approaches to cope with such failures
(1) the agent might randomly select a candidate repair plan
(open another can?)
(2) the agent also might learn a better model
modifying the world model to match percepts when predictions fail
Planning & Acting in the Real World
Multiagent Planning
67
the next relaxation of environmental assumptions
there may be multiple agents whose actions need to be taken
into account in formulating our plans
background: distinguish several slightly different paradigms
(1) multieffector planning
this is what we might call multitasking, really a single central agent
but with multiple ways of interacting with the environment,
simultaneously (like a multiarmed robot)
(2) multibody planning
here we consider multiple detached units moving separately, but
sharing percepts to generate a common representation of the
world state that is the basis of the plan
one version of the multibody scenario has central plan formulation
but somewhat decoupled execution
for example, a fleet/squadron of reconnaissance robots that are
sometimes out of communications range
multibody subplans for each individual body include communication
actions
68
Multiagent Planning
variations on the theme
with a central planning agent, there's a shared goal
it's also possible for distinct agents, each generating plans, to
have a shared goal
the latter paradigm suggests the new prototypical problem:
planning for a tennis doubles team
so shared goal situations can be either multibody (1 central
plan) or multiagent (each developing a plan, but with a
requirement for coordination mechanisms)
a system could even be some hybrid of centralized &
multiagent planning
as an example, the package delivery company develops
centralized routing plans but each truck driver may respond to
unforeseen weather, traffic issues with independent planning
Planning & Acting in the Real World
Multiagent Planning
69
our first model involves multiple simultaneous actions
the terminology is multiactor settings
we merge aspects of the multieffector, multibody, & multiagent
paradigms, then consider issues related to transition models,
correctness of plans, efficiency/complexity of planning algorithms
correctness: a correct plan, if carried out by the actors, will
achieve the goal
note that in a true multiagent situation, they might not agree
synchronization: a simplifying assumption we apply that all
actions require the same length of time, & multiple actions at a
step in the plan are simultaneous
under a deterministic environment assumption, the transition
model is given by the function: Result(s, a)
a single agent has b action choices, & b may be quite large
in the multiactor model with n actors, an action is now a joint action
written <a1, ..., an>, where ai is the action for the ith actor
Multiactor Scenario
70
complexity implications of the transition model
now with b^n joint actions we have a b^n branching factor for
planning
since planning algorithm complexity was already an issue, a shared
target for multiactor planning systems is to treat the actors as
decoupled so that complexity is linear in n rather than exponential
the loose coupling of actors may allow an approximately linear
improvement
this is analogous to issues we've encountered before: additive
heuristics for independent subproblems in planning, reducing of a
CSP graph to a tree (or multiple trees) to apply efficient
algorithms, ...
in multiactor planning: for loosely coupled problems, we treat them
as decoupled & then apply fixes as required to handle any
interactions
so the action schemas of the transition model treat actors as
independent
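The b^n blow-up is easy to see concretely; the numbers below are arbitrary for illustration.

```python
from itertools import product

b, n = 4, 3                                    # b actions per actor, n actors
joint_actions = list(product(range(b), repeat=n))
# len(joint_actions) == b ** n == 64: the joint branching factor grows
# exponentially in the number of actors, which is why multiactor planners
# aim to treat loosely coupled actors as (nearly) independent
```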
Multiactor Scenario
71
prototype problem: doubles tennis
the problem is formulated as returning a ball hit to the team,
while retaining court coverage
there are 2 players on the team, each is either at the net or
baseline, on the right side or left side of the court
actions are the moving of a player (actor) or the hitting of the
ball by a player
Planning & Acting in the Real World
72
Doubles Tennis Problem
here's the conventional (independence assumption)
multiactor problem setup for doubles tennis
Actors(A, B)
Init(At(A, LeftBaseline) ∧ At(B, RightNet) ∧
Approaching(Ball, RightBaseline) ∧ Partner(A, B) ∧ Partner(B, A))
Goal(Returned(Ball) ∧ (At(a, RightNet) ∨ At(a, LeftNet)))
Action(Hit(actor, Ball),
PRECOND: Approaching(Ball, loc) ∧ At(actor, loc)
EFFECT: Returned(Ball))
Action(Go(actor, to),
PRECOND: At(actor, loc) ∧ to ≠ loc
EFFECT: At(actor, to) ∧ ¬At(actor, loc))
Planning & Acting in the Real World
Multiactor Tennis Doubles Scenario
73
for the multiactor tennis problem
here is a joint plan given the problem description
Plan 1:
A:[Go(A, RightBaseline), Hit(A, Ball)]
B:[NoOp(B), NoOp(B)]
what issues arise given the current problem representation?
a legal and apparently successful plan could still have both
players hitting the ball at the same time (though that really won't
work)
the preconditions don't include constraints to preclude
interference of this type
a solution: revise the action schemas to include concurrent
action lists that can explicitly state actions are or are not
concurrent
Planning & Acting in the Real World
Controlling Concurrent Actions
74
a revised Hit action requires that it be done by only 1 actor
this is represented by including a concurrent action list
Action(Hit(a, Ball),
CONCURRENT: b ≠ a ⇒ ¬Hit(b, Ball)
PRECOND: Approaching(Ball, loc) ∧ At(a, loc)
EFFECT: Returned(Ball))
some actions might require concurrency for success
apparently tennis players require large coolers full of
refreshing drinks & 2 actors are required to carry the cooler
Action(Carry(a, cooler, here, there),
CONCURRENT: b ≠ a ∧ Carry(b, cooler, here, there)
PRECOND: At(a, here) ∧ At(cooler, here) ∧ Cooler(cooler)
EFFECT: At(a, there) ∧ At(cooler, there) ∧ ¬At(a, here) ∧ ¬At(cooler, here))
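A planner using such schemas must check each candidate joint action against the concurrency constraints. The hand-rolled checker below is a sketch of ours, not part of any planner; it encodes just the two constraints from these slides.

```python
def valid_joint_action(joint):
    """joint maps each actor to an action term, e.g. {"A": ("Hit", "Ball")}.
    Enforces: Hit forbids any concurrent Hit; Carry requires a partner."""
    hitters  = [a for a, act in joint.items() if act[0] == "Hit"]
    carriers = [a for a, act in joint.items() if act[0] == "Carry"]
    if len(hitters) > 1:                  # CONCURRENT: b ≠ a ⇒ ¬Hit(b, Ball)
        return False
    if carriers and len(carriers) < 2:    # CONCURRENT: Carry needs a second actor
        return False
    return True

# valid_joint_action({"A": ("Hit", "Ball"), "B": ("NoOp",)})            -> True
# valid_joint_action({"A": ("Hit", "Ball"), "B": ("Hit", "Ball")})      -> False
# valid_joint_action({"A": ("Carry", "Cooler"), "B": ("NoOp",)})        -> False
```

In a search algorithm this test would prune illegal joint actions at node expansion, which keeps the concurrency handling cheap when subplans are loosely coupled.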
Planning & Acting in the Real World
75
Multiactor Scenario
given appropriately revised action schemas
including concurrent action lists
it becomes relatively simple to adapt the classical planning
algorithms for multiactor planning
it depends on there being loose coupling of subplans
so the plan search algorithm does not encounter concurrency
constraints too frequently
further, HTN approaches & the techniques for partial
observability, contingency & replanning may also be adapted
for loosely coupled multiactor problems
next: full blown multiagent scenarios
each agent makes independent plans
Planning & Acting in the Real World
76
Multiple Agents
cooperation & coordination
each agent formulates its own plan, but based on shared
goals & a shared knowledge base
we continue with the doubles tennis example problem
Plan 1:
A:[Go(A, RightBaseline), Hit(A, Ball)]
B:[NoOp(B), NoOp(B)]
Plan 2:
A:[Go(A, LeftNet), NoOp(A)]
B:[Go(B, RightBaseline), Hit(B, Ball)]
either of these plans may work if both agents use it, but if A
does 1 & B does 2 (or vice versa), both or neither returns the
ball
so there has to be some mechanism that results in agents
agreeing on a single plan
Planning & Acting in the Real World
Multiple Agents
77
techniques for agreement on a single plan
(A) convention: adopt or agree upon some constraint on the
selection of joint plans, for example in doubles tennis, "stay on
your side of the court"
or a baseball center fielder takes fly balls hit "in the gap"
conventions are observable at more global levels among multiple
agents, when, for example, drivers agree to drive on a particular
side of the road
in higher order contexts, the conventions become "social laws"
(B) communication: between agents, as when 1 doubles player
yells "mine" to a teammate
the signal indicates which is the preferred joint plan
see similar examples in other team sports as when a baseball fielder
calls for the catch on a popup
note that the communication could be non-verbal
plan recognition applies when 1 agent begins execution & the initial
actions unambiguously indicate which plan to follow
78
Multiple Agents
the AIMA authors discuss natural world conventions
these may be the outcome of evolutionary processes
in harvester ant colonies - there is no central control
yet they execute elaborate "plans" where each individual ant
takes on 1 of multiple roles based on its current local conditions
convention or communication?
planning & "spontaneous" human social events (Aberdeen)?
another example from the natural world is the flocking
behaviour of birds
this can be seen as a cooperative multiagent process
successful simulations of flocking behaviour algorithmically over
a collection of agents ("boids") are possible if each observes its
neighbours & maximizes a weighted sum of 3 elements
(1) cohesion: +ve for closer to average position of neighbours
(2) separation: -ve for too close to a neighbour
(3) alignment: +ve for closer to the average heading of neighbours
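The three-element weighted sum can be sketched for a single boid. This is a 1-D toy of our own (Reynolds' boids are 2-D/3-D with vector arithmetic); the weights & the "too close" radius are arbitrary illustration values.

```python
def boid_velocity(pos, vel, neighbours, w=(0.1, 0.25, 0.05)):
    """One steering update for a boid, given observable (position, velocity)
    pairs for its neighbours; returns the new velocity."""
    wc, ws, wa = w
    avg_pos = sum(p for p, _ in neighbours) / len(neighbours)
    avg_vel = sum(v for _, v in neighbours) / len(neighbours)
    cohesion   = avg_pos - pos                                   # toward the centre
    separation = sum(pos - p for p, _ in neighbours
                     if abs(pos - p) < 1.0)                      # away if too close
    alignment  = avg_vel - vel                                   # match the heading
    return vel + wc * cohesion + ws * separation + wa * alignment

# a boid left of its neighbours, none too close, steers right (positive);
# a boid crowded by a neighbour at distance 0.5 steers away (negative)
```

Each boid runs only this local rule over its neighbours, yet iterating it over a whole population produces the flock-level behaviour described on the next slide.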
Planning & Acting in the Real World
Multiple Agents
79
convention & emergent behaviour
where complex global behavior can arise from the interaction of
simple local rules
in the boids example, the result is a pseudorigid "flock" that has
approximately constant density, does not disperse over time, &
makes occasional swooping motions
each agent operates without having any joint plan to explicitly
indicate actions of other agents
see some boids background & a demo at: boids online
UMP! (ultimate multiagent problems)
these involve cooperation within a team & competition against
another team, without central planning/control
robot soccer is an example, as are other similar dynamic team
sports (hockey, basketball)
this may be less true of, say, baseball or football, where some
central control is possible & there's a high degree of convention + communication
Summary
80
moving away from the limits of classical planning
(1) actions consume (& possibly produce) resources which we
treat as aggregates to control complexity
formulate partial plans, taking resource constraints into account,
then refine them
(2) time is a resource that can be considered with dedicated
scheduling algorithms or perhaps integrated with planning
(3) a HTN (Hierarchical Task Network) approach captures
knowledge in HLAs (High Level Actions) that may have multiple
implementations as sequences of lower level actions
angelic semantics for interpreting the effects of HLAs allows
planning in the space of HLAs without refinement into primitive
actions
HTN systems can create large, real-world plans
(4) classical planning's environment assumptions are too
rigid/optimistic for many problem domains
full observability, deterministic actions, a single agent
Summary
81
relaxing the assumptions of classical planning
(5) contingent & sensorless planning
contingent planning uses percepts during execution to conditionally
branch to appropriate subplans
sensorless/conformant planning may succeed in coercing the world
to a goal state without any percepts
for contingent & sensorless paradigms, plans are built by search in
the belief space, for which the techniques must address
representational & computational issues
(6) online planning agents interleave execution & planning
they monitor for problems & repair plans to recover from
unplanned states, allowing them to deal with nondeterministic
actions, exogenous events, & poor models of the environment
(7) multiple agents might be cooperative or competitive
the keys to success are in mechanisms for coordination
(8) future chapters will cover
probabilistic non-determinism, learning from experience to acquire
strategies