
Cooperation, Negotiation and
Reconsideration on the basis of
Environmental Cues
O. Tripp and J.S. Rosenschein
Agenda
• The BDI control loop and the
reconsideration problem
• Educated reconsideration
• Cooperation and state-changing rules
• Conditional cooperation
• Cooperation through exploration
• Conclusions
The BDI Control Loop
The Reconsideration Problem
• The reconsideration function behaves
optimally if, and only if, whenever it
chooses to deliberate the agent changes
its intentions (Wooldridge and Parsons,
1995).
• The logic that controls the activation of the
reconsideration function is hardwired into
the agent.
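• For reference, a minimal Python sketch of a Wooldridge-style BDI control loop with its reconsideration hook (the agent/environment method names are illustrative assumptions, not part of the talk):

def bdi_control_loop(agent, environment):
    # Minimal BDI loop sketch (after Wooldridge); all interfaces are assumed.
    beliefs = agent.initial_beliefs()
    intentions = agent.deliberate(beliefs)          # initial intentions
    plan = agent.plan(beliefs, intentions)
    while not agent.succeeded(beliefs, intentions):
        percept = environment.sense()
        beliefs = agent.revise_beliefs(beliefs, percept)
        if agent.reconsider(beliefs, intentions):   # the hardwired reconsideration function
            intentions = agent.deliberate(beliefs)  # deliberation may change the intentions
            plan = agent.plan(beliefs, intentions)
        if not plan.is_sound(beliefs, intentions):  # replan if the current plan broke
            plan = agent.plan(beliefs, intentions)
        environment.execute(plan.pop_next_action())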
Educated Reconsideration
• The agent’s frequency of ‘seeing’ is
proportional to the degree of dynamism of
the environment.
• When the agent senses its environment to
change rapidly, it reconsiders quite often.
• When the agent senses its environment to
be relatively static, it rarely reconsiders.
A Simple Learning Algorithm
• If you are in the middle of a plan, then carry on executing
it. Otherwise, select a new plan.
• If ReconsiderationInterval is not set, then set it to
the number of steps in the current plan.
• After ReconsiderationInterval steps, stop to
reconsider.
– Compare the expected state of the environment with its actual
state (as registered by your sensors).
– Update ReconsiderationInterval accordingly.
– Feed ReconsiderationInterval into a buffer maintaining
your history, and set ReconsiderationInterval to some
weighted average of the values stored in the buffer.
• Jump to the first step.
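• A minimal Python sketch of the algorithm above; the mismatch measure, buffer depth, and weights are assumptions (the slides only fix the overall scheme):

from collections import deque

class ReconsiderationLearner:
    def __init__(self, history_depth=5):
        self.history = deque(maxlen=history_depth)   # buffer maintaining the history
        self.interval = None                          # ReconsiderationInterval

    def start_plan(self, plan_length):
        # If ReconsiderationInterval is not set, set it to the number of
        # steps in the current plan.
        if self.interval is None:
            self.interval = plan_length

    def reconsider(self, expected_state, actual_state):
        # Compare the expected state of the environment with its actual state.
        mismatch = sum(1 for e, a in zip(expected_state, actual_state) if e != a)
        # Assumed update rule: more mismatch -> reconsider sooner; none -> later.
        new_interval = max(1, self.interval // (1 + mismatch)) if mismatch else self.interval + 1
        # Feed the value into the history buffer and set the interval to a
        # weighted average, favouring recent values (assumed weighting).
        self.history.append(new_interval)
        weights = [2 ** i for i in range(len(self.history))]
        self.interval = round(sum(w * v for w, v in zip(weights, self.history)) / sum(weights))
        return self.interval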
Cooperation and State-Changing
Rules
• State-changing rules are meta-rules that
encourage agents to transform the world
in a cooperative manner (Goldman and
Rosenschein, 1994).
• In this setting, agents do some extra work,
which (ideally) saves other agents a lot of
effort.
State-Changing Rules in the
Tileworld
Scenario one:
A2 helps A1 by repositioning the tiles, and thus each agent wastes 12 moves (on average) instead of 17.
Scenario two:
Agents are blocked from the tiles closest to them by a barrier. As they move around the barrier, another agent prepares the target tile for them.
Weaknesses of the Existing Model
• Naïve cooperation may be
counterproductive in a sufficiently
heterogeneous environment.
• Agents that are hardwired to cooperate
can easily be abused.
The Premises of Conditional
Cooperation
• Perception and inferencing are a primary
means of acquiring information in the
absence of direct communication.
• Multiagent coordination can be based on
perception even if the agents are not
aware of one another.
Conditional Cooperation in Practice
• The agent registers the degree to which it
found its environment to be cooperative.
• The agent uses the technique used
previously for determining its
reconsideration rate to determine its
cooperation level.
Conditional Cooperation in the
Tileworld
• The agent might expect the tiles it comes across
to have many degrees of freedom.
– If this is the case, then the agent observes its
environment to be cooperative and raises its
cooperation level accordingly.
– Otherwise, the agent expresses its disappointment by
lowering its cooperation level.
• The higher the agent’s cooperation level is, the
more it is inclined to allocate resources to tasks
that were not assigned to it.
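• A Python sketch of this mechanism, reusing the bounded-history weighted average from the reconsideration learner; the degrees-of-freedom test and the helping rule are Tileworld-flavoured assumptions:

import random
from collections import deque

class ConditionalCooperator:
    def __init__(self, history_depth=5, initial_level=0.5):
        self.history = deque([initial_level], maxlen=history_depth)

    def observe_tile(self, degrees_of_freedom, expected_degrees=2):
        # Cooperative environment: the tile has at least the expected number
        # of degrees of freedom; otherwise register disappointment.
        self.history.append(1.0 if degrees_of_freedom >= expected_degrees else 0.0)

    @property
    def cooperation_level(self):
        # Weighted average over the history, favouring recent observations.
        weights = [2 ** i for i in range(len(self.history))]
        return sum(w * v for w, v in zip(weights, self.history)) / sum(weights)

    def should_help(self, task_assigned_to_me):
        # The higher the cooperation level, the more inclined the agent is
        # to spend resources on tasks that were not assigned to it.
        return task_assigned_to_me or random.random() < self.cooperation_level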
The Benefits of Conditional
Cooperation
• Agents can react dynamically to situations
where they get in each other’s way, and
thus prevent instances of negative
cooperation (this happens without the
agents being aware of one another!).
• Agents are no longer exposed to the threat
of being (continually) abused. They expect
to see return on their investment.
Notes on Conditional Cooperation
• If the depth of the history the agent maintains is
1, then the agent’s cooperation strategy
coincides with the TIT-FOR-TAT strategy.
• There is no one answer to the question of when
the agent should update its cooperation level.
– Should this happen when the agent constructs a new
plan or initiates the execution of a new plan?
– Should this happen every time the agent comes
across aspects of the environment that may influence
its cooperation level?
Cooperation through Exploration
• So far, agents have restricted their patterns of
cooperation to actions they find desirable.
• This choice of actions assumes
homogeneity amongst the agents
occupying the environment.
• In a general setting, this may not
necessarily be the case, and thus
exploratory actions may lead to more
useful patterns of cooperation.
Cooperation through Exploration in
a Natural Setting
• Two robotic agents
occupy some natural
environment that consists
of a hill and a valley.
• The first robot’s task is to
push stones up to the hilltop.
• The second robot’s task is
to push stones down the
hill.
Cooperation through Exploration in
a Natural Setting - Continued
• The first robot is more skilled at moving small
stones, but gets higher reward on moving large
stones.
• The opposite is true of the second agent.
• It may thus be rational for the first robot to push
small stones down the hill, while the second
robot pushes large stones up the hill.
• Each of the robots was designed to perform a
task that another robot – designed by another
person – can do better. Exploratory actions have
led them to cooperate.
Formal Definition of Cooperation
through Exploration
• We look at pairs of the form:
(Action,CooperationLevel)
• Assuming that the environment is fairly
stable, each such pair has a certain utility
associated with it.
• An exploring agent thus iterates through
this set of cooperation patterns in search
of a pair that would maximize its utility.
Formal Definition of Cooperation
through Exploration - Continued
• Maximization of the agent’s utility thus
reduces to the task of gradient ascent
optimization.
• To make this task tractable, Monte Carlo
methods can be used.
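• A Python sketch of one reading of this definition: enumerate the (Action, CooperationLevel) pairs and estimate each pair's utility by Monte Carlo sampling; the action set, the discretised levels, and run_episode are illustrative assumptions:

from itertools import product

def estimate_utilities(actions, levels, run_episode, samples=20):
    # run_episode(action, level) is assumed to execute the pair once in the
    # (fairly stable) environment and return the registered utility.
    utilities = {}
    for action, level in product(actions, levels):
        utilities[(action, level)] = sum(
            run_episode(action, level) for _ in range(samples)
        ) / samples
    return utilities

# Hypothetical usage: a pair maximizing the estimated utility.
# best_pair = max(utilities, key=utilities.get)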
Exploration as a Source of
Adventure
• The classic distinction between bold and
cautious agents finds a natural
interpretation in this setting:
– A bold agent expends considerable energy
searching for an optimal pattern of
cooperation with its environment. Such a
search may lead the agent far away from its
original goals.
– A cautious agent is more reluctant to diverge
from the goals it was designed to accomplish.
The Complexity of Exploratory
Cooperation
• Theoretical results concerning finite,
deterministic domains cannot be applied to
this setting.
• There are malicious stochastic domains
where any exploratory technique will take
exponential time to reach a goal state
[Thrun, 1992].
Exploratory Cooperation as a Form
of Gradient Ascent
• At time 0:
– Start from an arbitrary cooperation mode.
– Execute the selected cooperation mode and register
its utility.
• At time t+1:
– With probability prob(boldness), jump to an
arbitrary cooperation mode.
– With probability 1-prob(boldness), choose the
next cooperation mode (uniformly) from amongst the
current cooperation mode’s untried neighbors.
Exploratory Cooperation as a Form
of Gradient Ascent - Continued
• Stop if either of the following two events
occurs:
– The space of possible cooperation modes has
been exhausted.
– ticks(boldness) clock ticks have
elapsed.
• Revert to the cooperation mode that
yielded the highest utility.
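• A Python sketch of this search; the neighbourhood relation, the mappings prob(boldness) and ticks(boldness), and evaluate() are assumptions standing in for details the slides leave to the designer:

import random

def explore_cooperation_modes(modes, neighbours, evaluate, boldness):
    jump_prob = min(1.0, boldness / 10.0)     # prob(boldness): assumed mapping
    max_ticks = 10 * boldness                 # ticks(boldness): assumed mapping
    current = random.choice(modes)            # time 0: arbitrary cooperation mode
    tried = {current}
    utilities = {current: evaluate(current)}  # execute it and register its utility

    for _ in range(max_ticks):
        if len(tried) == len(modes):          # the space of modes is exhausted
            break
        if random.random() < jump_prob:       # bold jump to an arbitrary mode
            candidates = [m for m in modes if m not in tried]
        else:                                 # otherwise: untried neighbours
            candidates = [m for m in neighbours(current) if m not in tried]
            if not candidates:                # dead end (not in the slides): fall back to a jump
                candidates = [m for m in modes if m not in tried]
        current = random.choice(candidates)
        tried.add(current)
        utilities[current] = evaluate(current)

    # Revert to the cooperation mode that yielded the highest utility.
    return max(utilities, key=utilities.get)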
Notes on Cooperation through
Exploration
• The theoretical model of cooperation through
exploration finds its counterpart in many
biological systems [Axelrod, 1984].
• Many decisions remain in the hands of the
designer of the agent. These include:
– Deciding how bold the agent should be.
– Deciding on criteria for determining whether the
environment is stable.
– Deciding how much time the agent should wait before
switching between cooperation modes.
– Deciding on the initial cooperation mode.
Negotiation through Exploration
• Another perspective on exploratory behaviors is
to view them as a form of implicit negotiation.
• The object of negotiation is the multidimensional space formed by the cooperation
modes available to the agents.
• An agreement is a specification of the
cooperation modes the agents will embrace.
• Agents can always jump to the conflict deal and
thus operate on their own.
• This perspective is reminiscent of state oriented
domains [Rosenschein and Zlotkin, 1994].
Conclusions
• Agents operating in realistic environments
cannot afford to calculate their moves in isolation
from the dynamics of their environment [Pollack
and Horty, 1999].
• Cooperation in the dark is a viable form of
interaction between agents.
• Exploratory actions allow the agent to settle into
patterns of interaction that were unforeseen by
its designer.
• Exploratory actions are a form of negotiation that
necessitates minimal assumptions about the
negotiation scenario.
THANK YOU
The Déjà Vu Strategy
0. Assumptions and notations:
– The utilities the agent can register with distinct cooperation
modes are separable.
– The agent has access to some clock.
– The agent’s degree of boldness is denoted by BoldNess and
is an integral value.
1. Select a cooperation mode (i.e., a combination of action
and frequency of performing that action) at random.
2. Execute the selected action at the selected frequency
until the inputs you register from the environment
stabilize.
3. Register the utility of the current cooperation mode.
The Déjà Vu Strategy - Continued
4. Check how many times this utility has been registered so
far.
– If this has happened BoldNess or more times, then jump to the next step.
– Otherwise, return to the first step.
5. Revert to the cooperation mode with which you
registered the highest utility.
– If this utility is negative, then stop cooperating.
– Otherwise, remain in that cooperation mode so long as its utility
is high enough and the environment has not changed too much.
– If the environment has changed considerably, then jump back to
the first step.
– If the utility of the current cooperation mode drops below that of
another cooperation mode, then delete the current cooperation
mode and return to the beginning of this step.
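• A Python sketch of steps 1-4 and the first clause of step 5; execute_until_stable() stands in for "execute the selected action at the selected frequency until the inputs stabilize", and the monitoring logic of the rest of step 5 is omitted:

import random
from collections import Counter

def deja_vu(modes, execute_until_stable, boldness):
    seen = Counter()                              # how often each utility recurred
    best_mode, best_utility = None, float("-inf")

    while True:
        mode = random.choice(modes)               # step 1: random cooperation mode
        utility = execute_until_stable(mode)      # step 2: run until inputs stabilise
        seen[utility] += 1                        # step 3: register its utility
        if utility > best_utility:
            best_mode, best_utility = mode, utility
        if seen[utility] >= boldness:             # step 4: utility seen BoldNess+ times
            break

    # Step 5 (first clause): stop cooperating if even the best utility is negative.
    return None if best_utility < 0 else best_mode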
The Complexity of the Déjà Vu
Strategy
• Cooperation modes are treated as points in a
uniform probability space.
• We define a geometric random variable that
counts how many steps pass between two
successive returns to an arbitrary
cooperation mode.
• Its expectation is equal to the number of
cooperation modes the agent supports.
• Using Chebyshev’s inequality, we obtain that:
Pr(  k *  * N )  1
2*k2
(where  is the agent’s degree of boldness)
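• Sketch of the Chebyshev step in LaTeX, assuming the N cooperation modes are sampled uniformly so that the return time X to a fixed mode is geometric with p = 1/N; the slide's stated bound additionally folds the boldness parameter β into the deviation threshold:

\[
  \mathbb{E}[X] = N, \qquad
  \operatorname{Var}(X) = \frac{1-p}{p^{2}} = N(N-1) \le N^{2}
\]
\[
  \Pr\bigl(\lvert X - N\rvert \ge kN\bigr)
  \;\le\; \frac{\operatorname{Var}(X)}{k^{2}N^{2}}
  \;\le\; \frac{1}{k^{2}}
\]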