Intelligent Agents

Agent Rationality and Auctions
Katia Sycara
The Robotics Institute
email: [email protected]
www.cs.cmu.edu/~softagents
The Electronic Society
• A society of electronic autonomous agents
• The agents have means to communicate with
each other
• The agents do not necessarily know others.
Means for finding others exist (may be costly)
• Agents may be heterogeneous by means of:
– design philosophy
– expertise/capabilities
– capacity/access to resources
– intelligence: the algorithms used for problem solving
– as a result: different performance - efficiency, quality
Agent Attitudes
• Self interest: a self-interested agent is
attempting to maximize its own personal payoff
• Benevolence/altruism: a benevolent agent is
attempting to increase others’ payoffs and the
cumulative payoff of the society
• Cooperation: an agent is considered cooperative
when it performs tasks on behalf of other agents
(possibly for payment)
• Non-cooperation: a non-cooperative agent exhibits
game-theoretic strategic behavior
Rationality
• A rational behavior is such that an agent
prefers a greater payoff over a smaller one
• A rational agent should always behave
rationally. That is, from among several
options, it should select the one that results
in maximum payoff
• The problem:
– in many cases the number of options is
overwhelming
– there may be no algorithm for finding the best option
Bounded Rationality
• To overcome problems of rationality,
bounded rationality:
– limits the time/computation for option
consideration
– prunes the search space
– imposes restrictions on the types of options
• Results in fewer possibilities, hence
– computationally feasible
– may be too restrictive, far from optimal
– strategically inferior to rational
“Good Enough” Behavior
• Make the bounded rationality rational:
– modify linear payoff functions to incorporate
computational costs
– put a cap on payoff
– add a small-amounts’ indifference
• The payoff of an option is good enough if
– too much additional computation to find other
good options, or
– other options do not provide a significant
payoff increase, or
– the agent is indifferent w.r.t. the increase
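The three stopping conditions above can be sketched as a small search loop. This is a minimal illustration, not a method taken from the slides: the function name `good_enough`, the per-evaluation cost, the computation budget and the indifference threshold are all assumptions introduced here.

```python
from typing import Callable, Iterable, Optional, Tuple


def good_enough(
    options: Iterable[str],
    payoff: Callable[[str], float],
    eval_cost: float,
    budget: float,
    indifference: float,
) -> Tuple[Optional[str], float, float]:
    """Return (chosen option, its payoff, computation spent)."""
    best_option: Optional[str] = None
    best_payoff = float("-inf")
    spent = 0.0
    for option in options:
        # Stop when evaluating one more option would exceed the budget.
        if spent + eval_cost > budget:
            break
        spent += eval_cost
        value = payoff(option)
        gain = value - best_payoff
        if gain > 0:
            best_option, best_payoff = option, value
        # Stop when the latest option gives no significant increase.
        if gain <= indifference:
            break
    return best_option, best_payoff, spent


# Hypothetical payoffs; the agent never looks at option "d".
PAYOFFS = {"a": 50, "b": 90, "c": 91, "d": 200}
choice = good_enough(PAYOFFS, PAYOFFS.get, eval_cost=1.0, budget=10.0, indifference=2.0)
print(choice)
```

The agent settles for option "c" because the step from 90 to 91 falls inside its indifference threshold, which illustrates how a good-enough rule can stop far from the optimum.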
What are Protocols?
• A protocol (aka mechanism):
– provides a set of rules and behaviors to be
followed by agents who participate in it
– following the rules of a protocol is at an agent’s
discretion, though deviation may leave it “out
of the game”
– examples: auctions, negotiation, voting
• Desired properties:
– maximize payoffs
– not manipulable/enforceable
– simple to implement and execute
What are Strategies?
• A strategy:
– is one of the possible actions an agent can select given the
protocol
– is not dictated (or provided) by the protocol
– is usually the result of the agent’s reasoning and
decisions, based on local algorithms and info.
– examples: bid lowest possible, vote for your faction
• A good strategy:
– should maximize the agent’s payoff given the protocol
and the behavior of other agents
– should be difficult or impossible to manipulate
– should be computationally feasible
– may depend on the strategies of other agents
Protocol evaluation
• Payoff maximization: can refer to individual
payoffs, group payoffs, or social welfare (the sum of individual payoffs)
• Pareto-optimality: a payoff vector
p(x1,x2,…,xn) is Pareto-optimal if there is
no other feasible payoff vector p' such that
at least one payoff is better in p' and no
payoff is worse in p'
• Stability: a protocol is stable if once the
agents arrived at a solution they do not
deviate from it
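A Pareto-optimality test over a finite set of feasible payoff vectors can be sketched as follows. The function names and the four example vectors are illustrative assumptions, not part of the slides.

```python
from typing import Sequence, Tuple

Payoffs = Tuple[float, ...]


def dominates(p_prime: Payoffs, p: Payoffs) -> bool:
    """True if p_prime is at least as good for everyone and better for someone."""
    no_one_worse = all(a >= b for a, b in zip(p_prime, p))
    someone_better = any(a > b for a, b in zip(p_prime, p))
    return no_one_worse and someone_better


def is_pareto_optimal(p: Payoffs, feasible: Sequence[Payoffs]) -> bool:
    """True if no feasible payoff vector dominates p."""
    return not any(dominates(other, p) for other in feasible)


# Hypothetical feasible payoff vectors for two agents.
FEASIBLE = [(2, 2), (-2, 4), (4, -2), (-1, -1)]
for vector in FEASIBLE:
    print(vector, is_pareto_optimal(vector, FEASIBLE))
```

Social welfare, by contrast, is simply `sum(vector)`; a vector can be Pareto-optimal without maximizing that sum.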
Stability and Equilibria
• There are multiple stability concepts. In game
theory, the notion of equilibrium is used:
– dominant strategies: the agents have some strategies
that, regardless of what others do, maximize payoff
– Nash equilibrium: the agents have strategies that, as
long as the others stick to theirs, maximize payoff
– Mixed Nash: the agents each have a set of strategies
from among which they select one with some
probability
– Bayes-Nash: adds history to the previous one
The Prisoner’s Dilemma
Payoff table for a 2-player, no-repetition Prisoner’s Dilemma game
(each cell lists Player 1’s payoff, then Player 2’s payoff)

                        Player 2: Cooperate    Player 2: Defect
Player 1: Cooperate     2, 2                   -2, 4
Player 1: Defect        4, -2                  -1, -1
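The payoffs above can be checked for dominant strategies mechanically. The sketch below is illustrative only: the dictionary encoding and the helper names are assumptions introduced here, and "dominant" is taken in the weak sense (at least as good against every opposing strategy).

```python
from typing import Dict, Optional, Tuple

STRATEGIES = ("C", "D")  # Cooperate, Defect

# (Player 1 strategy, Player 2 strategy) -> (Player 1 payoff, Player 2 payoff)
PRISONERS_DILEMMA: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (2, 2),
    ("C", "D"): (-2, 4),
    ("D", "C"): (4, -2),
    ("D", "D"): (-1, -1),
}


def payoff(game, player: int, own: str, other: str) -> int:
    """Payoff of `player` (0 or 1) when it plays `own` and the opponent plays `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return game[profile][player]


def dominant_strategy(game, player: int) -> Optional[str]:
    """Return a strategy that is best against every opposing strategy, if any."""
    for candidate in STRATEGIES:
        if all(
            payoff(game, player, candidate, other) >= payoff(game, player, alt, other)
            for alt in STRATEGIES
            for other in STRATEGIES
        ):
            return candidate
    return None


print(dominant_strategy(PRISONERS_DILEMMA, 0), dominant_strategy(PRISONERS_DILEMMA, 1))
```

Both players have Defect as a dominant strategy, so the outcome is (-1, -1) even though mutual cooperation would give (2, 2).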
No Pure Nash Equilibrium
Payoff table for a 2-player game with no pure Nash equilibrium
(each cell lists Player 1’s payoff, then Player 2’s payoff)

                        Player 2: Cooperate    Player 2: Defect
Player 1: Cooperate     6, 2                   0, 3
Player 1: Defect        0, 1                   5, 0
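That this game has no pure Nash equilibrium can be confirmed by checking every strategy pair for a profitable unilateral deviation. The encoding and function name below are assumptions made for this sketch.

```python
from itertools import product
from typing import Dict, List, Tuple

STRATEGIES = ("C", "D")  # Cooperate, Defect

# (Player 1 strategy, Player 2 strategy) -> (Player 1 payoff, Player 2 payoff)
GAME: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (6, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (0, 1),
    ("D", "D"): (5, 0),
}


def pure_nash_equilibria(game) -> List[Tuple[str, str]]:
    """List the profiles where neither player gains by deviating alone."""
    equilibria = []
    for s1, s2 in product(STRATEGIES, STRATEGIES):
        u1, u2 = game[(s1, s2)]
        player1_stays = all(u1 >= game[(alt, s2)][0] for alt in STRATEGIES)
        player2_stays = all(u2 >= game[(s1, alt)][1] for alt in STRATEGIES)
        if player1_stays and player2_stays:
            equilibria.append((s1, s2))
    return equilibria


print(pure_nash_equilibria(GAME))
```

The list comes back empty: in every cell one of the two players prefers to switch, which is why the analysis moves on to mixed strategies.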
Mixed Nash
• Player 1 will cooperate with probability pc and
defect with probability pd
• Player 2 will cooperate with probability qc and
defect with probability qd
• Expected utility of an agent is the utility from
a strategy times the probability of this strategy
being selected
• When there are multiple possibilities, the
expected utility is a sum over these
possibilities
Computing the Probabilities
• The expected utility of an agent x from playing
strategy s, given the other agent’s probabilities,
is denoted by Ux(s)
• In the case of equilibrium (mixed Nash), each agent
must be indifferent between the strategies it mixes
over; otherwise it would always play the better one
• In our case we have agents 1,2 and strategies c,d
• We require that Ux(c) = Ux(d), which means that
for each of the two agents, the expected utility
from cooperating should be equal to
the expected utility from defecting
Computation Details
• For agent 1 we have:
U1(c) = 6 qc + 0 qd, U1(d) = 0 qc + 5 qd
• For agent 2 we have:
U2(c) = 2 pc + 1 pd, U2(d) = 3 pc + 0 pd
• The requirement that Ux(c) = Ux(d), together with
pc + pd = 1 and qc + qd = 1, results in:
qc = 5/11 ≈ 0.455, qd = 6/11 ≈ 0.545
pc = 0.5, pd = 0.5
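A mixed equilibrium of a 2x2 game can be computed from the textbook indifference condition: each player's probabilities are chosen so that the other player earns the same expected payoff from cooperating and from defecting. The sketch below applies that condition to the game with no pure Nash equilibrium. The function names and the dictionary encoding are assumptions, and the closed form only applies when the game has a fully mixed equilibrium (nonzero denominators, probabilities strictly between 0 and 1).

```python
from fractions import Fraction
from typing import Dict, Tuple

# (Player 1 strategy, Player 2 strategy) -> (Player 1 payoff, Player 2 payoff)
GAME: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("c", "c"): (6, 2),
    ("c", "d"): (0, 3),
    ("d", "c"): (0, 1),
    ("d", "d"): (5, 0),
}


def mixed_equilibrium(game) -> Tuple[Fraction, Fraction]:
    """Return (pc, qc): the probabilities that players 1 and 2 cooperate."""
    a = {profile: u[0] for profile, u in game.items()}  # player 1 payoffs
    b = {profile: u[1] for profile, u in game.items()}  # player 2 payoffs
    # qc makes player 1 indifferent between c and d.
    qc = Fraction(
        a["d", "d"] - a["c", "d"],
        a["c", "c"] - a["c", "d"] - a["d", "c"] + a["d", "d"],
    )
    # pc makes player 2 indifferent between c and d.
    pc = Fraction(
        b["d", "d"] - b["d", "c"],
        b["c", "c"] - b["d", "c"] - b["c", "d"] + b["d", "d"],
    )
    return pc, qc


def expected_payoff(game, player: int, own: str, other_c: Fraction) -> Fraction:
    """Expected payoff of `player` playing `own` when the opponent cooperates w.p. other_c."""
    total = Fraction(0)
    for other, prob in (("c", other_c), ("d", 1 - other_c)):
        profile = (own, other) if player == 0 else (other, own)
        total += prob * game[profile][player]
    return total


pc, qc = mixed_equilibrium(GAME)
print("pc =", pc, "qc =", qc)
print(expected_payoff(GAME, 0, "c", qc), expected_payoff(GAME, 0, "d", qc))
```

Exact fractions are used so that the indifference check at the end compares equal values without rounding error.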
Tragedy of Common Goods
• Information on the web is (mostly) free
• Agents that seek up-to-date information may
query web sites as frequently as desired
• If all agents do so, the network will be
overly congested, and some servers will crash
• So, is it undesirable to behave this way?
• If all (or most) of the agents prevent
congestion, it is in the best interest of each
individual agent to increase network use ...
The Contract Net Protocol
• An agent coordination and distributed task
allocation mechanism, where:
– multiple heterogeneous agents can perform tasks
– agents can play two roles: managers, contractees
– managers receive tasks, select prospective
contractees and ask for bids
– best bid wins task, performs it, manager monitors
• Pros and cons:
– simple to implement, base for many other protocols
– fully distributed
– performance quality not checked
– easy to manipulate (free riders), may cause loops
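One announce, bid and award round of the protocol can be sketched as below. This is a simplified single-manager illustration, not a full implementation: the class and function names are hypothetical, a bid is assumed to be a cost, and "best bid" is assumed to mean the lowest one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Contractee:
    name: str
    skills: Set[str]
    cost: float

    def bid(self, task: str) -> Optional[float]:
        """Bid the agent's cost, or decline if it cannot perform the task."""
        return self.cost if task in self.skills else None


def run_contract_net(task: str, contractees: List[Contractee]) -> Optional[Tuple[str, float]]:
    """Manager side: announce the task, collect bids, award to the lowest bid."""
    bids: Dict[str, float] = {}
    for contractee in contractees:
        offer = contractee.bid(task)
        if offer is not None:
            bids[contractee.name] = offer
    if not bids:
        return None  # nobody can perform the task
    winner = min(bids, key=bids.get)
    return winner, bids[winner]


AGENTS = [
    Contractee("x", {"paint", "weld"}, 12.0),
    Contractee("y", {"weld"}, 9.0),
    Contractee("z", {"paint"}, 7.0),
]
print(run_contract_net("weld", AGENTS))
print(run_contract_net("drill", AGENTS))
```

Nothing in this loop verifies the quality of the performed task or the truthfulness of the bids, which matches the weaknesses listed above.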
Lying
• Optimality analysis of previous contracts
assumes that agents are sincere: they reveal their real
marginal costs and their real tasks when they bid
– but it is beneficial to lie about costs ...
– and about tasks: hide, phantom, generate
• Immunity to lying:
– pure deals (disjoint tasks): not immune
– mixed deals (probability distribution): phantom
immune
Commitment
• A contractee should commit to performing the task,
and a manager to not terminating the contract, but:
• During execution, contractee may receive a more
profitable task, or manager - a better bid
• Self-interest will result in de-commitment
• To prevent losses - need for enforcement
• De-commitment penalties, leveled commitment
• Hedging against risk: pricing a contract like an
option, taking into account future via
probabilities of events
Contracts
• Contingency contracts - when probability of
future events is known, similar to mixed Nash
– but this is, in general, exponentially complex
• Leveled commitment contracts - allow unilateral
de-commitment at any time, penalties set in
advance
– self-interested agents may avoid a beneficial de-commitment on the chance that the other party will
de-commit first
– not optimal, but better than non-leveled
• Option pricing
– optimal (complete knowledge, infinite markets)
Background
• For the binding
contracts, decisions
are “now or never”.
• For the contingent
contracts, decisions
could be deferred for
the future when more
information could be
obtained.
Advantages of
contingent contracts:
(1) the space of possible
decisions is enlarged
(2) the decision maker’s
payoff can benefit
given these
flexibilities
(3) the overall payoff
(social welfare) could
be improved
Model
• A call option gives the holder the right to buy the
underlying asset by a certain date (expiration date)
for a certain price (strike price).
• An American option can be exercised at any time up
to the expiration date.
• Six factors affect the price of a stock
option: stock price, strike price, time to
expiration, volatility, risk-free interest rate and
dividends.
• Contingent contracts where an agent can
decommit at any time can be viewed as American
call options without dividends.
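How five of these factors enter a price can be illustrated with the Black-Scholes formula. This is a sketch under stated assumptions: the slides do not say which pricing model was used, and the formula below is for a European call. For a stock that pays no dividends (and with non-negative interest rates) it is a standard result that early exercise of an American call is never optimal, so the two values coincide. The function names and the sample inputs are illustrative.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_price(spot: float, strike: float, years: float, rate: float, vol: float) -> float:
    """Black-Scholes value of a call on an asset that pays no dividends."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


# Hypothetical inputs: asset value 100, strike 100, one year to expiration,
# 5% risk-free rate, 20% volatility.
price = call_price(spot=100.0, strike=100.0, years=1.0, rate=0.05, vol=0.2)
print(round(price, 2))
```

In the contract setting, the asset value would play the role of the task value and the strike the agreed contract price; that mapping is an interpretation, not something the slides spell out.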
Simulation and Analysis
Assumptions for the Simulation:
• Only one agent can take a role as a manager.
• The manager generates non-decomposable tasks at
a predefined frequency and specifies the
contract duration and execution time for each task.
• The task value is linearly increasing from 10 to
100 by 10 periodically. Similar to stock price, the
task value is stochastic. Using Monte Carlo
method, the manager simulates the task value at
each time period and announces the current task
value to all contractors.
Simulation and Analysis
• Each contractor has a queue with a capacity of
10 tasks, and it schedules its tasks only
according to the latest start time.
• If a contractor breaches a contract during task
execution, both manager and contractor can get
a partial result.
• The volatility and risk-free interest in the option
pricing model are fixed for all the experiments.
Performance Evaluation
Performance is evaluated along three dimensions:
Throughput, Social Welfare and
Negotiation Efficiency.
• Throughput: total number of tasks executed
within the predefined experiment duration.
• Social Welfare is defined as the total payoff of all
the agents, which is used to check the global
optimality.
• Negotiation Efficiency is measured by the total
decision making time. It represents the total time
spent in all the tasks from the moment they are
assigned until the moment they are executed or
breached.
Conclusion
• CCP incorporates option pricing theory so
contracts could be modeled in a very natural
way.
• CCP provides a computational framework
for the agents to calculate the value of
a flexible contract, the payoff and penalty fee,
and when to breach.
Conclusion (contd)
• Compared to CNP, LCP and CCP are less
computationally efficient. CCP provides a
more general framework for the agents to
evaluate and compute optimal decisions in
the face of uncertainty.
• Both LCP and CCP in scenario 2 have a high
solution quality, while LCP achieves a good
tradeoff between commitment and
flexibility.
Auctions
• A centralized protocol, includes one auctioneer
and multiple bidders
• The auctioneer puts a good up for sale. In some
cases, the good may be a combination of other
goods, or a good with multiple attributes
• The bidders make offers. This may be repeated
several times, depending on the auction type
• The auctioneer determines the winner
Auctions: Pros and Cons
• Usually easier to prevent bidder lying
• Simple protocols
• Centralized: a single point of failure
• Multi-attribute exponentially complex
• Allows collusion “behind the scenes”
• May favor the auctioneer
Auction Types
• Private value: the value of a good to a
bidder agent depends only on its private
preferences. Assumed to be known exactly
• Common value: the good’s value depends
entirely on other agents’ valuation
• Correlated value: the good’s value depends
on internal and external valuations
Auction Protocols
• English auction (aka first-price open-cry):
– bidders free to raise their bid
– end: no more raises, winner: highest bidder at bid
– agent strategy: a series of bids, based on private
value, estimates of others’ valuations, their past bids
– dominant strategy: bid a small amount more than
current highest bid, stop when private value reached
• For correlated value:
– auctioneer increases the price at a constant or other rate
– open-exit: bidders may quit openly, but may not re-enter
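The private-value dominant strategy (raise by a small amount, stop at your private value) can be simulated in a few lines. This is a sketch with assumed details: bidders take turns in a fixed order, raise by a fixed increment, and the bidder names and values are hypothetical.

```python
from typing import Dict, Optional, Tuple


def english_auction(values: Dict[str, int], increment: int = 1) -> Tuple[Optional[str], int]:
    """Run an open-cry ascending auction; return (winner, price paid)."""
    price = 0
    leader: Optional[str] = None
    someone_raised = True
    while someone_raised:
        someone_raised = False
        for name, value in values.items():
            # The current leader does not outbid itself; others raise
            # only while the new price stays within their private value.
            if name != leader and price + increment <= value:
                price += increment
                leader = name
                someone_raised = True
    return leader, price


result = english_auction({"a": 10, "b": 10, "c": 12})
print(result)
```

The bidder with the highest private value wins and pays roughly the second-highest value plus at most one increment, which is why the English auction is often compared with the second-price sealed-bid auction.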
More Protocols
• First-price sealed-bid auction:
– each bidder submits one bid, not knowing others’
– highest wins, pays his bid
– agent strategy: function of private value and beliefs
about others’ valuations
– no dominant strategy. Best: bid less than true value
– how much less? Nash is computable if probability
distribution of agents’ values is known
• Example: n agents, uniform value distribution,
agent i has value vi, there is Nash if each agent i
bids vi(n-1)/n
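The equilibrium bid from the example can be written out directly. The sketch assumes, as the example does, that values are drawn from a uniform distribution and that every bidder follows the same rule; the bidder names and values are hypothetical.

```python
from typing import Dict, Tuple


def equilibrium_bid(value: float, n_bidders: int) -> float:
    """Nash bid v_i * (n - 1) / n for n bidders with uniformly distributed values."""
    return value * (n_bidders - 1) / n_bidders


def first_price_sealed_bid(values: Dict[str, float]) -> Tuple[str, float]:
    """Each bidder shades its bid; the highest bid wins and pays its own bid."""
    n = len(values)
    bids = {name: equilibrium_bid(value, n) for name, value in values.items()}
    winner = max(bids, key=bids.get)
    return winner, bids[winner]


outcome = first_price_sealed_bid({"a": 9, "b": 6, "c": 3})
print(outcome)
```

As the number of bidders grows, the factor (n-1)/n approaches 1, so bids move closer to true values.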
Yet More Auctions
• Dutch auction (descending):
– the seller lowers the price until a bidder takes it
– strategically, equivalent to first-price sealed-bid
– advantage: auctioneer can accelerate auction
• All-pay auction:
– each bidder pays its bid to the auctioneer
– several types of such auctions are used for
resource (re-)allocation
Vickrey (second-price sealed-bid)
• Each bidder submits one bid, not knowing others’
• The highest bid wins, but bidder pays second-highest
bid
• Agent strategy: base bid on private value and beliefs
about others’ values
• Dominant strategy: bid true valuation
– if it bids more and this increment makes it win, the agent
may end up with a loss, since it may pay more than its true value
– if it bids less, there is a smaller chance of winning (but
winning price is not affected)
• Meaning: bid true value regardless of others
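The dominance of truthful bidding can be checked exhaustively on a small grid of integer bids. The sketch assumes a single item, private values, and that a bidder who ties with the highest competing bid loses; the function names and grid bounds are illustrative.

```python
from typing import Dict, Tuple


def vickrey(bids: Dict[str, int]) -> Tuple[str, int]:
    """Highest bid wins and pays the second-highest bid (needs at least two bids)."""
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


def utility(value: int, my_bid: int, highest_other_bid: int) -> int:
    """Bidder's payoff; ties are assumed to be lost."""
    if my_bid > highest_other_bid:
        return value - highest_other_bid
    return 0


def truthful_is_dominant(value: int, max_bid: int = 20) -> bool:
    """Check that bidding `value` is never worse than any other bid on the grid."""
    return all(
        utility(value, value, other) >= utility(value, bid, other)
        for other in range(max_bid + 1)
        for bid in range(max_bid + 1)
    )


print(vickrey({"a": 10, "b": 7, "c": 12}))
print(truthful_is_dominant(7))
```

Overbidding only changes the outcome when the highest competing bid exceeds the true value, in which case winning costs more than the good is worth; underbidding only changes the outcome by losing auctions that would have been profitable.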
So, Which Auction is Better?
• Computation: auctions with dominant strategies
(Vickrey and English) are more efficient - no
need to speculate regarding other bidders
• Auctioneer’s revenue:
– second-price is less than the true price, however first-price bidders under-bid. Which effect is stronger?
– for risk-neutral bidders with private independent
values, the effects are equivalent
– for risk-averse bidders, Dutch and first-price sealed-bid auctions maximize auctioneer’s revenue
Real Auctions
• In real auctions, values are not private
• As a result, for 3 or more bidders, the English
auction provides auctioneer revenue higher than
Vickrey does
• Explanation: when it observes other bidders
increasing their bid, a bidder increases its own
valuation of the good
• Both English and Vickrey are better for the
auctioneer than Dutch and first-price sealed-bid
Collusion
• Bidders can coordinate their bids to lower them
• In English and Vickrey auctions, collusion is a
dominant strategy!
• Example:
– agents a, b, c value the good at 10, 10, 12,
respectively
– they can agree to bid 5,5,6 respectively
– if one defects, all observe that, and can increase to
real value, so there is no benefit from defection
Avoiding Collusion
• In the first-price sealed-bid and Dutch auctions,
bidder collusion is not dominant, but possible:
– in the previous example, after a,b,c decided on
bidding 5,5,6, it is beneficial for a,b to bid more than
5. Against any bid of c below 10 they can outbid c and win
• In first-price sealed-bid, Vickrey and Dutch
auctions, all bidders must identify each other and
collude jointly; otherwise an external bidder can win
• In the English auction identifying is through
bidding. Computerized anonymization can
prevent identification and collusion
Insincere Auctioneer
• Private value auctions:
– Vickrey: auctioneer can overstate the second
highest bid to the winner
– Solution: electronic signature
– Other auctions do not motivate auctioneer
lying, since the winner pays its bid
• Non-private value:
– English: auctioneer can use shills that bid in the
auction to increase real bidders’ valuations
– Any auction: auctioneer may bid, to guarantee a
minimum price
Example: Auctioneer Bid
• In the Vickrey auction, the auctioneer is motivated
to bid over its true reservation price
• If its bid is second-highest, it sets the
good’s price higher than the reservation price
• On the other hand, the auctioneer may win although
others value the good at more than the reservation
price
Insincere Bidders
• Non-private value:
– winner’s curse: an agent that bids its true value and
wins knows that it was too high
– this means that a win is a loss (of money)
– hence, agents should bid less than true value
– this is the best strategy even in Vickrey (unlike
private value Vickrey)
• Private value, Vickrey:
– dominant truthful bidding reveals true valuations
– this may be disadvantageous:
• when subcontracting, subcontractors may re-negotiate
Auctions of Interrelated Goods
• Multiple homogeneous goods: truth
revelation of Vickrey holds
• Heterogeneous goods, one at a time,
interdependent values:
– for optimal bidding, agents need full lookahead
– but then agents don’t bid true values per good
• Protocol modifications to overcome that:
– pool of goods at a single auction
– allow de-commitment, with penalties
Negotiation
• A process by which two or more agents reach a
joint decision, each trying to achieve an
individual goal/objective. Includes:
– a conversation language
– a protocol
– a decision process (by which an agent decides upon
its position, concessions, criteria for agreement)
• Can be performed 1:1, 1:N, N:N
• May include a single shot message by each party
or conversation with several messages going
back and forth
Negotiation Desired Properties
• Efficiency: little time spent on arriving at
agreements
• Stability: once an agreement is reached, agents
should stick to it
• Simplicity: computation and communication
overheads should be small
• Distribution: there should not be a central
decision maker
• Symmetry: the mechanism should not be biased
Some Details
• Goal of negotiation: arrive at an agreement beneficial
to all parties
• Initially, parties may have different beliefs
• So, a proposal and a counter-proposal are not merely an
offer for an agreement - they are an attempt to change
the beliefs of the other party
• Beliefs are based on facts and justifications
• In MAS, Truth Maintenance Systems (TMS), utilizing
logic, serve for belief maintenance and revision
• Commonly includes: internal, external, out
Are Auctions a Negotiation Protocol?
• In auctions, the protocol does not assume, and
attempts to prevent, inter-agent cooperation
• Auctions centralize the commerce
• If the number of buyer-bidders is small, the final
price may be far lower than reservation price
• To enable maximal exploitation of cooperation
opportunities, thus payoff maximization, there is
a need for free, elaborate, one on one, and many
to many negotiation
Cooperation via Coalitions
• A coalition: a set of agents that agree to
cooperate to execute a task/achieve a goal
• Assumptions:
– agents have different expertise and capacities
– tasks require cooperation of different agents for
their execution (in terms of reasoning/performance)
– tasks have values, which depend on coalition members
• To perform tasks and increase benefits agents
may need to cooperate via coalition formation
Transportation Example
• A transportation company needs to mobilize 10
passengers
• It has access to cars, vans, buses and helicopters, all are
possibly self-interested service providers
• A car can take 3 passengers, a van can take 7, a bus can
take 50 and a helicopter can take 6
• Each vehicle is priced differently and has different
speed and comfort
• Passengers are willing to pay a fixed amount
• The company needs to find the best, or at least a good,
combination of vehicles for the task
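For a problem this small, the best combination can be found by brute force. The capacities below come from the example; the prices are hypothetical (the slides give none), and speed and comfort are ignored, so this sketch only minimizes cost.

```python
from itertools import product
from typing import Dict, Optional, Tuple

CAPACITY = {"car": 3, "van": 7, "bus": 50, "helicopter": 6}
# Hypothetical prices per vehicle; not given in the example.
PRICE = {"car": 30, "van": 60, "bus": 200, "helicopter": 150}


def cheapest_fleet(passengers: int) -> Optional[Tuple[Dict[str, int], int]]:
    """Enumerate vehicle counts and return the cheapest fleet with enough seats."""
    names = list(CAPACITY)
    # No more vehicles of a kind than would carry everyone on their own.
    max_counts = [-(-passengers // CAPACITY[name]) for name in names]
    best: Optional[Tuple[Dict[str, int], int]] = None
    for counts in product(*(range(m + 1) for m in max_counts)):
        seats = sum(c * CAPACITY[name] for c, name in zip(counts, names))
        if seats < passengers:
            continue
        cost = sum(c * PRICE[name] for c, name in zip(counts, names))
        if best is None or cost < best[1]:
            best = (dict(zip(names, counts)), cost)
    return best


fleet = cheapest_fleet(10)
print(fleet)
```

Enumeration grows exponentially with the number of vehicle types, which is one reason the example asks for "the best, or at least a good" combination.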
Issues in Coalition Formation
• Given the tasks and the other agents, which
coalitions should an agent attempt to form?
• What mechanism can an agent use for coalition
formation?
• What guarantees regarding efficiency and
quality of task performance can the mechanism
provide?
• Once a coalition has formed, how should its
members go about distribution of work/payoff?
Coalition Formation: Solutions
• Self-interest vs. benevolence: the mechanisms
for benevolent agents are usually much simpler,
as such agents do not need means to maintain
their own payoff maximization
• Centralization vs. distribution: central design of
coalitions is usually much simpler to execute
and enforce than a distributed one is
• Environment super-additivity: in super-additive
environments any unification of two coalitions
increases overall payoff. Strongly influences the
mechanism
[Figure: agents Agent1 to Agent13, each labeled with a capability (b1 to b8), and tasks T1(b1,b2,b3), T2(b2,b4), T3, T4(b1,b3,b7) and T5]
Coalition Formation in Dynamic, Open MAS
• Coalition formation (Shehory and Sycara)
• Main idea (distributed greedy algorithm):
– Each agent performs iteratively:
• Design possible coalitions w.r.t. tasks
• Compute coalition-task values
• Choose best one and form it
• Re-design when new tasks/agents arrive
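The design, evaluate and choose loop can be sketched as a greedy procedure. This is a centralized, simplified illustration rather than the distributed algorithm itself: each agent is assumed to hold one capability, a coalition's value is assumed to be the task value divided by the number of members, and all names and numbers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Task = Tuple[Tuple[str, ...], float]  # (required capabilities, task value)


def match(required: Tuple[str, ...], free: Dict[str, str]) -> Optional[List[str]]:
    """Pick one free agent per required capability, or None if impossible."""
    members: List[str] = []
    for capability in required:
        candidates = sorted(
            name for name, cap in free.items() if cap == capability and name not in members
        )
        if not candidates:
            return None
        members.append(candidates[0])
    return members


def greedy_coalitions(agents: Dict[str, str], tasks: Dict[str, Task]) -> List[Tuple[str, List[str]]]:
    """Repeatedly form the feasible coalition with the highest value per member."""
    free = dict(agents)
    remaining = dict(tasks)
    formed: List[Tuple[str, List[str]]] = []
    while True:
        candidates = []
        for task, (required, value) in remaining.items():
            members = match(required, free)
            if members is not None:
                candidates.append((value / len(members), task, members))
        if not candidates:
            return formed
        _, task, members = max(candidates, key=lambda c: c[0])
        formed.append((task, members))
        del remaining[task]
        for name in members:
            del free[name]


AGENTS = {"Agent1": "b1", "Agent2": "b2", "Agent3": "b3", "Agent4": "b4",
          "Agent10": "b1", "Agent11": "b3", "Agent12": "b7"}
TASKS = {"T1": (("b1", "b2", "b3"), 90.0), "T2": (("b2", "b4"), 50.0),
         "T4": (("b1", "b3", "b7"), 120.0)}
coalitions = greedy_coalitions(AGENTS, TASKS)
print(coalitions)
```

Calling `greedy_coalitions` again with an updated agent or task dictionary stands in for the re-design step when new tasks or agents arrive. Being greedy, the procedure can leave a task unassigned that a different assignment would have covered.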
[Figure: agents Agent1 to Agent12 with capabilities, two middle agents, and task T1 with subtasks T11, T12, T13; the query “Who has capability?” sent to a middle agent is answered with Agents 3, 5, 6, 8]