Introduction of Beer Game - University of Pennsylvania

Artificial Agents Play the Beer Game
Eliminate the Bullwhip Effect
and Whip the MBAs
Steven O. Kimbrough
D.-J. Wu
Fang Zhong
FMEC, Philadelphia, June 2000; file: beergameslides.ppt
The MIT Beer Game
• Players
– Retailer, Wholesaler, Distributor and Manufacturer.
• Goal
– Minimize system-wide (chain) long-run average cost.
• Information sharing: Mail.
• Demand: Deterministic.
• Costs
– Holding cost: $1.00/case/week.
– Penalty cost: $2.00/case/week.
• Lead time: 2 weeks physical delay.
Timing
1. New shipments delivered.
2. Orders arrive.
3. Fill orders plus backlog.
4. Decide how much to order.
5. Calculate inventory costs.
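The five timing steps can be sketched for a single player as below. This is a minimal illustration, not the authors' simulator: the `Stage` fields, the `play_week` function, and the assumption that the supplier always ships every order in full after the 2-week delay are all hypothetical. Only the cost rates and the step order come from the slides.

```python
from collections import deque
from dataclasses import dataclass, field

HOLDING_COST = 1.00  # $/case/week, from the slide
PENALTY_COST = 2.00  # $/case/week, from the slide


@dataclass
class Stage:
    """One player, e.g. the Retailer. Field names are illustrative."""
    inventory: int = 0   # cases on hand
    backlog: int = 0     # unfilled cases still owed downstream
    # Orders in transit; two slots model the 2-week physical delay.
    pipeline: deque = field(default_factory=lambda: deque([0, 0]))
    total_cost: float = 0.0


def play_week(stage, incoming_order, policy):
    """Run the five timing steps once; return (cases shipped, order placed)."""
    # 1. New shipments delivered.
    stage.inventory += stage.pipeline.popleft()
    # 2. Orders arrive.
    owed = incoming_order + stage.backlog
    # 3. Fill orders plus backlog.
    shipped = min(stage.inventory, owed)
    stage.inventory -= shipped
    stage.backlog = owed - shipped
    # 4. Decide how much to order.
    order = max(0, policy(incoming_order))
    # ASSUMPTION: the supplier ships the full order, arriving in 2 weeks.
    stage.pipeline.append(order)
    # 5. Calculate inventory costs.
    stage.total_cost += HOLDING_COST * stage.inventory + PENALTY_COST * stage.backlog
    return shipped, order
```

For example, `play_week(Stage(inventory=4), 4, lambda x: x)` plays one week under the "order what was ordered" rule; a full game would chain four such stages so that each stage's order becomes the next stage's incoming order.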
Game Board
…
The Bullwhip Effect
• Order variability is amplified upstream in the
supply chain.
• Industry examples (P&G, HP).
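One common way to quantify this amplification is the ratio of the variance of the orders a stage places to the variance of the orders it receives. The slides do not name a specific measure, so the function below is an assumed illustration; a ratio above 1 indicates amplification at that stage.

```python
from statistics import pvariance


def bullwhip_ratio(orders_placed, orders_received):
    """Variance of orders placed divided by variance of orders received."""
    var_in = pvariance(orders_received)
    if var_in == 0:
        raise ValueError("incoming orders have zero variance; ratio undefined")
    return pvariance(orders_placed) / var_in


# Hypothetical data: demand swings between 4 and 8, orders between 2 and 10.
customer_demand = [4, 8, 4, 8]
retailer_orders = [2, 10, 2, 10]
print(bullwhip_ratio(retailer_orders, customer_demand))  # 4.0
```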
Observed Bullwhip effect from undergraduates' game playing
[Figure: four panels, Retailer's Order, Wholesaler's Order, Distributor's Order and Factory's Order; each plots order (0 to 40) against week (1 to 29).]
Bullwhip Effect Example (P & G)
Lee et al., 1997, Sloan Management Review
Analytic Results: Deterministic Demand
• Assumptions:
– Fixed lead time.
– Players work as a team.
– Manufacturer has unlimited capacity.
• “1-1” policy is optimal: order whatever
amount your customer orders from you.
Analytic Results: Stochastic Demand
(Chen, 1999, Management Science)
• Additional assumptions:
– Only the Retailer incurs penalty cost.
– Demand distribution is common knowledge.
– Fixed information lead time.
– Decreasing holding costs upstream in the chain.
• Order-up-to (base stock installation) policy
is optimal.
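An order-up-to rule can be sketched as follows, assuming the usual textbook definition: each week the player orders enough to bring its local inventory position (on hand plus on order minus backlog) back to a fixed target. The function and parameter names are hypothetical, and the optimal target levels derived in Chen (1999) are not reproduced here.

```python
def order_up_to(base_stock, on_hand, on_order, backlog):
    """Order enough to raise the inventory position to base_stock."""
    inventory_position = on_hand + on_order - backlog
    # Orders cannot be negative: order nothing when already above target.
    return max(0, base_stock - inventory_position)


# Hypothetical numbers: target 20, 5 on hand, 8 in transit, 3 backlogged.
print(order_up_to(20, on_hand=5, on_order=8, backlog=3))  # 10
```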
Agent-Based Approach
• Agents work as a team.
• No agent has knowledge of the demand
distribution.
• No information sharing among agents.
• Agents learn via genetic algorithms.
• Fixed or stochastic leadtime.
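The slides do not state which genetic operators the agents use. As a rough sketch only, the loop below minimises a caller-supplied cost over fixed-length bit strings using tournament selection, one-point crossover, bit-flip mutation and elitism; all of these choices, and every name and default value, are assumptions made for illustration.

```python
import random


def evolve(cost, n_bits=5, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Minimise cost(bits) over bit tuples; return (best, best-cost history)."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_bits))
           for _ in range(pop_size)]
    best = min(pop, key=cost)
    history = [cost(best)]

    def pick():
        # Tournament selection: the cheaper of two random individuals.
        a, b = rng.choice(pop), rng.choice(pop)
        return a if cost(a) <= cost(b) else b

    for _ in range(generations):
        children = [best]  # elitism: the best rule survives unchanged
        while len(children) < pop_size:
            mum, dad = pick(), pick()
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Bit-flip mutation with probability p_mut per bit.
            child = tuple(b ^ 1 if rng.random() < p_mut else b for b in child)
            children.append(child)
        pop = children
        best = min(pop, key=cost)
        history.append(cost(best))
    return best, history
```

In the Beer Game setting, `cost` would run a simulated game with the rule encoded by the bit string and return the accumulated cost. Because the best rule is always carried forward, the recorded best cost never increases from one generation to the next.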
Research Questions
• Can the agents track the demand?
• Can the agents eliminate the Bullwhip
effect?
• Can the agents discover the optimal policies
if they exist?
• Can the agents discover reasonably good
policies under complex scenarios where
analytical solutions are not available?
Flowchart
Agents Coding Strategy
• Bit-string representation with fixed length n.
• Leftmost bit represents the sign, “+” or “-”.
• The remaining bits represent how much to order.
• Rule “x+1” means “if demand is x then order
x+1”.
• Rule search space is 2^(n-1) – 1.
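A decoder for this representation might look like the sketch below. Two points are assumptions, not taken from the slides: that a leading 1 encodes “+” (and a leading 0 encodes “-”), and that a rule yielding a negative quantity is clamped to an order of zero.

```python
def decode_rule(bits):
    """Turn a bit string such as '10011' into a signed offset (here +3)."""
    if len(bits) < 2 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of at least two 0/1 characters")
    sign = 1 if bits[0] == "1" else -1  # ASSUMPTION: leading 1 means "+"
    return sign * int(bits[1:], 2)


def apply_rule(bits, demand):
    """Order quantity under the rule 'x + offset'."""
    # ASSUMPTION: orders cannot be negative, so clamp at zero.
    return max(0, demand + decode_rule(bits))


print(apply_rule("10001", 4))  # rule "x+1" with demand 4 gives 5
```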
Experiment 1a: First Cup
• Environment:
– Deterministic demand with fixed leadtime.
– Fix the policy of Wholesaler, Distributor and
Manufacturer to be “1-1”.
– Only the Retailer agent learns.
• Result: Retailer Agent finds “1-1”.
Experiment 1b
• All four Agents learn under the environment of
experiment 1a.
• Über rule for the team.
• All four agents find “1-1”.
Result of Experiment 1b
All four agents can find the optimal “1-1” policy
Artificial Agents Whip the MBAs and
Undergraduates in Playing the MIT Beer
Game
Accumulated Cost Comparison of MBAs and our agents
[Figure: accumulated cost (0 to 5000) against week (1 to 27). Series: MBA Group1, MBA Group2, MBA Group3, Agent, UnderGrad Group1, UnderGrad Group2, UnderGrad Group3.]
Stability (Experiment 1b)
• Fix any three agents to be “1-1”, and allow the fourth
agent to learn.
• The fourth agent minimizes its own long-run average
cost rather than the team cost.
• No agent has any incentive to deviate once the others
are playing “1-1”.
• Therefore “1-1” appears to be a Nash equilibrium.
Experiment 2: Second Cup
• Environment:
– Demand uniformly distributed on [0, 15].
– Fixed lead time.
– All four Agents make their own decisions as in
experiment 1b.
• Agents eliminate the Bullwhip effect.
• Agents find better policies than “1-1”.
Artificial agents eliminate the Bullwhip effect.
[Figure: order (0 to 20) against week (1 to 35). Series: Retailer, Wholesaler, Distributor, Factory.]
Artificial agents discover a better policy than “1-1”
when facing stochastic demand with penalty costs
for all players.
Accumulated Cost vs. Week
[Figure: accumulated cost (0 to 5000) against week (1 to 35). Series: Agent Cost, 1-1 Cost.]
Experiment 3: Third Cup
• Environment:
– Lead time uniformly distributed on [0, 4].
– The rest as in experiment 2.
• Agents find better policies than “1-1”.
• No Bullwhip effect.
• The policies discovered by agents are Nash.
Artificial agents discover better and stable policies
than “1-1” when facing stochastic demand and
stochastic lead-time.
Artificial Agents are able to eliminate the Bullwhip
effect when facing stochastic demand with stochastic
leadtime.
Agents learning

            Winner Strategies
Generation  Retailer  Wholesaler  Distributor  Manufacturer  Total Cost
0           x–0       x–1         x+4          x+2           7380
1           x+3       x–2         x+2          x+5           7856
2           x–0       x+5         x+6          x+3           6987
3           x–1       x+5         x+2          x+3           6137
4           x+0       x+5         x–0          x–2           6129
5           x+3       x+1         x+2          x+3           3886
6           x–0       x+1         x+2          x+0           3071
7           x+2       x+1         x+2          x+1           2694
8           x+1       x+1         x+2          x+1           2555
9           x+1       x+1         x+2          x+1           2555
10          x+1       x+1         x+2          x+1           2555
The Columbia Beer Game
• Environment:
– Information lead time: (2, 2, 2, 0).
– Physical lead time: (2, 2, 2, 3).
– Initial conditions set as Chen (1999).
• Agents find the optimal policy: order
whatever is ordered with time shift, i.e.,
$Q_1 = D(t-1)$, $Q_i = Q_{i-1}(t - l_{i-1})$.
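The time-shifted rule can be traced with a short sketch. It reads the formula as: stage 1 orders last week's customer demand, and each stage i > 1 repeats what stage i-1 ordered l_{i-1} weeks earlier, where l is the information lead time (2, 2, 2 for the first three stages). That reading of the indices is an assumption, as are the function name and the `initial_order` value used for weeks before the data starts; the slide's actual initial conditions follow Chen (1999) and are not reproduced here.

```python
def shifted_orders(demand, info_lead=(2, 2, 2), initial_order=0):
    """Return orders[i][t] for stages i = 0..3 (Retailer to Manufacturer).

    demand[t] is customer demand in week t (0-based).
    info_lead[i] is the information lead time of stage i, in weeks.
    initial_order stands in for any week before the start of the data.
    """
    weeks = len(demand)

    def lookup(series, t):
        return series[t] if t >= 0 else initial_order

    # Stage 1: Q_1(t) = D(t - 1)
    orders = [[lookup(demand, t - 1) for t in range(weeks)]]
    # Stage i: Q_i(t) = Q_{i-1}(t - l_{i-1})
    for lag in info_lead:
        prev = orders[-1]
        orders.append([lookup(prev, t - lag) for t in range(weeks)])
    return orders


orders = shifted_orders(list(range(1, 11)))
print(orders[0])  # the Retailer repeats demand one week late
print(orders[3])  # the Manufacturer sees the same stream seven weeks late
```

Under this rule every stage places exactly the same sequence of orders, only delayed, so order variability is not amplified along the chain.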
Ongoing Research: More Beer
• Value of information sharing.
• Coordination and cooperation.
• Bargaining and negotiation.
• Alternative learning mechanisms: Classifier
systems.
Summary
• Agents are capable of playing the Beer Game
– Track demand.
– Eliminate the Bullwhip effect.
– Discover the optimal policies if they exist.
– Discover good policies under complex scenarios where
analytical solutions are not available.
• Intelligent and agile supply chain.
• Multi-agent enterprise modeling.
A framework for multi-agent intelligent
enterprise modeling
[Diagram. Communities: Executive Community (StrategyFinder), Production Community (LivingFactory), Supply Chain Community (DragonChain), E-Marketplace Community (eBAC). Agents shown: Pricing Agent, Investment Agent, Factory Agent, Distributor Agent, Wholesaler Agent, Retailer Agent, Bidding Agent, Contracting Agent, Auction Agent.]