Markov Chains - Tutorial #5
Markov Chains as a Learning Tool
Markov Process
Simple Example
Weather:
• raining today
40% rain tomorrow
60% no rain tomorrow
• not raining today
20% rain tomorrow
80% no rain tomorrow
Stochastic Finite State Machine:
[diagram: two states, rain and no rain; rain → rain 0.4, rain → no rain 0.6, no rain → rain 0.2, no rain → no rain 0.8]
Markov Process
Simple Example (cont.)
The transition matrix P:

            Rain   No rain
  Rain       0.4     0.6
  No rain    0.2     0.8

• Stochastic matrix: rows sum up to 1
• Doubly stochastic matrix: rows and columns sum up to 1
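The weather chain above can be sketched in plain Python; the dictionary representation and variable names are my own:

```python
# Transition matrix from the slides, as a dict of rows.
P = {
    "rain":    {"rain": 0.4, "no rain": 0.6},
    "no rain": {"rain": 0.2, "no rain": 0.8},
}

# Stochastic matrix: every row must sum to 1.
for state, row in P.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9

# One-step prediction: if it rains today, tomorrow's
# distribution is simply the "rain" row of P.
print(P["rain"])  # {'rain': 0.4, 'no rain': 0.6}
```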
Markov Process
Let X_i be the weather of day i, 1 ≤ i ≤ t. We may determine the probability of X_{t+1} from X_i, 1 ≤ i ≤ t.
• Markov Property: X_{t+1}, the state of the system at time t+1, depends only on the state of the system at time t:

    Pr[X_{t+1} = x_{t+1} | X_1 = x_1, …, X_t = x_t] = Pr[X_{t+1} = x_{t+1} | X_t = x_t]

  [diagram: chain X1 → X2 → X3 → X4 → X5]
• Stationary Assumption: transition probabilities are independent of time t:

    Pr[X_{t+1} = b | X_t = a] = p_ab
Markov Process
Gambler’s Example
– Gambler starts with $10 (the initial state)
– At each play we have one of the following:
  • Gambler wins $1 with probability p
  • Gambler loses $1 with probability 1-p
– Game ends when the gambler goes broke or reaches a fortune of $100
  (both 0 and 100 are absorbing states)
[diagram: states 0, 1, 2, …, 99, 100; from each interior state the walk moves right with probability p and left with probability 1-p; start at state 10 ($10); 0 and 100 are absorbing]
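A quick Monte Carlo sketch of the gambler's walk (function and parameter names are mine); for a fair coin, the classical gambler's-ruin result says the chance of reaching $100 from $10 is 10/100 = 0.1:

```python
import random

def gamblers_ruin(start=10, goal=100, p=0.5, rng=random):
    """Simulate one game; return the absorbing state reached (0 or goal)."""
    state = start
    while 0 < state < goal:
        state += 1 if rng.random() < p else -1
    return state

# Estimate the probability of reaching $100 from $10 with a fair coin;
# theory for p = 1/2 predicts start/goal = 0.1.
rng = random.Random(0)
wins = sum(gamblers_ruin(p=0.5, rng=rng) == 100 for _ in range(1000))
print(wins / 1000)
```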
Markov Process
• Markov process – described by a stochastic FSM
• Markov chain – a random walk on this graph (a distribution over paths)
• Edge weights give us Pr[X_{t+1} = b | X_t = a] = p_ab
• We can ask more complex questions, like: what is Pr[X_{t+2} = a | X_t = b]?
[diagram: the gambler's random walk, as above]
Markov Process
Coke vs. Pepsi Example
• Given that a person’s last cola purchase was Coke, there is a 90% chance that his next cola purchase will also be Coke.
• If a person’s last cola purchase was Pepsi, there is an 80% chance that his next cola purchase will also be Pepsi.

The transition matrix P:

           Coke   Pepsi
  Coke      0.9    0.1
  Pepsi     0.2    0.8

[diagram: two states, coke and pepsi; coke → coke 0.9, coke → pepsi 0.1, pepsi → coke 0.2, pepsi → pepsi 0.8]
Markov Process
Coke vs. Pepsi Example (cont.)
Given that a person is currently a Pepsi purchaser, what is the probability that he will purchase Coke two purchases from now?

Pr[Pepsi → ? → Coke] =
  Pr[Pepsi → Coke → Coke] + Pr[Pepsi → Pepsi → Coke] =
  0.2 * 0.9 + 0.8 * 0.2 = 0.34

This is the (Pepsi, Coke) entry of the squared matrix:

  P^2 = | 0.9  0.1 |^2 = | 0.83  0.17 |
        | 0.2  0.8 |     | 0.34  0.66 |
Markov Process
Coke vs. Pepsi Example (cont.)
Given that a person is currently a Coke purchaser, what is the probability that he will buy Pepsi at the third purchase from now?

  P^3 = | 0.9  0.1 | | 0.83  0.17 | = | 0.781  0.219 |
        | 0.2  0.8 | | 0.34  0.66 |   | 0.438  0.562 |

The answer is the (Coke, Pepsi) entry of P^3: 0.219.
Markov Process
Coke vs. Pepsi Example (cont.)
• Assume each person makes one cola purchase per week
• Suppose 60% of all people now drink Coke, and 40% drink Pepsi
• What fraction of people will be drinking Coke three weeks from now?

  P = | 0.9  0.1 |     P^3 = | 0.781  0.219 |
      | 0.2  0.8 |           | 0.438  0.562 |

Pr[X3 = Coke] = 0.6 * 0.781 + 0.4 * 0.438 = 0.6438

Qi – the distribution in week i
Q0 = (0.6, 0.4) – initial distribution
Q3 = Q0 * P^3 = (0.6438, 0.3562)
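The matrix powers and the three-week distribution above can be checked with a few lines of plain Python (the helper names are mine):

```python
# 2x2 matrix product, enough for the Coke/Pepsi chain.
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.9, 0.1],   # row 0 = Coke
     [0.2, 0.8]]   # row 1 = Pepsi

P2 = matmul(P, P)    # two purchases ahead
P3 = matmul(P2, P)   # three purchases ahead
# P2 is approximately [[0.83, 0.17], [0.34, 0.66]]
# P3 is approximately [[0.781, 0.219], [0.438, 0.562]]

# Evolve the initial distribution Q0 = (0.6, 0.4) three weeks forward.
Q0 = [0.6, 0.4]
Q3 = [sum(Q0[i] * P3[i][j] for i in range(2)) for j in range(2)]
print(Q3)  # approximately [0.6438, 0.3562]
```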
Markov Process
Coke vs. Pepsi Example (cont.)
Simulation:
[plot: Pr[Xi = Coke] versus week i; from any starting point the curve converges to 2/3]

Stationary distribution: (2/3, 1/3), since

  (2/3  1/3) * | 0.9  0.1 | = (2/3  1/3)
               | 0.2  0.8 |

[diagram: coke → coke 0.9, coke → pepsi 0.1, pepsi → coke 0.2, pepsi → pepsi 0.8]
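The convergence shown in the simulation can be reproduced by repeatedly multiplying a distribution by P (a power-iteration sketch; variable names are mine):

```python
# Power iteration toward the stationary distribution of the Coke/Pepsi chain.
P = [[0.9, 0.1],
     [0.2, 0.8]]

q = [1.0, 0.0]  # start all-Coke; any initial distribution works
for _ in range(200):
    q = [q[0] * P[0][j] + q[1] * P[1][j] for j in range(2)]

print(q)  # converges to (2/3, 1/3)
```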
How to obtain the Stochastic matrix?
• Solve linear equations, e.g.

    | x_1,1  x_1,2 | = | 2/3  1/3 |
    | x_2,1  x_2,2 |   | 2/3  1/3 |

• Learn from examples, e.g., what letters follow what letters in English words: mast, tame, same, teams, team, meat, steam, stem.
How to obtain the Stochastic matrix?
Counts table vs Stochastic Matrix
(‘@’ marks the start of a word, ‘\0’ the end of a word)

  P |  a    s    t    m    e    \0
  --+-----------------------------
  a |  0   1/7  1/7  5/7   0    0
  e | 4/7   0    0   1/7   0   2/7
  m | 1/8  1/8   0    0   3/8  3/8
  s | 1/5   0   3/5   0    0   1/5
  t | 1/7   0    0    0   4/7  2/7
  @ |  0   3/8  3/8  2/8   0    0
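The table above can be learned directly from the eight example words. A sketch in Python, using exact fractions so the entries match the slide (the variable names are mine):

```python
from collections import Counter, defaultdict
from fractions import Fraction

words = ["mast", "tame", "same", "teams", "team", "meat", "steam", "stem"]

# Count letter-to-letter transitions; '@' marks word start, '\0' word end.
counts = defaultdict(Counter)
for w in words:
    prev = "@"
    for ch in w:
        counts[prev][ch] += 1
        prev = ch
    counts[prev]["\0"] += 1

# Normalize each row of the counts table to get the stochastic matrix.
P = {r: {c: Fraction(n, sum(row.values())) for c, n in row.items()}
     for r, row in counts.items()}

print(P["a"])  # a -> s: 1/7, a -> t: 1/7, a -> m: 5/7
```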
Application of Stochastic matrix
Using the stochastic matrix to generate a random word:
• Generate the first letter (sample from the ‘@’ row of the counts table C)
• For each current letter, generate the next letter (sample from its row of C)

Cumulative counts table A:

  A |  a    s    t    m    e   \0
  --+----------------------------
  a |  -    1    2    7    -    -
  e |  4    -    -    5    -    7
  m |  1    2    -    -    5    8
  s |  1    -    4    -    -    5
  t |  1    -    -    -    5    7
  @ |  -    3    6    8    -    -

If C[r,j] > 0, let A[r,j] = C[r,1] + C[r,2] + … + C[r,j]
Application of Stochastic matrix
Using the stochastic matrix to generate a random word:

Generate the first letter: generate a random number x between 1 and 8. If 1 <= x <= 3, the letter is ‘s’; if 4 <= x <= 6, the letter is ‘t’; otherwise, it’s ‘m’.

For each current letter generate the next letter: suppose the current letter is ‘s’ and we generate a random number x between 1 and 5. If x = 1, the next letter is ‘a’; if 2 <= x <= 4, the next letter is ‘t’; otherwise, the current letter is an ending letter.

(The cumulative counts table A is as above: if C[r,j] > 0, let A[r,j] = C[r,1] + C[r,2] + … + C[r,j].)
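The sampling scheme above can be sketched in Python using cumulative counts; the function names and the use of `random.Random` are my own:

```python
import random
from collections import Counter, defaultdict

words = ["mast", "tame", "same", "teams", "team", "meat", "steam", "stem"]

# Build the counts table C ('@' = word start, '\0' = word end).
C = defaultdict(Counter)
for w in words:
    prev = "@"
    for ch in w:
        C[prev][ch] += 1
        prev = ch
    C[prev]["\0"] += 1

def random_word(rng):
    """Walk the chain from '@' until the end marker is drawn."""
    out, cur = [], "@"
    while True:
        row = C[cur]
        # Draw x in 1..row_total and pick the letter whose cumulative
        # count A[r, j] first reaches x -- exactly the slide's scheme.
        x = rng.randint(1, sum(row.values()))
        for letter, n in row.items():
            x -= n
            if x <= 0:
                break
        if letter == "\0":
            return "".join(out)
        out.append(letter)
        cur = letter

rng = random.Random(1)
print([random_word(rng) for _ in range(5)])
```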
Supervised vs Unsupervised
Decision tree learning is “supervised learning” as we know the correct output of each example.
Learning based on Markov chains is “unsupervised learning” as we don’t know which is the correct output of the “next letter”.
K-Nearest Neighbor
Features
• All instances correspond to points in an n-dimensional Euclidean space
• Classification is delayed until a new instance arrives
• Classification is done by comparing feature vectors of the different points
• Target function may be discrete or real-valued

[diagrams: 1-Nearest Neighbor, 3-Nearest Neighbor]

Example: Identify Animal Type
14 examples, 10 attributes, 5 types.
What’s the type of this new animal?
K-Nearest Neighbor
• An arbitrary instance x is represented by (a1(x), a2(x), …, an(x)), where ai(x) denotes the i-th feature of x
• Euclidean distance between two instances:

    d(xi, xj) = sqrt( sum for r = 1 to n of (ar(xi) - ar(xj))^2 )

• Continuous-valued target function: predict the mean value of the k nearest training examples
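A minimal k-NN classifier following the distance formula above; the toy data and names are my own (not the slide's 14 animal examples):

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label); majority vote of k nearest."""
    nearest = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters.
train = [((1, 1), "bird"), ((1, 2), "bird"), ((2, 1), "bird"),
         ((8, 8), "fish"), ((8, 9), "fish"), ((9, 8), "fish")]
print(knn_classify(train, (2, 2), k=3))   # bird
print(knn_classify(train, (8, 7), k=1))   # fish
```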
Distance-Weighted Nearest Neighbor Algorithm
• Assign weights to the neighbors based on their ‘distance’ from the query point
• The weight ‘may’ be the inverse square of the distance
• All training points may influence a particular instance (Shepard’s method)
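One sketch of the distance-weighted (Shepard-style) variant for a real-valued target; the inverse-square weighting matches the slide, while the toy data and the `eps` exact-match cutoff are my own:

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def weighted_knn_predict(train, query, k=3, eps=1e-12):
    """train: list of (feature_vector, target_value).
    Predict a weighted mean of the k nearest targets, weight = 1/d^2."""
    nearest = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    num = den = 0.0
    for x, y in nearest:
        d = euclidean(x, query)
        if d < eps:          # exact match: return its target directly
            return y
        w = 1.0 / d ** 2
        num += w * y
        den += w
    return num / den

pts = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 4.0)]
print(weighted_knn_predict(pts, (1.0,), k=3))  # exact match -> 1.0
print(weighted_knn_predict(pts, (0.5,), k=2))  # equidistant pair -> 0.5
```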
Remarks
+ Highly effective inductive inference method for noisy training data and complex target functions
+ The target function for the whole space may be described as a combination of less complex local approximations
+ Learning is very simple
- Classification is time-consuming (except for 1-NN)