IN THE NAME OF ALLAH
DECISION MAKING BY USING
THE THEORY OF EVIDENCE
STUDENTS:
HOSSEIN SHIRZADEH, AHAD OLLAH EZZATI
SUPERVISOR:
Prof. BAGERI SHOURAKI
SPRING 2009
1
OUTLINES
INTRODUCTION
BELIEF
FRAMES OF DISCERNMENT
COMBINING THE EVIDENCE
ADVANTAGES OF DS THEORY
DISADVANTAGES OF DS THEORY
BASIC PROBABILITY ASSIGNMENT
BELIEF FUNCTIONS
DEMPSTER RULE OF COMBINATION
ZADEH’S OBJECTION TO DS THEORY
GENERALIZED DS THEORY
AN APPLICATION OF DECISION MAKING METHOD
2
INTRODUCTION
Introduced by Glenn Shafer in 1976
"A Mathematical Theory of Evidence"
A new approach to the representation of uncertainty
What does uncertainty mean? Most people don't like uncertainty
Applications
Expert systems
Decision making
Image processing, project planning, risk analysis, …
3
INTRODUCTION
All students of partial belief have tied it to Bayesian theory and either
I. Committed to the value of the idea and defended it, or
II. Rejected the theory (proof of inviability)
4
INTRODUCTION
BELIEF FUNCTION
• Θ : finite set
• Set of all subsets of Θ : 2^Θ
• Bel : 2^Θ → [0, 1]
(1) Bel(Φ) = 0
(2) Bel(Θ) = 1
(3) Bel(A₁ ∪ … ∪ Aₙ) ≥ Σᵢ Bel(Aᵢ) − Σ_{i<j} Bel(Aᵢ ∩ Aⱼ) + … + (−1)^{n+1} Bel(A₁ ∩ … ∩ Aₙ)
• Then Bel is called a belief function on Θ
5
INTRODUCTION
BELIEF FUNCTION
Bel : 2^Θ → [0, 1] is called a simple support function if
there exists a non-empty subset A of Θ and
0 ≤ s ≤ 1 such that
Bel(B) = 0 if B does not contain A
Bel(B) = s if B contains A but B ≠ Θ
Bel(B) = 1 if B = Θ
6
INTRODUCTION
THE IDEA OF CHANCE
For several centuries the idea of numerical
degree of belief has been identified with the
idea of chance.
Evidence theory is intelligible only if we reject this identification
Chance:
A random experiment : unknown outcome
The proportion of the time that a particular one of the possible outcomes tends to occur
7
INTRODUCTION
THE IDEA OF CHANCE
• Chance density
– Set of all possible outcomes : X
– Chance q(x) specified for each possible outcome
q : X → [0, 1]
– A chance density must satisfy:
Σ_{x ∈ X} q(x) = 1
8
INTRODUCTION
THE IDEA OF CHANCE
• Chance function
– Proportion of time that the actual outcome tends to be in a particular subset U of X:
Ch(U) = Σ_{x ∈ U} q(x)
– Ch is a chance function if and only if it obeys the following:
(1) Ch(Φ) = 0
(2) Ch(X) = 1
(3) if U, V ⊆ X and U ∩ V = Φ, then Ch(U ∪ V) = Ch(U) + Ch(V)
9
INTRODUCTION
CHANCES AS DEGREES OF BELIEF
• If we know the chances then we will
surely adopt them as our degrees of
belief
• We usually don’t know the chances
– We have little idea about what chance
density governs a random experiment
– Scientist is interested in a random
experiment precisely because it might
be governed by any one of several
chance densities
10
INTRODUCTION
CHANCES AS DEGREES OF BELIEF
• Chances:
– Features of the world
• This is the way Shafer addresses chance
– Features of our knowledge or belief
• Pierre-Simon Laplace
– Deterministic view
• Since the advent of quantum mechanics this view has lost its grip on physics
11
INTRODUCTION
BAYESIAN THEORY OF PARTIAL BELIEF
• Very Popular theory of partial belief
– Called Bayesian after Thomas Bayes
• Adapts the three basic rules for
chances as rules for one’s degrees of
belief based on a given body of
evidence.
• Conditioning : changing one’s degree of
belief when that evidence is
augmented by the knowledge of a
particular proposition
12
INTRODUCTION
BAYESIAN THEORY OF PARTIAL BELIEF
Bel : 2^Θ → [0, 1] obeys
(1) Bel(Φ) = 0
(2) Bel(Θ) = 1
(3) If A ∩ B = Φ, then Bel(A ∪ B) = Bel(A) + Bel(B)
When we learn that A is true, then
(4) Bel_A(B) = Bel(A ∩ B) / Bel(A)
13
INTRODUCTION
BAYESIAN THEORY OF PARTIAL BELIEF
• The Bayesian theory is contained in
Shafer’s evidence theory as a
restrictive special case.
• Why is Bayesian Theory too
restrictive?
– The representation of Ignorance
– Combining vs. Conditioning
14
INTRODUCTION
BAYESIAN THEORY OF PARTIAL BELIEF
THE REPRESENTATION OF IGNORANCE
In Evidence Theory
• Belief functions
– Little evidence:
• Both the proposition and its negation have very low degrees of belief
– Vacuous belief function:
Bel(A) = 0 if A ≠ Θ
Bel(A) = 1 if A = Θ
15
INTRODUCTION
BAYESIAN THEORY OF PARTIAL BELIEF
COMBINATION VS. CONDITIONING
• Dempster rule
– A method for changing prior opinion in the
light of new evidence
• Deals symmetrically with the new and
old evidence
• Bayesian Theory
– Bayes rule of conditioning
• No Obvious symmetry
• Must assume exact and full effect of the
new evidence is to establish a single
proposition with certainty
16
INTRODUCTION
BAYESIAN THEORY OF PARTIAL BELIEF
THE REPRESENTATION OF IGNORANCE
• In Bayesian Theory:
– Cannot distinguish between lack of belief and disbelief
– Bel(A) cannot be low unless Bel(Ā) is high
– Failure to believe A necessitates according the remaining belief to Ā
– Ignorance represented by:
Bel(A) = Bel(Ā) = 1/2
– Important factor in the decline of Bayesian ideas in the nineteenth century
– In DS theory:
Bel(A) + Bel(Ā) ≤ 1
17
BELIEF
The belief in a particular hypothesis is denoted
by a number between 0 and 1
The belief number indicates the degree to which
the evidence supports the hypothesis
Evidence against a particular hypothesis is
considered to be evidence for its negation (i.e., if
Θ = {θ1, θ2, θ3}, evidence against {θ1} is considered
to be evidence for {θ2, θ3}, and belief will be
allotted accordingly)
18
FRAMES OF DISCERNMENT
Dempster-Shafer theory assumes a fixed, exhaustive set of mutually exclusive events
Θ = {θ1, θ2, ..., θn}
Same assumption as probability theory
Dempster-Shafer theory is concerned with the set of all subsets of Θ, known as the Frame of Discernment
2^Θ = {Φ, {θ1}, …, {θn}, {θ1, θ2}, …, {θ1, θ2, ..., θn}}
Universe of mutually exclusive hypotheses
19
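To make the frame of discernment concrete, here is a minimal Python sketch (not part of the original slides; the three-element frame and the function name `power_set` are illustrative choices) that generates all 2^n subsets of a frame:

```python
from itertools import chain, combinations

def power_set(theta):
    """All 2^|theta| subsets of the frame of discernment, including the empty set."""
    return [frozenset(s) for s in
            chain.from_iterable(combinations(theta, r) for r in range(len(theta) + 1))]

frame = ["SIT", "STAND", "WALK"]  # illustrative 3-element frame
subsets = power_set(frame)
print(len(subsets))  # 8 = 2^3
```

Representing each subset as a `frozenset` lets it serve as a dictionary key, which is convenient for the mass functions used later.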
FRAMES OF DISCERNMENT
A subset {θ1, θ2, θ3} implicitly represents the proposition that one of θ1, θ2, or θ3 is the case
The complete set Θ represents the proposition that one of the exhaustive set of events is true
So Θ is always true
The empty set Φ represents the proposition that none of the exhaustive set of events is true
So Φ is always false
20
COMBINING THE EVIDENCE
Dempster-Shafer theory, as a theory of evidence, has to account for the combination of different sources of evidence
Dempster & Shafer's rule of combination is an essential step in providing such a theory
This rule is an intuitive axiom that can best be seen as a heuristic rule rather than a well-grounded axiom
21
ADVANTAGES OF DS THEORY
The difficult problem of specifying priors can be avoided
In addition to uncertainty, ignorance can also be expressed
It is straightforward to express pieces of evidence with different levels of abstraction
Dempster's combination rule can be used to combine pieces of evidence
22
DISADVANTAGES
Potential computational complexity problems
It lacks a well-established decision theory, whereas Bayesian decision theory (maximizing expected utility) is almost universally accepted
Experimental comparisons between DS theory and probability theory are seldom done and rather difficult to do; no clear advantage of DS theory has been shown
23
BASIC PROBABILITY ASSIGNMENT
The basic probability assignment (BPA), represented as m, assigns a belief number in [0, 1] to every member of 2^Θ such that the numbers sum to 1
(1) m(Φ) = 0
(2) Σ_{A ⊆ Θ} m(A) = 1
m(A) represents the measure of the belief that is committed exactly to A (to A itself and to no smaller subset)
24
BASIC PROBABILITY ASSIGNMENT
EXAMPLE
Suppose Θ = {Blue, Black, Yellow, Other}
Diagnostic problem
No information:
m(Θ) = 1
60 of 100 are blue:
m({Blue}) = 0.6
m(Θ) = 0.4
30 of 100 are blue and the rest are black or yellow:
m({Blue}) = 0.3
m({Black, Yellow}) = 0.7
m(Θ) = 0
25
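The two BPA conditions can be checked mechanically. A small Python sketch (not from the slides; the dict-of-frozensets representation and `is_valid_bpa` are our own choices), using the "60 of 100 are blue" case above:

```python
def is_valid_bpa(m, tol=1e-9):
    """A BPA must give the empty set zero mass, and its masses must sum to 1."""
    empty_ok = m.get(frozenset(), 0.0) == 0.0
    sum_ok = abs(sum(m.values()) - 1.0) < tol
    return empty_ok and sum_ok

theta = frozenset({"Blue", "Black", "Yellow", "Other"})
# "60 of 100 are blue": mass 0.6 on {Blue}, the remaining 0.4 on the whole frame
m = {frozenset({"Blue"}): 0.6, theta: 0.4}
print(is_valid_bpa(m))  # True
```

Note that the leftover 0.4 goes to the whole frame Θ, not to the negation of {Blue}; this is exactly how the BPA expresses ignorance rather than disbelief.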
BELIEF FUNCTIONS
Obtaining the measure of the total belief
committed to A:
Bel(A) = Σ_{B ⊆ A} m(B)
Belief functions can be characterized without reference to basic probability assignments:
1. Bel(Φ) = 0
2. Bel(Θ) = 1
3. Bel(A₁ ∪ … ∪ Aₙ) ≥ Σ_{I ⊆ {1, …, n}, I ≠ Φ} (−1)^{|I|+1} Bel(∩_{i ∈ I} Aᵢ)
26
BELIEF FUNCTIONS
For two subsets A and B:
Bel(A ∪ B) ≥ Bel(A) + Bel(B) − Bel(A ∩ B)
The BPA is unique and can be recovered from the belief function:
m(A) = Σ_{B ⊆ A} (−1)^{|A − B|} Bel(B)
27
BELIEF FUNCTIONS
Focal element
A subset A is a focal element if m(A) > 0
Core
The union of all the focal elements
Theorem
For B ⊆ Θ: Bel(B) = 1 ⟺ Core ⊆ B
28
BELIEF FUNCTIONS
BELIEF INTERVALS
Ignorance in DS Theory:
Bel(A) + Bel(Ā) ≤ 1
The belief interval:
[Bel(A), 1 − Bel(Ā)]
The width of the belief interval is the sum of the belief committed to subsets that intersect A but are not subsets of A
The width of the interval therefore represents the amount of uncertainty about A, given the evidence
29
BELIEF FUNCTIONS
DEGREES OF DOUBT AND UPPER PROBABILITIES
One's beliefs about a proposition A are not fully described by one's degree of belief Bel(A)
Bel(A) does not reveal to what extent one doubts A
Degree of doubt: Dou(A) = Bel(Ā)
Upper probability: P*(A) = 1 − Dou(A) = 1 − Bel(Ā)
P*(A) = Σ_{B ⊆ Θ} m(B) − Σ_{B ⊆ Ā} m(B) = Σ_{B ∩ A ≠ Φ} m(B)
The total probability mass that can move into A
30
BELIEF FUNCTIONS
DEGREES OF DOUBT AND UPPER PROBABILITIES
EXAMPLE
Subset     m     Bel   Dou   P*
{}         0     0     1     0
{1}        0.1   0.1   0.5   0.5
{2}        0.2   0.2   0.4   0.6
{3}        0.1   0.1   0.4   0.6
{1, 2}     0.1   0.4   0.1   0.9
{1, 3}     0.2   0.4   0.2   0.8
{2, 3}     0.2   0.5   0.1   0.9
{1, 2, 3}  0.1   1     0     1

m({1, 2}) = − Bel({1}) − Bel({2}) + Bel({1, 2})
= − 0.1 − 0.2 + 0.4 = 0.1
31
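The table values follow directly from the definitions of Bel, Dou, and P*. A Python sketch (the function names are our own) reproducing the {1, 2} row from the mass assignment above:

```python
def bel(m, a):
    """Bel(A): total mass committed to subsets of A."""
    return sum(v for b, v in m.items() if b <= a)

def dou(m, a, frame):
    """Dou(A) = Bel(complement of A)."""
    return bel(m, frame - a)

def upper(m, a, frame):
    """P*(A) = 1 - Dou(A): total mass that can move into A."""
    return 1.0 - dou(m, a, frame)

frame = frozenset({1, 2, 3})
m = {frozenset({1}): 0.1, frozenset({2}): 0.2, frozenset({3}): 0.1,
     frozenset({1, 2}): 0.1, frozenset({1, 3}): 0.2, frozenset({2, 3}): 0.2,
     frame: 0.1}
a = frozenset({1, 2})
print(round(bel(m, a), 2), round(dou(m, a, frame), 2), round(upper(m, a, frame), 2))
# 0.4 0.1 0.9, matching the {1, 2} row of the table
```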
BELIEF FUNCTIONS
BAYESIAN BELIEF FUNCTIONS
A belief function Bel is called Bayesian if Bel is a
probability function.
The following conditions are equivalent
Bel is Bayesian
All the focal elements of Bel are singletons
For every A ⊆ Θ, Bel(A) + Bel(Ā) = 1
The inner measure can be characterized by the
condition that the focal elements are pairwise
disjoint.
32
BELIEF FUNCTIONS
BAYESIAN BELIEF FUNCTIONS
EXAMPLE
Suppose Θ = {a, b, c}

Subset     BPA   Belief
Φ          0     0
{a}        m1    m1
{b}        m2    m2
{c}        m3    m3
{a, b}     0     m1 + m2
{a, c}     0     m1 + m3
{b, c}     0     m2 + m3
{a, b, c}  0     m1 + m2 + m3 = 1
33
DEMPSTER RULE OF COMBINATION
Belief functions are well adapted to the representation of evidence because they admit a genuine rule of combination.
Several belief functions
Based on distinct bodies of evidence
Computing their “Orthogonal sum” using Dempster’s
rule
34
DEMPSTER RULE OF COMBINATION
COMBINING TWO BELIEF FUNCTIONS
m1: basic probability assignment for Bel1
A1,A2,…Ak : Bel1’s focal elements
m2: basic probability assignment for Bel2
B1,B2,…Bl : Bel2’s focal elements
35
DEMPSTER RULE OF COMBINATION
COMBINING TWO BELIEF FUNCTIONS
The probability mass of measure m1(Ai)·m2(Bj) is committed to Ai ∩ Bj
36
DEMPSTER RULE OF COMBINATION
COMBINING TWO BELIEF FUNCTIONS
The intersection of the two strips m1(Ai) and m2(Bj) has measure m1(Ai)·m2(Bj); since it is committed both to Ai and to Bj, we say that the joint effect of Bel1 and Bel2 is to commit it exactly to Ai ∩ Bj
The total probability mass exactly committed to A:
m(A) = Σ_{i,j : Ai ∩ Bj = A} m1(Ai)·m2(Bj)
37
DEMPSTER RULE OF COMBINATION
COMBINING TWO BELIEF FUNCTIONS
EXAMPLE
                  m1({1}) = 0.3   m1({2}) = 0.3   m1({1, 2}) = 0.4
m2({1}) = 0.2     {1}, 0.06       Φ, 0.06         {1}, 0.08
m2({2}) = 0.3     Φ, 0.09         {2}, 0.09       {2}, 0.12
m2({1, 2}) = 0.5  {1}, 0.15       {2}, 0.15       {1, 2}, 0.2

Subset  Φ     {1}   {2}   {1, 2}
mc      0.15  0.29  0.36  0.2
38
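The intersection tableau above can be computed by looping over all pairs of focal elements. A Python sketch of the unnormalized sum (the representation and the name `combine_unnormalized` are our own), reproducing the mc row:

```python
from collections import defaultdict

def combine_unnormalized(m1, m2):
    """Intersect every pair of focal elements and accumulate the product masses;
    mass landing on the empty set is kept (not yet renormalized)."""
    mc = defaultdict(float)
    for a, v1 in m1.items():
        for b, v2 in m2.items():
            mc[a & b] += v1 * v2
    return dict(mc)

m1 = {frozenset({1}): 0.3, frozenset({2}): 0.3, frozenset({1, 2}): 0.4}
m2 = {frozenset({1}): 0.2, frozenset({2}): 0.3, frozenset({1, 2}): 0.5}
mc = combine_unnormalized(m1, m2)
print(round(mc[frozenset()], 2), round(mc[frozenset({1})], 2))  # 0.15 0.29
```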
DEMPSTER RULE OF COMBINATION
COMBINING TWO BELIEF FUNCTIONS
The only difficulty:
Some of the squares may be committed to the empty set
If Ai and Bj are focal elements of Bel1 and Bel2 with Ai ∩ Bj = Φ, then
Σ_{i,j : Ai ∩ Bj = Φ} m1(Ai)·m2(Bj) > 0
The only remedy:
Discard all the rectangles committed to the empty set
Inflate the remaining rectangles by multiplying them by
(1 − Σ_{i,j : Ai ∩ Bj = Φ} m1(Ai)·m2(Bj))⁻¹
39
DEMPSTER RULE OF COMBINATION
THE WEIGHT OF CONFLICT
The renormalizing factor measures the extent of conflict between the two belief functions.
Every rectangle committed to Φ corresponds to an instance in which Bel1 and Bel2 commit probability to disjoint subsets Ai and Bj
k = Σ_{i,j : Ai ∩ Bj = Φ} m1(Ai)·m2(Bj)
K = 1 / (1 − k)
Weight of conflict: Con(Bel1, Bel2) = log(K) = log(1 / (1 − k)) = −log(1 − k)
40
DEMPSTER RULE OF COMBINATION
THE WEIGHT OF CONFLICT (CONT.)
Bel1, Bel2 do not conflict at all:
k = 0, Con(Bel1, Bel2) = 0
Bel1, Bel2 flatly contradict each other:
Bel1 ⊕ Bel2 does not exist
k = 1, Con(Bel1, Bel2) = ∞
In the previous example k = 0.15
41
DEMPSTER’S RULE OF COMBINATION
Suppose m1 and m2 are basic probability assignments over Θ. Then m1 ⊕ m2 is given by
m1 ⊕ m2 (A) = ( Σ_{B ∩ C = A} m1(B)·m2(C) ) / (1 − k) for A ≠ Φ, and m1 ⊕ m2 (Φ) = 0
In the previous example:

Subset    Φ   {1}     {2}     {1, 2}
m1 ⊕ m2   0   0.3412  0.4235  0.2353
42
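Putting the renormalization together with the intersection tableau gives the full rule. A Python sketch (the helper name `dempster_combine` is our own) reproducing the 0.3412 / 0.4235 / 0.2353 values:

```python
from collections import defaultdict

def dempster_combine(m1, m2):
    """Orthogonal sum m1 (+) m2: intersect focal elements, discard the mass
    that lands on the empty set, and renormalize the rest by 1 - k."""
    mc = defaultdict(float)
    for a, v1 in m1.items():
        for b, v2 in m2.items():
            mc[a & b] += v1 * v2
    k = mc.pop(frozenset(), 0.0)  # conflict mass
    if k >= 1.0:
        raise ValueError("total conflict: the orthogonal sum does not exist")
    return {a: v / (1.0 - k) for a, v in mc.items()}

m1 = {frozenset({1}): 0.3, frozenset({2}): 0.3, frozenset({1, 2}): 0.4}
m2 = {frozenset({1}): 0.2, frozenset({2}): 0.3, frozenset({1, 2}): 0.5}
m = dempster_combine(m1, m2)
print(round(m[frozenset({1})], 4), round(m[frozenset({2})], 4))  # 0.3412 0.4235
```

The `ValueError` branch corresponds to the k = 1 case of the previous slide, where the orthogonal sum does not exist.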
DEMPSTER RULE OF COMBINATION
AN APPLICATION OF DS THEORY
Frame of Discernment: A set of mutually exclusive alternatives:
{SIT, STAND, WALK}
All subsets of the FoD form:
2^Θ = {Φ, {SIT}, {STAND}, {WALK}, {SIT, STAND}, {SIT, WALK}, {STAND, WALK}, {SIT, STAND, WALK}}
43
DEMPSTER RULE OF COMBINATION
AN APPLICATION OF DS THEORY
The exercise deploys two feature-based bodies of evidence, m1 and m2
m1 is based on MEAN features from Sensor1
m1 provides evidence for {SIT} and {¬SIT}
({¬SIT} = {STAND, WALK})
m2 is based on VARIANCE features from Sensor1
m2 provides evidence for {WALK} and {¬WALK}
({¬WALK} = {SIT, STAND})
44
DEMPSTER RULE OF COMBINATION
AN APPLICATION OF DS THEORY
Pl(A) = Pls(A) = P*(A) = 1 − Bel(Ā) = Σ_{B ∩ A ≠ Φ} m(B)
45
DEMPSTER RULE OF COMBINATION
AN APPLICATION OF DS THEORY
CALCULATION OF EVIDENCE M1
Evidence: concrete value z1(t), z1 = mean(S1)
Bel(SIT) = 0.2
Pls(SIT) = 1 − Bel(¬SIT) = 0.5
m1(SIT, ¬SIT, ALL) = (0.2, 0.5, 0.3)
46
DEMPSTER RULE OF COMBINATION
AN APPLICATION OF DS THEORY
CALCULATION OF EVIDENCE M2
Evidence: concrete value z2(t), z2 = variance(S1)
Bel(WALK) = 0.4
Pls(WALK) = 1 − Bel(¬WALK) = 0.5
m2(WALK, ¬WALK, ALL) = (0.4, 0.5, 0.1)
47
DEMPSTER RULE OF COMBINATION
AN APPLICATION OF DS THEORY
DS THEORY COMBINATION
Applying Dempster's combination rule: m = m1 ⊕ m2

                                     m1(SIT) = 0.2   m1(¬SIT) = m1({STAND, WALK}) = 0.5   m1(ALL) = 0.3
m2(WALK) = 0.4                       m({}) = 0.08    m(WALK) = 0.2                        m(WALK) = 0.12
m2(¬WALK) = m2({STAND, SIT}) = 0.5   m(SIT) = 0.1    m(STAND) = 0.25                      m({STAND, SIT}) = 0.15
m2(ALL) = 0.1                        m(SIT) = 0.02   m({STAND, WALK}) = 0.05              m(ALL) = 0.03

Due to m({}) > 0: normalization with 0.92 (= 1 − 0.08)
48
DEMPSTER RULE OF COMBINATION
AN APPLICATION OF DS THEORY
NORMALIZED VALUES
                                     m1(SIT) = 0.2    m1(¬SIT) = m1({STAND, WALK}) = 0.5   m1(ALL) = 0.3
m2(WALK) = 0.4                       0                m(WALK) = 0.217                      m(WALK) = 0.13
m2(¬WALK) = m2({STAND, SIT}) = 0.5   m(SIT) = 0.108   m(STAND) = 0.272                     m({STAND, SIT}) = 0.163
m2(ALL) = 0.1                        m(SIT) = 0.022   m({STAND, WALK}) = 0.054             m(ALL) = 0.033

Belief(STAND) = 0.272
Plausibility(STAND) = 1 − (0.108 + 0.022 + 0.217 + 0.13) = 0.523
49
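The normalized table can be reproduced end to end. A Python sketch of the SIT/STAND/WALK combination (helper name and representation are ours); note the exact Plausibility(STAND) is 0.48/0.92 ≈ 0.522, while the slide's 0.523 comes from summing already-rounded cells:

```python
from collections import defaultdict

def dempster_combine(m1, m2):
    """Orthogonal sum with renormalization by 1 - k."""
    mc = defaultdict(float)
    for a, v1 in m1.items():
        for b, v2 in m2.items():
            mc[a & b] += v1 * v2
    k = mc.pop(frozenset(), 0.0)
    return {a: v / (1.0 - k) for a, v in mc.items()}

ALL = frozenset({"SIT", "STAND", "WALK"})
m1 = {frozenset({"SIT"}): 0.2, frozenset({"STAND", "WALK"}): 0.5, ALL: 0.3}
m2 = {frozenset({"WALK"}): 0.4, frozenset({"STAND", "SIT"}): 0.5, ALL: 0.1}
m = dempster_combine(m1, m2)

stand = frozenset({"STAND"})
bel_stand = sum(v for b, v in m.items() if b <= stand)      # subsets of {STAND}
pls_stand = sum(v for b, v in m.items() if b & stand)       # subsets intersecting {STAND}
print(round(bel_stand, 3), round(pls_stand, 3))  # 0.272 0.522
```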
DEMPSTER RULE OF COMBINATION
AN APPLICATION OF DS THEORY
BELIEF AND PLAUSIBILITY (SIT)
Ground Truth: 1: Sitting; 2: Standing; 3: Walking
50
DEMPSTER RULE OF COMBINATION
AN APPLICATION OF DS THEORY
DS CLASSIFICATION
51
DEMPSTER RULE OF COMBINATION
PROBLEMS IN COMBINING EVIDENCE
Unfortunately, the above approach doesn't work
It satisfies the second assumption about mass
assignments, that the masses add to 1
But it usually conflicts with the first assumption, that
the mass of the empty set is zero
Why?
Because some subsets X and Y don't intersect, so their
intersection is the empty set
So when we apply the formula, we end up with non-zero
mass assigned to the empty set
We can't arbitrarily assign m1 ⊕ m2 (Φ) = 0, because the sum of m1 ⊕ m2 will then no longer be 1
52
ZADEH’S OBJECTION TO DS THEORY
Suppose two doctors A and B have the following
beliefs about a patient's illness:
mA(meningitis) = 0.99
mA(concussion) = 0.00
mA(brain tumor) = 0.01
mB(meningitis) = 0.00
mB(concussion) = 0.99
mB(brain tumor) = 0.01
then
k = mA(meningitis) * mB(concussion)
+ mA(meningitis) * mB(brain tumor)
+ mA(brain tumor) * mB(concussion)
= 0.9999
so
mA ⊕ mB (brain tumor) = (0.01 × 0.01) / (1 − 0.9999) = 1
53
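Zadeh's example can be run directly through the rule: nearly all mass is conflict (k = 0.9999), and renormalization hands the tiny shared remainder, brain tumor, all the belief. A Python sketch (helper name is ours):

```python
from collections import defaultdict

def dempster_combine(m1, m2):
    """Orthogonal sum with renormalization by 1 - k."""
    mc = defaultdict(float)
    for a, v1 in m1.items():
        for b, v2 in m2.items():
            mc[a & b] += v1 * v2
    k = mc.pop(frozenset(), 0.0)
    return {a: v / (1.0 - k) for a, v in mc.items()}

mA = {frozenset({"meningitis"}): 0.99, frozenset({"brain tumor"}): 0.01}
mB = {frozenset({"concussion"}): 0.99, frozenset({"brain tumor"}): 0.01}
m = dempster_combine(mA, mB)
# Full belief lands on the diagnosis both doctors considered nearly impossible
print(round(m[frozenset({"brain tumor"})], 6))  # 1.0
```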
GENERALIZED DS THEORY
Body of evidence:
Consider Ω = {w1, w2, ..., wn}
⟨{A1, A2, …, An}, {m1, m2, …, mn}⟩, Φ ≠ Ai ⊆ Ω
Fuzzy body of evidence:
Yen's generalization
54
EXAMPLE
Consider body of evidence in DS theory over
Ω = {1, 2, …, 10} with focal elements:
We want to compute Bel(B) and Pls(B) where
55
EXAMPLE
According to Yen's generalization, replace A and C with their α-cuts, then distribute their BPAs among the α-cuts
Computing belief and plausibility
56
AN APPLICATION OF DECISION MAKING
METHOD BASED ON FUZZIFIED DS THEORY
A CASE STUDY IN MEDICINE
Consider these rules:
If “A change in breast skin” Then status is “malignant”
If “No change in breast skin” Then status is “unknown”
If “Adenoma dwindles” Then status is “benign”
If “Adenoma does not dwindle” Then status is “unknown”
Suppose we have the following probabilities:
P(“A change in breast skin” ) = 0.7
P(“No change in breast skin” ) = 0.3
P(“Adenoma dwindles” ) = 0.4
P(“Adenoma does not dwindle” ) = 0.6
57
AN APPLICATION OF DECISION MAKING
METHOD BASED ON FUZZIFIED DS THEORY
Bodies of evidence:
{malignant, [total range]}
m1(malignant) = 0.7, m1([total range]) = 0.3
{benign, [total range]}
m2(benign) = 0.4, m2([total range]) = 0.6
Combining the bodies of evidence:
m12(benign) = 0.1476
m12(malignant) = 0.5164
m12(benign ∩ malignant) = 0.1147
m12([total range]) = 0.2213
58
AN APPLICATION OF DECISION MAKING METHOD
BASED ON FUZZIFIED DS THEORY
Definition
Fuzzy Valued Bel and Pls Functions
59
AN APPLICATION OF DECISION MAKING METHOD
BASED ON FUZZIFIED DS THEORY
Pr(benign)
Pr(malignant)
60
AN APPLICATION OF DECISION MAKING
METHOD BASED ON FUZZIFIED DS THEORY
Calculating risk functions based on the following equation
61
AN APPLICATION OF DECISION MAKING METHOD
BASED ON FUZZIFIED DS THEORY
a. Fuzzy set of risk function values for benign
prediction
b. Fuzzy set of risk function values for malignant
prediction
62
AN APPLICATION OF DECISION MAKING METHOD
BASED ON FUZZIFIED DS THEORY
The final step
Resolving the uncertainties (fuzziness and ignorance)
to obtain a scalar value
These answers are calculated by
63
THANKS
64