IN THE NAME OF ALLAH
DECISION MAKING BY USING
THE THEORY OF EVIDENCE
STUDENTS:
HOSSEIN SHIRZADEH, AHAD OLLAH EZZATI
SUPERVISOR:
Prof. BAGERI SHOURAKI
SPRING 2009
OUTLINE
• INTRODUCTION
• BELIEF
• FRAMES OF DISCERNMENT
• COMBINING THE EVIDENCE
• ADVANTAGES OF DS THEORY
• DISADVANTAGES OF DS THEORY
• BASIC PROBABILITY ASSIGNMENT
• BELIEF FUNCTIONS
• DEMPSTER'S RULE OF COMBINATION
• ZADEH'S OBJECTION TO DS THEORY
• GENERALIZED DS THEORY
• AN APPLICATION OF THE DECISION MAKING METHOD
INTRODUCTION
• Introduced by Glenn Shafer in 1976 in "A Mathematical Theory of Evidence"
• A new approach to the representation of uncertainty
• What does uncertainty mean? Most people don't like uncertainty
• Applications:
  • Expert systems
  • Decision making
  • Image processing, project planning, risk analysis, …
INTRODUCTION
• All students of partial belief have tied it to the Bayesian theory and have either:
  I. Committed themselves to the value of the idea and defended it, or
  II. Rejected the theory (proof of its inviability)
INTRODUCTION
BELIEF FUNCTION
• Θ: a finite set
• Set of all subsets: 2^Θ
• Bel: 2^Θ → [0, 1] satisfying:
  (1) Bel(∅) = 0
  (2) Bel(Θ) = 1
  (3) Bel(A₁ ∪ … ∪ Aₙ) ≥ Σ_{∅≠I⊆{1,…,n}} (−1)^(|I|+1) Bel(∩_{i∈I} Aᵢ)
• Then Bel is called a belief function on Θ
INTRODUCTION
BELIEF FUNCTION
• Bel: 2^Θ → [0, 1] is called a simple support function if there exist a non-empty subset A of Θ and 0 ≤ s ≤ 1 such that:
  Bel(B) = 0 if B does not contain A
  Bel(B) = s if B contains A but B ≠ Θ
  Bel(B) = 1 if B = Θ
INTRODUCTION
THE IDEA OF CHANCE
• For several centuries the idea of a numerical degree of belief has been identified with the idea of chance
• Evidence theory is intelligible only if we reject this unification
• Chance:
  • A random experiment: unknown outcome
  • The proportion of the time that a particular one of the possible outcomes tends to occur
INTRODUCTION
THE IDEA OF CHANCE
• Chance density
  – Set of all possible outcomes: X
  – A chance q(x) is specified for each possible outcome:
    q: X → [0, 1]
  – A chance density must satisfy:
    Σ_{x∈X} q(x) = 1
INTRODUCTION
THE IDEA OF CHANCE
• Chance function
  – The proportion of time that the actual outcome tends to be in a particular subset U of X:
    Ch(U) = Σ_{x∈U} q(x)
  – Ch is a chance function if and only if it obeys the following:
    (1) Ch(∅) = 0
    (2) Ch(X) = 1
    (3) if U, V ⊆ X and U ∩ V = ∅, then Ch(U ∪ V) = Ch(U) + Ch(V)
INTRODUCTION
CHANCES AS DEGREES OF BELIEF
• If we knew the chances, we would surely adopt them as our degrees of belief
• We usually don't know the chances
  – We have little idea which chance density governs a random experiment
  – A scientist is interested in a random experiment precisely because it might be governed by any one of several chance densities
INTRODUCTION
CHANCES AS DEGREES OF BELIEF
• Chances as:
  – Features of the world
    • This is the way Shafer addresses chance
  – Features of our knowledge or belief
    • Pierre-Simon Laplace: a deterministic view
    • Since the advent of quantum mechanics this view has lost its grip on physics
INTRODUCTION
BAYESIAN THEORY OF PARTIAL BELIEF
• A very popular theory of partial belief
  – Called Bayesian after Thomas Bayes
• Adapts the three basic rules for chances as rules for one's degrees of belief based on a given body of evidence
• Conditioning: changing one's degrees of belief when that evidence is augmented by the knowledge of a particular proposition
INTRODUCTION
BAYESIAN THEORY OF PARTIAL BELIEF
• Bel: 2^Θ → [0, 1] obeys:
  (1) Bel(∅) = 0
  (2) Bel(Θ) = 1
  (3) If A ∩ B = ∅, then Bel(A ∪ B) = Bel(A) + Bel(B)
  (4) When we learn that A ⊆ Θ is true:
      Bel_A(B) = Bel(A ∩ B) / Bel(A)
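Rule (4) is ordinary Bayesian conditioning. A minimal Python sketch over a finite frame (the dictionary representation and the weather outcomes are illustrative assumptions of mine, not from the slides):

```python
def bel(p, a):
    """Bel(A) for a Bayesian (additive) belief function given point masses p."""
    return sum(p[x] for x in a)

def condition(p, a):
    """Rule (4): Bel_A(B) = Bel(A ∩ B) / Bel(A) -- ordinary conditioning."""
    norm = bel(p, a)
    if norm == 0:
        raise ValueError("cannot condition on a zero-probability event")
    return {x: (p[x] / norm if x in a else 0.0) for x in p}

p = {"rain": 0.5, "snow": 0.3, "sun": 0.2}
p_given = condition(p, {"rain", "snow"})      # learn that it is not sunny
print(round(p_given["rain"], 3))              # 0.625
print(round(bel(p_given, {"rain", "snow"}), 3))  # 1.0
```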
INTRODUCTION
BAYESIAN THEORY OF PARTIAL BELIEF
• The Bayesian theory is contained in Shafer's evidence theory as a restrictive special case
• Why is the Bayesian theory too restrictive?
  – The representation of ignorance
  – Combining vs. conditioning
INTRODUCTION
BAYESIAN THEORY OF PARTIAL BELIEF
THE REPRESENTATION OF IGNORANCE
• In evidence theory
  • Belief functions
    – Little evidence: both the proposition and its negation have very low degrees of belief
  • Vacuous belief function:
    Bel(A) = 0 if A ≠ Θ
    Bel(A) = 1 if A = Θ
INTRODUCTION
BAYESIAN THEORY OF PARTIAL BELIEF
COMBINATION VS. CONDITIONING
• Dempster's rule
  – A method for changing prior opinion in the light of new evidence
  – Deals symmetrically with the new and the old evidence
• Bayesian theory
  – Bayes' rule of conditioning
  – No obvious symmetry
  – Must assume the exact and full effect of the new evidence is to establish a single proposition with certainty
INTRODUCTION
BAYESIAN THEORY OF PARTIAL BELIEF
THE REPRESENTATION OF IGNORANCE
• In the Bayesian theory:
  – Cannot distinguish between lack of belief and disbelief
  – Bel(A) cannot be low unless Bel(¬A) is high
  – Failure to believe A necessitates according belief to ¬A
  – Ignorance is represented by:
    Bel(A) = Bel(¬A) = 1/2
  – An important factor in the decline of Bayesian ideas in the nineteenth century
• In DS theory:
    Bel(A) + Bel(¬A) ≤ 1
BELIEF
• The belief in a particular hypothesis is denoted by a number between 0 and 1
• The belief number indicates the degree to which the evidence supports the hypothesis
• Evidence against a particular hypothesis is considered to be evidence for its negation (i.e., if Θ = {θ1, θ2, θ3}, evidence against {θ1} is considered to be evidence for {θ2, θ3}, and belief will be allotted accordingly)
FRAMES OF DISCERNMENT
• Dempster-Shafer theory assumes a fixed, exhaustive set of mutually exclusive events:
  Θ = {θ1, θ2, ..., θn}
  • The same assumption as probability theory
• Dempster-Shafer theory is concerned with the set of all subsets of Θ, known as the frame of discernment:
  2^Θ = {∅, {θ1}, …, {θn}, {θ1, θ2}, …, {θ1, θ2, ..., θn}}
  • A universe of mutually exclusive hypotheses
FRAMES OF DISCERNMENT
A
subset {θ1, θ2, θ3} implicitly represents
the proposition that one of θ1, θ2 or θn is the
case
 The
complete set Θ represents the
proposition that one of the exhaustive set of
events is true

So Θ is always true
empty set  represents the proposition
that none of the exhaustive set of events is
true
 The

So  always false
20
COMBINING THE EVIDENCE
• Dempster-Shafer theory as a theory of evidence has to account for the combination of different sources of evidence
• Dempster and Shafer's rule of combination is an essential step in providing such a theory
• This rule is an intuitive axiom that can best be seen as a heuristic rule rather than a well-grounded axiom
ADVANTAGES OF DS THEORY
• The difficult problem of specifying priors can be avoided
• In addition to uncertainty, ignorance can also be expressed
• It is straightforward to express pieces of evidence with different levels of abstraction
• Dempster's combination rule can be used to combine pieces of evidence
DISADVANTAGES
• Potential computational complexity problems
• It lacks a well-established decision theory, whereas Bayesian decision theory (maximizing expected utility) is almost universally accepted
• Experimental comparisons between DS theory and probability theory are seldom done and rather difficult to do; no clear advantage of DS theory has been shown
BASIC PROBABILITY ASSIGNMENT
• The basic probability assignment (BPA), represented as m, assigns a belief number in [0, 1] to every member of 2^Θ such that the numbers sum to 1:
  (1) m(∅) = 0
  (2) Σ_{A⊆Θ} m(A) = 1
• m(A) represents the measure of the belief that is committed exactly to A (to the set A itself and to no smaller subset)
BASIC PROBABILITY ASSIGNMENT
EXAMPLE
• Suppose Θ = {Blue, Black, Yellow, Other} (a diagnostic problem)
• No information:
  m(Θ) = 1
• 60 of 100 are blue:
  m({Blue}) = 0.6
  m(Θ) = 0.4
• 30 of 100 are blue and the rest are black or yellow:
  m({Blue}) = 0.3
  m({Black, Yellow}) = 0.7
  m(Θ) = 0
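A BPA over a small frame is just a map from subsets to masses. This minimal Python sketch (the frozenset representation is an assumption of mine, not from the slides) checks the two BPA conditions for the last case above:

```python
def is_bpa(m, theta):
    """Check the BPA axioms: m(emptyset) = 0, every focal element is a
    subset of the frame, and the masses sum to 1."""
    return (m.get(frozenset(), 0.0) == 0.0
            and all(a <= theta for a in m)
            and abs(sum(m.values()) - 1.0) < 1e-9)

theta = frozenset({"Blue", "Black", "Yellow", "Other"})
# "30 of 100 are blue and the rest are black or yellow"
m = {frozenset({"Blue"}): 0.3, frozenset({"Black", "Yellow"}): 0.7}
print(is_bpa(m, theta))  # True
```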
BELIEF FUNCTIONS
• Obtaining the measure of the total belief committed to A:
  Bel(A) = Σ_{B⊆A} m(B)
• Belief functions can be characterized without reference to basic probability assignments:
  (1) Bel(∅) = 0
  (2) Bel(Θ) = 1
  (3) Bel(A₁ ∪ … ∪ Aₙ) ≥ Σ_{∅≠I⊆{1,…,n}} (−1)^(|I|+1) Bel(∩_{i∈I} Aᵢ)
BELIEF FUNCTIONS
• For two subsets A and B:
  Bel(A ∪ B) ≥ Bel(A) + Bel(B) − Bel(A ∩ B)
• The BPA is unique and can be recovered from the belief function:
  m(A) = Σ_{B⊆A} (−1)^(|A−B|) Bel(B)
BELIEF FUNCTIONS
• Focal element
  • A subset A ⊆ Θ is a focal element if m(A) > 0
• Core
  • The union of all the focal elements
• Theorem
  • For B ⊆ Θ: Bel(B) = 1 if and only if Core ⊆ B
BELIEF FUNCTIONS
BELIEF INTERVALS
• Ignorance in DS theory:
  Bel(A) + Bel(¬A) ≤ 1
• The belief interval:
  [Bel(A), 1 − Bel(¬A)]
• The width of the belief interval is the sum of the belief committed to elements that intersect A but are not subsets of A
• The width of the interval therefore represents the amount of uncertainty in A, given the evidence
BELIEF FUNCTIONS
DEGREES OF DOUBT AND UPPER PROBABILITIES
• One's beliefs about a proposition A are not fully described by one's degree of belief Bel(A)
• Bel(A) does not reveal to what extent one doubts A
• Degree of doubt:
  Dou(A) = Bel(A^c)
• Upper probability:
  P*(A) = 1 − Dou(A)
  P*(A) = 1 − Bel(A^c) = Σ_{B⊆Θ} m(B) − Σ_{B⊆A^c} m(B) = Σ_{B∩A≠∅} m(B)
• P*(A) is the total probability mass that can move into A
BELIEF FUNCTIONS
DEGREES OF DOUBT AND UPPER PROBABILITIES
EXAMPLE

Subset      m     Bel   Dou   P*
{}          0     0     1     0
{1}         0.1   0.1   0.5   0.5
{2}         0.2   0.2   0.4   0.6
{3}         0.1   0.1   0.4   0.6
{1, 2}      0.1   0.4   0.1   0.9
{1, 3}      0.2   0.4   0.2   0.8
{2, 3}      0.2   0.5   0.1   0.9
{1, 2, 3}   0.1   1     0     1

m({1, 2}) = − Bel({1}) − Bel({2}) + Bel({1, 2}) = − 0.1 − 0.2 + 0.4 = 0.1
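Every column of the table follows mechanically from m. A short Python sketch (the frozenset-keyed dictionary is my own representation, not from the slides) recomputes Bel, Dou and P* for each subset:

```python
from itertools import chain, combinations

def subsets(theta):
    """All subsets of the frame, smallest first."""
    s = sorted(theta)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def bel(m, a):
    """Bel(A): total mass of focal elements contained in A."""
    return sum(v for b, v in m.items() if b <= a)

theta = frozenset({1, 2, 3})
m = {frozenset({1}): 0.1, frozenset({2}): 0.2, frozenset({3}): 0.1,
     frozenset({1, 2}): 0.1, frozenset({1, 3}): 0.2,
     frozenset({2, 3}): 0.2, frozenset({1, 2, 3}): 0.1}

for a in subsets(theta):
    dou = bel(m, theta - a)       # Dou(A) = Bel(A^c)
    print(sorted(a), round(bel(m, a), 2), round(dou, 2), round(1 - dou, 2))
```

The printed rows match the table, e.g. {1, 2} gives Bel = 0.4, Dou = 0.1, P* = 0.9.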
BELIEF FUNCTIONS
BAYESIAN BELIEF FUNCTIONS
• A belief function Bel is called Bayesian if Bel is a probability function
• The following conditions are equivalent:
  • Bel is Bayesian
  • All the focal elements of Bel are singletons
  • For every A ⊆ Θ: Bel(A) + Bel(¬A) = 1
• The inner measure can be characterized by the condition that the focal elements are pairwise disjoint
BELIEF FUNCTIONS
BAYESIAN BELIEF FUNCTIONS
EXAMPLE
• Suppose Θ = {a, b, c}

Subset      BPA   Belief
∅           0     0
{a}         m1    m1
{b}         m2    m2
{c}         m3    m3
{a, b}      0     m1 + m2
{a, c}      0     m1 + m3
{b, c}      0     m2 + m3
{a, b, c}   0     m1 + m2 + m3 = 1
DEMPSTER'S RULE OF COMBINATION
• Belief functions are well adapted to the representation of evidence because they admit a genuine rule of combination
• Several belief functions
  • Based on distinct bodies of evidence
  • Computing their "orthogonal sum" using Dempster's rule
DEMPSTER'S RULE OF COMBINATION
COMBINING TWO BELIEF FUNCTIONS
• m1: the basic probability assignment for Bel1
  • A1, A2, …, Ak: Bel1's focal elements
• m2: the basic probability assignment for Bel2
  • B1, B2, …, Bl: Bel2's focal elements
DEMPSTER'S RULE OF COMBINATION
COMBINING TWO BELIEF FUNCTIONS
• A probability mass of measure m1(Ai)·m2(Bj) is committed to Ai ∩ Bj
DEMPSTER'S RULE OF COMBINATION
COMBINING TWO BELIEF FUNCTIONS
• The intersection of the two strips m1(Ai) and m2(Bj) has measure m1(Ai)·m2(Bj); since it is committed both to Ai and to Bj, we say that the joint effect of Bel1 and Bel2 is to commit it exactly to Ai ∩ Bj
• The total probability mass exactly committed to A:
  Σ_{i,j : Ai∩Bj=A} m1(Ai)·m2(Bj)
DEMPSTER'S RULE OF COMBINATION
COMBINING TWO BELIEF FUNCTIONS
EXAMPLE

                   m1({1}) = 0.3   m1({2}) = 0.3   m1({1, 2}) = 0.4
m2({1}) = 0.2      {1}, 0.06       Φ, 0.06         {1}, 0.08
m2({2}) = 0.3      Φ, 0.09         {2}, 0.09       {2}, 0.12
m2({1, 2}) = 0.5   {1}, 0.15       {2}, 0.15       {1, 2}, 0.2

Subset   Φ      {1}    {2}    {1, 2}
mc       0.15   0.29   0.36   0.2
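Dempster's rule, as the intersection table above suggests, takes only a few lines of code. A sketch (the frozenset-keyed dictionaries are my own representation, not from the slides) that reproduces this example, including the normalized masses given on a later slide:

```python
def combine(m1, m2):
    """Dempster's rule of combination: intersect focal elements, multiply
    masses, discard the mass that lands on the empty set, renormalize."""
    raw = {}
    for a, va in m1.items():
        for b, vb in m2.items():
            raw[a & b] = raw.get(a & b, 0.0) + va * vb
    k = raw.pop(frozenset(), 0.0)   # total conflict
    if k >= 1.0:
        raise ValueError("flatly contradictory evidence: k = 1")
    return {s: v / (1.0 - k) for s, v in raw.items()}, k

m1 = {frozenset({1}): 0.3, frozenset({2}): 0.3, frozenset({1, 2}): 0.4}
m2 = {frozenset({1}): 0.2, frozenset({2}): 0.3, frozenset({1, 2}): 0.5}
m, k = combine(m1, m2)
print(round(k, 2))                       # 0.15
print(round(m[frozenset({1})], 4))       # 0.3412
print(round(m[frozenset({2})], 4))       # 0.4235
print(round(m[frozenset({1, 2})], 4))    # 0.2353
```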
DEMPSTER'S RULE OF COMBINATION
COMBINING TWO BELIEF FUNCTIONS
• The only difficulty:
  • Some of the squares may be committed to the empty set
  • If Ai and Bj are focal elements of Bel1 and Bel2 and Ai ∩ Bj = ∅, then
    Σ_{i,j : Ai∩Bj=∅} m1(Ai)·m2(Bj) > 0
• The only remedy:
  • Discard all the rectangles committed to the empty set
  • Inflate the remaining rectangles by multiplying them by
    (1 − Σ_{i,j : Ai∩Bj=∅} m1(Ai)·m2(Bj))^(−1)
DEMPSTER'S RULE OF COMBINATION
THE WEIGHT OF CONFLICT
• The renormalizing factor measures the extent of conflict between the two belief functions
• Every instance in which a rectangle is committed to ∅ corresponds to an instance in which Bel1 and Bel2 commit probability to disjoint subsets Ai and Bj:
  k = Σ_{i,j : Ai∩Bj=∅} m1(Ai)·m2(Bj)
  K = 1 / (1 − k)
  Con(Bel1, Bel2) = log(K) = log(1 / (1 − k)) = −log(1 − k)
DEMPSTER'S RULE OF COMBINATION
THE WEIGHT OF CONFLICT (CONT.)
• Bel1, Bel2 do not conflict at all:
  • k = 0, Con(Bel1, Bel2) = 0
• Bel1, Bel2 flatly contradict each other:
  • Bel1 ⊕ Bel2 does not exist
  • k = 1, Con(Bel1, Bel2) = ∞
• In the previous example k = 0.15
DEMPSTER'S RULE OF COMBINATION
• Suppose m1 and m2 are basic probability assignments over Θ. Then m1 ⊕ m2 is given by:
  (m1 ⊕ m2)(A) = Σ_{Ai∩Bj=A} m1(Ai)·m2(Bj) / (1 − k) for A ≠ ∅, and (m1 ⊕ m2)(∅) = 0
• In the previous example:

Subset        Φ   {1}      {2}      {1, 2}
m = m1 ⊕ m2   0   0.3412   0.4235   0.2353
DEMPSTER'S RULE OF COMBINATION
AN APPLICATION OF DS THEORY
• Frame of discernment: a set of mutually exclusive alternatives:
  Θ = {SIT, STAND, WALK}
• All subsets of the FoD form:
  2^Θ = {∅, {SIT}, {STAND}, {WALK}, {SIT, STAND}, {SIT, WALK}, {STAND, WALK}, {SIT, STAND, WALK}}
DEMPSTER'S RULE OF COMBINATION
AN APPLICATION OF DS THEORY
• The exercise deploys two evidences, m1 and m2:
  • m1 is based on MEAN features from Sensor1
    • m1 provides evidence for {SIT} and {¬SIT} ({¬SIT} = {STAND, WALK})
  • m2 is based on VARIANCE features from Sensor1
    • m2 provides evidence for {WALK} and {¬WALK} ({¬WALK} = {SIT, STAND})
DEMPSTER'S RULE OF COMBINATION
AN APPLICATION OF DS THEORY
• Plausibility:
  Pl(A) = Pls(A) = P*(A) = 1 − Bel(¬A) = 1 − Σ_{B⊆¬A} m(B)
DEMPSTER'S RULE OF COMBINATION
AN APPLICATION OF DS THEORY
CALCULATION OF EVIDENCE m1
• Evidence: concrete value z1(t), z1 = mean(S1)
• Bel(SIT) = 0.2
• Pls(SIT) = 1 − Bel(¬SIT) = 0.5
• m1,Concrete Value(SIT, ¬SIT, Θ) = (0.2, 0.5, 0.3)
DEMPSTER'S RULE OF COMBINATION
AN APPLICATION OF DS THEORY
CALCULATION OF EVIDENCE m2
• Evidence: concrete value z2(t), z2 = variance(S1)
• Bel(WALK) = 0.4
• Pls(WALK) = 1 − Bel(¬WALK) = 0.5
• m2,Concrete Value(WALK, ¬WALK, Θ) = (0.4, 0.5, 0.1)
DEMPSTER'S RULE OF COMBINATION
AN APPLICATION OF DS THEORY
DS THEORY COMBINATION
• Applying Dempster's combination rule: m = m1 ⊕ m2
  (¬SIT = {STAND, WALK}, ¬WALK = {SIT, STAND})

                m1(SIT)=0.2    m1(¬SIT)=0.5            m1(ALL)=0.3
m2(WALK)=0.4    m({})=0.08     m(WALK)=0.2             m(WALK)=0.12
m2(¬WALK)=0.5   m(SIT)=0.1     m(STAND)=0.25           m({STAND,SIT})=0.15
m2(ALL)=0.1     m(SIT)=0.02    m({STAND,WALK})=0.05    m(ALL)=0.03

• Due to m({}) > 0 → normalization with 0.92 (= 1 − 0.08)
DEMPSTER'S RULE OF COMBINATION
AN APPLICATION OF DS THEORY
NORMALIZED VALUES

                m1(SIT)=0.2     m1(¬SIT)=0.5             m1(ALL)=0.3
m2(WALK)=0.4    0               m(WALK)=0.217            m(WALK)=0.13
m2(¬WALK)=0.5   m(SIT)=0.108    m(STAND)=0.272           m({STAND,SIT})=0.163
m2(ALL)=0.1     m(SIT)=0.022    m({STAND,WALK})=0.054    m(ALL)=0.033

Belief(STAND) = 0.272
Plausibility(STAND) = 1 − (0.108 + 0.022 + 0.217 + 0.13) = 0.523
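Feeding the two BPAs through a few-line implementation of Dempster's rule (the frozenset dictionaries are my own representation, not from the slides) reproduces Bel(STAND). Note that computing the plausibility from the exact masses gives 0.522; the slide's 0.523 comes from summing the individually rounded cell values.

```python
def combine(m1, m2):
    """Dempster's rule of combination over frozenset-keyed BPAs."""
    raw = {}
    for a, va in m1.items():
        for b, vb in m2.items():
            raw[a & b] = raw.get(a & b, 0.0) + va * vb
    k = raw.pop(frozenset(), 0.0)   # here k = 0.08, so we renormalize by 0.92
    return {s: v / (1.0 - k) for s, v in raw.items()}

ALL = frozenset({"SIT", "STAND", "WALK"})
m1 = {frozenset({"SIT"}): 0.2, frozenset({"STAND", "WALK"}): 0.5, ALL: 0.3}
m2 = {frozenset({"WALK"}): 0.4, frozenset({"SIT", "STAND"}): 0.5, ALL: 0.1}
m = combine(m1, m2)

stand = frozenset({"STAND"})
bel = sum(v for s, v in m.items() if s <= stand)     # Bel(STAND)
pls = sum(v for s, v in m.items() if s & stand)      # Pls(STAND)
print(round(bel, 3), round(pls, 3))  # 0.272 0.522
```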
DEMPSTER'S RULE OF COMBINATION
AN APPLICATION OF DS THEORY
BELIEF AND PLAUSIBILITY (SIT)
Ground truth: 1: sitting; 2: standing; 3: walking
DEMPSTER'S RULE OF COMBINATION
AN APPLICATION OF DS THEORY
DS CLASSIFICATION
DEMPSTER'S RULE OF COMBINATION
PROBLEMS IN COMBINING EVIDENCE
• Unfortunately, the unnormalized approach above doesn't work
  • It satisfies the second assumption about mass assignments, that the masses add to 1
  • But it usually conflicts with the first assumption, that the mass of the empty set is zero
• Why?
  • Because some subsets X and Y don't intersect, so their intersection is the empty set
  • So when we apply the formula, we end up with non-zero mass assigned to the empty set
  • We can't arbitrarily set m1 ⊕ m2(∅) = 0, because then the sum of m1 ⊕ m2 will no longer be 1
ZADEH'S OBJECTION TO DS THEORY
• Suppose two doctors A and B have the following beliefs about a patient's illness:
  mA(meningitis) = 0.99     mB(meningitis) = 0.00
  mA(concussion) = 0.00     mB(concussion) = 0.99
  mA(brain tumor) = 0.01    mB(brain tumor) = 0.01
• Then
  k = mA(meningitis)·mB(concussion)
    + mA(meningitis)·mB(brain tumor)
    + mA(brain tumor)·mB(concussion)
    = 0.9999
• So
  mA ⊕ mB(brain tumor) = (0.01 × 0.01) / (1 − 0.9999) = 1
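The same few-line implementation of Dempster's rule used earlier (frozenset dictionaries are my own representation) reproduces Zadeh's paradox: although both doctors consider a brain tumor very unlikely, the combination is certain of it.

```python
def combine(m1, m2):
    """Dempster's rule of combination over frozenset-keyed BPAs."""
    raw = {}
    for a, va in m1.items():
        for b, vb in m2.items():
            raw[a & b] = raw.get(a & b, 0.0) + va * vb
    k = raw.pop(frozenset(), 0.0)   # conflict: mass on the empty set
    return {s: v / (1.0 - k) for s, v in raw.items()}, k

# Zero-mass hypotheses are simply omitted from each doctor's BPA.
mA = {frozenset({"meningitis"}): 0.99, frozenset({"brain tumor"}): 0.01}
mB = {frozenset({"concussion"}): 0.99, frozenset({"brain tumor"}): 0.01}
m, k = combine(mA, mB)
print(round(k, 4))                             # 0.9999
print(round(m[frozenset({"brain tumor"})], 4)) # 1.0
```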
GENERALIZED DS THEORY
• Body of evidence
  • Consider Ω = {w1, w2, ..., wn}
  • Focal elements {A1, A2, …, An} with masses {m1, m2, …, mn}, ∅ ≠ Ai ⊆ Ω
• Fuzzy body of evidence
• Yen's generalization
EXAMPLE
• Consider a body of evidence in DS theory over Ω = {1, 2, …, 10} with the focal elements:
• We want to compute Bel(B) and Pls(B), where
EXAMPLE
• According to Yen's generalization, approximate A and C with A and C's α-cuts, then distribute their BPA among the α-cuts
• Computing belief and plausibility
AN APPLICATION OF THE DECISION MAKING METHOD BASED ON FUZZIFIED DS THEORY
A CASE STUDY IN MEDICINE
• Consider these rules:
  • If "A change in breast skin" then status is "malignant"
  • If "No change in breast skin" then status is "unknown"
  • If "Adenoma dwindles" then status is "benign"
  • If "Adenoma does not dwindle" then status is "unknown"
• Suppose we have the following probabilities:
  • P("A change in breast skin") = 0.7
  • P("No change in breast skin") = 0.3
  • P("Adenoma dwindles") = 0.4
  • P("Adenoma does not dwindle") = 0.6
AN APPLICATION OF THE DECISION MAKING METHOD BASED ON FUZZIFIED DS THEORY
• Bodies of evidence:
  {malignant, [total range]}: m1(malignant) = 0.7, m1([total range]) = 0.3
  {benign, [total range]}: m2(benign) = 0.4, m2([total range]) = 0.6
• Combining the bodies of evidence:
  m12(benign) = 0.1476
  m12(malignant) = 0.5164
  m12(benign ∩ malignant) = 0.1147
  m12([total range]) = 0.2213
AN APPLICATION OF THE DECISION MAKING METHOD BASED ON FUZZIFIED DS THEORY
• Definition: fuzzy-valued Bel and Pls functions
AN APPLICATION OF THE DECISION MAKING METHOD BASED ON FUZZIFIED DS THEORY
• Pr(benign)
• Pr(malignant)
AN APPLICATION OF THE DECISION MAKING METHOD BASED ON FUZZIFIED DS THEORY
• Calculating risk functions based on the following equation
AN APPLICATION OF THE DECISION MAKING METHOD BASED ON FUZZIFIED DS THEORY
• a. Fuzzy set of risk function values for benign prediction
• b. Fuzzy set of risk function values for malignant prediction
AN APPLICATION OF THE DECISION MAKING METHOD BASED ON FUZZIFIED DS THEORY
• The final step: rejecting uncertainties (fuzziness and ignorance) to obtain a scalar value
• These answers are calculated by
THANKS