Expert Judgment
EMSE 280 Techniques of Risk Analysis and Management
Expert Judgment
• Why Expert Judgement?
  – Risk analysis deals with events with low intrinsic rates of occurrence, so not much data is available.
  – Data sources not originally constructed with a risk analysis in mind can be in a form inadequate for the analysis.
  – Data sources can be fraught with problems, e.g. poor entry, bad data definitions, dynamic data definitions.
  – Cost, time, or technical considerations.
Expert Judgment
• Issues in the Use of Expert Judgement
  – Selection of Experts
    • Wide enough to encompass all facets of scientific thought on the topic
    • Qualifications/criteria need to be specified
  – Pitfalls in Elicitation – Biases
    • Mindsets – unstated assumptions that the expert uses
    • Structural Biases – from the level of detail or the choice of background scales for quantification
    • Motivational Biases – the expert has a stake in the study outcome
    • Cognitive Biases
      – Overconfidence – manifested in uncertainty estimation
      – Anchoring – the expert subconsciously bases his judgement on some previously given estimate
      – Availability – events that are easy (difficult) to recall are likely to be overestimated (underestimated)
Expert Judgment
  – Avoiding Pitfalls
    • Be aware
    • Carefully design the elicitation process
    • Perform a dry run elicitation with a group of experts not participating in the study
    • Strive for uniformity in elicitation sessions
    • Never perform an elicitation session without the presence of a qualified analyst
    • Guarantee anonymity of experts
  – Combination of Expert Judgements
    • Technical and political issues
Basic Expert Judgment for Priors
• Method of Moments
  – Expert provides the most likely value of the parameter q, say q*, and a range (qL, qU)
  – For a distribution f(q) we equate
      E[q] = (qL + 4q* + qU)/6   and   Var[q] = [(qU − qL)/6]²
  – and solve for the distribution parameters
• Method of Range
  – Expert provides the maximum possible range for q, say (qL, qU)
  – For a distribution f(q) with CDF F(q) we equate
      F(qU) = .95   and   F(qL) = .05
  – and solve for the distribution parameters
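As a minimal sketch of both methods, assume for illustration that f(q) is taken to be normal (the same equations can be solved for other two-parameter families); the function names are mine:

```python
from math import sqrt

Z95 = 1.6448536269514722  # 95th percentile of the standard normal

def moments_normal(q_low, q_star, q_up):
    """Method of moments: equate the PERT-style mean and variance,
    then read off the normal parameters directly."""
    mean = (q_low + 4 * q_star + q_up) / 6
    var = ((q_up - q_low) / 6) ** 2
    return mean, sqrt(var)

def range_normal(q_low, q_up):
    """Method of range: solve F(q_low) = .05 and F(q_up) = .95
    for a normal CDF F."""
    mu = (q_low + q_up) / 2              # by symmetry of the normal
    sigma = (q_up - q_low) / (2 * Z95)   # the range spans 2 * 1.645 sigmas
    return mu, sigma
```

For a skewed family (e.g. a beta or gamma prior) the same two equations generally require a numerical solve rather than a closed form.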
Combining Expert Judgment: Paired Comparison
• Description
  – Paired Comparison is the general name for a technique used to combine several experts’ beliefs about the relative probabilities (or rates of occurrence) of certain events.
• Setup
  – e              # of experts
  – a1, …, an      objects to be compared
  – v1, …, vn      true values of the objects
  – v1,r, …, vn,r  internal values of the objects for expert r
  – Experts are asked a series of paired comparisons ai vs. aj (specifically a total of n taken 2 at a time)
  – ai >> aj by expert r means that r thinks P(ai) > P(aj)
Combining Expert Judgment: Paired Comparison
• Statistical Tests
  – Significance of Expert e’s Preferences (Circular Triads)
    Test
      H0: Expert e answered randomly
      Ha: Expert e did not answer randomly
    A circular triad is a set of preferences
      ai >> aj, aj >> ak, ak >> ai
    Define
      c      # of circular triads in a comparison of n objects
      Nr(i)  the number of times that expert r prefers ai to another object → expert data Nr(1), …, Nr(n), r = 1, …, e
      c(r)   the number of circular triads in expert r’s preferences:
        c(r) = n(n² − 1)/24 − (1/2) Σi=1..n [Nr(i) − (n − 1)/2]²    David (1963)
Combining Expert Judgment: Paired Comparison
  – Significance of Expert e’s Preferences (Circular Triads)
    • Kendall (1962)
      – tabulated Pr{c(r) > c*} under H0 (that the expert answered in a random fashion) for n = 2, …, 10
      – developed the following statistic for comparing n items:
          c′(e) = [8/(n − 4)]·[ (1/4)·C(n,3) − c(e) + 1/2 ] + n(n − 1)(n − 2)/(n − 4)²
        When n > 7, this statistic has (approximately) a chi-squared distribution with
          df = n(n − 1)(n − 2)/(n − 4)²
      – perform a standard one-tailed hypothesis test. If H0 cannot be rejected for an expert at the 5% level of significance, i.e. Pr{χ²(df) ≥ c′(e)} > .05, the expert is dropped
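The triad count and Kendall's chi-squared approximation can be sketched in Python (function names are mine; the approximation only applies for n > 7):

```python
def circular_triads(wins, n):
    """c(r) = n(n^2 - 1)/24 - (1/2) * sum_i [N_r(i) - (n - 1)/2]^2,
    where wins[i] = N_r(i), the number of objects expert r prefers a_i over."""
    assert len(wins) == n
    return n * (n ** 2 - 1) / 24 - 0.5 * sum((w - (n - 1) / 2) ** 2 for w in wins)

def kendall_statistic(c, n):
    """Kendall's chi-squared approximation c'(e) for n > 7, and its df.
    (1/4) * C(n,3) = n(n-1)(n-2)/24."""
    df = n * (n - 1) * (n - 2) / (n - 4) ** 2
    c_prime = (8 / (n - 4)) * (n * (n - 1) * (n - 2) / 24 - c + 0.5) + df
    return c_prime, df
```

A fully transitive expert (win counts 0, 1, …, n−1) yields c(r) = 0; three objects preferred in a cycle yield c(r) = 1.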
Combining Expert Judgment: Paired Comparison
• Statistical Tests
  – Agreement of Experts: coefficient of agreement
    Test
      H0: Experts’ agreement is due to chance
      Ha: Experts’ agreement is not due to chance
    Define
      N(i,j)  the number of times ai >> aj
    The coefficient of agreement is
        u = 2·[ Σi=1..n Σj=1..n, j≠i C(N(i,j), 2) ] / [ C(e,2)·C(n,2) ]  −  1
    and attains a max of 1 for complete agreement
Combining Expert Judgment: Paired Comparison
  – Agreement of Experts: coefficient of agreement
    • tabulated distributions of
        Σi=1..n Σj=1..n, j≠i C(N(i,j), 2)
      exist for small values of n and e under H0
    • These are used to test hypotheses concerning u. For large values of n and e, Kendall (1962) developed the statistic
        u′ = [4/(e − 2)]·[ Σi=1..n Σj=1..n, j≠i C(N(i,j), 2) − (1/2)·C(e,2)·C(n,2)·(e − 3)/(e − 2) ]
      which under H0 has (approx.) a chi-squared distribution with
        df = C(n,2)·e(e − 1)/(e − 2)²
      We want to reject at the 5% level, and fail if Pr{χ²(df) ≥ u′} > .05
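A compact sketch of u and its large-sample statistic (function names are mine; N is the full preference-count matrix with N[i][j] + N[j][i] = e):

```python
from math import comb

def agreement(N, e):
    """Kendall's coefficient of agreement u.
    N[i][j] = # of experts who preferred a_i to a_j."""
    n = len(N)
    total = sum(comb(N[i][j], 2) for i in range(n) for j in range(n) if j != i)
    return 2 * total / (comb(e, 2) * comb(n, 2)) - 1

def agreement_statistic(N, e):
    """Kendall's large-sample chi-squared statistic u' and its df."""
    n = len(N)
    total = sum(comb(N[i][j], 2) for i in range(n) for j in range(n) if j != i)
    u_prime = (4 / (e - 2)) * (total - 0.5 * comb(e, 2) * comb(n, 2) * (e - 3) / (e - 2))
    df = comb(n, 2) * e * (e - 1) / (e - 2) ** 2
    return u_prime, df
```

With complete agreement (every expert prefers a1 >> a2 >> a3), u evaluates to exactly 1.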
Combining Expert Judgment: Paired Comparison
• Statistical Tests
  – Agreement of Experts: coefficient of concordance
    Define
      R(i,r)  the rank of ai obtained from expert r’s responses
        S = Σi=1..n [ Σr=1..e R(i,r) − (1/n) Σj=1..n Σr=1..e R(j,r) ]²
        w = S / [ (1/12)·e²(n³ − n) ]
    coefficient of concordance; again attains a max of 1 for complete agreement
Combining Expert Judgment: Paired Comparison
  – Agreement of Experts: coefficient of concordance
    • Tables of critical values for the distribution of S under H0 were developed for 3 ≤ n ≤ 7 and 3 ≤ e ≤ 20 by Siegel (1956)
    • For n > 7, Siegel (1956) provides the statistic
        w′ = S / [ (1/12)·e·n(n + 1) ]
      which is (approx.) chi-squared with df = n − 1
    • Again we reject at the 5% level of significance
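The rank-based computation can be sketched directly from the definitions (function names are mine; R[r][i] is the rank expert r assigns to object a_i):

```python
def concordance(R):
    """Kendall's coefficient of concordance w from an e-by-n rank matrix."""
    e, n = len(R), len(R[0])
    col_sums = [sum(R[r][i] for r in range(e)) for i in range(n)]
    mean = sum(col_sums) / n
    S = sum((cs - mean) ** 2 for cs in col_sums)       # squared deviations of rank sums
    return S / (e ** 2 * (n ** 3 - n) / 12)

def concordance_statistic(R):
    """Siegel's chi-squared approximation w' (df = n - 1), for n > 7."""
    e, n = len(R), len(R[0])
    col_sums = [sum(R[r][i] for r in range(e)) for i in range(n)]
    mean = sum(col_sums) / n
    S = sum((cs - mean) ** 2 for cs in col_sums)
    return S / (e * n * (n + 1) / 12), n - 1
```

Identical rankings from every expert give w = 1, the complete-agreement maximum.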
Paired Comparison: Thurstone Model
• Assumptions
    vi,r ~ N(μi, σi²) with μi = vi and σi² = σ²
  [Figure: normal densities for objects 1, 2, and 3, illustrating the probability that 3 beats 2, i.e. 3 is preferred to 2; think of this as tournament play]
Paired Comparison: Thurstone Model
• Assumptions
    vi,r ~ N(μi, σi²) with μi = vi and σi² = σ²
• Implications
    vi,r − vj,r ~ N(μi − μj, 2σ²) = N(μi,j, 2σ²)  (experts assumed indep.)
  → ai is preferred to aj by expert r with probability
      Pr{vi,r − vj,r > 0} = Pr{Z > −μi,j/(σ√2)} = Φ(μi,j/(σ√2))
  If pi,j is the fraction of experts that preferred ai to aj, then
      pi,j = Φ(μi,j/(σ√2))   or   Φ⁻¹(pi,j) = μi,j/(σ√2) = (μi − μj)/(σ√2)
Paired Comparison: Thurstone Model
• Establishing Equations
  Let xi,j = Φ⁻¹(pi,j) = (μi − μj)/(σ√2)
  Then, by choosing a scaling constant (e.g. σ√2 = 1), we can establish a set of C(n,2) equations
      xi,j ≈ μi − μj
  As this is an over-specified system, we solve for the μi such that we get
      min over μ1, …, μn of Σi,j [xi,j − (μi − μj)]²,   giving   est(μi) = (1/n) Σj=1..n xi,j
  Mosteller (1951) provides a goodness-of-fit test based on an approximate chi-squared value
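The least-squares solution has the closed form above, so the whole fit is a few lines (a sketch with the scaling σ√2 = 1; the function name is mine):

```python
from statistics import NormalDist

def thurstone_mu(p):
    """Least-squares Thurstone scale values: est(mu_i) = (1/n) sum_j x_ij,
    with x_ij = Phi^{-1}(p_ij).
    p[i][j] = fraction of experts preferring a_i to a_j; diagonal treated as 1/2."""
    nd = NormalDist()
    n = len(p)
    x = [[nd.inv_cdf(p[i][j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    return [sum(row) / n for row in x]
```

Note that proportions of exactly 0 or 1 must be smoothed before applying Φ⁻¹, since the inverse normal CDF is unbounded there.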
Paired Comparison: Bradley-Terry Model
• Assumptions
      r(i,j) = Pr{ai is preferred to aj} = vi/(vi + vj)
  Thus each paired comparison is the result of a Bernoulli rv for a single expert, and a binomial rv for the set of experts
  The vi are determined only up to a constant, so we can assume Σi=1..n vi = 1
  Define N(i) = Σr=1..e Nr(i); then vi can be found as the solution hi to
      hi = [N(i)/e] / [ Σj=1..n, j≠i 1/(hi + hj) ]
Paired Comparison: Bradley-Terry Model
Iterative solution, Ford (1957):
    hi^(k+1) = [N(i)/e] / [ Σj=1..i−1 (hi^(k) + hj^(k+1))⁻¹ + Σj=i+1..n (hi^(k) + hj^(k))⁻¹ ]
Ford (1957) notes that the estimate obtained is the MLE, and that the solution is unique and the iteration converges under the condition that it is not possible to separate the n objects into two sets where all experts deem that no object in the first set is preferable to any object in the second set.
Bradley (1957) developed a goodness-of-fit test based on
    F = 2 [ Σi=1..n Σj=1..n, j≠i N(i,j) ln r̂(i,j) − ( Σi=1..n N(i) ln vi − Σi=1..n Σj=i+1..n e ln(vi + vj) ) ]
where r̂(i,j) = N(i,j)/e is the observed preference proportion; F is (asymptotically) distributed as chi-squared with
    df = (n − 1)(n − 2)/2
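A sketch of the iterative fit (function name is mine). Updating h in place within each sweep means components with j < i already use their new values, which is the pattern of Ford's scheme:

```python
def bradley_terry(N, e, iters=200):
    """Iterative MLE for Bradley-Terry ratings.
    N[i] = total number of times a_i was preferred, e = # of experts
    (each pair compared once per expert). Returns h normalized to sum to 1."""
    n = len(N)
    h = [1.0 / n] * n
    for _ in range(iters):
        for i in range(n):
            denom = sum(1.0 / (h[i] + h[j]) for j in range(n) if j != i)
            h[i] = (N[i] / e) / denom
        s = sum(h)
        h = [hi / s for hi in h]   # v_i determined only up to a constant
    return h
```

With two objects and 10 experts splitting 7 to 3, the ratings converge to 0.7 and 0.3.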
Paired Comparison: NEL Model
• Motivation
  – If Ti ~ exp(λi) then
      Pr{Ti < Tj} = λi/(λi + λj)
  – For a set of exponential random variables, we may ask experts which one will occur first
  – We can use all of the Bradley-Terry machinery to estimate the λi
  – We need only a separate estimate of one particular λ to anchor all the others
Combination of Expert Judgment: Bayesian Techniques
• Method of Winkler (1981) & Mosleh and Apostolakis (1986)
  – Set Up
    • X            an unknown quantity of interest
    • x1, …, xe    estimates of X from experts 1, …, e
    • p(x)         DM’s prior density for X
    • Then p(x | x1, …, xe) ∝ p(x1, …, xe | x)·p(x)
  – What is p(x1, …, xe | x)?
    If the experts are independent,
      p(x1, …, xe | x) = Πi=1..e p(xi | x)
    What is p(xi | x)?
Combination of Expert Judgment: Bayesian Techniques
• Method of Winkler (1981) & Mosleh and Apostolakis (1986)
  – Approach
      Additive Model:        xi = x + εi,   εi ~ N(μi, σi²) and indep.
      Multiplicative Model:  xi = x·εi,    ln(εi) ~ N(μi, σi²) and indep.
    where the parameters μi and σi are selected by the DM to reflect his/her opinion about the experts’ biases and accuracy
    • Under the assumptions of the additive (multiplicative) model, the likelihood is simply the value of the normal (lognormal) density with parameters x + μi and σi.
    • Then for the additive model we have
      Proposition 10.1  Let the decision maker’s prior p(x) be normal with mean xe+1 and standard deviation σe+1; then p(x | x1, …, xe) is normal with mean and variance given by
Combination of Expert Judgment: Bayesian Techniques
    E(X | x1, …, xe) = Σi=1..e+1 wi (xi − μi)   and
    VAR(X | x1, …, xe) = 1 / Σi=1..e+1 σi⁻²
  with
    wi = σi⁻² / Σj=1..e+1 σj⁻²   and   μe+1 = 0
  Note: i.  the multiplicative model follows the same line of reasoning but with the lognormal distribution
        ii. the DM acts as the (e+1)st expert (perhaps uncomfortable)
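Proposition 10.1 is a precision-weighted average, which makes the additive model a one-liner to compute (a sketch; the function name is mine, and the DM enters as the last list entry with bias 0):

```python
def additive_posterior(x, mu, sigma):
    """Posterior mean and variance for the additive model.
    Lists include the DM as the (e+1)st entry: x = estimates plus the DM's
    prior mean, mu = biases (DM bias 0), sigma = st. devs (DM prior st. dev)."""
    prec = [1.0 / s ** 2 for s in sigma]           # precisions sigma_i^{-2}
    total = sum(prec)
    w = [p / total for p in prec]                  # normalized weights w_i
    mean = sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, mu))
    var = 1.0 / total
    return mean, var
```

Two equally credible, unbiased sources at 1 and 3 yield a posterior mean of 2 with variance 1/2, the familiar conjugate-normal pooling.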
Combination of Expert Judgment: Bayesian Techniques
[Spreadsheet example: the Mosleh-Apostolakis model for expert judgment combination. Multiplicative model with 3 experts. DM prior: mean X = 4.63E-05, st. dev. X = 9.36E-06 (mean ln X = -1.00E+01, st. dev. ln X = 2.00E-01). Expert estimates 2.00E-05, 4.50E-06, and 1.00E-04 with DM-assessed multiplicative error parameters. Posterior inference: mean 2.19E-05, st. dev. 1.76E-06, with a plot comparing the prior and posterior densities.]
Combination of Expert Judgment: The Classical Model
• Overview
  – Experts are asked to assess their uncertainty distributions via specification of 5%, 50% and 95%-ile values for a set of seed variables (whose actual realizations are known to the analyst alone) and a set of variables of interest
  – The analyst determines the intrinsic range, or bounds, for the variable distributions
  – Expert weights are determined via a combination of calibration and information scores on the seed variable values
  – These weights can be shown to satisfy an asymptotically strictly proper scoring rule, i.e., experts achieve their maximum expected weight in the long run only by stating assessments corresponding to their actual beliefs
Combination of Expert Judgment: The Classical Model
[Figure: piecewise-linear CDFs (rising from 0 to 1) for Expert 1 and Expert 2, each with break points ql, q5, q50, q95, qu]
For a weighted combination of expert CDFs, take the weighted combination at all break points (i.e. the qi values for each expert) and then linearly interpolate
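The break-point recipe can be sketched as follows (a minimal illustration; the function name is mine, and each expert CDF is represented as a list of (q, F(q)) break points):

```python
def combine_cdfs(cdfs, weights):
    """Weighted combination of piecewise-linear expert CDFs.
    Evaluate every expert CDF at the union of all break points and average
    with the given weights; the result is again piecewise linear there."""
    def eval_cdf(points, q):
        # linear interpolation between break points, flat outside the support
        if q <= points[0][0]:
            return 0.0
        if q >= points[-1][0]:
            return 1.0
        for (q0, f0), (q1, f1) in zip(points, points[1:]):
            if q0 <= q <= q1:
                return f0 + (f1 - f0) * (q - q0) / (q1 - q0)
    grid = sorted({q for cdf in cdfs for q, _ in cdf})
    return [(q, sum(w * eval_cdf(cdf, q) for w, cdf in zip(weights, cdfs)))
            for q in grid]
```

Averaging on the union of break points matters: averaging only each expert's own quantiles and interpolating would not reproduce the mixture CDF.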
Combination of Expert Judgment: The Classical Model
[Figure: for five variables, each of three experts’ quantile break points (|) are plotted against the realization (x)]
Expert 1 – Calibrated but not informative
Expert 2 – Informative but not calibrated
Expert 3 – Informative and calibrated
Combination of Expert Judgment: The Classical Model
  – Information
    • Informativeness is measured with respect to some background measure, in this context usually the uniform distribution
        F(x) = [x − l]/[h − l],   l < x < h
    • or the log-uniform distribution
        F(x) = [ln(x) − ln(l)]/[ln(h) − ln(l)],   l < x < h
    • Probability densities are associated with the assessments of each expert for each query variable such that
      – the density agrees with the expert’s quantile assessments
      – the densities are minimally informative with respect to the background measure
      – When the background measure is uniform, for example, the expert’s distribution is uniform on its 0% to 5% quantile, 5% to 50% quantile, etc.
Combination of Expert Judgment: The Classical Model
  – Information
    • The relative information for expert e on a variable is
        I(r) = Σi=1..4 pi ln(pi / ri)
      where (p1, …, p4) = (.05, .45, .45, .05) and ri is the background measure of interval i
    • That is, r1 = F(q5(e)) − F(ql(e)), …, r4 = F(qh(e)) − F(q95(e))
    [Figure: an expert’s piecewise-uniform density plotted against the uniform background measure between min and max]
    • The expert information score is the average information over all query variables
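The per-variable score is a short computation (a sketch; function names are mine, shown for a uniform background measure on [l, h]):

```python
from math import log

def uniform_background(quantiles, l, h):
    """r_i under a uniform background measure on [l, h];
    quantiles = (q_l, q_5, q_50, q_95, q_h)."""
    return [(b - a) / (h - l) for a, b in zip(quantiles, quantiles[1:])]

def relative_information(r):
    """I(r) = sum_i p_i ln(p_i / r_i) with p = (.05, .45, .45, .05)."""
    p = (0.05, 0.45, 0.45, 0.05)
    return sum(pi * log(pi / ri) for pi, ri in zip(p, r))
```

An expert whose quantiles carve the range exactly into the background's (.05, .45, .45, .05) proportions scores I(r) = 0; tighter quantiles give larger (more informative) scores.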
Combination of Expert Judgment: The Classical Model
• Intrinsic Range for Each Seed Variable
  • Let qi(e) denote expert e’s i% quantile for seed variable X
  • Let seed variable X have realization (unknown to the experts) of r
  • Determine the intrinsic range as (assuming m experts)
      l = min{q5(1), …, q5(m), r}   and   h = max{q95(1), …, q95(m), r}
  • Then for k the overshoot percentage (usually k = 10%)
      – ql(e) = l − k(h − l)
      – qh(e) = h + k(h − l)
  • The expert distribution (CDF) for seed variable X is a linear interpolation between
      (ql(e), 0), (q5(e), .05), (q50(e), .5), (q95(e), .95), (qh(e), 1)
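The intrinsic-range step is mechanical (a sketch; the function name is mine):

```python
def intrinsic_range(q5s, q95s, realization, k=0.10):
    """l, h from the experts' 5% / 95% quantiles and the realization,
    then the overshoot bounds ql = l - k(h - l) and qh = h + k(h - l)."""
    l = min(q5s + [realization])
    h = max(q95s + [realization])
    return l - k * (h - l), h + k * (h - l)
```

Including the realization in the min/max guarantees that the true value always lies inside every expert's (extended) support.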
Combination of Expert Judgment: The Classical Model
• Calibration
  – By specifying the 5%, 50% and 95%-iles, the expert is specifying a 4-bin multinomial distribution with probabilities .05, .45, .45, and .05 for each seed variable response
  – For each expert, the seed variable outcome (realization), r, is the result of a multinomial experiment, i.e.
    • r ∈ [ql(e), q5(e))    [interval 1], with probability 0.05
    • r ∈ [q5(e), q50(e))   [interval 2], with probability 0.45
    • r ∈ [q50(e), q95(e))  [interval 3], with probability 0.45
    • r ∈ [q95(e), qh(e)]   [interval 4], with probability 0.05
  – Then if there are N seed variables and assuming independence,
      si = [# of seed variables in interval i]/N
    is an empirical estimate of (p1, p2, p3, p4) = (.05, .45, .45, .05)
Combination of Expert Judgment: The Classical Model
• Calibration
  • We may test how well the expert is calibrated by testing the hypothesis
      H0: si = pi for all i   vs.   Ha: si ≠ pi for some i
  • This can be performed using the relative information
      I(s, p) = Σi=1..4 si ln(si / pi)
    where (s1, …, s4) is the empirical density from the seed variables and (p1, …, p4) is (.05, .45, .45, .05)
Combination of Expert Judgment: The Classical Model
      I(s, p) = Σi=1..4 si ln(si / pi)
  • Note that this value is always nonnegative and only takes the value 0 when si = pi for all i.
  • If N (the number of seed variables) is large enough,
      2N·I(s, p) ~ χ²(3)
  • Thus the calibration score for the expert is the probability of getting a relative information score worse than (greater than or equal to) the one obtained:
      c(e) = 1 − F₃(2N·I(s, p))   where F₃(x) = Pr{χ²(3) ≤ x}
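A sketch of the score (function names are mine). For df = 3 the chi-squared CDF has a closed form, so no statistics library is needed; zero-count bins are given the usual convention of contributing 0 to the relative information:

```python
from math import erf, exp, log, pi, sqrt

def chi2_cdf_df3(x):
    """Closed-form chi-squared CDF for df = 3."""
    return erf(sqrt(x / 2)) - sqrt(2 * x / pi) * exp(-x / 2)

def calibration_score(counts):
    """c(e) = 1 - Pr{chi2(3) <= 2N*I(s,p)} from interval-hit counts."""
    p = (0.05, 0.45, 0.45, 0.05)
    N = sum(counts)
    s = [c / N for c in counts]
    # zero-frequency bins contribute 0 (limit of s*ln(s) as s -> 0)
    I = sum(si * log(si / pi_) for si, pi_ in zip(s, p) if si > 0)
    return 1.0 - chi2_cdf_df3(2 * N * I)
```

A perfectly calibrated expert (counts exactly proportional to .05, .45, .45, .05) gets I = 0 and hence the maximal calibration score of 1.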
Combination of Expert Judgment: The Classical Model
  – Weights
    • Proportional to calibration score × information score
    • Don’t forget to normalize
  – Note
    • As the intrinsic range for a variable depends on the expert quantiles, dropping experts may cause the intrinsic range to be recalculated
    • Changes in the intrinsic range and background measure have negligible to modest effects on scores
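A sketch of the weighting rule (the function name is mine; the optional cutoff alpha zeroes out poorly calibrated experts, as in the DM-weight refinement):

```python
def performance_weights(calibration, information, alpha=0.0):
    """Unnormalized weight = calibration score * information score,
    zeroing experts whose calibration falls below the cutoff alpha,
    then normalizing so the weights sum to 1."""
    raw = [c * i if c >= alpha else 0.0
           for c, i in zip(calibration, information)]
    total = sum(raw)
    return [r / total for r in raw]
```

Because calibration enters multiplicatively and varies over orders of magnitude, it usually dominates the ranking, with information breaking ties among similarly calibrated experts.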
Combination of Expert Judgment: The Classical Model
• Optimal (DM) Weights
  – Choose a cutoff value α such that experts with calibration score C(e) below α receive weight 0 (some experts will get 0 weight)
  – α is selected so that a fictitious expert, with a distribution equal to that of the weighted combination of expert distributions, would be given the highest weight among the experts
Combination of Expert Judgment: The Classical Model
[Spreadsheet example: combination of expert judgment with 4 experts. Intrinsic range adjustment k = 0.1 gives l = 40 and h = 760. Each expert supplies 5%/50%/95% quantiles (e.g. expert 1: 200/330/500). Resulting weights: equal (0.25 each), user specified (0.40, 0.10, 0.10, 0.40), and performance based (0.266, 0.429, 0.000, 0.306).]
Combination of Expert Judgment: The Classical Model
[Spreadsheet example: seed variable input with overshoot 0.1 and 5 seed variables. For seed variable 1 (realization 26), the four experts’ 5%/50%/95% quantiles yield ql = 0 and qh = 87.9; the sheet records interval-hit indicators (<5%, 5%-50%, 50%-95%, >95%) for each expert and per-variable information scores (e.g. 1.56E-01, 7.37E-01, 1.22E+00, 4.80E-01 on variable 1).]
Combination of Expert Judgment: The Classical Model
[Spreadsheet example: performance weights with cutoff α = 1. Per-expert calibration scores (0.395, 0.411, 0.000, 0.740), empirical bin frequencies versus (.05, .45, .45, .05), per-variable information scores, unnormalized weights (0.187, 0.301, 0.000, 0.196), and normalized weights (0.274, 0.439, 0.000, 0.287).]
Combination of Expert Judgment: The Classical Model
[Figure: three panels comparing the DM’s combined CDF with the four experts’ CDFs over variable values 0 to 1000, under performance-based, user-defined, and equal weights.]