Transcript Richard I. Frederick, Ph.D.
Solving Classification Problems for Symptom Validity Tests with Mixed Groups Validation Richard Frederick, Ph.D., ABPP (Forensic) US Medical Center for Federal Prisoners Springfield, Missouri
I am not a neuropsychologist.
My view of the brain
Your view of the brain
My board certifications:
Forensic Psychology American Board of Professional Psychology Assessment Psychology American Board of Assessment Psychology
My professional goal:
Use tests properly in forensic psychological assessments
Goals of workshop
Participants in this workshop will be able to employ Excel graphing methods:
--to evaluate classification characteristics of symptom validity tests
--to adapt symptom validity test scores to their individual, local, base rates
--to combine information from local base rate and multiple symptom validity tests
richardfrederick.com
Something is terribly wrong
1. The SIRS has sensitivity = .485 and specificity = .995.
2. The SIRS was administered to 131 criminal defendants who were strongly suspected of feigned psychopathology.
68% of them were categorized as feigning by the SIRS.
What is a classification test?
A structured routine for determining which individuals belong to which of
two groups
.
(1) There are two groups.
(2) It’s not easy to determine which group an individual belongs to without the help of the test.
Real World
The distributions represent our estimations of how the populations of the two groups score on the test. We generally estimate the population distributions by sampling. We notice that the populations have two separate, but overlapping distributions. The extent of the overlap is of concern to us.
Questions that must be addressed in research before we can continue: (1) Are there really two separate groups?
(2) Can we effectively represent the population distributions by sampling?
Real World
What we notice next.
The mean separation between the groups is 10 points.
Persons in Population A have a mean score that is 10 points below persons in Population B.
The sd for each population is the same. The mean separation between groups is one sd.
When researchers talk about mean separation, they often refer to effect size. Often, Cohen’s d is the statistic used to refer to standardized mean separation. Here, Cohen’s d = 1. This is often referred to as a large, or very large, effect size.
Mean separation = 0
Making tests often means finding those characteristics that best separate the distributions of the two groups.
Two distributions of gender with respect to: Intelligence
Moderately large mean separation
Two distributions of gender with respect to: Longevity
Large mean separation
Two distributions of gender with respect to: Hair Length
Very large mean separation
Two distributions of gender with respect to: Body Mass
Real World
Summary: (1) We have two groups.
(2) We have a test for which the two groups score differentially.
(3) The difference in mean scores represents a very large effect.
Foundations of TPR and FPR
More commonly, researchers report Sensitivity and Specificity. These terms are common, but not the most helpful. We are going to use the terms True Positive Rate (TPR) and False Positive Rate (FPR).
TPR = Sensitivity
FPR = 1 - Specificity
What are TPR and FPR?
TPR is the proportion of individuals who do have the condition who generate positive scores: among those who have the condition, the rate of scores beyond the cut in the direction that indicates the presence of the condition.
FPR is the proportion of individuals who do NOT have the condition who generate positive scores: among those who do NOT have the condition, the rate of scores beyond the cut in the direction that indicates the presence of the condition.
[Figure: score distributions of the Have nots and the Haves.]
The green line represents the cut score. Scores to the LEFT of the line are classified NEGATIVE. Scores to the right are classified POSITIVE.
Here, the False Positive Rate is 92.4%.
The True Positive Rate is 100%.
As we move the line to the right, both rates DECREASE.
To totally eliminate false positives, we have to be willing to identify almost no one as a positive.
Test / Truth     Has disorder (Haves)   Doesn’t have disorder (Have Nots)
Positives        True Positives         False Positives
Negatives        False Negatives        True Negatives
TPR = True Positives / Haves
FPR = False Positives / Have Nots
A positive score will be one that is associated with Population A membership. If we set a point at which a score will be used to say, “This score represents Population A,” such a score will be referred to as a “positive score.” A positive score can be either a true positive or a false positive; which one is unknown to us.
The True Positive Rate is the proportion of Population A members who generate a positive score. In our figure, the point at which we begin to identify “positive scores” is at 50, the mean of population A. Scores at or below 50 are called positive, and a person who generates a positive score is classified as a Population A member.
We can pick any value to be our “cut score,” but it’s hard to pick one that doesn’t result in some Population B members producing “positive scores.” In our figure, 50% of the Population A members have scores at 50 or below. This is the True Positive Rate. TPR = .50.
In our figure, 16% of the Population B members have scores at 50 or below. This is the False Positive Rate. FPR = .16.
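The figure's rates can be reproduced from two normal curves. A minimal sketch, assuming (the figure implies but does not state it) that Population A is Normal(50, 10) and Population B is Normal(60, 10), which gives Cohen's d = 1; `normal_cdf` and `cohens_d` are illustrative helpers, not part of the workshop's Excel materials.

```python
import math

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def cohens_d(mean_a: float, mean_b: float, sd: float) -> float:
    """Standardized mean separation when both populations share one SD."""
    return abs(mean_b - mean_a) / sd

# Assumed population parameters: Population A scores lower than Population B.
MEAN_A, MEAN_B, SD = 50.0, 60.0, 10.0
CUT = 50.0  # scores at or below the cut are "positive" (classified as Population A)

tpr = normal_cdf(CUT, MEAN_A, SD)  # share of Population A at or below the cut
fpr = normal_cdf(CUT, MEAN_B, SD)  # share of Population B at or below the cut
print(f"d = {cohens_d(MEAN_A, MEAN_B, SD):.1f}, TPR = {tpr:.2f}, FPR = {fpr:.2f}")
# d = 1.0, TPR = 0.50, FPR = 0.16
```

Setting CUT to 45, for example, lowers both rates, which is the point made next: the rates belong to the chosen cut score, not to the test.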
We note that it is not the test that has a certain TPR and FPR.
It is the chosen test score that has a certain TPR and FPR.
A different test score will almost certainly have different TPR and FPR.
Overcoming limiting factors of “known groups” validation in determining test score sensitivity and specificity
We think of a test as a way to characterize a dependency.
As you have more of X, you have more of Y.
Y depends on X.
X predicts Y.
X is some construct. Y is some test score.
There is a relationship that we wish to characterize and quantify.
Let’s consider feigning.
As you are more likely to feign, you are more likely to engage in certain behavior.
This behavior might be “providing answers to items on a test” at a certain rate.
You might choose more items, or fewer items, than “normals.”
We develop the idea that we can identify individuals who respond at a certain rate as feigners, and we decide to make a decision point about when we call test takers feigners and when we don’t.
We call that decision point a cut score.
We call test scores at or beyond the cut score positive scores.
Some positive scores are correct: true positives.
Some positive scores are incorrect: false positives.
If our test is any good, and if the relationship between X and Y is strong, then our rate of true positives is much higher than our rate of false positives.
Let’s skip to the end. We are now using the test in our clinic.
We look over our results. We see a number of “positive scores.” We know that those “positive scores” are some unknown mixture of “true positives” and “false positives.” We’d like to know the ratio of that mixture.
Here’s how we do it: First, we estimate what the true positive rate of the cut score is.
Then, we estimate what the false positive rate of the cut score is.
Then, we figure out what percentage of people in our sample are feigning.
Then we can get the ratio of the mixture of our true positive and false positives in all the positive scores in our clinic. (We call this positive predictive power.)
Getting TPR and FPR: We depend on researchers to tell us what the estimates of true positive rate and false positive rate are.
They usually do this through a process called “criterion groups validation.” People with more confidence than might be called for refer to this process as “known groups validation.”
The process is seemingly straightforward.
Identify two groups. One group has the condition. All the positives in this group are “true positives.” One group doesn’t have the condition. All the positives in this group are “false positives.” The rate of “true positives” is the sensitivity of the test.
TPR = sensitivity.
The rate of “false positives” is the non-specificity of the test. FPR = 1 – specificity.
There are many problems with this process, but let’s focus on the main two.
Problem 1: In Study 1, for a given cut score, researchers report the TPR is .67 and the FPR is .12.
In Study 2, for the same cut score, researchers report TPR = .58 and FPR = .09.
Which values do you use?
Problem 2: In Study 1, for a given cut score, researchers report the TPR is .67 and the FPR is .12.
In Study 2, for a different cut score, researchers report TPR = .58 and FPR = .09.
Which cut score do you use?
“Known” groups validation
Let’s validate a test!
God whispers to us what truth is and we identify 100 honest responders and 100 feigners.
100 feigners, 100 honest responders.
We take our best shot at a test.
Test results against TRUTH:
                 Feigners   Honest responders   Total
Test positive    49         1                   50
Test negative    51         99                  150
Total            100        100                 200
We say for our test: True positive rate = 49/100 = 49% [sensitivity = 49%]; False positive rate = 1/100 = 1% [specificity = 99%].
Because God does not whisper to us anymore, we take this test, our best test, and we say, “This is the best we can do.” Let’s call it our Gold Standard.
We will now make criterion groups with this test, and we will call the groups “Known Groups.” We will then validate tests, based on these Known Groups.
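Our move from TRUTH to KNOWN GROUPS:
“Known” feigners (test positive): 50 people = 49 true feigners + 1 honest responder
“Known” honest responders (test negative): 150 people = 51 true feigners + 99 honest responders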
We forget what truth is and develop faith in our gold standard: the “KNOWN” GROUPS are 50 feigners and 150 honest responders.
Let’s validate a new test, which just happens to be a perfect test. What test diagnostic efficiencies will we assign our new, perfect, test?
The perfect test scored against the “KNOWN” GROUPS:
                 “Known” feigners (50)   “Known” honest (150)   Total
Test positive    49                      51                     100
Test negative    1                       99                     100
TPR = 49/50 = 98%, FPR = 51/150 = 34%
Our belief that we can make perfect criterion groups from imperfect criteria has led us to misunderstand tremendously what we are doing.
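The contamination can be checked directly. A minimal sketch that rebuilds the 200 cases from the slide, scores a hypothetical perfect test against the gold-standard groups rather than against truth, and recovers the 98% and 34% figures; the function names are illustrative.

```python
# The slide's 200 cases as (true_status, gold_standard_result) pairs.
cases = (
    [("feigner", "pos")] * 49 + [("feigner", "neg")] * 51
    + [("honest", "pos")] * 1 + [("honest", "neg")] * 99
)

def perfect_test(true_status: str) -> str:
    """A hypothetical test that always classifies the true status correctly."""
    return "pos" if true_status == "feigner" else "neg"

def rates_against_known_groups(cases, test):
    """TPR and FPR of `test`, scored against gold-standard groups instead of truth."""
    known_feigners = [status for status, gold in cases if gold == "pos"]
    known_honest = [status for status, gold in cases if gold == "neg"]
    tpr = sum(test(s) == "pos" for s in known_feigners) / len(known_feigners)
    fpr = sum(test(s) == "pos" for s in known_honest) / len(known_honest)
    return tpr, fpr

tpr, fpr = rates_against_known_groups(cases, perfect_test)
print(f"TPR = {tpr:.0%}, FPR = {fpr:.0%}")  # TPR = 98%, FPR = 34%
```

Scored against truth, the same test would have TPR = 100% and FPR = 0%; all of the apparent error comes from the criterion groups.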
Let’s begin to address these problems in a non-traditional way.
Table for Computation of Test Characteristics
                 Positive (Feigners)   Negative (Not Feigning)
Test Positive    80%                   10%
Test Negative    20%                   90%
Sensitivity = 80%; Specificity = 90%
(Computation for Positive Predictive Power; Computation for Negative Predictive Power)
Table for Computation of Test Characteristics
                 Positive (Feigners)   Negative (Not Feigning)
Test Positive    80%                   10%
Test Negative    20%                   90%
True Positive Rate (TPR) = 80%; False Positive Rate (FPR) = 10%
PPP = Ratio of True Positives to All Positives
NPP = Ratio of True Negatives to All Negatives
Table for Computation of Test Characteristics
Base Rate of Feigning   100%   0%
Test Positive           80%    10%
Test Negative           20%    90%
True Positive Rate (TPR) = 80%; False Positive Rate (FPR) = 10%
NOTE: Calculations of TPR and FPR are INDEPENDENT of Base Rate
Table for Computation of Test Characteristics
Base Rate of Feigning   100%   0%
Test Positive           80%    10%
True Positive Rate (TPR) = 80%; False Positive Rate (FPR) = 10%
Table for Computation of Test Characteristics
Base Rate of Feigning       1.00   0
Proportion Tests Positive   .80    .10
True Positive Rate (TPR) = .80
False Positive Rate (FPR) = .10
[Figure: The Test Validation Summary: proportion positive scores (0 to 1) plotted against base rate of feigning (0 to 1), with the True Positive Rate and the False Positive Rate marked.]
REMINDER: Here is what we are working on: figuring out which positives in our clinic are true positives.
First, we estimate what the true positive rate of the cut score is. Then, we estimate what the false positive rate of the cut score is.
Let’s do that part now.
Then, we figure out what percentage of people in our sample are feigning.
Then we can get the ratio of the mixture of our true positive and false positives in all the positive scores in our clinic.
Mixed groups validation
Table for Computation of Test Characteristics
Base Rate of Malingering   Pr + Tests
1.0                        .80 (TPR = .8)
.8                         .66
.6                         .52
.4                         .38
.2                         .24
0                          .10 (FPR = .1)
(.8, .6, .4, .2 are mixed groups, not pure)
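Every row of the table follows from one linear rule: the proportion of positive scores is BR * TPR + (1 - BR) * FPR, which equals FPR + BR * (TPR - FPR). A minimal sketch that regenerates the column; `proportion_positive` is an illustrative name.

```python
def proportion_positive(base_rate: float, tpr: float, fpr: float) -> float:
    """Expected proportion of positive scores in a sample with the given base rate.

    Members with the condition score positive at rate TPR and members without it
    at rate FPR, so the sample rate is a weighted average of the two.
    """
    return base_rate * tpr + (1.0 - base_rate) * fpr

TPR, FPR = 0.80, 0.10
for br in (1.0, 0.8, 0.6, 0.4, 0.2, 0.0):
    print(f"BR = {br:.1f}  Pr+ = {proportion_positive(br, TPR, FPR):.2f}")
```

Because the rule is linear, a straight line through the points of several mixed groups has intercept FPR (at BR = 0) and reaches TPR at BR = 1.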
[Figure: The Test Validation Summary: proportion positive scores plotted against base rate of malingering, with the True Positive Rate and the False Positive Rate marked.]
[Figure: The Test Validation Summary, annotated.]
When BR = 0, 10% of scores are positive, all false positives.
When BR = 1, 80% of scores are positive, all true positives.
When 0 < BR < 1, positive scores are some mixture of true positives and false positives. That mixture is easily discernible.
[Figure: proportion positive scores plotted against base rate of Population A in sample (both axes 0 to 1); FPR = .16, TPR = .50.]
Movement along this line from left to right represents increasing rate of Population A and increasing rate of positive scores.
When we say FPR = .16 and TPR = .50, what we’re saying is that, no matter what samples we test, we expect to see no fewer than 16% positive scores and no more than 50% positive scores.
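This bound gives a quick sanity check on published rates, and it explains the opening puzzle: with SIRS sensitivity .485 and specificity .995, no sample should show 68% positive scores. A minimal sketch that ignores sampling error (a small sample can stray slightly outside the bounds by chance, though 68% against a ceiling of 48.5% in 131 cases is far outside); the helper names are illustrative.

```python
def positive_rate_bounds(tpr: float, fpr: float) -> tuple[float, float]:
    """Expected proportion of positive scores spans [FPR, TPR] as BR goes from 0 to 1."""
    return (min(tpr, fpr), max(tpr, fpr))

def within_bounds(observed_pps: float, tpr: float, fpr: float) -> bool:
    """True if an observed proportion of positive scores is attainable at some BR."""
    low, high = positive_rate_bounds(tpr, fpr)
    return low <= observed_pps <= high

# Population A example: FPR = .16, TPR = .50.
print(within_bounds(0.30, tpr=0.50, fpr=0.16))   # True
# Opening SIRS example: sensitivity .485, specificity .995 (FPR = .005),
# but 68% of 131 defendants were classified as feigning.
print(within_bounds(0.68, tpr=0.485, fpr=0.005))  # False
```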
[Figure: The Test Validation Summary: proportion positive scores plotted against base rate of condition, with NPP and PPP curves; FPR = .10, TPR = .80.]
FPR is the proportion of positive scores obtained when BR = 0. TPR is the proportion of positive scores obtained when BR = 1. As the BR of the condition increases, we move along the solid straight line and the proportion of positive scores increases from FPR to TPR.
[Figure: the same Test Validation Summary (FPR = .10, TPR = .80) with NPP and PPP curves.]
The proportion of positive scores changes linearly with BR, and the mixture of positives moves from 0% true positives (BR = 0) to 100% true positives (BR = 1). But that mixture, PPP, does not change linearly: PPP changes in a curvilinear fashion.
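PPP and NPP follow from the same two rates plus BR (Bayes' rule): PPP = BR*TPR / [BR*TPR + (1 - BR)*FPR] and NPP = (1 - BR)(1 - FPR) / [(1 - BR)(1 - FPR) + BR(1 - TPR)]. A minimal sketch using TPR = .80 and FPR = .10; the function names are illustrative. Printing a few base rates shows the curvature: PPP is already about .89 at BR = .50.

```python
def ppp(base_rate: float, tpr: float, fpr: float) -> float:
    """Positive predictive power: share of positive scores that are true positives."""
    true_pos = base_rate * tpr
    false_pos = (1.0 - base_rate) * fpr
    return true_pos / (true_pos + false_pos)

def npp(base_rate: float, tpr: float, fpr: float) -> float:
    """Negative predictive power: share of negative scores that are true negatives."""
    true_neg = (1.0 - base_rate) * (1.0 - fpr)
    false_neg = base_rate * (1.0 - tpr)
    return true_neg / (true_neg + false_neg)

TPR, FPR = 0.80, 0.10
for br in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"BR = {br:.2f}  PPP = {ppp(br, TPR, FPR):.3f}  NPP = {npp(br, TPR, FPR):.3f}")
```

For any imperfect test (0 < FPR and TPR < 1), PPP runs from 0 at BR = 0 to 1 at BR = 1, and NPP runs the opposite way.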
[Figure: Test Validation Summary: proportion positive scores plotted against estimated base rate of malingering; FPR = .052, SE = .021; TPR = .777, SE = .061.]
[Figure: TOMM Test Validation Summary (no simulation studies): proportion positive scores plotted against estimated base rate of malingering; FPR = .056, SE = .025; TPR = .742, SE = .093.]
[Figure: PPP and NPP curves.]
For any imperfect test, PPP ranges from 0 to 1 as base rate ranges from 0 to 1, and NPP ranges from 0 to 1 as base rate ranges from 1 to 0.
Using MGV to estimate test diagnostic efficiencies of the Reliable Digit Span
Laurie Ragatz, PhD; Richard Frederick, PhD
What is Reliable Digit Span?
RDS is a symptom validity measure derived from Digit Span. The RDS value is the sum of the longest string of digits for which both trials are passed on forward Digit Span and the longest such string on backward Digit Span.
Researched cut scores include 5 or lower, 6 or lower, 7 or lower, or 8 or lower.
Reliable Digit Span Example:
Forward Digit Span
[Figure: forward Digit Span trials (digits 1 4 2 5 5 7 1 8 3 4 5 9 4 6 7 2 3 9), each trial marked Correct or Incorrect.]
Directions: Examinee recalls the numbers in the same order in which the examiner provided them.
Backward Digit Span
[Figure: backward Digit Span trials with example and correct answer (digits 1 2 7 4 5 3 9 8 2 4 2 1 4 7 9 3 5 4 2 8), each trial marked Correct or Incorrect.]
Directions: Examinee recalls the numbers in the reverse of the order in which the examiner provided them.
Reliable Digit Span: 4 + 3 = 7
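The scoring rule can be written as a small function. A minimal sketch, assuming each span length has two trials and ignoring administration and discontinue rules; the trial outcomes below are hypothetical, chosen only to reproduce the 4 + 3 = 7 result, and the function names are illustrative.

```python
def longest_reliable_span(trials: dict[int, tuple[bool, bool]]) -> int:
    """Longest span length at which BOTH trials were recalled correctly (0 if none)."""
    passed = [length for length, (first, second) in trials.items() if first and second]
    return max(passed, default=0)

def reliable_digit_span(forward: dict[int, tuple[bool, bool]],
                        backward: dict[int, tuple[bool, bool]]) -> int:
    """RDS = longest reliable forward span + longest reliable backward span."""
    return longest_reliable_span(forward) + longest_reliable_span(backward)

# Hypothetical trial outcomes: span length -> (trial 1 correct, trial 2 correct).
forward = {3: (True, True), 4: (True, True), 5: (True, False), 6: (False, False)}
backward = {2: (True, True), 3: (True, True), 4: (False, True)}
print(reliable_digit_span(forward, backward))  # 7
```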
(1) We found all available articles dealing with RDS and identified the cut scores investigated. We included simulator studies.
(2) Based on the authors’ decision about criterion group membership, we calculated the overall base rate of malingering in the study.
(3) We observed the overall rate of positive scores in the study at the identified cut score.
(4) We did not include any data for persons with mental retardation. The rate of positive scores among persons with mental retardation was exceedingly high for all cut scores.
Example: Smith (2010) reported 203 TOMMs at cut < 45.
Criterion group      Test score positive   Test score negative   Total
Is malingering       42                    21                    63
Is not malingering   15                    125                   140
Total                57                    146                   203
We have 63 malingerers in a sample of 203: BR = 63/203 = .31. We have 57 positive scores: proportion positive scores (PPS) = 57/203 = .28. For this study, we plot (BR, PPS) = (.31, .28), that is, x = .31, y = .28. Our n for WLS = 203.
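Turning a published 2x2 table into a plotting point needs only the row and column totals. A minimal sketch with the example's counts; `StudyPoint` and `study_point` are illustrative names. Note that the split of positives into true and false positives is never used, which is why summary data are enough.

```python
from dataclasses import dataclass

@dataclass
class StudyPoint:
    """One study's contribution to the mixed-groups plot."""
    base_rate: float            # x: proportion of the sample in the malingering group
    proportion_positive: float  # y: proportion of positive test scores
    n: int                      # weight for WLS

def study_point(mal_pos: int, mal_neg: int, not_pos: int, not_neg: int) -> StudyPoint:
    """Collapse a 2x2 criterion-by-test table into an (x, y, n) point."""
    n = mal_pos + mal_neg + not_pos + not_neg
    return StudyPoint(
        base_rate=(mal_pos + mal_neg) / n,
        proportion_positive=(mal_pos + not_pos) / n,
        n=n,
    )

smith = study_point(mal_pos=42, mal_neg=21, not_pos=15, not_neg=125)
print(f"x = {smith.base_rate:.2f}, y = {smith.proportion_positive:.2f}, n = {smith.n}")
# x = 0.31, y = 0.28, n = 203
```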
RDS = 5 or lower
Cut score = 5 or lower
Meyers & Volbrecht: N = 96, BR = .490, PPS = .052
Mathias, Greve, Bianchini et al. 2002: N = 54, BR = .444, PPS = .093
Etherton, Bianchini, Greve et al 2005: N = 157, BR = .223, PPS = .057
Etherton, Bianchini, Ciota, & Greve 2005: N = 60, BR = .333, PPS = .083
Axelrod, Fichtenberg, Millis, & Wertheimer, nd: N = 65, BR = .554, PPS = .215
Ylioga, Baird, Podell (2009): N = 62, BR = .532, PPS = .113
Harrison, Rosenblum, Currie (2010): N = 133, BR = .113, PPS = .008
Using weighted least squares (WLS) regression, with N as the weight, we regressed Proportion Positive Scores (PPS) on Base Rate (BR) to generate the Proportion Positive Score Line. We obtained a y-intercept of -.015 (all negative values are truncated to 0) and a slope of .265.
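The workshop does this step in Excel; the same weighted fit can be sketched in plain Python. A minimal sketch of closed-form WLS for a straight line with N as the weight, run on the seven (BR, PPS, N) triples listed for RDS = 5 or lower; `weighted_least_squares` is an illustrative name, and standard errors are not computed here.

```python
def weighted_least_squares(xs, ys, ws):
    """Fit y = a + b*x minimizing sum(w * (y - a - b*x)**2). Returns (a, b)."""
    sw = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / sw
    mean_y = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# RDS = 5 or lower: BR, PPS, and N for the seven studies.
br = [0.490, 0.444, 0.223, 0.333, 0.554, 0.532, 0.113]
pps = [0.052, 0.093, 0.057, 0.083, 0.215, 0.113, 0.008]
n = [96, 54, 157, 60, 65, 62, 133]

intercept, slope = weighted_least_squares(br, pps, n)
fpr = max(intercept, 0.0)  # fitted value at BR = 0, negatives truncated to 0
tpr = intercept + slope    # fitted value at BR = 1
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, FPR = {fpr:.3f}, TPR = {tpr:.3f}")
```

The fit lands close to the reported intercept (-.015) and slope (.265); the fitted value at BR = 1 is about .25.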
[Figure: scatterplot of the seven (BR, PPS) pairs for RDS = 5 or lower; these data were put into WLS to obtain the regression line characteristics.]
RDS: 5 or lower, FPR = 0, TPR = .25
RDS = 6 or lower
Cut score = 6 or lower
Duncan & Ausborn, 2002: N = 187, BR = .283, PPS = .230
Meyers & Volbrecht: N = 96, BR = .490, PPS = .094
Mathias, Greve, Bianchini et al. 2002: N = 54, BR = .444, PPS = .185
Etherton, Bianchini, Greve et al 2005: N = 157, BR = .223, PPS = .089
Strauss, Slick, Hunter, et al 2002: N = 74, BR = .459, PPS = .243
Etherton, Bianchini, Ciota, & Greve 2005: N = 60, BR = .333, PPS = .117
Axelrod, Fichtenberg, Millis, & Wertheimer, nd: N = 65, BR = .554, PPS = .354
Ylioga, Baird, Podell (2009): N = 62, BR = .532, PPS = .242
Harrison, Rosenblum, Currie (2010): N = 133, BR = .113, PPS = .045
Babikian, Boone, Lu, & Arnold: N = 154, BR = .429, PPS = .130
Greiffenstein & Baker (2008): N = 87, BR = .775, PPS = .368
y-intercept = .015, slope = .419
RDS: 6 or lower, FPR = .015, TPR = .434
RDS = 7 or lower
N Study Duncan & Ausborn, 2002 Meyers & Volbrecht Mathias, Greve, Bianchini et al. 2002 Etherton, Bianchini, Greve et al 2005 Inman & Berry, 2002 Etherton, Bianchini, Ciota, & Greve 2005 Axelrod, Fichtenberg, Millis, & Wertheimer, nd Ruocco, Swirsky-Sacchetti, Chute et al., 2007 Merten, Bossink, Schmand (first) Ylioga, Baird, Podell (2009) Harrison, Rosenblum, Currie (2010) Greiffenstein, Baker, Gola (1994) Babikian, Boone, Lu, & Arnold Greiffenstein, Gola, Baker (1995) Greiffenstein & Baker (2008) 187 96 54 157 92 60 65 77 48 62 133 106 154 177 602 Cut score = 7 or lower BR 0.283
PPS 0.394
0.490
0.444
0.223
0.478
0.260
0.333
0.270
0.130
0.333
0.554
0.041
0.500
0.532
0.113
0.406
0.429
0.384
0.492
0.133
0.554
0.338
0.458
0.452
0.083
0.396
0.234
0.582
0.419
y-intercept = .187, slope = .39
RDS: 7 or lower, FPR = .187, TPR = .618
RDS = 8 or lower
Cut score = 8 or lower
Meyers & Volbrecht: N = 96, BR = .490, PPS = .458
Mathias, Greve, Bianchini et al. 2002: N = 54, BR = .444, PPS = .500
Etherton, Bianchini, Greve et al 2005: N = 157, BR = .223, PPS = .490
Etherton, Bianchini, Ciota, & Greve 2005: N = 60, BR = .333, PPS = .217
Axelrod, Fichtenberg, Millis, & Wertheimer, nd: N = 65, BR = .554, PPS = .754
Ylioga, Baird, Podell (2009): N = 62, BR = .532, PPS = .565
Harrison, Rosenblum, Currie (2010): N = 133, BR = .113, PPS = .263
Greiffenstein, Baker, Gola (1994): N = 106, BR = .406, PPS = .557
Babikian, Boone, Lu, & Arnold: N = 154, BR = .429, PPS = .377
y-intercept = .236, slope = .588
RDS: 8 or lower, FPR = .236, TPR = .824
As we move from a cut score of 5 or lower to 6 or lower, we obtain substantial improvement in TPR estimate with little cost in FPR increase.
Our choice for best cut score for RDS:
RDS: 6 or lower, FPR = .015, TPR = .434
Cut score    FPR (SE)      TPR (SE)
5 or lower   0 (.038)      .25 (.07)
6 or lower   .015 (.053)   .434 (.082)
7 or lower   .187 (.102)   .618 (.155)
8 or lower   .236 (.112)   .824 (.190)
By using WLS regression, we can obtain standard errors of our estimates of FPR and TPR.
So, new researchers can test hypotheses about parametric values of FPR and TPR.
Overcoming limiting factors of “known groups” validation in determining test score sensitivity and specificity
Summary: 1. The TVS and MGV allow powerful research into existing published data sets. Summary data are used.
2. Understanding of parametric values of TPR and FPR is facilitated when researchers publish results for a variety of cut scores that should be considered. A frequency distribution would be ideal, for example, for RDS:
0 1 2 3 4 1 3
n
5 0 0
RDS   5    6    7    8    9    10   11   12   13   14
n     7    51   68   79   98   88   74   61   32   12
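A published frequency distribution lets any later researcher compute the proportion of positive scores at every possible cut score, which is exactly the y value MGV needs. A minimal sketch; the frequencies below are hypothetical, not the slide's data, and `proportion_at_or_below` is an illustrative name.

```python
def proportion_at_or_below(freq: dict[int, int]) -> dict[int, float]:
    """For each possible cut score c, the proportion of scores <= c."""
    total = sum(freq.values())
    running = 0
    result = {}
    for score in sorted(freq):
        running += freq[score]
        result[score] = running / total
    return result

# Hypothetical RDS frequencies for one study group (not from the slide).
freq = {4: 2, 5: 3, 6: 5, 7: 10, 8: 20, 9: 25, 10: 20, 11: 15}
for cut, rate in proportion_at_or_below(freq).items():
    print(f"RDS <= {cut}: PPS = {rate:.2f}")
```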
3. Combining studies in this way allows us to generate stable values of TPR and FPR with SEs so that new research can test those values.
4. Researchers should focus on the basis for estimating BRs in their research groups. All research estimating FPR and TPR is vulnerable to error when the purity of research groups is overestimated. Working toward a reliable estimate of mixed-group base rates will facilitate better validation studies.
Reliably estimate local base rates of feigning for proper allocation of sensitivity and specificity information
How can the Test Validation Summary help me determine my local BR?
1. Get the best estimate of the test FPR and TPR for a certain test score.
2. Find the proportion of test scores in your sample that are positive scores.
[Figure: The Test Validation Summary: proportion positive scores plotted against base rate of condition, with NPP and PPP curves; FPR = .10, TPR = .80.]
You review your records and determine that 40% of your patients have a positive score when the score has FPR = .10 and TPR = .80.
From the TVS, you see that this corresponds to a BR = .43. You see that in your clinic, the PPP for a positive score is .86 and the NPP for a negative score is .86.
From a sample, observe rate of positive scores.
Use TVS to estimate condition BR in that sample, PPP and NPP for that BR.
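Reading the TVS amounts to inverting the straight line, BR = (PPS - FPR) / (TPR - FPR), and then applying Bayes' rule at that BR. A minimal sketch; the clipping to 0..1 is an added assumption for observed rates that fall outside [FPR, TPR], and the function names are illustrative.

```python
def estimate_base_rate(pps: float, tpr: float, fpr: float) -> float:
    """Invert PPS = FPR + BR * (TPR - FPR), clipped to the range 0..1."""
    br = (pps - fpr) / (tpr - fpr)
    return min(max(br, 0.0), 1.0)

def local_predictive_power(pps: float, tpr: float, fpr: float) -> tuple[float, float, float]:
    """Return (BR, PPP, NPP) for a sample with an observed proportion of positive scores."""
    br = estimate_base_rate(pps, tpr, fpr)
    true_pos, false_pos = br * tpr, (1.0 - br) * fpr
    true_neg, false_neg = (1.0 - br) * (1.0 - fpr), br * (1.0 - tpr)
    return br, true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)

br, ppp, npp = local_predictive_power(0.40, tpr=0.80, fpr=0.10)
print(f"BR = {br:.2f}, PPP = {ppp:.2f}, NPP = {npp:.2f}")  # BR = 0.43, PPP = 0.86, NPP = 0.86
```

The same inversion with the TOMM values (FPR = .056, TPR = .742) turns proportions of positive scores of .25 and .11 into BRs of about .28 and .08.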
527 criminal defendants who took RMT and VIP concurrently Rate of positive scores in this sample was .113
PPP = .814
1 – NPP = .077
Beth A. Caillouet, Bernice A. Marcopulos, Jesse G. Brand, Julie Ann Kent, & Richard I. Frederick
Question: What are the BRs of malingering in the two samples?
Information needed: Estimates of TOMM FPR and TPR. From the TOMM TVS, we get FPR = .056, TPR = .742.
Sample 1: Secondary gain present. Proportion positive scores = 55/220 = .25.
Sample 2: Secondary gain absent. Proportion positive scores = 34/299 = .11.
Use TOMM TVS to estimate BR of each sample.
When PPS = .25, BR = .28.
When PPS = .11, BR = .08.
Defensibly choose symptom validity cut scores that are ideally suited for their local base rates
M-FAST
Cut score: M-FAST > 5 is positive; M-FAST < 6 is negative. TPR = .93, FPR = .17. BR malingering = 35%, N = 86.
Step 1. Split N by the base rate: Malingering = .35(86) = 30; Genuine = .65(86) = 56.
Step 2. Apply TPR and FPR: Malingering, M-FAST > 5 = .93(30) = 28 and M-FAST < 6 = 2; Genuine, M-FAST > 5 = .17(56) = 9.52 and M-FAST < 6 = 46.48.
Step 3. Round to whole people: Genuine, M-FAST > 5 = 10 and M-FAST < 6 = 46, so FPR = 10/56 = .18.
BR malingering = .35
                M-FAST > 5   M-FAST < 6   Total
Malingering     28           2            30
Genuine         10           46           56
Total           38           48           86
TPR = 28/30 = .93; FPR = 10/56 = .18
PPP = 28/38 = .737; NPP = 46/48 = .958
BR malingering = .05
                M-FAST > 5   M-FAST < 6   Total
Malingering     28           2            30
Genuine         103          467          570
Total           131          469          600
TPR = 28/30 = .93; FPR = 10/56 = .18
PPP = 28/131 = .214; NPP = 467/469 = .996
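The table arithmetic can be scripted. A minimal sketch, assuming whole-person rounding at each step as on the slides (Python's built-in `round` rounds exact .5 ties to even, which could differ from hand rounding in tie cases); `expected_table` and `predictive_power` are illustrative names. The second run uses FPR = .18, the rounded value 10/56.

```python
def expected_table(n: int, base_rate: float, tpr: float, fpr: float) -> dict[str, int]:
    """Expected 2x2 counts for a sample of n, rounded to whole people."""
    malingering = round(n * base_rate)
    genuine = n - malingering
    true_pos = round(malingering * tpr)
    false_pos = round(genuine * fpr)
    return {"TP": true_pos, "FN": malingering - true_pos,
            "FP": false_pos, "TN": genuine - false_pos}

def predictive_power(table: dict[str, int]) -> tuple[float, float]:
    """PPP = TP / all positives; NPP = TN / all negatives."""
    ppp = table["TP"] / (table["TP"] + table["FP"])
    npp = table["TN"] / (table["TN"] + table["FN"])
    return ppp, npp

# M-FAST > 5: TPR = .93; FPR = .17 as published, .18 after rounding (10/56).
for n, base_rate, fpr in ((86, 0.35, 0.17), (600, 0.05, 0.18)):
    table = expected_table(n, base_rate, tpr=0.93, fpr=fpr)
    ppp, npp = predictive_power(table)
    print(f"N = {n}, BR = {base_rate}: {table}, PPP = {ppp:.3f}, NPP = {npp:.3f}")
```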
Test validation summary for M-FAST cut score recommended by test manual.
PPP does not even reach 50% correct decisions until BR > .16.
At the recommended cut score (M-FAST > 5), FPR is very high: FPR = .17, TPR = .93.
At BR = .05, PPP does not exceed .50 until the cut score is adjusted to > 9 on the M-FAST.
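Both claims come from setting PPP = .50, that is, true positives equal to false positives: BR * TPR = (1 - BR) * FPR. A minimal sketch; the TPR of the M-FAST > 9 cut score is not given here, so the second helper takes TPR as an input, and both function names are illustrative.

```python
def break_even_base_rate(tpr: float, fpr: float) -> float:
    """BR at which PPP = .50: solves BR * TPR = (1 - BR) * FPR."""
    return fpr / (tpr + fpr)

def max_fpr_for_ppp_half(tpr: float, base_rate: float) -> float:
    """Largest FPR for which PPP still reaches .50 at a given BR and TPR."""
    return base_rate * tpr / (1.0 - base_rate)

print(f"{break_even_base_rate(0.93, 0.17):.3f}")  # 0.155 with the published FPR
print(f"{break_even_base_rate(0.93, 0.18):.3f}")  # 0.162 with FPR = 10/56 rounded
print(f"{max_fpr_for_ppp_half(0.93, 0.05):.3f}")  # FPR ceiling at BR = .05 if TPR stayed at .93
```

At BR = .05 the FPR has to fall below about 5% of TPR, which is why a much more conservative cut score is needed.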
Combining information from local base rate and multiple symptom validity tests
You can get estimates of PPP and NPP for the sample you work with—IF you can reliably estimate the BR.
737 defendants were administered:
Rey 15 Item Memory Test (RMT): memorize and reproduce 15 items (a very easy test). Score is items reproduced (0 to 15).
Word Recognition Test (WRT): memorize 15 words, then identify those 15 and correctly reject 15 others from a list of 30. Score is number of hits and correct rejections (0 to 30).
RMT was validated using MGV with clinical probability judgments.
RMT < 9: FPR = .025, TPR = .574 (Frederick & Bowden, 2009)
We found 726 defendants who completed BOTH RMT and WRT.
81/726 failed the RMT: proportion positive scores = .111.
From the TVS, BR = .16, PPP = .814, NPP = .923.
If PPP = .814, then in this sample, the probability of feigning if RMT is positive, is .814.
If NPP = .923, then in this sample, the probability of feigning if RMT is negative is .077, or 1 - .923.
To conduct MGV, we sampled from two groups: 1. The 645 individuals who passed the RMT—had a negative score.
2. The 81 individuals who failed the RMT—had a positive score.
Example of sampling:
645 individuals with negative scores, p(mal) = .077
81 individuals with positive scores, p(mal) = .814
Sample n = 360 from the negatives and n = 40 from the positives: 400 cases, 10% failures, 90% passes. Overall p(mal) = (40*.814 + 360*.077) / 400 = .151. Sample 25 times; plot x = .151, y = observed rate of positive WRT scores; n for WLS = 400.
Group   Ratio failures   Failures   Passes   N     BR      Samples
1       0                0          645      645   .077    1
2       .1               40         360      400   .1507   25
3       .2               40         160      200   .2244   25
4       .3               40         93       133   .2981   25
5       .4               40         60       100   .3718   25
6       .5               40         40       80    .4455   25
7       .6               40         27       67    .5192   25
8       .7               40         17       57    .5929   25
9       .8               40         10       50    .6666   25
10      .9               40         4        44    .7403   25
11      1                81         0        81    .814    1
For each sample, BR was pre-estimated. Then we observed the rate of positive WRT scores at each potential cut score.
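Each group's BR is simply the average of its members' probabilities of malingering. A minimal sketch using the RMT-based probabilities (.814 for failures, .077 for passes); the helper name is illustrative. Groups whose actual failure ratio is not exactly the nominal one (for example 40/133) give values a few ten-thousandths away from the table, which appears to use the nominal ratio.

```python
P_MAL_FAIL = 0.814  # p(malingering | RMT positive), from the TVS for RMT < 9
P_MAL_PASS = 0.077  # p(malingering | RMT negative) = 1 - NPP

def mixed_group_base_rate(n_fail: int, n_pass: int) -> float:
    """Expected proportion malingering in a group built from RMT failures and passes."""
    return (n_fail * P_MAL_FAIL + n_pass * P_MAL_PASS) / (n_fail + n_pass)

groups = [(0, 645), (40, 360), (40, 160), (40, 93), (40, 60), (40, 40),
          (40, 27), (40, 17), (40, 10), (40, 4), (81, 0)]
for n_fail, n_pass in groups:
    print(n_fail + n_pass, round(mixed_group_base_rate(n_fail, n_pass), 4))
```

In the study itself, each of groups 2 through 10 was drawn 25 times from the pools of 645 passes and 81 failures, and the observed rate of positive WRT scores in each draw was paired with that group's BR.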
Word Recognition Test (WRT) Range 4 to 30, Mean = 23.2
Within group of RMT < 9, mean = 18.7
Within group of RMT > 8, mean = 23.8
Word Recognition Test (WRT)
For every potential cut score of the WRT (4-30), we plotted all (x, y) pairs obtained from sampling. We performed WLS to obtain the FPR and TPR estimates at every potential cut score.
We plotted the FPR and TPR estimates at every potential cut score to generate the ROC curve.
AUC = .905, SE = .012, 95% CI for AUC = .881-.930.
Best cut scores: LTE 18 (TPR = .563, FPR = .034) LTE 19 (TPR = .620, FPR = .066)
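The section does not say how the AUC was computed. One common choice, named here as a swapped-in technique rather than the authors' method, is the trapezoidal rule over the (FPR, TPR) points, anchored at (0, 0) and (1, 1). A minimal sketch with hypothetical points (not the WRT estimates); it assumes the points trace a monotone curve, and its SE would need a separate method.

```python
def trapezoid_auc(points: list[tuple[float, float]]) -> float:
    """Area under an ROC curve from (FPR, TPR) points, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid between neighboring cut scores
    return area

# Hypothetical (FPR, TPR) estimates at three cut scores, not the WRT data.
roc = [(0.05, 0.55), (0.10, 0.70), (0.30, 0.90)]
print(f"AUC = {trapezoid_auc(roc):.3f}")
```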
[Figures: Test Validation Summary for WRT LTE 18 (proportion positive scores plotted against base rate); ROC curve for the Word Recognition Test (WRT), plotting TPR against FPR.]
Summary: 1. We can use tests to form mixed groups for validation.
2. The best estimates of FPR and TPR for a test cut score allow us to estimate PPP and NPP at our sample BR.
3. Instead of “known groups” design (which is misleading), we do not presume to know (or care) about the status of any individual. We assign individuals “probabilities of having the condition” based on their test score.
4. Mixed groups have an overall “probability of having the condition,” which is the average of the individual probabilities.
5. We do not need to be certain about group memberships. We gain much flexibility by working with probabilities of having the condition vs. certainties of having the condition.
Another example
Dawes 1967 showed that valid probability judgments are excellent base rate indicators. His work was substantiated in Frederick 2000 and Frederick and Bowden 2009.
To conduct MGV, we formed groups of defendants for whom individual ratings of the likelihood of malingering psychosis had been generated by forensic psychologists before any testing took place.
The BR of malingered psychosis for each group was then the mean of the probability rating. If each member of the group had been rated as 10% likely to feign psychosis, then the BR of the group was estimated to be 10%.
We then observed the hit rate (proportion positive scores) for the groups for a variety of F-family indicators of feigning on the MMPI-2 and MMPI-2-RF.
We formed 15 groups of 30 individuals. For each group, we had a static base rate, which was the mean of the probability judgments assigned before testing.
Within each group, we iteratively observed the hit rate of positive F-family indicators at each potential cut score. Using the BR estimate and the proportion positive scores at each potential cut score, we performed WLS to generate estimates of FPR and TPR.
From these estimates, we generated ROC curves.
15 groups, 30 defendants in each group, 450 defendants Each defendant rated from 0 to 100 before testing, with respect to likelihood he would feign psychosis.
Groups were formed after first sorting individuals by ratings, from lowest to highest.
Mean ratings of groups (each group, n = 30): 0, 0, 1.2, 4.2, 5.0, 5.0, 5.0, 5.0, 8.1, 10, 14.5, 22.2, 30.3, 45.7, 72.3
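The grouping step can be sketched as: sort the 0-100 ratings, cut them into consecutive groups, and take each group's mean rating divided by 100 as its BR. A minimal sketch with hypothetical ratings for six defendants (the 450 individual ratings are not given); `rating_group_base_rates` is an illustrative name. Averaging the 15 published group means, which is valid because every group has n = 30, gives an overall sample BR of about .15.

```python
from statistics import mean

def rating_group_base_rates(ratings: list[float], group_size: int) -> list[float]:
    """Sort 0-100 ratings, cut into consecutive groups, return each group's BR (mean / 100)."""
    ordered = sorted(ratings)
    return [mean(ordered[i:i + group_size]) / 100.0
            for i in range(0, len(ordered), group_size)]

# Hypothetical ratings for six defendants, grouped in threes.
print(rating_group_base_rates([50, 0, 10, 20, 90, 30], group_size=3))

# The 15 published group means (equal n = 30) average to the overall sample BR.
group_means = [0, 0, 1.2, 4.2, 5.0, 5.0, 5.0, 5.0, 8.1, 10, 14.5, 22.2, 30.3, 45.7, 72.3]
print(f"overall BR = {mean(group_means) / 100:.3f}")  # overall BR = 0.152
```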
Rates of positive F-family scores at each potential cut observed.
Scale             AUC    SE     95% CI
F                 .904   .015   .874-.933
Fp                .870   .018   .834-.906
Fp (no L items)   .905   .015   .877-.934
F-r               .940   .011   .919-.962
Fp-r              .926   .013   .901-.950
Estimates by Nicholson, Mouton, Bagby, Buis, Peterson, and Buigas (1998): AUCs and SEs: F (.929, .021); Fp (.885, .027)
Scale             Cutoff   FPR    TPR
F                 GTE 28   .043   .635
Fp                GTE 8    .054   .484
Fp (no L items)   GTE 7    .055   .537
F-r               GTE 20   .050   .640
Fp-r              GTE 8    .055   .652
Summary: 1. Base rate estimates built only from clinicians' judgments of the likelihood of feigning, made before testing, did not produce random results. We can therefore assume that the mean probability judgments were effective base rate estimates.
2. Our estimates for F and Fp are consistent with those from a large, well-validated analysis.
3. In this study, MMPI-2-RF indicators have higher mean AUC and lower SE than their MMPI-2 counterparts.
F, cutoff GTE 28: FPR = .043, TPR = .635
Combine information about F with the SIRS-2
[Table: frequency distribution of F raw scores (frequency, percent, valid percent, cumulative percent) for the 131 defendants who took the MMPI and the SIRS]
52.7% of cases score 27 or lower; 47.3% score 28 or higher. What is the base rate of feigned psychopathology?
F, cutoff GTE 28: FPR = .043, TPR = .635
What we say: Within our sample of 131 defendants, the BR of feigned psychopathology is .73 (NOT .473, the proportion of positive scores). At BR = .73, the PPP of F GTE 28 is .976.
At BR = .73, the NPP of F GTE 28 is .492, so p(feigning | F LTE 27) is still .508. (Remember, they are being given the SIRS for a reason.)
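These numbers can be reproduced with two short formulas. The sketch assumes the BR is recovered by inverting the mixed-group relation p = FPR + BR(TPR − FPR) for the observed proportion positive (.473), and that PPP and NPP follow from Bayes' theorem; the function names are hypothetical.

```python
def estimate_base_rate(p_positive, fpr, tpr):
    """Invert p = FPR + BR * (TPR - FPR) to estimate the base rate."""
    return (p_positive - fpr) / (tpr - fpr)


def predictive_values(br, fpr, tpr):
    """Return (PPP, NPP) at base rate `br` via Bayes' theorem."""
    true_pos = br * tpr
    false_pos = (1 - br) * fpr
    true_neg = (1 - br) * (1 - fpr)
    false_neg = br * (1 - tpr)
    ppp = true_pos / (true_pos + false_pos)
    npp = true_neg / (true_neg + false_neg)
    return ppp, npp


FPR, TPR = 0.043, 0.635  # F GTE 28
p_pos = 0.473            # 47.3% of the 131 defendants scored 28 or higher

br = estimate_base_rate(p_pos, FPR, TPR)
ppp, npp = predictive_values(round(br, 2), FPR, TPR)  # use BR = .73, as stated
print(f"BR = {br:.3f}")
print(f"PPP = {ppp:.3f}, NPP = {npp:.3f}, 1 - NPP = {1 - npp:.3f}")
```

The inversion gives BR ≈ .726, and at BR = .73 the sketch returns PPP ≈ .976 and NPP ≈ .492, matching the values stated for F GTE 28.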
[Figure panels: F > 27 and F < 28; annotation: NPP about .66]
Application of MGV to a CGV estimation of FPR and TPR
Greve, Bianchini, Love, Brennan, & Heinly (2006) used formal criteria for malingering (the Slick criteria) to articulate six separate groups with increasing base rates of malingering in order to validate the MMPI-2 Fake Bad Scale (FBS):
1. No incentive (no evidence of external incentive and no test performance suggestive of malingering; n = 18, mean FBS = 15.4)
2. Incentive (external incentive, but no test performance suggestive of malingering; n = 79, mean FBS = 19.5)
3. Suspect (external incentive and at least one indicator suggestive of malingering; n = 66, mean FBS = 22.7)
4. Statistically Likely (external incentive; at least two indicators suggestive of malingering; n = 51, mean FBS = 22.8)
5. Probable (external incentive; strong indicators of malingering; n = 31, mean FBS = 26.9)
6. Definite (external incentive; very strong indicators of malingering; n = 14, mean FBS = 29.8)
Even though it is clear that BR Definite > BR Probable > BR Statistically Likely > BR Suspect > BR Incentive Only > BR No Incentive, to conduct “known” groups validation they were required to ignore this obvious circumstance and to define:
BR No Incentive = BR Incentive Only = 0
BR Statistically Likely = BR Probable = BR Definite = 1.0
They also had to drop all participants classified as Suspect, which yielded the following ROC.
FBS ROC generated by “Known” groups validation by Greve & Bianchini
If we had estimates of the BR for each of the subgroups formed by Greve and Bianchini, we could use MGV to estimate FPR and TPR for each potential cut score.
We have our stable estimate of TOMM FPR and TPR
[Figure: TOMM, no simulation studies. Results plotted against estimated base rate of malingering; FPR = .056 (SE = .025), TPR = .742 (SE = .093)]
We can get estimates of BRs for those groups from other work by Greve & Bianchini.
They formed similar groups using the Slick criteria to investigate the TOMM.
We can use the proportion of positive TOMMs in each of these subgroups to estimate the BRs in each of them.
From Greve, Bianchini, & Doane (2006), proportion of positive TOMM scores and the resulting estimated BR of malingering:
No Incentive: proportion positive = 0; estimated BR = 0
Incentive Only: proportion positive = .05; estimated BR = 0
Suspect: proportion positive = .20; estimated BR = .21
Probable: proportion positive = .47; estimated BR = .633
Definite: proportion positive = .78; estimated BR = 1
[Figure: Test Validation Summary for the TOMM (FPR = .056, TPR = .742) across base rates of malingering from 0 to 1.00, with the five groups marked]
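One way to obtain such estimates is sketched below. It assumes the same inversion of p = FPR + BR(TPR − FPR) used in MGV, with impossible values clipped to the range 0 to 1 (our assumption about how the zeros and the 1 arose); the function name is hypothetical.

```python
FPR, TPR = 0.056, 0.742  # stable TOMM estimates

# Proportion of positive TOMM scores per group (Greve, Bianchini, & Doane, 2006)
groups = {
    "No Incentive": 0.00,
    "Incentive Only": 0.05,
    "Suspect": 0.20,
    "Probable": 0.47,
    "Definite": 0.78,
}


def estimate_base_rate(p_positive, fpr, tpr):
    """Invert p = FPR + BR * (TPR - FPR), clipping the result to [0, 1]."""
    raw = (p_positive - fpr) / (tpr - fpr)
    return min(max(raw, 0.0), 1.0)


estimates = {name: estimate_base_rate(p, FPR, TPR) for name, p in groups.items()}
for name, br in estimates.items():
    print(f"{name}: BR = {br:.3f}")
```

With the rounded proportions shown, this gives 0, 0, about .21, and 1 for four of the groups, but about .60 rather than .633 for Probable, so the slide's value may rest on unrounded or different inputs.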
We take these BR estimates and reapply them to the Greve & Bianchini FBS data.
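Example of MGV for the FBS based on BR estimates for the Greve & Bianchini groups established by the Slick criteria (positive = FBS > 27):
No Incentive: BR = 0; proportion positive = .11; n = 18
Incentive Only: BR = 0; proportion positive = .09; n = 79
Suspect: BR = .21; proportion positive = .23; n = 66
Probable: BR = .633; proportion positive = .52; n = 31
Definite: BR = 1; proportion positive = .79; n = 14
For FBS > 27, using WLS regression, FPR = .091 and TPR = .773.
(For WLS, n is the weighting variable.)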
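The WLS step can be sketched in plain Python: regress the proportion positive on the group BR, weighting each group by n, then read FPR from the intercept and TPR from intercept plus slope. The helper name is hypothetical, and this closed-form fit is our sketch of WLS, not the spreadsheet used in the workshop.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# FBS > 27 by group (No Incentive, Incentive Only, Suspect, Probable, Definite)
base_rates = [0.0, 0.0, 0.21, 0.633, 1.0]
prop_positive = [0.11, 0.09, 0.23, 0.52, 0.79]
n = [18, 79, 66, 31, 14]

intercept, slope = wls_line(base_rates, prop_positive, n)
fpr = intercept          # predicted proportion positive at BR = 0
tpr = intercept + slope  # predicted proportion positive at BR = 1
print(f"FPR = {fpr:.3f}, TPR = {tpr:.3f}")
```

With the rounded table values, the sketch gives FPR ≈ .090 and TPR ≈ .778, close to the reported .091 and .773; the small gap may reflect unrounded inputs.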
Evaluate constructs that underlie symptom validity tests
[Figure: Rey 15-Item Test, 10 clinical studies (no simulators; all clinical data), plotted against estimated base rate of malingering; FPR = .054 (SE = .037), TPR = .570 (SE = .119)]
RMT validated using MGV with clinical probability judgments.
FPR = .025
TPR = .574
Frederick & Bowden, 2009
CI, TPR = .574, SE = .044
We will generate a Test Validation Summary (TVS) based on these values and find PPP and 1 – NPP to estimate the probability of bad intent represented by an RMT score.
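Response styles by effort and intention:
High effort, intends to respond correctly: Compliant/Valid
Low effort, intends to respond correctly: Inconsistent/Invalid
High effort, does not intend to respond correctly: Suppressed/Invalid
Low effort, does not intend to respond correctly: Irrelevant/Invalid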
Validity Indicator Profile
VIP Verbal subtest items (target word, then the two response options):
• Easy: Baby (Drink / Infant)
• Moderate: People (Ally / Folk)
• Difficult: Nimiety (Conceit / Surfeit)
[Figure, three panels: running mean (0.0 to 1.0) plotted by serial position (items ordered from easy to difficult, positions 0 to 90). Panel 1 shows Compliant and Irrelevant profiles across Sectors 1 to 3. Panel 2 labels Sector 1 “Not guessing, knowledgeable responding,” Sector 2 “Guessing is imminent,” and Sector 3 “Guessing.” Panel 3 repeats the axes without further labels.]
527 criminal defendants took the RMT and the VIP concurrently. The rate of positive RMT scores in this sample was .113.
PPP = .814
1 – NPP = .077
[Figure: scatterplot against probability of feigned cognitive impairment given by RMT < 9; points represent groups of 25; FPR = 0 (SE = .028), TPR = .859 (SE = .130)]
Here we are matching VIP categories to the construct most likely captured by the VIP.
We sorted defendants by clinical ratings of malingering, then formed 20 groups of 25 and one group of 27, for 527 defendants.
The BR of .42 estimated for this group is the mean of its members' individual probabilities: PPP for members with positive RMT scores and (1 – NPP) for members with negative RMT scores.
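The group BR described above can be sketched as follows. It assumes a positive RMT score means RMT < 9 (from the figure's axis label) and uses the sample's PPP = .814 and 1 – NPP = .077; the function name and the example group are hypothetical, not one of the 21 study groups.

```python
PPP = 0.814            # p(feigning | positive RMT) in this sample
ONE_MINUS_NPP = 0.077  # p(feigning | negative RMT) in this sample


def group_base_rate(rmt_positive):
    """Mean individual probability of feigning for one group.

    rmt_positive: list of booleans, True if that member's RMT score
    was positive (assumed here to mean RMT < 9).
    """
    probs = [PPP if pos else ONE_MINUS_NPP for pos in rmt_positive]
    return sum(probs) / len(probs)


# Hypothetical group of 25 with 10 positive RMT scores
example = [True] * 10 + [False] * 15
print(f"Estimated group BR = {group_base_rate(example):.3f}")
```

For this hypothetical group, the estimate is (10 × .814 + 15 × .077) / 25 ≈ .372.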
[Figure: same 21 subgroups, N = 527 defendants, plotted against probability of feigned cognitive impairment]
527 criminal defendants. VRIN was converted to a “probability of invalid responding” by dividing the VRIN raw score by 12.
VRIN raw scores > 12 were assigned p = 1.
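The conversion is a capped linear rescaling. A minimal sketch, with hypothetical function names; forming a group BR as the mean of members' probabilities is our assumption, by analogy with the RMT procedure above.

```python
def vrin_probability(raw_score):
    """Probability of invalid responding = VRIN raw score / 12, capped at 1."""
    return min(raw_score / 12, 1.0)


def group_invalid_rate(raw_scores):
    """Assumed group BR: mean of the members' VRIN-based probabilities."""
    probs = [vrin_probability(s) for s in raw_scores]
    return sum(probs) / len(probs)


print(vrin_probability(6), vrin_probability(15))
```

A raw score of 6 maps to .5, and any raw score of 12 or more maps to 1.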
We are interested in FPR and TPR for “Invalid”