Detecting Faking on Noncognitive Assessments Using Decision Trees


Brian Lukoff
Stanford University
October 13, 2006

Based on a draft paper that is joint work with
Eric Heggestad, Patrick Kyllonen, and
Richard Roberts



The decision tree method and its applications
to faking
Evaluating decision tree performance
Three studies evaluating the method
 Study 1: Low-stakes noncognitive assessments
 Study 2: Experimental data
 Study 3: Real-world selection

Implications and conclusions



A technique from machine learning for predicting an outcome
variable from (a possibly large number of) predictor variables
Outcome variable can be categorical (classification tree) or
continuous (regression tree)
The algorithm builds the decision tree from empirical data (a training set)
TRAINING SET

Day   Snowing?   Raining?   Method
1     yes        yes        drive
2     yes        no         drive
3     no         yes        drive
4     no         yes        walk
5     no         no         walk
6     no         no         walk
7     no         yes        drive

Decision tree built from the training set:
Is it snowing?
  Yes -> drive
  No  -> Is it raining?
           Yes -> drive
           No  -> walk
Not all cases are accounted for correctly by this tree
 Wrong decision on Day 4 (the tree predicts drive, but the actual method was walk)
 Need to choose variables that are predictive enough of the outcome
TEST SET

Applying the tree built from the training set to new data:

Day   Snowing?   Raining?   Method   Prediction
8     yes        yes        drive    drive
9     no         yes        walk     drive
10    no         yes        drive    drive
11    yes        no         drive    drive
12    no         no         walk     walk
13    no         no         walk     walk
14    yes        yes        drive    drive

Not all cases are predicted correctly
 Maybe the decision to drive or walk is determined by more than just the snow and rain?
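
As a concrete illustration, here is a minimal sketch of building and evaluating this toy tree in R with the rpart package (one commonly used implementation); the relaxed control settings are there only because the example data set is so small.

library(rpart)   # recursive partitioning: a common R implementation of decision trees

# Training set from the slides (days 1-7): travel method given weather.
train <- data.frame(
  snowing = c("yes", "yes", "no", "no", "no", "no", "no"),
  raining = c("yes", "no", "yes", "yes", "no", "no", "yes"),
  method  = c("drive", "drive", "drive", "walk", "walk", "walk", "drive"),
  stringsAsFactors = TRUE
)

# Test set from the slides (days 8-14).
test <- data.frame(
  snowing = c("yes", "no", "no", "yes", "no", "no", "yes"),
  raining = c("yes", "yes", "yes", "no", "no", "no", "yes"),
  method  = c("drive", "walk", "drive", "drive", "walk", "walk", "drive"),
  stringsAsFactors = TRUE
)

# Grow a classification tree; the control settings relax rpart's defaults,
# which otherwise expect far more observations before splitting.
fit <- rpart(method ~ snowing + raining, data = train, method = "class",
             control = rpart.control(minsplit = 2, minbucket = 1, cp = 0, xval = 0))

# Classify the test days and compare with what actually happened.
pred <- predict(fit, newdata = test, type = "class")
mean(as.character(pred) == as.character(test$method))   # proportion of test days predicted correctly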




Ease of interpretation
Simplicity of use
Flexibility in variable selection
Functionality to build decision trees readily
available in software (e.g., the R statistical
package)

Outcome variable = faking status (“faking” or
“honest”)
 Training set = an experimental data set where some participants were instructed to fake
 Training set = a data set where some respondents
are known to have faked

Outcome variable = lie scale score
 Training set = a data set where the target lie scale
was administered to some subjects
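
Both setups map naturally onto tree-building software. The sketch below uses rpart on simulated stand-in data; all column names (item1, ..., condition, lie_score) and the data-generating process are illustrative, not taken from any of the studies.

library(rpart)
set.seed(1)

n <- 200
# Simulated stand-in data: 10 Likert-type items (1-5), a faking/honest
# condition, and a lie scale score. Purely illustrative.
items <- as.data.frame(matrix(sample(1:5, n * 10, replace = TRUE), nrow = n))
names(items) <- paste0("item", 1:10)
condition <- factor(ifelse(items$item1 + rnorm(n) > 3.5, "faking", "honest"))
lie_score <- rowMeans(items[, 1:3]) + rnorm(n, sd = 0.5)

# (a) Categorical outcome: classification tree predicting faking status.
class_tree <- rpart(condition ~ ., data = cbind(items, condition), method = "class")

# (b) Continuous outcome: regression tree predicting the lie scale score.
reg_tree <- rpart(lie_score ~ ., data = cbind(items, lie_score), method = "anova")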


So far, have used individual item responses
only
Other possibilities (sketched below):
 Variance of item responses
 Number of item responses in the highest (or lowest) category
 Modal item response

Decision tree method permits some
sloppiness in variable selection
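
A small sketch (assuming a 1-5 response scale and an illustrative respondent-by-item matrix) of how such derived predictor variables might be computed:

set.seed(1)
# Hypothetical respondent-by-item matrix: 50 respondents, 20 items on a 1-5 scale.
items <- matrix(sample(1:5, 50 * 20, replace = TRUE), nrow = 50)

features <- data.frame(
  resp_var   = apply(items, 1, var),        # variance of each respondent's responses
  n_highest  = rowSums(items == 5),         # number of responses in the highest category
  n_lowest   = rowSums(items == 1),         # number of responses in the lowest category
  modal_resp = apply(items, 1, function(x)  # modal (most frequent) response
    as.integer(names(which.max(table(x)))))
)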

Classification trees (dichotomous outcome
case, e.g., predicting faking or not faking)
 Accuracy rate
 False positive rate
 Hit rate

Regression trees (continuous outcome case, e.g., predicting a lie scale score)
 Average absolute error
 Correlation between actual and predicted scores
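
These metrics are straightforward to compute from vectors of actual and predicted values; the sketch below uses made-up example vectors and treats "faking" as the positive class.

# Classification case: actual vs. predicted faking status.
actual    <- factor(c("faking", "honest", "faking", "honest", "honest", "faking"))
predicted <- factor(c("faking", "faking", "honest", "honest", "honest", "faking"))

accuracy <- mean(predicted == actual)                           # accuracy rate
fpr      <- mean(predicted[actual == "honest"] == "faking")     # false positive rate
hit_rate <- mean(predicted[actual == "faking"] == "faking")     # hit rate

# Continuous case: actual vs. predicted lie scale scores.
actual_score    <- c(1.2, 0.4, 2.1, 1.8, 0.9)
predicted_score <- c(1.0, 0.7, 1.6, 2.0, 1.1)

mean_abs_error <- mean(abs(predicted_score - actual_score))     # average absolute error
score_cor      <- cor(actual_score, predicted_score)            # actual-predicted correlation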


The algorithm can “overfit” the training data, so performance metrics computed on the training data are not indicative of future performance
Thus we will often partition the data:
 Training set (data used to build tree)
 Test set (data used to compute performance
metrics)
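
For instance, a random 70/30 split might look like the following sketch (simulated stand-in data again; the split proportion is arbitrary):

library(rpart)
set.seed(42)

# Simulated stand-in data: 8 items on a 1-5 scale plus a faking/honest label.
dat <- as.data.frame(matrix(sample(1:5, 300 * 8, replace = TRUE), ncol = 8))
names(dat) <- paste0("item", 1:8)
dat$condition <- factor(ifelse(dat$item1 + dat$item2 + rnorm(300) > 6, "faking", "honest"))

train_rows <- sample(nrow(dat), size = 0.7 * nrow(dat))   # 70% of rows for training
train <- dat[train_rows, ]
test  <- dat[-train_rows, ]

fit  <- rpart(condition ~ ., data = train, method = "class")   # tree built on training set only
pred <- predict(fit, newdata = test, type = "class")
mean(pred == test$condition)   # accuracy estimated on the held-out test set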


A single training/test split leaves a lot to the chance selection of the training and test sets
Instead, partition the data into k equal
subsets
 Use each subset as a test set for the tree trained
on the rest of the data
 Average the resulting performance metrics to get
better estimates of performance on new data

Here we will report cross-validation
estimates
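
A sketch of k-fold cross-validation with k = 10, on the same kind of simulated stand-in data as in the previous sketch; each fold is used once as the test set and the resulting accuracies are averaged.

library(rpart)
set.seed(42)

# Simulated stand-in data, as in the previous sketch.
dat <- as.data.frame(matrix(sample(1:5, 300 * 8, replace = TRUE), ncol = 8))
names(dat) <- paste0("item", 1:8)
dat$condition <- factor(ifelse(dat$item1 + dat$item2 + rnorm(300) > 6, "faking", "honest"))

k <- 10
fold <- sample(rep(1:k, length.out = nrow(dat)))   # assign each row to one of k folds

fold_accuracy <- sapply(1:k, function(i) {
  fit  <- rpart(condition ~ ., data = dat[fold != i, ], method = "class")
  pred <- predict(fit, newdata = dat[fold == i, ], type = "class")
  mean(pred == dat$condition[fold == i])
})

mean(fold_accuracy)   # cross-validation estimate of accuracy on new data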

Data sets
 Two sets of students (N = 431 and N = 824) who took a battery of
noncognitive assessments as well as two lie scales as part of a larger
study

Measures
 Predictor variables
▪ IPIP (“Big Five” personality measure) items
▪ Social Judgment Scale items
 Outcomes (lie scales)
▪ Overclaiming Questionnaire
▪ Balanced Inventory of Desirable Responding

Method
 Build regression trees to predict scores on each lie scale based on
students’ item responses



Varying performance, depending on the
items used for prediction and the lie scale
used as the outcome
Correlations between actual lie scale scores
and predicted scores ranged from -.02 to .49
Average prediction errors ranged from .74 to
.95 SD


Low-stakes setting: how much faking was
there to detect?
Nonexperimental data set: students with
high scores on the lie scales may or may not
have actually been faking

Data set
 An experimental data set of N = 590 students in two conditions
(“honest” and “faking”)

Measures
 Predictor variables
▪ IPIP (“Big Five” personality assessment) items

Method
 Build decision trees to classify students as honest or faking based on
their personality test item responses

Decision trees correctly classified students
into experimental condition with varying
success
 Accuracy rates of 56% to 71%
 False positive rates of 25% to 41%
 Hit rates of 52% to 68%

Two items on a 1-5
scale form a decision
tree:
 Item 19: “I always get
right to work”
 Item 107: “Do things at
the last minute”
(reversed)

Extreme values of
either one are
indicative of faking




Many successful trees utilized few item responses
Range of tree performance
Laboratory—not real-world—data
Although an experimental study, still don’t know:
 If students in the faking condition really faked
 If the degree to which they faked is indicative of how
people fake in an operational setting
 If any of the students in the honest condition faked

Data set
 N = 264 applicants for a job

Measures
 Predictor variables
▪ Achievement striving, assertiveness, dependability, extroversion, and
stress tolerance items of the revised KeyPoint Job Fit Assessment
 Outcome (lie scale)
▪ Candidness scale of the revised KeyPoint Job Fit Assessment

Method
 Build decision trees predicting the candidness (lie scale) score from
the other item responses


Correlations between actual and predicted
candidness (lie scale) scores ranged from .26
to .58
Average prediction errors ranged from .61 to
.78 SD

Example decision tree built from Achievement Striving items: items are on a 1-5 scale, where 5 indicates the highest level of Achievement Striving
 Note that most of the tree's tests are for extreme item responses



Similar methodology to Study 1, but better
results (e.g., stronger correlations)
Difference in results is likely due to the higher motivation to fake in this real-world, high-stakes setting


Wide variation in decision tree quality across groups of predictor variables (e.g., conscientiousness scale vs. openness scale)
Examining trees can give insight into the
structure of the assessment
Some decision trees in each study used only a small
number of items and achieved a moderate level of
accuracy
 Use decision trees for real-time faking detection on
computer-administered noncognitive assessments
 Real-time “warning” system
 Need to study how this changes the psychometric
properties of the assessment
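
One possible shape for such a warning mechanism, sketched under the assumption that a classification tree has already been fitted (e.g., with rpart, as above) and that the respondent's answers to the items the tree uses are available as a one-row data frame; the function name and message text are purely illustrative.

library(rpart)

# Hypothetical real-time check: run the respondent's answers through a
# fitted classification tree and show a warning if the predicted class
# is "faking". Returns TRUE when a warning was issued.
flag_if_faking <- function(tree, responses) {
  pred <- predict(tree, newdata = responses, type = "class")
  if (pred == "faking") {
    message("Please review your responses and answer as accurately as possible.")
    return(TRUE)
  }
  FALSE
}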





Address whether decision trees can be
effective in an operational setting—are
current decision trees accurate enough to
reduce faking?
Comparisons of decision tree faking/honest
classification with classifications from IRT
mixture models
Develop additional features to be used as
predictor variables
Explore other machine learning techniques