Psychometrics
Defined by Research
Goals of This Session
• Brief wrap up of brown bags
• Psychometrics defined through research
– Broad historical perspective
– Research framework
– The parallel universe concept
– Some current research here at Measured Progress
• Some concluding remarks
Psychometric Brown Bags
• All these brown bags have been
introductory in nature
• Eventually, these will be posted on our
company website
– Staff members
– Clients
– Teachers & parents
Psychometric Brown Bags
• We have covered a lot of ground
– Statistics, classical test theory, item response
theory
– Standard setting, equating, adaptive testing,
DIF, skills diagnosis
Psychometric Brown Bags
• If you found these talks interesting, let us
know
• Because our presentations have been
introductory, there is much more we could
present
We can really define psychometrics
from a variety of perspectives
• Historical
• Assessment program
• Analyzing data here at Measured Progress
• Research
Historical Perspective
• The history of psychometrics has deep roots at
the crossroads of psychology, physiology, and
philosophy
• Ultimately, these disciplines are trying to
better understand the human experience
• Psychometrics does this by quantifying
behavioral observations
Historical Perspective
• Early psychometricians focused primarily
on the quantification of intelligence
• Psychometricians have also worked
extensively on the application of
psychometric models to assess patients
within a clinical setting
Historical Perspective
• Psychometrics is ultimately a very broad
discipline
• Psychometrics is an example of the blending of
the social sciences with the quantitative
sciences
– Sociometrics
– Econometrics
Research in Psychometrics
• Because psychometrics is a broad
discipline, there are many national and
international research organizations and
societies
• This results in many
– Peer-reviewed journals
– Conferences
– Opportunities for research
Research Societies
• American Educational Research Association
• National Council on Measurement in
Education
• Psychometric Society
• American Psychological Association
• International Test Commission
• Society for Industrial and Organizational
Psychology
Psychometricians
at Other Organizations
• Again, because our discipline is broad,
psychometricians work in a variety of places:
– American Institute of Certified Public Accountants
– National Board of Medical Examiners
– Law School Admission Council
– The RAND Corporation
– Research Triangle Institute
Research at Other Organizations
• Research Agendas
– This approach tends to produce a laundry list of
ideas that are not well connected
• Products and Services
– This is a narrowly focused method with a specific goal
• Neither approach is resource friendly, and both
lead to research programs that are not
well orchestrated
Psychometric Research
at Measured Progress
• We wanted to come up with a different way
of organizing and conducting research
• Our approach is an attempt at:
– Connecting research projects in meaningful
ways
– Allowing product-based research to be done
in a cost-effective manner
– Connecting research with products
Psychometric Research
at Measured Progress
• This approach also allows for external
opportunities
– Interns
– Through other research institutes
  • Center for Assessment
  • Center for Advanced Studies in Measurement and Assessment
  • Center for Educational Research and Evaluation
  • The Research & Evaluation Methods Program
– Visiting Scholars
– Clients
Research Framework
• Because all assessment programs have some
common structure, any research project should
fit somewhere in that structure.
• Most research projects relate to more than one
area. Still, a framework with separately
delineated areas is helpful for organizing and
discussing such research.
Research Framework
• Design and Modeling
• Statistical Analyses
• Scoring and Reporting
Design and Modeling
• Included in this category is research having
to do with modeling the students, the
assessment tasks, the interaction of the
students with the tasks, or test-centered
research
• The focus is on the design or modeling of
the test as a whole
Design and Modeling
• Task modeling
• Student modeling
• Modeling Student-Task interaction
• Test-centered modeling research
– Test design
– Test assembly
Statistical Analyses
• The focus is on statistics used to evaluate the
individual assessment tasks and the overall
assessment instrument with respect to the
psychometric model applied to the test data.
• This includes research on the calibration of
psychometric models, model fit analyses,
estimation of reliability, and validity
analyses.
Statistical Analyses
• Calibration and ability estimation
• Interpretation of estimated parameters
– item parameters and ability distribution
• Model fit
• Reliability and Generalizability
• Validity
– internal and external
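As one concrete illustration of the reliability item above, here is a minimal sketch of coefficient alpha (Cronbach's alpha), a common internal-consistency estimate from classical test theory. It is offered only as an example; the slides do not say which reliability estimate is used in practice. The function name `cronbach_alpha` and the tiny response matrix are assumptions made for illustration.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a matrix of item scores.

    scores: list of examinee rows, each a list of item scores
            (hypothetical layout chosen for this sketch).
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across examinees.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(n_items)]
    # Variance of the total (sum) score across examinees.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Made-up data: four examinees, three dichotomous items that agree perfectly.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
alpha_consistent = cronbach_alpha(consistent)

# Made-up data: two items that are unrelated across examinees.
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]
alpha_unrelated = cronbach_alpha(unrelated)
```

Population variance is used for both the item and total-score variances; using sample variance for both would give the same ratio.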
Scoring and Reporting
• Here the focus is on how best to score
assessment tasks and the assessment instrument
as a whole.
• This includes how to transform the observed
scores and ability estimates from the
psychometric model into useful and
interpretable score reports
Scoring and Reporting
• Observed scores, scaled scores, & IRT ability
• Equating
• Linking
• Standard setting
• Score Reports and Interpretive Guides
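To make the step from IRT ability to a reported score concrete, here is a minimal sketch of one simple approach: a linear transformation of the ability estimate, rounded and clipped to a reporting range. The slope, intercept, and score range below are hypothetical values chosen for illustration; they are not taken from any actual testing program, and real programs may use other transformations.

```python
def scaled_score(theta, slope=10.0, intercept=50.0, lowest=20, highest=80):
    """Map an IRT ability estimate (theta) to a reporting scale.

    All default constants are hypothetical and exist only for this sketch.
    """
    raw = slope * theta + intercept
    # Round to a whole number, then keep the result inside the reporting range.
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return max(lowest, min(highest, round(raw)))


average_student = scaled_score(0.0)
strong_student = scaled_score(1.23)
very_high = scaled_score(5.0)    # falls above the range, so it is clipped
very_low = scaled_score(-4.0)    # falls below the range, so it is clipped
```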
The Parallel Universe Concept
Parallel Universe
• A research project will certainly fit
somewhere in the Framework – it’s helpful
for organizing different research projects.
• But can the converse be true? Could the
Framework fit into a research project?
Could the Framework help organize the
research project?
Parallel Universe
• Sometimes a research project is better
characterized as a research program.
Research Program: A set of research
projects organized around a common theme
and intended to address most or all of the
components listed in the Framework.
• There are also other parallel universes
besides research programs! For example,
any one of our testing programs!
Parallel Universe Example
Skills Diagnosis Research Program
• Design and Modeling
– How does one design a test specifically for
diagnostic purposes? What’s the psychometric
model? Content specifications? IRT specs?
• Statistical Analyses
– Need new estimation methods for new models.
Also new fit statistics. How do we estimate
reliability? Internal validity stats?
• Scoring and Reporting
– How to report diagnostic scores?
Parallel Universe Example
A State Testing Program
• Design and Modeling
– Which psychometric model will be used?
– How many items? How many subscores?
• Statistical Analyses
– What calibration software is used and in what way?
– What kind of supporting statistical analyses will be
done? DIF? Dimensionality? Validity? Reliability?
• Scoring and Reporting
– Design of the score report.
– Statistics to be reported.
– Interpretation of the scores
Current Research
• Here are 8 of the 17 papers we’re presenting
at the 2007 AERA/NCME Meeting in
Chicago, with how each fits in the Framework
and its possible relevance to real life.
Conditional item exposure in
multidimensional adaptive testing.
• Researchers: Matt Finkelman, Michael
Nering, & Louis Roussos.
• Framework: Design and Modeling
– Modeling the item selection algorithm so as to
prevent items from being over-exposed.
• Application: There is interest in using CAT
with multidimensional IRT, but current
exposure control techniques won’t work.
Generalized Mathematical Formulation for
Computing Inter-Rater Inconsistency for
BOW, Bookmark, and Yes/No methods
• Researchers: Abdullah Ferdous & Barbara Plake
(Univ. of Nebraska)
• Framework: Statistical Analyses and Scoring and
Reporting
– Standard setting raters are part of the scoring procedure
– Provide statistical support for internal validity
• Application: Can be used as part of the standard
setting process to improve the quality of the
ratings.
Use of Subset of Test Items in
Bookmark Standard Setting
• Researchers: Abdullah Ferdous
• Framework: Scoring and Reporting
– Standard setting is part of the scoring procedure
• Application: Can be used to streamline the
Bookmark standard setting procedure,
saving money and time, and perhaps
increasing reliability by reducing fatigue.
Using the DFIT framework to
evaluate equating items
• Researchers: Michael Nering & Wonsuk
Kim.
• Framework: Statistical Analyses
– Support the internal validity of the equating
items.
• Application: A new method that may be
more sensitive to ill-suited equating items
than the method currently used.
Using Person Fit in a Body of Work
Standard Setting
• Researchers: Matt Finkelman & Wonsuk Kim.
• Framework: Statistical Analyses (major) and
Scoring and Reporting (minor)
– Statistical support for selecting students to be used in
the body of work standard setting method.
• Application: A new method for detecting aberrant
students who should be excluded from the BOW
standard setting.
Development and evaluation of an effect-size
measure for the DIMTEST statistic
• Researchers: Minhee Seo (U. of Ill.) & Louis Roussos
• Framework: Statistical Analyses
– DIMTEST assesses test unidimensionality, giving statistical
support for test internal validity.
• Application: Testing programs want us to check
dimensionality. DIMTEST is a reliable hypothesis test.
An effect-size measure greatly improves the
interpretation of DIMTEST results.
Variations of Body of Work.
• Researchers: Kevin Sweeney and Abdullah
Ferdous.
• Framework: Scoring and Reporting.
– Body of Work is a standard setting method, which, of
course, determines cut scores for a test.
• Application: To improve the efficiency of BOW
by reducing the time and work involved in
preparing for and conducting the standard setting.
Detection of compromised items in
personnel selection examination
• Researchers: Yongwei Yang (Gallup), Abdullah
Ferdous, & Katherine Chin (U. of Neb).
• Framework: Statistical Analyses—Validity.
– Old method looked at change in p-value over time; new
improved method does this conditional on ability.
• Application: Improves the efficacy of personnel
selection by getting rid of compromised items.
Can also be used to improve item bank for
appropriate assessment programs (like CAT).
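The idea of tracking p-value change conditional on ability can be sketched as follows. This is only an illustration of the general idea, not the authors' actual procedure, which the slide does not describe. It assumes each record for a single item holds a testing period, an ability estimate, and a 0/1 score, and that ability is grouped into bands by hypothetical cut points. All names and data are made up.

```python
from collections import defaultdict


def stratum_of(ability, cuts):
    """Index of the ability band: the number of cut points at or below ability."""
    return sum(ability >= c for c in cuts)


def p_values_by_stratum(records, cuts):
    """records: iterable of (period, ability, correct) for ONE item.

    Returns {(period, stratum): proportion correct}.
    """
    counts = defaultdict(lambda: [0, 0])  # [number correct, number of examinees]
    for period, ability, correct in records:
        cell = counts[(period, stratum_of(ability, cuts))]
        cell[0] += int(correct)
        cell[1] += 1
    return {key: right / n for key, (right, n) in counts.items()}


def conditional_drift(records, cuts, early, late):
    """Change in p-value from the early to the late period within each band."""
    p = p_values_by_stratum(records, cuts)
    strata = range(len(cuts) + 1)
    return {s: p[(late, s)] - p[(early, s)]
            for s in strata if (early, s) in p and (late, s) in p}


# Made-up records: low-ability examinees suddenly start answering correctly.
records = [
    ("early", -1.0, 0), ("early", -0.5, 0), ("early", 0.5, 1), ("early", 1.0, 1),
    ("late", -1.0, 1), ("late", -0.5, 1), ("late", 0.5, 1), ("late", 1.0, 1),
]
drift = conditional_drift(records, cuts=[0.0], early="early", late="late")
```

A large jump in the low-ability band, with no change in the high-ability band, is the kind of pattern an unconditional p-value comparison could blur if the ability mix of examinees also shifted between periods.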
Concluding Remarks
• In Educational Assessment, psychometric
research and practice are interdependent.
– Good communication between research and practice
is essential for the efficacy of both.
– Our research ideas come directly from
questions and problems that arise in practice
• Our Research Framework helps give
structure and completeness to both our
research and practice.