Transcript Slide 1

PIAAC 2013 results: Care needed
in reading reports of international
surveys
Jeff Evans
ALM Webinar, 18 March 2014
Plan
1. Introducing PIAAC (Programme for the International Assessment of
Adult Competencies, aka Survey of Adult Skills), including its
concept of adult numeracy
2. Social surveys, and several key issues of survey validity
3. Findings for the UK sample and international comparisons;
consideration of various interpretations currently circulating
2
PIAAC (Programme for the International Assessment of
Adult Competencies, aka Survey of Adult Skills)
Fieldwork in 2011-12, results available in Oct. 2013
• Measures: Literacy, Numeracy, and Problem solving in technology-rich environments (PSTRE)
• Samples: adults usually 16-65: 5000 [or more*] per country
Builds on earlier IALS (1990s) and ALLS (2002-06), BUT …
• larger sample of 24 “industrial” countries, in 1st round
• uses computer administration, allows ‘adaptive routing’, to
find appropriate “level” of respondent
• methodological & fieldwork improvements, e.g. regulation of
sampling and fieldwork standards.
Some affinity to PISA (15 year-olds), BUT different concepts;
PIAAC uses household survey methodology + educational testing
3
PIAAC aims
Education Directorate at OECD (PIAAC sponsor):
helping countries to:
• Identify and measure differences between individuals and
across countries in key “competencies”
• Relate measures of skills based on these competencies to:
individual outcomes, e.g. labour market participation /
earnings / further learning; or to aggregate outcomes, e.g.
economic growth, or social equity in the labour market
• Assess performance of education / training systems, to
enhance competencies through formal educational system
– or in the work-place, through incentives (Schleicher, 2008)
4
PIAAC concepts and measures (1)
OECD: competencies: […] abilities, capacities or
dispositions embedded in the individual […] cognitive
skills & knowledge base are critical elements,
[but] important […] to include other aspects such as
motivation and value orientation.
Numeracy: the ability to access, use, interpret and
communicate mathematical information & ideas, to
engage in / manage the mathematical demands of a
range of situations in adult life.
Conceptualisation (PIAAC Numeracy Expert Group, 2009)
5
Social Surveys – a distinctive method
• Standardised measure for every respondent
→ allows comparison of “like with like”
• Emphasises representativeness: ‘random’ sampling
BUT ALSO → produces sampling variation (‘error’)
So … need statistical inference, using the SAMPLE (n=5000):
→ significance testing of a hypothesis about the value in
the POPULATION, e.g. average numeracy score in the UK
… or
→ ‘confidence interval’ estimation of the value in the
POPULATION: sample estimate ± margin of error
Thus uses probability to reduce uncertainty: illustrations below
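The two inference routes on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the sample mean (262) and the hypothesised population value (265) are hypothetical numbers, the standard deviation of 50 is borrowed from the example scale mentioned later in these slides, and the standard error formula assumes simple random sampling, which a real survey's complex sample design would generally change.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean, assuming simple random sampling."""
    return sd / math.sqrt(n)

def confidence_interval(mean, sd, n, z=1.96):
    """Sample estimate +/- margin of error (z = 1.96 gives roughly 95% confidence)."""
    margin = z * standard_error(sd, n)
    return mean - margin, mean + margin

def z_statistic(mean, sd, n, hypothesised):
    """How many standard errors the sample mean lies from a hypothesised population value."""
    return (mean - hypothesised) / standard_error(sd, n)

# Hypothetical sample summary: n = 5000, mean = 262, sd = 50
low, high = confidence_interval(262.0, 50.0, 5000)
print(f"95% confidence interval: {low:.1f} to {high:.1f}")

# Hypothetical null hypothesis: the population mean is 265
z = z_statistic(262.0, 50.0, 5000, hypothesised=265.0)
print(f"z = {z:.2f}; significant at the 5% level: {abs(z) > 1.96}")
```

With n = 5000 the margin of error is small (under 1.5 score points here), which is why quite small differences between sample and hypothesised values can come out as "statistically significant".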
6
Surveys (non-experiments): issues of validity
Several concerns:
• appropriateness of indicators for concepts to be measured
… Construct Validity
• comparability across countries, or across groups, where one
wishes to assess the effect of other differences, such as
gender or amount of formal schooling … Internal Validity
[Campbell & Stanley (1966), arguing that controlled
experiments (now aka RCTs) do not solve everything]
• representativeness and generalisability of findings outside the
research context … External Validity
7
PIAAC concepts and measures (2)
To produce measures, must characterise Numerate behaviour,
dimensions used in construction / validation of set of items:
• context (4 types): everyday life, work, societal, further learning
• response (or ‘cognitive strategy’ – 3 main types): identify /
locate / access (information); act on / use; interpret / evaluate.
• mathematical content (4 main types): quantity & number,
dimension & shape, pattern & relationships, data & chance.
• representations (of mathematical / statistical information): e.g.
text, tables, graphs.
Also Background Questionnaire: demographic & attitudinal
information, e.g. level of trust, political efficacy, health
+ Job-Related Assessment: use of / need for skills at work
8
Methodology (1)
• the content validity of the definitions of numeracy and
numerate behaviour [‘types’ of items]
• the measurement validity of the items presented,
including the administration and scoring procedures
[‘qualities’ of items]
• the reliability of the measurement procedures
• the internal validity, or validity of (‘effective’) relationships
claimed (within the sample), e.g. between skill scores and
desirable life outcomes, e.g. wages, employment, health
• the external validity, or representativeness, for the
national population of interest, of the results produced
from the sample.
… Similar dilemmas for most educational assessment.
- and for both Qual. and Quant. educational research
9
Methodology (2)
Content validity: the extent to which a measure represents all
facets of a given concept: … Here definition of numeracy
based on 4 dimensions of numerate behaviour stipulated:
context, content, response, representation.
Each item can be categorised on these four dimensions, and
the proportion of items falling into each category can be
controlled over the scale, so as to enhance the transparency
of the operational definition.
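As a rough sketch of what "categorising each item on the four dimensions and controlling the proportions" could look like, the snippet below tags items and tallies the share in each category. The `Item` class and `proportions` helper are hypothetical names for illustration. The content, response and context labels follow the three OECD sample items quoted elsewhere in these slides; the representation labels are placeholders assumed for the example, not taken from the source.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    name: str
    context: str
    response: str
    content: str
    representation: str  # placeholder values below are assumptions

def proportions(items, dimension):
    """Share of items falling into each category of one dimension."""
    counts = Counter(getattr(item, dimension) for item in items)
    total = len(items)
    return {category: count / total for category, count in counts.items()}

items = [
    Item("Sample Item 1", "community and society", "interpret / evaluate",
         "data & chance", "graph"),
    Item("Sample Item 2", "everyday or work", "act on / use",
         "dimension & shape", "picture"),
    Item("Sample Item 3", "community and society", "act on / use",
         "quantity & number", "text"),
]

for dimension in ("context", "response", "content", "representation"):
    print(dimension, proportions(items, dimension))
```

A test designer could compare such tallies against target proportions for the whole scale; the tallies say nothing, however, about how well the categories fit any particular adult's life.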
However, this is a standard definition …(generalising) ... How
well does it “fit” adults’ lives in any particular country?
Further, the four types of context (everyday, work, society
and community, further learning) are under-specified: too
general to refer to any actual specific social practice or social
context in any particular respondent’s everyday life.
(Evans, Wedege & Yasukawa, 2013)
10
Methodology (3)
Measurement validity: extent to which person’s responses to
set of items actually capture what the conceptualisation of
numeracy specifies
• Depends on the actual range of items used: see 3 illustrative
items presented by OECD (2013) / on websites (e.g. CSO
Ireland, PIAAC 2012 Results) … and next slide
• Requires design of procedures for administration of the survey
to be standardised across all countries, e.g. training of
interviewers / testers; design specs. of the laptops (& software)
to be used, and rules for access to calculators and other aids.
• Full appreciation of the validity of procedures requires
assurance of how these procedures are followed in the field …
even more crucial when results are compared across countries
using different fieldwork teams (see PIAAC Technical Report).
11
Numeracy – Sample Item 3
This sample item (of difficulty level 4) focuses on the following aspects of the numeracy construct:
Content: Quantity and number
Process: Act upon, use (compute)
Context: Community and society
Correct Response: One of the three values (no values between): 595, 596 or 600.
12
Methodology (4)
External validity: includes representativeness of sample for
the “population” … check a country’s sample design + other
fieldwork aspects, e.g. incentives for completing interview …
& judgments depend on knowing about actual field practices.
SO any summaries, e.g. mean scores, or gender differences,
are sample-based estimates for the population value (of
the mean score or size of gender difference ...) for country x.
These interval estimates are not exact, but show a margin of
error [say, 2* standard errors on either side; * the multiplier depends on
the level of confidence desired in the estimate], which can produce surprises,
e.g. PIAAC numeracy: overall country results 2013
Japan = 288 (286 to 290)
Finland = 282 (280 to 284)
NL / BELG = 280 (278 to 282) (overlap!)
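The interval arithmetic behind this example can be sketched as follows. The means are those quoted on the slide; the standard error of 1 score point is inferred from the quoted margins of 2 points (2 standard errors on either side) and is an assumption, as are the function names.

```python
from itertools import combinations

def interval(estimate, standard_error, multiplier=2.0):
    """Estimate +/- multiplier * standard error."""
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin

def overlaps(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Mean numeracy scores as quoted on the slide
means = {"Japan": 288, "Finland": 282, "NL / BELG": 280}

# Standard error of 1 point is inferred from the slide's +/- 2 margins
intervals = {country: interval(mean, 1.0) for country, mean in means.items()}

for first, second in combinations(intervals, 2):
    verdict = "overlap" if overlaps(intervals[first], intervals[second]) else "no overlap"
    print(f"{first} {intervals[first]} vs {second} {intervals[second]}: {verdict}")
```

Checking for overlap is only a rough screen, not a formal test of the difference between two countries, but it shows why a 2-point gap in a league table need not mean a real difference in the populations.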
13
Methodology (5)
Reliability of test administration across countries and across
interviewers, especially assuring same standards / practices in
marking (problem with past international surveys) …
Computer presentation and marking will help greatly.
But it may tend to undermine construct validity, if it reduces
the range of types of question that can be asked (example) …
And, increasing the reliability may lead to concerns about
ecological validity, whether the setting of the research is
representative of those to which one wishes to generalise the
results. For example, on-screen presentation may limit this?
14
Presentation of Results (1)
Adult’s performance is not expressed as ‘proportion correct’,
since adaptive routing → some are presented with ‘harder’ items
So Item Response Theory (IRT) used to (‘psychometrically’)
estimate a standardised score (e.g. mean 250, std dev 50)
(e.g. Tout, 2013)
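To see why ‘proportion correct’ misleads under adaptive routing, here is a minimal sketch using a generic two-parameter logistic (2PL) item response function. This is a textbook IRT form chosen for illustration; it is not claimed to be the exact model or estimation procedure used in PIAAC. The ability and difficulty values are hypothetical, and the 250 / 50 rescaling simply follows the example figures on the slide.

```python
import math

def p_correct(theta, difficulty, discrimination=1.0):
    """2PL item response function: probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def scale_score(theta, mean=250.0, sd=50.0):
    """Linear rescaling of a standardised ability to a reporting scale."""
    return mean + sd * theta

# Hypothetical respondents: A (average ability) is routed to easier items,
# B (one standard deviation above average) is routed to harder items.
respondents = {"A": (0.0, -1.0), "B": (1.0, 1.0)}  # (ability, item difficulty)

for name, (theta, difficulty) in respondents.items():
    print(f"{name}: expected proportion correct = {p_correct(theta, difficulty):.2f}, "
          f"scale score = {scale_score(theta):.0f}")
```

Respondent B has the higher ability and scale score but the lower expected proportion correct, because B was given harder items; an IRT-based score takes item difficulty into account.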
Then, to make numerical scores meaningful, they are
commonly related to one of 5 general ‘levels’ of literacy or
numeracy …
15
PIAAC Proficiency levels: numeracy
Level (score range): numeracy tasks
Below Level 1 (lower than 176)
Tasks at this level require the respondents to carry out simple processes such as counting, sorting, performing
basic arithmetic operations with whole numbers or money, or recognising common spatial representations in
concrete, familiar contexts where the mathematical content is explicit with little or no text or distractors.
Level 1 (176-225)
Tasks at this level require the respondent to carry out basic mathematical processes in common, concrete
contexts where the mathematical content is explicit with little text and minimal distractors. Tasks usually require
one-step or simple processes involving counting; sorting; performing basic arithmetic operations; understanding
simple percents such as 50%; and locating and identifying elements of simple or common graphical or spatial
representations.
Level 2 (226-275)
Tasks at this level require the respondent to identify and act on mathematical information and ideas embedded
in a range of common contexts where the mathematical content is fairly explicit or visual with relatively few
distractors. Tasks tend to require the application of two or more steps or processes involving calculation with
whole numbers and common decimals, percents and fractions; simple measurement and spatial
representation; estimation; and interpretation of relatively simple data and statistics in texts, tables and graphs.
Level 3 (276-325)
Tasks at this level require the respondent to understand mathematical information that may be less explicit,
embedded in contexts that are not always familiar and represented in more complex ways. Tasks require
several steps and may involve the choice of problem-solving strategies and relevant processes. Tasks tend to
require the application of number sense and spatial sense; recognising and working with mathematical
relationships, patterns, and proportions expressed in verbal or numerical form; and interpretation and basic
analysis of data and statistics in texts, tables and graphs.
Level 4 (326-375)
Tasks at this level require the respondent to understand a broad range of mathematical information that may be
complex, abstract or embedded in unfamiliar contexts. These tasks involve undertaking multiple steps and
choosing relevant problem-solving strategies and processes. Tasks tend to require analysis and more complex
reasoning about quantities and data; statistics and chance; spatial relationships; and change, proportions and
formulas. Tasks at this level may also require understanding arguments or communicating well-reasoned
explanations for answers or choices.
Level 5 (376 or higher)
Tasks at this level require the respondent to understand complex representations and abstract and formal
mathematical and statistical ideas, possibly embedded in complex texts. Respondents may have to integrate
multiple types of mathematical information where considerable translation or interpretation is required; draw
inferences; develop or work with mathematical arguments or models; and justify, evaluate and critically reflect
upon solutions or choices.
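The mapping from a score to a level in the table above amounts to a simple lookup, sketched below. The function name is hypothetical; treating a score of exactly 376 as Level 5, and assigning fractional scores between two bands (e.g. 225.5) to the lower band, are assumptions made for this illustration.

```python
import bisect

# Lower bound of each level from Level 1 upwards, per the table above
LOWER_BOUNDS = [176, 226, 276, 326, 376]
LABELS = ["Below Level 1", "Level 1", "Level 2", "Level 3", "Level 4", "Level 5"]

def numeracy_level(score):
    """Return the proficiency level label for a numeracy scale score."""
    # bisect_right counts how many lower bounds are <= score
    return LABELS[bisect.bisect_right(LOWER_BOUNDS, score)]

for score in (150, 176, 262, 288, 375, 376):
    print(score, "->", numeracy_level(score))
```

The lookup also makes the one-dimensionality visible: two adults scoring 275 and 276 land in different levels, while two adults scoring 276 and 325 share one.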
16
Presentation of Results (2)
BUT this is a simple, one-dimensional sense … e.g. “levels
embody predetermined assumptions about progression and
relative difficulty” (Gillespie (2004) referring to UK Skills for Life)
• Partly because many adults have different “spiky profiles” and
distinctive life experiences: some find type A items (e.g. “data
& chance”) more difficult; others, type B items (e.g. “dimension
& shape”).
... Some policy makers attempt to stipulate “minimum level of
numeracy needed to cope with the demands of adult life” in
particular country - BUT not supported by OECD [cf. IALS]
…or by Canada (Bussière, Centre for Literacy Webinar, 17 Feb. 2014)
… in Australia, debate (see Tout, 2013; Black & Yasukawa, 2014)
• tends to assume ‘demands’ are the same across countries
• conflates adults with different work, family, social situations
17
Some interpretations of PIAAC results (1)
In each of 24 countries reporting PIAAC results in 2013, the
media seem to focus on “prominent results”:
 You can check them out in your country (cf. Hamilton,
Yasukawa & Evans, ESREA 2014) …
For example, in the UK …
“the UK (England and Northern Ireland) performed
significantly below average in numeracy” …
18
Results (1)
Results (2)
19
Results (1a)
Not only Means … look at the Spreads
20
Some interpretations of PIAAC results (2)
Prominent in the UK:
the UK (England and Northern Ireland) performed significantly
below average in numeracy – with particular problems among
the 16-24 age group where the UK came 21st out of 24
industrialised countries.
… “UK faces a shrinking pool of skills, with England the only
country where the skills of young people are below those of
older people.”
21
Results (2)
22
Some interpretations of PIAAC results (3)
OECD (Country Note: UK, 2013, p. 2): “The median hourly
wage of workers who score at Level 4 or 5 in literacy is 94%
higher than that of workers who score at or below Level 1.”
23
Results (3)
24
Results (3a): another correlation
25
Other results: early impressions
1. Within-country results complex → much “fun” for
media, politicians, spin-doctors, since
1a. ... Some praiseworthy and some regrettable
findings for almost everyone
2. Between-country results ‘striking’ …
2a. e.g. much discussion of age / generation
differences – patterns vary widely
2b. but need to allow for sampling variation – and
even harder to control for wide range of cultural
differences between countries or groups
26
Other results: methodological tools
3. In interpretation of results, beware:
A. Is the ‘numeracy’ (literacy, PSTRE) measured
an appropriate indicator for the ‘numeracy’ referred
to in research, policy and pedagogical
discussions? [Construct Validity – several dims.]
B. Many of the interesting findings are correlations,
but not necessarily causal [Internal Validity ]
C. All scores for countries and subgroups are
double estimates: sample estimates and
“psychometric” (IRT) estimates [External Validity]
27
What is to be done? (1)
… by researchers and tutors / practitioners working together
** Generalising (Evans, Wedege & Yasukawa): bring research
evidence / practitioner experience to argue (remind) that
• Adult numeracy is distinctive from School Maths
• Adult numeracy is distinctive in different settings
• Adult numeracy is distinctive across different cultures, i.e.
different subgroups AND different countries
** One-dimensionality: Adult numeracy is multi-dimensional
** Need other kinds of research: local surveys, case studies,
incl. life histories (cf. Barton & Hamilton, 2012 on local literacies)
28
What is to be done? (2)
Examples of possible research topics:
a. Are numeracy levels higher in England than in NI; and, if so,
why? E.g. higher educational qualifications – or higher
levels of numerate experience at work?
b. Why does a higher proportion of males (17%) than of
females (9%) attain scores at Level 4/5 on the numeracy
scale in Australia?
c. Why are the proportions of people at Level 1 (& below)
generally highest in the oldest age groups (people aged
60+)? Does this indicate, as sometimes claimed, that “a
person’s skills deteriorate over the life-course”?
29
References
OECD (2013). OECD Skills Outlook 2013: First Results from the Survey of Adult
Skills. Paris: OECD. Online: http://www.oecd.org/site/piaac/#d.en.221854
OECD (2013). The Survey of Adult Skills: Reader’s Companion. Paris: OECD.
Online: http://www.oecd.org/site/piaac/#d.en.221854
OECD (2013). Survey of Adult Skills First Results: Country Note - England and
Northern Ireland. Paris: OECD. Online:
http://www.oecd.org/site/piaac/#d.en.221854
Evans, J. (2013/14). What to look for in PIAAC results: reading reports from
international surveys; paper given at ALM-20; revised to appear in ALM-IJ.
Evans, J., Wedege, T. & Yasukawa, K. (2013). Critical Perspectives on Adults’
Mathematics Education; in M. A. Clements, A. Bishop, C. Keitel, J. Kilpatrick and
F. Leung (Eds.), Third International Handbook of Mathematics Education, New
York: Springer.
Tout, D. (2013). Lessons Learned from International Assessments, Fine Print, 36, 2.
Black, S. & Yasukawa, K. (2014). Level 3: another single measure of adult literacy
and numeracy, Australian Educational Researcher, 41, 2, April, 125-138.
Barton, D. & Hamilton, M. (2012). Local Literacies: Reading and Writing in One
Community. London: Routledge.
30
Hamilton, M. (2011). Literacy and the Politics of Representation. London: Routledge.
Appendices
The current 24 participating countries in PIAAC include: 17 EU
members, plus USA, Can., Aus, Japan, Korea, possibly
Russian Federation. Developing countries are not involved
in Round 1, including BRIC (except Russia).
And Round 2 includes: Chile, Greece, Indonesia, Israel,
Lithuania, New Zealand, Slovenia, Singapore, Turkey.
Results expected in 2016.
Illustrative items (3 slides): taken from OECD (2013), The
Survey of Adult Skills: Reader’s Companion, pp. 28-30.
Available online.
Claimed “equivalences” among different qualifications in Engl.
31
Numeracy – Sample Item 1
This sample item (of difficulty level 3) focuses on the following aspects of the numeracy construct:
Content: Data and chance
Process: Interpret, evaluate
Context: Community and society
Correct Response: 1957-1967 and 1967-1977
32
Numeracy – Sample Item 2
This sample item (of difficulty level 1) focuses on the following aspects of the numeracy construct:
Content: Dimension and shape
Process: Act upon, use (measure)
Context: Everyday or work
Correct Response: Any value between -4 and -5
33
Equivalences ?
Notice: columns 1, 2, and final – neat equivalences claimed between
different tests and age groups
34