Transcript Document

Rosseni Din
Muhammad Faisal Kamarul Zaman
Nurainshah Abdul Mutalib



Universiti Kebangsaan Malaysia
There are many ways to estimate reliability; the
“Cronbach coefficient alpha” is the most
common. According to Nunnally (1978), the
minimum acceptable value for Cronbach's alpha is 0.70.
The procedure is as follows:
1. Click Analyze, choose Scale, then choose
Reliability Analysis
Step 1
Step 2 and 3
Select all items, then move them into the Items box.
In the Model section, make sure you choose Alpha.
Step 4
Click on Statistics. Under Descriptives for, choose Item, Scale, and Scale
if item deleted. Under Inter-Item, choose Correlations. Under
Summaries, choose Correlations as well.
Step 5
Click Continue then OK. Output will be displayed as follows:
Reliability Statistics
Cronbach's Alpha: 0.658
Cronbach's Alpha Based on Standardized Items: 0.655
N of Items: 10
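The two alpha values above can also be computed directly from the item scores. The following is a minimal sketch in Python (pandas and NumPy assumed available); the file name survey.csv and the items DataFrame are hypothetical, and the formulas are the standard ones for raw and standardized Cronbach's alpha.

    import numpy as np
    import pandas as pd

    def cronbach_alpha(items: pd.DataFrame) -> float:
        """Raw Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1)        # sample variance of each item
        total_variance = items.sum(axis=1).var(ddof=1)    # variance of the summed scale score
        return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    def standardized_alpha(items: pd.DataFrame) -> float:
        """Alpha based on standardized items: k*rbar / (1 + (k-1)*rbar), rbar = mean inter-item correlation."""
        k = items.shape[1]
        corr = items.corr().to_numpy()
        rbar = corr[np.triu_indices(k, k=1)].mean()       # mean of the off-diagonal correlations
        return (k * rbar) / (1 + (k - 1) * rbar)

    # Hypothetical usage: survey.csv holds one column per item, b01 ... b10.
    # items = pd.read_csv("survey.csv")[[f"b{i:02d}" for i in range(1, 11)]]
    # print(cronbach_alpha(items))       # with the original data this would reproduce the 0.658 above
    # print(standardized_alpha(items))   # and this the 0.655 above

The first function implements the usual variance-based formula; the second is the Spearman-Brown style formula SPSS reports as "Cronbach's Alpha Based on Standardized Items".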
In the Inter-Item Correlation Matrix, all values must be POSITIVE.
This means that all items measure the same characteristic. Next, we look at
the Cronbach's Alpha value obtained: the minimum acceptable value
(item reliability) for Cronbach's Alpha should be 0.7 (Pallant, 2007).
Next we look at the values in the Corrected Item-Total Correlation
column; the minimum acceptable value here is 0.3 (Pallant, 2007).
The values in the table below indicate which items need to be
reconsidered for removal. When we have only a few items (e.g. fewer
than 10), the inter-item correlation values will be high, within
0.48 to 0.76 (Pallant, 2007). A computational sketch of these
item-level checks follows the tables below.
Inter-Item Correlation Matrix

       b01    b02    b03    b04    b05    b06    b07    b08    b09    b10
b01  1.000   .191   .283   .221   .262   .157   .168   .151   .002   .194
b02   .191  1.000   .061   .136   .101   .236  -.092  -.092  -.192  -.018
b03   .283   .061  1.000   .113   .168   .177   .127   .115   .074   .233
b04   .221   .136   .113  1.000   .222   .217   .226   .187   .024   .282
b05   .262   .101   .168   .222  1.000   .320   .144   .071   .118   .292
b06   .157   .236   .177   .217   .320  1.000   .172   .188   .147   .263
b07   .168  -.092   .127   .226   .144   .172  1.000   .259   .171   .355
b08   .151  -.092   .115   .187   .071   .188   .259  1.000   .359   .141
b09   .002  -.192   .074   .024   .118   .147   .171   .359  1.000   .242
b10   .194  -.018   .233   .282   .292   .263   .355   .141   .242  1.000

Item-Total Statistics

       Scale Mean if   Scale Variance if   Corrected Item-     Squared Multiple   Cronbach's Alpha
       Item Deleted    Item Deleted        Total Correlation   Correlation        if Item Deleted
b01        19.42            7.092                                                      .630
b02        19.75            7.714                .068                .161               .676
b03        18.97            6.838                .292                .125               .640
b04        19.01            6.504                .361                .171               .625
b05        19.37            7.081                .385                .195               .627
b06        19.08            6.289                .416                .223               .612
b07        18.69            6.772                .354                .197               .627
b08        18.68            6.945                .312                .216               .635
b09        18.61            7.159                .217                .210               .654
b10        18.73            6.050                .456                .272               .601
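The item-level decision rules quoted earlier (all inter-item correlations positive, corrected item-total correlation of at least 0.3) can be checked programmatically as well. A minimal sketch, reusing the hypothetical items DataFrame and the cronbach_alpha helper from the earlier sketch:

    import pandas as pd

    def item_total_diagnostics(items: pd.DataFrame) -> pd.DataFrame:
        """Per-item statistics in the spirit of SPSS's Item-Total Statistics table."""
        rows = {}
        for col in items.columns:
            rest = items.drop(columns=col)                # the scale with this item removed
            rows[col] = {
                # correlation of the item with the sum of the remaining items
                "corrected_item_total_corr": items[col].corr(rest.sum(axis=1)),
                # alpha of the remaining items (helper from the earlier sketch)
                "alpha_if_item_deleted": cronbach_alpha(rest),
            }
        return pd.DataFrame(rows).T

    # Hypothetical usage: flag items falling below the 0.3 guideline quoted above.
    # diagnostics = item_total_diagnostics(items)
    # print(diagnostics[diagnostics["corrected_item_total_corr"] < 0.3])

An item whose corrected item-total correlation is low, and whose removal raises alpha, is a candidate for deletion, which is exactly how the SPSS table above is read.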



1. Analyze > Scale > Reliability Analysis
2. Click on all of the individual items that
make up the scale (lifsat1, lifsat2, lifsat3,
lifsat4, lifsat5). Move these into the box
marked Items.
3. In the Model section, select Alpha



4. In Scale Label box type in the name of the
scale or subscale (life satisfaction)
5. Click on the Statistic button. In the
Descriptive for section, click on Item, Scale
and Scale if item deleted. In the Inter-item
section, click on Correlations. In the
Summaries section,click on Correlations
6. Click on Continue and then OK
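Outside SPSS, the same check for the life satisfaction scale could be sketched as follows. The file name survey.csv is hypothetical; the column names lifsat1-lifsat5 are the items listed in step 2, and the alpha formula is the same one used in the earlier sketches.

    import pandas as pd

    # Hypothetical data file; lifsat1-lifsat5 are the items named in step 2 above.
    lifsat = pd.read_csv("survey.csv")[["lifsat1", "lifsat2", "lifsat3", "lifsat4", "lifsat5"]]

    k = lifsat.shape[1]
    alpha = (k / (k - 1)) * (1 - lifsat.var(ddof=1).sum() / lifsat.sum(axis=1).var(ddof=1))
    print(f"Cronbach's alpha for the life satisfaction scale: {alpha:.3f}")

This reproduces only the single alpha figure the SPSS output would show; the item and scale descriptives requested in step 5 come from the SPSS dialog itself.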
Information from these slides onwards is taken and modified from
Prof. Rosynella Cardozo
Prof. Jonathan Magdalena
 Validity
Does it measure what it is supposed to measure?
 Reliability
How representative is the measurement?
 Practicality
Is it easy to construct, administer, score and interpret?
The term validity refers to whether or not a test
measures what it intends to measure.
On a test with high validity the items will be closely linked
to the test’s intended focus. For many certification and
licensure tests this means that the items will be highly
related to a specific job or occupation. If a test has poor
validity then it does not measure the job-related content
and competencies it ought to.
There are several ways to estimate the validity of a test,
including content validity, construct validity, criterion-related
validity (concurrent & predictive), convergent validity,
discriminant validity and face validity.


“Content”: related to objectives and their sampling.
“Construct”: referring to the theory underlying the target.

“Criterion”: related to concrete criteria in the real world. It
can be concurrent or predictive.
“Concurrent”: correlating high with another measure already
validated.
“Predictive”: Capable of anticipating some later measure.

“Face”: related to the test overall appearance.


Content validity refers to the connections
between the test items and the subject-related
tasks. The test should evaluate only the content
related to the field of study in a manner
sufficiently representative, relevant, and
comprehensible.
It implies using the construct correctly
(concepts, ideas, notions). Construct validity
seeks agreement between a theoretical concept
and a specific measuring device or procedure. For
example, a test of intelligence nowadays must
include measures of multiple intelligences, rather
than just logical-mathematical and linguistic
ability measures.
Like content validity, face validity is determined by a review of
the items and not through the use of statistical analyses.
Unlike content validity, face validity is not investigated through
formal procedures.
Instead, anyone who looks over the test, including examinees,
may develop an informal opinion as to whether or not the test is
measuring what it is supposed to measure.
While it is clearly of some value to have the test appear to be
valid, face validity alone is insufficient for establishing that the
test is measuring what it claims to measure.
 Validity
Does it measure what it is supposed to measure?
 Reliability
How representative is the measurement?
 Practicality
Is it easy to construct, administer, score and interpret?
Reliability is the extent to which an experiment, test,
or any measuring procedure shows the same result on
repeated trials.
Without the agreement of independent observers able
to replicate research procedures, or the ability to use
research tools and procedures that produce consistent
measurements, researchers would be unable to
satisfactorily draw conclusions, formulate theories, or
make claims about the generalizability of their
research.





“Equivalency”: related to the co-occurrence of
two items
“Stability”: related to time consistency
“Internal”: related to the instruments
“Inter-rater”: related to agreement between
different examiners
“Intra-rater”: related to the consistency of the
same examiner across occasions
Internal consistency is the extent to which tests or
procedures assess the same characteristic, skill or quality.
It is a measure of the precision between the measuring
instruments used in a study.
This type of reliability often helps researchers interpret data
and predict the value of scores and the limits of the
relationship among variables.
For example, analyzing the internal reliability of the items on a
vocabulary quiz will reveal the extent to which the quiz
focuses on the examinee’s knowledge of words.
Equivalency reliability is the extent to which two items measure
identical concepts at an identical level of difficulty. Equivalency
reliability is determined by relating two sets of test scores to one
another to highlight the degree of relationship or association. For
example, a researcher studying university English students happened
to notice that when some students were studying for finals, they got
sick. Intrigued by this, the researcher attempted to observe how
often, or to what degree, these two behaviors co-occurred
throughout the academic year. The researcher used the results of
the observations to assess the correlation between “studying
throughout the academic year” and “getting sick”. The researcher
concluded there was poor equivalency reliability between the two
actions. In other words, studying was not a reliable predictor of
getting sick.
Stability reliability (sometimes called test-retest
reliability) is the agreement of measuring instruments
over time. To determine stability, a measure or test is
repeated on the same subjects at a future date.
Results are compared and correlated with the initial
test to give a measure of stability. This method of
evaluating reliability is appropriate only if the
phenomenon that the test measures is known to be
stable over the interval between assessments. The
possibility of practice effects should also be taken
into account.
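Both equivalency and stability reliability ultimately come down to correlating two sets of scores from the same people: two parallel measures in the first case, two administrations separated in time in the second. A minimal sketch with hypothetical scores for ten examinees:

    import numpy as np

    # Hypothetical scores for the same ten examinees on two occasions
    # (or, for equivalency reliability, on two related measures).
    first_scores  = np.array([12, 15, 9, 20, 17, 14, 11, 18, 16, 13])
    second_scores = np.array([13, 14, 10, 19, 18, 15, 10, 17, 15, 14])

    # Pearson correlation between the two score sets; values near 1 indicate
    # stable (test-retest) or equivalent measurement.
    r = np.corrcoef(first_scores, second_scores)[0, 1]
    print(f"reliability coefficient (Pearson r): {r:.2f}")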
Inter-rater reliability is the extent to which two or more
individuals (coders or raters) agree. Inter-rater reliability
assesses the consistency of how a measuring system is
implemented. For example, consider two or more teachers using a
rating scale to rate students' oral responses in an interview
(1 being most negative, 5 being most positive). If one
researcher gives a "1" to a student response,
while another researcher gives a "5," obviously the inter-rater
reliability would be inconsistent. Inter-rater reliability is
dependent upon the ability of two or more individuals to be
consistent. Training, education and monitoring skills can enhance
inter-rater reliability.
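As a rough illustration of the rating-scale example above, the sketch below compares two hypothetical raters' 1-5 scores for the same set of oral responses, using exact agreement and the correlation between the two sets of ratings; more formal agreement indices (e.g. Cohen's kappa) exist but are not covered in these slides. The same comparison applies to intra-rater reliability, with the two score sets coming from the same rater on two occasions.

    import numpy as np

    # Hypothetical ratings (1 = most negative, 5 = most positive) given by two
    # teachers to the same eight student oral responses.
    rater_a = np.array([4, 3, 5, 2, 4, 1, 5, 3])
    rater_b = np.array([4, 2, 5, 2, 3, 1, 5, 4])

    exact_agreement = np.mean(rater_a == rater_b)      # proportion of identical ratings
    consistency = np.corrcoef(rater_a, rater_b)[0, 1]  # correlation between the two raters
    print(f"exact agreement: {exact_agreement:.2f}, correlation: {consistency:.2f}")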
Intra-rater reliability is a type of reliability
assessment in which the same assessment is completed
by the same rater on two or more occasions. These
different ratings are then compared, generally by
means of correlation. Since the same individual is
completing both assessments, the rater's subsequent
ratings may be contaminated by knowledge of earlier
ratings.

Examinee
 (is a human being)

Examiner
 (is a human being)

Examination
 (is designed by and for human beings)
Validity and reliability are closely related.
A test cannot be considered valid unless
the measurements resulting from it are
reliable.
On the other hand, results from a test can be
reliable but not necessarily valid.
 Validity
Does it measure what it is supposed to measure?
 Reliability
How representative is the measurement?
 Practicality
Is it easy to construct, administer, score and interpret?
It refers to the economy of time, effort and
money in testing. In other words, a test should be…




Easy to design
Easy to administer
Easy to mark
Easy to interpret (the results)