IS 4800 Empirical Research Methods
for Information Science
Class Notes March 16, 2012
Instructor: Prof. Carole Hafner, 446 WVH
[email protected] Tel: 617-373-5116
Course Web site: www.ccs.neu.edu/course/is4800sp12/
Outline
• Sampling and statistics (cont.)
• T test for paired samples
• T test for independent means
• Analysis of Variance
• Two-way Analysis of Variance
Relationship Between Population and Samples When a Treatment Had No Effect

[Diagram: a single population with mean μ, from which Sample 1 (mean M1) and Sample 2 (mean M2) are both drawn.]
Relationship Between Population and Samples When a Treatment Had An Effect

[Diagram: a control-group population (mean μc) yielding the control-group sample (mean Mc), and a separate treatment-group population (mean μt) yielding the treatment-group sample (mean Mt).]
Sampling

Start with a population (mean μ, variance σ²) and draw a sample of size N. The mean values from all possible samples of size N form the "distribution of means." What are its mean and variance?

For a sample:
M = ΣX / N
SD² = Σ(X – M)² / N

For the distribution of means:
μM = μ
σ²M = σ² / N
ZM = (M – μM) / σM
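As a concrete illustration of these formulas, a minimal Python sketch (the population parameters and sample mean below are made-up values, purely for illustration):

```python
import math

# Hypothetical population parameters and an observed sample mean (illustrative values)
mu = 100.0       # population mean
sigma2 = 225.0   # population variance
N = 25           # sample size
M = 106.0        # observed sample mean

# Distribution of means: same mean as the population, variance shrunk by N
mu_M = mu
sigma2_M = sigma2 / N
sigma_M = math.sqrt(sigma2_M)

# Z score of the observed sample mean on the distribution of means
Z_M = (M - mu_M) / sigma_M
print(mu_M, sigma2_M, Z_M)  # 100.0 9.0 2.0
```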
Z tests and t-tests

t is like Z:
Z = (M – μ) / σM
t = (M – μ) / SM
(μ = 0 for paired samples)

We use a stricter criterion (t) instead of Z because SM is based on an estimate of the population variance, while σM is based on a known population variance.

S² = Σ(X – M)² / (N – 1) = SS / (N – 1)
S²M = S² / N
T-test with paired samples

Given info about the population of change scores and the sample size N we will be using, we can compute the distribution of means (μ = 0 under the null hypothesis):
S² estimates σ² from the sample: S² = SS / df
S²M = S² / N

Now, given a particular sample of change scores of size N, we compute its mean and finally determine the probability that this mean occurred by chance:
t = M / SM
df = N – 1
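A minimal Python sketch of this procedure on hypothetical change scores; the last line cross-checks the hand computation against SciPy's one-sample t test (assuming SciPy is available):

```python
import math
from scipy import stats  # assumes SciPy is installed

# Hypothetical change scores (after minus before) for N participants
change = [3, -1, 4, 2, 0, 5, 1, 2]
N = len(change)
M = sum(change) / N                     # mean change score

SS = sum((x - M) ** 2 for x in change)  # sum of squared deviations
S2 = SS / (N - 1)                       # estimated population variance, df = N - 1
S2_M = S2 / N                           # variance of the distribution of means
S_M = math.sqrt(S2_M)

t = (M - 0) / S_M                       # mu = 0 under the null for change scores
df = N - 1
print(t, df)

# Cross-check: one-sample t test of the change scores against 0
print(stats.ttest_1samp(change, 0.0))
```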
t test for independent samples

Given two samples:
• Estimate the population variances (assume they are the same)
• Estimate the variances of the distributions of means
• Estimate the variance of differences between means (mean = 0)
This is now your comparison distribution.
Estimating the Population Variance

S² is an estimate of σ²
S² = SS / (N – 1) for one sample (take the square root for S)

For two independent samples, use the "pooled estimate":
S²pooled = (df1 / dfTotal) · S1² + (df2 / dfTotal) · S2²
dfTotal = df1 + df2 = (N1 – 1) + (N2 – 1)

From this, calculate the variance of the sample means needed to compute the t statistic:
S²M = S²pooled / N
S²difference = S²pooled / N1 + S²pooled / N2
t test for independent samples, continued

The distribution of differences between means is your comparison distribution. It is NOT normal; it is a t distribution whose shape changes depending on df:
df = (N1 – 1) + (N2 – 1)

Compute t = (M1 – M2) / Sdifference and determine whether it is beyond the cutoff score for the test parameters (df, significance level, tails) from a lookup table.
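A minimal Python sketch of the whole independent-samples procedure, including the pooled variance estimate from the previous slide; the two samples are hypothetical, and the last line cross-checks against SciPy's pooled-variance t test:

```python
import math
from scipy import stats  # assumes SciPy is installed

# Two hypothetical independent samples
g1 = [6, 4, 5, 7, 5, 6]
g2 = [3, 5, 4, 4, 2, 5]
N1, N2 = len(g1), len(g2)
M1, M2 = sum(g1) / N1, sum(g2) / N2

# Pooled estimate of the population variance (each sample weighted by its df)
df1, df2 = N1 - 1, N2 - 1
df_total = df1 + df2
S2_1 = sum((x - M1) ** 2 for x in g1) / df1
S2_2 = sum((x - M2) ** 2 for x in g2) / df2
S2_pooled = (df1 / df_total) * S2_1 + (df2 / df_total) * S2_2

# Variance (and SD) of the distribution of differences between means
S2_diff = S2_pooled / N1 + S2_pooled / N2
S_diff = math.sqrt(S2_diff)

t = (M1 - M2) / S_diff
print(t, df_total)

# Cross-check with SciPy's pooled-variance (equal_var=True) t test
print(stats.ttest_ind(g1, g2, equal_var=True))
```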
ANOVA: When to use
• Categorical IV, numerical DV (same as t-test)
• HOWEVER:
– There are more than 2 levels of the IV, so:
– (M1 – M2) / SM won't work
ANOVA Assumptions
• Populations are normal
• Populations have equal variances
• More or less..
Basic Logic of ANOVA
• Null hypothesis
– Means of all groups are equal.
• Test: do the means differ more than expected given the null hypothesis?
• Terminology
– Group = Condition = Cell
Accompanying Statistics
• Experimental
– Between-subjects
• Single factor, N-level (for N>2)
– One-way Analysis of Variance (ANOVA)
• Two factor, two-level (or more!)
– Factorial Analysis of Variance
– AKA N-way Analysis of Variance (for N IVs)
– AKA N-factor ANOVA
– Within-subjects
• Repeated-measures ANOVA (not discussed)
– AKA within-subjects ANOVA
ANOVA: Single factor, N-level
(for N>2)
• The Analysis of Variance is used when you have more
than two groups in an experiment
– The F-ratio is the statistic computed in an Analysis of
Variance and is compared to critical values of F
– The analysis of variance may be used with unequal sample
sizes (weighted or unweighted means analysis)
– When there are just 2 groups, ANOVA is equivalent to the t
test for independent means
One-Way ANOVA – Assuming the Null Hypothesis is True…

Within-group estimate of the population variance: each group's sample provides an estimate (σ²est1, σ²est2, σ²est3), which combine into σ²within est.

Between-group estimate of the population variance: the spread of the group means M1, M2, M3 gives σ²between est.

F = σ²between est / σ²within est
Justification for F statistic
Calculating F
Example
Example
Using the F Statistic
• Use a table for F(BDF, WDF), together with α
– BDF = between-groups degrees of freedom = number of groups – 1
– WDF = within-groups degrees of freedom = Σ df for all groups = N – number of groups
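Instead of a lookup table, the critical value of F can be obtained from SciPy; a short sketch for a hypothetical one-way design with 3 groups and N = 24:

```python
from scipy import stats  # assumes SciPy is installed

num_groups = 3   # hypothetical one-way design
N = 24           # total participants
alpha = 0.05

bdf = num_groups - 1   # between-groups df
wdf = N - num_groups   # within-groups df

# Critical value of F(bdf, wdf) at the chosen alpha (replaces the lookup table)
f_crit = stats.f.ppf(1 - alpha, bdf, wdf)
print(bdf, wdf, round(f_crit, 2))  # 2 21 and roughly 3.47
```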
One-way ANOVA in SPSS

Data: [Bar chart of mean Performance (0–6) for the 1 Day, 2 Day, and 3 Day conditions.]

Analyze / Compare Means / One-Way ANOVA…
SPSS Results…

ANOVA: Performance

Source           Sum of Squares   df   Mean Square   F       Sig.
Between Groups   24.813            2   12.406        9.442   .001
Within Groups    27.594           21    1.314
Total            52.406           23

Report as: F(2,21) = 9.442, p < .05
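The same kind of one-way analysis can be sketched outside SPSS with scipy.stats.f_oneway; the scores below are hypothetical stand-ins, not the actual data behind the SPSS output:

```python
from scipy import stats  # assumes SciPy is installed

# Hypothetical Performance scores for the three training conditions (8 per group)
one_day   = [2.5, 3.0, 2.0, 3.5, 2.8, 3.1, 2.6, 3.2]
two_day   = [4.0, 3.8, 4.5, 3.9, 4.2, 4.1, 3.7, 4.4]
three_day = [4.8, 5.2, 4.6, 5.0, 4.9, 5.3, 4.7, 5.1]

# One-way ANOVA: F ratio and p value across the three groups
f, p = stats.f_oneway(one_day, two_day, three_day)
print(f, p)  # with 3 groups of 8, report as F(2, 21) = ..., p = ...
```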
Factorial Designs
• Two or more nominal independent variables,
each with two or more levels, and a numeric
dependent variable.
• Factorial ANOVA teases apart the contribution
of each variable separately.
• For N IVs, aka “N-way” ANOVA
Factorial Designs
• Adding a second independent variable to a single-factor design results in a FACTORIAL DESIGN
• Two components can be assessed
– The MAIN EFFECT of each independent variable
• The separate effect of each independent variable
• Analogous to separate experiments involving those variables
– The INTERACTION between independent variables
• When the effect of one independent variable changes over levels of a second
• Or, when the effect of one variable depends on the level of the other variable.
Example
IV: Wait-time sign in the Student Center vs. no sign
DV: Satisfaction
Example of an Interaction – Student Center Sign – 2 Genders x 2 Sign Conditions

[Line graph: value of the dependent variable (0–12) for female (F) and male (M) participants at the two levels of independent variable A (No Sign vs. Sign), illustrating an interaction.]
Two-way ANOVA in SPSS

Analyze / General Linear Model / Univariate
Results

Tests of Between-Subjects Effects
Dependent Variable: Performance

Source                    Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model            26.507a                   5     5.301         3.685   .018
Intercept                 210.855                    1   210.855       146.547   .000
Training Days              20.728                    2    10.364         7.203   .005
Trainer                      .002                    1      .002          .001   .974
Training Days * Trainer     1.680                    2      .840          .584   .568
Error                      25.899                   18     1.439
Total                     401.250                   24
Corrected Total            52.406                   23

a. R Squared = .506 (Adjusted R Squared = .369)
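A comparable factorial ANOVA can be sketched in Python with statsmodels; the data frame below is hypothetical (not the data behind the SPSS output), and anova_lm with typ=2 produces Type II sums of squares, whereas SPSS reports Type III by default:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols  # assumes pandas and statsmodels are installed

# Hypothetical long-format data: one row per participant
data = pd.DataFrame({
    "performance":   [2.5, 3.0, 4.0, 3.8, 4.8, 5.2, 2.8, 3.1, 4.2, 4.1, 4.9, 5.3],
    "training_days": ["1", "1", "2", "2", "3", "3"] * 2,
    "trainer":       ["A"] * 6 + ["B"] * 6,
})

# Factorial (two-way) ANOVA: both main effects plus their interaction
model = ols("performance ~ C(training_days) * C(trainer)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))
```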
Results
Degrees of Freedom
• df for between-group variance estimates for main effects
– Number of levels – 1
• df for between-group variance estimate for the interaction effect
– Total number of cells – df for both main effects – 1
– e.g. 2x2 => 4 – (1+1) – 1 = 1
• df for within-group variance estimate
– Sum of df for each cell = N – number of cells
• Report: "F(between-group df, within-group df) = F, Sig."
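A small sketch that applies these df rules; factorial_df is a hypothetical helper, not part of any statistics package:

```python
# Degrees of freedom for a two-factor (a x b levels) between-subjects design
# with N participants in total.
def factorial_df(a, b, N):
    df_A = a - 1                      # main effect of factor A
    df_B = b - 1                      # main effect of factor B
    cells = a * b
    df_AxB = cells - df_A - df_B - 1  # interaction effect
    df_within = N - cells             # within-group (error) df
    return df_A, df_B, df_AxB, df_within

print(factorial_df(3, 2, 24))  # (2, 1, 2, 18) -- the TrainingDays x Trainer example
print(factorial_df(2, 2, 20))  # (1, 1, 1, 16)
```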
Publication format

(The same Tests of Between-Subjects Effects table as above.)

N = 24, 2x3 = 6 cells =>
df TrainingDays = 2, df for within-group variance = 24 – 6 = 18
=> F(2,18) = 7.20, p < .05
Reporting rule
• IF you have a significant interaction
• THEN
– If 2x2 study: do not report main effects, even if
significant
– Else: must look at patterns of means in cells to
determine whether to report main effects or not.
Results?
TrainingDays: Sig. = 0.34
Trainer: Sig. = 0.12
TrainingDays * Trainer: Sig. = 0.41
=> n.s.
Results?
TrainingDays: Sig. = 0.34
Trainer: Sig. = 0.12
TrainingDays * Trainer: Sig. = 0.02
=> Significant interaction between TrainingDays and Trainer, F(2,22) = .584, p < .05
Results?
TrainingDays: Sig. = 0.34
Trainer: Sig. = 0.02
TrainingDays * Trainer: Sig. = 0.41
=> Main effect of Trainer, F(1,22) = .001, p < .05
Results?
TrainingDays: Sig. = 0.04
Trainer: Sig. = 0.12
TrainingDays * Trainer: Sig. = 0.01
=> Significant interaction between TrainingDays and Trainer, F(2,22) = .584, p < .05
Do not report TrainingDays as significant.
Results?
TrainingDays: Sig. = 0.04
Trainer: Sig. = 0.02
TrainingDays * Trainer: Sig. = 0.41
=> Main effects for both TrainingDays, F(2,22) = 7.20, p < .05, and Trainer, F(1,22) = .001, p < .05
"Factorial Design"
• Not all cells in your design need to be tested
– But if they are, it is a "full factorial design", and you do a "full factorial ANOVA"

          Real-Time   Retrospective
Agent         ✓             ✓
Text          ✓             X
Higher-Order Factorial Designs
• More than two independent variables are included in a
higher-order factorial design
– As factors are added, the complexity of the experimental
design increases
• The number of possible main effects and interactions increases
• The number of subjects required increases
• The volume of materials and amount of time needed to complete the
experiment increases