IS 4800 Empirical Research Methods
for Information Science
Class Notes Feb. 17, 2012
Instructor: Prof. Carole Hafner, 446 WVH
[email protected] Tel: 617-373-5116
Course Web site: www.ccs.neu.edu/course/is4800sp12/
Hypothesis Testing
and
Inferential Statistics
Basic Process of Hypothesis Testing
■ H1: Research Hypothesis:
■ Population 1 is different from Population 2
■ H0: Null Hypothesis:
■ No difference between Pop 1 and Pop 2
■ The difference is “null”
■ Compute p(observed difference | H0)
■ ‘p’ = probability of seeing a difference at least this large from random
variation alone, if H0 is true
■ If p < threshold then reject H0 => accept H1
■ Threshold typically set to 0.05 for most work
■ The threshold is called the “level of significance”
Examples
■ Research Question:
■ Which is more popular, Guitar Hero or Rock Band?
■ Research Question:
■ Is the ownership of Wii vs. Xbox consoles
significantly different for NU students compared to
ownership in the general US population?
■ Research Question:
■ Are Wii owners more likely to own Guitar Hero vs.
Rock Band, compared to Xbox owners?
Types of Errors in Hypothesis Testing
                       “The Truth”
                       H0 True             H0 False
Decide to Reject H0    Type I Error        Correct Decision
Do not Reject H0       Correct Decision    Type II Error
Significance threshold = probability of a Type I Error (rejecting H0 when H0 is true)
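One way to see what a Type I error rate means is to simulate it. The sketch below is not from the slides: it rolls a fair six-sided die (so H0 is true by construction), runs a chi-square test on each sample, and counts how often H0 is rejected anyway. The function names, the 60-roll sample size, the trial count, and the seed are illustrative assumptions; 11.070 is the standard chi-square table value for df = 5 at the 0.05 threshold.

```python
import random

# Standard chi-square critical value for df = 5 at the 0.05 threshold.
CRITICAL_DF5_05 = 11.070


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def type_i_error_rate(trials=2000, rolls=60, seed=4800):
    """Fraction of fair-die samples where H0 is rejected even though it is true."""
    rng = random.Random(seed)  # fixed seed (arbitrary choice) for repeatability
    expected = [rolls / 6] * 6
    rejections = 0
    for _ in range(trials):
        counts = [0] * 6
        for _ in range(rolls):
            counts[rng.randrange(6)] += 1
        if chi_square_statistic(counts, expected) > CRITICAL_DF5_05:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    # Every rejection here is a Type I error, so the rate should sit near 0.05.
    print(f"Simulated Type I error rate: {type_i_error_rate():.3f}")
```

Because the die is fair, any rejection is a false alarm, and the fraction of false alarms should come out close to the chosen threshold.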
Procedure for Hypothesis Testing
with Chi-square
1. Formulate your research hypothesis (including
statement of expected frequencies)
2. Determine hypothesis test parameters
– significance threshold
3. Collect your data
4. Compute Chi-Square statistic and draw
conclusion
Chi-Square for Goodness of Fit
■ Assumes
1. You have a nominal variable
• Values are exhaustive & mutually-exclusive
2. You have an Expected Frequency table for the
nominal variable
3. Representative sample
Chi-Square for Goodness of Fit
■ Form of null hypothesis H0?
■ Observed frequency = Expected frequency
■ Populations (expected, observed) are actually the
same
■ Form of hypothesis H1?
■ Observed frequency ≠ Expected frequency
■ Populations (expected, observed) are different
Formula for Chi-square statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
■ O = Observed frequency for a given category
■ E = Expected frequency for a given category
■ Note: “statistic” is a function you apply to a set of
data (in a statistical analysis)
Example
■ You go gambling in a shady casino, and
suspect that the games are rigged.
■ You focus your attention on one 6-sided die
being used in a game and keep track of 60
rolls:
Roll     1    2    3    4    5    6
Count    6    5    7    9    3    30
■ Is the die “loaded”?
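A minimal Python sketch of the goodness-of-fit computation for this die, assuming a fair die would show each face equally often (10 times in 60 rolls). The function and variable names are illustrative, and 11.070 is the standard chi-square table value for df = 5 at the 0.05 threshold.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [6, 5, 7, 9, 3, 30]        # counts for faces 1..6 from the slide
expected = [sum(observed) / 6] * 6    # fair die: 60 rolls / 6 faces = 10 each

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1                # number of categories - 1 = 5
critical_value = 11.070               # chi-square table, df = 5, threshold 0.05

print(f"chi-square({df}) = {statistic:.2f}")
if statistic > critical_value:
    print("Reject H0: the counts are unlikely for a fair die")
else:
    print("Inconclusive")
```

The statistic works out to 50, far above the critical value, mostly because of the 30 sixes: that one category contributes (30 - 10)^2 / 10 = 40.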
Example
■ OS market share in US is:
■ 96% PC; 3% Macintosh; 1% Linux
■ You do a survey in your company and find the
following user breakdown:
■ 475 PC; 25 Mac; no Linux
■ Is your company weird?
Formula for Chi-square statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
■ O = Observed frequency for a given category
■ E = Expected frequency for a given category
■ Note: “statistic” is a function you apply to a set of
data (in a statistical analysis)
Computing Chi-square
■ SPSS:
■ run NonParametric/ChiSquare
■ See if significance<threshold
• Yes => reject H0
• No => inconclusive
■ Manually:
■ Determine df (= num categories – 1)
■ Compute Chi-square using formula
■ Lookup to see if statistic>table entry for
threshold-significance, df
• If yes => reject H0
• If no => inconclusive
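The manual procedure can be sketched in Python as follows, applied to the company OS survey (475 PC, 25 Mac, 0 Linux against a 96% / 3% / 1% market share). This is an illustration under stated assumptions, not the course's own code: the function name and the small lookup table are made up for the example, and the table holds the standard chi-square critical values for the 0.05 threshold only. The expected percentages are converted to expected counts first, because the statistic depends on the sample size.

```python
# Standard chi-square critical values at the 0.05 threshold, keyed by df.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def goodness_of_fit(observed_counts, expected_percents):
    """Return (statistic, df, reject_h0) for a chi-square goodness-of-fit test."""
    n = sum(observed_counts)
    # Convert expected percentages into expected counts for this sample size.
    expected_counts = [n * pct / 100 for pct in expected_percents]
    statistic = sum(
        (o - e) ** 2 / e for o, e in zip(observed_counts, expected_counts)
    )
    df = len(observed_counts) - 1          # num categories - 1
    reject_h0 = statistic > CRITICAL_05[df]  # table lookup for threshold 0.05
    return statistic, df, reject_h0


statistic, df, reject_h0 = goodness_of_fit([475, 25, 0], [96, 3, 1])
print(f"chi-square({df}) = {statistic:.2f}")
print("Reject H0" if reject_h0 else "Inconclusive")
```

With expected counts of 480, 15, and 5, the statistic is about 11.72, which exceeds the df = 2 table entry of 5.991.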
Reporting result
$\chi^2(df) = \mathit{chisq},\ p < \mathit{sigthresh}$
Where,
– df = degrees of freedom
– sigthresh = pre-defined significance threshold
• Note: if p<<sigthresh, can report that as well, e.g., “p<.01”,
“p=.001”
For example:
$\chi^2(2) = 11.89,\ p < 0.05$
If not significant, then use “n.s.” instead of “p<…”.
Usually also report expected and actual frequencies, or
at a minimum, the total number of cases considered
(aka “n”).
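As a small illustration of this reporting convention, the helper below builds the report string from its parts. The function name and parameters are hypothetical, chosen only for this sketch; the 11.89 value is the slide's own example, and the non-significant value is a made-up placeholder.

```python
def format_chi_square_report(df, chisq, significant, sigthresh=0.05, n=None):
    """Build a report string such as 'χ²(2) = 11.89, p < 0.05'."""
    tail = f"p < {sigthresh}" if significant else "n.s."
    report = f"χ²({df}) = {chisq:.2f}, {tail}"
    if n is not None:
        report += f", n = {n}"  # total number of cases considered
    return report


# The slide's example result.
print(format_chi_square_report(2, 11.89, significant=True))
# A hypothetical non-significant result, reported with "n.s." and n.
print(format_chi_square_report(2, 1.20, significant=False, n=500))
```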
         Measure    O (%)    E (%)    X^2 term
PC       475        95       96       0.010417
Mac      25         5        3        1.333333
Linux    0          0        1        1
Total    500        100      100      2.34375
What do you do if you don’t have
any information to base Expected
frequencies on?
Chi-Square Test for
Independence
Statistical independence:
P(A | B) = P(A | ~B) = P(A)
Chi-Square Test for Independence
■ Are two variables related, or are they
independent?
■ Assumptions
■ Both variables must be nominal (or treated as if)
■ Observations cannot be related in a ‘special’ way (e.g., repeated
measures)
■ Representative samples assumed
■ Normal distribution NOT assumed
Example
■ Morning & night people using different modes
of transportation.
■ What kind of study is this?
           Bus    Carpool    Own Car
Morning    60     30         30
Night      20     20         40
Expected frequencies if variables are
independent
■ E = (R x C)/N for each cell
■ R = row total
■ C = column total
■ N = total number in all cells
           Bus    Carpool    Own Car
Morning    60     30         30
Night      20     20         40
Expected frequencies if variables are
independent
■ Step 1 – compute row & col totals
           Bus    Carpool    Own Car    Total
Morning    60     30         30         120
Night      20     20         40         80
Total      80     50         70
Expected frequencies if variables are
independent
■ Step 1 – compute row & col totals
■ Step 2 – compute row %
           Bus    Carpool    Own Car    Total    Row %
Morning    60     30         30         120      60%
Night      20     20         40         80       40%
Total      80     50         70
Expected frequencies if variables are
independent
■ Step 1 – compute row & col totals
■ Step 2 – compute row %
■ Step 3 – each cell = (R x C)/N (expected values in parentheses)
           Bus        Carpool    Own Car    Total
Morning    60 (48)    30 (30)    30 (42)    120
Night      20 (32)    20 (20)    40 (28)    80
Total      80         50         70
Formula
■ df = (NumRows-1)x(NumColumns-1)
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
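Putting the expected-frequency steps and this formula together, here is a minimal Python sketch for the morning/night transportation table. The function names are illustrative assumptions, and 5.991 is the standard chi-square table value for df = 2 at the 0.05 threshold.

```python
def expected_frequencies(table):
    """Expected count for each cell under independence: (R x C) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (statistic, df) for a chi-square test for independence."""
    expected = expected_frequencies(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # (NumRows-1) x (NumColumns-1)
    return statistic, df


# Rows: Morning, Night. Columns: Bus, Carpool, Own Car.
observed = [[60, 30, 30], [20, 20, 40]]

expected = expected_frequencies(observed)
statistic, df = chi_square_independence(observed)
critical_value = 5.991  # chi-square table, df = 2, threshold 0.05

print("Expected:", expected)
print(f"chi-square({df}) = {statistic:.2f}")
print("Reject H0" if statistic > critical_value else "Inconclusive")
```

The expected counts match the parenthesized values on the slide (48, 30, 42 and 32, 20, 28), and the statistic is about 16.07 with df = 2, so H0 (independence) is rejected at the 0.05 threshold.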
Written Study Reports
■ Objectives (also critiques)
■ Describe what your study is about
■ Motivate your study
■ Assure reader you have conducted a sound study
• Research Methods – often presented in small font
■ Present results in an objective manner
■ Discuss implications
■ Discuss future work
■ Enable replication
Typical Study vs. IS/CS/HCI
Paper Structure
Abstract
Introduction
Method
Results
Discussion
Motivation
Related work
Hypotheses
Limitations
Implications
Future work
References
Typical Study vs. IS/CS/HCI
Paper Structure
■ Abstract
■ Introduction
■ Motivation
■ Related work
■ System design
■ Evaluation
■ Hypotheses
■ Method
■ Results
■ Discussion – summary, limitations
■ Conclusion
■ Implications
■ Future work
■ References
The Abstract
■ Concise summary
■ Abstract for an empirical study should include
■ Information on the problem under study
■ The nature of the subject sample
■ A description of methods, equipment, and
procedures
■ A statement of the results
■ A statement of the conclusions drawn
■ Often the last thing you write
The Introduction
■ Part of paper giving justification for study
■ Usually has the following information
■ Introduction to the topic under study
■ Brief review of research and theory related to the topic
■ A statement of the problem to be addressed
■ A statement of the purpose of the research
■ A brief description of the research strategy
■ A description of predictions and hypotheses
■ CS/IS papers often put Related Work as a separate section after
Introduction
■ For each, describe how your work is different
Organization of the Introduction:
General to Specific
Present a general introduction to your topic
Review relevant literature
Link literature review to your hypotheses
State your hypotheses
The Method Section
■ Includes information on exactly how a study was
carried out
■ Subsections
■ Participants or subjects
• Describe in detail the participant or subject sample
• Human participants go in a Participants subsection, and animal
subjects in a Subjects subsection
■ Apparatus or materials
• Describe in detail any equipment or materials used
• Equipment is usually described in an Apparatus subsection and
written materials in a Materials subsection
The Method Section
■ Procedure
■ Describe
• Exactly how the study was carried out
• The conditions to which subjects were exposed or under
which they were observed
• The behaviors measured and how they were scored
• When and where observations were made
• Debriefing procedures
■ Enough detail should be included in all sections so that
the study could be replicated
The Results Section
■ Objective, dry, boring – just the facts
■ All relevant data and analyses are reported in the
results section
■ Do not present raw data
■ Data should be reported in summary form
■ Descriptive statistics
■ Inferential statistics
■ Results of descriptive and inferential statistics must be
presented in narrative format
■ Describe the source of any unconventional statistical
tests
Commonly Used Statistical Citations
Statistical Test        Format
Analysis of variance    F(1,85) = 5.96, p < .01
Chi-square              χ²(3) = 11.34, p < .01
t test                  t(56) = 4.78, p < .01
Abbreviations for Statistical Notation
Abbreviation    Meaning
df              Degrees of freedom
F               F ratio
M               Arithmetic average (mean)
N               Number of subjects in entire sample
p               p value
SD              Standard deviation
t               t statistic
z               Results from a z test or z score
μ               Population mean (mu)
σ               Population standard deviation (sigma)
The Discussion Section
■ This is where you can take some liberties with
describing what the results mean
■ Results are interpreted, conclusions drawn, and
findings are related to previous research
■ Section begins with a brief restatement of hypotheses
■ Next, indicate if hypotheses were confirmed
■ The rest of the section is dedicated to integrating
findings with previous research
■ It is fine to speculate, but speculations should not stray
far from the data
Organization of Discussion: Specific
to General
Restate your hypotheses or major finding
Tie your results with previous research and theory
State broad implications of your results, methodological implications, directions for future research
Example
Citations
■ Liberally cite previous & related work.
■ If you copy passages you must cite and,
depending on length, format to indicate it is
copied.
■ Suggest using EndNote, BibTeX or similar.
Ethical Issues
■ Report all of your findings (not just the ones you like)
■ Adhere to your original plan
■ Report any deviations and why
■ Power analysis, statistics, measures
■ Do not drop subjects or data points without rigorous justification
■ If your hypothesis test was not significant you cannot say
anything about difference in means (example).
■ If you did not run an experiment that attempts to control for
extraneous variables, you cannot claim or imply causality.
Oral Presentation of
Study Results
Oral Presentation
■ Main concepts and ideas
■ Do not go into great detail on experimental
methods – just enough so people understand
roughly what you did
■ Focus on motivation, results, implications
■ If listener wants details they can read the paper or
ask questions
Oral Presentation
Don’t do this…
                  Change            | ALL CONDS         | CONTROL           | NON-REL           | RELATIONAL
Measure           From     To       | df   t      p     | df   t      p     | df   t      p     | df   t      p
                  (Day1)   (Day2)   |                   |                   |                   |
WAI/COMP          7        27       | 54   0.205  0.838 |                   | 24   0.014  0.989 | 29   0.361  0.720
WAI/BOND          7        27       | 54   0.519  0.606 |                   | 24   0.376  0.710 | 29   1.489  0.147
WAI/TASK          7        27       | 54   0.134  0.894 |                   | 24   0.409  0.686 | 29   0.661  0.514
WAI/GOAL          7        27       | 54   0.155  0.877 |                   | 24   0.081  0.936 | 29   0.329  0.745
CONTINUE LAURA    30       44       | 54   0.868  0.389 |                   | 24   0.625  0.538 | 29   0.619  0.541
MIN/DAY           -6-0     22-30    | 81   1.470  0.145 | 26   1.274  0.214 | 24   0.124  0.903 | 29   1.104  0.279
MIN/DAY           1-7      22-30    | 81   0.691  0.492 | 26   0.758  0.456 | 24   0.109  0.914 | 29   0.358  0.723
MIN/DAY           22-30    38-44    | 81   3.626  0.001 | 26   2.480  0.020 | 24   1.959  0.062 | 29   1.804  0.082
DAY/WK>30MIN      -6-0     22-30    | 81   6.653  0.000 | 26   2.323  0.028 | 24   5.284  0.000 | 29   4.347  0.000
DAY/WK>30MIN      1-7      22-30    | 81   6.272  0.000 | 26   2.401  0.024 | 24   3.818  0.001 | 29   4.597  0.000
DAY/WK>30MIN      22-30    38-44    | 81   8.990  0.000 | 26   4.043  0.000 | 24   5.322  0.000 | 29   6.530  0.000
STEP/DAY          1-7      22-30    | 81   1.778  0.079 | 26   1.197  0.242 | 24   2.366  0.026 | 29   0.236  0.815
DAY/WK>10KSTEP    1-7      22-30    | 77   3.986  0.000 | 25   1.355  0.188 | 23   3.591  0.002 | 27   2.055  0.050
STAGE             Intake   30       | 81   6.988  0.000 | 26   3.403  0.002 | 24   4.000  0.001 | 29   4.738  0.000
STAGE             30       44       | 81   2.019  0.047 | 26   1.185  0.247 | 24   1.000  0.327 | 29   1.409  0.169
SELF-EFFICACY     1        29       | 81   4.782  0.000 | 26   0.872  0.391 | 24   3.314  0.003 | 29   4.750  0.000
SELF-EFFICACY     29       44       | 81   2.770  0.007 | 26   1.525  0.139 | 24   4.550  0.000 | 29   0.085  0.933
PROS              1        29       | 81   1.998  0.049 | 26   1.418  0.168 | 24   0.456  0.653 | 29   1.540  0.134
PROS              29       44       | 81   0.393  0.695 | 26   1.147  0.262 | 24   0.225  0.824 | 29   0.308  0.760
CONS              1        29       | 81   0.902  0.370 | 26   1.124  0.271 | 24   0.499  0.622 | 29   0.823  0.417
CONS              29       44       | 81   0.740  0.462 | 26   0.386  0.703 | 24   0.611  0.547 | 29   0.339  0.737
CONTINUE FT       30       44       | 81   1.520  0.133 | 26   1.442  0.161 | 24   1.163  0.256 | 29   0.000  1.000
Oral Presentation
Do use as many figures as possible
[Figure: chart comparing NON-REL and RELATIONAL on COMP, BOND, TASK, and GOAL at Week 1 and Week 4; vertical axis from 1 to 7]
Oral Presentation
Guide for Visuals
■ Visuals should be exhibits that you talk about
■ Do not put lots of text on charts
■ Do not read your charts for your presentation
■ Use interactivity, video, images to keep your
audience awake
Common Questions
■ How did you evaluate that?
■ How did you measure that?
■ How did you control for extraneous variable
X?
■ Why didn’t you use statistic Y?
■ Isn’t that a biased sample?
■ What was your control group?
■ How did you do study procedure Z?
Tips
■ Describe your sample
■ Minimal demographics – number of subjects, broken down by gender
■ Better: age, occupation, major, year
■ Minimize text on your charts
■ If you use a novel measure (e.g., new survey) you must give
details on the measure
■ Actual questions asked
■ Any reliability/validity/psychometrics done
■ If you do interviews, include actual quotes
■ Build from data to conclusions
■ Practice your timing/delivery with your project team