LIS 397.1
Introduction to Research in
Library and Information
Science
Statistical Hypotheses
R. E. Wyllys
Copyright 2003 by R. E. Wyllys
Last revised 2003 Jan 15
School of Information - The University of Texas at Austin
LIS 397.1, Introduction to Research in Library and Information Science
Hypotheses
• Hypotheses state a relationship among
two or more variables
• Hypotheses may be stated in positive or
negative terms
• Hypotheses must be capable of being
tested as to whether they are “true” or
“false”
Two Types of Hypotheses
• General Hypotheses
– Concern variables directly related to the
problem being studied
• Statistical Hypotheses
– Are a subclass of general hypotheses
– Are tools
– Are used in efforts to determine whether
general hypotheses are true or false
Statistical Hypothesis
• Makes a to-be-tested statement about
either
– The kind of probability distribution that a
certain variable obeys; or
– The value of a population parameter
(average, total, proportion, etc.)
• Must be one of a relatively small
number of standard types of statistical
hypothesis
Statistical Hypothesis vs.
State of Nature
                     DECISION
STATE OF NATURE      Accept H     Reject H
H is true            OK           error
H is false           error        OK
Null Hypothesis vs.
State of Nature
Are the erroneous decisions of equal importance?
If not (the usual case), then arrange the wording
of the hypothesis so that the more serious
error occurs when the hypothesis is true but you
decide that it is false. I.e., arrange things so
that the more serious error occurs when you
reject a true hypothesis. This arrangement
yields the Null Hypothesis, H0.
                     DECISION
STATE OF NATURE      Accept H0        Reject H0
H0 is true           OK               Type I error
H0 is false          Type II error    OK
Null Hypothesis & Probabilities
• P(Type I error) = α = “level of
significance of the test” = “risk of the
test” = “alpha of the test”
• P(Type II error) = β
• P(not making Type II error) = 1 - β =
“power of the test” = P(correctly
recognizing hypothesis as false when it
is false)
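The meaning of α and β can be illustrated by simulation. The sketch below is only an illustration under assumed conditions: a two-tailed z-test of H0: μ = μ0 with a known standard deviation, and made-up values for μ0, σ, the sample size, and the true mean. The function name rejection_rate is hypothetical and does not come from the course material.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, mu0=100.0, sigma=15.0, n=25,
                   alpha=0.05, trials=10000, seed=1):
    """Fraction of simulated samples for which a two-tailed z-test rejects H0: mu = mu0.

    All default values are illustrative assumptions, not data from the slides.
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    std_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / std_error
        if abs(z) > threshold:
            rejections += 1
    return rejections / trials

# H0 is true: every rejection is a Type I error, so this estimates alpha.
type_i_rate = rejection_rate(true_mean=100.0)
# H0 is false: every rejection is a correct decision, so this estimates power.
power = rejection_rate(true_mean=106.0)

print(f"estimated P(Type I error)  = {type_i_rate:.3f}")
print(f"estimated power (1 - beta) = {power:.3f}")
print(f"estimated P(Type II error) = {1 - power:.3f}")
```

When H0 is true, the rejection rate should come out close to the chosen α. When H0 is false, the rejection rate estimates the power, and its complement estimates β.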
Null Hypothesis & Probabilities
• Primary objective: Avoid more serious
error; i.e., ensure that α is small
• Secondary objective: Increase chance
of recognizing hypothesis as false if it is
false; i.e., increase power of test, 1 - β
– For a given α, the only way to increase
power is to increase size of sample
– Power is very hard to determine; in
practice, α gets almost all the attention
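The link between sample size and power can be made concrete in one simple case. The following is a minimal sketch, assuming a two-tailed z-test with known standard deviation; the effect size, σ, and sample sizes are invented for illustration, and z_test_power is a hypothetical helper name.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-tailed z-test when the true mean differs from mu0 by `effect`.

    Assumes a normal population with known sigma (an illustrative simplification).
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / (sigma / n ** 0.5)  # true difference in standard-error units
    # P(|z| > threshold) when z is centred on `shift` instead of 0.
    return std_normal.cdf(shift - threshold) + std_normal.cdf(-shift - threshold)

# Hold alpha fixed at 0.05 and let only the sample size grow.
sample_sizes = (10, 25, 50, 100)
powers = [z_test_power(effect=6.0, sigma=15.0, n=n) for n in sample_sizes]
for n, power in zip(sample_sizes, powers):
    print(f"n = {n:3d}: power = {power:.3f}")
```

With α held at 0.05, the computed power rises as n grows. When the effect is zero (H0 true), the same formula returns α itself.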
Test of a Statistical Hypothesis
• Statements of
– Null hypothesis, H0
– Alternative hypothesis (often simply
negation of H0)
– Level of significance, α
– “Critical region”, i.e., what outcomes will
lead to rejection of H0 (in practice, this
usually means stating threshold value from
appropriate statistical table)
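As an illustration of how a level of significance fixes the critical region, the sketch below computes the two-tailed threshold for a test statistic that follows the standard normal distribution. That distribution is an assumption made here for simplicity; other tests use other tables. The function name two_tailed_critical_z is hypothetical.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """Threshold c such that P(|Z| > c) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    threshold = two_tailed_critical_z(alpha)
    print(f"alpha = {alpha:.2f}: reject H0 if |z| > {threshold:.3f}")
# Thresholds printed: 1.645, 1.960, 2.576
```

A smaller α pushes the threshold outward, so stronger evidence is needed before H0 is rejected.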
Common Types of Single-Variable Statistical Hypotheses
• H0: μ = μ0
– Population mean is some number:
“Average daily circulation total is 123”
• H0: μ1 = μ2
– Means of two populations are equal:
“Average cost per online search using
Service A = average cost using Service B”
Common Types of Single-Variable Statistical Hypotheses
• H0: μ1 = μ2 = μ3 = ..., etc.
– Means of Populations 1, 2, 3, ..., etc. are all
equal: “Average number of books
borrowed per student per semester is the
same for freshmen, sophomores, juniors,
and seniors.”
Common Types of Two-Variable
Statistical Hypotheses
• H0: ρXY = 0
– Variables X and Y are not correlated in the
population: “There is no correlation
between the age and the salary of a typical
librarian”
• H0: Categorical variables X and Y are
not associated:
– “There is no association between the sex
of a library patron and the type of book the
patron prefers”
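One common test of association between two categorical variables is the chi-square test; the slides do not name it here, so treat its use as an assumption. The sketch below computes the chi-square statistic for a 2 x 2 table. The counts are invented purely for illustration, and chi_square_statistic is a hypothetical helper name.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Count expected in this cell if the two variables were not associated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected) ** 2 / expected
    return statistic

# Invented counts: rows = sex of patron, columns = preferred type of book.
observed = [[30, 20],
            [20, 30]]
chi2 = chi_square_statistic(observed)

CRITICAL_05_DF1 = 3.841  # tabled chi-square threshold for alpha = 0.05, df = 1
decision = "Reject H0" if chi2 > CRITICAL_05_DF1 else "Accept H0"
print(f"chi-square = {chi2:.3f}, decision: {decision}")
# prints: chi-square = 4.000, decision: Reject H0
```

For these made-up counts every expected cell count is 25, so the statistic is 4 x (5 squared / 25) = 4.0, which exceeds the tabled threshold.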
Standardized Tests of
Statistical Hypotheses
• To each type of statistical hypothesis
corresponds a particular standardized test
procedure or procedures
• Each test procedure includes a formula, the
“test statistic”
• You
– place, into the test statistic, data from observed
sample or samples
– obtain a number, the observed value of the test
statistic
Standardized Tests of
Statistical Hypotheses (cont'd)
• Traditional Method: Compare absolute value
of observed value of test statistic against
threshold value from pertinent table
– If |test statistic| ≤ tabled threshold: Accept H0
– If |test statistic| > tabled threshold: Reject H0
• Computer-Era Method: Use probability of
getting observed value of test statistic (or a
more extreme value) when the null
hypothesis H0 is true (OVTSWNHT)
– If P(OVTSWNHT) ≥ α: Accept H0
– If P(OVTSWNHT) < α: Reject H0
Comparing the Traditional and
Computer-Era Methods
t-Test: Two-Sample Assuming Equal Variances

                        Men       Women
Mean                    5.000     6.250
Variance                0.800     2.214
Observations            6         8
Pooled Variance         1.625
Hypothesized Diff.      0
df                      12
t Stat                  -1.82
P(T<=t) two-tail        0.0945
t Critical two-tail     2.179

Part of Excel’s output for “A Worked
Example” from pp. 88-90 of Hinton
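The figures in this output can be checked from the summary statistics alone, and both decision methods can then be applied to them. The sketch below is one way to do so, not Excel's own procedure: it uses the standard pooled-variance formulas and obtains the two-tail probability by numerically integrating the t density. The value α = 0.05 is an assumption; it is consistent with the tabled threshold 2.179 for df = 12. The helper names pooled_t, t_pdf, and two_tail_p are hypothetical.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic, degrees of freedom, and pooled variance for two samples."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, pooled_var

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tail_p(t, df, steps=2000):
    """Two-tail probability via Simpson's rule on the t density from 0 to |t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area_0_to_t = total * h / 3
    return 2 * (0.5 - area_0_to_t)

# Summary statistics copied from the output above (Men, then Women).
t, df, pooled_var = pooled_t(5.000, 0.800, 6, 6.250, 2.214, 8)
p = two_tail_p(t, df)

alpha = 0.05        # assumed level of significance
t_critical = 2.179  # tabled two-tail threshold for df = 12, from the output

traditional = "Reject H0" if abs(t) > t_critical else "Accept H0"
computer_era = "Reject H0" if p < alpha else "Accept H0"

print(f"pooled variance = {pooled_var:.3f}, df = {df}, t = {t:.2f}, p = {p:.4f}")
print(f"Traditional method:  |t| vs. {t_critical} -> {traditional}")
print(f"Computer-era method: p vs. {alpha} -> {computer_era}")
```

Both methods lead to the same decision, as they must: |t| falls below the tabled threshold exactly when the two-tail probability is above α.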
Evidence in the Sample is Weighed
against Risk in order to Tip the Balance
toward Acceptance or Rejection