Action Research
Review
INFO 515
Glenn Booker
INFO 515
Lecture #10
Why do we do this?
Measurements are needed to understand
a system, and predict its future behavior
Statistical techniques provide a commonly
accepted means of analyzing
measurements
Statistics is based on recognizing that
measurements tend to fall over a range of
values, not just one precise number
Types of Research
Historical (what happened?)
Descriptive (what is happening?)
Developmental (over time)
Case and Field (study an organization)
Correlational (does A affect B?)
Causal Comparative (what caused it?)
True Experimental (single / double blind)
Quasi-Experimental
Action Research
Data Analysis
Raw data, such as one survey result
Refined data, such as the distribution of
ages of Philadelphia residents
Derived data, such as comparing the age
distribution of Philadelphia residents to
that of the country
Population vs. Sample
Often the subject of interest (population)
is so big it isn’t feasible to measure it all
Then a sample of measurements can be
made, and we want to relate the sample
measurement to the population
Sampling
Sampling can be done using probabilistic
techniques (e.g. various random samples)
Simple or stratified random,
Cluster (geographic), or
Systematic (every Nth) samples
Or using non-probabilistic methods
(whoever’s convenient, specific groups,
or experts)
Customer Satisfaction Surveys
A special case of sampling, customer
satisfaction surveys are often done using:
In person interview
Telephone interview
Questionnaire by mail
Sample sizes are based on the allowable
error, population size, and the result
obtained
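One common way to combine those three inputs is Cochran's sample size formula with a finite population correction. The slides do not say which formula the course uses, so the sketch below assumes that one; the function name and the example numbers are illustrative only.

```python
import math

def survey_sample_size(population, margin_of_error, expected_proportion, z_crit=1.96):
    """Cochran's formula with finite population correction (an assumed choice)."""
    p = expected_proportion
    # Sample size needed for an effectively infinite population.
    n0 = (z_crit ** 2) * p * (1 - p) / (margin_of_error ** 2)
    # Reduce it for a finite population of the given size.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# Hypothetical survey: 10,000 customers, +/- 5% allowable error,
# 95% confidence (z = 1.96), worst-case expected result of 50% satisfied.
n_needed = survey_sample_size(10_000, 0.05, 0.5)
print(n_needed)
```

An expected result near 50% gives the largest sample size, so it is a conservative default when the result is not known in advance.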
Measurement Scales
Measurements can use four major types
of scales; the types of analysis possible
depend strongly on the type of
measurements used
Nominal (named buckets, without sequence)
Ordinal (ordered buckets)
Interval (intervals mean something, can +-)
Ratio (you can form ratios, can +-*/ )
Discrete versus Continuous
Discrete (nonparametric) measurements
use nominal or ordinal scales; only specific
values are allowed
Car make = Chevy, or cost = High
Continuous (parametric) measurements
use interval or ratio scales, and generally
have integer or real number values
Temperature = 98.6 deg F, Height = 172.1 cm
Descriptive Statistics
Many common statistics can describe the
central tendency of a set of measurements
Average (arithmetic mean)
Minimum, Maximum, Range
Median (middle value)
Mode (most common value)
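A minimal sketch of these statistics using Python's standard library; the data values are made up for illustration.

```python
import statistics

# Hypothetical sample of measurements (illustrative values only).
data = [2, 3, 3, 5, 7, 8, 9]

mean = statistics.mean(data)          # arithmetic average
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most common value
value_range = max(data) - min(data)   # maximum minus minimum

print(mean, median, mode, min(data), max(data), value_range)
```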
Normal Distribution
Many measurements can be described by
a “normal” distribution, which is
summarized by an average value and a
standard deviation, σ or s
We can predict how likely any range of
values is to occur for a normal distribution
(how often is X between 5 and 8?)
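The question above can be answered with the cumulative distribution function. This sketch assumes a normal distribution with mean 6 and standard deviation 2; both numbers are invented for the example.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 6, standard deviation 2.
dist = NormalDist(mu=6, sigma=2)

# P(5 < X < 8) is the difference between the two cumulative probabilities.
p_between = dist.cdf(8) - dist.cdf(5)
print(round(p_between, 4))
```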
Z Score
Z scores measure how far from the mean
a single measurement is
z = (Xi - μ) / σ
Same formula used for finding “t” too
Applies to any distribution, not only a
normal one; but if the distribution is
normal, we can predict the probability of
that value or higher/lower occurring
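A short sketch of the z score and the tail probabilities it gives when the population is normal. The mean, standard deviation, and measurement are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # hypothetical population mean and standard deviation
x = 130.0                 # a single measurement

z = (x - mu) / sigma      # how many standard deviations x lies from the mean

# The probabilities below are only valid if the population is normal.
p_lower = NormalDist().cdf(z)    # probability of a value this low or lower
p_higher = 1 - p_lower           # probability of a value this high or higher
print(z, round(p_higher, 4))
```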
Standard Error
A sample of N measurements will have a
standard error SEx = s / sqrt(N)
The standard error allows us to define the
confidence interval, CI
CI = mean +/- crit*SEx
where “crit” is the critical z score for a
large sample, or the critical t score for a
small sample
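As a worked example, the sketch below forms a 95% confidence interval for a large sample, so the critical z score is used. The sample summary values are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical large sample: N = 36 measurements, mean 50, standard deviation 12.
n, mean, s = 36, 50.0, 12.0

se = s / math.sqrt(n)                # standard error of the mean
crit = NormalDist().inv_cdf(0.975)   # two-tailed critical z for 95% confidence
ci_low = mean - crit * se
ci_high = mean + crit * se
print(round(se, 2), round(crit, 2), round(ci_low, 2), round(ci_high, 2))
```

For a small sample, `crit` would instead be the critical t score for df = N - 1, taken from a t table or statistics package.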
Critical z and t
The critical z score is only a function of the
desired confidence level of the results
(zc = 1.96 for 95% confidence level)
Critical t score is a function of the sample
size (degrees of freedom, df = n-1) and
the desired confidence level
As df gets very large, critical t approaches critical z
Confidence Level
We have to accept some level of
uncertainty in a statistical analysis – our
conclusion might be wrong!
Generally, a 95% level of confidence is
used, unless life is on the line - then a
99% level of confidence is required
Use 95% typically, hence critical significance
is 0.050
Confidence Level
The level of confidence of your results,
plus the critical significance, always equals
exactly one
For practically every statistical test, having
the Significance of the result less than
the critical value means to reject the
null hypothesis
If Sig(actual) < Sig(crit), reject the null hypothesis
Frequency and Percentage
Frequency graphs and crosstabs can
provide a lot of information just from
counts of a nominal or ordinal
measurement occurring, possibly given
with the percentages of each event’s
occurrence
Histograms can provide similar charts for
ratio or interval scaled data
Scatterplots
Scatter plots or diagrams show the
relationship between two or more
measures
The horizontal axis is generally the
independent variable (X), sometimes also
called a factor or grouping variable
The vertical axis is generally the dependent
variable (Y), which is the measure you’re
trying to understand
Hypothesis Testing
Some statistics are used in the context of
testing a hypothesis - a statement whose
truth you wish to determine
Are Philadelphians more likely to be Nobel
Prize winners?
The Null hypothesis is the opposite of the
hypothesis, and generally says there is no
difference or no effect observed
Philadelphians no more likely to be Nobel Prize
winners than any other group
Hypothesis Testing
Can’t truly PROVE anything - only
determine if the differences observed are
“not likely to be due to chance”
Select one or more “Tests of Significance”
to determine if there is a statistically
significant difference (Yes/No); if Yes, then
can
Select one or more “Measures of Association”
to describe the strength of the difference, and
possibly its direction
One versus Two Tailed Tests
A null hypothesis which tests for “no
difference” uses a two tailed test
A null hypothesis which specifically tests
for “greater than” uses a one tailed test
A null hypothesis which specifically tests
for “less than” uses a one tailed test
Choosing one versus two tails changes the critical z
or t score; a one tailed test generally makes it easier
to show significance, which is why the more
conservative two tailed test is usually used
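The effect on the critical z score can be seen directly. This sketch assumes a 95% confidence level (critical significance 0.05).

```python
from statistics import NormalDist

alpha = 0.05  # critical significance for a 95% confidence level

# Two tailed: alpha is split evenly between the two tails.
z_two_tailed = NormalDist().inv_cdf(1 - alpha / 2)
# One tailed: all of alpha sits in a single tail.
z_one_tailed = NormalDist().inv_cdf(1 - alpha)

print(round(z_two_tailed, 3), round(z_one_tailed, 3))
```

The one tailed critical value is the smaller of the two, so an actual z score between them would be significant only under the one tailed test.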
Z or T Test
The z or t tests can be used to compare
two distribution means, or compare one
distribution mean to a fixed value (interval
or ratio data)
Compare the actual z or t score to the
critical z or t score
If the actual z or t score is closer to zero
than the critical value, accept the null
hypothesis
Z or T Test (Two Tailed)
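[Figure: the z or t scale, with the mean at the center and the critical values -crit and +crit marked. The region between -crit and +crit is labeled Accept Null Hypothesis; the regions beyond each critical value are labeled Reject Null Hypothesis. An X marks the actual z or t.]
Notice this is for the z or t value, NOT the significance of that value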
Z or T Test (One Tailed)
[Figure: the z or t scale, with the mean and +crit marked. The region below +crit is labeled Accept Null Hypothesis; the region above +crit is labeled Reject Null Hypothesis. An X marks the actual z or t.]
(Case here is testing if the actual value is greater than the mean;
for a “less than” case, use only the negative critical value.)
Is My Sample Normal?
Boxplots and stem-and-leaf diagrams can
help show graphically whether a sample
has a fairly normal distribution
The skewness and kurtosis of a data set
can help identify non-normality, if their
values are more than two times their own
standard errors
T Tests
T tests compare means for ratio or
interval data
Independent t test is for two different strata
within one data set
Paired t test is to compare measures of the
same group before and after some event (drug
test), or the samples are otherwise believed to
be dependent on each other
One-sample t test compares one sample to a
fixed value
T Tests
Null hypothesis is that there is no
difference between the means
Results (e.g. significance) may differ if
variances are not equal, since df changes
The Levene test checks for equal
variances
Null hypothesis for the Levene test is that the
variances are equal
If the Levene significance < 0.050, variances
are not equal (reject the null hypothesis)
Independent T Test Evaluation
Three ways to check the results of a T test
If the T test’s significance < 0.050, reject the
null hypothesis
Check the stated t value against the critical t
value for this ‘df’ level; if |t(actual)| > t(critical),
reject the null hypothesis
If the confidence interval for the difference
between the means does not include zero,
reject the null hypothesis
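The second and third checks can be sketched by hand with the standard library. The two groups are made-up data, equal variances are assumed (pooled variance), and the critical t value is copied from a standard t table rather than computed. The first check needs the significance (p value) of t, which the standard library does not provide, so it is left to a statistics package.

```python
import math
import statistics

# Hypothetical independent groups (illustrative values only).
group_a = [5, 7, 8, 6, 9]
group_b = [2, 4, 3, 5, 1]
n_a, n_b = len(group_a), len(group_b)
mean_diff = statistics.mean(group_a) - statistics.mean(group_b)

# Pooled variance: assumes the two group variances are equal.
pooled_var = ((n_a - 1) * statistics.variance(group_a)
              + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_actual = mean_diff / se_diff
df = n_a + n_b - 2
t_critical = 2.306  # two tailed, 95% confidence, df = 8 (from a t table)

# Check 2: compare the magnitude of t to the critical t.
reject_by_t = abs(t_actual) > t_critical

# Check 3: does the confidence interval for the difference exclude zero?
ci_low = mean_diff - t_critical * se_diff
ci_high = mean_diff + t_critical * se_diff
reject_by_ci = not (ci_low <= 0 <= ci_high)

print(round(t_actual, 3), df, reject_by_t, reject_by_ci)
```

The two checks always agree with each other, because they are built from the same critical t and standard error.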
Evaluating Significance
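[Figure: the significance scale starting at 0, with the critical significance 0.050 marked. The region from 0 to 0.050 is labeled Reject Null Hypothesis; the region above 0.050 is labeled Accept Null Hypothesis. An X marks the actual significance.]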
Paired T Test Evaluation
Checks before and after test cases
Includes a correlation factor (like ‘r’)
The paired test can be used if the correlation’s
significance < 0.050
A larger correlation factor means a stronger
relationship between the variables
Evaluate the test the same way as the Independent T Test:
Significance, ‘t’ value, and confidence interval
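A paired t test works on the per-subject differences. The sketch below uses invented before and after scores and a critical t value copied from a standard t table.

```python
import math
import statistics

# Hypothetical before/after measurements on the same five subjects.
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]

diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
se_diff = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference

t_actual = mean_diff / se_diff
t_critical = 2.776  # two tailed, 95% confidence, df = n - 1 = 4 (from a t table)

ci_low = mean_diff - t_critical * se_diff
ci_high = mean_diff + t_critical * se_diff
reject_null = abs(t_actual) > t_critical

print(round(t_actual, 3), round(ci_low, 3), round(ci_high, 3), reject_null)
```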
One-Sample T Test
Compare a sample mean to a fixed value
Test shows the actual values of means,
with their std deviation and std error
Same interpretation of results
Significance, ‘t’ value, and confidence interval
F Test and ANOVA
Compare several means against each
other using Analysis of Variance (ANOVA)
and the F test
Like extending the T tests to many
variables
Want data from random samples of
normal populations with equal variances
F Test and ANOVA
Output includes the Levene test
Want significance for Levene > 0.050, so
that equal variances can be assumed
Otherwise, should not use ANOVA
Evaluate F by its significance
If Sig. < 0.050, reject the null hypothesis
(there is a significant difference among the
means)
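The F statistic itself is the ratio of between-group to within-group variance. This sketch computes it by hand for three made-up groups and compares it to a critical F copied from a standard F table; a statistics package would report the significance of F directly. The Levene check is not shown.

```python
import statistics

# Three hypothetical groups (illustrative values only).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
all_values = [x for g in groups for x in g]
grand_mean = statistics.mean(all_values)

# Sum of squares between groups and within groups.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_actual = (ss_between / df_between) / (ss_within / df_within)
f_critical = 5.14  # 95% confidence, df = (2, 6), from an F table
reject_null = f_actual > f_critical

print(round(f_actual, 2), df_between, df_within, reject_null)
```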
Additional ANOVA Tests
Once the F test shows there is some
difference in the means across a subset,
additional ANOVA tests can help identify
more specific trends and differences
Types of tests (see end of lecture 6)
include
Pairwise Multiple Comparisons
Post Hoc Range Tests
Pairwise Multiple Comparisons
Pairwise Multiple Comparisons check two
subsets of data at a time
Bonferroni test is better for a small number
of subsets
Tukey test is better for many subsets
Both assume subset variances are equal
For each pair of subset values,
Sig < 0.050 means the difference in
means is significant
Post Hoc Range Tests
Post Hoc Range Tests look for homogeneous
subsets: groups whose means are all
statistically similar
Tukey and Tukey’s-b tests include Post Hoc
Range Tests
Each column of the output is a subset with
statistically similar means
Subsets may overlap substantially
Contrasts Across Means
Look across subset means to see if there
is a trend, such as a linear increase or
decrease across subsets
Can check for Linear, Quadratic, or Cubic
relationships
(i.e. first, second, or third order polynomials)
Check Significance of F for the Unweighted
version of each relationship (Linear, etc.); if
Sig. < 0.050, reject the null hypothesis
Determine Linearity
An option under Compare Means / Means
allows checking just for linearity
This confirms the ANOVA test result for
Linearity
And gives R and Eta parameters, which
are Measures of Association
R and Eta
Pearson’s R * measures how well the data
fits the regression (-1 is a perfect negative
correlation, 0 is no relationship, 1 is
perfect positive correlation); R squared
describes the amount of variance shared
between the two variables
Eta squared gives how much of the
variance in one variable is explained by
the changes in the other variable
* Named for English statistician Karl Pearson, 1857-1936
(per http://human-nature.com/nibbs/03/kpearson.html)
Regression Analysis
Regression Analysis looks at two interval
or ratio-scaled variables (generically X and
Y) and tries to fit an equation between
them
A dozen different equations are available
Linear, Power, Logarithmic, Exponential, etc.
Significance is checked by ANOVA F, and
Sig. of the regression coefficients;
association is measured with R Squared
Regression Analysis
For a regression to have any significance,
we must have ANOVA’s Sig. F < 0.050
Then each variable’s coefficient (b0, b1,
etc.) must have significance < 0.050
Otherwise the coefficient might be zero
Then the better regression equations are
ranked in order of strength by R Square,
which is confirmed visually by plotting
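For the linear case, the pieces above can be sketched with ordinary least squares. The data points are invented, only the slope b1 is tested, and the critical t value is copied from a standard t table; a statistics package would report Sig. F and the coefficient significances directly.

```python
import math

# Hypothetical paired measurements (illustrative values only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b1 = s_xy / s_xx                        # slope
b0 = mean_y - b1 * mean_x               # intercept
r_squared = s_xy ** 2 / (s_xx * s_yy)   # strength of the linear fit

# Standard error of the slope, from the residual variance.
ss_residual = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(ss_residual / (n - 2) / s_xx)

t_b1 = b1 / se_b1
t_critical = 3.182  # two tailed, 95% confidence, df = n - 2 = 3 (from a t table)
slope_significant = abs(t_b1) > t_critical

print(round(b0, 2), round(b1, 2), round(r_squared, 2), round(se_b1, 3), slope_significant)
```

In this example R Square is moderately high but the slope is not significant with so few points, so the coefficient might be zero and the equation should not be trusted.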
Regression Analysis
The standard error of coefficients is given,
so confidence intervals can be formed
Also helps report them meaningfully, so you
don’t report a value as 4.861435 if it has a
standard error of 0.92
Depending on the accuracy of the source data, you
could report that result as 5 +/- 1, or 4.9 +/- 0.9,
or 4.86 +/- 0.92
Crosstabs
Crosstabs display data sorted by two
or more variables in table form
Often just counts of each category,
and/or the percentage of counts
Recoding data allows interval or ratio
scale data to be put into groups (e.g.
age 18-25)
Pearson’s Chi Square
Measures how much the actual (observed)
data differs from an even (expected)
distribution of data
The “expected” data can be a random
distribution (same number of counts per
cell), or adjusted for the actual total
counts for each row and column
Pearson’s Chi Square Evaluation
When chi square is larger than the critical
value, reject the null hypothesis
Or if the significance of chi square is <
0.050, reject the null hypothesis
Can also generate Chi square for a single
variable
Beware that Chi square is less meaningful
for large matrices
Or, it’s too easy for large matrices to show
significance falsely using Chi square
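A minimal sketch of the chi square calculation for a crosstab, with expected counts adjusted for the row and column totals. The 2x2 counts are made up, and the critical value is copied from a standard chi square table.

```python
# Hypothetical 2x2 crosstab of observed counts.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

df = (len(row_totals) - 1) * (len(col_totals) - 1)
chi_critical = 3.841  # 95% confidence, df = 1 (from a chi square table)
reject_null = chi_square > chi_critical

print(round(chi_square, 3), df, reject_null)
```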
Residuals
A residual is the difference between the
Observed and Estimated values for a cell
Residuals can be plotted to look for
outliers
Residuals can be standardized by dividing
by their standard deviation
Cells with a standardized residual magnitude
> 2 contribute a lot to Chi square
Measures of Association
Measures of Association between two
variables can be symmetric or directional
Dozens of measures have been developed
to work with chi square test
Interpret them like ‘r’ - zero means no
correlation, larger values mean a stronger
correlation
Some can be > 1
Measures of Association
Symmetric measures don’t care which
variable is dependent (Y)
Directional measures DO care which
variable is dependent (A = f(B) is not B =
f(A))
Some directional measures have a “symmetric”
value, the weighted average of the other two
Symmetric Measures
The “Contingency Coefficient” is the main
symmetric measure with a Chi Square test
Works even with nominal data
Evaluated like Pearson’s r
Phi and Cramer’s V are other symmetric
measures
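All three symmetric measures are simple functions of chi square and the total count. The sketch below uses their usual textbook definitions on a made-up 2x2 table; the helper name is illustrative.

```python
import math

def chi_square_of(table):
    """Pearson chi square, with expected counts from row and column totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return sum((o - r * c / total) ** 2 / (r * c / total)
               for row, r in zip(table, row_totals)
               for o, c in zip(row, col_totals))

# Hypothetical 2x2 crosstab of observed counts.
table = [[30, 10],
         [20, 40]]
n = sum(sum(row) for row in table)
chi2 = chi_square_of(table)
smaller_dim = min(len(table), len(table[0]))

contingency_coefficient = math.sqrt(chi2 / (chi2 + n))
phi = math.sqrt(chi2 / n)
cramers_v = math.sqrt(chi2 / (n * (smaller_dim - 1)))

print(round(contingency_coefficient, 3), round(phi, 3), round(cramers_v, 3))
```

For a 2x2 table Phi and Cramer's V are equal; they differ only for larger tables.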
Directional Measures
Directional measures range from 0 to 1
Lambda is the recommended directional
measure - tells what proportion of the
dependent variable is predicted by the
independent variable (like Eta)
Eta can be applied here if one variable is
interval or ratio scaled
Relative Risk and Odds Ratio
Use only with 2x2 tables
Are quite directional
Tells how much more likely one cell is to
occur than the others
Need to be very careful when interpreting
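A small sketch of both measures for a 2x2 table, using their usual definitions. The layout (rows are exposed and not exposed, columns are event and no event) and the counts are assumptions made for this example.

```python
# Hypothetical 2x2 table:
#                 event   no event
# exposed           a        b
# not exposed       c        d
a, b = 30, 10
c, d = 20, 40

risk_exposed = a / (a + b)       # proportion of the exposed row with the event
risk_unexposed = c / (c + d)     # proportion of the unexposed row with the event
relative_risk = risk_exposed / risk_unexposed

odds_ratio = (a * d) / (b * c)   # odds of the event, exposed vs. not exposed

print(round(relative_risk, 2), round(odds_ratio, 2))
```

Swapping the rows or the columns changes both values, which is one reason the direction of the table matters when interpreting them.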
Square Tables
Tables with the same number of rows and
columns (RxR), and the same variables in
those rows and columns, can use kappa
Measures strength of association, like ‘r’
Check results for significance (<0.050)
Then judge the value of kappa using a
fixed scale
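Kappa compares the observed agreement on the diagonal with the agreement expected by chance. This sketch uses the standard Cohen's kappa definition on a made-up 2x2 table where rows and columns hold the same categories; the significance test is not shown.

```python
# Hypothetical square table: rows and columns are the same two categories
# (for example, two raters classifying the same 50 items).
table = [[20, 5],
         [10, 15]]

total = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Observed agreement: proportion of counts on the diagonal.
observed_agreement = sum(table[i][i] for i in range(len(table))) / total
# Agreement expected by chance, from the row and column totals.
expected_agreement = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)
print(round(kappa, 3))
```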
General RxC Measures
Many measures can be used with a
general table of R rows and C columns
Gamma is the recommended measure
(symmetric)
Spearman’s Correlation Coefficient is also
widely used
Ranges from -1 to +1, based on ordered
categories
Yule’s Q
Yule’s Q is a special case of gamma for a
2x2 table
Is judged on a fixed scale, like ‘r’
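Yule's Q has a simple closed form for a 2x2 table. The counts below are made up for illustration.

```python
# Hypothetical 2x2 table of counts:
#   a  b
#   c  d
a, b = 30, 10
c, d = 20, 40

# Concordant (a*d) minus discordant (b*c) pairs, over their sum.
yules_q = (a * d - b * c) / (a * d + b * c)
print(round(yules_q, 3))
```

Q ranges from -1 to +1, with 0 meaning no association.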