Document 7454248


Final review - statistics
Spring 03
Also, see final review - research design
Statistics
Descriptive Statistics: statistics to summarize and describe the data we collected
Inferential Statistics: statistics to make inferences from samples to the populations
A summary of your data
Center / Central Tendency: indicates a central value for the variable
Measures of Dispersion (Variability / Spread): indicate how much participants' scores vary from one another
Measures of Association: indicate how much variables go together
(Shown in tables, graphs, distributions)
Measures of Center
• Mode: the value with the highest frequency (the most common value)
• Median: the “middle” score
• Mean: the average
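As a quick illustration, the three measures of center can be computed with Python's standard `statistics` module. This is a sketch, not part of the course material; the scores are made up.

```python
import statistics

# Hypothetical scores, made up for illustration only
scores = [2, 3, 3, 3, 4, 5, 8]

mode = statistics.mode(scores)      # most frequent value
median = statistics.median(scores)  # middle score once sorted
mean = statistics.mean(scores)      # arithmetic average

print(f"mode={mode}, median={median}, mean={mean:.2f}")
```

Here the single high score (8) pulls the mean (4) above the median and the mode (both 3). This is why the mean is sensitive to extreme values.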
WHY are LEVELS / SCALES of MEASUREMENT IMPORTANT?
• Because you need to match the statistic you use to the kind of variable you have
Measures of Central Tendency (Center)
Nominal: Mode
Ordinal: Mode, Median
Interval/Ratio: Mode, Median, Mean
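Level of Measurement
Each level adds information about the differences among values:
Nominal: difference (categories only)
Ordinal: + order
Interval: + equal intervals
Ratio: + meaningful zero
Lower levels only let you summarize values. Interval/ratio levels let you calculate with them (math).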
Why Does “Equal Distance” Matter?
• If the distances between values are equal (as in interval or ratio data), you can calculate with the values (add, subtract, multiply, divide)
• You can get a mean only for interval/ratio variables
• A wider variety of statistical tests is available for interval/ratio variables
[Figure: frequency distribution of scores ranging from 4 to 10]
What are the Mean, Median, and Mode for this distribution?
What is this distribution shape called?
Types of Measures of Dispersion (Variability / Spread)
• Frequencies / Percentages
• Range: the distance between the highest score and the lowest score (highest – lowest)
• Standard deviation / Variance

Variance / Standard Deviation
• Variance (S²): an approximate average of the squared deviations from the mean (the sum of squared deviations is divided by n - 1 rather than n)
• Standard Deviation (S or SD): the square root of the variance
• The larger the variance/SD, the more variable the data: scores differ more widely from the mean.
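The following is a minimal sketch of the range, variance and standard deviation computed by hand in Python. It assumes the usual sample formula (dividing by n - 1), which is why the variance is only an "approximate" average. The scores are hypothetical.

```python
import math

# Hypothetical scores for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n                      # 40 / 8 = 5.0
score_range = max(scores) - min(scores)     # highest - lowest

# Sample variance (S^2): sum of squared deviations from the mean,
# divided by n - 1
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
sd = math.sqrt(variance)                    # SD = square root of variance

print(f"range={score_range}, variance={variance:.3f}, SD={sd:.3f}")
```

Python's `statistics.variance` and `statistics.stdev` use the same n - 1 formula and can be used to double-check the result.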
Measures of Dispersion
Nominal: Frequency, %
Ordinal: Frequency, %; Range, IQR
Interval/Ratio: Frequency, %; Range, IQR; Standard Deviation, Variance
CORRELATION
• Co-relation: 2 variables tend to “go together”
• Indicates how strongly and in which direction two variables are correlated with each other
• *** Correlation does NOT EQUAL cause
SIGN
• 0: No systematic relationship
• Positive correlation: As one variable increases, so does the 2nd
• Negative correlation: As one variable increases, the 2nd gets smaller
Correlation Co-efficient
-1 (perfect negative) ... 0 (none) ... +1 (perfect positive)
The closer to -1 or +1, the stronger the relationship. The closer to 0, the weaker the relationship.
SIZE (ranges)
• from –1 to +1
• 0 or close to 0 indicates NO relationship
• +/- .2 - .4 weak
• +/- .4 - .6 moderate
• +/- .6 - .8 strong
• +/- .8 - .9 very strong
• +/- 1.00 perfect
Negative relationships are NOT weaker!
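The correlation coefficient described above is usually Pearson's r. The following is a minimal sketch of computing it from two lists. It assumes paired interval/ratio scores. The function name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y divided by the
    product of their spreads (the n - 1 terms cancel out)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
score = [55, 60, 70, 72, 80]
print(round(pearson_r(hours, score), 2))
```

The sign of the result gives the direction of the relationship and its size gives the strength, matching the SIGN and SIZE rules above.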
Significance Test
• A correlation co-efficient also comes with a significance test (p-value)
• p = .05: if there were no correlation in the population, a sample correlation this strong would occur by chance only 5% of the time. Using .05 as the cutoff means a 5% risk of TYPE I error = 95% confidence level
• If p < .05, reject H0 and support Ha at the 95% confidence level
1. Infer characteristics of a population from the characteristics of samples.
2. Hypothesis Testing
3. Statistical Significance
4. The Decision Matrix
Population → Sample
Sample statistics (X̄, SD, n) are used to infer population parameters (μ, σ, N).
Inferential Statistics assess:
• Are the sample statistics indicators of the population parameters?
• Differences between 2 groups: did they happen by chance?
• What effect do random sampling errors have on our results?
Random sampling error
• Random sampling error: difference between the sample characteristics and the population characteristics caused by chance
• Sampling bias: difference between the sample characteristics and the population characteristics caused by biased (non-random) sampling
Probability
• Probability (p) ranges from 0 to 1
• p = 1 means that the event would occur in every trial
• p = 0 means the event would never occur in any trial
• The closer the probability is to 1, the more likely the event will occur
• The closer the probability is to 0, the less likely the event will occur
p > .05 means that …
• The means of the two groups fall within the central 95% area of the normal distribution around one population mean
[Figure: Mean 1 and Mean 2 both inside the central 95% area]
p < .05 means that …
• The means of the two groups do NOT fall within the central 95% area of the normal distribution of one population mean. It is therefore more reasonable to assume that they belong to different populations
[Figure: two distributions with population means μ1 and μ2]
Null Hypothesis
Says the IV has no influence on the DV:
There is no difference between the two groups.
There is no relationship between the two variables.
Null Hypothesis
• States there is NO true difference between the groups
• If sample statistics show any difference, it is due to random sampling error
• Referred to as H0
• (Research Hypothesis = Ha)
• If you can reject H0, you can support Ha
• If you fail to reject H0, you reject Ha

Be conservative.

What are the chances I would get these results if the null hypothesis is true?

Only if the pattern is highly unlikely (p < .05) do you reject the null hypothesis and support your hypothesis.
Since you cannot be 100% sure your conclusion is correct, you take up to a 5% risk.
Your p-value tells you the risk / the probability of making a TYPE I error.
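The question "What are the chances I would get these results if the null hypothesis is true?" can be illustrated with a permutation test. This technique is swapped in here for illustration only; it is not the method the course teaches. If H0 is true, the group labels are arbitrary. Shuffling them many times therefore shows how often chance alone produces a difference at least as large as the observed one. The function name and data are hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate how often a mean difference at least as large as the
    observed one appears when group labels are shuffled (H0 true)."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign scores to the two groups at random
        a, b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical ratings for two groups
group_1 = [2, 3, 2, 1, 3, 2, 2, 3]
group_2 = [4, 5, 4, 3, 5, 4, 4, 5]
p = permutation_p_value(group_1, group_2)
print(f"p = {p:.4f}; reject H0 at .05? {p < 0.05}")
```

A small p means the observed difference rarely arises from chance alone, so H0 is rejected.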
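Example: choosing whom to marry (your decision in rows, true state in columns)
                                      True: wrong person to marry   True: right person to marry
You think it’s the wrong person       Correct                       Type II error
You think it’s the right person       Type I error                  Correct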
Example: fire alarm (your decision in rows, true state in columns)
               True: No fire     True: Fire
No Alarm       Correct           Type II error
Alarm          Type I error      Correct
You decide... (rows) / True State (columns)
                           Ho (no fire)      Ha (fire)
Accept Ho (no alarm)       Correct           Type II error
Reject Ho                  Type I error      Correct
Ho = null hypothesis = there is NO fire
Ha = alternative hyp. = there IS a FIRE
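The decision matrix can also be expressed as a small lookup. This is a sketch that assumes H0 = "there is NO fire", as above, so sounding the alarm corresponds to rejecting H0. The function name `decision_outcome` is hypothetical.

```python
def decision_outcome(reject_h0: bool, h0_true: bool) -> str:
    """Classify one cell of the decision matrix."""
    if reject_h0 and h0_true:
        return "Type I error"   # alarm sounds, but there is no fire
    if not reject_h0 and not h0_true:
        return "Type II error"  # no alarm, but there IS a fire
    return "Correct"

# Fire-alarm example: H0 = there is NO fire; sounding the alarm = rejecting H0
for fire in (False, True):
    for alarm in (False, True):
        outcome = decision_outcome(reject_h0=alarm, h0_true=not fire)
        print(f"fire={fire}, alarm={alarm}: {outcome}")
```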
Easy ways to LOSE points
• Using the word “prove”
  - Better to say the results support the hypothesis or are consistent with the hypothesis
  - Tentative statements acknowledge the possibility of making a Type I or Type II error
• Using the word “random” incorrectly
Significance Test
• A significance test examines the probability of a TYPE I error (falsely rejecting H0)
• A significance test examines how probable it is that the observed difference is caused by random sampling error
• Reject the null hypothesis if the probability is < .05 (the probability of a TYPE I error is smaller than .05)
Principal Logic
p < .05 → Reject the Null Hypothesis (H0) → Support Your Hypothesis (Ha)
Logic of Hypothesis Testing
Statistical tests used in hypothesis testing deal with the probability of a particular event occurring by chance.
Is the result common or a rare occurrence if only chance is operating?
A score (or result of a statistical test) is “significant” if the score is unlikely to occur on the basis of chance alone.
Level of Significance
The “Level of Significance” is a cutoff point for determining significantly rare or unusual scores.
Scores outside the middle 95% of a distribution are considered “rare” when we adopt the standard “5% Level of Significance”.
This level of significance can be written as: p = .05
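You can check where the "middle 95%" boundaries lie with Python's `statistics.NormalDist` (Python 3.8+). This sketch assumes a two-tailed test on the standard normal (z) distribution. `z_cutoff` and `is_rare` are hypothetical helper names.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def z_cutoff(alpha=0.05):
    """Two-tailed cutoff: alpha is split equally between the two tails."""
    return standard_normal.inv_cdf(1 - alpha / 2)

def is_rare(z, alpha=0.05):
    """True if z falls outside the middle (1 - alpha) of the distribution."""
    return abs(z) > z_cutoff(alpha)

cutoff = z_cutoff(0.05)
print(f"middle 95% lies between -{cutoff:.2f} and +{cutoff:.2f}")  # -1.96 and +1.96
```

At the 5% level, a z score beyond ±1.96 counts as "rare".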
Decision Rules
Reject Ho (accept Ha) when the sample statistic is statistically significant at the chosen p level. Otherwise, accept Ho (reject Ha).
Possible errors:
A. You reject the Null Hypothesis when in fact it is true: a Type I Error, or Error of Rashness.
B. You accept the Null Hypothesis when in fact it is false: a Type II Error, or Error of Caution.
Decision matrix (true state vs. your decision):
• Null is true (data results are by chance) and you accept the null (nothing is happening except chance variation): Correct
• Null is false (data indicates something is happening) and you accept the null: Type II error
• Null is true and you reject the null (data indicates something significant is happening): Type I error
• Null is false and you reject the null: Correct
To compare two groups on mean scores, use a t-test.
For more than 2 groups, use Analysis of Variance (ANOVA).
You can’t get a mean from nominal or ordinal data.
Chi-square tests the difference in frequency distributions of two or more groups.
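The rules of thumb above can be summarized as a small decision helper. This is a simplified sketch. `choose_test` is a hypothetical name. It only handles comparisons of groups on one outcome variable, and it ignores other assumptions a real analysis would check.

```python
def choose_test(level: str, n_groups: int) -> str:
    """Pick a test from the outcome's level of measurement and the number
    of groups (hypothetical helper following the rules of thumb above)."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if level in ("interval", "ratio"):
        # Means are meaningful, so compare mean scores
        return "t-test" if n_groups == 2 else "ANOVA"
    if level in ("nominal", "ordinal"):
        # No mean available: compare frequency distributions instead
        return "chi-square"
    raise ValueError(f"unknown level of measurement: {level!r}")

print(choose_test("ratio", 2), choose_test("interval", 4), choose_test("nominal", 3))
```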
Parametric Tests
• Used with data for which a mean score and standard deviation can be calculated (interval/ratio data)
• t-test, ANOVA, and Pearson’s Correlation r
• Use a t-test to compare mean differences between two groups (e.g., male/female and married/single).
• Use ANalysis Of VAriance (ANOVA) to compare more than two groups (such as age groups or family income levels) and get probability scores for the overall group differences.
• Use post hoc tests to identify which subgroups differ significantly from each other.
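To make the idea behind ANOVA concrete, this sketch computes the one-way ANOVA F statistic by hand. F is the variation between group means divided by the variation within groups. The data and function name are hypothetical. Turning F into a p-value requires the F distribution; for example, `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
def one_way_anova_f(*groups):
    """F = between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical scores for three age groups
young, middle, older = [1, 2, 3], [4, 5, 6], [7, 8, 9]
print(one_way_anova_f(young, middle, older))  # 27.0
```

A large F means the groups differ more from one another than scores differ within each group. A post hoc test is then needed to locate which groups differ.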
When comparing two groups on MEAN SCORES, use the t-test:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}}$$
T-test
• If p < .05, we conclude that the two groups are drawn from populations with different distributions (reject H0) at the 95% confidence level
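The following sketch implements the t formula for comparing two groups on mean scores. It uses each group's mean, SD and n. This is the unpooled form, the one used in Welch's t-test. The numbers and function name are hypothetical. Getting a p-value needs the t distribution, for example `scipy.stats.ttest_ind_from_stats(..., equal_var=False)`.

```python
import math

def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """t = (Mean 1 - Mean 2) / sqrt(SD1^2/n1 + SD2^2/n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Hypothetical summary statistics for two groups
t = t_statistic(mean1=5.0, sd1=2.0, n1=4, mean2=3.0, sd2=2.0, n2=4)
print(round(t, 3))  # 1.414
```

The larger the difference between the means relative to the standard error, the larger |t| and the smaller the p-value.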
Our Research Hypothesis: hair length leads to
different perceptions of a person.
The Null Hypothesis: there will be no difference
between the pictures.
Results (n = 100 per group):

“I think she is one of those people who quickly earns respect.”
Short Hair: Mean = 2.2, SD = 1.9 | Long Hair: Mean = 4.1, SD = 1.8 | p = .03
Accept Ha: mean scores come from different distributions.

“In my opinion, she is a mature person.”
Short Hair: Mean = 1.6, SD = 1.7 | Long Hair: Mean = 3.6, SD = 1.2 | p = .01
Accept Ha: mean scores come from different distributions.

“I think we are quite similar to one another.”
Short Hair: Mean = 3.7, SD = 1.8 | Long Hair: Mean = 3.9, SD = 1.5 | p = .89
Accept Ho: mean scores are just chance differences from a single distribution.
A nonsignificant result may be caused by a
A. low sample size.
B. very cautious significance level.
C. weak manipulation of independent variables.
D. true null hypothesis.
When to use various statistics
• Parametric: interval or ratio data
• Non-parametric: ordinal and nominal data
Chi-Square (χ²)
• Chi-square tests the difference in frequency distributions of two or more groups.
• Test of significance
  - of two nominal variables, or
  - of a nominal variable & an ordinal variable
• Used with a cross tabulation table
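The following is a minimal sketch of how the chi-square statistic is computed from a cross tabulation table. Each cell's observed count is compared with the count expected if the two variables were unrelated. The function name and counts are hypothetical. For a p-value, `scipy.stats.chi2_contingency` can be used. Note that by default it applies Yates' continuity correction to 2 × 2 tables.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a cross tabulation
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical cross tab: rows = two groups, columns = yes/no answers
counts = [[10, 20],
          [20, 10]]
chi2, df = chi_square(counts)
print(f"chi-square = {chi2:.2f}, df = {df}")
```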