APPROACHES TO QUANTITATIVE DATA ANALYSIS © LOUIS COHEN, LAWRENCE MANION & KEITH MORRISON.
STRUCTURE OF THE CHAPTER
• Scales of data
• Parametric and non-parametric data
• Descriptive and inferential statistics
• Kinds of variables
• Hypotheses
• One-tailed and two-tailed tests
• Distributions
• Statistical significance
• Hypothesis testing
• Effect size
• A note on symbols
FOUR SCALES OF DATA
NOMINAL ORDINAL INTERVAL RATIO
It is incorrect to apply statistics which can only be used at a higher scale of data to data at a lower scale.
PARAMETRIC AND NON-PARAMETRIC STATISTICS
• Parametric statistics: where characteristics of, or factors in, the population are known;
• Non-parametric statistics: where the characteristics of, or factors in, the population are unknown.
DESCRIPTIVE AND INFERENTIAL STATISTICS
• Descriptive statistics: to summarize features of the sample or simple responses of the sample (e.g. frequencies or correlations).
  – No attempt is made to infer or predict population parameters.
• Inferential statistics: to infer or predict population parameters or outcomes from simple measures, e.g. from sampling and from statistical techniques.
  – Based on probability.
DESCRIPTIVE STATISTICS
• The mode (the score obtained by the greatest number of people);
• The mean (the average score);
• The median (the score obtained by the middle person in a ranked group of people, i.e. it has an equal number of scores above it and below it);
• Minimum and maximum scores;
• The range (the distance between the highest and the lowest scores);
• The variance (a measure of how far scores are from the mean: the average of the squared deviations of individual scores from the mean);
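The measures listed above can be sketched with Python's standard `statistics` module. The scores are hypothetical, invented only for illustration, and `pvariance` is used because the slide defines variance as the average of the squared deviations (dividing by N).

```python
import statistics

# Hypothetical test scores, invented for illustration only.
scores = [4, 7, 7, 8, 9, 10, 11]

mode = statistics.mode(scores)            # score obtained by the most people
mean = statistics.mean(scores)            # the average score
median = statistics.median(scores)        # middle score of the ranked list
minimum, maximum = min(scores), max(scores)
score_range = maximum - minimum           # distance between highest and lowest
variance = statistics.pvariance(scores)   # average squared deviation from the mean

print(mode, mean, median, minimum, maximum, score_range, round(variance, 2))
```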
SIMPLE STATISTICS
• Frequencies (raw scores and percentages)
  – Look for skewness, intensity, distributions and spread (kurtosis);
• Mode
  – For nominal and ordinal data
• Mean
  – For interval and ratio data
• Standard deviation
  – For interval and ratio data
[Figure: three dot plots of five scores each on a scale of 1 to 20, every set with Mean = 6]
• Scores 1, 2, 3, 4, 20: High standard deviation
• Scores 1, 2, 6, 10, 11: Moderately high standard deviation
• Scores 5, 6, 6, 6, 7: Low standard deviation
STANDARD DEVIATION
• The standard deviation is a standardised measure of the dispersal of the scores, i.e. how far away from the mean/average each score is. It is calculated, in its most simplified form, as:

$$S.D. = \sqrt{\frac{\sum d^2}{N}} \quad \text{or} \quad S.D. = \sqrt{\frac{\sum d^2}{N - 1}}$$

• d² = the deviation of the score from the mean (average), squared
• Σ = the sum of
• N = the number of cases
• A low standard deviation indicates that the scores cluster together, whilst a high standard deviation indicates that the scores are widely dispersed.
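As a minimal sketch of the two forms of the calculation, the function below divides the sum of squared deviations by N or by N − 1. The three score sets are the five-score sets from the standard deviation dot plots (each with a mean of 6); the function name and the `sample` flag are illustrative choices, not part of the original slides.

```python
import math

def standard_deviation(scores, sample=False):
    """Square root of the summed squared deviations over N (or N - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    sum_d2 = sum((x - mean) ** 2 for x in scores)  # sum of d squared
    divisor = n - 1 if sample else n
    return math.sqrt(sum_d2 / divisor)

high = [1, 2, 3, 4, 20]        # widely dispersed scores
moderate = [1, 2, 6, 10, 11]   # moderately dispersed scores
low = [5, 6, 6, 6, 7]          # scores clustered around the mean

for label, data in [("high", high), ("moderate", moderate), ("low", low)]:
    print(label, round(standard_deviation(data), 2),
          round(standard_deviation(data, sample=True), 2))
```

Dividing by N − 1 always gives a slightly larger value than dividing by N; the difference shrinks as the number of cases grows.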
DESCRIPTIVE STATISTICS
• The standard deviation (a measure of the dispersal or range of scores: the square root of the variance);
• The standard error (the standard deviation of sample means);
• The skewness (how far the data are asymmetrical in relation to a ‘normal’ curve of distribution);
• Kurtosis (how steep or flat is the shape of a graph or distribution of data; a measure of how peaked a distribution is and how steep is the slope or spread of data around the peak).
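One possible way to compute these three measures from a single sample is sketched below. It rests on assumptions the slides do not state: the standard error is estimated as the sample standard deviation divided by the square root of N, and skewness and kurtosis use the simple moment-based formulas (statistical packages often apply small-sample corrections, so their figures may differ slightly). The function name is hypothetical.

```python
import math

def shape_statistics(scores):
    """Standard error of the mean, skewness and excess kurtosis (moment-based)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    sample_sd = math.sqrt(m2 * n / (n - 1))        # S.D. using N - 1
    return {
        "standard_error": sample_sd / math.sqrt(n),
        "skewness": m3 / m2 ** 1.5,                # 0 for a symmetrical set
        "excess_kurtosis": m4 / m2 ** 2 - 3,       # 0 for a normal curve
    }

symmetrical = shape_statistics([1, 2, 3, 4, 5])
skewed = shape_statistics([1, 2, 3, 4, 20])
print(symmetrical)
print(skewed)
```

A positive skewness value indicates a long tail towards the higher scores, as in the second set.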
INFERENTIAL STATISTICS
• Can use descriptive statistics.
• Correlations
• Regression
• Multiple regression
• Difference testing
• Factor analysis
• Structural equation modelling
DEPENDENT AND INDEPENDENT VARIABLES
• An independent variable is an antecedent variable, that which causes, in part or in total, a particular outcome; it is a stimulus that influences a response, a factor which may be modified (e.g. under experimental or other conditions) to affect an outcome.
• A dependent variable is the outcome variable, that which is caused, in total or in part, by the input, antecedent variable. It is the effect, consequence of, or response to, an independent variable.
DEPENDENT AND INDEPENDENT VARIABLES
• In using statistical tests which require independent and dependent variables, exercise caution in assuming which is or is not the dependent or independent variable, as the direction of causality may not be one-way or in the direction assumed.
FIVE KEY INITIAL QUESTIONS
1. What kind (scales) of data are there?
2. Are the data parametric or non-parametric?
3. Are descriptive or inferential statistics required?
4. Do dependent and independent variables need to be identified?
5. Are the relationships considered to be linear or non-linear?
CATEGORICAL, DISCRETE AND CONTINUOUS VARIABLES
• A categorical variable is a variable which has categories of values, e.g. the variable ‘sex’ has two values: male and female.
• A discrete variable has a finite number of values of the same item, with no intervals or fractions of the value, e.g. a person cannot have half an illness or half a mealtime.
• A continuous variable can vary in quantity, e.g. money in the bank, monthly earnings. There are equal intervals, and, usually, a true zero, e.g. it is possible to have no money in the bank.
CATEGORICAL, DISCRETE AND CONTINUOUS VARIABLES
• Categorical variables match categorical data.
• Continuous variables match interval and ratio data.
KINDS OF ANALYSIS
• Univariate analysis: looks for differences amongst cases within one variable.
• Bivariate analysis: looks for a relationship between two variables.
• Multivariate analysis: looks for a relationship between two or more variables.
HYPOTHESES
• Null hypothesis (H₀)
• Alternative hypothesis (H₁)
• The null hypothesis is the stronger hypothesis, requiring rigorous evidence not to support it.
• One should commence with the former and cast the research in the form of a null hypothesis, and only turn to the latter in the case of finding the null hypothesis not supported.
HYPOTHESES
• Direction of hypothesis: states the kind of difference or relationship between two conditions or two groups of participants
• One-tailed (directional), e.g.: ‘people who study in silent surroundings achieve better than those who study in noisy surroundings’. (‘Better’ indicates the direction.)
• Two-tailed (no direction), e.g.: ‘there is a difference between people who study in silent surroundings and those who study in noisy surroundings’. (There is no indication of which is the better.)
ONE-TAILED AND TWO-TAILED TESTS
• A one-tailed test makes assumptions about the population and the direction of the outcome, e.g. Group A will score more highly than Group B on a test.
• A two-tailed test makes no assumptions about the population and the direction of the outcome, e.g. there will be a difference in the test scores.
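To illustrate why the choice of tails matters, the sketch below converts one test statistic into a one-tailed and a two-tailed probability. It assumes, purely for illustration, that the statistic is a z-score referred to the standard normal curve; the value 1.80 is hypothetical.

```python
from statistics import NormalDist

def p_values(z):
    """One-tailed and two-tailed probabilities for a z-score."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # One-tailed: only a result in the predicted (upper) direction counts.
    one_tailed = 1 - standard_normal.cdf(z)
    # Two-tailed: a result this extreme in either direction counts.
    two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))
    return one_tailed, two_tailed

one_tailed, two_tailed = p_values(1.80)  # hypothetical z-score
print(round(one_tailed, 4), round(two_tailed, 4))
```

For a positive z-score the two-tailed probability is twice the one-tailed probability, so the same result may reach the 0.05 level under a directional hypothesis and miss it under a non-directional one.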
THE NORMAL CURVE OF DISTRIBUTION
• A smooth, perfectly symmetrical, bell-shaped curve.
• It is symmetrical about the mean and its tails are assumed to meet the x-axis at infinity.
• Statistical calculations often assume that the population is distributed normally and then compare the data collected from the sample to the population, allowing inferences to be made about the population.
THE NORMAL CURVE OF DISTRIBUTION
Assumes that:
– 68.3 per cent of people fall within 1 standard deviation of the mean;
– 27.1 per cent are between 1 standard deviation and 2 standard deviations away from the mean;
– 4.3 per cent are between 2 and 3 standard deviations away from the mean;
– 0.3 per cent are more than 3 standard deviations away from the mean.
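These proportions can be checked against the standard normal curve with Python's `statistics.NormalDist`; the variable names are illustrative. The computed values agree with the percentages above to within about 0.1 of a percentage point.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

within_1 = z.cdf(1) - z.cdf(-1)              # within 1 S.D. of the mean
between_1_and_2 = 2 * (z.cdf(2) - z.cdf(1))  # 1 to 2 S.D.s away, both sides
between_2_and_3 = 2 * (z.cdf(3) - z.cdf(2))  # 2 to 3 S.D.s away, both sides
beyond_3 = 2 * (1 - z.cdf(3))                # more than 3 S.D.s away

for share in (within_1, between_1_and_2, between_2_and_3, beyond_3):
    print(f"{share * 100:.1f} per cent")
```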
SKEWNESS
The curve is not symmetrical or bell-shaped
KURTOSIS (STEEPNESS OF THE CURVE)
STATISTICAL SIGNIFICANCE
If the findings hold true 95% of the time then the statistical significance level (α) = 0.05
If the findings hold true 99% of the time then the statistical significance level (α) = 0.01
If the findings hold true 99.9% of the time then the statistical significance level (α) = 0.001
CORRELATION
Shoe size: 1 2 3 4 5
Hat size: 1 2 3 4 5
Perfect positive correlation: +1

CORRELATION
Hand size: 1 2 3 4 5
Foot size: 1 2 3 4 5
Perfect positive correlation: +1

CORRELATION
Hand size: 1 2 3 4 5
Foot size: 2 1 4 3 5
Positive correlation: <+1
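The coefficients for the hand-size and foot-size examples can be reproduced with a short function. This sketch assumes Pearson's product-moment coefficient, which the slides do not name; the reversed list is an extra illustration of a perfect negative correlation.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / math.sqrt(ss_x * ss_y)

hand = [1, 2, 3, 4, 5]
perfect = pearson_r(hand, [1, 2, 3, 4, 5])    # same rank order throughout
imperfect = pearson_r(hand, [2, 1, 4, 3, 5])  # some pairs out of order
negative = pearson_r(hand, [5, 4, 3, 2, 1])   # rank order fully reversed

print(perfect, imperfect, negative)
```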
[Figures: three line graphs, titled PERFECT POSITIVE CORRELATION, PERFECT NEGATIVE CORRELATION and MIXED CORRELATION]
CORRELATIONS
Statistical significance is a function of the coefficient and the sample size:
– the smaller the sample, the larger the coefficient has to be in order to obtain statistical significance;
– the larger the sample, the smaller the coefficient can be in order to obtain statistical significance;
– statistical significance can be attained either by having a large coefficient together with a small sample or by having a small coefficient together with a large sample.
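The trade-off between coefficient and sample size can be illustrated with a rough calculation. The sketch below uses the Fisher z-transformation as a normal approximation for the significance of a correlation coefficient; this technique is an assumption chosen for illustration (the slides do not specify a method), and the coefficients and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def approx_p_value(r, n):
    """Approximate two-tailed p-value for a correlation r from n cases.

    Uses the Fisher z-transformation; a rough guide only, least
    accurate for very small samples.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

small_r_small_n = approx_p_value(0.3, 10)   # small coefficient, small sample
small_r_large_n = approx_p_value(0.3, 100)  # small coefficient, large sample
large_r_small_n = approx_p_value(0.8, 10)   # large coefficient, small sample

print(round(small_r_small_n, 3), round(small_r_large_n, 3), round(large_r_small_n, 3))
```

Under these assumptions the coefficient of 0.3 falls short of the 0.05 level with 10 cases but passes it with 100, while the coefficient of 0.8 passes it even with 10 cases.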
CORRELATIONS
• Begin with a null hypothesis (e.g. there is no relationship between the size of hands and the size of feet). The task is not to support the hypothesis, i.e. the burden of responsibility is not to support the null hypothesis.
• If the hypothesis is not supported for 95 per cent or 99 per cent or 99.9 per cent of the population, then there is a statistically significant relationship between the size of hands and the size of feet at the 0.05, 0.01 and 0.001 levels of significance respectively.
• These levels of significance – the 0.05, 0.01 and 0.001 levels – are the levels at which statistical significance is frequently taken to be demonstrated.
HYPOTHESIS TESTING
• Commence with a null hypothesis.
• Set the level of significance (α) to be used to support or not to support the null hypothesis (the alpha (α) level); the alpha level is determined by the researcher.
• Compute the data.
• Determine whether the null hypothesis is supported or not supported.
• Avoid Type I and Type II errors.
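The steps above can be sketched end to end. The example uses an exact permutation test on the difference between two group means; that test is a stand-in chosen because it needs only the standard library, not a method named in the slides, and the scores for the two groups are hypothetical.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-tailed permutation test on the difference in means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    extreme = total = 0
    # Re-split the pooled scores in every possible way and count how often
    # the difference in means is at least as large as the observed one.
    for indices in combinations(range(len(pooled)), n_a):
        chosen = set(indices)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        difference = abs(sum(a) / len(a) - sum(b) / len(b))
        total += 1
        if difference >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return extreme / total

# Step 1: null hypothesis - no difference between the two groups.
# Step 2: set the alpha level before looking at the data.
ALPHA = 0.05
# Step 3: compute the data (hypothetical scores).
silent = [12, 15, 14, 16, 13]
noisy = [9, 11, 10, 12, 8]
p_value = permutation_p_value(silent, noisy)
# Step 4: decide whether the null hypothesis is supported.
null_supported = p_value >= ALPHA
print(round(p_value, 4), null_supported)
```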
TYPE I AND TYPE II ERRORS
• Null hypothesis: there is no statistically significant difference between x and y.
• TYPE I ERROR
  – The researcher rejects the null hypothesis when it is in fact true (like convicting an innocent person).
  – To reduce the risk: set a more stringent significance level (e.g. 0.01 rather than 0.05).
• TYPE II ERROR
  – The researcher accepts the null hypothesis when it is in fact false (like finding a guilty person innocent).
  – To reduce the risk: set a less stringent significance level and increase the sample size.
EFFECT SIZE
• Increasingly seen as preferable to statistical significance.
• A way of quantifying the difference between two groups. It indicates how big the effect is, something that statistical significance does not.
• For example, if one group has had an experimental treatment and the other has not (the control group), then the effect size is a measure of the effectiveness of the treatment.
EFFECT SIZE
• It is calculated thus:

$$\text{Effect size} = \frac{\text{mean of experimental group} - \text{mean of control group}}{\text{standard deviation of the control group}}$$

• Statistics for calculating effect size include r², adjusted R², η², ω², Cramer’s V, Kendall’s W, Cohen’s d, Eta, Eta².

$$\text{Effect size (Eta}^2\text{)} = \frac{\text{Sum of squares between groups}}{\text{Total sum of squares}}$$

• Different kinds of statistical treatments use different effect size calculations.
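Both calculations can be sketched in a few lines. The scores are hypothetical, the function names are illustrative, and the use of the sample standard deviation (N − 1) for the control group is an assumption, since the slides do not say which form to use.

```python
import statistics

def effect_size(experimental, control):
    """Difference in group means divided by the control group's S.D."""
    difference = statistics.mean(experimental) - statistics.mean(control)
    return difference / statistics.stdev(control)  # assumes the N - 1 form

def eta_squared(groups):
    """Sum of squares between groups divided by the total sum of squares."""
    all_scores = [score for group in groups for score in group]
    grand_mean = statistics.mean(all_scores)
    ss_total = sum((score - grand_mean) ** 2 for score in all_scores)
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total

# Hypothetical scores, invented for illustration only.
experimental = [12, 15, 14, 16, 13]
control = [9, 11, 10, 12, 8]

d = effect_size(experimental, control)
eta2 = eta_squared([experimental, control])
print(round(d, 2), round(eta2, 2))
```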
EFFECT SIZE
• In using Cohen’s d:
  – 0-0.20 = weak effect
  – 0.21-0.50 = modest effect
  – 0.51-1.00 = moderate effect
  – >1.00 = strong effect
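The bands above translate directly into a lookup. In this sketch the function name is hypothetical, and taking the absolute value of d is an assumption, on the reasoning that the sign shows only the direction of the difference.

```python
def interpret_cohens_d(d):
    """Label a Cohen's d value using the bands given above."""
    size = abs(d)  # assumption: the sign only indicates direction
    if size <= 0.20:
        return "weak"
    if size <= 0.50:
        return "modest"
    if size <= 1.00:
        return "moderate"
    return "strong"

for value in (0.10, 0.35, -0.80, 2.50):
    print(value, interpret_cohens_d(value))
```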
THE POWER OF A TEST
• An estimate of the ability of the test to separate the effect size from random variation.