
APPROACHES TO QUANTITATIVE DATA ANALYSIS

© LOUIS COHEN, LAWRENCE MANION & KEITH MORRISON

STRUCTURE OF THE CHAPTER

• Scales of data
• Parametric and non-parametric data
• Descriptive and inferential statistics
• Kinds of variables
• Hypotheses
• One-tailed and two-tailed tests
• Distributions
• Statistical significance
• Hypothesis testing
• Effect size
• A note on symbols

FOUR SCALES OF DATA

NOMINAL ORDINAL INTERVAL RATIO

It is incorrect to apply statistics which can only be used at a higher scale of data to data at a lower scale.

PARAMETRIC AND NON-PARAMETRIC STATISTICS

• Parametric statistics: where characteristics of, or factors in, the population are known.
• Non-parametric statistics: where the characteristics of, or factors in, the population are unknown.

DESCRIPTIVE AND INFERENTIAL STATISTICS

• Descriptive statistics: to summarize features of the sample or simple responses of the sample (e.g. frequencies or correlations). No attempt is made to infer or predict population parameters.
• Inferential statistics: to infer or predict population parameters or outcomes from simple measures, e.g. from sampling and from statistical techniques. Based on probability.

DESCRIPTIVE STATISTICS

• The mode (the score obtained by the greatest number of people);
• The mean (the average score);
• The median (the score obtained by the middle person in a ranked group of people, i.e. it has an equal number of scores above it and below it);
• Minimum and maximum scores;
• The range (the distance between the highest and the lowest scores);
• The variance (a measure of how far scores are from the mean: the average of the squared deviations of individual scores from the mean).
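A minimal sketch, using Python's standard library and made-up scores, of how these statistics can be computed:

```python
# Descriptive statistics for a small, hypothetical set of test scores.
import statistics

scores = [3, 5, 5, 6, 7, 8, 8, 8, 10]

print("mode:", statistics.mode(scores))            # most frequent score: 8
print("mean:", statistics.mean(scores))            # average score
print("median:", statistics.median(scores))        # middle score when ranked: 7
print("minimum:", min(scores), "maximum:", max(scores))
print("range:", max(scores) - min(scores))         # highest minus lowest
print("variance:", statistics.pvariance(scores))   # average of the squared deviations from the mean
```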

SIMPLE STATISTICS

• Frequencies (raw scores and percentages) – look for skewness, intensity, distributions and spread (kurtosis);
• Mode – for nominal and ordinal data;
• Mean – for interval and ratio data;
• Standard deviation – for interval and ratio data.

[Three charts of five scores, each with a mean of 6: scores 1, 2, 3, 4, 20 (high standard deviation); scores 1, 2, 6, 10, 11 (moderately high standard deviation); scores 5, 6, 6, 6, 7 (low standard deviation).]

STANDARD DEVIATION

• The standard deviation is a standardised measure of the dispersal of the scores, i.e. how far away from the mean/average each score is. In its most simplified form it is calculated as:

SD = √(Σd²/N)  or  SD = √(Σd²/(N − 1))

• d² = the deviation of the score from the mean (average), squared
• Σ = the sum of
• N = the number of cases
• A low standard deviation indicates that the scores cluster together, whilst a high standard deviation indicates that the scores are widely dispersed.
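A minimal sketch of both versions of the formula, on made-up scores:

```python
# Standard deviation computed from the deviations of each score from the mean.
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(scores)
mean = sum(scores) / N
sum_sq_dev = sum((x - mean) ** 2 for x in scores)   # Σd²: squared deviations, summed

sd_population = math.sqrt(sum_sq_dev / N)           # √(Σd²/N)
sd_sample = math.sqrt(sum_sq_dev / (N - 1))         # √(Σd²/(N − 1))

print(sd_population)   # 2.0 — the scores cluster moderately around the mean of 5
print(sd_sample)       # ≈ 2.14
```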

DESCRIPTIVE STATISTICS

• The standard deviation (a measure of the dispersal or range of scores: the square root of the variance);
• The standard error (the standard deviation of sample means);
• The skewness (how far the data are asymmetrical in relation to a ‘normal’ curve of distribution);
• Kurtosis (how steep or flat is the shape of a graph or distribution of data; a measure of how peaked a distribution is and how steep is the slope or spread of data around the peak).
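A minimal sketch, assuming SciPy is available, of computing skewness and kurtosis for a made-up sample:

```python
# Skewness and kurtosis for a small, hypothetical sample.
from scipy.stats import kurtosis, skew

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

print("skewness:", skew(scores))       # positive here: the tail stretches to the right
print("kurtosis:", kurtosis(scores))   # excess kurtosis; 0 for a normal curve
```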

INFERENTIAL STATISTICS

• Can use descriptive statistics
• Correlations
• Regression
• Multiple regression
• Difference testing
• Factor analysis
• Structural equation modelling

DEPENDENT AND INDEPENDENT VARIABLES

• An independent variable is an antecedent variable, that which causes, in part or in total, a particular outcome; it is a stimulus that influences a response, a factor which may be modified (e.g. under experimental or other conditions) to affect an outcome.
• A dependent variable is the outcome variable, that which is caused, in total or in part, by the input, antecedent variable. It is the effect, consequence of, or response to, an independent variable.

DEPENDENT AND INDEPENDENT VARIABLES

In using statistical tests which require independent and dependent variables, exercise caution in assuming which is or is not the dependent or independent variable, as the direction of causality may not be one-way or in the direction assumed.

FIVE KEY INITIAL QUESTIONS

1. What kind (scales) of data are there?

2. Are the data parametric or non-parametric?

3. Are descriptive or inferential statistics required?

4. Do dependent and independent variables need to be identified?

5. Are the relationships considered to be linear or non-linear?

CATEGORICAL, DISCRETE AND CONTINUOUS VARIABLES

• A

categorical

variable is a variable which has categories of values, e.g. the variable ‘sex’ has two values: male and female.

• A

discrete

variable has a finite number of values of the same item, with no intervals or fractions of the value, e.g. a person cannot have half an illness or half a mealtime.

• A

continuous

variable can vary in quantity, e.g. money in the bank, monthly earnings. There are equal intervals, and, usually, a true zero, e.g. it is possible to have no money in the bank.

CATEGORICAL, DISCRETE AND CONTINUOUS VARIABLES

• Categorical variables match categorical data.
• Continuous variables match interval and ratio data.

KINDS OF ANALYSIS

• Univariate analysis: looks for differences amongst cases within one variable.
• Bivariate analysis: looks for a relationship between two variables.
• Multivariate analysis: looks for relationships among more than two variables.

HYPOTHESES

• Null hypothesis (H₀)
• Alternative hypothesis (H₁)
• The null hypothesis is the stronger hypothesis, requiring rigorous evidence not to support it.
• One should commence by casting the research in the form of a null hypothesis, and turn to the alternative hypothesis only if the null hypothesis is not supported.

HYPOTHESES

• Direction of hypothesis: states the kind of difference or relationship between two conditions or two groups of participants.
• One-tailed (directional), e.g.: ‘people who study in silent surroundings achieve better than those who study in noisy surroundings’. (‘Better’ indicates the direction.)
• Two-tailed (no direction), e.g.: ‘there is a difference between people who study in silent surroundings and those who study in noisy surroundings’. (There is no indication of which is the better.)

ONE-TAILED AND TWO-TAILED TESTS

• A one-tailed test makes assumptions about the population and the direction of the outcome, e.g. Group A will score more highly than Group B on a test.

• A two-tailed test makes no assumptions about the population and the direction of the outcome, e.g. there will be a difference in the test scores.
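A minimal sketch, assuming SciPy and hypothetical scores, of the same comparison run as a two-tailed and as a one-tailed (directional) t-test:

```python
# Two-tailed vs. one-tailed independent t-test on made-up scores.
from scipy import stats

silent = [78, 82, 85, 88, 90, 76, 84]   # hypothetical scores, silent surroundings
noisy = [70, 75, 72, 80, 68, 74, 77]    # hypothetical scores, noisy surroundings

# Two-tailed: 'there is a difference' (no direction assumed).
t, p_two = stats.ttest_ind(silent, noisy, alternative='two-sided')

# One-tailed: 'silent achieves better than noisy' (direction assumed).
t, p_one = stats.ttest_ind(silent, noisy, alternative='greater')

print(p_two, p_one)   # here the one-tailed p is half the two-tailed p
```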

THE NORMAL CURVE OF DISTRIBUTION

• A smooth, perfectly symmetrical, bell-shaped curve. • It is symmetrical about the mean and its tails are assumed to meet the x-axis at infinity.

• Statistical calculations often assume that the population is distributed normally and then compare the data collected from the sample to the population, allowing inferences to be made about the population.

THE NORMAL CURVE OF DISTRIBUTION

Assumes that:
– 68.3 per cent of people fall within 1 standard deviation of the mean;
– 27.1 per cent are between 1 standard deviation and 2 standard deviations away from the mean;
– 4.3 per cent are between 2 and 3 standard deviations away from the mean;
– 0.3 per cent are more than 3 standard deviations away from the mean.
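These figures can be checked against the standard normal distribution; a minimal sketch using Python's standard library (3.8+):

```python
# Proportions of a normal distribution within 1, 2 and 3 standard deviations.
from statistics import NormalDist

z = NormalDist()                                  # mean 0, standard deviation 1

def within(k):
    return z.cdf(k) - z.cdf(-k)                   # proportion within ±k SDs

print(round(within(1) * 100, 1))                  # 68.3
print(round((within(2) - within(1)) * 100, 1))    # 27.2 (the slide's 27.1 reflects rounding)
print(round((within(3) - within(2)) * 100, 1))    # 4.3
print(round((1 - within(3)) * 100, 1))            # 0.3
```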

SKEWNESS

The curve is not symmetrical or bell-shaped

KURTOSIS (STEEPNESS OF THE CURVE)

STATISTICAL SIGNIFICANCE

If the findings hold true 95% of the time, then the statistical significance level (α) = 0.05.

If the findings hold true 99% of the time, then the statistical significance level (α) = 0.01.

If the findings hold true 99.9% of the time, then the statistical significance level (α) = 0.001.

CORRELATION

[Scatterplot of shoe size against hat size: perfect positive correlation, +1.]

[Scatterplot of hand size against foot size: perfect positive correlation, +1.]

[Scatterplot of hand size against foot size: positive correlation, <+1.]

PERFECT POSITIVE CORRELATION

[Line graph illustrating a perfect positive correlation.]

PERFECT NEGATIVE CORRELATION

[Line graph illustrating a perfect negative correlation.]

MIXED CORRELATION

[Line graph illustrating a mixed correlation.]

CORRELATIONS

Statistical significance is a function of the coefficient and the sample size:

– the smaller the sample, the larger the coefficient has to be in order to obtain statistical significance;
– the larger the sample, the smaller the coefficient can be in order to obtain statistical significance;
– statistical significance can be attained either by having a large coefficient together with a small sample or by having a small coefficient together with a large sample.
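A minimal sketch, assuming SciPy and NumPy, of how a similar coefficient can be statistically significant in a large simulated sample but not in a small one:

```python
# The same underlying relationship tested at two sample sizes.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

for n in (10, 200):
    x = rng.normal(size=n)
    y = 0.3 * x + rng.normal(size=n)     # built-in modest positive relationship
    r, p = pearsonr(x, y)
    print(f"n={n}: r={r:.2f}, p={p:.3f}")
# Typically the small sample fails to reach p < 0.05 while the large
# sample does, even with coefficients of similar size.
```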

CORRELATIONS

• Begin with a null hypothesis (e.g. there is no relationship between the size of hands and the size of feet). The task is not to support the hypothesis, i.e. the burden of responsibility is not to support the null hypothesis.
• If the null hypothesis is not supported at the 95 per cent, 99 per cent or 99.9 per cent confidence level, then there is a statistically significant relationship between the size of hands and the size of feet at the 0.05, 0.01 and 0.001 levels of significance respectively.
• These levels of significance – the 0.05, 0.01 and 0.001 levels – are the levels at which statistical significance is frequently taken to be demonstrated.

HYPOTHESIS TESTING

• Commence with a null hypothesis.
• Set the level of significance (α) to be used to support or not to support the null hypothesis (the alpha (α) level); the alpha level is determined by the researcher.

• Compute the data.

• Determine whether the null hypothesis is supported or not supported.

• Avoid Type I and Type II errors.
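A minimal sketch of these steps, assuming SciPy and hypothetical scores:

```python
# Null hypothesis: there is no difference between the scores of group x and group y.
from scipy import stats

group_x = [12, 15, 14, 10, 13, 16, 11]
group_y = [14, 18, 17, 15, 19, 16, 20]

alpha = 0.05                                  # significance level, set by the researcher
t, p = stats.ttest_ind(group_x, group_y)      # compute the test statistic

if p < alpha:
    print(f"p = {p:.3f} < {alpha}: the null hypothesis is not supported")
else:
    print(f"p = {p:.3f} >= {alpha}: the null hypothesis is supported")
```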

TYPE I AND TYPE II ERRORS

Null Hypothesis: there is no statistically significant difference between x and y.

TYPE I ERROR

• The researcher rejects the null hypothesis when it is in fact true (like convicting an innocent person) → increase the significance level (e.g. from 0.05 to 0.01).

TYPE II ERROR

• The researcher accepts the null hypothesis when it is in fact false (like finding a guilty person innocent) → reduce the significance level (e.g. from 0.01 to 0.05), increase the sample size.
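A minimal sketch, assuming SciPy and NumPy, simulating the Type I error rate: when the null hypothesis is true by construction, a test at α = 0.05 still rejects it about 5 per cent of the time:

```python
# Estimate the Type I error rate by repeatedly testing two samples drawn
# from the same population (so the null hypothesis is true).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
trials = 2000
rejections = 0

for _ in range(trials):
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        rejections += 1                # a Type I error: rejecting a true H0

print(rejections / trials)             # ≈ 0.05
```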

EFFECT SIZE

• Increasingly seen as preferable to statistical significance.

• A way of quantifying the difference between two groups. It indicates how big the effect is, something that statistical significance does not.

• For example, if one group has had an experimental treatment and the other has not (the control group), then the effect size is a measure of the effectiveness of the treatment.

EFFECT SIZE

• It is calculated thus:

Effect size = (mean of experimental group − mean of control group) ÷ standard deviation of control group

• Statistics for calculating effect size include r², adjusted R², η² (eta squared), ω² (omega squared), Cramer’s V, Kendall’s W, Cohen’s d, Eta, Eta².

Effect size (Eta²) = sum of squares between groups ÷ total sum of squares

• Different kinds of statistical treatments use different effect size calculations.
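A minimal sketch of both formulas, on made-up scores, using Python's standard library:

```python
# Effect size as (experimental mean − control mean) / control-group SD,
# and Eta² from the sums of squares.
import statistics

experimental = [85, 88, 90, 84, 87]    # hypothetical post-treatment scores
control = [78, 80, 82, 79, 81]

diff = statistics.mean(experimental) - statistics.mean(control)
effect_size = diff / statistics.stdev(control)
print("effect size:", round(effect_size, 2))

# Eta² = sum of squares between groups / total sum of squares
all_scores = experimental + control
grand_mean = statistics.mean(all_scores)
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in (experimental, control))
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
print("eta squared:", round(ss_between / ss_total, 2))
```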

EFFECT SIZE

• In using Cohen’s d:
0–0.20 = weak effect
0.21–0.50 = modest effect
0.51–1.00 = moderate effect
>1.00 = strong effect

THE POWER OF A TEST

• An estimate of the ability of the test to separate the effect size from random variation.
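A minimal sketch, assuming the statsmodels package, of estimating the power of an independent t-test:

```python
# Power of a two-group t-test: the probability of detecting a true effect
# of the given size at the given sample size and significance level.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower().power(effect_size=0.5, nobs1=30, alpha=0.05)
print(round(power, 2))   # ≈ 0.47 for a medium effect with 30 cases per group
```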