Lecture 3: Chi-Square, correlation and your dissertation


Lecture 3: Chi-Square, correlation
and your dissertation proposal
• Non-parametric data: the Chi-Square test
• Statistical correlation and regression:
parametric and non-parametric tests
• Break
• Regression in SPSS
• Writing a dissertation proposal when you
plan to use statistics
• Exercises, assessment and assistance
Non-parametric statistics
• Non-parametric statistics in human
geography
• Different types of non-parametric test:
– 1 sample
– 2 independent samples
– 2 tied samples
– 3 or more samples
The Chi-Square test
• One of the most versatile tests in social science
• Can be used to examine nominal data,
ordinal data and interval/ratio data in groups
• There are no assumptions about
independent or paired observations
Theory of Chi-Square
• The test examines the difference between
observed counts and expected values
• Suppose we wanted to compare the age groups
in our sample with the same age groups in
the UK population? Or perhaps to compare age
groups across two or three samples?
• Chi-Square can examine these differences
The Chi-Square Equation
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
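The equation can be sketched in a few lines of Python (the lecture itself works in SPSS, so this is only an illustration). The function name `chi_square` and the two-category counts are assumptions made for the example, not data from the lecture.

```python
def chi_square(observed, expected):
    """Sum (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 100 respondents split over two categories,
# compared with an even 50/50 expectation.
statistic = chi_square([60, 40], [50, 50])
print(statistic)  # 4.0
```

Each category contributes (10 × 10) / 50 = 2, so the two categories together give 4.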
One way Chi-Square test
• Examines whether there is a difference
between one sample and a population
• We can assume either that the expected
counts will be equal between categories or
that we know the proportions
• But, before we do the test, we have to cross-tabulate the data
The Cross-tabulation
        Age
        18-30   31-50   51-65   Over 65   Total
North   30      20      35      55        140
Total   30      20      35      55        140
The expected counts
• Expected counts relate to either equal
proportions or previously known
proportions (e.g. from a population)
• These are then compared to observed counts
and the difference is calculated
• A significance level is selected and the null
hypothesis is accepted or rejected
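The two ways of obtaining expected counts can be sketched in Python as follows. The function names and the population shares are hypothetical, chosen only to illustrate the idea; real shares would come from a source such as the census.

```python
def expected_equal(total, n_categories):
    """Expected counts when every category is assumed equally likely."""
    return [total / n_categories] * n_categories


def expected_from_proportions(total, proportions):
    """Expected counts from known proportions (e.g. population shares)."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [total * p for p in proportions]


# A sample of 140 people in four age groups, equal proportions assumed.
print(expected_equal(140, 4))

# Hypothetical population shares for the same four age groups.
population_shares = [0.20, 0.30, 0.25, 0.25]
print(expected_from_proportions(140, population_shares))
```

With equal proportions, each of the four groups is expected to hold 140 / 4 = 35 people.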
The Contingency Table
           Age
           18-30   31-50   51-65   Over 65   Total
North      30      20      35      55        140
Expected   35      35      35      35        140
Total      30      20      35      55        140
The test result
• Chi-Square is calculated by summing, over
every cell, the squared difference between
observed and expected counts divided by the
expected count
• Assessed as for other statistical tests
• χ² = 18.6 (df = 3, p < 0.05)
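As a check on the arithmetic, here is a minimal Python sketch that recomputes the one-way statistic from the observed and expected counts in the contingency table. The critical value 7.815 (3 degrees of freedom, 0.05 level) is taken from a standard chi-square table. The variable names are assumptions for illustration.

```python
# Observed counts from the contingency table, with equal expected counts (140 / 4).
observed = [30, 20, 35, 55]
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # degrees of freedom for a one-way test

# Critical value of chi-square for df = 3 at the 0.05 level,
# taken from a standard chi-square table.
CRITICAL_0_05_DF3 = 7.815

print(f"chi-square = {chi2:.2f}, df = {df}")
if chi2 > CRITICAL_0_05_DF3:
    print("reject the null hypothesis at the 0.05 level")
else:
    print("retain the null hypothesis at the 0.05 level")
```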
Two way Chi-Square tests
• Very often, we want to do more than compare
one sample with a population: for example,
compare one sample with another, or compare
three or more samples
• Two way Chi-Square allows us to do this
easily
• Again, we cross-tabulate the data
The Contingency table
        Age
        18-30   31-50   51-65   Over 65   Total
North   25      25      35      55        140
South   40      35      25      20        120
Total   65      60      60      75        260
Two-way analysis
• Chi-Square calculates the expected value for
each cell by multiplying its row and column
totals and dividing by the grand total
• Expected values represent the number in
each category which, given the sample sizes
and distribution, we would expect to see in
each cell
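The expected-value rule can be sketched in Python using the North/South contingency table from this lecture. The variable names are illustrative assumptions; SPSS does this step for you.

```python
observed = {
    "North": [25, 25, 35, 55],
    "South": [40, 35, 25, 20],
}

row_totals = {region: sum(counts) for region, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# expected = row total * column total / grand total, for every cell
expected = {
    region: [row_totals[region] * c / grand_total for c in col_totals]
    for region in observed
}

for region, values in expected.items():
    print(region, [round(v, 1) for v in values])
```

For example, the expected count for North, 18-30 is 140 × 65 / 260 = 35.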
The Chi-Square result
• Chi-Square gives the result and we evaluate
the test with the use of significance tests
• χ² = 21.7 (p < 0.05)
• But, we can only state that there is a
difference - not what the difference is. For
example, does our sample from the north
have more older people in it?
• We must examine the relative proportions of
the contingency table to find this out
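A minimal Python sketch of both steps, using the North/South table: first the two-way statistic, then the row percentages that show where the samples differ. The names are assumptions for illustration. Degrees of freedom are (rows - 1) × (columns - 1).

```python
AGE_GROUPS = ["18-30", "31-50", "51-65", "Over 65"]
observed = {
    "North": [25, 25, 35, 55],
    "South": [40, 35, 25, 20],
}

row_totals = {r: sum(c) for r, c in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

chi2 = 0.0
for region, counts in observed.items():
    for count, col_total in zip(counts, col_totals):
        expected = row_totals[region] * col_total / grand_total
        chi2 += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(AGE_GROUPS) - 1)
print(f"chi-square = {chi2:.1f}, df = {df}")

# Row percentages show where the two samples differ.
row_percent = {
    r: [100 * c / row_totals[r] for c in counts]
    for r, counts in observed.items()
}
for region, percents in row_percent.items():
    print(region, [f"{g}: {p:.1f}%" for g, p in zip(AGE_GROUPS, percents)])
```

Here the row percentages show that the over-65 group makes up a larger share of the North sample (55 of 140) than of the South sample (20 of 120).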
The expected counts problem
• Chi-Square stipulates that no more than 20%
of the expected counts in an analysis may be
under 5. If more than 20% are under 5, the
test is invalid
• So, how can we get over this problem?
Recoding variables
• We can aggregate suitable variables to make
the number of groups smaller
• Aggregating only works with ordinal data
• This reduces the number of groups and
makes expected counts below 5 less likely
• We can also use this to make interval/ratio
data into groups
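Both kinds of recoding can be sketched in Python. The helper names `merge_adjacent` and `age_band`, the choice of which bands to merge, and the list of ages are all hypothetical. In SPSS the equivalent step is a recode of the variable.

```python
def merge_adjacent(counts, groups):
    """Sum counts within each group of adjacent category indices."""
    return [sum(counts[i] for i in group) for group in groups]


def age_band(age):
    """Recode an age in years into one of the four bands used in this lecture."""
    if age <= 30:
        return "18-30"
    if age <= 50:
        return "31-50"
    if age <= 65:
        return "51-65"
    return "Over 65"


# Collapse four ordinal age bands into two: (18-30, 31-50) and (51-65, Over 65).
north = [25, 25, 35, 55]
print(merge_adjacent(north, [(0, 1), (2, 3)]))  # [50, 90]

# Hypothetical ages in years (interval/ratio data), recoded into bands.
ages = [19, 34, 52, 70, 45]
print([age_band(a) for a in ages])
```

Only neighbouring bands are merged, so the order of the categories is preserved.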
Chi-Square: Qualifications
• You should have no fewer than 20 cases
• As stated above, not more than 20% of cells
should have expected values under 5
• You should not necessarily ignore a
contingency table, even if the Chi-Square
test is invalid
• Remember, above all, that Chi-Square is a
test of difference, not correlation
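The first two qualifications can be checked mechanically. Below is a small Python sketch; the function name `chi_square_checks` and both sets of counts are hypothetical.

```python
def chi_square_checks(observed, expected):
    """Return (enough_cases, few_small_cells) for flat lists of cell counts."""
    n_cases = sum(observed)
    small = sum(1 for e in expected if e < 5)
    share_small = small / len(expected)
    return n_cases >= 20, share_small <= 0.20


# Hypothetical example: 24 cases in four categories, equal expected counts.
print(chi_square_checks([3, 5, 7, 9], [6.0, 6.0, 6.0, 6.0]))    # (True, True)

# Hypothetical example where two of the four expected counts fall below 5.
print(chi_square_checks([2, 3, 10, 9], [4.0, 4.0, 8.0, 8.0]))   # (True, False)
```

Note that the 20% rule applies to the expected counts, not the observed ones.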
Statistical correlation:
relationships among variables
• Relationships are concerned with the extent
to which variable A is related to B
• This is termed correlation
• Correlation does not necessarily imply
causation, but merely a possible relationship
• There are parametric and non-parametric
tests of correlation
Types of correlation
• Perfect positive correlation: +1
• Perfect negative correlation: -1
• Linear relationship
• No correlation: 0
• Non-linear relationship
[Figure: scatter plot, both axes running from 0 to 20]
Parametric correlation:
Pearson’s r
• Assumes your data are on interval/ratio
scales AND are normally distributed
• Measured on a scale from -1 to +1
• This result shows the strength and direction
of the relationship
• The test must be judged by its significance
(as for other parametric tests: is p below or
above 0.05?)
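For illustration, Pearson's r can be computed by hand in Python as the covariance of the two variables divided by the product of their spreads. The function name and the data are hypothetical. The sketch gives only the coefficient; the significance test is left to a statistics package such as SPSS.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be the same length, with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical interval/ratio data for five respondents.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))
```

A value close to +1 or -1 indicates a strong linear relationship, and a value near 0 a weak one.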
Non-parametric correlation:
Spearman’s rs
• Assumes ordinal data, or interval/ratio data
that are not normally distributed
• Data are ranked for the test
• Measured as for Pearson’s
• Significance as for Pearson’s
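The ranking step can be made concrete with a short Python sketch: rank each variable, giving tied values their average rank, then compute Pearson's r on the ranks. The function names and the attitude scores are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical ordinal data: agreement scores (1-5) on two attitude questions.
q1 = [1, 2, 3, 4, 5]
q2 = [2, 1, 4, 3, 5]
print(round(spearman_rs(q1, q2), 2))  # 0.8
```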
From correlation to explanation:
regression analysis
• Regression seeks to examine the nature of
the relationship between one or more
independent variables and a dependent
variable
• It is concerned with prediction, not just
correlation
• To predict, there is an equation which
describes the ‘line of best fit’ between
variables
The Line of best fit
• Line of best fit ‘fits’ a straight line through
the data points you observe
• Can be expressed by:
Y = mx + c
Where:
Y = Dependent variable
c = constant (intercept)
m = slope (gradient)
x = independent variable
[Figure: scatter plot with fitted line y = 0.9677x + 0.5895; both axes running from 0 to 20]
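How m and c are obtained can be sketched with the ordinary least-squares formulas in Python. The function name `fit_line` and the data points are hypothetical. They are not the points plotted on the slide, so the fitted equation differs from the one shown there.

```python
def fit_line(x, y):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx              # slope (gradient)
    c = mean_y - m * mean_x    # constant (intercept)
    return m, c


# Hypothetical data points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, c = fit_line(x, y)
print(f"y = {m:.2f}x + {c:.2f}")  # y = 0.60x + 2.20
```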
Predicting using the regression
equation
• You can use the equation to predict levels of
Y for given levels of X
• This is often of use when looking at
different outcome situations
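Using the fitted line from this lecture, y = 0.9677x + 0.5895, prediction is a one-line calculation. The sketch below is in Python; the function name and the chosen values of X are illustrative assumptions.

```python
SLOPE = 0.9677      # m, from the fitted line y = 0.9677x + 0.5895
INTERCEPT = 0.5895  # c


def predict(x):
    """Predicted value of Y for a given value of X."""
    return SLOPE * x + INTERCEPT


# Compare predicted outcomes for a few illustrative values of X.
for x in (5, 10, 15):
    print(x, round(predict(x), 2))
```

Predictions are most trustworthy within the range of X values that was actually observed.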
Interpreting regression results
• R²: the ‘goodness of fit’ that the model
offers, i.e. the share of variation in the
dependent variable it explains, expressed in
per cent
• F: the significance of the model as a whole
• The regression coefficients and associated p
values
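To show where R² and F come from, here is a minimal Python sketch for a regression with one independent variable. The data and the fitted line y = 0.6x + 2.2 are hypothetical, and the function names are assumptions. The p values for F and for the coefficients are left to a statistics package such as SPSS.

```python
def r_squared(y, y_pred):
    """Proportion of variance in y explained by the predictions."""
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_resid = sum((v - p) ** 2 for v, p in zip(y, y_pred))
    return 1 - ss_resid / ss_total


def f_statistic(r2, n, k=1):
    """F for a model with k independent variables fitted to n cases."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical data and the least-squares line fitted to it (y = 0.6x + 2.2).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_pred = [0.6 * v + 2.2 for v in x]

r2 = r_squared(y, y_pred)
print(f"R squared = {r2:.0%}")
print(f"F = {f_statistic(r2, n=len(y)):.2f}")
```

In this example the line explains 60% of the variation in y.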
Regression: assumptions
• Your data:
– Are measured on interval/ratio scales;
– Are normally distributed;
– And are therefore Parametric; and...
– Have a linear relationship
• You can use other techniques for non-linear
regression and regression with nominal/ordinal
variables
Is any of this relevant to me?
• YES - you have to write a dissertation
proposal
• Saying you will ‘analyse’ the data using
appropriate methods is not enough
• You will get a far higher mark if you follow
these simple steps in the next two months
when preparing your proposal:
Writing your Dissertation
Proposal: key points
• Do you need to use a questionnaire/other
quantitative instrument?
• If yes, what key questions are you posing?
• ALWAYS relate these questions to your
plans for analysis
• How will you analyse these collected data
to meet your aims and objectives?
Writing your proposal
• Methodology: Quantitative/qualitative?
• Questionnaire: Type: closed/open/both?
• Questions: Yes/no; frequency; categorical;
multiple response?
• Data this will yield: Parametric/non-parametric?
• Analysis types: Description, Differences,
relationships?
• Analysis tools: Parametric/non-parametric?
Example of this process
Section of proposal | Abstract example | Specific example
Methodology | Quantitative | A questionnaire
Questionnaire | Type | Closed, with one open question
Questions | Dichotomous | Yes/no, M/F, etc.
 | Categorical | Family type, etc.
 | Agreement | Attitude questions
 | Frequency | Behaviour questions
 | Write-in answers | Age
Data this will yield | Parametric | Age
 | Non-parametric | All other variables
Analysis types | Description | Describe attitudes
 | Differences | Difference in attitudes (e.g. M/F)
 | Relationships | Correlation of attitudes/behaviour
Analysis tools | Visual | Bar and pie graphs, Pareto charts
 | Parametric | t-tests, ANOVA, Pearson’s r
 | Non-parametric | Chi-Square, Spearman’s rs
A final word
• Think carefully about your questionnaire: can
you meet the objectives you have set
yourself?
• Do you need to use every statistical test?
• Assessments (all 3) due in on 6 May
• Where can you get help?
– Friday 14th March, 9-11am;
– Monday 28th April, 11am-1pm
• E-mail: [email protected]