Analysis of Data for Measuring Food Availability, Access

Srinivasulu Rajendran
Centre for the Study of Regional Development (CSRD)
Jawaharlal Nehru University (JNU)
New Delhi
India
Objective of the session
To understand CORRELATION:
1. What is the procedure to perform Correlation & Regression?
2. How do we interpret the results?
First identify the relationship between the variables of interest: draw a scatter plot to check for outliers and to see the type of relationship.
• Monthly HH food expenditure and HHSIZE
Interpreting Correlation Coefficient r
• strong correlation: r > .70 or r < -.70
• moderate correlation: r is between .30 and .70, or between -.30 and -.70
• weak correlation: r is between 0 and .30, or between 0 and -.30
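The cut-offs above can be written as a small helper. This is only a sketch: the function name is hypothetical, and the slide does not say which label applies exactly at .30 or .70, so the code assumes .30 counts as moderate and .70 as moderate.

```python
def correlation_strength(r: float) -> str:
    """Label a correlation coefficient using the cut-offs listed above.

    Assumption (not stated in the slides): a value of exactly .30 or .70
    is treated as "moderate".
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # strength depends on the absolute value only
    if size > 0.70:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"


print(correlation_strength(0.608))
print(correlation_strength(-0.85))
print(correlation_strength(0.10))
```

The sign of r is ignored here because it describes the direction of the relationship, not its strength.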
GENERATE A SCATTERPLOT TO SEE THE RELATIONSHIPS
Go to Graphs → Legacy Dialogs → Scatter/Dot → Simple
Click on the dependent variable “mfx” and move it to the Y-Axis
Click on “hhsize” and move it to the X-Axis
Click OK
The scatterplot might not look promising at first
Double-click on the chart to open the Chart Editor window
Use Options → Bin Element
Simply CLOSE the box that opens.
Bins are applied automatically.
BINS
Dot size now shows the number of cases with each pair of X, Y values
DO NOT CLOSE CHART EDITOR YET!
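To see what the dot sizes represent, the counting can be sketched outside SPSS. The example below is a simplification that counts identical (X, Y) pairs only; the values are made up for illustration and are not the survey data, and the function name is hypothetical.

```python
from collections import Counter


def bin_counts(xs, ys):
    """Count how many cases share each exact (x, y) pair."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return Counter(zip(xs, ys))


# Made-up illustrative values (household size, monthly food expenditure).
hhsize = [3, 3, 4, 4, 4, 5]
mfx = [3000, 3000, 4000, 4000, 4500, 5000]

counts = bin_counts(hhsize, mfx)
for (x, y), n in sorted(counts.items()):
    print(f"hhsize={x} mfx={y} cases={n}")
```

A pair that occurs twice would be drawn as a larger dot than a pair that occurs once.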
Add Fit Line (Regression)
• In Chart Editor: Elements → Fit Line at Total
• Close the dialog box that opens
• Close the Chart Editor window
Edited Scatterplot
• Distribution of cases shown by dots (bins)
• Trend shown by fit line
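The trend line can be illustrated with an ordinary least-squares fit. This sketch assumes the fit line is a straight line; the function name and the sample numbers are illustrative only.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up points that lie exactly on y = 1 + 2x.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)
```

A positive slope corresponds to an upward trend in the scatterplot, a negative slope to a downward trend.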
Types of Correlation
• Bivariate Correlations
• Partial Correlations
• Distances
BIVARIATE CORRELATIONS
• Bivariate Correlations measures the relationship between two variables. The degree of relationship (how closely they are related) is summarised by the correlation coefficient, which can be positive or negative and ranges from -1 (perfect negative relationship) to +1 (perfect positive relationship). A zero correlation indicates no linear relationship. Remember to draw a scatter plot before computing the correlation, to see whether the assumptions have been met.
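For readers who want to see how the coefficient is obtained, here is a minimal sketch of Pearson's r computed from its definition (covariance divided by the product of the standard deviations). The function name and the sample values are illustrative assumptions, not SPSS internals.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Made-up data: a perfect positive and a perfect negative linear relationship.
r_up = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_down = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])
print(r_up, r_down)
```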
Objective
• We are interested in whether monthly HH food expenditure is correlated with household size (hhsize).
Step 1
Go to Analyze → Correlate → Bivariate... The Bivariate Correlations dialog box will appear. The left-hand pane shows the list of variables; use the right arrow button to add the selected variable(s).
Step 2
• Select the variables that you want to correlate (here mfx and hhsize) by clicking on them in the left-hand pane of the Bivariate Correlations dialog box, and add them to the analysis with the right arrow button.
• Check the type of correlation coefficient that you require (Pearson for parametric; Kendall’s tau-b and Spearman for non-parametric).
• Select the appropriate test: Pearson’s correlation coefficient assumes that each pair of variables is bivariate normal, and it is a measure of linear association. Two variables can be perfectly related, but if the relationship is not linear, Pearson’s correlation coefficient is not an appropriate statistic for measuring their association.
• Test of Significance: You can select two-tailed or one-tailed probabilities. If the direction of association is known in advance, select One-tailed. Otherwise, select Two-tailed.
• Flag significant correlations: Correlation coefficients significant at the 0.05 level are identified with a single asterisk, and those significant at the 0.01 level are identified with two asterisks.
• Click on the Options… button to select statistics: select means and standard deviations, and control missing values by clicking “Exclude cases pairwise”.
Click on the Continue button.
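The point about non-linear relationships can be illustrated by comparing Pearson's r with Spearman's rank correlation on made-up data where Y rises with X but not along a straight line. This is a sketch under stated assumptions: the helper names are hypothetical, and Spearman's coefficient is computed here as Pearson's r applied to average ranks.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Made-up data: y always increases with x, but along a curve (y = x cubed).
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print("Pearson :", pearson_r(x, y))
print("Spearman:", spearman_rho(x, y))
```

Spearman's coefficient reaches 1 because the ordering of the cases is identical on both variables, while Pearson's r stays below 1 because the points do not fall on a straight line.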
Step 3
Click the OK button in the Bivariate Correlations dialog box to run the analysis. The output will be displayed in a separate SPSS Viewer window.
SPSS Output of Correlation Matrix
• The Descriptive Statistics section gives the mean, standard deviation, and number of observations (N) for each of the variables that you specified.
Descriptive Statistics
Variable                              Mean       Std. Deviation   N
Household size                        4.34       1.919            1237
Monthly hh food expenditure (taka)    4411.25    2717.13          1237
The correlations table displays Pearson correlation coefficients, significance values, and the number of cases with non-missing values (N).
The values of the correlation coefficient range from -1 to 1.
The sign of the correlation coefficient indicates the direction of the relationship (positive or negative).
The absolute value of the correlation coefficient indicates the strength, with larger absolute values indicating stronger relationships.
The correlation coefficients on the main diagonal are always 1, because each variable has a perfect positive linear relationship with itself.

Correlations
                                                            Household size   Monthly hh food expenditure (taka)
Household size                        Pearson Correlation   1                .608**
                                      Sig. (1-tailed)                        .000
                                      N                     1237             1237
Monthly hh food expenditure (taka)    Pearson Correlation   .608**           1
                                      Sig. (1-tailed)       .000
                                      N                     1237             1237
• The significance of each correlation coefficient is also displayed in the correlation table.
• The significance level (or p-value) is the probability of obtaining a correlation at least as extreme as the one observed if there were in fact no linear relationship between the two variables. If the significance level is very small (less than 0.05), the correlation is significant and the two variables are taken to be linearly related. If the significance level is relatively large (for example, 0.50), the correlation is not significant and there is no evidence that the two variables are linearly related.
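As a rough check on the reported significance, the usual test statistic for a Pearson correlation, t = r·sqrt((N − 2) / (1 − r²)) with N − 2 degrees of freedom, can be computed from the r and N in the output. The sketch below uses a hypothetical function name and stops at the t value; turning it into an exact p-value needs the t distribution, which is not in Python's standard library.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson correlation differs from zero."""
    if n < 3:
        raise ValueError("need at least 3 cases")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Values taken from the correlations table: r = .608, N = 1237.
t = correlation_t_statistic(0.608, 1237)
print(round(t, 1), "with", 1237 - 2, "degrees of freedom")
```

A t value this large with 1235 degrees of freedom corresponds to a p-value far below 0.001, which is consistent with the Sig. value of .000 shown in the table.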
Partial Correlations
• The Partial Correlations procedure computes partial correlation coefficients that describe the linear relationship between two variables while controlling for the effects of one or more additional variables. Correlations are measures of linear association. Two variables can be perfectly related, but if the relationship is not linear, a correlation coefficient is not a proper statistic to measure their association.
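With a single control variable Z, the partial correlation between X and Y can be written in terms of the three pairwise correlations: r_xy.z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). The sketch below implements that first-order formula; the function name is hypothetical and the input values are made up for illustration, not taken from the survey data.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0:
        raise ValueError("x or y is perfectly correlated with the control variable")
    return (r_xy - r_xz * r_yz) / denom


# Made-up correlations. If the control variable is unrelated to both
# variables, the partial correlation equals the simple correlation.
unchanged = partial_corr(0.6, 0.0, 0.0)
# If the control variable is related to both, part of the association is removed.
reduced = partial_corr(0.5, 0.5, 0.5)
print(unchanged, reduced)
```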
Step 1
How to perform a partial correlation in SPSS:
Analyze → Correlate → Partial...
You will be presented with the “Partial Correlations” dialogue box:
Step 2
• Right-click on the variable list and select “Display Variable”
• Click “Sort Alphabetically”
Step 3
Step 4
• Select the variables that you want to correlate (here mfx and hhsize) by clicking on them in the left-hand pane of the Partial Correlations dialog box, and add them to the Variables list. Add the control variable (head_edu) to the “Controlling for” list.
• In this case, we can see the correlation between monthly HH food expenditure and household size when the education of the household head is held constant.
• Test of Significance: You can select two-tailed or one-tailed probabilities. If the direction of association is known in advance, select One-tailed. Otherwise, select Two-tailed.
• Flag significant correlations: Correlation coefficients significant at the 0.05 level are identified with a single asterisk, and those significant at the 0.01 level are identified with two asterisks.
• Click OK to get the results.
Step 5
As we can see, the positive correlation between mfx and hhsize when head_edu is held constant is significant at the 1% level (p < 0.01).
Correlations
Control variable: (sum) head_edu
                                                                Household size   Monthly hh food expenditure (taka)
Household size                        Correlation               1.000            .606
                                      Significance (1-tailed)   .                .000
                                      df                        0                1232
Monthly hh food expenditure (taka)    Correlation               .606             1.000
                                      Significance (1-tailed)   .000             .
                                      df                        1232             0
Hands-on Exercises
• Find the correlation between per capita total monthly expenditure and household size. Identify the nature of the relationship and explain the reasons for it.
• Find the correlation between per capita total monthly expenditure and household size, controlling for whether the village has adopted the technology or not.
• Find the correlation between per capita food expenditure and non-food expenditure, controlling for the district effect. [Hint: it is two-tailed. Why?]
Distances
• This procedure calculates any of a wide variety of statistics measuring either similarities or dissimilarities (distances), either between pairs of variables or between pairs of cases. These similarity or distance measures can then be used with other procedures, such as factor analysis, cluster analysis, or multidimensional scaling, to help analyze complex data sets.
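As one concrete example of a dissimilarity between pairs of cases, the sketch below computes Euclidean distances. The choice of Euclidean distance, the function name and the sample cases are assumptions made for illustration; the procedure described above offers many other measures.

```python
import math
from itertools import combinations


def pairwise_euclidean(cases):
    """Return {(i, j): distance} for every pair of cases, with i < j."""
    return {
        (i, j): math.dist(cases[i], cases[j])
        for i, j in combinations(range(len(cases)), 2)
    }


# Made-up cases, each measured on two variables.
cases = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
distances = pairwise_euclidean(cases)
for pair, d in sorted(distances.items()):
    print(pair, d)
```

In practice the variables are often standardised first, because a variable measured in large units (such as expenditure in taka) would otherwise dominate the distance.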