Data Analysis - Architecture and Landscape. UoG.


Data Analysis
Introduction
In this presentation we will examine:
 Procedures for analysing data;
 The value of plotting data;
 Primary methods for analysing data:

– regression;
– correlation;
– variance;
– factor analysis.
Data analysis
All too often ‘enthusiastic’ researchers
jump into the most complex statistical
analyses only to emerge some while
later rather bemused.
 Such an approach is neither rigorous nor
wise.
 Data analysis should be undertaken in
stages.

Examining raw data
No matter what data is collected the first
stage in all analyses is to examine the
raw data to search for any patterns.
 Patterns may:

– be expected from theory/literature;
– emerge from observation.

Pattern searching must be undertaken
with an ‘open-mind’.
Examining raw data
Pattern searching can be aided by the
use of pictures.
 The pictures most commonly used by
researchers are data plots.
 Plotting the data can help indicate the
nature of the distribution of the data and
any relationships between data.

Pie charts
[Figure: pie chart of the subgroups HA, LA and UNI, with slices of 22%, 33% and 45%.]

Pie charts are a
useful way of
showing how a
given variable is
distributed across
the various
subgroups being
investigated.
Bar charts

[Figure: bar chart; x-axis values 1 to 5, y-axis from 0 to 50.]
Bar charts are a
simple way of
viewing the range of
values recorded for
a variable.
Bar charts can also
indicate the shape
of the distribution.
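The idea of reading the shape of a distribution from a bar chart can be sketched in a few lines of Python. This is only an illustration: the ratings are invented, and the function names `frequency_table` and `text_bar_chart` are hypothetical helpers, not part of the presentation.

```python
from collections import Counter

# Made-up ratings on a 1-5 scale, for illustration only.
ratings = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]


def frequency_table(values):
    """Count how often each distinct value occurs, in sorted order."""
    counts = Counter(values)
    return {value: counts[value] for value in sorted(counts)}


def text_bar_chart(table):
    """Render one row of '#' marks per value so the shape is visible."""
    return "\n".join(f"{value}: {'#' * count}" for value, count in table.items())


table = frequency_table(ratings)
print(text_bar_chart(table))
```

With these invented ratings the rows rise to a single peak at 3 and fall away on both sides, which is the kind of shape a bar chart makes visible at a glance.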
Bar charts
[Figure: grouped bar chart comparing HA, LA and UNI across the ratings Good, Fair and Poor; y-axis from 0 to 14.]
Bar charts can be
used to contrast the
values for a given
variable obtained
from two or more
subgroups.
Universities appear
to be different to the
other groups.
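Behind a grouped bar chart of this kind sits a cross-tabulation of subgroup against rating. The sketch below shows one way to build it. The responses are invented for illustration and do not reproduce the figures on the slide, and `cross_tabulate` is a hypothetical helper name.

```python
from collections import Counter

RATINGS = ["Good", "Fair", "Poor"]

# Hypothetical survey responses as (subgroup, rating) pairs; invented data.
responses = [
    ("HA", "Good"), ("HA", "Fair"), ("HA", "Fair"),
    ("LA", "Good"), ("LA", "Fair"), ("LA", "Poor"),
    ("UNI", "Poor"), ("UNI", "Poor"), ("UNI", "Fair"),
]


def cross_tabulate(pairs, ratings=RATINGS):
    """Return {subgroup: {rating: count}} with zero for unseen combinations."""
    counts = Counter(pairs)
    groups = sorted({group for group, _ in pairs})
    return {g: {r: counts[(g, r)] for r in ratings} for g in groups}


crosstab = cross_tabulate(responses)
for group, row in crosstab.items():
    print(group, row)
```

Each row of the result corresponds to one set of bars in the chart, so a subgroup whose counts are skewed towards a different rating stands out immediately.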
Bar charts
[Figure: bar chart of the ratings Good, Fair and Poor, each bar broken down by subgroup (UNI, LA, HA); y-axis from 0 to 30.]

Bar charts can be
used to investigate
the overall
distribution of a
variable and its
subgroup
breakdown.
Scatter plots

[Figure: scatter plot; x-axis from 0 to 5, y-axis from 0 to 2.]
Scatter plots can be
used to examine
possible
relationships
between variables.
In this example one
might indeed
suspect a
relationship exists.
Plotting data
In summary, plotting data can be useful
in identifying possible trends in data and
suggesting possible relationships
between data.
 If relationships are suggested then it is
usual to support the graphical
representation by the use of statistics.

Statistics
However, statistics is a highly complex
subject and the use of statistical
analyses should not be taken lightly.
 In the following slides we will introduce
some of the more common statistical
techniques that can be used to analyse
research data.

Correlation & Regression
Correlation and regression are used to
test possible relationships between two
(or more) variables.
 Correlation is used to establish an
association between variables.
 Regression is used to express the
association in mathematical terms.

Correlation & Regression
Neither correlation nor regression can
establish causality.
 Only theory, evidence and logical
reasoning, used in conjunction with
statistics, can establish causality.

Correlation

[Figure: scatter plot; x-axis from 0 to 15, y-axis from 0 to 9.]
Consider the
general scatter plot
shown opposite.
Conventionally the
independent
variable is plotted on
the x-axis and the
dependent variable
on the y-axis.
Correlation

[Figure: scatter plot; x-axis from 0 to 15, y-axis from 0 to 9.]
From inspection it
would appear that a
relationship may
exist between the
two variables.
The correlation
coefficient measures
the degree and
nature of the
relationship.
Correlation
The value of the correlation coefficient
ranges from +1 to -1.
 A correlation coefficient of +1 implies a
perfect positive linear relationship:
 An increase in the variable x is always
matched by a proportional increase in the
variable y.

Correlation
A correlation coefficient of zero implies
that there is no linear relationship
between the two variables.
 A correlation coefficient of -1 implies a
perfect negative linear relationship:
 An increase in x is always matched by a
proportional decrease in the
variable y.
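The three reference values of the correlation coefficient can be checked with a small sketch of the product-moment (Pearson) formula. The data are invented for illustration and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either variable is constant.
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # y rises in step with x
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls in step with x
r_curve = pearson_r(x, [4, 1, 0, 1, 4])  # y = (x - 3)^2, a symmetric curve
print(r_up, r_down, r_curve)
```

The third case is worth noting: y depends entirely on x, yet the coefficient is zero because the relationship is a symmetric curve rather than a straight line. This is one reason to plot the data before relying on the coefficient.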

Correlation

The square of the correlation coefficient
is known as the coefficient of
determination and it can be used to
establish how much of the change in
the dependent variable can be
accounted for by the change in the
independent variable.
Correlation
For example a correlation coefficient of
0.9 gives a coefficient of determination
of 0.81.
 This in turn implies that 81% of the
observed change in the dependent
variable can be explained by the
changes in the independent variable.
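The link between the coefficient of determination and the 'explained' change can be checked numerically. The sketch below uses invented data, fits a least-squares straight line, and compares the square of the correlation coefficient with the share of variation in y that the line accounts for. All variable names are illustrative.

```python
from math import sqrt

# Invented data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)  # total variation in y
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)
r_squared = r * r  # coefficient of determination

# Least-squares straight line through the same data.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_residual = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))

# Share of the variation in y accounted for by the fitted line.
explained_share = 1 - ss_residual / syy

print(round(r_squared, 3), round(explained_share, 3))
```

For a straight-line fit the two quantities agree, which is what justifies reading the coefficient of determination as a proportion of variation explained.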

Correlation

Finally, the level of confidence placed
on a correlation coefficient depends
upon the number of observations used
in its calculation and can be obtained by
comparing the calculated value to those
contained in standard statistical tables.
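One common way of making this comparison is to convert the coefficient r, calculated from n observations, into a t statistic with n - 2 degrees of freedom, t = r * sqrt((n - 2) / (1 - r^2)), and look that value up in a t table. The sketch below assumes this approach; the presentation itself does not say which table is meant, and `correlation_t_statistic` is a hypothetical helper name.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for a correlation r from n pairs."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    return r * sqrt((n - 2) / (1 - r * r))


# The same coefficient carries very different weight at different sample sizes.
for n in (5, 50):
    print(n, round(correlation_t_statistic(0.6, n), 3))
```

The same value of r gives a much larger t statistic when it is based on more observations, which is why a coefficient from a handful of points deserves less confidence.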
Regression
Regression analysis attempts to
develop a mathematical equation which
describes the relationship between two
(or more) variables.
 Consider the scatter plot presented
previously.

Regression

[Figure: scatter plot; x-axis from 0 to 15, y-axis from 0 to 9.]
The relationship
between the two
variables could be
modelled by a
straight line:
y = ax + b,
where a is the slope
and b the intercept.
Regression

[Figure: scatter plot; x-axis from 0 to 15, y-axis from 0 to 9.]
The equation of the
‘best’ straight line is
established by
minimising the mismatch between the
predicted values
and the data.
This is achieved by
the method of least
squares.
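A minimal sketch of the least-squares calculation for a straight line y = ax + b follows. The data are invented, and `fit_line` and `sum_squared_error` are hypothetical helper names. The 'mismatch' is taken to be the sum of squared vertical distances between the data and the line, which is the usual meaning of least squares.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx  # fails if all x values are identical
    b = mean_y - a * mean_x
    return a, b


def sum_squared_error(xs, ys, a, b):
    """Total squared vertical distance between the data and the line."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Invented data, for illustration only.
x = [0, 5, 10, 15]
y = [1, 3, 6, 8]

a, b = fit_line(x, y)
best = sum_squared_error(x, y, a, b)
print(round(a, 3), round(b, 3), round(best, 3))
```

Nudging either a or b away from the fitted values increases the sum of squared errors, which is what 'best' means here.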
Regression
Unfortunately not all relationships can
be modeled using a straight line
relationship.
 If you suspect a relationship should
exist between two variables BUT a
straight line doesn’t ‘look right’ then you
can examine the possibility that the
relationship may be non-linear.

Regression

When examining a possible non-linear
relationship you have to assume the
basic form of the relationship:
– logarithmic (y = ab^x);
– exponential (y = ae^x);
– polynomial (y = a + bx + cx^2 + ...).

There should always be a logical
reason for the form you choose.
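One common way to fit an assumed non-linear form is to transform it into a straight line first. As a sketch, the form y = ab^x becomes log y = log a + x log b, so an ordinary least-squares line fitted to (x, log y) recovers a and b. The data below are generated from a = 2, b = 3 purely for illustration, and `fit_power_of_b` is a hypothetical helper name.

```python
from math import exp, log


def fit_power_of_b(xs, ys):
    """Fit y = a * b**x by a least-squares line through (x, log y).

    Assumes every y is strictly positive, since log y must exist.
    """
    log_ys = [log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_ly = sum(log_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, log_ys))
    slope = sxy / sxx                    # estimates log b
    intercept = mean_ly - slope * mean_x  # estimates log a
    return exp(intercept), exp(slope)


# Illustrative data generated from y = 2 * 3**x.
x = [0, 1, 2, 3]
y = [2, 6, 18, 54]

a, b = fit_power_of_b(x, y)
print(round(a, 6), round(b, 6))
```

Note that this minimises the mismatch in log y rather than in y itself, so with noisy data it will not give exactly the same answer as a direct non-linear fit.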
Correlation & Regression
One final word of warning:
 Not all data lends itself to advanced
statistical analysis.
 The choice of statistical technique
depends upon the type of data being
analysed.
 Inappropriate use of statistics is worse
than no use at all.

Appropriate statistics

For nominal data scales:
– number of cases;
– mode;
– contingency correlation;

For ordinal data scales:
– median;
– percentiles.
Appropriate statistics

For interval scales:
– mean;
– standard deviation;
– rank-order correlation;
– product-moment correlation.

For ratio scales:
– coefficient of variation.
Nominal scales
The number of cases is a simple count
of the number of times a variable takes a
given value.
 The mode is the most frequent value
recorded for the variable.
 Contingency correlation uses the chi-square
statistic to measure the association
between variables.
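The three nominal-scale statistics can be sketched as follows. The labels and the 2 x 2 table of counts are invented, and the helper names are hypothetical. The contingency coefficient is computed here as sqrt(chi2 / (chi2 + N)), Pearson's form, which is an assumption about what the presentation means by 'contingency correlation'.

```python
from collections import Counter
from math import sqrt

# Invented nominal observations: which subgroup each respondent belongs to.
labels = ["HA", "LA", "UNI", "UNI", "LA", "UNI"]
case_counts = Counter(labels)                  # number of cases per value
mode_value = case_counts.most_common(1)[0][0]  # most frequent value


def chi_square(table):
    """Chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def contingency_coefficient(table):
    """Pearson's contingency coefficient, sqrt(chi2 / (chi2 + N))."""
    chi2 = chi_square(table)
    total = sum(sum(row) for row in table)
    return sqrt(chi2 / (chi2 + total))


# Invented 2 x 2 table of counts (rows: two groups, columns: two answers).
observed = [[10, 20], [20, 10]]
print(mode_value, round(chi_square(observed), 3),
      round(contingency_coefficient(observed), 3))
```

The statistic compares each observed count with the count expected if the two variables were unrelated, so it needs every row and column total to be greater than zero.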

Ordinal scales
The median is the middle value of a
variable when its observations are ranked;
it is a measure of central tendency.
 Percentiles summarise the percentage
of the observations that lie between certain
(preset) limits.
 Rank-order (Spearman) correlation can
be used to measure relationships
between variables.
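A sketch of the ordinal-scale statistics using Python's standard `statistics` module, with rank-order correlation computed as the product-moment correlation of the ranks. The scores are invented, and `average_ranks` and `spearman_rho` are hypothetical helper names.

```python
import statistics
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Rank-order correlation: Pearson correlation of the two sets of ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Invented ordinal scores, for illustration only.
scores = [1, 2, 3, 4, 5, 6, 7]
middle = statistics.median(scores)
quartiles = statistics.quantiles(scores, n=4)  # 25th, 50th, 75th percentiles

# y always rises with x, but not along a straight line.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
print(middle, quartiles, rho)
```

Because only the ordering of the values is used, the rank-order coefficient is +1 whenever one variable always rises with the other, whether or not the relationship is a straight line.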

Interval scales
The mean is the average value of a
given variable.
 The standard deviation measures the
dispersion of the variable around the
mean.
 Product-moment (Pearson) correlation
can be used to measure relationships
between variables.

Ratio scales

All statistical techniques can be used
with ratio scales of measurement.
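As an illustration, the coefficient of variation listed earlier for ratio scales is the standard deviation divided by the mean. It is meaningful only when the scale has a true zero, which is why it appears under ratio scales. The sketch below uses invented measurements and the sample standard deviation from Python's `statistics` module; `coefficient_of_variation` is a hypothetical helper name.

```python
import statistics

# Invented ratio-scale measurements (for example, lengths in metres).
measurements = [2, 4, 4, 4, 5, 5, 7, 9]


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a fraction of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean


mean_value = statistics.mean(measurements)
spread = statistics.stdev(measurements)
cv = coefficient_of_variation(measurements)
print(mean_value, round(spread, 3), round(cv, 3))
```

Because it is a ratio, the result is unit-free, which makes it useful for comparing the dispersion of variables measured in different units.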
Appropriate statistics

In more complex multi-variable
analyses another way of assessing the
appropriateness of a statistical
technique is to examine the nature of
the measurement.
Appropriate statistics
[Figure: levels of measurement: interval, ordinal, nominal.]
Multi-variable methods

If more complex statistical techniques
are needed then the following
restrictions should be considered.
Multi-variable methods
Dependent     Independent               Method
Continuous    Continuous                Multiple regression
Continuous    Nominal                   Analysis of variance
Continuous    Nominal and continuous    Analysis of co-variance
Nominal       Continuous                Discriminant analysis
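To illustrate the pairing of a continuous dependent variable with a nominal independent variable, here is a minimal sketch of the F statistic from a one-way analysis of variance. The group scores are invented and `one_way_anova_f` is a hypothetical helper name. Judging whether the F value is significant still requires statistical tables, which are not reproduced here.

```python
def one_way_anova_f(groups):
    """F statistic comparing the means of several groups of observations."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Invented scores for three subgroups (nominal variable: HA, LA, UNI).
scores = {"HA": [1, 2, 3], "LA": [2, 3, 4], "UNI": [5, 6, 7]}
f_value = one_way_anova_f(list(scores.values()))
print(round(f_value, 3))
```

A large F value indicates that the differences between the group means are large relative to the scatter within each group.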
Multi-variable methods

Factor analysis is a technique which
can be used to identify underlying
trends which can be described by
combining variables into distinguishable
factors.
Summary
In this presentation we have examined
ways of analysing data.
 Plotting the data.
 The use of statistics:

– Scales of measurement.

Next week:
Results, Inferences & Conclusions