Chapter 17b - Analysing and Presenting Quantitative Data


Analysing and Presenting Quantitative Data: Inferential Statistics

Objectives

After this session you will be able to:

• Choose and apply the most appropriate statistical techniques for exploring relationships and trends in data (correlation and inferential statistics).

Stages in hypothesis testing

• Hypothesis formulation.

• Specification of significance level (to see how safe it is to accept or reject the hypothesis).

• Identification of the probability distribution and definition of the region of rejection.

• Selection of appropriate statistical tests.

• Calculation of the test statistic and acceptance or rejection of the hypothesis.

Hypothesis formulation

Hypotheses come in essentially three forms. Those that:

• Examine the characteristics of a single population (and may involve calculating the mean, median and standard deviation and the shape of the distribution).

• Explore contrasts and comparisons between groups.

• Examine associations and relationships between groups.

Specification of significance level – potential errors

• Significance level is not about importance – it indicates how likely it is that a result is genuine rather than due to chance alone.

• Typical significance levels:
  – p = 0.05 (a 5% probability that the findings are due to chance)
  – p = 0.01 (a 1% probability that the findings are due to chance)

Identification of the probability distribution

Selection of statistical tests – examples

• Is stress counselling effective in reducing stress levels? Independent variable: nominal groups (experimental and control). Dependent variable: attitude scores (stress levels). Test: paired t-test.

• Do women prefer skin care products more than men? Independent variable: nominal (gender). Dependent variable: attitude scores (product preference levels). Test: Mann-Whitney U (data not normally distributed).

• Does gender influence choice of coach? Independent variable: nominal (gender). Dependent variable: nominal (choice of coach). Test: chi-square.

• Do two interviewers judge candidates the same? Independent variable: nominal. Dependent variable: rank order scores. Test: Spearman’s rho (data not normally distributed).

• Is there an association between rainfall and sales of face creams? Independent variable: rainfall (ratio data). Dependent variable: sales (ratio data). Test: Pearson Product Moment (data normally distributed).

Nominal groups and quantifiable data (normally distributed)

To compare the performance/attitudes of two groups, or the performance/attitudes of one group over a period of time, using quantifiable variables such as scores, use the paired t-test, which compares the means of the two groups to see whether any differences between them are significant. Assumption: data are normally distributed.

Paired t-test data set

Data outputs: test for normality

Case Processing Summary

              Valid           Missing        Total
StressTime1   N 92 (98.9%)    N 1 (1.1%)     N 93 (100.0%)
StressTime2   N 92 (98.9%)    N 1 (1.1%)     N 93 (100.0%)

Tests of Normality

              Kolmogorov-Smirnov(a)        Shapiro-Wilk
              Statistic   df   Sig.        Statistic   df   Sig.
StressTime1   .095        92   .041        .983        92   .289
StressTime2   .096        92   .034        .985        92   .363

a Lilliefors Significance Correction
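The normality checks above were produced in SPSS. As a rough sketch, the same Shapiro-Wilk test can be run in Python with scipy; the sample below is randomly generated for illustration only, not the StressTime data:

```python
import numpy as np
from scipy import stats

# Illustrative sample of 92 scores; in practice this would be the
# StressTime1 column from the data set.
rng = np.random.default_rng(42)
sample = rng.normal(loc=10, scale=3, size=92)

# Shapiro-Wilk test of the null hypothesis that the data are normal.
w_stat, p_value = stats.shapiro(sample)
print(f"W = {w_stat:.3f}, p = {p_value:.3f}")

# A p-value above 0.05 gives no evidence against normality, so a
# parametric test such as the paired t-test is defensible.
```

If the Shapiro-Wilk p-value falls below 0.05, a non-parametric alternative (such as Mann-Whitney U, below) is the safer choice.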

Data outputs: visual test for normality

Statistical output

Paired Samples Statistics

              Mean      N    Std. Deviation   Std. Error Mean
StressTime1   10.3587   92   3.48807          .36366
StressTime2   8.7500    92   3.19555          .33316

Paired Samples Test

Pair 1: StressTime1 – StressTime2

Paired differences: Mean 1.60870; Std. Deviation 2.12239; Std. Error Mean .22127
95% Confidence Interval of the Difference: 1.16916 (lower) to 2.04823 (upper)
t = 7.270, df = 91, Sig. (2-tailed) = .000
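A paired t-test of this kind can be reproduced in outline with scipy. The scores below are illustrative stand-ins for before/after stress measurements, not the chapter's data set:

```python
from scipy import stats

# Hypothetical stress scores for the same 8 employees before and
# after counselling (illustrative values only).
before = [12, 11, 14, 10, 13, 12, 15, 11]
after  = [10, 10, 12, 9, 11, 12, 13, 10]

# ttest_rel compares the means of two related samples (paired t-test).
t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# Reject the null hypothesis of "no difference" if p < 0.05.
if p_value < 0.05:
    print("Significant reduction in mean stress score")
```

As in the SPSS output, the test works on the pairwise differences, so each case must have a score at both time points.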

Nominal groups and quantifiable data (not normally distributed)

To compare the performance/attitudes of two groups, or the performance/attitudes of one group over a period of time, using quantifiable variables such as scores, use the Mann-Whitney U test. Assumption: data are not normally distributed.

Example of data gathering instrument

Mann-Whitney U data set

Statistical output

Tests of Normality (dependent variable: Attitude)

      Kolmogorov-Smirnov(a)        Shapiro-Wilk
Sex   Statistic   df   Sig.        Statistic   df   Sig.
1     .298        32   .000        .815        32   .000
2     .167        68   .000        .909        68   .000

a Lilliefors Significance Correction

Test Statistics(a)

                         Attitude
Mann-Whitney U           492.500
Wilcoxon W               1020.500
Z                        -4.419
Asymp. Sig. (2-tailed)   .000

a Grouping Variable: Sex

Ranks

Sex     N     Mean Rank   Sum of Ranks
1       32    31.89       1020.50
2       68    59.26       4029.50
Total   100
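A Mann-Whitney U test like the one above can be sketched with scipy. The attitude scores below are invented for illustration, not the chapter's 100-case data set:

```python
from scipy import stats

# Hypothetical attitude scores (ordinal, not normally distributed)
# for two groups, e.g. Sex = 1 and Sex = 2.
group1 = [3, 4, 2, 5, 3, 4, 2, 3]
group2 = [5, 6, 7, 5, 6, 4, 7, 6]

# Mann-Whitney U compares the rank distributions of two
# independent samples without assuming normality.
u_stat, p_value = stats.mannwhitneyu(group1, group2,
                                     alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```

Because the test works on ranks rather than raw values, it is appropriate when the normality tests above are failed (Sig. < .05).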

Association between two nominal variables We may want to investigate relationships between two nominal variables – for example: • Educational attainment and choice of career.

• Type of recruit (graduate/non-graduate) and level of responsibility in an organization.

Use chi-square when you have two or more variables, each of which contains at least two categories.

Chi-square data set

Statistical output

Chi-Square Tests

                               Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                              (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             .382(b)   1    .536
Continuity Correction(a)       .221      1    .638
Likelihood Ratio               .383      1    .536
Fisher's Exact Test                                         .556         .320
Linear-by-Linear Association   .380      1    .537
N of Valid Cases               201

a Computed only for a 2x2 table
b 0 cells (.0%) have expected count less than 5. The minimum expected count is 33.08.

Symmetric Measures

                                Value   Approx. Sig.
Nominal by Nominal Phi          .044    .536
Cramer's V                      .044    .536
N of Valid Cases                201

a Not assuming the null hypothesis.

b Using the asymptotic standard error assuming the null hypothesis.
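A chi-square test on a contingency table can be sketched with scipy. The counts below are invented to illustrate a 2x2 table (e.g. recruit type against level of responsibility); they are not the chapter's data:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 contingency table of observed counts:
# rows = graduate / non-graduate, cols = senior / junior role.
observed = np.array([[40, 20],
                     [25, 35]])

# chi2_contingency computes the test statistic, p-value, degrees of
# freedom and the expected counts under independence.
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p_value:.4f}")
```

For a 2x2 table scipy applies the continuity correction by default, matching the "Continuity Correction" row in the SPSS output above.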

Correlation analysis

Correlation analysis is concerned with associations between variables, for example:

• Does the introduction of performance management techniques to specific groups of workers improve morale compared to other groups? (Relationship: performance management/morale.)

• Is there a relationship between size of company (measured by size of workforce) and efficiency (measured by output per worker)? (Relationship: company size/efficiency.)

• Do measures to improve health and safety inevitably reduce output? (Relationship: health and safety procedures/output.)

Perfect positive and perfect negative correlations

Highly positive correlation

Strength of association based upon the value of a coefficient

Correlation figure   Description
0.00                 None
0.01–0.09            Negligible
0.10–0.29            Weak
0.30–0.59            Moderate
0.60–0.74            Strong
0.75–0.99            Very strong
1.00                 Perfect
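The banded scale above can be captured in a small helper function. The function name and the exact treatment of boundary values are our own illustrative choices:

```python
def describe_correlation(r: float) -> str:
    """Classify the strength of a correlation coefficient
    using the banded scale in the table above."""
    r = abs(r)  # strength of association ignores direction
    if r == 0.0:
        return "None"
    if r < 0.10:
        return "Negligible"
    if r < 0.30:
        return "Weak"
    if r < 0.60:
        return "Moderate"
    if r < 0.75:
        return "Strong"
    if r < 1.00:
        return "Very strong"
    return "Perfect"

print(describe_correlation(-0.813))  # a coefficient of -.813 is "Very strong"
```

Note that the sign of the coefficient indicates direction (positive or negative association), not strength, which is why the helper takes the absolute value first.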

Calculating a correlation for a set of data We may wish to explore a relationship when: • The subjects are independent and not chosen from the same group.

• The values for X and Y are measured independently.

• X and Y values are sampled from populations that are normally distributed.

• Neither of the values for X or Y is controlled (in which case, linear regression, not correlation, should be calculated).

Associations between two ordinal variables

For data that are ranked, or in circumstances where relationships are non-linear, Spearman’s rank-order correlation (Spearman’s rho) can be used.

Spearman’s rho data set

Statistical output

Correlations (Spearman's rho)

                                     MrJones    MrsSmith
MrJones    Correlation Coefficient   1.000      .779(**)
           Sig. (2-tailed)           .          .000
           N                         30         30
MrsSmith   Correlation Coefficient   .779(**)   1.000
           Sig. (2-tailed)           .000       .
           N                         30         30

** Correlation is significant at the 0.01 level (2-tailed).
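A Spearman rank-order correlation like the one above can be sketched with scipy. The rankings below are invented for illustration (two interviewers ranking six candidates), not the chapter's 30-case data:

```python
from scipy import stats

# Hypothetical rankings of 6 candidates by two interviewers.
jones = [1, 2, 3, 4, 5, 6]
smith = [2, 1, 4, 3, 6, 5]

# spearmanr correlates the two sets of ranks; it makes no
# assumption that the underlying data are normally distributed.
rho, p_value = stats.spearmanr(jones, smith)
print(f"rho = {rho:.3f}, p = {p_value:.4f}")
```

A rho close to +1 means the two interviewers rank the candidates in a very similar order, mirroring the .779 coefficient in the output above.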

Association between numerical variables We may wish to explore a relationship when there are potential associations between, for example: • Income and age.

• Spending patterns and happiness.

• Motivation and job performance.

Use Pearson Product-Moment (if the relationships between variables are linear). If the relationship is U-shaped or inverted U-shaped, use Spearman’s rho.

Pearson Product-Moment data set

Relationship between variables

[Scatter plot: Sales (80.00–180.00) plotted against Rainfall (20.00–70.00).]

Statistical output

Descriptive Statistics

           Mean     Std. Deviation   N
Rainfall   48.17    11.228           30
Sales      132.47   28.311           30

Correlations

                                Rainfall    Sales
Rainfall   Pearson Correlation  1           -.813(**)
           Sig. (2-tailed)                  .000
           N                    30          30
Sales      Pearson Correlation  -.813(**)   1
           Sig. (2-tailed)      .000
           N                    30          30

** Correlation is significant at the 0.01 level (2-tailed).
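A Pearson Product-Moment correlation like the one above can be sketched with scipy. The rainfall and sales figures below are invented to illustrate a negative linear relationship; they are not the chapter's data:

```python
from scipy import stats

# Hypothetical monthly rainfall (mm) and face-cream sales figures.
rainfall = [30, 35, 40, 45, 50, 55, 60, 65]
sales = [170, 160, 150, 145, 130, 120, 110, 100]

# pearsonr assumes a linear relationship between two
# normally distributed interval/ratio variables.
r, p_value = stats.pearsonr(rainfall, sales)
print(f"r = {r:.3f}, p = {p_value:.4f}")
```

The negative coefficient echoes the -.813 in the SPSS output: as rainfall rises, sales of face creams fall.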

Summary

• Inferential statistics are used to draw conclusions from the data and involve the specification of a hypothesis and the selection of appropriate statistical tests.

• Some of the inherent danger in hypothesis testing is in making Type I errors (rejecting a hypothesis when it is, in fact, true) and Type II errors (accepting a hypothesis when it is false).

• For categorical data, non-parametric statistical tests can be used, but for quantifiable data, more powerful parametric tests need to be applied. Parametric tests usually require that the data are normally distributed.