1. dia - unideb.hu

Download Report

Transcript 1. dia - unideb.hu

Applied Quantitative Methods

MBA course Montenegro

Peter Balogh PhD [email protected]

Non-parametric tests

• • •

13. Non-parametric tests

When looking at hypothesis testing in the previous chapter we were concerned with specific statistics (parameters), which represent statements about the population, and these are then tested by using further statistics derived from the sample. Such parametric tests are extremely important in the development of statistical theory and for testing of many sample results, but they do not cover all types of data, particularly when parameters cannot be calculated in a meaningful way. We therefore need to develop other tests which will be able to deal with such situations; a small range of these non-parametric tests are covered in this chapter.

13. Non-parametric tests

• Parametric tests require the following conditions to be satisfied: 1. A null hypothesis can be stated in terms of parameters.

2. A level of measurement has been achieved that gives validity to differences.

• • • 3. The test statistic follows a known distribution.

It is not always possible to define a meaningful parameter to represent the aspect of the population in which we are interested.

For instance, what is an average eye-colour? Equally it is not always possible to give meaning to differences in values, for instance if brands of soft drink are ranked in terms of value for money or taste.

• • •

13. Non-parametric tests

Where these conditions cannot be met, non parametric tests may be appropriate, but note that for some circumstances, there may be no suitable

test.

As with all tests of hypothesis, it must be remembered that even when a test result is significant in statistical terms, there may be situations where it has no importance in practice. A non-parametric test is still a hypothesis test, but rather than considering just a single parameter of the sample data, it looks at the overall distribution and compares this to some known or expected value, usually based upon the null hypothesis.

13. Non-parametric tests

• • • • •

13.1 Chi-squared tests

The format of this test is similar to the parametric tests already considered (see Chapter 12). As before, we will define hypotheses, calculate a test statistic, and compare this to a value from tables in order to decide whether or not to reject the null hypothesis. As the name may suggest, the statistic calculated involves squaring values, and thus the result can only be positive.

This is by far the most widely used non-parametric hypothesis test and is almost invariably used in the early stages of the analysis of questionnaire results. As statistical programs become easier and easier to use, more people are able to conduct this test, which, in the past, took a long time to construct and calculate.

• • •

13.1 Chi-squared tests

The shape of the chi-squared distribution is determined by the number of degrees of freedom (cf. the t-distribution used in the previous chapter).

In general, for relatively low degrees of freedom, the distribution is skewed to the left, as shown in Figure 13.2. As the number of degrees of freedom approaches infinity, then shape of the distribution approaches a Normal distribution.

13.1 Chi-squared tests

• • •

13.1 Chi-squared tests

We shall look at two particular applications of the chi-squared χ

2

) test. The first considers survey data, usually from questionnaires, and tries to find if there is an association between the answers given to a pair of questions. Secondly, we will use a chi-squared test to check whether a particular set of data follows a known statistical distribution.

• •

13.1.1 Tests of association

Case study

In the Arbour Housing Survey we might be interested in how the responses to question 4 on type of property and question 10 on use of the local post office relate.

Looking directly at each question would give us the following:

13.1.1 Tests of association

• • • •

13.1.1 Tests of association

In addition to looking at one variable at a time, we can construct tables to show how the answers to one question relate to the answers to another; these are commonly referred to as cross-tabulations. The single tabulations tell us that 150 respondents (or 50%) live in a house and that 40 respondents (or 13.3%) use their local post office 'once a month'.

They do not tell us how often people who live in a house use the local post office and whether their pattern of usage is different from those that live in a flat. To begin to answer these questions we need to complete the table (see Table 13.1).

13.1.1 Tests of association

• • • • •

13.1.1 Tests of association

It would be an extremely boring and time-consuming job to manually fill a table like Table 13.1, even with a small sample such as this.

In fact, in some cases we might want to cross-tabulate three or more questions. However, most statistical packages will produce this type of table very quickly.

For relatively small sets of data you could use Excel, but for larger scale surveys it would be advantageous to use a more specialist package such as SPSS (the Statistical Package for the Social Sciences). With this type of program it can take rather longer to prepare and enter the data, but the range of analysis and the flexibility offered make this well worthwhile.

13.1.1 Tests of association

The cross-tabulation of questions 4 and 10 will produce the type of table shown as Table 13.2.

• •

13.1.1 Tests of association

We are now in a better position to relate the two answers, but, because different numbers of people live in each of the types of accommodation, it is not immediately obvious if different behaviours are associated with their type of residence.

A chi-squared test will allow us to find if there is a statistical association between the two sets of answers; and this, together with other information, may allow the development of a proposition that there is a causal link between the two.

• • • • • • • •

13.1.1 Tests of association

To carry out the test we will follow the seven steps used in Chapter 12.

Step 1 State the hypotheses.

H 0 : There is no association between the two sets of answers.

H 1 : There is an association between the two sets of answers.

Step 2 State the significance level.

As with a parametric test, the significance level can be set at various values, but for most business data it is usually 5%.

Step 3 State the critical value.

The chi-squared distribution varies in shape with the number of degrees of freedom (in a similar way to the t-distribution), and thus we need to find this value before we can look up the appropriate critical value.

• • •

13.1.1 Tests of association

Degrees of freedom

Consider Table 13.1. There are four rows and four columns, giving a total of 16 cells.

Each of the row and column totals is fixed (i.e. these are the actual numbers given by the frequency count for each question), and thus the individual cell values must add up to the appropriate totals.

In the first row, we have freedom to put any numbers into three of the cells, but the fourth is then fixed because all four must add to the (fixed) total (i.e. three degrees of freedom).

• • • • • •

13.1.1 Tests of association

Degrees of freedom

The same will apply to the second row (i.e. three more degrees of freedom). And again to the third row (three more degrees of freedom). Now all of the values on the fourth row are fixed because of the totals (zero degrees of freedom). Totaling these, we have 3 + 3 + 3 + 0 = 9 degrees of freedom for this table.

This is illustrated in Table 13.3. As you can see, you can choose any three cells on the first row, not necessarily the first three.

13.1.1 Tests of association

Degrees of freedom

• • •

13.1.1 Tests of association

Degrees of freedom

There is a short cut! If you take the number of rows minus one and multiply by the number of columns minus one you get the number of degrees of freedom ν= (r- 1) x (c- 1) Using the tables in Appendix E, we can now state the critical value as 16.9.

• •

13.1.1 Tests of association

Degrees of freedom

Step 4 Calculate the test statistic.

The chi-squared statistic is given by the following formula: • • where O is the observed cell frequencies (the actual answers) and E is the expected cell frequencies (if the null hypothesis is true).

Finding the expected cell frequencies takes us back to some simple probability rules, since the null hypothesis makes the assumption that the two sets of answers are independent of each other. If this is true, then the cell frequencies will depend only on the totals of each row and column.

• •

13.1.1 Tests of association

Calculating the expected values

Consider the first cell of the table (i.e. the first row and the first column). The number of people living in houses is 150 out of a total of 300, and thus the probability of someone living in a house is: • The probability of 'once a month' is:

• • • •

13.1.1 Tests of association

Calculating the expected values

Thus the probability of living in a house and 'once a month' is: 0.5 x 0.13333 = 0.066665

Since there are 300 people in the sample, one would expect there to be 0.066665 x 300 = 19.9095

people who fit the category of the first cell. (Note that the observed value was 30.)

• •

13.1.1 Tests of association

Calculating the expected values

Again there is a short cut! Look at the way in which we found the expected value.

• • •

13.1.1 Tests of association

Calculating the expected values

We need to complete this process for the other cells in the table, but remember that, because of the degrees of freedom, you only need to calculate nine of them, the rest being found by subtraction.

Statistical packages will, of course, find these expected cell frequencies very quickly.

The expected cell frequencies are shown in Table 13.4.

13.1.1 Tests of association

Calculating the expected values

• • •

13.1.1 Tests of association

Calculating the expected values

If we were to continue with the chi-squared test by hand calculation we would need to produce the type of table shown as Table 13.5.

Step 5 Compare the calculated value and the critical value.

The calculated χ

2

value of 140.875 > 16.9.

• • • • • •

13.1.1 Tests of association

Calculating the expected values

Step 6 Come to a conclusion.

We already know that chi-squared cannot be below zero. If all of the expected cell frequencies were exactly equal to the observed cell frequencies, then the value of chi-squared would be zero.

Any differences between the observed and expected cell frequencies may be due to either sampling error or to an association between the answers; the larger the differences, the more likely it is that there is an association. Thus, if the calculated value is below the critical value, we will be unable to reject the null hypothesis, but if it is above the critical value, we reject the null hypothesis.

In this example, the calculated value is above the critical value, and thus we reject the null hypothesis.

• • • •

13.1.1 Tests of association

Calculating the expected values

Step 7 Put the conclusion into English.

There appears to be an association between the type of property people are living in and the frequency of using the local post office. We need now to examine whether such an association is meaningful within the problem context and the extent to which the association can be explained by other factors. The chi-squared test is only telling you that the association (a word we use in this context in preference to relationship) is likely to exist but not what it is.

13.1.1 Tests of association

Calculating the expected values

• • • • •

13.1.1 Tests of association

An adjustment

In fact, although the basic methodology of the test is correct, there is a problem.

One of the basic conditions for the chi-squared test is that all of the expected frequencies must be above

five.

This is not true for our example!

In order to make this condition true, we need to combine adjacent categories until their expected frequencies are equal to five or more.

To do this, we will combine the two categories 'Bedsit' and 'Other' to represent all non-house or flat dwellers; it will also be necessary to combine 'Twice a week' with 'More often' to represent anything above once a week.

• • • • • •

13.1.1 Tests of association

An adjustment

The new three-by-three cross-tabulation is shown as Table 13.6.

The number of degrees of freedom now becomes (3 - 1) x (3 1) = 4, and the critical value (from tables) is 9.49.

Re-computing the value of chi-squared from Step 4, we have a value of approximately 107.6, which is still substantially above the critical value of chi-squared. However, in other examples, the amalgamation of categories may affect the decision.

In practice, one of the problems of meeting this condition is deciding which categories to combine, and deciding what, if anything, the new category represents. (One of the most difficult examples is the need to combine ethnic groups in a sample.)

• •

13.1.1 Tests of association

An adjustment

This has been a particularly long example since we have been explaining each step as we have gone along. Performing the tests is much quicker in practice, even if a computer package is not used.

13.1.1 Tests of association

An adjustment

Table 13.6

How often House Type of property Rat Observed frequencies: Once a month Once a week More often Expected frequencies: Once a month Once a week More often 30 110 10 20 100 30 5

80

15 13.3

66.7

20 Other 5 10 35 6.7

3

.

33 10

• •

Example

13.1.1 Tests of association

An adjustment

Purchases of different strengths of lager are thought to be associated with the gender of the drinker and a brewery has commissioned a survey to find if this is true. Summary results are shown below.

Strength High Medium Low Male Female 20 10 50 55 30 35

13.1.1 Tests of association

An adjustment

1. H 0 : No association between gender and strength bought. H 1 : An association between the two.

2. Significance level is 5%.

3. Degrees of freedom = (2 -1) x (3 -1) = 2. Critical value = 5.99.

4. Find totals:

Male Female Total

13.1.1 Tests of association

An adjustment

High 20 10 30 Strength Medium 50 55 105 Low 30 35 65 Total 100 100 200

13.1.1 Tests of association

An adjustment

Expected frequency for Male and High Strength is

O 20 10 50 55 30 35 Male Female Total

13.1.1 Tests of association

High

An adjustment

Medium Low 15 15 30

E

15 15 52.5

52.5

32.5

32.5

52.5

52.5

105 Calculating chi-squared: (0-E) 5 -5 -2.5

2.5

-2.5

2.5

32.5

32.5

65 (0-E) 2 25 25 6.25

6.25

6.25

6.25

Total 100 100 200

(0-E) 2 /E

1.667

1.667

0.119

0.119

0.192

0.192

13.1.1 Tests of association

An adjustment

Chi-squared = 3.956

5. 3.956 < 5.99.

6. Therefore we cannot reject the null hypothesis.

7. There appears to be no association between the gender of the drinker and the strength of lager purchased at the 5% level of significance.

• • •

13.1.2 Tests of goodness-of-fit

If the data has been collected and seems to follow some pattern, it would be useful to identify that pattern and to determine whether it follows some (already) known statistical distribution.

If this is the case, then many more conclusions can be drawn about the data. (We have seen a selection of statistical distributions in Chapters 9 and 10.)

• • • • •

13.1.2 Tests of goodness-of-fit

The chi-squared test provides a suitable method for deciding if the data follows a particular distribution, since we have the observed values and the expected values can be calculated from tables (or by simple arithmetic). For example, do the sales of whisky follow a Poisson distribution? If the answer is 'yes', then sales forecasting might become a much easier process.

Again, we will work our way through examples to clarify the various steps taken in conducting goodness-of-fit tests.

The statistic used will remain as:

13.1.2 Tests of goodness-of-fit

• where 0 is the observed frequencies and

E

is the expected (theoretical) frequencies.

• • •

13.1.2 Tests of goodness-of-fit

Test for a uniform distribution

You may recall the uniform distribution from Chapter 9; it implies that each item or value occurs the same number of times. Such a test would be useful where we want to find if several individuals are all working at the same rate, or if sales of various 'reps' are the same. Suppose we are examining the number of tasks completed in a set time by five machine operators and have available the data shown in Table 13.7.

13.1.2 Tests of goodness-of-fit

Test for a uniform distribution

13.1.2 Tests of goodness-of-fit

Test for a uniform distribution

We can again follow the seven steps: • • • 1. State the hypotheses: H 0 : All operators complete the same number of tasks.

H 1 : All operators do not complete the same number of tasks.

(Note that the null hypothesis is just another way of saying that the data follows a uniform distribution.) 2. The significance level will be taken as 5%.

• • 3. The degrees of freedom will be the number of cells minus the number of parameters required to calculate the expected frequencies minus one. Here ν = 5 — 0—1=4. Therefore (from tables) the critical value is 9.49.

13.1.2 Tests of goodness-of-fit

4. time.

Test for a uniform distribution

Since the null hypothesis proposes a uniform distribution, we would expect all of the operators to complete the same number of tasks in the allotted This number is:

13.1.2 Tests of goodness-of-fit

Test for a uniform distribution

13.1.2 Tests of goodness-of-fit

Test for a uniform distribution

5. 0.5714 < 9.49.

6. Therefore we do not reject H 0 .

7. There is no evidence that the operators work at different rates in the completion of these tasks.

• •

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

A similar procedure may be used in this case, but in order to find the theoretical values we will need to know one parameter of the distribution.

In the case of a Binomial distribution this will be the probability of success, p (see Chapter 9).

For a Poisson distribution it will be the mean (λ).

• •

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

Example

The components produced by a certain manufacturing process have been monitored to find the number of defective items produced over a period of 96 days. A summary of the results is contained in the following table: Number of defective items:

0 1 2 3 4 5

Number of days:

15 20 20 18 13 10

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

You have been asked to establish whether or not the rate of production of defective items follows a Binomial distribution, testing at the 1% level of significance.

1. H

0

: the distribution of defectives is Binomial.

H 1

: the distribution is not Binomial.

2. The significance level is 1%.

3. The number of degrees of freedom is: No. of cells - No. of parameters -1 = 6-1-1 = 4 From tables (Appendix E) the critical value is 13.3 (but see below for modification of this value).

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

In order to find the expected frequencies, we need to know the probability of a defective item. This may be found by firts calculating the average of the sample results.

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

Using this value and the Binomial formula we can now work out the theoretical values, and hence the expected frequencies.

The Binomial formula states

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

• •

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

Note that because two of the expected frequencies are less than five, it has been necessary to combine adjacent groups. This means that we must modify the number of degrees of freedom, and hence the critical value.

Degrees of freedom = 4-1-1 = 2 Critical value = 9.21

We are now in a position to calculate chi-squared.

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

r

0 E (0-E) (0 - E) 2 (0 -

E) 2 /E

0, 1 2 3 35 20 18 24.6

10.4

108.16

32.34 -12.34 152.28

26.47

-8.47

71.74

4.3967

4.7086

2.7103

4, 5 23 12.59

10.41 108.37

8.6075

chi-squared=20.4231

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

5. 20.4231 > 9.21.

6. We therefore reject the null hypothesis.

7. There is no evidence to suggest that the production of defective items follows a Binomial distribution at the 1% level of significance.

• • •

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

Example

Items are taken from the stockroom of a jewellery business for use in producing goods for sale. The owner wishes to model the number of items taken per day, and has recorded the actual numbers for a hundred day period. The results are shown below.

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

Number of items 0 1 2 3 4 5 6 Number of days 7 17 26 22 17 9 2

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

If you were to build a model for the owner, would it be reasonable to assume that withdrawals from stock follow a Poisson distribution ? (Use a significance level of 5%.) 1. H 0 : the distribution of withdrawals is Poisson.

H 1 : the distribution is not Poisson.

2. Significance level is 5%.

3. Degrees of freedom = 7-1-1 = 5. Critical value = 11.1 (but see below).

4. The parameter required for a Poisson distribution is the mean, and this can be found from the sample data.

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

• We can now use the Poisson formula to find the various probabilities and hence the expected frequencies:

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

• • ('Note that, since the Poisson distribution goes off to infinity, in theory, we need to account for all of the possibilities. This probability is found by summing the probabilities from 0 to 5, and subtracting the result from one.)

13.1.2 Tests of goodness-of-fit

• •

Test for binomial and Poisson distributions

('Note that, since the Poisson distribution goes off to infinity, in theory, we need to account for all of the possibilities. This probability is found by summing the probabilities from 0 to 5, and subtracting the result from one.) • • • Since one expected frequency is below 5, it has been combined with the adjacent one. The degrees of freedom therefore are now 6-1-1 = 4, and the critical value is 9.49. Calculating chi-squared, we have:

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

(x) 0 E (0 - E) (0 -

E) 2 (O-E 2 )/ E

0 1 2 3 4 5+ 7 17 26 22 17 11 7.43

19.31

25.1

21.76

14.14

12.26

-0.43

-2.31

0.9

0.24

2.86

0.1849

5.3361

0.81

0.0576

8.1796

0.0249

0.2763

0.0323

0.0027

0.5785

-1.26

1.5876

0.1295

chi-squared = 1.0442

13.1.2 Tests of goodness-of-fit

Test for binomial and Poisson distributions

5. 1.0442 < 9.49.

6. We therefore cannot reject H 0 .

7. The evidence from the monitoring suggests that withdrawals from stock follow a Poisson distribution.

• •

13.1.2 Tests of goodness-of-fit

Test for the Normal distribution

This test can involve more data manipulations since it will require grouped (tabulated) data and the calculation of two parameters (the mean and the standard deviation) before expected frequencies can be determined. (In some cases, the data may already be arranged into groups.)

• •

13.1.2 Tests of goodness-of-fit

Test for the Normal distribution

Example

The results of a survey of 150 people contain details of their income levels which are summarized here.

Test at the 5% significance level if this data follows a Normal distribution.

• • • • • • •

13.1.2 Tests of goodness-of-fit

Test for the Normal distribution 1. H 0

: The distribution is Normal.

H 1

. The distribution is not Normal.

2. Significance level is 5%.

3. Degrees of freedom = 6- 2-1 = 3. Critical value (from tables) = 7.81.

4. The mean of the sample = £266 and the standard deviation = £239.02 calculated from original figures. (N.B. see Section 11.2.2 for formulae.) To find the expected values we need to: – convert the original group limits into z-scores (by subtracting the mean and then dividing by the standard deviation) – find the probabilities by using the Normal distribution tables (Appendix C); and – find our expected frequencies

13.1.2 Tests of goodness-of-fit

Test for the Normal distribution

This is shown in the table below.

• • (Note that the last two groups are combined since the expected frequency of the last group is less than five. This changes the degrees of freedom to 5 - 2 - 1 = 2, and the critical value to 5.99.)

13.1.2 Tests of goodness-of-fit

Test for the Normal distribution

• • • 5. 45.8605 > 5.99.

6. We therefore reject the null hypothesis.

7. There is no evidence that the income distribution is Normal.

13.1.3 Summary note

• • • • • It is worth noting the following characteristics of chi-squared:

χ 2

is only a symbol; the square root of χ

2

has no meaning.

χ 2

has never be less than zero. The squared term in the formula ensures positive values.

χ 2

is concerned with comparisons of observed and expected frequencies (or counts). We therefore only need a classification of data to allow such counts and not the more stringent requirements of measurement.

If there is a close correspondence between the observed and expected frequencies, χ

2

will tend to a low value attributable to sampling error, suggesting the correctness of the null hypothesis.

If the observed and expected frequencies are very different we would expect a large positive value (not explicable by sampling errors alone), which would suggest that we reject the null hypothesis in favour of the alternative hypothesis.

• • •

13.2 Mann-Whitney U test

This non-parametric test deals with two samples that are independent and may be of different sizes.

It is the equivalent of the t-test that we considered in Chapter 12. Where the samples are small (<30) we need to use tables of critical values (Appendix G) to find whether or not to reject the null hypothesis; but where the sample size is large, we can use a test based on the Normal distribution.

• • • •

13.2 Mann-Whitney U test

The basic premise of the test is that once all of the values in the two samples are put into a single ordered list, if they come from the same parent population, then the rank at which values from Sample 1 and Sample 2 appear will be by chance (at random).

If the two samples come from different populations, then the rank at which sample values appear will not be random and there will be a tendency for values from one of the samples to have lower ranks than values from the other sample.

We are therefore testing whether the positioning of values from the two samples in an ordered list follows the same pattern or is different.

While we will show how to conduct this test by hand calculation, packages such as SPSS and MINITAB will perform the test by using the appropriate commands.

• • • • •

13.2.1 Small sample test

Consider the situation where samples have been taken from two branches of a chain of stores. The samples relate to the daily takings and both branches are situated in city centres. We wish to find if there is any difference in turnover between the two branches.

Branch 1: £235, £255, £355, £195, £244, £240, £236, £259, £260 Branch 2: £240, £198, £220, £215, £245

13.2.1 Small sample test

1. H 0 : the two samples come from the same population. H 1 : the two samples come from different populations.

2. We will use a significance level of 5%.

3. To find the critical level for the test statistic, we look in the tables (Appendix G) and locate the value from the two sample sizes. Here the sizes are 9 and 5, and so the critical value of U is 8.

4. To calculate the value of the test statistic, we need to rank all of the sample values, keeping a note of which sample each value came from.

• Rank 1 2 3 4 5 6 7.5

7.5

9 10 11 12 13 14

13.2.1 Small sample test

Value 195 198 215 220 235 236 240 240 244 245 255 259 260 355 Sample 1 1 1 1 1 2 2 2 1 1 1 2 1 2 (Note that in the event of a tie in ranks, an average is used.)

• •

13.2.1 Small sample test

We now sum the ranks for each sample: Sum of ranks for Sample 1 = 78.5 Sum of ranks for Sample 2 = 26.5

We select the smallest of these two, i.e. 26.5 and put it into the following formula: • where S is the smallest sum of ranks, and n, is the number in the sample whose ranks we have summed.

13.2.1 Small sample test

5. 11.5 > 8.

6. Therefore we reject H

0 .

7. Thus we conclude that the two samples come from different populations.

• • • •

13.2.2 Large sample test

Consider the situation where the awareness of a company's product has been measured amongst groups of people in two different countries.

Measurement is on a scale of 0-100, and each group has given a score in this range; the scores are shown below.

Country A: 21, 34, 56, 45, 45, 58, 80, 32, 46, 50, 21, 11, 18, 89, 46, 39, 29,67, 75, 31, 48 Country B: 68, 77, 51, 51, 64, 43, 41, 20, 44, 57, 60

13.2.2 Large sample test

Test to discover whether the level of awareness is the same in both countries.

1. H 0 : the levels of awareness are the same. H 1 : the levels of awareness are different.

2. We will take a significance level of 5%.

3. Since we are using an approximation based on the Normal distribution, the critical values will be ±1.96 for this two-sided test.

4. Ranking the values, we have:

8 9 10 11 12 13 14.5

14.5

16.5

Rank 1 2 3 4.5

4.5

6 7 32 34 39 41 43 44 45 45 46 Value 11 18 20 21 21 29 31

13.2.2 Large sample test

Sample Rank Value A A B A A A A A A A A A A B B B 24 25 26 27 28 29 30 31 32 16.5

18 19 20.5

20.5

22 23 46 48 50 51 51 56 57 58 60 64 67 68 75 77 80 89 B A A A B B A B A B B A B Sample A A A

13.2.2 Large sample test

Sum of ranks of A = 316.

13.2.2 Large sample test

The approximation based on the Normal distribution requires the calculation of the mean and standard deviation as follows:

13.2.2 Large sample test

We can now calculate the z value which gives the number of standard errors from the mean and can be compared to the critical value from the Normal distribution.

5. 1.21 < 1.06.

6. Therefore we cannot reject the null hypothesis.

7. There appears to be no difference in the awareness of the product between the two countries.

• • •

13.3 Wilcoxon test

This test is the non-parametric equivalent of the t test for matched pairs and is used to identify if there has been a change in behaviour. This could be used when analysing a set of panel results, where information is collected both before and after some event (for example, an advertising campaign) from the same people.

Here the basic premise is that while there will be changes in behaviour, or opinions, the ranking of these changes will be random if there has been no overall change (since the positive and negative changes will cancel each other out).

• • •

13.3 Wilcoxon test

Where there has been an overall change, then the ranking of those who have moved in a positive direction will be different from the ranking of those who have moved in a negative direction.

As with the Mann-Whitney test, where the sample size is small we shall need to consult tables to find the critical value (Appendix H); but where the sample size is large we can use a test based on the Normal distribution.

While we will show how to conduct this test by hand calculation, again packages such as SPSS and MINITAB will perform the test by using the appropriate commands.

• •

13.3.1 Small sample test

Consider the situation where a small panel of eight members have been asked about their perception of a product before and after they have had an opportunity to try it.

Their perceptions have been measured on a scale, and the results are given below.

13.3.1 Small sample test

13.3.1 Small sample test

You have been asked to test if the favourable perception (shown by a high score) has changed after trying the product.

1. H 0 : There is no difference in the perceptions. H 1 : There is a difference in the perceptions.

2. We will take a significance level of 5%, although others could be used.

3. The critical value is found from tables (Appendix H); here it will be 2. We use the number of pairs (8) minus the number of draws (1). Here the critical value is for n= 8 - 1 = 7.

4. To calculate the test statistic we find the differences between the two scores and rank them by absolute size (i.e. ignoring the sign). Any ties are ignored.

7 • • 4 5 7 6 Before 8 3 6 4 9 4

13.3.1 Small sample test

After Difference Rank + 1 + 1 2 2 -2 4 1 6 7 9 -3 + 1 0 +3 5.5

2 ignore 5.5

2 -5 7 Sum of ranks of positive differences = 11.5. Sum of ranks of negative differences = 16.5.

13.3.1 Small sample test

We select the minimum of these (i.e. 11.5) as our test statistic.

5. 11.5 > 2.

6. Therefore we cannot reject the null hypothesis.

(This may seem an unusual result, but you need to look carefully at the structure of the test.) 7. There has been a change in perception after trying the product.

• • • •

13.3.2 Large sample test

Consider the following example. A group of workers has been packing items into boxes for some time and their productivity has been noted. A training scheme is initiated and the workers' productivity is noted again one month after the completion of the training. The results are shown below.

13.3.2 Large sample test

13.3.2 Large sample test

1. H 0 : there has been no change in productivity.

H 1 : there has been a change in productivity.

2. We will use a significance level of 5%.

3. The critical value will be ±1.96, since the large sample test is based on a Normal approximation.

4. To find the test statistic, we must rank the differences, as shown below:

Person M N

O

P

Q

R S T U V

W

X Y

Z

I J K L A B C D E F G H

13.3.2 Large sample test

Before After Difference Rank

10 20 30 25 27 19 8 17 14 18 21 23 32 18 25 24 16 25 40 21 11 19 27 32 41 33 21 19 30 26 21 22 20 16 25 16 24 24 31 22 24 30 12 24 41 25 16 17 25 33 40 39 11 -1 0 1 -6 3 12 -1 11 -2 3 1 1 -1 6 4 -1 6 -4 -1 -1 1 4 5 -2

-2

23.5

5.5

ignore

5.5

21 14.5

25 5.5

23.5

12 14.5

5.5

5.5

5.5

17 19 12 12 5.5

5.5

21 17 5.5

21 17 5.5

• • • • •

13.3.2 Large sample test

(Note the treatment of ties in absolute values when ranking. Also note that n is now equal to 25.) Sum of positive ranks = 218 Sum of negative ranks = 107 (minimum) The mean is given by:

13.3.2 Large sample test

The standard error is given by: • Therefore the value of z is given by 5. -1.96 <-1.493.

6. Therefore we cannot reject the null hypothesis.

7. A month after the training there has been no change in the productivity of the workers.

• • • • • •

13.4 Runs test

This is a test for randomness in a dichotomized variable, for example gender is either male or female. The basic assumption is that if gender is unimportant then the sequence of occurrence will be random and there will be no long runs of either male or female in the data.

Care needs to be taken over the order of the data when using this test, since if this is changed it will affect the result.

The sequence could be chronological, for example the date on which someone was appointed if you were checking on a claimed equal opportunity policy. The runs test is also used in the development of statistical theory, for example looking at residuals in time series analysis.

While we will show how to conduct this test by hand calculation, packages such as SPSS and MINITAB will perform the test by using the appropriate commands.

13.4 Runs test

Example

With equal numbers of men and women employed in a department there have been claims that there is discrimination in choosing who should attend conferences. During April, May and June of last year the gender of those going to conferences was noted and is shown below.

13.4 Runs test

13.4 Runs test

1. H 0 : there is no pattern in the data (i.e. random order). H 1 : there is a pattern in the data.

2. We will use a significance level of 5%.

3. Critical values are found from tables (Appendix I).

Here, since there are six men and five women and we are conducting a two-tailed test, we can find the values of 3 and 10.

4. We now find the number of runs in the data:

Date April 10 April 12 April 16 April 25 May 10 May 14 May 16 June 2 June 10 June 14 June 28

13.4 Runs test

Person attending Male Female Female Male Male Male Male Female Female Male Female Run no.

1 2 3 4 5 6

13.4 Runs test

Note that a run may constitute just a single occurrence (as in run 5) or may be a series of values (as in run 3). • The total number of runs is 6.

5. 3 < 6 < 10.

6. Therefore we cannot reject the null hypothesis.

7. There is no evidence that there is discrimination in the order in which people are chosen to attend conferences.

• • •

13.4 Runs test

Many of the tests which we have considered use the ranking of values as a basis for deciding whether or not to reject the null hypothesis, and this means that we can use non-parametric tests where only ordinal data is available. Such tests do not suggest that the parameters (e.g. the mean and variance), are unimportant, but that we do not need to know the underlying distribution in order to carry out the test. They are also called distribution free tests.

• • •

13.4 Runs test

In general, non-parametric tests are less efficient than parametric tests (since they require larger samples to achieve the same level of Type II error).

However, we can apply non-parametric tests to a wide range of applications and in many cases they may be the only type of test available. This chapter has only considered a few of the many non-parametric tests but should have given you an understanding of how they differ from the more common, parametric tests and how they can be constructed.

• • • • • •

Part 4 Conclusions

This part of the book has considered the ideas of generalizing the results of a survey to the whole population, and the testing of assertions about that population on the basis of sample data. Both of these concepts are fundamental to the use of statistics in making business decisions. They are essential if you want to take a set of sample results, say from market testing a new product, and make predictions about it's likely sales success, or otherwise. They need to be used in quality control systems which monitor items of production (or service) and compare them to set levels (derived from confidence intervals).

We saw in Part 1 that sampling was necessary in order to report meaningful results in a timely fashion.

Now we have a tool which can take these results and generalize them to the parent population.

• • •

Part 4 Conclusions

As we already knew, some data sets cannot be adequately described by statistical parameters such as the mean and standard deviation.

In these cases we are still able to test assertions about the population from the sample results by using non-parametric tests.

In much survey research, the chi-squared test is the most widely used test since the ideas and attitudes being measured do not use a cardinal scale.

• • •

Part 4 Conclusions

There will always be some dispute about the meaning of testing results and a formal mechanism such as the hypothesis test is not going to completely dispel this. What it can do, however, give some agreement on the method used for testing whilst allowing much more discussion of the hypotheses, the means of deriving the sample, and the ways of interpreting the result. This might even lead to more considered decisions being taken!