The Chi-Square Distribution


Goodness Of Fit


The purpose of a chi-square goodness-of-fit test is to compare an observed distribution to an expected distribution. For example, suppose there are four entrances to a building. You want to know if the four entrances are equally used. You observe 400 people entering the building on a random basis:

Entrance              Main   Back   Side 1   Side 2   Total
Observed Frequency     140    120       90       50     400
Expected Frequency     100    100      100      100     400

$H_0$: $p_M = p_B = p_{S1} = p_{S2}$

$H_1$: The proportions are not all equal.

If the entrances are equally utilized, we would expect each entrance to be used approximately 25% of the time. Is the difference shown above statistically significant?

Chi Square Test

If the observed frequencies are obtained from a random sample and each expected frequency is at least 5, the sampling distribution for the goodness-of-fit test is a chi-square ($\chi^2$) distribution with $k - 1$ degrees of freedom, where $k$ is the number of categories.

Test Statistic

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

$f_o$ (O) = observed frequency in each category
$f_e$ (E) = expected frequency in each category
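The test statistic can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square_statistic` and the variable names are assumptions made here, and the counts are the building-entrance frequencies from the table earlier in this section.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Building-entrance counts: Main, Back, Side 1, Side 2.
observed = [140, 120, 90, 50]
expected = [100, 100, 100, 100]

chi_square = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # k - 1, with k = number of categories

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}")
```

The computed value would then be compared with the critical value from a chi-square table for the chosen significance level and 3 degrees of freedom.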

Goodness-of-Fit Test: Equal Expected Frequencies

Let $f_o$ and $f_e$ be the observed and expected frequencies, respectively.

$H_0$: There is no difference between the observed and expected frequencies.

$H_0$: $p_1 = p_2 = p_3 = p_4$

$H_1$: There is a difference between the observed and the expected frequencies.

$H_1$: The proportions are not all equal.

[Figure: chi-square distributions for df = 3, df = 5, and df = 10. The distribution has $k - 1$ degrees of freedom, where $k$ is the number of categories. See the table on p. 495.]

EXAMPLE

The following information shows the number of employees absent by day of the week at a large manufacturing plant. At the .05 level of significance, is there a difference in the absence rate by day of the week?

Day          Frequency
Monday             120
Tuesday             45
Wednesday           60
Thursday            90
Friday             130
Total              445

EXAMPLE (continued)

The expected frequency for each day is (120 + 45 + 60 + 90 + 130)/5 = 89.

The degrees of freedom are (5 - 1) = 4.

The critical value is 9.488 (Appendix B, p. 495).
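As a rough check on these steps, the following Python sketch recomputes the expected frequency, the degrees of freedom, and the chi-square statistic from the absence counts, then compares the statistic with the critical value 9.488 quoted from the table. The variable names are illustrative and not part of the original example.

```python
# Absences by day of the week, from the example.
observed = {
    "Monday": 120,
    "Tuesday": 45,
    "Wednesday": 60,
    "Thursday": 90,
    "Friday": 130,
}
CRITICAL_VALUE = 9.488  # df = 4 at the .05 level, taken from the example

total = sum(observed.values())
expected = total / len(observed)       # equal expected frequencies
degrees_of_freedom = len(observed) - 1

# Contribution of each day: (f_o - f_e)^2 / f_e
contributions = {
    day: (fo - expected) ** 2 / expected for day, fo in observed.items()
}
chi_square = sum(contributions.values())
reject_h0 = chi_square > CRITICAL_VALUE

for day, value in contributions.items():
    print(f"{day:<10}{value:8.2f}")
print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}, "
      f"reject H0: {reject_h0}")
```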

Example (continued)

Day          Freq.   Expec.   (f_o - f_e)^2 / f_e
Monday         120       89                 10.80
Tuesday         45       89                 21.75
Wednesday       60       89                  9.45
Thursday        90       89                  0.01
Friday         130       89                 18.89
Total          445      445                 60.90

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 60.90$$

Because the computed value of chi-square is greater than the critical value, $H_0$ is rejected. We conclude that there is a difference in the number of workers absent by day of the week.

Example

Goodness of Fit

A seller of baseball cards wants to know if the demand for the following 6 cards is the same.

Card             Cards Sold
Tom Seaver               13
Nolan Ryan               33
Ty Cobb                  14
George Brett              7
Hank Aaron               36
Johnny Bench             17
Total                   120

MegaStat

Goodness of Fit Test

                Observed   Expected     O - E   (O - E)² / E   % of chisq
Tom Seaver            13         20    -7.000          2.450         7.12
Nolan Ryan            33         20    13.000          8.450        24.56
Ty Cobb               14         20    -6.000          1.800         5.23
George Brett           7         20   -13.000          8.450        24.56
Hank Aaron            36         20    16.000         12.800        37.21
Johnny Bench          17         20    -3.000          0.450         1.31
Total                120        120     0.000         34.400       100.00

34.40      chi-square
5          df
1.98E-06   p-value
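The columns of this output can be reproduced by hand. The sketch below is not MegaStat's own code; it is a plain Python illustration, with variable names chosen here, of how the O - E, (O - E)² / E, and percent-of-chi-square columns follow from the card sales.

```python
cards = ["Tom Seaver", "Nolan Ryan", "Ty Cobb",
         "George Brett", "Hank Aaron", "Johnny Bench"]
sold = [13, 33, 14, 7, 36, 17]

# Equal demand implies the same expected count for every card.
expected = sum(sold) / len(sold)

rows = []
for name, observed in zip(cards, sold):
    difference = observed - expected
    contribution = difference ** 2 / expected
    rows.append((name, observed, difference, contribution))

chi_square = sum(row[3] for row in rows)

for name, observed, difference, contribution in rows:
    percent = 100 * contribution / chi_square
    print(f"{name:<14}{observed:>5}{difference:>10.3f}"
          f"{contribution:>10.3f}{percent:>9.2f}")
print(f"chi-square = {chi_square:.2f}, df = {len(sold) - 1}")
```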

Goodness Of Fit (unequal frequencies)

Example - Goodness Of Fit (unequal frequencies)

The Bank of America (BoA) credit card department knows from national US government records that 5% of all US VISA card holders have no high school diploma, 15% have a high school diploma, 25% have some college, and 55% have a college degree. Given the information below, at the 1% level of significance, can we conclude that BoA card holders are significantly different from the rest of the nation?

Education         Observed Frequency   Expected Frequency
Some HS                           50                   25
HS Diploma                       100                   75
Some College                     190                  125
College Degree                   160                  275
Total                            500                  500

Expected frequencies: (500)(.05) = 25, (500)(.15) = 75, (500)(.25) = 125, (500)(.55) = 275.

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 115.22$$

Critical value: $\chi^2_C = 11.345$

df = (4 - 1) = 3

Reject $H_0$.
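With unequal expected frequencies, the only change is that each expected count comes from the hypothesized proportion times the sample size. The following Python sketch, using illustrative variable names, works through the BoA example and compares the result with the critical value 11.345 quoted above.

```python
# Hypothesized national proportions and observed BoA counts, from the example.
proportions = {
    "Some HS": 0.05,
    "HS Diploma": 0.15,
    "Some College": 0.25,
    "College Degree": 0.55,
}
observed = {
    "Some HS": 50,
    "HS Diploma": 100,
    "Some College": 190,
    "College Degree": 160,
}
CRITICAL_VALUE = 11.345  # df = 3 at the 1% level, taken from the example

sample_size = sum(observed.values())

# Expected frequency = sample size * hypothesized proportion.
expected = {level: sample_size * p for level, p in proportions.items()}

chi_square = sum(
    (observed[level] - expected[level]) ** 2 / expected[level]
    for level in observed
)
degrees_of_freedom = len(observed) - 1
reject_h0 = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}, "
      f"reject H0: {reject_h0}")
```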

Limitations of Chi-Square

1.) If there are only 2 cells, the expected frequency in each cell should be at least 5.
2.) For more than 2 cells, chi-square should not be used if more than 20% of the $f_e$ cells have expected frequencies less than 5.
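These two rules can be expressed as a small check to run before the test. The function below is a hypothetical helper written for this illustration; its name and parameters are assumptions, not part of any statistics package.

```python
def expected_frequencies_ok(expected, minimum=5, max_percent_below=20):
    """Apply the two rules of thumb for chi-square expected frequencies.

    - With 2 cells (or fewer), every expected frequency must be >= minimum.
    - With more than 2 cells, at most max_percent_below percent of the
      cells may have an expected frequency below minimum.
    """
    small_cells = sum(1 for fe in expected if fe < minimum)
    if len(expected) <= 2:
        return small_cells == 0
    # Integer comparison avoids floating-point issues at exactly 20%.
    return 100 * small_cells <= max_percent_below * len(expected)


print(expected_frequencies_ok([5, 5, 5, 5, 5, 5]))   # all cells at least 5
print(expected_frequencies_ok([4, 96]))              # 2 cells, one below 5
print(expected_frequencies_ok([4, 4, 10, 10, 10]))   # 40% of cells below 5
```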

Roll-Of-The-Die Experiment

Outcome               1   2   3   4   5   6   TOTAL
Observed Frequency    3   6   2   3   9   7      30
Expected Frequency    5   5   5   5   5   5      30

Two-thirds of the computed chi-square value is accounted for by just two categories (outcomes). Although the expected frequency is not less than 5, too much weight may be given to these categories. More experimental trials should be conducted to increase the number of observations.

MegaStat

Goodness of Fit Test

Outcome   observed   expected    O - E   (O - E)² / E   % of chisq
1                3      5.000   -2.000          0.800        10.53
2                6      5.000    1.000          0.200         2.63
3                2      5.000   -3.000          1.800        23.68
4                3      5.000   -2.000          0.800        10.53
5                9      5.000    4.000          3.200        42.11
6                7      5.000    2.000          0.800        10.53
Total           30     30.000    0.000          7.600       100.00

7.60    chi-square
5       df
.1797   p-value
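The concentration of the chi-square value in a few categories can be checked directly. The Python sketch below, with variable names chosen for this illustration, computes each outcome's contribution for the die experiment and the share accounted for by the two largest contributions.

```python
# Die-roll counts for outcomes 1 through 6, from the experiment.
observed = [3, 6, 2, 3, 9, 7]
expected = [sum(observed) / len(observed)] * len(observed)  # 5 per outcome

contributions = [
    (fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)
]
chi_square = sum(contributions)

# Share of the statistic that comes from the two largest contributions.
top_two = sorted(contributions, reverse=True)[:2]
top_two_share = sum(top_two) / chi_square

print(f"chi-square = {chi_square:.2f}")
print(f"share from the two largest categories = {top_two_share:.1%}")
```

A share near two-thirds from only two of six outcomes is the situation the slide warns about, and it suggests collecting more rolls before drawing a conclusion.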

Independence & Contingency Tables

Contingency Table Analysis

A contingency table is used to investigate whether two traits or characteristics are related. Each observation is classified according to two criteria.

The degrees of freedom is equal to: df = (# rows - 1)(# columns - 1).

The expected frequency is computed as: Expected Frequency = (row total)(column total)/Grand Total

EXAMPLE

Is there a relationship between the location of an accident and the gender of the person involved in the accident? A sample of 150 accidents reported to the police was classified by location and gender. At the .05 level of significance, can we conclude that gender and the location of the accident are related?

Gender    Work   Home   Other   Total
Male        60     20      10      90
Female      20     30      10      60
Total       80     50      20     150

EXAMPLE

continued


The expected frequency for the work-male intersection is computed as (90)(80)/150=48. Similarly, you can compute the expected frequencies for the other cells.
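The remaining cells follow the same pattern. As a sketch, the Python below applies (row total)(column total)/Grand Total to every cell of the accident table; the function name `expected_frequencies` is an assumption made for this illustration.

```python
def expected_frequencies(table):
    """Expected count per cell: (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [row_total * column_total / grand_total
         for column_total in column_totals]
        for row_total in row_totals
    ]


# Rows: Male, Female. Columns: Work, Home, Other.
observed = [
    [60, 20, 10],
    [20, 30, 10],
]

expected = expected_frequencies(observed)
for label, row in zip(["Male", "Female"], expected):
    print(label, row)
```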

$H_0$: Gender and location are not related.

$H_1$: Gender and location are related.

EXAMPLE

continued

$H_0$ is rejected if the computed value of $\chi^2$ is greater than 5.991. There are (2 - 1)(3 - 1) = 2 degrees of freedom.

Find the value of $\chi^2$:

$$\chi^2 = \frac{(60 - 48)^2}{48} + \cdots + \frac{(10 - 8)^2}{8} = 16.667$$

$H_0$ is rejected. We conclude that gender and location are related.
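The full sum has six terms, one per cell. The following Python sketch, written with illustrative variable names, recomputes the statistic for the accident table and compares it with the critical value 5.991 quoted above.

```python
# Rows: Male, Female. Columns: Work, Home, Other.
observed = [
    [60, 20, 10],
    [20, 30, 10],
]
CRITICAL_VALUE = 5.991  # df = 2 at the .05 level, taken from the example

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        # Expected frequency = (row total)(column total) / grand total
        fe = row_totals[i] * column_totals[j] / grand_total
        chi_square += (fo - fe) ** 2 / fe

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
reject_h0 = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.3f}, df = {degrees_of_freedom}, "
      f"reject H0: {reject_h0}")
```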

MegaStat Example

Contingency Tables

A crime agency wants to know if a male released from prison and returned to his hometown has an easier (or more difficult) time adjusting to civilian life.

                                Residence After Release From Prison
Adjustment to Civilian Life     Hometown   Not Hometown   Total
Outstanding                           27             13      40
Good                                  35             15      50
Fair                                  33             27      60
Unsatisfactory                        25             25      50
Total                                120             80     200

MegaStat

Chi-square Contingency Table Test for Independence

                            Hometown   Not Hometown    Total
Outstanding      Observed         27             13       40
                 Expected      24.00          16.00    40.00
Good             Observed         35             15       50
                 Expected      30.00          20.00    50.00
Fair             Observed         33             27       60
                 Expected      36.00          24.00    60.00
Unsatisfactory   Observed         25             25       50
                 Expected      30.00          20.00    50.00
Total            Observed        120             80      200
                 Expected     120.00          80.00   200.00

5.73   chi-square
3      df
.126   p-value
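The three summary figures can be reproduced without a statistics package. The sketch below is an illustration only and makes no claim about how MegaStat computes its output: `chi_square_p_value` is a hypothetical helper that uses the standard closed-form upper-tail probability of the chi-square distribution for whole-number degrees of freedom.

```python
import math


def chi_square_p_value(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of (df - 1)/2 terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


observed = [
    [27, 13],  # Outstanding
    [35, 15],  # Good
    [33, 27],  # Fair
    [25, 25],  # Unsatisfactory
]
row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Sum (O - E)^2 / E over all cells, with E = row total * column total / N.
chi_square = sum(
    (fo - r * c / grand_total) ** 2 / (r * c / grand_total)
    for row, r in zip(observed, row_totals)
    for fo, c in zip(row, column_totals)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi_square_p_value(chi_square, df)

print(f"{chi_square:.2f} chi-square, {df} df, {p_value:.3f} p-value")
```

Because the p-value exceeds the usual .05 level, these data would not lead to rejecting the hypothesis that adjustment and residence are unrelated.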