Week 7: Sampling Distributions

Objectives

On completion of this module you should be able to:

- calculate the standard error of the mean and explain the effect of sample size on the standard error,
- explain the concept of a sampling distribution for samples taken from either normal or non-normal populations and understand the central limit theorem,
- calculate the standard error of the proportion,
- calculate probabilities relating to sample means and proportions,
- use the normal approximation to the binomial and Poisson distributions, and
- understand and apply sampling techniques for finite populations.

Sampling distributions

We use sample data to estimate population parameters: $\bar{X}$ gives us an estimate of $\mu$, $S$ estimates $\sigma$, and the sample proportion $p_s$ estimates the population proportion $p$ (also written $\pi$).

Sampling error occurs since the sample does not reflect the population exactly.

Standard error measures how the parameter estimate varies from sample to sample.

Sampling distribution of the mean

The average of all possible sample means will be equal to the population mean $\mu$.

We’ll demonstrate with a very small population…

Sampling distribution of the mean

Four components of a coffee machine are tested and the number of faults found on each is recorded.

Component          A   B   C   D
Number of faults   5   3   6   2

The population mean is $\mu = \frac{5 + 3 + 6 + 2}{4} = 4$.

[Figure: probability distribution of the number of faults $X$ in the population]

Samples of size two ($n = 2$) are drawn from the population ($N = 4$).

[Figure: sampling distribution of the sample mean $\bar{X}$]

Note: there are ${}^{N}C_{n} = \frac{N!}{n!\,(N - n)!} = \frac{4!}{2!\,2!} = 6$ possible samples.

Sample    Sample mean $\bar{X}$
A, B      (5 + 3)/2 = 4
A, C      (5 + 6)/2 = 5.5
A, D      (5 + 2)/2 = 3.5
B, C      (3 + 6)/2 = 4.5
B, D      (3 + 2)/2 = 2.5
C, D      (6 + 2)/2 = 4
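The enumeration in this example can be checked with a short Python sketch. The dictionary `faults` and the other variable names are illustrative choices, not part of the lecture; the code lists every sample of size two and averages the resulting sample means.

```python
from itertools import combinations
from math import comb
from statistics import mean

# Population from the coffee machine example: faults per component.
faults = {"A": 5, "B": 3, "C": 6, "D": 2}
n = 2  # sample size

population_mean = mean(faults.values())

# Every possible sample of size n (drawn without replacement) and its mean.
sample_means = {
    pair: mean(faults[c] for c in pair)
    for pair in combinations(faults, n)
}

number_of_samples = comb(len(faults), n)
mean_of_sample_means = mean(sample_means.values())

for pair, xbar in sample_means.items():
    print(", ".join(pair), xbar)
print("possible samples:", number_of_samples)
print("population mean:", population_mean)
print("mean of sample means:", mean_of_sample_means)
```

The average of the six sample means should match the population mean, which is the unbiasedness property this example illustrates.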

Sampling distribution of the mean

We see from this example that $\mu_{\bar{X}} = \mu$: the six sample means sum to 24, so their average is $24/6 = 4$, the population mean.

The arithmetic mean is an unbiased estimator of the population mean.

Although we can’t be sure that a sample mean is close to the population mean, we can be sure that the average of all sample means is equal to the population mean.

Sampling distribution of the mean

There is variation in the sample means but not as much as in the population.

Standard error of the mean: a measure of the variability in the mean from sample to sample.

The standard error is:

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

where $\sigma$ = population standard deviation and $n$ = sample size.
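A minimal sketch of how sample size affects the standard error of the mean. The value `sigma = 2.0` and the sample sizes are arbitrary illustrative choices, and `standard_error` is a helper name introduced here.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


sigma = 2.0  # illustrative population standard deviation (an assumption)

# Quadrupling the sample size halves the standard error.
for n in (10, 40, 160):
    print(n, round(standard_error(sigma, n), 4))
```

Because $n$ sits under a square root, halving the standard error takes four times as many observations.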

Sampling distribution of the mean

If sampling is from a normal population (with mean $\mu$ and standard deviation $\sigma$) then the sampling distribution of the mean will also be normally distributed with mean $\mu_{\bar{X}} = \mu$ and standard error $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.

This allows us to find the probability that a sample mean is greater than or less than certain values etc.

We use:

$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

Sampling distribution of the mean

We can also rearrange the expression for $Z$ to find an interval within which a fixed proportion of sample means fall:

$\bar{X} = \mu + Z \frac{\sigma}{\sqrt{n}}$

So for example to include 95% of sample means we would substitute in $Z = \pm 1.96$ to obtain the following two values:

$\bar{X}_{\text{Lower}} = \mu - 1.96 \frac{\sigma}{\sqrt{n}}$

$\bar{X}_{\text{Upper}} = \mu + 1.96 \frac{\sigma}{\sqrt{n}}$

Sampling distribution of the mean

If sampling is from a non-normal population (with mean $\mu$ and standard deviation $\sigma$) then the Central Limit Theorem states that the sampling distribution of the mean can be approximated by the normal distribution when the sample size is sufficiently large (usually $n \geq 30$).

Sampling distribution of the mean

The CLT applies regardless of the shape of the distribution of individual values in the population.

If the population is highly skewed (rare), more than 30 observations may be needed for normality to be approximated.

If the population is fairly symmetrical, sample sizes may only need to be 15 or more.
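The Central Limit Theorem can be illustrated with a small simulation. This is a sketch under assumptions not taken from the lecture: the population is exponential with rate 1 (mean 1, standard deviation 1, strongly skewed), and the seed, sample size and number of samples are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

n = 30              # sample size
num_samples = 5000  # number of samples drawn

# Skewed population: exponential with rate 1, so mu = 1 and sigma = 1.
sample_means = [
    mean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

standard_error = 1 / sqrt(n)

# Share of sample means inside mu +/- 1.96 standard errors;
# roughly 0.95 if the normal approximation is reasonable.
within = sum(
    abs(xbar - 1) <= 1.96 * standard_error for xbar in sample_means
) / num_samples

print("mean of sample means:", round(mean(sample_means), 3))
print("std dev of sample means:", round(stdev(sample_means), 3))
print("sigma / sqrt(n):", round(standard_error, 3))
print("share within 1.96 standard errors:", round(within, 3))
```

Even though individual values are far from normal, the sample means should centre on $\mu$ with spread close to $\sigma / \sqrt{n}$.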

Example 7-1

The distribution of times it takes an office worker to complete a particular task is known to have a mean of eight minutes and a standard deviation of two minutes.

If random samples of forty tasks are taken, find:

(a) the probability that the average time spent per task will be more than nine minutes

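Solution 7-1

We are given $\mu = 8$, $\sigma = 2$ and $n = 40$.

The standard error of the mean is:

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{40}} \approx 0.3162$

Given that we are looking for $P(\bar{X} > 9)$, the Z-value is:

$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{9 - 8}{2 / \sqrt{40}} = 3.16$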

Solution 7-1

$P(\bar{X} > 9) = P(Z > 3.16) = 0.00079$

The probability that the average time spent per task will be more than nine minutes is 0.00079.

Solution 7-1

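(b) the proportion of sample means that will be between 7.2 and 8.5 minutes

We are looking for $P(7.2 < \bar{X} < 8.5)$.

$Z_1 = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{7.2 - 8}{2 / \sqrt{40}} = -2.53$

$Z_2 = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{8.5 - 8}{2 / \sqrt{40}} = 1.58$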

Solution 7-1

$P(7.2 < \bar{X} < 8.5) = P(-2.53 < Z < 1.58) = 0.9372$

The proportion of sample means that can be expected to be between 7.2 and 8.5 minutes is 0.9372 (93.72%).
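Parts (a) and (b) can be reproduced with Python's standard library. This is a sketch, not the lecture's method: it uses `statistics.NormalDist` instead of Z tables, so the results differ slightly from the table values because Z is not rounded to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 2, 40

# Sampling distribution of the mean: normal with mean mu
# and standard error sigma / sqrt(n).
standard_error = sigma / sqrt(n)
sampling_dist = NormalDist(mu, standard_error)

# (a) P(sample mean > 9)
p_more_than_9 = 1 - sampling_dist.cdf(9)

# (b) P(7.2 < sample mean < 8.5)
p_between = sampling_dist.cdf(8.5) - sampling_dist.cdf(7.2)

print("standard error:", round(standard_error, 4))
print("(a)", round(p_more_than_9, 5))
print("(b)", round(p_between, 4))
```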

Solution 7-1

(c) If the random sample had been of only 20 tasks, what changes would this make to your answers in (a) and (b)?

What assumptions would you need to make in order to be able to answer (a) and (b) based on a sample of 20 tasks?

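Solution 7-1

A sample of 20 is smaller than the sample size usually required for the CLT to apply (about 30).

If the population is known to be normal (we are not told this), the sample means are normally distributed for any sample size; if it is at least fairly symmetrical, the CLT approximation may still be adequate. In either case we could solve the problem using the methods in (a) and (b), with the larger standard error $2 / \sqrt{20}$.

If not, then we can’t assume the means are normally distributed and couldn’t solve (a) and (b).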

Solution 7-1

(d) Which of the following is more likely to occur:

- a sample mean below 7.5 minutes in a sample of 30 tasks?
- a sample mean below 7.5 minutes in a sample of 50 tasks?
- an individual task taking less than two minutes?

Solution 7-1

A sample mean below 7.5 minutes in a sample of 30 tasks?

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{7.5 - 8}{2 / \sqrt{30}} = -1.37$

$P(\bar{X} < 7.5) = P(Z < -1.37) = 0.0853$

Solution 7-1

A sample mean below 7.5 minutes in a sample of 50 tasks?

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{7.5 - 8}{2 / \sqrt{50}} = -1.77$

$P(\bar{X} < 7.5) = P(Z < -1.77) = 0.0384$

Solution 7-1

An individual task taking less than 2 minutes?

$Z = \frac{X - \mu}{\sigma} = \frac{2 - 8}{2} = -3$

$P(X < 2) = P(Z < -3) = 0.00135$

Therefore, the most likely outcome is a sample mean below 7.5 minutes in a sample of 30.
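The three probabilities in part (d) can be compared side by side. This sketch uses `statistics.NormalDist` rather than tables and assumes, as the worked solution implicitly does, that individual task times are normally distributed.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 8, 2


def p_mean_below(x, n):
    """P(sample mean < x) for samples of size n."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(x)


p_n30 = p_mean_below(7.5, 30)
p_n50 = p_mean_below(7.5, 50)

# Individual task: assumes the population itself is normal.
p_individual = NormalDist(mu, sigma).cdf(2)

print("n = 30:", round(p_n30, 4))
print("n = 50:", round(p_n50, 4))
print("individual task under 2 minutes:", round(p_individual, 5))
```

The larger sample has the smaller standard error, so its mean is less likely to fall as far as 0.5 minutes below $\mu$.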

Sampling distribution of the proportion

Often we are interested in the proportion of items in a population which possess a certain characteristic.

When we can’t examine every item in the population, we estimate this proportion with:

$p_s = \frac{X}{n} = \frac{\text{number of items having the characteristic}}{\text{sample size}}$

Sampling distribution of the proportion

As with sample means, estimates of the proportion will differ between samples.

The standard error of the proportion is:

$\sigma_{p_s} = \sqrt{\frac{p(1 - p)}{n}}$

Sampling distribution of the proportion

When sampling with replacement, the sampling distribution of the proportion follows the binomial distribution.

We’ll see shortly that when the following conditions are met, this can be approximated by the normal distribution:

$np \geq 5$ and $n(1 - p) \geq 5$

Sampling distribution of the proportion

The difference between the sample proportion and the population proportion in standardised normal units is:

$Z = \frac{p_s - p}{\sqrt{\frac{p(1 - p)}{n}}}$

Example 7-2

Recent research has indicated a growing number of young children from two parent families are being placed in child care so that both parents can work.

Although the families increase their income with two pay packets, the cost of child care is often prohibitively high.

A particular study indicated that 40% of families have children in child care facilities.

Example 7-2

If a random sample of 100 two-parent families is selected, find:

(a) the proportion of samples which will have between 40% and 50% of families using child care facilities

Solution 7-2

We are given $p = 0.4$ and $n = 100$.

$np = 100(0.4) = 40 \geq 5$

$n(1 - p) = 100(0.6) = 60 \geq 5$

Therefore the sample size is large enough to use the normal distribution approximation.

We are looking for $P(0.4 < p_s < 0.5)$.

Solution 7-2

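$Z_1 = \frac{p_s - p}{\sqrt{\frac{p(1 - p)}{n}}} = \frac{0.4 - 0.4}{\sqrt{\frac{0.4(0.6)}{100}}} = 0$

$Z_2 = \frac{p_s - p}{\sqrt{\frac{p(1 - p)}{n}}} = \frac{0.5 - 0.4}{\sqrt{\frac{0.4(0.6)}{100}}} = 2.04$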

Solution 7-2

$P(0.4 < p_s < 0.5) = P(0 < Z < 2.04) = 0.4793$

So the proportion of samples between 40% and 50% will be 0.4793.

Example 7-2

If a random sample of 100 two-parent families is selected, find: (b) the probability of obtaining a sample percentage of greater than 45%

$Z = \frac{p_s - p}{\sqrt{\frac{p(1 - p)}{n}}} = \frac{0.45 - 0.4}{\sqrt{\frac{0.4(0.6)}{100}}} = 1.02$

Solution 7-2

$P(p_s > 0.45) = P(Z > 1.02) = 0.1539$

So the probability of obtaining a sample percentage of greater than 45% is 0.1539.

Example 7-2

(c) Within what symmetrical limits of the population percentage will 95% of the sample percentages fall?

95% of the standard normal curve is between ±1.96 and so

$P(-1.96 < Z < 1.96) = 0.95$

Solution 7-2

Now rearranging

$Z = \frac{p_s - p}{\sqrt{\frac{p(1 - p)}{n}}}$

we get

$p_s = p + Z \sqrt{\frac{p(1 - p)}{n}}$

Solution 7-2

Substituting in the two values of $Z$, we find that:

$p_s = 0.4 - 1.96 \sqrt{\frac{0.4(0.6)}{100}} = 0.3040$ (to 4 dec. pl.)

$p_s = 0.4 + 1.96 \sqrt{\frac{0.4(0.6)}{100}} = 0.4960$ (to 4 dec. pl.)

So 95% of the sample percentages will fall between 30.40% and 49.60%.
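All three parts of Example 7-2 can be sketched in Python. The variable names are illustrative, and `statistics.NormalDist` stands in for the Z tables used in the lecture, so results agree with the slides only to about three decimal places.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.4, 100

# Conditions for the normal approximation to hold.
approximation_ok = n * p >= 5 and n * (1 - p) >= 5

# Standard error of the proportion.
standard_error = sqrt(p * (1 - p) / n)
sampling_dist = NormalDist(p, standard_error)

# (a) P(0.4 < p_s < 0.5)
part_a = sampling_dist.cdf(0.5) - sampling_dist.cdf(0.4)

# (b) P(p_s > 0.45)
part_b = 1 - sampling_dist.cdf(0.45)

# (c) symmetrical limits containing 95% of sample proportions
z = NormalDist().inv_cdf(0.975)  # approximately 1.96
lower = p - z * standard_error
upper = p + z * standard_error

print("approximation valid:", approximation_ok)
print("(a)", round(part_a, 4))
print("(b)", round(part_b, 4))
print("(c)", round(lower, 4), "to", round(upper, 4))
```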

Normal approximation to the binomial distribution

As we saw earlier, we require that $np \geq 5$ and $n(1 - p) \geq 5$ in order for the normal approximation to the binomial to be appropriate.

We also need to consider a continuity correction since the normal distribution is continuous whilst the binomial is discrete.

We’ll demonstrate this via an example.

Normal approximation to the binomial distribution

We know that for a binomial distribution $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$, and so

$Z = \frac{X - \mu}{\sigma}$

becomes

$Z = \frac{X_a - np}{\sqrt{np(1 - p)}}$

where $X_a$ is $X$ adjusted using the continuity correction.

Example 7-3

A company offers its sales staff a choice of three salary packages.

Package A includes a base salary of $50000 per year as well as 1% commission on all sales made by the staff member.

Package B includes a base salary of $20000 plus a 4% sales commission and package C consists solely of a 7% sales commission.

Example 7-3

(a) The company has designed the packages in such a way as to expect equal numbers of staff to choose each option.

If a random selection of six sales staff is taken, what is the probability that at least three will select package C?

We are given $n = 6$.

Solution 7-3

Since we are interested only in whether they select package C or not (i.e. when the result is not choosing package C, we aren’t interested in whether it is A or B), we can say that $p = \frac{1}{3}$.

We want to find the value of $P(X \geq 3)$.

Solution 7-3

Given that $np = 6\left(\frac{1}{3}\right) = 2 < 5$ and $n(1 - p) = 6\left(1 - \frac{1}{3}\right) = 4 < 5$, we cannot use the normal approximation (and must use the binomial distribution).

We will use the binomial formula (since the exact $p$ value is not tabulated).

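Solution 7-3

$P(X \geq 3) = P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6)$

$= \frac{6!}{3!\,3!}\left(\frac{1}{3}\right)^3\left(\frac{2}{3}\right)^3 + \frac{6!}{4!\,2!}\left(\frac{1}{3}\right)^4\left(\frac{2}{3}\right)^2 + \frac{6!}{5!\,1!}\left(\frac{1}{3}\right)^5\left(\frac{2}{3}\right)^1 + \frac{6!}{6!\,0!}\left(\frac{1}{3}\right)^6\left(\frac{2}{3}\right)^0 = 0.3196$

Given a random selection of six sales staff, the probability that at least three will select salary package C is 0.3196.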
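The binomial sum above can be verified directly. This is a minimal sketch; `binomial_pmf` is a helper name introduced here for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 6, 1 / 3

# P(X >= 3) = P(3) + P(4) + P(5) + P(6)
p_at_least_3 = sum(binomial_pmf(k, n, p) for k in range(3, n + 1))

print(round(p_at_least_3, 4))
```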

Example 7-3

(b) If a random selection of twenty sales staff is taken, what is the approximate probability that at least three will select package C?

$np = 20\left(\frac{1}{3}\right) = 6\tfrac{2}{3} \geq 5$

$n(1 - p) = 20\left(\frac{2}{3}\right) = 13\tfrac{1}{3} \geq 5$

So we can use the normal approximation.

Continuity correction

    The normal distribution is continuous but the binomial is discrete.

This means the binomial can only take on certain values (like the bars in a histogram).

The normal distribution can take on any value so is drawn as a continuous line (see next slide).

The two distributions will therefore have differences.

Continuity correction

[Figure: normal curve compared with the binomial distribution]

Continuity correction

In our example we are looking for $P(X \geq 3)$ on the binomial distribution (the shaded area).

[Figure: binomial histogram with the shaded area; horizontal axis marked from 1.5 to 6.5 in steps of 0.5]

Continuity correction

To include everything greater than or equal to 3, the entire bar must be included.

On the normal curve this means everything from 2.5 upwards.

[Figure: horizontal axis marked from 1.5 to 6.5 in steps of 0.5]

Solution 7-3

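$Z = \frac{X_a - np}{\sqrt{np(1 - p)}} = \frac{2.5 - 20\left(\frac{1}{3}\right)}{\sqrt{20\left(\frac{1}{3}\right)\left(\frac{2}{3}\right)}} = -1.98$

$P(X \geq 3) \approx P(Z > -1.98) = 0.9761$

Given a random selection of twenty sales staff, the approximate probability that at least three will select salary package C is 0.9761.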
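To see how close the approximation is, the sketch below computes the continuity-corrected normal probability and the exact binomial probability for the same question. The exact value is an addition for comparison only and does not appear in the lecture.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 1 / 3

mu = n * p                      # binomial mean
sigma = sqrt(n * p * (1 - p))   # binomial standard deviation

# "At least 3" includes the whole bar for X = 3, so start from 2.5.
x_a = 3 - 0.5
approx = 1 - NormalDist(mu, sigma).cdf(x_a)

# Exact binomial probability P(X >= 3), for comparison.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3, n + 1))

print("normal approximation:", round(approx, 4))
print("exact binomial:", round(exact, 4))
```

The two values differ by well under 0.01 here, which supports using the approximation once $np \geq 5$ and $n(1 - p) \geq 5$.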

Normal approximation to the Poisson distribution

With the Poisson distribution $\mu = \lambda$ and $\sigma^2 = \lambda$.

We can use the normal distribution to approximate the Poisson distribution whenever $\lambda \geq 5$:

$Z = \frac{X_a - \lambda}{\sqrt{\lambda}}$

where $X_a$ is $X$ adjusted (continuity correction).

Example 7-4

Customers arrive at a busy takeaway coffee counter at the rate of five per minute.

(a) What is the probability that in any given minute three or fewer customers arrive?

$\lambda = 5$

$P(X \leq 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 0.0067 + 0.0337 + 0.0842 + 0.1404 = 0.2650$ (using Poisson tables).

Solution 7-4

(b) What is the approximate probability that in any given minute three or fewer customers arrive?

Since $\lambda \geq 5$ we can use the normal approximation to the Poisson distribution.

$X_a = 3.5$ since we are looking for less than or equal to 3.

$Z = \frac{X_a - \lambda}{\sqrt{\lambda}} = \frac{3.5 - 5}{\sqrt{5}} = -0.67$

$P(X \leq 3) \approx P(Z < -0.67) = 0.2514$

Solution 7-4

(c) Compare your answers from (a) and (b).

There is a difference of 0.0136.

The normal distribution has been a reasonably accurate approximation of the Poisson distribution in this case.
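The comparison in part (c) can be reproduced without tables. In this sketch `poisson_pmf` is an illustrative helper; because nothing is rounded along the way, the difference comes out marginally different from the table-based 0.0136.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 5  # arrivals per minute


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


# (a) exact: P(X <= 3) = P(0) + P(1) + P(2) + P(3)
exact = sum(poisson_pmf(k, lam) for k in range(4))

# (b) normal approximation with continuity correction: X_a = 3.5
approx = NormalDist(lam, sqrt(lam)).cdf(3.5)

print("exact Poisson:", round(exact, 4))
print("normal approximation:", round(approx, 4))
print("difference:", round(exact - approx, 4))
```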

Sampling from finite populations

Until now we have assumed sampling with replacement and an infinite population in our calculations, or at least that our sample size is very small relative to the population size.

Sampling is more often without replacement.

We use the finite population correction factor when

- the population is finite of size $N$ and
- the sample size $n$ is not small relative to the population size, i.e. $\frac{n}{N} > 0.05$.

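Sampling from finite populations

Finite population correction factor (fpc):

$\text{fpc} = \sqrt{\frac{N - n}{N - 1}}$

Standard error of the mean for finite populations:

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

Standard error of the proportion for finite populations:

$\sigma_{p_s} = \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$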

Example 7-5

The management team of a large company has been investigating the work habits of the employees of the company.

They have been concerned that some of the employees are spending large portions of the working day outside having smoking breaks.

It is known that 500 employees are regular smokers.

Example 7-5

It is expected that the time these employees spend smoking per day is normally distributed with a mean of 25 minutes and a standard deviation of eight minutes.

If a random sample of 50 of the smokers is selected without replacement, what proportion of the sample means would be greater than 26 minutes?

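Example 7-5

We have $\mu = 25$ and $\sigma = 8$.

Since $\frac{n}{N} = \frac{50}{500} = 0.1 > 0.05$ and the sample is without replacement, the finite population correction factor is needed.

$\mu_{\bar{X}} = 25$

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{8}{\sqrt{50}} \sqrt{\frac{500 - 50}{500 - 1}} = 1.0744$ (to 4 dec. pl.)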

Solution 7-5

$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{26 - 25}{1.0744} = 0.93$ (2 dec. pl.)

$P(\bar{X} > 26) = P(Z > 0.93) = 0.1762$

So 0.1762 (17.62%) of the sample means can be expected to be greater than 26 minutes.
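Example 7-5 can be sketched in Python as follows. The helper name `fpc` is introduced here for illustration, and `statistics.NormalDist` replaces the Z table, so the final probability differs slightly from the table value.

```python
from math import sqrt
from statistics import NormalDist


def fpc(N, n):
    """Finite population correction factor: sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))


N, n = 500, 50
mu, sigma = 25, 8

# The correction is applied because n / N exceeds 0.05.
needs_correction = n / N > 0.05

standard_error = sigma / sqrt(n) * fpc(N, n)
p_mean_above_26 = 1 - NormalDist(mu, standard_error).cdf(26)

print("correction needed:", needs_correction)
print("standard error:", round(standard_error, 4))
print("P(sample mean > 26):", round(p_mean_above_26, 4))
```

Without the correction the standard error would be $8 / \sqrt{50} \approx 1.1314$, so ignoring the finite population would overstate the variability of the sample mean.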

After the lecture each week…

- Review the lecture material
- Complete all readings
- Complete all of the recommended problems (listed in SG) from the textbook
- Complete at least some of the additional problems
- Consider (briefly) the discussion points prior to tutorials