Chapter 9 Means and Proportions as Random Variables Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 9.1 Understanding Dissimilarity Among Samples Key: Need to understand what kind.

Chapter 9

Means and Proportions as Random Variables

9.1

Understanding Dissimilarity Among Samples

Key: Need to understand what kind of dissimilarity we should expect to see in various samples from the same population.

• Suppose we knew most samples were likely to provide an answer that is within 10% of the population answer.
• Then we would also know the population answer should be within 10% of whatever our specific sample gave.
• => We would have a good guess about the population value based on just the sample value.

Statistics and Parameters

A statistic is a numerical value computed from a sample. Its value may differ for different samples.

e.g. sample mean \(\bar{x}\), sample standard deviation s, and sample proportion \(\hat{p}\).

A parameter is a numerical value associated with a population. Considered fixed and unchanging.

e.g. population mean \(\mu\), population standard deviation \(\sigma\), and population proportion p.

Sampling Distributions

Each new sample taken => sample statistic will change.

The distribution of possible values of a statistic for repeated samples of the same size from a population is called the sampling distribution of the statistic.

Many statistics of interest have sampling distributions that are approximately normal.

Example 9.1

Mean Hours of Sleep for College Students

Survey of n = 190 college students.

“How many hours of sleep did you get last night?” Sample mean = 7.1 hours.

If we repeatedly took samples of 190 and each time computed the sample mean, the histogram of the resulting sample mean values would look like the one in the accompanying figure.

[Figure: histogram of sample means for repeated samples of n = 190.]

9.2

Sampling Distributions for Sample Proportions

• Suppose (unknown to us) 40% of a population carry the gene for a disease (\(p = 0.40\)).
• We will take a random sample of 25 people from this population and count X = number with gene.
• Although we expect (on average) to find 10 people (40%) with the gene, we know the number will vary for different samples of n = 25.
• In this case, X is a binomial random variable with n = 25 and p = 0.4.

Many Possible Samples

Four possible random samples of 25 people:

Sample 1: X = 12, proportion with gene = 12/25 = 0.48 or 48%.

Sample 2: X = 9, proportion with gene = 9/25 = 0.36 or 36%.

Sample 3: X = 10, proportion with gene = 10/25 = 0.40 or 40%.

Sample 4: X = 7, proportion with gene = 7/25 = 0.28 or 28%.

Note:

• Each sample gave a different answer, which did not always match the population value of 40%.
• Although we cannot determine whether one sample will accurately reflect the population, statisticians have determined what to expect for most possible samples.
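The variation among samples can be reproduced with a short simulation. The sketch below is only an illustration: it assumes the gene example's values (p = 0.40, n = 25), and the function name, the seed, and the choice of four samples are hypothetical. The proportions it prints will not match the four samples listed above.

```python
import random

def simulate_sample_proportions(p=0.40, n=25, num_samples=4, seed=1):
    """Draw num_samples random samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # X = number of people in this sample who carry the gene (a binomial count)
        x = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(x / n)
    return proportions

for i, p_hat in enumerate(simulate_sample_proportions(), start=1):
    print(f"Sample {i}: proportion with gene = {p_hat:.2f}")
```

Raising num_samples to a few hundred and plotting the results would give an approximation to the sampling distribution of the sample proportion.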

The Normal Curve Approximation Rule for Sample Proportions

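Let p = population proportion of interest or binomial probability of success.

Let \(\hat{p}\) = corresponding sample proportion or proportion of successes.

If numerous random samples or repetitions of the same size n are taken, the distribution of possible values of \(\hat{p}\) is approximately a normal curve distribution with

• Mean = p
• Standard deviation = s.d.(\(\hat{p}\)) = \(\sqrt{\frac{p(1-p)}{n}}\)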

The Normal Curve Approximation Rule for Sample Proportions

Normal Approximation Rule can be applied in two situations:

Situation 1: A random sample is taken from a population.

Situation 2: A binomial experiment is repeated numerous times.

In each situation, three conditions must be met:

Condition 1: The Physical Situation. There is an actual population or repeatable situation.

Condition 2: Data Collection. A random sample is obtained or situation repeated many times.

Condition 3: The Size of the Sample or Number of Trials. The size of the sample or number of repetitions is relatively large: np and n(1-p) must be at least 5 and preferably at least 10.
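The rule's mean, standard deviation, and sample-size condition can be checked numerically. This is a minimal sketch, not part of the original slides: the function name and the `minimum` parameter are hypothetical, and it covers only Condition 3 (the first two conditions concern how the data were collected and cannot be checked from p and n).

```python
import math

def normal_approx_for_proportion(p, n, minimum=5):
    """Return (mean, sd, size_ok) for the sampling distribution of a sample proportion."""
    mean = p
    sd = math.sqrt(p * (1 - p) / n)
    # Condition 3: both n*p and n*(1-p) should be at least `minimum` (5, preferably 10)
    size_ok = n * p >= minimum and n * (1 - p) >= minimum
    return mean, sd, size_ok

print(normal_approx_for_proportion(0.40, 25))    # gene example: n*p = 10, n*(1-p) = 15
print(normal_approx_for_proportion(0.40, 2400))  # a large poll
print(normal_approx_for_proportion(0.10, 25))    # n*p = 2.5, so the size condition fails
```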

Examples for which Rule Applies

• Election Polls: to estimate proportion who favor a candidate; units = all voters.
• Television Ratings: to estimate proportion of households watching TV program; units = all households with TV.
• Consumer Preferences: to estimate proportion of consumers who prefer new recipe compared with old; units = all consumers.
• Testing ESP: to estimate probability a person can successfully guess which of 5 symbols is on a hidden card; repeatable situation = a guess.

Example 9.2

Possible Sample Proportions Favoring a Candidate

Suppose 40% of all voters favor Candidate X. Pollsters take a sample of n = 2400 voters. Rule states the sample proportion who favor X will have approximately a normal distribution with mean = p = 0.4 and

\[ \text{s.d.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.4(1-0.4)}{2400}} = 0.01. \]

Estimating the Population Proportion from a Single Sample Proportion

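In practice, we don’t know the true population proportion p, so we cannot compute the standard deviation of \(\hat{p}\):

\[ \text{s.d.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}. \]

In practice, we only take one random sample, so we only have one sample proportion \(\hat{p}\). Replacing p with \(\hat{p}\) in the standard deviation expression gives us an estimate that is called the standard error of \(\hat{p}\):

\[ \text{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}. \]

If \(\hat{p}\) = 0.39 and n = 2400, then the standard error is 0.01. So the true proportion who support the candidate is almost surely between 0.39 - 3(0.01) = 0.36 and 0.39 + 3(0.01) = 0.42.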
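The standard error and the "almost surely" interval can be worked out directly. The sketch below assumes a sample proportion of 0.39 from n = 2400 voters, as in the interval above; the function names are hypothetical. Without rounding, the standard error is slightly under 0.01, so the interval endpoints differ from 0.36 and 0.42 only in the fourth decimal place.

```python
import math

def standard_error_proportion(p_hat, n):
    """Estimated standard deviation of a sample proportion, using p_hat in place of p."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def three_se_interval(p_hat, n):
    """Interval of p_hat plus or minus 3 standard errors."""
    se = standard_error_proportion(p_hat, n)
    return p_hat - 3 * se, p_hat + 3 * se

se = standard_error_proportion(0.39, 2400)
low, high = three_se_interval(0.39, 2400)
print(f"s.e. = {se:.4f}, interval = ({low:.3f}, {high:.3f})")
```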

9.3

What to Expect of Sample Means

• Suppose we want to estimate the mean weight loss for all who attend clinic for 10 weeks. Suppose (unknown to us) the distribution of weight loss is approximately N(8 pounds, 5 pounds).
• We will take a random sample of 25 people from this population and record for each X = weight loss.
• We know the value of the sample mean will vary for different samples of n = 25.
• What do we expect those means to be?

Many Possible Samples

Four possible random samples of 25 people:

Sample 1: Mean = 8.32 pounds, standard deviation = 4.74 pounds.

Sample 2: Mean = 6.76 pounds, standard deviation = 4.73 pounds.

Sample 3: Mean = 8.48 pounds, standard deviation = 5.27 pounds.

Sample 4: Mean = 7.16 pounds, standard deviation = 5.93 pounds.

Note:

• Each sample gave a different answer, which did not always match the population mean of 8 pounds.

• Although we cannot determine whether one sample mean will accurately reflect the population mean, statisticians have determined what to expect for most possible sample means.

The Normal Curve Approximation Rule for Sample Means

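Let \(\mu\) = mean for population of interest.

Let \(\sigma\) = standard deviation for population of interest.

Let \(\bar{x}\) = sample mean.

If numerous random samples of the same size n are taken, the distribution of possible values of \(\bar{x}\) is approximately a normal curve distribution with

• Mean = \(\mu\)
• Standard deviation = s.d.(\(\bar{x}\)) = \(\frac{\sigma}{\sqrt{n}}\)

This approximate distribution is the sampling distribution of \(\bar{x}\).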

The Normal Curve Approximation Rule for Sample Means

Normal Approximation Rule can be applied in two situations:

Situation 1: The population of measurements of interest is bell-shaped and a random sample of any size is measured.

Situation 2: The population of measurements of interest is not bell-shaped but a large random sample is measured.

Note: Difficult to get a Random Sample? Researchers are usually willing to use the Rule as long as they have a representative sample with no obvious sources of confounding or bias.

Examples for which Rule Applies

• Average Weight Loss: to estimate average weight loss; weight assumed bell-shaped; population = all current and potential clients.
• Average Age At Death: to estimate average age at which left-handed adults (over 50) die; ages at death not bell-shaped so need n ≥ 30; population = all left-handed people who live to be at least 50.
• Average Student Income: to estimate mean monthly income of students at university who work; incomes not bell-shaped and outliers likely, so need large random sample of students; population = all students at university who work.

Example 9.4

Hypothetical Mean Weight Loss

Suppose the distribution of weight loss is approximately N(8 pounds, 5 pounds) and we will take a random sample of n = 25 clients. Rule states the sample mean weight loss will have a normal distribution with mean = \(\mu\) = 8 pounds and

\[ \text{s.d.}(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{25}} = 1 \text{ pound}. \]

[Figure: histogram of sample means resulting from simulating this situation 400 times.]

Empirical Rule: It is almost certain that the sample mean will be between 5 and 11 pounds.
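A simulation of this kind is easy to reproduce. The sketch below is an assumption-laden illustration rather than the procedure behind the slide's histogram: it draws 400 samples of n = 25 from a normal population with mean 8 and standard deviation 5, and the function name and seed are hypothetical.

```python
import random
import statistics

def simulate_sample_means(mu=8.0, sigma=5.0, n=25, repetitions=400, seed=1):
    """Return the sample mean from each of `repetitions` random samples of size n."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    means = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

means = simulate_sample_means()
share_5_to_11 = sum(5 <= m <= 11 for m in means) / len(means)
print(f"mean of sample means: {statistics.mean(means):.2f}")   # should be near 8
print(f"s.d. of sample means: {statistics.stdev(means):.2f}")  # should be near 1
print(f"share between 5 and 11 pounds: {share_5_to_11:.3f}")
```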

Standard Error of the Mean

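In practice, the population standard deviation \(\sigma\) is rarely known, so we cannot compute the standard deviation of \(\bar{x}\):

\[ \text{s.d.}(\bar{x}) = \frac{\sigma}{\sqrt{n}}. \]

In practice, we only take one random sample, so we only have the sample mean \(\bar{x}\) and the sample standard deviation s. Replacing \(\sigma\) with s in the standard deviation expression gives us an estimate that is called the standard error of \(\bar{x}\):

\[ \text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}}. \]

For a sample of n = 25 weight losses with sample standard deviation s = 4.74 pounds (Sample 1 above), the standard error of the mean is 4.74/\(\sqrt{25}\) = 0.948 pounds.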

Increasing the Size of the Sample

Suppose we take n = 100 people instead of just 25. The standard deviation of the mean would be

\[ \text{s.d.}(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{100}} = 0.5 \text{ pounds}. \]

• For samples of n = 25, sample means are likely to range between 8 ± 3 pounds => 5 to 11 pounds.
• For samples of n = 100, sample means are likely to range only between 8 ± 1.5 pounds => 6.5 to 9.5 pounds.

Larger samples tend to result in more accurate estimates of population values than smaller samples.
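The effect of sample size on the likely range of the sample mean can be tabulated in a few lines. This is a small sketch using the weight loss values (mean 8, standard deviation 5) and a range of plus or minus 3 standard deviations; the function name is hypothetical.

```python
import math

def likely_range_of_sample_mean(mu, sigma, n, num_sd=3):
    """Return (s.d. of the sample mean, lower limit, upper limit)."""
    sd_mean = sigma / math.sqrt(n)
    return sd_mean, mu - num_sd * sd_mean, mu + num_sd * sd_mean

for n in (25, 100):
    sd_mean, low, high = likely_range_of_sample_mean(8, 5, n)
    print(f"n = {n}: s.d. = {sd_mean}, likely range = {low} to {high} pounds")
```

Quadrupling the sample size from 25 to 100 halves the standard deviation of the mean, because n enters through its square root.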

Sampling for a Long, Long Time: The Law of Large Numbers

LLN: the sample mean \(\bar{x}\) will eventually get “close” to the population mean \(\mu\) no matter how small a difference you use to define “close.”

LLN = peace of mind to casinos, insurance companies.

• Eventually, after enough gamblers or customers, the mean net profit will be close to the theoretical mean.
• Price to pay = must have enough $ on hand to pay the occasional winner or claimant.
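The Law of Large Numbers can be watched in action by tracking a running mean. The sketch below is illustrative only: it assumes draws from a normal population with mean 8 and standard deviation 5 (the weight loss values used earlier, not a casino game), and the function name, checkpoints, and seed are hypothetical.

```python
import random

def running_means(mu=8.0, sigma=5.0, checkpoints=(10, 100, 1000, 10000, 100000), seed=1):
    """Return {n: mean of the first n draws} for each n in checkpoints."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    total = 0.0
    results = {}
    for i in range(1, max(checkpoints) + 1):
        total += rng.gauss(mu, sigma)
        if i in checkpoints:
            results[i] = total / i
    return results

for n, mean in running_means().items():
    print(f"n = {n:>6}: sample mean = {mean:.3f}")
```

The early means can wander noticeably, while the later ones settle near 8.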

9.4

What to Expect in Other Situations: CLT

The Central Limit Theorem states that if n is sufficiently large, the sample means of random samples from a population with mean \(\mu\) and finite standard deviation \(\sigma\) are approximately normally distributed with mean \(\mu\) and standard deviation \(\frac{\sigma}{\sqrt{n}}\).

Technical Note: The mean and standard deviation given in the CLT hold for any sample size; it is only the “approximately normal” shape that requires n to be sufficiently large.

Example 9.5

California Decco Winnings

California Decco lottery game: mean amount lost per ticket over millions of tickets sold is \(\mu\) = $0.35; standard deviation \(\sigma\) = $29.67 => large variability in possible amounts won/lost, from net win of $4999 to net loss of $1.

Suppose store sells 100,000 tickets in a year.

CLT => distribution of possible sample mean loss per ticket is approximately normal with mean (loss) = \(\mu\) = $0.35 and

\[ \text{s.d.}(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{\$29.67}{\sqrt{100000}} = \$0.09. \]

Empirical Rule: The mean loss is almost surely between $0.08 and $0.62 => total loss for the 100,000 tickets is likely between $8,000 and $62,000! There are better ways to invest $100,000.
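The Decco figures can be recomputed as a check. The sketch below assumes the values stated in the example (mean loss $0.35, standard deviation $29.67, 100,000 tickets) and a range of plus or minus 3 standard deviations; the function name is hypothetical. Working from the standard deviation rounded to $0.09 gives $0.08 to $0.62, while the unrounded value of about $0.094 gives roughly $0.07 to $0.63.

```python
import math

def decco_mean_loss_range(mu=0.35, sigma=29.67, n=100_000, num_sd=3):
    """Return (s.d. of mean loss per ticket, lower limit, upper limit), unrounded."""
    sd_mean = sigma / math.sqrt(n)
    return sd_mean, mu - num_sd * sd_mean, mu + num_sd * sd_mean

sd_mean, low, high = decco_mean_loss_range()
print(f"s.d. of mean loss per ticket: ${sd_mean:.4f}")
print(f"mean loss per ticket: ${low:.2f} to ${high:.2f}")
print(f"total loss on 100,000 tickets: ${low * 100_000:,.0f} to ${high * 100_000:,.0f}")
```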

9.5

Sampling Distribution for Any Statistic

Every statistic has a sampling distribution, but the appropriate distribution may not always be normal, or even approximately bell-shaped.

Construct an approximate sampling distribution for a statistic by actually taking repeated samples of the same size from a population and constructing a relative frequency histogram for the values of the statistic over the many samples.

Example 9.6

Winning the Lottery by Betting on Birthdays

Pennsylvania Cash 5 lottery game: Select 5 numbers from integers 1 to 39. Grand prize won if match all 5 numbers.

One strategy = 5 numbers bet correspond to birth days of month for 5 family members => no chance to win if highest number drawn is 32 to 39. What is the probability of this?

Statistic of interest = H = highest of five integers randomly drawn without replacement from 1 to 39.

e.g. if numbers selected are 3, 12, 22, 36, 37 then H = 37.

Example 9.6

Winning the Lottery by Betting on Birthdays

(cont)

[Figure: summary of the value of H for 1560 games.]

Highest number over 31 occurred in 72% of the games.

Most common value of H = 39 in 13.5% of games.
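The sampling distribution of H can be approximated by simulation and, in this case, also computed exactly by counting. The sketch below is an illustration under stated assumptions: it simulates 1560 hypothetical games (not the actual lottery results summarized above), and the function names and seed are made up for the example. The exact calculation uses the fact that H is below k only when all five numbers come from the k - 1 integers below k.

```python
import math
import random

def prob_highest_at_least(k, top=39, draws=5):
    """Exact P(H >= k) when `draws` integers are drawn without replacement from 1..top."""
    # H < k happens only if every drawn number is among the k-1 integers below k
    return 1 - math.comb(k - 1, draws) / math.comb(top, draws)

def simulate_highest(games=1560, top=39, draws=5, seed=1):
    """Return the highest number drawn in each simulated game."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [max(rng.sample(range(1, top + 1), draws)) for _ in range(games)]

highs = simulate_highest()
print(f"exact P(H >= 32) = {prob_highest_at_least(32):.3f}")
print(f"exact P(H = 39)  = {prob_highest_at_least(39):.3f}")
print(f"simulated share with H >= 32 = {sum(h >= 32 for h in highs) / len(highs):.3f}")
```

The exact values, about 0.705 and 0.128, are in line with the observed 72% and 13.5%.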

9.6

Standardized Statistics

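Standardized statistic for a sample proportion:

\[ z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \]

Standardized statistic for a sample mean:

\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]

If conditions are met, these standardized statistics have, approximately, a standard normal distribution N(0,1).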

Example 9.7

Unpopular TV Shows

Networks cancel shows with low ratings. Ratings based on random sample of households, using the sample proportion \(\hat{p}\) watching show as estimate of population proportion p. If p < 0.20, show will be cancelled.

Suppose in a random sample of 1600 households, 288 are watching (for proportion of 288/1600 = 0.18). Is it likely to see \(\hat{p}\) = 0.18 even if p were 0.20 (or higher)?

\[ z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.18 - 0.20}{\sqrt{\frac{0.20(1-0.20)}{1600}}} = -2.00 \]

The sample proportion of 0.18 is about 2 standard deviations below the mean of 0.20.
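The standardized score for the TV ratings example can be checked with a one-line function. This is a minimal sketch assuming the values in the example (288 of 1600 households, hypothesized p = 0.20); the function name is hypothetical.

```python
import math

def z_for_proportion(p_hat, p, n):
    """Standardized score of a sample proportion, using the hypothesized p for the s.d."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)

z = z_for_proportion(288 / 1600, 0.20, 1600)
print(f"z = {z:.2f}")
```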

9.7

Student’s t-Distribution: Replacing \(\sigma\) with s

Dilemma: we generally don’t know \(\sigma\). Using s we have:

\[ t = \frac{\bar{x} - \mu}{\text{s.e.}(\bar{x})} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{x} - \mu)}{s} \]

If the sample size n is small, this standardized statistic will not have a N(0,1) distribution but rather a t-distribution with n - 1 degrees of freedom (df).

Example 9.8

Standardized Mean Weights

Claim: mean weight loss is \(\mu\) = 8 pounds.

Sample of n = 25 people gave a sample mean of \(\bar{x}\) = 8.32 pounds and a sample standard deviation of s = 4.74 pounds.

Is the sample mean of 8.32 pounds reasonable to expect if \(\mu\) = 8 pounds?

\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{8.32 - 8}{4.74/\sqrt{25}} = 0.34 \]

The sample mean of 8.32 is only about one-third of a standard error above the claimed mean of 8 pounds.
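The standardized mean for the weight loss claim can be recomputed as follows. The sketch assumes the example's values (sample mean 8.32, claimed mean 8, s = 4.74, n = 25); the function name is hypothetical, and it returns the degrees of freedom alongside the statistic without looking up any t-distribution probabilities.

```python
import math

def t_statistic(x_bar, mu, s, n):
    """Return (t, df): the sample mean standardized by its standard error, and n - 1."""
    standard_error = s / math.sqrt(n)
    return (x_bar - mu) / standard_error, n - 1

t, df = t_statistic(8.32, 8, 4.74, 25)
print(f"t = {t:.2f} with {df} degrees of freedom")
```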

9.8

Statistical Inference

• Confidence Intervals: uses sample data to provide an interval of values that the researcher is confident covers the true value for the population.
• Hypothesis Testing or Significance Testing: uses sample data to attempt to reject the hypothesis that nothing interesting is happening, i.e. to reject the notion that chance alone can explain the sample results.

Case Study 9.1

Do Americans Really Vote When They Say They Do?

Election of 1994:

• Time Magazine Poll: n = 800 adults (two days after election), 56% reported that they had voted.
• Info from Committee for the Study of the American Electorate: only 39% of American adults had voted.

If p = 0.39 then sample proportions for samples of size n = 800 should vary approximately normally with mean = p = 0.39 and

\[ \text{s.d.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.39(1-0.39)}{800}} = 0.017. \]

Case Study 9.1

Do Americans Really Vote When They Say They Do?

If respondents were telling the truth, the sample percent should be no higher than 39% + 3(1.7%) = 44.1%, nowhere near the reported percentage of 56%.

If 39% of the population voted, the standardized score for the reported value of 56% is

\[ z = \frac{0.56 - 0.39}{0.017} = 10.0 \]

It is virtually impossible to obtain a standardized score of 10.
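The case study arithmetic can be retraced in code. This sketch assumes the figures given above (true proportion 0.39, n = 800, reported proportion 0.56); the function name is hypothetical. It does not round the standard deviation to 0.017, so the standardized score comes out near 9.9 rather than exactly 10.0, which does not change the conclusion.

```python
import math

def check_reported_proportion(reported, p, n):
    """Return (s.d. of sample proportion, p + 3 s.d., standardized score of `reported`)."""
    sd = math.sqrt(p * (1 - p) / n)
    upper = p + 3 * sd
    z = (reported - p) / sd
    return sd, upper, z

sd, upper, z = check_reported_proportion(0.56, 0.39, 800)
print(f"s.d. = {sd:.3f}, upper limit = {upper:.3f}, z = {z:.1f}")
```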
