Inference for the Mean of a Population

Chapter 7 and Chapter 8

Inference for the Mean of a Population – Part 1

Chapter 7.1

(omit sign test pp 469 – 470)

The situation where σ is not known

• If σ is known, then the std deviation of the sample mean is given by σ/sqrt(n).

• We now consider the more realistic situation where σ is not known. In effect, we estimate σ using s, the sample standard deviation.
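For reference, the standard one-sample t results that follow from replacing the unknown population standard deviation by s can be written as below. This is a sketch of the usual textbook formulas, assuming a random sample from a normal (or roughly normal) population; the symbols \mu_0 (the hypothesized mean) and t^{*} (the critical value from Table D with n - 1 degrees of freedom) are notation assumed here, not taken from the slides.

```latex
\mathrm{SE}_{\bar{x}} = \frac{s}{\sqrt{n}}, \qquad
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \ \text{ has the } t(n-1) \text{ distribution under } H_0, \qquad
\text{c.i.: } \ \bar{x} \pm t^{*}\,\frac{s}{\sqrt{n}}
```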

t-table (Table D)

Upper tail probability p
df         0.25     0.20     0.15     0.10     0.05    0.025     0.01    0.005   0.0025    0.001   0.0005
1         1.000    1.376    1.963    3.078    6.314   12.706   31.821   63.656  127.321  318.289  636.578
2         0.816    1.061    1.386    1.886    2.920    4.303    6.965    9.925   14.089   22.328   31.600
3         0.765    0.978    1.250    1.638    2.353    3.182    4.541    5.841    7.453   10.214   12.924
4         0.741    0.941    1.190    1.533    2.132    2.776    3.747    4.604    5.598    7.173    8.610
5         0.727    0.920    1.156    1.476    2.015    2.571    3.365    4.032    4.773    5.894    6.869
6         0.718    0.906    1.134    1.440    1.943    2.447    3.143    3.707    4.317    5.208    5.959
7         0.711    0.896    1.119    1.415    1.895    2.365    2.998    3.499    4.029    4.785    5.408
8         0.706    0.889    1.108    1.397    1.860    2.306    2.896    3.355    3.833    4.501    5.041
9         0.703    0.883    1.100    1.383    1.833    2.262    2.821    3.250    3.690    4.297    4.781
10        0.700    0.879    1.093    1.372    1.812    2.228    2.764    3.169    3.581    4.144    4.587
11        0.697    0.876    1.088    1.363    1.796    2.201    2.718    3.106    3.497    4.025    4.437
12        0.695    0.873    1.083    1.356    1.782    2.179    2.681    3.055    3.428    3.930    4.318
13        0.694    0.870    1.079    1.350    1.771    2.160    2.650    3.012    3.372    3.852    4.221
14        0.692    0.868    1.076    1.345    1.761    2.145    2.624    2.977    3.326    3.787    4.140
15        0.691    0.866    1.074    1.341    1.753    2.131    2.602    2.947    3.286    3.733    4.073
16        0.690    0.865    1.071    1.337    1.746    2.120    2.583    2.921    3.252    3.686    4.015
17        0.689    0.863    1.069    1.333    1.740    2.110    2.567    2.898    3.222    3.646    3.965
18        0.688    0.862    1.067    1.330    1.734    2.101    2.552    2.878    3.197    3.610    3.922
19        0.688    0.861    1.066    1.328    1.729    2.093    2.539    2.861    3.174    3.579    3.883
20        0.687    0.860    1.064    1.325    1.725    2.086    2.528    2.845    3.153    3.552    3.850
21        0.686    0.859    1.063    1.323    1.721    2.080    2.518    2.831    3.135    3.527    3.819
22        0.686    0.858    1.061    1.321    1.717    2.074    2.508    2.819    3.119    3.505    3.792
23        0.685    0.858    1.060    1.319    1.714    2.069    2.500    2.807    3.104    3.485    3.768
24        0.685    0.857    1.059    1.318    1.711    2.064    2.492    2.797    3.091    3.467    3.745
25        0.684    0.856    1.058    1.316    1.708    2.060    2.485    2.787    3.078    3.450    3.725
26        0.684    0.856    1.058    1.315    1.706    2.056    2.479    2.779    3.067    3.435    3.707
27        0.684    0.855    1.057    1.314    1.703    2.052    2.473    2.771    3.057    3.421    3.689
28        0.683    0.855    1.056    1.313    1.701    2.048    2.467    2.763    3.047    3.408    3.674
29        0.683    0.854    1.055    1.311    1.699    2.045    2.462    2.756    3.038    3.396    3.660
30        0.683    0.854    1.055    1.310    1.697    2.042    2.457    2.750    3.030    3.385    3.646
60        0.679    0.848    1.045    1.296    1.671    2.000    2.390    2.660    2.915    3.232    3.460
1000      0.675    0.842    1.037    1.282    1.646    1.962    2.330    2.581    2.813    3.098    3.300
z*        0.674    0.842    1.036    1.282    1.645    1.960    2.326    2.576    2.807    3.090    3.291
C         50.0%    60.0%    70.0%    80.0%    90.0%    95.0%    98.0%    99.0%    99.5%    99.8%    99.9%
Confidence Level C

Using the t-table

Example:

The following data are the amounts of vitamin C, measured in mg per 100 grams of blend (dry basis), for a random sample of size 8 from a production run:

26, 31, 23, 22, 11, 22, 14, 31

We want a 95% c.i. for µ, the mean vitamin C content produced during this run.
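One way to work this example is sketched below in Python (standard library only). It assumes the usual one-sample t interval, x_bar ± t* · s/sqrt(n), with t* = 2.365 read from Table D (df = 7, upper tail probability 0.025). The variable names are illustrative and not from the slides.

```python
from math import sqrt
from statistics import mean, stdev

# Vitamin C content (mg per 100 g) for the random sample of size 8.
vitamin_c = [26, 31, 23, 22, 11, 22, 14, 31]

n = len(vitamin_c)
x_bar = mean(vitamin_c)      # sample mean
s = stdev(vitamin_c)         # sample standard deviation (divides by n - 1)
std_err = s / sqrt(n)        # estimated standard deviation of the sample mean

# Critical value from Table D: df = n - 1 = 7, upper tail p = 0.025 (95% confidence).
t_star = 2.365

margin = t_star * std_err
ci_low, ci_high = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.2f}, s = {s:.3f}, SE = {std_err:.3f}")
print(f"95% c.i. for mu: ({ci_low:.2f}, {ci_high:.2f})")
```

With only 8 observations, the interval is trustworthy only if the data look roughly normal with no outliers, so a plot of the data should come first.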

Example:

A random sample of 10 one-bedroom apartment rental ads from your local newspaper has these monthly rents (dollars): 500, 650, 600, 505, 450, 550, 515, 495, 650, 395.

Do these data give good reason to believe that the mean rent of all advertised one-bedroom apartments is greater than $500 per month?
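A possible way to carry out this test is sketched below, assuming H0: mu = 500 against Ha: mu > 500 and the one-sample t statistic with df = n - 1 = 9. Since the Python standard library has no t distribution, the sketch brackets the one-sided p-value using the df = 9 row of Table D, which mirrors a by-hand table lookup. The helper `bracket_upper_tail` is a hypothetical name introduced for illustration.

```python
from math import sqrt
from statistics import mean, stdev

rents = [500, 650, 600, 505, 450, 550, 515, 495, 650, 395]
mu0 = 500  # H0: mu = 500 versus Ha: mu > 500

n = len(rents)
t_stat = (mean(rents) - mu0) / (stdev(rents) / sqrt(n))

# Upper tail probabilities and the df = 9 row of Table D.
TAIL_P = [0.25, 0.20, 0.15, 0.10, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001, 0.0005]
ROW_DF9 = [0.703, 0.883, 1.100, 1.383, 1.833, 2.262, 2.821, 3.250, 3.690, 4.297, 4.781]

def bracket_upper_tail(t, critical_values, tail_probs):
    """Bound P(T >= t) for t >= 0 using one row of the t-table."""
    low, high = 0.0, 0.5
    for crit, p in zip(critical_values, tail_probs):
        if t >= crit:
            high = p   # t reaches this critical value, so the tail is at most p
        else:
            low = p    # t falls short of this critical value, so the tail exceeds p
            break
    return low, high

p_low, p_high = bracket_upper_tail(t_stat, ROW_DF9, TAIL_P)
print(f"t = {t_stat:.3f} with df = {n - 1}")
print(f"one-sided p-value is between {p_low} and {p_high}")
```

A p-value between 0.10 and 0.15 is not small, so under these assumptions the data do not give good evidence that the mean rent exceeds $500.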

Matched Pairs

• Here are some sales before and after a motivational course.

Employee   Before   After
1          212      237
2          282      291
3          203      191
4          327      341
5          165      192
6          198      180

Does the course appear to be effective in increasing sales?
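A matched pairs analysis reduces to a one-sample t procedure on the differences. The sketch below assumes differences are taken as after minus before, with H0: mean difference = 0 against Ha: mean difference > 0, and compares the statistic with Table D critical values for df = 5. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

before = [212, 282, 203, 327, 165, 198]
after = [237, 291, 191, 341, 192, 180]

# One difference per employee (after minus before).
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
d_bar = mean(diffs)                 # mean improvement in sales
s_d = stdev(diffs)                  # standard deviation of the differences
t_stat = d_bar / (s_d / sqrt(n))    # one-sample t statistic, df = n - 1 = 5

# Table D, df = 5: t* = 0.920 at p = 0.20 and t* = 1.156 at p = 0.15.
print(f"differences = {diffs}")
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t_stat:.3f}")
print("one-sided p-value between 0.15 and 0.20:", 0.920 < t_stat < 1.156)
```

Under these assumptions the evidence that the course increases sales is weak; with n = 6 the t procedure also relies on the differences being close to normal.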

Robustness of the t procedures

• A statistical procedure is said to be robust if the probability calculations required are insensitive to violations of the assumptions made.

• For t:

– n < 15: use t if the data are clearly close to normal. If the data are clearly non-normal or outliers are present, do not use t.

– 15 ≤ n < 40: can use t except in the presence of outliers or strong skewness.

– Large samples: can use t procedures even for clearly skewed data when the sample size is large, roughly n ≥ 40.

Inference for the Mean of a Population – Part 2: Comparing Two Means

Chapter 7.2

(omit pp 498 – 503)

Overview

• Want to compare means of two populations.

• Can use c.i. or hypothesis tests.

• Many specialized procedures -- depending on data and underlying distributions.

• We’ll look at some of the most important ones.

The idealized situation

• We assume the variances are known and the populations are normal.

• This doesn’t happen often in practice.

• Can do hypothesis tests and compute p-values as in Ch 6.

• Example: sigma1 = 20, sigma2 = 30, n1 = 120, n2 = 150, x1bar = 67.3, x2bar = 72.0

• H0: mu1 - mu2 = 0. Ha: mu1 - mu2 ≠ 0

– (a) Compute the z statistic and p-value.

– (b) Get a 95% c.i. for mu1 - mu2.
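Parts (a) and (b) can be sketched as follows, assuming the standard two-sample z procedure with known sigmas, where the standard deviation of x1bar - x2bar is sqrt(sigma1²/n1 + sigma2²/n2). The sketch uses `statistics.NormalDist` from the Python standard library for the normal probabilities.

```python
from math import sqrt
from statistics import NormalDist

sigma1, sigma2 = 20, 30
n1, n2 = 120, 150
x1_bar, x2_bar = 67.3, 72.0

diff = x1_bar - x2_bar
std_err = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # known-sigma standard deviation of the difference

# (a) z statistic and two-sided p-value for H0: mu1 - mu2 = 0.
z = diff / std_err
p_value = 2 * NormalDist().cdf(-abs(z))

# (b) 95% c.i. for mu1 - mu2, using z* with upper tail area 0.025.
z_star = NormalDist().inv_cdf(0.975)
ci_low, ci_high = diff - z_star * std_err, diff + z_star * std_err

print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
print(f"95% c.i. for mu1 - mu2: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval contains 0, which agrees with a two-sided p-value above 0.05.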

Two sample t procedures

• The most common situation. We use sample standard deviations to estimate sigma1 and sigma2 .

Example

The purchasing department has suggested that all new computer monitors for your company should have flat screens. You want to be sure employees like them. The next 20 employees needing screens are randomly divided into two groups, with 10 in each group. 10 get flat screens, the other 10 get conventional monitors.

One month after receiving the monitors, the employees rate their satisfaction with their monitors on a scale from 1 to 5 by responding to the statement “I like my new monitor” (1 = strongly disagree, 5 = strongly agree). Flat screen employees have an average satisfaction of 4.6 with a std dev of 0.7. The employees with the standard monitors have an average of 3.2 with a standard deviation of 1.6.

(a) Give a 95% c.i. for the difference in mean satisfaction scores for all employees.

(b) What about a hypothesis test for comparing the two means?
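A sketch of both parts follows, assuming the two-sample t procedure that does not pool the variances, with the conservative choice df = min(n1 - 1, n2 - 1) = 9. That df rule is an assumption here; software typically uses a larger, approximate df. Critical values come from the df = 9 row of Table D.

```python
from math import sqrt

n1, x1_bar, s1 = 10, 4.6, 0.7   # flat screens
n2, x2_bar, s2 = 10, 3.2, 1.6   # conventional monitors

diff = x1_bar - x2_bar
std_err = sqrt(s1**2 / n1 + s2**2 / n2)  # unpooled standard error of the difference

# Conservative degrees of freedom (an assumption of this sketch).
df = min(n1, n2) - 1

# (a) 95% c.i.: Table D, df = 9, upper tail p = 0.025.
t_star = 2.262
ci_low, ci_high = diff - t_star * std_err, diff + t_star * std_err

# (b) t statistic for H0: mu1 - mu2 = 0.
t_stat = diff / std_err

print(f"SE = {std_err:.4f}, df = {df}")
print(f"95% c.i. for mu1 - mu2: ({ci_low:.3f}, {ci_high:.3f})")
# Table D, df = 9: 2.262 (p = 0.025) < t < 2.821 (p = 0.01).
print(f"t = {t_stat:.3f}; one-sided p-value between 0.01 and 0.025")
```

Ratings on a 1 to 5 scale are not normal, so with 10 per group the result depends on the robustness of the t procedures when sample sizes are equal.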

Robustness of the two sample procedures

• Generally the procedures are quite robust.

• If sample sizes are equal and distributions of the two populations have similar shapes, p-values from the t table are quite accurate even when n1 and n2 are as small as 5.

• If sample sizes are unequal, can use the following (same as for one-sample t-tests and conf. ints., but replace n by n1+n2):

– n1+n2 < 15: use t if the data are clearly close to normal. If the data are clearly non-normal or outliers are present, do not use t.

– n1+n2 ≥ 15: can use t except in the presence of outliers or strong skewness.

– Large samples: can use t procedures even for clearly skewed data when the sample size is large, roughly n1+n2 ≥ 40.

Small samples

• Have to be very careful.

– Substantial uncertainty in estimates, but if the difference in means is large, can often detect this.

• Specialized procedures

– If we can assume that the two populations have equal variances, then can use the pooled estimator.

– Can test for equal variances (F test).

– Numerical procedures (optional) appear in text.

Excel

• Data analysis tool pack can do the two-sample t-tests that we have discussed + optional material:

– Most important for us is the two-sample t test that does not assume equal variances.

– Excel also does the calculation for a specialized test that assumes the two populations have equal variance.

• All are very easy to use.

• We should always plot data, do normal quantile plots, etc.

Excel example

• Example – Do piano lessons improve spatial-temporal reasoning?

• Excel output appears below.

t-Test: Two-Sample Assuming Unequal Variances

                               Variable 1   Variable 2
Mean                           3.618        0.386
Variance                       9.334        5.871
Observations                   34           44
Hypothesized Mean Difference   0
df                             62
t Stat                         5.059
P(T<=t) one-tail               0.000
t Critical one-tail            1.670
P(T<=t) two-tail               0.000
t Critical two-tail            1.999
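As a check on the Excel output, the t statistic and degrees of freedom can be recomputed from the reported means, variances, and sample sizes. The sketch below assumes the unequal-variances t statistic and the Welch–Satterthwaite approximation for df; that Excel reports this df rounded to a whole number is an inference from the output, not something the slides state.

```python
from math import sqrt

mean1, var1, n1 = 3.618, 9.334, 34   # Variable 1
mean2, var2, n2 = 0.386, 5.871, 44   # Variable 2

a, b = var1 / n1, var2 / n2          # variance contributions of each sample mean

# Unequal-variances t statistic for a hypothesized mean difference of 0.
t_stat = (mean1 - mean2) / sqrt(a + b)

# Welch-Satterthwaite approximate degrees of freedom.
df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

print(f"t = {t_stat:.3f}")                      # close to the reported t Stat of 5.059
print(f"df = {df:.2f}, rounded: {round(df)}")   # compare with the reported df of 62
```

Small differences in the third decimal place of t come from the rounding of the means and variances in the printed output.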

Chapter 8 Inferences for Proportions

(Section 8.1)

How do sample proportions behave?

Chapter 5 tells us …

Example

• An SRS of 1600 BC residents found that 954 favored construction of a new highway to Whistler.

• Give a 95% c.i. for the true proportion of BC residents who favor a new highway to Whistler.
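This interval can be sketched as follows, assuming the usual large-sample formula p_hat ± z* · sqrt(p_hat(1 - p_hat)/n). The z* value is obtained from `statistics.NormalDist` and matches the z* row of Table D.

```python
from math import sqrt
from statistics import NormalDist

n, in_favor = 1600, 954

p_hat = in_favor / n                         # sample proportion
std_err = sqrt(p_hat * (1 - p_hat) / n)      # standard error of p_hat

z_star = NormalDist().inv_cdf(0.975)         # about 1.960 for 95% confidence
margin = z_star * std_err
ci_low, ci_high = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.4f}, SE = {std_err:.4f}")
print(f"95% c.i. for p: ({ci_low:.4f}, {ci_high:.4f})")
```

The whole interval lies above 0.5, so under these assumptions a majority of BC residents appear to favor the highway.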

A variation that works better for small samples

Using the plus 4 estimator for small samples

• 9 of 15 people in an SRS of 15 Buec 232 students felt that the course workload was too heavy.

• Compute an approximate 90% c.i. for the proportion of students who felt the course workload was too heavy.
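A sketch of the plus 4 calculation, assuming the usual form of the estimator: add 2 successes and 2 failures, so p_tilde = (X + 2)/(n + 4), and use n + 4 in the standard error as well. The name `p_tilde` is notation assumed here.

```python
from math import sqrt
from statistics import NormalDist

n, too_heavy = 15, 9

# Plus 4 estimate: two extra successes and two extra failures.
p_tilde = (too_heavy + 2) / (n + 4)
std_err = sqrt(p_tilde * (1 - p_tilde) / (n + 4))

z_star = NormalDist().inv_cdf(0.95)          # about 1.645 for 90% confidence
margin = z_star * std_err
ci_low, ci_high = p_tilde - margin, p_tilde + margin

print(f"p_tilde = {p_tilde:.4f}, SE = {std_err:.4f}")
print(f"approximate 90% c.i. for p: ({ci_low:.3f}, {ci_high:.3f})")
```

The interval is wide because the sample is small; the plus 4 adjustment improves its accuracy but cannot make it narrow.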

Hypothesis tests for proportions – we use sample proportion rather than plus 4 estimate.

Example

• We found that 11 customers in a sample of 40 would be willing to buy a software upgrade that costs $100. If the upgrade is to be profitable, you will need to sell it to more than 20% of your customers. Do the sample data give good evidence that more than 20% are willing to buy?
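This test can be sketched as below, assuming H0: p = 0.20 against Ha: p > 0.20 and the large-sample z statistic, which uses the hypothesized proportion p0 (not the sample proportion) in the standard error.

```python
from math import sqrt
from statistics import NormalDist

n, willing = 40, 11
p0 = 0.20                                  # H0: p = 0.20 versus Ha: p > 0.20

p_hat = willing / n                        # sample proportion (not the plus 4 estimate)
std_err0 = sqrt(p0 * (1 - p0) / n)         # standard error computed under H0

z = (p_hat - p0) / std_err0
p_value = 1 - NormalDist().cdf(z)          # one-sided (upper tail) p-value

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, p-value = {p_value:.3f}")
```

A p-value of roughly 0.12 is not strong evidence, so under these assumptions the sample does not convincingly show that more than 20% of customers would buy the upgrade.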

• A poll (March 2, 2004) estimated that support for the BC Liberal party was 39%. Using this estimate as a “guessed value” for a follow-up study, how large a sample would I need to estimate Liberal support to within +/- 3%? I want a 95% level of confidence in my estimate.
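The required sample size can be sketched with the usual formula n = (z*/m)² · p*(1 - p*), where m is the desired margin of error and p* is the guessed value. The result is rounded up, since rounding down would give a margin slightly larger than requested.

```python
from math import ceil
from statistics import NormalDist

p_guess = 0.39     # guessed value from the earlier poll
margin = 0.03      # desired margin of error (+/- 3%)

z_star = NormalDist().inv_cdf(0.975)   # about 1.960 for 95% confidence

n_exact = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
n_required = ceil(n_exact)             # always round up to a whole respondent

print(f"n = {n_exact:.1f}, so sample at least {n_required} people")
```

Using the most cautious guess p* = 0.5 instead of 0.39 would give a somewhat larger n, at the cost of a more expensive study.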