Inference for the Mean of a Population
Chapter 7 and Chapter 8
Inference for the Mean of a Population – Part 1
Chapter 7.1
(omit sign test pp 469 – 470)
The situation where σ is not known
• If σ is known, then the standard deviation of the sample mean is σ/sqrt(n).
• We now consider the more realistic situation where σ is not known. In effect, we estimate σ using s, the sample standard deviation.
t-table (Table D)
Entries are t critical values; columns are indexed by upper tail probability p, and the last row gives the matching confidence level C.

df      0.25     0.20     0.15     0.10     0.05     0.025    0.01     0.005    0.0025   0.001    0.0005
1       1.000    1.376    1.963    3.078    6.314    12.706   31.821   63.656   127.321  318.289  636.578
2       0.816    1.061    1.386    1.886    2.920    4.303    6.965    9.925    14.089   22.328   31.600
3       0.765    0.978    1.250    1.638    2.353    3.182    4.541    5.841    7.453    10.214   12.924
4       0.741    0.941    1.190    1.533    2.132    2.776    3.747    4.604    5.598    7.173    8.610
5       0.727    0.920    1.156    1.476    2.015    2.571    3.365    4.032    4.773    5.894    6.869
6       0.718    0.906    1.134    1.440    1.943    2.447    3.143    3.707    4.317    5.208    5.959
7       0.711    0.896    1.119    1.415    1.895    2.365    2.998    3.499    4.029    4.785    5.408
8       0.706    0.889    1.108    1.397    1.860    2.306    2.896    3.355    3.833    4.501    5.041
9       0.703    0.883    1.100    1.383    1.833    2.262    2.821    3.250    3.690    4.297    4.781
10      0.700    0.879    1.093    1.372    1.812    2.228    2.764    3.169    3.581    4.144    4.587
11      0.697    0.876    1.088    1.363    1.796    2.201    2.718    3.106    3.497    4.025    4.437
12      0.695    0.873    1.083    1.356    1.782    2.179    2.681    3.055    3.428    3.930    4.318
13      0.694    0.870    1.079    1.350    1.771    2.160    2.650    3.012    3.372    3.852    4.221
14      0.692    0.868    1.076    1.345    1.761    2.145    2.624    2.977    3.326    3.787    4.140
15      0.691    0.866    1.074    1.341    1.753    2.131    2.602    2.947    3.286    3.733    4.073
16      0.690    0.865    1.071    1.337    1.746    2.120    2.583    2.921    3.252    3.686    4.015
17      0.689    0.863    1.069    1.333    1.740    2.110    2.567    2.898    3.222    3.646    3.965
18      0.688    0.862    1.067    1.330    1.734    2.101    2.552    2.878    3.197    3.610    3.922
19      0.688    0.861    1.066    1.328    1.729    2.093    2.539    2.861    3.174    3.579    3.883
20      0.687    0.860    1.064    1.325    1.725    2.086    2.528    2.845    3.153    3.552    3.850
21      0.686    0.859    1.063    1.323    1.721    2.080    2.518    2.831    3.135    3.527    3.819
22      0.686    0.858    1.061    1.321    1.717    2.074    2.508    2.819    3.119    3.505    3.792
23      0.685    0.858    1.060    1.319    1.714    2.069    2.500    2.807    3.104    3.485    3.768
24      0.685    0.857    1.059    1.318    1.711    2.064    2.492    2.797    3.091    3.467    3.745
25      0.684    0.856    1.058    1.316    1.708    2.060    2.485    2.787    3.078    3.450    3.725
26      0.684    0.856    1.058    1.315    1.706    2.056    2.479    2.779    3.067    3.435    3.707
27      0.684    0.855    1.057    1.314    1.703    2.052    2.473    2.771    3.057    3.421    3.689
28      0.683    0.855    1.056    1.313    1.701    2.048    2.467    2.763    3.047    3.408    3.674
29      0.683    0.854    1.055    1.311    1.699    2.045    2.462    2.756    3.038    3.396    3.660
30      0.683    0.854    1.055    1.310    1.697    2.042    2.457    2.750    3.030    3.385    3.646
60      0.679    0.848    1.045    1.296    1.671    2.000    2.390    2.660    2.915    3.232    3.460
1000    0.675    0.842    1.037    1.282    1.646    1.962    2.330    2.581    2.813    3.098    3.300
z*      0.674    0.842    1.036    1.282    1.645    1.960    2.326    2.576    2.807    3.090    3.291
C       50.0%    60.0%    70.0%    80.0%    90.0%    95.0%    98.0%    99.0%    99.5%    99.8%    99.9%
Using the t-table
Example:
The following data are the amounts of vitamin C, measured in mg per 100 grams of blend (dry basis), for a random sample of size 8 from a production run: 26, 31, 23, 22, 11, 22, 14, 31.
We want a 95% c.i. for µ, the mean vitamin C content produced during this run.
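As a rough sketch of the calculation for this example, the Python below computes the sample mean, the sample standard deviation s, and the interval x̄ ± t* s/sqrt(n). The critical value t* = 2.365 is read from Table D (df = 7, upper tail probability 0.025); the variable names are illustrative only and do not come from the slides.

```python
import math
import statistics

# Vitamin C content (mg per 100 g) for the random sample of size 8
vitamin_c = [26, 31, 23, 22, 11, 22, 14, 31]

n = len(vitamin_c)
xbar = statistics.mean(vitamin_c)   # sample mean
s = statistics.stdev(vitamin_c)     # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)               # standard error of the sample mean

# t* from Table D: df = n - 1 = 7, 95% confidence (upper tail p = 0.025)
t_star = 2.365
margin = t_star * se
ci_low, ci_high = xbar - margin, xbar + margin

print(f"mean = {xbar:.2f}, s = {s:.2f}, SE = {se:.2f}")
print(f"95% c.i. for mu: ({ci_low:.2f}, {ci_high:.2f})")
```

With these data the interval comes out at roughly 16.5 to 28.5 mg per 100 g.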
Example:
A random sample of 10 one-bedroom apartment rental ads from your local newspaper has these monthly rents (dollars): 500, 650, 600, 505, 450, 550, 515, 495, 650, 395.
Do these data give good reason to believe that the mean rent of all advertised one-bedroom apartments is greater than $500 per month?
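One way to work this example is a one-sided one-sample t test of H0: µ = 500 against Ha: µ > 500. The sketch below computes the t statistic and then brackets the p-value using the df = 9 row of Table D, since the table gives only selected tail probabilities. The helper `bracket_upper_tail` is a hypothetical name introduced here for illustration.

```python
import math
import statistics

rents = [500, 650, 600, 505, 450, 550, 515, 495, 650, 395]
mu0 = 500  # H0: mu = 500, Ha: mu > 500

n = len(rents)
xbar = statistics.mean(rents)
s = statistics.stdev(rents)
t_stat = (xbar - mu0) / (s / math.sqrt(n))

# Row df = 9 of Table D: upper tail probability -> t critical value
table_d_df9 = {0.25: 0.703, 0.20: 0.883, 0.15: 1.100, 0.10: 1.383,
               0.05: 1.833, 0.025: 2.262, 0.01: 2.821, 0.005: 3.250}

def bracket_upper_tail(t, row):
    """Bracket the upper-tail p-value of a positive t using one table row."""
    at_or_below = [p for p, crit in row.items() if crit <= t]
    above = [p for p, crit in row.items() if crit > t]
    upper = min(at_or_below) if at_or_below else 0.5  # valid for t > 0 only
    lower = max(above) if above else 0.0
    return lower, upper

low, high = bracket_upper_tail(t_stat, table_d_df9)
print(f"xbar = {xbar:.1f}, s = {s:.2f}, t = {t_stat:.3f}, df = {n - 1}")
print(f"{low} < p-value < {high}")
```

Here t is about 1.18 on 9 degrees of freedom, which falls between the 0.15 and 0.10 columns of Table D, so the one-sided p-value lies between 0.10 and 0.15.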
Matched Pairs
• Here are some sales before and after a motivational course.
Employee   Before   After
1          212      237
2          282      291
3          203      191
4          327      341
5          165      192
6          198      180
Does the course appear to be effective in increasing sales?
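A matched pairs analysis reduces to a one-sample t procedure on the differences. The sketch below assumes the Before column reads 212, 282, 203, 327, 165, 198 for employees 1 to 6 (the order in which the values appear in the slide), and tests H0: mean difference = 0 against Ha: mean difference > 0, with difference = After minus Before.

```python
import math
import statistics

# Sales for employees 1..6 (column order assumed from the slide)
before = [212, 282, 203, 327, 165, 198]
after = [237, 291, 191, 341, 192, 180]

# Work with the within-employee differences
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
dbar = statistics.mean(diffs)    # mean difference
s_d = statistics.stdev(diffs)    # standard deviation of the differences
t_stat = dbar / (s_d / math.sqrt(n))

print(f"differences = {diffs}")
print(f"mean diff = {dbar:.2f}, s_d = {s_d:.2f}, t = {t_stat:.3f}, df = {n - 1}")
# Table D, df = 5: t* = 0.920 at p = 0.20 and t* = 1.156 at p = 0.15
```

Under these assumptions t is about 0.98 with 5 degrees of freedom, so the one-sided p-value lies between 0.15 and 0.20 in Table D.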
Robustness of the t procedures
• A statistical procedure is said to be robust if the probability calculations required are insensitive to violations of the assumptions made.
• For t:
– n < 15: use t if the data are clearly close to normal. If the data are clearly non-normal or outliers are present, do not use t.
– 15 ≤ n < 40: can use t except in the presence of outliers or strong skewness.
– Large samples: can use t procedures even for clearly skewed data when the sample size is large, roughly n ≥ 40.
Inference for the Mean of a Population – Part 2: Comparing Two Means
Chapter 7.2
(omit pp 498 – 503)
Overview
• Want to compare means of two populations.
• Can use c.i. or hypothesis tests.
• Many specialized procedures -- depending on data and underlying distributions.
• We’ll look at some of the most important ones.
The idealized situation
• We assume the variances are known and the populations are normal.
• This doesn’t happen often in practice.
• Can do hypothesis tests and compute p-values as in Ch 6.
• Example: sigma1 = 20, sigma2 = 30, n1 = 120, n2 = 150, x1bar = 67.3, x2bar = 72.0
• H0: mu1 - mu2 = 0. Ha: mu1 - mu2 ≠ 0
– (a) Compute the z statistic and p-value.
– (b) Get a 95% c.i. for mu1 - mu2.
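A minimal sketch of parts (a) and (b), assuming the usual two-sample z statistic with standard error sqrt(sigma1²/n1 + sigma2²/n2). It uses `statistics.NormalDist` from the Python standard library for the normal tail area and for z*; variable names mirror the slide's notation.

```python
import math
from statistics import NormalDist

sigma1, sigma2 = 20, 30          # known population standard deviations
n1, n2 = 120, 150
x1bar, x2bar = 67.3, 72.0

# Standard error of x1bar - x2bar when both sigmas are known
se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
diff = x1bar - x2bar

# (a) z statistic and two-sided p-value for H0: mu1 - mu2 = 0
z = diff / se
p_value = 2 * NormalDist().cdf(-abs(z))

# (b) 95% confidence interval for mu1 - mu2
z_star = NormalDist().inv_cdf(0.975)   # about 1.960
ci = (diff - z_star * se, diff + z_star * se)

print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
print(f"95% c.i. for mu1 - mu2: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval runs from about -10.69 to 1.29 and contains 0, which agrees with a two-sided p-value of about 0.12.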
Two sample t procedures
• The most common situation. We use sample standard deviations to estimate sigma1 and sigma2.
Example
The purchasing department has suggested that all new computer monitors for your company should have flat screens. You want to be sure employees like them. The next 20 employees needing screens are randomly divided into two groups, with 10 in each group. 10 get flat screens, the other 10 get conventional monitors.
One month after receiving the monitors, the employees rate their satisfaction with their monitors on a scale from 1 to 5 by responding to the statement “I like my new monitor” (1 = strongly disagree, 5 = strongly agree). Flat screen employees have an average satisfaction of 4.6 with a standard deviation of 0.7. The employees with the standard monitors have an average of 3.2 with a standard deviation of 1.6.
(a) Give a 95% c.i. for the difference in mean satisfaction scores for all employees.
(b) What about a hypothesis test for comparing the two means?
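The sketch below works this example with the two-sample t statistic, using sqrt(s1²/n1 + s2²/n2) as the standard error. The slides do not say which degrees of freedom to use, so two common choices are shown as assumptions: the conservative df = smaller of n1 - 1 and n2 - 1 (here 9, with t* = 2.262 from Table D), and the Welch-Satterthwaite approximation that statistical software typically reports.

```python
import math

# Summary statistics from the example
x1bar, s1, n1 = 4.6, 0.7, 10   # flat screens
x2bar, s2, n2 = 3.2, 1.6, 10   # conventional monitors

v1, v2 = s1**2 / n1, s2**2 / n2
se = math.sqrt(v1 + v2)        # standard error of x1bar - x2bar
diff = x1bar - x2bar

# (b) two-sample t statistic for H0: mu1 = mu2
t_stat = diff / se

# (a) 95% c.i. with the conservative df (an assumption, see lead-in)
df_conservative = min(n1, n2) - 1
t_star = 2.262                 # Table D, df = 9, 95% confidence
ci = (diff - t_star * se, diff + t_star * se)

# Welch-Satterthwaite approximate df, shown for comparison
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"t = {t_stat:.3f}, conservative df = {df_conservative}, Welch df = {df_welch:.1f}")
print(f"95% c.i. for mu1 - mu2: ({ci[0]:.2f}, {ci[1]:.2f})")
```

With df = 9, t of about 2.54 falls between the 0.025 and 0.01 columns of Table D, so the two-sided p-value is between 0.02 and 0.05.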
Robustness of the two sample procedures
• Generally procedures are quite robust.
• If sample sizes are equal and distributions of the two populations have similar shapes, p-values from the t table are quite accurate even when n1 and n2 are as small as 5.
• If sample sizes are unequal, can use the following (same as for one sample t-tests and conf. ints., but replace n by n1+n2):
– n1+n2 < 15: use t if data is clearly close to normal. If clearly non-normal or outliers are present, do not use t.
– n1+n2 ≥ 15: can use t except in presence of outliers or strong skewness.
– Large samples: can use t procedures even for clearly skewed data when sample size is large, roughly n1+n2 ≥ 40.
Small samples
• Have to be very careful.
– Substantial uncertainty in estimates, but if the difference in means is large, can often detect this.
• Specialized procedures
– If we can assume that the two populations have equal variances, then can use a pooled estimator.
– Can test for equal variances (F test).
– Numerical procedures (optional) appear in text.
Excel
• Data analysis tool pack can do the two-sample t-tests that we have discussed + optional material:
– Most important for us is the two-sample t test that does not assume equal variances.
– Excel also does the calculation for a specialized test that assumes the two populations have equal variances.
• All are very easy to use.
• We should always plot data, do normal quantile plots, etc.
Excel example
• Example – Do piano lessons improve spatial-temporal reasoning?
• Excel output appears below.
t-Test: Two-Sample Assuming Unequal Variances

                               Variable 1   Variable 2
Mean                           3.618        0.386
Variance                       9.334        5.871
Observations                   34           44
Hypothesized Mean Difference   0
df                             62
t Stat                         5.059
P(T<=t) one-tail               0.000
t Critical one-tail            1.670
P(T<=t) two-tail               0.000
t Critical two-tail            1.999
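As a check on how the Excel figures arise, the sketch below recomputes the t statistic and degrees of freedom from the summary values in the output. It assumes the unequal-variances test uses the Welch-Satterthwaite approximation for df, rounded to a whole number; because the means and variances shown are rounded, the results agree with Excel only approximately.

```python
import math

# Summary values read from the Excel output
mean1, var1, n1 = 3.618, 9.334, 34   # Variable 1
mean2, var2, n2 = 0.386, 5.871, 44   # Variable 2

v1, v2 = var1 / n1, var2 / n2
se = math.sqrt(v1 + v2)               # standard error of the difference in means
t_stat = (mean1 - mean2) / se

# Welch-Satterthwaite approximate degrees of freedom (assumed to be Excel's method)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"t Stat = {t_stat:.3f}")
print(f"df = {df:.2f} (rounds to {round(df)})")
```

The recomputed t is about 5.06 and the approximate df rounds to 62, in line with the "t Stat" and "df" rows of the output.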
Chapter 8 Inferences for Proportions
(Section 8.1)
How do sample proportions behave?
Chapter 5 tells us …
Example
• A SRS of 1600 BC residents found that 954 favored construction of a new highway to Whistler.
• Give a 95% c.i for the true proportion of BC residents who favor a new highway to Whistler.
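A short sketch of this interval, assuming the usual large-sample form p-hat ± z* sqrt(p-hat(1 - p-hat)/n). The z* value comes from `statistics.NormalDist`, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

successes, n = 954, 1600
p_hat = successes / n                         # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)       # standard error of p_hat

z_star = NormalDist().inv_cdf(0.975)          # about 1.960 for 95% confidence
ci = (p_hat - z_star * se, p_hat + z_star * se)

print(f"p_hat = {p_hat:.4f}, SE = {se:.4f}")
print(f"95% c.i.: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval is roughly 0.572 to 0.620, that is, about 59.6% give or take 2.4 percentage points.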
A variation that works better for small samples
Using the plus 4 estimator for small samples
• 9 of 15 people in a SRS of 15 Buec 232 students felt that the course workload was too heavy.
• Compute an approximate 90% c.i. for the proportion of students who felt the course workload was too heavy.
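A sketch of the plus four interval for this example, assuming the standard form of the estimator: add 2 successes and 2 failures, so p-tilde = (X + 2)/(n + 4), and use n + 4 in the standard error. Names such as `p_tilde` are illustrative.

```python
import math
from statistics import NormalDist

successes, n = 9, 15

# Plus four estimate: two extra successes and two extra failures
p_tilde = (successes + 2) / (n + 4)
se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))

z_star = NormalDist().inv_cdf(0.95)           # about 1.645 for 90% confidence
ci = (p_tilde - z_star * se, p_tilde + z_star * se)

print(f"p_tilde = {p_tilde:.4f}, SE = {se:.4f}")
print(f"approximate 90% c.i.: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The approximate interval runs from about 0.39 to 0.77, which shows how little a sample of 15 pins down the proportion.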
Hypothesis tests for proportions – we use sample proportion rather than plus 4 estimate.
Example
• We found that 11 customers in a sample of 40 would be willing to buy a software upgrade that costs $100. If the upgrade is to be profitable, you will need to sell it to more than 20% of your customers. Do the sample data give good evidence that more than 20% are willing to buy?
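A sketch of the test for this example, with H0: p = 0.20 against Ha: p > 0.20. It assumes the usual z statistic, in which the standard error is computed from the hypothesized value p0 and the plain sample proportion is used.

```python
import math
from statistics import NormalDist

successes, n = 11, 40
p0 = 0.20                                  # H0: p = 0.20, Ha: p > 0.20

p_hat = successes / n                      # sample proportion (not the plus four estimate)
se0 = math.sqrt(p0 * (1 - p0) / n)         # standard error under H0
z = (p_hat - p0) / se0
p_value = 1 - NormalDist().cdf(z)          # one-sided, upper tail

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, p-value = {p_value:.3f}")
```

The z statistic is about 1.19 with a one-sided p-value of about 0.12, so these data are not strong evidence that more than 20% would buy.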
• A poll (March 2, 2004) estimated that support for the BC Liberal party was 39%. Using this estimate as a “guessed value” for a follow-up study, how large a sample would I need to estimate Liberal support to within +/- 3%? I want a 95% level of confidence in my estimate.
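A sketch of the sample size calculation, assuming the usual formula n = (z*/m)² p*(1 - p*), where m is the desired margin of error and p* is the guessed proportion, rounded up to the next whole person.

```python
import math
from statistics import NormalDist

p_star = 0.39      # guessed value from the earlier poll
margin = 0.03      # desired margin of error (+/- 3 percentage points)

z_star = NormalDist().inv_cdf(0.975)       # about 1.960 for 95% confidence

# n = (z*/m)^2 * p*(1 - p*), rounded up
n_required = math.ceil((z_star / margin) ** 2 * p_star * (1 - p_star))

print(f"required sample size: {n_required}")
```

Under these assumptions the follow-up study would need 1016 respondents.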