Review #2 - California State University, Fullerton


Review #2
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 9
• A statistic is a random variable describing a characteristic of a random sample.
– Sample mean
– Sample variance
• We use the values of statistics in inferential statistics (to make inferences about population characteristics from sample characteristics).
• Statistics have distributions of their own.
The Central Limit Theorem
– The distribution of the sample mean is normal if the
parent distribution is normal.
– The distribution of the sample mean approaches the normal distribution for sufficiently large samples (n ≥ 30), even if the parent distribution is not normal.
– The parameters of the sampling distribution of the mean are:
• Mean: $\mu_{\bar x} = \mu$
• Standard deviation: $\sigma_{\bar x} = \sigma / \sqrt{n}$
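A small simulation makes the Central Limit Theorem concrete. This is an illustrative sketch, not part of the original review: the exponential parent (mean 1, standard deviation 1), the sample size n = 30, the seed, and the number of repetitions are arbitrary assumptions. The average of the simulated sample means should sit near μ = 1, and their standard deviation near σ/√n ≈ 0.183.

```python
import random
import statistics

# Illustrative assumptions: a skewed exponential parent with mean 1 and
# standard deviation 1, samples of size n = 30, 20,000 repetitions.
random.seed(42)
n, reps = 30, 20_000
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

print(round(statistics.fmean(means), 3))  # near mu = 1
print(round(statistics.stdev(means), 3))  # near sigma / sqrt(n), about 0.183
```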
Problem 1
• Given a normal population whose mean is
50 and whose standard deviation is 5,
– Find the probability that a random sample of 4
has a mean between 49 and 52
– Answer:
$P(49 < \bar x < 52) = P\left(\frac{49 - 50}{5/\sqrt{4}} < Z < \frac{52 - 50}{5/\sqrt{4}}\right) = P(-.4 < Z < .8) = .7881 - .3446 = .4435$
Problem 1
– Find the probability that a random sample of 16 has a mean between 49 and 52.
– Answer:
$P(49 < \bar x < 52) = P\left(\frac{49 - 50}{5/\sqrt{16}} < Z < \frac{52 - 50}{5/\sqrt{16}}\right) = P(-.8 < Z < 1.6) = .9452 - .2119 = .7333$
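The standardization in Problem 1 can be checked numerically with Python's standard-library `statistics.NormalDist`. This is a minimal sketch; the helper name `prob_mean_between` is hypothetical.

```python
from statistics import NormalDist

def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo < X-bar < hi) when X-bar is normal with mean mu and sd sigma/sqrt(n)."""
    xbar = NormalDist(mu, sigma / n ** 0.5)
    return xbar.cdf(hi) - xbar.cdf(lo)

p4 = prob_mean_between(49, 52, mu=50, sigma=5, n=4)
p16 = prob_mean_between(49, 52, mu=50, sigma=5, n=16)
print(round(p4, 4), round(p16, 4))  # 0.4436 0.7333
```

The larger sample concentrates the sample mean around 50, so the same interval captures more probability.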
Problem 2
• The amount of time per day spent by adults watching TV is normally distributed with μ = 6 and σ = 1.5 hours.
– What is the probability that a randomly selected adult watches TV for more than 7 hours a day?
– Answer:
$P(X > 7) = P\left(Z > \frac{7 - 6}{1.5}\right) = P(Z > .67) = 1 - .7486 = .2514$
– What is the probability that 5 randomly selected adults watch TV, on average, 7 or more hours a day?
– Answer:
$P(\bar X > 7) = P\left(Z > \frac{7 - 6}{1.5/\sqrt{5}}\right) = P(Z > 1.49) = 1 - .9319 = .0681$
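Both TV-time answers can be reproduced the same way: one `NormalDist` for a single adult (σ = 1.5) and one for the mean of 5 adults (σ/√5). A sketch only; the variable names are illustrative. Unrounded z values give about .2525 and .0680, close to the table-based .2514 and .0681.

```python
from statistics import NormalDist

mu, sigma, n = 6, 1.5, 5
one_adult = NormalDist(mu, sigma)             # X for a single adult
mean_of_5 = NormalDist(mu, sigma / n ** 0.5)  # X-bar for 5 adults

p_single = 1 - one_adult.cdf(7)  # P(X > 7)
p_mean = 1 - mean_of_5.cdf(7)    # P(X-bar > 7)
print(round(p_single, 4), round(p_mean, 4))
```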
Problem 2
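• Additional question
– What is the probability that the total TV watching time of the five adults sampled will exceed 28 hours?
– Answer: the total exceeds 28 hours exactly when the sample mean exceeds 28/5 = 5.6 hours, so
$P(\bar X > 28/5) = P\left(Z > \frac{5.6 - 6}{1.5/\sqrt{5}}\right) = P(Z > -.60) = 1 - .2743 = .7257$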
Sampling distribution of the
sample proportion
• In a sample of size n, if np > 5 and n(1 − p) > 5, then the sample proportion $\hat p = x/n$ is approximately normally distributed with the following parameters:
$\mu_{\hat p} = p$ and $\sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}}$, therefore
$Z = \frac{\hat p - p}{\sqrt{p(1-p)/n}}$
Problem 3
• A commercial for a household-appliance manufacturer claims that fewer than 5% of all its products require a service call in the first year.
• A survey of 400 households that recently purchased the manufacturer's products was conducted to check the claim.
Problem 3
– Assuming the manufacturer is right, what is the probability that more than 10% of the surveyed households require a service call within the first year?
$P(\hat p > .10) = P\left(Z > \frac{.10 - .05}{\sqrt{.05(1 - .05)/400}}\right) = P(Z > 4.59) \approx 0$
If indeed 10% of the sampled households reported a call for service within the first year, what does it tell you about the manufacturer's claim?
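Under the manufacturer's claim (p = .05) the probability in Problem 3 is tiny. This sketch checks the np > 5 and n(1 − p) > 5 conditions and computes the tail area with `NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n, p_hat = 0.05, 400, 0.10
# Normal-approximation conditions from the slide: np > 5 and n(1 - p) > 5
if not (n * p > 5 and n * (1 - p) > 5):
    raise ValueError("normal approximation not justified")

se = sqrt(p * (1 - p) / n)      # standard deviation of p-hat under the claim
z = (p_hat - p) / se
tail = 1 - NormalDist().cdf(z)  # P(p-hat > .10)
print(round(z, 2), tail)
```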
Chapter 10
• A population’s parameter can be estimated
by a point estimator and by an interval
estimator.
• A confidence interval with confidence level 1 − α is an interval estimator that covers the estimated parameter in 100(1 − α)% of repeated samples.
• Confidence intervals are constructed using
sampling distributions.
Confidence interval of the mean
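• We use the central limit theorem to build the following confidence interval:
$\bar x - z_{\alpha/2}\frac{\sigma}{\sqrt n} < \mu < \bar x + z_{\alpha/2}\frac{\sigma}{\sqrt n}$
(Figure: standard normal curve with area 1 − α between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area α/2 in each tail.)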
Problem 4
• How many classes do university students miss each semester? A survey of 100 students was conducted (see Missed Classes).
• Assuming the standard deviation of the number of classes missed is 2.2, estimate the mean number of classes missed per student.
• Use a 99% confidence level.
Problem 4
– Solution
1 − α = .99, α = .01, α/2 = .005, $z_{\alpha/2} = z_{.005} = 2.575$
$\bar x \pm z_{\alpha/2}\frac{\sigma}{\sqrt n} = 10.21 \pm 2.575\frac{2.2}{\sqrt{100}} = 10.21 \pm .57$
LCL = 9.64, UCL = 10.78

Missed classes (descriptive statistics)
Mean                  10.21
Standard Error        0.21755993
Median                10
Mode                  10
Standard Deviation    2.1755993
Sample Variance       4.73323232
Kurtosis              0.91111511
Skewness              -0.107237
Range                 14
Minimum               3
Maximum               17
Sum                   1021
Count                 100
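A z-interval helper reproduces the Problem 4 interval. It is a sketch that assumes σ is known; `z_interval` is a hypothetical name. `NormalDist().inv_cdf(0.995)` gives 2.5758 rather than the table's 2.575, which does not change the limits at two decimals.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf):
    """Confidence interval for mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

lcl, ucl = z_interval(10.21, 2.2, 100, 0.99)
print(round(lcl, 2), round(ucl, 2))  # 9.64 10.78
```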
Selecting the sample size
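• The narrower the confidence interval, the more precise the estimate.
• We can, therefore, limit the interval to $\bar x \pm W$, where W is the largest acceptable estimation error (the half-width of the interval), and get
$\bar x \pm W = \bar x \pm z_{\alpha/2}\frac{\sigma}{\sqrt n}$, or $W = z_{\alpha/2}\frac{\sigma}{\sqrt n}$
• From here we have
$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2$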
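The sample-size formula translates directly into code. This is a sketch; `sample_size` is a hypothetical helper, and the usage lines plug in σ = 6 minutes and W = 1/3 minute (the numbers of Problem 5). The result is rounded up with `math.ceil`, since rounding down would give an interval wider than W. Exact z values give 876.6 and 1244.6 instead of the table-based 876.75 and 1244.67, which round up to the same n.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, w, conf):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= w."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    return ceil((z * sigma / w) ** 2)

n90 = sample_size(sigma=6, w=1/3, conf=0.90)
n95 = sample_size(sigma=6, w=1/3, conf=0.95)
print(n90, n95)  # 877 1245
```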
Problem 5
• An operations manager wants to estimate the average time a worker needs to assemble a new electronic component.
• σ is known to be 6 minutes.
• The estimate must be accurate to within 20 seconds.
• The confidence level is 90% in one case and 95% in the other.
• Find the sample size for each.
Problem 5
– Solution
σ = 6 min; W = 20 sec = 1/3 min
• 1 − α = .90: $z_{\alpha/2} = z_{.05} = 1.645$
$n = \left(\frac{z_{.05}\,\sigma}{W}\right)^2 = \left(\frac{1.645(6)}{1/3}\right)^2 = 876.75$. Take n = 877.
• 1 − α = .95: $z_{\alpha/2} = z_{.025} = 1.96$
$n = \left(\frac{1.96(6)}{1/3}\right)^2 = 1244.67$. Take n = 1245.
Chapter 11
• Hypotheses tests
– In a hypothesis test we hypothesize a value of a population parameter and test whether there is sufficient evidence to support our belief.
– The structure of a hypothesis test
• Formulate two hypotheses.
– H0: The one we try to reject in favor of …
– H1: The alternative hypothesis, the one we try to prove.
• Define a significance level α.
Hypotheses tests
– The significance level is the probability of erroneously rejecting the null hypothesis:
α = P(reject H0 when H0 is true)
– Sample from the population and calculate a statistic that indicates whether or not the parameter value defined under H1 is more probable.
– We shall test the population mean assuming the
standard deviation is known.
Problem 6
• A machine is set so that the average diameter of the ball bearings it produces is .50 inch. In a sample of 100 ball bearings the mean diameter was .51 inch. Assuming the standard deviation is .05 inch, can we conclude at the 5% significance level that the mean diameter is not .50 inch?
Problem 6
• The population studied is the ball-bearing
diameters.
• We hypothesize on the population mean.
• A good point estimator for the population
mean is the sample mean.
• We use the distribution of the sample mean to build a sample statistic to test whether μ = .50 inch.
Problem 6
• Solution
– Define the hypotheses:
• H0: μ = .50
• H1: μ ≠ .50
– Define a rejection region. Note that this is a two-tail test because H1 is stated as an inequality (≠). The significance level .05 is the probability of a type I error:
$P(\bar X < \bar X_{L1} \text{ or } \bar X > \bar X_{L2} \mid \mu = .50) = .05$
$P(Z < Z_{L1} \text{ or } Z > Z_{L2} \mid \mu = .50) = .05$
Let us take a symmetrical rejection area: $Z_{L1} = -Z_{L2}$.
Problem 6
$P(Z < -Z_{.025} \text{ or } Z > Z_{.025} \mid \mu = .50) = .05$
Critical Z: Z.025 = 1.96 (obtained from the Z-table)
Build a rejection region: $Z_{sample} > Z_{\alpha/2}$ or $Z_{sample} < -Z_{\alpha/2}$, that is, Z > 1.96 or Z < −1.96.
Calculate the value of the sample Z statistic and compare it to the critical value:
$Z_{sample} = \frac{\bar X - \mu}{\sigma/\sqrt n} = \frac{.51 - .50}{.05/\sqrt{100}} = 2$
Since 2 > 1.96, there is sufficient evidence to reject H0 in favor of H1 at the 5% significance level.
Problem 6
• We can perform the test in terms of the mean value.
• Let us find the critical mean values for rejection:
$\bar X_{L1} = \mu_0 - Z_{.025}\frac{\sigma}{\sqrt n} = .50 - 1.96\frac{.05}{\sqrt{100}} = .4902$
$\bar X_{L2} = \mu_0 + Z_{.025}\frac{\sigma}{\sqrt n} = .50 + 1.96\frac{.05}{\sqrt{100}} = .5098$
Since .51 > .5098, there is sufficient evidence to reject the null hypothesis at the 5% significance level.
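Both versions of the Problem 6 test, the standardized Z and the critical sample means, fit in a few lines. A sketch with illustrative names; the two-tail p-value is an extra not computed on the slides.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 0.50, 0.05, 100, 0.51, 0.05
se = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tail critical value, about 1.96

z = (xbar - mu0) / se                                # standardized method
lower, upper = mu0 - z_crit * se, mu0 + z_crit * se  # critical sample means
p_value = 2 * (1 - NormalDist().cdf(abs(z)))         # two-tail p-value

reject = abs(z) > z_crit
print(round(z, 2), round(lower, 4), round(upper, 4), round(p_value, 4), reject)
```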
Problem 7
• The average annual return on investment for American banks was found to be 10.2%, with a standard deviation of 0.8%.
• It is believed that banks that exercise comprehensive planning do better.
• A sample of 26 banks that conducted comprehensive planning provided the following result: mean return = 10.5%.
• Can we infer at the 10% significance level that this sample result supports the belief about bank performance?
Problem 7
– The population tested is the “annual rate of
return.”
H0: μ = 10.2
H1: μ > 10.2
– Let us perform the test with the p-value method:
$P(\bar X > 10.5 \mid \mu = 10.2) = P\left(Z > \frac{10.5 - 10.2}{.8/\sqrt{26}}\right) = P(Z > 1.91) = 1 - .9719 = .0281$
– Since .0281 < .10, we reject the null hypothesis at the 10% significance level.
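The p-value and the critical value Z.10 for Problem 7 can be computed with `NormalDist`. A sketch with illustrative names; the unrounded z of about 1.912 gives a p-value of about .0279 instead of the table-based .0281.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 10.2, 0.8, 26, 10.5, 0.10
z = (xbar - mu0) / (sigma / sqrt(n))
p_value = 1 - NormalDist().cdf(z)         # right-tail test
z_crit = NormalDist().inv_cdf(1 - alpha)  # Z_.10, about 1.28
print(round(z, 2), round(p_value, 4), round(z_crit, 2))
print(p_value < alpha, z > z_crit)        # the two methods agree
```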
Problem 7
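• Note the equivalence between the standardized (rejection region) method and the p-value method.
• P(Z > Z.10) = .10, so Z.10 = 1.28. Since 1.91 > 1.28, the rejection region method also rejects H0, just as .0281 < .10 does.
• Run the test with Data Analysis Plus. See data in Return.
(Figure: standard normal curve marking the critical value 1.28 and the sample statistic 1.91; the p-value .0281 is the area to the right of 1.91.)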
Type II Error
• A type II error occurs when H0 is erroneously not rejected.
• The probability of a type II error is called β.
β = P(do not reject H0 when H1 is true)
• To calculate β:
– H1 specifies an actual parameter value (not a range of values). Example: H0: μ = 100; H1: μ = 110
– The critical value is expressed in original terms (not in
standard terms).
Problem 7a
• What is the probability that you will believe the mean return in Problem 7 is 10.2% when it is actually 10.6%, if the sample provided a mean return of 10.5%?
Problem 7a
• Solution
– The two hypotheses are:
H0: μ = 10.2
H1: μ = 10.6
– H0 is not rejected (we believe μ = 10.2) if the sample mean is less than a critical value.
– Therefore, the probability required is:
$\beta = P(\bar X < \bar X_{cr} \mid \mu = 10.6)$
Problem 7a
• The critical value is (recall that Problem 7 was a right-hand-tail test at the 10% significance level):
$\bar X_{cr} = \mu_0 + Z_{.10}\frac{\sigma}{\sqrt n} = 10.2 + 1.28\frac{.8}{\sqrt{26}} = 10.40$
$\beta = P(\bar X < 10.4 \mid \mu = 10.6) = P\left(Z < \frac{10.4 - 10.6}{.8/\sqrt{26}}\right) = P(Z < -1.27) = .102$
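β for Problem 7a follows the same two steps: find the critical mean under H0, then measure the area below it under H1. A sketch with illustrative names; the exact quantile replaces the table's 1.28.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1, sigma, n, alpha = 10.2, 10.6, 0.8, 26, 0.10
se = sigma / sqrt(n)
x_cr = mu0 + NormalDist().inv_cdf(1 - alpha) * se  # reject H0 if X-bar > x_cr
beta = NormalDist(mu1, se).cdf(x_cr)               # P(X-bar < x_cr | mu = 10.6)
print(round(x_cr, 2), round(beta, 3))
```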
Chapter 12
• In general, the population standard deviation is unknown, just as the mean may be unknown.
• When the standard deviation is unknown, we need
to change the test statistic from “Z” to “t”.
• We shall test three population parameters:
– Mean
– Variance
– Proportion
Testing the mean
(unknown variance)
• Replace the statistic Z with t, which has n − 1 degrees of freedom:
$t = \frac{\bar X - \mu}{s/\sqrt n}$
The parent distribution must be normal (or at least mound-shaped).
Problem 8
• A federal agency inspects packages to determine whether the contents are at least as great as advertised.
• A random sample of (i) 5 and (ii) 50 containers whose packaging states a weight of 8.04 ounces was drawn (see Content).
• From the sample results…
– Can we conclude that the average weight does not meet the weight stated? (Use α = .05.)
– Estimate the mean weight of all containers with 99% confidence.
– What assumption must be met?
Problem 8
• Solution
– We hypothesize on the mean weight.
• H0: μ = 8.04
• H1: μ < 8.04
• (i) n = 5. For small samples, let us solve manually. Assume the sample was: 8.07, 8.03, 7.99, 7.95, 7.94
– The rejection region: $t < -t_{\alpha,n-1} = -t_{.05,5-1} = -2.132$
– The t sample = ?
– Mean = (8.07 + … + 7.94)/5 = 7.996
– Std. Dev. = $\sqrt{[(8.07 - 7.996)^2 + \dots + (7.94 - 7.996)^2]/4} = 0.054$
Problem 8
• The t sample is calculated as follows:
$t = \frac{\bar X - \mu}{s/\sqrt n} = \frac{7.996 - 8.04}{0.054/\sqrt{5}} = -1.82$
• Since −1.82 > −2.132, the sample statistic does not fall into the rejection region (t < −2.132). There is insufficient evidence to conclude that the mean weight is smaller than 8.04 ounces, at the 5% significance level.
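The small-sample t statistic can be checked with `statistics.mean` and `statistics.stdev` (which divides by n − 1). Python's standard library has no t distribution, so this sketch hard-codes the critical value −2.132 from the t-table. With the unrounded standard deviation (about 0.0546) the statistic is about −1.80; it stays outside the rejection region either way.

```python
from math import sqrt
from statistics import mean, stdev

sample = [8.07, 8.03, 7.99, 7.95, 7.94]  # the assumed sample from the slide
mu0 = 8.04
t_crit = -2.132                          # -t_{.05,4}, from the t-table

xbar, s = mean(sample), stdev(sample)    # stdev uses the n - 1 divisor
t = (xbar - mu0) / (s / sqrt(len(sample)))
print(round(xbar, 3), round(s, 4), round(t, 2), t < t_crit)
```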
Problem 8
• (ii) n = 50. To calculate the sample statistics we use Excel, “Descriptive statistics” from the Tools > Data Analysis menu. From the sample we obtain: Mean = 8.02; Std. Dev. = .04
• The confidence interval is calculated by
1 − α = .99, α = .01, α/2 = .005; $t_{.005,50-1} \approx 2.678$ from the t-table
$\bar x \pm t_{\alpha/2}\frac{s}{\sqrt n} = 8.02 \pm 2.678\frac{.04}{\sqrt{50}} = 8.02 \pm .015$
or LCL = 8.005, UCL = 8.035
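The 99% t-interval for the 50-container sample, as a sketch; the t-value 2.678 is the table value used on the slide, since the standard library has no t quantile function.

```python
from math import sqrt

xbar, s, n = 8.02, 0.04, 50
t_half = 2.678  # t_{.005,49} from the t-table (Excel's TINV(0.01,49) gives 2.6799535)
half = t_half * s / sqrt(n)
lcl, ucl = xbar - half, xbar + half
print(round(half, 3), round(lcl, 3), round(ucl, 3))
```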
Problem 8
• Comments
– Check whether it appears that the distribution is normal.
(Figure: histogram of the sampled weights, frequency (0 to 20) against weight bins 7.93, 7.97, 8.01, 8.05, 8.09, and More.)
Using Excel
– To obtain an exact value for ‘t’, use the TINV function:
=TINV(0.01,49)
The exact value: 2.6799535
(.01 is the two-tail probability; 49 is the degrees of freedom.)
Problem 8
– In our example recall:
• H0: μ = 8.04
• H1: μ < 8.04
• The p-value = .000187 < .05
– There is sufficient evidence to reject H0 in favor of H1.

t-Test: Two-Sample Assuming Unequal Variances
                               Weights     V2
Mean                           8.0182      8.04
Variance                       0.001627    0
Observations                   50          50
Hypothesized Mean Difference   0
df                             49
t Stat                         -3.82126
P(T<=t) one-tail               0.000187
t Critical one-tail            1.676551
P(T<=t) two-tail               0.000375
t Critical two-tail            2.009574

Note: $t = \frac{8.0182 - 8.04}{.0403/\sqrt{50}} = -3.82 < -t_{.05,49} = -1.676$
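The t statistic in the Excel output can be recomputed from the reported mean and variance. A sketch; the inputs are copied from the output, and the critical value is Excel's "t Critical one-tail" with the sign flipped for this left-tail test.

```python
from math import sqrt

xbar, var, n, mu0 = 8.0182, 0.001627, 50, 8.04  # from the Excel output
t = (xbar - mu0) / (sqrt(var) / sqrt(n))
t_crit = -1.676551                              # -t_{.05,49}
print(round(t, 2), t < t_crit)
```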
Inference about the population
Variance
• The following statistic is χ² (chi-squared) distributed with n − 1 degrees of freedom:
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
• We use this relationship to test and estimate the variance.
Inference about the population
Variance
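• The hypotheses tested are:
$H_0: \sigma^2 = \sigma_0^2$
$H_1: \sigma^2 \ne \sigma_0^2$, or $\sigma^2 > \sigma_0^2$, or $\sigma^2 < \sigma_0^2$
• The rejection region is:
$\frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha,n-1}$ (right tail) or $\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha,n-1}$ (left tail)
For the two-tail test, replace α with α/2 and use both tails.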
Problem 9
• A random sample of 100 observations was
taken from a normal population. The
sample variance was 29.76.
• Can we infer at the 2.5% significance level that the population variance is less than 30?
• Estimate the population variance with 90%
confidence.
Problem 9
• Solution:
• H0: σ² = 30
• H1: σ² < 30
Rejection region: $\chi^2 < \chi^2_{1-\alpha,n-1}$
$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(100-1)29.76}{30} = 98.21$
$\chi^2_{1-\alpha,n-1} = \chi^2_{.975,100-1} \approx 73.36$
For the confidence interval, look at page 370.
– Since 98.21 > 73.36, the statistic does not fall into the rejection region: there is insufficient evidence at the 2.5% significance level to conclude that the variance is smaller than 30.
Using Excel
– We can get an exact value of the probability $P(\chi^2_{d.f.} > \chi^2)$ for a given χ² and known d.f. This makes it possible to determine the p-value.
– Use the CHIDIST function: =CHIDIST(χ²,d.f.)
For example: =CHIDIST(97.42,99) = .526
That is: $P(\chi^2_{99} > 97.42) = .526$
– In our example we had a left-hand-tail rejection region, so the p-value is the area to the left of the sample χ² value. For a χ² value of 97.42, for instance: $P(\chi^2_{99} < 97.42) = 1 - .526 = .474$
Using Excel
– We can get the exact χ² value for which $P(\chi^2_{d.f.} > \chi^2) = \alpha$, for any given probability α and known d.f.
– Use the CHIINV function: =CHIINV(α,d.f.)
For example: =CHIINV(.025,99) = 128.4219
That is: $P(\chi^2_{99} > \chi^2) = .025$ gives χ² = 128.4219.
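Python's standard library has no chi-squared distribution, so this sketch approximates CHIDIST and CHIINV with the Wilson–Hilferty transformation (an approximation swapped in here; `scipy.stats.chi2.sf` and `scipy.stats.chi2.isf` give exact values if SciPy is available). For 99 degrees of freedom it matches the Excel values above to about two decimals. The sketch then runs the left-tail test of Problem 9 and, as an extra, the 90% interval $(n-1)s^2/\chi^2_{.05,n-1} < \sigma^2 < (n-1)s^2/\chi^2_{.95,n-1}$. The function names `chi2_sf` and `chi2_isf` are hypothetical.

```python
from statistics import NormalDist

# Wilson-Hilferty approximation: if X ~ chi-square(k), then (X/k)**(1/3)
# is roughly normal with mean 1 - 2/(9k) and variance 2/(9k).
def chi2_sf(x, k):
    """Approximate P(chi2_k > x), like Excel's CHIDIST(x, k)."""
    v = 2 / (9 * k)
    z = ((x / k) ** (1 / 3) - (1 - v)) / v ** 0.5
    return 1 - NormalDist().cdf(z)

def chi2_isf(a, k):
    """Approximate x with P(chi2_k > x) = a, like Excel's CHIINV(a, k)."""
    v = 2 / (9 * k)
    z = NormalDist().inv_cdf(1 - a)
    return k * (1 - v + z * v ** 0.5) ** 3

n, s2, sigma0_sq, alpha = 100, 29.76, 30, 0.025
df = n - 1
stat = df * s2 / sigma0_sq             # (n - 1)s^2 / sigma0^2
crit_left = chi2_isf(1 - alpha, df)    # chi2_{.975,99}, left-tail critical value
p_left = 1 - chi2_sf(stat, df)         # left-tail p-value
print(round(stat, 2), round(crit_left, 2), round(p_left, 3), stat < crit_left)

# 90% confidence interval for sigma^2
lo = df * s2 / chi2_isf(0.05, df)
hi = df * s2 / chi2_isf(0.95, df)
print(round(lo, 2), round(hi, 2))
```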
Inference about a population
proportion
• The test and the confidence interval are based on the approximate normal distribution of the sample proportion, which holds if np > 5 and n(1 − p) > 5.
• For the confidence interval of p we have:
$\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}$, where $\hat p = x/n$
• For the hypotheses test, we run a Z test.
Problem 10
• A consumer protection group ran a survey of 400 dentists to check a claim that 4 out of 5 dentists recommend ingredients included in a certain toothpaste.
• The survey results are as follows:
71 – No; 329 – Yes
• At the 5% significance level, can the consumer group infer that the claim is true?
Problem 10
• Solution
– The two hypotheses are:
• H0: p = .8
• H1: p > .8
The rejection region: $Z > Z_\alpha$, with Z.05 = 1.645
$Z = \frac{\hat p - p}{\sqrt{\hat p(1-\hat p)/n}} = \frac{.8225 - .8}{\sqrt{.8225(1 - .8225)/400}} = 1.18$, where $\hat p = 329/400 = .8225$
– Since 1.18 < 1.645, the consumer group cannot confirm the claim at the 5% significance level.
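The Problem 10 statistic, computed as on the slide with $\hat p$ in the standard error. Many texts instead use the hypothesized p there (as in the sampling-distribution formula for $\hat p$ earlier in this review), which gives Z = 1.125; both values stay below 1.645. A sketch with illustrative names.

```python
from math import sqrt
from statistics import NormalDist

yes, n, p0, alpha = 329, 400, 0.8, 0.05
p_hat = yes / n                                   # .8225
z = (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)  # slide's version: p-hat in the SE
z_h0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)     # alternative: hypothesized p in the SE
z_crit = NormalDist().inv_cdf(1 - alpha)          # Z_.05, about 1.645
print(round(z, 2), round(z_h0, 3), z > z_crit, z_h0 > z_crit)
```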