#### Transcript Review #2 - California State University, Fullerton

Review #2 covers Chapters 9, 10, 11 and 12.

##### Chapter 9

- A statistic is a random variable describing a characteristic of a random sample, for example:
  - the sample mean;
  - the sample variance.
- We use the values of statistics in inferential statistics, that is, to make inferences about population characteristics from sample characteristics.
- Statistics have distributions of their own.

**The Central Limit Theorem**

- The distribution of the sample mean is normal if the parent distribution is normal.
- The distribution of the sample mean approaches the normal distribution for sufficiently large samples ($n \ge 30$), even if the parent distribution is not normal.
- The parameters of the sampling distribution of the mean are:
  - mean: $\mu_{\bar{x}} = \mu$;
  - standard deviation: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

**Problem 1**

Given a normal population whose mean is 50 and whose standard deviation is 5, find the probability that a random sample of 4 has a mean between 49 and 52.

Answer:

$$P(49 < \bar{x} < 52) = P\left(\frac{49-50}{5/\sqrt{4}} < Z < \frac{52-50}{5/\sqrt{4}}\right) = P(-.4 < Z < .8) = .7881 - .3446 = .4435$$

**Problem 1 (continued)**

Find the probability that a random sample of 16 has a mean between 49 and 52.

Answer:

$$P(49 < \bar{x} < 52) = P\left(\frac{49-50}{5/\sqrt{16}} < Z < \frac{52-50}{5/\sqrt{16}}\right) = P(-.8 < Z < 1.6) = .9452 - .2119 = .7333$$

**Problem 2**

The amount of time per day spent by adults watching TV is normally distributed with $\mu = 6$ and $\sigma = 1.5$ hours.

- What is the probability that a randomly selected adult watches TV for more than 7 hours a day?

  $$P(X > 7) = P\left(Z > \frac{7-6}{1.5}\right) = P(Z > .67) = 1 - .7486 = .2514$$

- What is the probability that 5 randomly selected adults watch TV for 7 or more hours a day on average?

  $$P(\bar{x} \ge 7) = P\left(Z \ge \frac{7-6}{1.5/\sqrt{5}}\right) = P(Z \ge 1.49) = 1 - .9319 = .0681$$

Additional question: what is the probability that the total TV-watching time of the five adults sampled will exceed 28 hours?
Answer: the total exceeds 28 hours exactly when the sample mean exceeds $28/5 = 5.6$ hours, so

$$P(\bar{x} > 28/5) = P\left(Z > \frac{5.6-6}{1.5/\sqrt{5}}\right) = P(Z > -.60) = .7257$$

**Sampling distribution of the sample proportion**

In a sample of size $n$, if $np > 5$ and $n(1-p) > 5$, then the sample proportion $\hat{p} = x/n$ is approximately normally distributed with parameters

$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}, \qquad \text{therefore} \qquad Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}.$$

**Problem 3**

- A commercial by a household-appliance manufacturer claims that less than 5% of all its products require a service call in the first year.
- A survey of 400 households that recently purchased the manufacturer's products was conducted to check the claim.

Assuming the manufacturer is right, what is the probability that more than 10% of the surveyed households require a service call within the first year?

$$P(\hat{p} > .10) = P\left(Z > \frac{.10 - .05}{\sqrt{.05(1-.05)/400}}\right) = P(Z > 4.59) \approx 0$$

If indeed 10% of the sampled households reported a call for service within the first year, what does this tell you about the manufacturer's claim?

##### Chapter 10

- A population parameter can be estimated by a point estimator and by an interval estimator.
- A confidence interval with confidence level $1-\alpha$ is an interval estimator that covers the estimated parameter $100(1-\alpha)\%$ of the time.
- Confidence intervals are constructed using sampling distributions.

**Confidence interval of the mean**

We use the Central Limit Theorem to build the following confidence interval:

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

(Figure: standard normal curve with area $1-\alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area $\alpha/2$ in each tail.)

**Problem 4**

How many classes do university students miss each semester? A survey of 100 students was conducted (see Missed Classes).

- Assuming the standard deviation of the number of classes missed is 2.2, estimate the mean number of classes missed per student.
- Use a 99% confidence level.
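The normal-probability answers in Problems 1 to 3, and the z-interval that Problem 4 asks for, can be checked with a short Python sketch. It assumes only the standard library's `statistics.NormalDist`; the helper names (`prob_mean_between`, `prob_mean_above`, `prob_phat_above`, `z_interval`) are hypothetical, chosen for illustration. Because the sketch uses exact z values rather than table values rounded to two decimals, its results may differ from the hand calculations in the third or fourth decimal (for example .2525 instead of .2514).

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution N(0, 1)


def prob_mean_between(mu, sigma, n, lo, hi):
    """P(lo < x-bar < hi), with x-bar normal with mean mu and sd sigma/sqrt(n)."""
    se = sigma / sqrt(n)  # standard error of the mean
    return Z.cdf((hi - mu) / se) - Z.cdf((lo - mu) / se)


def prob_mean_above(mu, sigma, n, cutoff):
    """P(x-bar > cutoff); with n = 1 this is P(X > cutoff) for one observation."""
    se = sigma / sqrt(n)
    return 1 - Z.cdf((cutoff - mu) / se)


def prob_phat_above(p, n, cutoff):
    """P(p-hat > cutoff) under the normal approximation (np > 5 and n(1-p) > 5)."""
    se = sqrt(p * (1 - p) / n)
    return 1 - Z.cdf((cutoff - p) / se)


def z_interval(xbar, sigma, n, conf):
    """Confidence interval x-bar +/- z_{alpha/2} * sigma / sqrt(n), sigma known."""
    z = Z.inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    print(prob_mean_between(50, 5, 4, 49, 52))   # Problem 1, n = 4
    print(prob_mean_between(50, 5, 16, 49, 52))  # Problem 1, n = 16
    print(prob_mean_above(6, 1.5, 1, 7))         # Problem 2, one adult
    print(prob_mean_above(6, 1.5, 5, 7))         # Problem 2, mean of five adults
    print(prob_mean_above(6, 1.5, 5, 28 / 5))    # total of five > 28 hours
    print(prob_phat_above(0.05, 400, 0.10))      # Problem 3
    print(z_interval(10.21, 2.2, 100, 0.99))     # Problem 4
```

The "total of five adults exceeds 28 hours" question is handled as "the mean of five exceeds 28/5 = 5.6", the same reduction used in the hand solution.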
**Problem 4 – Solution**

$1-\alpha = .99$, so $\alpha = .01$, $\alpha/2 = .005$ and $z_{\alpha/2} = z_{.005} = 2.575$.

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 10.21 \pm 2.575\frac{2.2}{\sqrt{100}} = 10.21 \pm .57$$

LCL = 9.64, UCL = 10.78.

Descriptive statistics for Missed classes:

| Statistic | Value |
|---|---|
| Mean | 10.21 |
| Standard Error | 0.21755993 |
| Median | 10 |
| Mode | 10 |
| Standard Deviation | 2.1755993 |
| Sample Variance | 4.73323232 |
| Kurtosis | 0.91111511 |
| Skewness | -0.107237 |
| Range | 14 |
| Minimum | 3 |
| Maximum | 17 |
| Sum | 1021 |
| Count | 100 |

**Selecting the sample size**

- The shorter the confidence interval, the more accurate the estimate.
- We can therefore limit the half-width of the interval (the maximum estimation error) to $W$, and get

  $$\bar{x} \pm W = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad \text{or} \qquad W = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$

- From here we have

  $$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2.$$

**Problem 5**

An operations manager wants to estimate the average amount of time a worker needs to assemble a new electronic component.

- $\sigma$ is known to be 6 minutes.
- The estimate must be accurate to within 20 seconds.
- Find the sample size for confidence levels of 90% and 95%.

**Problem 5 – Solution**

$\sigma = 6$ min; $W = 20$ sec $= 1/3$ min.

- $1-\alpha = .90$, $z_{\alpha/2} = z_{.05} = 1.645$:

  $$n = \left(\frac{z_{.05}\,\sigma}{W}\right)^2 = \left(\frac{1.645(6)}{1/3}\right)^2 = 876.75; \text{ take } n = 877.$$

- $1-\alpha = .95$, $z_{\alpha/2} = z_{.025} = 1.96$:

  $$n = \left(\frac{1.96(6)}{1/3}\right)^2 = 1244.67; \text{ take } n = 1245.$$

##### Chapter 11

**Hypothesis tests**

- In hypothesis tests we hypothesize a value of a population parameter and test whether there is sufficient evidence to support our belief.
- The structure of a hypothesis test:
  - Formulate two hypotheses:
    - $H_0$: the null hypothesis, the one we try to reject in favor of $H_1$;
    - $H_1$: the alternative hypothesis, the one we try to prove.
  - Define a significance level $\alpha$. The significance level is the probability of erroneously rejecting the null hypothesis: $\alpha = P(\text{reject } H_0 \text{ when } H_0 \text{ is true})$.
  - Sample from the population and calculate a statistic that indicates whether or not the parameter value defined under $H_1$ is more probable.
- We shall test the population mean assuming the standard deviation is known.

**Problem 6**

A machine is set so that the average diameter of the ball bearings it produces is .50 inch. In a sample of 100 ball bearings the mean diameter was .51 inch.
Assuming the standard deviation is .05 inch, can we conclude at the 5% significance level that the mean diameter is not .50 inch?

**Problem 6 – Solution**

- The population studied is the ball-bearing diameters.
- We hypothesize about the population mean.
- A good point estimator for the population mean is the sample mean.
- We use the distribution of the sample mean to build a sample statistic to test whether $\mu = .50$ inch.

Define the hypotheses:

- $H_0: \mu = .50$
- $H_1: \mu \ne .50$

Define a rejection region; its probability under $H_0$ is the probability of a type I error. Note that this is a two-tail test because of the inequality in $H_1$:

$$P(\bar{x} < \bar{x}_{L1} \text{ or } \bar{x} > \bar{x}_{L2} \mid \mu = .50) = .05$$

$$P(Z < Z_{L1} \text{ or } Z > Z_{L2} \mid \mu = .50) = .05$$

Let us take a symmetrical rejection area, $Z_{L1} = -Z_{L2}$:

$$P(Z < -z_{.025} \text{ or } Z > z_{.025} \mid \mu = .50) = .05$$

The critical value is $z_{.025} = 1.96$ (from the Z table), so the rejection region is $Z_{\text{sample}} > z_{\alpha/2}$ or $Z_{\text{sample}} < -z_{\alpha/2}$, that is, beyond $\pm 1.96$.

Calculate the sample Z statistic and compare it to the critical value:

$$Z_{\text{sample}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{.51 - .50}{.05/\sqrt{100}} = 2$$

Since $2 > 1.96$, there is sufficient evidence to reject $H_0$ in favor of $H_1$ at the 5% significance level.

We can also perform the test in terms of the mean value. The critical mean values for rejection are

$$\bar{x}_{L1} = \mu_0 - z_{.025}\frac{\sigma}{\sqrt{n}} = .50 - 1.96\frac{.05}{\sqrt{100}} = .4902$$

$$\bar{x}_{L2} = \mu_0 + z_{.025}\frac{\sigma}{\sqrt{n}} = .50 + 1.96\frac{.05}{\sqrt{100}} = .5098$$

Since $.51 > .5098$, there is sufficient evidence to reject the null hypothesis at the 5% significance level.

**Problem 7**

- The average annual return on investment for American banks was found to be 10.2%, with a standard deviation of 0.8%.
- It is believed that banks that exercise comprehensive planning do better.
- A sample of 26 banks that conducted comprehensive planning provided the following result: mean return = 10.5%.
- Can we infer at the 10% significance level that this sample result supports the belief about bank performance?
**Problem 7 – Solution**

The population tested is the annual rate of return.

- $H_0: \mu = 10.2$
- $H_1: \mu > 10.2$

Let us perform the test with the p-value method:

$$P(\bar{x} > 10.5 \mid \mu = 10.2) = P\left(Z > \frac{10.5 - 10.2}{.8/\sqrt{26}}\right) = P(Z > 1.91) = 1 - .9719 = .0281$$

Since $.0281 < .10$, we reject the null hypothesis at the 10% significance level.

Note the equivalence between the standardized (rejection-region) method and the p-value method: $P(Z > z_{.10}) = .10$ gives $z_{.10} = 1.28$, and the sample statistic $1.91 > 1.28$ falls in the rejection region exactly when its p-value, $.0281$, is below $.10$.

Run the test with Data Analysis Plus; see the data in Return.

**Type II error**

- A type II error occurs when $H_0$ is erroneously not rejected.
- The probability of a type II error is called $\beta$: $\beta = P(\text{do not reject } H_0 \text{ when } H_1 \text{ is true})$.
- To calculate $\beta$:
  - $H_1$ must specify an actual parameter value (not a range of values). Example: $H_0: \mu = 100$; $H_1: \mu = 110$.
  - The critical value is expressed in original terms (not in standardized terms).

**Problem 7a**

What is the probability that you will believe the mean return in Problem 7 is 10.2% while it is actually 10.6%, if the sample provided a mean return of 10.5%?

**Problem 7a – Solution**

The two hypotheses are:

- $H_0: \mu = 10.2$
- $H_1: \mu = 10.6$

$H_0$ is not rejected (we believe $\mu = 10.2$) if the sample mean is less than a critical value. Therefore the required probability is $\beta = P(\bar{x} < \bar{x}_{cr} \mid \mu = 10.6)$.

The critical value is (recall that Problem 7 was a right-tail test at the 10% significance level):

$$\bar{x}_{cr} = \mu_0 + z_{.10}\frac{\sigma}{\sqrt{n}} = 10.2 + 1.28\frac{.8}{\sqrt{26}} = 10.40$$

$$\beta = P(\bar{x} < 10.40 \mid \mu = 10.6) = P\left(Z < \frac{10.4 - 10.6}{.8/\sqrt{26}}\right) = P(Z < -1.27) = .102$$

##### Chapter 12

- Generally, the standard deviation is unknown, just as the mean may be unknown.
- When the standard deviation is unknown, we change the test statistic from $Z$ to $t$.
- We shall test three population parameters:
  - the mean;
  - the variance;
  - the proportion.

**Testing the mean (unknown variance)**

Replace the $Z$ statistic with

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}},$$

which has a Student t distribution with $n-1$ degrees of freedom. The original distribution must be normal (or at least mound-shaped).
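The remaining known-$\sigma$ calculations, Problem 5's sample size and the z-tests of Problems 6, 7 and 7a, can be checked with a similar standard-library sketch. The function names are hypothetical; `sample_size` rounds up with `math.ceil`, mirroring the "take n = 877" step, and `beta_right_tail` expresses the critical value in original units, as the type II error procedure prescribes. Exact z values are used, so the p-value comes out near .0279 rather than the table-based .0281.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution N(0, 1)


def z_crit(tail_prob):
    """Upper-tail critical value z_a, i.e. P(Z > z_a) = tail_prob."""
    return Z.inv_cdf(1 - tail_prob)


def sample_size(sigma, w, conf):
    """n = (z_{alpha/2} * sigma / W)^2, rounded up to the next whole observation."""
    return ceil((z_crit((1 - conf) / 2) * sigma / w) ** 2)


def z_stat(xbar, mu0, sigma, n):
    """Standardized sample mean under H0: mu = mu0."""
    return (xbar - mu0) / (sigma / sqrt(n))


def critical_means(mu0, sigma, n, alpha):
    """Two-tail rejection limits for x-bar, in original units."""
    half = z_crit(alpha / 2) * sigma / sqrt(n)
    return mu0 - half, mu0 + half


def beta_right_tail(mu0, mu1, sigma, n, alpha):
    """P(do not reject H0: mu = mu0 | mu = mu1) for a right-tail test at level alpha."""
    se = sigma / sqrt(n)
    x_cr = mu0 + z_crit(alpha) * se  # critical mean in original units
    return Z.cdf((x_cr - mu1) / se)


if __name__ == "__main__":
    print(sample_size(6, 1 / 3, 0.90), sample_size(6, 1 / 3, 0.95))  # Problem 5
    z6 = z_stat(0.51, 0.50, 0.05, 100)                 # Problem 6
    print(z6, abs(z6) > z_crit(0.025), critical_means(0.50, 0.05, 100, 0.05))
    p7 = 1 - Z.cdf(z_stat(10.5, 10.2, 0.8, 26))        # Problem 7 p-value
    print(p7, p7 < 0.10)
    print(beta_right_tail(10.2, 10.6, 0.8, 26, 0.10))  # Problem 7a
```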
**Problem 8**

A federal agency inspects packages to determine whether the contents are at least as great as advertised. Random samples of (i) 5 and (ii) 50 containers, whose packaging states a weight of 8.04 ounces, were drawn (see Content). From the sample results:

- Can we conclude that the average weight does not meet the weight stated? (Use $\alpha = .05$.)
- Estimate the mean weight of all containers with 99% confidence.
- What assumption must be met?

**Problem 8 – Solution**

We hypothesize about the mean weight:

- $H_0: \mu = 8.04$
- $H_1: \mu < 8.04$

**(i) $n = 5$.** For small samples let us solve manually. Assume the sample was 8.07, 8.03, 7.99, 7.95, 7.94.

- The rejection region: $t < -t_{\alpha,n-1} = -t_{.05,5-1} = -2.132$.
- The sample statistics:
  - Mean $= (8.07 + \dots + 7.94)/5 = 7.996$
  - Std. dev. $= \left\{[(8.07 - 7.996)^2 + \dots + (7.94 - 7.996)^2]/4\right\}^{1/2} = 0.0546$

The sample t is calculated as follows:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{7.996 - 8.04}{0.0546/\sqrt{5}} = -1.80$$

Since $-1.80 > -2.132$, the sample statistic does not fall into the rejection region. There is insufficient evidence to conclude that the mean weight is smaller than 8.04 at the 5% significance level.

**(ii) $n = 50$.** To calculate the sample statistics we use Excel's "Descriptive Statistics" from the Tools > Data Analysis menu. From the sample we obtain: mean = 8.02; std. dev. = .04.

The 99% confidence interval uses $1-\alpha = .99$, $\alpha = .01$, $\alpha/2 = .005$ and $t_{.005,50-1} \approx 2.678$ (from the t table):

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 8.02 \pm 2.678\frac{.04}{\sqrt{50}} = 8.02 \pm .015$$

so LCL = 8.005 and UCL = 8.035.

Comments:

- Check whether the distribution appears to be normal. (Figure: histogram of the 50 weights; frequency axis 0 to 20; bins 7.93, 7.97, 8.01, 8.05, 8.09, More.)
- Using Excel: to obtain an exact value for $t$, use the TINV function, =TINV(0.01,49), where .01 is the two-tail probability and 49 is the degrees of freedom. The exact value is 2.6799535.
- In our example, recall $H_0: \mu = 8.04$ and $H_1: \mu < 8.04$. The p-value $= .000187 < .05$, so there is sufficient evidence to reject $H_0$ in favor of $H_1$.
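Python's standard library has no Student t distribution, so the following sketch builds one by integrating the t density with Simpson's rule and inverting it by bisection. This numerical approach is a substitution made for illustration, not how Excel implements TINV, and the names `t_pdf`, `t_cdf`, `tinv` and `t_stat` are hypothetical. It reproduces the $n = 5$ statistic, the exact 99% critical value and interval for $n = 50$, and the one-tail p-value of the $n = 50$ test.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on the density from 0 to |x| (steps must be even)."""
    h = abs(x) / steps
    area = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3  # probability between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def tinv(two_tail_p, df):
    """Like Excel's TINV: the t > 0 with P(|T| > t) = two_tail_p (bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 2 * (1 - t_cdf(mid, df)) > two_tail_p:
            lo = mid  # two-tail area still too large: t must be bigger
        else:
            hi = mid
    return (lo + hi) / 2


def t_stat(xbar, mu0, s, n):
    """One-sample t statistic (x-bar - mu0) / (s / sqrt(n)), with n - 1 d.f."""
    return (xbar - mu0) / (s / sqrt(n))


if __name__ == "__main__":
    weights = [8.07, 8.03, 7.99, 7.95, 7.94]             # sample (i)
    t5 = t_stat(mean(weights), 8.04, stdev(weights), 5)  # stdev uses n - 1
    print(t5, t5 < -2.132)                               # not in the rejection region
    t_crit = tinv(0.01, 49)                              # compare =TINV(0.01,49)
    half = t_crit * 0.04 / sqrt(50)
    print(t_crit, 8.02 - half, 8.02 + half)              # 99% interval, sample (ii)
    t50 = t_stat(8.0182, 8.04, sqrt(0.001627), 50)       # Excel's mean and variance
    print(t50, t_cdf(t50, 49))                           # one-tail p-value
```

Simpson's rule is adequate here because the t density is smooth; 2000 subintervals keep the integration error far below the four to six decimals shown in the Excel output.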
Excel output, t-Test: Two-Sample Assuming Unequal Variances (column V2 has mean 8.04 and variance 0):

| | Weights | V2 |
|---|---|---|
| Mean | 8.0182 | 8.04 |
| Variance | 0.001627 | 0 |
| Observations | 50 | 50 |
| Hypothesized Mean Difference | 0 | |
| df | 49 | |
| t Stat | -3.82126 | |
| P(T<=t) one-tail | 0.000187 | |
| t Critical one-tail | 1.676551 | |
| P(T<=t) two-tail | 0.000375 | |
| t Critical two-tail | 2.009574 | |

Note: $t = (8.0182 - 8.04)/[.0403/\sqrt{50}] = -3.82 < -t_{.05,49} = -1.676$.

**Inference about the population variance**

The following statistic is $\chi^2$ (chi-squared) distributed with $n-1$ degrees of freedom:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

We use this relationship to test and estimate the variance.

The hypotheses tested are:

- $H_0: \sigma^2 = \sigma_0^2$
- $H_1: \sigma^2 \ne \sigma_0^2$, or $\sigma^2 > \sigma_0^2$, or $\sigma^2 < \sigma_0^2$

The rejection region is

$$\frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha,n-1} \ \text{(right tail)} \qquad \text{or} \qquad \frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha,n-1} \ \text{(left tail)}.$$

For the two-tail test, replace $\alpha$ with $\alpha/2$.

**Problem 9**

- A random sample of 100 observations was taken from a normal population. The sample variance was 29.76.
- Can we infer at the 2.5% significance level that the population variance is less than 30?
- Estimate the population variance with 90% confidence.

**Problem 9 – Solution**

- $H_0: \sigma^2 = 30$
- $H_1: \sigma^2 < 30$

Rejection region: $\chi^2 < \chi^2_{1-\alpha,n-1} = \chi^2_{.975,99} \approx 73.36$.

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(100-1)(29.76)}{30} = 98.21$$

Since $98.21 > 73.36$, the statistic does not fall into the rejection region: there is insufficient evidence at the 2.5% significance level to conclude that the variance is smaller than 30. For the confidence interval, see page 370.

**Using Excel**

- We can get the exact value of the probability $P(\chi^2_{d.f.} > \chi^2)$ for a given $\chi^2$ and known d.f., which makes it possible to determine the p-value. Use the CHIDIST function, =CHIDIST(χ², d.f.). For example, =CHIDIST(97.42,99) = .526, that is, $P(\chi^2_{99} > 97.42) = .526$.
- In our example we had a left-hand-tail rejection region, so the p-value is the left-tail probability of the sample statistic: $P(\chi^2_{99} < 98.21) = 1 -$ CHIDIST(98.21,99), roughly .50, far above .025.
- We can also get the exact $\chi^2$ value for which $P(\chi^2_{d.f.} > \chi^2) = \alpha$, for any given probability $\alpha$ and known d.f. Use the CHIINV function, =CHIINV(α, d.f.). For example, =CHIINV(.025,99) = 128.4219, that is, $P(\chi^2_{99} > 128.4219) = .025$.
**Inference about a population proportion**

- The test and the confidence interval are based on the approximate normal distribution of the sample proportion, valid if $np > 5$ and $n(1-p) > 5$.
- For the confidence interval of $p$ we have

  $$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \qquad \text{where } \hat{p} = x/n.$$

- For the hypothesis test, we run a Z test.

**Problem 10**

- A consumer protection group ran a survey of 400 dentists to check a claim that 4 out of 5 dentists recommend ingredients included in a certain toothpaste.
- The survey results are as follows: 71 No; 329 Yes.
- At the 5% significance level, can the consumer group infer that the claim is true?

**Problem 10 – Solution**

The two hypotheses are:

- $H_0: p = .8$
- $H_1: p > .8$

The rejection region is $Z > z_\alpha$, with $z_{.05} = 1.645$. With $\hat{p} = 329/400 = .8225$,

$$Z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} = \frac{.8225 - .8}{\sqrt{.8225(1-.8225)/400}} = 1.18$$

Since $1.18 < 1.645$, the consumer group cannot confirm the claim at the 5% significance level.
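The chi-squared calculations of Problem 9 and the proportion test of Problem 10 can also be sketched in Python. The chi-squared CDF below uses the series for the regularized lower incomplete gamma function, $P(\chi^2_{d.f.} \le x) = P(d.f./2,\ x/2)$, and bisection to invert it; this is an illustrative substitution for Excel's CHIDIST and CHIINV, not their actual algorithm, and all function names (`chi2_cdf`, `chi2_dist`, `chi2_inv`, `prop_z`) are hypothetical.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def chi2_cdf(x, df):
    """P(chi2_df <= x) = P(k, x/2), the regularized lower incomplete gamma (series form)."""
    if x <= 0:
        return 0.0
    k, y = df / 2, x / 2
    term = total = 1 / k  # n = 0 term of sum_n y^n / (k (k+1) ... (k+n))
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= y / (k + n)
        total += term
    return total * exp(k * log(y) - y - lgamma(k))


def chi2_dist(x, df):
    """Like Excel's CHIDIST: the right-tail probability P(chi2_df > x)."""
    return 1 - chi2_cdf(x, df)


def chi2_inv(alpha, df):
    """Like Excel's CHIINV: the x with P(chi2_df > x) = alpha (bisection)."""
    lo, hi = 0.0, df + 10 * sqrt(2 * df) + 10  # hi lies far in the right tail
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_dist(mid, df) > alpha:
            lo = mid  # right-tail area still too large: x must be bigger
        else:
            hi = mid
    return (lo + hi) / 2


def prop_z(x, n, p0):
    """Z for a proportion, using the p-hat-based standard error as in Problem 10."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)


if __name__ == "__main__":
    stat = (100 - 1) * 29.76 / 30                     # Problem 9 statistic
    print(stat, chi2_inv(0.975, 99))                  # left-tail critical value
    print(chi2_cdf(stat, 99))                         # left-tail p-value
    print(chi2_dist(97.42, 99), chi2_inv(0.025, 99))  # compare .526 and 128.4219
    z10 = prop_z(329, 400, 0.8)                       # Problem 10
    print(z10, z10 > NormalDist().inv_cdf(0.95))      # False: claim not confirmed
```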