#### Transcript presentation slides for this follow along session

### Bus 621 Statistics

## Lecture 1 Basics of Statistical Inference

**Lecture 1**

1. Inference for a single numerical variable
2. Statement of hypotheses
3. P-value concept
4. How to communicate the results of a test
5. Inference for a single numerical variable and a categorical variable with 2 categories
6. Inference for a single categorical variable
7. Inference for 2 categorical variables

**Statistical Methods**

- **Descriptive Statistics**
- **Inferential Statistics**
  - **Estimation**
  - **Hypothesis Testing**

**Estimation Process**

- **Population:** mean, μ, is unknown
- **Sample:** random sample with mean x̄ = 50
- **Estimate:** "I am 95% confident that μ is between 40 & 60."

**Unknown Population Parameters Are Estimated**

| | Estimate population parameter... | with sample statistic |
|---|---|---|
| **Mean** | μ | x̄ |
| **Proportion** | p | p̂ |
| **Std. Dev.** | σ | s |
| **Differences** | μ1 - μ2 | x̄1 - x̄2 |

**Estimation Methods**

- **Estimation**
  - **Point Estimation**
  - **Interval Estimation**

**Point Estimation**

1. Provides a single value
   - Based on observations from one sample
2. Gives no information about how close the value is to the unknown population parameter
3. Example: Sample mean x̄ = 3 is a **point estimate** of the unknown population mean

**Interval Estimation**

1. Provides a range of values
   - Based on observations from one sample
2. Gives information about closeness to unknown population parameter
3. Example: Unknown population mean lies between 50 and 70 with 95% confidence

**Confidence Level**

1. Probability that the unknown population parameter falls within interval
2. Denoted (1 – α)%
   - α is probability that parameter is **not** within interval
3. Typical values are 99%, 95%, 90%

**Intervals & Confidence Level**

*[Figure: sampling distribution of the sample mean x̄, centered at μ, with area 1 – α in the middle and α/2 in each tail.]*

Over a large number of intervals, 100(1 – α)% contain μ and 100α% do not.
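The "large number of intervals" idea can be checked with a small simulation. This is only a sketch: the population (normal, mean 50, standard deviation 10), the sample size of 25, and the use of a known-σ z interval are all assumptions chosen for illustration, not values from the slides.

```python
import math
import random
from statistics import NormalDist

random.seed(621)                       # fixed seed so the run is repeatable

# Hypothetical population and sample size (assumptions for illustration).
mu, sigma, n = 50.0, 10.0, 25
alpha = 0.05

# Known-sigma interval: x_bar +/- z * sigma / sqrt(n).
z = NormalDist().inv_cdf(1 - alpha / 2)
half_width = z * sigma / math.sqrt(n)

trials = 2000
hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    if x_bar - half_width <= mu <= x_bar + half_width:
        hits += 1                      # this interval contains mu

coverage = hits / trials
print(f"{coverage:.3f} of {trials} intervals contain mu")
```

The printed share should land close to 0.95, and changing `alpha` should move it toward 1 – α.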

**Factors Affecting Interval Width**

1. Data dispersion
   - More variability = larger width
2. Sample size
   - Larger sample = smaller width
3. Level of confidence (1 – α)
   - Higher confidence = larger width

© 1984-1994 T/Maker Co.

**Accurate Confidence Interval for Mean (σ Unknown)**

Assumption: Population must be **normally distributed**
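For reference, the interval usually meant when σ is unknown is the t interval below. The symbols (x̄ for the sample mean, s for the sample standard deviation, n for the sample size, and the t critical value with n − 1 degrees of freedom) are the standard textbook ones and are assumed here, since the formula itself does not appear in the transcript.

$$
\bar{x} \;\pm\; t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
$$

Read this way, the half-width grows with s and with the confidence level, and shrinks as n increases.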

**Thinking Challenge**

You’re a time study analyst in manufacturing. You’ve recorded the following task times (min.): **3.6, 4.2, 4.0, 3.5, 3.8, 3.1**. What is the **90%** confidence interval estimate of the population **mean** task time?

**Confidence Interval for a Mean (μ) with Unknown σ, Using MegaStat**

MegaStat does all calculations for you. We can be 90% confident that the population mean falls between 3.379 and 4.021.
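For readers without MegaStat, the same interval can be worked by hand. The sketch below uses only the task times from the slide; the critical value 2.015 (t with 5 degrees of freedom, 5% in the upper tail) is an assumption copied from a standard t table rather than computed.

```python
import math
from statistics import mean, stdev

times = [3.6, 4.2, 4.0, 3.5, 3.8, 3.1]   # task times (min.) from the slide

n = len(times)
x_bar = mean(times)
s = stdev(times)                  # sample standard deviation (n - 1 divisor)
std_error = s / math.sqrt(n)

# Assumption: table value of t for df = n - 1 = 5 and 90% confidence.
t_crit = 2.015

margin = t_crit * std_error
lower, upper = x_bar - margin, x_bar + margin
print(f"90% CI: {lower:.3f} to {upper:.3f}")
```

Rounded to three decimals, this gives the 3.379 to 4.021 interval reported above.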

### Applications

An example using L1 One sample numerical variable.xlsx

Problem 1: Obtain and interpret a 95% confidence interval for the population mean for price per square foot for all combinations of SAD and with/without a pool.

Problem 2: Check whether these confidence intervals may be inaccurate by looking at normality and sample size.

Your Turn: Do PS1 problem 1

**Statistical Methods**

- **Descriptive Statistics**
- **Inferential Statistics**
  - **Estimation**
  - **Hypothesis Testing**

**What’s a Hypothesis?**

A belief about a population parameter

- Parameter is **population** mean, proportion, slope
- Must be stated **before** analysis

"I believe the mean GPA of this class is 3.5!"


**Hypothesis Testing**

- **Population:** "I believe the population mean age is 50 (hypothesis)."
- **Random sample:** mean x̄ = 20
- **Decision:** "Reject hypothesis! Not close."

**How do we Measure “Close”?**

1. If the hypothesized value were really the true mean, there should be a high probability of obtaining a sample mean at least as far from it as the observed x̄ by pure random chance. Call this probability the p-value.
2. If the p-value is smaller than, say, 5%, we “reject” the hypothesized value for μ.

**Basic Idea**

*[Figure: sampling distribution of sample means centered at the H0 value μ = 50, with the observed sample mean of 20 far out in the tail.]*

**It is unlikely that we would get a sample mean of this value (20) if in fact μ = 50 were the population mean; therefore, we reject the hypothesis that μ = 50.**

**Naming Null & Alternative Hypotheses**

1. Null hypothesis, H0 (pronounced H-oh), always has an equality sign: =, ≤, or ≥
2. Alternative hypothesis, Ha, is the opposite of the null
3. Ha always has an inequality sign: ≠, <, or >
4. Specified as Ha: μ ≠, <, or > some value
   - Example, Ha: μ < 3

**Identifying Hypotheses**

Example: Test that the population mean is not 3

Steps:

- State the question statistically (μ ≠ 3)
- State the opposite statistically (μ = 3)
  - Must be mutually exclusive & exhaustive
- Designate which is the alternative hypothesis (μ ≠ 3)
  - Has the ≠, **<**, or **>** sign
- Designate which is the null hypothesis (μ = 3)
- Called a two-tailed hypothesis because of ≠ in Ha

**What Are the Hypotheses?**

**Is the population average amount of TV viewing equal to 12 hours?**

- State the question statistically: **μ = 12**
- State the opposite statistically: **μ ≠ 12**
- Select the alternative hypothesis: **Ha: μ ≠ 12**
- State the null hypothesis: **H0: μ = 12**
- **This is a two-tailed test.**

**What Are the Hypotheses?**

**Is the population average amount of TV viewing different from 12 hours?**

- State the question statistically: **μ ≠ 12**
- State the opposite statistically: **μ = 12**
- Select the alternative hypothesis: **Ha: μ ≠ 12**
- State the null hypothesis: **H0: μ = 12**
- **This is a two-tailed test.**

**What Are the Hypotheses?**

**Is the average amount spent in the bookstore greater than $25?**

- State the question statistically: **μ > 25**
- State the opposite statistically: **μ ≤ 25**
- Select the alternative hypothesis: **Ha: μ > 25**
- State the null hypothesis: **H0: μ ≤ 25**
- **This is a one-tailed or right-tailed test.**

**What Are the Hypotheses?**

**Is the average cost per hat less than $20?**

- State the question statistically: **μ < 20**
- State the opposite statistically: **μ ≥ 20**
- Designate the alternative hypothesis: **Ha: μ < 20**
- State the null hypothesis: **H0: μ ≥ 20**
- **This is a one-tailed or left-tailed test.**

**Level of Significance**

1. A “tail” probability of the bell curve that sets how many std. devs. of x̄ still count as “close” and that the p-value is compared against
2. Designated α (alpha)
   - Typical values are .01, .05, .10 (.05 is most common)
3. Selected by researcher, otherwise will be given in a problem
4. Defines unlikely values of sample statistic if null hypothesis is true

**p-Value Approach**

1. Probability of obtaining a test statistic more extreme (≤ or ≥) than actual sample value, given H0 is true, is called the p-value
2. 1 - (p-value) is called the confidence in Ha
3. 1 - α is called the required confidence to conclude Ha
4. Used to make a decision between hypotheses
   - If confidence in Ha is greater than the required confidence, conclude Ha; otherwise find H0 acceptable.

## The Four Steps of a Hypothesis Test

1. **State Hypotheses**
2. **Determine p-value (MegaStat)**
3. **Make decision based on 1 - p = confidence in Ha**
4. **Draw conclusion within context of problem**

- If confidence in Ha is greater than the required confidence, conclude Ha; otherwise find H0 acceptable.
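The decision rule in step 3 is simple enough to write down as a short function. This is a sketch only: the function name `decide` and the two p-values passed to it are illustrative assumptions, not part of the lecture materials.

```python
def decide(p_value, alpha):
    """Apply the rule: conclude Ha when 1 - p exceeds 1 - alpha."""
    confidence_in_ha = 1 - p_value        # confidence in Ha
    required_confidence = 1 - alpha       # required confidence to conclude Ha
    conclude_ha = confidence_in_ha > required_confidence
    return confidence_in_ha, required_confidence, conclude_ha


# Illustrative p-values (assumed), both judged at alpha = .05.
small_p = decide(0.009, 0.05)    # confidence .991 exceeds .95: conclude Ha
large_p = decide(0.1109, 0.05)   # confidence .8891 falls short: H0 acceptable

for conf, req, conclude in (small_p, large_p):
    verdict = "conclude Ha" if conclude else "find H0 acceptable"
    print(f"confidence {conf:.4f} vs required {req:.2f}: {verdict}")
```

Comparing 1 - p with 1 - α is the same comparison as p with α, just stated in the "confidence" wording the slides use.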

**t Test for Mean (σ Unknown)**

Assumption for p-value to be accurate

- Population is normally distributed
- If not normal, take large sample (n ≥ 30)
- Or switch to a test for population median such as the Wilcoxon signed rank test

**One-Tailed t Test Example**

Is the average capacity of batteries **less than 140** ampere-hours? A random sample of **20** batteries had a mean of **138.47** and a standard deviation of **2.66**. Assume a normal distribution. Test at the **.05** level of significance.

**One-Tailed t Test Solution**

- **H0: μ ≥ 140**
- **Ha: μ < 140**
- **α = .05**
- **df = 20 - 1 = 19**
- **p-value = .009 (MegaStat)**
- **Conclusion: We can be 99.1% confident that the population mean is less than 140, and since that exceeds the requirement of 95%, we can conclude μ < 140.**
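The battery p-value can be reproduced from the summary numbers alone. The sketch below is not MegaStat's method: the helper `t_upper_tail` is a hypothetical stand-in that integrates the Student's t density numerically with Simpson's rule, so that only the Python standard library is needed.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, via Simpson's rule on the density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps                         # steps is even, as Simpson's rule needs
    area = pdf(0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                         # area under the density from 0 to |t|
    return 0.5 - area if t >= 0 else 0.5 + area


# Summary statistics from the battery example.
n, x_bar, s, mu0 = 20, 138.47, 2.66, 140.0
df = n - 1
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

# Ha: mu < 140 is left-tailed, so the p-value is the area below t_stat.
p_value = 1 - t_upper_tail(t_stat, df)
confidence_in_ha = 1 - p_value
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}, confidence in Ha = {confidence_in_ha:.4f}")
```

The t statistic comes out near -2.57, and the p-value falls just under .01, in line with the .009 reported from MegaStat.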

**One-Tailed t Test**

You’re a marketing analyst for Wal-Mart. Wal-Mart had teddy bears on sale last week. The weekly sales ($00s) of bears sold in **10** stores were:

**8 11 0 4 7 8 10 5 8 3**

At the **.05** level of significance, is there evidence that the average bear sales per store is **more than 5** ($00s)?

**One-Tailed t Test Solution**

- **H0: μ ≤ 5**
- **Ha: μ > 5**
- **α = .05**
- **df = 10 - 1 = 9**
- **p-value = .111 from MegaStat**
- **Confidence in Ha = 1 - .111 or .889**
- **Required confidence to conclude Ha is 95%.**
- **There is insufficient evidence that pop. mean is more than 5 since we can be only 88.9% confident.**

One-tailed T-test for a Mean (μ) with Unknown σ, Using MegaStat

Hypothesis Test: Mean vs. Hypothesized Value

| Value | Statistic |
|---|---|
| 5.0000 | hypothesized value |
| 6.4000 | mean Sales ($00) |
| 3.3731 | std. dev. |
| 1.0667 | std. error |
| 1.31 | t |
| .1109 | p-value (one-tailed, upper) |
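The MegaStat figures for the teddy bear data can be checked from the raw sales values. As a sketch only, the hypothetical helper `t_upper_tail` below integrates the Student's t density with Simpson's rule; it is an assumption standing in for whatever MegaStat uses internally.

```python
import math
from statistics import mean, stdev


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, via Simpson's rule on the density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    area = pdf(0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                         # area under the density from 0 to |t|
    return 0.5 - area if t >= 0 else 0.5 + area


sales = [8, 11, 0, 4, 7, 8, 10, 5, 8, 3]   # weekly sales ($00s) in 10 stores
mu0 = 5.0                                   # hypothesized mean

n = len(sales)
x_bar = mean(sales)
s = stdev(sales)                            # sample standard deviation
std_error = s / math.sqrt(n)
t_stat = (x_bar - mu0) / std_error

# Ha: mu > 5 is right-tailed, so the p-value is the area above t_stat.
p_value = t_upper_tail(t_stat, n - 1)
print(f"mean = {x_bar:.4f}, s = {s:.4f}, se = {std_error:.4f}")
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
```

Each printed value should line up with the corresponding row of the MegaStat output above.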

**Two-Tailed t Test**

You work for the FTC. A manufacturer of detergent claims that the mean weight of detergent is **3.25** lb. You take a random sample of **64** containers. You calculate the sample average to be **3.238** lb. with a standard deviation of **.117** lb. At the **.01** level of significance, is the manufacturer correct?

**Two-Tailed t Test Solution**

- **H0: μ = 3.25**
- **Ha: μ ≠ 3.25**
- **α = .01**
- **df = 64 - 1 = 63**
- **p-value = .415 (two-tailed; the .208 from MegaStat is the one-tailed area, which must be doubled here)**
- **Confidence in Ha = 1 - .415 or .585**
- **Need to be 99% confident to conclude Ha**
- **There is insufficient evidence pop. mean is not 3.25 since we can only be 58.5% confident. The null hypothesis is acceptable.**
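The detergent test can be recomputed from the summary statistics. In this sketch the hypothetical helper `t_upper_tail` integrates the Student's t density with Simpson's rule (an assumption, not MegaStat's routine). For t near -0.82 with 63 degrees of freedom, the one-tailed area comes out near .208 and the two-tailed p-value near .415; either figure is far above α = .01, so the conclusion is the same.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, via Simpson's rule on the density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    area = pdf(0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                         # area under the density from 0 to |t|
    return 0.5 - area if t >= 0 else 0.5 + area


# Summary statistics from the detergent example.
n, x_bar, s, mu0 = 64, 3.238, 0.117, 3.25
df = n - 1
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

one_tail = t_upper_tail(abs(t_stat), df)   # area beyond |t| in one tail
p_two_tailed = 2 * one_tail                # Ha uses "not equal", so both tails count
print(f"t = {t_stat:.2f}, one tail = {one_tail:.3f}, two-tailed p = {p_two_tailed:.3f}")
```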

### Applications

An example using L1 One sample numerical variable.xlsx

Problem 3: Test the hypothesis that the mean price per square foot for SAD3Pool is different from $320 at a level of significance of .05. How does that compare to the 95% confidence interval you calculated in Problem 1? Use a level of significance of .05 in this problem and all that follow.

Problem 4: Use the Wilcoxon signed rank test to test whether the median price per square foot for SAD2Pool is different than $320.

Problem 5: Test the hypothesis that the mean price per square foot for SAD1NoPool is less than 350.

Example 6: Use the Wilcoxon signed rank test to test whether the median price per square foot for SAD1NoPool is less than $350.

Your Turn: Do PS1 problems 2,3,4

**Two Independent Populations Example applications**

1. An economist wishes to determine whether there is a difference in mean family income for households in two socioeconomic groups.
2. An admissions officer of a small liberal arts college wants to compare the mean SAT scores of applicants educated in rural high schools and in urban high schools.

How can we tell what to use for these situations?

- Both have a numerical variable and a categorical variable (with 2 categories)
- See “Choosing Situation by Data Type”

**Comparing Two Independent Means, μ1 – μ2, assuming σ unknown**

Assumptions

- Independent, random samples
- Populations are approximately normally distributed
- Population standard deviations are equal

**If at least one population is not normal, then an alternative test is to compare population medians using the Wilcoxon Mann-Whitney test.**

**Hypothesis Test Example**

You’re a financial analyst for Charles Schwab. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ? You collect the following data:

| | **NYSE** | **NASDAQ** |
|---|---|---|
| **Number** | 11 | 15 |
| **Mean** | 3.27 | 2.53 |
| **Std Dev** | 1.30 | 1.16 |

Assuming **normal** populations, is there a difference in **average** yield (**α = .05**)?


**Independent Samples Hypothesis Test Solution**

- **H0: μ1 - μ2 = 0 (μ1 = μ2)**
- **Ha: μ1 - μ2 ≠ 0 (μ1 ≠ μ2)**
- **α = .05**
- **df = 11 + 15 - 2 = 24**
- **p-value = .1397**
- **Need to be 95% confident to conclude Ha. Confidence in Ha is 1 - .1397 = .8603**
- **There is little evidence of a difference in means since we can only be 86.03% confident that the pop. means are different.**

Two Sample T-test & C.I. for Mean Difference Assuming Equal Variances, Using MegaStat

Hypothesis Test: Independent Groups (t-test, pooled variance)

| NYSE | NASDAQ | |
|---|---|---|
| 3.27 | 2.53 | mean |
| 1.3 | 1.16 | std. dev. |
| 11 | 15 | n |

| Value | Statistic |
|---|---|
| 0.740 | difference (NYSE - NASDAQ) |
| 1.489 | pooled variance |
| 1.220 | pooled std. dev. |
| 0.484 | standard error of difference |
| 0 | hypothesized difference |
| 1.53 | t |
| .1397 | p-value (two-tailed) |
| -0.260 | confidence interval 95.% lower |
| 1.740 | confidence interval 95.% upper |
| 1.000 | margin of error |
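The pooled-variance figures in this output follow directly from the two sets of summary statistics. The sketch below recomputes them; the hypothetical helper `t_upper_tail` (Simpson's rule on the Student's t density) is an assumption used so that no statistics package is required.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, via Simpson's rule on the density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    area = pdf(0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                         # area under the density from 0 to |t|
    return 0.5 - area if t >= 0 else 0.5 + area


# Summary statistics from the dividend yield example.
n1, mean1, sd1 = 11, 3.27, 1.30    # NYSE
n2, mean2, sd2 = 15, 2.53, 1.16    # NASDAQ

df = n1 + n2 - 2
# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

diff = mean1 - mean2
t_stat = diff / std_error                       # hypothesized difference is 0
p_value = 2 * t_upper_tail(abs(t_stat), df)     # two-tailed

print(f"pooled variance = {pooled_var:.3f}, std. error = {std_error:.3f}")
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
```

Pooling is only appropriate under the equal-standard-deviation assumption listed for this test.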

Wilcoxon Mann-Whitney test using MegaStat

Wilcoxon - Mann/Whitney Test (Pr/SF)

| n | sum of ranks | |
|---|---|---|
| 32 | 1035.5 | SAD1Pool |
| 29 | 855.5 | SAD2Pool |
| 61 | 1891 | total |

| Value | Statistic |
|---|---|
| 992.000 | expected value |
| 69.243 | standard deviation |
| 0.621 | z, corrected for ties, with continuity correction |
| .5346 | p-value (two-tailed) |

H0: Population medians are equal

H1: Population medians are not equal

P-value = .5346

We can only be 46.54 % confident of a difference in population medians.

See L1 2 sample tests excel file for this example.
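The z statistic in the Wilcoxon Mann-Whitney output can be rebuilt from the group sizes and the rank sum for SAD1Pool. This is a sketch under one stated assumption: it uses the standard large-sample formulas without a tie correction, which reproduces the reported standard deviation to the precision shown but is not guaranteed to match MegaStat in every case.

```python
import math
from statistics import NormalDist

# Values taken from the MegaStat output.
n1, n2 = 32, 29            # SAD1Pool, SAD2Pool
rank_sum_1 = 1035.5        # sum of ranks for SAD1Pool

total_n = n1 + n2
expected = n1 * (total_n + 1) / 2                 # expected rank sum under H0
sd = math.sqrt(n1 * n2 * (total_n + 1) / 12)      # assumption: no tie correction

# Continuity correction: shrink the gap by 0.5 before standardizing.
z = (abs(rank_sum_1 - expected) - 0.5) / sd
p_value = 2 * (1 - NormalDist().cdf(z))           # two-tailed

print(f"expected = {expected:.3f}, sd = {sd:.3f}")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```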

### Applications

An example using the L1 2 sample tests.xlsx excel file.

Example 7: Test whether price per square foot has the same population means for homes with and without pools.

Your Turn: Do PS1 problems 5, 6, 7

**A single categorical variable: Z Test for a Proportion**

1. Condition
   - nπ and n(1-π) > 5
2. Z-test from MegaStat

Example: Do ranch style homes make up less than 50% of the population of homes?

Data: A sample of 108 homes revealed that 44 were ranch style.

**One-Tailed Solution**

- **H0: π ≥ 0.50**
- **Ha: π < 0.50**
- **α = .05**
- **P-value = .0271 from Excel MegaStat**
- **We can be 97.29% confident that the population proportion is less than 0.5 and therefore can conclude that π < 0.50.**

**95% confidence interval estimate for π: We can be 95% confident that the population proportion falls between .3147 and .5001.**

**Note: A 2-tailed test would have found the null hypothesis acceptable.**
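The reported p-value and interval correspond to 44 ranch homes out of 108, the ranch total in the school district table later in the lecture. The sketch below assumes the usual large-sample formulas: the standard error under H0 for the test, and the sample-based standard error for the interval.

```python
import math
from statistics import NormalDist

n = 108            # homes sampled
ranch = 44         # ranch style homes (ranch total in the district table)
pi0 = 0.50         # hypothesized proportion

p_hat = ranch / n

# Z test: the standard error uses the hypothesized proportion.
se_null = math.sqrt(pi0 * (1 - pi0) / n)
z = (p_hat - pi0) / se_null
p_value = NormalDist().cdf(z)          # Ha: pi < 0.50 is left-tailed

# 95% confidence interval: the standard error uses the sample proportion.
z_crit = NormalDist().inv_cdf(0.975)
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
lower = p_hat - z_crit * se_hat
upper = p_hat + z_crit * se_hat

print(f"p-hat = {p_hat:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

Doubling the one-tailed p-value gives a value just above .05, which is consistent with the note that a 2-tailed test would find the null hypothesis acceptable.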

### Applications

An example using the L1 Categorical variables tests and CI-1.xls file.

Example 8: Test whether less than 50% of the homes are ranch style in the population and obtain a 95% interval estimate for that population proportion.

Your Turn: Do PS1 problem 8

**Two categorical variables: Chi-square Test for Independence**

1. Chi-square test statistic

Example: Do 3 different school districts have the same percentage of ranch, trilevel and two-story homes?

Data: A sample of 108 homes revealed the following table.

| Count of STYLE (SAD \ STYLE) | ranch | trilevel | twostory | Grand Total |
|---|---|---|---|---|
| SAD1 | 8 | 24 | 11 | 43 |
| SAD2 | 15 | 11 | 7 | 33 |
| SAD3 | 21 | 4 | 7 | 32 |
| Grand Total | 44 | 39 | 25 | 108 |

**Chi-square solution**

- **H0: No relationship**
- **Ha: Relationship exists**
- **α = .05**
- **P-value = .0005 from Excel MegaStat**
- **We can be 99.95% confident that there is a relationship between school district and style of home.**

**A follow up analysis suggests that SAD 1 has fewer ranch homes and more trilevel homes than expected and that the reverse holds for SAD 3. See the Results tab in L1 Categorical variables file for details.**
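The chi-square statistic behind this p-value can be computed from the table of counts. The sketch below uses the usual expected count (row total × column total / grand total). The p-value step relies on a closed-form tail formula that holds only for an even number of degrees of freedom, which is an assumption that happens to fit this 3 × 3 table (df = 4).

```python
import math

# Observed counts by school district: ranch, trilevel, twostory.
observed = {
    "SAD1": [8, 24, 11],
    "SAD2": [15, 11, 7],
    "SAD3": [21, 4, 7],
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
grand_total = sum(row_totals)

chi2 = 0.0
for row, row_total in zip(rows, row_totals):
    for obs, col_total in zip(row, col_totals):
        expected = row_total * col_total / grand_total   # count expected under H0
        chi2 += (obs - expected) ** 2 / expected

df = (len(rows) - 1) * (len(col_totals) - 1)

# Upper-tail chi-square probability; this series is valid for even df only.
half = chi2 / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The largest contributions come from the SAD1 and SAD3 ranch and trilevel cells, which matches the follow up analysis described above.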

### Applications

An example using the L1 Categorical variables tests and CI-1.xlsx file.

Example 9: Test whether there is a relationship between SAD and style of home.

Your Turn: Do PS1 problem 9