Lectur - Montana State University College of Engineering


Lecture 5 Regression

Homework Issues…past

1. Bad objective: “Conduct an experiment because I have to for this class”

2. Commas – ugh

3. Do not write out symbols (‘pi’); use the symbol (‘π’)

4. Summarize results (don’t give me everything and then some)

5. Report: mean ± std. dev.

Homework Issues…past

1. A confidence interval should be reported as an interval, e.g., 1.2 – 1.5

2. Define abbreviations when first used, e.g., CI

3. However, there were too many conjunctive adverbs at the start of sentences!

4. Equation formatting

Homework Issues…present?

1. Do not show 27 digits of accuracy

2. UNITS!!! UNITS!!! INCLUDE UNITS!!!

3. Every table and figure should have a caption and be referred to in the text.

4. A section (e.g., results) should be more than just a table and a figure.

On to the lecture…

In Excel…

Three ways to perform a linear regression:

1. Built-in functions SLOPE() and INTERCEPT() (no details)

2. Adding a trendline to a chart, and showing the regression equation on the chart (simplest)

3. Regression analysis using the Data Analysis Toolkit (best option: more information)

Option 3 in Excel

Excel Results

• Recall that we forced the intercept = 0
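For readers working outside Excel, here is a minimal Python sketch of a least-squares fit with the intercept forced to zero. The data values and the helper name `slope_through_origin` are invented for illustration and are not the lecture's data; the sketch assumes the measurements are diameters and circumferences, so that the slope estimates π.

```python
def slope_through_origin(x, y):
    """Least-squares slope for the model y = b1 * x (intercept forced to 0)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx

# Hypothetical diameters and circumferences (cm), invented for illustration.
diameters = [1.0, 2.0, 3.0, 4.0]
circumferences = [3.2, 6.2, 9.5, 12.6]

b1 = slope_through_origin(diameters, circumferences)
print(f"slope = {b1:.3f}")
```

With no intercept term, the slope reduces to the ratio Σxy / Σx², which is why no matrix algebra is needed here.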


Interpretation of results…

Excel reports the Standard Error, not the standard deviation. They are not equal. See next slide.

The P-value is the probability of seeing a result at least this extreme by random chance alone, i.e., if the true coefficient were zero. The tiny P-value for the slope (1.91 × 10^-25) indicates a minuscule probability that the observed result can be explained by random chance. That is, you REALLY NEED the slope term to explain the data.


Interpretation of results…

The 95% confidence interval for the true value of the slope (true value of π in this example) is presented in the output table. In this example, with 95% confidence, the true value of π is somewhere between 3.138 and 3.307.

The 90% confidence interval is 3.15233 to 3.292408, which does not contain the true value!! Measurement bias – not small, random, additive error?
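The intervals above can be approximately reproduced as estimate ± t_crit · standard error. The sketch below uses the slope (3.22) and the slope standard error (0.0405) reported in this lecture. The critical t values are standard t-table entries for 19 degrees of freedom, which is an assumption here (20 points, one fitted coefficient because the intercept was forced to zero). Because the slope is rounded to 3.22, the limits differ slightly from Excel's in the third decimal.

```python
import math

slope = 3.22   # slope reported in the lecture (rounded)
se = 0.0405    # standard error of the slope from the Excel output

# Two-tailed critical t values from a t-table.
# Assumption: 19 degrees of freedom (N = 20, intercept forced to 0).
t_crit = {0.95: 2.093, 0.90: 1.729}

def confidence_interval(estimate, std_err, t):
    """Return (lower, upper) limits of estimate +/- t * std_err."""
    half_width = t * std_err
    return estimate - half_width, estimate + half_width

ci95 = confidence_interval(slope, se, t_crit[0.95])
ci90 = confidence_interval(slope, se, t_crit[0.90])

for label, (lo, hi) in (("95%", ci95), ("90%", ci90)):
    inside = lo <= math.pi <= hi
    print(f"{label} CI: {lo:.3f} to {hi:.3f}; contains pi: {inside}")
```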

Calculating std. dev.

$$\text{sd} = \text{se} \cdot \sqrt{N}$$

• Slope se = 0.0405

• Slope sd = 0.0405 · sqrt(20) = 0.181
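The same arithmetic as a short Python check, using the lecture's values (se = 0.0405, N = 20). The variable names are illustrative only.

```python
import math

n_points = 20        # number of data points, N
slope_se = 0.0405    # standard error reported by Excel

# Convert the standard error to a standard deviation: sd = se * sqrt(N)
slope_sd = slope_se * math.sqrt(n_points)
print(f"slope sd = {slope_sd:.3f}")
```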

Our experimental results are:

– “The experimental value of π was found to be 3.22 ± 0.181.”

– “The 95% confidence interval for the true value of π ranges from 3.138 to 3.307.”

Multivariable Regression

Fit this data to an equation of the form:

$$y_p = b_0 + b_1 x + b_2 x^2$$

Plot

[Figure: plot of the data; vertical axis from 0 to 450, horizontal axis from 0 to 12]

Multivariable Regression

• y is the response variable.

• Order of the other columns does not matter.
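As a rough equivalent outside Excel, the sketch below fits y_p = b0 + b1·x + b2·x² by treating 1, x, and x² as separate regressor columns and solving the normal equations. It uses only the standard library. The data are noise-free and invented for illustration, and the helper names `solve_linear` and `fit_quadratic` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ coeffs = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - tail) / m[r][r]
    return coeffs

def fit_quadratic(x, y):
    """Least-squares b0, b1, b2 for y_p = b0 + b1*x + b2*x**2."""
    # Regressor columns: constant, x, x^2 (their order only sets the output order).
    columns = [[1.0] * len(x), list(x), [xi ** 2 for xi in x]]
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * yi for p, yi in zip(ci, y)) for ci in columns]
    return solve_linear(xtx, xty)

# Hypothetical noise-free data generated from y = 1 + 2x + 4x^2.
x = [0, 1, 2, 3, 4, 5]
y = [1 + 2 * xi + 4 * xi ** 2 for xi in x]

b0, b1, b2 = fit_quadratic(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

The model is still linear in the coefficients, which is why a "multivariable" linear regression can fit a quadratic curve.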

In Excel…

Results… (bug?)


Interpretation…

The coefficients ± s are:

• b_0 = 5.53 ± 20.45

• b_1 = 2.12 ± 8.54

• b_2 = 3.98 ± 0.78

Standard deviations are much larger than the mean values for b_0 and b_1. The p-values for these coefficients are 0.42 and 0.45.

These p-values are well over 0.05, so these terms are statistically insignificant (at the 5% level). We can regress this data nearly as well with:

$$y_p = b_2 x^2$$

p-value?

Recall: The lower the p-value, the less likely the result, assuming the null hypothesis, so the more "significant" the result, in the sense of statistical significance.

The null hypothesis here is, simplistically, that the coefficient is zero.

t-Test on a Regression Slope

$$y_p = b_0 + b_1 x$$

• Comparison of b_1 from regression with another value, β.

• The t-test is a hypothesis test. Here are the hypotheses for this t-test.

– H0 (null hypothesis) – The slope, b_1, is equal to the known value, β.

– H1 (test hypothesis) – The slope, b_1, is not equal to the known value, β.

t-Statistic

• The appropriate t-statistic for this case is calculated as

$$t_{stat} = \frac{b_1 - \beta}{\sqrt{\dfrac{SSE/(N-2)}{\sum_{i=1}^{N} (x_i - \bar{x})^2}}}$$

• where

$$SSE = \sum_{i=1}^{N} (y_i - y_{p,i})^2$$

• The t statistic is always positive; you may have to use (β − b_1) to get a positive value.
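The calculation above can be sketched in Python as follows. The function name `slope_t_stat` and the four data points are invented for illustration. The absolute value plays the role of switching to (β − b_1) when needed.

```python
import math

def slope_t_stat(x, y, beta):
    """Return (t_stat, b1, sse) for testing the fitted slope b1 against beta."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # least-squares slope
    b0 = y_bar - b1 * x_bar        # least-squares intercept
    # Sum of squared errors between measured and predicted y
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt((sse / (n - 2)) / sxx)
    return abs(b1 - beta) / se_b1, b1, sse

# Hypothetical data, invented for illustration; compare the slope with beta = 2.
x = [1, 2, 3, 4]
y = [2.1, 3.9, 6.2, 7.8]
t_stat, b1, sse = slope_t_stat(x, y, beta=2.0)
print(f"b1 = {b1:.2f}, SSE = {sse:.3f}, t_stat = {t_stat:.3f}")
```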

Critical t Value

• If t_stat > t_crit: reject the null hypothesis that the slope, b_1, is equal to the known value, β.

• If t_stat ≤ t_crit: fail to reject the null hypothesis.

• Get t_crit from a t-table or Excel (see example).

• Degrees of freedom, DOF = N − 2
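Without Excel or a t-table at hand, a two-tailed critical value can be approximated numerically. The sketch below is one standard-library approach, not how Excel computes it: it integrates the Student t density with Simpson's rule and finds the critical value by bisection. The function names are hypothetical.

```python
import math

def t_pdf(t, dof):
    """Probability density of Student's t distribution."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1 + t * t / dof) ** (-(dof + 1) / 2)

def two_tailed_p(t, dof, steps=2000):
    """P(|T| > t), using Simpson's rule on the density from 0 to t."""
    h = t / steps
    total = t_pdf(0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    central = 2 * total * h / 3   # P(-t < T < t), by symmetry
    return 1 - central

def t_crit(alpha, dof):
    """Two-tailed critical t value, found by bisection on two_tailed_p."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, dof) > alpha:
            lo = mid   # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(f"t_crit(alpha=0.05, DOF=18) = {t_crit(0.05, 18):.3f}")
```

If SciPy is available, a library inverse CDF for the t distribution would be the more practical choice; the hand-rolled version is shown only to make the definition concrete.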

Example

• We are comparing b_1 = 3.22 (first example in lecture) to β = π.

• Get SSE = 85.954 from regression output.

• Calculate: t_stat = 0.952

• Choose α = 0.05.

• DOF = 20 – 2 = 18.

• In Excel, calculate TINV(α,DOF), which returns the value t_crit = 2.101 when α = 0.05 and DOF = 18.

• Since t_stat ≤ t_crit (0.952 < 2.101), we fail to reject the null hypothesis.

• Conclusion? We cannot say with 95% confidence that b_1 is not equal to β.

Example

• Choose α = 0.40.

• DOF = 20 – 2 = 18.

• In Excel, calculate TINV(α,DOF), which returns the value t_crit = 0.86 when α = 0.40 and DOF = 18.

• Since t_crit ≤ t_stat (0.86 < 0.952), we reject the null hypothesis.

• Conclusion? We can say with 60% confidence that b_1 is not equal to β.

• Hmmm…that’s a coin flip.
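The decision rule applied in both examples can be written as a small Python sketch. The t_stat and t_crit values are the ones quoted in this lecture; the function name `slope_test` is hypothetical.

```python
def slope_test(t_stat, t_crit):
    """Apply the rule: reject H0 only when t_stat exceeds t_crit."""
    return "reject H0" if t_stat > t_crit else "fail to reject H0"

t_stat = 0.952  # from the lecture example

# (alpha, t_crit) pairs quoted in the lecture for DOF = 18
for alpha, t_crit in [(0.05, 2.101), (0.40, 0.86)]:
    decision = slope_test(t_stat, t_crit)
    print(f"alpha = {alpha:.2f}: t_crit = {t_crit}, {decision}")
```

The data and t_stat are identical in both runs; only the chosen α changes the conclusion.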