Transcript Chapter 12
Business Statistics, 3e, by Ken Black
Chapter 12: Simple Regression & Correlation Analysis
Business Statistics: Contemporary Decision Making, 3e, by Black. © 2001 South-Western/Thomson Learning

Learning Objectives
• Compute the equation of a simple regression line from a sample of data, and interpret the slope and intercept of the equation.
• Understand the usefulness of residual analysis in testing the assumptions underlying regression analysis and in examining the fit of the regression line to the data.
• Compute a standard error of the estimate and interpret its meaning.
• Compute a coefficient of determination and interpret it.
• Test hypotheses about the slope of the regression model and interpret the results.
• Estimate values of Y using the regression model.
• Compute a coefficient of correlation and interpret it.

Correlation and Regression
• Correlation is a measure of the degree of relatedness of two variables.
• Regression analysis is the process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable.

Simple Regression Analysis
• Bivariate (two-variable) linear regression: the most elementary regression model
  – dependent variable: the variable to be predicted, usually called Y
  – independent variable: the predictor or explanatory variable, usually called X

Airline Cost Data

Number of Passengers (X)    Cost in $1,000 (Y)
61                          4.28
63                          4.08
67                          4.42
69                          4.17
70                          4.48
74                          4.30
76                          4.82
81                          4.70
86                          5.11
91                          5.13
95                          5.64
97                          5.56

Scatter Plot of Airline Cost Data
[Figure: scatter plot of Cost ($1,000), axis 0 to 6, against Number of Passengers, axis 0 to 120.]

Regression Models
Deterministic regression model:
$$Y = \beta_0 + \beta_1 X$$
Probabilistic regression model:
$$Y = \beta_0 + \beta_1 X + \epsilon$$
• $\beta_0$ and $\beta_1$ are population parameters.
• $\beta_0$ and $\beta_1$ are estimated by the sample statistics $b_0$ and $b_1$.

Equation of the Simple Regression Line
$$\hat{Y} = b_0 + b_1 X$$
where:
$b_0$ = the sample intercept
$b_1$ = the sample slope
$\hat{Y}$ = the predicted value of Y

Least Squares Analysis
$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$$
$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$

Least Squares Analysis
$$SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$
$$SS_{XX} = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$
$$b_1 = \frac{SS_{XY}}{SS_{XX}}$$
$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$

Solving for b1 and b0 of the Regression Line for the Airline Cost Example (Part 1)

Number of Passengers (X)    Cost in $1,000 (Y)    X²        XY
61                          4.28                  3,721     261.08
63                          4.08                  3,969     257.04
67                          4.42                  4,489     296.14
69                          4.17                  4,761     287.73
70                          4.48                  4,900     313.60
74                          4.30                  5,476     318.20
76                          4.82                  5,776     366.32
81                          4.70                  6,561     380.70
86                          5.11                  7,396     439.46
91                          5.13                  8,281     466.83
95                          5.64                  9,025     535.80
97                          5.56                  9,409     539.32
ΣX = 930                    ΣY = 56.69            ΣX² = 73,764    ΣXY = 4,462.22

Solving for b1 and b0 of the Regression Line for the Airline Cost Example (Part 2)
$$SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 4{,}462.22 - \frac{(930)(56.69)}{12} = 68.745$$
$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n} = 73{,}764 - \frac{(930)^2}{12} = 1689$$
$$b_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{68.745}{1689} = .0407$$
$$b_0 = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n} = \frac{56.69}{12} - (.0407)\frac{930}{12} = 1.57$$
$$\hat{Y} = 1.57 + .0407X$$

Graph of Regression Line for the Airline Cost Example
[Figure: fitted regression line over the scatter plot of Cost ($1,000), axis 0 to 6, against Number of Passengers, axis 0 to 120.]

Residual Analysis: Airline Cost Example

Number of Passengers (X)    Cost in $1,000 (Y)    Predicted Value (Ŷ)    Residual (Y − Ŷ)
61                          4.28                  4.053                   .227
63                          4.08                  4.134                  -.054
67                          4.42                  4.297                   .123
69                          4.17                  4.378                  -.208
70                          4.48                  4.419                   .061
74                          4.30                  4.582                  -.282
76                          4.82                  4.663                   .157
81                          4.70                  4.867                  -.167
86                          5.11                  5.070                   .040
91                          5.13                  5.274                  -.144
95                          5.64                  5.436                   .204
97                          5.56                  5.518                   .042
                                                                         Σ(Y − Ŷ) = -.001

Excel Graph of Residuals for the Airline Cost Example
[Figure: residuals, axis -0.3 to 0.2, plotted against Number of Passengers, axis 60 to 100.]

Nonlinear Residual Plot
[Figure: residuals plotted against X around a zero line.]

Nonconstant Error Variance
[Figure: two residual plots against X, each around a zero line.]

Graphs of Nonindependent Error Terms
[Figure: two residual plots against X, each around a zero line.]

Healthy Residual Plot
[Figure: residuals plotted against X around a zero line.]
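The least squares steps for the airline cost example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `least_squares` and the variable names are hypothetical (they do not come from the slides), and the code keeps unrounded coefficients, so its residuals can differ from the slide table in the third decimal place.

```python
# Airline cost data from the example (X = passengers, Y = cost in $1,000).
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]


def least_squares(x, y):
    """Return (b0, b1) from the SSXY / SSXX computational formulas.

    Hypothetical helper written for this illustration only.
    """
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # SSXY = sum(XY) - (sum X)(sum Y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    # SSXX = sum(X^2) - (sum X)^2 / n
    ss_xx = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(passengers, cost)

# Predicted values and residuals (Y - Y-hat) for every observation.
predicted = [b0 + b1 * xi for xi in passengers]
residuals = [yi - yhat for yi, yhat in zip(cost, predicted)]

print(f"b0 = {b0:.4f}, b1 = {b1:.6f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

Rounded to the precision used on the slides, the coefficients should agree with b0 = 1.57 and b1 = .0407, and the residuals should sum to approximately zero, as least squares residuals always do.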
Standard Error of the Estimate
Sum of squares error:
$$SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY$$
Standard error of the estimate:
$$S_e = \sqrt{\frac{SSE}{n - 2}}$$

Determining SSE for the Airline Cost Example

Number of Passengers (X)    Cost in $1,000 (Y)    Residual (Y − Ŷ)    (Y − Ŷ)²
61                          4.28                   .227               .05153
63                          4.08                  -.054               .00292
67                          4.42                   .123               .01513
69                          4.17                  -.208               .04326
70                          4.48                   .061               .00372
74                          4.30                  -.282               .07952
76                          4.82                   .157               .02465
81                          4.70                  -.167               .02789
86                          5.11                   .040               .00160
91                          5.13                  -.144               .02074
95                          5.64                   .204               .04162
97                          5.56                   .042               .00176
                                                  Σ(Y − Ŷ) = -.001    Σ(Y − Ŷ)² = .31434

Sum of squares of error = SSE = .31434

Standard Error of the Estimate for the Airline Cost Example
Sum of squares error:
$$SSE = \sum (Y - \hat{Y})^2 = 0.31434$$
Standard error of the estimate:
$$S_e = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{0.31434}{10}} = 0.1773$$

Coefficient of Determination
$$SS_{YY} = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}$$
$$SS_{YY} = \text{explained variation} + \text{unexplained variation}$$
$$SS_{YY} = SSR + SSE$$
$$1 = \frac{SSR}{SS_{YY}} + \frac{SSE}{SS_{YY}}$$
$$r^2 = \frac{SSR}{SS_{YY}} = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{SSE}{\sum Y^2 - \frac{(\sum Y)^2}{n}}$$
$$0 \le r^2 \le 1$$

Coefficient of Determination for the Airline Cost Example
$$SSE = 0.31434$$
$$SS_{YY} = \sum Y^2 - \frac{(\sum Y)^2}{n} = 270.9251 - \frac{(56.69)^2}{12} = 3.11209$$
$$r^2 = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{.31434}{3.11209} = .899$$
89.9% of the variability of the cost of flying a Boeing 737 is accounted for by the number of passengers.

Hypothesis Tests for the Slope of the Regression Model
Two-tailed test:
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \ne 0$$
One-tailed tests:
$$H_0: \beta_1 \le 0 \qquad H_1: \beta_1 > 0$$
$$H_0: \beta_1 \ge 0 \qquad H_1: \beta_1 < 0$$
Test statistic:
$$t = \frac{b_1 - \beta_1}{S_b}$$
where:
$$S_b = \frac{S_e}{\sqrt{SS_{XX}}}$$
$$S_e = \sqrt{\frac{SSE}{n - 2}}$$
$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$
$\beta_1$ = the hypothesized slope
df = n − 2
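As a rough check on the formulas for SSE, the standard error of the estimate, the coefficient of determination, and the t statistic for the slope, the following sketch recomputes them from the airline cost data. All variable names are hypothetical. Because the code keeps unrounded coefficients, it gives SSE of about 0.3141 rather than the slide's .31434, presumably because the slide squares residuals that were already rounded to three decimals. The constant 2.228 is the t value for 10 degrees of freedom that the document uses for its 95% interval.

```python
import math

# Airline cost data from the example (X = passengers, Y = cost in $1,000).
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]

n = len(passengers)
sum_x, sum_y = sum(passengers), sum(cost)

# Sums of squares from the computational formulas.
ss_xx = sum(x ** 2 for x in passengers) - sum_x ** 2 / n
ss_yy = sum(y ** 2 for y in cost) - sum_y ** 2 / n
ss_xy = sum(x * y for x, y in zip(passengers, cost)) - sum_x * sum_y / n

# Least squares coefficients (unrounded).
b1 = ss_xy / ss_xx
b0 = sum_y / n - b1 * sum_x / n

# SSE = sum of squared residuals; Se = sqrt(SSE / (n - 2)).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(passengers, cost))
s_e = math.sqrt(sse / (n - 2))

# Coefficient of determination: r^2 = 1 - SSE / SSYY.
r_squared = 1 - sse / ss_yy

# t statistic for H0: beta1 = 0, with Sb = Se / sqrt(SSXX) and df = n - 2.
hypothesized_slope = 0.0
s_b = s_e / math.sqrt(ss_xx)
t_stat = (b1 - hypothesized_slope) / s_b

T_CRITICAL = 2.228  # two-tailed, alpha = .05, df = 10 (value used in the document)
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"SSE = {sse:.4f}, Se = {s_e:.4f}, r^2 = {r_squared:.3f}")
print(f"Sb = {s_b:.6f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The t statistic lands far beyond the critical value, so under these assumptions the null hypothesis of a zero slope would be rejected at the .05 level.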
Point Estimation for the Airline Cost Example
$$\hat{Y} = 1.57 + 0.0407X$$
For X = 73,
$$\hat{Y} = 1.57 + 0.0407(73) = 4.5411 \text{ or } \$4{,}541.10$$

Confidence Interval to Estimate the Average Value of Y for a Given X: Airline Cost Example
$$\hat{Y} \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{XX}}}$$
where:
$X_0$ = a particular value of X
$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$
For $X_0 = 73$ and a 95% confidence level,
$$4.5411 \pm (2.228)(0.1773)\sqrt{\frac{1}{12} + \frac{(73 - 77.5)^2}{73{,}764 - \frac{(930)^2}{12}}}$$
$$4.5411 \pm .1220$$
$$4.4191 \le E(Y_{73}) \le 4.6631$$

Confidence Interval to Estimate the Average Value of Y for some Values of X: Airline Cost Example

X     Confidence Interval (Ŷ ± half-width)    Interval
62    4.0934 ± .1876                          3.9058 to 4.2810
68    4.3376 ± .1461                          4.1915 to 4.4837
73    4.5411 ± .1220                          4.4191 to 4.6631
85    5.0295 ± .1349                          4.8946 to 5.1644
90    5.2330 ± .1656                          5.0674 to 5.3986

Prediction Interval to Estimate Y for a Given Value of X
$$\hat{Y} \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{XX}}}$$
where:
$X_0$ = a particular value of X
$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$

Confidence Intervals for Estimation
[Figure: regression plot of Cost, axis 4 to 6, against Number of Passengers, axis 60 to 100, showing the regression line with 95% CI and 95% PI bands.]
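The confidence interval and prediction interval formulas differ only by the extra 1 under the square root, which a small sketch can make concrete. The function name `interval` and its `single_value` flag are hypothetical; the constants are the rounded values quoted in the document (b0 = 1.57, b1 = 0.0407, Se = 0.1773, t = 2.228 for df = 10), so the results match the slides only to about four decimals.

```python
import math

# Summary values quoted in the airline cost example.
N = 12            # number of observations
SUM_X = 930       # sum of X
SUM_X2 = 73_764   # sum of X squared
B0, B1 = 1.57, 0.0407   # rounded regression coefficients
S_E = 0.1773      # standard error of the estimate
T_CRIT = 2.228    # t for alpha/2 = .025, df = n - 2 = 10


def interval(x0, single_value=False):
    """Return (lower, upper) bounds around the point estimate at x0.

    single_value=False gives the confidence interval for the average Y;
    single_value=True gives the wider prediction interval for one Y.
    Hypothetical helper written for this illustration only.
    """
    x_bar = SUM_X / N
    ss_xx = SUM_X2 - SUM_X ** 2 / N
    y_hat = B0 + B1 * x0
    extra = 1.0 if single_value else 0.0
    half_width = T_CRIT * S_E * math.sqrt(
        extra + 1 / N + (x0 - x_bar) ** 2 / ss_xx
    )
    return y_hat - half_width, y_hat + half_width


ci_low, ci_high = interval(73)
pi_low, pi_high = interval(73, single_value=True)
print(f"95% CI for E(Y) at X = 73: {ci_low:.4f} to {ci_high:.4f}")
print(f"95% PI for Y at X = 73:    {pi_low:.4f} to {pi_high:.4f}")
```

For X0 = 73 the confidence interval should reproduce the 4.4191 to 4.6631 range from the example, and the prediction interval is necessarily wider because it also accounts for the scatter of a single observation around the line.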
MINITAB Regression Analysis of the Airline Cost Example

The regression equation is
Cost = 1.57 + 0.0407 Number of Passengers

Predictor    Coef        StDev       T       P
Constant     1.5698      0.3381      4.64    0.001
Number o     0.040702    0.004312    9.44    0.000

S = 0.1772    R-Sq = 89.9%    R-Sq(adj) = 88.9%

Analysis of Variance
Source            DF    SS        MS        F        P
Regression         1    2.7980    2.7980    89.09    0.000
Residual Error    10    0.3141    0.0314
Total             11    3.1121

Obs    Number o    Cost      Fit       StDev Fit    Residual    St Resid
1      61.0        4.2800    4.0526    0.0876        0.2274      1.48
2      63.0        4.0800    4.1340    0.0808       -0.0540     -0.34
3      67.0        4.4200    4.2968    0.0683        0.1232      0.75
4      69.0        4.1700    4.3782    0.0629       -0.2082     -1.26
5      70.0        4.4800    4.4189    0.0605        0.0611      0.37
6      74.0        4.3000    4.5817    0.0533       -0.2817     -1.67
7      76.0        4.8200    4.6631    0.0516        0.1569      0.93
8      81.0        4.7000    4.8666    0.0533       -0.1666     -0.99
9      86.0        5.1100    5.0701    0.0629        0.0399      0.24
10     91.0        5.1300    5.2736    0.0775       -0.1436     -0.90
11     95.0        5.6400    5.4364    0.0912        0.2036      1.34
12     97.0        5.5600    5.5178    0.0984        0.0422      0.29

Pearson Product-Moment Correlation Coefficient
$$r = \frac{SS_{XY}}{\sqrt{(SS_{XX})(SS_{YY})}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$$
$$-1 \le r \le 1$$

Three Degrees of Correlation
[Figure: three scatter plots illustrating r < 0, r > 0, and r = 0.]

Computation of r for the Economics Example (Part 1)

Day           Interest (X)    Futures Index (Y)    X²         Y²         XY
1             7.43            221                  55.205     48,841     1,642.03
2             7.48            222                  55.950     49,284     1,660.56
3             8.00            226                  64.000     51,076     1,808.00
4             7.75            225                  60.063     50,625     1,743.75
5             7.60            224                  57.760     50,176     1,702.40
6             7.63            223                  58.217     49,729     1,701.49
7             7.68            223                  58.982     49,729     1,712.64
8             7.67            226                  58.829     51,076     1,733.42
9             7.59            226                  57.608     51,076     1,715.34
10            8.07            235                  65.125     55,225     1,896.45
11            8.03            233                  64.481     54,289     1,870.99
12            8.00            241                  64.000     58,081     1,928.00
Summations    92.93           2,725                720.220    619,207    21,115.07

Computation of r for the Economics Example (Part 2)
$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} = \frac{21{,}115.07 - \frac{(92.93)(2725)}{12}}{\sqrt{\left[720.22 - \frac{(92.93)^2}{12}\right]\left[619{,}207 - \frac{(2725)^2}{12}\right]}} = .815$$

Scatter Plot and Correlation Matrix for the Economics Example
[Figure: scatter plot of Futures Index, axis 220 to 245, against Interest, axis 7.40 to 8.20.]

                 Interest    Futures Index
Interest         1
Futures Index    0.815254    1

Covariance
$$\sigma^2_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{N} = \frac{SS_{XY}}{N}$$

Covariance Matrix and Descriptive Statistics for the Economics Example

                 Interest    Futures Index
Interest         0.050408
Futures Index    1.11053     36.81060606

Statistic                   Interest       Futures Index
Mean                        7.74416667     227.08
Standard Error              0.06481276     1.7514
Median                      7.675          225.5
Mode                        8              226
Standard Deviation          0.224518       6.0672
Sample Variance             0.05040833     36.811
Kurtosis                    -1.4077097     1.2427
Skewness                    0.3197374      1.3988
Range                       0.64           20
Minimum                     7.43           221
Maximum                     8.07           241
Sum                         92.93          2725
Count                       12             12
Confidence Level (95.0%)    0.14265201     3.8549
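The correlation and covariance calculations for the economics example can be reproduced with a short sketch. Variable names are hypothetical. The code computes the covariance two ways: dividing SSXY by N, as in the covariance formula, and dividing by n − 1; with this data the n − 1 version is the one that matches the 1.11053 entry in the covariance matrix.

```python
import math

# Economics example data: interest rate (X) and futures index (Y) for 12 days.
interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]

n = len(interest)
sum_x, sum_y = sum(interest), sum(futures)

# Sums of squares and cross products from the computational formulas.
ss_xy = sum(x * y for x, y in zip(interest, futures)) - sum_x * sum_y / n
ss_xx = sum(x ** 2 for x in interest) - sum_x ** 2 / n
ss_yy = sum(y ** 2 for y in futures) - sum_y ** 2 / n

# Pearson product-moment correlation coefficient.
r = ss_xy / math.sqrt(ss_xx * ss_yy)

# Covariance with the two common denominators.
cov_population = ss_xy / n        # SSXY / N, as in the covariance formula
cov_sample = ss_xy / (n - 1)      # SSXY / (n - 1)

print(f"r = {r:.6f}")
print(f"covariance (divide by N)     = {cov_population:.5f}")
print(f"covariance (divide by n - 1) = {cov_sample:.5f}")
```

The correlation should agree with the .815 result of the hand computation and the 0.815254 entry in the correlation matrix.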