Transcript Chapter 11
Chapter 11 Correlation and Regression
© The McGraw-Hill Companies, Inc., 2000

Outline
11-1 Introduction
11-2 Scatter Plots
11-3 Correlation
11-4 Regression
11-5 Coefficient of Determination and Standard Error of Estimate

Objectives
Draw a scatter plot for a set of ordered pairs.
Find the correlation coefficient.
Test the hypothesis H0: ρ = 0.
Find the equation of the regression line.
Find the coefficient of determination.
Find the standard error of estimate.

11-2 Scatter Plots
A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.

11-2 Scatter Plots - Example
Construct a scatter plot for the data obtained in a study of age and systolic blood pressure of six randomly selected subjects. The data are given in the following table.

Subject   Age, x   Pressure, y
A         43       128
B         48       120
C         56       135
D         61       143
E         67       141
F         70       152

[Figure: scatter plot of pressure (y) against age (x), showing a positive relationship.]

11-2 Scatter Plots - Other Examples
[Figure: scatter plot of final grade (y) against number of absences (x), showing a negative relationship.]
[Figure: scatter plot of y against x, showing no relationship.]

11-3 Correlation Coefficient
The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables.
Sample correlation coefficient: r. Population correlation coefficient: ρ.

11-3 Range of Values for the Correlation Coefficient
The value of r ranges from -1 to +1. Values near -1 indicate a strong negative relationship, values near 0 indicate no linear relationship, and values near +1 indicate a strong positive relationship.

11-3 Formula for the Correlation Coefficient
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
where n is the number of data pairs.

11-3 Correlation Coefficient - Example
Compute the correlation coefficient for the age and blood pressure data.
$\sum x = 345$, $\sum y = 819$, $\sum xy = 47\,634$, $\sum x^2 = 20\,399$, $\sum y^2 = 112\,443$.
Substituting in the formula for r gives r = 0.897 (verify).

11-3 The Significance of the Correlation Coefficient
The population correlation coefficient, ρ, is the correlation between all possible pairs of data values (x, y) taken from a population.
H0: ρ = 0
H1: ρ ≠ 0
This tests for a significant correlation between the variables in the population.

11-3 Formula for the t Test for the Correlation Coefficient
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
with d.f. = n - 2.

11-3 Example
Test the significance of the correlation coefficient for the age and blood pressure data. Use α = 0.05 and r = 0.897.
Step 1: State the hypotheses. H0: ρ = 0 and H1: ρ ≠ 0.
Step 2: Find the critical values. Since α = 0.05 and there are 6 - 2 = 4 degrees of freedom, the critical values are t = +2.776 and t = -2.776.
Step 3: Compute the test value. t = 4.059 (verify).
Step 4: Make the decision. Reject the null hypothesis, since the test value falls in the critical region (4.059 > 2.776).
Step 5: Summarize the results.
There is a significant relationship between the variables of age and blood pressure.

11-4 Regression
The scatter plot for the age and blood pressure data displays a linear pattern. We can model this relationship with a straight line, called the regression line or the line of best fit. The equation of the line is y = a + bx.

11-4 Formulas for the Regression Line y = a + bx
$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\sum x^2 - \left(\sum x\right)^2}$$
$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$
where a is the y intercept and b is the slope of the line.

11-4 Example
Find the equation of the regression line for the age and blood pressure data.
Substituting into the formulas gives a = 81.048 and b = 0.964 (verify). Hence, y = 81.048 + 0.964x.
[Figure: scatter plot of pressure against age with the regression line y = 81.048 + 0.964x.]

11-4 Using the Regression Line to Predict
The regression line can be used to predict a value for the dependent variable (y) for a given value of the independent variable (x).
Caution: Use x values within the experimental region when predicting y values.

11-4 Example
Use the equation of the regression line to predict the blood pressure for a person who is 50 years old.
Since y = 81.048 + 0.964x, then y = 81.048 + 0.964(50) = 129.248 ≈ 129.2.
Note that the value of 50 is within the range of x values.

11-5 Coefficient of Determination and Standard Error of Estimate
The coefficient of determination, denoted by r², is a measure of the variation of the dependent variable that is explained by the regression line and the independent variable.
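As a check on the worked examples for the age and blood pressure data, the following Python sketch recomputes r, the t test value, the regression coefficients, and r² from the summation formulas in sections 11-3 and 11-4. The function and variable names are illustrative and are not part of the original slides; only the standard library is used.

```python
import math

# Age (x) and systolic blood pressure (y) for subjects A-F (section 11-2).
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def sums(x, y):
    """Return n and the five sums that the chapter's formulas use."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    return n, sx, sy, sxy, sxx, syy


def correlation(x, y):
    """Sample correlation coefficient r (section 11-3 formula)."""
    n, sx, sy, sxy, sxx, syy = sums(x, y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx**2) * (n * syy - sy**2)
    )


def t_value(r, n):
    """Test value for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r**2))


def regression_line(x, y):
    """Intercept a and slope b of y = a + bx (section 11-4 formulas)."""
    n, sx, sy, sxy, sxx, _ = sums(x, y)
    denom = n * sxx - sx**2
    a = (sy * sxx - sx * sxy) / denom
    b = (n * sxy - sx * sy) / denom
    return a, b


r = correlation(ages, pressures)
t = t_value(r, len(ages))
a, b = regression_line(ages, pressures)
predicted_at_50 = a + b * 50

print(f"r = {r:.3f}, r^2 = {r * r:.3f}, t = {t:.3f}")
print(f"y = {a:.3f} + {b:.3f}x, predicted pressure at age 50 = {predicted_at_50:.1f}")
```

Because the sketch carries full precision throughout, the t value and the prediction may differ in the last digit from the slide values, which start from the rounded r, a, and b.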
r² is the square of the correlation coefficient. The coefficient of nondetermination is (1 - r²).
Example: If r = 0.90, then r² = 0.81.

The standard error of estimate, denoted by s_est, is the standard deviation of the observed y values about the predicted y values.

11-5 Formula for the Standard Error of Estimate
$$s_{est} = \sqrt{\frac{\sum \left(y - y'\right)^2}{n-2}}$$
or
$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n-2}}$$
where y' is the value of y predicted by the regression line.

11-5 Standard Error of Estimate - Example
From the regression equation y = 55.57 + 8.13x and n = 6, find s_est.
Here, a = 55.57, b = 8.13, and n = 6. Substituting into the formula gives s_est = 6.48 (verify).
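The data set behind the example above is not reproduced in this transcript, so the value 6.48 cannot be rechecked here. As an illustration only, the following Python sketch applies both forms of the standard error formula to the age and blood pressure data from section 11-2 instead. The function names are hypothetical and not from the slides.

```python
import math

# Age (x) and systolic blood pressure (y) for subjects A-F (section 11-2).
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def fit_line(x, y):
    """Intercept a and slope b of y = a + bx (section 11-4 formulas)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    denom = n * sxx - sx**2
    return (sy * sxx - sx * sxy) / denom, (n * sxy - sx * sy) / denom


def s_est_from_residuals(x, y, a, b):
    """First form: square root of sum((y - y')^2) / (n - 2)."""
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))


def s_est_from_sums(x, y, a, b):
    """Second form: square root of (sum(y^2) - a*sum(y) - b*sum(xy)) / (n - 2)."""
    syy = sum(yi * yi for yi in y)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return math.sqrt((syy - a * sy - b * sxy) / (len(x) - 2))


a, b = fit_line(ages, pressures)
s_resid = s_est_from_residuals(ages, pressures, a, b)
s_sums = s_est_from_sums(ages, pressures, a, b)

print(f"s_est from residuals = {s_resid:.2f}")
print(f"s_est from sums      = {s_sums:.2f}")
```

The two forms agree when a and b are kept at full precision. The second form subtracts large, nearly equal quantities, so rounding a and b before substituting can shift the result noticeably.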