Transcript File
Linear Regression One Double Whopper with cheese provides 53 grams of protein, 1020 calories, and 65 grams of fat. The correlation between Fat and Protein for 30 of the items on the Burger King menu is 0.83 The fat content and calories for a Double Whopper are extreme, but are they an outlier? The association between protein and fat is a positive linear association with a fairly strong correlation of 0.83 If you want 25 grams of protein from a BK item, how much fat should you expect to consume? We can use a linear model to predict values. Regression Line 𝑦 = 𝑏0 + 𝑏1 𝑥 𝑏1 = 𝑠𝑙𝑜𝑝𝑒; 𝑏0 = y-intercept (Also known as the least squares line or line of best fit) Slope b1 r sy sx y-intercept b 0 y b1 x “Putting a hat on it” is notation to indicate that something has been predicted by a model. 𝑦 𝑖𝑠 𝑟𝑒𝑎𝑑 𝑎𝑠 "𝑦 ℎ𝑎𝑡" Before using a regression model, we need to check the same conditions as we do for correlation: Quantitative variables Straight pattern Check for outliers This model says that our predictions follow a straight line. The equation is given in slope-intercept form. For the association of fat and protein of Burger King items, the estimated linear model is: 𝑓𝑎𝑡 = 6.8 + 0.97𝑝𝑟𝑜𝑡𝑒𝑖𝑛 Example 𝑓𝑎𝑡 = 6.8 + 0.97𝑝𝑟𝑜𝑡𝑒𝑖𝑛 What is the predicted amount of fat for the BK Broiler chicken sandwich, which has 30 grams of protein? 𝑓𝑎𝑡 = 6.8 + 0.97 30 = 35.9𝑔 If we convert the data into z-scores, the scatterplot shifts to the origin. The origin is where both z-scores are 0. A zscore of 0 would happen at the mean. When the variables are standardized, the slope of the line turns out to be r. Moving one standard deviation away from the mean in one variable moves our estimate r standard devations from the mean in the other variable. Example A scatterplot of house price (in thousands of dollars) vs. house size (in thousands of square feet) for houses sold recently in Saratoga, NY shows a relationship that is straight, with only moderate scatter and no outliers. The correlation between Price and Size is 0.77. 1)You go to an open house and find that the house is 1 standard deviation above the mean in size. What would you guess about its price? 2)You read an ad for a house priced 2 standard deviations below the mean. What would you guess about its size? 3)A friend tells you about a house whose size in square meters is 1.5 standard deviations above the mean. What would you guess about its size in square feet? Answers 1)You should expect the price to be 0.77 standard deviations above the mean. 2)You should expect the size to be 2(0.77) = 1.54 standard deviations below the mean. 3)The home is 1.5 standard deviations above the mean in size no matter how it is measured. Residuals We predict that a BK Broiler chicken sandwich with 30 grams of protein should have 36 grams of fat, but it actually only has 25 grams of fat. The difference between the observed value and the predicted value is called the residual. 𝑦 − 𝑦 = 25 − 36 = −11 𝑔 A negative residual means the observed value is below the prediction on the regression line. A positive residual means the observed value is above the prediction on the regression line. If we keep the x-values and replace the y-values with the residuals, the resulting scatterplot has no pattern or direction. Example The linear model for Saratoga homes uses the Size and Price: 𝑃𝑟𝑖𝑐𝑒 = −3.117 + 94.454𝑆𝑖𝑧𝑒. 1)What does the slope of 94.454 mean? 2)What are the units of the slope? 3)Your house is 2000 sq ft bigger than your neighbor’s house. How much more do you expect it to be worth? 4)Is the y-intercept of -3.117 meaningful? Answers 1)An increase in home size of 1000 sq ft is associated with an increase in price of $94,454 2)Units are in thousands of dollars per thousand square feet 3)About $188,908 4)No… You can’t have a zero square ft. house Example The linear model for Saratoga homes uses the Size and Price: 𝑃𝑟𝑖𝑐𝑒 = −3.117 + 94.454𝑆𝑖𝑧𝑒. Suppose you’re thinking of buying a home there. 1)Would you prefer to find a home with a negative or positive residual? Explain. 2)You plan to look for a home of about 3000 square feet. How much should you expect to have to pay? 3)You find a nice home that size which is selling for $300,000. What’s the residual? Answers 1)Negative; that indicated it’s priced lower than a typical home of its size. 2)$280,250 3)$19,755 Today’s Assignment Add to Homework #5: p. 192 #1-6, 11, 12