#### Transcript Simple Regression

Major Questions
• Given an economic model involving a relationship between two economic variables, how do we go about specifying the corresponding statistical model?
• Given the statistical model and a sample of data on two economic variables, how do we use this information?

Main Points
• Identify relationships between economic variables
• Answer questions like: if one variable changes in some way, by how much does another variable change?
• Move from studying one economic variable to studying two

Specific Example
• Extend the household food expenditure example
• Let the population of interest be all households with three members, whatever their income level
• Can now look at what happens to food expenditure as income rises or falls

Economic Model
• $y$: household expenditure on food
• $x$: household income
• Economic model (general form): $y = f(x)$
• Specifies that household expenditure on food is some function of household income

Relationship between y and x
• Need to quantify the change in food expenditure that occurs when income changes
• Must be more precise about the nature of the relationship between $x$ and $y$
• Many possible forms; sometimes theory provides some guide
• Simplest form: $y = a + bx$ (linear)

Statistical Model: Error Terms
• Economic model is an approximation
• Need to account for other factors that affect the relationship between economic variables
• Add an unobservable error term $e$: $y = a + bx + e$

The error term captures:
1. The combined effect of other influences on $y$
2. Approximation error from the functional form
3. Elements of random behavior by individuals

Adding Data
• Suppose we have observations of $y$ and $x$ from $i = 1, 2, \ldots, n$ households: $y_i = a + b x_i + e_i$
• $y$: dependent or response variable
• $x$: explanatory variable
• Level of household expenditure on food is related to the level of household income

Method of Least Squares
• The parameters $a$ and $b$ tell us about the relationship between $y$ and $x$
• We need a rule that tells us how to use the sample data to estimate the parameters
• We use the least squares method: find the line for which the sum of the squares of the vertical distances from each point to the line is as small as possible

Residual Errors
[Figure: scatter plot with the fitted line $y = a + bx$. At each $x_i$, the residual $e_i$ is the vertical distance between the observed $y_i$ and the expected (fitted) value of $y_i$ on the line.]
• There will be $n$ of these
• They depend on the fitted line as defined by the specific values of $a$ and $b$
• Their squares can be summed:

$$\sum e_i^2 = \sum (y_i - a - b x_i)^2$$

Normal equations
Setting the partial derivatives of this sum with respect to $a$ and $b$ equal to zero gives

$$\sum y = na + b \sum x$$

$$\sum xy = a \sum x + b \sum x^2$$

• Two equations in two unknowns
• They can be solved for a unique solution (provided the $x$ values are not all equal)

Formulae for the regression coefficients

$$b = \frac{\sum xy - n \bar{x} \bar{y}}{\sum x^2 - n \bar{x}^2}$$

$$a = \frac{\sum y - b \sum x}{n}$$

Correlation
The correlation between two random variables $X$ and $Y$ measures the strength of the linear relationship between them.

Coefficient of determination
The coefficient of determination is a statistic which measures the extent to which the variation in $Y$ is explained by the regression line of $Y$ on $X$. It is denoted by $r^2$.

[Figure: scatter plot showing the regression line $y = a + bx$ and the horizontal line $y = \bar{y}$.]

The quantity $\sum (y - \hat{y})^2$ is known as the unexplained variation, since the deviations $(y - \hat{y})$ are completely random and thus unpredictable.

The coefficient of determination is given by

$$r^2 = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2}$$

where the numerator is the explained variation and the denominator is the total variation. Equivalently, in terms of sums over the data,

$$r^2 = \frac{a \sum y + b \sum xy - n \bar{y}^2}{\sum y^2 - n \bar{y}^2}$$

Correlation coefficient
The linear product-moment correlation coefficient is given by

$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[ n \sum x^2 - \left( \sum x \right)^2 \right] \left[ n \sum y^2 - \left( \sum y \right)^2 \right]}}$$
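
As a rough illustration of how the least squares formulae, the coefficient of determination, and the correlation coefficient fit together, here is a minimal Python sketch. The function names and the small income and food expenditure sample are made up for illustration and do not come from the slides.

```python
from math import sqrt


def fit_line(x, y):
    """Least squares estimates (a, b) for the line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: (sum xy - n*x_bar*y_bar) / (sum x^2 - n*x_bar^2)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    # Intercept: y_bar - b*x_bar, the same as (sum y - b*sum x) / n
    a = y_bar - b * x_bar
    return a, b


def r_squared(x, y, a, b):
    """Explained variation divided by total variation."""
    y_bar = sum(y) / len(y)
    explained = sum((a + b * xi - y_bar) ** 2 for xi in x)
    total = sum((yi - y_bar) ** 2 for yi in y)
    return explained / total


def correlation(x, y):
    """Linear product-moment correlation coefficient r."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical sample: household income (x) and food expenditure (y).
income = [1, 2, 3, 4, 5]
food = [2, 4, 5, 4, 5]

a, b = fit_line(income, food)
r2 = r_squared(income, food, a, b)
r = correlation(income, food)

print("a =", round(a, 4), "b =", round(b, 4))
print("r^2 =", round(r2, 4), "r =", round(r, 4))
```

For this made-up sample the fitted line is approximately y = 2.2 + 0.6x with r² close to 0.6, and squaring the correlation coefficient gives the same value as the coefficient of determination, which holds for any simple linear regression with an intercept.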