Chapter 13 Nonlinear and Multiple Regression
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc.

13.1 Aptness of the Model and Model Checking

Standardized Residuals
The standardized residuals are given by
$e_i^* = \dfrac{y_i - \hat{y}_i}{s \sqrt{1 - \dfrac{1}{n} - \dfrac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}}}, \qquad i = 1, \ldots, n$

Diagnostic Plots
The basic plots for an assessment of model validity and usefulness are
1. $e_i^*$ (or $e_i$) on the vertical axis vs. $x_i$ on the horizontal axis.
2. $e_i^*$ (or $e_i$) on the vertical axis vs. $\hat{y}_i$ on the horizontal axis.
(These two plots are called residual plots.)
3. $\hat{y}_i$ on the vertical axis vs. $y_i$ on the horizontal axis.
4. A normal probability plot of the standardized residuals.

Difficulties in the Plots
1. A nonlinear probabilistic relationship between x and y is appropriate.
2. The variance of $\epsilon$ (and of Y) is not a constant $\sigma^2$ but depends on x.
3. The selected model fits well except for a few outlying data values, which may have greatly influenced the choice of the best-fit function.
4. The error term $\epsilon$ does not have a normal distribution.
5. When the subscript i indicates the time order of the observations, the $\epsilon_i$'s exhibit dependence over time.
6. One or more relevant independent variables have been omitted from the model.

Abnormality in Data (and Remedies)
- Nonlinear relationship (use a nonlinear model)
- Nonconstant variance (weighted least squares)
- Discrepant observation, or observation with large influence (omit the value, or use MAD, i.e., minimize absolute deviations)
- Dependence in the errors (transform the y's, or use a model that includes time)
- Variable omitted (use a multiple regression model that includes the omitted variable)

13.2 Regression With Transformed Variables

Intrinsically Linear – Function
A function relating y to x is intrinsically linear if, by means of a transformation on x and/or y, the function can be expressed as $y' = \beta_0 + \beta_1 x'$, where $x'$ and $y'$ are the transformed independent and dependent variables, respectively.

Intrinsically Linear Functions
- Exponential, $y = \alpha e^{\beta x}$: transformation $y' = \ln(y)$; linear form $y' = \ln(\alpha) + \beta x$.
- Power, $y = \alpha x^{\beta}$: transformation $y' = \log(y)$, $x' = \log(x)$; linear form $y' = \log(\alpha) + \beta x'$.
- $y = \alpha + \beta \log(x)$: transformation $x' = \log(x)$; linear form $y = \alpha + \beta x'$.
- Reciprocal, $y = \alpha + \beta \cdot \dfrac{1}{x}$: transformation $x' = 1/x$; linear form $y = \alpha + \beta x'$.

Intrinsically Linear – Probabilistic Model
A probabilistic model relating Y to x is intrinsically linear if, by means of a transformation on Y and/or x, it can be reduced to a linear probabilistic model $Y' = \beta_0 + \beta_1 x' + \epsilon'$.

Intrinsically Linear Probabilistic Models
- Exponential: $Y = \alpha e^{\beta x} \cdot \epsilon$. Taking $Y' = \ln(Y)$ yields the model $Y' = \ln(\alpha) + \beta x + \ln(\epsilon) = \beta_0 + \beta_1 x + \epsilon'$.
- Power: $Y = \alpha x^{\beta} \cdot \epsilon$. Taking $Y' = \log(Y)$ and $x' = \log(x)$ yields the model $Y' = \log(\alpha) + \beta x' + \epsilon'$ with $\epsilon' = \log(\epsilon)$.
- Reciprocal: $Y = \alpha + \beta \cdot \dfrac{1}{x} + \epsilon$. Taking $x' = 1/x$ yields the model $Y = \beta_0 + \beta_1 x' + \epsilon$.
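To make the transformation concrete, here is a minimal Python sketch (the language and the data values are my own choices for illustration; nothing here comes from the slides) that fits the exponential model $Y = \alpha e^{\beta x} \cdot \epsilon$ by regressing ln(y) on x:

```python
import numpy as np

# Hypothetical data assumed to follow Y = alpha * exp(beta * x) * eps.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.7, 4.8, 9.1, 16.0, 30.5, 54.2])

# Transformation from the table above: y' = ln(y) gives
# y' = ln(alpha) + beta * x + ln(eps), which is linear in x.
y_prime = np.log(y)

# Ordinary least squares on the transformed data (degree-1 fit).
beta_hat, ln_alpha_hat = np.polyfit(x, y_prime, 1)
alpha_hat = np.exp(ln_alpha_hat)  # transform the intercept back

print(f"alpha-hat = {alpha_hat:.3f}, beta-hat = {beta_hat:.3f}")
```

Exponentiating the fitted intercept recovers an estimate of $\alpha$, but, as the next slide cautions, this back-transformed estimate is not the least squares estimate for the original (untransformed) model.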
Analyzing Transformed Data
1. Estimating $\beta_0$ and $\beta_1$ and then transforming back to obtain estimates of the original parameters is not equivalent to using the principle of least squares on the original model.
2. If the chosen model is not intrinsically linear, least squares would have to be applied to the untransformed model.
3. When a transformed model satisfies the assumptions of Chapter 12, the method of least squares yields the best estimates of the transformed parameters; the estimates of the original parameters may not be the best.
4. After a transformation on y, in order to use the standard formulas to test hypotheses or construct CIs, the transformed $\epsilon_i$'s should be at least approximately normally distributed.
5. When y is transformed, the $r^2$ value from the resulting regression refers to variation in the $y_i'$'s explained by the transformed regression model.

Logit Function
An example of a nonlinear function of $\beta_0 + \beta_1 x$:
$p(x) = \dfrac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$

Logistic Regression
Logistic regression means assuming that p(x) is related to x by the logit function. Algebra yields
$\dfrac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x}$
The quantity on the left is called the odds ratio.

13.3 Polynomial Regression

Polynomial Regression Model
The kth-degree polynomial regression model equation is
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \epsilon$
where $\epsilon$ is normally distributed with $\mu_{\epsilon} = 0$ and $\sigma_{\epsilon}^2 = \sigma^2$.

Regression Models
Quadratic model: $Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon$. Cubic model: $Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \epsilon$.

Estimation of Parameters Using Least Squares
The k + 1 normal equations:
$b_0 n + b_1 \sum x_i + b_2 \sum x_i^2 + \cdots + b_k \sum x_i^k = \sum y_i$
$b_0 \sum x_i + b_1 \sum x_i^2 + \cdots + b_k \sum x_i^{k+1} = \sum x_i y_i$
$\vdots$
$b_0 \sum x_i^k + b_1 \sum x_i^{k+1} + \cdots + b_k \sum x_i^{2k} = \sum x_i^k y_i$
Solve these for the estimates $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$.

Estimate of $\sigma^2$ and $R^2$
$\hat{\sigma}^2 = s^2 = \dfrac{SSE}{n - (k+1)} = MSE$
Coefficient of multiple determination: $R^2 = 1 - \dfrac{SSE}{SST}$
Adjusted $R^2$: $R_a^2 = 1 - \dfrac{n-1}{n-(k+1)} \cdot \dfrac{SSE}{SST} = \dfrac{(n-1)R^2 - k}{n-1-k}$

Confidence Interval and Test
A $100(1-\alpha)\%$ CI for $\beta_i$, the coefficient of $x^i$ in the polynomial regression function, is
$\hat{\beta}_i \pm t_{\alpha/2,\, n-(k+1)} \cdot s_{\hat{\beta}_i}$
A test of $H_0\!: \beta_i = \beta_{i0}$ is based on n – (k + 1) df and the t statistic value
$t = \dfrac{\hat{\beta}_i - \beta_{i0}}{s_{\hat{\beta}_i}}$

CI for $\mu_{Y \cdot x^*}$
Let $x^*$ denote a particular value of x. A $100(1-\alpha)\%$ CI for $\mu_{Y \cdot x^*}$ is
$\hat{y} \pm t_{\alpha/2,\, n-(k+1)} \cdot s_{\hat{Y}}$
where $\hat{y} = \hat{\mu}_{Y \cdot x^*} = \hat{\beta}_0 + \hat{\beta}_1 x^* + \cdots + \hat{\beta}_k (x^*)^k$ and $s_{\hat{Y}}$ is the estimated standard deviation of $\hat{Y}$.

PI
A $100(1-\alpha)\%$ PI for a future y value is
$\hat{y} \pm t_{\alpha/2,\, n-(k+1)} \cdot \sqrt{s^2 + s_{\hat{Y}}^2}$

Centering x Values
Let $\bar{x}$ = the average of the $x_i$'s for which observations are to be taken, and consider (in the quadratic case)
$Y = \beta_0^* + \beta_1^*(x - \bar{x}) + \beta_2^*(x - \bar{x})^2 + \epsilon$
In this model $\mu_{Y \cdot \bar{x}} = \beta_0^*$, and the parameters describe the behavior of the regression function near the center of the data.
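To tie together the least squares fitting and the centering idea above, here is a minimal NumPy sketch (Python and the data values are my own choices for illustration, not from the slides) that fits a centered quadratic model and computes $s^2$, $R^2$, and adjusted $R^2$:

```python
import numpy as np

# Hypothetical data for a quadratic (k = 2) polynomial fit.
x = np.array([20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0])
y = np.array([1.81, 1.87, 1.94, 1.95, 1.91, 1.84, 1.74])
n, k = len(x), 2

xc = x - x.mean()  # centered values x - x-bar (see the slide above)
X = np.column_stack([np.ones(n), xc, xc**2])  # design matrix

# Least squares solution of the k + 1 normal equations (X'X)b = X'y.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b
sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
s2 = sse / (n - (k + 1))                       # sigma-hat^2 = MSE
r2 = 1 - sse / sst
r2_adj = 1 - (n - 1) / (n - (k + 1)) * sse / sst

print(f"b = {b}, s^2 = {s2:.5f}, R^2 = {r2:.3f}, adj R^2 = {r2_adj:.3f}")
```

Because the x values are centered, the estimated intercept b[0] estimates $\beta_0^* = \mu_{Y \cdot \bar{x}}$, the mean response at the center of the data.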
13.4 Multiple Regression Analysis

General Additive Multiple Regression Model Equation
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon$
where $E(\epsilon) = 0$ and $V(\epsilon) = \sigma^2$. For purposes of testing hypotheses and calculating CIs or PIs, assume that $\epsilon$ is normally distributed.

Models
1. The first-order model: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$
2. Second-order no-interaction model: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \epsilon$
3. First-order predictors and interaction: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon$
4. Second-order (full quadratic) model: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \epsilon$

Models With Predictors for Categorical Variables
Using simple numerical coding, qualitative (categorical) variables can be incorporated into a model. With a dichotomous variable, associate an indicator (dummy) variable x whose possible values are 0 and 1.

Estimation of Parameters
Normal equations:
$b_0 n + b_1 \sum x_{1j} + b_2 \sum x_{2j} + \cdots + b_k \sum x_{kj} = \sum y_j$
$b_0 \sum x_{1j} + b_1 \sum x_{1j}^2 + b_2 \sum x_{1j} x_{2j} + \cdots + b_k \sum x_{1j} x_{kj} = \sum x_{1j} y_j$
$\vdots$
$b_0 \sum x_{kj} + b_1 \sum x_{1j} x_{kj} + \cdots + b_{k-1} \sum x_{k-1,j} x_{kj} + b_k \sum x_{kj}^2 = \sum x_{kj} y_j$
Solve these for the estimates $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$.

Estimate of $\sigma^2$ and $R^2$
As in Section 13.3: $\hat{\sigma}^2 = s^2 = \dfrac{SSE}{n-(k+1)} = MSE$, $R^2 = 1 - \dfrac{SSE}{SST}$, and $R_a^2 = 1 - \dfrac{n-1}{n-(k+1)} \cdot \dfrac{SSE}{SST} = \dfrac{(n-1)R^2 - k}{n-1-k}$.

Test
Null hypothesis: $H_0\!: \beta_1 = \cdots = \beta_k = 0$
Alternative hypothesis: $H_a\!:$ at least one $\beta_i \neq 0$
Test statistic:
$f = \dfrac{R^2/k}{(1 - R^2)/[n-(k+1)]} = \dfrac{SSR/k}{SSE/[n-(k+1)]} = \dfrac{MSR}{MSE}$
Rejection region: $f \geq F_{\alpha,\, k,\, n-(k+1)}$

Inferences Based on the Model $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon$
1. A $100(1-\alpha)\%$ CI for $\beta_i$, the coefficient of $x_i$ in the regression function, is
$\hat{\beta}_i \pm t_{\alpha/2,\, n-(k+1)} \cdot s_{\hat{\beta}_i}$
Simultaneous CIs for several $\beta_i$'s, for which the simultaneous confidence level is controlled, can be obtained by the Bonferroni technique.
2. A test of $H_0\!: \beta_i = \beta_{i0}$ is based on n – (k + 1) df and the t statistic value
$t = \dfrac{\hat{\beta}_i - \beta_{i0}}{s_{\hat{\beta}_i}}$
The test is upper-, lower-, or two-tailed according to whether $H_a$ contains the inequality >, <, or $\neq$.
3. A $100(1-\alpha)\%$ CI for $\mu_{Y \cdot x_1^*, \ldots, x_k^*}$ is
$\hat{y} \pm t_{\alpha/2,\, n-(k+1)} \cdot s_{\hat{Y}}$
where $\hat{y}$ is the calculated value of $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1^* + \cdots + \hat{\beta}_k x_k^*$ and $s_{\hat{Y}}$ is its estimated standard deviation.
4. A $100(1-\alpha)\%$ PI for a future y value is
$\hat{y} \pm t_{\alpha/2,\, n-(k+1)} \cdot \sqrt{s^2 + s_{\hat{Y}}^2}$
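The CI and t-test formulas in items 1 and 2 translate directly into code. Here is a minimal sketch (hypothetical data; Python with NumPy and SciPy is my choice, not something the slides prescribe) for a first-order model with k = 2 predictors:

```python
import numpy as np
from scipy import stats

# Hypothetical data for the first-order model Y = b0 + b1*x1 + b2*x2 + eps.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([5.1, 5.9, 9.2, 9.8, 13.1, 13.8, 17.2, 17.9])
n, k = len(y), 2

X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                    # least squares estimates

sse = np.sum((y - X @ b) ** 2)
s2 = sse / (n - (k + 1))                 # MSE, with n - (k+1) df
se_b = np.sqrt(s2 * np.diag(XtX_inv))    # estimated SDs of the beta-hats

t_crit = stats.t.ppf(0.975, n - (k + 1))
for i in range(k + 1):
    lo, hi = b[i] - t_crit * se_b[i], b[i] + t_crit * se_b[i]
    print(f"beta_{i}: {b[i]:.3f}  t = {b[i] / se_b[i]:.2f}  95% CI ({lo:.3f}, {hi:.3f})")
```

The printed t values test $H_0\!: \beta_i = 0$; each would be compared with $t_{\alpha/2,\, n-(k+1)}$ (here, 5 df).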
An F Test for a Group of Predictors
$H_0\!: \beta_{l+1} = \cdots = \beta_k = 0$ (so that $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_l x_l + \epsilon$ is correct) versus
$H_a\!:$ at least one of $\beta_{l+1}, \ldots, \beta_k$ is $\neq 0$

Procedure
$SSE_k$ = unexplained variation for the full model
$SSE_l$ = unexplained variation for the reduced model
Test statistic:
$f = \dfrac{(SSE_l - SSE_k)/(k - l)}{SSE_k/[n-(k+1)]}$
Rejection region: $f \geq F_{\alpha,\, k-l,\, n-(k+1)}$

13.5 Other Issues in Multiple Regression

Transformations in Multiple Regression
Theoretical considerations or diagnostic plots may suggest a nonlinear relationship between the dependent variable and two or more independent variables. Frequently a transformation will linearize the model.

Standardizing Variables
Let $\bar{x}_i$ and $s_i$ be the sample average and standard deviation of the $x_{ij}$'s, and code each variable as $x_i' = (x_i - \bar{x}_i)/s_i$. The coded full second-order model with two independent variables then has regression function
$E(Y) = \beta_0^* + \beta_1^* x_1' + \beta_2^* x_2' + \beta_3^* (x_1')^2 + \beta_4^* (x_2')^2 + \beta_5^* x_1' x_2'$
Benefits:
- increased numerical accuracy
- more accurate estimation of the parameters

Variable Selection
1. If we examine regressions involving all possible subsets of the predictors for which data are available, what criteria should be used to select a model?
2. If the number of predictors is too large to examine all regressions, can a good model be found by examining a reduced number of subsets?

Criteria for Variable Selection
1. $R_k^2$, the coefficient of multiple determination for a k-predictor model. Identify k for which $R_k^2$ is nearly as large as the $R^2$ obtained with all predictors in the model.
2. $MSE_k = SSE_k/(n - k - 1)$, the mean squared error for a k-predictor model. Find the model having minimum $MSE_k$.
3. $C_k = \dfrac{SSE_k}{s^2} + 2(k+1) - n$. A desirable model is specified by a subset of predictors for which $C_k$ is small.

Stepwise Regression
When the number of predictors is too large to allow for explicit or implicit examination of all possible subsets, alternative selection procedures generally identify good models. Two of these methods are the backward elimination (BE) method and the forward selection (FS) method.

Identifying Influential Observations
In general, the predicted values corresponding to the sample observations can be written
$\hat{y}_1 = h_{11} y_1 + h_{12} y_2 + \cdots + h_{1n} y_n$
$\vdots$
$\hat{y}_n = h_{n1} y_1 + h_{n2} y_2 + \cdots + h_{nn} y_n$
If $h_{jj} > 2(k+1)/n$, the jth observation is potentially influential (some regard $3(k+1)/n$ as the cutoff value).
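As a concrete illustration of this diagnostic, the sketch below (a made-up design matrix with one deliberately extreme row; not an example from the slides) computes the $h_{jj}$ values as the diagonal of the hat matrix $H = X(X'X)^{-1}X'$ and applies the $2(k+1)/n$ rule:

```python
import numpy as np

# Hypothetical design: intercept plus k = 2 predictors, n = 10 observations.
# The last observation is deliberately far from the rest of the data.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0])
x2 = np.array([2.0, 3.0, 2.0, 4.0, 3.0, 5.0, 4.0, 6.0, 5.0, 18.0])
n, k = len(x1), 2
X = np.column_stack([np.ones(n), x1, x2])

# Diagonal of the hat matrix H = X (X'X)^(-1) X'.
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

cutoff = 2 * (k + 1) / n   # some authors use 3(k+1)/n instead
for j, hjj in enumerate(h):
    flag = "  <-- potentially influential" if hjj > cutoff else ""
    print(f"h_{j}{j} = {hjj:.3f}{flag}")
```

With these values the extreme last row is flagged: its leverage is well above the cutoff of 2(2 + 1)/10 = 0.6.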