Transcript Slide 1
• (Numerical) prediction is similar to classification
– construct a model
– use the model to predict a continuous or ordered value for a given input
• Prediction is different from classification
– Classification predicts a categorical class label
– Prediction models continuous-valued functions
• Major method for prediction: regression
– models the relationship between one or more independent (predictor) variables and a dependent (response) variable
• Regression analysis
– Linear and multiple regression
– Non-linear regression
– Other regression methods: generalized linear model, Poisson regression, log-linear models, regression trees

Nonlinear Regression
• Some nonlinear models can be modeled by a polynomial function
• A polynomial regression model can be transformed into a linear regression model. For example,
y = w0 + w1 x + w2 x^2 + w3 x^3
is convertible to linear form with the new variables x2 = x^2, x3 = x^3:
y = w0 + w1 x + w2 x2 + w3 x3
• Other functions, such as the power function, can also be transformed to a linear model
• Some models are intractably nonlinear (e.g., sum of exponential terms)
– possible to obtain least-squares estimates through extensive calculation on more complex formulae

Other Regression-Based Models
• Generalized linear model:
– Foundation on which linear regression can be applied to modeling categorical response variables
– Variance of y is a function of the mean value of y, not a constant
– Logistic regression: models the probability of some event occurring as a linear function of a set of predictor variables
– Poisson regression: models data that exhibit a Poisson distribution
• Log-linear models (for categorical data):
– Approximate discrete multidimensional prob.
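The variable substitution above can be sketched in code. This is a minimal illustration (data and weights are invented for the example): the cubic y = w0 + w1 x + w2 x^2 + w3 x^3 becomes an ordinary linear least-squares problem once x^2 and x^3 are treated as separate input columns.

```python
import numpy as np

# Data generated from a known cubic so the recovered weights
# can be checked: y = 1 + 2x - 0.5x^2 + 0.1x^3
x = np.linspace(-3, 3, 50)
y = 1 + 2 * x - 0.5 * x**2 + 0.1 * x**3

# Transform to a linear regression: feature columns [1, x, x2, x3]
# where x2 = x^2 and x3 = x^3, as in the slide
X = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Ordinary least squares on the transformed (now linear) problem
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # approximately [1.0, 2.0, -0.5, 0.1]
```

Because the data here are noise-free, the least-squares solution recovers the generating weights almost exactly; with noisy data the same code returns the best-fit polynomial coefficients.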
distributions
– Also useful for data compression and smoothing
• Regression trees and model trees
– Trees to predict continuous values rather than class labels

Classification
• Any regression technique can be used for classification
– Training: perform a regression for each class, setting the output to 1 for training instances that belong to the class, and 0 for those that don't
– Prediction: predict the class corresponding to the model with the largest output value (membership value)
• For linear regression this is known as multiresponse linear regression

Discussion of linear models
• Not appropriate if data exhibits non-linear dependencies
• But: can serve as building blocks for more complex schemes
• Example: multi-response linear regression defines a hyperplane for any two given classes:
(w0^(1) - w0^(2)) a0 + (w1^(1) - w1^(2)) a1 + (w2^(1) - w2^(2)) a2 + ... + (wk^(1) - wk^(2)) ak = 0

(Odds can also be found by counting the number of people in each group and dividing one number by the other. Clearly, the probability is not the same as the odds.) In our example, the odds of being male would be .90/.10, or 9 to 1. The odds of being female would be .10/.90, or 1/9, or .11. This asymmetry is unappealing, because the odds of being male should be the opposite of the odds of being female. We can take care of this asymmetry through the natural logarithm, ln. The natural log of 9 is 2.197 (ln(.9/.1) = 2.197). The natural log of 1/9 is -2.197 (ln(.1/.9) = -2.197), so the log odds of being male are exactly opposite to the log odds of being female.
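The log-odds symmetry described above is easy to verify numerically. A quick check (not part of the original slides), using the same .90/.10 example:

```python
import math

p_male = 0.9

odds_male = p_male / (1 - p_male)        # .90/.10 = 9
odds_female = (1 - p_male) / p_male      # .10/.90 = 1/9, about 0.11

log_odds_male = math.log(odds_male)      # ln(9), about 2.197
log_odds_female = math.log(odds_female)  # ln(1/9), about -2.197

# Unlike the raw odds (9 vs. 1/9), the log odds are exact opposites
assert math.isclose(log_odds_male, -log_odds_female)
```

This symmetry is why the logit (log odds) is a more convenient scale to model than the odds themselves.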
The natural log function looks like this: [graph of ln(x) omitted]

In logistic regression, the dependent variable is a logit, which is the natural log of the odds, that is,
logit(P) = ln(P / (1 - P))

Logistic regression
• Problem: some assumptions are violated when linear regression is applied to classification problems
• Logistic regression: an alternative to linear regression
– Designed for classification problems
– Tries to estimate class probabilities directly
• Does this using the maximum likelihood method
– Uses this linear model:
log(P / (1 - P)) = w0 a0 + w1 a1 + w2 a2 + ... + wk ak
where P is the class probability
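As a rough sketch of the model above (the weights and attribute values are invented for illustration), the linear combination of attributes gives the log odds, and inverting the logit recovers the class probability P:

```python
import math

def class_probability(weights, attrs):
    """Invert the logit: P = 1 / (1 + exp(-(w0*a0 + ... + wk*ak)))."""
    z = sum(wi * ai for wi, ai in zip(weights, attrs))  # log odds of the class
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights and attribute vector (a0 = 1 acts as the bias input)
w = [-1.5, 0.8, 2.0]
a = [1.0, 1.2, 0.4]

p = class_probability(w, a)

# Check the slide's relation: log(P / (1 - P)) equals the linear model's output
assert math.isclose(math.log(p / (1 - p)), sum(wi * ai for wi, ai in zip(w, a)))
```

In practice the weights w0..wk are not set by hand but estimated from training data by maximum likelihood, as the slide notes; this sketch only shows how the fitted linear model maps to a probability.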