Transcript Slide 1

• (Numerical) prediction is similar to classification
– construct a model
– use model to predict continuous or ordered value for a given input
• Prediction is different from classification
– Classification predicts categorical class labels
– Prediction models continuous-valued functions
• Major method for prediction: regression
– model the relationship between one or more independent or
predictor variables and a dependent or response variable
• Regression analysis
– Linear and multiple regression
– Non-linear regression
– Other regression methods: generalized linear model, Poisson
regression, log-linear models, regression trees
Nonlinear Regression
• Some nonlinear models can be modeled by a
polynomial function
• A polynomial regression model can be transformed
into a linear regression model (see the sketch after
this slide). For example,
y = w_0 + w_1 x + w_2 x^2 + w_3 x^3
is convertible to linear form with the new variables x_2 = x^2, x_3 = x^3:
y = w_0 + w_1 x + w_2 x_2 + w_3 x_3
• Other functions, such as the power function, can also
be transformed to a linear model
• Some models are intractably nonlinear (e.g., a sum
of exponential terms)
– possible to obtain least-squares estimates through
extensive calculation on more complex formulae
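As a sketch of the polynomial-to-linear transformation above (assuming NumPy and scikit-learn are available; the data and coefficient values are illustrative, not from the slides):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: y generated from a cubic in x plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.3 * x**3 + rng.normal(0, 0.1, size=200)

# The transformation: treat x, x^2, x^3 as three separate linear predictors
X = np.column_stack([x, x**2, x**3])

# Ordinary linear regression now recovers w_0..w_3 of the cubic model
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # roughly 1.0 and [2.0, -0.5, 0.3]
```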
Other Regression-Based Models
• Generalized linear model:
– Foundation on which linear regression can be applied to modeling
categorical response variables
– Variance of y is a function of the mean value of y, not a constant
– Logistic regression: models the prob. of some event occurring as a
linear function of a set of predictor variables
– Poisson regression: models data that exhibit a Poisson
distribution (see the sketch after this list)
• Log-linear models: (for categorical data)
– Approximate discrete multidimensional prob. distributions
– Also useful for data compression and smoothing
• Regression trees and model trees
– Trees to predict continuous values rather than class labels
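As a brief illustration of two of these methods (a sketch only, assuming statsmodels for the Poisson GLM and scikit-learn for the regression tree; the synthetic data and parameter values are mine, not from the slides):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)

# Poisson regression: count data whose mean is exp(0.5 + 1.2 x)
counts = rng.poisson(np.exp(0.5 + 1.2 * x))
X = sm.add_constant(x)  # prepend the intercept column
poisson_fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
print(poisson_fit.params)  # roughly [0.5, 1.2]

# Regression tree: predicts a continuous value at each leaf
y = np.sin(x) + rng.normal(0, 0.05, size=200)
tree = DecisionTreeRegressor(max_depth=3).fit(x.reshape(-1, 1), y)
print(tree.predict([[1.0]]))  # approximately sin(1.0) ~ 0.84
```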
Classification
• Any regression technique can be used for
classification
– Training: perform a regression for each class, setting
the output to 1 for training instances that belong to the
class and 0 for those that don't
– Prediction: predict the class corresponding to the model with
the largest output value (membership value); see the sketch below
• For linear regression this is known as multiresponse linear regression
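A minimal sketch of multiresponse linear regression (assuming NumPy and scikit-learn; the function names and data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_multiresponse(X, y, classes):
    """Train one linear regression per class on 0/1 membership targets."""
    return {c: LinearRegression().fit(X, (y == c).astype(float))
            for c in classes}

def predict_multiresponse(models, X):
    """Predict the class whose regression produces the largest output."""
    classes = list(models)
    outputs = np.column_stack([models[c].predict(X) for c in classes])
    return np.array(classes)[outputs.argmax(axis=1)]

# Illustrative two-class data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
models = fit_multiresponse(X, y, classes=[0, 1])
print(predict_multiresponse(models, X[:5]))  # membership-based predictions
```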

Discussion of linear models
• Not appropriate if data exhibits non-linear
dependencies
• But: can serve as building blocks for more
complex schemes
• Example: multi-response linear regression
defines a hyperplane for any two given classes:
(w_0^(1) - w_0^(2)) a_0 + (w_1^(1) - w_1^(2)) a_1 + (w_2^(1) - w_2^(2)) a_2
+ ... + (w_k^(1) - w_k^(2)) a_k = 0
Odds can also be found by counting the number of people in each group and
dividing one number by the other.
Clearly, the probability is not the same as the odds. In our example, the odds
would be .90/.10, or 9 to 1.
Now the odds of being female would be .10/.90 or 1/9 or .11. This asymmetry is
unappealing, because the odds of being a male should be the opposite of the
odds of being female.
We can take care of this asymmetry through the natural logarithm, ln.
The natural log of 9 is 2.197 (ln(.9/.1) = 2.197).
The natural log of 1/9 is -2.197 (ln(.1/.9) = -2.197), so the log odds of being
male is exactly opposite to the log odds of being female. The natural log function
looks like this:
[Figure: graph of the natural log function]
In logistic regression, the dependent variable is a logit, which is the natural log of
the odds, that is, logit(P) = ln(P / (1 - P)).
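A quick check of the odds arithmetic above, using only the standard library:

```python
import math

p_male = 0.90
odds_male = p_male / (1 - p_male)    # .90/.10 = 9.0
odds_female = (1 - p_male) / p_male  # .10/.90 ~ 0.111

print(math.log(odds_male))    # ln(9) ~ 2.197
print(math.log(odds_female))  # ln(1/9) ~ -2.197, exactly the opposite
```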
Logistic regression
• Problem: some assumptions violated when
linear regression is applied to classification
problems
• Logistic regression: alternative to linear
regression
– Designed for classification problems
– Tries to estimate class probabilities directly
• Does this using the maximum likelihood method
– Uses this linear model:
log(P / (1 - P)) = w_0 a_0 + w_1 a_1 + w_2 a_2 + ... + w_k a_k
where P is the class probability
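As a sketch of this model in practice (assuming scikit-learn, whose LogisticRegression fits the weights by regularized maximum likelihood; the data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative binary data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Fits w_0..w_k so that log(P / (1 - P)) = w . a
clf = LogisticRegression().fit(X, y)
print(clf.intercept_, clf.coef_)  # the fitted weights
print(clf.predict_proba(X[:3]))   # estimated class probabilities P
```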