Transcript Document

Boosting
Rong Jin
Inefficiency with Bagging
Inefficient bootstrap sampling:
• Every example has an equal chance to be sampled
• No distinction between “easy” examples and “difficult” examples
Inefficient model combination:
• A constant weight for each classifier
• No distinction between accurate classifiers and inaccurate classifiers
[Figure: bagging draws bootstrap samples D1, D2, …, Dk from the training set D, trains one classifier h1, h2, …, hk on each, and combines their predictions with equal weights as Σi Pr(c | hi, x).]
Improve the Efficiency of Bagging
Better sampling strategy
• Focus on the examples that are difficult to classify
Better combination strategy
• Accurate model should be assigned larger weights
Intuition
[Figure: Classifier1 is trained on (X1,Y1), (X2,Y2), (X3,Y3), (X4,Y4); the training set for Classifier2 over-samples the difficult examples, e.g. (X1,Y1), (X3,Y3), (X1,Y1); Classifier3 continues the process, and the classifiers are then combined (Classifier1 + Classifier2 + Classifier3 + …).]
AdaBoost Algorithm
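The algorithm itself appears only as a figure in the original slides. Below is a minimal sketch of a standard binary AdaBoost loop in Python; the decision-stump weak learner, the function names, and the exp(-αt·yi·ht(xi)) weight-update convention are illustrative assumptions, not taken from the slide.

```python
import numpy as np

def train_stump(X, y, D):
    """Illustrative weak learner: the single-feature threshold rule
    with the smallest weighted error under the distribution D."""
    best_err, best_stump = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (+1.0, -1.0):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = np.sum(D[pred != y])
                if err < best_err:
                    best_err, best_stump = err, (j, thr, sign)
    return best_stump

def stump_predict(stump, X):
    j, thr, sign = stump
    return np.where(X[:, j] <= thr, sign, -sign)

def adaboost(X, y, T=50):
    """Train T weak classifiers on reweighted data; y must be in {-1, +1}."""
    n = len(y)
    D = np.full(n, 1.0 / n)                    # D_0: uniform example weights
    stumps, alphas = [], []
    for t in range(T):
        stump = train_stump(X, y, D)           # h_t trained under D_t
        pred = stump_predict(stump, X)
        eps = np.clip(np.sum(D[pred != y]), 1e-12, 1 - 1e-12)
        if eps >= 0.5:                         # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - eps) / eps)  # classifier weight alpha_t
        D *= np.exp(-alpha * y * pred)         # up-weight mistakes, down-weight hits
        D /= D.sum()                           # normalize to obtain D_{t+1}
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Sign of the weighted vote H_T(x) = sum_t alpha_t h_t(x)."""
    H = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return np.sign(H)
```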
AdaBoost Example: t=ln2
D0:
h1
D1:
h2
D2:
x1, y1
x2, y2
x3, y3
x4, y4
x5, y5
1/5
1/5
1/5
1/5
1/5





x1, y1
x2, y2
x3, y3
x4, y4
x5, y5
1/7
1/7
2/7
1/7
2/7





x1, y1
x2, y2
x3, y3
x4 , y 4
x5, y5
2/9
1/9
1/9
4/9
1/9
Sample
x1, y1
x3, y3
x5, y5
Training
Update
Weights
Sample
h1
x1, y1
x3, y3
Training
Update
Weights
h2
Sample …
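Reading off the weights above, the update appears to multiply a misclassified example's weight by e^{αt} = 2 and leave correctly classified examples unchanged before normalizing. Assuming h1 misclassifies x3 and x5 (the two examples whose weight doubles), the first update can be reconstructed as:

```latex
% Reconstruction of the D0 -> D1 update, assuming h_1 misclassifies x_3 and x_5.
\[
  D_1(i) \;=\; \frac{D_0(i)\, e^{\alpha_1 \mathbf{1}[h_1(x_i) \neq y_i]}}{Z_1},
  \qquad e^{\alpha_1} = e^{\ln 2} = 2,
\]
\[
  \bigl(\tfrac{1}{5},\tfrac{1}{5},\tfrac{1}{5},\tfrac{1}{5},\tfrac{1}{5}\bigr)
  \;\longrightarrow\;
  \bigl(\tfrac{1}{5},\tfrac{1}{5},\tfrac{2}{5},\tfrac{1}{5},\tfrac{2}{5}\bigr)
  \;\xrightarrow{\;Z_1 = 7/5\;}\;
  \bigl(\tfrac{1}{7},\tfrac{1}{7},\tfrac{2}{7},\tfrac{1}{7},\tfrac{2}{7}\bigr) \;=\; D_1 .
\]
```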
How To Choose t in AdaBoost?
How to construct the best distribution Dt+1(i) (the standard update rule is given after this list):
1. Dt+1(i) should be significantly different from Dt(i)
2. Dt+1(i) should create a situation in which classifier ht performs poorly
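In the standard AdaBoost formulation (a general fact about the algorithm, not written out in the transcript), the update that achieves both properties is:

```latex
\[
  D_{t+1}(i) \;=\; \frac{D_t(i)\,\exp\!\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t},
  \qquad
  Z_t \;=\; \sum_{j=1}^{n} D_t(j)\,\exp\!\bigl(-\alpha_t\, y_j\, h_t(x_j)\bigr).
\]
% With alpha_t = (1/2) ln((1 - eps_t)/eps_t), the weighted error of h_t
% under D_{t+1} is exactly 1/2: h_t is reduced to random guessing on the
% new distribution, so the next classifier must learn something new.
```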
How To Choose t in AdaBoost?
Optimization View for Choosing t
ht(x): x{1,-1}; a base (weak) classifier
HT(x): a linear combination of basic classifiers
Goal: minimize training error
Approximate error swith a exponential function
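Spelled out (a standard derivation; the formulas do not survive in the transcript): HT is the weighted vote of the base classifiers, and the 0/1 training error is bounded by the exponential loss because 1[z ≤ 0] ≤ e^{−z} for every real z.

```latex
\[
  H_T(x) \;=\; \sum_{t=1}^{T} \alpha_t\, h_t(x),
  \qquad
  \mathrm{err}(H_T)
  \;=\; \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\bigl[\, y_i H_T(x_i) \le 0 \,\bigr]
  \;\le\; \frac{1}{n}\sum_{i=1}^{n} \exp\bigl(-\,y_i H_T(x_i)\bigr).
\]
```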
AdaBoost: Greedy Optimization
Fix HT-1(x), and solve hT(x) and t
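Holding hT fixed, the exponential loss has a closed-form minimizer in αT (the standard AdaBoost choice; the transcript omits the formula):

```latex
\[
  \varepsilon_T \;=\; \sum_{i=1}^{n} D_T(i)\,\mathbf{1}\bigl[\, h_T(x_i) \neq y_i \,\bigr],
  \qquad
  \alpha_T \;=\; \frac{1}{2}\,\ln\frac{1-\varepsilon_T}{\varepsilon_T}.
\]
```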
Empirical Study of AdaBoost
AdaBoosting decision trees (sketched in code after this list):
• Generate 50 decision trees by AdaBoost
• Linearly combine the decision trees using the weights from AdaBoost
In general:
• AdaBoost = Bagging > C4.5
• AdaBoost usually needs fewer classifiers than Bagging
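A rough modern analogue of that comparison, as a hedged sketch rather than the original experiment: it uses scikit-learn's CART-style trees in place of C4.5 and a synthetic dataset in place of the benchmark data from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the benchmark datasets of the study.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    # a single CART-style decision tree, standing in for C4.5
    "single tree": DecisionTreeClassifier(random_state=0),
    # bagging: 50 trees on bootstrap samples, combined with equal weights
    "bagging, 50 trees": BaggingClassifier(n_estimators=50, random_state=0),
    # AdaBoost: 50 weak trees (depth-1 stumps by default), combined with alpha_t weights
    "adaboost, 50 trees": AdaBoostClassifier(n_estimators=50, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```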
Bias-Variance Tradeoff for AdaBoost
• AdaBoost can reduce both variance and bias simultaneously
[Figure: bias and variance components of the error for a single decision tree, bagged decision trees, and AdaBoosted decision trees.]