A Dynamic Discretization Approach for Constructing Decision Trees with a Continuous Label
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 21, NO. 11, NOVEMBER 2009
Adviser: Yu-Chiang Li
Speaker: Gung-Shian Lin
Date: 2010/07/20
南台科技大學 資訊工程系

Outline
1 Introduction
2 Related work
3 The proposed algorithm
4 Performance evaluation
5 Conclusion

1. Introduction
When the label is a continuous variable in the data, two possible approaches based on existing decision tree algorithms can be used to handle the situation. The first uses a data discretization method in the preprocessing stage to convert the continuous label into a class label defined by a finite set of nonoverlapping intervals, and then applies a decision tree algorithm. The second simply applies a regression tree algorithm, using the continuous label directly.

1. Introduction
We propose an algorithm that dynamically discretizes the continuous label at each node during the tree induction process. The proposed algorithm has the following two important features:
- The algorithm dynamically performs discretization based on the data associated with the node in the process of constructing the tree.
- The algorithm can also produce the mean, median, and other statistics for each leaf node as part of its output.
(A minimal sketch of this per-node discretization idea appears after the conclusion.)

2. Related work
Main types of decision tree algorithms:
- Data discretization method
  Drawback: may not provide a good fit for the data.
- Regression tree algorithm
  Drawback: the size of a regression tree is usually large, and its results are often not accurate.

2. Related work
Data discretization methods (used with C4.5):
- equal width method
- equal depth method
- clustering method
- Monothetic Contrast Criterions (MCCs)
- 3-4-5 partition method
(A sketch of the equal width and equal depth methods appears after the conclusion.)
Regression tree algorithm:
- Classification and Regression Trees (CART)

3. The proposed algorithm
The main steps of the algorithm are outlined as follows: [step listing shown as a figure in the slides; not reproduced in this transcript]

3. The proposed algorithm
We rewrite steps 6 and 7 into the following more detailed steps: [detailed steps shown as a figure in the slides]

3. The proposed algorithm
We use three sections to explain the following key steps in the algorithm: determining the nonoverlapping intervals, computing the goodness value, and stopping tree growing.

3. The proposed algorithm
Determining the Nonoverlapping Intervals
Set Ci, ±16:
C5: 40 - 16 = 24 and 40 + 16 = 56
C8: 65 - 16 = 49 and 65 + 16 = 81
Neighboring range: C1: 33, C2: 28, C3: 27, C4: 28, C5: 10, C6: 11, C7: 24, C8: 29, C9: 35, C10: 27, C11: 28

3. The proposed algorithm
Computing the Goodness Value [formula and worked example shown as figures in the slides]

3. The proposed algorithm
Stopping Tree Growing [stopping criteria shown as a figure in the slides]

4. Performance evaluation
First Experiment: Comparing CLC and Approach 1 [result tables shown as figures in the slides]

4. Performance evaluation
Second Experiment: CLC and Regression Trees

4. Performance evaluation
Third Experiment: Supplementary Comparisons

5. Conclusion
Extensive numerical experiments have been performed to evaluate the proposed algorithm. The results confirm the efficiency and accuracy of the proposed algorithm.
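
The related-work slide names the equal width and equal depth methods without showing them. The following is a minimal sketch, not taken from the paper, of how these two static discretization schemes could convert a continuous label y into k class intervals before an ordinary decision tree algorithm is applied (Approach 1 in the introduction); the function names, the example data, and the choice of k = 3 are illustrative.

import numpy as np

def equal_width(y, k):
    # Split the range of y into k intervals of identical width.
    edges = np.linspace(y.min(), y.max(), k + 1)
    # Interior edges only; np.digitize then yields bin indices 0..k-1.
    return np.digitize(y, edges[1:-1], right=True)

def equal_depth(y, k):
    # Split y into k intervals holding roughly the same number of records.
    edges = np.quantile(y, np.linspace(0.0, 1.0, k + 1))
    return np.digitize(y, edges[1:-1], right=True)

# Purely illustrative label values (not data from the paper).
y = np.array([3.0, 5.5, 7.2, 8.0, 9.1, 20.0, 21.5, 22.0, 40.0])
print(equal_width(y, 3))   # -> [0 0 0 0 0 1 1 1 2]; width-based bins leave the last bin sparse
print(equal_depth(y, 3))   # -> [0 0 0 1 1 1 2 2 2]; each bin holds three records

The contrast between the two outputs illustrates the drawback noted on the related-work slide: a fixed, data-independent partition may not fit the label distribution well.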
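
The slides describe the proposed algorithm's key idea, discretizing the continuous label from the records that reach each node rather than once in preprocessing, but the step listing, interval construction, and goodness value appear only in figures that the transcript does not reproduce. The sketch below is therefore an illustration of the general idea, not the paper's CLC procedure: it substitutes equal-width intervals for the paper's neighboring-range construction and information gain for the paper's goodness value, and it reports the mean and median at each leaf as the slides describe. All names and thresholds are placeholders.

import numpy as np

def entropy(classes):
    # Shannon entropy of the label-interval assignments at a node.
    _, counts = np.unique(classes, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def grow(X, y, depth=0, max_depth=3, min_size=5, k=3):
    # Leaf: report summary statistics of the continuous label, as the
    # proposed algorithm does for each leaf node.
    if len(y) < min_size or depth >= max_depth or y.min() == y.max():
        return {"leaf": True, "mean": float(y.mean()), "median": float(np.median(y))}
    # Dynamic discretization: build label intervals from this node's records
    # only (placeholder: equal-width bins instead of the paper's intervals).
    edges = np.linspace(y.min(), y.max(), k + 1)
    classes = np.digitize(y, edges[1:-1], right=True)
    base = entropy(classes)
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            # Placeholder goodness value: information gain of the split.
            gain = base - (left.mean() * entropy(classes[left])
                           + (~left).mean() * entropy(classes[~left]))
            if best is None or gain > best[0]:
                best = (gain, j, float(t))
    # Stop growing when no split improves the (placeholder) goodness value.
    if best is None or best[0] <= 0:
        return {"leaf": True, "mean": float(y.mean()), "median": float(np.median(y))}
    _, j, t = best
    left = X[:, j] <= t
    return {"leaf": False, "attribute": j, "threshold": t,
            "left": grow(X[left], y[left], depth + 1, max_depth, min_size, k),
            "right": grow(X[~left], y[~left], depth + 1, max_depth, min_size, k)}

# Example usage on synthetic data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = 10 * X[:, 0] + rng.normal(scale=0.5, size=200)
tree = grow(X, y)

Because the label intervals are rebuilt from each node's own records, deeper nodes partition a narrower label range more finely, which is the behavior the slides contrast with a single discretization performed in preprocessing.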