Rule Induction with Extension Matrices
Dr. Xindong Wu
Journal of the American Society for Information Science
VOL. 49, NO. 5, 1998
Presented by Peter Duval
Context
• HFL/HCV presents an alternative to decision trees with rule induction.
• HCV can be used as a benchmark for rule induction.
Context
• This paper condenses Dr. Wu's Ph.D. dissertation on the extension matrix approach and the HFL/HCV algorithm.
• Look to the University of Illinois, J. Wong, and R.S. Michalski for work leading to HFL/HCV.
• HFL/HCV appears to be underrepresented in literature citations.
Overview
1. Represent the negative training data as row vectors in a matrix.
2. Process positive examples as they come, eliminating uninformative attributes in the negative examples.
3. Read conjunctive rules from the resulting matrix.
4. Simplify and clean up the rules.
Positive and Negative Examples
• A positive example (PE): e⁺ₖ = (v₁ᵏ, ..., vₐᵏ), e.g.
  (overcast, mild, high, windy) → Play
• A negative example (NE): e⁻ₖ = (v₁ᵏ, ..., vₐᵏ), e.g.
  (rainy, hot, high, windy) → Don't Play
Negative example matrix (NEM)
Gather the negative examples as row vectors in the NEM:

    | rainy  hot   high    windy |
    | rainy  cool  normal  windy |
    | sunny  hot   normal  windy |
    | sunny  mild  high    windy |
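In code, the NEM is simply a list of row vectors, one tuple per negative example. This is a minimal sketch (names and representation are my own, not from the paper), using the golf attributes in the order (outlook, temperature, humidity, wind):

```python
# Negative Example Matrix (NEM): one row vector per negative example.
# Attribute order (an assumption for this sketch): outlook, temperature, humidity, wind.
NEM = [
    ("rainy", "hot",  "high",   "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot",  "normal", "windy"),
    ("sunny", "mild", "high",   "windy"),
]
```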
Positive Example (PE) Against NEM
A positive example written as a row vector, above the NEM:

      overcast  mild  high  windy

    | rainy  hot   high    windy |
    | rainy  cool  normal  windy |
    | sunny  hot   normal  windy |
    | sunny  mild  high    windy |
Extension Matrices
• Delete any matching elements in the NEM:

      overcast  mild  high  windy

    | rainy  hot   high    windy |
    | rainy  cool  normal  windy |
    | sunny  hot   normal  windy |
    | sunny  mild  high    windy |
Extension Matrices
• We construct one Extension Matrix per Positive Example:

    | rainy  hot   *       * |
    | rainy  cool  normal  * |
    | sunny  hot   normal  * |
    | sunny  *     *       * |
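The deletion step above is easy to sketch: replace every NEM element that equals the corresponding positive-example value with a dead element '*'. This is my own minimal sketch, not Wu's implementation; the function name is hypothetical:

```python
DEAD = "*"

def extension_matrix(nem, positive):
    """Build the extension matrix for one positive example: any NEM
    element that matches the positive example becomes a dead element."""
    return [tuple(DEAD if v == p else v for v, p in zip(row, positive))
            for row in nem]

nem = [
    ("rainy", "hot",  "high",   "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot",  "normal", "windy"),
    ("sunny", "mild", "high",   "windy"),
]
em1 = extension_matrix(nem, ("overcast", "mild", "high", "windy"))
# em1[0] == ("rainy", "hot", "*", "*")
# em1[3] == ("sunny", "*", "*", "*")
```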
Extension Matrices
Let's make a second extension matrix, for the positive example (overcast, mild, normal, calm):

      overcast  mild  normal  calm

    | rainy  hot   high    windy |
    | rainy  cool  normal  windy |
    | sunny  hot   normal  windy |
    | sunny  mild  high    windy |
Extension Matrices
The second extension matrix:

    | rainy  hot   high  windy |
    | rainy  cool  *     windy |
    | sunny  hot   *     windy |
    | sunny  *     high  windy |
Extension Matrices
Finally, let's make a third extension matrix, for the positive example (rainy, hot, high, calm):

      rainy  hot   high  calm

    | rainy  hot   high    windy |
    | rainy  cool  normal  windy |
    | sunny  hot   normal  windy |
    | sunny  mild  high    windy |
Extension Matrices
The third extension matrix:

    | *      *     *       windy |
    | *      cool  normal  windy |
    | sunny  *     normal  windy |
    | sunny  mild  *       windy |
Dead Elements
• Dead Elements, *, take the place of attributes that fail to distinguish the negative example from the corresponding positive example.

      rainy  hot   high  calm

    | *      *     *       windy |
    | *      cool  normal  windy |
    | sunny  *     normal  windy |
    | sunny  mild  *       windy |
Matrix Disjunction (EMD)
• If there exists a dead element in any position of the extension matrices, the EMD will have a dead element there, too. "OR" the dead elements of the three extension matrices:

    | rainy  hot   *       * |      | rainy  hot   high  windy |      | *      *     *       windy |
    | rainy  cool  normal  * |      | rainy  cool  *     windy |      | *      cool  normal  windy |
    | sunny  hot   normal  * |      | sunny  hot   *     windy |      | sunny  *     normal  windy |
    | sunny  *     *       * |      | sunny  *     high  windy |      | sunny  mild  *       windy |

EMD of all three:

    | *      *     *  * |
    | *      cool  *  * |
    | sunny  *     *  * |
    | sunny  *     *  * |
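The disjunction can be sketched as an element-wise OR of "deadness" across the extension matrices; the non-dead values agree by construction, since every extension matrix starts from the same NEM. A hedged sketch (the function name is mine):

```python
DEAD = "*"

def emd(matrices):
    """Matrix disjunction: a position is dead in the EMD if it is dead
    in any of the input extension matrices."""
    result = []
    for rows in zip(*matrices):  # corresponding rows across all EMs
        result.append(tuple(
            DEAD if any(r[i] == DEAD for r in rows) else rows[0][i]
            for i in range(len(rows[0]))))
    return result

em1 = [("rainy", "hot", "*", "*"), ("rainy", "cool", "normal", "*"),
       ("sunny", "hot", "normal", "*"), ("sunny", "*", "*", "*")]
em2 = [("rainy", "hot", "high", "windy"), ("rainy", "cool", "*", "windy"),
       ("sunny", "hot", "*", "windy"), ("sunny", "*", "high", "windy")]
print(emd([em1, em2])[1])  # ('rainy', 'cool', '*', '*')
```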
Partitions
• Once a dead row would be created, start a new EMD.

Partition 1: the EMD of the first two extension matrices:

    | rainy  hot   *  * |
    | rainy  cool  *  * |
    | sunny  hot   *  * |
    | sunny  *     *  * |

Partition 2: the third extension matrix on its own (OR-ing it into Partition 1 would make the first row all dead elements):

    | *      *     *       windy |
    | *      cool  normal  windy |
    | sunny  *     normal  windy |
    | sunny  mild  *       windy |
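The partitioning rule can be sketched greedily: OR each new extension matrix into the current EMD, and start a fresh EMD whenever the merge would produce an all-dead row. This is my own simplification of the idea, not Wu's exact grouping procedure:

```python
DEAD = "*"

def or_rows(r1, r2):
    """OR the dead elements of two corresponding rows."""
    return tuple(DEAD if a == DEAD or b == DEAD else a for a, b in zip(r1, r2))

def partition(ems):
    """Group extension matrices so that no group's EMD has a dead row."""
    groups, current = [], None
    for em in ems:
        merged = None if current is None else [or_rows(r1, r2)
                                               for r1, r2 in zip(current, em)]
        if merged is None or any(all(v == DEAD for v in row) for row in merged):
            current = [tuple(row) for row in em]  # start a new EMD
            groups.append(current)
        else:
            current[:] = merged  # extend the current group's EMD in place

    return groups

em1 = [("rainy", "hot", "*", "*"), ("rainy", "cool", "normal", "*"),
       ("sunny", "hot", "normal", "*"), ("sunny", "*", "*", "*")]
em2 = [("rainy", "hot", "high", "windy"), ("rainy", "cool", "*", "windy"),
       ("sunny", "hot", "*", "windy"), ("sunny", "*", "high", "windy")]
em3 = [("*", "*", "*", "windy"), ("*", "cool", "normal", "windy"),
       ("sunny", "*", "normal", "windy"), ("sunny", "mild", "*", "windy")]
groups = partition([em1, em2, em3])
# groups[0] is the EMD of em1 and em2; groups[1] is em3 on its own
```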
Matrix Disjunction (EMD)
• Let's construct the EMD using just the first two Extension Matrices, "OR"-ing their dead elements:

    | rainy  hot   *       * |      | rainy  hot   high  windy |
    | rainy  cool  normal  * |      | rainy  cool  *     windy |
    | sunny  hot   normal  * |      | sunny  hot   *     windy |
    | sunny  *     *       * |      | sunny  *     high  windy |
Matrix Disjunction (EMD)
• The EMD has dramatically reduced the amount of superfluous information:

    | rainy  hot   *  * |
    | rainy  cool  *  * |
    | sunny  hot   *  * |
    | sunny  *     *  * |
Paths
• Choose one non-dead element from each row. This is called a path.

    | rainy  hot   *  * |
    | rainy  cool  *  * |
    | sunny  hot   *  * |
    | sunny  *     *  * |

• We can create paths in EMs and EMDs.
Path = Cover ≡ Conjunctive Formula
• The path corresponds to a conjunctive formula expressed in variable-valued logic.

    | rainy  hot   *  * |
    | rainy  cool  *  * |
    | sunny  hot   *  * |
    | sunny  *     *  * |

[Outlook ∈ {rainy, sunny}] → Don't_Play
Path = Cover ≡ Conjunctive Formula

    | rainy  hot   *  * |
    | rainy  cool  *  * |
    | sunny  hot   *  * |
    | sunny  *     *  * |

[Outlook ∈ {sunny}] ∨ [Temperature ∈ {hot, cool}] → Don't_Play
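Reading a formula off a path can be sketched by grouping the chosen elements by attribute into selectors. The helper and attribute names below are mine, hypothetical, and the textual output only approximates the variable-valued-logic notation:

```python
def path_to_formula(path, attr_names, concept="Don't_Play"):
    """Turn a path (one (column, value) choice per row) into a
    selector formula covering all rows of the EMD."""
    selectors = {}
    for col, value in path:
        selectors.setdefault(attr_names[col], []).append(value)
    body = " v ".join(
        "[%s in {%s}]" % (a, ", ".join(sorted(set(vs))))
        for a, vs in selectors.items())
    return "%s -> %s" % (body, concept)

attrs = ("Outlook", "Temperature", "Humidity", "Wind")
# One choice of non-dead element per row of the EMD above:
path = [(0, "rainy"), (0, "rainy"), (0, "sunny"), (0, "sunny")]
print(path_to_formula(path, attrs))
# [Outlook in {rainy, sunny}] -> Don't_Play
```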
HFL
Wu developed HFL to find good rules. An algorithm with four strategies, it finds a compact disjunction of conjunctions:
1. Fast
2. Precedence
3. Elimination
4. Least Frequency
HFL Strategies: Fast
X3 ≠ 1 covers all negative examples.
X3 ≠ 1 => positive class.
We can stop processing.

[Binary extension matrix over attributes X1–X4: column X3 is 1, with no dead elements, in every row.]
HFL Strategies: Precedence
• [X1 ≠ 1] and [X3 ≠ 1] are inevitable selectors.
• Record the conjunction and label the rows as covered.
• Below, a path is formed. All rows are covered. We are done.

[Extension matrix over attributes X1–X3: one row's only non-dead element is 1 in column X1, another's is 1 in column X3.]

[X1 ≠ 1] ∧ [X3 ≠ 1] → Positive_Class
[X1 = 1] ∨ [X3 = 1] → Negative_Class
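Detecting inevitable selectors can be sketched as a scan for rows with a single non-dead element; any selector forced this way is recorded and the rows it covers are marked done. The toy matrix is hypothetical (my own values), shaped so that X1 and X3 are forced:

```python
DEAD = "*"

def inevitable_selectors(em):
    """Precedence strategy: a row whose only non-dead element is at
    (column j, value v) forces the selector (j, v) into the rule."""
    forced = set()
    for row in em:
        live = [(j, v) for j, v in enumerate(row) if v != DEAD]
        if len(live) == 1:
            forced.add(live[0])
    covered = [i for i, row in enumerate(em)
               if any(row[j] == v for j, v in forced)]
    return forced, covered

# Toy matrix (hypothetical): rows 0 and 1 each have a single live element.
em = [(1, DEAD, DEAD),
      (DEAD, DEAD, 1),
      (1, 0, DEAD),
      (DEAD, 1, 1)]
forced, covered = inevitable_selectors(em)
# forced == {(0, 1), (2, 1)}; covered == [0, 1, 2, 3] -> every row is covered
```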
HFL Strategies: Elimination
• Redundant selectors in attribute X2 can be eliminated because non-dead X3 values cover all of the rows covered by X2.
• All elements in column X2 become dead elements.

[Before-and-after extension matrices over attributes X1–X4: the second matrix is the first with every X2 element replaced by a dead element.]
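Elimination can be sketched as a coverage-subset test between columns: if every row a column can cover is also covered by some other column, the first column is redundant and can be made all dead. A hedged sketch (my own helper names and toy values; I use proper subset to avoid killing two equally useful columns):

```python
DEAD = "*"

def covers(em, j):
    """Row indices a selector on column j can cover (non-dead entries)."""
    return {i for i, row in enumerate(em) if row[j] != DEAD}

def eliminate(em):
    """Elimination strategy: a column whose coverage is a proper subset
    of another column's coverage is redundant; make it all dead."""
    em = [list(row) for row in em]
    n = len(em[0])
    for j in range(n):
        if any(k != j and covers(em, j) < covers(em, k) for k in range(n)):
            for row in em:
                row[j] = DEAD
    return [tuple(row) for row in em]

# Toy matrix (hypothetical): X2 covers rows {0, 1}, a subset of X3's {0, 1, 2}.
em = [(1, 0, 1, DEAD),
      (DEAD, 0, 1, 0),
      (1, DEAD, 1, DEAD)]
result = eliminate(em)
# Column X2 (and every other column dominated by X3) is now all dead.
```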
HFL Strategies: Least Frequency
• Attribute X1 selectors are least frequent and can be eliminated.
• Other strategies must be applied before applying Least Frequency again.

[Before-and-after extension matrices over attributes X1–X3: the second matrix is the first with column X1 made all dead elements.]
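Least Frequency can be sketched as counting non-dead entries per column and deadening the least useful live column. Again a hedged sketch with hypothetical toy values, not the slide's matrix:

```python
DEAD = "*"

def least_frequency(em):
    """Least-frequency strategy: dead-out the live column whose selectors
    cover the fewest rows; re-run the other strategies before repeating."""
    counts = {j: sum(row[j] != DEAD for row in em) for j in range(len(em[0]))}
    live = {j: c for j, c in counts.items() if c > 0}
    target = min(live, key=live.get)
    return [tuple(DEAD if j == target else v for j, v in enumerate(row))
            for row in em]

# Toy matrix (hypothetical): X1 is live in 2 rows, X2 and X3 in 3 each.
em = [(1, 0, 1),
      (DEAD, 0, DEAD),
      (DEAD, 0, 1),
      (1, DEAD, 1)]
result = least_frequency(em)
# Column X1 is now all dead elements.
```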
HCV Algorithm
• HCV improves HFL:
1. Partition the positive examples into intersecting groups.
2. Apply HFL on each partition.
3. OR the conjunctive formulae from each partition.

Well described in: http://www.cs.uvm.edu/~xwu/Publication/JASIS.ps
See Wu's 1993 Ph.D. dissertation for more background: http://www.era.lib.ed.ac.uk/bitstream/1842/581/3/1993xindongw.pdf
HCV Software
• Features many refinements and switches.
• Works with C4.5 data.
• Can be run through a web interface: HCV Online Interface.
• Is described in Appendix A of Wu's textbook, and online: HCV Manual.
Golf
Rules for the 'Play' class (Covering 3 examples):
The 1st conjunctive rule:
[ temperature != { cool } ] ^ [ outlook != { sunny } ] --> the 'Play' class (Positive examples covered: 3)
Rules for the 'Don't_Play' class (Covering 4 examples):
The 2nd conjunctive rule:
[ outlook != { overcast } ] ^ [ wind = { windy } ] --> the 'Don't_Play' class (Positive examples covered: 4)
The total number of conjunctive rules is: 2
The default class is: 'Don't_Play' (Examples in class: 4)
Time taken for induction (seconds): 0.0 (real), 0.0 (user), 0.0 (system)
Rule file or preprocessed test file not found. Skipping deduction.
HCV
• HCV is competitive with other decision-tree and rule-producing algorithms.
• HCV generally produces more compact rules.
• HCV outputs variable-valued logic.
• HCV handles noise and discretization.
• HCV guarantees a "conjunctive rule for a concept".
Ideas
• Can HFL/HCV be applied to chess? Bratko did this with ID3. [Crevier 1993, 177]
• How can HCV be parallelized?
• How does the extension matrix approach work in closed-world situations?
• Is HCV 2.0 a good candidate for automated parameter tuning by genetic algorithm or other evolutionary technique?
The End.
• Presentation based on slides by Leslie Damon.
• Questions?
Exam Questions
• Definitions:
Extension Matrix: a matrix of negative examples as row vectors, where, for a given positive example, elements that match the positive example are replaced with dead elements, denoted as '*'.
Dead Element: an element of a negative example which cannot be used to distinguish a given positive example from the negative example.
Path: a set of non-dead elements, one from each of the rows of an extension matrix.
Exam Questions
• Four stages of HFL:
1. Fast: a single attribute value that covers all rows.
2. Precedence: favor attributes that are the only non-dead element of a row.
3. Elimination: get rid of redundant elements.
4. Least Frequency: get rid of the columns whose non-dead values cover the fewest rows.
See slides labeled "HFL Strategies".
Exam Questions
• The Pneumonia/Tuberculosis problem is worked through in the paper and Leslie Damon's slides. Here is the EMD:

[EMD for the Pneumonia/Tuberculosis example; its entries include the values absent, strip, hole, fast, and normal, plus dead elements *.]