Classification in Complex Systems

Why we should look at the paper:
CAEP: Classification by Aggregating Emerging Patterns
G. Dong, X. Zhang, L. Wong, and J. Li

What are Common Problems in Classification?
Many variables
Graphs that relate tuples
 Protein-protein interactions (KDD-cup 02)
 Citations (KDD-cup 03)
 Anything that violates standard table format

Many Variables
Solution:
 Naïve Bayes way of multiplying probabilities
 Other additive models
Problems:
 Many factors
 May be correlated
 Noise
… but it gets worse
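
A minimal sketch of the Naïve Bayes combination named on this slide, using made-up numbers (the class labels, the attribute `a`, the priors, and the likelihood values are illustrative assumptions, not taken from the paper). Working in log space turns the product into a sum, which is why it sits next to other additive models. The second call lists the same attribute twice to mimic two perfectly correlated attributes: the evidence is counted twice and the posterior becomes overconfident.

```python
import math

def log_scores(priors, likelihoods, instance):
    """Unnormalized log-posterior per class: log P(C) + sum of log P(value | C)."""
    scores = {}
    for label, prior in priors.items():
        total = math.log(prior)
        for attribute, value in instance:
            total += math.log(likelihoods[label][attribute][value])
        scores[label] = total
    return scores

def posterior(scores):
    """Turn log-scores into probabilities that sum to 1."""
    top = max(scores.values())
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}

# Hypothetical toy model: two classes, one binary attribute "a".
priors = {"P": 0.5, "N": 0.5}
likelihoods = {
    "P": {"a": {1: 0.8, 0: 0.2}},
    "N": {"a": {1: 0.4, 0: 0.6}},
}

once = posterior(log_scores(priors, likelihoods, [("a", 1)]))
# Same attribute listed twice, standing in for two perfectly correlated attributes.
twice = posterior(log_scores(priors, likelihoods, [("a", 1), ("a", 1)]))
```

With these numbers the posterior for class P rises from about 0.67 to 0.8 although no new information was added, which is the correlation problem listed above.
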
Graphs
2 kinds of attributes
 Attributes within nodes
 Attributes of neighbor and more distant nodes
How do neighbor attributes count?
 Take disjunction?
  “At least one neighbor that has a particular property”
 Probably preferable:
  Use links or, more generally, paths as basis
  Integration into classification???

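The two options above can be sketched as derived boolean attributes on a small graph. This is only an illustration: the adjacency dictionary, the node names, and the property `"nuclear"` are hypothetical, and the path-based variant simply uses breadth-first search up to a hop limit, which is one possible reading of "paths as basis".

```python
from collections import deque

def any_neighbor_has(adjacency, attributes, node, prop):
    """Disjunction over direct neighbors: at least one neighbor has `prop`."""
    return any(prop in attributes[n] for n in adjacency[node])

def nodes_within(adjacency, start, max_hops):
    """Nodes reachable from `start` by a path of 1..max_hops links (BFS)."""
    seen = {start}
    found = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, distance = frontier.popleft()
        if distance == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                frontier.append((neighbor, distance + 1))
    return found

def any_within(adjacency, attributes, node, prop, max_hops):
    """Disjunction over all nodes reachable within `max_hops` links."""
    return any(prop in attributes[n] for n in nodes_within(adjacency, node, max_hops))

# Hypothetical chain a - b - c with per-node attribute sets.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
attributes = {"a": {"kinase"}, "b": set(), "c": {"nuclear"}}

direct = any_neighbor_has(adjacency, attributes, "a", "nuclear")
two_hops = any_within(adjacency, attributes, "a", "nuclear", 2)
```

Here node `a` has no direct neighbor with the property, but one is reachable over a path of length 2, so the two derived attributes disagree. How such attributes should enter the classifier is the open question the slide raises.
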
Idea
Get away from strict set of n attributes
If an attribute or combination of attributes is “interesting”, use it
Combining rules?
 I would have guessed as in Naïve Bayes
 CAEP adds probabilities!?

What is “interesting”
CAEP paper claims “growth rate”
 Support of a rule increases significantly from one class label to another
 Note: Only increase, not decrease!
 What does that mean?
  For pattern e and classes P and N:
  $\mathrm{growth\_rate}_{PN}(e) = \mathrm{supp}_N(e) / \mathrm{supp}_P(e)$

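A small sketch of the growth-rate definition above, on invented data. Instances are modeled as sets of items and a pattern as a set of items that must all be present; the item names and the two toy classes are assumptions for illustration. The handling of zero support (0 when both supports are 0, infinity when only the source class has support 0) is the usual convention for emerging patterns and is assumed here.

```python
import math

def support(pattern, instances):
    """Fraction of instances that contain every item of `pattern`."""
    if not instances:
        return 0.0
    return sum(1 for instance in instances if pattern <= instance) / len(instances)

def growth_rate(pattern, from_class, to_class):
    """supp_to(e) / supp_from(e), i.e. growth from `from_class` to `to_class`."""
    supp_from = support(pattern, from_class)
    supp_to = support(pattern, to_class)
    if supp_from == 0:
        # Assumed convention: 0/0 -> 0, positive/0 -> infinity.
        return 0.0 if supp_to == 0 else math.inf
    return supp_to / supp_from

# Hypothetical toy data for classes P and N.
class_p = [{"a", "b"}, {"a"}, {"b"}, {"c"}]
class_n = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}]

e = frozenset({"a", "b"})
rate_p_to_n = growth_rate(e, class_p, class_n)  # supp_N = 0.5, supp_P = 0.25
rate_n_to_p = growth_rate(e, class_n, class_p)
```

The pattern `{a, b}` doubles its support from P to N, so it counts toward N. In the other direction its rate is below 1, and a pattern whose support only decreases is not used, which is the "only increase" note above.
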
2 Things Worth Investigating
Is the “interestingness” measure related to information gain?
 Under certain assumptions: Yes
Can the “score” be justified?
 Sum of P(C)!?

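To make the "sum of P(C)" concern concrete, here is a sketch of the aggregate score as it is understood here from the CAEP paper: each emerging pattern of class C that occurs in the instance contributes growth_rate / (growth_rate + 1), weighted by the pattern's support in C, and the contributions are added. With equally sized classes the fraction equals supp_C(e) / (supp_C(e) + supp_other(e)), which reads like an estimate of P(C | e). The pattern list and its numbers below are invented for illustration.

```python
import math

def contribution(growth_rate, support_in_class):
    """One pattern's share of the score: growth/(growth+1) times its support."""
    if math.isinf(growth_rate):
        fraction = 1.0  # limit of g / (g + 1) as g grows without bound
    else:
        fraction = growth_rate / (growth_rate + 1.0)
    return fraction * support_in_class

def aggregate_score(instance, emerging_patterns):
    """Add up the contributions of all patterns contained in the instance."""
    return sum(
        contribution(rate, supp)
        for pattern, rate, supp in emerging_patterns
        if pattern <= instance
    )

# Hypothetical emerging patterns of class N: (pattern, growth rate, support in N).
patterns_n = [
    (frozenset({"a", "b"}), 2.0, 0.5),
    (frozenset({"a", "c"}), math.inf, 0.25),
    (frozenset({"d"}), 3.0, 0.4),
]

score_n = aggregate_score({"a", "b", "c"}, patterns_n)
```

The first two patterns match the instance, so probability-like terms are added rather than multiplied as Naïve Bayes would do. The result is not itself a probability and can exceed 1 when many patterns match.
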
Other Issues
Normalization
 Emerging patterns only consider increase in support => different number of relevant patterns per class
How to mine for EPs

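A sketch of the normalization issue above: if one class has many more emerging patterns, its raw scores are larger across the board. The remedy assumed here, as understood from the CAEP paper, is to divide each raw score by a per-class base score taken at a fixed percentile of the training instances' scores for that class. The percentile value, the function names, and all numbers are illustrative assumptions.

```python
def base_score(training_scores, percentile=0.5):
    """Score at the given percentile of a class's own training-instance scores."""
    ordered = sorted(training_scores)
    index = min(len(ordered) - 1, int(percentile * len(ordered)))
    return ordered[index]

def classify(raw_scores, base_scores):
    """Pick the class with the largest normalized score."""
    normalized = {label: raw_scores[label] / base_scores[label] for label in raw_scores}
    best = max(normalized, key=normalized.get)
    return best, normalized

# Hypothetical training scores: class P has many patterns, so its scores run high.
base_scores = {
    "P": base_score([2.0, 4.0, 6.0, 8.0, 10.0]),
    "N": base_score([0.2, 0.4, 0.6, 0.8, 1.0]),
}

raw = {"P": 3.0, "N": 0.9}
label, normalized = classify(raw, base_scores)
```

The raw scores favor P, but relative to what is typical for each class the instance scores low for P and high for N, so the normalized decision is N.
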
Conclusions
Idea very valuable
 Classification split into ARM-step and rule combination
Justification of details?
 Not great
 Should be possible to do it right - with poorer accuracy ;-)