Classification in Complex Systems
Why we should look at the paper:
CAEP: Classification by Aggregating Emerging Patterns
G. Dong, X. Zhang, L. Wong, and J. Li
What are Common Problems in Classification?
Many variables
Graphs that relate tuples
Protein-protein interactions (KDD-cup 02)
Citations (KDD-cup 03)
Anything that violates the standard table format
Many Variables
Solution:
Naïve Bayes way of multiplying probabilities
Other additive models
Problems:
Many factors
May be correlated
Noise
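As a point of comparison for the later slides, the "multiplying probabilities" step can be sketched as follows. This is a minimal illustration, not code from the paper: the function names, the add-one smoothing, and the `n_values=2` default (binary attributes) are all assumptions made for the example. Working in log space turns the product into a sum, which also shows why many correlated or noisy factors can dominate the result.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class (attribute index, value) frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][(i, value)] += 1
    return class_counts, value_counts


def log_posterior(row, label, class_counts, value_counts, n_values=2):
    """Unnormalized log P(label | row); n_values is an assumed attribute domain size."""
    total = sum(class_counts.values())
    logp = math.log(class_counts[label] / total)
    for i, value in enumerate(row):
        count = value_counts[label][(i, value)]
        # Add-one smoothing (an assumption) keeps unseen values from zeroing the product.
        logp += math.log((count + 1) / (class_counts[label] + n_values))
    return logp


def predict(row, class_counts, value_counts):
    """Return the class with the largest product of probabilities."""
    return max(class_counts,
               key=lambda c: log_posterior(row, c, class_counts, value_counts))
```

Every attribute contributes one factor, whether or not it is informative, which is the "many factors" problem listed above.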
… but it gets worse
Graphs
2 kinds of attributes
How do neighbor attributes count?
Attributes within nodes
Attributes of neighbor and more distant nodes
Take disjunction?
“At least one neighbor that has a particular property”
Probably preferable:
Use links or, more generally, paths as the basis
Integration into classification???
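The disjunction idea ("at least one neighbor has a particular property") could be sketched as a way of flattening a graph back into table form. The adjacency-dict representation and the names `has_neighbor_with` and `flatten_node` are hypothetical choices for this illustration; the slides do not prescribe a data structure.

```python
def has_neighbor_with(graph, node_attrs, node, prop):
    """True if at least one direct neighbor of `node` carries property `prop`."""
    return any(prop in node_attrs.get(nb, set()) for nb in graph.get(node, ()))


def flatten_node(graph, node_attrs, node, props):
    """Build one table row: the node's own attributes plus neighbor disjunctions."""
    row = {}
    for p in props:
        row[f"self:{p}"] = p in node_attrs.get(node, set())
        row[f"neighbor:{p}"] = has_neighbor_with(graph, node_attrs, node, p)
    return row


# Toy undirected graph as an adjacency dict (hypothetical data).
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
node_attrs = {"a": set(), "b": {"kinase"}, "c": set()}
row_a = flatten_node(graph, node_attrs, "a", ["kinase"])
```

The disjunction discards how many neighbors have the property and which link leads to them, which may be part of why the slide prefers links or paths as the basis.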
Idea
Get away from strict set of n attributes
If an attribute or combination of attributes is “interesting”, use it
Combining rules?
I would have guessed as in Naïve Bayes
CAEP adds probabilities!?
What is “interesting”?
CAEP paper uses “growth rate”
Support of a rule increases significantly from one class label to another
Note: Only increase, not decrease!
What does that mean?
For pattern e and classes P and N
$\mathrm{growth\_rate}_{P \to N}(e) = \mathrm{supp}_N(e) \,/\, \mathrm{supp}_P(e)$
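The growth-rate definition can be sketched in a few lines, assuming each instance is a set of items and a pattern is supported by an instance when it is a subset of it. The handling of a zero denominator (0 when both supports are zero, infinity otherwise) is a convention assumed here, since the slide does not cover that case.

```python
def support(pattern, instances):
    """Fraction of instances (sets of items) that contain every item of `pattern`."""
    if not instances:
        return 0.0
    pattern = frozenset(pattern)
    return sum(pattern <= inst for inst in instances) / len(instances)


def growth_rate(pattern, from_class, to_class):
    """Growth rate of `pattern` going from `from_class` (P) to `to_class` (N)."""
    s_from = support(pattern, from_class)
    s_to = support(pattern, to_class)
    if s_from == 0:
        # Assumed convention: 0 if the pattern occurs nowhere, infinite otherwise.
        return 0.0 if s_to == 0 else float("inf")
    return s_to / s_from


# Toy data (hypothetical): each instance is a set of attribute-value items.
P = [{"a"}, {"a", "b"}, {"b"}, {"c"}]
N = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
rate = growth_rate({"a", "b"}, P, N)  # supp_N = 3/4, supp_P = 1/4
```

Because only increases count, a pattern with a rate below 1 in the P to N direction is simply not an emerging pattern for N; it may be one for P.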
2 Things Worth Investigating
Is the “interestingness” measure related to information gain?
Under certain assumptions: Yes
Can the “score” be justified?
Sum of P(C)!?
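One way to read the "sum" being questioned: assume each emerging pattern e contained in the instance contributes growth_rate(e) / (growth_rate(e) + 1) times its support in the target class, and the contributions are added. The first factor behaves like P(C | e) when both classes are equally large, hence "Sum of P(C)". The sketch below works under that assumption; the tuple layout and function names are illustrative, not taken from the slides.

```python
import math


def strength(growth, supp_target):
    """Contribution of one emerging pattern: growth/(growth+1) * support in target class."""
    if math.isinf(growth):
        # The limit of growth/(growth+1) as growth goes to infinity is 1.
        return supp_target
    return growth / (growth + 1.0) * supp_target


def aggregate_score(instance, emerging_patterns):
    """Add (not multiply) the contributions of all patterns contained in `instance`.

    `emerging_patterns` is a list of (pattern, growth_rate, support_in_target_class).
    """
    return sum(strength(g, s)
               for pattern, g, s in emerging_patterns
               if pattern <= instance)


# Hypothetical emerging patterns for one class.
eps = [
    (frozenset({"a", "b"}), 3.0, 0.75),
    (frozenset({"c"}), float("inf"), 0.5),
    (frozenset({"d"}), 1.5, 0.2),
]
score = aggregate_score({"a", "b", "c"}, eps)  # 0.5625 + 0.5
```

Unlike the Naïve Bayes product, a pattern that is absent from the instance contributes nothing here, and overlapping patterns are counted independently, which is the part that seems to need justification.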
Other Issues
Normalization
Emerging patterns only consider increase in support => different number of relevant patterns
How to mine for EPs
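On the normalization issue: since classes can have different numbers of relevant patterns, raw summed scores are not directly comparable across classes. One possible fix, sketched here as an assumption rather than as the paper's exact procedure, is to divide each raw score by a per-class base score taken at a chosen percentile of the scores that the class's own training instances receive. The names and the default `percentile=0.5` are illustrative.

```python
def base_score(training_scores, percentile=0.5):
    """Score at the given percentile of a class's own training-instance scores."""
    if not training_scores:
        raise ValueError("need at least one training score")
    ordered = sorted(training_scores)
    index = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[index]


def normalized_score(raw_score, base):
    """Raw aggregate score relative to the class's base score."""
    return raw_score / base if base > 0 else 0.0


def classify(raw_scores, base_scores):
    """Pick the class whose normalized score is largest."""
    return max(raw_scores,
               key=lambda c: normalized_score(raw_scores[c], base_scores[c]))
```

With this scaling, a class with many emerging patterns no longer wins simply because more terms enter its sum.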
Conclusions
Idea very valuable
Justification of details?
Classification split into ARM-step and rule combination
Not great
Should be possible to do it right - with poorer accuracy ;-)