Interactive Interaction
Analysis
Aleks Jakulin & Gregor Leban
Faculty of Computer and Information Science
University of Ljubljana
Slovenia
Overview
1. Interactions:
   – Correlation can be generalized to more than 2 attributes, to capture interactions: higher-order regularities.
2. Information theory:
   – A non-parametric approach for measuring ‘association’ and ‘uncertainty’.
3. Applications:
   – Visualizations of the domain uncover previously unseen structure.
   – Software for interactive investigation of data assists the user in identifying interesting patterns.
4. Importance:
   – Understanding possible problems and assumptions in machine learning algorithms.
Attribute Dependencies
[Diagram: the label C and the attributes A and B. 2-way dependencies: the importance of attribute A, the importance of attribute B, and the correlation between the attributes. 3-way: interactions.]
Interaction: what is common to A, B and C together, and cannot be inferred from any subset of the attributes.
Shannon’s Entropy
– H(C): the entropy given C’s empirical probability distribution (p = [0.2, 0.8]).
– H(A): the information which came with the knowledge of A.
– I(A;C) = H(A) + H(C) - H(AC): mutual information, or information gain; how much do A and C have in common?
– H(C|A) = H(C) - I(A;C): conditional entropy; the remaining uncertainty in C after knowing A.
– H(AC): the joint entropy of A and C.
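These quantities can be computed directly from empirical counts. Below is a minimal Python sketch (not the authors' code); the helper names entropy and joint_entropy and the toy data are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def entropy(values):
    """Shannon entropy H(X) in bits, from a list of discrete observations."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def joint_entropy(xs, ys):
    """Joint entropy H(XY), treating each pair of observations as one symbol."""
    return entropy(list(zip(xs, ys)))

# Toy data (illustrative): the class C has the distribution p = [0.2, 0.8].
C = ['edible'] * 2 + ['inedible'] * 8
A = ['none', 'foul', 'none', 'foul', 'foul',
     'foul', 'foul', 'none', 'foul', 'foul']

H_A, H_C, H_AC = entropy(A), entropy(C), joint_entropy(A, C)
I_AC = H_A + H_C - H_AC        # mutual information I(A;C)
H_C_given_A = H_C - I_AC       # conditional entropy H(C|A)
print(f"H(C)={H_C:.3f}  I(A;C)={I_AC:.3f}  H(C|A)={H_C_given_A:.3f}")
```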
Interaction Information
I(A;B;C) := I(AB;C) - I(A;C) - I(B;C) = I(A;B|C) - I(A;B)
• Interaction information can be:
– POSITIVE – synergy between attributes
– NEGATIVE – redundancy among attributes
– SMALL – nothing special about the 3-way relationship
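A hedged sketch of this definition follows, reusing the entropy helpers from the previous sketch; the function names are illustrative, not part of the authors' software.

```python
def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(XY)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)

def interaction_information(a, b, c):
    """I(A;B;C) = I(AB;C) - I(A;C) - I(B;C)."""
    ab = list(zip(a, b))   # treat the pair (A,B) as a single joint attribute
    return (mutual_information(ab, c)
            - mutual_information(a, c)
            - mutual_information(b, c))
```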
Examples: A Useful Attribute
Mutual information, or information gain, between the attribute and the label.
The only type of odor that does not unambiguously predict the class of the mushroom (edible, inedible).
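As an illustration of ranking attributes by information gain against the label, a small sketch follows; the data dictionary and its values are hypothetical stand-ins for the mushroom data set, not the actual data.

```python
# Hypothetical data standing in for the mushroom domain; attribute values are
# invented for illustration.  mutual_information() comes from the sketch above.
data = {
    'odor':              ['none', 'foul', 'almond', 'none', 'foul', 'none'],
    'spore-print-color': ['brown', 'white', 'brown', 'brown', 'white', 'black'],
    'stalk-shape':       ['t', 'e', 't', 'e', 't', 'e'],
}
label = ['edible', 'inedible', 'edible', 'edible', 'inedible', 'edible']

# Rank attributes by their information gain I(A;C) with respect to the label.
ranking = sorted(((mutual_information(values, label), name)
                  for name, values in data.items()), reverse=True)
for gain, name in ranking:
    print(f"{name:18s} I(A;C) = {gain:.3f} bits")
```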
Another Useful Attribute
A Negative Interaction
The proportion of information provided by either of the two attributes.
This is the “overlap” between both mutual informations.
A Negative Interaction
That’s the gain of s-p-c if we already know the odor.
The only column where spore-print-color succeeded in providing some information in excess of what we already knew from the odor.
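This residual gain is the conditional mutual information I(B;C|A) = I(AB;C) - I(A;C). A minimal sketch, reusing the helpers and the illustrative data from above:

```python
def conditional_mutual_information(b, c, a):
    """I(B;C|A) = I(AB;C) - I(A;C): gain from B once A is already known."""
    ab = list(zip(a, b))
    return mutual_information(ab, c) - mutual_information(a, c)

residual = conditional_mutual_information(
    data['spore-print-color'], label, data['odor'])
print(f"I(s-p-c; class | odor) = {residual:.3f} bits")
```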
One Somewhat Useful Attribute
A (Seemingly) Useless Attribute
Stalk-shape is totally uninformative, as the class distribution is similar at all attribute values. That’s why we cannot distinguish between the classes using this attribute.
Surprise: A Positive Interaction!
Information gained by the holistic treatment of both attributes!
Again, this is “new” mutual information arising from both attributes.
Why a Positive Interaction?
Specific attribute value combinations yield perfect label predictions, but only when both attributes are considered together.
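A synthetic XOR-style example (not taken from the mushroom data) makes the synergy explicit: neither attribute alone is informative, but together they determine the label, so the interaction information is positive.

```python
# Synthetic XOR-style example: neither A nor B alone says anything about C,
# but together they determine C exactly, so I(A;B;C) is positive (+1 bit of
# synergy).  Reuses mutual_information() and interaction_information() above.
A = [0, 0, 1, 1] * 4
B = [0, 1, 0, 1] * 4
C = [a ^ b for a, b in zip(A, B)]           # C = A XOR B

print(mutual_information(A, C))             # ~0 bits: A alone is useless
print(mutual_information(B, C))             # ~0 bits: B alone is useless
print(interaction_information(A, B, C))     # +1 bit: A and B are synergistic
```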
Whole Domain: Interaction Matrix
Interaction Graph
An Interaction Dendrogram
[Dendrogram annotations: an unimportant interaction; a cluster of negatively interacting attributes; a positive interaction; a weakly negative interaction.]
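A rough sketch of how such a matrix and dendrogram might be built, under assumptions that are not necessarily the authors' procedure: pairwise interaction informations with the label fill the matrix, and their magnitudes are converted into distances for hierarchical clustering. The distance choice is an illustrative assumption; data, label and interaction_information() come from the earlier sketches.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

# Interaction matrix: I(Ai;Aj;C) for every pair of attributes.
names = list(data.keys())
n = len(names)
M = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        M[i, j] = M[j, i] = interaction_information(
            data[names[i]], data[names[j]], label)

# Distance choice (an assumption): strongly interacting pairs, whether the
# interaction is positive or negative, end up close together.
D = np.abs(M).max() - np.abs(M)
np.fill_diagonal(D, 0.0)

Z = linkage(squareform(D), method='average')
dendrogram(Z, labels=names)
plt.show()
```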
Information Diagram
A dissected Venn diagram helps investigate higher-order interactions; its regions are labelled as redundancy and synergy.
Multi-Dimensional Scaling
Interactive Interaction Analysis
[Screenshot annotations: the attributes of interest; a sorted list of interactions, ordered by the interaction magnitude; an interaction graph.]
Summary
1. There are relationships exclusive to groups of n attributes.
2. Interaction information is a heuristic for the quantification of such relationships with entropy.
3. Visualization methods attempt to:
   • summarize the interactions in the domain (interaction graph, interaction dendrogram),
   • assist the user in exploring the domain and constructing classification models (interactive interaction analysis).
Work in Progress
• Overfitting: the interaction information computations do not account for the increase in complexity.
• Support for numerical and ordered attributes.
• Inductive learning algorithms which use these heuristics automatically.
• Models that are based on the real relationships in the data, not on our assumptions about them.