Relational Evaluation Techniques - Graph-RAT
Relational Evaluation Techniques
Daniel McEnnis
Outline
Definition
Component Overview
Existing Approaches
Descriptions of the Components
Applications and Examples
1/29
Definition
Experimental setup for evaluating the performance of algorithms whose data spans more than one table or instance vector
Can be described with either relational algebra or hypergraphs
2/29
Components
Data Acquisition
Ground Truth Acquisition
Cross-Validation Technique
Query Type
Scoring Metric
Significance Test
3/29
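A minimal sketch, assuming Python, of an experiment treated as one choice per component. The names `ExperimentConfig` and `CrossValidation` are hypothetical and not taken from Graph-RAT; only the four cross-validation options come from these slides, and the filled-in values restate the music recommendation example given later in this talk.

```python
from dataclasses import dataclass, replace
from enum import Enum


class CrossValidation(Enum):
    # The four options from the Cross-Validation slide.
    ACTOR = "actor"
    LINK = "link"
    GRAPH = "graph"
    NONE = "none"


@dataclass(frozen=True)
class ExperimentConfig:
    # One field per component; the string values are free-form labels.
    data_acquisition: str
    ground_truth: str
    cross_validation: CrossValidation
    query_type: str
    scoring_metric: str
    significance_test: str


# The Personalized Dynamic Tag Radio example expressed as component choices.
tag_radio = ExperimentConfig(
    data_acquisition="LastFM profile and tag data, Semantic Web data",
    ground_truth="next-week data",
    cross_validation=CrossValidation.GRAPH,
    query_type="conditional",
    scoring_metric="Kendall tau",
    significance_test="ANOVA/Tukey-Kramer",
)

# Plug 'n play: swap one component and keep the others unchanged.
spearman_variant = replace(tag_radio, scoring_metric="Spearman correlation")
```

Because each component is a separate field, changing one choice leaves the rest of the experiment intact; whether the algorithm still works under the new combination is exactly what the later Sequences of Choices slide warns about.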
Existing Approaches
Machine Learning
Relational Machine Learning
TREC
Collaborative Filtering
ISMIR
Social Network Analysis
4/29
Machine Learning
Predetermined flat data, no sampling
Predetermined ground truth
Typically simple queries
Sophisticated cross-validation
Basic set-based metrics
No significance tests
5/29
Relational Machine Learning
Predetermined relational data
Predetermined ground truth
Predefined simple query
Sophisticated cross-validation
Basic set-based metrics
No significance tests
6/29
TREC
Predetermined flat data
Sophisticated ground-truth sampling
Sophisticated queries
Machine-learning cross-validation
Ranked set-of-sets scoring
Simple significance tests
7/29
Collaborative Filtering
Predetermined flat/relational data
Predetermined ground truth
Simple, predefined query
No cross-validation
Sophisticated scoring metrics
No significance tests
8/29
ISMIR
Sampled flat data
Predetermined ground truth
Sophisticated queries
Machine-learning cross-validation
Simple set-based scoring metrics
Sophisticated significance tests
9/29
Social Network Analysis
Sophisticated data sampling
Sophisticated statistical techniques
10/29
Sequences of Choices
Plug-and-play experiment assembly
Different aspects are evaluated
Some algorithms simply don’t work
Extensive algorithm rewrites are sometimes needed
11/29
Data Acquisition
Data structure
Where is it?
Which sampling technique to use?
Random Access
Snowball
Hypergraph Snowball
How much data is needed?
12/29
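Snowball sampling can be sketched as a breadth-first expansion from seed actors, one wave per round. This minimal Python sketch assumes the data source can be asked for an actor's neighbours (modelled here as a plain dict); `snowball_sample`, `max_waves` and `max_actors` are hypothetical names, and the two limits stand in for the "how much data is needed?" decision. Random-access sampling would instead draw actors uniformly from a known list of all actors.

```python
from collections import deque


def snowball_sample(neighbors, seeds, max_waves, max_actors=None):
    """Start from the seed actors and add every actor linked to the current
    wave, one wave at a time, until max_waves or max_actors is reached."""
    sampled = set(seeds)
    frontier = deque(seeds)
    for _ in range(max_waves):
        next_frontier = deque()
        while frontier:
            actor = frontier.popleft()
            for other in neighbors.get(actor, ()):
                if other in sampled:
                    continue
                if max_actors is not None and len(sampled) >= max_actors:
                    return sampled  # sample budget exhausted
                sampled.add(other)
                next_frontier.append(other)
        frontier = next_frontier
    return sampled


friends = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": ["f"], "e": [], "f": []}
print(sorted(snowball_sample(friends, ["a"], max_waves=2)))
# ['a', 'b', 'c', 'd', 'e']
```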
Ground Truth Acquisition
What is being tested?
TREC extended ground truth sampling
Structure of the output
13/29
Cross-Validation
Actor-based
Link-based
Graph-based
No cross-validation
14/29
Graph Notation
Actor definition
Link definition
Graph definition
Database table / instance vector equivalence
Foreign key / link equivalence
15/29
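One way to make the table/graph equivalence concrete, sketched in Python under the assumption that each row has an `id` column and a single foreign-key column: rows become actors and non-null foreign-key values become links. The class names, the `mode` field and `graph_from_rows` are illustrative only, not Graph-RAT's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Actor:
    # One row of a database table, i.e. one instance vector.
    mode: str   # which table the row comes from, e.g. "user" or "artist"
    ident: str  # the row's primary key


@dataclass(frozen=True)
class Link:
    # One foreign-key reference from a row in one table to a row in another.
    relation: str
    source: Actor
    target: Actor


@dataclass
class Graph:
    actors: set = field(default_factory=set)
    links: list = field(default_factory=list)

    def add_link(self, link):
        # Both endpoints of a link are actors of the graph.
        self.actors.update((link.source, link.target))
        self.links.append(link)


def graph_from_rows(rows, relation, source_mode, target_mode, fk_column):
    """Each row becomes an actor; each non-null foreign key becomes a link."""
    graph = Graph()
    for row in rows:
        source = Actor(source_mode, row["id"])
        graph.actors.add(source)
        if row.get(fk_column) is not None:
            target = Actor(target_mode, row[fk_column])
            graph.add_link(Link(relation, source, target))
    return graph


users = [
    {"id": "u1", "favourite_artist": "radiohead"},
    {"id": "u2", "favourite_artist": "radiohead"},
    {"id": "u3", "favourite_artist": None},
]
g = graph_from_rows(users, "favourite", "user", "artist", "favourite_artist")
print(len(g.actors), len(g.links))  # 4 2
```

Frozen dataclasses are hashable, so the two references to the same artist collapse into a single actor.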
Actor Cross-Validation
Traditional Machine Learning approach
Divisions by database table
Folds usually assigned at random
Works well on flat data
Trouble with relational data
16/29
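A minimal Python sketch of actor cross-validation with random fold assignment (function names hypothetical). The extra `cross_fold_links` helper counts links whose endpoints land in different folds, one concrete form of the trouble with relational data: such links carry information between training and test actors.

```python
import random


def actor_folds(actors, k, seed=0):
    """Randomly assign each actor to one of k folds of near-equal size."""
    shuffled = sorted(actors)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def train_test_splits(folds):
    """Yield (train, test) pairs; each fold is the test set exactly once."""
    for i, test in enumerate(folds):
        train = [a for j, fold in enumerate(folds) if j != i for a in fold]
        yield train, test


def cross_fold_links(links, folds):
    """Count links whose two endpoints fall in different folds."""
    fold_of = {a: i for i, fold in enumerate(folds) for a in fold}
    return sum(1 for u, v in links if fold_of[u] != fold_of[v])
```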
Link Cross-Validation
Rare in machine learning
Divisions by foreign-key reference
Less statistical independence than actor cross-validation
Works for collaborative filtering
Usually random assignment
17/29
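Link cross-validation assigns links rather than actors to folds, so every actor stays in the graph and only some of its links are hidden. The sketch below assumes links are unique `(source, target)` pairs, such as user-item references in collaborative filtering, and deals each source actor's links round-robin across folds so users keep training links in every split; that stratification is an assumed design choice, not necessarily what Graph-RAT does.

```python
import random
from collections import defaultdict


def link_folds(links, k, seed=0):
    """Assign (source, target) links to k folds. Each source actor's links are
    shuffled and dealt round-robin, so actors with at least k links appear in
    every fold."""
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for link in links:
        by_source[link[0]].append(link)
    folds = [[] for _ in range(k)]
    offset = 0
    for source in sorted(by_source):
        group = sorted(by_source[source])
        rng.shuffle(group)
        for i, link in enumerate(group):
            folds[(offset + i) % k].append(link)
        offset += len(group)  # continue dealing where the previous actor stopped
    return folds


def split(folds, test_index):
    """The hidden links of one fold form the test set; all other links train."""
    test = folds[test_index]
    train = [link for j, fold in enumerate(folds) if j != test_index for link in fold]
    return train, test
```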
Graph Cross-Validation
Relational Machine Learning
Divisions by predetermined discrete graphs
Statistical independence
Non-learning-based approaches
Clustering-based fold generation
18/29
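As an illustration of graph cross-validation, this sketch treats the connected components of the graph as the predetermined discrete graphs and places whole components into folds, so no link crosses a fold boundary. Using connected components instead of a clustering algorithm, and the greedy largest-first balancing, are simplifying assumptions; the function names are hypothetical.

```python
from collections import defaultdict


def connected_components(actors, links):
    """Split an undirected graph into its connected components."""
    adjacency = defaultdict(set)
    for u, v in links:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in sorted(actors):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.append(node)
            for other in adjacency[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(sorted(component))
    return components


def graph_folds(components, k):
    """Place whole components, largest first, into the currently smallest fold."""
    folds = [[] for _ in range(k)]
    for component in sorted(components, key=len, reverse=True):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(component)
    return folds


actors = ["a", "b", "c", "d", "e", "f"]
links = [("a", "b"), ("b", "c"), ("d", "e")]
print(graph_folds(connected_components(actors, links), 2))
# [['a', 'b', 'c'], ['d', 'e', 'f']]
```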
No Cross-Validation
Standard overfitting problems
Useful after implied cross-validation
19/29
Query Type
Information need definition
Actor based query
Set or List based query
Conditional queries
20/29
Scoring Metrics
Comparisons against ground truth
Set-based metrics
Rank-based metrics
List-based metrics
21/29
Set Based Metrics
Recall and Precision
F-Measure
Mean Average Precision (MAP)
22/29
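The set-based metrics can be written in a few lines. This Python sketch reads MAP as mean average precision, its usual meaning in information retrieval; the function names are illustrative.

```python
def precision_recall(retrieved, relevant):
    """Fraction of retrieved items that are relevant, and vice versa."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta = 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def average_precision(ranking, relevant):
    """Sum of precision at each rank holding a relevant item, over |relevant|."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    """queries: list of (ranking, relevant_items) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)
```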
Ranked List Metrics
Pearson Correlation
Spearman's Correlation
Mean Absolute Error
Linear Algebra Distance Metrics
Serendipity
23/29
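A standard-library-only sketch of three of the ranked-list metrics, assuming both inputs are equal-length lists of scores for the same items. Spearman's correlation is computed as Pearson's correlation on ranks, with tied values sharing their average rank.

```python
import math


def pearson(x, y):
    """Pearson correlation; divides by zero if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```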
Ordered List Metrics
Half Life
Kendall Tau
NDPM
Sequence Alignment Algorithms
Hamming Distance
24/29
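Sketches of three of the ordered-list metrics (function names hypothetical). `kendall_tau` is the tau-a variant for two orderings of the same items without ties. `ndpm` follows Yao's normalized distance-based performance measure, (2 * contradictory + tied) / (2 * user_pairs): `contradictory` counts user preferences the system reverses, `tied` counts those the system leaves tied, and `user_pairs` counts every pair the user expresses a preference on. `hamming` compares two equal-length lists position by position.

```python
from itertools import combinations


def kendall_tau(order_a, order_b):
    """Kendall tau-a between two orderings (at least two items, no ties):
    +1 means identical order, -1 means fully reversed."""
    pos_a = {item: i for i, item in enumerate(order_a)}
    pos_b = {item: i for i, item in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        direction = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(order_a) * (len(order_a) - 1) / 2
    return (concordant - discordant) / n_pairs


def ndpm(user_scores, system_scores):
    """0 = system agrees with every user preference, 1 = contradicts all.
    Both arguments map item -> score, higher meaning more preferred."""
    contradictory = tied = user_pairs = 0
    for x, y in combinations(sorted(user_scores), 2):
        u = user_scores[x] - user_scores[y]
        if u == 0:
            continue  # the user expresses no preference between x and y
        user_pairs += 1
        s = system_scores[x] - system_scores[y]
        if s == 0:
            tied += 1
        elif (u > 0) != (s > 0):
            contradictory += 1
    return (2 * contradictory + tied) / (2 * user_pairs)


def hamming(seq_a, seq_b):
    """Number of positions at which two equal-length lists differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(a != b for a, b in zip(seq_a, seq_b))
```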
Significance Tests
Pairwise Student's t-test
ANOVA
ANOVA/Tukey-Kramer statistical test
25/29
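The test statistics behind the first two tests can be computed with the standard library alone. This sketch returns only the statistic and its degrees of freedom; turning them into p-values needs the t or F distribution, for which a statistics package such as SciPy (`scipy.stats.ttest_rel`, `scipy.stats.f_oneway`) would normally be used. The Tukey-Kramer post-hoc comparisons are not shown, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Paired t statistic for two algorithms scored on the same folds or
    queries. Returns (t, degrees_of_freedom)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def one_way_anova(*groups):
    """One-way ANOVA F statistic over k groups of scores.
    Returns (F, df_between, df_within)."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, k - 1, n - k
```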
Evaluation Questions
Does the data contain time (a globally ordered sequence)?
Actor-, link-, graph-, or set-based queries?
List, set, or set-of-lists output?
Contextual or absolute question?
Statistical purity versus maximum information
26/29
Music Recommendation
Example - Personalized Dynamic Tag Radio
LastFM profile data
LastFM tag data
Semantic Web data
Next week's data as ground truth
Conditional query
Graph cross-validation
Kendall Tau scoring metric
ANOVA/Tukey-Kramer statistical analysis
27/29
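Using next week's data as ground truth amounts to a temporal split. A minimal sketch, assuming listening events are `(user, item, timestamp)` tuples and a chosen cutoff date; the function name and the literal seven-day window are assumptions, not details of the original experiment.

```python
from datetime import datetime, timedelta


def next_week_split(events, cutoff):
    """Split (user, item, timestamp) events into training data (before the
    cutoff) and per-user ground truth (the seven days starting at the cutoff).
    Events after that week are ignored."""
    week_end = cutoff + timedelta(days=7)
    training = [e for e in events if e[2] < cutoff]
    ground_truth = {}
    for user, item, ts in events:
        if cutoff <= ts < week_end:
            ground_truth.setdefault(user, []).append(item)
    return training, ground_truth


events = [
    ("u1", "a", datetime(2009, 1, 1)),
    ("u1", "b", datetime(2009, 1, 9)),
    ("u2", "c", datetime(2009, 1, 20)),
]
training, truth = next_week_split(events, datetime(2009, 1, 8))
print(len(training), truth)  # 1 {'u1': ['b']}
```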
Conclusions
No one-size-fits-all
Data and ground truth set the framework
Question determines the final structure
Each discipline has a piece of the answer
Graph-RAT 0.5
28/29
Future Work
Finish exploring Social Network Analysis significance tests
Fully explore set-of-sets evaluation metrics
Debug the Graph-RAT cross-validation schedulers
Ease-of-use improvements to Graph-RAT
29/29