pf2129_presentation.ppt

Protein Fold Recognition
with Relevance Vector
Machines
Patrick Fernie
COMS 6772
Advanced Machine Learning
12/05/2005
Relevance Vector Machine
A Bayesian treatment of a generalized
linear model
Yields a formulation similar to that of a
Support Vector Machine
Hyperparameters Instead of Margin/Costs
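For reference, the model this slide describes can be written out as below. This is a sketch following the formulation in Tipping's JMLR paper listed in the references; the symbols (K for the kernel, w for the weights, α for the hyperparameters, σ for the logistic sigmoid) are notation assumed here, not taken from the slides.

```latex
% Kernel-based linear model, same functional form as an SVM
y(\mathbf{x}; \mathbf{w}) = \sum_{i=1}^{N} w_i \, K(\mathbf{x}, \mathbf{x}_i) + w_0

% One hyperparameter \alpha_i per weight: a zero-mean Gaussian prior
p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N} \mathcal{N}\!\left(w_i \mid 0, \alpha_i^{-1}\right)

% For two-class problems the output is squashed into a probability
P(t = 1 \mid \mathbf{x}) = \sigma\!\left(y(\mathbf{x}; \mathbf{w})\right),
\qquad \sigma(y) = \frac{1}{1 + e^{-y}}
```

The per-weight hyperparameters take the place of the SVM's margin and cost settings, and the sigmoid is where the probabilistic outputs come from.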
Relevance Vector Machine
SVM:
Hard Binary Outputs or Point Estimates
Requires a Mercer Kernel
Must Determine Suitable Cost and Insensitivity Values
Sparse (USPS ~2500)
RVM:
Probabilistic Outputs
Can Use Arbitrary Kernel
“Nuisance” Values Automatically Determined
Sparser (USPS ~316!)
Relevance Vector Machine
Can’t Use qp()
Must solve iteratively (Sequential
Minimization Optimization)
As we iterate, many hyperparameter (αi) values
become arbitrarily large, which allows the
corresponding basis functions to be pruned.
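The iterate-and-prune behaviour can be illustrated with a minimal sketch. It assumes the simpler regression case rather than classification, and it uses the hyperparameter re-estimation updates from Tipping's JMLR paper rather than the sequential scheme named on the slide. The function name `fit_rvm_regression`, the pruning threshold `alpha_max`, and the initial noise guess are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_rvm_regression(Phi, t, n_iter=500, alpha_max=1e6):
    """Sparse Bayesian regression by iterative hyperparameter re-estimation.

    Phi: (N, M) design matrix, one column per basis function.
    t:   (N,) target vector.
    Returns (indices of surviving columns, posterior mean weights, noise precision).
    """
    N, M = Phi.shape
    active = np.arange(M)                    # basis functions still in the model
    alpha = np.ones(M)                       # one hyperparameter per weight
    beta = 1.0 / (0.1 * np.var(t) + 1e-12)   # assumed initial noise precision

    def posterior(P, alpha, beta):
        # Gaussian posterior over the weights for fixed alpha and beta
        Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha))
        mu = beta * Sigma @ P.T @ t
        return Sigma, mu

    for _ in range(n_iter):
        P = Phi[:, active]
        Sigma, mu = posterior(P, alpha, beta)
        # gamma_i measures how well-determined weight i is by the data
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, 1.0)
        alpha = gamma / (mu ** 2 + 1e-300)
        resid = t - P @ mu
        beta = max(N - gamma.sum(), 1e-6) / (resid @ resid + 1e-12)
        # Prune: a very large alpha_i pins weight i at zero
        keep = alpha < alpha_max
        if keep.any():                       # never prune the model down to nothing
            active, alpha = active[keep], alpha[keep]

    _, mu = posterior(Phi[:, active], alpha, beta)
    return active, mu, beta
```

The columns left in `active` play the role of the relevance vectors. Each iteration inverts a matrix whose size is the number of active basis functions, so the early iterations are the expensive ones.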
Relevance Vector Machine
Faster Algorithm (Still not SVM fast)
Minimizes Number of Active Kernel
Functions to Reduce Computation Time
Analytic Approach to Pruning/Adding Basis
Functions
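The analytic add/prune rule can be summarized as follows. This is a sketch of the criterion from the Tipping and Faul paper listed in the references; the symbols (s for the "sparsity factor", q for the "quality factor", C with subscript −i for the model covariance with basis function i left out) follow that paper's notation and do not appear on the slides.

```latex
% Sparsity and quality factors for candidate basis function \phi_i
s_i = \boldsymbol{\phi}_i^{\top} \mathbf{C}_{-i}^{-1} \boldsymbol{\phi}_i,
\qquad
q_i = \boldsymbol{\phi}_i^{\top} \mathbf{C}_{-i}^{-1} \mathbf{t}

% Closed-form maximizer of the marginal likelihood with respect to \alpha_i
\alpha_i =
\begin{cases}
  \dfrac{s_i^2}{q_i^2 - s_i} & \text{if } q_i^2 > s_i \quad \text{(keep or add } \boldsymbol{\phi}_i\text{)} \\[2ex]
  \infty & \text{if } q_i^2 \le s_i \quad \text{(prune or leave out } \boldsymbol{\phi}_i\text{)}
\end{cases}
```

Because the model can start from a single basis function and grow, the matrices involved stay small, which is where the reduction in computation time comes from.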
Protein Fold Recognition
Protein Structure Families
Many Fold Families
Not Necessarily Directly Related by
Protein Sequence
Protein Fold Recognition
Prime Situation for Machine Learning
Techniques!
NN, SVM, etc.
Large Number of Classes
Protein Fold Recognition
27 Fold Families
Train Many 2-Class Classifiers
One vs. Others – False Positives
Unique One vs. Others – Like One vs. Others,
with Another Round of Training
All vs. All – Requires a Lot of Classifiers!
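A minimal sketch of the bookkeeping behind these schemes, to make the classifier counts and the false-positive problem concrete. The function names and the 0.5 decision threshold are hypothetical, and the second training round of the unique one vs. others scheme is not implemented here.

```python
from itertools import combinations

def count_classifiers(n_classes):
    """Number of 2-class classifiers needed by each scheme."""
    one_vs_others = n_classes                       # one per class
    all_vs_all = n_classes * (n_classes - 1) // 2   # one per unordered pair
    return one_vs_others, all_vs_all

def predict_one_vs_others(scores, threshold=0.5):
    """scores maps each class to its 'this class vs. the rest' output.

    Returns the top-scoring class and every class that fired. More than
    one class firing is the false-positive ambiguity noted above.
    """
    positives = [c for c, s in scores.items() if s > threshold]
    best = max(scores, key=scores.get)
    return best, positives

def predict_all_vs_all(pairwise_winner, classes):
    """Majority vote over all pairwise classifiers.

    pairwise_winner(a, b) returns whichever of a, b the (a, b) classifier picks.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b)] += 1
    return max(votes, key=votes.get)
```

For the 27 fold families, `count_classifiers(27)` gives 27 one vs. others classifiers against 351 all vs. all classifiers.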
RVMs & Protein Folds
Why RVMs?
Probabilistic Outputs
Sparsity (useful only in assessment)
True Multiclass Prediction
No Need to Find “Nuisance” Parameters
Issues/Future Work
Optimize RVM Classification
Implement True Multiclass
Reduced Greediness and Sequential
Convergence Optimization
Novel Kernels?
References
M. Tipping, “The Relevance Vector Machine”,
http://www.relevancevector.com
M. Tipping, “Sparse Bayesian Learning and the
Relevance Vector Machine”, JMLR 1:211-244, 2001.
M. Tipping and A. Faul, “Fast Marginal
Likelihood Maximisation for Sparse Bayesian
Models”, http://www.relevancevector.com
C. Ding and I. Dubchak, “Multi-class Protein Fold
Recognition Using Support Vector Machines”,
http://www.kernel-machines.org