Transcript: Deep Boltzmann Machines
Paper by: R. Salakhutdinov, G. Hinton. Presenter: Roozbeh Gholizadeh.
Outline
- Problems with some other methods
- Energy-based models
- Boltzmann machine
- Restricted Boltzmann machine
- Deep Boltzmann machine
Problems with other methods!
Supervised learning needs labeled data.
The amount of information is restricted by the labels.
We want to detect abnormalities before ever seeing them, such as certain failure conditions in a nuclear power plant.
So instead of learning p(label | data), learn p(data).
Energy-Based Models
An energy function E(x) is defined, assigning a score (scalar value) to each configuration.
Ex. p(x) = e^{−E(x)} / Z, the Boltzmann (Gibbs) distribution.
Z = Σ_x e^{−E(x)}: the normalization factor (partition function), the sum (or integral) of the numerator over all observations.
Parameters that lead to lower energy on observed data are desired.
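The Boltzmann distribution above can be sketched numerically. The energy values below are hypothetical, chosen only to illustrate that lower-energy states receive higher probability:

```python
import numpy as np

# A toy energy function over 4 discrete states (hypothetical values).
E = np.array([1.0, 0.5, 2.0, 0.1])

# Boltzmann (Gibbs) distribution: p(x) = exp(-E(x)) / Z
unnormalized = np.exp(-E)
Z = unnormalized.sum()      # partition function: sum over all states
p = unnormalized / Z

print(p)        # the lowest-energy state (index 3) gets the highest probability
print(p.sum())  # probabilities sum to 1
```

For real models the sum defining Z ranges over exponentially many configurations, which is exactly why it is intractable.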
Boltzmann machine
A Markov random field (MRF) with hidden variables.
Undirected edges represent dependencies; weights can be assigned to edges (W: visible-hidden, L: visible-visible, J: hidden-hidden).
Conditional distributions over hidden and visible units:
p(h_j = 1 | v, h_{−j}) = σ(Σ_i W_ij v_i + Σ_{m≠j} J_mj h_m)
p(v_i = 1 | h, v_{−i}) = σ(Σ_j W_ij h_j + Σ_{k≠i} L_ik v_k)
Learning process
Parameter updates: exact maximum-likelihood learning is intractable.
Use Gibbs sampling to approximate the data-dependent and model expectations.
Run 2 separate Markov chains to approximate them.
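One Gibbs sweep for a fully-connected Boltzmann machine can be sketched as follows. All sizes and weight values are hypothetical; each unit is resampled from its conditional given the current state of all the others:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Hypothetical small Boltzmann machine: 4 visible, 3 hidden units.
nv, nh = 4, 3
W = rng.normal(0, 0.1, (nv, nh))                 # visible-hidden weights
L = rng.normal(0, 0.1, (nv, nv))                 # visible-visible weights
L = (L + L.T) / 2; np.fill_diagonal(L, 0)        # symmetric, no self-connections
J = rng.normal(0, 0.1, (nh, nh))                 # hidden-hidden weights
J = (J + J.T) / 2; np.fill_diagonal(J, 0)

v = rng.integers(0, 2, nv).astype(float)
h = rng.integers(0, 2, nh).astype(float)

# One Gibbs sweep: update each unit given the current state of the others.
for j in range(nh):
    p = sigmoid(v @ W[:, j] + h @ J[:, j])       # J_jj = 0, so h_j's own term drops out
    h[j] = float(rng.random() < p)
for i in range(nv):
    p = sigmoid(W[i] @ h + v @ L[:, i])          # L_ii = 0 likewise
    v[i] = float(rng.random() < p)
```

In learning, one such chain is clamped to the data (data-dependent expectation) and a second runs freely (model expectation).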
Restricted Boltzmann Machine
Set L = 0, J = 0: no visible-visible and no hidden-hidden connections.
Learning can be carried out efficiently using:
- Contrastive Divergence (CD), or
- the Stochastic Approximation Procedure (SAP), together with
- a variational approach to estimating the data-dependent expectations.
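A minimal CD-1 update for an RBM, assuming binary units and a toy random batch (all sizes, the learning rate, and the data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

nv, nh, lr = 6, 4, 0.05
W = rng.normal(0, 0.01, (nv, nh))
b, c = np.zeros(nv), np.zeros(nh)                 # visible and hidden biases

v0 = rng.integers(0, 2, (10, nv)).astype(float)   # a toy data batch

# Positive phase: hidden probabilities given the data.
ph0 = sigmoid(v0 @ W + c)
h0 = (rng.random(ph0.shape) < ph0).astype(float)

# Negative phase: one step of Gibbs sampling (hence CD-1).
pv1 = sigmoid(h0 @ W.T + b)
v1 = (rng.random(pv1.shape) < pv1).astype(float)
ph1 = sigmoid(v1 @ W + c)

# Update: <v h>_data minus the one-step reconstruction's <v h>.
W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
b += lr * (v0 - v1).mean(axis=0)
c += lr * (ph0 - ph1).mean(axis=0)
```

Because L = J = 0, all hidden units are conditionally independent given the visibles (and vice versa), which is what makes each phase a single vectorized matrix operation.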
Stochastic Approximation Procedure (SAP)
θ_t and X_t: the current parameters and state, updated sequentially:
Given X_t, a new state X_{t+1} is sampled from a transition operator T_{θ_t}(X_{t+1}; X_t) that leaves p_{θ_t} invariant.
The new parameters θ_{t+1} are obtained by replacing the intractable model expectation with the expectation with respect to X_{t+1}.
The learning rate has to decrease with time, for example α_t = 1/t.
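A sketch of SAP for an RBM (biases omitted for brevity; all sizes and the toy data are hypothetical). The persistent chain state X_t survives across parameter updates, and one Gibbs step plays the role of the transition operator T_{θ_t}:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

nv, nh = 6, 4
W = rng.normal(0, 0.01, (nv, nh))
X_v = rng.integers(0, 2, (1, nv)).astype(float)   # persistent chain state X_t

data = rng.integers(0, 2, (20, nv)).astype(float) # toy training data

for t in range(1, 101):
    alpha = 1.0 / t                               # decreasing rate, alpha_t = 1/t
    # Transition operator T_{theta_t}: one Gibbs step on the persistent chain.
    ph = sigmoid(X_v @ W)
    h = (rng.random(ph.shape) < ph).astype(float)
    pv = sigmoid(h @ W.T)
    X_v = (rng.random(pv.shape) < pv).astype(float)
    # Replace the intractable model expectation with X_{t+1}'s statistics.
    ph_data = sigmoid(data @ W)
    ph_model = sigmoid(X_v @ W)
    W += alpha * (data.T @ ph_data / len(data) - X_v.T @ ph_model / len(X_v))
```

Unlike CD, the chain is never reset to the data, which is why one step per update can still track the model distribution as the parameters drift.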
Why go deep?
Deep architectures are representationally efficient: fewer computational units for the same function.
They allow learning a hierarchy of representations.
Non-local generalization.
Easier to monitor what is being learned and to guide the machine.
Deep Boltzmann Machine
Undirected connections between all layers.
Conditional distributions over visible and hidden units (two-hidden-layer case):
p(h^1_j = 1 | v, h^2) = σ(Σ_i W^1_ij v_i + Σ_m W^2_jm h^2_m)
p(h^2_m = 1 | h^1) = σ(Σ_j W^2_jm h^1_j)
p(v_i = 1 | h^1) = σ(Σ_j W^1_ij h^1_j)
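The distinguishing feature of the DBM conditionals is that the middle layer receives input from both of its neighbors, because the connections are undirected. A sketch with hypothetical layer sizes and random weights:

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Hypothetical 2-hidden-layer DBM: 6 visible, 5 first-layer, 4 second-layer units.
nv, n1, n2 = 6, 5, 4
W1 = rng.normal(0, 0.1, (nv, n1))
W2 = rng.normal(0, 0.1, (n1, n2))

v = rng.integers(0, 2, nv).astype(float)
h2 = rng.integers(0, 2, n2).astype(float)

# Middle layer conditions on BOTH neighbors (undirected connections):
p_h1 = sigmoid(v @ W1 + W2 @ h2)
h1 = (rng.random(n1) < p_h1).astype(float)

# Top layer and visibles each condition only on the adjacent layer:
p_h2 = sigmoid(h1 @ W2)
p_v = sigmoid(W1 @ h1)
```

This bottom-up plus top-down input to h^1 is what separates a DBM from a deep belief network, where the generative connections are directed.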
Pretraining (greedy layerwise)
MNIST dataset
NORB misclassification error rates:
- DBM: 10.8%
- SVM: 11.6%
- Logistic regression: 22.5%
- K-nearest neighbors: 18.4%
Thank you!