Deep Boltzmann Machines


Deep Boltzmann Machines

Paper by: R. Salakhutdinov, G. Hinton. Presenter: Roozbeh Gholizadeh

Outline
Problems with some other methods!
Energy-based models
Boltzmann machine
Restricted Boltzmann machine
Deep Boltzmann machine

Problems with other methods!

Supervised learning needs labeled data.

The amount of information is restricted by the labels!

We may need to detect abnormalities before ever seeing them, such as certain fault conditions in a nuclear power plant.

So instead of learning p(label | data), learn p(data).

Energy-Based Models
An energy function is defined.

The energy function assigns a score (a scalar value) to each configuration.

Example: p(x) = e^{-E(x)} / Z, the Boltzmann (Gibbs) distribution.

Z is the normalization factor (partition function): Z = \sum_x e^{-E(x)}, i.e. the sum (or integral) of the numerator over all observations.

Parameters that lead to lower energy are desired.
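As a toy illustration (not from the slides), a few lines of Python show how an energy function over a small discrete state space defines a Boltzmann distribution; the energy values here are made up for the example:

import numpy as np

# Hypothetical energies for a toy state space of four configurations.
energies = np.array([1.0, 0.5, 2.0, 0.1])

# Partition function Z: sum of e^{-E(x)} over all configurations.
Z = np.sum(np.exp(-energies))

# Boltzmann (Gibbs) distribution: lower energy -> higher probability.
p = np.exp(-energies) / Z

print(p, p.sum())  # the probabilities sum to 1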

Boltzmann Machine
A Markov random field (MRF) with hidden variables.

Undirected edges represent dependencies; weights can be assigned to them.

 Conditional distributions over hidden and visible units:
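The formulas themselves did not survive the transcript; for a binary Boltzmann machine with visible-hidden weights W, visible-visible weights L, and hidden-hidden weights J (the same L and J that are set to zero on the RBM slide below), the standard forms are:

E(v, h; \theta) = -v^\top W h - \tfrac{1}{2} v^\top L v - \tfrac{1}{2} h^\top J h

p(h_j = 1 \mid v, h_{-j}) = \sigma\Big( \sum_i W_{ij} v_i + \sum_{m \neq j} J_{jm} h_m \Big)

p(v_i = 1 \mid h, v_{-i}) = \sigma\Big( \sum_j W_{ij} h_j + \sum_{k \neq i} L_{ik} v_k \Big)

where \sigma(x) = 1 / (1 + e^{-x}) is the logistic function.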

Learning Process
Parameter updates (the update rule is sketched after this slide):
Exact maximum likelihood learning is intractable.

Use Gibbs sampling to approximate.

Run two separate Markov chains to approximate the two expectations (data-dependent and model).
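The update rule was an image on the slide; written out in standard notation (a reconstruction), the maximum-likelihood update for the visible-hidden weights is

\Delta W = \alpha \left( \mathbb{E}_{P_{\text{data}}}[v h^\top] - \mathbb{E}_{P_{\text{model}}}[v h^\top] \right)

with analogous updates for L and J. The data-dependent and model expectations are the two quantities approximated by the two Markov chains.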

Restricted Boltzmann Machine
Setting L = 0, J = 0.

Without visible-visible and hidden-hidden connections!

Learning can be carried out efficiently using:
Contrastive Divergence (CD)
Stochastic approximation procedure (SAP)
A variational approach to estimating data-dependent expectations
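As a rough sketch (not the paper's code), a single CD-1 update for a binary RBM could look like this in Python/NumPy; the layer sizes, learning rate, and function name are arbitrary choices for illustration:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy sizes and parameters (arbitrary for the sketch).
n_visible, n_hidden, lr = 6, 4, 0.05
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases

def cd1_update(v0):
    """One Contrastive Divergence (CD-1) step on a batch of binary visible vectors."""
    global W, b_v, b_h
    # Positive phase: hidden probabilities given the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one step of Gibbs sampling back to visible, then hidden.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_h)
    # Approximate gradient: data statistics minus one-step reconstruction statistics.
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / len(v0)
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

# Example: one update on a random binary batch.
batch = (rng.random((8, n_visible)) < 0.5).astype(float)
cd1_update(batch)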

Stochastic approximation procedure (SAP)

\theta_t and X_t: the current parameters and state.

\theta_t and X_t are updated sequentially: given X_t, a new state X_{t+1} is sampled from a transition operator T_{\theta_t}(X_{t+1}; X_t) that leaves p_{\theta_t} invariant.

A new parameter \theta_{t+1} is obtained by replacing the intractable model expectation with the expectation with respect to X_{t+1}.

The learning rate has to decrease with time, for example \alpha_t = 1/t.
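Written out for the weight matrix W and M persistent "fantasy" particles (a reconstruction, not copied from the slide), one SAP step is roughly

W_{t+1} = W_t + \alpha_t \left( \mathbb{E}_{P_{\text{data}}}[v h^\top] - \frac{1}{M} \sum_{m=1}^{M} \tilde{v}^{(m)}_{t+1} \big(\tilde{h}^{(m)}_{t+1}\big)^\top \right), \qquad \alpha_t \propto 1/t

where the fantasy particles \tilde{v}^{(m)}, \tilde{h}^{(m)} are the persistent chain states X_{t+1} drawn from the transition operator.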

Why go deep?

Deep architectures are representationally efficient: fewer computational units are needed to represent the same function.

They allow for representing a hierarchy of features.

Non-local generalization.
It is easier to monitor what is being learned and to guide the machine.

Deep Boltzmann Machine
Undirected connections between all layers.

Conditional distributions over visible and hidden units:
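The formulas appeared as an image on the slide; for a two-hidden-layer DBM with weight matrices W^1 and W^2, they take the standard form (a reconstruction in the usual notation):

p(h^1_j = 1 \mid v, h^2) = \sigma\Big( \sum_i W^1_{ij} v_i + \sum_m W^2_{jm} h^2_m \Big)

p(h^2_m = 1 \mid h^1) = \sigma\Big( \sum_j W^2_{jm} h^1_j \Big)

p(v_i = 1 \mid h^1) = \sigma\Big( \sum_j W^1_{ij} h^1_j \Big)

Note that the first hidden layer receives input from both below (v) and above (h^2), reflecting the undirected connections between all layers.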

Pretraining (greedy layerwise)

MNIST dataset

NORB misclassification error rates:
DBM: 10.8%
SVM: 11.6%
Logistic regression: 22.5%
K-nearest neighbors: 18.4%

Thank you!