Submitted by:
Ankit Bhutani
(Y9227094)
Supervised by:
Prof. Amitabha Mukerjee
Prof. K S Venkatesh
AUTO-ASSOCIATIVE NEURAL NETWORKS
OUTPUT SIMILAR TO INPUT
BOTTLENECK CONSTRAINT
LINEAR ACTIVATION – PCA [Baldi et al., 1989]
NON-LINEAR PCA [Kramer, 1991] – 5-LAYERED NETWORK
ALTERNATE SIGMOID AND LINEAR ACTIVATION
EXTRACTS NON-LINEAR FACTORS
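A minimal numpy sketch of such a bottleneck network, assuming sigmoid hidden units feeding a linear output layer (layer sizes and initialization are illustrative, not the configuration used in this work):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # Illustrative sizes: 784-dim input squeezed through a 30-unit bottleneck.
    rng = np.random.RandomState(0)
    n_in, n_hid = 784, 30
    W1 = rng.normal(scale=0.01, size=(n_in, n_hid))   # encoder weights
    b1 = np.zeros(n_hid)
    W2 = rng.normal(scale=0.01, size=(n_hid, n_in))   # decoder weights
    b2 = np.zeros(n_in)

    def reconstruct(x):
        h = sigmoid(x @ W1 + b1)   # low-dimensional code at the bottleneck
        z = h @ W2 + b2            # linear output trained so that z ~ x
        return z

Training minimizes the reconstruction error between z and x, which is what forces the bottleneck code to capture the structure of the data.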
ABILITY TO LEARN HIGHLY COMPLEX FUNCTIONS
TACKLE THE NON-LINEAR STRUCTURE OF THE UNDERLYING DATA
HIERARCHICAL REPRESENTATION
RESULTS FROM CIRCUIT THEORY – A SINGLE-LAYERED NETWORK WOULD NEED AN EXPONENTIALLY HIGH NUMBER OF HIDDEN UNITS
DIFFICULTY IN TRAINING DEEP NETWORKS
NON-CONVEX NATURE OF OPTIMIZATION
GETS STUCK IN LOCAL MINIMA
VANISHING OF GRADIENTS DURING BACKPROPAGATION
SOLUTION
"INITIAL WEIGHTS MUST BE CLOSE TO A GOOD SOLUTION" – [Hinton et al., 2006]
GENERATIVE PRE-TRAINING FOLLOWED BY FINE-TUNING
PRE-TRAINING
INCREMENTAL LAYER-WISE TRAINING
EACH LAYER ONLY TRIES TO REPRODUCE THE HIDDEN LAYER ACTIVATIONS OF THE PREVIOUS LAYER
FINE-TUNING
INITIALIZE THE AUTOENCODER WITH THE WEIGHTS LEARNT BY PRE-TRAINING
PERFORM BACKPROPAGATION AS USUAL
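A sketch of the layer-wise scheme, reusing numpy and the sigmoid from the snippet above; train_shallow_autoencoder is a hypothetical stand-in for whichever single-layer learner (RBM or shallow autoencoder) is used:

    def pretrain(X, layer_sizes):
        # Greedy layer-wise pre-training: each layer is fit to reproduce
        # the activations produced by the layer below it.
        weights, data = [], X
        for n_hid in layer_sizes:
            W, b = train_shallow_autoencoder(data, n_hid)  # hypothetical helper
            weights.append((W, b))
            data = sigmoid(data @ W + b)  # activations become the next layer's input
        return weights  # initializes the deep autoencoder before fine-tuning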
STOCHASTIC – RESTRICTED BOLTZMANN MACHINES (RBMs)
HIDDEN LAYER ACTIVATIONS (0-1) USED TO MAKE A PROBABILISTIC DECISION OF PUTTING 0 OR 1
MODEL LEARNS THE JOINT PROBABILITY OF 2 BINARY DISTRIBUTIONS – ONE OVER THE INPUT AND THE OTHER OVER THE HIDDEN LAYER
EXACT METHODS – COMPUTATIONALLY INTRACTABLE
NUMERICAL APPROXIMATION – CONTRASTIVE DIVERGENCE
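A minimal numpy sketch of one CD-1 update for a binary RBM (the learning rate and the choice of using probabilities rather than samples in the negative phase are illustrative):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def cd1_step(v0, W, b, c, lr=0.1, rng=np.random):
        # Up-pass: hidden probabilities, then a stochastic 0/1 decision.
        ph0 = sigmoid(v0 @ W + c)
        h0 = (rng.uniform(size=ph0.shape) < ph0).astype(v0.dtype)
        # Down- and up-pass: one reconstruction step approximates the
        # model's statistics instead of the intractable exact expectation.
        pv1 = sigmoid(h0 @ W.T + b)
        ph1 = sigmoid(pv1 @ W + c)
        n = v0.shape[0]
        W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n  # positive minus negative phase
        b += lr * (v0 - pv1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)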
DETERMINISTIC – SHALLOW AUTOENCODERS
HIDDEN LAYER ACTIVATIONS (0-1) ARE DIRECTLY USED AS INPUT TO THE NEXT LAYER
TRAINED BY BACKPROPAGATION
DENOISING AUTOENCODERS
CONTRACTIVE AUTOENCODERS
SPARSE AUTOENCODERS
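As one example of these variants, a denoising autoencoder trains on corrupted inputs while reconstructing the clean ones; a sketch of the masking corruption in the spirit of [Vincent et al., 2008] (corruption level illustrative):

    import numpy as np

    def corrupt(X, p=0.3, rng=np.random):
        # Masking noise: zero out a random fraction p of each input;
        # the autoencoder is trained to reconstruct the clean X from this.
        mask = rng.uniform(size=X.shape) >= p
        return X * mask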
TASK \ MODEL   RBM                          SHALLOW AE
CLASSIFIER     [Hinton et al., 2006] and    Investigated by [Bengio et al., 2007],
               many others since then       [Ranzato et al., 2007], [Vincent et al.,
                                            2008], [Rifai et al., 2011] etc.
DEEP AE        [Hinton & Salakhutdinov,     No significant results reported
               2006]                        in literature – Gap
DATASETS
MNIST
Big and Small Digits
Square & Room
2d Robot Arm
3d Robot Arm
Libraries used
Numpy, Scipy
Theano – takes care of parallelization
GPU Specifications – Tesla C1060
Memory – 256 MB
Frequency – 33 MHz
Number of Cores – 240
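For illustration, a minimal Theano snippet (shapes illustrative): the symbolic graph is compiled once, and Theano generates GPU code when run with the device flag set to the GPU (e.g. THEANO_FLAGS=device=gpu,floatX=float32):

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix('x')                       # symbolic mini-batch
    W = theano.shared(np.zeros((784, 30),
                               dtype=theano.config.floatX), name='W')
    h = T.nnet.sigmoid(T.dot(x, W))         # symbolic hidden activations
    encode = theano.function([x], h)        # compiled, possibly GPU-backed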
REVERSE CROSS-ENTROPY
X – Original input
Z – Output (reconstruction)
Θ – Parameters (weights and biases)
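The formula itself did not survive transcription; for reference, the standard cross-entropy reconstruction loss for an autoencoder with these symbols would read as below (a reconstruction from the definitions above, not necessarily the exact variant used in this work):

    L(X, Z; \Theta) = -\sum_k \big[ x_k \log z_k + (1 - x_k) \log(1 - z_k) \big]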
RESULTS FROM PRELIMINARY EXPERIMENTS
TIME TAKEN FOR TRAINING
CONTRACTIVE AUTOENCODERS TAKE VERY LONG TO TRAIN
EXPERIMENT USING SPARSE REPRESENTATIONS
STRATEGY A – BOTTLENECK
STRATEGY B – SPARSITY + BOTTLENECK
STRATEGY C – NO CONSTRAINT + BOTTLENECK
MOMENTUM
INCORPORATING THE PREVIOUS UPDATE
CANCELS OUT COMPONENTS IN OPPOSITE DIRECTIONS – PREVENTS OSCILLATION
ADDS UP COMPONENTS IN THE SAME DIRECTION – SPEEDS UP TRAINING
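A sketch of the update rule with momentum (coefficients illustrative):

    def momentum_update(w, grad, velocity, lr=0.1, mu=0.9):
        # Carry over a fraction mu of the previous update: opposing gradient
        # components cancel (less oscillation), consistent ones accumulate.
        velocity = mu * velocity - lr * grad
        return w + velocity, velocity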
WEIGHT DECAY
REGULARIZATION
PREVENTS OVER-FITTING
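A sketch of the corresponding update with L2 weight decay (coefficient illustrative); the penalty continually shrinks the weights towards zero:

    def decay_update(w, grad, lr=0.1, lam=1e-4):
        # Gradient step on the regularized objective E + (lam/2) * ||w||^2.
        return w - lr * (grad + lam * w)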
USING ALTERNATE LAYER SPARSITY WITH MOMENTUM & WEIGHT DECAY YIELDS THE BEST RESULTS