Transcript: Lecture 21

Today’s Topics
• More on DEEP ANNs
– Convolution
– Max Pooling
– Drop Out
• Final ANN Wrapup
• FYI: Some Resources
– http://deeplearning.net/
– http://googleresearch.blogspot.com/2015/11/tensorflow-googles-latestmachine_9.html
– https://research.facebook.com/blog/879898285375829/fair-open-sourcesdeep-learning-modules-for-torch/
Back to Deep ANNs
- Convolution & Max Pooling
[Figure: a deep ANN whose hidden layers are C and MP layers; C = Convolution, MP = Max Pooling (next) – ie, a CHANGE OF REP]
Look for 8’s in all the Right Places
• Imagine we have a great 8 detector expressed as an 8x8 array of 0-1’s (see upper left)
• We want to find all the 8’s in a 1024x1024 image of 0-1’s
• Q: What might we do?
• A: ‘Slide’ the detector across the image and count the # of matching bits between the detector and the ‘overlaid’ image (see the sketch below)
If the count is greater than some threshold, say an ‘8’ is there
[Figure: the 8 detector slid across the image at successive positions]
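A minimal sketch of the ‘slide and count matching bits’ idea above, in Python/NumPy. The function name, the threshold value, and the use of 0-1 arrays for the image and detector are illustrative assumptions, not code from the lecture.

```python
import numpy as np

def find_eights(image, detector, threshold):
    """Slide a 0-1 detector over a 0-1 image and record every top-left corner
    where the number of matching bits reaches the threshold."""
    dh, dw = detector.shape
    hits = []
    for r in range(image.shape[0] - dh + 1):
        for c in range(image.shape[1] - dw + 1):
            window = image[r:r + dh, c:c + dw]
            matches = np.sum(window == detector)   # count of agreeing bits
            if matches >= threshold:               # 'count greater than some threshold'
                hits.append((r, c))
    return hits

# e.g., an 8x8 0-1 detector slid across a 1024x1024 0-1 image:
# hits = find_eights(image, detector, threshold=60)
```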
Look for 8’s in all the Right Places
• Q: What about 8’s in the image larger than 8x8 bits?
• A: Use ‘detectors’ of, say, 16x16, 32x32, 64x64, etc (see the sketch below)
• PS: Could also ‘slide’ slightly rotated 8’s of various sizes (too much rotation and it becomes the infinity symbol!)
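A hedged follow-on sketch for the multi-size idea: blow the 8x8 detector up to 16x16, 32x32, etc by repeating each bit, then reuse the sliding matcher from the previous sketch. The helper name and the scaled threshold are illustrative assumptions.

```python
import numpy as np

def scale_detector(detector, factor):
    """Upscale an 8x8 0-1 detector to 16x16, 32x32, ... by repeating each bit
    'factor' times in both directions (nearest-neighbor upscaling)."""
    return np.kron(detector, np.ones((factor, factor), dtype=detector.dtype))

# Search at several sizes with the sliding matcher from the previous sketch:
# for factor in (1, 2, 4, 8):                       # 8x8, 16x16, 32x32, 64x64
#     hits = find_eights(image, scale_detector(detector, factor),
#                        threshold=int(0.9 * (8 * factor) ** 2))
```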
Back to Deep ANNs
- Convolution (cont.)
The ‘sliding window’ is the basic idea of convolution
– but each ‘template’ is a HU and the wgts are learned
– some HUs are coupled
– each group of HUs learns what to ‘look for’
– we do hard-code the ‘size’ of the ‘template’
[Figure: input units rep’ing the image, feeding hidden units HU1 and HU2]
Our code would employ weight sharing, ie the corresponding weights in each HU (eg, the two thicker lines in the figure) would always have the same value
Note: HU77, say, would connect to the same INPUTS as HU1 but would have different wgts, ie it would be a different ‘feature detector’ (see the sketch below)
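A minimal sketch of a convolutional layer as described above: each k x k weight ‘template’ is slid over the image, and every hidden unit in that template’s group uses the same (shared) weights. The ReLU activation and the array-shape conventions are illustrative assumptions; the slide does not specify them.

```python
import numpy as np

def conv_layer(image, templates):
    """image: 2-D array of input-unit values.
    templates: array of shape (num_templates, k, k); each template is one
    learned feature detector whose weights are shared by a whole group of HUs
    (eg, HU1's group vs HU77's group)."""
    num_t, k, _ = templates.shape
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    out = np.zeros((num_t, out_h, out_w))
    for t in range(num_t):
        for r in range(out_h):
            for c in range(out_w):
                # Weighted sum of the k x k patch with the shared weights,
                # passed through a ReLU (an illustrative choice of activation)
                out[t, r, c] = max(0.0, np.sum(image[r:r + k, c:c + k] * templates[t]))
    return out
```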
BACKUP: A Possibly Helpful Slide on Convolution from the Web
Back to Deep ANNs
- Max Pooling
Researchers have empirically found it helpful to ‘clean up’ the convolution scores by
– Creating the next layer of HUs where each HU holds the MAX score in an N × N array, for various values of N and across various locations
– This is called MAX POOLING (example on next slide)
– Advanced note (not on final): I’m not sure if people (a) use the differentiable ‘soft max’ (https://en.wikipedia.org/wiki/Softmax_function) and BP through all nodes or (b) only BP through the max node; I’d guess (b)
Back to Deep ANNs
- Max Pooling Example
(connections not shown)
Possible Nodes in Hidden Layer i + 1
[Figure: Hidden Layer i is a 4x4 grid of values whose largest entry is 9. Hidden Layer i + 1 could contain:
– a single ‘4x4 max’ node holding the max of the whole grid (9)
– a ‘2x2 max, non-overlapping’ group of 2x2 = 4 nodes
– a ‘2x2 max, overlapping’ group of 3x3 = 9 nodes (contains the non-overlapping results, so no need for both)]
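A minimal sketch of the three pooling options above, in Python/NumPy. The 4x4 values below are illustrative stand-ins, not necessarily the exact numbers from the slide; a 2x2 window with stride 2 gives the non-overlapping version and stride 1 the overlapping one.

```python
import numpy as np

def max_pool_2x2(a, stride):
    """Max over every 2x2 window of a 2-D array; stride=2 -> non-overlapping,
    stride=1 -> overlapping windows."""
    rows = range(0, a.shape[0] - 1, stride)
    cols = range(0, a.shape[1] - 1, stride)
    return np.array([[a[r:r + 2, c:c + 2].max() for c in cols] for r in rows])

layer_i = np.array([[-4,  0,  5, -3],      # illustrative Hidden Layer i values
                    [ 4,  2,  6,  5],
                    [ 6,  8,  9, -3],
                    [ 7,  8, -5,  9]])

print(layer_i.max())             # '4x4 max': one node holding the max of the whole grid
print(max_pool_2x2(layer_i, 2))  # '2x2 max, non-overlapping': a 2x2 result
print(max_pool_2x2(layer_i, 1))  # '2x2 max, overlapping': a 3x3 result (contains the above)
```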
Back to Deep ANNs
- Drop Out (from Hinton’s Group)
Each time one example is processed (forward + back prop) during TRAINING,
randomly turn off (ie, ‘drop out’) a fraction (say, p = ½) of the input and hidden units
During TESTING, scale all weights by (1 - p), since that is the fraction of the time each unit was present during training (ie, so on average, weighted sums are the same) – see the sketch below
Adds ROBUSTNESS – the network needs to learn multiple ways to compute the function being learned
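A minimal sketch of Drop Out for one fully connected layer, assuming a drop fraction p as above: during training, a random fraction p of the incoming units are zeroed for each example; at test time, the weights are scaled by (1 - p) so weighted sums match on average. The ReLU activation and the single-layer setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_forward(x, W, b, p=0.5, training=True):
    """One layer's weighted sum with Drop Out on its incoming units.
    p is the fraction of units randomly turned off during TRAINING."""
    if training:
        keep = (rng.random(x.shape) >= p).astype(x.dtype)  # randomly drop a fraction p
        return np.maximum(0.0, (x * keep) @ W + b)
    # TESTING: scale weights by (1 - p), the fraction of the time each unit
    # was present during training, so weighted sums are the same on average.
    return np.maximum(0.0, x @ (W * (1.0 - p)) + b)
```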
Back to Deep ANNs
- Drop Out as an Ensemble
Drop Out can be viewed as training an ensemble of ‘thinned’ ANNs
We implicitly store O(2^N) networks in 1, where N = # of non-output nodes
- ie, consider all possible ANNs that one can construct by ‘thinning’ the non-output nodes in the original ANN
- in each Drop Out step we are training ONE of these (but note that ALL of them are affected, since wgts are shared)
[Figure: the full ANN ‘becomes’ a thinned ANN when units are dropped]
Warning: At the Research Frontier
• Research on Deep ANNs is changing rapidly, with a lot of IT-industry money dedicated to it
• Until recently, people used unsupervised ML to train all the HU layers except the final one (surprisingly, BP works through many levels when there is much data!)
• So this ‘slide deck’ is likely to be out of date soon, if not already ☺
Neural Network Wrapup
• ANNs compute weighted sums to make decisions
• Use (stochastic) gradient descent to adjust weights in order to reduce error (or cost) – see the sketch below
– Only finds local minima, though (but good enough!)
• Impressive test-set accuracy, especially Deep ANNs on (mainly) vision tasks and natural language tasks
• Slow training (GPUs, parallelism, advanced optimization methods, etc help)
• Learned models are hard to interpret
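A minimal sketch of the (stochastic) gradient descent update the wrapup refers to; the learning-rate value is purely illustrative, and the gradient is assumed to be the error (cost) gradient computed on the current training example.

```python
def sgd_step(weights, gradient, learning_rate=0.1):
    """One stochastic-gradient-descent step: nudge the weights in the direction
    that reduces the error (cost) on the current training example."""
    return weights - learning_rate * gradient
```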