Transcript: Lecture 21 (CS 540 - Fall 2015, Shavlik©, Week 10, 11/10/15)
Today’s Topics
• More on DEEP ANNs
  – Convolution
  – Max Pooling
  – Drop Out
• Final ANN Wrapup
• FYI: Some Resources
  – http://deeplearning.net/
  – http://googleresearch.blogspot.com/2015/11/tensorflow-googles-latestmachine_9.html
  – https://research.facebook.com/blog/879898285375829/fair-open-sourcesdeep-learning-modules-for-torch/

Back to Deep ANNs - Convolution & Max Pooling
[Figure. C = Convolution, MP = Max Pooling (next); ie, a CHANGE OF REP.]

Look for 8’s in all the Right Places
• Imagine we have a great 8 detector expressed as an 8x8 array of 0-1’s (see upper left of the figure)
• We want to find all the 8’s in a 1024x1024 image of 0-1’s
• Q: What might we do?
• A: ‘Slide’ the detector across the image and count the # of matching bits between the detector and the ‘overlaid’ image; if the count is greater than some threshold, say an ‘8’ is there

Look for 8’s in all the Right Places (cont.)
• Q: What about 8’s in the image larger than 8x8 bits?
• A: Use ‘detectors’ of, say, 16x16, 32x32, 64x64, etc
• PS: Could also ‘slide’ slightly rotated 8’s of various sizes (too much rotation and an 8 becomes the infinity symbol!)

Back to Deep ANNs - Convolution (cont.)
The ‘sliding window’ is the basic idea of convolution
  – but each ‘template’ is an HU and the wgts are learned
  – some HUs are coupled
  – each group of HUs learns what to ‘look for’
  – we do hard-code the ‘size’ of the ‘template’
[Figure: input units rep’ing the image, connected to hidden units HU1 and HU2.]
Our code would employ weight sharing, ie, the corresponding weights in each HU (eg, the two thicker lines in the figure) would always have the same value.
Note: HU77, say, would connect to the same INPUTS as HU1 but would have different wgts, ie, it would be a different ‘feature detector’.

BACKUP: A Possibly Helpful Slide on Convolution from the Web
[Figure only.]

Back to Deep ANNs - Max Pooling
Researchers have empirically found it helpful to ‘clean up’ the convolution scores by
  – creating the next layer of HUs, where each HU holds the MAX score in an N x N array, for various values of N and across various locations
  – this is called MAX POOLING (example on the next slide)
  – Advanced note (not on the final): I’m not sure whether people (a) use the differentiable ‘soft max’ (https://en.wikipedia.org/wiki/Softmax_function) and BP through all nodes, or (b) only BP through the max node; I’d guess (b)

Back to Deep ANNs - Max Pooling Example (connections not shown)
[Figure: Hidden Layer i is a 4x4 grid of scores; possible nodes in Hidden Layer i+1 include the 4x4 max (here, 9), a 2x2 non-overlapping max, and a 2x2 overlapping max. The overlapping version contains the non-overlapping one, so there is no need for both.]

Back to Deep ANNs - Drop Out (from Hinton’s Group)
Each time one example is processed (forward + back prop) during TRAINING, randomly turn off (ie, ‘drop out’) a fraction p (say, p = ½) of the input and hidden units.
During TESTING, scale all weights by (1 - p), since that is the fraction of the time each unit was present during training (ie, so on average, the weighted sums are the same).
Adds ROBUSTNESS – the network needs to learn multiple ways to compute the function being learned.
(A short code sketch of convolution, max pooling, and drop out follows.)
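Below is a minimal NumPy sketch (not from the lecture) tying together the three ideas above: a sliding-window ‘template’ whose weights are shared across every image position, max pooling of the resulting scores (non-overlapping vs overlapping), and drop out during training with (1 - p) scaling at test time. The array sizes, the match threshold of 55, the use of ReLU hidden units, and p = 0.5 are illustrative assumptions, not values from the slides; for simplicity only hidden units (not inputs) are dropped here.

```python
# Illustrative sketch only: sliding-window 'convolution', max pooling, drop out.
import numpy as np

rng = np.random.default_rng(0)

def convolve_counts(image, template):
    """Slide 'template' over 'image' (both 0-1 arrays) and count matching bits
    at every position; the same shared weights are reused at every location."""
    th, tw = template.shape
    out_h = image.shape[0] - th + 1
    out_w = image.shape[1] - tw + 1
    scores = np.zeros((out_h, out_w), dtype=int)
    for r in range(out_h):
        for c in range(out_w):
            scores[r, c] = np.sum(image[r:r + th, c:c + tw] == template)
    return scores

def max_pool(scores, n, stride):
    """Each pooled value is the MAX of an n x n block of scores.
    stride = n gives non-overlapping pooling; stride = 1 gives overlapping."""
    rows = range(0, scores.shape[0] - n + 1, stride)
    cols = range(0, scores.shape[1] - n + 1, stride)
    return np.array([[scores[r:r + n, c:c + n].max() for c in cols] for r in rows])

# 'Look for 8s': slide an 8x8 detector over a larger 0-1 image.
detector = rng.integers(0, 2, size=(8, 8))    # stand-in for a great '8' detector
image = rng.integers(0, 2, size=(32, 32))     # stand-in for the 1024x1024 image
scores = convolve_counts(image, detector)
hits = scores > 55                            # 'count greater than some threshold'

# Max pooling over the convolution scores.
pooled_nonoverlap = max_pool(scores, n=2, stride=2)
pooled_overlap = max_pool(scores, n=2, stride=1)  # contains the non-overlapping values

# Drop out on a layer of hidden-unit activations.
p = 0.5                                       # fraction of units dropped during training

def hidden_layer(x, W, train):
    h = np.maximum(0.0, W @ x)                # hidden-unit activations (ReLU here)
    if train:
        return h * (rng.random(h.shape) >= p) # TRAINING: randomly turn off units
    # TESTING: scaling the activations by (1 - p) has the same effect as scaling
    # the outgoing weights by (1 - p), so weighted sums match on average.
    return h * (1 - p)

W = rng.normal(size=(4, 10))                  # toy weights for a 10-input, 4-HU layer
x = rng.normal(size=10)
h_train, h_test = hidden_layer(x, W, train=True), hidden_layer(x, W, train=False)
```

In a real convolutional layer the template weights would be learned by backprop rather than fixed, and there would be many different feature detectors connected to the same inputs (the HU1 vs HU77 distinction above).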
Back to Deep ANNs - Drop Out as an Ensemble
Drop Out can be viewed as training an ensemble of ‘thinned’ ANNs.
We implicitly store O(2^N) networks in one, where N = # of non-output nodes
  – ie, consider all possible ANNs that one can construct by ‘thinning’ the non-output nodes in the original ANN
  – in each Drop Out step we are training ONE of these (but note that ALL of them are trained, since the wgts are shared)
[Figure: the full network ‘becomes’ one of its thinned versions.]

Warning: At the Research Frontier
• Research on Deep ANNs is changing rapidly, and a lot of IT-industry money is dedicated to it
• Until recently, people used unsupervised ML to train all the HU layers except the final one (surprisingly, BP works through many levels when there is much data!)
• So this ‘slide deck’ is likely to be out of date soon, if not already

Neural Network Wrapup
• ANNs compute weighted sums to make decisions
• Use (stochastic) gradient descent to adjust weights in order to reduce error (or cost); it only finds local minima, though (but good enough!) (see the short sketch below)
• Impressive testset accuracy, especially for Deep ANNs on (mainly) vision tasks and natural-language tasks
• Slow training (GPUs, parallelism, advanced optimization methods, etc help)
• Learned models are hard to interpret
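As a closing sketch of the wrapup’s gradient-descent bullet, here is one stochastic-gradient-descent step for a single unit; the sigmoid activation, the squared-error cost, and the learning rate of 0.1 are illustrative assumptions, not specifics from the lecture.

```python
# Illustrative sketch: one stochastic-gradient-descent step for a single
# sigmoid unit with squared error  E = 0.5 * (y - out)^2.
import numpy as np

def sgd_step(w, x, y, eta=0.1):
    out = 1.0 / (1.0 + np.exp(-np.dot(w, x)))   # weighted sum passed through a sigmoid
    grad = -(y - out) * out * (1.0 - out) * x   # dE/dw for this single example
    return w - eta * grad                       # step downhill; only finds a local minimum

w = np.zeros(3)
w = sgd_step(w, x=np.array([1.0, 0.5, -0.2]), y=1.0)
```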