How Microsoft Made Deep Learning Red-Hot in the IT Industry
Zhijie Yan, Microsoft Research Asia
USTC visit, May 6, 2014
Self Introduction
@MSRA鄢志杰 (Zhijie Yan)
996 – studied at USTC from 1999 to 2008
Graduate student – studied in the iFlytek speech lab from 2003 to 2008, supervised by Prof. Renhua Wang
Intern – worked at MSR Asia from 2005 to 2006
Visiting scholar – visited Georgia Tech in 2007
FTE – working at MSR Asia since 2008
Research interests
Speech, deep learning, large-scale machine learning
In Today’s Talk
Deep learning has become very hot in the past few years
How Microsoft made deep learning hot in the IT industry
Deep learning basics
Why Microsoft can turn all these ideas into reality
Further reading materials
How Hot is Deep Learning
“This announcement comes on the heels of a $600,000 gift Google awarded Professor Hinton’s research group to support further work in the area of neural nets.” – U. of T. website
Microsoft Made Deep Learning Hot in the IT Industry
Initial attempts made by the University of Toronto showed promising results using DL for speech recognition on the TIMIT phone recognition task
Prof. Hinton’s student visited MSR as an intern, and good results were obtained on the Microsoft Bing voice search task
MSR Asia and Redmond collaborated and got amazing results on the Switchboard task, which shocked the whole industry
Microsoft Made Deep Learning Hot in the IT Industry
*figure borrowed from MSR principal researcher Li DENG
Microsoft Made Deep Learning Hot in the IT Industry
Followed by others, with results confirmed on various speech recognition tasks
Google / IBM / Apple / Nuance / Baidu (百度) / iFlytek (讯飞)
Continuously advanced by MSR and others
Expand to solve more and more problems
Image processing
Natural language processing
Search
…
Deep Learning From Speech to Image
ILSVRC-2012 competition on ImageNet
Classification task: classify an image into 1 of the 1,000 classes with your 5 best bets (top-5 error)
[Example images with their labels: lifeboat, airliner, school bus]

Institution               Error rate (%)
University of Amsterdam   29.6
XRCE/INRIA                27.1
Oxford                    27.0
ISI                       26.2
Deep Learning From Speech to Image
ILSVRC-2012 competition on ImageNet – the same results, now with the deep-learning entry (SuperVision) added:

Institution               Error rate (%)
University of Amsterdam   29.6
XRCE/INRIA                27.1
Oxford                    27.0
ISI                       26.2
SuperVision               16.4
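As a side note on the metric: "top-5 error" counts an image as correct if the true class appears among the model's five highest-scoring guesses. Below is a minimal sketch of how such an error rate could be computed; the scores and labels are made up for illustration, and this is not the official ILSVRC evaluation code.

```python
import numpy as np

def top5_error(scores, labels):
    """scores: (N, 1000) array of per-class scores; labels: (N,) true class indices.
    Returns the fraction of images whose true class is NOT among the top-5 predictions."""
    # Indices of the 5 highest-scoring classes for each image
    top5 = np.argsort(scores, axis=1)[:, -5:]
    hits = np.any(top5 == labels[:, None], axis=1)
    return 1.0 - hits.mean()

# Toy usage: random scores for 4 images over 1,000 classes
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 1000))
labels = np.array([3, 17, 999, 42])
print(top5_error(scores, labels))
```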
Deep Learning Basics
Deep learning ≈ deep neural networks ≈ multi-layer perceptron (MLP) with a deep structure (many hidden layers)
[Figure: a conventional shallow MLP (input layer → hidden layer → output layer, weights W0, W1) vs. a deep network with many hidden layers (weights W0, W1, W2, W3, ...)]
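To make the structure concrete, here is a minimal numpy sketch of a forward pass through an MLP with several hidden layers. The layer sizes and activations are made up for illustration; the weight matrices play the role of W0–W3 in the diagram above.

```python
import numpy as np

def forward(x, weights, biases):
    """Forward pass through an MLP: each hidden layer applies an affine
    transform followed by a sigmoid nonlinearity; the output layer is a softmax."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))   # hidden layers
    logits = h @ weights[-1] + biases[-1]         # output layer
    e = np.exp(logits - logits.max())
    return e / e.sum()                            # class posteriors

# Toy example: 10-dim input, three hidden layers of 50 units, 5 output classes
rng = np.random.default_rng(0)
sizes = [10, 50, 50, 50, 5]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
print(forward(rng.normal(size=10), weights, biases))
```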
Deep Learning Basics
Doesn’t sound new at all? Sounds like something you’ve learned in class?
Things that have not changed over the years
Network topology / activation functions / …
Backpropagation (BP)
Things that have changed recently
Data → big data
General-purpose computing on graphics processing units
(GPGPU)
“A bag of tricks” accumulated over the years
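The training recipe itself is still backpropagation plus minibatch stochastic gradient descent; what changed is the amount of data and the GPU horsepower applied to it. A rough sketch of one SGD step, continuing the hypothetical numpy MLP above but specialized to a single hidden layer for brevity (not any production trainer):

```python
import numpy as np

def sgd_step(X, y, W0, b0, W1, b1, lr=0.1):
    """One minibatch SGD step for a one-hidden-layer MLP with sigmoid hidden
    units, softmax output, and cross-entropy loss; gradients via backpropagation."""
    # Forward pass
    h = 1.0 / (1.0 + np.exp(-(X @ W0 + b0)))            # (N, H) hidden activations
    logits = h @ W1 + b1                                  # (N, C)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)                  # softmax posteriors
    # Backward pass
    N = X.shape[0]
    d_logits = p.copy()
    d_logits[np.arange(N), y] -= 1.0                      # dL/dlogits for cross-entropy
    d_logits /= N
    dW1 = h.T @ d_logits
    db1 = d_logits.sum(axis=0)
    dh = d_logits @ W1.T
    d_pre = dh * h * (1.0 - h)                            # sigmoid derivative
    dW0 = X.T @ d_pre
    db0 = d_pre.sum(axis=0)
    # Parameter update
    W0 -= lr * dW0; b0 -= lr * db0
    W1 -= lr * dW1; b1 -= lr * db1

# Toy usage: minibatch of 32 examples, 10-dim inputs, 5 classes
rng = np.random.default_rng(0)
W0 = rng.normal(scale=0.1, size=(10, 50)); b0 = np.zeros(50)
W1 = rng.normal(scale=0.1, size=(50, 5));  b1 = np.zeros(5)
X = rng.normal(size=(32, 10)); y = rng.integers(0, 5, size=32)
sgd_step(X, y, W0, b0, W1, b1)
```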
E.g. Deep Neural Network for Speech Recognition
Three key components that make DNN-HMM work:
Many layers of nonlinear feature transformation
Tied triphones as the basic units for HMM states
Long window of frames
*figure borrowed from MSR senior researcher Dong YU
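To illustrate the "long window of frames" idea: instead of a single 10 ms acoustic frame, the DNN input is usually a stack of neighboring frames (for example ±5 around the current frame). The sketch below shows that stacking step; the feature dimension and context size are illustrative, not the exact values used at MSR.

```python
import numpy as np

def stack_frames(features, context=5):
    """Stack each frame with its +/- `context` neighbors to form the DNN input.
    features: (T, D) per-frame acoustic features (e.g. log filterbanks).
    Returns: (T, (2*context+1)*D); edges are padded by repeating boundary frames."""
    T, D = features.shape
    padded = np.concatenate([np.repeat(features[:1], context, axis=0),
                             features,
                             np.repeat(features[-1:], context, axis=0)], axis=0)
    windows = [padded[t:t + 2 * context + 1].reshape(-1) for t in range(T)]
    return np.stack(windows)

# Toy usage: 100 frames of 40-dim features -> 100 x 440 DNN input vectors.
# The DNN then predicts, for each frame, a posterior over tied triphone (senone) states.
feats = np.random.default_rng(0).normal(size=(100, 40))
print(stack_frames(feats).shape)   # (100, 440)
```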
E.g. Deep Neural Network for Image Classification
The ILSVRC-2012 winning solution
*figure copied from Krizhevsky, et al., “ImageNet Classification with Deep Convolutional Neural Networks”
Scale Out Deep Learning
Training speed was a major problem of DL
A speech recognition model trained on 1,800 hours of data (~650,000,000 frame vectors) takes 2 weeks using 1 GPU
An image classification model trained on ~1,000,000 images takes 1 week using 2 GPUs*
How to scale out if 10x or 100x training data becomes available?
*Krizhevsky, et al., “ImageNet Classification with Deep Convolutional Neural Networks”
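As a sanity check on the frame count, assuming the usual 10 ms frame shift (100 frames per second, which the slide does not state explicitly):

```latex
1{,}800\ \text{hours} \times 3{,}600\ \tfrac{\text{s}}{\text{hour}} \times 100\ \tfrac{\text{frames}}{\text{s}}
  = 648{,}000{,}000 \approx 6.5\times 10^{8}\ \text{frames}
```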
DNN-GMM-HMM
Joint work with USTC-MSRA Ph.D. program student, Jian XU (许健, 0510)
The “DNN-GMM-HMM” approach for speech recognition*
DNN as a hierarchical nonlinear feature extractor, trained using a subset of the training data
GMM-HMM as the acoustic model, trained using the full data
*Z.-J. Yan, Q. Huo, and J. Xu, “A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR”
DNN-GMM-HMM
GMM-HMM modeling of DNN-derived features: combine the best of both worlds
[Pipeline figure: DNN-derived features → PCA → HLDA → tied-state WE-RDLT → MMI sequence training → CMLLR unsupervised adaptation]
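A rough sketch of the feature-extraction side of this idea: run each frame through the trained DNN, take one hidden layer's activations as the "DNN-derived feature", and decorrelate/reduce it with PCA before handing it to a conventional GMM-HMM training pipeline. The function names and dimensions below are illustrative only; this is not the published recipe.

```python
import numpy as np

def dnn_hidden_features(X, weights, biases, layer=-2):
    """Forward frames X (N, D) through a trained DNN and return the
    activations of one hidden layer as DNN-derived features."""
    h = X
    acts = []
    for W, b in zip(weights, biases):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
        acts.append(h)
    return acts[layer]            # e.g. the last hidden layer before the output

def pca_transform(F, out_dim=39):
    """Estimate a PCA projection on features F (N, H) and reduce to out_dim,
    giving a compact feature suitable for GMM-HMM modeling."""
    F0 = F - F.mean(axis=0)
    U, S, Vt = np.linalg.svd(F0, full_matrices=False)
    return F0 @ Vt[:out_dim].T
```

Here the PCA output stands in for the front of the slide's pipeline; the subsequent HLDA, WE-RDLT, MMI, and CMLLR stages would then be run with a standard GMM-HMM toolkit on the full training set.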
Experimental Results
300hr DNN (18k states, 7 hidden layers) + 2,000hr GMM-HMM (18k states)*
Training time reduced from 2 weeks to 3-5 days

System                   Word Error Rate (%)
DNN-HMM (CE)             15.4
DNN-GMM-HMM (RDLT)       14.7
DNN-GMM-HMM (MMI)        13.8   (10% relative WERR)
DNN-GMM-HMM (UA)         13.1   (15% relative WERR)
*Z.-J. Yan, Q. Huo, and J. Xu, “A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR”
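For reference, the relative word error rate reduction (WERR) quoted above is computed against the DNN-HMM (CE) baseline, e.g. for the adapted system:

```latex
\text{WERR} = \frac{15.4 - 13.1}{15.4} \approx 14.9\% \approx 15\%
```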
A New Optimization Method
Joint work with USTC-MSRA Ph.D. program student, Kai Chen (陈凯, 0700)
Using 20 GPUs, the time needed to train a 1,800-hour acoustic model is cut from 2 weeks to 12 hours, without accuracy loss
The “magic” is to be published
We believe the scalability issue in DNN training for speech recognition is now solved!
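Put in perspective (2 weeks is roughly 336 hours of single-GPU training):

```latex
\frac{336\ \text{h}}{12\ \text{h}} = 28\times \ \text{speed-up on 20 GPUs}
```

A 28x speed-up on 20 GPUs is more than linear scaling, which suggests the unpublished method involves algorithmic improvements beyond simply parallelizing the baseline trainer.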
Why Microsoft Can Do All These Good Things
Research
Bridge the gap between academia and industry via our intern and visiting scholar programs
Scale out from toy problems to real-world, industry-scale applications
Product team
Solve practical issues and deploy technologies to serve users worldwide via our services
All together
We continuously improve our work towards larger scale, higher accuracy, and more challenging tasks
Finally
We have big data + world-leading computational infrastructure
If You Want to Know More About Deep Learning
Neural networks for machine learning: https://class.coursera.org/neuralnets-2012-001
Prof. Hinton’s homepage: http://www.cs.toronto.edu/~hinton/
DeepLearning.net: http://deeplearning.net/
Open-source
Kaldi (speech): http://kaldi.sourceforge.net/
cuda-convnet (image): http://code.google.com/p/cuda-convnet/
Thanks!