Statistical Learning Theory & Classifications Based on Support Vector Machines
The Nature of Statistical Learning Theory by V. Vapnik
2014: Anders Melen
2015: Rachel Temple
Table of Contents
• Empirical Data Modeling
• What is Statistical Learning Theory
• Model of Supervised Learning
• Risk Minimization
• Vapnik-Chervonenkis Dimensions
• Structural Risk Minimization (SRM)
• Support Vector Machines (SVM)
• Exam Questions
• Q & A Session
Empirical Data Modeling
• Observations of a system are collected.
• Induction on the observations is used to build up a model of the system.
• The model is then used to deduce responses of the unobserved system.
• Sampling is typically non-uniform.
• High-dimensional problems form a sparse distribution in the input space.
Modeling Error
• Approximation error is the consequence of the hypothesis space not fitting the target space.
  (Diagram: Globally Optimal Model, Best Reachable Model, Selected Model.)
• Goal
  o Choose a model from the hypothesis space which is closest (with respect to some error measure) to the function in the target space.
Modeling Error
• Estimation error is the error between the best model in our hypothesis space and the model within our hypothesis space that we selected.
  (Diagram: approximation error lies between the globally optimal model and the best reachable model; estimation error lies between the best reachable model and the selected model.)
• Together, approximation and estimation error form the generalization error.
• The gap between the globally optimal model and the selected model forms the generalization error, which measures how well our model adapts to new, unobserved data.
Statistical Learning Theory
Definition: “Consider the learning problem as a problem of finding a desired dependence using a limited number of observations.” (Vapnik 17)
Model of Supervised Learning
• Training
  o A generator draws input vectors x from a fixed distribution F(x).
  o The supervisor takes each generated x value and returns an output value y according to the conditional distribution F(y|x).
  o Each (x, y) pair is part of the training set:
    (x1, y1), (x2, y2), … , (xl, yl), drawn independently from F(x, y) = F(y|x)F(x)
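The training stage above can be sketched in a few lines; the generator, supervisor, and their distributions here are illustrative toy choices, not from the slides.

```python
import random

# A sketch of the supervised-learning model: a generator draws x ~ F(x),
# a supervisor answers y ~ F(y|x), and the learner only sees the pairs.
random.seed(0)

def generator():
    # Toy generator: x drawn uniformly from [-1, 1].
    return random.uniform(-1.0, 1.0)

def supervisor(x):
    # Toy supervisor: a noisy threshold rule standing in for F(y|x).
    return 1 if x + random.gauss(0.0, 0.1) > 0 else 0

# The training set is just the observed (x, y) pairs.
training_set = [(x, supervisor(x)) for x in (generator() for _ in range(5))]
print(len(training_set))  # 5
```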
Risk Minimization
• To find the best function, we need to measure loss: L(y, F(x, 𝛂))
• L is the discrepancy function, based on the y’s given by the supervisor and the ŷ’s generated by the estimation functions F(x, 𝛂).
• We seek the estimator F for which the expected loss is minimized.
Risk Minimization
• Pattern Recognition
  o With pattern recognition, the supervisor’s output y can only take on two values, y ∈ {0, 1}, and the loss takes the following values:
    L(y, F(x, 𝛂)) = 0 if y = F(x, 𝛂), and 1 if y ≠ F(x, 𝛂)
  o So the risk function determines the probability of different answers being given by the supervisor and the estimation function.
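A minimal sketch of the pattern-recognition loss just described; the supervisor/estimator answer pairs are made up for illustration.

```python
# 0-1 loss for pattern recognition: 0 when the estimate agrees with the
# supervisor's answer, 1 when it disagrees.

def zero_one_loss(y, y_hat):
    """L(y, F(x, a)): 0 on agreement, 1 on disagreement."""
    return 0 if y == y_hat else 1

# The risk is then the probability of disagreement between supervisor
# and estimator; here we estimate it from four illustrative answer pairs.
answers = [(1, 1), (0, 1), (1, 1), (0, 0)]  # (supervisor y, estimate y_hat)
risk = sum(zero_one_loss(y, y_hat) for y, y_hat in answers) / len(answers)
print(risk)  # 0.25
```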
Some Simplifications From Here On
• Training Set: {(X1, Y1), … , (Xl, Yl)} → {Z1, … , Zl}
• Loss Function: L(y, F(x, 𝛂)) → Q(z, 𝛂)
Empirical Risk Minimization (ERM)
• We want to measure the risk over the training set rather than over the set of all possible observations:
  Remp(𝛂) = (1/l) Σ i=1..l Q(zi, 𝛂)
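The empirical risk computation can be sketched directly; the squared loss and the tiny training set below are illustrative assumptions.

```python
# Empirical risk: average the loss Q(z, a) over the l training
# observations instead of integrating over the unknown distribution.

def empirical_risk(Q, z_train, a):
    """R_emp(a) = (1/l) * sum_i Q(z_i, a)."""
    return sum(Q(z, a) for z in z_train) / len(z_train)

# Example with squared loss on z = (x, y) and a linear estimate a * x.
Q = lambda z, a: (z[1] - a * z[0]) ** 2
z_train = [(1.0, 2.0), (2.0, 4.0), (3.0, 5.0)]
print(empirical_risk(Q, z_train, 2.0))  # average of (0, 0, 1) = 1/3
```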
Empirical Risk Minimization (ERM)
• The empirical risk Remp(𝛂) must converge to the actual risk R(𝛂) over the set of loss functions.
Empirical Risk Minimization (ERM)
• In both directions! The convergence must be two-sided and uniform over the whole set of functions, not just for a single 𝛂.
Vapnik-Chervonenkis Dimensions
• Let’s just call them VC dimensions.
• Developed by Alexey Jakovlevich Chervonenkis & Vladimir Vapnik.
• The VC dimension is a scalar value that measures the capacity of a set of functions.
Vapnik-Chervonenkis Dimensions
• The VC dimension of the set of functions is responsible for the generalization ability of learning machines.
• The VC dimension of a set of indicator functions Q(z, 𝛂), 𝛂 ∈ 𝞚, is the maximum number h of vectors z1, …, zh that can be separated into two classes in all 2^h possible ways using functions of the set.
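The shattering definition can be checked directly on a small case. This sketch (not from the book) uses one-sided threshold indicators on the real line, which realize only 3 of the 4 labelings of two points.

```python
from itertools import product

# Check shattering for the indicator family f(x) = 1[x > t]: enumerate
# which labelings of the given points some threshold can produce.

def achievable_labelings(points, thresholds):
    labelings = set()
    for t in thresholds:
        labelings.add(tuple(1 if x > t else 0 for x in points))
    return labelings

points = [1.0, 2.0]
thresholds = [0.5, 1.5, 2.5]  # one threshold in each gap around the points
achieved = achievable_labelings(points, thresholds)
all_labelings = set(product([0, 1], repeat=len(points)))

# One labeling is unreachable, so two points are not shattered and the
# VC dimension of one-sided thresholds is 1.
print(sorted(all_labelings - achieved))  # [(1, 0)]
```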
Upper Bound For Risk
• It can be shown that, with probability at least 1 − η,
  R(𝛂) ≤ Remp(𝛂) + Φ
  where Φ = √( (h (ln(2l/h) + 1) − ln(η/4)) / l ), h is the VC dimension, and Φ is the confidence interval.
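The confidence interval can be computed numerically. This sketch assumes the standard Vapnik form of the bound for indicator functions; the h, l, and η values below are illustrative.

```python
import math

# Confidence interval of the VC risk bound: with probability 1 - eta,
#   R(a) <= R_emp(a) + sqrt((h * (ln(2l/h) + 1) - ln(eta/4)) / l)

def vc_confidence(h, l, eta=0.05):
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)

# The interval shrinks as the sample size l grows and widens with capacity h.
print(vc_confidence(h=10, l=1000) > vc_confidence(h=10, l=10000))  # True
print(vc_confidence(h=50, l=1000) > vc_confidence(h=10, l=1000))   # True
```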
Upper Bound For Risk
• ERM only minimizes Remp(𝛂); the confidence interval Φ is fixed, since it depends on the VC dimension of the set of functions, which is determined a priori.
• The learning machine must therefore also tune the confidence interval to the problem to avoid overfitting and underfitting.
Structural Risk Minimization (SRM)
• SRM attempts to minimize the right-hand side of the inequality over both terms simultaneously.
Structural Risk Minimization (SRM)
• The Remp(𝛂) term depends on a specific function’s error, while the confidence term Φ depends on the VC dimension of the space that the functions live in.
• The VC dimension is the controlling variable.
Structural Risk Minimization (SRM)
• We define the hypothesis space S to be the set of functions: Q(z, 𝛂), 𝛂 ∈ 𝞚
• We say that Sk = {Q(z, 𝛂)}, 𝛂 ∈ 𝞚k, is the hypothesis space of VC dimension hk, such that the elements are nested:
  S1 ⊂ S2 ⊂ … ⊂ Sn ⊂ … with h1 ≤ h2 ≤ … ≤ hn ≤ …
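A hypothetical sketch of SRM-style selection over such a nested structure: pick the element minimizing empirical risk plus confidence interval, not empirical risk alone. The (empirical risk, VC dimension) pairs per element are invented for illustration.

```python
import math

# SRM selection sketch: richer structure elements (here, higher polynomial
# degrees with larger assumed VC dimension) fit the sample better but pay a
# larger confidence term, so the bound picks a middle element.

def vc_confidence(h, l, eta=0.05):
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)

l = 200
# Assumed (empirical risk, VC dimension) for each structure element S_k,
# indexed by polynomial degree.
elements = {1: (0.30, 2), 3: (0.10, 4), 9: (0.08, 10), 15: (0.07, 16)}

best = min(elements, key=lambda k: elements[k][0] + vc_confidence(elements[k][1], l))
print(best)  # 3
```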
Support Vector Machines (SVM)
• Map input vectors x into a high-dimensional feature space of vectors z using a kernel function that computes inner products in that space: (zi · z) = K(x, xi)
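The kernel trick can be verified on a tiny case. This illustrative example (not from the slides) shows the polynomial kernel (x · x′)² agreeing with an explicit degree-2 feature map, so inner products in feature space never require forming z explicitly.

```python
import math

# K(x, x') = (x . x')^2 in two dimensions equals the dot product of the
# explicit feature maps (x1^2, sqrt(2)*x1*x2, x2^2).

def poly_kernel(x, xp):
    return (x[0] * xp[0] + x[1] * xp[1]) ** 2

def feature_map(x):
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x, xp = (1.0, 2.0), (3.0, 1.0)
explicit = sum(a * b for a, b in zip(feature_map(x), feature_map(xp)))
print(poly_kernel(x, xp) == round(explicit, 9))  # True
```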
Support Vector Machines (SVM)
• Feature space… optimal hyperplane… what are you talking about?
Support Vector Machines (SVM)
(Animation: projecting data into a higher-dimensional space where it becomes linearly separable.)
Support Vector Machines (SVM)
• Let’s try a basic one-dimensional example!
Support Vector Machines (SVM)
• Aw snap, that was easy!
Support Vector Machines (SVM)
• OK, what about a harder one-dimensional example?
Support Vector Machines (SVM)
• Project the lower-dimensional data into a higher-dimensional space, just like in the animation!
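The projection trick can be sketched in a few lines; the data points below are invented for illustration.

```python
# One class sits between the points of the other on the line, so no single
# threshold separates them; mapping x -> (x, x**2) makes the classes
# linearly separable in the plane by the horizontal line x2 = 3.

class_a = [-3.0, -2.5, 2.5, 3.0]   # outer points
class_b = [-1.0, 0.0, 1.0]         # inner points

lift = lambda x: (x, x * x)        # 1-D point -> 2-D feature vector

separable = (all(lift(x)[1] > 3 for x in class_a)
             and all(lift(x)[1] < 3 for x in class_b))
print(separable)  # True
```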
Support Vector Machines (SVM)
• There are several ways to implement an SVM:
  o Polynomial Learning Machine (like the animation)
  o Radial Basis Function Machines
  o Two-Layer Neural Networks
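The three implementations differ only in the kernel plugged into the same SVM machinery. This sketch shows commonly used kernel forms; the parameter values (d, gamma, kappa, theta) are illustrative assumptions.

```python
import math

# One kernel per implementation the slide lists.

def poly_kernel(u, v, d=3):                      # polynomial learning machine
    return (sum(a * b for a, b in zip(u, v)) + 1) ** d

def rbf_kernel(u, v, gamma=0.5):                 # radial basis function machine
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def sigmoid_kernel(u, v, kappa=1.0, theta=0.0):  # two-layer neural network
    return math.tanh(kappa * sum(a * b for a, b in zip(u, v)) + theta)

u, v = (1.0, 0.0), (1.0, 0.0)
print(poly_kernel(u, v), rbf_kernel(u, v), sigmoid_kernel(u, v))
```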
Simple Neural Network
• Neural networks are computational models inspired by nature!
• The brain is a massive natural neural network consisting of neurons and synapses.
• Neural networks can be represented using a graphical model.
Simple Neural Network
• Neurons → Nodes
• Synapses → Edges
(Diagram panels: Molecular Form; Neural Network Model.)
Two-Layer Neural Network
• The kernel is a sigmoid function.
(Diagram: the two-layer network implementing these rules.)
Two-Layer Neural Network
• Using this technique, the following are found automatically:
  i. The architecture of the two-layer machine
  ii. The number N of units in the first layer (the number of support vectors)
  iii. The vectors of the weights wi = xi in the first layer
  iv. The vector of weights for the second layer (the values of 𝛂)
Conclusion
• The quality of a learning machine is characterized by three main components:
  a. How rich and universal is the set of functions that the LM can approximate?
  b. How well can the machine generalize?
  c. How fast does the learning process for this machine converge?
Exam Question #1
• What is the main difference between polynomial, radial basis function, and neural network learning machines? What is that difference for the neural network learning machine?
  o The kernel function; the two-layer neural network uses a sigmoid kernel.
Exam Question #2
• What is empirical data modeling? Give a summary of the main concept and its components.
  o Empirical data modeling uses induction on observations to build up a model; the model is then used to deduce responses of an unobserved system.
Exam Question #3
• What must Remp(𝛂) do over the set of loss functions?
  o It must converge to R(𝛂), the actual risk.
Table of Contents
• Empirical Data Modeling
• What is Statistical Learning Theory
• Model of Supervised Learning
• Risk Minimization
• Vapnik-Chervonenkis Dimensions
• Structural Risk Minimization (SRM)
• Support Vector Classification
  o Optimal Separating Hyperplane & Quadratic Programming
• Support Vector Machines (SVM)
• Exam Questions
• Q & A Session
End
Any questions?