Neural Networks


Neural Networks
Slides from: Doug Gray, David Poole
What is a Neural Network?
• An information processing paradigm inspired by the way biological nervous systems, such as the brain, process information
• A method of computing based on the interaction of multiple connected processing elements
What can a Neural Net do?
Compute a known function
Approximate an unknown function
Pattern Recognition
Signal Processing
Learn to do any of the above
Basic Concepts
A Neural Network generally maps a set of inputs to a set of outputs.
The number of inputs/outputs is variable.
The network itself is composed of an arbitrary number of nodes with an arbitrary topology.
[Diagram: Input 0, Input 1, ..., Input n feed into the Neural Network, which produces Output 0, Output 1, ..., Output m]
Basic Concepts
[Diagram: inputs Input 0 ... Input n with weights W0 ... Wn and a bias weight Wb are summed and passed through fH(x); the labels mark a connection, a node, and the output]
Definition of a node:
• A node is an element which performs the function
y = fH(∑(wi·xi) + Wb)
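As a minimal illustration of that definition, the sketch below computes one node's output; the function name and the example activation are illustrative, not from the slides.

def node_output(inputs, weights, bias_weight, activation):
    """Compute y = fH(sum(wi * xi) + Wb) for a single node.

    `activation` plays the role of fH and is left abstract here,
    just as the slide leaves it abstract.
    """
    total = sum(w * x for w, x in zip(weights, inputs)) + bias_weight
    return activation(total)

# Example: a node with two inputs and a hard threshold standing in for fH
step = lambda s: 1 if s > 0 else 0
print(node_output([0.5, -1.0], [0.8, 0.2], bias_weight=0.1, activation=step))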
Properties
Inputs are flexible
• any real values
• highly correlated or independent
Target function may be discrete-valued, real-valued, or vectors of discrete or real values
Outputs are real numbers between 0 and 1
Resistant to errors in the training data
Long training time
Fast evaluation
The function produced can be difficult for humans to interpret
Perceptrons
Basic unit in a neural network
Linear separator
Parts
• N inputs, x1 ... xn
• Weights for each input, w1 ... wn
• A bias input x0 (constant) and associated weight w0
• Weighted sum of inputs, y = w0x0 + w1x1 + ... + wnxn
• A threshold function (activation function), i.e. output 1 if y > 0, -1 if y <= 0
Diagram
[Diagram: inputs x0, x1, x2, ..., xn with weights w0, w1, w2, ..., wn feed a summation Σ computing y = Σ wixi, followed by a threshold that outputs 1 if y > 0 and -1 otherwise]
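A minimal sketch of the perceptron unit just described (the function name is illustrative):

def perceptron_output(x, w):
    """Threshold unit: return 1 if the weighted sum is positive, else -1.

    x: inputs, including the constant bias input x0 (e.g. x[0] = 1)
    w: one weight per input, w[0] being the bias weight
    """
    y = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if y > 0 else -1

# Example: bias input x0 = 1 and two real-valued inputs
print(perceptron_output([1, 0.3, -0.7], [0.5, 1.0, 2.0]))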
Typical Activation Functions
F(x) = 1 / (1 + e^-x)
Using a nonlinear function which approximates a linear threshold allows a network to approximate nonlinear functions.
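For illustration, the logistic function above written out directly (a sketch, not taken from the slides):

import math

def sigmoid(x):
    """Smooth approximation of a hard threshold: F(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

# Saturates near 0 for large negative x and near 1 for large positive x
print(sigmoid(-5), sigmoid(0), sigmoid(5))  # ~0.0067, 0.5, ~0.9933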
Simple Perceptron
Binary logic application
fH(x) = u(x) [linear threshold]
Wi = random(-1,1)
Y = u(W0X0 + W1X1 + Wb)
[Diagram: Input 0 and Input 1 with weights W0 and W1, plus the bias weight Wb, are summed and passed through fH(x) to give the Output]
Now how do we train it?
Basic Training
Perceptron learning rule
ΔWi = η * (D – Y) * Xi
η = Learning Rate
D = Desired Output
Adjust weights based on how well the current weights match an objective.
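A sketch of that update applied to every weight (names are illustrative):

def update_weights(weights, inputs, desired, output, eta):
    """Perceptron learning rule: Wi += eta * (D - Y) * Xi for every weight."""
    return [w + eta * (desired - output) * x
            for w, x in zip(weights, inputs)]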
Logic Training
Expose the network to the logical OR operation.
Update the weights after each epoch.
As the output approaches the desired output for all cases, ΔWi will approach 0.

X0  X1  D
0   0   0
0   1   1
1   0   1
1   1   1
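Putting the pieces together, a minimal training sketch for the OR data above; the learning rate, initial weights, and epoch count are illustrative assumptions:

import random

# OR truth table: ((X0, X1), D)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

eta = 0.1                                      # learning rate (assumed value)
w = [random.uniform(-1, 1) for _ in range(2)]  # W0, W1 = random(-1, 1)
wb = random.uniform(-1, 1)                     # bias weight Wb

u = lambda s: 1 if s > 0 else 0                # linear threshold u(x)

for epoch in range(100):
    dw0 = dw1 = dwb = 0.0
    for (x0, x1), d in data:
        y = u(w[0] * x0 + w[1] * x1 + wb)
        dw0 += eta * (d - y) * x0              # ΔWi = η * (D - Y) * Xi
        dw1 += eta * (d - y) * x1
        dwb += eta * (d - y)                   # the bias input is a constant 1
    # update the weights after each epoch, as the slide describes
    w[0] += dw0
    w[1] += dw1
    wb += dwb

print(w, wb)   # weights that separate the OR cases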
Results
[Figure: resulting values of W0, W1, and Wb over training]
Details
The network converges on a hyper-plane decision surface, where W0X0 + W1X1 + Wb = 0:
X1 = -(W0/W1)X0 - (Wb/W1)
[Plot: the decision surface in the (X0, X1) plane]
Feed-forward neural networks
Feed-forward neural networks are the most common models.
These are directed acyclic graphs.
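As an illustration of that feed-forward structure, the sketch below pushes values through one hidden layer and then an output layer; the layer sizes, weights, and sigmoid activation are assumptions, not taken from the slides.

import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer(inputs, weights, biases):
    """One layer of a feed-forward net: each unit sees every input."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def feed_forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Values flow strictly forward (no cycles): inputs -> hidden -> outputs."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)

# Example: 3 inputs, 2 hidden units, 1 output (sizes chosen arbitrarily)
print(feed_forward([1.0, 0.0, 0.5],
                   hidden_w=[[0.2, -0.4, 0.1], [0.7, 0.3, -0.2]],
                   hidden_b=[0.0, 0.1],
                   out_w=[[0.5, -0.6]],
                   out_b=[0.05]))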
Neural Network for the news example
Axiomatizing the Network
The values of the attributes are real numbers.
Thirteen parameters w0, ..., w12 are real numbers.
The attributes h1 and h2 correspond to the values of hidden units.
There are 13 real numbers to be learned. The hypothesis space is thus a 13-dimensional real space.
Each point in this 13-dimensional space corresponds to a particular logic program that predicts a value for reads given known, new, short, and home.
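One way the 13 parameters could be laid out, assuming each hidden unit takes the four inputs plus a bias (5 + 5 parameters) and the output unit takes h1 and h2 plus a bias (3 parameters); the exact wiring and parameter numbering here are assumptions.

import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def predict_reads(known, new, short, home, w):
    """13 parameters w[0..12]: 5 per hidden unit, 3 for the output unit."""
    x = [known, new, short, home]
    h1 = sigmoid(sum(wi * xi for wi, xi in zip(w[0:4], x)) + w[4])
    h2 = sigmoid(sum(wi * xi for wi, xi in zip(w[5:9], x)) + w[9])
    return sigmoid(w[10] * h1 + w[11] * h2 + w[12])  # predicted value for reads

# Example call with arbitrary parameter values
print(predict_reads(1, 0, 1, 0, [0.1] * 13))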
Prediction Error
Neural Network Learning
Aim of neural network learning: given a set of examples, find parameter settings that minimize the error.
Back-propagation learning is gradient descent search through the parameter space to minimize the sum-of-squares error.
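For concreteness, a sketch of the sum-of-squares error and one gradient-descent step; the finite-difference gradient below is only a stand-in for the derivatives that back-propagation computes analytically, and all names are illustrative.

def sum_of_squares_error(predict, params, examples):
    """Sum over the examples of (target - prediction)^2."""
    return sum((target - predict(inputs, params)) ** 2
               for inputs, target in examples)

def gradient_descent_step(predict, params, examples, eta=0.05, h=1e-5):
    """Move each parameter against a finite-difference estimate of dE/dw."""
    base = sum_of_squares_error(predict, params, examples)
    new_params = []
    for i, p in enumerate(params):
        bumped = params[:i] + [p + h] + params[i + 1:]
        dE_dp = (sum_of_squares_error(predict, bumped, examples) - base) / h
        new_params.append(p - eta * dE_dp)   # step downhill on the error
    return new_params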
Backpropagation Learning
Inputs:
• A network, including all units and their connections
• Stopping criteria
• Learning rate (constant of proportionality of gradient descent search)
• Initial values for the parameters
• A set of classified training data
Output: Updated values for the parameters
Backpropagation Learning Algorithm
Repeat
• evaluate the network on each example given the current parameter settings
• determine the derivative of the error for each parameter
• change each parameter in proportion to its derivative
until the stopping criteria are met
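Continuing the sketch above (and reusing the hypothetical sum_of_squares_error and gradient_descent_step), the repeat-until loop might look like this; a small change in error is used as an assumed stopping criterion:

def train(predict, params, examples, tolerance=1e-3, max_epochs=10000):
    """Repeat: evaluate, take derivatives, update -- until the error stops improving.

    True back-propagation obtains the derivatives analytically by
    propagating errors backwards through the network; the numerical
    gradient step from the previous sketch stands in for that here.
    """
    error = sum_of_squares_error(predict, params, examples)
    for _ in range(max_epochs):
        params = gradient_descent_step(predict, params, examples)
        new_error = sum_of_squares_error(predict, params, examples)
        if abs(error - new_error) < tolerance:   # stopping criterion
            break
        error = new_error
    return params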
Gradient Descent for Neural Net Learning
Bias in neural networks and decision trees
It’s easy for a neural network to represent “at least two of I1, …, Ik are true”:
w0 = -15, w1 = … = wk = 10
This concept forms a large decision tree.
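To see why those weights work, a sketch of the "at least two" unit as a single threshold node (the function name is illustrative):

def at_least_two(inputs):
    """Single threshold unit with bias weight w0 = -15 and weight 10 per input.

    The weighted sum is 10 * (number of true inputs) - 15, which is
    positive exactly when at least two of the inputs are true.
    """
    total = -15 + sum(10 * i for i in inputs)
    return 1 if total > 0 else 0

print(at_least_two([1, 0, 0, 0]))  # 0 -- only one input true
print(at_least_two([1, 0, 1, 0]))  # 1 -- two inputs true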
Consider representing a conditional: “If c then a else b”:
• Simple in a decision tree.
• Needs a complicated neural network to represent (c ∧ a) ∨ (¬c ∧ b).
Neural Networks and Logic
Meaning is attached to the input and output units.
There is no a priori meaning associated with the hidden units.
What the hidden units actually represent is something that’s learned.