Learning in Neural and Belief Networks
- Feed-Forward Neural Networks
March 28, 2001
20013329 안순길
Contents
How the Brain works
Neural Networks
Perceptrons
Introduction
Two viewpoints in this chapter
Computational viewpoint: representing functions using networks
Biological viewpoint: a mathematical model of the brain
Neuron: a simple computing element
Neural network: a collection of interconnected neurons
How the Brain Works
Cell body (soma): provides the support functions and structure of the cell
Axon: a branching fiber which carries signals away from the neuron
Synapse: converts an electrical signal into a chemical signal
Dendrites: branching fibers which receive signals from other nerve cells
Action potential: electrical pulse
Synapse
excitatory: increases the potential
inhibitory: decreases the potential
synaptic connections: plasticity (long-term changes in connection strength)
A collection of simple cells
can lead to thoughts, action,
and consciousness.
Comparing brains with digital
computers
They perform quite different tasks and have different properties
Speed (switching speed)
in raw switching speed, the computer is about a million times faster
but at what it does, the brain ends up being about a billion times faster, since all its neurons and synapses operate simultaneously
Brain
can perform complex tasks
is more fault-tolerant: graceful degradation
can be trained using an inductive learning algorithm
Neural Networks
NN: nodes (units) and links
Each link has a numeric weight
Learning: updating the weights
Two computational components
linear component: input function
nonlinear component: activation function
Notation
Simple computing elements
Total weighted input of a unit
The unit's output is obtained by applying the activation function g
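Reconstructed here in the usual textbook notation (the slide's own formulas were lost), with a_j the output of unit j and W_{j,i} the weight on the link from unit j to unit i:
in_i = \sum_j W_{j,i}\, a_j \qquad a_i = g(in_i) = g\!\left(\sum_j W_{j,i}\, a_j\right)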
Three activation functions (typically the step, sign, and sigmoid functions; the step/threshold function is detailed here)
Threshold
causes the neuron to fire when the weighted input reaches the threshold
the threshold can be replaced with an extra input weight (a bias input fixed at -1)
If the input is greater than the threshold, the output is 1
Otherwise 0
Applying neural networks to logic gates
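As an illustration (the original slide showed a figure), here is a minimal Python sketch of a threshold unit together with weight/threshold choices that realize AND, OR, and NOT; the particular values are the usual textbook ones, not taken from the slide.

def threshold_unit(inputs, weights, t):
    # Step activation: output 1 if the weighted sum of the inputs reaches the threshold t.
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= t else 0

# Usual textbook choices of weights and thresholds for the three gates:
AND = lambda a, b: threshold_unit([a, b], [1, 1], t=1.5)
OR  = lambda a, b: threshold_unit([a, b], [1, 1], t=0.5)
NOT = lambda a: threshold_unit([a], [-1], t=-0.5)

assert AND(1, 1) == 1 and AND(1, 0) == 0
assert OR(0, 1) == 1 and OR(0, 0) == 0
assert NOT(0) == 1 and NOT(1) == 0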
Network structures(I)
Feed-forward networks
Unidirectional links, no cycles
DAG (directed acyclic graph)
No links between units in the same layer, no links backward to a previous layer, no links that skip a layer
Computation proceeds uniformly from the input units to the output units
No internal state
input units / output units / hidden units
Perceptron: no hidden units
Multilayer networks: one or more layers of hidden units
A fixed structure and fixed activation functions give a specific parameterized family of functions
With a nonlinear g, the network performs nonlinear regression
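A minimal sketch (not from the slides) of how such a fixed-structure feed-forward network computes its output, assuming one hidden layer and a sigmoid activation function; the weights are made-up values chosen only to show the data flow.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_output):
    # w_hidden[j][k]: weight from input k to hidden unit j
    # w_output[j]:   weight from hidden unit j to the single output unit
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_output, hidden)))

print(forward([0.5, -1.0],
              w_hidden=[[0.1, 0.4], [-0.3, 0.2]],
              w_output=[0.7, -0.5]))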
Network Structures(II)
Recurrent networks
The brain is more similar to a recurrent network: like one, it has backward (feedback) links
Recurrent networks have internal state, stored in the activation levels of the units
They can be unstable, oscillate, or exhibit chaotic behavior
Computation can take a long time
Understanding them requires more advanced mathematical methods
Network Structures(III)
Examples
Hopfield networks
Bidirectional connections with symmetric weights
Associative memory: the network settles to the stored pattern that most closely resembles the new stimulus
Boltzmann machines
Stochastic (probabilistic) activation function
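A rough, self-contained sketch of the Hopfield idea (my own illustration: Hebbian storage of +/-1 patterns in a symmetric weight matrix, then asynchronous threshold updates that settle on the stored pattern closest to the stimulus):

import random

def store(patterns):
    # Hebbian storage: symmetric weights built from +/-1 patterns, no self-connections.
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, stimulus, steps=10):
    # Asynchronous threshold updates; the state settles toward a stored pattern.
    s = list(stimulus)
    n = len(s)
    for _ in range(steps):
        for i in random.sample(range(n), n):
            total = sum(W[i][j] * s[j] for j in range(n))
            s[i] = 1 if total >= 0 else -1
    return s

W = store([[1, 1, 1, -1, -1, -1], [1, -1, 1, -1, 1, -1]])
print(recall(W, [1, 1, -1, -1, -1, -1]))  # recovers the stored pattern [1, 1, 1, -1, -1, -1]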
Optimal Network Structure(I)
Too small a network: incapable of representing the target function
Too big a network: does not generalize well
Overfitting when there are too many parameters
A feed-forward NN with one hidden layer
can approximate any continuous function
A feed-forward NN with two hidden layers
can approximate any function
Optimal Network Structures(II)
NERF (Network Efficiently Representable Functions)
functions that can be approximated with a small number of units
Using genetic algorithms: requires running the whole NN training protocol for each candidate structure
Hill-climbing search (modifying an existing network structure)
Start with a big network: optimal brain damage
remove weights from a fully connected model
Start with a small network: tiling algorithm
start with a single unit and add subsequent units
Cross-validation techniques (to evaluate candidate structures)
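A rough sketch of the hill-climbing structure search described above, using hypothetical helpers: neighbors(...) proposes small modifications of a structure, and cross_val_error(...) is assumed to train and score one candidate (replaced here by a toy scoring function).

def hill_climb_structure(initial_sizes, data, neighbors, cross_val_error):
    # Greedy search over hidden-layer sizes, scored by cross-validation error.
    best = initial_sizes
    best_err = cross_val_error(best, data)
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(best):
            err = cross_val_error(candidate, data)
            if err < best_err:
                best, best_err, improved = candidate, err, True
    return best, best_err

# Toy usage with a stub scoring function (a real version would train a network
# for each candidate and measure its cross-validation error):
nbrs = lambda s: ([s[0] + 1], [max(1, s[0] - 1)])   # add or remove one hidden unit
score = lambda s, d: abs(s[0] - 5)                  # pretend 5 hidden units is optimal
print(hill_climb_structure([2], None, nbrs, score)) # -> ([5], 0)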
Perceptrons
Perceptron: single-layer, feed-forward network
Each output unit is independent of the others
Each weight only affects one of the outputs
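The output formula this slide alludes to (the equation itself was lost with the original figure) is, in the usual textbook notation, with I_j the inputs, W_j the weights, and t the threshold:
O = \mathrm{Step}_t\!\left(\sum_j W_j I_j\right) = \begin{cases} 1 & \text{if } \sum_j W_j I_j \ge t \\ 0 & \text{otherwise} \end{cases}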
What perceptrons can represent
Boolean functions AND, OR, and NOT
Majority function (output 1 if more than half of the n inputs are 1): W_j = 1, t = n/2 -> one unit, n weights
A decision tree would need O(2^n) nodes
Perceptrons can only represent linearly separable functions
e.g., they cannot represent XOR
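A small sketch (my own illustration, not from the slide) of the single-unit majority perceptron just described, with every weight set to 1 and threshold n/2:

def majority(inputs):
    # One perceptron unit: all weights are 1, the threshold is n/2.
    n = len(inputs)
    total = sum(1 * x for x in inputs)
    return 1 if total >= n / 2 else 0

print(majority([1, 0, 1, 1, 0]))  # -> 1 (three of the five inputs are on)
print(majority([1, 0, 0, 0, 0]))  # -> 0 (only one of the five inputs is on)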
Examples of Perceptrons
The entire input space is divided in two along a boundary defined by \sum_j W_j I_j = t
In Figure 19.9(a): n = 2, so the boundary is a line
In Figure 19.10(a): n = 3, so the boundary is a plane
Learning linearly separable
functions(I)
Bad news: there are not many problems in this class
Good news: given enough training examples, there is a perceptron learning algorithm that will learn them
Neural network learning algorithm
Current-best-hypothesis (CBH) scheme
Hypothesis: a network defined by the current values of the weights
Initial network: randomly assigned weights in [-0.5, 0.5]
Repeat the update phase until convergence
Each epoch: update all the weights using all the examples
Learning linearly separable
functions(II)
Learning rule
The error: Err = T - O  (T: target output, O: the perceptron's output)
Weight update rule (Rosenblatt, 1960): W_j <- W_j + α × I_j × Err
α: the learning rate
If the error is positive, we need to increase O
If the error is negative, we need to decrease O
Algorithm
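A minimal, self-contained Python sketch of this algorithm (my own illustration of the rule above: weights initialized randomly in [-0.5, 0.5], then epoch-wise updates W_j <- W_j + α × I_j × Err), shown here learning the linearly separable AND function.

import random

def train_perceptron(examples, n_inputs, alpha=0.1, epochs=100):
    # One extra weight plays the role of the threshold; its input is fixed at -1.
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for _ in range(epochs):
        for inputs, target in examples:
            x = list(inputs) + [-1]
            output = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
            err = target - output                       # Err = T - O
            w = [wi + alpha * xi * err for wi, xi in zip(w, x)]
        # (a fuller version would stop as soon as an epoch makes no errors)
    return w

def predict(w, inputs):
    x = list(inputs) + [-1]
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(and_examples, n_inputs=2)
print([predict(w, inputs) for inputs, _ in and_examples])  # -> [0, 0, 0, 1]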
Perceptrons (Minsky and Papert, 1969)
Showed the limits of linearly separable functions
The learning rule performs a gradient descent search through weight space
Weight space has no local minima
Difference between NNs and other attribute-based methods such as decision trees:
inputs are real numbers in some fixed range vs. a discrete set of values
Dealing with discrete attribute values
Local encoding: a single input unit; the discrete attribute values are mapped to numbers
e.g. None = 0.0, Some = 0.5, Full = 1.0 (the Patrons attribute in the WillWait example)
Distributed encoding: one input unit for each value of the attribute
Example
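A small sketch (my own, with hypothetical helper names) of the two encodings for the Patrons attribute of the WillWait example:

def local_encoding(patrons):
    # One input unit; the discrete values are mapped onto a single real number.
    return {"None": 0.0, "Some": 0.5, "Full": 1.0}[patrons]

def distributed_encoding(patrons):
    # One input unit per attribute value (a one-of-n vector).
    values = ["None", "Some", "Full"]
    return [1.0 if patrons == v else 0.0 for v in values]

print(local_encoding("Some"))        # -> 0.5
print(distributed_encoding("Some"))  # -> [0.0, 1.0, 0.0]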
Summary(I)
Neural networks are modeled on the human brain
The brain is still superior at what it does, even though the computer has a far faster switching speed
The brain is more fault-tolerant
Neural network
nodes (units) and links
Each link has a numeric weight
Learning: updating the weights
Two computational components
linear component: input function
nonlinear component: activation function
Summary(II)
In this text, we only consider
Feed-forward networks
Unidirectional links, no cycles
DAG (directed acyclic graph)
No links between units in the same layer, no links backward to a previous layer, no links that skip a layer
Computation proceeds uniformly from the input units to the output units
No internal state
Summary(III)
Network size determines representational power
Overfitting when there are too many parameters.
A feed-forward NN with one hidden layer
can approximate any continuous function
A feed-forward NN with two hidden layers
can approximate any function
Summary(IV)
Perceptron: single-layer, feed-forward network
Each output unit is independent of the others
Each weight only affects one of the outputs
Applicable only to linearly separable functions
If the problem space is linearly separable ("flat"), a neural network works very well
In other words, if the problem can be made easy from an algorithmic point of view, a neural network can also handle it
Basically, back-propagation only guarantees local optimality in a neural network