Learning in Neural and Belief Networks


Learning in Neural and Belief Networks
- Feed-Forward Neural Networks
March 28, 2001
20013329 안순길
Contents
How the Brain Works
Neural Networks
Perceptrons
Introduction
Two viewpoints in this chapter
 Computational viewpoint: representing functions using networks
 Biological viewpoint: a mathematical model of the brain
Neuron: a computing element
Neural network: a collection of interconnected neurons
How the Brain Works
Cell body (soma): provides the support functions and structure of the cell
Axon: a branching fiber that carries signals away from the neuron
Synapse: converts an electrical signal into a chemical signal
Dendrites: branching fibers that receive signals from other nerve cells
Action potential: electrical pulse
Synapse
 excitatory: increases the potential
 inhibitory: decreases the potential
 synaptic connections exhibit plasticity
A collection of simple cells can lead to thought, action, and consciousness.
Comparing Brains with Digital Computers
They perform quite different tasks and have different properties.
Speed (raw switching speed)
 a computer is about a million times faster in raw switching speed
 but the brain ends up being a billion times faster at what it does, thanks to massive parallelism
Brain
 performs complex tasks
 more fault-tolerant: graceful degradation
 can be trained using an inductive learning algorithm
Neural Networks
A neural network consists of nodes (units) and links
 each link has a numeric weight
 learning: updating the weights
Two computational components
 linear component: the input function
 nonlinear component: the activation function
Notation: a_i is the activation of unit i, W_{j,i} is the weight on the link from unit j to unit i, in_i is the total weighted input to unit i, and g is the activation function
Simple computing elements
Total weighted input: in_i = Σ_j W_{j,i} a_j
Output obtained by applying the activation function g: a_i = g(in_i) = g(Σ_j W_{j,i} a_j)
Three activation functions
Threshold (step function)
 fires the neuron when the total input is large enough
 the threshold can be replaced with an extra input weight (a bias on a fixed input)
 if the total input is greater than the threshold, output 1
 otherwise output 0
Applying Neural Networks to Logic Gates
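The slide's gate figures are not reproduced in this transcript. As a minimal sketch (my own Python illustration; the weights and thresholds are one plausible hand-picked choice, not necessarily the slide's values), a single threshold unit can implement AND, OR, and NOT:

```python
# A single threshold unit: output 1 if the weighted sum of the inputs
# reaches the threshold t, otherwise 0. (Equivalently, t could be folded
# in as an extra weight on a fixed input of -1, as noted above.)
def threshold_unit(weights, t, inputs):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= t else 0

# Hand-picked weights/thresholds implementing the basic logic gates.
AND = lambda a, b: threshold_unit([1, 1], 1.5, [a, b])
OR  = lambda a, b: threshold_unit([1, 1], 0.5, [a, b])
NOT = lambda a:    threshold_unit([-1], -0.5, [a])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT:", NOT(0), NOT(1))   # 1 0
```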
Network Structures(I)
Feed-forward networks
 unidirectional links, no cycles
 DAG (directed acyclic graph)
 no links between units in the same layer, no links backward to a previous layer, no links that skip a layer
 processing flows uniformly from input units to output units
 no internal state
Input units / output units / hidden units
Perceptron: no hidden units
Multilayer network: one or more hidden units
A network with a fixed structure and activation functions is a specific parameterized function
With a nonlinear g, the network performs nonlinear regression
Network Structures(II)
Recurrent networks
 the brain is more like a recurrent network: it has backward links
 recurrent networks have internal state, stored in the activation levels of the units
 can be unstable, oscillate, or exhibit chaotic behavior
 can take a long time to compute
 require more advanced mathematical methods to understand
Network Structures(III)
Examples
 Hopfield networks
  bidirectional connections with symmetric weights
  associative memory: returns the stored pattern that most closely resembles a new stimulus (see the sketch below)
 Boltzmann machines
  stochastic (probabilistic) activation functions
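As a minimal sketch of the Hopfield idea (my own illustration, not from the slides: Hebbian-style symmetric weights over ±1 patterns, and repeated unit updates that settle on the closest stored pattern; the function names are hypothetical):

```python
import numpy as np

def train_hopfield(patterns):
    """Symmetric weight matrix (zero diagonal) built from +/-1 patterns."""
    n = len(patterns[0])
    W = np.zeros((n, n))
    for p in patterns:
        p = np.asarray(p, dtype=float)
        W += np.outer(p, p)          # Hebbian-style outer product
    np.fill_diagonal(W, 0.0)
    return W / len(patterns)

def recall(W, stimulus, steps=20):
    """Update units repeatedly; the state settles on the stored pattern
    that most closely resembles the (possibly corrupted) stimulus."""
    s = np.asarray(stimulus, dtype=float).copy()
    for _ in range(steps):
        for i in range(len(s)):      # asynchronous unit updates
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    return s

patterns = [[1, 1, 1, -1, -1, -1], [-1, 1, -1, 1, -1, 1]]
W = train_hopfield(patterns)
print(recall(W, [1, 1, -1, -1, -1, -1]))   # recovers the first pattern
```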
Optimal Network Structure(I)
Too small a network: incapable of representing the desired function
Too big a network: does not generalize well
 overfitting when there are too many parameters
Feed-forward NN with one (sufficiently large) hidden layer
 can approximate any continuous function
Feed-forward NN with two hidden layers
 can approximate any function
Optimal Network Structure(II)
NERFs (Network Efficiently Representable Functions)
 functions that can be approximated with a small number of units
Searching for a good network structure
 genetic algorithms: each candidate structure requires running the whole NN training protocol (expensive)
 hill-climbing search: modifying an existing network structure
  start with a big network: optimal brain damage (removing weights from a fully connected model)
  start with a small network: tiling algorithm (start with a single unit and add subsequent units)
 cross-validation techniques decide which structure generalizes best (a sketch follows below)
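A rough sketch of the cross-validation idea for choosing a structure (my own illustration; `train_network` and `validation_error` are hypothetical helpers standing in for a full NN training protocol):

```python
def choose_hidden_units(candidate_sizes, folds, train_network, validation_error):
    """Return the hidden-layer size with the lowest average validation error."""
    best_size, best_err = None, float("inf")
    for size in candidate_sizes:
        errors = []
        for train_set, valid_set in folds:        # k cross-validation splits
            net = train_network(size, train_set)  # run the whole training protocol
            errors.append(validation_error(net, valid_set))
        avg = sum(errors) / len(errors)
        if avg < best_err:
            best_size, best_err = size, avg
    return best_size
```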
Perceptrons
Perceptron: single-layer, feed-forward network
 each output unit is independent of the others
 each weight affects only one of the outputs
Output of a unit: O = Step_t(Σ_j W_j I_j),
where the W_j are the weights, the I_j are the inputs, and Step_t outputs 1 when its argument reaches the threshold t (0 otherwise)
What perceptrons can represent
Boolean functions AND, OR, and NOT
Majority function: W_j = 1, t = n/2 → one unit with n weights (see the sketch below)
 a decision tree would need O(2^n) nodes
Perceptrons can only represent linearly separable functions
 e.g., they cannot represent XOR
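As a minimal sketch (my own illustration) of the majority-function claim, one unit with every W_j = 1 and threshold t = n/2:

```python
def majority(inputs):
    """Single threshold unit: W_j = 1 for all n inputs, threshold t = n/2."""
    n = len(inputs)
    total = sum(1 * x for x in inputs)   # weighted sum with W_j = 1
    return 1 if total > n / 2 else 0

print(majority([1, 0, 1]))   # 1: a majority of the inputs are on
print(majority([1, 0, 0]))   # 0
```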
Examples of Perceptrons
The entire input space is divided in two along a boundary defined by Σ_j W_j I_j = t (a line for n = 2, a plane for n = 3)
In Figure 19.9(a): n = 2
In Figure 19.10(a): n = 3
For example, with n = 2, W_1 = W_2 = 1, and t = 1.5, the boundary I_1 + I_2 = 1.5 separates the inputs of AND
Learning linearly separable functions(I)
Bad news: not many problems are linearly separable
Good news: given enough training examples, there is a perceptron learning algorithm that will learn any linearly separable function
Neural network learning algorithms
 current-best-hypothesis (CBH) scheme
 hypothesis: a network defined by the current values of the weights
 initial network: weights randomly assigned in [-0.5, 0.5]
 repeat the update phase until convergence
 each epoch: update all the weights for all the examples
Learning linearly separable functions(II)
Learning (the perceptron learning rule)
 error: Err = T − O (target output minus actual output)
 weight update: W_j ← W_j + α × I_j × Err (Rosenblatt, 1960)
 α: the learning rate
 if the error is positive, O needs to be increased
 if the error is negative, O needs to be decreased
Algorithm
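The algorithm figure on the slide is not reproduced here. A minimal Python sketch of the update rule above (my own rendering, not the slides' exact pseudocode): weights start randomly in [-0.5, 0.5], each epoch runs over all examples, and the threshold is treated as a weight on a fixed input of -1:

```python
import random

def train_perceptron(examples, n_inputs, alpha=0.1, epochs=100):
    """examples: list of (inputs, target) pairs with 0/1 targets.
    Returns the learned (weights, threshold)."""
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]  # initial network
    t = random.uniform(-0.5, 0.5)
    for _ in range(epochs):                        # one epoch = all examples
        for inputs, target in examples:
            output = 1 if sum(w * x for w, x in zip(weights, inputs)) >= t else 0
            err = target - output                  # Err = T - O
            for j in range(n_inputs):
                weights[j] += alpha * inputs[j] * err   # W_j <- W_j + alpha * I_j * Err
            t -= alpha * err                       # threshold as a weight on input -1
    return weights, t

# Learn the (linearly separable) OR function.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, t = train_perceptron(data, 2)
print([1 if w[0] * a + w[1] * b >= t else 0 for (a, b), _ in data])   # [0, 1, 1, 1]
```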
Perceptrons (Minsky and Papert, 1969)
 pointed out the limits of linearly separable functions
Gradient descent search through weight space
 the weight space has no local minima
Difference between NNs and other attribute-based methods such as decision trees
 inputs are real numbers in some fixed range vs. a discrete set of values
Dealing with discrete attributes
 local encoding: a single input unit takes graded values for the discrete attribute
  e.g., None = 0.0, Some = 0.5, Full = 1.0 (the Patrons attribute in the WillWait example)
 distributed encoding: one input unit for each value of the attribute
Example
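The example figure is not reproduced in this transcript. As a minimal sketch (my own illustration) of the two encodings for the Patrons attribute of the WillWait example:

```python
PATRONS_VALUES = ["None", "Some", "Full"]

def local_encoding(value):
    """One input unit whose level encodes the value: None=0.0, Some=0.5, Full=1.0."""
    return {"None": 0.0, "Some": 0.5, "Full": 1.0}[value]

def distributed_encoding(value):
    """One input unit per attribute value; exactly one unit is switched on."""
    return [1.0 if value == v else 0.0 for v in PATRONS_VALUES]

print(local_encoding("Some"))        # 0.5
print(distributed_encoding("Some"))  # [0.0, 1.0, 0.0]
```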
Summary(I)
Neural networks are inspired by the human brain
 although computers are faster in raw switching speed, the brain is still superior at the complex tasks it performs
 more fault-tolerant
Neural network
 nodes (units) and links; each link has a numeric weight
 learning: updating the weights
 two computational components
  linear component: the input function
  nonlinear component: the activation function
Summary(II)
In this text, we only consider feed-forward networks
 unidirectional links, no cycles
 DAG (directed acyclic graph)
 no links between units in the same layer, no links backward to a previous layer, no links that skip a layer
 processing flows uniformly from input units to output units
 no internal state
Summary(III)
Network size determines representational power
 overfitting when there are too many parameters
Feed-forward NN with one (sufficiently large) hidden layer
 can approximate any continuous function
Feed-forward NN with two hidden layers
 can approximate any function
Summary(IV)
Perceptron: single-layer, feed-forward network
 each output unit is independent of the others
 each weight affects only one of the outputs
 can only represent linearly separable functions
If the problem space is simple (e.g., linearly separable), a neural network works very well.
In other words, problems that are easy from an algorithmic perspective are also easy for neural networks.
Note that back-propagation, in general, only guarantees a local optimum of the network weights.