Artificial Intelligence
Lecture 9:
[Part I]: Selected Topics on Neural Networks
Faculty of Mathematical Sciences
4th
5th IT
Elmuntasir Abdallah Hag Eltom
http://www.rational-team.com/muntasir
Lecture Objectives
• Introduces the relationship between biological neurons,
which make up human brains, and artificial neurons,
which are used in artificial neural networks.
• McCulloch and Pitts neurons are explained, and the
capabilities and limitations of perceptrons are examined.
• Multilayer neural networks are explored, and the
backpropagation algorithm for supervised learning in
multilayer networks is explained.
• Recurrent networks, such as Hopfield networks and
other bidirectional associative memories, are also
explained.
• Unsupervised learning is explained through the use of
Kohonen maps and Hebb’s law.
Neural Networks Simplified
• Although the neural networks presented in this chapter
are very simplistic, real-world networks can be extremely
complex, consisting of hundreds or even thousands of
neurons. Networks of this size can often appear like a
“black box,” in the sense that it is not clear why they
behave in the way they do. In fact, the behavior of
complex neural networks is often emergent.
Neurons
Biological Neurons
• The human brain contains over ten billion neurons, each
of which is connected, on average, to several thousand
other neurons.
• These connections are known as synapses, and the
human brain contains about 60 trillion such connections.
• Neurons are in fact very simple processing elements.
Each neuron contains a soma, which is the body of the
neuron, an axon, and a number of dendrites. A
simplified diagram of a biological neuron is shown next.
Neurons
Biological Neurons
[Figure: simplified diagram of a biological neuron, showing the soma, dendrites, and axon]
Neurons
Biological Neurons
• The neuron receives inputs from other neurons along its
dendrites, and when this input signal exceeds a certain
threshold, the neuron “fires”—in fact, a chemical reaction
occurs, which causes an electrical pulse, known as an
action potential, to be sent down the axon (the output of
the neuron), toward synapses that connect the neuron to
the dendrites of other neurons.
• Although each neuron individually is extremely simple, this
enormously complex network of neurons is able to process
information of extraordinary complexity at a great rate.
• In terms of complexity, the human brain far exceeds any
device created by man, or indeed, any naturally occurring
object or structure in the universe, as far as we are aware
today.
Neurons
Biological Neurons
• The human brain has a property known as plasticity,
which means that neurons can change the nature and
number of their connections to other neurons in
response to events that occur.
• In this way, the brain is able to learn. The brain uses a
form of credit assignment to strengthen the connections
between neurons that lead to correct solutions to
problems and to weaken connections that lead to incorrect
solutions.
• The strength of a connection, or synapse, determines
how much influence it will have on the neurons to which
it is connected, and so if a connection is weakened, it will
play less of a role in subsequent computations.
Neurons
Artificial Neurons
• Artificial neural networks are modeled on the human brain
and consist of a number of artificial neurons.
• Neurons in artificial neural networks tend to have fewer
connections than biological neurons, and neural networks
are all (currently) significantly smaller in terms of number of
neurons than the human brain.
• Each neuron (or node) in a neural network receives a
number of inputs.
• A function called the activation function is applied to these
input values, which results in the activation level of the
neuron, which is the output value of the neuron. There are
a number of possible functions that can be used in
neurons.
Neurons
Artificial Neurons
• Some of the most commonly used activation functions
are shown below.
[Figure: commonly used activation functions]
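The original slide shows these as a figure; as a rough illustration only (not part of the original slides), here are Python sketches of three activation functions that commonly appear in such figures: the step (linear threshold) function described next, its -1/+1 variant, and the sigmoid function, a smooth alternative often used with backpropagation.

import math

def step(x, t=0.0):
    # Step / linear threshold function: +1 when the input exceeds the threshold t.
    return 1 if x > t else 0

def sign(x, t=0.0):
    # Variant in which the inactive level is -1 rather than 0.
    return 1 if x > t else -1

def sigmoid(x):
    # Smooth, differentiable activation often used with backpropagation.
    return 1.0 / (1.0 + math.exp(-x))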
Neurons
Artificial Neurons
• In the step function (linear threshold function), the inputs
to the neuron are summed (having each been multiplied
by a weight), and this sum is compared with a threshold,
t. If the sum is greater than the threshold, then the
neuron fires and has an activation level of +1. Otherwise,
it is inactive and has an activation level of zero. (In some
networks, when the sum does not exceed the threshold,
the activation level is considered to be -1 instead of 0.)
• Hence, the behavior of the neuron can be expressed as
follows:
Y = +1 if X > t
Y = 0 if X ≤ t
Neurons
Artificial Neurons
• X is the weighted sum of the n inputs to the neuron, x1 to
xn, where each input, xi, is multiplied by its corresponding
weight, wi. For example, let us consider a simple neuron
that has just two inputs. Each of these inputs has a
weight associated with it, as follows:
w1 = 0.8
w2 = 0.4
• The inputs to the neuron are x1 and x2:
x1 = 0.7
x2 = 0.9
• So, the weighted sum of these inputs is
• X = (0.8 x 0.7) + (0.4 x 0.9) = 0.92
Neurons
Artificial Neurons
• The activation level, Y, is defined for this neuron as:
Y = 1 if 0.92 > t
Y = 0 if 0.92 ≤ t
Hence, if t is less than 0.92, then this neuron will fire with
this particular set of inputs. Otherwise, it will have an
activation level of zero.
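As a quick check of the arithmetic above, a minimal Python sketch of this two-input neuron (the function name and the sample thresholds are illustrative only):

def fires(inputs, weights, t):
    # Weighted sum of the inputs, compared with the threshold t.
    x = sum(w * i for w, i in zip(weights, inputs))
    return 1 if x > t else 0

weights = [0.8, 0.4]
inputs = [0.7, 0.9]
print(fires(inputs, weights, t=0.5))   # X = 0.92 > 0.5, so the neuron fires: 1
print(fires(inputs, weights, t=1.0))   # X = 0.92 <= 1.0, so the activation level is 0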
Neurons
Artificial Neurons
• A neural network consists of a set of neurons that are
connected together.
• The connections between neurons have weights
associated with them, and each neuron passes its output
on to the inputs of the neurons to which it is connected.
This output depends on the application of the activation
function to the inputs it receives. In this way, an input signal
to the network is processed by the entire network and an
output (or multiple outputs) produced.
• There is no central processing or control mechanism;
the entire network is involved in every piece of
computation that takes place.
Neurons
Artificial Neurons
• The way in which neurons behave over time is
particularly interesting.
• When an input is given to a neural network, the output
does not appear immediately because it takes some
finite period of time for signals to pass from one neuron
to another.
• In artificial neural networks this time is usually very short,
but in the human brain, neural connections are
surprisingly slow. It is only the enormously parallel
nature of the brain that enables it to calculate so quickly.
Neurons
Artificial Neurons
• For neural networks to learn, the weight associated with
each connection (equivalent to a synapse in the
biological brain) can be changed in response to
particular sets of inputs and events.
• Hebbian learning involves increasing the weight of a
connection between two neurons if both neurons fire at
the same time.
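As a rough sketch of Hebb's idea (the function and numbers below are illustrative, not from the slides): the weight of a connection is increased only when the two neurons it joins fire at the same time.

def hebbian_update(weight, pre_fired, post_fired, rate=0.1):
    # Strengthen the connection only when both neurons fire together.
    if pre_fired and post_fired:
        weight += rate
    return weight

w = 0.2
w = hebbian_update(w, pre_fired=True, post_fired=True)    # both fired: weight rises to about 0.3
w = hebbian_update(w, pre_fired=True, post_fired=False)   # only one fired: weight unchanged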
Perceptrons
• The perceptron, which was first proposed by Rosenblatt
(1958), is a simple neuron that is used to classify its
inputs into one of two categories.
• A perceptron uses a step function that returns +1 if the
weighted sum of the inputs, X, is greater than a
threshold, t, and -1 if X is less than or equal to t:
Y = +1 if X > t
Y = -1 if X ≤ t
Perceptrons
• The threshold can be treated as an extra weight on a
permanently active input, in which case the activation
function for a perceptron can be written as:
Y = Step (w0x0 + w1x1 + ... + wnxn)
• Note that here we have allowed i to run from 0 instead of
from 1. This means that we have introduced two new
variables: w0 and x0. We define x0 as 1, and w0 as -t.
• A single perceptron can be used to learn a
classification task, where it receives an input and
classifies it into one of two categories: 1 or 0. We can
consider these to represent true and false, in which case
the perceptron can learn to represent a Boolean
operator, such as AND or OR.
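A minimal sketch of this formulation (illustrative only), using the x0 = 1, w0 = -t convention just described and the 1/0 output convention of the worked example that follows:

def perceptron_output(inputs, weights):
    # weights[0] is w0 = -t and the input list is prefixed with x0 = 1,
    # so the comparison with the threshold t becomes a comparison with 0.
    x = sum(w * i for w, i in zip(weights, [1] + list(inputs)))
    return 1 if x > 0 else 0

# For example, weights that make the perceptron behave as logical AND (t = 1.5):
and_weights = [-1.5, 1.0, 1.0]
print(perceptron_output([1, 1], and_weights))   # 1
print(perceptron_output([1, 0], and_weights))   # 0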
Learning Process of a
Perceptron
• First, random weights are assigned to the inputs.
Typically, these weights will be chosen between -0.5 and
+0.5.
• Next, an item of training data is presented to the
perceptron, and its output classification observed. If the
output is incorrect, the weights are adjusted to try to
more closely classify this input. In other words, if the
perceptron incorrectly classifies a positive piece of
training data as negative, then the weights need to be
modified to increase the output for that set of inputs.
• This can be done by adding a positive value to the
weight of an input that had a positive input value, and a
negative value to the weight of an input that had a
negative input value. (The reverse applies when the
output needs to be decreased.)
Learning Process of a
Perceptron
• The formula for this modification, as proposed by
Rosenblatt (1960), is as follows:
wi = wi + (a x xi x e)
• where e is the error that was produced, and a is the
learning rate, where 0 < a < 1; e is defined as 0 if the
output is correct, and otherwise it is positive if the output
is too low and negative if the output is too high.
• In this way, if the output is too high, a decrease in weight
is caused for an input that received a positive value. This
rule is known as the perceptron training rule.
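A sketch of a single weight adjustment under this rule (the function name and numbers are illustrative only):

def adjust_weight(w, a, x, e):
    # Perceptron training rule: w_i <- w_i + (a x x_i x e)
    return w + a * x * e

# If the output was too low (e = +1) for an input of 1, the weight increases:
print(adjust_weight(0.3, 0.2, 1, +1))   # 0.5
# If the output was too high (e = -1) for an input of 1, the weight decreases:
print(adjust_weight(0.3, 0.2, 1, -1))   # roughly 0.1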
Learning Process of a
Perceptron
• Once this modification to the weights has taken place,
the next piece of training data is used in the same way.
• Once all the training data have been applied, the
process starts again, and it continues until the weights
settle on values for which all errors are zero.
• Each complete pass through the training data is known
as an epoch.
• Let us examine a simple example: we will see how a
perceptron can learn to represent the logical-OR function
for two inputs. We will use a threshold of zero (t = 0) and
a learning rate of 0.2.
Learning Process of a
Perceptron
• First, the weight associated with each of the two inputs is
initialized to a random value between -1 and +1:
• w1 = - 0.2
• w2 = 0.4
• Now, the first epoch is run through. The training data will
consist of the four combinations of 1’s and 0’s possible
with two inputs.
• Hence, our first piece of training data is
• x1 = 0
• x2 = 0
• and our expected output is x1 ∨ x2 = 0.
Learning Process of a
Perceptron
• We apply our formula for Y:
Y = Step ((-0.2 x 0) + (0.4 x 0))
= Step (0)
= 0
• Hence, the output Y is as expected, and the error, e, is
therefore 0. So the weights do not change.
• The same is true for the next item, x1 = 0 and x2 = 1,
which is also classified correctly.
• Now consider the case x1 = 1 and x2 = 0.
Learning Process of a
Perceptron
• We apply our formula for Y:
Y = Step ((-0.2 x 1) + (0.4 x 0))
= Step (-0.2)
= 0
• This is incorrect because 1 ∨ 0 = 1, so we should expect
Y to be 1 for this set of inputs. Hence, the weights are
adjusted.
• We will use the perceptron training rule to assign new
values to the weights:
Learning Process of a
Perceptron
• Weight adjustment formula:
wi = wi + (a x xi x e)
• Our learning rate is 0.2, and in this case, the error e is 1,
so we will assign the following value to w1:
w1 = - 0.2 + (0.2 x 1 x 1)
= - 0.2 + 0.2 = 0
• We now use the same formula to assign a new value to
w2:
w2 = 0.4 + (0.2 x 0 x 1)
= 0.4
Learning Process of a
Perceptron
• Because x2 was 0, w2 did not contribute to this error,
and so it is not adjusted.
• The final piece of training data is now used (x1 = 1 and
x2= 1):
Y = Step ((0 x 1) + (0.4 x 1))
= Step (0 + 0.4)
= Step (0.4)
=1
• This is correct, and so the weights are not adjusted.
• This is the end of the first epoch, and at this point the
method runs again and continues to repeat until all four
pieces of training data are classified correctly.
Learning Process of a
Perceptron
[See perceptron example]
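The worked example above can be reproduced with a short Python sketch (a rough illustration, not the original course example), using the same threshold t = 0, learning rate 0.2, initial weights w1 = -0.2 and w2 = 0.4, and training order (0,0), (0,1), (1,0), (1,1):

def step(x, t=0.0):
    return 1 if x > t else 0

def train_or_perceptron(a=0.2, t=0.0):
    weights = [-0.2, 0.4]                     # the initial weights from the example
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]   # x1, x2 -> x1 OR x2
    for epoch in range(1, 101):               # far more epochs than are needed
        errors = 0
        for inputs, expected in data:
            y = step(sum(w * x for w, x in zip(weights, inputs)), t)
            e = expected - y                  # 0 if correct, +1 too low, -1 too high
            if e != 0:
                errors += 1
                # Perceptron training rule: w_i <- w_i + (a x x_i x e)
                weights = [w + a * x * e for w, x in zip(weights, inputs)]
        if errors == 0:                       # every item classified correctly
            return weights, epoch
    return weights, None

print(train_or_perceptron())   # ([0.2, 0.4], 3): one correction in each of epochs 1 and 2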
Perceptrons
• A perceptron can be trained to model other logical
functions such as AND, but there are some functions
that cannot be modeled using a perceptron, such as
exclusive OR.
• The reason for this is that perceptrons can only learn to
model functions that are linearly separable.
• A function of two inputs is linearly separable if, when the
inputs are plotted on a two-dimensional graph, a single
straight line can be drawn so that the inputs belonging to
one classification fall on one side of the line and the
inputs belonging to the other classification fall on the
other side of the line.
[Figure: linearly separable functions]
Perceptrons
• The reason that a single perceptron can only model
functions that are linearly separable can be seen by
examining the following functions:
X = w1x1 + w2x2 + ... + wnxn
Y = 1 if X > t; Y = 0 if X ≤ t
• Using these functions, we are effectively dividing the
search space using a line for which X = t. Hence, in a
perceptron with two inputs, the line that divides one class
from the other is defined as follows:
w1x1 + w2x2 = t
• The perceptron functions by identifying a set of values for
wi, which generates a suitable function. In cases where no
such linear function exists, the perceptron cannot succeed.
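As a rough illustration of this point (not from the slides), a brute-force search over a coarse grid of candidate lines w1x1 + w2x2 = t finds a separating line for OR but none for exclusive OR:

def separable(truth_table):
    # Try a coarse grid of weights and thresholds; return True if some line
    # w1*x1 + w2*x2 = t puts the 1-inputs on one side and the 0-inputs on the other.
    grid = [i / 4 for i in range(-8, 9)]        # -2.0 .. 2.0 in steps of 0.25
    for w1 in grid:
        for w2 in grid:
            for t in grid:
                if all((w1 * x1 + w2 * x2 > t) == (y == 1)
                       for (x1, x2), y in truth_table.items()):
                    return True
    return False

OR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(separable(OR_TABLE))    # True: e.g. the line x1 + x2 = 0.5 separates OR
print(separable(XOR_TABLE))   # False: no single straight line separates XOR

A grid search cannot prove impossibility in general, but for exclusive OR the four constraints on w1, w2, and t can be shown algebraically to contradict each other.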
Multilayer Neural Networks
• Most real-world problems are not linearly separable, and
so although perceptrons are an interesting model for
studying the way in which artificial neurons can work,
something more powerful is needed.
• As has already been indicated, neural networks consist
of a number of neurons that are connected together,
usually arranged in layers.
• A single perceptron can be thought of as a single-layer
perceptron. Multilayer perceptrons are capable of
modeling more complex functions, including ones that
are not linearly separable, such as the exclusive-OR
function.
Multilayer Neural Networks
[Figure: a simple three-layer feed-forward network]
This is a simple three-layer feed-forward network,
in contrast with recurrent networks.
A typical feed-forward neural network
consists of an input layer, one or two
hidden layers, and an output layer, and
may have anywhere between 10 and 1000
neurons in each layer.
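A minimal sketch of a forward pass through such a network (the layer sizes, weights, and sigmoid activation are illustrative assumptions, not taken from the slides): two inputs feed a hidden layer of two neurons, which feeds a single output neuron.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each row of weights drives one neuron in the layer.
    return [sigmoid(sum(w * i for w, i in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x):
    hidden = layer(x, [[0.5, -0.3], [0.2, 0.8]], [0.1, -0.1])   # input layer -> hidden layer
    output = layer(hidden, [[1.0, -1.2]], [0.3])                # hidden layer -> output layer
    return output

print(forward([0.7, 0.9]))   # a single output value between 0 and 1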
Backpropagation
• Multilayer neural networks learn in much the same way
as single perceptrons.
• The main difference is that in a multilayer network, each
neuron has weights associated with its inputs, and so
there are a far greater number of weights to be adjusted
when an error is made with a piece of training data.
• Clearly, an important question is how to assign blame (or
credit) to the various weights. One method that is
commonly used is backpropagation.
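A minimal sketch of backpropagation (everything here is an assumption for illustration: sigmoid activations, squared error, online updates, and a 2-2-1 network learning exclusive OR; the slides do not specify these details): the error at the output is propagated back to apportion blame to every weight, which is then adjusted by gradient descent.

import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    xb = x + [1]                                               # append a bias input of 1
    h = [sigmoid(sum(w * i for w, i in zip(row, xb))) for row in w_hidden]
    y = sigmoid(sum(w * i for w, i in zip(w_out, h + [1])))
    return h, y

def train_xor(epochs=10000, rate=0.5, seed=1):
    random.seed(seed)
    # 2 inputs -> 2 hidden neurons -> 1 output; each neuron has 2 weights plus a bias.
    w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
    w_out = [random.uniform(-0.5, 0.5) for _ in range(3)]
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    for _ in range(epochs):
        for x, target in data:
            h, y = forward(x, w_hidden, w_out)
            # Error term at the output, then propagated back to the hidden layer.
            d_out = (target - y) * y * (1 - y)
            d_hid = [w_out[j] * d_out * h[j] * (1 - h[j]) for j in range(2)]
            # Adjust every weight in proportion to its share of the blame.
            w_out = [w + rate * d_out * i for w, i in zip(w_out, h + [1])]
            for j in range(2):
                w_hidden[j] = [w + rate * d_hid[j] * i
                               for w, i in zip(w_hidden[j], x + [1])]
    return w_hidden, w_out

w_hidden, w_out = train_xor()
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(forward(x, w_hidden, w_out)[1], 2))
# Typically approaches 0, 1, 1, 0; some random initializations may need more epochs.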
Recurrent Networks
• The neural networks we have been studying so far are
feed-forward networks.
• A feed-forward network is acyclic, in the sense that there
are no cycles in the network, because data passes from
the inputs to the outputs, and not vice versa.
• Once a feed-forward network has been trained, its state
is fixed and does not alter as new input data is presented
to it. In other words, it does not have memory.
• A recurrent network can have connections that go
backward from output nodes to input nodes and, in fact,
can have arbitrary connections between any nodes. In
this way, a recurrent network’s internal state can alter as
sets of input data are presented to it, and it can be said
to have a memory.
Recurrent Networks
• This is particularly useful in solving problems where the
solution depends not just on the current inputs, but on all
previous inputs.
• For example, recurrent networks could be used to
predict the stock market price of a particular stock,
based on all previous values, or they could be used to
predict what the weather will be like tomorrow, based on
what the weather has been.
• When learning, the recurrent network feeds its inputs
through the network, including feeding data back from
outputs to inputs, and repeats this process until the
values of the outputs do not change. At this point, the
network is said to be in a state of equilibrium or
stability.
Recurrent Networks
• Recurrent networks are also known as attractor
networks because they are attracted to certain output
values.
• The stable values of the network, which are also known
as fundamental memories, are the output values used
as the response to the inputs the network received.
• A recurrent network can be considered to be a memory,
which is able to learn a set of states—those that act as
attractors for it.
• Once such a network has been trained, for any given
input it will output the attractor that is closest to that
input.
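A minimal sketch of one such network, a Hopfield-style recurrent network with +1/-1 units (the update order, the stored patterns, and other details are illustrative assumptions, not from the slides): it stores two fundamental memories with a Hebbian rule and, given a corrupted input, settles to the nearest attractor.

def train_hopfield(memories):
    # Hebbian weight matrix: connections between units that agree across the
    # stored memories are strengthened; no unit connects to itself.
    n = len(memories[0])
    w = [[0.0] * n for _ in range(n)]
    for m in memories:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += m[i] * m[j] / len(memories)
    return w

def recall(w, state, max_sweeps=20):
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):                          # update the units one at a time
            total = sum(w[i][j] * state[j] for j in range(n))
            new = 1 if total >= 0 else -1
            if new != state[i]:
                state[i], changed = new, True
        if not changed:                             # equilibrium: an attractor has been reached
            break
    return state

memories = [[1, 1, 1, -1, -1, -1], [-1, -1, -1, 1, 1, 1]]
w = train_hopfield(memories)
noisy = [1, -1, 1, -1, -1, -1]                      # the first memory with one unit flipped
print(recall(w, noisy))                             # [1, 1, 1, -1, -1, -1]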