5. Operant Conditioning V2


Reward and Punishment






Cats escape from the box to get a treat
At first it's all trial and error
When successful, the behaviour is rewarded
This good consequence strengthens the behaviour
Law of effect – a behaviour with a good consequence is more likely to be repeated; one with a bad consequence is not
Instrumental learning – the cat is active in achieving its own escape and reward

A learning process by which the likelihood of a
particular behaviour occurring is determined by
the consequences of that behaviour

Theory of Operant Conditioning – behaviour
operates on the environment and our behaviour
is instrumental in producing the consequences
(rewards/punishments)

US psychologist Burrhus Frederic Skinner (1904
– 1990) referred to the responses observed in
trial and error learning as operants.

Skinner believed behaviour can be reduced to the
relationships between the behaviour, its
antecedents (the events that precede it) and its
consequences.

Operant – a response (or set of responses) that
occurs and acts (“operates”) on the environment
to produce some kind of effect. It is a response
or behaviour that generates consequences.

Operant Conditioning is based on
Thorndike’s law of effect: an organism
will tend to repeat behaviours (operants)
that have desirable consequences (e.g.
receiving a treat) or that enable it to
avoid undesirable consequences (e.g. a
detention). Organisms will tend not to repeat
a behaviour that has undesirable
consequences (e.g. disapproval or a fine).

3 components:
 1. Stimulus (S) that precedes an operant response
 2. Operant response (R) to the stimulus
 3. Consequence (C) to the operant response
S → R → C

Sometimes expressed as:

S → R → S
where the second S is a stimulus in the form of a
consequence.
The model means the probability of an operant
response (R) to a stimulus (S) is a function of (depends
on) the consequence (C) that has followed (R) in the
past.
e.g. for the cat in the puzzle box, S is the box, R is the
sequence of movements needed to open the door and C is
escape and food.
See further examples in Table 10.2 (page 479)
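
To make the S → R → C model concrete, here is a minimal simulation sketch (not from the text): the probability of the operant response R to the stimulus S is nudged up or down by the consequence C that followed R on earlier trials. The update rule, the starting probability and the step sizes are illustrative assumptions only.

```python
import random

def trial(prob: float, reinforced: bool) -> float:
    """One presentation of the stimulus (S): the operant response (R) is emitted
    with probability `prob`, and the consequence (C) then adjusts that probability.
    The 0.10 / 0.05 step sizes are arbitrary illustrative values."""
    responded = random.random() < prob
    if responded:
        if reinforced:
            prob = min(1.0, prob + 0.10)   # satisfying C strengthens R (law of effect)
        else:
            prob = max(0.0, prob - 0.05)   # no reward weakens R
    return prob

# Cat in the puzzle box: every successful escape is followed by food,
# so the probability of the door-opening response climbs over trials.
response_prob = 0.1
for _ in range(50):
    response_prob = trial(response_prob, reinforced=True)

print(f"Probability of the operant response after 50 reinforced trials: {response_prob:.2f}")
```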

Skinner used the term “operant conditioning”
rather than “instrumental learning” as he
wanted to emphasise that animals and
people learn to operate on the environment
to produce desired or satisfying
consequences.

He proposed that in Thorndike’s experiments
the cat “operated” on the environment to
allow it to escape and get the fish reward.

The operant that became conditioned was the behaviour
of pushing the lever to open the door.

Skinner also contrasted operants with respondents in
classical conditioning. Respondents are behaviours
produced by known or recognised stimuli.
e.g. Pavlov’s dogs responded by salivating to meat powder
and later the bell. Thorndike’s cats made many different
responses that were not prompted by a particular
stimulus. The dog receives a consequence (food) whether
or not it has learned the conditioned response.
This is why Skinner referred to classical conditioning as
“respondent conditioning”.

In operant conditioning the consequence only
occurs if the organism performs the
response.

SUMMING UP:
In operant conditioning, if the response is not made,
the consequence does not happen. In classical
conditioning, the consequence occurs regardless of
whether the response is made.

Skinner believed that ALL behaviour could be
explained by the relationships between the
behaviour, its antecedents (events occurring
before it) and its consequences.

Skinner argued that any behaviour that is
followed by a consequence will change in
strength (become more, or less, established)
and frequency (occur more, or less often)
depending on the nature of that consequence
(reward or punishment).

The Skinner Box is a small chamber in which
an experimental animal learns to make a
particular response for which the
consequences can be controlled by the
researcher.
It contains a lever that delivers food (or
water) into a dish when pressed.
Some boxes also have lights and buzzers,
some have grid floors that can deliver a mild
electric shock.

The lever is usually wired to a cumulative
recorder (chart paper with a pen that makes a
special mark each time a desired response is
made).

The recorder indicates the frequency of
responses (how often) and the rate of
responding (speed).


Rats – press a lever
Pigeons – peck a disc.

Skinner referred to different types of rewards as
“reinforcers”.

He used the Skinner Box to reward the animals
according to different types of programs or
schedules of reinforcement.

The fact the rats were hungry provided the
motivation for their frantic activity, increasing
the probability the lever would eventually be
pressed and the food reward dispensed.

Skinner believed there was no need to search
for internal agents (factors within an
organism) to explain changes in behaviour.

He based his view on the notion that
behaviour can be understood in terms of
environmental or external influences, without
any consideration of internal mental
processes.

Any stimulus (event or action) that
subsequently strengthens or increases the
likelihood of the response (behaviour) that it
follows.

The reinforcer comes after the response
(behaviour)

Reinforcement makes things stronger

Reinforcement can involve receiving a
pleasant stimulus (e.g. Treat for your dog) or
avoiding or escaping an unpleasant stimulus
(e.g. Umbrella on a rainy day).

An essential feature of reinforcement is that
it is only used after the desired or correct
response is made.

A reinforcer is any stimulus (object or event) that strengthens or
increases the frequency or likelihood of a response that it follows.

The word reinforcer is often used interchangeably with the word
reward (although they are not technically the same).
One difference is that a reward suggests an outcome that is positive,
such as satisfaction or pleasure.
A stimulus is a reinforcer if it strengthens the preceding behaviour.
Also, a stimulus can be rewarding because it’s pleasurable, but is not a
reinforcer unless it increases the frequency of a response or the likelihood
of a response occurring.
e.g. Eating chocolate is pleasurable but is not a reinforcer unless it
promotes or strengthens a particular response.



Positive Reinforcer
PLUS something GOOD
A stimulus which strengthens a response by
providing a pleasant or satisfying
consequence




Skinner’s experiment = food pellets
Money
Grades
Applause







Negative Reinforcer
MINUS something BAD
A stimulus that strengthens a response by the
reduction, removal or prevention of an
unpleasant stimulus
The behaviour that removes, reduces or prevents
an unpleasant stimulus is strengthened by the
consequence
Skinner’s experiment = turning off the electric shock
Taking Panadol for a headache
Driving slowly to avoid a fine

Positive reinforcement: add something good

Negative reinforcement: take away something bad

Both STRENGTHEN a response
The overall outcome is desirable to the organism;
it has just been achieved in different ways





Positive punishment – the delivery of an unpleasant stimulus following an
undesired response
PLUS BAD
Negative punishment – the removal of a pleasant (valued) stimulus following an
undesired response
MINUS GOOD

Punisher – an unpleasant stimulus that, when paired with a response,
weakens the response or decreases the rate of responding over time

Punishers reduce unwanted behaviour

It is usually more effective to reinforce alternative desirable behaviour
than it is to punish undesirable behaviour




MINUS GOOD
Negative punishment is often referred to as
"response cost"
A valued stimulus is removed
e.g. If you drink and drive, your licence will be
taken away

The way reinforcement is delivered is referred
to as the “schedule of reinforcement”.

It is a program for giving reinforcement,
specifically the frequency and manner in
which a desired response is reinforced.

The schedule influences the speed of learning
and the strength of the learned response.

Continuous reinforcement is necessary for a
response to become learned

Partial reinforcement can be more effective
at maintaining a response
Fixed Ratio
 Fixed number of correct responses
 Being paid $5 for every 100 newspapers delivered
Variable Ratio
 Variable number of correct responses
 Poker machines
Fixed Interval
 Fixed time period
 Teachers at Gleneagles get paid every fortnight
Variable Interval
 Variable time period
 Fishing
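
The four partial schedules above differ only in the rule that decides when a response is reinforced: ratio schedules count responses, interval schedules count elapsed time. The sketch below is a rough illustration of those rules; the function names, ratio sizes and waiting times are assumptions chosen to echo the examples, not anything from the text.

```python
import random

def fixed_ratio(responses_since_reward: int, n: int = 100) -> bool:
    """Reinforce every n-th correct response (e.g. $5 per 100 newspapers delivered)."""
    return responses_since_reward >= n

def variable_ratio(mean_n: int = 100) -> bool:
    """Reinforce after an unpredictable number of responses that averages mean_n
    (each response has a 1/mean_n chance, e.g. a poker machine)."""
    return random.random() < 1.0 / mean_n

def fixed_interval(days_since_reward: float, period_days: float = 14.0) -> bool:
    """Reinforce the first response made after a fixed period (e.g. fortnightly pay)."""
    return days_since_reward >= period_days

def variable_interval(minutes_since_reward: float, mean_wait: float = 30.0) -> bool:
    """Rough approximation of a variable interval: compare the wait so far against a
    freshly drawn random interval averaging mean_wait (e.g. waiting for a fish to bite)."""
    return minutes_since_reward >= random.expovariate(1.0 / mean_wait)
```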



The variable ratio schedule is the most
resistant to extinction
It leads to the fastest rate of responding
Gambling addiction is explicable through
variable ratio reinforcement

Order of presentation – reinforcement needs to
occur after the desired response, not before, so that
the organism associates the reinforcement with the
behaviour

Timing – Reinforcers need to occur as close in time to
the desired response as possible. Most effective
reinforcement occurs immediately after the desired
response

Appropriateness of the reinforcer – For a stimulus to
be a reinforcer it must provide a pleasing or satisfying
consequence for its recipient.

Reinforcers that work in one situation will not always
work in another.

The characteristics of the individual involved and the
particular situation need to be taken into account
when deciding on the best kind of reinforcer to be
used.

An inappropriate punisher can have the opposite
effect and produce the same consequence as a
reinforcer (e.g. a verbal reprimand from a
teacher to an attention-seeking, talkative Year 8
student can act as a reinforcer for the talkative
behaviour)

Punishment may temporarily decrease the
occurrence of unwanted responses or behaviour,
but it doesn’t promote more desirable or
appropriate behaviour in its place.

So, instead Skinner advocated for the greater
use of positive reinforcement to strengthen
desirable behaviours or to promote the learning
of alternative behaviours to punishable
behaviours.

Same key processes as in classical
conditioning:
 ACQUISITION
 EXTINCTION
 SPONTANEOUS RECOVERY
 STIMULUS GENERALISATION
 STIMULUS DISCRIMINATION

Acquisition refers to the overall learning process
during which a specific response, or pattern of
responses, is established.

THE MEANS by which this is acquired is different
between operant and classical conditioning.

TYPES OF BEHAVIOURS acquired through
operant conditioning are usually more complex
than the reflexive involuntary responses in
classical conditioning.

Acquisition in operant conditioning is the
establishment of a response through reinforcement.

Speed of establishment of response depends on the
schedule of reinforcement.

Sometimes, a behaviour to be acquired is too complex
to be performed completely at the end of the
acquisition process, so a simpler version of the
behaviour or a step towards the target behaviour is
attempted and reinforced continuously until it is
established. This involves a procedure called shaping.

Extinction – the gradual decrease in the
strength or rate of responding after a period of
non-reinforcement. Extinction occurs after the
termination of reinforcement.

Extinction has occurred when a conditioned
response is no longer present.

Depending on whether partial or continuous
reinforcement has been used, the response rate
may actually increase in the initial phase of
extinction after reinforcement is stopped.

There is often reluctance to stop the response
altogether as it has had satisfying
consequences.

Frustration and anger may also accompany
the increased response rate.

Extinction is less likely to occur when partial
reinforcement is used. Uncertainty leads to a
greater tendency for the response to continue.

Spontaneous recovery – the response is (after a
rest period) again shown in the absence of
reinforcement.

Response is likely to be weaker and will probably
not last very long.

A spontaneously recovered response is often
stronger when it occurs after a lengthy period
following extinction of the response than when
it occurs relatively soon after extinction.

Stimulus generalisation - occurs
when the correct response is made
to another stimulus which is similar
to the stimulus for which
reinforcement is obtained.

The response usually occurs at a
reduced level (frequency and
strength), e.g. pigeons pecked other
coloured lights

Stimulus discrimination – the
organism makes the response to a
stimulus for which reinforcement is
obtained but not to any other
similar stimulus (e.g. sniffer dogs
used by drug detection units)

Shaping – a strategy in
which a reinforcer is
given for any response
that successively
approximates and
ultimately leads to the
final desired response

Used to train behaviours
that are unlikely to occur
spontaneously

Also known as the method
of successive
approximations.

Used when the desired
response has a low
probability of occurring
naturally.
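
A minimal sketch of shaping as a procedure (the list of approximations, the trial count and the print-outs are invented for illustration): each successive approximation is reinforced until it is established, and the criterion then moves one step closer to the final desired response (lever pressing).

```python
# Successive approximations toward the final desired response.
# The steps and the "3 trials to establish" criterion are illustrative assumptions.
approximations = [
    "orients toward the lever",
    "moves closer to the lever",
    "touches the lever",
    "presses the lever",          # the final desired response
]

def shape(trials_per_step: int = 3) -> None:
    """Reinforce each approximation until it is established, then tighten the
    criterion to the next approximation (the method of successive approximations)."""
    for step in approximations:
        for trial in range(1, trials_per_step + 1):
            # In practice the trainer waits for the animal to emit the behaviour
            # and delivers the reinforcer immediately afterwards.
            print(f"Reinforce: rat {step} (trial {trial})")
        print(f"Established: {step} -> criterion moves to the next approximation\n")

shape()
```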

Used in real life – dolphins
at SeaWorld for
entertainment purposes,
search and rescue dogs'
tracking skills, guide dogs.


Learning to write
Children learning to swim

Monkeys trained to assist
quadriplegics (Read Box
10.7 on page 499)
Behaviour modification – the
consistent use of operant
conditioning to alter
behaviour over time
 Use of tokens as
rewards that can be
‘cashed in’ for bigger
rewards later
 Schools
 Prisons


Token Economies are a
form of behaviour
modification using
reinforcement tokens to
influence behaviour
change.

E.g. Prisons – tokens
cashed in for rewards
such as cigarettes or
privileges.


A token economy is a
setting in which an
individual receives
tokens (reinforcers) for
desired behaviour and
these tokens can then be
collected and exchanged
for other reinforcers in
the form of actual or
“real” rewards.
E.g. Prisons, schools

Tokens may be
withdrawn as “penalties”
for undesirable
behaviour.

Advantage of tokens:
can be used in large
group situations where
real rewards are difficult
to administer
immediately after a
desired behaviour
occurs.

Once desired behaviour
is established, tokens can
be phased out and
replaced by more
“natural” and easily
administered reinforcers
(e.g. Praise, smile).

E.g. Schools – to
increase reading by
students, improve social
skills of students with
intellectual disabilities.

Sometimes token
economies backfire or fail.

WHY?
People may feel
manipulated and refuse to
co-operate.
Or
Situations are so complex
and uncontrolled that well-
planned programs can go
wrong (e.g. not smiling
when delivering a reinforcer)


Operant conditioning
procedures may also fail
when the underlying cause
of a behaviour is not
altered.
e.g. Rewarding
cheerfulness when the
gloominess is caused by a
boring job – the solution
may lie in changing jobs.