Operant Conditioning - Grand Haven Area Public Schools


Operant Conditioning

Comparing Classical and Operant Conditioning

 

Both classical and operant conditioning use acquisition, extinction, spontaneous recovery, generalization, and discrimination.

Classical conditioning uses reflexive behavior – behavior that occurs as an automatic response to some stimulus.

  Ask: Is the behavior something the animal does NOT control? YES. Does the animal have a choice in how to behave? NO. Classical conditioning.

Operant conditioning uses operant or voluntary behavior – voluntary behavior that is shaped by consequences.

  Ask: Is the behavior something the animal can control? YES. Does the animal have a choice in how to behave? YES. Operant conditioning.

What is Operant Conditioning?

Operant Conditioning • A type of learning in which the frequency of a behavior depends on the consequence that follows that behavior • The frequency will increase if the consequence is reinforcing to the subject. • The frequency will decrease if the consequence is not reinforcing or is punishing to the subject.

The Law of Effect

Edward L. Thorndike (1874–1949) • Author of the law of effect • Behaviors with favorable consequences will occur more frequently. • Behaviors with unfavorable consequences will occur less frequently. • Created puzzle boxes for research on cats

Thorndike’s Puzzle Box • “Thorndike’s Puzzle Box” Video #8 from Worth’s Digital Media Archive for Psychology. (2 min)

Thorndike’s Puzzle Box

Early Operant Conditioning • E. L. Thorndike (1898) • Puzzle boxes and cats

First Trial in Box • Situation: stimuli inside of puzzle box • Responses: scratch at bars, push at ceiling, dig at floor, howl, etc., press lever

After Many Trials in Box • Situation: stimuli inside of puzzle box • Responses: scratch at bars, push at ceiling, dig at floor, howl, etc., press lever

B. F. Skinner (1904–1990) • Believed that internal factors like thoughts, emotions, and beliefs could not be used to explain behavior; instead said that new behaviors were actively emitted by the organism • Looked at “operants,” or active behaviors that operate on the environment to generate consequences • Developed the fundamental principles and techniques of operant conditioning and devised ways to apply them in the real world • Designed the Skinner box, or operant chamber

The Skinner Box

Skinner’s Air Crib: A room fit for a…Baby!


B. F. Skinner’s Operant Conditioning • Did not like Thorndike’s term “satisfying state of affairs” • Interested in the behaviors produced • Operant: voluntary response that acts on the environment to produce consequences

Reinforcement/Punishment • Reinforcement - Any consequence that increases the likelihood of the behavior it follows – Reinforcement is ALWAYS GOOD (for the subject): it always strengthens the behavior. • Punishment - Any consequence that decreases the likelihood of the behavior it follows • The subject determines if a consequence is reinforcing or punishing

Types of Reinforcement

Principles of Reinforcement

Stimulus is presented or added to animal’s environment… • Reinforcing/desirable stimulus: Positive (+) Reinforcement – add something you DO LIKE; behavior increases • Aversive/undesirable stimulus: Positive (+) Punishment – add something you DO NOT LIKE; behavior decreases

Stimulus is removed or taken away from animal’s environment… • Reinforcing/desirable stimulus: Negative (-) Punishment – takes away something you DO LIKE; behavior decreases • Aversive/undesirable stimulus: Negative (-) Reinforcement – takes away something you DO NOT LIKE; behavior increases
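The four-cell grid above reduces to two yes/no questions: was a stimulus added or removed, and does the subject like it? A minimal Python sketch of that lookup follows. The function name `classify_consequence` and its parameter names are illustrative only and do not come from the slides.

```python
def classify_consequence(stimulus_added: bool, subject_likes_it: bool) -> tuple[str, str]:
    """Return (procedure name, effect on behavior) for one cell of the grid."""
    # "Positive" means a stimulus is added; "negative" means one is taken away.
    sign = "Positive" if stimulus_added else "Negative"
    # Adding a liked stimulus or removing a disliked one strengthens behavior.
    reinforcing = stimulus_added == subject_likes_it
    kind = "Reinforcement" if reinforcing else "Punishment"
    effect = "Behavior increases" if reinforcing else "Behavior decreases"
    return f"{sign} {kind}", effect


# Walk through all four cells of the grid.
for added in (True, False):
    for liked in (True, False):
        print(added, liked, classify_consequence(added, liked))
```

Note that the sign (positive/negative) only says whether something was added or removed; whether the behavior goes up or down depends on the combination of both answers.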

Positive Reinforcement • Strengthens a response by presenting a stimulus that you like after a response • Anything that increases the likelihood of a behavior by following it with a desirable event or state • The subject receives something they want (added) • Will strengthen the behavior

Positive Reinforcement

Negative Reinforcement • Strengthens a response by reducing or removing an aversive (disliked) stimulus • Anything that increases the likelihood of a behavior by following it with the removal of an undesirable event or state • Something the subject doesn’t like is removed (subtracted) • Will strengthen the behavior • Neg. Rein. allows you to either: – Escape something you don’t like that is already present (Neg. Rein. by Escape) – Avoid something before it occurs (Neg. Rein. by Avoidance)

Negative Reinforcement

Positive/Negative Reinforcement • BOTH ARE GOOD THINGS for the subject: both strengthen the behavior!

Operant Conditioning • Play “Operant Conditioning” (3:13) Segment #11 from Psychology: The Human Experience.

Billy Throws a Tantrum • Billy throws a tantrum, and his parents give in for the sake of peace and quiet. • How is this an example of positive reinforcement? • The child’s tantrum is reinforced when the parents give in (pos. reinforcement). • How is this an example of negative reinforcement? • The parents’ behavior of giving in is reinforced when Billy stops screaming (neg. reinforcement).

Primary Versus Secondary Reinforcement

Primary Reinforcement • Something that is naturally reinforcing • Examples: food, warmth, water, etc.

• The item is reinforcing in and of itself

Conditioned/Secondary Reinforcement • Something that a person has learned to value or finds rewarding because it is paired or associated with a primary reinforcer • Money is a good example. • So are grades and signs of respect & approval.

Immediate Versus Delayed Reinforcement

Immediate Reinforcers • Immediate reinforcers – the behavior that immediately precedes the reinforcer becomes more likely to occur – (This is true when training animals. You can’t wait a long time before reinforcing, or the animal won’t know what behavior you are reinforcing.)

Delayed Reinforcers • Also called Delayed Gratification – forgoing a small immediate reinforcement for a greater reinforcement later. • Humans do this with paychecks, grades. • When do we not do this? • Stay up late to watch TV when next day we’re tired • Smoke for satisfaction now when later it will kill us

Immediate/Delayed Reinforcement • Immediate reinforcement is more effective than delayed reinforcement • Ability to delay gratification predicts higher achievement

Punishment: The Process of Punishment

Types of Punishment • An undesirable event following a behavior • Behavior ends a desirable event or state • Its effect is opposite that of reinforcement – it decreases the frequency of behavior

Positive Punishment (Punishment by Application) • Something is added to the environment you do NOT like. • A verbal reprimand or something painful like a spanking (See examples on pg. 211)

Negative Punishment (Punishment by Removal) • Something is taken away that you DO LIKE. • Lose a privilege. (See examples on pg. 212)

The Good Effects of Punishment • Punishment can effectively control certain behaviors if… – It comes immediately after the undesired behavior – It is consistent and not occasional • Especially useful if teaching a child not to do a dangerous behavior • Most still suggest reinforcing an incompatible behavior rather than using punishment

Bad Effects of Punishment • Does not teach or promote alternative, acceptable behavior.

• Only tells what NOT to do while reinforcement tells what to do.

• Doesn’t prevent the undesirable behavior when away from the punisher in a “safe setting” • Can lead to fear of the punisher, anxiety, and lower self-esteem • Children who are punished physically may learn to use aggression as a means to solve problems.

How are punishment & reinforcement being used to treat severely autistic and/or violent children? See CNN video clip from Anderson Cooper 360.

Do you think they should be using these conditioning methods on these kids?

Discriminative Stimuli

• An environmental stimulus in the presence of which a particular response is more likely to be reinforced, and in the absence of which that response is less likely to be reinforced. • Example: A ringing phone is a discriminative stimulus that sets the occasion for the response of picking it up and speaking into it

Extinction • In operant conditioning, the loss of a conditioned behavior when consequences no longer follow it.

• The subject no longer responds since the reinforcement or punishment has stopped.

Thoughts from Skinner: • Skinner believed from the moment of birth, the environment shapes and determines your behavior through reinforcing or punishing consequences.

• “A person does not act upon the world, the world acts upon him.” • (Read Critical Thinking Box on pg. 214-215 for more) • “B.F. Skinner Interview” (4 min) – Video #9 from Worth’s Digital Media Archive for Psychology.

Parts of Operant Conditioning (See Chart on page 215)

Discriminative Stimulus • Specific environmental stimulus • Examples: gas gauge on empty; wallet on sidewalk

Operant Response • Voluntary behavior • Examples: fill car with gas; give wallet to security

Consequence • Event that will make the operant response more or less likely to reoccur • Examples: avoid running out of gas; get $50 reward

Effect on Future Behavior • If reinforcement = more likely to reoccur • If punishment = less likely to reoccur

Some Reinforcement Procedures: Shaping

Shaping • Reinforcement of behaviors that are more and more similar to the one you want to occur • Technique used to establish a new behavior

Shaping Principles • Skinner box - soundproof box with a bar that an animal presses or pecks to release a food or water reward, and a device that records these responses.

• Shaping - procedure in which rewards, such as food, gradually guide an animal’s behavior toward a desired behavior.

• Successive approximations - shaping method in which you reward responses that are ever closer to the final desired behavior and ignore all other responses.

• Shaping nonverbal animals can show what they perceive. Train an animal to discriminate between classes of events or objects. – After being trained to discriminate between flowers, people, cars, and chairs, a pigeon can usually identify in which of these categories a new pictured object belongs
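The successive-approximation rule described above (reward responses that are ever closer to the target, ignore the rest) can be sketched as a short loop. This is a toy illustration under stated assumptions, not a model from the slides: the function name `shape`, the idea of a numeric criterion that rises by a fixed `step` after each reward, and the sample response values are all hypothetical.

```python
def shape(responses, start, step, target):
    """Reward responses that meet a criterion that creeps toward `target`.

    Returns one (response, criterion, rewarded) tuple per trial.
    """
    criterion = start
    trials = []
    for response in responses:
        rewarded = response >= criterion  # all other responses are ignored
        trials.append((response, criterion, rewarded))
        if rewarded:
            # Demand a slightly closer approximation next time,
            # but never more than the final desired behavior.
            criterion = min(criterion + step, target)
    return trials


# Hypothetical response sizes (arbitrary units) on ten successive trials.
observed = [1, 2, 2, 4, 3, 5, 7, 6, 9, 10]
for response, criterion, rewarded in shape(observed, start=2, step=2, target=8):
    print(response, criterion, "reward" if rewarded else "ignore")
```

A response that earned a reward early on (such as 2) is ignored once the criterion has moved past it, which is what drives the behavior toward the target.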

Skinner attached some horizontal stripes to the wall which he then used to gauge the dog's responses of lifting its head higher and higher. Then, he simply set about shaping a jumping response by flashing the strobe (and simultaneously taking a picture), followed by giving a meat treat, each time the dog satisfied the criterion for reinforcement. The result of this process is shown below, as it was in LOOK magazine, in terms of the pictures taken at different points in the shaping process. Within 20 minutes, Skinner had Agnes "running up the wall."

For the second shaping demonstration, Skinner trained Agnes to press the pedal and pop the top on the wastebasket. Again, the photographer's flash served as the conditioned reinforcer, and each step in the process was photographed. The results are shown below.

Schedules of Reinforcement

Continuous Reinforcement • A schedule of reinforcement in which a reward follows every correct response • Learning occurs rapidly • But the behavior will extinguish quickly once the reinforcement stops. – Once that reliable candy machine eats your money twice in a row, you stop putting money into it.

Partial Reinforcement • A schedule of reinforcement in which a reward follows only some correct responses • Learning of behavior will take longer • But will be more resistant to extinction • Includes the following types: – Fixed-interval and variable-interval – Fixed-ratio and variable-ratio

Fixed-Ratio Schedule • A partial reinforcement schedule that rewards a response only after some defined number of correct responses • The faster the subject responds, the more reinforcements they will receive.

• e.g., piecework: You get $5 for every 10 widgets you make.

Variable-Ratio Schedule • A partial reinforcement schedule that rewards a response after an unpredictable number of correct responses (the number varies around an average) • High rates of responding with little pause in order to increase chances of getting reinforcement • This schedule is very resistant to extinction.

• Sometimes called the “gambler’s schedule”; similar to a slot machine or fishing

Fixed-Interval Schedule • A partial reinforcement schedule that rewards only the first correct response after some defined period of time • Produces slow responding at first, which increases as the time of reinforcement gets closer • Examples: a known weekly quiz in a class; checking cookies after the 10-minute baking period.

Variable-Interval Schedule • A partial reinforcement schedule that rewards the first correct response after an unpredictable amount of time • Produces slow and steady responses • Example: “pop” quiz in a class

Ask Yourself… • Can the animal speed up its reinforcement by doing the behavior more often? If YES - Ratio – Does the number of times the animal must do the behavior for reinforcement vary? Variable – Does the animal do the behavior a set number of times for reinforcement? Fixed • Does the example deal with the amount of time that elapses before the behavior gets reinforced? If YES - Interval – Reinforcement will NOT be sped up by doing the behavior more often – Does the amount of time between the behavior and reinforcement vary? Variable – Does the amount of time between the behavior and reinforcement stay the same? Fixed
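The two questions above work as a small decision procedure, which might be written out as follows. This is a sketch only: the function name `classify_schedule` and its parameter names are illustrative, and the yes/no answers assigned to each example are one reading of the examples given on the schedule slides.

```python
def classify_schedule(more_responses_speed_reward: bool, requirement_varies: bool) -> str:
    """Name the partial reinforcement schedule from the two 'Ask Yourself' answers."""
    # Question 1: can the subject speed up reinforcement by responding more?
    basis = "ratio" if more_responses_speed_reward else "interval"
    # Question 2: does the required count (or waiting time) change from reward to reward?
    pattern = "variable" if requirement_varies else "fixed"
    return f"{pattern}-{basis}"


# Examples taken from the schedule slides, with assumed answers to the two questions.
examples = {
    "piecework ($5 per 10 widgets)": (True, False),
    "slot machine": (True, True),
    "known weekly quiz": (False, False),
    "pop quiz": (False, True),
}
for name, answers in examples.items():
    print(f"{name}: {classify_schedule(*answers)}")
```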


Operant Conditioning

Class Activity • 4 Volunteers are needed to demonstrate schedules of reinforcement • No punishment will be used.

• You will remain dry for the entire activity.

Variable Ratio

• 1:1/ 7:1 / 4:1 / 12:1 / 8:1 / 19:1 / 3:1 / 2:1 / 2:1 / 5:1 / 16:1 / 11:1 / 3:1 / 8:1 / 4:1

Fixed Ratio

• 7:1 / 7:1 / 7:1 / 7:1 / 7:1,…. 15 times

Fixed Interval

• 10 sec:1 / 10 sec:1 / 10 sec:1 / ,… 15 times

Variable Interval

• 6 sec:1 / 8 sec:1 / 10 sec:1 / 3 sec:1 / 7 sec:1 / 14 sec:1 / 15 sec:1 / 8 sec:1 / 5 sec:1 / 12 sec:1 / 6 sec:1 / 9 sec:1 / 13 sec:1/15 sec:1 / 8 sec:1
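A quick arithmetic check on the four activity lists above, written as a short Python sketch (the variable and function names are illustrative). It shows that the variable-ratio list requires the same total number of responses as the fixed-ratio list, so both average 7 responses per reinforcer, while the variable-interval list averages a little under the fixed 10 seconds.

```python
# Requirements per reinforcer, copied from the four activity lists.
variable_ratio = [1, 7, 4, 12, 8, 19, 3, 2, 2, 5, 16, 11, 3, 8, 4]  # responses
fixed_ratio = [7] * 15                                              # responses
fixed_interval_sec = [10] * 15                                      # seconds
variable_interval_sec = [6, 8, 10, 3, 7, 14, 15, 8, 5, 12, 6, 9, 13, 15, 8]  # seconds


def summarize(requirements):
    """Return (total, mean) of the responses or seconds required per reinforcer."""
    total = sum(requirements)
    return total, total / len(requirements)


for name, schedule in [
    ("Variable ratio", variable_ratio),
    ("Fixed ratio", fixed_ratio),
    ("Fixed interval", fixed_interval_sec),
    ("Variable interval", variable_interval_sec),
]:
    total, mean = summarize(schedule)
    print(f"{name}: {len(schedule)} reinforcers, total {total}, mean {mean:.2f}")
```

Because the ratio schedules are matched on the average requirement, any difference in how the volunteers respond comes from predictability rather than from the total amount of work required.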

New Understandings of Operant Conditioning: The Role of Cognition

Skinner & Thorndike • Believed that cognitions (thoughts), perceptions and expectations have no place in psychology.

• This is because they cannot be studied through observation and therefore were seen as not being objective.

Cognitive Aspects of Operant Conditioning • Latent learning—learning that occurs in the absence of reinforcement, but is not demonstrated until a reinforcer is available • Cognitive map—term for a mental representation of the layout of a familiar environment • Learned helplessness—phenomenon where exposure to inescapable and uncontrollable aversive events produces passive behavior

Latent Learning • Learning that takes place in absence of an apparent reward • Idea developed by E.C. Tolman

E.C. Tolman’s Rat Maze Experiment • Three groups of rats were trained to run a maze. • The control group, Group 1, was fed upon reaching the goal. • The first experimental group, Group 2, was not rewarded for the first six days of training, but found food in the goal on day seven and every day thereafter. • The second experimental group, Group 3, was not rewarded for the first two days, but found food in the goal on day three and every day thereafter.

Tolman’s Rat Maze Experiment (continued) • Both of the experimental groups made fewer errors when running the maze the day after the transition from no-reward to reward conditions. The improved performance continued throughout the rest of the experiment. • This suggested that the rats had learned during the initial no-reward trials and were able to use a "cognitive map" of the maze when the rewards were introduced. • The initial learning that occurred during the no-reward trials was what Tolman referred to as latent learning. • He argued that humans engage in this type of learning every day as we drive or walk the same route daily and learn the locations of various buildings and objects. Only when we need to find a building or object does the learning become obvious.

Cognitive Map • A mental representation of a place • Experiments showed rats could learn a maze without any reinforcements • See a modern-day example of Tolman’s experiment where they change the maze on the rat (2 min)

Latent Learning & Cognitive Maps • Play “Cognitive Processes in Learning” (6:25) Segment #12 from Psychology: The Human Experience.

Other evidence that we do think!

• Animals on a fixed-interval reinforcement schedule respond more frequently as the time for their reinforcer approaches, as if they expect that the response will produce the reward

Overjustification Effect • The effect of promising a reward for doing what someone already likes to do • The reward may lessen and replace the person’s original, natural motivation, so that the behavior stops if the reward is eliminated – The person may now see the reward, rather than intrinsic interest, as the motivation for performing the task. – “If I have to be bribed into doing this, then it’s not worth doing for its own sake.” • Rewards do help increase interest when used to indicate a job well done

Learned Helplessness • Dogs in an electrified cage were at first not able to escape the impending shock.

• Later, all they had to do was cross to the other side, but they didn’t even try.

• The dogs had learned they were “helpless” to avoid the shock and just sat there and took it without trying to escape.

Learned Helplessness • Exposure to inescapable and uncontrollable aversive events produces passive behavior. If an animal believes or expects it cannot escape a certain result, it will give up trying to do a behavior that could result in it escaping from the bad result. • To overcome this, one must establish a sense of control over one’s environment and see some success.

New Understandings of Operant Conditioning: The Role of Biology

Biological Predispositions • Animal training issues – easier to train behaviors that are closer to natural behaviors using a natural reinforcer (food).

• Instinctive drift: naturally occurring behaviors that interfere with operant responses. • What happens when a trained tiger shows instinctive drift?

Classical Conditioning vs. Operant Conditioning