Learning, Punishment & Reinforcement- The Effects Of Consequences In A Behaviour Paradigm

To understand the effects of antecedents (triggers/stimuli), we need to be clear about motivating operations and discriminative stimuli (SD). Behaviour is maintained by its consequences. Please see the previous article, which sets out the fundamental and dimensional properties that underlie this subject as a science: What Is behaviour (Introduction To ABA)

This article will provide working examples where possible, and explanation to help with your learning of this technology. We will be using Tango & Marmalade as a point of reference, and myself (Georgie), to help clarify the concepts further. Please note: mistakes are all my own! Email with any suggestions.

Tango & Marmalade:

positive reinforcement training in dogs

Edward Lee Thorndike (~1898) identified the law of effect, stating that organisms learn through the consequences of their actions. He spent his time quantifying the laws of weak and strong using experimentation, and this led to Skinner's operant conditioning.

Burrhus Frederic Skinner (~1938) contributed to behaviour analysis by bringing about a paradigm shift and developing the methods by which we can study behaviour. He invented various tools to do this, such as the operant chamber, the cumulative recorder and programmed instruction. He started with rats and then moved on to pigeons. He demonstrated the cumulative recording of extinction.

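A cumulative record of extinction sometimes looks like this:

[Figure: RatBehaviorExtinguishedCurve, a cumulative record of a rat's behaviour being extinguished (from rewardandconsentblogspot.com)]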
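To make the idea of a cumulative record concrete, here is a minimal sketch in Python. It is an illustration only and not Skinner's apparatus: the function name `cumulative_record` and the lever press counts are made up for the example. A cumulative record simply adds each new count of responses to the running total, so a steep line means fast responding and a flat line means no responding at all.

```python
from itertools import accumulate


def cumulative_record(responses_per_interval):
    """Return the running total of responses after each observation interval."""
    return list(accumulate(responses_per_interval))


# Hypothetical lever presses per minute once reinforcement has stopped:
# a brief burst, then responding tapers off to nothing.
presses_per_minute = [6, 9, 5, 3, 1, 0, 0]
record = cumulative_record(presses_per_minute)
print(record)  # [6, 15, 20, 23, 24, 24, 24]
```

When the total stops climbing (24, 24, 24), the curve has gone flat, which is what an extinguished behaviour looks like on this kind of graph.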

Skinner developed radical behaviourism. He considered verbal behaviour, private events, the scientist (observer) and how this work would save the world and improve people's lives. His work led to the development of applied behaviour analysis (ABA), and he developed the principles of operant conditioning.

Definition of operant (source wikipedia):

noun- an item of behaviour that is not a response to a prior stimulus but is initially spontaneous, and whose consequences may reinforce or inhibit recurrence of that behaviour.
adjective- involving the modification of behaviour by the reinforcing or inhibiting effect of its own consequences.

Operant Behaviour:

Operant behaviour has an effect on the environment and is under the control of its consequences. It is usually controlled by a combination of antecedents & consequences, but never by antecedents alone. The effects that the behaviour has on the environment are called consequences, and they result from movement and muscle contractions, usually of the skeletal frame. These consequences have an effect on the future probability of the behaviour under similar conditions, which can be seen by taking data over time and graphing it. Consequences can either accelerate or decelerate responding. That is, they either strengthen or weaken the behaviour.

Example:

This morning I was lying on my bed with my dogs sprawled out next to me whilst I was studying for my ABA exams. I started tapping my feet together, which made a slight noise. Marmalade (my dog), who had been asleep, lifted his head and started to stare at me and towards my feet. I immediately stopped tapping my feet and he went back to his sleeping position. The consequence of him looking at me and then at my tapping feet reduced (decelerated) my behaviour. His look affected the environment and stopped my foot tapping: it was a conditioned punisher and it functioned to decrease the foot tapping. This is an example of positive punishment for me and negative reinforcement for him, as his look (his response) removed an irritant from his environment, the consequence being that my behaviour decreased.

The effect of a consequence on a behaviour is to increase or decrease a dimension of the behaviour over time. Be careful with dimensions & properties in explanations; see: Behaviour Analysis A Natural Science- Fundamentals Of ABA.

Operant selection is the process in which, over repeated cycles of behavioural variability, specific responses are selected by their contingent consequences, and the selected responses are then repeated over time. This ability has evolved through natural selection. The process used to be termed instrumental conditioning or operant conditioning (Catania 2007).

Operant Contingencies:

We are looking at response-consequence, or R-S (response-stimulus), relations. Remember that consequences and antecedents are stimuli too. Skinner identified the 3 term contingency, S-R-S or S-R-C, which is stimulus, response, stimulus (or consequence). We now call this the ABC (antecedent, behaviour, consequence), which is much easier to explain to people. There are also more complex operant contingencies.

Consequences:

There are two general types of consequences and 8 basic consequences, and then there are a further two which consist of withholding the previous consequences, these being Extinction & Recovery. The latter two only make sense if there is a history of a specific consequence, which can be reinforcement or punishment, on the target behaviour.

  • Reinforcement means to STRENGTHEN
  • Extinguish means to STAMP OUT
  • Recover means to COME BACK STRONG
  • Punishment means to WEAKEN THROUGH A CONTINGENT RELATIONSHIP BETWEEN A BEHAVIOUR & A CONSEQUENCE

Sorry for shouting, but most people get this wrong. The term punishment is really rather unfortunate, but we are stuck with it and it is what we use.

Contingency & Reinforcement:

An environmental change which follows a response and acts to increase or maintain the future frequency of that behaviour is reinforcement. The environmental change has to occur immediately after the response and must be contingent on it. It usually has close contiguity with the response.

Example:

Tango gets a treat for walking through the door each time she does.

On this information alone, we cannot call this reinforcement: we do not know enough to say that the treat is contingent on the door walking behaviour, and we have no idea whether the food will increase the response over time or not. It could even weaken it!

Example that is NOT reinforcement:

I offer Marmalade a treat (I show him the treat and I want him to sit), and he only sits when I show him the treat. This is not an example of reinforcement because the treat acted as an antecedent: it was presented before the target behaviour and was responsible for evoking it. Can you think of any examples of how people do this with their animals when training, and then think that the training is not working? It is an easy mistake to make. The change in the environment, i.e. the treat, has to occur after the response. This is also why timing is very important: if the reinforcement is delayed and another response occurs in the meantime, it is that other response that is reinforced and not the earlier one.

Example:

I call Tango to come and sit in front of me. She runs over and sits. I go into my pocket to get a bit of cheese and break a piece off. She is sitting, and then she starts jumping up in the air and standing, and then she runs off, comes back and jumps. I have taken too long to give a consequence for her response and she has given me three or four other behaviours in the meantime.

If you find that the rate of responding does increase in these situations, then it is likely that other factors are coming into play, such as unidentified automatic reinforcement or verbal mediation/rule governance. Perhaps the response is a member of a higher order response class, or is an instance of adduction or schedule-induced adjunctive behaviour.

The contingency between the response and the consequence must be strong for it to strengthen responding. The response affects the environment. Terms such as non-contingent reinforcement and response-independent reinforcement are not accurate.

Example:

Tango vocalises when she is let out of the front door; she paces up and down and walks backwards and forwards. Georgie used to soothe Tango when she was making this noise, and she wondered if this was increasing the behaviour. Georgie stopped giving Tango any attention and the behaviour has not changed at all over time. This behaviour is not dependent on attention or soothing: there is no contingency between the vocalising and Georgie’s attention. Georgie’s attention did not function as reinforcement for this behaviour, as there was no increase or decrease in the behaviour over time.

Automaticity of reinforcement: This is reinforcement which works without any verbal mediation.

It is inaccurate to refer to hypothetical constructs such as expectancy, understanding and knowing.

Examples:

  • Shep knows what I would like him to do, he just chooses not to
  • Tango knows that if she vocalises, she will get my attention
  • Fido understands what I mean
  • The only reason Tigger does A is because he is expecting B

The actual behaviour event is reinforced; the organism is not. The thing that the dog did was reinforced, not the dog. Reinforcement is a process or operation and not a stimulus. The consequence leads to the resulting strengthening of the response. The process is the increase in rate of responding which is contingent on the presentation or termination of a stimulus as a consequence for a response. The operation is the arrangement of a consequence which is contingent on the response, and the result is the strengthening of the response.

Example:

  • Correct: The trainer reinforced Tango's sit behaviour by giving her some cheese and praising her.
  • Wrong: The trainer reinforced Tango by giving her cheese and praising her.

Premack Principle:

This principle is sometimes called Grandma's Law: “you cannot have dessert (high probability behaviour) until you have eaten all of your vegetables (low probability behaviour)”.

If the opportunity to engage in a high probability behaviour is contingent on you engaging in a less preferred behaviour, the future duration of the less preferred behaviour will increase.

Consequences which involve this contingent probability are likely to have a weakening effect on the high probability behaviour!!

What is a reinforcer?

A stimulus is an energy change which affects the organism through its receptors. A reinforcer is a stimulus which comes after a response and increases or maintains the future frequency of that response. This is also called a positive reinforcer.

Reinforcers can be visual, auditory, olfactory etc. They are usually tangible items, attention, activities etc.

Variable Attributes of reinforcement:

Earlier we talked about consequences and the 8 different attributes of these and the role of extinction and recovery. Here we will explain these in basic terms.

Reinforcement can be conditioned or unconditioned. If it is unconditioned, it is reinforcing without any prior learning and its effect is due to phylogenic provenance (genetics). Unconditioned reinforcers are sometimes called primary reinforcers, but the better term is unconditioned.

Examples of unconditioned reinforcer:

  • Sexual activity
  • Food
  • Water
  • Preferences for one food over another
  • Returning to normal temperature

Conditioned reinforcers are those which have no innate reinforcing properties but acquire them through pairing with unconditioned reinforcers or other powerful conditioned reinforcers. These are said to have ontogenic provenance (learned or experienced). They are also termed secondary reinforcers, but conditioned is a better term.

Examples of conditioned reinforcers:

  • Toys
  • Sounds
  • Blankets (if you are cold)
  • Facial expressions
  • Dog lead (if he likes it)

Generalised conditioned reinforcers:

These have been paired with a variety of other reinforcers and they are effective for a wide range of behaviour. They are generalised reinforcers. These are worth mentioning because they are less susceptible to the effects of deprivation and satiation than other conditioned reinforcers.

Examples:

  • Praise
  • Cheese
  • Walks

What Is Reinforcement

Positive Reinforcement- An environmental change where a stimulus is ADDED (this can be presented or magnified) following a response and which increases or maintains the future frequency of that response.

Example: Marmalade is the behaver in this example

Tango is cuddling me on the sofa. Marmalade performs a play bow with a body flip, and Tango jumps off the sofa and they engage in a game. In the future Marmalade's play bow behaviour increases.

Marmalade's play bow was positively reinforced because the consequence of this response was a game (something was added), and that was the reinforcement which increased the behaviour in the future.

Negative Reinforcement- This is an environmental change in which a stimulus is SUBTRACTED (withdrawn or attenuated) following a response and which INCREASES the future frequency of that behaviour. For this to occur there must be an irritant or aversive antecedent, the removal of which is reinforcing. It is sometimes called reinforcement by relief.

Example:

You bring your dog home from the vet with a cone over his head. Because you have not conditioned his cone wearing behaviour using appropriate conditioning, he finds it highly irritating. He also wants to scratch his stitches. He removes the cone from his head and gets relief from the aversive situation.

Escape- This is when an aversive situation is already happening when the behaviour occurs. The behaviour terminates the aversive stimulus and is maintained (increased) by negative reinforcement.

Example:

Fido sees another dog. He is frightened of other dogs. He is on the lead when he sees the other dog, so he is unable to avoid it. His owner thinks it is a really good idea to force Fido to say hello to other dogs, and as the other dog runs towards Fido he manages to back out of his collar and run away to escape this aversive stimulus. Fido’s owner scolds Fido for being naughty. Fido’s need to escape is further increased by this negative experience, and next time he decides he wants to avoid walks altogether in order to completely escape these types of scenarios.

Avoidance- A warning stimulus evokes a response which prevents or delays the onset of an upcoming unconditioned aversive stimulus. Avoidance responses are learned/conditioned. Sometimes avoidance is un-signalled: there is no clear warning stimulus, but the response still prevents or delays an aversive event, typically one that would otherwise occur at regular time intervals.

Automatic Reinforcement (positive or negative)- The response itself produces the reinforcing consequences. The consequence is not mediated by another organism.

Example:

Tango vocalises and sings. Her vocalising is maintained by the beautiful sound she makes when she sings.

Example:

If I go to the fridge and get myself a glass of milk because I am thirsty, then that is automatic reinforcement. If I ask Billy to go to the fridge and get me a glass of milk, then the reinforcement is dependent on him giving me the milk, and this is not automatic reinforcement.

Sometimes automatic reinforcement involves proprioceptive feedback. It can also involve other bodily sensations which are produced via the response. It is not the response itself which is the reinforcer; it is the sensations it produces. We have to make sure we are NOT inferring what is reinforcing the behaviour because of what it looks like. These behaviours can all be reinforced in other ways.

Example:

  • Repetitive behaviour
  • Flank sucking
  • Rocking
  • Pacing
  • Exercise high

Socially Mediated Reinforcement- The consequence is delivered (mediated) by another person or people. Socially mediated reinforcement can be inadvertent or planned.

There is socially mediated:

  • Positive reinforcement
  • Negative reinforcement

And automatic:

  • Positive reinforcement
  • Negative reinforcement

 

| | Positive Reinforcement | Negative Reinforcement |
|---|---|---|
| Socially Mediated | Attention, tangible, access to preferred activity | Escape from task, having to comply, setting |
| Automatic | All of above except proprioceptive feedback | All of above, escape from pain or discomfort |

 

Schedules Of Reinforcement- There are a variety of schedules of reinforcement which we will not go into the details of here, but it is important to note that reinforcement occurs in nature and everyday life, and that sometimes it is planned. Something that is reinforcing in one condition might not be so in another. Schedules specify the criteria for reinforcement in terms of the number of responses and when the responses occur. They can be complex or simple, naturally occurring or programmed, and they can be used to maintain or develop a behaviour. It is not likely that each response will produce reinforcement; mostly behaviour is intermittently reinforced. Under most schedules, reinforcement is delivered for about every X number of responses.
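As a rough illustration of "reinforcement for every X number of responses", here is a minimal Python sketch of the simplest case, where X is fixed (a fixed-ratio schedule). This is my own simplification: real schedules are often variable rather than fixed, and the function name `fixed_ratio` and the session numbers are made up for the example.

```python
def fixed_ratio(total_responses, ratio):
    """Return one flag per response: True where that response is reinforced.

    Under a fixed-ratio schedule every `ratio`-th response produces reinforcement.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return [n % ratio == 0 for n in range(1, total_responses + 1)]


# Hypothetical session: 10 sits, with every 3rd sit reinforced.
delivered = fixed_ratio(10, 3)
print(delivered.count(True))  # 3 (the 3rd, 6th and 9th sits)
```

With a ratio of 1 every response is reinforced (continuous reinforcement); any larger ratio gives intermittent reinforcement.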

Planned Reinforcement- Programmed reinforcement means that a person explicitly arranged the contingency.

Example:

I am going to praise my dog every time he sits nicely in front of me to get attention. I do this, and sitting nicely in front of me increases in rate.

Unplanned Reinforcement-  Contingency is not explicitly arranged. In either case there is no higher power!

Example:

You yell at your dog for jumping up. The rate of jumping up increases in the future. You did not plan to reinforce jumping up behaviour. This is inadvertent reinforcement.

Motivating Operations- The variables which affect the effectiveness of a reinforcer are termed motivating operations. These will be considered in more detail in other articles but the main areas to consider are:

  • Deprivation & satiation- e.g. if you are using food when training a dog and the dog has just eaten his supper, the food is not going to be as reinforcing as it would be if he had not eaten.
  • Species specific preparedness- some species can only be taught behaviours that are within the potentialities of their behaviour e.g. you cannot teach a dog how to fly because he does not have the structures required to fly
  • Response effort- A reinforcer may be effective at reinforcing one response but may not reinforce a different response, or a different amount of the same response. The amount of effort is a determining factor in reinforcer effectiveness for each response, e.g. giving a dog a piece of cheese contingent on him giving you 10 sits in a row at home, with no distractions and when he is hungry, versus him giving you a sit when he is outside playing with a group of friends and having fun.
  • Competing reinforcers- These may alter each other's value (this relates to concurrent schedules of reinforcement, where two or more schedules are simultaneously available). Here, different reinforcers are available for the same or competing behaviours at the same time.

Operant Extinction- This is where a previously reinforced behaviour is WEAKENED by WITHHOLDING the reinforcement. The reinforcing environmental change is no longer available: when the response is emitted, nothing happens. The frequency of the behaviour thus decreases in the future to baseline level. The behaviour has to have previously been reinforced, the reinforcement has to be withheld each time the behaviour occurs, and the behaviour has to be weakened!

Extinction- This term should only be used to describe the procedure of non-reinforcement of a behaviour that has PREVIOUSLY BEEN REINFORCED. Please be careful not to use this term where there is a decrease in a rate of responding and the procedure used is not factored in.

Attributes of extinction:

  • There is often an extinction burst (see graph above)
  • There are often variations in emotional behaviour and in the types of behaviour emitted by the organism following extinction

 

| Variable Attributes | Positive Reinforcement | Negative Reinforcement |
|---|---|---|
| Extinction- There is no environmental change, nothing happens | WITHHOLD the stimulus (following a response, when the behaviour was previously reinforced); this leads to a DECREASE | The aversive stimulus is NO LONGER withdrawn, terminated or removed. E.g. where behaviour was maintained by termination of an electric shock, responding no longer results in termination of the electric shock or an escape. Also called ESCAPE-EXTINCTION |

Note: Extinction from Automatic reinforcement = sensory extinction- HARD TO DO
Note: socially mediated reinforcement

 

Spontaneous Recovery- This is when a behaviour that has previously been extinguished suddenly and temporarily reappears. This can happen in the same circumstances in which it was reinforced.

Resurgence- Sometimes called regression. A previously extinguished behaviour  resurges during the EXTINCTION of a more recently reinforced behaviour

Punishment- An environmental change after a response which DECREASES the future frequency of that behaviour. The definition is concerned with the EFFECT on the specific behaviour and not what it looks like.  Punishers are not punishers independent of their effects on behaviour.

Unconditioned Punisher- A stimulus which is punishing without any prior learning; its effects are due to phylogenic provenance (genetics). Examples are things like electric collars, prong collars, fire, shock, or any pain inducing stimuli.

Example:

Tango climbed up on to the table to vocalise and look at Georgie. She fell off the table and hurt herself and never got up on the table again. This would probably be termed automatic punishment: the fall was a consequence which decreased the behaviour, and it was an unconditioned punisher.

Conditioned Punisher- Initially this has no innate punishing qualities. It is ontogenic in its provenance. It acquires punishing properties through pairing with unconditioned punishers or other powerful conditioned punishers, e.g. the stern look from Marmalade when I was foot tapping, a frown etc.

Positive Punishment- This is an environmental change following a response where a stimulus is ADDED (presented or magnified). This DECREASES the future frequency of the response.

Negative Punishment- Environmental change where the stimulus is SUBTRACTED or attenuated following a response and this DECREASES the future frequency of that behaviour. This will only work if the removal is from a condition which is appetitive. This is often termed punishment by penalty. 

Example:

Tango is playing a zoom game with Marmalade and she then starts humping him. When she humps him she is removed from playing. The removal from play decreases her humping play activity in frequency in the long term.
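The four basic processes described so far come down to two questions: was a stimulus added or subtracted, and did the future frequency of the behaviour go up or down? Here is a minimal Python sketch of that lookup. The function name `classify_consequence` and its string labels are made up for the example, and it simplifies "increases or maintains" to just "increases".

```python
def classify_consequence(stimulus_change, effect_on_frequency):
    """Name the operant process from the stimulus change and its effect.

    stimulus_change: "added" or "subtracted"
    effect_on_frequency: "increases" or "decreases"
    """
    sign = {"added": "positive", "subtracted": "negative"}[stimulus_change]
    process = {"increases": "reinforcement", "decreases": "punishment"}[effect_on_frequency]
    return f"{sign} {process}"


# Tango is removed from play (stimulus subtracted) and humping decreases.
print(classify_consequence("subtracted", "decreases"))  # negative punishment
```

Notice that the function never asks what the consequence looks like or what the person intended. Only the effect on the behaviour decides the label.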

Time-Out From Positive Reinforcement- This is a procedure which is based on the principle of negative punishment. The response starts a timer, and whilst this timer is running there is no access to specific reinforcers. This will not make sense unless the antecedent condition is appetitive. Please note the sequence: time in, (response), time out from positive reinforcement.

Please note: what is punishing at one time for an organism may not be punishing at other times. There are variables (operations) that affect the effectiveness of a stimulus as a punisher.

Motivating Operations- Variables that affect punishment. A punisher is more effective if there is a potential loss of a preferred item rather than of one that has no utility. Also, if you are given a £1000 fine for a traffic offence and you have NO money, this will be a worse punishment for you than if you are a multi-billionaire.

Competing reinforcement contingencies & punishment- The behaviour that is now being punished was previously reinforced, and it can still be reinforced by automatic processes or by another person. For example: a dog jumps up and sometimes gets told off, and sometimes gets attention. The function of the jumping up is to get attention, and both of these consequences are reinforcing: they are not punishing to him, because they are not decreasing the behaviour. The only one who thinks the telling off is punishing is the human who is trying to punish the animal and is getting frustrated because it is not working. It can also be the case that there is punishment and reinforcement for the same behaviour.

Recovery From Punishment- A previously punished behaviour is strengthened by withholding the punishment. A recovery of the behaviour occurs because there is no longer a punishing consequence for the response, and therefore the frequency of the behaviour increases, usually to baseline levels.

Extinction From Punishment Vs Spontaneous Recovery

  • Operant extinction is the MIRROR image of Recovery from punishment
  • Spontaneous recovery is different from recovery from punishment because in spontaneous recovery a previously extinguished behaviour reoccurs or increases only for a period of time.

| Consequence | Stimulus Withheld | EFFECT on behaviour |
|---|---|---|
| Punishment | Recovery | INCREASES |
| Reinforcement | Extinction | DECREASES |
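The table above can also be written as a small lookup, which makes the mirror image easy to see. This is a minimal Python sketch only; the names `WITHHELD` and `withhold` are made up for the example.

```python
# What happens when a previously delivered consequence is withheld.
# Each entry maps the consequence to (name of the process, effect on behaviour).
WITHHELD = {
    "reinforcement": ("extinction", "decreases"),
    "punishment": ("recovery", "increases"),
}


def withhold(previous_consequence):
    """Return the process name and its effect when the consequence is withheld."""
    return WITHHELD[previous_consequence]


print(withhold("punishment"))  # ('recovery', 'increases')
```

A behaviour with no history of reinforcement or punishment has no entry to look up, which is the same point made earlier: extinction and recovery only make sense where there is a history of a specific consequence.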

Reinforcement Punishment & Selection

There are behaviour changes as organisms adapt to their ENVIRONMENT. The behaviours that result are either a benefit for the organism-because are strengthened by consequences that are beneficial as a consequence of that behaviour. Or, the behaviour which results is a detriment and is weakened due to the detrimental consequences (punished).

The selection of behaviour is OPERANT and is important for survival. It happens over the life of the organism. The contingencies of reinforcement and punishment both select and deselect the behaviour. They are natural processes and work to teach an individual what does and does not work, through experience.

References:

Catania, A.C. (2007) Learning

Catania, A.C. (2013) Learning

Cooper, Applied Behaviour Analysis

Email: info@simplybehaviour.com