How Does Operant Conditioning Work?

Operant Conditioning

Operant Conditioning, or Instrumental Conditioning, is the method of teaching associations between behaviors and their consequences, thereby strengthening or weakening those behaviors. Strengthened behaviors are those that have a high probability of recurring, while weakened behaviors are those that have a low probability of recurring. Operant conditioning involves voluntary responses (operant behavior), whereas classical conditioning involves involuntary responses (respondent behavior).

Operant Conditioning and the Law of Effect

The basic principles behind operant conditioning originated from American psychologist E.L. Thorndike's Law of Effect, which was developed at about the same time as classical conditioning. Also known as S-R or Stimulus-Response Theory, Thorndike's law of effect states that positive consequences strengthen behaviors, while negative consequences weaken them. Thorndike demonstrated his idea using a cat in a puzzle box, where the cat had to step on a lever to free itself from the box. The cat initially needed 150 seconds to escape, but after only 5 trials it needed only 10 seconds. This means that the cat had learned the correct stimulus-response association required to complete the task of the puzzle box.

Skinnerian Operant Conditioning

It was B.F. Skinner, also an American psychologist, who expanded Thorndike's law of effect. Many of the principles of operant conditioning known and used today came from Skinner's extensive research experiments. For example, he successfully taught pigeons to guide the direction of a missile. His pigeons kept pecking at a dot on a screen to keep the missile on track, and were fed food pellets for doing so. Although Skinner's offer to have the pigeons serve during World War II was rejected by US Navy officials, the experiment showed that operant conditioning works.

Another good example is the Skinner box, in which a rat is taught to press a lever to get food. Skinner's method of conditioning went like this: pellets were first dropped into the tray at random to accustom the rat to it, then the lever was installed, and from then on the rat received a food pellet whenever it happened to press the lever. The Skinner box is named after Skinner because he subsequently improved the box to increase his control over the experimental setting. He installed devices to measure activity inside the box precisely and to avoid human error. First, he soundproofed it, then he installed a mechanical device to record the rat's responses, and lastly, he automated the food dispenser.

Skinner strongly believed that the same learning mechanism operates across species. In 1948, Skinner published Walden Two, a novel about building a scientifically managed society. In the book, Skinner argued that modern society is poorly managed because of the widespread belief in the myth of free will, that environmental forces are what actually control behavior, and that recognizing these truths can lead to a happier and more prosperous life.

Reinforcements and Punishments

Reinforcement is the process of strengthening behavior through the use of rewarding consequences. Because learning takes time in operant conditioning, a variant of reinforcement called shaping rewards successive approximations of the desired behavior instead of waiting for the complete behavior to appear. Punishment, on the other hand, is the process of weakening or extinguishing behavior through the use of aversive (or undesirable) consequences.

Reinforcements and punishments are categorized as positive/negative, primary/secondary and partial/continuous.

Positive Reinforcement presents rewarding stimuli, while Negative Reinforcement removes aversive stimuli, to strengthen behaviors. On the other hand, Positive Punishment presents aversive stimuli, while Negative Punishment removes rewarding stimuli, to weaken behaviors.

Primary Reinforcements use rewarding stimuli that are innately satisfying, such as food, water and sex, while Secondary Reinforcements use rewarding stimuli that are learned (or conditioned), such as eye contact, a pat on the back or a smile. Token reinforcers, such as money, may be exchanged for other reinforcing stimuli. Similarly, Primary Punishments use aversive stimuli that are innately punishing, such as painful objects and poisonous substances, while Secondary Punishments use aversive stimuli that are learned (or conditioned), such as loss of trust and angry looks from other people.

Continuous Reinforcements deliver the rewarding consequence every time the behavior occurs, while Partial Reinforcements deliver it only a portion of the time, in order to successfully establish the association. In the same way, Continuous Punishments deliver the aversive consequence every time, while Partial Punishments deliver it only a portion of the time.
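The positive/negative distinction described above can be read as a two-by-two table: is a stimulus presented or removed, and is that stimulus rewarding or aversive? The short Python sketch below is only an illustration of that table. The function name and its two parameters are hypothetical and do not come from any psychology library.

```python
def classify_consequence(stimulus_added: bool, stimulus_is_rewarding: bool) -> str:
    """Name the operant procedure for a single consequence.

    stimulus_added: True if a stimulus is presented, False if it is removed.
    stimulus_is_rewarding: True for a rewarding stimulus, False for an aversive one.
    (Both parameter names are illustrative assumptions.)
    """
    # "Positive" means something is presented; "negative" means something is removed.
    sign = "Positive" if stimulus_added else "Negative"
    # Presenting a reward or removing something aversive strengthens behavior;
    # presenting something aversive or removing a reward weakens it.
    strengthens = stimulus_added == stimulus_is_rewarding
    kind = "Reinforcement" if strengthens else "Punishment"
    return f"{sign} {kind}"


if __name__ == "__main__":
    for added in (True, False):
        for rewarding in (True, False):
            print(added, rewarding, classify_consequence(added, rewarding))
```

For instance, removing an aversive stimulus corresponds to classify_consequence(False, False), which returns "Negative Reinforcement".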

Basic Processes of Operant Conditioning

Just like classical conditioning, operant conditioning is composed of four different processes: Acquisition, the initial learning of the response-consequence link; Generalization/Discrimination, the process of knowing when and when not to apply learned associations; Extinction, the process of removing associations between rewarding or punishing consequences and the behavior; and Spontaneous Recovery, the phenomenon of learned associations reappearing despite the continued absence of rewarding or punishing consequences.

Schedules and Timing of Operant Conditioning

Skinner observed that the rate of acquisition and extinction depends upon how and when rewards or punishments are given.

With a Fixed-Ratio Schedule, reinforcements/punishments are given after a set number of behaviors. For example, commissions are given after selling a specific number of items. The problem with using this schedule is that performance drops off just after the reinforcement/punishment is given.
With a Variable-Ratio Schedule, reinforcements/punishments are given after an unpredictable number of behaviors that varies around an average. For example, a slot machine might pay off on average every 20th time. The good thing about using this schedule is the resulting high rate of acquisition and the low rate of extinction. This is why slot machines are often addictive.
With a Fixed-Interval Schedule, reinforcements/punishments are given after a fixed amount of time has elapsed. The problem with using this schedule is that the desired behavior picks up rapidly only when the time of giving the reward or punishment approaches. For example, students oftentimes study only when the exam period approaches.
With a Variable-Interval Schedule, reinforcements/punishments are given after an amount of time that varies around an average. Although the rate of acquisition with this schedule is slow, performance is consistent, and extinction occurs at a slow rate. Fishing is an activity that follows this type of schedule.
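The four schedules differ only in the rule that decides whether a given response earns a consequence. The Python sketch below expresses each rule as a small function, under stated assumptions: all function names are hypothetical, the variable-ratio rule is modeled as a fixed payoff probability of 1/mean per response, and the variable-interval wait is drawn uniformly between 0 and twice the mean. These are convenient modeling choices for illustration, not Skinner's procedures.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response."""
    count = 0

    def on_response(t):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return on_response


def variable_ratio(mean_n, rng):
    """Reinforce each response with probability 1/mean_n (assumed model),
    so a payoff arrives on average every mean_n-th response."""

    def on_response(t):
        return rng.random() < 1.0 / mean_n

    return on_response


def fixed_interval(seconds):
    """Reinforce the first response made at least `seconds` after the
    previous reinforcement (or after time 0)."""
    last = 0.0

    def on_response(t):
        nonlocal last
        if t - last >= seconds:
            last = t
            return True
        return False

    return on_response


def variable_interval(mean_seconds, rng):
    """Like fixed_interval, but the required wait is redrawn after each
    reinforcement, uniformly from 0 to 2 * mean_seconds (assumed model)."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_seconds)

    def on_response(t):
        nonlocal last, wait
        if t - last >= wait:
            last = t
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False

    return on_response


if __name__ == "__main__":
    # One response per second for 20 seconds on a fixed-ratio 5 schedule.
    schedule = fixed_ratio(5)
    print([t for t in range(1, 21) if schedule(t)])
```

Each function returns a callable that takes the time of a response and reports whether that response is reinforced. With one response per second, the fixed-ratio 5 schedule in the usage example reinforces the responses at seconds 5, 10, 15 and 20. Passing a seeded random.Random instance to the two variable schedules keeps a simulation repeatable.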

Contingency is an important aspect of acquisition in operant conditioning (just like with classical conditioning). Immediate reinforcement and punishment work better than delayed reinforcement and punishment, because immediate timing results in a direct association between the behavior and its consequences. This is the reason why ratio schedules work faster than interval schedules (although extinction also occurs at a faster rate). Some behaviors are rewarding immediately but punishing only after a delay. For example, a major reason why obesity is such a common problem is that eating provides immediate satisfaction, while weight gain is only a delayed punishing consequence. Therefore, behaviors that result in immediate reinforcement and delayed punishment are almost always strengthened. Other behaviors, however, are punishing immediately and rewarding only after a delay. For example, many adults tend to forgo swimming lessons because the delayed reward of learning to swim comes at the cost of immediate embarrassment over mistakes. Unlike excessive eating, such behaviors are weakened.

Benefits and Application of Operant Conditioning

Principles of operant conditioning are applied in different areas: behavior modification, training and education.

Applied Behavior Analysis, or Behavior Modification, is the application of operant conditioning principles to change or control behavior. Underlying this method is the belief that behavioral problems are caused by inadequate or inappropriate consequences.

Operant conditioning is also applied in training animals to do certain tricks. It may also be used to toilet-train toddlers. Lastly, the education sector can benefit from using these principles to shape students' study behavior toward higher grades. Today's computer-assisted instruction, which originated from Skinner's teaching machine, has been reported to work better than traditional teacher-based instruction for drill and practice of math problems.

How to Choose Effective Reinforcers

Operant conditioning won't work if the chosen reinforcement/punishment is insignificant to the person involved. Because of this, it is important to attend to individual needs and preferences. However, the Premack principle may be used to get a general idea of what types of reinforcers work best. According to David Premack, high-probability activities reinforce low-probability activities. High-probability activities are those that most people love to do. Educators oftentimes use the Premack principle to engage students in "boring" class instruction by promising a lively and fun activity afterward.
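As a rough illustration of the Premack principle, the Python sketch below picks candidate reinforcers for a target activity by comparing baseline probabilities, for example the share of free time a person spends on each activity. The function name, the activity labels and the numbers are all hypothetical and serve only to show the comparison.

```python
def premack_reinforcers(baseline, target):
    """Return the activities that are more probable than `target`,
    most probable first. `baseline` maps activity name to its observed
    probability (hypothetical data supplied by the caller)."""
    threshold = baseline[target]
    candidates = [name for name, p in baseline.items() if p > threshold]
    return sorted(candidates, key=lambda name: baseline[name], reverse=True)


if __name__ == "__main__":
    # Made-up share of free time a student spends on each activity.
    baseline = {
        "worksheet": 0.05,
        "reading": 0.15,
        "group game": 0.35,
        "free play": 0.45,
    }
    print(premack_reinforcers(baseline, "worksheet"))
```

With these made-up numbers, every other activity is more probable than the worksheet, so each one is a candidate reward for completing it, with free play listed first.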