Animal Learning: Goal-Directed vs Habitual Behavior and Neural Dissociations

These notes cover instrumental conditioning in animals, focusing on the distinction between goal-directed and habitual behavior, the neural dissociations between them, and their relationship with reinforcement learning. They also discuss the role of different brain structures in these learning processes.

What you will learn

  • What is instrumental conditioning and how does it differ from Pavlovian conditioning?
  • How does reinforcement learning fit into instrumental conditioning?
  • How are goal-directed and habitual behaviors distinguished in instrumental conditioning?
  • What are the neural dissociations between habitual and goal-directed behavior?


Instrumental Conditioning VI: There is more than one kind of learning

PSY/NEU338: Animal learning and decision making: Psychological, computational and neural perspectives

outline

• what goes into instrumental associations?
• goal directed versus habitual behavior
• neural dissociations between habitual and goal-directed behavior
• how does all this fit in with reinforcement learning?

what is associated with what?

• Thorndike: S-R (the reinforcer stamps in the association)
• Skinner: what is the S?
• Tolman: S-R mediated by a cognitive map

“The stimuli are not connected by just simple one-to-one switches to the outgoing responses. Rather, the incoming impulses are usually worked over and elaborated in the central control room into a tentative, cognitive-like map of the environment. And it is this tentative map, indicating routes and paths and environmental relationships, which finally determines what responses, if any, the animal will finally release.”

another example: shortcuts (Tolman et al., 1946)

[figure: training maze, test maze, and result]

summary so far...

• Even the humble rat can learn & internally represent spatial structure, and use it to plan flexibly
• Tolman relates this to all of society
• Note that spatial tasks are really complicated & hard to control
• Next: search for modern versions of these effects
• Key question: is the S-R model ever relevant? and what is there beyond it? (especially important given what we know about RL)

the modern debate: S-R vs R-O

• S-R theory:
  • parsimonious: same theory for Pavlovian conditioning (CS associated with CR) and instrumental conditioning (stimulus associated with response)
  • but: the critical contingency in instrumental conditioning is that of the response and the outcome…
• alternative: R-O theory (also called A-O)
  • among proponents: Rescorla, Dickinson
  • same spirit as Tolman (know the ‘map’ of contingencies and desires, can put 2+2 together)
how would you test this? outcome devaluation

1 – Training
2 – Pairing with illness (devalued vs. non-devalued) or motivational shift (hungry vs. sated/unshifted)
3 – Test (in extinction)

Q1: why test without rewards?
Q2: what do you think will happen?
Q3: what would Tolman/Thorndike guess?

will animals work for food they don’t want?

devaluation: results from lesions I

• animals with lesions to DLS never develop habits despite extensive training
• also treatments depleting dopamine in DLS
• also lesions to infralimbic division of PFC (same corticostriatal loop)

[figure: overtrained rats, dorsolateral striatum lesion vs. control (sham lesion); Yin et al. (2004)]

devaluation: results from lesions II

• after habits have been formed, devaluation sensitivity can be reinstated by temporary inactivation of IL PFC

[figure: overtrained rats, IL PFC inactivation (muscimol) vs. control; Coutureau & Killcross (2003)]

devaluation: results from lesions III

• lesions of the pDMS cause animals to leverpress habitually even with only moderate training

(Yin, Ostlund, Knowlton & Balleine, 2005)

devaluation: results from lesions IV

• Prelimbic (PL) PFC lesions cause animals to leverpress habitually even with only moderate training
• (also dorsomedial PFC and mediodorsal thalamus (same loop))

[figure: moderate training, control vs. devalued; Killcross & Coutureau (2003)]

devaluation: one more result (Killcross & Coutureau, 2003)

[figure: lever presses and magazine behavior (actions per minute) after moderate training, extensive training with one outcome, and extensive training with two outcomes]

behavior is not always consistent:
• leverpressing is habitual and continues for unwanted food…
• …at the same time nosepoking is reduced (explanations?)

why are nosepokes always sensitive to devaluation?

  • Balleine & Dickinson: a 3rd system; Pavlovian behavior is directly sensitive to outcome value
  • But: doesn’t make sense... the Pavlovian system has information that it is withholding from the instrumental system?
  • Also: the same is true for a purely instrumental chain
  • And anyway, it seems that all the information is around all the time, so why is behavior not always goal-directed?

outline

• what goes into instrumental associations?
• goal directed versus habitual behavior
• neural dissociations between habitual and goal-directed behavior
• how does all this fit in with reinforcement learning?

back to RL framework for decisions

need to know the long-term consequences of actions, Q(S,a), in order to choose the best one

how can these be learned?

[figure: state-transition diagram linking the states via “press lever” and “poke nose”]
• 3 states: “no food”, “food in mag”, “eating”
• 2 actions: “press lever”, “poke nose”
• immediate reward is 1 in state “eating” and 0 otherwise
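To make the toy task concrete, here is a minimal sketch of it as a table-lookup environment in Python. The state and action names come from the slide; the transition structure (including the assumption that eating returns the animal to “no food”) is filled in from the slide’s diagram and is illustrative rather than definitive:

```python
# Toy leverpress task sketched from the slide: 3 states, 2 actions,
# reward 1 only in "eating".  Transitions are an assumption based on the
# slide's diagram (pressing the lever puts food in the magazine; poking
# the nose in the magazine leads to eating).
STATES = ["no food", "food in mag", "eating"]
ACTIONS = ["press lever", "poke nose"]

# (state, action) -> next state
TRANSITIONS = {
    ("no food", "press lever"): "food in mag",
    ("no food", "poke nose"): "no food",
    ("food in mag", "poke nose"): "eating",
    ("food in mag", "press lever"): "food in mag",
    ("eating", "press lever"): "no food",   # assumed: eating ends the trial
    ("eating", "poke nose"): "no food",
}

def reward(state: str) -> float:
    """Immediate reward: 1 in state 'eating', 0 otherwise."""
    return 1.0 if state == "eating" else 0.0

def step(state: str, action: str):
    """One environment step: returns (next_state, reward)."""
    next_state = TRANSITIONS[(state, action)]
    return next_state, reward(next_state)
```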

strategy II: “model-free” RL

[figure: decision tree with states S0, S1, S2 and actions L, R]

• Shortcut: store long-term values
  • then simply retrieve them to choose action
• Can learn these from experience
  • without building or searching a model
  • incrementally through prediction errors
  • dopamine-dependent SARSA/Q-learning or Actor/Critic (sketched below)

Stored:
Q(S0,L) = 4   Q(S0,R) = 2
Q(S1,L) = 4   Q(S1,R) = 0
Q(S2,L) = 1   Q(S2,R) = 2
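Building on the environment sketch above, a minimal illustration of the model-free idea: a cache of Q values updated incrementally from prediction errors, here with a one-step Q-learning rule. The learning rate, discount, and exploration parameters are assumptions, not values from the slides:

```python
import random

# assumed hyperparameters (not from the slides)
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}  # the stored "cache"

def choose(state):
    """Epsilon-greedy over cached values: cheap, reflexive action selection."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_episode(n_steps=50):
    """Run a block of trials, updating the cache from prediction errors."""
    state = "no food"
    for _ in range(n_steps):
        action = choose(state)
        next_state, r = step(state, action)
        # prediction error: (reward + discounted best next value) - current estimate
        delta = r + gamma * max(Q[(next_state, a)] for a in ACTIONS) - Q[(state, action)]
        Q[(state, action)] += alpha * delta  # incremental update
        state = next_state
```

SARSA or an Actor/Critic architecture would differ in which value is bootstrapped or in keeping a separate policy, but the prediction-error-driven, cache-updating character is the same.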

strategy II: “model-free” RL

[figure: same decision tree with states S0, S1, S2 and actions L, R]

• choosing actions is easy, so behavior is quick, reflexive (S-R)
• but needs a lot of experience to learn
• and inflexible: needs relearning to adapt to any change (habitual; illustrated below)

Stored:
Q(S0,L) = 4   Q(S0,R) = 2
Q(S1,L) = 4   Q(S1,R) = 0
Q(S2,L) = 1   Q(S2,R) = 2
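To connect this back to the devaluation experiments, an illustrative continuation of the sketch above (not from the slides): after extensive training the cache recommends leverpressing, and devaluing the outcome does not change that recommendation until the values are relearned, whereas a forward search over a model that knows the outcome is now worthless would stop immediately.

```python
# Illustrative only: after extensive "training" the cached values
# typically recommend pressing the lever in the "no food" state.
q_learning_episode(n_steps=500)
print(max(ACTIONS, key=lambda a: Q[("no food", a)]))  # typically "press lever"

# Devaluation: suppose eating is now worthless (reward 0).  Nothing in the
# cache has changed, so a purely model-free controller keeps choosing the
# lever until many new prediction errors under the devalued reward erode
# the stored values: the signature of habitual behavior in the test.
```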

two big questions

• Why should the brain use two different strategies/controllers in parallel?
• If it uses two: how can it arbitrate between the two when they disagree (new decision making problem…)?

answers

• each system is best in different situations (use each one when it is most suitable/most accurate)
  • goal-directed (forward search): good with limited training, close to the reward (don’t have to search ahead too far)
  • habitual (cache): good after much experience, distance from reward not so important
• arbitration: trust the system that is more confident in its recommendation (see the sketch below)
  • different sources of uncertainty in the two systems
  • compare to: always choosing the action with the highest estimated value

[figure: estimated action value, model-free vs. model-based]
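A minimal sketch of the confidence-based arbitration idea, under stated assumptions: each controller reports its recommended action together with an uncertainty about its value estimate, and the arbiter trusts whichever is more confident. The specific rule and the numbers below are illustrative; the slides state only the principle.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    action: str
    value: float        # estimated long-run value of the recommended action
    uncertainty: float  # e.g. posterior variance of that estimate

def arbitrate(model_based: Recommendation, model_free: Recommendation) -> str:
    """Trust whichever controller is more confident (lower uncertainty)."""
    winner = model_based if model_based.uncertainty < model_free.uncertainty else model_free
    return winner.action

# Early in training the cache has seen little data, so the forward-search
# (goal-directed) system wins; after extensive training the cheap cached
# values become the more reliable ones (numbers are made up for illustration).
early = arbitrate(Recommendation("press lever", 0.8, 0.1),
                  Recommendation("press lever", 0.3, 0.9))
late = arbitrate(Recommendation("press lever", 0.7, 0.4),
                 Recommendation("press lever", 0.75, 0.05))
```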