
Template Matching in Psychology, Slides of Cognitive Development

How the brain captures patterns: everything we see is perceived by matching it against object representations in long-term memory.

Typology: Slides · 2021/2022 · Uploaded on 03/31/2022 by kaijiang


The object in A can be recognized over:

  • C – translation, as in movement of the object or gaze
  • D – changes in size (distance from observer)
  • E – changes in lighting (from upper left to upper right)
  • F – shifts in picture-plane orientation
  • G – shifts in depth-plane orientation
  • H & I – viewpoint shifts caused by SOGI turning in depth or the observer moving around the object

Object Constancy refers to our ability to recognize different two-dimensional images as representations of a particular three-dimensional object (SOGI, in this case).

  • Note that changes in viewpoint might cause parts of the object to become visible or to occlude other parts of the object.
    - The star-shaped side of SOGI shown in H occludes more of the figure than in A.
  • To a computer, all of the images in figure 1 are just collections of numbers that specify the intensity of each small dot, or pixel.
    - Teaching a computer to distinguish between the figures in A and B would be relatively trivial, because all that would be required is a point-by-point comparison of pixels; teaching the computer that C through I are also the same figure as A would be much more difficult.
    - Pixel-by-pixel comparisons would also find A to be different from the images in C through I.
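The pixel-by-pixel idea can be sketched directly. This is a toy illustration (hypothetical 3x3 binary images, not from the slides): identical images agree at every pixel, but merely translating the shape one pixel to the right drops the agreement well below 100%.

```python
# Toy illustration of pixel-by-pixel comparison: trivial for identical
# images, brittle under translation.

def pixel_match(img_a, img_b):
    """Fraction of pixel positions on which two equal-sized images agree."""
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    return sum(a == b for a, b in zip(flat_a, flat_b)) / len(flat_a)

L_SHAPE = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
# The same "L", translated one pixel to the right.
L_SHIFTED = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
]
```

Here `pixel_match(L_SHAPE, L_SHAPE)` is 1.0, but the translated copy agrees on only 4 of the 9 pixels, so a pure pixel comparison would call the same shape a different object.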

Template Matching

  • Before considering 3-D figures, let us examine possible representations of two-dimensional patterns.
  • Template theories propose that patterns are not really analyzed at all—templates are holistic entities that are compared to input patterns to determine amount of overlap.
  • Template matching works well in pattern-recognition machines that read letters and numbers in standardized, constrained contexts (scanners that read your account number off checks, machines that read postal ZIP codes off envelopes).
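The overlap computation that template theories propose can be sketched as follows. This is a minimal toy (invented 3x3 templates for T and L, not a real recognizer): each stored template is compared holistically to the input, and the label of the template with the greatest overlap wins.

```python
# Toy template matcher: the input is compared holistically to each stored
# template; the template with maximal overlap determines the response.

TEMPLATES = {
    "T": [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
}

def overlap(template, image):
    """Count positions where both the template and the image are 'on'."""
    return sum(t & p for trow, irow in zip(template, image)
                     for t, p in zip(trow, irow))

def recognize(image):
    """Return the label of the best-overlapping template."""
    return max(TEMPLATES, key=lambda name: overlap(TEMPLATES[name], image))
```

Note that nothing here analyzes the input into parts, which is exactly the point of template theories: the comparison is whole-pattern against whole-pattern.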

Template Matching Process for Letters: The letter must match the template exactly as in (a). The template matching procedure can fail because of (b) change in position, (c) change in size, and (d) change in orientation.

Template Matching: Strengths

  • There is abundant physiological support that simple features (lines and edges of particular orientations) are represented in the nervous system with template-like receptive fields in the visual cortex.

  • They are amazingly reliable: if the to-be-encoded stimulus is present, its template will become active.

Template Matching: Weaknesses

  • The difficulty with template matching as a model for perception is that contexts are rarely constrained.
  • For instance, slight deviations in shape, size, or orientation would prevent template matchers from reading even the limited number of letters (26) in English.

  • They are not inherently view invariant. For every different possible view, there would have to be a different template (replication). As such, template representations are uneconomical.

Template Matching: More Weaknesses

  • Normalization with regard to size, shape, and orientation is one possible way around the problem, but individuals can read written messages that contain gaps in letters and variations in writing instruments, so the number of normalizations would be enormous.
  • Even with replication and normalization, it would be difficult to represent the third dimension (depth) with template matching (since the retina is a two- dimensional receptor array).
  • Standard templates contain no information about whole-part relations--the only two levels of representation are “whole entity” and pixel (or receptor).
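One of the normalizations mentioned above, position, can be sketched as cropping the input to its bounding box before matching; size, orientation, and shape distortions would each need further (and much harder) normalizations. The images below are invented for illustration.

```python
# A sketch of one normalization step: cropping to the bounding box removes
# sensitivity to translation before template comparison.

def crop_to_bounding_box(img):
    """Return the smallest subgrid containing all 'on' pixels."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]

ORIGINAL = [[0, 0, 0],
            [0, 1, 1],
            [0, 0, 1]]
SHIFTED  = [[1, 1, 0],
            [0, 1, 0],
            [0, 0, 0]]
# After cropping, both inputs reduce to the same 2x2 pattern.
```

This handles translation only; as the slide notes, the full set of normalizations needed for real handwriting (gaps in letters, different writing instruments) would be enormous.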

Feature Comparison Models

  • For several decades, the most popular class of shape representation was feature lists: a symbolic description consisting of a limited set of simple attributes.
  • According to this view, perceived shape is defined by the set of features that an object possesses.

  • The number of shared features can represent similarity/dissimilarity between two shapes.
  • Feature detection models assume that perceptual systems detect the presence or absence of particular features (as binary variables, 0 or 1).
  • Feature analysis models also analyze displays down into component features, but they allow gradations between 0 and 1.

Gibson (1969)

  • One of the best-known feature theories is Eleanor Gibson’s (1969) account of perceptual learning of the alphabet by children.
  • She asserted that perceptual learning occurred through the discovery of distinctive features between letters.
  • Children first confronted with an “E” and an “F” may not be aware of how the two differ.
  • The distinctive feature is the lower horizontal line (present in the E but not in the F).
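Gibson's distinctive-feature idea can be sketched with feature sets (the feature names below are illustrative, not entries from her actual chart): similarity is the overlap of two letters' feature sets, and the distinctive feature of E relative to F falls out as a set difference.

```python
# Feature-list sketch of the E/F example: confusability comes from shared
# features; the distinctive feature is what one letter has and the other lacks.

FEATURES = {
    "E": {"vertical", "top_horizontal", "middle_horizontal", "bottom_horizontal"},
    "F": {"vertical", "top_horizontal", "middle_horizontal"},
}

def shared(a, b):
    """Features common to both letters (the basis of their confusability)."""
    return FEATURES[a] & FEATURES[b]

def distinctive(a, b):
    """Features present in letter a but absent from letter b."""
    return FEATURES[a] - FEATURES[b]
```

`distinctive("E", "F")` isolates the lower horizontal line, the feature Egeland's training drew children's attention to.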

Feature Detection or Analysis

[Feature chart, garbled in extraction: rows list features (straight horizontal, diagonal, and vertical lines; closed and open curves; intersections; redundancy features such as cyclic change and symmetry; vertical and horizontal discontinuities) and columns list the 26 capital letters, with + marking each feature a letter contains.]

The table above represents a possible feature list for distinguishing all 26 capital letters in English (E. Gibson, 1969).

  • Egeland (1975) showed that pre-kindergartners could be taught to distinguish between confusable letters (R-P, Y-V, G-C, Q-O, M-N, K-X) when the distinctive features were brought to their attention (by highlighting the distinctive feature in red).
  • During training the distinctive feature was gradually changed back to black to match the rest of the letter.
  • Another group of children was trained with black letters exclusively; they received only correct/incorrect feedback about their choices (“point to the R”).
  • The group trained to learn the distinctive features performed better immediately after training and one week later (delayed test), even though the features were not highlighted during testing.

Examples of Context Effects: Top-Down Processing

Word-Superiority Effect

  • One of the better phenomena for demonstrating contextual effects is the word-superiority effect.
  • Logically, one might conceive of the letters that compose words to be independent units of text, each one identified separately from the others.
  • One might also suppose that words are read on the basis of the letters composing them.
  • The fact that letters can be more quickly and accurately identified when they are embedded in meaningful words than in meaningless letter strings has been demonstrated since 1886 (Cattell).

Cattell’s Original Demonstration of the Word Superiority Effect (1886)

  • He compared the number of letters that subjects could report from 10-ms exposures to English words vs. non-words. The subjects’ task was akin to: HOW MANY LETTERS CAN YOU REPORT NOW? HWO NMYA RSTELTE NCA OYU RPTERO NWO?
  • Instead of differences in the number of letters identified, it could well be the number of letters remembered that differed. Providing the letters in words allows for convenient “chunking.”

Modern Word Superiority Effect (Reicher, 1969; Wheeler, 1970)

  • The effect, as it is now studied, is that single letters can be identified more quickly and with a higher level of accuracy when they are embedded in real words.

Word Superiority Effect

[Figure: each trial shows a target display for 50 ms (WORD in the word condition, ORWD in the nonword condition, or the single letter D in the letter condition), followed by a mask (XXXX) and the two response alternatives D and K.]

Results (percent correct):

  Condition    Precue    No Precue
  Word          74%       83%
  Nonword       58%       70%
  Letter        59%       70%

Precuing refers to telling (orally) the participants the two possible letters in advance of the stimulus.

McClelland and Rumelhart: Interactive Activation Model

Each letter can be represented by a subset of 12 possible segments. Therefore, there are 12 feature nodes at each of the 4 letter positions for a total of 48 feature nodes.


Interactive Activation Model (IAM): Feature Level

  • The feature level consists of 12 features for each of 4 letter positions: a total of 48 feature nodes.
  • Projection from the feature level to the letter level is entirely bottom-up. There is no feedback from higher levels.

IAM: Letter Level

  • The letter level consists of 26 letters at 4 possible positions: a total of 104 letter nodes.
  • The letter nodes all get excitatory input from all feature nodes that represent segments contained by the letter and inhibitory inputs from all feature nodes that are not present in the letter.
  • As such, the node representing a horizontal line at the top excites the letter nodes for A, B, C, D, E, F, G, I, O, P, Q, R, S, T and Z. It inhibits the representations of H, J, K, L, M, N, U, V, W, X, and Y.
  • When all segments for a particular letter are present, the node for that letter will be highly active.

IAM (Continued)

  • Other letter nodes will be active for a given position, depending on the number of shared features.
  • For instance, the segments that comprise the McClelland and Rumelhart A will also activate the H, since the only difference between an A and an H is the presence of the top horizontal for the A but not the H.
  • To sharpen the representation of letters, each letter inhibits the other 25 letter representations (this implements the winner-take-all rule).

IAM: Word Level

  • The word level consists of a lexicon of more than 1000 4-letter words.
  • Each word receives excitatory input from the letter nodes for its constituent letters at the four positions, and inhibitory input from the other 25 letter nodes at each of the four positions.
  • As such, the letter A in the first position sends excitatory inputs to the word nodes for ABLE and ACTS, whereas B sends inhibition to the representations of both words. The node for A in the first position inhibits the word-level nodes of BACK and GAVE.
  • All word nodes are mutually inhibitory, so it is a winner-take-all network.
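The bottom-up feature-to-letter step can be sketched in a drastically simplified, one-pass form. The feature sets below are toy inventions for illustration; the real model uses 12 line segments per letter position and adds within-level inhibition and word-to-letter feedback that this sketch omits.

```python
# One-pass sketch of IAM's feature-to-letter projection: each letter node
# gains excitation for every present feature it contains and inhibition for
# every present feature it lacks.

LETTER_FEATURES = {
    "A": {"left_diag", "right_diag", "crossbar"},
    "H": {"left_vert", "right_vert", "crossbar"},
    "T": {"top_horiz", "center_vert"},
}

def letter_activation(input_features):
    """Net activation per letter: matching features minus mismatching ones."""
    acts = {}
    for letter, feats in LETTER_FEATURES.items():
        acts[letter] = len(input_features & feats) - len(input_features - feats)
    return acts

# Present the three segments of a toy "A": A wins, H is weakly driven via
# the shared crossbar, and T is purely inhibited.
acts = letter_activation({"left_diag", "right_diag", "crossbar"})
```

This mirrors the A/H example above: the only thing keeping H's activation below A's is the features the input contains that H lacks, which is why the model also needs letter-to-letter inhibition to sharpen the winner.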

Illustration of five variables in constructing generalized cylinders. The central cube (A) can be modified to construct the 8 other geons shown by changing just one of five parameters: curvature of cross-sectional edges (B), cross-sectional symmetry (C and D), curvature of sweeping axis (E), diameter of sweeping rule (or cross-sectional size) (F and G), and aspect ratio (length of sweeping axis to the length of the largest dimension of the cross-sectional area) (H and I).

Recognition by Components

  • Biederman (1987) used these generalized cylinders as his building blocks for representing objects (“geons,” short for geometric ions).
  • RBC (Biederman, 1987) proposes that objects are represented by a finite number (about 36) of shape primitives (called geons).
  • These can be combined in different ways (different structural descriptions) to yield an infinite number of objects.
    - Since the structural description is included, RBC is a structural model.

Possible Set of Geons for Natural Objects: Cylinders, cones, blocks, and wedges may all be features of complex objects.

cylinder + noodle (side-connected) = Cup cylinder + noodle (top-connected) = Pail

  • Biederman, Ju, and Clapper (1985) studied the perception of briefly (100 ms) presented partial objects that lacked some of their components.
  • As more components (geons) are presented, RBC would predict better performance, since there would be a greater number of diagnostic matches to the object’s representation in memory.
    - The stimuli were line drawings of 36 common objects.
    - The total number of components that composed the objects (level of complexity) was varied at 2, 3, 6, and 9 (9 instances each).
    - Subjects received a list of item names prior to testing, but this probably had little effect on error rates or reaction times.
  • It is important that object recognition be robust to occlusion, noise, and rotation in depth, so a demonstration that we can name incomplete objects would add credence to RBC theory.

[Figure: example stimuli shown as partial objects containing 3 of their geons and as complete objects containing all 9 geons.]

The authors showed that error rates were low and reaction times were fast for objects that consisted of as few as 4 of their 9 components. There was some improvement as the number of geons increased, consistent with RBC.

  • If recognition is based upon edge-based geons, then color, brightness, and texture should contribute little to recognition.
    - Biederman and Ju (1986) found equivalent reaction times and error rates for line drawings of objects vs. professional color photographs of the same objects.
    - This included results for objects for which color might be a major diagnostic attribute (bananas).

Recoverable vs. Non-recoverable Objects?

  • Biederman (1987) suggested that the vertices contain information about the relations between geons, and he predicted that removing those line segments would be particularly harmful to recognition (indeed, parsing the figure down into component geons becomes difficult).

[Figure: example objects shown complete, with contour deletion at vertices, and with contour deletion at midsegments.]

  • Although the geons are volumetric representations, they are encoded by the features of their two-dimensional retinal images (lines, edges, and vertices). The properties from which geons are to be recognized are their “nonaccidental features.”
  • These are aspects of the image structure that, if present, mean that it is very likely that they also exist in the object.
    - For example, if there is a straight line in the image (collinearity), the visual system assumes that the edge producing the line in the three-dimensional world is also straight.
    - The visual system ignores the possibility that the property in the image could have arisen from the highly unlikely “accidental” alignment of the eye with a smoothly curved edge (accidental viewpoints).
    - As such, geon identity is viewpoint invariant (constant across viewpoints).
    - Smoothly curved elements in the image (curvilinearity) are implied to arise from smoothly curved features in the three-dimensional world.
    - If an image is symmetrical (symmetry), we also assume that symmetry exists in the object.
    - When edges in the image are parallel or coterminate, we assume that the edges in the real-world object also are parallel or coterminate, respectively.
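The collinearity property can be sketched as a simple geometric test on image points (the coordinates here are hypothetical): if sampled edge points all lie on one straight line in the image, the nonaccidental assumption is that the edge producing them is straight in the world, not a curved edge seen from an accidental viewpoint.

```python
# Cross-product test for collinearity of 2-D image points: the nonaccidental
# inference is straight-in-image -> straight-in-world.

def is_collinear(points, tol=1e-9):
    """True if all 2-D points lie on one straight line."""
    if len(points) < 3:
        return True
    (x0, y0), (x1, y1) = points[0], points[1]
    dx, dy = x1 - x0, y1 - y0
    # Each further point must have zero cross product with the base segment.
    return all(abs(dx * (y - y0) - dy * (x - x0)) <= tol for x, y in points[2:])
```

A straight sampled edge like [(0, 0), (1, 1), (2, 2)] passes the test; bending the last point to (2, 1) fails it, so the visual system would instead infer a curved edge.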

CONTRASTS IN NONACCIDENTAL PROPERTIES

[Figure: two geons with their edges labeled a–i, contrasting their nonaccidental properties. One geon has 2 parallel straight edges (a, c), 2 parallel curved edges (d, e), and 2 tangent Y-vertices (abe, cbe). The other has 3 sets of 3 parallel edges (a, h, d; b, e, g; c, f, i), 1 inner Y-vertex (g, i, h), and 3 outer arrow vertices (afg, bch, dei).]

Non-accidental Properties of Two Geons

Biederman and Gerhardstein (1993)

  • In this experiment, there were 48 line drawings of 24 objects (2 exemplars each).
  • Each was created at 3 views differing in depth rotation by 67.5°.
  • In the “priming block,” each object (1 exemplar) was presented at either 0° or 135°.
  • In the second block, the degree of rotation in depth was varied, as was whether or not the participant received the same exemplar.
  • The participants were to name (aloud) the object.

They found no effect of rotation in depth on object recognition.

  • In a subsequent experiment, Biederman and Gerhardstein wanted to examine whether or not novel shapes would show the same viewpoint invariance.
  • The stimulus set consisted of 10 five-component figures, each drawn at 3 views separated by 45° in depth.
  • In one of the views, parts that were not visible in the central view become exposed, while the other rotation presents the same 5 geons (the geon structural description stays constant).

[Figure: the three views (A, B, C) of two example five-geon objects.]

A sequential matching task was used in which the first presentation was randomly chosen from any of the 3 views, and the second (750 ms later) was always the view shown in B. As such, A-B presentations were trials on which no geons changed, while C-B presentations were trials on which the geon structural description changed. Participants responded only when the object depicted was the same in the two intervals (different views counted as “same” trials; there were also trials in which different objects were presented in the two intervals). The findings show that RT and error rates increased only when the geon structural description changed across the two intervals. Only when the parts changed (for the same object) were the error rates and reaction times higher as a function of angular disparity!

[Figure: error rates and reaction times by trial type (B-B, A-B, C-B) as a function of angular disparity.]

Tarr, Williams, Hayward, and Gauthier (1998): Are geons really view-invariant?

  • This study sought to determine whether or not recognition of geons themselves was truly viewpoint invariant.
  • In particular, they were concerned that Biederman and Gerhardstein, in their first experiment, had used highly familiar objects that were probably learned from several different viewpoints.

Experiments 1a-e utilized a sequential matching task in which two images were presented sequentially and the observer had to decide whether or not they represented the same geon, where “same” responses applied to images rotated 0°, 45°, and 90° (different trials were not used in the analysis). Experiments 2a-c utilized a match-to-sample task in which the observer ran blocks with one geon shown at 0°, followed by 12 trials consisting of three different orientations of the same geon (0°, 45°, and 90°) along with 9 other geons interspersed. Participants pressed a key when the same geon was shown. Experiment 3 was a simple naming experiment in which verbal labels were learned for the 0° representations but then tested with all 30 of the above figures.