PRIOR KNOWLEDGE EFFECTS
ON SIMULTANEOUS LEARNING OF MORE THAN TWO CATEGORIES
researcher,
Institute for
Research in Social and Human Sciences,
ABSTRACT. The purpose of the present experiment is to
investigate the influence of prior knowledge during simultaneous learning of
more than two new categories. In everyday life, we learn categories composed of
both features that we are able to associate with other, previously learned
entities, and features that are isolated (rote features). In the experiment,
subjects learned either two or three categories and then performed a
single-feature classification task. The results suggest that prior knowledge
helps to integrate features, even if the increased number of categories
significantly reduces the speed of learning.
Keywords: category learning, prior knowledge.
Categories provide a solid basis for a wide range of processes. We use them in
problem solving, prediction, inference, induction, and so on (Ross, 2000;
Spalding & Ross, 2000; Murphy & Ross, 2000; Yamauchi & Markman, 2000a;
Yamauchi & Markman, 2000b; Smith & Minda, 2000; Kaplan & Murphy, 2000).
Moreover, in order to carry out our changing tasks efficiently, we constantly
have to learn new categories (including new categorizations of already known
entities). Although at first glance the term “new categories” may falsely
suggest that learning draws only on entirely new information, in reality we
rely heavily on our prior knowledge. By prior knowledge we mean knowledge about
past categories, general knowledge about the world, and knowledge connected to
specific domains (Murphy & Medin, 1985; Wattenmaker, 1999; Kaplan & Murphy,
1999). When we learn new categories, we are influenced both by specific and
general knowledge about categories with similar structure or content and by the
currently observable exemplars of the new categories. Such prior knowledge,
even in a minimal amount, aids learning, at least of the features directly
connected with it. This small amount of knowledge is usually related to only a
few features of the new objects and facilitates their learning, but there is no
clear evidence that it interferes with the learning of other, unrelated
features, called rote features. Following Kaplan & Murphy (1999, 2000), we use
the term rote features (or rote properties) to refer to features that are not
directly connected to prior knowledge and so are learned “by rote” (if not
otherwise).
There are many possible ways in which prior knowledge interacts with the
learning of new information. Prior knowledge could be triggered by the explicit
use of labels of past categories, or by some other cue that activates
previously formed representations. Once background knowledge is activated,
information from it might be “copied” into the new concept, or it might serve
to connect current features to formerly learned ones and knowledge-related
features to rote features. Evan Heit (1994, 1998, 2000) presented a similar
proposal, the integration model of categorization, which concerns the learning
of whole exemplars of new categories. The fundamental claim of the integration
model is that when a person is confronted with a new item, the item is compared
both to actual observations and to past exemplars from similar, previously
learned categories. However, regardless of the unit of analysis (features or
exemplars), some crucial questions still remain. How can a person use
background knowledge to make category learning easier? How does this person
know which knowledge will be useful, and when? And in which direction do prior
knowledge structures with different contents influence learning?
Most everyday categories have a mixture of knowledge-related features and rote
features. If we access some relevant background knowledge, the features related
to it are learned quite efficiently. As for the rote features, there are
several possible ways to learn them. We may associate them directly with the
category’s label, learning them “by rote”; we may associate them with the
knowledge-related features, finding an explanation for this association; or we
may relate them to prior knowledge triggered by the context. As Kaplan & Murphy
(2000) summarized, previous studies suggest four possible ways in which prior
knowledge may influence category learning. One possibility is that
knowledge-related features elicit previously learned knowledge structures,
which in turn direct attention toward these features and inhibit learning of
the rote features (Murphy & Medin, 1985; Wisniewski, 1995). Alternatively,
knowledge may actually aid learning of the rote features: because
knowledge-related features are learned quickly at the beginning of the task,
using them to complete the categorization task leaves more processing capacity
for learning the rote features. A third mechanism is that subjects do not use
all the features or dimensions but rely on only a few of them. Owing to the
prior knowledge activated, they discover some explanation for a few features
related to this knowledge, which helps them learn to categorize the items
(Murphy & Allopenna, 1994). In this case, the competition between the features
“ends before it begins”. Finally, subjects may try to find some connection (not
necessarily a systematic one) between different types of features, even though
those features appear together arbitrarily.
Most experiments in category learning have used only a single pair of
categories from the same domain. In addition, subjects had to deal only with
task-relevant information, since no redundant or irrelevant information was
introduced in the category descriptions. In everyday life, however, we have to
deal with more than two categories simultaneously. When confronted with only
two categories, our task is somewhat easier because of the very limited number
and mutually exclusive nature of the classification possibilities. Subjects
have to choose between only two possibilities (“the third possibility is
excluded”), so classifying a feature as belonging to one of the two categories
does not necessarily imply that the feature has actually been associated with
the correct category. It is also possible to give the right answer by learning
the features specific to only one of the categories and deciding that, if the
presented feature does not belong to this category, it has to belong to the
other one. Introducing a third category makes this strategy impossible. For
this reason, in the present experiment we manipulated the number of categories
to be learned. We expect learning speed to decrease with three categories
because of the increased number of features, but without learning being less
efficient in other respects. If, in the case of three categories, prior
knowledge facilitates the learning of the features related to it to the same
extent as it does in the case of two categories, then more processing capacity
is freed for learning the other features, and subjects should learn three
categories as well as two. On the other hand, if features were in competition,
learning of knowledge-related features would be less efficient in the
three-category case, because less attention could be directed toward each of an
increased number of knowledge-related features. At the same time, more
non-attended (rote) features would be inhibited, likewise leading to less
efficient learning than with only two categories.
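The elimination strategy described above can be illustrated with a minimal
simulation. This is only a sketch under stated assumptions: the learner is
assumed to have memorized the features of a single category and to guess at
random among the remaining categories for any other feature. The category names
("A", "B", "C"), the feature labels and the function names are hypothetical
placeholders, not the experimental stimuli or the procedure actually used.

```python
import random

def elimination_accuracy(categories, learned, trials=10_000, seed=0):
    """Proportion correct for a learner who knows only one category's features.

    categories: dict mapping a category name to its list of features.
    learned: the single category whose features the learner has memorized.
    A feature recognized as belonging to `learned` is classified correctly;
    any other feature is assigned at random to one of the remaining categories.
    """
    rng = random.Random(seed)
    names = list(categories)
    others = [n for n in names if n != learned]
    known = set(categories[learned])
    correct = 0
    for _ in range(trials):
        true_cat = rng.choice(names)
        feature = rng.choice(categories[true_cat])
        guess = learned if feature in known else rng.choice(others)
        correct += guess == true_cat
    return correct / trials

def make(names):
    # Hypothetical placeholder features: four distinct features per category.
    return {n: [f"{n}-f{i}" for i in range(4)] for n in names}

two = elimination_accuracy(make(["A", "B"]), learned="A")
three = elimination_accuracy(make(["A", "B", "C"]), learned="A")
print(two, three)
```

With two categories the learner is always correct, because "not A" leaves only
one alternative. With three categories the expected accuracy under these
assumptions falls to about 1/3 + (2/3)(1/2) = 2/3, which is why a third
category removes the shortcut.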
Subjects. Twenty-two persons participated in this experiment. They were
randomly assigned in equal numbers to one of the two conditions (simultaneous
learning of two categories or simultaneous learning of three categories).
Materials and design. In this experiment we used either two categories of
books (knight novels and love novels) or three (knight novels, detective novels
and love novels). The categories’ factorial structure is presented in Table 1.
Each category had 5 exemplars, and each exemplar had 5 features defined on 5
different dimensions. These dimensions were selected beforehand based on the
preferences of five people who did not participate in the actual experiment.
They were asked to select the dimensions on which they would describe a book
and to rank them in order of importance. The five most frequently selected
dimensions were (1) the content of the book, (2) the color of the cover, (3)
the age of the characters, (4) the size of the book, and (5) the type of font
used. The content of the book was mostly expressed by (a) the theme of the
story, (b) the title of the book, (c) the logo of the book, (d) the outcome of
the story, or (e) a key moment of the story. The content dimension served as
the thematic, or knowledge-related, dimension (features T1-T5 in Table 1), and
the other four served as rote-property dimensions (D1-D4 in Table 1).
| Condition | Exemplar | GAMON books (category A): T D1 D2 D3 D4 | MIRKO books (category B): T D1 D2 D3 D4 | LIKER books (category C): T D1 D2 D3 D4 |
|---|---|---|---|---|
| Two-category | 1 | T1 1 1 1 1 | | T1 2 2 2 2 |
| Two-category | 2 | T2 2 1 1 1 | | T2 1 2 2 2 |
| Two-category | 3 | T3 1 2 1 1 | | T3 2 1 2 2 |
| Two-category | 4 | T4 1 1 2 1 | | T4 2 2 1 2 |
| Two-category | 5 | T5 1 1 1 2 | | T5 2 2 2 1 |
| Three-category | 1 | T1 1 1 1 1 | T1 2 2 2 2 | T1 3 3 3 3 |
| Three-category | 2 | T2 2 1 1 1 | T2 3 2 2 2 | T2 1 3 3 3 |
| Three-category | 3 | T3 1 3 1 1 | T3 2 1 2 2 | T3 3 2 3 3 |
| Three-category | 4 | T4 1 1 2 1 | T4 2 2 3 2 | T4 3 3 1 3 |
| Three-category | 5 | T5 1 1 1 3 | T5 2 2 2 1 | T5 3 3 3 2 |
Table 1. Factorial structures of the categories used in the experiment. Each
exemplar (1-5) has one of the five knowledge-related (thematic) features
(T1-T5) and four rote features (D1-D4).
We considered content the most appropriate choice for the knowledge-related
dimension because of its obvious relation to previously acquired knowledge.
Features of the other four dimensions are not specific to one or another book
category, so they really must be learned “by rote”, or possibly through forced
associations with the thematic (knowledge-related) features. For the thematic
dimension, we derived a different feature for each of the five exemplars of a
category. The rote features were the same for each of the five exemplars of a
category, but they were mixed across categories as indicated in Table 1. The
only exception was the first exemplar of every category, which had all four
rote features of the category to which it belonged. This exemplar was the
prototype of the category. A complete list of the features used in both
conditions is presented in the Appendix.
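The factorial structure in Table 1 can also be written down as data and checked
mechanically. The sketch below is illustrative only: it encodes the rote-feature
codes read off Table 1, and it assumes that a code k on a dimension stands for
the rote value typical of the k-th category within a condition. The variable and
function names are hypothetical.

```python
from collections import Counter

# Rote-feature codes read off Table 1. Each tuple is one exemplar (1-5) and
# lists the codes on dimensions D1-D4. Keys are category indices within a
# condition; code k is assumed to mean "the rote value typical of category k".
TWO_CATEGORY = {
    1: [(1, 1, 1, 1), (2, 1, 1, 1), (1, 2, 1, 1), (1, 1, 2, 1), (1, 1, 1, 2)],
    2: [(2, 2, 2, 2), (1, 2, 2, 2), (2, 1, 2, 2), (2, 2, 1, 2), (2, 2, 2, 1)],
}
THREE_CATEGORY = {
    1: [(1, 1, 1, 1), (2, 1, 1, 1), (1, 3, 1, 1), (1, 1, 2, 1), (1, 1, 1, 3)],
    2: [(2, 2, 2, 2), (3, 2, 2, 2), (2, 1, 2, 2), (2, 2, 3, 2), (2, 2, 2, 1)],
    3: [(3, 3, 3, 3), (1, 3, 3, 3), (3, 2, 3, 3), (3, 3, 1, 3), (3, 3, 3, 2)],
}

def own_rote_counts(structure):
    """For each category, count per exemplar the rote features coded as its own."""
    return {cat: [row.count(cat) for row in rows] for cat, rows in structure.items()}

def most_frequent_category(structure):
    """Map each (dimension, code) pair to the category where it occurs most often."""
    counts = {}
    for cat, rows in structure.items():
        for row in rows:
            for dim, code in enumerate(row, start=1):
                counts.setdefault((dim, code), Counter())[cat] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

print(own_rote_counts(THREE_CATEGORY)[1])  # [4, 3, 3, 3, 3]
```

Under this encoding the first exemplar of every category carries all four of
its own rote features (the prototype), each remaining exemplar carries three,
and every rote value occurs most frequently in its own category (four exemplars
versus one).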
Procedure. Subjects in the two-category condition learned a single pair of
categories (knight novels and love novels), while subjects in the
three-category condition learned all three. The categories were given the
following names: knight novels – Gamon, detective novels – Mirko, love novels –
Liker. The procedure consisted of two sessions: learning and speeded
single-feature classification.
In the learning session, the stimuli were presented on a computer screen. On
each trial we presented a single exemplar from one of the two or three
categories. The names of the categories were presented below every exemplar.
One presentation of all 10 (two-category condition) or 15 (three-category
condition) exemplars formed a learning block. The order of the exemplars within
a learning block, as well as the order of the features of an exemplar, was
randomized on each presentation. Thus, every description of an exemplar was
“unique”, which prevented the order of the dimensions from serving as a cue.
Every exemplar was presented for 10 s, with a break of 3 s between the
presentations of two exemplars. Thus, subjects had 10 s to decide to which
category the item belonged. They gave their responses verbally and received
feedback from the experimenter on whether the answer was correct or incorrect.
Thus, they received the feedback only when the stimulus had already disappeared
from the screen. The learning session continued until the subject correctly
classified all the exemplars within a single block.
In the speeded single-feature session, on each trial we presented on the screen
a single feature from those previously learned. Subjects had to decide in which
category the feature had occurred most frequently. They were told that it was
important to classify the features quickly and correctly. They responded by
pressing the right or left arrow, which indicated the corresponding category
written above the arrow. There were two blocks of feature presentation. Each
block of trials contained all 18 (two-category condition) or 27 (three-category
condition) features, presented in random order. The number of learning blocks,
the feature classification decisions and the classification reaction times were
recorded.
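The block sizes reported above follow directly from the design: five thematic
and four rote features per category give 9 features per category, and five
exemplars per category give 10 or 15 exemplars per learning block. The sketch
below reproduces this arithmetic and builds one randomized single-feature
block. The function names and the (category, label) representation are
hypothetical; the original presentation software is not described in the text.

```python
import random

THEMATIC_PER_CATEGORY = 5   # T1-T5
ROTE_PER_CATEGORY = 4       # D1-D4
EXEMPLARS_PER_CATEGORY = 5

def feature_pool(categories):
    """All single features for the given categories, as (category, label) pairs."""
    labels = [f"T{i}" for i in range(1, THEMATIC_PER_CATEGORY + 1)]
    labels += [f"D{i}" for i in range(1, ROTE_PER_CATEGORY + 1)]
    return [(cat, label) for cat in categories for label in labels]

def single_feature_block(categories, rng):
    """One block of the single-feature task: every feature once, in random order."""
    block = feature_pool(categories)
    rng.shuffle(block)
    return block

rng = random.Random(1)
two = ["Gamon", "Liker"]            # knight novels and love novels
three = ["Gamon", "Mirko", "Liker"]
print(len(single_feature_block(two, rng)), len(single_feature_block(three, rng)))  # 18 27
print(len(two) * EXEMPLARS_PER_CATEGORY, len(three) * EXEMPLARS_PER_CATEGORY)      # 10 15
```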
As we expected, subjects learned to categorize the items significantly faster
in the two-category condition than in the three-category condition (M=2.3
blocks and M=6.0 blocks, respectively; t=4.52, p<0.0012). In fact, in the
two-category condition 66% of the subjects completed the task after only two
learning blocks.
Preliminary analyses indicated that there were no significant differences
between reaction times (RTs) for correct single-feature classifications on the
corresponding dimensions across the three categories (Gamon vs. Mirko vs.
Liker). Thus, assuming that the categories were equivalent, we report reaction
times averaged over categories for both knowledge-related and rote features.
First, reaction times were entered in a repeated measures analysis of variance
(ANOVA) with learning condition (two-category vs. three-category) as a
between-subjects factor and testing block (first trial vs. second trial) as a
within-subjects factor. There was an overall main effect of testing block, with
significantly shorter reaction times in the second testing trial
[F(1,20)=25.56, p<0.001]. Both groups completed their tasks significantly
faster on the second trial for both feature types (Table 2). Although some of
the effects were similar in the two testing blocks, we analyzed the data from
each block separately. For example, although there was no effect of learning
condition over trials, when we treated feature types separately we found a
marginal learning condition effect for the knowledge-related features
[F(1,20)=3.804, p<0.064]. In the second trial, subjects in the two-category
condition performed marginally better on classifying knowledge-related features
than those in the three-category condition (t=2.191, p<0.039). This tendency,
however, was not detectable in the first trial. No such effects were found for
the rote features.

Another repeated measures analysis of variance with learning condition
(two-category vs. three-category) and feature type (knowledge-related vs. rote
features) indicated a significant main effect of feature type [for the first
trial: F(1,20)=9.913, p<0.005; for the second trial: F(1,20)=22.676, p<0.001].
No significant interaction between the two factors was found. In both trials,
subjects were significantly faster at classifying knowledge-related features
than at classifying rote features (see Figure 1). Nevertheless, in the first
trial subjects in the two-category condition were only moderately faster at
classifying knowledge-related features than rote features (t=2.065, p<0.063),
while subjects in the three-category condition were significantly faster at the
same task (t=2.745, p<0.19).

Figure 1. Mean reaction times (in milliseconds) for knowledge-related and rote
features in the two learning conditions (reaction times were recorded in the
first trial).
As regards the error proportions in single-feature classification, a repeated
measures analysis of variance with learning condition (two-category vs.
three-category) and feature type (knowledge-related vs. rote features) showed a
main effect of feature type [for the first trial: F(1,20)=9.697, p<0.011; for
the second trial: F(1,20)=11.305, p<0.007]. In both trials subjects performed
better on classifying knowledge-related features than on classifying rote
features (see Table 3). No significant effects of testing trial or learning
condition were found.

One major question addressed here was whether an increased number of categories
influences category-learning performance. As predicted, subjects were largely
influenced by prior knowledge. Owing to the knowledge-related features, they
gained an advantage in learning even when they had to learn additional
features. Not surprisingly, however, learning speed was reduced in this case.
It also seems that the attentional focus hypotheses (Murphy & Medin, 1985) do
not account for our findings. In the three-category condition, subjects learned
both knowledge-related features and rote features as well as their counterparts
in the two-category condition. If subjects had learned knowledge-related
features at the expense of the rote features, the increased number of the
former should have inhibited the learning of the rote features more than it did
in two-category learning. Still, there was no evidence that subjects in the
three-category condition learned the rote features less efficiently than
subjects in the two-category condition. As regards the knowledge-related
features there was one exception, however: subjects in the two-category
condition performed marginally better in the second trial of single-feature
classification, but not in the first. Yet this does not necessarily contradict
our explanation: on the second testing trial subjects performed better because
of overlearning of the knowledge-related features. It is worth mentioning that
repetition of rote features did not lead to better performance compared to the
three-category condition.
One reason why both knowledge-related features and rote features were learned
equally well in the three-category condition is that, although subjects had
some extra features to learn, they also had more opportunities to learn them
than subjects in the two-category condition. An alternative explanation,
suggested by the studies of Kaplan & Murphy (2000), may also account for our
results. Knowledge-related features triggered prior knowledge to the same
extent in both learning conditions. These knowledge structures made it easier
to learn the features related to them, which in turn allowed subjects to
allocate more processing capacity to learning rote features. On the other hand,
in the two-category condition the very limited number and mutually exclusive
nature of the classification alternatives allowed subjects to avoid learning
the features of both categories, since it was possible to make correct
classifications by learning only one set of features and knowing that features
that are not from this (single) set have to belong to the other set. As we
suggested earlier, this strategy does not necessarily imply that both sets of
features were associated with the corresponding category. The use of this
strategy is also suggested by the fact that on the first testing trial subjects
in the two-category condition were only moderately faster at classifying
knowledge-related features than in the second trial. Introducing a third
category makes the use of this strategy impossible. In the present experiment,
the third category may have compelled subjects to try to use more features as
cues, and thus they made more associations between knowledge-related features
and rote features (compared to the number of such associations they would have
made had they relied only on learning knowledge-related features). If this is
the case, then paradoxically the additional features did not impede learning.
Instead, they forced subjects to integrate the two types of features and to
apply a more elaborate learning strategy, which is presumably more similar to
the strategies we use in everyday category learning.
The present experiment cannot provide clear evidence for the use of such
strategies; it merely suggests that we use different strategies to deal with
different category learning conditions. The finding that prior knowledge aids
learning of knowledge-related features and, especially, learning of rote
features even when the number of categories is increased leads to further
questions. Do the amount and the complexity of the material to be learned
influence which knowledge structures are used in learning, and in what form?
Further investigation of such topics may provide ecologically more valid models
of category learning.
References
Heit, E. (1994). Models of the effects of prior knowledge on category learning.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 20,
1264-1282.
Heit, E. (1998). Influences of prior knowledge on selective weighting of
category members. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 24, 712-731.
Heit, E., & Bott, L. (2000). Knowledge selection in category learning. In D. L.
Medin (Ed.), Psychology of Learning and Motivation (Vol. 39), 163-199. San
Diego: Academic Press.
Kaplan, A.S., & Murphy, G.L. (1999). The acquisition of category structure in
unsupervised learning. Memory & Cognition, 27(4), 699-712.
Kaplan, A.S., & Murphy, G.L. (2000). Category learning with minimal prior
knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition,
26(4), 829-846.
Murphy, G.L., & Medin, D.L. (1985). The role of theories in conceptual
coherence. Psychological Review, 92, 289-316.
Murphy, G.L., & Allopenna, P.D. (1994). The locus of knowledge effects in
concept learning. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 20, 904-919.
Murphy, G.L., & Ross, B. (2000). Induction with cross-classified categories.
Memory & Cognition, 27(6), 1024-1041.
Ross, B. (2000). The effects of category use on learned categories. Memory &
Cognition, 28(1), 51-63.
Smith, J.D., & Minda, J.P. (2000). Thirty categorisation results in search of a
model. Journal of Experimental Psychology: Learning, Memory, and Cognition,
26(1), 3-27.
Spalding, T., & Ross, B. (2000). Concept learning and feature interpretation.
Memory & Cognition, 28(3), 439-451.
Wattenmaker, W.D. (1999). The influence of prior knowledge in intentional
versus incidental concept learning. Memory & Cognition, 27(4), 685-698.
Wisniewski, E.J. (1995). Prior knowledge and functionally relevant features in
concept learning. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 21, 449-468.
Yamauchi, T., & Markman, A.B. (2000a). Learning categories composed of varying
instances: The effect of classification, inference and structural alignment.
Memory & Cognition, 28(1), 64-78.
Yamauchi, T., & Markman, A.B. (2000b). Inference using categories. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 26(3), 776-795.
Appendix
Gamon books (knight novels)
Thematic features (thematic dimension):
T1: the logo is an armored knight figure
T2: the title is: The Black Knight
T3: the main character is going to war
T4: at the end the main character dies in battle
T5: it (the story) happened a long time ago
Rote features (dimensions 1-4):
D1: the cover is brown
D2: the main character is about 45 years old
D3: it has the size of a sheet
D4: it is printed in Normal characters
Mirko books (detective novels)
Thematic features (thematic dimension):
T1: the logo is a revolver
T2: the title is: Murder for Revenge
T3: the main character is interrogating everybody
T4: at the end the main character finds out everything
T5: it is about solving a mystery
Rote features (dimensions 1-4):
D1: the cover is green
D2: the main character is about 40 years old
D3: it has the size of an envelope
D4: it is printed in Italic characters
Liker books (love novels)
Thematic features (thematic dimension):
T1: the logo is a heart
T2: the title is: Love story
T3: the main character courts a waitress
T4: at the end the main character is getting married
T5: it is about a relationship between two people
Rote features (dimensions 1-4):
D1: the cover is blue
D2: the main character is about 35 years old
D3: it has the size of a notebook
D4: it is printed in Bold characters