Norman W. Bray, Ph.D.
Kevin D. Reilly, Ph.D.
Lisa A. Grupe, M.A.
Developmental Review, vol. 17, 1998.
This research was supported by research grant HD19426 from the National Institute of Child Health and Human Development, and in part by NSF DUE-935-1476 and by a grant from the UAB Cognitive Science Program. An earlier version of this paper was presented in the Symposium "New Themes in Strategy Development" at the Meeting of the Society for Research in Child Development, March, 1995.
Department of Psychology and
Civitan International Research Center
University of Alabama at Birmingham
Department of Computer and Information Science
University of Alabama at Birmingham
Department of Psychology and
Civitan International Research Center
University of Alabama at Birmingham
Mailing address: Department of Psychology and Civitan International Research Center, SC 313, University of Alabama at Birmingham, Birmingham, AL 35294. Phone: (205) 934-9768, FAX: (205) 975-6330. Send Internet email to: bray@cis.uab.edu
The focus of this paper is on mechanisms that may be responsible for intellectual and developmental differences in the cognitive strategies of typical and atypical children, including those with mental retardation. The discussion of these mechanisms is based on behavioral experiments on external memory strategies and on a set of neural network models designed for these tasks. Following a review of the external memory research, we review the rationale for using neural network models, how such models have been used in other research, and their specific application to intellectual and developmental differences in external memory, including the results of several simulations. This is followed by a discussion of the mechanisms of intellectual differences and developmental change included in the models and of some challenges for this type of modeling. Neural network modeling is discussed as an asset to research on cognitive development.
This paper describes the initial results of a program of research to develop a theoretical framework for characterizing the mechanisms that may be responsible for intellectual and developmental differences in the cognitive strategies of typical and atypical children, including children with mental retardation. This framework is based on a set of neural network models and on behavioral experiments designed to test implications of the models. The interaction between empirical research and computer models is designed to suggest new psychological mechanisms and biologically plausible processes that may account for intellectual and developmental differences in strategy use. Our interrelated neural networks build on successful models of behavioral data (e.g., Anumolu, Bray, & Reilly, 1993; Reilly, Bray, Villa, & Anumolu, 1993; Reilly, Bray, Villa, Caniglia-Reilly, & Golding, 1996) and on our previous empirical research on cognitive strategies (e.g., Bray, Saarnio, Borges, & Hawk, 1994; Bray & Turner, 1986, 1987; Fletcher & Bray, 1995).
Empirical research on the development of problem-solving strategies in typical and atypical children underscores the importance of strategies in nearly all aspects of higher-order learning. It is generally assumed that developmental changes in memory performance are due, in part, to increases in strategy use and to growth of the knowledge base (Brown, Bransford, Ferrara, & Campione, 1983; Siegler, 1991). There is also increasing agreement that the precursors of strategy development may be found in early attempts to remember where objects are placed. Very young children use external orientation (touching, pointing, etc.) in tasks requiring memory for the location of objects in the environment. For instance, during a hide-and-go-seek game, children 18 to 24 months of age will look at the hiding place, point to it, hover near it, and even peek at it during a delay interval (DeLoache, Cassidy, & Brown, 1985). Three-year-old children will look at a target, point to it, or touch it as a means of remembering the location of a hidden object (Wellman, Ritter, & Flavell, 1975). Preschool children will also manipulate to-be-remembered objects, and this manipulation may increase recall accuracy (Baker-Ward, Ornstein, & Holden, 1984).
Older children and adults also use external memory strategies. For example, children report placing their lunch boxes and other school items in a special place (e.g., next to the door) (Kreutzer, Leonard, & Flavell, 1975), and many adults strategically place their briefcases and other items (again, next to the door) in order to remember to take them to school or work (Intons-Peterson & Fournier, 1986). In aging adults, the use of external aids for remembering has been shown to be widespread, and there are many commercially produced items to assist the elderly with memory difficulties (e.g., pill organizers and timers; Petro, Herrmann, Burrows, & Moore, 1991). These examples illustrate that people use external strategies across the lifespan. Each type of external memory strategy or physical aid, by eliminating the need to covertly code memory for an object or event, reduces the processing demands on the cognitive system and facilitates adaptation to the environment. Our research, by focusing on how children hold, move, and/or arrange objects in the environment in their attempts to remember, is designed to understand how such adaptive behaviors change with age and intellectual level.
The goals of our research have included the development of a theoretical framework that is more explicit than traditional theories of strategy development, that can be implemented in computer simulations, and that is constrained, in part, by neurobiological structures and processes related to cognition. Our program of research began by building on Siegler's theory of strategy development. Although Siegler's approach is not biologically motivated, it has been implemented in a computer simulation (Siegler, 1991; Siegler & Jenkins, 1989; Siegler & Shipley, in press). We agree with Siegler's emphasis on the importance of going beyond merely descriptive approaches that outline developmental differences in strategy use to understanding the mechanisms that may be responsible for strategy change. We extend Siegler's approach to account for strategy development in children with mental retardation, but depart from his simulations by developing neural network models rather than simulations relying on statistical regression models. In the connectionist vein, we believe that the representation and processing of knowledge in neural networks comes closer to that in the brain than do schemes such as statistical regression models. Moreover, neural networks exhibit characteristics such as adaptability and generalization that are evident in the evolution of strategies in humans and may be based on learning mechanisms that are biologically plausible (e.g., Hebbian learning rules).
Ours is the only systematic program of research known to us with the goal of devising a computational model of strategy development in typical children and atypical children with mental retardation. Neural network models of intellectual and developmental differences in strategy use would be of considerable theoretical interest and could be of great practical value because children with mental retardation seem to have a pronounced deficit in this aspect of cognition (Bray & Turner, 1986, 1987).
We will proceed by describing the empirical tasks we have used to investigate external memory strategies in typical and atypical children, and then provide a brief overview of why we are committed to developing neural network models of strategy development. Next we describe our models and the theory they entail, followed by a summary of simulations addressing the correspondence between the output of our models and the strategy behavior of children, and of simulations that demonstrate the operation of a variety of mechanisms of strategy development. Finally, we offer some concluding remarks about the significance of our modeling efforts and their implications for understanding strategy development.
Description of the External Memory Tasks
Three different external memory tasks have been used in the empirical research and simulated with our neural network models. The first, the instruction following task, was developed by Bray, Saarnio, Borges, and Hawk (1994). As shown in Figure 1, there were 12 movable objects and 6 fixed targets. Children heard sequences of verbal instructions such as "Put the shoe on the chair" and "Put the comb in front of the refrigerator". The children then used a variety of strategies to remember where to move the objects, including picking up objects, holding objects, moving objects, and arranging objects with respect to the target. As a result of developing our neural network models, we now analyze these external memory strategies according to three categories: (a) object encoding strategies (holding objects, or moving objects with no alignment toward the target), (b) object-target encoding strategies (moving objects with alignment toward targets), and (c) object-target-relation encoding strategies [moving objects with alignment toward targets and arranging the objects to code their relation to the target; when the preposition is "on", the object is placed on a yellow wooden board dividing the display (Figure 1), but when the preposition is "in front of", the object is placed in front of the yellow wooden divider].
The second task is the relations memory task developed by Fletcher and Bray (1995) and used in studies by Bray, Fletcher, Huffman, Hawk, and Ward (1994) and Bray, Fletcher, Huffman, Hawk, Ward, and Blair (1996). The task involves relating an object to other objects rather than to fixed targets and is similar to the relations memory tasks used in Johnson-Laird's (1983) research on mental models. In our implementation, the task allows the child to "externally" represent the mental model. The memory task was embedded in a tape-recorded story in which the participant was guided through a "haunted house" by a "friendly ghost". On each trial the participant heard from one to seven sentences such as "The broom is above the ghost" and "The lamp is on the blue side of the broom." At the end of the sequence, the participant placed the miniature objects on a computer screen (see Figure 2). As in the instruction following task, the children used pointing, holding, moving, and arranging strategies in their attempts to remember. These are classified as object encoding (holding or moving the object with no relation to the others) and object-object-relation encoding (placing the objects in relation to others in an arrangement that mimics the to-be-remembered array on the computer screen). Many of these strategies are analogous to everyday memory strategies, such as the practice, mentioned earlier, of putting something by the door in the morning in order to remember to take it to school.
The third task is a simplified version of the instruction following task for preschool children (Fletcher & Bray, in press). This task is similar to the one developed by Bray, Saarnio, Borges, and Hawk (1994) in that children listen to a sequence of tape-recorded sentences (e.g., "The book is on the bed; The ball is on the rug") and then move miniature objects to their specified target location. The preschool version of the instruction following task, however, used fewer miniature objects and targets and fewer relations among the objects and targets than were used in the instruction following task for school-aged children.
These tasks and these strategies are significant because, unlike the largely verbally based tasks used in most studies of the strategy abilities of young typical children and children with mental retardation, our tasks allow the use of non-verbal (external) strategies. Further, our external memory tasks allow multiple strategies, whereas most previous tasks in this area focus on only one strategy. We have found that a more comprehensive picture of children's abilities requires tasks in which multiple strategies are possible, because children use not just one, but a variety of strategies.
The instruction following task, the relations memory task, and the preschool version of the instruction following task differ in the number of sentences presented per trial, the type of strategies used, and the number of objects, targets, and relations to be remembered. Our neural network models have been developed to accommodate this range of variation in task demands and may be readily adapted to other variations in task demands.
The General Function of Computer Models and their Utility for Strategy Development
One important goal of this program of research is to develop a theoretical framework for developmental differences in strategy competence. This goal includes the description of developmental differences in external strategy use but goes beyond the descriptive level in attempting to make a contribution to understanding possible cognitive mechanisms that may be responsible for developmental differences in strategy use. We believe that discovery of these mechanisms will require a level of theoretical explicitness not characteristic of extant empirical studies. For this reason, we are committed to the development of computer models as an important methodological tool that will aid us in the development of a theoretical framework for addressing the mechanisms responsible for developmental changes in typical children and children with mental retardation.
The development of computer models requires that an investigator specify, in detail, the components and mechanisms of the theory relevant to the behavior to be simulated. Investigators in Artificial Intelligence (AI) frequently claim that the only way to test a theory of cognition is to express it in the form of a computer program and to demonstrate that the simulated behavior is similar to actual behavior (e.g., Newell, Young, & Polk, 1993). If we are to understand the mechanisms of strategy use, it seems particularly important to develop a more explicit theory of strategy development, and computer simulation is one methodological tool that can move the area in this direction.
Why Focus on Neural Network Models?
As mentioned, neural networks exhibit characteristics such as adaptability and generalization that are evident in the strategy evolution among humans and may be based on learning mechanisms that are biologically plausible. Another compelling reason to focus on neural networks is that these models, like the human brain, respond to multiple simultaneous constraints (Rumelhart & McClelland, 1988). The neurons of the brain respond almost continuously to a variety of environmental and internal patterns of stimulation; this is also a fundamental property of neural networks. Similarly, strategies are devised in nearly endless varieties in response to changes in context. As noted by Rogoff (1990), an emerging view of cognition is that it involves the use of multiple constraints and the resources provided by a context.
Neural networks also provide a way of looking at the development of rule-like behavior without assuming that the "rules" are "in the child's head" and are either used or not used because of some "meta" knowledge. Rather, rule-like behavior is in response to learning under multiple constraints and being tested under conditions with the same or similar constraints. Explicit representation of rules, which has been the focus of much effort in developmental psychology, may simply not be necessary to understand strategy development.
Other Research Programs Using Neural Network Models
Several other research programs have used neural network models successfully. While an exhaustive treatment of this issue would exceed the space available here (for overviews see Anderson, Pellionisz, & Rosenfeld, 1990; Anderson & Rosenfeld, 1988; Bechtel & Abrahamsen, 1991; Carpenter & Grossberg, 1991; Churchland & Sejnowski, 1991; Levine, 1989, 1991; Seidenberg, 1993; Sejnowski, Koch, & Churchland, 1990; Wasserman, 1989), a large number of investigators have used neural network models to better understand typical and atypical development and other aspects of cognition (McClelland, 1989). In fact, some of the early psychological issues to which neural network models were applied involved developmental issues such as how typical children might learn past tenses of English verbs (Plunkett & Marchman, 1991; Rumelhart & McClelland, 1986, 1987). These models led to the development and empirical tests of other neural network models of language development such as vocabulary (Plunkett & Sinha, 1991), learning to pronounce English words (Sejnowski & Rosenberg, 1987) and the transition from beginning to skilled reading (Seidenberg & McClelland, 1989). Neural networks have also been successfully applied to other aspects of cognitive development such as developmental changes in the judgment of balance using the balance beam task (McClelland & Jenkins, 1991), the development of the concept of same (Smith, 1993), and the learning of simple addition (Anderson, Spoehr, & Bennett, in press; Campbell & Oliphant, 1992; McCloskey & Lindemann, 1992). Additionally, neural network models have proven to be useful tools in studies of atypical development as illustrated by the application of these models to the differential diagnosis of autism and mental retardation (I. Cohen, in press; I. Cohen, Sudhalter, Landon-Jimenez, & Keogh, 1993), and in the computer simulation of dyslexia (Hinton & Shallice, 1991; Patterson, Seidenberg, & McClelland, 1989).
Neural network models have also been used as tools in research on neurological disorders, including prosopagnosia, in which patients cannot overtly recognize faces but can demonstrate some level of recognition when tested with indirect measures (Farah, O'Reilly, & Vecera, 1993), certain frontal lobe dysfunctions (Levine & Leven, 1989), and impairments of semantic memory due to brain lesions (Farah & McClelland, 1991). The research programs mentioned here (which represent a small fraction of the applications of neural network models to aspects of cognition) have used neural network models to implement a variety of psychologically and biologically motivated mechanisms and architectures. The results from atypical development, in particular, bode well for the success of the present research program. For example, I. Cohen (in press), in modeling the learning abilities of children with autism, has used neural networks to model possible consequences of having too many or too few neuronal connections. His results indicated that models with too few connections led to problems in discrimination learning and poor generalization, whereas models with too many led to good discrimination but poor generalization, the latter being the pattern typically observed in children with autism. His simulations may lead to additional work on the hypothesis of an abnormally large number of neurons in the brains of children with autism.
A second encouraging example of an application of neural networks to an aspect of atypical development is that of Hinton and Shallice (1991), who imposed artificial lesions ("removal" of connections by fixing their weights at zero) in neural networks trained to decode letter strings. The "damaged" networks exhibited error patterns that were similar to those obtained in individuals with dyslexia. Virtually the same results were obtained regardless of where the "damage" was sustained. This brief overview indicates that neural network models have been successfully applied to a variety of problems in cognition, including problems of atypical development.
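Lesioning of this kind is easy to state in code. The sketch below is a minimal illustration, assuming a trained weight matrix stored as nested Python lists and a hypothetical helper named `lesion`; it is not Hinton and Shallice's implementation, and the 30% lesion fraction is an arbitrary value chosen for the example.

```python
import random

def lesion(weights, fraction, seed=0):
    """Return a copy of `weights` in which a random `fraction` of the
    connections has been "removed" by fixing its weight at zero.
    Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    positions = [(i, j) for i, row in enumerate(weights) for j in range(len(row))]
    n_cut = round(fraction * len(positions))      # how many connections to remove
    cut = set(rng.sample(positions, n_cut))       # which connections to remove
    return [[0.0 if (i, j) in cut else w for j, w in enumerate(row)]
            for i, row in enumerate(weights)]

# Illustrative "trained" weights from 3 input nodes to 2 internal nodes.
trained = [[0.8, -0.3, 0.5], [0.1, 0.9, -0.7]]
damaged = lesion(trained, fraction=0.3)  # removes 2 of the 6 connections
```

The original matrix is left untouched, so the intact and "damaged" networks can be compared on the same inputs.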
General Properties of Neural Network Models
Neural network models are computer programs consisting of a system of interconnected artificial neurons (nodes) constrained, in part, by the metaphor of how the brain operates. As in the brain, each node receives input from one or more other nodes in the system. For example, in Figure 3, the ellipses at the left represent input nodes, and the values IN_m represent inputs from the external environment. Each connection from an input node to an internal node ("artificial neuron") has a weight, W_mn, as shown in Figure 3. In the course of a simulation run, these weights are adjusted to reflect the relative importance of different patterns of external input or events. In effect, the weights represent differential experience and the effects of learning. Typically, these weights are updated after each learning cycle. The activation value of each input node is multiplied by the weight of its connection, and the activation value of the artificial neuron is the sum of these products (NET in Figure 3). This activation value is then passed through the "transfer function", F, and the transformed value is the output of the node. There are several types of transfer functions, including the sigmoid function (Figure 3) and the step function. In the case of a step function, the output of the node is zero if the activation value does not exceed a threshold value. The transfer function regulates the output of the node, keeping it within mathematically defined limits. After a number of training cycles that varies from problem to problem, the trained system generates output that simulates some aspect of intelligent behavior such as pattern recognition, categorization, or, in the case of our models of external memory strategies, strategy use and accuracy of recall.
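The computation performed by a single node can be sketched as follows. This is a minimal illustration under assumed values: the inputs, weights, and zero threshold are hypothetical, and the function names are ours rather than those of any particular simulation package.

```python
import math

def net_input(inputs, weights):
    """NET: the sum of each input activation IN_m times its connection weight W_mn."""
    return sum(x * w for x, w in zip(inputs, weights))

def sigmoid(net):
    """Sigmoid transfer function: squashes NET into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def step(net, threshold=0.0):
    """Step transfer function: output is 0 unless NET exceeds the threshold."""
    return 1.0 if net > threshold else 0.0

def node_output(inputs, weights, transfer=sigmoid):
    """Output of one artificial neuron: the transfer function F applied to NET."""
    return transfer(net_input(inputs, weights))

inputs = [1.0, 0.0, 1.0]     # IN_1..IN_3 (hypothetical values)
weights = [0.5, -1.2, 0.25]  # W_1n..W_3n (hypothetical values)
print(net_input(inputs, weights))                   # 0.75
print(node_output(inputs, weights, transfer=step))  # 1.0
```

With these values NET is 0.75, which exceeds the threshold, so the step function outputs 1.0; the sigmoid would instead output a graded value of about 0.68.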
Neural network models fall into two broad families (Rumelhart & McClelland, 1988). In local representation models, each node stands for a complex concept. In such models the representation of the sentence "Put the eraser on the table" would consist of the activation of the nodes for "eraser", "on", and "table" (with the other elements implicit). In these models, it is understood that each node represents a complex system of nodes not yet explicitly developed. Nevertheless, there may be a great deal to be gained by understanding the systemic properties of these networks before developing the details of how each of these sub-networks might be configured. These models have the drawback, however, of being less biologically plausible. Complex concepts, for example, do not appear to be localized in the brain (with the possible exception of some cells for aspects of facial recognition).
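A local code can be sketched as one node per concept. The six-concept vocabulary below is hypothetical, chosen only to cover the example sentence; a model of the instruction following task would need a node for each of its objects, targets, and relations.

```python
# Hypothetical vocabulary: in a local code, each node stands for one whole concept.
CONCEPTS = ["eraser", "pencil", "on", "in front of", "table", "chair"]

def local_code(*active):
    """Activate exactly the nodes whose concepts are named; all others stay at 0."""
    return [1 if concept in active else 0 for concept in CONCEPTS]

def active_concepts(pattern):
    """Each active node can be read directly as the concept it stands for."""
    return [c for c, a in zip(CONCEPTS, pattern) if a]

# "Put the eraser on the table": the nodes for "eraser", "on", and "table".
sentence = local_code("eraser", "on", "table")
print(sentence)                   # [1, 0, 1, 0, 1, 0]
print(active_concepts(sentence))  # ['eraser', 'on', 'table']
```

The ease of reading such a pattern back into concepts is what makes local models interpretable at the systemic level.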
In distributed representation models, each node represents a feature of a stimulus which may or may not be readily identified. For example, a distributed representation of "eraser" might involve activation of the first and second node but not the third node of a system (e.g., 1,1,0). In this system there is no one node representing "eraser"; rather the representation is distributed across the first three nodes. If the first seven nodes of the system were used to represent sentences of a particular type, the sentence "Put the eraser on the table" might then be represented by the activation of a particular pattern of nodes (e.g. 1,1,0,1,0,0,1). In this sense, the representation of the sentence is distributed across seven different "on-off" nodes.
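The contrast with a distributed code can be sketched in the same way. Only the pattern (1, 1, 0) for "eraser" and the seven-node sentence pattern come from the text; the codes for "pencil" and "comb" and the helper names are hypothetical.

```python
# Hypothetical 3-node distributed object codes; only "eraser" = (1, 1, 0) is from the text.
OBJECT_CODES = {
    "eraser": (1, 1, 0),
    "pencil": (1, 0, 1),
    "comb":   (0, 1, 1),
}

def overlap(a, b):
    """Number of nodes active in both patterns: a crude similarity measure."""
    return sum(x & y for x, y in zip(a, b))

def decode_object(pattern):
    """Recover the object from the first three nodes of a sentence pattern.
    No single node identifies "eraser"; only the whole pattern does."""
    first_three = tuple(pattern[:3])
    for name, code in OBJECT_CODES.items():
        if code == first_three:
            return name
    return None

sentence_pattern = (1, 1, 0, 1, 0, 0, 1)  # the seven-node pattern from the text
print(decode_object(sentence_pattern))                          # eraser
print(overlap(OBJECT_CODES["eraser"], OBJECT_CODES["pencil"]))  # 1
```

Because every node is shared by two of the objects, the patterns overlap; this gives graded similarity between concepts, and it is also why a single node cannot be read as a concept.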
Distributed representation models are regarded as more biologically plausible in the sense that representations of complex concepts are distributed. No claim is made, however, that the distribution of activation in the brain is the same as the distribution in neural network models. Nonetheless, distributed representation models allow the exploration of the implications of putatively more brain-like representation and processing. The disadvantage of this type of model is that the particular pattern of activations cannot always be readily identified with concepts unless the features represented by each node can be specified a priori (which in most cases, especially in large monolithic neural networks, is not possible).
In our neural network modeling, we have developed local representation models, distributed representation models, and models with both local representation and distributed representations. In general, each family represents a different level of theoretical explanation, local being more symbolic (systemic), and distributed being more "subsymbolic" (Smolensky, 1988). At this point, there are no clear guidelines for which level of model development will be the most informative for understanding the mechanisms of strategy development [see Sun (1995) for an extensive discussion of modelling levels]. Thus, we have developed models that represent both families, although we have, to date, devoted more effort to local representation models, some with a combination of local and distributed representations.
Neural Network Models of External Memory Strategies
In this section, we provide a brief description of each of the models that we have developed. The family of models using local representation of concepts, listed in order of increasing complexity and in the order in which they were developed, includes the Sequencer, Sequencer/Associator, Novelty Bias, and Components models. Each of these models is embedded in the Generalized Components/Attention Bias model (Figure 4). The description of these local representation models is followed by a description of a model with both local and distributed representations, the Generalized Strategy Abstractor model (Figure 5). The construction of these models has been modular in the sense that the models consist of distinct, interrelated components (Hrycej, 1992), each designed to represent one aspect of strategy behavior. The development of each module was constrained (a) by the tasks used in our empirical research to study external memory, (b) by prior empirical and theoretical concepts drawn from developmental psychology, (c) by prior neural network research, and (d) loosely, by basic aspects of neurobiology.
Sequencer Model. The Sequencer Model (Anumolu, Reilly, & Bray, 1992a) is a neural network model designed to represent knowledge of a sequence of events (see Sequencer Module, Figure 4, Top). All of our empirical research on external memory involves the presentation of a sequence of sentences such as "Put the eraser on the table; Put the pencil in front of the chair" (Bray, Saarnio, Borges, & Hawk, 1994), or "The coin is above the ghost; The lamp is on the blue side of the coin" (Fletcher & Bray, 1995), with 1 to 7 sentences presented per trial. One of the first steps in our program of developing neural network models was to devise a network that would represent a sequence.
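One standard way for a network to hold a sequence is associative chaining, in which each item's pattern is linked by Hebbian (outer-product) weights to the pattern of the item that follows it. The sketch below illustrates that general technique with one-hot item codes and hypothetical item indices; it is not claimed to be the architecture of the Sequencer model.

```python
def one_hot(k, n):
    """Pattern with only node k active."""
    return [1 if i == k else 0 for i in range(n)]

def outer(post, pre):
    """Hebbian outer product: weight[i][j] = post[i] * pre[j]."""
    return [[p * q for q in pre] for p in post]

def add_matrices(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def learn_chain(items, n):
    """Store the link item_k -> item_(k+1) for each adjacent pair in the sequence."""
    w = [[0] * n for _ in range(n)]
    for cur, nxt in zip(items, items[1:]):
        w = add_matrices(w, outer(one_hot(nxt, n), one_hot(cur, n)))
    return w

def recall_chain(w, first, length):
    """Replay the sequence from its first item, taking the most active node each step."""
    seq, cue = [first], one_hot(first, len(w))
    for _ in range(length - 1):
        out = matvec(w, cue)
        nxt = max(range(len(out)), key=out.__getitem__)
        seq.append(nxt)
        cue = one_hot(nxt, len(w))
    return seq

# Three sentences, coded as hypothetical item indices 4 -> 0 -> 2 (out of 6).
w = learn_chain([4, 0, 2], n=6)
print(recall_chain(w, first=4, length=3))  # [4, 0, 2]
```

Plain chaining of this kind breaks down when an item recurs within a list, which is one reason sequence models often add position or context units.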
Sequencer/Associator Model. This model (Anumolu, Reilly, & Bray, 1992b) begins with the sequencer module and adds an associative memory module (Figure 4, Top). The result is a modular neural network that learns and recalls representations of sentences like those used in our empirical work.
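An associative memory module of the general kind described here can be sketched as a linear Hebbian associator: cueing with an object pattern retrieves the target and relation stored with it. The entity pools, the outer-product learning rule, and the helper names are assumptions made for illustration, not the Sequencer/Associator model's actual implementation.

```python
# Hypothetical entity pools (one node per entity, i.e., a local code).
OBJECTS = ["shoe", "comb", "eraser"]
TARGETS = ["chair", "refrigerator", "table"]
RELATIONS = ["on", "in front of"]

def one_hot(item, pool):
    return [1.0 if x == item else 0.0 for x in pool]

def store(sentences):
    """Hebbian (outer-product) memory: weights from the object pool to the
    concatenated target + relation pools, summed over all stored sentences."""
    n_out = len(TARGETS) + len(RELATIONS)
    w = [[0.0] * len(OBJECTS) for _ in range(n_out)]
    for obj, rel, tgt in sentences:
        post = one_hot(tgt, TARGETS) + one_hot(rel, RELATIONS)
        pre = one_hot(obj, OBJECTS)
        for i in range(n_out):
            for j in range(len(OBJECTS)):
                w[i][j] += post[i] * pre[j]
    return w

def recall(w, obj):
    """Cue with an object; read out the most active relation and target nodes."""
    cue = one_hot(obj, OBJECTS)
    out = [sum(wij * cj for wij, cj in zip(row, cue)) for row in w]
    t_out, r_out = out[:len(TARGETS)], out[len(TARGETS):]
    return (RELATIONS[r_out.index(max(r_out))], TARGETS[t_out.index(max(t_out))])

memory = store([("shoe", "on", "chair"), ("comb", "in front of", "refrigerator")])
print(recall(memory, "comb"))  # ('in front of', 'refrigerator')
```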
Novelty Bias Model. Like the Sequencer/Associator model, this neural network (Anumolu, Bray, & Reilly, 1993) includes the sequencer and associative memory modules of the Generalized Components/Attention Bias model (Figure 4, Top) and adds four additional modules representing (a) strategies, (b) attention/bias, (c) accuracy/attention, and (d) trial initiation (Figure 4).
The external strategy module consists of three nodes, each with selective connections to the entities in the object, target, and relation pools (Entity Pools 1, 2, and 3, respectively in Figure 4) of the associative memory module. This selective connectivity is crucial for understanding how the model generates different levels of recall depending on the strategy used. Node 1, representing an object encoding strategy, is only connected to nodes in the object pool because only objects are involved in an object encoding strategy observed in our empirical research. Node 2 is connected to the nodes of object and target pools because both objects and targets are encoded with this strategy. Node 3 is connected to the nodes of all three pools because object-target-relation strategies encode objects, targets, and relations. All these connections are excitatory. This architecture means that when a strategy is activated, it raises the activation of the corresponding nodes in the relevant entity pools. Thus, when an object encoding strategy is activated, the activation value of the nodes of the object pool is raised, etc.
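The selective connectivity of the strategy module can be written down directly. The three connection sets below follow the text; the excitatory weight of 0.5, the resting activations, and the helper name `boost_pools` are hypothetical.

```python
# Connectivity of the three external-strategy nodes (all excitatory), as in the text:
# node 1 = object encoding, node 2 = object-target encoding,
# node 3 = object-target-relation encoding.
STRATEGY_CONNECTIONS = {
    1: {"object"},
    2: {"object", "target"},
    3: {"object", "target", "relation"},
}

def boost_pools(strategy, pools, weight=0.5):
    """Raise the activation of every node in each entity pool that the active
    strategy node connects to; unconnected pools are left unchanged.
    `weight` is a hypothetical excitatory connection weight."""
    connected = STRATEGY_CONNECTIONS[strategy]
    return {name: [a + weight for a in nodes] if name in connected else list(nodes)
            for name, nodes in pools.items()}

# Hypothetical resting activations for a few nodes in each entity pool.
pools = {"object": [0.1, 0.0, 0.2], "target": [0.0, 0.1], "relation": [0.0, 0.0]}
after = boost_pools(2, pools)  # object-target strategy: relation pool is not raised
```

Because only the pools a strategy connects to are raised, recall of relations can benefit only when the object-target-relation node wins, which is how the architecture ties recall level to strategy.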
Novelty bias is a mechanism which represents the empirical observation that adults and children will try new strategies even though the strategies they have used previously are successful (e.g., Siegler & Jenkins, 1989). When a novelty bias unit is activated (an attention/bias node in Figure 4), the corresponding external memory strategy node receives an increase in its activation that is a random proportion of the weight of the connection between the novelty bias node and the external strategy node. Over simulated trials (epochs), however, there is a steady decay in the connection weights, meaning that the "bias" toward this once "novel" strategy is decreasing, making it more likely that the other strategies will successfully compete for execution.
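The novelty bias mechanism can be sketched as a connection weight that contributes a random proportion of itself on each trial and then decays. The initial weight, the multiplicative decay factor of 0.9, the baseline activation of 0.2, and the class name are assumptions of this sketch; the text specifies only a random proportion of the weight and a steady decay across epochs.

```python
import random

class NoveltyBias:
    """One novelty-bias connection feeding one external-strategy node.
    The initial weight, decay factor, and seed are hypothetical values."""

    def __init__(self, weight=1.0, decay=0.9, seed=0):
        self.weight = weight
        self.decay = decay
        self.rng = random.Random(seed)

    def boost(self):
        """Activation added to the strategy node: a random proportion
        (uniform in [0, 1)) of the current connection weight."""
        return self.rng.random() * self.weight

    def end_epoch(self):
        """Decay the connection weight after each simulated trial (epoch).
        Multiplicative decay is an assumption of this sketch."""
        self.weight *= self.decay

bias = NoveltyBias()
for epoch in range(5):
    activation = 0.2 + bias.boost()  # 0.2: hypothetical baseline activation
    bias.end_epoch()
```

As the weight shrinks, the largest possible boost shrinks with it, so the once-novel strategy gradually loses its advantage in the competition among strategies.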
The accuracy-feedback module is motivated by the computer simulation of Siegler and Shipley (in press) for early addition strategies. Each node has one connection to an "external teacher" which keeps track of whether recall was correct (this external teacher will eventually be replaced by another neural network module).
The model implements a "winner take all" mechanism in which the strategy with the highest activation value at the beginning of a trial remains active, and the other activation values drop to zero, simulating the use of only one strategy per trial (which is what we have observed during most trials of our empirical research). This aspect of the simulation models the process of "mutual inhibition", similar to that in the "competitive learning" neural networks of Grossberg (1976) and Rumelhart and Zipser (1985), which is an important aspect of some types of neural processing in the brain. It is also similar to the inhibition mechanisms discussed by Bjorklund and Harnishfeger (1995).
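Winner-take-all through mutual inhibition can be illustrated with the classic MAXNET scheme, in which each node is inhibited in proportion to the summed activation of the others until only one remains active. This is one standard implementation of the idea, not necessarily the update rule used in the Novelty Bias model; the inhibition constant of 0.2 and the starting activations are hypothetical.

```python
def maxnet(activations, epsilon=0.2, max_iter=100):
    """Mutual inhibition: on each cycle every node is inhibited by epsilon times
    the summed activation of all the other nodes, and activations are clipped
    at zero. Cycles repeat until at most one node is still active."""
    a = list(activations)
    for _ in range(max_iter):
        if sum(1 for x in a if x > 0) <= 1:
            break
        total = sum(a)
        a = [max(0.0, x - epsilon * (total - x)) for x in a]
    return a

# Object, object-target, and object-target-relation strategy nodes at trial start.
final = maxnet([0.42, 0.61, 0.35])
winner = final.index(max(final))
print(winner)  # 1: the object-target strategy is executed on this trial
```

The inhibition constant should be smaller than 1 divided by the number of competing nodes; with exactly tied activations, this simple version never separates the tied nodes and stops only at the iteration limit.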
The novelty bias model generates several simulated behaviors that are similar to those observed in our empirical studies of external memory strategies and in Siegler's studies of early addition strategies (Siegler & Jenkins, 1989). First, once a strategy "emerges", it is likely that it will not be used exclusively, and the simulated child will use less sophisticated strategies such as object encoding after using an object-target encoding strategy as we observed in our empirical work and as Siegler observes in his research on addition strategies. Second, accuracy of recall for our simulated children increased with the sophistication of the strategy, and there were primacy and recency effects in recall as obtained in our empirical research.
Components Model. In our modular approach, the Components model (Anumolu, Bray, & Reilly, in press) consists of the same modules as the Novelty Bias model (see Figure 4) with the deletion of the novelty bias module and the corresponding addition of a module representing tactics and a dimensional encoding mechanism. The tactics module represents our theoretical construction that the tactics involved in strategies in general, and external strategies in particular, have a hierarchical structure. The dimensional encoding mechanism was motivated by theories which maintain that as children mature they encode an increasing number of dimensions of tasks and events (e.g., Halford, 1993). In these theories, the child begins by encoding information about only one dimension and moves to encoding two and then three dimensions.
In our theoretical framework, the significance of the hierarchical nature of the tactics is that as children perform the external memory task, they perform actions very similar to those necessary to construct strategies. That is, when responding to the sentence "Put the eraser on the chair" the child picks up the eraser (grasping tactic), moves it toward the chair (moving tactic), and places it on the chair (arrangement tactic). With experience, children parse the components of the response chain and begin executing parts of the response chain in anticipation of the actual response. Young children begin executing the first component of this chain while listening to sentences by picking up a to-be-remembered object and holding it until the end of the sequence of sentences. Older children also execute the first and second tactics, moving the objects toward the target while listening to the sentences. With practice, older children and adolescents may execute all three tactics, placing the object on the yellow divider to represent the relation "on" or in front of the divider to represent "in front of", and thus devise an object-target-relation strategy.
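The hierarchical account can be sketched as a mapping from the part of the response chain a child executes during presentation to a strategy category. The mapping follows the text; the label for executing no tactics and the helper name are hypothetical, and the sketch ignores the distinction between moving an object with and without alignment toward the target.

```python
# The response chain for "Put the eraser on the chair", in order (from the text).
TACTICS = ("grasp", "move", "arrange")

# Strategy category produced by executing the first k tactics while listening.
STRATEGY_FOR_PREFIX = {
    0: "no external strategy",             # hypothetical label
    1: "object encoding",                  # pick up and hold the object
    2: "object-target encoding",           # also move it toward the target
    3: "object-target-relation encoding",  # also arrange it to code "on" / "in front of"
}

def anticipated_strategy(tactics_executed):
    """Count how much of the chain, from its start and in order, was executed.
    Only an unbroken prefix counts, reflecting the hierarchical structure."""
    k = 0
    for expected, done in zip(TACTICS, tactics_executed):
        if expected != done:
            break
        k += 1
    return STRATEGY_FOR_PREFIX[k]

print(anticipated_strategy(["grasp"]))                     # object encoding
print(anticipated_strategy(["grasp", "move"]))             # object-target encoding
print(anticipated_strategy(["grasp", "move", "arrange"]))  # object-target-relation encoding
```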
Our view is that the mechanism that underlies the "discovery" of these types of strategies is one in which the child parses the response chain required by the tasks and attends to an increasing number of elements of the response chain necessary for making a response. This is quite different from thinking of strategies as being "in the child's head"; rather, strategies evolve because the child attends to the appropriate aspects of the context provided by the task. Strategy evolution is in response to the multiple constraints and resources provided by the context that direct the child's attention to the relevant aspects of the task.
Similar processes may be involved in other strategies such as rehearsal in a sequential memory task, which involves labeling and sequencing as part of the response (Bray & Turner, 1986). In this situation, strategy evolution begins with simple labeling, moves to repeated labeling, and ends with sequenced (cumulative) labeling. This evolution may be seen as anticipatory of the required response: recall of the items in their order of presentation requires that each item initially be "labeled", and cumulative rehearsal requires repeated labeling of the to-be-remembered items in sequence.
The dimensional encoding mechanism in the Components model consists of a trial initiation signal which triggers excitatory input from the simulation program to the accuracy nodes (called accuracy-attention nodes, bottom, Figure 4) to start the strategy selection phase. The object, target, and relation accuracy-attention nodes receive an increasing amount of input depending on the trial number (epoch) corresponding to the child attending to the tactics involved in execution of the response. The magnitude of excitatory input depends on curvilinear monotonic functions herein called "T-curves" (to denote change across time). The T-curve controlling input to the object accuracy-attention nodes rises to asymptote faster than the other functions, meaning that in the early trials, there is more excitatory input to the object accuracy-attention units. The T-curve for controlling input to the accuracy-attention nodes for the targets rises to asymptote second, and that for relation rises to asymptote last. The T-curves thus represent increased attention by the simulated child to the tactics involved in making the required response as outlined in our theoretical framework.
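The T-curve mechanism can be illustrated with a minimal sketch. The text specifies only that the curves are curvilinear, monotonic, and ordered (object input reaches asymptote first, then target, then relation); the saturating exponential form, the rate constants, and the names `t_curve` and `attention_inputs` are illustrative assumptions, not the functions used in the Components model.

```python
import math

# Hypothetical rate constants: only their ORDER (object > target > relation)
# comes from the text; the values and the exponential form are assumptions.
T_CURVE_RATES = {"object": 0.20, "target": 0.08, "relation": 0.03}


def t_curve(epoch: int, rate: float, asymptote: float = 1.0) -> float:
    """Monotonic, saturating input at a given epoch (0 at epoch 0, approaching `asymptote`)."""
    return asymptote * (1.0 - math.exp(-rate * epoch))


def attention_inputs(epoch: int) -> dict:
    """Excitatory input to each accuracy-attention pool once the trial initiation signal fires."""
    return {entity: t_curve(epoch, rate) for entity, rate in T_CURVE_RATES.items()}


for epoch in (1, 10, 40, 100):
    print(epoch, {k: round(v, 3) for k, v in attention_inputs(epoch).items()})
```

In early epochs the object pool receives the most input, so object encoding is favored first; by late epochs all three inputs approach the same asymptote, which leaves room for the relation to be encoded as well.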
Like the Novelty Bias model, the Components model generates several simulated behaviors that are similar to those observed in our empirical studies of external memory strategies and in Siegler's studies of early addition strategies (Siegler & Jenkins, 1989). First, in most simulation runs, the object, object-target, and object-target-relation strategies emerge in that order, as observed in our empirical research. Second, as in the Novelty Bias model, once a strategy "emerges", it is likely that it will not be used exclusively, and the simulated child will occasionally use less sophisticated strategies such as object encoding after using more sophisticated strategies, such as an object-target encoding strategy. Third, accuracy of recall for our simulated children increases with the sophistication of the strategy, and there are primacy and recency effects in recall.
Generalized Components/Attention Bias Model. This model includes all the modules shown in Figure 4 and is designed to include the best features of each of the other models, including an adaptation that provides a capability of responding quickly to changes in the environment. In general, neural network models seem well suited for modeling systemic changes that occur gradually, as in the shift from object encoding to object-target-relation encoding in the Components model. However, in most neural network models of aspects of cognitive development (i.e., monolithic, multilayered backpropagation networks), it is difficult to model sudden, discrete changes in the environment such as instructions or verbal cues given by the experimenter (Schneider & Oliver, 1991). The Generalized Components/Attention Bias model overcomes this limitation with the flexibility of the "attention bias units." The attention/bias module consists of the three attention bias nodes that were also included in the Novelty Bias model. The weights for the connections from the attention bias nodes to the strategy nodes represent the degree of attention given to a particular strategy. Output from the attention bias nodes is always excitatory and may increase or decrease across trials depending on whether the simulation involves increasing or decreasing attention. A response to an environmental change such as an instruction by an experimenter could be simulated by an increase in output from the attention bias nodes to the strategy nodes. Novelty bias, on the other hand, could be simulated as an initial increase in attention (output from the attention bias nodes) which subsequently decreases.
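A minimal sketch of the two attention-bias trajectories just described, assuming simple hypothetical functional forms: a step increase when an instruction is given, and a rise followed by geometric decay for novelty bias. The function names, the onset trial, the peak value, and the decay rate are assumptions chosen for illustration only.

```python
def instruction_attention(trial: int, onset: int, baseline: float = 0.1, boosted: float = 0.9) -> float:
    """Step increase in attention-bias output once an experimenter's instruction is given."""
    return boosted if trial >= onset else baseline


def novelty_attention(trial: int, peak: float = 0.8, rise_trials: int = 3, decay: float = 0.04) -> float:
    """Attention rises linearly over the first `rise_trials` trials, then decays geometrically."""
    if trial < rise_trials:
        return peak * (trial + 1) / rise_trials
    return peak * (1.0 - decay) ** (trial - rise_trials + 1)


def bias_input(weight: float, attention_output: float) -> float:
    """Input from an attention bias node to its strategy node; always excitatory (non-negative)."""
    return max(0.0, weight) * attention_output


print([round(novelty_attention(t), 3) for t in range(6)])
print([instruction_attention(t, onset=3) for t in range(6)])
```

Because both trajectories feed the strategy nodes through the same excitatory connections, one architecture can express either a sudden boost or a gradually fading one.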
All other aspects of the Generalized Components/Attention Bias model are the same as the Components model, making this model capable of sensitivity to slowly evolving strategies (as handled by the Components model) and also capable of rapid increases in attention to sudden environmental changes such as the experimenter's instructions, as well as providing an architecture to implement the novelty bias mechanism or similar decreases in attention.
Generalized Strategy Abstractor Model. The architecture of the Strategy Abstractor model (Reilly, Bray, Villa, & Anumolu, 1993) is shown in Figure 5. The input module is a local representation of the sentence presented and has excitatory input to each node of the sequencer module, which is similar to the sequencer modules of the other models. New instance nodes are added, motivated in part by the construction of the Interactive Activation and Competition (IAC) model of McClelland and Rumelhart (1981). These nodes introduce a somewhat more distributed representation of the to-be-remembered sentences. Each instance node is connected to a unique combination of sequencer nodes and to nodes corresponding to the objects, targets, and relations. Thus, the instance nodes are distributed "multidimensional memory traces."
The tactics module (Figure 5, left side) is similar to the tactics module of the Generalized Components/Attention Bias model, including its selective connectivity to the nodes of the object, target, and relation pools. A major difference is that more tactics are represented (allowing a more detailed representation of our empirical data), and a pool of order nodes has been added to represent coding of order of presentation not tied directly to the sequencer; this has proven important in simulating recall of sentences in the wrong order.
The Actions module (Figure 5, right side) represents the effector, translating the activation of the object, target, and relation information into a strategy. A graphics user interface tied to this unit allows the user to watch simulated movements of the objects corresponding to strategy choices made by the model. There is no local representation of the strategies; the representation is distributed in the sense that the simulated action (e.g., picking up an object, moving it and arranging it in front of the target) is a result of a particular pattern of activation of the tactics nodes.
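To illustrate what a distributed, non-local readout could look like, here is a minimal sketch in which the simulated action is determined by which tactic nodes fire. It borrows the grasping, moving, and arrangement tactics from the hierarchical account given earlier and the 0.1 firing threshold reported for the simulations; the three-tactic chain, the requirement that tactics fire in hierarchical order, and the names `decode_action` and `TACTIC_CHAIN` are assumptions, not the Actions module itself.

```python
FIRING_THRESHOLD = 0.1  # threshold shared by all units, per the simulation section

# Assumed simplification: three hierarchically ordered tactics (the Strategy
# Abstractor represents more tactics than this).
TACTIC_CHAIN = ("grasp", "move", "arrange")
STRATEGY_BY_DEPTH = {
    0: "no strategy",
    1: "object",
    2: "object-target",
    3: "object-target-relation",
}


def decode_action(tactic_activations: dict) -> str:
    """Read out an overt strategy from the pattern of active tactics.

    A tactic counts only if it fires and every tactic before it in the chain
    also fires, reflecting the hierarchical grasp, move, arrange structure.
    """
    depth = 0
    for tactic in TACTIC_CHAIN:
        if tactic_activations.get(tactic, 0.0) > FIRING_THRESHOLD:
            depth += 1
        else:
            break
    return STRATEGY_BY_DEPTH[depth]


print(decode_action({"grasp": 0.6, "move": 0.4, "arrange": 0.05}))  # object-target
```

No strategy label is stored anywhere in the network; the label is only a description of the pattern of tactic activity.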
This model includes the accuracy feedback mechanism of the Generalized Components/Attention Bias model with excitatory connections to the nodes of the tactics module. However, the model was motivated, in large part, by the goal of incorporating additional mechanisms of strategy development not yet represented in the other neural network models. Therefore, the model includes a "speed of execution" mechanism (see Table 1), similar to one in Siegler and Shipley's (in press) model of addition strategies: the longer it takes to execute a strategy, the less likely it is that the strategy will be used. Another mechanism, "subjective difficulty" (see Table 1), is represented by direct excitatory connections to the tactics module. This mechanism was motivated by Belmont and Mitchell's (1987) theory of strategy use, which maintains that there is an "optimum" level of subjective task difficulty, one that "challenges" the child. Tasks that are subjectively too "easy" or too "difficult" will result in little or no strategy use.
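The qualitative claims of these two mechanisms (slower strategies are used less; strategy use peaks at an intermediate subjective difficulty) can be sketched as multiplicative factors on a strategy's drive. This is a swapped-in simplification: in the Strategy Abstractor, subjective difficulty acts through direct excitatory connections to the tactics module, and the exponential speed discount, the Gaussian inverted-U, and all parameter values below are assumptions.

```python
import math


def speed_factor(execution_time: float, cost: float = 0.5) -> float:
    """Slower strategies are discounted: the factor shrinks as execution time grows."""
    return math.exp(-cost * execution_time)


def difficulty_factor(subjective_difficulty: float, optimum: float = 0.5, width: float = 0.2) -> float:
    """Inverted U: maximal at the optimum difficulty, low when a task feels too easy or too hard."""
    return math.exp(-((subjective_difficulty - optimum) ** 2) / (2.0 * width ** 2))


def strategy_drive(strength: float, execution_time: float, subjective_difficulty: float) -> float:
    """Hypothetical combined drive toward using a strategy on a trial."""
    return strength * speed_factor(execution_time) * difficulty_factor(subjective_difficulty)


for difficulty in (0.05, 0.5, 0.95):
    print(difficulty, round(strategy_drive(0.8, 1.0, difficulty), 3))
```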
Simulation Studies
The general goals of the simulation studies have been (a) to demonstrate that the neural network models are able to simulate the basic patterns that would be expected of strategy evolution; (b) to show that strategy/accuracy relations are obtained similar to those that would be expected in empirical studies of external memory strategies; (c) to evaluate how well the models simulate specific empirical data obtained from the instruction following tasks from groups of children differing in age and intelligence; (d) to determine how well the models are able to fit empirical data from the relations memory task and the preschool version of the instruction following task; and (e) to demonstrate the influence of mechanisms on simulated strategy development.
Simulations of Expected Trends in Strategy Evolution
Results from a typical simulation run of the Generalized Components/Attention Bias model are given in Table 2. The table displays the strategy selected and its strength for each trial. Discovery of a strategy occurs when its strength first exceeds the firing threshold of 0.1, which is the threshold for all units in the network. Subthreshold values simulate the lack of use of a strategy. A strategy unit affects an entity unit only when its activation exceeds the firing threshold. Because of initial weight assignments (randomized in virtually all neural network simulations) and the noise term in the activation update equation for strategy units, results vary from one simulation run to another.
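The selection and discovery rules above can be sketched as follows. The 0.1 threshold and the noise term come from the text; the Gaussian noise, its standard deviation, supplying each trial's net inputs directly instead of computing them from weights, and the names `select_strategy` and `annotate_run` are assumptions for illustration.

```python
import random

FIRING_THRESHOLD = 0.1  # from the text: the threshold for every unit
STRATEGIES = ("object", "object-target", "object-target-relation")


def select_strategy(net_inputs: dict, rng: random.Random, noise_sd: float = 0.05):
    """Add noise to each strategy's net input and let the strongest unit win.

    Returns (winner, activation, used): the winner affects the entity units
    only if its activation exceeds the firing threshold.
    """
    activations = {s: net_inputs[s] + rng.gauss(0.0, noise_sd) for s in STRATEGIES}
    winner = max(activations, key=activations.get)
    return winner, activations[winner], activations[winner] > FIRING_THRESHOLD


def annotate_run(schedule: list, seed: int = 0) -> list:
    """Label each trial 'not used', 'discovered' (first suprathreshold use), or 'use N'."""
    rng = random.Random(seed)
    uses = {s: 0 for s in STRATEGIES}
    log = []
    for trial, net_inputs in enumerate(schedule, start=1):
        winner, value, used = select_strategy(net_inputs, rng)
        if not used:
            comment = "not used"
        else:
            uses[winner] += 1
            comment = "discovered" if uses[winner] == 1 else f"use {uses[winner]}"
        log.append((trial, winner, round(value, 3), comment))
    return log


schedule = [{"object": 0.02 * t, "object-target": 0.01 * t, "object-target-relation": 0.0} for t in range(10)]
for row in annotate_run(schedule):
    print(row)
```

Changing `seed` changes the noise sequence, which is one way run-to-run variation of the kind described above can arise.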
Table 2. Strategy evolution observed in one of the computer simulation runs of the Components Neural Network Model.
| Comment |
|---|
| not used |
| discovered |
| not used |
| 2nd use |
| not used |
| 3rd use |
| discovered |
| discovered |
1 Value strategy: Activation value of strategy
At the beginning of the simulation, specifically up to the 17th trial, all three strategies were alternately selected at subthreshold levels (see Table 2), due in part to the contribution of the noise term in the strategy update equation. In the particular run shown in Table 2, the object encoding strategy was discovered in the 44th trial. This resulted from greater encoding of feedback information for objects than for targets or relations. Although the object encoding strategy was discovered in the 44th trial, it was not used in the 45th trial because its strength once again fell below the firing threshold. After this trial, the object encoding strategy was frequently selected over the other two strategies. The object-target encoding strategy was discovered in the 52nd trial and the object-target-relation encoding strategy in the 59th trial.
With experience across trials, the network encodes more feedback information for targets, similar to the case with the objects. This results in more frequent selection of the object-target encoding strategy between the 52nd and 66th trials. Lastly, the distribution of the network's encoding of feedback information for all three entities results in the eventual selection of the object-target-relation encoding strategy only. Thus, during the "lifetime" of the network, its strategy choice progresses from the simplest (the object encoding strategy), to the most advanced (the object-target-relation encoding strategy).
Simulation of Expected Strategy/Accuracy Relationships
In this section, we consider the effects of strategy use on recall accuracy for the Novelty Bias model. Recall accuracy of an entity was interpreted using a maximum activation scheme proposed by McClelland (1991). Recall of an entity was deemed correct if its activation exceeded a firing threshold of 0.1 (the same as all other units in the network), its activation was the highest in its pool, and it fired in the order of the instructions. Recall accuracy of objects, for example, was determined by the ratio of the number of objects correctly recalled to the number of objects used in the sequence of four instructions.
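The three recall criteria (above threshold, most active in its pool, and firing in the order of the instructions) can be sketched as a scoring function. Treating "in order" as "most active at the recall step matching its position," the example items and activations, and the name `recall_accuracy` are assumptions; the 0.1 threshold and the ratio definition come from the text.

```python
FIRING_THRESHOLD = 0.1  # same threshold as all other units


def recall_accuracy(recall_steps: list, presented: list) -> float:
    """Proportion of presented entities (e.g., the four objects) correctly recalled.

    recall_steps[i] maps each entity in one pool to its activation at the i-th
    recall step. An entity counts as correct only if, at the step matching its
    position in the instructions, it is the most active unit in its pool and
    its activation exceeds the firing threshold.
    """
    correct = 0
    for activations, target in zip(recall_steps, presented):
        winner = max(activations, key=activations.get)
        if winner == target and activations[winner] > FIRING_THRESHOLD:
            correct += 1
    return correct / len(presented)


presented = ["eraser", "cup", "spoon", "key"]
steps = [
    {"eraser": 0.60, "cup": 0.20, "spoon": 0.10, "key": 0.05},  # correct
    {"eraser": 0.02, "cup": 0.08, "spoon": 0.03, "key": 0.01},  # winner below threshold
    {"eraser": 0.10, "cup": 0.05, "spoon": 0.40, "key": 0.50},  # wrong item wins: out of order
    {"eraser": 0.00, "cup": 0.10, "spoon": 0.20, "key": 0.70},  # correct
]
print(recall_accuracy(steps, presented))  # 0.5
```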
From the architecture of the model, it would be expected that the use of an object encoding strategy would facilitate recall of objects relative to no strategy use but would not influence recall of targets or relations, because these are not encoded. Similarly, an object-target encoding strategy would be expected to facilitate recall of objects and targets but not relations. As compared to the other strategies, the object-target-relation encoding strategy would be expected to result in better recall for relations. The performance of the network on recall of entities summarized in Table 3 supports these expectations.
Table 3. Simulated accuracy of recall with the Novelty Bias Neural Network Model (From Anumolu, Bray, & Reilly, submitted).
| Strategy |
|---|
| Object |
| Object-target |
| Object-target-relation |
Simulations of Empirical Developmental Differences
Instruction following task. This set of simulation studies focuses on how successfully the model simulates the data generated in the study by Bray, Saarnio, Borges, and Hawk (1994). Bray, Reilly, Villa, Grupe, and Sadeh (1995) conducted this series of evaluation studies to determine whether simulated strategy frequencies generated by the Generalized Components/Attention Bias model correspond to the external memory strategies observed in that study. The participants in the empirical study were 7- and 11-year-old children without mental retardation and 11-year-old children with mild mental retardation. The simulations required decisions concerning the initial weights for the models and the initial values of the mechanisms of strategy development, and these decisions were constrained by the empirical results obtained in the control condition for the most developmentally advanced group, the 11-year-old children without mental retardation. Using the parameter values of our prior simulations as a point of departure, a "criterion run" fit the simulated mean frequency for each of the three strategies to the empirical mean frequency for this group. In the empirical study there were 8 trials with 4 sentences each, for a total of 32 sentences per child. In the simulation, each run generated 32 epochs, in each of which one of the three strategies was or was not used.
The mean number of epochs in which each strategy was used (across 12 runs, each run simulating the empirical data of one child) and the empirical mean number of sentences in which each strategy was used by the 11-year-old children without mental retardation are shown in Table 4. There was an excellent fit of the simulated data to the empirical data [χ²(3) = .93, p > .92].
Table 4. Mean frequency of empirical and simulated strategy use in the control condition for 11-year-old children without mental retardation ("Criterion Run")
| Strategy | Empirical Data | Simulated Data |
|---|---|---|
| No observed strategy | 2.25 | 0.75 |
| Object strategy | 0.83 | 1.08 |
| Object-target strategy | 15.25 | 17.25 |
| Object-target-relation strategy | 13.67 | 12.92 |
In the next step of the model evaluation, the criterion run constrained the weights used to represent differences among the other groups. That is, in the course of the best-fit simulation of the oldest group of children without mental retardation, the weights changed as the frequency of the three types of strategies changed. We examined the simulation protocols to find the epochs that provided the best fit to the empirical results in the control condition of each experiment for each of the two other groups. We found that epoch 11 of the criterion run provided the best fit to the empirical data for the 7-year-old children without mental retardation, and epoch 14 provided the best fit for the 11-year-old children with mental retardation [χ²(3) = 7.23, p > .12 for the 7-year-old children without mental retardation and χ²(3) = 3.22, p > .52 for the 11-year-old children with mental retardation]. The simulated and empirical mean frequencies of strategy use for each group are shown in Table 5. In effect, the same run that provided an excellent fit for the 11-year-old children without mental retardation at its end also simulated the empirical data from the less developmentally advanced groups at earlier epochs.
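The epoch search described above amounts to scanning the criterion run for the epoch whose simulated strategy frequencies lie closest to a group's empirical frequencies. A minimal sketch follows; the Pearson-style discrepancy (simulated values treated as expected), the toy trajectory, and the names `discrepancy` and `best_fit_epoch` are assumptions, since the text does not state exactly how the statistic was computed, and the numbers are not data from the study.

```python
def discrepancy(empirical: list, simulated: list, eps: float = 1e-6) -> float:
    """Pearson-style sum of (empirical - simulated)^2 / simulated over strategy categories."""
    return sum((e - s) ** 2 / max(s, eps) for e, s in zip(empirical, simulated))


def best_fit_epoch(trajectory: list, empirical: list):
    """Return (1-based epoch, discrepancy) of the epoch whose simulated
    strategy frequencies lie closest to a group's empirical frequencies."""
    scores = [discrepancy(empirical, simulated) for simulated in trajectory]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores[best]


# Toy trajectory: frequencies of (no strategy, object, object-target,
# object-target-relation) at three successive epochs of a hypothetical run.
trajectory = [[20, 8, 4, 0], [5, 5, 20, 2], [2, 1, 16, 13]]
print(best_fit_epoch(trajectory, [6, 4, 19, 3]))
```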
Table 5. Mean frequency of empirical and simulated strategy use in the control condition for 7-year-old children without mental retardation and 11-year-old children with mental retardation
| Strategy | 7-year-old children without mental retardation: Empirical Data | 7-year-old children without mental retardation: Simulated Data | 11-year-old children with mental retardation: Empirical Data | 11-year-old children with mental retardation: Simulated Data |
|---|---|---|---|---|
| No observed strategy | 0.36 | 3.73 | 1.00 | 2.55 |
| Object strategy | 3.45 | 3.27 | 0.18 | 1.73 |
| Object-target strategy | 28.18 | 21.45 | 27.55 | 22.00 |
| Object-target-relation strategy | 0.00 | 3.55 | 3.27 | 5.73 |
In the next phase of the simulations, the instruction condition was simulated and fit to the empirical data. The weights of the model for the best-fit epoch represent the differences and similarities among the three groups. Thus, for the most developmentally advanced children, the weights of the model in the last epoch of the "criterion run" served as a representation of their strategy competence in the control condition. The weights in epoch 11 represented the developmental and intellectual competence of the younger children without mental retardation, and the weights at epoch 14 represented the competence of the children with mental retardation under the simulated control condition.
To begin the simulation of the instruction condition, the initial weights for each group were those obtained for that group in the control condition. The simulation included the incremental manipulation of one mechanism expected to influence strategy use (i.e., the attention bias mechanism in the Components/Attention Bias model). This was implemented by giving the T-curve for the "relation" a steeper slope and a higher asymptote than in the "criterion run." This parameter manipulation represents the experimenter's training with the object-target-relation strategy, on the assumption that the training increased attention to coding the relation. The simulated and empirical means are shown in Table 6. In the empirical study, direct instruction with the object-target-relation coding strategy resulted in no differences among the groups. In our simulation, only one theoretically motivated parameter was manipulated to simulate the empirical results for the instruction condition [χ²(3) = .00, p = 1.00 for the 7-year-old children without mental retardation, χ²(3) = 0.003, p > .99 for the 11-year-old children without mental retardation, and χ²(3) = 0.23, p > .99 for the 11-year-old children with mental retardation].
Table 6. Mean frequency of empirical and simulated strategy use for the instruction condition
| Strategy | 7-year-old children without mental retardation: Empirical Data | 7-year-old children without mental retardation: Simulated Data | 11-year-old children without mental retardation: Empirical Data | 11-year-old children without mental retardation: Simulated Data | 11-year-old children with mental retardation: Empirical Data | 11-year-old children with mental retardation: Simulated Data |
|---|---|---|---|---|---|---|
| No observed strategy | 0.00 | 0.00 | 0.00 | 0.00 | 0.09 | 0.00 |
| Object strategy | 0.00 | 0.00 | 0.00 | 0.00 | 0.09 | 0.00 |
| Object-target strategy | 0.00 | 0.00 | 1.33 | 1.25 | 0.55 | 0.82 |
| Object-target-relation strategy | 32.00 | 32.00 | 30.67 | 30.75 | 31.27 | 31.18 |
These simulation results are important for several reasons. First, they demonstrate that individual differences can be modeled within the same architecture -- consistent with the assumption that strategy potentials are the same in children with and without mental retardation. Second, because differences in strategy competency were represented by differences in the weights, the simulations show the feasibility of explicitly characterizing developmental and individual differences in strategy competency within a connectionist model. Third, these simulations illustrate that our approach is able to model differences in experimental conditions with a minimum of parameter manipulation. That is, the simulations of the experimental conditions were constrained by the parameters obtained during the simulations of the control conditions, which were, in turn, constrained by the empirical data from only one of the three groups (the most developmentally advanced children). This type of feasibility demonstration is encouraging, and the use of this protocol will be explored in further research.
Relations Memory Task. We have also successfully completed simulations of empirical data from the relations memory task of Fletcher and Bray (1995). The empirical frequency of strategy use for each type of external memory strategy for the 11-year-old children with mental retardation and the 7- and 11-year-old children without mental retardation, and the simulated frequency of strategy use by the Generalized Components/Attention Bias model, are shown in Table 7. For each group, the fit of the model, assessed with χ², was excellent (p > .90).
Table 7. Simulation of data from relations memory task of Fletcher and Bray (1995) using the Generalized Components/Attention Bias Model
| Strategy | 7-year-old children without mental retardation: Empirical Data | 7-year-old children without mental retardation: Simulated Data | 11-year-old children without mental retardation: Empirical Data | 11-year-old children without mental retardation: Simulated Data |
|---|---|---|---|---|
| No observed strategy | 9.60 | 11.88 | 11.52 | 12.50 |
| Object strategy | 13.40 | 13.50 | 3.84 | 4.50 |
| Object-object-relation strategy | 8.32 | 6.62 | 16.32 | 15.00 |
Preschool Instruction Following Task. We have also conducted simulations of empirical data from the preschool version of the instruction following task (Fletcher & Bray, in press). We have now completed simulations of the data from the 3- and 4-year-old children and from the 5- and 6-year-old children in that study. The empirical frequency of strategy use and the simulated frequency of strategy use by the Generalized Components/Attention Bias model are shown in Table 8. For each group, the fit of the model, assessed with χ², was excellent (p > .90).
Table 8. Simulation of data from Preschool Instruction following task from Fletcher and Bray (in press) using the Generalized Components/Attention Bias Model
| Strategy | 3- and 4-year-old children: Empirical Data | 3- and 4-year-old children: Simulated Data | 5- and 6-year-old children: Empirical Data | 5- and 6-year-old children: Simulated Data |
|---|---|---|---|---|
| No observed strategy | 21.76 | 23.75 | 23.75 | 22.32 |
| Object strategy | 7.36 | 6.56 | | |
| Object-target strategy | 2.88 | 1.69 | | |
The results of the simulations of the empirical data from the relations memory task and the preschool adaptation of the instruction following task illustrate the generality of the Generalized Components/Attention Bias model. The excellent fits obtained in these simulations indicate that the architecture and mechanisms of the model readily accommodate empirical data from the three different paradigms used in our empirical research on external memory. The models can therefore be generalized beyond the instruction following task for which they were originally developed and, it seems likely, to many other tasks.
Simulations of Mechanisms of Strategy Development
Across our two most comprehensive connectionist models (the Generalized Components/Attention Bias, and Generalized Strategy Abstractor models) we have implemented nine different mechanisms of strategy development that may contribute to a clearer understanding of the nature of typical and atypical strategy development (see Table 1).
In this section we present simulation results suggesting the importance of five mechanisms of strategy development implemented in our neural network models. These include novelty bias, accuracy feedback, context sensitivity, spreading inhibition, and post-synaptic thresholds.
Novelty bias and accuracy feedback. We constructed the Novelty Bias Model based on the assumption from Siegler's theory that novelty bias and accuracy feedback are two key cognitive factors that play a role in the selection and evolution of strategies. Accordingly, the first two mechanisms implemented in our neural network models were the novelty bias and accuracy feedback mechanisms.
The novelty bias module consists of three novelty bias units (attention/bias nodes in Figure 4). The connection from a novelty bias unit to its strategy is a random excitation weight (ranging from 0.0 to 1.0) multiplied by the current novelty bias activation value. This product represents the degree of novelty a subject may place on a strategy and is recomputed with a different random weight each time the activations of strategies are updated, reflecting a probabilistic influence in strategy selection. The activation value of a novelty bias unit starts at 0.8 and decays with each trial (e.g., by 4%). A higher activation value indicates that the strategy corresponding to the bias unit is newer and is strongly biased in favor of selection, and vice-versa.
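The novelty bias unit is specified closely enough in the text to sketch directly: activation starts at 0.8, decays each trial, and is multiplied by a uniform random weight in [0, 1] that is redrawn on every update. Treating the 4% decay as multiplicative and the class name `NoveltyBiasUnit` are assumptions of this sketch.

```python
import random


class NoveltyBiasUnit:
    """Novelty bias node for one strategy: starts at 0.8 and decays by 4% per trial."""

    def __init__(self, initial: float = 0.8, decay: float = 0.04):
        self.activation = initial
        self.decay = decay

    def input_to_strategy(self, rng: random.Random) -> float:
        """Random excitatory weight in [0, 1], redrawn on every update, times current activation."""
        return rng.uniform(0.0, 1.0) * self.activation

    def end_trial(self) -> None:
        """Apply the per-trial decay (assumed multiplicative)."""
        self.activation *= 1.0 - self.decay


rng = random.Random(42)
unit = NoveltyBiasUnit()
for trial in range(1, 4):
    print(trial, round(unit.activation, 4), round(unit.input_to_strategy(rng), 4))
    unit.end_trial()
```

Because the weight is redrawn at every update, the same bias activation can push a strategy strongly on one trial and weakly on the next, which gives the selection process its probabilistic character.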
In Table 9, novelty bias input on a particular trial is the product of a random excitation weight (ranging from 0.0 to 1.0) and the activation value of the novelty bias node on that trial, and accuracy input to a strategy unit is the net influence of all accuracy/attention units on the strategy unit after the trial initiation (priming) signal is given (see Figure 4, bottom). The strategies selected include object encoding (pointing at or holding an object; see Figure 1), object-target encoding (moving an object with orientation toward a target), and object-target-relation encoding (placing an object in front of or on top of a wooden separator directly across from a target). The input from the accuracy/attention units increases as a function of the degree of success the simulated child has had in recalling the objects, targets, or relations on previous trials. Thus, this input represents accuracy feedback.
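A minimal sketch of accuracy feedback, assuming that each accuracy/attention pool keeps a leaky running average of recall accuracy for its entity and that a strategy receives the summed trace of the entities it encodes. The running-average rule, the rate of 0.2, and the names `AccuracyFeedback` and `ENCODES` are hypothetical; the text states only that the input grows with past recall success.

```python
# Entities each strategy encodes (from the strategy definitions in the text).
ENCODES = {
    "object": ("object",),
    "object-target": ("object", "target"),
    "object-target-relation": ("object", "target", "relation"),
}


class AccuracyFeedback:
    """Hypothetical accuracy/attention traces, one per entity type."""

    def __init__(self, rate: float = 0.2):
        self.rate = rate
        self.trace = {"object": 0.0, "target": 0.0, "relation": 0.0}

    def update(self, recall_accuracy: dict) -> None:
        """Move each trace a fraction `rate` toward this trial's recall accuracy (0 to 1)."""
        for entity, acc in recall_accuracy.items():
            self.trace[entity] += self.rate * (acc - self.trace[entity])

    def input_to(self, strategy: str) -> float:
        """Net accuracy input to a strategy unit: summed traces of the entities it encodes."""
        return sum(self.trace[e] for e in ENCODES[strategy])


fb = AccuracyFeedback()
fb.update({"object": 1.0, "target": 0.5, "relation": 0.0})
print({s: round(fb.input_to(s), 3) for s in ENCODES})
```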
Table 9. Strategy evolution observed in one of the computer simulation runs of the Novelty Bias Neural Network Model.
| Strategy |
1 Novelty Bias: Activation value of novelty bias node
2 Acc. Input: Input from accuracy nodes
3 Value Strategy: Activation value of strategy
After the object encoding strategy was initiated at the beginning of the simulation, this strategy continued to be selected until the 12th trial, with a steady increase in its accuracy input. After the object-target encoding strategy was initiated at the beginning of the 13th trial, the selection shifted back and forth between the two strategies over the next several trials, and the accuracy input increased steadily for both strategies during this period. After the object-target-relation encoding strategy was initiated at the beginning of the 25th trial, all three strategies in this simulation each appeared at least twice. The simulated use of multiple strategies in close time intervals is similar to actual strategy selection by children in arithmetic, instruction-following, serial recall, time telling, and other task domains (Bray et al., 1994; McGilly & Siegler, 1989; Siegler, 1991; Siegler & Jenkins, 1989).
Even though the net accuracy input for the object encoding strategy unit, for example, at the 28th trial, was greater than that for the object-target-relation encoding strategy unit, the novelty factor allowed the latter strategy to be selected. As the simulation continued, the net novelty bias input diminished for all strategies and the net accuracy input became the deciding factor in strategy selection. The object-target-relation encoding strategy reached the highest net accuracy input after the 43rd trial. The net novelty biases for other strategies had nearly diminished to zero, and this most productive strategy became the only one selected by the network. This behavior may be compared to the analogous case wherein most older children tested by Bray et al. (1994) always selected an object-target-relation encoding strategy.
Because of the stochastic nature of the activation of novelty bias units, which feed input into the strategy units, the results of the simulation vary with each simulation run. In a set of ten additional simulation runs using the same protocol for initiation of each strategy through input to the novelty bias units, we observed that the number of trials required for the activation of any strategy to reach its peak value (1.0) varied across simulation runs because of this fluctuating contribution of the novelty bias units. The number of trials to network convergence (on the object-target-relation encoding strategy) varied from 46 to 65 in ten simulation runs. The network converged on the object-target encoding strategy once during the eleven runs, after 79 trials. This mimics the same phenomenon in experimental observations of humans, especially of children with mental retardation, who may not progress beyond the use of the object-target strategy.
Context sensitivity. A third mechanism implemented in our neural network models is "context sensitivity," which is similar to the concept of "breadth of attention" (Fisher & Zeaman, 1973). At low values of context sensitivity, the model "picks up" very little information during one trial (the increments in the weights following a learning trial are minimal), whereas at high values of context sensitivity, the model picks up a great deal of information (the increments in the weights are relatively large). In a simulation reported by Bray, Villa, Reilly, and Grupe (1994), we systematically varied the context sensitivity mechanism under three different conditions. The first was a condition in which the objects were made more salient than the targets or the relation in an instruction. This was implemented by elevating the input from the T-curves to the representation of the objects relative to the input to the target and relation representations. Experimentally, this would correspond to a situation in which the objects are made more salient than the other elements of the task, for example by placing them closer to the child than the targets or by making them more colorful than the targets. This input simulates attention to selected aspects of the task, in this case the objects mentioned in the instructions. The other two conditions were generated by elevating the T-curve input for the targets in one condition and for the relations in the other. Experimental conditions for empirical studies could be designed to correspond to each of these simulation situations.
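The learning-speed side of context sensitivity can be sketched as a scale factor on weight increments, with salience scaling the increment for the elevated entity. This shows only how much information is "picked up" per trial, not the strategy outcomes reported for these simulations; the bounded update rule, the cap at 1, the salience values, and the names `learning_rate` and `train` are assumptions.

```python
def learning_rate(context_sensitivity: float, salience: float) -> float:
    """Size of the weight increment on one trial; capped at 1 so weights never overshoot."""
    return min(1.0, context_sensitivity * salience)


def train(saliences: dict, context_sensitivity: float, trials: int = 10) -> dict:
    """Grow each entity's weight toward 1.0, faster for salient entities and high sensitivity."""
    weights = {entity: 0.0 for entity in saliences}
    for _ in range(trials):
        for entity, salience in saliences.items():
            rate = learning_rate(context_sensitivity, salience)
            weights[entity] += rate * (1.0 - weights[entity])
    return weights


object_salient = {"object": 2.0, "target": 1.0, "relation": 1.0}
for sensitivity in (0.02, 0.2):
    print(sensitivity, {k: round(v, 3) for k, v in train(object_salient, sensitivity).items()})
```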
The results are shown in Figure 6. In these simulations, low values of context sensitivity were used to simulate expected results from young children (e.g., preschool and young school-aged children), and high levels of context sensitivity were used to simulate expected results from older school-aged children and adults. In the top panel which presents the results for the conditions with increased salience of objects, the proportion of simulated trials with "no strategy" decreases as context sensitivity increases, simulating a developmental increase in strategy use. The use of the object encoding strategy increases initially with context sensitivity. However, at high levels of context sensitivity, the object-target-relation strategy is most likely, even though the objects are very salient. To the extent that younger children are expected to have low context sensitivity and older children are expected to have high context sensitivity, the simulation predicts that in an experimental situation with high object saliency, younger children will be most likely to use an object encoding strategy whereas older children will be more likely to use the more sophisticated object-target-relation strategy in spite of the high level of object salience.
In the middle panel (increased salience of targets), the proportion of trials with "no strategy" decreases as context sensitivity increases, object-target encoding strategies emerge at low values of context sensitivity, and object-target-relation strategies emerge at high values. A similar pattern is seen in the bottom panel (increased salience of relations), except that the object-target-relation strategy emerges at much lower values of context sensitivity. To the extent that younger children may be characterized by low context sensitivity, the simulations suggest that making the relation more salient than the objects or targets will most likely result in the emergence of the more sophisticated object-target-relation strategy even in younger children. These are specific, non-intuitive predictions, and we are currently designing empirical studies to test them.
Spreading inhibition. A fourth mechanism of strategy development is spreading inhibition. Bjorklund and Harnisfeger (1995) have argued that age-related changes in inhibition may be related to several aspects of cognitive development, including the use of memory strategies. All of our neural network models incorporate inhibition mechanisms in some form. For example, in the strategy selection process on a given trial, one of the three strategies (object, object-target, or object-target-relation encoding) is selected and the other two are inhibited by a "winner take all" inhibitory mechanism. In addition, within the representations of the objects, targets, and relations used in the task, there is mutual inhibition within each category ("entity pool"; see Figures 4 and 5). Thus, when one object is activated at presentation, the remaining objects in the pool are inhibited, making it more likely that the object activated at presentation will still be active above threshold at the time of recall (despite the decay in activation that occurs continuously in the system). In both examples, inhibition produces "signal enhancement," a sharper differentiation between more relevant and less relevant information, in a manner that could lead to developmental differences in recall accuracy and strategy use.
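The "winner take all" selection among the three strategies can be sketched with a MAXNET-style network, a standard textbook form of iterative mutual inhibition. This is an assumption about the general form only, not the implementation used in the models described here: the initial strategy strengths, the inhibition constant `epsilon` (kept below 1/N for N competitors), and the function name are hypothetical. On each iteration every strategy is inhibited in proportion to the summed activation of its rivals, until only one remains active.

```python
import numpy as np


def maxnet(strengths, epsilon=0.2, max_iter=1000):
    """Winner-take-all by iterative mutual inhibition (MAXNET-style sketch).

    strengths -- dict mapping strategy name to initial non-negative activation
    epsilon   -- assumed inhibition constant; should be below 1 / len(strengths)
    Returns the final activations; at most one stays positive unless there is a tie.
    """
    names = list(strengths)
    a = np.array([strengths[n] for n in names], dtype=float)
    for _ in range(max_iter):
        if np.count_nonzero(a) <= 1:
            break
        # Each unit is inhibited by the summed activation of all the other units.
        a = np.clip(a - epsilon * (a.sum() - a), 0.0, None)
    return dict(zip(names, a))


final = maxnet({"object": 0.5, "object-target": 0.7, "object-target-relation": 0.6})
winner = max(final, key=final.get)
print(winner)  # object-target
```

If the two strongest strategies start exactly tied, this sketch never separates them and the loop stops only at the `max_iter` cap, so a fuller model would need some tie-breaking noise.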
In simulations reported by Reilly, Bray, Villa, Caniglia-Reilly, and Golding (1996), we demonstrated the effects of differences in spreading inhibition within the entity pools of the associative memory modules (see Figures 4 and 5). We assumed that for younger children, the objects in our external memory tasks are more salient than the targets or relations, which corresponds to spreading inhibition in the object pool but not in the target or relation pools. That is, our observation that younger children are most likely to use an object encoding strategy suggests that they process information about the objects more effectively than information about any other element of the task. This was simulated by introducing marked inhibition in the object pool and no inhibition in the target or relation pools. For older children, however, spreading inhibition was assumed to be present in all three pools, consistent with the idea that inhibitory mechanisms become more fully operational with development.
In our inhibition simulations, we found that with increasing levels of spreading inhibition within the object pool, the differential between the nodes with the highest and next-highest activations was greater than when there was no inhibition in the pool. Under the conditions of our experiments, this signal enhancement would correspond to higher recall accuracy. Likewise, as the amount of inhibition across pools increased, the differential between the nodes with the highest and next-highest activations increased relative to the condition with no inhibition. The increase in activation as spreading inhibition is introduced into the object, target, and relation pools is shown in Figure 7. These simulation results are consistent with the idea that developmental increases in inhibition correspond to changes in recall accuracy. A similar relation would hold for selection among strategies or strategy components (tactics): the strategy most frequently associated with high accuracy would be selected, inhibiting the other possible strategies, which would result in increasing use of more sophisticated strategies.
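The signal-enhancement effect of within-pool inhibition, measured as the gap between the most active and next most active nodes, can be illustrated with a small sketch. The update rule (passive decay plus subtractive inhibition from the other nodes in the pool), the initial activations, and the parameter values are assumptions for illustration, not those of the reported simulations. The sketch tracks only the differential, not absolute activation levels.

```python
import numpy as np


def settle_pool(activations, inhibition, decay=0.1, steps=10):
    """Let one entity pool evolve under passive decay and mutual inhibition.

    inhibition -- assumed strength of inhibition a node receives per unit of
                  activation in the other nodes of the same pool (0 = none)
    decay      -- assumed fraction of activation lost per step
    """
    a = np.array(activations, dtype=float)
    for _ in range(steps):
        lateral = inhibition * (a.sum() - a)  # input from the other nodes only
        a = np.clip((1.0 - decay) * a - lateral, 0.0, None)
    return a


def top_two_differential(a):
    """Gap between the highest and the next-highest activation in the pool."""
    top, second = np.sort(a)[::-1][:2]
    return top - second


start = [0.6, 0.5, 0.4]  # hypothetical pool; node 0 is the object just presented
for inh in (0.0, 0.2):
    print(inh, round(top_two_differential(settle_pool(start, inh)), 4))
```

With these assumed values the differential after settling is larger with inhibition than without, because each node's lead over a rival is amplified by the inhibition that rival receives.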
Post-synaptic threshold. A fifth mechanism of strategy development included in all of our neural network models is the post-synaptic threshold, which may be considered a specific type of inhibitory mechanism. Research in neurobiology has shown that the firing of a neuron depends on at least two important factors. The first is the magnitude of the input from the axons of pre-synaptic neurons: all other things being equal, the more axons releasing neurotransmitters across the synaptic cleft, the more likely the receiving neuron is to fire, transmitting a neural impulse. The second factor is the threshold of the receiving neuron, here called the post-synaptic threshold. The higher this threshold, the greater the input required from the pre-synaptic axons.
In our neural network models, we have found that with low values of the post-synaptic threshold, the system has difficulty forming clear forward associations. That is, as the networks try to learn a sequence of object-target-relations, they are more likely to recall the items in an incorrect order with a low post-synaptic threshold than with a high one. Anumolu, Bray, and Reilly (1992) conducted a simulation to illustrate the effects of differences in the post-synaptic threshold. The results, shown in Table 10, demonstrate that at high values of the post-synaptic threshold, the associations among the sequence of nodes (1-4) were always forward: the weights between nodes 1-2, 2-3, and 3-4 are the greatest, and the others approach zero, simulating little or no connection between those nodes. For low values of the post-synaptic threshold, however, several "backward" associations appear, which would simulate ordering errors at recall. Although this is admittedly speculative at this point, our neural network models suggest that the post-synaptic threshold is a plausible mechanism that may be related to the decrease in sequence errors that we observe in our empirical studies. This mechanism is particularly interesting because of its biological basis and because it would not be an intuitive candidate from a purely psychological approach to strategy development.
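The forward-versus-backward contrast described for Table 10 can be illustrated with a sketch of thresholded Hebbian sequence learning. It is a hypothetical reconstruction, not the rule used by Anumolu, Bray, and Reilly (1992): we assume that each presented item leaves a decaying activation trace, and that the connection from node i to node j grows with i's activation times the amount by which j's activation exceeds the post-synaptic threshold `theta`. Nodes are indexed 0 to 3 rather than 1 to 4, and all parameter values are illustrative.

```python
import numpy as np


def learn_sequence(n_nodes=4, theta=0.5, decay=0.3, eta=0.1, epochs=20):
    """Hebbian learning of a serial list with a post-synaptic threshold (sketch).

    Returns w, where w[i, j] is the learned weight from node i to node j;
    the upper triangle holds forward links, the lower triangle backward links.
    theta -- post-synaptic threshold: a receiving node drives learning only to
             the extent its activation exceeds theta (assumed rule)
    decay -- fraction of activation an item retains per step (assumed)
    """
    w = np.zeros((n_nodes, n_nodes))
    for _ in range(epochs):
        a = np.zeros(n_nodes)
        for t in range(n_nodes):
            a *= decay   # earlier items leave a decaying trace
            a[t] = 1.0   # the item presented now is fully active
            post = np.clip(a - theta, 0.0, None)  # supra-threshold part only
            dw = eta * np.outer(a, post)          # pre activity x post excess
            np.fill_diagonal(dw, 0.0)             # ignore self-connections
            w += dw
    return w


np.set_printoptions(precision=3, suppress=True)
print(learn_sequence(theta=0.5))  # backward (lower-triangle) weights stay at 0
print(learn_sequence(theta=0.1))  # backward weights appear, e.g. w[1, 0] > 0
```

With the higher threshold only the currently presented item exceeds `theta`, so learning runs only from earlier traces to the current item and every association is forward. With the lower threshold the trace of the previous item also exceeds `theta`, so the current item becomes linked backward to it, the kind of association that would simulate ordering errors at recall.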
We are currently designing simulation studies to further explore the relative contributions to strategy evolution of the nine mechanisms included in our models (Table 1). Using the parameters obtained by Bray, Reilly, Villa, Grupe, and Sadeh (1995) in the simulations of the empirical data of Bray, Saarnio, Borges, and Hawk (1994), we will be able to manipulate each of these mechanisms across a wide range of values and observe their effects on strategy use and recall accuracy. Because these simulations will start from parameters that produced a close fit to our empirical data, their results will yield specific predictions for empirical studies in which the corresponding mechanisms are manipulated. In effect, we will be able to generate complete simulated data sets before the empirical experiments are conducted.
For example, all our models include the accuracy feedback mechanism. Each model allows feedback from an external teacher (which may be considered a simulation of input from another connectionist module). We assume that strategy evolution depends in part on this mechanism, so the accuracy of the feedback should matter for simulated strategy use, as shown in our simulations with the novelty bias mechanism. One planned simulation will investigate this assumption with a more specific focus on accuracy feedback by varying the level of noise introduced into the feedback. The proportion of erroneous feedback will be varied from 0.0 (as in the simulations to date) to 1.0 (feedback that correct responses are wrong and incorrect responses are right) in increments of 0.1. We expect this mechanism to have a powerful effect on simulated strategy use and to motivate future empirical research on this issue. A similar combination of simulation and empirical research will be used for the other mechanisms in our models.
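The planned manipulation of feedback accuracy can be sketched as a noisy teacher. The function name and the use of Python's `random` module are our assumptions; the only details taken from the text are the error levels (0.0 to 1.0 in steps of 0.1) and the meaning of the endpoints. How the teacher's verdict would feed back into strategy learning is deliberately left out.

```python
import random


def noisy_feedback(response_correct, error_rate, rng):
    """Hypothetical teacher whose verdict is inverted with probability `error_rate`.

    error_rate = 0.0 gives perfectly accurate feedback; 1.0 tells the learner
    that correct responses are wrong and incorrect responses are right.
    """
    if rng.random() < error_rate:  # rng.random() is in [0.0, 1.0)
        return not response_correct
    return response_correct


# Feedback-error levels from 0.0 to 1.0 in steps of 0.1, as described in the text.
ERROR_RATES = [round(0.1 * k, 1) for k in range(11)]

rng = random.Random(1)  # fixed seed so a run is reproducible
for p in ERROR_RATES:
    flipped = sum(not noisy_feedback(True, p, rng) for _ in range(1000))
    print(f"error rate {p:.1f}: {flipped} of 1000 correct responses labeled wrong")
```

Passing an explicit random generator, rather than using the module-level functions, keeps each simulated condition reproducible from its seed.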
Some Potential Challenges for Neural Network Modeling
Although the initial simulations reported here were successful, it is worthwhile to note that investigators attempting to develop neural network models of developmental phenomena and processes can expect to encounter several conceptual challenges. While not exhaustive, a list of important potential difficulties includes catastrophic interference generation, levels of learning, the problem of real time versus simulated time, and questions related to the degree of correspondence between representation in the model and representation in the child's cognitive system. Each of these will be discussed briefly with reference to our own models.
Catastrophic interference may occur when a network that has been trained on one list, or corpus, is then required to learn a completely new or additional set of information. The network may have such extreme difficulty learning the new material that it "crashes." The difficulty arises because the network learns in a way that leaves it subject to massive interference from the original learning. This type of learning may occur in some neural network models, especially autonomous networks using backpropagation algorithms. Because most models of developmental phenomena involve learning across time, catastrophic interference from original learning would make such models psychologically unrealistic, since interference of this degree rarely, if ever, occurs in the course of a normal developmental trajectory. In our research program, we have used a modular approach in which the specific content of to-be-remembered sentences is represented in a way that allows within-trial interference to build up while keeping between-trial interference at realistic levels. Therefore, we have not encountered the problem of catastrophic interference that others have (e.g., Brown, Dalloz, & Hulme, 1995).
Another potential difficulty related to learning in developmental models is the representation of different levels of learning. Not all aspects of learning change with age, and this is an important aspect of model development that an investigator interested in development must confront. In our models, learned information may be unmodifiable, modifiable across trials, or modifiable within trials. For example, the child's knowledge of serial order is simulated in the sequencer module, which is pre-trained before the simulation and, once learned, is not modified during the simulation. In all of our simulations to date we have pre-trained the sequencer to reliably sequence from 1 to n, but one potentially rich area for simulation studies will be to vary the degree of initial learning of serial information to approximate the sequential knowledge of younger learners. Similarly, some learning is represented in a way that allows gradual change across trials, such as the evolution of strategy knowledge in the Components Model. Finally, the specific knowledge of which object is to be placed with each target changes from trial to trial and is allowed to decay to a baseline level between trials. Varying assumptions about the processes involved in between-trial and within-trial learning also offers opportunities to simulate potentially important aspects of developmental change.
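The three levels of learning described above can be summarized in a small data structure. The class, its method names, the delta-rule update for strategy weights, and the `retention` parameter are hypothetical illustrations, not the actual components of the models described here. Serial-order knowledge is stored as a tuple that no method updates, strategy weights move a small step per trial, and object-target bindings decay toward a baseline at the end of each trial.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerState:
    """Hypothetical container for three levels of learning (illustration only)."""
    sequencer: tuple            # pre-trained serial order; never updated below
    strategy_weights: dict      # changes gradually across trials
    bindings: dict = field(default_factory=dict)  # object-target links, within trial

    def update_strategy(self, strategy, outcome, rate=0.05):
        """Nudge one strategy weight toward the trial outcome (assumed delta rule)."""
        w = self.strategy_weights[strategy]
        self.strategy_weights[strategy] = w + rate * (outcome - w)

    def bind(self, obj, target, strength=1.0):
        """Record which object goes with which target on the current trial."""
        self.bindings[(obj, target)] = strength

    def end_trial(self, baseline=0.0, retention=0.0):
        """Decay within-trial bindings toward baseline (retention=0 resets fully)."""
        self.bindings = {
            pair: baseline + retention * (s - baseline)
            for pair, s in self.bindings.items()
        }


state = LearnerState(sequencer=(1, 2, 3, 4), strategy_weights={"object": 0.0})
state.bind("cup", "box")
state.update_strategy("object", outcome=1.0)
state.end_trial()
print(state.strategy_weights, state.bindings)
```

Varying `rate`, `retention`, or the completeness of `sequencer` would correspond, very loosely, to the manipulations of across-trial, within-trial, and pre-trained learning suggested in the text.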
A third difficulty common to all simulations of developmental phenomena is the correspondence between real time and simulated time. For instance, a neural network model using the backpropagation algorithm might take hundreds of thousands of learning-test cycles to simulate the development of the concepts involved in prediction on a balance-beam task (e.g., McClelland, 1995). In this extreme case, the model may qualitatively simulate the general sequence of concepts learned by a child, but without any isomorphism between epochs in the simulation and the actual experience of the child. In our simulations, which use a Hebbian learning rule rather than backpropagation, the models learn much more quickly and, under certain conditions, provide an isomorphic relationship between epochs in the simulation and trials in the external memory task.
Lastly, in all simulations of developmental phenomena there is the problem of the degree of correspondence between the nature of the representation in the model and the actual representation in the child's cognitive system. For example, in our models we have made assumptions about the logical structure of strategy representations. We assume that each strategy consists of components that are added together in varying degrees in the course of the child's experience with the task, and that the way they are combined depends on the child's developmental level. The young child is assumed to focus initially on the objects in the task, to pay less attention than older children to where the objects are to be placed, and to pay even less attention to the particular orientation of the movable object relative to the fixed target. Strategy evolution during the task involves learning to shift attention from just one element of the task (the movable objects) to all three critical elements (the movable objects, the fixed targets, and where the objects are to be placed relative to the targets). We have designed empirical research to test the implications of this aspect of the model. However, pending the results of these experiments, we are uncertain how closely the representation of strategies and the processes of strategy evolution embodied in our models correspond to the actual representations and mechanisms of developmental change in the cognitive systems of the children in our empirical studies of external memory strategies.
The initial simulation results in our program of research provide support for some of the mechanisms of strategy development discussed by Siegler and Crowley (1994) (e.g., novelty bias, accuracy feedback), show the feasibility of implementing inhibitory mechanisms that are currently receiving attention in the literature on cognitive development (e.g., Bjorklund & Harnisfeger, 1995), and introduce the post-synaptic threshold as a mechanism with a clear biological motivation (although this is admittedly speculative at this point). Although these models and simulations have raised many more questions than they have resolved, we think our approach to empirical research and modeling will focus attention in this area on the mechanisms of strategy development, thereby extending and complementing the emphasis that Siegler and his colleagues have placed on combining empirical research with the development of computational models.
We expect that future research on strategy use by typical and atypical children will focus increasingly on the mechanisms responsible for strategy development. In addition, the connectionist modeling approach described here can be expected to lead to a clearer understanding of the nature of strategy competencies and deficiencies in individuals with mental retardation, and of the mechanisms that may be responsible for the observed pattern of differences. Although intervention techniques for the early childhood education of typical children and for the remediation of strategy deficiencies in atypical children have met with some success, we hope that the deeper understanding of typical and atypical development afforded by our approach will eventually lead to new educational training programs tailored to the strengths of young typical children and of atypical children with mental retardation.
Anderson, J. A., Pellionisz, A., & Rosenfeld, E. (1990). Neurocomputing 2. Cambridge, MA: MIT Press.
Anderson, J. A., & Rosenfeld, E. (1988). Neurocomputing: Foundations of Research. Cambridge, MA: MIT Press.
Anderson, J. A., Spoehr, K. T., & Bennett, D. J. (in press). A study in numerical perversity: Teaching arithmetic to a neural network. In D. S. Levine (Ed.), Neural networks for knowledge representation and inference. Hillsdale, NJ: Lawrence Erlbaum.
Anumolu, V., Bray, N. W., & Reilly, K. D. (in press). Neural network models of strategy development in children. Neural Networks.
Anumolu, V., Bray, N. W., & Reilly, K. D. (1993). A neural network model of strategy selection and evolution. In World Congress on Neural Networks - '93, pp. 1528-1535.
Anumolu, V., Reilly, K. D., & Bray, N. W. (May, 1992a). Neural networks for learning, recognition, and control. Poster presented at A Research Conference at the Wang Institute of Boston University, Boston, MA.
Anumolu, V., Reilly, K. D., & Bray, N. W. (1992b). A hybrid neural network system with serial learning and associative components. In Proceedings of Workshop on Neural Networks and International Simulation Technology Conference, Society for Computer Simulation, San Diego, CA, pp. 455-462.
Baker-Ward, L., Ornstein, P. A., & Holden, D. J. (1984). The expression of memorization in early childhood. Journal of Experimental Child Psychology, 37, 555-575.
Bechtel, W., & Abrahamsen, A. (1991). Connectionism and the mind. Cambridge, MA: Blackwell.
Belmont, J. M., & Mitchell, D. W. (1987). The general strategies hypothesis as applied to cognitive theory in mental retardation. Intelligence, 11, 91-105.
Bjorklund, D. F., & Harnisfeger, K. K. (1995). The evolution of inhibition mechanisms and their role in human cognition and behavior. In F. N. Dempster & C. J. Brainerd (Eds.), Interference and inhibition in cognition (pp. 141-173). San Diego, CA: Academic Press.
Bray, N. W., Fletcher, K. L., Huffman, L. F., Hawk, T. M., & Ward, J. L. (April, 1994). Developmental differences in the use of models and verbal prompts in support of external strategies. Paper presented at the Conference on Human Development, Pittsburgh, PA.
Bray, N. W., Fletcher, K. L., Huffman, L. F., Hawk, T. M., Ward, J. L., & Blair, C. (1996). Knowledge of social rules, metamnemonic knowledge, and use of external memory strategies. Manuscript in preparation.
Bray, N. W., Fletcher, K. L., & Turner, L. A. (in press). Cognitive competencies and strategy use in individuals with mild retardation. In W. E. MacLean, Jr. (Ed.), Handbook of Mental Deficiency, Psychological Theory and Research (3rd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Bray, N. W., Reilly, K. D., Villa, M. A., Grupe, L. A., & Sadeh, B. (April, 1995). Hybrid neural network models of strategy development. Paper presented at Society for Research in Child Development, Indianapolis, IN.
Bray, N. W., Saarnio, D., Borges, J. M., & Hawk, L. W. (1994). Intellectual and developmental differences in external memory strategies. American Journal on Mental Retardation, 99, 19-31.
Bray, N. W., & Turner, L. A. (1986). The rehearsal deficit hypothesis. In N. R. Ellis & N. W. Bray (Eds.), International review of research in mental retardation (Vol. 14, pp. 47-71). New York: Academic Press.
Bray, N. W., & Turner, L. A. (1987). Production anomalies (not strategic deficiencies) in mentally retarded individuals. Intelligence, 11, 49-60.
Bray, N. W., Villa, M. A., Reilly, K. D., & Grupe, L. A. (June, 1994). Context sensitivity in a hybrid neural network model of strategy development. Paper presented at the Special Interest Group on Higher-Order Learning, 1994 World Congress on Neural Networks, San Diego, CA.
Brown, A. L., Bransford, J. D., Ferrara, R. A., & Campione, J. C. (1983). Learning, remembering and understanding. In J. H. Flavell, & E. M. Markman (Eds.), Handbook of child psychology (4th ed., pp. 77-166). New York: Wiley.
Brown, G. D. A., Dalloz, P., & Hulme, C. (1995). Mathematical and connectionist models of human memory: A comparison. Memory, 3, 113-145.
Campbell, J. I. D., & Oliphant, M. (1992). Representation and retrieval of arithmetic facts: A network-interference model and simulation. In J. I. D. Campbell (Ed.), The nature and origins of mathematical skill (pp. 331-361). Amsterdam, Netherlands: North-Holland.
Carpenter, G. A., & Grossberg, S. (1991). Pattern recognition by self-organizing neural networks. Cambridge, MA: MIT Press.
Churchland, P. S., & Sejnowski, T. J. (1992). The computational brain. Cambridge, MA: MIT Press.
Cohen, I. R. (in press). An artificial neural network analogue of learning in autism. Biological Psychiatry.
Cohen, I. R., Sudhalter, V., Landon-Jimenez, D., & Keogh, M. (1993). A neural network approach to the classification of autism. Journal of Autism and Developmental Disorders, 23, 443-466.
DeLoache, J. S., Cassidy, D. J., & Brown, A. L. (1985). Precursors of mnemonic strategies in very young children's memory. Child Development, 56, 125-137.
Farah, M. J., O'Reilly, R. C., & Vecera, S. P. (1993). Dissociated overt and covert recognition as an emergent property of a lesioned neural network. Psychological Review, 100, 571-588.
Farah, M. J., & McClelland, J. L. (1991). A computational model of semantic memory impairment: Modality specificity and emergent category specificity. Journal of Experimental Psychology: General, 120, 339-357.
Fisher, M. A., & Zeaman, D. (1973). An attention-retention theory of retardate discrimination learning. In N. R. Ellis (Ed.), International review of research in mental retardation (Vol. 6, pp. 171-257). New York: Academic Press.
Fletcher, K. L., & Bray, N. W. (in press). External memory strategy use in preschool children. Merrill-Palmer Quarterly.
Fletcher, K. L., & Bray, N. W. (1995). External and verbal strategies in children with and without mild mental retardation. American Journal on Mental Retardation, 99, 363-375.
Fox, R., & Oross, S., III. (1992). Perceptual deficits in mildly mentally retarded adults. In N. W. Bray (Ed.), International review of research in mental retardation (Vol. 18, pp. 1-25). New York: Academic Press.
Grossberg, S. (1976). Adaptive pattern classification and universal recoding: Part I. Parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121-134.
Halford, G. S. (1993). Children's understanding: The development of mental models. Hillsdale, NJ: Lawrence Erlbaum.
Hinton, G. E., & Shallice, T. (1991). Lesioning an attractor network: Investigations of acquired dyslexia. Psychological Review, 98, 74-95.
Hrycej, T. (1992). Modular learning in neural networks: a modularized approach to neural network classification. New York: Wiley.
Intons-Peterson, M. J., & Fournier, J. (1986). External and internal memory aids: When and how often do we use them? Journal of Experimental Psychology: General, 115, 267-280.
Johnson-Laird, P. N. (1983). Mental Models. Cambridge, MA: Harvard University Press.
Kreutzer, M. A., Leonard, C., & Flavell, J. H. (1975). An interview study of children's knowledge about memory. Monographs of the Society for Research in Child Development, 40 (Whole No. 159).
Levine, D. S. (1989). Neural network principles for theoretical psychology. Behavior Research Methods, Instruments, & Computers, 21, 213-224.
Levine, D. S. (1991). Introduction to neural & cognitive modeling. Hillsdale, NJ: Lawrence Erlbaum.
Levine, D. S., & Leven, S. J. (1992). Motivation, emotion, and goal direction in neural networks. Hillsdale, NJ: Lawrence Erlbaum.
McClelland, J. L. (1989). Parallel distributed processing: Implications for cognition and development. In R. G. M. Morris (Ed.), Parallel distributed processing: Implications for psychology and neurobiology (pp. 8-45). New York: Oxford University Press.
McClelland, J. L. (1991). Stochastic interactive processes and the effects of context on perception. Cognitive Psychology, 23, 1-44.
McClelland, J. L. (1995). A connectionist perspective on knowledge and development. In T. J. Simon & G. S. Halford (Eds.), Developing cognitive competence: New approaches to process modeling (pp. 157-204). Hillsdale, NJ: Lawrence Erlbaum.
McClelland, J. L., & Jenkins, E. (1991). Nature, nurture, and connections: Implications of connectionist models for cognitive development. In K. VanLehn (Ed.), Architectures for Intelligence (pp. 41-73). Hillsdale, NJ: Lawrence Erlbaum.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1, An account of basic findings. Psychological Review, 88, 375-407.
McCloskey, M., & Lindemann, M. A. (1992). MATHNET: Preliminary results from a distributed model of arithmetic fact retrieval. In J. I. D. Campbell (Ed.), The nature and origins of mathematical skill (pp. 365-409). Amsterdam, Netherlands: North-Holland.
McGilly, K., & Siegler, R. S. (1989). How children choose among serial recall strategies. Child Development, 60, 172-182.
Newell, A., Young, R., & Polk, T. (1993). The approach through symbols. In D. Broadbent (Ed.), The simulation of human intelligence. Cambridge, MA: Blackwell.
Patterson, K., Seidenberg, M. S., & McClelland, J. L. (1989). Connections and disconnections: Acquired dyslexia in a computational model of reading processes. In R. G. M. Morris (Ed.), Parallel distributed processing: Implications for psychology and neurobiology (pp. 131-181). Oxford: Oxford University Press.
Petro, S. J., Herrmann, D., Burrows, D., & Moore, C. M. (1991). Usefulness of commercial memory aids as a function of age. International Journal of Human Development, 33, 295-209.
Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 1-60.
Plunkett, K., & Sinha, C. (1992). Connectionism and developmental theory. British Journal of Developmental Psychology, 10, 209-254.
Reilly, K. D., Bray, N. W., Villa, M. F., Caniglia-Reilly, J. A., & Golding, E. (1996). The RoboKid models: Modules, substrates, behavior. In K. H. Chang and J. H. Cross, II (Eds.), Proceedings of the 34th Annual Southeast Conference of the Association for Computing Machinery (pp. 341-343). New York: ACM.
Reilly, K. D., Bray, N. W., Villa, M. F., & Anumolu, V. (1993). Neural network and hybrid modeling of external memory strategy acquisition in humans and intelligent robots. In SIMTEC 1993 Proceedings.
Rogoff, B. (1990). Apprenticeship in Thinking: Cognitive Development in Social Contexts. Oxford: Oxford University Press.
Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tenses of English verbs. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2. Cambridge, MA: MIT Press.
Rumelhart, D. E., & McClelland, J. L. (1987). Learning the past tenses of English verbs: Implicit rules or parallel distributed processing. In B. MacWhinney (Ed.), Mechanisms of language acquisition (pp. 195-248). Hillsdale, NJ: Lawrence Erlbaum.
Rumelhart, D. E., & McClelland, J. L. (1988). Parallel distributed processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations. Cambridge, MA: MIT Press.
Rumelhart, D. E., & Zipser, D. (1985). Feature discovery by competitive learning. Cognitive Science, 9, 75-112.
Schneider, W., & Oliver, W. L. (1991). An instructable connectionist/control architecture: Using rule-based instructions to accomplish connectionist learning in a human time scale. In K. VanLehn (Ed.), Architectures for intelligence: The 22nd Carnegie Mellon symposium on cognition (pp. 113-146). Hillsdale, NJ: Lawrence Erlbaum.
Seidenberg, M. S. (1993). Connectionist models and cognitive theory. Psychological Science, 4, 228-235.
Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.
Sejnowski, T. J., Koch, C., & Churchland, P. S. (1990). Computational Neuroscience. In S. J. Hanson & C. R. Olsen (Eds.), Connectionist modeling and brain functioning: The developing interface. Neural Network Modeling and Connectionism (pp. 5-35). Cambridge, MA: MIT Press.
Sejnowski, T. J., & Rosenberg, C. R. (1988). Learning and representation in connectionist models. In M. S. Gazzaniga (Ed.), Perspectives in Memory Research (pp. 135-178). Cambridge, MA: MIT Press.
Siegler, R. S. (1991). Children's thinking. Englewood Cliffs, NJ: Prentice Hall.
Siegler, R. S., & Crowley, K. (1994). Constraints on learning in non-privileged domains. Cognitive Psychology, 27, 194-226.
Siegler, R. S., & Jenkins, E. (1989). How children discover new strategies. Hillsdale, NJ: Lawrence Erlbaum.
Siegler, R. S., & Shipley, C. (in press). Variation, selection, and cognitive change. In G. Halford and T. Simon (Eds.), Developing cognitive competence: New approaches to process modeling. Hillsdale, NJ: Lawrence Erlbaum.
Smith, L. B. (1993). The concept of same. Advances in Child Development and Behavior (Vol. 24, pp. 216-251). New York: Academic Press.
Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11, 1-74.
Sun, R. (1995). Robust reasoning: Integrating rule-based and similarity-based reasoning. Artificial Intelligence, 75, 241-295.
Wasserman, P. D. (1989). Neural computing theory and practice. New York: Van Nostrand Reinhold.
Wellman, H. M., Ritter, K., & Flavell, J. H. (1975). Deliberate memory behavior in the delayed reactions of very young children. Developmental Psychology, 11, 780-787.