this article was originally published in the comprehensive

19
This article was originally published in the Comprehensive Developmental Neuroscience: Neural Circuit Development and Function in the Brain published by Elsevier, and the attached copy is provided by Elsevier for the author's benefit and for the benefit of the author's institution, for non-commercial research and educational use including without limitation use in instruction at your institution, sending it to specific colleagues who you know, and providing a copy to your institution’s administrator. All other uses, reproduction and distribution, including without limitation commercial reprints, selling or licensing copies or access, or posting on open internet sites, your personal or institution’s website or repository, are prohibited. For exceptions, permission may be sought for such use through Elsevier's permissions site at: http://www.elsevier.com/locate/permissionusematerial Lany J., and Saffran J.R. (2013) Statistical Learning Mechanisms in Infancy. In: RUBENSTEIN J. L. R. and RAKIC P. (ed.) Comprehensive Developmental Neuroscience: Neural Circuit Development and Function in the Brain, volume 3, pp. 231-248 Amsterdam: Elsevier. © 2013 Elsevier Inc. All rights reserved.

Upload: others

Post on 28-Jan-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: This article was originally published in the Comprehensive

This article was originally published in the Comprehensive Developmental Neuroscience: Neural Circuit Development and Function in the Brain published by Elsevier, and the attached copy is provided by Elsevier for the author's benefit and for the benefit of the author's institution, for non-commercial research and educational use including without limitation use in instruction at your institution, sending it to specific colleagues who you

know, and providing a copy to your institution’s administrator.

All other uses, reproduction and distribution, including without limitation commercial reprints, selling or licensing copies or access, or posting on open internet sites, your personal or institution’s website or repository, are prohibited. For exceptions, permission may be sought for such use through Elsevier's permissions site at:

http://www.elsevier.com/locate/permissionusematerial

Lany J., and Saffran J.R. (2013) Statistical Learning Mechanisms in Infancy. In: RUBENSTEIN J. L. R. and

RAKIC P. (ed.) Comprehensive Developmental Neuroscience: Neural Circuit Development and Function in the Brain, volume 3, pp. 231-248 Amsterdam: Elsevier.

© 2013 Elsevier Inc. All rights reserved.

Page 2: This article was originally published in the Comprehensive

Neural Circuit Development and Function in the Brain: Comprehensiv

Neuroscience, Volume 3 http://dx.doi.org/10.1016/B978-0-12-397

Comprehensive Developmental Neurosc

C H A P T E R

Author's personal copy

13

Statistical Learning Mechanisms in InfancyJ. Lany1, J.R. Saffran2

1University of Notre Dame, Notre Dame, IN, USA; 2University of Wisconsin-Madison, Madison, WI, USA

O U T L I N E

13.1 Learning Probability Distributions 23213.1.1 Tracking Probability Distributions in

Speech Sounds 23213.1.2 Tracking Probabilities in Sampling

Events 234

13.2 Learning Co-occurrence Statistics 23513.2.1 Sequential Learning: Phonotactics 23513.2.2 Sequential Structure: Word

Segmentation 23513.2.3 Sequential Learning: Grammatical

Patterns 23713.2.4 Learning Nonadjacent Co-occurrence

Probabilities 23713.2.5 Learning Co-occurrence Statistics in Other

Domains 23813.2.6 Neural Correlates of Learning Sequential

Structure 239

13.3 Learning Correlational Structure 23913.3.1 Sensitivity to Correlated Cues in

Language Acquisition 23913.3.2 Learning Correlational Structure in the

Visual Domain 240

13.4 Learning Associations between Words andReferents 24213.4.1 Sensitivity to the Statistics of Objects

Taking the Same Label 24313.4.2 Sensitivity to the Internal Statistics of

Labels in Word Learning 244

13.5 Conclusions 245

Acknowledgments 246

References 246

e Develop

267-5.00

ience:

231mental

034-0

Neural Circuit Develo

pment

While psychologists long imagined that infants expe-rience the world as a bewildering array of sights, sounds,and sensations (James, 1890), research since the 1950shas shown that infants are born with more advancedperceptual and cognitive abilities than was oncethought. For example, infants’ visual, auditory, tactile,and vestibular systems are functional at birth, andtheir discrimination abilities in these areas are alreadyremarkably acute (see Kellmann and Arterberry, 2000for a comprehensive discussion). They show evidenceof processing and remembering stimuli in their environ-ment; even very young infants differentially attend to re-peated stimuli versus novel stimuli (e.g., Fagan, 1970).Infants also exhibit differential attention to stimulus fea-tures. For example, newborns prefer complex, patternedstimuli to homogeneous ones (Olson and Sherman,1983), facelike stimuli to nonfacelike ones (Valenza

et al., 1996), and speech to other acoustic stimuli(Vouloumanos et al., 2010).

In addition to advances in our understanding of in-fants’ perceptual abilities, there has been great progressin the last two decades in the study of the mechanismsby which infants learn about their environment. Scien-tists began studying whether infants can learn via classi-cal and operant conditioning, and by the 1970s,researchershadmade substantial progress indiscoveringthe conditions under which such learning occurred(Rovee-Collier, 1986; Sameroff, 1971). In the decadessince, researchers have continued to make exciting dis-coveries about the nature of the learning mechanismsbywhich infants become sensitive to structure in their en-vironment. In this chapter, the focus is on recent researchon infants’ ability to learn four different kinds of statisti-cal structure: probability distributions, sequential

# 2013 Elsevier Inc. All rights reserved.

and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 3: This article was originally published in the Comprehensive

232 13. STATISTICAL LEARNING MECHANISMS IN INFANCY

Author's personal copy

structure, correlations between stimulus dimensions,and associations between different forms of information.These types of statistical structures appear to be founda-tional to many aspects of development, includingauditory and visual perception, language development,early cognition, and event processing. Much of ourdiscussion is focused on the role of these learningmech-anisms in language acquisition, though connections arealso made between language learning and learning inother domains. Whenever possible, research investigat-ing the relationships between behavioral findings andunderlying neural processes has also been presented.The studies reviewed in this chapter reveal the power,nuance, and complexity of infant statistical learningmechanisms and contribute to anunderstanding of theirrole in early cognitive development.

13.1 LEARNING PROBABILITYDISTRIBUTIONS

13.1.1 Tracking Probability Distributionsin Speech Sounds

A basic form of learning available to infants is sensi-tivity to the frequencies with which events occur in theenvironment. Early research on infant perceptual devel-opment revealed that infants can differentiate betweenfrequent and infrequent events: when presented withsuccessive trials of two pictures side by side in whichone is repeated while the other changes, infants spendmore time looking toward the novel picture (Fantz,1964). Recent studies revealed that by the time they areabout 8months old, infants can do something potentiallymuch more powerful than tracking frequencies of indi-vidual events: they can track the relative frequencies ofitems within a set, or probability distributions.

Differences between speech sounds, or phonemes,that correspond to differences in the meanings of wordsare called ‘phonetic contrasts.’ For example, the Englishsounds /r/ and /l/ are considered to be different pho-nemes, representing a phonetic contrast, because aswitch from one to the other changes the meaning of aword (i.e., ‘right’ vs. ‘light’). However, while /r/ and /l/are perceived differently by adult speakers of English,they are not by speakers of Japanese, as these speechsounds do not correspond to a meaningful differencein their language. Thus, adults’ ability to discriminateamong speech sounds reflects the phonetic organiza-tion of their native language; adults are best able todiscriminate between sounds that embody phoneticcontrasts.

While adults’ phonetic perception differs according totheir native language, landmark research in the 1970srevealed that early in development, infants across

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

languages readily discriminate among almost all of thespeech sounds of the world (e.g., Eimas et al., 1971; forreview, see Aslin et al., 1998a). Within the first year oflife, however, infants’ phonetic discrimination patternsbecome closely matched to the phonetic contrasts rele-vant in their native language (Kuhl et al., 1992; Werkerand Tees, 1984). Within a given language, the distribu-tion of speech sounds along a particular continuumappears to reflect phonetic category information (e.g.,Lisker and Abramson, 1964). Specifically, every produc-tion of a speech sound differs on a variety of acousticdimensions, influenced by the characteristics of individ-ual speakers’ vocal tract, speech rate, coarticulation, andphrase or sentence-level prosody. However, for acousticdimensions that are critical in the perceived differencesbetween phonemes, the values will tend to form nonuni-form distributions, in which some values along a dimen-sion are much more frequent than others (i.e., forminga bimodal or trimodal distribution). Interestingly, thedifferences relevant for highlighting phonemic contrastsmay be exaggerated in infant-directed speech (Burnhamet al., 2002; Kuhl et al., 1997; Werker et al., 2007). Adultscan capitalize on nonuniform variation in speech soundsto discriminate between phonemes (Kluender et al.,1998), and computational modeling studies suggest thatsensitivity to probability distributions may play an im-portant role in learning phonetic contrasts (McMurrayet al., 2009; Valhaba et al., 2007).

Maye et al. (2002) tested whether experience with un-evenly distributed sound profiles influences phoneticperception in English learning in infants at 6 and8 months of age. They created a set of eight speechsounds forming a continuum between [da] and [ta].The endpoints of the continuum differed in the amountof voicing in the initial aspiration, from [da] (voiced) to[ta] (unvoiced), with six intermediate values. Infants inthe unimodal condition heard a distribution in whichthe intermediate values (tokens 4 and 5) occurred mostfrequently, with decreasing frequencies toward the tailsof the distribution. For infants in the bimodal condition,tokens 2 and 7 occurred with high frequency, while thetokens at the midpoint and endpoints were relativelyless frequent.

After the infants were familiarized to the unimodal orbimodal distribution, they were tested to determinewhether they discriminated between sounds at the op-posite endpoints of the [da] – [ta] continuum, using apreferential listening task. In this procedure, the presen-tation of an auditory stimulus is contingent on infants’attention to a visual stimulus, such as a flashing lightor a checkerboard. As long as the infants continue to looktoward the visual stimulus, the auditory stimulus con-tinues to play, but once they look away, it stops and anew trial begins. If infants’ phonetic discrimination is af-fected by the distributional properties of the speech

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 4: This article was originally published in the Comprehensive

23313.1 LEARNING PROBABILITY DISTRIBUTIONS

Author's personal copy

sounds, they should continue to discriminate betweenstimuli at the endpoints when the distributional infor-mation supports the maintenance of two phoneticcategories. However, when the values of the tokens forma unimodal distribution, infants should treat the soundsas belonging to a single phonetic category, ignoringvariation in voicing. Only infants in the bimodal condi-tion showed discrimination of the endpoints, suggestingthat experience with unimodal distributions along anacoustic continuum may play a role in the loss of sensi-tivity to phonetic contrasts not relevant in theirlanguage.

Recent research suggests that experience with distri-butional variation in speech input can also result in anenhancement or sharpening of discrimination. For exam-ple, Kuhl et al. (2006) found that at 6–8 months of age,both English- and Japanese-learning infants discrimi-nate between /r/ and /l/, a phonetic contrast presentonly in English. By 10–12 months of age, the Japanese-learning infants showed reduced discrimination, consis-tent with other studies showing a loss of discriminationof nonnative language contrasts at this age. the English-learning infants, on the other hand, showed better dis-crimination between /r/ and /l/ at 10–12 months thanat 6–8 months. Moreover, infants who hear speech inwhich acoustic differences in phonemic contrasts are ex-aggerated also show enhanced speech discriminationskills (Liu et al., 2003), suggesting that they capitalizeon acoustic information differentiating speech sounds.

Maye et al. (2008) tested the role of distributionallearning in the enhancement of speech-sound discrimi-nation using a contrast that is difficult for infants to dis-criminate: the contrast between pre-voiced and short-lagstop consonants, such as [da] versus [ka] (Aslin et al.,1981). They found that 8-month-old infants learningEnglish failed to discriminate the endpoints when givenno additional exposure to speech sounds on the contin-uum, confirming that this is a difficult distinction for in-fants. Infants exposed to a unimodal distribution oftokens in the lab also failed to show discrimination.However, infants exposed to a bimodal distributionshowed evidence of discrimination, suggesting thatsensitivity to distributional information can lead to en-hanced discrimination of difficult speech-sound con-trasts. Interestingly, at 10–12 months of age, just afterthe period of perceptual reorganization, infants’speech-sound discrimination abilities show diminishingeffects of experience with distributional information.Specifically, they need more extensive experience witha bimodal distribution to show evidence of discrimina-tion of speech sounds that are not contrastive in theirnative language (Yoshida et al., 2010).

In sum, developmental changes in infants’ phoneticperception appear to result, at least in part, from theirability to track distributional regularities in their native

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

language. Kuhl (2004) has proposed that infants’ experi-ence with acoustic and distributional cues works in con-cert withmaturational changes in neural development toproduce such changes. Data from neurophysiological re-cordings suggest that there are neural changes in infants’processing of nonnative phonetic contrasts. Cheour et al.(1998) used an oddball paradigm to test Finnish infants’discrimination of native and nonnative vowel contrastsat both 6 and 12 months of age. In these studies, a vowelpresent in the infants’ native language was repeated,with another ‘oddball’ vowel presented on about 1 in10 trials. The oddball was either another native-languagevowel, which differed in the frequency of the secondformant, or a vowel that does not occur in Finnish.Evoked-response potential (ERP) studies using the odd-ball paradigm have shown that when infants detect achange to the repeating stimulus, the ERP wave showsa negative deflection beginning at about 200 ms, calleda mismatch negativity, or MMN. At 6 months of age,Finnish infants showed a similar MMN to both the na-tive and nonnative oddball vowels, but by 12 months,they showed a much greater MMN response to the na-tive vowel. Using a similar paradigm in a longitudinalstudy with English-learning infants, Rivera-Gaxiolaet al. (2005b) found an enhancement in the neural re-sponse to native-language contrasts between 7 and 11months. They also found that infants continued to showa neural response indicative of discrimination for nonna-tive vowels, though the neural manifestation of the dis-crimination took two different forms. In one group ofinfants, the oddball elicited a negative component, sim-ilar to their response at 6 months of age, while in a sec-ond group, it elicited a greater positive component.Interestingly, Rivera-Gaxiola et al. (2005a) found that in-fants who continued to show a negative component inresponse to nonnative oddball phonemes later had smal-ler native-language vocabularies, while those thatshowed a positive response had larger vocabularies.The authors hypothesized that the development of thepositive response in infants who later developed moreadvanced language skills reflects neural reorganizationin response to native-language patterns. These data arein linewithKuhl’s (2004) hypothesis that changes in neu-ral organization may play a role in changes in phonemeperception during infancy.

In sum, experience with distributional informationplays an important role in the dramatic changes inspeech-sound discrimination that take place between8 and 10 months of age. While infants remain sensitiveto distributional cues marking nonnative phonemic con-trasts after the period of reorganization, these findingssuggest that neural changes, increased experience withthe distributional properties of one’s native language,or a combination of these factors, reduces their impacton discrimination. It is important to note that the social

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 5: This article was originally published in the Comprehensive

234 13. STATISTICAL LEARNING MECHANISMS IN INFANCY

Author's personal copy

context in which infants experience nonnative languagedistributional patterns may also modulate the effectsof such patterns, but by increasing sensitivity to novelphonetic contrasts – infants show greater sensitivity todifference in nonnative contrasts after listening to anative speaker in person than after watching and listen-ing to a native speaker on a video tape (Kuhl et al., 2003).These findings suggest that although reorganizationtakes place early and has dramatic effects on speechperception, the capacity to learn novel contrasts ismaintained and can be facilitated through socialinteraction.

13.1.2 Tracking Probabilities in Sampling Events

The studies described so far suggest that distribu-tional information may facilitate perceptual reorganiza-tion in the auditory realm over the first year. Anotherline of recent research suggests that sensitivity to the dis-tributional regularities plays a pervasive role in learningand cognition across diverse areas of development. In-ductive learning involves encountering a finite set of ex-amples (a sample), and using that information toextrapolate beyond the sample to behave flexibly whenencountering new examples. To do so, learners must useinformation about the distribution of samples to learnabout the underlying properties of the population fromwhich the samples were drawn.

In a recent series of studies, Xu and Garcia (2008) ex-plicitly tested infants’ sensitivity to the relationship be-tween the distribution of individual samples and theunderlying populations they belong to. In one experi-ment, 8-month-old infants watched an experimenterbring out a covered box and remove 5 balls of mostlyone color, either 4 red and 1 white, or 4 white and1 red ball with her eyes closed. After she had removedthe 5 balls, the experimenter removed the box’s cover.Thus, the infants could not see the composition of ballsin the box until after the sample balls were removed.The infants looked at the box longer on trials when thesample was unlikely under conditions of random sam-pling (i.e., when the container containedmostly red ballsafter the experimenter had just drawnmostly white ballsfrom it). This suggests that infants’ knowledge of theproperties of the sample influenced their expectationsabout the properties of the population (i.e., the coveredbox). Interestingly, when the same experiment was re-peated, but the experimenter drew the ping-pong ballsfrom her pocket, infants showed no differences in look-ing times toward boxes containing mostly red versusmostly white balls. In other words, when the test ballswere not sampled from the box, infants did not responddifferentially to the contents of the box as a function ofthe properties of the sample. This suggests that in the

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

initial experiment, infants were responding to the rela-tion between a sample and the population they weredrawn from, rather than to more superficial perceptualrelationships between the sample balls and the balls inthe box.

Teglas et al. (2007) also investigated whether infants’processing of sampling events is affected by populationinformation. Twelve-month-olds viewed a moviedepicting a container with a small opening at its base,with four objects bouncing inside. Critically, three ofthe objects were identical, and one was unique. Next,an occluder covered the container, and one of the objectsexited the container through the opening. On half of thetrials, the infants saw events with probable outcomes(one of the three identical objects exited), and on theother trials, they saw the less probable outcome (theunique object exited the container). The infants lookedlonger at the improbable than the probable outcome,suggesting that the composition of objects in the con-tainer influenced their processing of the outcomes. How-ever, when the container was partitioned such that onlythe unique object was physically capable of exiting, theinfants looked longer when one of the three identicalobjects exited the container. This suggests that they werenot simply responding to surface information aboutobject frequency, looking longer toward a less commonor less familiar object, but rather were responding to therelationship between the population of objects in thecontainer and the outcome of the sampling event.

These studies also hint that infants are sensitive to in-formation about the sampling process itself. For exam-ple, when the sampling process is random, as when anexperimenter with her eyes closed picked balls out of abox in Xu and Garcia (2008), infants expect the sampleto reflect the global properties of the population. How-ever, when the sampling was not random, infantsshowed evidence of adjusting their expectations aboutthe relations between a sample and the population. Forexample, when an occluder made some sampling eventsimpossible in Teglas et al. (2007), infants did not appearto expect that the more frequent elements would besampled. Moreover, Xu and Denison (2009) found thatinfants use information about an individual’s prefer-ences (e.g., for red balls) to guide their sampling be-havior, and they are not surprised when she looks intoa box containingmostly white balls and picks out mostlyred ones.

Taken together, these findings suggest that across do-mains, infants are highly sensitive to the probabilities ofindividual events and can integrate this information inorder to track the probabilities of sets of items relatedto one another in an underlying distribution or popula-tion. In the domain of speech perception, this process ap-pears to facilitate differentiation between random noiseand meaningful variations in similar sounds. In the

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 6: This article was originally published in the Comprehensive

23513.2 LEARNING CO-OCCURRENCE STATISTICS

Author's personal copy

domain of event perception, sensitivity to distributionalinformation allows infants to generalize from individualexperiences to anticipate future scenarios.

13.2 LEARNING CO-OCCURRENCESTATISTICS

Within a given domain, the reliable co-occurrence of el-ements is typically a surface manifestation of a meaning-ful relationship between those elements. Co-occurrenceinformation thus has the potential to play a pervasive rolein learning, as long as learners are sensitive to it. Despitethe seemingly simple nature of such associative relation-ships, tracking sequential associations in most environ-mental patterns can be quite challenging. It entailsencoding a stimulus, aswell as those that precede and fol-low it, and maintaining memory representations acrossmultiple occurrences of each stimulus. This might notseem difficult to do for one item that occurs in a reliablecontext, but consider tracking this information for a set of100, or even 1000 different items that occur in highlyvariable contexts.

Natural language provides a compelling illustrationof this difficult problem. Given that the sounds compris-ing words and sentences unfold over time, as opposed tobeing expressed simultaneously, spoken language is richwith sequential structure. Language is hierarchically or-ganized, consisting of patterns at both very fine-grainedand larger-grained levels, and there are sequential regu-larities at each level of structure. In part because ofthe high demands that tracking such complex sequentialstructure would place on infants, the potential role of co-occurrence learningmechanisms in language acquisitionhas been the object of a great deal of interest. Numerousstudies have tested two related hypotheses: (1) trackingsequential associations could lead to the discovery oflanguage structure, both within words as well as acrosswords and in simple grammatical patterns, and (2) in-fants have sufficient computational resources to tracksuch complex information.

13.2.1 Sequential Learning: Phonotactics

At a very fine-grained level, languages are organizedaccording to phonotactic patterns: regularities in howsounds are structured within words. What makes theseregularities interesting is that they differ across lan-guages, and thus must be learned. For example, Englishsyllables can begin with some consonant clusters (e.g.,/st/ and /fr/), but not others (e.g., /fs/ and /ng/).However, none of these sequences can appear in syllableonsets in Japanese. Infants appear to be sensitive to pho-notactic patterns by 9 months of age (Friederici and

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

Wessels, 1993; Jusczyk et al., 1993, 1994). In addition,9-month-old infants can learn novel phonotactic patternsgiven brief laboratory experience with novel languages(e.g., Saffran and Thiessen, 2003). Following exposureto lists of novel words in which the initial consonantwithin a syllable was always voiced (e.g., /g/, /d/,/b/) and the final consonant was unvoiced (e.g., /k/,/t/, /p/), infants were able to distinguish between novelwords consistent with those patterns and ones that fol-lowed the opposite pattern. However, when voicingwas not a consistent feature of syllable onsets and offsets(i.e., /g/, /b/, and /t/ occur in syllable-initial position,while /k/, /d/, and /p/ occur in syllable-final position),infants failed to show evidence of learning the phonotac-tic regularities. This suggests that infants can learn posi-tional regularities when they follow a consistent pattern,but that learning restrictions on the positions of individ-ual elements is taxing for infants at this age.

Interestingly, by 16.5 months, infants successfullylearn individual restrictions on sound locations withinsyllables (Chambers et al., 2003). These results suggestthat with age and linguistic experience, infants’ abilityto learn more specific regularities is enhanced. Becauselearning an abstract pattern typically results from theability to track regularities or similarities across individ-ual exemplars, it is often more demanding than trackinginformation about specific exemplars. However, whenexemplars share a salient feature, the need for trackingexemplar-specific information can be reduced and maylessen the computational demands on learning.This may bewhy younger infants were able to track pho-nological regularities only when they conformed to aphonological generalization.

13.2.2 Sequential Structure:Word Segmentation

Another source of sequential regularity in languagepertains to how syllables are combined to form words.Syllables that reliably co-occur often belong to the sameword, while syllables that rarely co-occur are more likelyto span word boundaries (e.g., Swingley, 2005). Becauseof this feature of language, transitional probabilitiesbetween syllables tend to be higher within wordsthan between words. The transitional probability (TP)of a co-occurrence relationship between two elements,X and Y, is computed by dividing the frequency ofXY by the frequency of X. This yields the probabilitythat if X occurs, Y will also occur. Saffran et al. (1996)tested whether 8-month-old infants, who are just begin-ning to learn their first words, can capitalize on TP cues.Infants listened to a stream of synthesized speech inwhich the only potential cue to word boundaries wasthe TPs between adjacent syllables (1.0 within words;0.33 at word boundaries). The infants listened to the

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 7: This article was originally published in the Comprehensive

236 13. STATISTICAL LEARNING MECHANISMS IN INFANCY

Author's personal copy

stream for about 2 min and were then tested on theirability to discriminate between the syllable sequenceswith high TPs (‘words’) and novel combinations ofthe familiar syllables (‘nonwords’). Infants listened lon-ger to the strings containing nonwords, suggesting theydistinguished between syllable sequences that wereattested in their input (TPs ¼ 1.0) and sequences thatdid not occur (TPs ¼ 0). A second group of infantswas familiarized in the same manner and tested on se-quences with high TPs ‘words’ (all internal TPs were1.0), and ‘part-words’: sequences of syllables that hadoccurred across word boundaries (e.g., the final syllableof oneword and the first two syllables of another word),which had lower TPs of 0.33. Infants again displayed anovelty preference, suggesting that they can also dis-criminate sequences that contain highly reliable transi-tions from those that contain less reliable ones.

Subsequent studies more specifically probed thekinds of sequential statistics that infants are sensitiveto in these studies, in particular, distinguishing betweenthe frequency of a sequence and its transitional probabil-ity. The frequency of a sequence is simply the number oftimes it occurs, while the transitional probability of a co-occurrence relationship provides a measure of howtightly linked or connected X and Y are, controlling forthe raw frequency of X. If X occurs many times withoutY, then no matter how many times it occurs with Y,the conditional probability will be relatively low.Conversely, a sequence can have low frequency but ahigh conditional probability.

In the Saffran et al. (1996) study, the syllable transi-tions within words were both more frequent and hadhigher TPs than the syllable transitions in part-words.Aslin et al. (1998b) thus investigated whether infantscan track TPs, or just frequencies. Specifically, theymod-ified the design of Saffran et al. (1996) such that two ofthe words occurred twice as often as the other words.This manipulation permitted a design in which the fourtest items – two words and two part-words – wereequally frequent; however, the TPs within words were1.0, and the TPs spanning the word boundaries were still0.33. Infants showed discrimination between the wordsand part-words, despite the fact that the syllable se-quences occurred equally often in both types of testitems. This suggests that by 8months of age, infants havea powerful mechanism for tracking co-occurrence rela-tionships and for distinguishing potentially spuriousco-occurrences from ones in which there is a very tightconnection.

Another recent study further specified the nature ofthe regularities that infants can use to learn reliable co-occurrence relationships. The TPs in these studies haveprimarily been described as prospective relationships,or the probability that, given the occurrence of a syllable,another syllable will follow (i.e., the probability that the

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

syllable ‘by’ will follow the syllable ‘ba,’ as in the wordbaby). However, infants could also be sensitive to theprobabilities of retrospective relationships (i.e., the prob-ability that the syllable ‘ba’ will precede the syllable ‘by’).In the studies of sensitivity to TPs reviewed so far, thewords contained forward TPs (FTPs) and backwardTPs (BTPs) of 1.0, and thus infants could have used eitheror both to discriminate words from part-words. More-over, in natural languages, FTPs and BTPs are both likelyto be reliable cues to word boundaries, and thus trackingboth would potentially be advantageous to infants.Pelucchi et al. (2009a) thus familiarized 8-month-oldinfants with a corpus in which HTP and LTP words dif-fered in their BTPs, but both had FTPs of 1.0. Thus, in-fants could only use BTPs to distinguish between theHTP and LTP words. Infants showed significant dis-crimination between HTP and LTP words, suggestingthat they were sensitive to the BTPs of the syllable se-quences. While FTPs may facilitate anticipating upcom-ing stimuli, infants’ sensitivity to BTPs may help them toremember what came before a particular element. Thus,infants’ sensitivity to both types of relationships poten-tially enhances word segmentation and other tasks thatrequire sequential learning and processing.

In most of the studies described so far, infants weregiven limited exposure to artificial languages, producedas synthetic speech,which is not as engaging as naturallyspoken infant-directed speech. More recent studies haveinvestigated whether infants also use these cues to seg-ment speech when there are many more competing reg-ularities to attend to, and many more individualsegments over which to compute TPs. For example,Pelucchi et al. (2009b) tested whether infants can trackforward TPs in a natural language. They presentedEnglish-learning 8-month-olds with Italian sentencesspoken with infant-directed prosody. Embedded inthese sentences were two words with high internal TPs(1.0) and two words with low internal TPs (0.33). Overthe course of the 2-min familiarization phase, infantsheard each of these words presented just 18 times. Whentested, infants discriminated between high TP wordsand low TP words, suggesting that they were able totrack TPs despite the complexity and variability of thematerials. Pelucchi et al. (2009a), described above, alsoused naturally spoken Italian in their work showing thatinfants can track backward TPs.

In another study examining how statistical learningmechanisms scale up to the complexity of natural lan-guage, Johnson and Tyler (2010) tested infants’ abilityto use TPs in a corpus in which the words varied in syl-lable number. They familiarized 8- and 5.5-month-oldinfants to a synthesized speech stream in which twobisyllabic words and two trisyllabic words wereconcatenated in a continuous stream. Interestingly, in-fants failed to discriminate words from part-words

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 8: This article was originally published in the Comprehensive

23713.2 LEARNING CO-OCCURRENCE STATISTICS

Author's personal copy

under these conditions. This suggests that infants benefitfrom the presence of other regularities beyond TPs (i.e.,consistent syllable length). Indeed, learners also appearto benefit from rhythmic patterns that point to wordboundaries, though the degree to which this occurs de-pends on the infants’ age, presumably reflecting theamount of native-language knowledge the infant has ac-quired (e.g., Johnson and Jusczyk, 2001; Theissen andSaffran, 2003, 2007).

Altogether, these results provide strong evidence thatinfants are sensitive to sequential relationships betweenphonemes and syllables. The extent to which such cuesinteract with other potential cues to word boundaries,such as stress patterns, sentential prosody, and the oc-currence of words in isolation, are areas particularly inneed of further study.

13.2.3 Sequential Learning: GrammaticalPatterns

At yet another level up in the hierarchy of languagestructure, sensitivity to sequential relationships appearsto play a role in learning grammatical patterns, such aslearning how words can be combined into phrases andsentences. For example, Saffran andWilson (2003) askedwhether sensitivity to word boundaries arising fromtracking TPs between syllables provides a foundationfor tracking word combinations. They played 12-month-old infants synthesized speech strings in whichthe TPs between syllables served as a reliable cue toword boundaries (i.e., TPs within words were 1.0, whileTPs of syllables spanning word boundaries were 0.25).The strings also contained word-order patterns thatwere not directly cued by relationships between sylla-bles, but could only be detected by tracking the TPs be-tween thewords. Infants were then tested on their abilityto discriminate novel grammatical strings from ungram-matical ones. Despite the fact that the TPs of adjacent syl-lables could not be used to distinguish between thegrammatical and ungrammatical test strings, infantsshowed significant discrimination. These findings sug-gest that sensitivity to sequential relationships resultingin segmentation may bolster subsequent learning byhelping infants to learn sequential relationships betweenwords.

Gomez and Gerken (1999) also tested whether 12-month-old infants can learn TPs within multiword‘sentences.’ They created an artificial language in whichnonsense words were combined to form sentences inwhich there were probabilistic regularities in the order-ing of words. Infants were then tested on novel gram-matical and ungrammatical strings that containedfamiliar words. Infants successfully discriminatedbetween grammatical and ungrammatical strings,

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

providing evidence that they learned the probabilisticco-occurrence relationships between words. In a subse-quent experiment, infants were also able to distinguishbetween grammatical and ungrammatical strings whenthe strings were instantiated in a novel vocabulary. Thisfinding suggests that learning sequential statistics canlead not just to knowledge of individual sequences butalso potentially to a more abstract level of structure.

13.2.4 Learning Nonadjacent Co-occurrenceProbabilities

The studies on sequential learning described thus farsuggest that infants readily track predictive relation-ships between adjacent segments such as syllables andwords. However, in natural languages, predictive rela-tionships can also occur between nonadjacent elements.For example, grammatical dependencies marking tenseare often nonadjacent, as in the relationship betweenauxiliaries such as ‘is’ and the progressive inflection‘ing,’ as they are necessarily separated by a verb (e.g.,‘is running,’ ‘is eating,’ ‘is talking’). Likewise, a pluralnoun predicts plural marking on the subsequent verb,but the noun and verb can be separated by modifiers,as in ‘The kids who were late to school are in trouble.’ Track-ing nonadjacent dependencies places greater demandson memory than tracking adjacent dependencies, as ele-ments must be remembered long enough to be linked toother elements occurring later in time. In addition, be-cause nonadjacent dependencies can be separated byseveral word elements, there are many potentially irrel-evant relationships for the learner to track, presenting aconsiderable computational burden.

Given the demands involved in detecting nonadjacentregularities, it is perhaps unsurprising that both infantsand adults have substantial difficulty learning them. Forexample, while infants can track the relationships be-tween adjacent elements well before they turn a yearold, infants start showing sensitivity to grammaticalrelationships involving nonadjacent elements in theirnative language only at about 18 months of age(Santelmann and Jusczyk, 1998), suggesting that somecombination of language experience and maturation ofneural substrates for memory is needed to facilitate non-adjacent dependency learning.

Gomez (2002) investigated the conditions that pro-mote sensitivity to nonadjacent relationships. In particu-lar, she hypothesized that there is a relationship betweenthe presence of salient adjacent structure and the ten-dency to track nonadjacent structure. Because trackingadjacent relationships appears to be relatively easy for in-fants, they are likely to focus on adjacent structure as longas it is reliable. However, when the variability in adjacentrelationships is high (i.e., when adjacent TPs are low),

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 9: This article was originally published in the Comprehensive

238 13. STATISTICAL LEARNING MECHANISMS IN INFANCY

Author's personal copy

infants may be less likely to track those relationships andmore likely to track reliable nonadjacent regularities.

To test this hypothesis, Gomez (2002) exposed 18-month-olds to artificial language strings containing non-adjacent dependencies with TPs of 1.0. Critically, thesedependencies were separated by an intervening elementdrawn from a set of 3, 12, or 24 different elements, andthus the adjacent TPs between words varied acrossconditions: relatively high (0.33, set size 3), medium(0.08, set size 12), or very low (less than 0.01, set size24). Only infants in the set size 24 condition, in whichthe predictability of adjacent sequences was very low,successfully discriminated between familiar grammati-cal strings and strings that violated the nonadjacentrelationships.

In subsequent research, Gomez and Maye (2005)found that 15-month-old infants track nonadjacent de-pendencies even under conditions of high variabilitybut that 12-month-olds fail to do so. This developmentalpattern suggests that increases in memory capacity overthe second year facilitate nonadjacent dependency track-ing. In addition to developments in memory, anotherfactor in the development of nonadjacent dependencylearning is prior language experience. Just as segmenta-tion facilitates learning higher-order patterns in whichthose elements are combined, Lany and Gomez (2008)found that prior learning about adjacent relationshipsfacilitates infants’ detection of nonadjacent patterns. Inparticular, when 12-month-old infants were first givenexperience with adjacent relationships between wordcategories, they then successfully detected novel nonad-jacent relationships between words from thosecategories.

13.2.5 Learning Co-occurrence Statisticsin Other Domains

The studies described so far suggest that infants’ sen-sitivity to sequential regularities could play a role indetecting multiple layers of language structure, fromphonotactics to grammar. Far from being specific tolearning in the auditory domain, infants’ sensitivity tosequential structure plays an important role in learningacross domains. Sequential relationships are importantdimensions of event structure, such as in brushing one’steeth (e.g., picking up the toothbrush, applying tooth-paste, wetting the brush, and brushing), making coffee,or opening a toy container. The subcomponents of the ac-tion tend to co-occur in reliable sequences, and thustracking sequential structure in the visual domain mayplay a role in both detecting event boundaries and inlearning their internal structure.

Indeed, Baldwin et al. (2008) found that adults can usestatistical information to learn regularities in dynamic

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

action sequences. Employing a design similar to studiesinvestigating word segmentation (e.g., Saffran et al.,1996), adults viewed a series of discrete actions splicedtogether. There were action triplets embedded withinthe stream such that within a sequence of individual ac-tions (e.g., poke, scrub, drink), TPs were 1.0, and acrossaction sequences, the TPs between actions were 0.33. Theonly cues to the action sequence boundaries were theTPs. Adults were able to discriminate trained action se-quences (TPs of 1.0) from unattested sequences of famil-iar actions (TPs of 0.0), as well as from sequencesspanning boundaries (TPs of 0.33). They also showed ev-idence of detecting sequential regularities in action se-quences when the individual actions (e.g., the action ofdrinking) varied considerably in their perceptual fea-tures across training.

While similar studies with action sequences have notyet been done with infants, numerous studies suggestthat infants track sequential statistics in other domainsof visual processing. For example, Kirkham et al.(2002) tested infants’ ability to detect sequential relation-ships in a series of shapes. In these experiments, 2-, 5-,and 8-month-old infants were habituated to a series ofsix shapes that appeared sequentially on a screen. Theobjects were grouped into three sets of sequential pairs.Within a pair, one shape was reliably followed by an-other shape; the TPs between objects within a pair were1.0, while the TPs at pair boundaries were 0.33. At all ofthe ages tested, infants discriminated between stringsthat preserved the statistical regularities and sequencesin which the sequential regularities were violated. Whileinfants in this experiment could use either frequency ofco-occurrence or TPs to distinguish the test trials,Marcovitch and Lewkowicz (2009) found that by 4months of age, infants can separately track frequencyof shape co-occurrence as well as the conditional proba-bility of shape pairs. These findings suggest that the abil-ity to track sequential co-occurrence relationships in thevisual domain emerges quite early in development.

Other studies have focused on infants’ ability to learnsequential regularities in dynamic events. Kirkham et al.(2007) showed 8- and 11-month-old infants a spatiotem-poral sequence in which a red dot appeared consecu-tively in the six positions of a 2�3 grid. There werereliable ‘location pairs’ in the sequence: for example, ifthe dot first appeared in the top left, it always subse-quently appeared in the top middle. After the secondelement in a pair, the dot could appear in one of threelocations. Thus, within the continuous sequence, theTPs within a location pair were 1.0, and TPs spanningpairswere 0.33. Only the 11-month-olds distinguished be-tween sequences in which location pairs were preservedand sequences containing the unattested transitions.However, when each location in the habituation se-quence was occupied by a different object, 8-month-olds

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 10: This article was originally published in the Comprehensive

23913.3 LEARNING CORRELATIONAL STRUCTURE

Author's personal copy

distinguished between test sequences in which the loca-tion pairings were preserved and those in which theywere violated. Eye tracking data revealed that infantswere also faster to reach the second element in a locationpair, which was predictable given the previous element,than to initial elements, which were not predictable giventhe last object/location, providing additional evidencethat they had learned the spatiotemporal regularities.These results suggest that learning location-based co-occurrence relationships is challenging, developing be-tween 8 and 11 months of age, despite robust abilitiesto track purely temporal co-occurrence relationships thatdevelop many months earlier. However, infants can suc-ceed at younger ages when the sequential relationshipsare also marked by featural cues, or when the occurrenceof the object at one location predicts not just a subsequentlocation but also the identity of the object that will appearnext.

13.2.6 Neural Correlates of Learning SequentialStructure

Given the behavioral results amassed over the past de-cade, researchers have become extremely interested in theneurophysiological processes involved in tracking se-quential regularities. Cunillera et al. (2006, 2009) recordedERPsasadults listened toacontinuousstreamof3-syllablewordscuedbyTPs, similar to thematerialsusedbySaffranet al. (1996). They observed an increased negativity (anN400) occurring just after the onset of syllables in word-initial position appearing over the course of familiariza-tion with the stream. Abla et al. (2008) found a similarpattern when learners were presented with continuousstreams of ‘tone words,’ an effect that was more pro-nounced in participants who showed better discrimina-tion in a subsequent forced-choice test. This is similar tobehavioral studies by Saffran et al. (1999), which showedsuccessful segmentation of tone sequences using TPs.Sanders et al. (2002) measured ERPs to trisyllabic wordspresented in isolationbeforeandafterhearing thosewordspresented in continuous strings with TPs as the only cuestowordboundary locations.Despite thedifference in theirmethods, they also found that words evoked a greaterN400 after training, as well as an N100 at syllable onset.Moreover, ERP recordings of sleeping newborn infantswho were played similar streams of continuous syllablesshowanegativedeflectionof theERPwaveduring the firstsyllable of a word (Teinonen et al., 2009). These findingssuggest that there is a reliable neural signature related totracking TPs in the auditory domain.

Interestingly, similar ERP signatures have been ob-served in adults learning sequential co-occurrence rela-tionships in visually presented stimuli. For example,Abla andOkanoya (2009) found that triplet onsetswithin

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

a visually presented stream of shapes also evoked anN400. In other words, just as in the case of syllable andtone streams, the first element of a statistically definedunit within a stream of shapes evoked a distinctive re-sponse. There was also overlap in the areas in whichthe increased N400 was observed across these studies –specifically middle frontal and central sites – across bothauditory and visual modalities.

Together, the findings suggest that the N400 reflectssomething about learning reliable sequences across do-mains. In many studies of language processing, N400components reflect the occurrence of an unexpectedword, and thus its appearance at the first syllable of sta-tistically defined words could reflect the fact that its oc-currence was less predictable than the internal syllables.It could also reflect a search for the representation of thesequence as the first syllable is heard.

13.3 LEARNING CORRELATIONALSTRUCTURE

Another form of learning involves tracking correla-tions among properties of objects and events. Similar totracking a sequence of events such as syllables or objects,tracking correlational structure entails learning associa-tions, and thus there is potential overlap in the learningmechanisms involved. In the case of learning correla-tional structure, however, the relevant associations lie be-tween features or dimensions of an object or event (suchas an object or its label) rather than sequential compo-nents as they unfold over time. First, recent researchsuggesting that tracking correlations facilitates wordsegmentation and learning grammatical categories isdescribed.Research suggesting that sensitivity to correla-tional structureplays an important role in segmenting thevisualworld, aswell as in forming object categories (suchas learning to distinguish between cats and dogs or be-tween plates and spoons) is also discussed.

13.3.1 Sensitivity to Correlated Cues inLanguage Acquisition

In the previous section, evidence that tracking se-quential structure facilitates finding word boundariesin fluent speech (Aslin et al., 1998a,b; Saffran et al.,1996), as well as some evidence suggesting that sensitiv-ity to sequential TPsmay interactwithotherphonologicalregularities in word forms was reviewed. Specifically,Johnson and Tyler (2010) found that when word lengthis consistent, even 5.5-month-old infants can track TPsin continuous speech. However, when word length var-ies, infants have a harder time using TPs to segment thestream. This suggests that infants’ ability to use TPs as

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 11: This article was originally published in the Comprehensive

240 13. STATISTICAL LEARNING MECHANISMS IN INFANCY

Author's personal copy

a segmentation cue may be facilitated by the presence ofcorrelated cues such as regular word length.

Sahni et al. (2010) investigated whether sensitivity toTPs marking word boundaries can facilitate learningother correlated cues relevant to segmentation. They ex-posed 9-month-old infants to a continuous stream ofbisyllabic words in which TPs within words were 1.0and TPs across word boundaries were 0.25. Critically,all the ‘words’ beganwith /t/ (e.g., tohsigh, teemay, tiepu).The question of interest was whether infants could useTP information to learn enough about the speech streamto discover the /t/-initial cue. Infants successfully dis-criminated between novel words that had /t/ onsetsandwords in which /t/ occupied amedial location, sug-gesting they had learned that words beginwith /t/. Crit-ically, infants failed this same discrimination test whenthey had no prior familiarization to the speech stream,or when they were familiarized with a stream in whichevery other syllable began with /t/ but there were noTPs indicating word boundaries. These results suggestthat infants can learn correlations between word proper-ties to aid segmentation – in this case, using a known cue(TPs) to discover a novel correlated cue to word bound-aries (/t/-onsets).

Evidence that infants can learn sequential relation-ships between words by 12 months of age has alreadybeen reviewed (Gomez and Gerken, 1999; Saffran andWilson, 2003). However, a critical property of grammat-ical structure is that it pertains not just to how individualwords are combined (e.g., ‘the cat’ is grammatical, but‘cat the’ is not) but also to how words from differentgrammatical categories can be combined (e.g., deter-miners like ‘the’ and ‘a’ precede nouns rather than followthem). While tracking distributional information is use-ful for learning word-order patterns, learning grammat-ical categories with distributional cues alone can be verydifficult (Braine, 1987; Smith, 1969). Importantly, in nat-ural languages, words fromdifferent syntactic categoriesare distinguished both by distributional (Cartwright andBrent, 1997; Mintz, 2003; Mintz et al., 2002; Monaghanet al., 2005; Redington et al., 1998) and phonological cues(Farmer et al., 2006; Kelly, 1992; Monaghan et al., 2005).In other words, words from the same grammatical cate-gory share correlated distributional and phonologicalfeatures. For example, nouns tend to occur after both‘a’ and ‘the,’ and to have a strong–weak stress pattern.Critically, Braine (1987) found that the presence of corre-lated cues facilitates learning the category-level co-occurrence relationships in adults. Braine hypothesizedthat the presence of the additional cues reduces the com-putational demands involved in tracking a large numberof individual co-occurrence relationships between spe-cific words. Rather than tracking and accurately remem-bering each individual co-occurrence relationship, thepresence of correlated cues may allow learners to detect

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

the associations between the phonological featuresshared by words within a category.

To test whether infants can also use correlated cuesto form grammatical categories, Gerken et al. (2005) ex-posed 17-month-old English-learning infants to Russianwords drawn from two grammatical categories. InRussian, words have complex morphological structure,often consisting of a stem plus multiple grammaticalmorphemes. The familiarization set consisted of six fem-inine and six masculine words, and all of the words con-tained additional grammatical morphemes: femininewords ended in the case markers ‘oj’ and ‘u’ and mascu-line words ended in the case markings ‘ya’ and ‘em.’ Thecase markings provided distributional cues to the femi-nine and masculine categories. An additional phonolog-ical cue marking the category distinction was present onhalf of the words: three of the feminine words containeda derivational suffix ‘k,’ and three of the masculinewords contained the suffix ‘tel.’ Thus, in many femininewords, ‘k’ was followed by ‘oj’ and ‘us’ (e.g., ‘polku’ and‘polkoj’), while in many masculine words, ‘tel’ was fol-lowed by ‘ya’ and ‘em’ (e.g., ‘zhitelya’ and ‘zhitelyem’).Infants familiarizedwith thesewordswere subsequentlyable to distinguish between novel grammatical wordscontaining those relationships and ungrammatical ones:for example, even if they had not heard ‘zhitelyem,’ theywere able to distinguish it from the ungrammatical ‘zhi-telu.’ They were also able to distinguish between ‘vannoj’(grammatical combination) and ‘vannya’ or ‘vannyem’(ungrammatical combinations). In this case, they couldnot have been using the co-occurrence relationships be-tween the case markers and derivational morphemes(the ‘telya’ and ‘koj’ sequences), but were generalizingbased on distributional information.

Thus, whenword categories are marked by correlatedcues sharing phonological properties and distributionalcharacteristics, infants successfully learn and generalizethe category relationships. While 17-month-olds showevidence of generalizing based on distributional infor-mation, Gerken et al. (2005) found that 12-month-oldsfailed to do so. However, using a similar category-learning paradigm, Gomez and Lakusta (2004) foundthat 12-month-olds can learn correlations between distri-butional and phonological features. These findings areconsistent with the hypothesis that category learningmay initially involve tracking the correlations betweendistributional and phonological features of words.

13.3.2 Learning Correlational Structure in theVisual Domain

Infants’ visual environments are exceptionally com-plex. Nevertheless, elements that reliably co-occuracross time and space tend to belong to the same object

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 12: This article was originally published in the Comprehensive

24113.3 LEARNING CORRELATIONAL STRUCTURE

Author's personal copy

and thus provide information about object boundaries.Thus, just as in the case of segmenting words from con-tinuous speech, the co-occurrence of features in the vi-sual field is potentially a powerful cue that could beused to learn what clusters form objects. To test whetherinfants can track such conditional probabilities in com-plex visual displays, Fiser and Aslin (2002) showed9-month-old infants scenes containing three elements.In each scene, three discrete shapes were presentedsimultaneously in a 2�2 grid. Two of the elementsformed a ‘base pair’: those objects always occurred to-gether in a consistent spatial orientation (the probabilityof co-occurrence, or TP, was 1.0). The two elements ofeach base pair also occurred with a third element, butits position relative to the base pair varied (TP¼0.25).Infants were then tested on scenes containing twoelements – either base pair elements in their properconfiguration or two elements that did not form a basepair. Infants were able to discriminate between basepairs and nonbase pairs when they differed in the fre-quency with which they appeared during habituation,and also when they were equally frequent but differedin their TPs. This suggests that sensitivity to statisticalinformation facilitates object perception and raises thequestion of whether such learning can support objectperception in younger infants.

Category learning is another domain in which sensi-tivity to correlations between object attributes might fa-cilitate learning. Object categories (e.g., cups, dogs, birds,and trees) are structured such that properties character-istic of the category tend to co-occur within individualinstances of the category. For example, the presence ofone feature of a tree (branches, leaves, bark, trunk) is cor-related with the presence of the others, but less stronglycorrelated with the presence of features associated withobjects from other categories (e.g., ceramic handles, fur,and beaks).

Younger (1985) asked whether infants’ sensitivity tosuch correlations plays a role in category learning.Ten-month-old infants viewed drawings of a set of ani-mals that were composed of several continuously vary-ing features (i.e., leg length, tail width, neck length, andear separation). In the ‘broad’ condition, infants were ex-posed to a set of animals with features that were uni-formly distributed (e.g., animals with long necks hadboth long and short legs), forming one category with abroad distribution of feature values. In the ‘narrow’ con-dition, the feature values formed two correlated clusters:for example, long-necked animals all had short legs,while short-necked animals had long legs. In this condi-tion, the animals clustered together into two potentialcategories, each with a narrow range of features.

Infants were habituated to a set of these animals andthen tested with novel animals. One animal containedfeature values that were the average of all the animals

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

in the familiarization set for the broad condition, but fall-ing in between the average of the two narrow categories.The second animal contained a set of features that werecloser to the average of one of the categories in the nar-row condition, but farther from the average of the broadcategory. Infants in the narrow condition dishabituatedto the broad stimulus, while infants in the broad condi-tion dishabituated to the test animal closer to the averageof one of the narrow categories. This suggests that whenfeature values formed two clusters, infants formed cate-gories corresponding to correlations between values onthese dimensions. However, when feature values wererandomly distributed, infants did not group the animalsinto separate categories.

In similar studies, Younger and Cohen (1986) foundthat 7-month-olds, but not 4-month-olds, are sensitiveto correlations among object features. However,Mareschal et al. (2005) noted that because 4-month-oldinfants appear to have learned some perceptually de-fined categories, for example, the difference betweensquares and diamonds (Bomba and Siqueland, 1983),and cats and dogs (Eimas and Quinn, 1994), the studiesof Younger and colleagues may have underestimated in-fants’ categorization abilities. They hypothesized that foryoung infants with limited memory, longer exposures toindividual animalsmaymake it more difficult to keep allthe animals in memory. In particular, if infants can onlyretain information about one animal at a time, theywould be unable to detect the correlations between ob-ject features across a set of animals. Thus, they modifiedthe habituation phase such that the individual animalswere presented for briefer durations. Under these condi-tions, 4-month-olds showed evidence of forming catego-ries based on correlations between features.

These findings suggest that tracking correlated cuesplays an important role in learning perceptually basedobject categories. There is also a potentially importantconnection between infants’ ability to track correlationsbetween features and their sensitivity to distributionalinformation. Specifically, studies of distributional learn-ing, such as Maye et al. (2002, 2008), suggest that infantscan use the frequencies across a single dimension of acomplex stimulus to form groupings. In these studies, in-fants show evidence of tracking correlations among thevalues of several distributions.

Quinn et al. (2006) began to investigate the neuralunderpinnings of categorization in infants. In particular,6-month-old infants were presented with a set of imagesof cats and then tested on their looking behavior towarda set of pictures of dogs interspersed with novel cat pic-tures while ERPs were recorded. When tested, infantslooked longer to the novel dogs than to the novel cats,suggesting they treated the within-category images(the novel cats) as more similar to the images from famil-iarization, despite the fact that all images were novel.

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 13: This article was originally published in the Comprehensive

242 13. STATISTICAL LEARNING MECHANISMS IN INFANCY

Author's personal copy

The ERPs to the pictures of cats and dogs also suggestedthat infants were sensitive to category information. Inparticular, the ERPs to the cats viewed early on duringfamiliarization, as well as to the set of novel dog images,showed a negative slowwave (or a negative deflection ofthe ERP wave) between 1 and 1.5 s after the pictureappeared. The decrease of these components to cats dur-ing the latter part of familiarization indicates that infantswere beginning to respond to novel cats as though theywere familiar. In other words, infants were respondingto category-level information, not just item-level fea-tures. In addition, the dog pictures, but not the novelcat pictures, elicited a negative central component be-tween 300 and 750, a component thought to reflect atten-tional allocation to novel stimuli in infants (Nelson,1994). Quinn et al. suggest that this component may berelated to the behavioral differences in looking to thedog and cat pictures, specifically the novelty preferencefor pictures of dogs observed after infants have beenviewing only pictures of cats. Noting that the negativeslow wave indexing categorization occurs after thecomponent reflecting novelty detection, they suggestthat recognizing similarity between within-categoryobjects is a more complex process than recognition ofnovelty.

Using a similar procedure, Grossmann et al. (2009)also investigated the ERP signatures involved in infantcategorization. The ERP recordings revealed a negativecomponent at 300–500 ms in anterior cortical regionswhen infants were shown an image from a novelcategory, similar to the findings of Quinn et al.(2006). Grossmann et al. (2009) also found that novelexemplars of the familiarized category evoked a latepositive component relative to the response to repeateditems from the category. The authors suggest that thiscomponent may reflect updating of the categoryrepresentation.

In sum, infants can track correlated properties of ele-ments by the time they are 4 months of age. This abilityplays a role in object perception, both in object segmen-tation and categorization, as well as in forming gram-matical categories in the auditory domain. The datafrom the ERP recordings add to our understanding ofthe behavioral findings by showing real-time learning,helping to isolate different aspects of the process(responding to familiarity, novelty, and memory updat-ing), and by shedding light on the neural processes un-derlying discrimination observed at test. In futurestudies, it will be important to test the neural correlatesof learning novel categories, as current evidence pertainsto categories that infants are familiar with prior to theexperiment. In addition, it will be interesting to testwhether there are parallels between the neural correlatesobserved in the visual domain and in grammatical cate-gory formation, as well as whether the signatures of

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

category learning differ from those of learning sequen-tial structure.

13.4 LEARNING ASSOCIATIONSBETWEEN WORDS AND REFERENTS

A central problem facing infant language learners isthat of forming associations between words and theirreferents. One possibility is that this process begins nodifferently than the process of learning correlated objectproperties, as described in the previous section. Indeed,one might think of a label as simply another feature thatis shared by objects within a category (Sloutsky andRobinson, 2008). Others have suggested that words arenot simply a feature like any other but that they have aprivileged role because they are inherently symbolic(Waxman and Gelman, 2009). On this view, words havea different relation to objects than do other features ofobjects, such as their appearance, from the very begin-ning of word learning. In either case, it is clear that learn-ing the associations between words and their referents isa highly complex process, requiring a powerful butselective associative learning mechanism. Even the oc-currence of a word with a very concrete and observablemeaning, such as ‘rabbit,’ will coincidewithmany objectsand events in the environment, sometimes, but not al-ways, including an actual rabbit (e.g., Quine, 1960).Thus, establishing the referent of a novel word poses aformidable challenge. Even if the infant is fixating on arabbit, she still must determine that the label refers tothe animal itself, as opposed to its floppy ears, or to itscolor. Thus, the challenge facing the infant is to forman association between the word and some aspect ofthe environment that contains the referent, but that isneither overly broad nor too narrow.

While the demands of learning words are substantial,recent research suggests that infants’ ability to form as-sociations is both remarkably powerful and highly selec-tive. Infants’ sensitivity to the reliability of co-occurrencerelationships between words and particular aspects ofthe environment may facilitate learning in such complexenvironments. Smith and Yu (2008) tested whetherinfants in the early stages of word learning can capitalizeon these relationships using a cross-situational learningparadigm. They presented 12- and 14-month-old infantswith six words whose referents were embedded in com-plex scenes. On each trial, infants heard a label (e.g.,‘bosa’) while viewing a scene consisting of the referentalong with a distractor object. Given just a single trialof this nature, infants would be unable to determinewhich object was the referent. However, on other trialsin which ‘bosa’ was presented, one of the same objectswas presented along with a different distractor. On othertrials, the label ‘manu’ was presented, and the object that

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 14: This article was originally published in the Comprehensive

24313.4 LEARNING ASSOCIATIONS BETWEEN WORDS AND REFERENTS

Author's personal copy

consistently occurred with it served as a distractor onother trials. Thus across trials, each label consistently oc-curred with one object. After infants were familiarizedwith the label–object training trials, they were testedusing a preferential looking procedure. On each trial, in-fants were shown two objects simultaneously while thelabel for one of themwas repeated. Infants looked signif-icantly longer toward the object that matched the label atboth ages, with stronger effects for the 14-month-oldsthan for the 12-month-olds.

These findings suggest that infants are sensitive to thereliability of label–object pairings across occurrences andcan use information gathered across trials to settle on themost probable referent. However, it is important to notethat words often occur in the absence of their referentand vice versa. Voloumanos and Werker (2009) thustestedwhether infants learn label–object associations un-der more stochastic conditions. They presented 18-month-old infants with three word–object pairings, witheach object labeled 10 times. One of the objects was la-beled by the same word all 10 times, but the other twoobjects were labeled stochastically: most of the time, theyoccurred with one label (8 times), but on two occur-rences, they were paired with a label that occurred pre-dominantly with the other object. Infants were thentested on how well they learned the associations usinga preferential looking task. The results suggest that in-fants were able to find the referent of words labeled sto-chastically: performance for word labeled 10 times or8 times was the same when the correct referent waspaired with a distractor object that had never occurredwith the label. However, when a word labeled 8 timeswas tested with the correct referent (the object it hadco-occurred with 8 times) paired with a distractor objectthat it had co-occurred with 2 times, infants’ perfor-mance was at chance. Thus, experiencing the distractorobject paired with the label just twice over the courseof training disrupted recognition of the more frequentword/object pairing. Together, these findings suggestthat probabilistic co-occurrences between labels and ob-jects are harder for infants to learn than deterministicpairings, but that infants nonetheless develop some sen-sitivity to the association.

Recent work has shed light on the neurophysiologicalcorrelates of detecting reliable word–object associationsduring infancy. Friedrich and Friederici (2008) recordedERPs while 14-month-olds were presented with twonovel nonsensewords, each occurring 8 times. Onewordwas always presented with the same novel object (con-sistent referent), while the other was presented with anew object on each occurrence (inconsistent referent).After the first four exposures to each word, they foundan early fronto-laterally localized negative component(N200-500) that was greater for the word with a consis-tent referent than for the word that had inconsistent

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

referents. Over the course of the second set of four expo-sures, this component diminished, and the words withconsistent referents began to evoke an N400 in parietalregions. Infants were also tested on their memory forthe consistent pairings 24 h later. On these trials, thosewords were paired with either the correct or incorrectreferent. A larger N400 was observed when the wordswere presented with an incorrect referent.

The authors suggest that the N200-500 reflects early-occurring associative processes that begin to link to-gether the auditory and visual features of the consistentevents, and that the N400 reflects semantic integration: astronger encoding of and memory for the word-referentrelation. This interpretation is consistent with thefinding that when 14-month-olds are presented withfamiliar words from their own language, there is a largerN400 to words presented with incongruous labels thancongruous ones (Friedrich and Friederici, 2005). Takentogether, these findings suggest that infants are sensitiveto consistent co-occurrence relationships between wordsand objects, and that distinctive neural processes mayunderlie the formation of these referential associations.

13.4.1 Sensitivity to the Statistics of ObjectsTaking the Same Label

Another way infants’ associations become more re-fined is in terms of the kinds of referents they associatewith novel words. While infants are initially flexible inthe kinds of referents they consider, recent studies sug-gest that this process rapidly becomes constrained byprior learning. Smith and colleagues (Colunga andSmith, 2003, 2005; Jones and Smith, 2002; Smith et al.,2002) suggest while learning novel words may initiallyrequire repeated associations between objects and labels,once a critical mass of labels has been acquired, they de-tect statistical regularities that characterize those associa-tions. For example, object labels such as ‘cup,’ ‘ball,’ and‘phone’ refer to a set of objects that have a common shape,and many of the words in children’s early vocabulariesare objects that are organized by shape (Samuelson andSmith, 1999). However, as children learn more words, ahigher-order abstraction emerges from these specific as-sociations: object labels tend to pick out groups of thingswith similar shapes. Such a generalization would allowchildren to extend object labels on the basis of shape asopposed to other features, such as color or texture. As aresult, upon hearing the label ‘crayon’ referring to a redcrayon, they can use it to refer to new crayon-shapedobjects regardless of their color.

Based on these considerations, Smith et al. (2002)hypothesized that teaching children words fromshape-based categories should result in increasedword learning. They tested this hypothesis by bringing

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 15: This article was originally published in the Comprehensive

244 13. STATISTICAL LEARNING MECHANISMS IN INFANCY

Author's personal copy

17-month-old infants, who do not yet show a systematicshape bias, into the lab several times over 2 months, andteaching them names for novel objects. In one condition,the object categories were shape based: objects given thesame names had similar shapes. In a control condition,objects that shared names did not have shape in com-mon, but rather had some other property, such as theirmaterial, in common. Infants taught shape-based catego-ries extended trained words to novel items based onshape, while infants in the nonshape-based conditions,or given no training, did not. Moreover, infants in theshape-based category training rapidly extended novellabels based on shape as well and showed a greater in-crease in their overall vocabularies over the 2-monthtraining period than infants in the other conditions.These findings suggest that infants’ experience withwell-structured label–object pairings facilitates the rapidformation of new label–object associations, and theappropriate extension of those labels. Critically, this pro-cess results in rapid gains in vocabulary size and word-learning skill.

Just as increases in vocabulary size lead to more effi-cient word learning (Smith et al., 2002), the neurophysi-ological processes underlyingword learning also appearto change as infants’ vocabulary size increases.Torkildsen et al. (2008b) investigated the ERP signaturesof 20-month-old infants in a novel word-learning task asa function of whether infants had fewer or more than 75words in their productive vocabularies, a period thattypically precedes a period of rapid vocabulary growth.In this task, infants were exposed to training blocks inwhich they viewed three familiar pictures, each labeledby the correct referent. They also saw three pictures ofimaginary animals, each paired with a nonsense word.After eachword had been presentedwith its referent fivetimes, the pictures were presented with incongruous la-bels. All infants showed a larger N400 to the pairings ofreal words with incongruous objects at central and pari-etal recording sites, suggesting that the infants recog-nized the mismatch between label and referent. For thenewly trained words, only the high-vocabulary infantsshowed an N400 when they were paired with an incon-gruent referent. This suggests that the low-vocabularyinfants either did not learn the words or were able toform only a very weak association. The high-vocabularyinfants showed more robust evidence of learning thenovel mappings, though the N400 response was morebroadly distributed than for the familiar words, suggest-ing that their sensitivity to the novel words differed fromthat to more established word–referent pairings.

Torkildsen et al. (2008a) also examined the ERPs dur-ing the five training trials. Consistent with the fact thatonly high-vocabulary infants showed anN400 indicativeof successful learning at test, they found that only thehigh-vocabulary infants showed an increase in an early

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

negative component (N200-400) as the novel words re-peated during training. This component has previouslybeen associated with the early stages of learning themeaning of a word (see Friedrich and Friederici(2008)). While these findings do not suggest a neuralmechanism by which infants narrow in on specific fea-tures as they gain skill in word learning, they do confirmthat the process becomesmore efficient as infants’ vocab-ulary grows.

13.4.2 Sensitivity to the Internal Statisticsof Labels in Word Learning

Additional constraints on the associations that infantsare likely to form during word learning come from theproperties of potential labels themselves. Graf Esteset al. (2007) tested the hypothesis that experience withstatistical cues to word boundaries influences infants’ability to learn word–object pairings. In this study, 16-month-old infants were first familiarized with a streamof syllables in which the only cues to word boundarieswere the TPs between adjacent syllables. Infants werethen habituated to two label–object pairs. For some in-fants, the labels corresponded to the words (TP ¼ 1.0),and for the other infants, the labelswere equally frequentpart-words (TP¼ 0.33) from the speech stream. After ha-bituation to the label–object pairings, infants were testedusing a Same-Switch procedure (Werker et al., 1998). TheSame trials preserved the label–object pairings presentduring habituation, while the label–object pairings wereswitched on Switch trials. Learning is measured by thedegree of increased looking to Switch trials, which is ev-idence of dishabituation. Only infants for whom the la-bels corresponded to words showed longer looking tothe Switch trials, suggesting that learning words com-posed of syllables with high TPs is easier than learninglabels that had internal statistics suggesting that the syl-lables did not cohere into a word. Likewise, Graf Esteset al. (2010) found that infants learn word–object associ-ations better when the labels conform to the predomi-nant phonotactic patterns of the native language.Experience with regularities in fluent speech may thusfacilitate word learning by providing infants with candi-date sound sequences to map to referents, and by pro-moting more accurate processing of those soundsequences, facilitating subsequent recall.

Data from ERP recordings in word-learning tasks alsosuggest that semantic processing is influenced by pho-notactic information. As previously discussed, between12 and 14 months of age, infants show a larger N400when familiar words are presented with incongruousreferents. Friedrich and Friederici (2005) found thatboth 12- and 19-month-old infants showed an earlynegative component that differed between real and

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 16: This article was originally published in the Comprehensive

24513.5 CONCLUSIONS

Author's personal copy

phonotactically legal nonsense words and phonotacti-cally illegal nonsense words presented without refer-ents, which likely reflects sensitivity to the relativefamiliarity of the words. At 19 months, infants also showan N400 response when novel words that are phonotac-tically legal in their language are presented with objectsthat have known labels, suggesting they recognize thelabel–object mismatch. However, phonotactically illegalwords did not elicit an N400, suggesting that they maynot have been perceived as potential labels. These find-ings are consistent with the hypothesis that the N400 re-sponse reflects an advance or change in word-learningskill, and specifically that phonotactic information playsan important role by the end of the second year.

Another potential source of information guidingword learning is the overlap between words’ statisticaland semantic properties at the level of grammatical cat-egories. Words from different lexical categories are cor-related with different semantic regularities (e.g., nounstend to refer to objects and people, adjectives to proper-ties such as color or texture, and verbs to actions orevents). As previously discussed, words from differentlexical categories can also be distinguished by a constel-lation of statistical cues, including distributional andphonological regularities (e.g., Christiansen et al., 2009;Kelly, 1992; Mintz et al., 2002; Monaghan et al., 2005).

A recent study tested whether infants can capitalizeon experience with such cues in a word-learning task(Lany and Saffran, 2010). In this study, 22-month-old in-fants first listened to an artificial language that containedtwo word categories, disyllabic X-words and monosyl-labic Y-words. For infants in the experimental group,phrases took the form aX and bY, and thus words fromthe X and Y categories were reliably marked by corre-lated distributional and phonological cues. Infants inthe control group also heard aX and bY phrases, but inaddition, they heard an equal number of aY and bXphrases. Thus, for the control infants, the word catego-ries were not marked by correlated cues. Infants werethen trained on pairings between phrases from the lan-guage and pictures of unfamiliar animals and vehicles.For both experimental and control groups, familiar aXphrases were paired with animal pictures, and familiarbY phrases were paired with vehicle pictures. Infantswere then tested using a preferential looking procedure.

Interestingly, only the experimental infants were ableto learn the trained associations between phrases andpictures, despite the fact that control infants had thesame amount of experience with them. Moreover, onlythe experimental infants successfully generalized tonovel pairings: when hearing a word with distributionaland phonological properties of other words referring toanimals, they mapped the word to a novel animal over anovel vehicle. These findings suggest that infants’ expe-rience with statistical cues marking word categories lays

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

an important foundation for learning the meanings ofthose words.

In sum, infants’ ability to form associations is remark-ably powerful, but it is also selective. Infants do not sim-ply form an association between a word and object thathappen to co-occur, but rather they form an associationonly when that object reliably occurs across other pre-sentations of that word. And, while infants are initiallyrelatively flexible in the kinds of associations they formbetweenwords and referents, recent studies suggest thatthis process rapidly becomes constrained by prior learn-ing. For example, by 17 months of age, words with goodsequential statistics are more readily associated withreferents than sound sequences that do not have thecharacteristics of typical words in that language (GrafEstes et al., 2007). Moreover, once infants have begunto learn associations between words and referents, thisknowledge influences the associations that infants willsubsequently form. For example, infants rapidly beginto associate novel nouns with objects’ shape over objects’color (e.g., Smith et al., 2002). Likewise, words that havestatistical properties of a particular category are readilymapped to new referents from that category (Lany andSaffran, 2010).

13.5 CONCLUSIONS

The research reviewed in this chapter suggests that in-fants’ sensitivity to statistical structure is powerful andnuanced and plays a role in diverse aspects of learningover the first several months of life. We have discussedinfants’ sensitivity to four different kinds of statisticalstructure: probability distributions, sequential structure,correlations between stimulus dimensions, and associa-tions between different forms of information. Whilewe have discussed these types of structure separately,these are likely not independent learning processes.For example, there are important similarities betweenlearning sequential structure in word segmentationand in learning correlational structure in object segmen-tation. Tracking sequential structure within words mayshare important properties with learning sequential reg-ularities between words and word categories. Likewise,the processes involved in learning word–label associa-tions may be similar to those in learning feature co-occurrences in object categorization. Furthermore,real-world learning processes tap a combination of theselearning mechanisms – for example, learning word–object associations critically relies on a complex interac-tion of sensitivity to phonotactic regularities, wordsegmentation, object categorization, word-order pat-terns, and grammatical categories.

A related point is that statistical learning mechanismsare tuned through experience. For example, in the

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 17: This article was originally published in the Comprehensive

246 13. STATISTICAL LEARNING MECHANISMS IN INFANCY

Author's personal copy

auditory domain, infants’ sensitivity to reliable TPs be-tween syllables facilitates segmenting words from fluentspeech. In turn, this makes it possible to learn phonolog-ical regularities correlated with TPs, which may furtherfacilitate segmentation in natural speech (Sahni et al.,2010). Likewise, segmentation facilitates learningword-order patterns, and tracking correlations betweensuch distributional properties of words and their phono-logical properties can facilitate learning grammaticalstructure. In the domain of visual perception, trackingthe co-occurrence of features may facilitate segmentingobjects within complex scenes (Fiser and Aslin, 2002),and tracking how these features co-occur across in-stances facilitates the formation of object categories(Younger, 1985). In other words, as learners track simplestructure, they build a foundation for tracking morecomplex, higher-order structure. They also appear tohighlight relevant structure, which can facilitate learningmore computationally challenging regularities (Lanyand Gomez, 2008).

These studies also raise many intriguing questionsfor future research. For example, many studies manip-ulate the statistical properties of infants’ input andshow that this influences learning, but this does notsuggest that infants are tracking input statistics verid-ically. Indeed, studies with older children suggest thatthey, unlike adults, tend to distort input statistics,maximizing probabilities rather than matching them(e.g., Hudson Kam and Newport, 2005). It also leadsto questions of how infants encode these statisticalregularities and how fine-grained their sensitivity is.Also, learning inherently entails perceiving, remem-bering, acting on the environment, and interactingwith others, and thus in future research, it will be im-portant to relate infants’ statistical learning abilities toother developmental processes, such as perception,memory, and social interaction.

Another important area for future research is theunderlying neurophysiology of statistical learning. Re-searchers have just begun to investigate the neurophysio-logical substrates of the behavioral findings, namely, howERPs to stimuli consistent with familiarized patterns dif-fer from ERPs to stimuli that are inconsistent. Given thechallenges inherent in ERP research in infants, much ofthis work has tested learning in adults. However, recentadvances in neurophysiological methodology in infantssuggest that these techniques hold much promise for in-vestigating learning mechanisms directly. Indeed, whilethe discrimination tasks typically used in infant behav-ioral studies can only probe the endpoint (or other staticsnapshot) of the learning process, psychophysiologicaltechniques are particularly well suited to observing theprocess of learning.

In sum, infants’ environments are rich with meaning-ful statistical structure, and infants readily track it. Far

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

from being limited to simple associative learning, suchas the stimulus–response learning of classical learningtheory, the studies reviewed in this chapter reveal a sen-sitivity to statistical structure that is powerful and nu-anced, capable of not only tracking fine-grained detailbut also forming generalizations. Advances in the studyof statistical learning mechanisms have shed new lighton many aspects of development, such as auditory andvisual perception, language development, and eventprocessing. Continued study of infants’ statistical learn-ing mechanisms holds great promise for advancing thestudy of early cognitive development.

Acknowledgments

Preparation of this chapter was supported by National Institute ofChild Health and Human Development Grants F32 HD057698 toJ Lany and R01HD37466 to JR Saffran, by a James S. McDonnell Foun-dation Scholar Award to JR Saffran. The content is solely the responsi-bility of the authors and does not necessarily represent the officialviews of the Eunice Kennedy Shriver National Institute of ChildHealthand Human Development or the National Institutes of Health. Theauthors thank Jessica Payne for helpful comments on a previous draft.

References

Abla, D., Okanoya, K., 2009. Visual statistical learning of shape se-quences: An ERP study. Neuroscience Research 64, 185–190.

Abla, D., Katahira, K., Okanoya, K., 2008. Online assessment of statis-tical learning by event-related potentials. Journal of Cognitive Neu-roscience 20, 952–964.

Aslin, R.N., Pisoni, D.B., Hennessy, B.I., Perey, A.J., 1981. Discrimina-tion of voice onset time by human infants: New findings and impli-cations for the effects of early experience. Child Development 52,1135–1145.

Aslin,R.N., Jusczyk,P.W.,Pisoni,D.B., 1998a.Speechandauditoryproces-sing during infancy: Constraints on and precursors to language. In:Kuhn,D., Siegler,R. (Eds.),HandbookofChildPsychology:Cognition,Perception, and Language, vol. 2. Wiley, New York, pp. 147–254.

Aslin, R.N., Saffran, J.R., Newport, E.L., 1998b. Computation of condi-tional probability statistics by 8-month-old infants. PsychologicalScience 9, 321–324.

Baldwin, D., Andersson, A., Saffran, J.R., Meyer, M., 2008. Segmentingdynamic human action via statistical structure. Cognition 106,1382–1407.

Bomba, P.C., Siqueland, E.R., 1983. The nature and structure of infantform categories. Journal of Experimental Child Psychology 35,609–636.

Braine,M.D.S., 1987.What is learned in acquiringwords classes: A steptoward acquisition theory. In: MacWhinney, B. (Ed.), Mechanismsof Language Acquisition. Erlbaum, Hillsdale, NJ, pp. 65–87.

Burnham, D., Kitamura, C., Vollmer-Conna, U., 2002. What’s newpussycat? On talking to babies and animals. Science 296, 1435.

Cartwright, T.A., Brent, M.R., 1997. Syntactic categorization in earlylanguage acquisition: Formalizing the role of distributional analy-sis. Cognition 63, 121–170.

Chambers, K.E., Onishi, K.H., Fisher, C., 2003. Infants learn phonotacticregularities form brief auditory experience. Cognition 87, B69–B77.

Cheour, M., Ceponiene, R., Lehtokoski, A., et al., 1998. Development oflanguage-specific phoneme representations in the infant brain. Na-ture Neuroscience 1, 351–353.

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 18: This article was originally published in the Comprehensive

24713.5 CONCLUSIONS

Author's personal copy

Christiansen, M.H., Onnis, L., Hockema, S.A., 2009. The secret is in thesound: From unsegmented speech to lexical categories. Develop-mental Science 12, 388–395.

Colunga, E., Smith, L.B., 2003. The emergence of abstract ideas: Evi-dence from networks and babies. Philosophical Transactions bythe Royal Society 358, 1205–1214.

Colunga, E., Smith, L.B., 2005. From the Lexicon to expectations aboutkinds: A role for associative learning. Psychological Review 112,347–382.

Cunillera, T., Toro, J.M., Sebastian-Galles, N., Rodriguez-Fornells, A.,2006. The effects of stress and statistical cues on continuous speechsegmentation: An event-related brain potential study. Brain Re-search 1123, 168–178.

Cunillera, T., Camara, E., Toro, J.M., et al., 2009. Time course and func-tional neuroanatomy of speech segmentation in adults. Neuro-Image 48, 541–553.

Eimas, P.D., Quinn, P.C., 1994. Studies on the formation of perceptuallybased basic-level categories in young infants. Child Development65, 903–917.

Eimas, P.D., Siqueland, E.R., Jusczyk, P., Vigorito, J., 1971. Speech per-ception in infants. Science 171, 303–306.

Fagan, J.F., 1970. Memory in the infant. Journal of Experimental ChildPsychology 9, 217–226.

Fantz, R.L., 1964. Visual experience in infants: Decreased attention tofamiliar patterns relative to novel ones. Science 146, 668–670.

Farmer, T.H., Christiansen, M.H., Monaghan, P., 2006. Phonologicaltypicality influences on-line sentence comprehension. Proceedingsof theNational Academy of Sciences of theUnited States of America103, 12203–12208.

Fiser, J., Aslin, R.N., 2002. Statistical learning of new visualfeature combinations by infants. Proceedings of the National Acad-emy of Sciences of the United States of America 99, 15822–15826.

Friederici, A.F., Wessels, J.M.I., 1993. Phonotactic knowledge of wordboundaries and its use in infant speech perception. Perception &Psychophysics 54, 287–295.

Friedrich, M., Friederici, A.D., 2005. Phonotactic knowledge andlexical–semantic processing in one-year-olds: Brain responses towords and nonsensewords in picture contexts. Journal of CognitiveNeuroscience 17, 1785–1802.

Friedrich, M., Friederici, A.D., 2008. Neurophysiological correlates ofonline word learning in 14-month-old infants. Cognitive Neurosci-ence and Neuropsychology 19, 1757–1761.

Gerken, L.A., Wilson, R., Lewis, W., 2005. Seventeen-month-olds canuse distributional cues to form syntactic categories. Journal of ChildLanguage 32, 249–268.

Gomez, R.L., 2002. Variability and detection of invariant structure. Psy-chological Science 13, 431–436.

Gomez, R.L., Gerken, L.A., 1999. Artificial grammar learning by one-year-olds leads to specific and abstract knowledge. Cognition 70,109–135.

Gomez, R.L., Lakusta, L., 2004. A first step in form-based category ab-straction in 12-month-old infants. Developmental Science 7, 567–580.

Gomez, R.L., Maye, J., 2005. The developmental trajectory of nonadja-cent dependency learning. Infancy 7, 183–206.

Graf Estes, K., Evans, J.L., Alibali, M.W., Saffran, J.R., 2007. Can infantsmapmeaning to newly segmentedwords? Psychological Science 18,254–260.

Graf Estes, K., Edwards, J., Saffran, J.R., 2010. Phonotactic constraintson infant word learning. Infancy 16, 180–197.

Grossmann, T., Gliga, T., Johnson, M.H., Mareschal, D., 2009. The neu-ral basis of perceptual category learning in human infants. Journalof Cognitive Neuroscience 21, 2276–2286.

Hudson Kam, C.L., Newport, E.L., 2005. Regularizing unpredictablevariation: The roles of adult and child learners in language forma-tion and change. Language Learning and Development 1, 151–195.

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

James, W., 1890. The Principles of Psychology. Holt, New York.Johnson, E.K., Jusczyk, P.W., 2001. Word segmentation by 8-month-

olds: When speech cues count more than statistics. Journal of Mem-ory and Language 44, 548–567.

Johnson, E.K., Tyler, M.D., 2010. Testing the limits of statistical learningfor word segmentation. Developmental Science 13, 339–345.

Jones, S.S., Smith, L.B., 2002. How children know the relevant propertiesfor generalizing object names. Developmental Science 5, 219–232.

Jusczyk, P.W., Friederici, A.D., Wessels, J.M., Svenkerud, V.Y.,Jusczyk, A.M., 1993. Infants’ sensitivity to the sound patterns of na-tive languagewords. Journal ofMemory and Language 32, 402–420.

Jusczyk, P.W., Luce, P.A., Charles-Luce, J., 1994. Infants’ sensitivity tophonotactic patterns in their native language. Journal of Memoryand Language 33, 630–645.

Kellmann, P.J., Arterberry, M.E., 2000. The Cradle of Knowledge. TheMIT Press, Cambridge, MA.

Kelly, M.H., 1992. Using sound to solve syntactic problems: The role ofphonology in grammatical category assignments. Psychological Re-view 99, 349–364.

Kirkham, N.Z., Slemmer, J.A., Johnson, S.P., 2002. Visual statisticallearning in infancy: Evidence for a domain general learning mech-anism. Cognition 83, B35–B42.

Kirkham, N.Z., Slemmer, J.A., Richardson, D.C., Johnson, S.P., 2007.Location, location, location: Development of spatiotemporal se-quence learning in infancy. Child Development 78, 1559–1571.

Kluender, K.R., Lotto, A.J., Holt, L.L., Bloedel, S.L., 1998. Role of expe-rience for language-specific functional mappings of vowel sounds.Journal of the Acoustical Society of America 104, 3568–3582.

Kuhl, P.K., 2004. Early language acquisition: Cracking the speech code.Nature Reviews Neuroscience 5, 831–843.

Kuhl, P.K., Williams, K.A., Lacerda, F., Stevens, K.N., Lindblom, B.,1992. Linguistic experience alters phonetic perception in infantsby 6 months of age. Science 255, 606–608.

Kuhl, P.K., Andruski, J.E., Chistovich, I.A., et al., 1997. Cross-languageanalysis of phonetic units in language addressed to infants. Science277, 684–686.

Kuhl, P.K., Tsao, F.M., Liu, H.M., 2003. Foreign-language experience ininfancy: Effects of short-term exposure and social interaction onphonetic learning. Proceedings of theNationalAcademyof Sciencesof the United States of America 100, 9096–9101.

Kuhl, P.K., Stevents, E., Hayashi, A., Deguchi, T., Kiritani, S.,Iverson, P., 2006. Infants show a facilitation effect for native lan-guage phonetic perception between 6 and 12 months. Developmen-tal Science 9, F13–F21.

Lany, J., Gomez, R.L., 2008. Twelve-month-old infants benefit frompriorexperience in statistical learning. Psychological Science 19, 1247–1252.

Lany, J., Saffran, J.R., 2010. From statistics to meaning: Infant acquisi-tion of lexical categories. Psychological Science 21, 284–291.

Lisker, L., Abramson, A.S., 1964. A cross-language study of voicing ininitial stops: Acoustical measurements. Word 20, 384–482.

Liu, H.M., Kuhl, P.K., Tsao, F.M., 2003. An association betweenmothers’ speech clarity and infants’ speech discrimination skills.Developmental Science 6, F1–F10.

Marcovitch, S., Lewkowicz, D.L., 2009. Sequence learning in infancy:The independent contributions of conditional probability and pairfrequency information. Developmental Science 12, 1020–1025.

Mareschal, D., Powell, D., Westermann, G., Volein, A., 2005. Evidenceof rapid correlation-based perceptual category learning by 4-month-olds. Infant and Child Development 14, 445–457.

Maye, J., Werker, J.F., Gerken, L., 2002. Infant sensitivity to distribu-tional information can affect phonetic discrimination. Cognition82, B101–B111.

Maye, J., Weiss, D.J., Aslin, R.N., 2008. Statistical phonetic learning ininfants: Facilitation and feature generalization. DevelopmentalScience 11, 122–134.

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248

Page 19: This article was originally published in the Comprehensive

248 13. STATISTICAL LEARNING MECHANISMS IN INFANCY

Author's personal copy

McMurray, B., Aslin, R.N., Toscano, J.C., 2009. Statistical learning ofphonetic categories: Insights from a computational approach. De-velopmental Science 12, 369–378.

Mintz, T.H., 2003. Frequent frames as a cue for grammatical categoriesin child directed speech. Cognition 90, 91–117.

Mintz, T.H., Newport, E.L., Bever, T.G., 2002. The distributional struc-ture of grammatical categories in speech to young children. Cogni-tive Science 26, 393–424.

Monaghan, P., Chater, N., Christiansen, M.H., 2005. The differentialrole of phonological and distributional cues in grammatical catego-rization. Cognition 96, 143–182.

Nelson, C.A., 1994. Neural correlates of recognitionmemory in the firstpostnatal year of life. In: Dawson, G., Fischer, K. (Eds.), Human Be-havior and the Developing Brain. Guilford Press, New York,pp. 269–313.

Olson, G.M., Sherman, T., 1983. Attention, learning and memory in in-fants. In: Haith, M.M., Campos, J.J. (Eds.), Handbook of Child Psy-chology. Infancy and Developmental Psychobiology, vol. 2. Wiley,New York, pp. 1001–1080.

Pelucchi, B., Hay, J.F., Saffran, J.R., 2009a. Learning in reverse: Eight-month-old infants track backwards transitional probabilities. Cog-nition 11, 244–247.

Pelucchi, B., Hay, J.F., Saffran, J.R., 2009b. Statistical learning in a nat-ural language by 8-month-old infants. Child Development 80,674–685.

Quine, W.V.O., 1960. Word and Object. MIT Press, Cambridge, MA.Quinn, P.C., Westerlund, A., Nelson, C.A., 2006. Neural markers of cat-

egorization in 6-month-old infants. Psychological Science 17, 59–66.Redington,M., Chater, N., Finch, S., 1998. Distributional information: A

powerful cue for acquiring syntactic categories. Cognitive Science22, 425–469.

Rivera-Gaxiola, M., Klarman, L., Garcia-Sierra, A., Kuhl, P.K., 2005a.Neural patterns to speech and vocabulary growth in American in-fants. NeuroReport 16, 495–498.

Rivera-Gaxiola, M., Silva-Pereyra, J., Kuhl, P.K., 2005b. Brain potentialsto native and non-native speech contrasts in 7- and 11-month-oldAmerican infants. Developmental Science 8, 162–172.

Rovee-Collier, C., 1986. The rise and fall of infant classical conditioningresearch: Its promise for the study of early development. Advancesin Infancy Research 4, 139–159.

Saffran, J.R., Thiessen, E.D., 2003. Pattern induction by infant languagelearners. Developmental Psychology 39, 484–494.

Saffran, J.R., Wilson, D.P., 2003. From syllables to syntax: Multi-levelstatistical learning by 12-month-old infants. Infancy 4, 273–284.

Saffran, J.R., Aslin, R.N., Newport, E.L., 1996. Statistical learning by 8-month-old infants. Science 274, 1926–1928.

Saffran, J.R., Johnson, E.K., Aslin, R.N., Newport, E.L., 1999. Statisticallearning of tone sequences by human infants and adults. Cognition70, 27–52.

Sahni, S.D., Seidenberg,M.S., Saffran, J.R., 2010. Connecting cues: Over-lapping regularities support cue discovery in infancy. Child Devel-opment 81, 727–736.

Sameroff, A.J., 1971. Can conditioned responses be established in thenewborn infant: 1971? Developmental Psychology 5, 1–12.

Samuelson, L.K., Smith, L.B., 1999. Early noun vocabularies: Do ontol-ogy, category structure and syntax correspond? Cognition 73, 1–33.

Sanders, L.D., Newport, E.L., Neville, H.J., 2002. Segmenting nonsense:An event-related potential index of perceived onsets in continuousspeech. Nature Neuroscience 5, 700–703.

Santelmann, L.M., Jusczyk, P.W., 1998. Sensitivity to discontinuous de-pendencies in language learners: Evidence for limitations in proces-sing space. Cognition 69, 105–134.

Sloutsky, V.M., Robinson, C.W., 2008. The role of words and sounds ininfants’ visual processing: From overshadowing to attentional tun-ing. Cognitive Science 32, 354–377.

II. COGNITIVE D

Comprehensive Developmental Neuroscience: Neural Circuit Develo

Smith, K.H., 1969. Learning co-occurrence restrictions: Rule learning orrote learning? Journal of Verbal Behavior 8, 319–321.

Smith, L.B., Yu, C., 2008. Infants rapidly learn word-referent mappingsvia cross-situational statistics. Cognition 106, 1558–1568.

Smith, L.B., Jones, S.S., Landau, B., Gershkoff-Stowe, L., Samuelson, L.,2002. Object name learning provides on-the-job training for atten-tion. Psychological Science 13, 13–19.

Swingley, D., 2005. Statistical clustering and the contents of the infantvocabulary. Cognitive Psychology 50, 86–132.

Teglas, E., Girotto, V., Gonzales, M., Bonatti, L.L., 2007. Intuitions ofprobabilities shape expectations of the future at 12 months and be-yond. Proceedings of the National Academy of Sciences of theUnited States of America 104, 19156–19159.

Teinonen, T., Fellman, V., Naatanen, R., Alku, P., Huotilainen,M., 2009.Statistical language learning in neonates revealed by event-relatedbrain potentials. BMC Neuroscience 10, 21.

Theissen, E.D., Saffran, J.R., 2003. When cues collide: Use of stress andstatistical cues to word boundaries by 7- to 9-month-old infants. De-velopmental Psychology 39, 706–716.

Theissen, E.D., Saffran, J.R., 2007. Learning to learn: Infants’ acquisitionof stress-based strategies for word segmentation. Language Learn-ing and Development 3, 73–100.

Torkildsen, J.V.K., Hansen, H.F., Svangstu, J.M., 2008a. Brain dynamicsof word familiarization in 20-month-olds: Effects of productive vo-cabulary size. Brain and Language 108, 73–88.

Torkildsen, J.V.K., Svangstu, J.M., Hansen, H.F., 2008b. Productivevocabulary size predicts event-related potential correlates of fast-mapping in 20-month-olds. Journal of Cognitive Neuroscience 20,1266–1282.

Valenza, E., Simion, F., Macchi Cassia, V., Umilta, C., 1996. Face pref-erence at birth. Journal of Experimental Psychology. Human Per-ception and Performance 22, 892–903.

Valhaba, G.K., McClelland, J.L., Pons, F., Werker, J.F., Amano, S., 2007.Unsupervised learning of vowel categories from infant-directedspeech. Proceedings of the National Academy of Sciences of theUnited States of America 104, 13273–13278.

Voloumanos, A.,Werker, J.F., 2009. Infants’ learning of novelwords in astochastic environment. Developmental Psychology 45, 1611–1617.

Vouloumanos, A., Hauser, M., Werker, J.F., Martin, A., 2010. The tun-ing of human neonates’ preference for speech. Child Development81, 517–527.

Waxman, S.R., Gelman, S.A., 2009. Early word-learning entails refer-ence not merely associations. Trends in Cognitive Sciences 13,258–263.

Werker, J.F., Tees, R.C., 1984. Cross-language speech perception: Evi-dence for perceptual reorganization during the first year of life. In-fant Behaviour and Development 7, 49–63.

Werker, J.F., Cohen, L.B., Lloyd, V.L., Casasola, M., Stager, C.L., 1998.Acquisition of word-object associations by 14-month-old infants.Developmental Psychology 34, 1289–1309.

Werker, J.F., Pons, F., Dietrich, C., Kajikawa, S., Fais, L., Amano, S.,2007. Infant-directed speech supports phonetic category learningin English and Japanese. Cognition 103, 147–162.

Xu, F., Denison, S., 2009. Statistical inference and sensitivity to sam-pling in 11-month-old infants. Cognition 112, 97–104.

Xu, F., Garcia, V., 2008. Intuitive statistics by 8-month-old infants. Pro-ceedings of theNationalAcademyof Sciences of theUnited States ofAmerica 105, 5012–5015.

Yoshida, K.A., Pons, F., Maye, J., Werker, J.F., 2010. Distributional pho-netic learning at 10 months of age. Infancy.

Younger, B.A., 1985. The segregation of items into categories by ten-month-old infants. Child Development 56, 1574–1583.

Younger, B.A., Cohen, L.B., 1986. Developmental changes in infants’perception of correlation among attributes. Child Development57, 803–815.

EVELOPMENT

pment and Function in the Brain, (2013), vol. 3, pp. 231-248