
STEVENS’ HANDBOOK OF EXPERIMENTAL PSYCHOLOGY AND COGNITIVE NEUROSCIENCE


STEVENS’ HANDBOOK OF EXPERIMENTAL PSYCHOLOGY AND COGNITIVE NEUROSCIENCE
FOURTH EDITION

Volume 3: Language & Thought

Editor-in-Chief

JOHN T. WIXTED

Volume Editor

SHARON L. THOMPSON-SCHILL


This book is printed on acid-free paper. ∞

Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

Copyright © 2018 by John Wiley & Sons, Inc., Hoboken, NJ. All rights reserved.

Published by John Wiley & Sons, Inc.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic or mechanical, including uploading, downloading, printing, decompiling, recording or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail: [email protected].

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought.

Library of Congress Cataloging-in-Publication Data

The Library of Congress has cataloged the combined volume as follows:

Name: Wixted, John T., editor.
Title: Stevens’ handbook of experimental psychology and cognitive neuroscience / by John T. Wixted (Editor-in-chief).
Other titles: Handbook of experimental psychology.
Description: Fourth edition. | New York : John Wiley & Sons, Inc., [2018] | Includes index. | Contents: Volume 1. Learning and memory – Volume 2. Sensation, perception, and attention – Volume 3. Language & thought – Volume 4. Developmental & social psychology – Volume 5. Methodology.
Identifiers: LCCN 2017032691 | ISBN 9781119170013 (cloth : vol. 1) | ISBN 9781119170037 (epdf : vol. 1) | ISBN 9781119170020 (epub : vol. 1) | ISBN 9781119170044 (cloth : vol. 2) | ISBN 9781119174158 (epdf : vol. 2) | ISBN 9781119174073 (epub : vol. 2) | ISBN 9781119170693 (cloth : vol. 3) | ISBN 9781119170730 (epdf : vol. 3) | ISBN 9781119170716 (epub : vol. 3) | ISBN 9781119170051 (cloth : vol. 4) | ISBN 9781119170068 (epdf : vol. 4) | ISBN 9781119170082 (epub : vol. 4) | ISBN 9781119170129 (cloth : vol. 5) | ISBN 9781119170150 (epdf : vol. 5) | ISBN 9781119170143 (epub : vol. 5)
Subjects: LCSH: Psychology, Experimental. | Cognitive neuroscience.
Classification: LCC BF181 .H336 2018 | DDC 150—dc23
LC record available at https://lccn.loc.gov/2017032691

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books. For more information about Wiley products, visit our web site at www.wiley.com.

Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1


Contributors

Blair C. Armstrong, Basque Center on Cognition, Brain and Language, Spain
Lawrence W. Barsalou, University of Glasgow
Susan E. Brennan, Stony Brook University
Zhenguang G. Cai, University of East Anglia
Manuel Carreiras, Basque Center on Cognition, Brain and Language, Spain
Paulo F. Carvalho, Carnegie Mellon University
Jeanne Charoy, Stony Brook University
Evangelia G. Chrysikou, Drexel University
Jon Andoni Duñabeitia, Basque Center on Cognition, Brain and Language, Spain
Frank Eisner, Radboud Universiteit Nijmegen, Nijmegen, Gelderland
Matthew Goldrick, Northwestern University
Robert L. Goldstone, Indiana University
Charlotte Hartwright, University of Oxford
Emily Hong, Queen’s University, Canada
Li-Jun Ji, Queen’s University, Canada
Michael N. Jones, Indiana University, Bloomington
Roi Cohen Kadosh, University of Oxford
Alan Kersten, Florida Atlantic University
Sangeet S. Khemlani, Naval Research Laboratory
Albert E. Kim, University of Colorado, Boulder
Judith F. Kroll, University of California, Riverside
Anna K. Kuhlen, Stony Brook University
Heath E. Matheson, University of Pennsylvania
Rhonda McClain, Pennsylvania State University
James M. McQueen, Radboud University
Ken McRae, University of Western Ontario


Christian A. Navarro-Torres, University of California, Riverside
Nora S. Newcombe, Temple University
Francesco Sella, University of Oxford
Lily Tsoi, Boston College
Gabriella Vigliocco, University College, London
Suhui Yap, Queen’s University, Canada
Eiling Yee, University of Connecticut
Liane Young, Boston College


Contents

PREFACE

1 SPEECH PERCEPTION
Frank Eisner and James M. McQueen

2 THE NEUROCOGNITIVE MECHANISMS OF SPEECH PRODUCTION
Rhonda McClain and Matthew Goldrick

3 WORD PROCESSING
Zhenguang G. Cai and Gabriella Vigliocco

4 SENTENCE PROCESSING
Albert E. Kim

5 DISCOURSE AND DIALOGUE
Susan E. Brennan, Anna K. Kuhlen, and Jeanne Charoy

6 READING
Manuel Carreiras, Blair C. Armstrong, and Jon Andoni Duñabeitia

7 BILINGUALISM
Judith F. Kroll and Christian A. Navarro-Torres

8 CATEGORIZATION AND CONCEPTS
Robert L. Goldstone, Alan Kersten, and Paulo F. Carvalho

9 SEMANTIC MEMORY
Eiling Yee, Michael N. Jones, and Ken McRae

10 EMBODIMENT AND GROUNDING IN COGNITIVE NEUROSCIENCE
Heath E. Matheson and Lawrence W. Barsalou

11 REASONING
Sangeet S. Khemlani


12 MORAL REASONING
Lily Tsoi and Liane Young

13 CREATIVITY
Evangelia G. Chrysikou

14 CULTURE AND COGNITION
Suhui Yap, Li-Jun Ji, and Emily Hong

15 THREE KINDS OF SPATIAL COGNITION
Nora S. Newcombe

16 THE NEUROCOGNITIVE BASES OF NUMERICAL COGNITION
Francesco Sella, Charlotte Hartwright, and Roi Cohen Kadosh

Author Index

Subject Index


Preface

Since the first edition was published in 1951, The Stevens’ Handbook of Experimental Psychology has been recognized as the standard reference in the experimental psychology field. The most recent (third) edition of the handbook was published in 2004, and it was a success by any measure. But the field of experimental psychology has changed in dramatic ways since then. Throughout the first three editions of the handbook, the changes in the field were mainly quantitative in nature. That is, the size and scope of the field grew steadily from 1951 to 2004, a trend that was reflected in the growing size of the handbook itself: the one-volume first edition (1951) was succeeded by a two-volume second edition (1988) and then by a four-volume third edition (2004). Since 2004, however, this still-growing field has also changed qualitatively in the sense that, in virtually every subdomain of experimental psychology, theories of the mind have evolved to include theories of the brain. Research methods in experimental psychology have changed accordingly and now include not only venerable EEG recordings (long a staple of research in psycholinguistics) but also MEG, fMRI, TMS, and single-unit recording. The trend toward neuroscience is an absolutely dramatic, worldwide phenomenon that is unlikely ever to be reversed. Thus, the era of purely behavioral experimental psychology is already long gone, even though not everyone has noticed.

Experimental psychology and cognitive neuroscience (an umbrella term that, as used here, includes behavioral neuroscience, social neuroscience, and developmental neuroscience) are now inextricably intertwined. Nearly every major psychology department in the country has added cognitive neuroscientists to its ranks in recent years, and that trend is still growing. A viable handbook of experimental psychology should reflect the new reality on the ground.

There is no handbook in existence today that combines basic experimental psychology and cognitive neuroscience, despite the fact that the two fields are interrelated—and even interdependent—because they are concerned with the same issues (e.g., memory, perception, language, development, etc.). Almost all neuroscience-oriented research takes as its starting point what has been learned using behavioral methods in experimental psychology. In addition, nowadays, psychological theories increasingly take into account what has been learned about the brain (e.g., psychological models increasingly need to be neurologically plausible). These considerations explain why I chose a new title for the handbook: The Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience. This title serves as a reminder that the two fields go together and as an announcement that the Stevens’ Handbook now covers it all.


The fourth edition of the Stevens’ Handbook is a five-volume set structured as follows:

1. Learning & Memory: Elizabeth A. Phelps and Lila Davachi (volume editors)

Topics include fear learning, time perception, working memory, visual object recognition, memory and future imagining, sleep and memory, emotion and memory, attention and memory, motivation and memory, inhibition in memory, education and memory, aging and memory, autobiographical memory, eyewitness memory, and category learning.

2. Sensation, Perception, & Attention: John T. Serences (volume editor)

Topics include attention; vision; color vision; visual search; depth perception; taste; touch; olfaction; motor control; perceptual learning; audition; music perception; multisensory integration; vestibular, proprioceptive, and haptic contributions to spatial orientation; motion perception; perceptual rhythms; the interface theory of perception; perceptual organization; perception and interactive technology; and perception for action.

3. Language & Thought: Sharon L. Thompson-Schill (volume editor)

Topics include reading, discourse and dialogue, speech production, sentence processing, bilingualism, concepts and categorization, culture and cognition, embodied cognition, creativity, reasoning, speech perception, spatial cognition, word processing, semantic memory, and moral reasoning.

4. Developmental & Social Psychology: Simona Ghetti (volume editor)

Topics include development of visual attention, self-evaluation, moral development, emotion-cognition interactions, person perception, memory, implicit social cognition, motivation, group processes, development of scientific thinking, language acquisition, category and conceptual development, development of mathematical reasoning, emotion regulation, emotional development, development of theory of mind, attitudes, and executive function.

5. Methodology: Eric-Jan Wagenmakers (volume editor)

Topics include hypothesis testing and statistical inference, model comparison in psychology, mathematical modeling in cognition and cognitive neuroscience, methods and models in categorization, serial versus parallel processing, theories for discriminating signal from noise, Bayesian cognitive modeling, response time modeling, neural networks and neurocomputational modeling, methods in psychophysics, analyzing neural time series data, convergent methods of memory research, models and methods for reinforcement learning, cultural consensus theory, network models for clinical psychology, the stop-signal paradigm, fMRI, neural recordings, and open science.

How the field of experimental psychology will evolve in the years to come is anyone’s guess, but the Stevens’ Handbook provides a comprehensive overview of where it stands today. For anyone in search of interesting and important topics to pursue in future research, this is the place to start. After all, you have to figure out the direction in which the river of knowledge is currently flowing to have any hope of ever changing it.


CHAPTER 1

Speech Perception

FRANK EISNER AND JAMES M. MCQUEEN

INTRODUCTION

What Speech Is

Speech is the most acoustically complex type of sound that we regularly encounter in our environment. The complexity of the signal reflects the complexity of the movements that speakers perform with their tongues, lips, jaws, and other articulators in order to generate the sounds coming out of their vocal tract. Figure 1.1 shows two representations of the spoken sentence The sun melted the snow—an oscillogram at the top, showing variation in amplitude, and a spectrogram at the bottom, showing its spectral characteristics over time. The figure illustrates some of the richness of the information contained in the speech signal: There are modulations of amplitude, detailed spectral structures, noises, silences, bursts, and sweeps. Some of this structure is relevant in short temporal windows at the level of individual phonetic segments. For example, the vowel in the word sun is characterized by a certain spectral profile, in particular the location of peaks in the spectrum (called “formants,” the darker areas in the spectrogram). Other structures are relevant at the level of words or phrases.

FE is supported by the Gravitation program “Language in Interaction” from the Dutch Science Foundation (NWO).

For example, the end of the utterance is characterized by a fall in amplitude and in pitch, which spans several segments. The acoustic cues that describe the identity of segments such as individual vowels and consonants are referred to as segmental information, whereas the cues that span longer stretches of the signal, such as pitch and amplitude envelope, and that signal prosodic structures such as syllables, feet, and intonational phrases are called suprasegmental.
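For readers who want to inspect these two representations themselves, the following is a minimal sketch (not from the chapter) that plots an oscillogram and a gray-scale spectrogram of a recorded utterance, analogous to Figure 1.1; the file name speech.wav is a placeholder.

```python
# Minimal sketch: oscillogram (amplitude over time) and spectrogram
# (frequency content over time) of a recorded utterance.
# "speech.wav" is a placeholder file name, not a file supplied with the chapter.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, signal = wavfile.read("speech.wav")   # sampling rate in Hz, samples
signal = signal.astype(float)
if signal.ndim > 1:                         # keep one channel if the file is stereo
    signal = signal[:, 0]
time = np.arange(len(signal)) / rate

fig, (ax_osc, ax_spec) = plt.subplots(2, 1, sharex=True)
ax_osc.plot(time, signal)                   # oscillogram
ax_osc.set_ylabel("Amplitude")

f, t, sxx = spectrogram(signal, fs=rate, nperseg=512, noverlap=384)
ax_spec.pcolormesh(t, f, 10 * np.log10(sxx + 1e-12), shading="auto", cmap="Greys")
ax_spec.set_ylabel("Frequency (Hz)")        # darker shading = more energy
ax_spec.set_xlabel("Time (s)")
ax_spec.set_ylim(0, 8000)                   # the range most relevant for speech
plt.show()
```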

Acoustic cues are transient and come in fast. The sentence in Figure 1.1 is spoken at a normal speech rate; it contains five syllables and is only 1.3 seconds long. The average duration of a syllable in the sentence is about 260 ms, meaning that information about syllable identity comes in on average at a rate of about 4 Hz, which is quite stable across languages (Giraud & Poeppel, 2012). In addition to the linguistic information that is densely packed in the speech signal, the signal also contains a great deal of additional information about the speaker, the so-called paralinguistic content of speech. If we were to listen to a recording of this sentence, we would be able to say with a fairly high degree of certainty that the speaker is a British middle-aged man with an upper-class accent, and we might also be able to guess that he is suffering from a cold and perhaps is slightly bored as he recorded the prescribed phrase.


Figure 1.1 Oscillogram (top) and spectrogram (bottom) representations of the speech signal in the sentence “The sun melted the snow,” spoken by a male British English speaker. The vertical lines represent approximate phoneme boundaries with phoneme transcriptions in the International Phonetic Alphabet (IPA) system. The oscillogram shows variation in amplitude (vertical axis) over time (horizontal axis). The spectrogram shows variation in the frequency spectrum (vertical axis) over time (horizontal axis); higher energy in a given part of the spectrum is represented by darker shading.

Paralinguistic information adds to the complexity of speech, and in some cases interacts with how linguistic information is interpreted by listeners (Mullennix & Pisoni, 1990).

What Speech Perception Entails

How, then, is this complex signal perceived? In our view, speech perception is not primarily about how listeners identify individual speech segments (vowels and consonants), though of course this is an important part of the process. Speech perception is also not primarily about how listeners identify suprasegmental units such as syllables and lexical stress patterns, though this is an often overlooked part of the process, too. Ultimately, speech perception is about how listeners use combined sources of segmental and suprasegmental information to recognize spoken words. This is because the listener’s goal is to grasp what a speaker means, and the only way she or he can do so is through recognizing the individual meaning units in the speaker’s utterance: its morphemes and words. Perceiving segments and prosodic structures is thus at the service of word recognition.

The nature of the speech signal poses a number of computational problems that the listener has to solve in order to be able to recognize spoken words (cf. Marr, 1982). First, listeners have to be able to recognize words in spite of considerable variability in the signal. The oscillogram and spectrogram in Figure 1.1 would look very different if the phrase had been spoken by a female adolescent speaking spontaneously in a casual conversation on a mobile phone in a noisy ski lift, and yet the same words would need to be recognized. Indeed, even if the same speaker recorded the same sentence a second time, it would be physically different (e.g., a different speaking rate, or a different fundamental frequency).

Due to coarticulation (the vocal tract changing both as a consequence of previous articulations and in preparation for upcoming articulations), the acoustic realization of any given segment can be strongly colored by its neighboring segments. There is thus no one-to-one mapping between the perception of a speech sound and its acoustics. This is one of the main factors that is still holding back automatic speech recognition systems (Benzeghiba et al., 2007). In fact, the perceptual system has to solve a many-to-many mapping problem, because not only do instances of the same speech sound have different acoustic properties, but the same acoustic pattern can result in perceiving different speech sounds, depending on the context in which the pattern occurs (Nusbaum & Magnuson, 1997; Repp & Liberman, 1987). The surrounding context of a set of acoustic cues thus has important implications for how the pattern should be interpreted by the listener.

There are also continuous speech processes through which sounds are added (a process called epenthesis), reduced, deleted, or altered, rendering a given word less like its canonical pronunciation. One example of such a process is given in Figure 1.1: The /n/ of sun is realized more like an [m], through a process called coronal place assimilation whereby the coronal /n/ approximates the labial place of articulation of the following word-initial [m].

Speech recognition needs to be robust in the face of all this variability. As we will argue, listeners appear to solve the variability problem in multiple ways, but in particular through phonological abstraction (i.e., categorizing the signal into prelexical segmental and suprasegmental units prior to lexical access) and through being flexible (i.e., through perceptual learning processes that adapt the mapping of the speech signal onto the mental lexicon in response to particular listening situations).

The listener must also solve the segmentation problem. As Figure 1.1 makes clear, the speech signal has nothing that is the equivalent of the white spaces between printed words in a text such as this, spaces that reliably mark where words begin and end. In order to recognize speech, therefore, listeners have to segment the quasicontinuous input stream into discrete words. As with variability, there is no single solution to the segmentation problem: Listeners use multiple cues, and multiple algorithms.

A third problem derives from the fact that, across the world’s languages, large lexica (on the order of perhaps 50,000 words) are built from small phonological inventories (on the order of 40 segments in a language such as English, and often far fewer than that; Ladefoged & Maddieson, 1996). Spoken words thus necessarily sound like other spoken words: They begin like other words, they end like other words, and they often have other words partially or wholly embedded within them. This means that, at any moment in the temporal unfolding of an utterance, the signal is likely to be partially or wholly consistent with many words. Once again, the listener appears to solve this “lexical embedding” problem using multiple algorithms.


We will argue that speech perception is based on several stages of processing at which a variety of perceptual operations help the listener solve these three major computational challenges—the variability problem, the segmentation problem, and the lexical embedding problem (see Box 1.1). These stages and operations have been studied over the past 70 years or so using behavioral techniques (e.g., psychophysical tasks such as identification and discrimination; psycholinguistic procedures such as lexical decision, cross-modal priming, and visual-world eye tracking) and neuroscientific techniques (especially measures using electroencephalography [EEG] and magnetoencephalography [MEG]). Neuroimaging techniques (primarily functional magnetic resonance imaging [fMRI]) and neuropsychological approaches (based on aphasic patients) have also made it possible to start to map these stages of processing onto brain regions. In the following section we will review data of all these different types. These data have made it possible to specify at least three core stages of processing involved in speech perception and the kinds of operations involved at each stage. The data also provide some suggestions about the neural instantiation of these stages.

As shown in Figure 1.2, initial operations act to distinguish incoming speech-related acoustic information from non-speech-related acoustic information. Thereafter, prelexical processes act in parallel to extract segmental and suprasegmental information from the speech signal (see Box 1.2). These processes contribute toward solving the variability and segmentation problems and serve to facilitate spoken-word recognition. Lexical processing receives input from segmental and suprasegmental prelexical processing and continues to solve the first two computational problems while also solving the lexical-embedding problem.

Box 1.1 Three Computational Challenges

1. The variability problem
The physical properties of any given segment can vary dramatically because of a variety of factors such as the talker’s physiology, accent, emotional state, or speech rate. Depending on such contextual factors, the same sound can be perceived as different segments, and different sounds can be perceived as the same segment. The listener has to be able to recognize speech in spite of this variability.

2. The segmentation problem
In continuous speech there are no acoustic cues that reliably and unambiguously mark the boundaries between neighboring words or indeed segments. The boundaries are often blurred because neighboring segments tend to be coarticulated (i.e., their pronunciation overlaps in time) and because there is nothing in the speech stream that is analogous to the white spaces between printed words. The listener has to be able to segment continuous speech into discrete words.

3. The lexical-embedding problem
Spoken words tend to sound like other spoken words: They can begin in the same way (e.g., cap and cat), they can end in the same way (e.g., cap and map), and they can have other words embedded within them (e.g., cap in captain). This means that at any point in time the speech stream is usually (at least temporarily) consistent with multiple lexical hypotheses. The listener has to be able to recognize the words the speaker intended from among those hypotheses.

Finally, processing moves beyond the realm of speech perception. Lexical processing provides input to interpretative processing, where syntactic, semantic, and pragmatic operations, based on the words that have been recognized, are used to build an interpretation of what the speaker meant.


[Figure 1.2 is a flow diagram with the following components: visual input, auditory input, auditory preprocessing, segmental prelexical processing, suprasegmental prelexical processing, lexical form processing, and interpretative processing.]

Figure 1.2 Processing stages in speech perception. Arrows represent on-line flow of information during the initial processing of an utterance.

Box 1.2 Three Processing Stages

1. Segmental prelexical processing
Phonemes are the smallest linguistic units that can indicate a difference in meaning. For example, the words cap and cat differ by one consonant, /p/ versus /t/, and cap and cup differ by one vowel, /æ/ versus /ʌ/. Phoneme-sized segments are also perceptual categories, though it is not yet clear whether listeners recognize phonemes or some other units of perception (e.g., syllables or position-specific allophones, such as the syllable-initial [p] in pack vs. the syllable-final [p] in cap). We therefore use the more neutral term segments. The speech signal contains acoustic cues to individual segments. Segmental prelexical processing refers to the computational processes acting on segmental information that operate prior to retrieval of words from long-term memory and that support that retrieval process.

2. Suprasegmental prelexical processing
The speech signal contains acoustic cues for a hierarchy of prosodic structures that are larger than individual segments, including syllables, prosodic words, lexical stress patterns, and intonational phrases. These structures are relevant for the perception of words. For example, the English word forbear is pronounced differently depending on whether it is a verb or a noun, even though the segments are the same in both words. The difference is marked by placing stress on the first or second syllable, which can, for example, be signaled by an increase in loudness and/or duration. Suprasegmental prelexical processing refers to the computational processes acting on suprasegmental information that operate prior to retrieval of words from long-term memory and that support that retrieval process.

3. Lexical form processing
To understand a spoken utterance, the listener must recognize the words the speaker intended. Lexical form processing refers to the computational processes that lead to the recognition of words as phonological forms (as opposed to processes that determine the meanings associated with those forms). The listener considers multiple perceptual hypotheses about the word forms that are currently being said (e.g., cap, cat, apt, and captain given the input captain). Output from the segmental and suprasegmental prelexical stages directs retrieval of these hypotheses from long-term lexical memory. Together with contextual constraints, it also influences the selection and recognition of words from among those hypotheses.
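To make the idea of multiple simultaneous lexical hypotheses concrete, the toy sketch below tracks which words of a small, made-up lexicon are consistent with an input as it unfolds, using the cap/cat/apt/captain example from Box 1.2. It is an illustration only, not any of the word-recognition models reviewed in this chapter.

```python
# Toy illustration of the lexical-embedding problem: as the input unfolds,
# several words are simultaneously consistent with what has been heard.
# The lexicon is made up for this example.
LEXICON = ["cap", "cat", "apt", "captain", "in", "map"]

def active_hypotheses(heard):
    """Words that start with the input heard so far, plus words embedded in it."""
    onsets = [w for w in LEXICON if w.startswith(heard)]
    embedded = [w for w in LEXICON if w in heard and w not in onsets]
    return onsets + embedded

utterance = "captain"
for i in range(1, len(utterance) + 1):
    heard = utterance[:i]
    print(f"{heard!r:>10} -> {active_hypotheses(heard)}")
# After "cap", both cap and captain are live onset hypotheses; once the whole
# input is available, cap, apt, and in are all embedded within captain.
```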


STAGES OF PERCEPTUAL PROCESSING

Auditory Preprocessing

The sounds we encounter in our environment are converted in the inner ear from physical vibrations to electrical signals that can be interpreted by the brain. From the ear, sound representations travel along the ascending auditory pathways via several subcortical nuclei to the auditory cortex. Along the way, increasingly complex representations in the spectral and temporal domains are derived from the waveform, coding aspects of the signal such as the amplitude envelope, onsets and offsets, amplitude modulation frequencies, spectral structure, and modulations of the frequency spectrum (Theunissen & Elie, 2014). These representations are often topographically organized, for example in tonotopic “maps” that show selective sensitivity for particular frequencies along a spatial dimension (e.g., Formisano et al., 2003). There is evidence for processing hierarchies in the ascending auditory system (e.g., Eggermont, 2001). For example, whereas auditory events are represented at a very high temporal resolution subcortically, the auditory cortex appears to integrate events into longer units that are more relevant for speech perception (Harms & Melcher, 2002). Similarly, subcortical nuclei have been found to be sensitive to very fast modulations of the temporal envelope of sounds, but the auditory cortex is increasingly sensitive to the slower modulations such as the ones that correspond to prelexical segments in speech (Giraud & Poeppel, 2012; Giraud et al., 2000).

The notion of a functional hierarchy in sound processing, and speech in particular, has also been proposed for the primary auditory cortex and surrounding areas. A hierarchical division of the auditory cortex underlies the processing of simple to increasingly complex sounds both in nonhuman primates (Kaas & Hackett, 2000; Perrodin, Kayser, Logothetis, & Petkov, 2011; Petkov, Kayser, Augath, & Logothetis, 2006; Rauschecker & Tian, 2000) and in humans (e.g., Binder et al., 1997; Liebenthal, Binder, Spitzer, Possing, & Medler, 2005; Obleser & Eisner, 2009; Scott & Wise, 2004). Two major cortical streams for processing speech have been proposed, extending in both antero-ventral and postero-dorsal directions from primary auditory cortex (Hickok & Poeppel, 2007; Rauschecker & Scott, 2009; Rauschecker & Tian, 2000; Scott & Johnsrude, 2003; Ueno, Saito, Rogers, & Lambon Ralph, 2011). The anterior stream in the left hemisphere in particular has been credited with decoding linguistic meaning in terms of segments and words (Davis & Johnsrude, 2003; DeWitt & Rauschecker, 2012; Hickok & Poeppel, 2007; Scott, Blank, Rosen, & Wise, 2000). The anterior stream in the right hemisphere appears to be less sensitive to linguistic information (Scott et al., 2000), but more sensitive to speaker identity and voice processing (Belin, Zatorre, Lafaille, Ahad, & Pike, 2000; Perrodin et al., 2011), as well as to prosodic speech cues, such as pitch (Sammler, Grosbras, Anwander, Bestelmeyer, & Belin, 2015). The subcortical auditory system thus extracts acoustic cues from the waveform that are relevant for speech perception, whereas speech-specific processes begin to emerge in regions beyond the primary auditory cortex (Overath, McDermott, Zarate, & Poeppel, 2015).

Prelexical Segmental Processing

Neural systems that appear to be specific to speech processing relative to other types of complex sounds are mostly localized to the auditory cortex and surrounding regions in the perisylvian cortex (see Figure 1.3).


Figure 1.3 Lateral view of the left hemisphere showing the cortical regions that are central in speech perception. A1, primary auditory cortex; TP, temporal pole; aSTG, anterior superior temporal gyrus; pSTG, posterior superior temporal gyrus; pMTG, posterior middle temporal gyrus; SMG, supramarginal gyrus; M1, primary motor cortex; PMC, premotor cortex; IFG, inferior frontal gyrus. Color version of this figure is available at http://onlinelibrary.wiley.com/book/10.1002/9781119170174.

Several candidate regions in the superior temporal cortex and the inferior parietal cortex (Chan et al., 2014; Obleser & Eisner, 2009; Turkeltaub & Coslett, 2010) have been shown to be engaged in aspects of processing speech at a prelexical level of analysis (Arsenault & Buchsbaum, 2015; Mesgarani, Cheung, Johnson, & Chang, 2014). Neural populations in these regions exhibit response properties that resemble hallmarks of speech perception, such as categorical perception of segments (Liebenthal, Sabri, Beardsley, Mangalathu-Arumana, & Desai, 2013; Myers, 2007; Myers, Blumstein, Walsh, & Eliassen, 2009). Bilateral regions of the superior temporal sulcus have recently been shown to be selectively tuned to speech-specific spectrotemporal structure (Overath et al., 2015). Many processing stages in the ascending auditory pathways feature a topographic organization, which has led to studies probing whether a phonemic map exists in the superior temporal cortex. However, the current evidence suggests that prelexical units have complex, distributed cortical representations (Bonte, Hausfeld, Scharke, Valente, & Formisano, 2014; Formisano, De Martino, Bonte, & Goebel, 2008; Mesgarani et al., 2014).

The main computational problems to be addressed during prelexical processing are the segmentation and variability problems. The segmentation problem is not only a lexical one. There are no reliably marked boundaries between words in the incoming continuous speech stream, but there are also no consistent boundaries between individual speech sounds. Whereas some types of phonemes have a relatively clear acoustic structure (stop consonants, for instance, are signaled by a period of silence and a sudden release burst, which have a clear signature in the amplitude envelope; fricatives are characterized by high-frequency noise with a sudden onset), other types of phonemes, such as vowels, approximants, and nasals, are distinguished predominantly by their formant structure, which changes relatively slowly. The final word snow in Figure 1.1 illustrates this. There is a clear spectrotemporal signature for the initial /s/, whereas the boundaries in the following sequence /noʊ/ are much less clear. Prelexical processes segment the speech signal into individual phonological units (e.g., between the /s/ and the /n/ of snow) and provide cues for lexical segmentation (e.g., the boundary between melted and the).

Recent studies on neural oscillations have suggested that cortical rhythms may play an important role in segmenting the speech stream into prelexical units. Neural oscillations are important because they modulate the excitability of neural networks; the peaks and troughs in a cycle influence how likely neurons are to fire. Interestingly, oscillations in the theta range (4–8 Hz) align with the quasiperiodic amplitude envelope of an incoming speech signal. Giraud and Poeppel (2012) have suggested that this entrainment of auditory networks to speech rhythm serves to segment the speech stream into syllable-sized portions for analysis. Each theta cycle may then in turn trigger a cascade of higher-frequency oscillations, which analyze the phonetic contents of a syllable chunk on a more fine-grained time scale (Morillon, Liégeois-Chauvel, Arnal, Bénar, & Giraud, 2012).
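The signal property that theta-band oscillations are thought to track, the slow amplitude envelope, is straightforward to compute from a recording. The sketch below (not the authors' analysis; speech.wav is a placeholder file) extracts the envelope with a Hilbert transform, keeps only its slow modulations, and estimates the dominant modulation rate, which for speech at a normal rate should lie near the roughly 4 Hz syllable rate mentioned earlier.

```python
# Sketch: the slow amplitude envelope of speech and its dominant modulation rate.
# "speech.wav" is a placeholder file name.
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, butter, sosfiltfilt

rate, x = wavfile.read("speech.wav")
x = x.astype(float)
if x.ndim > 1:
    x = x[:, 0]

envelope = np.abs(hilbert(x))                  # broadband amplitude envelope

# Keep only the slow (syllable-rate) modulations, roughly 1-10 Hz.
sos = butter(4, [1.0, 10.0], btype="bandpass", fs=rate, output="sos")
slow_envelope = sosfiltfilt(sos, envelope)

# Dominant modulation frequency of the envelope (expected near ~4 Hz for
# speech at a normal rate; cf. Giraud & Poeppel, 2012).
spectrum = np.abs(np.fft.rfft(slow_envelope - slow_envelope.mean()))
freqs = np.fft.rfftfreq(len(slow_envelope), d=1.0 / rate)
band = (freqs >= 1.0) & (freqs <= 10.0)
print(f"Dominant envelope modulation: {freqs[band][np.argmax(spectrum[band])]:.1f} Hz")
```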

Psycholinguistics has not yet identified one single unit of prelexical representation into which the speech stream is segmented. In addition to phonemes (McClelland & Elman, 1986), features (Lahiri & Reetz, 2002), allophones (Mitterer, Scharenborg, & McQueen, 2013), syllables (Church, 1987), and articulatory motor programs (Galantucci, Fowler, & Turvey, 2006) have all been proposed as representational units that mediate between the acoustic signal and lexical representations. There may indeed be multiple units of prelexical representation that capture regularities in the speech signal at different levels of granularity (Mitterer et al., 2013; Poellmann, Bosker, McQueen, & Mitterer, 2014; Wickelgren, 1969). The oscillations account is generally compatible with this view, since different representations of the same chunk of speech may exist simultaneously on different timescales. This line of research in speech perception is relatively new, and there are questions about whether the patterns of neural oscillations are a causal influence on or a consequence of the perceptual analysis of speech. Some evidence for a causal relationship comes from a study that showed that being able to entrain to the amplitude envelope of speech results in increased intelligibility of the signal (Doelling, Arnal, Ghitza, & Poeppel, 2014), but the mechanisms by which this occurs are still unclear.

Oscillatory entrainment may also assist listeners in solving the lexical segmentation problem, since syllable and segment boundaries tend to be aligned with word boundaries. Other prelexical segmental processes also contribute to lexical segmentation. In particular, prelexical processing appears to be sensitive to the transitional probabilities between segments (Vitevitch & Luce, 1999). These phonotactic regularities provide cues to the location of likely word boundaries. For example, a characteristic of Finnish that is known as vowel harmony regulates which kinds of vowels can be present within the same word. This kind of phonotactic knowledge provides useful constraints on where in the speech stream boundaries for particular words can occur, and Finnish listeners appear to be sensitive to those constraints (Suomi, McQueen, & Cutler, 1997). Regularities concerning which sequences of consonants can occur within versus between syllables (McQueen, 1998), or which sequences are more likely to be at the edge of a word (van der Lugt, 2001), also signal word boundary locations.
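The logic of using transitional probabilities as segmentation cues can be illustrated with a toy example. The sketch below operates over letter strings from a made-up corpus and uses an arbitrary threshold, so it is a deliberately simplified illustration rather than a model from the literature cited above: transitions that are poorly predicted by the corpus statistics are treated as candidate word boundaries.

```python
# Toy illustration: estimate transitional probabilities between segments from a
# small made-up corpus, then posit word boundaries at weak transitions.
from collections import Counter, defaultdict

corpus = ["thesunmelted", "thesnowmelted", "thesunisbright"]  # toy, unsegmented

pair_counts = defaultdict(Counter)
for utt in corpus:
    for a, b in zip(utt, utt[1:]):
        pair_counts[a][b] += 1

def transitional_prob(a, b):
    total = sum(pair_counts[a].values())
    return pair_counts[a][b] / total if total else 0.0

utterance = "thesunmelted"
THRESHOLD = 0.5   # arbitrary, for illustration only
boundaries = [i + 1 for i, (a, b) in enumerate(zip(utterance, utterance[1:]))
              if transitional_prob(a, b) < THRESHOLD]
print(boundaries)  # positions of weak transitions = candidate word boundaries
```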


After segmentation, the second major computational challenge addressed at the prelexical stage is how the perception system deals with the ubiquitous variability in the speech signal. Variability is caused by a number of different sources, including speech rate, talker differences, and continuous speech processes such as assimilation and reduction.

Speech Rate

Speech rate varies considerably within as well as between talkers, and has a substantial effect on the prelexical categorization of speech sounds (e.g., Miller & Dexter, 1988). This is especially the case for categories that are marked by a temporal contrast, such as voice-onset time (VOT) for stop consonants. VOT is the most salient acoustic cue to distinguish between English voiced and unvoiced stops, and thus between words such as cap and gap. However, what should be interpreted as a short VOT (consistent with gap) or a long VOT (consistent with cap) is not a fixed duration, but depends on the speech rate of the surrounding phonetic context (Allen & Miller, 2004; Miller & Dexter, 1988). Speech rate may even influence whether segments are perceived at all: Dilley and Pitt (2010) showed that listeners tended not to perceive the function word or in a phrase such as leisure or time when the speech was slowed down, whereas they did perceive it at a normal rate. Conversely, when the speech was speeded up, participants tended to perceive the function word when it was not actually part of the utterance.
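The dependence of a temporal cue such as VOT on the surrounding speech rate can be sketched as a category boundary that shifts with context. The parameter values below are invented for illustration and are not taken from the studies cited above.

```python
# Illustrative sketch: a rate-dependent VOT category boundary. The same 30 ms
# VOT is heard as voiceless ("cap") in fast speech but voiced ("gap") in slow
# speech, because the boundary shifts with the duration of the context.
def classify_stop(vot_ms, context_syllable_ms):
    """Return 'cap' (long VOT, voiceless /k/) or 'gap' (short VOT, voiced /g/)."""
    # Invented linear rule: the boundary grows as speech slows down.
    boundary_ms = 25.0 + 0.05 * (context_syllable_ms - 200.0)
    return "cap" if vot_ms > boundary_ms else "gap"

print(classify_stop(30, context_syllable_ms=150))  # fast speech -> 'cap'
print(classify_stop(30, context_syllable_ms=350))  # slow speech -> 'gap'
```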

Being able to adapt to changes in speaking rate is thus crucial for prelexical processing, and it has been known for some time that listeners are adept at doing so (Dupoux & Green, 1997), even if the underlying mechanisms are not yet clear. There is evidence that adaptability to varying speech rates is mediated not only by auditory but also by motor systems (Adank & Devlin, 2010), possibly by making use of internal forward models (e.g., Hickok, Houde, & Rong, 2011), which may help to predict the acoustic consequences of faster or slower motor sequences. There is an emerging body of research that shows that neural oscillations in the auditory cortex align to speech rate fluctuations (Ghitza, 2014; Peelle & Davis, 2012). It has yet to be established whether this neural entrainment is part of a causal mechanism that tunes prelexical processing to the current speech rate.

Talker Differences

A second important source of variability in speech acoustics arises from physiological differences between talkers. Factors like body size, age, and vocal tract length can strongly affect acoustic parameters such as fundamental frequency and formant dispersion, which are critical parameters that encode differences between many speech sound categories. It has been known for decades that even when vowels are spoken in isolation and under laboratory conditions, there is a great amount of overlap in the formant measures (peaks in the frequency spectrum that are critical for the perception of vowel identity) for different speakers (Adank, Smits, & Hout, 2004; Peterson & Barney, 1952). In other words, formant values measured when a given speaker produces one particular vowel may be similar to those measured when a different speaker produces a different vowel. Formant values thus need to be interpreted in the context of acoustic information that is independent of what the speaker is saying, specifically acoustic information about more general aspects of the speaker’s physiology.

It has also been known for a long time that listeners do this (Ladefoged, 1989; Ladefoged & Broadbent, 1957), and the specifics of the underlying mechanisms are beginning to become clear.


The perceptual system appears to compute an average spectrum for the incoming speech stream that can be used as a model of the talker’s vocal tract properties, and also can be used as a reference for interpreting the upcoming speech (Nearey, 1989; Sjerps, Mitterer, & McQueen, 2011a). Evidence from an EEG study (Sjerps, Mitterer, & McQueen, 2011b) shows that this extrinsic normalization of vowels takes place early in perceptual processing (around 120 ms after vowel onset), which is consistent with the idea that it reflects prelexical processing. Behavioral and neuroimaging evidence suggests that there are separate auditory systems that are specialized in tracking aspects of the speaker’s voice (Andics et al., 2010; Belin et al., 2000; Formisano et al., 2008; Garrido et al., 2009; Kriegstein, Smith, Patterson, Ives, & Griffiths, 2007; Schall, Kiebel, Maess, & Kriegstein, 2015). These right-lateralized systems appear to be functionally connected to left-lateralized systems that are preferentially engaged in processing linguistic information, which may indicate that these bilateral systems work together in adjusting prelexical processing to speaker-specific characteristics (Kriegstein, Smith, Patterson, Kiebel, & Griffiths, 2010; Schall et al., 2015).
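A simple way to see how talker-specific variation in formant values can be factored out is to rescale the measurements within each talker (Lobanov-style z-scoring), a standard normalization scheme in phonetics. The sketch below uses invented formant values and is only an illustration of that general idea; it is not the extrinsic-normalization mechanism proposed in the studies cited above.

```python
# Illustration of talker normalization: z-score F1/F2 within each talker so that
# vowel tokens become comparable across talkers. Formant values are invented.
import numpy as np

tokens = [  # (talker, vowel, F1 in Hz, F2 in Hz)
    ("talker_A", "i", 270, 2290), ("talker_A", "a", 730, 1090),
    ("talker_B", "i", 370, 2670), ("talker_B", "a", 850, 1220),
]

def normalize(tokens):
    out = []
    for talker in sorted({t[0] for t in tokens}):
        rows = [t for t in tokens if t[0] == talker]
        f1 = np.array([r[2] for r in rows], dtype=float)
        f2 = np.array([r[3] for r in rows], dtype=float)
        z1 = (f1 - f1.mean()) / f1.std()
        z2 = (f2 - f2.mean()) / f2.std()
        out += [(r[0], r[1], round(a, 2), round(b, 2)) for r, a, b in zip(rows, z1, z2)]
    return out

for row in normalize(tokens):
    print(row)  # after rescaling, each talker's /i/ and /a/ occupy the same region
```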

Listeners not only use the talker information that is present in the speech signal on-line, but also integrate adaptations to phonetic categories over longer stretches and store these adapted representations in long-term memory for later use (Norris, McQueen, & Cutler, 2003). Norris et al. demonstrated that listeners can adapt to a speaker who consistently articulates a particular speech sound in an idiosyncratic manner. The researchers did this by exposing a group of listeners to spoken Dutch words and nonwords in which an ambiguous fricative (a sound midway between /s/ and /f/) replaced every /s/ at the end of 20 critical words (e.g., in radijs, “radish”; note that radijf is not a Dutch word). A second group heard the same ambiguous sound in words ending in /f/ (e.g., olijf, “olive”; olijs is not a Dutch word). Both groups could thus use lexical context to infer whether the ambiguous fricative was meant to be an /s/ or an /f/, but that context should lead the two groups to different results. Indeed, when both groups categorized sounds on an /s/–/f/ continuum following exposure, the group in which the ambiguous fricative had replaced /s/ categorized more ambiguous sounds as /s/, whereas the other group categorized more sounds as /f/. This finding suggests that the perceptual system can use lexical context to learn about a speaker’s idiosyncratic articulation, and that this learning affects prelexical processing later on. A recent fMRI study, using a similar paradigm, provided converging evidence for an effect of learning on prelexical processing by locating perceptual learning effects to the superior temporal cortex, which is thought to be critically involved in prelexical decoding of speech (Myers & Mesite, 2014). This kind of prelexical category adjustment can be guided not only by lexical context, but also by various other kinds of language-specific information, such as phonotactic regularities (Cutler, McQueen, Butterfield, & Norris, 2008), contingencies between acoustic features that make up a phonetic category (Idemaru & Holt, 2011), or sentence context (Jesse & Laakso, 2015).

A critical feature of this type of perceptual learning is that it entails phonological abstraction. Evidence for this comes from demonstrations that learning generalizes across the lexicon, from the words heard during initial exposure to new words heard during a final test phase (Maye, Aslin, & Tanenhaus, 2008; McQueen, Cutler, & Norris, 2006; Reinisch, Weber, & Mitterer, 2013; Sjerps & McQueen, 2010). If listeners apply what they have learned about the fricative /f/, for example, to the on-line recognition of other words that have an /f/ in them, this suggests first that listeners have abstract knowledge that /f/ is a phonological category and second that these abstract representations have a functional role to play in prelexical processing. Thus, although the nature of the unit of prelexical representation is still an open question, as discussed earlier, these data suggest that there is phonological abstraction prior to lexical access.

Several studies have investigated whether category recalibration is speaker-specific or speaker-independent by changing the speaker between the exposure and test phases. This work so far has produced mixed results, sometimes finding evidence of generalization across speakers (Kraljic & Samuel, 2006, 2007; Reinisch & Holt, 2014) and sometimes evidence of speaker specificity (Eisner & McQueen, 2005; Kraljic & Samuel, 2007; Reinisch, Wozny, Mitterer, & Holt, 2014). The divergent findings might be partly explained by considering the perceptual similarity between tokens from the exposure and test speakers (Kraljic & Samuel, 2007; Reinisch & Holt, 2014). When there is a high degree of similarity in the acoustic-phonetic properties of the critical segment, it appears to be more common that learning transfers from one speaker to another. In sum, there is thus evidence from a variety of sources that speaker-specific information in the signal affects prelexical processing, both by using the speaker information that is available online, and by reusing speaker-specific information that was stored previously.

Accents

Everybody has experienced regional or foreign accents that alter segmental and suprasegmental information so drastically that they can make speech almost unintelligible. However, although they are a further major source of variability in the speech signal, the way in which accents deviate from standard pronunciations is regular; that is, the unusual sounds and prosody tend to occur in a consistent pattern. Listeners can exploit this regularity and often adapt to accents quite quickly. Processing gains have been shown to emerge after exposure to only a few accented sentences, as an increase in intelligibility (Clarke & Garrett, 2004) or as a decrease in reaction times in a comprehension-based task (Weber, Di Betta, & McQueen, 2014).

An important question is whether the perceptual system adapts to an accent with each individual speaker, or whether an abstract representation of that accent can be formed that might benefit comprehension of novel talkers with the same accent. Bradlow and Bent (2008) investigated this question by looking at how American listeners adapt to Chinese-accented English. Listeners were exposed to Chinese-accented speech coming either from only one speaker or from several different speakers. Following exposure, generalization was assessed in an intelligibility task with Chinese-accented speech from an unfamiliar speaker. Intelligibility increased in both conditions during training, but evidence of generalization to the novel speaker was found only after exposure to multiple speakers. This pattern suggests that the perceptual system can form an abstract representation of an accent when the accent is shared between several different speakers, which can in turn affect how speech from other speakers with the same accent is processed. Learning also generalized across speech materials (different materials were used in training and at test), which is consistent with the notion that learned representations of speech patterns can affect perception at the prelexical level.

Continuous Speech Processes

Another aspect of variability tackled by the prelexical processor is that caused by continuous speech processes, including the coronal place assimilation process shown in Figure 1.1 (where the final segment of sun becomes [m]-like because of the following word-initial [m] of melted). Several studies have shown that listeners are able to recognize assimilated words correctly when the following context is available (Coenen, Zwitserlood, & Bölte, 2001; Gaskell & Marslen-Wilson, 1996, 1998; Gow, 2002; Mitterer & Blomert, 2003). Different proposals have been made about how prelexical processing could act to undo the effects of assimilation, including processes of phonological inference (Gaskell & Marslen-Wilson, 1996, 1998) and feature parsing (Gow, 2002; feature parsing is based on the observation that assimilation tends to be phonetically incomplete, such that, e.g., in the sequence sun melted the final segment of sun has some features of an [m] but also some features of an [n]). The finding that Dutch listeners who speak no Hungarian show similar EEG responses (i.e., mismatch negativity responses) to assimilated Hungarian speech stimuli to those of native Hungarian listeners (Mitterer, Csépe, Honbolygo, & Blomert, 2006) suggests that at least some forms of assimilation can be dealt with by relatively low-level, language-universal perceptual processes. In other cases, however, listeners appear to use language-specific phonological knowledge to cope with assimilation (e.g., Weber, 2001).
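The notion of phonological inference can be illustrated with a toy rule that maps an assimilated surface form back onto its possible underlying forms when the following context licenses assimilation. The sketch below (orthographic strings, a single rule) is a deliberately simplified illustration, not the Gaskell and Marslen-Wilson model itself.

```python
# Toy illustration of phonological inference for coronal place assimilation:
# a surface [m] before a labial may correspond to an underlying /n/, so both
# word-form candidates are kept in play.
LABIALS = {"m", "b", "p"}

def underlying_candidates(surface_word, next_word):
    """Candidate underlying forms for a surface word, given its right context."""
    candidates = [surface_word]
    if surface_word.endswith("m") and next_word and next_word[0] in LABIALS:
        candidates.append(surface_word[:-1] + "n")   # e.g., surface "sum" -> "sun"
    return candidates

print(underlying_candidates("sum", "melted"))   # ['sum', 'sun']
print(underlying_candidates("sum", "arrived"))  # ['sum']  (no labial context)
```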

There are other continuous speech processes, such as epenthesis (adding a sound that is not normally there, e.g., the optional insertion of the vowel /ə/ between the /l/ and /m/ of film in Scottish English), resyllabification (changing the syllabic structure; e.g., /k/ in look at you might move to the beginning of the syllable /kət/ when it would normally be the final sound of /lʊk/), and liaison (linking sounds; e.g., in some British English accents car is pronounced /ka/, but the /r/ resurfaces in a phrase like car alarm). Language-specific prelexical processes help listeners cope with these phenomena.

For instance, variability can arise due to reduction processes (where a segment is realized in a simplified way or may even be deleted entirely). It appears that listeners cope with reduction both by being sensitive to the fine-grained phonetic detail in the speech signal and through employing knowledge about the phonological contexts in which segments tend to be reduced (Mitterer & Ernestus, 2006; Mitterer & McQueen, 2009b).

Multimodal Speech Input

Spoken communication takes place predominantly in face-to-face interactions, and the visible articulators convey strong visual cues to the identity of prelexical segments. The primary networks for integrating auditory and visual speech information appear to be located around the temporoparietal junction, in posterior parts of the superior temporal gyrus, and in the inferior parietal lobule (supramarginal gyrus and angular gyrus; Bernstein & Liebenthal, 2014). The well-known McGurk effect (McGurk & MacDonald, 1976) demonstrated that auditory and visual cues are immediately integrated in segmental processing, by showing that a video of a talker articulating the syllable /ga/ combined with an auditory /ba/ results in the fused percept of /da/. The influence of visual processing on speech perception is not limited to facial information; text transcriptions of speech can also affect speech perception over time (Mitterer & McQueen, 2009a).

Visual cues can also drive auditory recalibration in situations where ambiguous auditory information is disambiguated by visual information: When perceivers repeatedly heard a sound that could be either /d/ or /b/, presented together with a video of a speaker producing /d/, their phonetic category boundary shifted in a way that was consistent with the information they received through lipreading, and the ambiguous sound was assimilated into the /d/ category. When the same ambiguous sound was presented with the speaker producing /b/, however, the boundary shift occurred in the opposite direction (Bertelson, Vroomen, & de Gelder, 2003; Vroomen & Baart, 2009). Thus, listeners can use information from the visual modality to recalibrate their perception of ambiguous speech input, in this case drawing on long-term knowledge about the co-occurrence of certain visual and acoustic cues.
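As a rough illustration of what such recalibration amounts to computationally, the sketch below treats the /b/–/d/ distinction as a single boundary on an arbitrary acoustic continuum that is nudged by each audiovisual exposure trial. The continuum scale, learning rate, and margin are assumptions made for this sketch, not parameters from Bertelson et al. (2003) or Vroomen and Baart (2009):

# Minimal sketch of visually guided recalibration as a shift in a phonetic
# category boundary along a /b/-/d/ continuum. All parameters are
# illustrative assumptions.

def classify(stimulus, boundary):
    """Tokens above the boundary are categorized as /d/, below as /b/."""
    return "d" if stimulus > boundary else "b"

def recalibrate(boundary, ambiguous, visual_label, rate=0.3, margin=1.0):
    """After hearing an ambiguous token while seeing the talker produce
    visual_label, nudge the boundary so the token sits inside the
    visually specified category."""
    # The boundary the listener would need for the ambiguous token to fall
    # comfortably in the lipread category.
    target = ambiguous - margin if visual_label == "d" else ambiguous + margin
    return boundary + rate * (target - boundary)

boundary = 5.0          # initial /b/-/d/ boundary on an arbitrary 0-10 scale
ambiguous_token = 5.0   # maximally ambiguous between /b/ and /d/

# Repeated exposure to the ambiguous token paired with visual /d/ ...
for _ in range(8):
    boundary = recalibrate(boundary, ambiguous_token, "d")

# ... moves the boundary below 5.0, so the same token is now heard as /d/.
print(round(boundary, 2), classify(ambiguous_token, boundary))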

Fast perceptual learning processes already modulate early stages of cortical speech processing. Kilian-Hütten et al. (Kilian-Hütten, Valente, Vroomen, & Formisano, 2011; Kilian-Hütten, Vroomen, & Formisano, 2011) have demonstrated that early acoustic-phonetic processing is already influenced by recently learned information about a speaker idiosyncrasy. Using the visually guided perceptual recalibration paradigm (Bertelson et al., 2003), regions of the primary auditory cortex (specifically, Heschl's gyrus and sulcus, extending into the planum temporale) could be identified whose activity pattern specifically reflected listeners' adjusted percepts after exposure to a speaker, rather than simply physical properties of the stimuli. This suggests not only a bottom-up mapping of acoustical cues to perceptual categories in the left auditory cortex, but also that the mapping involves the integration of previously learned knowledge within the same auditory areas, in this case knowledge coming from the visual system. Whether linguistic processing in the left auditory cortex can be driven by other types of information, such as speaker-specific knowledge from the right anterior stream, is an interesting question for future research.

Links Between Speech Perception and Production

The motor theory of speech perception was originally proposed as a solution to the variability problem (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; Liberman & Mattingly, 1985). Given the inherent variability of the speech signal and the flexibility of perceptual categories, the source of invariance may be found in articulatory representations instead. According to this view, decoding the speech signal requires recovering articulatory gestures through mental emulation of the talker's articulatory commands to the motor system. The motor theory received support following the discovery of the mirror neuron system (Fadiga, Craighero, & D'Ausilio, 2009; Galantucci et al., 2006) and from neuroscience research that shows effects on speech processing during disruption of motor systems (e.g., Meister, Wilson, Deblieck, Wu, & Iacoboni, 2007; Yuen, Davis, Brysbaert, & Rastle, 2010). However, the strong version of the theory, in which the involvement of speech motor areas in speech perception is obligatory, is not universally accepted (Hickok et al., 2011; Lotto, Hickok, & Holt, 2009; Massaro & Chen, 2008; Scott, McGettigan, & Eisner, 2009; Toni, de Lange, Noordzij, & Hagoort, 2008). The main arguments against motor theory are that lesions in the motor cortex do not result in comprehension deficits, that comprehension can occur in individuals who are unable to articulate, and that the motor cortex is not typically activated in fMRI studies using passive-listening tasks. Behavioral evidence against motor theory comes from an experiment on speech shadowing (Mitterer & Ernestus, 2008): Participants were not slower to repeat out loud a spoken stimulus when there was a gestural mismatch between the stimulus and the response than when there was a gestural match.

According to the contrasting auditory perspective, decoding the speech signal requires an analysis of acoustic cues that map onto multidimensional phonetic categories, mediated by general auditory mechanisms (Hickok & Poeppel, 2007; Holt & Lotto, 2010; Obleser & Eisner, 2009; Rauschecker & Scott, 2009). A purely auditory perspective, however, fails to account for recent evidence from transcranial magnetic stimulation (TMS) studies showing that disruption of (pre-)motor cortex can have modulatory effects on speech perception in certain situations (D'Ausilio, Bufalari, Salmas, & Fadiga, 2012; Krieger-Redwood, Gaskell, Lindsay, & Jefferies, 2013; Meister et al., 2007; Möttönen, Dutton, & Watkins, 2013). If motor systems are not necessary for speech perception, what might be the functionality that underlies these modulatory effects? It is noteworthy that such effects have been observed only at the phoneme or syllable level, that they appear to be restricted to situations in which the speech signal is degraded, and that they affect reaction times rather than accuracy (Hickok et al., 2011).

Although sensorimotor interactions in perception are not predicted by traditional auditory approaches, several neurobiological models of language processing have begun to account for perception–production links (Guenther, Ghosh, & Tourville, 2006; Hickok, 2012; Hickok et al., 2011; Rauschecker & Scott, 2009). From a speech production point of view, perceptual processes are necessary in order to establish internal models of articulatory sequences during language acquisition, as well as to provide sensory feedback for error monitoring. There is recent evidence from fMRI studies that the premotor cortex might facilitate perception, specifically under adverse listening conditions, because activity in motor areas has been linked to perceptual learning of different types of degraded speech (Adank & Devlin, 2010; Erb, Henry, Eisner, & Obleser, 2013; Hervais-Adelman, Carlyon, Johnsrude, & Davis, 2012). Such findings are consistent with the idea that motor regions provide an internal simulation that matches degraded speech input to articulatory templates, thereby assisting speech comprehension under difficult listening conditions (D'Ausilio et al., 2012; Hervais-Adelman et al., 2012), but direct evidence for this is lacking at present.

Summary

The prelexical segmental stage involves speech-specific processes that mediate between general auditory perception and word recognition by constructing perceptual representations that can be used during lexical access. The two main computational challenges approached at this stage are the segmentation and variability problems. We have argued that listeners use multiple prelexical mechanisms to deal with these challenges, including the detection of phonotactic constraints for lexical segmentation, processes of rate and talker normalization and of phonological inference, and engagement of speech production machinery (at least under adverse listening conditions). The two most important prelexical mechanisms, however, appear to be abstraction and adaptation. The central goal of the prelexical processor is to map from the episodic detail of the acoustic input onto abstract perceptual categories in order to be able to cope with the variability problem and hence to facilitate lexical access. This mapping process clearly seems to be adaptive: Listeners tune in to aspects of the current listening situation (e.g., who is/are talking, how fast they are talking, whether they have a foreign or regional accent). Studying perceptual learning in particular has been valuable as a window into how prelexical perceptual representations are maintained and updated.

Prelexical Suprasegmental Processing

As we have already argued, speech perception depends on the extraction of suprasegmental as well as segmental information. Suprasegmental material is used by listeners to help them solve the lexical-embedding, variability, and segmentation problems. As with prelexical segmental processing, abstraction and adaptation are the two main mechanisms that allow listeners to solve these problems.

Words can have the same segments but differ suprasegmentally. One way in which the listener copes with the lexical-embedding problem (the fact that words sound like many other words) is thus to use these fine-grained suprasegmental differences to disambiguate between similar-sounding words. Italian listeners, for instance, can use the relative duration of segments to distinguish between alternative lexical hypotheses that have the same initial sequence of segments but different syllabification (e.g., the syllable-final /l/ of sil.vestre, "sylvan," differs minimally in duration from the syllable-initial /l/ of si.lenzio, "silence"), and fragment priming results suggest that Italians can use this acoustic difference to disambiguate the input even without hearing the following disambiguating segments (i.e., the /v/ or /ε/; Tabossi, Collina, Mazzetti, & Zoppello, 2000).

English listeners use similar subtle durational cues to syllabic structure to disambiguate oronyms (tulips vs. two lips; Gow & Gordon, 1995); Dutch listeners use /s/ duration to distinguish between, for example, een spot, "a spotlight," and eens pot, "once jar" (Shatzman & McQueen, 2006b); and French listeners use small differences in the duration of consonants to distinguish sequences with liaison (e.g., the word-final /r/ of dernier surfacing in dernier oignon, "last onion") from matched sequences without liaison (e.g., dernier rognon, "last kidney"; Spinelli, McQueen, & Cutler, 2003).

Durational differences across multiple segments also signal suprasegmental structure. Monosyllabic words, for example, tend to be longer than the same segmental sequence in a polysyllabic word (e.g., cap is longer on its own than in captain; Lehiste, 1972). Experiments using a variety of tasks, including cross-modal priming, eye tracking, and mouse tracking, have shown that listeners use these durational differences during word recognition, and thus avoid recognizing spurious lexical candidates (such as cap in captain; Blazej & Cohen-Goldberg, 2015; Davis, Marslen-Wilson, & Gaskell, 2002; Salverda, Dahan, & McQueen, 2003). It appears that these effects reflect the extraction of suprasegmental structure because they are modulated by cues to other prosodic structures. Dutch listeners in an eye-tracking study looked more at a branch (a tak) when hearing the longer word taxi if the cross-spliced tak came from an original context where the following syllable was stressed (e.g., /si/ in pak de tak sinaasappels, "grab the branch of oranges") than if it was unstressed (/si/ in pak de tak citroenen, "grab the branch of lemons"; Salverda et al., 2003).
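As a toy illustration of how such durational detail could weight competing lexical hypotheses, the Python sketch below scores "cap as a word on its own" against "cap as the start of captain" given an observed syllable duration. The Gaussian duration model and the millisecond values are assumptions invented for this sketch, not measurements from the studies cited:

# Minimal sketch of duration-based weighting of lexical candidates.
# The duration model below is an illustrative assumption.
from math import exp, pi, sqrt

# Assumed mean duration (ms) and spread of [kaep] as a word on its own
# versus word-initially in a longer word.
DURATION_MODEL = {
    "cap (monosyllabic word)": (300.0, 40.0),
    "cap- (onset of 'captain')": (230.0, 40.0),
}

def gaussian(x, mean, sd):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def candidate_support(observed_ms):
    """Normalized support for each lexical hypothesis given the duration."""
    likelihoods = {cand: gaussian(observed_ms, m, sd)
                   for cand, (m, sd) in DURATION_MODEL.items()}
    total = sum(likelihoods.values())
    return {cand: lik / total for cand, lik in likelihoods.items()}

# A long [kaep] favours the monosyllabic word; a short one favours "captain".
print(candidate_support(300.0))
print(candidate_support(230.0))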

Listeners also make use of cues to larger suprasegmental structures to disambiguate between words. The presence of the onset of a larger suprasegmental structure (e.g., an intonational phrase) affects the pronunciation of the segment that happens to be at that boundary (typically by making it longer and louder). This information can be used during lexical form processing to disambiguate between several word candidates (Keating, Cho, Fougeron, & Hsu, 2003). Cho, McQueen, and Cox (2007) examined temporarily ambiguous sequences in English such as bus tickets, where words such as bust straddle the word boundary. The word bus was easier to recognize in the phrase bus tickets if it had been taken from the utterance "When you get on the bus, tickets should be shown to the driver" (in which the /t/ was prosodically strengthened) than if it had been taken from "John bought several bus tickets for his family" (in which the /t/ was not strengthened). Christophe, Peperkamp, Pallier, Block, and Mehler (2004) found a similar effect in French. Words such as chat, "cat," were harder to disambiguate from chagrin, "grief," in the sequence chat grincheux, "grumpy cat," if the sequence was part of a single phrase than if a phrase boundary occurred between the two words.

Listeners also use suprasegmental cues to the lexical stress patterns of words during word recognition. These cues include pitch, amplitude, and duration differences between stressed and unstressed syllables. Dutch (Cutler & van Donselaar, 2001; van Donselaar, Koster, & Cutler, 2005) and Spanish (Soto-Faraco, Sebastián-Gallés, & Cutler, 2001) listeners are sensitive to differences between sequences that are segmentally identical but differ in stress, and use those differences to constrain lexical access (e.g., Dutch listeners can distinguish between voor taken from initially stressed voornaam, "first name," and voor taken from finally stressed voornaam, "respectable"; Cutler & van Donselaar, 2001). Dutch listeners use the stress information as soon as it is heard during word recognition: Eye-tracking data show disambiguation between, for example, oktober, "October" (stress on the second syllable), and octopus, "octopus" (stress on the first syllable), before the arrival of unambiguous segmental information (the /b/ and /p/ in this example; Reinisch, Jesse, & McQueen, 2010). Italian listeners show similar rapid use of stress information in on-line word recognition (Sulpizio & McQueen, 2012).
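As a toy illustration of such incremental use of stress cues, the following sketch prunes a two-word mini-lexicon syllable by syllable. The lexicon, the syllable coding, and the stress flags are assumptions made up for this example, loosely inspired by the oktober/octopus case:

# Toy sketch of incremental pruning of lexical candidates using stress.
# Each lexical entry lists (syllable, carries_stress) pairs; the coding
# is an illustrative assumption.

LEXICON = {
    "oktober": [("ok", False), ("to", True), ("ber", False)],
    "octopus": [("ok", True), ("to", False), ("pus", False)],
}

def viable_candidates(heard):
    """Words whose initial syllables match what has been heard so far,
    in both segmental form and stress."""
    return [word for word, sylls in LEXICON.items()
            if sylls[:len(heard)] == heard]

# After only the first, stressed syllable, stress already rules out
# "oktober", before the disambiguating /b/ vs. /p/ has been heard.
print(viable_candidates([("ok", True)]))                    # ['octopus']
# Segmental information alone would still leave both candidates.
print([w for w, s in LEXICON.items() if s[0][0] == "ok"])   # ['oktober', 'octopus']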

Interestingly, however, English listeners tend to be less sensitive to stress cues than Dutch, Spanish, and Italian listeners; across a variety of tasks, stress effects are weak and can be hard to find in English (Cooper, Cutler, & Wales, 2002; Fear, Cutler, & Butterfield, 1995; Slowiaczek, 1990). This appears to be because stress in English is primarily cued by differences between segments (the difference between full vowels and the reduced vowel schwa) rather than by suprasegmental differences. This means that English listeners are usually able to distinguish between words using segmental information alone and hence can afford to ignore the suprasegmental information (Cooper et al., 2002; see Cutler, 2012, for further discussion). English participants (Scarborough, Keating, Mattys, Cho, & Alwan, 2009) and Dutch participants (Jesse & McQueen, 2014) are also sensitive to visual cues to lexical stress (e.g., chin or eyebrow movements).

Obviously, suprasegmental stress information can be used in speech perception only in a language that has lexical stress. Similarly, other types of suprasegmental cues can be used only in languages that make lexical distinctions based on those cues, but the cross-linguistic evidence suggests that such cues are indeed used to constrain word recognition. Speakers of languages with lexical tone, such as Mandarin and Cantonese, for example, use tone information in word recognition. Note that tone is sometimes regarded as segmental, since a vowel with one f0 pattern (e.g., a falling tone) can be considered to be a different segment from the same vowel with a different pattern (e.g., a level tone). We consider tone to be suprasegmental here, however, because it concerns an acoustic feature, pitch, which signals other suprasegmental distinctions (e.g., lexical stress). Lexical priming studies in Cantonese suggest, for example, that tonal information modulates word recognition (Cutler & Chen, 1997; Lee, 2007; Ye & Connine, 1999; Yip, 2001). Likewise, pitch-accent patterns in Japanese (based on high [H] and low [L] syllables, again cued by differences in the f0 contour) are picked up by Japanese listeners; for example, they can distinguish /ka/ taken from baka [HL] from /ka/ taken from gaka [LH] (Cutler & Otake, 1999), and accent patterns are used to distinguish between words (Cutler & Otake, 1999; Sekiguchi & Nakajima, 1999).

The data previously reviewed all make the same general point about how listeners solve the lexical-embedding problem. Listeners cope with the fact that words sound like other words in part by using suprasegmental disambiguating information. Suprasegmental prelexical processing thus entails the extraction of this information so that it can be used in lexical processing. This can also be considered to be a way in which listeners solve the variability problem. Segments have different physical realizations in different prosodic and intonational contexts (e.g., they are longer, or louder, or have higher pitch). The suggestion here is that this kind of variability is dealt with by suprasegmental prelexical processes, which use this information to build phonologically abstract prosodic structures that are then used to constrain word recognition.

As with segmental prelexical processing, therefore, abstraction is a key mechanism that allows listeners to cope with variability. Word-learning experiments provide evidence for suprasegmental abstraction. In Shatzman and McQueen (2006a), Dutch listeners were taught pairs of novel words, such as bap and baptoe, that were analogues of real pairs such as cap and captain. The listeners had to learn to associate the new words with nonsense shapes. Critically, during learning, the durational difference between the monosyllabic novel words and the same syllable in the longer words was neutralized. In an eye-tracking test phase, however, the syllables had their normal duration (bap was longer than the bap in baptoe). Even though the listeners had never heard these forms before, effects of the durational differences (analogous to those found in eye tracking with real words) were observed (e.g., listeners made more fixations to the bap nonsense shape when the input syllable was longer than when it was shorter). This suggests that the listeners had abstract knowledge about the durational properties of monosyllabic and polysyllabic words and could bring that knowledge to bear during word recognition the first time they heard the novel words with those properties. A word-learning experiment with a similar design (Sulpizio & McQueen, 2012) suggests that Italian listeners have abstract suprasegmental knowledge about lexical stress (about the distribution of lexical stress patterns in Italian, and about the acoustic-phonetic cues that signal stress), and that they too can use that knowledge during online recognition of novel words, despite never having heard those words with those stress cues before.

A perceptual learning experiment using the lexically guided retuning paradigm of Norris et al. (2003) also provides evidence for suprasegmental abstraction. Mandarin listeners exposed to syllables with ambiguous pitch contours, in contexts that biased the interpretation of the ambiguous syllables toward either tone 1 or tone 2, subsequently categorized more stimuli on tone 1–tone 2 test continua in a way that was consistent with the exposure bias (Mitterer, Chen, & Zhou, 2011). This tendency was almost as large for new test words as for words that had been heard during exposure. This generalization of learning indicates that the listeners had adjusted phonologically abstract knowledge about lexical tone. Generalization of perceptual learning across the lexicon about the pronunciation of syllables also indicates that listeners have abstract knowledge about suprasegmental structure (Poellmann et al., 2014).

Suprasegmental information also has a role to play in solving the segmentation problem. The studies previously reviewed on uptake of fine-grained suprasegmental cues (Blazej & Cohen-Goldberg, 2015; Cho et al., 2007; Christophe et al., 2004; Davis et al., 2002; Gow & Gordon, 1995; Salverda et al., 2003; Spinelli et al., 2003) can all also be considered as evidence for the role of these cues in segmentation. The fine-grained detail is extracted prelexically and signals word boundaries.

But there is also another important way in which suprasegmental prelexical processing supports lexical segmentation. The rhythmic structure of speech can signal the location of word boundaries (Cutler, 1994). Languages differ rhythmically, and the segmentation procedures vary across languages accordingly. In languages such as English and Dutch, rhythm is stress-based, and strong syllables (i.e., those with full vowels, which are distinct from the reduced vowels in weak syllables) tend to mark the locations of the onsets of new words in the continuous speech stream (Cutler & Carter, 1987; Schreuder & Baayen, 1994). Listeners of such languages are sensitive to the distinction between strong and weak syllables (Fear et al., 1995), and use this distinction to constrain spoken-word recognition, as measured by studies examining word-boundary misperceptions (Borrie, McAuliffe, Liss, O'Beirne, & Anderson, 2013; Cutler & Butterfield, 1992; Vroomen, van Zon, & de Gelder, 1996) and in word-spotting tasks (Cutler & Norris, 1988; McQueen, Norris, & Cutler, 1994; Norris, McQueen, & Cutler, 1995; Vroomen et al., 1996; Vroomen & de Gelder, 1995). Cutler and Norris (1988), for example, compared word-spotting performance for target words such as mint in mintayf (where the second syllable was strong) and mintef (where the second syllable was weak). They found poorer performance in sequences such as mintayf, and argued that this was because the strong syllable (tayf) indicated that there was likely to be a new word starting at the /t/, which then made it harder to spot mint.
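A minimal sketch of this stress-based segmentation heuristic, assuming a toy representation in which each syllable has already been classified as strong or weak (the data structure and the classification are invented for illustration; the boundary-at-strong-syllables rule is the one described above):

# Minimal sketch of stress-based segmentation: hypothesize a word boundary
# at the onset of every strong syllable (Cutler & Norris, 1988).

def segment(syllables):
    """Group (form, is_strong) syllables into candidate words, starting a
    new candidate at each strong syllable."""
    chunks = []
    for form, is_strong in syllables:
        if is_strong or not chunks:
            chunks.append([form])      # strong syllable: likely word onset
        else:
            chunks[-1].append(form)    # weak syllable: attach to current word
    return chunks

# "mintayf": both syllables strong, so a boundary is posited before "tayf",
# which splits the embedded target "mint" and makes it harder to spot.
print(segment([("min", True), ("tayf", True)]))   # [['min'], ['tayf']]
# "mintef": the weak second syllable attaches to the first chunk, so the
# embedded "mint" stays within a single candidate.
print(segment([("min", True), ("tef", False)]))   # [['min', 'tef']]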

Languages with different rhythms are segmented in different ways. Languages such as French, Catalan, and Korean have rhythm based on the syllable, and speakers of these languages appear to use syllable-based segmentation procedures (Content, Meunier, Kearns, & Frauenfelder, 2001; Cutler, Mehler, Norris, & Segui, 1986, 1992; Kim, Davis, & Cutler, 2008; Kolinsky, Morais, & Cluytens, 1995; Sebastián-Gallés, Dupoux, Segui, & Mehler, 1992). Likewise, languages such as Japanese and Telugu have rhythm based on the mora, and speakers of these languages appear to use mora-based segmentation procedures (Cutler & Otake, 1994; Murty, Otake, & Cutler, 2007; Otake, Hatano, Cutler, & Mehler, 1993). In spite of these differences across languages, what appears to be common is that segmentation uses rhythm.

Summary

The prelexical suprasegmental stage acts in parallel with the prelexical segmental stage to construct speech-specific representations of suprasegmental structures that can be used to constrain and assist lexical access. Multiple mechanisms at this stage of processing help the listener to solve all three major computational problems. As with prelexical segmental processing, the key mechanisms in suprasegmental processing are abstraction and adaptation. There has been relatively little work using neuroscientific methods to address the nature of prelexical suprasegmental processing.

Lexical Form Processing

Although it is broadly established that prelexical processes and representations are instantiated in the superior temporal lobes, there is less consensus about the localization of lexical processing (see, e.g., Price, 2012). In some neurobiological models, the primary pathway from prelexical processes to word forms and meaning is along the antero-ventral stream (DeWitt &