
Long-term music training tunes how the brain temporally binds signals from multiple senses

HweeLing Lee1 and Uta Noppeney

Cognitive Neuroimaging Group, Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany

Edited by Robert J. Zatorre, McGill University, Montreal, QC, Canada, and accepted by the Editorial Board October 21, 2011 (received for review September 16, 2011)

Author contributions: H.L. and U.N. designed research; H.L. performed research; H.L. and U.N. analyzed data; and H.L. and U.N. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. R.J.Z. is a guest editor invited by the Editorial Board.

1To whom correspondence should be addressed. E-mail: [email protected].

See Author Summary on page 20295.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1115267108/-/DCSupplemental.

Practicing a musical instrument is a rich multisensory experience involving the integration of visual, auditory, and tactile inputs with motor responses. This combined psychophysics–fMRI study used the musician's brain to investigate how sensory-motor experience molds temporal binding of auditory and visual signals. Behaviorally, musicians exhibited a narrower temporal integration window than nonmusicians for music but not for speech. At the neural level, musicians showed increased audiovisual asynchrony responses and effective connectivity selectively for music in a superior temporal sulcus-premotor-cerebellar circuitry. Critically, the premotor asynchrony effects predicted musicians' perceptual sensitivity to audiovisual asynchrony. Our results suggest that piano practicing fine tunes an internal forward model mapping from action plans of piano playing onto visible finger movements and sounds. This internal forward model furnishes more precise estimates of the relative audiovisual timings and hence, stronger prediction error signals specifically for asynchronous music in a premotor-cerebellar circuitry. Our findings show intimate links between action production and audiovisual temporal binding in perception.

audiovisual synchrony | multisensory integration | sensorimotor learning | crossmodal integration | experience-dependent plasticity

Practicing a musical instrument is a rich multisensory experience involving the integration of visual, auditory, and tactile inputs with motor responses. The musician's brain, thus, provides an ideal model to study experience-dependent plasticity in humans (1, 2).

Previous research in musicians has focused on neural plasticity affecting unisensory and motor processing. Little is known about how musical expertise alters the integration of inputs from multiple senses. Because musical performance requires precise timing, musical expertise may specifically modulate the temporal binding of sensory signals. Given the variability in physical and neural transmission times, sensory signals do not have to be precisely synchronous but must co-occur within a temporal window that flexibly adapts to the temporal statistics of the sensory inputs as a consequence of music (3) or audiovisual training (4). At the neural level, audiovisual (a)synchrony processing relies on a widespread neural system encompassing subcortical, primary sensory, higher-order association, cerebellar, and premotor areas (5–8).

This study used the musician's brain as a model to investigate how long-term sensory-motor experience (i.e., piano practicing) shapes the neural processes underlying temporal binding of auditory and visual signals. We presented subjects with synchronous and asynchronous speech and piano music as two stimulus classes that are both characterized by a rich hierarchical temporal structure but linked to different motor effectors (mouth vs. hand). Comparing the effect of musical expertise on synchrony perception of speech and music allowed us to dissociate generic and context-specific neural mechanisms by which piano practicing fine tunes audiovisual temporal binding and synchrony perception.

Generic mechanisms of musical expertise may rely on experience-driven plasticity affecting sensory and, particularly, auditory processing. Brainstem responses in musicians relative to nonmusicians have recently been shown to be faster, larger, and more reliable when encoding the periodicity of speech and music (9, 10). Importantly, the enhanced auditory processing skills transferred from music to speech (11). Hence, if music training induces a general sensitization to audiovisual temporal (mis)alignment, we would expect a narrower temporal integration window and increased neural audiovisual (a)synchrony effects along the auditory processing hierarchy in musicians for both music and speech.

Context-specific mechanisms of musical expertise may rely on the formation of internal forward models that are fine tuned to specific motor tasks and effectors. Internal forward models have been invoked as a mechanism not only for motor control but also for motor and perceptual timing in the unisensory domains (12–14). They are learned by error feedback in interactions with the environment and are thought to be instantiated in a cortico-cerebellar circuitry. Specifically, piano practicing may fine tune an internal forward model that maps from the motor plan of piano playing onto its sensory consequences (i.e., the visible finger movements and concurrent auditory sounds). Because piano playing generates sensory signals in multiple modalities, the internal forward model indirectly also furnishes predictions about the relative timings of the auditory and visual signals, leading to a narrower temporal binding window. We would, therefore, expect audiovisual asynchronous stimuli that violate the model's temporal predictions to elicit an error signal within this cortico-cerebellar circuitry selectively for music and not for speech, which relies on different motor effectors and sensory-motor transformations (12–15).
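To make this hypothesis explicit, the following schematic (our illustrative notation; the paper states the idea only verbally) writes the forward model as a mapping f from the current state x_t and the motor plan u_t to predicted sensory consequences via an output mapping g:

```latex
% Schematic forward model for audiovisual temporal prediction (illustrative notation only)
\begin{aligned}
  (\hat{s}^{A}, \hat{s}^{V}) &= g\bigl(f(x_t, u_t)\bigr)
    && \text{predicted sound and visible movement for motor plan } u_t \\
  \widehat{\Delta}_{AV} &= \hat{\tau}_{A} - \hat{\tau}_{V}
    && \text{predicted audiovisual lag (difference of predicted onset times)} \\
  e &= \Delta^{\mathrm{obs}}_{AV} - \widehat{\Delta}_{AV}
    && \text{error evoked by temporally misaligned input}
\end{aligned}
```

On this reading, practice reduces the uncertainty of the predicted lag for music, so the same physical offset yields a larger standardized prediction error in musicians, consistent with their narrower behavioral integration window for music.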

Results

Eighteen musicians and nineteen nonmusicians participated in the psychophysics study [before functional MRI (fMRI)] and the fMRI study. During the psychophysics study, subjects explicitly judged the audiovisual synchrony of speech sentences and piano music at 13 levels of audiovisual stimulus onset asynchrony (AV-SOA; ±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). From the proportion of synchronous responses, we estimated the temporal integration window for each subject. In the fMRI study (2–8 wk later), subjects were presented with the same set of speech and music material. The stimuli were presented synchronously and asynchronously with a temporal offset of ±240 ms (Fig. S1). This AV-SOA level was associated with an average proportion of synchronous responses of 33.6% across subjects and stimulus classes in our psychophysics study (Fig. 1).



During the fMRI study, subjects passively perceived the audiovisual stimuli, which allowed us to evaluate automatic (a)synchrony effects in motor, premotor, and prefrontal regions unconfounded by motor responses and task-induced processes (e.g., response selection demands). Inside the scanner, subjects' performance (e.g., fixation) was monitored using eye tracking to ensure that they attended to the visual and auditory stimulus components (SI Results, Eye Movement Monitoring).

Psychophysics Experiment. For each subject, psychometric functions were estimated separately for speech and music from the proportion of synchronous responses at each AV-SOA level [SI Experimental Procedures, Behavioral Analysis: Psychophysics Study (Before fMRI Study)]. The audiovisual temporal integration window was defined as the integral of the fitted psychometric function bounded by ±360 ms. As shown in Fig. 1, the temporal integration window was narrower for musicians than nonmusicians for music [musicians (mean ± SEM): 1.10 ± 0.07; nonmusicians: 1.54 ± 0.09] but not for speech (musicians: 1.37 ± 0.07; nonmusicians: 1.47 ± 0.06). Indeed, this impression was statistically validated in a mixed-design ANOVA of the integral with stimulus class (music vs. speech) as within-subject factor and group (musicians vs. nonmusicians) as between-subject factor, showing a main effect of group [F(1,35) = 8.67, P < 0.01] and stimulus class [F(1.0,35.0) = 5.26, P < 0.05] and a group by stimulus class interaction [F(1.0,35.0) = 13.2, P = 0.001]. Post hoc testing confirmed that musicians exhibited a narrower temporal integration window than nonmusicians for music [t(35) = 3.92, P < 0.001] but not for speech [t(35) = 1.08, P = 0.14]. Furthermore, paired-samples t tests comparing the temporal integration windows for speech and music in each group showed that musicians displayed a narrower temporal integration window for music relative to speech [t(18) = 4.29, P < 0.001], whereas the temporal integration windows for speech and music did not differ in nonmusicians [t(18) < 1, nonsignificant (n.s.)].

Collectively, piano practicing narrows the temporal integration window significantly only for music and not for speech. These results show that piano practicing employs a context-specific mechanism to fine tune audiovisual temporal binding and synchrony perception.

fMRI Experiment. (A)synchrony system for music and speech. We first identified candidate regions that are sensitive to the temporal (mis)alignment of the audiovisual signals. Asynchronous relative to synchronous conditions (pooled over musicians and nonmusicians) increased activation in a widespread neural system encompassing bilateral superior/middle temporal gyri/sulci, bilateral occipital and fusiform gyri, left premotor cortex, and bilateral cerebellar cortices. This asynchrony-sensitive system was largely shared by music and speech (Fig. S2 and Tables S1 and S2). Indeed, the direct comparison of asynchrony effects for speech and music [i.e., the interaction between audiovisual (a)synchrony and stimulus class] did not reveal any asynchrony effects that were selective for either music or speech.

Effect of musical expertise on synchrony processing. We then investigated whether and how musical expertise shapes the neural processes underlying audiovisual synchrony perception. Specifically, we expected piano practicing to mold audiovisual synchrony processing for music. Separately for speech and music, we, therefore, identified asynchrony effects that are (i) common [i.e., a (conjunction-null) conjunction analysis] and (ii) different [i.e., the interaction between audiovisual (a)synchrony and group] for musicians and nonmusicians.

For speech, musicians (M+) and nonmusicians (M−) showed common asynchrony effects in bilateral posterior superior temporal sulci/gyri (STS/STG) and left cerebellum, with a nonsignificant asynchrony effect in right cerebellum (x = +22, y = −74, z = −38, z score = 2.56) (Fig. 2 A and B and Table S3). However, no asynchrony effects were observed that differed between musicians and nonmusicians. These results suggest that piano practicing does not significantly affect (a)synchrony processing in speech perception.

For music, we identified common asynchrony effects for musicians (M+) and nonmusicians (M−) in left extrastriate cortex (Fig. 3 A and C and Table S3). Crucially, musicians (M+) relative to nonmusicians (M−) showed enhanced asynchrony effects in left superior precentral sulcus (anterior premotor cortex), right posterior STS/middle temporal gyrus (MTG), and left cerebellum (Fig. 3B and Table 1). As shown in the percent signal change plots, asynchrony effects for music in the right posterior STS/MTG emerged primarily for musicians (Fig. 3C). More specifically, the right posterior STS/MTG showed a robust asynchrony effect in musicians (z score = 4.0) and a less reliable synchrony effect (z score = 2.4) in nonmusicians. In the neighboring voxel (x = +60, y = −40, z = −2), the synchrony effect for nonmusicians was negligible (z score = 1.6), whereas the asynchrony effect in musicians was even more robust (z score = 4.4).

Fig. 1. The psychometric functions for speech (Left) and music (Right) in musicians (black, M+) and nonmusicians (gray, M−) from the psychophysical experiment before the fMRI study.


By contrast, in the case of speech, the right posterior STS/MTG showed asynchrony effects for both musicians and nonmusicians. This activation profile highlights the role of prior sensory-motor experience (available for speech in both groups but for music only in musicians) in tuning the neural systems involved in automatic audiovisual asynchrony detection.

A similar profile to that in the right posterior STS/MTG was also observed in the left anterior premotor cortex. In the anterior premotor cortex (left superior precentral sulcus), asynchrony effects for music were strongly modulated by musical expertise and amplified for musicians. Surprisingly, the left anterior premotor cortex exhibited a cross-over interaction: the most anterior premotor cortex showed activation increases for asynchronous stimuli in musicians but for synchronous stimuli in nonmusicians. Additional exploration of the activation in musicians and nonmusicians revealed that (i) the asynchrony effects in musicians were located in the posterior and anterior portions of premotor cortex, whereas (ii) the synchrony effects in nonmusicians extended from the superior frontal gyrus as part of the deactivation network. Because the synchrony effects in the superior frontal gyrus were not significant given our statistical criteria, they are not discussed further.

Collectively, our fMRI and behavioral results indicate that piano practicing shapes automatic audiovisual temporal binding by a context-specific neural mechanism selectively for music and not for speech. Indeed, a three-way interaction confirmed that the modulation of the asynchrony effects by musical expertise was greater for music than speech (at P < 0.001, uncorrected) in left superior precentral sulcus (x = −42, y = +20, z = +50, z score = 4.2), cerebellum (x = −32, y = −64, z = −38, z score = 4.0), and right posterior STS/MTG (x = +62, y = −40, z = −4, z score = 3.7).

Relationship between subject-specific perceptual and neural asynchrony effects. We investigated whether subjects' perceptual sensitivity to audiovisual asynchrony, as measured in the psychophysical study, predicted their individual asynchrony-induced activation enhancement separately for speech and music (i.e., the contrast estimate pertaining to asynchronous − synchronous conditions for speech or music; referred to as the neural asynchrony effect).

Fig. 2. Asynchrony effects for speech that are common in nonmusicians (M−) and musicians (M+). (A) Asynchrony effects for speech averaged across both groups (yellow) and common in both groups (green) are rendered on a template brain. (B Left) Asynchrony effects for speech that are common in both groups are displayed on a coronal slice of a normalized structural image (averaged across subjects). (B Center and Right) Fitted event-related BOLD responses (lines) and peristimulus time histograms (markers) at the given coordinate are displayed as a function of poststimulus time (PST; averaged over sessions and subjects). Insets show contrast estimates (across-subjects mean ± SEM) of the asynchrony (async − sync) effect in arbitrary units (corresponding to percentage of whole-brain mean) for musicians (black, M+) and nonmusicians (gray, M−). Z scores pertain to the comparison between M+ and M−. A positive z score indicates that the asynchrony effect is greater in musicians relative to nonmusicians (and vice versa). (C Left) Neural asynchrony effects for speech that are significantly predicted by subjects' perceptual asynchrony sensitivity. (C Right) Scatter plot depicting the regression of neural asynchrony effects for speech on perceptual asynchrony sensitivity in musicians (black, M+) and nonmusicians (gray, M−).


To relate perceptual and neural effects more closely, we determined each subject's perceptual asynchrony sensitivity as the difference in the proportion of synchronous responses for synchronous − asynchronous conditions, separately for speech and music. The regression analysis was constrained to all voxels showing an asynchrony effect for speech [respectively (resp.), for music] at P < 0.001, uncorrected (Experimental Procedures, Search Volume Constraints).

For speech, the perceptual asynchrony sensitivity significantly predicted subjects' neural asynchrony effects in left cerebellum in both musicians and nonmusicians (Table 2).

Fig. 3. Asynchrony effects for music that are common (A) and distinct (B) in nonmusicians (M−) and musicians (M+). (A) Asynchrony effects for music averaged across both groups (yellow) and common in both groups (green) are rendered on a template brain. (B) Asynchrony effects for music that are enhanced for musicians relative to nonmusicians are rendered on a template brain. (C Left) Asynchrony effects for music that are selective for musicians are displayed on a coronal slice of a normalized structural image (averaged across subjects). (C Center and Right) Fitted event-related BOLD responses (lines) and peristimulus time histograms (markers) at the given coordinate are displayed as a function of poststimulus time (PST; averaged over sessions and subjects). Insets show contrast estimates (across-subjects mean ± SEM) of the asynchrony (async − sync) effect in arbitrary units (corresponding to percent of whole-brain mean) for musicians (black, M+) and nonmusicians (gray, M−). Z scores pertain to the comparison between M+ and M−. A positive z score indicates that the asynchrony effect is greater in musicians relative to nonmusicians (and vice versa).

Table 1. Asynchrony effects that are modulated by musical expertise [i.e., the interaction between audiovisual (a)synchrony and musical expertise]

Brain region: cluster size; MNI coordinates (x, y, z); z score (peak); P value

Asynchrony effects for speech that are enhanced in musicians (M+ > M− for async > sync for speech): NIL

Asynchrony effects for music that are enhanced in musicians (M+ > M− for async > sync for music):
R. posterior STS/middle temporal gyrus: 73; (62, −40, −4); 4.5; P = 0.03
L. superior precentral sulcus/L. premotor: 81; (−42, 20, 50); 4.4; P = 0.04
L. cerebellum (Crus II/VIIb): 182; (−32, −60, −38); 4.3

P values are corrected at the peak level for multiple comparisons within the search volume of interest (see Experimental Procedures, Search Volume Constraints).


In other words, the more accurately subjects discriminated between synchronous and asynchronous conditions in the psychophysical study, the stronger their asynchrony effects in left cerebellum during speech perception (Fig. 2C).

For music, the perceptual asynchrony sensitivity significantly predicted the neural asynchrony effects in the left premotor cortex in musicians only but not in nonmusicians (more specifically, we observed a significant interaction; i.e., a change in regression slopes) (Fig. 4E and Table 2). Hence, this analysis provided additional evidence that music training influences audiovisual synchrony perception by a context-specific mechanism that does not generalize to speech.

Interestingly, for both speech and music, the asynchrony effects that were predicted by subjects' perceptual asynchrony sensitivity were found in left cerebellum and premotor cortex (i.e., two brain areas that are traditionally associated with higher-order motor processing like motor planning and sequencing and less so with sensory processing) (16). However, perhaps surprisingly, the asynchrony effects were consistently more pronounced in the left cerebellar hemisphere (i.e., ipsilateral to the premotor activation), although they were observed bilaterally at a lower statistical threshold of significance. This response profile might be explained by the specific role of the left cerebellum in temporal processing within the millisecond range, which is particularly relevant for audiovisual asynchrony detection in the current paradigm (17, 18).

Temporal processing along the motor hierarchy. To further investigate the role of the motor system in audiovisual temporal processing, we characterized the anatomical relation of the following three effects in motor areas: (i) activation induced by processing audiovisual speech or music action sequences relative to fixation (i.e., music + speech in M+ + M− > fixation), (ii) asynchrony effects pooled over stimulus class and group (i.e., asynchronous > synchronous for music + speech in M+ + M−), and (iii) asynchrony effects for music that are modulated by musical expertise (i.e., asynchronous > synchronous for music for M+ > M−). Collectively, these three effects identified a processing hierarchy within the motor system.

In fact, the asynchrony effects emerged progressively along the motor cortical hierarchy when moving from left posterior primary to left anterior premotor cortices (Fig. 4). Thus, the primary motor cortex was activated generally by music and speech but did not show a significant asynchrony effect for any stimulus class (Fig. 4B) (M+ + M− for music + speech > fixation; x = −28, y = −22, z = +64, z score = 5.3, P < 0.05, corrected for the entire brain). The posterior premotor cortex was sensitive to audiovisual asynchrony and showed increased activation for asynchronous relative to synchronous conditions averaged across groups and stimulus classes (Fig. 4C) (asynchronous > synchronous for music + speech in M+ + M−; x = −34, y = 0, z = +64, z score = 4.8, P < 0.05, corrected for the entire brain). Finally, in the anterior premotor cortex (left superior precentral sulcus), the asynchrony effects for music were strongly modulated by musical expertise and amplified for musicians (Fig. 4D) (asynchronous > synchronous for music for M+ > M−, as reported above and in Table 2).

Dynamic causal modeling results. Fig. 5B shows the exceedance probabilities of each model in our factorial 6 × 6 model space. In nonmusicians, model 22 was the winning model (P = 0.14), followed by model 23 (P = 0.10). In musicians, model 23 was the winning model (P = 0.41), followed by model 29 (P = 0.11). Family-level inference confirmed these results. In nonmusicians, the highest exceedance probabilities were assigned to the model families with (i) speech asynchrony affecting the cerebellum → STS connection (P = 0.65) and (ii) music asynchrony affecting the cerebellum → STS connection (P = 0.61). In musicians, the highest exceedance probabilities were assigned to the model families with (i) speech asynchrony affecting the cerebellum → STS connection (P = 0.84) and (ii) music asynchrony affecting the premotor → STS connection (P = 0.94).

Because the winning models in nonmusicians and musicians were different, we averaged across the two top models (i.e., M22 and M23) separately in nonmusicians and musicians. Fig. 5C shows the strengths (nonbold numbers) of the intrinsic, extrinsic, and modulatory connections for the averaged model in nonmusicians and musicians and their posterior probability of being different from zero (bold numbers). The nonbold numbers by the modulatory effects index the change in coupling (i.e., responsiveness of the target region to activity in the source region) induced by asynchronous music or speech at the group level. Asynchronous speech enhanced the strength of the cerebellum → STS connection similarly in both musicians and nonmusicians. In contrast, asynchronous music increased the strength of the premotor → STS connection selectively in musicians. Indeed, the modulatory effect of asynchronous music on the premotor → STS connection was significantly stronger in musicians than nonmusicians (P = 1.0). Thus, asynchronous music and speech increased the connection strengths from premotor cortex and cerebellum to STS. Specifically, both connections were inhibitory when subjects were presented with synchronous music or speech signals (i.e., premotor cortex and cerebellum suppressed activation in STS). However, when subjects were presented with audiovisual asynchronous signals, the connections became excitatory and induced a prediction error signal within STS, propagating throughout the STS-premotor-cerebellar circuitry.

Most importantly, musical expertise changed the dynamics in this circuitry; it increased the modulatory effect of music asynchrony on the premotor → STS connection, thus turning an inhibitory pathway for synchronous music into an excitatory pathway for asynchronous music in musicians.

Table 2. Regression analyses of neural asynchrony effects on perceptual asynchrony sensitivity for speech and music

Brain region: cluster size; MNI coordinates (x, y, z); z score (peak); P value

Neural asynchrony effects for speech that are significantly predicted by musicians' + nonmusicians' perceptual asynchrony sensitivity for speech:
L. cerebellum (Crus II/VIIb): 25; (−16, −76, −40); 3.7; P = 0.008
L. cerebellum (Crus II/VIIb): (−18, −74, −34); 3.7; P = 0.008

Neural asynchrony effects for music that are significantly predicted by musicians' + nonmusicians' perceptual asynchrony sensitivity for music: NIL

Neural asynchrony effects for music that are significantly more predicted by musicians' than nonmusicians' perceptual asynchrony sensitivity for music:
L. superior precentral sulcus/L. premotor: 1; (−50, 16, 44); 3.2; P = 0.058
L. superior precentral sulcus/L. premotor: 2; (−48, 18, 46); 3.2; P = 0.059

P values are corrected at the peak level for multiple comparisons within the search volume of interest (see Experimental Procedures, Search Volume Constraints).


Discussion

Our psychophysical and fMRI data show that piano practicing molds audiovisual temporal binding and synchrony perception by a context-specific neural mechanism. Behaviorally, musicians relative to nonmusicians exhibited a significantly narrower temporal integration window for music but not for speech. At the neural level, musicians showed increased audiovisual asynchrony effects and effective connectivity for music in an STS-premotor-cerebellar circuitry. Collectively, these results suggest that piano practicing provides more precise estimates of the relative audiovisual timings in music by fine tuning an internal forward model that maps from action plans of piano playing onto visible finger movements and concurrent piano sounds.

Accumulating evidence suggests that long-term music training changes auditory processing throughout the cortical hierarchy and produces behavioral benefits beyond music performance (11, 19), most prominently in speech processing (10). In contrast to these generic auditory processing benefits, musical expertise in our study sensitized subjects to the audiovisual temporal relationship selectively for music, with no significant transfer to speech processing (3).

At the neural level, asynchronous relative to synchronous conditions increased activation in a distributed neural system encompassing bilateral STS/MTG, occipital and fusiform gyri, and premotor and cerebellar cortices (6, 7). This asynchrony system was largely shared by speech and music, with no asynchrony effects that were specific to speech or music. Despite this common neural asynchrony system, piano practicing significantly modulated the neural asynchrony effects specifically for music but not for speech, thus replicating the context specificity already indicated at the behavioral level. Although speech elicited comparable asynchrony effects in musicians and nonmusicians in posterior STS/MTG bilaterally and left cerebellum, music evoked increased asynchrony effects for musicians relative to nonmusicians in left premotor cortex, left cerebellum, and right posterior STS/MTG. Hence, audiovisual asynchrony is detected automatically not only along the sensory processing hierarchies and classical audiovisual integration areas such as STS (7, 20, 21) but also in a premotor-cerebellar circuitry. Importantly, the asynchrony effects within the premotor-cerebellar circuitry depended on the availability of prior sensory-motor experience. In line with humans' generic speech expertise, the asynchrony effects were observed in left premotor cortex and cerebellum for speech in both groups but for music primarily in musicians, who were endowed with the relevant motor repertoire of piano playing.

Natural connected speech and piano music are characterized by a rich hierarchical temporal structure and linked to the motor system by different effectors. Although it is well established that even passive speech and music perception implicitly activate parts of the motor system (22, 23), our results reveal a more fine-grained cortical hierarchy within the motor system. (i) The primary motor cortex passively coactivated for music and speech actions, irrespective of the temporal relationship of the audiovisual signals. (ii) The premotor cortex was sensitive to audiovisual asynchrony and activated more for asynchronous relative to synchronous conditions (Fig. 4). (iii) Critically, in the anterior premotor cortex, audiovisual asynchrony responses for music were modulated by subjects' prior sensory-motor experience of piano playing. Furthermore, in musicians only, they were also modulated by their perceptual asynchrony sensitivity for music.

Fig. 4. Cortical hierarchy of audiovisual asynchrony effects from posterior to anterior areas in the left frontal cortex. (A) Stimulus-evoked activations for speech and music (blue, M+ + M− asynchronous + synchronous > fixation for speech + music), asynchrony effects for speech and music (orange, M+ + M− asynchronous > synchronous for speech + music), and asynchrony effects that are enhanced for musicians relative to nonmusicians (red, M+ > M− asynchronous > synchronous for music) are rendered on a template flattened brain. (B–D) Fitted event-related BOLD responses (lines) and peristimulus time histograms (markers) at the given coordinate are displayed as a function of poststimulus time (PST; averaged over sessions and subjects). Insets show contrast estimates (across-subjects mean ± SEM) of the asynchrony (async − sync) effect in arbitrary units (corresponding to percent of whole-brain mean) for musicians (black, M+) and nonmusicians (gray, M−). Z scores pertain to the comparison between M+ and M−. A positive z score indicates that the asynchrony effect is greater in musicians relative to nonmusicians (and vice versa). (E) Scatter plot depicting the regression of the neural asynchrony effects for music on perceptual asynchrony sensitivity in musicians (black, M+) and nonmusicians (gray, M−).


Collectively, these results suggest that sensory-motor experience enables the engagement of a premotor-cerebellar circuitry as a supplementary mechanism to determine the temporal (mis)alignment of auditory and visual signals. Previous neurophysiological, functional imaging, and lesion studies have implicated the cerebellum and premotor cortex in the perception (22, 24, 25) and production (e.g., tapping in synchrony with musical rhythms) (26, 27) of musical and, in particular, rhythmic sequences. Activation in the dorsal premotor cortex was modulated by the metric structure of the auditory stimulus (27), subjects' cognitive set (e.g., motor imagery) (28), and their musical expertise (28, 29). Furthermore, cerebellum and premotor cortex were involved in motor and perceptual timing, particularly in the millisecond range (12, 30–35). Patients with cerebellar lesions showed increased variability on temporal production tasks such as rhythmic tapping (36). Temporal relative to spatial prediction tasks increased activation in the posterior cerebellum (12).

Fig. 5. (A) Basic DCM. From this basic DCM, 36 candidate DCMs were generated by factorially manipulating the connection that was modulated by music asynchrony or speech asynchrony. (B) Bayesian model comparison (random effects analysis) for (i) musicians and (ii) nonmusicians. The matrix shows the exceedance probability of the 36 DCMs in a factorial fashion. The abscissa shows the effect of speech asynchrony; the ordinate shows the effect of music asynchrony. In nonmusicians, model 22 is associated with the highest exceedance probability; in musicians, model 23 is associated with the highest exceedance probability. (C) The strengths (mean ± SEM; nonbold numbers) of the intrinsic, extrinsic, and modulatory connections for the averaged model and their posterior probability of being different from zero (bold numbers) in the (i) nonmusician and (ii) musician groups.


Computationally, the cerebellum is thought to instantiate a forward model that maps from motor (and even cognitive) plans onto their sensory consequences, learned by error feedback from real-life sensory-motor experience (13, 14, 37). Because speech production and piano playing induce concurrent visible movements (of the face or fingers) and auditory outputs, the internal forward model indirectly furnishes predictions about the relative timings of auditory outputs and visual movements. Hence, asynchronous music (in musicians) and speech (in both groups) that violate these temporal predictions induce a prediction error signal in this premotor-cerebellar circuitry (14, 15). Thus, the forward model instantiates a supplementary motor-based mechanism that enables more precise audiovisual temporal estimates by simulating actions and their effects. From a more cognitive perspective, this motor-based mechanism may manifest itself in the superior skills for action imagery, simulation, and imitation in musicians (related studies of action observation and/or imagery in pianists are in refs. 28, 29, and 38).

The functional relevance of left cerebellum and premotor cortex is also supported by the relationship between subjects' perceptual sensitivity to audiovisual asynchrony (as measured outside the scanner) and their neural asynchrony responses. For speech, the cerebellar asynchrony effects were significantly predicted by the perceptual sensitivity to audiovisual asynchrony in both nonmusicians and musicians. For music, the anterior premotor asynchrony effects were significantly predicted by the perceptual sensitivity to audiovisual asynchrony in musicians only. These results cannot be explained by explicit motor responses, because subjects were engaged in passive speech and music perception inside the scanner. Nor can they be explained by eye movement artifacts, because subjects were fixating, and eye movements (as measured during fMRI) did not differ significantly between synchronous and asynchronous trials (SI Results, Eye Movement Monitoring). Instead, our results show that left cerebellum and premotor cortex are behaviorally relevant for the implicit, automatic evaluation of the temporal relationship between auditory and visual signals. By fine tuning an internal forward model stored within a premotor-cerebellar circuitry, sensory-motor experience can, thus, influence which sensory inputs are considered synchronous and integrated into a coherent percept.

The role of the cerebellum and premotor cortex in evaluating the temporal alignment of auditory and visual signals is further supported by our results obtained by combining dynamic causal modeling (DCM) with Bayesian model comparison and averaging. In the averaged optimal DCM, asynchronous speech modulates the connection from cerebellum → STS, whereas asynchronous music modulates primarily the connection from premotor cortex → STS. Both asynchronous music and speech enhance the effective connectivity to STS, turning inhibitory (i.e., negative) connections for audiovisual synchronous signals into excitatory (i.e., positive) connections for asynchronous signals. These findings suggest that the premotor cortex and cerebellum influence audiovisual temporal binding in STS by generating a prediction error signal through increased connectivity to STS. Importantly, comparing the modulatory effects of asynchronous speech and music across groups shows that the effective connectivity is also altered by musical expertise in a context-specific fashion. Although the modulatory effect of asynchronous speech is comparable across groups, the modulatory effect of asynchronous music is selectively enhanced for musicians relative to nonmusicians.

The specificity of the plastic changes for music in terms of (i) the behavioral audiovisual temporal integration window, (ii) audiovisual asynchrony blood oxygenation level-dependent (BOLD) responses, and (iii) effective connectivity strongly suggests sensorimotor rather than pure auditory learning mechanisms. However, future studies that formally compare the effects of pure audiovisual vs. audiovisual-motor training schemes on audiovisual synchrony perception are needed to further dissociate the contributions of sensory-motor from pure audiovisual (i.e., sensory) learning effects. In the unisensory domains, the role of audio-motor experience has recently been highlighted in a magnetoencephalography (MEG) study showing a larger mismatch negativity for deviant musical structures after audio-motor (i.e., piano practicing) than pure auditory learning (39). Conversely, studies comparing motor skills that are and are not associated with sounds (e.g., pianists vs. athletes) are needed to confirm that motor skills per se do not afford higher sensitivity to the temporal misalignment of auditory and visual signals. Finally, although in the current study the benefit of musical expertise did not significantly generalize from music to speech, it is an open question whether more intensive training (e.g., in professional pianists) would induce generalization to some extent.

In conclusion, our behavioral and neural data jointly provide strong evidence for a context-specific mechanism whereby piano practicing affords an internal forward model that enables more precise predictions of the relative timings of the auditory and visual signals. Asynchronous speech and music stimuli that violate the predictions from this internal forward model elicit an error signal in an STS-premotor-cerebellar circuitry that is fine tuned by sensory-motor experience. Collectively, our findings highlight intimate links between sensory-motor experience and audiovisual synchrony perception, where our interactions with the environment determine whether and how we integrate auditory and visual inputs into a unified percept.

Experimental Procedures

Subjects. Thirty-seven healthy German native speakers (18 musicians and 19 nonmusicians) participated in the fMRI and psychophysics studies after giving informed consent. The musicians (M+) were amateur pianists selected based on the following criteria: (i) they started piano practicing before the age of 12 y (mean ± SD = 7.9 ± 1.9 y), (ii) they had practiced piano for at least 6 y (mean ± SD = 16.4 ± 5.6 y), and (iii) they had practiced piano for at least 1 h (mean ± SD = 3.33 ± 1.69 h) per week over the past 3 y. The nonmusicians (M−) were selected based on having no piano practice and less than 3 mo of any musical training (16 subjects had no musical experience).

A detailed description is in SI Experimental Procedures, Subjects.

Stimuli. Stimulus material was taken from close-up audiovisual recordings of (i) a female actress' face looking straight into the camera, uttering short sentences, or (ii) a male right hand playing short piano melodies on the keyboard. The piano melodies were generated to match the rhythm and number of syllables of the speech sentences.

A detailed description is in SI Experimental Procedures, Stimuli.

Experimental Design and Procedures. Psychophysics study. Between 2 and 8 wk before the fMRI study, subjects were presented with music and speech stimuli (and sine wave analogs of speech that are not included in this report) at 13 levels of audiovisual stimulus onset asynchrony (AV-SOA; ±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Each stimulus was presented four times at each AV-SOA, amounting to 1,248 presentations in total, which were assigned to two sessions on separate days. The AV-SOA levels and stimulus classes were randomized. In a two-alternative forced choice task, subjects judged each audiovisual stimulus as synchronous or asynchronous in a nonspeeded fashion.

fMRI study. Subjects were presented with exactly the same stimulus material as in the psychophysics study. However, the AV-SOA levels were limited to synchronous (i.e., 0 ms AV offset) and asynchronous (±240 ms auditory- and visual-leading offsets). This AV-SOA level was chosen because it corresponded to an average of 33.6% of stimuli judged as synchronous in the psychophysical data (Fig. 2). Hence, the fMRI study used a 2 × 3 × 2 factorial design with the within-subject factors (i) audiovisual (a)synchrony (synchronous vs. asynchronous) and (ii) stimulus class [music vs. speech vs. sine wave speech analogs (SWS)] and the between-subjects factor group (musicians vs. nonmusicians).

Each stimulus was presented 24 times, amounting to 576 trials in total. The stimuli were presented in blocks of eight trials (stimulus onset asynchrony = 5.6 s) interleaved with 8-s fixation. Audiovisual (a)synchrony was pseudorandomized in an event-related fashion, and the stimulus class was manipulated across blocks.
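The reported counts are mutually consistent if one assumes 24 distinct stimuli across the stimulus classes (our inference; the stimulus count is not stated in this paragraph):

```latex
% Consistency check of the trial counts (assuming 24 distinct stimuli)
24 \;\text{stimuli} \times 13 \;\text{AV-SOAs} \times 4 \;\text{repetitions} = 1248 \;\text{psychophysics presentations}
\qquad
24 \;\text{stimuli} \times 24 \;\text{repetitions} = 576 \;\text{fMRI trials} = 72 \;\text{blocks of } 8
```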

To enable characterization of asynchrony effects in the motor and other neural systems unconfounded by motor responses and task-induced processes, subjects passively perceived the audiovisual stimuli, with their performance monitored by eye tracking. Indeed, eye tracking recordings (inside the scanner) confirmed that subjects fixated the visual stimuli equally in all conditions and both groups (SI Results, Eye Movement Monitoring).

Experimental Setup and Stimulus Presentation. A detailed description is in SI Experimental Procedures, Experimental Setup and Stimulus Presentation.

Behavioral Analysis: Psychophysics Study (Before fMRI Study). To refrain from making any distributional assumptions, the psychometric function was estimated from the proportion of synchronous responses using local quadratic fitting as a nonparametric approach (40). The bandwidth for the local quadratic fitting was optimized individually for each subject in a cross-validation procedure. The audiovisual temporal integration window was estimated as the integral of the fitted psychometric function bounded by ±360 ms.
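A minimal sketch of this estimation logic follows, assuming a Gaussian kernel, a fixed (not cross-validated) bandwidth, toy response proportions, and normalization of the integral by 360 ms to land on the ~1.0 to 1.5 scale of the reported windows; none of these specifics come from the paper:

```python
import numpy as np

def local_quadratic_fit(soa, p_sync, grid, bandwidth):
    """Nonparametric psychometric fit: a weighted quadratic regression at each
    grid point (Gaussian kernel weights; the per-subject cross-validation of
    the bandwidth used in the paper is omitted here)."""
    fitted = np.empty(grid.shape)
    for i, x0 in enumerate(grid):
        w = np.sqrt(np.exp(-0.5 * ((soa - x0) / bandwidth) ** 2))  # sqrt of kernel weights
        X = np.column_stack([np.ones_like(soa), soa - x0, (soa - x0) ** 2])
        beta, *_ = np.linalg.lstsq(X * w[:, None], p_sync * w, rcond=None)
        fitted[i] = beta[0]                    # local intercept = fitted value at x0
    return np.clip(fitted, 0.0, 1.0)

# The 13 AV-SOA levels of the psychophysics study (ms), with toy response proportions
soa = np.array([-360, -300, -240, -180, -120, -60, 0, 60, 120, 180, 240, 300, 360], float)
p_sync = np.array([.05, .15, .35, .60, .80, .95, .97, .92, .75, .55, .30, .12, .04])

grid = np.linspace(-360, 360, 721)
f = local_quadratic_fit(soa, p_sync, grid, bandwidth=80.0)   # bandwidth: illustrative value

# Temporal integration window = integral of the fitted function over ±360 ms;
# dividing by 360 ms is our assumed normalization.
window = np.trapz(f, grid) / 360.0
print(f"temporal integration window: {window:.2f}")
```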

fMRI Data Acquisition and Analysis. Structural and functional images were acquired with a Siemens Trio TIM 3T scanner (SI Experimental Procedures, fMRI Data Acquisition).

The data were analyzed with statistical parametric mapping (SPM8; Wellcome Center for Neuroimaging). Scans from each subject were realigned, unwarped, and spatially normalized into Montreal Neurological Institute (MNI) space using parameters from segmentation of the T1 structural image (41), resampled to 2 × 2 × 2 mm3 voxels, and spatially smoothed with a Gaussian kernel of 8 mm full width at half maximum (FWHM). The time series in each voxel was high-pass filtered to 1/128 Hz. The fMRI experiment was modeled in an event-related fashion, with regressors entered into the design matrix after convolving each event-related unit impulse (indexing stimulus onset) with a canonical hemodynamic response function and its first temporal derivative. In addition to modeling the nine conditions in our 3 × 3 factorial design (auditory- and visual-leading asynchronous trials were modeled separately), the statistical model included six realignment parameters as nuisance covariates to account for residual motion artifacts. Condition-specific effects for each subject were estimated according to the general linear model and passed to a second-level analysis as contrasts. This process involved creating four contrast images that compared (i) synchronous speech, (ii) asynchronous speech, (iii) synchronous music, and (iv) asynchronous music relative to fixation, summed over the six sessions for each subject, and entering them into a second-level ANOVA or regression models (see below). This second-level ANOVA modeled the eight conditions (i.e., four conditions each for the M− and M+ groups). Inferences were made at the second level to allow for a random effects analysis and inferences at the population level (42).
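As a sketch of how such event-related regressors are built (the TR, onsets, and scan count below are invented for illustration; SPM's canonical HRF is approximated by the standard double-gamma form):

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """Double-gamma HRF (peak ~6 s, undershoot ~16 s), approximating SPM's canonical form."""
    t = np.arange(0.0, duration, tr)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def make_regressor(onsets_s, n_scans, tr):
    """Event-related regressor: unit impulses at stimulus onsets convolved with the HRF."""
    sticks = np.zeros(n_scans)
    sticks[(np.asarray(onsets_s) / tr).astype(int)] = 1.0  # stick functions at onset scans
    return np.convolve(sticks, canonical_hrf(tr))[:n_scans]

# One condition (e.g., asynchronous music); TR, onsets, and scan count are assumptions.
x_async_music = make_regressor(onsets_s=[11.2, 16.8, 22.4, 44.8], n_scans=200, tr=2.8)
# The temporal-derivative regressor the authors also included could be built analogously,
# convolving instead with np.gradient(canonical_hrf(tr), tr).
```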

At the random effects level, we first tested for (i) the main effect of audiovisual (a)synchrony (synchronous > asynchronous for speech and music averaged across groups, and vice versa) and (ii) interactions between audiovisual (a)synchrony and stimulus class. We then evaluated the effect of musical expertise separately for speech and music. Separately for each of the two stimulus classes, we tested for (i) the main effect of asynchrony, (ii) asynchrony effects that are common to musicians and nonmusicians (i.e., a conjunction-null conjunction analysis), and (iii) asynchrony effects that differ between musicians and nonmusicians [i.e., the interactions between audiovisual (a)synchrony and group].

Finally, we characterized the relationship between perceptual and neural asynchrony measures. For this characterization, the activation differences asynchronous − synchronous for music [respectively (resp.), for speech] were entered into a multiple regression analysis that used subjects' perceptual asynchrony sensitivity, as measured by the proportion of synchronous responses for synchronous minus asynchronous conditions, separately for musicians and nonmusicians for music (resp. for speech), as predictors for the corresponding neural asynchrony effects (i.e., the activation difference asynchronous − synchronous for music). We then investigated whether the neural asynchrony effects were positively (and for completeness, also negatively) predicted by the perceptual asynchrony sensitivity (i) averaged across the two groups and (ii) differently for the two groups (i.e., the interaction between the behavioral regressor and group).
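A minimal sketch of this second-level regression at a single voxel, with simulated per-subject values (the group-specific slope columns implement the group-by-behavior interaction; all numbers are toy data, and SPM performs this voxelwise with corrected inference):

```python
import numpy as np

rng = np.random.default_rng(0)
n_mus, n_non = 18, 19                                   # group sizes from the study
perc = rng.uniform(0.2, 0.8, n_mus + n_non)             # perceptual asynchrony sensitivity (toy)
group = np.r_[np.ones(n_mus), np.zeros(n_non)]          # 1 = musician, 0 = nonmusician
neural = 0.5 * perc * group + rng.normal(0, 0.1, n_mus + n_non)  # toy voxel contrast estimates

# Group-specific intercepts and slopes; comparing the two slope estimates tests the
# group-by-behavior interaction (the change in regression slopes reported for music).
X = np.column_stack([group, 1 - group, perc * group, perc * (1 - group)])
beta, *_ = np.linalg.lstsq(X, neural, rcond=None)
print(f"slope M+: {beta[2]:.2f}, slope M-: {beta[3]:.2f}, difference: {beta[2] - beta[3]:.2f}")
```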

Search Volume Constraints. The search space for the main and interaction effects of audiovisual (a)synchrony was limited to the cortical audiovisual processing system (speech + music > fixation; P < 0.05, whole-brain corrected; extent threshold > 100 voxels) combined with the entire cerebellum as our a priori search volume of interest (i.e., including 40,003 voxels).

To identify asynchrony effects that were common to musicians and nonmusicians, each asynchrony effect for one group was tested within a search volume mutually constrained by the other contrast. This approach is basically equivalent to a (conjunction-null) conjunction analysis (i.e., a logical AND).

For the regression analyses (i.e., neural asynchronous − synchronous differences against behavioral synchronous − asynchronous differences) for speech (resp. for music), we constrained the search space to voxels showing an asynchrony effect for speech (resp. for music) thresholded at P < 0.001, uncorrected, with an extent threshold > 20 voxels (speech = 277 voxels; music = 336 voxels). We applied this additional constraint to investigate whether responses in regions of the asynchrony system are significantly predicted by subjects' individual perceptual sensitivity to audiovisual asynchrony (this constraint does not bias our inference, because the contrasts are orthogonal).

Unless otherwise stated, we report activations at P < 0.05 at the peak level, corrected for multiple comparisons within the particular search volume.

For illustration purposes only (i.e., not for statistical inference), activations are displayed using a height threshold of P < 0.001, uncorrected, an extent threshold > 20 voxels, and are inclusively masked with the search mask described above.

Effective Connectivity Analysis: DCM. For each subject, 36 DCMs (43) were constructed. Each DCM included the three regions that showed greater asynchrony effects for music in musicians than nonmusicians: (i) the left anterior premotor cortex (x = −42, y = +20, z = +50), (ii) the right posterior STS (x = +62, y = −40, z = −4), and (iii) the left cerebellum (x = −32, y = −60, z = −36) (Fig. 4B). Given its prominent role in audiovisual integration, the right posterior STS was chosen as the audiovisual input region. The three regions were bidirectionally connected. The onset timings were individually adjusted for each region to match the specific time of slice acquisition. All audiovisual speech and music stimuli entered as extrinsic inputs to posterior STS. Holding the number of parameters and the intrinsic and extrinsic connectivity structure constant, the 6 × 6 model space (36 DCMs) factorially manipulated the connection that was modulated by (i) music asynchrony and (ii) speech asynchrony (Fig. 5B).
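Because three bidirectionally connected regions yield six directed connections, crossing the candidate music-modulated connection with the candidate speech-modulated connection gives the 36 models. A sketch of enumerating this model space (the region labels are shorthand for this example, not SPM identifiers):

```python
from itertools import product

regions = ["PMC", "STS", "CER"]  # premotor, posterior STS, cerebellum

# Six directed connections among three bidirectionally connected regions
connections = [(src, dst) for src in regions for dst in regions if src != dst]

# 6 x 6 factorial model space: which connection is modulated by music
# asynchrony and which by speech asynchrony (36 models in total)
model_space = [
    {"music_mod": m, "speech_mod": s}
    for m, s in product(connections, connections)
]
assert len(model_space) == 36
```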

Region-specific time series (concatenated over the six sessions and adjusted for effects of interest) comprised the first eigenvariate of all voxels within a 4-mm-radius sphere centered on the subject-specific peak in the asynchronous > synchronous contrast for speech + music. The subject-specific peak was uniquely identified as the positively valued maximum of this contrast, for a particular subject, within a 10-mm-radius sphere centered on the peak coordinates from the group random effects analysis. For our input region, the posterior STS, we imposed the additional constraint that the effect of speech and music > fixation exceeded a t value of 1.65. In cases where we could not identify a maximum given these constraints, we selected the random effects maximum for that particular region and subject (this procedure was applied in five subjects, for one region each).
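The first eigenvariate is the dominant temporal mode of the regional time x voxel data matrix; a sketch via the singular value decomposition follows (the exact scaling and sign conventions of SPM's VOI extraction may differ).

```python
import numpy as np

def first_eigenvariate(Y):
    """First eigenvariate of a time x voxel data matrix Y: the first left
    singular vector, scaled by its singular value (conventions vary)."""
    Yc = Y - Y.mean(axis=0)                          # mean-correct each voxel
    U, S, Vt = np.linalg.svd(Yc, full_matrices=False)
    e = U[:, 0] * S[0] / np.sqrt(Y.shape[1])
    # Fix the arbitrary SVD sign so the eigenvariate correlates
    # positively with the mean regional time course
    if np.corrcoef(e, Yc.mean(axis=1))[0, 1] < 0:
        e = -e
    return e
```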

Bayesian Model Comparison and Averaging. To determine the most likely of the 36 DCMs given the observed data from all subjects, we applied Bayesian model selection separately for musicians and nonmusicians in a random effects group analysis to avoid distortions from outliers (44). Bayesian model selection was implemented in a hierarchical Bayesian model that estimates the frequencies with which models are used in each group. Gibbs sampling was used to estimate the posterior distribution over these frequencies (45). To characterize our Bayesian model selection results at the random effects level, we report the exceedance probability of one model being more likely than any other model tested.
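Once the posterior over model frequencies is in hand, the exceedance probability is straightforward to approximate. The sketch below assumes the Dirichlet parameters of that posterior have already been estimated (by Gibbs sampling or otherwise) and computes exceedance probabilities by Monte Carlo.

```python
import numpy as np

def exceedance_probabilities(alpha, n_samples=100_000, seed=0):
    """Monte Carlo exceedance probabilities for random-effects Bayesian
    model selection: given Dirichlet parameters `alpha` of the posterior
    over model frequencies r, estimate P(r_k > r_j for all j != k)."""
    rng = np.random.default_rng(seed)
    r = rng.dirichlet(alpha, size=n_samples)   # samples of model frequencies
    winners = r.argmax(axis=1)                 # most frequent model per sample
    return np.bincount(winners, minlength=len(alpha)) / n_samples
```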

Because Bayesian model selection of individual models can become brittle when a large number of models is considered (45), we also used family-level inference. Given our factorial 6 × 6 model space, we compared the six model families that differ in the connection that is modulated by (i) music asynchrony or (ii) speech asynchrony. As reported in the Results section, the optimal models differed for the musician and nonmusician groups. To enable inference on, and comparison of, the connectivity parameters (given a particular model) across musicians and nonmusicians, we used Bayesian model averaging, which computes an estimate of each model parameter (e.g., connection strength) by averaging the parameters across models weighted by the posterior probability of each model. Using Bayesian model averaging at the group level, we obtained a sample-based representation of the posterior density for each intrinsic, extrinsic, or modulatory connection parameter. From this sample-based posterior density, we computed the posterior probability of a connection being greater than zero (equivalently, smaller than zero if the connection strength is negative). For the modulatory connections, we also computed the posterior probability of a connection strength being increased (resp. decreased) in the musician relative to the nonmusician group.
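A sample-based sketch of the averaging step for a single connection parameter, assuming each model supplies posterior samples of that parameter together with its posterior model probability:

```python
import numpy as np

def bma_posterior_prob(param_samples_per_model, model_posteriors, seed=0):
    """Sample-based Bayesian model averaging of one connection parameter:
    draw samples from each model's posterior in proportion to that model's
    posterior probability, then report P(parameter > 0)."""
    rng = np.random.default_rng(seed)
    n_total = 10_000
    counts = rng.multinomial(n_total, model_posteriors)  # samples per model
    pooled = np.concatenate([
        rng.choice(samples, size=k, replace=True)
        for samples, k in zip(param_samples_per_model, counts) if k > 0
    ])
    return (pooled > 0).mean()
```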


Model comparison and statistical analysis of the connectivity parameters of the optimal averaged model enabled us to address the following two questions. First, we asked which connections in the STS-premotor-cerebellar circuitry are modulated by speech and music asynchrony to mediate the regional prediction error signals. Second, given the averaged optimal model, we asked whether any modulatory effects on connection strength differ across the two groups. In other words, we investigated whether musical expertise shapes the effective connectivity in a context-specific fashion. Specifically, we hypothesized that asynchronous music would enhance the connection strength more strongly in musicians than nonmusicians.

ACKNOWLEDGMENTS. We thank Kamila Zychaluk and David H. Foster for useful discussions and Fabian Sinz and Mario Kleiner for help with stimulus generation. This study was funded by the Max Planck Society and is part of the research program of the Bernstein Center for Computational Neuroscience, Tübingen, funded by the German Federal Ministry of Education and Research (BMBF; FKZ: 01GQ1002).

1. Münte TF, Altenmüller E, Jäncke L (2002) The musician's brain as a model of neuroplasticity. Nat Rev Neurosci 3:473–478.

2. Zatorre RJ, Chen JL, Penhune VB (2007) When the brain plays music: Auditory-motor interactions in music perception and production. Nat Rev Neurosci 8:547–558.

3. Petrini K, et al. (2009) Multisensory integration of drumming actions: Musical expertise affects perceived audiovisual asynchrony. Exp Brain Res 198:339–352.

4. Powers AR, 3rd, Hillock AR, Wallace MT (2009) Perceptual training narrows the temporal window of multisensory binding. J Neurosci 29:12265–12274.

5. Bushara KO, et al. (2003) Neural correlates of cross-modal binding. Nat Neurosci 6:190–195.

6. Miller LM, D'Esposito M (2005) Perceptual fusion and stimulus coincidence in the cross-modal integration of speech. J Neurosci 25:5884–5893.

7. Noesselt T, et al. (2007) Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. J Neurosci 27:11431–11441.

8. Lewis RK, Noppeney U (2010) Audiovisual synchrony improves motion discrimination via enhanced connectivity between early visual and auditory areas. J Neurosci 30:12329–12339.

9. Musacchia G, Sams M, Skoe E, Kraus N (2007) Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proc Natl Acad Sci USA 104:15894–15898.

10. Wong PC, Skoe E, Russo NM, Dees T, Kraus N (2007) Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nat Neurosci 10:420–422.

11. Kraus N, Chandrasekaran B (2010) Music training for the development of auditory skills. Nat Rev Neurosci 11:599–605.

12. O'Reilly JX, Mesulam MM, Nobre AC (2008) The cerebellum predicts the timing of perceptual events. J Neurosci 28:2252–2260.

13. Ramnani N (2006) The primate cortico-cerebellar system: Anatomy and function. Nat Rev Neurosci 7:511–522.

14. Wolpert DM, Miall RC, Kawato M (1998) Internal models in the cerebellum. Trends Cogn Sci 2:338–347.

15. Friston K (2010) The free-energy principle: A unified brain theory? Nat Rev Neurosci 11:127–138.

16. Bengtsson SL, Ullén F (2006) Dissociation between melodic and rhythmic processing during piano performance from musical scores. Neuroimage 30:272–284.

17. Koch G, et al. (2007) Repetitive TMS of cerebellum interferes with millisecond time processing. Exp Brain Res 179:291–299.

18. Lewis PA, Miall RC (2003) Brain activation patterns during measurement of sub- and supra-second intervals. Neuropsychologia 41:1583–1592.

19. Patel AD, Iversen JR (2007) The linguistic benefits of musical abilities. Trends Cogn Sci 11:369–372.

20. Werner S, Noppeney U (2010) Distinct functional contributions of primary sensory and association areas to audiovisual integration in object categorization. J Neurosci 30:2662–2675.

21. Beauchamp MS, Argall BD, Bodurka J, Duyn JH, Martin A (2004) Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nat Neurosci 7:1190–1192.

22. Chen JL, Penhune VB, Zatorre RJ (2008) Listening to musical rhythms recruits motor regions of the brain. Cereb Cortex 18:2844–2854.

23. Lahav A, Saltzman E, Schlaug G (2007) Action representation of sound: Audiomotor recognition network while listening to newly acquired actions. J Neurosci 27:308–314.

24. Bengtsson SL, et al. (2009) Listening to rhythms activates motor and premotor cortices. Cortex 45:62–71.

25. Grahn JA, Rowe JB (2009) Feeling the beat: Premotor and striatal interactions in musicians and nonmusicians during beat perception. J Neurosci 29:7540–7548.

26. Chen JL, Penhune VB, Zatorre RJ (2008) Moving on time: Brain network for auditory-motor synchronization is modulated by rhythm complexity and musical training. J Cogn Neurosci 20:226–239.

27. Chen JL, Zatorre RJ, Penhune VB (2006) Interactions between auditory and dorsal premotor cortex during synchronization to musical rhythms. Neuroimage 32:1771–1781.

28. Baumann S, et al. (2007) A network for audio-motor coordination in skilled pianists and non-musicians. Brain Res 1161:65–78.

29. Haslinger B, et al. (2005) Transmodal sensorimotor networks during action observation in professional pianists. J Cogn Neurosci 17:282–293.

30. Ivry RB, Spencer RM (2004) The neural representation of time. Curr Opin Neurobiol 14:225–232.

31. Penhune VB, Zatorre RJ, Evans AC (1998) Cerebellar contributions to motor timing: A PET study of auditory and visual rhythm reproduction. J Cogn Neurosci 10:752–765.

32. Del Olmo MF, Cheeran B, Koch G, Rothwell JC (2007) Role of the cerebellum in externally paced rhythmic finger movements. J Neurophysiol 98:145–152.

33. Moberget T, et al. (2008) Detecting violations of sensory expectancies following cerebellar degeneration: A mismatch negativity study. Neuropsychologia 46:2569–2579.

34. Rao SM, et al. (1997) Distributed neural systems underlying the timing of movements. J Neurosci 17:5528–5535.

35. Spencer RM, Verstynen T, Brett M, Ivry R (2007) Cerebellar activation during discrete and not continuous timed movements: An fMRI study. Neuroimage 36:378–387.

36. Spencer RM, Zelaznik HN, Diedrichsen J, Ivry RB (2003) Disrupted timing of discontinuous but not continuous movements by cerebellar lesions. Science 300:1437–1439.

37. Mauk MD, Buonomano DV (2004) The neural basis of temporal processing. Annu Rev Neurosci 27:307–340.

38. Bangert M, et al. (2006) Shared networks for auditory and motor processing in professional pianists: Evidence from fMRI conjunction. Neuroimage 30:917–926.

39. Lappe C, Herholz SC, Trainor LJ, Pantev C (2008) Cortical plasticity induced by short-term unimodal and multimodal musical training. J Neurosci 28:9632–9639.

40. Zychaluk K, Foster DH (2009) Model-free estimation of the psychometric function. Atten Percept Psychophys 71:1414–1425.

41. Ashburner J, Friston KJ (2005) Unified segmentation. Neuroimage 26:839–851.

42. Friston KJ, et al. (1994) Statistical parametric maps in functional imaging: A general linear approach. Hum Brain Mapp 2:189–210.

43. Friston KJ, Harrison L, Penny W (2003) Dynamic causal modelling. Neuroimage 19:1273–1302.

44. Stephan KE, Penny WD, Daunizeau J, Moran RJ, Friston KJ (2009) Bayesian model selection for group studies. Neuroimage 46:1004–1017.

45. Penny WD, et al. (2010) Comparing families of dynamic causal models. PLOS Comput Biol 6:e1000709.
