
Culture modulates the brain response to human expressions of emotion: Electrophysiological evidence

Pan Liu a,*, Simon Rigoulot b, Marc D. Pell a,b

a School of Communication Sciences and Disorders, McGill University, Montréal, Québec, Canada
b International Laboratory for Brain, Music and Sound Research (BRAMS), Centre for Research on Brain, Language, and Music (CRBLM), McGill University, Montréal, Québec, Canada

* Correspondence to: School of Communication Sciences and Disorders, McGill University, 2001 McGill College, 8th floor, Montréal, Québec, Canada H3G 1A8. Fax: +1 514 398 8123. E-mail address: [email protected] (P. Liu).

http://dx.doi.org/10.1016/j.neuropsychologia.2014.11.034

Article info

Article history:
Received 27 July 2014
Received in revised form 24 November 2014
Accepted 30 November 2014
Available online 2 December 2014

Keywords:
Cross-cultural studies
ERP
Stroop
Facial expression
Emotional prosody


Abstract

To understand how culture modulates on-line neural responses to social information, this study compared how individuals from two distinct cultural groups, English-speaking North Americans and Chinese, process emotional meanings of multi-sensory stimuli as indexed by both behaviour (accuracy) and event-related potential (N400) measures. In an emotional Stroop-like task, participants were presented face–voice pairs expressing congruent or incongruent emotions in conditions where they judged the emotion of one modality while ignoring the other (face or voice focus task). Results indicated that while both groups were sensitive to emotional differences between channels (with lower accuracy and higher N400 amplitudes for incongruent face–voice pairs), there were marked group differences in how intruding facial or vocal cues affected accuracy and N400 amplitudes, with English participants showing greater interference from irrelevant faces than Chinese. Our data illuminate distinct biases in how adults from East Asian versus Western cultures process socio-emotional cues, supplying new evidence that cultural learning modulates not only behaviour, but the neurocognitive response to different features of multi-channel emotion expressions.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

In today's highly globalized world, social interactions involving people from different cultural backgrounds are not unusual, despite the fact that acquired rules and routines for interacting socially, and possibly the neurocognitive system that supports social communication, vary in marked ways across cultures. One aspect of communication that is vital for interpersonal rapport is the ability to correctly interpret facial and vocal displays of emotion. This study represents one of the first attempts to investigate whether culture shapes the on-line neural response to multi-sensory emotional expressions (combined faces and voices) beyond the level of behavioural performance. These data will supply new information on how culture modulates the cortical response to socio-emotional cues that are an integral part of (inter-cultural) communication.


1.1. Cultural differences in cross-channel emotion processing

Humans typically utilize multiple information channels to express and understand emotions in daily life (Grandjean et al., 2006; Paulmann and Pell, 2011); for example, a happy utterance is often accompanied by a smiling face. However, recent psychological studies imply that cultures vary in how they process emotions from different information sources, at least when behavioural responses are examined. In a key study conducted by Tanaka et al. (2010), Japanese and Dutch participants performed an emotional Stroop-like task consisting of emotionally congruent or incongruent face–voice pairs; participants had to identify the emotion of the face while ignoring the voice, or vice versa. In the Stroop paradigm, a Stroop or congruence effect on behaviour is expected in the form of longer response times and/or lower accuracy for incongruent versus congruent trials, which suggests that incongruent cues from the to-be-ignored source impede performance of the target task; likewise, congruent information from the to-be-ignored channel facilitates the target task. Tanaka et al. found that Japanese participants demonstrated a significantly larger congruence effect in accuracy than Dutch participants when judging facial expressions, whereas the Dutch exhibited a smaller congruence effect when identifying the tone. This suggests that Japanese participants were influenced to a larger extent by to-be-ignored vocal tones, whereas Dutch participants were influenced more by irrelevant faces. The authors concluded that Japanese people are more likely to allocate their attention to vocal cues in speech, whereas Westerners are more attentive to facial information even when instructed to ignore it, highlighting important differences in how each cultural group processes emotional cues.

These results are in line with previous findings pointing to cultural differences in how emotions are perceived from combined vocal cues ("speech prosody") and semantic word meanings. Kitayama and Ishii (2002) presented a Stroop-like task consisting of semantically negative or positive words, spoken in an emotionally congruent or incongruent vocal tone, to groups of Japanese and American participants who had to identify the emotion of the word meaning while ignoring the tone, or vice versa. They reported that for Japanese participants, response times when judging word meaning were influenced to a larger extent by to-be-ignored vocal tones than for American participants, whereas the opposite group pattern was observed when identifying the tone. This pattern again suggests that Japanese participants were more attuned to vocal cues when performing the task, whereas Americans were more sensitive to semantic cues. Coupled with Tanaka et al.'s (2010) findings, these results imply that during multi-channel emotion perception, East Asians are relatively more sensitive to vocal information whereas Westerners are more oriented towards semantic or facial cues. This conclusion is also suggested in a study of Tagalog–English bilinguals tested in the Philippines (Ishii et al., 2003). Collectively, these data argue that cultures vary in how they process emotional expressions, although evidence is so far restricted to off-line behavioural judgments.

To explain these cultural differences, it has been suggested that display rules, which refer to culture-specific social norms regulating how emotions are expressed in socially appropriate ways, play a central role in emotion perception (Engelmann and Pogosyan, 2013; Ishii et al., 2003; Park and Huang, 2010). In contrast to Western "individualist" cultures, it is believed that East Asian "collectivist" cultures consider harmonious social relations as most important (Hall and Hall, 1990; Scollon and Scollon, 1995). To maintain social harmony, certain display rules are therefore adopted by these groups (Gudykunst et al., 1996; Matsumoto et al., 1998); for example, East Asian cultures tend to favour indirectness in communication (e.g., unfinished sentences or vague linguistic expressions) to a greater extent than Western cultures to prevent embarrassment between interlocutors (Bilbow, 1997). East Asian cultures also endorse the control of facial expressions, especially negative ones, whereas Western cultures encourage explicit facial expressions of emotion (Ekman, 1971; Markus and Kitayama, 1991; Matsumoto et al., 1998, 2005, 2008). East Asians are also more inclined to avoid eye contact than Westerners, as a sign of respect and politeness to others that is thought to benefit social harmony (Hawrysh and Zaichkowsky, 1991; McCarthy et al., 2006, 2008). Due to these culture-specific display rules, it is possible that East Asians rely more on vocal cues to express their emotions and to understand the feelings of others during interpersonal communication; in contrast, Western cultures tend to rely more on facial and semantic information, which tends to be more salient and accessible during their social interactions (Ishii et al., 2003; Kitayama and Ishii, 2002; Tanaka et al., 2010).

These ideas lead to the question: do cultural differences in emotion processing previously attributed to "East Asian" cultures generalize to Chinese? Given that the Chinese culture constitutes a major part of East Asia—one that shares the essential collectivist display rules of previously-studied cultures such as Japanese—it would be valuable to advance this literature by examining how Chinese process cross-channel emotions when compared to a major Western culture (e.g., English-speaking North Americans).

From the perspective that display rules promote differences in how Eastern and Western cultures process facial and vocal cues about emotion, one might expect that Chinese participants would show a similar pattern as the Japanese population when compared to Western participants, i.e., higher sensitivity to vocal cues than to facial cues. Testing this hypothesis in a manner that includes both behavioural and on-line neural responses—data that begin to inform the nature as well as the time course of cultural effects on multi-channel emotion processing—was another major goal of this study.

1.2. Cultural effects on neurophysiological responses to emotional stimuli

While cultural differences in behavioural judgments of emotional displays have been reported, there is still little evidence that culture affects brain responses that occur as humans process cross-channel emotions in real time. Electrophysiological data show that cross-channel emotion perception is a dynamic process marked by various temporal stages that can be indexed by different event-related brain potentials (ERPs) (de Gelder et al., 1999; Ishii et al., 2010; Jessen and Kotz, 2011; Schirmer and Kotz, 2003). For instance, using an Oddball-like task consisting of emotional face–voice pairs that were presented to Dutch participants, de Gelder et al. (1999) observed a mismatch negativity (MMN) component suggesting that early cross-sensory integration of emotional cues occurs very quickly, approximately 100–200 milliseconds (ms) after stimulus onset. Similarly, in an ERP study employing a Stroop-like task, Schirmer and Kotz (2003) presented positive, negative, and neutral words spoken in a congruent or incongruent emotional voice to German participants, who identified the valence of the word meaning and ignored the tone, or vice versa. In this study they observed a larger N400—a component sensitive to the integration of matching/mismatching semantic meanings from different contextual sources (Bower, 1981; Bowers et al., 1993; Brown et al., 2000; Schirmer and Kotz, 2003)—in response to incongruent versus congruent word–tone pairings in the word valence judgment (although only in female participants). Differences in N400 amplitude were also reported by Ishii et al. (2010) using an emotional Stroop-like task that contrasted emotional tone and emotional word meanings in Japanese, with greater amplitudes when cross-channel emotional cues were incongruent in meaning. Although these studies did not directly compare participants from different cultural backgrounds, they suggest that the N400 response can be meaningfully examined to shed light on whether cultural differences exist at the neural processing stage where the emotional meaning of a face and a vocal expression is registered and compared, around 400 ms after event onset, allowing deeper specification of behavioural findings on face–voice processing (e.g., Tanaka et al., 2010).

In another noteworthy study, Schirmer et al. (2006) conducted an EEG experiment in which they presented an emotional Stroop-like task to Cantonese speakers from Hong Kong. Two-syllable Cantonese words spoken in a congruent or incongruent emotional voice were presented to Cantonese listeners, who identified the emotion of the word meaning while ignoring the vocal tone as brain potentials were recorded. Results of this study failed to uncover a typical Stroop/N400 effect in the Cantonese group like that reported for German participants (Schirmer and Kotz, 2003); rather, a late positivity peaking around 1500 ms was observed with larger amplitude in the incongruent versus congruent condition. The authors argued that in contrast to the N400 effect found in German participants, the delayed congruence effect they witnessed in Cantonese listeners was related to the existence of lexical tones and a higher degree of homophony in the Cantonese language; this may have rendered lexical processing more effortful and postponed the integration of emotional voice tone with the semantic message in this study (Schirmer et al., 2006).


While this conclusion may seem inconsistent with behavioural data showing that Japanese participants are more sensitive to vocal emotional cues than word meanings, it does not necessarily contradict the hypothesis that East Asians prioritize emotional speech prosody to a relatively greater extent when processing multi-channel emotions. Schirmer et al.'s (2006) observations could be accounted for by several methodological factors; for example, since the emotional salience of word meanings and emotional prosody was not matched between channels in the Cantonese stimuli, it is possible that prosody was emotionally less salient and easy to ignore, exerting little or delayed influence on judging the word meaning in their task. This can be inferred from the participants' high accuracy rates (around 95%) in the word judgment task. From a cultural perspective, given the long-term history of Hong Kong as a British colony, and the fact that Hong Kong people are mostly educated in the English language, it is also possible that differences in cultural exposure may have affected how Schirmer et al.'s participants process emotional information, even in their native language (see Liu et al., in preparation). Processing emotions from two-syllable words versus full utterances could also affect the extent to which biases for vocal cues are detected across studies. Here, by directly comparing participants from Mainland China who had only recently been exposed to Western society with matched participants from (English) North America, and by matching cross-channel stimuli for emotional salience, we therefore sought to address some of the valuable questions raised by previous authors to clarify how culture influences the processing of emotional meanings encoded in more than one channel (in this case, face and voice).

Thus, while cultural differences in behaviour have been documented (Tanaka et al., 2010; Kitayama and Ishii, 2002) and there are clues in the N400 literature that the neural semantic processing of multi-channel emotions may differ between Eastern and Western cultures (Schirmer and Kotz, 2003; Schirmer et al., 2006; Ishii et al., 2010), the hypothesis that culture modulates on-line neural responses to multi-channel emotions is largely untested. Nonetheless, this idea fits with recent evidence that the N400 component is sensitive to cultural differences in other cognitive domains. For example, cultural background was shown to modulate the N400 when (otherwise unspecified) East Asian and European American participants were presented an affective background picture, followed by a superimposed facial expression that was emotionally congruent or incongruent with the background scene (e.g., a happy face on a positive scene). When identifying the facial expression, East Asians showed a larger N400 for incongruent picture–face pairs compared to congruent pairs, whereas Americans exhibited no difference across conditions, suggesting that East Asians were influenced to a larger degree by background information than the other group. The authors concluded that it is the process of semantic integration, indexed by the N400, that is modulated by culture (Goto et al., 2013). These data provide further incentives to investigate how culture shapes on-line neural responses as humans process multi-channel emotional expressions.

1.3. The present study

Building on recent work, this study administered an emotional Stroop-like task to inform how culture modulates the processing of emotional meanings from combined faces and voices in reference to both "off-line" behavioural judgements (accuracy, response times) and on-line neural responses to underlying stimulus meanings (N400 component). Adults from two major cultural groups, English North Americans and Chinese, were directly compared using emotional stimuli (static faces and pseudo-utterances) that were paired with each other to form congruent and incongruent face–voice dyads in each cultural condition. Based on previous findings suggesting that East Asian display rules favouring social harmony promote culture-specific biases in emotion processing for Japanese participants (Ishii et al., 2003; Kitayama and Ishii, 2002; Tanaka et al., 2010), it was hypothesized that Mandarin-speaking Chinese participants, who share many of the same values and cultural display rules, would also be relatively more attuned to vocal expressions than facial expressions when processing cross-channel emotions. In contrast, English North Americans might be expected to attend to a relatively greater extent to facial cues when performing our task. Based on literature showing that the N400 is typically sensitive to the emotional relationship of multi-channel stimuli in Stroop-like paradigms (i.e., Schirmer and Kotz, 2003; Ishii et al., 2010; Schirmer et al., 2006), and that cultural variables modulate N400 amplitudes in other cognitive domains involving semantic processing (Goto et al., 2010, 2013), we predicted that culture-specific biases would be observed in both behavioural performance and N400 responses to our stimuli when the groups are compared: Chinese participants should demonstrate a larger Stroop effect than North Americans when judging the emotion of faces and ignoring the voices for both types of measurement, whereas the opposite group pattern was predicted when participants judged the voice and ignored the face (i.e., a larger Stroop effect for English North Americans).

2. Method

2.1. Participants

Two groups of participants were recruited. The first group consisted of 19 English-speaking North Americans (10 female, 9 male; mean age = 25 ± 3.91 years; years of education = 14.18 ± 2.19 years). Given the diversity of cultural backgrounds in native English speakers in North America, a screening questionnaire was administered to ensure that each participant (1) was born and raised in Canada or the U.S. and spoke English as their native language and (2) had at least one grandparent of British descent on both the maternal and paternal side of their family, with the other grandparents descended from other Western European countries. The questionnaire also established that no participant in this group had lived for any extended period outside of North America. The second group consisted of 20 Mandarin-speaking Chinese participants (10 female, 10 male; mean age = 24.55 ± 2.75 years; years of education = 16.45 ± 2.68 years). Based on a questionnaire, each participant was a native Mandarin speaker born and raised in Mainland China and none had lived outside of China for more than one year (all participants were recent immigrants or students studying in Montreal, Canada). No difference in age (F(1,37) = 0.253, p = 0.618) or years of education (F(1,37) = 1.039, p = 0.319) was found between groups, which will be referred to henceforth as the "English" and "Chinese" groups for ease of description. None of the participants reported any hearing impairment, and all had normal or corrected-to-normal vision. The study was approved by the Faculty of Medicine Institutional Review Board, McGill University, and informed written consent was obtained in the native language of the participant prior to testing. All participants received modest financial compensation at the end of the study ($60 CAD).
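As a side note on the group-matching statistics above, each reported F(1,37) corresponds to a one-way ANOVA across the two groups; a minimal SciPy sketch (the values below are placeholders, not the published sample):

```python
# One-way ANOVA comparing two groups, as in the reported F(1,37) tests;
# with two groups this is equivalent to an independent-samples t-test (F = t^2).
from scipy.stats import f_oneway

ages_english = [22, 24, 25, 27, 23, 26, 25, 24]  # placeholder values only
ages_chinese = [23, 25, 24, 26, 22, 27, 24, 25]
F, p = f_oneway(ages_english, ages_chinese)
print(F, p)
```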

2.2. Stimuli

The vocal stimuli were fragments of Chinese and English pseudo-sentences uttered in sad and fear emotional tones adopted from previously validated recording databases (Liu and Pell, 2012; Pell et al., 2009a, 2009b). They were produced by 4 native speakers of each language (2 female, 2 male) in a sad or a fear tone; in total, 8 fear and 8 sad pseudo-utterances (2 items × 4 speakers) were included. Pseudo-sentences are composed of pseudo content words conjoined by real function words, rendering them meaningless but resembling the phonetic-segmental and supra-segmental properties of the target language (Banse and Scherer, 1996; Pell and Baum, 1997); such stimuli have been used effectively in studies interested in the perception of voice tone independent of the linguistic-semantic content during speech (Castro and Lima, 2010; Paulmann and Pell, 2010; Pell et al., 2009a, 2009b; Rigoulot and Pell, 2012, 2014; Rigoulot et al., 2013). In the present study, our interest also lies in the perception of emotions in the context of speech processing, which occurs most frequently in daily communication. To this end, emotionally inflected pseudo-utterances were adopted as vocal stimuli to maximally imitate human speech processing while the semantic content was well controlled.

The facial stimuli were black and white Chinese and Caucasian faces expressing sadness and fear adopted from the Montreal Set of Facial Displays of Emotions (Beaupré and Hess, 2005). In this database, 6 adult actors (3 female, 3 male) from the Chinese or Caucasian North American group were instructed to produce faces of different emotions according to the Facial Action Coding System (Ekman and Friesen, 1978). Six fear and six sad faces (1 item × 6 actors) of each group were selected for the current study (Fig. 1).

Fig. 1. Examples of facial and vocal stimuli. Top left, example of Chinese sad face; bottom left, example of Chinese pseudo-sentence and waveform of this sentence spoken in sadness. Top right, example of Caucasian fear face; bottom right, example of English pseudo-sentence and waveform of this sentence spoken in sadness.

In a pilot study designed to further validate and select the stimuli, 562 English pseudo-utterances and 553 Chinese pseudo-utterances conveying six emotional meanings (fear, sadness, happiness, disgust, anger, and neutrality) were initially selected from the vocal database (Liu and Pell, 2012; Pell et al., 2009a, 2009b). In order to keep the amount of expressed information consistent across stimuli, the original utterances of varied duration were cut into fragments of 800 ms measured from sentence onset using Praat software (Boersma and Weenink, 2001); this duration approximates the average length of utterances presented elsewhere in the immediate literature. For the facial stimuli, 48 Caucasian faces and 48 Chinese faces of the six emotion categories were selected from the facial database. Both the 800 ms long vocal stimuli and the facial expressions were then subjected to a validation study, in which each stimulus was validated by 20 in-group native speakers (i.e., 20 English-speaking European Canadians and 20 Mandarin-speaking Chinese, respectively). For each stimulus, the participant made two consecutive judgments: they first identified the emotion being expressed by each item in a six-alternative forced-choice emotion recognition task (with happiness, sadness, anger, disgust, fear, and neutrality as the 6 options); immediately after, they rated the intensity of the emotion they had selected in the previous recognition task on a 5-point rating scale, where 0 indicated "not intense at all" and 4 indicated "very intense".
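The 800 ms trimming described above was performed in Praat; the following is a minimal Python sketch of the same operation, assuming WAV files and the soundfile package (file paths are hypothetical, and this is not the authors' script):

```python
# Trim a recording to its first 800 ms measured from sentence onset.
import soundfile as sf

def trim_to_800ms(in_path: str, out_path: str, dur_s: float = 0.8) -> None:
    data, sr = sf.read(in_path)       # samples (x channels) and sampling rate
    n = int(round(dur_s * sr))        # number of samples in the first 800 ms
    sf.write(out_path, data[:n], sr)  # write the 800 ms fragment

trim_to_800ms("stimuli/raw/utterance_01.wav", "stimuli/cut/utterance_01.wav")
```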

For stimulus selection, recognition rates were calculated for each item across participants; intensity ratings for each item were calculated across participants with correct responses in the emotion recognition task. A three-way ANOVA was run on recognition rates and intensity ratings separately, with sensory channel (voice and face), emotion (fear and sadness), and group (English and Chinese) as three independent variables. To control the emotional salience of the stimuli and ensure that the expected culture-specific bias is not confounded with stimulus salience and clarity (Matsumoto, 2002), stimuli were selected in a way that both the recognition rates and intensity ratings were matched between modalities (vocal and facial), emotions (fear and sadness), and cultural groups (English and Chinese); i.e., there were no significant differences in recognition rates or intensity ratings between vocal and facial modalities, fear and sad emotions, and the two groups. In order to meet all these criteria, 8 pseudo-utterances (2 items × 4 speakers) and 6 facial expressions (1 item × 6 actors) of two emotional categories, sadness and fear, were selected for each group in the present study. Emotion recognition rates and intensity ratings of these selected stimuli were of medium-to-high level for both groups (Table 1). In our tasks, only in-group stimuli were presented to participants in each group (i.e., Chinese faces and voices to the Chinese group, Caucasian faces and English voices to the English group); this design satisfies our immediate goal of evaluating how multi-sensory emotions are perceived by each group in the context of their native language and culture, based on learning such as display rules that are likely to produce culture-specific responses. For each group, the in-group facial and vocal stimuli were paired with each other to construct bi-sensory emotional stimuli, forming congruent (fear face and fear voice, sad face and sad voice) and incongruent (fear face and sad voice, sad face and fear voice) trials. In each pair, the static face was synchronized to the dynamic voice, resulting in a total duration of 800 ms for each face–voice pair.
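The matching procedure above rests on a three-way ANOVA yielding no significant effects; a sketch of such a check, assuming a long-format table with one row per stimulus (the column names and the use of statsmodels are assumptions, as the source does not name its software):

```python
# Channel x Emotion x Group ANOVA on pilot recognition rates; the stimuli
# were selected so that no main effect or interaction reaches significance.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.read_csv("pilot_ratings.csv")  # hypothetical: channel, emotion, group, recognition
model = ols("recognition ~ C(channel) * C(emotion) * C(group)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # repeat with 'intensity' as the outcome
```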

Table 1
Mean recognition rates (per cent correct target identification) and emotional intensity ratings (on a 5-point scale, where 0 = very weak and 4 = very intense) of the vocal and facial stimuli by emotion and cultural group (standard deviation in parentheses).

                    Voices                                          Faces
                    English                  Chinese                English                  Chinese
                    Fear       Sadness       Fear       Sadness     Fear       Sadness       Fear       Sadness
Recognition rates   90.6(1.8)  92.5(2.7)     91.2(5.2)  91.2(5.2)   90.8(8.6)  91.7(9.3)     91.2(7.6)  90.9(10.0)
Intensity ratings   2.4(0.3)   2.4(0.4)      2.5(0.3)   2.4(0.3)    2.4(0.3)   2.4(0.4)      2.3(0.4)   2.6(0.5)

2.3. Task

An emotional Stroop-like paradigm yielding both behavioural and ERP measures was administered, composed of two experimental conditions: a face task and a voice task. Each condition consisted of 1 block of 192 trials, in which 96 trials were composed of congruent face–voice pairs (48 fear face–voice pairs, 48 sad face–voice pairs), while the other 96 were incongruent pairs (48 fear face–sad voice pairs, 48 sad face–fear voice pairs). In the face task, the participants were required to identify the facial expression as fear or sad and ignore the voice; in the voice task, they identified the vocal expression as fear or sad while ignoring the face.
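For concreteness, the 192-trial block composition described above can be expressed in a few lines (a sketch, not the authors' Presentation script; pair labels list the face emotion first and the voice emotion second):

```python
# Build one block: 48 trials of each of the four face-voice pairings.
import random

pairings = [("fear", "fear"), ("sad", "sad"),   # 96 congruent trials
            ("fear", "sad"), ("sad", "fear")]   # 96 incongruent trials
trials = [pair for pair in pairings for _ in range(48)]  # 4 x 48 = 192
random.shuffle(trials)  # trials are presented in random order within the block
```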

2.4. Procedure

The experimental stimuli were presented and controlled by Presentation software (Version 16.0, www.neurobs.com). Each trial started with a 1000 ms fixation cross at the centre of the monitor followed by a 500 ms blank screen, followed immediately by the face–voice pair, which lasted for 800 ms. The face was displayed at the centre of a computer monitor (600 × 420 pixels) and the voice was presented binaurally via insert headphones at a comfortable listening level. The participants were instructed to respond as accurately and as quickly as possible to identify the emotion of the face or the voice as fear or sad, by pressing one of two buttons on a response box, while ignoring the other channel. The assignment of the two response labels, fear ("害怕" in Chinese) and sad ("伤心" in Chinese), was counterbalanced to the right and left response buttons. During the voice task, the participants were instructed to passively look at the faces but actively attend to the emotion expressed by the voice and make the judgment accordingly. They were constantly monitored by the experimenter during the task to make sure that they were closely following the instructions. After a response (or after 3500 ms from the offset of the face–voice stimulus in case of no response), a variable inter-trial interval of 500–1000 ms occurred before the next trial began (specifically, a series of random numbers within the range of 500–1000 was generated in Excel and used as the inter-trial intervals for each experimental condition). Fig. 2 illustrates the sequential structure of each trial. All trials were presented randomly within each block. Each condition started with a practice block of 10 trials (which did not appear in the experiment) to familiarize participants with the experimental procedure. The order of the two conditions (face task condition and voice task condition) was counter-balanced among participants, and a 10-min break was inserted between blocks.

Fig. 2. Trial structure of the voice and face tasks.
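The jittered inter-trial intervals described above were pre-generated in Excel; an equivalent draw in Python, as a sketch:

```python
# One uniformly distributed inter-trial interval (in ms) per trial of a
# 192-trial block, mirroring the 500-1000 ms jitter generated in Excel.
import random

itis_ms = [random.randint(500, 1000) for _ in range(192)]
```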

2.5. EEG recording and preprocessing

After preparation for EEG recording, the participants were seated at a distance of approximately 65 cm in front of a computer monitor in a dimly lit, sound-attenuated and electrically-shielded testing booth. While performing the experiment, EEG signals were recorded from 64 cap-mounted active electrodes (modified 10/20 System) with the AFz electrode as the ground, the FCz electrode as on-line reference (Brain Products, ActiCap, Munich), and a sampling rate of 1000 Hz. Four additional electrodes were placed for vertical and horizontal electro-oculogram recording: two at the outer canthi of the eyes and two above and below the left eye. The impedance for all electrodes was kept below 5 kΩ. The EEG data were resampled off-line to 250 Hz, re-referenced to the average of all 64 electrodes, and 0.1–30 Hz bandpass filtered in EEGLAB (Delorme and Makeig, 2004). Trials were epoched from −200 to 800 ms relative to stimulus onset with a pre-stimulus baseline of 200 ms (−200 to 0 ms). The data were then inspected visually to reject unreliable channels and trials containing large artifacts and drifts, after which EOG artifacts were rejected by means of ICA decomposition. On average, approximately 78.4% of the data were retained for analysis after excluding artifacts and incorrect responses (Chinese face task: 80.8%; Chinese voice task: 80.2%; English face task: 76.1%; and English voice task: 76.6%). At a final step, data were averaged according to the experimental conditions for each group (congruent/incongruent trials of the face task, congruent/incongruent trials of the voice task).
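The preprocessing pipeline above was run in EEGLAB; the sketch below re-expresses the same steps in MNE-Python for illustration (the file name, event codes, and EOG channel label are assumptions):

```python
# Resample to 250 Hz, band-pass 0.1-30 Hz, average reference, epoch -200 to
# 800 ms with a 200 ms baseline, and remove EOG artifacts via ICA.
import mne

raw = mne.io.read_raw_brainvision("sub01.vhdr", preload=True)  # 64-channel recording
raw.resample(250)                 # down-sample from 1000 Hz
raw.filter(0.1, 30.0)             # 0.1-30 Hz band-pass
raw.set_eeg_reference("average")  # re-reference to the average of all electrodes

events, _ = mne.events_from_annotations(raw)
epochs = mne.Epochs(raw, events, event_id={"congruent": 1, "incongruent": 2},
                    tmin=-0.2, tmax=0.8, baseline=(-0.2, 0.0), preload=True)

ica = mne.preprocessing.ICA(n_components=20, random_state=0)
ica.fit(epochs)
eog_idx, _ = ica.find_bads_eog(epochs, ch_name="EOG1")  # assumed EOG channel name
ica.exclude = eog_idx
ica.apply(epochs)  # remove ocular components, keeping the rest of the signal
```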

2.6. Statistical analyses

Our analyses were designed to evaluate the predicted Stroop effect in both our behavioural measures (accuracy rates and response time) and N400 data. For the ERP data, only trials with correct behavioural responses were included in the statistical analyses; this contrasts with previous studies that employed emotional Stroop-like paradigms (e.g., Schirmer and Kotz, 2003; Goto et al., 2010, 2013) where all trials were averaged for the N400 component. By focusing here on correct trials, when there was evidence that the registered Stroop effect—i.e., the N400 difference between emotionally incongruent and congruent conditions—reflected true instances when participants attended to the target channel, we hoped to achieve a stronger test of the cost or benefit of the unattended information channel. The N400 literature suggests that this component is typically largest in the centro-parietal region during the 200–600 ms post-stimulus time window (Kutas and Federmeier, 2011), with a more frontal distribution for visual stimuli (Ganis et al., 1996) and auditory language stimuli (see review by Kutas and Van Petten, 1994). Therefore, we expected to observe the N400 component in this corresponding area falling within the 200–600 ms time span and restricted our ERP data analyses as such (for a discussion of ERP data quantification, see Handy, 2004). Visual inspection of the averaged ERPs for each condition of each group confirmed these expectations, with a negative deflection after 250 ms post-stimulus in the incongruent versus congruent trials, distributed primarily in the middle fronto-centro-parietal area. Accordingly, 15 electrodes from this region (FC1, FC2, Cz, C1, C2, C3, C4, CPz, CP1, CP2, CP3, CP4, Pz, P1, and P2) were selected for subsequent analyses. For these electrodes, an exploratory examination of the peak latency of the negativities within the 200–600 ms time window yielded an averaged peak latency of 337 ms (range 327–363 ms) across conditions and groups. Based on this observation, which conformed to both previous literature and visual inspection, mean amplitude in the 250–450 ms time window was extracted for the N400 analysis for the selected electrodes.
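Given epochs preprocessed as in the earlier sketch, the N400 quantification described above reduces to a windowed mean over the 15 listed electrodes (a sketch; 'epochs' and the condition labels follow the earlier assumptions):

```python
# Mean amplitude in the 250-450 ms window over the fronto-centro-parietal sites.
N400_CHANNELS = ["FC1", "FC2", "Cz", "C1", "C2", "C3", "C4",
                 "CPz", "CP1", "CP2", "CP3", "CP4", "Pz", "P1", "P2"]

def mean_amplitude(epochs, condition, tmin=0.250, tmax=0.450):
    data = (epochs[condition].copy()
            .pick(N400_CHANNELS)
            .crop(tmin=tmin, tmax=tmax)
            .get_data())        # shape: trials x channels x samples, in volts
    return data.mean() * 1e6    # grand mean amplitude in microvolts

n400_congruent = mean_amplitude(epochs, "congruent")
n400_incongruent = mean_amplitude(epochs, "incongruent")
```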

For both the behavioural and N400 data, a three-step analysis was performed to focus each analysis on patterns of specific theoretical importance in a clear and incremental manner. First, two-way repeated-measures ANOVAs were conducted on accuracy, RT, and the mean amplitude of the 250–450 ms time window across selected electrodes to examine whether an effect of congruence was elicited in each condition, separately for each group. For each ANOVA, congruence of the trials (incongruent and congruent) and emotion of the attended channel (i.e., voices in the voice task, faces in the face task; fear and sadness) were included as two within-subject factors.
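This first step can be sketched with statsmodels' repeated-measures ANOVA, assuming a long-format table of per-participant condition means (column names are hypothetical; the source does not name its statistics package):

```python
# Two-way repeated-measures ANOVA (Congruence x Emotion) for one group and
# one measure, e.g. accuracy in the Chinese face task.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("chinese_face_acc.csv")  # columns: pid, congruence, emotion, acc
res = AnovaRM(df, depvar="acc", subject="pid",
              within=["congruence", "emotion"]).fit()
print(res)
```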

Second, behavioural and neural indices of the congruence effect were analyzed in the form of difference scores to directly compare the two cultural groups, while controlling for extraneous factors (e.g., repetition and/or motor-related effects; see Goto et al., 2010, 2013; Ishii et al., 2010; Tanaka et al., 2010). Two difference scores were computed for the behavioural data for each participant in each condition: difference accuracy as the subtraction of accuracy in the incongruent trials from that in the congruent trials (congruent − incongruent); and difference RT as the subtraction of RT in the congruent trials from that in the incongruent trials (incongruent − congruent). For the ERP data, a difference wave (i.e., difference N400) was obtained from each selected electrode by subtracting the mean amplitude of the 250–450 ms time window in the congruent trials from that of the incongruent trials (incongruent − congruent). Differences between congruent and incongruent trials in behavioural or neural responses reflect the extent to which the information from the task-irrelevant channel is integrated into the task-relevant channel (i.e., how much the mismatching information from the task-irrelevant channel interferes with the target task, and how much the matching information from the task-irrelevant channel facilitates the target task). Therefore, higher difference scores and a larger difference wave indicate a larger congruence effect. Third, to specify the impact of culture on the integration effect, a three-way repeated-measures ANOVA was conducted on difference accuracy, difference RT, and the mean amplitude of difference N400 across selected electrodes, separately. Task (face task and voice task) was considered as the within-subjects factor, while Culture (Chinese and English) and Gender (female and male) were taken as two between-subjects factors.
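The difference scores defined above are simple per-participant subtractions; a minimal sketch with placeholder values (random numbers standing in for real condition means):

```python
# Difference scores: positive values index a larger congruence (Stroop) effect.
import numpy as np

rng = np.random.default_rng(0)                     # placeholder data only
acc_con, acc_inc = rng.uniform(0.88, 0.94, 20), rng.uniform(0.80, 0.90, 20)
rt_con, rt_inc = rng.normal(760, 40, 20), rng.normal(790, 40, 20)
n400_con, n400_inc = rng.normal(-1.0, 0.5, 20), rng.normal(-1.8, 0.5, 20)

diff_accuracy = acc_con - acc_inc   # congruent - incongruent
diff_rt = rt_inc - rt_con           # incongruent - congruent
diff_n400 = n400_inc - n400_con     # incongruent - congruent (difference wave)
```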

In addition to hypothesized changes in the N400, visual inspection showed an earlier positive-going deflection around 200–300 ms after stimulus onset in the parieto-occipital region in the voice task of the English group, but not in the other conditions. To quantify this potential earlier effect, 11 parieto-occipital electrodes (Oz, O1, O2, POz, PO3, PO4, Pz, P1, P2, P3, and P4) were selected based on visual inspection and the mean amplitude of the 200–300 ms time window was extracted for each electrode. A similar three-step analysis (i.e., two-way ANOVA of Congruence and Emotion; calculation of the difference wave; three-way ANOVA of Culture, Task, and Gender) was conducted on the amplitude values of this time window across the selected electrodes.

Finally, to explore potential associations between the participants' behaviour and on-line brain responses to cross-sensory emotional stimuli, Pearson correlation analyses (two-tailed) were conducted between the behavioural difference scores (accuracy; RT) and the amplitude of difference N400 in each experimental condition for each group, as well as between groups for each condition.
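The correlation step can be sketched with SciPy, whose pearsonr is two-tailed by default (the arrays reuse the placeholder difference scores from the previous sketch):

```python
# Pearson correlation between behavioural and neural congruence effects.
from scipy.stats import pearsonr

r, p = pearsonr(diff_accuracy, diff_n400)  # difference accuracy vs. difference N400
print(f"r = {r:.3f}, p = {p:.3f}")
```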

3. Results

3.1. Behavioural results

The ANOVAs performed on the behavioural data for each condition for each group showed a significant main effect of Congruence on accuracy rates (Chinese face task, F(1,19) = 7.114, p < .01, r¹ = 0.522; Chinese voice task, F(1,19) = 6.967, p < .01, r = 0.518; English face task, F(1,18) = 5.406, p < .01, r = 0.481; English voice task, F(1,18) = 8.197, p < .01, r = 0.559) and on response times (Chinese face task, F(1,19) = 9.212, p < .01, r = 0.571; Chinese voice task, F(1,19) = 8.031, p < .01, r = 0.545; English face task, F(1,18) = 5.037, p < .05, r = 0.468; and English voice task, F(1,18) = 6.859, p < .01, r = 0.525). Overall, lower accuracy and longer response times were observed when participants encountered emotionally incongruent face–voice pairs than congruent pairs. No significant effect involving Emotion was found for these analyses (all ps > .07).

¹ Effect size, calculated as r = √(F/(F + df_error)) (Rosnow and Rosenthal, 2003).
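The effect sizes reported throughout follow footnote 1; as a worked check (the helper name is hypothetical):

```python
# r = sqrt(F / (F + df_error)); e.g., F(1,19) = 7.114 gives r ~= 0.52,
# matching the Chinese face-task accuracy effect reported above.
import math

def effect_size_r(F: float, df_error: int) -> float:
    return math.sqrt(F / (F + df_error))

print(effect_size_r(7.114, 19))  # ~0.522
```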

Analysis of the difference scores revealed a significant interaction between Task and Culture on difference accuracy (F(1,35) = 4.767, p < .05, r = 0.346), but no such effect was found on difference RT (p = .17). Simple effect analysis of difference accuracy indicated that the Chinese group showed no difference in performing the face task versus the voice task (p = .26); however, the English group exhibited a significant Task effect (F(1,35) = 20.921, p < .01, r = 0.612) explained by a larger congruence effect in the voice task than in the face task. The effect of culture was significant in the voice task (F(1,35) = 3.912, p < .05, r = 0.317), as Chinese participants showed a smaller congruence effect than the English group when attending to vocal emotions; no such cultural difference was noted in the face task (p = .20). No effect involving Gender was found (ps > .21; see Fig. 3 for illustration; see Table 2 for means and standard deviations of the behavioural data).

Fig. 3. Difference accuracy in the voice and face task for each cultural group.

Table 2
Mean behavioural accuracy and response time (RT) for each group in each experimental condition (standard deviation in parentheses).

Group     Task    Condition     Emotion (face–voice)   Accuracy     RT (ms)
Chinese   Face    Congruent     Sad–sad                0.92(0.05)   745(263)
                                Fear–fear              0.91(0.04)   755(273)
                  Incongruent   Sad–fear               0.89(0.09)   776(313)
                                Fear–sad               0.88(0.06)   770(256)
          Voice   Congruent     Sad–sad                0.87(0.08)   784(354)
                                Fear–fear              0.87(0.05)   780(332)
                  Incongruent   Sad–fear               0.84(0.08)   810(183)
                                Fear–sad               0.83(0.07)   802(347)
English   Face    Congruent     Sad–sad                0.94(0.05)   699(184)
                                Fear–fear              0.93(0.07)   700(215)
                  Incongruent   Sad–fear               0.91(0.05)   718(165)
                                Fear–sad               0.90(0.06)   715(268)
          Voice   Congruent     Sad–sad                0.91(0.08)   760(305)
                                Fear–fear              0.90(0.07)   757(271)
                  Incongruent   Sad–fear               0.80(0.13)   781(318)
                                Fear–sad               0.79(0.09)   782(298)

3.2. ERP results

The two-way ANOVAs performed on the mean amplitude of the 250–450 ms time window showed a significant main effect of Congruence across the selected electrodes (i.e., evidence of the N400 component) in each condition for each group (Chinese face task, F(1,19) = 6.140, p < .01, r = 0.494; Chinese voice task, F(1,19) = 6.375, p < .01, r = 0.501; English face task, F(1,18) = 5.431, p < .05, r = 0.481; and English voice task, F(1,18) = 9.786, p < .01, r = 0.593). More negative-going amplitudes were observed in response to emotionally incongruent face–voice pairs relative to congruent pairs, demonstrating a Stroop/congruence effect elicited at the neural processing level. Again, no significant effect involving Emotion was found (ps > .27; see Fig. 4A).

Fig. 4. Grand averages elicited by congruent trials (solid lines), incongruent trials (dotted lines), and the difference wave (dashed lines; incongruent − congruent) for Chinese and English participants in the Face and Voice tasks, measured at (A) the Cz electrode and (B) the POz electrode.

The following analysis on difference N400 revealed a significant interaction between Task and Culture (F(1,35) = 5.619, p < .05, r = 0.372). Step-down analyses indicated that while no effect of Task was observed in the Chinese group (p = .31), significant task differences characterized the English group (F(1,35) = 10.587, p < .01, r = 0.482), who showed a larger difference N400 in the voice task than in the face task. The cultural effect was significant in the voice task only (F(1,35) = 7.257, p < .01, r = 0.414), with the Chinese group showing a smaller difference N400 than the English group (see Fig. 5). No effect involving Gender was found (ps > .11).

Fig. 5. (A) Grand averages at the Cz electrode and topographic maps of difference N400 for Chinese and English participants in each task. (B) Mean amplitude values of difference N400 averaged across selected electrodes by task and group.

For the earlier positivity observed in the parieto-occipital area, the two-way ANOVAs showed a significant main effect of congruence on the mean amplitude of the 200–300 ms time window in the voice task of the English group only (F(1,18) = 8.906, p < .01, r = 0.575), not in any other condition (ps > .13). The incongruent trials in the voice task elicited a more positive-going amplitude relative to the congruent ones in the English group, but not in the Chinese group. No significant effect involving the emotion of the attended channel was found (ps > .24). The following three-way analysis on the difference wave revealed a significant effect of Culture (F(1,35) = 6.113, p < .05, r = 0.386) and a significant interaction between Task and Culture (F(1,35) = 5.174, p < .05, r = 0.359). Post-hoc comparisons suggested that the English group showed a larger difference wave between congruent and incongruent conditions than the Chinese group (F(1,35) = 6.062, p < .05, r = 0.384); this cultural difference was significant in the voice task (F(1,35) = 8.066, p < .01, r = 0.433) but not in the face task (p > .23). Again, no effect involving Gender was found (ps > .15; see Fig. 4B).²

² An omnibus five-way ANOVA performed on the behavioural and N400 data, with Congruence of the trials, Emotion of the attended channel, and Task as within-subjects factors, and Culture and Gender as between-subjects factors, yielded results consistent with those of the three-step analysis. In particular, a significant effect of Congruence was observed in behavioural response time (F(1,35) = 8.036, p < .01, r = 0.432), behavioural accuracy (F(1,35) = 9.164, p < .01, r = 0.456), and the amplitude of the N400 (F(1,35) = 8.373, p < .01, r = 0.439), demonstrating a Stroop effect on both the behavioural and neural semantic levels. Moreover, in the accuracy data, a significant three-way interaction was found between Congruence, Task, and Culture (F(1,35) = 5.501, p < .05, r = 0.369). Step-down analysis showed that the effect of Culture was significant only in the incongruent trials of the voice task (F(1,35) = 4.517, p < .05, r = 0.338), where the Chinese group outperformed the English, but not in any other condition. This pattern suggests that, compared to the Chinese, the English participants were impeded to a larger degree by incongruent to-be-ignored faces in recognizing voices. In the N400 data, a significant interaction between Culture and Task (F(1,35) = 5.705, p < .05, r = 0.374) was observed; a larger N400 was found in the English group than in the Chinese group in the voice task (F(1,35) = 7.113, p < .01, r = 0.411), but not in the face task (p = .23). A significant three-way interaction between Congruence, Task, and Culture was also observed (F(1,35) = 6.889, p < .01, r = 0.406). Specifically, the effect of Task was significant in the incongruent condition of the English group only, but not in the Chinese (F(1,35) = 5.410, p < .05, r = 0.366); in the English group, a larger N400 was elicited by the incongruent trials of the voice task compared to that of the face task. Again, these observations suggest that the English participants were distracted to a larger degree by incongruent facial expressions in the voice task. No significant effect involving the Emotion of the attended channel or Gender was found (ps > .11). Last, the five-way ANOVA was also conducted on the mean amplitude of the early positivity (200–300 ms) across selected electrodes. The results showed a significant three-way interaction between Congruence, Task, and Culture, according to which the effect of congruence was significant in the voice task of the English group only (F(1,35) = 6.001, p < .01, r = 0.383), but not in any other condition, suggesting that only the English participants were influenced by irrelevant faces in the voice task to a significant degree. These results show patterns consistent with those of the three-step analysis, demonstrating culture-specific sensitivities towards different information sources in emotion perception.

3.3. Correlation results

Pearson correlations demonstrated significant or marginally significant positive correlations between accuracy (difference score) and N400 amplitudes (difference score) for each condition of each group: Chinese face task, r(18) = .298, p < .05; Chinese voice task, r(18) = .207, p = .061; English face task, r(17) = .250, p = .05; and English voice task, r(17) = .301, p < .05. The analysis also yielded significant positive correlations across the two groups in each condition between accuracy and N400: face task, r(37) = .270, p < .05; voice task, r(37) = .265, p < .05. No significant relationship between difference RT and difference N400 was observed (ps > .10). In general, larger difference accuracy was associated with larger difference N400 for each group, suggesting that cross-channel emotion integration occurred at both the behavioural and neural semantic levels in a consistent and related manner.

4. Discussion

To our knowledge, this is one of the first studies to examine how Chinese participants perceive multi-sensory emotions in comparison to individuals from a Western culture. The pattern observed in the Chinese group was consistent with that found when Japanese participants processed cross-channel emotional stimuli, where they showed lower sensitivity to facial (Tanaka et al., 2010) and semantic information (Ishii et al., 2003; Kitayama and Ishii, 2002) than Westerners. Our results are also in line with a previous observation in non-emotional communication that Japanese speakers use visual cues less than English speakers when interpreting audiovisual speech (Sekiyama and Tohkura, 1991). The compatibility between the present study and the relevant literature suggests that the cultural characteristics previously found in the Japanese population (i.e., higher reliance on vocal cues) are shared by the Chinese group, which is another major East Asian collectivist culture that shares certain cultural similarities with the Japanese culture. More importantly, this study provides novel evidence of the cultural effect on the neural activities underlying emotional processing and communication. The modulatory effect of culture was found not only in behavioural performance, but also in the N400 during semantic processing. In the literature on visual perception, a cultural effect on the N400 has been reported in visual emotion/object perception (Goto et al., 2010, 2013). The present study provides the first neurophysiological evidence that this ERP component is sensitive to culture in the domain of audio-visual emotion perception, and broadens our knowledge of the role of cultural learning in perception and cognition in general.

4.1. The findings of this study

As predicted, our results demonstrate significant cultural differences between the Chinese and English groups in processing multi-channel emotional stimuli: whereas Chinese participants did not show significant differences between the face and voice tasks, the English group was influenced to a larger extent by facial cues in the voice task than by vocal cues in the face task. These differences were evident in behavioural accuracy, the N400 component, as well as a positive deflection that occurred around 200 ms post-stimulus onset in select conditions (as elaborated below).

Regardless of cultural origin, a significant congruence effect was observed at both the behavioural and electrophysiological level in each experimental condition, as participants performed worse in incongruent versus congruent trials with an increased N400 component. This confirms previous reports that emotion integration across faces and voices can be observed at both the behavioural and neural levels (Collignon et al., 2008; de Gelder and Vroomen, 2000; Föcker et al., 2011; Ishii et al., 2010; Massaro and Egan, 1996; Mitchell, 2006; Nygaard and Queen, 2008; Schirmer and Kotz, 2003). Moreover, as the congruence effect was not influenced by the emotional meaning of the facial or vocal expressions, it seems that the process of cross-channel semantic integration is relatively independent of the specific emotions being appraised. Importantly, while the congruence effect was observed in each attentional condition for each group, there was a significant distinction between groups in the size of the congruence effect as a function of task. The English group exhibited a larger effect in the voice task than the face task, indicated by larger difference accuracy scores and a larger difference N400 in the voice task, whereas no between-task difference was observed for Chinese participants. These observations suggest that the English participants were distracted to a larger degree by to-be-ignored faces in the voice task than by to-be-ignored voices in the face task, whereas no such pattern was observed in the Chinese group. These results are compatible with the hypothesis that individuals from Western cultures are relatively more oriented to facial versus vocal cues during communication in comparison with individuals from East Asian cultures (Tanaka et al., 2010).

The N400 effect observed in each experimental condition showed temporal and spatial characteristics similar to those of the semantic N400 reported in the previous literature (Kutas and Federmeier, 2011). In particular, it has been argued that the N400 effect reflects the processing of mismatching semantic/emotional cues from different information sources, with larger N400 amplitudes associated with the increased cognitive processing resources needed to integrate mismatching meanings from different sources (Brown and Hagoort, 1993; Brown et al., 2000; Chwilla et al., 1995; Holcomb and Neville, 1990; Kotz and Paulmann, 2007; Paulmann and Kotz, 2008; Paulmann et al., 2008; Schirmer et al., 2002). While this is the typical interpretation of the Stroop-induced N400 effect adopted by previous studies directly relevant to this one (Ishii et al., 2010; Schirmer and Kotz, 2003), an alternative interpretation is that it reflects top-down conflict control functions. When mismatching information from two different sources is presented simultaneously in interference tasks, the participant has to focus on one source and inhibit conflicting information activated by the other. It has thus been argued that the Stroop-elicited N400-like component in the fronto-central area is associated with conflict monitoring and inhibitory control (Dong et al., 2011; Liotti et al., 2000), which may arise from the anterior cingulate cortex and frontal cortex (Etkin et al., 2006; Haas et al., 2006). In the present study, it is possible that this conflict control process is vulnerable to the modulatory effect of acquired cultural biases about the importance of specific social cues in communication, which impact multi-sensory emotion perception, yielding our observation that English participants found it more difficult to inhibit irrelevant facial information than participants in the Chinese group.

In addition to the hypothesized N400 effect, a significant congruence effect on a parietal-occipital positivity around 200–300 ms was also observed in the EEG data (i.e., larger amplitudes were elicited in incongruent relative to congruent trials), but exclusively in the voice task for the English group. The temporal and spatial features of this deflection resemble those of the posterior P2 component reported in the literature, which is thought to originate in posterior visual association cortex and to relate to visual attention; more specifically, larger posterior P2 amplitudes have been observed in response to attended affective stimuli (Carretié et al., 2001; Eimer et al., 2001; Talsma et al., 2005). In our experiment, the increased positivity found in the voice task may reflect an involuntary attention shift to the irrelevant facial information. Interestingly, this was observed only for English and not Chinese participants, providing potentially new support for the argument that individuals from many Western cultures are more attentive to facial cues than East Asian individuals, based on highly sensitive, on-line brain responses to emotional stimuli.
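The window-based measurement of such a positivity can be sketched the same way; in the toy Python example below, the channel cluster, trial counts, and effect size are invented, and only the 200–300 ms averaging window follows the description above:

```python
import numpy as np

rng = np.random.default_rng(1)

sfreq = 250                                   # Hz (assumed)
times = np.arange(-0.2, 0.8, 1.0 / sfreq)     # epoch with a 200 ms baseline
n_trials, n_channels = 48, 4                  # hypothetical parieto-occipital cluster

# Hypothetical single-trial EEG (µV), shape: (trials, channels, samples);
# incongruent trials carry a small positive offset to mimic the reported effect.
eeg_congruent = rng.normal(0.0, 5.0, (n_trials, n_channels, times.size))
eeg_incongruent = rng.normal(0.5, 5.0, (n_trials, n_channels, times.size))

# Mean amplitude in the 200-300 ms window, averaged over trials and channels.
win = (times >= 0.20) & (times <= 0.30)
p2_congruent = eeg_congruent[:, :, win].mean()
p2_incongruent = eeg_incongruent[:, :, win].mean()

print(f"P2-like congruence effect: {p2_incongruent - p2_congruent:.2f} µV")
```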

While the English group showed the expected cognitive biases to irrelevant faces in the voice task (e.g., a larger congruence effect in accuracy and N400), no difference between the two tasks was observed in the Chinese group. This is somewhat unexpected given the assumption that Chinese participants, like Japanese individuals (Ishii et al., 2003), would be more oriented to vocal cues and would therefore show a larger congruence effect in the face task than in the voice task. This pattern could be especially pronounced in the context of processing negative emotions, as in the present study, given that Chinese/East Asian display rules tend to mask negative facial expressions (Ekman, 1971; Matsumoto et al., 2008, 2005), meaning that these individuals may rely more on tonal cues to understand negative feelings (Ishii et al., 2010). Indeed, the absence of the expected voice bias in the Chinese group here could be partly due to our focus on only negatively-valenced emotions, sadness and fear; encountering explicit, unmasked expressions of negative emotion may well have been more salient and captured relatively greater attention for Chinese participants, who are less accustomed to these expressions in daily life, attenuating the size of the Stroop effect in the voice judgment condition. Future studies will need to employ other sets of emotions, including positive emotions, to elaborate on our findings. Nonetheless, in spite of the lack of differences between tasks for the Chinese participants, when compared to English participants they showed significantly smaller congruence effects in the voice task, although not in the face task. This suggests that in the voice task, Chinese adults had less difficulty ignoring the faces and attuned more easily to the voices than English listeners; this pattern is compatible with results reported in the behavioural literature involving the Japanese population (Ishii et al., 2003; Kitayama and Ishii, 2002; Tanaka et al., 2010). Moreover, given that these previous studies focused on the perception of emotions differing in positive/negative valence (e.g., pleasant and unpleasant; happiness and anger), the current study extends previous work by showing that the integration of two discrete negative emotions (sadness and fear) is also subject to the influence of culture.

Although the facial and vocal stimuli were carefully validated and controlled for perceived emotional salience between communication channels, a majority of participants in each group reported during debriefing that it was more demanding to identify the emotion of vocal expressions in pseudo-sentences than that of facial expressions. Given that pseudo-sentences were somewhat unusual and less familiar than facial displays, it is possible that, for both groups, vocal expressions were relatively easy to ignore in the face task, whereas faces were more difficult to ignore in the voice task, leading to a potential ceiling effect in the face task. During the task, participants were advised that the pseudo-sentences they heard were semantically meaningless, and in the voice task they were further instructed to pay attention to the tone of the utterance without wondering about its semantic meaning; however, one still cannot exclude the possibility that participants were involuntarily distracted by the unusual linguistic content of the pseudo-sentences. This is especially plausible for the English participants, given suggestions that they are more attuned to semantic meanings (e.g., Kitayama and Ishii, 2002; Ishii et al., 2003). If true, the larger congruence effect expected in the face task relative to the voice task for the Chinese group might have been mitigated by differences in stimulus features; likewise, in the English group, the large congruence effect observed in the voice task may have been facilitated for this reason. In future investigations, using faces of reduced visibility (e.g., by adding dynamic noise to the images; Tanaka et al., 2010), or faces that are physically degraded in some manner to be more comparable in task demands to pseudo-utterances, might help to reduce the apparent ceiling effect in the face task and better reveal the impact of vocal cues in the Chinese group.
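For illustration, the dynamic-noise manipulation mentioned above could be implemented along the following lines (a sketch only; the image size, frame count, and noise level are arbitrary assumptions, not parameters from Tanaka et al., 2010):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for a grayscale face image with pixel intensities in [0, 1].
face = rng.random((256, 256))

def dynamic_noise_frames(image, n_frames=30, noise_sd=0.3):
    """Overlay independent Gaussian noise on each frame of a short clip,
    so the noise pattern changes dynamically while the face stays constant."""
    noise = rng.normal(0.0, noise_sd, (n_frames,) + image.shape)
    return np.clip(image[np.newaxis] + noise, 0.0, 1.0)

# e.g., 30 frames shown at 30 fps yield one second of a dynamically masked
# face; raising noise_sd lowers face visibility.
frames = dynamic_noise_frames(face)
```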

Despite the atypical nature of pseudo-utterances (which were necessary to isolate emotion in the voice), it should be underlined that emotion recognition rates and intensity ratings of stimuli in both channels were high overall and carefully matched across channels. Moreover, the semantic content of the pseudo-sentences did not bear any emotional meaning or emotional relationship to either the voice or the face; consequently, no match or mismatch between the semantic content of the sentences and their emotional tones could be registered. Thus, any potential distraction effect of the (pseudo) content on judging vocal tones would not differ between the congruent and incongruent conditions, and would be cancelled out for each group when the Stroop effect was calculated. A similar argument can be made with respect to other possible limitations of our stimuli; apart from emotional features, there were other obvious (hypothesis-irrelevant) mismatches between the facial and vocal stimuli. For instance, the identities of the faces were not consistently matched with the identities of the accompanying voices, and facial expressions were static whereas vocal expressions were dynamic. These discrepancies between the two information sources are undoubtedly registered by the neural system in each task condition as well. However, these factors were again controlled by focusing the data analysis on the difference between the congruent and incongruent conditions, which varied only in the emotional relationship between the two events. Similarly, the fact that the stimuli presented to each cultural group were physically distinct (i.e., each group was presented a unique set of culturally appropriate, in-group stimuli) does not represent a limitation of our design, as group comparisons always considered the difference in the Stroop effect between the face and voice tasks within each cultural context. Thus, it is unlikely that the robust group differences we observed are confounded by the use of pseudo-utterances or other uncontrolled stimulus features in our experiment; rather, our data convincingly argue that English participants were more attuned to emotional faces than to voices, whereas Chinese participants exhibited a more balanced pattern.
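The cancellation argument can be made explicit. As a simple formalization (our notation, not the paper's), suppose each condition mean comprises a congruence-sensitive term plus a nuisance term $c$ that is constant across conditions (pseudo-speech oddness, face–voice identity mismatch, static versus dynamic format). Under that additivity assumption, the subtraction removes $c$:

$$\text{Stroop effect} = (\mu_{\text{incongruent}} + c) - (\mu_{\text{congruent}} + c) = \mu_{\text{incongruent}} - \mu_{\text{congruent}}.$$

Note that this holds only if the nuisance factor does not interact with congruence; an interaction term would survive the subtraction.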

In sum, this experiment extends previous findings on emotion perception in East Asians by contributing new ERP evidence that culture affects emotional meaning integration at neural processing stages that precede behavioural judgements. While there are data showing that the N400 is modulated by culture in the domain of visual perception (Goto et al., 2010, 2013), our results demonstrate that this component is also a reliable neural index of cultural effects during audio–visual emotion perception. Our results also highlight a posterior positivity occurring around 200–300 ms post-stimulus onset that could represent an additional neural index of acquired cultural differences in attention allocation during emotion perception. This implies that even earlier processing stages implicated in multi-sensory emotion perception (e.g., de Gelder et al., 1999) may be amenable to cultural learning and vary between cultures as well.

4.2. How culture guides emotion processing and communication

While it is increasingly clear that cultural differences shape the perception and cognitive evaluation of emotional expressions, leading to distinct cultural biases in the uptake of particular social cues, how can these differences be explained and how do they shape underlying neurocognitive mechanisms? In the case of emotion communication, it has been argued that culture-specific display rules, which critically regulate how emotions are expressed in socially appropriate ways, also play a role in emotion perception (Engelmann and Pogosyan, 2013; Ishii et al., 2003; Park and Huang, 2010). Within a given culture, display rules refer to social norms about when, how, to whom, and to what extent one's feelings should be displayed in various social settings, if at all (Ekman et al., 1969; Engelmann and Pogosyan, 2013; Matsumoto, 1990; Matsumoto et al., 2002, 2008). In contrast to Western individualist cultures, East Asian collectivist cultures are said to consider social harmony within a group more valuable and important than the needs of individuals (Hall and Hall, 1990; Scollon and Scollon, 1995). These values are reinforced by culture-specific display rules that serve to avoid social conflicts and maintain harmony (Gudykunst et al., 1996; Matsumoto et al., 1998). Thus, in East Asian cultures, indirectness in communication (e.g., unfinished sentences or vague linguistic expressions) is more common and socially acceptable than direct reference, compared with Western cultures (Bilbow, 1997), as it prevents potential conflicts between interlocutors. It has also been shown that Japanese participants tend to mask their negative facial expressions (e.g., by putting on a smile) when another person is present, whereas Americans convey similar negative faces irrespective of the presence of another person (Ekman, 1971; Markus and Kitayama, 1991; Matsumoto et al., 1998, 2008, 2005). East Asians are also more inclined than Westerners to avoid eye contact during face-to-face interactions, as a sign of respect and politeness towards others (Hawrysh and Zaichkowsky, 1991; McCarthy et al., 2006, 2008).

These culture-specific display rules result in sustained learning experiences about emotional expressions, including routine participation in behavioural practices and exposure to particular patterns of emotional expressions of others; this, in turn, shapes perceptual and attentional biases for extracting and perceiving emotional information (Nisbett et al., 2001). In East Asian cultures, the more frequent use of indirectness in speech, the constrained facial expressions, and the reduced eye contact in comparison with Western cultures may cause East Asians to rely more on cues from other sources, e.g., prosodic cues, to convey their feelings effectively (Ambady et al., 1996; Hall, 1976; Markus and Kitayama, 1991). Thus, cultural specificities in the availability of emotional information from different sources, determined to a large extent by display rules, may well contribute to ingrained biases in perception and attention when perceiving emotions. Arguably, our data show that these biases are rooted in specific cultural learning to such a degree that they have an in-depth impact not only on behaviour but on early neurocognitive operations that integrate and elaborate the meanings encoded by multi-channel emotion expressions (de Gelder et al., 1999; Ishii et al., 2010; Schirmer and Kotz, 2003).

While it is reasonable to infer that acquired knowledge about display rules underlies the differences we observed between Chinese and English participants, an alternative interpretation centres on linguistic characteristics of the language spoken by each group. Unlike English, Mandarin Chinese is a tone language in which syllables are marked by four different pitch patterns to lexically distinguish words with otherwise similar sound patterns. It is possible that Mandarin speakers, who constantly attend to variations in pitch to ascertain word meanings, develop a higher sensitivity to subtle changes in voice/pitch cues not only in lexical processing but also when vocal cues are encountered in other speech contexts, such as when they convey emotions. Here, this could partly explain the sensitivity of Mandarin speakers to emotional vocal cues when processing pseudo-speech. Interestingly, the two other East Asian languages that have been studied previously in this literature, Japanese (Ishii et al., 2003; Kitayama and Ishii, 2002; Tanaka et al., 2010) and Tagalog (Ishii et al., 2003), are not complex tone languages; rather, each uses "pitch accents", a simplified use of tonal information to mark certain word meanings (e.g., in Japanese, "ame" with a low–high pitch means "candy", whereas "ame" with a high–low pitch means "rain"). Since the two previously-studied Western languages, English and Dutch, use neither tonal contrasts nor pitch accents, it is plausible that even a simplified use of tone/pitch accents in East Asian languages promotes a bias to pay greater attention to vocal (emotion) cues, although perhaps to a lesser extent than in Chinese or other languages with complex tone systems. One way to begin teasing apart these factors would be to administer a similar protocol to participants from an East Asian culture that shares many of the display-rule features of collectivist cultures but who speak a language without strong tonal features at the lexical level, such as standard Korean.

4.3. Future directions

The present study supplies new evidence that culture modulates human behaviour and real-time brain responses during emotional communication; in addition, it points to several new directions for building on these findings. As noted earlier, future studies that consider a broader range of emotional categories are needed to clarify whether the observed cultural differences generalize to other discrete emotions, especially positive emotions, given that different socio-cultural meanings are often attributed to positive expressions and events (Lu and Gilmour, 2004; Oishi et al., 2013; Uchida et al., 2004). Individual differences in social orientation and/or cultural values (e.g., independence vs. interdependence) have also been associated with cognitive processing biases, modulating neurophysiological responses to emotional information (Ishii et al., 2010; Goto et al., 2010). These results make a strong case for examining not only how one's cultural background, but also how individual differences along socio-cultural dimensions, interact to shape the neurocognitive system devoted to emotional processing and related areas of perception and cognition.

In this work, Chinese participants had recently immigrated to Canada or had recently arrived in Montreal to attend university. Despite the fact that most of them had fair to good knowledge of English and were being exposed to a Western culture, our results illustrate robust differences between the Chinese and English groups. Nonetheless, this raises the question of how the extent of cultural exposure and different types of cultural experience (e.g., short- vs. long-term cultural immersion, different ages of arrival in the foreign country, and different degrees of acculturation) shape brain functioning during communication, a topic that has received scant attention to date. This question also touches on whether acquired biases related to cultural display rules, which arguably motivated the differences we observed in emotion communication, would transfer to another language and culture (see Elfenbein, 2013 for a related view). That is, would the culture-specific neural responses observed here persist when participants are presented with out-group stimuli that reflect the norms of a foreign culture? Answering this question is another task for future work.

Finally, the likelihood that neurocognitive responses to emotional stimuli occur at an even earlier stage than the one examined here, i.e., semantic retrieval and meaning integration (N400), merits further investigation. In a seminal paper, de Gelder et al. (1999) presented evidence that cross-sensory emotion integration begins very early after stimulus onset; using an oddball task, they reported an auditory MMN-like component peaking at 178 ms for deviant face–voice pairs relative to standards, a typical neural index of early pre-attentive processing of sensory input (Näätänen, 1992). This suggests that multi-sensory integration of emotion begins at a very early stage, prior to 200 ms post-stimulus onset (Jessen and Kotz, 2011; Pourtois et al., 2000). While these data do not reveal whether culture modulates brain responses at this early processing stage, the possibility seems reasonable in light of our observation that a posterior positivity around 200–300 ms post-stimulus onset differed between Chinese and English participants. We are now testing this question directly using an oddball paradigm similar to that of de Gelder et al. (1999) to evaluate cultural effects on earlier stages of perceptual processing of multi-sensory emotions. Future studies are also tasked with localizing the neural generators involved in these cultural effects (e.g., Gutchess et al., 2006). Interestingly, some neuroimaging data suggest that culture shapes fronto-parietal functions involved in top-down executive control; culturally-disfavoured tasks, which require more attentional resources than culturally-favoured tasks, appear to induce greater activity within these brain regions (Hedden et al., 2008). It is possible that culturally-dictated changes within this network, which is also implicated in multi-sensory emotion processing (Ethofer et al., 2006), contribute to the observed cultural effects on the N400, which likely engages both semantic processing and conflict control processes. As a long-term goal, aligning ERP data with fMRI will provide richer insights into the relations between culture, emotion, and the brain.
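To illustrate the kind of design being piloted, an oddball trial sequence can be generated as follows (a minimal Python sketch; the trial count, deviant proportion, and spacing constraint are assumptions, not the parameters of de Gelder et al. (1999) or of our ongoing study):

```python
import numpy as np

rng = np.random.default_rng(3)

def oddball_sequence(n_trials=400, p_deviant=0.15, min_gap=2):
    """Pseudo-random sequence of standards (0) and deviants (1) in which
    deviants are separated by at least `min_gap` trials, as is typical in
    MMN designs so that a stable standard representation can form."""
    seq = np.zeros(n_trials, dtype=int)
    last_deviant = -min_gap
    for i in range(n_trials):
        if i - last_deviant >= min_gap and rng.random() < p_deviant:
            seq[i] = 1
            last_deviant = i
    return seq

seq = oddball_sequence()
# The realized proportion falls slightly below p_deviant because of the gap rule.
print(f"deviant proportion: {seq.mean():.2f}")
```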

Acknowledgement

This research was supported by an Insight Grant (435-2013-1027) awarded to M.D. Pell from the Social Sciences and Humanities Research Council of Canada, and by a doctoral fellowship (Chinese Government Award for Outstanding Self-financed Students Abroad) awarded to P. Liu. These funding sources had no involvement in the design or conduct of this study.

References

Ambady, N., Koo, J., Lee, F., Rosenthal, R., 1996. More than words: linguistic and nonlinguistic politeness in two cultures. J. Personal. Soc. Psychol. 70 (5), 996–1011.

Banse, R., Scherer, K.R., 1996. Acoustic profiles in vocal emotion expression. J. Personal. Soc. Psychol. 70 (3), 614–636.

Beaupre, M.G., Hess, U., 2005. Cross-cultural emotion recognition among Canadian ethnic groups. J. Cross-Cult. Psychol. 36 (3), 355–370.

Bilbow, G.T., 1997. Spoken discourse in the multicultural workplace in Hong Kong: applying a model of discourse as 'impression management'. In: Bargiela-Chiappini, F., Harris, S. (Eds.), The Languages of Business: An International Perspective. Edinburgh University Press, Edinburgh, pp. 21–48.

Boersma, P., Weenink, D., 2001. Praat, a system for doing phonetics by computer. Glot Int. 5 (9/10), 341–345.

Bower, G.H., 1981. Mood and memory. Am. Psychol. 36 (2), 129–148.

Bowers, D., Bauer, R.M., Heilman, K.M., 1993. The nonverbal affect lexicon: theoretical perspectives from neuropsychological studies of affect perception. Neuropsychology 7 (4), 433–444.

Brown, C., Hagoort, P., 1993. The processing nature of the N400: evidence from masked priming. J. Cogn. Neurosci. 5 (1), 34–44.

Brown, C., Hagoort, P., Chwilla, D., 2000. An event-related brain potential analysis of visual word-priming effects. Brain Lang. 72 (2), 158–190.

Carretié, L., Mercado, F., Tapia, M., Hinojosa, J.A., 2001. Emotion, attention, and the 'negativity bias', studied through event-related potentials. Int. J. Psychophysiol. 41 (1), 75–85.

Castro, S.L., Lima, C.F., 2010. Recognizing emotions in spoken language: a validated set of Portuguese sentences and pseudosentences for research on emotional prosody. Behav. Res. Methods 42 (1), 74–81.

Chwilla, D.J., Brown, C.M., Hagoort, P., 1995. The N400 as a function of the level of processing. Psychophysiology 32 (3), 274–285.

Collignon, O., Girard, S., Gosselin, F., Roy, S., Saint-Amour, D., Lassonde, M., Lepore, F., 2008. Audio–visual integration of emotion expression. Brain Res. 1242, 126–135.

de Gelder, B., Bocker, K.B.E., Tuomainen, J., Hensen, M., Vroomen, J., 1999. The combined perception of emotion from voice and face: early interaction revealed by human electric brain responses. Neurosci. Lett. 260 (2), 133–136.

de Gelder, B., Vroomen, J., 2000. The perception of emotions by ear and by eye. Cogn. Emot. 14 (3), 289–311.

Delorme, A., Makeig, S., 2004. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics. J. Neurosci. Methods 134 (1), 9–21.

Dong, G., Zhou, H., Zhao, X., 2011. Male Internet addicts show impaired executive control ability: evidence from a color-word Stroop-like task. Neurosci. Lett. 499 (2), 114–118.

Eimer, M., Cockburn, D., Smedley, B., Driver, J., 2001. Cross-modal links in endogenous spatial attention are mediated by common external locations: evidence from event-related brain potentials. Exp. Brain Res. 139 (4), 398–411.

Ekman, P., 1971. Universals and cultural differences in facial expressions of emotion. In: Cole, J. (Ed.), Nebraska Symposium on Motivation. University of Nebraska Press, Lincoln, NE, pp. 207–282.

Ekman, P., Friesen, W.V., 1978. Facial Action Coding System: A Technique for the Measurement of Facial Action. Consulting Psychologists Press, Palo Alto, CA.

Ekman, P., Sorenson, E.R., Friesen, W.V., 1969. Pan-cultural elements in facial displays of emotion. Science 164 (3875), 86–88.

Elfenbein, H., 2013. Nonverbal dialects and accents in facial expressions of emotion. Emot. Rev. 5, 90–96. http://dx.doi.org/10.1177/1754073912451332.

Engelmann, J.B., Pogosyan, M., 2013. Emotion perception across cultures: the role of cognitive mechanisms. Front. Psychol. 4, 118.

Ethofer, T., Pourtois, G., Wildgruber, D., 2006. Investigating audiovisual integration of emotional signals in the human brain. Prog. Brain Res. 156, 345–361.

Etkin, A., Egner, T., Peraza, D.M., Kandel, E.R., Hirsch, J., 2006. Resolving emotional conflict: a role for the rostral anterior cingulate cortex in modulating activity in the amygdala. Neuron 51 (6), 871–882.

Föcker, J., Gondan, M., Röder, B., 2011. Preattentive processing of audio–visual emotional signals. Acta Psychol. 137 (1), 36–47.

Ganis, G., Kutas, M., Sereno, M.I., 1996. The search for "common sense": an electrophysiological study of the comprehension of words and pictures in reading. J. Cogn. Neurosci. 8, 89–106.

Goto, S.G., Ando, Y., Huang, C., Yee, A., Lewis, R.S., 2010. Cultural differences in the visual processing of meaning: detecting incongruities between background and foreground objects using the N400. Soc. Cogn. Affect. Neurosci. 5 (2–3), 242–253.

Goto, S.G., Yee, A., Lowenberg, K., Lewis, R.S., 2013. Cultural differences in sensitivity to social context: detecting affective incongruity using the N400. Soc. Neurosci. 8 (1), 63–74.

Grandjean, D., Bänziger, T., Scherer, K.R., 2006. Intonation as an interface between language and affect. Prog. Brain Res. 156, 235–247.

Gudykunst, W.B., Matsumoto, Y., Ting-Toomey, S., Nishida, T., Kim, K., Heyman, S., 1996. The influence of cultural individualism-collectivism, self construals, and individual values on communication styles across cultures. Hum. Commun. Res. 22 (4), 510–543.

Gutchess, A.H., Welsh, R.C., Boduroglu, A., Park, D.C., 2006. Cultural differences in neural function associated with object processing. Cogn. Affect. Behav. Neurosci. 6 (2), 102–109.

Haas, B.W., Omura, K., Constable, R.T., Canli, T., 2006. Interference produced by emotional conflict associated with anterior cingulate activation. Cogn. Affect. Behav. Neurosci. 6 (2), 152–156.

Hall, E.T., 1976. Beyond Culture. Doubleday, New York.

Hall, E.T., Hall, M.R., 1990. Understanding Cultural Differences: Germans, French and Americans. Intercultural Press, Yarmouth, ME.

Handy, T.C., 2004. Event-related Potentials: A Methods Handbook. MIT Press, Cambridge, MA.

Hawrysh, B.M., Zaichkowsky, J.L., 1991. Cultural approaches to negotiations: understanding the Japanese. Asia Pac. J. Mark. Logist. 3 (1), 40–54.

Hedden, T., Ketay, S., Aron, A., Markus, H.R., Gabrieli, J.D.E., 2008. Cultural influences on neural substrates of attentional control. Psychol. Sci. 19 (1), 12–17.

Holcomb, P.J., Neville, H.J., 1990. Auditory and visual semantic priming in lexical decision: a comparison using event-related potentials. Lang. Cogn. Process. 5 (4), 281–312.

Ishii, K., Kobayashi, Y., Kitayama, S., 2010. Interdependence modulates the brain response to word-voice incongruity. Soc. Cogn. Affect. Neurosci. 5 (2–3), 307–317.

Ishii, K., Reyes, J.A., Kitayama, S., 2003. Spontaneous attention to word content versus emotional tone: differences among three cultures. Psychol. Sci. 14 (1), 39–46.

Jessen, S., Kotz, S.A., 2011. The temporal dynamics of processing emotions from vocal, facial, and bodily expressions. NeuroImage 58 (2), 665–674.


Kitayama, S., Ishii, K., 2002. Word and voice: spontaneous attention to emotional utterances in two languages. Cogn. Emot. 16 (1), 29–59.

Kotz, S.A., Paulmann, S., 2007. When emotional prosody and semantics dance cheek to cheek: ERP evidence. Brain Res. 1151, 107–118.

Kutas, M., Federmeier, K.D., 2011. Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annu. Rev. Psychol. 62, 621–647.

Kutas, M., Van Petten, C., 1994. Psycholinguistics electrified: event-related brain potential investigations. In: Gernsbacher, M.A. (Ed.), Handbook of Psycholinguistics. Academic Press, San Diego, CA, pp. 83–143.

Liotti, M., Woldorff, M.G., Perez III, R., Mayberg, H.S., 2000. An ERP study of the temporal course of the Stroop color-word interference effect. Neuropsychologia 38 (5), 701–711.

Liu, P., Rigoulot, S., Pell, M.D., 2014. Cultural immersion alters emotion perception: electrophysiological evidence from Chinese immigrants in North America (in preparation).

Liu, P., Pell, M.D., 2012. Recognizing vocal emotions in Mandarin Chinese: a validated database of Chinese vocal emotional stimuli. Behav. Res. Methods 44 (4), 1042–1051.

Lu, L., Gilmour, R., 2004. Culture and conceptions of happiness: individual oriented and social oriented SWB. J. Happiness Stud. 5 (3), 269–291.

Markus, H.R., Kitayama, S., 1991. Culture and the self: implications for cognition, emotion, and motivation. Psychol. Rev. 98 (2), 224–253.

Massaro, D.W., Egan, P.B., 1996. Perceiving affect from the voice and the face. Psychon. Bull. Rev. 3 (2), 215–221.

Matsumoto, D., 1990. Cultural similarities and differences in display rules. Motiv. Emot. 14 (3), 195–214.

Matsumoto, D., Consolacion, T., Yamada, H., Suzuki, R., Franklin, B., Paul, S., et al., 2002. American–Japanese cultural differences in judgements of emotional expressions of different intensities. Cogn. Emot. 16 (6), 721–747.

Matsumoto, D., 2002. Methodological requirements to test a possible in-group advantage in judging emotions across cultures: comment on Elfenbein and Ambady (2002) and evidence. Psychol. Bull. 128 (2), 236–242.

Matsumoto, D., Takeuchi, S., Andayani, S., Kouznetsova, N., Krupp, D., 1998. The contribution of individualism vs. collectivism to cross-national differences in display rules. Asian J. Soc. Psychol. 1 (2), 147–165.

Matsumoto, D., Yoo, S.H., Fontaine, J., 2008. Mapping expressive differences around the world: the relationship between emotional display rules and individualism versus collectivism. J. Cross-Cult. Psychol. 39 (1), 55–74.

Matsumoto, D., Yoo, S.H., Hirayama, S., Petrova, G., 2005. Development and validation of a measure of display rule knowledge: the display rule assessment inventory. Emotion 5 (1), 23–40.

McCarthy, A., Lee, K., Itakura, S., Muir, D.W., 2006. Cultural display rules drive eye gaze during thinking. J. Cross-Cult. Psychol. 37 (6), 717–722.

McCarthy, A., Lee, K., Itakura, S., Muir, D.W., 2008. Gaze display when thinking depends on culture and context. J. Cross-Cult. Psychol. 39 (6), 716–729.

Mitchell, R.L.C., 2006. Does incongruence of lexicosemantic and prosodic information cause discernible cognitive conflict? Cogn. Affect. Behav. Neurosci. 6 (4), 298–305.

Näätänen, R., 1992. Attention and Brain Function. L. Erlbaum, Hillsdale, NJ.

Nisbett, R.E., Peng, K.P., Choi, I., Norenzayan, A., 2001. Culture and systems of thought: holistic versus analytic cognition. Psychol. Rev. 108 (2), 291–310.

Nygaard, L.C., Queen, J.S., 2008. Communicating emotion: linking affective prosody and word meaning. J. Exp. Psychol.: Hum. Percept. Perform. 34 (4), 1017–1030.

Oishi, S., Graham, J., Kesebir, S., Galinha, I.C., 2013. Concepts of happiness across time and cultures. Personal. Soc. Psychol. Bull. 39 (5), 559–577.

Park, D.C., Huang, C.M., 2010. Culture wires the brain: a cognitive neuroscience perspective. Perspect. Psychol. Sci. 5 (4), 391–400.

Paulmann, S., Pell, M.D., 2010. Contextual influences of emotional speech prosody on face processing: how much is enough? Cogn. Affect. Behav. Neurosci. 10 (2), 230–241.

Paulmann, S., Kotz, S.A., 2008. An ERP investigation on the temporal dynamics of emotional prosody and emotional semantics in pseudo- and lexical-sentence context. Brain Lang. 105 (1), 59–69.

Paulmann, S., Pell, M.D., 2011. Is there an advantage for recognizing multi-modal emotional stimuli? Motiv. Emot. 35 (2), 192–201.

Paulmann, S., Pell, M.D., Kotz, S.A., 2008. Functional contributions of the basal ganglia to emotional prosody: evidence from ERPs. Brain Res. 1217, 171–178.

Pell, M.D., Baum, S.R., 1997. Unilateral brain damage, prosodic comprehension deficits, and the acoustic cues to prosody. Brain Lang. 57 (2), 195–214.

Pell, M.D., Monetta, L., Paulmann, S., Kotz, S.A., 2009a. Recognizing emotions in a foreign language. J. Nonverbal Behav. 33 (2), 107–120.

Pell, M.D., Paulmann, S., Dara, C., Alasseri, A., Kotz, S.A., 2009b. Factors in the recognition of vocally expressed emotions: a comparison of four languages. J. Phon. 37 (4), 417–435.

Pourtois, G., de Gelder, B., Vroomen, J., Rossion, B., Crommelinck, M., 2000. The time course of intermodal binding between seeing and hearing affective information. Neuroreport 11 (6), 1329–1333.

Rigoulot, S., Pell, M.D., 2012. Seeing emotion with your ears: emotional prosody implicitly guides visual attention to faces. PLoS One 7, e30740.

Rigoulot, S., Pell, M.D., 2014. Emotion in the voice influences the way we scan emotional faces. Speech Commun. 65, 36–49.

Rigoulot, S., Wassiliwizky, E., Pell, M.D., 2013. Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition. Front. Psychol. 4, 367.

Rosnow, R.L., Rosenthal, R., 2003. Effect sizes for experimenting psychologists. Can. J. Exp. Psychol. 57 (3), 221–237.

Schirmer, A., Kotz, S.A., 2003. ERP evidence for a sex-specific Stroop effect in emotional speech. J. Cogn. Neurosci. 15 (8), 1135–1148.

Schirmer, A., Kotz, S.A., Friederici, A.D., 2002. Sex differentiates the role of emotional prosody during word processing. Cogn. Brain Res. 14 (2), 228–233.

Schirmer, A., Lui, M., Maess, B., Escoffier, N., Chan, M., Penney, T.B., 2006. Task and sex modulate the brain response to emotional incongruity in Asian listeners. Emotion 6 (3), 406–417.

Scollon, R., Scollon, S.W., 1995. Intercultural Communication: A Discourse Approach. Blackwell, Oxford.

Sekiyama, K., Tohkura, Y., 1991. McGurk effect in non-English listeners: few visual effects for Japanese subjects hearing Japanese syllables of high auditory intelligibility. J. Acoust. Soc. Am. 90 (4, pt 1), 1797–1805.

Talsma, D., Slagter, H.A., Nieuwenhuis, S., Hage, J., Kok, A., 2005. The orienting of visuospatial attention: an event-related brain potential study. Cogn. Brain Res. 25 (1), 117–129.

Tanaka, A., Koizumi, A., Imai, H., Hiramatsu, S., Hiramoto, E., de Gelder, B., 2010. I feel your voice: cultural differences in the multi-sensory perception of emotion. Psychol. Sci. 21 (9), 1259–1262.

Uchida, Y., Norasakkunkit, V., Kitayama, S., 2004. Cultural constructions of happiness: theory and empirical evidence. J. Happiness Stud. 5 (3), 223–239.