Journal of East Asian Linguistics, ISSN 0925-8558, Volume 25, Number 1. J East Asian Linguist (2016) 25:1-35. DOI 10.1007/s10831-015-9135-0. The productivity of variable disyllabic tone sandhi in Tianjin Chinese. Jie Zhang & Jiang Liu



The productivity of variable disyllabic tone sandhi in Tianjin Chinese

Jie Zhang1 · Jiang Liu2

Received: 9 January 2013 / Accepted: 23 December 2014 / Published online: 3 November 2015

© Springer Science+Business Media Dordrecht 2015

Abstract Tianjin Chinese has one of the more complex tone sandhi systems in Northern Chinese dialects. Due to its close contact with Standard Chinese, many of its tone sandhi patterns are also variable. This article first reports a detailed acoustic study of tone sandhi patterns in both real lexical items and novel words in Tianjin. The data were collected from 48 speakers of Tianjin, who were instructed to pronounce disyllabic sequences as real words based on voice prompts. The results showed that the productivity of the sandhis in novel words varied depending on the sandhi: some were less productive than in real words, and some were more productive, indicating a combination of underlearning, overlearning, and proper learning of the sandhis from the lexicon. A theoretical model that predicts the productivity patterns based on the phonetic properties of the sandhis and statistical generalizations about the sandhis over the lexicon is then proposed.

Keywords Tone · Tone sandhi · Tianjin · Productivity · Optimality theory ·

Maximum entropy grammar

Electronic supplementary material The online version of this article (doi:

10.1007/s10831-015-9135-0) contains supplementary material, which is available to authorized

users.


1 Department of Linguistics, The University of Kansas, 1541 Lilac Lane, Blake Hall, Room 427,

Lawrence, KS 66045-3129, USA

2 Department of Asian Languages and Literatures, University of Minnesota, 220 Folwell Hall, 9

Pleasant Street SE, Minneapolis, MN 55455, USA


1 Introduction

1.1 Two types of evidence for phonological knowledge

Kenstowicz and Kisseberth, in Chap. 5 of their seminal Generative phonology: Description and theory (1979), raised a serious methodological issue for generative

phonology research: they questioned the assumption that the phonological

abstractions derived by traditional research methods that focused on lexically

manifested patterns of sound distribution and morpheme alternation were the same

abstractions in speakers’ unconscious phonological knowledge—the knowledge that

generative phonology aims to uncover. Consequently, they advocated the research

practice of complementing the evidence gleaned from such traditional sources with

evidence from speakers’ linguistic behavior that directly manifested their unconscious knowledge, from speech errors and language games to loanwords and second

language acquisition.

Their skepticism of the assumption turned out to be well founded as subsequent

research showed that speakers know both more and less than the lexical patterns. A

number of recent studies have shown that speakers possess phonological knowledge

that the lexical patterns of their language do not inform them of—a scenario that we

will refer to as “overlearning.” For instance, Zuraw (2007) showed through a corpus

study on loans and a web-based survey on novel words that Tagalog speakers

possessed knowledge of the splittability of word-initial consonant clusters that could

not be deduced from the lexicon. Berent et al. (2007) demonstrated through a series

of experiments that English speakers preferred /bd/ as an onset cluster over /lb/,

even though neither is a legal onset cluster in English. In an artificial language-learning setting, Wilson (2006) established that when English speakers were

presented with velar palatalization before mid vowels, they could extend the process

before high vowels but not vice versa. These have been taken as “the poverty of the

stimulus” arguments for the relevance of Universal Grammar or substantive biases

in phonological learning.

“Underlearning,” alternatively termed “the surfeit of the stimulus” (Becker et al.

2011), refers to speakers’ subpar knowledge, and sometimes total ignorance, of

generalizable patterns in the lexicon. For example, Becker et al. (2011) found that

Turkish speakers could generalize to novel words the statistical patterns seen in

relations between an obstruent voicing alternation and word length as well as place

of articulation in obstruents in the lexicon, but they were oblivious to a similar

statistically significant relation between the voicing alternation and properties of the

preceding vowel (height, backness). Hayes et al. (2009b) investigated the variation

patterns in suffixal vowel harmony in Hungarian and compared how speakers

internalized two types of gradient patterns in novel words—natural ones in which

the harmony behavior is based on the properties of the stem vowels (number of

triggers, height of the trigger) and unnatural ones in which the harmony is correlated

with features of the stem-final consonant. They found that speakers learned both the

natural and unnatural patterns, but the unnatural patterns were undervalued and

learned less robustly than the natural ones. Using an artificial language-learning paradigm, Moreton (2008) showed that English speakers learned a vowel height-voicing dependency significantly more poorly than a height-height dependency

despite the facts that (a) neither dependency is attested in English, (b) the

dependency in question was present in the learning experiment, and (c) the two

dependencies have comparable phonetic precursors. These results also suggest that

speakers’ phonological knowledge is the combined result of learned lexical patterns

and a priori knowledge.

These studies support Kenstowicz and Kisseberth’s thesis that evidence for

speakers’ phonological knowledge needs to come from both within and beyond

lexical patterns. Beyond the areas identified by Kenstowicz and Kisseberth such as

speech errors and loanwords, corpus-external evidence has emerged from experimental investigations of productivity, especially in the form of wug tests (Berko

1958), in which speakers are asked to provide responses to novel words in contexts

that are facilitative to the application of the phonological process in question. This

methodology has been widely used to test the productivity of phonological

alternations (e.g., Albright et al. 2001; Hayes and Londe 2006; Zuraw 2007; Hayes

et al. 2009b; Becker et al. 2011) as well as regular and irregular morphological rules

(e.g., Bybee and Pardo 1981; Albright 2002; Albright and Hayes 2003;

Pierrehumbert 2006).

1.2 The role of productivity in tone sandhi research

Tone sandhi research, particularly descriptive work, has had a long tradition in

Chinese phonology. Both detailed descriptions of tone sandhi in individual dialects

and typological works on cross-linguistic patterns of tone sandhi abound (see Zhang

2014a, b for reviews and references). The relation between tone sandhi and

theoretical phonology, however, has been an uncomfortable one. The analysis of

Chinese tone sandhi patterns has presented considerable challenges to theoretical

phonology in both rule-based and constraint-based frameworks, and complete

theoretical analyses of any given tone sandhi system have proven difficult. Beyond

the sheer complexity of tone sandhi patterns often observed in Chinese dialects,

especially in the Wu and Min groups, three other properties of tone sandhi are

responsible for this difficulty. First, as the result of diachronic changes, many of the

sandhi patterns in the present-day systems are phonetically arbitrary. This presents

particular challenges to the analysis of these patterns in Optimality Theory (Prince

and Smolensky 1993), which relies on surface-oriented, generalizable markedness

constraints. Second, many of the tone sandhi patterns are phonologically opaque

(Kiparsky 1973). For example, in Taiwanese, four of the five tones in the tonal

inventory on non-checked syllables are involved in a circular chain shift:

55 → 33 → 21 → 51 → 55 (Cheng 1968; Chen 1987); in Fuzhou, the following

synchronic chain shifts are attested: 32 → 44 → 53 → 21 / __ {212, 242};

44 → 53 → 32 → 24 / __ 32 (Liang and Feng 1996). These patterns also pose

analytical challenges for Optimality Theory: circular chain shift has been shown to

be incomputable by a “conservative” OT grammar that uses only IO-faithfulness

and markedness constraints (Moreton 2004), and regular chain shift requires

additional mechanisms such as constraint conjunction to be captured (Kirchner


1996). Third, due to complex contact situations as well as internal factors, many

sandhi patterns are riddled with variation and exceptions. In this context, it is particularly worthwhile to ask, through productivity studies, whether the lexical sandhi patterns that speakers encounter are a true reflection of their phonological knowledge. Do speakers overlearn/generalize sandhi patterns in the face of variation

and exceptions? Do speakers underlearn lexical regularities in tone sandhi due to

their phonetic arbitrariness and phonological opacity? In other words, we need to expand the empirical basis from which the theoretical analysis of tone sandhi proceeds to include not only lexical patterns of tone sandhi but also experimental evidence of tone sandhi productivity. This was exactly Kenstowicz and Kisseberth’s recommendation to phonologists over 30 years ago.

Using wug tests to investigate the productivity of tone sandhi patterns can be

traced back to the ground-breaking work of Hsieh (1970, 1975, 1976), who showed

that the opaque tone sandhi circle in Taiwanese is generally not productive. Later

works by Wang (1993), Zhang and Lai (2008), and Zhang et al. (2009, 2011)

replicated and expanded Hsieh’s studies and reached similar conclusions. Zhang and

Lai (2008) and Zhang et al. (2009, 2011), in addition, showed that sandhi

productivity is also correlated with the frequencies of the sandhi patterns in the

lexicon and the phonetic nature of the tone change: sandhis that have higher type

and token frequencies in the lexicon tend to have higher productivity, and sandhis

that turn longer tones into shorter tones have a productivity advantage over sandhis

that turn shorter tones into longer tones due to the impoverished duration of the

sandhi position as compared to the non-sandhi position. Zhang and Lai (2010) tested

the productivity difference between the third-tone sandhi (213 → 35 / __ 213) and

half-third sandhi (213 → 21 / __ T, T ≠ 213) in Standard Chinese in two wug test

experiments and showed that the former applies less productively in novel words

than the latter. They argued that the results were due to the fact that the half-third

sandhi is a contour reduction process directly related to the shortened duration in

non-final positions and thus has a clearer phonetic motivation than the third-tone

sandhi, which (a) has a long diachronic history, (b) involves a pitch raising not

easily explainable by phonetics, and (c) is also perceptually neutralizing. Zhang and

Meng (2012) demonstrated that in Shanghai Wu, rightward contour extension,

which effectively reduces contour tones on both syllables, is more productive than

rightward contour displacement, which does not level the contour, and in the

meantime causes large phonetic mismatches in both stress and tonal contour

between the base and sandhi tones. These studies indicate that wug testing the

productivity of tone sandhi patterns is a worthy research endeavor as speakers’

phonological knowledge can indeed differ from lexically manifested sandhi patterns

due to the phonetic (e.g., tone duration, tone similarity) and phonological (e.g.,

opacity) properties of the sandhis.

What we hope to achieve in this article is to present a productivity study of the

tone sandhi system in Tianjin Chinese, which differs from previously investigated

sandhi systems in a number of respects. First, as a northern dialect with a close

affinity to the Beijing dialect and Standard Chinese, Tianjin’s sandhi pattern is also

“right-dominant” (Yue-Hashimoto 1987), in that the tone at the right edge of the

sandhi domain remains intact while non-final tones undergo sandhi. But its sandhi


pattern is considerably more complex than that of Beijing and Standard Chinese.

Second, different from the “right-dominant” southern Min dialects like Taiwanese,

the Tianjin sandhi pattern does not involve phonological opacity. Third, the sandhi

pattern in Tianjin is riddled with variation and exceptions, likely due to its close

contact with the Beijing dialect and the dominance of Standard Chinese. The

productivity study on Tianjin tone sandhi, therefore, allows us to expand the

typology of sandhi productivity, address new questions such as the effect of variation and exceptions on productivity, and in the meantime provide further tests

of some of the hypotheses mentioned earlier, such as the relevance of lexical

frequency and phonetic properties to sandhi productivity. In the rest of the article,

we introduce the Tianjin tone sandhi pattern first in Sect. 1.3, then discuss the

hypotheses and the methodology of the productivity study in Sect. 2. Results of our

experiment follow in Sect. 3. We then provide a theoretical model for our results in

Sect. 4. Discussions and concluding remarks are provided in Sect. 5.

1.3 Tianjin tone sandhi

Tianjin Chinese is spoken in the city of Tianjin 65 miles to the southeast of Beijing.

Its four lexical tones are cognates with the four tones in Standard Chinese, but the

pitch values of the tones in the two dialects differ, as shown in (1) (Chen 2000).1

The four-way contrast maH ‘mother’ ~ maMH ‘hemp’ ~ maMLH ‘horse’ ~ maHL ‘to scold’ in Standard Chinese, for example, is realized as maL ~ maH ~ maLH ~ maHL in Tianjin.

(1) Lexical tones in Tianjin Chinese and Standard Chinese:

                       Tone 1   Tone 2   Tone 3   Tone 4
    Tianjin            L        H        LH       HL
    Standard Chinese   H        MH       MLH      HL

As mentioned previously, despite its close affinity and similarity to Standard

Chinese, Tianjin has a considerably more complex system of tone sandhi. The

traditional disyllabic sandhis reported in Li and Liu (1985) and later confirmed by

Shi (1986), Yang et al. (1999), and Chen (2000), are summarized in (2). The T3+T3

sandhi in (2b) is cognate with the third-tone sandhi in Standard Chinese, which also

changes a T3 to a T2 before another T3. The other three sandhis are not attested in

Standard Chinese nor do they have extensive synchronic counterparts in other

dialects to the best of our knowledge.

1 The transcriptions of the Tianjin tones vary from source to source. For example, using Chao’s tone

numbers (Chao 1968), Li and Liu (1985) transcribed the four tones as 21, 45, 213, 54, respectively, while

Shi (1990) used 11, 55, 24, 53. We use Chen’s (2000) notation here. For more detailed discussion and

acoustic data on Tianjin citation tones, see Zhang and Liu (2011).


(2) Tianjin disyllabic tone sandhi I:

a. L+L → LH+L (T1+T1 → T3+T1)

b. LH+LH → H+LH (T3+T3 → T2+T3)

c. HL+L → H+L (T4+T1 → T2+T1)

d. HL+HL → L+HL (T4+T4 → T1+T4)

Shi (1988) noted that the four sandhi processes in (2) applied with different

propensities in Tianjin. Under the criteria of the number of lexical exceptions and

the likelihood with which the base-tone combinations surface as the result of tone

sandhi in longer sequences, Shi ordered the sandhis according to their “strength” as

follows: (T3+T3) > (T1+T1) > (T4+T4) > (T4+T1). From the recordings of

204 Tianjin speakers in different age groups, Shi and Wang (2004) showed that the

T4+T1 sandhi had a tendency to apply with greater regularity among younger

speakers (close to 100 % application for speakers younger than 20 but only around

60 % for speakers older than 70), and the T4+T4 sandhi had generally become

obsolete for younger speakers (close to 0 % application for <20 years; around 40 % for >70 years).2 The disappearance of the T4+T4 sandhi has also been reported in

Liu and Gao (2003) and Gao (2004), and they attributed the disappearance to the

influence of Standard Chinese, which has a similar T4 (51) that does not undergo

sandhi before another T4. Shi and Wang’s (2004) results were in general agreement

with Zhang and Liu’s (2011) acoustic findings on disyllabic tone sandhi from 12

Tianjin speakers (average age = 34.3), which showed that the T3+T3 and T1+T1

sandhis applied consistently, the T4+T1 sandhi had a small number of exceptions,

and the T4+T4 sandhi only applied to a handful of words for a small subset of the

speakers. Furthermore, Zhang and Liu (2011) showed that the sandhi patterns, even

when they applied, generally did not result in tonal neutralization as the description

in (2) implies, as the sandhi tone always preserved certain pitch properties from the

base tone.

Wee (2004) reported two additional tone sandhis for Tianjin, given in (3). These

sandhis likely originated from the half-third sandhi in Standard Chinese, whereby

the falling-rising T3 is realized as its first half before a tone other than T3 (213+T → 21+T, T ≠ 213). Although Wee (2004) reported these sandhis as

neutralizing sandhis (neutralization of T3 and T1 in the sandhi contexts), Ma and

Jia’s (2006) acoustic and perceptual studies showed that neither sandhi in (3) was

truly neutralizing: the sandhi tones partially preserved the rising property of T3, and

listeners could identify the difference between T1 and T3 in the sandhi contexts

with an accuracy rate of over 85 %. Zhang and Liu’s (2011) acoustic results further

supported the incomplete neutralization property of these two sandhis. In our

discussion of the sandhis below, we will still use the conventional categorical

transcriptions, but only as a convenient shorthand.

2 In addition, Shi and Wang (2004) also found that for T1+T1, younger speakers (<20 years) consistently used T2+T1 as the sandhi tones, not the previously reported T3+T1, while older speakers (>70 years) varied between T3+T1 and T2+T1. See Lu (1997, 2004) and Zhang and Liu (2011) for similar

findings and additional discussions.


(3) Tianjin disyllabic tone sandhi II:

a. LH+H → L+H (T3+T2 → T1+T2)

b. LH+HL → L+HL (T3+T4 → T1+T4)
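As a reading aid, the six disyllabic sandhis in (2) and (3) can be written as a lookup from a tone pair to the sandhi form of the first tone. The sketch below is only the categorical shorthand used in the text: it ignores the variation, exceptions, and incomplete neutralization discussed above, and the names `TONES`, `SANDHI`, `apply_sandhi`, and `as_pitch` are hypothetical labels introduced for illustration.

```python
# Tianjin lexical tones in Chen's (2000) notation, as listed in (1).
TONES = {"T1": "L", "T2": "H", "T3": "LH", "T4": "HL"}

# Categorical shorthand for the sandhis in (2) and (3):
# (first tone, second tone) -> sandhi form of the first tone.
SANDHI = {
    ("T1", "T1"): "T3",  # (2a) L+L   -> LH+L
    ("T3", "T3"): "T2",  # (2b) LH+LH -> H+LH
    ("T4", "T1"): "T2",  # (2c) HL+L  -> H+L
    ("T4", "T4"): "T1",  # (2d) HL+HL -> L+HL
    ("T3", "T2"): "T1",  # (3a) LH+H  -> L+H
    ("T3", "T4"): "T1",  # (3b) LH+HL -> L+HL
}


def apply_sandhi(first, second):
    """Return the categorical surface tones of a disyllable.

    The second (final) tone is never changed, reflecting the
    right-dominant character of the system.
    """
    return SANDHI.get((first, second), first), second


def as_pitch(first, second):
    """Render the surface form in L/H notation, e.g. 'LH+L'."""
    s1, s2 = apply_sandhi(first, second)
    return f"{TONES[s1]}+{TONES[s2]}"


if __name__ == "__main__":
    for pair in SANDHI:
        print("+".join(pair), "->", as_pitch(*pair))
```

Tone pairs that are not in the table, such as T1+T2, are returned unchanged.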

The complexity of tone sandhi in Tianjin, therefore, comes not only from the

intricacy of the pattern itself but also from the variation and exceptions in the

pattern and the changes that it is currently undergoing. The pattern itself, then, is not

only interesting in its own right but also presents an opportunity to contribute to the

theoretical debate on the roles of variation and exceptions in the formal grammar—

an issue that has captured much attention in the recent phonological literature (see

Coetzee and Pater 2011 for a review). It is also worth noting that the complexity of

the Tianjin sandhi pattern does not involve opaque chain shifts as in Taiwanese. A

study of the productivity of the tone sandhi pattern in Tianjin, therefore, allows us to

investigate the speakers’ knowledge of a typologically different kind of sandhi

system. We lay out the specific hypotheses about the productivity of Tianjin tone

sandhi and the methodology for the study in the next section.

2 Hypotheses and methodology

2.1 Hypotheses

We have seen in Sect. 1.2 that a series of studies on the productivity of tone sandhi

patterns in Chinese dialects has shown that the phonological transparency, phonetic

properties, and lexical frequency of a sandhi can all affect its productivity in novel

words. Phonological transparency is not relevant here as all Tianjin tone sandhis are

transparent. But we expect the effects of phonetic properties and lexical frequency

to manifest themselves in Tianjin. In particular, we first hypothesize that regular

sandhis with a strong phonetic basis, such as the half-third sandhis LH+H → L+H

(T3+T2 → T1+T2) and LH+HL → L+HL (T3+T4 → T1+T4), would be more

productive than the other regular sandhis L+L → LH+L (T1+T1 → T3+T1) and LH+LH → H+LH (T3+T3 → T2+T3), whose phonetic basis is less strong. Our

judgment of the strength of the phonetic basis follows that of Zhang and Lai’s

(2010) for Standard Chinese. The half-third sandhi is a contour reduction process

directly related to the shortened duration in non-final positions.3 The other sandhis

have properties that are not directly related to phonetic reduction. The T1+T1

sandhi involves a contouring process in non-final position, which is typologically

rare (Yue-Hashimoto 1987; Zhang 2002). It also cannot be easily interpreted as

phonetically motivated dissimilation as coarticulatory dissimilation typically

involves the raising of a high tone before a low tone (see Gandour et al. 1994 for Thai; Xu 1997 for Standard Chinese; Peng 1997 for Taiwanese; and Zhang and Liu 2011 for Tianjin). The third-tone sandhi, as in Standard Chinese, also involves a raising of the pitch not easily explainable by phonetics.

3 An anonymous reviewer questioned the phonetic basis of the T3+T2 sandhi as the opposite pattern, whereby L+H → LH+H, is attested in African languages. But this type of regressive tone spreading is considerably rarer than progressive assimilation (Maddieson 1978; Hyman 2007; Zhang 2007). Hyman (2007) in fact goes on to argue that regressive tone spreading is due to special circumstances involving tone attraction to stressed positions or pressure from intonation at the right edge and therefore is not a diachronically natural process.

Second, we hypothesize that ceteris paribus, sandhi patterns with higher type and

token frequencies will be more productive than those with lower frequencies. This

should be most clearly manifested in the comparison between the two half-third

sandhi patterns: both are equally motivated by insufficient duration, yet Tone 2 has

lower type and token frequencies than Tone 4 (based on Da 2004). Therefore, we

expect the T3+T2 sandhi to be less productive than the T3+T4 sandhi, a result also

found in Zhang and Lai’s (2010) study on Standard Chinese. Relatedly, we also

hypothesize that the token frequency of a particular lexical item is related to how the

sandhi applies to the item, in that higher frequency leads to higher productivity. If

so, then any underlearning or overlearning effects in novel words may be interpreted

as exaggerated frequency effects. This will further inform the theoretical model for

the speakers’ sandhi knowledge.

Third, for the sandhis with exceptions, we predict that they will tend to change in

the innovative direction in novel words. This is because we expect new words to

take on the behavior that represents the direction of change. In other words, the

disappearing HL+HL → L+HL (T4+T4 → T1+T4) should show further

underlearning in novel words while the sandhi gaining popularity, HL+L → H+L (T4+T1 → T2+T1), should be overlearned and generalized.

In short, we hypothesize that a Tianjin speaker’s knowledge of tone sandhi is a

combination of proper learning, underlearning, and overlearning from the lexicon:

the lexical frequency of the sandhi pattern is positively correlated with sandhi

productivity; however, sandhis that lack phonetic motivation should be underlearned and lack full productivity, yet sandhis with a limited number of exceptions

should be overlearned and generalized.

2.2 Methodology

2.2.1 Experimental design

To test these hypotheses, we designed a wug test in which native speakers of Tianjin

were asked to pronounce two separately presented individual syllables together as a

real disyllabic word in Tianjin. All six sandhis in (2) and (3) were tested, and within

each sandhi, three types of words were used: real disyllabic words in Tianjin,

pseudo words composed of two actual-occurring syllables in Tianjin, and novel

words in which the first syllable was an accidental gap in the Tianjin syllabary. An

accidental gap is a syllable in which both the segmentals and tone are legal, but their

combination happens to be missing in Tianjin. We will refer to these three groups as

REAL, PSEUDO, and NOVEL henceforth. REAL words were then further divided into

four subtypes according to whether the disyllable and the first syllable were of high

or low token frequency as in Fig. 1a, and PSEUDO words were further divided into

two subtypes depending on whether the first syllable had high or low token

frequency as in Fig. 1b. For each word-(sub)type/sandhi-type combination, we used four different words, which resulted in 168 test words (6 × 7 × 4). Token

frequency data were derived from a corpus of written Chinese with 28,278,285

bigrams compiled from online resources by Da (2004). The mean raw bigram

frequency for the high-frequency disyllabic words is 3721, and that for the low-frequency words is 178. Frequencies for the first syllables in high-frequency REAL,

low-frequency REAL, and PSEUDO words include the frequencies of all homophonous

characters, provided that the characters are among the 3500 most commonly used

characters in Da’s character corpus. In other words, these frequencies are

approximations of the frequencies of the phonetic syllables with tones. High-frequency syllables all have a mean raw frequency over 210,000 while low-frequency syllables all have a frequency under 80,000. Care was taken to minimize

the effect of tonal combination on word and syllable frequencies. We also used 160

fillers, 16 for each of the 10 disyllabic tonal combinations that did not undergo

sandhi. We did not control for whether the REAL words were verbs or nouns or

whether the PSEUDO words were more easily interpreted as verbs or nouns as word

category is not known to affect the application of disyllabic tone sandhi in either

Tianjin or Standard Chinese. Additional information on the selection of the stimuli

and the entire word list are given in Appendix 1 (see Supplementary material).
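The stimulus counts in this design can be checked with a short enumeration. The sketch below assumes only the numbers stated in the text (six sandhi combinations, seven word (sub)types, four words per cell, 16 fillers per non-sandhi combination); the subtype labels are hypothetical names, not the authors' own.

```python
from itertools import product

TONES = ["T1", "T2", "T3", "T4"]

# The six sandhi-undergoing combinations from (2) and (3).
SANDHI_COMBOS = {
    ("T1", "T1"), ("T3", "T3"), ("T4", "T1"),
    ("T4", "T4"), ("T3", "T2"), ("T3", "T4"),
}

# Four REAL subtypes (word frequency x first-syllable frequency),
# two PSEUDO subtypes (first-syllable frequency), and NOVEL.
# These labels are illustrative only.
WORD_SUBTYPES = [
    "REAL/high-word/high-syllable",
    "REAL/high-word/low-syllable",
    "REAL/low-word/high-syllable",
    "REAL/low-word/low-syllable",
    "PSEUDO/high-syllable",
    "PSEUDO/low-syllable",
    "NOVEL",
]

WORDS_PER_CELL = 4
FILLERS_PER_COMBO = 16

# Every disyllabic tone combination that does not undergo sandhi.
filler_combos = [c for c in product(TONES, repeat=2) if c not in SANDHI_COMBOS]

n_test = len(SANDHI_COMBOS) * len(WORD_SUBTYPES) * WORDS_PER_CELL
n_fillers = len(filler_combos) * FILLERS_PER_COMBO
n_total = n_test + n_fillers

if __name__ == "__main__":
    print("filler combinations:", ["+".join(c) for c in filler_combos])
    print("test words:", n_test)
    print("fillers:", n_fillers)
    print("total stimuli:", n_total)
```

Enumerating the combinations this way also confirms that exactly ten of the sixteen possible disyllabic tone combinations fall outside the sandhi contexts.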

The 328 experimental stimuli were recorded in their monosyllabic citation form

by a 23-year-old male native speaker of Tianjin in an anechoic chamber at the

University of Kansas. Each monosyllable was read without sentential context twice,

and the token deemed clearer by the two authors was used in the experiment. The

experiment was implemented in Paradigm® (Perception Research Systems). The

stimuli were evenly divided into two blocks. Block A included all stimuli with the

tonal combinations T1+T1, T3+T2, and T3+T3 as well as fillers with the tonal

combinations T1+T2, T1+T3, T1+T4, T2+T1, and T2+T2. Block B included all

T3+T4, T4+T1, and T4+T4 stimuli and T2+T3, T2+T4, T3+T1, T4+T2, and T4+T3 fillers. Half of the subjects took block A first, and the other half took block B

first. There was a 5-min break between the blocks. Within each block, the stimuli

were randomized by Paradigm® for each speaker. Each stimulus consisted of two

monosyllables separated by an 800 ms interval. The stimuli were played through a

pair of headphones to the subjects. For each stimulus, the subjects were asked to put

the two syllables together and pronounce them as a real disyllabic word in Tianjin as

naturally as possible. Before the experiment began, there was an introduction in

Tianjin that the subjects heard through the headphones and simultaneously read on a computer screen in front of them. The introduction explained their task both in prose and through examples. There was then a practice session of 9 words that did not appear in the real experiment (three of each of REAL, PSEUDO, and NOVEL words). The instruction and practice items were recorded by the same male speaker whose voice was used in the experiment. The experiment began after a verbal confirmation from the subjects that they were ready. The entire experiment took around 45 min.

Fig. 1 Stimulus design for (a) REAL words and (b) PSEUDO words. [Figure not reproduced: in (a), REAL words branch into high- and low-frequency disyllables, each with a high- or low-frequency first syllable; in (b), PSEUDO words branch into high- and low-frequency first syllables.]

Fifty native speakers of Tianjin participated in the experiment. Two of them were

recorded in an anechoic chamber in the Phonetics and Psycholinguistics Laboratory

of the University of Kansas using a Marantz solid state recorder PMD 671 sampling

at 22.05 kHz and an Electro-Voice RE-20 microphone. The other 48 were recorded

in a quiet room in the Phonetics Laboratory of the Department of Chinese Language

and Literature at Nankai University in Tianjin using the same model of solid-state

recorder and an EV N/D 767a microphone. These speakers all self-reported to be

native Tianjin speakers but were all bilingual in Tianjin and Standard Chinese. We

made it clear to them that we were interested in the Tianjin dialect, and the native-Tianjin instruction and practice should also orient them to the Tianjin context. The

speakers’ recordings were judged to be native-Tianjin-like by a native Tianjin

consultant in the US and a trained Tianjin linguist in Tianjin. The data from two of

the speakers in Tianjin could not be used: one speaker was from a suburb of Tianjin

and spoke a different native dialect; the other’s data were lost due to a software

malfunction. For the 48 speakers whose data we did use, all were from the six inner-city districts of Tianjin and used both Tianjin and Standard Chinese in their daily

lives; 14 were male, 34 were female; they had an average age of 23.4 at the time of

the experiment.

2.2.2 Data analysis

All acoustic analyses of the data were conducted in Praat (Boersma and Weenink

2009). For the first syllable in all test words, we took an f0 measurement every 10 %

of the rhyme duration using Yi Xu’s TimeNormalizedF0 Praat script (Xu 2005),

giving eleven f0 measurements for each syllable. The Maxf0 and Minf0 parameters

in the script as well as the octave-jump cost were adjusted for each speaker, and the

f0 measurements were hand-checked against narrow-band spectrograms in Praat.

There were two situations in which a token was not used in further analysis: first, if

neither the TimeNormalizedF0 script nor the narrow band spectrogram could

produce reliable pitch measurements for it; second, if its second syllable was

pronounced as a stressless syllable, as judged by both authors, who are native

speakers of Standard Chinese.4 The reason the latter cases were excluded was that

stressless syllables in Tianjin have a reduced tonal inventory, and words with

stressless syllables have a different set of tone sandhi behaviors as shown in Jiang

(1994) and Wang (2002). Of the 8064 tokens recorded (168 test words × 48

speakers), 932 were excluded due to these two reasons—an attrition rate of 11.56 %.
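The time normalization described above can be pictured with a short sketch. It only computes the eleven equally spaced measurement times for one rhyme and is not Xu's Praat script; the function name `sample_times` and the rhyme boundaries in the example are hypothetical.

```python
def sample_times(rhyme_start, rhyme_end, n_intervals=10):
    """Return n_intervals + 1 time points, one at every 10 % of the rhyme
    duration when n_intervals is 10 (both edges included)."""
    duration = rhyme_end - rhyme_start
    return [rhyme_start + duration * k / n_intervals
            for k in range(n_intervals + 1)]

# Hypothetical rhyme spanning 0.120 s to 0.320 s
times = sample_times(0.120, 0.320)
```

Because every syllable yields the same number of points regardless of its duration, contours from different tokens can be averaged point by point.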

4 Although neither author is a native speaker of Tianjin, we believe that our judgment was accurate as

stressless syllables in Tianjin have significantly reduced duration (Jiang 1994), similar to Standard

Chinese.

The f0 measurements in Hz were converted to Semi-tone using the formula in

(4a) to better reflect pitch perception (Rietveld and Chen 2006). The Semi-tone

values were then z-score transformed using the formula in (4b) over all

measurements from a given speaker in order to normalize for between-speaker

variation, especially male and female differences (Rose 1987; Zhu 2004). Then for

each speaker, the f0 values of the four words within each word-(sub)type/sandhi-

type combination were averaged, and the averaged data were submitted for

statistical analyses.

(4) a. $ST = 39.87 \times \log_{10}(\mathrm{Hz}/50)$

b. $zST_x = \dfrac{ST_x - \frac{1}{n}\sum_{i=1}^{n} ST_i}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(ST_i - \frac{1}{n}\sum_{i=1}^{n} ST_i\right)^2}}$
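The two transformations in (4) can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the function names and the f0 values are hypothetical, and the standard deviation uses n − 1 in the denominator as in (4b).

```python
import math
import statistics

def hz_to_semitone(hz, ref_hz=50.0):
    """Formula (4a): semitones relative to a 50 Hz reference."""
    return 39.87 * math.log10(hz / ref_hz)

def z_normalize(semitones):
    """Formula (4b): z-scores over all measurements from one speaker,
    using the sample standard deviation (n - 1 in the denominator)."""
    mean = statistics.mean(semitones)
    sd = statistics.stdev(semitones)
    return [(st - mean) / sd for st in semitones]

# Hypothetical f0 values (Hz) pooled from one speaker
f0_hz = [180.0, 195.0, 210.0, 240.0, 225.0, 200.0]
z_st = z_normalize([hz_to_semitone(f) for f in f0_hz])
```

After normalization each speaker's values have a mean of 0 and a standard deviation of 1, which is what makes male and female speakers comparable.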

3 Results and discussions

3.1 Word-type results

We first report the productivity differences among the three word types—REAL,

PSEUDO, and NOVEL—for the different sandhis. For REAL and PSEUDO words, we

further averaged the pitch values of the first syllable from different lexical

frequencies within each word type. For each sandhi, three two-way Repeated-

Measures ANOVAs were conducted, with Word-Type [two levels each: (1) REAL vs.

PSEUDO; (2) PSEUDO vs. NOVEL; (3) REAL vs. NOVEL] and Data-Point (11 levels) as

independent variables. A significant main effect on Word-Type indicates that the

pitches from the two word types under comparison have different means; a

significant main effect on Data-Point indicates that the pitch value is time-sensitive;

and a significant interaction between Word-Type and Data-Point indicates that the

pitches from the two word types have different slopes. Huynh–Feldt adjusted values

were used to correct for sphericity violations.

The average pitches for the different word types for each sandhi are plotted in

Fig. 2, and the ANOVA results are summarized in Appendix 2 (see Supplementary

material). Let us first note that regardless of the word type, the general sandhi

patterns agree with the acoustic findings in Zhang and Liu (2011). Most notably,

compared to traditional descriptions, for the T1+T1 sandhi, the sandhi tone is

higher than expected and closer to a base T2 (Fig. 2a); for the T3+T3 sandhi, the

sandhi tone is lower than expected and does not neutralize with T2 (Fig. 2b); and for

the T4+T4 sandhi, the sandhi tone has the same falling shape as the base tone,

indicating that the sandhi has indeed become obsolete (Fig. 2d).

But crucially, there are differences in how the sandhis applied to the three types

of words as indicated by the often significant differences in pitch means and pitch

slopes between the word types under comparison. Specifically, we can categorize

the six sandhis into three types depending on whether it is the NOVEL words or the

REAL words or neither that share more phonetic properties with the base tone of the

first syllable of the stimuli. The properties under comparison are the pitch mean and

pitch slope of the tones. In L+L → LH+L (Fig. 2a), LH+LH → H+LH (Fig. 2b),

HL+HL → L+HL (Fig. 2d), and LH+H → L+H (Fig. 2e), the sandhi tone for

NOVEL words shares more phonetic properties with the base tone compared to REAL

Fig. 2 Average pitch contours of the first syllable in REAL, PSEUDO, and NOVEL disyllabic words for the six different sandhis (a–f). Significant comparisons in pitch means and pitch slopes are noted in the graphs. *, **, and *** indicate significant differences at the p < 0.05, p < 0.01, and p < 0.001 levels, respectively. For detailed ANOVA results, see Appendix 2 in Supplementary material. a L+L → LH+L (T1+T1 → T3+T1), b LH+LH → H+LH (T3+T3 → T2+T3), c HL+L → H+L (T4+T1 → T2+T1), d HL+HL → L+HL (T4+T4 → T1+T4), e LH+H → L+H (T3+T2 → T1+T2), f LH+HL → L+HL (T3+T4 → T1+T4)

words and PSEUDO words. For instance, in L+L → LH+L (Fig. 2a), the sandhi tone

for NOVEL words is lower in pitch than that for REAL words. Given that the base tone

(L) has a lower pitch than the expected sandhi tone (LH), this falls under the

category in which the NOVEL words share more phonetic properties with the base

tone. On the other hand, in HL+L → H+L (Fig. 2c), the sandhi tone in NOVEL

words has more properties of the expected sandhi tone H by having an overall

higher pitch than the sandhi tones in REAL and PSEUDO words. Finally, for LH

+HL → L+HL (Fig. 2f), there is no difference in pitch mean or pitch slope among

the sandhi tones for the three types of words.

Our interpretation of these results is as follows. The first type of sandhi is

underlearned by the speakers as the sandhi applies less productively in NOVEL words

than REAL words, as indicated by the greater phonetic similarity between the sandhi

tone and the base tone in NOVEL words. The second type of sandhi—the sandhi with

exceptions HL+L → H+L (Fig. 2c)—has been generalized and thus applies with a

greater regularity in NOVEL words. We consider this as an instance of overlearning. The last type of sandhi, LH+HL → L+HL (Fig. 2f), is properly learned from the

lexicon and applies in the same fashion to PSEUDO and NOVEL words as in REAL

words. These results and their interpretations are summarized in Table 1.

A note of caution is in order for the interpretation of the gradient differences

among different word types. The average pitches in Fig. 2 represent all usable

tokens in the recorded data regardless of whether the token undergoes the sandhi per

the rules in (2). This is because whether a tonal combination has undergone the

sandhi categorically, incompletely, or has not undergone the sandhi at all is often

difficult to determine except for a handful of cases. For L+L → LH+L and LH

+LH → H+LH, all usable tokens have undergone sandhi regardless of word type;

therefore, the gradient differences among the different word types were due to

incomplete application of the sandhis to at least some of the wug tokens. For HL

+L → H+L and HL+HL → L+HL, however, there are tokens in which the sandhi

clearly did not apply—a handful for the former and the vast majority for the latter;

the gradient differences seen in Fig. 2 are thus likely due to both categorical and

gradient differences in the application of the sandhis. For the two half-third sandhis

LH+H → L+H and LH+HL → L+HL, whether the sandhi has applied to a token

was particularly difficult to decide, and we surmise that the gradient differences

Table 1 A summary of the word-type results for the six tone sandhi patterns

Sandhi pattern | Acoustic results for sandhi tone | Learning classification
L → LH/ __ L | Lower pitch in NOVEL than REAL words | Underlearning
LH → H/ __ LH | Lower pitch in NOVEL than REAL words | Underlearning
HL → H/ __ L | Higher pitch in NOVEL than REAL words | Overlearning
HL → L/ __ HL | Higher pitch in NOVEL than REAL words | Underlearning
LH → L/ __ H | Higher pitch in NOVEL than REAL words | Underlearning
LH → L/ __ HL | No pitch difference among word types | Proper learning

observed for the former are primarily caused by different degrees of gradient

application of the sandhi.

Our results are in agreement with our hypotheses. We have shown that a tone

sandhi pattern may be underlearned despite its full productivity in the lexicon, and

the underlearning may be gradiently realized as the incomplete application of the

sandhi. As hypothesized, the set of sandhis that shows underlearning includes not

only the regular and the obsolete sandhis, L+L → LH+L, LH+LH → H+LH, and

HL+HL → L+HL, but also the durationally based LH+H → L+H (reduction of

contour due to insufficient duration). The other durationally-based sandhi LH

+HL → L+HL, however, shows proper learning as expected. It is possible that in

order to approach proper learning, the pattern needs the help of both phonetics and

high lexical frequency: the trigger of the properly learned half-third sandhi, HL

(Tone 4), has considerably higher type and token frequencies than the trigger of the

underlearned half-third sandhi, H (Tone 2). Zhang and Lai (2010) found the same

underlearning and proper learning patterns for the half-third sandhis before Tone 2

and Tone 4 in Standard Chinese as well. For the underlearning of the obsolete

sandhi HL+HL → L+HL, our interpretation is that the real words that still undergo

the sandhi are listed in the lexicon, but the sandhi itself has become unproductive,

manifested in the results as underlearning. And for HL+L → H+L, the exceptions

to the sandhi are listed in the lexicon, but the sandhi itself is productive, manifested

in the results as overlearning.

3.2 Lexical frequency results

The effects of lexical frequency on sandhi productivity are reported on three separate

graphs for each tone sandhi, two for REAL words and one for PSEUDO words as shown

in Fig. 3. The two comparisons for the REAL words are based on the token frequency of

the disyllable (high vs. low) and the token frequency of the first syllable (high vs.

low), and the comparison for the PSEUDO words is based on the token frequency of the

first syllable. Each graph represents the average pitch contours of the first syllable of

the two word types under comparison. A two-way Repeated-Measures ANOVA was

conducted for each comparison, with Frequency (two levels) and Data-Point (11

levels) as independent variables. A significant main effect on Frequency indicates that

the pitches from the two frequency profiles have different means, and a significant

interaction between Frequency and Data-Point indicates that the pitches from the

different frequencies have different slopes. Huynh–Feldt adjusted values were again

used. Significant comparisons are indicated in Fig. 3. Detailed ANOVA results are

given in Appendix 3 (see Supplementary material).

The frequency results in Fig. 3 show that higher token frequency generally leads

to higher productivity. For both L+L → LH+L (Fig. 3a) and LH+LH → H+LH

(Fig. 3b) for which the sandhi raises the base tone, the σ1 comparison for the PSEUDO

words showed that higher token frequency σ1 leads to higher pitch. This indicates

that higher frequency for a syllable likely leads to a stronger allomorph listing for its

sandhi tone. In turn, this supports the hypothesis that the gradient underlearning of

sandhis exhibited in wug words is an exaggerated frequency effect. The higher

Fig. 3 Effects of lexical frequency on the productivity of different sandhis. For each sandhi, three comparisons are shown: REAL-Word-High vs. REAL-Word-Low; REAL-Syll1-High vs. REAL-Syll1-Low; PSEUDO-Syll1-High vs. PSEUDO-Syll1-Low. All graphs show the pitch contours of the first syllable of the two word types under comparison. In the graphs, *, **, and *** indicate significant differences at the p < 0.05, p < 0.01, and p < 0.001 levels, respectively. For detailed ANOVA results, see Appendix 3 in Supplementary material. a L+L → LH+L (T1+T1 → T3+T1), b LH+LH → H+LH (T3+T3 → T2+T3), c HL+L → H+L (T4+T1 → T2+T1), d HL+HL → L+HL (T4+T4 → T1+T4), e LH+H → L+H (T3+T2 → T1+T2), f LH+HL → L+HL (T3+T4 → T1+T4)

sandhi productivity for high-frequency words is also attested in the two REAL

comparisons of the obsolete sandhi HL+HL → L+HL (Fig. 3d): the sandhi tones

for words with higher frequency are lower than those for words with low frequency,

which, for a sandhi that lowers the base tone, indicates higher productivity for the

high-frequency words. This is likely due to the fact that high-frequency words are

more conservative in maintaining exceptional behavior, which in this case is to

maintain the sandhi. Interestingly, for the sandhi with exceptions, HL+L → H+L

(Fig. 3c), in the REAL-Word-High vs. REAL-Word-Low comparison, higher

frequency in fact leads to lower productivity as evidenced by the lower pitch of

the sandhi tone for the high-frequency words. This reversal of the pattern may also

be caused by the conservative nature of high-frequency words in maintaining

exceptional patterns; but in this case, the exceptional pattern is the failure to apply

this sandhi.

We also found a difference between the two half-third sandhis in the frequency

effect. For LH+H → L+H, two of the frequency comparisons (REAL-Word-High

vs. REAL-Word-Low; PSEUDO-σ1-High vs. PSEUDO-σ1-Low) showed a significant

difference in pitch slope, yet no significant difference was obtained for any

frequency comparison for LH+HL → L+HL. In particular, for the two

comparisons for LH+H → L+H that showed a significant difference, the high-

frequency words both had a more pronounced pitch fall at the beginning. Given that

the half-third sandhi in Tianjin primarily renders the first syllable a falling tone as

shown in Zhang and Liu (2011), this seems to indicate a productivity advantage for

the high-frequency words. This result, again, may be due to the overall higher lexical

frequency of HL (Tone 4) than H (Tone 2), further encouraging proper learning of

the sandhi involving the former.

We are unable to interpret two of the frequency patterns observed in PSEUDO words: the

high productivity of HL+L → H+L when σ1 has a high frequency and the low

productivity of HL+HL→ L+HL when σ1 has a high frequency. It is interesting to

note that these anomalies occur in the two sandhis with exceptional behaviors and

that they are the mirror images of the patterns observed for REAL words for these

sandhis. But we have yet to understand the significance of these observations.

4 A learning model

Our experimental results showed that Tianjin speakers’ knowledge of tone sandhi is

a combination of proper learning, underlearning, and overlearning from the lexicon:

on the one hand, lexical statistics do inform learning as evidenced by the frequency

effects found in our experiment; on the other hand, productive patterns in the

lexicon can be underlearned, especially when the patterns do not have strong

phonetic bases, and the underlearning can be gradiently manifested in the phonetic

realization of the sandhi tones, yet patterns with exceptions can also be overlearned

and generalized to novel words. It is therefore imperative to have a learning model

that is able to make these predictions.

4.1 The maximum entropy (MaxEnt) model

To this end, we designed a substantively-biased learning model based on the

Maximum Entropy (MaxEnt) grammar. In MaxEnt, each constraint is associated

with a weight, and for each input, the probability of a particular candidate

surfacing as the output is determined by how well this candidate satisfies the

constraint weight hierarchy when compared with all other candidates. Learning in

a MaxEnt grammar is to determine the constraint weights that maximize the log

probability of the learning data, and for each constraint, the learner can impose a

Gaussian prior, with a mean of μ and a variance of σ², over its weight to prevent

overfitting the data. The μ represents the default weight for the constraint, and σ²

determines the severity of the penalty when the weight of the constraint deviates

from μ: the smaller the σ², the greater the penalty. Crucially, learning biases

can be encoded as different σ² values for different constraints. For more details on

MaxEnt grammars and learning biases as Gaussian priors, see Goldwater and

Johnson (2003), Wilson (2006), Jäger (2007), and Hayes and Wilson (2008),

among others.
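A minimal sketch of the quantities just described is given below. It follows the standard MaxEnt formulation in the works cited (candidate probability proportional to the exponential of the negative weighted violation sum, and a Gaussian prior penalty of (w − μ)²/(2σ²) per constraint); it is not the code of any particular tool, and the function names and toy numbers are hypothetical.

```python
import math

def maxent_probs(violations, weights):
    """violations: one list of violation counts per candidate.
    Returns P(candidate), proportional to exp(-sum(weight * violations))."""
    harmonies = [sum(w * v for w, v in zip(weights, cand)) for cand in violations]
    exps = [math.exp(-h) for h in harmonies]
    total = sum(exps)
    return [e / total for e in exps]

def penalized_log_likelihood(violations, observed_counts, weights, mus, sigma2s):
    """Log probability of the observed data minus the Gaussian prior penalty
    sum((w - mu)^2 / (2 * sigma^2)); learning maximizes this value."""
    probs = maxent_probs(violations, weights)
    log_lik = sum(n * math.log(p) for n, p in zip(observed_counts, probs))
    penalty = sum((w - mu) ** 2 / (2 * s2)
                  for w, mu, s2 in zip(weights, mus, sigma2s))
    return log_lik - penalty

# Toy example (hypothetical numbers): three candidates, two constraints
violations = [[4, 0], [2, 2], [0, 4]]
weights = [2.0, 0.5]
probs = maxent_probs(violations, weights)
```

With the same weights, a smaller σ² yields a larger penalty, so a constraint with a small σ² needs more support from the data before its weight moves away from μ.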

4.2 Constraints

We also base our analysis on the dual listing/generation model of Zuraw (2000,

2010). This model assumes that existing forms are lexically listed and are protected

by highly-ranked faithfulness constraints, but lower and stochastically-ranked

constraints can encode both patterns of lexical statistics and phonetically-based

generalizations. One crucial type of constraint in our model is USELISTED, inspired

by Zuraw (2000). Two types of USELISTED constraints are proposed. First, given that

the speakers performed the sandhis better in real words than in wug words, we posit

that the disyllabic words are listed in the lexicon with their sandhi tones, and there

are USELISTED constraints on disyllables that force the listed disyllables to be used as

in (5a). Second, since our results also showed that the speakers performed the

sandhis better when the first syllable is an existing Tianjin syllable than when it is an

accidental gap, this indicates that sandhi allomorphs of existing syllables are also

listed, and we posit a second group of USELISTED constraints that forces the listed

syllable allomorphs to be used in non-final sandhi positions as in (5b). Note that the

term “allomorph” in (5b) is used in a more abstract sense than the morpheme-

specific traditional definition of the term as it refers to syllables that can cue

multiple homophonous morphemes. For example, [panHL] is an existing syllable in

Tianjin and can represent morphemes meaning “half,” “partner,” “to mix,” “to act

as,” “to deal with,” and “to trip.” Therefore, this syllable has a listed allomorph

[panH] to be used before an L-toned syllable, and USELISTED(panHL/_L) requires the

use of this allomorph in the appropriate context regardless of which morpheme this

syllable represents.

(5) USELISTED constraints:

a. USELISTED(σL–σL): Use the listed /σLH–σL/ for /σL/+/σL/.

Mutatis mutandis for USELISTED(σLH–σLH), USELISTED(σHL–σL), USELISTED(σHL–σHL), USELISTED(σLH–σH), and USELISTED(σLH–σHL).

b. USELISTED(σL/_L): Use the listed allomorph /σLH/ for /σL/ before an /L/-toned syllable.

Mutatis mutandis for USELISTED(σLH/_LH), USELISTED(σHL/_L), USELISTED(σHL/_HL), USELISTED(σLH/_H), and USELISTED(σLH/_HL).

In our implementation of the model, the USELISTED constraints in (5a) are word-

specific, and the ones in (5b) are syllable-specific. In other words, there are as many

(5a)-type USELISTED constraints as words in Tianjin, and there are as many (5b)-type

USELISTED constraints as syllable types. This is in the same spirit as the lexically

indexed constraints à la Coetzee (2009), Becker et al. (2011), and Coetzee and

Kawahara (2013), and it can be seen as a possible way in which lexical entries

interact with the rest of the phonological grammar: the strength of the lexical entry

is now represented as the weight of its USELISTED constraint.

The USELISTED constraints employed here are different from USELISTED in Zuraw

(2000) in the following respects. First, Zuraw employs USELISTED only for

morphologically complex forms, not for allomorphs. Second, Zuraw assumes that

each candidate is an input–output pairing, and her USELISTED constraint is defined as

“The input portion of a candidate must be a single lexical entry” (p. 50). We have

made a different assumption: the candidate that is identical to the listed form is

necessarily derived from the listed form. Third, Zuraw uses only one USELISTED

constraint and encodes the strength of a lexical entry by a listedness value from 0 to

1 that is determined by the entry’s lexical frequency. The listedness value reflects

the availability of the lexical entry in the derivation of the output. We, on the other

hand, have a proliferation of USELISTED constraints whose weights reflect the

strengths of lexical entries and syllable allomorph listings as determined by their

frequencies. The assumption, then, is that a lexical entry for a word or an abstract

allomorph for a syllable with the appropriate tone sandhi is built together with a

USELISTED constraint whenever a sandhied form is encountered and accepted by

the speaker, and the weight of the USELISTED constraint gradually increases as the

form is further encountered.5

5 A reviewer asked what restrictions USELISTED constraints would have and whether they only refer to

tones of a given language. These are interesting and difficult questions. Provided that (a) a lexical

phonological pattern is not entirely productive, and (b) there are productivity differences among different

types of phonological patterns, indicating that the lack of full productivity is not just a task effect, it is

necessary to encode the effect of lexicality for this pattern in the grammar. Therefore, USELISTED

constraints would be applicable to any type of phonological pattern, not just tonal ones. It is possible to

conceive of USELISTED constraints simply as IO-faithfulness constraints, which would require the output

to be identical to the listed form. This is essentially how we have used these constraints here. It is then

less of a surprise that these constraints are applicable to other phonological features. In a published update

of Zuraw (2000), Zuraw (2010) in fact rephrased the USELISTED constraints in similar terms and

distinguished the correspondence between the output and the listed form and the correspondence between

the output and the “underlying” form by shifting the burden of the latter to Output–Output-

correspondence. We have simply maintained the distinction between USELISTED and IO-faithfulness

here. The proliferation of the USELISTED constraints is necessary for the analysis of lexical frequency

Markedness constraints that militate against certain tonal combinations and

hence motivate tone sandhi6 and faithfulness constraints that protect underlying

tones, as defined in (6) and (7), are also included in our model.

(6) Markedness constraints:

a. *L–L b. *LH–LH c. *HL–L

d. *HL–HL e. *LH–H f. *LH–HL

(7) Faithfulness constraints:7

a. PRESERVE(L) b. PRESERVE(H) c. PRESERVE(LH)

d. PRESERVE(HL/_L) e. PRESERVE(HL/_HL)

In order to capture the gradience observed in sandhi application, we define

these constraints to be gradient in that candidates may incur different degrees of

violation of the constraints encoded as different numbers of violation marks. We

assume that the number of violations for each constraint ranges from 0 to 4, with

0 indicating that the output tone completely satisfies the requirement set forth by

the constraint, and 4 indicating that the output tone maximally deviates from the

requirement. The 0–4 scale is admittedly ad hoc, but it represents a reasonable

trade-off between the contrastive tone differences in Tianjin and the potential

gradient steps between contrastive tones given the production and perception of

tones. As an illustration, Table 2 shows the evaluations of five candidates for a

real word with /L/+/L/ base tones and a listed /LH–L/ form against USELISTED

(σL–σL), USELISTED(σL/_σL), PRESERVE(L), and *L–L. The five candidates [L–L],

[LL↑–L], [LM–L], [LH↓–L], and [LH–L] are phonetically evenly spaced between

[L–L] and [LH–L]. The closer a candidate is to /L–L/, the more violations it

incurs for USELISTED(σL–σL), USELISTED(σL/_L), and *L–L, but the fewer

violations it incurs for PRESERVE(L).
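The violation pattern described for the five candidates can be generated mechanically. The sketch below is only an illustration of that pattern for a real /L/+/L/ word with a listed /LH–L/ form; the function name and the idea of indexing candidates by a step from 0 (base-like) to 4 (fully sandhied) are assumptions made here for exposition.

```python
MAX_VIOLATIONS = 4
# Ordered from the unchanged base form to the full sandhi form
CANDIDATES = ["L–L", "LL↑–L", "LM–L", "LH↓–L", "LH–L"]

def violation_profile(step):
    """step = 0 for the candidate identical to the base, 4 for the listed
    sandhi form; intermediate steps are the evenly spaced gradient candidates."""
    distance_from_sandhi = MAX_VIOLATIONS - step
    return {
        "USELISTED(σL–σL)": distance_from_sandhi,
        "USELISTED(σL/_L)": distance_from_sandhi,
        "PRESERVE(L)": step,
        "*L–L": distance_from_sandhi,
    }

tableau = {cand: violation_profile(i) for i, cand in enumerate(CANDIDATES)}
```

The two USELISTED constraints and the markedness constraint always pull in the same direction for such a word, while PRESERVE(L) pulls in the opposite one.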

Footnote 5 continued

effects on productivity as well as lexical variation, and Coetzee (2009), Becker et al. (2011), and Coetzee

and Kawahara (2013), among others, have used a similar strategy.

6 The markedness constraints should be taken as phonotactic generalizations that speakers make when

tonal alternations are encountered. This is different from the canonical OT assumption that all constraints

are in UG (Prince and Smolensky 1993). For modeling the learning of phonotactic constraints, see Hayes

and Wilson (2008).

7 The reason we use PRESERVE instead of IDENT in our faithfulness constraints is that in its formal

definition, IDENT(F) requires [F] to be a distinctive feature; the featural representation of tone, however, is

controversial in both the number of tone levels and whether there are contour tone features (see Zhang

2010 for a review of the issue). We have therefore chosen to use the theory-neutral PRESERVE to avoid this

controversy.

4.3 Learning biases as σ² values

We set the default weight μ to be 0 and the default σ² to be 10⁻³ for all constraints.

But we also encode two learning biases by adjusting the σ² values of the USELISTED

and the markedness constraints in the following ways.

First, the σ² value of each USELISTED constraint is multiplied by a coefficient

BListed that is smaller than 1 and thus biases against promoting the weight of the

constraint. For each USELISTED constraint, we posit BListed to be 10 to the negative

power of a logistic function, in which x represents the number of morphemes that

the USELISTED constraint covers as in (8). The x value for the USELISTED

constraints for disyllabic words is naturally 1. For the USELISTED constraints for

syllable-level allomorphs, the x value equals the number of homophones that the

syllable represents. As estimated from Da’s (2004) corpus, the average numbers

of homophones for a syllable in each of the tones in Mandarin are summarized in

Table 3. We will use these numbers as approximations for the x values for the

syllable-level USELISTED constraints in our learning simulation. The BListed values

according to these numbers are summarized in Table 3 as well. The intuition

behind this bias coefficient is that learners use lexical information in concomitance

with grammatical resources such as the MARKEDNESS » FAITHFULNESS ranking

to make phonological generalizations, but they do so cautiously, expressed in the

model by assigning USELISTED constraints greater penalties if they deviate from

the default ranking of 0, so that the weights of these constraints are harder to

promote; moreover, learners are unwilling to treat large amounts of data as listed

behavior, expressed in the model as greater penalties for syllable-level USELISTED

constraints, so that these constraints are even harder to promote along the weight

scale.

(8) $B_{\mathrm{Listed}} = 10^{-\frac{1}{1 + e^{1 - 0.25x}}}$

(x = the number of morphemes that the USELISTED constraint covers.)
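The coefficient in (8) is easy to compute directly. The sketch below is a plain transcription of the formula (the function name `b_listed` is hypothetical) and prints the values for the x values listed in Table 3.

```python
import math

def b_listed(x):
    """Formula (8): 10 raised to the negative of a logistic function of x,
    where x is the number of morphemes the USELISTED constraint covers."""
    return 10 ** (-1.0 / (1.0 + math.exp(1.0 - 0.25 * x)))

# x = 1 for disyllabic words; the other values are the average homophone
# counts per syllable given in Table 3
for x in (1, 3.72, 5.45, 5.76):
    print(x, round(b_listed(x), 4))
```

The coefficient is always below 1 and shrinks as x grows, so constraints covering more morphemes receive a smaller σ² and are harder to promote.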

Second, we encode a learning bias in favor of promoting the weights of

USELISTED constraints that regulate base-sandhi mappings with a strong phonetic

basis [i.e., USELISTED(σLH/_H), USELISTED(σLH/_HL)] and the relevant markedness

constraints (i.e., *LH+H, *LH+HL) by multiplying their σ² values with a

coefficient BPhonetics = 10. The rest of the constraints are assumed to have a

Table 2 Constraint evaluations (base: /L/+/L/; listed: /LH–L/)

Candidate | USELISTED(σL–σL) | USELISTED(σL/_L) | PRESERVE(L) | *L–L
L–L | 4 | 4 | 0 | 4
LL↑–L | 3 | 3 | 1 | 3
LM–L | 2 | 2 | 2 | 2
LH↓–L | 1 | 1 | 3 | 1
LH–L | 0 | 0 | 4 | 0

BPhonetics = 1. This coefficient expresses a substantive bias à la Wilson (2006) in

allowing phonetically motivated patterns to have an edge in learning over other

patterns (see also Zhang and Lai 2010; Zhang et al. 2009, 2011). Each USELISTED

constraint’s σ² value, then, is 10⁻³ multiplied by its BListed and BPhonetics values while

the rest of the constraints’ σ² values are 10⁻³ multiplied by their respective BPhonetics

values.

The σ² values for all constraints are summarized in Table 4.
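The way the two coefficients combine can be sketched as follows. The function and constant names are hypothetical; the BListed inputs are the rounded values reported in Table 3, and BListed is treated as 1 for constraints that are not USELISTED constraints.

```python
DEFAULT_SIGMA2 = 1e-3
B_PHONETICS_STRONG = 10.0  # coefficient for phonetically based patterns

def sigma2(b_listed=1.0, phonetically_based=False):
    """sigma^2 = default sigma^2 * BListed * BPhonetics."""
    b_phonetics = B_PHONETICS_STRONG if phonetically_based else 1.0
    return DEFAULT_SIGMA2 * b_listed * b_phonetics

examples = {
    "USELISTED(σ–σ)": sigma2(b_listed=0.4777),
    "USELISTED(σL/_L)": sigma2(b_listed=0.2573),
    "USELISTED(σLH/_H)": sigma2(b_listed=0.3292, phonetically_based=True),
    "*LH+H": sigma2(phonetically_based=True),
    "PRESERVE(L)": sigma2(),
}
```

The larger σ² of the phonetically based constraints means that their weights can move further from 0 for the same amount of evidence.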

4.4 Learning simulations

The goal of our learning simulation is to train the learner with a representative

sample of the Tianjin lexicon so that it will acquire a grammar that can predict our

speakers’ wug test behavior. The learning was simulated using the MaxEnt

Grammar Tool (Hayes et al. 2009a). The training dataset included 20 real words for

each of the base tone combinations L+L, LH+LH, HL+L, HL+HL, LH+H, and

LH+HL. Among the 20 words for each tonal combination, 10 were high frequency,

and 10 were low frequency. We used the average raw frequencies of the disyllabic

words in each tonal combination used in our experiment from Da’s corpus to

simulate the token frequencies of words in the training dataset as shown in Table 5.

For example, for L+L, each of the 10 high-frequency words had a token frequency

of 4615, and each of the 10 low-frequency words had a token frequency of 75.

For each word, five candidates whose initial syllables were phonetically evenly

spaced between the base tone and the sandhi tone, like in Table 2, were considered.

For L+L, LH+LH, LH+H, and LH+HL, the base tone was consistently listed as

undergoing sandhi; for HL+L, one high-frequency word and one low-frequency

word were listed not to undergo sandhi; for HL+HL, only one high-frequency word

and one low-frequency word were listed to undergo sandhi. The USELISTED

constraints were indexed to the words and the syllables.
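The composition of the training dataset described above can be laid out in a short sketch. This is only an illustration of the design, not the input format of the MaxEnt Grammar Tool; the field names are hypothetical, the token frequencies are those reported in Table 5, and the choice of which word in each frequency band is the exceptional one (index 0) is arbitrary.

```python
TOKEN_FREQ = {  # (high-frequency, low-frequency) token counts per word
    "L+L": (4615, 75), "LH+LH": (3267, 201), "HL+L": (3652, 307),
    "HL+HL": (3629, 173), "LH+H": (2851, 117), "LH+HL": (4291, 216),
}

def undergoes_sandhi(combo, index):
    """index runs 0-9 within each frequency band."""
    if combo == "HL+L":
        return index != 0   # one listed exception per band
    if combo == "HL+HL":
        return index == 0   # only one word per band still undergoes sandhi
    return True             # the other four sandhis apply consistently

def build_training_set():
    words = []
    for combo, (high, low) in TOKEN_FREQ.items():
        for band, freq in (("high", high), ("low", low)):
            for i in range(10):
                words.append({"combo": combo, "band": band,
                              "token_freq": freq,
                              "listed_sandhi": undergoes_sandhi(combo, i)})
    return words

training_set = build_training_set()
```

Each of these 120 words would then be paired with its five gradient candidates before being passed to the learner.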

Each sandhi was tested separately, and the learner acquired the weights of the

constraints relevant for the sandhi. We will not list the weights for individual

constraints due to the large number of word- and syllable-specific USELISTED

constraints. But overall, the USELISTED constraints for high-frequency disyllabic

words have higher weights than those for low-frequency disyllabic words, and the

USELISTED constraints for high-frequency syllable allomorphs have higher weights

Table 3 BListed values for USELISTED constraints

Constraint | x | BListed
USELISTED(σ–σ) | 1 | 0.4777
USELISTED(σL/_σL) | 5.45 | 0.2573
USELISTED(σLH/_σLH) | 3.72 | 0.3292
USELISTED(σHL/_σL) | 5.76 | 0.2465
USELISTED(σHL/_σHL) | 5.76 | 0.2465
USELISTED(σLH/_σH) | 3.72 | 0.3292
USELISTED(σLH/_σHL) | 3.72 | 0.3292

than those for low-frequency syllable allomorphs. Also, USELISTED for a disyllabic

word has a higher weight than USELISTED for the syllable allomorph of its first

syllable. The markedness constraints generally have high weights except for *HL–

HL, which has a weight of 0. The faithfulness constraints, on the other hand, have a

weight of 0 except for PRESERVE(HL/_HL), which has a high weight.

To test the accuracy of the learning model, we considered the learner’s

predictions for five types of words: high- and low-frequency real words (REAL-High,

REAL-Low), pseudo words in which σ1 comes from high- and low-frequency real

words (PSEUDO-High, PSEUDO-Low), and novel words with a nonce σ1 (NOVEL). For

PSEUDO words, we assumed that the only relevant type of USELISTED constraint was

the syllable-level constraints, and for NOVEL words, none of the USELISTED

constraints were relevant. For HL+L and HL+HL, we tested both words that are

listed to undergo sandhi as well as words that are listed not to. The learner made

predictions on the percentages of the five output candidates whose initial syllables

were phonetically evenly spaced between the base tone and the sandhi tone.

Table 4 σ² values for all constraints

Constraint              σ²           Constraint          σ²
USELISTED(σ–σ)          0.0004777    *HL+L               0.001
USELISTED(σL/_L)        0.0002573    *HL+HL              0.001
USELISTED(σLH/_LH)      0.0003292    *LH+H               0.01
USELISTED(σHL/_L)       0.0002465    *LH+HL              0.01
USELISTED(σHL/_HL)      0.0002465    PRESERVE(L)         0.001
USELISTED(σLH/_H)       0.003292     PRESERVE(H)         0.001
USELISTED(σLH/_HL)      0.003292     PRESERVE(LH)        0.001
*L+L                    0.001        PRESERVE(HL/_L)     0.001
*LH+LH                  0.001        PRESERVE(HL/_HL)    0.001

Table 5 Token frequencies of disyllabic words in the training dataset

Base tones    High frequency    Low frequency
L+L           4615              75
LH+LH         3267              201
HL+L          3652              307
HL+HL         3629              173
LH+H          2851              117
LH+HL         4291              216

Fig. 4 The learner’s predictions for the behavior of different sandhis for five word types: REAL-High, REAL-Low, PSEUDO-High, PSEUDO-Low, and NOVEL. Bars in the graphs represent the average pitches among the predicted outputs of the edge of the tone where the base and the sandhi tones differ. The pitch is represented in a 1–5 numerical scale: 5 = High and 1 = Low. a L+L → LH+L (T1+T1 → T3+T1), b LH+LH → H+LH (T3+T3 → T2+T3), c HL+L → H+L (T4+T1 → T2+T1), d HL+HL → L+HL (T4+T4 → T1+T4), e LH+H → L+H (T3+T2 → T1+T2), f LH+HL → L+HL (T3+T4 → T1+T4)

Given that for all the sandhis, the base and the sandhi tones differ in pitch only at

either the left or the right edge of the tone, in reporting the learner’s predictions, we

report the average pitch of this crucial edge according to the predicted outputs for each

base tone combination. To facilitate the pitch calculation, we represented the pitch on a

1–5 numerical scale, on which 5 = H, 4 = H↓, 3 = M, 2 = L↑, 1 = L. To illustrate, take

the example of a high-frequency real word with the base tones /L/+/L/: if the learner

predicts the five candidates [L–L], [LL↑–L], [LM–L], [LH↓–L], and [LH–L] to have

the percentages 0.005, 0.062, 0.705, 8.021, and 91.206 %, respectively, then the

predicted average offset pitch for σ1 is 1 × 0.005 % + 2 × 0.062 % + 3 × 0.705 % +

4 × 8.021 % + 5 × 91.206 % = 4.9036. For HL+L and HL+HL, the average pitch

was derived by proportionally combining the predictions for forms with listed sandhi

and the predictions for forms with listed no-sandhi (9:1 for HL+L, 1:9 for HL+HL).

The learner’s predictions are summarized in Fig. 4.
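The averaging step described above can be sketched in a few lines of Python. The first function reproduces the /L/+/L/ calculation from the text; the second reads "proportionally combining" as a weighted mean, which is an assumption about the procedure. The function names and the input values 4.0 and 2.0 in the second call are hypothetical.

```python
# Pitch values of the crucial edge for the five candidates, from the base tone
# to the sandhi tone on the 1-5 scale (1 = L, 2 = L-raised, 3 = M, 4 = H-lowered, 5 = H).
EDGE_PITCH = (1, 2, 3, 4, 5)


def average_edge_pitch(percentages, pitches=EDGE_PITCH):
    """Expected pitch of the crucial tone edge; percentages are on a 0-100 scale."""
    return sum(p * pct for p, pct in zip(pitches, percentages)) / 100.0


def combine_listed(avg_sandhi, avg_no_sandhi, sandhi_parts, no_sandhi_parts):
    """Weighted mean of the predictions for listed-sandhi and listed-no-sandhi forms."""
    total = sandhi_parts + no_sandhi_parts
    return (sandhi_parts * avg_sandhi + no_sandhi_parts * avg_no_sandhi) / total


# Percentages for a high-frequency real /L/+/L/ word, as given in the text.
ll_real_high = average_edge_pitch([0.005, 0.062, 0.705, 8.021, 91.206])
print(round(ll_real_high, 4))  # 4.9036

# Hypothetical averages combined at the 9:1 ratio used for HL+L.
hl_l_example = combine_listed(4.0, 2.0, 9, 1)
```

For HL+HL the same function would be called with the parts reversed (1 and 9).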

For the sandhi L+L → LH+L, given that the sandhi changes L to LH, the higher

the right edge of the output tone is, the more productively the sandhi has applied.

Our predictions in Fig. 4a are that, first, there is a general gradation of sandhi

productivity from REAL to PSEUDO to NOVEL, and second, the sandhi applies more

productively in high-frequency than low-frequency words. These predictions were

borne out in our experimental results. For LH+LH → H+LH (Fig. 4b), the pattern

is similar.

For the two half-third sandhi patterns LH+H → L+H (Fig. 4e) and LH+

HL → L+HL (Fig. 4f), our model predicts that the magnitude of the differences

among the different word types is smaller due to the Bphonetics coefficient that

allowed the weights of relevant USELISTED and markedness constraints to be

promoted more easily. In our experiment, the LH+H → L+H sandhi showed

underlearning in PSEUDO and NOVEL words but no clear results based on lexical

frequency; the LH+HL sandhi showed proper learning. Our model in fact predicts

slightly smaller pitch differences among different word types for the latter due to the

higher token frequencies of the LH+HL words in the learner’s input.

Regarding the sandhis with exceptional behavior, for HL+HL → L+HL we

predicted a higher productivity in real words, especially those with high frequency

as indicated by a lower σ1 onset pitch (Fig. 4d), and for HL+L → H+L we

predicted the mirror image, namely, a lower productivity in high-frequency real

words as indicated by a lower σ1 offset pitch (Fig. 4c); both agreed with the

experimental results. In our model, the nature of the predicted productivity

differences is a combination of categorical and gradient differences in the

application of the sandhis. This also echoes the experimental results.

We compared the current model with a baseline model in which the phonetic

nature of the sandhi is not encoded in the grammar; in other words, Bphonetics = 1 for

all constraints. This baseline model makes different predictions for LH+H and

LH+HL as shown in Fig. 5. The main difference is in the magnitude of the

predicted pitch difference: the baseline model predicts a considerably larger

productivity difference between different word types and words of different lexical

frequencies. Given that in our experimental results, lexical frequency had a

significant effect on productivity for only a subset of the comparisons for LH+H

(Fig. 3e), and neither word type nor lexical frequency had a significant effect on

productivity for LH+HL, the smaller effect predicted by the model with the

Bphonetics coefficient is more consistent with the experimental results.

Fig. 5 The predictions of the baseline learner in which Bphonetics = 1 for all constraints for the LH+H and LH+HL sandhis. a LH+H → L+H (T3+T2 → T1+T2), b LH+HL → L+HL (T3+T4 → T1+T4)

Another baseline model that we compared our results to was one in which there is

no bias against the promotion of USELISTED constraints; in other words, BListed = 1.

The phonetic bias is retained. The predictions of this baseline model are given in

Fig. 6. Compared to the original model, this baseline model predicts similar

patterns, but the differences predicted among word types and different frequencies

are of slightly greater magnitudes. This works to the advantage of the non-phonetic

sandhis of L+L, LH+LH, HL+L, and HL+HL as the magnitude of effects

predicted by the original model was smaller than the attested effects, but to the

disadvantage of the two phonetic sandhis LH+H and LH+HL as our experimental

results showed no consistent effect.

However, we do not consider this baseline model to be a theoretically sound

model. This is because earlier work by Zhang et al. (e.g., 2009, 2011) has shown

that the BListed coefficients are crucial to the learning of opaque tone sandhi patterns

in Taiwanese. There is thus no reason to assume that they would not be relevant for

Tianjin. The reason these coefficients are particularly important for opaque sandhis

is that these sandhis cannot be captured by the MARKEDNESS » FAITHFULNESS schema

and must be acquired through lexical and allomorph listings in our model; in order

to capture the lack of full productivity of the opaque sandhis manifested in wug

tests, the learning model must actively suppress the promotion of weights for

USELISTED constraints, especially those regarding syllable and tonal allomorphs. For

transparent sandhis like those in Tianjin, the patterns are the combined result of both

USELISTED and markedness constraints. The suppression of weights for USELISTED,

therefore, has less of a dramatic effect as the markedness constraints will

compensate for the effect by acquiring greater weights. Indeed, in the baseline

simulation where BListed was set to 1, the weights for USELISTED constraints were

greater, but the weights for the markedness constraints were smaller. This trade-off

produced effects of word type and frequency similar to those of the original model.
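One way to see why a smaller σ² suppresses the promotion of a constraint weight is to look at the Gaussian prior term that maximum entropy learners commonly subtract from the log-likelihood (e.g., Hayes and Wilson 2008). The sketch below assumes that standard form with a prior mean of 0; the weight of 2.0 is hypothetical, and the value 0.001 for the BListed = 1 baseline is inferred from Table 4 rather than stated in the text.

```python
def prior_penalty(weight, sigma2, mu=0.0):
    """Gaussian prior term (w - mu)^2 / (2 * sigma^2) charged against a weight."""
    return (weight - mu) ** 2 / (2.0 * sigma2)


w = 2.0  # hypothetical weight for a USELISTED constraint

# sigma^2 for USELISTED(σL/_L) in Table 4 (biased model).
biased = prior_penalty(w, 0.0002573)

# Assumed sigma^2 for the same constraint in the baseline with BListed = 1.
unbiased = prior_penalty(w, 0.001)

ratio = biased / unbiased  # how much more the biased model charges for the same weight
```

Under these assumptions the biased model charges roughly 3.9 times as much for the same USELISTED weight, so the learner has an incentive to shift weight onto the markedness constraints, in line with the trade-off described above.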

Finally, we also tested a baseline model in which both Bphonetics and BListed are set

to 1. Aside from the theoretical issue of not suppressing the weights for USELISTED

constraints just mentioned, this model has the same problem as the first baseline

model in predicting a productivity difference between different word types and

words of different lexical frequencies for the two half-third sandhis as shown in

Fig. 7. The predicted patterns for the sandhis without clear phonetic motivation are

identical to those in Fig. 6a–d.

Overall, we believe that our biased model is the one that is both theoretically sound

and makes good empirical predictions. It succeeds in predicting the simultaneous

underlearning and overlearning of the sandhi patterns: the learner can underlearn the

sandhi patterns slightly despite their full productivity in the lexicon, but it can also

overgeneralize the sandhi with exceptions to wug words; both the underlearning and


overlearning are correlated with the frequency effects in the right direction as well.

Regarding proper learning for one of the half-third sandhis, the biased model only

predicts smaller differences in productivity between real and wug words, not the

identity between the two; the predicted differences would likely be further reduced if we

took type frequency into account in our model as the HL tone that triggers the properly

learned half-third sandhi has the highest syllable-type frequency among all tones.

Fig. 6 The predictions of the baseline learner in which BListed = 1 for all USELISTED constraints for the six sandhi patterns. a L+L → LH+L (T1+T1 → T3+T1), b LH+LH → H+LH (T3+T3 → T2+T3), c HL+L → H+L (T4+T1 → T2+T1), d HL+HL → L+HL (T4+T4 → T1+T4), e LH+H → L+H (T3+T2 → T1+T2), f LH+HL → L+HL (T3+T4 → T1+T4)

Our model, however, still needs improvement in the following areas. First, the

overall magnitudes of the predicted productivity differences are currently too small

compared to our wug test results. Second, just as we were unable to interpret the

frequency patterns in PSEUDO words for the two sandhis with exceptional behaviors,

our model also fails to capture them. Third, although we have commented on the influence of

Beijing and Standard Chinese (SC) on Tianjin, our model has not formally taken

this influence into account and can hence only model underlearning or overlearning

effects due to Tianjin-internal factors. A more comprehensive model should be able

to make predictions on how the SC input helps shape the productivity patterns.

5 General discussion

5.1 The theoretical model

Earlier theoretical analyses of disyllabic tone sandhi in Tianjin (e.g., Wang 2002;

Lin 2008) take the four sandhi patterns in (2) as a given in terms of both their

productivity and their neutralizing nature and account for the patterns via the

interaction between various types of tonal Obligatory Contour Principle (OCP,

Leben 1973) constraints and tonal faithfulness constraints. For example, Lin (2008)

accounts for the sandhi L+L → LH+L as in (9). The tones are represented on two

levels: the tonal level (T), which is directly associated with the syllable, and the

tonemic level (t), which comprises the level components of contour tones dominated by the

tonal level. OCP and faithfulness constraints to tones can be defined on either the T

or the t level, and subscripted L or R indicates the level tone on the left or right of a

contour tone. The conjoined constraint [IDENT(t)R and IDENT(t)L]T militates against

changing both the left and right edges of the contour tone.

Fig. 7 The predictions of the baseline learner in which Bphonetics = BListed = 1 for all constraints for the LH+H and LH+HL sandhis. a LH+H → L+H (T3+T2 → T1+T2), b LH+HL → L+HL (T3+T4 → T1+T4)

We have taken a very different approach in our analysis. The advantage of our

proposal is that it is a more accurate reflection of speakers’ knowledge of Tianjin

disyllabic tone sandhi, which involves exceptions and incomplete neutralization,

and the productivity of the patterns also varies depending on the sandhi. An analysis

along the lines of (9) misses these nuanced yet important generalizations. The price

that we pay, however, is that we now have a proliferation of USELISTED constraints

that interact with the rest of the grammar, and the syllable-level USELISTED

constraints partially duplicate the function of the MARKEDNESS » FAITHFULNESS

ranking. Zhang et al. (2009, 2011) have shown that this duplication is empirically

necessary to capture the lack of full productivity of the opaque tone sandhis in

Taiwanese. What we have seen here is that even for transparent sandhis that can be

captured by the markedness and faithfulness interaction, full productivity is still not

guaranteed, and lexical listing is still necessary.

The coexistence of traditional markedness and faithfulness constraints with ad

hoc USELISTED constraints that require the surface forms to use listed allomorphs

coincides with Moreton’s (2004) argument that the grammar is composed of an

innate and “conservative” component of markedness and faithfulness constraints

and a language-specific component with constraints that require particular lexical

items to have particular surface representations in particular environments. We

share with Moreton (2004) the intuition that such constraints are necessary in the

grammar in any case to deal with processes that target specific lexical items and

morphological categories, that are suppletive, and that have lexical exceptions, but

we have taken the position one step further by positing that speakers will build

lexical constraints in any event. In other words, USELISTED can be considered as a

universal template into which learners plug the specifics of their language.

Finally, the model proposed here has affinities with exemplar-based models of

grammar (e.g., Bybee 2001, 2006; Pierrehumbert 2001, 2002; Gahl and Yu 2006) in

that it allows usage frequency effects on phonological patterning to be captured. But

the frequency effects are derived through the weights of USELISTED constraints,

which interact with other constraints in the grammar, rather than just emerging from

the lexicon. Thus, the frequency effects are predicted to interact with other

grammatical effects in ways constrained by the grammar.


5.2 Size of the acoustic effects

We have seen in the acoustic results that although some of the word-type or

frequency comparisons show significant differences in either pitch mean or pitch

slope, the differences are typically of small magnitudes. Absolute f0 differences

found in the comparisons are generally on the order of a few Hertz before

normalization. An anonymous reviewer questioned whether such small differences

can be the basis for the claim of learning differences and, hence, grammatical

differences. The point that we would like to emphasize, however, is that our main

result is in the different behaviors of different sandhi patterns, and we have

interpreted the different behaviors based on the lexical and phonetic properties of

the sandhi patterns that are known to affect phonological productivity in general.

Therefore, the pitch differences, though small, cannot be easily claimed to have

resulted from the nature of the task and swept under the rug; they need to be

accounted for in other ways. The position we have taken is that the lexical and

phonetic properties of the sandhi directly influence its production grammar. The fact

that the small acoustic differences may not be perceptible does not contradict the

fact that different sandhis are processed differently in production. This is in fact a

familiar scenario: production and perception studies of incomplete neutralization

and near merger often show consistent small acoustic differences as the result of

these, but speakers’ perceptual use of these subtle cues is highly context-dependent

and often unreliable (e.g., Jassem and Richter 1989; Port and Crawford 1989; Peng

2000; Warner et al. 2004; Yu 2007; Herd et al. 2010).

5.3 Aggregate vs. individual differences

As pointed out by an anonymous reviewer, it is important to recognize that our

grammatical model is based on the aggregate acoustic results from multiple speakers

of Tianjin. Therefore, the model is only a representation of the behavior of an

idealized native speaker of Tianjin. As we have commented in Sects. 1.3 and 3, there

were clearly individual differences in how the speakers behaved. This means that

each individual speaker’s grammar will deviate from the model that we have

proposed. However, we opted not to take each individual speaker’s data and construct

a grammar for him/her as idiosyncrasies of the speakers will likely be overrepresented

in these grammars while a grammar based on the aggregate results is more likely to be

representative of the Tianjin language. This is common practice for modeling

analyses of phonological patterns based on experimental results or corpus data (e.g.,

Wilson 2006; Hayes and Londe 2006; Coetzee and Pater 2008; Hayes et al. 2009b;

Becker et al. 2011; Zuraw 2010; Coetzee and Kawahara 2013).

6 Conclusions

The tone sandhi patterns of Tianjin Chinese are variable, gradient, and full of

exceptions. To understand how the speakers of Tianjin tackle phonological patterns

with such complexity, we conducted a wug test to investigate the productivity of the


sandhi patterns. Our results indicate that a Tianjin speaker’s knowledge of tone

sandhi may differ from the sandhi pattern in the lexicon in nuanced ways: sandhis

with exceptions can be generalized and overlearned while a number of fully

productive sandhis in the lexicon are underlearned, both of which illustrate the

effects of frequency and lexical listing on sandhi productivity; the phonetic nature

of a sandhi may encourage learning, bringing underlearning closer to proper

learning. These mismatches are claimed here to be informative as to the nature of

the speakers’ phonological grammars. A model of the grammar, consequently,

needs to be quantitative and flexible enough to capture the variability, gradience,

and exceptions, and the resultant overlearning and underlearning effects.

Acknowledgments We are indebted to Ping Wang, Xiaoyu Zeng, and Feng Shi at Nankai University for hosting us during data collection and discussing various aspects of this project with us. We also thank Geng Wang for serving as our Tianjin language consultant and the speakers of Tianjin who participated in our experiment. We are grateful to the participants at GLOW-Asia 8 and the second Pan-American/Iberian Meeting on Acoustics, especially James Myers, Doug Whalen, and Charles Yang, for their comments on this research. We, however, remain fully responsible for the opinions expressed here. This research was supported by the National Science Foundation grant BCS-0750773 and the University of Kansas General Research Fund 2301166.

References

Albright, Adam. 2002. Islands of reliability for regular morphology: Evidence from Italian. Language 78(4): 684–709.

Albright, Adam, and Bruce Hayes. 2003. Rules vs. analogy in English past tenses: A computational/experimental study. Cognition 90: 119–161.

Albright, Adam, Argelia Andrade, and Bruce Hayes. 2001. Segmental environments of Spanish

diphthongization. In UCLA working papers in linguistics 7, (Papers in phonology 5), ed. Adam Albright, and Taehong Cho, 117–151. Los Angeles: UCLA Department of Linguistics.

Becker, Michael, Nihan Ketrez, and Andrew Nevins. 2011. The surfeit of the stimulus: Analytic biases

filter lexical statistics in Turkish laryngeal alternations. Language 87: 84–125.

Berent, Iris, Donca Steriade, Tracy Lennertz, and Vered Vaknin. 2007. What we know about what we

have never heard: Evidence from perceptual illusions. Cognition 104: 591–630.

Berko, Jean. 1958. The child’s learning of English morphology. Word 14: 150–177.

Boersma, Paul and David Weenink. 2009. Praat: Doing phonetics by computer (computer program). http://www.praat.org/. Accessed 5 Jan 2009.

Bybee, Joan. 2001. Phonology and language use. Cambridge: Cambridge University Press.

Bybee, Joan. 2006. From usage to grammar: The mind’s response to repetition. Language 82: 711–733.

Bybee, Joan, and Elly Pardo. 1981. On lexical and morphological conditioning of alternations: A nonce-probe experiment with Spanish verbs. Linguistics 19: 937–968.

Chao, Yuen Ren. 1968. A grammar of spoken Chinese. Berkeley and Los Angeles: University of California Press.

Chen, Matthew Y. 1987. The syntax of Xiamen tone sandhi. Phonology Yearbook 4: 109–150.

Chen, Matthew Y. 2000. Tone sandhi: Patterns across Chinese dialects. Cambridge: Cambridge

University Press.

Cheng, Robert L. 1968. Tone sandhi in Taiwanese. Linguistics 41: 19–42.

Coetzee, Andries W. 2009. Learning lexical indexation. Phonology 26: 109–145.

Coetzee, Andries W., and Shigeto Kawahara. 2013. Frequency biases in phonological variation. Natural Language and Linguistic Theory 31: 47–89.

Coetzee, Andries W., and Joe Pater. 2008. Weighted constraints and gradient restrictions on place co-occurrence in Muna and Arabic. Natural Language and Linguistic Theory 26: 289–337.


Coetzee, Andries, and Joe Pater. 2011. The place of variation in phonological theory. In The handbook of phonological theory, 2nd ed, ed. John A. Goldsmith, Jason Riggle, and Alan C.L. Yu, 401–434.

Cambridge, MA and Oxford, UK: Blackwell.

Da, Jun. 2004. Chinese text computing. http://lingua.mtsu.edu/chinese-computing. Accessed 1 Sept 2008.

Gahl, Susanne, and Alan Yu. 2006. Special issue on exemplar-based models in linguistics. The Linguistic Review 23(3): 289–318.

Gandour, Jackson T., Siripong Potisuk, and Sumalee Dechongkit. 1994. Tonal coarticulation in Thai. Journal of Phonetics 22: 474–492.

Gao, Jing. 2004. The changing sandhi rules in Tianjin dialect. In Phonetic and phonological studies on Tianjin dialect, ed. Lu Jilun, 193–247. Beijing: Beijing Institute of Technology Press.

Goldwater, Sharon, and Mark Johnson. 2003. Learning OT constraint ranking using a maximum entropy

model. In Proceedings of the Stockholm workshop on variation within optimality theory, ed. Jennifer Spenader, Anders Eriksson, and Osten Dahl, 111–120. Stockholm: Stockholm University.

Hayes, Bruce, and Zsuzsa C. Londe. 2006. Stochastic phonological knowledge: The case of Hungarian

vowel harmony. Phonology 23: 59–104.

Hayes, Bruce, and Colin Wilson. 2008. A maximum entropy model of phonotactics and phonotactic

learning. Linguistic Inquiry 39: 379–440.

Hayes, Bruce, Colin Wilson, and Benjamin George. 2009a. Maxent grammar tool. Java program.

http://www.linguistics.ucla.edu/people/hayes/MaxentGrammarTool/. Accessed 22 May 2009.

Hayes, Bruce, Kie Zuraw, Peter Siptar, and Zsuzsa Londe. 2009b. Natural and unnatural constraints in

Hungarian vowel harmony. Language 85: 822–863.

Herd, Wendy, Allard Jongman, and Joan Sereno. 2010. An acoustic and perceptual analysis of /t/ and /d/ flaps in American English. Journal of Phonetics 38: 504–516.

Hsieh, Hsin-I. 1970. The psychological reality of tone sandhi rules in Taiwanese. In Papers from the 6th meeting of the Chicago Linguistic Society, ed. M.A. Campbell, 489–503. Chicago: Chicago Linguistic Society.

Hsieh, Hsin-I. 1975. How generative is phonology. In The transformational-generative paradigm and modern linguistic theory, ed. E.F. Koerner, 109–144. Amsterdam: John Benjamins.

Hsieh, Hsin-I. 1976. On the unreality of some phonological rules. Lingua 38: 1–19.

Hyman, Larry. 2007. Universals of tone rules: 30 years later. In Tones and tunes vol 1: Typological studies in word and sentence prosody, ed. Tomas Riad, and Carlos Gussenhoven, 1–34. Berlin: Mouton de

Gruyter.

Jager, Gerhard. 2007. Maximum Entropy models and stochastic optimality theory. In Architectures, rules and preferences: Variation on themes by Joan W. Bresnan, ed. Annie Zaenen, Jane Simpson, Tracy

H. King, Jane Grimshaw, Joan Maling, and Chris Manning, 467–479. Stanford: CSLI Publications.

Jassem, Wiktor, and Lutoslawa Richter. 1989. Neutralization of voicing in Polish obstruents. Journal of Phonetics 17: 317–325.

Jiang, Hui. 1994. The phonetic description of neutral tone in Tianjin dialect. MA thesis, Tianjin Normal

University, Tianjin.

Kenstowicz, Michael, and Charles Kisseberth. 1979. Generative phonology: Description and theory. San Diego: Academic.

Kiparsky, Paul. 1973. Abstractness, opacity, and global rules. In Three dimensions of linguistic theory, ed. Osamu Fujimura, 57–86. Tokyo: TEC Company Ltd.

Kirchner, Robert. 1996. Synchronic chain shifts in optimality theory. Linguistic Inquiry 27: 341–350.

Leben, William. 1973. Suprasegmental phonology. PhD dissertation, MIT.

Li, Xing-Jian, and Si-Xun Liu. 1985. Tianjin fangyan de liandu biandiao [Tone sandhi in the Tianjin dialect]. Zhongguo Yuwen [Studies of the Chinese Language] 1985(1): 76–80.

Liang, Yuzhang, and Aizhen Feng. 1996. Fuzhouhua yindang [The sound system of Fuzhou dialect]. Shanghai: Shanghai Education.

Lin, Huishan. 2008. Variable directional applications in Tianjin tone sandhi. Journal of East Asian Linguistics 17: 181–226.

Liu, Yu-Zhen, and Jiang Gao. 2003. Qu-Qu liandu biandiao guize: shehui yuyanxue bianxiang [FF sandhi

rule in Tianjin dialect: A sociolinguistic variable]. Tianjin Shifan Daxue Xuebao—Shehui Kexue Ban [Journal of Tianjin Normal University—Social Sciences] 2003(5): 65–69.

Lu, Ji-Lun. 1997. Tianjin fangyan zhong de yizhong xin de liandu biandiao [A new tone sandhi rule in

Tianjin dialect]. Tianjin Shida Xuebao [Journal of Tianjin Normal University] 1997(4): 67–72.


Lu, Ji-Lun. 2004. A new phenomenon in Tianjin tone sandhi. In Phonetic and phonological studies on Tianjin dialect: Festschrift for Professor Wang Jialing’s 70th birthday, ed. Lu Ji-Lun, 89–137.

Beijing: Beijing Institute of Technology Press.

Ma, Qiuwu, and Yuan Jia. 2006. Tianjinhua shangsheng de liangtiao “biandiao guize” bianxi [Two new

third tone sandhi rules in Tianjin dialect—a critical reanalysis]. Tianjin Shifan Daxue Xuebao—Shehui Kexue Ban [Journal of Tianjin Normal University—Social Science] 2006(1): 53–58.

Maddieson, Ian. 1978. Universals of tone. In Universals of human language, vol. 2: Phonology, ed. Joseph H. Greenberg, 335–366. Stanford: Stanford University Press.

Moreton, Elliott. 2004. Non-computable functions in optimality theory. In Optimality theory in phonology, ed. John McCarthy, 141–164. Malden: Blackwell.

Moreton, Elliott. 2008. Analytical bias and phonological typology. Phonology 25: 83–127.

Peng, Shu-Hui. 1997. Production and perception of Taiwanese tones in different tonal and prosodic contexts. Journal of Phonetics 25: 371–400.

Peng, Shu-Hui. 2000. Lexical versus ‘phonological’ representations of Mandarin sandhi tones. In Language acquisition and the lexicon: Papers in laboratory phonology 5, ed. Michael B. Broe, and Janet B. Pierrehumbert, 152–167. Cambridge: Cambridge University Press.

Pierrehumbert, Janet B. 2001. Exemplar dynamics: Word frequency, lenition and contrast. In Frequency and the emergence of linguistic structure, ed. Joan Bybee, and Paul Hopper, 137–157. Amsterdam:

John Benjamins.

Pierrehumbert, Janet B. 2002. Word-specific phonetics. In Laboratory phonology 7, ed. Carlos

Gussenhoven, and Natasha Warner, 101–139. Berlin: Mouton de Gruyter.

Pierrehumbert, Janet B. 2006. The statistical basis of an unnatural alternation. In Laboratory phonology 8. Varieties of phonological competence, ed. Louis Goldstein, Douglas H. Whalen, and Catherine Best,

81–107. Berlin: Mouton de Gruyter.

Port, Robert, and Penny Crawford. 1989. Incomplete neutralization and pragmatics in German. Journal of Phonetics 17: 257–282.

Prince, Alan, and Paul Smolensky. 1993. Optimality theory: Constraint interactions in generative grammar. New Brunswick: Rutgers Center for Cognitive Science, Rutgers University. (re-printed in 2004 by MIT Press, Cambridge, MA).

Rietveld, Toni, and Aoju Chen. 2006. How to obtain and process perceptual judgements of intonational

meaning. In Methods in empirical prosody research, ed. Stefan Sudhoff, Denisa Lenortova, Roland

Meyer, Sandra Pappert, Petra Augurzky, Ina Mleinek, Nicole Richter, and Johannes Schließer, 283–319. Berlin: Walter de Gruyter.

Rose, Phil. 1987. Considerations in the normalization of the fundamental frequency in linguistic tone.

Speech Communication 6: 343–351.

Shi, Feng. 1986. Tianjin fangyan shuangzizu shengdiao fenxi [An analysis of disyllabic tones in Tianjin dialect]. Yuyan Yanjiu [Linguistic Research] 1986(1): 77–90.

Shi, Feng. 1988. Shilun Tianjinhua de shengdiao jiqi bianhua—xiandai yuyinxue biji [On tones and their recent changes in Tianjin dialect—modern phonetics notes]. Zhongguo Yuwen [Studies of the Chinese Language] 1988(5): 351–360.

Shi, Feng. 1990. Hanyu he Dong-Tai yu de shengdiao geju [Tone systems in Chinese and Kam-Tai

languages]. PhD dissertation, Nankai University, Tianjin.

Shi, Feng, and Ping Wang. 2004. Tianjinhua shengdiao de xin bianhua [New changes in Tianjin tones]. In

The joy of research: A festschrift in honor of Professor William S.-Y. Wang on his seventieth birthday, ed. Feng Shi, and Zhongwei Shen, 176–188. Tianjin: Nankai University Press.

Wang, Samuel H. 1993. Taiyu biandiao de xinli texing [On the psychological status of Taiwanese tone sandhi]. Tsinghua Xuebao [Tsinghua Journal of Chinese Studies] 23: 175–192.

Wang, Jia-Ling. 2002. Youxuanlun he Tianjinhua de liandu biandiao ji qingsheng [Optimality Theory and tone sandhi and neutral tone in Tianjin dialect]. Zhongguo Yuwen [Studies of the Chinese Language] 2002(4): 363–371.

Warner, Natasha, Allard Jongman, Joan Sereno, and Rachel Kemper. 2004. Incomplete neutralization of

sub-phonemic durational differences in production and perception of Dutch. Journal of Phonetics 32: 251–276.

Wee, Lian-Hee. 2004. Inter-tier correspondence theory. PhD dissertation, Rutgers University, New

Brunswick, NJ.

Wilson, Colin. 2006. Learning phonology with substantive bias: An experimental and computational

study of velar palatalization. Cognitive Science 30(5): 945–982.

Xu, Yi. 1997. Contextual tonal variations in Mandarin. Journal of Phonetics 25: 61–83.


Xu, Yi. 2005. TimeNormalizedF0. Praat script. http://www.phon.ucl.ac.uk/home/yi/tools.html. Accessed 1

Dec 2005.

Yang, Zi-Xiang, He-Tong Guo, and Xiang-Dong Shi. 1999. Tianjinhua Yindang [The sound system of Tianjin dialect]. Shanghai: Shanghai Education Press.

Yu, Alan C.L. 2007. Understanding near mergers: The case of morphological tone in Cantonese.

Phonology 24: 187–214.

Yue-Hashimoto, Anne O. 1987. Tone sandhi across Chinese dialects. In Wang Li memorial volumes, English volume, ed. Chinese Language Society of Hong Kong, 445–474. Hong Kong: Joint

Publishing Co.

Zhang, Jie. 2002. The effects of duration and sonority on contour tone distribution: A typological survey and formal analysis. New York: Routledge.

Zhang, Jie. 2007. A directional asymmetry in Chinese tone sandhi systems. Journal of East Asian Linguistics 16: 259–302.

Zhang, Jie. 2010. Issues in the analysis of Chinese tone. Language and Linguistics Compass 4(12): 1137–1153.

Zhang, Jie. 2014a. Tones, tonal phonology, and tone sandhi. In The handbook of Chinese linguistics, ed. C.-T. James Huang, Y.-H. Audrey Li, and Andrew Simpson, 443–464. Oxford: Wiley-Blackwell.

Zhang, Jie. 2014b. Tone sandhi. In Oxford bibliographies in linguistics, ed. Mark Aronoff. New York:

Oxford University Press. http://www.oxfordbibliographies.com/view/document/obo-9780199772810/obo-9780199772810-0160.xml. Accessed 15 July 2014.

Zhang, Jie, and Yuwen Lai. 2008. Phonological knowledge beyond the lexicon in Taiwanese double

reduplication. In Interfaces in Chinese phonology: Festschrift in honor of Matthew Y. Chen on his 70th birthday, ed. Yuchau E. Hsiao, Hui-Chuan Hsu, Lian-Hee Wee, and Dah-An Ho, 183–222.

Taipei: Academia Sinica.

Zhang, Jie, and Yuwen Lai. 2010. Testing the role of phonetic knowledge in Mandarin tone sandhi.

Phonology 27(1): 153–201.

Zhang, Jie, and Jiang Liu. 2011. Tone sandhi and tonal coarticulation in Tianjin Chinese. Phonetica 68

(3): 161–191.

Zhang, Jie, and Yuanliang Meng. 2012. Structure-dependent tone sandhi in real and nonce words in

Shanghai Wu. In Proceedings of the 3rd international symposium on tonal aspects of languages, ed. Gu Wentao. Nanjing: Nanjing Normal University.

Zhang, Jie, Yuwen Lai, and Craig Sailor. 2009. Opacity, phonetics, and frequency in Taiwanese tone

sandhi. In Current issues in unity and diversity of languages: Collection of papers selected from the 18th International Congress of Linguists, ed. Manghyu Pak, 3019–3038. Seoul: Linguistic Society of

Korea.

Zhang, Jie, Yuwen Lai, and Craig Sailor. 2011. Modeling Taiwanese speakers’ knowledge of tone sandhi

in reduplication. Lingua 121(2): 181–206.

Zhao, Yuan, and Dan Jurafsky. 2009. The effect of lexical frequency and Lombard reflex on tone hyperarticulation. Journal of Phonetics 37: 231–247.

Zhu, Xiaonong. 2004. Jipin guiyihua — ruhe chuli shengdiao de suiji chayi? [F0 normalization — How to deal with between-speaker tonal variations?]. Yuyan Kexue [Linguistic Sciences] 3(2): 3–19.

Zuraw, Kie. 2000. Patterned exceptions in phonology. PhD dissertation, University of California, Los Angeles.

Zuraw, Kie. 2007. The role of phonetic knowledge in phonological patterning: Corpus and survey

evidence from Tagalog infixation. Language 83: 277–316.

Zuraw, Kie. 2010. A model of lexical variation and the grammar with application to Tagalog nasal

substitution. Natural Language and Linguistic Theory 28: 417–472.
