Lingua Posnaniensis 2014 LVI (1)
Duration and speed of speech events: a selection of methods
Dafydd Gibbon1, Katarzyna Klessa2 & Jolanta Bachan2
1 Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, [email protected]; 2 Institute of Linguistics, Adam Mickiewicz University in Poznań, [email protected]
Abstract: Dafydd Gibbon, Katarzyna Klessa & Jolanta Bachan. Duration and speed of speech events: A selection of methods. The Poznań Society for the Advancement of the Arts and Sciences. PL ISSN 0079-4740, ISBN 978-83-7654-384-0, pp. 59–83
The study of speech timing, i.e. the duration and speed or tempo of speech events, has increased in importance over the past twenty years, in particular in connection with increased demands for accuracy, intelligibility and naturalness in speech technology, with applications in language teaching and testing, and with the study of speech timing patterns in language typology. However, the methods used in such studies are very diverse, and so far there is no accessible overview of these methods. Since the field is too broad for us to provide an exhaustive account, we have made two choices: first, to provide a framework of paradigmatic (classificatory), syntagmatic (compositional) and functional (discourse-oriented) dimensions for duration analysis; and second, to provide worked examples of a selection of methods associated primarily with these three dimensions. Some of the methods covered are established state-of-the-art approaches (e.g. paradigmatic Classification and Regression Tree (CART) analysis); others are discussed in a critical light (e.g. so-called 'rhythm metrics'). A set of syntagmatic approaches applies to the tokenisation and tree parsing of duration hierarchies, based on speech annotations, and a functional approach describes duration distributions with sociolinguistic variables. Several of the methods are supported by a new web-based software tool for analysing annotated speech data, the Time Group Analyser.
Keywords: speech timing, Polish, English, speech technology
1. Objectives and topic overview
The present contribution concentrates on a selection of methods for analysing speech timing in English and Polish. The unifying principle is methodological rather than extensive data analysis or historical review: speech timing is examined from three points of view, namely paradigmatic or classificatory, syntagmatic or structure-building, and functional in discourse contexts.
DOI: 10.2478/linpo-2014-0004
UnauthenticatedDownload Date | 12/13/15 8:16 AM
60 LP LVI (1) Dafydd Gibbon, Katarzyna Klessa & Jolanta Bachan
Preferred methods have varied considerably over time, partly depending on the available statistical, formal and technological techniques. For example, in 1950s linguistic phonetics, Jassem & Abercrombie analysed structural relations between phonemes, syllables, feet and rhythm. By contrast, in 1960s quantitative phonetics, Lehiste, Jassem and others concentrated on isochrony (equal unit timing) in relation to words, syllables and phonemes, while in the 1970s psycholinguistics introduced perceptual experiments on timing and sentence complexity. In the 1980s and 1990s, work in speech technology by Campbell and in computational phonology by Bird led to statistical and logical models of speech timing. Subsequently, and continuing to the present day, rhythm modelling in comparative phonetics, formal oscillator models, the analysis of large corpora with CART (Classification and Regression Trees) methods, and quantitative applications to L2 learning have emerged. What has become clear at this global level of discussion is the very large number of degrees of freedom manifested in speech timing, including properties of different phone types, phone positions in syllables, syllable positions in words, and word positions in relation to boundary types in parallel syntactic and intonational phrase structures (cf. Campbell 1992), as well as pause distribution and functionality (cf. Dechert & Raupach 1980). In this brief review, only a few immediately relevant trends are selected.
A major influence in the investigation of speech timing models has been the need for predictive models of segment duration in speech technology, particularly speech synthesis. The earliest models were rule-based, and used a combination of linguistic and phonetic analysis to create sets of segment duration rules for English. In an early model (Klatt 1976) each segment is attributed an inherent duration, and is shortened or lengthened by a context-dependent percentage value, subject to a specified minimum duration. The contexts included pre-pausal final lengthening, non-final shortening, non-initial vowel shortening, non-stressed sound shortening, and vowel lengthening before voiced consonants. The model was successfully applied in speech systems such as Klattalk and DECtalk (Klatt 1987). Rule-based duration models were also created for many other languages, e.g. for French (O'Shaughnessy 1984), German (Portele et al. 1990) and Hungarian (Olaszy 2002). In the development of rule-based models, linguistic knowledge, experience and intuition dominate over extensive quantitative analysis of actual corpora, and both the rules and their parameters are defined with a sequential, (semi-)manual trial-and-error approach. Corpus-based models focus more on variation and constancy in large collections of data, though they necessarily also involve linguistic information.
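The rule-based principle described above can be illustrated with the formula usually cited for Klatt's duration rules, DUR = (INHDUR − MINDUR) × PRCNT / 100 + MINDUR. The sketch below is a minimal Python rendering under stated assumptions: the multiplicative combination of the percentages of all applicable rules follows the common description of Klatt's system, and the segment values and rule percentages are hypothetical, not taken from Klatt (1976).

```python
from typing import Sequence


def klatt_duration(inherent_ms: float, minimum_ms: float,
                   percentages: Sequence[float]) -> float:
    """Klatt-style rule: DUR = (INHDUR - MINDUR) * PRCNT / 100 + MINDUR.

    `percentages` lists the adjustment (in %) of every context rule that
    applies; here (an assumption) they are combined multiplicatively.
    """
    prcnt = 100.0
    for p in percentages:
        prcnt *= p / 100.0          # e.g. 140 % then 70 % gives 98 %
    # Only the part of the inherent duration above the minimum is scaled,
    # so a non-negative percentage can never push the segment below MINDUR.
    return (inherent_ms - minimum_ms) * prcnt / 100.0 + minimum_ms


# Hypothetical vowel: inherent 160 ms, minimum 80 ms; two rules fire:
# pre-pausal lengthening (140 %) and non-stressed shortening (70 %).
print(round(klatt_duration(160, 80, [140, 70]), 1))  # 158.4
```

Scaling only the compressible part above the minimum is what keeps heavily shortened segments from collapsing to implausibly short values.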
Studies of English timing have been well documented (cf. contributions to Gibbon et al. 2012). For Polish, initial significant results on speech timing were achieved several decades ago (e.g. Richter 1973; 1974; 1987; Jassem et al. 1981), with investigation of relations between logatoms and linguistic features, the influence on segment duration of position in accent units, and distinctions between duration classes, many of the studies focusing on isochrony and its limits. The methods used included rhythm structure modelling with logatoms and linguistic features, a power function relating segmental duration and the number of syllables within an accent unit, and regression models for isochrony in the NRU (Narrow Rhythm Unit) vs. number of syllables in rhythm units (cf. Jassem et al. 1981), finding that the greatest tendency to isochrony was present in the Narrow Rhythm Unit.
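A power function of the kind mentioned above can be fitted by ordinary linear regression on logarithms. This is a minimal sketch assuming the generic form d = a·n^b, with d the mean syllable duration and n the number of syllables in the accent unit; the exact formulation used by Jassem et al. (1981) is not reproduced here, and the data are synthetic.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(n_syll: Sequence[int],
                  mean_dur: Sequence[float]) -> Tuple[float, float]:
    """Fit mean_dur = a * n_syll**b by least squares in log-log space."""
    xs = [math.log(n) for n in n_syll]
    ys = [math.log(d) for d in mean_dur]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    # Slope of the log-log regression line is the exponent b.
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - b * mx)   # intercept transformed back to ms
    return a, b


# Synthetic (hypothetical) data generated with a = 200 ms, b = -0.4.
ns = [1, 2, 3, 4, 5]
ds = [200 * n ** -0.4 for n in ns]
a, b = fit_power_law(ns, ds)
print(f"a = {a:.1f} ms, b = {b:.2f}")   # a = 200.0 ms, b = -0.40
```

With this parameterisation the whole unit lasts n·d = a·n^(1+b), so b = 0 means that syllables keep their duration regardless of n, while b = −1 means that the unit has constant duration (strict isochrony); values in between indicate partial compression.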
Duration and speed of speech events: A selection of methods LP LVI (1) 61
Variable speech rate, and its effect on phone durations, vowel formant patterns and syllable structure, poses another challenge for speech timing models (Łobacz 1976a, b; Zee 2002; Cummins 1999; Crystal 1969). Findings include a dependency on speaker type, sentence type and position in the phrase, and an asymmetry in the distance between tempi: the step from slow to normal is greater than that from normal to fast. The optimal fast rate was calculated to be almost double the optimal slow rate.
In the present contribution, a small selection of current methods for the investigation of speech timing is brought into focus, with particular emphasis on syllable duration patterning. We take the pragmatic position that theory and its empirical grounding are heavily influenced by available methods, techniques, procedures and tools, and consequently we do not concentrate on linguistic or cognitive theories of rhythm. The methods we select focus mainly on the computational treatment of large corpora.
The Polish and English data are used for the following kinds of analysis:
1. analysis of 'authentic data', i.e. speech which is not elicited for the specific purpose of analysis;
2. analysis of a well-defined data set via perceptual judgments by selected subjects;
3. linguistic corpus analysis and functional interpretation of temporal properties of speech in relation to features of discourse, specifically focusing on gender differences.
2. A paradigmatic perspective on contextual factors
2.1. CART analysis
More recently, technological advances have resulted in the use of techniques based on universal statistical tools such as CART (Classification and Regression Trees, first introduced by Breiman et al. 1984), cluster analysis (e.g. Everitt et al. 2011) and neural networks (used for duration analysis e.g. by Vainio 2001), with data obtained from large (and very large) corpora of continuous speech. However, although corpus-based models often yield, for instance, more natural synthesised speech, and are thus strongly preferred in many practical applications, rule-based models are still in use. They can be developed with the support of available statistical techniques but without costly speech corpora, and have thus found applications in situations where high speed combined with intelligibility and relative correctness matters more than naturalness (e.g. Moos & Trouvain 2007; Moers et al. 2010). In fact, nowadays the two approaches often overlap, and careful linguistic feature extraction is usually an important stage preceding the actual statistical processing. Linguistic knowledge may be used not only at the data preparation stage, but also in the modelling process itself (van Santen 1993; Möbius & van Santen 1996).
Studies vary in the choice of the unit used as the base for segmental duration modelling. Frequently, the phone is used as the unit, though Campbell's model (1992) analyses phone duration as dependent on syllable properties. The huge number of combinatory possibilities for units in natural speech generates a large space of coarticulation and other inter-unit effects (van Santen 1993): unnatural distortion results at concatenation points which do not capture these effects, even if a TTS (text-to-speech) system otherwise works well.
A related challenge for acoustic inventory design and acoustic modelling is the high rate of occurrence of rare events, the so-called LNRE problem (Large Number of Rare Events; Möbius 2001). A compromise between database size and sufficient coverage of unit combinations can be reached by optimising the contents of the database, e.g. using greedy set covering algorithms, i.e. heuristic approximation based on locally optimal choices (Buchsbaum & van Santen 1997), and by manipulating the size of the units used for unit selection. Non-uniform unit selection has been reported to yield good-quality synthesised speech for many languages (e.g. King et al. 1997): selecting longer concatenation units is expected to result in fewer glitches at concatenation points and a more natural sound. However, for highly inflecting languages (e.g. Polish, Turkish, Arabic) it is especially challenging to use larger concatenation units, because a very large number of inflected forms would have to be covered by these units.
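Greedy set covering for database design can be sketched as follows: at each step the candidate sentence that adds the most still-uncovered units (here diphones) is selected. The sentence and unit names are hypothetical, and this is the textbook greedy heuristic rather than the specific algorithm of Buchsbaum & van Santen (1997).

```python
from typing import Dict, List, Set


def greedy_cover(candidates: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    """Repeatedly pick the candidate that covers most still-uncovered units."""
    uncovered = set(required)
    chosen: List[str] = []
    while uncovered:
        # Locally optimal choice: largest overlap with what is still missing.
        best = max(candidates, key=lambda s: len(candidates[s] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break                      # no candidate covers what is left
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical sentences and the diphones they contain.
candidates = {
    "s1": {"a-b", "b-c", "c-d"},
    "s2": {"a-b", "d-e"},
    "s3": {"d-e", "e-f"},
    "s4": {"c-d"},
}
required = set().union(*candidates.values())
print(greedy_cover(candidates, required))   # ['s1', 's3']
```

Two sentences suffice here instead of four; the greedy choice is not guaranteed to be minimal in general, which is why it is described as a heuristic approximation.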
Setting unit selection preferences by means of cost functions and penalties influenced by constraints from structures at different levels is another strategy for improving duration models. In a duration model developed for the Polish BOSS synthesiser (Szymański et al. 2011), the best perception test results for the quality of synthesised speech were achieved when the system's unit selection algorithm was set up to use phone-level units as the basis, with a duration model containing features from both segmental and suprasegmental levels of utterance structure (Klessa et al. 2007). Thus, although the unit selection algorithm is phone-based only, information from different levels of utterance structure is provided.
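Cost-based unit selection is commonly formulated as a search for the candidate sequence that minimises the sum of target costs and join costs. The following is a generic Viterbi sketch, not the BOSS algorithm: the target cost is simply the weighted absolute difference between a predicted duration and a stored unit's duration, and the join-cost function and all unit ids are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Unit = Tuple[str, float]  # (unit id, stored duration in ms)


def select_units(targets: Sequence[float],
                 candidates: Sequence[Sequence[Unit]],
                 join_cost: Callable[[str, str], float],
                 w_target: float = 1.0) -> Tuple[List[str], float]:
    """Viterbi search for the candidate sequence with minimal total cost."""
    # best[uid] = (accumulated cost, path) for paths ending in unit uid
    best: Dict[str, Tuple[float, List[str]]] = {
        uid: (w_target * abs(targets[0] - dur), [uid])
        for uid, dur in candidates[0]}
    for i in range(1, len(targets)):
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        for uid, dur in candidates[i]:
            # Cheapest predecessor, counting the cost of joining it to uid.
            prev = min(best, key=lambda p: best[p][0] + join_cost(p, uid))
            cost = (best[prev][0] + join_cost(prev, uid)
                    + w_target * abs(targets[i] - dur))
            new_best[uid] = (cost, best[prev][1] + [uid])
        best = new_best
    cost, path = min(best.values(), key=lambda cp: cp[0])
    return path, cost


def join_cost(prev: str, nxt: str) -> float:
    # Hypothetical: ids ending in the same digit come from one contiguous
    # recording, so joining them is free; any other join costs 10.
    return 0.0 if prev[-1] == nxt[-1] else 10.0


targets = [80.0, 120.0]   # predicted durations (ms) from a duration model
candidates = [[("a1", 70.0), ("a2", 85.0)], [("b1", 125.0), ("b2", 100.0)]]
print(select_units(targets, candidates, join_cost))   # (['a1', 'b1'], 15.0)
```

A greedy, position-by-position choice would take a2 first (target cost 5) and end with a total cost of 20; searching over whole paths finds the cheaper sequence a1 followed by b1.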
The CART statistical method of analysis is based on two kinds of tree techniques for solving the tasks of (1) classifying objects (for categorical variables) and (2) predicting the actual values of a feature (for continuous variables). In segmental duration modelling, both types of task are highly useful, because duration models need to be based on various types of (often interacting and interdependent) variables. The target task of creating a duration model (and predicting durations) can be solved using, for example, nominal categorical variables (such as the type of vowel, or place or manner of articulation), numerical variables (the length of a syllable, word or foot containing the sound in question, expressed in time units or as a number of component sub-units), and also ordinal categorical variables (the position of a sound within a higher structure, e.g. a syllable or a word). Generally, the aim is to define a set of logical if-then split conditions that allow prediction or classification of cases. Example conditions for duration prediction might include: Is this the sound /a/? If yes, is the sound's position within the syllable structure 'onset'? If not, is the sound's manner of articulation 'fricative'? And so on.
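The core step of CART regression, choosing the yes/no question that best separates the durations, can be sketched in a few lines. This assumes categorical features only and the usual least-squares criterion (total squared error within the two resulting subsets); real tools such as wagon add numeric thresholds, stopping criteria and pruning, which are omitted. The phone tokens and durations are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return float(sum((v - m) ** 2 for v in values))


def best_split(rows: List[Dict], target: str,
               features: Sequence[str]) -> Optional[Tuple[float, str, str]]:
    """Return (score, feature, value) for the question 'feature == value?'
    whose yes/no partition has the lowest total squared error."""
    best = None
    for f in features:
        for value in {r[f] for r in rows}:
            yes = [r[target] for r in rows if r[f] == value]
            no = [r[target] for r in rows if r[f] != value]
            if not yes or not no:
                continue                      # question does not split the data
            score = sse(yes) + sse(no)
            if best is None or score < best[0]:
                best = (score, f, value)      # keep the first best question
    return best


rows = [  # hypothetical phone tokens with durations in ms
    {"phone": "a", "position": "nucleus", "manner": "vowel", "dur": 110},
    {"phone": "a", "position": "nucleus", "manner": "vowel", "dur": 120},
    {"phone": "s", "position": "onset", "manner": "fricative", "dur": 90},
    {"phone": "s", "position": "coda", "manner": "fricative", "dur": 100},
    {"phone": "t", "position": "onset", "manner": "plosive", "dur": 60},
    {"phone": "t", "position": "coda", "manner": "plosive", "dur": 70},
]
print(best_split(rows, "dur", ["phone", "position", "manner"]))  # (550.0, 'phone', 't')
```

Applying the same search recursively to the yes and no subsets yields the tree. In this toy data the questions phone == t and manner == plosive split the tokens identically; the sketch keeps the first one found.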
CART-based models owe much of their popularity to the ease with which they can be constructed automatically (e.g. King et al. 2003). However, although the tree-building procedures are automated, the input for CART still depends on corpus data, so it is crucial to provide high-quality annotations and to define features whose values can be derived from the data. The influence of the features is usually analysed in several stages during model development: separately for individual features or for small subsets of a larger feature set (using various statistical methods such as analysis of variance or correlations between factors), and with the whole set of features. The Wagon CART building programme (King et al. 2003), for example, offers an automated stepwise option that incrementally finds the features that contribute most to the predicted variable within a specific feature set. The feature set is treated as a whole, and the correlation of particular features is expressed as a cumulative correlation, i.e. the features are ranked so that the most contributive feature comes first, and the cumulative correlation increases with each subsequent feature by an amount reflecting its percentage contribution to the overall correlation of the feature set. This makes it possible to observe how the inclusion of particular features affects the overall result of the feature set. For instance, in the Polish BOSS duration model obtained with such a CART prediction procedure, the context information for phone duration is provided for the phone in question and for three adjoining left and right context sounds. The features in the final set relate to the current phone identity, its manner/place of articulation, presence of voice, and sound position with respect to higher-level units. The correlation obtained with the final 57-element set of features was 0.8 (with RMSE at 15.4, and error at 11.3451).
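The three figures reported for the BOSS model can be computed from predicted and observed durations as below. This sketch assumes that 'error' denotes the mean absolute error, which is our reading and not stated in the source; the durations are invented for illustration and are not data from the BOSS model.

```python
import math
from typing import Sequence, Tuple


def evaluate_durations(predicted: Sequence[float],
                       observed: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Pearson r, RMSE, mean absolute error) for duration predictions."""
    n = len(observed)
    mp, mo = sum(predicted) / n, sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    r = cov / (sp * so)                                  # Pearson correlation
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    mae = sum(abs(p - o) for p, o in zip(predicted, observed)) / n  # assumed "error"
    return r, rmse, mae


# Hypothetical phone durations in ms.
pred = [62.0, 95.0, 130.0, 78.0, 110.0]
obs = [70.0, 90.0, 121.0, 85.0, 118.0]
r, rmse, mae = evaluate_durations(pred, obs)
print(f"r = {r:.2f}, RMSE = {rmse:.1f} ms, MAE = {mae:.1f} ms")
```

Correlation measures how well the model follows the relative pattern of durations, while RMSE and mean absolute error express the typical size of the deviations in milliseconds; RMSE weights large errors more heavily, so it is always at least as large as the mean absolute error.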
2.2. Measuring and perceiving speech rate
When speaking of speech rate or tempo, the fundamental questions are what exactly is meant by these terms and how the accepted acoustic or articulatory measures relate to human perception of speech rate. To address these questions, it is helpful to introduce several concepts and definitions. First, there is the distinction between objective speech tempo (actually realised and measurable/quantifiable) and subjective speech tempo (depending on individual judgment, and referring to either intended or perceived tempo). Then there are the notions corresponding to the time span under consideration, according to which speech rate can be seen as global/long-term (related to the whole uttered text, a sentence, or the individual characteristics of a person's speaking style) or local/short-term (local variations of tempo within the uttered text). A related issue is the multi-directional relationship between global and local rates, both in acoustic measurements and in perception-based rate judgments (cf. Wagner & Windmann 2011). Another dichotomy comes from the distinction between gross (including pauses) and net (excluding pauses) speech tempo. Respecting the gross vs. net distinction may be especially important when dealing with longer-term speech rate, whether in terms of acoustics or of perceptual assessments of speaking rate as a characteristic of a longer stretch of speech or of a person's speaking style.
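The gross/net distinction can be made concrete on a single annotation tier. The sketch assumes a contiguous, time-ordered tier of (label, start, end) intervals in seconds, with pauses marked by a hypothetical label '#'; gross rate divides the number of units by the total duration, net rate by the summed duration of the non-pause units.

```python
from typing import List, Tuple

Interval = Tuple[str, float, float]  # (label, start in s, end in s)


def gross_and_net_rate(tier: List[Interval],
                       pause_label: str = "#") -> Tuple[float, float]:
    """Units per second including pauses (gross) and excluding them (net).

    Assumes a contiguous, time-ordered tier; '#' is a hypothetical pause label.
    """
    units = [iv for iv in tier if iv[0] != pause_label]
    total_time = tier[-1][2] - tier[0][1]                        # pauses included
    speaking_time = sum(end - start for _, start, end in units)  # pauses excluded
    return len(units) / total_time, len(units) / speaking_time


# Hypothetical syllable tier with one 0.5 s pause.
syllables = [("ta", 0.0, 0.2), ("ka", 0.2, 0.4), ("#", 0.4, 0.9),
             ("ma", 0.9, 1.1), ("na", 1.1, 1.3)]
gross, net = gross_and_net_rate(syllables)
print(f"gross {gross:.2f} syll/s, net {net:.2f} syll/s")  # gross 3.08 syll/s, net 5.00 syll/s
```

The same four syllables yield quite different rates depending on whether the pause is counted, which is why the two measures should not be mixed when comparing speakers or styles.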
Speech rate can be understood, and thus measured, in various ways depending on the accepted definitions and the intended application of the measurement results. The same applies to the choice of the base unit and the interval for calculations (syllables, speech sounds, morphemes, words or even sentences per unit of time in milliseconds, seconds or minutes). Łobacz (1976b) points out issues related to the ease of discerning the limits of the units and the desired unambiguity of their borders, and, on the other hand, the questions of reductions, omissions or transpositions of segments in different realisations of the same text. Word-based rate measures are in some cases preferred (e.g. Syrdal et al. 2012) because words are easy to distinguish in transcripts. However, this apparent ease is not always borne out in reality, especially in comparative studies. When rate measures based on words and on phones were compared in the context of automatic speech recognition, the results achieved with phone-based rate measures were significantly better than those achieved with rates calculated using words as the basic units (Siegler & Stern 1995), due especially to differences in word lengths or structures. An example of a problematic word element in Polish is the non-syllabic prepositions z, w (pronounced /z/ or /s/ and /v/ or /f/, respectively, depending on the presence of voice in the directly following context). In fact, for technical applications, these prepositions are often treated not as independent units but as parts of the following syllabic words; in this way the preposition becomes merged with the neighbouring word. Such a solution was chosen for the Polish BOSS synthesiser (Demenko et al. 2010). Pfitzinger (1996) compared automatic estimations of speech rate using local phone rate versus syllable rate, and claimed that although both measures gave significant and similar results, they were not identical, and thus the overall speech rate measure should be treated as a combination of the two types of measure rather than as either of them separately.
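The treatment of the non-syllabic prepositions z and w directly affects word-based rates. The sketch below attaches each such preposition to the following word before counting; the function names, the example phrase and its 2-second duration are hypothetical, and other Polish non-syllabic forms are deliberately not included since the text only discusses z and w.

```python
from typing import List

# Polish non-syllabic prepositions discussed in the text.
NON_SYLLABIC = {"z", "w"}


def merge_non_syllabic(tokens: List[str]) -> List[str]:
    """Attach each non-syllabic preposition to the following word."""
    merged: List[str] = []
    pending: List[str] = []
    for tok in tokens:
        if tok.lower() in NON_SYLLABIC:
            pending.append(tok)          # wait for the host word
        else:
            merged.append(" ".join(pending + [tok]))
            pending = []
    if pending:                          # preposition at the very end: keep it
        merged.append(" ".join(pending))
    return merged


def word_rate(tokens: List[str], duration_s: float, merge: bool = True) -> float:
    """Words per second, optionally after merging non-syllabic prepositions."""
    words = merge_non_syllabic(tokens) if merge else tokens
    return len(words) / duration_s


# Hypothetical phrase and duration (illustration only).
tokens = "podróżny w płaszczu z kapturem".split()
print(merge_non_syllabic(tokens))   # ['podróżny', 'w płaszczu', 'z kapturem']
print(word_rate(tokens, 2.0, merge=False), word_rate(tokens, 2.0))  # 2.5 1.5
```

The same stretch of speech yields 2.5 or 1.5 words per second depending on tokenisation, which illustrates why word-based rates are harder to compare across studies than they first appear.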
Regardless of significant rate variation across particular utterances produced by a speaker, the speaker's overall rate can be viewed as his/her individual characteristic. As an acoustic correlate, the 'individual' speech rate can be treated as the mean rate per unit of time (probably differing for particular types of speech: read, spontaneous, affective, etc.). The perceived rate of speech may be characterised by a range of cues (e.g. pausing schemes, articulation), and the weight that listeners attribute to these cues can depend on various factors, such as the variability of the cues in the signal (Grosjean & Lass 1977). In the perception domain, listeners also somehow compensate for local speech rate variations, or use them as cues to form a general impression of overall rate. Although assessing speech tempo often appears to be an intuitive and easy task for a listener, the exact manner of compensation and the specific ways in which the cues are used are not obvious. Speech rate perception can be affected by both the intended and the realised rate (Koreman 2006), depending on the actually perceived speech signal as well as on the listener's previous knowledge and own speaking habits. Thus yet another complicating issue is the subjective nature of listeners' judgments.
In most studies, speech rates are grouped into two, three or sometimes five categories (fast-slow, fast-neutral-slow, medium-fast/slow, etc.). However, as observed by Łobacz (1976b), the 'distances' between the nominal categories of speech rate are not symmetrically distributed around the neutral speech rate, which suggests that the categorisation needs to be verified. A possible starting point would be a more sophisticated or a continuous rating scale for speech tempo assessment (Treiblmaier & Filzmoser 2009; on prominence rating scales, Arnold et al. 2011).
In the next two subsections the results of speech rate measurements and perceptual assessment in Polish read speech are discussed. The main goal of the first part of the study (Section 2.3) is to inspect selected quantitative measures of speech rate expressed in sounds, syllables and words, and to compare them with dialogues. The second part (Section 2.4) aims to compare the measurements with perceptual judgments of global tempo, and to investigate the assessments obtained with a continuous rating scale. The text material is Aesop's fable The North Wind and the Sun (for a Polish transcription see Jassem 2003). The recordings come from the Paralingua corpus (Klessa et al. 2013) and were made according to two scenarios: the speakers were asked (1) to read the text neutrally using their habitual reading style, and (2) to read the text as if they were pushed for time but still needed to read it in an understandable way. For the present study, the recordings of 6 speakers reading the text twice (once in each scenario) are used, as well as the recordings of 2 more speakers reading the text only once (one speaker in scenario 1, and one in scenario 2).
2.3. Some quantitative measures of tempo
Figure 1 shows the mean syllable, sound and word rates per second, as well as the total number and mean duration of pauses, produced by six speakers (for better comparability of tendencies, some values were scaled as shown in the legend; the figure depicts results only for 12 of the 14 recordings, i.e. those of the six speakers who read the text twice, but the numbers in the text are given for all participants). As can be seen, the mean syllable and word rates tend to vary in a similar way across speakers, while the mean sound rate shows more inter-speaker variability. It should be noted that the very high correlation between syllable rate and word rate (see also Table 1) might be partly explained by the repeated occurrence of several monosyllabic words, most of them containing complex consonant clusters (e.g. wiatr /vjatr/ or płaszcz /pwaSt^S/). Since the text is not phonetically balanced and the number of speakers is limited, the obtained rates should be treated only as rough estimates of tempo in Polish read speech. The results generally confirm the figures reported by Łobacz (1976b), where fast phone rate was found to lie between 14.3 and 16.2 phones per second and normal phone rate between 11.8 and 14.6. In the present results, the overall means for all speakers were 16.69 and 13.66 for fast and normal intended speech, respectively. The fast tempo of three speakers was significantly higher than the maximum in Łobacz's study, and in normal tempo all speakers used rates within the respective range, except for speaker H, who spoke the fastest overall.
Figure 1: Mean values of: syllable, sound, and word rates, number and duration of pauses for six speakers in fast and normal intended speech tempo
The number of pauses appeared to be slightly higher in normal intended tempo (the mean number of pauses was 8.43 for fast speech and 10.71 for normal), apart from the results for two speakers (one of whom was speaker H, who had the fastest rate of all). On the other hand, in these two cases the mean duration of pauses was significantly higher than for any other speaker, which might suggest that the lower number of pauses was compensated for by pause lengthening. The mean durations of pauses differ significantly between fast and normal rates, being consistently (and not surprisingly) higher in the latter (206.24 vs. 373.69). For speaker E, the smallest differentiation in the number and length of pauses can be observed between the two intended rates; at the same time, however, the largest difference between mean phone rates was noted for this speaker (4.62), which might show that this speaker preferred to differentiate rates by altering articulation rate rather than pausing schemes.
To examine the timing properties of fast and normal read speech further, selected global measures of timing were computed for the above material with the TGA tool (Section 3 and Gibbon 2013) and compared with the results obtained from six Polish dialogues (details in Section 4). The results are based on 148 interpausal time groups for read speech and 390 groups for dialogues. As can be seen in Figure 2, apart from the expected difference in overall durations between the fast speakers and the remaining ones, the largest discrepancies can be observed in the overall and mean slopes between read and conversational speech, which might be regarded as confirmation of the tendency reported in Section 3.2 below, i.e. that the slope is a potential style or genre marker. However, this observation requires further verification, especially because of the speaker-related differences in slopes in the dialogues. Another discrepancy can be seen in the SD measures, especially the overall SD, which appears to differ for each of the three datasets.
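Several of the measures in Figure 2 can be computed per time group from a sequence of syllable durations. The sketch below assumes that intercept and slope come from a least-squares line fitted to duration against syllable position (positions counted from 0), that SD is the population standard deviation, and that nPVI is the normalised Pairwise Variability Index, 100/(m−1) · Σ |d_k − d_{k+1}| / ((d_k + d_{k+1})/2) over the m durations; these conventions may differ from those implemented in the TGA tool.

```python
import statistics
from typing import Dict, Sequence


def time_group_measures(durations: Sequence[float]) -> Dict[str, float]:
    """Intercept, slope, SD and nPVI for the syllable durations of one time group.

    Conventions (0-based positions, population SD) are assumptions.
    """
    n = len(durations)
    if n < 2:
        raise ValueError("need at least two syllables")
    mean_x = (n - 1) / 2                      # mean of positions 0..n-1
    mean_d = statistics.mean(durations)
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sum((x - mean_x) * (d - mean_d)
                for x, d in enumerate(durations)) / sxx
    intercept = mean_d - slope * mean_x
    # nPVI: mean normalised difference between successive durations, x 100.
    npvi = 100 * sum(abs(a - b) / ((a + b) / 2)
                     for a, b in zip(durations, durations[1:])) / (n - 1)
    return {"intercept": intercept, "slope": slope,
            "sd": statistics.pstdev(durations), "npvi": npvi}


# Hypothetical time group with strictly alternating long/short syllables (ms).
m = time_group_measures([100, 200, 100, 200])
print({k: round(v, 2) for k, v in m.items()})
# {'intercept': 120.0, 'slope': 20.0, 'sd': 50.0, 'npvi': 66.67}
```

A positive slope indicates that syllables lengthen towards the end of the group (deceleration), a negative slope the reverse; in this toy example the positive slope arises only because the group ends on a long syllable, which shows why slope and nPVI capture different aspects of timing.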
[Figure 2 (bar chart): scaled values (y-axis, approx. −50 to 150) for dialogue, read normal and read fast speech. Measures (x-axis): overall min; overall max /10; overall rate /sec ×10; overall, mean and median intercept /10; overall slope ×500; mean and median slope; overall SD, mean and median SDs; overall nPVI, mean and median nPVIs.]
Figure 2: Comparison of selected quantitative measures of timing in read speech (fast and normal rates) and dialogues
2.4. Perception-based subjective assessment of tempo
The same recordings of read speech were used in a perception test in which 23 listeners (students of the same linguistics department) were asked to assess the speech rates of the speakers perceptually. During the test, the signals were played in random order to each subject individually via headphones. Participants were instructed to listen to each of the recordings, and were also allowed to replay the recording or its fragments. Subjects were presented with the rating scale and the rating method before they started to listen. After listening, the task was to mark their own subjective judgment of the speaker's overall speech tempo on a continuous scale without any numbers or scale divisions (only min-max markers at the ends of the scale). It was emphasised that the task was not to compare rates between particular recordings, but rather to express one's personal judgment or impression. After marking an answer, the subject could modify it (as many times as desired), but only until she/he proceeded to a recording of another speaker; it was not possible to alter the ratings afterwards.
Table 1: Correlation table for overall rates expressed in syllables, sounds, words per second, mean of perceptual ratings, number of pauses and mean duration of pauses
Coefficients in bold: significant at p < .05

                          Syllable      Sound       Word    Mean of      Pause  Mean pause
                              rate       rate       rate    ratings     number    duration
Overall syllable rate     1.000000   0.873802   0.999914   0.957529  −0.627509  −0.635721
Overall sound rate        0.873802   1.000000   0.874916   0.827639  −0.493676  −0.539552
Overall word rate         0.999914   0.874916   1.000000   0.958162  −0.628586  −0.637595
Mean of ratings           0.957529   0.827639   0.958162   1.000000  −0.605941  −0.715032
Pause number             −0.627509  −0.493676  −0.628586  −0.605941   1.000000   0.163704
Mean pause duration      −0.635721  −0.539552  −0.637595  −0.715032   0.163704   1.000000
The results of the perception test showed that the listeners' judgments of speech rate were generally in line with the speakers' intentions (all recordings intended as fast obtained mean ratings above the general mean, and conversely, all 'normal' ones were rated below the overall mean). Table 1 presents correlations between the mean rate expressed in syllables, sounds and words, the mean of perceptual ratings, pause number and mean pause duration. The perceptual ratings were found to be highly positively correlated with the overall syllable rate and overall word rate (correlation above 0.95). The correlation with the phone rate is also positive and statistically significant, but somewhat weaker (0.83). As already mentioned in Section 2.3, there is a very high correlation between word and syllable rates, so at this stage it is not possible to conclude whether the listeners based their judgments more on word or on syllable rate cues. The negative correlation of ratings with the number and duration of pauses is also significant, with pause duration being a little more influential (−0.71) than pause number (−0.61).
To examine the outcome of using the continuous rating scale in the perception experiment, a tree diagram (Figure 3) was produced by cluster analysis performed with Statistica software. Partitions of the results visualised in such a tree diagram can be obtained by cutting the tree at a specific height (y-axis value). Several approaches to finding the optimal cutting level have been developed (cf. e.g. Everitt et al. 2011: 95–96). For standard agglomerative clustering, the division should be made at a height 'such that clusters below that height are distant from each other by at least that amount', which informally suggests the number of clusters. In Figure 3, two main clusters of judgments can be distinguished (cutting at a distance of ca. 60); however, the optimal clustering might be expected from cutting either at an agglomeration distance of 30 (giving 3 categories of speech rate) or at a distance of 10 (giving 5 categories).
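Cutting an agglomeration tree at a given height amounts to stopping the merging once the closest pair of clusters is farther apart than that height. The sketch assumes complete linkage on one-dimensional ratings (with complete linkage, merge heights never decrease, so stopping early is equivalent to cutting the finished tree); the linkage method and distance measure used in Statistica for Figure 3 are not stated, and the ratings are hypothetical.

```python
from typing import List, Sequence


def agglomerate(values: Sequence[float], cut_height: float) -> List[List[float]]:
    """Complete-linkage agglomerative clustering of 1-D values, cut at cut_height."""
    clusters: List[List[float]] = [[v] for v in values]

    def dist(a: List[float], b: List[float]) -> float:
        # Complete linkage: distance between the two farthest members.
        return max(abs(x - y) for x in a for y in b)

    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        if dist(clusters[i], clusters[j]) > cut_height:
            break                        # every remaining merge lies above the cut
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]                  # j > i, so index i stays valid
    return sorted(sorted(c) for c in clusters)


ratings = [10, 12, 15, 48, 50, 55, 80, 85]   # hypothetical tempo ratings
print(agglomerate(ratings, 10))   # [[10, 12, 15], [48, 50, 55], [80, 85]]
print(agglomerate(ratings, 40))   # [[10, 12, 15], [48, 50, 55, 80, 85]]
```

Lowering the cut height yields more, tighter categories, which is the trade-off behind choosing between the 2-, 3- and 5-category readings of Figure 3.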
Figure 3: Cluster analysis of the perception test results: agglomeration tree diagram
For the two hypothesised groupings, k-means clustering was performed to examine the means of ratings grouped into 3 or 5 maximally distinct clusters. The results are given in Table 2. All distances between cluster means are significant, ranging from 13.6 to 18.78 for the 5-cluster grouping, while for the 3-cluster grouping the difference in means between cl.1 and cl.2 (29.93) was somewhat larger than between cl.2 and cl.3 (25.02). Table 2 lists the cluster means in the order of the tree diagram (not in order of rating value). This finding might tentatively be considered a contribution to the discussion initiated by Łobacz (1976a: 178–179), who found that speakers tended to differentiate more between slow and normal rates than between normal and fast rates (the extremely fast tempo being limited by physiological factors). However, the clustering results presented here are preliminary and need to be examined in more detail, especially with regard to the qualitative validity of the grouping.
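k-means clustering of one-dimensional ratings can be sketched with Lloyd's algorithm: assign each rating to the nearest centre, recompute the centres as cluster means, and repeat until nothing changes. The deterministic, evenly spread initialisation and the ratings are assumptions for illustration; k-means only finds a local optimum, so results can depend on the starting centres.

```python
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int, max_iter: int = 100) -> List[float]:
    """Lloyd's k-means on one-dimensional data; returns sorted cluster means."""
    vs = sorted(values)
    # Deterministic start (an assumption): k centres spread over the sorted values.
    if k == 1:
        centres = [sum(vs) / len(vs)]
    else:
        centres = [float(vs[int(i * (len(vs) - 1) / (k - 1))]) for i in range(k)]
    for _ in range(max_iter):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in vs:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        # Keep the old centre if a cluster happens to become empty.
        new = [sum(g) / len(g) if g else centres[c] for c, g in enumerate(groups)]
        if new == centres:
            break                          # assignments are stable
        centres = new
    return sorted(centres)


ratings = [10, 12, 15, 48, 50, 55, 80, 85]   # hypothetical ratings
print([round(c, 2) for c in kmeans_1d(ratings, 3)])   # [12.33, 51.0, 82.5]
```

Comparing the gaps between adjacent means (here about 38.7 and 31.5) is the kind of comparison made above for the 3- and 5-cluster solutions.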
Table 2: Results of k-means analysis for 3 and 5 clusters (cl.) of rate assessments
No. of clusters  Mean for cl.1  Mean for cl.2  Mean for cl.3  Mean for cl.4  Mean for cl.5
5                49.7791        33.4365        63.3864        82.1709        14.9900
3                23.5961        53.5356        78.5576        n/a            n/a
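The k-means step and the reported distances between cluster means can be sketched as follows. This is a plain one-dimensional Lloyd's algorithm with a deterministic quantile-based initialisation, assumed for illustration; Statistica's own initialisation and stopping rules may differ, and the names `kmeans_1d` and `adjacent_gaps` are hypothetical. The second function takes cluster means in rating order and returns the differences between neighbouring clusters.

```python
from typing import List

def kmeans_1d(values: List[float], k: int, max_iter: int = 100) -> List[float]:
    """Lloyd's k-means on 1-D data; returns the k cluster means in ascending order."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic initialisation: one centre per k-quantile of the sorted data.
    centres = [float(xs[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    for _ in range(max_iter):
        groups: List[List[float]] = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centres[c]))
            groups[nearest].append(x)
        # An empty cluster keeps its previous centre.
        new = [sum(g) / len(g) if g else centres[j] for j, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return sorted(centres)

def adjacent_gaps(means: List[float]) -> List[float]:
    """Differences between means of clusters that are neighbours on the rating scale."""
    return [b - a for a, b in zip(means, means[1:])]


table2_5 = [49.7791, 33.4365, 63.3864, 82.1709, 14.9900]   # Table 2, five clusters
table2_3 = [23.5961, 53.5356, 78.5576]                      # Table 2, three clusters
print(adjacent_gaps(sorted(table2_5)))
print(adjacent_gaps(sorted(table2_3)))
```

Applied to the Table 2 means after sorting them into rating order, `adjacent_gaps` gives about 18.45, 16.34, 13.61 and 18.78 for five clusters and about 29.94 and 25.02 for three, which agrees (up to rounding) with the range of 13.6 to 18.78 and the values 29.93 and 25.02 given in the text.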
Duration and speed of speech events: A selection of methods LP LVI (1) 69
3. Syntagmatic aspects: time types, linearity, alternation, hierarchy
3.1. Time types: a framework for defining contextual factors
A theoretical framework, Time Type theory (Gibbon 1992; 2006), was developed for distinguishing between formal types of temporal structure: Categorial Time (e.g. duration as a distinctive feature), Relational Time (e.g. parallel relations between different phonetic or phonological properties such as intonation and phrases, syllables and tone, or co-articulating phonetic features), and Fuzzy Time, that is, quantitative, statistically measurable properties of speech signals. Time Type theory was designed to provide a framework for linguistic and phonetic speech timing studies: Categorial Time as the distinctive feature 'long-short', Relational Time as isochrony, rhythmic alternation and hierarchical timing relations, and Fuzzy Time as the statistically accessible domain of speech signal measurements. The following discussion first addresses the quantitative linear models, known as 'rhythm metrics', at the Fuzzy Time level, followed by the inter-level relations between the three Time Type levels. Time Type theory was applied by Carson-Berndsen (1998) in a computational linguistic approach to automatic speech recognition.
3.2. Linear models
Gibbon et al. (2005) and Gibbon (2006) regard rhythm as an epiphenomenon determined by many linguistic and cognitive factors, but abstract a number of properties for the structural component of an epiphenomenal approach. The Base Unit of a rhythm (or other timing relation) is a pattern, generally a syllable or a foot (an accented syllable plus unaccented syllables), consisting of a finite trajectory through an n-dimensional parameter space (pitch, duration patterns, segmental patterns in syllables, etc.). Sequences of Base Units are related by Alternation, i.e. dynamic traversal through at least two positions in the Base Unit parameter space (e.g. high-low pitch, CV syllable structure, long-short or strong-weak syllable patterns). The Base Unit sequences with alternation must enter into an Iteration relation, i.e. the alternating base pattern must repeat with at least two occurrences. Finally, for a rhythm to be identified, the Base Units in a sequence with alternation and iteration must enter into an additional relation of Isochrony, i.e. the Base Units must be equal in length. Rhythmic Base Units are rarely exactly equal in length at the Fuzzy Time level, but are subject to fuzzy isochrony ('sloppy isochrony'): Base Unit durations are measured on a scale from more-or-less equal to more-or-less unequal, but may nevertheless be interpreted perceptually, and explained cognitively, as isochronous, within specifiable difference thresholds.
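The four structural conditions can be expressed as simple checks. In this hypothetical Python sketch each Base Unit is coded as a pattern string (e.g. "SW" for strong-weak) plus a duration; alternation means at least two distinct positions within a unit, iteration means the same pattern recurring at least twice, and fuzzy isochrony means that all durations lie within a tolerance of their mean. The coding scheme, the function name and the 50 ms default tolerance are illustrative assumptions, not part of the cited proposals.

```python
from typing import Dict, List, Tuple

def rhythm_conditions(units: List[Tuple[str, float]],
                      iso_threshold_ms: float = 50.0) -> Dict[str, bool]:
    """Check Alternation, Iteration and fuzzy Isochrony for a sequence of Base Units.

    Each unit is (pattern, duration_ms); the pattern is a hypothetical string coding
    of positions in the Base Unit parameter space, e.g. "SW" for a strong-weak foot.
    """
    patterns = [p for p, _ in units]
    durations = [d for _, d in units]
    # Alternation: each unit traverses at least two positions (e.g. S and W).
    alternation = bool(units) and all(len(set(p)) >= 2 for p in patterns)
    # Iteration: the same alternating pattern occurs at least twice.
    iteration = len(units) >= 2 and len(set(patterns)) == 1
    # Fuzzy isochrony: every duration lies within the tolerance of the mean duration.
    mean = sum(durations) / len(durations) if durations else 0.0
    isochrony = bool(units) and all(abs(d - mean) <= iso_threshold_ms for d in durations)
    return {"alternation": alternation, "iteration": iteration,
            "fuzzy_isochrony": isochrony,
            "rhythm": alternation and iteration and isochrony}


print(rhythm_conditions([("SW", 400), ("SW", 430), ("SW", 410)]))
print(rhythm_conditions([("SW", 300), ("SW", 500)]))
```

The second call fails only on fuzzy isochrony: the units alternate and iterate, but their durations differ by more than the tolerance.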
Several quantitative linear models have been proposed for speech timing. Table 3 summarises three of the best-known models, which specifically address the topic of isochrony in presumed foot-timed languages, together with the extent to which the models fulfil the necessary conditions on rhythm. The methods using Linear Models are corpus-based, inductive, a posteriori procedures which start with input from annotated speech data and extract time-stamps, differences between time-stamps (i.e. unit interval durations), and differences between durations (i.e. deceleration and acceleration of intervals).
Table 3: Quantitative linear rhythm models (Scott et al. 1986; Roach 1982; Low et al. 2001)

PIM. Sum of the (log) ratios of each foot to each other foot, where $I_i$ is the duration of foot $i$ (the log function reduces the impact of longer feet):
$$\mathrm{PIM} = \sum_{i \neq j} \left| \log \frac{I_i}{I_j} \right|$$
Constraint fulfilment: Basic unit: foot; Alternation: no; Iteration: no; Isochrony: yes.

PFD. Mean absolute (unsigned) difference of each foot from the mean foot length, divided by the mean foot length (%, max = 100%).
Constraint fulfilment: Basic unit: foot; Alternation: no; Iteration: no; Isochrony: yes.

nPVI. Mean absolute (unsigned) difference between neighbours (normalised by division by the mean length of the neighbours); scale from 0 to an asymptote of 200.
Constraint fulfilment: Basic unit: vocalic sequence; Alternation: no; Iteration: yes; Isochrony: yes.
Gibbon et al. (2005) showed that there is a strong correlation between each of these measures when applied to syllables, and between these measures and the standard deviation of syllable durations, and rejected claims that these linear models are models of rhythm, on the grounds that they do not account for rhythmic alternation (cf. also Gut 2012), because they operate on absolute (unsigned) duration differences.
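The three measures in Table 3 can be computed directly from lists of interval durations. The following Python sketch follows the Table 3 formulas (the function names are hypothetical) and illustrates the point just made with two hypothetical sequences containing the same durations in a different order.

```python
import math
from itertools import permutations
from typing import Sequence

def pim(intervals: Sequence[float]) -> float:
    """PIM: sum over all ordered pairs i != j of |log(I_i / I_j)|."""
    return sum(abs(math.log(a / b)) for a, b in permutations(intervals, 2))

def pfd(feet: Sequence[float]) -> float:
    """PFD: 100 * sum_i |MFL - len(foot_i)| / (n * MFL), MFL = mean foot length."""
    mfl = sum(feet) / len(feet)
    return 100 * sum(abs(mfl - f) for f in feet) / (len(feet) * mfl)

def npvi(durations: Sequence[float]) -> float:
    """nPVI: 100/(m-1) * sum_k |d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2)."""
    pairs = list(zip(durations, durations[1:]))
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


alternating = [100, 300, 100, 300]   # hypothetical foot durations (ms)
grouped = [100, 100, 300, 300]       # the same durations in a different order
for name, seq in (("alternating", alternating), ("grouped", grouped)):
    print(name, round(pim(seq), 3), pfd(seq), round(npvi(seq), 1))
```

PIM and PFD depend only on the set of durations, not on their order, so both sequences receive identical values (PFD = 50.0 in each case). nPVI is order-sensitive (100.0 for the alternating sequence, about 33.3 for the grouped one), but because it uses unsigned differences it does not register the direction of change, which is what an alternation model needs.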
In the remainder of this section we report results obtained with a new tool, the Time Group Analyser (TGA: Gibbon 2013), which was used to investigate syllable durational properties, some of them novel, on the 'Syllables' annotation tier of the Aix-MARSEC corpus of English (Auran et al. 2004). Six of the eleven genre categories represented in the Aix-MARSEC corpus were selected on the grounds of greater similarity of informally defined speech styles: A ('Commentary'), B ('News broadcast'), C ('Lecture aimed at general audience'), D ('Lecture aimed at restricted audience'), F ('Magazine-style reporting') and K ('Propaganda'). The five functionally less similar categories E ('Religious broadcast including liturgy'), G ('Fiction'), H ('Poetry'), J ('Dialogue') and M ('Miscellaneous') were not dealt with.
The following procedure was used:
1. Annotations in each genre category were analysed separately.
2. The annotations were divided into pause-delimited (inter-pause, interpausal) syllable groups.
3. For each genre, overall values for duration maximum, mean, range, intercept, slope, standard deviation and nPVI were automatically calculated with the TGA.
4. Values for all sequences were displayed together on a line graph in order to permit direct 'eyeballing' of similarities and differences between measures and between genres (further correlations were not investigated in this study).
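Step 3 can be sketched as follows. The sketch assumes that intercept and slope come from an ordinary least-squares fit of syllable duration (ms) on syllable position within a sequence, that SD is the population standard deviation, and that genre-level 'overall' values are computed on all syllables of the genre pooled, with per-group slopes summarised separately; the TGA's exact definitions may differ, and all names are hypothetical. A positive slope corresponds to deceleration (durations increasing towards the end of the group).

```python
from statistics import mean, median, pstdev
from typing import Dict, List, Tuple

def fit_line(ys: List[float]) -> Tuple[float, float]:
    """OLS fit of duration on position 0..n-1 within a sequence -> (intercept, slope)."""
    n = len(ys)
    mx, my = (n - 1) / 2, mean(ys)
    sxx = sum((x - mx) ** 2 for x in range(n))
    if sxx == 0:                          # a single syllable has no defined slope
        return float(my), 0.0
    slope = sum((x - mx) * (y - my) for x, y in enumerate(ys)) / sxx
    return my - slope * mx, slope

def npvi(ys: List[float]) -> float:
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(ys, ys[1:])]
    return 100 * mean(terms) if terms else 0.0

def describe(ys: List[float]) -> Dict[str, float]:
    """Maximum, mean, range, intercept, slope, SD and nPVI for one sequence (ms)."""
    intercept, slope = fit_line(ys)
    return {"max": max(ys), "mean": mean(ys), "range": max(ys) - min(ys),
            "intercept": intercept, "slope": slope, "sd": pstdev(ys), "npvi": npvi(ys)}

def describe_genre(groups: List[List[float]]) -> Dict[str, float]:
    """Pooled 'overall' values for a genre plus mean/median of per-group slopes."""
    overall = describe([d for g in groups for d in g])
    slopes = [fit_line(g)[1] for g in groups]
    overall["mean_of_slopes"] = mean(slopes)
    overall["median_of_slopes"] = median(slopes)
    return overall


# Hypothetical inter-pausal groups of syllable durations (ms) for one genre.
genre_a = [[120, 150, 180, 260], [90, 140, 210], [160, 130, 240, 300]]
print(describe_genre(genre_a))
```

Keeping the pooled slope and the per-group slopes apart matters because a slope computed over a long concatenated sequence can be close to zero even when every individual group decelerates.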
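Formulas for the PFD and nPVI measures in Table 3, for $n$ feet with lengths $\mathrm{len}(\mathit{foot}_i)$ and $m$ successive intervals with durations $d_k$:
$$\mathrm{PFD} = \frac{\sum_{i=1}^{n} \left| \mathrm{MFL} - \mathrm{len}(\mathit{foot}_i) \right|}{n \times \mathrm{MFL}} \times 100, \qquad \mathrm{MFL} = \frac{\sum_{i=1}^{n} \mathrm{len}(\mathit{foot}_i)}{n}$$
$$\mathrm{nPVI} = \frac{100}{m-1} \sum_{k=1}^{m-1} \left| \frac{d_k - d_{k+1}}{(d_k + d_{k+1})/2} \right|$$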
The results of the quantitative analysis of the genre categories are visualised in Figure 4. Some results are scaled (see the legend of Figure 4) in order to create a visually interpretable combined display of values for each measure and each genre.
Figure 4: Comparison of quantitative measures in six Aix-MARSEC genre categories
Predictably, high correlations hold between mean and intercept, between SD and nPVI, and between range and maximum. The interesting parameter is slope: each case shows deceleration, i.e. an average increase in duration over the pause-defined segment. The slopes for genre categories A, B, C and D (commentary, news broadcast and lectures) are close together, while the more informal, audience-directed genres F and K (magazine and propaganda) show much larger deceleration. This result suggests a phonostylistic effect, with syllable slope patterning over pause-delimited segments contributing to speech style; this needs further investigation in terms of speech rate, as well as a more precise sociolinguistic specification of the genre categories.
3.3. Alternation models
The second relevant property of speech timing is alternation. The Linear Models fail because they lack this alternation detection property. One approach to characterising alternation in speech timing is the Oscillator Model, incorporating quantitative measures of rhythm as oscillations in perceptions of relative rhythmicity (cf. Barbosa 2009; Inden et al. 2012). The present method, implemented in the TGA tool, is more opportunistic: it retains the essential unit interval duration difference property of the Linear Models (referred to here as ΔD), extracted in the same way from speech signal annotations, but adds an alternation detection property. Unlike the Oscillator Models, instead of attempting to characterise 'always on' oscillators, it tokenises the interval duration differences into discrete units (increase, decrease and equality of duration) and makes a distributional analysis of the frequencies of these interval duration tokens, following familiar computational procedures from corpus linguistics.
The initial output of the alternation model is a stream of ΔD tokens: for this conversion, minimal duration changes are defined by means of an adjustable local threshold, typically around 50 ms, and changes below this threshold count as equal duration (currently, thresholds are investigated manually; no algorithmic optimising search is performed). The ΔD tokens are represented as symbols: equality ('='), acceleration ('/') and deceleration ('\').
Threshold-determined equality will be referred to as fuzzy isochrony or sloppy isochrony. To some extent, the procedure parallels, for duration, the stylisation procedures used for the analysis of pitch into discrete entities (e.g. 't Hart et al. 1990; Auran et al. 2004).
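The tokenisation step can be sketched as follows. The sketch assumes that a difference counts as equal when it is zero or its absolute value is below the local threshold, and it adds the boundary symbols '+' and '#' used in Table 4 around each inter-pausal group; the function name and these conventions are assumptions rather than a description of the TGA code.

```python
from typing import Sequence

def delta_d_tokens(durations: Sequence[float], threshold_ms: float = 50.0,
                   boundaries: bool = True) -> str:
    """Map successive duration differences to '=', '/' or '\\' tokens.

    '='  : zero difference or |difference| below the local threshold (fuzzy isochrony)
    '\\' : next unit longer (duration increase, deceleration)
    '/'  : next unit shorter (duration decrease, acceleration)
    """
    tokens = []
    for a, b in zip(durations, durations[1:]):
        delta = b - a
        if delta == 0 or abs(delta) < threshold_ms:
            tokens.append("=")
        elif delta > 0:
            tokens.append("\\")
        else:
            tokens.append("/")
    s = "".join(tokens)
    return f"+{s}#" if boundaries else s


# Syllable durations (ms) of the inter-pausal group from a0102B.TextGrid used in Section 3.4.
group = [160, 330, 60, 150, 100, 160, 210, 290, 290, 500]
for lt in (0, 20, 40, 60, 80):
    print(lt, delta_d_tokens(group, lt))
```

As the threshold rises, more '=' tokens replace alternations, which is the effect examined across local threshold settings in Table 4.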
Second, in order to identify alternating, isochronous or random duration tendencies, the frequencies of token digrams, trigrams, quadgrams and quingrams are measured.
In view of the methodological emphasis of the present contribution, the token n-gram frequency analysis procedure is illustrated using a single monologue file, a0102B.TextGrid, from the Aix-MARSEC corpus. The results are shown in Table 4, which gives the first three ranks for frequencies of digram, trigram, quadgram and quingram ΔD token patterns at local threshold settings of 0 ms, 20 ms, 40 ms, 60 ms and 80 ms. Figures given are percentages and, in parentheses, absolute numbers.
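Counting and ranking token n-grams is a standard corpus-linguistic operation; a minimal Python sketch is given below. It assumes that n-grams are counted within each boundary-marked inter-pausal group (so '+' and '#' can appear inside patterns, as in Table 4) and that percentages are relative to the total number of n-grams of the given length; both the function and the token strings are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def ngram_ranks(token_strings: Iterable[str], n: int,
                top: int = 3) -> List[Tuple[str, int, int]]:
    """Top-ranked n-grams of Delta-D token strings as (pattern, percent, count).

    Each string is one boundary-marked inter-pausal group, e.g. "+/\\/=#";
    n-grams are counted within groups only (an assumption of this sketch).
    """
    counts: Counter = Counter()
    for s in token_strings:
        counts.update(s[k:k + n] for k in range(len(s) - n + 1))
    total = sum(counts.values())
    return [(g, round(100 * c / total), c) for g, c in counts.most_common(top)]


groups = ["+/\\/\\#", "+\\/=\\#"]      # hypothetical token strings
for n in (2, 3, 4, 5):
    print(n, ngram_ranks(groups, n))
```

Running this for each local threshold setting, on token strings produced at that threshold, gives one column of a table like Table 4.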
Inspection of the rows in Table 4 shows that the threshold values 0 ms and 20 ms lead to almost identical results for all of the top three ΔD token pattern ranks, indicating a prevalence of alternations and only rare threshold-determined equality. At 40 ms the situation starts to change, with more equalities, and from a threshold of 60 ms onwards there is increasingly a preponderance of equalities. Informally, these results provide evidence for a limit of around 50 ms on the contribution of duration differences to the identification of isochrony in this English text.
There are a number of consequences to be drawn from this analysis in terms of further clarifications which are needed, but which are not within the scope of the present contribution:
1. The 50 ms limit itself is very likely an indication of a structurally relevant boundary. However, this can only be verified by examining the linguistic constructions associated with the ΔD token patterns.
Table 4: Stylised duration difference token patterns for Aix-MARSEC files with A initial. Tokens: \ (increasing duration), / (decreasing duration), = (equal), + (initial pausal unit boundary), # (final pausal unit boundary). LT: local threshold.
Unit    Rank  LT = 0 ms      LT = 20 ms     LT = 40 ms     LT = 60 ms     LT = 80 ms
2-gram  1.    24% (65) /\    20% (55) /\    15% (41) /\    17% (46) ==    24% (64) ==
        2.    23% (61) \/    18% (48) \/    13% (34) \/    11% (29) =\    11% (29) =\
        3.    13% (36) \\    9% (24) \#     9% (24) \=     10% (26) /=    10% (26) \=
3-gram  1.    17% (39) \/\   13% (31) \/\   9% (21) \/\    8% (20) ===    12% (29) ===
        2.    13% (31) /\/   10% (23) /\/   7% (17) /\/    6% (13) ==\    8% (18) ==\
        3.    9% (21) /\\    6% (13) /\\    5% (11) =/\    5% (12) \/=    6% (15) \==
4-gram  1.    10% (20) \/\/  7% (14) \/\/   5% (10) /\/\   4% (8) ====    5% (11) ===\
        2.    9% (18) /\/\   7% (14) /\/\   4% (9) \/\/    3% (7) ===\    5% (11) ====
        3.    5% (11) \/\\   4% (8) =\/\    3% (7) =\/\    3% (7) ==/\    4% (9) \===
5-gram  1.    6% (10) \/\/\  5% (9) \/\/\   4% (6) \/\/\   3% (5) ==/\/   4% (6) ====\
        2.    5% (9) /\/\/   4% (7) /\/\/   3% (5) \=/=\   3% (5) +====   3% (5) =\===
        3.    5% (8) \/\//   3% (5) /\/\\   2% (4) /\/\\   2% (4) ====\   3% (5) +====
2. One fundamental problem of the so-called rhythm metrics is that they can identify degrees of isochrony, but in the direction of non-isochrony the values become less and less meaningful, since they do not distinguish between alternating and random sequences. The ΔD Analysis procedure outlines a path forward in this respect.
3. Another fundamental problem of the so-called rhythm metrics is that they do not employ thresholds, but indiscriminately incorporate all duration differences, however small.
There are a number of open issues with the ΔD Analysis procedure which are currently under investigation, concerning automatic threshold optimisation, numerical weighting of ΔD tokens, further numerical evaluation of the ΔD n-gram distributions to induce a 'rhythm grammar', and, not least, alignment of ΔD token patterns with grammatical patterns in order to determine the significance of ΔD thresholds.
However, the general conclusion is that this novel method provides one interesting way forward for identifying the essential alternation properties of rhythm, and thereby for correcting a core weakness of so-called rhythm metrics, which ignore alternation.
3.4. Hierarchical models
The two best-known hierarchical models of speech timing are those of Jassem and Abercrombie for English (cf. discussion in Gibbon et al. 2012), which identify the 'rhythm group' or 'foot' as a basic unit with syllable components. These models have become standard frameworks for statistical analyses. The Jassem model identifies two units: the Narrow Rhythm Unit, NRU, which starts with a stressed syllable and continues (optionally) with unstressed syllables until the next clear word boundary, and the (optional) Anacrusis, ANA, a sequence of unstressed syllables from a clear word boundary to the beginning of the next NRU. The Jassem model claims that the ANA and the NRU differ in their timing properties: each NRU in a sequence tends towards equal length (conditioned by the number of syllabic and phonemic constituents it contains), while the ANA tends to be faster, less stressed, and less constrained towards isochrony. A sequence of ANA and NRU, bounded left and right by clear word boundaries, constitutes a Total Rhythm Unit, TRU. The Abercrombie model, on the other hand, postulates only the foot, defined in a similar way to Jassem's NRU, and introduces the concept of the 'silent beat', which relates indirectly to Jassem's ANA. Both models are candidates for a rhythm theory, since their claims embody a clear Base Unit (the foot), Alternation (stressed-unstressed syllables), Iteration (foot sequences), and Isochrony (the tendency to equal NRU or foot timing). Jassem et al. (1984) demonstrated the quantitative validity of the Jassem model; investigations of the Abercrombie model have been less successful, which has in turn led to pessimism about finding quantitative rhythm correlates in the speech signal.
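Jassem's unit definitions can be turned into a simple segmentation procedure. The Python sketch below is a simplified, hypothetical rendering: it treats every boundary between the input words as a clear word boundary, takes stress marks as given, and returns ANA and NRU units, which are then paired into TRUs. The example syllabification and stress assignments are illustrative only.

```python
from typing import List, Tuple

Word = List[Tuple[str, bool]]          # (syllable, stressed) pairs

def jassem_units(words: List[Word]) -> List[list]:
    """Segment words into ANA and NRU units; each unit is [kind, [syllables]]."""
    units: List[list] = []
    for word in words:
        nru_open = False                         # a clear word boundary closes any NRU
        for syl, stressed in word:
            if stressed:
                units.append(["NRU", [syl]])     # a stressed syllable starts an NRU
                nru_open = True
            elif nru_open:
                units[-1][1].append(syl)         # unstressed syllables continue the NRU
            elif units and units[-1][0] == "ANA":
                units[-1][1].append(syl)         # the ANA runs on up to the next NRU
            else:
                units.append(["ANA", [syl]])     # unstressed syllables after a boundary
    return units

def total_rhythm_units(units: List[list]) -> List[Tuple[List[str], List[str]]]:
    """Pair each NRU with the (possibly empty) ANA before it: TRU = [ANA] NRU."""
    trus: List[Tuple[List[str], List[str]]] = []
    pending: List[str] = []
    for kind, syls in units:
        if kind == "ANA":
            pending = syls
        else:
            trus.append((pending, syls))
            pending = []
    return trus


words = [[("more", True)], [("news", True)], [("a", False), ("bout", True)],
         [("the", False)], [("Re", True), ("ve", False), ("rend", False)]]
units = jassem_units(words)
print(units)
print(total_rhythm_units(units))
```

In this simplification a trailing ANA with no following NRU is not assigned to any TRU; the durations of the resulting ANA and NRU units could then be compared to test the timing claims described above.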
Campbell (1992) investigated hierarchical structures in speech timing from several perspectives, including the dependence of phone durations on syllable properties and, at a higher level, the relation of syllable durations to prosodic structure (using Break Indices marking different levels in a hierarchy of boundaries between phonological and prosodic units) and to grammatical structures. He found a number of tendencies: syllable durations tend to shorten in proportion to the hierarchical depth of a preceding grammatical phrase structure boundary, and to lengthen in proportion to the hierarchical depth of either a following
grammatical phrase structure boundary, the depth of grammatical embedding, or a following prosodic boundary in terms of Break Indices (cf. Figure 16.12 in Campbell 1992).
The present approach to hierarchical modelling introduces the notion of Time Tree Induction (TTI), which, like the Linear Model and Alternation Model approaches, is a data-driven a posteriori approach, in contrast to approaches which start with a priori models, such as linguistically motivated prosodic hierarchy trees. In this sense, the Time Tree Induction approach builds on the Linear Model and Alternation Model approaches, and extends Campbell's duration-hierarchy correlation model. A first attempt to compare a posteriori duration hierarchies to a priori grammatical hierarchies was made by Gibbon (2003; 2006).
Like Alternation Model analysis, TTI is also determined by relations between accelerating or decelerating tokens, except that, in contrast to discrete token sequence analysis, the numerical durations are used for tree induction. Currently the induction algorithm uses either deceleration relations or acceleration relations, but not both. The following rules define binary decelerating (short-long) trees, for example:
(1) A syllable $s_i$ is a tree constituent.
(2) In a tree constituent sequence $S = \langle s_i, s_j \rangle$, if $\mathrm{dur}(s_i) < \mathrm{dur}(s_j)$, then $S$ is a tree constituent with the duration label $\mathrm{dur}(s_j)$.
A bottom-up algorithm applies the rules until no more applications are possible. Trees with other structures emerge, depending on several factors: (1) how '=' is dealt with (e.g. as '>=' or 'not >'), (2) whether '>' (acceleration) is used instead of '<' (deceleration), and (3) whether a right-left or left-right schedule, together with early or late recursive closure, is used to implement the grouping criterion.
The following illustration of the procedure uses the duration-annotated sequence for one inter-pausal group which was extracted automatically from the monologue file a0102B.TextGrid from the Aix-MARSEC corpus:
'mo::160 'nju:z:330 @:60 'baut:150 D@:100 're:160 vr@n:210 'sVn:290 'mjVn:290 'mu:n:500
A left-right recursive algorithm applies the specified ΔD criterion to the current and following input-level annotation durations to create a binary subtree; if the criterion fails, a stack of previously constructed subtree constituents is examined in order to create larger subtrees, and if this fails, the bottom-up search for a new subtree restarts. (Note that an alternative algorithm which processes the stack immediately after successful input-level construction may lead to different results.) The TGA tool computes the derivation step by step (cf. Table 5). The automatically generated output of the implementation is a parsed tree bracketing (visualised as a tree graph in Figure 5):
(('mo: 'nju:z) ((((@ 'baut) ((D@ 're) vr@n)) 'sVn) ('mjVn 'mu:n)))
Table 5: Time tree derivation
1. 160 330 60 150 100 160 210 290 290 500
2. (160 330) 60 150 100 160 210 290 290 500
3. (160 330) (60 150) 100 160 210 290 290 500
4. (160 330) (60 150) (100 160) 210 290 290 500
5. (160 330) ((60 150) (100 160)) 210 290 290 500
7. (160 330) ((60 150) ((100 160) 210)) 290 290 500
8. (160 330) (((60 150) ((100 160) 210)) 290) 290 500
9. (160 330) (((60 150) ((100 160) 210)) 290) (290 500)
10. (160 330) ((((60 150) ((100 160) 210)) 290) (290 500))
11. ((160 330) ((((60 150) ((100 160) 210)) 290) (290 500)))
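The schedule described above admits more than one reading. The Python sketch below implements one hypothetical reading of the decelerating criterion: first try to extend the most recent constituent with the next input unit and, if that succeeds, close the stack; otherwise try to build a new input-level pair without closing the stack (late closure); otherwise leave the unit on its own. It is not the TGA source code, and its intermediate steps need not coincide with those listed in Table 5, but for the inter-pausal group above it yields the same final bracketing.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Leaf:
    label: str
    dur: int

@dataclass
class Node:
    left: "Tree"
    right: "Tree"
    dur: int   # duration label: that of the longer (right-hand) constituent

Tree = Union[Leaf, Node]

def _close(stack: List[Tree]) -> None:
    """Merge the two topmost constituents while the decelerating criterion holds."""
    while len(stack) >= 2 and stack[-2].dur < stack[-1].dur:
        right = stack.pop()
        left = stack.pop()
        stack.append(Node(left, right, right.dur))

def induce_time_tree(sylls: List[Tuple[str, int]]) -> List[Tree]:
    """Left-right decelerating (short-long) time tree induction; returns a forest."""
    stack: List[Tree] = []
    i, n = 0, len(sylls)
    while i < n:
        cur = Leaf(*sylls[i])
        if stack and stack[-1].dur < cur.dur:
            # Extend the most recent constituent with the next input unit, then close.
            stack.append(Node(stack.pop(), cur, cur.dur))
            i += 1
            _close(stack)
        elif i + 1 < n and cur.dur < sylls[i + 1][1]:
            # New input-level pair; the stack is not closed here (late closure).
            nxt = Leaf(*sylls[i + 1])
            stack.append(Node(cur, nxt, nxt.dur))
            i += 2
        else:
            stack.append(cur)   # no criterion applies: the unit stays on its own for now
            i += 1
    _close(stack)
    return stack

def bracket(t: Tree, use_labels: bool = False) -> str:
    if isinstance(t, Leaf):
        return t.label if use_labels else str(t.dur)
    return f"({bracket(t.left, use_labels)} {bracket(t.right, use_labels)})"


GROUP = [("mo:", 160), ("nju:z", 330), ("@", 60), ("baut", 150), ("D@", 100),
         ("re", 160), ("vr@n", 210), ("sVn", 290), ("mjVn", 290), ("mu:n", 500)]
forest = induce_time_tree(GROUP)
print(bracket(forest[0]))
print(bracket(forest[0], use_labels=True))
```

A strictly decreasing sequence yields no decelerating constituents at all, so the function returns a forest of separate leaves rather than forcing a single tree.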
Figure 5: Time Tree parse with the ΔD iambic criterion
Comparison of the time tree with grammatical units reveals six correspondences (given in orthography, for readability): 'more news', 'about', 'the Reverend', 'about the Reverend', 'about the Reverend Sun Mun Moon' and, non-trivially, the whole inter-pause unit 'more news about the Reverend Sun Mun Moon'. Two sequences do not correspond exactly to grammatical units: 'the Re' and 'about the Reverend Sun'; of these, the sequence 'the Re' can be analysed as ANA in the Jassem timing model, followed by a more prominent 'verend'. A tree-comparison algorithm has been used to determine the degree of similarity between time trees and grammatical trees (Gibbon 2003; 2006). Experiments with an acceleration condition yield a largely right-branching structure which does not yield any correspondences with grammatical or other plausible units beyond suffixed words. The ΔD relations are not necessarily related to rhythm, though symmetries in the tree may provide clues to rhythmic patterns. However, grammatical structure, not rhythm, is at issue at this point.
Clearly, in view of the number of degrees of freedom depending on the selected duration difference criterion and parse schedule, further levels of automation are required in order to search the space of relations between time trees and grammatical structures.
Finally, the genre under consideration ('Commentary' by a female speaker) represents a somewhat formal, rehearsed style, where prosody-grammar correspondences may be expected. It is not only duration and grammatical structure which are likely to correlate, but also semantically and pragmatically motivated contrastive and emphatic structures, while on the phonetic side pitch patterning will also be involved, as well as effects of intrinsic phone duration on syllable duration and hence on the duration trees. These complexities require extensive further research.
4. Functional interpretation of timing in dialogue
Speech timing functions at several levels in dialogue: in turn-taking (relative length of turns, gaps and overlaps between turns), and within turns (pauses, prominence patterns and hierarchical rhythm structures). To investigate sociophonetic timing in dialogue in connection with the phonetic alignment or non-alignment of participants, a scenario was designed in which misunderstandings are elicited: speaker A has a caller role and gives instructions to speaker B, in a call-centre role, about how to get from a hospital to a person with a heart attack. Because the speakers' maps differed slightly, misunderstandings occurred and the speakers had to negotiate the route in order to finish the task (Bachan 2011). The dialogues were
conducted in Polish between native speakers of Polish who did not know each other, and recorded in stressful conditions. Six dialogues (total duration 15 min 20 sec; three between male and three between female speakers) were recorded, annotated at syllable level and analysed using descriptive statistical methods.
The following discussion addresses the specific questions of whether there are gender or role differences in stressful dialogues, and which speech timing models perform better than others in this task. The oscillograms in Table 6 illustrate the turn-taking activity of the dialogues: speaker B did not have a simple listener role, but gave a lot of feedback to speaker A about whether the instructions were understood. The upper and lower oscillograms show the speech of speaker B (call-centre) and speaker A (caller) respectively.
Table 6: Oscillograms of the female (left) and male (right) dialogues
Panels: Dial. 1, Dial. 2, Dial. 3 (female, left column); Dial. 4, Dial. 5, Dial. 6 (male, right column).
Initial analysis of the temporal turn organisation showed that the female B speakers spoke less than the male B speakers, giving less feedback and enquiring less about the correct route. Deeper analysis of the dialogues showed that speech in the female dialogues hardly overlaps, overlap occurring only when female speaker B misunderstood an instruction and speaker A interrupted speaker B to clarify. Different kinds of turn-taking occur. In Dialogue 2, speaker B gave belated positive feedback: speaker A gave speaker B time to provide positive feedback, both speakers were silent for a few seconds, and then, when speaker A continued, speaker B provided feedback on the previous instructions, perhaps because speaker B had initially been silent while concentrating on marking the route on the map.
Male dialogues were much more lively and interactive, and their turn timing shows three phases: initial, medial and final. Initially, the speakers' speech overlaps in the greeting and introductory part of the dialogue (e.g. arranging what the task is and where to start). The male B speakers gave brief positive feedback, and their utterances were much longer when they were asking for information or providing information about understanding instructions or about where they were moving on the map. Although the speakers' speech initially overlapped, regardless of the function of the turn (positive feedback, information providing), in the medial phase, in the course of the task, the speakers tended to align, with speaker A waiting for speaker B to give positive feedback (no speech overlap) before continuing with a further instruction. Also, when the B speakers were asking questions, speaker A waited until the question was finished before answering. As with the female speakers, overlaps happened when the instructions were misinterpreted by speaker B and speaker A had to interrupt to clarify the route. In the final phase of the dialogue, when the participants had accomplished the task, they took leave of each other, and their goodbye utterances again overlapped.
4.1. Quantitative analysis of dialogue
For quantitative analysis of the dialogues the TGA tool was used, with further evaluation as necessary. The annotations of silent pauses, speaker noises, intrusive noises and laughter were treated as pauses. A set of different measures based on syllable timing within inter-pause groups was selected and investigated:
1. Overall timing properties: for each speaker, overall duration, minimum and maximum syllable lengths, and speech rate in syllables per second.
2. Global tendencies: for each speaker, overall median, mean and normalised Pairwise Variability Index (nPVI), i.e. the mean absolute difference between adjacent syllable pairs, normalised by dividing each difference by the mean of the pair.
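The grouping and the measures listed above can be sketched for one speaker's syllable tier. The pause label set, the interval format (label, start in ms, end in ms) and the function names are hypothetical; the sketch treats all pause-like labels as group boundaries, computes speech rate over the summed syllable durations, and computes nPVI only over adjacent syllables within the same inter-pause group. How the TGA selects 'valid' time groups is not modelled here; the function simply counts all non-empty groups.

```python
from statistics import mean, median
from typing import Dict, List, Tuple

# Hypothetical label set: the actual labels depend on the annotation conventions used.
PAUSE_LABELS = {"<sil>", "<sp>", "<noise>", "<laugh>"}

def inter_pause_groups(intervals: List[Tuple[str, int, int]],
                       pause_labels=PAUSE_LABELS) -> List[List[int]]:
    """Split (label, start_ms, end_ms) intervals into groups of syllable durations."""
    groups: List[List[int]] = []
    current: List[int] = []
    for label, start, end in intervals:
        if label in pause_labels:      # pauses, noises and laughter all end a group
            if current:
                groups.append(current)
                current = []
        else:
            current.append(end - start)
    if current:
        groups.append(current)
    return groups

def speaker_profile(intervals: List[Tuple[str, int, int]],
                    pause_labels=PAUSE_LABELS) -> Dict[str, float]:
    """Overall timing properties and global tendencies for one speaker's tier."""
    groups = inter_pause_groups(intervals, pause_labels)
    durs = [d for g in groups for d in g]
    if not durs:
        raise ValueError("no syllables on this tier")
    # nPVI over adjacent syllable pairs inside groups (pairs never span a pause).
    pairs = [(a, b) for g in groups for a, b in zip(g, g[1:])]
    npvi = 100 * mean(abs(a - b) / ((a + b) / 2) for a, b in pairs) if pairs else 0.0
    return {"overall_duration_ms": sum(durs), "min": min(durs), "max": max(durs),
            "rate_syll_per_s": len(durs) / (sum(durs) / 1000),
            "mean": mean(durs), "median": median(durs), "npvi": npvi,
            "groups": len(groups)}


tier = [("<sil>", 0, 500), ("ba", 500, 700), ("ka", 700, 1000),
        ("<sp>", 1000, 1200), ("ta", 1200, 1400), ("<sil>", 1400, 2000)]
print(speaker_profile(tier))
```

Restricting nPVI pairs to syllables within one inter-pause group keeps a syllable before a pause from being compared with the first syllable after it.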
Figure 6 presents the mean and median duration of syllables and the standard deviation. The overall mean durations vary within a dialogue (the exceptions are dialogues 1 and 2), whereas the overall median duration values are more similar. The standard deviation is very high, indicating a broad range of variation between very short syllables (e.g. in fast speech) and very long ones (e.g. filled pauses and hesitations).
Figure 6: Mean and median duration of syllables and standard deviation in six dialogues
The nPVI values are presented in Figure 7. The overall nPVI values for all the dialogue pairs are almost the same; an exception is dialogue 5 (speaker A: 39, speaker B: 46). In general, nPVI is smaller for female speakers and higher for male speakers. Across the dialogues, mean and median nPVI values are more diverse, but between interlocutors they tend to be more similar, indicating phonetic alignment of the speakers within a dialogue.
The detailed results of the analysis of the six dialogues are presented in Table 7. The analysis confirms the impression that both speakers were active in the dialogue: comparison of the 'Valid time groups' shows that one of the speakers, here speaker A, spoke much more than speaker B.
Figure 7: nPVI values for six dialogues
Table 7: Results of quantitative analysis of six dialogues
D1–D6 = dialogues 1–6; A = caller, B = call-centre; Mean = mean over the six speaker columns.

Female dialogues
Measure                 D1 A    D1 B    D2 A    D2 B    D3 A    D3 B    Mean
Dialogue duration (s)   70.1            128.2           169.8           –
Age                     27      25      23      31      21      28      –
Overall duration (ms)   44357   10311   77365   25898   94915   37816   48443
Overall min (ms)        42      55      48      59      31      39      45.67
Overall max (ms)        710     442     769     535     1002    607     677.5
Valid time groups       21      9       31      12      38      30      23.5
Overall rate (syll/s)   5.48    5.33    5.29    5.64    4.11    5.18    5.17
Overall slope           0.18    0.65    0       0.07    0.09    0.16    0.19
Mean of slopes          24.11   33.67   29.29   75.01   22.38   43.57   38.01
Median of slopes        10.07   25.17   9.63    28.24   2.66    20.25   16

Male dialogues
Measure                 D4 A    D4 B    D5 A    D5 B    D6 A    D6 B    Mean
Dialogue duration (s)   156.5           170.4           225.6           –
Age                     19      28      30      29      22      25      –
Overall duration (ms)   107892  42018   100320  58364   144105  47121   83303
Overall min (ms)        25      44      62      41      39      54      44.17
Overall max (ms)        1680    594     930     1577    1218    754     1125.5
Valid time groups       44      28      41      30      72      34      41.50
Overall rate (syll/s)   4.88    5.93    5.43    4.64    5.39    4.75    5.17
Overall slope           −0.12   −0.12   0       0.08    0       −0.11   −0.05
Mean of slopes          14.1    40.45   7.88    67.13   35.32   57.88   37.13
Median of slopes        0.98    19.53   0.5     21.85   8.25    20.08   11.87
Clear gender differences are indicated by two variables. First, 'overall duration' shows that female B speakers spoke only about a third as long as their A partners (i.e. they were silent for about 66% of the A speakers' speaking time); male dialogues were longer, and male B speakers spoke more, about 40% as long as their A partners. Female and male speech rates are equal (5.17 syll/sec), but the two female speakers in a dialogue had more similar speech rates, except in Dialogue 3, while male speakers varied more in speech rate. Second, 'overall slope' shows that in the female dialogues the slope is steeper for the B speakers (instruction followers) than for the A speakers, which means that the B speakers slowed down during an utterance. Male speaker slope values are less steep and even negative, suggesting that male speakers sometimes increased their speech tempo during an utterance. Overall slope values for male speakers are more similar within each pair, but 'Mean of slopes' and 'Median of slopes' for female and male dialogues show that speakers in the A and B dialogue roles differ considerably.
4.2. Comparison of female vs. male dialogues
Figure 8 presents various measurements of syllable duration, standard deviation and the nPVI index in dialogues between female and male speakers A and B. The overall mean and median of syllable durations for each group differ a great deal, which suggests that there are many extreme values (either very short syllables in fast speech or long syllables, i.e. hesitations and filled pauses).
Table 8 shows the summary analysis of the dialogues between female and male speakers A and B. The nPVI values (i.e. the overall mean and median) are almost the same for the female speakers, while the male values diverge. The values of standard deviation are higher for A speakers, probably because they changed their speaking style or speed, from very fast speech when giving instructions to very slow, hesitating speech and filled pauses when they could not find the right words to express themselves. The overall intercepts for B speakers are very similar, while the values for A speakers are quite different. However, when looking at the mean and median of the intercepts, the results for female and male A speakers are similar, as are the results for female and male B speakers. The overall slope values for female speakers are very close, while the male values differ, even being negative for A speakers.
Figure 8: Measurements of syllable durations, standard deviation and nPVI index
Table 8: Quantitative results of the analysis of speech of female A and B and male A and B speakers
                                female A  female B    male A    male B
overall duration (ms)             216638     74024    352318    147378
overall min (ms)                      31        39        25        41
overall max (ms)                    1002       607      1680      1577
valid time groups                     90        51       157        93
overall rate/sec                    4.81      5.36      5.25      5.04
Components: global tendencies
overall mean (ms)                 207.91    186.46    190.65    198.36
overall median (ms)                168.5       168       157       163
overall nPVI                          45        45        46        49
overall intercept (ms)            162.62    173.54    206.15    171.67
overall SD (ms)                   126.93     91.44    133.34    125.62
overall slope                       0.09      0.07     −0.01      0.07
Table 9 shows the results of a summary comparison between A speakers and B speakers, as well as between female and male speakers. The results show that A speakers spoke much more than B speakers, and that male speakers spoke more than female speakers. The overall minimum value is smallest for the male A speakers, which is probably caused by fast speech. The overall rate is similar across groups, although it is lowest for the female speakers. The overall mean and median values are similar for speakers A and B, while the difference between female and male speech is larger. In all cases, the females' syllable durations are the longest. The mean and median nPVI values likewise differ less between speakers A and B than between female and male speakers. The mean and median slope values are smallest for the A speakers, indicating that their speech was fast and speeding up towards the end of the utterance. Standard deviation is high for all analysed groups of speakers.
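Several rows of Table 9 can be derived from the four columns of Table 8. Durations and time-group counts add up, minima and maxima take the extreme value, and the rate equals 1000 divided by the mean syllable duration in ms. The Python sketch below makes this explicit. One assumption is loudly flagged: the syllable count per group is not reported, so it is recovered as total duration ÷ mean duration. The standard deviations are pooled as within-group plus between-group variance, treating them as population SDs. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class GroupSummary:
    """One column of Table 8 (durations in ms)."""
    duration_ms: float
    min_ms: float
    max_ms: float
    time_groups: int
    mean_ms: float
    sd_ms: float

    @property
    def n_syllables(self) -> int:
        # Assumption: the count is not in the table, so recover it
        # as total duration / mean syllable duration.
        return round(self.duration_ms / self.mean_ms)


TABLE_8 = {
    "female A": GroupSummary(216638, 31, 1002, 90, 207.91, 126.93),
    "female B": GroupSummary(74024, 39, 607, 51, 186.46, 91.44),
    "male A": GroupSummary(352318, 25, 1680, 157, 190.65, 133.34),
    "male B": GroupSummary(147378, 41, 1577, 93, 198.36, 125.62),
}


def pool(groups: list[GroupSummary]) -> dict[str, float]:
    """Combine per-group summaries into one pooled summary."""
    n = sum(g.n_syllables for g in groups)
    total = sum(g.duration_ms for g in groups)
    grand_mean = total / n
    # Within-group variance plus between-group spread of the group means.
    variance = sum(
        g.n_syllables * (g.sd_ms ** 2 + (g.mean_ms - grand_mean) ** 2)
        for g in groups
    ) / n
    return {
        "duration_ms": total,
        "min_ms": min(g.min_ms for g in groups),
        "max_ms": max(g.max_ms for g in groups),
        "time_groups": sum(g.time_groups for g in groups),
        "mean_ms": grand_mean,
        "sd_ms": sqrt(variance),
        # Syllables per second = 1000 / mean syllable duration (ms).
        "rate_per_sec": 1000 * n / total,
    }


POOLED = {
    "speakers A": pool([TABLE_8["female A"], TABLE_8["male A"]]),
    "speakers B": pool([TABLE_8["female B"], TABLE_8["male B"]]),
    "females": pool([TABLE_8["female A"], TABLE_8["female B"]]),
    "males": pool([TABLE_8["male A"], TABLE_8["male B"]]),
}

for name, row in POOLED.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Run on the Table 8 figures, this sketch reproduces the duration, minimum, maximum, valid time group, mean, SD and rate rows of Table 9 to within rounding. Medians, nPVIs, intercepts and slopes cannot be pooled from summary values in this way; they need the underlying syllable data.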
4.3. Conclusions
The temporal structure of the dialogues indicated a clear difference between female and male dialogues. Female dialogues were shorter, and the speakers' speech did not overlap much, except during misunderstandings and hesitations. Male speakers, in contrast, interacted a lot and interrupted each other, but eventually also accommodated to each other and reduced speech overlap. Such a difference may be caused not only by female-male differences, but also by the specific nature of the task. We suspect that the male speakers felt more comfortable when giving directions on how to get to the place and when following instructions about turning left or right. The dialogue strategies also differed between females and males. While the females did not interrupt each other, the males provided a lot of feedback and interrupted each other. However, in the course of the dialogues, the male speakers aligned their behaviours and no longer started talking before the other speaker had finished. In general, the B speakers slowed down in the course of their utterances, as shown by the high slope values, whereas the slope of the A speakers was much smaller, and was even negative overall for the male A speakers. The standard deviations for all speakers were high, indicating that the speech was vivid and spontaneous.
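The conclusions compare how much the two speakers' speech overlapped, but the exact overlap measure is not specified here. One possible operationalisation is sketched below in Python. It assumes that each speaker's speech is available as a sorted list of non-overlapping (start, end) intervals in seconds, for instance from an annotation tier, and computes the total overlap with a single linear sweep over both lists. The function name `total_overlap` and the toy intervals are hypothetical.

```python
Interval = tuple[float, float]


def total_overlap(a: list[Interval], b: list[Interval]) -> float:
    """Total time (s) during which both speakers are talking.

    Assumes each list is sorted by start time and has no internal overlaps.
    """
    i = j = 0
    overlap = 0.0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end > start:
            overlap += end - start
        # Advance whichever interval finishes first; it cannot overlap again.
        if a[i][1] <= b[j][1]:
            i += 1
        else:
            j += 1
    return overlap


# Hypothetical turns: speaker B starts talking 0.5 s before A has finished.
speaker_a = [(0.0, 2.0), (4.0, 6.0)]
speaker_b = [(1.5, 3.0), (6.5, 7.0)]
print(total_overlap(speaker_a, speaker_b))  # → 0.5
```

Dividing such a total by the dialogue duration would give a proportion that can be compared across dialogues of different lengths.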
Table 9: Summary table: speakers A vs. speakers B and female speakers vs. male speakers
                                Speakers A  Speakers B     females       males
overall duration (ms)               568956      221402      290662      499696
overall min (ms)                        25          39          31          25
overall max (ms)                      1680        1577        1002        1680
valid time groups                      247         144         141         250
overall rate/sec                      5.08        5.15        4.95        5.19
Components: global tendencies
overall mean (ms)                   196.87      194.21      201.99      192.86
overall median (ms)                    161         164         168         159
overall nPVI                            45          47          45          47
mean of nPVIs                           47          50          45          50
median of nPVIs                         41        44.5          38          44
overall intercept (ms)              209.25      173.21      164.41      203.28
overall SD (ms)                     131.33      115.02       118.6      131.22
overall slope                            0        0.04        0.05           0
mean of slopes                       23.28       52.84       33.87       34.34
median of slopes                      4.94       20.78          10        7.93
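Table 9 reports both an overall nPVI and the mean and median of nPVIs. A plausible reading, which is an assumption rather than something stated here, is the following: the overall value is computed over the whole syllable sequence, while the mean and median summarise values computed separately for each time group. Under that reading the figures need not agree, because the overall value also includes pairs of syllables that span time-group boundaries. The Python sketch below, with invented time groups, illustrates the difference; the same distinction would apply to "overall slope" versus "mean of slopes".

```python
from statistics import mean, median


def npvi(durations: list[float]) -> float:
    """Normalised Pairwise Variability Index of consecutive durations."""
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


# Hypothetical time groups (syllable durations in ms).
time_groups = [
    [200, 100, 200, 100],  # strongly alternating group
    [150, 150, 150],       # perfectly even group
    [120, 300],            # short group ending in a lengthened syllable
]

# One nPVI per time group (groups with fewer than two syllables are skipped).
per_group = [npvi(g) for g in time_groups if len(g) >= 2]
# One nPVI over all syllables, including pairs across group boundaries.
overall = npvi([d for g in time_groups for d in g])

print("overall nPVI:", round(overall, 1))
print("mean of nPVIs:", round(mean(per_group), 1))
print("median of nPVIs:", round(median(per_group), 1))
```

With these toy groups the three figures differ noticeably, which is why it matters which of them is compared across speaker groups.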
5. Summary and outlook
Both the study of the literature and the original research reported in this paper reveal a wide variety of fruitful methodologies which have been, and continue to be, deployed in the study of speech timing. On the one hand, the Classification and Regression Trees (CART) studies make very clear how complex it is to identify valid paradigmatic timing properties by means of contextual factors. On the other hand, the need to examine the syntagmatic structures of linearity, alternation and hierarchy has also been demonstrated. Finally, the options for interpreting duration patterning at the discourse level from a functional point of view are clear.
The results of the various timing analysis methods can be used in a range of application scenarios. One very common scenario, which cannot be dealt with here, is computational support for foreign language learning and proficiency testing, through objective comparison of the duration properties of native speaker and foreign language speaker timing patterns. An open question concerns the potential of using the results of perception-based studies to support the characterisation of long-term features of speech and speakers. These are ongoing research fields. Another scenario, to which the present study is closely related, is speech technology and dialogue system design. In this scenario, not only the paradigmatic and syntagmatic properties of timing patterns are useful, but also the sociolinguistic patterns which emerge from dialogue corpus studies. The female-male differences showed that different dialogue strategies could be implemented in a dialogue system when interacting with females or males. However, much further sociolinguistic research on the reasons for these differences is necessary, and it would not be advisable to apply these descriptive results without careful consideration of these reasons.
References
Arnold, Denis & Wagner, Petra & Möbius, Bernd. 2011. Evaluating different rating scales for obtaining judgments of syllable prominence from naive listeners. In Proceedings of the XVIIth International Congress of Phonetic Sciences, 253–255. Hong Kong.
Auran, Cyril & Bouzon, Caroline & Hirst, Daniel. 2004. The Aix-MARSEC project: An evolutive database of spoken English. In Bel, Bernard & Marlien, Isabelle (eds.), Proceedings of the Second International Conference on Speech Prosody, 561–564. Nara, Japan.
Bachan, Jolanta. 2011. Communicative alignment of synthetic speech. Poznań: Adam Mickiewicz University in Poznań. (Doctoral dissertation.)
Barbosa, Plinio. 2009. Measuring speech rhythm variation in an oscillator-based framework. In Proceedings of Interspeech 2009. Brighton: International Speech Communication Association.
Breiman, Leo & Friedman, Jerome & Olshen, R. A. & Stone, Charles. 1984. Classification and regression trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software.
Buchsbaum, Adam L. & van Santen, Jan P. H. 1997. Methods for optimal text selection. In Proceedings of the 5th European Conference on Speech Communication and Technology, Vol. 2, 553–556. Rhodes, Greece.
Campbell, Nick. 1992. Multi-level timing in speech. Brighton, UK: University of Sussex (Exp. Psychol.). (Doctoral dissertation.)
Carson-Berndsen, Julie. 1998. Time map phonology: Finite state models and event logics in speech recognition. Dordrecht: Kluwer Academic Publishers.
Cummins, Fred. 1999. Some lengthening factors in English speech combine additively at most rates. The Journal of the Acoustical Society of America 105. 476–480.
Dechert, Hans W. & Raupach, Manfred (eds.). Temporal Variables in Speech. Studies in Honour of Frieda Goldman-Eisler. The Hague: Mouton.
Demenko, Grażyna & Klessa, Katarzyna & Szymański, Marcin & Breuer, Stefan & Hess, Wolfgang. 2010. Polish unit selection speech synthesis with BOSS: Extensions and speech corpora. International Journal of Speech Technology 13(2). 85–99.
Everitt, Brian S. & Landau, Sabine & Leese, Morven & Stahl, Daniel. 2011. Cluster Analysis, 5th Edition. King's College, London: John Wiley & Sons.
Gibbon, Dafydd. 1992. Prosody, time types, and linguistic design factors in spoken language system architectures. In Proceedings of KONVENS 1992, 90–99.
Gibbon, Dafydd. 2003. Computational modelling of rhythm as alternation, iteration and hierarchy. In Proceedings of the International Congress of Phonetic Sciences III, 2489–2492. Barcelona.
Gibbon, Dafydd. 2006. Time types and time trees: Prosodic mining and alignment of temporally annotated data. In Sudhoff, Stefan et al. (eds.), Methods in Empirical Prosody Research, 281–209. Berlin: Walter de Gruyter.
Gibbon, Dafydd. 2013. TGA: A web tool for time group analysis. In Proceedings of Tools and Resources for the Analysis of Speech Prosody (TRASP). Aix-en-Provence.
Gibbon, Dafydd & Fernandes, Flaviane Romani. 2005. Annotation-mining for rhythm model comparison in Brazilian Portuguese. In Proceedings of Interspeech 2005, 3289–3292.
Gibbon, Dafydd & Hirst, Daniel & Campbell, Nick (eds.). 2012. Rhythm, melody and harmony in speech. Studies in honour of Wiktor Jassem. Speech and Language Technology 14/15. Poznań.
Grosjean, François H. & Lass, Norman J. 1977. Some factors affecting the listener's perception of reading rate in English and French. Language and Speech 20(3). 198–208.
Gut, Ulrike. 2012. Rhythm in L2 speech. In Gibbon, Dafydd & Hirst, Daniel & Campbell, Nick (eds.), Rhythm, melody and harmony in speech. Studies in honour of Wiktor Jassem. Speech and Language Technology 14/15, 105–114. Poznań.
't Hart, Johan & Collier, René & Cohen, Antonie. 1990. A Perceptual Study of Intonation: An Experimental-Phonetic Approach to Speech Melody. Cambridge: Cambridge University Press.
Hirst, Daniel & Di Cristo, Albert (eds.). 1998. Intonation Systems. A Survey of Twenty Languages. Cambridge: Cambridge University Press.
Inden, Benjamin & Malisz, Zofia & Wagner, Petra & Wachsmuth, Ipke. 2012. Rapid entrainment to spontaneous speech: A comparison of oscillator models. In Miyake, N. & Peebles, D. & Cooper, R. P. (eds.), Proceedings of the 34th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.
Jassem, Wiktor. 2003. IPA: Polish. Journal of the International Phonetic Association 33(1). 103–107.
Jassem, Wiktor & Krzyśko, Mirosław & Stolarski, Przemysław. 1981. Regression model of isochrony in speech signal. IPPT PAN 33. Warszawa.
Jassem, Wiktor & Hill, David R. & Witten, Ian H. 1984. Isochrony in English speech: Its statistical validity and linguistic relevance. In Gibbon, Dafydd & Richter, Helmut (eds.), Intonation, accent and rhythm. Studies in Discourse Phonology 8, 203–225.
King, Simon & Portele, Thomas & Höfer, Florian. 1997. Speech synthesis using non-uniform units in the Verbmobil project. In Proceedings of Eurospeech, Vol. 2, 569–572. Rhodes.
King, Simon & Black, Alan W. & Taylor, Paul & Caley, Richard & Clark, Rob. 2003. Edinburgh Speech Tools. System Documentation Edition 1.2, for 1.2.3, 24th Jan 2003. (Retrieved from: http://www.cstr.ed.ac.uk/projects/speech_tools/manual-1.2.0 on 27 April 2013.)
Klatt, Dennis H. 1976. Linguistic uses of segmental duration in English: Acoustic and perceptual evidence. The Journal of the Acoustical Society of America 59. 1208–1221.
Klatt, Dennis H. 1987. Review of text-to-speech conversion for English. The Journal of the Acoustical Society of America 82(3). 737–793.
Klessa, Katarzyna & Szymański, Marcin & Breuer, S. & Demenko, Grażyna. 2007. Optimization of Polish segmental duration prediction with CART. In Proceedings of the 6th ISCA Workshop on Speech Synthesis (SSW-6), Vol. 1. Bonn.
Klessa, Katarzyna & Wagner, Agnieszka & Oleśkowicz-Popiel, Magdalena & Karpiński, Maciej. 2013. "Paralingua": A new speech corpus for the studies of paralinguistic features. In Vargas-Sierra, Chelo (ed.), Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Selected Papers from the 5th International Conference on Corpus Linguistics (CILC2013). Procedia – Social and Behavioral Sciences, Vol. 95, 48–58.
Koreman, Jacques. 2006. Perceived speech rate: The effects of articulation rate and speaking style in spontaneous speech. The Journal of the Acoustical Society of America 119. 582–596.
Lehiste, Ilse. 1970. Suprasegmentals. Cambridge, Massachusetts & London: M.I.T. Press.
Lehiste, Ilse. 1977. Isochrony reconsidered. Journal of Phonetics 5.
Low, Ee Ling & Grabe, Esther & Nolan, Francis. 2001. Quantitative characterisations of speech rhythm: Syllable-timing in Singapore English. Language and Speech 43(4). 377–401.
Łobacz, Piotra. 1976a. Objective and subjective speech tempo in Polish. Speech Analysis and Synthesis 4. 173–186.
Łobacz, Piotra. 1976b. Speech rate and vowel formants. Speech Analysis and Synthesis 4. 187–218.
Möbius, Bernd & van Santen, Jan P. H. 1996. Modeling segmental duration in German text-to-speech synthesis. In Spoken Language, 1996. Proceedings of ICSLP, Vol. 4, 2395–2398. Philadelphia, PA: IEEE.
Möbius, Bernd. 2001. Rare events and closed domains: Two delicate concepts in speech synthesis. In 4th ISCA ITRW on Speech Synthesis. Perthshire.
Moers, Donata & Jauk, Igor & Möbius, Bernd & Wagner, Petra. 2010. Synthesizing fast speech by implementing multi-phone units in unit selection speech synthesis. In Proceedings of the 7th ISCA Tutorial and Research Workshop on Speech Synthesis (SSW-7).
Moos, Anja & Trouvain, Jürgen. 2007. Comprehension of ultra-fast speech: Blind vs. "normally hearing" persons. In Proceedings of the 16th International Congress of Phonetic Sciences, 677–680.
Olaszy, Gábor. 2002. Predicting Hungarian sound durations for continuous speech. Acta Linguistica Hungarica 49(3–4). 321–345.
O'Shaughnessy, Douglas. 1984. A multispeaker analysis of duration in read French paragraphs. The Journal of the Acoustical Society of America 76(6). 1664–1672.
Pfitzinger, Hartmut R. 1996. Two approaches to speech rate estimation. In Proceedings SST, Vol. 96, 421–426.
Portele, Thomas & Sendlmeier, Walter & Hess, Wolfgang. 1990. A system for German speech synthesis based on demisyllables, diphones, and suffixes. In ESCA Workshop on Speech Synthesis, Autrans, 161–164.
Richter, Lutosława. 1973. The duration of Polish vowels. Speech Analysis and Synthesis 3. 87–115. Warszawa.
Richter, Lutosława. 1974. Porównanie iloczasu samogłosek polskich wymówionych w logatomach oraz w wyrazach [A comparison of the duration of Polish vowels pronounced in logatomes and in words]. Biuletyn Polskiego Towarzystwa Fonetycznego 32. 173–178.
Richter, Lutosława. 1987. Modelling of the rhythmic structure of utterances in Polish. Studia Phonetica Posnaniensia 1. 91–125.
Roach, Peter. 1982. On the distinction between "stress-timed" and "syllable-timed" languages. In Crystal, David (ed.), Linguistic Controversies: Essays in Linguistic Theory and Practice, 73–79. London: Edward Arnold.
Scott, Donia R. & Isard, S. D. & de Boysson-Bardies, Bénédicte. 1986. On the measurement of rhythmic irregularity: A reply to Benguerel. Journal of Phonetics 14. 327–330.
Siegler, Matthew A. & Stern, Richard M. 1995. On the effects of speech rate in large vocabulary speech recognition systems. In International Conference on Acoustics, Speech, and Signal Processing 1995 (ICASSP-95), Vol. 1, 612–615.
Syrdal, Ann K. & Bunnell, Timothy & Hertz, Susan R. & Mishra, Taniya & Spiegel, Murray & Bickley, Corine & Rekart, Deborah & Makashay, Matthew J. 2012. Text-to-speech intelligibility across speech rates. In Proceedings of Interspeech. Portland, Oregon.
Szymański, Marcin & Klessa, Katarzyna & Breuer, Stefan & Demenko, Grażyna. 2011. Optimization of unit selection speech synthesis. In Proceedings of the XVIIth International Congress of Phonetic Sciences, 1930–1933. Hong Kong.
Treiblmaier, Horst & Filzmoser, Peter. 2009. Benefits from using continuous rating scales in online survey research. Technische Universität Wien, Forschungsbericht SM-2009-4.
Vainio, Martti. 2001. Artificial neural network based prosody models for Finnish text-to-speech synthesis. Helsinki: University of Helsinki. (Doctoral dissertation.)
van Santen, Jan P. H. 1993. Quantitative modeling of segmental duration. In Proceedings of the Workshop on Human Language Technology, 323–328. Association for Computational Linguistics.
Wagner, Petra & Windmann, Andreas. 2011. The shrinking effects on speech tempo perception. In Proceedings of the XVIIth International Congress of Phonetic Sciences, 2082–2085. Hong Kong.
Zee, Eric. 2002. The effect of speech rate on the temporal organization of syllable production in Cantonese. In Proceedings of Speech Prosody. Aix-en-Provence.