straight
![Page 1: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/1.jpg)
9-10 August, 2002 Computational Audition 1
Fixed Point Representations for Very High-Quality Speech and Sound Modification System
Hideki Kawahara
Wakayama University, Japan
![Page 2: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/2.jpg)
Summary
- A functional approach (computational, after Marr) is important and productive.
- Fixed points provide feature values as well as their reliability indices.
  - Using within-channel information
- The fixed point concept may provide a clue for integrating the Fourier-based concept and the wavelet-Mellin transform-based concept.
Reference system
“Local” center of gravity
![Page 3: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/3.jpg)
STRAIGHT: demo
original
![Page 4: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/4.jpg)
STRAIGHT demo: morphing
neutral, angry
extrapolation, interpolation, extrapolation
Word: /hai/ (“Yes” in Japanese)
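The morphing demo moves between a neutral and an angry rendition of the same word and also goes beyond either endpoint by extrapolation. A minimal sketch of that idea follows. It assumes the two utterances are already time-aligned and that positive parameters such as F0 are blended linearly in the log domain; both are assumptions made for illustration, not details given on the slide. The function `morph` and the sample F0 values are hypothetical.

```python
import numpy as np

def morph(param_a, param_b, rate):
    """Blend two strictly positive parameter arrays in the log domain.

    rate = 0 returns param_a, rate = 1 returns param_b,
    0 < rate < 1 interpolates, and rate outside [0, 1] extrapolates.
    """
    log_a = np.log(np.asarray(param_a, dtype=float))
    log_b = np.log(np.asarray(param_b, dtype=float))
    return np.exp((1.0 - rate) * log_a + rate * log_b)

# Hypothetical, already time-aligned F0 contours in Hz.
f0_neutral = np.array([200.0, 210.0, 190.0])
f0_angry = np.array([260.0, 300.0, 240.0])

for rate in (-0.5, 0.0, 0.5, 1.0, 1.5):
    print(rate, morph(f0_neutral, f0_angry, rate))
```

A single scalar `rate` keeps the sketch short; the same blend could be applied per parameter type if different attributes should move at different speeds.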
![Page 5: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/5.jpg)
Background
- “Auditory Brain” Project by CREST
  - Short-term goal: speech processing systems based on functional models of auditory functions.
  - STRAIGHT: a very high-quality speech manipulation system
  - Fixed point based algorithms (an alternative way of dimensional reduction of auditory representations)
  - Long-term goal: “computational” theories of audition
- Frustrations with how auditory models are used in ASR and how speech processing systems are evaluated.
![Page 6: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/6.jpg)
“Auditory Brain” Project
- To develop a very high quality speech and/or sound manipulation system that is based on perceptually relevant parameters and does not preserve phase/waveform information.
  - Are distance-based quality measures relevant?
  - Why do periodic sounds sound smoother and richer (in the auditory fovea)?
  - Is it relevant to test highly nonlinear speech perception using elementary sounds?
![Page 7: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/7.jpg)
Why high quality?
- Ecological approach for investigating a highly nonlinear system: the human
Not necessarily predictable from elementary test signals
Necessary to use ecologically valid stimuli
Naturalness
![Page 8: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/8.jpg)
Key issue: compatibility
Hans Moravec: Robot, 2000, Oxford
xbox
iMac
PlayStation-3
Background figure is removed. Please visit Hans Moravec’s page for the original figure: Faster than exponential growth in computing power (Chapter 3: Power and Presence, page 60). http://www.frc.ri.cmu.edu/~hpm/book98/
![Page 9: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/9.jpg)
“Auditory Brain” Project
- Computational theories of speech/auditory perception
  - Ecological constraints on evolution
  - It cannot be ad hoc.
    - When there is an elegant and reasonable algorithm and it does not violate ecological (biological and environmental) constraints, there is no reason to deny that the algorithm shares the common underlying principles with our auditory system.
![Page 10: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/10.jpg)
“Auditory Brain” Project
- Computational theories of speech/auditory perception
  - Periodicity: time-frequency sampling grid
  - Periodicity: stable reference point for wavelet-Mellin transform
  - Log-linear frequency axis
    - Wavelet-Mellin transform: shape and size
  - Why two ears? ICA
  - Long-term correlation (structure)
    - ASR, music
![Page 11: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/11.jpg)
STRAIGHT: a core technology
- Conceptually simple architecture
  - Channel VOCODER
  - Source filter model
- Graded parameters (vs. binary decision)
  - Sensitivity analysis
  - Morphing
- Reliability / Transparency
  - No post-processing
  - Weakly constrained model
![Page 12: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/12.jpg)
STRAIGHT: architecture
Structure of STRAIGHT
![Page 13: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/13.jpg)
STRAIGHT: a core technology
- Conceptually simple architecture
  - Channel VOCODER
  - Source filter model
- Graded parameters (vs. binary decision)
  - Sensitivity analysis
  - Morphing
- Reliability / Transparency
  - No post-processing
  - Weakly constrained model
![Page 14: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/14.jpg)
STRAIGHT: structure
Structure of STRAIGHT
Spectral envelope estimation
![Page 15: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/15.jpg)
Weakly constrained spectral envelope estimation
waveform
Time window
Interferences in the time domain
Interferences in the frequency domain
![Page 16: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/16.jpg)
Weakly constrained spectral envelope estimation
Reduction of edge discontinuity
Reduction of periodicity interference
Smoothing by spline basis
Composite window
![Page 17: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/17.jpg)
Time-frequency smoothing (current implementation)
F0 synchronous Gaussian window
complementary time window
reduced interference spectrum
![Page 18: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/18.jpg)
Compensation of over-smoothing
![Page 19: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/19.jpg)
Compensation of over-smoothing
![Page 20: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/20.jpg)
Weakly constrained spectral envelope estimation
![Page 21: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/21.jpg)
Weakly constrained spectral envelope estimation
![Page 22: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/22.jpg)
Fixed point based algorithms
- Fixed points in the frequency domain → F0 extraction
- Fixed points in the time domain → Excitation extraction
![Page 23: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/23.jpg)
Fixed point of mapping
[Figure: fixed point of the mapping y = f(x); axes x and y]
Examples
* Instantaneous frequency of a filter output around a sinusoidal component
* Energy centroid of a windowed signal around an event
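The first example, the mapping from filter center frequency to the instantaneous frequency of the filter output, can be sketched numerically. The code below is an illustration under stated assumptions, not the STRAIGHT implementation: the Gabor-type filter, its width (`cycles`), the 4-sigma truncation, the test signal, and the helper names `if_map` and `stable_fixed_points` are all hypothetical choices. A fixed point is taken where the mapped value equals the center frequency and the difference crosses zero from above.

```python
import numpy as np

def if_map(x, fs, centers, n0, cycles=3.0):
    """Instantaneous frequency (Hz) at sample n0 for each filter center frequency."""
    out = np.empty(len(centers))
    for i, fc in enumerate(centers):
        sigma = cycles / (2.0 * fc)            # Gaussian std in seconds (assumed design)
        half = int(4.0 * sigma * fs)           # truncate the envelope at 4 sigma
        k = np.arange(-half, half + 1)
        h = np.exp(-0.5 * (k / (sigma * fs)) ** 2) * np.exp(2j * np.pi * fc * k / fs)
        y0 = np.dot(x[n0 - k], h)              # filter output at sample n0
        y1 = np.dot(x[n0 + 1 - k], h)          # filter output one sample later
        out[i] = np.angle(y1 * np.conj(y0)) * fs / (2.0 * np.pi)
    return out

def stable_fixed_points(centers, mapped):
    """Points where mapped(center) - center crosses zero from positive to non-positive."""
    g = mapped - centers
    idx = np.where((g[:-1] > 0) & (g[1:] <= 0))[0]
    frac = g[idx] / (g[idx] - g[idx + 1])      # linear interpolation of the crossing
    return centers[idx] + frac * (centers[idx + 1] - centers[idx])

fs = 8000.0
t = np.arange(int(0.2 * fs)) / fs
# Hypothetical test signal: three harmonics of 220 Hz.
x = sum(np.sin(2 * np.pi * 220.0 * m * t) for m in (1, 2, 3))

centers = np.arange(150.0, 701.0, 1.0)
fps = stable_fixed_points(centers, if_map(x, fs, centers, n0=len(x) // 2))
print(fps)
```

Only downward crossings are kept because the mapping is nearly flat around a dominant sinusoid, so the difference between mapped and center frequency falls through zero there; crossings in the other direction occur between components and are discarded.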
![Page 24: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/24.jpg)
Averaging and fixed point
- Prominent component
[Figure labels: window locations, average of windowed value, fixed point, background]
![Page 25: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/25.jpg)
Averaging and fixed point
- Prominent component
[Figure labels: window locations, average of windowed value, fixed point, background]
Parameters (position, slope, [level])
![Page 26: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/26.jpg)
STRAIGHT: structure
Structure of STRAIGHT
F0 estimation
![Page 27: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/27.jpg)
Window selection for reliable representation of mapping
Refinement of F0 synchronous windows
![Page 28: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/28.jpg)
Window with harmonic cancellation
![Page 29: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/29.jpg)
Fixed-point-based sinusoidal components extraction
![Page 30: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/30.jpg)
Fixed-point-based sinusoidal frequency and C/N estimation
C/N information enables optimum F0 estimation based on multiple harmonic components
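One way to read "optimum F0 estimation based on multiple harmonic components" is as an inverse-variance weighted average of the per-harmonic estimates. The sketch below assumes that the frequency error variance of each component is proportional to the inverse of its C/N (expressed as a linear power ratio), so that the estimate f_k / k has variance proportional to 1 / (k^2 · C/N). That error model and the function name `combine_harmonics` are assumptions for illustration; the slide does not give the weighting actually used.

```python
import numpy as np

def combine_harmonics(freqs_hz, harmonic_numbers, cn_ratios):
    """Weighted F0 estimate from harmonic frequencies and their C/N (linear power ratios).

    Assumed error model: var(f_k) is proportional to 1 / cn_k, hence
    var(f_k / k) is proportional to 1 / (k**2 * cn_k) and the
    inverse-variance weight of harmonic k is k**2 * cn_k.
    """
    f = np.asarray(freqs_hz, dtype=float)
    k = np.asarray(harmonic_numbers, dtype=float)
    cn = np.asarray(cn_ratios, dtype=float)
    weights = k ** 2 * cn
    return np.sum(weights * f / k) / np.sum(weights)

# Hypothetical measurements: the noisy first harmonic is slightly off.
print(combine_harmonics([221.0, 440.0], [1, 2], [1.0, 1.0e6]))
```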
![Page 31: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/31.jpg)
Approximate estimation of C/N
![Page 32: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/32.jpg)
Reliable built-in mechanism for fundamental component selection
linear filter arrangement
log-linear filter arrangement
mapping filter output
![Page 33: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/33.jpg)
Fixed points on C/N map
Fundamental component
![Page 34: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/34.jpg)
F0 evaluation based on EGG
gross error
W/O: 0.72%
with: 0.32%
female
![Page 35: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/35.jpg)
Graded source information
- Fixed point based F0 extraction (with C/N map)
F0 trajectories (resolution: 1/F0)
C/N for each fixed point
Graded aperiodicity information is also extracted
![Page 36: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/36.jpg)
Fixed points in the time domain
- How to define auditory temporal events
  - Localized energy centroid
Alternative representation
![Page 37: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/37.jpg)
Fixed points in the time domain
Squared whitened signal
Energy centroid
Gaussian window
Amount of energy concentration
Speech waveform
![Page 38: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/38.jpg)
Fixed point based event detection
waveform
Energy centroid
Window center
Fixed points
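The event detection idea can be sketched as a mapping from the window center to the energy centroid of the Gaussian-windowed energy, with events reported where the centroid coincides with the window center. The code below is a simplified illustration: it skips the whitening step named on the earlier slide and uses a synthetic energy signal made of three narrow pulses. The window width, the pulse shape, and the helper names are assumptions, not values from the talk.

```python
import numpy as np

def centroid_map(energy, t, centers, sigma):
    """Energy centroid (s) of the Gaussian-windowed energy for each window center."""
    w = np.exp(-0.5 * ((t[None, :] - centers[:, None]) / sigma) ** 2)
    weighted = w * energy[None, :]
    return (weighted * t[None, :]).sum(axis=1) / weighted.sum(axis=1)

def stable_fixed_points(centers, mapped):
    """Window centers where centroid - center crosses zero from positive to non-positive."""
    g = mapped - centers
    idx = np.where((g[:-1] > 0) & (g[1:] <= 0))[0]
    frac = g[idx] / (g[idx] - g[idx + 1])      # linear interpolation of the crossing
    return centers[idx] + frac * (centers[idx + 1] - centers[idx])

fs = 16000.0
t = np.arange(640) / fs                        # 40 ms of signal
event_times = np.array([0.010, 0.020, 0.030])  # hypothetical excitation instants (s)
# Synthetic squared signal: one narrow Gaussian energy pulse per event.
energy = sum(np.exp(-0.5 * ((t - te) / 0.0003) ** 2) for te in event_times)

centers = t[80:561]                            # window centers from 5 ms to 35 ms
events = stable_fixed_points(centers, centroid_map(energy, t, centers, sigma=0.002))
print(events)
```

Midway between two pulses the centroid also equals the window center, but the difference crosses zero upward there, so those points are rejected by the direction test.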
![Page 39: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/39.jpg)
Definition of event in the time domain
Mean time
duration
Event location
Windowed whitened signal
![Page 40: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/40.jpg)
Windowed event location and the original event location
Gaussian window
Approximation of envelope
Windowed location
Original location
Window location
![Page 41: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/41.jpg)
Duration can be estimated from the geometrical parameter at the fixed point
Slope at fixed point
duration
Window parameter
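The relation between slope and duration can be worked out in closed form under a simplifying assumption: the energy envelope of the event is itself Gaussian, with location t_e and duration σ_e, and it is observed through a Gaussian window with center t_w and width σ_w. These symbols are introduced here for illustration and do not appear on the slide.

```latex
\begin{align}
e(t) &\propto \exp\!\left(-\frac{(t-t_e)^2}{2\sigma_e^2}\right), &
w(t-t_w) &= \exp\!\left(-\frac{(t-t_w)^2}{2\sigma_w^2}\right), \\
\bar{t}(t_w) &= \frac{\int t\, w(t-t_w)\, e(t)\, dt}{\int w(t-t_w)\, e(t)\, dt}
  = \frac{\sigma_w^2\, t_e + \sigma_e^2\, t_w}{\sigma_e^2 + \sigma_w^2}, \\
\bar{t}(t_w) = t_w \;&\Longleftrightarrow\; t_w = t_e, \\
s \equiv \frac{d\bar{t}}{dt_w} &= \frac{\sigma_e^2}{\sigma_e^2 + \sigma_w^2}
  \quad\Longrightarrow\quad
  \sigma_e = \sigma_w \sqrt{\frac{s}{1-s}} .
\end{align}
```

Under this model the fixed point sits at the event location, and the slope s of the mapping at the fixed point, together with the known window width σ_w, gives the event duration σ_e. A very short event gives a slope near 0, and an event much longer than the window gives a slope near 1.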
![Page 42: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/42.jpg)
Equivalence between the time domain definition and the frequency domain definition
waveform
Time domain definition
Frequency domain definition
Group delay
![Page 43: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/43.jpg)
Inverse problem: Where is the excitation?
Minimum phase response
Event as the energy centroid
Excitation (impulse)
compensation
![Page 44: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/44.jpg)
Equivalence in definitions
- Frequency domain definition of the event location
- Assuming causality
Group delay
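The equations of this slide did not survive extraction. A standard identity that expresses the equivalence is that the energy centroid of a signal in time equals its energy-weighted average group delay. The notation below (x(t), X(ω), φ(ω), τ_g(ω)) is assumed here, with X(ω) = ∫ x(t) e^{-jωt} dt, and may differ from the notation used in the talk.

```latex
\begin{align}
\bar{t}
 &= \frac{\int t\,|x(t)|^{2}\,dt}{\int |x(t)|^{2}\,dt}
  = \frac{\int \tau_g(\omega)\,|X(\omega)|^{2}\,d\omega}{\int |X(\omega)|^{2}\,d\omega},
 \\
\tau_g(\omega) &= -\frac{d\phi(\omega)}{d\omega},
 \qquad X(\omega) = |X(\omega)|\,e^{j\phi(\omega)} .
\end{align}
```

The identity follows from Parseval's relation and the correspondence t·x(t) ↔ j dX(ω)/dω: the term containing d|X|/dω integrates to zero for a finite-energy signal, which leaves only the phase-derivative term.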
![Page 45: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/45.jpg)
Group delay of a minimum phase response
Computation via the cepstrum
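The cepstral route to the minimum phase group delay can be sketched as follows: take the real cepstrum of the log magnitude spectrum, fold it onto non-negative quefrencies, and use the fact that the group delay of the resulting minimum phase response is the real part of the Fourier transform of n·ĉ[n]. This is the textbook homomorphic construction, shown as an illustration; the function name and the one-pole test filter are assumptions, and the talk's own implementation may differ.

```python
import numpy as np

def min_phase_group_delay(magnitude):
    """Group delay (in samples) of the minimum phase response with the given magnitude.

    `magnitude` is sampled on a full FFT grid of even length N
    (bins 0 .. N-1) and must be strictly positive.
    """
    n_fft = len(magnitude)
    cep = np.fft.ifft(np.log(magnitude)).real     # real cepstrum of log|H|
    folded = np.zeros(n_fft)                      # causal (minimum phase) cepstrum
    folded[0] = cep[0]
    folded[1:n_fft // 2] = 2.0 * cep[1:n_fft // 2]
    folded[n_fft // 2] = cep[n_fft // 2]
    quefrency = np.arange(n_fft)
    # log H(w) = sum_n folded[n] exp(-j w n), so -d(phase)/dw = sum_n n folded[n] cos(w n).
    return np.fft.fft(quefrency * folded).real

# Check against a one-pole filter H(z) = 1 / (1 - a z^-1), which is minimum phase.
n_fft, a = 1024, 0.8
w = 2.0 * np.pi * np.arange(n_fft) / n_fft
magnitude = 1.0 / np.abs(1.0 - a * np.exp(-1j * w))
tau = min_phase_group_delay(magnitude)
expected = (a * np.cos(w) - a ** 2) / (1.0 - 2.0 * a * np.cos(w) + a ** 2)
print(np.max(np.abs(tau - expected)))
```

The FFT length has to be large enough that the cepstrum has decayed before the folding point; otherwise quefrency aliasing distorts the result.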
![Page 46: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/46.jpg)
Compensation based on minimum phase group delay
Observed group delay
Causal group delay
Compensated event location
Compensated event duration
![Page 47: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/47.jpg)
example
Observed group delay
Minimum phase group delay
Compensated group delay
![Page 48: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/48.jpg)
Excitation estimation based on fixed point based event detection
Event based concentration
Excitation based concentration
Energy centroid
Compensated group delay
excitation
Vocal fold closure
![Page 49: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/49.jpg)
Excitation extraction accuracy
Standard deviation
![Page 50: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/50.jpg)
Multiple resolution display of events (fixed points)
Estimated excitation
Speech waveform
![Page 51: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/51.jpg)
Multiple resolution display of events (fixed points)
demo
Fixed points due to one excitation align on a straight line
![Page 52: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/52.jpg)
Phase map of wavelet transform
![Page 53: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/53.jpg)
Instantaneous frequency based fixed points
![Page 54: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/54.jpg)
Instantaneous frequency based fixed points
![Page 55: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/55.jpg)
Group delay based fixed points
![Page 56: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/56.jpg)
Group delay based fixed points
![Page 57: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/57.jpg)
STRAIGHT: structure
Structure of STRAIGHT
Source attribute control
![Page 58: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/58.jpg)
Group delay manipulated mixed-mode excitation source
group delay asymmetry
impulse response
...provides continuous coverage from pulse train to random noise
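The continuum from pulse to noise can be illustrated with an all-pass pulse whose phase is obtained by integrating a random group delay. With zero group delay spread the result is a unit impulse; as the spread grows, the same energy is smeared in time and the pulse becomes noise-like. The sketch below uses white Gaussian group delay with a single spread parameter, which is an assumption made for brevity; the group delay design used in STRAIGHT is not given on the slide and is not reproduced here. The name `shaped_pulse` is hypothetical.

```python
import numpy as np

def shaped_pulse(n_fft, gd_std_samples, rng):
    """One excitation pulse with flat magnitude spectrum and random group delay.

    gd_std_samples is the standard deviation of the group delay in samples;
    0 gives a unit impulse, large values give a noise-like pulse.
    """
    n_bins = n_fft // 2 + 1
    group_delay = gd_std_samples * rng.standard_normal(n_bins)   # in samples
    d_omega = 2.0 * np.pi / n_fft
    phase = -np.cumsum(group_delay) * d_omega     # phase = -integral of group delay
    spectrum = np.exp(1j * phase)                 # all-pass: unit magnitude everywhere
    spectrum[0] = 1.0                             # DC and Nyquist bins must be real
    spectrum[-1] = 1.0
    return np.fft.irfft(spectrum, n=n_fft)

rng = np.random.default_rng(0)
for spread in (0.0, 2.0, 20.0, 200.0):
    pulse = shaped_pulse(1024, spread, rng)
    print(spread, np.max(np.abs(pulse)), np.sum(pulse ** 2))
```

Because the magnitude spectrum stays flat, the pulse energy is the same for every spread; only the temporal concentration changes, which is what lets a single parameter sweep between pulse-like and noise-like excitation.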
![Page 59: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/59.jpg)
Summary
- A functional approach (computational, after Marr) is important and productive.
- Fixed points provide feature values as well as their reliability indices.
  - Using within-channel information
- The fixed point concept may provide a clue for integrating the Fourier-based concept and the wavelet-Mellin transform-based concept.
![Page 60: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/60.jpg)
Colleagues
- Haruhiro Katayose, Toshio Irino, Takanobu Nishiura (Wakayama Univ.)
- Minoru Tsuzaki, Hideki Iwasawa (ATR)
- Parham Zolfaghari (NTT)
- Kiyohiro Shikano, Hiroshi Saruwatari (NAIST)
- Fumitada Itakura, Kazuya Takeda, Shoji Kajita, Hideki Banno (CIAIR, Nagoya Univ.)
- Masato Akagi, Masashi Unoki (JAIST)
- Seiichi Nakagawa (Toyohashi Inst. Tech)
- Shigeki Sagayama, Nobuaki Minematsu (Univ. Tokyo)
- Diane Kewley-Port (Indiana Univ. USA)
- Osamu Fujimura (Ohio State Univ. USA)
- Alain de Cheveigné (IRCAM, France)
- Roy D. Patterson (CNBH, UK)
![Page 61: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/61.jpg)
References
- Hideki Kawahara, Ikuyo Masuda-Katsuse and Alain de Cheveigné: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, Speech Communication, 27, pp.187-207 (1999).
- Hideki Kawahara, Haruhiro Katayose, Alain de Cheveigné, Roy D. Patterson: Fixed Point Analysis of Frequency to Instantaneous Frequency Mapping for Accurate Estimation of F0 and Periodicity, Proc. EUROSPEECH'99, Volume 6, pp.2781-2784 (1999).
- Hideki Kawahara, Yoshinori Atake and Parham Zolfaghari: Accurate vocal event detection method based on a fixed-point to weighted average group delay, ICSLP-2000, Beijing, pp.664-667 (2000).
- H. Kawahara and P. Zolfaghari: Systematic F0 glitches around vowel nasal transitions, EUROSPEECH'2001, pp.2459-2462 (2001).
- H. Kawahara, Jo Estill and O. Fujimura: Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT, MAVEBA 2001, Sept. 13-15, Firenze, Italy (2001).
- H. Kawahara and H. Katayose: Scat generation research program based on STRAIGHT, a high-quality speech analysis, modification and synthesis system, J. IPSJ, 43, 2, pp.208-218 (2002). (in Japanese)
![Page 62: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/62.jpg)
For computational “Audition”
seed#1, seed#2
F0 trajectory and frequency axis modification
Parts preparation
Mixing and level adjustment
![Page 63: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/63.jpg)
Nonlinear time warping based on phase of the F0 component (FM pulse train)
without time warping
with time warping
![Page 64: Straight](https://reader033.vdocuments.site/reader033/viewer/2022060115/55794d6ed8b42a31678b521c/html5/thumbnails/64.jpg)
Nonlinear time warping based on phase of the F0 component (vowel sequence /aiueo/)
without time warping
with time warping