functional demarcation of pitch



Signal Processing 3 (1981) 277-284 North-Holland Publishing Company

SHORT COMMUNICATION

FUNCTIONAL DEMARCATION OF PITCH

T.V. SREENIVAS and P.V.S. RAO

Speech and Digital Systems Group, Tata Institute of Fundamental Research, Bombay 400 005, India

Received 25 June 1980 Revised 10 September 1980 and 16 February 1981

Abstract. A satisfactory solution is yet to be found for the problem of estimating the pitch of speech signals. It is difficult even to evolve objective criteria to evaluate existing algorithms. This is usually done on the basis of complexity of the algorithm, speed of computation, ease of implementation, etc. Even the verification of the results of one particular technique is difficult. This paper presents three functional demarcations of pitch estimation methods (based on the linear model of speech production, analysis of the short-time spectrum and examination of the time domain signal, respectively) as used for speech processing. It is shown that evaluation and comparison of different algorithms becomes consistent and easy within each demarcation. Also, methods falling within each demarcation are shown to be suited for a particular area of speech processing.


Keywords. Pitch estimation, speech signal, linear model, pitch perception, laryngeal disorders.

1. Motivation

Estimation of pitch from the speech signal is an important problem for which a satisfactory solution is yet to be found. Estimating the pitch of a perfectly periodic signal is quite simple; however, for speech, which is inherently quasi-periodic (or non-stationary), pitch estimation tends to become complex. The addition of noise and other interference due to the transmission system renders it even more difficult. It is intriguing that the human listener can sense pitch of such a complex signal, even under interference.

Many diverse techniques have been proposed for estimating the pitch of speech signals. Often, these techniques are evaluated by using their results in a speech synthesis system or by comparing them with the results of manual pitch extraction. When the signal is perfectly periodic and noise free, most of the techniques yield nearly identical results and verification does not give rise to any ambiguity. However, when the signal is complex, these techniques yield divergent results and there are no obvious criteria for justifying their correctness. Concern for this lack of an adequate definition and a proper understanding of pitch has been expressed by Flanagan in his classic book [3, p. 184].

0165-1684/81/0000-0000/$02,50 © North-Holland Publishing Company

New pitch algorithms are usually evaluated using criteria such as complexity of the algorithm, speed of computation, ease of implementation, etc.; systematic authentication of the results is in general not attempted. These algorithms extract pitch by operating on different transformations of the signal, viz., filtered signal, short-time spectrum, LPC (linear predictive coding) error signal, etc. Although these transformations are interrelated, comparing the pitch results analytically is not feasible due to the diversity of the representation domains. There are only a few comparative studies on pitch estimation methods in the literature. The latest, by McGonegal et al. [9], presented a qualitative or subjective comparison of some important pitch algorithms. The algorithms were used in an LPC analysis-synthesis system and were rank ordered by judging the naturalness of the synthesized speech. Rabiner et al. [11] have compared the results of the same set of algorithms with those of the semiautomatic pitch detector (SAPD) [8], taking it as a standard. (The SAPD estimates pitch using human judgement, aided by visualisation of the speech waveform, cepstrum and autocorrelation function.) Hardly any correlation was found between the subjective and objective rank orderings, as shown by McGonegal et al. [9]. Thus a comparison of different techniques becomes difficult due to the lack of a common set of criteria to define pitch. Without a proper understanding of pitch, even verification of the results of one particular technique becomes difficult.

This paper proposes three functional demarcations of pitch. As a consequence, existing pitch algorithms can be grouped into three categories corresponding to the three functional demarcations. The criteria for comparing different algorithms become clearer within each category, as pitch caters to a specific purpose in each case.

2. Three demarcations

An analysis of all the pitch algorithms published to date reveals that these algorithms are based on one of three basic premises. In the literature, such premises have, in some cases, been explicitly stated and in some other cases, implicitly assumed in developing an algorithm. These premises are:

(1) "Although voiced speech is in general non-stationary, it can be assumed to be stationary and periodic over short durations of the order of 30-40 msec" (linear model) [10].

(2) "The short-time power spectrum of a voiced speech signal exhibits harmonic structure" [6].

(3) "Voiced speech can be identified by observing the periodic structure in the waveform and the pitch epochs can be marked by such visual observation" [5].¹

In all the three premises above, the absence of voicing or failure to find pitch indicates an unvoiced segment.² (A silent segment is an unvoiced segment for the purposes of pitch estimation.) The reference indicated with each premise provides a typical example of a pitch algorithm for that category. Each premise provides a distinctly different approach to the pitch problem. It is clear that premise (3) is quite independent of the other two premises. Although the first and second premises appear to be related, they differ significantly because of the radically different methodologies used for processing spectral information under premise (2) (e.g., see [6, 14]). While the premises are distinct, it is shown below that each premise also caters to a specific area of speech processing.

¹ There does exist another premise: the presence or absence of a fundamental harmonic; some early algorithms were based on this. For speech signals this assumption is too restrictive (e.g., telephone speech) and hence is not included here.

² There exists a pattern recognition method for voiced, unvoiced and silence classification [1]. A few methods make use of this as a first stage in the algorithm.

Methods based on the linear model: Synthetic speech

Most of the analysis-synthesis systems known today are based on the linear model of speech production (Fig. 1) [3]. Here, speech is the output of a slowly varying linear system H(z) excited by periodic impulses U(z) or white noise N(z). During a short duration of 30-40 msec, it is assumed that all the parameters of the system are constant and that the output speech is stationary; the periodicity of U(z) determines the pitch of the signal. Pitch estimation using cepstrum and related techniques,³ autocorrelation and related methods,³ and methods based on least squared error³ are based on these assumptions (see premise (1) above), which are consistent with the linear model. The validity of the parameters of the model (in particular of the pitch P) extracted by any one of these methods can be determined by the naturalness of the synthesized speech. Thus any two algorithms can be compared with respect to the naturalness of speech synthesized using the extracted pitch values, while keeping all other parameters of the system common for both.

³ A detailed discussion of this classification is given in [13].
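As an illustration of premise (1), the following minimal sketch estimates the pitch period of one short frame from the peak of its autocorrelation. It is not any of the algorithms evaluated in [11]. The one-pole filter standing in for H(z), the 10 kHz sampling rate, the 50-sample period and the lag search range are all assumptions chosen for the example.

```python
def autocorrelation(frame, max_lag):
    """Short-time autocorrelation R(m) for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + m] for i in range(n - m))
            for m in range(max_lag + 1)]


def estimate_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation value."""
    r = autocorrelation(frame, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda m: r[m])


def synth_frame(period, n_samples, decay=0.9):
    """Periodic impulses through a one-pole filter (a crude, assumed H(z))."""
    out, y = [], 0.0
    for i in range(n_samples):
        x = 1.0 if i % period == 0 else 0.0
        y = x + decay * y
        out.append(y)
    return out


FS = 10_000                                   # assumed sampling rate in Hz
frame = synth_frame(period=50, n_samples=400)  # 40 msec frame, stationary by construction
lag = estimate_period(frame, min_lag=20, max_lag=200)
print("period (samples):", lag, "pitch (Hz):", FS / lag)
```

The frame here is stationary by construction, which is exactly the condition premise (1) assumes; the sketch says nothing about frames in which the period drifts.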

[Figure 1: block diagram. Labelled blocks: impulse train generator (pitch period P), random number generator, gain G, time-varying digital filter (digital filter coefficients: vocal tract parameters), speech samples.]

Fig. 1. Linear model of speech production.

Vol. 3, No. 3, July 1981

The advantage of grouping pitch algorithms based on premise (1) can be illustrated using the study made by Rabiner et al. [11], who reported a comparative evaluation of seven pitch algorithms. The seven algorithms were:

(i) AUTOC: modified autocorrelation method,
(ii) CEP: cepstrum method,
(iii) SIFT: simplified inverse filtering technique,
(iv) DARD: data reduction method,
(v) PPROC: parallel processing method,
(vi) 'LPC': spectral equalisation using Newton's transformation, and
(vii) AMDF: average magnitude difference function.

All these algorithms have been explained and their block diagrams have been reported by Rabiner et al. [11]. Taking the pitch results of the semi-automatic pitch detector (SAPD) [8] as the standard, the seven pitch algorithms were rank ordered by Rabiner et al. with respect to four types of errors, viz., (i) V-UV errors, (ii) UV-V errors, (iii) Gross errors and (iv) Fine errors. Further, McGonegal et al. [9] made a subjective comparison of the same seven pitch algorithms by incorporating each of their results in an LPC analysis-synthesis system. Based on the naturalness of the synthesized speech, listeners rank ordered the seven algorithms. But, on comparing these two results, very little correlation was found between the subjective rank ordering and the rank ordering done by Rabiner et al. with respect to the four errors (Fig. 2 shows all the rank orderings and their correlation).
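The four error types can be made concrete with a small sketch that compares an estimated pitch contour against a reference contour, frame by frame. The conventions used here are assumptions for illustration only: a period of 0 marks an unvoiced frame, and a voiced frame counts as a gross error when the period difference reaches a hypothetical threshold of 10 samples. The exact definitions used in [11] should be taken from that paper. The two contours are invented data.

```python
def classify_errors(reference, estimate, gross_threshold=10):
    """Count V-UV, UV-V and gross errors; collect fine period differences.

    Periods are in samples; 0 marks an unvoiced frame (assumed convention).
    """
    counts = {"V-UV": 0, "UV-V": 0, "gross": 0}
    fine = []
    for ref, est in zip(reference, estimate):
        if ref > 0 and est == 0:
            counts["V-UV"] += 1          # voiced frame reported as unvoiced
        elif ref == 0 and est > 0:
            counts["UV-V"] += 1          # unvoiced frame reported as voiced
        elif ref > 0 and est > 0:
            diff = est - ref
            if abs(diff) >= gross_threshold:
                counts["gross"] += 1     # e.g. period doubling
            else:
                fine.append(diff)        # small deviation, kept for statistics
    return counts, fine


# Hypothetical contours (period in samples per frame), not data from [11].
reference = [0, 50, 50, 49, 48, 0, 0]
estimate = [0, 50, 100, 48, 0, 47, 0]
counts, fine = classify_errors(reference, estimate)
print(counts, fine)
```

Ranking several algorithms then amounts to sorting them by each count (or by the spread of the fine differences) against the same reference contour.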

[Figure 2: four panels, each pairing an objective ranking with the subjective ranking of the seven algorithms: (a) Gross errors, (b) Standard deviation of fine errors, (c) V-UV errors, (d) UV-V errors.]

Fig. 2. Comparison of pitch algorithms with respect to subjective ranking and objective ranking based on four types of errors (from McGonegal et al. [9]).

It should be noted that each of the seven algorithms used for the comparison is based on one of the three different premises stated earlier. However, the method used for subjective rank ordering is based on the linear model (premise (1)). Again, in the objective rank ordering, the autocorrelation and cepstrum functions used in SAPD (which is the standard of comparison) are based on the linear model; human judgement in SAPD is resorted to only when both functions fail. Hence, the comparison of rank orderings would be better justified if the pitch algorithms were also based on the methodology of the linear model, i.e., based on premise (1).

[Figure 3: two panels for the algorithms AMDF, AUTOC, SIFT and CEP: (a) objective ranking with respect to V-UV errors against subjective ranking, (b) objective rankings with respect to UV-V errors, Gross errors and Fine errors.]

Fig. 3. Comparison of the ranking of a subset of pitch algorithms based on premise (1).

Accordingly, among the seven algorithms we could consider the subset of algorithms AUTOC, CEP, SIFT and AMDF (which are based on premise (1)) and exclude the remaining three from the comparison of rank ordering. Fig. 3(a) shows such a comparison between subjective rank ordering and ordering with respect to V-UV errors. It is interesting to note that there is complete agreement in the two rankings, unlike when all the algorithms are considered (contrast with Fig. 2(c)). Fig. 3(b) shows a comparison of objective ranking of these four algorithms with respect to UV-V errors, Gross errors and Fine errors, independent of the subjective ranking. Here again a remarkable consistency exists between the three types of errors, not present in comparing (a), (b) and (d) of Fig. 2. Such consistency in the position of an algorithm with respect to different errors can be interpreted as being due to the methodology of the algorithm: SIFT, for example, is generally poorer due to the reduction of SNR in inverse filtering. Subjective ranking correlates poorly with these three types of errors, as can be seen by comparing Fig. 3(a) and Fig. 3(b). This may be because either such errors are few or because they do not significantly affect the perceived quality of synthesized speech.
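Agreement between two rank orderings can be expressed as a single number. The sketch below uses Spearman's rank correlation; this measure is our choice for illustration, and the source does not state which measure of correlation underlies Figs. 2 and 3. The item names are placeholders, not the published rankings.

```python
def spearman(order_a, order_b):
    """Spearman rank correlation of two orderings of the same items (no ties)."""
    n = len(order_a)
    position_b = {name: i for i, name in enumerate(order_b)}
    d_squared = sum((i - position_b[name]) ** 2
                    for i, name in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Placeholder orderings (best first); hypothetical, for illustration only.
subjective = ["alg1", "alg2", "alg3", "alg4"]
same_order = ["alg1", "alg2", "alg3", "alg4"]
reversed_order = ["alg4", "alg3", "alg2", "alg1"]

print(spearman(subjective, same_order))      # complete agreement
print(spearman(subjective, reversed_order))  # complete disagreement
```

A value near 1 corresponds to the complete agreement described for Fig. 3(a); values near 0 correspond to the poor correlation reported when all seven algorithms are compared.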

Methods based on short-time spectra: Pitch perception

From studies on the human hearing mechanism it is quite clear that neural signals originating from the basilar membrane form the basis for auditory recognition [3]. The membrane displacement generates signals indicating both the magnitude of the spectral components and their frequency derivative. The temporal pattern of membrane displacement also gets transmitted, limited by the maximum discharge rate of neurons. The human listener can sense pitch based on just three frequency components existing anywhere in the region of 500-5000 Hz (residue pitch), even under noisy conditions [12]. Unrestricted by the non-stationarity of the signal, the listener seems to be able to selectively pick up spectral information leading to pitch. The pitch here is characterized by what the listener perceives rather than what exists in the stimulus. Pitch algorithms based on premise (2) above also derive pitch cues from the spectrum. It would therefore seem reasonable to compare the different algorithms based on this premise using the human hearing mechanism as the basis. Even the performance of a particular algorithm can be evaluated by comparing its results with the results of human pitch perception experiments for the same signals.

[Figure 4: two sets of plots, (a) and (b), each showing s(n), S'(k), S(k), c(n) and R(n). In (a), n = 0-149 and S'(k) is plotted for k = 0-74; in (b), n = 0-134 and S'(k) is plotted for k = 0-67. S(k) is plotted for k = 0-255 in both.]

Fig. 4. Comparison of spectrum, cepstrum and autocorrelation of (a) periodic signal, (b) aperiodic signal.

The advantage of viewing perceived pitch independently of the linear model can be seen using the following example. Consider a signal s(n), shown in Fig. 4(b), which is the output of the linear system shown in Fig. 1 when excited by a non-stationary impulse train. In the three-period segment shown, the period progressively decreases, being P, P - x and P - 2x respectively, where x = 0.1P. (See Appendix for a description of these signals.) While such a signal is quite common in continuous speech, it is not at all obvious what the pitch of this signal should be. The cepstrum c(n) and autocorrelation function R(n), which show a clear pitch peak when all the three periods are equal (see Fig. 4(a)), fail to show any significant pitch peak for the aperiodic signal (see Fig. 4(b)). Premise (1) does not hold good for s(n), and obviously the ear cannot be sensing each period P, P - x and P - 2x separately (as could be done by visual inspection of the waveform). One might just speculate that the pitch period should be the average of (P, P - x, P - 2x). But one might equally well contend that the pitch frequency should be the average of the frequencies {1/P, 1/(P - x), 1/(P - 2x)}. Perception experiments⁴ on such signals show that it is the periods that are averaged and not the frequencies. Also, the DFT spectrum of such a signal, comprising alternate bands of noise-like and harmonic-like structures, has a harmonic spacing corresponding to the average of the pitch periods.⁵
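The two candidate definitions give different numbers, as the short calculation below shows for periods P, P - x and P - 2x with x = 0.1P. The value P = 5 msec is an assumption chosen only to make the arithmetic concrete; the source does not state P.

```python
P_MS = 5.0                 # assumed period in msec (illustrative value)
x = 0.1 * P_MS
periods = [P_MS, P_MS - x, P_MS - 2 * x]          # 5.0, 4.5, 4.0 msec

# Candidate 1: average the periods, then convert to frequency.
mean_period = sum(periods) / len(periods)
pitch_from_mean_period = 1000.0 / mean_period      # Hz

# Candidate 2: convert each period to frequency, then average.
mean_frequency = sum(1000.0 / p for p in periods) / len(periods)  # Hz

print(round(pitch_from_mean_period, 2), round(mean_frequency, 2))
```

With these values the mean period is 4.5 msec, giving about 222.2 Hz, whereas the mean of the three frequencies is about 224.1 Hz. The difference is small for x = 0.1P but grows with larger perturbations, so the two definitions are not interchangeable.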

⁴ Results of these experiments will be reported shortly.

⁵ Fig. 4 is meant to show that premise (1) fails and premise (2) is relevant; however, in a way, it brings together all three premises: it puts forth the inadequacy of premise (1) (acf and cepstrum), the feasibility of premise (2) (structure in the spectrum) and the advantage of premise (3) (apparent time domain information).

Methods based on waveform analysis: Laryngeal studies

Waveform analysis of the speech signal can be used as a diagnostic aid. By comparing the waveforms of normal speech and pathological speech (of patients with laryngeal disorders), features indicative of the abnormality may be deduced. This is essentially a pattern recognition approach⁶ in which the capacity of the human eye to identify patterns plays an important role. Pitch estimation methods [5] based on premise (3) above adopt the same methodology to find one particular waveform feature, viz., the pitch period, defined as the duration between two successive pitch epochs. By virtue of using time-domain analysis, these methods can provide period-to-period information about pitch. Thus, small perturbations and other systematic variations in the pitch period can be identified; these would get averaged out in an autocorrelation or spectral representation. The standard for comparing the results of these algorithms can be obtained through manual identification of pitch epochs in the waveform (unlike SAPD, where cepstrum and autocorrelation are also used). The utility of an algorithm in laryngeal studies would depend on its success in identifying the necessary waveform features.

⁶ Waveform parsing techniques of syntactic pattern recognition [4] would be very useful.

Lieberman [7] studied the speech of normal subjects and patients with laryngeal disorders for pitch perturbation over two successive periods, by visual observation. He computed a perturbation factor which was a good indicator for the detection of certain types of pathologic laryngeal conditions. In the well-known case of diplophonia, alternate periods of the patient's speech signal exhibit greater similarities than successive periods. This gives rise to the so-called "pitch doubling" effect in the autocorrelation and cepstrum methods of pitch estimation. However, a pattern recognition method which uses waveform features can indicate the correct pitch as well as the property of alternate period similarity. Fig. 5 shows the waveforms of a female hoarse voice, phonating /a/ and /i/. It can be seen that after every three or four periods, a large perturbation occurs consistently for both /a/ and /i/. Such features could be symptomatic of laryngeal abnormalities. It is, however, hard to detect such a structure using non-time-domain methods.
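Once pitch epochs have been marked, period-to-period perturbation is straightforward to compute. The sketch below takes epoch times, derives successive periods and reports the fraction of period-to-period changes that reach a threshold. The 0.5 msec threshold and the epoch times are assumptions for illustration; this is not claimed to reproduce the exact perturbation factor defined in [7].

```python
def periods_from_epochs(epochs_ms):
    """Durations between successive pitch epochs, in msec."""
    return [b - a for a, b in zip(epochs_ms, epochs_ms[1:])]


def perturbations(periods):
    """Absolute change between each pair of successive periods, in msec."""
    return [abs(b - a) for a, b in zip(periods, periods[1:])]


def fraction_exceeding(values, threshold_ms):
    """Fraction of perturbations at or above the threshold."""
    return sum(v >= threshold_ms for v in values) / len(values)


# Hypothetical epoch times (msec): every fourth period is 1 msec longer.
epochs = [0.0, 5.0, 10.0, 15.0, 21.0, 26.0, 31.0, 36.0, 42.0]
periods = periods_from_epochs(epochs)
jitter = perturbations(periods)
print(periods)
print(jitter, fraction_exceeding(jitter, threshold_ms=0.5))
```

The invented epochs mimic the kind of structure described for Fig. 5, where a large perturbation recurs every few periods. An averaged measure such as a single autocorrelation peak would not expose this pattern, whereas the list of per-period changes does.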

3. Conclusion

It has been proposed that it would be advantageous to group pitch estimation methods with respect to the operating premise on which they are based. It is shown that comparison of different algorithms is meaningful only within each demarcation. Hopefully, such a grouping would pave the way for a clearer definition of pitch and for more robust and reliable pitch estimation techniques.

Appendix

Description of signals in Fig. 4

Signal s(n) was synthesized using an LPC synthesizer at a sampling frequency of 10 kHz, to which white noise was added at an SNR of 20 dB. S'(k) is the DFT of s(n), where k and n are of the same length (computed using a mixed radix FFT). Since s(n) has an integral number of "periods", S'(k) can be expected to have minimum picket-fence effect [2]. S(k) is the regular Blackman-Tukey spectrum (log-magnitude spectrum) using a Hamming window and a 512-point radix-2 FFT. (s(n) was appended with the requisite number of zeros.) Note the distinguishable pitch peaks in S'(k), which are hard to distinguish in S(k). c(n) is the cepstrum of s(n), computed as usual by taking the cosine transform of S(k). To graphically illustrate the cepstral peak, c(n) has been plotted by suppressing the beginning 4 points to c(5). The autocorrelation function R(n) is computed directly from the unwindowed signal s(n).
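The S(k) and c(n) steps described above can be sketched as follows: Hamming window, zero padding to 512 points, log-magnitude spectrum, then a cosine transform. The test signal is an assumption, namely a noise-free impulse train through a one-pole filter rather than the LPC-synthesized signal with added noise used for Fig. 4. The small radix-2 FFT is written out only to keep the example self-contained.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def real_cepstrum(signal, n_fft=512):
    """Cepstrum from the log-magnitude spectrum of a windowed, zero-padded frame."""
    n = len(signal)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    padded = [s * w for s, w in zip(signal, window)] + [0.0] * (n_fft - n)
    log_mag = [math.log(abs(v) + 1e-12) for v in fft(padded)]
    # log_mag is real and even, so its inverse DFT reduces to a cosine
    # transform: the real part of a forward FFT divided by n_fft.
    return [v.real / n_fft for v in fft(log_mag)]


# Assumed test signal: three equal periods of 50 samples (150 samples in all).
signal, y = [], 0.0
for i in range(150):
    y = (1.0 if i % 50 == 0 else 0.0) + 0.9 * y
    signal.append(y)

c = real_cepstrum(signal)
peak = max(range(20, 101), key=lambda q: c[q])
print("cepstral peak at quefrency (samples):", peak)
```

The search range of 20 to 100 samples is also an assumption; it simply skips the low-quefrency region dominated by the filter, in the same spirit as suppressing the first few cepstral points for plotting.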

Fig. 5. /a/ and /i/ of a female hoarse voice of duration 70 msec.


Acknowledgement

We are thankful to Prof. R. Vaidyanathan, Audiology and Speech Therapy School, Nair Hospital, Bombay-8, who provided us with the voice samples of patients having laryngeal disorders.

References

[1] B.S. Atal and L.R. Rabiner, "A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition", IEEE Trans. Acoust. Speech and Signal Processing, Vol. ASSP-24, No. 3, June 1976, pp. 201-212.

[2] G.D. Bergland, "A guided tour of the fast Fourier transform", IEEE Spectrum, Vol. 6, No. 7, July 1969, pp. 41-52.

[3] J.L. Flanagan, Speech Analysis, Synthesis and Perception, 2nd ed., Springer-Verlag, New York, 1972.

[4] K.S. Fu, Syntactic Pattern Recognition Applications, Springer-Verlag, New York, 1977.

[5] B. Gold, "Computer program for pitch extraction", J. Acoust. Soc. Amer., Vol. 34, No. 7, July 1962, pp. 916-921.

[6] C.M. Harris and M.R. Weiss, "Pitch extraction by computer processing of high-resolution Fourier analysis data", J. Acoust. Soc. Amer., Vol. 35, No. 3, March 1963, pp. 339-343.

[7] P. Lieberman, "Some acoustic measures of the fundamental periodicity of normal and pathological cases", J. Acoust. Soc. Amer., Vol. 35, No. 3, March 1963, pp. 344-353.

[8] C.A. McGonegal, L.R. Rabiner and A.E. Rosenberg, "A semiautomatic pitch detector (SAPD)", IEEE Trans. Audio. Electroacoustics, Vol. AU-21, No. 6, Dec. 1975, pp. 154-160.

[9] C.A. McGonegal, L.R. Rabiner and A.E. Rosenberg, "A subjective evaluation of pitch detection methods using LPC synthesized speech", IEEE Trans. Acoust. Speech and Signal Processing, Vol. ASSP-25, No. 3, June 1977, pp. 221-229.

[10] A.M. Noll, "Cepstrum pitch determination", J. Acoust. Soc. Amer., Vol. 41, No. 2, Feb. 1967, pp. 293-309.

[11] L.R. Rabiner, M.J. Cheng, A.E. Rosenberg and C.A. McGonegal, "A comparative performance study of several pitch detection algorithms", IEEE Trans. Acoust. Speech and Signal Processing, Vol. ASSP-24, No. 5, Oct. 1976, pp. 399-418.

[12] R.J. Ritsma, "Existence region of the tonal residue-I", J. Acoust. Soc. Amer., Vol. 34, No. 9, Sept. 1962, pp. 1224-1229.

[13] T.V. Sreenivas and P.V.S. Rao, "Problem of fundamental frequency analysis in speech", Technical report, Speech and Digital Systems Group, TIFR, Bombay-5, Jan. 1980.

[14] T.V. Sreenivas and P.V.S. Rao, "Pitch extraction from corrupted harmonics of the power spectrum", J. Acoust. Soc. Amer., Vol. 65, No. 1, Jan. 1979, pp. 223-228.
