Real-time display of voice source characteristics*

Paul E. Garner and David M. Howard

Parallel Real-time Systems and Speech Processing & Acoustics Research Group, Department of Electronics, The University of York, Heslington, York, UK

Log Phon Vocol 1999; 24: 19–25

Previous research investigating electrolaryngographically derived voice source characteristics suggests that the vocal fold closed quotient (CQ) and fundamental frequency (F0) are useful parameters for assessing and developing voice production skills. This paper describes the design of a real-time visual feedback display system of CQ and F0 based on the electrolaryngograph output waveform, comprising a PC compatible computer with an Ariel PC-56D DSP co-processor board, which incorporates a Motorola 56001 DSP integrated circuit. Details of a pilot study are given, in which the system was used during a number of singing training sessions. Early indications show that this system provides useful visual feedback of progress to both the pupil and the instructor during voice training.

Key words: electrolaryngograph, larynx closed quotient, fundamental frequency, voice.

P. E. Garner, Parallel Real-time Systems and Speech Processing & Acoustics Research Group, Department of Electronics, The University of York, Heslington, York YO10 5DD, UK. Tel: +44 1 904 432413. Fax: +44 1 904 432335. E-mail: peg@ohm.york.ac.uk

SHORT REPORT

INTRODUCTION

Over many years there has been increasing research interest in the teaching and coaching of vocal skills by speech therapists, acting and drama coaches and singing teachers. This interest applies to all levels and ages of vocal skill, from those with pathological voices at one end of the spectrum, to professional voice users at the other.

The terminology used by singing teachers and voice coaches often relates to the physical gestures actually required to achieve the appropriate sound, and these can bear little relation to physical reality in either physiological or acoustic terms (8), which some students find unhelpful and confusing. Much research, therefore, has centred on various aspects of the singing voice with the aim of replacing the present terminology with some reliable quantifiable measure of singing performance.

Particular aspects include the fundamental frequency (F0) of the vocal folds, perceived by the listener as voice pitch (11, 12, 17), and the larynx closed quotient (CQ), defined as the ratio between the time the vocal folds are in contact during each cycle and the period of the cycle. In particular, previous research work seems to suggest that the CQ measure gives an indication of the level of training and experience of the singing voice. Adult males with greater singing training/experience make use of higher absolute CQ values, and their CQ values are at the upper range of those used in speech (8). However, with female subjects the difference does not appear in absolute CQ, but in the patterning of CQ variation with F0 as a function of training (10).

As the power and, more importantly, the speed of computers have increased, the use of real-time visual feedback displays of various voice-related features has become well established (7, 13). We have developed a system which displays CQ and F0 in real time over the complete vocal range for both speech and singing. Previous research (8, 14) suggests 20% < CQ < 80% to be a suitable CQ range, and 64 Hz < F0 < 1 kHz for F0.

This paper describes the real-time visual display system of CQ and F0, and examples of various display formats are given for healthy non-pathological speaking and singing voices. Current and potential applications are discussed.

MEASUREMENTS OF ASPECTS OF VOCAL FOLD VIBRATION

In voiced speech and singing it is the larynx, and in particular the vocal folds, which are the acoustic source during phonation. The periodic oscillation of the vocal folds determines the fundamental frequency of the acoustic output, perceived by the listener as the pitch of the sound.

* Paper presented at the PEVOC-II Conference, August 29–31, 1997, in Regensburg, Germany.

© 1999 Scandinavian University Press. ISSN 1401-5439 Log Phon Vocol 24

Fig. 1. The measurement of fundamental period (Tx), open phase (OP) and the closed phase (CP) from the output waveform from the electrolaryngograph (Lx). The onset of CP is determined as the time differential of Lx (DLx), and its offset from the instant where the Lx amplitude drops below a fixed threshold (see text). The percentage of each cycle for which the vocal folds are in contact is known as the larynx closed quotient (CQ) which is calculated as shown in the text.

There are various techniques and instruments used to determine the fundamental frequency, a review of which is given by Hess (6). One such instrument is the electrolaryngograph (5), which measures the electrical impedance across the larynx during phonation, from which the fundamental frequency can be derived.

The example output waveform (Lx) from an electrolaryngograph shown in Fig. 1 indicates an increasing electrical impedance as negative-going. Therefore, the positive peaks in Lx represent the closed phase (CP) of the vocal folds and the troughs the open phase (OP). The boundaries between these two phases are found in the following way. The transition from open to closed phase is a well-defined event and corresponds to the point when the vocal fold area of contact is increasing most rapidly (10). This is located as the positive peak in the differentiated Lx waveform (DLx), and the time between these positive peaks gives the fundamental period (Tx).

There are various methods previously defined for the transition from closed to open phase, and here method (a) of (3) is used, which can be summarized as follows. The opening phase corresponds to a position where Lx has reached a value that divides the range from peak to trough of that cycle in a fixed ratio. Previous experimental work (4) indicated a ratio of 3/7 (see Fig. 1) is appropriate. The closed phase (CP) can be expressed as a ratio to the overall Lx period as shown below and is usually referred to as the closed quotient.

Closed Quotient: $\mathrm{CQ} = \left( \dfrac{\mathrm{CP}}{\mathrm{Tx}} \times 100 \right)\%$

Also the fundamental frequency is given by:

Fundamental Frequency: $\mathrm{F0} = \dfrac{1}{\mathrm{Tx}}$
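As a worked example of the two expressions above, the following minimal Python sketch evaluates CQ and F0 for a single larynx cycle. The function names and the example durations (Tx = 10 ms, CP = 4 ms) are illustrative assumptions and do not come from the measurements reported here.

```python
def closed_quotient(cp_seconds: float, tx_seconds: float) -> float:
    """Return CQ as a percentage: (CP / Tx) * 100."""
    if tx_seconds <= 0:
        raise ValueError("Tx must be positive")
    return cp_seconds / tx_seconds * 100.0


def fundamental_frequency(tx_seconds: float) -> float:
    """Return F0 in Hz as the reciprocal of the fundamental period Tx."""
    if tx_seconds <= 0:
        raise ValueError("Tx must be positive")
    return 1.0 / tx_seconds


# Illustrative cycle (assumed values): Tx = 10 ms, closed phase CP = 4 ms.
tx = 0.010
cp = 0.004
cq_percent = closed_quotient(cp, tx)   # about 40 %
f0_hz = fundamental_frequency(tx)      # about 100 Hz
```

With these assumed values the cycle falls inside the 20% to 80% CQ range and the 64 Hz to 1 kHz F0 range quoted in the introduction.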

These two parameters, CQ and F0, form the basis for the real-time displays described here.

Fig. 2. System block diagram.


Fig. 3. CQ and F0 screen display for an adult male singing a two octave ascending and descending A major arpeggio from A (110 Hz).

METHOD

The real-time display system is shown in Fig. 2 and comprises the electrolaryngograph and a PC compatible computer controlling an Ariel PC-56D DSP co-processor board which incorporates a Motorola 56000 DSP integrated circuit. The system has a fixed sampling rate of 19.6 kHz, and a bandpass filter (10 Hz–5 kHz) is also included in the system to remove low frequency variations on the Lx due to, for example, larynx movement (2) and any spurious high frequency signals which could incorrectly trigger the system.
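As a rough illustration of what the fixed 19.6 kHz sampling rate implies, the following Python sketch computes the number of samples per larynx cycle at the ends of the 64 Hz to 1 kHz range, and the corresponding step size in CQ if CP is counted in whole samples. The function names are hypothetical, and the whole-sample assumption is made here for illustration only; it is not stated in the text.

```python
SAMPLE_RATE_HZ = 19_600  # fixed sampling rate quoted in the text


def samples_per_cycle(f0_hz: float, sample_rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Number of samples spanning one fundamental period at the given F0."""
    if f0_hz <= 0:
        raise ValueError("F0 must be positive")
    return sample_rate_hz / f0_hz


def cq_step_percent(f0_hz: float, sample_rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """CQ change caused by one sample of CP, assuming CP is a whole sample count."""
    return 100.0 / samples_per_cycle(f0_hz, sample_rate_hz)


low = samples_per_cycle(64.0)       # 306.25 samples per cycle at 64 Hz
high = samples_per_cycle(1000.0)    # 19.6 samples per cycle at 1 kHz
step_low = cq_step_percent(64.0)    # roughly 0.33 % per sample
step_high = cq_step_percent(1000.0) # roughly 5.1 % per sample
```

Under this assumption the CQ resolution becomes noticeably coarser towards the top of the F0 range, which is worth bearing in mind when reading CQ traces for high notes.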

The Lx waveform is time differentiated (DLx) and the maximum positive peaks are used to indicate the start of each closed phase and the Tx markers. These peaks are subject to a voiced/voiceless threshold value set to exclude non-valid peaks in DLx due to noise.
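The peak-picking step can be sketched as follows, assuming that a simple first difference stands in for the time differentiation and that a fixed threshold is supplied by the caller. The function names and the synthetic Lx values are illustrative only and are not taken from the DSP software.

```python
from typing import List


def differentiate(lx: List[float]) -> List[float]:
    """First difference of Lx: dlx[i] = lx[i + 1] - lx[i]."""
    return [b - a for a, b in zip(lx, lx[1:])]


def positive_peaks(dlx: List[float], threshold: float) -> List[int]:
    """Indices of local maxima in DLx that exceed the voiced/voiceless threshold."""
    peaks = []
    for i in range(1, len(dlx) - 1):
        if dlx[i] > threshold and dlx[i] > dlx[i - 1] and dlx[i] >= dlx[i + 1]:
            peaks.append(i)
    return peaks


# Two synthetic Lx cycles (assumed values), positive-going on vocal fold closure.
lx = [0, 0, 1, 5, 6, 5, 3, 1, 0, 0, 1, 5, 6, 5, 3, 1, 0]
dlx = differentiate(lx)
tx_markers = positive_peaks(dlx, threshold=2.0)      # [2, 10]
tx_in_samples = tx_markers[1] - tx_markers[0]        # 8 samples between markers
```

The spacing between consecutive markers gives Tx in samples; dividing by the sampling rate converts it to seconds.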

Initially the threshold level used is a default value set to a typical value prior to running the software. Once a number of valid Lx cycles have been received, a new threshold is calculated in real time using a level extracted from the Lx waveform itself. During the normal running of the system, an average value of the present and the last three values is used for the threshold level applied to the next Lx cycle. If this new threshold level falls below a pre-determined minimum, the system reverts to using the original default value until a new valid threshold is found.
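The threshold update described above might be sketched as below. This is an interpretation under stated assumptions: the text does not say how the level is extracted from each Lx cycle or how many valid cycles are needed before adaptation starts, so the sketch takes the level as an input and waits for four values. The class name and parameter values are hypothetical.

```python
from collections import deque


class AdaptiveThreshold:
    """Running-average threshold with fallback to a default value."""

    def __init__(self, default: float, minimum: float, history: int = 4):
        self.default = default
        self.minimum = minimum
        # Holds the present level and the last three (history = 4).
        self._levels = deque(maxlen=history)
        self.value = default

    def update(self, level: float) -> float:
        """Feed the level from the latest cycle; return the threshold for the next."""
        self._levels.append(level)
        if len(self._levels) < self._levels.maxlen:
            # Assumption: keep the default until enough cycles have been seen.
            self.value = self.default
            return self.value
        averaged = sum(self._levels) / len(self._levels)
        # Revert to the default if the averaged level is too small to be valid.
        self.value = averaged if averaged >= self.minimum else self.default
        return self.value


threshold = AdaptiveThreshold(default=1.0, minimum=0.25)
for level in (0.5, 0.5, 0.5, 0.5):
    current = threshold.update(level)   # default until the fourth value, then 0.5
```

Averaging over four cycles smooths cycle-to-cycle amplitude variation, while the fallback keeps the detector from locking onto noise when the signal fades.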

Once a valid positive peak in DLx has been found, the system is inhibited from finding another peak for a fixed "refractory" time to prevent re-triggering on peaks due to noise and local overshoots of the Lx waveform. The refractory time can be altered and it has a default setting of 0.75 ms.
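A minimal sketch of the refractory hold-off follows, assuming candidate peak positions are already available as sample indices and that the hold-off is rounded to a whole number of samples. The function names and the candidate positions are hypothetical.

```python
from typing import List

SAMPLE_RATE_HZ = 19_600   # sampling rate quoted in the text
REFRACTORY_S = 0.75e-3    # default refractory time quoted in the text


def refractory_samples(refractory_s: float = REFRACTORY_S,
                       sample_rate_hz: float = SAMPLE_RATE_HZ) -> int:
    """Refractory time expressed as a whole number of samples (rounded)."""
    return round(refractory_s * sample_rate_hz)


def apply_refractory(peak_indices: List[int], hold_off: int) -> List[int]:
    """Keep a peak only if it lies at least hold_off samples after the last kept peak."""
    accepted: List[int] = []
    for idx in peak_indices:
        if not accepted or idx - accepted[-1] >= hold_off:
            accepted.append(idx)
    return accepted


hold_off = refractory_samples()                 # 14.7 samples, rounded to 15
candidates = [100, 105, 120, 130, 300]          # assumed candidate positions
markers = apply_refractory(candidates, hold_off)  # [100, 120, 300]
```

A hold-off of 0.75 ms corresponds to a maximum detectable rate of about 1.33 kHz, which sits above the 1 kHz upper limit of the stated F0 range.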

After finding a valid Tx marker, the 56001 stores the incoming Lx waveform up to the next Tx marker. As Lx is stored, the maximum amplitude is sought, followed by the minimum amplitude for that cycle. Then the amplitude threshold, established as 3/7 of the peak-to-peak amplitude of each cycle, can be found and used to find the end of the closed phase for that cycle.
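The closed-phase measurement for one stored cycle can be sketched as follows. One detail is an assumption: the 3/7 level is measured upwards from the cycle minimum, which the text does not state explicitly. The function names and sample values are illustrative.

```python
from typing import List


def closed_phase_samples(cycle: List[float], ratio: float = 3 / 7) -> int:
    """Length of the closed phase in samples for one stored Lx cycle.

    `cycle` holds Lx from one Tx marker up to (not including) the next.
    """
    # Maximum first, then the minimum that follows it, as described in the text.
    peak_idx = max(range(len(cycle)), key=cycle.__getitem__)
    peak = cycle[peak_idx]
    trough = min(cycle[peak_idx:])
    # Assumption: threshold sits 3/7 of the peak-to-peak range above the trough.
    level = trough + ratio * (peak - trough)
    for i in range(peak_idx, len(cycle)):
        if cycle[i] < level:
            return i  # closed phase runs from the Tx marker (index 0) to here
    return len(cycle)


def cq_from_cycle(cycle: List[float]) -> float:
    """CQ in percent: closed-phase length over cycle length, times 100."""
    return 100.0 * closed_phase_samples(cycle) / len(cycle)


cycle = [1, 5, 6, 5, 3, 1, 0, 0]   # assumed samples for one cycle
cp = closed_phase_samples(cycle)   # 5 samples
cq = cq_from_cycle(cycle)          # 62.5 %
```

Because the threshold depends on the maximum and minimum of the same cycle, the end of the closed phase can only be located once the whole cycle has been stored.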

Real-time displays

Once the CQ and F0 data has been calculated using CP and Tx, it can be displayed in a variety of formats by the PC.

The first of these is a plot of CQ and F0 against time. Examples are shown in Figs. 3 and 4 for an adult male and an adult female, respectively, singing two octave A major arpeggios, ascending and descending. The male start note is A (110 Hz) and that for the female is A (220 Hz). The F0 trace is plotted logarithmically between frequency bounds selected by the user (10 Hz–1 kHz, 100 Hz–1 kHz or 100 Hz–10 kHz), and CQ is plotted on a linear 0%–100% axis.
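The axis scaling described above can be illustrated with a short Python sketch that maps F0 onto a logarithmic axis and CQ onto a linear one, each expressed as a fraction of the axis length. The function names are hypothetical, and the conversion from fraction to screen coordinates is left out because the display code is not described in the text.

```python
import math


def f0_to_fraction(f0_hz: float, low_hz: float, high_hz: float) -> float:
    """Position of F0 on a logarithmic axis, 0.0 at low_hz and 1.0 at high_hz."""
    if f0_hz <= 0 or low_hz <= 0 or high_hz <= low_hz:
        raise ValueError("frequencies must be positive and high_hz > low_hz")
    return math.log(f0_hz / low_hz) / math.log(high_hz / low_hz)


def cq_to_fraction(cq_percent: float) -> float:
    """Position of CQ on a linear 0%-100% axis."""
    return cq_percent / 100.0


# On the 100 Hz-1 kHz setting, each octave of the arpeggio spans the same distance.
first_octave = f0_to_fraction(220.0, 100.0, 1000.0) - f0_to_fraction(110.0, 100.0, 1000.0)
second_octave = f0_to_fraction(440.0, 100.0, 1000.0) - f0_to_fraction(220.0, 100.0, 1000.0)
```

A logarithmic F0 axis gives equal musical intervals equal visual distance, which suits a pitch display for singing.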

For the investigation of long-term trends in prolonged passages of singing it is usual to plot CQ against F0 with the density at a given point representing the number of occurrences of a larynx cycle. This type of plot is usually referred to as a scattergram (1) or Qx. A typical display is shown in Fig. 5 for an adult male phonating a sustained /a/ vowel whilst varying the pitch.
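A scattergram of this kind amounts to a two-dimensional histogram of larynx cycles. The sketch below counts cycles in bins on a logarithmic F0 axis and a linear CQ axis; the bin counts, the F0 limits used as defaults and the example data are assumptions chosen for illustration and are not the settings of the system described here.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def scattergram(cycles: Iterable[Tuple[float, float]],
                f0_low: float = 64.0, f0_high: float = 1000.0,
                f0_bins: int = 48, cq_bins: int = 50) -> Counter:
    """Count larynx cycles per (F0 bin, CQ bin); F0 bins are logarithmic."""
    counts: Counter = Counter()
    span = math.log(f0_high / f0_low)
    for f0_hz, cq_percent in cycles:
        # Cycles outside the plotted ranges are skipped in this sketch.
        if not (f0_low <= f0_hz < f0_high and 0.0 <= cq_percent < 100.0):
            continue
        col = int(math.log(f0_hz / f0_low) / span * f0_bins)
        row = int(cq_percent / 100.0 * cq_bins)
        counts[(col, row)] += 1
    return counts


# Assumed (F0 in Hz, CQ in %) pairs, one per larynx cycle.
cycles = [(110.0, 41.0), (110.0, 41.5), (220.0, 55.0), (2000.0, 50.0)]
qx = scattergram(cycles)
densest_bin, densest_count = qx.most_common(1)[0]
```

The count in each bin would drive the plotted density, so every cycle contributes to the picture.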


Fig. 4. CQ and F0 screen display for an adult female singing a two octave ascending and descending A major arpeggio from A (220 Hz).

This Qx display is particularly demanding on the real-time signal processing program since all cycles of the incoming Lx waveform must be included. The loss of any Lx cycles would affect the statistics of the Qx measurement.

Pilot study

Since the real-time visual display software has been developed it has been used in a variety of applications.

By way of a pilot study, adult male singing pupils at the Michael De Costa Academy of Singing, York, were analysed by the real-time display system during a number of sessions of tuition. During each session the instructor was able to use the display to show various voice source features associated with the pupil's singing and to indicate overall singing ability. The scattergram display (Qx) was used over a number of sessions to monitor long-term changes in the pupil's CQ patterning. In contrast, the CQ and F0 against time display gave immediate visual feedback to the student and instructor during the training session. For instance, with the CQ measure the instructor would be looking for an improved constant absolute level, especially with high F0, which from an untrained voice would invariably result in a drop in the CQ value. The F0 display could be used in a variety of ways, either for monitoring pitch accuracy or pitch variations. An example of this is shown in Fig. 6. The top trace shows a pupil attempting a legato between two notes with a good result in terms of the smooth and steady transition between the notes. In Fig. 7 the top trace shows a different pupil's attempt at the same exercise. Although the pitch rise was good, the fall was poor. The instructor was able to use the display to indicate to the pupil, visually, the nature of the problem. On the second attempt, shown in Fig. 8, both rise and fall were good. Both the pupil and the instructor found the display useful as an indicator of various voice source features and also as a way of charting improvement in performance to the instructor.

DISCUSSION

Previous research has shown that CQ appears to increase with the number of years of training/experience for adult males (8), and for adult females a variation in the pattern of CQ occurs (10). Changes in CQ have also been observed when singing in different styles (9). A positive correlation has been found for adult male singers in training (15) between increasing CQ and increased relative energy in the spectral region covering the so-called "singer's formant" (16). Sopranos tend to make use of formant tuning, where they modify their vocal tract shapes to place their formant centre frequencies close to prominent lower harmonics of the source spectrum, thereby enhancing their amplitudes for pitches above approximately C (512 Hz). Vowels become increasingly difficult to distinguish as the pitch ascends above this value. These changes may be associated with variation in CQ, which is directly proportional to fundamental frequency at these pitches (10).

This research would suggest that a real-time display of CQ against time could be a useful parameter to provide the quantitative measure of the voice training process, based on previous findings for singers in training (8, 10), and benefit the pupil both during lessons and in private practice.

Fig. 5. Scattergram (Qx) screen display of an adult male phonating a sustained /a/ vowel where every cycle of Lx is represented as a point plotted at its measured F0 and CQ.

Fig. 6. CQ and F0 display for a male pupil attempting a legato where the F0 change is smooth and aurally acceptable.

CONCLUSIONS

This paper has described a visual feedback display system which is capable of displaying closed quotient (CQ) and fundamental frequency (F0) in real time in a variety of formats. The system's operating range includes the complete human vocal range for both speech and singing (CQ from 20% to 80% and F0 from 64 Hz to 1 kHz), and the system has been successfully used in a variety of speech and singing applications as an aid to vocal training and assessment. The future intention is that this display will be used as a real-time visual feedback system during vocal training for professional voice users.


Fig. 7. CQ and F0 display for a male pupil attempting a legato where the descending F0 change is abrupt and aurally unacceptable.

Fig. 8. CQ and F0 display for a male pupil attempting a legato where the descending F0 change is improved (cf. Fig. 7).

ACKNOWLEDGEMENTS

The authors thank the staff and pupils of the De Costa Academy of Singing in York, England, for their part in this experiment.

REFERENCES

1. Abberton ERM, Howard DM, Fourcin AJ. Laryngographic assessment of normal voice: a tutorial. Clin Linguist Phon 1989; 3: 281–96.

2. Baken RJ. Clinical measurement of speech and voice. Boston: College-Hill Press, 1987.

3. Davies P, Lindsey GA, Fuller H, Fourcin AJ. Variation in glottal open and closed phases for speakers of English. Proc Inst Acoust 1986; 8 (7): 538–46.

4. Fourcin AJ. Laryngographic assessment of vocal fold vibration. In: Wyke B (ed.). Ventilatory and phonatory control systems. Oxford: Oxford University Press, 1974.

5. Fourcin AJ, Abberton ERM. First applications of a new laryngograph. Med Biol Rev 1971; 21: 172–82.

6. Hess W. Pitch determination of speech signals. Berlin: Springer-Verlag, 1983.

7. Hirson A, Fawcus R. Visual feedback in the management of dysphonia. In: Fawcus M (ed.). Voice disorders and their management, 2nd ed. London: Chapman Hall, 1991.

8. Howard DM, Lindsey GA, Allen B. Toward the quantification of vocal efficiency. J Voice 1990; 4 (3): 204–212. [See also Errata. J Voice 1991; 5: 93–95.]

9. Howard DM. Quantifiable aspects of different singing styles: a case study. Voice 1992; 1: 47–62.


10. Howard DM. Variation of electrolaryngographically derived closed quotient for trained and untrained adult female singers. J Voice 1995; 9: 163–72.

11. Howard DM, Rossiter D. Results from a pilot longitudinal study of electrolaryngographically derived closed quotient for adult male singers in training. Proc Inst Acoust 1992; 14: 529–36.

12. Howard DM, Welch GF. Visual displays for the assessment of vocal pitch matching development. Appl Acoust 1993; 39 (3): 235–52.

13. Howard DM, Welch GF. Microcomputer-based singing ability assessment and development. Appl Acoust 1989; 27: 89–102.

14. Lindsey G, Breen AP, Fourcin AJ. Glottal closed time as a function of prosody, style and sex in England. Proceedings of Speech 88. 7th FASE Symposium, Edinburgh, 1988.

15. Rossiter DP, Howard DM, De Costa M. Voice development under training with and without the influence of real-time visually presented biofeedback. J Acoust Soc Am 1996; 99 (5): 3253–6.

16. Sundberg J. The science of the singing voice. DeKalb, IL: Northern Illinois University Press, 1987.

17. Welch GF, Howard DM, Rush C. Real-time visual feedback in the development of vocal pitching accuracy in singing. Psychol Music 1989; 17: 146–57.

SAMMANFATTNING (Swedish summary, translated)

Real-time display of voice source characteristics

Previous research on electroglottographic properties of the voice source has concluded that the vocal fold closed quotient (CQ) and the fundamental frequency (F0) are useful parameters for evaluating and developing voice production ability. This article describes the design of equipment for real-time visual feedback of CQ and F0 based on the electroglottographic waveform. The equipment consists of a personal computer (PC) with an Ariel PC-56D DSP board, which includes a Motorola 56001 DSP integrated circuit. Details are reported from a pilot study in which the system was used during a number of singing lessons. The first results show that the system gives useful visual feedback on progress to both pupil and teacher during voice training.

YHTEENVETO (Finnish summary, translated)

Real-time display of voice source function

Describing voice source function by means of the glottal closed quotient (CQ), analysed from the electroglottograph signal, and the fundamental frequency (F0) has proved meaningful in earlier studies for assessing a person's voice and developing voice use. In this work, a newly developed real-time system was tried out, with which CQ and F0 in particular can be monitored during voice exercises. For singing exercises, it was found that the display gives visual feedback on voice production that helps both the pupil and the teacher.
