
Veterans Administration

Journal of Rehabilitation Research and Development, Vol. 23 No. 1, 1986, pages 119-128

Continuing Evaluation of the Queen's University Tactile Vocoder. I. Identification of Open Set Words

P. L. BROOKS, Ph.D.
B. J. FROST, Ph.D.
J. L. MASON, Ph.D.
D. M. GIBSON, M.Sc.

Departments of Psychology and Electrical Engineering*
Queen's University
Kingston, Ontario, Canada K7L 3N6

*Dr. Brooks and Dr. Frost are members of the Department of Psychology; Dr. Mason and Mr. Gibson are members of the Department of Electrical Engineering. Address for correspondence: Dr. B. J. Frost, Department of Psychology, Queen's University, Kingston, Ontario, Canada K7L 3N6.

Abstract — Identification of open set words, by an experienced normal-hearing subject using the tactile vocoder developed at Queen's University, was examined. The tactile vocoder filters and processes the acoustic waveform into 16 filter channels, each of which controls a vibrator on the skin surface. After acquiring a 250-word vocabulary through the tactile vocoder, the subject was presented with three sets of 1000 different open set words in three reception conditions. The percentages of words correctly identified in the tactile vocoder (TV), lipreading (L), and lipreading plus tactile vocoder (L+TV) conditions were 8.8, 39.4, and 68.7 percent respectively. Phonemic analysis of stimulus/response pairs revealed that 36.7, 64.9 and 85.5 percent of the phonemes were correctly identified in TV, L, and L+TV conditions, respectively, indicating that incorrect-response words often contained many correct phonemes. Also, syllabic stress of stimulus and response words was identical 88 percent of the time in the TV condition. Important information about speech was transmitted through the tactile vocoder.

INTRODUCTION

There have been many attempts to provide tactile sensory communication for the deaf through tactile vocoders. These devices filter the acoustic waveform and transduce it into vibratory or electrical patterns of skin stimulation: see references (1) through (14). A review of these systems and their evaluation can be found in Reed et al., 1982 (15) and Sherrick, 1984 (16).

The experiments described in this paper are part of a continuing evaluation of the tactile vocoder developed at Queen's University. In this system the acoustic waveform is filtered and the output activates a series of vibrators that are in contact with the skin. The goal of this research is to provide the profoundly deaf with a wearable tactile vocoder that will aid in identification of speech and environmental sounds, and provide feedback for improvement of vocalizations.

Several generations of tactile aids have been developed at Queen's University. The first two systems had 10 and 14 filter channels that activated a series of solenoids that touched the skin surface. Artificially deafened subjects learned to discriminate upswept and downswept tones chosen to approximate some of the critical formant transitions in speech, as described by Martin in 1972 (17). Using the second-generation system, which extended the high frequency range, subjects learned to identify 20 environmental sounds and a 10-word vocabulary, as described by Kofmel in 1979 (18).

Results from the latter study led to modification and design of a third-generation system, which was described in detail by Brooks and Frost in 1983 (19). Extensive evaluation of that system was undertaken by Scilley in 1980 (20): acquisition of a tactual vocabulary was tested using two artificially deafened subjects who obtained information solely through this tactile vocoder. With 80.5 and 40.5 hours of experience, respectively, two subjects learned 250 and 70 words (20, 19, 21). Word pairs that had been initially confused became easier for the subjects to identify as training time increased. Analyses showed that most consonants were discriminated; the few exceptions were confusion of phonemes within the categories of voiced stops, approximants, and nasals. Introduction of novel readers did not interfere markedly with the subjects' performance, indicating that the subjects were extracting general characteristics of the speech signal.

In addition to this evaluation with artificially deafened subjects, a series of experiments was carried out on a postlingually profoundly deaf 13-year-old boy (20). In 12 hours this subject enthusiastically learned to recognize 50 environmental sounds. Results also indicated that the device could be useful in eliminating many confusions inherent in lipreading by supplying the subject with information that helped him discriminate within mouth-movement groups. Lipreading scores within mouth-movement groups (e.g., 'pet-bet-met') averaged 33 percent, whereas identification within these same groups was 88 percent when the subject used only information from the tactile vocoder. Finally, a brief test showed that the intelligibility of the subject's own speech improved 104 percent, after a few hours of training, when feedback from the tactile vocoder was provided.

Fourth-generation design -- Analysis of confusions led to a design modification that pre-emphasized high frequencies. A filter with 6 dB/octave gain starting at 3 kHz was inserted after the microphone stage. The system with this addition was referred to as the fourth-generation tactile vocoder; it was evaluated using open set words as stimuli in the experiment described here. ("Open set" refers to the fact that the subject was not previously shown a list of possible alternatives on which to base the response.) Results on identification of words using the tactile vocoder alone, lipreading alone, and lipreading plus the tactile vocoder are reported.

Performance of an experienced subject on identification of open set sentences, and tracking rates using the "tracking procedure" of DeFilippo and Scott, 1978 (22), are described in the paper* following this one. Prelingually profoundly deaf teenagers have also been trained to use the tactile vocoder. Word acquisition rates, placement of CV's in linguistic categories, and results on standardized tests such as the C.I.D. Sentence Test, are reported in Brooks, 1984 (23) and Brooks, Frost, Mason and Gibson (24).

*Brooks PL, Frost BJ, Mason JL, Gibson DM: Continuing Evaluation of the Queen's University Tactile Vocoder. II: Identification of Open Set Sentences and Tracking Narrative.

METHOD

Subject

One female graduate student who had normal hearing participated in this study. Prior to this study, the subject had spent 80.5 hours acquiring a tactual vocabulary of 250 words, 8 hours testing on the total 250-word vocabulary, 20 hours of generalization and discrimination training, and 87.5 hours testing various design modifications -- a total of 196 hours of experience.

Specialized lipreading training, and practice using lipreading and the tactile vocoder in conjunction, were not given prior to this study; this subject possessed only the lipreading skills that normal-hearing individuals develop through everyday speech comprehension.

Apparatus

Sound information entered the portable fourth-generation tactile vocoder through two Sony ECM-10 electret microphones, one held by the reader and the other held by the subject. A filter with 6 dB/octave gain starting at 3 kHz was inserted after the microphone. The sound spectrum was then analyzed by 18 one-third-octave bandpass filters (center frequencies: 160, 200, 250, 320, 400, 500, 640, 800, 1000, 1260, 1600, 2000, 2560, 3200, 4000, 5000, 6400, and 8000 Hz). Outputs from the 160-Hz and 200-Hz filters, and from the 250-Hz and 320-Hz filters, were combined to form two channels with increased bandwidths.

Each filter channel was followed by an envelope detector and a logarithmic amplifier. The output from a given filter channel was used to amplitude-modulate a 100-Hz square-wave carrier that drove the solenoids.

The system was housed in aoabinet 16x23x28cm (weight 3 .6 kg), which was attached to a 12-Vpower supply. Power consumption of the system

was 7.0W. Further details on the signal processing

'Brooks PL, Frost BJ, Mason JL, Gibson DM : Continuing Evalua-tion of the Queen's university Tactile Vocodor - II : Identification

of Open Set Sentences and Tracking Narrative

Page 3: r '‘ Veterans Administration · r'‘ Veterans Administration Journal of Rehabilitation Research and Development, Vol. 23 No. 1 1986 ... placement of CV's in linguistic categories,

121-

-

-

-

`

can be found in the user manual by Gibson, 1983

(25) . Five of these portable systems have been con-structed atthetirneofthievv,iting.

The 16 solenoids were spaced 3 cm apart along a supporting piece of Plexiglas which was attached to the subject's ventromedial forearm. Output from the channel of the lowest frequency activated the solenoid closest to the wrist, with increasing frequencies represented along the length of the array.
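To make the processing chain concrete, the following Python sketch mirrors the stages described above: pre-emphasis, one-third-octave filtering, envelope detection, logarithmic compression, 100-Hz carrier modulation, and the wrist-to-elbow channel ordering. It is a rough illustration, not the published hardware design; the filter orders, the shelf used to approximate the 6 dB/octave pre-emphasis, and the envelope smoothing constant are assumptions made for this sketch.

```python
# Minimal sketch of the vocoder's signal path, assuming NumPy/SciPy;
# filter orders, the pre-emphasis approximation, and the envelope
# smoothing constant are illustrative choices, not the published design.
import numpy as np
from scipy import signal

FS = 32_000                      # sampling rate assumed for this sketch
CENTERS = [160, 200, 250, 320, 400, 500, 640, 800, 1000, 1260,
           1600, 2000, 2560, 3200, 4000, 5000, 6400, 8000]  # Hz

def pre_emphasis(x, fs=FS, corner=3000.0):
    """Approximate the high-frequency boost above 3 kHz with a first-order shelf."""
    sos = signal.butter(1, corner, btype="highpass", fs=fs, output="sos")
    return x + signal.sosfilt(sos, x)           # original + high-passed copy

def third_octave_bank(x, fs=FS):
    """Return one band-limited signal per one-third-octave filter."""
    bands = []
    for fc in CENTERS:
        lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)
        sos = signal.butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
        bands.append(signal.sosfilt(sos, x))
    return bands

def channel_envelopes(bands, fs=FS):
    """Rectify and low-pass each band (envelope detector), then compress (log amp)."""
    sos = signal.butter(1, 50.0, btype="lowpass", fs=fs, output="sos")
    envs = [signal.sosfilt(sos, np.abs(b)) for b in bands]
    return [np.log1p(1000.0 * np.clip(e, 0.0, None)) for e in envs]

def to_solenoid_drives(envelopes, fs=FS, carrier_hz=100.0):
    """Combine the two lowest filter pairs (18 filters -> 16 channels) and
    amplitude-modulate a 100-Hz square-wave carrier for each solenoid.
    Channel 0 drives the solenoid nearest the wrist (lowest frequencies)."""
    merged = [envelopes[0] + envelopes[1],      # 160 + 200 Hz
              envelopes[2] + envelopes[3]]      # 250 + 320 Hz
    merged += envelopes[4:]                     # remaining 14 channels
    t = np.arange(len(merged[0])) / fs
    carrier = 0.5 * (signal.square(2 * np.pi * carrier_hz * t) + 1.0)
    return [env * carrier for env in merged]    # 16 drive signals

if __name__ == "__main__":
    t = np.arange(0, 0.5, 1 / FS)
    speech_like = np.sin(2 * np.pi * 300 * t) + 0.3 * np.sin(2 * np.pi * 3500 * t)
    drives = to_solenoid_drives(channel_envelopes(third_octave_bank(pre_emphasis(speech_like))))
    print(len(drives), "solenoid channels")     # expect 16
```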

Procedure

The subject and reader sat in separate adjoining sound-attenuated booths separated by four panes of glass with air gaps between the panes. Earplugs and headphones delivering white noise were worn by the subject, to mask noise from the vibrating solenoids.

The 3000 English words used as stimuli were drawn from Kucera and Francis (26) and O'Rourke (27), and are shown in the appendices of Brooks (23). Test lists of approximately 40 words were created by the reader.

Word identification was tested daily under three different conditions: tactile vocoder alone (TV); lipreading alone (L); and lipreading plus the tactile vocoder (L+TV). In a half-hour session the subject would receive one test in the L condition and another test in the L+TV condition. In a different half-hour session a test was given under the TV condition. The order of the sessions, and conditions within sessions, were randomly determined. Approximately 7 hours of testing occurred for each of the L and L+TV conditions and 14 hours of testing occurred for the TV condition. The number of presentations given in the TV condition, described below, accounts for the difference in testing time between conditions.

In the L and L+TV conditions, visual contact was allowed between the two sound-attenuated booths. The subject wore the vocoder in both conditions and set the ON/OFF switch in accordance with the experimental condition. The reader was not informed of the condition order. During the session, the reader established eye contact with the subject and then said a stimulus word in a clear, normally paced voice. The subject's response was heard through an intercom and was also recorded. A single presentation of each test word was given. Immediate feedback was provided following the subject's response.

In the TV condition, visual contact between the subject and reader was blocked. The reader said a stimulus word and the subject responded. If the response was not correct, the stimulus word was presented again, with a maximum of three presentations allowed. No information was furnished that enabled the subject to determine how close she was to identifying the word until after the three presentations.

Training in open set word identification was not given to the subject prior to these tests. It should be noted that, concurrent with this testing, the subject was also receiving a half-hour per day of testing both on sentence identification and on tracking. A minimum of a half-hour break separated the four daily sessions, and the order of the four sessions in a given day was random.

RESULTS

Before any data are presented, it should be noted again that the experiments described were not designed to enable direct comparison between the TV condition and the two lipreading conditions. In the L and L+TV conditions, words were presented a single time, whereas in the TV condition stimuli could be presented up to a maximum of three times. However, since the data for the TV condition underwent the same analyses as the two lipreading conditions, the results are presented together.

Identification of Open Set Words

Initially, the data were analyzed by calculating the number of words that were correctly identified from the 1000 different words presented in each condition. A correct identification was scored when the phonemes in the response word were identical to the stimulus word (e.g., the response 'friend' to the stimulus word 'friends' would be incorrect).

In the L condition 39.4 percent of responses were correct, whereas in the L+TV condition 68.7 percent of response words matched stimulus words, an improvement of 29.3 percentage points in performance. In the TV condition a total of 88 (8.8%) words, shown in Table 1, were correctly identified. Of the random 1000 stimulus words presented in the TV condition, 64 were also found in the 250-word acquired vocabulary. Forty of these 64 (62.5%) were correctly identified. It is noteworthy that these words were not confused with linguistically similar words when they were included in the larger set.

TABLE 1
Open Set Words Identified Correctly: Tactile Vocoder

About, Above, Across, Afternoon, Age, Allowed, Attack, Babies, Baby, Bird, Bit, Breakfast, Car, Care, Cars, Century, Difficult, Discussion, Doesn't, End, Enough, Feel, Finished, Fixed, Forgive, Friends, Function, Habit, Half, Happen, Heart, Hot, Instead, Interested, Itch, Keep, Letter, List, Lost, Lunch, Many, Married, Matter, Maybe, Mister, Money, Music, My, Need, Number, Open, Outside, Park, Past, Picture, Please, Poor, Radio, Relax, Salt, Seed, Seven, Shall, She, Ship, Shop, Show, Speak, So, Son, Something, Standing, Strange, Strong, Struck, Student, Students, Stuff, Success, Test, Think, Today, Together, Tried, Trust, Understand, Until, Week

The errors that occurred in the tactile-vocoder-alone condition demonstrated that substantial correct information was being obtained through the device, even though the response word was not phoneme-for-phoneme identical to the stimulus word. In many cases, only one phoneme would be different in the stimulus and response pairs, as some examples in Table 2 indicate. In numerous cases the stimulus and response words differed by more than one phoneme, but the similarity between the words indicated that significant information had been acquired through the tactile vocoder. Samples of some of the more interesting substitutions, where two or more phonemes were incorrect, are also shown in Table 2.

TABLE 2
Near Misses Obtained in the TV Condition

Stimulus    Response      Stimulus    Response
act         at            history     sisterly
afford      afraid        keen        been
annual      annually      loss        moss
applied     applaud       left        lift
at          bat           lines       lens
attention   detention     lives       lines
base        bus           machine     mission
brat        rat           merely      yearly
breath      proof         morning     meaning
brother     rather        muttered    watered
case        curse         off         if
can't       cat           poisonous   business
cat         cot           ring        bring
central     centre        screen      keen
check       chick         scurry      scary
council     cancel        seen        seed
creepy      teepee        she         shoe
dad         had           shirt       short
date        debt          snow        so
dear        ear           spoke       speak
door        for           staff       stuff
doubt       out           step        stop
eight       at            strive      stove
explain     exclaim       task        cask
facts       acts          taste       cost
far         for           through     three
figure      finger        thus        this
frame       from          tradition   condition
fun         sun           truth       true
further     father        word        world
Germany     chimney       worst       west
glass       gas
hate        hat

Visual analysis of incorrect TV responses

Visual analysis of word pairs in the TV condition revealed that the subject's incorrect responses were not random but followed specific patterns.

1. The subject's response decision was influenced more by the consonants in the stimulus words than by the vowels. In many cases, such as the ones shown below, the response consonants were all correct but the response vowels were not.

Stimulus    Response      Stimulus    Response
off         if            applied     applaud
hate        hat           step        stop
frame       from          spoke       speak
tape        top           staff       stuff

2. The phonemic substitutions that occurred were often from within the same linguistic category. For example, the response to 'past' was 'cask'. Both the stimulus and response words began and ended with unvoiced stop consonants, although the specific unvoiced stops were confused. Other stimulus-response pairs show substitutions among voiced stops ('bad-dog', 'agreed-abroad'), nasals, and approximants ('round-wind', 'clay-tray'), and confusion between nasals and approximants ('watch-much', 'merely-yearly'). When phonemes were not identified correctly, often a phoneme within the same linguistic category was substituted.

3. Stimulus-response pairs such as 'worst-west', 'step-stop', 'staff-stuff', 'spoke-speak', 'task-cask', 'scurry-scary', and 'left-lift' indicate that the subject was successfully identifying many unvoiced-fricative/unvoiced-stop consonant combinations.

4. The syllable /ʃən/ (e.g., tension) was presented 24 times in the novel words, and 11 responses to these words also contained this syllable. An additional 4 responses contained the voiced syllable /ʒən/ (e.g., decision). Thus the subject reliably identified this syllable, even though the 250-word acquired vocabulary only contained the voiced syllable /ʒən/ once (television).

5. The voiced stop consonants /b/ and /d/ were frequently omitted when they occurred at the beginning of a word, as the stimulus-response pairs 'dear-ear' and 'bring-ring' demonstrate.

6. When the approximants /r/ and /l/ occurred in a consonant cluster they were not always detected, as stimulus-response pairs such as 'sleep-soap', 'glass-gas', and 'creepy-teepee' demonstrate.

The patterns of errors that occurred in the lipreading condition will not be discussed in detail here. A substantial amount of research exists that reports the lipreading confusions that are typically found. See: Woodward and Barber, 1960 (28); Fisher, 1968 (29); Jeffers and Barley, 1971 (30); Franks and Kimble, 1972 (31); Binnie, Jackson, and Montgomery, 1976 (32). The present findings mirror those results.

Briefly, confusions occurred between phonemes that have the same mouth-movements.

Examination of stimulus and response pairs in the L+TV condition indicated that many of the errors could be categorized as follows:

1. Incorrect identification of vowels: e.g., 'done-den', 'press-price'.

2. Voicing errors: e.g., 'bump-pump', 'surface-service' and 'view-few'.

3. The consonant /h/ was omitted or added in several cases, such as the following ones: 'older-holder', 'arm-harm' and 'he's-ease'.

Phonemic analysis of stimulus-response pairs

Response patterns in all three conditions reveal that simple calculation of the number of words correctly identified does not accurately reflect the total amount of information that is available through the tactile vocoder. In order to obtain additional information about the capability of the system, a phonemic analysis of the data was undertaken. Data analysis in the first stage involved phonemic transcription of the 6000 stimulus and response words. A graduate in linguistics, familiar with the Southern Ontario accent, performed the transcription. The 14 vowel and 25 consonant phonemes employed in the transcription are shown in Table 3.

Stimulus and response words were then matched, phoneme-for-phoneme. Two extra symbols denoted that a phoneme existed in the response word that did not exist in the stimulus word, and vice versa. For example, if the response 'face' was uttered for the stimulus word 'faces', a symbol indicated that when /z/ was presented no response had occurred. If the stimulus and response words differed markedly, the match was performed in serial order. The phonemic transcription was undertaken on the 3000 stimulus-response pairs in the L, TV, and L+TV conditions.

A reliability check was undertaken on the transcription and matching procedures. Five percent of the stimulus/response pairs were randomly chosen, and transcription and matching carried out by a second linguist was compared to that of the first. Agreement between the two linguists was 97 percent.

A computer program constructed a matrix of the stimulus-response phoneme pairs that occurred in each of the three conditions, and calculated percentage-correct scores.
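The matching and tallying step lends itself to a small illustration. The sketch below assumes the transcriptions are already available as lists of phoneme symbols; the simple positional alignment, the placeholder phoneme notation, and the toy 'faces/face' example are stand-ins for the hand alignment described above, not the original program.

```python
# Sketch of the phoneme-level scoring described above.  The positional
# alignment and the example transcriptions are illustrative only; the
# original matching was carried out by hand by a linguist.
from collections import Counter
from itertools import zip_longest

INS = "+"   # response phoneme with no counterpart in the stimulus word
DEL = "-"   # stimulus phoneme that received no response

def align(stimulus, response):
    """Pair phonemes in serial order, padding the shorter word."""
    return list(zip_longest(stimulus, response, fillvalue=None))

def score(pairs_by_word):
    """Build a stimulus/response confusion count and a percent-correct score."""
    confusion = Counter()
    presented = correct = 0
    for stim, resp in pairs_by_word:
        for s, r in align(stim, resp):
            confusion[(s if s is not None else INS,
                       r if r is not None else DEL)] += 1
            if s is not None:
                presented += 1
                correct += (s == r)
    return confusion, 100.0 * correct / presented

# Example: response 'face' /f e s/ to stimulus 'faces' /f e s @ z/ --
# the final /@ z/ of the stimulus receives no response.
pairs = [(["f", "e", "s", "@", "z"], ["f", "e", "s"])]
matrix, pct = score(pairs)
print(round(pct, 1), matrix[("z", DEL)])   # 60.0 percent correct; one /z/ missed
```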

It should be noted that in all instances the effects of context should not be ignored. Although a phonemic analysis has been undertaken on the data, the original data concerned words. The subject was not asked to identify individual phonemes, or CV's, from a fixed set, but instead attempted to identify English words. By responding with words, contextual cues were brought into play. Consequently, some phonemes were undoubtedly added or missed so that a real word was given as a response.

Results from the phonemic analysis are displayed in Table 3 for the TV, L, and L+TV conditions.

TABLE 3
Identification of Phonemes within Words

Vowels              TV       L       L+TV
i   (beat)         33.2%    72.5%   87.3%
ɪ   (bit)          36.7     60.5    84.2
e   (bait)         21.9     66.4    80.4
ɛ   (bet)          31.7     65.0    82.5
æ   (bat)          3&7      67.1    85.7
ɑ   (bought)       22.9     82.2    95.5
o   (boat)         18.5     85.8    96.6
ʊ   (put)           0       44.0    88.2
u   (boot)         22.0     82.8    95.8
ʌ   (but)          37.9     61.8    81.2
aɪ  (bite)          7.1     69.6    88.7
aʊ  (bout)         60.0     88.9   100.0
ɔɪ  (boy)           0       44.4   100.0
ɝ   (bird)         30.0     72.2    90.2

Consonants          TV       L       L+TV
p                  17.5%    74.2%   91.4%
t                  66.4     52.1    83.2
k                  46.5     40.6    79.3
b                  26.0     59.8    82.5
d                  41.1     47.7    73.6
g                  11.7     53.3    67.3
m                  16.3     74.1    95.6
n                  43.0     53.0    70.7
ŋ   (sing)         39.0     4&8     89.8
f                  45.5     90.6    96.2
θ   (thin)         23.8     96.0    96.7
ʃ   (shin)         48.3     43.7    87.0
tʃ  (chin)         51.7     53.6    86.8
s                  76.2     57.2    85.5
h                  3Q]}     74.4    83.3
hw                  0       88.0   100.0
v                  22.5     77.9    94.1
ð   (the)          27.8     92.8   100.0
z                  53.2     30.3    84.7
dʒ  (jaw)          22.8     041     88.9
ʒ   (vision)       25.0     37.5   100.0
w                  13.1     78.2    92.0
r                  21.7     85.2    91.9
j   (yes)          32.2     37.5    70.3
l                  15.4     84.5    91.7

Total Correct
Phonemes           36.7%    64.9%   85.5%

When response words were incorrect, the responses often contained many correct phonemes. Although in the TV condition only 8.8 percent of the words were correctly identified, if the responses were analyzed at the phonemic level the actual number of phonemes correctly identified was 36.7 percent. Similarly, in the L and L+TV conditions, the percentages of words correctly identified were 39.4 percent and 68.7 percent, respectively, whereas at the phonemic level, response words contained 64.9 percent and 85.5 percent correct information.

Once again, it should be noted that the subject received three trials for each word in the TV condition and one trial for words in the other conditions. Thus the TV condition should not be compared directly with the L or L+TV conditions, or the comparison must be made acknowledging the procedural differences.

An average of 4882 phonemes were presented in each condition, and the frequency of occurrence for each phoneme followed the frequency with which they occur in the English language. In the L+TV condition, 32 of the 39 phonemes were correctly identified more than 80 percent of the time, whereas in the L condition 9 of the 39 identification scores were greater than 80 percent. In every case, performance with the tactile vocoder and lipreading was superior to lipreading performance alone. When the total number of correct phoneme identifications was taken as a percentage of the total number of phonemes presented, it was found that 64.9 percent and 85.5 percent of phonemes were correctly identified in the L and L+TV conditions, respectively, an improvement of 20.6 percentage points.

In the TV condition, 36.7 percent of phonemes were correctly identified. The six phonemes with the highest percent-correct scores were /t/, /k/, /s/, /z/, /ʃ/, and /tʃ/. One characteristic shared by these sounds is that they contain primarily high frequency energy. Identification of /t/, /ʃ/, /s/, and /z/ was higher with the tactile vocoder alone than with lipreading alone.

Identification of phonemes within linguisticcategories

From visual analysis of Table 2 it is apparent that the majority of errors were not random. Phonemic substitutions in the response words were often from the same linguistic category as the stimulus phoneme. To examine this more thoroughly, frequencies from the stimulus-response matrix of phonemes were summed to form the categories of unvoiced stops, voiced stops, unvoiced fricatives, voiced fricatives, and approximants. Detailed results were described in Brooks in 1984 (23). Briefly, identification using only the tactile vocoder was 74.7 percent for unvoiced fricatives and 73.4 percent for unvoiced stops. For the L+TV condition, identification of unvoiced stops, unvoiced fricatives, voiced fricatives, and approximants hovered around 90 percent correct, indicating that the features defining the categories were reliably represented through the combination of the tactile vocoder and lipreading.

Identification of number of syllables and stress pattern

One of the striking observations that can be made from Table 2 is that stimulus and response words in the TV condition follow the same rhythmic pattern in most instances. Even when the response is incorrect, as in 'sisterly-history', the number, duration, and stress pattern of the syllables were identical.

To examine this relationship more closely, the syllable patterns of the words in the TV, L, and L+TV conditions were scored according to the dictionary entry for each word. Of the 3000 stimulus words, 82.0 percent of them fell within the five following categories:

1. Monosyllabic words: one-syllable words
2. Trochees: two-syllable words, first syllable stressed, second unstressed (e.g., father)
3. Iambs: two-syllable words, first syllable unstressed, second stressed (e.g., hotel)
4. Dactyls: three-syllable words, first syllable stressed, second syllable unstressed, third syllable unstressed (e.g., history)
5. Amphibrachs: three-syllable words, first syllable unstressed, second syllable stressed, third syllable unstressed (e.g., together)

To ensure reliable probabilities, only these five categories of stress pattern were examined. Table 4 displays the percentage correct within each category for the three reception conditions. The total number of cases where the stress patterns of the stimulus and response words matched was also taken as a percentage of total stimuli presented. The percentage of response words that had stress patterns identical to those of the stimulus words was 88.0, 86.6 and 96.3 for the TV, L, and L+TV conditions, respectively. The L+TV condition contained the highest performance score for each of the five categories.
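As an illustration of this scoring, the short sketch below classifies words into the five stress categories from a stress template and computes the percentage of stimulus-response pairs whose patterns match. The miniature stress dictionary and the sample pairs are hypothetical stand-ins for the dictionary entries and word lists actually used.

```python
# Sketch of the stress-pattern scoring described above.  The tiny stress
# "dictionary" and the sample stimulus/response pairs are hypothetical.
CATEGORIES = {          # stress template -> category name ("1" = stressed syllable)
    "1":   "monosyllable",
    "10":  "trochee",
    "01":  "iamb",
    "100": "dactyl",
    "010": "amphibrach",
}

STRESS = {              # hypothetical dictionary entries (word -> stress template)
    "father": "10", "history": "100", "sisterly": "100",
    "hotel": "01", "together": "010", "step": "1", "stop": "1",
}

def category(word):
    """Return the stress category of a word, or None if outside the five classes."""
    return CATEGORIES.get(STRESS.get(word, ""))

def percent_matching(pairs):
    """Percent of stimulus/response pairs whose stress categories are identical."""
    scored = [(category(s), category(r)) for s, r in pairs]
    scored = [(s, r) for s, r in scored if s is not None]   # keep the five classes only
    matches = sum(s == r for s, r in scored)
    return 100.0 * matches / len(scored)

# Example: 'history' -> 'sisterly' keeps the dactylic pattern even though the
# word is wrong; 'step' -> 'stop' trivially matches as monosyllables.
print(round(percent_matching([("history", "sisterly"),
                              ("step", "stop"),
                              ("father", "hotel")]), 1))
# -> 66.7 (2 of 3 pairs share the stress pattern)
```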

TABLE 4
Identification of Syllabic Stress: within each of five named categories of stress pattern in one-, two-, and three-syllable words, the percentages indicate the percent of words for which the response word showed number, duration, and stress pattern of syllables identical to the stimulus word, under each of the three reception conditions.

Syllabic Stress     TV       L       L+TV
Monosyllable       95.9%    92.2%   97.8%
Trochee            82.7     83.8    97.6
Iamb               79.2     82.9    92.5
Dactyl             50.0     71.4    89.5
Amphibrach         07.9     82.9    91.1
Total              88.0     86.6    96.3

The most general error observed was that underestimation of the number of syllables occurred more frequently than overestimation, across conditions. In the TV condition, when dactyls were presented, 42.5 percent of the time the response word was a trochee, a very striking confusion. In most of these cases a nasal, approximant, or vowel existed between two syllables in the stimulus word, and the response word given indicated that this syllable was missed.

One vital point should be emphasized. This syllable analysis was undertaken because of patterns that had been observed in the data. But in the experiment, a stimulus word was given and the subject's task was to respond with a word. If the subject had been asked simply to tap out the rhythmic pattern of the word, the results might have been quite different.

DISCUSSION

Two particularly striking results arose from this study. First, after limited training that involved the acquisition of a 250-word vocabulary through the tactile vocoder, a subject was able to identify 8.8 percent of the random English words presented through the system. Words for which the subject had never received prior tactual experience were recognized correctly. In addition, the incorrect responses were not random, but instead often contained many phonemes that were identical to those found in the stimulus words. Further, when the phonemes in the stimulus and response words were not identical, the substitution phonemes often belonged to the same linguistic category as the stimulus phonemes. Thus, important information about speech is available through the tactile vocoder.

The second striking result is the impressive performance obtained when the tactile vocoder and lipreading were used in conjunction. Response words were identical to the stimulus words 68.7 percent of the time, and at the phonemic level 85.5 percent of the response information was correct. Thirty-two of the 39 phonemes were identified correctly more than 80 percent of the time.

The fact that words were used as stimuli in these experiments complicated the work of differentiating between the effects of the particular reception condition and those of language competence. Language skill is acknowledged to be a crucial factor in speech perception, and the difficulty of the word identification task was reduced because of linguistic predictability within words. The experiment could have been designed to test performance employing nonsense syllables or CV's as stimuli, thereby removing within-word contextual effects. However, the present philosophy of the researchers is that the device should be tested primarily in situations that approximate the normal environment, rather than by concentrating attention on analytic questions or by requiring identification of elements.

It is also apparent that the tactile vocoder and lipreading provided different types of information. If the information available through the tactile vocoder duplicated that supplied by lipreading, one would expect the L and L+TV scores to be similar. However, 20.6 percentage points more phonemes were correctly identified in the L+TV condition than in the L condition, and for every phoneme, identification was higher in the L+TV condition.

The types of information available through lipreading and the tactile vocoder appear to be complementary, as well as supplementary. Specifically, four features that were detectable through the tactile vocoder (but were difficult to lipread) were voicing, frication, nasality, and plosion. The two types of information used in conjunction may provide sufficient information to allow knowledge of contextual constraints to limit the alternatives even more.

One final point to note about the L+TV condition is the apparent ease with which the two sources of information were integrated. The very first time the subject used both the tactile vocoder and lipreading together, 68.0 percent of the words were correctly identified. It appears that speech features can be extracted by independent modalities and combined by higher-order processors, and that both the extraction and integration can occur with ease.

Number of syllables and stress pattern of words were identified well in the TV, L, and L+TV conditions. Performance in the L+TV condition, where 96.3 percent of the response words had stress patterns identical to those of the stimulus words, approached perfect performance. Remember that this value was calculated by taking the total number of cases where the stimulus and response words had identical stress patterns compared to the total number of stimuli presented. Thus, the frequently occurring stress patterns contributed more heavily to the measure than did the infrequent ones. This ensures that the percentage has more meaning for generalizing to real-world situations than it would have if all categories had been presented equally.

Of particular interest was the high correspondence between the stress patterns of stimulus words and response words in the TV condition. Some researchers (33, 34, 35, 36) have expressed concern that, since spectral displays divide the acoustic waveform into many channels, there is an absence of a single coherent amplitude envelope. The importance of the envelope, as an integrator for the more rapid spectral changes, has been stressed by Cole and Scott in 1974 (37). Kirman in 1973 (35) has suggested that failure of spectral displays may be due to the lack of an available amplitude envelope. Scott et al. in 1977 (36) stated that the impressive performance of Gault's subjects, who in 1924 were able to identify any one of 120 sentences, may be due to the fact that Gault's single vibrator provided the rhythmic patterns of stress (1). The present results, where 88 percent of the response words in the TV condition had patterns of stress identical to those of the stimulus words, contradict these ideas.

Results from the present study also do not agree with more recent studies. Bourgeois and Goldstein in 1983 (38) reported tests of visual (L), tactual (TV), and visual-plus-tactual (L+TV) perception of number of syllables in sentences that contained 1, 3, 5, and 7 syllables. The device used in the tactual conditions was the MESA: see Sparks et al., 1978 (11). MESA is a two-dimensional frequency/amplitude display. Five normal, temporarily deafened subjects were familiarized with the conditions for three one-hour sessions before testing began. The percentage of correct syllable-tapping responses for monosyllabic stimuli was 62.7, 94.0, and 80.0 percent for the TV, L, and L+TV conditions, respectively. The lipreading results were very similar to those obtained in the present study (92.2 percent), so the comparison between studies is probably valid. However, performance in the TV condition differed dramatically. In the TV condition in the present experiment, monosyllabic words were correctly identified 95.9 percent of the time, which is 33.2 percentage points higher than the results obtained by Bourgeois and Goldstein's subjects. The trend was similar for three-syllable stimuli: identification was 42.3 percent in the Bourgeois and Goldstein study and 60.7 percent in the present study. It is possible that differences between subjects could account for this discrepancy, since the present study uses a single subject, whereas Bourgeois and Goldstein report an average over five subjects. However, the most obvious explanation for the difference in results is the amount of training the subjects in the two studies received using the tactile vocoder. Approximately 200 hours were received by the subject previous to this study, whereas only 3 hours of training were given in the Bourgeois and Goldstein study (38).

In determining syllable number, an inexperienced user generally counts the number of "bounces" felt through the tactile vocoder. Since the frequency content of the incoming auditory signal is divided into many channels, this is not an accurate indication of number of syllables. However, an experienced subject learns to judge syllable stress and number accurately by means other than "bounce counting". The subjects in Bourgeois and Goldstein's study were never provided with feedback, so they were unable to adopt a more reliable strategy for identifying syllables.

Discouraging results were also reported by Beachler and Carney in 1981 (39); they examined syllabic stress patterns as well as doing syllabic counting. Normal hearing subjects were tested on their ability to detect syllabic stress through two different vibrotactile instruments.

The first device was the Fonator, described by Schulte in 1970 (40). It delivers an unfiltered vibratory signal to the subject's fingers or palms.

The second device was the 24-channel tactile vocoder designed by Engelmann and Rosov, who published in 1975 (7). Their device more closely resembles the device used in this experiment. The subjects were asked to identify the number of syllables present in 70 words of 1, 2, and 3 syllables, and the syllabic stress of 10 examples each of spondees, trochees, iambs, dactyls, amphibrachs, and anapests.

For identification of syllable number, both trained and untrained subjects scored around 66 percent when tested on the Fonator. Performance scores on their tactile vocoder after six training sessions (58%) were 20 percent higher than pre-training scores, although in general performance was poorer than that found on the Fonator.

In the present study, the data were reanalyzed to examine identification of number of syllables in one- to three-syllable words through the tactile vocoder alone; it was found that 90.7 percent of the stimulus words had response words with the same number of syllables. Performance of the experienced subject in this study (90.7%) was substantially better than that obtained by Beachler and Carney's subjects (58%) on a similar system.

Results for the identification of syllable stress showed much the same trends. Regardless of training, all subjects tested on the Fonator in Beachler and Carney's 1981 study scored in the 47-49 percent range. Subjects trained on their vocoder scored an average of 44 percent correct, whereas untrained subjects scored 38 percent. Data for the present study were analyzed to examine syllabic stress identification for the six categories of two- and three-syllable words that were used in the Beachler and Carney study. For the 491 words that fell in these six categories, 391 (78.6%) had syllabic stress identical to the stimulus words. This is much superior to the average of 44 percent scored by Beachler and Carney's subjects (39).

Beachler and Carney concluded that caution should be exercised in using vibrotactile devices, particularly multichannel ones, to teach syllable number and stress. However, the results from the present study strongly suggest that these conclusions were premature. Six sessions of training may not be sufficient to judge the capabilities of any tactile vocoder. In the tactile vocoder condition in the present study, 88.0 percent of response words had syllabic stress identical to that of the stimulus words, which indicates that very important stress information is being transferred through the device.

The most noteworthy problem in detection of syllabic stress that existed in the TV condition was the frequent lack of detection of syllables when /l,w,r,j,m,n/ occurred at syllabic boundaries. This is a problem that had been observed previously with our tactile vocoder by Scilley (20). Erber (34) has attempted to show this optically by displaying the envelope patterns of words that have been bandpass-filtered between 200 and 600 Hz. Most two-syllable words produce a pattern that resembles what we hear. However, many words do not produce this typical pattern, and the anomalies occur whenever syllables are joined by phonemes such as nasals or approximants. For example, the word "lemon" appears as a single energy burst. Thus the lack of detection of syllable boundaries that sometimes occurs when /l,w,r,j,m/ or /n/ are at the boundaries may be a result of the acoustic properties of the waveform, and of our lack of an accurate definition and understanding of the word "syllable."

In conclusion, the results of this paper indicate that a very large amount of information is available through the tactile vocoder. Open set words were identified correctly with information obtained only through the tactile vocoder. When this information was integrated with information available through lipreading, 85.5 percent of the phonemes within words were correctly identified.

The following paper examines the performance of the same experienced subject, using open set sentences plus the tracking procedure. These stimuli provide syntactic and semantic contextual information that is not present in isolated words. Using continuous speech also allows examination of the problem of parsing continuous speech into words.

REFERENCES

1. Gault RH: Progress in experiments on tactile interpretation of oral speech. J Abn Soc Psych 19:155-159, 1924.
2. Wiener N, Wiesner J: Felix (Sensory replacement project). Q Prog Rep Res Lab Electron MIT, 1949-1951.
3. Guelke RW, Huyssen RMJ: Development of apparatus for the analysis of sound by the sense of touch. J Acoust Soc Am 31:799-809, 1959.
4. Pickett JM, Pickett PH: Communication of speech sounds by a tactual vocoder. J Speech Hear Disord 6:207-222, 1963.
5. Ifukube T, Yoshimoto C: A sono-tactile deaf-aid made of piezoelectric vibrator array. J Acoust Soc Jpn 30(8):461-462, 1974.
6. Kirman JH: Tactile perception of computer-derived formant patterns from voiced speech. J Acoust Soc Am 55:163-169, 1974.
7. Engelmann S, Rosov R: Tactual learning experiment with deaf and hearing subjects. Except Child 41:243-253, 1975.
8. Goldstein MH, Stark RE: Modification of vocalizations of preschool deaf children by vibrotactile and visual displays. J Acoust Soc Am 59(6):1477-1481, 1976.
9. Saunders FA, Hill WA, Simpson CA: Speech perception via the tactile mode: Progress report. Proc IEEE Int Conf Acoust Speech Sig Processing, Philadelphia, pp 594-597, 1976.
10. Yeni-Komshian GH, Goldstein MH: Identification of speech sounds displayed on a vibrotactile vocoder. J Acoust Soc Am 62:194-198, 1977.
11. Sparks DW, Kuhl PK, Edmonds AE, Gray GP: Investigating the MESA (multipoint electrotactile speech aid): The transmission of segmented features of speech. J Acoust Soc Am 63(1):246-257, 1978.
12. Sparks DW, Ardell LA, Bourgeois MR, Wiedmer B, Kuhl PK: Investigating the MESA (multipoint electrotactile speech aid): The transmission of connected discourse. J Acoust Soc Am 65(3):810-815, 1979.

13. Oller KD, Payne SL, Gavin WJ: Tactual speech perception by minimally trained deaf subjects. J Speech Hear Res 23(4):769-778, 1980.

14. Clements MA, Durlach NI, Braida LD: Tactile communication of speech: Comparison of two spectral displays in a vowel discrimination task. J Acoust Soc Am 72(4):1131-1135, 1982.

15. Reed CM, Durlach NI, Braida LD: Research on tactile communication of speech: A review. ASHA Monographs 20, 1982.

16. Sherrick CE: Research on tactile aids for deaf people: Progress and prospects. J Acoust Soc Am 75(5):1325-1342, 1984.

17. Martin BD: Some aspects of visual and auditory information transfer by cutaneous stimulation. Unpubl M.Sc. thesis. Queen's University, Kingston, Canada, 1972.

18. Kofmel L: Perception of environmental and speech sounds through a vibrotactile device. Unpubl B.A. thesis. Queen's University, Kingston, Canada, 1979.

19. Brooks PL, Frost BJ: Evaluation of a tactile vocoder for word recognition. J Acoust Soc Am 74:34-39, 1983.

20. Scilley PL: Evaluation of a vibrotactile auditory prosthetic device for the profoundly deaf. Unpubl M.A. thesis. Queen's University, Kingston, Canada, 1980.

21. Brooks PL, Frost BJ, Mason JL, Chung K: Identification of words through the tactile vocoder. J Acoust Soc Am (in press).

22. DeFilippo CL, Scott BL: A method for training and evaluating the reception of ongoing speech. J Acoust Soc Am 63(4):1186-1191, 1978.

23. Brooks PL: Comprehension of speech by normal-hearing and profoundly deaf subjects using the Queen's University tactile vocoder. Unpubl doctoral dissertation. Queen's University, Kingston, Canada, 1984.

24. Brooks PL, Frost BJ, Mason JL, Gibson DM: Identification of words and manner of articulation features through a tactile vocoder by profoundly deaf subjects. J Speech Hear Res (in press).

25. Gibson DM: Tactile Vocoder User Manual. Queen's University, Kingston, Canada: Dept Electrical Engineering, 1983.

26. Kucera H, Francis WN: Computational analysis of present-day American English. Rhode Island, Brown Univ Press, pp 5-11, 1967.

27. O'Rourke TJ: A basic course in manual communication. Silver Spring, MD: Nat Assn Deaf, pp 107-119, 150-161, 1973.

28. Woodward MF, Barber CG: Phoneme perception in lipreading. J Speech Hear Res 3(3):212-222, 1960.

29. Fisher CG: Confusions among visually perceived consonants. J Speech Hear Res 11:796-804, 1968.

30. Jeffers J, Barley M: Speechreading. Springfield, IL, Chas. C Thomas, pp 41-78, 1971 (Fifth printing, 1977).

31. Franks JR, Kimble J: The confusion of English consonant clusters in lipreading. J Speech Hear Res 15:474-482, 1972.

32. Binnie CA, Jackson PL, Montgomery AA: Visual intelligibility of consonants: A lipreading screening test with implications for aural rehabilitation. J Speech Hear Disord 41:530-539, 1976.

33. Erber NP: Speech-envelope cues as an aid to lipreading for profoundly deaf children. J Acoust Soc Am 51:1224-1227, 1972.

34. Erber NP: Vibratory perception by deaf children. Int J Rehab Res 1:27-37, 1978.

35. Kirman JH: Tactile communication of speech: A review and an analysis. Psychol Bull 80(1):54-74, 1973.

36. Scott BE, DeFilippo C, Sachs R, Miller J: Evaluating with spoken text a hybrid vibrotactile-electrotactile aid to lipreading. In Papers from the Research Conference on Speech-Processing Aids for the Deaf. Washington, D.C.: Gallaudet College, pp 106-118, 1977.

37. Cole RA, Scott B: Toward a theory of speech perception. Psychol Rev 81:348-374, 1974.

38. Bourgeois MS, Goldstein H: Visual and tactual perception of syllable number in sentence stimuli. J Acoust Soc Am 73(1):367-371, 1983.

39. Beachler CA, Carney AE: Vibrotactile perception of suprasegmentals: A comparison of single-channel and multi-channel aids. J Acoust Soc Am 69(S1):S123 (abstract), 1981.

40. Schulte K: Fonator System. In International Symposium on Speech Communication Ability with Profound Deafness. Washington, D.C.: Alex Graham Bell Assn Deaf, pp 351-353, 1970.
