1
FUL: Incorporating phonological theory into ASR
Aditi Lahiri (Prof in Oxford)
Henning Reetz (Prof in Frankfurt)
presented by Jacques Koreman
Jacques Koreman (ISK), presentation speech group IET at NTNU
2
Acknowledgment and responsibilities
• Some of the slides (in Times New Roman) were made available by Henning Reetz
• The ideas are all Aditi’s and Henning’s
• Their (mis)representation is mine…
3
What is FUL, and why is it interesting?
FUL stands for featurally underspecified lexicon. This presentation addresses its main characteristics:
• Underspecified features are omitted from the underlying representation
• Non-stochastic approach, in contrast to any current techniques in ASR
• Psychological reality proven by psycholinguistic and other evidence
4
An example of underspecification
Underspecification can help to deal with assimilation, as for instance in spontaneous speech:
green bag, green grass are often realised as greem bag, greeng grass
while lame dog, long day are never realised as lane dog, lon day
Why? Because /n/ is underspecified for place and can therefore borrow a place feature from its neighbour, while /m/ is [LABIAL].
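The contrast on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `STORED_PLACE`, `NEIGHBOUR_PLACE` and `surface_places` are hypothetical names, not part of the FUL system, and the segment inventory is reduced to the examples shown here.

```python
# Hypothetical mini-inventory: only place features stored in the lexicon are listed.
# None means the segment is underspecified for place (assumption: coronals only).
STORED_PLACE = {
    "n": None,       # /n/: no place feature stored
    "m": "LABIAL",   # /m/: [LABIAL] stored
    "ng": "DORSAL",  # velar nasal as in "long": [DORSAL] stored (assumption)
}

# Surface place of the following consonant (illustrative subset).
NEIGHBOUR_PLACE = {"b": "LABIAL", "g": "DORSAL", "d": "CORONAL"}


def surface_places(final_segment: str, next_segment: str) -> set[str]:
    """Return the places with which a word-final nasal may be realised."""
    stored = STORED_PLACE[final_segment]
    if stored is not None:
        # A stored place feature is kept; nothing is borrowed.
        return {stored}
    # No stored place: the default coronal realisation, or the neighbour's place.
    return {"CORONAL", NEIGHBOUR_PLACE[next_segment]}


print(surface_places("n", "b"))   # green bag
print(surface_places("n", "g"))   # green grass
print(surface_places("m", "d"))   # lame dog
print(surface_places("ng", "d"))  # long day
```

In this sketch only the segment without a stored place feature picks up the place of its neighbour, which mirrors why "greem bag" occurs but "lane dog" does not.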
5
FUL featural specification
The specification of features is constrained by universal properties and language-specific requirements: for German [ABRUPT] and [CORONAL] (cf. ”green”) are not specified in the lexicon.
• FUL uses monovalent, not binary features
• V and C share the same place features
The types of features are very much under debate: binary or monovalent, fully specified or underspecified, V and C features together or separate, feature names?
On the next slide, the latest version of the feature hierarchy in FUL is shown.
6
Latest version FUL feature hierarchy
7
Lexical entries/access in FUL
• Entries contain underspecified representations.
As opposed to standard full, binary specification!
• Each morpheme has a unique representation.
Diametrically opposed view of dealing with variation in the signal compared to exemplar-based modelling!
• Rough signal parameters mapped onto phonological features (no segments, syllables or other intermediate representation)
Unlike detailed acoustic analysis in other systems!
• Features used to directly access the lexicon using a non-stochastic, ternary matching procedure.
Human speech processing as opposed to pattern matching?
8
ASR on the basis of a FUL
How does ASR with FUL work?
Slides 9-18 explain the recognition steps in the FUL system.
Why does ASR with FUL work?
After that, evidence for the approach from human speech processing will be presented.
9
Overview of the FUL system
• Acoustic signal (stream of samples)
• Acoustic front end
• Stream of phonological features
• Matching process (match / no mismatch / mismatch) against the word lexicon (representation with phonological features)
• Word candidates
• Phonological & syntactic parsing
• Levels listed in the diagram: segments, prosody, morphology, syntax, semantics
10
Acoustic front end
• Acoustic signal (stream of samples, waveform)
• LPC, FFT, … (20 ms window, 1 ms step rate)
• Stream of formants and spectral shape parameters
• Heuristics (e.g. [high] := F1 < 450 Hz), 1 ms step rate
• Stream of features (labial, nasal, low, ...)
• Heuristics (e.g. length > 5 ms); synchronise features
• Stream of corrected and synchronised features
Could maybe also be landmarks…
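The two heuristics named on this slide can be sketched as follows. This is a minimal illustration, not the FUL front end itself: the function names `high_track` and `drop_short_runs` are hypothetical, the F1 values are made up, and only the thresholds (F1 < 450 Hz, length > 5 ms, 1 ms step rate) come from the slide.

```python
from itertools import groupby

HIGH_F1_MAX_HZ = 450.0  # from the slide: [high] := F1 < 450 Hz
MIN_RUN_MS = 5          # from the slide: a feature must last longer than 5 ms
STEP_MS = 1             # from the slide: 1 ms step rate


def high_track(f1_hz: list[float]) -> list[bool]:
    """Map one F1 value per frame onto the presence of [high]."""
    return [f1 < HIGH_F1_MAX_HZ for f1 in f1_hz]


def drop_short_runs(track: list[bool]) -> list[bool]:
    """Switch off stretches of a feature that do not exceed MIN_RUN_MS."""
    cleaned: list[bool] = []
    for value, run in groupby(track):
        n_frames = len(list(run))
        keep = value and n_frames * STEP_MS > MIN_RUN_MS
        cleaned.extend([keep] * n_frames)
    return cleaned


# Invented F1 track: 8 ms low F1, 4 ms high F1, a 3 ms dip, 2 ms high F1.
f1 = [300.0] * 8 + [600.0] * 4 + [400.0] * 3 + [700.0] * 2
raw = high_track(f1)
print(sum(raw), "frames marked [high] before filtering")
print(sum(drop_short_runs(raw)), "frames marked [high] after filtering")
```

The 3 ms dip below 450 Hz is too short to count as [high] and is removed, while the initial 8 ms stretch survives. Synchronising several feature tracks is not covered by this sketch.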
11
speech signal
formants
heuristics, e.g. [high] := F1 < 450 Hz
Acoustic front end... parameter extraction
12
Acoustic front end… to features
Phonological features: [son], [low], [high], …
13
Phonological features, filtered and synchronised
01111001001
01110001001
00110001010
00110000010
01110001000
10001000000
00110010101
00110001010
01110000000
10001000000
01110000001
10001000000
00110001010
01000001000
00110010010
01110010001
Features: [son], [low], [high], …
Acoustic front end… features filtered/synchronised
14
p f u a s b a i s t S p i t s ´
btdfv
vpbsz
UoO´
OE{o´e
SZtd
• • •
Lexicon search with underspecified features
(same filtered and synchronised feature stream as on slide 13)
Acoustic front end… lexical access with features
15
The crunch of FUL: ternary matching
Features computed from the signal at one instance in time: labial nasal [m]
Features stored in the lexicon, with the result of matching:
• /m/ labial nasal: match
• /n/ nasal: no mismatch
• /p/ labial consonantal: no mismatch
• /s/ strident: no mismatch
• • •
16
The crunch of FUL: ternary matching
Features computed from the signal at one instance in time: labial nasal [m]
• /m/ labial nasal (stored in the lexicon): match
• /n/ nasal (stored in the lexicon): no mismatch
Features computed from the signal at another instance in time: coronal nasal [n]
• /m/ labial nasal (stored in the lexicon): mismatch
• /n/ nasal (stored in the lexicon): no mismatch
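The three outcomes can be sketched as a small function over sets of monovalent features. This is a hedged illustration, not the FUL implementation: the `CONFLICTS` table, the `LEXICON` entries and the function `ternary_match` are assumptions made for this example, and only place features are allowed to conflict.

```python
# Assumption: which feature extracted from the signal conflicts with which
# stored feature. [CORONAL] is never stored, so nothing conflicts with it.
CONFLICTS = {
    "CORONAL": {"LABIAL", "DORSAL"},
    "LABIAL": {"DORSAL"},
    "DORSAL": {"LABIAL"},
}

# Hypothetical lexical segments with their stored features (from the slides).
LEXICON = {
    "m": {"LABIAL", "NASAL"},
    "n": {"NASAL"},  # underspecified for place
    "p": {"LABIAL", "CONSONANTAL"},
    "s": {"STRIDENT"},
}


def ternary_match(signal: set[str], stored: set[str]) -> str:
    """Classify signal features against stored features."""
    for feature in signal:
        if CONFLICTS.get(feature, set()) & stored:
            return "mismatch"  # a signal feature contradicts a stored one
    if signal <= stored:
        return "match"  # every signal feature is stored in the lexicon
    return "no mismatch"  # nothing contradicts, but not everything is stored


for name, signal in [("[m]", {"LABIAL", "NASAL"}), ("[n]", {"CORONAL", "NASAL"})]:
    for segment, stored in LEXICON.items():
        print(name, "vs", f"/{segment}/", "->", ternary_match(signal, stored))
```

With these assumptions a labial [m] in the signal does not mismatch lexical /n/, whereas a coronal [n] in the signal mismatches lexical /m/, which is the asymmetry the slides illustrate.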
17
• • •
/fa/ {„fang!“ catch! verb, imp., ....}
{„fange“ I catch verb, 1st sg., ....}
{„fangen“ we catch verb, 1st pl.+ inf., ....}
{„fang an“ start! verb, imp., ....}
{„fange an“ I start verb, 1st sg., ....}
{„fangen an“ we start verb, 1st pl., ....}
{„fang auf“ catch! verb, imp., ....}
Morphological extension of underspecif.
18
The crunch of FUL: ternary matching
• Mismatches cause words in the lexicon to be dropped from the list.
• No-mismatches or matches do not, but lead to different scores for the word candidates by comparing the number of features derived from the signal with those specified in the lexicon:
score = (matching features)² / (features in lexicon × features in signal)
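Reading the score on this slide as the squared number of matching features divided by the product of the number of features in the lexicon and the number of features in the signal, the ranking of candidates can be sketched as below. The function name `score`, the candidate set and the guard against empty feature sets are assumptions for illustration; candidates with a mismatch are taken to have been dropped already.

```python
def score(signal: set[str], stored: set[str]) -> float:
    """(matching features)^2 / (features in lexicon * features in signal)."""
    if not signal or not stored:
        return 0.0  # assumption: avoid division by zero for empty sets
    matching = len(signal & stored)
    return matching ** 2 / (len(stored) * len(signal))


# Hypothetical candidates that survived matching for a labial nasal signal.
signal = {"LABIAL", "NASAL"}
candidates = {
    "m": {"LABIAL", "NASAL"},
    "n": {"NASAL"},
    "p": {"LABIAL", "CONSONANTAL"},
    "s": {"STRIDENT"},
}

ranked = sorted(candidates, key=lambda seg: score(signal, candidates[seg]), reverse=True)
for seg in ranked:
    print(f"/{seg}/", score(signal, candidates[seg]))
# /m/ 1.0
# /n/ 0.5
# /p/ 0.25
# /s/ 0.0
```

Under this reading a full match scores 1, and a candidate is penalised both for signal features it does not store and for stored features not found in the signal.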
An im-probable system?
19
An im-probable system? Evidence.
FUL stands for featurally underspecified lexicon.
• Underspecified features are omitted from the underlying representation
• Non-stochastic approach, in contrast to any current techniques in ASR
• Psychological reality proven by psycholinguistic and other evidence
20
Evidence for underspecification: semantic priming in lexical decision
• Crossmodal experiment (German):
– hear prime: Honig (honey) Hammer (hammer)
– see target: Biene (bee) Nagel (nail)
• Subjects’ task: lexical decision
• Pseudo-word Ho[m]ig primes Biene, but Ha[n]er does not prime Nagel
• Conclusion: [n] underspecified for place in lexicon leads to no-mismatch for Ho[m]ig, but [m] in lexicon is labial, thus mismatch for Ha[n]er
21
Evidence for underspecification: semantic priming in EEG
• The N400 is an event-related potential (ERP) component typically elicited by unexpected linguistic stimuli.
• It is characterized as a negative deflection peaking ca. 400 ms after stimulus presentation.
• In models of speech comprehension, the N400 is often associated with the semantic integration of words in sentence context; its presence is interpreted as evidence that a process working on semantics is active in that general time frame.
22
Evidence for underspecification: semantic priming in EEG
• word target: Hor[d]e (horde) Pro[b]e (test)
pseudo-word target: Hor[b]e (horde) Pro[d]e (test)
• Subjects’ task: speeded lexical decision
• Similar RTs for words and pseudo-words, but more errors in lexical decision for Hor[b]e (no-mismatch for Hor[d]e) than for Pro[d]e (mismatch on Pro[b]e)
Also large negative peak for Pro[d]e but not for Hor[b]e (which behaved similarly to real words).
• Conclusion: [d] underspecified for place in lexicon, but [b] specified as [LABIAL]
23
Evidence for underspecification: vowel listening in MEG experiment
• standard (continuous): [o:]
deviant (played once): [ø:]
• Subjects’ task: just listen….
• Asymmetrical mismatch negativity (MMN) effect (perception of change): greater for [o:]-[ø:] than for [ø:]-[o:], with a higher amplitude difference ca. 180 ms from onset of the deviant and an earlier effect.
Similar effects for other pairs.
• Conclusion: Results fit with underspecification
24
Evidence for underspecification
And there is more evidence
• from CVC gating experiments in English and Bengali, where a non-nasalised oral vowel could lead to both oral and nasal responses when the CV is heard (Lahiri & Marslen-Wilson, 1991, 1992)
• from priming experiments, suggesting there are two kinds of [o:] in German, one which is specified for [labial,dorsal] (Boote-Bötchen as primes for Boot), the other specified only for [labial] (Söhne-Söhnchen as primes for Sohn)
• from language change in Miogliola (Northern Italian), where two types of [n] were shown to exist, one [coronal], the other unspecified for place (Ghini, 2001).
….and more
25
Conclusions
• FUL is an implementation of phonological theory in ASR.
• FUL is firmly grounded in psycholinguistic experiments and observations on language change.
• FUL recognition is robust against variation in speech, but it contains no mechanisms to normalize for variation not directly related to the linguistic content (as we possibly do when we come to understand a speaker better after first meeting him or talking to him on the phone), nor to use this information.
26
References
This presentation was mainly based on
• (a draft version of) Lahiri, A. & Reetz, H. (2002). "Underspecified recognition", in C. Gussenhoven & N. Warner (eds.) Laboratory Phonology 7. Berlin: Mouton, 637-675.
• Lahiri, A. & Reetz, H. (submitted to J. Phon.). "Distinctive features: phonological underspecification in processing".
• See also: http://ling.uni-konstanz.de/pages/proj/sfb471/publ/d-3.html