Catalina Forensic Audio Toolbox

Version 3.0h

User’s Manual

Catalin Grigoras, Ph.D.


Copyright Notice

(C) 2007 Catalin Grigoras, Ph.D. [email protected]


Content

Introducing Catalina Forensic Audio Toolbox
    System Requirements
    Installation
    Getting Help
Interfacing Wavesurfer
    Fundamental Frequency
    Formants
    Long Term Average Spectrum
Catalina Forensic Audio Toolbox Basics
    General Plots
    Long Term Formants
    Formant Space
    Long Term Average Spectrum
Recommendations
Future Developments
References
Appendix A
Appendix B
Appendix C


Introducing Catalina Toolbox

Catalina Forensic Audio Toolbox is a software system for forensic audio analysis. This version has been designed for Windows 98/2000/Me/NT/XP/Vista.

Historical Note

The first version of the Catalina Forensic Audio Toolbox ("Catalina") was developed in 1993. At the time, PC speed and sound card quality were relatively low compared to present-day equipment. The most important updates were written while I completed my Ph.D. dissertation (1998-2001). The program evolved to rely on external software to analyze the speech fundamental frequency (F0), the formants F1, F2 and F3 (F123) and the long term average spectrum (LTAS). For the current version I use Wavesurfer 1.8.5, developed at the KTH Institute in Stockholm by Kåre Sjölander and Jonas Beskow. More details about this software can be found at http://www.speech.kth.se/wavesurfer/. The chapter "Interfacing Wavesurfer" explains how to use this software with Catalina.

History

- v3.0h (2007): added indications for intra-speaker variability,
- v3.0c (2006): first stand-alone version,
- v3.0b (2005): added information on individual vowels,
- v3.0a (2004): added information on long term cumulative formant distribution,
- v3.0 (2003): added information on long term average formants analysis,
- v2.0c (2002): added information on long term average spectrum histogram,
- v2.0 (1998-2001): my Ph.D. thesis, second major version of Catalina,
- v1.0 (1992): first version of Catalina, a Matlab toolbox.

System Requirements

1. A PC running Windows 98 SE, Windows XP or Windows NT
2. A computer having a CPU of at least 133 MHz
3. A copy of Wavesurfer, version 1.8.5 or later

Special Thanks

I wish to thank the International Association for Forensic Phonetics and Acoustics (IAFPA) for a grant to finish this latest version of the software. I also appreciate the very important help of Professor Francis Nolan of Cambridge University, Professor Brandusa Pantelimon of Bucharest University, and Durand R. Begault, Ph.D., Audio Forensic Centre, Charles M. Salter Associates, Inc., San Francisco, CA, USA.

I am grateful to the Cambridge Colleges Hospitality Scheme for making possible my visit to Cambridge in summer 2003.


Installation

Run CatalinaSetup.exe. By default the program will be installed in C:\Catalina and a shortcut will be placed on the Desktop. You should get the following folder structure:

C:\Catalina\bin - for executable files, do not modify it
C:\Catalina\Evidence - for WAV files to be analysed and saved with Wavesurfer
C:\Catalina\Plots - for graphical TIFF results
C:\Catalina\Results - for numerical results
C:\Catalina\toolbox - do not modify it

Getting Help

For further details, you can contact the author directly at [email protected]. In the e-mail subject line please indicate "Catalina Toolbox".

Interfacing Wavesurfer

Catalina depends on the fundamental frequency, formant and long term average spectrum analysis capabilities of Wavesurfer. Other programs that can export text versions of these analyses can also be used, but the demonstrations given here use Wavesurfer. Catalina expects a specific naming format for the analysis data exported from Wavesurfer to the 'Evidence' folder; this format is explained in detail below.

Run Wavesurfer and open a WAV file; linear PCM, 8 kHz, 16 bit, mono is recommended. You should get a window like the one shown in Figure 1. Select the Speech analysis configuration.

Figure 1. Wavesurfer, Choose Configuration option


The Wavesurfer display will show 3 plots: waveform, spectrogram with formant estimator tracking overlay, and fundamental frequency (see figure 2).

Figure 2. Wavesurfer Speech Analysis display

The following Wavesurfer settings are recommended for correct parameter extraction:

F0 (Properties > Pitch contour)
- Pitch method: ESPS
- Max pitch value: 200 Hz for male voices and 400 Hz for female voices
- Analysis window length: 0.0075 s
- Frame interval: 0.01 s

Spectrogram
- FFT window length: 256 points
- Analysis window type: Hamming
- Analysis bandwidth: 125 Hz, Window 64 points
- Pre-emphasis factor: 0.97
- Cut spectrogram at: 4000 Hz

Formants
- Number of formants: 4
- Analysis window length: 0.049 s
- Analysis window type: Hamming
- Pre-emphasis factor: 0.7
- Frame interval: 0.01 s
- LPC order: 12
- LPC type: 0
- Down-sampling frequency: 8000 Hz
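The recommended settings can be cross-checked with simple arithmetic. The following Python sketch assumes the usual relationships (FFT bin spacing = sample rate / FFT length, analysis bandwidth approximately sample rate / window length), which reproduce the pairing of 125 Hz with 64 points for 8 kHz audio. The function names are illustrative only and are not part of Wavesurfer or Catalina.

```python
SAMPLE_RATE_HZ = 8000  # recommended sample rate in this manual


def bin_spacing_hz(fft_length, sample_rate=SAMPLE_RATE_HZ):
    """Frequency distance between neighbouring FFT bins."""
    return sample_rate / fft_length


def window_points(bandwidth_hz, sample_rate=SAMPLE_RATE_HZ):
    """Window length in samples for a given analysis bandwidth (assumed fs / N)."""
    return round(sample_rate / bandwidth_hz)


def frames_per_second(frame_interval_s):
    """Number of analysis frames produced per second of signal."""
    return round(1.0 / frame_interval_s)


print(bin_spacing_hz(256))      # 31.25 (Hz per bin for a 256 point FFT)
print(window_points(125))       # 64 (points for a 125 Hz bandwidth)
print(frames_per_second(0.01))  # 100 (F0 and formant frames per second)
```

With a frame interval of 0.01 s, each second of signal therefore yields about 100 rows in the exported F0 and formant files.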

Figure 2 panel labels: waveform, spectrogram and formants, fundamental frequency.


Fundamental frequency

Fundamental frequency (F0) is the frequency of repetition of the (quasi-)periodic waveform of the voiced speech signal, corresponding closely to our perception of the pitch of the speech. F0 analysis can be performed with different algorithms either in the time or in the frequency domain.

In Wavesurfer the analysis can be carried out using either the AMDF (Average Magnitude Difference Function) algorithm or the ESPS (Entropic Speech Processing System) algorithm. By default Wavesurfer uses the ESPS algorithm. See the Wavesurfer manual for details about F0 settings.
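To illustrate the time-domain approach, here is a minimal Python sketch of an AMDF-style F0 estimate for a single frame. It is not Wavesurfer's implementation: the function name, the 60 Hz lower limit and the synthetic test tone are assumptions made for this example; the 200 Hz upper limit follows the recommended maximum for male voices.

```python
import math


def amdf_f0(frame, sample_rate, min_f0=60.0, max_f0=200.0):
    """Estimate F0 of one frame as the lag that minimises the AMDF."""
    min_lag = int(sample_rate / max_f0)  # shortest period searched
    max_lag = int(sample_rate / min_f0)  # longest period searched
    best_lag, best_value = None, float("inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(frame) - lag
        if n <= 0:
            break
        # Average magnitude difference between the frame and its shifted copy.
        value = sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n
        if value < best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else 0.0


sample_rate = 8000
# Synthetic 100 Hz tone, 50 ms long (hypothetical test signal).
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(400)]
f0 = amdf_f0(tone, sample_rate)
print(f0)
```

Restricting the lag range is what the "Max pitch value" setting does in practice: lags outside the range are never considered, so harmonics or subharmonics beyond the limits cannot be selected.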

To reduce spurious (false) F0 values introduced by other sounds or non-normal speech (e.g. Fig. 3), three methods can be applied: filter out the noise, delete the affected samples, or change the F0 limits in Wavesurfer. The last technique is primarily for limiting the effects of non-normal speech (e.g., falsetto).

Figure 3. F0 selection

You may create an F0 text file using the following steps: (1) select the entire wave by pressing F11, (2) select all with Ctrl+A, (3) right click on the F0 plot and save the data file as C:\Catalina\Evidence\filename.f0. For example, the file test.wav corresponds to the file test.f0.


Formants

In Wavesurfer the formant analysis can be carried out using linear prediction. By default Wavesurfer uses the 12th order LPC algorithm. (Refer to the Wavesurfer manual for details about formant settings.)

Catalina requires a text file containing data for the formants F1, F2 and F3. You will need to create an F123 text file using the following steps: (1) select the entire wave by pressing F11 or select all with Ctrl+A, (2) right click on the formants plot, (3) export the formant data file as filename.frm. For example, the file test.wav corresponds to the file test.frm.

Long Term Average Spectrum

The long term average spectrum (LTAS) is the mean of successive short-term spectral analyses computed over the duration of a given speech sample. Each short-time spectrum (computed by means of the discrete Fourier transform) reflects the phonetic quality of the current segment, whereas the LTAS characterizes the overall spectral content of the entire sample. The LTAS is influenced by the combined effect of the analyzed speech, the background noise, the equipment noise and the frequency response of the transmission chain.
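The definition of the LTAS can be sketched in a few lines of Python. This is a simplified illustration, not the Wavesurfer computation: it uses non-overlapping rectangular frames and a direct DFT, averages power before converting to dB, and all names (power_spectrum, ltas_db) and the 1000 Hz test tone are assumptions made for the example.

```python
import cmath
import math


def power_spectrum(frame):
    """Power of each DFT bin from 0 Hz up to the Nyquist frequency."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def ltas_db(samples, frame_length=64):
    """Average the short-term power spectra of consecutive frames, in dB."""
    frames = [samples[i:i + frame_length]
              for i in range(0, len(samples) - frame_length + 1, frame_length)]
    spectra = [power_spectrum(frame) for frame in frames]
    mean_power = [sum(column) / len(spectra) for column in zip(*spectra)]
    return [10 * math.log10(p + 1e-12) for p in mean_power]  # offset avoids log(0)


sample_rate = 8000
# Hypothetical test signal: a 1000 Hz tone lasting 80 ms.
signal = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(640)]
ltas = ltas_db(signal)
peak_bin = max(range(len(ltas)), key=ltas.__getitem__)
peak_hz = peak_bin * sample_rate / 64
print(peak_bin, peak_hz)
```

For real speech the individual frames differ widely from one another, and it is the averaging step that smooths out the phonetic detail and leaves the overall spectral shape of the speaker and channel.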

Right click on Waveform and select LTAS. Make certain that the entire waveform has been selected and that 'average of selection' has been chosen. The following plot will be displayed.

Fig.4. LTAS option

You now need to create an LTAS text file by clicking the 'export' option and saving it as filename.lts. For example, the file test.wav corresponds to the file test.lts.


Catalina Forensic Audio Toolbox

Catalina Forensic Audio Toolbox allows an examiner to compute statistics and create TIFF files containing text information and distribution plots for the data files exported from Wavesurfer or an equivalent software program:

- fundamental frequency F0,
- formants F1, F2 and F3,
- long term average formant distributions LTAF,
- long term cumulative formant distribution LTCF,
- F1-F2 space for vowels [a], [e], [i] and [o],
- F2-F3 space for vowels [a], [e], [i] and [o],
- long term average spectrum LTAS.

General Plots

Create or copy the filename.f0, filename.frm and filename.lts files to the C:\Catalina\Evidence folder. Run Catalina from the desktop icon or from C:\Catalina\bin\win32\Catalina3x.exe. Catalina will ask for the name of the F0 text file; select it from the C:\Catalina\Evidence folder. Catalina then searches the same 'Evidence' folder for the similarly named frm and lts export files, and writes plot files to C:\Catalina\Plots using the same naming convention.

As an example to demonstrate the program, select the included file test20sec. The software will start to compute statistics and create TIFF files stored in C:\Catalina\Plots. Check the resulting TIFF files in C:\Catalina\Plots:

01-test20sec.tif  voice profile containing F0, LTAS and F123 histogram plots
02-test20sec.tif  LTAF and LTCF
03-test20sec.tif  all F1 vs F2 formant space
04-test20sec.tif  all F2 vs F3 formant space
05-test20sec.tif  F1 vs F2 formant space
06-test20sec.tif  F2 vs F3 formant space
07-test20sec.tif  LTAS
08-test20sec.tif  LTAS, LTCF and LTAF
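The naming convention described above can be summarised with a short Python sketch. It only derives the expected file names from the selected F0 file; the helper names companion_files and plot_files are hypothetical and are not part of Catalina.

```python
from pathlib import PureWindowsPath


def companion_files(f0_path):
    """Names of the three export files Catalina expects side by side."""
    f0 = PureWindowsPath(f0_path)
    return [f0.with_suffix(ext) for ext in (".f0", ".frm", ".lts")]


def plot_files(f0_path, plots_dir=r"C:\Catalina\Plots"):
    """Names of the eight TIFF plots written for one evidence file."""
    stem = PureWindowsPath(f0_path).stem
    return [PureWindowsPath(plots_dir) / f"{i:02d}-{stem}.tif" for i in range(1, 9)]


selected = r"C:\Catalina\Evidence\test20sec.f0"
for path in companion_files(selected) + plot_files(selected):
    print(path)
```

A check of this kind can be useful before running Catalina on many files, to confirm that every .f0 file has its matching .frm and .lts exports.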


Figure 5. General plot obtained for the file test180sec.

The general plot in Figure 5 contains:

- fundamental frequency F0 histogram, mean and standard deviation F0 values, along with the total length of voiced signal; Catalina requires 8 kHz sample rate files to determine the duration correctly,
- long term average spectrum LTAS,
- long term average formant distributions LTAF for F1 (red), F2 (green), F3 (blue), mean and standard deviation for F1, F2 and F3,
- histograms for F1, F2 and F3.
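The F0 statistics listed above can be approximated from an exported F0 text file with the Python sketch below. Several points are assumptions and not confirmed by this manual: that the first whitespace-separated field on each line is F0 in Hz, that unvoiced frames are stored as 0, that the frame interval is the recommended 0.01 s, and that the sample standard deviation is wanted. The function name f0_statistics and the example data are hypothetical.

```python
import statistics

FRAME_INTERVAL_S = 0.01  # recommended F0 frame interval (assumed for the file)


def f0_statistics(f0_text, frame_interval=FRAME_INTERVAL_S):
    """Mean, standard deviation and voiced duration from F0 export text."""
    voiced = []
    for line in f0_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        f0 = float(fields[0])  # assumption: first column is F0 in Hz
        if f0 > 0:             # assumption: 0 marks an unvoiced frame
            voiced.append(f0)
    return {
        "mean_hz": statistics.mean(voiced),
        "sd_hz": statistics.stdev(voiced),  # needs at least two voiced frames
        "voiced_s": len(voiced) * frame_interval,
    }


# Hypothetical five-frame excerpt: two unvoiced and three voiced frames.
example = "0.0 0.0\n110.0 1.0\n120.0 1.0\n130.0 1.0\n0.0 0.0\n"
stats = f0_statistics(example)
print(stats)
```

The voiced duration is simply the number of voiced frames multiplied by the frame interval, which is why a different frame interval or sample rate would change the reported length.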

Long Term Formants

Plots of the long term average formants (LTAF) and long term cumulative formants (LTCF) are displayed in Figure 6. LTCF represents the vertical addition of all LTAF, that is, the sum of the F1, F2 and F3 distributions at each frequency. Note that the LTAF and LTCF outlines follow the contours of the histograms seen in the lower plot of Figure 5.
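The relationship between LTAF and LTCF can be sketched as follows in Python. The 50 Hz bin width, the 4000 Hz upper limit and the function names are assumptions chosen for the illustration; Catalina's actual binning is not documented here.

```python
def histogram(values, bin_width=50, max_hz=4000):
    """Count formant values per frequency bin (bin width is an assumption)."""
    counts = [0] * (max_hz // bin_width)
    for value in values:
        if 0 <= value < max_hz:
            counts[int(value // bin_width)] += 1
    return counts


def ltaf_and_ltcf(f1, f2, f3, bin_width=50, max_hz=4000):
    """LTAF: one histogram per formant. LTCF: their bin-by-bin sum."""
    ltaf = [histogram(track, bin_width, max_hz) for track in (f1, f2, f3)]
    ltcf = [sum(bins) for bins in zip(*ltaf)]
    return ltaf, ltcf


# Hypothetical formant tracks (Hz), three frames each.
f1 = [510, 520, 730]
f2 = [1480, 1510, 1220]
f3 = [2490, 2510, 2540]
ltaf, ltcf = ltaf_and_ltcf(f1, f2, f3)
print(ltcf[10], ltcf[50], sum(ltcf))
```

Because the LTCF is a plain sum, regions where the ranges of two formants overlap show up as higher values than in either LTAF curve alone.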


Figure 6. Long term formants

Formant Space

Catalina creates the F1 vs F2 and F2 vs F3 formant plots, and automatically detects the vowels [a], [e], [i] and [o] based on the user-defined settings in the editable text file C:\Catalina\formants.txt. By default, the settings in formants.txt are as follows (each row gives the low and high limits, in Hz, for F1, F2 and F3):

F1 low   F1 high   F2 low   F2 high   F3 low   F3 high   vowel
601      850       1100     1600      2200     2800      [a]
401      600       1500     2000      2100     2800      [e]
220      400       2000     2400      2400     2900      [i]
370      600       700      1200      2200     2600      [o]
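A minimal Python sketch of how such limits could be applied to one analysis frame is given below. It assumes that formants.txt holds six numbers per row in the order [a], [e], [i], [o], and that a frame is assigned to a vowel only when all three formants fall inside that vowel's limits; whether Catalina applies exactly this rule is not stated in this manual. The names parse_limits and classify are hypothetical.

```python
DEFAULT_LIMITS = """\
601 850 1100 1600 2200 2800
401 600 1500 2000 2100 2800
220 400 2000 2400 2400 2900
370 600 700 1200 2200 2600
"""
VOWELS = ["a", "e", "i", "o"]  # assumed row order of formants.txt


def parse_limits(text):
    """Map each vowel to its (low, high) limits for F1, F2 and F3."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    limits = {}
    for vowel, row in zip(VOWELS, rows):
        nums = [float(x) for x in row[:6]]
        limits[vowel] = [(nums[0], nums[1]), (nums[2], nums[3]), (nums[4], nums[5])]
    return limits


def classify(f1, f2, f3, limits):
    """Return the first vowel whose limits contain all three formants."""
    for vowel, ranges in limits.items():
        if all(low <= f <= high for f, (low, high) in zip((f1, f2, f3), ranges)):
            return vowel
    return None


limits = parse_limits(DEFAULT_LIMITS)
print(classify(700, 1300, 2500, limits))  # a
print(classify(300, 2200, 2600, limits))  # i
print(classify(100, 100, 100, limits))    # None
```

Frames that match no vowel are simply left unlabelled, so widening or narrowing the limits in formants.txt directly changes how many frames contribute to each vowel.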


Figure 7. F1-F2 and F2-F3 space display


These default values are taken from various references for different languages. Other references may be used to determine the vowel limits for a specific language, or vowel limits can be derived by inspecting formant values for a specific set of speakers.

An example of the F1-F2 vowel space display is presented in Figure 8. Filled red circles indicate the mean of the values supplied in the F123 file for those time frames in which a corresponding F0 value is present. When there is no F0 estimate for a time frame, the corresponding F123 value is discarded from the mean calculation. This removes the bias that formant values measured during unvoiced sections would otherwise introduce into the mean estimate.

An example of the F2-F3 vowel space display is presented in Figure 9. In Figures 8 and 9, the blue points adjacent to the filled red circles represent the average values for the first and second halves of all analysed formants. These points and their values can be useful for analysing intra-speaker variability.
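The voiced-frame averaging and the split into halves can be sketched in Python as shown below. The sketch assumes that the F0 and formant tracks are aligned frame by frame, that an F0 of 0 marks an unvoiced frame, and that the halves are taken over the voiced frames in time order; the function name and the example data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def voiced_formant_means(f0_track, formant_track):
    """Mean (F1, F2, F3) over voiced frames, overall and for each half.

    f0_track: F0 per frame, 0 for unvoiced (assumption).
    formant_track: one (F1, F2, F3) tuple per frame, aligned with f0_track.
    """
    voiced = [frm for f0, frm in zip(f0_track, formant_track) if f0 > 0]

    def column_means(frames):
        return tuple(mean(column) for column in zip(*frames))

    half = len(voiced) // 2  # needs at least two voiced frames
    return {
        "all": column_means(voiced),
        "first_half": column_means(voiced[:half]),
        "second_half": column_means(voiced[half:]),
    }


# Hypothetical tracks: frames 0 and 3 are unvoiced and must be ignored.
f0_track = [0, 100, 110, 0, 120, 130]
formant_track = [(900, 900, 900), (500, 1500, 2500), (520, 1480, 2520),
                 (100, 100, 100), (700, 1200, 2400), (720, 1220, 2380)]
means = voiced_formant_means(f0_track, formant_track)
print(means)
```

A large distance between the two half means, relative to the overall mean, suggests that the speaker's formant values drifted during the recording.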

Figure 8. F1-F2 vowel space display


Figure 9. F2-F3 vowel space display

Long Term Average Spectrum

The LTAS-FFT (long term average spectrum, Fast Fourier Transform) plot produced by Catalina is identical to the LTAS plot produced in Wavesurfer. The LTAS-Histogram plot produced by Catalina shows, across the individual short-term DFT spectra, the number of appearances of each energy level in the spectrum. These plots may be useful in comparing speech exemplars where the same level of background noise and the same speech transmission system are present. Any differences between the compared LTAS plots can then be explained as resulting primarily from characteristics of the vocal formants.

As explained earlier, the LTAS is the mean of successive short-term spectral analyses computed over the duration of a given speech sample. Each short-time spectrum (computed by means of the discrete Fourier transform) reflects the phonetic quality of the current segment, whereas the LTAS characterizes the overall spectral content of the entire sample. The LTAS is influenced by the combined effect of the analyzed speech, the background noise and other periodic background sounds, and the frequency response of the transmission chain.
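One possible reading of the LTAS-Histogram is a count, for every frequency bin, of how often each energy level occurs across the short-term spectra. The Python sketch below implements that reading; the 5 dB quantisation step, the function name and the example levels are assumptions, and Catalina's own computation may differ.

```python
from collections import Counter


def ltas_histogram(spectra_db, level_step_db=5):
    """Count occurrences of each quantised dB level per frequency bin.

    spectra_db: list of short-term spectra, each a list of dB levels,
    one level per frequency bin. Returns one Counter per frequency bin.
    """
    n_bins = len(spectra_db[0])
    hist = [Counter() for _ in range(n_bins)]
    for spectrum in spectra_db:
        for k, level in enumerate(spectrum):
            quantised = level_step_db * round(level / level_step_db)
            hist[k][quantised] += 1
    return hist


# Hypothetical data: three short-term spectra with two frequency bins each.
spectra = [[-20.0, -41.0], [-21.0, -39.0], [-33.0, -40.0]]
hist = ltas_histogram(spectra)
print(dict(hist[0]))  # {-20: 2, -35: 1}
print(dict(hist[1]))  # {-40: 3}
```

Unlike the mean alone, such a histogram keeps information about how much the level in each frequency bin varies over time.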


Figure 10. LTAS-FFT and LTAS-Histogram

Figure 11. LTAS-FFT, LTCF and LTAF plots


Recommendations

For comparison between plots generated for different voice samples, such as questioned and known exemplars, it is recommended that Catalina be used with:

- linear PCM, 8 kHz, 16 bit, mono WAV files, analyzed within Wavesurfer,
- known (reference, suspect) and unknown (questioned) exemplar recordings made as contemporaneously as is practically possible,
- known (reference, suspect) and unknown (questioned) recordings made with the same recording/transmission channel,
- normal/modal phonation samples,
- exemplar durations longer than 10 seconds,
- a speech signal to noise ratio (SNR) greater than 10 dB.
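The format and duration recommendations above can be checked automatically before analysis. The Python sketch below uses the standard wave module; the function names check_wav and snr_db are hypothetical, and the SNR helper assumes that separate RMS values for a speech segment and a noise-only segment have already been measured.

```python
import math
import wave


def check_wav(path, min_duration_s=10.0):
    """Return a list of deviations from the recommended recording format."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        if rate != 8000:
            problems.append(f"sample rate is {rate} Hz, expected 8000 Hz")
        if wav.getsampwidth() != 2:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bit, expected 16 bit")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        duration = wav.getnframes() / rate
        if duration <= min_duration_s:
            problems.append(f"duration is {duration:.1f} s, expected more than {min_duration_s} s")
    return problems


def snr_db(speech_rms, noise_rms):
    """Signal to noise ratio in dB from two RMS amplitudes."""
    return 20 * math.log10(speech_rms / noise_rms)


print(snr_db(10.0, 1.0))  # 20.0, above the recommended 10 dB
```

An empty list from check_wav means the file meets the format and duration recommendations; it says nothing about phonation type, channel match or recording date, which still need to be assessed by the examiner.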

Users should note that some telephone transmission systems or other recordings may have high-pass filter characteristics (visible in the LTAS analysis) that can bias the estimate of F1 towards a higher frequency compared to what would be measured for the same voice using a reference microphone and a linear recording system.

Future Developments

Further options, including a means for calculating a likelihood ratio, will be added in future releases of the Catalina Forensic Audio Toolbox. Check the website periodically for updates.


REFERENCES

Baldwin, J. and French, P. (1990) Forensic Phonetics, London: Pinter.
Byrne, C. and Foulkes, P. (2004) ‘The Mobile Phone Effect on Vowel Formants’, International Journal of Speech, Language and the Law 11(1), 83-102.
Carlson, R., Fant, G. and Granström, B. (1975) ‘Two-formant models, pitch and vowel perception’, in G. Fant and M.A.A. Tatham (eds), Auditory Analysis and Perception of Speech, London: Academic, 55-82.
Gonzalez-Rodriguez, J., Ortega-Garcia, J. and Lucena-Molina, J.J. (2001) ‘On the application of the Bayesian approach in real forensic conditions with GMM-based systems’, Proceedings of 2001: A Speaker Odyssey - The Speaker Recognition Workshop, 135-138.
Grigoras, C. (2001) ‘Digital voice processing system’, unpublished PhD thesis, University of Bucharest, Electric Department, Romania.
Grigoras, C. (2003) ‘Voice analysis on noisy recordings’, Paper presented at Cambridge Forensic Phonetics Workshop, August 2003, Cambridge, UK.
Hess, W. (1983) Pitch Determination of Speech Signals: Algorithms and Devices, Berlin: Springer-Verlag.
Hollien, H. (1990) The Acoustics of Crime: the New Science of Forensic Phonetics, New York: Plenum.
Hollien, H. (2000) Forensic Voice Identification, New York: Academic Press.
Jessen, M., Köster, O. and Gfroerer, S. (2005) ‘Influence of vocal effort on average and variability of fundamental frequency’, International Journal of Speech, Language and the Law 12(2), 174-213.
Künzel, H.J. (2001) ‘Beware of the ‘telephone effect’: the influence of telephone transmission on the measurement of formant frequencies’, Forensic Linguistics 8(1), 80-99.
Ladd, D.R. and Terken, J. (1995) ‘Modelling intra- and inter-speaker pitch range’, Proceedings of the 13th International Congress of Phonetic Sciences, Stockholm, vol.2, 386-89.
Laver, J. (1980) The Phonetic Description of Voice Quality, Cambridge: Cambridge University Press.
McDougall, K. (2004) ‘Speaker-specific formant dynamics: an experiment on Australian English /aI/’, International Journal of Speech, Language and the Law 11(1), 103-130.
Meuwly, D. (2001) ‘Reconnaissance de locuteurs en sciences forensiques: l'apport d'une approche automatique’, PhD thesis, University of Lausanne.
Nolan, F. (1983) The Phonetic Bases of Speaker Recognition, Cambridge: Cambridge University Press.
Nolan, F. (1990) ‘The limitations of auditory phonetic speaker recognition’, in H. Kniffka (ed.), Texte zu Theorie und Praxis forensischer Linguistik, Tübingen: Niemeyer, 457-479.
Nolan, F. (1993) ‘Auditory and acoustic analysis in speaker recognition’, in J. Gibbons (ed.), Language and the Law, London: Longman, 326-345.
Nolan, F. (2002) ‘The “telephone effect” on formants: a response’, Forensic Linguistics 9(1), 74-82.
Nolan, F. (2005) ‘Forensic speaker identification and the phonetic description of voice quality’, in W.J. Hardcastle and J. MacKenzie Beck (eds), A Figure of Speech: a Festschrift for John Laver, Mahwah, N.J.: Erlbaum, 385-411.
Nolan, F. and Grigoras, C. (2005) ‘A case for formant analysis in forensic speaker identification’, International Journal of Speech, Language and the Law 12(2), 143-173.
Rabiner, L.R., Cheng, M.J., Rosenberg, A.E. and McGonegal, C.A. (1976) ‘A comparative study of several pitch detection algorithms’, IEEE Transactions on Audio, Speech and Signal Processing 24, 399-413.
Repp, B. (1982) ‘Phonetic trading relations and context effects: new experimental evidence for a speech mode of perception’, Psychological Bulletin 92, 81-110.
Rodman, R., McAllister, D., Bitzer, D., Cepeda, L. and Abbitt, P. (2002) ‘Forensic speaker identification based on spectral moments’, Forensic Linguistics 9(1), 22-43.
Rose, P.J. (2002) Forensic Speaker Identification, London: Taylor and Francis.
Scherer, K.R. (1986) ‘Voice, stress, and emotion’, in M.H. Appley and R. Trumbull (eds), Dynamics of Stress: Physiological, Psychological, and Social Perspectives, New York: Plenum, 159-181.
Stevens, K.N. (1989) ‘On the quantal nature of speech’, Journal of Phonetics 17, 3-45.
Wells, J. (1982) Accents of English, Cambridge: Cambridge University Press.


Appendix A - analysis of two samples from the same speaker


Appendix B - analysis of samples from two different speakers


Appendix C - analysis of a short (approx. 10 s) voice sample
