
Review on Forensic audio and Visual Evidence 2004-2007

15th INTERPOL Forensic Science Symposium, Lyon, France, October 2007

Forensic audio and visual evidence 2004-2007: A Review

Authors: Jurrien Bijhold (1), Arnout Ruifrok (1), Michael Jessen (2), Zeno Geradts (1), Sabine Ehrhardt (2), Ivo Alberink (1)

Editors: Jurrien Bijhold (1), Stefan Gfroerer (2), Richard Vorder Bruegge (3)

(1) Netherlands Forensic Institute (NFI)
(2) Bundeskriminalamt Germany (BKA)
(3) FBI Digital Evidence Laboratory (FBI)

Version: October 1, 2007

Abstract

Although audio and visual evidence (video, photographs and laser scans) may in the past have been handled by the same experts in many organizations, it is now clear that a number of entirely different fields of expertise deal with these types of evidence. In this review, we distinguish six fields of expertise: audio analysis, speaker identification, forensic linguistics, video analysis, photogrammetry and 3D modeling, and facial identification. Nevertheless, experts investigating the authenticity and integrity of audio and video still share a number of methods and can benefit from each other's knowledge and expertise. Since most of this evidence is now generated as digital data, more expertise on digital evidence is needed. Experts on facial identification, speaker identification and forensic linguistics share a common interest in the use of statistics and in methods for dealing with subjective information.

Audio analysis: the new methods for investigating authenticity and integrity, based on the electric network frequency and on the use of magneto-optical crystals, which were introduced during the period covered by the previous review (2001-2004), are now well known, and more reports are being published on the use and effectiveness of these methods.

Speaker identification: during the period of this review, much research has been done on the use of statistics of frequency measurements in acoustic analysis. Forensic speaker identification is now preferably based on a combination of results from auditory and acoustic analysis, using a Bayesian framework for assessing the evidential value.

Forensic linguistics: this expertise is now often requested for the analysis of letters claiming responsibility for politically motivated offenses, and of language samples from refugees in order to confirm their alleged origin. The use of text databases for statistical analysis is being debated.

Video analysis: the widespread introduction of large-scale digital video surveillance systems in public and private domains has resulted in large-scale research and development programs and in a new field of expertise, closely linked to the development of special organizations for dealing with digital evidence.

Photogrammetry and 3D modeling: laser scanning of crime scenes has become a well-known technology, but its widespread use is still hampered by the high costs of equipment and personnel training. A large number of papers have now been published on body height estimation from CCTV images. Measurement errors have been studied and quantified, but the evidential value is strongly limited by the number of uncertainties in a case.

Facial identification: error rates between 5 and 10% for facial image recognition and identification have been reported for biometric systems and human observers. The latest studies show that human observers who are assisted by biometric systems perform significantly better. For facial reconstruction from skull remains, computer modeling software is available that can work with statistical data on soft tissue thickness measurements.

This review paper gives an overview of the relevant developments in these fields, based on an extensive search of literature databases and on information exchanged at a large number of conferences.


Introduction

In the review of audio and visual evidence for 2001-2004, two separate papers were presented. The first paper gave a very detailed description of the fields of expertise in (1) audio analysis, (2) speaker identification and (3) linguistic authorship. The second paper gave a general overview of the many different types of forensic investigations of visual evidence and a way to categorize them into three general fields of expertise: (4) imaging and video technology, (5) crime scene photography, laser scanning, photogrammetry and 3D modeling, and (6) biometric identification. Two other disciplines, medical imaging and pattern recognition in forensic databases, were treated as miscellaneous topics in a separate chapter.

In this review, all topics in audio and visual evidence are covered in one paper, and two of the previously mentioned fields of expertise are named differently. The most important developments are presented for six general fields of expertise: (1) audio analysis, (2) speaker identification, (3) linguistic authorship, (4) imaging and video technology, (5) photogrammetry, crime scene recording and 3D modeling, and (6) facial image identification and earprints. In a separate chapter, we focus briefly on a number of forensic activities that do not fit well in the previous chapters. The last chapter deals with relevant organizations and their work in forensic image and audio analysis. Each chapter starts with a list of keywords to help the reader find topics of interest. Literature references are given at the end of each chapter.

This review is based on information from an extensive search of literature databases (Inspec, Compendex, Scopus, patent databases and several others), participation in meetings organized by the AAFS, ENFSI and SPIE, and contacts with the working groups SWGIT, LEVA, EESAG, ENFSIDIWG and ENFSIAS.
This review is certainly not complete, for two reasons: most of the information used was obtained from American and European sources, and the scope of the review is limited to the fields of expertise that the authors have been working in or with. Because of the number of publications in certain fields, the authors have not read all articles in full for this overview; for a large number of articles, the abstracts provided by the literature databases were used instead.

Audio analysis (Michael Jessen)

authentication, speech enhancement, transcription of linguistic content / disputed utterance examination, non-speech events, magneto-optical methods, ENF, Electric Network Frequency

The primary domains of forensic audio analysis are authentication, speech enhancement, transcription of linguistic content / disputed utterance examination, and the analysis of non-speech events. For the authentication of analog recordings on audio tape, magneto-optical investigation based on the Faraday or Kerr effect is now recognized as the most efficient, accurate and non-destructive method (Boss et al. 2003; Bouten et al. 2007). This method makes use of crystals that contain a ferrimagnetic garnet film. When such a crystal is brought into contact with the audio tape, it captures the magnetization patterns on the tape very accurately. Viewed under polarized light, these patterns can be examined and photographed. Various types of crystal are available, as well as different setups for placing the tape and examining the crystals. A recent development is MOSeS (magneto-optical sensor system). This system is more practical than previous ones, offering better tape transport (and hence easier processing of larger amounts of tape) and more flexible processing of a wider variety of tapes beyond regular compact audio cassettes, including microcassettes and video tapes (Boss 2005). Magneto-optical authentication can detect various types of information, including erase head marks, which can indicate whether a passage has been deleted, and unusual track widths and positions due to misaligned audio heads, which can help to individualize the recorder that was used. Although the general principles of the magneto-optical method are known, many specifics of the observable patterns have been interpreted on the basis of experience rather than on a deep theoretical understanding of the physics behind them.
An important improvement upon this state of the art has recently been provided by Bouten et al. (2007), who present a detailed theory of magneto-optical authentication and test its validity in experiments where the audio data and recording devices are known. At the 2005 meeting of the ENFSI (European Network of Forensic Science Institutes) Working Group for Forensic Speech and Audio Analysis (FSAAWG) in Wiesbaden, the results of a collaborative exercise on analog audio authentication were presented, showing, among other things, that the use of magneto-optical methods is time-consuming but that it offers advantages not achievable by other methods.

Whereas methods for the authentication of analog recordings on tape are well established, very little has yet been published on the authentication of digital audio (see the overview by Cooper 2006). Tampering might leave no traces if performed by somebody with profound knowledge of areas such as signal analysis, acoustics and phonetics, and if the original material is of sufficient quality; even so, tampering is more than a simple cut-and-paste operation (Cooper 2005). For example, since recordings are rarely of studio quality, they will probably contain background noise, and cutting or copying sections of speech might lead to noticeable discontinuities. In the authentication of digital speech data, linguistic and phonetic methods carry a greater burden than before. The general idea of these methods is that additions or deletions might have taken place where the sequence of linguistic and phonetic events in a recording does not correspond to the known rules of linguistic and phonetic sequencing. For example, the sequences of words might not correspond to syntactic rules, or the sequences of sounds and sound elements might not be compatible with the rules of phonotactics or coarticulation. Applying this method requires good knowledge of the features of spontaneous fluent speech, since the phonological and phonetic sequencing rules in this speech style can differ from those in read speech, on which more published work is available.
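The background-noise argument above can be illustrated with a simple heuristic: estimate the noise floor in consecutive blocks and flag abrupt level changes between neighboring blocks. This is a minimal sketch under stated assumptions: the quietest short sub-frames in each block are taken to contain only background noise, and the block length, percentile and 6 dB threshold are arbitrary choices. The function names are hypothetical; this is not a method described by Cooper or validated for casework, and a flagged block is only a candidate for expert examination.

```python
import numpy as np


def noise_floor_db(x, fs, block_sec=0.5, sub_sec=0.02, pct=10):
    """Estimate a background-noise level (dB) per block.

    Each block is split into short sub-frames; a low percentile of their mean
    energies is used, assuming (hypothetically) that the quietest sub-frames
    in a block contain only background noise, e.g. pauses between words.
    """
    block, sub = int(block_sec * fs), int(sub_sec * fs)
    n_sub = block // sub
    floors = []
    for start in range(0, len(x) - block + 1, block):
        seg = x[start:start + n_sub * sub].reshape(n_sub, sub)
        energies = np.mean(seg ** 2, axis=1)
        floors.append(10 * np.log10(np.percentile(energies, pct) + 1e-12))
    return np.array(floors)


def flag_level_jumps(floors_db, threshold_db=6.0):
    """Indices of blocks whose noise level differs from the previous block by
    more than threshold_db: candidates for closer examination, not proof of editing."""
    return np.flatnonzero(np.abs(np.diff(floors_db)) > threshold_db) + 1


if __name__ == "__main__":
    fs = 8000
    rng = np.random.default_rng(0)
    # Invented example: 2 s of quiet background noise spliced onto 2 s of louder noise.
    x = np.concatenate([0.01 * rng.standard_normal(2 * fs),
                        0.05 * rng.standard_normal(2 * fs)])
    flags = flag_level_jumps(noise_floor_db(x, fs))
    print("candidate discontinuities at", [float(i) * 0.5 for i in flags], "s")
```

Using a low percentile rather than the mean energy is meant to keep speech itself from dominating the estimate; in real material, legitimate changes (a door closing, a car passing) also produce jumps, so the output can only direct attention, not decide authenticity.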

An important authentication method that works with digital data but not normally with analog data (owing to wow and flutter in analog tape recorders) is the Electric Network Frequency (ENF) method (Grigoras 2005, 2007). This method registers the (near) random fluctuations of the 50 Hz frequency emitted by the electric network (60 Hz in the USA). As research by Grigoras (2007) has shown, these fluctuation patterns over time are remarkably stable across space, being essentially the same in Bucharest, Madrid and The Hague. When an ENF database covering several years exists (such as the one compiled by Grigoras), the ENF fluctuation pattern of a forensic case can be matched to a specific time of recording, provided the recording quality is sufficient. The ENF method offers several possibilities. First, it can identify the type of power supply of a mains-powered recording (e.g. regular electric network, uninterruptible power supply, inverters). Second, it can locate a recording in time, making it possible to refute incorrect claims about when a recording took place. Third, it can detect several simultaneous ENF components, which can indicate that the original was re-recorded and thus open the possibility that an authenticity violation took place. Fourth, edits of the original are detectable if the time of recording is known and the ENF fluctuation patterns match only section-wise rather than completely.

Speech enhancement is a widely researched topic that is of interest for many applications, forensics being only one of them. A recent special issue of Speech Communication on that topic (volume 49, 2007) gives an impression of the state of the art in that area (see also textbooks such as Davis 2002).
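Returning briefly to the ENF criterion described above, its two core steps (estimating the mains frequency frame by frame, then locating the case pattern within a reference series) can be illustrated with a minimal sketch. It assumes a 50 Hz grid, non-overlapping one-second frames, zero-padded FFT peak picking and plain correlation matching; the function names, parameters and synthetic test data are hypothetical, and the sketch is not Grigoras's implementation or a validated forensic tool.

```python
import numpy as np


def extract_enf(signal, fs, nominal=50.0, frame_sec=1.0, band=1.0, n_fft=2 ** 16):
    """Estimate the mains frequency in consecutive, non-overlapping frames.

    Each frame is Hann-windowed and zero-padded to n_fft points; the strongest
    spectral peak within nominal +/- band Hz is taken as that frame's ENF value.
    """
    frame_len = int(frame_sec * fs)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    in_band = (freqs >= nominal - band) & (freqs <= nominal + band)
    band_freqs = freqs[in_band]
    enf = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame_len] * window, n=n_fft))
        enf.append(band_freqs[np.argmax(spectrum[in_band])])
    return np.array(enf)


def match_enf(case_enf, reference_enf):
    """Slide the case ENF over a reference ENF series (same frame rate) and
    return (best offset in frames, Pearson correlation at that offset)."""
    n = len(case_enf)
    case = case_enf - case_enf.mean()
    best_offset, best_r = -1, -np.inf
    for offset in range(len(reference_enf) - n + 1):
        ref = reference_enf[offset:offset + n]
        ref = ref - ref.mean()
        denom = np.linalg.norm(case) * np.linalg.norm(ref)
        r = float(case @ ref) / denom if denom > 0 else 0.0
        if r > best_r:
            best_offset, best_r = offset, r
    return best_offset, best_r


def synthesize_hum(enf_per_frame, fs, frame_sec=1.0, noise=0.1, seed=0):
    """Hypothetical test signal: mains hum following the given ENF values
    (one value per frame, continuous phase) plus white noise."""
    rng = np.random.default_rng(seed)
    inst_freq = np.repeat(enf_per_frame, int(frame_sec * fs))
    phase = 2 * np.pi * np.cumsum(inst_freq) / fs
    return np.sin(phase) + noise * rng.standard_normal(phase.size)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Invented "reference database": one ENF value per second for 300 s.
    reference = 50.0 + rng.normal(0.0, 0.05, 300)
    # Invented case recording: 60 s of hum starting 120 s into the reference.
    case_enf = extract_enf(synthesize_hum(reference[120:180], fs=1000), fs=1000)
    offset, r = match_enf(case_enf, reference)
    print(f"best match at {offset} s into the reference (r = {r:.3f})")
```

Correlating mean-removed sequences means only the shape of the fluctuations has to agree, not the absolute frequency offset. Real casework would additionally have to deal with weak or absent hum, harmonics, and the distinction between mains-powered and battery-powered recordings.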
As summarized in the editorial of the Speech Communication special issue, speech enhancement works very well for stationary noise and deterministic problems and is very successful with multi-channel recordings, whereas the strongest challenges arise with nonstationary noise and constantly changing speech distortions, and when only a single-channel recording is available. Unfortunately, it is precisely these more complicated types of distortion and the single-channel scenarios that occur most frequently in forensic speech material. Nonstationary distortions require adaptive algorithms that estimate and attack the distortion patterns in short time intervals.

Hu & Loizou (2007) performed a perceptual experiment on four classes of speech enhancement methods (spectral subtractive algorithms, subspace algorithms, Wiener-type algorithms and statistical-model-based algorithms) in four types of real-world nonstationary noise (car, multi-talker babble, street and suburban train noise). In terms of an overall quality rating, the subspace algorithms performed worst, but even the other classes of algorithms performed significantly better than unprocessed speech only for certain combinations of specific algorithm, signal-to-noise ratio (SNR) and type of distortion. Multi-talker babble was the most difficult type of distortion, which none of the speech enhancement methods handled better than baseline, probably because of its strongly nonstationary character and the large overlap between the spectral ranges and patterns of different talkers. These results correspond well to the experience in practical forensic work that speech enhancement in case material with complicated distortions only rarely crosses the perceptual border from complete unintelligibility to complete intelligibility. Even when no such overall success is achieved, speech enhancement is useful, because some listeners, when transcribing speech, prefer enhanced speech and get better results with it, while others prefer working with the unprocessed original and perform the distortion normalization cognitively, based on sufficient experience.
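Spectral subtraction, the first of the four algorithm classes compared by Hu & Loizou (2007), can be sketched in its most basic form: estimate the average noise spectrum from a stretch without speech, subtract it from the magnitude spectrum of each frame, keep the noisy phase and resynthesize by overlap-add. The sketch assumes stationary noise and a single channel; the frame length, hop and spectral floor are illustrative values, the function names are hypothetical, and it is not one of the specific algorithms evaluated by Hu & Loizou or used in any commercial product.

```python
import numpy as np


def periodic_hann(n):
    """Periodic Hann window; copies shifted by n/2 samples sum to exactly one."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def frame_starts(length, frame_len, hop):
    return range(0, length - frame_len + 1, hop)


def spectral_subtraction(noisy, noise_only, frame_len=512, hop=256, floor=0.02):
    """Basic magnitude spectral subtraction with overlap-add resynthesis.

    noise_only: a stretch assumed to contain background noise but no speech.
    floor: fraction of the noisy magnitude kept as a spectral floor, so that
           subtracted magnitudes never become negative.
    """
    window = periodic_hann(frame_len)
    # Average noise magnitude spectrum (assumes the noise is stationary).
    noise_mag = np.mean([np.abs(np.fft.rfft(noise_only[s:s + frame_len] * window))
                         for s in frame_starts(len(noise_only), frame_len, hop)], axis=0)
    out = np.zeros(len(noisy))
    for s in frame_starts(len(noisy), frame_len, hop):
        spec = np.fft.rfft(noisy[s:s + frame_len] * window)
        mag, phase = np.abs(spec), np.angle(spec)
        clean_mag = np.maximum(mag - noise_mag, floor * mag)
        # Keep the noisy phase; with 50% overlap the windows sum to one,
        # so plain overlap-add reconstructs the signal.
        out[s:s + frame_len] += np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len)
    return out


if __name__ == "__main__":
    fs = 8000
    t = np.arange(2 * fs) / fs
    rng = np.random.default_rng(0)
    clean = np.sin(2 * np.pi * 440 * t)            # stand-in for a voiced sound
    noisy = clean + 0.3 * rng.standard_normal(t.size)
    noise_only = 0.3 * rng.standard_normal(fs)     # 1 s of noise-only material
    enhanced = spectral_subtraction(noisy, noise_only)
    mid = slice(512, t.size - 512)                 # skip edges without full overlap
    err_in = np.mean((noisy[mid] - clean[mid]) ** 2)
    err_out = np.mean((enhanced[mid] - clean[mid]) ** 2)
    print(f"noise power before: {err_in:.4f}, after: {err_out:.4f}")
```

The demo uses stationary white noise, which is the comparatively easy case described above; with nonstationary noise such as multi-talker babble, a single fixed noise estimate like this one cannot follow the changing spectrum, which is why adaptive estimation is needed.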

A large number of commercial speech enhancement systems and software packages on the market are explicitly directed at practitioners in forensic speech and audio analysis. These systems differ in a variety of aspects, including the number and type of enhancement tools and methods, whether they offer fully automated routines for analyzing and attacking the distortions, their ability to interface with other speech processing systems and software, whether they are based on genuine forensic experience or are offshoots of general audio processing, and, last but not least, their price, which can range from less than €1,000 to about €30,000 and more. At the 2005 meeting of the ENFSI-FSAAWG in Wiesbaden, Anna Czajkowski (FSS) and Reva Schwartz (USSS) listed a variety of commercial speech enhancement systems and compared them with respect to criteria such as support, stability and documentation of enhancement sessions (audit trail), in addition to the five aspects mentioned above. The list of commercial speech enhancement systems and software was as follows: DAC (MCAP / PCAP / PCAP II Plus / Cardinal / MAP II); Cedar for Windows / Cambridge / DNS 1000; Sound Cleaner; DC Live Forensics; SES4; Sonic Solutions; Foenics; Audio Cube; Signalscape (StarWitness SoundBite / StarWitness Audio); Pristine Sounds; Algorithmix Sound Laundry; NCT Clear Speech; Cool Edit / Adobe Audition; MIT Enhancement Tools. Tests of the intelligibility gains of these or other forensic enhancement systems are planned as a collective exercise within the ENFSI-FSAAWG. Several methods have been proposed for testing the perceptual effects of speech enhancement. The ones used by Hu & Loizou (2007) provide only an indirect test of intelligibility. More direct tests are the modified rhyme protocol and similar tests often used in audiology (cf. Sungyub et al. 2007).

The main goal of speech enhancement is to enable more accurate and more efficient transcriptions of the speech and non-speech events in a low-quality recording. Some practitioners also use speech enhancement prior to speaker identification procedures. However, such an approach has to be justified and applied carefully. Speech enhancement is commonly used in automatic speaker recognition, but most experts in auditory-acoustic speaker identification use it only rarely, for example when a strong hum prevents the measurement of fundamental frequency. Enhancement is probably also less appropriate before disputed utterance examination than before the transcription of linguistic content.

Transcription of linguistic content and disputed utterance examination have the same goal but differ in their methods and scope. In both tasks, the goal is to identify the linguistic events (most specifically, the sequence of words that were spoken) in speech material whose intelligibility is limited for various reasons. Both tasks may also involve aspects of speaker identification when it is not entirely obvious which sections were spoken by which person.

The transcription of linguistic content usually involves relatively long stretches of speech, minutes or sometimes hours in duration. The linguistic content is transcribed through intensive and repeated listening in a quiet environment using high-quality equipment. A special notation system is used that can document events such as filled pauses (like "uh", "uhm") and simultaneous talking by more than one speaker, or whether a given portion of the utterance can be understood in more than one way. Methodologically, transcription of linguistic content is mainly an auditory-holistic approach, in the sense that the practitioner directly transcribes as much as s/he can hear, without performing a phonetic or linguistic analysis. Being mainly holistic does not imply that proper expert training is unnecessary. As pointed out convincingly by Fraser (2003), there are many perceptual and psycholinguistic effects, often counterintuitive from a layperson's perspective, that need to be known for the forensic transcription of difficult material; this requires academic training in linguistics, phonetics and psycholinguistics.

Disputed utterance examination differs from linguistic transcription in two respects. First, the material to be processed in disputed utterance examination is much shorter. It often comes down to a single choice between alternative words that could have been spoken, over which the dispute in a case arises. Second, the approach to disputed utterance analysis is mainly analytical, drawing upon phonetic and linguistic methodology and knowledge. If, for example, a particular sound or sound combination in a crucial word is difficult to understand, the expert investigates auditorily and acoustically how the speaker produces this sound in other, structurally analogous words that are more intelligible. A recent case of disputed utterance examination was presented by Hirson et al. (2007).

Most frequently, the experts performing audio analysis are trained in speech analysis. For them, the analysis of non-speech events is something of an interdisciplinary undertaking. (Some of these events, however, are rather closely connected to speech analysis, for example the signals that are common in telephone communications.) Whereas some non-speech events can be identified on the basis of general everyday experience, others are so specialized that it is necessary either to contact a specialist on these sounds, to run an experiment tuned to the specifics of the case, or to compile a database of sounds for later reference. An example of contacting a specialist occurred in a recent series of cases at the German Bundeskriminalamt (BKA), where frequent instances of birdsong were found on tapes and one of the requests was to estimate where and when the recordings took place. With the help of an ornithologist, it was possible to narrow down the location in terms of geographic region and of urban versus rural area, and to narrow down the time of recording to a certain time of day and season of the year. Gunshot analysis is an example of running specialized experiments and compiling a database.

• Boss, D.; Gfroerer, S.; Neoustroev. A new tool for the visualization of magnetic features on audiotapes. International Journal of Speech, Language and the Law 10: 255-276, 2003.

• Boss, D. Visualisation of magnetic features by means of crystals – recent developments and experiences. Paper presented at the 14th Annual Conference of the International Association for Forensic Phonetics and Acoustics, Marrakesh, Morocco, 2005 [see Conference Report in: International Journal of Speech, Language and the Law 12: 279-289].

• Bouten, J.; van Rijsbergen, M.; Donkers, S. Derivation of a transfer function for imaging polarimetry used in magneto-optical investigations of audio tapes in authenticity investigations. Journal of the Audio Engineering Society 55: 257-265, 2007.

• Cooper, A.J. Detection of Copies of Digital Audio Recordings for Forensic Purposes. PhD dissertation, The Open University, 2006 [www.robinhow.com/ac].

• Davis, G.M. (ed.) Noise Reduction in Speech Applications. Boca Raton: CRC Press, 2002.

• Fraser, H. Issues in transcription: factors affecting the reliability of transcripts as evidence in legal cases. International Journal of Speech, Language and the Law 10: 203-226, 2003.

• Grigoras, C. Digital audio recording analysis: the Electric Network Frequency (ENF) Criterion. International Journal of Speech, Language and the Law 12: 63-76, 2005.

• Grigoras, C. Applications of ENF criterion in forensic audio, video, computer and telecommunication analysis. Forensic Science International 167: 136-145, 2007.

• Hirson, A.; Howard, D.; McClelland, E. Perceptual testing and speech synthesis in the analysis of disputed utterance. Paper presented at the 16th Annual Conference of the International Association for Forensic Phonetics and Acoustics, Plymouth, UK, 2007 [www.iafpa.net/confprog07.htm].

• Hu, Y.; Loizou, P.C. Subjective comparison and evaluation of speech enhancement algorithms. Speech Communication 49: 588-601, 2007.

• Sungyub, D.Y.; Boston, J.R.; El-Jaroudi, A.; Li, C.C.; Durrant, J.D.; Kovacyk, K.; Shaiman, S. Speech signal modification to increase intelligibility in noisy environments. Journal of the Acoustical Society of America 122: 1138-1149, 2007.

Speaker identification (Michael Jessen)

Speaker identification, speaker recognition, auditory-acoustic methods, acoustic analysis, vocal tract characteristics, formant frequencies, automatic speaker recognition

Most of the new developments and controversies between 2004 and 2007 have taken place in the domain of speaker identification by experts, and the present review focuses on that domain. For a more comprehensive overview of methods in forensic speaker identification, see Broeders (2001, 2004), as well as introductory texts such as Rose (2002). The two main approaches in speaker identification by experts are the auditory-acoustic method and the automatic method. It is becoming less clear whether a third approach, referred to as "semiautomatic", still needs to be recognised as a separate method. First, the number of publications between 2004 and 2007 that describe their approach to forensic speaker identification as semiautomatic has become very small. Second, the semiautomatic method is already a hybrid of auditory-acoustic and automatic speaker identification: essentially, it adopts from the auditory-acoustic method the use of acoustic-phonetic parameters such as fundamental frequency and the interpretation of these data by an expert trained in phonetics, and it adopts Bayesian statistical modelling from the automatic method. Third, interactions between the auditory-acoustic and the automatic method have intensified in recent years. For example, phoneticians working with the auditory-acoustic method use Bayesian modelling more often than before in expressing and evaluating their results, which had been a characteristic of the semiautomatic approach. And researchers in automatic speaker recognition increasingly make use of linguistic and phonetic properties referred to as higher-level features.

One striking development in auditory-acoustic speaker identification in the 2004-to-2007 period is the increased interest in formant frequency measurements in speaker comparisons. This increased interest is probably partly due to a 2002 ruling by the Court of Appeal in Northern Ireland that expert opinions in speaker identification must not be confined to auditory analysis, but that the expert should also present evidence from acoustic analysis, including formant measurements.
Though not binding elsewhere, this ruling is now taken into account by experts throughout the UK (French & Harrison 2006). Although there have been concerns that formant frequencies might be strongly affected by landline or mobile telecommunication or by compression schemes such as ATRAC and MP3, research has shown that this influence is small and manageable as long as certain precautions are taken, for example avoiding the measurement of formant values that are very close to the limits of the telephone passband (Byrne & Foulkes 2004).

Formant frequency measurements in forensic speech analysis have been approached from different directions. The classical method is the measurement of the center frequencies of different phonetic/phonological vowel categories in a language (see Rose 2006a for an overview). As emphasized by Rose (2006b), Bayesian evaluation of formant evidence faces the difficulty that in multispeaker data sets the frequency values of the different formants (of which F2 and F3 are most useful forensically) are to some extent correlated, which precludes simply multiplying the likelihood ratios (LRs) for the different formants (likewise, the formant values of different vowels are correlated). As a solution to this problem, Rose used multivariate kernel density LRs, as proposed by Aitken and Lucy (2004). One important message from Rose's research is that the LRs for vowel center frequencies are not very high when different formants and vowel categories are treated separately; instead, the real potential lies in combining information from different formants and vowels. This combinatory approach, however, requires sophisticated statistical modelling that takes into account the correlations between the frequency values of different formants and vowels.
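Why correlated formants cannot simply be multiplied can be illustrated with a deliberately simplified two-formant example. The sketch assumes Gaussian distributions with invented parameter values (population means and standard deviations for F2 and F3, an F2-F3 correlation of 0.8 across speakers, and an uncorrelated within-speaker model). It is not the multivariate kernel density procedure of Aitken and Lucy (2004) used by Rose, which models the between-speaker distribution with kernels rather than a single normal distribution; it only compares the product of two univariate LRs with a bivariate LR that uses the full population covariance.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Univariate normal density (vectorized over numpy arrays)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


def mvn_pdf(x, mean, cov):
    """Multivariate normal density with a full covariance matrix."""
    diff = np.asarray(x, float) - np.asarray(mean, float)
    k = diff.size
    maha = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * maha) / np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))


# Invented values in Hz for [F2, F3].
offender = np.array([1700.0, 2700.0])
suspect_mean = np.array([1700.0, 2700.0])
within_sd = np.array([50.0, 50.0])      # within-speaker spread, assumed uncorrelated
pop_mean = np.array([1500.0, 2500.0])
pop_sd = np.array([150.0, 150.0])
rho = 0.8                               # assumed F2-F3 correlation across speakers

pop_cov = np.array([[pop_sd[0] ** 2, rho * pop_sd[0] * pop_sd[1]],
                    [rho * pop_sd[0] * pop_sd[1], pop_sd[1] ** 2]])

# Naive approach: one LR per formant, then multiply (treats formants as independent).
naive_lr = np.prod(normal_pdf(offender, suspect_mean, within_sd) /
                   normal_pdf(offender, pop_mean, pop_sd))

# Bivariate approach: the population (denominator) term uses the full covariance.
bivariate_lr = (mvn_pdf(offender, suspect_mean, np.diag(within_sd ** 2)) /
                mvn_pdf(offender, pop_mean, pop_cov))

print(f"naive product of LRs: {naive_lr:.1f}")
print(f"bivariate LR:         {bivariate_lr:.1f}")
```

With these invented numbers, the naive product comes out roughly 3.7 times larger than the bivariate LR: because high F2 and high F3 tend to go together in the population, an offender who is atypical on both is less atypical jointly than the product suggests, so treating the formants as independent overstates the evidence in this example.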

Two other directions in formant frequency analysis are the use of dynamic formant information (the changes of formant frequencies over time) and the use of Long Term Formant Distribution. Recent research on formant dynamics has been performed by McDougall (2007) on spontaneous speech based on the DyViS database, which was compiled recently at Cambridge University (see www.ling.cam.ac.uk/dyvis for further information on that project and publications by its contributors). So far this research has shown that the speaker-distinguishing potential of formant dynamics is high, but that dynamic formant patterns also vary speaker-internally with segmental and prosodic contextual conditions that occur in spontaneous speech. These sources of intraspeaker variation have to be controlled or investigated further in order to increase the practical forensic significance of dynamic formant pattern analysis.
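One common way to make formant dynamics comparable across vowel tokens of different durations is to sample each formant track at fixed proportions of the token's duration (for example at 10%, 20%, ..., 90%) and treat the resulting values as a feature vector. The sketch below shows only that resampling step, using linear interpolation between tracker frames; the track format and the choice of nine points are assumptions for illustration and are not taken from McDougall (2007).

```python
def sample_trajectory(times, values, n_points=9):
    """Resample one formant track at equally spaced proportional time points.

    times:  strictly increasing frame times (s) spanning one vowel token
    values: formant frequency (Hz) at each frame
    With n_points=9 this returns the values at 10%, 20%, ..., 90% of the duration.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching time/value pairs")
    start, end = times[0], times[-1]
    out = []
    j = 0
    for i in range(1, n_points + 1):
        t = start + (end - start) * i / (n_points + 1)
        # Advance to the frame interval [times[j], times[j + 1]] that contains t
        while times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        v0, v1 = values[j], values[j + 1]
        # Linear interpolation between the two neighbouring frames
        out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


# Hypothetical F2 track of a 200 ms vowel rising linearly from 1000 Hz to 1400 Hz
f2_shape = sample_trajectory([0.0, 0.1, 0.2], [1000.0, 1200.0, 1400.0])
```

Vectors of this kind, computed for several formants, could then be compared across speakers; how the intraspeaker variation described above is handled is a separate modelling question.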

Long Term Formant Distribution (LTF) analysis was proposed by Nolan & Grigoras (2005). Unlike the methods discussed so far, where individual speech segments are selected for analysis, LTF makes use of all the reliable formant information available in a speech file. This information is extracted with the help of formant tracking algorithms. Research on LTF based on laboratory speech data and forensic case data is currently being conducted at the German Bundeskriminalamt (BKA). The method has the advantage that large amounts of data (yielding greater statistical robustness) can be processed in a relatively short time, and that working with languages not spoken by the expert is no more difficult than working with familiar languages. Its disadvantage, however, is that the resulting formant patterns are less readily explainable than those derived from selected phonological categories, because the formant values of the different categories are merged.
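The pooling idea behind LTF can be sketched as follows, assuming a formant tracker has already produced per-frame F1-F4 estimates with a reliability flag. The frame format (a `dict` with a hypothetical `"reliable"` key) and the use of mean and standard deviation as summary statistics are illustrative assumptions; this is not the BKA's implementation or the exact procedure of Nolan & Grigoras (2005).

```python
from statistics import mean, stdev


def long_term_formant_summary(frames, formants=("F1", "F2", "F3", "F4")):
    """Pool all reliable frames and summarise each formant's long-term distribution.

    frames: iterable of dicts such as {"F1": 510.0, "F2": 1480.0, "reliable": True}
    Returns {formant: (mean_hz, sd_hz, n_frames)} for formants with >= 2 values.
    """
    pooled = {f: [] for f in formants}
    for fr in frames:
        if not fr.get("reliable", False):
            continue  # skip frames flagged as unreliable by the tracker or analyst
        for f in formants:
            value = fr.get(f)
            if value is not None:
                pooled[f].append(value)
    summary = {}
    for f, values in pooled.items():
        if len(values) >= 2:  # stdev needs at least two values
            summary[f] = (mean(values), stdev(values), len(values))
    return summary
```

Because every reliable frame contributes regardless of which vowel it belongs to, no phonological segmentation is needed, which reflects both the language-independence and the explainability drawback described above.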

Recent findings have also been obtained on other features used within the auditory-acoustic approach. For example, Jessen et al. (2005) provided fundamental frequency (f0) data from a corpus of 100 male speakers of German producing read and spontaneous speech at both normal and increased vocal loudness (Lombard speech). Although speakers differ widely in their average f0, average f0 is also strongly influenced speaker-internally by variations in vocal loudness. It turned out that speakers differ in how much they raise f0 for a constant increase in amplitude. This implies that, at the current state of knowledge, it is not possible to fully compensate for differences in average f0 across different levels of vocal loudness. Recent research suggests that higher speaker-distinguishing power and lower intraspeaker variability may be obtained if, instead of considering only average f0 (or its standard deviation/coefficient of variation), the shape of the f0 distribution is modelled more completely, in Bayesian terms as rigorous as those proposed by Rose for formant measurements (Kinoshita et al. 2007).

At least since 2004, forensic automatic speaker recognition has outgrown its initial developmental stages and is now a mature speech technology discipline with solid knowledge of the ranges of recognition rates that can be obtained with this method. There also seems to be broad agreement on the essential components a speaker recognition system must have in each of its three stages (feature extraction, feature modelling and distance calculation), as well as on how the evidence is evaluated in a Bayesian approach to forensic decision making (see Gonzalez-Rodriguez et al. 2006 and Drygajlo 2007 for recent overviews).
Forensic automatic speaker recognition differs from general automatic speaker recognition in that, in the forensic case, the quality and quantity of the test or training material are often very limited, and the technical and behavioural conditions may not match between test data, training data and the reference population. These adverse conditions have to be taken into account when developing and testing automatic speaker recognition systems intended for forensic use. One important step in testing such systems was the NFI-TNO test, the first evaluation to use authentic forensic data (van Leeuwen et al. 2006).
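The Bayesian evaluation step mentioned above is often described as converting a system score into a likelihood ratio: the score observed in the case (the evidence E) is compared with two score distributions, one from same-speaker comparisons (within-source variability) and one from comparisons with a reference population (between-source variability). The sketch below estimates both densities with a Gaussian kernel density estimate; the Silverman bandwidth rule and all score values are assumptions for illustration, not taken from any of the cited systems.

```python
import math
from statistics import stdev


def gaussian_kde(samples, bandwidth=None):
    """Return a Gaussian kernel density estimate of the samples as a callable."""
    n = len(samples)
    if bandwidth is None:
        # Silverman's rule of thumb (an assumed default, not a forensic standard)
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def score_to_lr(evidence_score, within_scores, between_scores):
    """LR = p(E | same speaker) / p(E | different speakers)."""
    p_same = gaussian_kde(within_scores)(evidence_score)
    p_diff = gaussian_kde(between_scores)(evidence_score)
    return p_same / p_diff


# Hypothetical scores: suspect control recordings vs. suspect model,
# and questioned recording vs. reference population models
within = [4.0, 4.5, 5.0, 5.5, 6.0]
between = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0]
```

In casework the reference scores would have to come from recordings whose conditions match the case as closely as possible; otherwise the mismatch problems discussed later in this section distort the LR.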

Most current systems use Gaussian Mixture Models (GMM) adapted from a Universal Background Model (UBM) by Maximum A Posteriori (MAP) estimation, with Mel Frequency Cepstral Coefficients (MFCC) as spectral features. The MFCCs are extracted automatically from the signal about every 10 milliseconds, using analysis windows of about 20 ms. Some systems also use delta and even delta-delta feature values in order to capture dynamic speech patterns (cf. the discussion of formant dynamics above). Alternatives to GMM modelling in text-independent automatic speaker recognition are vector quantization, artificial neural networks and, probably most importantly, support vector machines (Campbell et al. 2006). At present, however, there seems to be only limited experience of how these alternative methods would work in a forensic framework. At the spectral feature level, an alternative to MFCC is RASTA-PLP (Relative Spectral Transform – Perceptual Linear Prediction), as proposed for forensic automatic speaker recognition by the signal processing group at the Swiss Federal Institute of Technology Lausanne (Drygajlo et al. 2003, Alexander et al. 2004). The BKA have adopted RASTA-PLP (in slightly modified form), as well as the normalisation method, from the Lausanne group. In the NFI-TNO test, the BKA found improved recognition rates compared with MFCC.
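The core of UBM-MAP-GMM modelling can be sketched with scalar features instead of MFCC vectors. In mean-only MAP adaptation, each UBM component's mean is moved towards the speaker's data in proportion to how much of that data the component accounts for, controlled by a relevance factor; a test recording is then scored by the average log-likelihood ratio between the adapted model and the UBM. The one-dimensional features, the toy UBM and the relevance factor of 16 (a value often reported in the literature) are assumptions made to keep the example small.

```python
import math


def gauss(x, mean, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def map_adapt_means(ubm, data, relevance=16.0):
    """Mean-only MAP adaptation of a 1-D GMM.

    ubm:  list of (weight, mean, variance) tuples
    data: scalar feature values from the speaker's enrolment speech
    """
    n = [0.0] * len(ubm)      # soft frame counts per component
    first = [0.0] * len(ubm)  # posterior-weighted sums of x per component
    for x in data:
        likes = [w * gauss(x, m, v) for w, m, v in ubm]
        total = sum(likes)
        if total == 0.0:
            continue  # frame far from every component: skip instead of dividing by zero
        for k, like in enumerate(likes):
            post = like / total
            n[k] += post
            first[k] += post * x
    adapted = []
    for k, (w, m, v) in enumerate(ubm):
        alpha = n[k] / (n[k] + relevance)  # data-dependent adaptation coefficient
        ex = first[k] / n[k] if n[k] > 0 else m
        adapted.append((w, alpha * ex + (1 - alpha) * m, v))  # weights/variances kept
    return adapted


def avg_log_likelihood(gmm, data):
    return sum(math.log(sum(w * gauss(x, m, v) for w, m, v in gmm)) for x in data) / len(data)


def llr_score(speaker_gmm, ubm, test_data):
    """Average log-likelihood ratio of test data: speaker model vs. UBM."""
    return avg_log_likelihood(speaker_gmm, test_data) - avg_log_likelihood(ubm, test_data)


# Toy two-component UBM and hypothetical enrolment data clustered around 2.6
ubm = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]
spk = map_adapt_means(ubm, [2.5, 2.7, 2.4, 2.6, 2.8] * 10)
```

Components that see little of the speaker's data stay close to the UBM, which is what makes the adapted model usable with limited enrolment material.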

In addition to spectral features like MFCC, some groups include “higher-level” features (within that terminology, “lower-level” features are the familiar spectral features such as MFCC). One example of this approach is the SuperSID project at MIT Lincoln Lab (Campbell et al. 2003). Higher-level features are units and parameters familiar from linguistics and phonetics, such as sounds and words and their sequencing in a speech sample, f0 and amplitude distributions and their movements over time, segment or syllable durations, and pausing behaviour. The general finding is that recognition rates with classical spectral features alone are better than with higher-level features alone, but that combining higher-level and lower-level information gives the best results. However, many higher-level features need more training material than lower-level features, and such material can be difficult to obtain in forensic cases. On the other hand, higher-level features are generally more robust against reductions in technical quality than lower-level features, which makes them interesting forensically.

The use of higher-level features within automatic speaker recognition and the use of analogous phonetic and linguistic features within auditory-acoustic speaker identification are related but not identical approaches. Within automatic speaker recognition, higher-level features are used only if they can be extracted automatically. This ensures fast processing and removes the need to involve a phonetic expert, but automatic higher-level feature extraction will inevitably make errors, especially with forensic material (e.g. errors in the recognition and segmentation of sounds or words, or errors in f0 tracking). Within auditory-acoustic speaker identification, such errors are avoided by analysing the speech data manually or by correcting the results of automatic routines, in both cases on the basis of the expert's phonetic or linguistic knowledge. The implications of this difference in a forensic context will need to be discussed.

Having characterised both the auditory-acoustic and the automatic approach to forensic speaker identification, the important question arises whether and how these two methods should interact in forensic casework and in the presentation of results in court. Although hard data on this issue are scarce, a survey carried out in the ENFSI (European Network of Forensic Science Institutes) Working Group for Forensic Speech and Audio Analysis (FSAAWG) has shown that most forensic labs currently apply only one of these methods. At the very least, it should be said that the auditory-acoustic and the automatic approach are complementary in many respects, and that using only one of them misses out on those aspects where the chosen method is weak. For example, automatic speaker recognition can process large amounts of data in a short time, which makes casework more manageable and, more importantly, gives a very accurate account of the system's performance when many speaker comparisons are run for testing purposes. This potential ties in very well with current demands for quality assurance and evaluation. Because expert analysis takes more time, it will not be possible to gain such an accurate picture of the performance of the auditory-acoustic method. Another major advantage of automatic speaker recognition is that, in practice, it matters little which languages are spoken in the case material (e.g. van Leeuwen et al. 2006). It is entirely possible, for example, to process questioned and suspect samples spoken in English when the reference population is in Spanish. Given the increase in casework involving languages foreign to the investigator, which many forensic labs face today, this is an important asset of automatic speaker recognition. For the auditory-acoustic approach, on the other hand, foreign languages inevitably pose additional difficulties that require collaboration with language specialists or informants.
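As an illustration of the kind of performance figure that large-scale automatic testing makes possible, the sketch below computes an approximate equal error rate (EER), the operating point at which the rate of missed same-speaker trials equals the rate of false alarms on different-speaker trials. It simply sweeps thresholds over the observed scores, an assumption chosen for brevity; evaluations often use interpolated or convex-hull estimates instead, and the review itself does not prescribe a particular measure.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the threshold over all observed scores.

    target_scores:    scores of same-speaker trials
    nontarget_scores: scores of different-speaker trials
    Returns (eer, threshold) at the threshold where miss and false-alarm rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # Miss: a same-speaker trial scored below the threshold
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a different-speaker trial scored at or above the threshold
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            best = (gap, (p_miss + p_fa) / 2, t)
    return best[1], best[2]


# Hypothetical scores from a small test run
eer, threshold = equal_error_rate([2.0, 3.0, 4.0, 5.0], [-1.0, 0.0, 1.0, 2.5])
```

With thousands of automatically scored trials such a figure becomes statistically stable, which is exactly what is hard to achieve for a method that requires expert time per comparison.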

Proceeding from the opposite side, the auditory-acoustic method has several advantages over the automatic method. One is the explainability of the observations, measurements and results in terms of long-established academic disciplines that deal specifically with human language and speech behaviour (esp. phonetics and linguistics). Although automatic speaker recognition has its own scientific grounding in disciplines such as signal processing, pattern recognition, statistics and psychoacoustics, this scientific background is of a more general kind than in auditory-acoustic speaker identification, and explanations of the particularities of a case are therefore inevitably more general and black-box-like as well. This aspect of explainability might not matter much for surveillance purposes, where only the final results count, but it can be important in court, where explanations are required and the expert witness is expected to be a specialist in all relevant matters concerning language and speech. Explainability is, of course, also important for the development of forensic speaker identification as a scientific discipline in its own right. Another advantage of the auditory-acoustic method is its flexibility in cases with qualitative and quantitative limitations. For example, a 5-second questioned recording can contain a lot of valuable auditory-acoustic speaker-specific information, whereas such a sample would generally be too short for automatic speaker recognition. According to experience at the BKA, automatic speaker recognition can be applied in only about 1/3 of all voice comparison cases.

The behaviour of the BKA automatic speaker recognition system SPES (Sprechererkennungssystem = speaker recognition system) has been tested on cases processed with both the auditory-acoustic and the automatic method (Bross et al. 2006, Jessen et al. 2007). Experience so far shows that in about 90% of the cases where both methods could be applied, the results were congruent, i.e. both methods yielded identity statements or both yielded non-identity statements. In most of the non-congruent cases, automatic speaker recognition seems to have reacted too strongly to mismatched or “overmatched” behavioural and technical conditions. In such cases the system reported non-identity when the conditions of the questioned recordings were very different from those of the suspect recording (mismatch), and it reported identity when those recordings were similar in their conditions in ways that differed from most other cases in the reference population (overmatch). This problem of condition mismatch or overmatch shows that more work needs to be done on compensating for these effects (cf. Botti et al. 2004). A different kind of problem occurred in some cases where the system reported identity whereas differences in dialect or other linguistic properties revealed by auditory-acoustic analysis strongly suggested non-identity. It is clear that, at least at present and in the near future, automatic systems are unable to detect most of the subtle linguistic-phonetic features that the auditory-acoustic expert will notice.

The complementary nature of automatic and auditory-acoustic speaker identification and the practical advantages of applying them side by side in casework suggest that future research and development on speaker identification should involve collaboration between linguists/phoneticians and speech technologists. A remarkable example of such research cooperation is found in Gonzalez-Rodriguez et al. (2007). As far as practical casework is concerned, the ideal situation is that the forensic lab has both linguistic/phonetic and speech technology experts working together on a voice comparison case. Failing that, the most reasonable alternative is for a linguistic/phonetic expert who carries out the auditory-acoustic analyses to be trained in the application of automatic speaker recognition. The alternative of training a speech engineer in linguistics and phonetics is not realistic, since full academic education in these disciplines is required. One reason is that it cannot be predicted which aspects of linguistics and phonetics will become relevant in a specific forensic case. Hence a “compact” phonetics/linguistics training course limited to a few weeks is not sufficient.

It would be going too far to make the use of automatic speaker identification a requirement. But practitioners of voice comparisons should be aware that vocal tract characteristics are an important source of speaker-specific differences. Automatic speaker identification is one method of accounting for interindividual vocal tract variations. But if another method is used, such as the systematic elicitation and evaluation of formant frequency patterns, the vocal tract aspect of speaker identification is respected as well. Future research will have to show which way of accounting for vocal tract characteristics is going to be the best one in a forensic context.

• Aitken, C.G.G.; Lucy, D. Evaluation of trace evidence in the form of multivariate data. Applied Statistics 53: 109-122, 2004.

• Alexander, A.; Botti, F.; Dessimoz, D.; Drygajlo, A. The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications. Forensic Science International 146, Suppl. 1: S95-99, 2004.

• Botti, F.; Alexander, A.; Drygajlo, A. On compensation of mismatched recording conditions in the Bayesian approach for forensic automatic speaker recognition. Forensic Science International 146, Suppl. 1: S101-106, 2004.

• Broeders, A.P.A. Forensic Speech and Audio Analysis, Forensic Linguistics 1998-2001. 13th INTERPOL Forensic Science Symposium, 2001.

• Broeders, A.P.A. Forensic Speech and Audio Analysis, Forensic Linguistics 2001-2004. 14th INTERPOL Forensic Science Symposium, 2004.

• Bross, F.; Jessen, M.; Köster, J.-P. Some experiences from tests of an automatic speaker recognition system under forensic conditions. Proceedings of the 4th European Academy of Forensic Science Conference, Helsinki, 2006, pp. 45-46.

• Byrne, C.; Foulkes, P. The ‘mobile phone effect’ on vowel formants. International Journal of Speech, Language and the Law 11: 83-102, 2004.

• Campbell, J.P.; Reynolds, D.A.; Dunn, R.B. Fusing high- and low-level features for speaker recognition. Proceedings of Eurospeech 2003.

• Campbell, W.M.; Campbell, J.P.; Reynolds, D.A.; Singer, E.; Torres-Carrasquillo, P.A. Support vector machines for speaker and language recognition. Computer Speech and Language 20: 210-229, 2006.

• Drygajlo, A. Forensic automatic speaker recognition. IEEE Signal Processing Magazine 24: 132-135, 2007.

• Drygajlo, A.; Meuwly, D.; Alexander, A. Statistical methods and Bayesian interpretation of evidence in forensic automatic speaker recognition. Proceedings of Eurospeech 2003, pp. 689-692, 2003.

• French, P.; Harrison, P. Investigative and evidential applications of forensic speech science. In: Witness Testimony, Psychological, Investigative and Evidential Perspectives, Heaton-Armstrong et al. Eds, Oxford: Oxford University Press, pp. 247-262, 2006.

• Gonzalez-Rodriguez, J.; Drygajlo, A.; Ramos-Castro, D.; Garcia-Gomar, M.; Ortega-Garcia, J. Robust estimation, interpretation and assessment of likelihood ratios in forensic speaker recognition. Computer Speech and Language 20: 331-355, 2006.

• Gonzalez-Rodriguez, J.; Rose, P.; Ramos, D.; Toledano, D.T.; Ortega-Garcia, J. Emulating DNA: Rigorous quantification of evidential weight in transparent and testable forensic speaker recognition. IEEE Transactions on Audio, Speech, and Language Processing 15: 2104-2115, 2007.

• Jessen, M.; Köster, O.; Gfroerer, S. Influence of vocal effort on average and variability of fundamental frequency. International Journal of Speech, Language and the Law 12: 174-213, 2005.

• Jessen, M.; Bross, F.; Gfroerer, S. Developments in automatic speaker recognition at the BKA. Paper presented at the 16th Annual Conference of the International Association for Forensic Phonetics and Acoustics, Plymouth, UK, 2007 [www.iafpa.net/confprog07.htm].

• Kinoshita, Y.; Ishihara, S.; Rose, P. Beyond the long-term mean: Multivariate likelihood ratio-based FSR using F0 distribution parameters. Paper presented at the 16th Annual Conference of the International Association for Forensic Phonetics and Acoustics, Plymouth, UK, 2007 [www.iafpa.net/confprog07.htm].

• McDougall, K. Distinguishing speakers using formant dynamics in read and spontaneous speech: a study of British English /u:/. Paper presented at the 16th Annual Conference of the International Association for Forensic Phonetics and Acoustics, Plymouth, UK, 2007 [www.iafpa.net/confprog07.htm].

• Nolan, F.; Grigoras, C. A case for formant analysis in forensic speaker identification. International Journal of Speech, Language and the Law 12: 143-173, 2005.

• Rose, P. Forensic Speaker Identification. London: Taylor & Francis, 2002.

• Rose, P. Technical forensic speaker recognition: evaluation, types and testing of evidence. Computer Speech and Language 20: 159-191, 2006 (a).

• Rose, P. Accounting for correlation in linguistic-acoustic likelihood ratio-based forensic speaker discrimination. Proceedings of Speaker Odyssey 2006 (b).

• Van Leeuwen, D.A.; Martin, A.F.; Przybocki, M.A.; Bouten, J.S. NIST and NFI-TNO evaluations of automatic speaker recognition. Computer Speech and Language 20: 128-158, 2006.

Forensic Linguistics / Authorship Identification (Sabine Ehrhardt)

Authorship identification, authorship attribution, forensic linguistics, forensic stylistics, qualitative approach, quantitative approach, corpus linguistics

In a broad sense, forensic linguistics applies linguistic theory and methodology to forms of language in legal and forensic contexts: “The criminal process is full of language events from beginning to end.” (Solan/Tiersma 2005: 13) This also means that forensic linguistics covers a broad spectrum of fields of application (see Shuy 2006, Olsson 2004, Gibbons 2003 and McMenamin 2002).

Forensic linguistics can generally be subdivided into two distinct areas, comprehension of language and production of language: “First, there is evidence whether a specific person, persons or a class of people could comprehend certain language. Second, there is evidence as to whether a specific person, persons or a class of people could produce certain language.” (Gibbons 1994: 320) Comprehensibility of language refers to such diverse issues as the interpretation and translation of legal texts, the study of courtroom language and, more specifically, questions arising from disputes about the wording of contracts, insurance policies or trademarks, to name just a few (Solan/Tiersma 2005, Gibbons 2003 and McMenamin 2002).

Forensic linguistics in the narrow sense refers to the production of texts. The German Bundeskriminalamt (BKA) has been working with forensic linguistics for about 20 years and is one of the very few state-run institutions worldwide that make extensive use of linguistics in a forensic context. The texts analysed there are either texts that constitute an offence per se, e.g. extortion letters, threats and libel, or texts that appeared in the context of criminal offences, e.g. written accusations and letters claiming responsibility for offences. Each year, between 200 and 400 texts are submitted for analysis by German police authorities, public prosecutors or the courts.

In a qualitative approach, the methodological basis of forensic linguistics is the analysis of errors and stylistic features (Dern 2006b, Ehrhardt [to appear in 2007]). Error analysis is carried out at the linguistic levels of punctuation, orthography, morphology, syntax and vocabulary. It comprises at least three aspects: identification of the error (what the author did wrong and what the correct form would be), description of the error with respect to the linguistic level, and description of the error’s origin. The analysis of stylistic features is more complex, as there are no tangible guidelines to go by; different points of view therefore have to be taken into consideration. The choice made by an author needs to be analysed in the light of the alternatives the author had, and with regard to factors such as appropriateness and communicative function, as well as deliberate or unconscious deviation from communicative demands and expectations. Because stylistic choices are made at various linguistic levels, the analysis of style has to be as comprehensive as possible, covering all aspects of punctuation, orthography, grammar, vocabulary, text layout and text structure.

The analysis of errors and stylistic features is the basis for three main tasks of forensic linguistics, namely text analysis, text comparison and the management of a text corpus. The term text analysis refers to the interpretation of the findings of error and style analysis with respect to characteristics of the author. Thus, the aim of a text analysis is to categorise the author of an incriminating text. Olsson (2004) uses the term authorship profiling but rightly emphasises that “this is not a psychological type of study.” (Olsson, 2004: 98) Instead, the analysis is aimed at “the profile of an individual as an author, rather than the author as an individual, in other words the sum of authorship characteristics which describe the author qua author.” (ibid.) Characteristics of an author assessed with the help of linguistic analysis include, for example, the author’s mother tongue, age, linguistic competence and education, as well as special knowledge and interests. The seriousness of threats and an author’s sex are not commented on, as written texts usually do not allow these aspects to be interpreted by linguistic means. When interpreting the findings of a linguistic analysis, one always has to bear in mind that an author might use red herrings. Disguise is a common feature of incriminating texts, although authors have only a limited number of manipulation strategies at hand, e.g. pretending to be a non-native writer or unsystematically including many orthographical and morphological mistakes (Ehrhardt 2007). Intentional manipulations can be detected by considering aspects such as the incompatibility of errors with their correct equivalents within the same text, or the combination of extremely faulty language with a perfect understanding of the text content (Dern 2006a).

The analysis of errors and stylistic features is also the basis for text comparisons. A text comparison is either the comparison of two or more anonymous incriminating texts, with the aim of linguistically supporting as yet unproven links between different criminal offences (resemblance model of analysis [McMenamin 2002: 117]), or the comparison of an incriminating text with private writings of one or more suspects, in order to assess whether one of the suspects authored the anonymous text (consistency model of analysis [McMenamin 2002: 118]). The result of each type of text comparison is a statement of probability. These statements of probability are verbal expressions that are not based on calculations but on qualitative estimations derived from the analysis of both similarities and differences between the texts in question. Despite the well-known disadvantages of such statements of probability, they are used in authorship identification to ensure that expert reports remain comparable and transparent over time. It goes without saying that this practice obliges the experts to make perfectly clear to the courts, the police and the prosecution what is meant by such an expression of probability.

Besides text analysis and text comparison, the management of a text corpus is the third main task of forensic linguistics. In Germany, the BKA is required to manage a corpus in accordance with the BKA Act (§7 BKAG) as a result of its function as a central federal agency for German police authorities. The German National Corpus of Incriminating Texts consists of about 4,500 writings, ranging in length from fewer than 40 tokens to more than 10,000 tokens. The corpus is continually expanding, as each case dealt with is added. The work on the text corpus is supported by special software referred to by the acronym KISTE, short for Kriminaltechnisches Informationssystem Texte (‘forensic information system for texts’). It supports functions as diverse as the administration of texts, linguistic processing and extensive searching, and it includes a basic statistical tool. KISTE also processes the data gathered in text analyses and text comparisons. Each text sent to the BKA for analysis is routinely checked for linguistic and offence-related similarities. Furthermore, KISTE plays an important role as a tool for backing up statements on the significance of selected text features, with options to take into account factors such as type of text, context and communicative function.

A qualitative approach to forensic linguistics like the one outlined above certainly has its advantages. The analysis of errors and stylistic features to differentiate between authors has proven more useful than other approaches when dealing with the peculiarities of incriminating texts. The main reason for this is undoubtedly the shortness of most incriminating texts. Between 2002 and 2005, the average text length was as low as 248 tokens per text, with 65 per cent of incriminating texts consisting of fewer than 200 tokens (Ehrhardt, to appear in 2007). Such limited data clearly impose restrictions on linguistic analysis, and quantitative methods based on frequencies of stylistic features and words are rarely successful with these short texts. For institutions that have to deal with between 200 and 400 texts each year, disregarding more than half of the texts solely because of their brevity is simply not an option.

Of course, this approach is also shaped by the characteristics of each country’s legal system. In the American legal system, for instance, the admissibility of linguistic evidence in courts is restricted because forensic linguistics does not seem to fulfil the strict Daubert criteria of scientific validity (Solan & Tiersma 2005, Tiersma & Solan 2002). This has led to developments in forensic linguistics that rely heavily on quantitative approaches and statistics. For example, Chaski (2006 and 2001) has shown for her selection of English texts that the relative frequencies of certain syntactic structures and of syntactically classified punctuation marks may cluster and differentiate documents. Unfortunately, these findings cannot be generalised, because the application of statistical methods (such as the chi-square test) to linguistics has to take into account the “methodological difficulties of identifying valid and reliable markers of authorship”, as Grant and Baker put it (Grant & Baker 2001: 77, also cf. Olsson 2004).
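To make the statistical point concrete, the sketch below computes a Pearson chi-square statistic for a 2 × k table of feature counts from two texts (for example, counts of different punctuation categories). The feature categories and counts are hypothetical, and this is a generic textbook test rather than Chaski's procedure. It also hints at why short texts are problematic: with small counts the expected frequencies fall below the usual rule-of-thumb minimum of about 5, and the chi-square approximation becomes unreliable.

```python
def chi_square_2xk(counts_a, counts_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table.

    counts_a, counts_b: counts of the same k feature categories in text A and text B
    (hypothetical categories, e.g. comma, semicolon and dash usage).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both texts need counts for the same feature categories")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each text needs at least one counted feature")
    grand = total_a + total_b
    stat = 0.0
    used = 0
    for a, b in zip(counts_a, counts_b):
        col = a + b
        if col == 0:
            continue  # category absent from both texts: drop it from the table
        used += 1
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * col / grand
            stat += (observed - expected) ** 2 / expected
    dof = used - 1  # (rows - 1) * (columns - 1) with 2 rows
    return stat, dof
```

For instance, counts of `[10, 20]` versus `[20, 10]` give a statistic of about 6.67 with one degree of freedom, above the 5% critical value of 3.84. A significant result only says that the two count profiles differ; whether the counted features are valid markers of authorship is exactly the open question raised by Grant and Baker.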

Impressive examples of a very complex approach to forensic linguistics are given by Coulthard (2004 and 2002), who, in his analyses of incriminating texts, combines findings from different linguistic theories and methods such as corpus linguistics, discourse analysis, speech act theory and register analysis. In Coulthard’s view, his approach of general linguistic analysis also meets the Daubert criteria of scientific validity on which American court decisions on the admissibility of linguistic evidence are based (Coulthard 2004: 444).

Coulthard also took part in a collaboration in which computer programs, none of them specifically designed for forensic purposes, were successfully applied to incriminating texts (Woolls & Coulthard 1998). Their value for forensic linguistics having been recognised, these programs have since been developed further for different requirements and applications, such as plagiarism cases among students and comparisons of large amounts of legal writings (Johnson & Woolls 2006, Woolls 2003). The BKA is currently testing one of these programs by applying it to a specific type of text that is very different from the majority of incriminating texts, namely letters claiming responsibility for politically motivated offences. These texts are usually quite long (well over 1,000 tokens per text), are written in perfect German and share identical parts of vocabulary; in short, they appear to have very few distinctive features. The aim is to use rapid lexical comparisons to cluster large corpora of such texts with respect to connections between different extremist groups and writers.
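A rapid lexical comparison of the kind described can be sketched with a simple vocabulary-overlap measure. The example below uses the Jaccard coefficient on word types (shared types divided by all types occurring in either text) and lists text pairs above a threshold; the tokenizer, the measure, the threshold and the sample sentences are illustrative assumptions, and this is not the program the BKA is testing.

```python
import re


def vocabulary(text):
    """Lower-cased set of word types in a text (simple regex tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Shared word types divided by all word types occurring in either text."""
    va, vb = vocabulary(a), vocabulary(b)
    union = va | vb
    return len(va & vb) / len(union) if union else 0.0


def most_similar_pairs(texts, threshold=0.5):
    """Return (name1, name2, score) for pairs at or above threshold, highest first."""
    names = sorted(texts)
    pairs = []
    for i, n1 in enumerate(names):
        for n2 in names[i + 1:]:
            score = jaccard(texts[n1], texts[n2])
            if score >= threshold:
                pairs.append((n1, n2, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Invented sample sentences standing in for letters claiming responsibility
texts = {
    "A": "We demand the release of all prisoners",
    "B": "We demand the immediate release of all prisoners",
    "C": "The weather is fine today",
}
pairs = most_similar_pairs(texts, threshold=0.5)
```

Because such letters share much of their vocabulary by genre, any threshold would in practice have to be calibrated against the overlap typically found between unrelated texts of the same type.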

Finally, one aspect of forensic linguistics is an issue in many countries: determining the national origin of asylum seekers with the help of linguistic analysis (Eades 2005, Eades & Arends 2004). Asylum seekers who have no documentary proof of their nationality are asked to provide language samples (e.g. in the form of an interview), which are then linguistically analysed in order to verify the person’s claimed origin. Because this concern is shared internationally, an international group of linguists negotiated and later published “Guidelines for the use of language analysis in relation to questions of national origin in refugee cases” (Language and National Origin Group 2004).


• Chaski, C. E. The computational-linguistic approach to forensic authorship attribution. Paper presented at the International Conferences on Language and Law, Düsseldorf, 2006.

• Chaski, C. E. Empirical evaluations of language-based author identification techniques. International Journal of Speech, Language and the Law 8: 1-65, 2001.

• Coulthard, M. Author identification, idiolect and linguistic uniqueness. Applied Linguistics 25: 431-447, 2004.

• Coulthard, M. Whose text is it? On the linguistic investigation of authorship. In: Discourse and Social Life, Sarangi, S., Coulthard, M. Eds., Harlow: Longman, pp. 270-287, 2000.

• Dern, C. Bewertung inkriminierter Schreiben: Zur Problem der Verwischung von Spuren durch Verstellung. Kriminalistik 5: 323-327, 2006a.

• Dern, C. Autorenerkennung. In: Münchener Anwaltshandbuch Strafverteidigung, Widmaier, G. (ed.), München: Beck, pp. 2527-2533, 2006b.

• Eades, D. Applied linguistics and language analysis in asylum seeker cases. Applied Linguistics 26: 503-526, 2005.

• Eades, D.; Arends, J. Using language analysis in the determination of national origin of asylum seekers. International Journal of Speech, Language, and the Law 11: 179-199, 2004.

• Ehrhardt, S. Forensic linguistics at the German Bundeskriminalamt. In: Formal Linguistics and Law, Grewendorf, G., Rathert, M. Eds., Berlin: Mouton de Gruyter, to appear in autumn 2007.

• Ehrhardt, S. Disguise in incriminating texts: theoretical possibilities and authentic cases. Paper presented at the 8th Biennial Conference of the International Association of Forensic Linguists, Seattle, 2007.

• Gibbons, J. Forensic Linguistics: An Introduction to Language in the Justice System. Oxford; Malden, Mass.: Blackwell, 2003.

• Gibbons, J. Ed. Language and the Law. London & New York: Verlag, 1994.

• Grant, T.; Baker, K. Identifying reliable, valid markers of authorship: A response to Chaski. International Journal of Speech, Language, and the Law 8: 66-79, 2001.

• Johnson, A.; Woolls, D. Things fall apart: what happens when students fail to write. Paper presented at the 2nd European IAFL Conference on Forensic Linguistics / Language and the Law, Barcelona, 2006.

• Language and National Origin Group. Guidelines for the use of language analysis in relation to questions of national origin in refugee cases. International Journal of Speech, Language, and the Law 11: 261-266, 2004.

• McMenamin, G. R. Forensic Linguistics: Advances in Forensic Stylistics. Boca Raton & London: CRC Press, 2002.

• Olsson, J. Forensic Linguistics: An Introduction to Language, Crime and the Law. London & New York: Continuum, 2004.

• Shuy, R. Linguistics in the Courtroom: A Practical Guide. Oxford: Oxford University Press, 2006.

• Solan, L. B.; Tiersma, P. M. Speaking of Crime: The Language of Criminal Justice. Chicago: University of Chicago Press, 2005.


• Tiersma, P. M.; Solan, L. B. The linguist on the witness stand: forensic linguistics in American courts. Language 78: 221-239, 2002.

• Woolls, D. Better tools for the trade and how to use them. International Journal of Speech, Language, and the Law 10: 102-112, 2003.

• Woolls, D.; Coulthard, M. Tools for the trade. International Journal of Speech, Language, and the Law 5: 33-57, 1998.

Imaging and video technology

Video surveillance in public areas (Jurrien Bijhold)

In the period of this review, the field of expertise for video analysis has changed completely. Analogue systems have become a rarity, digital systems have become widespread, and new systems with new functionality and new types of cameras are still being developed. Popular TV shows about forensic crime scene investigation, and the worldwide news coverage of the identification, with the help of CCTV, of the suicide bombers responsible for the London attacks of 7 July 2005, have raised public expectations about crime prevention and the identification of perpetrators. There are also many questions about the use of surveillance systems in public areas such as airports, soccer stadiums, public transport, shopping malls and streets, and about their effectiveness in the prevention and investigation of crime and terrorism.

Generally, public surveillance systems consist of a large number of cameras (up to a few hundred), a digital storage system and a monitor room where operators watch the camera images on a wall of video monitors. Images are stored for a limited period of time (up to a few weeks). The operators respond to what they see in the area under surveillance. If a crime has occurred, more information about what happened may be found in the recorded images. However, most systems do not record at a conventional video frame rate of 25 frames per second (PAL) or 30 frames per second (NTSC), but at a much lower rate, typically between 1 and 4 images per second. These recordings are referred to as time-lapse recordings. The effectiveness of these systems is being debated because the operators are not actively involved and could be distracted during an incident.
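The practical consequence of time-lapse recording can be made concrete with a little arithmetic. The sketch below uses an assumed walking speed and an assumed time in view (both hypothetical values); it computes how far a person moves between consecutive images and how many images are guaranteed to show that person.

```python
import math


def displacement_between_images(speed_m_per_s: float, fps: float) -> float:
    """Distance (m) an object moves between two consecutive recorded images."""
    return speed_m_per_s / fps


def guaranteed_images(visible_for_s: float, fps: float) -> int:
    """Minimum number of images showing an object that stays in view for
    visible_for_s seconds, whatever the phase of the recording clock."""
    return math.floor(visible_for_s * fps)


# Hypothetical scenario: a person walking at about 1.4 m/s crosses the
# camera view in 2 seconds.
for fps in (25, 4, 1):
    print(f"{fps:>2} fps: {displacement_between_images(1.4, fps):.2f} m per image, "
          f"at least {guaranteed_images(2.0, fps)} images")
```

At 1 image per second the person moves well over a metre between images, so a brief action (a hand-over, a blow) can fall entirely between two recorded images.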
Research and development is focused on the automated detection of incidents, such as fast-moving cars, violent movements of people, the formation of groups or crowds, movement in areas where no people or cars should be present (e.g. at night), left luggage, etc. Automated detection can be used to alert the operators and to temporarily raise the recording frame rate to a higher or normal level. Most detection schemes suffer from false alarms, generated for instance by a bird flying in front of the camera. This has so far prevented the widespread use of such systems.
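As an illustration of the simplest kind of automated detection, the sketch below flags motion by differencing consecutive greyscale frames and raises the recording rate while motion is present. It is a minimal example, not any of the published systems listed below; the thresholds and the two frame rates (`BASE_FPS`, `ALARM_FPS`) are hypothetical values. The `min_fraction` area threshold suppresses small changes such as pixel noise, but a bird flying close to the lens covers a large part of the image and will still trigger it, which is exactly the kind of false alarm described above.

```python
import numpy as np

BASE_FPS = 2    # hypothetical time-lapse recording rate
ALARM_FPS = 25  # hypothetical full frame rate while motion is detected


def motion_fraction(prev: np.ndarray, curr: np.ndarray, pixel_threshold: int = 25) -> float:
    """Fraction of pixels whose grey value changed by more than pixel_threshold."""
    # Use a signed type so that subtracting uint8 images cannot wrap around.
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return float(np.mean(diff > pixel_threshold))


def recording_rate(prev: np.ndarray, curr: np.ndarray, min_fraction: float = 0.01) -> int:
    """Frame rate to record at, given the two most recent greyscale frames."""
    return ALARM_FPS if motion_fraction(prev, curr) >= min_fraction else BASE_FPS
```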

• Black, J.; Ellis, T.J.; Makris, D. Wide area surveillance with a multi camera network, Conference: Intelligent Distributed Surveillance Systems (IDSS-04), London, UK, 23 Feb. 2004

• Chiu, S.H.; Lu, C.P.; Wen, C.Y. A motion detection-based framework for improving image quality of CCTV security systems, Journal of Forensic Sciences (Sep. 2006), vol. 51, no. 5, p. 1115-1119

• Diamantopoulos, G.; Spann, M., Event detection for intelligent car park video surveillance, Real-Time Imaging (June 2005), vol.11, no.3, p. 233-43

• Desurmont, X.; Bastide, A.; Chaudy, C.; Parisot, C.; Delaigle, J.F.; Macq, B., Image analysis architectures and techniques for intelligent surveillance systems, IEE Proceedings-Vision, Image and Signal Processing (8 April 2005), vol.152, no.2, p. 224-31

• Du You-tian; Chen Feng; Xu Wen-li; Li Yong-bin, A survey on the vision-based human motion recognition, Acta Electronica Sinica (Jan. 2007), vol.35, no.1, p. 84-90


• Duque, D.; Santos, H.; Cortez, P. Prediction of abnormal behaviors for intelligent video surveillance systems, Conference: First IEEE Symposium on Computational Intelligence and Data Mining, Honolulu, 1-5 April 2007

• Fuentes, K.M.; Velastin, S.A., Tracking-based event detection for CCTV systems, Pattern Analysis and Applications (2005), vol.7, no.4, p. 356-64

• Krikke, J. Intelligent surveillance empowers security analysts, IEEE Intelligent Systems (May-June 2006), vol.21, no.3

• Lipton, A.J.; Clark, J.I.; Brewer, P.; Venetianer, P.L.; Chosak, A.J. Activity-based video indexing and retrieval for physical security applications, Intelligent Distributed Surveillance Systems (IDSS-04), 2004, p. 56-60

• Long Xie; Ogawa, M.; Kigawa, Y.; Ogai, H, Intelligent surveillance system design based on independent component analysis and wireless sensor network, Transactions of the Institute of Electrical Engineers of Japan, Part C (2006), vol.126C, no.12, p. 1543-50

• Ming Liang; Xie Guihai; Qi Ziyuan; Wang Xinfeng; Peng Deyun, WLAN-based remote video intelligent surveillance system, Computer Engineering (March 2007), vol.33, no.5, p. 266-8

• Pan Feng; Wang Xuan-yin; Wang Quan-qiang, Human detection based on head and shoulder feature in intelligent surveillance system, Journal of Zhejiang University (April 2004), vol.38, no.4, p. 397-401

• Weidong Zhang; Feng Chen; Wenli Xu; Enwei Zhang, Real-time video intelligent surveillance system, Conference: 2006 IEEE International Conference on Multimedia and Expo, Toronto, Ont., Canada, 9-12 July 2006

• Xu Junli; Liu Jiwei; Wang Zhiliang; Guo Jianbo, Wireless network based intelligent surveillance system design and implementation, Control & Automation (2005), no.6, p. 5-7

Because image quality is still a compromise with the storage capacity of systems, research continues on compression techniques and fast image processing.
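The trade-off between image quality and storage capacity can be estimated with simple arithmetic. In the sketch below, the number of cameras, the average compressed image size and the retention period are hypothetical example values; in practice the image size depends strongly on the codec, the resolution and the scene content.

```python
def storage_terabytes(cameras: int, fps: float, kilobytes_per_image: float,
                      retention_days: float) -> float:
    """Storage (TB, 10**12 bytes) needed to keep all images for the retention period."""
    seconds = retention_days * 24 * 3600
    total_bytes = cameras * fps * kilobytes_per_image * 1000 * seconds
    return total_bytes / 1e12


# Hypothetical system: 100 cameras, 20 kB per compressed image, 14 days' retention.
for fps in (2, 25):
    print(f"{fps:>2} fps: {storage_terabytes(100, fps, 20, 14):.1f} TB")
```

Under these assumptions, recording 25 images per second instead of 2 multiplies the required storage by 12.5, which is one reason why time-lapse recording and efficient compression remain attractive.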

• Bing-Fei Wu; Yen-Lin Chen; Chao-Jung Chen; Chung-Cheng Chiu; Chorng-Yann Su, A real-time wavelet-based video compression approach to intelligent video surveillance systems, International Journal of Computer Applications in Technology (2006), vol.25, no.1, p. 50-64

• Sato, K.; Evans, B.L.; Aggarwal, J.K. Designing an embedded video processing camera using a 16-bit microprocessor for a surveillance system, Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology (Jan. 2006), vol.42, no.1, p. 57-68

In more modern systems, many of the cameras can be remotely controlled from the operator room. With these cameras, the operator can pan and tilt the camera in a specific direction and zoom in on small objects, faces, clothing and license plates. This makes video surveillance a more active process, because the operator can periodically check all small areas within the field of view of these cameras. It also becomes easier to assist police officers in the street by checking objects and persons on request. However, there is also a drawback: while a camera is zoomed in on one event in the street, another event in the same street may go undetected and will not be visible in the video recordings of the system.
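The drawback of remotely controlled cameras can be quantified for a camera that follows a fixed tour of preset views. The sketch below is a hypothetical model (the `Preset` class and its fields are illustrative, not part of any product's API): each preset has a dwell time and a travel time needed to move to it, and the result is the longest period during which any one area is not watched.

```python
from dataclasses import dataclass


@dataclass
class Preset:
    """One view in a hypothetical pan/tilt/zoom tour."""
    name: str
    dwell_s: float   # time the camera stays on this view
    travel_s: float  # time needed to pan, tilt and zoom to this view


def tour_cycle(tour: list[Preset]) -> float:
    """Time (s) for the camera to complete one full tour."""
    return sum(p.dwell_s + p.travel_s for p in tour)


def longest_unwatched_gap(tour: list[Preset]) -> float:
    """Longest time (s) any preset area goes unobserved during one tour."""
    cycle = tour_cycle(tour)
    return max(cycle - p.dwell_s for p in tour)


tour = [Preset("entrance", 10, 2), Preset("square", 10, 2), Preset("car park", 10, 2)]
print(tour_cycle(tour), longest_unwatched_gap(tour))
```

An incident shorter than that gap, in an area the camera is not currently pointed at, can be missed entirely; if the operator stops the tour to follow one event, the gap for all other areas becomes open-ended.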

• Kim, Y.-O.; Paik, J.; Jingu Heo; Koschan, A.; Abidi, B.; Abidi, M. Automatic face region tracking for highly accurate face recognition in unconstrained environments, Conference: Proceedings IEEE Conference on Advanced Video and Signal Based Surveillance, AVSS 2003, Miami, FL, USA, 21-22 July 2003

• Young-Ouk Kim; Sangjin Kim; Chang-Woo Park; Ha-Gyeong Sung; Joonki Paik, Dynamic region-of-interest acquisition and face tracking for intelligent surveillance system, Proceedings of the SPIE - The International Society for Optical Engineering (2004), vol.5299, no.1, p. 369-77, Conference: Computational Imaging II, San Jose, CA, USA, 19-20 Jan. 2004

Video surveillance has become a popular topic for research institutes and universities, and research and development is focused on combinations of cameras. Overview images from a camera with a wide-angle view could be used to detect faces or events and to automatically steer a second camera to zoom in on the detected face or event. Detecting and tracking moving people or cars in the images of one camera could help the operators predict when a car or person will become visible in another camera. Fully automated tracking of a person or car across multiple cameras would be possible if that person or car could be recognized in all images with a high level of confidence. It can be expected that surveillance systems with intelligently cooperating cameras and new interfaces to assist the operators will become available in the coming years.
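For the overview-plus-zoom arrangement described above, steering the zoom camera requires converting a detection in the overview image into pan and tilt angles. A minimal sketch, assuming an ideal pinhole overview camera without lens distortion (focal length `f_px` and principal point `cx`, `cy` in pixels) mounted at the same position and orientation as the pan/tilt camera, so that parallax can be ignored; real installations would need a calibrated transformation between the two cameras.

```python
import math


def pixel_to_pan_tilt(u: float, v: float, f_px: float, cx: float, cy: float) -> tuple[float, float]:
    """Pan (positive to the right) and tilt (positive upward) in degrees,
    relative to the optical axis of the overview camera."""
    x = u - cx  # horizontal offset from the principal point (pixels)
    y = v - cy  # vertical offset; image rows increase downward
    pan = math.atan2(x, f_px)
    tilt = math.atan2(-y, math.hypot(x, f_px))
    return math.degrees(pan), math.degrees(tilt)


# Hypothetical detection of a face at pixel (1200, 300) in a 1920 x 1080
# overview image with an assumed focal length of 1000 pixels.
print(pixel_to_pan_tilt(1200, 300, 1000, 960, 540))
```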

• Deng, L.Y., Extended Petri net model for cooperative video surveillance system, Transactions on Information Science and Applications (Oct. 2006), vol.3, no.10, p. 1798-803

• Jung-Hwan Ko; Jun-Ho Lee; Eun-Soo Kim, Adaptive 3D target tracking and surveillance scheme based on pan/tilt-embedded stereo camera system, Proceedings of the SPIE - The International Society for Optical Engineering (2004), vol.5558, no.1, p. 191-201

• Ka Keung Lee; Maolin Yu; Yangsheng Xu, Modeling of human walking trajectories for surveillance, Conference: 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems, Las Vegas, NV, USA, 27-31 Oct. 2003

• Ka Keung Lee; Yangsheng Xu, Boundary modeling in human walking trajectory analysis for surveillance, Conference: 2004 IEEE International Conference on Robotics and Automation, New Orleans, LA, USA, 26 April-1 May 2004

• Kyoung-Mi Lee, Intelligent tracking persons through non-overlapping cameras, Conference: Advances in Intelligent Computing. International Conference on Intelligent Computing, ICIC 2005. Proceedings, Part II, Hefei, China, 23-26 Aug. 2005

• Liang Ming; Guihai Xie; Hao Li; Lei Yang, Research on remote intelligent surveillance using wireless network, Conference: Sixth World Congress on Intelligent Control and Automation, Dalian, China, 21-23 June 2006

• Qureshi, F.Z.; Terzopoulos, D. Towards intelligent camera networks: a virtual vision approach, Conference: Proceedings. 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS), Beijing, China, 15-16 Oct.

• Youtian Du; Feng Chen; Wenli Xu; Yongbin Li, Recognizing interaction activities using dynamic Bayesian network, 18th International Conference on Pattern Recognition, Hong Kong, China, 20-24 Sept. 2006

For tracking people in a crowd, alternative methods are being developed based on laser scanners, which produce images representing the distance to the points of reflection. A comparison between the video and laser approaches has not yet been made.
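As an illustration of how a laser range scan becomes something that can be tracked, the sketch below converts one scan (range measurements at known angles) into 2D points and splits it into clusters wherever neighbouring points are far apart; the cluster centroids could then be passed to a tracker. This is a generic first step with an assumed parameter (the 0.3 m gap is hypothetical), not the method of Cui et al.

```python
import math


def scan_to_points(angles_rad, ranges_m):
    """Convert a polar laser scan into Cartesian (x, y) points in metres."""
    return [(r * math.cos(a), r * math.sin(a)) for a, r in zip(angles_rad, ranges_m)]


def segment(points, max_gap_m=0.3):
    """Split consecutive scan points into clusters wherever two neighbouring
    points are more than max_gap_m apart."""
    clusters, current = [], []
    for p in points:
        if current and math.dist(current[-1], p) > max_gap_m:
            clusters.append(current)
            current = []
        current.append(p)
    if current:
        clusters.append(current)
    return clusters


def centroids(clusters):
    """Mean position of each cluster, a crude estimate of an object's location."""
    return [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) for c in clusters]
```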

• Jinshi Cui; Hongbin Zha; Huijing Zhao; Shibasaki, R. Robust tracking of multiple people in crowds using laser range scanners, 18th International Conference on Pattern Recognition, Hong Kong, China, 20-24 Sept. 2006

In order to reduce the number of cameras, some companies offer high-resolution cameras with a fisheye lens, or domes with several cameras, that can produce panoramic images with a 360-degree view. With the viewing software for such images, the operator can select a part of the high-resolution image, which is then presented on the monitor as a normal camera view in the direction the operator has selected. The advantage of such systems is that a complete overview is stored while the operator can focus on events of interest.
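The "normal camera view" that such viewing software presents can be sketched as a reprojection. The example below assumes that the panorama has already been converted to an equirectangular image (360° of longitude across the width, 180° of latitude down the height); a raw fisheye image would first need a lens model, which is not shown. For each pixel of the requested view it computes the viewing ray for the chosen pan (yaw), tilt (pitch) and field of view, and looks up the nearest panorama pixel. The function name and parameters are hypothetical.

```python
import numpy as np


def perspective_view(pano: np.ndarray, yaw_deg: float, pitch_deg: float,
                     fov_deg: float, out_w: int, out_h: int) -> np.ndarray:
    """Render a rectilinear view from an equirectangular panorama
    (nearest-neighbour sampling)."""
    pano_h, pano_w = pano.shape[:2]
    f = (out_w / 2) / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    cols, rows = np.meshgrid(np.arange(out_w), np.arange(out_h))
    # Viewing rays in camera coordinates: x right, y down, z forward.
    x = cols - (out_w - 1) / 2
    y = rows - (out_h - 1) / 2
    z = np.full(x.shape, f)
    # Tilt: rotate about the x axis (positive pitch looks up).
    p = np.radians(pitch_deg)
    y1 = y * np.cos(p) - z * np.sin(p)
    z1 = y * np.sin(p) + z * np.cos(p)
    # Pan: rotate about the vertical axis (positive yaw looks right).
    w = np.radians(yaw_deg)
    x2 = x * np.cos(w) + z1 * np.sin(w)
    z2 = -x * np.sin(w) + z1 * np.cos(w)
    # Ray direction -> longitude/latitude -> panorama pixel indices.
    lon = np.arctan2(x2, z2)                 # 0 = centre column of the panorama
    lat = np.arctan2(-y1, np.hypot(x2, z2))  # positive = above the horizon
    u = ((lon / (2 * np.pi) + 0.5) * pano_w).astype(int) % pano_w
    v = np.clip(((0.5 - lat / np.pi) * pano_h).astype(int), 0, pano_h - 1)
    return pano[v, u]
```

Nearest-neighbour sampling keeps the sketch short; a viewer would normally interpolate between panorama pixels.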


• Ming-Liang Wang; Chi-Chang Huang; Huei-Yung Lin, An intelligent surveillance system based on an omnidirectional vision sensor, 2006 IEEE Conference on Cybernetics and Intelligent Systems, Bangkok, 7-9 June 2006

• Trivedi, M.M.; Huang, K.S.; Tarak Gandhi; Hall, B.; Harlow, K. Distributed omni-video arrays and digital tele-viewer for customized viewing, event detection and notification, Proceedings. ITCC 2004. International Conference on Information Technology: Coding and Computing, Las Vegas, NV, USA, 5-7 April 2004

The large numbers of cameras and images require new procedures for the use of this type of evidence in criminal investigations. First, all cameras and surveillance systems in the area of the investigation have to be identified. Then, all images that could be of interest to the investigation have to be retrieved. Retrieving images from modern digital surveillance systems has become a task for specialists in digital evidence, because in practice special skills are often needed to deal with problems in the file copy process and with the software for viewing the images. Finally, the analysis of all the images requires careful organization, because it is not easy to review the conclusions of an analysis without repeating the whole process. In practice, more than an hour of footage from tens of cameras is often analyzed, for example to find out where robbers came from and to find images that show enough visual information for a witness to recognize them. If video footage from different surveillance systems is used, synchronization can be a problem: because the internal clocks of the systems are generally not tied to a reference time, the time and date stamps in the images cannot be relied upon. Working groups of specialists in forensic video analysis (see last chapter) have produced best practice manuals for dealing properly with all these problems.

The latest development is sometimes referred to as sensor fusion: all available camera images are analyzed or viewed in relation to other information. Some of the large surveillance systems show maps of buildings or cities on which the operator can select cameras for viewing. New applications arise from the possibility of projecting other information onto these maps, such as the positions of sensors that have raised an alert for trespassing, burglary, fire or noise, GPS information from cars and persons, emergency calls, and GIS information such as the names and phone numbers of shop owners with a video system, company names, altitude, traffic jams, etc.
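The clock problem mentioned above can be handled by relating each system's time stamps to a reference time. A minimal sketch, assuming that for each system one or more "anchor" events can be found whose true time is known from another source (for example an event also captured by a system with a verified clock). With one anchor only a constant offset is corrected; with two or more, a linear clock drift is also estimated by least squares. The function name `fit_clock` is hypothetical.

```python
from datetime import datetime, timedelta


def fit_clock(pairs: list[tuple[datetime, datetime]]):
    """pairs: (system time stamp, reference time) for anchor events.
    Returns a function that maps system time stamps to reference time."""
    if not pairs:
        raise ValueError("at least one anchor event is needed")
    t0 = pairs[0][0]
    xs = [(s - t0).total_seconds() for s, _ in pairs]
    ys = [(r - t0).total_seconds() for _, r in pairs]
    if len(pairs) == 1:
        slope, intercept = 1.0, ys[0] - xs[0]  # constant offset only
    else:
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("anchor events need distinct system times")
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        intercept = my - slope * mx

    def to_reference(system_time: datetime) -> datetime:
        x = (system_time - t0).total_seconds()
        return t0 + timedelta(seconds=slope * x + intercept)

    return to_reference
```

A single anchor is enough for a short incident; over a retention period of weeks a drifting clock can accumulate a noticeable error, which is why a second anchor far from the first is useful.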

• Lo, B.P.L.; Jie Sun; Velastin, S.A. Fusing visual and audio information in a distributed intelligent surveillance system for public transport systems, Acta Automatica Sinica (May 2003), vol.29, no.3, p. 393-407

• Perez, O.; Patricio, M.A.; Garcia, J.; Carbo, J.; Molina, J.M. Fusion of surveillance information for visual sensor networks, Conference: 2006 9th International Conference on Information Fusion, Florence, Italy, 10-13 July 2006

Formats and Codecs / file repair (Zeno Geradts)

Image formats such as JFIF (often saved with the JPEG or JPG file extension), GIF and PNG are widely used on digital systems. The BKA has several tools available for finding, analyzing and repairing such files. It developed the tool PBV as a police viewer for image and video files. The JLAB JPEG toolkit is software for the restoration and analysis of corrupt JFIF data. The program PAT was developed for the analysis of EXIF data and thumbnails. Other tools that have been developed are PIA, for the extraction of image data and JPEG fragments; JPEG-puzzler, for the visualization of JFIF fragments; and BI (Bitmap Interpreter), for the visualization of bitmap data (information from Silke Terodde, BKA, at the VideoFREE workshop in Den Haag, 27-28 June 2007). The NICC in Belgium (Patrick de Smet et al.) developed software for repairing bit errors and syntax errors in GIF as well as MPEG-4 files. Furthermore, they analyzed the PNG format, enabling them to reconstruct the PNG header and compensate for missing palette data in the file. The FBI in the US has developed software to help recover deleted JPEG images from flash memory.
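The kind of structural analysis that JPEG extraction and repair tools perform can be illustrated by carving JPEG files from a raw byte dump (for example an image of a memory card). The sketch below is not the BKA, NICC or FBI software: it assumes each file is stored contiguously (no fragmentation), finds candidate start-of-image markers, and walks the JPEG marker segments to the end-of-image marker. Walking the segments, rather than searching for the first end marker, avoids cutting a file short at the end marker of an embedded EXIF thumbnail.

```python
from typing import Optional

SOI = b"\xff\xd8"
# Markers without a length field: TEM (0x01) and RST0-RST7 (0xD0-0xD7).
STANDALONE = {0x01, *range(0xD0, 0xD8)}


def jpeg_end(data: bytes, start: int) -> Optional[int]:
    """Offset just past the EOI marker of a JPEG starting at `start`,
    or None if the marker structure breaks off before EOI."""
    if data[start:start + 2] != SOI:
        return None
    pos, n = start + 2, len(data)
    while pos + 1 < n:
        if data[pos] != 0xFF:
            return None                  # a marker was expected here
        marker = data[pos + 1]
        if marker == 0xFF:               # fill byte preceding a marker
            pos += 1
        elif marker == 0xD9:             # EOI: end of image
            return pos + 2
        elif marker in STANDALONE:
            pos += 2
        else:
            if pos + 4 > n:
                return None
            length = int.from_bytes(data[pos + 2:pos + 4], "big")
            if length < 2:
                return None
            pos += 2 + length            # skip the whole segment (incl. thumbnails)
            if marker == 0xDA:           # SOS: entropy-coded data follows
                # Scan to the next real marker; FF00 is a stuffed byte and
                # FFD0-FFD7 are restart markers inside the scan data.
                while pos + 1 < n and not (
                    data[pos] == 0xFF
                    and data[pos + 1] != 0x00
                    and not 0xD0 <= data[pos + 1] <= 0xD7
                ):
                    pos += 1
    return None


def carve_jpegs(data: bytes) -> list[bytes]:
    """Return all structurally complete, contiguous JPEG files found in data."""
    images = []
    pos = data.find(SOI + b"\xff")
    while pos != -1:
        end = jpeg_end(data, pos)
        if end is not None:
            images.append(data[pos:end])
            pos = data.find(SOI + b"\xff", end)
        else:
            pos = data.find(SOI + b"\xff", pos + 1)
    return images
```

A structurally complete file is not necessarily decodable; a carved file would still have to be opened in a viewer, and fragmented files require the kind of fragment matching that tools such as JPEG-puzzler address.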


The format of a video file is the container format of the file itself, for example AVI. The CODEC is a hardware or software algorithm that compresses or decompresses the digital video data, for example MPEG-4, MPEG-2 or MPEG-1. Many different CODECs exist, and it is sometimes difficult to find the right playback software or CODEC. For this reason, several parties have built databases of these players (e.g. HSDB and NFI). Video surveillance systems often use proprietary CODECs and formats, which may be derived from an existing format. Because video evidence may reside in deleted or damaged files, it is sometimes necessary to repair a video file in order to obtain a viewable file. Such repairs often rely on a hex editor and trial and error. The MPEG formats are specified in ISO standards, and documentation for the AVI, ASF, MOV and 3GP formats can be found on the internet. The NFI has developed open source software called Defraser (http://defraser.sourceforge.com), which helps the user to find, analyze and repair damaged video files. The NFI also has in-house Python scripts that may aid in the extraction of video from damaged 3GP files. On the repair of video streams, the Video File Recovery Experience Exchange (VideoFREE) meeting was organized by Rikkert Zoun at the NFI on 27-28 June 2007 as a joint event of the ENFSI-FITWG and ENFSI-DIWG groups.
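MP4 and 3GP files share the box structure of the ISO base media file format: each box starts with a 4-byte big-endian size and a 4-byte type, where a size of 1 means that a 64-bit size follows the type and a size of 0 means that the box runs to the end of the file. The sketch below lists the top-level boxes and reports where the structure stops being consistent, a typical first step before a manual repair. It is an illustration only, not the approach used in Defraser or in the NFI scripts.

```python
import struct


def list_boxes(data: bytes) -> tuple[list[tuple[str, int, int]], int]:
    """Return ([(type, offset, size), ...], stop_offset) for the top-level boxes.
    stop_offset == len(data) means the top-level structure is intact."""
    boxes = []
    offset, end = 0, len(data)
    while offset + 8 <= end:
        size, raw_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:                    # 64-bit "largesize" follows the type
            if offset + 16 > end:
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:                  # box extends to the end of the file
            size = end - offset
        if size < header or offset + size > end:
            break                        # truncated or corrupt box header
        boxes.append((raw_type.decode("latin-1"), offset, size))
        offset += size
    return boxes, offset
```

In a damaged recording, a stop offset inside the file typically points at the first box whose header was overwritten or truncated, which is where a repair attempt would start.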

Image authentication and tampering (Zeno Geradts)

Several methods for determining whether an image has been tampered with are described in the literature. Sturak (2004) gives an overview of methods, including the detection of discrepancies in lighting and brightness levels, edge inconsistencies and anomalies in the JPEG compression. A newer method, described by Popescu (2005), detects forgeries in images that were interpolated from a color filter array (CFA). When a single sensor is used in conjunction with a CFA, the interpolation is claimed to introduce specific correlations that are destroyed when the image is tampered with. A tamper-evident image can be produced with a digital camera by using low-level compression, computing a hash (MD5/SHA-1) of the file and hiding information in the image (Kim 2004). Methods for watermarking and steganography are well known, as are methods for detecting them with steganalysis (Fridrich 2005, Pevny 2006). Detecting a watermark in an image can also be important for finding the source of the image (Cheng 2004). A method has also been described for analyzing a database of images to determine whether an image has been processed; the authors claim that they were able to tell whether some part of an image had undergone processing (Avcibas 2004, Bayram 2006). A completely different method, for determining the time and date at which an audio or video recording was made, is based on variations in the electric network frequency (ENF). Grigoras (2007) claims that these variations can be specific to a certain time. The laboratory has also recorded the variations in the electricity network as a reference. Because the variations are captured in the recordings, the time and date of a recording can be retrieved by comparison with this reference.
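The comparison step of the ENF method can be sketched as a search for the position in the reference log where the frequency sequence extracted from the recording fits best. This minimal example assumes that both sequences have already been extracted at the same rate (for instance one frequency estimate per second around the nominal 50 Hz) and uses the root-mean-square difference as the matching criterion; the extraction itself, and the matching criteria used by Grigoras (2007), are not reproduced here. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np


def best_offset(evidence_enf, reference_enf) -> tuple[int, float]:
    """Slide the evidence ENF sequence along the reference log and return
    (offset, rms) for the position with the smallest RMS difference."""
    e = np.asarray(evidence_enf, dtype=float)
    r = np.asarray(reference_enf, dtype=float)
    n = len(e)
    if n == 0 or n > len(r):
        raise ValueError("evidence must be non-empty and no longer than the reference")
    # One row per candidate offset: a sliding window over the reference log.
    windows = np.lib.stride_tricks.sliding_window_view(r, n)
    rms = np.sqrt(np.mean((windows - e) ** 2, axis=1))
    k = int(np.argmin(rms))
    return k, float(rms[k])


# Hypothetical data: one hour of reference ENF (one value per second) and a
# two-minute recording taken from it, with some measurement noise added.
rng = np.random.default_rng(0)
reference = 50 + np.cumsum(rng.normal(0, 0.002, 3_600))
evidence = reference[1_234:1_354] + rng.normal(0, 0.001, 120)
print(best_offset(evidence, reference))
```

The offset, converted back to clock time via the start of the reference log, gives the estimated recording time; the size of the residual RMS indicates how convincing the match is.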

• Avcibas, I.; Bayram, S.; Memon, N.; Ramkumar, M.; Sankur, B. A classifier design for detecting image manipulations, 2004 International Conference on Image Processing (ICIP), Vol. 4, 2004, p. 2645-8

• Bayram, S.; Avcibas, I.; Sankur, B.; Memon, N. Image manipulation detection, Journal of Electronic Imaging (Oct-Dec 2006), vol. 15, no. 4, art. no. 041102

• Cheng, H. A review of video registration methods for watermark detection in digital cinema applications, 2004 IEEE International Symposium on Circuits and Systems, Vol. 5, 2004, p. V-704-7


• Fridrich, J.; Goljan, M.; Soukal, D.; Holotyak, T. Forensic steganalysis: determining the stego key in spatial domain steganography. In: Security, Steganography, and Watermarking of Multimedia Contents VII, Delp, E.J., Wong, P.W. Eds., San Jose, CA, 17-20 January 2005, SPIE Proceedings Series 5681, pp. 631-642, 2005.

• Grigoras, C. Applications of ENF criterion in forensic audio, video, computer and telecommunication analysis. Selected articles of the 4th European Academy of Forensic Science Conference (EAFS 2006), June 13-16, 2006, Helsinki, Finland, Forensic Science International 167(2-3): 136-145, 2007.

• Kim, J.; Byun, Y.; Choi, J. Image forensics technology for digital camera. In: Advances in Multimedia Information Processing - PCM 2004, Aizawa, K., Nakamura, Y., Satoh, S. Eds., 5th Pacific Rim Conference on Multimedia, Tokyo, 30 November - 3 December 2004. Lecture Notes in Computer Science 3331, Vol 3, pp. 331-339, 2004.

• Pevny, T.; Fridrich, J. Determining the stego algorithm for JPEG images, IEE Proceedings - Information Security (11 Sept. 2006), vol.153, no.3, p. 77-86

• Popescu, A.C.; Farid, H. Exposing digital forgeries in color filter array interpolated images, IEEE Transactions on Signal Processing (Oct. 2005), vol.53, no.10, pt.2, p. 3948-59.

• Sturak, J.R. Forensic Analysis of Digital Image Tampering. Master's thesis, Air Force Institute of Technology, Wright-Patterson AFB, OH, School of Engineering and Management, Mar-Dec 2004. Report ADA430512/XAB; AFIT/GIA/ENG/04-01

Camera identification (Zeno Geradts)

Camera identification can be useful for determining whether a certain image has been produced with a certain camera (e.g. in child pornography cases). It may also be useful for determining whether the same camera was used to produce two different images (e.g. to link multiple terrorist kidnappings). It can also be used in image authenticity investigations, since pixel artifacts change position when parts of images are copied and pasted. Several groups are investigating methods for camera identification, and some report that a camera can be identified from its sensor noise (Lukas 2006). These techniques exploit the non-uniform responsivity of the individual detectors within a CCD array, which effectively creates a “fingerprint” of the capture device. Other articles use artifacts caused by lenses (Choi 2006) or by dust on the sensor (Dirik 2007). For all of these methods it remains important to validate that the artifacts are sufficiently random to distinguish individual cameras. Methods have also been described to differentiate between brands of CCD/CMOS cameras (Mehdi 2004). Different techniques exist for classifying cameras: Bayram (2006) uses the CFA interpolation, Long (2006) uses demosaicking, and Choi (2006) uses statistics from JPEG compression. Other information, such as file headers and footers, can also be used to link an image to a camera. Several laboratories are currently building databases of cameras. In casework it is often necessary to use cameras of the same brand and model as a reference for comparison (Geradts 2005). However, for cases in which no cameras are available, methods for blind camera classification have been described (Mehdi 2004, Swaminathan 2006, Tsai 2006).
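The principle behind sensor-noise identification (Lukas 2006) can be sketched in a few lines: a noise residual is obtained by subtracting a denoised version of each image, the residuals of several images from one camera are averaged into a reference pattern, and a questioned image is attributed by correlating its residual with each camera's pattern. This is a simplified sketch, assuming greyscale images of equal size; the 3×3 mean filter is a crude stand-in for the wavelet-based denoising filter used in the literature, and no decision threshold is proposed, since thresholds have to be validated on reference cameras.

```python
import numpy as np


def denoise(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge padding (a crude stand-in for wavelet denoising)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def residual(img) -> np.ndarray:
    """Noise residual: the image minus its denoised version."""
    img = np.asarray(img, dtype=float)
    return img - denoise(img)


def fingerprint(images) -> np.ndarray:
    """Reference pattern of a camera: the average residual of several images."""
    return np.mean([residual(im) for im in images], axis=0)


def correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation between two residuals or patterns."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
```

In use, `correlation(residual(questioned), fingerprint(reference_images))` is computed for each candidate camera; flat, evenly lit reference images give the cleanest patterns because little scene content leaks into the residuals.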

• Bayram, S.; Sencar, H.; Memon, N.; Avcibas, I. Source camera identification based on CFA interpolation, 2005 International Conference on Image Processing, 2006, p. III-69-72

• Choi, K.S.; Lam, E.Y.; Wong, K.K.Y. Automatic source camera identification using the intrinsic lens radial distortion, Optics Express (27 Nov 2006), Vol. 14, No. 24, pp. 11551-11565


• Kai San Choi; Lam, E.Y.; Wong, K.K.Y. Source camera identification by JPEG compression statistics for image forensics, Optics Express, 27 Nov 2006, Vol. 14, No. 24, pp. 11551-11565

• Dirik, E.; Sencar, H.T.; Memon, N. Source Camera Identification based on sensor dust characteristics, http://isis.poly.edu/~forensics/pubs/safe2007.pdf

• Geradts, Z.J.M.H.; Vrijdag, D.; Alberink, I. et al. Questions about integrity and authenticity of digital images: a review of case reports from the Netherlands Forensic Institute, Proceedings AAFS 2005 meeting, New Orleans, C19, http://www.aafs.org/default.asp?section_id=resources&page_id=proceedings

• Yangjing Long; Yizhen Huang, Image based source camera identification using demosaicking, 2006 IEEE Workshop on Multimedia Signal Processing (IEEE Cat. No.06EX1454), 2006, 6 pp. (CD-ROM)

• Lukas, J.; Fridrich, J.; Goljan, M. Digital camera identification from sensor pattern noise, IEEE Transactions on Information Forensics and Security (June 2006), vol.1, no.2, p. 205-14.

• Mehdi, K.L.; Sencar, H.T.; Memon, N. Blind source camera identification, 2004 International Conference on Image Processing (ICIP) (IEEE Cat. No.04CH37580), Vol. 1, 2004, p. 709-12

• Swaminathan, A.; Wu, M.; Liu, K.J.R. Component forensics of digital cameras: a non-intrusive approach, 40th Annual Conference on Information Sciences and Systems, 2006

• Min-Jen Tsai; Guan-Hui, Using image features to identify camera sources, 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2006, p. II-297-300

• Kai San Choi; Lam, E.Y.; Wong, K.K.Y. Source camera identification by JPEG compression statistics for forensic images, TENCON 2006 - 2006 IEEE Region 10 Conference, Hong Kong, 14-17 Nov. 2006.

• Fridrich, J.; Lukáš, J.; Goljan, M. Determining Digital Image Origin Using Sensor Imperfections, Proc. SPIE Electronic Imaging, San Jose, CA, January 16-20, pp. 249-260, 2005.

• Fridrich, J.; Lukáš, J. Digital Bullet Scratches for Images, Proc. ICIP 2005, Genova, Italy, Sep. 11-14, 2005.

• Fridrich, J.; Lukáš, J.; Goljan, M. Digital Camera Identification from Sensor Noise, IEEE Transactions on Information Forensics and Security, vol. 1(2), pp. 205-214, June 2006.

• Fridrich, J.; Lukáš, J.; Goljan, M. Detecting Digital Image Forgeries Using Sensor Pattern Noise, Proc. of SPIE Electronic Imaging, Photonics West, January 2006.

• Fridrich, J.; Chen, M.; Goljan, M. Digital Imaging Sensor Identification (Further Study), Proc. of SPIE Electronic Imaging, Photonics West, January 2007, pp. 0P-0Q.

• Fridrich, J.; Chen, M.; Goljan, M. Source Digital Camcorder Identification Using Sensor Photo-Response Non-Uniformity, Proc. of SPIE Electronic Imaging, Photonics West, January 2007, pp. 1G-1H.

Photogrammetry, Crime scene recording and 3d-modeling

Keywords: wide view and close up photography, photogrammetry, laser scanning, panoramic views, 3d-modeling of crime scenes, 3d-models as tools for interpretation of questioned video recordings, shooting incidents, car accidents or explosions, scenario testing, visualization of wound patterns and channels, virtual stringing of bloodstain patterns, 3d from video

Photogrammetry (Jurrien Bijhold)

In the analysis of video, photogrammetry is often used to estimate the body height of perpetrators. Two problems have to be solved for this purpose. The first is to obtain a scaling factor that relates a distance in the image, measured in pixels, to a physical distance in meters. The second is the uncertainty introduced by the pose of the perpetrator, the shoes, the hair, and the presence of a hat.

For dealing with the first problem, a number of methods were published in the period of the last review (2001-2004). These methods fall into two classes, and all of them require an investigation of the scene visible in the images in order to obtain reference data. One class is based on camera mapping and requires the measurement of coordinates of points in the scene that can be identified unambiguously in the images. These measurements can be obtained using tape measures, a set of overlapping wide-angle photographs or a laser range scanner. The other class is based on projective geometry and requires at least one distance measurement for calibration, together with the analysis of straight lines in the scene that can be identified unambiguously with lines in the image. All methods require the user to indicate points or lines in the images on a monitor screen. In practice, it often happens that one or both classes of methods cannot be applied because identifiable points or lines are lacking. New methods are being developed by Rudin et al. (2005, 2006) that focus on automated detection of lines and points in the images and on the use of smart markers in reference video recordings.
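The projective-geometry class of methods can be illustrated with the height relation from single-view metrology (Criminisi, Reid and Zisserman), which needs one reference height, the vertical vanishing point and the vanishing line (horizon) of the ground plane. The sketch below is a minimal illustration under those assumptions, not the implementation used by any of the groups cited here; the function names and input conventions are hypothetical, and it ignores lens distortion and the uncertainty in clicked image points.

```python
import numpy as np


def _homog(point):
    """Lift an image point (x, y) in pixels to homogeneous coordinates (x, y, 1)."""
    return np.array([point[0], point[1], 1.0])


def _height_measure(bottom, top, vertical_vp, horizon):
    """Quantity proportional to the 3D height of a vertical segment on the ground plane.

    Uses alpha * Z = -||b x t|| / ((l . b) * ||v x t||); the sign and the unknown
    scale alpha are the same for every segment and cancel when two are compared.
    """
    b, t = _homog(bottom), _homog(top)
    return np.linalg.norm(np.cross(b, t)) / (
        np.dot(horizon, b) * np.linalg.norm(np.cross(vertical_vp, t))
    )


def estimate_height(ref_bottom, ref_top, ref_height_m, bottom, top, vertical_vp, horizon):
    """Estimate a height in meters from one reference segment of known height.

    ref_bottom/ref_top and bottom/top: (x, y) pixel positions of foot and head of
    vertical objects standing on the ground plane. vertical_vp: vertical vanishing
    point; horizon: vanishing line of the ground plane; both homogeneous 3-vectors
    (hypothetical inputs an examiner would derive from straight lines in the scene).
    """
    v = np.asarray(vertical_vp, dtype=float)
    l = np.asarray(horizon, dtype=float)
    ratio = _height_measure(bottom, top, v, l) / _height_measure(ref_bottom, ref_top, v, l)
    return ratio * ref_height_m
```

For example, with a reference pole of 2.0 m and the foot and head of a person marked in the same frame, `estimate_height` returns the person's height including shoes and hair. Because the relation assumes a pinhole camera, lens distortion would have to be removed first.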
For dealing with the second problem, the uncertainties about pose, shoes, hair and caps, practical solutions are chosen, such as using only images in which the perpetrator is standing upright and reporting a body height estimate that includes shoes, hair and caps. Some investigators superimpose computer models of human beings on the images; the pose of these models can be adjusted until they resemble the perpetrator in the images. One such approach has been published by Lynnerup et al. (2005). The reliability of all these methods has been discussed in a number of working group meetings (ENFSI DIWG) and conference workshops (AAFS 2004, 2005, 2006 and 2007; EAFS 2004). The Netherlands Forensic Institute has studied most of the existing methods and the errors that can be made in estimating body height from video; the results are expected to be published at the end of 2007 and in 2008. The most important conclusion is that, in every case, reference video recordings of people of known height are needed to estimate the systematic and random errors.
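The conclusion about reference recordings can be made concrete: the mean difference between the heights measured in the reference video and the known heights estimates the systematic error, and the spread of those differences estimates the random error. The sketch below assumes roughly normal, independent errors and uses a normal quantile `k` for a rough interval (a t-quantile would be more appropriate for few reference persons); the function name is hypothetical and this is not the NFI procedure.

```python
import math
import statistics


def calibrate_height_estimate(reference_estimates, reference_true, questioned_estimate, k=1.96):
    """Correct a questioned height estimate using reference recordings.

    reference_estimates: heights (m) measured in reference video of test persons,
    recorded with the same camera in the same situation.
    reference_true: the known heights (m) of the same persons.
    Returns (corrected estimate, systematic error, random error, (low, high)).
    """
    if len(reference_estimates) != len(reference_true) or len(reference_true) < 2:
        raise ValueError("need at least two paired reference measurements")
    errors = [est - true for est, true in zip(reference_estimates, reference_true)]
    systematic = statistics.mean(errors)   # bias of the whole measurement chain
    random_sd = statistics.stdev(errors)   # sample standard deviation of the errors
    corrected = questioned_estimate - systematic
    # widen by sqrt(1 + 1/n) to account for the bias itself being estimated
    half_width = k * random_sd * math.sqrt(1 + 1 / len(errors))
    return corrected, systematic, random_sd, (corrected - half_width, corrected + half_width)
```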

• Lynnerup, N.; Vedel, J. Person identification by gait analysis and photogrammetry, Journal of Forensic Sciences (Jan 2005), Vol. 50, No. 1, pp. 112-118.

• Larsen, P.K., Hansen, L., Simonsen, E.B., Lynnerup, N. Variability of bodily measures of normally dressed people using PhotoModeler Pro, Proceedings of SPIE - The International Society for Optical Engineering 6491, art. no. 64910Z

• Rudin, Lenny; Yu, Ping; Papadopoulo, Theo; Said, Amir; Apostopoulos, John G. Geometrical methods for accurate forensic videogrammetry, Part I: Measuring with non-point features, Image and Video Communications and Processing Conference, San Jose, CA (United States), 18 Jan 2005, SPIE proceedings series (2005), 5685(p.1), 261-271.

• Rudin, L.; Monasse, P.; Yu, P.; Said, Amir; Apostopoulos, John G. Geometrical methods for accurate forensic videogrammetry, Part II: Reducing complexity of Cartesian scene measurements via epipolar registration, Image and Video Communications and Processing Conference, San Jose, CA (United States), 18 Jan 2005, SPIE proceedings series (2005), 5685(p.1), 272-283.

• Rudin, L.; Monasse, P.; Ping Yu, Epipolar photogrammetry: a novel method for forensic image comparison and measurement, 2005 International Conference on Image Processing, 2006, p. III-385-8 of CD-ROM


Modeling and laser scanning of crime scenes (Jurrien Bijhold)

For a detailed description of the possibilities of computer modeling of crime scenes and the technology needed to build such models, we refer to the review of 2001-2004. So far, the impact of these possibilities and technologies on the organization and presentation of forensic investigations has been modest. In the period of this review, many institutes have started experimenting with laser scanners in forensic investigations. Two reasons for using a laser scanner at a crime scene are often mentioned: (1) a 3d registration of the crime scene, to be used in the investigation only when other options have been ruled out, and (2) the immediate availability of panoramic images for briefing purposes. The cost of turning laser scan data into 3d-models that are easy to use is still prohibitive for widespread use of laser scans in forensic investigations. New developments include hand-held laser scanners, instant modeling from video, and 360-degree video. No forensic papers on the use of 3d-modeling of crime scenes were found for the period of this review.

Some institutes have reported the use of 3d-models of buildings and cities for security analysis and for the preparation of police operations such as the protection of political events. These models are usually made from aerial photographs. New sources of information, for these purposes and for 3d-modeling, come from companies that routinely make panoramic images at a number of points along every street in a given area.

At the Netherlands Forensic Institute, a project was started in 2006 that aims to analyze all available video recordings covering the period just before, during and after a major incident, using a 3d model of the area as a tool for the analysis and for the presentation of all results.

• Buck, U., Naether, S., Braun, M., Bolliger, S., Friederich, H., Jackowski, C., Aghayev, E., Thali, M.J. Application of 3D documentation and geometric reconstruction methods in traffic accident analysis: With high resolution surface scanning, radiological MSCT/MRI scanning and real data based animation, 2007, Forensic Science International 170 (1), pp. 20-28

Facial Image identification and Earprints

Facial identification (Arnout Ruifrok)

Keywords: Facial recognition, facial comparison, facial reconstruction, face biometrics

Within the context of person identification, different processes can be defined. Recall is defined here as the process of retrieving descriptive information about a person from long-term memory in the absence of the person, his/her photograph or other image. Recall requires observation, retention and reproduction of a person's features. Recall is essential for the production of composite images, as produced by a police artist for investigative purposes. However, these images can only be used as investigative tools and can never be used as proof of identity. Recognition can be defined as the process of identifying or matching a person, his/her photograph or image with a mental image that one has previously stored in long-term memory. Recognition requires observation and retention of a person's features, and comparison of the retained information with an external image, whether it be the live person, a photograph or a composite image. Recognition is important for investigations as well as for witness statements. Comparison is defined as the process of identifying or matching a person or his/her image with another photograph or image without the use of retained information. For comparison, retention and reproduction do not play a role. Reconstruction is the related process of recreating the face of an unidentified individual from skeletal remains through an amalgamation of artistry, forensic science, anthropology, and anatomy. It is by far the most subjective technique in forensic anthropology, and one of the most controversial. However, new methods making use of 3D modeling techniques and face shape statistics are being developed.

In summary, recall requires observation, retention, and reproduction of facial information, and is important for artists' composites, which are discussed later together with reconstruction, in which the composite is made from skeletal remains. Recognition requires observation and retention, and is important for police lineups or identity parades, which are not discussed in this chapter. Comparison requires observation of two or more images, and involves neither retention nor reproduction. Hence, for forensic work with surveillance images, we speak of recognition or comparison when human operators are involved, and of comparison when (automated) biometrics is involved.

Facial image comparison (Arnout Ruifrok)

Key words: familiar faces, unfamiliar faces, anthropometrics, recall, recognition, comparison, reconstruction, working practices

People doing facial image comparison are found in four different professions: forensic photographers, forensic anthropologists, video investigators and imaging scientists. Knowledge of the anatomy and physiology of the face is needed to interpret differences and similarities in facial features correctly. Similarities or differences between images can often be explained by differences in the imaging conditions, which points to the importance of knowledge about optics. Small facial details can be distorted, and artifacts resembling small details can be introduced by noise, pixel sampling and compression, so knowledge of image processing is required to interpret observations properly. Changes in image quality, pose and position, lighting and facial expression greatly influence the comparison process. Therefore, it is strongly recommended to acquire reference images of the suspect and of a number of other people with the same video camera, in the same situation and under similar lighting conditions. While guidelines and procedures have been developed for the forensic comparison of facial photographs from surveillance video, it has also become apparent that these identification methods must be used with extreme care. In general, people are thought to be reasonably good at recognizing faces. However, this is largely based on the recognition of ‘familiar’ faces, which can be highly effective but does not carry over to ‘unfamiliar’ faces (Bruce 2001, Hancock 2000, Kemp 1997, O’Toole 2007a). Humans are poor at estimating (relative) size (Vanezis 1996). But even when anthropometric measurements can be applied to high-quality image material, the measurements fail to identify the targets accurately (Ruifrok 2005, Goos 2006, Kleinberg 2007).
Similarly, judgments of (differences in) the shape of facial parts are not made consistently by different people (Kemp 1997, Henderson 2001), and judgments of similarity between image pairs are not consistent either (Scheuchenpflug 1999). Finally, the use of different imaging media (video or picture versus live) also has a negative influence on the comparison process (Bruce 2001). The statistical significance of individual facial features for personal identification has yet to be determined. Where data are available, they cover limited populations, so their applicability to other populations is unknown. Therefore, determining the identification value of specific features is still a subjective process, although features such as scars and moles can be classified as ‘strongly identifying’ (Bromby 2003, Bowie 2004). The evidential value of surveillance material is severely limited by the poor quality of most surveillance material. Quality metrics for visual material and for the comparison process are still lacking, and important steps in the process are still subjective. The issues ‘quantification of image quality’, ‘human as a measuring instrument’ and ‘individualizing value of features’ are difficult problems because of the large number of factors involved, the difficulty of measuring how well humans function as measuring instruments, and the scope of the investigation needed to test individualizing features. Most psychological research concerns the recognition of people; as far as we know, research on facial comparison is limited. The few papers published on this topic, however, show that error rates may be 10-20%, depending on the likeness of the images, and may even exceed 50% if look-alike photo pairs are used (Kemp 1997, O’Toole 2007a).
These reports also show that some of the latest biometric recognition algorithms surpass humans at matching faces across changes in illumination, especially techniques that use 3D information about the face (Phillips 2007), and that fusing automated algorithms with human judgment may result in near-perfect classification accuracy (O’Toole 2007b). Such reports, however, focus solely on the ability of persons with little or no training in the forensic process to perform such comparisons, and offer no insight into the ability of trained forensic scientists to perform these comparisons according to detailed comparison procedures.


While guidelines and procedures are being developed for the forensic comparison of facial photographs from surveillance video, it has also become apparent that these identification methods must be used with extreme care. New developments can be expected from facial 3d-scans and from 3d-data extraction from video.

• Bowie, L., Plews, S., Bromby, M. When evidence is a question of image. Law Society Gazette 13 May 2004

• Bromby, M. At face value? The use of facial mapping and CCTV image analysis for identification. NLJ Expert Witness Suppl. 153, 302-304, 2003.

• Bruce,V., Henderson, Z., Burton, A. M. Matching identities of familiar and unfamiliar faces caught on CCTV images. J. Exp. Psychol. Appl. 7, 207-218, 2001.

• Hancock, P.J.B., Bruce, V., Burton, A.M. Recognition of unfamiliar faces. Trends in Cognitive Sciences 4, 330-337, 2000.

• Henderson, Z, Bruce, V., Burton, A.M. Matching faces of robbers captured on video. Appl. Cogn. Psychol. 15, 445-464, 2001.

• Goos, M., Alberink, I.B., Ruifrok, A.C.C. 2D/3D image (facial) comparison using camera matching. Forensic Science International, 136: 10-17, 2006

• Kemp, R., Towell, N., Pike, G. When seeing should not be believing: Photographs, Credit Cards and Fraud. Appl. Cogn Psychol. 11, 211-222, 1997.

• Kleinberg, K.F., Vanezis, P., Burton, A.M. Failure of anthropometry as a facial identification technique using high-quality photographs. J. Forensic Sci, 52: 779-783, 2007.

• O’Toole, A.J., Phillips, P.J., Jiang, F., Ayyad, J., Penard, N., Abdi, H. Face recognition algorithms surpass humans matching faces over changes in illumination. IEEE: Transactions on Pattern Analysis and Machine Intelligence, in press 2007a. http://bbs.utdallas.edu/facelab/docs/pami_2007.pdf.

• O’Toole, A.J., Abdi, H., Jiang, F. & Phillips, P.J. Fusing face recognition algorithms and humans. IEEE: Transactions on Systems, Man & Cybernetics. in press, 2007b. http://bbs.utdallas.edu/facelab/docs/IEEE_SMC-2007.pdf.

• Phillips, P.J., Scruggs, W.T., O’Toole, A.J., Flynn, P.J., Bowyer, K.W., Schott, C.L., Sharpe, M. FRVT 2006 and ICE 2006 Large-Scale Results. http://www.frvt.org/FRVT2006/docs/FRVT2006andICE2006LargeScaleReport.pdf.

• Rösing, F.W. Quality standards for forensic opinions on the identity of living offenders in pictures. 9th Biennial Scientific Meeting of the International Association for Craniofacial Identification, Washington DC, July 24-28, 2000

• Rösing, F.W. Identification von Personen auf Bildern. Worte für Münchener Anwaltshandbuch Strafverteidigung. Hrg G. Widmaier, CH Beck-Verlag, München 2002.

• Rösing, F.W. Gutachten zur Identität nach bildern. Rapportage 18 november 2003.

• Rösing, F.W. Liste von Merkmalen für die Bild-Identification. Personal communication 2004

• Ruifrok, A., Scheenstra, A., Bijhold, J., Veldkamp, J. “Facial image comparison using 3D techniques”. In: Facial Reconstruction, Buzug, T.M., Sigl, K.-M., Bingartz, J., Prufer, K. Eds, Luchterhand Publishers, pp 192-198.


• Scheuchenpflug, R. Predicting face similarity judgements with a computational model of face space. Acta Psychol. 100,229-242, 1999.

• Vanezis, P., et al. Morphological classification of facial features in adult Caucasian males based on an assessment of photographs of 50 subjects. J. Forensic Sci., 41, 786-791, 1996.

Biometric facial recognition (Arnout Ruifrok)

Key words: 2D recognition, 3D recognition, wanted lists

Biometrics is regularly announced in news items as a panacea against terrorism, security problems, fraud, illegal migration, etcetera. Biometrics, which can be defined as the (automatic) identification or recognition of people based on physiological or behavioral characteristics, is not a single method or technique; it comprises a number of techniques, each with its own advantages and drawbacks. None of the available biometric modalities combines all the properties of an ideal biometric system, and we have to acknowledge that biometrics can never be 100% accurate. However, if requirements and applications are carefully considered, biometric systems can make an important contribution to investigation, authentication and safety.

At the top of the list of preferred biometric modalities, and required in most travel documents, is the face. The face has always been the most important personal feature on travel documents. The most important change in the last decade is that the face is now also stored digitally in passports, in a form optimized for automatic facial recognition. However, even with ISO/IEC 19794-5 compliant images, automated facial recognition is far from perfect. Even the best systems still show a verification Equal Error Rate (EER) of about 5%, and a False Reject Rate (FRR) of around 10% at a False Accept Rate (FAR) of 1%, when ISO/IEC 19794-5 compliant images are used (Phillips 2003a,b, Phillips 2007). The automated systems are still very sensitive to ageing of the person depicted: the FRR may increase to around 20% at an FAR of 1% if the picture is more than 3 years old. The latest test results indicate that higher resolution and well-controlled images may result in a 10-fold better performance (Phillips 2007).
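The verification error rates quoted above follow from two score distributions: scores of genuine comparisons (same person) and of impostor comparisons (different people). The sketch below, with hypothetical function names, shows how FAR, FRR, an approximate EER and the FRR at a fixed FAR of 1% can be computed from such scores, assuming that a higher score means a better match. It is a generic illustration, not the FRVT evaluation software.

```python
import numpy as np


def far_frr(genuine, impostor, threshold):
    """Error rates at one decision threshold (accept when score >= threshold).

    FAR: fraction of impostor comparisons that are accepted.
    FRR: fraction of genuine comparisons that are rejected.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    return np.mean(impostor >= threshold), np.mean(genuine < threshold)


def frr_at_far(genuine, impostor, target_far=0.01):
    """Lowest FRR over candidate thresholds whose FAR does not exceed target_far."""
    thresholds = np.unique(np.concatenate([genuine, impostor, [np.inf]]))
    best = 1.0
    for t in thresholds:
        far, frr = far_frr(genuine, impostor, t)
        if far <= target_far:
            best = min(best, frr)
    return best


def equal_error_rate(genuine, impostor):
    """Approximate EER: average of FAR and FRR at the threshold where they are closest."""
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    rates = [far_frr(genuine, impostor, t) for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (far + frr) / 2
```

With real evaluation data the score lists contain many thousands of comparisons; with only a handful of scores, as in a quick test, the rates move in coarse steps.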

The performance of automated systems decreases significantly if pose and position, lighting and facial expression change. Therefore, there is widespread concern in the forensic community about public expectations of face recognition. The quantities of questioned images from surveillance systems, webcams, phone cameras and biometric systems are growing, while the quality of the images is getting worse. If, for example, the face of a wanted criminal or terrorist is on a list of 1000 people, the best systems find around 60% of the suspects at a setting at which 1% of people are wrongly suspected. If the size of the list increases, the rate of correct hits decreases by about 2-3 percentage points for each doubling of the wanted list (Phillips 2003a). It should be noted, however, that these results come from experiments using standardized images and cooperating people.
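Wanted-list (watch list) performance is an open-set measure: a listed person counts as found only if an alarm is raised and the top-ranked list entry is the right identity, while any alarm for a person who is not on the list is a false alarm. The sketch below computes both rates at one alarm threshold; the function name and data layout are hypothetical, and it is not the FRVT 2002 scoring software.

```python
import numpy as np


def watchlist_rates(scores, probe_ids, gallery_ids, threshold):
    """Open-set (watch list) evaluation at one alarm threshold.

    scores[i, j]: similarity of probe image i to watch-list entry j (higher = more similar).
    probe_ids[i]: identity of probe i, or None if that person is not on the list.
    gallery_ids[j]: identity of watch-list entry j.
    Returns (detection-and-identification rate, false alarm rate).
    Needs at least one listed and one unlisted probe.
    """
    scores = np.asarray(scores, dtype=float)
    best = scores.argmax(axis=1)              # top-ranked watch-list entry per probe
    alarm = scores.max(axis=1) >= threshold   # whether an alarm is raised for the probe
    on_list = np.array([p is not None for p in probe_ids])
    correct = np.array([gallery_ids[b] == p for b, p in zip(best, probe_ids)])
    detect_rate = np.mean(alarm[on_list] & correct[on_list])
    false_alarm_rate = np.mean(alarm[~on_list])
    return detect_rate, false_alarm_rate
```

In practice the threshold would be set so that the false alarm rate on a representative set of unlisted people is about 1%. Even then, if 10,000 unlisted people pass the camera, about 100 of them would be flagged.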

Fortunately, current developments in automated recognition systems using 3-dimensional images or models show improved algorithms (Blanz 2003, Kim 2003, Paquet 2005), to the point that the algorithms surpass human performance (O’Toole 2007a). Fusion of automated algorithms with human judgment may even result in near-perfect classification accuracy (O’Toole 2007b).
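Fusion of algorithms and human judgment can be sketched at the score level: the scores of each source are first mapped to a common scale and then combined. The example below uses z-score normalization followed by averaging; this is a generic illustrative technique with hypothetical names, not the fusion procedure of O’Toole et al. (2007b).

```python
import numpy as np


def zscore_fusion(score_sets, reference_sets):
    """Fuse similarity scores from several sources (algorithms or human raters).

    score_sets[k]: scores from source k for the same list of image pairs.
    reference_sets[k]: scores from source k on a separate calibration set, used to
    bring every source to a common scale (mean 0, standard deviation 1).
    Returns the average normalized score per image pair.
    """
    normalized = []
    for scores, ref in zip(score_sets, reference_sets):
        ref = np.asarray(ref, dtype=float)
        # assumes the calibration scores are not all identical (std > 0)
        normalized.append((np.asarray(scores, dtype=float) - ref.mean()) / ref.std())
    return np.mean(normalized, axis=0)
```

For instance, algorithm scores on a 0-1 scale and human ratings on a 1-5 scale become comparable after normalization, so neither source dominates the average merely because of its numeric range.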

• Blanz, V., Vetter, T. Face recognition based on fitting a 3D morphable model. IEEE Trans. Pattern Analysis and Machine Intell. 25, 1-11, 2003

• Kim, Y.-O., Paik, J., Jingu Heo, Koschan, A., Abidi, B., Abidi, M. Automatic face region tracking for highly accurate face recognition in unconstrained environments. Proceedings IEEE Conference on Advanced Video and Signal Based Surveillance. AVSS 2003, 2003, p. 29-36.

• O’Toole, A.J., Phillips, P.J., Jiang, F., Ayyad, J., Penard, N., Abdi, H. Face recognition algorithms surpass humans matching faces over changes in illumination. IEEE: Transactions on Pattern Analysis and Machine Intelligence, in press 2007a. http://bbs.utdallas.edu/facelab/docs/pami_2007.pdf

• O’Toole, A.J., Abdi, H., Jiang, F. & Phillips, P.J. Fusing face recognition algorithms and humans. IEEE: Transactions on Systems, Man & Cybernetics. in press, 2007b. http://bbs.utdallas.edu/facelab/docs/IEEE_SMC-2007.pdf


• Paquet, E.; Rioux, M. Robust recognition of 3-D faces based on analytic forms and spectral analysis Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis (IEEE Cat. No. 05EX1094), 2005, p. 173-8

• Phillips, J.P., Grother,P., Micheals, R.J., Blackburn, D.M., Tabassi, E., Bone, M. Face recognition vendor test 2002. Evaluation report. 2003a. http://www.frvt.org/DLs/FRVT_2002_Evaluation_Report.pdf

• Phillips, P.J., Grother, P., Micheals, R.J., Blackburn, D.M., Tabassi, E., Bone, M. Face recognition vendor test 2002. Overview and summary. 2003b. http://www.frvt.org/DLs/FRVT_2002_Overview_and_Summary.pdf.

• Phillips, P.J., Scruggs, W.T., O’Toole, A.J., Flynn, P.J., Bowyer, K.W., Schott, C.L., Sharpe, M. FRVT 2006 and ICE 2006 Large-Scale Results. 2007. http://www.frvt.org/FRVT2006/docs/FRVT2006andICE2006LargeScaleReport.pdf

Facial reconstruction (Arnout Ruifrok)

Key words: facial reconstruction, ageing, composition photography

Composite drawing or photography is done by (police) artists to generate face images from witnesses’ memory. Face reconstruction from skulls or dead bodies is done by forensic anthropologists/artists in order to find people who might be able to help identify the remains of a person. Both techniques concern the construction of an image or sculpture representing a person without that person being present. This requires a combination of artistry, forensic science, anthropology, and anatomy. Statistical data on facial features and their change over a lifetime are important for these techniques, as they are for image comparison. Currently, computer-aided techniques are being developed to perform facial reconstruction objectively (Clement 2005). These techniques are applicable to the making of computer composites (Blanz 2006), ageing (Hill 2004, 2005, Scandrett 2006) and facial reconstruction based on skeletal and soft tissue measurement data (Claes 2006).
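Hill et al. and Scandrett et al. describe statistically rigorous ageing with face models. As a much cruder, deliberately simplified stand-in, the idea of learning age-related change from population data can be sketched as a linear regression of aligned landmark coordinates on age, which yields one population-wide "ageing direction" along which a face is moved. The function names and the assumption that landmarks are already aligned are hypothetical; this is not the published method.

```python
import numpy as np


def fit_ageing_direction(shapes, ages):
    """Least-squares per-year change of every landmark coordinate.

    shapes: (n_faces, n_coords) array of aligned landmark coordinates.
    ages: (n_faces,) ages in years.
    Returns the slope of a linear regression of each coordinate on age.
    """
    X = np.asarray(shapes, dtype=float)
    a = np.asarray(ages, dtype=float)
    a_c = a - a.mean()
    return a_c @ (X - X.mean(axis=0)) / (a_c @ a_c)


def age_shape(shape, current_age, target_age, direction):
    """Move one face along the population ageing direction (a crude linear model)."""
    return np.asarray(shape, dtype=float) + (target_age - current_age) * direction
```

A single linear direction ignores person-specific ageing and the non-linear nature of growth and ageing, which is why the cited work uses richer statistical models.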

• Clement, J.G., Marks, M.K. Computer Graphic Facial Reconstruction. Clement, J.G., Marks, M.K. eds Elsevier Academic Press, Amsterdam, 2005.

• Blanz, V., Albrecht, I., Haber, J., Seidel, H.-P. Creating face models from vague mental images. Computer Graphics Forum (2006), vol.25, no.3, p. 645-54.

• Scandrett, C.M.; Solomon, C.J.; Gibson, S.J. Towards a semi-automatic method for the statistically rigorous ageing of the human face. IEE Proceedings - Vision, Image and Signal Processing (12 Oct. 2006), vol.153, no.5, p. 639-49.

• Claes, P.; Vandermeulen, D.; De Greef, S.; Willems, G.; Suetens, P. Statistically deformable face models for cranio-facial reconstruction. Journal of Computing and Information Technology - CIT (March 2006), vol.14, no.1, p. 21-30.

• Hill, C.M.; Solomon, C.J.; Gibson, S.J. Plausible aging of the human face using a statistical model. Seventh IASTED International Conference on Computer Graphics and Imaging, 2004, p. 56-60

• Hill, C.M.; Solomon, C.J.; Gibson, S.J. Aging the human face - a statistically rigorous approach. IEE International Symposium on Imaging for Crime Detection and Prevention (ICDP 2005), 2005, p. 89-94

Earprints (Ivo Alberink)

Keywords: Earprint comparison, biometrics.


Within the context of earprint identification (which is fundamentally different from ear identification on the basis of photographs), two approaches are in use. On the one hand, in anthropological comparison, identification is based on defining anthropological features decided upon by experts. This expertise is not tested by means of proficiency testing or the like, and has regularly been called into question, e.g. in the Regina vs. Dallagher case (UK). On the other hand, earprints can be compared (semi-)automatically using signal and image analysis. In the EU project FearID (2002-2005), an experimental database of earprints from over 1200 donors was gathered, and research was undertaken in both areas. Results on anthropological comparison can be found in Meijerman (2006) (thesis); results on computerized comparison can be found in Alberink and Ruifrok (2007).
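In a database setting, the performance of computerized earprint comparison is often summarized by how highly the true donor's print is ranked when a questioned mark is searched against a reference collection. The sketch below computes such a cumulative match characteristic from a similarity matrix; the data layout and function name are hypothetical, it does not reproduce the FearID comparison algorithm or its reported figures, and ties are resolved in favor of the true mate.

```python
import numpy as np


def cumulative_match(similarity, true_mate, max_rank=5):
    """Cumulative match characteristic for a mark-to-print database search.

    similarity[i, j]: score of questioned mark i against reference print j.
    true_mate[i]: column index of the print from the same donor as mark i.
    Returns a list whose entry r-1 is the fraction of marks whose true mate
    is ranked within the top r.
    """
    s = np.asarray(similarity, dtype=float)
    mate_scores = s[np.arange(len(true_mate)), true_mate]
    # rank = 1 + number of prints scoring strictly higher than the true mate
    ranks = 1 + (s > mate_scores[:, None]).sum(axis=1)
    return [float(np.mean(ranks <= r)) for r in range(1, max_rank + 1)]
```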

• Alberink I.; Ruifrok A., Performance of the FearID earprint identification system, Forensic Science International (2007), 166(2-3), 145-154.

• Alberink I.; Ruifrok A.; Kieckhoefer, H., Interoperator test for anatomical annotation of earprints, Journal of Forensic Sciences (2006), Volume 51, Number 6, 1246-1254.

• Champod C, Evett IW, Kuchler B. Earmarks as evidence: a critical review. J Forensic Sci 2001;46:1275-84.

• Choras, M, Ear biometrics: feature extraction methods based on geometrical parameters, Przeglad Elektrotechniczny (2006), vol.82, no.12, p. 5-10

• State v. David Wayne Kunze, Court of Appeals Washington, Division 2, 97 Wash.App. 832, 988 P.2d 977 (1999).

• Ear print catches murderer, http://news.bbc.co.uk/1/hi/uk/235721.stm, BBC online network, December 15, 1998.

• Man convicted of murder by earprint is freed, http://www.timesonline.co.uk/article/0,,1-973291_1,00.html, TIMES ONLINE, January 22, 2004.

• Lucas, G.; Kieckhoefer, H.; Ingleby, M., Monitoring the physical formation of earprints: optical and pressure mapping evidence, Measurement (Dec. 2006), vol.39, no.10, p. 918-35.

• Lugt van der C. Earprint identification. Den Haag: Elsevier Bedrijfsinformatie, 2001.

• Meijerman, L., Sholl, S., De Conti, F., Giacon, M., Van der Lugt, C., Drusini, A., Vanezis, P., Maat, G.J.R., 2004. Exploratory study on classification and individualization of earprints, Forensic Science International 140: 91-99.

• Meijerman, L., Inter- and intra-individual variation in earprints, doctoral thesis, 2006, Barge's Anthropologica, Leiden.

• Meijerman, L., Thean, A., Van der Lugt, C., Maat, G.J.R., 2006. Earprints, In: Thompson, T. and Black, S. (eds.), Forensic human identification, an introduction. British Association for Human Identification, CRC Press. Pp 73-84.

• Meijerman, L., Van der Lugt, C., Maat, G.J.R., 2007. Cross-sectional anthropometric study of the external ear, Journal of Forensic Sciences 52(2): 286-293.

Related fields of expertise

Keywords: document analysis, pattern recognition in databases, medical imaging

Medical imaging (Jurrien Bijhold)


A small number of publications on the use of medical imaging in forensic investigations were found for the period of this review. Two papers deal with virtual autopsy (Dirnhofer 2006 and Levy 2006), and two deal with the color analysis of injured skin (Georgieva 2005 and Shishkin 2006).

• Dirnhofer, R.; Jackowski, C.; Vock, P.; Potter, K.; Thali, M.J. VIRTOPSY: Minimally invasive, imaging-guided virtual autopsy, Radiographics (Sep-Oct 2006), Vol. 26, No. 5, pp. 1305-1333

• Georgieva, L.; Dimitrova, T., Computer-aided system for the bruise color's recognition, Proceedings of the International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing, Varna, Bulgaria, 16-17 June 2005

• Levy, Angela D.; Abbott, Robert M.; Mallak, Craig T.; Getz, John M.; Harcke, H. Theodore; Champion, Howard R.; Pearse, Lisa A. Virtual autopsy: Preliminary experience in high-velocity gunshot wound victims, Radiology (2006), 240(2), 522-528

• Shishkin, Y.Y.; Erofeev, S.V. Computer analysis of the digital images of injured skin surface, Pattern Recognition and Image Analysis (Jan.-March 2006), vol.16, no.1, p. 77-8.

• Thali, M.J., Braun, M., Buck, U., Aghayev, E., Jackowski, C., Vock, P., Sonnenschein, M., Dirnhofer, R. VIRTOPSY - Scientific documentation, reconstruction and animation in forensic: Individual and real 3D data based geometric approach including optical body/object surface and radiological CT/MRI scanning (2005) Journal of Forensic Sciences, 50 (2), pp. 428-442.

Pattern recognition in forensic image databases (Jurrien Bijhold)

Since the authors are no longer actively involved in the development and use of forensic databases, the following publications are listed without comments.

• Brown, Ross; Pham, Binh; De Vel, Olivier; Khosla, Rajiv (ed.); Howlett, Robert J. (ed.); Jain, Lakhmi C. (ed.) Design of a digital forensics image mining system, Knowledge-based intelligent information and engineering systems. Part I, III-IV: 9th international conference, KES 2005, Melbourne, Australia, September 14-16, 2005

• Hicks, Y.A.; Marshall, D.; Rosin, P.L.; Martin, R.R.; Mann, D.G.; Droop, S.J.M., A model of diatom shape and texture for analysis, synthesis and identification, Machine Vision and Applications (Oct. 2006), vol.17, no.5

• Lahouari Ghouti; Ahmed Bouridane; Crookes, D., Edge-directed invariant shoeprint image retrieval, Conference: IET Visual Information Engineering (VIE 2006), Bangalore, India, 26-28 Sept. 2006

• Smith, C.L., Profiling toolmarks on forensic ballistics specimens: an experimental approach, Proceedings. 40th Annual 2006 International Carnahan Conference on Security Technology (IEEE Cat. No. 06CH37768), 2006, p. 281-6 of xx+309 pp

• Jin Xie; Kaya, A.; Bain, J.A.; Kumar, B.V.K.V., Shallow arc detection in disk surface images for disk forensics, International Conference on Image Processing, 2006, p. III-81-4 of CD-ROM

• Mikkilineni, A.K.; Arslan, O.; Pei-Ju Chiang; Kumontoy, R.M.; Allebach, J.P.; Chiu, G.T.-C.; Delp, E.J., Printer forensics using SVM techniques, NIP21. 21st International Conference on Digital Printing Technologies. Final Program and Proceedings, 2005, p. 223-6 of xii+693 pp

Image processing (Jurrien Bijhold)

The following list of publications is given without comment because of the wide diversity of topics.

• Berger C E H; Koeijer J A; Glas W; Madhuizen H T, Color separation in forensic image processing, JOURNAL OF FORENSIC SCIENCES, (JAN 2006) Vol. 51, No. 1, pp. 100-102.

• Hickman, D.; Goode, A. (Res. & Dev., Forensic Sci. Service, UK); Gandolfi, P., Forensic image comparison techniques, IEE International Symposium on Imaging for Crime Detection and Prevention (ICDP 2005), 2005, p. 99-103 of 166 pp.

• Funk, W.; Arnold, M.; Busch, C.; Munde, A., Evaluation of image compression algorithms for fingerprint and face recognition systems, Proceedings from the Sixth Annual IEEE Systems, Man and Cybernetics (SMC) Information Assurance Workshop (IEEE Cat. No. 05EX1123), 2005, p. 72-8 of xii+469 pp.

• Wen C. Y.; Chen J. K., Multi-resolution image fusion technique and its application to forensic science, Forensic Science International, (2004), 140(2-3), 217-232

• Jerian, M.; Carrato, S.; Paolino, S.; Cervelli, F.; Mattei, A.; Garofano, L., Development of an image processing open-source software for video-surveillance devices, IEE International Symposium on Imaging for Crime Detection and Prevention (ICDP 2005), 2005, p. 49-54 of 166 pp

Working groups and organizations

Keywords: guidelines, proficiency tests, collaborative exercises, quality assurance

The development of forensic image analysis has been strongly driven by the consolidation of a number of relatively new international working groups:

• ENFSIAS: a European group that is focused on all aspects of forensic audio and speech analysis, including linguistics. http://www.enfsi.org/ewg/fsaawg/

• SWGIT: an American group that has produced many guidelines and best practice manuals. http://www.swgit.org

• ENFSI DIWG: a European group that is focused on methods, techniques, education and training. (http://www.forensic.to/webhome/enfsidiwg)

• LEVA: an American group focused on video processing and training (http://www.leva.org)

• EESAG: an Australian-New Zealand group that organizes proficiency tests for video and audio processing. http://www.nifs.com.au/eesag/about.html

• AGIB: a working group in Germany that is focused especially on facial image comparison. http://www.foto-identifikation.de/

• IAFSM: the International Association for Forensic and Security Metrology, a new organization that held its first meeting on March 26-27, 2007. http://www.iafsm.com

• An active web-based working group on forensic applications of computer modeling can be found at: http://groups.yahoo.com/group/forensic3d/

American Academy of Forensic Sciences

Within the American Academy of Forensic Sciences, several sections are active in digital imaging. In 2006 a proposal for a new section, Digital Evidence and Multimedia, was submitted to the board of directors, which approved it in 2007. Since 2003 a workshop on Forensic Image and Video Processing has been organized each year, with handouts on methods for face comparison, video restoration, 3D reconstruction, length measurement, photogrammetry and image processing. A scientific session in this field has also been organized each year. Abstracts from the conferences are available from http://www.aafs.org/default.asp?section_id=resources&page_id=proceedings

ENFSI Forensic IT Working Group

The Forensic IT Working Group of ENFSI deals with digital evidence as such. There is some overlap with the Digital Imaging Working Group, and joint events are organized for that reason. The analysis and repair of video and image formats can be handled in the Forensic IT Working Group as well as in the Digital Imaging Working Group. Since most CCTV systems are now digital, handling the CCTV system itself is often a question of digital evidence: hard drives and other digital media should be handled securely, with proper forensic imaging software. The working group organizes training conferences each year. More information is available from http://www.enfsi.eu/ .

ENFSI DIWG and ENFSI FIT have made available a CD-ROM from the video FREE workshop in The Hague, 2007.

SWGIT

The Scientific Working Group on Imaging Technology (SWGIT) includes representatives from agencies across North America and Europe; Australian agencies have also made frequent contributions. The group addresses imaging issues for law enforcement and intelligence agencies from initial acquisition through long-term archiving, including forensic analysis. The most recent SWGIT documents have focused on best practices for authentication of image and video content, archiving of image and video data, and verification of image and video integrity. Current work is focused on developing best practices for forensic video analysis and photographic comparison analysis. Also in development are documents that will help judges and court officials understand some of the technical issues associated with digital imaging technologies.