

Page 1: Enhancement of tape recorded voices to facilitate ... of tape recorded voi…  · Web viewEnhancement of tape recorded voices to facilitate transcription & aural identification:

Enhancement of tape recorded voices to facilitate transcription & aural identification: Selected Topics in Forensic Voice Identification

Bruce E. Koenig, October 1993 Federal Bureau of Investigation

Ongoing law enforcement operations throughout the world are continually capturing the voices of suspects with miniature transmitter/receiver systems, analog and digital on-the-body recorders, telephone intercept devices, and concealed room microphones. Since these recordings are normally utilized for investigative leads and/or legal proceedings, specific speakers must be accurately identified. Voice identifications that occur through self-recognition of one's voice, eye-witness information, surveillance logs, and the use of a person's name in the conversation are usually readily accepted. However, voice identifications that involve listening only and/or laboratory tests are often more difficult to evaluate accurately. To provide a better understanding of these voice comparison topics, two types of aural-only comparisons will be discussed, and an update on the spectrographic technique is included.

Aural Identification of Familiar Voices

Recognition of familiar voices is a daily occurrence for most people, as they identify spouses, children, coworkers, friends, and business associates after only a few words spoken over the telephone or by hearing them from an adjacent room. This process involves long-term memory, where recognition occurs through a prior knowledge of speech characteristics, including such attributes as accent, speech rate, pronunciation, pitch, vocabulary, and vocal variance (intraspeaker variability). Some of the relevant scientific research and opinions that address the accuracy of identifying familiar voices include the following:

1. Researchers used 7 listeners who were familiar with the 16 chosen speakers through daily contact. The speakers had no pronounced speech defects or accents. Groups of two to eight speech samples of varying lengths were played back to the listeners, which resulted in an identification accuracy of better than 95% for samples lasting from about 1 to 2 seconds. Voice samples were also frequency restricted, but the results reflected only a limited loss of accuracy under conditions normally encountered in law enforcement investigations. In tests involving whispered speech, the samples had to be more than three times as long as normal speech samples to obtain equivalent levels of identification (Pollack et al. 1954).

2. Sixteen listeners with no hearing losses, who had known the recorded 10 male coworkers for at least 2 years, were chosen. None of the 10 recorded individuals had either pronounced regional accents or speech abnormalities. When the listeners heard sentences of less than 3 seconds duration from the 10 coworkers, their median accuracy rate of identification was 98% (range of 92% to 100%). When only a disyllable (e.g., mama) was spoken, the median accuracy rate dropped to 88% (range of 73% to 98%) (Bricker and Pruzansky 1966).

3. In a study of coworkers, recordings were made on different telephone lines of four women and seven men, each talking for 30 seconds to 1 minute on a neutral topic such as the weather. An additional recording was prepared of another male, who was relatively unfamiliar to most of the listeners. The recordings were arranged in a random order and played to 10 of the other coworkers, who were asked to identify the speakers. "All the listeners except one correctly identified all the 11 [coworkers]... The one listener who made an error... confused two speakers who were not well known to him. Three of the 10 listeners knew [the eighth male, who was not a coworker], and correctly identified him. Of the remaining seven listeners, only two said that they could not recognize this speaker. Five listeners wrongly identified this speaker as..." another one of their coworkers. "It is worth noting that four of the five listeners who made the wrong identification were highly skilled, experienced phoneticians..." with doctoral degrees in the field (Ladefoged 1978). This experiment reflects a 100% identification rate for the coworkers' voices that were well known to the listeners and an overall average accuracy rate of 96% when the relatively unfamiliar voice was added.

4. Twenty-four individuals were asked to listen to speech samples of 24 coworkers (15 males and 9 females) whom they had known for several years and 4 speakers unknown to the listeners. The speech samples averaged about 30 seconds in length and contained at least 12 utterances of 2 to 4 words each. Listeners rated each coworker on a scale of very familiar to totally unfamiliar prior to the testing. They listened to the samples for as long as they wished and then rated their decisions as follows: (1) guessing, (2) fairly sure, or (3) very sure. Deleting the results of any voice rated totally unfamiliar to the listener, the results showed a 90.4% correct identification rate and 4.3% incorrect identification rate, with 5.3% who said they did not know the speaker. If the 5.3% are deleted, the correct identification rate is 95.4%. "This rate is probably fairly representative of situations where a limited vocabulary is required and can be expected to be even higher in informal conversations where more of the individual speaker's speech habits are present as cues for identification" (Schmidt-Nielson and Stern 1985).
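The adjusted rate quoted in item 4 can be reproduced with a short calculation. The following is only an illustrative sketch: the function name is hypothetical, and the inputs are the rounded percentages given in the text, so the result can be expected to agree with the published 95.4% only to within rounding.

```python
def rate_excluding_unknown(correct_pct, incorrect_pct):
    """Correct-identification rate after dropping "did not know" responses.

    Both inputs are percentages of all responses. The "did not know" share
    is whatever remains, and it is simply left out of the denominator.
    """
    decided = correct_pct + incorrect_pct
    if decided <= 0:
        raise ValueError("no decided responses")
    return 100.0 * correct_pct / decided


# Rounded figures reported in the text (Schmidt-Nielson and Stern 1985).
adjusted = rate_excluding_unknown(90.4, 4.3)
print(f"{adjusted:.1f}%")
```

Using the rounded inputs, the result lands within about a tenth of a percentage point of the reported figure; the small difference is presumably due to rounding in the published percentages.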

This research indicates that the identification accuracy rate for familiar voice samples lasting 1 second or longer ranged from 92% to 100% and averaged 95% to 100%. Samples recorded through the telephone or other limited bandwidth systems had little effect on accuracy. The effects of noise and loss of high frequency information were studied in another experiment (Clarke et al. 1966), which found that aural speaker identification was only slightly degraded when progressing from high-quality voice samples to typical investigative recordings. It is obvious from everyday experience and the cited research that identifying familiar voices can be an accurate method for identifying voices recorded in forensic applications, even with the limiting factors of noise and attenuated high frequencies.

Aural comparisons of unfamiliar voice samples rely on short-term memory. For example, a woman receives a number of different telephone inquiries regarding a classified advertisement. She then receives an obscene telephone call, and she tries to remember if any of the voices match. In a judicial proceeding, a judge and/or a jury may have to decide if a particular crucial comment on an investigative recording was spoken by the defendant, who readily admits to saying the other statements attributed to him on the transcript, or by someone else involved in the conversation. Examiners using the spectrographic technique, described later, play back the separate voice samples concurrently on separate devices or from computer files, either using an electronic patching arrangement that allows rapid aural switching between them or by recording short phrases or sentences from each sample on the same recording (Voice Comparison Standards 1991). The de facto study of unfamiliar voice comparisons (Clarke et al. 1966) determined the following:

1. Sentence length over the range of 5 to 11 syllables is not an important variable in identification accuracy.

2. Correct identifications decreased from approximately 90% to 80% when the signal-to-noise ratio (SNR) was reduced from 30 decibels (dB) to 0 dB.

3. Correct identifications decreased from approximately 88% to 78% when the frequency response was reduced from 4,500 hertz (Hz) to 1,000 Hz.

Since most investigative recordings have an SNR of 10 dB to 40 dB and a frequency response of 2,500 Hz to 5,000 Hz, the range of expected correct identifications of unfamiliar voices would be 78% to 90%, with most identifications in the 78% to 83% range. The use of expert testimony for aural identifications of unfamiliar voices provides no assistance to the court and/or to the jury. The notes of the advisory committee on Rule 901 of the Federal Rules of Evidence appropriately reflect this fact as follows: "Since aural voice identification is not a subject of expert testimony, the requisite familiarity may be acquired either before or after the particular speaking which is the subject of the identification..." (Federal Criminal Code and Rules 1991). Additionally, the voice comparison standards of the International Association for Identification (IAI) specifically state that it "... does not support or approve the use of... aural only expert decisions..." for voice comparisons (1991).
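To see how the endpoint figures from Clarke et al. (1966) relate to typical recording conditions, the sketch below interpolates between them. Two assumptions here are not in the source: that accuracy changes linearly between the reported endpoints, and that values outside the studied range can be clamped to the nearest endpoint. The function names are hypothetical, and the two factors are treated separately rather than combined.

```python
def interpolate(x, x_lo, y_lo, x_hi, y_hi):
    """Linearly interpolate y at x, clamping x to the range [x_lo, x_hi]."""
    x = max(x_lo, min(x, x_hi))
    return y_lo + (y_hi - y_lo) * (x - x_lo) / (x_hi - x_lo)


def accuracy_from_snr(snr_db):
    # Reported endpoints: about 80% at 0 dB and about 90% at 30 dB.
    # Linearity in between is an assumption made for illustration.
    return interpolate(snr_db, 0.0, 80.0, 30.0, 90.0)


def accuracy_from_bandwidth(cutoff_hz):
    # Reported endpoints: about 78% at 1,000 Hz and about 88% at 4,500 Hz.
    # Linearity in between is an assumption made for illustration.
    return interpolate(cutoff_hz, 1000.0, 78.0, 4500.0, 88.0)


# Typical investigative conditions cited in the text.
for snr in (10, 40):
    print(f"SNR {snr} dB -> {accuracy_from_snr(snr):.1f}%")
for hz in (2500, 5000):
    print(f"{hz} Hz -> {accuracy_from_bandwidth(hz):.1f}%")
```

Under these assumptions, the poorer end of typical conditions (10 dB, 2,500 Hz) gives values in the low 80s, which is broadly in line with the range stated above. The sketch is not a substitute for the study's own data.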

Spectrographic Comparisons

The spectrographic laboratory technique is the best known and possibly the most accurate of the laboratory testing procedures presently available for comparing verbatim voice samples under forensic conditions. However, some scientists believe that aural identifications of very familiar voices are more accurate (Hecker 1971). The spectrographic technique has been described in numerous forensic and scientific publications, including an overview article published in the Crime Laboratory Digest (Koenig 1986). Therefore, a detailed explanation will not be given here; the following paragraphs provide a brief summary of the examination, a review of the new comprehensive standards passed by the IAI, and its status in government and private laboratories.

When properly conducted, spectrographic voice identification is a relatively accurate but not conclusive examination for comparing a recorded unknown voice sample with a suspect repeating the identical contextual information over the same type of transmission system (e.g., a local telephone line). The examiner uses both the short-term memory process previously detailed and a spectral pattern comparison between identically spoken sounds on spectrograms.

Figures 1A and 1B are sound spectrograms of different male speakers saying "salt and pepper." 

The horizontal axis represents time, divided into 0.1-second intervals by the short vertical bars near the top, and the vertical axis is frequency, ranging linearly from 80 Hz to 4,000 Hz, with horizontal lines every 1,000 Hz. The speech energy is reflected in the gray scale from black (highest level) to white (lowest level). The frequency range of the voice is analogous to the range of a musical instrument, where the lowest notes are at the lowest frequency and the highest notes at the highest frequency. The mostly horizontal bands of darkness reflect the vocal resonances and are called formants. The closely spaced vertical striations represent fundamental frequency (voice pitch) or the actual vibrations of the vocal cords.

The spectrographic technique requires comparison of identical phrases between the voice samples, with a decision made at one of a number of confidence levels. The scientific support of this examination is limited, and the actual error rate under most investigative conditions is unknown. The research to date indicates that the technique has a certain error rate that is independent of examiner-induced errors, with errors of false elimination (the voice samples were actually from the same person, but the examination found that they did not match) appreciably higher than false identification (the voice samples were actually from different persons, but the examination found that the samples matched).

In July 1991, the Voice Identification and Acoustic Analysis Subcommittee of the IAI passed and published its first set of comprehensive spectrographic voice identification standards. These requirements, which became effective January 1, 1992, for all certified IAI members, include examiner qualifications, evidence handling, preparation of exemplars, preparation of copies, preliminary examination, preparation of spectrograms, spectrographic/aural analysis, work notes, testimony, certification, and miscellaneous subjects. Table 1 lists the minimum qualifications for spectrographic examiners of the IAI and the FBI and updates a similar table published in an earlier issue of the Crime Laboratory Digest (Koenig 1986). Table 2 is another updated and expanded table from the same article concerning minimum criteria for spectrographic comparisons. Tables 1 and 2 and the previously published tables reflect that the upgraded IAI standards are now appreciably closer to the FBI's criteria. The FBI's standards require higher educational levels, more words for lower confidence decisions, enhancement procedures when needed, and a higher frequency voice range. The most important legal difference is the FBI's policy not to provide testimony on spectrographic comparisons due to the inconclusive nature of the examination and the unknown error rate under specific investigative conditions.

Table 1. Minimum Qualifications for Spectrographic Examiners of the IAI and FBI

Qualification                            IAI                    FBI
Education                                High School Diploma    BS Degree
Periodic Hearing Test                    Yes                    Yes
Length of Apprenticeship                 Usually 2 Years        2 Years
Number of Comparisons Conducted          100                    100
Attendance at a Spectrographic School    Yes                    Yes
Formal Certification                     Yes                    Yes

Table 2. Minimum Criteria for Spectrographic Comparison for the IAI and the FBI

Criteria                                     IAI            FBI
Words Needed for Highest Confidence Level    20             20
Words Needed for Lowest Confidence Level     10             20
Affirming Independent Second Decision        Yes            Yes
Original Recording Required                  Yes            Yes
Allows Testimony                             Yes            No
Completely Verbatim Known Samples            Usually        Usually
Speech Frequency Range                       Above 2 kHz    Above 2.5 kHz
Accuracy Statement in Report                 Yes            Yes
Enhancement Procedures When Needed           Optional       Yes
Speed Correction of All Recordings           Yes            Yes
Track Determination of All Recordings        Yes            Yes
Azimuth Alignment Correction                 Yes            Yes

The use of the spectrographic technique has declined steadily since the mid-1980s among both government laboratories and private examiners. As of mid-1993, the New York City Police Department and the FBI were the only government laboratories in this country regularly conducting these examinations. The private sector efforts were limited to fewer than a dozen part-time examiners. Professional meetings in the field have been sparsely attended, and no major spectrographic research is known to be under way.

Problems still persist in the spectrographic voice identification field. Examples of these problems include the following: (1) separate sets of certified examiners making high-confidence decisions for both identification and elimination in the same case (see Note 1); (2) individuals with no experience, training, or education in the voice identification discipline making conclusive decisions under oath in court; and (3) examiners testifying that an unknown voice is not the defendant's, although admitting their decisions are really inconclusive based upon accepted standards.

Note 1. Los Angeles Board of Civil Service Commissioners. Threat case decided March 25, 1992, in which three IAI examiners made an identification at a high-confidence level, while two IAI examiners eliminated the suspect.

Summary and Conclusion

Under investigative conditions, individuals can reliably identify voices that are well known to them, but the accuracy rate drops to approximately 78% to 83% when unfamiliar voices are compared to known voice samples. The use of expert witnesses does not improve the accuracy rate of aural-only voice comparisons. The use of the spectrographic technique continues to decline, even with the establishment of new standards in 1992.

References

Bricker, P. D. and Pruzansky, S. Effects of stimulus content and duration on talker identification, Journal of the Acoustical Society of America (1966) 40:6:1441-1449.

Clarke, F. R., Becker, R. W., and Nixon, J. C. Characteristics that Determine Speaker Recognition. Technical Report ESD-TR-66-636, Electronic Systems Division, US Air Force, 1966.

Compton, A. J. Effects of filtering and vocal duration upon the identification of speakers, aurally, Journal of the Acoustical Society of America (1963) 35:11:1748-1752.

Federal Criminal Code and Rules. West, St. Paul, MN, 1991, p. 289.

Hecker, M. H. L. Speaker Recognition: An Interpretive Survey of the Literature. American Speech and Hearing Association, Washington, DC, 1971.

Koenig, B. E. Spectrographic voice identification, Crime Laboratory Digest (1986) 13:4:105-118.

Ladefoged, P. Expectation affects identification by listening, Language and Speech (1978) 21:4:373-374.


Pollack, I., Pickett, J. M., and Sumby, W. H. On the identification of speakers by voice, Journal of the Acoustical Society of America (1954) 26:3:403-406.  

Schmidt-Nielson, A. and Stern, K. R. Identification of known voices as a function of familiarity and narrowband coding, Journal of the Acoustical Society of America (1985) 77:2:658-663.

Voice comparison standards, Journal of Forensic Identification (1991) 41:5:373-392.


The U.S. Federal Bureau of Investigation and the FBI's forensic laboratory

The Federal Bureau of Investigation or FBI was created by Attorney General Charles J. Bonaparte in 1908 to serve as the main investigative agency for the United States Department of Justice (USDOJ). When Bonaparte announced that there would be a new investigative unit, it was only a small group of unnamed Special Agents who would be given that role. Since then, the agency has grown into a much larger, internationally recognized agency.

Today, the FBI investigates all criminal cases in the federal jurisdiction that have not been assigned by Congress to one of the thirty-two other federal law enforcement agencies, as well as threats from foreign intelligence or terrorist groups. This includes "applicant matters; civil rights; counterterrorism; foreign counterintelligence; organized crime/drugs; violent crimes and major offenders; and financial crime."

The FBI also provides investigative support and training to local and international law enforcement agencies. The agency often works closely with other law enforcement agencies in the exchange of information to further an investigation. The information gathered from local law enforcement agencies by the FBI is compiled into a set of statistics describing crime in the US and is known as the Uniform Crime Reports (UCR). This data is used to enable all agencies involved with law enforcement to operate in a fashion that maximizes the management of resources and targets specific areas of crime.

The FBI Headquarters are located in Washington, D.C. The agency is headed by the Director, currently Robert S. Mueller, III, who is in charge of organizing the operations of the agency. The Director is appointed by the President for "a term not to exceed ten years." The Senate must confirm the appointment.

Outside of Washington D.C., the FBI has fifty-six field offices, nearly 400 resident agencies or satellite offices, four field installations, and approximately 40 Legal Attaches, which are foreign liaison posts. These offices and the Headquarters combined employ more than 27,000 individuals.

From early in the history of the Federal Bureau of Investigation, the US Government recognized the need to centralize forensics and encourage forensic science. The FBI lab was started in 1932 and, in its first year of operation, performed nearly one thousand forensic examinations. Today the lab performs approximately one million examinations a year and has expanded to include extensive training programs, an annual international symposium, and a program for technical assistance to the forensic community.

The FBI laboratory

The FBI lab is actually a collection of related specialized laboratories and facilities including:

CODIS – The Combined DNA Index System is a program that facilitates the electronic sharing of information by outside state and local labs. This system provides forensic labs with software that enables them to access databases of convicted offenders, missing persons and unsolved crimes. With this system, DNA profiles may be exchanged and compared between labs that are trying to link suspects to crime scenes.

NDIS – The National DNA Index System is part of the CODIS system and allows DNA profiles from convicted offenders to be accessible to forensic labs.

IAFIS – The newest database established by the FBI lab, the Integrated Automated Fingerprint Identification System allows latent fingerprint comparisons to be made between labs. IAFIS is the largest database of its kind.

In addition to technical support to state and local labs, the FBI lab offers procedure manuals that help law enforcement officials properly locate and collect physical evidence from a crime scene. Details on how to report on evidence, photograph crime scenes, and submit evidence are made available to investigators. Guidelines for protecting the safety of investigators are also provided.


Finally, the FBI lab publishes current research and other information relevant to forensic science in a journal that is available online.


Voiceprint Identification

Money Laundering and Narcotics Update, Department of Justice, 1988, and The Legal Investigator 1990 by Steve Cain, Lonnie Smrkovski and Mindy Wilson

Voiceprint identification can be defined as a combination of both aural (listening) and spectrographic (instrumental) comparison of one or more known voices with an unknown voice for the purpose of identification or elimination. Developed by Bell Laboratories in the late 1940s for military intelligence purposes, the technique was not put to modern forensic use until the late 1960s, following its adoption by the Michigan State Police. From 1967 until the present, more than 5,000 law enforcement related voice identification cases have been processed by certified voiceprint examiners.

Voice identification has been used in a variety of criminal cases, including murder, rape, extortion, drug smuggling, wagering-gambling investigations, political corruption, money-laundering, tax evasion, burglary, bomb threats, terrorist activities and organized crime activities. It is part of a larger forensic role known as acoustic analyses, which involves tape filtering and enhancement, tape authentication, gunshot acoustics, reconstruction of conversations and the analysis of any other questioned acoustic event.

Theory

The fundamental theory for voice identification rests on the premise that every voice is individually characteristic enough to distinguish it from others through voiceprint analysis. There are two general factors involved in the process of human speech. The first factor in determining voice uniqueness lies in the sizes of the vocal cavities, such as the throat, nasal and oral cavities, and the shape, length and tension of the individual's vocal cords located in the larynx. The vocal cavities are resonators, much like organ pipes, which reinforce some of the overtones produced by the vocal cords and thereby produce formants or voiceprint bars. The likelihood that two people would have all their vocal cavities the same size and configuration and coupled identically appears very remote.

The second factor in determining voice uniqueness lies in the manner in which the articulators or muscles of speech are manipulated during speech. The articulators include the lips, teeth, tongue, soft palate and jaw muscles whose controlled interplay produces intelligible speech. Intelligible speech is developed by the random learning process of imitating others who are communicating. The likelihood that two people could develop identical use patterns of their articulators also appears very remote.


Therefore, the chance that two speakers would have identical vocal cavity dimensions and configurations coupled with identical articulator use patterns appears extremely remote. While there have been claims that several voices have been found to be indistinguishable, no evidence to support such allegations has been published, offered for examination or demonstrated to the authors.

Several studies have been published evidencing the ability to reliably identify voices under certain conditions, and a Federal Bureau of Investigation survey of its own performance in the examination of 2,000 forensic cases revealed an error rate of 0.31 percent for false identifications and 0.53 percent for false eliminations. (See Koenig, B.E., 1986, Spectrographic Voice Identification: a forensic survey, Journal of the Acoustical Society of America, 79:2088-2090.)

While there is disagreement in the so-called "scientific community" on the degree of accuracy with which examiners can identify speakers under all conditions, there is agreement that voices can, in fact, be identified.

To facilitate the visual comparisons of voices, a sound spectrograph is used to analyze the complex speech wave form into a pictorial display on what is referred to as a spectrogram. The spectrogram displays the speech signal with the time along the horizontal axis, frequency on the vertical axis, and relative amplitude indicated by the degree of gray shading on the display. The resonance of the speaker's voice is displayed in the form of vertical signal impressions or markings for consonant sounds, and horizontal bars or formants for vowel sounds. The visible configurations displayed are characteristic of the articulation involved for the speaker producing the words and phrases. The spectrograms serve as a permanent record of the words spoken and facilitate the visual comparison of similar words spoken by an unknown and a known speaker.
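The time, frequency and amplitude layout described above can be illustrated with a short-time Fourier analysis. The sketch below is a digital stand-in and is not the sound spectrograph's own method: the function name, frame length, hop size and the synthetic test tone are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import cmath
import math


def spectrogram(samples, sample_rate, frame_len=128, hop=64):
    """Return (times, freqs, magnitudes) from a short-time DFT.

    magnitudes[t][k] is the spectral magnitude at time frame t and
    frequency bin k; a printed spectrogram would draw larger values darker.
    """
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    freqs = [k * sample_rate / frame_len for k in range(n_bins)]
    times, magnitudes = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        times.append(start / sample_rate)   # horizontal axis: time (s)
        magnitudes.append(row)              # gray level per frequency bin
    return times, freqs, magnitudes


# Synthetic input (an assumption, not real speech): a steady 1,000 Hz tone
# sampled at 8,000 Hz for 0.1 second.
RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(800)]
times, freqs, mags = spectrogram(tone, RATE)

# The strongest (darkest) band in each time frame sits at the tone frequency.
peak_freqs = [freqs[row.index(max(row))] for row in mags]
print(peak_freqs[0])
```

A steady tone produces a single horizontal band, which is the simplest analogue of the formant bars described in the text. Real speech would show several such bands shifting over time.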

Procedural Guidelines

The acoustic environment in many cases can be controlled at the receiving end of the speech signal. Shutting off the radio, television or other noise-generating devices will reduce or eliminate unwanted background speech or noise. While not always possible, the investigator should attempt to select a reasonably quiet environment for controlled activities such as drug buys or other illegal operations being investigated. Many times these types of activities are carried out in bars, restaurants, car washes, billiard rooms and the like, and the investigator cannot always dictate the location.

It may require the recording of telephone conversations or face-to-face encounters under a variety of acoustic conditions in which someone is wearing a body recorder or transmitting the conversation via radio frequency to a remote location. Unfortunately, in many cases the investigators cannot control the acoustic environment. In situations involving an adverse environment, investigators should use high technology stereo equipment to optimize recording capability.


The attempt to produce samples as parallel to the unknown as possible actually assists the examiner in his task because speaker variables are reduced to a minimum. Numerous studies have been conducted that indicate very reliable decisions can be made by trained professional examiners when samples are obtained in the manner described.

The notion proposed by some opponents that duplicating the unknown as closely as possible may cause error is not supported by any available evidence. Research studies have produced strong evidence that even very good mimics cannot duplicate another's speech patterns.

In an attempt to obtain proper speech samples, investigators should not hesitate to ask suspects for the samples they need. Surprisingly, many suspects will voluntarily give a sample of their voice for comparison purposes.

In the event you are dealing with some type of vocal disguise, attempt to obtain a similarly produced known exemplar in addition to the suspect's normal voice. It should be noted that vocal disguises can be very difficult for the examiner to deal with and the probability of determination is less than with normal voice samples.

If a suspect refuses to cooperate with the investigator, a court order may be acquired compelling the suspect to produce voice recordings for the purpose of comparison. Courts have repeatedly held that requiring the accused to submit voice exemplars for the purpose of comparison for identification or elimination does not violate the suspect's Fifth Amendment rights. In United States v. Wade, 388 U.S. 218 (1967), the Court held that the privilege against self-incrimination offers no protection from compulsion to submit to speaking for the purpose of voice identification, or to writing, photographing, fingerprinting and measurements.

Several problems have been encountered in obtaining known voice exemplars even with the use of a court order. If the court order is vague, the suspect may utter a few words of the text involved, speak too softly, too fast, or too slowly, or otherwise disguise the sample and claim compliance with the order.

To prevent such problems, the investigator is wise to request that the court order specify in detail that the suspect give a sample of his or her voice, repeating the phrases of the questioned call in a natural conversational voice (or in a similar disguise, if that is the case), and that such sample shall be given at least three times and to the reasonable satisfaction of the investigator. Voice exemplars obtained with such specific instructions are usually very satisfactory for comparison purposes.

Before terminating the recording session, check the recording to determine whether or not a satisfactory exemplar was obtained. Remember that once a suspect is released, a second known sample may be very difficult to obtain.

Whatever the recording circumstance, background noise and the distance between the talker and the receiving device should be minimized for optimal recording. Good quality tape recording equipment should be used, as well as magnetic recording tape. As a rule of thumb, recording tape with standard 120 µs equalization, normal bias and no more than a 5 dB drop at 6 kHz should be used.
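To give a feel for what a 5 dB drop means, the following is a minimal sketch of the decibel arithmetic. The function names are hypothetical, and it assumes the drop at 6 kHz is measured against the tape's response at a lower reference frequency, which the rule of thumb does not spell out.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level change in dB to an amplitude (voltage) ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db: float) -> float:
    """Convert a level change in dB to a power ratio."""
    return 10 ** (db / 10)


def within_rule_of_thumb(reference_level_db: float,
                         level_at_6khz_db: float,
                         max_drop_db: float = 5.0) -> bool:
    """True if the response at 6 kHz is no more than max_drop_db below
    the reference level (assumption: both levels are measured in dB on
    the same recorder and tape)."""
    return (reference_level_db - level_at_6khz_db) <= max_drop_db
```

Under these assumptions, a 5 dB drop leaves roughly 56 percent of the reference amplitude and roughly 32 percent of the reference power at 6 kHz.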

After the development of a suspect, the next task is to properly obtain known voice samples for comparison purposes. Do not hesitate to ask a suspect for a speech sample. If the suspect refuses, a court order may be obtained requiring compliance with the request. See Schmerber v. California, 384 U.S. 757 (1966), and Gilbert v. California, 388 U.S. 263 (1967). Both are landmark cases. There are also many additional decisions at both state and federal court levels that may be cited to support such a request. Court orders should clearly spell out the minimum number of samples to be obtained, the manner of speech, and the method to be employed.

The next task for the investigator is to obtain proper speech samples for comparison purposes. Probably the best guide here is attempting to duplicate the recording of the questioned call. Known samples should be obtained via the telephone and recorded in the same manner as the questioned call. If possible, the same recorder and telephone pickup should be used. In some cases, even the same telephone has been employed. If there is room on the questioned tape, the known sample may be placed on it. If there is not, another tape of the same type and brand should be used if at all possible.

Speech samples obtained should contain exactly the same words and phrases as those in the questioned sample because only like speech sounds are used for comparison. Because the voice, like handwriting, is dynamic and variant, several samples of each spoken phrase are desired for analysis. Unless the questioned call sounds like a read statement, the suspect should not be allowed to read the phrases from a transcript but should repeat each phrase after it is spoken by someone else. To avoid an unnatural verbal response, the suspect should repeat the first phrase and proceed in the same manner with each successive phrase.

When all phrases have been recorded, the same procedure should be repeated at least two more times beginning with the first word or phrase. The suspect may be asked to read the phrases if a very poor job of repeating is done. Some people do a better job of reading than repeating the phrases.

It is important that the known sample be spoken in the same manner as the questioned sample; therefore, the investigator should be familiar with the voice, manner of speech and the text. If the caller's voice was disguised, the suspect should give a normal sample and a disguised one as in the questioned call.

Recorded evidence should be wrapped in tinfoil to protect it from possible contact with a magnetic field if it is submitted by mail. The evidence should be shipped in a secure container that will prevent the evidence from tearing through the packaging material. Do not submit a copy of your investigative report with the evidence. The examiner does not want to know the details of the case. It is important, however, to provide the examiner with information regarding the recording method, the number of calls and suspects involved, and any other information that may assist the examiner in the examination of the evidence.

Upon receipt of the evidence by the laboratory, it is properly marked and a case number is assigned. The analysis and comparison of known and questioned voice samples may take several hours or days to complete, depending on the number of samples involved and the complexity of the examination. Both an aural (listening) and a visual (spectrographic) examination and comparison are conducted. Aural and spectrographic cues examined should complement one another in the event the voices are in fact the same.

As with the identification of fingerprints, there is presently no universal standard for the number of words required for identification. The minimum does, however, vary from 10 for some agencies to 20 for others. The Internal Revenue Service has chosen to use 20 or more like speech sounds between an unknown and known sample, with the degree of certainty based on the quality and excellence of the evidence examined. Obtaining a second, independent decision is standard practice in this field as in other forensic sciences.

Visual comparison of spectrograms involves, in general, the examination of spectrographic features of like sounds as portrayed in spectrograms in terms of time, frequency and amplitude. Specific features, the result of producing consonants, vowels and semi-vowels in isolation or in combination (coarticulation), include the following clues, which are certainly not all-inclusive: pitch, bandwidth, mean frequency, trajectory of vowel formants, distribution of formant energy, nasal resonance, stops, plosives, fricatives, pauses, interformant features and other idiosyncratic and pathological features.

Special aural comparison tapes are prepared facilitating comparison of psycholinguistic features via short-term memory. Aural cues compared include resonance quality, pitch, temporal factors, inflection, dialect, articulation, syllable grouping, breath pattern, disguise, pathologies and other peculiar speech characteristics.

Some agencies offer court testimony, others do not. The IRS laboratory is the only federal agency that presently offers testimony. All other certified examiners, whether in state agencies or in private practice, also offer court testimony.

Court Admissibility

Court testimony involving aural-spectrographic voice comparison essentially started having an impact on the courts after the Tosi Study in December 1970. Since then there have been between 150 and 200 trials in local, state or federal courts. Because of a difference based on evidentiary philosophical reasons, some courts have admitted aural-spectrographic voice evidence and others have not.

There are two general "rules" or "standards" by which scientific evidence is accepted in courts of law in the United States. The first, commonly referred to as the Frye "rule" or "test," is based on a 1923 District of Columbia case and basically requires "general acceptance in the particular field in which it belongs." See Frye v. United States, 54 App. D.C. 46, 293 F. 1013 (1923). The second is based on the argument of McCormick (See "McCormick on Evidence," 3rd Ed., 203 at 608.) McCormick states: "General scientific acceptance is a proper condition for taking judicial notice of scientific facts, but it is not a suitable criterion for the admissibility of scientific evidence. Any relevant conclusion supported by a qualified expert witness should be received unless there are distinct reasons for exclusion." See Rule 702 of the Federal Rules of Evidence.

Many state and federal courts have abandoned Frye and adopted the argument of McCormick. The supreme courts of Minnesota, Maine, Ohio and Rhode Island have admitted aural-spectrographic voice evidence following McCormick. Intermediate appellate courts in California, Maryland and Michigan admitted such evidence following Frye but were reversed by their respective supreme courts, which held that the Frye test had not been met. The Massachusetts Supreme Court held aural-spectrographic voice evidence admissible applying the Frye test, while those of Arizona, Indiana and Pennsylvania did not.

In the federal court system, we are aware of 30 trials in which the question of aural-spectrographic voice evidence was addressed. All but three admitted the evidence based on Frye or McCormick. On appeal, the Second, Fourth and Sixth Circuits held the evidence admissible, applying McCormick, while the District of Columbia Circuit did not, applying Frye. See United States v. Williams, 583 F.2d 1194 (2d Cir.), cert. denied, 439 U.S. 1117 (1978); United States v. Baller, 519 F.2d 463 (4th Cir.), cert. denied, 423 U.S. 1019 (1975); United States v. Franks, 511 F.2d 25 (6th Cir.), cert. denied, 422 U.S. 1042 (1975); and United States v. McDaniel, 538 F.2d 408 (D.C. Cir. 1976).

In United States v. Williams, supra at 1198, the court said: "The 'Frye' test is usually construed as necessitating a survey and categorization of the subjective views of a number of scientists, assuring thereby a reserve of experts available to testify. Difficulty in applying the 'Frye' test has led a number of courts to its implicit modification." Also see United States v. Baller, supra at n.6.

Since 1970, the forensic application of aural-spectrographic voice identification has been reliably applied in the investigation of several thousand cases. While there is disagreement on the reliability of the method under all conditions, there is agreement that voices can be identified and eliminated when the proper conditions exist and the analysis is carefully conducted by qualified examiners.

Several state appellate and supreme courts have admitted the evidence, as have three of four federal appellate courts. The United States Supreme Court has refused to review and decide the three cases brought before it. While the admission of aural-spectrographic voice evidence continues to be decided in various courts, the method continues to be a very important tool in the arsenal against crime.

Other areas of acoustic analysis include, in part, gunshot analysis, tape enhancement and tape authentication. While not discussed in this article, it should be noted that laboratory analysis related to these problems is available in some laboratories.

NDAA Bulletin December 1993

VOICE IDENTIFICATION: The Aural/Spectrographic Method

by: Michael C. McDermott (Frank M. McDermott, Ltd.) and Tom Owen (Owl Investigations, Inc.)

Table of Contents:

I. INTRODUCTION
II. The Sound Spectrograph
III. The Method of Voice Identification
IV. History
V. Standards of Admissibility
VI. Research Studies
VII. Conclusion
VIII. Table of Cases
IX. Appendix 1

© 1996 Owl Investigations, Inc.

INTRODUCTION

The forensic science of voice identification has come a long way from when it was first introduced in the American courts back in the mid 1960's. In the early days of this identification technique there was little research to support the theory that human voices are unique and could be used as a means for identification. There was also no standardization of how an identification was reached, or even training or qualifications necessary to perform the analysis. Voice comparisons were made solely on the pattern analysis of a few commonly used words. Due to the newness of the technique there were only a few people in the world who performed voice identification analysis and were capable of explaining it to a court. Gradually the process became known to other scientists who voiced concerns, not as to the validity of the analysis, but as to the lack of substantial research demonstrating the reliability of the technique. They felt that the technique should not be used in the courtroom without more documentation. Thus the battle lines were drawn over the admissibility of voice identification evidence with proponents claiming a valid, reliable identification process and opponents claiming more research must be completed before the process should be used in courtrooms.

Today voice identification analysis has matured into a sophisticated identification technique, using the latest technology science has to offer. The research, which is still continuing today, demonstrates the validity and reliability of the process when performed by a trained and certified examiner using established, standardized procedures. Voice identification experts are found all over the world. No longer limited to the visual comparison of a few words, the comparison of human voices now focuses on every aspect of the words spoken: the words themselves, the way the words flow together, and the pauses between them. Aural and spectrographic analyses are combined to form the conclusion about the identity of the voices in question.

The road to admissibility of voice identification evidence in the courts of the United States has not been without its potholes. Many courts have had to rule on this issue without having access to all the facts. Trial strategies and budgets have resulted in incomplete pictures for the courts. To compound the problem, courts have utilized different standards of admission resulting in different opinions as to the admissibility of voice identification evidence. Even those courts which have claimed to use the same standard of admissibility have interpreted it in a variety of ways resulting in a lack of consistency. Although many courts have denied admission to voice identification evidence, none of the courts excluding the spectrographic evidence have found the technique unreliable. Exclusion has always been based on the fact that the evidence presented did not present a clear picture of the technique's acceptance in the scientific community and as such, the court was reluctant to rely on that evidence. The majority of courts hearing the issue have admitted spectrographic voice identification evidence.

THE SOUND SPECTROGRAPH

The sound spectrograph, an automatic sound wave analyzer, is a basic research instrument used in many laboratories for research studies of sound, music and speech. It has been widely used for the analysis and classification of human speech sounds and in the analysis and treatment of speech and hearing disorders.

The instrument produces a visual representation of a given set of sounds in the parameters of time, frequency and amplitude. The analog spectrograph is composed of four basic parts: (1) a magnetic tape recorder/playback unit, (2) a tape scanning device with a drum which carries the paper to be marked, (3) an electronic variable filter, and (4) an electronic stylus which transfers the analyzed information to the paper. The analog sound spectrograph samples energy levels in a small frequency range from a magnetic tape recording and marks those energy levels on electrically sensitive paper. This instrument then analyzes the next small frequency range and samples and marks the energy levels at that point. This process is repeated until the entire desired frequency range is analyzed for that portion of the recording. The finished product is called a spectrogram and is a graphic depiction of the patterns, in the form of bars or formants, of the acoustical events during the time frame analyzed. The machine will produce a spectrogram in approximately eighty seconds. The spectrogram is in the form of an X,Y graph with the X axis the time dimension, approximately 2.4 seconds in length, and the Y axis the frequency range, usually 0 to 4000 or 8000 Hz. The degree of darkness of the markings indicates the approximate relative amplitude of the energy present for a given frequency and time.
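The band-by-band sweep described above can be imitated in software. The following is a minimal sketch, not a model of any particular instrument: it measures the energy near one frequency at a time across short time frames, then scales the result so the strongest energy corresponds to the darkest mark. The frame length, hop size, number of bands, Hann window, and all function names are assumptions chosen for illustration.

```python
import math


def band_energy(frame, sample_rate, freq_hz):
    """Energy of one frame near freq_hz, found by correlating the
    Hann-windowed frame with a cosine and a sine at that frequency."""
    re = im = 0.0
    n = len(frame)
    for k, x in enumerate(frame):
        # The Hann window reduces leakage from neighbouring frequencies.
        w = 0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1))
        angle = 2 * math.pi * freq_hz * k / sample_rate
        re += w * x * math.cos(angle)
        im -= w * x * math.sin(angle)
    return re * re + im * im


def spectrogram(samples, sample_rate, frame_len=256, hop=128,
                f_max=4000.0, n_bands=32):
    """Return (times, freqs, grid); grid[b][t] is the energy in
    frequency band b during time frame t."""
    freqs = [f_max * (b + 1) / n_bands for b in range(n_bands)]
    starts = list(range(0, len(samples) - frame_len + 1, hop))
    times = [(s + frame_len / 2) / sample_rate for s in starts]
    grid = []
    for f in freqs:  # one small frequency range at a time, like the sweep
        grid.append([band_energy(samples[s:s + frame_len], sample_rate, f)
                     for s in starts])
    return times, freqs, grid


def to_darkness(grid):
    """Scale energies to 0..1 so the largest energy is the darkest mark."""
    peak = max(max(row) for row in grid) or 1.0
    return [[e / peak for e in row] for row in grid]
```

Plotting `to_darkness(grid)` with time on the X axis and frequency on the Y axis would give a picture of the kind described above. A digital system would normally use a fast Fourier transform rather than this band-by-band loop, which is kept here only because it mirrors the analog procedure.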

Recent developments in sound spectrography have produced computerized digital sound spectrographs ranging from dedicated digital signal analysis workstations to PC-based systems for acquisition, analysis, editing, and playback. These sophisticated computer-based systems provide high fidelity signal acquisition, high-speed digital processing circuitry for quick and flexible analysis, and CD-quality playback. The computer-based systems accomplish all the same tasks as the analog systems, but with the computer-based systems the examiner gains a host of comparison and measurement tools not available with the analog equipment. The computer-based systems are capable of displaying multiple sound spectrograms, adjusting the time alignment and frequency ranges, and taking detailed numeric measurements of the displayed sounds. With these advances in technology, the examiner widens the scope of the analysis to create a more detailed picture of the voice or sound being analyzed.

The accuracy and reliability of the sound spectrograph, either analog or digital, have never been in question in any of the courts and have never been considered an issue in the admissibility of voice identification evidence. This may be due in part to the wide use of the instrument in the field of speech and hearing for non-voice identification analysis of the human voice and in part to the fact that, given the same recording of speech sounds, the sound spectrograph will consistently produce the same spectrogram of that speech.

The contest comes in the interpretation of the spectrograms. Proponents of the aural and spectrographic technique of voice identification base their decisions on the theory that all human voices are different due to the physical uniqueness of the vocal tract, the distinctive environmental influences in the learning process of speech development, and the unique development of neurological faculties which are responsible for the production of speech. Opponents claim that not enough research has been completed to validate the theory that intraspeaker variability is less than interspeaker variability.

THE METHOD OF VOICE IDENTIFICATION

The method by which a voice is identified is a multifaceted process requiring the use of both aural and visual senses. In the typical voice identification case the examiner is given several recordings: one or more recordings of the voice to be identified and one or more recorded voice samples of one or more suspects. It is from these recordings the examiner must make the determination about the identity of the unknown voice.

The first step is to evaluate the recording of the unknown voice, checking to make sure the recording has a sufficient amount of speech with which to work and that the quality of the recording is of sufficient clarity in the frequency range required for analysis.1 The volume of the recorded voice signal must be significantly higher than that of the environmental noise. The greater the number of obscuring events, such as noise, music, and other speakers, the longer the sample of speech must be. Some examiners report that they reject as many as sixty percent of the cases submitted to them with one of the main reasons for rejection being the poor quality of the recording of the unknown voice.
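One rough way to put a number on how far the voice stands above the environmental noise is a signal-to-noise estimate. The sketch below is an illustration only: it assumes the examiner can isolate one stretch containing speech plus background and another containing background alone, and that the two are uncorrelated. The function names are hypothetical, and the text gives no numeric threshold, so none is built in.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a segment."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech_plus_noise, noise_only):
    """Estimate the signal-to-noise ratio in dB.

    Assumption: the noise in the speech segment has the same power as
    the noise-only segment, so subtracting it leaves the speech power.
    """
    noise_power = mean_power(noise_only)
    if noise_power <= 0.0:
        raise ValueError("noise-only segment is silent; SNR is undefined")
    # Floor at a tiny positive value so the logarithm stays defined
    # when the speech is buried in the noise.
    signal_power = max(mean_power(speech_plus_noise) - noise_power, 1e-12)
    return 10 * math.log10(signal_power / noise_power)
```

A higher value means the speech is further above the background; a value near or below 0 dB means the noise is at least as strong as the voice.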

Once the unknown voice sample has been determined to be suitable for analysis, the examiner then turns his attention to the voice samples of the suspects. Here also, the recordings must be of sufficient clarity to allow comparison, although at this stage, the recording process is usually so closely controlled that the quality of recording is not a problem.

The examiner can only work with speech samples which are the same as the text of the unknown recording. Under the best of circumstances the suspects will repeat, several times, the text of the recording of the unknown speaker and these words will be recorded in a similar manner to the recording of the unknown speaker. For example, if the recording of the unknown speaker was a bomb threat made to a recorded telephone line then each of the suspects would repeat the threat, word for word, to a recorded telephone line. This will provide the examiner with not only the same speech sounds for comparison but also with valuable information about the way each speech sound completes the transition to the next sound.

There are those times when a voice sample must be obtained without the knowledge of the suspect. It is possible to make an identification from a surreptitious recording but the amount of speech necessary to do the comparison is usually much greater. If the suspect is being engaged in conversation for the purpose of obtaining a voice sample, the conversation must be manipulated in such a way so as to have the suspect repeat as many of the words and phrases found in the text of the unknown recording as possible.

The worst exemplar recordings with which an examiner must work are those of random speech. It is necessary to obtain a large sample of speech to improve the chances of obtaining a sufficient amount of comparable speech.

As in any other form of identification analysis, as the quality of the evidence with which the examiner has to work declines, the amount of evidence and time necessary to complete the analysis increases, and a positive conclusion becomes less likely.

Once the evidence has been determined to be sufficient to perform the analysis, the examiner then begins the two-step process of voice sample comparison: one aural (listening) and the other spectrographic (visual). These are two different but interwoven and equally important analytical methods which the examiner combines to reach the final conclusion. The first step is an aural comparison of the voice samples.2 Here the examiner compares both single speech sounds and series of speech sounds of the known and unknown samples. At this stage the examiner is conducting a number of tasks: comparing for similarities and differences, screening out less useful portions of the samples, and indexing the samples for further analysis. An example of the initial aural comparison is the screening of the samples for pronunciation similarities or discrepancies; for instance, the word "the" may be said with a short "a" sound or a long "e" sound. If the word is not pronounced in the same manner it loses comparison value.

Once the examiner has located those portions to be used for the analysis, a more detailed aural comparison is undertaken. This comparison can be accomplished in many different ways. One of the most commonly used methods of aural comparison is re-recording a speech sound sample of the unknown followed immediately by a re-recording of the same speech sounds of the suspect. This is repeated several times so that the final product is a recording of specific speech sounds, in alternating order, by the unknown speaker followed by the suspect. Such comparisons have been greatly facilitated by the use of audio digital recording equipment which allows for the digital recording, storage, and repeated playback of only the desired speech sounds to be examined.
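The alternating re-recording described above amounts to interleaving matching clips. The following is a minimal sketch of that ordering, assuming the examiner has already cut out matching speech sounds from each recording. The function name and the default of three repetitions are hypothetical; the text says only that the pairing is repeated several times.

```python
def build_comparison_sequence(unknown_clips, suspect_clips, repeats=3):
    """Return the playback order for an aural comparison recording.

    Each clip from the unknown speaker is followed immediately by the
    matching clip from the suspect, and the pair is repeated `repeats`
    times before moving to the next speech sound. Clips can be any
    objects (file names, sample lists); they are passed through as-is.
    """
    if len(unknown_clips) != len(suspect_clips):
        raise ValueError("each unknown clip needs a matching suspect clip")
    sequence = []
    for unknown, suspect in zip(unknown_clips, suspect_clips):
        for _ in range(repeats):
            sequence.append(("unknown", unknown))
            sequence.append(("suspect", suspect))
    return sequence
```

With digital equipment the resulting list could drive repeated playback directly, with no need to re-record anything onto tape.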

During the aural comparison the examiner studies the psycholinguistic features of the speaker's voice. There are a large number of qualities and traits which are examined, from such general traits as accent and dialect to inflection, syllable grouping and breath patterns. The examiner also scrutinizes the samples for signs of speech pathologies and peculiar speech habits.

The second step in the voice identification process is the spectrographic analysis of the recorded samples. The sound spectrograph is an automatic sound wave analyzer with a high quality, fully functional tape recorder. The speech samples to be analyzed are recorded on the sound spectrograph. The recording is then analyzed in two and one half second segments. The product is a spectrogram, a graphic display of the recorded signal on the basis of time and frequency with a general indication of amplitude.

The spectrograms of the unknown speaker are then visually compared to the spectrograms of the suspects. Only those speech sounds which are the same are compared.3 The comparisons of the spectrograms are based on the displayed patterns representing the psychoacoustical features of the captured speech. The examiner studies the bandwidths, mean frequencies, and trajectory of vowel formants; vertical striations, distribution of formant energy and nasal resonances; stops, plosives and fricatives; interformant features, the relation of all features present as affected during articulatory changes and any peculiar acoustic patterning.4 The examiner looks not only for similarities but also for differences. The differences are closely examined to determine if they are due to pronunciation differences or if they are indicative of different speakers.

When the analysis is complete the examiner integrates his findings from both the aural and spectrographic analyses into one of five standard conclusions: a positive identification, a probable identification, a positive elimination, a probable elimination, or no decision. In order to arrive at a positive identification the examiner must find a minimum of twenty speech sounds which possess sufficient aural and spectrographic similarities. There can be no differences, either aural or spectrographic, for which there can be no accounting.

The probable identification conclusion is reached when there are fewer than twenty similarities and no unexplained differences. This conclusion is usually reached when working with small samples, random speech samples or recordings of lower quality. The result of positive elimination is rendered when twenty differences between the samples are found that cannot be based on any fact other than different voices having produced the samples. A probable elimination decision is usually reached when working with limited text or a recording of lower quality. The no decision conclusion is used when the quality of the recording is so poor that there is insufficient information with which to work or when there are too few common speech sounds suitable for comparison.
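The five conclusions can be summarized as a small decision function. This is a simplified sketch and not a statement of any laboratory's procedure: the text gives the threshold of twenty for positive results but no count for a probable elimination, so treating "some unexplained differences, but fewer than twenty" as a probable elimination is an assumption, as are the function name, the parameter names, and the `usable` flag standing in for the examiner's judgment of recording quality.

```python
def conclude(similar_sounds, unexplained_differences,
             usable=True, threshold=20):
    """Map examination counts to one of the five standard conclusions.

    similar_sounds: speech sounds with sufficient aural and
        spectrographic similarity.
    unexplained_differences: differences attributable only to
        different speakers.
    usable: False when quality is too poor or too few common speech
        sounds exist (assumption: decided by the examiner beforehand).
    """
    if not usable or (similar_sounds == 0 and unexplained_differences == 0):
        return "no decision"
    if unexplained_differences >= threshold:
        return "positive elimination"
    if unexplained_differences > 0:
        # Assumption: the text ties this outcome to limited text or
        # lower quality recordings, not to a specific count.
        return "probable elimination"
    if similar_sounds >= threshold:
        return "positive identification"
    return "probable identification"
```

In practice these counts come out of the examiner's judgment about each speech sound, so a function like this only records the final bookkeeping step.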

HISTORY

A good place to start examining the history of speech sound analysis is a little more than one hundred years ago, with Alexander Melville Bell, who developed a visual representation of the spoken word. This visual display of the spoken word conveyed much more information about the pronunciation of that word than the dictionary spelling could ever suggest. His depiction of speech sounds demonstrated the subtle differences with which different people pronounced the same words. This system of speech sound analysis developed by Bell is the phonetic alphabet which he called "visible speech".5 His method of encoding the great variety of speech sounds was by handwritten symbols and was language independent. This code produced a visual representation of speech which could convey to the eye the subtle differences in which words were spoken. This system was used by both Bell and his son, Alexander Graham Bell, in helping deaf people learn to speak.6

It was in the early 1940's that a new method of speech sound analysis was developed. Potter, Kopp and Green, working for Bell Laboratories in Murray Hill, New Jersey, began work on a project to develop a visual representation of speech using a sound spectrograph. This machine, an automatic sound wave analyzer, produced a visual record of speech portraying three parameters: frequency, intensity and time. This research was intensified during World War II when acoustic scientists suggested that enemy radio voices could be identified by the spectrograms produced by the sound spectrograph. The war ended before the technique could be perfected.

In 1947, Potter, Kopp and Green published their work in a book, the title of which was borrowed from Alexander Melville Bell, Visible Speech. Their work is a comprehensive study of speech spectrograms designed to linguistically interpret visible speech sound patterns. This work was similar to that of Bell's in that speech sounds were encoded into a visual form. The difference is, instead of a pen, Potter, Kopp and Green used a sound spectrograph to produce the visual patterns.

Research in the area of speaker identification slowed dramatically with the end of World War II. It was not until the late 1950's and early 1960's that the research began again. It was at this time the New York City Police Department was receiving a large number of telephone bomb threats to the airlines.7 At that time Bell Laboratories was asked by law enforcement officers to provide assistance in the apprehension of the individuals making the telephone calls. The task of developing a reliable method of identification of a speaker's voice was given to Lawrence G. Kersta, a physicist at Bell Laboratories who had worked on the early experiments using the sound spectrograph. In two years Kersta had developed a method of identification in which he reported results yielding a correct identification in 99.65% of all attempts.8

It was in 1966 that the Michigan State Police began the practical application of the voice identification method in attempting to solve criminal cases. A Voice Identification unit was established and the unit personnel received training from Kersta and other speech scientists. During the first few years the voice identification method was used only as an investigative aid.

The first published court opinion to rule on the admissibility of voice identification analysis came in United States v. Wright, 17 USCMA 183, 37 CMR 447 (1967). This was a court martial proceeding in which the appellate court affirmed the admission of spectrographic voice identification evidence by the board of review. The lengthy dissent by Judge Ferguson, based on the requirements for acceptance of scientific evidence spelled out in Frye v. United States, 293 Fed. 1013 (CA DC Cir) (1923), was the beginning of a controversy which continues today.

The first non-military court to review the admissibility of voice identification evidence was the New Jersey Supreme Court in State v. Cary.9 In this case the court stated that "the physical properties of a person's voice are identifying characteristics".10 The court also noted that trial courts in the states of New York and California have admitted voice identification evidence but that these admissions have not been the subject of appellate review.11 The court declined to rule on the admissibility issue and remanded the case to determine if the equipment and technique were sufficiently accurate to provide results admissible as evidence. The Superior Court of New Jersey, on appeal from a denial of admission after remand, held that the majority of evidence "indicates, not that the technique is not accurate and reliable, but rather that it is just too early to tell and at this time lacks the required scientific acceptance".12 The New Jersey Supreme Court reviewed this decision and once again remanded for additional fact finding "in light of the far-reaching implications of admission of voiceprint evidence".13 The State of New Jersey was unable "to furnish any new and significant evidence" by the third time the New Jersey Supreme Court reviewed this issue and as such affirmed the trial court's opinion excluding voice identification evidence.14

California came to a similar holding when the issue first reached the appellate level in People v. King.15 The State brought in Lawrence Kersta as the voice identification expert to testify as to the reliability of the technique. The defense brought in seven speech scientists and engineers to rebut Kersta's claims. The court held that "Kersta's claims for the accuracy of the 'voiceprint' process are founded on theories and conclusions which are not yet substantiated by accepted methods of scientific verification".16 The court cited the Frye test as the proper standard for admissibility.17 The court also left the door open for future admission by saying when voice identification evidence has achieved the necessary degree of acceptance they will welcome its use.18

In State ex rel. Trimble v. Heldman 19, the Supreme Court of Minnesota held that "spectrograms ought to be admissible at least for the purpose of corroborating opinions as to identification by means of ear alone".20 The court was impressed by the testimony of Dr. Oscar Tosi who had previously testified against the use of spectrographic voice identification evidence in courtrooms, but after extensive research and experimentation now described the technique as "extremely reliable".21 The court made reference to the Frye test and to the scientific community's acceptance of Dr. Tosi's study, but did not specifically apply the Frye test as the standard for the admissibility of the voice identification evidence.22 In discussing the issue of admissibility the court held that it was the job of the factfinder to weigh the credibility of the evidence.

"The opinion of an expert is admissible, if at all, for the purpose of aiding the jury or the factfinder in a field where he has no particular knowledge or training. The weight and credibility to be given to the opinion of an expert lies with the factfinder. It is no different in this field than in any other".23

In 1972 the third and fourth District Courts of Florida, in separate opinions, held admissible the use of spectrographic voice identification evidence.24 The court in Worley held that the voice identification evidence was admissible to corroborate the defendant's identification by other means. The court stated that the technique had attained the necessary level of scientific reliability required for admission, but since it was only offered as corroborative evidence, the court refused to comment as to whether such evidence alone would be sufficient to sustain the identification and conviction.25

The third District Court of Appeals of Florida did not limit the admission of spectrograph evidence to corroborative status. In the Alea opinion the court does not mention the Frye test as the standard to be used for admission, but rather states that "such testimony is admissible to establish the identity of a suspect as direct and positive proof, although its probative value is a question for the jury".26

In the case of State v. Andretta 27, the New Jersey Supreme Court stated that there was much more support for the admission of spectrographic voice identification evidence than at the time they decided Cary, but refused to address the issue further since the only issue before them was whether the defendant should be compelled to speak for a spectrographic voice analysis.28

In California the Court of Appeal affirmed the trial court's admission of voice identification evidence in the case of Hodo v. Superior Court.29 Here the court found the requirements of Frye had been met in that there was now general acceptance of spectrographic voice identification by recognized experts in the field. The court cited Dr. Tosi's testimony that "those who really are familiar with spectrography, they are accepting the technique".30 Tosi also pointed out that the general population of speech scientists is not familiar with this technique and thus cannot form an opinion on it.31

The court in United States v. Samples 32 held that the Frye test of general acceptance precludes too much relevant evidence for purposes of the fact determining process at a revocation of probation hearing and the court allowed the use of spectrographic voice identification evidence to corroborate other identification evidence.33

In 1974 the court in United States v. Addison 34 rejected the admission of voice identification evidence saying that such evidence "is not now sufficiently accepted" and as such the requirements of the Frye test were not met.35 At the trial the court heard from two experts endorsing the technique, Dr. Tosi and a recent convert to the reliability of the technique, Dr. Ladefoged. Only one expert, Dr. Stuart, testified that he was still skeptical of the technique and thought that most of the scientific community was also.36 Although the trial court's admission of spectrographic voice identification evidence was held to be error, the appellate court refused to overturn the conviction due to the overwhelming amount of other evidence supporting the conviction.37

Attempted disguise or mimic were the grounds the California Court of Appeal used to reverse a conviction based in part on spectrographic voice identification in the case of People v. Law.38 The court found that "with respect to disguised and mimicked voices in particular, the prosecution did not carry out its burden of proof to demonstrate that the scientific principles pertaining to spectrographic identification were beyond the experimental and into the demonstrable stage or that the procedure was sufficiently established to have gained general acceptance in the particular field in which it belongs".39 The main concern of the court was that no experimentation had been completed studying the effects of attempts to disguise or mimic on the accuracy of the identification process.

Without mentioning the Frye test this court used the standards set in Frye as the test of admissibility although the court seemed to be limiting the scope of the opinion to cases involving disguise or mimic.

In United States v. Franks 40, the Sixth Circuit Court of Appeals held spectrographic voice identification evidence to be admissible. The court said it was "mindful of a considerable area of discretion on the part of the trial judge in admitting or refusing to admit evidence based on scientific processes".41 Quoting from United States v. Stifel 42, the court pointed out that "neither newness nor lack of absolute certainty in a test suffices to render it inadmissible in court. Every useful new development must have its first day in court. And court records are full of the conflicting opinions of doctors, engineers and accountants...".43 The court in Franks found that extensive review was given to the qualifications of the experts and that there was opportunity to cross-examine the experts to determine the proper weight to be given such evidence.

The Massachusetts Supreme Court, in Commonwealth v. Lykus 44, allowed the admission of spectrographic voice identification evidence saying that the opinions of a qualified expert should be received and the considerations similar to those expressed in Frye should be for the fact finder as to the weight and value of the opinions. The court gave greater weight to those experts who had had direct and empirical experience in the field as opposed to those who had only performed a theoretical review of that work.45 The court also stated that "neither infallibility nor unanimous acceptance of the principle need be proved to justify its admission into evidence".46 The Massachusetts Supreme Court again, that same year, found no error in the use of spectrographic voice identification evidence in the case of Commonwealth v. Vitello.47

The Fourth Circuit Court of Appeals, in the case of United States v. Baller 48, allowed the admission of spectrographic voice identification evidence saying unless it is prejudicial or misleading to the jury, it is better to admit relevant scientific evidence in the same manner as other expert testimony and allow its weight to be attacked by cross-examination and refutation.49 The court listed six reasons supporting admission: the expert was a qualified practitioner, evidence in voir dire demonstrated probative value, competent witnesses were available to expose limitations, the defense demonstrated competent cross-examination, the tape recordings were played for the jury, and the jury was told they could disregard the opinion of the voice identification expert.50

Voice identification evidence was admitted by the Sixth Circuit Court of Appeals in United States v. Jenkins 51 using the same logic as in Baller. Here the court said that the issue of admissibility was within the discretion of the trial judge and that once a proper foundation had been laid the trier of fact was able to assign proper weight to the evidence.52

In 1976 the New York Supreme Court pointed out, in the case of People v. Rogers 53, that fifty different trial courts had admitted spectrographic voice identification evidence, as had fourteen out of fifteen U. S. District Court judges, and only two out of thirty-seven states considering the issue had rejected admission.54 The Rogers court stated that this technique, when accompanied by aural examination and conducted by a qualified examiner, had now reached the level of general scientific acceptance by those who would be expected to be familiar with its use, and as such, has reached the level of scientific acceptance and reliability necessary for admission.55 The court also pointed out that other scientific evidence processes are regularly admitted which are as reliable as, or less reliable than, spectrographic voice identification: hair and fiber analysis, ballistics, forensic chemistry and serology, and blood alcohol tests.56

The Supreme Court of California finally put an end to the see-saw ride of admissibility in that state in People v. Kelly 57 by rejecting admission because of insufficient showing of support. "Although voiceprint analysis may indeed constitute a reliable and valuable tool in either identifying or eliminating suspects in criminal cases, that fact was not satisfactorily demonstrated in this case".58 In this case the court seemed to have the most trouble with the fact that the only expert provided to lay the foundation for admission was the technician who performed the analysis, saying that a single witness cannot attest to the views of the scientific community on this new technique and that this witness, who may not be capable of a fair and impartial evaluation of the technique since he has built a career on it, lacked the academic credentials to express an opinion as to the acceptance of the technique by the scientific community.59

In United States v. McDaniel 60, it appears that the District of Columbia Circuit Court of Appeals would have liked to admit the spectrographic voice identification evidence but had to reject it because the shadow of the Addison decision of two years past "looms over our consideration of this issue".61 The court held the admission of the voice identification evidence to be harmless error in that the rest of the evidence was overwhelming. The court did recognize the trend toward admissibility and contemplated that it may be time to reexamine the holding of Addison "in light of the apparently increased reliability and general acceptance in the scientific community".62

The Supreme Court of Pennsylvania rejected admission in Commonwealth v. Topa 63 holding that the technician's opinion alone will not suffice to permit the introduction of scientific evidence into a court of law.64 This was the same situation, in fact the same single expert, which confronted the Kelly court.

In People v. Tobey 65 the Michigan Supreme Court found, by applying the Frye test, that the trial court erred in admitting spectrographic voice identification evidence. The court found that neither of the two experts testifying in favor of the technique could be called disinterested and impartial experts in that both had built their reputations and careers on this type of work.66 The court pointed out that not all courts require independent and impartial proof of general scientific acceptability and was quick to add that this decision was not intended in any way to foreclose the introduction of such evidence in future cases where there is demonstrated solid scientific approval and support of this new method of identification.67

In admitting voice identification evidence, the United States District Court for the Southern District of New York, in United States v. Williams 68, found that the requirements of the Frye test were met when the technique was performed "by aural comparison and spectrographic analysis".69 The court stated that the concerns of the defendant that this technique had a mystique of scientific precision which may mask the ultimate subjectivity of spectrographic analysis, although they were valid concerns, could be alleviated by action other than suppression of the evidence, such as opposing expert opinion and jury instructions allowing the jury to determine the weight, if any, of the evidence.70

In People v. Collins 71, the Supreme Court of New York rejected admission of spectrographic voice identification evidence saying that the Frye test alone was insufficient to determine admissibility and must be used in conjunction with a test of reliability.72 The court found that the proponents of the technique were in the minority and that the remainder of the relevant scientific community either expressed opposition or expressed no opinion.73

In Brown v. United States 74, the District of Columbia Court of Appeals rejected the use of voice identification evidence, but held the error to be harmless and affirmed the conviction in light of overwhelming non-spectrographic identification of the defendant as perpetrator of the crime. One of the main problems in this case was the fact that the exemplar of the defendant's voice was recorded in a defective manner but used anyway after the tape speed malfunction had been corrected in a laboratory. Dr. Tosi, testifying as a proponent of the technique, stated that the technician should not have used the defective recording as a basis of comparison.75 The court held the technique was not shown to be sufficiently reliable and accepted within the scientific community to permit its use in this criminal case, but that this decision did not foreclose a future decision as to admissibility of the technique.76

In the civil case of D'Arc v. D'Arc 77, the court found that the requirements of the Frye test had not been met and thus the evidence could not be admitted. The court believed that even with proper instructions to the contrary, this type of evidence "has the potentiality to be assumed by many jurors as being conclusive and dispositive" and thus should be subject to strict standards of admission.78

The court in State v. Williams 79 refused to apply the Frye standard citing instead the Maine Rules of Evidence, Rule 401, which states "all relevant evidence is admissible", with relevant being described as evidence having any tendency to make the existence of any fact that is of consequence to the determination of the action more probable or less probable than it would be without the evidence.80

In Reed v. State 81 the court applied the Frye standard to determine admissibility with a rather wide definition of the scientific community which included "those whose scientific background and training are sufficient to allow them to comprehend and understand the process and form a judgment about it".82 The court said the trial court erred in using the more restricted definition of scientific community, "those who are knowledgeable, directly knowledgeable through work, utilization of the techniques, experimentation and so forth" and did not mean the broad general scientific community of speech and hearing science.83

In a fifty-one page dissent to the Reed decision 84, Judge Smith points out that the Frye standard is much criticized and has never been adopted in the state of Maryland, that this decision is out of step with other courts on related issues of fingerprints, ballistics, x-rays and the like, that this decision is out of step with prior Maryland holdings on expert testimony, that the majority of reported opinions have accepted such evidence, and that even if Frye were applicable it is satisfied.

In United States v. Williams 85 the court did not apply the Frye standard but did note that acceptance of the technique appeared strong among scientists who had worked with spectrograms and weak among those who had not.86 The court then focused on the reliability of the technique and the tendency to mislead. As to the reliability of the technique, the court noted the small error rate, 2.4% false identification, the existence and maintenance of standards of analysis, and the conservative manner in which the technique was applied.87 As to the tendency to mislead, the court felt that adequate precautions were taken in that the jury could view the spectrograms and listen to the recording and the expert's qualifications, the reliability of the equipment and the technique were subject to scrutiny by the defense, and the jury was instructed that they were free to disregard the testimony of the experts.88

In the case of People v. Bein 89 the court based admissibility on a two-pronged test: general acceptance by the relevant scientific community, and competent expert testimony establishing reliability of the process. The court found that both tests had been met and allowed the admission of the evidence.90 The court described the relevant scientific community "to be that group of scientists who are concerned with the problems of voice identification for forensic and other purposes".91 The court also suggested that "it is no different in this field of expertise than in other fields, that where experts disagree, it is for the finder of fact to determine which testimony is the more credible and therefore more acceptable".92

The Ohio Supreme Court, in State v. Williams 93, relied on their own state rules of evidence, as did the Maine court in Williams, and rejected the use of the Frye standard. The court refused "to engage in scientific nose counting for the purpose of whether evidence based on newly ascertained or applied scientific principles is admissible".94 The court noted, with approval, the playing of the recordings to the jury and that the jury was free to reject the testimony of the expert.95

In that same year, right across the border in Indiana, the court in Cornett v. State 96 rejected admission of voice identification evidence saying the conditions set out in Frye had not been met. Here the court used a wide definition of the scientific community which included linguists, psychologists and engineers who use voice spectrography for identification purposes.97 Although the court held that the trial court erred in admitting the evidence, the error was found to be harmless and the conviction affirmed.98

Likewise the court in State v. Gortarez 99 rejected the admission of voice identification evidence but affirmed the conviction holding such admission to be harmless error. The court also used a wide definition of the scientific community in applying the Frye standard including experts in the fields of acoustical engineering, acoustics, communication electronics, linguists, phonetics, physics and speech communications and found that there was not general acceptance among these scientists.100

In the case of United States v. Love 101, the admissibility of spectrographic voice identification was not at issue. The Fourth Circuit Court of Appeals was reviewing whether the trial judge's comments about a voice identification expert were considered error. The trial judge told the jury that they, the jury, were to assign whatever weight they wanted to the testimony of the expert and even disregard his testimony if they "should conclude that his opinion was not based on adequate education, training or experience, or that his professed science of voice print identification was not sufficiently reliable, accurate, and dependable."102 The Court of Appeals found no error in the judge's instruction to the jury.

In admitting spectrographic voice identification evidence, the Supreme Court of Rhode Island, in State v. Wheeler 103, declined to apply the Frye standard holding instead "the law and practice of this state on the use of expert testimony has historically been based on the principle that helpfulness to the trier of fact is the most critical consideration".104 The court reviewed the cases around the country, both state and federal, and noted that the majority of circuit courts that have considered admission of spectrographic evidence have decided in favor of its admission.105 The court pointed out that the defendant had all the proper safeguards such as cross-examination, rebuttal experts, and the jury had the right to reject the evidence for any one of a number of reasons.106

In State v. Free 107 the Court of Appeals of the State of Louisiana did not rely on the Frye test for guidance in determining the admissibility of spectrographic voice identification evidence but instead applied a balancing test set forth in State v. Catanese.108 One individual, accepted as an expert in voice identification, testified as to the theoretical and technical aspects of the spectrographic voice analysis method. No other witnesses were called to either support or show fault with the admission of the voice identification testimony. The Court of Appeals found that voice identification evidence, when offered by a competent expert and obtained through proper procedures, "is as reliable as other kinds of scientific evidence accepted routinely by courts" and "can be highly probative".109 Using the Catanese balancing test the Court of Appeals found that the trier of fact was likely to give almost conclusive weight to the voice identification expert's opinion, consequently misleading the jurors. The Court of Appeals was also concerned that there were not enough experts available who could critically examine the validity of a voice identification determination in a particular case. Nine rules were suggested as a basis on which voice identification evidence could be accepted.110 The Court of Appeals held that Catanese prohibits admission of the voice identification evidence at this time 111 and found the admission of that evidence to be harmless error.

In 1987 the Supreme Court of New Jersey again addressed the issue of admissibility of spectrographic evidence in the civil case of Windmere v. International Insurance Company.112 In affirming the judgment of the Appellate Division, the Supreme Court of New Jersey ruled that the Appellate court's affirmation of the admission of the spectrographic evidence by the trial court was improper. The court stated the admissibility of the spectrographic voice analysis is based on the scientific technique having sufficient scientific basis to produce uniform and reasonably reliable results and contribute materially to the ascertainment of the truth 113, a standard the court admits bears "a close resemblance to the familiar Frye test".114 The court relies upon the "general acceptance within the professional community" to establish the scientific reliability of the voice identification process. In reaching a determination of general acceptance, the court relied on a three-prong test which includes: (1) the testimony of knowledgeable experts, (2) authoritative scientific literature, and (3) persuasive judicial decisions which acknowledge such general acceptance of expert testimony.115 The court found that none of the three prongs indicated that there was a general acceptance of spectrographic voice identification in the professional community. The court criticized the proponent experts as being too closely tied to the development of this identification analysis to represent the opinions of the community.116 The court found that the trial court did not undertake to resolve the issue of conflicting scientific literature and they would make no effort to resolve the conflict.117 The court also reviewed the judicial decisions regarding admissibility and found a split among the jurisdictions as to the reliability of the identification process.118

The New Jersey Supreme Court specifically limited its decision in Windmere excluding spectrographic voice identification evidence to the present case. The court stated that the future use of voice identification evidence "as a reasonably reliable scientific method may not be precluded forever if more thorough proofs as to reliability are introduced" 119 and they will "continue to await the more conclusive evidence of scientific reliability".120

The Court of Appeals of Texas in the case of Pope v. Texas 121 refused to address the issue of admissibility of voice identification evidence stating that "the overwhelming evidence against appellant renders this error, if any, harmless".122 Justice McClung in his dissenting opinion states that the trial court did err in admitting the voice identification evidence and that the error was not harmless.123 He suggests that the Frye test is the proper standard for assessing the admissibility issue and that the "relevant scientific community" should be defined broadly.124 When this aspect of the test is so defined the "general acceptability" criterion is not met.

In February of 1989, the United States Court of Appeals for the Seventh Circuit affirmed the decision of the United States District Court for the Northern District of Illinois admitting spectrographic voice identification evidence in the criminal case of United States of America v. Tamara Jo Smith.125 The Seventh Circuit now joins the Second, Fourth and Sixth Circuits in affirming the use of spectrographic voice identification evidence.126 The Appellate court used the Frye standard to hold expert testimony concerning spectrographic voice analysis admissible in cases where the proponent of the testimony has established a proper foundation.127 The court noted that this technique was not one-hundred percent infallible and that the entire scientific community does not support it; however, neither infallibility nor unanimity is a precondition for general acceptance of scientific evidence.128 The Seventh Circuit found that a proper foundation had been established in that the expert testified to the theory and the technique, the accuracy of the analysis and the limitations of the process.129 The court noted that variations from the norm result in an increase of false eliminations.130 The jury was not likely to be misled in that they had the opportunity to hear the recordings, see the spectrograms, hear the limitations of the process, witness a rigorous cross-examination of the expert and could reject the testimony of the expert.131

In United States v. Maivia,132 the United States District Court admitted spectrographic evidence after a four day hearing on the issue. The court examined the various subtests of the Frye test and found that spectrographic voice identification evidence met these tests. The court also noted that "inasmuch as the admissibility of spectrographic evidence to identify voices has received judicial recognition, it is no longer considered novel within the Frye test and consequently the test is inapplicable".133 The court also looked to the Federal Rules of Evidence, specifically Rule 403, in deciding the admissibility of spectrographic voice identification evidence.

In affirming the order of the Appellate Division, the New York Supreme Court, in the case of People v. Jeter 134, concluded that the trial court was not able to properly determine that voice identification evidence is generally accepted as reliable based on case law and existing literature. The Court stated that the trial court should have held a preliminary inquiry into the reliability of voice spectrographic evidence. In the light of the other evidence, the admission of the voice identification evidence was held to be harmless error in this case.

STANDARDS OF ADMISSIBILITY

Prior to 1993 there were two main standards of admissibility which had been applied to voice identification evidence: the Frye test and the Federal Rules of Evidence (and the rules of evidence of the various states). The Frye test originated from the Court of Appeals of the District of Columbia135 in a decision rejecting admissibility of a systolic blood pressure deception test (a forerunner of the polygraph test). The court stated that admission of this novel technique was dependent on its acceptance by the scientific community.

"Just when a scientific principle or discovery crosses the line between the experimental and demonstrable stages is difficult to define. Somewhere in this twilight zone the evidential force of the principle must be recognized, and while courts will go a long way in admitting expert testimony deduced from a well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs."136

Out of forty published opinions prior to 1993 deciding the admissibility of voice identification evidence, twenty-three courts applied the Frye standard or a standard very similar to Frye. Sixteen of the twenty-three courts rejected the admission of such evidence. Six of these courts held the admission of voice identification evidence by the trial court was harmless error and affirmed the conviction or judgment. Eight of the sixteen stated that although voice identification evidence had not yet met the required standard of scientific acceptability, their decision was not intended to foreclose future admission when such standards were met. Two of these courts denied admission because they felt a single witness could not speak for the entire scientific community regarding the acceptance issue.

Seven courts applied the test and found the requirements of Frye had been met. Of the thirteen courts applying a standard of admissibility different from Frye, only one, the Free court137, rejected voice identification evidence.

There are three problems with the Frye standard: at what point a principle becomes "sufficiently established," at what point "general acceptance" is reached, and what the proper definition of "the particular field in which it belongs" is.

These three areas have been major stumbling blocks for the courts in deciding the issue of the admissibility of voice identification evidence due to the small number of voice scientists who have performed research in this field. The trial court in People v. Siervonti138 noted the lack of research in this area, saying "one only wishes that the last twelve years had been spent in research and not in attempting to get the method into the courts."139

The Frye test has been criticized as not being the appropriate test to use for the admission of voice identification evidence. This standard was established and applied to the admission of a type of evidence which is very different from voice identification. In Frye the court was concerned with the admission of a test designed to determine if a person was telling the truth or not. This type of evidence invades the province of the finder of fact. Voice identification evidence belongs in the general classification of identification evidence which does not impinge on the role of the finder of fact. As such it shares common traits with the other identification sciences of fingerprinting, ballistics, handwriting, and fiber, serum and substance identification.

Another criticism of the application of the Frye test as the standard for admission of voice identification evidence is that general acceptance by the scientific community is the proper condition for taking of judicial notice of scientific facts.

McCormick states that general scientific acceptance is a proper condition for taking judicial notice of scientific facts, but not a criterion for the admissibility of scientific evidence.140

The court in Reed v. State141 seemed to note this difference between the standard for the taking of judicial notice and that for admission of evidence such as voice identification. The court said that validity and reliability may be so broadly accepted in the scientific community that the court may take judicial notice of them. If they cannot be judicially noticed, then the reliability must be demonstrated before the evidence can be admitted.142 The court then applied the Frye test, general acceptance by the scientific community, to determine reliability and thus admissibility.

Scientific evidence has long been admitted before it was judicially noticed, as with the case of fingerprints. The admission of fingerprint identification evidence was first challenged in the case of People v. Jennings143 in 1911. The court in Jennings allowed the admission of fingerprint evidence saying "whatever tends to prove any material fact is relevant and competent".144 It was not until thirty-three years later that fingerprint evidence was first judicially noticed.145

The majority of courts which have decided the issue of admissibility in favor of allowing voice identification into the courtroom have used similar standards which permit the finder of fact to hear the evidence and determine the proper weight to be assigned to it. Their logic runs parallel to the Federal Rules of Evidence which state that all relevant evidence is admissible with the word "relevant" being defined as evidence tending to make the existence of any fact that is of consequence to the determination of the action more probable or less probable than it would be without the evidence.146 A qualified expert may testify to his opinion if such opinion will assist the trier of fact in better understanding the evidence.147

Many of the courts which have upheld the admission of voice identification evidence have done so because the trial court had set up a number of precautions to ensure the evidence was viewed in its proper light. These precautions include allowing the jury to see the spectrograms of the voices in question, allowing the jury to hear the recordings from which the spectrograms were produced, subjecting the expert's qualifications and opinions, as well as the reliability of the equipment and technique, to scrutiny by the other side, making competent witnesses available to expose limitations in the process, and instructing the jury that they were free to assign whatever weight, if any, they felt the evidence deserved.

The United States Supreme Court in 1993 changed the long-standing law of admissibility of scientific expert evidence by rejecting the Frye test as inconsistent with the Federal Rules of Evidence in the case of Daubert v. Merrell Dow Pharmaceuticals.148 The Court held that the Federal Rules of Evidence and not Frye were the standard for determining admissibility of expert scientific testimony. Frye's "general acceptance" test was superseded by the Federal Rules' adoption. Rule 702 is the appropriate standard to assess the admissibility of scientific evidence. The Court derived a reliability test from Rule 702.

In order to qualify as scientific knowledge, an inference or assertion must be derived by the scientific method. Proposed testimony must be supported by appropriate validation - i.e., good grounds, based on what is known. In short, the requirement that an expert's testimony pertain to scientific knowledge establishes a standard of evidentiary reliability.149

The Daubert decision concerns statutory law and not constitutional law. The Court held that the Federal Rules, not Frye, govern admissibility. The only Federal Circuit to reject spectrographic voice analysis has been the District of Columbia. Daubert may cause the District of Columbia to change its stance the next time such evidence is introduced.

Since Daubert is not binding on the states, it will be difficult to determine just how much impact Daubert will have on the admissibility standards of the states. Many states have adopted evidence rules based on the Federal Rules of Evidence and may not be affected by this holding. Other states which have adopted the Frye test will have to decide to either continue following Frye or change their standard to Daubert. The Arizona Supreme Court declined to follow Daubert, saying that it was "not bound by the United States Supreme Court's non-constitutional construction of the Federal Rules of Evidence when we construe the Arizona Rules of Evidence."150

RESEARCH STUDIES

The studies that have been produced over the years have run the gamut in type, parameter, and result. A quick review of the available published data would leave one with the impression that the spectrographic method of voice identification was only somewhat more accurate than flipping a coin. The diversity of the relatively low number of studies and the range of results have only added to the confusion as to the reliability and validity of this method of identification. When one takes the time and expends the effort to analyze the studies in this field, a very different conclusion becomes evident. When the individual parameters of the studies are taken into account (who was being evaluated, what information was given to the examiner to assess, and what limitations were placed on the examiner's conclusions), a much clearer picture of the accuracy of the spectrographic voice identification method develops. The picture is not one of a marginally accurate technique but rather one that clearly shows that a properly trained and experienced examiner, adhering to internationally accepted standards, will produce a highly accurate result. The studies also show that as the level of training diminishes and/or the conclusions an examiner may reach are artificially limited, the error rate goes up dramatically.

The training for accurately performing the spectrographic voice identification method has been established as requiring completion of (1) a formal course of study, usually 2 to 4 weeks in duration, in the basics of spectrographic analysis, (2) two years of study completing 100 voice comparison cases, usually in a one-to-one relationship with a recognized expert, and (3) examination by a board of experts in the field of spectrographic voice identification analysis.

For the most accurate results from the spectrographic voice identification method, a professional examiner (1) will require the original recordings or the best quality re-recordings if the original is not available; (2) will perform a critical aural review of the suspect and known recordings; (3) will produce sound spectrograms of the comparable words and phrases; (4) will produce a comparison recording juxtaposing the known and unknown speech samples; (5) will evaluate the evidence and classify the results into one of five standard categories [1 - positive identification, 2 - probable identification, 3 - positive elimination, 4 - probable elimination, and 5 - no decision]. The final decision is reached through a combined process of aural and visual examination.
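
The five result categories, and the way a decision is counted as an error in the studies discussed in this paper, can be sketched in a few lines of Python. This is only an illustrative sketch: the names Decision and score are hypothetical, and scoring a probable decision the same way as a positive one is an assumption made here, not a rule stated by any of the studies.

```python
from enum import Enum


class Decision(Enum):
    """The five standard result categories listed in the text."""
    POSITIVE_IDENTIFICATION = 1
    PROBABLE_IDENTIFICATION = 2
    POSITIVE_ELIMINATION = 3
    PROBABLE_ELIMINATION = 4
    NO_DECISION = 5


IDENTIFICATIONS = {Decision.POSITIVE_IDENTIFICATION, Decision.PROBABLE_IDENTIFICATION}
ELIMINATIONS = {Decision.POSITIVE_ELIMINATION, Decision.PROBABLE_ELIMINATION}


def score(decision, same_speaker):
    """Label one comparison against a known (hypothetical) ground truth.

    Assumption: probable decisions are scored like positive ones.
    """
    if decision is Decision.NO_DECISION:
        # A no decision result is neither correct nor an error.
        return "no decision"
    if decision in IDENTIFICATIONS:
        return "correct" if same_speaker else "false identification"
    # Remaining categories are eliminations.
    return "false elimination" if same_speaker else "correct"
```

Under this assumption, score(Decision.PROBABLE_ELIMINATION, same_speaker=True) is labeled a false elimination, while a no decision result is never counted as an error.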

It is important to remember that the spectrographic method of voice identification is a process that interweaves the visual analysis of the sound spectrograms with the critical aural examination of the sounds being viewed. Taking the results from all of the studies produced shows that if the examiner's ability to analyze both the graphic representations of the voice and the aural cues found in the recordings is limited or restricted, accuracy suffers. Likewise, the amount of training has a direct bearing on the level of accuracy of the results.

In a survey of 18 studies151 of the accuracy of the spectrographic voice identification method, the results fall into two categories: examiners with proper training, using standard procedures, produce very accurate results, whereas those with inadequate training, using limited analysis methods, produce inaccurate results.

In a 1975 study152 authored by Lt. L. Smrkovski of the Voice Identification Unit of the Michigan State Police, error rates in voice identification analysis comparisons, based on three levels of training and experience, were evaluated. The following table summarizes the results of that study.

Error type      Novice    Trainee    Professional
False Ident.    5.0%      0.0%       0.0%
False Elim.     25.0%     0.0%       0.0%
No Decision     2.5%      2.5%       7.5%

Lt. Smrkovski's results show that proper training is essential. The fact that his results show a higher no decision rate among the professional examiners than the trainee examiners may indicate that the professional is a bit more cautious in his analysis than the trainee.

Mark Greenwald, in his 1979 thesis153 for his M.A. degree at Michigan State University, studied the performance of three professional examiners (each with eight years' experience) and five trainees (each with less than two years' experience) using standard spectrographic voice identification methods (visual and aural) and result classifications. Greenwald found that the professional examiners produced no errors when using full frequency bandwidth recordings. When the frequency bandwidth was restricted, the professional examiners still produced no errors, but did increase their percentage of no decision classifications. Greenwald also found that the training level was an important factor and that the trainees in this study had an error rate of 6.1% for false identifications in the restricted frequency bandwidth trials.

In 1986, the Federal Bureau of Investigation published a survey154 of 2000 forensic voice identification comparisons completed by FBI examiners over a period of fifteen years under actual law enforcement conditions.155

The examiners had a minimum of two years' experience, had completed over 100 actual cases, had completed a basic two-week training course, and had received formal approval by other trained examiners.156

The results of the survey are depicted in the chart 157 below.

DECISIONS               NUMBER    PERCENT (%)
No or low confidence    1304      65.2
Eliminations            378       18.9
Identifications         318       15.9

ERRORS                  NUMBER    PERCENT (%)
False eliminations      2         0.53
False identification    1         0.31

The FBI results are consistent with the Smrkovski study in that properly trained examiners, utilizing the full range of procedures, produce quite accurate results.
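
The percentages in the FBI chart can be checked with simple arithmetic. The short Python sketch below assumes, based on the reported figures, that the decision percentages are taken against all 2000 comparisons, while each error percentage is taken against the number of decisions of that type (false eliminations against eliminations, false identifications against identifications). The variable names are illustrative only.

```python
# Counts are taken from the FBI survey chart. The choice of denominators is
# an assumption inferred from the reported percentages.
TOTAL_COMPARISONS = 2000

decisions = {
    "No or low confidence": 1304,
    "Eliminations": 378,
    "Identifications": 318,
}

# Each error count is paired with the decision type it is measured against.
errors = {
    "False eliminations": (2, "Eliminations"),
    "False identification": (1, "Identifications"),
}


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


decision_pct = {name: percent(n, TOTAL_COMPARISONS) for name, n in decisions.items()}
error_pct = {name: percent(n, decisions[base]) for name, (n, base) in errors.items()}

for name, pct in decision_pct.items():
    print(f"{name}: {pct:.1f}% of all comparisons")
for name, pct in error_pct.items():
    base = errors[name][1].lower()
    print(f"{name}: {pct:.2f}% of {base}")
```

Read this way, the three errors amount to 0.15% of all 2000 comparisons, but the chart reports them against the smaller groups of eliminations and identifications, which gives 0.53% and 0.31%.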

By way of contrast, the 1976 study158 by Alan Reich used four speech science graduate students with previous experience with speech spectrograms (but untrained in spectrographic voice identification analysis) to examine, using visual comparison only, nine excerpted words. This study produced an accuracy rate in the undisguised trials of 56.67%. When disguise was introduced into this study paradigm the accuracy rate decreased significantly.

Taken as a whole the 18 studies support the conclusion that accurate results will be obtained only through the combined use of the aural and visual components of the spectrographic voice identification method as performed by a properly trained examiner adhering to the established standards. Those studies with poor accuracy results are important in that they demonstrate the weaknesses of improperly performed examinations that do not adhere to the internationally accepted professional standards.

A large part of the debate over the admissibility of spectrographic voice identification analysis in the courts appears due to the fact that the parameters of these studies have not adequately been demonstrated to the courts in the necessary detail which would allow the courts to examine the overall meaning of these studies. Many of these studies look at only one or two aspects of the spectrographic voice identification method. Frequently the results of these restricted scope studies have been misapplied to the entire spectrographic voice identification method resulting in inaccurate information being used as the basis for deciding the admissibility of spectrographic voice identification analysis. It is important to provide an accurate picture of all the studies so the courts will have the foundational information necessary to make an informed decision regarding the admissibility of spectrographic voice identification analysis.

CONCLUSION

The technique of voice identification by means of aural and spectrographic comparison is still an unsettled topic in law. Although the spectrographic voice identification method has progressed greatly since it was first introduced to a court of law back in the mid-1960s, it still faces stiff resistance on the issue of admissibility in the courts today. One of the reasons for such opposition regarding admissibility is that the method has evolved greatly since its initial application. Court decisions based on early methods of voice identification analysis are not applicable to the methods used today. No longer are voices compared on the basis of a limited group of key words. Today's aural/spectrographic voice identification method takes advantage of the latest in technological advancements and interweaves several analyses into one procedure to produce an accurate opinion as to the identity of a voice. This modern technique combines the experience of a trained examiner performing the visual analysis of the spectrograms and aural analysis of the recordings with the use of the latest instruments modern technology has to offer, all in a standardized methodology to assure reliability. Court decisions reviewing the early voice identification cases may not be relevant to present day cases because the older decisions were based on less sophisticated procedures. Most of the courts which have rejected admission have been aware of continuing work in this field and have specifically left the door open as to future admissibility.

Proper presentation and explanation of the research pertaining to spectrographic voice identification analysis will allow the courts to better understand the accuracy and reliability of the spectrographic voice identification method. When the research is properly presented, the studies show that properly trained individuals, using standard methodology, produce accurate results.

The current trends in the admissibility issue of voice identification evidence indicate that courts are more willing to allow the evidence into the courtroom when a proper foundation has been established which then allows the trier of fact to determine the weight to be assigned to the evidence.

TABLE OF CASES

1. FRYE v US 293 F 1013 (D.C. Ct. App. 1923)

2. US v WRIGHT 37 CMR 447 (1967)

3. STATE v CARY 230 A.2d 384 (N.J. 1967)

4. STATE v CARY 239 A.2d 680 (N.J.Super. 1968)

5. PEOPLE v KING 266 C.A.2d 437 (1968)

6. STATE v CARY 250 A.2d 15 (N.J. 1969)

7. STATE v CARY 264 A.2d 209 (N.J. 1970)

8. STATE EX REL. TRIMBLE v HEDMAN 192 N.W.2d 432 (Minn. 1971)

9. US v RAYMOND 337 F.Supp. 641 (DCDC 1972)

10. WORLEY v STATE 263 So.2d 613 (Fla. 1972)

11. ALEA v STATE 265 So.2d 96 (Fla. 1972)

12. US v ASKINS 351 F.Supp. 408 (1972)

13. STATE v ANDRETTA 296 A.2d 644 (N.J. 1972)

14. HODO v SUPERIOR COURT 30 C.A.3d 778 (Calif. 1973)

15. PEOPLE v CHAPTER 13 CrL 2479 (Calif. 1973)

16. US v SAMPLE 378 F.Supp. 44 (Penn. 1974)

17. US v ADDISON 498 F.2d 741 (DCDC 1974)

18. PEOPLE v LAW 40 C.A.3d 69 (Calif. 1974)

19. US v FRANKS 511 F.2d 25 (6th Cir. 1975)

20. COMMONWEALTH v LYKUS 327 N.E.2d 671 (Mass. 1975)

21. COMMONWEALTH v VITELLO 327 N.E.2d 819 (Mass. 1975)

22. STATE v OLDERMAN 336 N.E.2d 442 (Oh. 1975)

23. US v BALLER 519 F.2d 463 (4th Cir. 1975)

24. US v JENKINS 525 F.2d 819 (6th Cir. 1975)

25. PEOPLE v ROGERS 385 N.Y.S.2d 228 (N.Y. 1976)

26. PEOPLE v KELLY 549 P.2d 1240 (Calif. 1976)

27. US v MCDANIEL 538 F.2d 408 (D.C. Cir. 1976)

28. COMMONWEALTH v TOPA 369 A.2d 1277 (Penn. 1977)

29. PEOPLE v EVANS 393 N.Y.S.2d 674 (1977)

30. PEOPLE v TOBEY 257 N.W.2d 537 (Mich. 1977)

31. US v WILLIAMS 443 F.Supp. 269 (S.D.N.Y. 1977)

32. PEOPLE v COLLINS 405 N.Y.S.2d 365 (1978)

33. BROWN v US 384 A.2d 647 (D.C.C.A. 1978)

34. D'ARC v D'ARC 157 N.J.Super. 553 (1978)

35. STATE v WILLIAMS 388 A.2d 500 (Me. 1978)

36. REED v STATE 391 A.2d 364 (Md. 1978)

37. US v WILLIAMS 583 F.2d 1194 (2nd Cir. 1978)

38. PEOPLE v BEIN 453 N.Y.S.2d 343 (N.Y. 1982)

39. STATE v WILLIAMS 4 OHIO ST.3d 53 (1983)

40. CORNETT v STATE 450 N.E.2d 498 (Ind. 1983)

41. STATE v GORTAREZ 686 P.2d 1224 (Ar. 1984)

42. PEOPLE v SIERVONTI, unpublished, Municipal Court of the Chico Judicial District, State of California (1985)

43. STATE v WHEELER 496 A.2d 1382 (R.I. 1985)

44. STATE v. FREE 493 So.2d 781 (La., 1986)

45. POPE v. STATE of TEXAS 756 S.W.2d 401 (Texas 1988)

46. UNITED STATES v. MAIVIA 728 F. Supp 1471 (D. Hawaii, 1990)

47. PEOPLE v. JETER 80 N.Y.2d 818 (NY 1992)

48. DAUBERT v. MERRELL DOW PHARMACEUTICALS 113 S. Ct. 2786 (1993)

APPENDIX 1

The following are summaries of studies of spectrographic voice identification and an FBI survey of forensic cases.

Greenwald, M., "The Effects of Decreased Frequency Bandwidth on Speaker Identification by Aural and Spectrographic Examination of Speech Samples", Master's Thesis, Michigan State University, 1979

Hall, M. C., "Spectrographic Analysis of Interspeaker and Intraspeaker Variables of Professional Mimicry", Master's Thesis, Michigan State University, 1975

Hazen, B., "Effects of Different Phonetic Contexts on Spectrographic Speaker Identification", 54 J. Acoust. Soc. Am. 650, 1973

Hollien, H., & McGlone, R., "The Effect of Disguise on Voiceprint Identification", In the Proceedings of the Carnahan Crime Countermeasures Conference, University of Kentucky, University of Kentucky Press, Lexington, KY, 1976

Kersta, L. G., "Voiceprint Identification", 196 Nature Magazine 1253, Dec. 29, 1962

Reich, et al., "Effects of Selected Vocal Disguises upon Spectrographic Speaker Identification", 60 J. Acoust. Soc. Am. 919, 1976

Reich & Duke, "Effects of selected vocal disguises upon speaker identification by listening", 66 J. Acoust. Soc. Am. 1023, 1979

Smrkovski, L. L., "Collaborative Study of Speaker Identification by the Voiceprint Method", 58 J. AOAC 453, 1975

Smrkovski, L. L., "Study of Speaker Identification by Aural and Visual Examination of Non-Contemporary Speech Samples", 59 J. AOAC 927, 1976

Stevens, et al., "Speaker Authentication and Identification: A Comparison of Spectrographic and Auditory Presentations of Speech Material", 44 J. Acoust. Soc. Am. 1596, 1968

Tosi, et al., "Experiment on Voice Identification", 51 J. Acoust. Soc. Am. 2030, 1972

Tosi & Greenwald, "Voice Identification by Subjective Methods of Minority Group Voices", Paper presented at the 6th Meeting of the International Association of Voice Identification, New Orleans, La., 1978

Young, M. A., & Campbell, R. A., "Effects of Context on Talker Identification", 42 J. Acoust. Soc. Am. 1250, 1967

KERSTA

1962

Examiners: 8 high school girls
Training duration: 1 week
Method: visual
Speaker population: 123
Number of words: 10 words excerpted from sentences
Context type: isolated random context
Temporal sequence: contemporary
Type of trial: closed
Total number of trials: 2000
Type of decision: forced decisions; limited sample; limited time; random context; no aural examination; examiners lacked sufficient experience
Results: closed trials; range of errors for false ID: 0.35 to 1.0%; 10 words excerpted: 0.00 to 2.0%

YOUNG & CAMPBELL

1967

Examiners: 7 PhD candidates in ASC; 3 assistant professors in ASC
Training duration: 1 week
Method: visual
Speaker population: 5 adult males
Number of words: 2 words (you/it) in isolation & excerpted from 4 short sentences
Context type: 1 word in isolation; 2 words from random context
Temporal sequence: contemporary
Type of trial: closed
Total number of trials: 1046
Type of decision: forced decisions; limited sample; random context; no aural examination; examiners not trained
Results: closed trials, range of errors for false ID
  "you" in isolation: 10.4 to 18.0%
  "it" in isolation: 22.7 to 33.0%
  "you/it" from random context in trial 1 of 15: mean error 62.7%

STEVENS

1968

Examiners: college students; 6 in the open trials, 4 in the closed trials
Training duration: 1 week
Method: aural vs. visual but not combined
Speaker population: 24 males
Number of words: catalogue of 11 words in different random order; only 1 word used in most trials
Context type: 1 to 4 words
Temporal sequence: non-contemporary (1 week)
Type of trial: closed & open
Total number of trials: 216
Type of decision: forced decisions; limited sample (1 to 4 words); random context; no aural examination; examiners not trained
Results:
  open trials, range of errors for false ID for 4 examiners/1 word: visual trials 31.0 to 47.0%; aural trials 6.0 to 8.0%
  closed trials, range of errors for false ID, 1 - 4 discrete words: visual trials 20.0 to 30.0%; aural trials 5.0 to 18.0%

TOSI ET AL

1968 - 1970

Examiners: 29 of various backgrounds
Training duration: 1 month
Method: visual
Speaker population: 250 males randomly selected from a population of 25,000
Number of words: 6 & 9 words
Context type: isolated, fixed and random context
Temporal sequence: contemporary & noncontemporary (1 month)
Type of trial: closed & open
Total number of trials: 34,992
Type of decision: forced decisions, but allowed to rate confidence level; limited sample; limited time; no aural examination; examiners lacked sufficient experience
Results: range of errors for all trials, false ID: 0.51 to 6.43%; when only 'fairly & almost' certain decisions are combined, the error of false ID reduces to 2.4%

HAZEN

1972

Examiners: college students (7 panels of 2)
Training duration: 5 lectures and 3 practice sessions
Method: visual
Speaker population: 60 males
Number of words: 5 words in the same context, 5 words physically excerpted from random conversation
Context type: fixed and random context
Temporal sequence: contemporary
Type of trial: closed & open
Total number of trials: 280
Type of decision: forced decisions; limited sample (5 words); no aural examination; random & fixed context; examiners lacked sufficient experience; used the most dissimilar spectrographic utterances; compared sounds from totally different words; studying changing phonetic context; examiners could not evaluate effects of coarticulation due to questionable word boundaries
Results:
  closed trials, errors for false ID: fixed context range 10.0 to 30.0%, mean 20.0%; random context range 50.0 to 90.0%, mean 74.29%
  open trials, errors for false ID: fixed context range 16 to 66%, mean 42.86%; random context range 66 to 100%, mean 83%

SMRKOVSKI

1974

Examiners: 7 police & private
Training duration: more than 2 years experience/less than 2 years experience
Method: combined aural and visual
Speaker population: 7 male & female
Number of words: 38 to 54 words
Context type: fixed context
Temporal sequence: noncontemporary (1 week)
Type of trial: open
Total number of trials: 84
Type of decision: no forced decisions; allowed 1 to 5 conclusions; no limited time; aural & visual examination; trained and experienced examiners
Results: open trials
  trainees w/less than 2 yr experience: false ID 0.0%; false elim. 5.0%; no decision 25.0%
  examiners w/more than 2 yr experience: false ID 0.0%; false elim. 0.0%; no decision 22.0%

SMRKOVSKI

1975

Examiners: 12 scientists, police and private
Training duration: novice: no training; trainee: < 2 yr; professional: > 2 yr
Method: combined aural and visual
Speaker population: 20 male & female
Number of words: 9 words
Context type: fixed context
Temporal sequence: noncontemporary
Type of trial: open
Total number of trials: 120
Type of decision: no forced decisions; allowed 1 to 5 conclusions; no limited time; aural & visual examination; compared words in context; trainees, novices and experienced examiners
Results: open trials, errors
  novices: false ID 5.0%; false elim. 25.0%; no decision 2.5%
  trainees: false ID 0.0%; false elim. 0.0%; no decision 2.5%
  professionals: false ID 0.0%; false elim. 0.0%; no decision 7.5%

HALL

1975

Examiners: 4 professional and 20 college graduates
Training duration: IAVI certified voice identification examiner
Method: combined aural and visual / visual only
Speaker population: professional mimic and 6 celebrity voices
Number of words: mimic (mean of 25 sec.), celebrities (mean of 35 min.)
Context type: quasi-fixed and random context
Temporal sequence: contemporary/noncontemporary
Type of trial: open
Total number of trials: aural (20/examiner), visual (200/examiner)
Type of decision: same, different or undecided; 5 IAVI classifications

Results: Interspeaker variability does not exist between a mimicked, disguised voice and the natural voice of the subject mimicked. Intraspeaker variabilities are minute and not significant when comparing the mimic's voice and the natural voice of the mimic. Aurally: The smaller the signal-to-noise ratio within the recording and the more similar the context, the greater the percentage of accuracy in distinguishing between speakers.

AURAL EXAMINATION, grand means:
                  RIGHT    WRONG    UNDEC.
Grad. students    0.74     0.18     0.08
Professional      0.92     0.082    0.0

HOLLIEN/McGLONE

1975-76

Examiners: 5 faculty; 1 graduate student
Training duration: "the authors were familiar with the 'voiceprint' method of speaker identification"
Method: visual only (spectrograms were cut & mounted)
Speaker population: 25 faculty and graduate students of the University of Florida
Number of words: 7 words
Context type: "I do not set the same store"
Temporal sequence: contemporary
Type of trial: open
Total number of trials: 25/examiner
Type of decision: record a match/indicate none was possible
Results: ". . . even skilled auditors such as these were unable to match correctly the disguised speech to the reference (normal) samples as much as 25% of the time . . . these groups were able to disguise their voices in such manners that their identification by the 'voiceprint' technique became little more than a matter of chance."

REICH ET AL

1976

Examiners: 2 PhD candidates in speech science 2 PhD candidates in speech pathology Training duration: 3 courses in speech science plus previous experience with speech spectrograms: 4 weeks at 10-15 hr/wk

Method: visual only (words excerpted and mounted) Speaker population: 40 adult males (mean: 27.3 yrs)

Number of words: 9 words Context type: fixed context

Temporal sequence: noncontemporary Type of trial: open

Total number of trials: 105 (7 matching tasks w/15 known & 15 unknown)

Type of decision: 1 to 5 certainty scale Results: The examiners were able to match speakers with a moderate degree of accuracy (55.67%) when there was no attempt to vocally disguise. Disguised speech significantly interfered with speaker identification. Further research is needed . . . in which the examiners may listen to the voice as well as view the spectrograms.

ROTHMAN

1977

Examiners: 30 listeners 6 visual examiners Training duration: none

Method: Study I: Aural Study II: Visual (0 to 8kHz) Speaker population: 12

Number of words: four - 2 second speech segments Context type: random context

Temporal sequence: contemporary/ noncontemporary (1wk) Type of trial: open

Total number of trials: 5 visual 38 aural

Type of decision: same/different for each contemporary and noncontemporary Results: 94% correct identifications were obtained for contemporary speech segments. 42% correct identifications were obtained for noncontemporary speech segments. 58.45% correct identifications were obtained when comparing different speakers. All examiners in pretest visual achieved 100% correct matching. Aural method is clearly superior to the spectrographic or 'voiceprint' method

McGLONE, HOLLIEN & HOLLIEN

1977

Examiners: 4 phoneticians Training duration: experienced

Method: visual measurement of formant fundamental frequency to obtain for Speaker population: 23 adult males

Number of words: 7 words ("I do not set the same store") Context type: fixed (normal & disguised) context

Temporal sequence: contemporary Type of trial:

Total number of trials: 46/phonetician

Type of decision: Results: A great amount of variability in the fo was found between normal and disguised speech. The mean bandwidth differences (f1, f2, f3) for the group were large and also demonstrated considerable variability. Phonetic means also differed.

HOULIHAN - Study I

1977

Examiners: 21 undergraduate students Training duration: series of lectures & discussions on phonetics, acoustics, and sound spectrography and speaker identification

Method: visual only Speaker population: 9 female, 5 male undergraduates - homogenous age and geographic background

Number of words: 9 words Context type: fixed context: 5 voice conditions (normal, lowered, falsetto, whispered and muffled)

Temporal sequence: contemporary Type of trial: open

Total number of trials: 18 matches

Type of decision: same/different
Results: correct identifications:

             F-voice   M-voice
normal        100%       95%
lowered        85%       95%
falsetto       95%       90%
whispered       5%       98%
muffled        75%      100%

range: 39 to 70% correct; mean: 58.8%; Std.D.: 8.7%

HOULIHAN - Study II

1977

Examiners: 7 students from Experimental phonetics Training duration: completion of Exp. I with feedback

Method: visual only Speaker population: 8 female, 8 male (mean age: 25.3 yrs)

Number of words: 8 words Context type: fixed context: "There's a bomb in the main post office"

Temporal sequence: contemporary Type of trial: closed

Total number of trials: 16/examiner

Type of decision: instructed to consider the sets in a particular order. All examiners considered undisguised before disguised
Results: correct identifications:

             F-voice   M-voice
normal         71%      100%
lowered        85%      100%
falsetto      100%       67%
whispered      71%       71%
muffled        85%      100%

The results suggest that minimally trained examiners have little difficulty with spectrographic identification in closed, contemporary, undisguised trials. Results do not suggest that female voices are more difficult to identify than male voices.

TOSI ET AL

1979

Examiners: professional and students Training duration: IAVA certified voice examiners and 2 weeks of training, respectively

Method: aural only, visual only and aural/visual combined Speaker population: Chicano (25 female and 25 male)

Number of words: four sentences approximately 2.4 seconds in Spanish Context type: fixed context

Temporal sequence: noncontemporary Type of trial: open - randomized

Total number of trials: 600/examiner

Type of decision: same, different, no opinion; qualified percentage of self-confidence from 51 to 100% Results: For both student and professional examiners, the mean percentage of errors of elimination and identification was greater for noisy samples than for quiet samples. However, the professional examiners' errors were due to aural-only examinations, whereas spectrographic/aural examinations produced 0.0% errors. The 'no opinion' option was used more by professional examiners.

REICH

1979

Examiners: 24 undergraduate students, 3 doctoral students, 3 professors of Speech and Hearing Science Training duration: brief lecture; 120 discrimination trials identical to the experiment

Method: aural only Speaker population: 40 adult males (mean age: 27.3 yrs)

Number of words: 9 words (it, is, on, you, and, the, I, to, me) Context type: fixed context

Temporal sequence: noncontemporary (2 weeks +) Type of trial: open

Total number of trials: 18 matches

Type of decision: same/different (1 to 5 certainty) Results: Both groups were able to discriminate speakers with moderately high degrees of accuracy, 92% correct for undisguised. Disguised trials ranged from 59 to 81% depending on the disguise. Recommended further research to study the combined aural/spectrographic method.

GREENWALD

1979

Examiners: 3 professional, 5 trainees (less than 2 years experience) Training duration: professionals: 8 yrs each trainees: < 2 yrs

Method: aural only, visual only and aural/visual combined Speaker population: 12 female, 12 male; American Midwest dialect

Number of words: 24 words Context type: fixed context

Temporal sequence: noncontemporary Type of trial: open

Total number of trials: 192 discrimination types Type of decision: the five IAVI alternatives

Results: Professional examiners produced no errors of false identification or elimination. A total of 1536 decisions were made by all eight examiners. Restricting the bandwidth (240-2K, 240-2.5K, 240-3K, and 240-4K) does not increase the errors but does increase the percentage of 'no decisions'. Examiner training has a strong effect on the error rate. Trainees produced errors as follows: 6.1% false identification and 4.1% false elimination for all trials. However, at 240-4 kHz, there were 0.0% errors of false identification or elimination.

KOENIG - FBI SURVEY

1986

Examiners: Federal Bureau of Investigation voice identification examiners Training duration: minimum of 2 yrs experience, completion of at least 100 actual voice comparison cases, formal approval by other trained examiners

Method: combined aural/visual method Speaker population: actual criminal cases

Number of words: varied with each case Context type:

Temporal sequence: noncontemporary Type of trial: open

Total number of trials: 2000 forensic comparisons

Type of decision: very similar, very dissimilar, no decision (low confidence)
Results:

                 number   percent
no/low conf.      1304     65.2
elimination        378     18.9
identification     318     15.9

errors:
false elim.          2     0.53
false id.            1     0.31

Spectrographic voice identification: A forensic survey

Bruce E. Koenig

Federal Bureau of Investigation, Engineering Section, Technical Services Division, 8199 Backlick Road, Lorton, Virginia 22079

(Received 25 October 1985; accepted for publication 18 February 1986) - J. Acoust. Soc. Am 79(6) June 1986

A survey of 2000 voice identification comparisons made by Federal Bureau of Investigation (FBI) examiners was used to determine the observed error rate of the spectrographic voice identification technique under actual forensic conditions. The qualifications of the examiners and the comparison procedures are set forth. The survey revealed that decisions were made in 34.8% of the comparisons with a 0.31% false identification error rate and a 0.53% false elimination error rate. These error rates are expected to represent the minimum error rates under actual forensic conditions.

PACS numbers: 43.70.Jt

INTRODUCTION

The sound spectrograph is a device which produces a visual graph (spectrogram) of speech as a function of time (horizontal axis), frequency (vertical axis), and voice energy (gray scale or color differences).1,2 It is a well-accepted research tool that is used to study individual vowel characteristics, physiological speech anomalies, etc. However, in the field of forensic voice identification, it has yet to find approval among most scientists in phonetics, linguistics, engineering, and related disciplines as a positive test in comparing voice samples.3-6

Historically, forensic applications were not seriously considered until 1962 when Lawrence Kersta published the results of experiments which reflected error rates of 0% to 3% for one-word spectral comparisons in closed sets (examiner always knows a match exists) of 12 or less speakers.7 In 1972, the findings of a large-scale study at Michigan State University were published in which attempts were made to more closely imitate law enforcement conditions, but only spectral comparisons were made (no aural). The “forensic model” included open set trials (examiner did not know if a match existed), noncontemporary samples (1 month apart), trained examiners, and high-confidence decisions. This resulted in an approximate error rate of 2% for false identification (no match existed but the examiner selected one, or a match existed but the examiner chose the wrong one) and 5% for false elimination (a match existed but the examiner failed to recognize it). The authors of the study attempted to extend the experimental results to actual law enforcement conditions, which they thought would lower the error rates. They theorized that examiners could aurally compare the voice samples, the number of known suspects would be limited by police investigation, there would be no time limits placed on the examiner, only very high confidence decisions would be used, and additional known voice samples could be obtained.8 Other scientists disagreed on the study’s extensions, and stated that in actual forensic conditions the error rate would increase, not decrease.9 In 1979, a committee of the National Research Council released its findings and recommendations in a Federal Bureau of Investigation (FBI)-funded study on the reliability of spectrographic voice identification under forensic conditions, which found, in part, that:

(1) Error rates vary from case to case due to the properties of the voices compared, the recording conditions used to obtain voice samples, the skill of the examiner, and the examiner’s knowledge about the case. Estimates of error rates are available only for a few situations, and they “do not constitute a generally adequate basis for a judicial or legislative body to use in making judgments concerning the reliability and acceptability of aural-visual voice identification in forensic applications.”10

(2) Examiners should fully use all available knowledge and techniques that could improve the voice identification method.10

(3) Spectrographic voice identification assumes that intraspeaker variability (differences in the same utterance repeated by the same speaker) is discernable from interspeaker variability (differences in the same utterance by different speakers); however, that “assumption is not adequately supported by scientific theory and data.” Viewpoints on actual error rates are presently based only on “various professional judgments and fragmentary experimental results rather than from objective data representative of results in forensic applications.”11

FBI examiners have used the spectrographic technique since the 1950s for investigative support, but have not provided expert court testimony on comparison results.12

This paper presents the results of 2000 forensic comparisons, under actual law enforcement conditions, by FBI examiners.

I. SURVEY PROCEDURES

The FBI conducts forensic voice identification examinations using the spectrographic or voiceprint technique for the FBI, other Federal agencies, state and local law enforcement authorities, and many foreign governments. After each examination is conducted, a written report of findings is mailed to the contributor with the name of the examiner and the disposition of the submitted voice samples. If an identification or elimination is made, the contributor is contacted by telephone and asked if the results are consistent with interviews and other evidence in the investigation. If other information strongly supports the voice comparison result, then the contributor is told to contact the FBI if later developed evidence contradicts the finding. If the voice comparison results contradict other evidence, the matter is closely followed until legally adjudicated or investigatively closed. In the few occurrences where no final determination was possible, the voice comparison result was considered a “no decision” in the survey.

The results of the last 2000 requested comparisons, spanning 15 years, were compiled and organized into total identification and elimination decisions, known errors, and no or low confidence decisions.

II. QUALIFICATIONS OF EXAMINERS

All of the individuals conducting the voice comparison examinations were FBI employees with the following qualifications: (1) at least two years of full-time experience in voice identification and analysis of tape recorded voice signals using sophisticated digital and analog analysis and filtering equipment; (2) completion of over 100 voice comparisons in actual cases; (3) completion of a basic two-week course in spectrographic analysis, or equivalent; (4) passing a yearly hearing test; (5) formal approval by other trained examiners; and (6) a minimum of a Bachelor of Science Degree in a basic scientific field.

III. COMPARISON PROCEDURES

The following procedures were used, if at all possible, on every attempted voice comparison in the survey.

(1) Only original recordings of voice samples were accepted for examination, unless the original recording had been erased and a high-quality copy was still available.

(2) The recordings were played back on appropriate professional tape recorders and recorded on a professional full-track tape recorder at 7 1/2 ips. When possible, playback speed was adjusted to correct for original recording speed errors by analyzing the recorded telephone and AC line tones on spectrum analysis equipment. When necessary, special recorders were used to allow proper playback of original recordings that had incorrect track placement or azimuth misalignment.

(3) Spectrograms were produced on Voice Identification, Inc., Sound Spectrographs, model 700, in the linear expand frequency range (0-4000 Hz), wideband filter (300 Hz) and bar display mode. All spectrograms for each separate comparison were prepared on the same spectrograph. The spectrograms were phonetically marked below each voice sound.

(4) When necessary, enhanced tape copies were also prepared from the original recordings using equalizers, notch filters, and digital adaptive predictive deconvolution programs13,14 to reduce extraneous noise and correct telephone and recording channel effects. A second set of spectrograms was then prepared from the enhanced copies and was used together with the unprocessed spectrograms for comparison.

(5) Similarly pronounced words were compared between two voice samples, with most known voice samples being verbatim with the unknown voice recording. Normally, 20 or more different words were needed for a meaningful comparison. Fewer than 20 words usually resulted in a less conclusive opinion, such as "possibly" instead of "probably."

(6) The examiners made a spectral pattern comparison between the two voice samples by comparing beginning, mean and end formant frequency, formant shaping, pitch, timing, etc., of each individual word. When available, similarly pronounced words within each sample were compared to ensure voice sample consistency. Words with spectral patterns that were distorted, masked by extraneous sounds, too faint, or lacked adequate identifying characteristics were not used.

(7) An aural examination was made of each voice sample to determine if pattern similarities or dissimilarities noted were the product of pronunciation differences, voice disguise, obvious drug or alcohol use, altered psychological state, electronic manipulation, etc.

(8) An aural comparison was then made by repeatedly playing two voice samples simultaneously on separate tape recorders, and electronically switching back and forth between the samples while listening on high-quality headphones. When one sample had a wider frequency response than the other, bandpass filters were used to compensate during at least some of the aural listening tests.

(9) The examiner then had to resolve any differences found between the aural and spectral results, usually by repeating all or some of the comparison steps.

(10) If the examiner found the samples to be very similar (identification) or very dissimilar (elimination), an independent evaluation was always conducted by at least one, but usually two other examiners to confirm the results. If differences of opinions occurred between the examiners, they were then resolved through additional comparisons and discussions by all the examiners involved. No or low confidence decisions were usually not reviewed by another examiner.

IV. SURVEY RESULTS

The survey found that in 2000 voice comparisons, the following decisions and errors were observed:

Decisions                 Number    Percent (%)
No or low confidence       1304       65.2
Eliminations                378       18.9
Identifications             318       15.9

Errors
False eliminations            2        0.53
False identification          1        0.31

Most of the no or low confidence decisions were due to poor recording quality and/or an insufficient number of comparable words. Decisions were also affected by high-pitched voices (usually female) and some forms of voice disguise.
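The percentages in the table can be re-derived from the reported counts. The short Python sketch below is only an arithmetic check, not part of the survey's method; the variable and function names are illustrative. It assumes, as the reported figures imply, that each error rate is taken relative to the number of decisions of that type (false eliminations out of all eliminations, false identifications out of all identifications) rather than out of all 2000 comparisons.

```python
# Counts reported in the survey table (2000 comparisons in total).
counts = {
    "no_or_low_confidence": 1304,
    "eliminations": 378,
    "identifications": 318,
}
errors = {"false_eliminations": 2, "false_identifications": 1}

total = sum(counts.values())


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Share of comparisons in which a decision (elimination or identification) was made.
decision_rate = pct(counts["eliminations"] + counts["identifications"], total)

# Assumption: error rates are relative to decisions of the same type.
false_elim_rate = pct(errors["false_eliminations"], counts["eliminations"])
false_id_rate = pct(errors["false_identifications"], counts["identifications"])

print(total, decision_rate, false_elim_rate, false_id_rate)
```

Under that assumption the script reproduces the 34.8% decision rate quoted in the abstract and the 0.53% and 0.31% error rates in the table.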

V. CONCLUSIONS

(1) The observed identification and elimination errors probably represent the minimum error rates expected under actual forensic conditions, since investigators are not always correct in their evaluation of a suspect’s involvement, due to limited physical evidence, faulty eyewitness statements, etc.

(2) The stated results should only be considered valid when compared with examiners having the same qualifications and using the same comparison procedures.

(3) The FBI has emphasized signal analysis and pattern recognition skills for conducting voice identification examinations, more than formal training in speech physiology, linguistics, phonetics, etc., though a basic knowledge of these fields is considered important.

ACKNOWLEDGMENTS

Thanks are due to the following colleagues who were involved in conducting the comparisons used in this survey:

Steven A. Killion, Barbara Ann Kohus, Dale Gene Linden, Gregory J. Major, Artese Savoy Kelly, Keith W. Sponholtz, Ernest Terrazas, Richard L. Todd, and Charles Wilmore, Jr.

1. W. Koenig, H. K. Dunn, and L. Y. Lacey, J. Acoust. Soc. Am. 18, 244 (1946).
2. G. M. Kuhn, J. Acoust. Soc. Am. 76, 682-685 (1984).
3. R. H. Bolt, F. S. Cooper, E. E. David, Jr., P. B. Denes, J. M. Pickett, and K. N. Stevens, J. Acoust. Soc. Am. 47, 591-612 (1970).
4. R. H. Bolt, F. S. Cooper, E. E. David, Jr., P. B. Denes, J. M. Pickett, and K. N. Stevens, J. Acoust. Soc. Am. 54, 531-534 (1974).
5. K. N. Stevens, C. E. Williams, J. R. Carbonell, and B. Woods, J. Acoust. Soc. Am. 44, 1596-1607 (1968).
6. R. H. Bolt, F. S. Cooper, D. M. Green, S. L. Hamlet, J. G. McKnight, J. M. Pickett, O. I. Tosi, and B. D. Underwood, “On the Theory and Practice of Voice Identification,” N.A.S.N.R.C. Publ. (1979).
7. L. G. Kersta, Nature 196, 1253-1257 (1962).
8. O. Tosi, H. Oyer, W. Lashbrook, C. Pedrey, J. Nicol, and E. Nash, J. Acoust. Soc. Am. 51, 2030-2043 (1972).
9. R. H. Bolt, F. S. Cooper, E. E. David, Jr., P. B. Denes, J. M. Pickett, and K. N. Stevens, J. Acoust. Soc. Am. 54, 531-534 (1974).
10. R. H. Bolt, F. S. Cooper, D. M. Green, S. L. Hamlet, J. G. McKnight, J. M. Pickett, O. I. Tosi, and B. D. Underwood, N.A.S.N.R.C. Publ., 60 (1979).
11. R. H. Bolt, F. S. Cooper, D. M. Green, S. L. Hamlet, J. G. McKnight, J. M. Pickett, O. I. Tosi, and B. D. Underwood, N.A.S.N.R.C. Publ., 2 (1979).
12. B. E. Koenig, FBI Law Enforcement Bulletin (January and February, 1980).
13. J. E. Paul, IEEE Circuits and Systems Magazine 1, 2-7 (1979).
14. J. E. Paul, paper presented at Voice Interactive Systems Subtag, Orlando, FL (Oct. 1984); hosted by U. S. Army Avionics Research and Development Activity, Ft. Monmouth, NJ.

Voice Comparison

Approved by ABRE Voice ID Board - April 1999

AMERICAN BOARD of RECORDED EVIDENCE -- VOICE COMPARISON STANDARDS

Abstract

This document specifies the requirements of the American Board of Recorded Evidence for the comparison of recorded voice samples. These standards have been established for all practitioners of the aural/spectrographic method of voice identification and are intended to guide the examiner toward the highest degree of accuracy in the conduct of voice comparisons. These criteria supersede any previous written, oral, or implied standards, and became effective in 1998.

Foreword

This document was developed by members of the American Board of Recorded Evidence, a board of the American College of Forensic Examiners, following their meeting in San Diego, CA in December, 1996. The document draws upon previously published material from the International Association for Identification, the International Association for Voice Identification, The Journal of the Acoustical Society of America, The Audio Engineering Society and The Federal Bureau of Investigation for much of its content. The contents of this document are for non-commercial, educational use. It is the intent of the Board to publish this document in the official journal of the American College of Forensic Examiners.

VOICE COMPARISON STANDARDS

Table of Contents

1. Scope
2. Evidence Handling
3. Preparation of Exemplars
4. Preparation of Copies
5. Preliminary Examination
6. Preparation of Spectrograms
7. Spectrographic/Aural Analysis
8. Work Notes
9. Reporting
10. Testimony

1. SCOPE

This standard specifies recommended practices for the handling, preparation and analysis of recorded evidence to be followed by practitioners of the aural/spectrographic method of speaker identification. The document covers specific instructions for the preparation of exemplar recordings, voice spectrograms and aural comparison samples. It defines criteria to be applied when arriving at conclusions that are based upon the oral evidence. It also includes requirements for reports and testimony that are offered by the expert witness regarding his findings in voice analyses.

This standard is intended as a guide based upon good laboratory practices for handling recordings that may be used in evidence. Persons handling evidence recordings should first obtain and follow the rules of the legal jurisdiction or jurisdictions involved. When a jurisdiction provides instructions, those should be followed. Only in the absence of such instructions should the recommendations of this standard be followed with the approval of the jurisdiction.

2. EVIDENCE HANDLING.

Since evidence involved in criminal or civil proceedings must meet the appropriate jurisdiction's Rules of Evidence, it is important to properly identify and safeguard it from the time of receipt until returned to the contributor or court. The ABRE has adopted as its standard for handling evidence the AES Standard "AES27-1996 - AES recommended practice for forensic purposes-Managing recorded audio materials intended for examination". The complete document is available at:

Audio Engineering Society, Inc.
60 East 42nd Street
New York, NY 10165

3. PREPARATION OF EXEMPLARS. The quality of the exemplars is critical in allowing an accurate comparison with unknown voice samples.

3.1 Production. The exemplars can be prepared by either the investigator, attorney, examiner, or other appropriate person. Whenever possible, an impartial individual knowledgeable of the known speaker's voice should be present to minimize attempts at disguise, changes in speech rate, adding or deleting accents, and other alterations. The known speaker should state his or her name at the beginning of the recording and repeat the unknown speaker's statement(s) from three (3) to six (6) times, depending upon the length of the unknown samples. Normally, the person preparing the exemplar should record his or her name and that of any other witnesses present.

3.2 Duplication of Recording Conditions.

3.2.1 Microphone. Whenever possible, the same type of microphone system should be utilized when recording exemplars as was used for the original unknown recording. Therefore, if the unknown caller used a telephone, the exemplar should be prepared by having the suspect talk into one telephone instrument and be recorded at a second telephone set, located an appropriate distance away.

3.2.2 Acoustic environment. The exemplar recordings should be prepared in a quiet environment with relatively short reverberation times. Do not imitate noises present at the location of the unknown call or obvious reverberant effects.

3.2.3 Transmission line. Whenever possible, the same general type of transmission line, such as a telephone call, should be utilized when recording exemplars as was used for the original unknown recording.

3.2.4 Recording system. A good quality recording system should always be used in preparing exemplars; it is usually not necessary to imitate the system utilized in recording the unknown sample, but if the system is available and functional, it may be used. A standard cassette set at 1 7/8 inches per second or open reel tape recorder at 3 3/4 or 7 1/2 inches per second or a digital recorder should otherwise be used. Micro cassette and other miniature formats, speeds below 1 7/8 inches per second, and poor quality/inexpensive units are not recommended. Before the known speaker is allowed to leave the exemplar-taking session, the recordings should be played back to ensure that the samples are of high quality and properly prepared.

3.2.5 Recording media. Good quality tape or other appropriate recording media should always be used in preparing exemplars; it is not necessary to duplicate the type of tape utilized in recording the unknown sample. The tape should either be new (preferred) or properly bulk erased.

3.3 Duplication of Speech Delivery.

3.3.1 Reading v. recitation. The suspect should be allowed to review the written text or transcription before actually making the recorded exemplars. This familiarity will usually improve the reading of the text and response to oral prompts and increase the likelihood of obtaining a normal speech sample. When a suspect cannot or will not read normally, it is advisable to have someone recite the phrases in the same manner as the unknown speaker and have the suspect repeat them in a similar fashion. Ideally, the exemplar should be spoken in a manner that replicates the unknown speaker, to include speech rate, accent (whether real or feigned), hoarseness, or any abnormal vocal effect. The individual taking the sample should feel free to try both reading and recitation, until a satisfactory exemplar is obtained.

3.3.2 Repetition. Multiple repetitions of the text are necessary to provide information about the suspect's intraspeaker variability. All material to be used for comparison should normally be read or recited from three (3) to six (6) times, unless very lengthy.

3.3.3 Speech rate. Exemplars should be produced at a speech rate similar to the unknown voice sample. In general, the suspect is instructed not to talk at his or her natural speaking rate if this is markedly different from the unknown sample. An effort should be made through repetition to appropriately adjust the speech rate and cadence in the exemplar to that in the questioned recording.

3.3.4 Stress/Accents. Stress includes the emphasis and melody pattern in syllables, words, phrases, and sentences. If prominent or peculiar stress is present in the questioned recording, exemplars should be obtained in a similar manner, if possible. Spoken accents or dialects, both real and feigned, should be emulated by the known speaker. The recitation mode is the better technique for accomplishing this.

3.3.5 Effects of alcohol or other drugs. Since the degree and type of effects from alcohol and other drugs vary from person to person, an attempt to duplicate these vocal changes is not recommended when obtaining the exemplar. If the suspect appears to be under the effects of alcohol or other drugs at the time of the exemplar recording, the session should be rescheduled.

3.3.6 Other. If any other unique aural or spectrally displayable speech characteristics are present in the questioned voice, attempts should be made to include them in the exemplars.

3.4 Marking. Same as Sect. 2

 

 

4 PREPARATION OF COPIES.

4.1 Playback of Evidential Recordings. The proper playback of the unknown and known voice sample is critical, since it provides the optimum output for the aural and spectral analyses.

4.1.1 Track determination. In situations where the questioned recording was made on equipment of unknown origin or configuration, it may be necessary to analyze oxide on the recording before playing it back. The recorded track position and configuration may be determined by applying an appropriate ferrofluid to the oxide side of analog tapes in a high amplitude portion of the recording. The treated area is then viewed under low magnification to determine the track configuration and offsets.

4.1.2 Azimuth alignment. Where there is evidence of an audio level or clarity problem during playback, azimuth alignment should be checked and adjusted if necessary by either an inspection of the developed magnetic striations (see track determination above), frequency analysis of the recorded material, or adjustment of the reproducer head azimuth for maximum high frequency output. All audio miniature cassettes, standard cassettes, and open reels (other than loggers) recorded at 15/16 inches per second (2.4 centimeters per second), or less, should be carefully examined for loss of higher frequency information, which often occurs in these formats.

4.1.3 Speed accuracy. Errors in playback speed will cause corresponding variations in the voice frequency, both aurally and spectrally. The playback speed error should be determined for all recordings containing known discrete tones, and then corrected on a reproducer with speed-adjustment circuitry. A Real-Time (RT) Analyzer or Fast Fourier Transform (FFT) analyzer system should be used that allows a resolution of 1% (±0.60 hertz) or better at 60 hertz. Where a known signal is present on the recording, a frequency counter may be employed to correct tape speed. Ideally, there should be less than a 3% error between questioned and known samples that are being compared.
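
The arithmetic behind this check can be sketched as follows. This is only an illustration under stated assumptions, not part of the standard: the function names are hypothetical, and the 3% limit is read here as the difference between the speed errors of the two samples.

```python
def speed_error_percent(measured_hz: float, nominal_hz: float) -> float:
    """Percent playback speed error implied by a known discrete tone.

    Positive means the recording is playing fast, negative means slow.
    """
    if nominal_hz <= 0:
        raise ValueError("nominal_hz must be positive")
    return (measured_hz - nominal_hz) / nominal_hz * 100.0


def resolution_hz(nominal_hz: float, resolution_percent: float = 1.0) -> float:
    """Analyzer resolution in hertz for a percent resolution at a given tone."""
    return nominal_hz * resolution_percent / 100.0


def within_comparison_limit(questioned_error: float, known_error: float,
                            limit_percent: float = 3.0) -> bool:
    """True if the two samples' speed errors differ by less than the limit.

    Assumption: the 3% figure is applied to the difference between the
    questioned and known speed errors, both given in percent.
    """
    return abs(questioned_error - known_error) < limit_percent
```

For example, a 60 hertz power-line tone that measures 61.2 hertz on playback implies the tape is running about 2% fast, and a 1% resolution at 60 hertz corresponds to 0.6 hertz.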

4.1.4 Reproducer. Using the information gleaned from the examinations of the track, azimuth alignment, and speed, a high-quality playback device is configured to allow optimum output.

4.2 Direct Copies. The following information is provided for the analog reel copies that are needed for processing on the Voice Identification, Inc., Series 700 sound spectrograph. If the spectrograph being utilized has a digital memory, the requirements for cabling and retention are still applicable. Even with digital memory systems, a high quality digital or analog tape copy should still be prepared and maintained.

4.2.1 Format. All copies are prepared in a full track, 7 1/2 inches per second format on 1.0 mil or thicker audio tape from a reputable manufacturer. Normally, new, unused reels of tape should be utilized; however, previously recorded tape can be used if either bulk erased or over-recorded on a full track recorder with no input.

4.2.2 Cabling. All copies must be prepared with good quality cables from the playback device to the line input of the recording unit. No loudspeaker-to-microphone copying procedures are permitted.

4.2.3 Recording unit. A separate professional reel recorder, or the one incorporated in the Series 700 spectrograph, is required. At least once a year, the recorder must be checked by a technically competent individual to determine the unit's playback speed accuracy, distortion level, flutter, record/playback frequency response, and record level. The recorder must meet the following criteria: playback speed within 0.15%, distortion of less than 3% at 200 nWb/m, wow and flutter below 0.15% (NAB unweighted), record/playback frequency response of 100 to 10,000 hertz ±3 decibels at 200 nWb/m, and a 0 VU level no greater than 250 nWb/m. If the recorder does not meet all of these standards, it must be repaired and/or adjusted. If a digital system is utilized by the examiner, the system should be checked at least once a year by a technically competent individual according to the manufacturer's written instructions. Digital systems should have almost unmeasurable speed errors, wow and flutter, distortion, and frequency deviations.
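
The analog recorder criteria in Section 4.2.3 lend themselves to a simple checklist. The sketch below is a hypothetical illustration: the RecorderCheck record and its field names are assumptions, and the measured values would come from the annual bench check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecorderCheck:
    """Hypothetical record of one annual bench check of an analog recorder."""
    speed_error_percent: float    # signed playback speed error
    distortion_percent: float     # measured at 200 nWb/m
    wow_flutter_percent: float    # NAB unweighted
    response_deviation_db: float  # worst deviation from 100 to 10,000 hertz
    zero_vu_nwb_per_m: float      # fluxivity corresponding to 0 VU


def failed_criteria(check: RecorderCheck) -> List[str]:
    """Return the names of the Section 4.2.3 criteria that the recorder fails."""
    failures = []
    if abs(check.speed_error_percent) > 0.15:
        failures.append("playback speed")
    if check.distortion_percent >= 3.0:
        failures.append("distortion")
    if check.wow_flutter_percent >= 0.15:
        failures.append("wow and flutter")
    if abs(check.response_deviation_db) > 3.0:
        failures.append("frequency response")
    if check.zero_vu_nwb_per_m > 250.0:
        failures.append("0 VU level")
    return failures
```

An empty list means the recorder meets all of the listed criteria; any entry means the unit must be repaired and/or adjusted before further use.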

4.2.4 Retention. The direct copies must be retained at normal room temperatures and humidity for at least three (3) years, unless the case has been completely adjudicated or the contributor requires the return of all materials used by the examiner.

4.3 Enhanced Copies. When the original recording contains interfering noise and/or limited frequency response, enhanced copies may provide improved audibility and more usable spectrograms. At times, separate enhanced copies will have to be prepared for the aural and spectral examinations to provide optimum results for each. The following information is specifically provided for the analog reel copies that are needed for processing on the Voice Identification, Inc., Series 700 sound spectrograph. If the spectrograph being utilized has a digital memory, the requirements for cabling and retention are still applicable. Even with digital memory systems, a high quality digital or analog tape copy should still be prepared and maintained. A written record of the settings on the devices used should be maintained.

4.3.1 Equalizers. Parametric or graphic equalizers can boost and attenuate selected frequency bands to normalize the recorded speech spectrum. Though an FFT or RT analyzer is of considerable assistance in adjusting the spectrum, a final decision on the equalizer settings should be made by either listening and/or preparing spectrograms, depending upon the enhanced copy's use.

4.3.2 Notch filters. These devices allow the selected attenuation of discrete tones present in the recordings. An FFT or RT analyzer is of considerable assistance in identifying the frequency of the tones and optimally centering the filter's notch.
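
As an illustration of how a notch attenuates a discrete tone, the following sketch implements a generic second-order digital notch using the widely published biquad ("Audio EQ Cookbook") formulas. It is not the hardware or method prescribed by this standard. The center frequency is assumed to have been read from an FFT or RT analyzer, and the Q value of 10 is an arbitrary choice for the example.

```python
import math


def notch_coefficients(f0_hz, fs_hz, q=10.0):
    """Biquad notch coefficients (b, a), normalized so that a[0] == 1."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def apply_biquad(samples, b, a):
    """Filter a sequence of samples with a direct-form I biquad."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

A higher Q narrows the notch, so less of the neighboring speech energy is removed, but the center frequency must then be set more precisely.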

4.3.3 Deconvolutional filters. These digital devices both automatically attenuate sounds correlated longer than a specified time and flatten the sound spectrum. The filter can, at times, provide improved spectrographic and aural samples for examination. Care should be taken to ensure that the adaptation rate is not set at a value that starts to delete speech information.

4.3.4 Other filters. Band pass, shelving, comb, user-characterized digital, and other filters are helpful in a small number of voice identification cases.

4.3.5 Format. Same as 4.2.1.

4.3.6 Cabling. Same as 4.2.2.

4.3.7 Recording unit. Same as Section 4.2.3.

4.3.8 Retention. Same as Section 4.2.4.

 

5 PRELIMINARY EXAMINATION.

A preliminary examination is conducted to determine whether the unknown and known voice samples meet specific guidelines to allow continuation of the examination.

5.1 Original/Duplicate Recordings. The unknown and known voice samples must be original recordings unless listed as a specific exception below. Copies not meeting these guidelines cannot be used for examination. Short time restraints imposed by the contributor are not considered an exception. When access to the original recording is denied due to legal restraints, copies may be used under the allowed exceptions. The exceptions for not examining the original recordings are:

a. If the original recording has been erased or destroyed, the examiner should then use the best first-generation copy available;

b. The copies were prepared by a qualified voice identification examiner or other technically competent individual following Section 4 guidelines;

c. If the original recording is in a relatively unique format or part of a digital storage system, the examiner or other technically competent individual should prepare the copies from the original material following Section 4 guidelines. If that is not possible, then detailed telephonic and/or written instructions should be given to the individual preparing the copies. Copies produced by non-technical individuals should be closely analyzed in the laboratory to ensure that the duplication process was properly done.

5.2 Verbatim/Non-verbatim. The known, or another unknown voice sample, must be either wholly verbatim (preferred), or partially verbatim to allow meaningful comparisons with unknown voice samples. A partially verbatim sample should include phrases and sentences containing at least three (3) similar, consecutive matching words. An example of the use of partial verbatim samples would be two (2) unknown recorded false fire alarms containing, at times, nearly identical phraseology. If no verbatim recordings are submitted by the contributor, the examiner may analyze the unknown samples to determine whether they would meet the guidelines if appropriate known voice samples are submitted at a later time.
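
A first screening for partially verbatim material can be sketched on written transcripts. This is a simplified illustration only: the function names are hypothetical, and exact matching of transcribed words stands in for the examiner's judgment of "similar" words.

```python
import re
from typing import List, Set, Tuple


def words(text: str) -> List[str]:
    """Lowercase a transcript and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def shared_runs(unknown: str, known: str,
                min_words: int = 3) -> Set[Tuple[str, ...]]:
    """Return every run of min_words consecutive words found in both transcripts."""
    def runs(tokens: List[str]) -> Set[Tuple[str, ...]]:
        return {tuple(tokens[i:i + min_words])
                for i in range(len(tokens) - min_words + 1)}
    return runs(words(unknown)) & runs(words(known))
```

An empty result suggests that the two transcripts share no phrase of at least three consecutive matching words and therefore may not qualify as partially verbatim.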

5.3 Number of Comparable words. There must be at least ten (10) comparable words between two (2) voice samples to reach the minimal decision criteria. Similarly spoken words within each sample can only be counted once. It is noted that in most voice samples at least some of the words identified at this point will not be useful in the final examinations.
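
The counting rule can be sketched on transcripts as a set intersection, so that a repeated word is counted only once. The function names and the optional list of excluded words (for example, words later rejected for quality reasons, given in lowercase) are assumptions made for the illustration.

```python
import re
from typing import Iterable, Set


def distinct_words(transcript: str) -> Set[str]:
    """Set of distinct lowercase words in a transcript."""
    return set(re.findall(r"[a-z']+", transcript.lower()))


def comparable_words(sample_a: str, sample_b: str,
                     excluded: Iterable[str] = ()) -> Set[str]:
    """Words present in both samples, each counted once, minus exclusions."""
    common = distinct_words(sample_a) & distinct_words(sample_b)
    return common - set(excluded)


def meets_minimum(sample_a: str, sample_b: str,
                  excluded: Iterable[str] = (), minimum: int = 10) -> bool:
    """True if the samples share at least the minimum number of comparable words."""
    return len(comparable_words(sample_a, sample_b, excluded)) >= minimum
```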

5.4 Quality of Voice Samples. This preliminary aural and spectral review is to determine if the voice samples are of sufficient quality to allow meaningful comparisons between them.

5.4.1 Disguise. Samples, or portions of samples, that contain falsetto, true whispering (in contrast to low amplitude speech), or other disguises that obviously change or obscure the vocal formants or other speech characteristics, may need to be eliminated from comparison consideration. Other types of disguise may or may not be usable, depending upon the nature of the disguise. Sometimes a known voice sample with the same type of disguise can be compared, but the examiner should exercise caution in such examinations.

5.4.2 Distortion. Samples, or portions of samples, that include high-level linear and/or nonlinear distortion should be eliminated from comparison consideration. Such distortion can result from saturation of magnetic tape or overdriven electronic circuits, and can produce artifacts, including formants that did not exist in the original speech information.

5.4.3 Frequency range. Samples, or portions of samples, that are restricted in upper frequency range and produce less than two complete speech formants are of limited value to the examiner. Samples producing three or more speech formants provide the examiner better information with which to make a comparison. Sometimes the use of enhanced copies can allow the frequency range to be extended but note the limitations in Section 7.1.3.

5.4.4 Interfering speech and other sounds. Samples, or portions of samples, that contain any extraneous speech information or sounds which interfere with aural identification or spectral clarity should be eliminated from comparison consideration unless the sounds can be sufficiently attenuated through enhancement procedures.

5.4.5 Signal-to-noise ratio. Samples, or portions of samples, containing recording system or environmental noise that impedes aural identification or spectral clarity should be eliminated from comparison consideration unless the noise can be sufficiently attenuated through enhancement procedures.

5.4.6 Variations between samples. Though the following variations can quickly end a voice comparison, the problem can often be remedied by obtaining additional known samples:

a. Transmission systems. Normally, samples being compared should be produced through the same type of transmission system, for example, the telephone, a microphone in a room, or a RF transmitter/receiver. If aurally or spectrally the samples are noticeably different due to the dissimilarities in the transmission systems and filtering does not rectify these differences, no further comparisons should be made.

b. Recording systems. Normally, samples being compared should be produced on either good quality, or compatible, recording systems. However, if the recordings contain uncorrectable system differences that affect aural and spectral characteristics, no further comparisons should be made. Examples of recording differences that can affect the results include high-level flutter, gross speed fluctuations, and voice-activated stop/starts.

c. Speech delivery. Normally, samples being compared should have the speakers talking in the same general manner, including speech rate, accent, similar pronunciation, and so on. However, in cases where this has not been done, as in poorly produced known exemplars, no further comparisons should be made.

d. Other. Any other differences between the voice samples that noticeably affect aural and spectral characteristics should be closely reviewed before proceeding with the examination.

 

 

6 PREPARATION OF SPECTROGRAMS.

6.1 Sound Spectrograph. The examiner must use a sound spectrograph, or a digital system, that allows the identification and marking of each speech sound on the spectrogram by either manual manipulation of the drum while listening to the recorded material or the separate identification of the individual sounds on a computer monitor. Spectrographs used must be of professional manufacture, such as the Voice Identification 700 Series or professional computerized systems, such as the Kay Elemetrics Model 5500. The spectrograph should be calibrated at least every six (6) months according to the manufacturer's instructions.

6.1.2 Print Quality. Spectrographic prints must be produced either in an analogue format or, if from a computerized system, must be printed with a minimum of 600 dots per inch resolution.

6.2 Format.

6.2.1 Filter bandwidth. A 250 to 300 hertz bandwidth filter is recommended for the production of most spectrograms. A 450 to 600 hertz bandwidth filter may sometimes improve the formant appearance for high-pitched voices. Narrower filters should only be used for non-voiced sounds and calibration purposes.

6.2.2 Mode. The bar display mode must be used for all spectrograms with the high-shaping equalizer engaged (except when an enhanced copy is being used that has already properly shaped the spectrum).

6.2.3 Frequency range. An appropriate frequency range should be chosen that fully displays all speech sounds in the unknown voice sample. The known voice spectrograms are then prepared using the same frequency range.

6.2.4 Direct v. enhanced. When enhanced copies are used for the examination, at least some spectrograms must be prepared from the direct copies.

6.3 Marking. Each spectrogram must be marked below each speech sound, either phonetically, orthographically, or a combination of both. Great care should be taken to ensure that the speech sounds are accurately designated as to how they were spoken, which may not be their correct pronunciation. The spectrograms should be appropriately labeled with identifying information such as specimen, case, and laboratory identifiers. The spectrograms may be marked consecutively for each unknown and known sample. Known and unknown sounds may be marked in different colored ink to facilitate comparisons.

6.4 Retention. All spectrograms should be retained for at least three (3) years after completion of the examination, unless the case has been completely adjudicated or the contributor requires the return of all materials used by the examiner.

 

 

7 SPECTROGRAPHIC/AURAL ANALYSIS.

7.1 Pattern Comparison.

7.1.1 Intraspeaker consistency. The examiner must visually compare similarly spoken words within each voice sample to determine the range of intraspeaker variability. If there is considerable variability, the word must not be used for comparison. If there is considerable variability in a number of words in a sample, the sample should not be used for comparison. This is often encountered with disguised voices and known exemplars from uncooperative individuals.

7.1.2 Similar speech sounds. Only speech sounds of similarly spoken words should be compared between voice samples. Comparison of the same speech sound in different words should be avoided.

7.1.3 Direct v. enhanced. When using spectrograms from direct and enhanced copies, both should be visually compared to words from the known or questioned voice sample. The examiner should be cognizant that the enhancement process may distort the spectral energy distribution, thus increasing the likelihood of a false elimination.

7.1.4 Number of comparable words. This is determined by the total number of different words present in both samples that meet the standards set forth in Sections 5.4.1 through 5.4.6. A similar or nearly similar word appearing more than once in one or both samples should be counted only as one comparable word.

7.1.5 Speech characteristics.

a. General formant shaping and positioning. A formant is a band of acoustic energy produced by spoken vowels and resonant consonants. Formants and other vocal patterns produced on the spectrograms are visually compared by the examiner. Generally, the spoken word will produce a set or sets of three (3) or more observable formants. A good pattern match exists when the majority, if not all, of the formant shaping and positioning exhibit strong similarities. A precise photographic match rarely occurs even between two (2) consecutive utterances of the same word spoken by the same individual. Conversely, even very different voices can exhibit similarities in general formant shaping and positioning for some words. Examination of these patterns must be conducted between each comparable word of the voice samples.

b. Pitch striations. Pitch, or fundamental frequency, can be a useful characteristic for distinguishing between speakers. Pitch information is displayed on a spectrogram in the form of closely-spaced vertical striations, with the spacing and shaping being useful parameters of the individual talker. Differences in the pitch rate and the smoothness or coarseness of the pitch quality should be examined both spectrally and aurally; but most talkers are characterized by fairly wide pitch ranges.

c. Energy distribution. Energy distribution of certain vocal sounds can assist the examiner in analyzing similarities and differences between voice samples. Certain phonemes are displayed primarily by their energy distribution diffused across a certain frequency range. Plosive and fricative consonants are displayed along the frequency axis as concentrated dark energy distribution patterns. Although the characteristics of energy distributions, especially bursts, are more dependent upon the type of sounds produced than the speakers, some talker-dependent characteristics can be observed.

d. Word length. The time length of a particular spoken word can be readily compared between voice samples. When a person speaks more slowly or faster than normal, the time between words is usually more affected than the length of the individual words. It is noted that a word appearing at the end of a sentence or phrase is usually longer than the same word appearing in the middle.

e. Coupling. The effects of inappropriate coupling can often be observed in spectrograms as either diminished or enhanced energy in the frequency range between the first and second formants. Coupling is related to the open/close condition of the oral and nasal cavities. In normal speaking the nasal cavity is coupled to the oral cavity for nasal sounds, such as "n", "m", and "ng". However, some talkers are hypernasal, producing nasal-like characteristics in inappropriate vocal sounds; other speakers are hyponasal, producing limited nasal qualities even when appropriate.

f. Other. Plosives, fricatives, and inter-formant features should be spectrally compared between samples by the examiner. Other sounds such as inhalation noise, repetitious throat clearing, or utterances like "um" and "uh" can sometimes be compared to the known exemplar if they have been successfully replicated.

7.2 Aural Comparison.

7.2.1 Short-term memory. An aural short-term memory comparison must be conducted either by playing the two (2) samples on separate playback systems with a patching arrangement to allow rapid switching between them or by recording short phrases or sentences from each sample on the same recording. The short-term memory playback tape should contain all words used in the spectrographic comparison. The two (2) samples should be reviewed at approximately the same speech amplitude and with the same general frequency range. The frequency range may be normalized between the samples by using band pass filtering on the sample with the widest frequency range to duplicate the range found on the other sample.

7.2.2 Direct v. enhanced. When direct and enhanced copies have been produced, both should be aurally compared to the known or questioned sample. The examiner should recognize that though enhancement procedures often improve intelligibility, they can also produce changes, at times, that can make samples of the same talker sound somewhat different.

7.2.3 Pronunciation. Only similarly pronounced words should be compared between samples.

7.2.4 Intraspeaker consistency. The examiner must aurally compare similar words within each sample to determine if they are spoken in a generally consistent manner. If intraspeaker variability is present for a particular word, that word should not be compared to the other voice sample. If considerable intraspeaker variability is present in the entire sample, that sample should not be used for comparison. This is often the problem with disguised speech and known exemplars from uncooperative individuals.

7.2.5 Speech characteristics.

a. Pitch. See sect. 7.1.5.b.

b. Intonation. Intonation is the perception of the variation of pitch, commonly known as a melody pattern. Spontaneous conversation will normally exhibit this characteristic to a greater extent than a passage that is read by the speaker.

c. Stress/Emphasis. The stress or emphasis within the words of the sample should be similar for different recordings of the same talker when no disguise is present.

d. Rate. The rate of speaking under the same conditions is relatively constant for a particular talker. However, rates of reading, recitation, and conversation will normally vary for the same talker.

e. Disguise. Obvious vocal disguises can disqualify a sample for comparison purposes. The examiner should carefully analyze the characteristics of the disguise in a sample and then determine if it is possible to make a meaningful comparison with another sample, whether it also contains a disguised voice or not.

f. Mode. Certain speaker-dependent characteristics can be discerned from the mode in which a speaker initiates sounds. Speakers range from gradually to abruptly initiating voicing, which can reveal useful similarities and differences between two samples.

g. Psychological state. Listening usually reveals many of the effects of an altered psychological state upon the voice. Alterations may be characterized as nervousness, over-excitement, excessive monotone, crying, and so on. The examiner should be cautious in comparing samples with major changes due to an altered psychological state.

h. Speech defects. Speech defects are abnormalities in the voicing of sounds, and can include lisps, pitch and loudness problems, and poor temporal sequencing. Except for extreme cases, there are no criteria to assess whether a voice is considered normal or defective. Obvious, or even subtle, defects in the questioned or known voice samples can often provide vital information in the comparison decision.

i. Vocal quality. Vocal quality is the perception of the complex, dynamic interplay of the laryngeal voicing (pitch, intonation, and stress), articulator movement, and oral cavity resonances. Since each individual’s voice is relatively unique in its vocal quality, comparisons can provide important information regarding similarities and differences between the voice samples.

j. Other. Examples of other useful speech characteristics that are occasionally heard include long-term fluctuations of pitch (vibrato), vocal fry (extremely low pitching), pitch breaks, and stuttering.

7.3 Conclusions. Every aural/spectrographic examination conducted can only produce one of seven (7) decisions: Identification, Probable Identification, Possible Identification, Inconclusive, Possible Elimination, Probable Elimination, or Elimination. The following descriptions for each decision are the minimal decision criteria and must be adhered to by the examiner, except that a lower confidence level can always be chosen, even though the criteria would allow a higher degree of confidence. Within the range of probable decisions, the examiner may wish to clarify his or her findings (for example, low probability or high probability), depending upon the quantity and quality of the comparable material available. Comparable words must meet the previously listed criteria. The following are the seven (7) possible decisions.

7.3.1 Identification. At least 90% of all the comparable words must be very similar aurally and spectrally, producing not less than twenty (20) matching words. Each word must have three (3) or more usable formants. This confidence level is not allowed when there is obvious voice or electronic disguise in either sample, or the samples are more than six (6) years apart.

7.3.2 Probable Identification. At least 80% of the comparable words must be very similar aurally and spectrally, producing not less than fifteen (15) matching words. Each word must have two (2) or more usable formants.

7.3.3 Possible Identification. At least 80% of the comparable words must be very similar aurally and spectrally, producing not less than ten (10) matching words. Each word must have two (2) or more usable formants.

7.3.4 Inconclusive. Falls below either the Possible Identification or Possible Elimination confidence levels and/or the examiner does not believe a meaningful decision is obtainable due to various limiting factors. Comparisons that reveal aural similarities and spectral differences, or vice versa, must produce an Inconclusive decision.

7.3.5 Possible Elimination. At least 80% of the comparable words must be very dissimilar aurally and spectrally, producing not less than ten (10) words that do not match. Each word must have two (2) or more usable formants.

7.3.6 Probable Elimination. At least 80% of the comparable words must be very dissimilar aurally and spectrally, producing not less than fifteen (15) words that do not match. Each word must have two (2) or more usable formants.

7.3.7 Elimination. At least 90% of all the comparable words must be very dissimilar aurally and spectrally, producing not less than twenty (20) words that do not match. Each word must have three (3) or more usable formants. This confidence level is not allowed when there is obvious voice or electronic disguise in either sample, or the samples are more than six (6) years apart.
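
The numerical minimums in Sections 7.3.1 through 7.3.7 can be summarized in a short sketch. It is a simplified, hypothetical illustration: the function and parameter names are assumptions, min_formants is taken to be the smallest number of usable formants among the counted words, and the result is only the highest decision the minimums would allow. The examiner may always report a lower confidence level.

```python
def decide(total, similar, dissimilar, min_formants,
           disguise=False, years_apart=0.0, aural_spectral_conflict=False):
    """Highest decision that the Section 7.3 minimum criteria would allow.

    total       -- number of comparable words
    similar     -- words that are very similar aurally and spectrally
    dissimilar  -- words that are very dissimilar aurally and spectrally
    """
    if aural_spectral_conflict or total <= 0:
        return "Inconclusive"
    # The two top levels are barred by obvious disguise or a gap over six years.
    top_allowed = not disguise and years_apart <= 6
    for count, noun in ((similar, "Identification"), (dissimilar, "Elimination")):
        share = 100.0 * count / total
        if share >= 90 and count >= 20 and min_formants >= 3 and top_allowed:
            return noun
        if share >= 80 and min_formants >= 2:
            if count >= 15:
                return "Probable " + noun
            if count >= 10:
                return "Possible " + noun
    return "Inconclusive"
```

Because the similar and dissimilar shares cannot both reach 80% of the same word list, at most one side can qualify, and everything else falls through to Inconclusive.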

7.4 Second Opinion. A second opinion is not required, but may be obtained from another certified examiner when desired by either the examiner or the party submitting the evidence.

7.4.1 Independence. A second opinion must be completely independent of the first examiner's decision, and no oral or written information shall be provided regarding that first opinion.

7.4.2 Material provided. The second examiner should only be provided the originals, or direct and enhanced copies, any work notes under Sections 2, 3, and 4, and the spectrograms. The second examiner must not be provided any materials that reflect, even partially, the first examiner's opinions regarding the examination.

7.4.3 Examination. A thorough analysis should be conducted by the second certified examiner, using the guidelines in Sections 5, 6 and 7 (except for 7.4). It is left to the discretion of the second examiner whether to prepare additional spectrograms or copies.

7.4.4 Resolving differences. If different decisions are reached by the two (2) examiners, a detailed discussion between them of the analysis will often lead to a resolution. If not, the lower confidence level must be reported and testified to when both decisions are an identification or elimination. If split between an identification and an elimination, no matter what the confidence level, the decision must be inconclusive. A third independent decision can be obtained, but the result will be the lowest confidence level, or an inconclusive, of all the examiners involved.
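
The reporting rule for unresolved differences can be sketched as a small lookup. The names below are hypothetical, and one case is an assumption rather than stated text: when any examiner reports Inconclusive, the sketch treats that as the lowest confidence level and reports Inconclusive.

```python
LEVELS = {
    "Identification": ("identification", 3),
    "Probable Identification": ("identification", 2),
    "Possible Identification": ("identification", 1),
    "Inconclusive": (None, 0),
    "Possible Elimination": ("elimination", 1),
    "Probable Elimination": ("elimination", 2),
    "Elimination": ("elimination", 3),
}


def resolve(decisions):
    """Decision to report when the examiners cannot reconcile their results."""
    if not decisions:
        raise ValueError("at least one decision is required")
    parsed = [LEVELS[d] for d in decisions]
    sides = {side for side, _ in parsed}
    # A split between identification and elimination, or any Inconclusive
    # (the latter is an assumption), is reported as Inconclusive.
    if None in sides or len(sides) > 1:
        return "Inconclusive"
    side = parsed[0][0]
    lowest = min(level for _, level in parsed)
    for name, value in LEVELS.items():
        if value == (side, lowest):
            return name
```

The same function covers a third opinion, since it simply takes the lowest level among all decisions supplied.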

7.4.5 Reporting. Whenever possible, the second examiner should prepare a short report listing the results of the second opinion. This is not necessary if both examiners are in the same organization. The name and results of the second opinion can then be included in the first examiner's work notes.

 

8 WORK NOTES.

8.1 Required Information. The examiner's work notes should be in accordance with Rule 26 of the Federal Rules of Evidence - Expert Witness Statement categories, and should contain, as a minimum, the following information:

a. Laboratory, case, and specimen identifiers;

b. Description of submitted evidence;

c. Chain-of-custody documentation;

d. Track determination, azimuth alignment, and speed accuracy information, where required, for each submitted sample;

e. Information on the duplication processes, including the type of equipment and format copies;

f. Information on the enhancement processes, if any, including the type of equipment, filter settings, and format copies;

g. List of the exact words used for comparison and whether they matched or not;

h. Name of any second opinion examiner and the results of that examination;

i. Final decision.

8.2 Retention. The work notes should be retained for at least three (3) years after completion of the examination unless the contributor has requested that all material relating to the case be returned.

9 REPORTING.

9.1 Format. The report should be typed, dated, and in a standard laboratory or business letter style. The content of the report should be in conformity with Rule 26 of the Federal Rules of Evidence. The following information must be included: a short description of the evidence being examined, a summary of the examination performed, the final decision, and a statement of accuracy. Exhibits, handouts and supporting documentation should be separate from the report. Business matters, such as payment of fees, should be set forth in separate communications and not included within the report.

9.2 Decision Statement. The report must clearly state which of the seven (7) decision options listed in Section 7.3 was the final result of the examination.

 

10 TESTIMONY.

The American Board of Recorded Evidence does not take a position as to whether or not a certified examiner should provide testimony regarding examination results. However, an examiner must follow the standards set forth in this document, including the appropriate criteria set forth in this section, whether they provide testimony, or not.

10.1 Testimony v. Investigative Guidance. Each specific organization or individual examiner must decide before conducting spectrographic voice identification examinations whether testimony will be provided. If not, the contributor must be advised of the investigative guidance policy and all oral and written reports should set forth this information.

10.2 Qualification List. The presentation of the qualifications of the examiner should be in conformity with Rule 26 of the Federal Rules of Evidence - Expert Witness Statement categories, regarding expert witnesses.

10.3 Pre-testimony Conference. Discussion of the examination with the attorney before judicial proceedings is an important aspect of providing meaningful testimony and educating the attorney on the strengths and limitations of the technique. The conference should include a candid discussion of the inherent problems, identification of scientific literature that is either critical or supportive, and other information important to the testimony.

10.4 Appearance and Demeanor. Whenever possible, examiners must dress in proper business attire or appropriate law enforcement or military uniform for all judicial proceedings, maintain a professional demeanor even under adversarial conditions, and direct explanations to the jury, when present.

10.5 Presentation. The examiner should provide to the judge and/or jury, as a minimum, his/her qualifications, an overview of the spectrographic technique, its scientific basis, the details of the analysis procedures followed in the specific case, and the results of the analysis. The information should be presented in a form understandable to non-experts, but with no loss of accuracy.

 


INFORMATION REGARDING STATUS OF FORENSIC VOICE IDENTIFICATION

April 10, 1996 by Lonnie Smrkovski and Steve Cain

Court Decisions:

The most recent appellate decisions include:

U.S. v Tonya and Tanya Smith, 869 F2d 348, 354, Circuit Court of Appeals for the 7th Circuit, 1989. Admissibility affirmed.

U.S. v Maivia, 728 F. Supp. 1471, District of Hawaii, 1990. Allowed expert testimony by both prosecution and defense experts.

Earlier decisions include:

U.S. v Raymond and Addison, U.S. Dist. Court of Appeals, 498 F2d 741, 743, 744, 15 CrL 2248, 1974. Held evidence inadmissible but affirmed conviction.

U.S. v Frank, 6th Circuit Court of Appeals, 511 F2d 25, 16 CrL 2499, 1975. Held evidence admissible.

U.S. v Baller, 4th Circuit Court of Appeals, 519 F2d 463, 17 CrL 2359, 1975.

U.S. v McDaniel, Court of Appeals for the District of Columbia, 538 F2d 408, 19 CrL 2234, 1976. Held evidence inadmissible but affirmed conviction.

U.S. v Brown, Court of Appeals for the District of Columbia, 557 F2d 541, 21 CrL 2356, 1978. Held evidence inadmissible but affirmed conviction.

U.S. v Williams, 2nd Circuit Court of Appeals, 583 F2d 1194, 1199. Held evidence admissible.

Present Use of Forensic Voice Identification:

Forensic voice analysis for the purpose of identification and elimination has been and continues to be applied in criminal and civil cases on a regular basis since 1971. The method has been extremely useful in identifying and eliminating suspect voices. Errors of identification by lay persons occur in many cases because voices can be sufficiently similar in pitch, vocal quality resonance, and dialect to fool the human ear, e.g., identical twins, other siblings, parent and child, as well as unrelated speakers. The acoustic environment and transmission carrier bandwidth, transmitter and receiver, and recording device can impact on the sound of the voice.

Aural/spectrographic analysis has been paramount in eliminating falsely accused suspects in both criminal and civil cases and subsequently applied to identify the responsible caller. Following ethical standards and applying adopted protocol and standards of the International Association for Identification, certified examiners conduct voice analysis and make determinations completely independent of a perhaps desired result by clients. The examiner conducts analysis in search of a factual finding.

Aural/spectrographic voice analysis is used by major foreign police agencies including the Royal Canadian Mounted Police, the Japanese National Police Research Center, the Israeli National Police - Jerusalem, the Spanish National Police - Madrid, the Ministry of Interior - Forensic Laboratory in Saudi Arabia, the Ministry of Justice Science Research Center - Taipei - Taiwan, the Ministry of the Interior - France, the Dubai Police Force Crime Laboratory - Dubai - United Arab Emirates, and the Italian National Police - Rome. The agencies in Japan, Israel, Taiwan, Spain, Canada, Italy, and United Arab Emirates are offering and giving expert testimony in their respective courts of law. The status of court use in France and Saudi Arabia is not known. Most of the examiners in the countries listed have scientific degrees to include PhD and are considered to be within the scientific community. Examiners in Canada, Japan, and Italy have been employing this method of identification/elimination for approximately 15 years.

Prior to retirement Dr. Len Jansen worked for the South African National Police as a forensic voice examiner. He continues in private practice since he left the police agency. He, too, is a scientist and has been working in this area for about 15 years.

The Ministry of Interior, Directorate of Security, Ankara, Turkey has 3 degreed persons presently being trained in the United States, and the Gendarmeria in Argentina has requested training for 3 people. The Vietnamese government recently purchased equipment for forensic voice analysis and likewise is requesting training.

Aural/spectrographic voice analysis has been available from police agencies, several private laboratories, and several universities in the United States since 1970 and continues to be available to the public and private sector.

Prepared by Lonnie L Smrkovski, Institute for Forensic Voice and Tape Analysis, Lonnie L Smrkovski & Associates, 4829 Tartan Lane, Holt, MI 48842.

Steve Cain, President, FTA Inc., 638 W Main St, Lake Geneva, WI 53147

 


"The Use of Voice as a Forensic Tool"

National District Attorneys Association, NDAA Bulletin, Vol. 10 No. 6 Nov./ Dec. 1991 by Steve Cain

Selected Forensic and Physical Evidence Experts

This is the first of two articles on the use of the voice as a forensic tool by prosecutors. This article deals with the use of voiceprints in linking suspects to a crime or crime scene. A subsequent article will deal with tape enhancement and forensic tape authentication.

The caller on the 911 line was requesting an ambulance because, he said, he thought the woman in Apartment 204 was badly injured or perhaps dead. The caller did not identify himself.

The case turned out to be murder and later, when the police had a suspect, the district attorney used a spectrograph to check the suspect's voice with the tape-recorded voice of the 911 caller.

Bingo! The voiceprints matched and now the DA had some evidence to work with.

Many murder cases that hinge on the matching of voiceprints involve tape-recorded 911 calls, according to Steve Cain, one of the nation's leading experts on voiceprint (or spectrograph) technology.

"Often it's the murderer who calls," he says, "because he feels sympathy for his victim and he calls to get an ambulance there before the victim dies. Or he may call to taunt the police and say, 'Hey, you s.o.b.s, that's one more I killed and you can't catch me.' Quite often the only clue the police have is that 911 call, with a tape from a recorder of not good quality. And we've got to clean it up, take out the extraneous noise and then get a sample of the suspect's voice, for comparison. The suspect cannot refuse to provide a voice sample. Under the law he's required to do it. All you need is a court order. Then we make the match and go to court and testify against him."

Cain is president of Applied Forensic Technologies International of Chicago and Lake Geneva, Wisconsin, which specializes in voiceprint and audio tape analysis, tape authentication, polygraph examinations and analysis of questioned documents. He has more than 25 years experience in forensic science, including service as an agent in both the U.S. Secret Service and Internal Revenue Service. He was chief of the voiceprint units and senior document examiner for both the Secret Service and IRS as well as chief polygraph examiner for the IRS, with extensive courtroom testimony experience, including California's notorious Hillside Strangler case.



As Cain explained in an article he wrote for the Criminal Division of the U.S. Department of Justice (in collaboration with Lonnie Smrkovski, chief of the voiceprint unit of the Michigan State Police, and Mindy Wilson, a psychologist and private examiner practicing in Lansing, Michigan), the fundamental principle of voice identification rests on the fact that like a fingerprint, every voice is unique and "individually characteristic enough to distinguish it from others through...analysis." Fingerprints are identified through literal analysis; voices are identified through comparative voiceprints. Cain points out that uniqueness in human speech is the product of two general factors.

"The first," he says, "lies in the sizes of the vocal cavities such as the throat, nasal and oral cavities and the shape, length and tension in an individual's vocal cords located in the larynx. The vocal cavities are resonators, much like organ pipes, which reinforce some of the overtones produced by the vocal cords, which produce formants or voiceprint bars. The likelihood that two people would have exactly the same size and configuration (is) very remote."

The second factor in determining voice uniqueness is the manner in which the "articulators" or muscles of speech are manipulated when an individual is talking. The articulators include the lips, teeth, tongue, soft palate and jaw muscles, "whose controlled interplay," Cain explains, "produces intelligible speech...The likelihood that two persons could develop identical use patterns of their articulators also appears to be very remote."

While Cain agrees that "there is disagreement in the so-called 'scientific community' on the degree of accuracy with which examiners can identify speakers under all conditions, there is agreement that voices can, in fact, be identified."

Several studies have been published on the reliability of voice identification. A Federal Bureau of Investigation survey of its own performance in the voiceprint examination of 2,000 forensic cases revealed an error rate of 0.31 percent for false identifications and 0.53 percent for false eliminations.

The process of identifying voices visually involves translating the wave patterns produced by the voice into a pictorial display called a spectrogram. The spectrogram serves as a permanent record of the words spoken and facilitates the visual comparison of similar words spoken by an unknown and known speaker's voice.
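To make the idea of a pictorial display concrete, the following is a minimal sketch of how a digital spectrogram can be computed: the signal is cut into short overlapping frames, and the strength of each frequency in each frame is measured. This is an illustration under stated assumptions, not the procedure of any particular spectrograph. The function name `spectrogram`, the 8,000-samples-per-second rate, the frame sizes, and the synthetic 1,000 Hz tone standing in for recorded speech are all assumptions.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of frequency magnitudes per time frame.

    Each inner list covers bins 0..frame_len//2; bin k corresponds to
    k * sample_rate / frame_len hertz.
    """
    # A Hann window tapers each frame so its edges do not smear energy
    # across neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            # Direct discrete Fourier transform of one frame (slow but simple).
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


SAMPLE_RATE = 8000  # assumed sampling rate, in samples per second
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(512)]

spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

For this test tone the strongest bin in the first frame is bin 8, which corresponds to 1,000 Hz. Plotting the magnitudes with time on one axis and frequency on the other gives the familiar picture; in recorded speech the dark horizontal bands are the resonances of the vocal cavities.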

Is it possible for a person to "fool" a spectrograph (the device that produces the spectrogram)?


Not really, Cain says.

"You can disguise your voice," he notes, "but you're not fooling the spectrograph; you're just not giving a parallel sample. If you distort your natural speaking voice to the point that you're not giving parallel voice samples you're really not comparing apples and apples. You're comparing apples and oranges. An experienced operator would notice this immediately. If I see this I won't stand for it and I will tell the court I will not accept such a sample and often they'll throw the defendant in jail for failing to comply with the district attorney's request for a natural, undisguised sample."

Cain says that it's essential that speech samples contain exactly the same words and phrases as those in the questioned sample, because only identical speech sounds are used for comparison. He says the suspect should not be allowed to read the phrases from a transcript but should repeat each phrase after it is spoken by someone else. To avoid an unnatural response, the suspect should repeat the first phrase and proceed in the same manner with each successive phrase.

What are the limits of the accuracy of voiceprints?

"The limits," says Cain, "generally are the quality of the evidence itself. It's like any other pattern-matching skill, such as handwriting. You have to have good samples."

Do voiceprints compare in accuracy to fingerprints?

"If done properly, yes, in my opinion," says Cain, who adds, "However, with fingerprints you have static images that don't change unless some damage is done to the fingerprint ridge detail. In voiceprints these are dynamic qualities. For example, when you say good morning to your wife or husband early in the morning before you've had your first cup of coffee and then say it again later in the morning there will be some changes in the pitch of your voice and how you stress certain vowels. That's why we get several repetitions of a speaker's voice, saying the same thing, so we can find the range of variation."

Courts have repeatedly held that requiring a suspect to submit to voice samples for the purpose of comparison does not violate the suspect's Fifth Amendment rights. The definitive case is U.S. v. Wade, 388 U.S. 218, 87 S. Ct. 1926 (1967), which held that the privilege against self-incrimination offers no protection from compulsion to submit to speaking for voice identification or to writing, photographing, fingerprinting and measurements.

Voiceprints are gaining progressively more approval in the federal courts. The second, fourth, sixth and seventh circuits already have approved and the ninth has given a tentative OK. Voiceprints are not admissible in California state courts, Cain notes, because, he says, "The proponents made stupid errors in overstating the accuracy of the tests in a hypothetical that a good defense attorney posed for them. This was a mistake and it has plagued us for years."


Steve Cain's advice to district attorneys contemplating the use of voiceprints in cases where such evidence is admissible and would be valuable:

"Have the FBI or someone like myself in private practice examine the tape and extract all the extraneous noises and clean it up, so you get a clean, clear sample. And by all means have an experienced examiner take the voiceprints."

Next: Forensic tape enhancement and authentication.

"Tape Enhancement"

National District Attorneys Association, NDAA Bulletin, Vol. 11 No. 1 Jan./Feb. 1992 by Steve Cain

This is the second of two articles on the voice as a forensic tool by prosecutors. This article deals with the enhancement and authentication of tape-recorded voices and is adapted from two articles on the subject by forensics specialist Steve Cain, president of Applied Forensic Technologies, Intl., plus discussions with Cain.

You're a prosecutor preparing a major drug or conspiracy case and one of your principal pieces of evidence is the tape-recorded conversations of the alleged conspirators.

The problem is that the quality of the tape — as the result of background noises and other factors— is so marginal that you run the risk of the jury discounting it.

What do you do?

According to Steve Cain, president of Applied Forensic Technologies, Intl., and one of the nation's leading experts on voiceprint (or spectrograph) technology, you have tape quality enhanced and authenticated by qualified specialists.

"An integral part of any voice identification task," says Cain, "is to attempt to ensure that the most intelligible speech samples are available for comparison purposes.

All too often, however, the limitations of surveillance, recorder, microphone, and adverse room and reverberation effects severely degrade the audio signal. Through the proper selection of a variety of analog and digital tape filtering devices, unwanted sounds often can be attenuated."

The output signals of tape recorders can be damaged by three general factors: noise, interference and distortion, each of which is caused by a specific condition. In addition, there are what Cain calls "adverse forensic influences" that include the bandwidth equalization of telephone lines that limits voice frequencies to between 300 and 3500 hertz.

To reduce or eliminate various noise and distortion sounds from an audio tape, forensic audio specialists use a variety of filters. For example, when a tape hiss occurs within the speech frequency range, a so-called low-pass filter is used to eliminate it. When what is called a "low-end rumble" occurs, an appropriate high-pass filter is used. "Comb filters" are used to reduce harmonically related noise such as a power supply hum.
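The three filter types mentioned above can be sketched in a few lines. This is a simplified illustration rather than a description of the equipment forensic specialists actually use: the one-pole designs, the smoothing factor of 0.2, the 6,000-samples-per-second rate, the 60 Hz hum frequency, and the function names are all assumptions, and the test signals are synthetic tones rather than recorded speech.

```python
import math


def low_pass(samples, alpha):
    """One-pole low-pass: each output moves a fraction alpha toward the input,
    so fast (high-frequency) changes such as hiss are smoothed away."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def high_pass(samples, alpha):
    """Complementary high-pass: the input minus its slowly varying part,
    which removes low-end rumble."""
    return [x - low for x, low in zip(samples, low_pass(samples, alpha))]


def comb_notch(samples, delay):
    """Feedforward comb: subtracting the signal delayed by one hum period
    cancels the hum fundamental and every harmonic of it."""
    return [x - (samples[n - delay] if n >= delay else 0.0)
            for n, x in enumerate(samples)]


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


RATE = 6000   # assumed sampling rate, samples per second
HUM_HZ = 60   # assumed power-line hum frequency


def tone(freq, count=3000):
    return [math.sin(2 * math.pi * freq * i / RATE) for i in range(count)]


# Hum made of a 60 Hz fundamental plus its 120 Hz harmonic.
hum = [a + 0.5 * b for a, b in zip(tone(60), tone(120))]
period = RATE // HUM_HZ
hum_left = rms(comb_notch(hum, period)[period:])

# Fraction of a high-frequency "hiss" tone surviving the low-pass filter,
# and of a low-frequency "rumble" tone surviving the high-pass filter.
hiss_gain = rms(low_pass(tone(2500), 0.2)) / rms(tone(2500))
rumble_gain = rms(high_pass(tone(20), 0.2)) / rms(tone(20))
```

In this sketch the comb filter leaves essentially nothing of the hum once the first period has passed, and the low-pass and high-pass filters each cut their unwanted tone to a small fraction of its original level. A simple comb of this kind also alters frequencies between the hum harmonics, which is one reason practical enhancement work relies on more carefully designed filters.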

"Because of inadequate equipment and poor operator technique," Cain says, "a host of...factors besides noise can reduce speech intelligibility. Improper recorder speed or transient mechanical problems, along with poor quality tape or unstable AC power, all can contribute to a poor quality recording."

The tape enhancement process, therefore, must start with examination of the equipment used and the recording tape speed. Once a tape is "cleansed" of interfering noises to make the recorded voices or other pertinent sounds as clear as possible, it usually must be authenticated before it can be introduced as evidence in court, to avoid any charges of illegal tampering.

Probably the most famous tape authentication examination was the one conducted in 1974 by a group of forensic experts appointed by then U.S. District Judge John Sirica in the Watergate case to examine the disputed 18-minute gap in a White House recording.

With the increasing number of drug-related and money laundering cases being prosecuted by federal, state and local prosecutors, the use of tape-recorded conversations and related sounds is increasing correspondingly, calling— Cain says — for professional examination of tapes requiring enhancement and authentication.

As might be expected, the tape authentication procedures developed and suggested by the FBI Technical Services Division's Signal Analysis Branch are detailed and voluminous.

The requirements include sworn testimony on the circumstances of the recording and equipment used, the original tape and recording device, written records of any damage, maintenance and repairs; detailed statements by the operator on the technical conditions existing at the time of the recording, including such factors as the power source, background environment, condition of the tape, etc.

The FBI suggests that forensic experts carefully examine the recorder right down to the marks left by ferrofluids that adhere to magnetic poles. Each recorder leaves a distinctive "fingerprint" in the form of electronic imprints along the tape surface. These imprints — unless physically altered — are identifiable among different records.


Finally, the FBI suggests a physical as well as spectrographic (wave form) examination of the tape to make a subjective determination. The physical examination involves a trained examiner listening to perceived pitch, quality, rate of speech, mannerisms, amplitude, breathing patterns, syllable couplings, background sound variation, hum and other acoustic effects, such as room reverberation. That done, you and your tape are ready for the trial.

Published in NDAA Bulletin, Jan./Feb. 1992, Vol. 11 No. 1


SOUND RECORDINGS AS EVIDENCE IN COURT PROCEEDINGS

By Steve Cain Prosecutor Magazine - September/October 1995

INTRODUCTION

Federal Rule of Evidence 901(a) provides in general terms that the requirement of authentication or identification as a condition precedent to the admissibility of evidence is satisfied by proffered proof sufficient to support a finding that the matter in question is what its proponent claims it to be. A foundation for authentication of sound recordings was established in the federal courts in United States v. McKeever,*1* and upheld in cases such as United States v. McMillan.*2* In McMillan the court ruled that where a government agent testified that he heard the voice of an informant at all times when he was making a recording of a telephone conversation, that this part of the conversation was accurate, and that immediately after the telephone calls were completed, a tape was replayed by the agent in the informant's presence to verify that the conversation had in fact been recorded and that the instruments were operating correctly, it was sufficiently established that the recordings were true and accurate as a basis for their admission in evidence.

In United States v. Kandiel,*3* the court ruled that any question concerning the credibility of a witness who identifies voices on a tape recording admitted into evidence simply goes to the weight which the jury accords the evidence, not its admissibility.

Referring to McMillan the court said:

Applying [the McMillan case], we conclude that the government laid a proper foundation for introduction of the two cassette tapes into evidence. The tapes were found at appellant's home. Ahmed Kandiel [defendant's brother] testified that the tapes were made in Egypt and sent to appellant by their mother and father while Ahmed was living with appellant. The contents of the tape recordings have numerous references to people, places and activities that were corroborative of other testimony in the record. We believe the government has offered sufficient circumstantial evidence to establish the prima facie authenticity and correctness of the tapes.

Furthermore we find that the government sufficiently established the identity of the speakers through the testimony of Ahmed Kandiel. Appellant's argument that Ahmed's credibility was suspect, and that therefore his testimony was insufficient to establish foundation for the admission of the tapes is without merit. Identification of a voice, whether heard firsthand or through mechanical or electronic transmission or recording, may be made "by opinion based upon hearing the voice at any time under circumstances connecting it with the alleged speaker." Fed. R. Evid. 901(b)(5). Any question concerning the credibility of the identifying witness simply goes to the weight the jury accords this evidence, not to its admissibility. United States v. Kirk, 534 F.2d 1262, 1277 (8th Cir. 1976), cert. denied, 433 U.S. 907, 97 S. Ct. 2971, 53 L. Ed. 2d 1091 (1977). Our review of the record convinces us that the district court did not abuse its discretion in finding that proper foundation was laid for admitting the tapes. See United States v. Johnson, 767 F.2d 1259, 1271 (8th Cir. 1985).*4*

The cases are, therefore, now in general agreement as to what constitutes a proper foundation for the admission of a sound recording and indicate a reasonably strict adherence to the rules prescribed for testing admissibility of recordings, as set forth in McMillan.*5*

These rules can be summarized as follows:

The recording device must have been capable of taking the conversation now offered in evidence

The operator of the device must be competent to operate the device

The recording must be authentic and correct

Changes, additions or deletions have not been made in the recording

The recording must have been preserved in a manner that is shown to the court

The speakers must be identified

The conversation elicited was made voluntarily and in good faith, without any kind of inducement.*6*

THE BASIC PROCESS

Over the past 35 years, attorneys have utilized the basic process set forth in McMillan to create cases for admission of tapes or, on the opposition side, to deny admission of tape evidence.

This process involves the following elements:

Capability of the recording device: this first requisite may be fulfilled simply. The very existence of the tape recording proves that the recording device was functioning and capable of duplicating sounds.*7*

Competency of the operator: today most people know how to operate a tape recorder so this step is almost automatic. In United States v. McCowan,*8* the agent merely testified that he learned how to use the recorder on the day he made the tapes. The fact that he successfully made the recordings satisfied the competency requirement.

Authenticity and correctness of the recording: authentication is satisfied by evidence sufficient to support a finding that the matter in question is "what its proponent claims,"*9* as decreed in Federal Rule of Evidence 901. The standard for correctness of a recording is whether "the possibility of misidentification and adulteration [is] eliminated, not absolutely, but as a matter of reasonable probability."*10*

Preservation of the recording with no additions, deletions or changes: an aural overview of the tape allows the court to hear signs (i.e., gaps) which might indicate tampering. If there exist signs of tampering, a forensic expert is often consulted. If there are no signs of tampering, a proper chain of custody documentation may suffice.*11*

Chain of custody: this fifth step has created stumbling blocks for proponents of admissibility. The proponent for the tape's admittance can assure the court that the item offered as evidence is substantially the same as it was originally by documenting its "chain of custody." A proper chain of custody begins with consecutively numbered and dated tapes. Careful logs are then kept which note the time of particular conversations and the locations on the tapes at the time of occurrence. These evidence tapes are sealed and stored in separate envelopes and appropriate chain of custody records are maintained by the evidence custodian.*12*

Identification of the speakers: Federal Rule of Evidence 901(b)(5) states that: "Voice identification is adequate if made by a witness having sufficient familiarity with the speaker's voice." The rule goes on to clarify that familiarity may be obtained previous to or after listening to the recorded voice. This standard for voice identification has been upheld in cases such as United States v. Rizzo, United States v. Bonanno, and United States v. Hughes.*13*

Voluntary elicitation of the recorded conversation: as long as one participant in the conversation is aware that he is being recorded, the tape fulfills this final requirement. This means that a defendant's Fourth Amendment rights are not violated when the conversation is electronically monitored by a government agent with consent of the government informant in the investigation.*14*
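The record-keeping behind the chain-of-custody element above lends itself to a simple consistency check. The sketch below is only an illustration of the idea, assuming a hypothetical log in which each transfer names who released the tape, who received it, and the date; the class and function names are invented for the example and do not describe any agency's actual system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Transfer:
    """One hand-off of a sealed evidence tape (hypothetical record layout)."""
    released_by: str
    received_by: str
    on: date


def custody_problems(first_holder: str, transfers: List[Transfer]) -> List[str]:
    """Return a description of every break in the chain; empty means unbroken."""
    problems = []
    holder, last_date = first_holder, None
    for t in transfers:
        # The person releasing the tape must be the person who last held it.
        if t.released_by != holder:
            problems.append(
                f"{t.on}: released by {t.released_by}, but holder was {holder}")
        # Transfers must be logged in date order.
        if last_date is not None and t.on < last_date:
            problems.append(f"{t.on}: dated before the previous transfer")
        holder, last_date = t.received_by, t.on
    return problems


def missing_tape_numbers(tape_numbers: List[int]) -> List[int]:
    """Tapes should be consecutively numbered; report any numbers skipped."""
    ordered = sorted(tape_numbers)
    return [n for prev, cur in zip(ordered, ordered[1:])
            for n in range(prev + 1, cur)]


log = [
    Transfer("Agent A", "Custodian", date(1995, 3, 1)),
    Transfer("Custodian", "Examiner", date(1995, 3, 8)),
    Transfer("Agent B", "Custodian", date(1995, 3, 20)),  # Examiner never released it
]
breaks = custody_problems("Agent A", log)
gaps = missing_tape_numbers([1, 2, 4, 5])
```

Here `breaks` contains one entry, for the third transfer, and `gaps` is `[3]`. A clean log produces empty lists for both checks.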

ADMISSIBILITY OF INAUDIBLE SOUND RECORDINGS

It is a general rule that a sound recording is admissible unless the inaudible portions or omissions are so substantial as to render the recording as a whole untrustworthy as evidence.*15* It has further been established that the question of admissibility of audible portions of tape recordings, when certain portions were inaudible, was properly addressed to the discretion of the trial court.*16*


RECENT COURT RULINGS

The historical process set out above, as first established in United States v. McMillan, is widely used today even though several recent court decisions provide more relaxed rulings on admissibility.

For example, in United States v. Traficant*17* the court stated that: "Recent cases have developed more flexible standards for the admission of tape recorded conversations. The most important criterion for admission is that the tapes accurately reflect the conversation which they purport to record .... This evidence may be circumstantial or direct, real or testimonial, and need not conform to any particular mode."

Therefore, according to the more liberal admission rulings, a tape recording may be admitted into evidence if a proper chain of custody is proven. Or, if the chain is not strong enough, the proponent of the tape may submit it to a qualified forensic expert for authentication. In United States v. King,*18* the United States Court of Appeals for the Ninth Circuit characterized the elements of the process as "useful, but not dispositive guidelines for determining when a proper foundation for the introduction of sound recordings has been made." The Ninth Circuit said that the trial court, in the exercise of its discretion, must be satisfied that the recording is accurate, authentic and generally trustworthy.

EXAMINATION REQUESTS AND REQUIRED EQUIPMENT

When an audio tape is suspected of having been tampered with, it may be forwarded to a qualified forensic audio specialist for authentication. Prosecutors often request investigation of deficiencies in the previously mentioned process. Examples of such problems are:

Credibility questions relating to the tape recorder operator

Chain-of-custody contradictions

Differences between the content of the tape and testimony of what was said.

Most often, however, a forensic expert is contacted when the tape is believed to have been altered or tampered with. Due to the nature of the allegations surrounding tampering issues, the examiner will require specific items from the party submitting the tape.

The Federal Bureau of Investigation, for example, has a protocol of required information, including:

The original tape

The tape recorders and related components used to produce the recording

Written records of any damage or maintenance done to the recorders, accessories and other submitted equipment

A detailed statement from the person or persons who made the recording, describing exactly how it was produced and the conditions that existed at the time, such as:

1. Power source, including a portable generator or dry-cell batteries

2. Input, such as telephone, radio frequency transmitter/receiver, miniature microphone, etc.

3. Environment, such as telephone transmission line, restaurant, apartment, street, etc.

4. Background noises, such as television, radio, unrelated conversations, computer games, etc.

5. Foreground information, such as number of individuals involved in the conversation, general topics of discussion, closeness to microphone, etc.

6. Magnetic tape, such as brand, format, when purchased, whether previously used

7. Recorder operation, such as number of times turned on and off in the record mode, type of keyboard or remote operations for all known recorded events, use of voice-activated features, etc.

A typed transcript of the entire recording or, if that is not available, transcriptions of the portions in question. The items listed above are examples of what is required by a forensic expert as she begins an examination of questioned audio recordings.

Extraneous voices: background voices which at times appear to be as near as the primary voices (these can, at times, even block the primary voices).

TECHNICAL DEFINITIONS

Certain technical definitions should be understood by prosecutors and others in considering the technical process of examining sound recordings. They include the ones listed below.

FALSIFICATION OF TAPES

A qualified forensic expert determines authentication by performing a number of scientific tests which detect evidence of tampering or falsification. The four basic types of tampering are these:

Deletion: the elimination of words or sounds by stopping the tape and over-recording unwanted areas

Obscuration: the mixing in of sounds of amplitude sufficient to mask waveform patterns which originally would show stops and starts in inappropriate places

Transformation: the rearranging of words to change content or context

Synthesis: the adding of words or sounds by artificial means or impersonation.

ELECTROMECHANICAL INDICATIONS OF FALSIFICATION

These are sometimes referred to as "anomalies" and include the following:

Gaps: segments in a recording which represent unexplained changes in content or context (a gap can contain buzzing, humming or silence)

Transients: short, abrupt sounds exemplified by clicks, pops, etc. (transients may indicate tape splicing)

Fades: gradual loss of volume (fades can cause inaudibility and are considered gaps when the recording becomes fully inaudible)

Equipment sounds: inconsistencies of context caused by the recording equipment itself (common equipment sounds include hums, static, whistles, and varying pitches)
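The first two anomaly types listed above can be illustrated with a minimal Python sketch. It is not the examiner's actual procedure: the function names, the thresholds, and the simplification that a gap is a run of near-silent samples (the text notes a gap can also contain buzzing or humming) are all assumptions made for illustration.

```python
from typing import List, Tuple


def find_gaps(samples: List[float], silence_level: float,
              min_length: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs for runs of near-silent samples.

    A run counts as a gap when |sample| stays below silence_level for at
    least min_length consecutive samples. The end index is exclusive.
    """
    gaps = []
    run_start = None
    for i, s in enumerate(samples):
        if abs(s) < silence_level:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            if run_start is not None and i - run_start >= min_length:
                gaps.append((run_start, i))
            run_start = None
    # Handle a quiet run that lasts until the end of the recording.
    if run_start is not None and len(samples) - run_start >= min_length:
        gaps.append((run_start, len(samples)))
    return gaps


def find_transients(samples: List[float], jump_threshold: float) -> List[int]:
    """Return indices where the sample-to-sample jump exceeds jump_threshold."""
    return [i for i in range(1, len(samples))
            if abs(samples[i] - samples[i - 1]) > jump_threshold]


# Hypothetical, hand-made signal: speech-like values, a silent stretch, a click.
demo = [0.5, 0.4, 0.0, 0.0, 0.0, 0.0, 0.5, -0.9, 0.4]
print(find_gaps(demo, silence_level=0.05, min_length=3))
print(find_transients(demo, jump_threshold=1.0))
```

Flagged locations are only candidates for closer listening and instrumental analysis; a flagged gap or click is not by itself evidence of tampering.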

DETECTING FALSIFICATIONS

A forensic expert is trained to detect falsifications and to authenticate sound recordings. The expert correlates his observations of anomalies with machine functions to interpret events in the following ways.

Critical listening: this involves the use of human analytical capabilities to locate anomalies. The forensic expert listens with proper headphones to the original tape using high-quality analytical equipment. He first performs a preliminary overview of the original tape and notes events, including starts, stops, speed fluctuations, and other variations requiring further investigation. He then examines recorded events and categorizes them as environmental or non-environmental. After examining recorded events, the expert analyzes background sounds. He listens for abnormal changes, absences or the presence of environmental sound. The final phase of critical listening is an extensive audit of the foreground information. He concentrates on voices, conversations and other audible sounds. Here anomalies include sudden changes in a person's voice, abrupt unexplained topic change or strong foreground interruptions indicative of obscuration. The initial forensic process of critical listening provides foundation and direction for later intensive instrumental tests.

Physical inspection: the forensic expert next checks for tampering through a thorough visual inspection of the tape itself. She inspects the housing for pry marks and welding, and confirms that its size, label and date are consistent with the alleged recording date. She also measures the tape and assures that the splicing of the magnetic tape to the leader is consistent with a normal manufacturing process. Any other splices are noted as possible alterations.

Magnetic development: direct visual observation of the "developed" tape is conducted to find track widths, the type of recorder used and the presence or absence of residual speech signals.

Spectrum analysis: specialized computer equipment and programs are used to produce frequency-versus-amplitude and frequency-versus-amplitude-versus-time displays. This allows the expert to view the entire spectrum or to zoom in on an area of particular interest, thereby helping to characterize the acoustic quality of anomalies and identify their source.

Waveform analysis: a computer generated display representing time-versus-amplitude of recorded sounds in graphic form. With such analysis the expert can often measure signal return time, which reveals how long a recorder had been turned off. He can identify record-mode events, including the measurement of record-to-erase-head distance, determination of the spacing between gaps in multiple-gap erase heads and inspection of the signature shape and spacing of various record event signals.

Recorder performance: various electrical and mechanical measurements of standard and modified recorders for use in finding possible origins of buzz sounds, hum, etc.

CONCLUSION

In order to submit sound recordings as evidence in court, a prosecutor or other attorney must establish that the tape is an authentic representation of the conversation it is said to record. The traditional method of establishing authenticity involves maintaining a chain of custody which logs all persons, times and locations concerned in the creation of the tape. Then, the tape must be officially sealed and stored to complete a proper chain of custody. However, even if this procedure is strictly observed, there may still be challenges to the tape's authenticity.

The recording may contain inconsistencies suggestive of tampering. In such cases, a prosecutor may consult a qualified forensic examiner to inspect the tape. The examiner would initially listen critically for signs such as gaps, transients, fades, equipment sounds or extraneous voices which may indicate tampering. Then she would utilize other methods like physical inspection, magnetic development, spectrum analysis and waveform analysis to discover anomalies.

It is relatively easy to change the content of a recording by deleting words or sections, by obscuring meaning with over-recorded sounds, by transforming context through rearrangement of selected phrases, or by adding words through synthesis. Nevertheless, falsifications normally leave detectable magnetic and waveform acoustic signatures which can lead to forensic individualization of the evidential recorders and tapes.

ENDNOTES

1. United States v. McKeever, 169 F. Supp. 426, 430 (S.D.N.Y. 1958), rev'd on other grounds, 271 F.2d 669 (2d Cir. 1959).

2. United States v. McMillan, 508 F.2d 101, 104 (8th Cir. 1974), cert. denied, 421 U.S. 916 (1975); see also United States v. Kandiel, 865 F.2d 967, 973-974 (8th Cir. 1988), cert. denied, 487 U.S. 1210 (1988); Todisco v. United States, 298 F.2d 208 (9th Cir. 1962).

3. United States v. Kandiel, 865 F.2d 967, 973-974 (8th Cir. 1988), cert. denied, 487 U.S. 1210 (1988).

4. Kandiel, 865 F.2d at 974.

5. McMillan, 508 F.2d at 104.

6. Id. at 104.

7. United States v. Moss, 591 F.2d 428, 433 (8th Cir. 1979); United States v. McCowan, 706 F.2d 863 (8th Cir. 1983).

8. McCowan, 706 F.2d at 863.

9. Zenith Radio Corp. v. Matsushita Electrical Industries Co., 505 F. Supp. 1190 (E.D. Pa. 1980), and Finance Co. of America v. Bankamerica Corp., 493 F. Supp. 895 (D.C. Md. 1980).

10. Gass v. United States, 416 F.2d 767, 770 (D.C. Cir. 1969); United States v. Haldeman, 559 F.2d 31 (D.C. Cir. 1976).

11. United States v. Faurote, 749 F.2d 40 (7th Cir. 1984).

12. United States v. Craig, 573 F.2d 455 (7th Cir. 1977), cert. denied, 439 U.S. 820 (1978).

13. United States v. Rizzo, 492 F.2d 443 (2d Cir. 1974), cert. denied, 417 U.S. 944 (1974); United States v. Bonanno, 487 F.2d 654 (2d Cir. 1973); United States v. Hughes, 658 F.2d 317 (5th Cir. 1981).

14. United States v. White, 401 U.S. 745 (1971); United States v. Bonanno, 487 F.2d 654 (2d Cir. 1973); United States v. Bishton, 463 F.2d 887 (D.C. Cir. 1972); United States v. Quintana, 457 F.2d 874 (10th Cir. 1972), cert. denied, 409 U.S. 877 (1972); United States v. Holmes, 452 F.2d 249 (7th Cir. 1971), cert. denied, 405 U.S. 1016 (1972).

15. United States v. West, 948 F.2d 1042 (6th Cir. 1991); People v. Rogers, 543 N.E.2d 300 (Ill. 1989); State v. Rodriguez, 583 N.E.2d 795 (Ind. 1972).

16. United States v. Enright, 579 F.2d 980 (6th Cir. 1978); United States v. Gordon, 688 F.2d 42 (8th Cir. 1982).

17. United States v. Traficant, 558 F. Supp. 996, 1002 (N.D. Ohio, 1983).

18. United States v. King, 587 F.2d 956, 961 (9th Cir. 1978).

Forensic Technologies

by Steve Cain

From OACDL Vindicator 1995

Applied Forensic Technologies, a member of the National Association of Criminal Defense Lawyers, developed Task Descriptions as an educational tool for attorneys in specific areas of forensic evidence analysis:

1) Voice Identification- Comparison of an unknown voice against one or more known suspect voices for identification or elimination purposes. The majority of state and federal appellate courts have upheld the scientific reliability and validity of voice comparison opinions.

2) The use of analog and digital filters to improve the intelligibility of various voices appearing on a degraded tape recording (audio or video).

3) Tape Authentication- The examination of suspect audio and video tapes whenever questions arise concerning alleged editing/tampering of recorded material. The typical signs of alteration include unannounced stop/starts; over-recording (erasure of previously recorded material); pauses; and other suspicious record events.

Task Description: Video Image Processing

Today many activities are recorded by video cameras. Many times criminal acts are recorded both intentionally and unintentionally. The quality of the video images at times may require some form of image processing to reveal information which may be helpful. Image processing of videotape can be helpful by improving visual images or by verifying that the videotape is authentic.

Theory

Image processing performed to improve the visual qualities of an image by forensic examiners is not image manipulation. The two forms of processing performed by forensic examiners are enhancement and restoration. When enhancement is used, the image is improved visually by traditional photographic techniques. An example of image restoration would be removing the motion blur of an object to produce a sharp image of the object.

Real World Factors

Video images are considered low resolution. The fact that an image is low resolution means that only a limited amount of information can ever be obtained from it. The better the equipment used, the better the chances that an image can be processed. Poor images with signal-to-noise ratios of 30 dB or better have a good chance of being processed in a positive way.
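As a rough illustration of the 30 dB figure, the following Python sketch converts a signal-to-noise amplitude ratio to decibels and compares it with that threshold. The function names and the choice of an RMS amplitude ratio are assumptions; video signal-to-noise ratio is measured under several conventions, and the text does not say which one is meant.

```python
import math


def snr_db(signal_rms: float, noise_rms: float) -> float:
    """Signal-to-noise ratio in decibels from two RMS amplitudes (assumed convention)."""
    if signal_rms <= 0 or noise_rms <= 0:
        raise ValueError("RMS amplitudes must be positive")
    return 20.0 * math.log10(signal_rms / noise_rms)


def meets_threshold(signal_rms: float, noise_rms: float,
                    threshold_db: float = 30.0) -> bool:
    """True when the ratio reaches the 30 dB figure quoted in the text."""
    return snr_db(signal_rms, noise_rms) >= threshold_db


# Hypothetical measurements: an amplitude ratio of 100:1 is 40 dB, 10:1 is 20 dB.
print(meets_threshold(100.0, 1.0))
print(meets_threshold(10.0, 1.0))
```

Under this convention, 30 dB corresponds to a signal amplitude roughly 32 times the noise amplitude.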

If a video image is of good visual quality, further processing will only prove to degrade the image. Zooming in on specific areas of a video will not provide additional information in most cases. If a video image is of such poor quality that it is seriously over or under exposed or seriously out of focus, little can be done. Image processing is a tool which can improve quality but it cannot provide information which has not been recorded.

Factors to Consider

To perform image processing on a video image, the original tape must be obtained. Without the original, it is improbable that an image can be improved. Also the original video must be used if an examination for authenticity is to be made.

Many financial institutions have equipped their facilities with real-time or time-lapse video security systems to record crimes such as robbery or ready teller fraud. Accurate identification of persons or objects captured on tape may be compromised, however, by poor picture resolution, poor focus, glare, poor illumination, or by motion of the camera or subject. Digital processing of video tapes can, in many cases, enhance or restore such degraded images.

Video images (single-frame or full-motion) can be digitally enhanced with no risk of damage to the original recording. Image processing employs sophisticated computer hardware, high resolution monitors, and specialized printers. Originally degraded video signals are captured and converted to digital formats for storage and for subsequent high-speed computations that correct exposure and contrast errors, sharpen or smooth the edges of objects to make them more legible, zoom in or out to reveal detail, filter to emphasize or de-emphasize motion, and selectively highlight portions of images. Enhanced images can be stored or printed for use in criminal or civil investigations and court presentation.
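One simple way to picture a contrast correction is a linear contrast stretch, sketched below in Python. The text does not name the specific computations used, so this technique, the function name, and the 0 to 255 grayscale range are assumptions chosen for illustration. The function works on a copy of the pixel values, consistent with the point that the original recording is left untouched.

```python
from typing import List


def stretch_contrast(pixels: List[int], out_min: int = 0,
                     out_max: int = 255) -> List[int]:
    """Linearly rescale grayscale values so they span out_min..out_max.

    Returns a new list; the input list is never modified.
    """
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A completely flat image has no contrast to stretch.
        return list(pixels)
    scale = (out_max - out_min) / (hi - lo)
    return [round(out_min + (p - lo) * scale) for p in pixels]


# Hypothetical low-contrast scan line: values crowded between 64 and 128.
dim_line = [64, 80, 128]
print(stretch_contrast(dim_line))  # → [0, 64, 255]
```

The stretch spreads the recorded values over the full display range but adds no information that was not already captured.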

Image restoration, on the other hand, attempts to recover information lost at the time the recording was made, for example, because of focusing errors or relative motion between the camera and subject. Motion is particularly troublesome in security applications because some video cameras used for security work use combinations of shutter speeds and apertures far less adequate than in broadcast applications. Consequently, restoration of images degraded by motion requires mathematical routines capable of estimating the velocity and acceleration of the subject and compensating for such motion at the level of individual pixels or picture elements.

Success in image enhancement and restoration depends in part upon the technical quality of the original recording, and in part upon the systems available for recovering images. The video imaging analyst can quickly determine whether a video recording can be improved for use as evidence.

Task Description: Videotape Editing/Tampering Examination

Foundation Requirements

Videotape evidence must be material, relevant and competent to be admissible. In addition, videotape evidence is demonstrative evidence or evidence that can convey a relevant first-hand sense impression to the viewer. This evidence also must be authenticated or verified and shown to be a genuine, fair and accurate portrayal of what it is purported to reveal. Videotapes are defined as a form of photographic evidence under Rule 1001(2) of the Federal Rules of Evidence.

Generally the prevailing trend in case law today is clearly in favor of admissibility. The four key elements governing the admissibility of videotapes include:

1. relevance;

2. fairness and accuracy;

3. the exercise of judicial discretion with respect to probative value; and

4. issues of competency such as hearsay objections or violations of other exclusionary rules.

Technical Aspects of Videotape

The basic recording process proceeds from the capture of light and sound waves, to the generation of electrical impulses, and then to the magnetic storage of those impulses on the videotape. On playback, the signals on the magnetic tape are converted back into electrical signals that are capable of producing video and audio output. Most videotape consists of a plastic tape coated with magnetic particles such as iron oxide, along with other materials that affect its reproduction characteristics. At least four separate types of electrical impulses are stored on the videotape: 1) the luminance signal, which deals with the overall brightness of the visual image; 2) the chrominance signal, which contains the relative color values of the visual image; 3) the synchronization signal, which synchronizes the timing and information from other signals; and 4) the audio signal. Present day videotape and camcorder formats include VHS and S-VHS (in both Standard and 'C' cassette styles), 8 mm, Hi 8, and various Beta formats.

Methods of Alteration

Many camcorders also allow for recording of specific time and date information. Conventional post-production or out-of-camera videotape editing consists of imposing order on a sequence of scenes previously recorded. This is done by re-recording or "dubbing" sequences from the original videotape onto another videotape known as the "edit master." Traditional editing of videotape through cutting or splicing is rarely done as the effects are readily discernible by the viewer. Distortions resulting from varying playback speeds are very difficult to achieve with either conventional or digital video editing. The videotape may cause a "rolling" or smearing effect on the video portion unless it is played back at exactly 30 frames per second.

It should be noted that the editing or alteration of the audio component of videotape evidence should be considered separately from the video component. The audio component may be re-recorded onto an edit master. Many videotape formats have at least two separate audio tracks which lend themselves to fairly easy manipulation during any tampering effort. Telltale signs of the editing process may include poor synchronization of the new narration with the video component and with the ambient noise level. The audio signal in videotape is generally much less complex and contains much less information than the video signal and therefore is much easier to manipulate or fabricate. Using audio dubbing technology, it is possible to completely rearrange words, sounds and sentences spoken by an individual to produce audio segments with unintended, opposite and legally detrimental meanings.

Depending upon the skill involved in the editing process, a forensic expert can often detect that videotape evidence has been altered. Signs to look for include: significant changes of volume, content, or continuity with either the main speaker's words or background sounds; sudden, strange or unaccounted for sounds; and the audio component not fully synchronized with the relevant video component. The expert should have ready access to appropriate laboratory equipment such as microscopes, special cameras, cross-pulse monitors, waveform monitors, oscilloscopes, and other pertinent laboratory equipment.

Traditionally, the alteration or editing process involved a rearrangement of the analog information contained in the video image although it was practically impossible to move, change or tamper with the actual picture elements or pixels that constitute the video image.

Nowadays, however, digital editing or fabrication of videotape evidence has been made possible by enormous advances in computer video speed, computer memory capacity, computer software and associated advances in the field of electronic imaging and laser scanners.

With respect to audio, advances in digital audio sampling and editing mean that digital equipment can be readily utilized alongside conventional dubbing and re-recording techniques.

Fabrication of the video component is normally much more difficult. Depending on the format of the videotape used and the type of time code information that may have been recorded, the time code can either be erased by re-recording over it on the original tape, or the original time code can simply be replaced during the dubbing process with a subsequent time code. As digital editing becomes more widespread, time codes of all types will become more vulnerable to tampering since, like all information on a videotape, they can be reduced to easily manipulated binary code. It should be noted that an interruption of the date/time record or time code on a video segment is no guarantee that tampering has occurred. Conversely, it is almost always possible to edit a tape and then add a different time code or date/time record as the tampered videotape is being dubbed onto another tape.

The audio and video components of videotape should be considered separately during any necessary review. With respect to the video component, smears, glitches, rolling lines, unexplained distortions, and any other type of picture "break up" may indicate tampering. Sudden jumps in action or cuts from one scene to another should also be noted. With respect to the sound component, sudden changes in noise levels, strange or inexplicable sounds, or important statements made while a person's back is turned may also prove significant. Additionally, the synchronization of the audio and video components of a videotape provides an indication of reliability and a reference point. If there are discrepancies in the synchronization between the picture and the sound, as when a person's words do not match their lip movements, this may also indicate a possible edit.
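The idea of checking a time code record for interruptions can be sketched in Python as follows. The HH:MM:SS:FF string format, the non-drop-frame count at 30 frames per second, and the function names are assumptions for illustration; real time code formats vary by equipment.

```python
from typing import List, Tuple


def timecode_to_frames(tc: str, fps: int = 30) -> int:
    """Convert an assumed HH:MM:SS:FF time code string to an absolute frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def find_timecode_breaks(timecodes: List[str],
                         fps: int = 30) -> List[Tuple[int, str, str]]:
    """Return (index, previous, current) wherever consecutive frames do not advance by one."""
    breaks = []
    for i in range(1, len(timecodes)):
        prev = timecode_to_frames(timecodes[i - 1], fps)
        cur = timecode_to_frames(timecodes[i], fps)
        if cur - prev != 1:
            breaks.append((i, timecodes[i - 1], timecodes[i]))
    return breaks


# Hypothetical per-frame time codes read from a tape; the last one jumps ahead.
codes = ["00:00:01:28", "00:00:01:29", "00:00:02:00", "00:00:05:10"]
print(find_timecode_breaks(codes))
```

As the text cautions, a break found this way is not proof of tampering, and a continuous time code is not proof of authenticity, since a new code can be laid down during dubbing.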

Additional Issues/Methods of Analysis

In all disputed videotape examinations it should be ascertained whether the recording was purportedly made with the equipment specified. If the recorder is available, appropriate tests of that recorder can be made to see if any individual "footprint" can be matched with the accompanying videotape. Next, one should determine whether the recording is continuous and whether it is an accurate representation of the event depicted. The original tape should always be examined for anomalies.

Some of the more obvious indications of possible videotape editing include evidence of the submitted tape being exposed to a processing amplifier, which results in improved synchronization patterns. One can also examine such things as video head switching differences or re-recorded time base errors. Other types of visual/electronic footprints would include non-synchronous edits, azimuth insert edits which may require magnetic development, defective pixel/yoke ringing, color filter/color registration errors, and other significant picture aberrations. The focus of the analysis is technically directed at the frequencies of the video and audio signal, as well as an intensive computerized review of the frames as they shift from an operator "stop" and "re-start." Components of the color signal are checked for consistency, and the continuity of the entire event is observed.

 

ANOMALIES ASSOCIATED WITH COMPUTER EDITING OF RECORDED TELEPHONE CONVERSATIONS

Second international chemical congress forensic symposium fall 1995 San Juan Puerto Rico by Steve Cain

During a two to three year period, a Midwestern entrepreneur had been interested in filing a patent on an innovative new product. As a home-based business man, much of his product development and marketing strategies were accomplished through contact with several dozen product development attorneys and other business advisors over his home and office telephones. When requested to provide the original telephone tape recordings, he claimed they had been inadvertently misplaced but that he had made copies of the relevant conversations which he later surrendered for forensic analysis. Although unsuccessful in ever examining the original tapes, I did have two copies of each of the original tapes. The original recorders were described as Radio Shack type portable machines together with a telephone interface device and two consumer brand high speed dubbing cassette recorders which purportedly were used in the selective dubbing of individual telephone conversations from the original tapes.

During review of ten composite copy tape conversations, it became apparent through both aural and spectrographic/waveform analysis, that there existed a number of suspicious record events (i.e. “anomalies”) which deserved further instrumental attention. A KAY Digital Spectrograph Model 5500 was used for the bulk of the analysis. As the original tapes were not available, magnetic development was not deemed appropriate and therefore traditional digital waveform/spectrographic techniques were utilized in the examination process.

Before displaying examples of the computer-related edited phenomena, it may prove beneficial to review the traditional analog anomalies often associated with falsification of recordings. These include:

1. Deletion: the elimination of words or sounds by stopping the tape and over-recording unwanted areas.

2. Obscuration: the mixing in of sounds of amplitude sufficient to mask waveform patterns which originally would show stop/starts in inappropriate places.

3. Transformation: the rearranging of words to change content or context.

4. Synthesis: the adding of words or sounds by artificial means or impersonation.

Anomalies often include the following phenomena:

1. Gaps: segments in a recording which represent unexplained changes in content or context.

2. Transients: short, abrupt sounds exemplified by clicks, pops, etc.

3. Fades: gradual loss of volume.

4. Equipment Sounds: context inconsistencies caused by the recording equipment (such as hum, static, and varying pitches).

5. Extraneous voices: background voices which at times appear to be as near as the primary voice or can even mask the primary voice. (1)

Modern day technology and the development of the digital signal processing (DSP) chip have greatly complicated the issue of tape tampering detection and further increase the likelihood that altered tapes can escape detection. The Federal Bureau of Investigation Signal Analysis Branch has already acknowledged, “it is difficult to detect some alterations when a recording is digitized onto a computer system, physically or electronically edited and recopied onto another tape.” (2) Recently there have been at least 20 different manufacturers of desktop computer editing workstations or digital recorders which can be used as “turn key” editing systems. Software related computer cards can transform a personal computer into a sophisticated digital audio editing machine. Some of the systems do require that the initial conversion of the analog format be accomplished by a digital audio recorder before accessing the computer hardware.(3)

Digitization of speech can sometimes leave discernible artifacts, especially “aliasing” effects. Digitizing the speech signal involves two distinct processes known as Sampling and Quantizing, which are the core of the digital recording process. Speech digitization requires filtering by an appropriate low-pass filter, which should remove any frequencies above one-half the sampling rate of the equipment. The sampling process refers to measuring the low-pass-filtered electronic waveform at many thousands of small, regular intervals of time each second. Each of these samples is later quantized with respect to its amplitude.
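The two steps described above, sampling and quantizing, can be sketched in a few lines of Python. This is an illustration only: the function names, the use of a pure sine wave in place of speech, and the symmetric signed integer range are assumptions, and the low-pass filtering step is omitted.

```python
import math
from typing import List


def sample_sine(freq_hz: float, sample_rate_hz: float, n_samples: int,
                amplitude: float = 1.0) -> List[float]:
    """Sampling: measure a sine wave at regular intervals of 1/sample_rate_hz seconds."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


def quantize(samples: List[float], bits: int,
             full_scale: float = 1.0) -> List[int]:
    """Quantizing: map each sample amplitude to the nearest signed integer code."""
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, symmetric around zero
    codes = []
    for s in samples:
        clipped = max(-full_scale, min(full_scale, s))  # clip overloads
        codes.append(round(clipped / full_scale * levels))
    return codes


# A 1 kHz tone sampled at 8 kHz (well below half the sampling rate), then 8-bit codes.
tone = sample_sine(1000.0, 8000.0, 8)
print(quantize(tone, bits=8))
```

Each integer code stands for one amplitude measurement; the fewer bits available, the coarser the approximation of the original waveform.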

The Nyquist Theorem, however, requires that the sampling frequency be at least twice as high as the highest frequency converted into digital format. If this requirement is not met, an undesirable effect known as Aliasing occurs.(4) High frequency changes in amplitude are not properly encoded, so some information is lost and occasionally new erroneous signals are generated. “If the throughput frequency is greater than one-half the sampling frequency, aliasing inevitably occurs.”(5) For example, if S is the sampling rate, F is a frequency higher than one-half the sampling rate, and N is an integer, a new alias frequency Fa is also created at Fa = ± NS ± F. Therefore, if S equals 44 kHz and the input signal is at 36 kHz, an alias frequency would occur at 8 kHz (44 - 36). If the input signal is at 40 kHz, a 4 kHz aliasing signal would occur. (6)
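The arithmetic in this example can be checked with a short Python sketch. The function name is hypothetical; it folds an input frequency into the band from zero to one-half the sampling rate, which corresponds to choosing N in Fa = ± NS ± F so that the result lands in that band.

```python
def alias_frequency(signal_khz: float, sample_rate_khz: float) -> float:
    """Frequency at which a tone appears after sampling, folded into 0..S/2."""
    folded = signal_khz % sample_rate_khz
    # Frequencies above S/2 mirror back down around the sampling rate.
    return min(folded, sample_rate_khz - folded)


# The two cases from the text, with S = 44 kHz.
print(alias_frequency(36, 44))  # → 8
print(alias_frequency(40, 44))  # → 4
# A tone below half the sampling rate is unchanged.
print(alias_frequency(10, 44))  # → 10
```

Because an aliased tone is indistinguishable from a genuine tone at the folded frequency once sampling has occurred, the low-pass filter has to be applied before the signal is digitized.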

Other aliasing effects involve Image Aliasing, in which the sampling process produces multiple images of the input signal. If a 44 kHz sampler is utilized and a 36 kHz input signal is analyzed, some of the resulting output frequencies would be 8 kHz, 52 kHz, 80 kHz, etc. In addition, Harmonic Aliasing can exaggerate the problem. Complex tones, for example, could result in aliasing frequencies generated separately for each harmonic. The practical result would be that additional harmonics are added to the digitized signal, which normally would be multiples of the harmonic of the fundamental frequency.(7)
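Both effects can be reproduced numerically with a small Python sketch. The function names, the cutoff of N = 2, and the 10 kHz fundamental in the second example are assumptions for illustration; only the 44 kHz sampler and 36 kHz input come from the text.

```python
from typing import List, Tuple


def image_frequencies(signal_khz: float, sample_rate_khz: float,
                      max_n: int = 2) -> List[float]:
    """List |N*S + F| and |N*S - F| for N = 0..max_n, sorted and without duplicates."""
    images = set()
    for n in range(max_n + 1):
        images.add(abs(n * sample_rate_khz + signal_khz))
        images.add(abs(n * sample_rate_khz - signal_khz))
    return sorted(images)


def harmonic_aliases(fundamental_khz: float, n_harmonics: int,
                     sample_rate_khz: float) -> List[Tuple[float, float]]:
    """For each harmonic k*f0, return (harmonic, frequency it folds to in 0..S/2)."""
    result = []
    for k in range(1, n_harmonics + 1):
        harmonic = k * fundamental_khz
        folded = harmonic % sample_rate_khz
        result.append((harmonic, min(folded, sample_rate_khz - folded)))
    return result


# Image aliasing: the 8, 52 and 80 kHz values from the text appear in this list.
print(image_frequencies(36, 44))  # → [8, 36, 52, 80, 124]
# Harmonic aliasing: the 30 and 40 kHz harmonics fold to 14 and 4 kHz.
print(harmonic_aliases(10, 4, 44))  # → [(10, 10), (20, 20), (30, 14), (40, 4)]
```

In the second example the folded components at 14 kHz and 4 kHz are not harmonics of the 10 kHz fundamental, which is one way such artifacts can stand out in a spectrographic display.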

As DSP technology and DSP chips become more sophisticated and more available to the consumer, the ability to edit, alter or fabricate audio recordings will be enhanced. Computer-based digital editing now permits the generation of lengthy, fabricated audio segments, sometimes devoid of the traditional transients and other editing artifacts associated with analog tape tampering.

The results of an aural/waveform/spectrographic analysis on the evidence tape copies disclosed a number of computer related editing anomalies associated with significant portions of the recorded telephone conversations, namely:

1. Uncharacteristic tones in the recordings sometimes occurring at even-numbered multiples of each other (i.e. 4, 8, 16, 20 kHz).

2. Omission or deletion of material.

3. Abrupt beginning and ending of ongoing speech.

4. Aliasing effects.

The more subtle effects of the digital editing process involving “aliasing” artifacts can sometimes be heard but are more readily apparent in the spectrographic/waveform analysis of the altered speech signals.
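One way to picture how abrupt beginnings and endings of speech show up in a waveform analysis is to compare the signal level of adjacent short frames. The Python sketch below is a simplified illustration under stated assumptions, not the procedure used in this examination: the function names, the 100-sample frame length, and the 10:1 level ratio are all hypothetical choices, and the test signal is synthetic.

```python
import math


def frame_rms(samples, frame_len):
    """RMS level of consecutive, non-overlapping frames of frame_len samples."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def abrupt_changes(samples, frame_len=100, ratio=10.0, floor=1e-6):
    """Indices of frames whose RMS differs from the previous frame by more than `ratio`."""
    levels = frame_rms(samples, frame_len)
    flagged = []
    for i in range(1, len(levels)):
        prev = max(levels[i - 1], floor)  # floor avoids division by zero on silence
        cur = max(levels[i], floor)
        if cur / prev > ratio or prev / cur > ratio:
            flagged.append(i)
    return flagged


# Synthetic example: a 1 kHz tone sampled at 8 kHz, interrupted by a total dropout.
tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(500)]
signal = tone + [0.0] * 300 + tone
print(abrupt_changes(signal))  # [5, 8]
```

Frames 5 and 8 mark where the tone stops and restarts. A flagged frame is only a candidate for closer inspection, since ordinary pauses and recorder start/stop events produce similar level changes.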

Examples of the digital editing process associated with this case are displayed in the accompanying sets of overhead transparencies. A short term aural composite tape was produced and should further corroborate the nature and extent of the digital editing anomalies associated with the computer edits found in this examination process.

(COMMENT: Overhead transparencies of computer-related editing phenomena)

BIBLIOGRAPHY

1. Steve Cain, “Sound Recordings as Evidence in Court Proceedings,” article accepted for publication by National District Attorneys Association, The Prosecutor, to be published late 1995.

2. Bruce E. Koenig, “Authentication of Forensic Audio Recordings,” Journal of Audio Engineering Society, 38, 1/2, 1990, January/February, page 4.

3. Steve Cain, “Verifying the Integrity of Audio and Video Tapes,” paper published in The Champion Magazine, July, 1993.

4. Jordan S. Gruber, Fausto Poza and Anthony Pellicano, Audio Tape Recordings: Evidence, Experts and Technology, Volume 48, American Jurisprudence Series, Lawyers Cooperative Publishing, Rochester, New York, 1993, pp. 108-109.

5. Ken C. Pohlmann, Principles of Digital Audio, Howard W. Sams and Company, 1992, pp. 46-48.

6. Ibid 5, p. 45.

7. Ibid 5, p. 48.

Policies and Procedures for Managing Physical Evidence

A. Forensic Tape Analysis, Inc. (FTAI) manages all physical evidence so as to guarantee the integrity of that evidence. This is accomplished by procedures intended to 1) ensure the chain of custody, 2) properly identify all evidence, 3) maintain the security and confidentiality of all evidence, and 4) carefully document all technical and non-technical actions taken with evidence. FTAI strives to manage physical evidence according to the most rigorous law-enforcement procedures. FTAI also adheres to the Rules of Evidence that apply in whatever jurisdiction is pertinent to particular civil or criminal matters referred to FTAI for investigation.

B. Submitting Evidence to FTAI. The following nine-step procedure should be followed when submitting physical evidence to FTAI for examination.  

1. The only person to handle evidence chain of custody will be the designated “Evidence Custodian.” The contributor should be the same person who originally obtained or seized the evidence. Complete notes should be maintained about the seizure, history of custody and submission of the evidence in order to establish the chain of custody in court proceedings.

2. Evidence must be properly marked for identification by the technician having custody for analysis purposes.

a. Marking should include the date (DD-MM-YY), whether the specimen is known (K) or questioned (Q), the file number assigned by the Evidence Custodian, and the initials designating the contributor and the technician assigned to the case.

b.      Markings should be written or printed with indelible ink.

c.       Markings should be made on a permanent portion of the specimen.

d. If more than one specimen is provided, each must be marked with an item number or letter to clearly distinguish the specimen from all other specimens.

3. Evidence must be properly packaged for submission to FTAI. The purpose of packaging is to protect the evidence, to avoid possible contamination or confusion among different specimens, and to guard against loss during shipment and delivery. A company with a tracing system should be utilized.

a. Magnetic recordings (audiotape, videotape, or computer disks to be examined for authentication, enhancement, voice identification, etc.) must be properly protected against re-recording or over-recording. Analog audiotape cassettes are provided with two plastic tabs, which allow recordings to be made. Tabs are located on the rear edge of the cassette. These tabs must be removed (broken) prior to packaging. If this is not completed before the specimen is received, it will be done in FTAI’s laboratory. Digital audiotape cassettes may have similar tabs, or they may have movable tabs. These must be set to the “write-protect” position. Some videotape cassettes have breakable “write-enable” tabs; others have movable “write-protect” tabs. These must be set to the “write-protect” position. Contact FTAI if additional information is required.

b.      Magnetic recordings should be placed in plastic containers (cassette-type tapes) or boxes (reel-type tape) to be properly protected from mechanical damage and the effects of stray magnetic fields. Preferably, these containers should be wrapped with at least three inches of good quality packing material between the specimen and the outside of the shipping carton. An acceptable alternative is to wrap the plastic container or box in aluminum foil and place it in a heavily padded shipping envelope. Evidence containers must be properly sealed to keep chain of custody intact. Envelopes can be sealed in the normal fashion with the flap initialed. This includes questioned documents for handwriting comparisons.  

4. Management by FTAI. The following procedures are followed by FTAI to ensure the integrity of evidence submitted for examination.

a. Upon receipt, evidence is immediately recorded in the evidence log book by our Evidence Custodian, who assigns it a file number.

b.      Copies are made of all evidence received, and recorded in the evidence log.  

5. Evidence is placed in a case file jacket or other container, to which a copy of the evidence log information (above) shall be attached, together with additional contributor information (address, telephone number, etc.).

6. The case is assigned to an FTAI Technician, and the evidence is unsealed and removed from packing material.

7. If restricted handling procedures are involved (e.g., protective orders, restricted access to original tape evidence, etc.), the director of FTAI is responsible for implementing said procedures. No copies of the evidence will be made unless specifically authorized by the director of FTAI. Tape copies will be handled and preserved in the same manner and with the same care as the original tape recording.

8. Upon completion of casework, including payment of any outstanding fees, the evidence and a copy of the case report (if required) will be returned to the contributor following the procedures outlined above for protection, packaging, and delivery.

9. Case files and work notes remain the property of FTAI and will not be sent to evidence contributors or others unless a written request is received. Case files and work notes are retained for seven (7) years after completion of examination, whereupon they are destroyed.

Steve Cain
Revised 11/15/02

VERIFYING THE INTEGRITY OF AUDIO AND VIDEOTAPES

By Steve Cain

Champion - July 1993

An ever-increasing reliance on tape evidence in criminal prosecutions, especially in organized crime and drug cases, underscores the importance of tape integrity and the methods used to qualify or disqualify tape evidence.

This article will discuss some of the procedures utilized in analog and digital editing of tapes and assess their potential threat vis-a-vis tape tampering issues; the "legal admissibility" issue surrounding tape recorded evidence, to include defining strategies for the defense to require the government to release the "best evidence" for analysis purposes; and an overview of the accepted techniques for the scientific analysis of recorded tape evidence.

Tape Editing Technology

The forensic examination of "tampered tapes" should include an inspection of the original tape(s) and the recorder(s) used to produce the tape(s). In the simple case, the existence of an electronic edit and/or evidence of physical splicing will produce acoustic irregularities which can be viewed with instruments and documented.

Modern-day technology was apparently used in the electronic editing performed on the disputed Gennifer Flowers/Gov. Bill Clinton tape recordings. The Cable News Network (CNN) asked that I provide an expert opinion on Mr. Clinton's voice and also asked that I examine the tape submitted by the Star News Magazine for any evidence of possible tampering. The latter examination disclosed a number of suspicious acoustic events (anomalies), including a total loss of signal (dropouts); a change in the speakers' frequency response during different telephone conversations; and "spikes" (audible sounds of short duration which are often attributable to normal stop/start and pause functions of the recorder).

In order to provide any definitive conclusion, I requested the original recorder and tape to determine if these electronic edits were intentional edits or possible malfunctions/anomalies of the recorder/microphone equipment. CNN has never received the requested tape or recorder from the Star News Magazine.

Digital editing of both audio and video tapes, however, greatly complicates the issue and increases the likelihood that altered tapes can escape detection. The Federal Bureau of Investigation (FBI) Signal Analysis Branch has already acknowledged, "It is difficult to detect some alterations when a recording is digitized into a computer system, physically or electronically edited and recopied on to another tape." *1*

The days of utilizing a razor blade and splicing tape to effectively alter or "doctor" a recorded conversation are all but gone.
Right now there are at least twenty manufacturers of desktop computer editing work stations or digital recorders which can be used as "turn key" editing systems. Software and add-on computer cards can transform an IBM personal computer or a Macintosh computer into a sophisticated digital audio editing machine. Some of the systems require a digital audio recorder for initial conversion of the analog format before accessing the computer hardware. These editing work stations were developed to save the motion picture and recording industries money by precluding the necessity of recording sessions or to correct subtle errors in multi-track releases.

Some computer boards and software cost less than $1,000 and provide both recording and editing of sound in an IBM-compatible or Mac personal computer format. Editing options are practically inexhaustible, thus giving the operator the ability to alter the tape in a word processor type of mode (i.e. cut and paste, copy, delete, etc.) while selected playback files utilize subtle cross-fading effects that can "shape" the sound. The typical telltale signs of traditional analog recorder editing, including "clicks," "pops" and other short-duration sounds, can now all be effectively removed without any detectable, audible clue.

Traditional Editing Techniques

Present tape editing practices include either physical splices or electronic editing on one or more analog tape recordings whenever portions of selected conversations are over-recorded (i.e. erased) or the original recorder was stopped and restarted inappropriately. While listening to the tape, the attorney may first suspect an alteration by noting either unexplained transients, equipment sounds, extraneous voices, or inconsistencies with provided written information.

The major categories of tape alterations include: (1) Deletion; (2) Obscuration; (3) Transformation; and (4) Synthesis. *2* Deletion of unwanted material can readily be done through splicing or by using one or more recorders to erase, rerecord, or stop/pause the recorder at strategic points within the conversation. Obscuration involves the distortion of a recorded signal with the purpose of rendering selective portions unintelligible. This method, for example, was used during the editing of the infamous 18 minute gap in the Watergate tapes. This technique is also used to mask splices, clicks, or suspicious transients and is more difficult to detect than deletion methods. By judicious use of two tape recorders, one may add "noise" to the copy and thereby mask the original recording and render it less intelligible. One can also reduce the volume of the slave recorder and thus weaken the amplitude of target conversations on the original tape.

Transformation involves the alteration of portions of a recording so as to change the meaning of what is said. The technique is similar to deletion practices, but greater skill and care must be applied, as a knowledge of acoustic phonetics is required to avert a suspicious edit.

Lastly, synthesis is the generation of artificial text by adding background sounds or conversation to the tape copy which were not present on the original recording.
The addition of selective phrases can be accomplished if a sufficient data base library of recorded conversations is available. It must be emphasized that all of the traditional analog methods of altering audiotapes can be more efficiently and surreptitiously accomplished through the use of digital editing work stations.

Tape Authentication And Detection Of Edits

With the threat of digital editing looming larger, it is more imperative than ever that both the official tapes and recorders be made available for inspection.

The FBI's Signal Analysis Branch has developed a set of well-defined procedures for the acceptance of authentication requests which provides an excellent overview of what the government considers to be essential for a scientifically valid tape analysis:

1. Sworn testimony or written allegations by defense, plaintiff, or government witnesses of tampering or other illegal acts. The description of the problem should be as complete as possible, including exact location in recording, type of alleged alteration, scientific test performed, and so on;

2. The original tape must be provided. Copies of a duplicate tape cannot be authenticated and are normally not accepted for examination by the FBI;

3. The tape recorders and related components used to produce the recording must be provided; and,

4. Written records of any damage or maintenance done to the recorders, accessories, and other submitted equipment must be provided.

In addition, there must be a detailed statement from the person or persons who made the recording describing exactly how it was produced and the conditions that existed at the time, including:

A. Power source, such as alternating current, dry cell batteries, automobile electrical system, portable generator.

B. Input, such as telephone, radio frequency (RF) transmitter/receiver, miniature microphone, etc.

C. Environment, such as telephone transmission line, small apartment, etc.

D. Background noise, such as television, radio, unrelated conversations, computer games, etc.

E. Foreground information, such as number of individuals involved in the conversation, general topics of discussion, closeness to microphone, etc.

F. Magnetic tape, such as brand, format, when purchased and whether previously used.

G. Recorder operation, such as number of times turned on and off in the record mode, type of keyboard or remote operation for all known record events, use of voice activated features, etc.

Also recommended is a typed transcript of the recording, to include both English and foreign language versions. *3*

It is essential in all tape authentication exams to obtain the original recorder and tape, as copies cannot normally be authenticated. If the defense is encountering difficulties in obtaining the necessary "originals," they may wish to cite Koenig's article *4* as an authoritative resource which specifies the reasons why the original evidence is essential in any tape tampering request.

If the original tape and recorder are not available for inspection, the forensic expert can still conduct a preliminary examination of the submitted "copy" for any evidence of discontinuous recorder operation, although all conclusions must necessarily be qualified regarding possible editing effects. The examination process normally includes aural, physical, and instrumental analyses of the evidential tape. Phase continuity, speed determination, azimuth determination, waveform analysis, spectrographic and narrow band spectrographic analysis are among the techniques employed to evaluate the tape. The techniques and tests are usually adequate in the detection of altered analog recordings. Fortunately, the vast majority of altered tapes today are still analog tapes.

Defense counsel should have a working knowledge of how tapes are analyzed. First, there is a physical inspection of the submitted tape, the tape housing, the tape recorder and all ancillary equipment used to make the original recording: microphones, telephone couplers, transceivers, etc. A magnetic development test involves the application of a special fluid which under proper magnification will make visible the head track configuration, off-azimuth recordings, start/stop functions, damage to recording heads, etc. The forensic expert can subsequently determine whether the submitted tape is a copy, has been over-recorded, or was made on a different recorder than the one submitted. The original recorder can be detected by slight speed fluctuations and deformities in the rotating parts which provide a unique "wow and flutter" signature which can be measured. Also, spectrum analysis can be used to measure slightly different signals transmitted through the microphone or telephone equipment. All of the signal analysis equipment can be useful in answering questions related to bandwidth, distortion effects, or unique tones generated during the original recording process.

Forensic Video Examinations

The forensic video examiner is concerned with the authenticity and integrity of the signal. Questions relating to whether the tape is a copy, a compilation of other tapes, or an edited version are important considerations. Forensic examinations of videotapes usually consist of both a visual and aural examination. One of the more important pieces of equipment used in forensic video examinations is a waveform monitor, which is a specialized oscilloscope. It displays voltage versus time and has specialized circuits to process the signal. If any editing occurs, then it is possible to display the signal aberration on the display screen of the instrument. *5*

Additional tests include measurements of the chrominance, hue and burst of the color videotape by using a vector scope. The vector scope measures the chrominance information and allows for the examination of matching bursts of multiple signals. It also permits the investigation of edit points.

Vertical interval and horizontal information, known as video synchronizing information, can be observed on a cross pulse monitor. With proper application of this "cross pulse" information, one can often determine if the videotape is a copy or an original. In cases where the helical heads are out of alignment, a set of marks could exist for each succeeding generation or copy. *6* Lastly, if one suspects videotape editing, the examination will require a frame-by-frame inspection, with the use of waveform monitors, vector scopes, and a cross pulse monitor together with other forensic equipment as deemed appropriate. It must be noted that there are sophisticated production studios that can edit videotapes in such fashion that traditional methods of detection are no longer adequate. Studios capable of producing such tapes are, for now, generally limited to larger metropolitan areas.

Legal Issues/Admissibility

In their article, "Attacking The Weight Of The Prosecution's Science Evidence," *7* authors Edward J. Imwinkelried and Robert Scofield explore the thesis that the accused has a constitutional right to introduce expert testimony which can generate a reasonable doubt. The authors warn, however, that this right to relevant criminal evidence is in fact very limited in scope, namely: (1) the evidence must be important or "crucial evidence"; and (2) the defense must show that the evidence is "trustworthy."

Likewise, authors Nancy Hollander and Lauren Baldwin point out that the admissibility of an expert's testimony is often dependent on whether the expert is testifying for the defense or for the prosecution. *8*

In the field of forensic tape analysis, there exist few competently trained and certified experts available to the defense to challenge the accuracy of government tapes and/or the conclusions of the government experts. Even though I have over twenty years' experience in federal law enforcement and as a Treasury Department crime laboratory supervisor, I am routinely subjected to concerted efforts by the prosecution to attack my credibility and the accuracy of my conclusions. As you would expect, as a government expert, I never received any criticisms from the prosecutor concerning my credentials or the accuracy of my findings.

Access To Evidence

More and more courts are being forced to address the question of whether the government has the privilege to withhold technical data from a defendant challenging the integrity of electronic surveillance evidence. A few courts have recognized a "qualified privilege" for the government as to such data (by drawing an analogy to an "informer's privilege"), but have not been very sensitive to the unique nature of electronic surveillance evidence nor defined the showing required to overcome the government's "qualified privilege." Under the due process clause, criminal defendants should be afforded a meaningful opportunity to present a complete defense. *9* To safeguard this right, the court has recognized the principle of "constitutionally guaranteed access to evidence ...." *10* This access to evidence, however, is not absolute, as indicated in Roviaro v. United States, *11* wherein the court recognized the government's limited privilege to withhold the identity of informers. Two circuit courts of appeal have extended the limited privilege recognized in Roviaro to the nature and location of electronic surveillance equipment. *12*

In Angiulo and Cintolo, the appellants asserted that the district court had mistakenly barred questions concerning the precise location of microphones hidden in an apartment. Trial motions for the information had not been made, nor had the defendants offered any technical basis for the value of the information. The government successfully objected to the questions concerning the microphones' location on the grounds that it would reveal sensitive surveillance techniques and jeopardize future criminal investigations.

In upholding the district court, the First Circuit, citing Van Horn *13* and United States v. Harley, *14* and making an analogy to the informer’s privilege in Roviaro, held that a qualified privilege against compelled government disclosure of sensitive investigative techniques exists. *15* The privilege can be overcome, however, by a sufficient showing of need. The defendant must show that "he needs the evidence to conduct his defense and that there are no adequate independent means of getting at the same point." *16* The Cintolo court stressed that the extent to which adequate alternative means could have substituted for the proper testimony is "a key to evaluating this claim of necessity." *17*

As technological advances have occurred in digital editing, there likewise has been a tremendous increase in the number of body-worn FM transmitters and other recording devices used by law enforcement to collect evidence against defendants. It should be emphasized, however, that some of this evidence may not be admissible in court if the agencies do not comply with several Federal Communications Commission (FCC) regulations. First, all nonfederal agencies must use only transmitters that are approved by the FCC; without this approval the transmitter is not considered a legal transmitting device and therefore cannot be legally used to gather evidence. Secondly, state and local agencies must be licensed in the FCC's Police Radio Service, and thus far most departments reportedly have not met this requirement. These observations are part of the information contained in "Equipment Performance Report: Body Worn FM Transmitters," a report of the Technology Assessment Program (TAP). This program tested nine body-worn FM transmitters in accordance with National Institute of Justice (NIJ) Standard 0214.01. These standards require transmitters passing the test to provide intelligible audio signals that result in acceptable quality voice recordings. *18* As noted in the Cintolo and Angiulo decisions, the defense failed to provide a sufficient showing of necessity; thus, it is imperative that defense experts vouch for the necessity of access to the government evidence as soon as possible.

The Need For Original Recording Equipment And How To Get It

There are a number of valid scientific reasons for accessing original tapes, recorders, and related equipment to conduct a proper analysis.

In practically every credible forensic publication dealing with forensic tape analysis procedures, the authors emphasize the necessity of examining the original evidence or a direct patch cord copy. In many cases, however, experience has shown an unwillingness of the government prosecutor and agents to provide such materials to the defense for examination purposes. The government may object that the defense never requested the original or direct copy recordings and therefore, their motions for access at the eleventh hour are basically "delay strategies." This argument can be effectively countered if the defense obtains an appropriate court order requesting the defense expert be provided access to the required "best evidence recordings."

Secondly, the government may contend that it has a qualified (if not absolute) privilege of withholding technical data from the defense counsel, citing "National Security" or indicating that such release may jeopardize future criminal investigations. The Angiulo and Cintolo decisions provide the defense counsel relief from such government actions. Counsel must show the need for the evidence to conduct the defense and that there "is no adequate independent means of getting at the same points."

The importance of the defense obtaining the original or at least a direct patch cord copy of all evidential recordings cannot be overemphasized. In practically every case I have seen, the copy initially provided by the government was not adequate for the best voice identification, tape enhancement or tape authentication examination. Subsequent motions filed by the defense citing the aforementioned requisite need for the original evidence often result in its release by the court. As reflected in the newly approved International Association for Identification standards for analysis of questioned voice recordings, the "unknown and known voice samples must be original recordings, unless listed as a specific exception ...." *19*

Notes:

1. Bruce E. Koenig, Authentication of Forensic Audio Recordings, JOURNAL OF AUDIO ENGINEERING SOCIETY, 38 No. 1/2, 1990, Jan/Feb, page 4.

2. National Commission For The Review of Federal and State Wiretapping Laws, pp. 223-225, 1972.

3. Steve Cain, Voiceprint Identification, NARCOTICS, FORFEITURE, AND MONEY LAUNDERING UPDATE NEWSLETTER, U.S. Department of Justice, Criminal Division, (Winter 1988).

4. Bruce E. Koenig, Authentication of Forensic Audio Recordings, JOURNAL OF AUDIO ENGINEERING SOCIETY, 38 No. 1/2, 1990, Jan/Feb, page 4.

5. Tom Owen, Forensic Audio and Video Theory And Applications, JOURNAL OF AUDIO ENGINEERING SOCIETY, Vol. 36, No. 1/2, 1988, Jan/Feb, page 39.

6. Ibid page 40.

7. Edward J. Imwinkelried and Robert G. Scofield, Attacking The Weight Of Prosecution Scientific Evidence, THE CHAMPION, PDN, April 1992.

8. Nancy Hollander and Lauren M. Baldwin, Testimony In Criminal Trials: Creative Uses, Creative Attacks, THE CHAMPION, December 1991.

9. California v. Trombetta, 467 U.S. 479, 485 (1984).

10. United States v. Valenzuela-Bernal, 458 U.S. 858, 867 (1982).

11. 353 U.S. 53 ().

12. See United States v. Angiulo, 847 F.2d. 956, 981-82 (1st Cir. 1988); and United States v. Cintolo, 818 F.2d. 980, 1001-03 (1st Cir. 1987); United States v. Van Horn, 789 F.2d. 1492, 1507-08 (11th Cir. 1986).

13. 789 F.2d. 1492 ( ).

14. 682 F.2d. 1018, 1020 (D.C. Cir. 1982).

15. Cintolo, 818 F.2d. 1002.

16. See Harley, supra.

17. Cintolo, 818 F.2d. 1003.

18. Copies are available at no charge from the Technology Assessment Program Information Center (TAPIC), toll-free number 800-248-2742 or (301) 251-5060.

19. IAI Voice Comparison Standards, JOURNAL OF FORENSIC IDENTIFICATION, January/February, 1992.

 

AUTHENTICATION OF SOUND RECORDINGS FOR EVIDENTIARY PURPOSES

By: STEVE CAIN, MFS, MFSQD
PRESIDENT
FORENSIC TAPE ANALYSIS, INC
LAKE GENEVA, WISCONSIN

MICHAEL R. CHIAL, Ph.D
PROFESSOR AND CHAIRMAN OF
COMMUNICATIONS PROGRAMS AND
PROFESSOR OF COMMUNICATIVE DISORDERS
UNIVERSITY OF WISCONSIN-MADISON

PRESENTED AT 1994 ANNUAL MEETING OF THE
AMERICAN ACADEMY OF FORENSIC SCIENCES
(JURISPRUDENCE SECTION)
FEBRUARY 18, 1994

AUTHENTICATION OF SOUND RECORDINGS FOR EVIDENTIARY PURPOSES

An ever-increasing reliance on tape evidence in both criminal and civil hearings underscores the importance of tape integrity and the methods used to qualify or disqualify audiotape evidence. Tape recordings are subject to increasing falsification and misinterpretation, especially with the advent of computer-based digital editing equipment. The purpose of this paper is four-fold: 1) to identify the predominant methods by which audiotapes are normally intentionally altered or falsified; 2) identify the physical and instrumental techniques for detecting signs of tape falsification; 3) briefly discuss the increasing threat caused by modern-day digital editing techniques and 4) provide examples of both analog and digitally falsified tapes.

There are two generally accepted approaches for establishing the authenticity of a questioned tape recording. Current legal practice normally places the burden of proof on the attorney seeking to introduce the tape into evidence. The attorney must first demonstrate that accepted methods designed to protect against any form of tape tampering have been adhered to; if that showing is not successful, the tape is submitted to a qualified expert for a forensic examination. On a more practical level, an original recording is considered authentic if it starts at the beginning of the tape and does not stop until the end. Any stops or restarts should be announced by the operator. Original recordings should contain all of the audio information recorded at the moment in time that the event occurred. The recording should further not contain any break in its continuity or content, nor should it contain any suspicious signs suggestive of falsification.


It is important for both attorney and investigator to understand that falsification or tampering with tapes involves an intentional attempt to alter the tape’s original content. Often, however, the evidential recorders and their respective tapes have been unintentionally interrupted during the recording process. This innocuous or accidental interruption of the tape does not constitute a falsification effort and may include the following operator errors and equipment faults: 1) accidental stop/restart of tape recorder; 2) mechanical malfunction of the tape recorder; 3) damage to the tape oxide or the use of a previously recorded tape; 4) “off-speed” recording due to low batteries or improper AC line connections; 5) microphone abnormalities; etc.

The major categories of intentional tape editing or falsification include: 1) Deletion; 2) Obscuration; 3) Transformation; and 4) Synthesis. Deletion of unwanted material can be rapidly accomplished either through splicing or by using one or more recorders to erase, rerecord, or stop/pause the recorder at strategic points within the conversation. Obscuration involves the distortion of a recorded signal with the purpose of rendering selected portions unintelligible (e.g., the eighteen minute gap in the infamous Watergate tapes). This technique can also be used to mask splices, clicks, or suspicious transients. Transformation involves the alteration of portions of a recording so as to alter its original content. The technique is similar to deletion practices but requires greater knowledge of acoustic phonetics and is more difficult to accomplish. Lastly, synthesis is the generation of artificial text by adding background sounds or conversation to the taped copy which were not present on the original recording. It should be emphasized that all of the aforementioned traditional analog techniques for altering audiotapes could be more effectively and surreptitiously accomplished through the use of digital editing workstations.

The principles of falsification are also similar to the general principles of disguise. Namely, the individual actually effecting the tape falsification is attempting to obscure or disrupt important features of the originally recorded event or subject of interest. This is accomplished through various masking techniques. Secondly, falsification efforts are often designed to misdirect the attention of the listener to an irrelevant aspect or feature of the signal or an event of interest.

The electromechanical indications of falsified tapes should include one or more of the following phenomena:

1) Gaps - segments in a recording which represent unexplained changes in content or context. A gap can contain buzzing, hum, or silence.

2) Transients - short, abrupt sounds exemplified by clicks, pops, etc. Transients may indicate tape splicing or some other interruption of the recording process.

3) Fades - gradual loss of volume. Fades can cause inaudibility and are considered gaps when the recording becomes fully inaudible.


4) Equipment sounds - inconsistencies of context caused by the recording equipment itself. Common equipment sounds include hum, static, whistles, and varying pitches.

5) Extraneous voices - background voices which at times appear to be as near as the primary voices, and at times can even block the primary voices.
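As a rough illustration of how transients such as clicks and pops might be flagged for closer inspection, the sketch below marks samples whose jump from the previous sample is far larger than the typical jump in the recording. The function name, the `factor` threshold, and the use of a median baseline are assumptions made for this example only; they are not taken from any examination protocol, and a flagged sample is a candidate for review, not proof of editing.

```python
import math
import statistics


def find_transients(samples, factor=8.0):
    """Return indices where the sample-to-sample jump is unusually large.

    `factor` is a hypothetical tuning value: a jump counts as a transient
    when it exceeds `factor` times the median jump of the whole signal.
    """
    if len(samples) < 2:
        return []
    jumps = [abs(b - a) for a, b in zip(samples, samples[1:])]
    baseline = statistics.median(jumps)
    if baseline == 0:
        # Mostly constant signal: any change at all stands out.
        return [i + 1 for i, j in enumerate(jumps) if j > 0]
    return [i + 1 for i, j in enumerate(jumps) if j > factor * baseline]


# Synthetic example: a 100 Hz tone sampled at 8 kHz with one click added.
rate = 8000
signal = [0.2 * math.sin(2 * math.pi * 100 * n / rate) for n in range(800)]
signal[400] += 0.9  # simulated click
print(find_transients(signal))  # [400, 401]: the jump into and out of the click
```

Both edges of the click are reported because each is a separate abrupt jump; a real recording would need the flagged locations checked by ear and in the waveform display.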

The methods for detecting falsified (non-authentic) recordings include:

Critical Listening - The forensic tape specialist will normally listen to the original tapes with high quality headphones and professional recording equipment prior to conducting any instrumental examination. He notes any unusual aural and/or acoustic events such as starts, stops, speed fluctuations, and other variations requiring investigation. He examines all recorded events, including both foreground and background sounds, and listens for abnormal changes in, or the absence or presence of, differing environmental sounds. He concentrates on voices, conversation and other audible sounds.

Aural Anomalies - These include sudden changes in a person’s voice, abrupt unexplained topic changes, or a sudden change in foreground/background information.

Physical Inspection

Magnetic Development

Spectrum Analysis - Employs specialized computer equipment which measures the frequency spectrum of the recorded tape and provides visual displays of frequency vs. amplitude and of frequency vs. amplitude vs. time. This allows the expert to view the entire spectrum or to zoom in on one particular area of interest to help characterize the acoustic nature of a particular anomaly and possibly to identify its source.

Waveform Analysis - A computer generated display representing time vs. amplitude of recorded signals in graphic form. Such analysis normally allows the expert to measure and identify record-mode events including the measurement of record-to-erase head distances, determination of the spacing between gaps and multiple gap erase heads, and inspection of the signature shape and spacing of various record event signals.

Test Recordings on Evidential Recorders and Accessory Equipment - Various electrical, magnetic and mechanical measurements of both standard and modified recorders can be used in determining the possible origins of questionable tones or sounds occurring on the evidential recording.
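The spectrum and waveform displays described above can be approximated in a few lines of code. The sketch below is illustrative only: it computes a frequency vs. amplitude vs. time table with a plain discrete Fourier transform, and it converts the time between two record-event marks seen in a waveform display into a distance along the tape. All function names, the frame sizes, and the example readings are hypothetical. The distance conversion assumes the tape ran at a constant nominal speed, here the compact cassette speed of 1 7/8 inches per second (about 4.76 cm/s), which an off-speed recorder would violate.

```python
import cmath
import math

CASSETTE_SPEED_CM_PER_S = 4.76  # nominal compact cassette speed (1 7/8 in/s)


def magnitude_spectrum(frame, rate):
    """Return (frequency_hz, magnitude) pairs for one frame using a plain DFT."""
    n = len(frame)
    # A Hann window reduces leakage caused by cutting the frame out of the signal.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append((k * rate / n, abs(acc)))
    return spectrum


def spectrogram(samples, rate, frame_len=256, hop=128):
    """Frequency vs. amplitude vs. time: one spectrum per overlapping frame."""
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        rows.append((start / rate, magnitude_spectrum(frame, rate)))
    return rows


def peak_frequency(spectrum):
    """Frequency of the strongest component in one spectrum."""
    return max(spectrum, key=lambda pair: pair[1])[0]


def head_distance_mm(t_first_mark_s, t_second_mark_s,
                     tape_speed_cm_per_s=CASSETTE_SPEED_CM_PER_S):
    """Convert the time between two head marks into a distance along the tape."""
    interval_s = abs(t_second_mark_s - t_first_mark_s)
    return interval_s * tape_speed_cm_per_s * 10.0  # cm -> mm


# Synthetic example: a 60 Hz hum (a common equipment sound) sampled at 1920 Hz.
rate = 1920
hum = [math.sin(2 * math.pi * 60 * n / rate) for n in range(512)]
frames = spectrogram(hum, rate)
print(len(frames), peak_frequency(frames[0][1]))  # 3 60.0

# Hypothetical readings: two marks seen 0.25 s apart in the waveform display.
print(round(head_distance_mm(12.40, 12.65), 2))  # 11.9
```

The plain DFT is used to keep the example dependency-free; a working laboratory would rely on dedicated analysis equipment or an FFT library, and would compare any measured spacing against test recordings made on the evidential recorder.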

There exist many different methods of both analog and digital editing of tape recordings, and the examples below highlight some of the more common methods utilized.


TRADITIONAL METHODS OF TAPE EDITING AND METHOD OF DETECTION

1. Whispered Speech
Method of detection: Talker identification (voice print analysis) involving the combined aural/spectrographic method.

2. Vocal Disguise or Mimicking
Method of detection: Talker identification (voice print analysis).

3. Typical Analog Edits - Splicing (electronic or physical), stop/restart, over-recording, pausing of recorder, erasures, dubbing, etc.
Method of detection: Critical listening, instrumental analysis, magnetic development, and spectrum analysis.

4. Re-recording to obscure physical edits, etc.
Method of detection: Critical listening, instrumental analysis, magnetic development, and spectrum analysis.

CONTEMPORARY/FUTURE CHALLENGES

Digital editing of both audio and video tapes has greatly complicated the authentication process and increases the likelihood that altered tapes can escape detection. There are at least 30 different desktop computer editing workstations or digital recorders which can be used as “turnkey” editing systems. Software and add-on computer cards can transform an IBM or Macintosh computer into a sophisticated digital audio editing machine. Some of the systems require a digital audio recorder for initial conversion of the analog format before accessing the computer hardware. These editing workstations were originally designed by the motion picture and recording industries to correct subtle errors in multi-track releases and can now be purchased at prices as low as $300 for the software. The editing options are practically inexhaustible and provide the operator the ability to alter the tape in a word-processing format (i.e., cut and paste, copy, delete, etc.) while selecting playback files which can help “shape” the sound. The typical telltale signs of traditional analog recorder editing, including clicks, pops, and other short duration sounds, can now be effectively removed, leaving few if any detectable audible clues.

Examples of varying editing processes including related hardware and/or equipment:

1) Pitch Shift Telephones

2) Vocal Disguise through synthesized speech (Votrax or Dectalk).

3) Computer Manipulation of speech formant data (Kay Elemetrics Model 4300 and ASL programs - Re-synthesis of Human Speech)

4) Additive mixing of noise or other background and foreground signals into on-going speech.

5) Signal Processing Filters (analog and digital)

a. Phasing Anomalies


b. Chorusing

c. Harmonic Distortion

d. Reverberation

e. Filtering of Selective Frequencies

f. Channel Switching

The threat of future digital editing is becoming of increasing concern to the courts. It is therefore more imperative than ever that both the original tapes and the recorders be made available for inspection. Both the FBI Signal Analysis Branch and other certified acoustic tape experts recognize that it is essential for the contributing attorney to provide all of the original tapes and related recording equipment before a complete authentication can be accomplished.

Professor Chial and I have attempted to explain some of the traditional and more recent methods of detecting falsified or edited audiotape recordings, to identify the various physical and instrumental techniques for detecting signs of tape falsification, to discuss various examples of falsified tapes, and lastly to discuss briefly the increasing threat caused by digital computer-based editing systems. It is relatively easy to change the content of a recording by deleting words, by obscuring meaning with over-recorded sounds, or by transforming the context through rearrangement of selected phrases or added words. Nevertheless, falsifications normally leave detectable magnetic and waveform acoustic signatures which can lead to forensic individualization of the evidential recorders and tapes.

Note: For additional information see the following published articles:

“Authentication of Forensic Audio Recordings,” Journal of Audio Engineering Society, 38, 1990, Bruce E. Koenig.

The National Commission for the Review of Federal and State Wire Tapping Laws, 1976, Mark Weiss, et al.

“Verifying the Integrity of Audio and Video Tapes,” The Champion Magazine, Summer 1993, Steve Cain.

“Sound Recordings as Evidence in Court Proceedings,” The Prosecutor Magazine, Sept/Oct. 1995

 


AES standard for forensic purposes —Criteria for the authentication of analog audio tape recordings

Users of this standard are encouraged to access http://www.aes.org/standards to determine if they are using the latest printing incorporating all current amendments and editorial corrections.

This document has been reproduced by Global Engineering Documents with the permission of AES under a royalty agreement.

AUDIO ENGINEERING SOCIETY, INC.
60 East 42nd Street, New York, New York 10165, USA

AES standard for forensic purposes

Criteria for the authentication of analog audio tape recordings

Published by
Audio Engineering Society, Inc.
Copyright © 2000 by the Audio Engineering Society

Abstract

The purpose of this standard is to formulate a standard scientific procedure for the authentication of audio tape recordings intended to be offered as evidence or otherwise utilized in civil, criminal, or other fact finding proceedings.

An AES standard implies a consensus of those directly and materially affected by its scope and provisions and is intended as a guide to aid the manufacturer, the consumer, and the general public. The existence of an AES standard does not in any respect preclude anyone, whether or not he or she has approved the document, from manufacturing, marketing, purchasing, or using products, processes, or procedures not in agreement with the standard. Prior to approval, all parties were provided opportunities to comment or object to any provision. Approval does not assume any liability to any patent owner, nor does it assume any obligation whatever to parties adopting the standards document. This document is subject to periodic review and users are cautioned to obtain the latest edition.

AES43-2000

Contents

Foreword

1 Scope

2 Normative references

3 Definitions

4 Verification of authenticity

4.1 Criteria

4.2 Equipment

4.3 Reporting

5 Examination and analysis

5.1 Evidence management

5.2 Critical listening and waveform examination

5.3 Photo-microscopic analysis

5.4 The formulation of an opinion and conclusion

6 Testimony

6.1 Preparation

6.2 Problems

Annex A Informative references

Foreword

[This foreword is not a part of AES standard for forensic purposes — criteria for the authentication of analog audio tape recordings, AES43-2000.]

This document was developed by a writing group, headed by A. Pellicano, of the SC-03-12 Working Group on Forensic Audio of the SC-03 Subcommittee on the Preservation and Restoration of Audio Recordings. The writing group was formed to execute project AES-X48.

It results from an international consensus and is not intended to reflect the practice of any single nation. As an AES standard, it is an international professional society’s statement of technical good practice, but its use is entirely voluntary and it does not have the status of a governmental regulation. Nevertheless, any claim to voluntary compliance with the standard implies acceptance of its mandatory clauses.

In 1991, SC-03-12 was organized as AESSC WG-12 at the request of a community of engineers from the AES, the Acoustical Society of America, various law enforcement agencies, and groups concerned with testimony. The group concerns itself with the handling, authentication, and enhancement of audio recorded materials, basing itself on methodologies such as those developed from the ones described in Bolt, Cooper, Flanagan, McKnight, Stockham, and Weiss, Report on a Technical Investigation Conducted for the U.S. District Court for the District of Columbia by the Advisory Panel on the White House Tapes, May 31, 1974.

This document results from one of the projects set out at the early meetings of the working group.

Tom Owen, Chair of SC-03-12
Michael McDermott, Vice-Chair of SC-03-12

1999-09-03

AES standard for forensic purposes —

Criteria for the authentication of analog audio tape recordings

1 Scope

This standard specifies the minimum procedure for the authentication of analog audio tape recordings intended to be offered as evidence or otherwise utilized in civil, criminal, or other fact finding proceedings. It does not specify or restrict additional testing procedures that can be used.

These methodologies are suggested to any and all individuals and groups who hold themselves out to be or are recognized as forensic tape analysts or experts.

This standard is a set of procedures set forth to inform attorneys, courts, and other interested parties. It also serves to aid interested parties who are attempting to determine whether or not the procedures and methodologies of potential, chosen, or opposing experts are of a scientific nature and would withstand objective scrutiny.

2 Normative references

The following standard contains provisions that, through reference in this text, constitute provisions of this document. At the time of publication, the edition indicated was valid. All standards are subject to revision, and parties to agreements based on this document are encouraged to investigate the possibility of applying the most recent editions of the indicated standards.

AES27- 1996, AES recommended practice for forensic purposes — Managing recorded audio materials intended for examination.

3 Definitions

3.1

authentication


authentic recording and authenticity analysis as defined in AES27

3.2

forensic tape analyst

FTA

entity performing authentication according to this standard

3.3

designated original recording

DOR

original recording as defined in AES27

3.4

designated originating recording device

DORD

original recorder as defined in AES27

3.5

employer

engaging party

entity engaging the services of an FTA

3.6

cassette

device composed of a case containing two coplanar or superimposed hubs or reels on which a magnetic tape is wound, so that the tape can move from hub (reel) to hub (reel) during recording, reproduction, a fast forward movement, or rewinding, and can be easily and instantaneously inserted in a recording-reproducing equipment or in a reproducer designed for this purpose, without handling the magnetic tape

3.7

memorialization

legally acceptable documentation of evidence

3.8


test recording

recording made by the FTA, using the designated originating recording device and a non-evidence blank tape, for the purpose of determining certain performance characteristics of the recording device

3.9

signature

waveform or microscopic visualization (or demonstration) of record events either located on the DOR or created on a test recording, or both, utilizing the DORD or any tape recording device examined by the FTA for the purpose of identification or comparison during an examination

4 Verification of authenticity

4.1 Criteria

Verification is predicated upon two sets of criteria:

a) that a person, whether a law enforcement official or any individual stated, if called upon, could or would testify under penalty of perjury, that the tape recorded evidence presented as the DOR is, in fact, the tape material utilized to create the recording at the exact time that the occurrence, interview, interrogation, or recorded content actually took place;

b) that by a comprehensive examination procedure and scientific means the FTA was able to determine that it is the original.

4.2 Equipment

The FTA shall examine the DOR along with and utilizing the DORD. The FTA shall render findings that would scientifically evince that the DORD recorded the designated original recording, and found no conclusive evidence of tampering, unauthorized editing, or any form of intentional deletions, material or otherwise, within the recorded content.

4.3 Reporting

The FTA may then render an opinion that the recording has passed the procedure or standard for authentication and that the questioned tape recording is authentic in physical state and in content.

5 Examination and analysis

5.1 Evidence management

Except where otherwise specified in this standard, evidence management practices shall comply with AES27.


5.1.1 Physical examination

5.1.1.1 Record-prevention punch-out tabs

If the audio evidence is contained in a tape cassette that features record-prevention punch-out tabs, the FTA should try to obtain permission to remove them or the FTA may remove the tabs at its discretion. If the tabs are removed, the FTA shall attach the removed record-prevention tabs to a suitable carrier such as a file card by means of a nondestructive and removable adhesive such as transparent adhesive tape. The carrier shall be placed inside a sealed envelope, with the date and time that the envelope was sealed and the signature of the FTA written across the seal. The cassettes shall be comprehensively photographed or videotaped before and after the removal of the punch-out tabs.

5.1.1.2 Operating condition

When the tape recorded evidence is contained in a cassette, the cassette shall be carefully examined to determine that it is operable. The FTA shall inspect the cassette, making sure that there is no obstruction to the tape. The FTA shall also look for apparent tears or splices on the tape material itself that could possibly obstruct or deter playback. The FTA shall carefully rotate the tape hubs in both directions to detect any hidden obstruction that could hinder playback. When examining a reel of tape, the same care and caution shall be executed.

NOTE Playback of a damaged tape can produce further damage to the tape.

5.1.1.2.1 Notification of damage

If during the physical examination, the FTA finds evidence of physical tampering or damage to the cassette or the tape material, the FTA shall immediately inform its employer that the submitting party shall be notified. If the cassette or tape material can be repaired, then the FTA shall obtain written permission from the submitting party prior to proceeding with any repairs or modifications. Whether or not the FTA receives permission to repair the damage or remove the tape material and place it in a new cassette or otherwise prepare the DOR to be available for playback, the FTA shall photograph or videotape the evidence for reference to memorialize the discovery. If the tape or cassette is repaired, the videotape or photographs shall comprehensively depict the repair.

5.1.1.2.2 Splices

If a physical splice is located, the splice shall be noted and photographed or videotaped at the time of the observation.

5.1.1.3 DORD condition

The DORD and any accompanying apparatus such as separate microphones, switching devices, and similar accessories shall be inspected and examined to determine that they are operational. After the FTA concludes the visual inspection, a compatible tape shall be placed in the DORD and the functions of the DORD shall be tested to ensure that it can play back the DOR without damage to the DOR or the DORD.

5.1.1.3.1 Notification

If the DORD is not functional, the FTA shall inform its employer that the submitting party shall be notified. If the DORD can be repaired, then the FTA shall obtain written permission to do so. If the repair necessitates the replacement of the record-playback head, the erase head or both, the FTA shall indicate to the employer that the replacement of the head or heads negates an authentication procedure and that the FTA report of findings relates only to the examination of the DOR. All repairs shall be comprehensively memorialized including who repaired the recorder and at what facility. All replaced parts shall be maintained as evidence by the FTA.

5.1.2 Verification

Compliance with 5.1 shall be verified and attested to by the FTA before proceeding further with the evidence.

5.2 Critical listening and waveform examination

The critical listening and waveform examination procedures can assist an FTA in attempting to determine whether or not any anomalies are present on the questioned recording.

5.2.1 The FTA shall produce a first test recording containing known exemplars of the functions of the DORD. It should include a minimum of ten start recording signatures, ten stop recording signatures, ten stop-start recording signatures, ten pause signatures (assuming that the recording device has this feature), and if the DORD is so equipped, ten voice activation signatures. Other test recordings may be produced which should include over-recordings and other variations of the record functions of the recording device if necessary or appropriate.
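The minimum exemplar counts in 5.2.1 lend themselves to a simple checklist. The following is a minimal sketch, not part of the standard: the dictionary keys, the function name, and the idea of keeping a tally in software are all assumptions made for illustration. Only the count of ten per signature type and the two conditional types come from the paragraph above.

```python
# Minimum counts taken from 5.2.1; the keys are illustrative labels,
# not terms defined by the standard.
MINIMUM_SIGNATURES = {
    "start": 10,
    "stop": 10,
    "stop_start": 10,
    "pause": 10,             # only if the recorder has a pause function
    "voice_activation": 10,  # only if the recorder is so equipped
}


def missing_signatures(recorded, has_pause=True, has_voice_activation=False):
    """Return {kind: shortfall} for every signature type still under its minimum."""
    required = dict(MINIMUM_SIGNATURES)
    if not has_pause:
        required.pop("pause")
    if not has_voice_activation:
        required.pop("voice_activation")
    return {kind: need - recorded.get(kind, 0)
            for kind, need in required.items()
            if recorded.get(kind, 0) < need}


# Hypothetical tally for a recorder with a pause control but no voice activation.
log = {"start": 10, "stop": 10, "stop_start": 7, "pause": 10}
print(missing_signatures(log))  # {'stop_start': 3}
```

An empty result means the first test recording holds at least the minimum number of each applicable exemplar; it says nothing about whether the exemplars themselves are usable.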

5.2.2 The designated recording device shall be utilized to play back the test recording. The playback should be rehearsed to ensure that the level of playback is appropriate. That setting should be fixed by either carefully applying tape across the volume control of the recorder or exacting some form of measurement that would ensure that the playback output level can be reasonably reproduced.

5.2.3 The first test recording should be played back into a configuration of either a computerized method of storing the playback on a hard disk, or some form of memory device that would allow repetitive playback. Many programs are now available to digitize the playback and store that information on hard drives. They further allow an array of playback functions, and most have features that would enable the FTA to view the waveform.


5.2.4 Once the signal or audio from the test recording has been stored, playback of the digitized recording can take place to enable the FTA to listen to the recording while viewing the waveform. The FTA can then be informed as to how the record functions of the designated recording device sound (assuming that the functions generate a discernible audible sound when played back) and are visually demonstrated or appear in the waveform domain. The FTA should then study and scrutinize the signatures so that it can be reasonably acquainted with how the function signatures of the designated recording device sound and are seen or demonstrated in the waveform domain.

5.2.5 The DOR shall be played back with as close to the exact output level and through the exact configuration as the test recording. The output volume control of the DORD may be adhesive-taped to a fixed position until all of the test recordings are created and subsequently stored. Once the signal or audio from the DOR is stored, then the FTA shall critically listen to the content while viewing the waveform.

5.2.6 The FTA should then produce, by the safest and best means possible, at least two first-generation copies of the DOR for reference and to evince the state of the recorded content at or about the time of receipt. If the FTA is asked for copies, then copies should be provided appropriately labeled and marked.

5.2.7 The critical listening and waveform examination should occur as often as the FTA deems it necessary in order to answer the following questions.

a) Was the content consistent and uninterrupted throughout the entirety of the questioned tape recording? If not, then the location of the gaps, dropouts, over-recordings, or any other form of disruption should be delineated for further examination and analysis. If there are other apparent unrelated recordings, they should be cataloged for reference and/or possible further examination and analysis.

b) Were there any identifiable record function signatures detected and located in the content? If so, are they consistent with the test recording exemplars? If not, they should be designated as possible anomalies. In either case, they should be labeled or otherwise delineated for further analysis.

c) Was there any form of anomalous or otherwise perceptible aural or visible indications in the playback or waveform display? If so, their presence should be labeled or otherwise delineated for further analysis. This question would include level changes, apparent or obvious differences in background content, or any other form of aurally perceptible variances.

d) Were there background conversations or content? For example, were there radio communications or other perceptible speech, or repetitive noise that would aid in determining authentication? If so, they should be labeled or otherwise delineated for reference and further analysis.


e) If (a) through (d) render any form of anomaly or evident difference, then further test recordings utilizing the DORD should be produced in an attempt to recreate or mimic the differences or anomalies detected and located. If the further tests can or cannot do so, that revelation should be reported.

5.2.8 These and other findings should be reported upon, verified in the waveform, and their precise location noted for future reference. Once these procedures have been accomplished, then the next step shall be to perform the photo-microscopic examination and analysis.
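Question (a) of 5.2.7 calls for the location of gaps and dropouts to be delineated. One possible way to shortlist candidate locations in a digitized playback is to look for stretches where the short-time level falls below a threshold, as sketched here. The function names, the block length, and the threshold value are assumptions for illustration and would need to be set from the noise floor of the actual recording; the standard itself does not prescribe any such algorithm, and the spans found are only starting points for critical listening.

```python
import math


def rms(block):
    """Root-mean-square level of one block of samples."""
    return math.sqrt(sum(x * x for x in block) / len(block))


def find_low_level_spans(samples, rate, block_len=100, threshold=0.01):
    """Return (start_s, end_s) spans whose block RMS stays below `threshold`."""
    spans = []
    span_start = None
    n_blocks = len(samples) // block_len
    for b in range(n_blocks):
        block = samples[b * block_len:(b + 1) * block_len]
        quiet = rms(block) < threshold
        if quiet and span_start is None:
            span_start = b * block_len          # a quiet stretch begins
        elif not quiet and span_start is not None:
            spans.append((span_start / rate, b * block_len / rate))
            span_start = None                   # the quiet stretch has ended
    if span_start is not None:
        spans.append((span_start / rate, n_blocks * block_len / rate))
    return spans


# Synthetic example: one second of tone at 1 kHz sampling with a silent
# stretch from 0.4 s to 0.6 s standing in for a gap or dropout.
rate = 1000
tone = [0.5 * math.sin(2 * math.pi * 50 * n / rate) for n in range(1000)]
tone[400:600] = [0.0] * 200
print(find_low_level_spans(tone, rate))  # [(0.4, 0.6)]
```

Locations are reported to block resolution only, so a shorter block gives finer timing at the cost of more false alarms in quiet speech.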

5.3 Photo-microscopic analysis

5.3.1 Test recordings for the specific purpose of photo-microscopic analysis should be produced. These test recordings should include all of the record function signatures of the DORD.

5.3.2 The test recordings should be examined under the microscope, in a scientific manner, which would allow the FTA to view the magnetic domain (Bitter patterns) of the record function signatures of the test recording examined.

See annex A for informative references.

5.3.3 The known exemplars produced, viewed, and examined can familiarize the FTA with how the function signatures of the DORD appear. The FTA can now be enabled to make measurements, take photographs, videotape, and otherwise memorialize the procedure and the resulting findings.

5.3.4 The FTA can now perform the same examination and analysis upon the designated original recording. This procedure, when performed in a scientific manner, can enable the FTA to attempt to identify the signatures located on the questioned recording. The FTA can now make comparisons and other forms of tests resolving the issue of authenticity as it pertains to the recording that the FTA is examining. It further allows the opportunity to demonstrate the FTA’s findings by means of measurements, photographs, videotapes, or any other form of demonstrative means that could be reviewed by the employers, the courts, or juries and other experts, opposing or consulting.

5.3.5 An FTA can now draw conclusions from these findings, including whether or not the DORD actually recorded the designated original recording. An FTA’s finding could either validate this fact or disprove it. In some cases no definitive solution can be made.

5.4 The formulation of an opinion and conclusion

5.4.1 Once an FTA has performed all of the testing procedures and rendered scientific findings, it should be sure to have:

a) performed all of the tests and examinations in a scientific manner, that if recreated or duplicated by another expert would render the exact same findings; for example, if the FTA has found and identified a stop/start recording signature at a specific location on the questioned recording, another expert or analyst could or would find and identify that same signature at the same location;

b) produced comprehensible and repeatable graphic waveform displays, printouts, or any other form of graphic rendering that would demonstrate the FTA’s findings in the waveform domain, so that another expert or any other individual could view them in an effort to determine whether the FTA’s findings exist and are valid;

c) produced sufficient photographs, videotapes, or any other form of definitive renderings that would demonstrate the FTA’s findings in the magnetic domain, so that another expert or any other individual could view them in an effort to determine whether the FTA’s findings exist and are valid.
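The repeatability requirement in item a) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each examiner's findings are logged as a signature type plus a location in seconds; the `Signature` type, the `unconfirmed` function, the sample values, and the 0.05 s tolerance are hypothetical and are not prescribed by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    kind: str          # hypothetical label, e.g. "stop/start"
    position_s: float  # location on the questioned recording, in seconds


def unconfirmed(reported, replicated, tolerance_s=0.05):
    """Return the reported signatures that the second examiner did not
    find (same kind, location within tolerance_s seconds)."""
    missing = []
    for sig in reported:
        found = any(
            other.kind == sig.kind
            and abs(other.position_s - sig.position_s) <= tolerance_s
            for other in replicated
        )
        if not found:
            missing.append(sig)
    return missing


# Illustrative values only, not measurements from any real recording.
reported = [Signature("stop/start", 12.40), Signature("erase head", 95.10)]
replicated = [Signature("stop/start", 12.42), Signature("erase head", 97.00)]
not_reproduced = unconfirmed(reported, replicated)
```

With these sample values, the stop/start signature is confirmed and the erase head signature is flagged as not reproduced. An empty result would indicate that every reported signature was found again at or about the same location.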

5.4.2 If asked, an FTA should render a comprehensive report that demonstrates all of the procedures and findings in a scientific manner, survives objective scrutiny, and lends credence to its opinion and conclusion.

5.4.3 Anything an FTA hears or perceives in the playback of the DOR that is not demonstrable would be categorized as subjective, and its relevance, validity, or both would be left to the courts, juries, or other parties to determine. The FTA may, however, report on it.

5.4.4 After an FTA has completed all of the tests and examinations, has analyzed and memorialized all of the findings, and has rendered either a comprehensive written report or an oral report to the employers regarding its opinion and conclusion, based on a high degree of scientific certainty, the FTA may be permitted to testify as to its opinion and conclusion.

6 Testimony

Once an FTA has finalized its examination and analysis and reached a definitive conclusion and opinion, the FTA may be available for testimony if called upon to do so.

6.1 Preparation

6.1.1 To adequately prepare for testimony, an FTA shall attend to its files so that notes, correspondence, data, and other written or otherwise demonstrable information are in a comprehensive form. This requirement includes the cataloging of all the evidence submitted, the test recordings produced, and any and all demonstrative renderings that may be requested to be viewed by the opposing parties, their experts, and the engaging party.

6.1.2 Once the files are in order, an FTA should review all findings in a comprehensive fashion to determine that all of the calculations, demonstrative renderings, reports, and supporting information are complete and, most importantly, accurate. The FTA should thoroughly review its deposition, if one has occurred, and any and all forms of reports it may have previously rendered.

6.1.3 When an FTA is reasonably assured that it is prepared, the FTA shall proceed to prepare its employer and, first and at the very least, demonstrate the following:

a) that the FTA followed the criteria for authentication as strictly as possible;

b) that the FTA had attained a high degree of scientific certainty as to its findings, opinion, and conclusion;

c) that all of the FTA’s demonstrative waveform or spectral renderings are accurate and truthfully demonstrate all of the findings which it claims are located on the questioned recording; further, that if any other competent expert or party examined the FTA’s waveforms, that party could and would locate the signatures, events, edits, or anomalies that are graphically demonstrated in the FTA’s depictions at or about the same location as did the FTA when presenting the findings;

d) that all of the FTA’s photographs or other forms of visual magnetic domain renderings are accurate and truthfully demonstrate its findings located on the questioned recording; further, that if any other competent expert or party examined the magnetic domain, that party could and would locate the signatures, events, edits, or anomalies that are demonstrated in the FTA’s depictions at or about the same location as did the FTA when presenting the findings;

e) that the FTA has performed its examination in the most unbiased and ethical manner and that it believes the findings, opinion, and conclusion would withstand the scrutiny of peers and the legal process;

f) that the FTA should submit or make available to its employer all reference materials, instrumentation manuals, literature, or any other form of documentation or data that the FTA has relied upon during its examination and analysis, in rendering its opinion, or both. Further, the FTA should attempt to familiarize its employer with the syntax, nomenclature, or terminology used in the field;

g) that the FTA should assert that its employer can rely upon the FTA to professionally and truthfully testify as to the findings with the utmost assurance within its capabilities and competence.

6.1.4 At this point the engaging party may further interview or mock cross examine an FTA in an attempt to ascertain any issues relating to the findings, opinions, and conclusions rendered or any issues relating to prior testimony given by an FTA.

6.2 Problems

6.2.1 From time to time there are problems educating the engaging party or relating findings to it. The FTA should make itself available to clearly address the issues raised by its findings or the engaging party’s apprehensions, if any exist.

6.2.2 If an FTA senses or is otherwise led to believe that the engaging party has difficulty in comprehending the issues, or its findings, opinions, and conclusions, the FTA may suggest further preparation or offer the services of another expert to further clarify the issues or perform an independent examination and analysis of the questioned recording, in an attempt to resolve the doubts of the engaging party or otherwise assure it of the testimony to be presented.