
FUNDAMENTALS OF DIGITAL AUDIO
New Edition


THE COMPUTER MUSIC AND DIGITAL AUDIO SERIES

John Strawn, Founding Editor

James Zychowicz, Series Editor

Digital Audio Signal Processing, edited by John Strawn

Composers and the Computer, edited by Curtis Roads

Digital Audio Engineering, edited by John Strawn

Computer Applications in Music: A Bibliography, Deta S. Davis

The Compact Disc Handbook, Ken C. Pohlmann

Computers and Musical Style, David Cope

MIDI: A Comprehensive Introduction, Joseph Rothstein; William Eldridge, Volume Editor

Synthesizer Performance and Real-Time Techniques, Jeff Pressing; Chris Meyer, Volume Editor

Music Processing, edited by Goffredo Haus

Computer Applications in Music: A Bibliography, Supplement I, Deta S. Davis; Garrett Bowles, Volume Editor

General MIDI, Stanley Jungleib

Experiments in Musical Intelligence, David Cope

Knowledge-Based Programming for Music Research, John W. Schaffer and Deron McGee

Fundamentals of Digital Audio, Alan P. Kefauver

The Digital Audio Music List: A Critical Guide to Listening, Howard W. Ferstler

The Algorithmic Composer, David Cope

The Audio Recording Handbook, Alan P. Kefauver

Cooking with Csound, Part I: Woodwind and Brass Recipes, Andrew Horner and Lydia Ayers

Hyperimprovisation: Computer-Interactive Sound Improvisation, Roger T. Dean

Introduction to Audio, Peter Utz

New Digital Musical Instruments: Control and Interaction Beyond the Keyboard, Eduardo R. Miranda and Marcelo M. Wanderley, with a Foreword by Ross Kirk

Fundamentals of Digital Audio, New Edition, Alan P. Kefauver and David Patschke


Volume 22 • THE COMPUTER MUSIC AND DIGITAL AUDIO SERIES

FUNDAMENTALS OF DIGITAL AUDIO

New Edition

Alan P. Kefauver and David Patschke


A-R Editions, Inc.

Middleton, Wisconsin


Library of Congress Cataloging-in-Publication Data

Kefauver, Alan P.

Fundamentals of digital audio / By Alan P. Kefauver and David Patschke.

-- New ed.

p. cm. -- (Computer music and digital audio series)

ISBN 978-0-89579-611-0

1. Sound--Recording and reproducing--Digital techniques. I. Patschke,

David. II. Title.

TK7881.4.K4323 2007

621.389'3--dc22

2007012264

A-R Editions, Inc., Middleton, Wisconsin 53562

© 2007 All rights reserved.

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

1. Music—Data processing. 2. Music—Computer programs.


Contents

List of Figures ix

Preface to the New Edition xiii

Chapter One The Basics 1
Sound and Vibration 1

The Decibel 9

The Analog Signal 16

Synchronization 19

Chapter Two The Digital PCM Encoding Process 23
Sampling 23

Quantization 27

Analog-to-Digital Conversion 32

Chapter Three The Digital Decoding Process 45
Data Recovery 45

Error Detection and Correction 47

Demultiplexing 50

Digital-to-Analog Converter 50

Sample and Hold 53

Reconstruction Filter 53

Output Amplifier 56


Chapter Four Other Encoding/Decoding Systems 57
Higher-Bit-Level Digital-to-Analog Converters 57

Oversampling Digital-to-Analog Converters 57

Oversampling Analog-to-Digital Converters 59

One-Bit Analog-to-Digital Converters 59

Direct Stream Digital (DSD) 61

High Definition Compact Digital (HDCD) 62

Chapter Five Data Compression Formats 63
Lossless Compression 64

Lossy Compression 65

Chapter Six Tape-Based Storage and Retrieval Systems 75

Rotary Head Tape Systems 76

Digital Audiotape (DAT) Systems 81

Record Modulation Systems 84

Multitrack Rotary Head Systems 86

8mm, 4mm, and Digital Linear Tape (DLT) Storage Systems 91

Fixed-Head Tape-Based Systems 92

Chapter Seven Disk-Based Storage Systems 101
Optical Disk Storage Systems 101

Magnetic Disk Storage Systems (Hard Disk Drives) 120

Solid State (Flash) Memory 125

Chapter Eight Digital Audio Editing 127
Tape-Based Editing Systems 127

Disk-Based Editing Systems 130

Personal Computers and DAWs 133


Chapter Nine The Digital Editing and Mastering Session 147

Tracks, Input/Output Channels, and Tracks 149

The Editing Session 152

The Multitrack Hard-Disk Recording Session 161

Chapter Ten Signal Interconnection and Transmission 167

Electrical Interconnection 167

Optical Interconnection 170

Digital Audio Broadcast 174

Glossary 177
Further Reading/Bibliography 185
Index 187


List of Figures

Chapter 1

Figure 1.1 A sound source radiating into free space.

Figure 1.2 A musical scale.

Figure 1.3 The musical overtone series.

Figure 1.4 The envelope of an audio signal.

Figure 1.5 A professional sound level meter (courtesy B and K Corporation).

Figure 1.6 Typical sound pressure levels in decibels.

Figure 1.7 The inverse square law.

Figure 1.8 The Robinson-Dadson equal loudness contours.

Chapter 2

Figure 2.1 Waveform sampling and the Nyquist frequency.

Figure 2.2 Waveform sampling at a faster rate.

Figure 2.3 Waveform sampling and aliasing.

Figure 2.4 Voltage assignments to a wave amplitude.

Figure 2.5 Comparison of quantization numbering systems.

Figure 2.6 Offset binary and two’s complement methods.

Figure 2.7 A waveform and different types of digital encoding.

Figure 2.8 Block diagram of a PCM analog-to-digital converter.

Figure 2.9 Filter Schematics.

Figure 2.10 Various filter slopes for anti-aliasing.

Figure 2.11 The sample-and-hold process.

Figure 2.12 A multiplexer block schematic.

Figure 2.13 Interleaving.

Chapter 3

Figure 3.1 Block diagram of a digital-to-analog converter.

Figure 3.2 Reshaping the bit stream.

Figure 3.3 Three error correction possibilities.


Figure 3.4 An 8-bit weighted resistor network converter.

Figure 3.5 A dual slope–integrating converter.

Figure 3.6 The effects of hold time on the digital-to-analog process.

Figure 3.7 Reconstruction of the audio signal.

Chapter 4

Figure 4.1 The effect of oversampling digital-to-analog converters.

Figure 4.2 PCM and PWM (1-bit) conversion.

Chapter 5

Figure 5.1 Compression/decompression in the digital audio chain.

Figure 5.2 A block diagram of lossless encoding and decoding.

Figure 5.3 A block diagram of an MPEG-1 Layer I (MP1) encoder.

Figure 5.4 A block diagram of an MPEG-1 Layer III (MP3) encoder.

Figure 5.5 A block diagram of an AAC encoder.

Figure 5.6 A block diagram of an AC-3 encoder.

Figure 5.7 ATRAC codec overview.

Chapter 6

Figure 6.1 a. Perspective view of tape wrap around a video head drum; b. top view of tape wrap around a video head drum showing details.

Figure 6.2 a. Track layout on an analog 3/4 helical scan videotape recorder; b. PCM-1630 processor and associated DMR-4000 U-matic® recorder (courtesy Sony Corp.).

Figure 6.3 a. Tape wrap on a DAT recorder’s head drum; b. track layout on a DAT recorder.

Figure 6.4 Channel coding for record modulation.

Figure 6.5 Frequency allocation for a. an 8mm and b. a Hi8 video recorder.

Figure 6.6 A Hi8mm-based eight-channel multitrack recorder (courtesy Tascam) showing a. the main unit and b. the controller.

Figure 6.7 An S-VHS-based eight-channel multitrack recorder (courtesy Alesis).

Figure 6.8 Cross-fading between the read and write heads on a digital tape recorder.

Figure 6.9 Track layout on a DASH 48- and 24-track digital tape recorder.

Figure 6.10 Cross-fading between the two write heads on a DASH digital tape recorder.

Figure 6.11 a. Transport layout on a Sony PCM-3324A digital tape recorder; b. a 48-track multitrack digital tape recorder (photo courtesy Sony Corp.).


Figure 6.12 Cross-fading between the read and write heads on a ProDigi digital multitrack recorder.

Chapter 7

Figure 7.1 Pit spacing, length, and width for the compact disc. Note how pit length defines the repeated zeroes.

Figure 7.2 Light reflected directly back maintains the 0 series while the transition scatters the beam, denoting a change. Dust particles on the substrate are out of focus.

Figure 7.3 Compact disc specifications.

Figure 7.4 The compact disc pressing process.

Figure 7.5 Compact Disc Interactive (CD-i) audio formats.

Figure 7.6 a. The front panel of a CD-R machine; b. the rear of the same machine showing the RS-232 interface as well as the AES/EBU, S/PDIF, and analog inputs and outputs (courtesy Tascam).

Figure 7.7 Super CD.

Figure 7.8 The recording system for the MiniDisc.

Figure 7.9 The playback system for the MiniDisc.

Figure 7.10 A professional magneto-optical recorder (courtesy Sony Corp.).

Figure 7.11 MiniDisc specifications.

Figure 7.12 A computer hard disk showing sectors and tracks.

Figure 7.13 A computer hard disk stack showing the concept of cylinders.

Figure 7.14 Various flash-memory products, from left to right: Secure Digital, CompactFlash, and Memory Stick (courtesy SanDisk).

Chapter 8

Figure 8.1 The assemble edit process.

Figure 8.2 A professional digital editor for tape-based digital recorders (photo courtesy Sony Corp.).

Figure 8.3 a. A 1-terabyte RAID Level-0 configuration diagram; b. a 500-GB RAID Level-1 configuration diagram.

Figure 8.4 a. The front panel of an 8-channel A-to-D (and D-to-A) converter; b. the back panel of the same unit. Notice the FireWire connections (photos courtesy Tascam).

Figure 8.5 A compressor software plug-in for a computer-based digital audio workstation.

Figure 8.6 A software instrument plug-in (specifically, a Hammond B3 emulator/virtual instrument) for use with a MIDI controller on a computer-based digital audio workstation.


Figure 8.7 Diagram of a computer-based digital audio workstation.

Figure 8.8 A DAW processor card for a PCI slot in a personal computer (photo courtesy Avid/Digidesign).

Figure 8.9 A multi-channel A-to-D (and D-to-A) converter that connects directly to a processor card located in a computer-based DAW (photo courtesy Avid/Digidesign).

Figure 8.10 An example of a control surface to manually adjust DAW software(courtesy Tascam).

Figure 8.11 An example of a stand-alone DAW unit (photo courtesy Tascam).

Chapter 9

Figure 9.1 Editing screen from Digidesign Pro Tools software.

Figure 9.2 Editing screen from Apple Logic Pro software.

Figure 9.3 Editing screen from Steinberg Nuendo software.

Figure 9.4 Mixing window from Digidesign Pro Tools software.

Figure 9.5 Mixing window from Apple Logic Pro software.

Figure 9.6 a. A portion of 2-track (stereo) material selected in Digidesign Pro Tools software; b. that same portion being duplicated and dragged into a new stereo track.

Figure 9.7 An edit window in Digidesign Pro Tools software showing a. nominal resolution of a waveform, b. “zooming in” on a portion of the waveform, and c. maximum magnification of the waveform for precise selection.

Figure 9.8 The cross-fade and -3dB down point of a digital edit.

Figure 9.9 The crossfade editing process in Digidesign Pro Tools software: a. selecting the portion of material to crossfade, and determining the precise options for the fade; b. the resulting waveform after the fade.

Figure 9.10 A magnetic tape splice with a linear slope.

Figure 9.11 The crossfade editor in Apple Logic Pro software.

Figure 9.12 a. An example of an input matrix for Digidesign Pro Tools software; b. An output matrix example for the same.

Figure 9.13 A multitrack project opened in Digidesign Pro Tools software. Note that although there are 18 tracks (some of them stereo, even), in this example the hardware supports only up to 8 analog outputs and inputs.

Figure 9.14 An equalization software plug-in for a software-based DAW.

Figure 9.15 A reverb software plug-in for a software-based DAW.

Chapter 10

Figure 10.1 Frequencies and their uses.


Preface to the New Edition

Back when I was in school—and it wasn’t so long ago—being in a recording studio was an immense (and immersive) experience. Humongous mixers filled whole rooms, with just enough space left for a couple of floor-to-ceiling racks full of graphic equalizers, reverb units, noise reduction, gates, and all manner of sound modification devices. Another wall was filled with large multitrack tape recorders.

I’m remembering the good old days as I sit on my deck, typing this on my laptop computer that has built into it a humongous mixer; equalizers, reverb units, gates and all manner of sound modification devices; and the capability to record or export my finished audio programs in any number of professional formats.

Whether because of the demand for quality and portability or because of its leaner data requirements, audio was at the forefront of the digital entertainment revolution. Audio and music today exist almost exclusively in digital form for three reasons: reduced manufacturing costs, Moore’s Law, and the Internet.

The Internet you know about: a sprawling, worldwide network of computers through which increasing numbers of people get their information and entertainment and do increasing amounts of their business.

Moore’s Law goes something like this: Computers double in speed for the same cost every 18 months. In addition, the price-per-byte of memory continues to fall similarly from year to year. I remember spending $200 for 256 kilobytes of memory for my first computer (an 8 MHz model). Today, that same $200 would buy about two gigabytes—almost an 8,000x increase. What all this means is that even off-the-shelf computers today are more than adequate to record, process, edit, and listen to multiple channels of digital audio.

Once available only to the most elite recording institutions, high-quality digital audio recording is now accessible to almost everyone. Consequently, unlike years ago, when high-quality equipment and restricted distribution dictated what would be recorded and how it would be heard, today anyone can be an artist, engineer, and label.

Some people do well with this accessibility and others do not. Learning the fundamentals won’t give you insight for making subjective production decisions, but it will give you a solid basis for making them.


Don’t get too comfortable with your new high-tech equipment and skills, though, because the one fact about this industry is that everything will continue to change, like it or not, and faster than you would prefer.

That makes understanding the basic concepts and principles underlying your equipment all the more important. And that is what this book hopes to accomplish.


ONE

The Basics


SOUND AND VIBRATION

Sound has been discussed many times and in many ways in almost every book written about recording and the associated arts. Yet, without a thorough knowledge of the nature of sound, understanding how analog sound is handled and converted to and from the digital environment is impossible. Sound, by its nature, is elusive, and defining it in concise terms is not easy.

The Basic Characteristics of Sound

The major components of sound are frequency, amplitude, wavelength, and phase. Other components are velocity, envelope, and harmonic structure. Because we discuss sound as a wave later in this book, now may be a good time to visualize a sound wave in the air. Figure 1.1 shows a sound wave traveling through free space. At the source of the sound (point a), a sound wave radiates omnidirectionally away from itself. The sound energy from the sound source is transferred to the carrying medium (in this case air) as air compressions (point b) and rarefactions (point c). The air itself does not move, but the movement of individual air particles is transferred from one particle to the next in much the same way that waves move on the surface of the sea. It is probably easiest to visualize the creation and movement of sound waves by imagining a large drum being struck: the head (or skin) of the drum moves in and out after it is struck, causing subsequent air compressions and rarefactions emanating away from the drum.

Velocity

The speed of energy transfer equals the velocity of sound in the described medium. The velocity of sound in air at sea level at 70 degrees Fahrenheit is 1,130 feet per second (expressed metrically as 344 meters per second). The velocity of sound depends on the medium through which the sound travels.


For example, sound travels through steel at a velocity of about 16,500 feet per second. Even in air, the velocity of sound depends on the density of the medium. For example, as air temperature increases, air density drops, causing sound to travel faster. In fact, the velocity of sound in air rises about 1 foot per second for every degree the temperature rises. The formula for the velocity of sound in air is:

V = 49 × √(459.4 + °F)
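As a quick check on the formula above, here is a minimal Python sketch; the function name and the 70-degree test value are illustrative, not from the text:

import math

def speed_of_sound_ft_per_s(temp_f):
    # V = 49 * sqrt(459.4 + degrees Fahrenheit), per the formula above
    return 49 * math.sqrt(459.4 + temp_f)

print(round(speed_of_sound_ft_per_s(70)))  # about 1,127 ft/s, close to the 1,130 ft/s figure quoted above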


FIGURE 1.1 A sound source radiating into free space.


Note in Figure 1.1 that as sound moves farther from its source, its waves become less spherical and more planar (longitudinal).

Wavelength

The distance between successive compressions or rarefactions (i.e., the sound pressure level over and under the reference pressure) is the wavelength of the sound that is produced. The length of the sound wave is itself not that useful for our purposes.

Wavelength = Velocity / Frequency

However, the frequency of the sound is important. Because we know the velocity of sound, it is easy to determine the frequency when the wavelength is known. The simple formula can be changed to read:

Frequency = Velocity / Wavelength

Thus, if the distance from compression to compression is 2.58 feet, the frequency is 440 cycles per second, or 440 hertz (Hz). The period of the wave, or the time it takes for one complete wave to pass a given point, can be defined by the formula

Period = 1 / Frequency

Therefore, the period of this wave is 1/440 of a second. As the period becomes shorter, the frequency becomes higher. This sound, with a frequency of 440Hz and a period of 1/440 of a second, is referred to in musical terms as “A,” that is, the A above middle C on the piano. Although sounds exist both higher and lower and range varies from person to person, we’ll assume generally that the range of human hearing is approximately 20 Hz to 20,000Hz (20kHz).
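The wavelength, frequency, and period relationships above can be checked with a short Python sketch; the variable names are illustrative, not from the text:

velocity = 1130.0     # ft/s, speed of sound in air at sea level, 70 degrees Fahrenheit
wavelength = 2.58     # ft, distance from compression to compression

frequency = velocity / wavelength   # about 438 Hz, which the text rounds to 440 Hz (the A above middle C)
period = 1 / frequency              # about 0.00228 s, i.e. roughly 1/440 of a second

print(frequency, period)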

Musical Notation and Frequency

It is beyond the scope of this book to teach the ability to read musical notation. However, this skill is essential to becoming a competent recording engineer, and the student is strongly advised to pursue the study of music if preparing for a career in the recording arts.

As mentioned above, the note A has a frequency of 440Hz (this is the note occupying the second space of the treble-clef staff).


The A that is located on the top line of the bass clef staff is an octave below 440Hz and has a frequency of 220Hz. An octave relationship is a doubling or halving of frequency. Figure 1.2 shows a musical scale with the corresponding frequencies.

Harmonic Content

Very few waves are a pure tone (i.e., a tone with no additional harmonic content). In fact, only a sine wave is a true pure tone. When an object (e.g., a bell) is struck, several tones at different frequencies are produced. The fundamental resonance tone of the bell is heard first, followed by other frequencies at varying amplitudes. The next tone is a doubling of the fundamental frequency (an octave above it), followed by a tone at three times the fundamental, heard as a musical interval of a fifth above that octave. Each of these harmonics changes amplitude slightly over time, adding to the individual characteristics of the sound.

For example, a bell with a fundamental frequency of 64Hz produces harmonics of 128Hz and 192Hz, which is a G, a fifth above the second C. Many other harmonics at varying amplitudes are produced, depending on the metallic composition of the bell. These harmonics are arranged in a relationship called the overtone series, and the combination of these harmonics, or overtones, gives sound its specific timbre, or tone coloration.

The difference in harmonic content makes an oboe sound different from a clarinet. Although an oboe and a clarinet produce the same note with the same fundamental frequency, the number of overtones and the amplitude of each differs.

The overtone series is most often notated in the musical terminology of octaves, fifths, fourths, and so on, but actually corresponds to the addition of the fundamental frequency. Therefore, an overtone series based on the note C is, in hertz, 65, 130, 195, 260, 325, 390, 455, 520, 585, 650, 715, and so on. In musical terms this is a series of the fundamental followed by an octave, a perfect fifth, a perfect fourth, a major third, a minor third, another minor third, three major seconds, and a minor second (see Figure 1.3).
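Because each overtone is simply the fundamental frequency added again, the series is easy to generate; here is a minimal Python sketch with illustrative names, not taken from the text:

fundamental = 65  # Hz, the C used in the example above
overtone_series = [fundamental * n for n in range(1, 12)]
print(overtone_series)
# [65, 130, 195, 260, 325, 390, 455, 520, 585, 650, 715]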

[The notated scale labels C as 130Hz, A as 220Hz, c as 260Hz, a as 440Hz, and c1 as 520Hz.]

FIGURE 1.2 A musical scale.


The frequency that is twice the frequency of the fundamental is called the second harmonic even though it is the first overtone. Confusion often exists concerning this difference in terminology.

The third harmonic is the first non-octave relationship above the fundamental (it is the fifth), and as such, any distortion in this harmonic tone is often detected before distortion is heard in the tone a fifth below (the second harmonic). Many analog audio products list the percentage of third harmonic distortion found (because it is the most audible) as part of their specifications.

The Sound Envelope

All sounds have a start and an end. The drop in amplitude, called a decay, often occurs after the start of the sound and is followed by a sustain and a final drop in amplitude before the event ends completely. The four main parts of the sound envelope are (1) the attack, (2) the initial decay, (3) the sustain (or internal dynamic), and (4) the final decay (often referred to as the release). These are shown in Figure 1.4.

The attack is the time that it takes for the sound generator to respond to whatever has set it into vibration. Different materials, with their different masses, have different attack times. How an object is set into vibration also affects its attack time. A softly blown flute has a longer attack time than a sharply struck snare drum. In general, struck instruments have an attack time that is much faster (in the 1- to 20-millisecond range) than wind instruments (in the 60- to 200-millisecond range). String attack times vary, depending on whether the instrument is bowed or plucked. The attack time of an instrument can be represented by an equivalent frequency based on the following formula:

T = 1 / Frequency

[Notated example labeling the fundamental and the 2nd, 3rd, and 4th harmonics.]

FIGURE 1.3 The musical overtone series.


Rearranging this formula gives:

Frequency = 1 / T

This means that if an instrument has an attack time of 1 millisecond, the equivalent frequency is 1 kilohertz (kHz). This fact is important to remember when trying to emphasize the attack of an instrument. The prudent engineer remembers this when applying equalization, or tonal change, to a signal.
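A minimal Python sketch of the attack-time relationship; the function name and the 20-millisecond test value are illustrative, not from the text:

def equivalent_frequency(attack_time_s):
    # Frequency = 1 / T, per the formula above
    return 1.0 / attack_time_s

print(equivalent_frequency(0.001))   # 1 ms attack -> 1000.0 Hz (1 kHz)
print(equivalent_frequency(0.020))   # 20 ms attack -> 50.0 Hz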

The initial decay, which occurs immediately after the attack on most instruments, is caused by the cessation of the force that set the tube, string, or medium into vibration. It is the change in amplitude between the peak of the attack and the eventual leveling-off of the sound, known as the sustain.


FIGURE 1.4 The envelope of an audio signal.


The length of the sustain varies, depending on whether the note is held by the player for a specific period of time (e.g., when a trumpet player holds a whole note) or on how long the medium continues to vibrate (medium resonance) before beginning the final decay, which occurs when the sound is no longer produced by the player or by the resonance of the vibrating medium. As the trumpet player releases a held note, the air column in the instrument ceases to vibrate, and the amplitude decays exponentially until it is no longer audible. Final decays vary from as short as 250 milliseconds to as long as 100 seconds, depending on the vibrating medium. However, not all frequencies or harmonics decay at the same rate. Most often, the high-frequency components of the sound decay faster than the low-frequency ones. This causes a change in the sound’s timbre and helps define the overall sound of the instrument (or other sound-producing device).

Masking

Many references can be found in the literature about the equal loudness contours (discussed later in this chapter) developed by Fletcher and Munson and later updated by Robinson and Dadson. These curves relate our perception of loudness at varying frequencies and amplitudes and apply principally to single tones. How many times have you sat in a concert hall listening to a recital in which one instrumentalist drowned out another? Probably more than once. When you consider the fact that a piano, even at half-stick (i.e., with the lid only partially raised), can produce a level of around 6 watts, whereas a violinist, at best, produces 6 milliwatts, it is easy to understand why the violinist cannot be heard all the time. Simply stated, loud sounds mask soft ones. The masking sound needs to be only about 6 decibels higher than the sound we want to hear to mask the desired sound.

It makes perfect sense that a loud instrument will cover a softer one if they are playing the same note or tone, but what if one instrument is playing a C and another an A? Studies by Zwicker and Feldtkeller have shown that even a narrow band of noise can mask a tone that is not included in the spectrum of the noise itself. For example, a low-frequency sine wave can easily mask, or apparently reduce the level of, a higher sinusoidal note that is being sounded at the same time. This masking occurs within the basilar structure of the ear itself. The louder tone causes a loss in sensitivity in neighboring sections of the basilar membrane. The greatest amount of masking occurs above the frequency of the masking signal rather than below it, and the greater the amplitude of the masking signal, the wider the frequency range masked.

However, when a note is produced by an instrument that has a complicated, dense sound spectrum (i.e., a note with a rich harmonic structure), that sound will usually mask sounds that are less complicated (less dense). In fact, many newer buildings that use modular, or “carrel,” types of office space are equipped to send constant low-level random wide-band noise through loudspeakers in their ceilings.


This masking signal keeps the conversations in one office carrel from intruding into adjacent office space. The level of the masking signal is usually around 45 decibels, and this (plus the inverse square law, which is discussed later) effectively provides speech privacy among adjacent spaces. This effect is used also by some noise reduction systems when, as the signal rises above a set threshold, processing action is reduced or eliminated. In addition, the masking effect is a critical part of the data-reduction systems used in some of the digital audio storage systems discussed later in this book.

Localization

A person with one ear can perceive pitch, amplitude, envelope, and harmonic content but cannot determine the direction from which the sound originates. The ability to localize a sound source depends on using both ears, often referred to as binaural hearing. Several factors are involved in binaural hearing, depending on the frequency of the sound being perceived.

The ears are separated by a distance of about 6 1/2 or 7 inches so that sound waves diffract around the head to reach both ears. When the wavelength of sound is long, the diffraction effect is minimal and the comparative amplitude at each ear about the same. However, when the wavelength is short, the diffraction effect is greater and the sound is attenuated at the farther ear. Because the sound has to travel farther relative to wavelength, there is a perceptual time difference factor as well. You may have noticed that it is easier to locate high-frequency sounds than low-frequency ones. In fact, low-frequency signals often appear to be omnidirectional because of this effect. We can say that high frequencies (above 1kHz) are localized principally by amplitude and time differences between the two ears.

How, then, do we localize low frequencies? With all sound there is a measurable time-of-arrival difference between the two ears when the sound is to one side or the other. As the sound moves to a point where it is equidistant from both ears, these time differences are minimized. With longer wavelengths the time-of-arrival differences are less noticeable because the ratio of the time difference to the period of the wave is small. However, this creates phase differences between the two ears, allowing the brain to compute the relative direction of the sound. Therefore, we can say that low frequencies are located by intensity and phase differences.

The Haas Effect and the Inverse Square Law

Although much has been said and written about the Haas Effect, also known as the precedence effect, and the inverse square law, we can say that the sound we hear first defines for us the apparent source of the sound. Consider the following example. Two trumpet players stand in front of you, one slightly to the right at a distance of 5 feet and another slightly to the left at a distance of 10 feet.


If they play the same note at the same amplitude, you will localize the sound to the right because it is louder. This amplitude difference is due to the inverse square law, which states that for every doubling of distance there is a 6-decibel loss in amplitude in free space (i.e., where there are no nearby reflecting surfaces). Now suppose that both players sustain their notes. If player A (on the left) increases his amplitude by 6 decibels, the sound levels will balance, but your ear-brain combination will insist that player B (on the right) is closer to you than player A. Although the levels have been equalized, you perceive the nearer player to be closer. You would think that as long as the levels are identical at both ears, the players would appear to be equidistant from you.

According to Haas, “Our hearing mechanism integrates the sound intensities over short time intervals similar, as it were, to a ballistic measuring instrument.” This means that the ear-brain combination integrates the very short time differences between the two ears, causing the sound with the shortest timing differences to appear louder and therefore closer. Haas used two sound sources and, while delaying one, asked a subject to vary the loudness of the other source until it matched the sound level of the delayed sound. He found that where the delay was greater than 5 but less than 30 milliseconds, the amplitude of the delayed source had to be 10 decibels louder than the signal from the nondelayed source for the two sounds to be perceived as equal. Beyond a delay of 30 milliseconds a discrete echo was perceived, and prior to 5 milliseconds the level needed to be increased incrementally as the delay lengthened.

THE DECIBEL

Earlier in this chapter we discussed the basic characteristics of sound, but one that was conspicuously absent was amplitude. The unit of measure that is normally used to define the amplitude of sound levels is the decibel (dB). A Bel is a large, rather cumbersome unit of measure, so for convenience it is divided into 10 equal parts and prefaced with deci to signify the one-tenth relationship. Therefore, a decibel is one tenth of a Bel.

A sound level meter measures sound in the environment. A large commercial aircraft taking off can easily exceed 130dB, whereas a quiet spot in the summer woods can be a tranquil 30dB. Most good sound level meters are capable of measuring in the range of 0dB to 140dB. You might think that the 0dB level is the total absence of sound, but it is not. Actually, 0dB is the lowest sound pressure level that an average listener with normal hearing can perceive. This is called the threshold of hearing. The 0dB reference level corresponds to a sound pressure level of 0.0002 dynes per square centimeter (dynes/cm²), which, referenced to power or intensity in watts, equals 0.000000000001 watts per square meter (W/m²). Also referred to in this book is the threshold of feeling, which is typically measured as an intensity of 1W/m². A typical professional sound level meter is shown in Figure 1.5.


As you can see, the decibel must have a reference, which, when we discuss sound levels in an acoustic environment, is the threshold of hearing. In fact, the decibel is defined as 10 times the logarithmic relationship between two powers. The formula for deriving the decibel is:

dB = 10 log (PowerA / PowerB)

where PowerA is the measured power and PowerB the reference power. We can use this formula to define the amplitude range of human hearing by substituting the threshold of hearing (0.000000000001W/m²) for the reference power and using the threshold of feeling (1W/m²) as the measured power. The formula, with the proper values inserted, looks like this:


FIGURE 1.5 A professional sound level meter (courtesy B and K Corporation).


10 log (1W/m² / 1 × 10⁻¹²W/m²) = 120dB

Therefore, the average dynamic range of the human ear is 120dB. Figure 1.6 shows typical sound pressure levels, related to the threshold of hearing, found in our everyday environment. Note also the level called the threshold of pain.

It is interesting to note that other values can be obtained using the power formula. For example, if we use a value of 2W in the measured power spot and a value of 1W in the reference power spot, we find that the result is 3dB. That is, any 2-to-1 power relationship can be defined simply as an increase in level of 3dB. Whether the increase in power is from 100W to 200W or from 2,000W to 4,000W, the increase in level is still 3dB.
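The two decibel results above can be reproduced with a few lines of Python; the function and variable names are illustrative, not from the text:

import math

def db_power(measured_w, reference_w):
    # dB = 10 log (PowerA / PowerB)
    return 10 * math.log10(measured_w / reference_w)

threshold_of_hearing = 1e-12   # W/m^2
threshold_of_feeling = 1.0     # W/m^2

print(db_power(threshold_of_feeling, threshold_of_hearing))  # 120.0 dB, the ear's average dynamic range
print(db_power(2, 1))                                        # about 3.01 dB for any 2-to-1 power ratio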

The inverse square law was mentioned earlier in this chapter. You might surmise that doubling distance would cause a sound level loss of one half, or -3dB. However, you must remember that sound radiates omnidirectionally from the source of the sound. Recall from your high school physics class that the area of a sphere is determined by the formula A = 4πr². It follows that when a source radiates to a point that is double the distance from the first, it radiates into four times the area instead of twice the area. This causes a 6dB loss of level instead of a 3dB loss. The formula for the inverse square law is:

Level drop = 10 log (r2/r1)² = 20 log (r2/r1) = 6dB

where r1 equals 2 feet and r2 equals 4 feet. Figure 1.7 shows this phenomenon. Note that at a distance of 2 feet, the sound pressure level is 100dB. When the listener moves to a distance of 4 feet (i.e., twice the original distance), the level drops to 94dB.
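Here is a minimal Python sketch of the inverse square law, using the 2-foot and 4-foot distances from the example; the function name is illustrative:

import math

def inverse_square_loss_db(r1, r2):
    # Level drop = 20 log (r2 / r1), per the formula above
    return 20 * math.log10(r2 / r1)

print(inverse_square_loss_db(2, 4))   # about 6.02 dB lost when the distance doubles
# so a 100 dB level at 2 feet falls to roughly 94 dB at 4 feet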

Equal Loudness Contours

Ambient, or background, noise is all around us. Even in a quiet concert hall, the movement of air in the room can be 30dB or more above the threshold of hearing at one or more frequencies. However, the ear is not equally sensitive to sound pressure at all frequencies.

The ear is most sensitive to the frequency range between 3,000Hz and 5,000Hz. A tone that is heard just at the threshold of hearing (1 × 10⁻¹²W/m²) at a frequency of 4,000Hz corresponds to a sound pressure level of 0dB. For the tone to be perceived by the same listener at the same loudness level when the frequency is lowered to 400Hz, the amplitude of the tone must be raised about 12dB. If we lower the frequency to 40Hz, the level must be raised nearly 45dB to be perceived at the same volume. Figure 1.8 shows the equal loudness contours developed by Fletcher and Munson and updated later by Robinson and Dadson.


[The chart ranges from 0dB at the threshold of hearing to 180dB for rocket engines, with intermediate examples including dripping water, a quiet recording studio, a quiet concert hall, subdued and average conversation, an average office, a chamber orchestra, an average factory, heavy truck traffic, a power lawn mower, a subway train, an average and a full symphony orchestra at FFF, a jackhammer, a jet engine close up, and the thresholds of feeling and pain; the upper levels are marked as danger levels.]

FIGURE 1.6 Typical sound pressure levels in decibels.


The contours are labeled phons, which range from 10 to 120; at 1,000Hz the phon level is the same as the sound pressure level in decibels. Therefore, the phon is a measure of equal loudness. Figure 1.8 has a curve labeled “MAF,” which stands for “minimum audible field.” This curve, equal to 0 phons, defines the threshold of hearing. Note that at low sound pressure levels, low frequencies must be raised substantially in level to be perceived at the same loudness as 1kHz. Frequencies above 5kHz (5,000Hz) need to be raised as well, although not as much. Looking at Figure 1.8, you can see that the discrepancies between lows, highs, and mid tones are reduced as the level of loudness increases. The 90-phon curve shows a variation of 40dB, whereas the 20-phon curve shows one of nearly 70dB.

The sound level meter in Figure 1.5 has several weighting networks so that at different sound pressure levels the meter can better approximate the response of the human ear. The A, B, and C weighting networks correspond to the 40-, 70-, and 100-phon curves of the equal loudness contours.


FIGURE 1.7 The inverse square law.


Logarithms

As you may have noticed, the formula used to define the decibel applied logarithms, abbreviated “log.” Anyone involved in the audio engineering process needs to understand logarithms. In brief, a logarithm of a number is that power to which 10 must be raised to equal that number—not multiplied, but raised.

[The contours are plotted as sound pressure level in dB (re 20 µN/m²) against frequency in hertz from 20Hz to 10,000Hz, with loudness level curves labeled in phons from 10 to 120 and the MAF curve below them.]

FIGURE 1.8 The Robinson-Dadson equal loudness contours.


The shorthand notation for this is 10^x, where x, the exponent, indicates how many times the number is to be multiplied by 10. Therefore, 10³ is 10 raised to the third power, or simply “10 to the third”:

10¹ = 10
10² = 100
10³ = 1,000
10⁹ = 1,000,000,000

Numbers whose value is less than 1 can be represented with negative exponents, such as

0.1 = 10⁻¹
0.01 = 10⁻²
0.001 = 10⁻³
0.000001 = 10⁻⁶

Because we are talking about very large and very small numbers, prefix names can be added to terms such as hertz (frequency) and ampere (a measure of current flow) to indicate these exponents. Therefore, for large numbers,

1,000 cycles per second = 10³ hertz (or 1 kilohertz [1kHz])
10⁶ hertz = 1 megahertz (1MHz)
10⁹ hertz = 1 gigahertz (1GHz)
10¹² hertz = 1 terahertz (1THz)

For small numbers,

10⁻³ ampere (A) = 1 milliamp (1mA)
10⁻⁶A = 1 microamp (1µA)
10⁻⁹A = 1 nanoamp (1nA)
10⁻¹²A = 1 picoamp (1pA)

Now you can see that the threshold of hearing, defined earlier as 0.000000000001W/m², appears in the formula as 1 × 10⁻¹²W/m². It is also helpful to know that when powers of 10 are multiplied or divided, you simply add or subtract the exponents. Therefore, 10⁶ × 10³ = 10⁶⁺³ = 10⁹, and 10¹² ÷ 10⁹ = 10¹²⁻⁹ = 10³. Logarithms of numbers also exist between, for example, 1 and 10 and 10 and 100, but we will leave those problems to the mathematicians. Today, the logarithm of any number can easily be found by either looking it up in a table or pushing the log button on a calculator.
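These exponent rules are easy to verify in Python; a small illustrative check, not taken from the text:

import math

print(10**6 * 10**3 == 10**(6 + 3))    # True: multiplying powers of 10 adds the exponents
print(10**12 / 10**9 == 10**(12 - 9))  # True: dividing subtracts them
print(math.log10(1000))                # 3.0, the same answer the calculator's log button gives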


THE ANALOG SIGNAL

In order to record, reproduce, or transmit sound, it first needs to be transduced into an electrical signal. The beginning of this process requires a microphone. A microphone has a thin diaphragm that is suspended in or attached (in some fashion, depending on the type of microphone) to a magnetic field. The diaphragm moves back and forth in reaction to the sound waves that pass through it, and that movement within the magnetic field creates a small electrical signal, which is an electrical representation of the compressions and rarefactions of the sound wave. The signal is transmitted from the microphone along its cable to be amplified. Microphones generate only a tiny amount of signal (measured in volts), which needs ample amplification before it can be used for any recording or reproduction. It should be noted that entire volumes (and, indeed, careers) exist that detail microphones and capturing sound, which have been generalized for our purposes here.

Reference Levels and Metering

When we discuss reference levels, we are talking about the values of the current and voltage that pass through a cable between two pieces of equipment. Ohm’s law establishes some fundamental relationships that we should be aware of. Power, expressed in watts, is equal to either the square of the voltage divided by the resistance or the square of the current multiplied by the resistance in the circuit, that is, P = E²/R and P = I²R, respectively. In professional audio circuits we work with voltage levels (or their corresponding digital values) instead of power levels, and because the resistance in the circuit is constant, we do not need to be concerned with R at this time. To calculate the difference between two powers, we use the power formula:

dB = 10 log (PowerA / PowerB)

Because we really want to know the decibel level referenced to volts, the formula should read:

dB = 10 log (EA² / EB²)

To remove the squares from the voltages in the formula we can rewrite the expression as:

dB = 20 log (EA / EB)


Note now that any 2-to-1 voltage relationship will yield a 6dB change in level instead of the 3dB change of the straight power formula.
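A two-line Python check of the voltage form of the formula; names are illustrative, not from the text:

import math

def db_voltage(ea, eb):
    # dB = 20 log (EA / EB)
    return 20 * math.log10(ea / eb)

print(db_voltage(2, 1))   # about 6.02 dB for a 2-to-1 voltage ratio, versus 3 dB for a 2-to-1 power ratio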

There is always a standard reference level in audio circuits. This reference level, an electrical zero reference, is equivalent to the voltage found across a common resistance in the circuit. Therefore, we can compare levels by noting how many decibels the signal is above or below the reference level.

The dBm

The standard professional audio reference level is +4dBm. The dBm is a reference level in decibels relative to 1 milliwatt (mW). Zero dBm is actually the voltage drop across a 600-ohm resistor in which 1mW of power is dissipated. Using Ohm’s law (P = E²/R) we find that the voltage is 0.775 volts RMS. (This value is merely a convenient reference point and has no intrinsic significance.) The meters that were used on the original audio circuits when this standard was enacted were vacuum tube volt meters (VTVMs). As the demand for more meters grew (as stereo moved to multitrack), a less expensive meter was needed. An accurate meter, which needs a low impedance, would load down the circuit it was measuring and thereby cause false readings. To compensate for the loading effect of the meter, a 3.6kΩ resistor was inserted in the meter path. Now the meter no longer affected the circuit it was measuring, although it read 4dBm lower. When the meter was raised to 0VU (volume units), the actual line level was +4dBm.
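The 0.775-volt figure falls straight out of Ohm’s law; here is a minimal Python sketch (the +4dBm line is an extra illustrative check, not a value computed in the text):

import math

# 0 dBm is 1 mW dissipated in a 600-ohm resistor; from P = E^2 / R, E = sqrt(P * R)
reference_voltage = math.sqrt(0.001 * 600)
print(round(reference_voltage, 3))          # 0.775 volts RMS

# the +4 dBm professional line level is 10**(4/10) mW in the same 600 ohms
plus_four_dbm = math.sqrt(0.001 * 10**(4 / 10) * 600)
print(round(plus_four_dbm, 3))              # about 1.228 volts RMS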

The notation “dBu” is often found in the specifications in manuals that come with digital equipment. As was just mentioned, we know that the dBm is referenced to an impedance of 600 ohms. (This was derived from the early telephone company standards. Many of our audio standards originated with “Ma Bell.”) However, most circuits today are bridging circuits instead of the older style matching circuits, and the dBu is used as the unit of measure. Without going deeper into the subject of matching and bridging circuits, we can say that the dBu is equal to the dBm in most cases.

The Volume Unit

If the meters on the recording devices that we use to store digital audio displayed signal level in dBm, they would be very large and difficult to read. Considering the variety of system operating levels found on recording devices today, a comprehensive meter would have to range from -40dBm or -50dBm to +16dBm. This would be a range of around 76dB. To make this easier, we use the volume unit (VU) to measure signal level through the device. In most professional applications, this meter is calibrated so that an input level of +4dBm equals 0VU. However, if the device operates at -10dBm, as many consumer-type ones do, then -10dBm equals 0VU.


Consider an experiment where a calibration tone of 1kHz at 0VU is played back from an analog consumer device operating at a level of -10dBm and from a professional analog device operating at +4dBm. If those signals were both sent to an analog-to-digital converter, the digitized signal level would be the same from both machines. As this signal is played back from the storage medium, the digital-to-analog converter outputs the calibration tone and produces a level of -10dBm or +4dBm at the device’s output, depending on the device. The same signal played back on either a consumer or professional device will produce the reference level output at the device’s specified line level. If we were to compare the output of the two devices, the +4dBm machine would play back 14dB louder, but this difference is due to the output gain of the playback section amplifier.

The classic VU meter is calibrated to conform to the impulse response of the human ear. The ear has an impulse response, or reaction time, that is defined as a response time and a decay time of about 300 milliseconds. This corresponds to the reaction time of the ear’s stapedius muscle, which connects the three small bones of the middle ear to the eardrum. Therefore, any peak information shorter than 300 milliseconds will not be fully recognized by the meter. The VU meter is designed to present the average content of the signal that passes through it.

The Peak Meter

On average, certain types of music have shorter attack and decay times than others, and, as you will see in Chapter 2 during the discussions of digital recording, once full level is reached, there is no margin for error. A peak meter is a device that is designed to respond to all signals (no matter how short) that pass through it and is much more suitable for digital recording. The ballistics, or response time, of a standard DIN peak meter are designed to fulfill these requirements. A peak meter will reach full scale in 10 milliseconds. However, if the peak meter, after reaching the level of the input signal, were allowed to fall back at the same rate, the eye would be unable to follow its rapid movement. The metered value is held in a capacitor for a specified amount of time and allowed to fall back using a logarithmic response. Because the peak meter is not a volume indicator, it is correct to read it in decibels instead of volume units. Analog recording uses several standards for what is called the zero level for a peak meter. The International Electrotechnical Commission (IEC) has defined these meters as the BBC, where Mark 4 equals 0dBm; the Type 1, where 0dB equals +6dBm; and the Type 2, where 0dB equals 0dBm.

Traditionally, a volume indicator is calibrated so that there is some headroom above 0VU. Analog magnetic tape saturates slowly as levels rise, and typically 9dB of headroom above zero is allowed before reaching critical saturation levels. This scaling also accommodates peak information, as fast peaks tend to be about 9dB or 10dB higher than the average level. The BBC meters, as well as the Type 1 and Type 2 IEC meters, also allow for headroom above their zero reference. On the other hand, meters designed for digital audio are calibrated differently.


Zero on a digital meter means full quantization level; that is, all bits at full value (there is no headroom). Therefore, it is important to note that 0VU on the console (whether it is +4dBm, -10dBm, or another level) does not equal 0dB on the digital meter. In early cases, -12dB was chosen as the calibration point for digital metering, but with the advent of systems with higher quantization levels, -18dB is more often used. To differentiate between peak meters and digital meters, the term “dBfs” is used (“fs” stands for “full scale”). This implies that the meter is on some kind of digital machine where 0dBfs equals full quantization level (all bits are used). Manufacturers have used a variety of standards, and some allow several decibels of level above zero as a hidden headroom protection factor. Perhaps in the future a digital metering standard will be adopted that everyone can adhere to. In the meantime the prudent engineer will read the manual for the piece of equipment in use and be aware of its metering characteristics. Most professional equipment in production today uses the standard of 0VU = -18dBfs.

SYNCHRONIZATION

Although not a basic characteristic or function of sound, time code is an important part of digital systems. Without time code, position locating and synchronization in the tape-based digital domain would be extremely difficult. Time code, as we know it today, was developed by the video industry to help with editing. In 1956, when videotape made its debut, the industry realized that the film process of cut-and-splice would not work in video. The images that were visible on film were not so on videotape. Certain techniques (e.g., magnetic ink that allowed you to see the recorded magnetic pulses) were developed, but these did not prove satisfactory. Another technique was to edit at the frame pulse or control track pulse located at the bottom edge of the videotape. This pulse tells the head how fast to switch in a rotating-head system. In the 1960s, electronic machine-to-machine editing was introduced, providing precise machine control and excellent frame-to-frame match-up. However, the splice point was still found by trial and error.

Time Code

A system was needed that would uniquely number each frame so that it could be precisely located electronically. Several manufacturers introduced electronic codes to fulfill this task, but the codes were not compatible with one another. In 1969 the Society of Motion Picture and Television Engineers (SMPTE) developed a standard code that became recognized for its accuracy. That standard was also adopted by the European Broadcasting Union (EBU), making the code an international standard. The SMPTE/EBU code is the basis for all of today’s professional video- and audiotape editing and synchronization systems.


The SMPTE time code is an addressable, reproducible, permanent code that stores location data in hours, minutes, seconds, and frames. The data consist of a binary pulse code that is recorded on the video- or audiotape along with the corresponding video and audio signals. The advantages of this are (1) precise time reference, (2) interchange abilities among editing systems, and (3) synchronization between machines. On analog-based tape machines, the code can be stored in two different ways: longitudinal time code (LTC), which is an audio signal that is stored on a separate audio track of the video or audio machine; and vertical interval time code (VITC), which is small bursts of video integrated into the main video signal and stored in the vertical blanking interval (i.e., between the fields and frames of the video picture).

LTC

Longitudinal time code is an electronic signal that switches from one voltage to another, forming a string of pulses, which can be heard as an audible warbling sound if amplified. Each 1-second-long piece of code is divided into 2,400 equal parts when used in the NTSC standard of 30 frames per second or into 2,000 equal parts when used with the PAL/SECAM system of 25 frames per second. Notice how each system generates a code word that is 80 bits long: PAL/SECAM: 2,000 bits per second divided by 25 frames per second equals 80 bits per frame; NTSC: 2,400 bits per second divided by 30 frames per second equals 80 bits per frame. Most of these bits have specific values that are counted only if the time code signal changes from one voltage to another in the middle of a bit period, forming a 1⁄2-bit pulse, which represents a digital 1, whereas a full-bit pulse represents a digital 0. Following is one frame’s 80-bit code:

Bits            Function
0–3, 8, 9       Frame count
16–19, 24–26    Second count
32–35, 40–42    Minute count
48–51, 56, 57   Hour count
64–79           Sync word
10              Drop frame
11              Color frame
27              Field mark

The remaining eight groups of 4 bits are called user bits, which can be used to store data such as the date and reel number. Three bits are unused.

A typical time code number might be 18:23:45:28. This code number indicates a position 18 hours, 23 minutes, 45 seconds, and 28 frames into the event. This could be on the fortieth reel of tape. Time code does not need to start over on each reel.
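To make the addressing concrete, here is a minimal Python sketch that converts such an address into an absolute (non-drop-frame) frame count; the function name and the 30 frames-per-second assumption are illustrative:

def timecode_to_frames(hh, mm, ss, ff, fps=30):
    # total frames elapsed for an hours:minutes:seconds:frames address
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

print(timecode_to_frames(18, 23, 45, 28))   # 1,986,778 frames into the event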


Bit 10, the drop-frame bit, tells the time code reader whether the code was recorded in drop-frame or non-drop-frame format. Black-and-white television has a carrier frequency of 3.6MHz, whereas color uses 3.58MHz. This translates to 30 frames per second as opposed to 29.97 frames per second. To compensate for this difference, a defined number of frames are dropped from the time code every hour. The offset is 108 frames (3.6 seconds). Two frames are dropped at the start of every minute except every tenth minute (minutes 00, 10, 20, 30, 40, and 50). Frame dropping occurs at the changeover point from minute to minute.
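The 108-frame figure follows directly from the rule above; a small illustrative Python check:

# two frames are skipped at the start of every minute except minutes 00, 10, 20, 30, 40, and 50
minutes_with_drops = [m for m in range(60) if m % 10 != 0]
dropped_per_hour = 2 * len(minutes_with_drops)
print(dropped_per_hour)          # 108 frames
print(dropped_per_hour / 30)     # 3.6 seconds of addresses dropped per hour, matching the text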

Bit 11 is the color frame bit, which tells the system whether the color frame identification has been applied intentionally. Color frames are often locked as AB pairs to prevent color shift in the picture. As mentioned earlier, user bits can accommodate data for reel numbers, recording date, or any other information that can be encoded into eight groups of 4 bits.

VITC

Vertical interval time code is similar in format to LTC. It has a few more bits, and each of the 9 data-carrying bits is preceded by 2 sync bits. At the end of each frame there are eight cyclic redundancy check (CRC) codes, which are similar to the codes used in digital recording systems. This generates a total of 90 bits per frame. The main difference between LTC and VITC is how they are recorded on tape, LTC being recorded on one of the videotape’s longitudinal audio tracks or on a spare track of the audiotape recorder. Some specialized two-track recorders have a dedicated time code track between the two standard audio tracks. Playback and record levels should be between -10dB and +3dB (-3dB is recommended). This allows 12dB of headroom on high-output audiotape operating at a reference level of 370 nanowebers per meter (nWb/m), where +4dBm equals 370nWb/m of magnetic fluxivity. Time code appears similar to a 400Hz square wave with many odd harmonics. Time code is difficult to read at low speeds and during fast wind or rewind.

Vertical interval time code was developed for use with 1-inch-tape-width helical scan SMPTE type-C video recorders, which were capable of slow-motion and freeze-frame techniques. During these functions, LTC is impossible to read. However, VITC is readable (as long as the video is visible on the screen) because the indexing information for each field/frame is recorded in the video signal during the vertical blanking interval. When viewed on a monitor that permits viewing of the full video signal, VITC can be seen as a series of small white pulsating squares at the top of the video field. Normally, VITC is recorded on two nonadjacent vertical blanking interval lines in both fields of each frame. Recording the code four times in each frame provides redundancy that lowers the probability of reading errors due to dropouts.

Because VITC is recorded as an integrated part of the analog videotape recorder’s video track, it can be read by the rotating video heads of a helical scan recorder at all times, even in freeze-frame or fast-wind modes.


Video technology is explained more fully in Chapter 6.

In today’s digital video acquisition and editing environment, time code is still a valuable reference. Frequently, audio and video are now being handled digitally on the same computer system, so the need for mechanical synchronization of separate tape machines is lost; however, the importance of internal synchronization is still present. The need to identify each field of video is addressed by embedding the time code reference during acquisition into the data for each frame, whether it is recorded onto tape or on a data storage device. The time code information is part of the data stream that gets transferred concurrently with the audio and video, intertwining all three elements.

Word Clock

SMPTE time code can only help synchronize devices up to its finite resolution—1/30th of a second—though on some equipment it can be used up to 1/100 of a frame. When connecting digital devices that divide time into slices of 1/96,000th of a second (or smaller), a higher resolution reference is needed to ensure that all the data are being sent and received, and that they are correctly interpreted in the destination device. Word clock differs from time code in that it doesn’t “stamp” each sample point with another tag of data, but rather it is a constant signal that sets the reference sampling rate of the source device in order to avoid data errors and maximize performance in the digital domain. The word clock signal is integrated in the S/PDIF and AES/EBU digital audio signals (discussed in Chapter 10), which allows the destination device to correctly process the digital signal. Some facilities also use a separate, or master, word clock device to dictate the word clock to all studio devices from a single source. If no master word clock is present, then the source device would provide the reference clock to its destination.

This ends our discussion of the basic characteristics of sound. An understanding of these concepts will help you with the information yet to come. We have touched only briefly on some very important areas, so you should consult the excellent general texts on sound and recording cited in the list of suggested readings at the end of this book.
