
EE -5359 Multimedia Processing Project Proposal

Multiplexing of Dirac Video with AAC Audio bit-stream

Under the guidance of Dr. K. R. Rao

Submitted by: ASHWINI S URS

M.S.E.E., ID # 1000646070

September 24, 2009


Multiplexing of Dirac Video with AAC Audio bit-stream

Proposal: To develop an algorithm to multiplex Dirac video [1] with AAC audio bit-stream [2]

using MPEG-2 systems [3].

Overview of the Dirac video codec:

Dirac is a hybrid video codec developed by the British Broadcasting Corporation (BBC). Its key feature is that it is an open technology, meaning it can be used without payment of licensing fees. Dirac is called a hybrid codec because it combines motion compensation, which removes temporal redundancy in the data, with a transform, which removes spatial redundancy. Dirac uses modern techniques such as the wavelet transform and arithmetic coding for entropy coding. The image motion is tracked, and the motion information is used to predict a later frame; the wavelet transform is then applied to the prediction error, and the transform coefficients are quantized and entropy coded. Owing to this flexibility, the applications of Dirac range from high definition television (HDTV) to web streaming. The block diagrams of the encoder and the decoder are shown in figures 1 and 2 respectively; the decoder performs the inverse operations [1].

Fig.1 Dirac encoder block diagram [1]


Fig.2 Dirac decoder block diagram [14]

Overview of the AAC audio codec:

Advanced audio coding (AAC) [2, 3] is a combination of state-of-the-art technologies for high-quality multichannel audio coding from AT&T Corp., Dolby Laboratories, the Fraunhofer Institute for Integrated Circuits (Fraunhofer IIS), and Sony Corporation. AAC supports a wide range of sampling rates (8–96 kHz), bit rates (16–576 kbps), and from one to 48 audio channels [4]. The improved compression of AAC provides higher quality audio at the same bit rate as previous standards, or the same quality audio at lower bit rates [10].

AAC defines three profiles: the main profile, the low-complexity profile, and the scalable sampling rate (SSR) profile. The low-complexity profile removes the prediction tool and reduces the complexity of the temporal noise shaping tool, which makes it favorable when memory and power constraints must be met. The block diagrams of the AAC encoder and decoder are shown in figures 3 and 4 respectively.


Fig.3 AAC encoder block diagram [7]

AAC has a very flexible bit stream syntax. Since no single transport format is ideally suited to all applications, AAC accommodates two basic bit stream formats: audio data interchange format (ADIF) and audio data transport stream (ADTS) [10].

In ADTS, each frame carries its own header followed by the raw data block. An ADTS header precedes each AAC raw data block (or a block of 2 to 4 raw data blocks in a frame), which gives better error robustness in streaming environments. Hence, the ADTS bit stream format is adopted [2].
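The per-frame ADTS header makes it easy to locate frame boundaries in the bit stream, which the multiplexer needs when splitting the audio into access units. As an illustration, a minimal parser for the fixed part of the 7-byte ADTS header (as defined in ISO/IEC 13818-7; 9 bytes when the CRC is present) might look like this:

```python
# Sampling-frequency index table from the AAC standard.
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                24000, 22050, 16000, 12000, 11025, 8000]

def parse_adts_header(data: bytes) -> dict:
    """Parse the fixed part of an ADTS header (first 7 bytes)."""
    if len(data) < 7:
        raise ValueError("need at least 7 bytes")
    # Syncword: 12 bits, all ones (0xFFF).
    if data[0] != 0xFF or (data[1] & 0xF0) != 0xF0:
        raise ValueError("ADTS syncword not found")
    profile = (data[2] >> 6) & 0x03       # 0 = main, 1 = LC, 2 = SSR
    sf_index = (data[2] >> 2) & 0x0F      # 4-bit sampling-frequency index
    channel_cfg = ((data[2] & 0x01) << 2) | ((data[3] >> 6) & 0x03)
    # 13-bit frame length, spread over bytes 3-5; includes the header itself.
    frame_len = ((data[3] & 0x03) << 11) | (data[4] << 3) | ((data[5] >> 5) & 0x07)
    return {
        "profile": profile,
        "sample_rate": SAMPLE_RATES[sf_index],
        "channels": channel_cfg,
        "frame_length": frame_len,
    }
```

The frame length field lets a demultiplexer step from one frame header directly to the next without decoding the audio payload.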


Fig.4 AAC decoder block diagram [2]


Multiplexing:

A multimedia program consists of a combination of a few basic elementary streams (ES): the video stream, one or more audio streams, and optional data streams (e.g. subtitles). To transmit them together, the elementary streams must be multiplexed into a single transmission stream carrying all the data. Transmitting high quality video and audio requires a large amount of bandwidth; to achieve this efficiently in terms of compression, flexible video and audio codecs such as Dirac and AAC are used [5].

Fig.5 Digital television transmission scheme [10]

Factors to be considered for multiplexing and transmission:

To transmit a multimedia program, the elementary streams must be combined into a unified bit stream. First, data from every elementary stream should be given equal priority, to prevent overflow or underflow of the elementary stream buffers at the receiver. To achieve this, the long elementary streams are broken into small data packets and then multiplexed into a single stream of data; packetization also makes the transmission reliable. In addition, the multiplexed stream must carry, besides the encoded data, the information needed to play the elementary streams in sequence and in synchronization at the receiver's end. Hence, transmitting timing information along with the encoded streams, in the form of timestamps, plays a prominent role [7].

Packetization:

Packetization is the first step in multiplexing. It formats the long stream of data into blocks called packets; a long stretch of data sent in shorter intervals can be transmitted reliably and efficiently by the network layer. A packet consists of a header, which carries information about the data, followed by the payload, which is the actual data [6].

Here, the data to be packetized are the audio and video streams; when more than one program is transmitted, there are several video and audio streams. The packetization method should let the user easily realign the packets at the de-multiplexer side into their corresponding streams. To ensure this and to meet the transmission channel requirements, two layers of packetization are carried out. The first layer yields the packetized elementary stream (PES) and the second yields the transport stream (TS); the second layer is what is used for transmission [6]. Figure 6 shows how the two layers of packetization are carried out. Multiplexing takes place after the second layer of packetization, just before transmission.

Fig. 6 Two layers of packetization [10]

Packetized elementary stream:

PES packets are obtained by encapsulating the coded video, coded audio, and data elementary streams; this forms the first layer of packetization. The video and audio elementary streams are sequentially separated into access units (audio and video frames respectively). Each PES packet contains data from only one elementary stream, and PES packets may have variable length since the frame size in both audio and video bit streams is variable. A PES packet consists of the PES packet header followed by the PES packet payload; the header distinguishes the different elementary streams, carries synchronization information in the form of timestamps, and carries other useful information. Encapsulation of an elementary stream into PES packets is shown in figure 7 [4].
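The first layer of packetization can be sketched as follows. The 3-byte start-code prefix, stream id, and 16-bit length field follow MPEG-2 systems; carrying the frame number as a plain 4-byte timestamp field is the simplification proposed in this project, not the standard's 33-bit PTS syntax, and the exact field layout here is an assumption for illustration:

```python
import struct

PES_START_CODE = b"\x00\x00\x01"
STREAM_ID_VIDEO = 0xE0   # first MPEG-2 video stream id
STREAM_ID_AUDIO = 0xC0   # first MPEG-2 audio stream id

def make_pes_packet(stream_id: int, frame_number: int, payload: bytes) -> bytes:
    """Encapsulate one access unit (frame) into a simplified PES packet."""
    # Frame number carried as a 4-byte timestamp (project simplification).
    header_ext = struct.pack(">I", frame_number)
    # PES_packet_length counts every byte after the 6-byte
    # start-code / stream-id / length fields.
    pes_length = len(header_ext) + len(payload)
    header = PES_START_CODE + bytes([stream_id]) + struct.pack(">H", pes_length)
    return header + header_ext + payload
```

For example, `make_pes_packet(STREAM_ID_VIDEO, 42, frame_bytes)` wraps one encoded video frame, and the demultiplexer can recover both the stream identity and the frame number from the header.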



Fig.7 PES encapsulation from elementary stream [10]

Transport stream:

The second layer of packetization forms a series of packets called the transport stream (TS). These are fixed-length subdivisions of the PES packets with additional header information, multiplexed together to form a transport stream carrying more than one elementary stream. A TS packet is 188 bytes long and always begins with the synchronization byte 0x47 [9].

Fig.8 Structure of transport packet [7].

Some constraints met while forming the transport packets are listed below:
- The total packet size is fixed (188 bytes).
- Each packet can carry data from only one PES.
- A PES header must be the first byte of a transport packet payload.
- A PES packet is split, or stuffing bytes are added, if the above constraints cannot otherwise be met [10].

The encapsulation of PES packets to form TS packets is shown in figure 9.
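The second layer can be sketched as below. The 4-byte header layout (sync byte 0x47, payload_unit_start_indicator, 13-bit PID, continuity counter) follows MPEG-2 systems; padding the last packet with 0xFF stuffing bytes is a simplification of the standard's adaptation-field stuffing, used here only to illustrate the fixed 188-byte constraint:

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def packetize_ts(pes: bytes, pid: int) -> list:
    """Split one PES packet into fixed 188-byte transport packets."""
    packets = []
    payload_size = TS_PACKET_SIZE - 4          # 4-byte TS header per packet
    for i in range(0, len(pes), payload_size):
        chunk = pes[i:i + payload_size]
        # payload_unit_start_indicator set only on the packet
        # that carries the start of the PES header.
        pusi = 0x40 if i == 0 else 0x00
        cc = len(packets) & 0x0F               # 4-bit continuity counter
        header = bytes([SYNC_BYTE,
                        pusi | ((pid >> 8) & 0x1F),
                        pid & 0xFF,
                        0x10 | cc])            # adaptation_field_control: payload only
        # Pad the final short chunk with stuffing bytes (simplified).
        chunk = chunk + b"\xff" * (payload_size - len(chunk))
        packets.append(header + chunk)
    return packets
```

Each elementary stream is assigned its own PID, so the demultiplexer can sort the interleaved 188-byte packets back into their PES streams by inspecting the header alone.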



Fig. 9 TS packet formation from PES packet [10]

Frame number as timestamp:

The proposed method uses frame numbers as timestamps. This section explains how frame numbers can be used to synchronize the audio and video streams. A video bit stream has a constant playback frame rate, specified in frames per second (fps), so given the frame number one can calculate the time at which that frame occurs during playback:

Time of playback = frame number / fps

The AAC compression standard defines each audio frame to contain 1024 samples. The audio data in an AAC bit stream can have any of the discrete sampling frequencies between 8 and 96 kHz [8]. The frame duration grows as the sampling frequency decreases from 96 kHz to 8 kHz; however, the sampling frequency, and hence the frame duration, remains constant throughout a particular audio stream. So, the time of occurrence of a frame during playback is:

Time of playback = (1024 × frame number) / sampling frequency

By encoding the frame numbers as timestamps and using the above equations, the playback time can be calculated and synchronization achieved after demultiplexing [10].
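The two playback-time equations can be applied directly. A small sketch, assuming for illustration a 25 fps video and 48 kHz AAC audio (neither rate is fixed by the proposal):

```python
def video_playback_time(frame_number: int, fps: float) -> float:
    """Playback time of a video frame: frame number / fps."""
    return frame_number / fps

def audio_playback_time(frame_number: int, sample_rate: int) -> float:
    """Playback time of an AAC frame: 1024 samples per frame."""
    return 1024 * frame_number / sample_rate

# With 25 fps video and 48 kHz audio, video frame 50 plays at 2.0 s
# and audio frame 94 plays at about 2.005 s, so the demultiplexer
# should present these two frames together.
```

A player that looks up the nearest audio frame for each video frame in this way keeps the streams in lip sync without any transmitted clock reference.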

The advantages of using frame numbers are:
- Low complexity; more suitable for software implementation.
- Saves the extra PES header bytes used for periodically sending the program clock reference (PCR) information.
- Synchronization is not affected by clock jitter or inaccurate clock samples.
- No propagation of delay between audio and video due to drift between the master clocks at the transmitter and receiver [10].


Acronyms:

AAC – Advanced audio coding
ADIF – Audio data interchange format
ADTS – Audio data transport stream
AES – Audio Engineering Society
AFC – Adaptation field control
AVC – Advanced video coding
BBC – British Broadcasting Corporation
HDTV – High definition television
ISDB-T – Integrated services digital broadcasting – terrestrial
MPEG – Moving Picture Experts Group
PES – Packetized elementary stream
PID – Packet identifier
SSR – Scalable sampling rate
TNS – Temporal noise shaping
TS – Transport stream

References:

[1] T. Borer, and T. Davies, “Dirac video compression using open technology”, BBC EBU Technical Review, July 2005.

[2] MPEG–2 Advanced audio coding, AAC. International Standard IS 13818–7, ISO/IEC JTC1/SC29 WG11, 1997.

[3] MPEG. Information technology - Generic coding of moving pictures and associated audio information, part 4: Conformance testing. International Standard IS 13818–4, ISO/IEC JTC1/SC29 WG11, 1998.

[4] M. Bosi and M. Goldberg “Introduction to digital audio coding and standards”, Boston: Kluwer academic publishers, c2003.

[5] A. Puri, X. Chen and A. Luthra, “Video coding using the H.264/MPEG-4 AVC compression standard”, Signal processing: image communication, vol. 19, issue 9, pp. 793-849, Oct. 2004.

[6] K. Brandenburg, “MP3 and AAC Explained”, AES 17th International conference, Florence, Italy, Sep. 1999.

[7] P.A. Sarginson, “MPEG-2: Overview of systems layer”, BBC RD 1996/2.

[8] H. Kalva, et al., “Implementing multiplexing, streaming and server interaction for MPEG-4”, IEEE Transactions on circuits and systems for video technology, vol. 9, No. 8, pp. 1299-1311, Dec. 1999.


[9] “Special issue on global digital television: technology and emerging services”, Proceedings of the IEEE, vol. 94, pp. 5-332, Jan. 2006.

[10] H. Murugan, “Multiplexing H.264 video bit-stream with AAC audio bit-stream, demultiplexing and achieving lip sync during playback”, M.S.E.E. Thesis, University of Texas at Arlington, Arlington, TX, May 2007.

[11] H. Murugan and K.R. Rao, “Multiplexing H.264 video with AAC audio bit streams, de-multiplexing and achieving lip sync”, ICEAST 2007, Bangkok, Thailand, 21-23 Nov. 2007.

[12] M. Uehara, “Application of MPEG-2 systems to terrestrial ISDB (ISDB-T)”, Proceedings of the IEEE, vol.94, pp. 261-268, Jan. 2006.

[13] A. Ravi, “Performance analysis and comparison of Dirac video codec with H.264/MPEG-4 Part 10 AVC”, M.S.E.E. Thesis, University of Texas at Arlington, Arlington, TX, Aug. 2009.

[14] A. Ravi and K.R. Rao, “Performance analysis and comparison of the Dirac video codec with H.264/ MPEG-4 Part 10 AVC", Submitted to Journal of VCIR, Sept. 2009.

[15] J.M. Boyce, “The United States television broadcasting transition”, IEEE signal processing magazine, vol.26, pp. 110-112, May 2009.

[16] “ATSC video and audio coding”, Proceedings of the IEEE, vol. 94, pp. 60 - 76, Jan. 2006.

[17] J. Jain and P. Desale, “Low power transport demultiplexer for ATSC and DVB broadcast format”, LSI Research and Development Pune Pvt. Ltd.

[18] Digital audio compression standard (AC-3, E-AC-3), revision B, ATSC Document A/52B, Advanced Television Systems Committee, Washington, D.C., Jun. 14, 2005.

[19] Digital Video Systems Characteristics Standard for Cable Television, ANSI/SCTE 432004.

[20] B. J. Lechner, et al., “The ATSC transport layer, including Program and System Information (PSIP)”, Proceedings of the IEEE, vol. 94, pp. 77–101, Jan. 2006.

[21] G. A. Davidson, “Digital audio coding: Dolby AC-3,” in the Digital Signal Processing Handbook, V. K. Madisetti and D. B. Williams, Eds. Boca Raton, FL: CRC, 1998, pp. 41-1–41-21.

[22] C. C. Todd, et al., “AC-3: perceptual coding for audio transmission and storage”, presented at the 96th Conv. Audio Engineering Soc., 1994, Preprint 3796.