
CONTINUOUS MEDIA SUPPORT FOR MULTIMEDIA

DATABASES

A thesis submitted to the

Department of Computing and Information Science

in conformity with the requirements for

the degree of Master of Science

Queen's University

Kingston, Ontario, Canada

September 1998

Copyright © Jun Su, 1998

National Library of Canada / Bibliothèque nationale du Canada

Acquisitions and Bibliographic Services / Acquisitions et services bibliographiques

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


Abstract

Multimedia presentations demand specific support from database management systems. The delivery of continuous media data from a database server to multiple destinations over a network presents new challenges for buffer management in a DBMS. The buffer manager has to meet specific requirements, such as continuity of presentation and immediate continuation of a presentation after frequent user interactions. Different media also have specific features that must be considered.

In this thesis we present a buffer management strategy for MPEG video presentations. It supports smooth presentation of MPEG video stored in the relational DBMS DB2/UDB, and quick response to user interactions. Experiments show that our buffer management strategy provides support superior to other strategies presented in the literature. A framework to support complex multimedia presentations, based on DB2/UDB and its multimedia extenders, is also presented.

Acknowledgments

I would like to thank my supervisor, Dr. Pat Martin, for his support, advice, feedback, and above all, his patience. Without his guidance, this thesis could not have been finished. I would also like to thank Gary Powley and Wendy Powley, for helping me with my research and implementation; Rong Qiu and Hoiying Li, my good friends, for giving me help whenever I needed it. Finally, I would like to thank the Department of Computing and Information Science at Queen's University for their generous financial support provided during my graduate studies.


Contents

1 Introduction
    1.1 Motivation for the Research
    1.2 Goals of Research
    1.3 Outline of Thesis

2 Background
    2.1 Multimedia DBMS and Buffer Management Strategy
    2.2 AMOS Multimedia DBMS at GMD-IPSI
    2.3 Least/Most Relevant for Presentation
        2.3.1 A General Model
        2.3.2 Replacement and Preloading Strategy
    2.4 MPEG Video and Audio Player at OGI
        2.4.1 MPEG video standard
        2.4.2 System Architecture
        2.4.3 Software Feedback for Client/Server Synchronization
        2.4.4 Software Feedback for QoS control

3 System Architecture
    3.1 System Architecture
        3.1.1 Session Manager and Client
        3.1.2 Session Coordinator
        3.1.3 DB2/UDB and Its Multimedia Extenders
    3.2 Media Provider and Media Presenter
        3.2.1 Media Provider and Media Presenter for MPEG video

4 Buffer Management Strategy
    4.1 MPEG Video Presentation
        4.1.1 Initialization
        4.1.2 Buffer Manager
        4.1.3 Decoder and Presenter
    4.2 Buffer Management Strategy
        4.2.1 Preloading Strategy
        4.2.2 Replacement Strategy

5 Performance Study
    5.1 Measurements
    5.2 Comparing Strategies
    5.3 Test Environment
    5.4 Results and Observations
        5.4.1 Smoothness
        5.4.2 Interactive Response Time
        5.4.3 Smoothness vs. Buffer Size

6 Conclusions
    6.1 Contributions
    6.2 Future Work

Bibliography

Glossary

Vita

List of Tables

2.1 Functionality of the synchronization feedback mechanism
2.2 Functionality of the QoS control feedback mechanism
5.1 MPEG Video Frames Model
5.2 Smoothness of AMOS and Our Strategy
5.3 Smoothness of OGI and Our Strategy
5.4 Interaction Response Time (ms)

List of Figures

2.1 General architecture of a MM-DBMS
2.2 Architecture of the AMOS MM-DBMS
2.3 Example state of a data flow
2.4 An example of interaction sets with relevance values
2.5 L/MRP Algorithm
2.6 Architecture of the OGI player
2.7 Structure of the synchronization feedback mechanism
2.8 Structure of the QoS control feedback mechanism
3.1 System Architecture
3.2 Session Manager and Client
3.3 Object-Relational Database System
3.4 Classification of Adaptation Mechanism
3.5 Media Provider and Media Presenter for MPEG video
4.1 MPEG Video Presentation Process
4.2 Data Flow at Client Side
4.3 An example of MPEG video stream
4.4 Buffer Management Algorithm
5.1 Smoothness Measurement
5.2 Smoothness vs. Buffer Size

Chapter 1

Introduction

1.1 Motivation for the Research

Multimedia presentations combine a wide range of media, including audio, video, text, images, and animation, in a single presentation, and allow users to control the rate and selection of the media being played. They are among the most important multimedia applications. The fusion of different media into multimedia presentations provides an opportunity to communicate ideas more effectively and efficiently.

A multimedia database management system (MM-DBMS) provides the necessary support for multimedia presentation. A MM-DBMS has the capability of storing, managing and retrieving information on individual media, managing interrelationships between the information represented by different media, and exploiting these media for presentation purposes [RNL95].

Multimedia presentations demand specific support from a MM-DBMS. They require the delivery of continuous media data from a database server to multiple destinations over a network. To facilitate a hiccup-free presentation, the MM-DBMS must ensure that an object is present in memory before it is displayed. If the loading rate of a media stream from disk to memory is less than the delivery rate of the media stream, then preloading of the stream prior to delivery is necessary to ensure continuous presentation. Furthermore, an appropriate allocation and replacement strategy must be provided to anticipate the demands of delays and user interactions. Such a strategy must minimize the response time of multimedia presentations while guaranteeing that all continuity requirements are satisfied.

Replacement strategies for conventional database applications, such as LRU (Least Recently Used), FIFO (First In First Out), and LFU (Least Frequently Used) ([DT90], [EH84]), are not suitable for a multimedia database system. They do not explicitly address the reference behaviour of interactive continuous data. Furthermore, presentation scenarios can be constructed in which these strategies behave destructively. We show this by an example. Suppose our buffer uses the LRU strategy. If we begin with an empty buffer with a constant size of 15 frames, then playing frames 1 to 20 of a video leads to a buffer state in which frames 6 to 20 are in the buffer after the presentation has finished. If we next wish to play forward from frame 5 to frame 15, then frame 5 is not present in the buffer, which causes a buffer fault. LRU replaces frame 6 to load frame 5. Now we request frame 6, and LRU replaces frame 7, and so on. In this case LRU always replaces the frame that will be needed next. The reason for this behaviour is that LRU does not consider any presentation-specific information. Similar examples of "destructive behaviour" can be found for all the other general strategies. This behaviour is not limited to "constructed" examples: Moser et al. [MKK95] show the average misbehaviour of LRU in their performance investigations.
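The LRU scenario above can be replayed with a small simulation. The following is only an illustrative sketch: the function name `lru_faults` and the use of an `OrderedDict` as the buffer are assumptions made for this example and are not part of the thesis implementation.

```python
from collections import OrderedDict

def lru_faults(accesses, capacity, buffer=None):
    """Play the frames in `accesses` through an LRU buffer.

    Returns the number of buffer faults and the resulting buffer.
    (Hypothetical helper, written only for this illustration.)
    """
    buffer = OrderedDict() if buffer is None else buffer
    faults = 0
    for frame in accesses:
        if frame in buffer:
            buffer.move_to_end(frame)        # mark as most recently used
        else:
            faults += 1                      # buffer fault: frame must be loaded
            if len(buffer) >= capacity:
                buffer.popitem(last=False)   # evict the least recently used frame
            buffer[frame] = True
    return faults, buffer

# First presentation: frames 1..20 with a buffer of 15 frames.
_, buf = lru_faults(range(1, 21), capacity=15)
state_after_first_pass = sorted(buf)         # frames left in the buffer

# Second presentation: play forward from frame 5 to frame 15.
second_pass_faults, _ = lru_faults(range(5, 16), capacity=15, buffer=buf)
print(state_after_first_pass, second_pass_faults)
```

In this sketch the buffer holds frames 6 to 20 after the first pass, and every one of the 11 requests of the second pass faults, because each load evicts exactly the frame that is requested next.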

Currently many multimedia systems employ the "Use&Toss" replacement strategy [CAF91]. Each data page is free for replacement immediately after it is presented. The drawback of this simple strategy is that data that may be referenced after an interaction are not kept in the buffer. For example, if the user initiates a play backward interaction from the play forward state, then all previously tossed data may have to be reloaded into the buffer, which leads to increased response times.

Most buffer management strategies developed for multimedia database systems only support generic models for continuous media, but we claim that media-specific properties should be considered. A generic buffer manager treats video frames as independent, atomic units, and its preloading and replacement strategies are based on this assumption. In MPEG (Motion Picture Experts Group) video presentation, frame dropping is used because the data volume involved is very large, and the system may not be able to present all the frames in time. Dropping one frame may affect the following frames. Thus, for MPEG video, the dependencies between the individual frames have to be considered.

MPEG is a widely accepted international standard. MPEG-1 in particular plays an important role in multimedia applications. Some researchers [CPS95] have studied the specific features of MPEG-1, but did not consider the impact of these features on multimedia presentation. Other researchers [HL97] have considered the effect of the features of MPEG-1 on multimedia presentation, but could not model MPEG-1 data very well. We investigate a specific buffer management strategy for MPEG-1 video presentation in this thesis.

A relational database uses tables to represent entities and uses keys to represent

relationships. An instance of an entity is represented as a row of the table. The

standard applications of a relational database range from high volume online transaction systems to query intensive data warehouse applications. Multimedia data like

audio and video are stored as binary large objects (BLOBs) in a relational database.

User-defined data types and functions are used to support multimedia applications.
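As a rough illustration of storing media data as a BLOB in a relational table, the sketch below uses Python's built-in `sqlite3` module as a stand-in. This is an assumption made purely for the example: the thesis uses DB2/UDB with its multimedia extenders, and the table name, column names and payload bytes here are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the relational DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE video ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " data BLOB NOT NULL)"          # the media object is one column value of a row
)

# Placeholder bytes; a real application would read an MPEG file here.
payload = bytes(range(16))
conn.execute(
    "INSERT INTO video (id, title, data) VALUES (?, ?, ?)",
    (1, "demo clip", payload),
)

# Retrieve the row again; the BLOB comes back as a bytes object.
title, size, data = conn.execute(
    "SELECT title, length(data), data FROM video WHERE id = ?", (1,)
).fetchone()
```

Each video is one row (an instance of the entity), and the key `id` is what other tables would reference to express relationships.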

1.2 Goals of Research

The main goals of the research are:

• To study efficient buffer management strategies for delivering MPEG video data from a relational database. The strategies must also support user interaction.

• To implement a buffer management strategy and study its performance in a client-server environment where IBM DB2/UDB [IBM97b] is used to store MPEG video at the server side.

• To propose a framework to support complex multimedia presentations.

The particular problems and issues addressed in this thesis are:

• How to preload video data to maintain a smooth presentation.

• How to replace video data in order to quickly respond to user interactions.

1.3 Outline of Thesis

The remainder of this thesis is organized as follows:

• Chapter 2 discusses the background material required for our work.

• Chapter 3 describes the architecture of a system to support multimedia presentation. The subsystem for MPEG video presentation is also presented.

• Chapter 4 presents the buffer management strategy for MPEG video presentation.

• Chapter 5 discusses the implementation of the buffer management strategy and presents a performance study of the strategy.

• Chapter 6 concludes the thesis with a review of the contributions of the work and a discussion of future research directions.

Chapter 2

Background

In this chapter, we first introduce the multimedia database management system (MM-DBMS) and its buffer management strategy. Then we give an overview of the AMOS multimedia DBMS at GMD-IPSI. We present its "least/most relevant for presentation" (L/MRP) buffer management strategy, from which we derived our strategy for MPEG video presentation. Finally, we present an MPEG video player developed at OGI that provides the basis for our implementation.


2.1 Multimedia DBMS and Buffer Management Strategy

A multimedia database management system (MM-DBMS) has the capability of stor-

ing, managing and retrieving information on individual media, managing interre-

lationships between the information represented by different media, and exploiting

these media for presentation purposes. The basic constituents of a MM-DBMS are

the following:

Multimedia Data Modeling. Standard datatypes are not adequate to reflect the

structure of multimedia data. New built-in datatypes like image and audio and

a notion of stream for presentation and capture purpose are needed. In addi-

tion to the datatypes, type constructors that allow one to deal with temporal

relationships are also helpful.

Content-Based Retrieval. Retrieval in multimedia databases must include the type of queries known from the field of traditional databases as well as retrieval functionality (such as full text search) known from the field of information retrieval. In the case of videos, content-based search means the ability to search for a specific fragment of a video that starts with a given scene, or includes given objects.

Figure 2.1: General architecture of a MM-DBMS

Continuous Storage Management. To provide timely delivery, continuous data streams may be directed from the storage components to the consuming component (viewer, application), bypassing other layers of the multimedia database system. This avoids additional overhead but does not allow any further processing (selection of a portion, scaling, etc.) of the data by the database system.

Architecture. Figure 2.1 [NT92] shows a multimedia application that uses the

service of the DBMS to retrieve multimedia objects from the database, to ma-

nipulate them, to transport them over the network and finally to present them

at the user's workstation. A transport protocol that implements the continuous

flow of data along with a mechanism for continuous control is of central im-

portance for an efficient management of presentation and capture functionality

throughout the whole system.

Buffer management within the multimedia database system is essential to ensure the maintenance of the intra- and inter-stream synchronization requirements of multimedia data presentations. To facilitate a hiccup-free presentation, we must ensure that an object is present in memory before it is displayed. If the loading rate of a media stream from disk to memory is less than the delivery rate of the media stream, preloading of the stream prior to delivery is necessary to ensure continuous presentation. Furthermore, an appropriate allocation and replacement strategy must be provided to anticipate the demands of delays and user interactions. Such a strategy must minimize the response time of multimedia presentations while guaranteeing that all continuity and synchronization requirements are satisfied.
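The preloading condition can be made concrete with a back-of-the-envelope calculation. The sketch below is not taken from the thesis; it assumes constant loading and delivery rates and that loading continues while the stream is being delivered. Under these assumptions the shortfall grows linearly and is largest at the end of playback, so the amount to preload equals the rate difference multiplied by the playback time. The function name `min_preload` is hypothetical.

```python
def min_preload(stream_size, load_rate, delivery_rate):
    """Smallest amount of data to preload for hiccup-free delivery.

    Assumes constant rates (same units, e.g. bytes and bytes/second)
    and loading that runs concurrently with delivery.
    """
    if load_rate <= 0 or delivery_rate <= 0:
        raise ValueError("rates must be positive")
    if load_rate >= delivery_rate:
        return 0.0                     # loading keeps up; no preloading needed
    playback_time = stream_size / delivery_rate
    # After t seconds, delivery has consumed delivery_rate * t while loading
    # has added load_rate * t; the gap is largest at t = playback_time.
    return (delivery_rate - load_rate) * playback_time

# Example: a 1000-unit stream delivered at 100 units/s but loaded at 50 units/s
# needs half of the stream in the buffer before delivery starts.
needed = min_preload(1000, 50, 100)
```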

Research involving buffer management in multimedia database systems is still in

its infancy [TK95]. Moser et al. [MKK95] have proposed a buffer strategy termed "least/most relevant for presentation". This buffer strategy investigates the effects of such user interactions as "rewind" and "fast forward" on buffer design. A mechanism is proposed which reduces the delay after user interactions. Chaudhuri et al

[CGS95] have investigated the problem of continuously displaying composite objects

that are dynamically specified at the server level. Techniques based on simple slid-

ing and buffered sliding are proposed which support continuous display by partial

prefetching of overlapping media objects. Such an approach is preferable to the naive

strategy of prefetching the entirety of overlapped media objects. Gollapudi [GZ96]

has investigated the minimum buffering requirements that are necessary to guarantee

the continuity and synchrony of the presentation of multimedia data. A prefetching

technique that satisfies the minimum requirements has also been worked out.


2.2 AMOS Multimedia DBMS at GMD-IPSI

The Integrated Publication and Information Systems Institute (IPSI) is one of the eight institutes of GMD, the German National Research Center for Information Technology. Research on MM-DBMS at IPSI started at the end of the 1980s. In 1992, the department Active Media Object Stores (AMOS) was founded to foster developments in this research area [NT92]. Currently, concepts are implemented in the AMOS MM-DBMS prototype and are elaborated by means of international research projects as well as industrial projects. The accomplishments to date of the AMOS prototype include the following: the design of multimedia data types, the modeling of meta information, support for multimedia presentations, and the development of an object manager for continuous objects.

AMOS allows the free composition of different media into a new multimedia prod-

uct - a multimedia presentation. Any combination of both continuous media, such

as audio, video, and text, as well as non-continuous media, such as a picture, can be

arranged in one multimedia presentation. This calls for a model of a presentation that

includes defined temporal dependencies between the media, defined time intervals in

which media are presented to the user as well as media specific characteristics such

as initial playback volume of an audio.

The essential concepts of the modeling and solutions can be summarized as follows:

Spatial and Temporal Composition: The description of a presentation reflects all possible temporal relationships between the media and a possible spatial position and overlapping of two-dimensional media on the screen.

Time Line: The selected representation mirrors the entire temporal course of

the multimedia presentation. The solution to keeping track of a presentation is

to store information only about changes during a presentation.

Interaction Capabilities: One of the main features of a presentation on the

client's site is the user interaction in the course of a presentation. Interactions

are treated and modeled as normal media.

Coarse and Fine Synchronization: Timing requirements are subdivided into

fine and coarse synchronization. The coarse synchronization ensures that the

time line representation of the presentation is put into action. Coarse synchro-

nization relies on the mere schedule of the presentation stored in the description.

The fine synchronization, however, obeys the maximum permissible deviation

from reference media.

Presentation Parameters: Initial settings such as playback volume for an

audio, the playback speed of a video, and the like, are modeled.

A presentation is composed on a high definition level using a definition tool such as HyTime [ISO92]. The script-like description of the presentation is transferred from the MM-DBMS server to the client, is interpreted there, and the presentation is shown to the user as desired. The interpreting component at the client manages the preparation, startup and termination of the individual media presentations that belong to the integrated multimedia presentation.

The architecture of the AMOS MM-DBMS is shown in Figure 2.2 [RKN96]. The VODAK DBMS [AUG95], which was also developed at GMD-IPSI, is used for the modeling and storage of discrete data. Thus, data types of the VODAK Modeling Language (VML) are exchanged between client and server. The main tasks at the server are data storage, scheduling, and continuous object management (COM), which uses external media servers for storage. This architecture enables easy integration of specific hardware such as real-time servers, tape or magneto-optical jukeboxes, and CD-ROM devices. The Continuous Transport Manager (CTM) enables access to Internet protocols as well as Asynchronous Transfer Mode (ATM).

The "Spatial, Temporal and Interaction Script" Interpreter (STI) interprets the time line-based scripts (as generated from a VML schema) and triggers the Multimedia Playout Manager (MPM). This module is responsible for the handling of the client's environment, the synchronized playback, and the scheduling of concurrent requests. The MPM controls single media presenters (SPMs), which are implemented by utilizing libraries available for the client's platform, for example, a virtual audio device, a motion JPEG player, and an animation tool. An SPM gets its data from the COM. When the application asks for time-dependent data, this request is sent to the VODAK DBMS via the VODAK remote API. Time-dependent data are sent via the COM to the application. The COM initiates asynchronous replacement and loading of distributed continuous data, and adaptation in case of server and network delays.

Figure 2.2: Architecture of the AMOS MM-DBMS

In contrast to traditional object managers, the delivery of data like audio and video is "just in time". Object management enables an application to access data by loading these objects from disk into a main memory buffer and by replacing objects that are no longer needed. Updated objects are written back to disk. In a client/server based environment, objects must additionally be transported to the main memory of the requesting client. Because of their large volume, time-dependent data cannot be entirely loaded into the buffer. These objects are therefore decomposed into small blocks (for example audio sequences or frames), which are loaded and replaced in the buffer. Loading and replacement of objects is done by a strategy called Least/Most Relevant for Presentation (L/MRP) [MKK95], which is explained in more detail in the next section.

2.3 Least/Most Relevant for Presentation

Least/Most Relevant for Presentation (L/MRP) is a buffer management strategy that

considers specific requirements such as continuity of presentations and immediate

continuation of presentations after frequent user interactions. It is especially suitable for supporting highly interactive multimedia presentations.

To explain the general idea of L/MRP, we use the presentation snapshot illustrated in Figure 2.3. Since the continuous objects are too large to be stored in the client buffer as a whole, they are segmented into a sequence of manipulation units, called Continuous Object Presentation Units (COPUs). Single COPUs are requested continuously by the buffer manager. All COPUs of a continuous object are indexed from 0 to n-1, where n denotes the total number of COPUs. We denote the direction and skip parameter of a presentation by a single signed skip value. A positive value indicates the forward direction and a negative value indicates the backward direction. The absolute value of the skip value denotes the step, in COPUs, between consecutively presented COPUs. In Figure 2.3, the skip value is +2, which means that the presentation moves forward and every second COPU is skipped. We can also identify three different COPU types in Figure 2.3: COPUs located in the reverse direction (History), COPUs to be referenced in the future (Referenced), and COPUs to be skipped because the absolute value of the skip value is greater than one (Skip). L/MRP introduces the notion of a relevance function that assigns a value to every element of a set, denoting its significance for replacement and preloading. L/MRP uses the relevance values in such a way that the least relevant COPUs are replaced and the most relevant COPUs are preloaded. The relevance value of a COPU also depends on specific presentation parameters, such as the index of the currently presented COPU.

Figure 2.3: Example state of a data flow (legend: current presentation point p, presentation direction, COPU in reverse, COPU to be referenced, COPU to be skipped)

Figure 2.4 shows the sets of History, Referenced and Skip COPUs for the given presentation point (508) and a skip value of +2. Each COPU, identified by the index on the x-axis, is associated with a relevance value reflecting the importance of the COPU with respect to the specific interaction set. Let us assume that, due to some previous ongoing presentations, the COPUs with underlined numbers are in the buffer. The least relevant COPUs, that is, the COPUs with the lowest relevance values, are at this moment 527, 525, 523, 528, 521, 500, 526, etc. Thus, the next replacement candidates are 527, 523, 500, etc. The most relevant COPUs for presentation are 508, 510, 512, 514, etc. Thus, the next preloaded COPUs will be 510, 514, etc.

Figure 2.4: An example of interaction sets with relevance values (y-axis: relevance; x-axis: COPU index; series: History, Referenced, Skip)

2.3.1 A General Model

Let CO (continuous object) denote the sequence of all COPUs that constitute a continuous object. An element $c_i$, $i = 0, \ldots, |CO| - 1$, denotes the COPU with index $i$ within the continuous object CO. The state of a presentation is characterized by a tuple $s = \langle p, skip \rangle$, with $p \in \{0, \ldots, |CO| - 1\}$ denoting the index of the COPU at the current presentation point and $skip \in \mathbb{Z}$, $skip \neq 0$, denoting the skip value. COPUs are related to one or more interaction sets $A_s$. Each set $A_s$ has an associated criterion that is used to decide whether or not a COPU belongs to the set at a specific point of a presentation, and to specify the relevance value of the COPU with respect to $s$. Hence, an interaction set $A_s$ is defined as a binary relation relating a COPU to a relevance value. For example, the interaction set $Referenced_s$ with $s = \langle 508, +2 \rangle$, as visualized in Figure 2.4, relates the COPUs $c_{508}, c_{510}, c_{512}, \ldots$ to their relevance values.

To denote the relevance of a COPU within a given interaction set $A_s$, a distance relevance function $d_A$ is defined. The domain of a distance relevance function is the relative distance of individual COPUs to the current presentation point. The function $d_A$ maps distances to values in $[0, 1]$:

$$d_A : \mathbb{N}_0 \to [0, 1]$$

$d_A(i)$ is the relevance value of a COPU with distance $i$ to any possible presentation point $p$. The distance relevance values describe the degree of importance of keeping COPUs in the buffer. For example, the distance function $d_{Referenced}$ for all future referenced COPUs describes the degree of importance of keeping specific COPUs in the buffer because of their high probability of being accessed in the current presentation. A distance relevance function value of 1 means that the COPU is most relevant for presentation and, hence, is not to be considered as a candidate for replacement, but has to be preloaded by L/MRP, if necessary.

The formal definition of the interaction set $A_s$ for a presentation state $s$ is:

$$A_s = \{ (c_j, d_A(i)) \mid c_j \in CO,\ j = g(i, s),\ i \in \mathbb{N}_0 \}$$

The index $j$ of a COPU to be considered in $A_s$ is determined by a function $g$ which depends on the distance $i$ of the COPU and the current state $s$. The relevance value for a COPU $c_j$ is determined by the distance relevance function $d_A$.

To compare the relevances of COPUs with respect to the whole continuous object, we introduce the relevance function $r_{A_s}$ for an interaction set $A_s$:

$$r_{A_s}(c) = \begin{cases} v & \text{if } (c, v) \in A_s \\ 0 & \text{otherwise} \end{cases}$$

The relevance function can be obtained by projection on the second position of the respective interaction set, if a COPU is considered there; otherwise a value of zero is assigned.

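The interaction sets $Referenced_s$, $Skip_s$ and $History_s$ are defined as follows:

$$Referenced_s = \{ (c_j, d_{Referenced}(i)) \mid c_j \in CO,\ i \in \mathbb{N}_0,\ j = p + i \cdot skip \}$$

$$Skip_s = \left\{ (c_j, d_{Skip}(i)) \;\middle|\; c_j \in CO,\ |skip| > 1,\ i \in \mathbb{N}_0,\ j = p + \frac{skip}{|skip|} \left( 1 + i + \left\lfloor \frac{i}{|skip| - 1} \right\rfloor \right) \right\}$$

$$History_s = \left\{ (c_j, d_{History}(i)) \;\middle|\; c_j \in CO,\ i \in \mathbb{N}_0,\ j = p - (i + 1) \cdot \frac{skip}{|skip|} \right\}$$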
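The index computations of the three interaction sets can be checked with a short sketch. It follows the index formulas as they read in this section ($j = p + i \cdot skip$ for Referenced, and the floor-based formula for Skip); the function names and the total number of COPUs `n` are assumptions made only for this illustration, and the relevance values themselves are left out.

```python
from itertools import count, islice

def _sign(skip):
    # +1 for forward presentation, -1 for backward presentation
    return 1 if skip > 0 else -1

def referenced(p, skip, n):
    """Indices j = p + i*skip (i = 0, 1, ...) that lie inside the object."""
    for i in count():
        j = p + i * skip
        if not 0 <= j < n:
            return
        yield j

def skipped(p, skip, n):
    """Indices of COPUs that are jumped over; empty when |skip| == 1."""
    step = abs(skip)
    if step <= 1:
        return
    for i in count():
        j = p + _sign(skip) * (1 + i + i // (step - 1))
        if not 0 <= j < n:
            return
        yield j

def history(p, skip, n):
    """Indices behind the presentation point: j = p - (i + 1) * sign(skip)."""
    for i in count():
        j = p - (i + 1) * _sign(skip)
        if not 0 <= j < n:
            return
        yield j

# State s = <508, +2> from Figure 2.4, with an assumed object of 1000 COPUs.
first_referenced = list(islice(referenced(508, 2, 1000), 4))
first_skipped = list(islice(skipped(508, 2, 1000), 3))
first_history = list(islice(history(508, 2, 1000), 3))
```

For the state $\langle 508, +2 \rangle$ the sketch yields 508, 510, 512, 514 as the first referenced COPUs, which matches the most relevant COPUs named in the discussion of Figure 2.4.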


2.3.2 Replacement and Preloading Strategy

The L/MRP algorithm is initiated by the COM at every request to reference a COPU for presentation. The next replacement victim during a presentation is the COPU $c$ available in the buffer with the minimum value $r_{CO}(c)$. The COPU $d$ with the maximum value $r_{CO}(d)$ is the next COPU that has to be preloaded if it is not yet present in the buffer. The algorithm is given in Figure 2.5, where BUFFER denotes the set of COPUs present in the buffer.

    GetNextCopuToBePresented(s): Pointer to COPU
    begin
    (1) for all c in CO with relevance value = 1 do
        begin  // preload the most relevant COPUs
          if c not in BUFFER then  // buffer fault
          begin
            if |BUFFER| = Buffer_Size then  // buffer full
            begin
              // replace the least relevant COPU v
    (2)       find v in BUFFER with the least relevance value
              replace v by preloading c
            end
            else
              // just load c into buffer
              preload c into BUFFER
          end
        end
        return buffer address of COPU c
    end

Figure 2.5: L/MRP Algorithm

The algorithm guarantees that the next COPU to be presented is in the buffer. In statement (1) of the algorithm, for a presentation state $s$, the COPUs checked are $(c_i, r) \in Referenced_s$, where $i = p + h \cdot skip$ with $h = 0, 1, 2, \ldots, l$, and $l$ denotes the number of COPUs to be prefetched. In statement (2) the distance relevance functions are used to compute the relevance values for COPUs in the BUFFER.
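The replacement and preloading step described above can be illustrated with a minimal Python sketch. It is not the L/MRP implementation: the linear decay in `distance_relevance`, the `horizon` parameter, the `load` callback and the dictionary used as BUFFER are all assumptions made for illustration, and only the Referenced set is modelled. The sketch also assumes that the buffer capacity is larger than the number of COPUs to be prefetched.

```python
def distance_relevance(steps, horizon):
    # Hypothetical distance relevance function: 1.0 at distance 0,
    # decaying linearly to 0.0 at `horizon` steps and beyond.
    return max(0.0, 1.0 - steps / horizon)


def relevance(j, p, skip, horizon):
    # Relevance of COPU j when COPU p is presented with step width `skip`.
    # COPUs that are not reached by p + h * skip (h >= 0) get relevance 0.
    offset = j - p
    if offset % skip != 0:
        return 0.0
    steps = offset // skip
    if steps < 0:
        return 0.0
    return distance_relevance(steps, horizon)


def get_next_copu(p, skip, prefetch, buffer, capacity, horizon, load):
    # Make sure COPUs p, p + skip, ..., p + prefetch * skip are buffered.
    for h in range(prefetch + 1):
        c = p + h * skip
        if c not in buffer:                      # buffer fault
            if len(buffer) >= capacity:          # buffer full
                # Replace the least relevant COPU currently in the buffer.
                victim = min(buffer, key=lambda j: relevance(j, p, skip, horizon))
                del buffer[victim]
            buffer[c] = load(c)
    return buffer[p]
```

A call such as `get_next_copu(0, 1, 1, buffer, 3, 4, load)` on a full buffer evicts the COPUs farthest from the presentation point first, because their relevance is lowest.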

Our buffer management strategy is derived from L/MRP. We extend it with specific features to better support MPEG video presentation.

2.4 MPEG Video and Audio Player at OGI

Shanwei Cen et al. at the Oregon Graduate Institute of Science and Technology have designed and implemented a distributed, real-time MPEG video and audio player [CPS95]. The player is designed for use across the Internet, a shared environment with variable traffic and with great diversity in network bandwidth and host processing speed. They used a toolkit approach to build software feedback mechanisms for client/server synchronization, dynamic Quality-of-Service control, and system adaptiveness.

2.4.1 MPEG video standard

The MPEG video compression algorithm [LEG91] relies on two basic techniques: block-based motion compensation for the reduction of temporal redundancy, and transform-domain-based compression (DCT) for the reduction of spatial redundancy. The idea of motion compensation is to encode a video frame based on other video frames temporally close to it. Typically, the image in a video stream does not differ too much within small time intervals.

In MPEG-1 [LEG91], three types of frames (pictures) are used to encode video: Intra- (I-), Predicted (P-) and Bi-directional (B-) frames. Intra-frames are encoded independently, without reference to any past or future frames. Predicted frames are encoded in relation to a past reference frame, namely an I- or P- frame. Bi-directional frames are encoded relative to both preceding and following reference frames. A video sequence is composed of a sequence of groups of pictures (GoPs); each GoP contains a frame sequence with a fixed pattern, such as IBBPBBPBBPBB. The GoP structure enables random access within a sequence. Usually a GoP is an independently decodable unit that can be of any size. A GoP is closed if its frames have no references to other GoPs, and open otherwise. To play back an MPEG video, at least the GoP information and the independent I- frames have to be available. Different MPEG-1 videos may have different group-of-pictures patterns, but they all obey the same rules. The MPEG-1 video we used in our study has the group-of-pictures pattern IBBPBBPBBPBB, but our algorithm works well for any other pattern.
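The frame dependencies described above can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the pattern is given in display order and starts with an I- frame, the function name `reference_frames` is hypothetical, and a returned index equal to the pattern length stands for the I- frame of the following GoP (the open-GoP case).

```python
def reference_frames(pattern, index):
    # Positions (within the GoP, display order) of the reference frames
    # that frame `index` is directly encoded against.
    kind = pattern[index]
    if kind == "I":
        return []                      # I- frames are independently decodable
    # Nearest preceding I- or P- frame.
    prev_ref = max(i for i in range(index) if pattern[i] in "IP")
    if kind == "P":
        return [prev_ref]
    # B- frame: additionally needs the nearest following reference frame.
    following = [i for i in range(index + 1, len(pattern)) if pattern[i] in "IP"]
    next_ref = following[0] if following else len(pattern)
    return [prev_ref, next_ref]
```

For the pattern IBBPBBPBBPBB, the last two B- frames depend on the I- frame of the next GoP, which is why such a GoP is open rather than closed.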

The ISO standard MPEG-2, established in 1994, is designed to produce higher quality movies at higher bit rates. The concept is similar to MPEG-1, but includes extensions to cover a wide range of applications. The primary application aimed at during the MPEG-2 definition process was the all-digital transmission of broadcast-TV-quality video at coded bit rates between 4 and 9 Mbps. The most significant enhancement over MPEG-1 is the addition of syntax for the efficient coding of interlaced video [MPEâ]. Other key features of MPEG-2 are the scalable extensions, which permit the division of a continuous video signal into two or more coded bit streams representing the video at different resolutions, picture quality, or picture rates.

2.4.2 System Architecture

Figure 2.6 shows the architecture of the player. The player has five components: a video server (VS), an audio server (AS), a client, and video and audio output devices. VS manages video streams. AS manages audio streams. The client is composed of a video decoder and a controller, which controls the playback of both video and audio streams and provides a user interface. The client, VS and AS reside on different hosts, communicating via a network.

Figure 2.6: Architecture of the OGI player

A program for the player is a video and audio stream pair: <video-host:video-file, audio-host:audio-file>, where a video stream is a sequence of frames, and an audio stream is a sequence of samples. These two streams are recorded strictly synchronously. We refer to a contiguous subsequence of audio samples corresponding to a video frame as an audio block. Therefore, there is a one-to-one correspondence between video frames and audio blocks.

During playback of a program, VS and AS retrieve the video and audio streams from their storage and send them to the client at a specified speed. The client buffers the streams to remove network jitter, decodes video frames, resamples audio, and plays them to the video and audio output devices, respectively.

Programs can be played back at variable speed. Play speed is specified in terms of frames per second (fps). The player plays a program in real time by mapping its logical time (defined by sequence numbers for each frame/block) into system time (real time, in seconds) on the client's host machine. Suppose the system time at which frame(i) is displayed is $T_i$, and the current play speed is $P$ fps; then the time at which frame(i+1) is played is $T_{i+1} = T_i + \frac{1}{P}$. VS and AS also map the program's logical time into their own system time during the retrieval of the media streams. Synchronization between audio and video streams is maintained at the client by playing audio blocks and displaying video frames with the same sequence number at the same time.
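The mapping from logical time to system time can be sketched in a few lines of Python. The function name `display_times` and the idea of passing one play speed per frame interval are assumptions for illustration; the recurrence itself is the one given above, $T_{i+1} = T_i + 1/P$.

```python
def display_times(start_time, speeds):
    # speeds[i] is the play speed (fps) in effect between frame i and frame i+1.
    times = [start_time]
    for p in speeds:
        times.append(times[-1] + 1.0 / p)   # T_{i+1} = T_i + 1/P
    return times
```

With two intervals at 25 fps followed by one at 50 fps, the frames are scheduled 0.04 s, 0.04 s and 0.02 s apart, which shows how a speed change takes effect at the next frame boundary.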

If any stage of the video pipeline, from VS through the network and client buffer to the decoder, does not have sufficient resources to support the current quality-of-service (QoS) specification, it can decide independently to drop frames. The controller of the client also drops late frames (frames which arrive after their display time). A similar approach is implemented for the audio pipeline.

The user QoS specification is currently restricted to display frame rate. The display frame rate is the number of frames per second displayed by the client. A valid display frame rate is always equal to, or lower than, the current play speed.

There are a number of serious problems to be solved in this system. These problems include client/server clock drift, insufficient effective bandwidth to meet the user-specified QoS, and stalls and skips in the pipeline. A software feedback mechanism was adopted to solve these problems. A feedback mechanism monitors the output or internal state of the system under control, compares it to the goal specification, and feeds the difference back to adjust the behavior of the system itself.

2.4.3 Software Feedback for Client/Server Synchronization

The synchronization mechanism is implemented in the client, as shown in Figure 2.7. It measures the current client time, $T_c$, and the server time, $T_s$, as observed at the client, and computes the raw server work-ahead time, $T_{rswa} = T_s - T_c$. $T_{rswa}$ is input to a low-pass filter, $F_1$, to eliminate high-frequency jitter and get the server work-ahead time, $T_{swa}$. The control algorithm then compares $T_{swa}$ with the target server work-ahead time, $T_{tswa}$, and takes action accordingly.

Table 2.1: Functionality of the synchronization feedback mechanism

Event                                                      Feedback Action
$T_{swa}$ too low relative to $T_{tswa}$                   Speed up Cs rate or skip Cs
$T_{swa}$ too high relative to $T_{tswa}$                  Slow down Cs rate or stall Cs
$T_{tswa}$ too low relative to $K \times J_{net}$          Double $T_{tswa}$
$T_{tswa}$ too high relative to $K \times J_{net}$         Halve $T_{tswa}$

Figure 2.7: Structure of the synchronization feedback mechanism

$T_{tswa}$ in turn is determined by the current network delay jitter level. The jitter of the measured current server work-ahead time, $|T_{rswa} - T_{swa}|$, is fed to another low-pass filter, $F_2$, to get the network delay jitter, $J_{net}$. $J_{net}$ is then used to compute $T_{tswa}$.

Table 2.1 describes the functionality of the synchronization feedback mechanism. Cs refers to the VS clock, and $K > 0$ is a constant. Whenever the control algorithm detects that $T_{swa}$ has deviated too far from $T_{tswa}$, it adjusts the VS clock rate by skipping it or stalling it for a certain amount of time, to bring $T_{swa}$ back to $T_{tswa}$. Each time the VS clock is adjusted, the mechanism backs off for a certain amount of time to let the effect of the adjustment propagate back to the feedback signal input.
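One iteration of this feedback loop can be sketched in Python as follows. This is a simplified illustration, not the player's code: the exponential moving average stands in for the low-pass filters $F_1$ and $F_2$, the threshold factors `low` and `high` are illustrative parameters rather than values taken from [CPS95], and the adaptation of the target work-ahead time and the back-off period are left out.

```python
class LowPass:
    # Exponential moving average, used here as a stand-in low-pass filter.
    def __init__(self, alpha):
        self.alpha = alpha
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = sample
        else:
            self.value += self.alpha * (sample - self.value)
        return self.value


def sync_step(t_server, t_client, f1, f2, t_target, low=0.5, high=1.5):
    # One feedback iteration; returns (action, T_swa, J_net).
    t_rswa = t_server - t_client            # raw server work-ahead time
    t_swa = f1.update(t_rswa)               # filtered work-ahead time
    j_net = f2.update(abs(t_rswa - t_swa))  # filtered network delay jitter
    if t_swa < low * t_target:
        action = "speed up or skip server clock"
    elif t_swa > high * t_target:
        action = "slow down or stall server clock"
    else:
        action = "none"
    return action, t_swa, j_net
```

Keeping the filters as separate objects makes it easy to try different smoothing factors for the work-ahead time and for the jitter estimate.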

2.4.4 Software Feedback for QoS control

The QoS control feedback mechanism is also implemented in the client, as shown in Figure 2.8. Initially, the target frame rate, $F_t$, at which VS sends frames is set to the user-specified frame rate, $F_u$. The feedback mechanism monitors the display frame rate at the client and uses a low-pass filter to remove transient noise. The filtered display frame rate, $F_d$, is then compared against $F_u$ and the existing $F_t$ by the control algorithm. If the pipeline is found to be under- or over-loaded, a new $F_t$ value is computed and fed back to VS.

Figure 2.8: Structure of the QoS control feedback mechanism

The control algorithm adjusts $F_t$ linearly. The functionality of the feedback mechanism is described in Table 2.2. $T_l$, $T_h$ and $\Delta$ are three parameters: low and high thresholds and adjust step, where $T_l > 0$, $T_h > 0$, $\Delta > 0$ and $T_h - T_l > \Delta$. These parameters, as well as the back-off time after a feedback action, are respecialized upon play speed change. The back-off time is also adapted to $T_{swa}$ measured in the synchronization feedback mechanism.

Table 2.2: Functionality of the QoS control feedback mechanism

Event                                                          Feedback Action
Pipeline over-loaded: $F_d < F_t - T_h$                        $F_t = F_t - \Delta$
Pipeline under-loaded: $F_d > F_t - T_l$ and $F_d < F_u$       $F_t = \min(F_t + \Delta, F_d)$

Chapter 3

System Architecture

This chapter presents the architecture of our framework to support multimedia applications and provides an overview of the various components that make up the system. We also present the design of a subsystem to support MPEG video presentation. Our buffer management strategy is implemented and tested in this subsystem.

3.1 System Architecture

The system architecture is illustrated in Figure 3.1. Multimedia data is stored in DB2/UDB and accessed using its multimedia extenders. The system has a set of Session Managers and Clients, which are under the control of a Central Coordinator. A Session Manager - Client pair is created for each application, which contains multiple media streams and deals with a specific multimedia application, such as Video-on-Demand, News-on-Demand, Media Editing Workbenches, and so on. The number of Session Manager - Client pairs is limited only by the amount of system resources. The function and configuration of each Session Manager - Client pair are specific to the media they support. The system, therefore, can be extended to support new media types and new multimedia applications.

Figure 3.1: System Architecture

3.1.1 Session Manager and Client

The Session Manager provides real-time retrieval of multimedia data from the database and transfer of the data to the Client over a network. The Client is responsible for requesting data from the Session Manager and delivering it to the presentation devices on the client workstation.

A multimedia application may involve multiple media streams. For example, a Video-on-Demand application may need one video stream, one audio stream and one text stream. Thus a Session Manager should be able to support data caching and scheduling of multiple media streams, and the Client should be able to synchronize multiple media streams. The configuration of a Session Manager - Client pair is shown in Figure 3.2.

Figure 3.2: Session Manager and Client

A Session Manager contains one or more Media Providers, which are under the control of a Media Coordinator. A Media Provider deals with data retrieval, caching and transfer of a single media stream such as video, audio or animation. The number of Media Providers is determined by the application. The Media Coordinator coordinates the work of multiple Media Providers. For example, if multiple Media Providers contend for system resources, then the Media Coordinator must provide some control mechanism to ensure all of the providers are satisfied.

A Client contains one or more Media Presenters, which are under the control of a Presentation Coordinator. A Media Presenter works cooperatively with a Media Provider to deal with data requests, caching and real-time presentation of a single media stream. The Presentation Coordinator coordinates the presentation of multiple Media Presenters.

3.1.2 Session Coordinator

The Session Coordinator plays multiple roles in this system, including Admission Control, Resource Administration, Media Sharing and Batching. In its Admission Control role, the Session Coordinator determines whether to accept or refuse a request from a user according to the usage of system resources. If a request is accepted, a Session Manager - Client pair is created to handle the application. In its Resource Administration role, the Session Coordinator distributes system resources among all the Session Managers to meet their requirements. Media Sharing [KPTSB] is a technique where buffers that have been played back by a user are preserved in a controlled fashion for use by subsequent users requesting the same data. This technique can be used when sufficient buffer space is available at the server side to retain data for the required duration. In this way the system can avoid fetching the data from the disk again for the lagging user, and it is possible to support a larger number of sessions than permitted by the disk bandwidth. Batching [DSS94] is another technique for improving the performance of a system by grouping requests that arrive for the same topic within a short duration of time.
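The idea behind batching can be illustrated with a small Python sketch. It is a simplified reading of the technique, not the policy from [DSS94]: the function name `batch_requests`, the representation of a request as an `(arrival_time, topic)` pair, and the rule that a batch stays open for `window` seconds after its first request are all assumptions.

```python
def batch_requests(requests, window):
    # requests: iterable of (arrival_time, topic) pairs.
    # A request joins the open batch for its topic if it arrives within
    # `window` seconds of that batch's first request; otherwise it opens
    # a new batch.
    batches = []       # list of (topic, [arrival times])
    open_batch = {}    # topic -> index of its most recent batch
    for arrival, topic in sorted(requests):
        idx = open_batch.get(topic)
        if idx is not None and arrival - batches[idx][1][0] <= window:
            batches[idx][1].append(arrival)
        else:
            open_batch[topic] = len(batches)
            batches.append((topic, [arrival]))
    return batches
```

Each batch can then be served by a single stream from the database, so the number of disk retrievals grows with the number of batches rather than with the number of requests.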

3.1.3 DB2/UDB and Its Multimedia Extenders

An object-oriented database management system (OO-DBMS) is a much more natural basis than a relational DBMS for implementing the functions necessary to manage multimedia data. However, OO-DBMSs have not had a significant impact in the database market. Two reasons for this lack of impact are that most of the current OO-DBMSs lack maturity as database systems and that they are not sufficiently compatible with relational DBMSs [INF97b].

Leading enterprise DBMS vendors are offering a new kind of database system, the object-relational database system (OR-DBMS), which combines the best features of OO-DBMSs and relational DBMSs. Figure 3.3 shows the relationships between these DBMSs [INF97b]. IBM DB2/UDB is such an OR-DBMS that can provide better support for multimedia data.

Figure 3.3: Object-Relational Database System (axes: set-based vs. nonset-based; simple data vs. complex data)

Based on the object-relational facilities introduced by DB2/UDB, a set of multimedia extenders were created by IBM to facilitate the development of multimedia applications [IBM97a]. An extender encapsulates the attributes, structure and behavior of new data types and stores them in a column of a DB2/UDB table, so that they can be processed through SQL as a natural addition to the standard set of DB2/UDB data types. Currently there are four kinds of extenders available, namely a Text Extender, an Image Extender, a Video Extender and an Audio Extender. These extenders provide powerful support for text, image, video and audio, respectively. For example, the Text Extender encapsulates IBM's full-text search technology that supports synonym search, proximity search, Boolean search, and wildcard search [IBM97a]. The Image Extender can look for an image that has a particular color or pattern by using IBM's Query by Image Content (QBIC) technology [NBE93]. A variety of multimedia formats are also supported for each type, such as TIF, GIF and BMP for image, WAVE and MIDI for audio, and MPEG, AVI and QuickTime for video. We make use of these multimedia extenders to manage multimedia data in our system.

3.2 Media Provider and Media Presenter

The Media Provider and Media Presenter are the basic units of our system. A Media Provider - Media Presenter pair supports a single media presentation. The combination of multiple Media Provider - Media Presenter pairs can support a wide range of multimedia applications. The Media Provider and Media Presenter are implemented with threads in order to make efficient use of system resources.

A specific Media Provider and Media Presenter must be developed for each different media stream, and the specific features and requirements of each medium should be considered. For example, we must develop a specific Media Provider and Media Presenter for each different video format, such as MPEG, Motion-JPEG, AVI, QuickTime, and so on.

Two technical problems that must be considered are adaptation technology and buffer management. Adaptation technology [HKR97] tries to dynamically change the quality of a presentation after the system has detected bottlenecks in data delivery. Reducing presentation quality leads to a reduced data volume to be transported from the server to the client, and the presentation is expected to keep up its intra-media synchronization by reducing disk utilization, memory consumption and used network bandwidth. In general, adaptation strategies have to be invoked if it can be foreseen that a running presentation cannot keep up intra-media synchronization assuming that the resource consumption remains constant.

Adaptation techniques can be classified along two dimensions (Figure 3.4): (1) the method used for data reduction and (2) the effect the adaptation has on the presentation. In field one, a video stream is adapted by dropping single frames. The synchronization requirements are met by presenting the previously presented frame each time the presentation detects a dropped frame in the stream. In field two, the rate of presented COPUs is reduced by switching to another stream. For example, the rate of samples presented in an audio stream can be reduced by switching to an audio stream of a lower sampling rate. Field three uses switching to another data stream to reduce the quality of single COPUs while keeping the original display rate. Finally, field four shows how the dropping of COPUs can lead to a reduction of the quality of single COPUs. If the raw information of a continuous object is stored in more than one stream, the COPUs of the basic stream can be loaded first. If the system has enough time to load other streams, the next incremental stream can be transferred to the client. Otherwise, the incremental streams are dropped and the COPU quality will not be increased.

Adaptation Dimension   Method Used: Within a Continuous Long Field      Method Used: Switch between Continuous Long Fields
Time                   (1) Frame Dropping (Video)                       (2) Reduction of Sampling Rate (Audio)
Resolution             (4) Dropping of Enhanced Layers (Audio/Video)    (3) Quality Switching (Audio/Video)

Figure 3.4: Classification of Adaptation Mechanism

The buffer management strategy [GZ96] is another critical technique. At the server side, it can remove database delay jitter and improve the system performance. At the client side, it can remove network delay jitter and minimize user interaction response time while guaranteeing that all continuity and synchronization requirements are satisfied.

3.2.1 Media Provider and Media Presenter for MPEG video

We have developed a Media Provider and Media Presenter to support continuous and interactive MPEG-1 video presentations. The design and implementation consider the following features:

• Continuity: The MPEG video must be presented continuously at a constant speed, such as 25 frames per second (fps).

• High data volume: The data volume involved in the presentation is very high: approximately 1.5 Mbps for MPEG-1 video, and between 4 and 9 Mbps for MPEG-2 video at the play speed of 25 fps.

• Frame dependency: The decoding of some frames of an MPEG video depends on some other frames.

• User interaction: Users may pause/stop the play, change the play speed, or change the play direction during the presentation. The system should respond to user interactions as soon as possible.

To achieve these goals, MPEG video must be retrieved from the database, sent over the network, decoded at the client side, and delivered to the presentation devices (speaker and display) at a constant speed (for example 25 fps). Network bandwidth and database bandwidth are essential to support the high data volume; otherwise adaptation strategies must be used. An efficient buffer management strategy is the key to supporting continuity, frame dependency and user interactions.
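To give a feeling for the data volume per frame, the bit rates quoted above can be divided by the play speed. The following is a rough average only, assuming 1 Mbps = $10^6$ bit/s and ignoring that I-, P- and B- frames differ considerably in size:

$$\frac{1.5 \times 10^{6}\ \text{bit/s}}{25\ \text{frames/s}} = 60\,000\ \text{bit/frame} = 7\,500\ \text{bytes/frame}$$

By the same calculation, MPEG-2 video at 4 to 9 Mbps corresponds to an average of about 20 000 to 45 000 bytes per frame at 25 fps.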

The architecture of the Media Provider - Media Presenter pair for MPEG-1 video is illustrated in Figure 3.5. It is a client-server architecture. The Media Provider acts as the server and is composed of a Buffer Manager and a Communication Manager. The Buffer Manager retrieves video data from the database and transfers it to the Communication Manager. It provides buffer management strategies to smooth database delay jitter and reduce user interaction response time. The main strategy to deal with user interaction is implemented at the client side, but the cooperation of the server is necessary. The Communication Manager is responsible for receiving requests from the client and sending back data over the network.

Figure 3.5: Media Provider and Media Presenter for MPEG video

The Media Presenter acts as the client and is composed of a Buffer Manager, a Communication Manager and a Video Decoder. The Buffer Manager at the client side is the engine of the whole system. A special buffer management strategy is implemented that retrieves video data from the server, buffers the video data in order to smooth network and decoder delay jitter, resolves frame dependency, and makes decisions to load, skip or replace data in the buffer. The Communication Manager sends requests to the server and receives data from it. The Video Decoder gets data from the Buffer Manager, decodes it and passes the decoded pictures to the application for display. UDP is used to transfer video data from the server to the client, and TCP/IP is used to transfer control data between client and server. The Communication Manager can also be easily modified to work over an ATM network.

The implementation is based on a client-pull architecture where continuous data is passed to the client with best-effort delivery. In a client-pull architecture, clients request data from the server at the time it is needed. The advantage of the client-pull architecture is that the client can react to user interactions and to performance bottlenecks in order to keep its presentation intra-media synchronized. Some delay is incurred while the client sends requests to the server and waits for the server to respond, so all the requests should be sent in advance to account for the delay.

In a server-push architecture, once a session is started, data are retrieved by the server and transmitted continuously to a client without any intermediate client requests. It is difficult to handle user interactions in such a system.

The best-effort approach is the simplest realization of multimedia presentations in an open distributed environment. Each application is allowed to start a presentation at any point in time without exclusively allocating resources. The drawback is that no guarantee for the timeliness of data delivery can be given, and temporary load peaks may delay the presentations.


An alternative approach is resource reservation, which means that resources involved in the presentation, like processor, disk, and network, are requested by the client and dedicated to the presenter at presentation time. The drawback of this approach is that the resources have to be exclusively controlled, which is unrealistic for commonly used open distributed environments like the Internet and non-real-time operating systems. Another disadvantage is that resources are wasted if a user pauses or switches to slower presentation speeds, which may happen very often in our system.

We have therefore adopted the best-effort strategy. In order to keep up the intra-media synchronization of a presentation in a best-effort system under varying available resources, adaptation techniques have to be used. A simple adaptation technique is used in our system, where frames are dropped selectively in case the system cannot afford the real-time presentation.

Chapter 4

Buffer Management Strategy

In this chapter we present our buffer management strategy to support interactive MPEG video presentation. We first introduce the workflow of an MPEG video presentation and then present the buffer management strategy.

4.1 MPEG Video Presentation


We discussed in section 2.4.1 how the group of pictures (GoP) structure enables random access within an MPEG video stream. We assign a sequential number, called a GoP number, to each GoP. A sequential number is also assigned to each frame and is denoted as a frame number. Alternatively, a frame can be referred to using a frame position, which consists of a GoP number and the frame's relative position within the GoP.


Since the frame size of MPEG video is variable, it is not easy to randomly access the frames. We have enhanced the functionality of the DB2/UDB Video Extender to support random access of MPEG video frames. When a video is stored into DB2/UDB, information about the video is extracted by the DB2/UDB Video Extender and stored with the video. We additionally check the video type and, if it is an MPEG video, we extract extra information including the number of GoPs, the relative position of each GoP and the GoP pattern. This information about the GoPs makes it possible to randomly access MPEG video frames.
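The relationship between frame numbers, frame positions and the stored GoP information can be sketched as follows. This is an illustration only: it assumes 0-based numbering, a fixed number of frames per GoP, and a hypothetical list `gop_offsets` holding the byte offset of each GoP within the stored video; none of these names come from the DB2/UDB Video Extender.

```python
def frame_position(frame_number, gop_size):
    # Frame number -> (GoP number, relative position within the GoP).
    return divmod(frame_number, gop_size)


def frame_number_of(gop_number, position, gop_size):
    # Inverse mapping: frame position -> frame number.
    return gop_number * gop_size + position


def gop_byte_offset(frame_number, gop_size, gop_offsets):
    # Byte offset of the GoP that contains the requested frame.
    gop_number, _ = frame_position(frame_number, gop_size)
    return gop_offsets[gop_number]
```

With a 12-frame GoP, frame 30 is the frame at position 6 of GoP 2, so a reader only has to seek to the start of GoP 2 and parse forward from there.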

A frame is the basic data unit in our system. The client requests one frame at a time, and the server sends one frame at a time. By doing so the system can avoid wasting network bandwidth in a highly interactive presentation environment. By default, the client must send a request for each frame it wants. This does not waste much network bandwidth or CPU time, because a request is very small.

4.1.1 Initialization

The workflow of an MPEG video presentation in our system proceeds as depicted in Figure 4.1.

Figure 4.1: MPEG Video Presentation Process

The client initializes all its processes first. Then it sends a request to the server to initiate a presentation. The server accepts the request if it does not exceed its maximum number of concurrent sessions. If the server accepts the request, then it initializes a new session to serve the client and sends a positive response to the client. The client may then begin the presentation. The client first retrieves N GoPs into its local buffer. The number N represents the number of GoPs that can be delivered in the amount of time it takes to retrieve a frame from the server. In our case, N is 4 and it takes just under 2 seconds to retrieve a frame. Thus, in the absence of user interactions, the client can continuously retrieve frames from the server and the next frames to be presented are always loaded in the local buffer.
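One way to read the choice of N is that the preloaded GoPs must cover the time needed to retrieve a frame. The sketch below follows that reading; the function name `gops_to_preload` is hypothetical, the 25 fps play speed and the 12-frame GoP are the values used elsewhere in this thesis, and the retrieval time of 1.9 seconds is only an illustrative stand-in for "just under 2 seconds".

```python
import math


def gops_to_preload(retrieval_time, fps, gop_size):
    # Smallest number of whole GoPs whose playback time covers
    # `retrieval_time` seconds at the given play speed.
    return math.ceil(retrieval_time * fps / gop_size)
```

Under these assumptions, four 12-frame GoPs hold 48 frames, or 1.92 seconds of video at 25 fps, which is consistent with N = 4 for a retrieval time just under 2 seconds.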

To retrieve a frame the client sends a request to the server that specifies the frame number. The server retrieves the desired frame from its local buffer. If the desired frame is not in its local buffer, the server retrieves it from the database. Each time the server goes to the database it retrieves a block of video data that includes the desired frame. The server then delivers the frame to the client over the network.

The three components at the client side, namely the Buffer Manager, Decoder and Presenter, work independently to process the same data flow, as shown in Figure 4.2.

Figure 4.2: Data Flow at Client Side (frames -> Buffer Manager -> frames -> Decoder -> decoded frames -> Presenter)

4.1.2 Buffer Manager

The Buffer Manager retrieves new frames from the server into its local buffer. The Buffer Manager deals with two issues: the loading strategy and reacting to user interactions. The first issue, the loading strategy, determines the next frame to be loaded. The Buffer Manager should always first load the frame that will have the greatest effect on the continuity of the presentation.

If there are already enough frames in the local buffer to ensure continuous presentation, that is, more than N GoPs, the Buffer Manager simply loads the following frames in sequence. If there are not sufficient frames, the frames are loaded according to their priorities. Within one GoP, the I- frame has the highest priority, the P- frames have the second highest priority, and the B- frames have the lowest priority.


The priority of a frame is also determined by its distance from the frame that is currently being presented, which we call the presentation point. The nearer a frame is to the presentation point, the higher its priority, since it has a higher probability of being needed earlier. We assign a distance factor to each GoP denoting its distance to the presentation point. The priority of a frame is therefore decided by its priority within one GoP and the distance factor of its GoP. For example, suppose we assign the priorities of the frames within one GoP as follows (for simplicity the consecutive B- frames are assigned the same priority):

Frame      I    B    B    P    B    B    P    B    B
Priority   1    0.7  0.7  0.9  0.6  0.6  0.8  0.5  0.5

The equations to determine these priorities will be given later.

We assume there are three GoPs (GoP1, GoP2 and GoP3), where GoP1 is the nearest to the presentation point and GoP3 is the farthest from the presentation point.

We assign distance factors to these three GoPs as follows:

The equations to determine these factors will be given later.

Therefore, if GoP1 is already loaded, the priorities for all the frames of GoP2 and GoP3 are computed as follows:

         I     B-B   P     B-B   P     B-B
GoP2     0.9   0.63  0.81  0.54  0.72  0.45
GoP3     0.8   0.56  0.72  0.48  0.64  0.4

The loading order of the frames of GoP2 and GoP3 is determined by the priorities.
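The priority computation can be reproduced with a short Python sketch. The distance factors 0.9 for GoP2 and 0.8 for GoP3 are inferred from the priorities of the two I- frames in the table above (the I- frame priority within a GoP is 1); the labels `B1`, `P1` and so on, as well as the function name `loading_order`, are hypothetical and chosen only for illustration. Each `B` label stands for a pair of consecutive B- frames.

```python
# Priorities within one GoP, taken from the example above.
FRAME_PRIORITY = {"I": 1.0, "B1": 0.7, "P1": 0.9, "B2": 0.6, "P2": 0.8, "B3": 0.5}


def loading_order(distance_factors):
    # distance_factors: mapping from GoP name to its distance factor.
    scored = []
    for gop, factor in distance_factors.items():
        for frame, base in FRAME_PRIORITY.items():
            # Frame priority = priority within the GoP * distance factor.
            scored.append((round(base * factor, 2), gop, frame))
    # Highest priority first; ties keep their original (nearer GoP first) order.
    scored.sort(key=lambda item: -item[0])
    return scored
```

Calling `loading_order({"GoP2": 0.9, "GoP3": 0.8})` places the I- frame of GoP3 ahead of most frames of GoP2, which illustrates that reference frames of a more distant GoP can be loaded before B- frames of a nearer one.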

The second issue the Buffer Manager must deal with is the problem of minimizing the response time to user interactions, where a user interaction refers to a change of presentation state. The common presentation states are normal play (PLAY), fast forward play (FF), fast backward play (FB), reposition (JUMP), pause and resume. In fast play mode (FF and FB), only I- frames and P- frames are presented; thus the play speed is three times faster than normal play. One can also choose to play only I- frames, so that the play speed is nine times faster than normal play.

The loading strategy does not change for FF play. The B-frames are still loaded even
though they are not presented, so that they can be used immediately if the user changes
the play state from FF to normal play. The response time for this interaction can
therefore be very small. This strategy increases the demand on the system but does not
jeopardize the presentation: if the system cannot afford to load all the data, the
Buffer Manager discards the B-frames first.

The Buffer Manager also provides a strategy to deal with FB play. Presented frames are
preserved in the local buffer for the amount of time it takes to retrieve a frame from
the server before they are tossed. If the user chooses FB play, these preserved frames can
be used immediately, without any delay. At the same time the loading engine can begin to
load frames in the reverse direction. Thus the response time for an FB user interaction is
small.

We do not have an efficient strategy for JUMP because it is hard to predict the new
position. Whenever the user chooses this operation, we can only load the needed frames and
reconstruct the local buffer at that time, so the response time is relatively large
compared to the other operations. An alternative is to define some working points [MKK95]
and restrict the user to jumping only to one of these working points. Some frames for
these working points are loaded in advance to reduce the response time. When the user
chooses the pause operation, the Buffer Manager continues to work until the local buffer
is full.

4.1.3 Decoder and Presenter

The Decoder is in the middle of the workflow. It decodes the video data loaded by the
Buffer Manager and stores the result in another buffer pool that is accessed by the
Presenter. The Presenter presents the video frames in real time. The Decoder and the
Buffer Manager share one buffer pool, which is protected by a semaphore. Similarly, the
buffer pool shared by the Decoder and the Presenter is also controlled by a semaphore.


We do not explain the decoding process of a single frame here. More details can be found
in D. Gall [LEG91]. We instead introduce the decoding process for a GoP whose pattern is
I B1 B2 P1 B3 B4 P2 B5 B6:

• The I frame is decoded first;
• Then frame P1 is decoded, because it depends only on the I frame;
• Then frames B1 and B2 are decoded, because their reference frames I and P1 are
  available;
• Then frame P2 is decoded, which depends on P1 only;
• Then frames B3 and B4 are decoded;
• Then the I frame I2 of the next GoP is decoded;
• Finally frames B5 and B6 are decoded.

So the decoding order of a GoP is I P1 B1 B2 P2 B3 B4 I2 B5 B6.
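The reordering described above can be sketched in a few lines of Python. The sketch assumes that each B frame needs the nearest reference frame (I or P) on either side of it and that the trailing B frames of a GoP reference the I frame of the next GoP. The function name and frame labels are illustrative.

```python
def decoding_order(display_order, next_gop_first="I2"):
    """Reorder frames so that every reference frame precedes the B frames that need it.

    display_order: list of frame labels in presentation order, e.g. ["I", "B1", "B2", "P1"].
    next_gop_first: label of the next GoP's I frame, needed by the trailing B frames.
    """
    decoded, pending_b = [], []
    for frame in display_order + [next_gop_first]:
        if frame.startswith("B"):
            pending_b.append(frame)      # wait for the following reference frame
        else:
            decoded.append(frame)        # I or P frame: decode it right away
            decoded.extend(pending_b)    # then the B frames that were waiting for it
            pending_b = []
    return decoded


gop = ["I", "B1", "B2", "P1", "B3", "B4", "P2", "B5", "B6"]
print(" ".join(decoding_order(gop)))
```

The Presenter then has to undo this permutation, which is why the Decoder must stay ahead of it.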

The Presenter cannot present the frames in decoded order, so it must sort the frames back
into their original order. To facilitate this sorting task, the Decoder must work ahead of
the Presenter so that the Presenter has enough time to sort the decoded frames. Otherwise
frames may be discarded because they arrive in the wrong order.

The Decoder can decide to drop frames in the following situations:

• The frame is too late, for example it should have been presented before or immediately
  after the frame that is being presented;
• The reference frames are not available, for example the P-frame immediately after the
  B-frame which is to be decoded is not available;
• The frame is damaged.

The Presenter presents the video frames in sequence. If the desired frame is not
available, the previous one is repeated and the delayed frame is subsequently dropped.
Dropping frames has a great effect on system performance, so the Buffer Manager, Decoder
and Presenter should work cooperatively to ensure that most frames are presented on time.

4.2 Buffer Management Strategy

In this section we present our buffer management strategy to support continuous and
interactive MPEG video presentation. We consider the specific features of MPEG video, such
as frame dependencies, in our design. The two main issues addressed by our buffer
management strategy are preloading and replacement of video data. Preloading is necessary
because of the non-real-time behaviour of the underlying system components (for example
storage devices and the network). The data needed by the Decoder and the Presenter must be
in the buffer before they are requested. A load-on-demand strategy cannot guarantee
continuity and would lead to a jittery presentation. The number of frames to be preloaded
depends on the predicted loading time of the database and network connections, and it
determines the initial delay of a presentation. Strategies for quantifying this parameter
are given by R. Ng [NY94].

The main goal of a replacement strategy is to replace those frames in the buffer that are
not expected to be presented for the longest period of time in the future. Assuming a
single, non-interactive presentation of the continuous object, the strategy of tossing a
frame immediately after it is presented is optimal. In order to take the interactivity of
multimedia presentations into account, the replacement strategy has to consider the effect
of user interactions on the data flow. Once a user interaction occurs, the Buffer Manager
has to preload frames in order to guarantee continuity before the presentation can
continue. Thus, interaction response time is primarily determined by the number of buffer
faults occurring during the preloading phase immediately after the interaction. In order
to reduce the number of buffer faults, the buffer management strategy has to consider
potential interactions by keeping those frames which are referenced after likely
interactions.

Additional buffer space is required to support user interactions. Besides the preloaded
frames needed for continuity, the Buffer Manager must also keep those frames that are
referenced with high probability after interactions. It should be possible to tune the
buffer management strategy, with respect to its degree of support for interactions, to
minimize buffer consumption. The extreme case of no interaction support is equivalent to a
simple "Use&Toss" strategy [CAF91].

First we need to determine the number of frames to be preloaded. Since our system is a
client-pull architecture, the client must issue its initial requests some overhead time in
advance to overcome network delivery and database retrieval delay. The overhead time is
measured from the time a request is issued at the client side to the time the client
receives the desired data. The overhead time, which we denote as t, is estimated based on
the performance of the server and the network load. We can then convert t to a number of
frames, N, which equals t multiplied by the play speed S in frames per second
(equivalently, t divided by the display time of one frame). N is the number of frames to
be preloaded.
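As a small worked example, the conversion from overhead time to a frame count can be written as below. The sketch assumes t is given in seconds and S in frames per second, so N is their product rounded up; if S is expressed instead as the display time of one frame, the same N is t divided by that period. The function name and the sample values are illustrative.

```python
import math


def frames_to_preload(overhead_s, play_speed_fps):
    """Number of frames N needed to cover an overhead time t at play speed S.

    The product is rounded to six decimals before taking the ceiling so that
    floating-point noise (0.1 * 30 = 3.0000000000000004) does not add a frame.
    """
    return math.ceil(round(overhead_s * play_speed_fps, 6))


# Illustrative values only: t = 3 s of overhead at S = 12 fps.
print(frames_to_preload(3.0, 12))
```

A larger N tolerates more delay but lengthens the initial delay and consumes more client buffer.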

[Figure: a sequence of frames drawn as circles and labelled by type (I, B, P), with the
presenting point marked.]

Figure 4.3: An example of MPEG video stream

In Figure 4.3 each circle denotes a frame. There are two GoPs with the pattern
IBBPBBPBBPBB. $L_I$, $L_P$ and $L_B$ denote the next I, P and B frame to be loaded,
respectively, and $R_I$, $R_P$ and $R_B$ denote the next I, P and B frame to be tossed,
respectively. We define preloaded to be the distance, that is the number of frames, from
the presentation point (PP) to the preloading point, which is the frame closest to the
presentation point among $L_I$, $L_P$ and $L_B$. We define presented as the distance
between the presentation point and the replacement point, which is the frame closest to
the presentation point among $R_I$, $R_P$ and $R_B$. Before the presentation begins we
must load N frames. The Buffer Manager algorithm is shown in Figure 4.4.

do {
    if there is free buffer {
        if preloaded < N        // adaptation is needed
            { skip frames; }
        PRELOADING;
    } else {                    // no free buffer
        if preloaded < N {
            if presented > 0
                { REPLACEMENT; PRELOADING; }
            else if the frame to be loaded is a B frame
                { skip this frame; }
        }
    }
    if presented > N { REPLACEMENT; }
} while (1);

Figure 4.4: Buffer Management Algorithm
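To make the control flow of Figure 4.4 easier to trace, the following Python sketch evaluates one pass of the loop and returns the actions it would take. It is only a reading of the figure: the function name, the argument names and the action strings are illustrative, and the PRELOADING and REPLACEMENT procedures themselves are not modelled.

```python
def buffer_manager_step(free_buffer, preloaded, presented, n, next_is_b_frame):
    """Return the actions taken by one pass of the loop in Figure 4.4.

    free_buffer: True if the local buffer has free space.
    preloaded / presented: distances (in frames) as defined in the text.
    n: the number of frames N to keep preloaded.
    next_is_b_frame: True if the next frame to be loaded is a B frame.
    """
    actions = []
    if free_buffer:
        if preloaded < n:
            actions.append("skip frames")            # adaptation is needed
        actions.append("PRELOADING")
    elif preloaded < n:                              # no free buffer
        if presented > 0:
            actions += ["REPLACEMENT", "PRELOADING"]
        elif next_is_b_frame:
            actions.append("skip this frame")
    if presented > n:
        actions.append("REPLACEMENT")
    return actions


print(buffer_manager_step(False, preloaded=5, presented=3, n=10, next_is_b_frame=False))
```

When the buffer is full and too few frames are preloaded, presented frames are sacrificed first; a B frame is skipped only when there is nothing left to replace.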

At any time we try to maintain at least N preloaded frames in the local buffer in order to
ensure a smooth presentation. We should also preserve about N presented frames in the
local buffer in order to respond efficiently to user interactions. For example, when a
user changes play direction from forward play to backward play, the reserved frames can be
used immediately. If there is not enough buffer space to maintain N preloaded frames and N
presented frames, we discard some of the presented frames. If buffer space is still too
small, we drop some preloaded frames.

4.2.1 Preloading Strategy

The three different kinds of frames in an MPEG video stream have different importance.
Within one GoP, the I frame is the starting point for decoding the following frames, so it
must be loaded into the local buffer first. P frames should be loaded immediately after
the I frame, because they are needed to decode B frames. The P frames within a GoP are
loaded in sequence, because the previous P frame is needed to decode the following P
frame. The B frames are loaded last, provided that there is sufficient buffer space and
network bandwidth. In the case that frames must be dropped, we drop B frames first, then P
frames, and finally I frames. If an I frame or P frame is dropped, then the following
frames within the same GoP should all be dropped accordingly.
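The loading order and the drop rule of this paragraph can be illustrated with a small Python sketch. It covers only the simple within-GoP rule (I first, then P frames in sequence, then B frames) and ignores the distance-based priorities introduced next. The function names are illustrative.

```python
def gop_loading_order(pattern):
    """Indices of one GoP's frames in the order I, then P frames in sequence, then B frames."""
    i_frames = [k for k, kind in enumerate(pattern) if kind == "I"]
    p_frames = [k for k, kind in enumerate(pattern) if kind == "P"]
    b_frames = [k for k, kind in enumerate(pattern) if kind == "B"]
    return i_frames + p_frames + b_frames


def frames_lost_by_drop(pattern, dropped):
    """Indices that become unusable when the frame at index `dropped` is dropped.

    Following the text, dropping an I or P frame also drops every later frame
    of the same GoP; dropping a B frame affects only that frame.
    """
    if pattern[dropped] == "B":
        return [dropped]
    return list(range(dropped, len(pattern)))


pattern = "IBBPBBPBBPBB"
print(gop_loading_order(pattern))
print(frames_lost_by_drop(pattern, 6))
```

Dropping the P frame at index 6 costs six frames, whereas dropping any single B frame costs one, which is why B frames are dropped first.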

We assign priority values $P_I$, $P_P$ and $P_B$ to $L_I$, $L_P$ and $L_B$, respectively.
Whenever we need to load a frame, we choose the one from $L_I$, $L_P$ and $L_B$ with the
highest priority value. We calculate the priorities for the next frame of each type as
follows. We denote GoP as the number of frames in one GoP, NP as the number of P frames in
one GoP, and NB as the number of B frames in one GoP. The presentation point is denoted as
PP. As we discussed in Section 4.1.2, the priority of a frame is determined by its
priority within its GoP and the distance factor of its GoP.

To calculate a frame's priority, we first define the frame position of each loading point,
say $N_I$, $N_P$ and $N_B$, within a GoP as follows:

The frame priorities $F_I$, $F_P$ and $F_B$ are then defined as follows:

where $m_p$, $m_b$, $c_p$ and $c_b$ are variables.

To calculate the distance factors, we first calculate the distance, in GoPs, of each
loading point from the presentation point, say $D_I$, $D_P$ and $D_B$, as follows:

$$D_I = (L_I - PP)/GoP$$
$$D_P = (L_P - PP)/GoP$$
$$D_B = (L_B - PP)/GoP$$

The distance factors $S_I$, $S_P$ and $S_B$ are then defined as follows:

where $r_i$, $r_p$ and $r_b$ are variables.

Finally the priority values $P_I$, $P_P$ and $P_B$ are calculated as follows:

The variables $m_p$, $m_b$, $c_p$, $c_b$, $r_i$, $r_p$ and $r_b$ must be tuned to achieve
acceptable performance for each particular system. In our system, as presented in the
following chapter, these variables are 0.9, 0.85, 0.05, 0.05, 0.1, 0.1 and 0.2,
respectively.

4.2.2 Replacement Strategy

The replacement algorithm considers the dependencies among frames within one GoP. In one
GoP:

• a B frame can be tossed at any time;
• a P frame can be tossed if none of its dependent frames remain, these being the
  following frames within the same GoP and the two B frames immediately before it;
• the I frame should be the last frame to be tossed.
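The three rules above can be expressed as a small predicate. This is a sketch under the assumption that the GoP pattern places exactly two B frames before each P frame, as in IBBPBBPBBPBB. The function name and the set-based buffer representation are illustrative.

```python
def can_toss(pattern, index, buffered):
    """Decide whether the frame at `index` of one GoP may be tossed.

    pattern: frame types of the GoP, e.g. "IBBPBBPBBPBB".
    buffered: indices of this GoP's frames that are still held in the buffer.
    """
    kind = pattern[index]
    others = set(buffered) - {index}
    if kind == "B":
        return True                      # a B frame can be tossed at any time
    if kind == "P":
        # Dependents: every later frame of the GoP plus the two B frames before it.
        dependents = set(range(index + 1, len(pattern))) | {index - 2, index - 1}
        return not (dependents & others)
    return not others                    # the I frame is tossed last


pattern = "IBBPBBPBBPBB"
print(can_toss(pattern, 3, {0, 3}), can_toss(pattern, 3, {0, 2, 3}))
```

With only the I frame and the first P frame left, the P frame may go; while one of its preceding B frames is still buffered, it may not.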

During replacement, the I frames, P frames and B frames should be tossed in the reverse
order in which they are loaded. We denote the next I frame to be tossed as $R_I$, the next
P frame to be tossed as $R_P$ and the next B frame to be tossed as $R_B$. Priority values
$P_I$, $P_P$ and $P_B$ are again assigned to $R_I$, $R_P$ and $R_B$, respectively.
Whenever we need to toss a frame, we choose the one from $R_I$, $R_P$ and $R_B$ with the
lowest priority value.

The priority values for $R_I$, $R_P$ and $R_B$ are calculated using the same equations as
those defined for $L_I$, $L_P$ and $L_B$, except that $R_I$, $R_P$ and $R_B$ substitute
for $L_I$, $L_P$ and $L_B$. The variables in those equations are the same for both
preloading and replacement because they use the same mechanism.

Chapter 5

Performance Study

The objective of the performance study is to examine the smoothness and interactive
response time of MPEG video presentation using our buffer management strategy. We also
compare our strategy with two others. The results of these tests are presented and
discussed in this chapter.

5.1 Measurements

Two measures are used to evaluate our strategy. One is smoothness, which is the deviation
of presentation jitter [SWM95] from the desired value of zero. We assume that the mapping
of logical time (frame number) into system time is precise, because the database delay and
network delay are constant. We also assume that the delay from the client to the video
output can be ignored, because it is the same in all of our experiments. The presentation
jitter is measured in terms of logical display time.

Consider a video stream consisting of the frame sequence $(f_0, f_1, \ldots, f_n)$ and a
playback displaying a subsequence of these frames $(f_{i_0}, f_{i_1}, \ldots, f_{i_n})$.
At each logical display time $k$ ($0 \le k \le n$), we calculate the logical time error
$e_k = k - i_k$ between the expected frame $f_k$ and the actually displayed frame
$f_{i_k}$, where $i_k \le k$ and $i_{k+1} \ge k$, producing the error sequence
$E = (e_0, e_1, \ldots, e_n)$. The smoothness, S, of a playback is the deviation of the
sequence E from the perfect playback, which drops no frames and has an error sequence of
all zeroes. Thus S is defined as [SWM95]:

This definition of S is independent of play speed. A lower value of S indicates a smoother
playback. S equal to zero denotes perfect playback.

The other measure we use to evaluate the performance of our buffering strategy is
interaction response time (IRT), which is the delay between the occurrence of a user
interaction and the time when the system reacts to this interaction by continuing with the
presentation flow. It is a critical parameter for the use and acceptance of multimedia
systems. The typical interactions are:

• forward play (PLAY): all the frames are presented sequentially in forward direction.
• fast forward play (FF): one of every three frames (all the I- and P-frames) is presented
  in forward direction.
• fast backward play (FB): one of every three frames (all the I- and P-frames) is
  presented in backward direction. The frame number of the following frame is less than
  the previous one.
• reposition (JUMP): move the presentation point to any new place and start to play the
  video there.
• rewind: move the presentation point to the start of the video and play.
• pause/stop: stop playing.
• resume: continue playing.

The response times we measure are PLAY → FF, PLAY → FB, FF → PLAY, FB → PLAY, FF → FB,
FB → FF and PLAY → JUMP. We calculate IRT in the following way.

Suppose we are playing MPEG video in normal forward play (PLAY) mode and want to calculate
the IRT of changing to fast forward play (FF) mode. Once the user presses the FF button,
how do we tell whether the video is being played in FF mode? In PLAY mode, we present the
frames sequentially. In FF mode every third frame is presented, that is, only I frames and
P frames are presented. The IRT for this interaction is measured from the time the user
presses the FF button to the time we detect a number of successive frames that are all I-
or P-frames and that have the same gap of three frames. The IRT for all the other
interactions is calculated in a similar manner.
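The detection rule for PLAY → FF can be sketched as follows. The sketch scans the sequence of presented frame numbers and reports the position at which M successive I or P frames, each three frames after the previous one, have been seen; the IRT would then be the presentation time of that frame minus the time of the button press. The function name, the dictionary of frame types and the default M = 5 (the value reported for the experiments in Section 5.4.2) are assumptions of this illustration.

```python
def ff_detected_at(frame_numbers, frame_types, m=5, gap=3):
    """Index in the presented sequence at which FF mode is considered established.

    frame_numbers: frame numbers in the order they were presented.
    frame_types: mapping from frame number to "I", "P" or "B".
    Returns the index of the last frame of the first run of m successive I/P
    frames spaced `gap` frames apart, or None if no such run occurs.
    """
    run = 0
    for idx, number in enumerate(frame_numbers):
        is_reference = frame_types[number] in ("I", "P")
        if is_reference and run > 0 and number - frame_numbers[idx - 1] == gap:
            run += 1                     # the run of evenly spaced I/P frames continues
        elif is_reference:
            run = 1                      # a new run starts at this frame
        else:
            run = 0                      # a B frame breaks the run
        if run >= m:
            return idx
    return None


types = {k: "IBBPBBPBBPBB"[k % 12] for k in range(48)}
presented = [0, 1, 2, 3, 4, 5, 6, 9, 12, 15, 18, 21]   # FF pressed after frame 6
print(ff_detected_at(presented, types))
```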

5.2 Comparing Strategies

We compare our buffer management strategy with two other strategies. The first one is
L/MRP [MKK95], used in AMOS, which was developed at GMD, the German National Research
Center for Information Technology. They also considered media-specific modeling of MPEG
video [HL97]. However, they adopted a different strategy from ours to load and drop
frames. The three possible strategies to load and drop frames of MPEG video are shown in
Table 5.1. Model a is used in our strategy, and model c is used in L/MRP. Model a
demonstrates a better performance than model c, as we discuss later.

a)  | I | B | B | P | B | B | P | B | B | P | B | B |    Preloading by priority
b)  | I B B P B B P B B P B B |                          Conventional preloading
c)  | I B B | P B B | P B B | P B B |                    Preloading by priority

Table 5.1: MPEG Video Frames Model

In model a, each frame within a GoP is loaded and dropped independently. I-frames have a
higher priority than B- and P-frames, and P-frames have a higher priority than B-frames.
We adopt this model because we consider it more flexible and more efficient than the
others.

In model b, each GoP is loaded or dropped as a unit. In our experiments, one GoP includes
12 frames that can be presented for half a second, so dropping one GoP would lead to
unacceptable jitter.

Model c segments one GoP into pieces. Each segment starts with an I- or P-frame. The
segment that includes the I-frame must be loaded first, since the other segments depend
on it. The segments act as the atomic preloading unit, which means that a segment is
loaded or dropped as a whole. This strategy is implemented in the AMOS multimedia database
system [HL97], so we denote this strategy as AMOS.

The second strategy used in our comparison was developed at the Oregon Graduate Institute
of Science and Technology. Shanwei Cen et al. [CPS95] designed and implemented a
distributed real-time MPEG video player using a software feedback mechanism. The system
uses a server-push architecture. The server retrieves video data according to the
retrieval pattern it receives from the client and sends the data to the client at a
constant speed. The retrieval pattern is derived from the GoP pattern: if a frame within a
GoP is to be retrieved, the corresponding bit of the retrieval pattern is set to 1;
otherwise it is set to 0. The client simply consumes what it receives. A feedback
mechanism is used by the client to control client/server synchronization and quality of
service (QoS). When a bottleneck is detected at the client side, the client recalculates
the retrieval pattern and sends it to the server. The server then retrieves data according
to the new pattern. When the client calculates the retrieval pattern, it always tries to
drop frames evenly within a GoP, so that it can maintain a smooth presentation.

A simple buffer management strategy was implemented at the client side that buffers the
data received from the server in a single buffer queue. The decoder consumes the buffered
data sequentially and the presented frames are tossed immediately. We denote this
algorithm as OGI.

In OGI, MPEG videos are stored in disk files. The transfer speed of disk files is much
higher than that of a database, so for this reason alone the performance of OGI should be
superior to that of our system. In order to make a more even comparison we altered the OGI
system so that it can also retrieve video data from a relational database. We denote this
new version of the system as OGI*.

5.3 Test Environment

The performance study tests were conducted in an environment with the following
properties:

• The server runs on an IBM PowerStation 220. The video data is stored in IBM DB2/UDB.
• The client runs on an IBM PowerPC.
• The server and the client are connected via a 10 Mbps Ethernet.
• The frame size of the MPEG video is 320 x 240. There are 9500 frames in total, which are
  encoded at 30 fps. The average frame size is 4.79K bytes, and the GoP pattern is
  IBBPBBPBBPBB.
• The buffer size at the client side is 512K bytes.
• Software decoding is used in all the systems.
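A quick back-of-the-envelope calculation relates these figures to each other. The Python sketch below uses only the numbers listed above; treating 1K as 1000 bytes for the bit-rate figure is an assumption (with 1024 the result is slightly higher), and the variable names are illustrative.

```python
BUFFER_KB = 512          # client buffer size
AVG_FRAME_KB = 4.79      # average frame size
GOP_LENGTH = 12          # frames in the pattern IBBPBBPBBPBB
ENCODED_FPS = 30         # encoding rate of the test video

# How many average-sized frames, and how many GoPs, fit into the client buffer.
frames_in_buffer = int(BUFFER_KB / AVG_FRAME_KB)
gops_in_buffer = frames_in_buffer / GOP_LENGTH

# Average data rate of the stream when played at the encoded rate.
stream_kb_per_s = AVG_FRAME_KB * ENCODED_FPS
stream_mbps = stream_kb_per_s * 8 / 1000      # assumes 1K = 1000 bytes

print(f"{frames_in_buffer} frames (about {gops_in_buffer:.1f} GoPs) fit in the buffer")
print(f"stream rate: {stream_kb_per_s:.1f} KB/s, about {stream_mbps:.2f} Mbps")
```

Under these assumptions the client buffer holds roughly 106 frames, and the average stream rate at 30 fps is a little over 1 Mbps, well below the nominal 10 Mbps of the network.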

To evaluate the smoothness of the four strategies (ours, OGI, OGI* and AMOS), we played
the default video stream at various play speeds without user interactions. Each play was
repeated 5 times, and a smoothness value was calculated after each play. The average
smoothness values are presented in the following section.

The same approach is used to evaluate IRT. The default video stream was played at a fixed
play speed. Then we tried all kinds of user interactions during the play. Each user
interaction was repeated 10 times. The average IRT values are presented in the following
section.

5.4 Results and Observations

We present experiments to compare the smoothness and interactive response time
measurements of the four strategies. We also present experiments to show how the buffer
size affects the smoothness measurement of our system.

5.4.1 Smoothness

Figure 5.1 shows the smoothness of the four strategies. The vertical axis is the
smoothness measurement. The horizontal axis is the play speed of the presentation in
frames per second (fps). The play speed in the graph ranges from 5 to 16 fps. We collected
samples at each integer and at the midpoints. When the play speed is under 5 fps, the
smoothness measurements of all the strategies except OGI* are zero, which means that they
work perfectly. When the play speed is higher than 16 fps, which is the maximum playing
rate supported by our system, the smoothness measurements of all the strategies except OGI
increase dramatically. Our strategy outperforms all the others when the play speed is
between 5 fps and 14 fps.

Our strategy shows better performance than AMOS because of our different loading and
replacement strategy. In our strategy each frame is dropped independently, so we can
choose to drop the frames that have the least effect on other frames, that is the B
frames, and to drop the frames evenly within one GoP. AMOS, on the other hand, must drop a
segment at a time. The dropped segment also affects the following frames.

For example, let AMOS drop its last segment, which has one P-frame and two B-frames, and
let our strategy drop the last four B-frames of the pattern shown in Table 5.2. We drop
one more frame than AMOS. Normally a P-frame is larger than a B-frame, and we want both
strategies to drop about the same amount of data, so we assume that the size of two
B-frames is not less than that of one P-frame. In Table 5.2, the positions marked X in the
pattern mean that the corresponding frames are dropped. The smoothness within one GoP is
calculated using Equation 5.1 defined in Section 5.1. It is clear that our strategy is
better than AMOS.

[Plot of smoothness (vertical axis) against play speed from 5 to 16 fps (horizontal axis)
for the four strategies.]

Figure 5.1: Smoothness Measurement

Our strategy outperforms OGI when the play speed is less than 14 fps. OGI is a server-push
system, and our strategy is based on a client-pull architecture. In our strategy, the
client can decide to drop frames immediately if it detects a bottleneck. When the
bottleneck disappears, it can either drop fewer frames or stop dropping frames entirely.
In OGI, when a bottleneck is detected, the client recalculates the retrieval pattern and
sends it to the server. The server then takes action to drop frames according to the
retrieval pattern. It therefore takes longer for OGI to respond to a bottleneck than for
our strategy. During the presentation, the available system resources, such as database
and network bandwidth, may change frequently because it is an open environment. We
therefore need to respond to these changes quickly; otherwise the smoothness of the
presentation may be jeopardized.

          Loading Pattern              Smoothness
Pattern   I B B P B B P B B P B B
AMOS      I B B P B B P B B X X X      2.16
Ours      I B B P B B P X X P X X      1.58

Table 5.2: Smoothness of AMOS and Our Strategy

The following is an example. Suppose that at first we play the video smoothly and can
preload all the video data. We then detect a bottleneck and estimate that we can only
preload nine frames per GoP. In our strategy, we can change the preloading pattern
immediately, as shown in Table 5.3. OGI, however, cannot change immediately, so the last
three frames of the GoP will be dropped. Its smoothness is therefore worse than ours, as
shown in Table 5.3.

          Loading Pattern              Smoothness
Pattern   I B B P B B P B B P B B
Ours      I B X P B X P B X P B B      1
OGI       I B B P B B P B B X X X      2.16

Table 5.3: Smoothness of OGI and Our Strategy
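The per-GoP smoothness values in Tables 5.2 and 5.3 can be checked with the sketch below. It assumes that a dropped frame is replaced by repeating the most recently displayed frame, and that S is the root mean square of the non-zero logical time errors. The second assumption is not taken from the definition in [SWM95]; it is chosen because it reproduces the tabulated values 2.16, 1.58 and 1, and Equation 5.1 remains the authoritative form. Function names are illustrative.

```python
import math


def error_sequence(loaded):
    """Logical time errors e_k = k - i_k for one GoP.

    loaded: the GoP pattern with "X" marking dropped frames; when a frame is
    missing, the most recently displayed frame is repeated.
    """
    errors, last_shown = [], 0
    for k, frame in enumerate(loaded):
        if frame != "X":
            last_shown = k
        errors.append(k - last_shown)
    return errors


def smoothness(loaded):
    """Assumed form of S: RMS of the non-zero errors (0.0 when nothing is dropped)."""
    nonzero = [e for e in error_sequence(loaded) if e > 0]
    if not nonzero:
        return 0.0
    return math.sqrt(sum(e * e for e in nonzero) / len(nonzero))


for name, loaded in [("AMOS / OGI", "IBBPBBPBBXXX"),
                     ("Ours (Table 5.2)", "IBBPBBPXXPXX"),
                     ("Ours (Table 5.3)", "IBXPBXPBXPBB")]:
    print(f"{name}: {smoothness(loaded):.2f}")
```

The comparison illustrates why spreading drops evenly helps: three isolated drops each cost an error of 1, whereas three consecutive drops accumulate errors of 1, 2 and 3.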

The smoothness measurement of OGI* is worse than ours because it does not consider the
specific features of the database. As we know, there is overhead associated with
retrieving video data from the database using DB2/UDB's multimedia extenders. The server
always retrieves a block of data so that the average overhead time can be reduced. The
size of the block determines the time $t_b$ to retrieve the block from the database. In
our strategy we considered this overhead time and added the average overhead time into the
preloading time. The client preloads an amount of data which takes longer to present than
$t_b$.

When the client requests a frame that is not in the local buffer of the server, the server
retrieves the block that contains the desired frame from the database, and it takes
approximately $t_b$ to deliver the frame to the client. During this time, the client
continues to consume those frames that are preloaded in its local buffer. So the database
overhead time does not affect the smoothness of our strategy.

OGI*, however, does not consider the database overhead time. Its client simply consumes
what it receives from the server. When the server needs to retrieve data from the
database, the client may consume all of its buffered frames and then be forced to wait for
the server, which degrades the smoothness of the presentation. When the desired frame
finally arrives, the client may have to drop it because it is too late, which may also
affect the following frames.

When the play speed is greater than 15 fps, OGI outperforms our strategy. The smoothness
of our strategy degrades beyond 15 fps because the database I/O becomes a bottleneck. Each
retrieval of a block of video data from the database takes approximately 3 seconds, and
this amount of data can be played at the client for about 3 seconds at a play speed of 16
fps. When the play speed exceeds 16 fps, the database cannot provide the data in real time
and the smoothness degrades. OGI retrieves video data from a disk file and so does not
encounter this limit.

We did not calculate standard deviation of the smoothness results, so some differ-

ences may not be significant. But given a consistent experimental environment, that

would not affect our results.


5.4.2 Interactive Response Time

Table 5.4 shows the results of experiments to evaluate interactive response time for the
four strategies. The video was played at the frame rate of 12 fps. The table lists the
IRTs for the following interactions:

• PLAY → FF changes the status from normal play to fast forward play. The IRT is measured
  from the time the user presses the FF button to the time we detect that M successive I-
  and P-frames are presented. M is a variable, which is 5 in our experiment.
• FF → PLAY changes the status from fast forward play to normal play. The IRT is measured
  from the time the user presses the PLAY button to the time we detect that M successive
  frames are presented.
• PLAY → FB changes the status from normal play to fast backward play. The IRT is measured
  from the time the user presses the FB button to the time we detect that M successive I-
  and P-frames are presented in reverse order.
• FB → PLAY changes the status from fast backward play to normal play. The IRT is measured
  from the time the user presses the PLAY button to the time we detect that M successive
  frames are presented in forward direction.
• FF → FB changes the status from fast forward play to fast backward play. The IRT is
  measured from the time the user presses the FB button to the time we detect that M
  successive I- and P-frames are presented in reverse order.
• FB → FF changes the status from fast backward play to fast forward play. The IRT is
  measured from the time the user presses the FF button to the time we detect that M
  successive I- and P-frames are presented in forward direction.
• JUMP starts presentation from a new position. The IRT is measured from the time the user
  presses the JUMP button to the time we detect that M successive frames are presented in
  forward direction.

Interaction     AMOS    Ours    OGI
PLAY → FF        126     113    211
FF → PLAY        128     117    312
PLAY → FB        442     414    216
FB → PLAY        411     353    319
FF → FB          344     263    205
FB → FF          302     291    209
JUMP            4406    3297    103

Table 5.4: Interaction Response Time (ms)

From the table we can see that PLAY → FB and FB → PLAY take longer than PLAY → FF and
FF → PLAY. The reason is that PLAY → FB and FB → PLAY change the play direction, and it
takes some time to reorganize the buffer pool. We can also see that the JUMP operation
takes much longer than all the others because the new position is difficult to predict.
Whenever a JUMP operation occurs, we have to load all the needed data from the server.

The results show that our strategy outperforms AMOS and OGI* in all cases, and outperforms
OGI in two cases. Our preloading strategy allows us to perform slightly better than AMOS.
Both strategies try to preload all the video data in order to support user interaction
efficiently, but when the system cannot afford to retrieve all the data in real time our
strategy tries to preload the higher priority I- and P-frames that are needed by FF and FB
play. We can therefore perform FF and FB smoothly with little effect on PLAY. AMOS, on the
other hand, drops segments that include I- or P-frames, so that FF, FB and PLAY are all
affected.

The performance of OGI* is the worst of the four strategies. The main reason is that it
uses the Use&Toss strategy [CAF91]. In OGI* the data is tossed immediately after being
used. Whenever a user changes play direction, e.g. PLAY → FB or FB → PLAY, it does not
have any data that can be used immediately. It must reconstruct the buffer pool and reload
all the data in the other direction, which increases its IRT. The other reason for its
long IRT is that it only loads data that is needed for the current presentation. For FF
and FB, it only preloads the desired I- and P-frames. When the user changes play
operations, such as FF → PLAY or FB → PLAY, it does not have the desired B-frames in hand.

Our strategy has almost the same performance as OGI. Our IRTs for PLAY → FF and FF → PLAY
are better than those of OGI. The main reason is that our client tries to load all the
frames for PLAY and FF operations if the buffer space and network bandwidth are
sufficient. When the user changes play operations between PLAY and FF, no change is
needed. All the desired data is there, so the IRT can be small. On the other hand, OGI
only loads the data needed for the current presentation. For FF, it only preloads the
desired I- and P-frames, so when the user changes play operations from FF to PLAY, OGI
does not have the desired B-frames in hand. OGI's IRT is therefore longer than ours. Our
IRTs for the other interactions are worse than those of OGI because those interactions
change the play direction. Whenever the play direction is changed, we need to reconstruct
the buffer pool because we use a linked list to manage it, which costs some time. However,
the overhead is modest, as Table 5.4 shows.

For all the strategies the IRT of JUMP is higher than that of other operations. The
reason is that it is difficult to predict the new position of JUMP. Thus we cannot
preload the data for the play.


5.4.3 Smoothness vs. Buffer Size

The smoothness of our strategy greatly depends on the size of the buffer at the client
side. This fact is shown by the results in Figure 5.2. The horizontal axis denotes buffer
size measured in kilobytes (KB) and the vertical axis denotes smoothness. From
Figure 5.2 we can see that the smoothness depends directly on the buffer size.

Figure 5.2: Smoothness vs. Buffer Size
(plot not reproduced; horizontal axis: buffer size in KB, from 128 to 576 in steps of 64; vertical axis: smoothness S)

In this experiment we vary the buffer size from 128KB to 576KB; the play speed
is fixed at 12 fps. When the buffer size is less than 128KB, the smoothness of the
system is very poor, because the system cannot buffer sufficient frames to tolerate
the database and network delay. When the buffer size grows, more frames can be
preloaded, so the system can tolerate more delay and the smoothness improves
accordingly. When the buffer size is greater than 576KB, the smoothness of the
system is near zero, which means perfect playback.
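
A rough calculation shows why a larger buffer tolerates more delay. The average frame size of 8 KB is an assumed value chosen for illustration, not a measurement from these experiments; only the 12 fps play speed and the range of buffer sizes come from the text.

```python
def tolerable_delay_seconds(buffer_kb, avg_frame_kb, fps):
    """Seconds of playback the buffer can cover while no new data arrives."""
    buffered_frames = buffer_kb // avg_frame_kb
    return buffered_frames / fps


AVG_FRAME_KB = 8  # assumption for illustration only
FPS = 12          # play speed used in the experiment

# Buffer sizes from 128 KB to 576 KB in steps of 64 KB, as on the figure axis.
delays = {
    size_kb: tolerable_delay_seconds(size_kb, AVG_FRAME_KB, FPS)
    for size_kb in range(128, 577, 64)
}
```

With these assumed numbers a 128KB buffer covers about 1.3 seconds of database and network delay, while a 576KB buffer covers 6 seconds.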

Chapter 6

Conclusions

Multimedia presentations demand specific support from database management systems.
In particular, the buffer management strategy plays an important role in sustaining
smooth presentation and user interactions. We developed a buffer management strategy
to support MPEG video presentation where the video is stored in a relational
database. MPEG-related features that affect performance, such as frame dependency,
were considered in the design of our strategy. We implemented
this strategy in an MPEG video presentation system and conducted experiments to
evaluate it. Our strategy demonstrates better performance than other
existing strategies. We also designed a framework to support multimedia presentation.
Our implementation provides some of the components of this framework, and
the remainder are left as future work.


The following sections present a summary of the contributions of the research and
a discussion of interesting future work in the area.

6.1 Contributions

This research makes the following contributions:

The design and implementation of the MPEG video presentation system.
The MPEG video presentation system plays MPEG video stored in a relational
database system in real-time and supports various user interactions.

The development of a buffer management strategy for continuous
media support. The buffer management strategy provides good support for
continuous and interactive MPEG video presentation. A preloading strategy is
implemented to compensate for the delay of the underlying system components.
Furthermore, a replacement strategy is implemented that attempts to replace those
frames which are not expected to be referenced for the longest period of time
in the future. The concepts of distance relevance function, preloading priority
and replacement priority are introduced in the design. The design also
considers the specific features of MPEG video, specifically the dependencies
between the frames. It preloads different MPEG video frames with different
priorities to maximize the smoothness and minimize the interaction response time
of the presentation. This strategy has been implemented in our MPEG video
presentation system, and experiments have been conducted to evaluate its
performance. The smoothness of our strategy is better than that of other
existing strategies, and the IRT is better than that of the others in most cases.

The design of a framework to support multimedia presentation. We
proposed an open and scalable framework to support various multimedia applications.
The framework makes use of the relational database system DB2/UDB
and its multimedia extenders to manage multimedia data. The system can also
be expanded to accommodate new media types with the help of media-specific
providers and presenters. A wide range of multimedia applications can be supported
by combining different media providers and media presenters.
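
The preloading and replacement priorities summarized in the contributions above can be illustrated with a minimal sketch. The functions below are hypothetical stand-ins and not the distance relevance function of our design: they assume that I-frames are preloaded before P-frames and P-frames before B-frames, that nearer frames come first within a type, and that frames already passed in the current play direction are the ones expected to be referenced last.

```python
# Assumed preloading order by frame type: I before P before B.
FRAME_PRIORITY = {"I": 0, "P": 1, "B": 2}


def preload_order(candidates, position):
    """Sort (frame_number, frame_type) pairs by type priority, then distance."""
    return sorted(
        candidates,
        key=lambda c: (FRAME_PRIORITY[c[1]], abs(c[0] - position)),
    )


def choose_victim(buffered, position, direction=1):
    """Pick the buffered frame expected to be referenced furthest in the future."""

    def expected_wait(frame):
        ahead = (frame - position) * direction
        # Hypothetical rule: a frame already passed is not needed again
        # until the play direction changes, so it ranks as furthest away.
        return ahead if ahead >= 0 else float("inf")

    return max(buffered, key=expected_wait)


candidates = [(11, "B"), (12, "P"), (13, "I"), (14, "B")]
order = preload_order(candidates, position=10)
victim_forward = choose_victim([10, 11, 12, 20], position=10)
victim_passed = choose_victim([8, 10, 11], position=10)
```

In this sketch the I-frame is fetched first even though it is further away than the B-frame next to the current position, because the P- and B-frames cannot be decoded without it.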

6.2 Future Work

A number of topics within this research area remain open for future consideration:

Tunable parameters of buffer management strategy to support different
presentation scenarios. We have defined a set of functions to calculate
priority values in our buffer management strategy. The parameters in those
functions came from experimental results with our particular configuration. In
reality there are many other presentation scenarios where the current parameters
are not suitable. Ideally these parameters should be tunable to satisfy
different presentation scenarios.

Define working point to support JUMP operation more efficiently.
Our buffer management strategy does not have good support for the JUMP
operation. The reason is that it is difficult to predict the new position. Thus
we cannot preload data for the new playback. An alternative is to define
some named working points so that the JUMP operation can be replaced with
an operator that sets the working point. We can preload data for these working
points to reduce the response time for this operation.

Make suitable changes to support MPEG-2 video. Our MPEG video
presentation system can only support MPEG-1 video. However, it is not difficult
to adapt it to support MPEG-2 video due to their conceptual similarity.
The key features of MPEG-2 are the scalable extensions that permit the division
of a continuous signal into two or more coded bit streams representing the
video at different resolutions, picture qualities, or picture rates. The modeling
of an MPEG-2 video can be done in the same way as discussed for MPEG-1
streams, but its scalability features must be considered, and additional semantic
information may be needed at the client side or at the server side.


Incorporate hardware support to decode full-size MPEG video. Our
system only uses a software decoder to decode MPEG video. The CPU processing
capability limits the maximum picture size to 320 x 210, and the maximum
play speed to 16 frames per second (fps). To support full-size (640 x 480) and
full-speed (25 or 30 fps) MPEG video presentation, we must have corresponding
hardware support.

Media Sharing at the server side. In our system, the server can support
multiple clients. The data rates at the server side are so high that, despite
the development of an efficient retrieval strategy, database I/O can still be
a potential bottleneck. This problem limits the number of concurrent clients
that can be supported by the system. A media sharing technique [KRT95] can
reserve the buffers that have been played by a user in a controlled fashion for use
by subsequent users requesting the same data. We can implement this technique
at the server side to improve the performance of the system.

Presentation Adaptation, Media Synchronization and Media Composition.
Techniques to provide presentation adaptation, media synchronization
and media composition are all useful additions to our multimedia application
framework. Presentation adaptation techniques are needed to maintain the intra-media
synchronization of a presentation in best-effort systems under varying available
resources. We have implemented a simple adaptation mechanism in our system
that should be improved to achieve better performance. Media synchronization
and media composition are needed for multiple media presentations.
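
The working point proposal for JUMP listed above can be sketched as follows. The working point names, positions and window size are hypothetical; the sketch only shows that a JUMP whose target is a predefined working point can find its first frames already buffered, while a JUMP to an arbitrary position cannot.

```python
def frames_for_working_points(working_points, window):
    """Frames to preload: the first `window` frames after each working point."""
    frames = set()
    for start in working_points.values():
        frames.update(range(start, start + window))
    return frames


def jump_hits_buffer(target, buffered, window):
    """True if the first `window` frames at the jump target are all buffered."""
    return all(frame in buffered for frame in range(target, target + window))


# Hypothetical named working points, given as frame numbers.
working_points = {"opening": 0, "interview": 300, "closing": 900}
buffered = frames_for_working_points(working_points, window=12)

hit_at_working_point = jump_hits_buffer(300, buffered, window=12)
hit_at_arbitrary_position = jump_hits_buffer(450, buffered, window=12)
```

The price is buffer space: each working point reserves a window of frames that may never be played, so the number of working points would have to be balanced against the buffer size.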

Bibliography

[AUG95] Sankt Augustin. VODAK V4.0 User Manual. GMD Technical Report

No. 910, April 1995.

[BKL96] S. Boll, W. Klas and M. Lohr. Integrated Database Services for Multimedia
Presentations. In Multimedia Information Storage and Management,
Kluwer Academic Publishers, 1996.

[CAF91] S. Christodoulakis, N. Ailamaki, M. Fragonikolakis, et al. An Object Oriented
Architecture for Multimedia Information Systems. In IEEE Data
Engineering, 14(3), pp. 34-41, September 1991.

[CGS95] S. Chaudhuri, S. Ghandeharizadeh, and C. Shahabi. Avoiding Retrieval
Contention for Composite Multimedia Objects. In Proceedings of the 21st
VLDB Conference, pp. 122-129, Zurich, Switzerland, 1995.

[CPS95] S. Cen, C. Pu, R. Staehli, et al. A Distributed Real-Time MPEG Video
Audio Player. In Proceedings of NOSSDAV'95, pp. 99-107, April 18-21, 1995.

Z. Chen, S. Tan and R. Campbell. Real Time Video and Audio in the
World Wide Web. In World Wide Web Journal, Volume 1, Number 1,
December 1995, pp. 333-348.

J. Dey, J. Salehi and J. Kurose. Providing VCR Capabilities in Large-Scale
Video Servers. In Proceedings of ACM Multimedia, pp. 134-142, San
Francisco, October 1994.

A. Dan, D. Sitaram and P. Shahabuddin. Scheduling Policies for an On-Demand
Video Server with Batching. IBM Research Report RC 19381, 1994.

A. Dan and D. Towsley. An Approximate Analysis of LRU and FIFO
Buffer Replacement Schemes. In Proceedings of ACM SIGMETRICS
Conference 1990, pp. 143-149, 1990.

W. Effelsberg and T. Haerder. Principles of Database Buffer Management.
In ACM Transactions on Database Systems, 9(4):560-595, 1984.

J. Gemmell and S. Christodoulakis. Principles of Delay-Sensitive Multimedia
Data Storage and Retrieval. In ACM Transactions on Information
Systems, 10(1), pp. 53-59, January 1992.

[GZ96a] S. Gollapudi and A. Zhang. Buffer Management in Multimedia Database
Systems. In The Third IEEE International Conference on Multimedia
Computing and Systems (ICMCS'96), pp. 87-95, Hiroshima, Japan, June 1996.

[GZ96b] S. Gollapudi and A. Zhang. NetMedia: A Client-Server Distributed Multimedia
Database Environment. In the 1996 International Workshop on
Multimedia Database Management Systems, pp. 102-110, Blue Mountain
Lake, New York, August 1996.

[HKR97] S. Hollfelder, A. Kraiß and T. C. Rakow. A Client-Controlled Adaptation
Framework for Multimedia Database Systems. In European Workshop
on Interactive Distributed Multimedia Systems and Telecommunication
Services (IDMS'97), pp. 187-192, September 10-12, Darmstadt, Germany.

[HL97] S. Hollfelder and H. Lee. Data Abstractions for Multimedia Database
Systems, 1997. GMD Technical Report.

[HOL97] Silvia Hollfelder. Admission Control for Multimedia Applications in
Client-Pull Architectures. In International Workshop on Multimedia
Information System (MIS), pp. 23-32, Como, Italy, Sept. 25-27, 1997.

[HSH97] S. Hollfelder, F. Schmidt and M. Hemmje. Transparent Integration of
Continuous Media Support into a Multimedia DBMS. GMD Technical
Report (Arbeitspapiere der GMD) No. 1104, St. Augustin, Germany,
December 1997.

[IBM97a] DB2 Relational Extenders. IBM white paper.
http://www.software.ibm.com/data/pubs/papers/.

[IBM97b] DB2 Object-Relational Solution. IBM white paper.
http://www.software.ibm.com/data/pubs/papers/.

[INF97] Michael Stonebraker. Architecture Options for Object-Relational
DBMSs. Informix white paper.
http://www.informix.com/informix/corpinfo/zines/whiteidx.htm.

[INF97b] Michael Stonebraker. Object-Relational DBMS - The Next Wave.
Informix white paper.
http://www.informix.com/informix/corpinfo/zines/whiteidx.htm.

[ISOSS] Hypermedia/Time-based Structuring Language: HyTime (ISO 10744).
International Standard Organization.

[JZSS] T. V. Johnson and A. Zhang. A Framework for Supporting Quality-Based
Presentation of Continuous Multimedia Streams. In the Fourth
IEEE International Conference on Multimedia Computing and Systems
(ICMCS'96), Ottawa, Canada, June, 1997.

A. Kraiß. An Object Manager for Continuous Data Within the
OODBMS VODAK (in German). In GMD-Studien 256, Darmstadt, 1994.

[KRT95] M. Kamath, K. Ramamritham and D. Towsley. Continuous Media Sharing
in Multimedia Database Systems. In Proceedings of the Fourth
International Conference on Database Systems for Advanced Applications
(DASFAA'95), Singapore, April 10-13, 1995.

Didier Le Gall. MPEG: A Video Compression Standard for Multimedia
Applications. In Communications of the ACM, Vol. 34, No. 4, April 1991,
pp. 45-68.

T. Little and A. Ghafoor. Network Considerations for Distributed Multimedia
Object Composition and Communication. In IEEE Network Magazine,
pp. 32-49, 1990.

T. Little and A. Ghafoor. Synchronization and Storage Models for Multimedia
Objects. In IEEE Journal on Selected Areas in Communications,
8(3):413-427, April 1990.

[MKK95] F. Moser, A. Kraiß, and W. Klas. L/MRP: A Buffer Management Strategy
for Interactive Continuous Data Flow in a Multimedia DBMS. In
Proceedings of the 21st VLDB Conference, Zurich, Switzerland, 1995.

[MPE2] Generic Coding of Moving Pictures and Associated Audio Information
- Part 2: Video (MPEG-2), ISO/IEC 13818-2 International Standard, 1996.

[NBE93] W. Niblack, R. Barber, and W. Equitz. The QBIC Project: Querying
Images by Content Using Color, Texture, and Shape. In SPIE 1993
International Symposium on Electronic Imaging: Science and Technology,
pp. 77-87, February 1993.

[NFS91] R. Ng, C. Faloutsos and T. Sellis. Flexible Buffer Management based on
Marginal Gains. In Proceedings of the 1991 ACM SIGMOD Conference,
pp. 379-396, 1991.

[NNW93] E.J. O'Neil, P.E. O'Neil and G. Weikum. The LRU/k Page Replacement
Algorithm for Database Disk Buffering. In Proceedings of the 1993 ACM
SIGMOD Conference, pp. 297-306, 1993.

[NT92] E.J. Neuhold and V. Turau. Database Research at IPSI. In SIGMOD
Record, 21(1):133-138, March 1992.

R. Ng and J. Yang. Maximizing Buffer and Disk Utilization for News On-Demand.
In Proceedings of the 20th International Conference on Very
Large Data Bases 1994 (VLDB'94), pp. 451-462, 1994.

T.C. Rakow, W. Klas and E.J. Neuhold. Research on Multimedia
Database Systems at GMD-IPSI. In IEEE Multimedia Newsletter
4(1):41-46, April '96.

T. Rakow, E. Neuhold, and M. Löhr. Multimedia Database Systems - The
Notions and the Issues. In Tagungsband GI-Fachtagung Datenbanksysteme
in Büro, Technik und Wissenschaft (BTW), Dresden, März 1995, S. 1-29.
Springer, Reihe Informatik Aktuell, Berlin 1995.

S. Rao, H. Vin and A. Tarafdar. Comparative Evaluation of Server-push
and Client-pull Architectures for Multimedia Servers. In NOSSDAV'96,
pp. 45-48, 1996.

D. Rotem and J. L. Zhao. Buffer Management for Video Database Systems.
In Proceedings of IEEE Data Engineering 1995, 18, pp. 45-50, 1995.

J. A. Schnepf, Y. Lee and L. Kang. Building a Framework for Flexible
Interactive Presentations. In Pacific Workshop on Distributed Multimedia
Systems (Pacific DMS'96), pp. 190-197, Hong Kong, June 1996.

G. Sacco and M. Schkolnik. Buffer Management in Relational Database
Systems. In ACM Transactions on Database Systems, 11(4), pp. 473-498,
1986.

R. Staehli, J. Walpole and D. Maier. Quality of Service Specifications
for Multimedia Presentations. In Multimedia Systems, August 1995.

H. Thimm and W. Klas. Playout Management - An Integrated Service
of a Multimedia Database Management System, 1995. Technical Report,
GMD-IPSI.

Glossary

AMOS       Active Media Object Stores
DBMS       Database Management System
FIFO       First In First Out
GoP        Group-of-pictures
IRT        Interactive Response Time
LFU        Least Frequently Used
LRU        Least Recently Used
L/MRP      Least/Most Relevant for Presentation
MM-DBMS    Multimedia Database Management System
MPEG       Moving Picture Experts Group
OGI        Oregon Graduate Institute
QoS        Quality of Service
UDB        Universal Database
