CONTINUOUS MEDIA SUPPORT FOR MULTIMEDIA
DATABASES
A thesis submitted to the
Department of Computing and Information Science
in conformity with the requirements for
the degree of Master of Science
Queen's University
Kingston, Ontario, Canada
September 1998
Copyright © Jun Su, 1998
National Library of Canada
Acquisitions and Bibliographic Services
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Abstract
Multimedia presentations demand specific support from database management systems.
The delivery of continuous media data from a database server to multiple destinations
over a network presents new challenges for buffer management in a DBMS.
It has to consider specific requirements like providing for continuity of presentation or
for immediate continuation of presentation after frequent user interactions. Different
media also have specific features that must be considered.
In this thesis we present a buffer management strategy for MPEG video presenta-
tions. It supports smooth presentation of MPEG video stored in the relational DBMS
DB2/UDB, and quick response to user interactions. Experiments show that our buffer
management strategy provides support superior to other strategies presented in the
literature. A framework to support complex multimedia presentation that is based
on DB2/UDB and its multimedia extenders is also presented.
Acknowledgments
I would like to thank my supervisor, Dr. Pat Martin, for his support, advice, feedback,
and above all, his patience. Without his guidance, this thesis could not have been
finished. I would also like to thank Gary Powley and Wendy Powley, for helping me
with my research and implementation; Rong Qiu and Hoiying Li, my good friends, for
giving me help whenever I needed it. Finally, I would like to thank the Department of
Computing and Information Science at Queen's University for their generous financial
support provided during my graduate studies.
Contents
1 Introduction
1.1 Motivation for the Research
1.2 Goals of Research
1.3 Outline of Thesis
2 Background
2.1 Multimedia DBMS and Buffer Management Strategy
2.2 AMOS Multimedia DBMS at GMD-IPSI
2.3 Least/Most Relevant for Presentation
2.3.1 A General Model
2.3.2 Replacement and Preloading Strategy
2.4 MPEG Video and Audio Player at OGI
2.4.1 MPEG video standard
2.4.2 System Architecture
2.4.3 Software Feedback for Client/Server Synchronization
2.4.4 Software Feedback for QoS control
3 System Architecture
3.1 System Architecture
3.1.1 Session Manager and Client
3.1.2 Session Coordinator
3.1.3 DB2/UDB and Its Multimedia Extenders
3.2 Media Provider and Media Presenter
3.2.1 Media Provider and Media Presenter for MPEG video
4 Buffer Management Strategy
4.1 MPEG Video Presentation
4.1.1 Initialization
4.1.2 Buffer Manager
4.1.3 Decoder and Presenter
4.2 Buffer Management Strategy
4.2.1 Preloading Strategy
4.2.2 Replacement Strategy
5 Performance Study
5.1 Measurements
5.2 Comparing Strategies
5.3 Test Environment
5.4 Results and Observations
5.4.1 Smoothness
5.4.2 Interactive Response Time
5.4.3 Smoothness vs. Buffer Size
6 Conclusions
6.1 Contributions
6.2 Future Work
Bibliography
Glossary
Vita
List of Tables
2.1 Functionality of the synchronization feedback mechanism
2.2 Functionality of the QoS control feedback mechanism
5.1 MPEG Video Frames Model
5.2 Smoothness of AMOS and Our Strategy
5.3 Smoothness of OGI and Our Strategy
5.4 Interaction Response Time (ms)
List of Figures
2.1 General architecture of a MM-DBMS
2.2 Architecture of the AMOS MM-DBMS
2.3 Example state of a data flow
2.4 An example of interaction sets with relevance values
2.5 L/MRP Algorithm
2.6 Architecture of the OGI player
2.7 Structure of the synchronization feedback mechanism
2.8 Structure of the QoS control feedback mechanism
3.1 System Architecture
3.2 Session Manager and Client
3.3 Object-Relational Database System
3.4 Classification of Adaptation Mechanism
3.5 Media Provider and Media Presenter for MPEG video
4.1 MPEG Video Presentation Process
4.2 Data Flow at Client Side
4.3 An example of MPEG video stream
4.4 Buffer Management Algorithm
5.1 Smoothness Measurement
5.2 Smoothness vs. Buffer Size
Chapter 1
Introduction
1.1 Motivation for the Research
Multimedia presentations combine a wide range of media including audio, video, text,
images, and animation in a single presentation, and allow users to control the rate
and selection of media being played. They are among the most important multimedia
applications. The fusion of different media into multimedia presentations provides an
opportunity to communicate ideas more effectively and efficiently.
A multimedia database management system (MM-DBMS) provides the necessary
support for multimedia presentation. A MM-DBMS has the capability of storing,
managing and retrieving information on individual media, managing interrelationships
between the information represented by different media, and exploiting these
media for presentation purposes [RNL95].
Multimedia presentations demand specific support from a MM-DBMS. They require
the delivery of continuous media data from a database server to multiple destinations
over a network. To facilitate a hiccup-free presentation, the MM-DBMS
must ensure that an object is present in memory before it is displayed. If the loading
rate of a media stream from disk to memory is less than the delivery rate of the
media stream, then preloading of the stream prior to delivery is necessary to ensure
continuous presentation. Furthermore, an appropriate allocation and replacement
strategy must be provided to anticipate the demands of delays and user interactions.
Such a strategy must minimize the response time of multimedia presentations while
guaranteeing that all continuity requirements are satisfied.
Replacement strategies for conventional database applications, such as LRU (Least
Recently Used), FIFO (First In First Out), and LFU (Least Frequently Used)
([DT90], [EH84]), are not suitable for a multimedia database system. They do not
explicitly address the reference behaviour of interactive continuous data. Furthermore,
presentation scenarios can be constructed where these strategies have destructive
behaviour. We can show this by an example. Suppose our buffer uses the LRU strategy.
If we begin with an empty buffer with constant buffer size 15, then playing frames
1 to 20 of a video leads to a buffer state in which frames 6 to 20 are in the buffer
after the presentation has finished. If we next wish to play forward from frame 5 to
15, then frame 5 is not present in the buffer, which causes a buffer fault. LRU would
replace 6 to load 5. Now we request frame 6, and LRU would replace 7, and so on.
In this case LRU always replaces the frame that will be needed next. The reason for
this behaviour is that LRU does not consider any presentation-specific information.
For all the other general strategies similar examples of "destructive behaviour" can
be found. This behaviour is not limited to "constructed" examples: Moser et al.
[MKK95] show the average misbehaviour of LRU in their performance investigations.
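The LRU scenario above can be replayed with a few lines of Python. This is only an illustrative sketch: the function name `play_with_lru` and the use of an `OrderedDict` as the buffer are assumptions made for this example, not part of any system described in the thesis.

```python
from collections import OrderedDict


def play_with_lru(frames, buffer, capacity):
    """Present `frames` in order through an LRU buffer; return the fault count.

    `buffer` is an OrderedDict whose key order runs from least to most
    recently used. It is modified in place.
    """
    faults = 0
    for frame in frames:
        if frame in buffer:
            buffer.move_to_end(frame)       # mark as most recently used
        else:
            faults += 1                     # buffer fault: frame must be loaded
            if len(buffer) >= capacity:
                buffer.popitem(last=False)  # evict the least recently used frame
            buffer[frame] = True
    return faults


buffer = OrderedDict()
first_pass_faults = play_with_lru(range(1, 21), buffer, capacity=15)
frames_after_first_pass = sorted(buffer)    # frames 6 to 20 remain
second_pass_faults = play_with_lru(range(5, 16), buffer, capacity=15)
```

In the second pass every one of the 11 requested frames (5 to 15) causes a fault, because each load evicts exactly the frame that is requested next.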
Currently many multimedia systems employ the "Use&Toss" replacement strategy
[CAF91]. Each data page is free for replacement immediately after it is presented.
The drawback of this simple strategy is that data that may be referenced after an
interaction are not kept in the buffer. For example, if the user initiates a play backward
interaction from the play forward state, then all previously tossed data may have to
be reloaded into the buffer, which leads to increased response times.
Most buffer management strategies developed for multimedia database systems
only support generic models for continuous media, but we claim that media-specific
properties should be considered. A generic buffer manager treats video frames as
independent, atomic units and the preloading and replacement strategies are based
on this assumption. In MPEG (Moving Picture Experts Group) video presentation,
frame dropping is used because the data volume involved is very large, and the system
may not be able to present all the frames in time. Dropping one frame may affect
the following frames. Thus for MPEG video the dependencies between the single
frames have to be considered.
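To make the effect of frame dropping concrete, the sketch below propagates a dropped frame through one group of pictures, using the I-, P- and B-frame types of MPEG-1 described in Section 2.4.1. It rests on simplifying assumptions that are not taken from the thesis: frames are listed in display order, each P-frame depends on the nearest preceding I- or P-frame, each B-frame depends on the nearest preceding and following I- or P-frame, and trailing B-frames with no following reference inside the group depend only on the preceding one. The names `reference_indices` and `lost_frames` are hypothetical.

```python
def reference_indices(pattern, i):
    """Indices (display order) of the frames that frame i is predicted from."""
    kind = pattern[i]
    if kind == "I":
        return []                            # intra-coded: no references
    anchors_before = [j for j in range(i) if pattern[j] in "IP"]
    refs = anchors_before[-1:]               # nearest preceding I- or P-frame
    if kind == "B":
        anchors_after = [j for j in range(i + 1, len(pattern))
                         if pattern[j] in "IP"]
        refs = refs + anchors_after[:1]      # nearest following I- or P-frame
    return refs


def lost_frames(pattern, dropped):
    """Frames that cannot be decoded once frame `dropped` is discarded."""
    lost = {dropped}
    changed = True
    while changed:                           # repeat until no new frame is lost
        changed = False
        for i in range(len(pattern)):
            if i not in lost and any(r in lost
                                     for r in reference_indices(pattern, i)):
                lost.add(i)
                changed = True
    return sorted(lost)


GOP = "IBBPBBPBBPBB"
only_b = lost_frames(GOP, 1)        # dropping a B-frame affects nothing else
after_first_p = lost_frames(GOP, 3) # dropping the first P-frame
after_last_p = lost_frames(GOP, 9)  # dropping the last P-frame
```

Under these assumptions, dropping a B-frame costs one frame, while dropping the first P-frame makes every frame of the group except the I-frame undecodable, which is why a buffer manager for MPEG video cannot treat frames as independent units.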
MPEG is a widely accepted international standard. In particular, MPEG-1 plays an
important role in multimedia applications. Some researchers [CPS95] have studied
the specific features of MPEG-1, but did not consider the impact of these features on
multimedia presentation. Other researchers [HL97] have considered the effect of the
features of MPEG-1 on multimedia presentation, but could not model MPEG-1 data
very well. We investigate a specific buffer management strategy for MPEG-1 video
presentation in this thesis.
A relational database uses tables to represent entities and uses keys to represent
relationships. An instance of an entity is represented as a row of the table. The
standard applications of a relational database range from high-volume online transaction
systems to query-intensive data warehouse applications. Multimedia data like
audio and video are stored as binary large objects (BLOBs) in a relational database.
User-defined data types and functions are used to support multimedia applications.
1.2 Goals of Research
The main goals of the research are:
• To study efficient buffer management strategies for delivering MPEG video data
  from a relational database. The strategies must also support user interaction.
• To implement a buffer management strategy and study its performance in
  a client-server environment where IBM DB2/UDB [IBM97b] is used to store
  MPEG video at the server side.
• To propose a framework to support complex multimedia presentations.
The particular problems and issues addressed in this thesis are:
• How to preload video data to maintain a smooth presentation.
• How to replace video data in order to quickly respond to user interactions.
1.3 Outline of Thesis
The remainder of this thesis is organized as follows:
• Chapter 2 discusses the background material required for our work.
• Chapter 3 describes the architecture of a system to support multimedia presentation.
  The subsystem for MPEG video presentation is also presented.
• Chapter 4 presents the buffer management strategy for MPEG video presentation.
• Chapter 5 discusses the implementation of the buffer management strategy and
  presents a performance study of the strategy.
• Chapter 6 concludes the thesis with a review of the contributions of the work
  and a discussion of future research directions.
Chapter 2
Background
In this chapter, we first introduce multimedia database management systems (MM-DBMSs)
and their buffer management strategies. Then we give an overview of the "AMOS Multimedia
DBMS at GMD-IPSI". We present their "least/most relevant for presentation"
(L/MRP) buffer management strategy from which we derived our strategy for MPEG
video presentation. Finally we present an MPEG video player developed at OGI that
provides the basis for our implementation.
2.1 Multimedia DBMS and Buffer Management
Strategy
A multimedia database management system (MM-DBMS) has the capability of stor-
ing, managing and retrieving information on individual media, managing interre-
lationships between the information represented by different media, and exploiting
these media for presentation purposes. The basic constituents of a MM-DBMS are
the following:
• Multimedia Data Modeling. Standard datatypes are not adequate to reflect the
  structure of multimedia data. New built-in datatypes like image and audio and
  a notion of stream for presentation and capture purposes are needed. In addition
  to the datatypes, type constructors that allow one to deal with temporal
  relationships are also helpful.
• Content-Based Retrieval. Retrieval in multimedia databases must include the
  type of queries known from the field of traditional databases as well as retrieval
  functionality (such as full text search) known from the field of information
  retrieval. In the case of videos, content-based search means the ability to search
  for a specific fragment of a video that starts with a given scene, or includes given
  objects.
• Continuous Storage Management. To provide timely delivery, continuous data
  streams may be directed from the storage components to the consuming component
  (viewer, application) bypassing other layers of the multimedia database
  system. This avoids additional overhead but does not allow any further processing
  (selection of portion, scaling, etc.) of the data by the database system.
• Architecture. Figure 2.1 [NT92] shows a multimedia application that uses the
  service of the DBMS to retrieve multimedia objects from the database, to manipulate
  them, to transport them over the network and finally to present them
  at the user's workstation. A transport protocol that implements the continuous
  flow of data along with a mechanism for continuous control is of central importance
  for an efficient management of presentation and capture functionality
  throughout the whole system.
Figure 2.1: General architecture of a MM-DBMS [diagram; recoverable labels: application program, MM-DBMS client, MM-DBMS server, manipulation objects, data flow, control flow]
Buffer management within the multimedia database system is essential to ensure
the maintenance of the intra- and inter-stream synchronization requirements of multimedia
data presentations. To facilitate a hiccup-free presentation, we must ensure
that an object is present in memory before it is displayed. If the loading rate of a media
stream from disk to memory were less than the delivery rate of the media stream,
preloading of the stream prior to delivery would be necessary to ensure continuous
presentation. Furthermore, an appropriate allocation and replacement strategy must
be provided to anticipate the demands of delays and user interactions. Such a strategy
must minimize the response time of multimedia presentations while guaranteeing
that all continuity and synchronization requirements are satisfied.
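The preloading requirement can be quantified with a small worked calculation. The sketch below assumes, for illustration only, constant loading and delivery rates and loading that continues during playback. A stream of S bytes delivered at rate D plays for S/D seconds, during which L·S/D bytes can be loaded at rate L, so at least S·(1 − L/D) bytes must already be in memory when playback starts. The function name `min_preload_bytes` and the example figures are hypothetical.

```python
def min_preload_bytes(stream_bytes, load_rate, delivery_rate):
    """Smallest preload (bytes) that avoids starvation at constant rates.

    Rates are in bytes per second. Assumes loading continues while the
    stream is being delivered.
    """
    if load_rate <= 0 or delivery_rate <= 0:
        raise ValueError("rates must be positive")
    if load_rate >= delivery_rate:
        return 0.0                      # loading keeps up; no preload needed
    return stream_bytes * (1.0 - load_rate / delivery_rate)


# Illustrative numbers: a 120 MB stream, loaded at 1 MB/s, delivered at 2 MB/s.
preload = min_preload_bytes(120_000_000, 1_000_000, 2_000_000)
startup_delay_seconds = preload / 1_000_000   # time spent preloading
```

With these numbers half of the stream (60 MB) has to be preloaded, which takes 60 seconds at the assumed loading rate. Real media streams have variable bit rates, so this is a lower-bound style estimate rather than a scheduling rule.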
Research involving buffer management in multimedia database systems is still in
its infancy [TK95]. Moser et al. [MKK95] have proposed a buffer strategy termed
"least/most relevant for presentation". This buffer strategy investigates the effects
of such user interactions as "rewind" and "fast forward" on buffer design. A mechanism
is proposed which reduces the delay after user interactions. Chaudhuri et al.
[CGS95] have investigated the problem of continuously displaying composite objects
that are dynamically specified at the server level. Techniques based on simple sliding
and buffered sliding are proposed which support continuous display by partial
prefetching of overlapping media objects. Such an approach is preferable to the naive
strategy of prefetching the entirety of overlapped media objects. Gollapudi [GZ96]
has investigated the minimum buffering requirements that are necessary to guarantee
the continuity and synchrony of the presentation of multimedia data. A prefetching
technique that satisfies the minimum requirements has also been worked out.
2.2 AMOS Multimedia DBMS at GMD-IPSI
The Integrated Publication and Information Systems Institute (IPSI) is one of the
eight institutes of GMD - German National Research Center for Information Technology.
Research on MM-DBMS at IPSI started at the end of the 1980s. In 1992,
the department Active Media Object Stores (AMOS) was founded to foster developments
in this research area [NT92]. Currently, concepts are implemented in the
AMOS MM-DBMS prototype and are elaborated by means of international research
projects as well as industrial projects. The accomplishments to date of the AMOS
prototype include the following: the design of multimedia data types, the modeling
of meta information, support for multimedia presentations, and the development of
an object manager for continuous objects.
AMOS allows the free composition of different media into a new multimedia prod-
uct - a multimedia presentation. Any combination of both continuous media, such
as audio, video, and text, as well as non-continuous media, such as a picture, can be
arranged in one multimedia presentation. This calls for a model of a presentation that
includes defined temporal dependencies between the media, defined time intervals in
which media are presented to the user as well as media specific characteristics such
as initial playback volume of an audio.
The essential concepts of the modeling and solutions can be summarized as follows:
• Spatial and Temporal Composition: The description of a presentation
  reflects all possible temporal relationships between the media and a possible
  spatial position and overlapping of two-dimensional media on the screen.
• Time Line: The selected representation mirrors the entire temporal course of
  the multimedia presentation. The solution to keeping track of a presentation is
  to store information only about changes during a presentation.
• Interaction Capabilities: One of the main features of a presentation on the
  client's site is the user interaction in the course of a presentation. Interactions
  are treated and modeled as normal media.
• Coarse and Fine Synchronization: Timing requirements are subdivided into
  fine and coarse synchronization. The coarse synchronization ensures that the
  time line representation of the presentation is put into action. Coarse synchronization
  relies on the mere schedule of the presentation stored in the description.
  The fine synchronization, however, obeys the maximum permissible deviation
  from reference media.
• Presentation Parameters: Initial settings such as playback volume for an
  audio, the playback speed of a video, and the like, are modeled.
A presentation is composed on a high definition level using a definition tool such
as HyTime [ISO92]. The script-like description of the presentation is transferred
from the MM-DBMS server to the client, is interpreted there, and the presentation
is shown to the user as desired. The interpreting component at the client manages
the preparation, startup and termination of the individual media presentations that
belong to the integrated multimedia presentation.
The architecture of the AMOS MM-DBMS is shown in Figure 2.2 [RKN96]. The
VODAK DBMS [AUG95], which was also developed at GMD-IPSI, is used for the
modeling and storage of discrete data. Thus, data types of the VODAK Modeling
Language (VML) are exchanged between client and server. The main tasks at the
server are data storage, scheduling, and continuous object management (COM), which
uses external media servers for storage. This architecture enables easy integration of
specific hardware such as real-time servers, tape or magneto-optical jukeboxes, and
CD-ROM devices. The Continuous Transport Manager (CTM) enables access to
Internet protocols as well as Asynchronous Transfer Mode (ATM).
The "Spatial, Temporal and Interaction Script" Interpreter (STI) interprets the
time line-based scripts (as generated from a VML schema) and triggers the Multimedia
Playout Manager (MPM). This module is responsible for the handling of the
client's environment, the synchronized playback, and the scheduling of concurrent
requests. The MPM controls single media presenters (SPMs) which are implemented
by utilizing libraries available for the client's platform, for example, a virtual audio
device, a motion JPEG player, and an animation tool. An SPM gets its data from
the COM. When the application asks for time-dependent data, this request is sent to
the VODAK DBMS via the VODAK remote API. Time-dependent data are sent via
COM to the application. The COM initiates asynchronous replacement and loading
of distributed continuous data and adaptation in case of server and network delays.
Figure 2.2: Architecture of the AMOS MM-DBMS [diagram; recoverable labels: VODAK remote API, Continuous Transport Manager, control data, continuous data, VML schema/application, continuous database]
In contrast to traditional object managers, the delivery of data like audio and
video is "just in time". Object management enables an application to access data by
loading these objects from disk into a main memory buffer and by replacing objects
no longer needed. Updated objects are written back to disk. In a client/server based
environment, objects must additionally be transported to the main memory of the
requesting client. Because of their large volume, time-dependent data cannot be
entirely loaded into the buffer. These objects are therefore decomposed into small
blocks (for example audio sequences, frames) which are loaded and replaced in the
buffer. Loading and replacement of objects is done by a strategy called Least/Most
Relevant for Presentation (L/MRP) [MKK95], which is explained in more detail in
the next section.
2.3 Least/Most Relevant for Presentation
Least/Most Relevant for Presentation (L/MRP) is a buffer management strategy that
considers specific requirements such as continuity of presentations and immediate
continuation of presentations after frequent user interactions. It is especially suitable
for supporting highly interactive multimedia presentations.
To explain the general idea of L/MRP, we use the presentation snapshot illustrated
in Figure 2.3. Since the continuous objects are too large to be stored in the
client buffer as a whole, they are segmented into a sequence of manipulation units,
called Continuous Object Presentation Units (COPUs). Single COPUs are requested
continuously by the buffer manager. All COPUs of a continuous object are indexed
from 0 to n-1, where n denotes the total number of COPUs. We denote the direction
and skip parameter of a presentation by a single signed skip value. A positive value
indicates the forward direction and a negative value indicates the backward direction.
The absolute value of the skip value denotes the step between consecutively presented
COPUs, so that |skip| - 1 COPUs are skipped between them. In
Figure 2.3, the skip value is +2, which means the presentation moves forward and
every second COPU is skipped. We can also identify three different COPU types
in Figure 2.3: COPUs located in the reverse direction (History), COPUs to be referenced
in future (Referenced) and COPUs to be skipped because the absolute value of
the skip value is greater than one (Skip). L/MRP introduces the notion of a relevance
function that assigns a value to every element of a set denoting its significance for
replacement and preloading. L/MRP makes use of the relevance values in the way
that least relevant COPUs are replaced and most relevant COPUs will be preloaded.
The relevance value of a COPU also depends on specific presentation parameters like
the index of the currently presented COPU.
Figure 2.3: Example state of a data flow [diagram; legend: current presentation point p, presentation direction, COPU in reverse direction, COPU to be referenced, COPU to be skipped]
Figure 2.4 shows the sets of History, Referenced and Skip COPUs for the given
presentation point (508) and a skip value of +2. Each COPU, identified by the
index on the x-axis, is associated with a relevance value reflecting the importance of
the COPU with respect to the specific interaction set. Let us assume that due to
some previous ongoing presentations, the COPUs with underlined numbers are in the
buffer. The least relevant COPUs, that is the COPUs with the lowest relevance value,
at this moment are 527, 525, 523, 528, 521, 500, 526, etc. Thus, the next replacement
candidates are 527, 523, 500, etc. The most relevant COPUs for presentation are
508, 510, 512, 514, etc. Thus, the next preloaded COPUs will be 510, 514, etc.
Figure 2.4: An example of interaction sets with relevance values [plot of relevance value (0 to 1) against COPU index; legend: History, Referenced, Skip]
2.3.1 A General Model
Let CO (continuous object) denote the sequence of all COPUs that constitute a
continuous object. An element $c_i$, $i = 0, \ldots, |CO| - 1$, denotes the COPU with index
$i$ within the continuous object CO. The state of a presentation is characterized by a
tuple $s = \langle p, skip \rangle$, with $p \in \{0, \ldots, |CO| - 1\}$ denoting the index of the COPU
at the current presentation point and $skip \in \mathbb{Z}$, $skip \neq 0$, denoting the skip value.
COPUs are related to one or more interaction sets $A_s$. Each set $A_s$ has an associated
criterion that is used to decide whether or not a COPU belongs to the set at a specific
point of a presentation, and to specify the relevance value of the COPU with respect
to $s$. Hence, an interaction set $A_s$ is defined as a binary relation relating a COPU to a
relevance value. For example the interaction set $Referenced_s$, with $s = \langle 508, +2 \rangle$,
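as visualized in Figure 2.4, is:
\[ Referenced_s = \{(c_{508}, v_0), (c_{510}, v_1), (c_{512}, v_2), \ldots\} \]
where each $v_i \in [0, 1]$ stands for the relevance value plotted for that COPU in Figure 2.4.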
To denote the relevance of a COPU within a given interaction set $A_s$, a distance
relevance function $d_A$ is defined. The domain of a distance relevance function is the
relative distance of individual COPUs to the current presentation point. Function
$d_A$ maps distances to values in $[0, 1]$:
\[ d_A : \mathbb{N}_0 \rightarrow [0, 1] \]
$d_A(i)$ is the relevance value of a COPU with distance $i$ to any possible presentation
point $p$. The distance relevance values describe the degree of importance to keep
COPUs in the buffer. For example, the distance function $d_{Referenced}$ for all future
referenced COPUs describes the degree of importance to keep specific COPUs in the
buffer because of the high probability to be accessed in the current presentation. A
distance relevance function value of 1 means that the COPU is most relevant for
presentation, and, hence, is not to be considered as a candidate for replacement, but
has to be preloaded by L/MRP, if necessary.
The formal definition of the interaction set $A_s$ for a presentation state $s$ is:
\[ A_s = \{(c_j, d_A(i)) \mid c_j \in CO,\ j = g(i, s),\ i \in \mathbb{N}_0\} \]
The index $j$ of a COPU to be considered in $A_s$ is determined by a function $g$ which
depends on the distance $i$ of the COPU and the current state $s$. The relevance value
for a COPU $c_j$ is determined by the distance relevance function $d_A$.
To compare the relevances of COPUs with respect to the whole continuous object
we introduce the relevance function $r_{A_s}$ for an interaction set $A_s$:
\[ r_{A_s} : CO \rightarrow [0, 1], \qquad r_{A_s}(c) = \begin{cases} v & \text{if } (c, v) \in A_s \\ 0 & \text{otherwise} \end{cases} \]
The relevance function can be obtained by projection on the second position of the
respective interaction set, if a COPU is considered there; otherwise a value of zero is
assigned.
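The interaction sets $Referenced_s$, $Skip_s$ and $History_s$ are defined as follows:
\[ Referenced_s = \{(c_j, d_{Referenced}(i)) \mid c_j \in CO,\ i \in \mathbb{N}_0,\ j = p + i \cdot skip\} \]
\[ Skip_s = \{(c_j, d_{Skip}(i)) \mid c_j \in CO,\ |skip| > 1,\ i \in \mathbb{N}_0,\ j = p + \frac{skip}{|skip|} \cdot \left(i + 1 + \left\lfloor \frac{i}{|skip| - 1} \right\rfloor\right)\} \]
\[ History_s = \{(c_j, d_{History}(i)) \mid c_j \in CO,\ i \in \mathbb{N}_0,\ j = p - (i + 1) \cdot \frac{skip}{|skip|}\} \]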
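The index arithmetic behind the three interaction sets can be checked with a short Python sketch. It assumes the index functions j = p + i·skip for Referenced, j = p + sign(skip)·(i + 1 + ⌊i/(|skip| − 1)⌋) for Skip, and j = p − (i + 1)·sign(skip) for History. The linear distance relevance function `linear_decay`, the `horizon` cut-off and all function names are hypothetical choices made for illustration; the thesis does not prescribe them.

```python
def linear_decay(i, span=20):
    """Hypothetical distance relevance function: 1 at distance 0, 0 beyond `span`."""
    return max(0.0, 1.0 - i / span)


def interaction_sets(p, skip, n, horizon, d=linear_decay):
    """Map each interaction set name to {COPU index: relevance value}.

    p: index of the current presentation point; skip: signed, non-zero skip value;
    n: number of COPUs in the continuous object; horizon: distances 0..horizon-1.
    """
    if skip == 0:
        raise ValueError("skip must be non-zero")
    direction = 1 if skip > 0 else -1        # skip / |skip|
    step = abs(skip)
    referenced, skipped, history = {}, {}, {}
    for i in range(horizon):
        j = p + i * skip                     # COPUs that will be presented
        if 0 <= j < n:
            referenced[j] = d(i)
        if step > 1:                         # COPUs jumped over
            j = p + direction * (i + 1 + i // (step - 1))
            if 0 <= j < n:
                skipped[j] = d(i)
        j = p - (i + 1) * direction          # COPUs behind the presentation point
        if 0 <= j < n:
            history[j] = d(i)
    return {"Referenced": referenced, "Skip": skipped, "History": history}


sets = interaction_sets(p=508, skip=2, n=1000, horizon=4)
```

For the state ⟨508, +2⟩ the sketch yields the Referenced indices 508, 510, 512, 514, the Skip indices 509, 511, 513, 515 and the History indices 507, 506, 505, 504, matching the most relevant COPUs named for Figure 2.4.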
2.3.2 Replacement and Preloading Strategy
The L/MRP algorithm is initiated by the COM at every request to reference a COPU
for presentation. The next replacement victim during a presentation is the COPU $c$
available in the buffer with the minimum value $r_{CO}(c)$. The COPU $d$ with maximum
value $r_{CO}(d)$ is the next COPU that has to be preloaded if it is not yet present in
the buffer. The algorithm is given in Figure 2.5, where BUFFER denotes the set of
COPUs present in the buffer.

    GetNextCopuToBePresented(s) : Pointer to COPU
    begin
    (1) for all c in CO with relevance value = 1 do
        begin                                    // preload the most relevant COPUs
            if c not in BUFFER then              // buffer fault
            begin
                if |BUFFER| = Buffer_Size then   // buffer full
                begin
                    // replace the least relevant COPU v
    (2)             find v in BUFFER with the least relevance value
                    replace v by preloading c
                end
                else
                    // just load c into buffer
                    preload c into BUFFER
            end
        end
        return buffer address of COPU c
    end
Figure 2.5: L/MRP Algorithm
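The replacement and preloading loop of Figure 2.5 can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the AMOS implementation: the buffer is modeled as a set of COPU indices, relevance values are supplied as a precomputed dictionary, ties are broken by the lower index, and the name `lmrp_request` and the relevance values in the usage example are hypothetical.

```python
def lmrp_request(buffer, capacity, relevance, most_relevant):
    """One pass of an L/MRP-style preload/replace loop.

    buffer: set of COPU indices currently held (modified in place)
    capacity: maximum number of COPUs the buffer can hold
    relevance: dict COPU index -> relevance in [0, 1]; missing entries count as 0
    most_relevant: COPU indices with relevance 1, in presentation order
    Returns a list of (loaded_index, evicted_index_or_None) pairs.
    """
    actions = []
    for c in most_relevant:                  # statement (1) of Figure 2.5
        if c in buffer:
            continue                         # already present: no buffer fault
        victim = None
        if len(buffer) >= capacity:          # buffer full
            # statement (2): pick the least relevant COPU in the buffer
            victim = min(buffer, key=lambda v: (relevance.get(v, 0.0), v))
            buffer.remove(victim)
        buffer.add(c)                        # preload c
        actions.append((c, victim))
    return actions


# Illustrative state: presentation point 508, skip +2, buffer of four COPUs.
buffer = {500, 508, 521, 527}
relevance = {508: 1.0, 510: 1.0, 512: 1.0, 500: 0.3, 521: 0.2, 527: 0.1}
actions = lmrp_request(buffer, 4, relevance, [508, 510, 512])
```

Here COPU 510 replaces 527 and COPU 512 replaces 521, while 508 is already buffered. The sketch does not guard against a buffer so small that a most relevant COPU becomes the victim; a fuller implementation would need to handle that case.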
The algorithm guarantees that the next COPU to be presented is in buffer. In
statement (1) of the algorithm, for a presentation state s, the COPUs checked are
2.4 MPEG Video and Audio Player at OGI 21
(ci, r ) E Re f erenced,, where i = p+ h-skzp with h = 0,1,2, . . . , f while 1 denotes the
number of COPUs t o be prefetched. In statement (2) the distance relevance functions
are used to compute the relevance values for COPUs in the BUFFER.
Our buffer management strategy is derived from L/MRP. We extend it with specific
features to better support MPEG video presentation.
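The replacement and preloading loop of Figure 2.5 can be sketched in Python as follows. This is a minimal illustration, not the original implementation: the buffer is modelled as a set of COPU indices, and `make_relevance` is a hypothetical stand-in for the distance relevance functions, which are not reproduced here. It assumes the buffer holds more COPUs than are prefetched.

```python
from typing import Callable, Set


def make_relevance(p: int, skip: int, prefetch: int) -> Callable[[int], float]:
    """Hypothetical relevance function: 1.0 for referenced COPUs, decaying otherwise."""

    def relevance(c: int) -> float:
        steps, remainder = divmod(c - p, skip)
        if remainder == 0 and 0 <= steps <= prefetch:
            return 1.0  # COPU lies on the current presentation path
        return 1.0 / (1.0 + abs(c - p))  # assumed distance-based decay

    return relevance


def get_next_copu_to_be_presented(
    p: int, skip: int, prefetch: int, buffer: Set[int], buffer_size: int
) -> int:
    """Preload the referenced COPUs p, p + skip, ..., p + prefetch * skip.

    Returns the index of the COPU at the presentation point, which is
    guaranteed to be in `buffer` afterwards.
    """
    relevance = make_relevance(p, skip, prefetch)
    for h in range(prefetch + 1):  # statement (1): COPUs with relevance 1
        c = p + h * skip
        if c in buffer:
            continue  # no buffer fault
        if len(buffer) >= buffer_size:  # buffer full
            victim = min(buffer, key=relevance)  # statement (2)
            buffer.remove(victim)
        buffer.add(c)  # stands in for the actual preload
    return p
```

With a buffer of four COPUs holding {0, 1, 2, 3}, a presentation point of 3 and two COPUs of prefetch, the sketch evicts COPUs 0 and 1, the ones farthest behind the presentation point, and loads 4 and 5.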
2.4 MPEG Video and Audio Player at OGI
Shanwei Cen et al. at the Oregon Graduate Institute of Science and Technology have
designed and implemented a distributed, real-time MPEG video and audio player
[CPS95]. The player is designed for use across the Internet, a shared environment
with variable traffic and with great diversity in network bandwidth and host processing
speed. They used a toolkit approach to build software feedback mechanisms for
client/server synchronization, dynamic Quality-of-Service control, and system
adaptiveness.
2.4.1 MPEG video standard
The MPEG video compression algorithm [LEG91] relies on two basic techniques:
block-based motion compensation for the reduction of temporal redundancy, and
transform domain-based compression (DCT) for the reduction of spatial redundancy.
The idea of motion compensation is to encode a video frame based on other video
frames temporally close to it. Typically, the image in a video stream does not change
much within small time intervals.
In MPEG-1 [LEG91], three types of frames (pictures) are used to encode video:
Intra- (I-), Predicted (P-) and Bi-directional (B-) frames. Intra-frames are encoded
independently, without reference to any past or future frames. Predicted frames
are encoded relative to a past reference frame, namely an I- or P-frame. Bi-directional
frames are encoded relative to both preceding and following reference
frames. A video sequence is composed of a sequence of groups of pictures (GoPs);
each GoP contains a frame sequence with a fixed pattern, such as IBBPBBPBBPBB.
The GoP structure enables random access within a sequence. Usually a GoP is an
independently decodable unit that can be of any size. A GoP is closed
if its frames have no references to other GoPs, and open otherwise. To play back
an MPEG video, at least the GoP information and the independent I-frames have to be
available. Different MPEG-1 videos may have different GoP patterns,
but they all obey the same rules. The MPEG-1 video we used in our study has the
GoP pattern IBBPBBPBBPBB, but our algorithm works for any
other pattern.
The ISO standard MPEG-2, established in 1994, is designed to produce higher
quality movies at higher bit rates. The concept is similar to MPEG-1, but includes
extensions to cover a wide range of applications. The primary application aimed at
during the MPEG-2 definition process was all-digital transmission of broadcast TV
quality video at coded bit rates between 4 and 9 Mbps. The most significant enhancement
over MPEG-1 is the addition of syntax for the efficient coding of interlaced video
[MPE2]. Other key features of MPEG-2 are the scalable extensions, which permit the
division of a continuous video signal into two or more coded bit streams representing
the video at different resolutions, picture qualities, or picture rates.
2.4.2 System Architecture
Figure 2.6 shows the architecture of the player. The player has five components: a
video server (VS), an audio server (AS), a client, and video and audio output devices.
VS manages video streams. AS manages audio streams. The client is composed of
a video decoder and a controller, which controls the playback of both video and audio
streams and provides a user interface. The client, VS and AS reside on different
hosts, communicating via a network.
[Figure 2.6 diagram; recoverable labels: Client, Controller, Video Stream, Audio Stream, Feedback]
Figure 2.6: Architecture of the OGI player
A program for the player is a video and audio stream pair: <video-host:video-file,
audio-host:audio-file>, where a video stream is a sequence of frames, and an
audio stream is a sequence of samples. These two streams are recorded strictly
synchronously. We refer to a contiguous subsequence of audio samples corresponding
to a video frame as an audio block. Therefore, there is a one-to-one correspondence
between video frames and audio blocks.
During playback of a program, VS and AS retrieve the video and audio streams
from their storage and send them to the client at a specified speed. The client buffers
the streams to remove network jitter, decodes video frames, resamples audio, and
plays them to the video and audio output devices respectively.
Programs can be played back at variable speed. Play speed is specified in terms
of frames per second (fps). The player plays a program in real time by mapping
its logical time (defined by sequence numbers for each frame/block) into system time
(real time, in seconds) on the client's host machine. Suppose the system time at which
frame(i) is displayed is $T_i$, and the current play speed is P fps; then the time at which
frame(i+1) is played is $T_{i+1} = T_i + 1/P$. VS and AS also map the program's logical time
into their own system time during the retrieval of the media streams. Synchronization
between audio and video streams is maintained at the client by playing audio blocks
and displaying video frames with the same sequence number at the same time.
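The mapping from logical time to system time amounts to adding one frame period, 1/P seconds, per sequence number. The following Python sketch illustrates this; the function name and arguments are illustrative and are not taken from the OGI player.

```python
from typing import List


def display_times(start_time: float, play_speed_fps: float, num_frames: int) -> List[float]:
    """Return the system time (in seconds) at which each frame is displayed.

    start_time      system time at which the first frame is displayed
    play_speed_fps  current play speed P in frames per second
    num_frames      number of consecutive frames to schedule
    """
    if play_speed_fps <= 0:
        raise ValueError("play speed must be positive")
    if num_frames <= 0:
        return []
    period = 1.0 / play_speed_fps  # the 1/P step between consecutive frames
    times = [start_time]
    for _ in range(num_frames - 1):
        times.append(times[-1] + period)
    return times
```

At 25 fps each frame is scheduled 0.04 seconds after its predecessor. Audio blocks would use the same schedule, since a block carries the sequence number of its video frame.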
If any stage of the video pipeline, from VS through the network and client buffer
to the decoder, does not have sufficient resources to support the current quality-of-service
(QoS) specification, it can decide independently to drop frames. The controller
of the client also drops late frames (frames which arrive after their display time). A
similar approach is implemented for the audio pipeline.
The user QoS specification is currently restricted to display frame rate. The
display frame rate is the number of frames per second displayed by the client. A
valid display frame rate is always equal to, or lower than, the current play speed.
There are a number of serious problems to be solved in this system. These problems
include client/server clock drift, insufficient effective bandwidth to meet the
user-specified QoS, and stalls and skips in the pipeline. A software feedback mechanism
was adopted to solve these problems. A feedback mechanism monitors the
output or internal state of the system under control, compares it to the goal specification,
and feeds the difference back to adjust the behavior of the system itself.
2.4.3 Software Feedback for Client/Server Synchronization
The synchronization mechanism is implemented in the client, as shown in Figure 2.7.
It measures the current client time, Tc, and the server time, Ts, as observed at the
client, and computes the raw server work ahead time, Trswa = Ts - Tc. Trswa is input
to a low-pass filter, F1, to eliminate high-frequency jitter and get the server work
ahead time, Tswa. The control algorithm then compares Tswa with the target server
work ahead time, Ttswa, and takes action accordingly.

Table 2.1: Functionality of the synchronization feedback mechanism

    Event            Condition                                   Feedback Action
    Tswa too low     Tswa < [coefficient illegible] Ttswa        Speed up Cs rate or skip Cs
    Tswa too high    Tswa > [coefficient illegible] Ttswa        Slow down Cs rate or stall Cs
    Ttswa too low    Ttswa < [coefficient illegible] K x Jnet    Double Ttswa
    Ttswa too high   Ttswa > K x Jnet                            Halve Ttswa

[Figure 2.7 diagram; recoverable label: System under control]
Figure 2.7: Structure of the synchronization feedback mechanism

Ttswa in turn is determined by the current network delay jitter level. The jitter
of the measured current server work ahead time, |Trswa - Tswa|, is fed to another
low-pass filter, F2, to get the network delay jitter, Jnet. Jnet is then used to compute
Ttswa.

Table 2.1 describes the functionality of the synchronization feedback mechanism.
Cs refers to the VS clock, and K > 0 is a constant. Whenever the control algorithm
detects that Tswa has deviated too far from Ttswa, it adjusts the VS clock rate by
skipping it or stalling it for a certain amount of time, to bring Tswa back to Ttswa.
Each time the VS clock is adjusted, the mechanism backs off for a certain amount of
time to let the effect of the adjustment propagate back to the feedback signal input.
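The measurement and comparison steps of the synchronization feedback can be sketched as below. This is an illustration under stated assumptions rather than the OGI implementation: the low-pass filters are modelled as exponential moving averages, the threshold coefficients `low_frac` and `high_frac` are placeholder values chosen for the example, and the adaptation of the target work ahead time and the back-off timer are omitted.

```python
from typing import Optional


class LowPassFilter:
    """First-order low-pass filter (exponential moving average); the form is assumed."""

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, sample: float) -> float:
        if self.value is None:
            self.value = sample
        else:
            self.value += self.alpha * (sample - self.value)
        return self.value


class SyncFeedback:
    def __init__(self, target_swa: float, low_frac: float = 0.5,
                 high_frac: float = 1.5, alpha: float = 0.25) -> None:
        self.t_tswa = target_swa    # target server work ahead time
        self.low_frac = low_frac    # assumed coefficient, not taken from the text
        self.high_frac = high_frac  # assumed coefficient, not taken from the text
        self.f1 = LowPassFilter(alpha)  # smooths the raw work ahead time
        self.f2 = LowPassFilter(alpha)  # smooths the jitter estimate
        self.j_net = 0.0

    def step(self, t_server: float, t_client: float) -> str:
        """Process one measurement and return the suggested action."""
        t_rswa = t_server - t_client          # raw server work ahead time
        t_swa = self.f1.update(t_rswa)        # filtered work ahead time
        self.j_net = self.f2.update(abs(t_rswa - t_swa))  # network delay jitter
        if t_swa < self.low_frac * self.t_tswa:
            return "speed_up_or_skip"
        if t_swa > self.high_frac * self.t_tswa:
            return "slow_down_or_stall"
        return "no_action"
```

A caller would invoke `step` once per measurement and suppress further adjustments during the back-off period after an action.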
2.4.4 Software Feedback for QoS control
The QoS control feedback mechanism is also implemented in the client, as shown
in Figure 2.8. Initially, the target frame rate, Ft, at which VS sends frames is set
to the user-specified frame rate, Fu. The feedback mechanism monitors the display
frame rate at the client and uses a low-pass filter to remove transient noise. The
filtered display frame rate, Fd, is then compared against Fu and the existing Ft by
the control algorithm. If the pipeline is found to be under or over loaded, a new Ft
value is computed and fed back to VS.
[Figure 2.8 diagram; recoverable labels: System under control, Display frame rate]
Figure 2.8: Structure of the QoS control feedback mechanism
The control algorithm adjusts Ft linearly. The functionality of the feedback mechanism
is described in Table 2.2. Tl, Th and Δ are three parameters: the low and high
thresholds and the adjustment step, where Tl > 0, Th > 0, Δ > 0 and Th - Tl > Δ. These
parameters, as well as the back-off time after a feedback action, are respecialized
upon play speed change. The back-off time is also adapted to the server work ahead
time measured in the synchronization feedback mechanism.

Table 2.2: Functionality of the QoS control feedback mechanism

    Event                   Condition                     Feedback Action
    Pipeline over-loaded    Fd < Ft - Th                  Ft = Ft - Δ
    Pipeline under-loaded   Fd > Ft - Tl and Fd < Fu      Ft = Min(Ft + Δ, Fu)
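The linear adjustment of the target frame rate can be written as a single function. The sketch below is an illustration, not the OGI code: the parameter names are chosen for the example, and capping the increase at the user-specified rate Fu is this sketch's reading of the under-loaded rule.

```python
def adjust_target_rate(f_t: float, f_d: float, f_u: float,
                       t_low: float, t_high: float, delta: float) -> float:
    """Return the new target frame rate Ft.

    f_t     current target frame rate sent by VS
    f_d     filtered display frame rate observed at the client
    f_u     user-specified frame rate
    t_low   low threshold (Tl), t_high: high threshold (Th), delta: step size
    """
    if not (t_low > 0 and t_high > 0 and delta > 0 and t_high - t_low > delta):
        raise ValueError("parameters must satisfy Tl, Th, delta > 0 and Th - Tl > delta")
    if f_d < f_t - t_high:
        return f_t - delta              # pipeline over-loaded: back off
    if f_d > f_t - t_low and f_d < f_u:
        return min(f_t + delta, f_u)    # pipeline under-loaded: probe upward
    return f_t                          # within the dead band: no change
```

The constraint Th - Tl > Δ leaves a dead band between the two rules, so a single step in either direction does not immediately trigger the opposite rule.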
Chapter 3
System Architecture
This chapter presents the architecture of our framework to support multimedia applications
and provides an overview of the various components that make up the system.
We also present the design of a subsystem to support MPEG video presentation. Our
buffer management strategy is implemented and tested in this subsystem.
3.1 System Architecture
The system architecture is illustrated in Figure 3.1. Multimedia data is stored in
DB2/UDB and accessed using its multimedia extenders. The system has a set of
Session Managers and Clients, which are under control of a Central Coordinator. A
Session Manager - Client pair is created for each application; it handles multiple
media streams and deals with a specific multimedia application, such as Video-on-Demand,
News-on-Demand, Media Editing Workbenches, and so on. The number of
Session Manager - Client pairs is limited only by the amount of system resources. The
function and configuration of each Session Manager - Client pair are specific to the
media they support. The system can therefore be extended to support new media
types and new multimedia applications.
[Figure 3.1 diagram; recoverable labels: Network, Session Manager (several instances), Session Coordinator, DB2/UDB, MM Extenders]
Figure 3.1: System Architecture
3.1.1 Session Manager and Client
The Session Manager provides real-time retrieval of multimedia data from the database
and transfer of the data to the Client over a network. The Client is responsible for
requesting data from the Session Manager and delivering it to the presentation devices
on the client workstation.
A multimedia application may involve multiple media streams. For example, a
Video-on-Demand application may need one video stream, one audio stream and one
text stream. Thus a Session Manager should be able to support data caching and
scheduling of multiple media streams, and the Client should be able to synchronize
multiple media streams. The configuration of a Session Manager - Client pair is shown
in Figure 3.2.
[Figure 3.2 diagram; recoverable labels: Presentation Coordinator, Media Presenter, Network, Media Provider, Media Coordinator]
Figure 3.2: Session Manager and Client
A Session Manager contains one or more Media Providers, which are under control
of a Media Coordinator. A Media Provider deals with data retrieval, caching and
transfer of a single media stream such as video, audio or animation. The number of
Media Providers is determined by the application. The Media Coordinator coordinates
the work of multiple Media Providers. For example, if multiple Media Providers
contend for system resources, then the Media Coordinator must provide some control
mechanism to ensure all of the providers are satisfied.
A Client contains one or more Media Presenters, which are under control of a
Presentation Coordinator. A Media Presenter works cooperatively with a Media
Provider to deal with data requests, caching and real-time presentation of a single
media stream. The Presentation Coordinator coordinates the presentation of multiple
Media Presenters.
3.1.2 Session Coordinator
The Session Coordinator plays multiple roles in this system, including Admission
Control, Resource Administration, Media Sharing and Batching. In its Admission
Control role, the Session Coordinator determines whether to accept or refuse a
request from a user according to the current usage of system resources. If a request is
accepted, a Session Manager - Client pair is created to handle the application. In its Resource
Administration role, the Session Coordinator distributes system resources among all
the Session Managers to meet their requirements. Media Sharing [KRT95] is
a technique in which buffers that have been played back by a user are preserved in
a controlled fashion for use by subsequent users requesting the same data. This
technique can be used when sufficient buffer space is available at the server side to
retain data for the required duration. In this way the system can avoid fetching
the data from the disk again for the lagging user, and it is possible to support a
larger number of sessions than permitted by the disk bandwidth. Batching [DSS94]
is another technique for improving the performance of a system by grouping requests
that arrive for the same topic within a short duration of time.
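Two of these roles, admission control and batching, can be illustrated with a short Python sketch. It is not the thesis implementation: the class and function names are hypothetical, admission is decided on a simple session count as a stand-in for a real resource usage check, and a batch is assumed to collect requests that arrive within a fixed window after the first request for a title.

```python
from typing import Dict, List, Set, Tuple


class SessionCoordinator:
    """Admission control on a simple session count (assumed resource model)."""

    def __init__(self, max_sessions: int) -> None:
        self.max_sessions = max_sessions
        self.active: Set[str] = set()

    def admit(self, client_id: str) -> bool:
        if len(self.active) >= self.max_sessions:
            return False  # refuse: no resources left for another session
        self.active.add(client_id)
        return True

    def release(self, client_id: str) -> None:
        self.active.discard(client_id)


def batch_requests(requests: List[Tuple[float, str]],
                   window: float) -> List[Tuple[str, List[float]]]:
    """Group (arrival_time, title) requests for the same title into batches."""
    batches: List[Tuple[str, List[float]]] = []
    open_batches: Dict[str, Tuple[float, List[float]]] = {}
    for arrival, title in sorted(requests):
        current = open_batches.get(title)
        if current is not None and arrival - current[0] <= window:
            current[1].append(arrival)  # joins the batch already open for this title
        else:
            arrivals = [arrival]
            open_batches[title] = (arrival, arrivals)
            batches.append((title, arrivals))
    return batches
```

Each batch could then be served by a single stream from the database, which is the saving that batching aims for.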
3.1.3 DB2/UDB and Its Multimedia Extenders
An object-oriented database management system (OO-DBMS) is a much more natural
basis than a relational DBMS for implementing the functions necessary to manage
multimedia data. However, OO-DBMSs have not had a significant impact in the
database market. Two reasons for this lack of impact are that most of the current
OO-DBMSs lack maturity as database systems and that they are not sufficiently
compatible with relational DBMSs [INF97b].
Leading enterprise DBMS vendors are offering a new kind of database system - an
object-relational database system (OR-DBMS), which combines the best features of
OO-DBMSs and relational DBMSs. Figure 3.3 shows the relationships between these
DBMSs [INF97b]. IBM DB2/UDB is such an OR-DBMS and can provide better
support for multimedia data.
[Figure 3.3 diagram; recoverable labels: Set-Based, Nonset-Based, Simple data, Complex data, Relational DBMS, Object-Relational DBMS, Hierarchical/Network DBMS, Object-Oriented DBMS]
Figure 3.3: Object-Relational Database System
Based on the object-relational facilities introduced by DB2/UDB, a set of multimedia
extenders was created by IBM to facilitate the development of multimedia
applications [IBM97a]. An extender encapsulates the attributes, structure and behavior
of new data types and stores them in a column of a DB2/UDB table, so
that they can be processed through SQL as a natural addition to the
standard set of DB2/UDB data types. Currently there are four kinds of extenders
available, namely a Text Extender, an Image Extender, a Video Extender and an Audio
Extender. These extenders provide powerful support for text, image, video and
audio, respectively. For example, the Text Extender encapsulates IBM's full-text
search technology, which supports synonym search, proximity search, Boolean search,
and wildcard search [IBM97a]. The Image Extender can look for an image that has
a particular color or pattern by using IBM's Query by Image Content (QBIC) technology
[NBE93]. A variety of multimedia formats are also supported for each type,
such as TIF, GIF and BMP for image, WAVE and MIDI for audio, and MPEG, AVI
and QuickTime for video. We make use of these multimedia extenders to manage
multimedia data in our system.
3.2 Media Provider and Media Presenter
The Media Provider and Media Presenter are the basic units of our system. A Media
Provider - Media Presenter pair supports a single media presentation. The combination
of multiple Media Provider - Media Presenter pairs can support a wide range of
multimedia applications. The Media Provider and Media Presenter are implemented
with threads in order to make efficient use of system resources.
A specific Media Provider and Media Presenter must be developed for each different
media stream, and the specific features and requirements of each medium should
be considered. For example, we must develop a specific Media Provider and Media
Presenter for each different video format, such as MPEG, Motion-JPEG, AVI,
QuickTime, and so on.
Two technical problems that must be considered are adaptation technology
and buffer management. Adaptation technology [HKR97] tries to dynamically
change the quality of a presentation after the system has detected bottlenecks in
data delivery. Reducing presentation quality leads to a reduced data volume to be
transported from the server to the client, and the presentation expects to keep up its
intra-media synchronization by reducing disk utilization, memory consumption and
used network bandwidth. In general, adaptation strategies have to be invoked if it can
be foreseen that a running presentation cannot keep up intra-media synchronization
assuming that the resource consumption remains constant.
Adaptation techniques can be classified along two dimensions (Figure 3.4): (1)
the method used for data reduction and (2) the effect the adaptation has on the
presentation. In field one, a video stream is adapted by dropping single frames. The
synchronization requirements are met by presenting the previously presented frame
each time the presentation detects a dropped frame in the stream. In field two, the
rate of presented COPUs is reduced by switching to another stream. For example,
the rate of samples presented in an audio stream can be reduced by switching to an
audio stream of a lower sampling rate. Field three uses switching to another data
stream for reducing the quality of single COPUs, but keeps the original display rate.
Finally, field four shows how the dropping of COPUs could lead to a reduction of the
quality of single COPUs. If the raw information of a continuous object is stored in
more than one stream, the COPUs of the basic stream can be loaded first. If the
system has enough time to load other streams, the next incremental stream can be
transferred to the client. Otherwise, the incremental streams are dropped and the
COPU quality will not be increased.
    Adaptation    Method Used
    Dimension     Within a Continuous Long Field                     Switch between Continuous Long Fields
    Time          (1) Frame Dropping (Video)                         (2) Reduction of Sampling Rate (Audio)
    Resolution    (4) Dropping of Enhanced Layers (Audio/Video)      (3) Quality Switching (Audio/Video)

Figure 3.4: Classification of Adaptation Mechanisms
The buffer management strategy [GZ96] is another critical technique. At the
server side, it can remove database delay jitter and improve the system performance.
At the client side, it can remove network delay jitter and minimize user interaction
response time while guaranteeing that all continuity and synchronization requirements
are satisfied.
3.2.1 Media Provider and Media Presenter for MPEG video
We have developed a Media Provider and Media Presenter to support continuous and
interactive MPEG-1 video presentations. The design and implementation consider
the following features:
• Continuity: The MPEG video must be presented continuously at a constant
  speed, such as 25 frames per second (fps).
• High data volume: The data volume involved in the presentation is very
  high: approximately 1.5 Mbps for MPEG-1 video, and between 4 and 9 Mbps
  for MPEG-2 video at the play speed of 25 fps.
• Frame dependency: The decoding of some frames of an MPEG video depends
  on other frames.
• User interaction: Users may pause or stop the play, change the play speed, or
  change the play direction during the presentation. The system should respond
  to user interactions as soon as possible.
To achieve these goals, MPEG video must be retrieved from the database, sent
over the network, decoded at the client side, and delivered to the presentation devices
(speaker and display) at a constant speed (for example 25 fps). Network bandwidth
and database bandwidth are essential to support the high data volume; otherwise
adaptation strategies must be used. An efficient buffer management strategy is the
key to supporting continuity, frame dependency and user interactions.
The architecture of the Media Provider - Media Presenter pair for MPEG-1 video
is illustrated in Figure 3.5. It is a client-server architecture. The Media Provider acts
as the server and is composed of a Buffer Manager and a Communication Manager.
The Buffer Manager retrieves video data from the database and transfers it to the
Communication Manager. It provides buffer management strategies to smooth database
delay jitter and reduce user interaction response time. The main strategy to deal with
user interaction is implemented at the client side, but the cooperation of the server
is necessary. The Communication Manager is responsible for receiving requests from
the client and sending back data over the network.
[Figure 3.5 diagram; recoverable labels: Media Presenter, Network, Media Provider]
Figure 3.5: Media Provider and Media Presenter for MPEG video
The Media Presenter acts as the client and is composed of a Buffer Manager,
a Communication Manager and a Video Decoder. The Buffer Manager at the client
side is the engine of the whole system. A special buffer management strategy is
implemented that retrieves video data from the server, buffers the video data in order
to smooth network and decoder delay jitter, resolves frame dependency, and makes
decisions to load, skip or replace data in the buffer. The Communication Manager
sends requests to the server and receives data from it. The Video Decoder gets
data from the Buffer Manager, decodes it and passes the decoded pictures to the
application for display. The UDP protocol is used to transfer video data from the
server to the client, and the TCP/IP protocol is used to transfer control data between
client and server. The Communication Manager can also be easily modified to work
over an ATM network.
The implementation is based on a client-pull architecture where continuous data
is passed to the client with a best-effort delivery. In a client-pull architecture, clients
request data from the server at the time it is needed. The advantage of the client-pull
architecture is that the client can react to user interactions and to performance
bottlenecks in order to keep its presentation intra-media synchronized. Some delay
time is needed for the client to send requests to the server and wait for the server to
respond, so all the requests should be sent in advance to account for the delay.
In a server-push architecture, once a session is started data are retrieved by the
server and transmitted continuously to a client without any intermediate client
requests. It is difficult to handle user interactions in such a system.
The best-effort approach is the simplest realization of multimedia presentations in
an open distributed environment. Each application is allowed to start a presentation
at any point in time without exclusively allocating resources. The drawback is that
no guarantee for the timeliness of data delivery can be given and temporary load
peaks may delay the presentations.
An alternative approach is resource reservation, which means that resources involved
in the presentation, like processor, disk, and network, are requested by the
client and dedicated to the presenter at presentation time. The drawback of this
approach is that the resources have to be exclusively controlled, which is unrealistic
for commonly used open distributed environments like the Internet and non-real-time
operating systems. Another disadvantage is that resources are wasted if a user pauses
or switches to slower presentation speeds, which may happen very often in our system.
We have therefore adopted the best-effort strategy. In order to keep up the intra-media
synchronization of a presentation in a best-effort system under varying available
resources, adaptation techniques have to be used. A simple adaptation technique is
used in our system, where frames are dropped selectively when the system cannot
sustain real-time presentation.
Chapter 4
Buffer Management Strategy
In this chapter we present our buffer management strategy to support interactive
MPEG video presentation. We first introduce the workflow of an MPEG video presentation
and then present the buffer management strategy.
4.1 MPEG Video Presentation
We discussed in Section 2.4.1 how the group of pictures (GoP) structure enables
random access within an MPEG video stream. We assign a sequential number, called
a GoP number, to each GoP. A sequential number is also assigned to each frame and
is denoted as a frame number. Alternatively a frame can be referred to using a frame
position, which consists of a GoP number and the frame's relative position within the
GoP.
Since the frame size of MPEG video is variable, it is not easy to randomly access
the frames. We have enhanced the functionality of the DB2/UDB video extender
to support random access of MPEG video frames. When a video is stored
into DB2/UDB, information about the video is extracted by the DB2/UDB video
extender and stored with the video. We additionally check the video type and, if it
is an MPEG video, we extract extra information including the number of GoPs, the
relative position of each GoP and the GoP pattern. This information about the GoPs
makes it possible to randomly access MPEG video frames.
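The conversion between a frame number and a frame position, and the use of the stored GoP information to locate a frame, can be sketched as follows. The sketch assumes zero-based numbering and a fixed GoP pattern for the whole video; the `GopIndex` structure and its byte offsets are hypothetical and do not describe the actual layout used by the video extender.

```python
from typing import List, NamedTuple, Tuple


class GopIndex(NamedTuple):
    pattern: str            # GoP pattern, for example "IBBPBBPBBPBB"
    gop_offsets: List[int]  # hypothetical byte offset of each GoP in the stored video


def frame_position(frame_number: int, index: GopIndex) -> Tuple[int, int]:
    """Convert a frame number into (GoP number, relative position within the GoP)."""
    if frame_number < 0:
        raise IndexError("frame number must be non-negative")
    gop_number, relative = divmod(frame_number, len(index.pattern))
    if gop_number >= len(index.gop_offsets):
        raise IndexError("frame number beyond the end of the video")
    return gop_number, relative


def locate_frame(frame_number: int, index: GopIndex) -> Tuple[int, str, int]:
    """Return (byte offset of the enclosing GoP, frame type, relative position)."""
    gop_number, relative = frame_position(frame_number, index)
    return index.gop_offsets[gop_number], index.pattern[relative], relative
```

Because frame sizes vary, the offset of an individual frame inside its GoP still has to be found by scanning from the GoP start; the index only narrows the search to one GoP.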
A frame is the basic data unit in our system. The client requests one frame at a time,
and the server sends one frame at a time. By doing so the system can avoid wasting
network bandwidth in a highly interactive presentation environment. By default
the client must send a request for each frame it wants, but this does not waste
much network bandwidth or CPU time because a request is
very small.
4.1.1 Initialization
The workflow of an MPEG video presentation in our system proceeds as depicted in
Figure 4.1:
[Figure 4.1 diagram; recoverable labels: Client, Server, Session Manager, listen, request, accept, reject, notify, frame request, frame]
Figure 4.1: MPEG Video Presentation Process

The client initializes all its processes first. Then it sends a request to the server
to initiate a presentation. The server accepts the request if doing so does not exceed its
maximum number of concurrent sessions. If the server accepts the request, it
initializes a new session to serve the client and sends a positive response to the client.
The client may then begin the presentation. The client first retrieves N GoPs into
its local buffer. The number N represents the number of GoPs that can be delivered
in the amount of time it takes to retrieve a frame from the server. In our case, N is
4 and it takes just under 2 seconds to retrieve a frame. Thus, in the absence of user
interactions, the client can continuously retrieve frames from the server and the next
frames to be presented are always loaded in the local buffer.
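The relationship between N and the retrieval time can be checked with a little arithmetic. The sketch below is an illustration only: it assumes the 12-frame GoP pattern IBBPBBPBBPBB and a play speed of 25 fps, both mentioned elsewhere in the text, and the text does not state that N was derived this way.

```python
import math


def buffered_playback_seconds(n_gops: int, frames_per_gop: int, play_speed_fps: float) -> float:
    """Playback time covered by n_gops buffered GoPs."""
    return n_gops * frames_per_gop / play_speed_fps


def min_gops_to_cover(retrieval_time_s: float, frames_per_gop: int, play_speed_fps: float) -> int:
    """Smallest number of whole GoPs whose playback time covers one retrieval."""
    frames_needed = retrieval_time_s * play_speed_fps
    return math.ceil(frames_needed / frames_per_gop)
```

Under these assumptions, four GoPs cover 4 × 12 / 25 = 1.92 seconds of playback, which agrees with a retrieval time of just under 2 seconds.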
To retrieve a frame the client sends a request to the server that specifies the frame
number. The server retrieves the desired frame from its local buffer. If the desired
frame is not in its local buffer, the server retrieves it from the database. Each time
the server goes to the database it retrieves a block of video data that includes the
desired frame. The server then delivers the frame to the client over the network.
The three components at the client side, namely the Buffer Manager, Decoder and
Presenter, work independently to process the same data flow, as shown in Figure 4.2.
[Figure 4.2 diagram; recoverable labels: Buffer Manager, Decoder, Presenter, frames, decoded frames]
Figure 4.2: Data Flow at Client Side
4.1.2 Buffer Manager
The Buffer Manager retrieves new frames from the server into its local buffer. The
Buffer Manager deals with two issues: the loading strategy and reacting to user
interactions. The first issue, the loading strategy, determines the next frame to
be loaded. The Buffer Manager should always first load the frame that will have the
greatest effect on the continuity of the presentation.
If there are already enough frames in the local buffer to ensure continuous presentation,
that is, more than N GoPs, the Buffer Manager simply loads the following
frames in sequence. If there are not sufficient frames, the frames are loaded according
to their priorities. Within one GoP, the I-frame has the highest priority, the P-frames
have the second highest priority, and the B-frames have the lowest priority.
The priority of a frame is also determined by its distance from the frame that is
currently being presented, which we call the presentation point. The nearer a frame
is to the presentation point, the higher its priority, since it has a higher probability of
being needed earlier. We assign a distance factor to each GoP denoting its distance
to the presentation point. The priority of a frame therefore is decided by its priority
within one GoP and the distance factor of its GoP. For example, suppose we assign
the priorities of the frames within one GoP as follows (for simplicity the consecutive
B-frames are assigned the same priority):

    I     B     B     P     B     B     P     B     B
    1     0.7   0.7   0.9   0.6   0.6   0.8   0.5   0.5

The equations to determine these priorities will be given later.
We assume there are three GoPs - GoPl, GoP2 and GoP3 - where GoPl is the
nearest to the presentation point, and GoP3 is the farthest from the presentation point.
We assign distance factors to these three GoPs as follows:
The equations to determine these factors will be given later.
Therefore, if GoP1 is already loaded, the priorities for all the frames of GoP2 and
GoP3 are computed as follows:

    GoP2                                      GoP3
    I     B-B   P     B-B   P     B-B         I     B-B   P     B-B   P     B-B
    0.9   0.63  0.81  0.54  0.72  0.45        0.8   0.56  0.72  0.48  0.64  0.4

The loading order of the frames of GoP2 and GoP3 is determined by the priorities.
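The priority computation in this example can be reproduced with a few lines of Python. The in-GoP priorities are the example values given above; the distance factors 0.9 for GoP2 and 0.8 for GoP3 are inferred here from the computed priorities (for example 0.63 = 0.7 × 0.9) and are an assumption of the sketch, as is the tie-breaking rule.

```python
from typing import Dict, List, Tuple

# Example in-GoP priorities for the pattern I B B P B B P B B.
PATTERN = ["I", "B", "B", "P", "B", "B", "P", "B", "B"]
IN_GOP_PRIORITY = [1.0, 0.7, 0.7, 0.9, 0.6, 0.6, 0.8, 0.5, 0.5]


def loading_order(distance_factors: Dict[str, float]) -> List[Tuple[str, int, str, float]]:
    """Return (GoP, position, frame type, priority) tuples, highest priority first.

    The priority of a frame is its in-GoP priority times the distance factor
    of its GoP. Ties keep GoP order and then frame order (assumed rule).
    """
    entries = []
    for gop, factor in distance_factors.items():
        for position, (frame_type, priority) in enumerate(zip(PATTERN, IN_GOP_PRIORITY)):
            entries.append((gop, position, frame_type, round(priority * factor, 2)))
    return sorted(entries, key=lambda entry: -entry[3])  # stable sort keeps tie order
```

Calling `loading_order({"GoP2": 0.9, "GoP3": 0.8})` places the I-frame of GoP2 first, then the first P-frame of GoP2, and then the I-frame of GoP3, so the reference frames of the farther GoP are loaded before the B-frames of the nearer one.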
The second issue the Buffer Manager must deal with is the problem of minimizing
the response time to user interactions, where a user interaction refers to a change of
presentation state. The common presentation states are normal play (PLAY), fast
forward play (FF), fast backward play (FB), reposition (JUMP), pause and resume.
In fast play mode (FF and FB), only I-frames and P-frames are presented; thus the
play speed is three times faster than normal play. One can also choose to
play only I-frames, so that the play speed is nine times faster than normal play.
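The speed-up factors follow from the fraction of frames that is presented. The sketch below assumes that the presented frames are displayed at the normal display rate and uses the nine-frame pattern I B B P B B P B B as the example; the function name is illustrative.

```python
def fast_play_speedup(pattern: str, presented_types: str) -> float:
    """Speed-up obtained when only frames of the given types are presented."""
    shown = sum(1 for frame_type in pattern if frame_type in presented_types)
    if shown == 0:
        raise ValueError("at least one frame type in the pattern must be presented")
    return len(pattern) / shown
```

For the nine-frame pattern, presenting I- and P-frames shows 3 of 9 frames (a factor of 3), and presenting only I-frames shows 1 of 9 (a factor of 9). For other patterns the factors differ; for the twelve-frame pattern IBBPBBPBBPBB an I-only play would give a factor of 12.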
The loading strategy does not change for FF play. The B- frames are still loaded
though they are not presented. They are loaded in case the user changes play state
from FF to normal play, so that the loaded B- frames can be used immediately. Thus
the response time for this interaction can be very small. This strategy increases the
demand on the system, but does not jeopardize the presentation. If the system cannot
afford to load all the data, the B- frames can be discarded first by the Buffer
Manager.
The Buffer Manager also provides a strategy to deal with FB play. Presented
frames are preserved in the local buffer for the amount of time it takes to retrieve
a frame from the server before they are tossed. If the user chooses FB play, these
preserved frames can be used immediately without any delay. At the same time the
loading engine can begin to load frames in the reverse direction. Thus the response
time for a FB user interaction is small.
We do not have an efficient strategy for JUMP because it is hard to predict
the new position. Whenever the user chooses this operation, we can only load the
needed frames and reconstruct the local buffer at that time, so the response time
is relatively large compared to the other operations. An alternative is to define
some working points [MKK95], and then restrict the user to jumping only to one of
these working points. Some frames for these working points are loaded in advance
to reduce the response time. When the user chooses the pause operation, the Buffer
Manager continues to work until the local buffer is full.
4.1.3 Decoder and Presenter
The Decoder is in the middle of the workflow. It decodes the video data loaded
by the Buffer Manager, and stores it in another buffer pool that is accessed by the
Presenter. The Presenter presents the video frames in real time. The Decoder and
Buffer Manager share the same buffer pool, which is protected by a semaphore.
Similarly, the buffer pool shared by the Decoder and the Presenter is also controlled by a
semaphore.
We do not explain the decoding process of a single frame here. More details can
be found in D. Gall [LEG91]. We instead introduce the decoding process for a GoP
whose pattern is I B1 B2 P1 B3 B4 P2 B5 B6:
• The I frame is decoded first;
• Then frame P1 is decoded because it depends only on the I frame;
• Then frames B1 and B2 are decoded because their reference frames I and P1
are available;
• Then frame P2 is decoded, which depends on P1 only;
• Then frames B3 and B4 are decoded;
• Then the I frame I2 of the next GoP is decoded;
• Finally frames B5 and B6 are decoded.
So the decoding order of a GoP is I P1 B1 B2 P2 B3 B4 I2 B5 B6.
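The reordering described above can be sketched as follows. This is an illustrative Python sketch, not the Decoder's actual code; the function name and the label I2 for the next GoP's I frame are assumptions. Each B frame is held back until the reference frame that follows it in display order has been decoded.

```python
def decode_order(display_order, next_i="I2"):
    """Reorder one GoP from display order into decoding order."""
    order, pending_b = [], []
    for frame in display_order + [next_i]:
        if frame.startswith("B"):
            pending_b.append(frame)   # wait for the following reference frame
        else:
            order.append(frame)       # I or P frame: decode it now
            order.extend(pending_b)   # then the B frames that were waiting on it
            pending_b = []
    return order


gop = ["I", "B1", "B2", "P1", "B3", "B4", "P2", "B5", "B6"]
print(" ".join(decode_order(gop)))
```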
The Presenter cannot present the frames in decoding order, so it must sort the
frames according to their original order. To facilitate this sorting task, the Decoder
must work ahead of the Presenter so that the Presenter has enough time to sort the
decoded frames. Otherwise frames may be discarded because they are out of order.
The Decoder can decide to drop frames in the following situations:
• The frame is too late, for example it should be presented before or immediately
after the frame that is being presented;
• The reference frames are not available, for example the P- frame immediately
after the B- frame which is to be decoded is not available;
• The frame is damaged.
The Presenter presents the video frames in sequence. If the desired frame is not
available, the previous one is repeated and the delayed frame is subsequently dropped.
Dropping frames has a great effect on system performance, so the Buffer Manager,
Decoder and Presenter should work cooperatively to ensure that most frames can
be presented on time.
4.2 Buffer Management Strategy
In this section we present our buffer management strategy to support continuous and
interactive MPEG video presentation. We consider the specific features of MPEG
video, like frame dependencies, in our design. The two main issues addressed by our
buffer management strategy are preloading and replacement of video data. Preloading
is necessary because of the non-real-time behaviour of the underlying system
components (for example storage devices and network). The data needed by the Decoder
and the Presenter must be in the buffer before they are requested. A load-on-demand
strategy cannot guarantee continuity and would lead to a jittery presentation. The
number of frames to be preloaded depends on the predicted loading time of the
database and network connections. The number of preloaded frames determines the
initial delay of a presentation. Strategies for quantifying this parameter are given by
R. Ng [NY94].
The main goal of a replacement strategy is to replace those frames in the buffer
that are not expected to be presented for the longest period of time in the future.
Assuming a single, non-interactive presentation of the continuous object, the strategy
of tossing a frame immediately after it is presented is optimal. In order to take the
interactivity of multimedia presentations into account, the replacement strategy has
to consider the effect of user interactions on the data flow. Once a user interaction
occurs, the Buffer Manager has to preload frames in order to guarantee continuity
before the presentation can continue. Thus, interaction response time is primarily
determined by the number of buffer faults occurring during the preloading phase
immediately after the interaction. In order to reduce the number of buffer faults, the
buffer management strategy has to consider potential interactions by keeping those
frames which are referenced after likely interactions.
Additional buffer space is required to support user interactions. Besides the
preloaded frames needed for continuity, the Buffer Manager must also keep those
frames that are referenced with high probability after interactions. It should be
possible to tune the buffer management strategy, with respect to its degree of support
for interactions, to minimize buffer consumption. The extreme case of no interaction
support is equivalent to a simple "Use&Toss" strategy [CAF91].
First we need to determine the number of frames to be preloaded. Since our
system is a client-pull architecture, the client must issue initial requests with some
overhead time to overcome network delivery and database retrieval delay. The amount
of overhead time required is from the time a request is issued at the client side to the
time the client receives the desired data. The overhead time, which we denote as t,
is estimated based on the performance of the server and network load. We can then
convert t to a number of frames, N, which equals t divided by the play speed S. N is
the number of frames to be preloaded.
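As a rough illustration of this conversion, the sketch below assumes that the play speed is expressed in frames per second, in which case the frame count is the product of the overhead time and the play speed (equivalently, t divided by the time one frame stays on screen). The function name and the sample values are hypothetical.

```python
import math


def frames_to_preload(overhead_seconds, play_speed_fps):
    """Number of frames that play back while one request is being served.

    Assumes play speed in frames per second; rounds up so that the preloaded
    frames always cover the whole overhead time.
    """
    return math.ceil(overhead_seconds * play_speed_fps)


# Hypothetical overhead of 0.5 s at a play speed of 12 fps.
print(frames_to_preload(0.5, 12))
```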
Figure 4.3: An example of MPEG video stream
In Figure 4.3 each circle denotes a frame. There are two GoPs with the pattern
IBBPBBPBBPBB. LI, LP and LB denote the next I, P and B frame to be loaded,
respectively, and RI, RP and RB denote the next I, P and B frame to be tossed,
respectively. We define preloaded to be the distance, that is the number of frames,
from the presentation point (PP) to the preloading point, which is the frame closest
to the presentation point among LI, LP and LB. We define presented to be the distance
between the presentation point and the replacement point, which is the frame closest
to the presentation point among RI, RP and RB. Before the presentation begins we
must load N frames. The Buffer Manager algorithm is shown in Figure 4.4.
do {
    if there is free buffer {
        if preloaded < N {        // adaptation is needed
            skip frames;
        }
        PRELOADING;
    } else {                      // no free buffer
        if preloaded < N {
            if presented > 0 {
                REPLACEMENT;
                PRELOADING;
            } else if the frame to be loaded is a B frame {
                skip this frame;
            }
        }
    }
    if presented > N {
        REPLACEMENT;
    }
} while (1);
Figure 4.4: Buffer Management Algorithm
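One iteration of the loop in Figure 4.4 can be expressed as a pure function that returns the actions to take. This is a sketch under assumptions: the function name, the argument names and the action labels are hypothetical, and the real Buffer Manager performs the actions rather than returning them.

```python
def buffer_manager_step(has_free_buffer, preloaded, presented, n, next_is_b_frame):
    """Return the actions one pass of the Figure 4.4 loop would take."""
    actions = []
    if has_free_buffer:
        if preloaded < n:
            actions.append("SKIP_FRAMES")      # adaptation is needed
        actions.append("PRELOADING")
    elif preloaded < n:                        # no free buffer and running short
        if presented > 0:
            actions += ["REPLACEMENT", "PRELOADING"]
        elif next_is_b_frame:
            actions.append("SKIP_FRAME")       # give up the next B frame
    if presented > n:
        actions.append("REPLACEMENT")          # keep at most N presented frames
    return actions


print(buffer_manager_step(False, preloaded=2, presented=3, n=5, next_is_b_frame=True))
```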
At any time we try to maintain at least N frames in the local buffer in order to ensure
a smooth presentation. We should also preserve about N presented frames in the local
buffer in order to respond efficiently to user interactions. For example, when a user
changes play direction from forward play to backward play, the reserved frames can
be used immediately. If there is not enough buffer space to maintain N preloaded
frames and N presented frames, we discard some of the presented frames. If buffer
space is still too small, we drop some preloaded frames.
4.2.1 Preloading Strategy
The three different kinds of frames in an MPEG video stream have different
importance. Within one GoP, the I frame is the starting point for decoding the following
frames, so it must be loaded into the local buffer first. P frames should be loaded
immediately after the I frame, because they are needed to decode B frames. The P
frames within a GoP are loaded in sequence, because the previous P frame is needed
to decode the following P frame. The B frames are loaded last provided that there
is sufficient buffer space and network bandwidth. In the case that frames must be
dropped, we drop B frames first, then P frames, and then finally I frames. If an I
frame or P frame is dropped, then the following frames within the same GoP should
all be dropped accordingly.
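The last rule can be made concrete with a small sketch. The function name is hypothetical, and the sketch implements only the rule as stated in this paragraph: dropping a B frame costs that frame alone, while dropping an I or P frame also costs every later frame of the same GoP.

```python
def frames_lost(gop_pattern, dropped_index):
    """Indices within the GoP that become unusable when one frame is dropped."""
    if gop_pattern[dropped_index] == "B":
        return [dropped_index]                            # nothing depends on a B frame
    return list(range(dropped_index, len(gop_pattern)))   # I or P: lose the rest too


pattern = "IBBPBBPBBPBB"
print(frames_lost(pattern, 2))   # a B frame
print(frames_lost(pattern, 9))   # the last P frame
```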
We assign priority values PI, PP and PB to LI, LP and LB, respectively.
Whenever we need to load a frame, we choose the one from LI, LP and LB with the highest
priority value. We calculate the priorities for the next frame of each type as follows.
We denote GoP as the number of frames in one GoP, NP as the number of P frames
in one GoP, and NB as the number of B frames in one GoP. The presentation point is
denoted as PP. As we discussed in Section 4.1.2, the priority of a frame is determined
by its priority within its GoP and the distance factor of its GoP.
To calculate a frame's priority, we first define the frame position of each loading
point, say NI, NP and NB, within a GoP as follows:
The frame priorities FI, FP and FB are then defined as follows:
where mp, mb, cp and cb are variables.
To calculate the distance factors, we first calculate the distance, in GoPs, of each
loading point from the presentation point, say DI, DP and DB, as follows:
DI = (LI - PP)/GoP
DP = (LP - PP)/GoP
DB = (LB - PP)/GoP
The distance factors SI, SP and SB are then defined as follows:
where ri, rp and rb are variables.
Finally the priority values PI, PP and PB are calculated as follows:
The variables mp, mb, cp, cb, ri, rp and rb must be tuned to achieve acceptable
performance for each particular system. In our system as presented in the following
chapter these variables are 0.9, 0.85, 0.05, 0.05, 0.1, 0.1 and 0.2, respectively.
4.2.2 Replacement Strategy
The replacement algorithm considers the dependencies among frames within one GoP.
In one GoP:
• a B frame can be tossed anytime;
• a P frame can be tossed if no dependent frames exist, which are either the
following frames within the same GoP or the two B frames immediately before
it;
• the I frame should be the last frame to be tossed.
During replacement, the I frames, P frames and B frames should be tossed in the
reverse of the order in which they were loaded. We denote the next I frame to be tossed as
RI, the next P frame to be tossed as RP and the next B frame to be tossed as RB.
Priority values PI, PP and PB are again assigned to RI, RP and RB, respectively.
Whenever we need to toss a frame, we choose the one from RI, RP and RB with the
lowest priority value.
The priority values for RI, RP and RB are calculated using the same equations as
those defined for LI, LP and LB, except that RI, RP and RB substitute for LI, LP and LB.
The variables in those equations are the same for both preloading and replacement
because they use the same mechanism.
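The selection step shared by preloading and replacement can be sketched as below. The function names and the priority values are hypothetical placeholders; in the strategy itself the values come from the priority equations of Section 4.2.1.

```python
def next_to_load(priorities):
    """Pick the loading point (I, P or B) with the highest priority value."""
    return max(priorities, key=priorities.get)


def next_to_toss(priorities):
    """Pick the replacement point (I, P or B) with the lowest priority value."""
    return min(priorities, key=priorities.get)


# Hypothetical priority values for the three candidate frames.
loading = {"I": 0.80, "P": 0.72, "B": 0.63}
replacement = {"I": 0.95, "P": 0.60, "B": 0.35}
print(next_to_load(loading), next_to_toss(replacement))
```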
Chapter 5
Performance Study
The objective of the performance study is to examine the smoothness and interactive
response time of MPEG video presentation using our buffer management strategy. We
also compare our strategy with two others. The results of these tests are presented
and discussed in this chapter.
5.1 Measurements
Two measures are used to evaluate our strategy. One is smoothness, which is the
deviation of presentation jitter [SWM95] from the desired value of zero. We assume
that the mapping of logical time (frame number) into system time is precise, because
the database delay and network delay are both constant. We also assume the delay
from the client to the video output can be ignored, because it is the same in all of our
experiments. The presentation jitter is measured in terms of logical display time.
Consider a video stream of frame sequences $(f_0, f_1, \ldots, f_n)$ and a playback
displaying a subsequence of these frames $(f_{i_0}, f_{i_1}, \ldots, f_{i_n})$. At each logical display time
$k$ ($k \ge 0$ and $k \le n$), we calculate the logical time error, $e_k = k - i_k$, between the
expected frame $f_k$ and the actually displayed frame $f_{i_k}$, where $i_k \le k$ and $i_{k+1} \ge k$,
producing the error sequence $E = (e_0, e_1, \ldots, e_n)$. The smoothness, S, of a playback
is the deviation of the sequence E from the perfect playback, which drops no frames
and has an error sequence of all zeroes. Thus S is defined as [SWM95]:
This definition of S is independent of play speed. A lower value of S indicates a
smoother playback. S equal to zero denotes perfect playback.
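The error sequence E can be computed directly from the set of frames that were actually displayed. The following is a minimal sketch with hypothetical names; it assumes that a missing frame is replaced by the most recently displayed one and that the first frame is always displayed.

```python
def error_sequence(frame_count, displayed):
    """Logical time error e_k = k - i_k for k = 0 .. frame_count - 1.

    `displayed` is the set of frame numbers that were shown on time.
    """
    if 0 not in displayed:
        raise ValueError("sketch assumes the first frame is displayed")
    errors, last_shown = [], 0
    for k in range(frame_count):
        if k in displayed:
            last_shown = k             # the expected frame itself is shown
        errors.append(k - last_shown)  # otherwise the previous frame is repeated
    return errors


# Twelve frames, the last three dropped.
print(error_sequence(12, set(range(9))))
```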
The other measure we use to evaluate the performance of our buffering strategy
is interaction response time (IRT), which is the delay between the occurrence of a
user interaction and the time when the system reacts to this interaction by continuing
with the presentation flow. It is a critical parameter for the use and acceptance of
multimedia systems. The typical interactions are:
• forward play (PLAY): all the frames are presented sequentially in forward
direction.
• fast forward play (FF): one of every three frames (all the I- and P- frames) is
presented in forward direction.
• fast backward play (FB): one of every three frames (all the I- and P- frames)
is presented in backward direction. The frame number of the following frame
is less than the previous one.
• reposition (JUMP): move the presentation point to any new place and start to
play the video there.
• rewind: move the presentation point to the start of the video and play.
• pause/stop: stop playing.
• resume: continue playing.
The response times we are going to measure include PLAY → FF, PLAY → FB, FF
→ PLAY, FB → PLAY, FF → FB, FB → FF and PLAY → JUMP. We calculate
IRT in the following way.
Suppose we are playing MPEG video in normal forward play (PLAY) mode and
that we want to calculate the IRT that it will take to change to fast forward play
(FF) mode. Once the user presses the FF button, how do we tell if the video is being
played in FF mode? In PLAY mode, we present the frames sequentially. In FF mode
every third frame is presented, that is only I frames and P frames are presented. The
IRT for this interaction is measured from the time the user presses the FF button
to the time we detect a number of successive frames that are all I- or P- frames and
that have the same gap of three frames. The IRTs for all the other interactions
are calculated in a similar manner.
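The detection rule for PLAY → FF can be sketched as follows. This is an assumption-laden illustration rather than the measurement code used in the experiments: the function name, the trace format (display time in milliseconds, frame number, frame type) and the sample trace are hypothetical, and the run length m is a parameter.

```python
def ff_response_time(press_time_ms, presented, m=5, gap=3):
    """Time from the button press until m successive I/P frames, each `gap`
    frame numbers after the previous one, have been presented.

    `presented` lists (time_ms, frame_number, frame_type) after the press.
    Returns None if no such run is found.
    """
    run, previous = 0, None
    for time_ms, number, kind in presented:
        if kind not in ("I", "P"):
            run = 0                        # a B frame means we are not in FF yet
        elif run > 0 and number - previous == gap:
            run += 1                       # run of reference frames continues
        else:
            run = 1                        # start a new run at this frame
        previous = number
        if run == m:
            return time_ms - press_time_ms
    return None


trace = [(1040, 10, "B"), (1080, 11, "B"), (1120, 12, "I"), (1160, 15, "P"),
         (1200, 18, "P"), (1240, 21, "P"), (1280, 24, "I")]
print(ff_response_time(1000, trace))
```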
5.2 Comparing Strategies
We compare our buffer management strategy with two other strategies. The first
one is L/MRP [MKK95], used in AMOS, which was developed at GMD - German
National Research Center for Information Technology. They also considered
media-specific modeling of MPEG video [HL97]. However, they adopted a different strategy
from ours to load and drop frames. The three possible strategies to load and drop
frames of MPEG video are shown in Table 5.1. Model a is used in our strategy, and
model c is used in L/MRP. Model a demonstrates a better performance than model
c as we discuss later.
a) | I | B | B | P | B | B | P | B | B | P | B | B |   Preloading by priority
b) | I B B P B B P B B P B B |                         Conventional preloading
c) | I B B | P B B | P B B | P B B |                   Preloading by priority
Table 5.1: MPEG Video Frames Model
In model a, each frame within a GoP is loaded and dropped independently. I-
frames have a higher priority than B- and P- frames. And P- frames have a higher
priority than B- frames. We adopt this model because we feel it is more flexible and
more efficient than the others.
In model b, each GoP is loaded or dropped as a unit. In our experiments, one
GoP includes 12 frames that can be presented for half a second, so dropping one GoP
would lead to unacceptable jitter.
Model c segments one GoP into pieces. Each segment starts with an I- or P-
frame. The segment that includes the I- frame must be loaded first since the other
segments are dependent on it. The segments act as the atomic preloading unit, which
means that a segment is loaded or dropped as a whole. This strategy is implemented
in the AMOS multimedia database system [HL97], so we denote this strategy as
AMOS.
The second strategy used in our comparison was developed at Oregon Graduate
Institute of Science and Technology. Shanwei Cen et al [CPS95] designed and
implemented a distributed real-time MPEG video player using a software feedback
mechanism. The system uses a server-push architecture. The server retrieves video
data according to the retrieval pattern it receives from the client and sends it to the
client at a constant speed. The retrieval pattern comes from the GoP pattern. If a
frame within a GoP is to be retrieved, the corresponding bit of the retrieval pattern
is set to 1; otherwise it is set to 0. The client simply consumes what it receives.
A feedback mechanism is used by the client to control client/server synchronization
and quality-of-service (QoS). When a bottleneck is detected at the client side, the
client calculates the retrieval pattern again and sends it to the server. The server
then retrieves data according to the new pattern. When the client calculates the
retrieval pattern, it always tries to evenly drop frames within a GoP. Thus it can
maintain a smooth presentation.
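The retrieval pattern described above is a bit per frame of the GoP. A minimal sketch, assuming the pattern is represented as a string of 0s and 1s (the representation and the function name are hypothetical and not taken from the OGI player):

```python
def retrieval_pattern(gop_pattern, wanted_types):
    """One bit per frame of the GoP: 1 if the frame is to be retrieved."""
    return "".join("1" if kind in wanted_types else "0" for kind in gop_pattern)


gop = "IBBPBBPBBPBB"
print(retrieval_pattern(gop, {"I", "P", "B"}))  # retrieve every frame
print(retrieval_pattern(gop, {"I", "P"}))       # retrieve reference frames only
```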
A simple buffer management strategy was implemented at the client side that
buffers the data received from the server in a single buffer queue. The decoder
consumes the buffered data sequentially and the presented frames are tossed immediately.
We denote this algorithm as OGI.
In OGI, MPEG videos are stored in disk files. The transfer speed of disk files is
much higher than that of a database, so the performance of OGI should be superior
to that of our system for this reason. In order to make a more even comparison
we altered the OGI system so that it can also retrieve video data from a relational
database. We denote this new version of the system as OGI*.
5.3 Test Environment
The performance study tests were conducted in an environment with the following
properties:
• The server runs on an IBM PowerStation 220. The video data is stored in IBM
DB2/UDB.
• The client runs on an IBM PowerPC.
• The server and the client are connected via a 10 Mbps Ethernet.
• The frame size of the MPEG video is 320 x 240. There are 9500 frames in total,
which are encoded at 30 fps. The average frame size is 4.79K bytes, and the
GoP pattern is IBBPBBPBBPBB.
• The buffer size at the client side is 512K bytes.
• Software decoding is used in all the systems.
To evaluate the smoothness of the four strategies (ours, OGI, OGI* and AMOS),
we played the default video stream at various play speeds without user interactions.
Each play was repeated 5 times, and a smoothness value was calculated after each
play. The average smoothness values are presented in the following section.
The same strategy is used to evaluate IRT. The default video stream was played
at a fixed play speed. Then we tried all kinds of user interactions during the play.
Each user interaction was repeated 10 times. The average IRT values are presented
in the following section.
5.4 Results and Observations
We present experiments to compare the smoothness and interactive response time
measurements of the four strategies. We also present experiments to show how the
buffer size affects the smoothness measurement of our system.
5.4.1 Smoothness
Figure 5.1 shows the smoothness of the four strategies. The vertical axis is the
smoothness measurement. The horizontal axis is the play speed of the presentation in
frames per second (fps). The play speed in the graph ranges from 5 to 16 fps. We
collected samples at each integer and each midpoint. When the play speed is under
5 fps, the smoothness measurements of all the strategies except OGI* are zero, which
means that they work perfectly. When the play speed is higher than 16 fps, which is
the maximum playing rate supported by our system, the smoothness measurements
of all the strategies except OGI increase dramatically. Our strategy outperforms all
the others when the presentation play speed is between 5 fps and 14 fps.
Our strategy shows better performance than AMOS due to our different loading
and replacement strategy. In our strategy each frame is dropped independently, so
we can choose to drop frames that have the least effect on other frames, that is
the B frames, and to drop the frames evenly within one GoP. AMOS, on the other
hand, must drop a segment at a time. The dropped segment also affects the following
frames.
Figure 5.1: Smoothness Measurement
For example, let AMOS drop its last segment, which has one P- frame and two B-
frames, and our strategy drop the last four B- frames of the pattern shown in Table
5.2. We drop one more frame than AMOS. Normally the size of a P- frame is larger
than that of a B- frame, and we want both strategies to drop about the same amount
of data, so we assume that the size of two B- frames will not be less than that of one P-
frame. In Table 5.2, the positions marked X in the pattern mean the corresponding
frames are dropped. The smoothness within one GoP is calculated using Equation
5.1 defined in Section 5.1. It is clear that our strategy is better than AMOS.
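The two smoothness values reported in Table 5.2 (2.16 for AMOS and 1.58 for our strategy) can be reproduced by taking the root mean square of the logical time errors over the dropped positions. This form is an assumption inferred from the tabulated numbers; it is not a restatement of Equation 5.1 or of the definition in [SWM95], and the function names are hypothetical.

```python
import math


def logical_errors(pattern):
    """Error e_k per position; 'X' marks a dropped frame that is replaced by
    the last displayed one. Assumes the first frame is not dropped."""
    errors, last_shown = [], 0
    for k, kind in enumerate(pattern):
        if kind != "X":
            last_shown = k
        errors.append(k - last_shown)
    return errors


def rms_over_dropped(pattern):
    """Root mean square of the non-zero errors (assumed form, see lead-in)."""
    nonzero = [e for e in logical_errors(pattern) if e > 0]
    if not nonzero:
        return 0.0
    return math.sqrt(sum(e * e for e in nonzero) / len(nonzero))


print(round(rms_over_dropped("IBBPBBPBBXXX"), 2))  # AMOS loading pattern
print(round(rms_over_dropped("IBBPBBPXXPXX"), 2))  # our loading pattern
```

Dropping three consecutive frames gives errors 1, 2 and 3, whereas dropping two pairs of B frames gives errors 1, 2, 1 and 2, which is why spreading the drops yields the lower value even though one more frame is lost.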
           Loading Pattern            Smoothness
Pattern    I B B P B B P B B P B B
AMOS       I B B P B B P B B X X X    2.16
Ours       I B B P B B P X X P X X    1.58
Table 5.2: Smoothness of AMOS and Our Strategy
Our strategy outperforms OGI when the play speed is less than 14 fps. OGI is
a server-push system, and our strategy is based on a client-pull architecture. In our
strategy, the client can decide to drop frames immediately if it detects a bottleneck.
When the bottleneck disappears, it can either drop fewer frames or stop dropping
frames entirely. In OGI, when a bottleneck is detected, the client recalculates the
retrieval pattern and sends it to the server. The server then takes action to drop
frames according to the retrieval pattern. It therefore takes a longer time for OGI
to respond to a bottleneck than our strategy. During the presentation, the system
resources such as database and network bandwidth may change frequently because
it is an open environment. We therefore need to respond to these changes quickly;
otherwise the smoothness of the presentation may be jeopardized.
The following is an example. Suppose that at first we play the video smoothly
and can preload all the video data. We then detect a bottleneck and estimate that we
can only preload nine frames per GoP. In our strategy, we can change the preloading
pattern immediately as shown in Table 5.3. OGI, however, cannot change
immediately, so the last three frames of the GoP will be dropped. Its smoothness is therefore
worse than ours as shown in Table 5.3.
           Loading Pattern            Smoothness
Pattern    I B B P B B P B B P B B
Ours       I B X P B X P B X P B B    1
OGI        I B B P B B P B B X X X    2.16
Table 5.3: Smoothness of OGI and Our Strategy
The smoothness measurement of OGI* is worse than ours because it does not
consider the specific features of the database. There is overhead associated
with retrieving video data from the database using DB2/UDB's multimedia extenders.
The server always retrieves a block of data so that the average overhead time can be
reduced. The size of the block determines the time tb to retrieve the block from the
database. In our strategy we considered this overhead time and added the average
overhead time into the preloading time. The client preloads an amount of data which
takes longer to present than tb.
When the client requests a frame that is not in the local buffer of the server,
the server retrieves the block from the database which has the desired frame and
it takes approximately tb to deliver the frame to the client. During this time, the
client continues to consume those frames that are preloaded in its local buffer. So the
database overhead time does not affect the smoothness of our strategy.
OGI*, however, does not consider the database overhead time. Its client simply
consumes what it receives from the server. When the server needs to retrieve data
from the database, the client may consume all of its buffered frames and then be forced
to wait for the server, which degrades the smoothness of the presentation. When the
desired frame finally arrives, the client may have to drop it because it is too late,
which may also affect the following frames.
When the play speed is greater than 15 fps, OGI outperforms our strategy. The
smoothness of our strategy degrades beyond 15 fps because the database I/O becomes
a bottleneck. Each retrieval of a block of video data from the database
takes approximately 3 seconds. This amount of data can be played at the client for
about 3 seconds at the play speed of 16 fps. When the play speed exceeds 16 fps, the
database cannot provide the data in real time and the smoothness degrades.
OGI retrieves video data from a disk file and so does not encounter this limit.
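The figures in this paragraph can be checked with a little arithmetic. The sketch below uses the 3-second retrieval time and the 16 fps limit from the text and the average frame size from Section 5.3; the variable names are hypothetical, and it assumes the simplest model in which one block must be fully retrieved while the previous block is being played.

```python
BLOCK_RETRIEVAL_SECONDS = 3        # time to fetch one block (from the text)
PLAYABLE_SECONDS_AT_16_FPS = 3     # playback time of one block at 16 fps (from the text)
AVERAGE_FRAME_KB = 4.79            # average frame size (Section 5.3)

frames_per_block = PLAYABLE_SECONDS_AT_16_FPS * 16
block_kb = frames_per_block * AVERAGE_FRAME_KB
max_sustainable_fps = frames_per_block / BLOCK_RETRIEVAL_SECONDS


def playback_seconds(fps):
    """How long one block lasts at the given play speed."""
    return frames_per_block / fps


print(frames_per_block, round(block_kb, 2), max_sustainable_fps)
print(playback_seconds(20) < BLOCK_RETRIEVAL_SECONDS)  # True: the client runs dry
```

Under these assumptions a block holds about 48 frames (roughly 230 KB), and any play speed above 16 fps consumes a block faster than the next one can be retrieved.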
We did not calculate standard deviation of the smoothness results, so some differ-
ences may not be significant. But given a consistent experimental environment, that
would not affect our results.
5.4.2 Interactive Response Time
Table 5.4 shows the results of experiments to evaluate interactive response time for
the four strategies. The video was played at the frame rate of 12 fps. The table lists
the IRTs for the following interactions:
• PLAY → FF changes the status from normal play to fast forward play. The
IRT is measured from the time the user presses the FF button to the time
we detect that M successive I- and P- frames are presented. M is a variable,
which is 5 in our experiment.
• FF → PLAY changes the status from fast forward play to normal play. The
IRT is measured from the time the user presses the PLAY button to the time we
detect that M successive frames are presented.
• PLAY → FB changes the status from normal play to fast backward play. The
IRT is measured from the time the user presses the FB button to the time
we detect that M successive I- and P- frames are presented in reverse order.
• FB → PLAY changes the status from fast backward play to normal play. The
IRT is measured from the time the user presses the PLAY button to the time we
detect that M successive frames are presented in forward direction.
• FF → FB changes the status from fast forward play to fast backward play. The
IRT is measured from the time the user presses the FB button to the time we
detect that M successive I- and P- frames are presented in reverse order.
• FB → FF changes the status from fast backward play to fast forward play. The
IRT is measured from the time the user presses the FF button to the time we
detect that M successive I- and P- frames are presented in forward direction.
• JUMP starts presentation from a new position. The IRT is measured from the
time the user presses the JUMP button to the time we detect that M successive
frames are presented in forward direction.
               AMOS    Ours    OGI
PLAY → FF       126     113    211
FF → PLAY       128     117    312
PLAY → FB       442     414    216
FB → PLAY       411     353    319
FF → FB         344     263    205
FB → FF         302     291    209
JUMP           4406    3297    103
Table 5.4: Interaction Response Time (ms)
From the table we can see that PLAY → FB and FB → PLAY take longer than
PLAY → FF and FF → PLAY. The reason is that PLAY → FB and FB → PLAY need to
change play direction, and it takes some time to reorganize the buffer pool. We can
also see that the JUMP operation takes considerably longer than all the others because
the new position is difficult to predict. Whenever a JUMP operation occurs, we have
to load all the needed data from the server.
The results show that our strategy outperforms AMOS and OGI* for all the cases,
and outperforms OGI for two cases. Our preloading strategy allows us to perform
slightly better than AMOS. Although both strategies try to preload all the video data
in order to support user interaction efficiently, when it cannot afford to retrieve all
the data in real time our strategy tries to preload the higher priority I- and P- frames
that are needed by FF and FB play. We can therefore perform FF and FB smoothly
with little effect on PLAY. AMOS, on the other hand, drops segments that include
I- or P- frames, so that FF, FB and PLAY are all affected.
The performance of OGI* is the worst of the four strategies. The main reason is
that it uses the Use&Toss strategy [CAF91]. In OGI* the data is tossed immediately
after being used. Whenever a user changes play direction, e.g. PLAY → FB or FB →
PLAY, it does not have any data that can be used immediately. It must reconstruct
the buffer pool and reload all the data in the other direction, which increases its IRT.
The other reason for its long IRT is that it only loads data that is needed for the
current presentation. For FF and FB, it only preloads the desired I- and P- frames.
When the user changes play operations, such as FF → PLAY or FB → PLAY, it does
not have the desired B- frames in hand.
Our strategy has almost the same performance as OGI. Our IRTs for PLAY →
FF and FF → PLAY are better than those of OGI. The main reason is that our
client tries to load all the frames for PLAY and FF operations if the buffer space and
network bandwidth are sufficient. When the user changes play operations between
PLAY and FF, no change is needed. All the desired data is there, so the IRT can
be small. On the other hand, OGI only loads the data that is needed for the
current presentation. For FF, it only preloads the desired I- and P- frames, so when
the user changes play operations from FF to PLAY, OGI does not have the desired B-
frames in hand. OGI's IRT is therefore longer than ours. Our IRTs for the other
interactions are worse than those of OGI because those interactions need to change
play direction. Whenever play direction is changed, we need to reconstruct the buffer
pool because we use a linked list to manage the buffer pool, which costs some time.
However, the overhead is modest, as can be seen from Table 5.4.
For all the strategies the IRT of JUMP is higher than that of other operations. The
reason is that it is difficult to predict the new position of JUMP. Thus we cannot
preload the data for the play.
5.4.3 Smoothness vs. Buffer Size
The smoothness of our strategy greatly depends on the size of the buffer at the client
side. This fact is shown by the results in Figure 5.2. The horizontal axis denotes buffer
size measured in kilobytes (KB) and the vertical axis denotes smoothness. From
Figure 5.2 we can see that the smoothness depends directly on the buffer size.
Figure 5.2: Smoothness vs. Buffer Size
In this experiment we vary the buffer size from 128 KB to 576 KB; the play speed
is fixed at 12 fps. When the buffer size is less than 128 KB, the smoothness of the
system is very bad, because the system cannot buffer sufficient frames to tolerate
the database and network delay. As the buffer size grows, more frames can be
preloaded, so the system can tolerate more delay and the smoothness improves
accordingly. When the buffer size is greater than 576 KB, the smoothness value of the
system is near zero, which means almost perfect playback.
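
The relation between buffer size and tolerable delay can be made concrete with a
rough calculation. The sketch below assumes that the whole buffer holds preloaded
frames ahead of the play position, and it uses a hypothetical average frame size of
8 KB; neither value was measured in our experiments, so the output only shows the
shape of the relation.

```python
def tolerable_delay_seconds(buffer_kb, avg_frame_kb, fps):
    """Upper bound on the delay a full buffer can absorb at a given play speed."""
    frames_buffered = buffer_kb / avg_frame_kb
    return frames_buffered / fps

AVG_FRAME_KB = 8.0   # hypothetical average frame size, not a measured value
PLAY_SPEED_FPS = 12  # play speed used in the experiment above

# Buffer sizes as on the horizontal axis of Figure 5.2.
for size_kb in range(128, 577, 64):
    delay = tolerable_delay_seconds(size_kb, AVG_FRAME_KB, PLAY_SPEED_FPS)
    print(f"{size_kb} KB buffer: about {delay:.2f} s of delay")
```

Under these assumptions the tolerable delay grows linearly with the buffer size,
which is consistent with the trend described above.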
Chapter 6
Conclusions
Multimedia presentations demand specific support from database management systems.
In particular, the buffer management strategy plays an important role in sustaining
smooth presentation and user interactions. We developed a buffer management
strategy to support MPEG video presentation where the video is stored in a relational
database. MPEG-related features that affect performance, such as frame dependency,
were considered in the design of our strategy. We implemented
this strategy in an MPEG video presentation system and conducted experiments to
evaluate it. In these experiments our strategy performed better than the other
existing strategies we compared it with. We also designed a framework to support
multimedia presentation. Our implementation provides some of the components of this
framework; the remainder is left as future work.
The following sections present a summary of the contributions of the research and
a discussion of interesting future work in the area.
6.1 Contributions
This research makes the following contributions:
The design and implementation of the MPEG video presentation system.
The system plays MPEG video stored in a relational database system in real time
and supports various user interactions.
The development of a buffer management strategy for continuous
media support. The buffer management strategy provides good support for
continuous and interactive MPEG video presentation. A preloading strategy is
implemented to hide the delay of the underlying system components.
Furthermore, a replacement strategy is implemented that attempts to replace those
frames which are not expected to be referenced for the longest period of time
in the future. The concepts of distance relevance function, preloading priority
and replacement priority are introduced in the design. The design also
considers the specific features of MPEG video, in particular the dependencies
between the frames. It preloads different MPEG video frames with different
priorities to maximize the smoothness and minimize the interaction response time
of the presentation. This strategy has been implemented in our MPEG video
presentation system, and experiments have been conducted to evaluate its
performance. The smoothness of our strategy is better than that of the other
existing strategies, and its IRT is better than theirs in most cases.
The design of a framework to support multimedia presentation. We
proposed an open and scalable framework to support various multimedia
applications. The framework makes use of the relational database system DB2/UDB
and its multimedia extenders to manage multimedia data. The system can also
be expanded to accommodate new media types with the help of media-specific
providers and presenters. A wide range of multimedia applications can be
supported by combining different media providers and media presenters.
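
The replacement idea named in the second contribution above, evicting the frame that
is not expected to be referenced for the longest time, can be sketched as follows.
This is a simplified illustration under stated assumptions: frames are identified by
their index, playback moves one frame per step in the current direction, and MPEG
frame dependencies and the possibility of user interactions are ignored. The function
names are hypothetical and the code is not the distance relevance function or the
priority functions of our design.

```python
import math

def steps_until_referenced(frame, position, direction):
    """Play steps until `frame` is shown; infinite if it has already been passed."""
    distance = (frame - position) * direction
    return distance if distance >= 0 else math.inf

def choose_victim(buffered, position, direction):
    """Pick the buffered frame whose next reference lies furthest in the future.

    Among frames that have already been passed, the one furthest behind the
    play position is chosen.
    """
    return max(
        buffered,
        key=lambda f: (steps_until_referenced(f, position, direction),
                       abs(f - position)),
    )

buffered = [8, 10, 11, 15]
print(choose_victim(buffered, position=10, direction=+1))  # forward play
print(choose_victim(buffered, position=10, direction=-1))  # backward play
```

The same buffer content yields different victims for forward and backward play,
which is why a change of play direction forces the buffer pool to be reordered.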
6.2 Future Work
A number of topics within this research area remain open for future consideration:
Tunable parameters of the buffer management strategy to support different
presentation scenarios. We have defined a set of functions to calculate
priority values in our buffer management strategy. The parameters in those
functions came from experimental results with our particular configuration. In
reality there are many other presentation scenarios for which the current
parameters are not suitable. Ideally these parameters should be tunable to suit
different presentation scenarios.
Define working points to support the JUMP operation more efficiently.
Our buffer management strategy does not support the JUMP operation well.
The reason is that it is difficult to predict the new position, so
we cannot preload data for the new playback. An alternative is to define
some named working points so that the JUMP operation can be replaced with
an operator to set the working point. We can preload data for these working
points to reduce the response time for this operation.
Make suitable changes to support MPEG-2 video. Our MPEG video
presentation system currently supports only MPEG-1 video. However, it should not
be difficult to adapt it to support MPEG-2 video, given the conceptual similarity of
the two standards. The key features of MPEG-2 are the scalable extensions, which
permit the division of a continuous signal into two or more coded bit streams
representing the video at different resolutions, picture qualities, or picture rates.
An MPEG-2 video can be modeled in the same way as discussed for MPEG-1
streams, but its scalability features must be considered, and additional semantic
information may be needed at the client side or at the server side.
Incorporate hardware support to decode full-size MPEG video. Our
system uses only a software decoder to decode MPEG video. The CPU processing
capability limits the maximum picture size to 320 x 210 and the maximum
play speed to 16 frames per second (fps). To support full-size (640 x 480) and
full-speed (25 or 30 fps) MPEG video presentation, we need corresponding
hardware support.
Media Sharing at the server side. In our system, the server can support
multiple clients. The data rates at the server side are so high that, despite
the development of an efficient retrieval strategy, database I/O can still be
a bottleneck. This problem limits the number of concurrent clients
that can be supported by the system. A media sharing technique [KRT95] can
retain, in a controlled fashion, the buffers already played by one user for use
by subsequent users requesting the same data. We can implement this technique
at the server side to improve the performance of the system.
Presentation Adaptation, Media Synchronization and Media Composition.
Techniques to provide presentation adaptation, media synchronization
and media composition are all useful additions to our multimedia application
framework. Presentation adaptation techniques are needed to maintain the intra-media
synchronization of a presentation in best-effort systems under varying available
resources. We have implemented a simple adaptation mechanism in our system
that should be improved to achieve better performance. Media synchronization
and media composition are needed for presentations that involve multiple media.
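
The working-point idea proposed above for the JUMP operation can be sketched as
follows. The class, its method names and the fetch function are hypothetical and only
illustrate the intended behaviour: data is preloaded when a working point is set, so
a later jump to that point needs no request to the server.

```python
class WorkingPointPreloader:
    """Preloads a short run of frames for each named working point (sketch)."""

    def __init__(self, fetch_frames, frames_per_point=12):
        # fetch_frames(start, count) stands in for a request to the server.
        self._fetch = fetch_frames
        self._count = frames_per_point
        self._start = {}   # working point name -> first frame index
        self._frames = {}  # working point name -> preloaded frames

    def set_working_point(self, name, start_frame):
        """Register a working point and preload its frames immediately."""
        self._start[name] = start_frame
        self._frames[name] = self._fetch(start_frame, self._count)

    def jump(self, name):
        """Return the start frame and the frames already held for this point."""
        return self._start[name], self._frames[name]


def fake_fetch(start, count):
    # Stand-in for the database server: returns frame indices instead of data.
    return list(range(start, start + count))

preloader = WorkingPointPreloader(fake_fetch, frames_per_point=3)
preloader.set_working_point("scene2", 100)
print(preloader.jump("scene2"))
```

The cost of fetching is paid when the working point is defined rather than when the
user jumps, at the price of buffer space reserved for each working point.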
Bibliography
[AUG95] Sankt Augustin. VODAK V4.0 User Manual. GMD Technical Report
No. 910, April 1995.
[BKL96] S. Boll, W. Klas and M. Lohr. Integrated Database Services for Multimedia
Presentations. In Multimedia Information Storage and Management,
Kluwer Academic Publishers, 1996.
[CAF91] S. Christodoulakis, N. Ailamaki, M. Fragonikolakis, et al. An Object
Oriented Architecture for Multimedia Information Systems. In IEEE Data
Engineering, 14(3), pp 34-41, September 1991.
[CGS95] S. Chaudhuri, S. Ghandeharizadeh, and C. Shahabi. Avoiding Retrieval
Contention for Composite Multimedia Objects. In Proceedings of the 21st
VLDB conference, pp 122-129, Zurich, Switzerland, 1995.
[CPS95] S. Cen, C. Pu, R. Staehli, et al. A Distributed Real-Time MPEG Video
Audio Player. In Proceedings of NOSSDAV'95, pp 99-107, April 18-21, 1995.
Z. Chen, S. Tan and R. Campbell. Real Time Video and Audio in the
World Wide Web. In World Wide Web Journal, Volume 1, Number 1,
December 1995, pp 333-348.
J. Dey, J. Salehi and J. Kurose. Providing VCR Capabilities in Large-Scale
Video Servers. In Proceedings of ACM Multimedia, pp 134-142, San
Francisco, October 1994.
A. Dan, D. Sitaram and P. Shahabuddin. Scheduling Policies for an On-Demand
Video Server with Batching. IBM Research Report RC 19381,
1994.
A. Dan and D. Towsley. An Approximate Analysis of LRU and FIFO
Buffer Replacement Schemes. In Proceedings of ACM SIGMETRICS
Conference 1990, pp 143-149, 1990.
W. Effelsberg and T. Haerder. Principles of Database Buffer Management.
In ACM Transactions on Database Systems, 9(4):560-595, 1984.
J. Gemmell and S. Christodoulakis. Principles of Delay-Sensitive Multimedia
Data Storage and Retrieval. In ACM Transactions on Information
Systems, 10(1), pp 53-59, January 1992.
[GZ96a] S. Gollapudi and A. Zhang. Buffer Management in Multimedia Database
Systems. In The Third IEEE International Conference on Multimedia
Computing and Systems (ICMCS'96), pp 87-95, Hiroshima, Japan, June
1996.
[GZ96b] S. Gollapudi and A. Zhang. NetMedia: A Client-Server Distributed
Multimedia Database Environment. In the 1996 International Workshop on
Multimedia Database Management Systems, pp 102-110, Blue Mountain
Lake, New York, August 1996.
[HKR97] S. Hollfelder, A. Kraiß and T. C. Rakow. A Client-Controlled Adaptation
Framework for Multimedia Database Systems. In European Workshop
on Interactive Distributed Multimedia Systems and Telecommunication
Services (IDMS'97), pp 187-192, September 10-12, Darmstadt, Germany.
[HL97] S. Hollfelder and H. Lee. Data Abstractions for Multimedia Database
Systems, 1997. GMD Technical Report.
[HOL97] Silvia Hollfelder. Admission Control for Multimedia Applications in
Client-Pull Architectures. In International Workshop on Multimedia
Information System (MIS), pp 23-32, Como, Italy, Sept. 25-27, 1997.
[HSH97] S. Hollfelder, F. Schmidt and M. Hemmje. Transparent Integration of
Continuous Media Support into a Multimedia DBMS. GMD Technical
Report (Arbeitspapiere der GMD) No. 1104, St. Augustin, Germany,
December 1997.
[IBM97a] DB2 Relational Extenders. IBM white paper.
http://www.software.ibm.com/data/pubs/papers/.
[IBM97b] DB2 Object-Relational Solution. IBM white paper.
http://www.software.ibm.com/data/pubs/papers/.
[INF97] Michael Stonebraker. Architecture Options for Object-Relational
DBMSs. Informix white paper.
http://www.informix.com/informix/corpinfo/zines/whiteidx.htm.
[INF97b] Michael Stonebraker. Object-Relational DBMS - The Next Wave.
Informix white paper.
http://www.informix.com/informix/corpinfo/zines/whiteidx.htm.
[ISOSS] Hypermedia/Time-based Structuring Language: HyTime (ISO 10744).
International Standard Organization.
[JZSS] T. V. Johnson and A. Zhang. A Framework for Supporting Quality-Based
Presentation of Continuous Multimedia Streams. In the Fourth
IEEE International Conference on Multimedia Computing and Systems
(ICMCS'96), Ottawa, Canada, June, 1997.
A. Kraiß. An Object Manager for Continuous Data Within the
OODBMS VODAK (in German). In GMD-Studien 256, Darmstadt,
1994.
[KRT95] M. Kamath, K. Ramamritham and D. Towsley. Continuous Media Sharing
in Multimedia Database Systems. In Proceedings of the Fourth
International Conference on Database Systems for Advanced Applications
(DASFAA'95), Singapore, April 10-13, 1995.
Didier Le Gall. MPEG: A Video Compression Standard for Multimedia
Applications. In Communications of the ACM, Vol. 34, No. 4, April 1991,
pages 45-68.
T. Little and A. Ghafoor. Network Considerations for Distributed Multimedia
Object Composition and Communication. In IEEE Network Magazine,
pp. 32-49, 1990.
T. Little and A. Ghafoor. Synchronization and Storage Models for Multimedia
Objects. In IEEE Journal on Selected Areas in Communications,
8(3):413-427, April 1990.
[MKK95] F. Moser, A. Kraiß, and W. Klas. L/MRP: A Buffer Management Strategy
for Interactive Continuous Data Flow in a Multimedia DBMS. In
Proceedings of the 21st VLDB conference, Zurich, Switzerland, 1995.
[MPE2] Generic Coding of Moving Pictures and Associated Audio Information
- Part 2: Video (MPEG-2), ISO/IEC 13818-2 International Standard,
1996.
[NBE93] W. Niblack, R. Barber, and W. Equitz. The QBIC Project: Querying
Images by Content Using Color, Texture, and Shape. In SPIE 1993
International Symposium on Electronic Imaging: Science and Technology,
pp 77-87, February 1993.
[NFS91] R. Ng, C. Faloutsos and T. Sellis. Flexible Buffer Management based on
Marginal Gains. In Proceedings of the 1991 ACM SIGMOD Conference,
pp. 379-396, 1991.
[NNW93] E.J. O'Neil, P.E. O'Neil and G. Weikum. The LRU/k Page Replacement
Algorithm for Database Disk Buffering. In Proceedings of the 1993 ACM
SIGMOD Conference, pp. 297-306, 1993.
[NT92] E.J. Neuhold and V. Turau. Database Research at IPSI. In SIGMOD
Record, 21(1):133-138, March 1992.
R. Ng and J. Yang. Maximizing Buffer and Disk Utilization for News On-Demand.
In Proceedings of the 20th International Conference on Very
Large Data Bases 1994 (VLDB'94), pp. 451-462, 1994.
T.C. Rakow, W. Klas and E.J. Neuhold. Research on Multimedia
Database Systems at GMD-IPSI. In IEEE Multimedia Newsletter
4(1):41-46, April 1996.
T. Rakow, E. Neuhold, and M. Löhr. Multimedia Database Systems - The
Notions and the Issues. In Tagungsband GI-Fachtagung Datenbanksysteme
in Büro, Technik und Wissenschaft (BTW), Dresden, März 1995, S. 1-29.
Springer, Reihe Informatik Aktuell, Berlin 1995.
S. Rao, H. Vin and A. Tarafdar. Comparative Evaluation of Server-push
and Client-pull Architectures for Multimedia Servers. In NOSSDAV'96, pp.
45-48, 1996.
D. Rotem and J. L. Zhao. Buffer Management for Video Database Systems.
In Proceedings of IEEE Data Engineering 1995, 18, pp 45-50, 1995.
J. A. Schnepf, Y. Lee and L. Kang. Building a Framework for Flexible
Interactive Presentations. In Pacific Workshop on Distributed Multimedia
Systems (Pacific DMS '96), pp 190-197, Hong Kong, June 1996.
G. Sacco and M. Schkolnik. Buffer Management in Relational Database
Systems. In ACM Transactions on Database Systems, 11(4), pp. 473-498,
1986.
R. Staehli, J. Walpole and D. Maier. Quality of Service Specifications
for Multimedia Presentations. In Multimedia Systems, August 1995.
H. Thimm and W. Klas. Playout Management - An Integrated Service
of a Multimedia Database Management System, 1995. Technical Report.
GMD-IPSI.
Glossary
AMOS       Active Media Object Stores
DBMS       Database Management System
FIFO       First In First Out
GoP        Group-of-pictures
IRT        Interactive Response Time
LFU        Least Frequently Used
LRU        Least Recently Used
L/MRP      Least Most Relevant for Presentation
MM-DBMS    Multimedia Database Management System
MPEG       Motion Pictures Experts Group
OGI        Oregon Graduate Institute
QoS        Quality of Service
UDB        Universal Database