A Review of Video Streaming over the Internet

Abstract

Ideally, video and audio are streamed across the Internet from the

server to the client in response to a client request for a Web page

containing embedded videos. The client plays the incoming

multimedia stream in real time as the data is received. Quite a few

video streamers are starting to appear and many pseudo-streaming

technologies and other potential solutions are also in the pipeline.

Generally, streaming video solutions may work on a closed-loop

intranet, but for mass-market Internet use, they're simply

dysfunctional. However, current transport protocol, codec and

scalability research will eventually make video on the Web a

practical reality. Below we have reviewed the currently available

commercial products which purport to provide video streaming

capabilities over the Internet and outlined their limitations. Then we

describe the major research projects currently underway, which are

attempting to solve some of these limitations. Finally we compare

and evaluate the SuperNOVA project with respect to other research

projects and the current commercial products.

Introduction

For a long time now, it's been very easy to download and play back

high-quality audio and video files from the Internet. Current web

browsers and servers support full-file transfer mode of document

retrieval. However, full file transfer means very long, unacceptable

transfer times and playback latency. Ideally, video and audio should

be streamed across the Internet from the server to the client in

response to a client request for a Web page containing embedded

videos. The client plays the incoming multimedia stream in real time

as the data is received. Audio streaming is becoming widely

accepted and deployed. In particular, Progressive Networks'

RealAudio has a wide following. Although streaming audio programs

are considerably further along than video, they are still nowhere

near typical computer-sound quality. The idea of streaming video

over the network has been gaining a lot of interest. The current

Internet is a best effort network and interconnects sites with widely

varying bandwidth capabilities. In the future the Internet will see

the rollout of ATM, of RSVP with the ability to control Quality of

Service (QoS), and of mobile networks with widely varying QoS.

Therefore it will remain a very heterogeneous network. In this report

we first present a brief review of the current video compression

standards, evolving standards and techniques, and the Internet

transport protocols being deployed. In addition, issues such as the

need for servers, plugins and firewall penetration are discussed.

There are many commercial streaming video products becoming

available as well as many research projects in this area. We then

review the currently available commercial products which purport to

provide video streaming capabilities over the Internet and outline

their current limitations. Then we describe the major research

projects currently underway, which are attempting to solve some of

these limitations. Finally we compare and evaluate the SuperNOVA

project with respect to other research projects and the current

commercial products.

Video Compression Standards

The most important video codec standards for streaming video are H.261, H.263,

MJPEG, MPEG1, MPEG2 and MPEG4. A brief description of these is given below.

Compared to video codecs for CD-ROM or TV broadcast, codecs designed for the

Internet require greater scalability, lower computational complexity, greater resiliency

to network losses, and lower encode/decode latency for video conferencing. In

addition, the codecs must be tightly linked to network delivery software to achieve the

highest possible frame rates and picture quality. As one looks at the existing codec

standards, it becomes apparent that none are ideal for Internet video. In fact, it is quite

clear that over the next few years, we will see a host of new algorithms that are

specifically designed for the Internet and are thus more suitable for it. Research is

currently underway looking at both new scalable, flexible codecs and ways of scaling

existing codecs using transcoding and filters. Section 3 outlines current research in

video scalability. New algorithms specifically targeted at Internet video are being

developed. Consequently, application framework standards such as H.323/H.324 for

videoconferencing and MPEG4, are being designed that will easily incorporate these

new codec innovations into applications being developed today, without significant

rework.

H.261

H.261 is also known as P*64 where P is an integer number meant to represent

multiples of 64kbit/sec. H.261 was targeted at teleconferencing applications and is

intended for carrying video over ISDN - in particular for face-to-face videophone

applications and for videoconferencing. The actual encoding algorithm is similar to

(but incompatible with) that of MPEG. H.261 needs substantially less CPU power for

real-time encoding than MPEG. The algorithm includes a mechanism which optimises

bandwidth usage by trading picture quality against motion, so that a quickly-changing

picture will have a lower quality than a relatively static picture. H.261 used in this

way is thus a constant-bit-rate encoding rather than a constant-quality, variable-bit-

rate encoding.
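
As a rough illustration of the P*64 naming, the sketch below computes the channel rate for a given multiplier. The function name is hypothetical, and the 1 to 30 range for P is an assumption added here (it is the range usually quoted for H.261, not something stated above).

```python
def h261_bitrate_kbps(p):
    """Channel bit rate in kbit/s for an H.261 'P*64' configuration."""
    # Assumption: P is restricted to 1..30, the range usually quoted for H.261.
    if not 1 <= p <= 30:
        raise ValueError("P is assumed to lie between 1 and 30")
    return 64 * p


# One ISDN B channel (P=1), two B channels (P=2) and the assumed maximum (P=30).
rates = {p: h261_bitrate_kbps(p) for p in (1, 2, 30)}
```

Because the channel rate is fixed by P, the encoder has to absorb scene changes by varying picture quality, which is the constant-bit-rate behaviour described above.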

H.263

H.263 is a draft ITU-T standard designed for low bitrate

communication. It is expected that the standard will be used for a

wide range of bitrates, not just low bitrate applications. It is

expected that H.263 will replace H.261 in many applications. The

coding algorithm of H.263 is similar to that used by H.261, however

with some improvements and changes to improve performance and

error recovery. The differences between the H.261 and H.263

coding algorithms are listed below. Half pixel precision is used for

H.263 motion compensation whereas H.261 used full pixel precision

and a loop filter. Some parts of the hierarchical structure of the

datastream are now optional, so the codec can be configured for a

lower datarate or better error recovery. There are now four

negotiable options included to improve performance: Unrestricted

Motion Vectors, Syntax-based Arithmetic Coding, Advanced

Prediction, and forward and backward frame prediction similar to

that of MPEG, called PB-frames. H.263 supports five resolutions. In

addition to QCIF and CIF, which were supported by H.261, there are

SQCIF, 4CIF and 16CIF. SQCIF is approximately half the resolution

of QCIF. 4CIF and 16CIF are 4 and 16 times the resolution of CIF

respectively. The support of 4CIF and 16CIF means the codec could

then compete with other higher bitrate video coding standards such

as the MPEG standards.

http://rice.ecs.soton.ac.uk/peter/h263/h263.html

http://www.nta.no/brukere/DVC/

http://www.stud.ee.ethz.ch/~rmprince/h261.html
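
The resolution relationships quoted above can be checked with a small sketch. The pixel dimensions are the standard luminance sizes for these formats; they are supplied here and are not given in the text, and the helper names are hypothetical.

```python
# Luminance picture sizes (width, height) of the five H.263 formats.
# These dimensions are supplied for illustration; the text names only the formats.
H263_FORMATS = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}


def pixels(name):
    """Number of luminance pixels in one picture of the named format."""
    width, height = H263_FORMATS[name]
    return width * height


def relative_to(name, base):
    """Pixel count of `name` as a multiple of the pixel count of `base`."""
    return pixels(name) / pixels(base)
```

With these sizes, `relative_to("4CIF", "CIF")` and `relative_to("16CIF", "CIF")` come out at exactly 4 and 16, while `relative_to("SQCIF", "QCIF")` is a little under one half.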

MJPEG

There is really no such standard as "motion JPEG" or "MJPEG" for

video. Various vendors have applied JPEG to individual frames of a

video sequence, and have called the result "M-JPEG". JPEG is

designed for compressing either full-color or gray-scale images of

natural, real-world scenes. It works well on photographs, naturalistic

artwork, and similar material; not so well on lettering, simple

cartoons, or line drawings. JPEG is a lossy compression algorithm

which uses DCT-based encoding. JPEG can typically achieve 10:1 to

20:1 compression without visible loss, 30:1 to 50:1 compression is

possible with small to moderate defects, while for very-low-quality

purposes such as previews or archive indexes, 100:1 compression is

quite feasible. Non-linear video editors are typically used in

broadcast TV, commercial post production, and high-end corporate

media departments. Low bitrate MPEG-1 quality is unacceptable to

these customers, and it is difficult to edit video sequences that use

inter-frame compression. Consequently, non-linear editors (e.g.,

AVID, Matrox, FAST, etc.) will continue to use motion JPEG with low

compression factors (e.g., 6:1 to 10:1).
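
To put the compression ratios above into bandwidth terms, the sketch below estimates the bit rate of a motion JPEG stream. The frame size, frame rate and 24 bits per uncompressed pixel are assumptions chosen for illustration; only the ratios come from the text.

```python
def mjpeg_bitrate_mbps(width, height, fps, ratio, bits_per_pixel=24):
    """Approximate M-JPEG bit rate in Mbit/s for a given compression ratio.

    Every frame is compressed independently, so the stream rate is simply
    the uncompressed rate divided by the per-frame compression ratio.
    """
    raw_bits_per_second = width * height * bits_per_pixel * fps
    return raw_bits_per_second / ratio / 1_000_000


# Assumed example: CIF-sized frames (352x288) at 25 frames per second.
editing_quality = mjpeg_bitrate_mbps(352, 288, 25, ratio=10)
preview_quality = mjpeg_bitrate_mbps(352, 288, 25, ratio=100)
```

Under these assumptions even 100:1 compression leaves a stream of several hundred kbit/s, which suggests why intra-frame-only coding suits editing systems better than modem-speed Internet delivery.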

MPEG-1

MPEG 1, 2 and 4 are currently accepted, draft and developing standards respectively,

for the bandwidth efficient transmission of video and audio. The MPEG-1 codec

targets a bandwidth of 1-1.5 Mbps offering VHS quality video at SIF resolution (352x240

at 30 frames per second, or 352x288 at 25 frames per second). MPEG-1 requires expensive

hardware for real-time encoding. While decoding can be done in software, most implementations

consume a large fraction of a high-end processor. MPEG-1 does not offer resolution

scalability and the video quality is highly susceptible to packet losses, due to the

dependencies present in the P (predicted) and B (bi-directionally predicted) frames.

The B-frames also introduce latency in the encode process, since encoding frame N

needs access to frame N+k, making it less suitable for video conferencing.

http://drogo.cselt.stet.it/mpeg/
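
The effect of the P and B frame dependencies on loss resilience can be sketched as follows. This is a deliberately simplified model, not the MPEG-1 decoding process: it assumes a single closed group of pictures given in display order, that each P frame depends on the previous I or P frame, and that each B frame depends on the nearest I or P frame on either side. The function name is hypothetical.

```python
def undecodable_frames(gop, lost):
    """Return the indices of frames that cannot be decoded.

    gop  -- frame types in display order, e.g. "IBBPBBP"
    lost -- set of frame indices whose data was lost in transit
    """
    broken = set(lost)
    refs = [i for i, kind in enumerate(gop) if kind in "IP"]

    # A P frame is unusable if the reference frame before it is unusable.
    previous_ref = None
    for i in refs:
        if gop[i] == "P" and (previous_ref is None or previous_ref in broken):
            broken.add(i)
        previous_ref = i

    # A B frame is unusable if either neighbouring reference is unusable.
    for i, kind in enumerate(gop):
        if kind != "B":
            continue
        before = [r for r in refs if r < i][-1:]
        after = [r for r in refs if r > i][:1]
        if any(r in broken for r in before + after):
            broken.add(i)
    return broken
```

In this model, losing a single B frame costs only that frame, whereas losing the first P frame of "IBBPBBP" makes every frame except the I frame undecodable.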

MPEG-2

MPEG 2 extends MPEG 1 by including support for higher resolution video and

increased audio capabilities. The targeted bit rate for MPEG 2 is 4-15Mbits/s,

providing broadcast quality full-screen video. The MPEG 2 draft standard does cater

for scalability. Three types of scalability (Signal-to-Noise Ratio (SNR), spatial and

temporal) and one extension that can be used to implement scalability (data

partitioning) have been defined. Compared with MPEG-1, it requires even more

expensive hardware to encode and decode. It is also prone to poor video quality in the

presence of losses, for the same reasons as MPEG-1. Both MPEG-1 and MPEG-2 are

well suited to the purposes for which they were developed. For example, MPEG-1

works very well for playback from CD-ROM, and MPEG-2 is great for high-quality

archiving applications and for TV broadcast applications. In the case of satellite

broadcasts, MPEG-2 allows more than five digital channels to be encoded using the same

bandwidth as used by a single analog channel today, without sacrificing video quality.

Given this major advantage, the large encoding costs are really not a factor. However,

for existing computer and Internet infrastructures, MPEG-based solutions are too

expensive and require too much bandwidth; they were not designed with the Internet

in mind.

MPEG-4

The intention of MPEG 4 is to provide a compression scheme suitable for video

conferencing, i.e. data rates less than 64 kbit/s. MPEG4 will be based on the segmentation

of audiovisual scenes into AVOs or "audio/visual objects" which can be multiplexed

for transmission over heterogeneous networks. The MPEG-4 framework currently

being developed focuses on a language called MSDL (MPEG-4 Syntactic Description

Language). MSDL allows applications to construct new codecs by composing more

primitive components and providing the ability to dynamically download these

components over the Internet. This philosophy is similar to that for the multimedia

APIs being developed for Sun Microsystems' Java, where it will be possible to

dynamically download codec components. This trend is also seen in products from

major vendors such as Microsoft and Netscape, where they allow for multiple audio

and video codecs to be plugged into their real-time streaming solutions.

http://drogo.cselt.stet.it/mpeg/standards/mpeg-4.htm

http://drogo.cselt.stet.it/mpeg/

Scalable Video Compression Techniques

These can be sub-divided into DCT-based schemes (which include H.261, H.263,

MPEG1 and MPEG2), wavelet and sub-band schemes, fractal-based schemes and

image segmentation/region based compression schemes (MPEG4).

Subband/Wavelet Coding

The majority of scalable video codecs are based on subband coding techniques of

which the most widely used is the wavelet transform. VDONet and VXtreme use

wavelet codecs. There is also a lot of work going on in research organisations looking

at the application of wavelet and subband coding techniques to scalable video codecs

- see sections 7.2, 7.3, 7.4, 7.5.

Fractal Video Coding

Various research groups [13, 14] are investigating the application of

fractal compression to scalable video. Iterated Systems have

developed a commercial product which has been implemented

within Progressive Networks' RealVideo product.

Image Segmentation and Object-based Video Coding

A number of research groups are investigating the application of image segmentation

to video compression. The approaches involve extracting important subsets of the

image content of each frame and only delivering the most important e.g. object

boundaries, moving objects. Object-based coding can achieve very high data

compression rates while maintaining an acceptable visual quality in the decoded

images. However, object-based coders are computationally intensive and, to be viable

as a real-time process, an object-based coder would need to have the image

segmentation algorithm implemented as a VLSI array. See section 7.6 (the UC Davis

Image Sequence Processing Group [22]), section 7.7 (the Video Communication Research

Group (VCRG), Uni. of Western Australia [19]) and the Bath Video Coding Group

[21]. The MPEG4 standard is directly related to this content-based scalable video codec

approach.

http://drogo.cselt.stet.it/mpeg/standards/mpeg-4.htm

http://inls.ucsd.edu/y/Fractals/

http://www.mat.sbg.ac.at/~uhl/wav.html

Internet Transport Protocols

TCP (Transmission Control Protocol)

HTTP (Hypertext Transfer Protocol) uses TCP as the protocol for

reliable document transfer. If packets are delayed or damaged, TCP

will effectively stop traffic until either the original packets or retransmitted

packets arrive. Hence it's unsuitable for video and audio because:

(i) TCP imposes its own flow control and windowing schemes on the data stream,

effectively destroying temporal relations between video frames and audio

packets; and

(ii) reliable message delivery is unnecessary for video and audio - losses are

tolerable and TCP retransmission causes further jitter and skew.

UDP

UDP (User Datagram Protocol) is the alternative to TCP. RealPlayer,

StreamWorks and VDOLive use this approach. (RealPlayer gives you

a choice of UDP or TCP, but the former is preferred.) UDP forsakes

TCP's error correction and allows packets to drop out if they're late

or damaged. When this happens, you'll hear or see a dropout, but

the stream will continue. Despite the prospect of dropouts, this

approach is arguably better for continuous media delivery. If

broadcasting live events, everyone will get the same information

simultaneously. One disadvantage to the UDP approach is that

many network firewalls block UDP information. While Progressive

Networks, Xing and VDOnet offer work-arounds for client sites

(revert to TCP), some users simply may not be able to receive UDP

streams.
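
The behaviour described above, where late or missing packets become dropouts while the stream carries on, can be sketched as a play-out schedule. This is a minimal model under stated assumptions: packets carry a sequence number, one packet is due every `interval_ms`, and the client buffers `buffer_ms` before starting. The function and parameter names are hypothetical and do not describe any particular product.

```python
def playout(packets, total, interval_ms, buffer_ms):
    """Decide which slots of a stream are played and which are dropouts.

    packets     -- iterable of (sequence_number, arrival_time_ms)
    total       -- number of slots in the stream
    interval_ms -- play-out interval between consecutive packets
    buffer_ms   -- initial buffering delay before play-out starts

    Returns a list of booleans, one per slot: True if the packet arrived
    before its play-out deadline, False if the slot is a dropout.
    """
    on_time = set()
    for seq, arrival_ms in packets:
        deadline_ms = buffer_ms + seq * interval_ms
        if 0 <= seq < total and arrival_ms <= deadline_ms:
            on_time.add(seq)
    # Unlike TCP, nothing waits for a missing packet: the slot is skipped.
    return [seq in on_time for seq in range(total)]
```

For example, with four slots, a 100 ms interval and a 200 ms buffer, a packet for slot 2 that arrives after 900 ms is simply skipped while slot 3 still plays on time.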

Server or Serverless

Two major approaches are emerging for streaming multimedia

content to clients. The first is the server-less approach which uses

the standard web-server and the associated HTTP protocol to get

the multimedia data to the client. The second is the server-based

approach that uses a separate server specialized to the

video/multimedia streaming task. The specialization takes many

forms, including optimized routines for reading the huge multimedia

files from disk, the flexibility to choose any of

UDP/TCP/HTTP/Multicast protocols to deliver data, and the option to

exploit continuous contact between client and server to dynamically

optimize content delivery to the client. The primary advantages of

the server-less approach are: (i) there is one less software piece to

learn and manage, and (ii) from an economic perspective, there is

no video-server to pay for. In contrast, the server-based approach

has the advantages that it: (i) makes more efficient use of the

network bandwidth, (ii) offers better video quality to the end user,

(iii) supports advanced features like admission control and multi-

stream multimedia content, (iv) scales to support large numbers of

end users, and (v) protects content copyright. The tradeoffs clearly

indicate that for serious providers of streaming multimedia content

the server-based approach is the superior solution. RealPlayer,

StreamWorks and VDOnet's VDOLive require you to install their A/V

server software on your Web server computer. Among other things,

this software can tailor the quality and number of streams, and

provide detailed reports of who requested which streams. Other

programs, such as Shockwave and VivoActive, are serverless. They

don't require any special A/V server software beyond your ordinary

Web server software. With these programs, you simply link a file on

your server's hard drive from a Web page. When someone hits the

link, the file starts to download. Serverless programs are simple to

incorporate into a Web site but don't have the reporting capabilities

of server-based programs. And because they lack both stream- and

bandwidth-management features, they may be problematic if you

need to support many simultaneous streams.

Java Replayers Replacing Plugins

New solutions are appearing which use Java to eliminate the need

to download and install plugins or players. Such an approach will

become standard once the Java Media Player APIs being developed

by Sun, Silicon Graphics and Intel are available. This approach will

also ensure client platform independence. Vosaic appears to be one

of the few products with a Java replayer which supports H.263.

FireWalls

Nearly all streaming products require users behind a firewall to have

a UDP port opened for the video streams to pass through (1558 for

StreamWorks, 7000 for VDOLive, 7070 for RealAudio). Rather than

punch security holes in the firewall, Xing/StreamWorks has

developed a proxy software package you can compile and use, while

VDONet/VDOLive and Progressive Networks/RealPlayer are

approaching leading firewall developers to get support for their

streams incorporated into upcoming products. Currently a number

of products change from UDP to HTTP or TCP when UDP can't get

through firewall restrictions. This reduces the quality of the video. In

all cases, it's still a security issue for network managers.
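
The fallback behaviour described above can be sketched as a simple transport choice. The UDP port numbers are the ones quoted in this section; the function name, the idea of passing in the set of open ports, and the return format are assumptions made for illustration, and the TCP or HTTP port used after fallback is not given in the text.

```python
# UDP ports quoted in the text above.
UDP_PORTS = {"StreamWorks": 1558, "VDOLive": 7000, "RealAudio": 7070}


def choose_transport(product, open_udp_ports):
    """Pick UDP when the firewall passes the product's port, else fall back.

    open_udp_ports -- set of UDP ports the firewall is known to allow
    """
    port = UDP_PORTS[product]
    if port in open_udp_ports:
        return ("udp", port)
    # Fallback path: reliable delivery, at the cost of video quality.
    return ("tcp", None)
```

A client behind a firewall that allows only port 7000 would therefore receive VDOLive over UDP but RealAudio over the lower-quality TCP path.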

Commercial Real Time Video Streamers

MacroMedia's Streaming Shockwave

Shockwave for Director consists of two components. On the HTTP

server side, the Afterburner tool compresses Director movies to

make them available on the Internet. On the client side, the

Shockwave plugin lets the user incorporate Director movies into the

page layout of their HTML document. The current Shockwave plugin

is not streaming. The entire Director movie must be downloaded

before playback. The current release allows for a separate real-time

audio stream which can be encoded at 8, 16, 32 or 64 kbps,

depending on the most likely bandwidth available to users.

Macromedia have just released Director 6 Multimedia Studio which

supposedly includes new Streaming Shockwave technology.

Macromedia and Progressive Networks have also announced the

integration of Shockwave Flash, a vector-based animation and

graphics system, on top of RealMedia, to enable audio and video

streaming of output from Flash. Shockwave is a serverless product which relies

on the HTTP protocol only. It isn't capable of live feeds and makes

no use of IP Multicast, so it can't scale well to support thousands of

enterprise customers while efficiently using bandwidth.

http://www.macromedia.com/shockwave/developer.html

Progressive Networks' RealVideo

Progressive Networks has recently launched RealVideo, the

streaming video version of their well-known RealAudio product. Both

server and client versions have been released. In addition

Progressive Networks have released a range of video-oriented

content development tools, some their own, others developed by

third parties. Users need to install the RealServer 4.0 and the

RealPlayer Plus 4.0. It uses the RTSP protocol on top of UDP. Users

apparently have a choice of either fixed or optimized frame rate

encoding in the new RealVideo encoder. Users choose between a

number of pre-defined encoding templates which correspond to the

most appropriate audio and video formats for a given bandwidth.

"Stream thinning" detects poor or congested Internet connections

and will dynamically adjust the video frame rate in real-time. This is

presumably frame dropping. "Smart networking" automatically

delivers audio and video streams via the most efficient network

protocol. This is presumably choosing between TCP, UDP or UDP

multicast. The choice of TCP would be to deal with firewall

restrictions blocking UDP. Progressive Networks have recently

licensed in ClearVideo, a fractal-based video compression

technology from Iterated Systems (see http://www.iterated.com) to

complement their internally-developed compression methods.

RealVideo 1.0 provides two codecs: RealVideo Standard (developed

by Progressive Networks) and RealVideo Fractal (using Clear Video

technology from Iterated Systems, Inc.).

Xing Technology's StreamWorks

StreamWorks streams video and audio over the WWW using

UDP/IP. Video streams can be MPEG1 while audio can be MPEG1 or

MPEG1 private data streams containing MPEG2 LBR audio. Providers

encode content at 8.5, 24, 56 or 112 kbps depending on the

bandwidth capabilities of the potential users. StreamWorks supports

a process called thinning which reduces a high-bandwidth stream so

it can be transmitted over a lower bandwidth connection. At low

bandwidths, the software maintains a continuous audio stream of 8

to 10 Kbps, and the video stream uses whatever bandwidth is left.

http://204.62.160.251/streams/info/streamwk_gen_info.html

http://www.real.com/products/realvideo/overview/index.html

The MPEG-based compression allows the software to drop frames

from the stream, creating a jerky video sequence with almost no

motion, while maintaining a smooth audio playback. The quality of

the frames that do get through is still pretty good, just not as fluid

as one would expect from real video. StreamWorks is able to

broadcast streams to "relay servers". By using a star configuration,

it's possible to provide a video feed from a single server to regional

servers that then provide that stream to desktop clients.

StreamWorks' technology includes three components: the client

software, the server software and a video capture/encoding box

called the AVTrans encoder to compress audio and video streams.

These streams are transferred to a Unix server running the

StreamWorks server software over a TCP/IP network and, from

there, are broadcast over the network to client workstations. The

AVTrans encoder is capable of creating a range of compressed

streams, ranging from an 8.5-Kbps low bit rate format that produces

8-kilohertz mono audio on the client, to a 112-Kbps stream that

provides 44-kHz stereo audio or 30-frames-per-second,

quarter-screen video for large bandwidth connections such as Ethernet or a

T1. Like the other products examined in this review, StreamWorks

requires you to register its mime type in your Web server's

configuration file, and you need to open a UDP port (1558) for

delivering video to client workstations. The server software can

recode the compressed streams on the fly to compensate for large

numbers of users and a limited bandwidth. The server is configured

from a text file, so you can limit the total bandwidth output, the

maximum number of simultaneous streams and the maximum bit

rate per stream. The maximum default configuration for the server

is 10 Mbps for an Ethernet connection, but that can be adjusted

depending on how your client machines are connecting--via 14.4-

Kbps modem pool, ISDN hub or 100-Mbps backbone. With a 28.8-

Kbps modem connection, the StreamWorks server drops to a much

lower frame rate of 2 to 3 frames per second, producing a jerky,

halting video image while maintaining continuous audio.

Client performance is better than VDOLive.
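
The audio-first behaviour described above (a continuous audio stream, with video using whatever bandwidth is left) can be sketched as below. This is not Xing's algorithm: the 8.5 kbps audio default is taken from the lowest stream rate quoted above, while the average size of a video frame is a purely hypothetical figure chosen to show the arithmetic.

```python
def split_bandwidth(link_kbps, audio_kbps=8.5):
    """Reserve bandwidth for audio first; video gets the remainder."""
    audio = min(audio_kbps, link_kbps)
    video = max(link_kbps - audio, 0.0)
    return audio, video


def video_frame_rate(video_kbps, kbits_per_frame, max_fps=30.0):
    """Frames per second that fit in the video share of the link."""
    return min(video_kbps / kbits_per_frame, max_fps)


# Assumed example: a 28.8 kbps modem and frames averaging 8 kbit each.
audio_share, video_share = split_bandwidth(28.8)
modem_fps = video_frame_rate(video_share, kbits_per_frame=8.0)
```

With these assumed numbers the modem case lands between 2 and 3 frames per second, in line with the figure reported above, while the audio share is unaffected.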

VDONet's VDOLive

http://www.vdo.net/

VDONet claim that content providers only need one video source

which can be scaled on the fly for both high- and low-bandwidth

connections. They claim to be able to deliver 10-15 fps over a 28.8

kbps modem using a proprietary video compression scheme based

in part on wavelet techniques (VDOWave). Under ideal conditions

(minimal Internet traffic, no local network overhead, minimal

load on the VDOLive On-Demand Server), the quoted rates are: with a

14.4 kbps modem, up to 2 to 3 frames per second; with a 28.8 kbps

modem, from 8 to 12 frames per second; with an ISDN line, up to 20

frames per second. VDONet's VDOLive boasts a slightly higher frame rate

over a standard 28.8-Kbps modem than StreamWorks because it

uses a wavelet compression technology that lets it shave layers of

quality off each frame that's transmitted, rather than dropping

whole frames. This creates a stream that is smoother at low bit

rates, but of lower visual clarity and quality. VDOLive appears to be

the only commercial product which tries to estimate bandwidth and

adapt dynamically. The image quality is very poor at times but

audio is good. VDOLive includes two programs, VDO Capture which

lets you capture video streams and VDO Clip which compresses a

previously captured video stream and encodes it for delivery from a

VDO server. VDO Capture supports seven full-motion video cards

that can capture 16- or 24-bit color images at 15 frames per second

in a frame size ranging from 64-by-64 pixels to 250-by-176 pixels.

Unfortunately, existing AVI files that don't meet these criteria can't

be used unless they're converted. The VDOLive client is blunt, but

effective. Hitting the play button calls up a window for you to enter

an address for the VDOLive meta file that points to the video stream

you want to launch. There are also a few user-configurable

parameters behind this window. VDOLive is supported by some

firewall vendors. However if UDP-based video is blocked by a

firewall, VDOLive resorts to TCP-based video instead. VDONet's

codec VDOWave has been included in the codecs shipped with

Microsoft's NetShow since 1996. Microsoft hold an equity stake in

VDONet.

Vosaic

http://www.vosaic.com/

Based on research at the University of Illinois, Vosaic uses the

Video Datagram Protocol (VDP). VDP is basically an

augmented RTP. VDP improves reliability by creating two separate

channels between the client and server; one is a control channel the

two machines use to coordinate what information is being sent

across the network, and the other channel is for the streaming data.

A server would first send the client what amounts to an inventory of

the stream that is about to be broadcast. The client then uses this

list to tell the server which segments to deliver, and if a segment of

the stream is lost or delayed, the client can simply ask for that

segment to be resent. The stream itself is buffered on the client side,

providing for smooth playback in most cases. VDP also uses

adaptive flow control on the server side that can adapt the packet

flow based on how well the client is doing. If the client is doing well

and receiving all the frames, the server can increase the number of

packets being sent out onto the network. If the client is having

trouble keeping up or the network is so loaded that packets are

being delayed, the server can drop packets from the stream. VDP is

designed to preserve network bandwidth in response to both

network congestion as well as client CPU load. Vosaic supports video

and audio standards including MPEG1, MPEG2, GSM audio, and

H.263. To view Vosaic's streaming videos you need the Vosaic

plug-in. It also requires you to download both a Vosaic client and a

server. There is a new version out based on Java. Vosaic

MediaStudio is a Java-based authoring application which can

convert AVI/ASF formats and MPEG1/2 formats into

bandwidth-compatible MPEG or H.263 files. The quality settings (target frame rate,

quantisation, MPEG frame sequence (IPBIPB)) need to be pre-set

depending on the likely connection bandwidths of your clients.

Vosaic appears to be quite similar to SuperNOVA. It uses both

feedback and a feedforward scheme to adapt to both network and

end system conditions. However it doesn't include end-to-end QoS

management with user interaction. Dynamic scaling is only frame-

dropping, within the boundaries pre-determined at capture time. It

does not support transcoding on the fly. On a T1 link your source is

MPEG while on a 28K link your source is H.263. On the plus side,

they already have a 100% Java H.263 player. Vosaic had a lot of

audio dropouts compared to VDOLive which maintains audio at all

costs. It delivered 8-bit video only and suffered from missing blocks

due to packets being lost - a consequence of MPEG1 encoded video.
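
The two VDP mechanisms described above, the client asking for lost segments and the server adapting its packet flow to client feedback, can be sketched as follows. This is an illustrative model, not the VDP specification: the function names, the 10% increase step and the one-packet-per-second floor are all hypothetical.

```python
def missing_segments(inventory, received):
    """Segments listed in the server's inventory that the client lacks.

    The client would send this list back over the control channel to
    request retransmission.
    """
    got = set(received)
    return [segment for segment in inventory if segment not in got]


def next_send_rate(rate_pps, received_fraction, step=0.1, floor_pps=1.0):
    """Server-side rate adjustment from client feedback (hypothetical policy).

    rate_pps          -- current sending rate in packets per second
    received_fraction -- share of sent packets the client reported receiving
    """
    if received_fraction >= 1.0:
        # Client is keeping up: probe for more bandwidth.
        return rate_pps * (1 + step)
    # Client or network is struggling: thin the stream to what got through.
    return max(rate_pps * received_fraction, floor_pps)
```

A client that received segments 0 and 2 of an inventory of four would request 1 and 3, and a server told that only half its packets arrived would halve its sending rate under this policy.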

VXtreme

VXtreme consists of a number of WebTheater products: Web

Theater Client, Server, Producer, LiveStation, and Personal Edition.

VXtreme's software-only compression technology automatically

adapts the bandwidth of the video to the network connection.

VXtreme's Web Theater software uses RTP (Real-time Transport Protocol) as

its network delivery mechanism extended to include mechanisms

for packet loss recovery. VXtreme's compression method is non-

standard. They claim it offers bandwidth scaling and software-only

capability. It is apparently not based on DCT or motion estimation

(H.261, H.263, MPEG1,2) or wavelets which they claim are compute-

intensive and require hardware-support. For the multicast case,

VXtreme uses a layered compression scheme to divide the

compressed video into multiple streams with differing priorities

(based on importance to visual quality). This layered approach

reduces jitter caused by frame dropping and delivers smoother but

lower resolution video. They have a bizarre congestion control

method which freezes both audio and video and then restarts. Their

proprietary encoding method is just as blocky as DCT-based

encoding. Microsoft has recently acquired VXtreme's codec to ship

with NetShow.
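
The layered approach described above can be sketched as a receiver choosing how many layers to accept. VXtreme's actual scheme is proprietary, so this is only a generic model: the layer rates are hypothetical, and layers are assumed to be ordered from most to least important with each layer usable only if all earlier ones are received.

```python
def select_layers(layer_kbps, available_kbps):
    """Number of layers, taken in priority order, that fit the link.

    layer_kbps     -- bit rate of each layer, most important first
    available_kbps -- bandwidth available to this receiver
    """
    used = 0.0
    count = 0
    for rate in layer_kbps:
        if used + rate > available_kbps:
            break  # this layer and all less important ones are skipped
        used += rate
        count += 1
    return count


# Hypothetical layering: a 10 kbps base layer plus three enhancement layers.
LAYERS = [10.0, 15.0, 30.0, 60.0]
modem_layers = select_layers(LAYERS, 28.8)
lan_layers = select_layers(LAYERS, 128.0)
```

Every receiver gets the base layer at the full frame rate, so a slow link sees lower resolution rather than dropped frames, which matches the smoother playback described above.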

Vivoactive

The VivoActive player supports audio/video streaming of

proprietary VIV files over the web with standard HTTP connections.

VIV files are compressed (up to 250:1) files created by the

VivoActive producer. Presently, the Producer can be downloaded for

free. The plug-in works well with VIV files, but not many sites have

VIV files. The VIVO format uses H.263 video compression and G.723

audio compression. No separate video server is required, and it uses HTTP

rather than UDP. While Vivo acknowledged that there is some

inevitable loss in speed and quality using HTTP vs. UDP, they

argued it is negligible, and that it is more than made up for by the

fact that HTTP, which will continue to send streams even when

packets are dropped, is more flexible and less of a bandwidth hog

than UDP. It is not truly scalable: users can control how a video file is

compressed and delivered by specifying a bandwidth. You can

choose from a variety of predefined settings to optimize your video

depending on the type of content you're streaming and the network

connection of your audience (modem, ISDN, T1, LAN). It also lets you

customize the data rate, frame rate, output size, audio quality and

buffering parameters for your streaming video.

http://www.vivo.com/

http://www.vxtreme.com/

Microsoft's NetShow

Microsoft's NetShow expects the user to first create an ASF (Active

Movie Streaming) stream. The user has to choose from a range of

audio and video codecs depending on their bandwidth availability.

Codecs on offer include MPEG Layer-3, Microsoft MPEG-4, Vivo G.723

(audio) and H.263 (video). Content can be produced using

VivoActive. It doesn't appear to offer dynamic scalability but relies

on the user to choose from a table of codecs depending on whether

they are on a 28.8Kbps modem, 56Kbps ISDN or 110Kbps Intranet

connection. NetShow will also support the Progressive Networks

RealAudio and RealVideo formats. It requires both a client (NetShow

Player) and a server (NetShow Server). There is also a set of

NetShow Content Creation Tools. It uses the UDP protocol and relies

on port 1755 to get through firewalls. A Netscape plug-in is used to

replay the video. The major limitation of NetShow is that it doesn't

support high quality video formats which would be deliverable over

high bandwidth connections. But it does deliver very good quality

video (using the latest compression standards, H.263 and MPEG-4) at

low bandwidths. The advantage of NetShow is its flexibility. It

supports a range of audio and video codecs which can simply be

plugged into the NetShow architecture to provide a range of

video/audio streaming solutions. Codecs on offer include: Duck

TrueMotion RT, MPEG Layer-3, Iterated Systems' ClearVideo, Microsoft

MPEG-4, VDOnet's VDOWave, Vivo H.263 and Intel H.263. In addition,

Microsoft has just acquired VXtreme. See

http://www.microsoft.com/netshow/codecsship.htm

http://www.microsoft.com/netshow/codecs.html

Comparison of Commercial Video Streaming Products

The previous section describes the eight major players in this field. The

best ones are those which deliver the highest quality video for a

given bandwidth i.e. lowest delay, no jitter (low frame loss), good

audio/visual synchronisation, high quality audio and image

resolution. In addition, the ability to provide the best possible video

quality over a range of networks/bandwidths without content

duplication is highly desirable. This characteristic is referred to as

scalability.

All commercial products except ShockWave claim some form of

video scalability. Investigation reveals that often the claims of

scalability are not what they appear to be or are simply misleading.

The scalability is more often static than dynamic, and

there is little user control over the visual manifestation of this

scalability.

The currently available commercial products offer two types of

scalability. Firstly, there is scalability at the encoding stage. Users

are given a range of encoding formats to choose from, which

correspond to a range of bandwidths. The limitation of this

scalability is that users need to know the bandwidth in advance.

This is inflexible - any unpredicted load cannot be handled

gracefully. Additionally, in a multi-receiver scenario the selected

bandwidth must be that of the lowest channel's capacity. This is an

unrealistic restriction and a waste of bandwidth for higher capacity

receivers. Also, forcing an individual to select a bandwidth assumes

some technical awareness, and does not readily convey the visual

quality of the selected video. Multiple formats are not supported

from a single source; each format instead requires a separately

encoded clip. This entails an overhead in

administration and storage of audio and video material.
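
The limitations of encoding-stage scalability described above can be made concrete with a small sketch. The Python below is illustrative only: the table ENCODINGS, its labels, and the function names pick_static_encoding and multi_receiver_waste are assumptions introduced here, and the three bit rates are simply the connection speeds quoted for NetShow earlier in this review. It does not reproduce how any of the reviewed products is implemented.

```python
# Hypothetical encoding table: (label, bit rate in Kbps). The rates echo the
# connection types quoted for NetShow; the labels and table are assumptions.
ENCODINGS = [
    ("28.8K modem", 28.8),
    ("56K ISDN", 56.0),
    ("110K intranet", 110.0),
]


def pick_static_encoding(declared_kbps):
    """Return the highest-rate encoding that fits the declared bandwidth."""
    fitting = [e for e in ENCODINGS if e[1] <= declared_kbps]
    if not fitting:
        raise ValueError("no encoding fits %.1f Kbps" % declared_kbps)
    return max(fitting, key=lambda e: e[1])


def multi_receiver_waste(receiver_kbps):
    """Pick one encoding for all receivers and report unused capacity.

    With a single statically encoded stream, the rate is bounded by the
    slowest receiver, so every faster receiver leaves capacity idle.
    """
    label, rate = pick_static_encoding(min(receiver_kbps))
    unused = [round(capacity - rate, 1) for capacity in receiver_kbps]
    return label, rate, unused
```

For receivers at 28.8, 56 and 110 Kbps, multi_receiver_waste selects the 28.8 Kbps encoding for all three, leaving 27.2 and 81.2 Kbps unused on the two faster links.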

Secondly, some of the products also incorporate some kind of

dynamic scalability based on the available bandwidth at the time.

Where dynamic scalability is provided it is usually simple frame

dropping. This is not ideal because it can cause jerkiness and loss of

audio/video synchronisation. Alternatively, a layered or hierarchical compression method

can be used. Layered compression methods usually lose image

quality or resolution but maintain frame rate as the bandwidth

drops. VXtreme claims to use a layered compression method but it

only supports AVI and MOV file formats.
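
The difference between the two dynamic approaches can be sketched as follows. This is a simplified model under stated assumptions, not the algorithm of any product reviewed here: frame_dropping assumes every frame costs the same number of bits, and layered_selection assumes a stream already split into a base layer plus enhancement layers with hypothetical bit rates. Both function names are introduced purely for illustration.

```python
def frame_dropping(full_fps, full_kbps, available_kbps):
    """Keep per-frame quality and lower the frame rate to fit the channel.

    Assumes every frame costs the same number of bits, which ignores the
    difference between key frames and predicted frames.
    """
    if available_kbps >= full_kbps:
        return full_fps
    return full_fps * available_kbps / full_kbps


def layered_selection(layer_kbps, available_kbps):
    """Send the base layer plus as many enhancement layers as fit.

    layer_kbps[0] is the base layer; later entries are enhancement layers
    that each add resolution or quality. The frame rate is unchanged.
    Returns (number of layers sent, total Kbps used).
    """
    sent, used = 0, 0.0
    for rate in layer_kbps:
        if used + rate > available_kbps:
            break
        used += rate
        sent += 1
    return sent, used
```

With 28 Kbps available for a 15 fps, 56 Kbps stream, frame dropping falls to 7.5 fps. A layered stream of 20 + 20 + 40 Kbps offered 56 Kbps keeps its frame rate and sends two layers (40 Kbps), at reduced image quality.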

VOSAIC supports a variety of codecs - H.263, MPEG1 and MPEG2 - to

suit the available bandwidth which can range from 28.8Kbps to T1.

The bandwidth must be specified at encoding so that the most

appropriate codec can be selected. Limited dynamic adaption is

possible through frame dropping.

VDOLive is based on a proprietary wavelet encoding which enables

10-15fps, 1/4 screen video replay over 28.8Kbps. It scales

dynamically from 14.4Kbps modem to ISDN and Cable modems.

VivoActive offers a very simple solution for low bandwidth

connections. It doesn't require a server since it uses HTTP and it

simply uses the low bandwidth H.263 and G.723 codecs to enable

embedded audio/video streaming over 28.8 Kbps modems. But it

doesn't support high quality video (MPEG1, MPEG2) over higher

bandwidths.

Progressive Networks' RealVideo has recently incorporated Iterated

Systems' fractal compression technology, which will improve its

ability to dynamically scale to a range of bandwidths.

The philosophy being adopted by major vendors such as Sun,

Microsoft and Netscape is to let codec components be downloaded

dynamically over the Internet. The multimedia APIs being developed

for Sun Microsystems' Java will make this possible, and Microsoft

and Netscape likewise allow multiple audio and video codecs to be

plugged into their real-time streaming solutions. Consequently,

Microsoft's NetShow, which has been designed to allow a variety of

codecs suited to differing applications to be easily incorporated,

offers flexibility and support for the latest scalable video

compression techniques.

Commercial Video Servers

High-end database-driven video servers are also available from

companies like IBM, Oracle, SGI, Sun and Tektronix. These products

should be considered for large scale applications or for serving large

numbers of simultaneous streams.

SGI WebForce

IBM VideoCharger and Digital Library MediaBase

Sun MediaCenter Servers

General Conclusions

Streaming video (and audio) across networks is an effort that is attracting many

participants. This is evidenced by the eight primary commercial and thirteen research

organisations involved with this technology in various ways. A key characteristic of

both the commercial products and research demonstrators is the diversity in

technological infrastructure e.g. networks, protocols, compression standards

supported.

All the commercial video products reviewed in this report are

optimised for low bandwidth modem or ISDN connections and are

not designed to scale to higher bandwidth networks. The video

needs to be pre-encoded with the target audience in mind.

The commercial products have either adopted/developed their own

proprietary standards, embraced the currently accepted standards

(e.g. MPEG) or implemented a combination of the two. Compatibility

between the commercial products has been limited because of

these proprietary standards. However, recent products such as Sun's

MediaFramework API and Microsoft's NetShow have been designed

to enable new and various codecs to be easily incorporated into

their framework.

http://www.sun.com/products-n-solutions/hw/servers/smc_external.html

http://www.software.ibm.com/is/dig-lib/v2factsheet

http://www.sgi.com/Products/WebFORCE/Products/Mediabase/mbase.html

H.263 and MPEG-4 are going to become the de facto standards for

video delivery over low bandwidths. But broadband standards such

as MPEG-1 and MPEG-2, which are useful for many types of

broadcast and CD-ROM applications, are unsuitable for the Internet.

Although MPEG-2 has had scalability enhancements, these will not

be exploitable until the availability of reasonably priced hardware

encoders and decoders which support scalable MPEG-2.

Codecs designed for the Internet require greater bandwidth

scalability, lower computational complexity, greater resilience to

network losses, and lower encode/decode latency for interactive

applications. These requirements imply codecs designed specifically

for the diversity and heterogeneity of Internet delivery. The

research on Internet codecs has broadly taken two directions:

DCT-based and non-DCT-based. DCT-based video delivery, except for

MPEG-2, possesses no inherent scalability. To achieve adaptivity,

various operations can be applied to the (semi) compressed data

stream to reduce its bit rate. Amongst these operations is

transcoding, the conversion of one compression standard to

another. The beauty of the DCT based approach is that it is

compatible with current and imminent draft compression standards.

Furthermore it allows re-use of existing audio and video archives

without explicitly re-coding them to cater for all possible formats.

Existing viewers also maintain their currency.
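
The text above does not name the bit-rate reduction operations other than transcoding. As one hedged illustration, the sketch below shows a commonly described operation on DCT data: zeroing the high-frequency coefficients so that fewer values need to be coded. It uses a one-dimensional orthonormal DCT in pure Python for brevity, whereas real codecs work on two-dimensional blocks and also quantise and entropy-code the result. The function names are hypothetical.

```python
import math


def _scale(k, n):
    """Orthonormal DCT scaling factor for coefficient index k of n."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct(block):
    """Forward orthonormal DCT-II of a list of samples."""
    n = len(block)
    return [
        _scale(k, n) * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                           for i, x in enumerate(block))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct(): reconstruct samples from coefficients."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]


def drop_high_frequencies(coeffs, keep):
    """Zero every coefficient after the first `keep` (lowest) frequencies."""
    return coeffs[:keep] + [0.0] * (len(coeffs) - keep)
```

Keeping all coefficients reconstructs the block exactly, up to floating-point rounding. Keeping fewer trades image detail for a lower bit rate, without decoding the stream fully and re-encoding it from scratch.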

Non-DCT based compression techniques, e.g. layered, sub-band,

wavelet etc., are intrinsically scalable. This is their great attraction.

Unfortunately, although several CODECs exist, they are still

experimental in nature and often suffer from performance problems.

In addition, existing movie libraries would need to be re-coded, by

no means a trivial task.

The research projects reviewed in this chapter broadly fall into two

categories. One group is developing scalable video CODECs, mainly

using sub-band coding. The other group is looking at scalable video

in the context of QoS. There is consensus in the research

community that the key to efficient delivery of continuous media

over heterogeneous networks is dynamic bandwidth adaption. Of

these groups, the research carried out at Columbia, including the

video-on-demand testbed, seems the most significant work in this

area. This research is similar to SuperNOVA in some areas and

complementary in others.
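
Dynamic bandwidth adaption is described here only as a goal, so the following is a minimal sketch of one possible feedback rule, not the method of Columbia, SuperNOVA or any other project in this review. It assumes the receiver periodically reports a packet-loss fraction, and it applies additive increase and multiplicative decrease to the sending rate. The function name adapt_rate, the loss threshold, the step sizes and the rate limits (a 14.4 Kbps modem up to a 1544 Kbps T1) are all illustrative assumptions.

```python
def adapt_rate(current_kbps, loss_fraction, min_kbps=14.4, max_kbps=1544.0,
               loss_threshold=0.02, increase_kbps=4.0, decrease_factor=0.5):
    """One step of additive-increase / multiplicative-decrease rate control.

    If the reported loss exceeds the threshold, the target rate is cut by
    decrease_factor; otherwise it is raised by increase_kbps. The result is
    clamped to the range [min_kbps, max_kbps].
    """
    if loss_fraction > loss_threshold:
        new_rate = current_kbps * decrease_factor
    else:
        new_rate = current_kbps + increase_kbps
    return max(min_kbps, min(max_kbps, new_rate))
```

Starting from 56 Kbps, a loss-free report raises the target to 60 Kbps, while a report of 10% loss halves it to 28 Kbps. A scalable codec would then choose the layers or frame rate that fit the new target.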
