A Review of Video Streaming
over the Internet
Abstract
Ideally, video and audio are streamed across the Internet from the
server to the client in response to a client request for a Web page
containing embedded videos. The client plays the incoming
multimedia stream in real time as the data is received. Quite a few
video streamers are starting to appear and many pseudo-streaming
technologies and other potential solutions are also in the pipeline.
Generally, streaming video solutions may work on a closed-loop
intranet, but for mass-market Internet use they are simply
dysfunctional. However, current research into transport protocols,
codecs and scalability will eventually make video on the Web a
practical reality. Below we review the currently available
commercial products which purport to provide video streaming
capabilities over the Internet and outlined their limitations. Then we
describe the major research projects currently underway, which are
attempting to solve some of these limitations. Finally we compare
and evaluate the SuperNOVA project with respect to other research
projects and the current commercial products.
Introduction
For a long time now, it has been very easy to download and play back
high-quality audio and video files from the Internet. Current web
browsers and servers support full-file transfer mode of document
retrieval. However, full file transfer means very long, unacceptable
transfer times and playback latency. Ideally, video and audio should
be streamed across the Internet from the server to the client in
response to a client request for a Web page containing embedded
videos. The client plays the incoming multimedia stream in real time
as the data is received. Audio streaming is becoming widely
accepted and deployed. In particular, Progressive Networks'
RealAudio has a wide following. Although streaming audio programs
are considerably further along than video, they are still nowhere
near typical computer-sound quality. The idea of streaming video
over the network has been gaining a lot of interest. The current
Internet is a best effort network and interconnects sites with widely
varying bandwidth capabilities. In the future the Internet will see
the rollout of ATM, RSVP with the ability to control Quality of
Service (QoS), and mobile networks with widely varying QoS.
Therefore it will remain a very heterogeneous network. In this report
we first present a brief review of the current video compression
standards, evolving standards and techniques, and the Internet
transport protocols being deployed. In addition, issues such as the
need for servers, plugins and firewall penetration are discussed.
There are many commercial streaming video products becoming
available as well as many research projects in this area. We then
review the currently available commercial products which purport to
provide video streaming capabilities over the Internet and outline
their current limitations. Then we describe the major research
projects currently underway, which are attempting to solve some of
these limitations. Finally we compare and evaluate the SuperNOVA
project with respect to other research projects and the current
commercial products.
Video Compression Standards
The most important video codec standards for streaming video are H.261, H.263,
MJPEG, MPEG1, MPEG2 and MPEG4. A brief description of these is given below.
Compared to video codecs for CD-ROM or TV broadcast, codecs designed for the
Internet require greater scalability, lower computational complexity, greater resiliency
to network losses, and lower encode/decode latency for video conferencing. In
addition, the codecs must be tightly linked to network delivery software to achieve the
highest possible frame rates and picture quality. As one looks at the existing codec
standards, it becomes apparent that none are ideal for Internet video. In fact, it is quite
clear that over the next few years, we will see a host of new algorithms that are
specifically designed for the Internet and are thus more suitable for it. Research is
currently underway looking at both new scalable, flexible codecs and ways of scaling
existing codecs using transcoding and filters. Section 3 outlines current research in
video scalability. New algorithms specifically targeted at Internet video are being
developed. Consequently, application framework standards such as H.323/H.324 for
videoconferencing, and MPEG4, are being designed that will easily incorporate these
new codec innovations into applications being developed today, without significant
rework.
H.261
H.261 is also known as P*64, where P is an integer giving the bit rate as a multiple
of 64 kbit/s. H.261 was targeted at teleconferencing applications and is
intended for carrying video over ISDN - in particular for face-to-face videophone
applications and for videoconferencing. The actual encoding algorithm is similar to
(but incompatible with) that of MPEG. H.261 needs substantially less CPU power for
real-time encoding than MPEG. The algorithm includes a mechanism which optimises
bandwidth usage by trading picture quality against motion, so that a quickly-changing
picture will have a lower quality than a relatively static picture. H.261 used in this
way is thus a constant-bit-rate encoding rather than a constant-quality, variable-bit-
rate encoding.
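The trade-off between motion and picture quality follows from simple arithmetic, sketched below in Python. The function name and the idea of spreading a frame's bit budget evenly over its changed blocks are simplifying assumptions made for illustration; they are not the rate-control procedure of H.261 itself, and headers and audio are ignored.

```python
def bits_per_changed_block(p, fps, changed_blocks):
    """Average bits available for each changed block in one frame.

    p              -- the P in P*64, so the channel carries p * 64 kbit/s
    fps            -- frames sent per second
    changed_blocks -- number of blocks that differ from the previous frame

    Assumption (illustrative only): the per-frame budget is shared evenly
    among the changed blocks.
    """
    channel_bps = p * 64_000
    frame_budget = channel_bps / fps
    return frame_budget / max(1, changed_blocks)


if __name__ == "__main__":
    # Same channel (P = 2, i.e. 128 kbit/s) and frame rate, different motion.
    for label, blocks in (("static scene", 20), ("fast motion", 300)):
        share = bits_per_changed_block(2, 10, blocks)
        print(f"{label}: {share:.1f} bits per changed block")
```

With a fixed channel rate, a scene in which many blocks change leaves far fewer bits for each block, so each must be coded more coarsely.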
H.263
H.263 is a draft ITU-T standard designed for low bitrate
communication. It is expected that the standard will be used for a
wide range of bitrates, not just low bitrate applications. It is
expected that H.263 will replace H.261 in many applications. The
coding algorithm of H.263 is similar to that used by H.261, however
with some improvements and changes to improve performance and
error recovery. The differences between the H.261 and H.263
coding algorithms are listed below. Half pixel precision is used for
H.263 motion compensation whereas H.261 used full pixel precision
and a loop filter. Some parts of the hierarchical structure of the
datastream are now optional, so the codec can be configured for a
lower datarate or better error recovery. There are now four
negotiable options included to improve performance: Unrestricted
Motion Vectors, Syntax-based Arithmetic Coding, Advanced
Prediction, and forward and backward frame prediction similar to
MPEG, called PB-frames. H.263 supports five resolutions. In addition
to QCIF and CIF that were supported by H.261 there is SQCIF, 4CIF,
and 16CIF. SQCIF is approximately half the resolution of QCIF. 4CIF
and 16CIF are 4 and 16 times the resolution of CIF respectively. The
support of 4CIF and 16CIF means the codec could then compete
with other higher bitrate video coding standards such as the MPEG
standards.
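The resolution relationships above can be checked with a short Python sketch. The luminance dimensions used are the standard ones for these picture formats; the dictionary and function names are illustrative only.

```python
# Luminance picture sizes (width, height) for the five H.263 formats.
H263_FORMATS = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}


def pixels(name):
    """Return the number of luminance pixels per frame for a format."""
    width, height = H263_FORMATS[name]
    return width * height


def ratio(name, reference):
    """Return how many times more pixels `name` has than `reference`."""
    return pixels(name) / pixels(reference)


if __name__ == "__main__":
    for name in H263_FORMATS:
        print(f"{name:6s} {pixels(name):8d} pixels, {ratio(name, 'CIF'):7.3f} x CIF")
```

Measured in pixels per frame, SQCIF comes out at a little under half of QCIF, and 4CIF and 16CIF at exactly 4 and 16 times CIF.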
MJPEG
There is really no such standard as "motion JPEG" or "MJPEG" for
video. Various vendors have applied JPEG to individual frames of a
video sequence, and have called the result "M-JPEG". JPEG is
designed for compressing either full-color or gray-scale images of
natural, real-world scenes. It works well on photographs, naturalistic
artwork, and similar material; not so well on lettering, simple
cartoons, or line drawings. JPEG is a lossy compression algorithm
which uses DCT-based encoding. JPEG can typically achieve 10:1 to
20:1 compression without visible loss; 30:1 to 50:1 compression is
possible with small to moderate defects, while for very-low-quality
purposes such as previews or archive indexes, 100:1 compression is
quite feasible. Non-linear video editors are typically used in
broadcast TV, commercial post production, and high-end corporate
media departments. Low bitrate MPEG-1 quality is unacceptable to
these customers, and it is difficult to edit video sequences that use
inter-frame compression. Consequently, non-linear editors (e.g.,
AVID, Matrox, FAST, etc.) will continue to use motion JPEG with low
compression factors (e.g., 6:1 to 10:1).
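To put the compression factors above into bandwidth terms, the following Python sketch computes the bit rate of an M-JPEG stream. The frame size (CIF, 352x288), colour depth (24 bits per pixel) and frame rate (30 fps) are assumptions chosen for illustration, not figures from any particular editing product.

```python
def raw_bitrate_bps(width, height, bits_per_pixel, fps):
    """Bit rate of uncompressed video in bits per second."""
    return width * height * bits_per_pixel * fps


def mjpeg_bitrate_bps(width, height, bits_per_pixel, fps, compression_ratio):
    """Bit rate when every frame is independently compressed by the given ratio."""
    return raw_bitrate_bps(width, height, bits_per_pixel, fps) / compression_ratio


if __name__ == "__main__":
    for ratio in (6, 10, 20, 50, 100):
        mbps = mjpeg_bitrate_bps(352, 288, 24, 30, ratio) / 1_000_000
        print(f"{ratio:3d}:1 -> {mbps:6.2f} Mbit/s")
```

Under these assumptions, even 100:1 compression leaves roughly 0.73 Mbit/s, which is why intra-frame-only coding suits editing workstations better than modem delivery.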
MPEG-1
MPEG 1, 2 and 4 are currently accepted, draft and developing standards respectively,
for the bandwidth efficient transmission of video and audio. The MPEG-1 codec
targets a bandwidth of 1-1.5 Mbps offering VHS quality video at CIF (352x288)
resolution and 30 frames per second. MPEG-1 requires expensive hardware for real-
time encoding. While decoding can be done in software, most implementations
consume a large fraction of a high-end processor. MPEG-1 does not offer resolution
scalability and the video quality is highly susceptible to packet losses, due to the
dependencies present in the P (predicted) and B (bi-directionally predicted) frames.
The B-frames also introduce latency in the encode process, since encoding frame N
needs access to frame N+k, making it less suitable for video conferencing.
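The latency introduced by B-frames can be seen by reordering a group of pictures from display order into coding order. The Python sketch below is a simplified model: the function names are illustrative, and it assumes each B-frame depends only on the nearest following I- or P-frame as its backward reference.

```python
def coding_order(display_order):
    """Reorder frame types from display order to coding (transmission) order.

    Each B-frame needs the next I- or P-frame as its backward reference,
    so that anchor frame is emitted before the B-frames that precede it
    in display order. Returns a list of (display_index, frame_type).
    """
    out = []
    pending_b = []  # B-frames waiting for their backward reference
    for index, frame_type in enumerate(display_order):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:
            out.append((index, frame_type))  # anchor (I or P) goes first
            out.extend(pending_b)            # then the B-frames it unblocks
            pending_b = []
    out.extend(pending_b)  # trailing B-frames with no later anchor
    return out


def max_lookahead(display_order):
    """Largest k such that some B-frame N must wait for anchor frame N+k."""
    worst = 0
    next_anchor = None
    for index in range(len(display_order) - 1, -1, -1):
        if display_order[index] == "B":
            if next_anchor is not None:
                worst = max(worst, next_anchor - index)
        else:
            next_anchor = index
    return worst


if __name__ == "__main__":
    gop = "IBBPBBP"
    print("display:", gop)
    print("coding: ", "".join(t for _, t in coding_order(gop)))
    print("frames of look-ahead needed:", max_lookahead(gop))
```

For the sequence IBBPBBP the coding order is IPBBPBB and the encoder must wait up to two frame intervals, whereas a sequence without B-frames needs no look-ahead.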
MPEG-2
MPEG 2 extends MPEG 1 by including support for higher resolution video and
increased audio capabilities. The targeted bit rate for MPEG 2 is 4-15Mbits/s,
providing broadcast quality full-screen video. The MPEG 2 draft standard does cater
for scalability. Three types of scalability (Signal-to-Noise Ratio (SNR), Spatial and
Temporal) and one extension that can be used to implement scalability (Data
Partitioning) have been defined. Compared with MPEG-1, it requires even more
expensive hardware to encode and decode. It is also prone to poor video quality in the
presence of losses, for the same reasons as MPEG-1. Both MPEG-1 and MPEG-2 are
well suited to the purposes for which they were developed. For example, MPEG-1
works very well for playback from CD-ROM, and MPEG-2 is great for high-quality
archiving applications and for TV broadcast applications. In the case of satellite
broadcasts, MPEG-2 allows more than five digital channels to be encoded using the same
bandwidth as used by a single analog channel today, without sacrificing video quality.
Given this major advantage, the large encoding costs are really not a factor. However,
for existing computer and Internet infrastructures, MPEG-based solutions are too
expensive and require too much bandwidth; they were not designed with the Internet
in mind.
MPEG-4
The intention of MPEG 4 is to provide a compression scheme suitable for video
conferencing, i.e. data rates less than 64 kbit/s. MPEG4 will be based on the segmentation
of audiovisual scenes into AVOs or "audio/visual objects" which can be multiplexed
for transmission over heterogeneous networks. The MPEG-4 framework currently
being developed focuses on a language called MSDL (MPEG-4 Syntactic Description
Language). MSDL allows applications to construct new codecs by composing more
primitive components and providing the ability to dynamically download these
components over the Internet. This philosophy is similar to that for the multimedia
APIs being developed for Sun Microsystems' Java, where it will be possible to
dynamically download codec components. This trend is also seen in products from
major vendors such as Microsoft and Netscape, where they allow for multiple audio
and video codecs to be plugged into their real-time streaming solutions.
Scalable Video Compression Techniques
These can be sub-divided into DCT-based schemes (which include H.261, H.263,
MPEG1 and MPEG2), wavelet and sub-band schemes, fractal-based schemes and
image segmentation/region based compression schemes (MPEG4).
Subband/Wavelet Coding
The majority of scalable video codecs are based on subband coding techniques of
which the most widely used is the wavelet transform. VDONet and VXtreme use
wavelet codecs. There is also a lot of work going on in research organisations looking
at the application of wavelet and subband coding techniques to scalable video codecs
- see sections 7.2, 7.3, 7.4, 7.5.
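Why subband coding lends itself to scalability can be shown with the simplest possible split, a one-level Haar-style decomposition of a row of samples. The Python sketch below is illustrative only; it is not the transform used by any of the products named here, and the function names are hypothetical.

```python
def haar_split(samples):
    """Split an even-length list into a low band (pairwise averages)
    and a high band (pairwise half-differences)."""
    if len(samples) % 2:
        raise ValueError("expected an even number of samples")
    pairs = list(zip(samples[0::2], samples[1::2]))
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low, high


def haar_merge(low, high=None):
    """Rebuild the samples from the bands.

    If the high band is missing (for example, not delivered over a slow
    link), each average is repeated, giving a coarser approximation.
    """
    if high is None:
        high = [0.0] * len(low)
    out = []
    for avg, diff in zip(low, high):
        out.extend([avg + diff, avg - diff])
    return out


if __name__ == "__main__":
    row = [10, 12, 20, 24]
    low, high = haar_split(row)
    print("both bands:   ", haar_merge(low, high))
    print("low band only:", haar_merge(low))
```

A receiver that gets both bands recovers the row exactly; one that gets only the low band still obtains a usable, lower-detail version from half the data.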
Fractal Video Coding
Various research groups [13, 14] are investigating the application of
fractal compression to scalable video. Iterated Systems have
developed a commercial product which has been implemented
within Progressive Networks' RealVideo product.
Image Segmentation and Object-based Video Coding
A number of research groups are investigating the application of image segmentation
to video compression. The approaches involve extracting important subsets of the
image content of each frame and only delivering the most important e.g. object
boundaries, moving objects. Object-based coding can achieve very high data
compression rates while maintaining an acceptable visual quality in the decoded
images. However object-based coders are computationally intensive and to be viable
as a real time process, an object-based coder would need to have the image
segmentation algorithm implemented as a VLSI array. See section 7.6, the UC Davis
Image Sequence Processing Group [22]; section 7.7, the Video Communication Research
Group (VCRG), Uni. of Western Australia [19]; and the Bath Video Coding Group
[21]. The MPEG4 standard is directly related to this content-based scalable video codec
approach.
Internet Transport Protocols
TCP Transmission Control Protocol
HTTP (Hypertext Transfer Protocol) uses TCP as the protocol for
reliable document transfer. If packets are delayed or damaged, TCP
will effectively stop traffic until either the original packets or
retransmitted packets arrive. Hence it is unsuitable for video and
audio because: (i) TCP imposes its own flow control and windowing
schemes on the data stream, effectively destroying the temporal
relations between video frames and audio packets; and (ii) reliable
message delivery is unnecessary for video and audio, since losses are
tolerable and TCP retransmission causes further jitter and skew.
UDP
UDP (User Datagram Protocol) is the alternative to TCP. RealPlayer,
StreamWorks and VDOLive use this approach. (RealPlayer gives you
a choice of UDP or TCP, but the former is preferred.) UDP forsakes
TCP's error correction and allows packets to drop out if they're late
or damaged. When this happens, you'll hear or see a dropout, but
the stream will continue. Despite the prospect of dropouts, this
approach is arguably better for continuous media delivery. If
broadcasting live events, everyone will get the same information
simultaneously. One disadvantage to the UDP approach is that
many network firewalls block UDP traffic. While Progressive
Networks, Xing and VDOnet offer work-arounds for client sites
(revert to TCP), some users simply may not be able to receive UDP
streams.
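The behaviour described above, where late or lost packets cause a dropout but the stream carries on, can be modelled with a small Python simulation of a receiver's playout schedule. The function name, the fixed playout delay and the one-packet-per-frame assumption are all illustrative; none of the products discussed is known to work exactly this way.

```python
def play_out(packets, playout_delay_ms, frame_interval_ms):
    """Simulate a receiver that plays frames on a fixed schedule.

    packets is a list of (sequence_number, arrival_time_ms) pairs for the
    packets that actually arrived; lost packets are simply absent.
    Frame n is due at playout_delay_ms + n * frame_interval_ms. A packet
    arriving after its deadline is discarded rather than waited for.
    Returns (played, dropped) lists of sequence numbers.
    """
    played, dropped = [], []
    for seq, arrival in sorted(packets):
        deadline = playout_delay_ms + seq * frame_interval_ms
        if arrival <= deadline:
            played.append(seq)
        else:
            dropped.append(seq)
    return played, dropped


if __name__ == "__main__":
    # Packet 3 never arrives; packet 2 arrives too late to be useful.
    arrivals = [(0, 20), (1, 130), (2, 500), (4, 450)]
    played, dropped = play_out(arrivals, playout_delay_ms=100, frame_interval_ms=100)
    print("played: ", played)
    print("dropped:", dropped)
```

In this run frames 0, 1 and 4 are played, frame 2 is discarded as late and frame 3 is simply missing. A TCP receiver would instead stall at frame 2 until it arrived, delaying everything after it.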
Server or Serverless
Two major approaches are emerging for streaming multimedia
content to clients. The first is the server-less approach which uses
the standard web-server and the associated HTTP protocol to get
the multimedia data to the client. The second is the server-based
approach that uses a separate server specialized to the
video/multimedia streaming task. The specialization takes many
forms, including optimized routines for reading the huge multimedia
files from disk, the flexibility to choose any of
UDP/TCP/HTTP/Multicast protocols to deliver data, and the option to
exploit continuous contact between client and server to dynamically
optimize content delivery to the client. The primary advantages of
the server-less approach are: (i) there is one less software piece to
learn and manage, and (ii) from an economic perspective, there is
no video-server to pay for. In contrast, the server-based approach
has the advantages that it: (i) makes more efficient use of the
network bandwidth, (ii) offers better video quality to the end user,
(iii) supports advanced features like admission control and
multi-stream multimedia content, (iv) scales to support a large number of
end users, and (v) protects content copyright. The tradeoffs clearly
indicate that for serious providers of streaming multimedia content
the server-based approach is the superior solution. RealPlayer,
StreamWorks and VDOnet's VDOLive require you to install their A/V
server software on your Web server computer. Among other things,
this software can tailor the quality and number of streams, and
provide detailed reports of who requested which streams. Other
programs, such as Shockwave and VivoActive, are serverless. They
don't require any special A/V server software beyond your ordinary
Web server software. With these programs, you simply link a file on
your server's hard drive from a Web page. When someone hits the
link, the file starts to download. Serverless programs are simple to
incorporate into a Web site but don't have the reporting capabilities
of server-based programs. And because they lack both stream- and
bandwidth-management features, they may be problematic if you
need to support many simultaneous streams.
Java Replayers Replacing Plugins
New solutions are appearing which use Java to eliminate the need
to download and install plugins or players. Such an approach will
become standard once the Java Media Player APIs being developed
by Sun, Silicon Graphics and Intel are available. This approach will
also ensure client platform independence. Vosaic appears to be one
of the few products with a Java replayer which supports H.263.
Firewalls
Nearly all streaming products require users behind a firewall to have
a UDP port opened for the video streams to pass through (1558 for
StreamWorks, 7000 for VDOLive, 7070 for RealAudio). Rather than
punch security holes in the firewall, Xing/StreamWorks has
developed a proxy software package you can compile and use, while
VDONet/VDOLive and Progressive Networks/RealPlayer are
approaching leading firewall developers to get support for their
streams incorporated into upcoming products. Currently a number
of products change from UDP to HTTP or TCP when UDP can't get
through firewall restrictions. This reduces the quality of the video. In
all cases, it's still a security issue for network managers.
Commercial Real Time Video Streamers
Macromedia's Streaming Shockwave
Shockwave for Director consists of two components. On the HTTP
server side, the Afterburner tool compresses Director movies to
make them available on the Internet. On the client side, the
Shockwave plugin lets the user incorporate Director movies into the
page layout of their HTML document. The current Shockwave plugin
is not streaming. The entire Director movie must be downloaded
before playback. The current release allows for a separate real-time
audio stream which can be encoded at 8, 16, 32 or 64 kbps,
depending on the most likely bandwidth available to users.
Macromedia have just released Director 6 Multimedia Studio which
supposedly includes new Streaming Shockwave technology.
Macromedia and Progressive Networks have also announced the
integration of Shockwave Flash, a vector-based animation and
graphics system, on top of RealMedia, to enable audio and video
streaming of output from Flash. Shockwave is a serverless product which relies
on the HTTP protocol only. It isn't capable of live feeds and makes
no use of IP Multicast, so it can't scale well to support thousands of
enterprise customers while efficiently using bandwidth.
Progressive Networks' RealVideo
Progressive Networks has recently launched RealVideo, the
streaming video version of their well-known RealAudio product. Both
server and client versions have been released. In addition
Progressive Networks have released a range of video-oriented
content development tools, some their own, others developed by
third parties. Users need to install the RealServer 4.0 and the
RealPlayer Plus 4.0. It uses the RTSP protocol on top of UDP. Users
apparently have a choice of either fixed or optimized frame rate
encoding in the new RealVideo encoder. Users choose between a
number of pre-defined encoding templates which correspond to the
most appropriate audio and video formats for a given bandwidth.
"Stream thinning" detects poor or congested Internet connections
and will dynamically adjust the video frame rate in real-time. This is
presumably frame dropping. "Smart networking" automatically
delivers audio and video streams via the most efficient network
protocol. This is presumably choosing between TCP, UDP or UDP
multicast. The choice of TCP would be to deal with firewall
restrictions blocking UDP. Progressive Networks have recently
licensed in ClearVideo, a fractal-based video compression
technology from Iterated Systems (see http://www.iterated.com) to
complement their internally-developed compression methods.
RealVideo 1.0 provides two codecs: RealVideo Standard (developed
by Progressive Networks) and RealVideo Fractal (using ClearVideo
technology from Iterated Systems, Inc.).
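If "stream thinning" is indeed frame dropping, as presumed above, its effect can be approximated with the Python sketch below. This is one possible reading rather than Progressive Networks' actual algorithm: the function name is hypothetical, audio is assumed to be kept intact, and every video frame is assumed to cost the same number of bits, which real codecs do not satisfy.

```python
def thinned_frame_rate(encoded_fps, video_kbps, audio_kbps, available_kbps):
    """Frame rate to send so that audio plus video fits the measured link.

    Audio is sent in full; video frames are dropped in proportion to the
    shortfall. Returns 0.0 when only the audio fits.
    """
    video_budget = available_kbps - audio_kbps
    if video_budget <= 0:
        return 0.0  # audio only
    scale = min(1.0, video_budget / video_kbps)
    return encoded_fps * scale


if __name__ == "__main__":
    # A clip encoded at 15 fps with 40 kbps of video and 8 kbps of audio.
    for link in (100, 28, 8):
        fps = thinned_frame_rate(15, 40, 8, link)
        print(f"{link:3d} kbps available -> {fps:4.1f} fps")
```

The sketch reduces frame rate only and never picture quality per frame, which is the behaviour the text attributes to frame-dropping schemes.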
Xing Technology's StreamWorks
StreamWorks streams video and audio over the WWW using
UDP/IP. Video streams can be MPEG1 while audio can be MPEG1 or
MPEG1 private data streams containing MPEG2 LBR audio. Providers
encode content at 8.5, 24, 56 or 112 kbps depending on the
bandwidth capabilities of the potential users. StreamWorks supports
a process called thinning which reduces a high-bandwidth stream so
it can be transmitted over a lower bandwidth connection. At low
bandwidths, the software maintains a continuous audio stream of 8
to 10 Kbps, and the video stream uses whatever bandwidth is left.
The MPEG-based compression allows the software to drop frames
from the stream, creating a jerky video sequence with almost no
motion, while maintaining a smooth audio playback. The quality of
the frames that do get through is still pretty good, just not as fluid
as one would expect from real video. StreamWorks is able to
broadcast streams to "relay servers". By using a star configuration,
it's possible to provide a video feed from a single server to regional
servers that then provide that stream to desktop clients.
StreamWorks' technology includes three components: the client
software, the server software and a video capture/encoding box
called the AVTrans encoder to compress audio and video streams.
These streams are transferred to a Unix server running the
StreamWorks server software over a TCP/IP network and, from
there, are broadcast over the network to client workstations. The
AVTrans encoder is capable of creating a range of compressed
streams, ranging from an 8.5-Kbps low bit rate format that produces
8 kHz mono audio on the client, to a 112-Kbps stream that
provides 44 kHz stereo audio or 30-frames-per-second,
quarter-screen video for large bandwidth connections such as Ethernet or a
T1. Like the other products examined in this review, StreamWorks
requires you to register its mime type in your Web server's
configuration file, and you need to open a UDP port (1558) for
delivering video to client workstations. The server software can
recode the compressed streams on the fly to compensate for large
numbers of users and a limited bandwidth. The server is configured
from a text file, so you can limit the total bandwidth output, the
maximum number of simultaneous streams and the maximum bit
rate per stream. The maximum default configuration for the server
is 10 Mbps for an Ethernet connection, but that can be adjusted
depending on how your client machines are connecting--via 14.4-
Kbps modem pool, ISDN hub or 100-Mbps backbone. With a 28.8-
Kbps modem connection, the StreamWorks server drops to a much
lower frame rate of 2 to 3 frames per second, producing a jerky,
halting video image while maintaining continuous audio.
Client performance is better than VDOLive.
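The three server limits mentioned above (total output bandwidth, number of simultaneous streams, and bit rate per stream) amount to a simple admission-control policy. The Python sketch below shows one way such limits could be enforced; the class and field names are hypothetical and do not reflect the actual StreamWorks configuration file format.

```python
from dataclasses import dataclass, field


@dataclass
class ServerLimits:
    max_total_kbps: float   # cap on total output bandwidth
    max_streams: int        # cap on simultaneous streams
    max_stream_kbps: float  # cap on any single stream's bit rate


@dataclass
class StreamServer:
    limits: ServerLimits
    active_kbps: list = field(default_factory=list)

    def admit(self, requested_kbps):
        """Return the granted bit rate in kbps, or None if the request is refused."""
        if len(self.active_kbps) >= self.limits.max_streams:
            return None
        # A request above the per-stream cap is served at the cap.
        granted = min(requested_kbps, self.limits.max_stream_kbps)
        if sum(self.active_kbps) + granted > self.limits.max_total_kbps:
            return None
        self.active_kbps.append(granted)
        return granted


if __name__ == "__main__":
    server = StreamServer(ServerLimits(max_total_kbps=200, max_streams=3,
                                       max_stream_kbps=112))
    for request in (112, 500, 56, 24, 8.5):
        print(f"request {request} kbps -> {server.admit(request)}")
```

Refusing a new client outright, rather than degrading every existing stream, is a design choice made for this sketch; a server that recodes streams on the fly could instead lower the rate of the streams already running.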
VDONet's VDOLive
VDONet claim that content providers only need one video source
which can be scaled on the fly for both high- and low-bandwidth
connections. They claim to be able to deliver 10-15 fps over a 28.8
kbps modem using a proprietary video compression scheme based
in part on wavelet techniques (VDOWave). Under ideal conditions
(minimal Internet traffic, no local network overhead, minimal
overload on the VDOLive On-Demand Server) the frame rates are:
with a 14.4 kbps modem, up to 2 to 3 frames per second; with a
28.8 kbps modem, from 8 to 12 frames per second; with an ISDN
line, up to 20 frames per second. VDONet's VDOLive boasts a slightly
higher frame rate
over a standard 28.8-Kbps modem than StreamWorks because it
uses a wavelet compression technology that lets it shave layers of
quality off each frame that's transmitted, rather than dropping
whole frames. This creates a stream that is smoother at low bit
rates, but of lower visual clarity and quality. VDOLive appears to be
the only commercial product which tries to estimate bandwidth and
adapt dynamically. The image quality is very poor at times but
audio is good. VDOLive includes two programs, VDO Capture which
lets you capture video streams and VDO Clip which compresses a
previously captured video stream and encodes it for delivery from a
VDO server. VDO Capture supports seven full-motion video cards
that can capture 16- or 24-bit color images at 15 frames per second
in a frame size ranging from 64-by-64 pixels to 250-by-176 pixels.
Unfortunately, existing AVI files that don't meet these criteria can't
be used unless they're converted. The VDOLive client is blunt, but
effective. Hitting the play button calls up a window for you to enter
an address for the VDOLive meta file that points to the video stream
you want to launch. There are also a few user-configurable
parameters behind this window. VDOLive is supported by some
firewall vendors. However if UDP-based video is blocked by a
firewall, VDOLive resorts to TCP-based video instead. VDONet's
codec VDOWave has been included in the codecs shipped with
Microsoft's NetShow since 1996. Microsoft hold an equity stake in
VDONet.
Vosaic
Based on research at the University of Illinois, Vosaic uses the
Video Datagram Protocol (VDP). VDP is basically an
augmented RTP. VDP improves reliability by creating two separate
channels between the client and server; one is a control channel the
two machines use to coordinate what information is being sent
across the network, and the other channel is for the streaming data.
A server would first send the client what amounts to an inventory of
the stream that is about to be broadcast. The client then uses this
list to tell the server which segments to deliver, and if a segment of
the stream is lost or delayed, the client can simply ask for that
segment to be resent. The stream itself is buffered on the client side,
providing for smooth playback in most cases. VDP also uses
adaptive flow control on the server side that can adapt the packet
flow based on how well the client is doing. If the client is doing well
and receiving all the frames, the server can increase the number of
packets being sent out onto the network. If the client is having
trouble keeping up or the network is so loaded that packets are
being delayed, the server can drop packets from the stream. VDP is
designed to preserve network bandwidth in response to both
network congestion as well as client CPU load. Vosaic supports video
and audio standards including MPEG1, MPEG2, GSM audio, and
H.263. To view Vosaic's streaming videos you need the Vosaic
plug-in. It also requires you to download both a Vosaic client and a
server. There is a new version out based on Java. Vosaic
MediaStudio is a Java-based authoring application which can
convert AVI/ASF formats and MPEG1/2 formats into bandwidth
compatible MPEG or H.263 files. The quality settings (target frame
rate, quantisation, MPEG frame sequence (IPBIPB)) need to be pre-set
depending on the likely connection bandwidths of your clients.
Vosaic appears to be quite similar to SuperNOVA. It uses both
feedback and a feedforward scheme to adapt to both network and
end system conditions. However it doesn't include end-to-end QoS
management with user interaction. Dynamic scaling is only frame-
dropping, within the boundaries pre-determined at capture time. It
does not support transcoding on the fly. On a T1 link your source is
MPEG while on a 28K link your source is H.263. On the plus side,
they already have a 100% Java H.263 player. Vosaic had a lot of
audio dropouts compared to VDOLive, which maintains audio at all
costs. It delivered 8-bit video only and suffered from missing blocks
due to packets being lost, a consequence of MPEG1-encoded video.
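The adaptive flow control attributed to VDP above, where the server sends more when the client keeps up and less when it does not, can be sketched as a feedback rule. The Python example below uses additive increase and multiplicative decrease, a standard technique substituted here for illustration; the actual VDP algorithm, its parameters and its feedback message format are not given in this review, so every name and constant is an assumption.

```python
def adapt_rate(current_pps, loss_fraction, min_pps=1.0, max_pps=30.0,
               increase_step=1.0, decrease_factor=0.5, loss_threshold=0.05):
    """Return the server's next sending rate in packets per second.

    loss_fraction is the share of packets the client reported as lost or
    late since its previous feedback message (0.0 to 1.0).
    """
    if loss_fraction > loss_threshold:
        new_rate = current_pps * decrease_factor  # back off quickly
    else:
        new_rate = current_pps + increase_step    # probe for more bandwidth
    # Keep the rate within the configured bounds.
    return max(min_pps, min(max_pps, new_rate))


if __name__ == "__main__":
    rate = 10.0
    for reported_loss in (0.0, 0.0, 0.2, 0.0, 0.5):
        rate = adapt_rate(rate, reported_loss)
        print(f"client reported {reported_loss:.0%} loss -> send {rate:.1f} packets/s")
```

The same rule responds to both causes named in the text, network congestion and client CPU load, because either one shows up to the server as frames the client failed to receive or process in time.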
VXtreme
VXtreme consists of a number of WebTheater products: Web
Theater Client, Server, Producer, LiveStation, and Personal Edition.
VXtreme's software-only compression technology automatically
adapts the bandwidth of the video to the network connection.
VXtreme's Web Theater software uses RTP (Real-time Transport Protocol) as
its network delivery mechanism, extended to include mechanisms
for packet loss recovery. VXtreme's compression method is
non-standard. They claim it offers bandwidth scaling and software-only
capability. It is apparently not based on DCT or motion estimation
(H.261, H.263, MPEG1,2) or wavelets which they claim are compute-
intensive and require hardware-support. For the multicast case,
VXtreme uses a layered compression scheme to divide the
compressed video into multiple streams with differing priorities
(based on importance to visual quality). This layered approach
reduces jitter caused by frame dropping and delivers smoother but
lower resolution video. They have a bizarre congestion control
method which freezes both audio and video and then restarts. Their
proprietary encoding method is just as blocky as DCT-based
encoding. Microsoft has recently acquired VXtreme's codec to ship
with NetShow.
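The layered multicast scheme described above can be illustrated from the receiver's side: each receiver takes as many layers, in priority order, as its connection can carry. The Python sketch below is a generic illustration of layered delivery; the layer names and bit rates are invented for the example and are not VXtreme's.

```python
def select_layers(layers, available_kbps):
    """Choose which layers a receiver should subscribe to.

    layers is a list of (name, kbps) ordered from most to least important.
    Layers are taken in priority order and selection stops at the first
    layer that does not fit, because each higher layer only refines the
    layers below it. Returns (chosen_names, total_kbps_used).
    """
    chosen, used = [], 0.0
    for name, kbps in layers:
        if used + kbps > available_kbps:
            break
        chosen.append(name)
        used += kbps
    return chosen, used


if __name__ == "__main__":
    # Hypothetical layering of one compressed video source.
    layers = [("base", 20), ("enhancement-1", 30), ("enhancement-2", 60)]
    for link in (28.8, 64, 128):
        names, used = select_layers(layers, link)
        print(f"{link:5.1f} kbps link -> {names} ({used:.0f} kbps)")
```

Because every receiver draws on the same set of streams, a single encoded source can serve modem and ISDN users at once, which is the property the comparison later in this review calls scalability.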
VivoActive
The VivoActive player supports audio/video streaming of
proprietary VIV files over the web with standard HTTP connections.
VIV files are compressed (up to 250:1) files created by the
VivoActive producer. Presently, the Producer can be downloaded for
free. The plug-in works well with VIV files, but not many sites have
VIV files. The VIVO format uses H.263 video compression and G.723
audio compression. No separate video server is required, and the
product uses HTTP rather than UDP. While Vivo acknowledged that
there is some inevitable loss in speed and quality using HTTP vs.
UDP, they argued it is negligible, and that it is more than made up
for by the fact that HTTP, which will continue to send streams even
when packets are dropped, is more flexible and less of a bandwidth
hog than UDP. VivoActive is not truly scalable: users control how a
video file is compressed and delivered by specifying a bandwidth.
You can choose from a variety of predefined settings to optimize
your video depending on the type of content you're streaming and
the network connection of your audience (modem, ISDN, T1, LAN).
The Producer lets you customize the data rate, frame rate, output
size, audio quality and buffering parameters for your streaming video.
Microsoft's NetShow
Microsoft's NetShow expects the user to first create an ASF (Active
Movie Streaming) stream. The user has to choose from a range of
audio and video codecs depending on their bandwidth availability.
Codecs on offer include MPEG-layer3, Microsoft MPEG-4, Vivo G.723
(audio) and H.263 (video). Content can be produced using
VivoActive. It doesn't appear to offer dynamic scalability but relies
on the user to choose from a table of codecs depending on whether
they are on a 28.8Kbps modem, 56Kbps ISDN or 110Kbps Intranet
connection. NetShow will also support the Progressive Networks
RealAudio and RealVideo formats. It requires both a client (NetShow
Player) and a server (NetShow Server). There is also a set of
NetShow Content Creation Tools. It uses the UDP protocol and relies
on port 1755 to get through firewalls. A Netscape plug-in is used to
replay the video. The major limitation of NetShow is that it doesn't
support high quality video formats which would be deliverable over
high bandwidth connections. But it does deliver very good quality
video (using the latest compression standards, H.263 and MPEG4) at
low bandwidths. The advantage of NetShow is its flexibility. It
supports a range of audio and video codecs which can simply be
plugged into the NetShow architecture to provide a range of
video/audio streaming solutions. Codecs on offer include: Duck
TrueMotion RT, MPEG Layer-3, Iterated Systems' ClearVideo, Microsoft
MPEG4, VDOnet's VDOWave, Vivo H.263, Intel H.263. In addition,
they have just acquired VXtreme. See
http://www.microsoft.com/netshow/codecsship.htm
Comparison of Commercial Video Streaming Products
The previous section describes the 8 major players in this field. The
best ones are those which deliver the highest quality video for a
given bandwidth i.e. lowest delay, no jitter (low frame loss), good
audio/visual synchronisation, high quality audio and image
resolution. In addition, the ability to provide the best possible video
quality over a range of networks/bandwidths without content
duplication is highly desirable. This characteristic is referred to as
scalability.
All commercial products except ShockWave claim some form of
video scalability. Investigation reveals that often the claims of
scalability are not what they appear to be or are simply misleading.
The scalability is more often static than dynamic, and
there is little user control over the visual manifestation of this
scalability.
The currently available commercial products offer two types of
scalability. Firstly, there is scalability at the encoding stage. Users
are given a range of encoding formats to choose from, which
correspond to a range of bandwidths. The limitation of this
scalability is that users need to know the bandwidth in advance.
This is inflexible - any unpredicted load cannot be handled
gracefully. Additionally, in a multi-receiver scenario the selected
bandwidth must be that of the lowest-capacity channel. This is an
unrealistic restriction and a waste of bandwidth for higher capacity
receivers. Also, forcing an individual to select a bandwidth assumes
some technical awareness, and the choice does not readily convey the
visual quality of the selected video. Multiple formats are not
supported from a single source; instead, a separate clip must exist
in each desired format. This entails an overhead in administration
and storage of audio and video material.
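The encoding-stage scalability described above can be illustrated with
a minimal sketch. It assumes a hypothetical table of pre-encoded
profiles (the names and bit rates below are illustrative, echoing the
connection classes mentioned in this review, and are not taken from any
product). It shows why a single pre-encoded stream shared by several
receivers must match the slowest link, and how much capacity the faster
receivers then leave unused.

```python
# Hypothetical table of pre-encoded profiles: (label, bit rate in Kbps).
# The values are illustrative assumptions, not any vendor's settings.
PROFILES = [
    ("28.8K modem", 28.8),
    ("56K ISDN", 56.0),
    ("110K intranet", 110.0),
]


def pick_profile(bandwidth_kbps):
    """Return the highest-rate profile that fits the link, or None."""
    best = None
    for name, rate in PROFILES:
        if rate <= bandwidth_kbps:
            best = (name, rate)
    return best


def pick_shared_profile(receiver_bandwidths_kbps):
    """One pre-encoded stream for many receivers must fit the slowest link."""
    return pick_profile(min(receiver_bandwidths_kbps))


def unused_capacity(receiver_bandwidths_kbps):
    """Capacity (Kbps) each receiver leaves idle under the shared profile."""
    profile = pick_shared_profile(receiver_bandwidths_kbps)
    rate = profile[1] if profile else 0.0
    return [round(b - rate, 1) for b in receiver_bandwidths_kbps]


if __name__ == "__main__":
    receivers = [28.8, 128.0, 1544.0]  # modem, ISDN, T1 (illustrative)
    print(pick_shared_profile(receivers))
    print(unused_capacity(receivers))
```

Because the profile is fixed when the clip is encoded, nothing in this
scheme can react to a change in load after playback has started.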
Secondly, some of the products also incorporate some kind of
dynamic scalability based on the available bandwidth at the time.
Where dynamic scalability is provided it is usually simple frame
dropping. This is not ideal because it can cause jerkiness and loss of
synchronisation. Alternatively, a layered or hierarchical compression method
can be used. Layered compression methods usually lose image
quality or resolution but maintain frame rate as the bandwidth
drops. VXtreme claims to use a layered compression method but it
only supports AVI and MOV file formats.
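The two dynamic approaches just described can be contrasted in a short
sketch. It is a simplification under stated assumptions: all frames are
treated as equal in size and independently decodable (real codecs with
inter-frame prediction cannot drop arbitrary frames), and the layer bit
rates are invented for illustration. The function names are
hypothetical and do not correspond to any product's API.

```python
import math


def frame_drop_factor(stream_bps, bandwidth_bps):
    """Smallest k such that sending every k-th frame fits the link.

    Assumes equal-sized, independently decodable frames.
    """
    return max(1, math.ceil(stream_bps / bandwidth_bps))


def drop_frames(frames, factor):
    """Frame dropping: keep every `factor`-th frame; the frame rate falls."""
    return frames[::factor]


def select_layers(layer_rates_bps, bandwidth_bps):
    """Layered coding: send the base layer plus as many enhancement
    layers as fit, in order. Returns the number of layers sent; the
    frame rate is unchanged, only quality or resolution is reduced."""
    total = 0
    count = 0
    for rate in layer_rates_bps:
        if total + rate > bandwidth_bps:
            break
        total += rate
        count += 1
    return count


if __name__ == "__main__":
    frames = list(range(12))                      # 12 frame indices
    k = frame_drop_factor(100_000, 28_800)        # 100 Kbps clip, 28.8K link
    print(k, drop_frames(frames, k))
    print(select_layers([20_000, 30_000, 50_000], 56_000))
```

With frame dropping the viewer receives fewer frames per second, which
is the source of the jerkiness noted above; with layering every frame
still arrives, but at reduced quality.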
VOSAIC supports a variety of codecs - H.263, MPEG1 and MPEG2 - to
suit the available bandwidth which can range from 28.8Kbps to T1.
The bandwidth must be specified at encoding so that the most
appropriate codec can be selected. Limited dynamic adaptation is
possible through frame dropping.
VDOLive is based on a proprietary wavelet encoding which enables
10-15fps, 1/4 screen video replay over 28.8Kbps. It scales
dynamically from 14.4Kbps modem to ISDN and Cable modems.
VivoActive offers a very simple solution for low bandwidth
connections. It doesn't require a server since it uses HTTP and it
simply uses the low bandwidth H.263 and GSM codecs to enable
embedded audio/video streaming over 28.8 Kbps modems. But it
doesn't support high quality video (MPEG1, MPEG2) over higher
bandwidths.
Progressive Networks' RealVideo has recently incorporated Iterated
Systems' fractal compression technology, which will improve its
ability to dynamically scale to a range of bandwidths.
The philosophy being adopted by major vendors such as Sun,
Microsoft and Netscape is to allow codec components to be plugged
in or downloaded dynamically over the Internet. The multimedia
APIs being developed for Sun Microsystems' Java will make it
possible to download codec components dynamically. Microsoft and
Netscape likewise allow multiple audio and video codecs to be
plugged into their real-time streaming solutions. Consequently,
Microsoft's NetShow, which has been designed to allow a variety of
codecs suited to differing applications to be easily incorporated,
offers flexibility and support for the latest scalable video
compression techniques.
Commercial Video Servers
High-end database-driven video servers are also available from
companies like IBM, Oracle, SGI, Sun and Tektronix. These products
should be considered for large scale applications or for serving large
numbers of simultaneous streams.
SGI WebForce
IBM VideoCharger and Digital Library MediaBase
Sun MediaCenter Servers
General Conclusions
Streaming video (and audio) across networks is an effort that is attracting many
participants. This is evidenced by the eight primary commercial and thirteen research
organisations involved with this technology in various ways. A key characteristic of
both the commercial products and research demonstrators is the diversity in
technological infrastructure e.g. networks, protocols, compression standards
supported.
All the commercial video products reviewed in this report are
optimised for low bandwidth modem or ISDN connections and are
not designed to scale to higher bandwidth networks. The video
needs to be pre-encoded with the target audience in mind.
The commercial products have either adopted/developed their own
proprietary standards, embraced the currently accepted standards
(e.g. MPEG) or implemented a combination of the two. Compatibility
between the commercial products has been limited because of
these proprietary standards. However, recent products such as Sun's
MediaFramework API and Microsoft's NetShow have been designed
to enable new and various codecs to be easily incorporated into
their framework.
H.263 and MPEG-4 are going to become the de facto standards for
video delivery over low bandwidths. Broadband standards such
as MPEG-1 and MPEG-2, which are useful for many types of
broadcast and CD-ROM applications, are unsuitable for the Internet.
Although MPEG-2 has had scalability enhancements, these will not
be exploitable until reasonably priced hardware encoders and
decoders which support scalable MPEG-2 become available.
Codecs designed for the Internet require greater bandwidth
scalability, lower computational complexity, greater resilience to
network losses, and lower encode/decode latency for interactive
applications. These requirements imply codecs designed specifically
for the diversity and heterogeneity of Internet delivery. The
research on Internet codecs has broadly taken two directions:
DCT-based and non-DCT-based. DCT-based video delivery, except for
MPEG-2, possesses no inherent scalability. To achieve adaptivity,
various operations can be applied to the (semi) compressed data
stream to reduce its bit rate. Amongst these operations is
transcoding, the conversion of one compression standard to
another. The beauty of the DCT-based approach is that it is
compatible with current and imminent draft compression standards.
Furthermore it allows re-use of existing audio and video archives
without explicitly re-coding them to cater for all possible formats.
Existing viewers also maintain their currency.
Non-DCT-based compression techniques, e.g. layered, sub-band and
wavelet coding, are intrinsically scalable. This is their great
attraction. Unfortunately, although several codecs exist, they are
still experimental in nature and often suffer from performance problems.
In addition, existing movie libraries would need to be re-coded, by
no means a trivial task.
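To see why sub-band and wavelet methods are described as intrinsically
scalable, consider a minimal one-level Haar decomposition of a single
row of pixel values. This is only an illustrative sketch: the Haar
filter is chosen here for simplicity and is not claimed to be the
filter used by any codec reviewed in this report, and the sample values
are invented. The signal splits into a coarse band and a detail band; a
receiver on a slow link can decode the coarse band alone, while a
receiver on a fast link adds the detail band and recovers the original
exactly.

```python
def haar_split(samples):
    """One level of Haar analysis on an even-length list.

    Returns (coarse, detail): pairwise averages and half-differences.
    """
    pairs = list(zip(samples[0::2], samples[1::2]))
    coarse = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return coarse, detail


def haar_merge(coarse, detail):
    """Inverse of haar_split: rebuild the full-resolution samples."""
    out = []
    for c, d in zip(coarse, detail):
        out.extend([c + d, c - d])
    return out


def decode(coarse, detail=None):
    """Decode with whatever bands arrived; a missing detail band is
    treated as zeros, giving a lower-resolution approximation."""
    if detail is None:
        detail = [0.0] * len(coarse)
    return haar_merge(coarse, detail)


if __name__ == "__main__":
    row = [10, 12, 20, 24, 30, 30, 8, 4]   # illustrative pixel values
    coarse, detail = haar_split(row)
    print(decode(coarse))                  # base band only
    print(decode(coarse, detail))          # base plus detail band
```

The same encoded material serves both receivers, which is the property
that the pre-encoded, one-clip-per-bandwidth products lack.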
The research projects reviewed in this chapter broadly fall into two
categories. One group is developing scalable video codecs, mainly
using sub-band coding. The other group is looking at scalable video
in the context of QoS. There is consensus in the research
community that the key to efficient delivery of continuous media
over heterogeneous networks is dynamic bandwidth adaptation. Of
these groups, the research carried out at Columbia in the
video-on-demand testbed seems the most significant work in this
area. This research is similar to SuperNOVA in some areas and
complementary in others.