EE 6850, F'02, Chang, Columbia U.
EE 6850 Lecture #10 (Nov. 13, 2002)
• Part I: Content Adaptation for Ubiquitous Media Access
  • On-demand dynamic distillation
  • Scalable delivery
  • Content-adaptive streaming and messaging
• References
  • A. Fox, E. Brewer, S. Gribble, and E. Amir, “Adapting to Network and Client Variability via On-Demand Dynamic Distillation,” ASPLOS, 1996.
  • J. R. Smith, R. Mohan, and C.-S. Li, “Scalable Multimedia Delivery for Pervasive Computing,” ACM Multimedia Conference, Orlando, FL, Oct. 1999.
  • T.-L. Pham, G. Schneider, and S. Goose, “A Situated Computing Framework for Mobile and Ubiquitous Multimedia Access Using Small Screen and Composite Devices,” ACM Multimedia Conference, Los Angeles, CA, Oct. 2000.
  • S.-F. Chang, D. Zhong, and R. Kumar, “Real-Time Content-Based Adaptive Streaming of Sports Video,” Columbia University ADVENT Technical Report #121, July 2001. Also IEEE Workshop on Content-Based Access to Video/Image Library, Hawaii, Dec. 2001.
  • H. Sundaram and S.-F. Chang, “Condensing Computable Scenes Using Visual Complexity and Film Syntax Analysis,” IEEE Conference on Multimedia and Exhibition, Tokyo, Japan, Aug. 22-25, 2001.
  • A. Jaimes, T. Echigo, M. Teraguchi, and F. Satoh, “Learning Personalized Video Highlights from Detailed MPEG-7 Metadata,” IEEE Intern. Conf. on Image Processing, NY, 2002.
  • S.-F. Chang, “Optimal Video Adaptation and Skimming Using a Utility-Based Framework,” Tyrrhenian International Workshop on Digital Communications (IWDC-2002), Capri Island, Italy, Sept. 2002.

The Need for Video Adaptation
• Heterogeneous users, networks, and terminals
• Content analysis to assist video adaptation decision
[Figure label: Broadcast Content]

Adaptation in 3-tier architecture
[Figure: 3-tier architecture. Labels: server, proxy/gateway, video, user, terminal, network, adaptation tool library, resource abstraction & monitoring, optimal adaptation]
Metadata:
• Transcoding hints
• Content adaptability

Need for a Systematic Framework
[Figure labels: task (active, e.g., search; passive, e.g., watching TV) + device (UI resources, CPU speed, bandwidth, audio?)]
• What are allowable adaptations?
• How much resource is available?
• How to model and optimize quality?

Active Proxy
• Active proxy dynamic distillers [Fox, Brewer, et al. 96]
  • Datatype-specific distillation
  • Intermediate proxy: low end-to-end latency
  • Network/application interface separation
  • Addresses different data types: image, text, and video (global level)
[Figure: Client variability (’96 data)]
Constraints: bandwidth, latency, data size, spatial-temporal size, color depth, audio

Datatype and adaptation axes
Distillation latency of images
[Figure panels: Variations: network; Variations: display; Variations: software]
Effect on end-to-end latency
Text Datatype (PDF, text, HTML)
Video

IBM InfoPyramid
• InfoPyramid, Universal Tuner [Smith, Li, Mohan 99]
  • Content adaptation with various fidelity and modality
  • Translation and transcoding
• Transcoding focused on images
• Allowable operators
  • Resolution, bit rate, color depth, modality substitution
• Each IP object is associated with a (rate, quality) descriptor
• Did not address rich multi-level elements and structures in video
• Did not address rapidly changing user/network conditions
Image transcoding methods
Detecting image purpose

A General Utility-Based Framework (Chang ’02)
[Figure: an entity e is mapped to points a1, a2, a3 in the adaptation space, with corresponding points r1, r2, r3 in the resource space (bounded by the constraint CR) and u1, u2, u3 in the utility space]
• Entity:
  • A set of video data with consistency constraints on certain attributes
  • Basic data unit undergoing adaptation
  • Examples:
    • Program, shot, scene
    • Frame, object, region
    • Syntactic or semantic elements, e.g., anchors, scoring segments
    • Synchronous multimedia entities, e.g., dialog, talking face, explosion

Multiple Degrees of Freedom for Adaptation
• Signal level
  • Change of bit rate, frame rate, resolution, color depth, SNR
• Time
  • Condensation by uniform time scaling, content-based filtering (selection/dropping)
• Modality conversion
  • Key frame shows, video posters, spatial summaries

Adaptation-Resource-Utility Spaces
• Adaptation Space
  • Temporal
    • Frame rate reduction
    • Time condensation
    • Entity removal
  • Spatial
    • Resolution
    • SNR, color depth
    • Object
  • Modality conversion
• Resource Space
  • Bandwidth
  • Computation
  • Memory
  • Power
  • Display
  • Viewer time
• Utility Space
  • Objective (SNR)
  • Subjective scores
  • Information comprehension

Resource Constrained Utility Maximization
• Given a content entity e and resource constraints CR, find the optimal adaptation operation a_opt within the search space A that maximizes the utility
  • $a_{\mathrm{opt}} = \arg\max_{a \in A} U(e, a)$, s.t. $R(e, a) < C_R$
• Similar to the rate-distortion approach used in video coding
• Can be extended to a utility-constrained resource minimization formulation
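The selection rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the candidate operations, their resource costs, and their utility scores are made-up values, and `best_adaptation` is a hypothetical helper name. It keeps only the operations whose resource use is strictly below the constraint CR and returns the one with the highest utility.

```python
from typing import Callable, Iterable, Optional, TypeVar

Op = TypeVar("Op")


def best_adaptation(
    candidates: Iterable[Op],
    utility: Callable[[Op], float],
    resource: Callable[[Op], float],
    c_r: float,
) -> Optional[Op]:
    """Return the feasible operation with the highest utility, or None."""
    feasible = [a for a in candidates if resource(a) < c_r]
    if not feasible:
        return None
    return max(feasible, key=utility)


# Hypothetical operations for one entity: name -> (bit rate in kbps, utility score).
OPS = {
    "full video": (500.0, 1.0),
    "half frame rate": (300.0, 0.8),
    "key frames only": (50.0, 0.3),
}

choice = best_adaptation(
    OPS,
    utility=lambda name: OPS[name][1],
    resource=lambda name: OPS[name][0],
    c_r=400.0,
)
print(choice)  # half frame rate
```

An exhaustive scan like this only works when the adaptation space is small and already sampled; how to represent and search larger spaces is one of the issues raised later in the lecture.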

Key Issues
• Given an entity, what are allowable adaptations in a specific environment?
  • Frames droppable from MPEG, shots removable in a scene
• Ambiguity in specifying adaptation operations, e.g.,
  • Drop 10% of transform coefficients
  • Uniform allocation vs. R/D optimization
• Options
  • Standard-compliant scalable adaptations
  • Empirical variation bounds among implementations
  • Describe rankings among competing operations, not their utility values

Key Issues
• Utility measure and modeling
  • How to measure and combine resource and utility of different modalities and levels
    • Power, bandwidth, display, user time; audio, video, signal, subjective, comprehension
  • Signal level vs. comprehension level
  • Relationships with user tasks and preferences
• Representation schemes for ARU spaces and relationships
  • Sample the permissible adaptation region and describe the associated R and U
    • E.g., binary selection from N shots → N(N-1) points
    • E.g., 4 frame rates, 2 resolutions → 8 adaptation points
  • Given resource constraints, sample the permissible adaptation region
  • Divide adaptation space into separate dimensions
    • Lose correlation among dimensions
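The "4 frame rates, 2 resolutions" example can be checked by enumerating the joint adaptation space directly. The sketch below is illustrative only: the specific frame rates and resolution names are assumed, since the slide gives just the counts, and `adaptation_points` is a hypothetical helper.

```python
from itertools import product
from typing import Sequence


def adaptation_points(*dimensions: Sequence) -> list:
    """Enumerate every combination of values across the adaptation dimensions."""
    return list(product(*dimensions))


# Hypothetical sample values; the slide only gives the counts (4 and 2).
FRAME_RATES_FPS = [30, 15, 10, 5]
RESOLUTIONS = ["CIF", "QCIF"]

joint = adaptation_points(FRAME_RATES_FPS, RESOLUTIONS)
print(len(joint))  # 8

# Treating each dimension separately needs only 4 + 2 samples, but it can no
# longer express how utility depends on the two choices together.
separate = len(FRAME_RATES_FPS) + len(RESOLUTIONS)
print(separate)  # 6
```

The gap between the two counts grows quickly with more dimensions, which is the trade-off behind dividing the space: fewer points to describe, at the cost of the correlation among dimensions.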

Example 1: Video Skim Generation (with H. Sundaram, 2001)
1. How much time is available? (CR)
2. What are allowable operations? (A)
3. How will operations alter aesthetic effects, perceptual quality, and semantic understanding? (U)
4. A resource-constrained utility maximization problem
Generalized utility framework for video skimming
[Figure: dropped frames; original scene → condensed clip (video skims). Demo clips: Original-1, 30% skim; Original-news, 17% skim]

The entities preserved in this case
• Video shot (duration altered)
  • The fundamental video entity; try to maximize the coherence of each retained video shot
• Segment beginnings (SBEGs): significant phrases
  • This is an element of the speech discourse
• Synchronous multimedia segments
  • Ensures maximum skim coherence
• Elements of visual syntax
  • Dialogs, regular anchors, shot phrases
• Film rhythm
  • Preserves the “pace” of the film

How to Construct Computable Utility Model?
• How much time is required for generic comprehension (who, what, where, when)?
• Is comprehension time related to the computable spatio-temporal complexity of the shot?
• Explore viewer perceptual model from film theory
  • The presence of detail robs a shot of its screen time [Sharff 1982].
[Figure: example shots (a) and (b)]

Measuring Comprehension Time
• Represent the shot by its key-frame
• A shot is selected at random [3600 shots]
• The subject was asked to correctly answer four questions in minimum time:
  • Who?
  • What?
  • When?
  • Where?
“Why” was not asked

Utility Function
• Plot of average time vs. complexity shows two bounds, an upper bound $U_b$ and a lower bound $L_b$:
$$U_b(c) = 2.40\,c + 1.11$$
$$L_b(c) = 0.61\,c + 0.68$$
Notation: t: duration; c: complexity; and a selection indicator sequence
[Figure: time (vertical axis) vs. complexity (horizontal axis, 0 to 1), showing the bounds Ub and Lb. An original shot is reduced to the upper bound; the utility function is defined between the bounds]
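As a small worked example, the two linear bounds on this slide, read as U_b(c) = 2.40c + 1.11 and L_b(c) = 0.61c + 0.68, can be evaluated directly. The sketch assumes time is in seconds and complexity c is normalized to [0, 1]; both are assumptions, as are the helper names. `reduce_to_upper_bound` mirrors the figure's "reduce to upper bound" step and nothing more.

```python
def upper_bound(c: float) -> float:
    """Upper bound on comprehension time for complexity c (assumed seconds)."""
    return 2.40 * c + 1.11


def lower_bound(c: float) -> float:
    """Lower bound on comprehension time for complexity c (assumed seconds)."""
    return 0.61 * c + 0.68


def reduce_to_upper_bound(duration: float, c: float) -> float:
    """Trim a shot to the upper bound; shots already shorter are left alone."""
    return min(duration, upper_bound(c))


c = 0.5  # hypothetical mid-complexity shot
print(round(upper_bound(c), 2))                 # 2.31
print(round(lower_bound(c), 3))                 # 0.985
print(round(reduce_to_upper_bound(10.0, c), 2)) # 2.31
```

Under these assumptions a 10-second shot of medium complexity could be cut to about 2.3 seconds before any utility is traded away; further condensation happens between the two bounds.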

Factors affecting comprehension
• The comprehension time is influenced by many factors:
  • Visual complexity
  • The viewer task (active vs. passive)
  • Prior knowledge of the viewer
• This example only focused on visual complexity since it is measurable.

Rich Structure: Considering syntax
• Minimum number of shots in a scene
• The particular ordering of the shots (cut)
• The specific duration of the shots, to direct viewer attention
• Changing the scale of the shots
The specific arrangement of shots so as to bring out their mutual relationship [Sharff 82].
Film makers think in terms of phrases of shots and not individual shots → choose the right entity for adaptation

The progressive phrase
“Two well chosen shots will create expectations of the development of narrative; the third well-chosen shot will resolve those expectations.” [Sharff 82]
Hence, a phrase (a group of shots) must have at least three shots.
Maximal shot removal: eliminate all the dark shots.

Structure (dialog)
“Depicting a conversation between m people requires 3m shots.” [Sharff 82]
Hence, a dialog, which involves at least two people (m = 2), must have at least six shots.
Maximal adaptation: eliminate all the dark shots.
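The two syntax rules (at least three shots per progressive phrase, 3m shots for a dialog among m people) act as floors on how many shots can be dropped. A minimal sketch of that bookkeeping follows; the function names and the string labels for the structure types are hypothetical.

```python
def min_shots(structure: str, participants: int = 2) -> int:
    """Minimum shots a structure must keep under the rules quoted from Sharff."""
    if structure == "progressive":
        return 3
    if structure == "dialog":
        return 3 * participants
    raise ValueError(f"unknown structure: {structure}")


def max_removable(n_shots: int, structure: str, participants: int = 2) -> int:
    """How many shots can be dropped without violating the minimum."""
    return max(0, n_shots - min_shots(structure, participants))


print(min_shots("dialog"))           # 6
print(max_removable(10, "dialog"))   # 4
print(max_removable(2, "progressive"))  # 0
```

This only counts shots; which shots to drop (the "dark shots" in the figures) is a separate decision that the sketch does not model.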

Audio content analysis
[Figure labels: SVM classifier; time-dependent Viterbi decoder (temporal consistency); audio-scenes; silence; silence removal; non-speech; clean / noisy speech; significant phrases]
• Prosody analysis (pitch, pause, energy) [Hirschberg, Nakatani, Arons ’92]

Tied Audio-Video Constraint
• Tied segments:
  • Include all significant phrases
  • Audio and video boundaries are fully synchronized
  • Cannot be condensed or de-synchronized
  • Allow viewers to “catch up” when viewing skims
• Untied segments:
  • Audio-video can be dropped, condensed, reduced
  • Audio-video segments do not have to synchronize
[Figure: Significant Phrases. Labels: opening, closing, significant phrase, dialog]

Utility Framework for Skim Generation [Chang, IWDC 02]
[Figure: block diagram from original to skim. Blocks: shot detection; video utility function (shot duration and visual complexity); audio utility function; utility function; iterative maximization; skim generation. Constraints: target skim time, audio constraints, visual syntax constraints]

Example 2: Utility-Based MPEG-4 Video Transcoding (with J. Kim and Y. Wang)
• Original bit rate → reduced rate (resource change)
• Adaptation Space: FD: frame dropping, CD: coefficient dropping, and combinations
• Utility Ranking Description: {(Ri, Rank(Ai), Ui, Consistency-Flag), i = 1, 2, …}
[Figure: operating points labeled (no FD, 50%), (B1, 42%), (all B, 27%)]
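One way to picture the utility ranking description is as a list of records, one per (Ri, Rank(Ai), Ui, Consistency-Flag) tuple, that a proxy consults when the available rate drops. The sketch below is hypothetical throughout: the field names, the convention that rank 1 is best, the reading of the flag as "the ranking holds across implementations", and all sample values are assumptions rather than anything specified on the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RankedOp:
    rate_kbps: float          # R_i: target bit rate
    operation: str            # A_i: e.g. a frame/coefficient dropping combination
    rank: int                 # Rank(A_i); 1 is assumed to be the best
    utility: Optional[float]  # U_i; may be omitted when only rankings are given
    consistent: bool          # Consistency-Flag (assumed meaning, see lead-in)


def pick_operation(entries: List[RankedOp], budget_kbps: float) -> Optional[RankedOp]:
    """Pick the best-ranked operation at the highest described rate within budget."""
    within = [e for e in entries if e.rate_kbps <= budget_kbps]
    if not within:
        return None
    top_rate = max(e.rate_kbps for e in within)
    return min((e for e in within if e.rate_kbps == top_rate), key=lambda e: e.rank)


# Made-up description for one video segment.
DESCRIPTION = [
    RankedOp(300.0, "CD 40%", 1, 32.0, True),
    RankedOp(300.0, "FD all B", 2, 30.5, True),
    RankedOp(200.0, "FD all B + CD 20%", 1, None, False),
]

print(pick_operation(DESCRIPTION, 350.0).operation)  # CD 40%
```

Because the choice uses only the rank, it still works when utility values are missing, which fits the earlier option of describing rankings among competing operations instead of their utility values.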

MPEG-4 Fine-Grained Scalability
[Figure: FGS layer structure. Labels: base layer (I, P frames), spatial enhancement layer, temporal enhancement layer]
• Adaptation Space: temporal frame rate and SNR bit planes

Utility Function of Subjective Spatio-Temporal Preference (w. R. Kumar and M. van der Schaar)
• SNR is inadequate for measuring utility
• Conduct study where users chose preferred frame rate at different bit rates
• As the bit rate goes up, people prefer better frame rates
• Preference varies with video category
  • High-motion videos (Stefan, Coastguard) require a higher frame rate
[Figure: Spatio-Temporal Preferences. Frame rate (fps, 0 to 12) vs. bitrate (kbps, 0 to 1500) for Foreman, Stefan, Coastguard, and Mobile]

Utility Model Guided FGS
• When more bit rate is available, increase SNR quality while holding the temporal rate fixed, up to a predetermined bit rate
• Then improve temporal quality
• Further improve SNR quality at the new frame rate
• And so on...
[Figure: SNR (dB) vs. bandwidth at frame rates T, 2T, 3T, and 4T fps. Labels: enhanced-reference scheme; scheme using fixed reference rate; (B) SNR quality of enhanced scheme]
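The alternating schedule on this slide (raise SNR at a fixed frame rate, step the frame rate up at a predetermined bit rate, repeat) amounts to a staircase from bit rate to frame rate. A hedged sketch follows: the switch points and the base rate T are invented numbers, and `frame_rate_for` is a hypothetical helper. In the lecture's setting the switch points would come from the subjective preference study, not from constants like these.

```python
from bisect import bisect_right
from typing import Sequence


def frame_rate_for(bitrate_kbps: float,
                   switch_points_kbps: Sequence[float],
                   base_fps: float) -> float:
    """Frame rate k*T, where k-1 switch points lie at or below the bit rate.

    switch_points_kbps must be sorted in ascending order. Between two switch
    points the frame rate is held and extra bits go to SNR quality.
    """
    k = bisect_right(switch_points_kbps, bitrate_kbps) + 1
    return k * base_fps


SWITCH_POINTS = [300.0, 600.0, 1000.0]  # hypothetical, per video category
T = 2.5                                 # hypothetical base frame rate in fps

for rate in (200.0, 300.0, 700.0, 2000.0):
    print(rate, frame_rate_for(rate, SWITCH_POINTS, T))
```

With these numbers the frame rate runs T, 2T, 3T, 4T as the bit rate crosses 300, 600, and 1000 kbps; a high-motion clip would simply use lower switch points.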

FGS+ Scheme
[Figure: FGS+ layer structure. Labels: base layer (I, P frames), spatial enhancement layer, temporal enhancement layer]
• Solves the issue of determining the optimal rate for the motion prediction reference

Performance
• Improvement over FGS varies from 0.19 dB to 1.28 dB
• At low bit rates, simple videos benefit (Coastguard)
• At high bit rates, complex videos benefit (Mobile)
[Figure: Improvement of FGS+ (B-frames) over FGS. SNR (dB, 0 to 1.4) vs. bit rate (kbps, 300 to 1300) for Foreman, Stefan, Coastguard, and Mobile]

Content-based utility classification and prediction
Challenge: predict utility function for an arbitrary video?
• Content features: object motion, object complexity, camera operation, content genre
• Content-utility correlation & content-based classification
[Figure: content features are passed to a classifier that assigns one of the classes T1, ..., Ti, ..., TN, each associated with a content-based utility function]

Example 3: Utility Adaptive Video Streaming
• Model utility based on content “importance” vs. “non-importance”
• Utility-based adaptive rate allocation
[Figure: system diagram. Labels: live video; real-time event detection; view/text recognition; content adaptation; encoder and scheduler; network; playback. Rate vs. time plot against the channel rate: important segments in video mode; non-important segments as still frame + audio + captions]
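The rate-allocation idea here, sending non-important segments cheaply so that important segments can exceed the channel rate, reduces to simple bookkeeping if the long-run average must equal the channel rate. The sketch below is a simplification under that assumption; it ignores buffering and latency limits, and the function name and all numbers are hypothetical.

```python
def important_segment_rate(channel_kbps: float,
                           t_important_s: float,
                           t_other_s: float,
                           other_kbps: float) -> float:
    """Rate available to important segments when the average matches the channel.

    Non-important segments (still frame + audio + captions) are sent at
    other_kbps; the bits saved there are spent on the important segments.
    """
    if t_important_s <= 0:
        raise ValueError("need a positive duration of important content")
    total_kbits = channel_kbps * (t_important_s + t_other_s)
    spare_kbits = total_kbits - other_kbps * t_other_s
    return spare_kbits / t_important_s


# Hypothetical: 64 kbps channel, 20 s important out of 100 s, 16 kbps otherwise.
print(important_segment_rate(64.0, 20.0, 80.0, 16.0))  # 256.0
```

In this made-up case the important 20% of the program can be coded at four times the channel rate, provided the client can buffer the difference.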

Detecting recurrent semantic units: Serve (Chang, Zhong, Kumar ’00)
[Figure: detection pipeline on compressed video. Stages: adaptive color filter; shot detection; canonical view detection; object-level verification. Inputs: every I/P frame; key frames per shot; down-sampled I/P frames. Features: color, lines, player]

System Issues for Adaptive Streaming
• Increasing the buffer size is feasible
  • An 8 MB buffer can store from 2000 sec (at 32 kbps) down to 250 sec (at 256 kbps)
• But playback latency is the main constraint
  • E.g., deliver real-time event within 60 sec
• Client error is more serious than server error
  • Degrade to channel rate, freeze and resume, or adaptive playback speed
[Figure: accumulated rate vs. time (t0 to t1) for the input, channel, and output curves, marking server underflow and client underflow; pipeline labels: video encoder, network, playback]
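The buffer figures on this slide follow from dividing buffer size by stream rate. A quick check, assuming decimal units (1 MB = 8,000,000 bits, 1 kbps = 1000 bits per second); `buffer_seconds` is a hypothetical helper name.

```python
def buffer_seconds(buffer_mbytes: float, rate_kbps: float) -> float:
    """Seconds of video a buffer can hold at a constant bit rate."""
    buffer_bits = buffer_mbytes * 8 * 1_000_000  # decimal megabytes assumed
    return buffer_bits / (rate_kbps * 1000)


print(buffer_seconds(8, 32))   # 2000.0
print(buffer_seconds(8, 256))  # 250.0
```

Both values are well above the 60-second delivery target mentioned on the slide, which is why latency, not memory, is the binding constraint.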

Recap
[Figure: entity e mapped to the adaptation space (a1, a2, a3), the resource space (r1, r2, r3, bounded by the constraint CR), and the utility space (u1, u2, u3)]
• Content adaptation needed for UMA applications
• A generalized conceptual framework for
  • Modeling relationships among content entities, adaptation processes, utility, and resources
  • Formulating problems as constrained optimization tasks, e.g.,
    • Time-condensed skims
    • Modeling spatio-temporal utility preference
    • Content-adaptive streaming
• Open issues
  • Utility model, high-dimensional representation, search