ETSI workshop on Effects of transmission performance on Multimedia QoS 17 - 19 June 2008 - Prague, Czech Republic
www.psytechnics.com
Multimedia Quality in a Conversational Video-conferencing Environment
Quan Huynh-Thu, Psytechnics Ltd
Presentation outline
• Introduction to video-conferencing
• Beyond video-conferencing
• HD video-conferencing subjective experiment
• Results
• Discussion
© Psytechnics 2008
Video-conferencing: early hype
What?
• Real-time audio-visual communication
Why?
• Enhance collaboration
Where?
• Desktops, laptops, mobile devices…
• Integration in Unified Communications environment
Video-conferencing: challenges
• Real-time
• Interactive
• Multimedia
• Delay
• Media synchronization
• Bandwidth limited
• IP Application
• Transmission errors and network congestion
• Coding artefacts
Video-conferencing: issues
• Technology:
– Video compression (coding)
– IP-based transmission (errors)
– Real-time constraint (delay)
– Two or multi-party multimedia communication (handling of potential multiple end-points)
• Human factor:
– 'Un-natural' conversational feeling due to small resolution and camera set-up:
  • “I’m looking at you... but can’t see you”
– Higher sensitivity to errors than for media streaming
Video-conferencing: user expectation
Public
• Usage: “Look…”, “Hi…”
• Requirements: cheap, simple usage
Corporate/Enterprise
• Usage: collaboration, increased productivity, face-to-face interaction
• Requirements: cost effective, reliable, quality
Beyond video-conferencing: High-definition
• Real-time audio-video communication
• Enhanced features for tele-collaboration
• Face-to-face feeling (tele-presence)
HD video-conferencing subjective experiment
• Real-time interactive/conversational experiment using high-definition video-conferencing system
• Trials: task-driven 2-min conversations
1. Task requiring focus on visual terminal
2. Task not especially requiring use of visual information
• Various bandwidth/network conditions
Subjective testing facilities
State-of-the-art subjective test rooms at Psytechnics headquarters
Experiment set-up: equipment
• 24” widescreen full HD display (1080p)
• Hardware codec box with associated HD camera (720p@30fps)
• Professional shotgun microphone
• Pair of speakers
Experiment: set-up
[Diagram: display, camera, microphone and two speakers]
Viewing distance: 4H (80 cm)
Conversational task 1: Shape matching
• Purpose: maintain focus on visual terminal
• Description: work in partnership to build a 3D shape from construction blocks
– Subject A was given a bag of multicolour, interlocking construction blocks
– Subject B was given a completed figure made from an identical set of blocks
– Subject B had to instruct subject A on how to build the identical shape
– Subjects had to verify the correctness of the built shape
Conversational task 2: Who’s in the bag?
• Purpose: “free” conversation not especially requiring focus on visual terminal
• Description: identify as many famous characters as possible.
– The “Clue Giver” takes a card from the bag and has to provide clues as to who is on the card without naming the character
– The “Guesser” must name the character and can ask questions
– The “Clue Giver” can skip a card if he/she does not know the personality
Participants
• 20 naïve subjects: 8 females and 12 males
• 16 subjects recruited from public
• 4 subjects from Psytechnics
• Age: 18-72
• None had participated in subjective testing in the past
• None had experienced an HD video-conferencing system in the past
Experimental design
• Bandwidth: 3072 and 1472 kbps
• Packet loss ratio: 0, 0.5, 3 and 6%
• Task: 1 and 2
• No delay or jitter
• Video codec: H.264 baseline profile
• Audio codec: AAC at 64 kbps
• Network degradations generated using ITU-T G.1050 IP impairment model
• Full factorial design (3 variables): 16 test conditions
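The count of 16 test conditions follows from crossing every level of the three variables (2 bandwidths × 4 packet loss ratios × 2 tasks). A minimal sketch of that enumeration is given below; the variable names are illustrative, and the ratings total assumes that every one of the 20 subjects rated every condition, which is consistent with the Nobs column of the ANOVA slides.

```python
from itertools import product

# Factor levels as listed on the slide.
bandwidths_kbps = [3072, 1472]
packet_loss_pct = [0, 0.5, 3, 6]
tasks = [1, 2]

# Full factorial design: every combination of every level of every factor.
conditions = list(product(bandwidths_kbps, packet_loss_pct, tasks))

# Assumption: each of the 20 subjects rates each condition once per question.
n_subjects = 20
ratings_per_question = n_subjects * len(conditions)

print(len(conditions))        # number of test conditions (2 * 4 * 2)
print(ratings_per_question)   # ratings collected per question under the assumption above
```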
Experimental procedure
• Test conditions grouped by task:
1. 8 test conditions with task 1
2. Break
3. 8 conditions with task 2
• Presentation order of test conditions was randomized
• Test condition was identical for both parties in a given trial
• Role of “instruction giver” and “instruction follower” was swapped between test conditions
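The procedure above can be sketched as a session plan: conditions are shuffled within each task block, and the giver role alternates between the two parties from one condition to the next. The slides do not say how the randomization was carried out, so the function name `session_order`, the fixed seed and the party labels "A" and "B" are assumptions made only for illustration.

```python
import random
from itertools import product

def session_order(seed):
    """Return 8 shuffled task-1 conditions followed by 8 shuffled task-2 conditions."""
    rng = random.Random(seed)  # seeded only so that this sketch is reproducible
    order = []
    for task in (1, 2):
        # All bandwidth (kbps) x packet loss (%) combinations for this task.
        block = [(bw, plr, task) for bw, plr in product([3072, 1472], [0, 0.5, 3, 6])]
        rng.shuffle(block)  # randomize presentation order within the task block
        order.extend(block)
    return order

order = session_order(seed=1)

# Hypothetical role assignment: the giver role swaps between consecutive conditions.
givers = ["A" if i % 2 == 0 else "B" for i in range(len(order))]

for condition, giver in zip(order, givers):
    print(condition, "giver:", giver)
```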
Quality assessment ratings
1. How would you rate the audio quality of the connection?
2. How would you rate the video quality of the connection?
3. How would you rate the overall quality of the connection?
4. Did you have any difficulty in understanding the other party during the connection?
5. Was the overall quality of the connection acceptable for the task?
Quality assessment ratings
• 5-point discrete category rating scale for Qs 1-3: Excellent, Good, Fair, Poor, Bad
• Binary answer for Qs 4-5: Yes / No
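The results slides report mean opinion scores (MOS) on a 1 to 5 axis. A minimal sketch of how category votes can be turned into a MOS is shown below. The numeric coding (Excellent = 5 down to Bad = 1) is the conventional one and is assumed here rather than stated on the slide, and the votes are hypothetical, not data from the experiment.

```python
from statistics import mean

# Conventional numeric coding of the 5-point category scale (an assumption here).
SCORES = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Bad": 1}

def mean_opinion_score(votes):
    """Average the numeric scores of a list of category labels."""
    return mean(SCORES[v] for v in votes)

# Hypothetical votes from 20 subjects for one test condition (not study data).
votes = ["Excellent"] * 4 + ["Good"] * 9 + ["Fair"] * 5 + ["Poor"] * 2
mos = mean_opinion_score(votes)
print(round(mos, 2))
```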
Results: distribution of quality ratings
[Figure: three histograms of the percentage of votes (0 to 100%) per rating (1 to 5), one each for audio, video and multimedia quality]
Results: audio quality
[Figure: audio MOS (scale 1 to 5) per condition, in two panels. One panel plots the conditions 3 Mbps and 1.4 Mbps at PLR = 0, 0.5, 3 and 6, with separate series for task 1 and task 2. The other plots 1.4 Mbps Task 1, 1.4 Mbps Task 2, 3 Mbps Task 1 and 3 Mbps Task 2, with separate series for PLR = 0%, 0.5%, 3% and 6%.]
Results: video quality
[Figure: video MOS (scale 1 to 5) per condition, in two panels. One panel plots the conditions 3 Mbps and 1.4 Mbps at PLR = 0, 0.5, 3 and 6, with separate series for task 1 and task 2. The other plots 1.4 Mbps Task 1, 1.4 Mbps Task 2, 3 Mbps Task 1 and 3 Mbps Task 2, with separate series for PLR = 0%, 0.5%, 3% and 6%.]
Results: multimedia quality
[Figure: multimedia MOS (scale 1 to 5) per condition, in two panels. One panel plots the conditions 3 Mbps and 1.4 Mbps at PLR = 0, 0.5, 3 and 6, with separate series for task 1 and task 2. The other plots 1.4 Mbps Task 1, 1.4 Mbps Task 2, 3 Mbps Task 1 and 3 Mbps Task 2, with separate series for PLR = 0%, 0.5%, 3% and 6%.]
ANOVA: audio quality
Source            Sum Sq.  d.f.  Mean Sq.  F         Prob>F      Nobs  Effect size
BitRate           4.05     1     4.05      7.832061  0.01145901  160   0.1591
PLR               23.575   3     7.858333  13.25222  0.00000113  80    0.3134
Task              0.3125   1     0.3125    0.429864  0.51991635  160   0.0442
BitRate*PLR       1.175    3     0.391667  1.210027  0.31437809  40    0.0990
BitRate*Task      0.1125   1     0.1125    0.448819  0.51095717  80    0.0375
PLR*Task          3.3625   3     1.120833  3.755327  0.01567849  40    0.1674
BitRate*PLR*Task  0.9125   3     0.304167  1.264357  0.29525098  20    0.1233
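As a reading aid for the audio ANOVA table, the sketch below checks that each mean square equals the sum of squares divided by its degrees of freedom. It also derives the error mean square implied by each F value, on the assumption that F is the ratio of the effect mean square to an error mean square, as in a standard ANOVA. The slides do not list the error terms, so those derived values are illustrative only.

```python
# (source, sum of squares, degrees of freedom, mean square, F) copied from the audio ANOVA table.
audio_anova = [
    ("BitRate",          4.05,   1, 4.05,     7.832061),
    ("PLR",              23.575, 3, 7.858333, 13.25222),
    ("Task",             0.3125, 1, 0.3125,   0.429864),
    ("BitRate*PLR",      1.175,  3, 0.391667, 1.210027),
    ("BitRate*Task",     0.1125, 1, 0.1125,   0.448819),
    ("PLR*Task",         3.3625, 3, 1.120833, 3.755327),
    ("BitRate*PLR*Task", 0.9125, 3, 0.304167, 1.264357),
]

# Mean Sq. should equal Sum Sq. / d.f. (up to the rounding used on the slide).
checks = {src: abs(ss / df - ms) < 1e-5 for src, ss, df, ms, f in audio_anova}

# Assumption: F = effect mean square / error mean square, so error MS = Mean Sq. / F.
implied_error_ms = {src: ms / f for src, ss, df, ms, f in audio_anova}

for src in checks:
    print(f"{src:18s} consistent={checks[src]} implied error MS={implied_error_ms[src]:.4f}")
```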
ANOVA: video quality
Source            Sum Sq.  d.f.  Mean Sq.  F         Prob>F      Nobs  Effect size
BitRate           4.05     1     4.05      9.529412  0.00606871  160   0.1591
PLR               110      3     36.66667  42.98201  0.00000000  80    0.6770
Task              0.0125   1     0.0125    0.027576  0.86986343  160   0.0088
BitRate*PLR       1.05     3     0.35      0.914089  0.43995935  40    0.0935
BitRate*Task      0.0125   1     0.0125    0.082969  0.77643152  80    0.0125
PLR*Task          5.5375   3     1.845833  4.236034  0.00900974  40    0.2148
BitRate*PLR*Task  0.2375   3     0.079167  0.19716   0.89790687  20    0.0629
ANOVA: multimedia quality
Source            Sum Sq.  d.f.  Mean Sq.  F         Prob>F      Nobs  Effect size
BitRate           4.5125   1     4.5125    19.6533   0.00028535  160   0.1679
PLR               48.775   3     16.25833  25.32036  0.00000000  80    0.4508
Task              0.2      1     0.2       0.968153  0.33750598  160   0.0354
BitRate*PLR       0.4125   3     0.1375    0.418838  0.74016348  40    0.0586
BitRate*Task      0.3125   1     0.3125    0.979381  0.33478807  80    0.0625
PLR*Task          3.225    3     1.075     5.259657  0.00284023  40    0.1639
BitRate*PLR*Task  0.3125   3     0.104167  0.363985  0.77923977  20    0.0722
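To read the multimedia ANOVA table, the Prob>F column can be compared against a significance level. The sketch below uses 0.05, which is an assumed threshold since the slides do not state one, and lists the effects that fall below it.

```python
ALPHA = 0.05  # assumed significance level; not stated on the slides

# Prob>F values copied from the multimedia quality ANOVA table.
multimedia_p_values = {
    "BitRate": 0.00028535,
    "PLR": 0.00000000,
    "Task": 0.33750598,
    "BitRate*PLR": 0.74016348,
    "BitRate*Task": 0.33478807,
    "PLR*Task": 0.00284023,
    "BitRate*PLR*Task": 0.77923977,
}

# Keep the effects whose p-value is below the threshold, in table order.
significant = [src for src, p in multimedia_p_values.items() if p < ALPHA]
print(significant)
```

Under that threshold the flagged effects are the two main effects of bit rate and packet loss ratio and the packet loss × task interaction.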
Intelligibility
• Percentage of subjects (P1) who had no difficulty in understanding the other party during the connection
MOS=2.95
Task 1
Condition  1    2    3    4    5    6    7    8
P1 (%)     100  100  95   95   95   90   95   95
Task 2
Condition  9    10   11   12   13   14   15   16
P1 (%)     95   95   90   90   95   95   100  95
Acceptability
• Percentage of subjects (P2) who found the quality of the connection acceptable for the required task
MOS=2.95
Task 1
Condition  1    2    3    4    5    6    7    8
P2 (%)     100  100  100  100  95   95   100  90
Task 2
Condition  9    10   11   12   13   14   15   16
P2 (%)     95   95   90   90   95   85   90   95
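The acceptability percentages can be summarized per task and converted back to head counts. The sketch below assumes all 20 subjects answered the question for every condition, so that each subject corresponds to 5 percentage points; the helper name `to_counts` is illustrative.

```python
N_SUBJECTS = 20  # from the Participants slide; assumes every subject answered

# P2 (%) values copied from the acceptability tables.
p2_task1 = [100, 100, 100, 100, 95, 95, 100, 90]
p2_task2 = [95, 95, 90, 90, 95, 85, 90, 95]

def to_counts(percentages, n=N_SUBJECTS):
    """Convert percentages of subjects into numbers of subjects."""
    return [round(p * n / 100) for p in percentages]

# Average acceptability across the 8 conditions of each task.
mean_task1 = sum(p2_task1) / len(p2_task1)
mean_task2 = sum(p2_task2) / len(p2_task2)

print(mean_task1, to_counts(p2_task1))
print(mean_task2, to_counts(p2_task2))
```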
Summary
• Conversational video-conferencing experiment using:
– Full-factorial design based on 3 variables: bandwidth, packet loss ratio and task
– Random packet loss
– No delay or jitter
• Most participants unexpectedly provided high quality ratings and found the quality acceptable even if video was severely degraded (user expectation)
• The system under test produced similar video quality at both 3 Mbps and 1.4 Mbps (without packet loss)
• Packet loss ratio was found to be the most important factor influencing multimedia quality amongst the 3 variables considered.
• Statistical analysis showed an interaction effect between the visual impact of packet loss and task on multimedia quality
Discussion
• User expectation: naïve participants (public) might have lower quality expectation than target users (experienced with video / business users)
• Balance/distribution of errors/qualities on the audio and visual signals using real-world systems in subjective experiments
• Suitable tasks to exercise both audio and video components, create eye-contact…
• Conversational tests represent a heavy investment for a relatively small amount of data:
– Full-factorial design with small number of variables to examine main/interaction effects
– Fractional design with high number of variables but no possibility to examine main/interaction effects