Gerhard Tech, Heribert Brust, Karsten Müller, Anil Aksay, Done Bugdayci

Development and optimization of coding algorithms for mobile 3DTV

MOBILE3DTV

Project No. 216503

Development and optimization of coding algorithms for mobile 3DTV

Gerhard Tech, Heribert Brust, Karsten Müller, Anil Aksay, Done Bugdayci

Abstract: Error resilience tools for H.264/AVC are presented. Slice encoding has been implemented in the H.264/MVC reference software JMVC 5.0.5. An evaluation of the new encoder shows that the additional bit rate needed for error resilience is negligible for error-free channels. For error-prone channels, the new slice mode provides sufficient error resilience.

The Mixed Resolution Stereo Coding (MRSC) approach has been evaluated. The optimal bit rate distribution between the left view and the down-sampled right view has been examined. The Advanced Mixed Resolution Stereo Coding (AMRSC) approach has been developed. The three main features of AMRSC are optimized down-sampling, inter-view prediction and view enhancement using unsharp masking. The suitability of sub-sampling and low pass filtering together with inter-view prediction has been evaluated objectively. Further improvements in coding efficiency have been achieved by optimizing the bit rate distribution between the full view and the predicted down-sampled view. AMRSC is compliant with the overall MVC coding strategy in Mobile3DTV.

For the subjective evaluation of coding methods, 96 test stimuli have been generated from the six sequences of the coding test set using Simulcast, Multi View, Mixed Resolution and Video + Depth coding. Two codec profiles and a low and a high quality level have been used. 24 test stimuli have been generated for the subjective evaluation of transmission approaches using Simulcast, Multi View and Video + Depth coding. Half of the 24 sequences have been coded using the new encoder supporting error resilience methods.

Keywords: 3DTV, Error Resilience, Mixed Resolution Coding, Generation of coded test sequences

MOBILE3DTV D2.5 Development and optimization of coding algorithms for mobile 3DTV

Executive Summary

This deliverable is tripartite. The first part deals with the new prototype of the software encoder using error resilience tools. The second part describes the examination and the development of the Mixed Resolution approach. The optimization of the coding approach for subjective tests is presented in the last part.

H.264/AVC provides several error resilience tools. However, none of them is implemented in the JMVC reference software for the MVC extension of H.264/AVC. Slice encoding has therefore been implemented in the H.264/MVC reference software JMVC 5.0.5. Frames are stored in smaller data packets that can still be decoded independently in case of losses. The new encoder has been evaluated using an error-free and an error-prone channel. Coding tests show that, for error-free channels, the additional bit rate needed for error resilience is negligible and video quality decreases only slightly. For the error-prone channel, it has been demonstrated that the new slice mode provides sufficient error resilience and leads to a high gain in video quality.

In classical stereo coder settings, both views have the same resolution. An interesting alternative is the Mixed Resolution Coding approach, which is also evaluated here. It is found that the optimal bit rate distribution between the left view and the down-sampled right view assigns approximately 30% to 35% of the bit rate to the down-sampled view. The subjective evaluation of Mixed Resolution and Full Resolution coding shows that the subjective quality of coded Mixed Resolution sequences is better than that of Simulcast coded sequences, owing to a smaller number of coding artifacts. Although this approach yields lower bit rates, the perceived quality may not always be close to that of the full view. Therefore, beyond the Mixed Resolution approach, an Advanced Mixed Resolution Stereo Coding (AMRSC) approach has been developed. The three main features of the AMRSC approach are optimized down-sampling, inter-view prediction and view enhancement using unsharp masking at the receiver side. The suitability of sub-sampling and low pass filtering together with inter-view prediction was evaluated objectively. Further improvements in coding efficiency were achieved by optimizing the bit rate distribution between the full view and the predicted down-sampled view. For the Mobile3DTV application, this means that an MVC codec will be used, with which the AMRSC approach is now also compliant.

Test stimuli for subjective evaluations have been generated. Coding results for various stereo video coding approaches, codecs and codec settings are presented. For the subjective evaluation of coding methods, a total of 96 test stimuli have been generated from the six sequences of the coding test set using Simulcast, Multi View, Mixed Resolution and Video + Depth coding. Moreover, a baseline and a high codec profile were used. An objective evaluation was carried out at a low and a high quality level. 24 test stimuli have been generated for the subjective evaluation of transmission methods. For this purpose, the four sequences of the transmission test set have been coded using Simulcast, Multi View and Video + Depth coding. Half of the sequences were coded using the new encoder with error resilience tools.

Table of Contents

1 Introduction
2 Software encoder using error resilience tools
2.1 Slice Interleaving
2.2 Modified MVC encoder
2.3 Modified MVC bit stream assembler
2.4 Modified MVC decoder
2.5 Evaluation of MVC encoder using slice mode
2.5.1 Test Setup
2.5.2 Coding Results for error free channel
2.5.3 Coding Results for error prone channel
2.6 Conclusion
3 Mixed Resolution Coding
3.1 Optimization of Mixed Resolution Stereo Coding (MRSC)
3.1.1 Different sampling methods of MRSC
3.1.2 Objective criteria for bit rate allocation
3.1.3 Subjective evaluation
3.2 Advanced Mixed Resolution Coding (AMRSC)
3.2.1 Interview Prediction
3.2.2 View Enhancement using unsharp masking
3.3 Conclusion
4 Optimization of coding approaches for subjective tests
4.1 Test sequences for subjective evaluation of coding approaches
4.1.1 Test setup
4.1.2 Coding Results
4.1.3 Generated Test Stimuli
4.1.4 Conclusion
4.2 Test sequences for transmission studies
4.2.1 Test setup
4.2.2 Coding Results
4.2.3 Generated Test Stimuli
4.2.4 Conclusion
5 Conclusion

1 Introduction

This deliverable consists of three parts. The first part deals with the new prototype of the software encoder using error resilience tools. The second part describes the examination and the development of the Mixed Resolution approach. The optimization of coding approaches for subjective tests is presented in the last part.

H.264/AVC provides several error resilience tools. However, none of them is implemented in the JMVC reference software for the MVC extension of H.264/AVC. The implementation of slice encoding in the H.264/MVC reference software JMVC 5.0.5 is reported in section 2, together with coding results obtained with the new encoder. The evaluation has been carried out using an error-free and an error-prone channel.

In section 3 the Mixed Resolution Coding approach is presented. Different types of sub-sampling one view are compared. The optimum bit rate distribution between both views is investigated, and the quality of Mixed Resolution and Full Resolution coding is subjectively evaluated. An Advanced Mixed Resolution Stereo Coding (AMRSC) approach is then presented, in which the coding of mixed resolution sequences is improved by exploiting inter-view dependencies. Moreover, the suitability of sub-sampling and low pass filtering together with inter-view prediction was evaluated objectively. Improvements in coding efficiency were achieved by choosing different QPs for the left and the right view. Finally, the enhancement of the subjective quality by applying a simple unsharp masking algorithm was investigated.

Section 4 presents methods for the generation of test stimuli for subjective tests. Coding results for various stereo video coding approaches, codecs and codec settings are presented. The focus is on the generation of test stimuli with defined bit rates for the subjective comparison of coding approaches as well as transmission methods. The test stimuli generated for the coding test also include sequences of the simple Mixed Resolution approach. For the generation of some of the transmission test stimuli, the new encoder using error resilience tools has been used.

2 Software encoder using error resilience tools

The error resilience tools in H.264/AVC are data partitioning, slice interleaving, flexible macroblock (MB) ordering (FMO), SP/SI frames, reference frame selection, intra block refreshing and redundant slices [1]. SP/SI frames and reference frame selection require feedback from the decoder. Data partitioning, slice interleaving, FMO, intra block refreshing and redundant slices are therefore the candidates for use in MVC. However, none of these tools is implemented in the JMVC reference software for the MVC extension of H.264/AVC.

Slice interleaving for error resilience has been integrated into JMVC 5.0.5. In this way, each representation can be coded with or without slices.

2.1 Slice Interleaving

Figure 2.1: Bit stream syntax of H.264/AVC using fixed-size slices (the video sequence, Frame #0 to Frame #3, is mapped to a bit stream of NAL packets: the parameter sets followed by the slices of each frame)

An H.264/AVC bit stream is composed of network abstraction layer (NAL) units, as shown in Figure 2.1. Some NAL units are small packets carrying information about the bit stream, such as the sequence parameter set (SPS), the picture parameter set (PPS) or supplemental enhancement information (SEI). SPS and PPS are required, whereas SEI can be skipped. The other NAL units carry video coding layer (VCL) data, i.e. the coded video. Each of these packets is a slice containing an integer number of macroblocks; a slice can contain all macroblocks of a frame or as little as a single macroblock. Figure 2.2 depicts a frame encoded using several slices. Slices are independently decodable if the previous frames are available. This is achieved by signalling the location of the slice in the slice header and by allowing spatial dependencies only inside the slice. For compression efficiency, a single slice per frame is better because it avoids header overhead.
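The distinction between parameter-set, SEI and slice packets can be read from the first byte of each NAL unit, whose lowest five bits hold the NAL unit type. The following is a minimal sketch for plain H.264/AVC streams only; the function name `classify_nal` and the reduced type table are illustrative assumptions, and the additional NAL unit types of the MVC extension are not covered.

```python
from typing import Tuple

# Subset of nal_unit_type values defined by H.264/AVC.
NAL_TYPES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    6: "SEI",
    7: "SPS",
    8: "PPS",
}


def classify_nal(header_byte: int) -> Tuple[int, str, bool]:
    """Return (nal_unit_type, name, is_vcl) for the first byte of a NAL unit.

    classify_nal is a hypothetical helper, not part of JMVC.
    """
    nal_type = header_byte & 0x1F        # lowest 5 bits of the header byte
    name = NAL_TYPES.get(nal_type, "other")
    is_vcl = 1 <= nal_type <= 5          # types 1..5 carry coded slice data
    return nal_type, name, is_vcl


if __name__ == "__main__":
    for byte in (0x67, 0x68, 0x06, 0x65, 0x41):
        print(hex(byte), classify_nal(byte))
```

In this sketch, only the packets flagged as VCL are slices in the sense used above; the parameter sets must arrive intact for any slice to be decodable.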

Figure 2.2 Slice encoding of a frame using fixed-size slices

If a NAL unit is larger than the Maximum Transmission Unit (MTU) of the transport medium, it is fragmented into smaller packets. In error-prone environments some of these packets can be lost, which causes the loss of the entire frame, since the decoder cannot decode parts of a NAL unit. If, however, a frame is encoded into several slices, each smaller than the MTU, every packet that arrives at the decoder can be decoded correctly. Figure 2.3 shows the same error pattern applied to a stream encoded with slices and to one encoded without slices. With slice encoding, some of the slices can still be decoded. The performance of slice encoding is affected by the burst length of the errors and by the size of the slices.
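The effect described above can be illustrated with a small sketch. It assumes a simplified packetization in which every NAL unit starts a new transport packet and a NAL unit is decodable only if all of its packets arrive; the function names and the MTU of 1400 bytes are hypothetical and not taken from the deliverable.

```python
def fragment(nal_sizes, mtu):
    """Map each NAL unit to the transport packet indices it occupies.

    Assumption: every NAL unit starts a new packet and is split into
    ceil(size / mtu) packets.
    """
    packets = []
    next_idx = 0
    for size in nal_sizes:
        count = -(-size // mtu)  # ceiling division
        packets.append(list(range(next_idx, next_idx + count)))
        next_idx += count
    return packets


def decodable_units(nal_sizes, mtu, lost):
    """Indices of the NAL units for which no packet was lost."""
    lost = set(lost)
    return [i for i, pk in enumerate(fragment(nal_sizes, mtu))
            if not lost.intersection(pk)]


if __name__ == "__main__":
    mtu = 1400  # hypothetical MTU in bytes
    # One 4000-byte frame without slices: losing packet 1 loses the frame.
    print(decodable_units([4000], mtu, lost=[1]))
    # The same frame as four 1000-byte slices: only slice 1 is lost.
    print(decodable_units([1000] * 4, mtu, lost=[1]))
```

With the same lost packet, the unsliced frame is lost completely, whereas three of the four slices remain decodable.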

Figure 2.3 Sliced and not-sliced encoding in cases of erroneous transmission (horizontal axis: time; legend: bit error, lost packet)

Slice encoding and decoding have been implemented in JMVC 5.0.5 by modifying the encoder, the decoder and the bit stream assembler. In the current version, the encoder generates a variable number of slices per frame according to the input slice size parameter.

2.2 Modified MVC encoder

Several functions were modified to integrate slice encoding into the MVC encoder. First, a new function was added that checks the total number of bytes spent on the slice currently being encoded. The slice encoding loop was modified to use this function: instead of coding all MBs of the frame in one pass, a check is made after each MB is encoded. If the current slice size exceeds the allowed slice size, the current slice is finalized and the next MB starts a new slice. The loop filter functions were modified as well. Previously, loop filter operations were applied to all MBs in the frame; now they are applied only to the MBs inside the current slice.
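The modified slice encoding loop can be sketched as follows. This is not the JMVC code: it assumes the coded size of every macroblock is already available as a list, whereas a real encoder only knows the size after coding the macroblock. The name `split_into_slices` is hypothetical.

```python
def split_into_slices(mb_sizes, max_slice_bytes):
    """Group macroblocks into slices using a byte budget.

    mb_sizes: coded size in bytes of each macroblock in raster order
              (hypothetical input for illustration).
    max_slice_bytes: allowed slice size; 0 disables slicing.
    Returns a list of slices, each a list of macroblock indices.
    """
    if max_slice_bytes <= 0:
        return [list(range(len(mb_sizes)))]

    slices, current, used = [], [], 0
    for idx, size in enumerate(mb_sizes):
        current.append(idx)
        used += size
        # Check after each coded MB: once the slice exceeds the allowed
        # size it is finalized and the next MB starts a new slice.
        if used > max_slice_bytes:
            slices.append(current)
            current, used = [], 0
    if current:
        slices.append(current)
    return slices


if __name__ == "__main__":
    print(split_into_slices([100] * 5, 250))
    print(split_into_slices([100] * 5, 0))
```

Because the check happens after a macroblock has been coded, a slice in this sketch can exceed the budget by up to one macroblock, so the slice size parameter would have to be chosen with some margin below the MTU.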

2.3 Modified MVC bit stream assembler

The assembler merges the left and right streams before decoding. This step is necessary because the JMVC software decoder takes multi-view coded streams in assembled format. The decoder uses inter-view prediction references between the two streams, so the streams are assembled with their frames alternating: one frame from the left view stream followed by the corresponding frame from the right view stream. The JMVC bit stream assembler assumes that each frame is encoded as a single slice and that there are no losses in the stream. Instead of removing these limitations, a new application was written that assembles the left and right streams correctly when slice encoding is used and when losses occur.
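A minimal sketch of such an assembly step is given below. It assumes that each received slice is tagged with its frame number and that decoding order equals frame number order (as for the IPPP structure); the data layout and the name `assemble` are illustrative assumptions, not the format used by the actual application.

```python
def assemble(left, right):
    """Interleave two per-view streams frame by frame.

    left, right: lists of (frame_number, payload) tuples, one per received
    slice. A frame may contribute several slices, or none if it was lost.
    Returns a list of (view, frame_number, payload) in assembled order.
    """
    by_view = {"L": {}, "R": {}}
    for view, stream in (("L", left), ("R", right)):
        for frame_no, payload in stream:
            by_view[view].setdefault(frame_no, []).append(payload)

    assembled = []
    for frame_no in sorted(set(by_view["L"]) | set(by_view["R"])):
        for view in ("L", "R"):  # left view first, then right view
            for payload in by_view[view].get(frame_no, []):
                assembled.append((view, frame_no, payload))
    return assembled


if __name__ == "__main__":
    left = [(0, "a"), (0, "b"), (1, "c")]   # frame 0 has two slices
    right = [(0, "x"), (2, "z")]            # frame 1 of the right view is lost
    for item in assemble(left, right):
        print(item)
```

Grouping by frame number rather than counting packets is what lets the sketch tolerate both multiple slices per frame and missing frames.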

2.4 Modified MVC decoder

Since the JMVC 5.0.5 reference software does not implement a slice mode, adding one to the encoder raised the question of whether the decoder had to be modified. The decoder software does not require any modification for slice decoding itself. However, modifications were needed for error handling in case of slice and/or frame losses.

Error concealment is not a normative part of H.264/AVC. More information on H.264/AVC error concealment can be found in [1]. However, these concealment strategies are not included in the MVC software. To handle frame losses, a separate application is used to insert skip frames. This enables the decoder to decode lost frames as copies of the previous frames in the buffer.

For slice losses, the MVC decoder itself was modified. New functions were added that track the decoded MBs of each frame and identify the missing MBs when decoding of the current picture finishes. Missing MBs are copied from the collocated MBs of the nearest decoded picture in the buffer. The loop filter functions were also modified to disable loop filtering for the missing MBs.
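The slice loss handling can be sketched as follows, assuming a picture is represented as a flat list of macroblocks in raster order and that a per-macroblock flag records whether it was decoded. The function name and data layout are hypothetical and do not reflect the JMVC data structures.

```python
def conceal_missing_mbs(current, decoded_flags, reference):
    """Replace macroblocks that were never decoded by collocated ones.

    current:       macroblocks of the current picture in raster order
    decoded_flags: one bool per macroblock, True if it was decoded
    reference:     macroblocks of the nearest decoded picture
    Returns (concealed picture, indices of the concealed macroblocks).
    """
    if not (len(current) == len(decoded_flags) == len(reference)):
        raise ValueError("pictures and flags must have the same length")

    concealed = list(current)
    missing = [i for i, ok in enumerate(decoded_flags) if not ok]
    for i in missing:
        concealed[i] = reference[i]  # copy the collocated macroblock
    return concealed, missing


if __name__ == "__main__":
    cur = ["c0", None, "c2", None]
    flags = [True, False, True, False]
    ref = ["r0", "r1", "r2", "r3"]
    print(conceal_missing_mbs(cur, flags, ref))
```

The returned index list corresponds to the macroblocks for which loop filtering would be skipped.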

2.5 Evaluation of MVC encoder using slice mode

2.5.1 Test Setup

The evaluation of the MVC encoder was carried out using the four sequences of the test set for transmission studies (see section 4.2.1.2). To generate the test sequences, both the slice argument, which determines the slice size, and the QP were varied. The codec parameters are given in Table 2.1. A slice size of zero means that slice interleaving was disabled.

Table 2.1 Codec Settings

Parameter            Value
Profile              Baseline
GOP Size             1 (IPPP)
Symbol Mode          CAVLC
Search Range         48
Intra Period         16
QP                   24, 28, 32, 36, 40
Slice Size (byte)    250, 500, 750, 1000, 1250

The performance of the encoder has been evaluated for an error-free and an error-prone channel. To generate distorted sequences, the channel model shown in Table 2.2 has been used. For details, please refer to [2].

Table 2.2 Channel and transmission parameters

Parameter                  Value
Channel Model (COST207)    Typical Urban 6 taps, fDMax = 24 Hz
Modulation                 16QAM
Convolutional Code Rate    2/3
Guard Interval             1/4
FFT Mode                   8K
SNR                        17 dB

2.5.2 Coding Results for error free channel

The rate-distortion characteristics of the error resilient encoder for an error-free channel are depicted in Figure 2.4. The performance of the encoder decreases with decreasing slice size. This can be explained by the increasing overhead introduced by the higher number of data packets. Nevertheless, for a slice size of e.g. 1000 bytes, the performance losses are below 0.5 dB and can be neglected.

Figure 2.4 PSNR vs. bit rate, for different slice sizes and sequences; error-free channel

2.5.3 Coding Results for error prone channel

Results for the error-prone channel are depicted in Figure 2.5. Each of the shown rate-distortion points is an average over the rates and distortions of five sequences distorted with different error patterns. Note that averaging is required for the evaluation, since coding with different QPs and slice sizes results in bit streams of different lengths that are distorted at different positions. For a single sequence, quality can decrease when slice mode is enabled. The reason is that losses that occur in uncritical parts of the bit stream generated without slice mode can be shifted to critical parts of the bit stream when it is coded with slice mode enabled. On average, however, slice interleaving should lead to increased quality. To show this, the evaluation has to be carried out statistically.

Here, only five different error patterns have been used per bit stream. Hence the influence of outliers can still be seen in Figure 2.5 (for example for RollerBlade1 at a slice size of 250). Nevertheless, the converging tendency can already be observed: a smaller slice size results in higher performance. For high rates, the gain obtained by using the slice mode increases. The reason is that the length of critical parts in the bit stream increases, so that losses of important data packets become more likely. Slice partitioning effectively diminishes this effect.

Figure 2.5 PSNR vs. bit rate, for different slice sizes and sequences; error-prone channel; average over sequences distorted by 5 different error patterns

2.6 Conclusion

The prototype of the software encoder using error resilience has been presented. Error resilience is achieved by a new slice mode. Frames are stored in smaller data packets that can still be decoded independently in case of losses.

Coding tests show that, for error-free channels, the additional bit rate needed for error resilience is negligible and video quality decreases only slightly. For the error-prone channel, it has been demonstrated that the new slice mode provides sufficient error resilience and leads to a high gain in video quality.

Beyond the objective examination of the slice encoder carried out here, a subjective evaluation of the encoder using slice mode will be carried out in a large-scale subjective test and reported in the upcoming deliverable D4.3 "Results of quality attributes of coding, transmission and their combinations". Further error protection applied in the lower layers will be reported in the upcoming deliverable D3.4 "Stereo DVB-H broadcasting system with error resilient tools".

3 Mixed Resolution Coding

In this section the Mixed Resolution Coding approach is presented. Different types of sub-sampling one view are compared. The optimum bit rate distribution between both views is investigated, and the quality of Mixed Resolution and Full Resolution coding is subjectively evaluated.

The Advanced Mixed Resolution Stereo Coding (AMRSC) approach is then presented, in which the coding of mixed resolution sequences is improved by exploiting inter-view dependencies. Moreover, the suitability of sub-sampling and low pass filtering together with inter-view prediction was evaluated objectively. Improvements in coding efficiency were achieved by choosing different QPs for the left and the right view. Finally, the enhancement of the subjective quality by applying a simple unsharp masking algorithm was investigated.

3.1 Optimization of Mixed Resolution Stereo Coding (MRSC)

If the sharpness of the two views of a stereoscopic signal differs, the perceived quality is close to that of the sharper view (binocular suppression theory) [3], [4]. In the presence of different amounts of blocking artifacts, however, the binocular quality is rated as the average of both views. This means that it should be possible to transmit a stereoscopic video with one reduced resolution view (Mixed Resolution representation) at a lower bit rate and still reach the same quality as for the Full Resolution representation.

3.1.1 Different sampling methods of MRSC

Different ways of reducing the sharpness of one view were investigated. Sub sampling one view in both directions and sub sampling only in one direction were compared. Another method of reducing the sharpness is low pass filtering and coding at the base resolution. These methods have been compared to Full Resolution coding in an informal subjective test with the sequences Mountain, Diving, Performance and Soccer 2 (Figure 3.1).

Figure 3.1: Test sequences for subjective comparison of Mixed Resolution and Full Resolution coding (left view): (a) Mountain (b) Diving (c) Performance (d) Soccer 2

The sequences have a resolution of 320x240 pixels, a frame rate of 30 frames per second and a length of 240 frames (Mountain, Diving and Performance) or 450 frames (Soccer 2). For down-sampling and up-sampling, the filters used in the JSVM software [5] were applied row-wise and column-wise. These are the non-normative dyadic filter for down-sampling (equation 1) and the normative dyadic filter for up-sampling (equation 2):

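$$ \frac{1}{64}\,[\,2,\ 0,\ -4,\ -3,\ 5,\ 19,\ 26,\ 19,\ 5,\ -3,\ -4,\ 0,\ 2\,] \tag{1} $$

$$ \frac{1}{32}\,[\,1,\ 0,\ -5,\ 0,\ 20,\ 32,\ 20,\ 0,\ -5,\ 0,\ 1\,] \tag{2} $$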
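How the two dyadic filters act on one row of samples can be sketched as follows. The signs of the coefficients are an assumption based on the commonly documented JSVM filters, since they are not legible in the equations as extracted; the border handling by clamping and the sample phase are also assumptions, and the function names are hypothetical. Applying the functions to every row and then to every column gives the two-dimensional case.

```python
# Filter taps of equations (1) and (2); signs are assumed (see lead-in).
DOWN_TAPS = [2, 0, -4, -3, 5, 19, 26, 19, 5, -3, -4, 0, 2]   # divided by 64
UP_TAPS = [1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1]             # divided by 32


def downsample2(row):
    """Low pass filter a row and keep every second sample."""
    half = len(DOWN_TAPS) // 2
    last = len(row) - 1
    out = []
    for center in range(0, len(row), 2):
        acc = 0
        for k, tap in enumerate(DOWN_TAPS):
            pos = min(max(center + k - half, 0), last)  # clamp at borders
            acc += tap * row[pos]
        out.append(acc / 64)
    return out


def upsample2(row):
    """Double the number of samples of a row.

    Even output positions copy the input (centre tap 32/32); odd positions
    are interpolated with the remaining taps [1, -5, 20, 20, -5, 1] / 32.
    """
    half_taps = UP_TAPS[0::2]
    last = len(row) - 1
    out = []
    for n, sample in enumerate(row):
        out.append(float(sample))
        acc = 0
        for k, tap in enumerate(half_taps):
            pos = min(max(n - 2 + k, 0), last)  # clamp at borders
            acc += tap * row[pos]
        out.append(acc / 32)
    return out


if __name__ == "__main__":
    row = [10, 10, 10, 10, 50, 50, 50, 50]
    low = downsample2(row)
    print(low)
    print(upsample2(low))
```

Both filters preserve a constant signal, which is a quick sanity check on the normalisation factors 64 and 32.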

The test was carried out on a 3.5" display with barrier technology. It has a total resolution of 640x480 pixels and a resolution of 320x480 pixels per view in 3D mode. The sequences were displayed with the stereoscopic player [6], which performs a vertical up-sampling by a factor of two. Four different coding types were compared (Table 3.1).

Table 3.1: Tested types of sub sampling of coded stereoscopic sequences

         Left view                                                          Right view
Type 1   Coding of full resolution view                                     Coding of full resolution view
Type 2   Horizontal sub sampling by a factor of 2 and coding                Coding of full resolution view
Type 3   Horizontal and vertical sub sampling by a factor of 2 and coding   Coding of full resolution view
Type 4   Low pass filtering in horizontal and vertical direction            Coding of full resolution view
         and coding of full resolution view

The total bit rate was the same for all types, and the bit rate distributions were set to 1:2, 1:4 and 1:8 for left view vs. right view. For coding, H.264/AVC simulcast with the reference software JM 14.2 was used.

The different coding types were subjectively ranked by 5 video coding experts in an informal test, from best to worst, in the following order:

1. Type 3
2. Type 2
3. Type 4
4. Type 1

This indicates that it is possible to transmit a Mixed Resolution sequence with the same bit rate at a higher subjective quality.

To verify these results, further tests have been carried out to show the difference between the best Mixed Resolution representation (Type 3) and Full Resolution. Before further subjective tests could be carried out, the best bit rate distribution had to be found. This was done with objective criteria, described in the next section.

3.1.2 Objective criteria for bit rate allocation

To optimize the bit rate distribution between left and right view, the PSNR measure was used. Following the theory that, in the presence of blocking artefacts, the binocular quality of a stereo sequence is the average of the quality of both views [4], a total PSNR was calculated considering all pixels of both views. To realize this with existing tools, the mean squared error (MSE) was first calculated separately for both views. The total PSNR was then calculated from the mean squared errors of the left and right view using the following equations:

$$ \mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}_t} \tag{3} $$

$$ \mathrm{MSE}_t = \frac{\mathrm{MSE}_1 + \mathrm{MSE}_2}{2} \tag{4} $$

In the case of Mixed Resolution Coding, the calculation of mean squared error was done after up-sampling the lower resolution view. The calculation was done with respect to the low pass filtered and up-sampled original view.
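Equations (3) and (4) can be written out as a short sketch. It assumes 8-bit samples (peak value 255) and views given as flat sequences of sample values; for Mixed Resolution, the lower resolution view is assumed to have been up-sampled before the call. The function names are illustrative.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized sample sequences."""
    if len(ref) != len(test):
        raise ValueError("views must have the same number of samples")
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)


def total_psnr(mse_left, mse_right, peak=255):
    """Total PSNR in dB from the per-view MSEs, equations (3) and (4)."""
    mse_total = (mse_left + mse_right) / 2           # equation (4)
    if mse_total == 0:
        return math.inf                              # identical pictures
    return 10 * math.log10(peak ** 2 / mse_total)    # equation (3)


if __name__ == "__main__":
    left_ref, left_dec = [100, 120, 140, 160], [101, 118, 141, 160]
    right_ref, right_dec = [90, 110, 130, 150], [92, 110, 127, 151]
    psnr = total_psnr(mse(left_ref, left_dec), mse(right_ref, right_dec))
    print(round(psnr, 2), "dB")
```

Averaging the two MSEs before taking the logarithm is equivalent to computing one MSE over all pixels of both views when the views have the same size, which matches the stated intent of considering all pixels of both views.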

Figure 3.2 shows the rate-distortion curves for the sequence Mountain for left and right view for Mixed Resolution in which the left view was coded at half horizontal and half vertical resolution, and Full Resolution without any low pass filtering. A total bit rate of 400 kbit/s was used. The bit rate distribution varied over the entire range from 100% for left view to 100% for right view. Both curves were then interpolated with a cubic spline interpolation to calculate the total PSNR and match the exact value of 400 kbit/s total bit rate.

Figure 3.2: PSNR for left view, right view and total PSNR with (a) Mixed Resolution and (b) Full Resolution

It can be seen that the total PSNR curve for Mixed Resolution has its maximum at 30% for the left (sub-sampled) view, and for Full Resolution at 45% for the left (full) view. Moreover, the total PSNR reaches higher values for Mixed Resolution than for Full Resolution. This comes from the fact that the PSNR for the left view was calculated with respect to the low pass filtered, up-sampled uncoded view. Hence the total PSNR curves do not take blur into account. However, they follow the binocular quality if no difference is visible between Mixed Resolution (with one low pass filtered, down-sampled and up-sampled view) and Full Resolution. It was shown in D2.4 that the difference between Mixed Resolution and Full Resolution is minimized with increasing base resolution, increasing viewing distance and decreasing display size.

Figure 3.3 shows the total PSNR curves for Mixed Resolution and Full Resolution for different sequences with the following total bit rates: Mountain (425 kbit/s), Diving (236 kbit/s), Performance (1280 kbit/s) and Soccer 2 (350 kbit/s).

Figure 3.3: Total PSNR for Mixed Resolution and Full Resolution for the sequences (a) Mountain, (b) Diving, (c) Performance and (d) Soccer 2

The maximum of the total PSNR curve for Full Resolution lies around 50% for the left view. For Mixed Resolution the total PSNR reaches its maximum at 30% (Mountain and Performance), 35% (Soccer 2) and 45% (Diving). This shows that the optimum bit rate distribution between both views is sequence-dependent.
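The search for the best bit rate distribution at a fixed total bit rate can be sketched as follows. This is only an illustration under stated assumptions: the report interpolates the rate-distortion curves with a cubic spline, whereas this sketch substitutes simple piecewise-linear interpolation to stay dependency-free, and the function names and rate-distortion points are hypothetical.

```python
import math

def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs ascending, values clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])

def psnr_to_mse(psnr, peak=255.0):
    return peak ** 2 / 10 ** (psnr / 10.0)

def mse_to_psnr(mse, peak=255.0):
    return 10.0 * math.log10(peak ** 2 / mse)

def best_split(total_rate, left_rd, right_rd, steps=20):
    """Return (left share, total PSNR) maximizing total PSNR.

    left_rd and right_rd are (rates, psnrs) pairs of equally long lists.
    """
    best = None
    for k in range(1, steps):
        share = k / steps
        psnr_l = interp(share * total_rate, *left_rd)
        psnr_r = interp((1.0 - share) * total_rate, *right_rd)
        # Total PSNR from the averaged MSE of both views, as in (3) and (4).
        psnr_t = mse_to_psnr((psnr_to_mse(psnr_l) + psnr_to_mse(psnr_r)) / 2.0)
        if best is None or psnr_t > best[1]:
            best = (share, psnr_t)
    return best

# Hypothetical rate-distortion points (kbit/s, dB), identical for both views.
rd = ([50.0, 100.0, 200.0, 400.0], [28.0, 31.0, 34.0, 36.0])
print(best_split(400.0, rd, rd))
```

With identical curves for both views the optimum lies at an even split; a view that reaches a given PSNR at a lower rate (such as a down-sampled view) shifts the optimum towards a smaller share for that view.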


Figure 3.4 shows the total PSNR curves and their maxima for the sequence Mountain at different total bit rates. It can be seen that the optimum distribution depends on the total bit rate.

Figure 3.4: Total PSNR and corresponding maxima for the sequence Mountain at total bit rates of 200 kbit/s, 400 kbit/s, 600 kbit/s, 800 kbit/s and 1000 kbit/s (from bottom to top)

3.1.3 Subjective evaluation

Small scale subjective tests were carried out to compare Mixed Resolution coding with Full Resolution coding. The test setup was the same as described for the subjective tests with uncoded sequences in D2.4 [5]. The 3.5" and the 32" stereoscopic displays were used. The sequences Mountain, Diving, Performance and Soccer2 were shown to 13 expert viewers in an A-B preference vote in the following order:

AABBAABB

It was chosen randomly whether A was the Mixed Resolution sequence and B the Full Resolution sequence or vice versa. The test persons were then asked to rate whether A or B had the better overall quality. The tested bit rates are shown in Table 3.2.

Table 3.2: Bit rate distribution to left and right view for subjective tests with Mixed Resolution and Full Resolution

| Sequence | Mountain | Mountain | Diving | Performance | Soccer2 | Soccer2 |
|---|---|---|---|---|---|---|
| Total bit rate [kbit/s] | 320 | 425 | 236 | 1280 | 260 | 350 |
| Bit rate left view / total bit rate, Mixed Resolution [%] | 36 | 27 | 39 | 28 | 32 | 40 |
| Bit rate left view / total bit rate, Full Resolution [%] | 47 | 52 | 52 | 51 | 49 | 49 |


Table 3.3: Result of subjective tests with coded sequences

| Setup | Sequence | Total bit rate [kbit/s] | Mixed Resolution better | No difference | Full Resolution better |
|---|---|---|---|---|---|
| Setup I (3.5” display), h/d = 1/10 | Mountain | 320 | 8 | 1 | 4 |
| | Mountain | 425 | 8 | 2 | 3 |
| | Diving | 236 | 10 | 0 | 3 |
| | Performance | 1280 | 9 | 0 | 4 |
| | Soccer2 | 260 | 3 | 1 | 9 |
| | Soccer2 | 350 | 5 | 0 | 8 |
| | total | | 43 | 4 | 31 |
| Setup II (32” display), h/d = 1/5 | Mountain | 320 | 9 | 1 | 3 |
| | Mountain | 425 | 9 | 0 | 4 |
| | Diving | 236 | 8 | 1 | 4 |
| | Performance | 1280 | 8 | 0 | 5 |
| | Soccer2 | 260 | 6 | 0 | 7 |
| | Soccer2 | 350 | 6 | 1 | 6 |
| | total | | 46 | 3 | 29 |


The results of these tests are shown in Table 3.3. It can be seen that for both displays Mixed Resolution has a slightly better binocular quality than Full Resolution. This means that for these relatively low bit rates the stronger blocking artifacts in the case of Full Resolution are more annoying than the slightly less sharp images in the case of Mixed Resolution.
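As a small worked example, the totals of Table 3.3 can be converted into vote shares. The sketch below only restates the numbers from the table (13 viewers and 6 conditions give 78 votes per setup); the variable and function names are illustrative.

```python
# Vote totals from Table 3.3:
# (Mixed Resolution better, no difference, Full Resolution better)
votes = {
    "Setup I (3.5-inch display)": (43, 4, 31),
    "Setup II (32-inch display)": (46, 3, 29),
}

def vote_shares(counts):
    """Return the share of each answer in percent, rounded to one decimal."""
    total = sum(counts)
    return tuple(round(100.0 * c / total, 1) for c in counts)

for setup, counts in votes.items():
    print(setup, sum(counts), vote_shares(counts))
```

On both displays somewhat more than half of the votes favour Mixed Resolution, which matches the "slightly better" wording above.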

Nevertheless the performance of the Mixed Resolution approach is also display dependent. In the large scale subjective evaluation of the coding approaches the Mixed Resolution approach does not outperform the simulcast approach (see upcoming Deliverable 4.3 “Results of quality attributes of coding, transmission and their combinations”). This can be related to the advanced NEC display that is used in the large scale study. The NEC autostereoscopic 3.5” display with a resolution of 428 x 240 is based on a lenticular sheet technology and provides a much better video quality than the display based on parallax barrier technology. The sharpness difference introduced by mixed resolution seems to be more visible on the NEC display. However, the evaluation carried out here shows the potential of the mixed resolution approach. Therefore an advanced mixed resolution approach has been investigated and is presented in the next section.

3.2 Advanced Mixed Resolution Coding (AMRSC)

The coding tests in section 3.1 were carried out without using any prediction between both views. Coding a stereoscopic video with inter-view prediction can result in bit rate savings while maintaining the same quality. It was investigated whether inter-view prediction and Mixed Resolution Coding can be combined to obtain better coding results than applying only one of the two techniques.

The unsharp masking algorithm presented in section 3.2.2 is a method for enhancing the subjective quality of Mixed Resolution sequences. It is a simple algorithm that can be applied on a mobile device with low computational costs.

A further enhancement of the Mixed Resolution approach can be obtained by using optimized down sampling algorithms. An investigation of these methods is not part of this deliverable but presented in the upcoming Deliverable 5.4 (“Advanced algorithms for stereo-video pre-processing”).

3.2.1 Interview Prediction

It was reported in D2.2 that using H.264/MVC (Multiview Video Coding) with inter-view prediction results in a significantly better rate distortion performance than simulcast coding. On the other hand, coding a low pass filtered or sub sampled video requires less bit rate than the original video and maintains a high binocular quality due to the binocular suppression theory. It was investigated how Mixed Resolution Coding can be improved by exploiting inter-view dependences of both views.

3.2.1.1 Low Pass Filtering and Sub sampling

This section describes the coding experiments of low pass filtered and sub sampled views with interview and without interview prediction. The right view was coded with the base resolution. The left view was coded with four different methods:

Method I: low pass filtering and coding with base resolution (Left LP).

Method II: low pass filtering and sub-sampling by a factor of two in both directions and coding at the reduced resolution (Left DS).

Method III: the decoded right view was used for inter-view prediction for the low pass filtered left view (Left LP IV).


Method IV: the decoded right view was low pass filtered and sub-sampled by a factor of two and used for inter-view prediction of the low pass filtered and sub-sampled left view (Left DS IV) [7].
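The pre-processing step shared by Methods II and IV (low pass filtering followed by sub-sampling by a factor of two in both directions) can be illustrated with a short sketch. The filter used in the experiments is not specified here, so the 2x2 box average below is purely an assumption chosen for brevity, and the function name is hypothetical.

```python
def downsample_by_two(image):
    """Low pass filter and sub-sample a grey-level image by two in both directions.

    image is a list of rows with even width and height. Each output sample
    is the mean of a 2x2 block (an assumed, very simple low pass filter).
    """
    height, width = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]

# A frame at the base resolution of 480x272 becomes 240x136.
frame = [[128] * 480 for _ in range(272)]
small = downsample_by_two(frame)
print(len(small[0]), len(small))
```

In Method IV the same operation would be applied to the decoded right view before it serves as the inter-view reference, so that reference and predicted view have the same resolution.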

H.264/MVC was used for all methods. The tested sequences are Hands (251 frames), Snail (189 frames), Horse (140 frames) and Car (235 frames) with a base resolution of 480x272 pixels.

Figure 3.5: Sequences for coding test with Mixed Resolution and interview prediction: (a) Hands, (b) Snail, (c) Horse and (d) Car

The encoder settings are shown in Table 3.4. The QP parameter was varied from 20 to 44. For the methods using interview prediction, the QP for left and right view were the same.

Table 3.4: Encoder settings for Mixed Resolution Coding with interview prediction

| Setting | Value |
|---|---|
| Encoder Implementation | JMVM 7.0 |
| Quantization Parameter | 20, 22, 24, 26, ..., 44 |
| GOP Size | 2 |
| Intra Period | 16 |
| Symbol Mode | CAVLC |


Figure 3.6: Coding results for Mixed Resolution with inter-view prediction, LP = low-pass filtered, DS = down-sampled, IV = inter-view prediction: (a) Hands, (b) Snail


Figure 3.7: Coding results for Mixed Resolution with inter-view prediction, LP = low-pass filtered, DS = down-sampled, IV = inter-view prediction: (a) Horse and (b) Car


The rate-distortion curves are shown in Figure 3.6 and Figure 3.7. The right (original) view has the worst performance because it was coded with base resolution and high details. The coding of the low pass filtered left view (Left LP) shows a better performance because the PSNR was calculated with respect to the low pass filtered uncoded view. The PSNR of the decoded low pass filtered and sub-sampled view (Left DS) was also calculated with respect to the low pass filtered and sub-sampled uncoded view.

It can be seen for all sequences that coding with a lower resolution leads to a better rate distortion performance than only low pass filtering.

With the use of inter-view prediction the low pass filtered version (Left LP IV) reaches some gain for low bit rates. For high bit rates nearly no enhancement is visible.

The use of inter-view prediction for the low pass filtered and down-sampled version (Left DS IV) achieves a gain for all sequences for low and high bit rates. Bit rate savings of up to 70% compared to the coding of a down sampled view without inter-view prediction are possible for some sequences (Horse, Car).

3.2.1.2 Bit rate allocation

In the tests described in section 3.2.1.1 the best result was achieved with coding a sub-sampled version of the left view with inter-view prediction from the right view. The QP values were the same for left and right view. This combination of QP values is not necessarily the best in terms of the overall binocular quality.

It was further tested how the rate distortion performance changes for coding the left view when the QP value of the base (right) view is varied.

The coder settings of Table 3.4 were used for this test, while the QP value of the right view was varied from 20 to 44 with step size 1. The decoded right view was low pass filtered and sub sampled, and used for inter-view prediction of the left view. The left view was coded with QP values from 20 to 44 with step size 4, each in combination with all QP values of the decoded right view.

It can be seen in Figure 3.8 that, starting from the PSNR value for the same QPs for left and right view, the PSNR value of the left view changes if the QP value of the right view varies. The PSNR reaches higher values at lower left-view-only bit rates if the quality of the base view is increased. When the quality of the right view is decreased, the left view requires a higher bit rate and has a lower PSNR value. For the sequence Snail there are some exceptions to this behavior, but only for relatively low bit rates of the right view. Note that this inverse PSNR behavior only occurs because the left view PSNR is plotted against the left-view-only bit rate. For the total PSNR vs. total bit rate, the expected behavior occurs, as shown in the following figures.


Figure 3.8: Left view bit rate vs. left view PSNR; the down-sampled left view was coded with inter-view prediction from right view; the violet curve shows results for same QP values for left and right view; the black curves show the variation of rate and distortion with varying QP of the right view (a) Hands, (b) Snail, (c) Horse and (d) Car

To find out which gain is achievable with different QP values compared to same QP values of left and right view, it is necessary to evaluate both views jointly. This was done by averaging the mean squared errors of both views. The mean squared error of the left view was calculated with respect to the low pass filtered uncoded left view. Figure 3.9 and Figure 3.10 show the total PSNR versus the total bit rate of both views. The displayed numbers are the QP values of the left (down sampled) and the right view with the highest total PSNR for particular bit rates.
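Selecting the QP combinations with the highest total PSNR for particular bit rates amounts to keeping the upper envelope of all measured (total bit rate, total PSNR) points. A minimal sketch of this selection is given below; the function name is an assumption, and of the example points only the three taken from Table 3.5 (Snail) are from the report, the other two are hypothetical.

```python
# QP grid from the text: left view 20..44 in steps of 4,
# right view 20..44 in steps of 1.
qp_pairs = [(qp_l, qp_r) for qp_l in range(20, 45, 4) for qp_r in range(20, 45)]

def rd_envelope(results):
    """Keep the points that no cheaper point matches or beats in total PSNR.

    results: iterable of (qp_left, qp_right, total_rate_kbps, total_psnr_db).
    """
    envelope = []
    best_psnr = float("-inf")
    # Walk through the points by increasing rate; a point is kept only if
    # it improves on the best PSNR reached at any lower rate.
    for point in sorted(results, key=lambda p: (p[2], -p[3])):
        if point[3] > best_psnr:
            envelope.append(point)
            best_psnr = point[3]
    return envelope

results = [
    (20, 24, 319.0, 46.1),  # Table 3.5, Snail
    (24, 28, 186.0, 43.3),  # Table 3.5, Snail
    (28, 31, 123.0, 40.9),  # Table 3.5, Snail
    (20, 20, 410.0, 46.0),  # hypothetical, dominated
    (24, 24, 250.0, 43.2),  # hypothetical, dominated
]
print(len(qp_pairs), [(p[0], p[1]) for p in rd_envelope(results)])
```

The QP pairs that survive this filtering correspond to the numbers displayed along the curves in the figures.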


Figure 3.9: Total PSNR versus total bit rate for Mixed Resolution Coding with inter-view prediction and the QP combinations for left (down sampled) and right view with the highest PSNR for particular bit rates: (a) Hands and (b) Snail


Figure 3.10: Total PSNR versus total bit rate for Mixed Resolution Coding with inter-view prediction and the QP combinations for left (down sampled) and right view with the highest PSNR for particular bit rates: (a) Horse and (b) Car


It can be seen that with this optimization the QP of the right (base) view is always higher than the QP of the left (down sampled) view. The bit rate distribution to left and right view is shown in Table 3.5.

For all tested QPs the bit rate for the right view is higher than the bit rate for the left view. For the sequence Hands the bit rate difference between left and right view is lower than for the other sequences. The reason for this is that the sequence Hands has less inter-view dependences than the sequence Horse for example.
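The difference between the sequences becomes visible when the share of the left view in the total bit rate is computed from Table 3.5. The sketch below does this for the rate points with a left view QP of 20; the bit rates are taken from the table, the names are illustrative.

```python
# (left view bit rate, right view bit rate) in kbit/s at QP left = 20, Table 3.5
rates_qp20 = {
    "Hands": (946, 1412),
    "Snail": (99, 220),
    "Horse": (339, 1445),
    "Car": (240, 965),
}

def left_share(left, right):
    """Share of the left (down-sampled) view in the total bit rate, in percent."""
    return round(100.0 * left / (left + right), 1)

for name, (left, right) in rates_qp20.items():
    print(name, left_share(left, right))
```

For Hands the left view takes roughly 40% of the total bit rate, compared with about 19 to 20% for Horse and Car, which is in line with the weaker inter-view dependences of Hands.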

Figure 3.11 and Figure 3.12 show the comparison between inter-view prediction with optimized QP values and inter-view prediction with the same QPs for left and right view. For the sequences Car and Horse the rate distortion curves are nearly identical for both methods. Using different QP values yields a small gain for the sequence Snail and a significant gain for the sequence Hands.

The use of inter-view prediction with same QPs causes less gain compared to simulcast for the sequence Hands than for the other sequences. Because of that the optimization of the QP values results in high gains for the sequence Hands.

Table 3.5: Coding results for left (down sampled) and right view with interview prediction

| Sequence | Total bit rate [kbit/s] | Total PSNR [dB] | QP left (DS) view | QP right view | Bit rate left view [kbit/s] | Bit rate right view [kbit/s] |
|---|---|---|---|---|---|---|
| Hands | 2358 | 37.5 | 20 | 28 | 946 | 1412 |
| Hands | 1551 | 34.9 | 24 | 32 | 651 | 900 |
| Hands | 1040 | 32.8 | 28 | 35 | 420 | 621 |
| Hands | 610 | 30.4 | 32 | 39 | 252 | 358 |
| Hands | 397 | 28.8 | 36 | 41 | 138 | 258 |
| Snail | 319 | 46.1 | 20 | 24 | 99 | 220 |
| Snail | 186 | 43.3 | 24 | 28 | 56 | 130 |
| Snail | 123 | 40.9 | 28 | 31 | 30 | 92 |
| Snail | 77 | 38.2 | 32 | 35 | 18 | 59 |
| Snail | 54 | 35.7 | 36 | 39 | 12 | 42 |
| Horse | 1784 | 38.8 | 20 | 24 | 339 | 1445 |
| Horse | 1041 | 36.2 | 24 | 27 | 174 | 866 |
| Horse | 612 | 33.9 | 28 | 30 | 88 | 524 |
| Horse | 320 | 31.2 | 32 | 34 | 44 | 276 |
| Car | 1205 | 41.8 | 20 | 24 | 240 | 965 |
| Car | 742 | 39.7 | 24 | 27 | 129 | 612 |
| Car | 441 | 37.7 | 28 | 30 | 68 | 373 |
| Car | 232 | 35.3 | 32 | 34 | 38 | 194 |
| Car | 126 | 33.1 | 36 | 38 | 24 | 103 |


Figure 3.11: Total bit rate versus total PSNR for Mixed Resolution coding without inter-view prediction, with inter-view prediction with same QPs and with different QPs for left and right view: (a) Hands and (b) Snail


Figure 3.12: Total bit rate versus total PSNR for Mixed Resolution coding without inter-view prediction, with inter-view prediction with same QPs and with different QPs for left and right view: (a) Horse and (b) Car


3.2.2 View Enhancement using unsharp masking

The suitability of unsharp masking filters for enhancement of the binocular quality has been evaluated. The filter increases the subjective sharpness of the sub-sampled view, hence the approach has the potential to achieve a sharper overall image, as well as the potential to reduce the sharpness differences between both views. An advantage of this approach is that the bit rate for transmission does not increase.

The resolution of the right (base) view was 480x272 pixels and the resolution of the left (sub sampled) view was 240x136 pixels.

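After decoding and up sampling, an unsharp masking algorithm was applied to the left view. This algorithm uses the following convolution matrix

$$\begin{pmatrix} -\frac{\alpha}{8} & -\frac{\alpha}{8} & -\frac{\alpha}{8} \\ -\frac{\alpha}{8} & 1+\alpha & -\frac{\alpha}{8} \\ -\frac{\alpha}{8} & -\frac{\alpha}{8} & -\frac{\alpha}{8} \end{pmatrix} \qquad (5)$$

for the up-sampled image. This has the effect that in the resulting image, the low frequency components are reduced, while the high frequency components are enhanced. Thus, the algorithm produces a subjectively sharper sequence.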

The parameter α adjusts the factor of the unsharp masking. Here, α values of 2 and 4 and QP values of 25, 28, 31, 34, 37 were tested.
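The filtering step can be sketched as a plain 3x3 convolution. The sketch assumes that the convolution matrix of equation (5) has a centre weight of 1 + α and eight neighbour weights of -α/8, which is the usual form of such a sharpening kernel; the border handling (edge replication) and the clipping to the 8-bit range are additional assumptions, and the function names are hypothetical.

```python
def unsharp_kernel(alpha):
    """Assumed 3x3 kernel: centre 1 + alpha, each of the 8 neighbours -alpha/8."""
    n = -alpha / 8.0
    return [[n, n, n], [n, 1.0 + alpha, n], [n, n, n]]

def unsharp_mask(image, alpha):
    """Apply the kernel to a grey-level image given as a list of rows."""
    kernel = unsharp_kernel(alpha)
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Replicate the border samples outside the image.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    acc += kernel[dy + 1][dx + 1] * image[yy][xx]
            row.append(min(max(acc, 0.0), 255.0))  # clip to 8-bit range
        out.append(row)
    return out

# A vertical edge: flat areas stay unchanged, the edge is accentuated.
edge = [[50, 50, 200, 200] for _ in range(3)]
print(unsharp_mask(edge, 2.0)[1])
```

Under this assumption the kernel weights sum to one, so flat regions keep their brightness, while larger values of α increase the over- and undershoot at edges. The same mechanism also amplifies coding artifacts, since blocking edges are sharpened as well.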

The sequences were shown on the 3.5” and 32” displays (see section 3.1.3) in an informal expert viewing test. It was observed that applying the method to the Mixed Resolution sequences did not improve the binocular quality in all cases. For medium and low bit rates even a slightly worse quality was observed: as expected, the overall subjective sharpness was increased, but the coding artifacts were also amplified, so that the binocular quality decreased. Nevertheless, for high bit rates, which support a quality appropriate in a real life scenario, the positive effect is dominant and the algorithm improves the quality due to the increased overall sharpness (Figure 3.13).


Figure 3.13: Part of sequence Horse for QP=25 (a), (c), (e) and QP=37 (b), (d), (f); (a) and (b): sub sampled decoded left view; (c) and (d): sub sampled decoded left view with unsharp masking; (e) and (f): decoded left view


3.3 Conclusion

Expert evaluation shows that the subjective quality of coded Mixed Resolution sequences is better than that of simulcast coded sequences, due to fewer coding artifacts. The optimized bit rate distribution between left and right view for Mixed Resolution Coding without inter-view prediction was approximately 30 to 35% for the low resolution view.

Nevertheless, in the large scale subjective evaluation of the coding approaches the Mixed Resolution approach does not outperform the simulcast approach (see the upcoming deliverable D4.3 “Results of quality attributes of coding, transmission and their combinations”). This might be related to the advanced display used in the large scale study and the different test methodologies. However, the evaluation carried out in this scope shows the potential of the mixed resolution approach. Therefore the Advanced Mixed Resolution Stereo Coding (AMRSC) approach has been investigated.

The three main features of the AMRSC approach are optimized down sampling, interview prediction and view enhancement using unsharp masking at the receiver side.

Optimized down sampling is reported in D5.4 (“Advanced algorithms for stereo-video pre-processing”) and leads to PSNR gains up to 1 dB for the down-sampled view. Inter-view prediction significantly improves the rate distortion performance. The optimized QP combinations of both views show that the base view should be coded with a higher QP than the predicted low resolution view. The observed QP difference is between 2 and 8 for the tested sequences and settings. Unsharp masking of the low resolution view can enhance the overall quality but only for high bit rates. This does not apply to low or medium bit rates, because coding artifacts are also amplified by sharpening.

Further potential for the Mixed Resolution approach lies in more advanced content-adaptive sharpening and up-sampling algorithms. Approaches that use information from the full view for the reconstruction of the down-sampled view are also conceivable. A higher performance might also result from optimizing the AMRSC approach with a new 3D video quality metric that reflects the implications of the binocular suppression theory better than the PSNR used in this scope.


4 Optimization of coding approaches for subjective tests

In this section methods for the generation of test stimuli for subjective tests are described. Coding results for various stereo video coding approaches, codecs and codec settings are presented. The focus of this section is on the generation of test stimuli with defined bit rates for subjective comparison (in contrast to the objective coding comparisons in the previous Deliverable D2.2 [8]).

Based on the results of D2.2 the coding approaches to be tested have been chosen. From the two possible methods for Video plus Depth Coding (MPEG-C Part 3 using H.264/AVC and H.264 auxiliary picture syntax), MPEG-C Part 3 has been selected. The reason for this is the flexibility of independent bit rate allocation for video and depth provided by MPEG-C Part 3. Coding approaches using interview prediction are H.264/AVC Stereo SEI message and H.264/MVC. Out of those, H.264/MVC has been chosen in line with the 3D Video community, not for performance reasons, but for the backward compatibility given by the possibility to extract a bit stream for 2D presentation. Beyond the methods examined in D2.2, the simple Mixed Resolution approach was optimized to generate test stimuli for the evaluation of coding methods. Furthermore, the new prototype of the software encoder using slice mode has been utilized for the coding carried out for the evaluation of transmission approaches.

Another difference to D2.2 is the choice of test sequences. The coding test set from D2.1 [9] and a transmission test set have been defined matching the user’s needs examined in D4.1 [10]. For coding tests short sequences (~10s) are used. Longer sequences with audio (~60s) have been coded for the transmission tests. Further adjustments concern the spatial and temporal resolution: The video format was adapted to match the resolution of the new NEC display and the frame rate was set to 12.5 or 15 fps.

4.1 Test sequences for subjective evaluation of coding approaches

The subjective evaluation of coding approaches aims at finding the optimal approach for coding stereo video content. Therefore a great variety of coded sequences has been generated. The next sections describe the test setup as well as the coding results.

4.1.1 Test setup

For the large scale evaluation of the four coding approaches Simulcast, Multi View (MVC), Mixed Resolution (MRSC) and Video plus Depth (VD) coding using MPEG-C part 3 a set of test stimuli has been generated.

The coding approaches have been optimized at rate points with a low and a high video quality. Furthermore a baseline and a high codec profile have been used. The six sequences from the coding test set of the stereo video database [9] have been used. This leads to a total number of 4 (approaches) x 2 (qualities) x 2 (profiles) x 6 (sequences) = 96 test stimuli.
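The number of test stimuli follows from enumerating all combinations of approach, quality, profile and sequence. The short sketch below reproduces this count; the labels are taken from the text, the variable names are illustrative.

```python
from itertools import product

approaches = ["Simulcast", "MVC", "MRSC", "V+D"]
qualities = ["low", "high"]
profiles = ["baseline", "high"]
sequences = ["Horse", "Bullinger", "Car", "Mountain", "Butterfly", "Soccer2"]

# One stimulus per combination of approach, quality, profile and sequence.
stimuli = list(product(approaches, qualities, profiles, sequences))
print(len(stimuli))
```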

4.1.1.1 Coding Approaches

H.264/AVC Simulcast

The left and right views are coded as independent streams using H.264/MPEG-4 AVC. This method needs neither pre-processing before coding nor post-processing after decoding, so the complexity on the sender and receiver side is low. Redundancy between channels is not exploited. Optimization is carried out by jointly varying the quantization parameter (QP) for left and right view.


H.264/AVC Multi View Coding (MVC)

H.264/AVC MVC allows inter-view prediction. The left view is used as reference for the right view. Prediction has been enabled for anchor as well as for non-anchor frames. No pre- or post-processing is required on the sender or receiver side. Optimization is carried out by jointly varying the QP for left and right view.

H.264/AVC Mixed Resolution coding (MRSC)

Binocular suppression theory states that perceived image quality is dominated by the view with higher spatial resolution [4]. The mixed resolution approach utilizes this attribute of human perception by decimating one view before transmission and up-scaling it at the receiver side. This enables a trade off between spatial sub-sampling and amplitude quantization. Nevertheless, the resampling requires pre- as well as post-processing.

For experiments in this scope the right view was decimated by a factor of two in horizontal and vertical direction. The simple MRSC approach without interview prediction, optimized down-sampling and unsharp masking has been used. Optimization is carried out by independently varying the QP for left and right view.

MPEG-C Part 3 using H.264/AVC (V+D Coding)

MPEG-C Part 3 defines a video plus depth representation of the stereo video content. Depth was estimated from an original left and right view by the HHI Hybrid Recursive Matching (HRM) algorithm. One view and the associated depth signal are coded. At the receiver the second view is synthesized by depth image based rendering [11]. Compared to video, a depth signal can be coded in most cases at a fraction of the color bit rate at sufficient quality for view synthesis. Nevertheless errors in depth estimation and interpolation at occurring disocclusions introduce artefacts to the rendered view. Optimization is carried out by independently varying the QP for video and depth.

4.1.1.2 Test set

The test set for coding defined in the stereo video database [9] was used to generate the test stimuli. The sequences are shown in Figure 4.1. Details are presented in Table 4.1. All sequences have a frame rate of 15 frames per second.

Table 4.1: Properties of sequences from the coding test set

| Sequence | Genre | Camera movement | Object movement | Structural complexity | Depth complexity | Length in sec. | Size in pixels |
|---|---|---|---|---|---|---|---|
| Horse | Nature | none | low | high | medium | 9.3 | 432x240 |
| Bullinger | News | none | low | low | low | 7.7 | 432x240 |
| Car | Action | high | low | medium | high | 7.8 | 432x240 |
| Mountain | Documentary | medium | low | medium | low | 8.0 | 320x240 |
| Butterfly | Animation | none | high | high | medium | 12.0 | 432x240 |
| Soccer2 | Sports | high | high | medium | high | 13.3 | 320x240 |


Figure 4.1: Sequences of the coding test set: Horse, Car, Bullinger, Mountain, Butterfly and Soccer2


4.1.1.3 Codec Profiles

Coding has been carried out using two codec profiles. The simple baseline profile uses an IPPP structure and CAVLC. The complex high profile enables hierarchical B-Frames and CABAC. For the Simulcast, Mixed Resolution and V+D approach the AVC Reference Software JM 14.2 has been used. The MVC stimuli have been coded using the MVC reference Software JMVC 5.0.5. Table 4.2 shows the used codec settings in detail.

Table 4.2: Codec Settings and Profiles

| Profile | Baseline | High |
|---|---|---|
| GOP Size | 1 (IPPP) | 8 (Hierarchical B frames) |
| Symbol Mode | CAVLC | CABAC |
| Search Range | 48 | 48 |
| Intra Period | 16 | 16 |

4.1.1.4 High and low quality

The coding approaches have been evaluated at a high and a low quality. Note that a constant high and a constant low bit rate for all sequences would not achieve comparable high and low qualities, because the compressibility of the sequences varies. A rate sufficient for a high quality for one sequence might produce a low quality for other sequences. To guarantee a comparable low and a comparable high quality for all sequences, a low and a high rate point had to be determined for each sequence individually. The following approach was used to obtain these rate points:

To define a high and a low quality for all sequences of the coding test set, the quantization parameter (QP) of the codec for simulcast coding was set to 30 for the high quality and 37 for the low quality. This results in a low and a high bit rate for each sequence of the coding test set. The resulting bit rates are shown in Table 4.3 and have been used as target rates for the other three approaches together with the baseline profile.

Table 4.3 Target bit rates in kbit/s for high and low quality

Profile Quality Bullinger Butterfly Car Horse Mountain Soccer2

Baseline Low 74 143 130 160 104 159

Baseline High 160 318 378 450 367 452

High Low 46 94 112 104 78 134

High High 99 212 323 284 208 381

Bit rates for the high profile are also shown in Table 4.3. They are the rates at which the sequences coded with the high profile and simulcast have the same PSNR as the sequences coded with the baseline profile and simulcast at QP 37 and QP 30. This guarantees a comparable objective quality for the baseline and high-profile sequences using simulcast. Hence it can be evaluated subjectively whether the different GOP structures of the two profiles influence the subjective quality in a way that is not reflected by the PSNR.
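
How the PSNR match was found is not stated here. The following is a minimal sketch of one possible way, assuming the high-profile RD curve is available as a few (bit rate, PSNR) samples and that the logarithm of the rate may be interpolated linearly over PSNR between neighbouring samples. The function name and the sample values are hypothetical and are not measurements from this report.

```python
import math


def rate_at_psnr(rd_points, target_psnr):
    """Estimate the bit rate at which an RD curve reaches target_psnr.

    rd_points: list of (rate_kbps, psnr_db) samples of one RD curve.
    Interpolates log(rate) linearly over PSNR between the two samples
    that bracket the target (an assumption, not the report's method).
    """
    pts = sorted(rd_points, key=lambda p: p[1])  # order by PSNR
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        if p0 <= target_psnr <= p1:
            if p1 == p0:
                return r0
            t = (target_psnr - p0) / (p1 - p0)
            return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))
    raise ValueError("target PSNR outside the measured range")


# Hypothetical high-profile samples (kbit/s, dB), for illustration only.
high_profile_rd = [(40.0, 35.0), (80.0, 38.0), (160.0, 41.0)]

# Rate needed by the high profile to reach the PSNR of a baseline stimulus.
matched_rate = rate_at_psnr(high_profile_rd, 36.5)
```

Interpolating in the log-rate domain is a common choice because RD curves are close to linear there; plain linear interpolation would serve equally well for densely sampled curves.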

4.1.2 Coding Results

4.1.2.1 Baseline Profile

Simulcast

Figure 4.2 shows the RD-characteristics used for the optimization of the simulcast approach. For coding a QP range from 18 to 44 with a step size of one was used. Sequences matching the bit rates defined in Table 4.3 have been taken as test stimuli.

The Bullinger sequence is highly compressible due to the very low complexity of the constant background and the only slightly moving foreground. The Butterfly and Horse sequences both have a high structural complexity and no camera movement. Nevertheless, coding gains for Butterfly are higher than for Horse. The reason for this is the absence of noise and a higher similarity of subsequent frames in the artificial scene. In the sequences Mountain, Soccer2 and Car the camera is moving. The strongest camera motion can be found in Car; however, the camera only moves in the forward direction, so the scene changes rather slowly. This explains the higher gains compared to the Mountain and Soccer2 sequences, in which the camera moves in horizontal or vertical direction.

Figure 4.2 PSNR vs. bit rate of left and right view for simulcast coding (baseline profile)

Multi View Coding

The RD characteristics used for the optimization of the MVC approach are shown in Figure 4.3. The sequences have been coded using a QP range from 18 to 44 with a step size of one. Sequences matching the bit rates defined in Table 4.3 have been taken as test stimuli.
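
The selection of a stimulus from such a QP sweep can be sketched as follows. The sketch assumes that "matching" means taking the QP whose measured bit rate is closest to the target rate; the report does not state the exact criterion. The function name and the rate values are hypothetical.

```python
def pick_qp(rate_by_qp, target_kbps):
    """Return the QP whose measured bit rate is closest to the target."""
    return min(rate_by_qp, key=lambda qp: abs(rate_by_qp[qp] - target_kbps))


# Hypothetical excerpt of a QP sweep (QP -> bit rate in kbit/s).
sweep = {28: 95.0, 29: 84.0, 30: 75.0, 31: 66.0}

# 74 kbit/s is the low-rate baseline target for Bullinger in Table 4.3.
chosen_qp = pick_qp(sweep, 74)
```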

A comparison to simulcast coding shows that the coding gain increases. The differences between the sequences are similar. A high gain can be found for the Butterfly sequence. This is related to the similarity of the two artificial views, which enables an efficient inter-view prediction.

Figure 4.3 PSNR vs. bit rate of left and right view for MVC coding (baseline profile)

Mixed Resolution Stereo Coding

To determine the optimal bit rate distribution between the views of the mixed resolution method, the approach suggested in [12] and section 3.1.2 was used. Thus the PSNR shown was calculated from the average MSE of the full view and the up-sampled low resolution view. To take binocular suppression theory into account, the down- and up-sampled original view was taken as reference for the up-sampled low resolution view. Hence the PSNR calculated this way only evaluates the coding quality and not the overall quality.

The left view and the down-sampled right view have been coded with QPs from 18 to 44 with a step size of 2. Coding results are shown in Figure 4.4. Each point represents a QP-combination for the left and the down-sampled right view. The optimal QP-combinations can be found on the envelope of these points. Sequences matching the bit rates defined in Table 4.3 and coded with optimal QP-combinations have been taken as test stimuli. Where necessary, additional intermediate QP-combinations have been coded for this purpose.
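
A minimal sketch of how the envelope of such a point cloud can be extracted, under the assumption that "envelope" means the set of points for which no other QP-combination gives a higher PSNR at the same or a lower bit rate. The function name and the sample points are hypothetical.

```python
def rd_envelope(points):
    """Return the points on the upper RD envelope, ordered by bit rate.

    points: list of (rate_kbps, psnr_db, qp_left, qp_right).
    A point is kept only if its PSNR exceeds that of every point with
    a lower or equal bit rate.
    """
    envelope = []
    best_psnr = float("-inf")
    # Ascending rate; for equal rates the higher PSNR comes first.
    for point in sorted(points, key=lambda p: (p[0], -p[1])):
        if point[1] > best_psnr:
            envelope.append(point)
            best_psnr = point[1]
    return envelope


# Hypothetical coding results for a few QP-combinations.
results = [
    (100, 36.0, 30, 34),
    (120, 35.5, 28, 40),
    (150, 38.0, 28, 32),
    (150, 37.0, 26, 38),
    (200, 38.0, 26, 32),
    (220, 39.5, 24, 30),
]

optimal_qp_pairs = [(p[2], p[3]) for p in rd_envelope(results)]
```

A stimulus for a given target rate would then be taken from the envelope point closest to that rate, with intermediate QP-combinations coded if the envelope is too coarse near the target.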

Figure 4.4 PSNR vs. bit rate of left view and down-sampled right view for MRSC coding (baseline profile)

Figure 4.5 PSNR vs. bit rate of left view and depth for V+D Coding (baseline profile)

Video + Depth Coding

Coding results for V+D coding are shown in Figure 4.5. The PSNR was calculated from the average MSE of the left and the rendered right view. The MSE of the rendered right view was calculated taking the right view rendered from uncoded data as reference. Rendering artifacts already existing in the uncoded data are neglected with this approach. Hence the PSNR calculated this way only evaluates the coding quality and not the overall quality.

The left view has been coded with QPs from 18 to 44 and a step size of 2. For depth, QPs from 8 to 44 or from 18 to 44, depending on the sequence, have been used with a step size of 2. Each point in Figure 4.5 represents a QP-combination for the left view and depth. The optimal QP-combinations can be found on the envelope of these points. Sequences matching the bit rates defined in Table 4.3 and coded with optimal QP-combinations have been taken as test stimuli. Where necessary, additional intermediate QP-combinations have been coded for this purpose.

4.1.2.2 High Profile

Figure 4.6 to Figure 4.9 show the coding results for the high profile. The typical characteristics of the sequences are similar to the baseline profile case. For all sequences a high coding gain can be achieved by using the high profile with hierarchical B-pictures and CABAC. A comparison of the high and baseline profiles is presented separately for each sequence in section 4.1.3.

Simulcast

Figure 4.6 PSNR vs. bit rate of left and right view for simulcast coding (high profile)

Multi View Coding

Figure 4.7 PSNR vs. bit rate of left and right view for MVC coding (high profile)

Mixed Resolution Stereo Coding

Figure 4.8 PSNR vs. bit rate of left view and down-sampled right view for MRSC coding (high profile)

Video + Depth Coding

Figure 4.9 PSNR vs. bit rate of left view and depth for V+D Coding (high profile)

4.1.3 Generated Test Stimuli

Table 4.4 to Table 4.9 show the PSNRs and the bit rate distribution of the resulting test stimuli. The total PSNR was calculated using the MSE of the single left and right views. Note that for MRSC the PSNR of the right view was calculated using the uncoded up- and down-sampled right view as reference. These PSNR values are therefore marked with pluses. For V+D coding the PSNR of the right view was calculated using the right view rendered from uncoded data as reference; these PSNR values are marked with asterisks.
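
The total PSNR can be reproduced from the per-view values as sketched below. The sketch assumes 8-bit luma (peak value 255) and an equally weighted mean of the two MSEs; the function names are illustrative. The example uses the V+D row of Table 4.4 for the baseline profile at the low rate (left 39.2 dB, right 36.0 dB, total 37.3 dB).

```python
import math

PEAK = 255.0  # assumes 8-bit luma samples


def mse_from_psnr(psnr_db):
    """Invert PSNR = 10 * log10(PEAK^2 / MSE)."""
    return PEAK ** 2 / 10 ** (psnr_db / 10)


def psnr_from_mse(mse):
    return 10 * math.log10(PEAK ** 2 / mse)


def combined_psnr(psnr_left, psnr_right):
    """PSNR of both views, computed from the mean of the two MSEs."""
    mean_mse = (mse_from_psnr(psnr_left) + mse_from_psnr(psnr_right)) / 2
    return psnr_from_mse(mean_mse)


total = combined_psnr(39.2, 36.0)  # V+D, Bullinger, baseline, low rate
```

Because the MSEs are averaged rather than the PSNR values, the total is dominated by the view with the lower quality.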

Method PSNR-Y both views [dB] PSNR-Y left view [dB] PSNR-Y right view [dB] Bit rate right / Bit rate total

Base Profile

Low Rate (74 kbit/s)

Simulcast 36.5 36.4 36.6 50%

MVC 39.2 39.1 39.2 36%

MRSC 39.0+ 38.7 39.3+ 32%

V+D 37.3* 39.2 36.0* 23%

High Rate (160 kbit/s)

Simulcast 40.3 40.1 40.6 49%

MVC 41.7 41.5 42.0 39%

MRSC 42.7+ 41.5 44.4+ 38%

V+D 39.5* 41.5 38.2* 33%

High Profile

Low Rate (46 kbit/s)

Simulcast 36.8 36.7 36.9 50%

MVC 37.3 37.5 37.0 36%

MRSC 38.5+ 37.7 39.5+ 38%

V+D 37.3* 38.6 36.3* 26%

High Rate (99 kbit/s)

Simulcast 40.2 40.0 40.4 49%

MVC 40.8 40.7 40.9 38%

MRSC 42.0+ 40.5 44.4+ 44%

V+D 39.4* 41.5 38.0* 21%

Table 4.4 Properties of test stimuli of sequence Bullinger

Table 4.4 shows the coding results for Bullinger. Inter-view prediction leads to bit rate savings of about 24% for the right view and significant PSNR gains for MVC. The optimal share of the bit rate for the down-sampled right view in MRSC ranges from 32% to 44%. Depth can be coded at approximately 20% to 30% of the total bit rate, which leads to the best quality of the left view. A comparison of the baseline and the high profile shows that the high profile enables bit rate savings of about 40%, while the quality for Simulcast, MRSC and V+D coding remains unchanged.

Method PSNR-Y both views [dB] PSNR-Y left view [dB] PSNR-Y right view [dB] Bit rate right / Bit rate total

Base Profile

Low Rate (143 kbit/s)

Simulcast 32.0 32.0 32.1 50%

MVC 37.0 37.0 37.0 27%

MRSC 33.8+ 32.5 35.7+ 45%

V+D 32.6* 34.5 31.2* 19%

High Rate (318 kbit/s)

Simulcast 36.5 36.5 36.6 50%

MVC 40.6 40.6 40.5 28%

MRSC 39.1+ 38.0 40.6+ 38%

V+D 36.2* 38.7 34.6* 27%

High Profile

Low Rate (94 kbit/s)

Simulcast 32.0 32.0 32.1 50%

MVC 33.7 33.7 33.7 23%

MRSC 33.3+ 32.0 35.4+ 49%

V+D 33.6* 35.0 32.5* 8%

High Rate (212 kbit/s)

Simulcast 36.4 36.4 36.4 50%

MVC 38.2 38.3 38.1 24%

MRSC 38.8+ 37.8 40.0+ 39%

V+D 37.2* 38.5 36.3* 29%

Table 4.5 Properties of test stimuli of sequence Butterfly

The results for the Butterfly sequence are shown in Table 4.5. Due to the synthetic character of the sequence and the similarity of both views, inter-view prediction is very efficient. About 50% of the bit rate of the right view can be saved compared to simulcast. The bit rate of the down-sampled right view ranges from 38% to 49% of the total bit rate. The optimal bit rate for depth ranges from about 8% to 29% of the total rate. The bit rates of sequences coded with the high profile are about 33% lower than for sequences coded with the baseline profile. With the high profile, the performance of MVC and MRSC decreases, while the performance of V+D increases.

Method PSNR-Y both views [dB] PSNR-Y left view [dB] PSNR-Y right view [dB] Bit rate right / Bit rate total

Base Profile

Low Rate (130 kbit/s)

Simulcast 33.4 33.5 33.3 52%

MVC 34.4 34.5 34.3 43%

MRSC 35.1+ 34.0 36.7+ 46%

V+D 34.7* 35.7 33.8* 14%

High Rate (378 kbit/s)

Simulcast 37.3 37.5 37.2 52%

MVC 38.2 38.2 38.1 43%

MRSC 39.4+ 38.1 41.1+ 42%

V+D 37.9* 40.2 36.5* 12%

High Profile

Low Rate (112 kbit/s)

Simulcast 33.5 33.7 33.4 52%

MVC 35.1 35.3 34.9 36%

MRSC 35.2+ 34.2 36.4+ 42%

V+D 34.6* 35.9 33.7* 7%

High Rate (323 kbit/s)

Simulcast 37.6 37.7 37.4 52%

MVC 38.4 38.6 38.2 38%

MRSC 39.7+ 39.0 40.5+ 34%

V+D 38.1* 40.3 36.7* 16%

Table 4.6 Properties of test stimuli of sequence Car

Table 4.6 depicts the results for the sequence Car. MVC and MRSC result in bit rate savings of approx. 20% for the right view. Depth can be coded efficiently and needs only about 7% to 16% of the total bit rate. The bit rate of sequences coded with the high profile is about 14% lower than for sequences coded with the baseline profile. The quality remains approximately equal for all methods. Thus the gain achieved by using the more complex coding structure is relatively low.

Method PSNR-Y both views [dB] PSNR-Y left view [dB] PSNR-Y right view [dB] Bit rate right / Bit rate total

Base Profile

Low Rate (160 kbit/s)

Simulcast 29.2 29.2 29.2 49%

MVC 30.8 30.8 30.8 35%

MRSC 31.3+ 29.7 34.1+ 41%

V+D 30.3* 31.0 29.8* 18%

High Rate (450 kbit/s)

Simulcast 34.0 34.0 33.9 49%

MVC 35.3 35.3 35.4 40%

MRSC 37.1+ 35.6 39.2+ 33%

V+D 34.9* 37.2 33.5* 13%

High Profile

Low Rate (104 kbit/s)

Simulcast 29.0 29.0 29.0 49%

MVC 30.1 30.2 30.0 30%

MRSC 31.2+ 29.5 33.8+ 41%

V+D 30.6* 31.5 29.9* 9%

High Rate (284 kbit/s)

Simulcast 33.7 33.6 33.7 50%

MVC 35.2 35.1 35.2 33%

MRSC 37.0+ 36.0 38.4+ 29%

V+D 34.5* 36.6 33.1* 13%

Table 4.7 Properties of test stimuli of sequence Horse

Properties of the sequence Horse are provided in Table 4.7. MVC and MRSC lead to a rate for the right view of approximately 30% to 40% of the total bit rate. Depth can be coded at about 9% to 18% of the total bit rate and leads to a high quality of the left view. The high profile enables bit rate savings of about 35% compared to the baseline profile at approximately equal quality.

Method PSNR-Y both views [dB] PSNR-Y left view [dB] PSNR-Y right view [dB] Bit rate right / Bit rate total

Base Profile

Low Rate (104 kbit/s)

Simulcast 28.7 28.9 28.5 52%

MVC 29.6 29.9 29.3 40%

MRSC 31.7+ 30.0 34.7+ 33%

V+D 29.8* 30.7 29.1* 19%

High Rate (367 kbit/s)

Simulcast 33.2 33.3 33.0 54%

MVC 32.9 33.1 32.7 48%

MRSC 37.0+ 34.9 41.2+ 30%

V+D 33.7* 35.7 32.4* 13%

High Profile

Low Rate (78 kbit/s)

Simulcast 29.1 29.3 28.9 53%

MVC 30.0 30.3 29.6 35%

MRSC 32.7+ 31.1 35.2+ 29%

V+D 31.2* 33.1 29.9* 10%

High Rate (208 kbit/s)

Simulcast 33.4 33.6 33.2 53%

MVC 34.4 34.6 34.1 41%

MRSC 37.3+ 35.3 40.9+ 28%

V+D 33.5* 35.9 32.0* 16%

Table 4.8 Properties of test stimuli of sequence Mountain

Table 4.8 shows coding results for the Mountain sequence. The bit rate distribution of simulcast coding shows that the right view is slightly less compressible than the left view. MVC achieves gains of up to 1 dB compared with simulcast. Nevertheless, inter-view prediction is not efficient at the high rate with the baseline profile. MRSC leads to a bit rate share of about 30% for the down-sampled right view. Depth can be coded at 10% to 19% of the total bit rate. At the low rate, 25% of the bit rate can be saved with the high profile at slightly better quality. At the high rate a saving of about 40% is achieved.

Method PSNR-Y both views [dB] PSNR-Y left view [dB] PSNR-Y right view [dB] Bit rate right / Bit rate total

Base Profile

Low Rate (159 kbit/s)

Simulcast 31.8 31.9 31.7 50%

MVC 33.4 33.6 33.3 37%

MRSC 34.4+ 33.0 36.6+ 37%

V+D 33.4* 34.3 32.7* 16%

High Rate (452 kbit/s)

Simulcast 36.1 36.2 35.9 51%

MVC 36.6 36.8 36.5 43%

MRSC 39.4+ 37.6 42.5+ 36%

V+D 37.3* 38.9 36.2* 13%

High Profile

Low Rate (134 kbit/s)

Simulcast 31.9 32.1 31.8 50%

MVC 32.8 33.0 32.6 35%

MRSC 34.4+ 33.2 36.1+ 34%

V+D 33.1* 34.4 32.1* 10%

High Rate (381 kbit/s)

Simulcast 36.2 36.3 36.1 51%

MVC 37.5 37.6 37.4 40%

MRSC 39.4+ 37.8 42.2+ 34%

V+D 37.2* 39.1 35.9* 15%

Table 4.9 Properties of test stimuli of sequence Soccer2

Coding results for the Soccer2 sequence are presented in Table 4.9. For MVC, PSNR gains of up to 1.5 dB can be reached. The bit rate of the right view ranges from 34% to 43% for MVC and MRSC. Depth can be coded at 10% to 16% of the total bit rate and enables gains of up to 2.7 dB for the left view. Coding with the high profile leads to bit rate savings of about 15% at approximately the same quality.

4.1.4 Conclusion

For the subjective evaluation of coding methods, 96 test stimuli have been generated from the six sequences of the coding test set using Simulcast, Multi View, Mixed Resolution and Video + Depth coding. A baseline and a high codec profile were used. An objective evaluation was carried out at a low and a high quality level.

MVC results in a higher PSNR compared to simulcast. Using V+D and Mixed Resolution coding, the PSNR of the left view increases compared to the simulcast approach. Nevertheless, the quality of the rendered right view, the quality of the down-sampled view and the overall quality of both views are questionable, since rendering artifacts and image distortions introduced by down-sampling cannot be evaluated using the PSNR. Therefore a large-scale subjective test is needed. Results of this test are reported in the upcoming deliverable D4.3 “Results of quality attributes of coding, transmission and their combinations”. Coding using the high profile generates sequences at approximately the same quality level, but with bit rate savings from about 14% to 43% (compare the target rates in Table 4.3).
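
The bit rate savings of the high profile follow directly from the target rates in Table 4.3. A short sketch that recomputes them per sequence and quality level; the dictionary layout and the function name are only illustrative.

```python
# Target rates in kbit/s from Table 4.3, given as (low quality, high quality).
TARGET_RATES = {
    "Bullinger": {"baseline": (74, 160), "high": (46, 99)},
    "Butterfly": {"baseline": (143, 318), "high": (94, 212)},
    "Car": {"baseline": (130, 378), "high": (112, 323)},
    "Horse": {"baseline": (160, 450), "high": (104, 284)},
    "Mountain": {"baseline": (104, 367), "high": (78, 208)},
    "Soccer2": {"baseline": (159, 452), "high": (134, 381)},
}


def saving_percent(baseline_rate, high_rate):
    """Relative bit rate saving of the high profile in percent."""
    return 100.0 * (baseline_rate - high_rate) / baseline_rate


savings = {
    name: tuple(
        saving_percent(b, h) for b, h in zip(r["baseline"], r["high"])
    )
    for name, r in TARGET_RATES.items()
}

for name, (low_q, high_q) in savings.items():
    print(f"{name}: {low_q:.0f}% (low quality), {high_q:.0f}% (high quality)")
```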

4.2 Test sequences for transmission studies

In addition to the examination of the coding approaches, a study on transmission approaches was carried out. This section deals with the preparation of test stimuli for this study. The focus is set on the coding part. Apart from the slice mode, error resilience strategies are discussed in the upcoming deliverable D3.4 (“Stereo DVB-H broadcasting system with error resilient tools”).

4.2.1 Test setup

For the transmission studies, coded sequences have been generated using Simulcast, Multi View and Video plus Depth coding.

The coding approaches have been optimized at rate points with a high video quality. The baseline codec profile has been used. The four sequences from the transmission test set of the stereo video database [9] have been used. Furthermore, sequences have been coded with and without the newly implemented slice mode.

This leads to a total number of 3 (approaches) x 4 (sequences) x 2 (slice mode) = 24 test stimuli.

4.2.1.1 Coding Approaches

The Simulcast, Multi View and V+D coding approaches as described in section 4.1.1.1 have been used. Due to the low performance of the simple Mixed Resolution approach in the subjective coding test (see D4.3), this approach was omitted. The Advanced Mixed Resolution Stereo Coding (AMRSC) approach was not yet available when the test sequences were prepared.

4.2.1.2 Test set

A test set for transmission studies was defined and will be reported in the next update of the stereo video database [9]. The sequences are shown in Figure 4.10. Details are presented in Table 4.10. The sequences RhineValleyMoving, KnightsQuest and HeidelbergAlleys consist of different scenes with varying movement and complexity. All sequences have a length of 60 seconds and are available with audio.

Table 4.10 Properties of sequences from the transmission test set

Sequence Genre Camera Movement Object Movement Structural Complexity Depth Complexity Frame Rate Size in pixels

RollerBlade Sports User Created None High Medium Medium 15.0 320x240

RhineValleyMoving Action High High Medium Low 12.5 432x240

KnightsQuest Animation Various Various Low Low 12.5 432x240

HeidelbergAlleys Documentary Low Low High Various 12.5 432x240

Figure 4.10: Sequences of the transmission test set (panels: RollerBlade1, KnightsQuest, RhineValleyMoving, HeidelbergAlleys)

4.2.1.3 Codec Profile

The transmission study has been carried out using the baseline profile shown in Table 4.2. For all approaches the MVC reference software JMVC 5.0.5 has been used. Inter-view prediction was not used for simulcast and V+D coding.

4.2.1.4 Quality level

The coding approaches have been evaluated at a high quality point. Individual target bit rates for each sequence have been found with the approach described in section 4.1.1.4.

To define a high quality for all sequences of the transmission test set, the quantization parameter (QP) of the codec for simulcast coding was set to 30. This results in a target bit rate for each sequence of the transmission test set. Furthermore, bit rates should not exceed 600 kbit/s. Therefore it was necessary to set a QP of 33 for the RollerBlade sequence. The resulting bit rates are shown in Table 4.11 and have been used as target rates for the other two approaches and the slice mode.
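
The rule described above can be sketched as a small search: start at QP 30 and raise the QP until the simulcast bit rate no longer exceeds the cap. The function name and all rate values are hypothetical, except for the 574 kbit/s at QP 33, which is the RollerBlade target rate from Table 4.11.

```python
def qp_under_cap(rate_by_qp, start_qp=30, cap_kbps=600):
    """Return the lowest QP >= start_qp whose bit rate respects the cap."""
    for qp in sorted(q for q in rate_by_qp if q >= start_qp):
        if rate_by_qp[qp] <= cap_kbps:
            return qp
    raise ValueError("no QP in the sweep meets the bit rate cap")


# Simulcast sweep for RollerBlade (QP -> kbit/s). Only the value at
# QP 33 comes from Table 4.11; the other rates are invented.
rollerblade = {30: 790.0, 31: 710.0, 32: 640.0, 33: 574.0, 34: 515.0}

selected_qp = qp_under_cap(rollerblade)
target_rate = rollerblade[selected_qp]
```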

Table 4.11: Target bit rates in kbit/s

RollerBlade1 RhineValleyMoving KnightsQuest HeidelbergAlleys

574 423 275 341

4.2.1.5 Slice Mode

The methods have been optimized with and without using the newly implemented slice mode. The size of a slice was set to 1000 bytes. This size was chosen because it enables sufficient resilience in case of an error-prone channel without adding too much bit rate (see section 2.5).
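
The effect of a byte-limited slice mode can be illustrated with a simplified packing routine. This is only a sketch under stated assumptions, not the JMVC implementation: it assumes that the coded size of each macroblock is known in advance, ignores slice header overhead and entropy coder state, and places a macroblock that exceeds the budget on its own in a separate slice. The function name and the macroblock sizes are hypothetical.

```python
def pack_slices(mb_sizes, max_slice_bytes=1000):
    """Group consecutive macroblocks into slices of limited size.

    mb_sizes: coded size in bytes of each macroblock, in coding order.
    Returns a list of slices, each a list of macroblock indices.
    """
    slices, current, used = [], [], 0
    for index, size in enumerate(mb_sizes):
        # Close the current slice if this macroblock would overflow it.
        if current and used + size > max_slice_bytes:
            slices.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        slices.append(current)
    return slices


# Hypothetical coded macroblock sizes in bytes.
mb_sizes = [400, 350, 300, 500, 450, 200]
slices = pack_slices(mb_sizes)
```

Each slice can be decoded independently, so the loss of one packet only affects the macroblocks of that slice. Smaller slices improve resilience but add overhead, which is the trade-off behind the 1000 byte setting.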

4.2.2 Coding Results

4.2.2.1 Slice Mode disabled

Simulcast

Figure 4.11 shows the optimization of the simulcast approach. The sequences have been coded using a QP range from 18 to 44 with a step size of one. Sequences matching the bit rates defined in Table 4.11 have been taken as test stimuli.

It can be seen that for RollerBlade1 a high bit rate is needed to achieve a good image quality. The reason for this is not only the high object movement, but also the higher frame rate of this sequence. RhineValleyMoving and HeidelbergAlleys perform very similarly, although their characteristics differ (high movement in RhineValleyMoving and high structural detail in HeidelbergAlleys). KnightsQuest shows the best R-D performance. This can be related to the low structural complexity of this sequence and the absence of noise.

Figure 4.11 PSNR vs. bit rate of left and right view for simulcast coding (slice mode disabled)

Multi View Coding

Figure 4.12 shows the optimization of the MVC approach. The sequences have been coded using a QP range from 18 to 44 with a step size of one. Sequences matching the bit rates defined in Table 4.11 have been taken as test stimuli. The characteristics of the coded sequences are similar to those of the sequences coded with simulcast.

Figure 4.12 PSNR vs. bit rate of left and right view for MVC coding (slice mode disabled)

Video + Depth Coding

The left view has been coded with QPs from 20 to 40 and a step size of 2. For depth, QPs from 8 to 40 have been used with a step size of 2. Coding results are shown in Figure 4.13. Each point represents a QP-combination for the left view and depth. The optimal QP-combinations can be found on the envelope of these points. Sequences matching the bit rates defined in Table 4.11 and coded with optimal QP-combinations have been taken as test stimuli. Where necessary, additional intermediate QP-combinations have been coded for this purpose.

Performance differences between the four sequences are similar to simulcast coding. This can be explained by the fact that the R-D performance is dominated by the left view; the depth plays a minor role since it can be coded at a low rate. One difference can be found for RhineValleyMoving and HeidelbergAlleys. While these sequences perform similarly at high rates using Simulcast or MVC, Figure 4.13 shows that the PSNR for RhineValleyMoving is higher than for HeidelbergAlleys at all bit rates. The reason for that is the low complexity of the RhineValleyMoving depth.

Figure 4.13 PSNR vs. bit rate of left view and depth for V+D Coding (slice mode disabled)

4.2.2.2 Slice Mode enabled

Coding results for Simulcast, MVC and V+D with enabled slice mode are shown in Figure 4.14 to Figure 4.16. It can be seen that the R-D performance is only slightly influenced by the slice mode. The differences between the sequences are the same as those found for coding without the slice mode.

Simulcast

Figure 4.14 PSNR vs. bit rate of left and right view for simulcast coding (slice mode enabled)

Multi View Coding

Figure 4.15 PSNR vs. bit rate of left and right view for MVC coding (slice mode enabled)

Video + Depth Coding

Figure 4.16 PSNR vs. bit rate of left view and depth for V+D Coding (slice mode enabled)

4.2.3 Generated Test Stimuli

Table 4.12 to Table 4.15 show PSNRs and bit rate distribution of the resulting test stimuli. The total PSNR was calculated using the MSE of the single left and right views. Note that for V+D coding the PSNR of the right view was calculated using the rendered right view from uncoded data as reference. Therefore PSNR values are marked with asterisks.

Table 4.12 shows the coding results for sequence Rollerblade coded at a rate of 574 kbit/s. It can be seen that MVC coding outperforms simulcast by 0.7 dB with and 1.5 dB without the slice mode. For V+D coding the depth only needs about 1/6 of the total bit rate. Thus the left view can be coded with much higher quality compared to simulcast. The total PSNR of the V+D approach is low; nevertheless, it cannot be compared directly to the other approaches since it uses a different reference. A comparison of slice and non-slice mode shows that for simulcast and V+D the additional required bit rate is so low that the same QPs are optimal for slice and non-slice mode.

Table 4.12: Coding Results for sequence Rollerblade (574 kbit/s)

Method PSNR-Y both views [dB] PSNR-Y left view [dB] PSNR-Y right view [dB] Bit rate right / Bit rate total

Simulcast 33.0 33.0 33.0 50%

Simulcast Slice Mode 33.0 33.0 33.0 50%

MVC 34.6 34.5 34.7 39%

MVC Slice Mode 33.8 33.7 33.9 38%

V+D 34.6* 36.7 33.2* 16%

V+D Slice 34.6* 36.7 33.2* 15%

Results for RhineValleyMoving are presented in Table 4.13. PSNR gains achieved by MVC are about 0.7 dB. Depth is highly compressible and can be coded at about 10% of the total bit rate. The bit rate related to the slice mode can be neglected for V+D, Simulcast and MVC coding.

Table 4.13: Coding Results for sequence RhineValleyMoving (423 kbit/s)

Method PSNR-Y both views [dB] PSNR-Y left view [dB] PSNR-Y right view [dB] Bit rate right / Bit rate total

Simulcast 36.9 36.9 36.9 50%

Simulcast Slice Mode 36.9 36.9 36.9 50%

MVC 37.6 37.6 37.6 41%

MVC Slice Mode 37.6 37.6 37.6 41%

V+D 38.1* 39.7 37.0* 13%

V+D Slice 37.9* 39.7 36.7* 10%

Table 4.14 depicts results for KnightsQuest. Due to the high similarity of both animated views, high gains can be achieved by inter-view prediction. The rate of the right view is about half the rate of the left view.

Table 4.14: Coding Results for sequence KnightsQuest (275 kbit/s)

Method PSNR-Y both views [dB] PSNR-Y left view [dB] PSNR-Y right view [dB] Bit rate right / Bit rate total

Simulcast 39.4 39.4 39.4 50%

Simulcast Slice Mode 39.4 39.4 39.4 50%

MVC 41.4 41.4 41.4 32%

MVC Slice Mode 40.7 40.8 40.7 31%

V+D 40.2* 42.0 39.0* 24%

V+D Slice 39.9* 42.0 38.5* 19%

Properties of the HeidelbergAlleys test stimuli are provided in Table 4.15. Inter-view prediction leads to gains of up to 1.3 dB. Depth requires about 25% of the total bit rate. Compared to the other sequences this is rather high.

Table 4.15: Coding Results for sequence HeidelbergAlleys (341 kbit/s)

Method PSNR-Y both views [dB] PSNR-Y left view [dB] PSNR-Y right view [dB] Bit rate right / Bit rate total

Simulcast 35.7 35.7 35.7 49%

Simulcast Slice Mode 35.0 35.0 35.1 49%

MVC 36.5 36.4 36.5 40%

MVC Slice Mode 36.5 36.4 36.5 40%

V+D 35.8* 37.2 34.7* 26%

V+D Slice 35.5* 37.2 34.3* 19%

4.2.4 Conclusion

For the subjective evaluation of transmission, 24 test stimuli have been generated from the four sequences of the transmission test set using Simulcast, Multi View and Video + Depth coding.

MVC results in PSNR gains of up to 2 dB compared to simulcast. Using V+D coding, the PSNR of the left view increases by up to 3.7 dB compared to the simulcast approach. Nevertheless, the quality of the rendered right view is questionable, since rendering artifacts cannot be evaluated using the PSNR measure. In most cases the additional bit rate needed for the slice mode can be neglected.

In the next step, equal and unequal error protection will be added to the generated sequences. This method of error resilience is located in the transport layer and does not involve the encoder. Transmission will be simulated using two channel models with different error rates. The resulting 4 (sequences) x 3 (methods) x 2 (slice mode) x 2 (protection mode) x 2 (error rates) = 96 sequences will be evaluated subjectively in a large-scale test. Results from this evaluation can be found in the upcoming Deliverable 4.3 “Results of quality attributes of coding, transmission and their combinations”.
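
The test matrix can be enumerated as a Cartesian product of the five factors. The sketch below only illustrates the count; the labels for the protection modes and error rates are placeholders, since the concrete channel parameters are not given here.

```python
from itertools import product

sequences = ["RollerBlade", "RhineValleyMoving", "KnightsQuest", "HeidelbergAlleys"]
methods = ["Simulcast", "MVC", "V+D"]
slice_modes = ["slice mode off", "slice mode on"]
protection_modes = ["equal protection", "unequal protection"]
error_rates = ["error rate A", "error rate B"]  # placeholder labels

# One tuple per test condition of the planned transmission study.
conditions = list(
    product(sequences, methods, slice_modes, protection_modes, error_rates)
)

print(len(conditions))
```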

5 Conclusion

The prototype of the software encoder using error resilience has been presented. Error resilience is achieved by a new slice mode: frames are stored in smaller data packets that can still be decoded independently in case of losses. Coding tests show that the additional bit rate needed for error resilience can be neglected and that video quality decreases only slightly for error-free channels. For an error-prone channel it has been demonstrated that the new slice mode provides sufficient error resilience and leads to a high gain in video quality.
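The idea of independently decodable packets can be illustrated with a small sketch. It assumes a byte budget per slice and a greedy grouping of consecutive macroblocks. Both assumptions, as well as all function and parameter names, are hypothetical and do not describe the JMVC implementation.

```python
def pack_into_slices(mb_sizes, max_slice_bytes):
    """Greedily group consecutive macroblocks into slices.

    mb_sizes: coded size in bytes of each macroblock (hypothetical input).
    max_slice_bytes: assumed byte budget per slice.
    Returns a list of slices, each a list of macroblock indices.
    A macroblock larger than the budget gets a slice of its own.
    """
    slices = []
    current = []
    current_bytes = 0
    for idx, size in enumerate(mb_sizes):
        # Close the running slice if this macroblock would exceed the budget.
        if current and current_bytes + size > max_slice_bytes:
            slices.append(current)
            current, current_bytes = [], 0
        current.append(idx)
        current_bytes += size
    if current:
        slices.append(current)
    return slices


def decodable_macroblocks(slices, lost_slice_ids):
    """Macroblocks that survive when the given slices are lost."""
    lost = set(lost_slice_ids)
    return [mb for i, s in enumerate(slices) if i not in lost for mb in s]


if __name__ == "__main__":
    slices = pack_into_slices([40, 40, 40, 40, 40], max_slice_bytes=100)
    print(slices)
    print(decodable_macroblocks(slices, lost_slice_ids=[1]))
```

In this toy model, losing one slice removes only the macroblocks it carries, while the remaining slices of the frame stay decodable.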

The evaluation of the Mixed Resolution approach shows that the subjective quality of coded Mixed Resolution sequences is better than that of Simulcast coded sequences, owing to a reduced number of coding artifacts. For Mixed Resolution Coding without inter-view prediction, the optimal share of the total bit rate for the low-resolution view was approximately 30 to 35%. An Advanced Mixed Resolution Stereo Coding (AMRSC) approach has been investigated. The three main features of the AMRSC approach are optimized down sampling, inter-view prediction and view enhancement using unsharp masking at the receiver side. The new AMRSC approach is thus compliant with a unified MVC coding strategy to be used in Mobile3DTV applications.
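As an illustration of the view enhancement step, unsharp masking adds a scaled difference between a signal and its blurred copy back to the signal. The sketch below operates on a single row of 8-bit luma samples. The 3-tap box blur, the `amount` value and the function names are assumptions made for illustration, not the filter or parameters used in AMRSC.

```python
def box_blur_row(row):
    """3-tap box blur with edge replication (assumed low-pass filter)."""
    n = len(row)
    blurred = []
    for i in range(n):
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, n - 1)]
        blurred.append((left + row[i] + right) / 3.0)
    return blurred


def unsharp_mask_row(row, amount=0.5):
    """Sharpen one row of 8-bit samples: out = x + amount * (x - blur(x))."""
    blurred = box_blur_row(row)
    sharpened = []
    for x, b in zip(row, blurred):
        value = x + amount * (x - b)
        # Clip to the valid 8-bit sample range.
        sharpened.append(min(255, max(0, int(round(value)))))
    return sharpened


if __name__ == "__main__":
    edge = [50, 50, 50, 200, 200, 200]
    print(unsharp_mask_row(edge))
```

Flat regions are left unchanged, while samples on either side of an edge are pushed apart, which increases the perceived sharpness of the up-sampled view.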

For the subjective evaluation of coding methods, 96 test stimuli have been generated from the six sequences of the coding test set using Simulcast, Multi View, Mixed Resolution and Video + Depth coding. A baseline and a high codec profile were used. An objective evaluation was carried out at a low and a high quality level. For the subjective evaluation of transmission methods, 24 test stimuli have been generated from the four sequences of the transmission test set using Simulcast, Multi View and Video + Depth coding.

Acknowledgements

The authors want to thank the providers of stereoscopic content that made our research possible.

We would like to thank:

KUK Filmproduktion; www.kuk-film.de; for Horse, Hands, Snail, and Car

Electronics and Telecommunications Research Institute (ETRI); www.etri.re.kr; for Mountain, Soccer2, Diving and Performance

Dongleware Verlags GmbH; www.dongleware.de; for Summer in Heidelberg (“HeidelbergAlleys”)

Red Star Studio Ltd; www.redstar3d.com; for Knight’s Quest

cinovent entertainment GmbH; www.cinovent.com; for Upper Middle Rhine Valley (“RhineValleyMoving”)

Blender Foundation; www.blender.org; for the 3D models of the Big Buck Bunny movie (the Butterfly sequence was rendered from these models; copyright Blender Foundation | www.bigbuckbunny.org)

Mobile 3DTV Content Delivery Optimization over DVB-H System

MOBILE3DTV - Mobile 3DTV Content Delivery Optimization over DVB-H System - is a three-year project which started in January 2008. The project is partly funded by the European Union 7th RTD Framework Programme in the context of the Information & Communication Technology (ICT) Cooperation Theme.

The main objective of MOBILE3DTV is to demonstrate the viability of the new technology of mobile 3DTV. The project develops a technology demonstration system for the creation and coding of 3D video content, its delivery over DVB-H and display on a mobile device, equipped with an auto-stereoscopic display.

The MOBILE3DTV consortium is formed by three universities, a public research institute and two SMEs from Finland, Germany, Turkey, and Bulgaria. Partners span diverse yet complementary expertise in the areas of 3D content creation and coding, error resilient transmission, user studies, visual quality enhancement and project management.

For further information about the project, please visit www.mobile3dtv.eu.

Tuotekehitys Oy Tamlink; Project coordinator; FINLAND

Tampereen Teknillinen Yliopisto; Visual quality enhancement, Scientific coordinator; FINLAND

Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V.; Stereo video content creation and coding; GERMANY

Middle East Technical University; Error resilient transmission; TURKEY

Technische Universität Ilmenau; Design and execution of subjective tests; GERMANY

MM Solutions Ltd.; Design of prototype terminal device; BULGARIA

MOBILE3DTV project has received funding from the European Community’s ICT programme in the context of the Seventh Framework Programme (FP7/2007-2011) under grant agreement n° 216503. This document reflects only the authors’ views and the Community or other project partners are not liable for any use that may be made of the information contained therein.