

Comparison of Evaluation Methods for the Quality Assessment of Audio Signals

Ulrike Sloma¹, Florian Schäfer¹
¹ Institute for Media Technology, 98693 Ilmenau, Germany, Email: [email protected]

Introduction

When evaluating audio signals, not only the overall audio quality but also the underlying quality parameters are of high interest. For this purpose, relevant quality features, e.g., attribute and vocabulary lists, have been determined by several researchers. When a set of audio signals is to be described with a few predefined quality features, an appropriate evaluation design has to be found. This study presents a comparison of two evaluation paradigms: a single-stimulus-with-multiple-attributes method and a multi-stimulus-with-single-attribute method were used to evaluate the same stimuli and quality attributes.

Research Goal

The aim of the study was to compare two different evaluation methods for the assessment of different audio signals and several quality features. The first method presents the audio stimuli sequentially and asks for the grading of all quality features in parallel; it is called the single-stimulus-with-multiple-attributes (SSMA) method. The second is a multi-stimulus-with-single-attribute (MSSA) method, in which a group of audio signals is presented in parallel and the participant rates them according to one quality feature; the attributes are presented sequentially. Table 1 and figure 1 give an overview of both methods.
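The difference between the two paradigms can be sketched as two ways of grouping the same ratings into test screens. The following Python sketch is only an illustration: the function names, the dictionary layout and the placeholder stimulus names are assumptions and are not taken from the test software used in the study.

```python
def ssma_trials(stimuli, attributes):
    """SSMA: one screen per stimulus, all attributes rated on that screen."""
    return [{"stimuli": [s], "attributes": list(attributes)} for s in stimuli]


def mssa_trials(stimuli, attributes):
    """MSSA: one screen per attribute, the whole stimulus group rated on it."""
    return [{"stimuli": list(stimuli), "attributes": [a]} for a in attributes]


def ratings_per_session(trials):
    """Total number of single ratings a participant gives in one session."""
    return sum(len(t["stimuli"]) * len(t["attributes"]) for t in trials)


# Placeholder names, chosen only for illustration.
stimuli = ["stimulus_A", "stimulus_B", "stimulus_C"]
attributes = ["distance", "width"]

ssma = ssma_trials(stimuli, attributes)
mssa = mssa_trials(stimuli, attributes)
```

Both sessions collect the same number of ratings; only the grouping into screens differs, which is the property the comparison relies on.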

Advantages and disadvantages of each method are examined to find a suitable method for multidimensional audio quality evaluation. A closer look is taken at the test duration, the differences in perception between the test methods, the presentation order and the signal types used.

Listening Test Design

The listening test was divided into two test sessions with a short break in between. Each session covered one test method. The order in which the methods were presented differed between participants. The same stimuli and quality features were used for both methods. The test was conducted in German and English to reach a wider range of participants.

Table 1: Overview of the methods used

                    SSMA                        MSSA
                    (single-stimulus-with-      (multi-stimulus-with-
                    multiple-attributes)        single-attribute)
audio signals       sequential                  parallel
quality features    parallel                    sequential

Fig. 1: Schemes of the methods used (panels: SSMA, MSSA)

This section describes the listener task, the listening room, the choice of audio signals and quality features, and the listener panel.

Listener Task:

The test was divided into a training phase and two test sessions as follows:

1. Training: Getting familiar with the audio signals and the quality features.

2. 1st method: Group 1: SSMA; Group 2: MSSA

3. 2nd method: Group 1: MSSA; Group 2: SSMA
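The counterbalanced session order listed above could be generated as in the following sketch. How participants were actually assigned to the two groups is not stated; the alternating assignment by participant index and the function name `method_order` are assumptions made only for illustration.

```python
def method_order(participant_index):
    """Return (group, session order) for one participant.

    Assumption: participants are assigned alternately, even index to
    Group 1 and odd index to Group 2. The paper does not describe the
    actual assignment rule.
    """
    if participant_index % 2 == 0:
        return ("Group 1", ["SSMA", "MSSA"])
    return ("Group 2", ["MSSA", "SSMA"])


# 22 participants took part in the study.
schedule = [method_order(i) for i in range(22)]
group_1_size = sum(1 for group, _ in schedule if group == "Group 1")
```

With an even number of participants, alternating assignment yields two groups of equal size, so each method is presented first equally often.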

Quality features:

For the evaluation task, five perceptual quality features were chosen. Table 2 gives an overview. The SAQI vocabulary [5], which is available in German and English, served as the reference. The scale end labels defined in the SAQI vocabulary were modified by Schäfer [6] to make them clearer to the participants.

Table 2: Quality features

quality feature        scale end labels
distance               closer - more distant
width                  narrower - wider
localizability         more difficult locatable - easier locatable
reverberation level    less reverb - more reverb
naturalness            unnatural - natural

DAGA 2016 Aachen


Listening room and speaker set-up:

A standardized listening lab at Technische Universität Ilmenau was used as the test environment. The loudspeakers were arranged in a standardized 5.0 loudspeaker set-up. The room requirements and the set-up are defined in [1] and [2].

Signals:

Three different signal types were chosen to cover a broad signal range:

- castanets
- music (Eddie Rabbitt)
- male speech

The signals were taken from [3]. For each signal type, four versions were created:

1. mono → center speaker

2. stereo → left + right speaker

3. phase stereo → left + right speaker, phase-shifted signal on the right side

4. reverberation → left + right speaker, rear left + right speakers added
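The resulting stimulus set can be enumerated as the product of signal types and versions, as in the sketch below. The channel abbreviations (C, L, R, Ls, Rs) and the variable names are shorthand introduced here for illustration and do not come from the paper.

```python
from itertools import product

signal_types = ["castanets", "music", "male speech"]

# Active loudspeakers per version; channel names are illustrative shorthand.
versions = {
    "mono": ["C"],
    "stereo": ["L", "R"],
    "phase stereo": ["L", "R"],               # right channel phase shifted
    "reverberation": ["L", "R", "Ls", "Rs"],  # rear speakers added
}

# Every signal type is combined with every version.
stimuli = list(product(signal_types, versions))
```

Three signal types in four versions give twelve stimuli per method.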

The choice of the audio signals was inspired by Berg and Rumsey [4], who focused on identifying perceptual attributes with a Repertory Grid elicitation method. In their approach, the rating of quality features previously derived by individual verbal elicitation is similar to the SSMA method.

Listener panel:

22 participants (14 male, 8 female) with an average age of 27 years took part in the study. They ranged from naive to experienced listeners. None of the listeners was rejected in the post-screening.

Results

The following section gives an overview of the main outcomes of the study. For the statistical evaluation, the similarity of the distributions of the signals selected for comparison was tested with a Kolmogorov-Smirnov test. Significance was then tested with a Wilcoxon rank sum test in the case of equal distributions and with a two-sample permutation test in the case of unequal distributions. The significance level is 0.05. In figures 2 and 3, the gray boxes indicate significant differences between the two methods. Figure 4 visualizes this for differences based on the presentation order.
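The decision procedure described above can be sketched in plain Python as follows. This is not the analysis code of the study: all function names are hypothetical, a per-comparison level of 0.05 is assumed, and all p-values (including those for the Kolmogorov-Smirnov and rank sum statistics) are approximated here by Monte Carlo permutation, which is a simplification chosen to keep the sketch free of external libraries.

```python
import random
from bisect import bisect_left, bisect_right

ALPHA = 0.05  # assumed per-comparison significance level


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the ECDFs."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


def rank_sum(a, b):
    """Wilcoxon rank sum of sample a (midranks for ties) minus its null expectation."""
    pooled = sorted(list(a) + list(b))
    ranks = sum((bisect_left(pooled, x) + 1 + bisect_right(pooled, x)) / 2 for x in a)
    return ranks - len(a) * (len(pooled) + 1) / 2


def mean_diff(a, b):
    """Difference of the sample means, used for the plain permutation test."""
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_p(a, b, statistic, n_resamples=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the given statistic."""
    rng = random.Random(seed)
    observed = abs(statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        resampled = abs(statistic(pooled[:len(a)], pooled[len(a):]))
        if resampled >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


def compare(a, b, alpha=ALPHA):
    """KS check first, then rank sum (equal shapes) or mean permutation test."""
    equal_distribution = permutation_p(a, b, ks_statistic) > alpha
    statistic = rank_sum if equal_distribution else mean_diff
    p = permutation_p(a, b, statistic)
    return {"test": statistic.__name__, "p": p, "significant": p < alpha}
```

Calling `compare(ratings_ssma, ratings_mssa)` with the two lists of ratings for one stimulus and one quality feature returns the chosen test, its p-value and the significance decision. In practice a statistics package would normally replace the hand-written statistics.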

Duration:

The time needed to complete the test varies with the evaluation method. Table 3 gives an overview of the average times, and figure 2 visualizes them for both groups. On average, the MSSA method took about twice as long as the SSMA method. The presentation order also influenced the completion time: if the SSMA method was presented first, the average time needed for the MSSA method decreased by about 12 minutes. Conversely, presenting the MSSA method first had no influence on the time needed for the subsequent SSMA method.

Table 3: Average duration (Ø) for both test methods

           SSMA              MSSA
Group 1    Ø 13.7 minutes    Ø 20.8 minutes
Group 2    Ø 14.1 minutes    Ø 32.5 minutes
all        Ø 13.9 minutes    Ø 26.7 minutes

Fig. 2: Duration of both test methods with respect to the presentation order; SSMA influences the duration of MSSA, but not the other way round
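The statements about the duration can be checked directly against the averages in Table 3. The short sketch below only redoes this arithmetic; the variable names are chosen for illustration.

```python
# Average durations in minutes, copied from Table 3.
durations = {
    "Group 1": {"SSMA": 13.7, "MSSA": 20.8},  # SSMA first, then MSSA
    "Group 2": {"SSMA": 14.1, "MSSA": 32.5},  # MSSA first, then SSMA
    "all": {"SSMA": 13.9, "MSSA": 26.7},
}

# How much longer MSSA takes than SSMA, averaged over all participants.
ratio_all = durations["all"]["MSSA"] / durations["all"]["SSMA"]

# Effect of the presentation order on each method (second minus first group).
mssa_saving = durations["Group 2"]["MSSA"] - durations["Group 1"]["MSSA"]
ssma_shift = durations["Group 2"]["SSMA"] - durations["Group 1"]["SSMA"]

print(f"MSSA/SSMA ratio: {ratio_all:.2f}")
print(f"MSSA is {mssa_saving:.1f} min shorter when SSMA comes first")
print(f"SSMA differs by {ssma_shift:.1f} min between the groups")
```

The ratio of about 1.9 corresponds to the "twice the time" statement, the 11.7 minutes to the reported decrease of roughly 12 minutes, and the 0.4 minutes to the absence of an order effect on SSMA.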

Comparison of the methods:

The box plots in figure 3 show the results of both methods for all signals. There are only a few differences in the perceptual evaluation between the two methods. Most differences are found for the quality feature “width”. For “distance” and “reverberation level”, both methods yielded the same results. In total, 12 out of 60 comparisons show significant differences. It can be concluded that the method has only a minor influence on the resulting values.

Influence of the presentation order:

As an example, figure 4 shows the results for the quality feature “reverberation level” for the two orders of method presentation. In total, over all quality features, all signals and both methods, only eight of the 120 comparisons regarding the presentation order showed significant differences. It can be stated that the order of presentation has only a minor influence on the resulting values.
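The numbers of comparisons follow from the test design (three signal types, four versions, five quality features, two methods). The sketch below redoes this count and relates the reported numbers of significant differences to it; the expected number of chance findings assumes a level of 0.05 per comparison and that no true differences exist, which is an assumption added here for orientation only.

```python
n_signal_types = 3
n_versions = 4
n_features = 5
n_methods = 2

# One method comparison per stimulus and quality feature.
method_comparisons = n_signal_types * n_versions * n_features
# One order comparison per stimulus, quality feature and method.
order_comparisons = method_comparisons * n_methods

share_method = 12 / method_comparisons  # reported: 12 significant differences
share_order = 8 / order_comparisons     # reported: 8 significant differences

ALPHA = 0.05  # assumed per-comparison significance level
expected_by_chance_method = ALPHA * method_comparisons
expected_by_chance_order = ALPHA * order_comparisons

print(f"{method_comparisons} method comparisons, {share_method:.0%} significant")
print(f"{order_comparisons} order comparisons, {share_order:.1%} significant")
```

Under these assumptions, the share of significant order effects is close to what chance alone would produce, while the share of significant method effects is somewhat higher.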

Evaluation of the quality features:

The quality features can be divided into two categories. Category one comprises the quality features “distance”, “width” and “reverberation level”; category two consists of “localizability” and “naturalness”. Within each category, the resulting values of the quality features are positively correlated across the signals (figure 3). The two categories are negatively correlated with each other.
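Such a relationship between quality features could be quantified with a correlation coefficient computed over the per-stimulus values. The following sketch uses a hand-written Pearson coefficient and made-up numbers that are not the study's data; it only illustrates what a positive correlation within a category and a negative correlation between categories look like.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up per-stimulus values, for illustration only (NOT measured data).
width = [1.0, 2.0, 3.0, 4.0]
distance = [1.0, 2.0, 2.5, 4.0]
naturalness = [4.0, 3.0, 2.0, 1.0]

r_within = pearson(width, distance)      # same category
r_between = pearson(width, naturalness)  # different categories
```

For rating scales, a rank-based coefficient may be the more appropriate choice; the Pearson coefficient is used here only because it is short to write down.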


Fig. 3: Evaluation results for the presented signals for each method (SSMA and MSSA); structure: signal type, version, method; a) distance, b) width, c) localizability, d) reverberation level and e) naturalness; signals marked in gray: significant difference between the two methods

Fig. 4: Influence of the presentation order on the quality perception; left: group 1 (1st: SSMA, 2nd: MSSA), right: group 2 (1st: MSSA, 2nd: SSMA); quality feature: reverberation level

Influence of the signal types:

Regarding the signals, both the different versions of a signal type and the signal types themselves lead to different perceptions of the quality features. The signal content thus influences the evaluation.

Conclusion

The results show a minimal influence of the method used on the evaluation of perceptual qualities, and no specific pattern can be observed within the differences found. Furthermore, the presentation order has only a small influence on the resulting values. The major difference found is the time needed to complete each method.

There are only a few differences between the results of the two evaluation methods. Since SSMA takes only about half the time, it is the more suitable method for the evaluation of the given quality features and the evaluated spatial audio signals, especially when the testing time is to be minimized.

As a next step, the evaluation can be repeated for other use cases, signals and quality features to examine whether the statements can be generalized.

Acknowledgment

The need for a suitable evaluation method arose while working on the project NoSteaQ (Non Standardized Evaluation of Audio Quality) [7]. The focus of this project is spatial audio quality evaluation with regard to different listening rooms, taking the multidimensionality of audio quality into account. The test design and execution were part of a master thesis by Schäfer [6].

The authors would like to thank all participants who took part in the listening test. This work is supported by a grant of the Deutsche Forschungsgemeinschaft (Grant BR 1333/13-1).


References

[1] Rec. ITU-R BS.1116-1, “Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems”, ITU, 1994-1997.

[2] Rec. ITU-R BS.775-3, “Multichannel stereophonic sound system with and without accompanying picture”, ITU, 2012.

[3] EBU (2008, Oct. 7), “EBU SQAM CD - Sound Quality Assessment Material recordings for subjective tests” [zip-archive], Available: https://tech.ebu.ch/publications/sqamcd

[4] Berg, J., and Rumsey, F., “Identification of Perceived Spatial Attributes of Recordings by Repertory Grid Technique and Other Methods”, AES 106th Convention, Preprint 4924 (K1), 1999.

[5] Lindau, A., Erbes, V., Maempel, H.-J., Lepa, S., Brinkmann, F., and Weinzierl, S., “A focus group approach towards a Spatial Audio Quality Inventory for virtual acoustic environments”, EAA Joint Auralization and Ambisonics Symposium, 2014.

[6] Schäfer, F., “Vergleich verschiedener Evaluierungsmethoden für die Untersuchung der wahrgenommenen Qualität von räumlichen Signalen”, Master Thesis, TU Ilmenau, 2015.

[7] Sloma, U., “Multidimensionality and context dependencies in quality evaluation of spatial audio signals”, Proc. of the 2nd Int. Conf. on Spatial Audio - ICSA, 2014.
