

live.ece.utexas.edu/publications/2012/moorthy_icc2012.pdf

Mobile Video Quality Assessment Database
Anush Krishna Moorthy, Lark Kwon Choi, Alan Conrad Bovik and Gustavo de Veciana

Department of Electrical & Computer Engineering, The University of Texas at Austin,

Austin, TX 78712-1084, USA.

Abstract—We introduce a new research tool – the LIVE Mobile Video Quality Assessment (VQA) database. The database consists of pristine reference and distorted videos, along with human/subjective opinion scores of the associated video quality. The database was designed towards improving our understanding of human judgements of time-varying video quality in heavily-trafficked wireless networks. A byproduct of a better understanding could be quantitative models useful for the development of perceptually-aware algorithms for resource allocation and rate adaptation for video streaming. The database consists of 200 distorted videos created from 10 RAW HD videos acquired using a RED ONE digital cinematographic camera. It includes static distortions such as compression and wireless packet loss, as well as dynamically varying distortions. We describe the creation of the database, the simulated distortions and the human study that we conducted to obtain 5,300 time-sampled subjective traces of quality and summary subjective scores. We analyze the results obtained for certain subclasses of distortions from this large database that are of interest in the context of wireless video delivery. The LIVE Mobile VQA database, including the human subjective scores, will be made available to researchers in the field at no cost in order to aid the development of novel strategies for video-aware resource allocation.

I. INTRODUCTION

According to the Cisco Visual Networking Index (VNI) forecast [1], global mobile data traffic nearly tripled in 2010 for the third consecutive year, and nearly 50% of this traffic was accounted for by mobile video, with this number predicted to increase to more than 75% by 2015. Associated with this explosion in mobile video streaming is a paucity of bandwidth that is already evident [2] and is only predicted to get worse [3]. In this environment, developing frameworks for efficient resource allocation when transmitting video is a topic of relevant practical interest, and a field of intense study. A promising direction of research is the perceptual optimization of wireless video networks, wherein network resource allocation protocols are designed to provide video experiences that are measurably improved under perceptual models.

Since the final receivers of videos transported over wireless networks are humans, it is imperative that perceptual models for resource optimization capture human opinion on visual quality. Here, we summarize some results from a large scale human (subjective) study that we recently conducted to gauge subjective opinion on HD videos when displayed on mobile devices.

There exist several subjective studies for video quality assessment (VQA) [4]–[7]; however, these studies have been performed on large-screen displays, and the distortions have included compression and/or transmission over networks. Studies on mobile displays include that in [8], which evaluated the quality of the H.264 scalable video codec (SVC), that in [9], which evaluated image resolution requirements for MobileTV, as well as those in [10]–[13].

While each of the above databases and studies are valuable, almost all of them suffer from one or more of the following problems: (1) small, insignificant database size, (2) insufficient distortion separation for judgements on perceptual quality, (3) unknown sources (not necessarily uncompressed), with unknown source distortions, (4) low video resolutions that are not relevant in today's world, and (5) lack of public availability of the database. In order to provide the research community with a modern and adequate resource enabling suitable modeling of human subjective opinion on video quality, we have created a large database of broad utility.

The LIVE Mobile VQA database consists of 200 distorted videos at HD (720p) resolution, and provides human opinion on subjective quality obtained from analyzing responses from over 50 subjects, resulting in 5,300 summary subjective scores and time-sampled subjective traces of quality. The study was conducted on a small mobile screen (4”) as well as a larger tablet screen (10.1”) and encompasses a wide variety of distortions, including compression and wireless packet loss. More importantly, the database is the first of its kind to include dynamic distortions, i.e., distortions that vary as a function of time as the subject views a video, in order to simulate scenarios with variable bit-rate delivery.

In this article, we summarize certain key aspects of the database construction and the human study conducted, and describe significant results that are relevant to researchers in the wireless video space.

II. SUBJECTIVE ASSESSMENT OF MOBILE VIDEO QUALITY

A. Source Videos and Distortion Simulation

Source videos were obtained using a RED ONE digital cinematographic camera, and the 12-bit REDCODE RAW data was captured at a resolution of 2K (2048×1152) at frame rates of 30 fps and 60 fps using the REDCODE 42 MB/s option to ensure the best possible acquisition quality. Source videos were first truncated to 15 seconds and then downsampled to a resolution of 1280×720 (720p) at 30 fps and converted into uncompressed .yuv files. A total of 12 videos from a larger set were used for the study, two of which were used for training the subjects, while the rest were used in the actual study. Figure 1 shows sample frames from some of the video sequences.
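As a rough check on the storage such uncompressed sources require, the clip dimensions above fix the file size exactly. This is a sketch assuming planar 4:2:0 chroma subsampling at 8 bits per sample, the usual layout of .yuv files; the paper does not state the chroma format:

```python
# Back-of-the-envelope size of one uncompressed .yuv source clip,
# assuming planar 4:2:0 chroma subsampling with 8 bits per sample.
WIDTH, HEIGHT = 1280, 720   # 720p resolution used in the study
FPS, DURATION_S = 30, 15    # 30 fps, 15-second clips

frames = FPS * DURATION_S                  # 450 frames per clip
bytes_per_frame = WIDTH * HEIGHT * 3 // 2  # Y plane + quarter-size U and V
clip_bytes = frames * bytes_per_frame

print(frames, bytes_per_frame, clip_bytes)  # 450 1382400 622080000
```

Each source clip thus occupies roughly 0.62 GB, which is why compressed delivery at a few Mbps is the practical regime the study targets.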


Fig. 1. Example frames of the videos used in the study.

Fig. 2. Rate Adaptation: Schematic diagram of the three different rate-switches in a video stream simulated in this study.

Each of the reference videos was subjected to a variety of distortions including: (a) compression, (b) wireless channel packet-loss, (c) frame-freezes, (d) rate adaptation and (e) temporal dynamics.

Each source video was compressed using the JM reference implementation of the H.264 scalable video codec (SVC) [] at four rates R1, R2, R3, R4, where R1 < R2 < R3 < R4 lie between 0.7 Mbps and 6 Mbps, using fixed QP encoding, yielding 40 distorted videos. The QP values (and hence the bit-rates) were selected manually for each video to ensure perceptual separation, so that humans (and algorithms alike) are capable of producing judgements of visual quality [4], [14].

Wireless packet loss was simulated using a Rayleigh fading channel over which each compressed and packetized video was transmitted, which led to a total of 40 distorted videos. Four frame-freeze conditions were simulated for each source video, yielding a total of 40 distorted videos.

To investigate if humans are more sensitive to changes in distortion levels rather than the absolute level of the distortion – similar to the behavior seen in psychovisual studies [15] – we also simulated rate-changes as a function of time as the subject views a particular video. Specifically, the subject starts viewing the video at a rate RX, then after n seconds switches to a higher rate RY, then again after n seconds switches back to the original rate RX, as illustrated in Fig. 2. We simulated three different rate switches, where RX = R1, R2 and R3 and RY = R4. Although the duration n is another potential variable affecting human quality of experience, because of the length of the test sessions we fixed n = 5 sec., which along with the three rate switches yielded a total of 30 rate-adapted distorted videos.
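The switch pattern just described can be written down as a short per-second trace. The helper below is purely illustrative; the function name and per-second granularity are our own choices, not from the study's tooling:

```python
def rate_trace(r_x, r_y, n=5, duration=15):
    """Per-second bit-rate trace for the R_X -> R_Y -> R_X pattern:
    n seconds at r_x, n seconds at r_y, then back to r_x."""
    trace = [r_x] * n + [r_y] * n + [r_x] * n
    return trace[:duration]

# Hypothetical values with R1 = 0.7 Mbps and R4 = 6.0 Mbps:
print(rate_trace(0.7, 6.0))
```

With n = 5 and 15-second clips, choosing RX from {R1, R2, R3} with RY = R4 gives the three switch patterns of Fig. 2; one pattern per reference video yields the 30 rate-adapted clips.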

Fig. 3. Temporal Dynamics: Schematic illustration of two rate-change patterns across the video; the average rate remains the same in both cases. Left: multiple changes. Right: single rate change. Note that we have already simulated the single rate-change condition as illustrated in Fig. 2; hence we ensure that the average bit-rate is the same for these two cases.

Fig. 4. Temporal Dynamics: Schematic illustration of rate-change scenarios. The average rate remains the same in all cases and is the same as in Fig. 3. The first row steps to rate R2 and then steps to a higher/lower rate, while the second row steps to R3 and then back up/down again.

A temporal rate (and thus quality) dynamics condition was simulated to evaluate the effect of multiple rate-switches (as against the single switch in the previous condition). The rate was varied between R1 and R4 multiple times (3), as illustrated in Fig. 3. We also simulated a set of distorted videos which evaluated the effect of the abruptness of the switch, i.e., instead of switching directly between R1 and R4, the rate was first switched to an intermediate level RZ from the current level and then to the other extreme. We simulated the following rate-switches: (1) R1−R2−R4, (2) R1−R3−R4, (3) R4−R2−R1 and (4) R4−R3−R1, as illustrated in Fig. 4. The average bit-rate across the temporal dynamics and the rate adaptation conditions remained the same, to enable an objective comparison across these conditions. This yielded a total of 50 distorted videos.
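The equal-average-rate constraint linking the single-switch and multiple-switch conditions is easy to verify numerically. The segment durations below are hypothetical, chosen only so that the total time spent at R4 matches across patterns; the paper does not publish its exact segment lengths for the multiple-switch condition:

```python
def average_rate(segments):
    """Time-weighted average bit-rate of (rate_mbps, duration_s) segments."""
    return sum(r * d for r, d in segments) / sum(d for _, d in segments)

R1, R4 = 0.7, 6.0  # hypothetical lowest/highest rates (Mbps)

# Single up/down switch (Fig. 2): 5 s at R1, 5 s at R4, 5 s at R1.
single = [(R1, 5), (R4, 5), (R1, 5)]

# Three up/down switches (Fig. 3, left): total time at R4 kept at 5 s
# so that the average matches the single-switch pattern.
multiple = [(R1, 2.5), (R4, 5 / 3)] * 3 + [(R1, 2.5)]

assert abs(average_rate(single) - average_rate(multiple)) < 1e-9
```

Holding the average rate fixed in this way is what lets the study attribute DMOS differences to the switching pattern rather than to the total number of bits delivered.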

In summary, the LIVE Mobile VQA database consists of 10 reference videos and 200 distorted videos (4 compression + 4 wireless packet-loss + 4 frame-freezes + 3 rate-adapted + 5 temporal dynamics per reference), each of resolution 1280×720 at a frame rate of 30 fps, and of duration 15 seconds.

B. Test Methodology

A single-stimulus continuous quality evaluation (SSCQE) study [16] with hidden reference [4], [14], [17] was conducted over a period of three weeks in the LIVE subjective testing lab at The University of Texas at Austin, where the subjects viewed the videos on the Motorola Atrix, which has a 4” screen with a resolution of 960×540, using software that was specially created for the Android platform to display the videos.

The subjective study was conducted at The University of Texas at Austin (UT) and involved mostly naive undergraduate students whose ages were between 22 and 28 years. Following our philosophy of using a reasonably representative sampling of the visual population, no vision test was performed, although a verbal confirmation of soundness of (corrected) vision was obtained from each subject. Each subject attended two separate sessions as part of the study, each lasting less than 30 minutes and consisting of the subject viewing 55 randomized videos (50 distorted + 5 reference); a short training set (6 videos) preceded the actual study.

The videos were displayed at the center of the screen with an uncalibrated continuous bar at the bottom, which was controlled using the touchscreen. The subjects were asked to rate the videos as a function of time, i.e., provide instantaneous ratings of the videos, as well as to provide an overall rating at the end of each video. At the end of each video, a similar continuous bar was displayed on the screen, although it was calibrated by markings “Bad”, “Fair”, and “Excellent”, equally spaced across the bar. Once the quality was entered, the subject was not allowed to change the score. The quality ratings were in the range 0-5.

A total of thirty-six subjects participated in the mobile study, and the design was such that each video received 18 subjective ratings. The instructions provided to the subjects are reproduced in the Appendix. The subject rejection procedure in [16] was used to reject two subjects from the mobile study, and the remaining scores were averaged to form a Differential Mean Opinion Score (DMOS) for each video [4], which is representative of the perceived quality of the video. DMOS was computed only from the overall scores that the subjects assigned to the videos. The average standard error in the DMOS scores was 0.2577 across the 200 distorted videos. For all further analysis, we assume that the scores for each video sample a Gaussian distribution centered at its DMOS, with a standard deviation computed from the differential opinion scores across subjects.
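The DMOS computation can be sketched compactly. This is an illustrative reconstruction only: the differencing of hidden-reference and distorted-video scores follows the standard definition in [4], but the subject-rejection step of [16] is not reproduced, and the example scores are invented:

```python
from math import sqrt
from statistics import mean, stdev

def dmos(ref_scores, dist_scores):
    """Per-video DMOS: mean over subjects of (hidden-reference score -
    distorted-video score), plus the standard error of that mean.
    Higher DMOS indicates worse perceived quality."""
    diffs = [r - d for r, d in zip(ref_scores, dist_scores)]
    return mean(diffs), stdev(diffs) / sqrt(len(diffs))

# Hypothetical 0-5 ratings from five subjects on one reference/distorted pair:
score, std_err = dmos([4.8, 4.5, 5.0, 4.7, 4.9], [3.1, 2.9, 3.4, 3.0, 3.2])
```

The standard error returned here is the per-video quantity whose average across the 200 distorted videos is reported above as 0.2577.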

C. Evaluation of Subjective Opinion

For each of the temporal distortion classes, we conducted a t-test between the Gaussian distributions centered at the DMOS values (and having an associated, known standard deviation) of the conditions we are interested in comparing, at the 95% confidence level. Since the conditions being compared are functions of content, we compared each of the 10 reference contents separately for each pair of conditions. In the tables that follow, a value of ‘1’ indicates that the row-condition is statistically superior to the column-condition, a ‘0’ indicates that the row is worse than the column, and a ‘-’ indicates no statistically significant difference.

The results from the statistical analysis are tabulated in Tables I–V. Due to the dense nature of the content, we summarize the results in the following paragraphs. Note that the text provides only a high-level description of the results; the reader is advised to study the tables thoroughly in order to better understand the results.

a) Compression (Table I): This table confirms that the distorted videos were perceptually separable. Notice that each compression rate is statistically better (perceptually) than the next lower rate over all content used in the study.

b) Rate Adaptation (Tables II, III): Our results indicate that it is preferable to switch from a low rate to a higher one and back, if the duration at the higher rate is at least half as much as the duration at the lower rate. This is contrary to the common wisdom that people prefer not to see fluctuations in video quality, given the alternative of staying at the lower rate. Further, if the rates are perceptually separated (as our rates are), a change in the lowest rate has a definite impact on the visual quality.

c) Temporal Dynamics (Tables IV, V): An interesting observation from the results is that users prefer multiple rate switches over fewer switches. Again, while this may be contrary to conventional wisdom, there seems to be a plausible explanation for such behavior. When shown a high quality segment of a video for a long duration, the subject acclimatizes to the viewing quality, raising the bar for acceptance, so that when the high quality segments are followed by long low quality segments, she/he assigns a higher penalty than on videos which contain short segments of higher quality. A long low quality segment preceded by a long high quality one evokes a negative response; videos with frequent switches to higher quality, by contrast, may be seen as attempts to improve the viewing experience, thereby boosting the overall perception of quality. We note that our results are conditioned on the degree of separation between the quality levels as well as the duration of each segment, and may not generalize to other switches between quality levels at lower separation and with faster/slower segment durations.

The tables also indicate that switching to an intermediate rate before switching to a higher/lower rate is preferable – ‘easing’ a user into the new quality level is seemingly always better than simply jumping to the final quality level. It is also almost always true that the intermediate level should be closer to the highest quality level in the switch. Finally, the results also indicate that the quality of the end-segment has a definite impact on the overall perception, and ending on a higher quality segment is almost always preferable.


     | R1                  | R2                  | R3                  | R4
R1   | - - - - - - - - - - | 0 0 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 0 0
R2   | 1 1 1 1 1 1 1 1 1 1 | - - - - - - - - - - | 0 0 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 0 0
R3   | 1 1 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 1 1 | - - - - - - - - - - | 0 0 0 0 0 0 0 0 0 0
R4   | 1 1 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 1 1 | - - - - - - - - - -

TABLE I: Results of t-test between the various compression rates simulated in the study. Each sub-entry in each row/column corresponds to the 10 reference videos in the study.

           | R1−R4−R1            | R2−R4−R2            | R3−R4−R3
R1−R4−R1   | - - - - - - - - - - | 0 0 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 0 0
R2−R4−R2   | 1 1 1 1 1 1 1 1 1 1 | - - - - - - - - - - | 0 0 0 0 0 0 0 0 0 0
R3−R4−R3   | 1 1 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 1 1 | - - - - - - - - - -

TABLE II: Results of t-test between the various rate-adapted distorted videos simulated in the study. Each sub-entry in each row/column corresponds to the 10 reference videos in the study.

           | R1                  | R2                  | R3                  | R4
R1−R4−R1   | 1 1 1 1 1 1 1 1 1 1 | 0 0 0 - 0 1 0 0 0 0 | 0 0 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 0 0
R2−R4−R2   | 1 1 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 1 1 | 0 0 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 0 0
R3−R4−R3   | 1 1 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 1 1 | 1 1 1 - 0 1 - 0 1 0 | 0 0 0 0 0 0 0 0 0 0

TABLE III: Results of t-test between the various compression rates and the rate-adapted videos simulated in the study. Each sub-entry in each row/column corresponds to the 10 reference videos in the study.

                 | R1−R4−R1            | R1−R4−R1−R4−R1
R1−R4−R1         | - - - - - - - - - - | 0 - - - 0 0 0 0 1 -
R1−R4−R1−R4−R1   | 1 - - - 1 1 1 1 0 - | - - - - - - - - - -

TABLE IV: Results of t-test between multiple rate switches and a single rate switch. Each sub-entry in each row/column corresponds to the 10 reference videos in the study.

III. DISCUSSION AND CONCLUSION

We described a new resource – the LIVE Mobile VQA database consisting of HD videos, which incorporates a wide variety of distortions, along with the associated subjective opinion scores on visual quality. The distortions simulated include the previously studied uniform compression and wireless packet loss, as well as novel dynamically-varying distortions. The large size of the study and the variety that it offers allow one to study and analyze human reactions to temporally varying distortions, as well as to varying form factors, from a wide variety of perspectives.

It is obvious from the foregoing analysis that time-varying quality has a definite and quantifiable impact on human subjective opinion, and this opinion is a function of the duration of the changes, the quality levels and the order in which the variations in quality occur. Our results sometimes contradict popularly held beliefs about video quality; however, these contradictions suggest provocative new conclusions. Humans are seemingly not as unforgiving as one may believe, and appear to reward attempts to improve quality. Rapid changes in quality levels as a function of time are perhaps perceived by humans as attempts to provide better quality, and hence these kinds of temporal distortions yield higher quality scores compared with conditions in which long segments of low quality follow long segments of high quality. Due to limitations on the study session durations, we were unable to include several other interesting conditions, including a greater number of rate-changes, multiple rate changes between different quality levels, a single change with a high quality segment at the end (e.g., R4−R1−R4) and so on. Future work will address these relevant scenarios to better understand human perception of visual quality.

In this article, we have only summarized a relevant portion of the database. We detail the entire database and the analysis of all of the distortions, including an analysis of the temporal traces of subjective quality and of objective algorithm performance, in [18]. The cited article also contains a description and analysis of the same study conducted on a tablet screen, and comparisons between subjective opinions as a function of the display device.

We hope that the new LIVE Mobile VQA database of 200 distorted videos and associated human opinion scores from over 50 subjects will provide fertile ground for years of future research. Given the sheer quantity of data, we believe that our foregoing analysis (and that detailed in [18]) is the tip of the iceberg of discovery. We invite further analysis of the data towards understanding and producing better models of human behavior when viewing videos on mobile platforms.

                 | R1−R4−R1−R4−R1      | R1−R2−R4            | R4−R2−R1            | R1−R3−R4            | R4−R3−R1
R1−R4−R1−R4−R1   | - - - - - - - - - - | - 0 0 0 1 1 0 0 0 0 | 1 1 - 1 1 1 1 1 1 1 | 0 0 0 0 - 0 0 0 0 0 | 1 1 0 0 1 1 1 1 1 1
R1−R2−R4         | - 1 1 1 0 0 1 1 1 1 | - - - - - - - - - - | 1 1 1 1 1 1 1 1 1 1 | 0 0 - 0 0 0 - 0 0 0 | 1 1 1 - 1 1 1 1 1 1
R4−R2−R1         | 0 0 - 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 0 0 | - - - - - - - - - - | 0 0 0 0 0 0 0 0 0 0 | 1 - 0 0 0 0 - 0 1 0
R1−R3−R4         | 1 1 1 1 - 1 1 1 1 1 | 1 1 - 1 1 1 - 1 1 1 | 1 1 1 1 1 1 1 1 1 1 | - - - - - - - - - - | 1 1 1 1 1 1 1 1 1 1
R4−R3−R1         | 0 0 1 1 0 0 0 0 0 0 | 0 0 0 - 0 0 0 0 0 0 | 0 - 1 1 1 1 - 1 0 1 | 0 0 0 0 0 0 0 0 0 0 | - - - - - - - - - -

TABLE V: Results of t-test between the various temporal-dynamics distorted videos simulated in the study. Each sub-entry in each row/column corresponds to the 10 reference videos in the study.

APPENDIX
INSTRUCTIONS TO THE SUBJECT

You are taking part in a study to assess the quality of videos. You will be shown a video at the center of your screen and there will be a rating bar at the bottom, which can be controlled by using your fingers on the touchscreen. You are to provide the quality as a function of time – i.e., move the rating bar in real-time based on your instantaneous perception of quality. The extreme left of the bar is bad quality and the extreme right is excellent quality. At the end of the video you will be presented with a similar bar, this time calibrated as ‘Bad’, ‘Poor’ and ‘Excellent’, from left to right. Using this bar, provide us with your opinion of the overall quality of the video. There is no right or wrong answer; we simply wish to gauge your opinion on the quality of the video that is shown to you.

ACKNOWLEDGMENT

This research was supported by the National Science Foundation under grant CCF-0728748 and by Intel and Cisco Corporation under the VAWN program.

REFERENCES

[1] Cisco Corp., “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2010–2015,” http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-520862.html, 2011.

[2] PCWorld, “FCC warns of impending wireless spectrum shortage,” http://www.pcworld.com/article/186434/fcc_warns_of_impending_wireless_spectrum_shortage.html, 2010.

[3] S. Higginbotham, “Spectrum shortage will strike in 2013,” http://gigaom.com/2010/02/17/analyst-spectrum-shortage-will-strike-in-2013/, 2010.

[4] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, and L. K. Cormack, “Study of subjective and objective quality assessment of video,” IEEE Transactions on Image Processing, vol. 19, no. 2, pp. 1427–1441, 2010.

[5] Video Quality Experts Group (VQEG), “Final report from the video quality experts group on the validation of objective quality metrics for video quality assessment, Phase II,” http://www.its.bldrdoc.gov/vqeg/projects/frtv_phaseII, 2003.

[6] ——, “Final report from the video quality experts group on the validation of objective quality metrics for video quality assessment, Phase I,” http://www.its.bldrdoc.gov/vqeg/projects/frtv_phaseI, 2000.

[7] ——, “Final report of video quality experts group multimedia Phase I validation test, TD 923, ITU Study Group 9,” 2008.

[8] A. Eichhorn and P. Ni, “Pick your layers wisely – a quality assessment of H.264 scalable video coding for mobile devices,” Proceedings of the 2009 IEEE International Conference on Communications, pp. 5446–5451, 2009.

[9] H. Knoche, J. McCarthy, and M. Sasse, “Can small be beautiful?: Assessing image resolution requirements for mobile TV,” Proceedings of the 13th Annual ACM International Conference on Multimedia, pp. 829–838, 2005.

[10] S. Jumisko-Pyykko and J. Hakkinen, “Evaluation of subjective video quality of mobile devices,” in Proceedings of the 13th Annual ACM International Conference on Multimedia. ACM, 2005, pp. 535–538.

[11] M. Ries, O. Nemethova, and M. Rupp, “Performance evaluation of mobile video quality estimators,” in Proceedings of the European Signal Processing Conference, Poznan, Poland, 2007.

[12] S. Jumisko-Pyykko and M. Hannuksela, “Does context matter in quality evaluation of mobile television?” in Proceedings of the 10th International Conference on Human Computer Interaction with Mobile Devices and Services. ACM, 2008, pp. 63–72.

[13] S. Winkler and F. Dufaux, “Video quality evaluation for mobile applications,” in Proc. SPIE Conference on Visual Communications and Image Processing, Lugano, Switzerland, vol. 5150, 2003, pp. 593–603.

[14] A. K. Moorthy, K. Seshadrinathan, R. Soundararajan, and A. C. Bovik, “Wireless video quality assessment: A study of subjective scores and objective algorithms,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 4, pp. 513–516, April 2010.

[15] B. Wandell, Foundations of Vision. Sinauer Associates, 1995.

[16] BT.500-11: Methodology for the Subjective Assessment of the Quality of Television Pictures, International Telecommunication Union Std.

[17] M. H. Pinson and S. Wolf, “Comparing subjective video quality testing methodologies,” Visual Communications and Image Processing, SPIE, vol. 5150, 2003.

[18] A. K. Moorthy, L. K. Choi, A. C. Bovik, and G. de Veciana, “Video quality assessment on mobile devices: Subjective, behavioral and objective studies,” IEEE Journal of Selected Topics in Signal Processing, Special Issue on New Subjective and Objective Methodologies for Audio and Visual Signal Processing, 2011 (submitted).