


AttentiveVideo: Quantifying Emotional Responses to Mobile Video Advertisements

Phuong Pham, Jingtao Wang Computer Science and LRDC, University of Pittsburgh, PA, USA

{phuongpham, jingtaow}@cs.pitt.edu

ABSTRACT

This demo presents AttentiveVideo, a multi-modal video player that can collect and infer viewers’ emotional responses to video advertisements on unmodified smartphones. When a subsidized video advertisement is playing, AttentiveVideo uses on-lens finger gestures for tangible video control and employs implicit photoplethysmography (PPG) sensing to infer viewers’ attention, engagement, and sentimentality toward advertisements. Through a 24-participant pilot study, we found that AttentiveVideo is easy to learn and intuitive to use. More importantly, AttentiveVideo achieved good accuracy on a wide range of emotional measures (best average accuracy = 65.9%, kappa = 0.30 across 9 metrics). Our preliminary results show the potential for both low-cost collection and deep understanding of emotional responses to mobile video advertisements.

CCS Concepts
• Human-centered computing ➝ Ubiquitous and mobile computing ➝ Ubiquitous and mobile computing systems and tools

Keywords Heart Rate; Computational Advertisement; Physiological Signal; Affective Computing; Mobile Interfaces.

1. INTRODUCTION
In 2015, U.S. online digital video advertising revenues reached $4.2 billion, with mobile advertising growing over 66% compared to 23% growth for the industry as a whole [2]. Despite this rapid growth and heavy spending, evaluating the quality of advertisements remains challenging. The efficacy of direct response advertising, e.g. persuading a prospective customer to purchase specific merchandise, can be quantified through metrics such as click-through rate (CTR), conversion rate (CVR), and cost per click (CPC). Measuring the effectiveness of branding advertising is much harder: because branding advertising is intended to increase customers’ awareness, trust, and sometimes loyalty toward a brand, there are few short-term user behaviors that can be observed and analyzed. Traditional approaches such as self-report data, focus groups, and behavior analysis are not scalable because of their cost, the time they consume, and the inherent ambiguity in reporting viewers’ subjective feelings [1]. Although autonomic feedback channels such as facial expressions [4] and physiological signals [3] have been used as supplemental indicators of advertisement quality, these approaches require either dedicated sensors [3] or PCs connected to high-speed internet [4].

In this demo, we present AttentiveVideo, a multi-modal video player that collects and infers viewers’ emotional responses to mobile advertisements in an automatic and scalable way on unmodified smartphones. AttentiveVideo provides a dual video control interface (Figure 1). For regular video materials (e.g. movies and TV shows), AttentiveVideo uses touch widgets for video playback (Figure 1, top). When a subsidized video advertisement is playing, AttentiveVideo employs on-lens finger gestures for tangible video control, i.e. covering the camera lens plays the ad and uncovering the lens pauses it (Figure 1, bottom). During this process, AttentiveVideo also implicitly extracts viewers’ photoplethysmography (PPG) waveforms by monitoring changes in fingertip transparency through the back camera. We show that 9 dimensions of viewers’ emotional responses to video advertisements can be inferred via this implicit PPG sensing. With insights from large-scale, implicitly collected emotional responses, advertisers can improve the quality, effectiveness, and relevancy of commercial ads. Furthermore, AttentiveVideo also benefits viewers by letting them enjoy higher-quality video content for free via sponsored advertisements.
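The paper does not spell out the PPG extraction step, but camera-based PPG is commonly recovered as follows: average the red channel of each frame (blood volume changes modulate how much of the flash’s light passes through the fingertip), then remove slow baseline drift. The sketch below assumes frames arrive as RGB numpy arrays; the one-second moving-average filter is our illustrative choice, not the authors’ documented pipeline.

```python
import numpy as np

def extract_ppg(frames, fps=30):
    """Reduce each camera frame to one sample of a PPG waveform.

    frames: iterable of HxWx3 uint8 RGB arrays from the back camera.
    Returns a 1-D array (one sample per frame) with baseline drift
    removed. Filter design here is an assumption for illustration.
    """
    # Mean red-channel intensity tracks light transmitted through the skin.
    signal = np.array([frame[:, :, 0].mean() for frame in frames])
    # High-pass via a ~1 s moving-average subtraction to remove drift
    # from finger pressure changes and camera auto-exposure.
    window = int(fps)
    baseline = np.convolve(signal, np.ones(window) / window, mode="same")
    return signal - baseline
```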

Figure 1: AttentiveVideo’s dual video control interface (top: touch widgets for regular video, with tap to play/pause and swipe to adjust the seek bar; bottom: on-lens finger gestures for sponsored advertisements, with cover-the-lens to play and uncover to pause)

2. DESIGN OF ATTENTIVEVIDEO
AttentiveVideo has two unique features when compared with existing video playback apps on mobile devices: 1) a dual video control interface, and 2) algorithms for automatic inference of emotional responses.


Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s). ICMI '16, November 12-16, 2016, Tokyo, Japan ACM 978-1-4503-4556-9/16/11. DOI: http://dx.doi.org/10.1145/2993148.2998533




2.1 Dual Video Controls
AttentiveVideo extends the Static LensGesture algorithm [7] to detect on-lens finger covering actions (Figure 1, bottom). This algorithm achieved an accuracy of 99.59% for video control [6].
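The calibrated detector from [7] is not reproduced here, but a minimal sketch conveys the idea: a fingertip pressed against the lens produces a nearly uniform frame, so very low pixel variance is a usable covered-lens cue. The variance threshold and the `player` object below are hypothetical placeholders.

```python
import numpy as np

def lens_is_covered(rgb_frame, var_threshold=200.0):
    """Classify one preview frame as covered or uncovered.

    A fingertip on the lens yields a nearly uniform frame (low pixel
    variance, strongly red-tinted when the flash LED is on); an open
    lens shows a textured scene with far higher variance. The threshold
    is illustrative; Static LensGesture [7] calibrates its detector
    empirically.
    """
    gray = rgb_frame.mean(axis=2)  # collapse RGB to intensity
    return float(gray.var()) < var_threshold

def on_preview_frame(rgb_frame, player):
    # Cover the lens = play the ad; uncover the lens = pause (Figure 1).
    if lens_is_covered(rgb_frame):
        player.play()
    else:
        player.pause()
```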

The tangible video control channel in AttentiveVideo has at least three advantages in the context of subsidized mobile advertising: 1) this mechanism makes it harder for a viewer to skip the sponsored advertisements, since only live body parts (e.g. fingers or earlobes) can enable video playback. Paradoxically, making the advertisement hard to skip benefits both advertisers and viewers: richer feedback on viewers’ attention, engagement, and sentimentality gives the advertiser a deeper understanding of viewers’ emotional responses to a specific advertisement, and with the increased financial support from advertisers, viewers can enjoy more high-quality video materials for free. More importantly, viewers always have the freedom to switch to a "pay-per-view" option if they are not interested in the sponsored ad; 2) it provides natural tactile feedback from the bezel of the back camera when a viewer holds the phone in landscape mode [6]; and 3) the cover-and-hold gesture allows AttentiveVideo to implicitly capture the viewer's physiological signals during ad watching (Figure 2).

Figure 2: Implicitly capturing viewers’ PPG signals (fingertip transparency plotted over time t)

2.2 Inference Algorithms
Compared with previous successes in mobile MOOC learning [5][6], inferring emotional responses in mobile advertising poses at least two unique challenges. First, a video ad (15-60 s) is much shorter than a typical MOOC video clip (3-30 min), demanding higher sensitivity from the inference algorithms. Second, advertisers care about viewer emotions elicited by ads, such as amusement and liking, while instructors pay more attention to learners' cognitive states during video watching, such as confusion and frustration.

To address the sensitivity challenge, AttentiveVideo extracts the 7 dimensions of time-domain HRV features used in AttentiveLearner [5][6] and replaces the 8th feature, pNN50, with pNN5, pNN10, and pNN20, yielding 10 HRV features per ad clip. For the same reason, i.e. the short duration of ad clips, AttentiveVideo does not use frequency-domain HRV features.
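This demo paper does not enumerate the 7 AttentiveLearner features, so the sketch below pairs the three pNNx replacements with a plausible set of standard time-domain HRV statistics (mean IBI, SDNN, RMSSD, and similar); treat that selection, though not the pNNx definitions, as an assumption.

```python
import numpy as np

def pnnx(ibis_ms, x_ms):
    """Fraction of successive inter-beat-interval differences > x ms."""
    diffs = np.abs(np.diff(ibis_ms))
    return float(np.mean(diffs > x_ms))

def hrv_features(ibis_ms):
    """10 time-domain HRV features for one ad clip.

    ibis_ms: inter-beat intervals (ms) taken from peaks of the PPG
    waveform. The first seven entries are common time-domain HRV
    statistics standing in for the unlisted AttentiveLearner set.
    """
    ibis_ms = np.asarray(ibis_ms, dtype=float)
    diffs = np.diff(ibis_ms)
    return {
        "mean_ibi": np.mean(ibis_ms),
        "sdnn": np.std(ibis_ms),              # overall variability
        "rmssd": np.sqrt(np.mean(diffs ** 2)),  # beat-to-beat variability
        "sdsd": np.std(diffs),
        "mean_hr": 60000.0 / np.mean(ibis_ms),  # beats per minute
        "min_ibi": np.min(ibis_ms),
        "max_ibi": np.max(ibis_ms),
        # pNN50 is too coarse for 15-60 s clips; finer thresholds keep
        # the features sensitive on short ads.
        "pnn5": pnnx(ibis_ms, 5),
        "pnn10": pnnx(ibis_ms, 10),
        "pnn20": pnnx(ibis_ms, 20),
    }
```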

2.3 Quality Metrics and Model Training
We defined 9 dimensions of viewers' responses in three categories (i.e. attention, engagement, and sentiment) to quantify the quality of video advertisements (Table 1). We collected ground truth for 6 of these dimensions via questionnaires on a 7-point Likert scale (the Question column in Table 1). We used the Self-Assessment Manikin (SAM) to collect the two continuous affect dimensions, valence and arousal. The like dimension was collected via a ranked list.
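The paper reports classification accuracy and kappa but does not spell out how the questionnaire responses become class labels. A per-metric median split, as in the hypothetical helper below, is one common choice for studies of this kind.

```python
import numpy as np

def binarize_labels(scores):
    """Turn one metric's ratings into binary class labels.

    scores: ratings for one metric across all (viewer, ad) pairs.
    The labeling scheme is our assumption; a median split is a common
    default. Ties at the median fall into the negative class here,
    which can leave the classes mildly imbalanced.
    """
    scores = np.asarray(scores, dtype=float)
    return (scores > np.median(scores)).astype(int)
```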

We ran a 24-participant (13 females) pilot study to evaluate the feasibility and usability of AttentiveVideo. We designed a realistic scenario by letting users watch an episode of a popular TV series (The Big Bang Theory) with 12 ads in 3 advertising slots.

Table 1. Nine dimensions of prediction metrics

Category     Metric     Question
Attention    Attention  I paid sufficient attention to the ad
             Recall     I can recall major details in this ad
Engagement   Like       Please choose the 6 ads in this study that you liked best and rank them accordingly (1: most liked; 6: least liked)
             Rewatch    I’m interested in watching the ad again in the future
             Share      I found something special in the ad and want to share it with my friends
Sentiment    Touching   I found the ad touching
             Amusing    I found the ad amusing
             Valence    Self-Assessment Manikin
             Arousal    Self-Assessment Manikin

AttentiveVideo achieved its best overall accuracy (average accuracy = 65.9%, kappa = 0.30 across the 9 metrics) when using a Support Vector Machine (SVM) with RBF kernels.
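The paper names the classifier but not the training protocol. The scikit-learn sketch below shows one plausible shape of that step; the feature scaling, 5-fold cross-validation, and default C/gamma settings are our assumptions, not the authors' exact setup. Here X holds one 10-dimensional HRV feature vector per (viewer, ad) pair and y holds the binary labels for a single metric.

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def evaluate_metric(X, y):
    """Cross-validated accuracy for one of the 9 prediction metrics.

    Standardizing features before an RBF-kernel SVM is conventional;
    hyperparameters and the CV protocol are illustrative defaults.
    """
    model = make_pipeline(
        StandardScaler(),
        SVC(kernel="rbf", C=1.0, gamma="scale"),
    )
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
```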

3. CONCLUSION
We present AttentiveVideo, a multi-modal video player that collects and infers viewers' emotional responses to video advertisements on unmodified smartphones. AttentiveVideo can predict viewers' attention, engagement, and sentimentality toward advertisements along 9 dimensions via commodity-camera-based implicit PPG sensing. AttentiveVideo can help advertisers gain a richer, fine-grained understanding of users' emotional responses to video advertisements, and can help viewers enjoy more high-quality video materials for free via subsidized video ads. We are exploring the inclusion of a front-camera channel and automatic facial expression analysis (FEA) algorithms to augment the PPG sensing channel in AttentiveVideo.

4. REFERENCES
[1] Aaker, D. et al. Warmth in advertising: Measurement, impact, and sequence effects. Journal of Consumer Research, 1986.
[2] IAB News. http://www.iab.com/news/us-internet-ad-revenues-hit-landmark-59-6-billion-in-2015/
[3] Lang, A. Involuntary attention and physiological arousal evoked by structural features and emotional content in TV commercials. Communication Research, 17(3), 1990.
[4] McDuff, D. et al. Automatic measurement of ad preferences from facial responses gathered over the internet. Image and Vision Computing, 32(10), 2014.
[5] Pham, P. and Wang, J. AttentiveLearner: Improving Mobile MOOC Learning via Implicit Heart Rate Tracking. In Proc. AIED 2015.
[6] Xiao, X. and Wang, J. Towards Attentive, Bi-directional MOOC Learning on Mobile Devices. In Proc. ICMI 2015.
[7] Xiao, X. et al. LensGesture: Augmenting Mobile Interactions with Back-of-Device Finger Gestures. In Proc. ICMI 2013.

