Revealing the Hidden Patterns of News Photos: Analysis of Millions of News Photos through GDELT and Deep Learning-based Vision APIs

Haewoon Kwak, Jisun An

Qatar Computing Research Institute, Hamad Bin Khalifa University

Upload: haewoon-kwak

Post on 14-Apr-2017


TRANSCRIPT

Page 1: Revealing the Hidden Patterns of News Photos: Analysis of Millions of News Photos through GDELT and Deep Learning-based Vision APIs

Revealing the Hidden Patterns of News Photos: Analysis of Millions of News Photos through GDELT and Deep Learning-based Vision APIs

Haewoon Kwak Jisun An

Qatar Computing Research Institute, Hamad Bin Khalifa University


Roles of News Photos

● Influence people’s perception
● Enhance readers’ memory
● Deliver emotion that is otherwise hard to convey

Why was this photo not picked? (source: https://www.donaldjtrump.com)

To Understand Messages of Photos

● We need to know
  ○ What is shown in the photos
  ○ How it is portrayed

Challenges in News Photo Analysis

● Text mining has been a useful tool for analyzing news text

→ What is the appropriate tool for examining news photos?

Conventional Tool for Photo Analysis

● Manual coding … hard to scale


Deep Learning for Image Recognition


Object Recognition


Emotion Detection

https://www.microsoft.com/cognitive-services/en-us/emotion-api

Deep learning enables us to study news photos at large scale

Goal of This Work

● To offer a general understanding of news photos
  ○ What is shown in the photos?
  ○ How are people portrayed?
    ■ From the perspective of emotion
    ■ From the perspective of gender

● Case study: Portrayal of politicians

Data Collection

● We can crawl photos from news websites and analyze them

● But setting up a deep learning framework and training it takes time/money/...

GDELT Visual GKG (VGKG)

● Collects news articles from around the world
● Extracts photos from the articles
● Calls the Google Cloud Vision API to analyze the photos

● VGKG has been available since 1 Jan 2016
http://blog.gdeltproject.org/announcing-the-new-gdelt-visual-global-knowledge-graph-vgkg/

Example of VGKG

20160101004500 http://www.bbc.co.uk/news/uk-35205943 http://ichef.bbci.co.uk/news/1024/cpsprodpb/B89F/production/_87436274_87436273.jpg profession<FIELD>0.95780987<FIELD>/m/063km<RECORD>person<FIELD>0.85714287<FIELD>/m/01g317<RECORD>close up<FIELD>0.82379222<FIELD>/m/02cqfm<RECORD>bishop<FIELD>0.78259438<FIELD>/m/01b7b<RECORD>bishop<FIELD>0.71334475<FIELD>/m/027k49j<RECORD>diocesan bishop<FIELD>0.64282793<FIELD>/m/09sgrf<RECORD>auxiliary bishop<FIELD>0.57118613<FIELD>/m/05mx3n<RECORD>clergy<FIELD>0.57113737<FIELD>/m/0db79 -2<FIELD>-2<FIELD>-2<FIELD>-2 0.95443642<FIELD>3.199043<FIELD>12.419704<FIELD>-7.179338<FIELD>0.621747<FIELD>433,215;575,215;575,357;433,357<FIELD>-2<FIELD>-2<FIELD>-2<FIELD>2<FIELD>-2<FIELD>-2<FIELD>-2


Date, Document identifier (URL), Image URL, Labels (description, confidence score, unique id), Geographic Landmarks, Logos, Safe Search, Faces (Angle, Emotion, etc.), OCR
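
A minimal Python sketch of how the labels field of a record like the one above might be parsed. It assumes only what the example shows: entries are separated by `<RECORD>`, and each entry holds a description, a confidence score, and a unique id separated by `<FIELD>`. The names `Label` and `parse_labels` are illustrative and are not part of GDELT.

```python
from typing import List, NamedTuple


class Label(NamedTuple):
    description: str
    confidence: float
    mid: str  # unique id, e.g. "/m/01g317"


def parse_labels(field: str) -> List[Label]:
    """Split a VGKG labels field into (description, confidence, id) entries."""
    labels = []
    for record in field.split("<RECORD>"):
        parts = record.split("<FIELD>")
        if len(parts) != 3:
            continue  # skip empty or malformed entries
        description, confidence, mid = parts
        labels.append(Label(description, float(confidence), mid))
    return labels


# First three labels copied from the BBC example record
sample = (
    "profession<FIELD>0.95780987<FIELD>/m/063km<RECORD>"
    "person<FIELD>0.85714287<FIELD>/m/01g317<RECORD>"
    "close up<FIELD>0.82379222<FIELD>/m/02cqfm"
)
for label in parse_labels(sample):
    print(label.description, label.confidence, label.mid)
```

The other fields (Safe Search, Faces, and so on) use the same `<FIELD>` delimiter in the example, but their layout is not spelled out on the slide, so they are left unparsed here.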


(Potential) Limitations of GDELT

● The list of news sources is not explicitly announced (and is growing), so coverage bias might exist

● Our work comparing GDELT with another news dataset will be presented in the poster session

Two Tales of the World: Comparison of Widely Used World News Datasets - GDELT and EventRegistry. Haewoon Kwak and Jisun An. ICWSM'16: The 10th International Conference on Web and Social Media (poster), 2016

Our Dataset - Full

● GKG and VGKG in January 2016
● Popularity measured by Alexa.com

Our Dataset - 7 Popular News Media

● Top 30 & > 1K records


Data Preprocessing

● Keep labels whose confidence score ≥ 0.8

http://i2.cdn.turner.com/cnnnext/dam/assets/160116174054-kerry-handshake-zarif-large-169.jpg

Person 0.84957772
Business 0.59667766
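
The filtering step can be sketched in a few lines of Python. The threshold and the two label scores come from the slide. The function name `keep_confident` and the list-of-tuples representation are assumptions made for illustration.

```python
THRESHOLD = 0.8  # cutoff stated on the slide


def keep_confident(labels, threshold=THRESHOLD):
    """Keep only (name, score) pairs whose score reaches the threshold."""
    return [(name, score) for name, score in labels if score >= threshold]


# Labels shown on the slide for the example photo
example_labels = [("Person", 0.84957772), ("Business", 0.59667766)]
kept = keep_confident(example_labels)
print(kept)  # [('Person', 0.84957772)]
```

For this photo only "Person" survives; "Business" is dropped because its score is below 0.8.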


What Is Shown in the Photos?
Common Objects in News Photos

News Topics and Relevant Photos

● News photos should relate to the topics of news articles
→ Common objects might differ across topics

● CNN has ‘section’ info. in its URL
http://edition.cnn.com/2016/04/07/travel/japan-best-of-wakayama/index.html
http://edition.cnn.com/2016/05/05/politics/paul-ryan-donald-trump-republican-resistance/index.html
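
A small Python sketch of how the section could be read from such a URL. It assumes the path layout seen in the two examples (year, month, day, section, slug) and returns `None` for anything else. The helper name `cnn_section` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def cnn_section(url: str) -> Optional[str]:
    """Return the section segment of a CNN article URL, or None if absent."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Assumed layout: YYYY / MM / DD / section / slug / index.html
    if len(segments) >= 4 and all(s.isdigit() for s in segments[:3]):
        return segments[3]
    return None


urls = [
    "http://edition.cnn.com/2016/04/07/travel/japan-best-of-wakayama/index.html",
    "http://edition.cnn.com/2016/05/05/politics/paul-ryan-donald-trump-republican-resistance/index.html",
]
for url in urls:
    print(cnn_section(url))
```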


Person is the Most Common Object


But, in Travel, Person is Uncommon


Region-related Sections


● Why does this matter?


Western Media and the Third World

● Golan reports that Western mass media reinforce a negative portrayal of the Third World by reporting war, poverty, famine, conflict, and violence, which leads to negative perceptions (Golan 2008).

How Does CNN Deal with the MENA Region?

How Are People Portrayed? From the Perspective of Emotion


Classification of Emotions

https://articulation360.wordpress.com/2011/08/26/emotions-memory-game/

Google API Can Detect 4 Emotions

JOY, SORROW, ANGER, SURPRISE

https://articulation360.wordpress.com/2011/08/26/emotions-memory-game/

Neutral (75%) or Joy (24%)

● Among 11,127 faces (in 7 popular media), 2,740 faces (24.6%) express one of the four emotions

● Most of them (2,665 faces) express joy
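
The percentages in the slide title follow from these counts. A short Python check, using only the three numbers given above (the helper `pct` is illustrative):

```python
total_faces = 11_127   # faces in the 7 popular media
with_emotion = 2_740   # faces with one of the detected emotions
joy = 2_665            # faces expressing joy


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(with_emotion, total_faces))                # 24.6
print(pct(joy, total_faces))                         # 24.0
print(pct(total_faces - with_emotion, total_faces))  # 75.4
print(pct(joy, with_emotion))                        # 97.3
```

So about 75% of faces show none of the four emotions (neutral), about 24% show joy, and joy accounts for roughly 97% of the faces with any detected emotion.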


Nonverbal & Verbal Communication

● Happy faces accelerate the cognitive processing of positive words and slow down that of negative words (Stenberg, Wiking, and Dahl 1998)


We Use Microsoft Face API

● Measures smiling intensity (0.0 to 1.0)

Example smile scores: 0.998, 0.0 (baby)

https://www.microsoft.com/cognitive-services/en-us/face-api

Smile Comes with Positive Text

● Positive correlation between smile intensity and tone (sentiment) of the text

ρ = 0.225
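
A minimal sketch of how such a coefficient could be computed, assuming ρ denotes Spearman's rank correlation (the slide does not name the coefficient). The functions are written in plain Python for illustration, and the sample values are made up; they are not data from the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values for illustration only
smile = [0.0, 0.1, 0.4, 0.7, 0.998]  # smile intensity per photo
tone = [-2.5, -3.0, 0.5, 0.2, 3.1]   # tone of the accompanying text
print(round(spearman(smile, tone), 3))
```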


How Are People Portrayed? From the Perspective of Gender


Previous Studies on News Media

1. Men outnumber women
2. Men and women are associated with particular roles
3. More women than men were depicted as happy and calm

→ We’ll verify these at large scale

Again, We Use Microsoft Face API

● Measures gender and age

https://www.microsoft.com/cognitive-services/en-us/face-api

Unequal Gender Representation


Stereotyping: Women in “Living”


Women Smile More Than Men


Younger Women, Older Men


Case Study: Portrayal of Politicians

Smiling Politicians

● Goodnow (2010) found that Obama smiles more than Clinton in photos in Time magazine

● A smile gives a positive, non-threatening impression to viewers (Goffman 1979)

Bias of CNN Toward Sanders?

(Smiling faces / All faces)

* CNN even uses “Sorrow” faces for Sanders
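
The ratio (smiling faces / all faces) can be sketched in Python as follows. How a face is judged to be smiling is not stated on the slide, so the boolean flag is an assumption, and the candidate names and observations are made up for illustration.

```python
from collections import defaultdict


def smiling_ratio(faces):
    """faces: iterable of (person, is_smiling) pairs -> smiling ratio per person."""
    smiling = defaultdict(int)
    total = defaultdict(int)
    for person, is_smiling in faces:
        total[person] += 1
        if is_smiling:
            smiling[person] += 1
    return {person: smiling[person] / total[person] for person in total}


# Made-up observations for one outlet, not data from the study
faces = [
    ("Candidate A", True), ("Candidate A", True), ("Candidate A", False),
    ("Candidate B", True), ("Candidate B", False),
    ("Candidate B", False), ("Candidate B", False),
]
ratios = smiling_ratio(faces)
for person, ratio in ratios.items():
    print(person, round(ratio, 2))
```

Computing this ratio separately for each outlet makes the outlets directly comparable even when they publish different numbers of photos.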


Pro-Clinton Media Behave Similarly


Summary and Future Work


Key Findings

● What is shown in the news photos
  ○ People commonly appear (≥ 40.5% @top500)

● How they are portrayed
  ○ People are neutral (75%) or smiling (24%)
  ○ Gender representation is unequal
  ○ Gender role stereotyping is found
  ○ Women smile more and look younger than men

● Clinton smiles more than Sanders in some media

→ We demonstrate the great potential of deep learning for computational journalism

Deeper Analysis on Text and Photos

● Headline and photos?
● Topic and photos?
● Keywords and photos?

Building PhotoBiasMeter.org

● Showing the preference of media outlets toward candidates over time

● Challenges
  ○ Modeling the complex dimensions of preference - “Smile” is only one dimension

@haewoon

Full paper is available via http://arxiv.org/abs/1603.04531