thesis_presentation_v1 shorter split

Identifying Single and Stacked News Triangles in Online News Articles - an Analysis of 31 Danish Online News Articles Annotated by 68 Journalists

By Miklas Njor [email protected]

Master Thesis Project, 15 ECTS, DA613A, Spring 2015

Supervisor: Daniel Spikol

Examiner: Bengt Nilsson

Link to data: http://figshare.com/account/projects/4414 and http://plot.ly (see thesis for direct links)

Upload: miklas-njor

Post on 14-Apr-2017



Presentation Outline

• Introduction
• Research Questions
• Methodology (what was the set-up)
• Results
  • Identifying the presence of Stacked News Triangles
  • Named Entities' influence on the presence of series of Stacked News Triangles
  • Named Entities variance per Category
• Conclusion and Summary


https://www.flickr.com/photos/boston_public_library/6801377949/

Intro: The Problem


https://www.flickr.com/photos/greenwood100/7314522860/

Intro: The News Triangle

“News stories should flow logically from the first paragraph. […] One way of looking at it is through the News Triangle or inverted pyramid. Generations of journalists have been brought up on this.” - Sissons

Intro: Stacked News Triangles

News Triangle for print news

News Triangles for online news

Headline + Sub-headline Intro Body element 1…

Research Questions

RQ 1: To what extent do online news articles follow the idiom of many News Triangles, instead of only one News Triangle, where information is distributed at the beginning of the text? I.e., do the keyword candidates appear less frequently the further we move away from the start of each element block?

RQ 2: Given that much news concerns something that happened to someone somewhere, what influence do Named Entity keywords have on the presence of News Triangles?

RQ 3: Is there a distinct variance of Named Entity Type keywords (Persons, Places or Organisations) within the categories Culture, Domestic, Economy, and Sports?

Methodology - Collecting Annotations

68 journalists
8 articles each
Set of 31 articles

Categories: Culture (7), Domestic (14), Economy (6), Sports (4)

Methodology - Processing Annotations

Keyword Annotations per Category

Category and Annotation Type  Articles  Keywords per category  Avg. keywords per article (collectively)  Unique keywords per category  Avg. unique keywords per article (collectively)
Culture Keywords              7         993                    141.86                                    365                           52.14
Domestic Keywords             14        2510                   179.29                                    703                           50.21
Economy Keywords              6         755                    125.83                                    268                           44.67
Sports Keywords               4         691                    172.75                                    176                           44
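The per-article averages in the table can be checked by dividing each category total by its number of articles. The following is a minimal sketch of that arithmetic; the dictionary layout and the function name `per_article_averages` are illustrative choices, not taken from the thesis.

```python
# Totals read from the table above: (articles, keywords, unique keywords).
totals = {
    "Culture": (7, 993, 365),
    "Domestic": (14, 2510, 703),
    "Economy": (6, 755, 268),
    "Sports": (4, 691, 176),
}


def per_article_averages(articles, keywords, unique_keywords):
    """Return (avg keywords, avg unique keywords) per article, rounded to 2 decimals."""
    return (round(keywords / articles, 2), round(unique_keywords / articles, 2))


averages = {cat: per_article_averages(*vals) for cat, vals in totals.items()}

for cat, (avg_kw, avg_unique) in averages.items():
    print(f"{cat}: {avg_kw} keywords, {avg_unique} unique keywords per article")
```

Run as written, the computed values agree with the averages listed on the slide.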

Averages and Five Number Summary of Article Annotations

Categories  Average  Min  Max  Median  3rd median  1st median
Culture     15.57    13   19   16      16          16
Domestic    18.71    12   39   15.5    16          15
Economy     14.83    13   18   14.5    15          14
Sports      18.75    12   37   13      13          13

Article 22 - Keywords - 3rd quartile

KEYWORD              COUNT
dansk svømmeunion    13
svømning             9
trygfonden           9
drukneulykker        6
svømmeundervisning   6
svømme               4
børn                 4
tobias marling       4
rené højer           3
druknestatistik      2
statistik            2
folkeskolen          2
drukneulykke         2
skoler               2
yougov               2
skole                2
yougov-undersøgelse  2

Article 4 - Keywords - 3rd quartile

KEYWORD                   COUNT  KEYWORD                       COUNT
odsherred                 25     forældre                      9
tutoring                  21     birgitte henriksen            9
nordskolen                17     niveaudeling                  9
cooperative learning      15     elever                        8
manu sareen               14     kommuner                      8
undervisningsmetoder      12     lektier                       8
folkeskole                11     makkerlæsning                 7
undervisningsministeriet  11     trelærerordning               6
folkeskolen               11     matematik                     6
specialundervisning       10     nordvestsjælland              6
undervisning              10     undervisningsdifferentiering  6
ppr                       10     lektiehjælp                   6
peter holm                9
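One way such keyword tables could be produced is to count how many annotators chose each keyword and keep those whose count reaches the third quartile. The sketch below assumes that reading of the "3rd quartile" label; the slides do not state the exact selection rule. The function name, the one-vote-per-annotator rule, the lower-casing, and the sample annotations are all hypothetical.

```python
from collections import Counter
from statistics import quantiles


def keywords_at_or_above_q3(annotations):
    """Count each keyword across annotators and keep those whose count
    reaches the third quartile of all keyword counts.

    Assumption: each annotator contributes at most one vote per keyword,
    and keywords are compared after trimming and lower-casing.
    """
    counts = Counter(
        keyword
        for annotation in annotations
        for keyword in {kw.strip().lower() for kw in annotation}
    )
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    q3 = quantiles(counts.values(), n=4, method="inclusive")[2]
    return {kw: n for kw, n in counts.most_common() if n >= q3}


# Hypothetical annotations from four annotators of one article.
annotations = [
    ["svømning", "trygfonden", "børn"],
    ["svømning", "trygfonden"],
    ["svømning", "skole"],
    ["Svømning ", "yougov"],
]

selected = keywords_at_or_above_q3(annotations)
print(selected)
```

The inclusive quartile method is only one of several conventions; a different convention can move the threshold and change which keywords survive.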

Results - Keyword Distribution - 1

News Triangles for online news

Headline + Sub-headline Intro Body element 1…
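RQ 1 asks whether keyword candidates appear less frequently further from the start of each element block, and the closing slides mention partitions of the text and the ascent or descent of a linear fit. A minimal sketch of that idea, assuming equal-sized partitions and an ordinary least-squares line, might look as follows. The function names, the default of four partitions, and the sample block are hypothetical and not taken from the thesis.

```python
def partition_counts(tokens, keywords, partitions=4):
    """Split tokens into consecutive partitions of equal size (the last one
    may be shorter) and count keyword hits in each partition."""
    size = max(1, -(-len(tokens) // partitions))  # ceiling division
    chunks = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    return [sum(1 for token in chunk if token in keywords) for chunk in chunks]


def slope(ys):
    """Least-squares slope of ys against the positions 0, 1, 2, ...
    Requires at least two values."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def looks_like_news_triangle(tokens, keywords, partitions=4):
    """Assumption: a descending fit (negative slope) signals a News Triangle."""
    counts = partition_counts(tokens, keywords, partitions)
    return len(counts) > 1 and slope(counts) < 0


# Hypothetical element block and keyword set.
block = ("odsherred tutoring nordskolen odsherred indfører tutoring "
         "i skolen og det er nyt for alle").split()
keywords = {"odsherred", "tutoring", "nordskolen"}

print(partition_counts(block, keywords))
print(looks_like_news_triangle(block, keywords))
```

Applying the same check to the intro and to each body section separately would give one flag per element block, which is the shape of the per-section results reported next.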


Results - Keyword Distribution - 2

Category  Article ID  Overall News Triangle  Intro  Section 1  Section 2  Section 3  Section 4  Section 5
Culture   Article 16  1                      0
Culture   Article 17  1                      1      0          1
Culture   Article 18  1                      1
Culture   Article 22  1                      0
Culture   Article 23  1                      1
Culture   Article 27  1                      0      1          0          1          1
Culture   Article 28  1                      1      1
Domestic  Article 03  1                      1      1
Domestic  Article 04  1                      1      1          1          1
Domestic  Article 05  1                      1      1
Domestic  Article 07  1                      0      1          1
Domestic  Article 08  1                      1
Domestic  Article 13  1                      1      1          1          1          1
Domestic  Article 19  1                      1      1
Domestic  Article 20  1                      1      1          0
Domestic  Article 21  1                      1      0          0          1          0          0
Domestic  Article 24  1                      0      1          1          1          0
Domestic  Article 25  1                      0      1          0          1
Domestic  Article 26  1                      1
Domestic  Article 29  1                      1      1          0
Domestic  Article 30  1                      0
Economy   Article 01  1                      1      0          1          1
Economy   Article 02  1                      0      0          1          1
Economy   Article 06  1                      1      0          1          0
Economy   Article 09  1                      1      1
Economy   Article 10  1                      1      1          1          1
Economy   Article 11  1                      1      1          0          1
Sports    Article 12  1                      1
Sports    Article 14  1                      0
Sports    Article 15  1                      1      1          1
Sports    Article 31  0                      1      0          1          1

How Many Have Stacked News Triangles?
Culture: 3 of 7
Domestic: 7 of 14
Economy: 2 of 5
Sports: 2 of 4
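The per-category counts appear consistent with a simple rule: an article counts as having Stacked News Triangles when every flag recorded for it is 1. That rule is an inference from the figures, not something the slides state. The sketch below applies it to the flags as listed in the table, in slide order; the function names are illustrative. Note that the slide reports "2 of 5" for Economy although six Economy articles are listed, so the sketch yields 2 of 6 there.

```python
# 1/0 flags per article, copied in the order they appear in the table
# (overall News Triangle first, then intro and sections).
flags = {
    "Culture": {
        16: [1, 0], 17: [1, 1, 0, 1], 18: [1, 1], 22: [1, 0],
        23: [1, 1], 27: [1, 0, 1, 0, 1, 1], 28: [1, 1, 1],
    },
    "Domestic": {
        3: [1, 1, 1], 4: [1, 1, 1, 1, 1], 5: [1, 1, 1], 7: [1, 0, 1, 1],
        8: [1, 1], 13: [1, 1, 1, 1, 1, 1], 19: [1, 1, 1], 20: [1, 1, 1, 0],
        21: [1, 1, 0, 0, 1, 0, 0], 24: [1, 0, 1, 1, 1, 0],
        25: [1, 0, 1, 0, 1], 26: [1, 1], 29: [1, 1, 1, 0], 30: [1, 0],
    },
    "Economy": {
        1: [1, 1, 0, 1, 1], 2: [1, 0, 0, 1, 1], 6: [1, 1, 0, 1, 0],
        9: [1, 1, 1], 10: [1, 1, 1, 1, 1], 11: [1, 1, 1, 0, 1],
    },
    "Sports": {
        12: [1, 1], 14: [1, 0], 15: [1, 1, 1, 1], 31: [0, 1, 0, 1, 1],
    },
}


def has_stacked_triangles(article_flags):
    """Assumed rule: every recorded flag for the article must be 1."""
    return all(flag == 1 for flag in article_flags)


def summarise(flags_by_category):
    """Return {category: (articles with stacked triangles, articles listed)}."""
    return {
        category: (
            sum(has_stacked_triangles(f) for f in articles.values()),
            len(articles),
        )
        for category, articles in flags_by_category.items()
    }


summary = summarise(flags)
for category, (stacked, total) in summary.items():
    print(f"{category}: {stacked} of {total}")
```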

Results - NE in articles

Results - NE variance in annotations


Summary/Discussion

RQ 1: To what extent do online news articles follow the idiom of many News Triangles, instead of only one News Triangle, where information is distributed at the beginning of the text? I.e., do the keyword candidates appear less frequently the further we move away from the start of each element block?

RQ 2: Given that much news concerns something that happened to someone somewhere, what influence do Named Entity keywords have on the presence of News Triangles?

RQ 3: Is there a distinct variance of Named Entity Type keywords (Persons, Places or Organisations) within the categories Culture, Domestic, Economy, and Sports?

Category How Many Articles Have Stacked News Triangles?

Culture 3 of 7 have Stacked News Triangles

Domestic 7 of 14 have Stacked News Triangles

Economy 2 of 5 have Stacked News Triangles

Sports 2 of 4 have Stacked News Triangles

FUTURE WORK
• Looking much closer at what causes the ascent or descent of the linear fit
• Whether a smaller or larger set of keywords is better
• A larger set of articles
• Named Entities in taxonomies

Summary/Discussion
• Including more keywords?
• Removing stop words?
• Decreasing or increasing the partitions of the text?

[email protected]