new age tools in data journalism - analytics & visualization

Upload: gramener

Post on 16-Apr-2017

Category:

Data & Analytics


New Age tools in Data Journalism

B Ganes Kesari, Co-founder, Head, Design & Analytics, Gramener

Journalism is undergoing a quantum leap powered by data and technology.

We're exploring a number of innovations in data-driven storytelling.

Templatised story formats. QuizFlicks, for example
Interactive story formats. Self-solving jigsaws, for example
Content re-purposing. BCG Grid, for example
Automated content generation
Archive monetization. DateFlicks, for example
Automating analysis
Automating narratives

Journalism has had some conventional silos: Content, Technology, Design, Data
...but they have started to crumble now

Why data journalism?
Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars. But now it's also going to be about poring over data and equipping yourself with the tools to analyze it and picking out what's interesting. - Tim Berners-Lee

Data journalism is [...] the convergence of a number of fields [...] - from investigative research and statistics to design and programming. - Paul Bradshaw

Visual Design, Investigative Research, Programming, Statistics

Data-driven journalism is a workflow that consists of digging deep into data by scraping, cleansing and structuring it, filtering by mining for specific information, visualizing it and making a story. - Mirko Lorenz

Key Challenges

Internal: Data literacy, Tech tools adoption, Discovery & re-use, Re-architecting the newsroom

Industry: Evolving consumers, Mobile first, Rethink monetization, Disaggregation of content

Digital strategies are evolving...

The NY Times Strategy
The New York Times is now as much a technology company as a journalism company. - Bill Keller, Executive Editor

The CNN Strategy
While the New York Times keeps pace with today's technological disruption by turning partly into a technology company itself, CNN tries a slightly different approach: close collaboration.

Bucking the trend
Others focused on cost. We increased newsroom size.
As stories were growing shorter, our stories grew in length.
Despite all advice, we erected a digital paywall.

There are very few companies with the luxury of focusing on serious journalism today. - Mark Thompson, CEO, The New York Times

BuzzFeed: Building The Next Media Giant
Build original, viral content
Native advertising
Blend content & advertising seamlessly

A creative idea plus a fresh network is the best way to go from zero to millions. - Jonah Peretti, CEO, BuzzFeed

There are many ways to aid consumption through data journalism
Show me what is happening with the data
Explain to me why it's happening
Allow me to explore and figure it out
Just expose the data to me
(Axes: effort for the creator and effort for the consumer, each ranging from low to high)

PRINT
Pic source: Flickr/goodwines (https://www.flickr.com/photos/goodwines/5181402251)

Education: Predicting marks
What determines a child's marks?
Do girls score better than boys?
Does the choice of subject matter?
Does the medium of instruction matter?
Does community or religion matter?
Does their birthday matter?
Does the first letter of their name matter?

Based on the results of the 20 lakh students who took the Class XII exams in Tamil Nadu over the last 3 years, it appears that the month you were born in can make a difference of as much as 120 marks out of 1,200.
June-borns score the lowest.
The marks shoot up for August-borns and peak for September-borns.
120 marks out of 1,200 are explainable by month of birth.
An identical pattern was observed in 2009 and 2010, and across districts, gender, subjects, and Class X & XII.
"It's simply that in Canada the eligibility cutoff for age-class hockey is January 1. A boy who turns ten on January 2, then, could be playing alongside someone who doesn't turn ten until the end of the year, and at that age, in preadolescence, a twelve-month gap in age represents an enormous difference in physical maturity." - Malcolm Gladwell, Outliers
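The month-of-birth comparison above can be sketched in a few lines. This is a minimal illustration, not the analysis used for the slides: the function name, the record layout (date of birth, total marks) and the sample records are all assumptions made up for the example.

```python
from collections import defaultdict
from datetime import date


def mean_marks_by_birth_month(records):
    """Return {month number: mean total marks} for (date_of_birth, marks) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # month -> [sum of marks, student count]
    for dob, marks in records:
        bucket = totals[dob.month]
        bucket[0] += marks
        bucket[1] += 1
    return {month: s / n for month, (s, n) in sorted(totals.items())}


# Made-up records purely for illustration; not taken from the exam data.
sample = [
    (date(1995, 6, 12), 700),
    (date(1995, 6, 30), 740),
    (date(1995, 9, 3), 800),
    (date(1995, 9, 21), 820),
]

by_month = mean_marks_by_birth_month(sample)
# Spread between the best and worst scoring birth months.
gap = max(by_month.values()) - min(by_month.values())
print(by_month, gap)
```

Running the same grouping separately per year, district, gender or subject is one way to check whether a pattern like this repeats across slices.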

TN Class X: English

TN Class X: Social Science

TN Class X: Mathematics

ICSE 2013 Class XII: Total marks

Let's look at 15 years of US birth data
This is a dataset (1975-1990) that has been around for several years, and has been studied extensively. Yet, a visualization can reveal patterns that are neither obvious nor well known.
For example:
Are birthdays uniformly distributed?
Do doctors or parents exercise the C-section option to move dates?
Is there any day of the month that has unusually high or low births?
Are there any months with relatively high or low births?

Very high births in September. But this is fairly well known. Most conceptions happen during the winter holiday season.
Relatively few births during the Christmas and Thanksgiving holidays, as well as New Year and Independence Day.
Most people prefer not to have children on the 13th of any month, given that it's considered an unlucky day.
Some special days like April Fools' Day are avoided, but Valentine's Day is quite popular.

Legend: more births / fewer births on average, for each day of the year (from 1975 to 1990)
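A calendar view like this rests on one aggregation: the average number of births on each calendar day across all years, compared with the overall average. The sketch below shows that step only; the function names, the input layout (year, month, day, births) and the sample counts are assumptions for illustration, not the original dataset.

```python
from collections import defaultdict


def average_births_per_day(daily_counts):
    """daily_counts: iterable of (year, month, day, births).

    Returns {(month, day): mean births on that calendar day across years}.
    """
    sums = defaultdict(int)
    years_seen = defaultdict(set)
    for year, month, day, births in daily_counts:
        sums[(month, day)] += births
        years_seen[(month, day)].add(year)
    return {key: sums[key] / len(years_seen[key]) for key in sums}


def relative_to_overall(averages):
    """Express each day's average as a ratio to the mean over all days."""
    overall = sum(averages.values()) / len(averages)
    return {key: value / overall for key, value in averages.items()}


# Made-up counts purely for illustration.
sample = [
    (1975, 9, 12, 110), (1976, 9, 12, 130),
    (1975, 12, 25, 70), (1976, 12, 25, 90),
]

avg = average_births_per_day(sample)
ratio = relative_to_overall(avg)  # above 1 means more births than a typical day
print(avg, ratio)
```

Colouring each day by its ratio would give the "more births / fewer births" scale of the chart.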

The pattern in India is quite different
This is a birth date dataset that's obtained from school admission data for over 10 million children. When we compare this with births in the US, we see none of the same patterns.

For example:
Is there an aversion to the 13th or is there a local cultural nuance?
Are holidays avoided for births?
Which months have a higher propensity for births, and why?
Are there any patterns not found in the US data?

Very few children are born in the month of August, and thereafter. Most births are concentrated in the first half of the year.
We see a large number of children born on the 5th, 10th, 15th, 20th and 25th of each month, that is, round-numbered dates.
Such round-numbered patterns are a typical indication of fraud. Here, birthdates are brought forward to aid early school admission.

Legend: more births / fewer births on average, for each day of the year (from 2007 to 2013)
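One simple way to quantify the round-number effect is to compare the share of recorded birthdays that fall on the 5th, 10th, 15th, 20th or 25th with the share expected if births were spread evenly over the year. The sketch below assumes that uniform baseline; the function name and the sample list are hypothetical.

```python
from collections import Counter

ROUND_DAYS = {5, 10, 15, 20, 25}


def round_day_excess(days_of_month):
    """Compare the observed share of round-numbered days with a uniform baseline.

    days_of_month: iterable of day-of-month integers, one per recorded birth.
    Returns (observed share, expected share, observed / expected).
    """
    counts = Counter(days_of_month)
    total = sum(counts.values())
    observed = sum(counts[d] for d in ROUND_DAYS) / total
    # Baseline assumption: 5 round days in each of 12 months out of 365.25 days.
    expected = (len(ROUND_DAYS) * 12) / 365.25
    return observed, expected, observed / expected


# Made-up list of recorded days of birth, purely for illustration.
sample_days = [5, 5, 10, 15, 20, 25, 3, 7, 12, 28]

observed, expected, ratio = round_day_excess(sample_days)
print(observed, expected, ratio)
```

A ratio well above 1 means round dates are over-represented, which is the kind of signal described above.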

This adversely impacts children's marks
It's a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer.

Children born on the 1st, 5th, 10th, 15th etc. of the month tend to score lower marks on average.

Legend: higher marks / lower marks on average, for children born on a given day of the year (from 2007 to 2013)
Children born on round-numbered days score lower marks on average, due to a higher proportion of younger children.

Wealth of Candidates

http://times.gramener.com/candidates/

[Chart scale: candidate wealth in tenfold steps, from Rs 1 lakh to Rs 10,000 crores]

How rich are the candidates?

MLA Attendance Statistics

[Figure: MLA attendance at the Assembly, Karnataka, 2008-2012. Attendance percentage of each MLA, ranging from 31.1% to 100.0%. Attendance % legend: < 50, < 75, < 95, < 100, = 100. Party legend: JD(S), IND, INC, BJP.]
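The attendance legend groups each MLA's percentage into five bands. A minimal sketch of that banding follows; it assumes each label means "below this limit and not in a lower band", which the slide does not spell out. The function name is hypothetical, and the sample values are a few percentages that appear in the chart.

```python
from collections import Counter


def attendance_band(pct):
    """Map an attendance percentage to a legend band.

    Assumption: bands are cumulative cut-offs, so "< 75" covers 50 up to
    (but not including) 75, and so on.
    """
    if pct >= 100.0:
        return "= 100"
    for limit in (50, 75, 95, 100):
        if pct < limit:
            return f"< {limit}"


# A few attendance percentages that appear in the chart.
sample = [100.0, 98.5, 94.8, 74.8, 31.1]

band_counts = Counter(attendance_band(p) for p in sample)
print(band_counts)
```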

TV
Pic source: Flickr/FaceMePLS (https://www.flickr.com/photos/faceme/1457252072/)

Exploring politics as data stories...

Has there ever been an all-woman election?
Only two elections had women candidates exclusively: Karur, TN (1967) and Panskura, WB (1977). Only 8 had a woman majority ever.
Who's the oldest candidate ever?
Arif Ahmed Shaikh Jafhar (NBNP) contested the 2009 elections from Dhule, MH at age 99, making him the oldest candidate ever in India.
Who won by the lowest margins ever in history?
Som Marandi (BJP) and Konathala Ramakrishna (INC) won by just 9 votes in Bihar, 1998 and AP, 1989 respectively.
Was there ever an uncontested win?
Since 1989, no election was won uncontested. Srinagar, J&K was the last, where Mohammad Shafi Bhat of JKN won without competition.
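Questions like these reduce to simple queries over a table of contests. The sketch below shows two of them, the closest contest and uncontested wins. The `Contest` structure, the helper names and the sample rows are all hypothetical; the real election dataset and its layout are not described in the slides.

```python
from dataclasses import dataclass


@dataclass
class Contest:
    constituency: str
    year: int
    votes: dict  # candidate name -> votes polled


def margin(contest):
    """Winning margin in votes, or None when there was only one candidate."""
    ranked = sorted(contest.votes.values(), reverse=True)
    return None if len(ranked) < 2 else ranked[0] - ranked[1]


def closest_contest(contests):
    """The contested election with the smallest winning margin."""
    contested = [c for c in contests if margin(c) is not None]
    return min(contested, key=margin)


def uncontested(contests):
    """Contests with a single candidate."""
    return [c for c in contests if len(c.votes) == 1]


# Made-up contests purely for illustration.
contests = [
    Contest("Seat A", 1989, {"P": 5000, "Q": 4988}),
    Contest("Seat B", 1989, {"R": 0}),
    Contest("Seat C", 1998, {"S": 9000, "T": 4000, "U": 8000}),
]

closest = closest_contest(contests)
walkovers = uncontested(contests)
print(closest.constituency, margin(closest), [c.constituency for c in walkovers])
```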

Which party has the largest number of victories in Lok Sabha elections?

https://gramener.com/election/parliament

What's the largest number of candidates that stood in an election?

https://gramener.com/election/cartogram?ST_NAME=Tamil%20Nadu

Live Results

Our CNN-IBN Microsoft Election Analytics Centre, which you can see at www.bing.com/elections or election-results.ibnlive.in.com, served over 10 million requests on 16th May 2014, the day of India's election results.

This is one of the largest real-time visualisations that we (and perhaps many others) have attempted.

https://gramener.com/timesnow/
Times Now coverage had 80%+ viewership

DIGITAL
Pic source: Flickr/ThomasHawk (https://www.flickr.com/photos/thomashawk/192567803/)

We have internal information. Getting information from outside is our challenge. There's no way of doing that. - Senior Editor, Leading Media Company

India's religions

Australia's religions

Utterly, Butterly, Colourful

Shakespeare's Sonnets
The Network Layout of each sonnet shows how Shakespeare wove together words to build a sonnet. Each circle is a word and the lines show the direction (or link) to the next word.
The colour of the circle is an approximate indication of the part of speech. The sonnet currently selected, Sonnet 7, is most textually similar to Sonnet 67 (25.40%).
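The word network and the similarity score can be approximated with a short sketch. It builds directed links from each word to the next, and scores two texts by the overlap of their word sets (Jaccard similarity). The slides do not say how textual similarity was actually computed, so the Jaccard measure, the function names and the second sample line are assumptions.

```python
import re


def words(text):
    """Lower-cased words, keeping apostrophes inside words."""
    return re.findall(r"[a-z']+", text.lower())


def word_links(text):
    """Directed (word, next word) pairs: the lines of the network layout."""
    w = words(text)
    return list(zip(w, w[1:]))


def jaccard_similarity(text_a, text_b):
    """Shared distinct words divided by all distinct words (an assumed measure)."""
    a, b = set(words(text_a)), set(words(text_b))
    return len(a & b) / len(a | b) if a | b else 0.0


line_a = "Shall I compare thee to a summer's day?"   # opening line of Sonnet 18
line_b = "Shall I compare thee to a winter's night"  # made-up line for comparison

links = word_links(line_a)
similarity = jaccard_similarity(line_a, line_b)
print(links[:3], similarity)
```

Applying `jaccard_similarity` to every pair of sonnets and keeping the highest score for each would give a "most similar sonnet" lookup of the kind shown on the slide.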

Which is the least successful party in Indian election history?

https://gramener.com/election/parliament#story.ddp

Padma Awards: Dashboards without data

https://youtu.be/e3hssOzuwGc

Exploring the Social Network of Coders
Cities: Bangalore, Singapore
People shown: Sudar, Yahoo!; Anand C, Consultant; Kiran, Hasgeek; Anand S, Gramener; Mugunth, Steinlogic; Honcheng, buUuk; Sau Sheong, HP Labs; Lim Chee Aung
Legend: circle size from 1 follower to 100 followers; a line means A follows B (or) B follows A; highlighted: most followed in Bangalore, most followed in Singapore

Companies: Tata Teleservices, Tata Consultancy Services, Tata Business Support Services, Tata Global Beverages, Tata Infotech (merged), Tata Toyo Radiator, Honeywell Automation India, Tata Communications, A G C Networks, Tata Technologies, Tata Projects, Tata Power, Tata Finance, Idea Cellular, Tata Motors, Tata Sons, Tata Steel, Tayo Rolls, Tata Securities, Tata Coffee, Tata Investment Corp
Directors: A J Engineer, H H Malgham, H K Sethna, Keshub Mahindra, Ravi Kant, Russi Mody, Sujit Gupta, A S Bam, Amal Ganguli, D B Engineer, D N Ghosh, M N Bhagwat, N N Kampani, U M Rao, B Muthuraman, Ishaat Hussain, J J Irani, N A Palkhivala, N A Soonawala, R Gopalakrishnan, Ratan Tata, S Ramadorai, S Ramakrishnan

Directorships at the Tatas

Every person who was a Director at the Tata Group is shown here as an orange circle. The size of the circle is based on the number of directorship positions held over their lifetime.

Every company in the Tata Group is shown here as a blue circle. The size of the circle is based on the number of directors the company has had over time.

Every directorship relation is shown by a line. If a person has held a directorship position at a company, the two are connected by a line.

The group appears to be divided into two clusters based on the network of directorship roles.
Prominent leaders bridge the groups.
First group of companies
Second group of companies
Some directors are mainly associated with the first group of companies.
Some directors are mainly associated with the second group of companies.
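The directorship chart is a two-sided network: people on one side, companies on the other, and a line for each directorship. A minimal sketch of the data behind it follows. The function names and the placeholder directors and companies are hypothetical and do not represent actual Tata directorships, and the two company groups are supplied by hand rather than detected.

```python
from collections import defaultdict


def directorship_network(directorships):
    """Build both sides of the network from (person, company) pairs.

    Circle sizes follow from the set sizes: a person's size is the number of
    companies they served, a company's size is the number of its directors.
    """
    companies_of = defaultdict(set)
    directors_of = defaultdict(set)
    for person, company in directorships:
        companies_of[person].add(company)
        directors_of[company].add(person)
    return companies_of, directors_of


def bridging_directors(companies_of, group_one, group_two):
    """People who held directorships in both groups of companies."""
    return sorted(
        person for person, companies in companies_of.items()
        if companies & group_one and companies & group_two
    )


# Placeholder pairs purely for illustration.
pairs = [
    ("Director A", "Company 1"),
    ("Director A", "Company 2"),
    ("Director A", "Company 3"),
    ("Director B", "Company 1"),
    ("Director C", "Company 3"),
]

companies_of, directors_of = directorship_network(pairs)
bridges = bridging_directors(companies_of, {"Company 1", "Company 2"}, {"Company 3"})
print(len(companies_of["Director A"]), len(directors_of["Company 1"]), bridges)
```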

The boundaries across different media are blurring

...and newer genres are emerging

Visualisation is imperative for Data, Insights, Action
Spot the unusual
Communicate patterns
Simplify decisions

We handle terabyte-size data via non-traditional analytics and visualise it in real-time.

Gramener visualises your data
Gramener transforms your data into concise dashboards that make your business problem & solution visually obvious.
We help you find insights quickly, based on cognitive research, and our visualisations guide you towards actionable decisions.

A Data Science Company
Ganes Kesari
[email protected] / @kesaritweets

Gramener is a data analytics and visualisation company. We handle large-scale data via non-traditional analytics (by which we mean programmatic analysis) and visualize the results in real-time.

The visualizations are our key differentiator.

We transform your data into concise dashboards that make it easy for you to find the problems as well as the solution.

We help you find these insights quickly, based on our work in cognitive research, and our visualizations guide you towards actionable decisions.

In other words, we make enterprise data consumption very easy.