The Data Journalism Handbook
Version 0.1
A collaboration between all those interested in the future of news
Contributors
Contributors to this book include:
David Banisar, Article 19
Caelainn Barr, EU Data Journalist
Mariano Blejman, Hacks/Hackers
Marianne Bouchart, Data Journalism Blog
Liliana Bounegru, European Journalism Centre
Brian Boyer, Chicago Tribune
Jane Park, Creative Commons
Paul Bradshaw, City University London
Lucy Chambers, Open Knowledge Foundation
Helen Darbishire, Access Info Europe
Steve Doig, Cronkite School of Journalism
David Erwin, New York Times
Lisa Evans, Guardian Datablog
Tom Fries, Bertelsmann Stiftung
Duncan Geere, Wired.co.uk
Rich Gordon, Northwestern University
Jonathan Gray, Open Knowledge Foundation
Ted Han, DocumentCloud
Kate Hudson, Open Journalism
Francis Irving, ScraperWiki
Lizzie Jackson, Ravensbourne College
Nicolas Kayser-Bril, Data Journalist
John Keefe, New York Public Radio
Friedrich Lindenberg, Open Knowledge Foundation
Lorenz Matzat, OpenDataCity
Aidan McGuire, ScraperWiki
Cynthia O'Murchu, Financial Times
Aron Pilhofer, New York Times
Anthony Reuben, BBC
Simon Rogers, Guardian Datablog
Amanda Rossi, freelance journalist
Fabrizio Scrollini, London School of Economics
Adam Thomas, Source Fabric
Sascha Venohr, Zeit Online
Jerry Vermanen, De Stentor
César Viana, Estacio de Sa University
Farida Vis, University of Leicester
Lulu Pinney, Infographic design (Telling Information)
This work is licensed under a Creative Commons Attribution Sharealike license.
Table of contents
The Data Journalism Handbook
Contributors
Table of contents
0. Preface
0.1 The purpose of this book
0.2 Add to this book
0.3 Share this book
1. Introduction
1.1 What is data journalism?
1.2 Why is it important?
1.3 How is it done?
1.4 Examples, case studies and interviews
1.4.1 Data powered stories
1.4.2 Data served with stories
1.4.3 Data driven applications
1.5 Making the case for data journalism
1.5.1 Measuring impact
1.5.2 Sustainability and business models
2. Getting data
2.1 Where does data live?
2.1.1 Open data
2.1.2 Social data services
2.1.3 Research data
2.2 Asking for data
2.2.1 Freedom of Information laws
2.2.2 Helpful public servants
2.3 Getting your own data
2.3.1 Scraping data
2.3.2 Crowdsourcing data
3. Understanding data
3.1 Data literacy
3.2 Working with data
3.3 Tools for analysing data
3.4 Annotating data
4. Delivering data
4.1 From datasets to stories
4.2 Publishing data
4.3 Visualising data
4.4 Data driven applications
4.5 Engagement, outreach and community
5. Appendix
5.1 Further resources
Notes:
First draft deadline: Sunday, November 6th, 17.00 GMT (please inform us if you finish your contribution earlier so we can start editing it)
Project hashtag: #ddjbook
Project URL:
0. Preface
0.1 The purpose of this book
Overview: Explain what this book does and doesn’t aim to do
Authors: Jonathan Gray, Liliana Bounegru
Length: 0.5-1 page
0.2 Add to this book
Overview: Explain how to contribute to future versions of this book
Authors: Jonathan Gray
Length: 0.5 page
0.3 Share this book
Overview: Encourage people to share this book
Authors: Jonathan Gray
Length: 0.5 page
1. Introduction
1.1 What is data journalism?
Overview: Define and describe data journalism and how it is different from other forms of journalism.
Authors: Paul Bradshaw, Jonathan Gray, [Heather Brooke], [Simon Rogers], [Nicolas Kayser-Bril], [Richard Gordon]
Length: 1-2 pages (with quotes from different people)
UPDATE: input from Paul Bradshaw, Jonathan Gray
STILL NEED: Snappy quotes from different people on what data journalism is, and what it isn’t.
EDITOR: Liliana
1.2 Why is it important?
Overview: Put data journalism into context and explain why it matters and what potential it has.
Authors: Tom Fries, [Paul Bradshaw], [Jonathan Gray], [Heather Brooke], [Simon Rogers], [Nicolas Kayser-Bril], [Richard Gordon]
Length: 1 page (with quotes)
UPDATE: input from Tom Fries and Nicolas Kayser-Bril
STILL NEED: Snappy quotes from different people on why data journalism is important.
EDITOR: Liliana
1.3 How is it done?
Overview: Explain different ways of doing data journalism (e.g. journalists who can code vs. coders for hire, off-the-shelf tools vs. custom web applications, in-house graphics departments vs. hired data visualisation experts, etc.). Give examples of how it is being done in different newsrooms.
Authors: Lucy Chambers, [Aron Pilhofer], [Simon Rogers], [Anthony Reuben], [Cynthia O'Murchu], [Sascha Venohr], [Caelainn Barr]
Length: 2-3 pages (with examples and quotes)
UPDATE: input from Zeit Online, notes from the Guardian and Chicago Tribune
STILL NEED: More case studies, quotes and examples. In particular get input from BBC, Chicago Tribune, FT, Guardian and NYT. And talk about how to find developers, designers and issue experts.
EDITOR: Liliana
1.4 Examples, case studies and interviews
1.4.1 Data powered stories
● Overview: Give and describe successful examples of data powered stories you worked on. Describe how you produced these stories. The aim is to give journalists and decision-makers in newsrooms who might be interested in data journalism a sense of what the potential of data powered stories is and how they could go about producing them.
○ What data did you use and how did you obtain it?
○ What prompted you to start this project?
○ What did the project aim to achieve?
○ How long did you work on the project?
○ How many people worked on it?
○ What was the cost of the project?
○ What were the skills necessary for this project? (domain knowledge, coding, research, visualisation, etc.)
○ What is the role of datasets in these stories? (e.g. give rise to new stories, enrich stories, contextualize stories, help journalists explore topics in new ways, etc.)
○ What was your approach? (exploratory vs. hypothesis approach)
○ What techniques and tools did you use?
○ How did you present the data powered story?
○ What is the potential of data powered stories?
○ Why should journalists/newsrooms be interested in producing such projects?
○ What were the challenges in producing these stories?
○ What tips and advice would you give to journalists who want to work on similar projects?
○ Please include relevant links, videos and images.
● Authors: Caelainn Barr, James Ball, Sascha Venohr, [Anthony Reuben], Cynthia O'Murchu, [Heather Brooke]
● Length: 1.5-3 pages per example
UPDATE: Zeit Online
STILL NEED: More case studies - e.g. from Amanda on Brazilian citizen journalists, from Chicago Tribune, data journalism on the radio, Guardian (Lisa or James).
EDITOR: Lucy/Kat
1.4.2 Data served with stories
● Overview: Give and describe successful examples of data served with stories you worked on. Describe how you produced these projects. The aim is to give journalists and decision-makers in newsrooms who might be interested in data journalism a sense of what the potential of data served with stories is and how they could go about producing them.
○ What data did you use and how did you obtain it?
○ What prompted you to start this project?
○ What did the project aim to achieve?
○ How long did you work on the project?
○ How many people worked on it?
○ What was the cost of the project?
○ What were the skills necessary for this project? (domain knowledge, coding, research, visualisation, etc.)
○ What is the role of datasets in these stories? (e.g. provide additional context or insight, etc.)
○ What was your approach? (exploratory vs. hypothesis approach)
○ What techniques and tools did you use?
○ How did you present the story and the data served with it?
○ What is the potential of such projects?
○ Why should journalists/newsrooms be interested in producing such projects?
○ What were the challenges in producing these projects?
○ What tips and advice would you give to journalists who want to work on similar projects?
○ Include relevant links, videos and images.
● Authors: Caelainn Barr, James Ball, Sascha Venohr, [Anthony Reuben], [Cynthia O'Murchu], [Heather Brooke]
● Length: 1.5-3 pages per example
UPDATE: needs doing!
STILL NEED: Guardian, BBC, … Who else serves data with stories?
EDITOR: Lucy/Kat
1.4.3 Data driven applications
● Overview: Give and describe successful examples of data driven applications you worked on. Describe how you produced these applications. The aim is to give journalists and decision-makers in newsrooms who might be interested in data journalism a sense of what the potential of data driven applications is and how they could go about producing them.
○ What data did you use and how did you obtain it?
○ What prompted you to start this project?
○ What did the project aim to achieve?
○ How long did you work on the project?
○ How many people worked on it?
○ What was the cost of the project?
○ What were the skills necessary for this project? (domain knowledge, coding, research, visualisation, etc.)
○ What was your approach?
○ What techniques and tools did you use?
○ How did you present the outcome?
○ What is the potential of such projects?
○ Why should journalists/newsrooms be interested in producing such projects?
○ What were the challenges in producing these projects?
○ What tips and advice would you give to journalists who want to work on similar projects?
○ Include relevant links, videos and images.
● Authors: Aron Pilhofer, Marcus Bösch
● Length: 1.5-3 pages per example
UPDATE: needs doing!
STILL NEED: Guardian, NYT, BBC, …
EDITOR: Lucy/Kat
1.5 Making the case for data journalism
1.5.1 Measuring impact
Overview: Give overview of the potential of data journalism (e.g. engaging with new audiences, the future of journalism on the web) and how it could be measured. Include results of EJC survey on training needs for data journalism
Authors: Liliana Bounegru, [Lorenz Matzat]
Length: 1 page
1.5.2 Sustainability and business models
Overview: Discuss costs, sustainability and business models for data journalism. Provide successful and less successful examples and explain what lessons can be learned from them.
Authors: Lorenz Matzat
Length: 1-2 pages
UPDATE: 1.5 still needs doing!
STILL NEED: input from Guardian, Deutsche Welle, Zeit Online, NYT, etc.
EDITOR: Liliana
2. Getting data
2.1 Where does data live?
2.1.1 Open data
Overview: An overview of open data sources, what they contain, how to find them, how to search them, examples of open data being used by journalists
Authors: Jonathan Gray, Brian Boyer
Length: 1-3 pages (with links and examples)
2.1.2 Social data services
Overview: An overview of community driven websites which aim to help you find the data you need - such as GetTheData.org and TheDataHub.org - and their function in enabling collaboration around datasets
Authors: Jonathan Gray
Length: 0.5-1 page (with links and examples)
2.1.3 Research data
Overview: An overview of sites to find research data
Authors:
Length: 0.5-1 page (with links and examples)
UPDATE: Great input and notes from Brian Boyer/Chicago Tribune, Jane Park/Creative Commons, John Keefe/WNYC, Chrys Wu/HacksHackers.
STILL NEED: Needs to be written up and expanded.
EDITOR: Friedrich
2.2 Asking for data
2.2.1 Freedom of Information laws
Overview: An overview of FOI legislation, an example of making an FOI request, information on resources in this area, how to get help from FOI experts
Authors: Helen Darbishire (Access Info), Fabrizio Scrollini (London School of Economics)
Length: 1-3 pages (with links and examples)
2.2.2 Helpful public servants
Overview: How talking directly with public servants or engaging with official open data initiatives might help you to find the data you need
Authors: [Jonathan Gray]
Length: 0.5-1 page (with links and examples)
UPDATE: First draft almost done.
STILL NEED: Editing and peer-review.
EDITOR: Liliana/Friedrich
2.3 Getting your own data
2.3.1 Scraping data
Overview: Explaining the basic idea of web scraping, why it can be necessary, examples of how it has been used by journalists, and a guide for absolute beginners on how it can be done, based on an interesting case study
Authors: Francis Irving, Aidan McGuire, [Friedrich Lindenberg]
Length: 2-3 pages (with links, examples, and a basic tutorial)
UPDATE: Input from Friedrich Lindenberg, Federica Cocco, Glenn McMahon and Francis Irving.
STILL NEED: Needs to be written up and expanded.
EDITOR: Friedrich
2.3.2 Crowdsourcing data
Overview: Explaining the basic idea of crowdsourcing data, how various projects have used it, and how to do it (e.g. using Google Spreadsheets, forms, maps, Twitter hashtags, etc.)
Authors: [Simon Rogers], [Lisa Evans]
Length: 1-3 pages (with links and examples)
UPDATE: Input from Marianne Bouchart and others (not in the Google doc yet), Guardian (notes)
STILL NEED: Nicolas Kayser-Bril (water data) and other examples
EDITOR: Liliana/Friedrich
3. Understanding data
3.1 Data literacy
Overview: Explaining data literacy and its importance (including statistical/numerical literacy, use of mathematics, technical literacy, etc)
Authors: James Ball, Nicolas Kayser-Bril, Richard Gordon
Length: 1-3 pages
UPDATE: input from Lisa Evans, Richard Gordon, Lizzie Jackson, Amanda Rossi, JV Chamary, Fabrizio Scrollini
STILL NEED: Input from Nicolas Kayser-Bril, and quotes from Lisa Evans, Amanda on verifying data, citizen journalism, etc.
EDITOR: Liliana
3.2 Working with data
Overview: What you need to work with datasets: background knowledge, technical ability, etc. (case study approach with lessons learned from each project presented)
Authors: James Ball, Steve Doig
Length: 1-2 pages per case study
UPDATE: Input from Claire Miller and Steve Doig
STILL NEED: Further input and ideas
EDITOR: Liliana
3.3 Tools for analysing data
Overview: Overview of different types of tools for analysing and working with datasets, examples of how they can be used, examples of how they have been used by journalists.
Authors: [Nicola Hughes], [Lisa Evans], [Friedrich Lindenberg], [Nicolas Kayser-Bril]
Length: 1-2 pages per case study
UPDATE: Needs doing!
STILL NEED: Input from Friedrich.
EDITOR: Friedrich
3.4 Harnessing external expertise
Overview: How to enable people to annotate and comment on datasets
Authors: [Aron Pilhofer]
Length: 1 page
UPDATE: Needs doing!
STILL NEED: Input from Guardian, OWNI, NYT?
EDITOR: Liliana
4. Delivering data
4.1 From datasets to stories
Overview: Explaining how to find stories in datasets (various approaches), including examples and case studies. Also looking at the broader role of data journalists in the newsroom, how they work with other journalists, etc.
Authors: Caelainn Barr, [Cynthia O'Murchu], [Heather Brooke], [Lisa Evans], [Sascha Venohr]
Length: 0.5-1 page per approach/case study
UPDATE: Some material
STILL NEED: Expanding and editing
EDITOR: Jonathan
4.2 Publishing data
Overview: Overview of ways to publish data, including examples. Embedding data, raw data (formats), live data, updating data, APIs. Who is your data for? Also a section on knowing the law, ethics and privacy and open licensing.
Authors:
Length: 1-2 pages
UPDATE: Needs doing!
STILL NEED:
EDITOR: Jonathan
4.3 Visualising data
Overview: How to visualise data: off-the-shelf tools and custom visualisations, with step-by-step guides demonstrated on an example
Authors: [Lulu Pinney], [Alastair Dant]
Length: 1-2 pages per case study
UPDATE: Good start!
STILL NEED: Needs expanding and editing, and more examples.
EDITOR: Jonathan
4.4 Data driven applications
Overview: Step by step guide, tips and tricks for how newsrooms can produce data driven applications
What are the resources (skills, costs, etc.) needed?
What are the steps to take when you want to build a data driven application?
What useful lessons did you learn from your own experience?
Why should newsrooms be interested in producing data driven applications?
What is the potential of such projects?
Authors: Aron Pilhofer
Length: 2-3 pages (including examples)
UPDATE: Needs doing! Aron?
STILL NEED: Ideas on how to get started, design process, etc.
EDITOR: Jonathan
4.5 Engagement, outreach and community
Overview: Knowing your audience (and pitching appropriately), dissemination and outreach, social media, building community, engaging with existing communities (designers, developers, etc).
Authors:
Length: 1-2 pages
UPDATE: Duncan (Wired) working on it now. Needs more input.
EDITOR: Jonathan
5. Appendix
5.1 Further resources
Overview: Lists of links, resources, examples and other bits and pieces that don’t fit in the handbook
Authors: Everyone!
Length: 5 pages
UPDATE: Needs doing!
STILL NEED: Lots of ideas from everyone.
EDITOR: Jonathan