nr14: ten tips for data journalists
10 things every data journalist should know
NR14, Hamburg
Jennifer LaFleur
Center for Investigative Reporting
A bit about CIR
Nonprofit investigative newsroom
Public interest investigative journalism
Based near San Francisco
About 80 staff
Print, web, radio and TV
A little data journalism history
1952 1967 1980s …
#1 data is a powerful reporting tool
It takes you beyond the anecdote
And it’s easier than dealing with this
#1 data is a powerful reporting tool
Contrasts are in the data
Caution: This slide contains extreme nerdiness
#1 data is a powerful reporting tool
Contrasts are in the data
Your most powerful figures are in the data
Source: California Health Dept. data, Medicare billing data
Findings: Some hospitals had “alarming rates of a Third World nutritional disorder among its Medicare patients.”
Contrasts are in the data
Your most powerful figures are in the data
You can make connections you might not be able to make otherwise
#1 data is a powerful reporting tool
Data: Youth prison workers, criminal convictions and grievance data
Findings: Employees with criminal backgrounds were more likely to be accused of abusing inmates.
Data: Federal bridge inspections and stimulus funding.
Findings: Some of the nation’s worst bridges did not get stimulus funds.
Contrasts are in the data
Your most powerful figures are in the data
You can make connections you might not be able to make otherwise
You can test assumptions
#1 data is a powerful reporting tool
Source: NHTSA complaint data
Findings: “…unintended acceleration has been a problem across the auto industry.”
#2 data comes from many places
If something is
Inspected
Licensed
Enforced or
Purchased
…There probably is a database
Where’s the data?
If there is a report
Or a form
There probably is a database
Where’s the data?
Sometimes data is readily available online for download
Where’s the data?
Sometimes you have to scrape it.
That usually involves programs that automate searching tasks on Web sites.
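A minimal sketch of what such a program does, using only Python's standard library. The page content, the table columns and the TableScraper class are all made up for illustration; a real scraper would download the page from the agency's site and would need to match that site's actual layout.

```python
from html.parser import HTMLParser

# Hypothetical page content. A real scraper would download this
# from a website instead of hard-coding it.
SAMPLE_HTML = """
<table>
  <tr><th>Facility</th><th>Inspection date</th><th>Result</th></tr>
  <tr><td>Example Cafe</td><td>2014-03-01</td><td>Pass</td></tr>
  <tr><td>Sample Diner</td><td>2014-03-02</td><td>Fail</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows
        self._row = None       # row currently being read, if any
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_table(html):
    """Return the table in `html` as a list of rows (lists of strings)."""
    parser = TableScraper()
    parser.feed(html)
    return parser.rows


rows = scrape_table(SAMPLE_HTML)
header, records = rows[0], rows[1:]
for record in records:
    print(dict(zip(header, record)))
```

The same loop, pointed at many pages in turn, is what turns a searchable website into a table you can analyze.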
Where’s the data?
More often you need to go to an agency or source to get the data
Where’s the data?
Source: School district credit card purchases
Findings: District card holders made questionable purchases with their cards.
#3 people who keep data don’t always want to give it up
Getting electronic information
Know the law.
Know what information you want.
Do your homework.
Know what the appropriate cost should be.
Know who does the data entry.
Get to know the computer people.
Just another way of saying no
Huge costs
Delay tactics
“Oh you silly little journalist”
Sending you the wrong thing
“Your request was unclear”
HIPAA
Privacy
Privatization
#4 Sometimes holes in data can be a story
#5 Even when there is no data, you can use techniques for sampling and building a database.
Sampling
Physical surveys – go look at one
Testing
Questionnaires, polls and surveys
Building from documents
We built a database of 500 people who had been granted or denied pardons during the Bush administration.
We started with a list of nearly 2,000 people. From that, we pulled a random sample. Then we spent months researching the individuals.
We found that even after controlling for other factors, whites were more likely to get a pardon.
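Pulling a random sample from a list can be sketched in a few lines of Python. This is an illustration, not the method used for the pardons project: the applicant names are placeholders, the list size of 2,000 and the sample size of 500 are assumptions based on the figures above, and draw_sample is a hypothetical helper.

```python
import random

# Placeholder stand-in for the full list of roughly 2,000 people.
population = [f"applicant_{i:04d}" for i in range(1, 2001)]


def draw_sample(items, size, seed):
    """Draw `size` items at random, without replacement.

    A fixed seed makes the draw reproducible, so the same sample
    can be regenerated later when someone asks how it was chosen.
    """
    rng = random.Random(seed)
    return rng.sample(items, size)


sample = draw_sample(population, 500, seed=2014)
print(len(sample), "of", len(population), "selected")
```

Recording the seed along with the source list is a simple way to document the selection.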
To examine food safety, the Center for Investigative Reporting in Bosnia sampled food – literally – and had it tested in labs.
SVT surveyed 355 counties and districts about drug control – all replied (Courtesy Helena Bengtsson)
#6 Sometimes the crowd can help you
Where’s the data?
#7 There are many data tools – choose the right one
Spreadsheets
Databases
Mapping
Statistics
Programming
Source: Salary data and other charter school records
Findings: Reporters found nepotism in charter schools, and administrators earning six-figure salaries to run schools with only a few hundred or a couple of thousand students.
Source: Washington Health Department data
Findings: “MRSA has been quietly killing in hospitals for decades.” But no one had tracked it until this story.
Source: City Budget
Findings: Some neighborhoods suffer more than others as mayor cuts budgets
Source: Local health department inspection reports
Findings: At 28% of the venues, more than half of the concession stands or restaurants had been cited for at least one "critical" or "major" health violation.
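A figure like this comes from a two-step calculation: first the share of cited stands at each venue, then the share of venues above the threshold. A minimal sketch in Python, with invented venue names and counts (none of these numbers come from the inspection reports) and a hypothetical share_mostly_cited helper:

```python
# Invented inspection summary: venue -> (stands cited, total stands).
venues = {
    "Venue A": (12, 20),
    "Venue B": (3, 15),
    "Venue C": (8, 16),
    "Venue D": (9, 10),
}


def share_mostly_cited(data):
    """Return the fraction of venues where MORE than half of the
    stands were cited, plus the names of those venues."""
    flagged = [
        name
        for name, (cited, total) in data.items()
        if cited / total > 0.5  # exactly half does not count
    ]
    return len(flagged) / len(data), flagged


share, flagged = share_mostly_cited(venues)
print(f"{share:.0%} of venues: {flagged}")
```

Note the boundary case: a venue with exactly half of its stands cited is not "more than half", so the wording of the finding has to match the comparison in the code.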
#8 Sharing data is good, but give it context and be sure it is right
Source: EPA and state data on hazardous chemical locations
Findings: Dallas County has 900+ sites that store hazardous chemicals
Source: Medicaid outcomes data for dialysis facilities
Findings: A CMS online tool did not tell the whole story about facilities. In some counties, the gaps in measures such as survival rate were vast.
Source: Dam inspection data from Texas and federal government
Findings: Dam records had not been updated to account for population growth
#9 Data intended for one purpose can be used in other ways
Source: 311 calls for downed trees
Findings: After a tornado swept across New York City, 311 calls for downed trees helped trace its path
Disparities in water usage
“Water use highest in poor areas of the city”
Mapping and statistical analysis
#10: No data is perfect
Check your data
• Read the documentation. Understand the contents of every field.
• Know how many records you should have.
• Check counts and totals against reports.
• Are all possibilities included? All states, all counties, correct ranges?
• Check for missing data, duplicates, internal problems.
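Several of the checks above can be automated. The sketch below uses only Python's standard library; the data, the column names, the list of valid states and the expected record count are all invented for illustration, and check_data is a hypothetical helper, not a standard tool.

```python
import csv
import io
from collections import Counter

# Invented data. A real check would read the file the agency sent.
RAW = """id,state,amount
1,CA,100
2,TX,250
2,TX,250
3,,75
4,ZZ,-20
"""

VALID_STATES = {"CA", "TX", "NY"}  # assumed list of allowed values
EXPECTED_RECORDS = 4               # assumed count from the agency's report


def check_data(text):
    """Return a list of human-readable problems found in the data."""
    rows = list(csv.DictReader(io.StringIO(text)))
    problems = []

    # Know how many records you should have.
    if len(rows) != EXPECTED_RECORDS:
        problems.append(
            f"expected {EXPECTED_RECORDS} records, found {len(rows)}"
        )

    # Check for duplicates.
    counts = Counter(tuple(r.values()) for r in rows)
    for row, n in counts.items():
        if n > 1:
            problems.append(f"duplicate row {row} appears {n} times")

    # Check for missing data, unexpected values and bad ranges.
    for r in rows:
        if not r["state"]:
            problems.append(f"id {r['id']}: missing state")
        elif r["state"] not in VALID_STATES:
            problems.append(f"id {r['id']}: unexpected state {r['state']}")
        if float(r["amount"]) < 0:
            problems.append(f"id {r['id']}: amount out of range")

    return problems


problems = check_data(RAW)
for p in problems:
    print(p)
```

A script like this does not replace reading the documentation; it only flags the records that need a phone call to the agency.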
Jennifer LaFleur
@j_la28
www.cironline.org