Data Journalism 101 - Part 1 by Michael J. Berens
DESCRIPTION
Pulitzer Prize winner Michael J. Berens of The Seattle Times presents "Data Journalism 101," a three-hour, hands-on workshop for the Donald W. Reynolds National Center for Business Journalism at the Excellence in Journalism Conference in Nashville, Tenn., on Sept. 4, 2014. Part 1 provides an intro to databases and their importance to reporting. For more business journalism training opportunities and resources, please visit http://businessjournalism.org.
TRANSCRIPT
Data Journalism 101
Excellence in Journalism Conference 2014
Donald W. Reynolds National Center for Business Journalism at ASU
Michael J. Berens – The Seattle Times
Session One: Intro to Databases
Accessing and managing data for stories
He said. She said.
Now I’m going to tell you who’s telling the truth.
Cells, fields and headers – oh my!
Database Options
Create your own database
• Obtain sources of information (paper records)
Import existing database
• Obtain existing database
• Scrape data from the web
Finding a serial killer
Track the exploitation of vulnerable seniors
SUNDAY, SEPTEMBER 12, 2010
Deaths in adult homes hidden and ignored
Abuse and neglect may have killed hundreds of residents. But with nobody questioning the circumstances, troubled homes are staying open.
Courtesy of James Rudolph
A HOME’S MISTREATMENT PROVES DEADLY
Neglect at an adult family home is blamed for the 2008 death of 87-year-old Jean Rudolph, a retired nursing educator who had Alzheimer’s disease and heart problems. Infection from severe bedsores, which developed during her stay at the home, spread to her vital organs.
A SEATTLE TIMES INVESTIGATION / PART 4
Tracking fraudulent medical devices and profiteers
Follow the Information
• You’ve received an unsolicited email from a doctor who claims that scores of pain patients have accidentally died from methadone overdoses.
• The doctor claims that the State of Washington pushes methadone as a “preferred drug” because it’s the least expensive.
• The doctor claims the state fails to warn patients about the unique risks of methadone.
Find the data sources
• Death certificates – Track cause of death and number of overdose victims
• ARCOS Database – Created by the U.S. Drug Enforcement Administration to track controlled substances
• In-patient hospital database – Created by a dozen or so states to track types of hospitalizations
• My own questions – How many patients also took benzodiazepines? Etc.
Step 1
Request the file layout
Fields, position, type, length
Columns: Field Number, Variable, Type, Format, Label, Comment

1  SEQ_NO  Char  $10.  Sequence Number
   Unique sequence number assigned to each record within a year. First four digits are the year of discharge.

2  REC_KEY  Num  11.  Record Key
   Unique number assigned to each CHARS record. Added in 2003.

3  STAYTYPE  Char  $1  Type of Stay
   1 = Inpatient
   2 = Observation patient

4  HOSPITAL  Char  $4  Hospital Number
   DOH assigned hospital number. Fourth character describes the Medicare certified unit type with:
   blank = acute care
   R = Rehabilitation unit
   P = Psychiatric unit
   S = Swing bed unit
   - - - - -
   A = Alcohol (discontinued after 1992)
   B = Bone marrow transplants (discontinued after 2000)
   E = Extended care (discontinued after 2001)
   H = Tacoma General & Group Health combined (discontinued after 1992)
   I = Group Health only at Tacoma General (discontinued after 1992)

5  LINENO  Num  3.  Number of Reported Revenue Items Codes

6  ZIPCODE  Char  $5  Patient's Zip Code
   99999 indicates the zip code is unknown. 99998 indicates homelessness (some homeless patients may have a zip code for a shelter or other temporary location). Blanks indicate non-U.S. residence.

7  STATERES  Char  $2  State of Residence
   State abbreviation used by U.S. Postal Service. This is assigned from the zip code. Residents with zip code 99998 are assigned to Washington. XX = invalid zip code or a non-U.S. residence.
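The comments in a file layout are what turn raw codes into something reportable. As a minimal sketch in Python, assuming each record has already been split into its fields, the special ZIPCODE values and the fourth character of HOSPITAL could be translated like this. The function names `describe_zip` and `hospital_unit` are hypothetical; the code meanings come from the layout comments.

```python
# Current unit-type codes, copied from the HOSPITAL comment in the layout.
UNIT_TYPES = {
    " ": "acute care",
    "R": "rehabilitation unit",
    "P": "psychiatric unit",
    "S": "swing bed unit",
}


def describe_zip(zipcode):
    """Translate the special ZIPCODE values described in the layout."""
    if zipcode.strip() == "":
        return "non-U.S. residence"
    if zipcode == "99999":
        return "unknown"
    if zipcode == "99998":
        return "homeless"
    return zipcode  # an ordinary zip code is passed through unchanged


def hospital_unit(hospital):
    """Read the unit type from the fourth character of HOSPITAL."""
    code = hospital.ljust(4)[3]  # pad so a missing fourth character counts as blank
    return UNIT_TYPES.get(code, "discontinued or unrecognized code: " + code)
```

Anything outside the four current codes is flagged rather than guessed at, so discontinued codes such as A or B show up for a human to check against the layout.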
Fixed length vs. delimited
• Fixed Length
  • The data fields measure a specific number of characters
  • Field 1 = 10 characters long
  • File layout is critical
• Delimited
  • The data fields are separated by a common character or mark
  • Like a comma or tab
  • Always ask for “text delimited data,” which is easier to import than fixed length
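The difference shows up as soon as you try to read a record. The Python sketch below parses the same record both ways. It assumes the first seven fields of the hospital layout sit back to back with the widths given in the Format column (the source does not list start positions), and the sample values are made up for illustration.

```python
import csv
import io

# Field widths read from the Format column of the layout.
# Assumption: fields are contiguous, in this order, with no filler between them.
LAYOUT = [
    ("SEQ_NO", 10), ("REC_KEY", 11), ("STAYTYPE", 1), ("HOSPITAL", 4),
    ("LINENO", 3), ("ZIPCODE", 5), ("STATERES", 2),
]


def parse_fixed(line):
    """Slice one fixed-length line into named fields using the layout."""
    record, start = {}, 0
    for name, width in LAYOUT:
        # Blanks are kept because the layout gives them meaning in some fields.
        record[name] = line[start:start + width]
        start += width
    return record


def parse_delimited(text, delimiter=","):
    """Read delimited text whose first row holds the field names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Made-up sample record, for illustration only.
fixed_line = "2009000001" + "42".zfill(11) + "1" + "123R" + "007" + "98101" + "WA"
delimited_text = (
    "SEQ_NO,REC_KEY,STAYTYPE,HOSPITAL,LINENO,ZIPCODE,STATERES\n"
    "2009000001,42,1,123R,7,98101,WA\n"
)

fixed_record = parse_fixed(fixed_line)
delimited_records = parse_delimited(delimited_text)
```

The fixed-length parser is useless without the layout table, while the delimited reader only needs to know the separator, which is why delimited data is the easier request.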
Make a master copy
Keep a log
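One way to follow both habits at once is to copy the original file into a separate folder and append a line to a log each time. This is only a sketch in Python: the function name `make_master_copy`, the log columns (timestamp, file name, SHA-256 checksum, note) and the tab-separated format are all assumptions, not something the workshop prescribes.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def make_master_copy(source, master_dir, log_path, note=""):
    """Copy the original file untouched and append an entry to a log."""
    source = Path(source)
    master_dir = Path(master_dir)
    master_dir.mkdir(parents=True, exist_ok=True)

    master = master_dir / source.name
    shutil.copy2(source, master)  # copy2 also tries to preserve file timestamps

    # A checksum lets you confirm later that the master copy was never altered.
    digest = hashlib.sha256(master.read_bytes()).hexdigest()

    stamp = datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{source.name}\t{digest}\t{note}\n")
    return master, digest
```

All later cleaning and analysis would then happen on working copies, with each step noted in the same log.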
Delimited file
Hands On - Hunting Database
Fixed width file