Data Journalism 101 - Part 1 by Michael J. Berens

Data Journalism 101. Excellence in Journalism Conference 2014. Donald W. Reynolds National Center for Business Journalism at ASU. Michael J. Berens – The Seattle Times. Session One: Intro to Databases. Accessing and managing data for stories.

Upload: reynolds-center-for-business-journalism

Post on 12-Jan-2015


DESCRIPTION

Pulitzer Prize winner Michael J. Berens of The Seattle Times presents "Data Journalism 101," a three-hour, hands-on workshop for the Donald W. Reynolds National Center for Business Journalism at the Excellence in Journalism Conference in Nashville, Tenn., on Sept. 4, 2014. Part 1 provides an intro to databases and their importance to reporting. For more business journalism training opportunities and resources, please visit http://businessjournalism.org.

TRANSCRIPT

Page 1: Data Journalism 101 - Part 1 by Michael J. Berens

Data Journalism 101

Excellence in Journalism Conference 2014

Donald W. Reynolds National Center for Business Journalism at ASU

Michael J. Berens – The Seattle Times

Session One: Intro to Databases

Accessing and managing data for stories

Page 2: Data Journalism 101 - Part 1 by Michael J. Berens

He said. She said.

Now I’m going to tell you who’s telling the truth.

Page 3: Data Journalism 101 - Part 1 by Michael J. Berens

Cells, fields and headers – oh my!

Page 4: Data Journalism 101 - Part 1 by Michael J. Berens

Database Options

Create your own database

• Obtain sources of information (paper records)

Import existing database

• Obtain existing database

• Scrape data from the web

Page 5: Data Journalism 101 - Part 1 by Michael J. Berens

Finding a serial killer

Page 6: Data Journalism 101 - Part 1 by Michael J. Berens
Page 7: Data Journalism 101 - Part 1 by Michael J. Berens
Page 8: Data Journalism 101 - Part 1 by Michael J. Berens

Track the exploitation of vulnerable seniors

SUNDAY, SEPTEMBER 12, 2010

Deaths in adult homes hidden and ignored

Abuse and neglect may have killed hundreds of residents. But with nobody questioning the circumstances, troubled homes are staying open.

Courtesy of James Rudolph

A HOME’S MISTREATMENT PROVES DEADLY

Neglect at an adult family home is blamed for the 2008 death of 87-year-old Jean Rudolph, a retired nursing educator who had Alzheimer’s disease and heart problems. Infection from severe bedsores, which developed during her stay at the home, spread to her vital organs.

A SEATTLE TIMES INVESTIGATION / PART 4

Page 9: Data Journalism 101 - Part 1 by Michael J. Berens

Tracking fraudulent medical devices and profiteers

Page 10: Data Journalism 101 - Part 1 by Michael J. Berens
Page 11: Data Journalism 101 - Part 1 by Michael J. Berens
Page 12: Data Journalism 101 - Part 1 by Michael J. Berens

Follow the Information

• You’ve received an unsolicited email from a doctor who claims that scores of pain patients have accidentally died from methadone overdoses.

• The doctor claims that the State of Washington pushes methadone as a “preferred drug” because it’s the least expensive.

• The doctor claims the state fails to warn patients about the unique risks of methadone.

Page 13: Data Journalism 101 - Part 1 by Michael J. Berens

Find the data sources

• Death certificates – Track cause of death and number of overdose victims

• ARCOS Database – Created by the U.S. Drug Enforcement Administration to track controlled substances

• In-patient hospital database – Created by a dozen or so states to track types of hospitalizations

• My own questions – How many patients also took benzodiazepines? Etc.
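The "my own questions" item can be read as a simple count over records once the data is in hand. The sketch below is one possible way to ask it in Python. The rows, the `drugs` field name and the short benzodiazepine list are made up for illustration; they are not real data and not the layout of any actual death-certificate file.

```python
# Illustrative rows only: these are invented examples, not real records.
records = [
    {"id": 1, "drugs": ["methadone"]},
    {"id": 2, "drugs": ["methadone", "diazepam"]},
    {"id": 3, "drugs": ["oxycodone"]},
    {"id": 4, "drugs": ["methadone", "alprazolam"]},
]

# A short, incomplete list of benzodiazepines used for matching.
BENZODIAZEPINES = {"diazepam", "alprazolam", "lorazepam", "clonazepam"}


def count_methadone_with_benzos(rows):
    """Return (methadone cases, methadone cases that also list a benzodiazepine)."""
    methadone = [r for r in rows if "methadone" in r["drugs"]]
    with_benzo = [r for r in methadone if BENZODIAZEPINES & set(r["drugs"])]
    return len(methadone), len(with_benzo)


print(count_methadone_with_benzos(records))  # (3, 2)
```

Real files would likely need cleanup first (spelling variants, brand names, drugs buried in free text), so a count like this is a starting point for reporting, not a finding.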

Page 14: Data Journalism 101 - Part 1 by Michael J. Berens

Step 1

Request the file layout

Page 15: Data Journalism 101 - Part 1 by Michael J. Berens

Fields, position, type, length

Field Number | Variable | Type | Format | Label | Comment

1 | SEQ_NO | Char | $10. | Sequence Number | Unique sequence number assigned to each record within a year. First four digits are the year of discharge.

2 | REC_KEY | Num | 11. | Record Key | Unique number assigned to each CHARS record. Added in 2003.

3 | STAYTYPE | Char | $1 | Type of Stay | 1 = Inpatient; 2 = Observation patient

4 | HOSPITAL | Char | $4 | Hospital Number | DOH assigned hospital number. Fourth character describes the Medicare certified unit type with: blank = acute care; R = Rehabilitation unit; P = Psychiatric unit; S = Swing bed unit. Discontinued codes: A = Alcohol (discontinued after 1992); B = Bone marrow transplants (discontinued after 2000); E = Extended care (discontinued after 2001); H = Tacoma General & Group Health combined (discontinued after 1992); I = Group Health only at Tacoma General (discontinued after 1992)

5 | LINENO | Num | 3. | Number of Reported Revenue Items Codes |

6 | ZIPCODE | Char | $5 | Patient's Zip Code | 99999 indicates the zip code is unknown. 99998 indicates homelessness (some homeless patients may have a zip code for a shelter or other temporary location). Blanks indicate non-U.S. residence.

7 | STATERES | Char | $2 | State of Residence | State abbreviation used by U.S. Postal Service. This is assigned from the zip code. Residents with zip code 99998 are assigned to Washington. XX = invalid zip code or a non-U.S. residence.
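The comments in a file layout are what turn raw codes into something reportable. As a rough illustration, the Python sketch below translates a few of the codes described in this layout (STAYTYPE, the fourth character of HOSPITAL, and the special ZIPCODE values). The function and dictionary names are hypothetical, and the discontinued unit codes are lumped into a single catch-all for brevity.

```python
# Code-to-label lookups taken from the layout comments above.
STAYTYPE_LABELS = {"1": "Inpatient", "2": "Observation patient"}

UNIT_TYPES = {
    " ": "acute care",
    "R": "Rehabilitation unit",
    "P": "Psychiatric unit",
    "S": "Swing bed unit",
}


def describe_zip(zipcode):
    """Translate the special ZIPCODE values; pass ordinary zip codes through."""
    if zipcode.strip() == "":
        return "non-U.S. residence"
    if zipcode == "99999":
        return "unknown"
    if zipcode == "99998":
        return "homeless"
    return zipcode


def unit_type(hospital):
    """Look up the unit type from the fourth character of HOSPITAL."""
    # Pad to four characters so a trimmed value like "123" reads as a blank.
    code = hospital.ljust(4)[3]
    return UNIT_TYPES.get(code, "other or discontinued code")
```

For example, `unit_type("123R")` gives "Rehabilitation unit", and `describe_zip("99998")` gives "homeless". Without the layout, both values would just look like ordinary identifiers.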

Page 16: Data Journalism 101 - Part 1 by Michael J. Berens
Page 17: Data Journalism 101 - Part 1 by Michael J. Berens
Page 18: Data Journalism 101 - Part 1 by Michael J. Berens
Page 19: Data Journalism 101 - Part 1 by Michael J. Berens

Fixed length vs. delimited

• Fixed Length

  • The data fields measure a specific number of characters

  • Field 1 = 10 characters long

  • File layout is critical

• Delimited

  • The data fields are separated by a common character or mark

  • Like a comma or tab

• Always ask for “text delimited data,” which is easier to import than fixed length
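The difference shows up clearly when the same record is read both ways. The Python sketch below is a minimal illustration, not the workshop's own method. It assumes the seven fields from the layout slide are stored back to back, in order, with the widths given in the Format column; the sample record is invented.

```python
import csv
import io

# (field name, width) pairs; widths assumed from the Format column of the layout.
LAYOUT = [
    ("SEQ_NO", 10), ("REC_KEY", 11), ("STAYTYPE", 1), ("HOSPITAL", 4),
    ("LINENO", 3), ("ZIPCODE", 5), ("STATERES", 2),
]


def parse_fixed(line, layout=LAYOUT):
    """Slice one fixed-length line into fields using the layout widths."""
    record, start = {}, 0
    for name, width in layout:
        # strip() drops padding; skip it if blanks carry meaning in a field.
        record[name] = line[start:start + width].strip()
        start += width
    return record


def parse_delimited(text, delimiter=","):
    """Read delimited text whose first row holds the field names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Invented sample record, written once in each format.
fixed_line = "2010000001" + "00000012345" + "1" + "123R" + "  7" + "98101" + "WA"
delimited_text = (
    "SEQ_NO,REC_KEY,STAYTYPE,HOSPITAL,LINENO,ZIPCODE,STATERES\n"
    "2010000001,12345,1,123R,7,98101,WA\n"
)
```

The fixed-length parser is useless without the layout, while the delimited file carries its own field names in the header row. That is the practical reason to ask for delimited text.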

Page 20: Data Journalism 101 - Part 1 by Michael J. Berens

Make a master copy
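One way to act on this, sketched in Python under assumptions of my own: copy the file as received into a separate folder, mark the copy read-only, and record a checksum so later working copies can be compared against it. The function name and folder arrangement are hypothetical, and the whole file is read into memory, which is fine for a sketch but not for very large files.

```python
import hashlib
import os
import shutil
import stat


def make_master_copy(source_path, master_dir):
    """Copy a data file into master_dir, make it read-only, return (path, sha256)."""
    os.makedirs(master_dir, exist_ok=True)
    master_path = os.path.join(master_dir, os.path.basename(source_path))
    shutil.copy2(source_path, master_path)   # copy2 also preserves timestamps
    os.chmod(master_path, stat.S_IREAD)      # guard against accidental edits
    with open(master_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return master_path, digest
```

All cleaning and analysis then happens on working copies, and the checksum gives a quick way to confirm the master has not changed.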

Page 21: Data Journalism 101 - Part 1 by Michael J. Berens

Keep a log

Page 22: Data Journalism 101 - Part 1 by Michael J. Berens

Delimited file

Page 23: Data Journalism 101 - Part 1 by Michael J. Berens

Hands On - Hunting Database

Page 24: Data Journalism 101 - Part 1 by Michael J. Berens
Page 25: Data Journalism 101 - Part 1 by Michael J. Berens

Fixed width file

Page 26: Data Journalism 101 - Part 1 by Michael J. Berens
Page 27: Data Journalism 101 - Part 1 by Michael J. Berens