Martin Stabe, interactive producer, Financial Times

The data workflow: past, present, future. Martin Stabe, Financial Times, News:Rewired, May 27, 2011


The data workflow: past, present, future

Martin Stabe, Financial Times

News:Rewired May 27, 2011


“Computer assisted reporting”

• As the phrase suggests, harks back to an era when computerised analysis was rare
  – Its history can be traced to the 1950s, especially election coverage

• Key examples from the 1980s, especially US legal stories
  – Bringing social science methods to journalism
    • Statistics
    • Polling
    • GIS
    • Social network analysis


“Enterprise Joins”

• “Enterprise”
  – US journo jargon for a story somewhere between ‘off-diary’ and ‘investigative’

• “Join”
  – Database jargon for combining records from two tables
  – Matches records by finding fields whose content is common to both tables
  – May be complex, using ‘lookup tables’
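To make the database jargon concrete, here is a minimal sketch using Python's built-in sqlite3 module. All table names, columns and values are hypothetical: a table of schools keyed by postcode has no ward column, so a postcode-to-ward lookup table bridges it to ward-level statistics, the ‘complex’ case mentioned above.

```python
import sqlite3

# In-memory database; every table and value here is hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE schools (name TEXT, postcode TEXT);
CREATE TABLE postcode_lookup (postcode TEXT, ward TEXT);
CREATE TABLE ward_stats (ward TEXT, unemployment_rate REAL);

INSERT INTO schools VALUES ('Hillside Primary', 'AB1 2CD'), ('Riverside Academy', 'EF3 4GH');
INSERT INTO postcode_lookup VALUES ('AB1 2CD', 'North Ward'), ('EF3 4GH', 'South Ward');
INSERT INTO ward_stats VALUES ('North Ward', 4.5), ('South Ward', 9.1);
""")

# The lookup table maps postcode -> ward, so two joins are needed
# before the ward statistics can be attached to each school.
query = """
SELECT s.name, l.ward, w.unemployment_rate
FROM schools AS s
JOIN postcode_lookup AS l ON s.postcode = l.postcode
JOIN ward_stats AS w ON l.ward = w.ward
ORDER BY w.unemployment_rate DESC
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
```

An inner JOIN silently drops any school whose postcode is missing from the lookup table; a LEFT JOIN keeps such rows with NULLs, which makes gaps in the lookup easier to spot.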


“Enterprise Joins”

• In other words, finding stories by linking two datasets, especially ones not originally intended to be linked

• Often centred on common geographical records used across government
  – Postcodes (very good in the UK)
  – Statistical output areas
  – Administrative or electoral geographies
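Postcodes only work as a join key if both datasets write them the same way. A minimal sketch, assuming two hypothetical datasets whose postcodes differ only in case and spacing: normalise each postcode to upper case with a single space before the final three characters (the inward code), then match. The function does not check that a postcode is valid.

```python
def normalise_postcode(raw: str) -> str:
    """Upper-case a UK-style postcode and put one space before the
    final three characters (the inward code). No validity check is made."""
    compact = "".join(raw.split()).upper()
    if len(compact) < 5:
        return compact  # too short to split sensibly; leave for manual review
    return compact[:-3] + " " + compact[-3:]


# Hypothetical records from two separately collected datasets.
schools = [("Hillside Primary", "ab12cd"), ("Riverside Academy", " EF3  4GH ")]
crime_counts = {"AB1 2CD": 12, "EF3 4GH": 30}

# Join on the normalised postcode; unmatched postcodes map to None.
joined = {name: crime_counts.get(normalise_postcode(pc)) for name, pc in schools}
print(joined)
```

Without the normalisation step neither school would match, even though both postcodes are present in the second dataset.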


“Interviewing data”

• Database queries are like questions to an interviewee

• Data can be a reluctant source

• “Dirty” data: artifacts of
  – Data entry errors
  – Lack of coding conventions
  – Esoteric systems for storing stray data
  – Discrete collection by separate bodies (eg local authorities, government departments)
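One quick way to ‘interview’ a dirty column is to ask how many spellings it contains for what should be a single value. A minimal sketch with hypothetical authority names: collapse case and whitespace, expand a few known abbreviations, then count. The abbreviation map is an assumption you would build by inspecting the real data; tools such as Google Refine offer clustering features for the same job.

```python
from collections import Counter

# Hypothetical variants of the same authority, as entered by different clerks.
raw_values = [
    "Birmingham City Council",
    "BIRMINGHAM CITY COUNCIL",
    " Birmingham City Council ",
    "Birmingham CC",
    "Leeds City Council",
]

# Assumed abbreviation map, built by eyeballing the data.
ABBREVIATIONS = {"CC": "CITY COUNCIL"}


def clean(value: str) -> str:
    """Upper-case, collapse whitespace and expand known abbreviations."""
    words = value.upper().split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


before = Counter(raw_values)
after = Counter(clean(v) for v in raw_values)
print(len(before), "distinct raw values")      # prints: 5 distinct raw values
print(len(after), "distinct cleaned values")   # prints: 2 distinct cleaned values
print(after.most_common())
```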


Adding interactivity

• “Data is only useful if it is personal – I want to find out about schools in my area, restaurants near me and so on – or when it reveals something remarkable.” – Bella Hurrell


“The canvas for CAR”

• “The Web is the canvas for CAR, better than any other platform we’ve come up with as an industry. It has every advantage that should be available to the CAR practitioners, including unlimited depth, the ability to customize or personalize and the luxury of designing a database so that it will truly be useful to readers. Some papers get this, or are beginning to realize it.” – Derek Willis


“A fundamental change”

• “Newspapers need to stop the story-centric worldview. … So much of what local journalists collect day-to-day is structured information: the type of information that can be sliced-and-diced, in an automated fashion, by computers. Yet the information gets distilled into a big blob of text -- a newspaper story -- that has no chance of being repurposed.”– Adrian Holovaty
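Holovaty's contrast between structured information and a “blob of text” is easier to see in code. A minimal sketch with hypothetical fields and values: the facts a crime brief would bury in prose are stored as records, which a program can then slice by any field.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Incident:
    """One structured fact that might otherwise end up in a news brief."""
    day: date
    category: str
    neighbourhood: str


# Hypothetical records.
incidents = [
    Incident(date(2011, 5, 1), "burglary", "Northside"),
    Incident(date(2011, 5, 3), "burglary", "Southside"),
    Incident(date(2011, 5, 3), "vandalism", "Northside"),
    Incident(date(2011, 5, 9), "burglary", "Northside"),
]

# "Sliced-and-diced": the same records answer several different questions.
by_category = Counter(i.category for i in incidents)
northside_burglaries = [
    i.day for i in incidents
    if i.neighbourhood == "Northside" and i.category == "burglary"
]
print(by_category.most_common())
print(northside_burglaries)
```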


The data workflow

• Obtain data
  – Open data releases
  – Advanced search
  – Screen scraping
  – Freedom of Information Act
  – APIs, Web

• Clean, analyse and warehouse data
  – Excel
  – Google Refine
  – Google Fusion Tables
  – Visokio Omniscope (or Tableau)
  – Stata (or SPSS, SAS, R)
  – ArcView (or other GIS tools)
  – MySQL (or other database manager)

• Publish data
  – Google Fusion Tables
  – Static XML (via FTP)
  – Dynamic XML (via PHP)
    • Parsed by ActionScript in Flash
    • Parsed by JavaScript
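Of the routes for obtaining data, screen scraping is the one that most needs code. A minimal sketch using only Python's standard-library html.parser: it collects the cell text of every row in an HTML table. The sample HTML is hypothetical; a real page would be downloaded first (for example with urllib.request) and usually needs site-specific handling.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment.
html = """
<table>
  <tr><th>Council</th><th>Spending (£m)</th></tr>
  <tr><td>Northtown</td><td>12.5</td></tr>
  <tr><td>Southville</td><td>8.0</td></tr>
</table>
"""
scraper = TableScraper()
scraper.feed(html)
scraper.close()
print(scraper.rows)
```

The scraped values are still strings; converting “12.5” to a number belongs to the cleaning step.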


The data workflow

• Visualising a complex dataset
  – Bank debt exposure data

• Monitor site for updates
• CSV source
• Clean in Excel
• Import to MySQL database
• Generate SQL query
• Publish XML
• Parse with ActionScript
• Publish with Flash
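The middle of that pipeline (CSV in, clean, load into a database, query, publish XML) can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the FT's actual code: sqlite3 stands in for MySQL, a small cleaning function stands in for the Excel step, and the column names and figures are hypothetical placeholders. The resulting XML is the kind of file a Flash or JavaScript front end would then parse.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical CSV source with typical mess: stray spaces and a missing value.
RAW_CSV = """creditor,borrower,exposure_bn
Creditor A ,Country X,"56.7"
Creditor B,Country X,n/a
Creditor B, Country Y ,"138.6"
"""


def clean_row(row):
    """Trim whitespace and convert exposure to a float (None if missing)."""
    value = row["exposure_bn"].strip()
    exposure = float(value) if value not in ("", "n/a") else None
    return row["creditor"].strip(), row["borrower"].strip(), exposure


# Import the cleaned rows into a database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exposure (creditor TEXT, borrower TEXT, amount_bn REAL)")
rows = [clean_row(r) for r in csv.DictReader(io.StringIO(RAW_CSV))]
conn.executemany("INSERT INTO exposure VALUES (?, ?, ?)", rows)

# The query that feeds the graphic: known exposures, largest first.
result = conn.execute(
    "SELECT creditor, borrower, amount_bn FROM exposure "
    "WHERE amount_bn IS NOT NULL ORDER BY amount_bn DESC"
).fetchall()

# Publish as XML for a front end to parse.
root = ET.Element("exposures")
for creditor, borrower, amount in result:
    ET.SubElement(
        root, "link", creditor=creditor, borrower=borrower, amount_bn=str(amount)
    )
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Keeping the cleaning rules in code rather than in manual spreadsheet edits means they can be re-run unchanged each time the source updates.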


• [Screenshot: BIS monitoring]
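The first step of the pipeline, monitoring the source for updates, can be as simple as fingerprinting the downloaded file and comparing it with the fingerprint saved last time. A minimal sketch: `fetch` uses the standard-library urllib.request with a URL you supply, and storing the previous hash between runs is left to you; only the comparison logic is exercised below, on hypothetical file contents.

```python
import hashlib
import urllib.request


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of the downloaded file."""
    return hashlib.sha256(content).hexdigest()


def fetch(url: str) -> bytes:
    """Download the current version of the source file."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def check_for_update(content: bytes, previous_hash):
    """Return (changed, new_hash). A missing previous hash counts as a change."""
    new_hash = fingerprint(content)
    return new_hash != previous_hash, new_hash


# Offline demonstration with two hypothetical versions of a CSV file.
v1 = b"creditor,borrower,exposure_bn\nCreditor A,Country X,56.7\n"
v2 = v1 + b"Creditor B,Country Y,138.6\n"

changed, h1 = check_for_update(v1, None)
print(changed)  # True: nothing stored yet
changed, _ = check_for_update(v1, h1)
print(changed)  # False: same file
changed, _ = check_for_update(v2, h1)
print(changed)  # True: new rows
```

In practice this would run on a schedule, calling `fetch` on the real source and triggering the rest of the pipeline only when `changed` is True. A hash flags any byte-level change, including trivial ones such as a new timestamp in a header, so a noisy source may need a field-level comparison instead.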


The data workflow: the future

• Shifted from static to dynamic output

• Next step is automating the input side
  – Source APIs
  – Web scraping
  – “The web as database”


• Adding social media on input and output
  – Crowdsourcing (Guardian MP expenses)
  – Games and viral promotion (NYT budget cutter)
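Automating the input side usually means consuming a source API instead of downloading files by hand. A minimal sketch, assuming a hypothetical JSON API that returns results a page at a time with a `next` link; the field names `results` and `next` and the example.com URLs are placeholders, since real APIs paginate in different ways. The fetching function is passed in, so the paging logic can run offline against canned responses.

```python
import json
from typing import Callable, Iterator, Optional


def iter_records(first_url: str, get_json: Callable[[str], dict]) -> Iterator[dict]:
    """Follow 'next' links page by page, yielding every record in 'results'.
    'results' and 'next' are assumed field names, not a real API's schema."""
    url: Optional[str] = first_url
    while url:
        page = get_json(url)
        yield from page.get("results", [])
        url = page.get("next")


# Canned responses standing in for a live API (hypothetical URLs and data).
PAGES = {
    "https://api.example.com/spending?page=1": json.dumps({
        "results": [{"dept": "Health", "amount": 120}],
        "next": "https://api.example.com/spending?page=2",
    }),
    "https://api.example.com/spending?page=2": json.dumps({
        "results": [{"dept": "Transport", "amount": 45}],
        "next": None,
    }),
}


def fake_get_json(url: str) -> dict:
    """Offline stand-in for an HTTP GET that decodes a JSON body."""
    return json.loads(PAGES[url])


records = list(iter_records("https://api.example.com/spending?page=1", fake_get_json))
print(records)
```

For live use, `get_json` could open the URL with `urllib.request.urlopen` and decode the body with `json.load`; everything downstream stays the same, which is what makes the web usable as a database.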


Cleaning data


www.ft.com/interactive

www.martinstabe.com


@martinstabe