sneak preview: local news engine (will perrin)

Local News Engine Less time, more scrutiny

Upload: datajournalismuk

Post on 17-Jan-2017


TRANSCRIPT

Page 1: Sneak preview: Local News Engine (Will Perrin)

Local News Engine

Less time, more scrutiny

Page 2: Sneak preview: Local News Engine (Will Perrin)

Problem - more local scrutiny data than we can keep up with

1 - solo reporters/journalists/bloggers

2 - papers with (sadly) diminishing editorial staff

3 - civil society

4 - councils themselves

Page 3: Sneak preview: Local News Engine (Will Perrin)

The trends aren’t going to reverse

Massive increase in local accountability (devolution)

Increases in local data

People won’t scrutinise this stuff themselves

Armchair auditors didn’t work

Resources in media declining

Bad for communities, democracy

How can we do more scrutiny with fewer resources?

Page 4: Sneak preview: Local News Engine (Will Perrin)

Indexed by subject

Indexed by name

Page 5: Sneak preview: Local News Engine (Will Perrin)
Page 6: Sneak preview: Local News Engine (Will Perrin)

Local News Engine

Prototype funded by Google Digital News Initiative – €50,000

AT LAST I CAN BUY IN PROPER CODERS

Open Data Services Co-operative – world class

Page 7: Sneak preview: Local News Engine (Will Perrin)

Pile up the newsworthy scrutiny data

My patch covers parts of two central London boroughs - Camden (very large) and Islington (very small)

Data about building or altering houses, opening or changing pubs, bars and clubs, sex shops, gambling establishments, people due to be in court.

Camden planning applications - data store download

Camden commercial licensing - scraped

Islington planning applications - scraped

Islington commercial licensing - scraped

Magistrates Court list (upcoming cases) - parsed from PDF to data

This is novel (we think)

Page 8: Sneak preview: Local News Engine (Will Perrin)

Datastore v scrapers – no contest

on the time that the scrapers take to run, and the range of data that’s included in them. In order to speed up the scrapers and to ensure that the data was comparable, we spun up some VMs on Google Compute Engine to run the scrapers.

Camden License: 38.4h runtime, data back to 2005

Camden Planning: 2 min runtime, data back to 2010

Islington License: 39.5h runtime, data back to 2006

Islington Planning: 16.2h runtime, data back to 2006

Page 9: Sneak preview: Local News Engine (Will Perrin)

Sort out the newsworthy people

By names - a newsworthy person appearing in a newsworthy data set could be newsworthy. (Very) literally, everyone who has been in the newspaper is newsworthy.

Performed entity extraction on Camden New Journal and Islington Tribune, producing all the names of people and companies who had appeared in them.

Run geospatial search for all data with addresses in target area

Run list of 1,000-odd names from entity extraction as a search
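The name search step could be sketched roughly as below. This is a minimal illustration and not the project's actual code: the record layout (dicts with a `name` field), the `normalise` helper and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(name.lower().split())


def search_names(names, records, field="name"):
    """Return {normalised name: [matching records]} for every extracted name found in the data.

    `names` stands in for the list produced by entity extraction;
    `records` stands in for rows from the planning, licensing or court data.
    """
    wanted = {normalise(n) for n in names}
    hits = defaultdict(list)
    for record in records:
        key = normalise(record.get(field, ""))
        if key in wanted:
            hits[key].append(record)
    return dict(hits)


# Hypothetical sample data, for illustration only.
extracted = ["Cherry Smith", "Example Pub Co Ltd"]
data = [
    {"name": "cherry  smith", "source": "planning"},
    {"name": "Someone Else", "source": "licensing"},
]
matches = search_names(extracted, data)
```

Exact matching on a normalised string is the simplest possible choice; it will miss initials, titles and misspellings, so a real run over 1,000-odd names would probably need fuzzier comparison.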

Page 10: Sneak preview: Local News Engine (Will Perrin)

My Kings Cross patch covers bits of two London Boroughs

Page 11: Sneak preview: Local News Engine (Will Perrin)

Sort newsworthy places

By place - simple things happening in some places are news in their own right - eg a planning application or someone in court.

Users have a wide definition of what is an interesting place - for some the whole borough, for others a particular ward/street

All the data has reasonable address information

Define area of interest by wards (for now - this can be made more precise, down to SOAs)

Perform geospatial search
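A geospatial search by ward can be sketched as a point-in-polygon test. The example below is a sketch under stated assumptions: it assumes each record has already been geocoded to longitude/latitude, and the ward boundary is a made-up rectangle, not a real ward. The ray-casting test is a standard technique swapped in for illustration; the slides do not say which method the project used.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: count how many polygon edges a ray going east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def records_in_wards(records, wards):
    """Keep records whose (lon, lat) falls inside any ward polygon of interest."""
    return [
        r for r in records
        if any(point_in_polygon(r["lon"], r["lat"], poly) for poly in wards.values())
    ]


# Hypothetical ward boundary (a rectangle) and hypothetical geocoded records.
wards = {"example-ward": [(-0.13, 51.53), (-0.11, 51.53), (-0.11, 51.54), (-0.13, 51.54)]}
records = [
    {"id": "inside", "lon": -0.12, "lat": 51.535},
    {"id": "outside", "lon": -0.20, "lat": 51.535},
]
selected = records_in_wards(records, wards)
```

Swapping the ward polygons for smaller SOA polygons would narrow the area of interest without changing the search itself.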

Page 12: Sneak preview: Local News Engine (Will Perrin)

Data Issues - DPA exemptions

Data is published by arms of government for public scrutiny. Special purposes exemption in DPA covers processing:

‘This exemption protects freedom of expression in journalism, art and literature (which are known as the ‘special purposes’).

The scope of the exemption is very broad and it can exempt from most provisions of the DPA, including subject access – but never principle 7 or the section 55 offence (unlawful obtaining etc of personal data).

However it does not give an automatic blanket exemption. In order for the exemption to apply:

the data must be processed only for journalism, art or literature,

it must be being processed with a view to publication,

you must have a reasonable belief that the publication is in the public interest, and

you must have a reasonable belief that compliance with the DPA is incompatible with journalism, art or literature.’

Page 13: Sneak preview: Local News Engine (Will Perrin)

Data issues - access and licensing

Council data mainly had to be scraped - only one dataset in a modern data store.

Data therefore not licensed properly; asked council, they relaxed

Court PDFs can be accessed by a journalist with good reason, but each court varies. Court information is very sensitive: it contains juveniles, cases with reporting restrictions etc. It must be handled with great care because of the risk of contempt and no-fault defamation.

British principle of open justice behind access, but poorly implemented.

Page 14: Sneak preview: Local News Engine (Will Perrin)

Issues and questions

Ethics (for citizens) – extension of journalism ethics as scrutiny becomes

Despite open data, accessibility of local data is rubbish

Still requires good coding skills – ODS world class – code on Talk About Local Github

Court lists – Japanese puffer fish of data

Page 15: Sneak preview: Local News Engine (Will Perrin)

Sorting Criteria (emerging)

Names

Broadly based on proper noun (‘named entity’) extraction from CNJ and Islington Tribune and people who crop up more than once.
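The "crop up more than once" filter can be illustrated with a deliberately naive sketch. It stands in for real named entity extraction by treating any run of two or more capitalised words as a candidate name, which is a simplifying assumption and not the method used in the project. The sample article text is invented.

```python
import re
from collections import Counter

# Naive stand-in for named entity extraction: two or more capitalised words in a row.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def count_candidate_names(articles):
    """Count every candidate name across all article texts."""
    counts = Counter()
    for text in articles:
        counts.update(NAME_PATTERN.findall(text))
    return counts


def recurring_names(counts, minimum=2):
    """Keep only names that appear at least `minimum` times."""
    return sorted(name for name, seen in counts.items() if seen >= minimum)


# Invented article snippets, for illustration only.
articles = [
    "Cherry Smith objected to the plan.",
    "The committee heard from Cherry Smith again.",
    "Speaking later, John Doe declined to comment.",
]
names = recurring_names(count_candidate_names(articles))
```

A pattern like this will also pick up sentence openers and place names, so its output would need review before being used as a search list.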

‘A name will appear in the search results if ANY of its related entries match the search criteria. So, in the case of the SMITH record, Mrs Cherry Smith had a planning application in Caledonian ward in 2006 to cut down a tree, hence the match.

We’ve got a couple of ideas for solutions. One is to show the date of the most recent match on the result; another is to expose a date UI. They are of differing complexities, though.’
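The "ANY entry matches" rule, together with the idea of showing the most recent match, might look something like the sketch below. The `Entry` type, its fields and the second sample entry are hypothetical; only the 2006 Caledonian ward planning application comes from the example above, and only the year is modelled because that is all the example gives.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Entry:
    source: str  # e.g. "planning", "licensing", "court" (hypothetical labels)
    ward: str
    year: int


def most_recent_match(entries: Iterable[Entry],
                      criteria: Callable[[Entry], bool]) -> Optional[int]:
    """Return the year of the latest entry meeting the criteria, or None if nothing matches.

    A name counts as a search hit when this returns a year, i.e. when ANY entry matches.
    """
    years = [e.year for e in entries if criteria(e)]
    return max(years) if years else None


smith_entries = [
    Entry(source="planning", ward="Caledonian", year=2006),  # from the example above
    Entry(source="licensing", ward="Elsewhere", year=2015),  # hypothetical extra entry
]
latest = most_recent_match(smith_entries, lambda e: e.ward == "Caledonian")
```

Returning the year alongside the hit lets a result page show that the SMITH match dates from 2006, so a reader can discount it without any extra date controls.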

Areas

‘We try to get one or more locations associated with a result (eg address of defendant, location of crime, location of planning application) and if one or more of those locations is either in the postcode prefix list "N1", "N7", "WC1", "NW1" or the words “islington” or “camden” appear in a field that we think might contain a description of a location, it matches.’
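The area rule quoted above can be sketched as follows. Two assumptions are made that the quote does not state: the prefix is compared against the outward part of a postcode, so that "N1" does not also match "N19", and both checks are applied to every location string rather than to separate address and description fields. The sample addresses are invented.

```python
import re

POSTCODE_PREFIXES = ("N1", "N7", "WC1", "NW1")  # from the rule quoted above
AREA_WORDS = ("islington", "camden")            # from the rule quoted above

# Loose UK postcode pattern: outward code, optional space, inward code.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b")


def outward_matches(outward: str) -> bool:
    """True if the outward code is a listed prefix, optionally followed by one letter (e.g. WC1X)."""
    for prefix in POSTCODE_PREFIXES:
        if outward == prefix:
            return True
        if (outward.startswith(prefix)
                and len(outward) == len(prefix) + 1
                and outward[-1].isalpha()):
            return True
    return False


def location_matches(text: str) -> bool:
    """True if the text names a target borough or contains a postcode in the target districts."""
    if any(word in text.lower() for word in AREA_WORDS):
        return True
    return any(outward_matches(m.group(1)) for m in POSTCODE_RE.finditer(text.upper()))


def result_matches(locations) -> bool:
    """A result matches if one or more of its associated locations matches."""
    return any(location_matches(text) for text in locations)
```

For example, `result_matches(["1 Example Street, London N1 9DX"])` would be true, while an address in N19 with no mention of either borough would not match.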

Page 16: Sneak preview: Local News Engine (Will Perrin)
Page 17: Sneak preview: Local News Engine (Will Perrin)