Linking UK Government Data, John Sheridan
DESCRIPTION
Keynote presentation by John Sheridan at the OGD2011 conference on 16 June 2011 in Vienna: Linking UK Government Data (in English).
TRANSCRIPT
John Sheridan
Linked Data lead for data.gov.uk
Head of Legislation Services at The [UK] National Archives
16. GOVERNMENT TRANSPARENCY
The Government believes that we need to throw open the doors of public bodies, to enable the public to hold politicians and public bodies to account. We also recognise that this will help to deliver better value for money in public spending, and help us achieve our aim of cutting the record deficit. Setting government data free will bring significant economic benefits by enabling businesses and non-profit organisations to build innovative applications and websites.
We will require public bodies to publish online the job titles of every member of staff and the salaries and expenses of senior officials paid more than the lowest salary permissible in Pay Band 1 of the Senior Civil Service pay scale, and organograms that include all positions in those bodies.
We will ensure that all data published by public bodies is published in an open and standardised format, so that it can be used easily and with minimal cost by third parties.
Formats for people: focused on presentation or typographic layout. Look good, but hard to access the underlying data.
Formats for machines: focused on data interchange between computers. Look dreadful and are hard for people to understand, but easy to import into other systems and use.
Single source of data
Formats for people: focused on presentation or typographic layout
Formats for machines: focused on data interchange between computers
Download
Good for static information
Small files
Used for export/import
Easy for publishers
Most of the data registered on data.gov.uk
Programmatic access
Good for dynamic or real-time information or very large datasets
Lets developers select and use just the information they need
Retains more control for the publisher
More complicated to implement but much more powerful
Vital for many useful datasets
He also developed the first industrially practical screw-cutting lathe in 1800, allowing standardisation of screw thread sizes for the first time. This allowed the concept of interchangeability (an idea that was already taking hold) to be practically applied to nuts and bolts. Before this, all nuts and bolts had to be made as matching pairs only. This meant that when machines were disassembled, careful account had to be kept of the matching nuts and bolts ready for when reassembly took place.
http://en.wikipedia.org/wiki/Henry_Maudslay
In 1841, Joseph Whitworth created a design that, through its adoption by many British railroad companies, became a national standard for the United Kingdom called British Standard Whitworth. During the 1840s through 1860s, this standard was often used in the United States and Canada as well, in addition to myriad intra- and inter-company standards.
http://en.wikipedia.org/wiki/Screw_thread#History_of_standardization
* make your stuff available on the Web (whatever format) under an open licence
** make it available as structured data (e.g., Excel instead of image scan of a table)
*** use non-proprietary formats (e.g., CSV instead of Excel)
**** use URIs to identify things, so that people can point at your stuff
***** link your data to other data to provide context
Give names, or web identifiers (URIs), to things
Publish information about them as Web Resources
Use RDF triples (subject, property, value)
Link to other data about those things
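The four principles above can be illustrated with a very small sketch. It models a graph as a set of (subject, property, value) tuples in plain Python rather than with an RDF library. The example.org and example.com URIs and the school name are made up for illustration; only the two W3C property URIs are real.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str  # the thing being described, named by a URI
    prop: str     # the property, also named by a URI
    value: str    # a literal value or the URI of another thing


RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical data: one school, a label for it, and a link to other data.
SCHOOL = "http://example.org/id/school/1"
graph = {
    Triple(SCHOOL, RDFS_LABEL, "Example Primary School"),
    Triple(SCHOOL, OWL_SAME_AS, "http://example.com/other/school-1"),
}


def describe(graph, subject):
    """Return every (property, value) pair published about one subject."""
    return sorted((t.prop, t.value) for t in graph if t.subject == subject)


for prop, value in describe(graph, SCHOOL):
    print(prop, "->", value)
```

The second triple is the "link to other data" step: its value is another URI that a consumer can follow.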
Enables web-scale data publishing - distributed publication with web-based discovery mechanisms
Everything is a resource – follow your nose to discover more about properties, classes, or codes within a code list
Everything can be annotated - make comments about observations, data series, points on a map
Easy to extend - create new properties as required, no need to plan everything up-front
Easy to merge - slot together RDF graphs, no need to worry about name clashes
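The "easy to merge" point can be sketched as a set union. Because every name is a full URI, two graphs from different publishers can be combined without renaming anything. This is a simplified illustration in plain Python; all URIs and values are hypothetical, and blank nodes, which need more care in real RDF, are ignored.

```python
def merge(*graphs):
    """Merge graphs by taking the set union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


# Hypothetical property and identifier URIs.
DEPT = "http://example.org/id/department/x"
NAME = "http://example.org/def/name"
PAYER = "http://example.org/def/payer"

# Two independently published graphs that mention the same department.
organisation_data = {
    (DEPT, NAME, "Department X"),
}
spending_data = {
    ("http://example.org/id/payment/1", PAYER, DEPT),
    (DEPT, NAME, "Department X"),  # duplicate statement, collapses on merge
}

combined = merge(organisation_data, spending_data)
print(len(combined))
```

The duplicated statement appears once in the result, and the payment now sits in the same graph as the name of the department that made it.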
developing standards for responsible publishing of key types of data (financial data, organisation data, aggregate statistics, location data)
developing guidance, practices and tools that make it easy to publish data in Linked Data form, at low cost
making it easy for people to consume data in a programmatic way
     2008    2009    2010
A   1,345   1,456   2,301
B   2,112   3,543   2,111
C   2,345   2,987   2,455
D   6,342   6,256   6,123
E   7,435   7,432   8,102
Transaction   Date         Supplier            Amount
A-1263        09/09/2010   Spottiswoode & Co   £ 2,345
A-1264        09/09/2010   JSB & Sons          £ 2,111
A-1265        09/09/2010   BLG Ltd             £ 2,455
A-1266        09/09/2010   Spottiswoode & Co   £ 6,123
A-1267        09/09/2010   BLG Ltd             £ 8,102
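As a rough illustration of consuming a spending table like this one programmatically, the sketch below parses the rows into typed values and totals them by supplier. The field names and the day/month/year reading of the date are assumptions; the figures are copied from the table.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# Rows copied from the sample table: reference, date, supplier, amount.
ROWS = [
    ("A-1263", "09/09/2010", "Spottiswoode & Co", "£ 2,345"),
    ("A-1264", "09/09/2010", "JSB & Sons", "£ 2,111"),
    ("A-1265", "09/09/2010", "BLG Ltd", "£ 2,455"),
    ("A-1266", "09/09/2010", "Spottiswoode & Co", "£ 6,123"),
    ("A-1267", "09/09/2010", "BLG Ltd", "£ 8,102"),
]


def parse_row(reference, day, supplier, amount):
    """Turn display strings into typed values (date, Decimal)."""
    return {
        "transaction": reference,
        "date": datetime.strptime(day, "%d/%m/%Y").date(),  # assumed dd/mm/yyyy
        "supplier": supplier,
        "amount": Decimal(amount.replace("£", "").replace(",", "").strip()),
    }


def total_by_supplier(rows):
    """Sum the amounts paid to each supplier."""
    totals = defaultdict(Decimal)
    for row in rows:
        record = parse_row(*row)
        totals[record["supplier"]] += record["amount"]
    return dict(totals)


print(total_by_supplier(ROWS))
```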
Director General
Director (Operations)
Director (Strategy)
Deputy Director (A)
Deputy Director (A)
URI = uniform resource identifier
Everything starts with HTTP, which gives us actionable names
There is choice about how to make URIs
We are using {sector}.data.gov.uk/id/{something}
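A minimal sketch of the {sector}.data.gov.uk/id/{something} pattern: one helper fills in the template and another splits an identifier back into its parts. The function names are hypothetical, and the exact shape of the {something} part for each sector is an assumption; the school URI used below is one of the examples listed later in the talk.

```python
from urllib.parse import quote, urlsplit


def mint_uri(sector, *path):
    """Fill in the {sector}.data.gov.uk/id/{something} template."""
    something = "/".join(quote(str(part), safe="") for part in path)
    return f"http://{sector}.data.gov.uk/id/{something}"


def split_uri(uri):
    """Recover the sector and the path segments from an identifier URI."""
    parts = urlsplit(uri)
    sector = parts.hostname.split(".")[0]
    segments = parts.path.split("/")  # e.g. ['', 'id', 'school', '341451']
    if len(segments) < 3 or segments[1] != "id":
        raise ValueError(f"not an identifier URI: {uri}")
    return sector, segments[2:]


uri = mint_uri("education", "school", 341451)
print(uri)
print(split_uri(uri))
```

Treating the URI as opaque is still the safer habit for consumers; splitting it is shown only to make the naming pattern visible.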
If you visit legislation.gov.uk you will see we have taken great care with naming things
Returns an html document for United Kingdom Public General Act (ukpga), 2005, Chapter 14, Section 1
Returns an html document with a list from all legislation types where the title contains “wildlife”
UK Public General Act (ukpga) 1981 Chapter 69 Section 5
As it extends to England
As it stood on 30th January 2001
Displayed as an HTML document with the timeline on
Although URIs are opaque, having this type of design changes how people use the service
Everything on legislation.gov.uk is available as open data under the terms of our Open Government Licence
To access the data, visit any page and add:
/data.xml
/data.rdf
/data.xht
For lists: /data.feed
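The suffix rule above can be sketched as a small helper that derives the data URL from a page URL. The helper name and the short format keys are hypothetical, and the page URL in the example is assumed for illustration. No request is made here.

```python
# Suffixes taken from the slide; the dictionary keys are just local labels.
SUFFIXES = {
    "xml": "/data.xml",
    "rdf": "/data.rdf",
    "xhtml": "/data.xht",
    "feed": "/data.feed",  # for list pages
}


def data_url(page_url, fmt):
    """Append the data suffix for the requested format to a page URL."""
    if fmt not in SUFFIXES:
        raise ValueError(f"unknown format: {fmt}")
    return page_url.rstrip("/") + SUFFIXES[fmt]


# Assumed example page: a section of an Act on legislation.gov.uk.
page = "http://www.legislation.gov.uk/ukpga/2009/12/section/2"
for fmt in ("xml", "rdf", "xhtml"):
    print(data_url(page, fmt))
```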
Re-use where we can, create where we must
Small, high level, lightweight vocabularies
Examples include datacube, organization, provenance
Create local specialisations
Examples include payments, central-government
Post hoc linking
[Diagram: Data Cube vocabulary]
Classes shown: qb:DataSet, qb:Slice, qb:Observation, qb:DataStructureDefinition, qb:SliceKey, qb:ComponentSpecification, qb:ComponentProperty, qb:DimensionProperty, qb:AttributeProperty, qb:MeasureProperty, qb:CodedProperty, skos:ConceptScheme, skos:Concept, sdmx:Concept, sdmx:CodeList, sdmx:ConceptRole (sdmx:FrequencyRole, sdmx:CountRole, sdmx:EntityRole, sdmx:TimeRole, ...)
Properties shown: qb:structure, qb:dataset, qb:slice, qb:subSlice, qb:observation, qb:sliceKey, qb:sliceStructure, qb:componentProperty, qb:dimension, qb:attribute, qb:measure, qb:measureType, qb:codeList, qb:concept
qb:ComponentSpecification attributes: qb:componentRequired (boolean), qb:componentAttachment (rdfs:Class), qb:order (xsd:int)
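The core idea behind the Data Cube terms above (a dataset made of observations, each identified by its dimension values and carrying a measure, with slices that fix some dimensions) can be sketched without any RDF tooling. The sketch below flattens two rows of the sample statistics table from earlier in the talk into one observation per cell. The dimension names "row" and "year" are assumptions, since the slide does not say what A and B stand for.

```python
YEARS = (2008, 2009, 2010)

# Two rows copied from the sample statistics table.
TABLE = {
    "A": (1345, 1456, 2301),
    "B": (2112, 3543, 2111),
}


def to_observations(table, years):
    """Flatten a row-by-year table into one observation per cell."""
    return [
        {"row": row, "year": year, "value": value}
        for row, values in table.items()
        for year, value in zip(years, values)
    ]


def slice_by(observations, **fixed):
    """Select the observations that share the given dimension values."""
    return [
        obs for obs in observations
        if all(obs[name] == wanted for name, wanted in fixed.items())
    ]


observations = to_observations(TABLE, YEARS)
print(len(observations))
print([obs["value"] for obs in slice_by(observations, year=2009)])
```

Fixing the year dimension gives a column of the original table, and fixing the row dimension gives a time series, which is roughly the role a slice plays in the vocabulary.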
[Diagram: payment data model]
Classes shown: PaymentDataset, Payment, ExpenditureLine, Purchase, Item, ItemCategory, foaf:Agent, org:OrganizationalUnit, interval:Interval, skos:Concept
Other labels shown: qb:slice, qb:dataset, qb:structure, payer, payee, payment, unit, date, expenditureLine, expenditureCode, amountIncludingVAT, amountExcludingVAT, vatCategory, vatRate, order, invoice, contract, transactionReference, paymentReference, totalAmountIncludingVAT, totalAmountExcludingVAT, purchase, narrative, item, procurementCategory, redacted, capital, revenue
*new* Government Linked Data Working Group
Provenance Working Group
http://reference.data.gov.uk/id/day/2011-06-16
http://reference.data.gov.uk/id/department/CO
http://transport.data.gov.uk/id/station/WAT
http://education.data.gov.uk/id/school/341451
http://location.data.gov.uk/id/3245677362123
http://www.legislation.gov.uk/id/ukpga/2009/12/section/2
http://reference.data.gov.uk/id/day/2011-06-1
There are similar URIs for seconds, minutes, hours, weeks, months, quarters, years
We were a bit slow (170 years) to move from the Julian to Gregorian Calendar (see the Calendar Act, 1750)
To transition, we lost 11 days in 1752
Convoluted explanation of why the tax year in the UK starts on the 6th April
Our URIs for time intervals work this way too and the British time intervals URI Set is linked to the legislation
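A small sketch of the day pattern: it builds a day URI from a calendar date and reads the date back out. It follows the reference.data.gov.uk/id/day/ example listed earlier; the function names are hypothetical, and the patterns for other interval types are not shown because the slide does not spell them out. Python dates use the Gregorian calendar throughout, so the 1752 transition is not modelled.

```python
from datetime import date

BASE = "http://reference.data.gov.uk/id"


def day_uri(day):
    """Name a calendar day with a URI of the form .../id/day/YYYY-MM-DD."""
    return f"{BASE}/day/{day.isoformat()}"


def day_from_uri(uri):
    """Read the calendar day back out of a day URI."""
    prefix = f"{BASE}/day/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a day URI: {uri}")
    return date.fromisoformat(uri[len(prefix):])


talk_day = date(2011, 6, 16)
print(day_uri(talk_day))
print(day_from_uri(day_uri(talk_day)))
```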
Malcolm Gladwell article on Ron Popeil from 2000 in the New Yorker:
“And how do you persuade people to disrupt their lives? Not merely by ingratiation or sincerity, and not by being famous or beautiful. You have to explain the invention to consumers - not once or twice but three or four times, with a different twist each time. You have to show them exactly how it works and why it works, and make them follow your hands as you chop liver with it, and then tell them precisely how it fits into their routine, and, finally, sell them on the paradoxical fact that, revolutionary as the gadget is, it's not at all hard to use.”
Open Standard
Generic approach for creating APIs from Linked Data
Sits on top of a Linked Data store
Several implementations, most mature is Puelia
We will require public bodies to publish online the job titles of every member of staff and the salaries and expenses of senior officials paid more than the lowest salary permissible in Pay Band 1 of the Senior Civil Service pay scale, and organograms that include all positions in those bodies.
October 2010
CSV template and PDFs of organograms, typically authored using PowerPoint
Emphasis on visual appearance led to inconsistent datasets which are very hard to re-use
No relationship between the organogram and data
Not using web standards
“The Government has published the most comprehensive organisational charts of the UK Civil Service ever released online, taking another step towards its goal of being the most transparent government in the world and opening up the structure of the Civil Service to public scrutiny”
100s of UK Government Organisations have published their organisation data as Linked Data
Distributed data publishing
It is the largest number of organisations joining the Web of Linked Data in a single day!
The data is deeply linked (Departments, Grades, Professions, date of the snapshot)
Cross dataset queries are perhaps the most interesting
Proves Linked Data is moving from research topic to commodity publishing
We can now extend this approach to other types of dataset and link our transparency data
Make it as simple as possible for people in Departments to create Linked Data
Create high quality, consistent data that matches the policy intent and guidance
Distributed capture and publishing
Create open data in open standards using open source tools
Human readable and machine readable from single source
Provide download and API access in different formats (CSV, XML, JSON, RDF, HTML)
Evolutionary route to create longitudinal datasets, reconciling against previous data
Enable everyone to publish 5 Star Linked Data
Capture organisation data using a spreadsheet, which verifies policy rules and datatypes
Upload spreadsheet
Preview organogram
Download RDF and two CSVs
Publish on your website and register with data.gov.uk
It’s the tool most Civil Servants have
This *does* also work in Libre Office / Open Office etc
5. Create RDF
[Diagram: organogram publishing pipeline]
Components: Excel file, XLWrap, Mapping (TriG), Senior CSV, Junior CSV, RDF file, Sesame RDF Store, TDB, Linked Data API, API Config, Organogram (PHP), Organogram HTML, CSS & JavaScript
Steps labelled: 1. Upload Excel, 2. Create CSVs, 3. Create Mapping, 4. Query (SPARQL), 6. Load RDF, 7. Query (SPARQL)
Output formats: JSON, XML, HTML
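The pipeline above turns tabular organisation data into RDF with XLWrap and a mapping file. As a rough stand-in for that idea, and not the XLWrap mechanism itself, the sketch below reads a small CSV of posts and emits (subject, property, value) triples. The column names, the example.org URIs and the reporting lines are all hypothetical; only the job titles come from the sample organogram earlier in the talk.

```python
import csv
import io

# Hypothetical CSV layout; the real templates are not shown in the slides.
SENIOR_CSV = """post_id,job_title,reports_to
1,Director General,
2,Director (Operations),1
3,Director (Strategy),1
"""

BASE = "http://example.org/id/post/"           # made-up identifier space
JOB_TITLE = "http://example.org/def/jobTitle"  # made-up property
REPORTS_TO = "http://example.org/def/reportsTo"  # made-up property


def csv_to_triples(text):
    """Emit one title triple per post, plus a reporting link where given."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        post = BASE + row["post_id"]
        triples.append((post, JOB_TITLE, row["job_title"]))
        if row["reports_to"]:
            triples.append((post, REPORTS_TO, BASE + row["reports_to"]))
    return triples


for triple in csv_to_triples(SENIOR_CSV):
    print(triple)
```

The reporting link is what lets an organogram be drawn from the data instead of being authored separately.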
Reconciliation
Implicit properties are made explicit (person, role, person in a role)
Reconciliation adds value by automatic linking to other data
Provenance
Example data
Explicit open licence
Linked Data is essential to realising the promise of Open Government Data
Using Linked Data means working on Standards
Reference Data
Production
Publishing
Lots of opportunities for international collaboration
Best advice, just start
email: john@johnlsheridan
Twitter: @johnlsheridan
Skype: johnlsheridan