improving data communication: 90+ tables to an api · 2018. 5. 3. · incorporate all dukes energy...
TRANSCRIPT
Vision and Objectives
Improving data communication: 90+ tables to an API
Hiren Bhimjiyani and Andreas Harding
Data Science Team, BEIS
GSS Conference 2017
Pioneers: On the forefront of Statistics and Data Science
Communication of data
Personas
• Expert analysts: quick access to information
• Information foragers: high-level summary for decision making
• Inquiring citizens: visually engaging summaries and visualisations
APIs
Textbook definition: Application Programming Interface, a set of functions and procedures that allow the creation of applications that access the features or data of an operating system, application, or other service.
What this means in practice for the BEIS API project: building an IT infrastructure that lets people access a structured database either through a web page GUI or directly through the URL string.
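Access "directly through the URL string" can be pictured as building a query URL whose parameters select a slice of the database. The sketch below is illustrative only: the base URL, the parameter names (`table`, `year`, `variable`) and the table identifier are hypothetical and are not taken from the BEIS API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://api.example.org/energy/data"


def build_query_url(table, year=None, variable=None):
    """Return a URL whose query string selects part of the database."""
    params = {"table": table}
    if year is not None:
        params["year"] = year
    if variable is not None:
        params["variable"] = variable
    # urlencode escapes spaces and other reserved characters.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url("1.1", year=2016, variable="Coal production")
print(url)
```

The same URL could be pasted into a browser or requested from a script, which is what makes this style of access usable from many tools.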
Web database access API
[Diagram labels: DUKES, API, Other DBs]
Benefits to statisticians
• Efficiency
• Access to data (Principle 8: Accessibility)
• Future proofing
• Make data open
• 18,000 APIs at programmableweb.com
• Apps, wider reach (Principle 1: Meet user needs)
Benefits to users
API
• Timely information, potentially real time
• Target a single cell
• Direct access via Excel, R, Python…
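Targeting a single cell means a query narrow enough to return exactly one value. The sketch below assumes the API returns tidy CSV with the columns `table`, `variable`, `year` and `value`. Both the column names and the sample values are placeholders invented for illustration, not DUKES figures or the real response format.

```python
import csv
import io

# Placeholder response text; column names and values are made up.
RESPONSE_TEXT = """table,variable,year,value
1.1,Coal production,2015,10
1.1,Coal production,2016,20
1.1,Gas production,2016,30
"""


def single_cell(csv_text, table, variable, year):
    """Return the one value matching table, variable and year, or raise."""
    rows = csv.DictReader(io.StringIO(csv_text))
    matches = [
        r["value"]
        for r in rows
        if r["table"] == table
        and r["variable"] == variable
        and r["year"] == str(year)
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one cell, found {len(matches)}")
    return float(matches[0])


cell = single_cell(RESPONSE_TEXT, "1.1", "Coal production", 2016)
print(cell)
```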
DUKES
Digest of United Kingdom Energy Statistics (DUKES) is an essential source of energy information covering:
DUKES
The DUKES publication comprises:
• extensive set of tables
• charts and commentary
• a comprehensive picture of energy production, with key series taken back to 1970
Project specification
Incorporate all DUKES energy tables into an API
• 90 tables, 19 years, around 400k cells
• Complicated structure
• Many standalone tables
• No raw data to work with
DUKES
The API
DUKES user interface to create an API query, and also to download data: http://njs.analysisoncbas.co.uk/energy/data
Impact
• Capable: Closer working relations with statisticians in other teams
• Helpful: Over 100 people interacting with the API despite soft launch
• Professional: Interest from HOC library and external consultants
• Innovative: Makes BEIS data open and consumable
• Efficient: Foundation for automating reporting, generalised API
“I’m convinced, how do I start?”
Data cleaning and structure
Software
Hardware
Data cleaning
Making variables unique
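The slide text does not say how variables were made unique. One plausible reading is that the same row label (for example a fuel name) recurs under different parent headings in a published table. A minimal sketch under that assumption qualifies each label with the headings above it. The indentation levels and labels here are hypothetical.

```python
def make_unique(rows):
    """Qualify each label with its parent headings.

    rows is a list of (indent_level, label) pairs in table order.
    """
    path = []
    names = []
    for level, label in rows:
        # Keep the ancestors down to this level, then add the current label.
        path = path[:level] + [label]
        names.append(" > ".join(path))
    return names


# Hypothetical row labels: "Coal" and "Gas" each appear twice.
rows = [
    (0, "Supply"), (1, "Coal"), (1, "Gas"),
    (0, "Demand"), (1, "Coal"), (1, "Gas"),
]
names = make_unique(rows)
print(names)
```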
Exception handling, 1998-2016
Cell formats:
+#,##0\r;-#,##0\r;"-r"
#,##0\r;-#,##0\r;"-r"
+#,##0\r;-#,##0\r;" "
#,##0\r;-#,##0\r;" "
#,##0\r;-#,##0\r;"-r"
#,##0\r;-#,##0\r;"-"
#,##0"r" ;-#,##0 ;"- "
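The strings listed above are Excel custom number formats. Their sections are separated by semicolons and apply to positive, negative and zero values in that order. The literal `r` looks like a marker for revised figures, although the slides do not say so. Under that assumption, the sketch below turns the text a cell displays into a number plus a revised flag. The function name and the `Cell` type are hypothetical.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    value: int
    revised: bool


def parse_cell(text):
    """Split displayed cell text such as '1,234r' into a value and a flag."""
    s = text.strip()
    # Assumption: a trailing 'r' marks a revised figure.
    revised = s.endswith("r")
    if revised:
        s = s[:-1].strip()
    # The zero section of these formats shows '-' or a blank.
    if s in ("", "-"):
        return Cell(0, revised)
    # Drop thousands separators; int() accepts a leading '+' or '-'.
    return Cell(int(s.replace(",", "")), revised)


print(parse_cell("1,234r"))
print(parse_cell("-r"))
```

Collapsing all the format variants into one pair of fields is one way to handle such exceptions across years without special-casing each table.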
Software to clean the data
• VBA
• R
• Python/pandas
Data structure
The data needs to go into a structured database: pivoted, tidy, ordered
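"Pivoted, tidy, ordered" can be read as reshaping each published table, which has one column per year, into one record per cell and then sorting the records. The sketch below shows that reshaping with the standard library only. The field names (`table`, `variable`, `year`, `value`) and the sample values are assumptions made for illustration.

```python
def to_tidy(table_id, header, rows):
    """Turn a wide table (one column per year) into sorted long records."""
    years = header[1:]  # first header cell names the variable column
    records = []
    for row in rows:
        variable, values = row[0], row[1:]
        for year, value in zip(years, values):
            records.append(
                {"table": table_id, "variable": variable,
                 "year": int(year), "value": value}
            )
    # Order by table, then variable, then year.
    records.sort(key=lambda r: (r["table"], r["variable"], r["year"]))
    return records


# Placeholder wide table; values are not real DUKES figures.
header = ["variable", "2015", "2016"]
rows = [["Gas", 30, 40], ["Coal", 10, 20]]
records = to_tidy("1.1", header, rows)
for record in records:
    print(record)
```

In pandas the same reshaping is available as `DataFrame.melt`, followed by `sort_values`.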
The software and hardware
Database software
Internet
Internet
• Internet connection
• Domain: www….
• Hosting solution: internal/external
Software
• Use off-the-shelf software: Swirrl
• Build your own. We used JavaScript
• Client side (HTML5, CSS) for the user interface, if you want one
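To give a feel for the "build your own" route, here is a minimal sketch of a read-only query endpoint. The authors built theirs in JavaScript; this version uses Python's standard library purely for illustration and is not their implementation. The `/data` path, the filter names and the records are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit

# Placeholder tidy records standing in for the structured database.
RECORDS = [
    {"table": "1.1", "variable": "Coal", "year": 2015, "value": 10},
    {"table": "1.1", "variable": "Coal", "year": 2016, "value": 20},
    {"table": "1.1", "variable": "Gas", "year": 2016, "value": 30},
]
ALLOWED_FILTERS = {"table", "variable", "year"}


def handle_query(path):
    """Return (status, JSON body) for a path such as '/data?year=2016'."""
    parts = urlsplit(path)
    if parts.path != "/data":
        return 404, json.dumps({"error": "not found"})
    filters = {k: v[0] for k, v in parse_qs(parts.query).items()}
    unknown = sorted(set(filters) - ALLOWED_FILTERS)
    if unknown:
        return 400, json.dumps({"error": f"unknown parameters: {unknown}"})
    rows = [
        r for r in RECORDS
        if all(str(r[k]) == v for k, v in filters.items())
    ]
    return 200, json.dumps(rows)


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = handle_query(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Run the sketch locally; not suitable for production use."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` and opening `http://127.0.0.1:8000/data?year=2016` in a browser would return the matching records as JSON. Keeping the query logic in `handle_query` separates it from the web server, so it can be tested without starting one.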
Software
Hardware
Simple server
Other considerations
User interface
• Prototyping
• User Acceptance Testing
Data management
• Regular updates
• Managing changes
Security
• API key
• Open
Expertise
• In-house
• Paid for
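The choice under Security between an API key and open access can be sketched as a single check made before a request is served. The key value and function name below are hypothetical. A real service would store keys securely rather than in source code.

```python
import hmac

# Hypothetical issued keys; never hard-code real keys like this.
VALID_KEYS = {"demo-key-123"}


def is_authorised(supplied_key, open_access=False):
    """Allow everyone if the API is open, otherwise require a known key."""
    if open_access:
        return True
    if not supplied_key:
        return False
    supplied = supplied_key.encode("utf-8")
    # compare_digest avoids leaking information through timing differences.
    return any(
        hmac.compare_digest(supplied, key.encode("utf-8"))
        for key in VALID_KEYS
    )


print(is_authorised("demo-key-123"))
print(is_authorised(None, open_access=True))
```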
Conclusion
We should have a GSS API for all our published data