Page 1: Improving data communication: 90+ tables to an API · 2018. 5. 3. · Incorporate all DUKES energy tables into an API GSS Conference 2017 Pioneers: On the forefront of Statistics

Vision and Objectives

Improving data communication: 90+ tables to an API

Hiren Bhimjiyani and Andreas Harding

Data Science Team, BEIS

GSS Conference 2017
Pioneers: On the forefront of Statistics and Data Science

Page 2

Communication of data

Personas

Expert Analysts: Quick access to information

Information Foragers: High-level summary for decision making

Inquiring citizens: Visually engaging summaries and visualisations

Page 3

APIs

Textbook definition: Application Programming Interface, a set of functions and procedures that allow the creation of applications that access the features or data of an operating system, application, or other service.

What this means in practice for the BEIS API project: building an IT infrastructure that lets people access a structured database either through a web page GUI or directly through a URL string.
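Access "through the URL string" means the request itself is encoded in the address. The sketch below shows one way that could look in Python. The base address and the parameter names (`table`, `year`, `format`) are assumptions made for illustration and are not taken from the BEIS API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base address and parameter names, for illustration only.
BASE_URL = "https://example.org/energy/data"


def build_query_url(table: str, year: int, fmt: str = "json") -> str:
    """Encode a data request as a URL string."""
    params = {"table": table, "year": year, "format": fmt}
    return f"{BASE_URL}?{urlencode(params)}"


def read_query_url(url: str) -> dict:
    """Recover the request parameters from a URL, as a server would."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list of values per key; keep the first of each.
    return {key: values[0] for key, values in query.items()}


url = build_query_url("1.1", 2016)
print(url)
print(read_query_url(url))
```

A web page GUI can sit on top of the same mechanism: the form builds the URL, and anyone who prefers can write the URL by hand.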

Page 4

Web database access API

[Diagram labels: DUKES, API, Other DBs]

Page 5

Benefits to statisticians

Efficiency

Access to data
Principle 8: Accessibility

Future proofing

Make data open

18,000 APIs at programmableweb.com

Apps
Wider reach
Principle 1: Meet user needs

Page 6

Benefits to users

API

Timely information, potentially real time

Target single cell

Direct access via Excel, R, Python…
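"Target single cell" could look like this from Python once a response has been received. This is a minimal sketch: the response shape (a JSON list of records with `table`, `row`, `year` and `value` fields) is an assumption, and the numbers are made-up placeholders, not DUKES statistics.

```python
import json

# Made-up payload in a hypothetical response shape; values are placeholders.
SAMPLE_RESPONSE = json.dumps([
    {"table": "T1", "row": "Coal", "year": 2015, "value": 10.0},
    {"table": "T1", "row": "Coal", "year": 2016, "value": 8.0},
    {"table": "T1", "row": "Gas", "year": 2016, "value": 12.5},
])


def get_cell(payload: str, table: str, row: str, year: int) -> float:
    """Return the single value identified by table, row label and year."""
    records = json.loads(payload)
    matches = [
        record["value"]
        for record in records
        if record["table"] == table
        and record["row"] == row
        and record["year"] == year
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one cell, found {len(matches)}")
    return matches[0]


print(get_cell(SAMPLE_RESPONSE, "T1", "Coal", 2016))
```

Raising an error when zero or several records match keeps an ambiguous key from silently returning the wrong figure.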

Page 7

DUKES

The Digest of United Kingdom Energy Statistics (DUKES) is an essential source of energy information covering:

Page 8

DUKES

The DUKES publication comprises:

• an extensive set of tables
• charts and commentary
• a comprehensive picture of energy production, with key series taken back to 1970

Page 9

Project specification

Incorporate all DUKES energy tables into an API

• 90 tables, 19 years, around 400k cells
• Complicated structure
• Many standalone tables
• No raw data to work with

Page 10

DUKES

Page 11

The API

DUKES user interface to create an API query, and also to download data: http://njs.analysisoncbas.co.uk/energy/data

Page 12

Impact

• Capable: Closer working relations with statisticians in other teams

• Helpful: Over 100 people interacting with the API despite soft launch

• Professional: Interest from the House of Commons Library and external consultants

• Innovative: Makes BEIS data open and consumable

• Efficient: Foundation for automating reporting, generalised API

Page 13

“I’m convinced, how do I start?”

Data cleaning and structure

Software

Hardware

Page 14

Data cleaning

Page 15

Page 16

Making variables unique

Page 17

Exception handling, 1998-2016

Cell formats:

+#,##0\r;-#,##0\r;"-r"
#,##0\r;-#,##0\r;"-r"
+#,##0\r;-#,##0\r;" "
#,##0\r;-#,##0\r;" "
#,##0\r;-#,##0\r;"-r"
#,##0\r;-#,##0\r;"-"
#,##0"r" ;-#,##0 ;"- "
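Each of these Excel formats has three sections separated by semicolons (positive; negative; zero), and `\r` or `"r"` prints a literal "r" after the number. The same figure can therefore appear as `1,234r`, `+1,234r` or `1,234r `, and a zero as `-r`, `-`, `- ` or a blank. The sketch below normalises such displayed strings. It rests on three assumptions that the slides do not confirm: that cells have been exported as displayed text, that a trailing "r" is a marker rather than part of the number, and that the third section stands for zero.

```python
from typing import Tuple


def parse_displayed_cell(text: str) -> Tuple[int, bool]:
    """Turn displayed cell text into (value, has_r_marker).

    Assumes the display conventions of the formats listed above.
    """
    cleaned = text.strip()
    marked = cleaned.endswith("r")
    if marked:
        # Drop the trailing "r" marker and any space before it.
        cleaned = cleaned[:-1].strip()
    if cleaned in ("", "-"):
        # Zero section of the formats: blank, "-", "- " or "-r".
        return 0, marked
    # Remove thousands separators and an explicit plus sign.
    return int(cleaned.replace(",", "").lstrip("+")), marked


for shown in ["1,234r", "+1,234r", "-56r", "-r", " ", "- ", "-1,234 "]:
    print(repr(shown), parse_displayed_cell(shown))
```

Reading the underlying cell values with a spreadsheet library would avoid parsing numbers from text, but the "r" would then have to be recovered from the format string instead.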

Page 18

Software to clean the data

• VBA
• R
• Python/pandas

Page 19

Data structure

The data needs to go into a structured database: pivoted, tidy, ordered
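One reading of "pivoted, tidy, ordered" is that each published table, which has one column per year, is unpivoted into one record per cell and then sorted. The sketch below does this in plain Python. The table identifier, field names and figures are hypothetical, and it is not the team's actual pipeline; with pandas, `melt` would perform the same unpivot.

```python
from typing import Dict, List


def to_tidy(table_id: str, header: List[str], rows: List[list]) -> List[Dict]:
    """Unpivot a wide table (one column per year) into one record per cell."""
    _, *year_columns = header  # first column holds the row labels
    records = []
    for row in rows:
        label, *values = row
        for year, value in zip(year_columns, values):
            records.append(
                {"table": table_id, "variable": label,
                 "year": int(year), "value": value}
            )
    # Order the records so the output is stable and easy to query.
    records.sort(key=lambda r: (r["table"], r["variable"], r["year"]))
    return records


# Hypothetical wide table with placeholder figures.
header = ["Fuel", "2015", "2016"]
rows = [["Gas", 11, 12], ["Coal", 10, 8]]

for record in to_tidy("T1", header, rows):
    print(record)
```

In this shape every value is addressed by the same small set of keys, which is what lets a query pick out a whole table, one series or a single cell.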

Page 20

The software and hardware

Database software

Internet

Page 21

Internet

Internet connection

Domain: www….

Hosting solution: internal/external

Page 22

Software

Use off-the-shelf software: Swirrl

Build your own. We used JavaScript

Client side (HTML5, CSS) for the user interface, if you want one

Page 23

Software

Page 24

Hardware

Simple server

Page 25

Other considerations

User interface
• Prototyping
• User Acceptance Testing

Data management
• Regular updates
• Managing changes

Security
• API key
• Open

Expertise
• In-house
• Paid for
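The "Security" choice above, API key or open, can be expressed as a single check that runs before a request is served. This is a minimal sketch: the key store and the `open_access` switch are hypothetical, and a real service would keep its keys outside the source code.

```python
import hmac
from typing import Optional

# Hypothetical key store, for illustration only.
VALID_KEYS = {"demo-key-123"}


def is_authorised(supplied_key: Optional[str], open_access: bool = False) -> bool:
    """Decide whether a request may proceed."""
    if open_access:
        # Open API: anyone may query, no key required.
        return True
    if supplied_key is None:
        return False
    # compare_digest compares in constant time, which avoids leaking
    # how much of a guessed key was correct.
    return any(
        hmac.compare_digest(supplied_key.encode(), key.encode())
        for key in VALID_KEYS
    )


print(is_authorised("demo-key-123"))
print(is_authorised(None, open_access=True))
```

A key also makes it possible to see who is using the service, whereas an open API lowers the barrier for casual users.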

Page 26

Conclusion

We should have a GSS API for all our published data

Page 27

[email protected]