Expanding Open Data Horizons with R and RStudio

ottawa.theodi.org @ODI_Ottawa Rob Davidson


Post on 16-Apr-2017


nodes-map.theodi.org

Data Process

● Define problem or question

● Get the data

● Clean the data

● Explore the data

● Analyze the data

● Communicate results

Case Studies

● Seoul Subway Data

● 2015 Canadian Federal Election

● Weather Impact on Ottawa Cycling

● Ottawa 311 Data

● Ottawa Crime Statistics

Case study code on GitHub: https://github.com/robscottd/OpenDataInAction


Get the Data

The “Good” of getting open data:

Centralized government repositories

Multiple standard formats

The “Bad” of getting open data:

Data is too “clean”, value scrubbed out

Rate-limited/complex APIs


Get the Data Using R

R basics: read.table, read.csv, read.csv2

Packages: readr, rvest, RSelenium, readxl, rjson

Using APIs: twitteR, httr, jsonlite
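As a minimal sketch of the base R readers listed above, the following parses a small inline CSV with read.csv and the same records in semicolon-separated, comma-decimal form with read.csv2. The sample records and column names are invented for illustration and do not come from any of the case studies.

```r
# Hypothetical sample data, invented for illustration only.
csv_text <- "station,riders,avg_temp
Alpha,1200,10.5
Beta,950,11.0
Gamma,,9.5"

# read.csv expects "," as the separator and "." as the decimal mark.
# The empty riders field on the last row is read as NA.
rides <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# read.csv2 expects ";" as the separator and "," as the decimal mark,
# a convention common in European exports.
csv2_text <- "station;riders;avg_temp
Alpha;1200;10,5
Beta;950;11,0
Gamma;;9,5"
rides2 <- read.csv2(text = csv2_text, stringsAsFactors = FALSE)

# Inspect the column types that were inferred.
str(rides)
str(rides2)
```

In practice the `text` argument would be replaced by a file path or URL pointing at the open data file.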


Clean the Data

Tidy the data! Follow Hadley Wickham’s method

Address extreme outliers

Explore outliers graphically (Shiny Gadgets example)

Address missing values

Imputation through MICE, missForest, Hmisc

Or remove the incomplete observations
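The cleaning steps above can be sketched in base R. This is an illustration under stated assumptions, not the method used in the case studies: the data are invented, the 1.5 × IQR rule is one common rule of thumb for flagging outliers, and median imputation is a crude stand-in for the model-based imputation offered by MICE, missForest, or Hmisc.

```r
# Hypothetical daily counts, invented for illustration only.
counts <- data.frame(
  day    = 1:8,
  riders = c(310, 295, NA, 320, 305, 2900, 315, NA)
)

# Flag extreme outliers with the 1.5 * IQR rule (an assumed rule of thumb).
q     <- unname(quantile(counts$riders, c(0.25, 0.75), na.rm = TRUE))
fence <- 1.5 * (q[2] - q[1])
counts$outlier <- !is.na(counts$riders) &
  (counts$riders < q[1] - fence | counts$riders > q[2] + fence)

# Option 1: remove the incomplete observations.
complete_only <- counts[complete.cases(counts), ]

# Option 2: fill missing values with the median of the observed values.
# This is a simple substitute, not multiple imputation.
imputed <- counts
imputed$riders[is.na(imputed$riders)] <- median(counts$riders, na.rm = TRUE)

print(counts)
```

Flagging rather than deleting outliers keeps the decision visible, so the flagged rows can still be explored graphically before anything is dropped.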


Explore the Data: But why not just dive in?

No assumptions, “listen” to the data

Understand data properties

Find patterns in the data

Discover analysis strategies

Begin the visual narrative


Explore the Data

Graph it

Map it (leaflet)

Network graph it (networkD3, igraph)

Heat map it (d3heatmap, heatmaply)
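A minimal sketch of the "graph it" step, using only base R graphics rather than the packages named above. The data frame and its column names are invented for illustration.

```r
# Hypothetical daily observations, invented for illustration only.
trips <- data.frame(
  temp_c = c(2, 5, 9, 12, 15, 18, 21, 24),
  riders = c(120, 180, 260, 340, 410, 480, 530, 600)
)

# Understand the data's properties before modelling anything.
print(summary(trips))
r <- cor(trips$temp_c, trips$riders)

# Graph it: a scatter plot is often the quickest way to spot a pattern.
plot(trips$temp_c, trips$riders,
     xlab = "Mean temperature (deg C)",
     ylab = "Riders per day",
     main = "Ridership against temperature (invented data)")
```

The same data frame could then be passed to leaflet, networkD3, or heatmaply once the basic shape of the data is understood.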


Communicate Results

Complete the narrative - what is the story?

Design with audience in mind

Share your process

Publish for easy access and feedback

If possible, provide link to data


Communicate Results

Shiny

Bookdown

Notebooks


Case Study

Example: 2015 Canadian Federal Election

http://bit.ly/1RJ8nMi

Data Does Not Lie

It just might not tell you what you want to hear

● Survey design is biased

● Sensors are not identically calibrated

● Merged datasets are not temporally aligned

Sample of Ottawa Open Data Groups

● Data for Good Ottawa

● Open Data Ottawa

● Datafest Ottawa

Canadian Open Data Consultations

● Federal Government

● Provincial Government

● Municipal Government

Rob Davidson