Primer: Data-Driven Startups Digital Incubation Centre, Ministry of Transportation and Communications Doha, Qatar Heather Leson March 9, 2016

Upload: humanitarian-openstreetmap-team

Post on 17-Jan-2017

TRANSCRIPT

Page 1: Primer: Data-Driven Startups

Primer: Data-Driven Startups
Digital Incubation Centre, Ministry of Transportation and Communications
Doha, Qatar
Heather Leson
March 9, 2016

Page 2: Primer: Data-Driven Startups
Page 3: Primer: Data-Driven Startups

Data Examples

Page 4: Primer: Data-Driven Startups
Page 5: Primer: Data-Driven Startups
Page 6: Primer: Data-Driven Startups

Cultural: Data about cultural works and artefacts — for example titles and authors — and generally collected and held by galleries, libraries, archives and museums.

Science: Data that is produced as part of scientific research from astronomy to zoology.

Finance: Data such as government accounts (expenditure and revenue) and information on financial markets (stocks, shares, bonds etc).

Statistics: Data produced by statistical offices such as the census and key socioeconomic indicators.

Weather: The many types of information used to understand and predict the weather and climate.

Environment: Information related to the natural environment, such as the presence and level of pollutants and the quality of rivers and seas.

Transport: Data such as timetables, routes, on-time statistics.

Types of Open Data

(Source: okfn.org)

Page 7: Primer: Data-Driven Startups

Kasra and QCRI: Connecting Startups & Research

Page 8: Primer: Data-Driven Startups

Metis:

Collaborating with CMU to get data working within the privacy/security guidelines

Academic Planning Made Easier.

Page 9: Primer: Data-Driven Startups

Mumm:

Connecting with the local Cairo data science community.

Data for food.

Page 10: Primer: Data-Driven Startups

Exantium:

Strategy firm connecting open data to government and business. Part of a global network.

Page 11: Primer: Data-Driven Startups

Data-Driven Recipes

Page 12: Primer: Data-Driven Startups

1. How to:

Technical Training/Business for Data Literacy

Page 13: Primer: Data-Driven Startups

2. How to:

Host a Data Expedition

Page 14: Primer: Data-Driven Startups

Storyteller
Role: Generate ideas and interesting questions, help define the questions, and assist with the information products and story outputs.

Scout
Role: Scouts hunt down data from across the web. They can be non-technical or technical, depending on how difficult the data is to obtain (whether it is easily downloadable or needs to be scraped, etc.).

Analyst
Role: Analysts crunch the data found by the scouts and test the hypotheses generated by the storytellers.

“Engineers” (optional)
Role: Create information outputs, with varying degrees of technical work, from coding to using off-the-shelf tools.

Designers
Role: Beautify the outputs and make sure the story really comes through the data.

Page 15: Primer: Data-Driven Startups

3. How to:

Data Clinics to connect entrepreneurs, business and government

Page 16: Primer: Data-Driven Startups

Data Discovery

Page 17: Primer: Data-Driven Startups

DIY Data:

BQ Magazine’s Faces of Qatar

Page 18: Primer: Data-Driven Startups

DIY Data:

QCRI Social Computing

Groundtruth Data Collection

Phones, photos and food consumption for Health Monitoring

Page 19: Primer: Data-Driven Startups

You are a Smart City: Create a local map dataset

Page 20: Primer: Data-Driven Startups

Data Pipeline

Page 21: Primer: Data-Driven Startups

Qatar Data Expedition

Page 22: Primer: Data-Driven Startups

What are the questions you seek to answer?
What is the license? Can you reuse/publish the data?
Is the source credible? Is the data credible?
Where did they get their data?
How much time do I have to search?
How am I organizing my research?

Keen to learn more about verification? http://verificationhandbook.com/ (it is in Arabic too!)

Consider

Page 23: Primer: Data-Driven Startups

Who is publishing about Qatar...on biodiversity?

United States 7,440 occurrences, 97.77% geo-referenced.

United Kingdom 832 occurrences, 8.29% geo-referenced.

Sweden 620 occurrences, 0.32% geo-referenced.

Netherlands 298 occurrences, 5.03% geo-referenced.

Source: Global Biodiversity Information Facility
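The percentages above become easier to compare when turned back into approximate record counts. The following is a minimal Python sketch that uses only the figures quoted on this slide; the variable and function names are illustrative, and the counts are rounded estimates, not values reported by GBIF.

```python
# Occurrence records about Qatar by publishing country, as quoted above:
# (total occurrences, percent geo-referenced).
publishers = {
    "United States": (7440, 97.77),
    "United Kingdom": (832, 8.29),
    "Sweden": (620, 0.32),
    "Netherlands": (298, 5.03),
}


def georeferenced_count(total, percent):
    """Approximate number of geo-referenced records, rounded to a whole record."""
    return round(total * percent / 100)


for country, (total, percent) in publishers.items():
    count = georeferenced_count(total, percent)
    print(f"{country}: about {count} of {total} records have coordinates")
```

Read this way, almost all of the mappable records come from a single publishing country, which is a useful thing to know before building a product on top of the dataset.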

Page 24: Primer: Data-Driven Startups

What about data on tourism?

Source: Knoema Data Atlas, which aggregates the World Development Indicators, 2015

$6,616,000,000 USD: International tourism expenditures for travel items

(Time for more boutique travel startups)

Page 26: Primer: Data-Driven Startups

Location Data

OpenStreetMap: Free, open Dataset

Get data: http://planet.osm.org/

GADM: Administrative Boundaries

Bing Imagery
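OpenStreetMap data is commonly distributed as XML, where each `node` carries an `id`, `lat` and `lon`, plus child `tag` elements with `k` (key) and `v` (value) attributes. The sketch below uses Python's standard library to list tagged points of interest from a tiny inline sample. The sample nodes, coordinates and names are invented for illustration; a real extract is far larger and would likely call for a streaming parser or a dedicated OSM tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the OSM XML layout (not real data).
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="25.2854" lon="51.5310">
    <tag k="amenity" v="cafe"/>
    <tag k="name" v="Example Cafe"/>
  </node>
  <node id="2" lat="25.2900" lon="51.5200"/>
  <node id="3" lat="25.3000" lon="51.5100">
    <tag k="amenity" v="pharmacy"/>
    <tag k="name" v="Example Pharmacy"/>
  </node>
</osm>
"""


def points_of_interest(osm_xml, key="amenity"):
    """Return nodes that carry the given tag key, with coordinates and name."""
    root = ET.fromstring(osm_xml)
    pois = []
    for node in root.iter("node"):
        # Collapse the <tag k="..." v="..."/> children into a plain dict.
        tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
        if key in tags:
            pois.append({
                "id": node.get("id"),
                "lat": float(node.get("lat")),
                "lon": float(node.get("lon")),
                key: tags[key],
                "name": tags.get("name", ""),
            })
    return pois


for poi in points_of_interest(SAMPLE_OSM):
    print(poi["name"], poi["amenity"], poi["lat"], poi["lon"])
```

Untagged nodes (like node 2 in the sample) are skipped, since most nodes in OSM only describe the shape of roads and buildings.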

Page 27: Primer: Data-Driven Startups

Ministry of Development Planning and Statistics

In economic statistics:
Quarterly and annual Gross Domestic Product (GDP), constant and current, by economic activity
Monthly, quarterly and annual Consumer Price Index, Producer Price Index (PPI), Foreign Trade Statistics (import and export), Building permits

In social statistics:
Labor force statistics (through a labor force sample survey)
Marriage, health, birth, fertility, education, disability, mortality statistics (in coordination with other ministries)

In environmental statistics:
Monthly rainfall
Monthly and annual average concentrations of air pollutants
Capacities of urban wastewater treatment plants

In population statistics:
Population growth rate
Population sex ratio

Page 28: Primer: Data-Driven Startups

QALM portal (Qatar Information Exchange)

QALM is an ambitious national project, developed by a number of government partners including: The General Secretariat for Development Planning, The Statistics Authority, The Supreme Council of Health, The Supreme Education Council, Supreme Council of Family Affairs, ictQATAR, Ministerial Cabinet and the Permanent Population Committee.

http://www.qalm.gov.qa/

Data is available in multiple formats!

To get data from the Ministry of Development Planning and Statistics, check their website. If you are looking for other data, they are an email away: [email protected]

Page 29: Primer: Data-Driven Startups

Using Data

Page 30: Primer: Data-Driven Startups

Learn how: http://datadrivenjournalism.net/

Page 31: Primer: Data-Driven Startups
Page 32: Primer: Data-Driven Startups

Expenditure Components of GDP at Current Prices (Mn Qatari Riyal)
Source: Ministry of Development Planning and Statistics

"","","2004","2005","2006","2007","2008","2009","2010","2011","2012","2013",,,,,"2014",,,,
"","","Total","Total","Total","Total","Total","Total","Total","Total","Total","Q1","Q2","Q3","Q4","Total","Q1","Q2","Q3","Q4","Total"
"Gross Domestic product","B.1G",115512.376669,162091.018049205,221610.304141365,290151.574403828,419582.826273579,355986.474251774,455445,618089.239045503,692654.670488044,186654.189573065,177830.420532429,185433.336051801,189857.929208376,739776,193880.888003083,189653.51105388,193080.129441538,194397.657502752,771013.233251822
"Household Final Consumption Expenditure","P.3a",20166,25889.8602243444,36186.326795032,49728.6119489121,64675.8351579253,68622.9919301139,73645.7899114015,79905.6820538706,87682.19979384,24130.4586981125,24802.4947262859,23572.4447936237,26368.9939206421,98874.3921386642,26807.1948166319,27414.3657651239,26424.7106136522,28729.6901996358,109375.961395044
"Government Final Consumption Expenditure","P.3b",15094,23171.9888517611,32616.2047008325,35989.9119915317,42695.8750950427,55652.33697478,63689.0870608494,77007.4825664626,89527.4435418714,24336.9460716118,24384.7648280038,24240.4862291342,25297.5589689309,98259.7560976807,26593.3225341388,26861.3831859924,27030.5661941075,27714.3396569197,108199.611571158
"Gross capital formation","P.5",36399.044558,55609.5389690997,92830.0390858622,133518.050463385,172523.116020611,152947.14534688,142449.123027749,177621.474425169,194347.357152333,49488.7848033409,49657.1609781394,58089.4050290433,60871.3763188034,218106.851763655,53389.3706523124,58868.7621027634,67296.8526337788,77579.6276461965,257731.66028562
"Exports (Goods & Services)-F.O.B","P.6",74122.332111,105496.630004,139210.733559638,174896,257467,182033,283832,442959.8,520182,141152,131890,134332,131751,539125,146457,134748,131592,116481,528682
"Imports (Goods & Services)-F.O.B","P.7",-30269,-48077,-79233,-103981,-117779,-103269,-108171,-159405.2,-199084.33,-52454,-52904,-54801,-54431,-214590,-59366,-58239,-59264,-56107,-232976

*Figures for 2013 & 2014 are preliminary estimates. Powered by © QALM

Census data extracted...not usable yet..
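One way to make an export like the GDP table above usable is to reshape it into one record per item, year and period. The sketch below is only an illustration of that idea with Python's standard `csv` module. It assumes the export has two header rows (years, then Total/Q1 to Q4) with the year left blank after its first column, and it uses a shortened excerpt of the figures above (the 2012 and 2013 columns of two rows). The function name `tidy` and the record layout are hypothetical choices, not part of the QALM export.

```python
import csv
import io

# Shortened excerpt of the GDP export: 2012 and 2013 columns, two rows only.
RAW = '''"","","2012","2013",,,,
"","","Total","Q1","Q2","Q3","Q4","Total"
"Gross Domestic product","B.1G",692654.670488044,186654.189573065,177830.420532429,185433.336051801,189857.929208376,739776
"Imports (Goods & Services)-F.O.B","P.7",-199084.33,-52454,-52904,-54801,-54431,-214590
'''


def tidy(raw):
    """Turn the two-header-row export into one dict per item/year/period."""
    rows = [row for row in csv.reader(io.StringIO(raw)) if row]
    years, periods, body = rows[0], rows[1], rows[2:]

    # The year appears only above its first column, so carry it forward.
    labels = []
    current_year = ""
    for year, period in zip(years, periods):
        if year.strip():
            current_year = year.strip()
        labels.append((current_year, period))

    records = []
    for row in body:
        item, code = row[0], row[1]
        # The first two columns hold the item name and its code.
        for (year, period), value in zip(labels[2:], row[2:]):
            records.append({
                "item": item,
                "code": code,
                "year": int(year),
                "period": period,
                "value": float(value),
            })
    return records


for record in tidy(RAW)[:3]:
    print(record)
```

With the data in this shape, simple checks become possible, such as confirming that the four quarters of a year add up to the reported annual total.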

Page 33: Primer: Data-Driven Startups

Qatar Census

(Source: Doha News 2016)

Page 34: Primer: Data-Driven Startups

South African Census Data

Page 35: Primer: Data-Driven Startups

Open Refine http://openrefine.org/

Sublime Text https://www.sublimetext.com/

There are many tools for software developers and data scientists too.

Note: you still need the Human API to analyze and make decisions for your business. Of course, if you can afford it, you can get your business intelligence from KPMG, Gartner, Bloomberg, McKinsey or PwC. Until then...

Some tools to Clean Datasets

Learn more with Lillian and her online courses.

Page 36: Primer: Data-Driven Startups

Tools for Charts, Graphs and Infographics

http://tableau.com/

http://infogr.am/

http://piktochart.com/

https://www.canva.com/

More LMGTFY: http://www.creativebloq.com/design-tools/data-visualization-712402

(source: TuktukDesign, Noun Project ccby)

Page 37: Primer: Data-Driven Startups

Map tools
Mapbox: http://mapbox.com/
CartoDB: http://academy.cartodb.com/
Leaflet: http://leafletjs.com/
Google: https://www.google.com/mapmaker
ArcGIS: https://www.arcgis.com/features/

Time mapper: http://timemapper.okfnlabs.org/

Also: if you are collecting your own location data, try Field Papers or crowdsource map photos with Mapillary. (They just got 8M funding!)

(source: Mister Pixel, Noun Project, ccby)

Page 38: Primer: Data-Driven Startups

QCRI Combining Data Sources: Real-Time Traffic Monitoring

● Collection and classification of traffic-related tweets (script, research tool)
● Continuous real-time querying of Google Traffic API
● Qatar Traffic Profiling & Modeling
  ○ Geo: City, zone, district
  ○ Time: Hourly, daily, weekly, and monthly
● Usage:
  ○ Detection of abnormal behaviors
  ○ Predictions
  ○ Monthly public reports
    ■ Commute status
    ■ Deadpoints
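The slide does not say how QCRI builds its traffic profiles or detects abnormal behaviors, so the following is only a generic sketch of the idea. It builds an hourly profile (mean and standard deviation of travel time) and flags a new reading that falls more than a chosen number of standard deviations from that hour's mean. The travel times, the threshold and the function names are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_profile(observations):
    """Group (hour, travel_minutes) pairs by hour; return {hour: (mean, std)}."""
    by_hour = defaultdict(list)
    for hour, minutes in observations:
        by_hour[hour].append(minutes)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def is_abnormal(profile, hour, minutes, threshold=2.0):
    """True if a reading is more than `threshold` std devs from the hour's mean."""
    if hour not in profile:
        return False  # no history for this hour, so nothing to compare against
    avg, std = profile[hour]
    if std == 0:
        return minutes != avg
    return abs(minutes - avg) / std > threshold


# Hypothetical history of travel times (in minutes) for one route.
history = [(8, 30), (8, 32), (8, 28), (8, 31), (8, 29),
           (14, 15), (14, 16), (14, 14)]
profile = hourly_profile(history)

print(is_abnormal(profile, 8, 55))  # far above the usual 8:00 travel time
print(is_abnormal(profile, 8, 31))  # within the usual range
```

The same grouping step could be keyed by zone or by day of week to match the geo and time dimensions listed above.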

Page 39: Primer: Data-Driven Startups

The best way to learn is to find data and make data information products.

Try to recreate the diagrams and track back the data.

Track how other startups use data. Copy. Remix.

Page 40: Primer: Data-Driven Startups

Social Entrepreneurship & Social Good

Page 42: Primer: Data-Driven Startups

ABC: Always be Charging

How can you have a Data-Driven Career?

What is your Data Plan for your startup?

Can you use Data-Driven Journalism techniques to improve your business?

What kind of data do you need to grow your business?

What type of training do you want/need?

Page 43: Primer: Data-Driven Startups

Thank you

@heatherleson @qatarcomputing