Our Data Ourselves: MobileMiner

20 young coders from Young Rewired State were issued with Android smartphones.

Together, developed MobileMiner, an app that records the behaviour of other apps.

Return their data at hack-days.

Discuss their attitudes to privacy before and after confronting them with their data.

OpenCellID and OSM are crowd-sourced.

Full GPS is too invasive, and consumes power.

Avoid use of Google location API.

OpenCellId provides locations of (many) cell towers.
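A minimal sketch of how a coarse location could be derived from the cell tower a phone is attached to, without GPS or the Google location API. The table, its key layout and the coordinates are made-up illustrative values, not real OpenCellID records, and the function name is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Key: (mcc, mnc, lac, cell_id); value: (latitude, longitude).
# These entries are invented for illustration, not real towers.
CellKey = Tuple[int, int, int, int]
TOWERS: Dict[CellKey, Tuple[float, float]] = {
    (234, 10, 1001, 12345): (51.5115, -0.1160),
    (234, 10, 1001, 12346): (51.5230, -0.1300),
}


def approximate_location(cell: CellKey) -> Optional[Tuple[float, float]]:
    """Return the tower's coordinates, or None if the tower is not in the table."""
    return TOWERS.get(cell)


print(approximate_location((234, 10, 1001, 12345)))
print(approximate_location((999, 1, 1, 1)))  # unknown tower
```

Returning `None` for unknown towers reflects that only many, not all, towers have known locations.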

NervousNet from ETHZ:

NervousNet hub mobile app polls various physical sensors at a user-defined rate.

Data is pushed to one or more remote proxies.

Outputs of sensors combined into virtual sensors.
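As an illustration of the virtual-sensor idea only, the sketch below combines two physical readings into one derived value. It does not use the NervousNet API; the function names, thresholds and units are assumptions chosen for the example.

```python
import math


def acceleration_magnitude(x: float, y: float, z: float) -> float:
    """Collapse the three accelerometer axes into a single magnitude."""
    return math.sqrt(x * x + y * y + z * z)


def moving_in_dark(accel_xyz, light_lux: float,
                   accel_threshold: float = 12.0,
                   lux_threshold: float = 10.0) -> bool:
    """Hypothetical virtual sensor: strong acceleration while the light level is low.

    Both thresholds are arbitrary illustrative values.
    """
    return (acceleration_magnitude(*accel_xyz) > accel_threshold
            and light_lux < lux_threshold)


print(moving_in_dark((3.0, 4.0, 12.0), light_lux=2.0))
```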

Small custom deployment at CCC Congress.

Crowd Sourcing: Brand Disambiguation

Search YouTube for "apple" and "blackberry".

Some videos will compare the firms and their products.

Many will be about pies, jam, crumbles, etc.

Can discriminating between the two be easily automated?

Does the word "cook" in video descriptions imply they're about pies?

Not really!

What about "slice of pie"?

Better, but even that sometimes gets used as an amusing metaphor for market share.
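The keyword heuristics discussed here can be sketched as a simple substring rule. The function name and the example descriptions are invented for illustration; they are not taken from real videos.

```python
def looks_like_food(description: str,
                    keywords=("cook", "slice of pie")) -> bool:
    """Naive rule: flag a description as food-related if any keyword appears."""
    text = description.lower()
    return any(keyword in text for keyword in keywords)


# Invented descriptions showing where the rule goes wrong.
examples = [
    "How to cook a blackberry and apple crumble",                  # food
    "Tim Cook unveils the new Apple phone",                        # firm, but matches "cook"
    "Apple takes a bigger slice of pie in the smartphone market",  # firm, metaphor
]

for description in examples:
    print(looks_like_food(description), "-", description)
```

All three descriptions are flagged as food, although only the first one is, which is why fixed keyword rules are not enough.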

A machine-learning approach? Natural language processing?

Much of machine learning is supervised:

Train a model with labeled gold-standard examples.

Test the model by using it to classify more labeled examples.

Use the model to classify unlabeled examples in production.

Use crowd sourcing to get labeled training/test examples.
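The train, test and classify steps above can be sketched with a tiny multinomial naive Bayes classifier written with the standard library only. Naive Bayes is chosen here just as a simple stand-in for a supervised model; the labeled examples are invented, where in practice they would come from crowd workers.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Count words per label from (description, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of labeled examples the model classifies correctly."""
    return sum(classify(model, t) == l for t, l in examples) / len(examples)


# Invented gold-standard examples.
train_set = [
    ("apple and blackberry crumble recipe", "food"),
    ("baking a blackberry pie with apple jam", "food"),
    ("apple phone versus blackberry phone review", "tech"),
    ("blackberry and apple smartphone market share", "tech"),
]
test_set = [
    ("easy apple pie recipe", "food"),
    ("new blackberry phone review", "tech"),
]

model = train(train_set)                     # 1. train on labeled examples
print(accuracy(model, test_set))             # 2. test on held-out labeled examples
print(classify(model, "apple jam recipe"))   # 3. classify an unlabeled example
```

With only four training examples this says nothing about real-world accuracy; it only shows the shape of the workflow.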

Crowd Sourcing: Alternative Platforms

See the GATE NLP crowd sourcing tutorial:

Amazon Mechanical Turk is now US only.

CrowdFlower is available; sign-up is free.

Cannot launch tasks with no payment.

SurveyMonkey has a free trial, but surveys are not true crowd sourcing.

Prolific is geared towards research; no free trial.

CrowdCrafting is entirely free, but tasks must be unpaid.


Sign up and create a new project.


Tasks can be populated by media from cloud storage.


Various task templates are available.


Can import from YouTube playlists and Twitter searches.


Pastebin can be used to host .csv files easily.
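A minimal sketch of producing such a .csv file of tasks, one row per video, ready to be pasted into a hosting service. The column names `video_url` and `question` and the example URLs are assumptions for illustration; the columns a given platform expects may differ.

```python
import csv
import io


def tasks_to_csv(video_urls):
    """Build CSV text with one task row per video URL."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    # Hypothetical column names; check what the chosen platform expects.
    writer.writerow(["video_url", "question"])
    for url in video_urls:
        writer.writerow([url, "Is this video about food or about technology firms?"])
    return buffer.getvalue()


print(tasks_to_csv([
    "https://example.org/video1",
    "https://example.org/video2",
]))
```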


Editing tasks can be straightforward...


...sometimes more advanced editing is needed.

A Crowd Sourcing Task...

In groups, think of a crowd-sourcing task that would be of genuine use or interest.

How (in)tractable would your task be to automation?

How would you get data to populate the task?

How would you find workers to complete the task?

Could the task be framed such that there are non-financial incentives?


Our Data, Ourselves at PyData 2015:

