just because you can doesn't mean that you should - thingmonk 2016

Just because you can doesn’t mean that you should Dr. Boris Adryan @BorisAdryan

Upload: boris-adryan

Post on 16-Apr-2017




Talking about inflated expectations

“I wonder if anyone is making money with IoT”

“There may be money in IoT”

“I’m going to get rich with IoT”

“I’m making a decent salary with IoT”

The logarithmic history of things

Boris the Academic “Give me £50M and I’ll build you the best IoT ontology money can buy.”

Boris the Freelancer “If you want to pay £5M for machine learning - make sure it isn’t rude or annoying.”

Boris at Zühlke “Don’t pay anyone £0.5M - I’ll show you how we can do it for half.”

peanuts: “a spoonful”

How many peanuts is that on average?

[Figure: peanut counts on a scale from 0 to 100, with the “on average” estimate from 3 samples]

Do I get more peanuts at Thing Monk or at Monki Gras?

[Figure: “on average” for thingmonk and for monkigras on a scale from 0 to 100, shown for 3 samples, 4 samples and n samples]

statistical power through large numbers of samples

[Figure label: deviation]

Statisticians and data scientists LOVE larger sample sizes!

…but if sampling costs time and resources, we need a compromise.
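
The compromise can be put into numbers with the standard error of the mean, which shrinks with the square root of the sample size. The following is a minimal sketch, not something from the talk: the spread of 15 peanuts per spoonful is an invented figure, and the function names are illustrative only.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean for n independent samples."""
    return sd / math.sqrt(n)


def samples_needed(sd: float, margin: float, z: float = 1.96) -> int:
    """Smallest n whose ~95% confidence half-width (z * SE) is <= margin."""
    return math.ceil((z * sd / margin) ** 2)


if __name__ == "__main__":
    SD = 15.0  # assumed spread of peanuts per spoonful (hypothetical)
    for n in (3, 4, 30, 300):
        print(f"n={n:>3}: standard error = {standard_error(SD, n):.2f}")
    # Halving the tolerated margin roughly quadruples the samples required.
    print(samples_needed(SD, margin=5.0), samples_needed(SD, margin=2.5))
```

Going from 3 to 4 samples barely changes the estimate's uncertainty, while each further halving of the margin costs about four times as many samples. That is where sampling cost starts to matter.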

precision and accuracy that can be achieved theoretically

Sampling strategy

precision and accuracy that is needed to get a job done

accurate and precise

not accurate, but precise

accurate, not precise

not what you want
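
One way to quantify the four cases is to read accuracy as bias (how far the mean of the readings is from the true value) and precision as spread (their standard deviation). The sketch below uses invented readings and a hypothetical helper name; it illustrates the distinction and is not taken from the slides.

```python
import statistics


def bias_and_spread(readings, true_value):
    """Return (bias, spread): bias reflects accuracy, spread reflects precision."""
    bias = statistics.mean(readings) - true_value
    spread = statistics.stdev(readings)
    return bias, spread


TRUE_VALUE = 50.0  # hypothetical ground truth

# Invented readings, one list per case named on the slide.
cases = {
    "accurate and precise": [49.0, 50.0, 51.0, 50.0],
    "not accurate, but precise": [59.0, 60.0, 61.0, 60.0],
    "accurate, not precise": [30.0, 70.0, 40.0, 60.0],
    "not what you want": [45.0, 95.0, 55.0, 85.0],
}

for label, readings in cases.items():
    bias, spread = bias_and_spread(readings, TRUE_VALUE)
    print(f"{label:<26} bias={bias:+6.1f} spread={spread:5.1f}")
```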

Deployment patterns and analytics strategies to maximise profit

Dr. Boris Adryan @BorisAdryan

39% of survey participants are worried about the upfront investment for an industrial IoT solution.

“Why aren’t you doing IoT?”

Sweetening IoT for your customer

A few recommendations from the trenches:

• how to cut down on hardware costs

• how to cut down on software costs

insights from a project with OpenSensors

Westminster Parking Trial

https://www.westminster.gov.uk/new-trial-improve-conditions-disabled-drivers

IoT solution

Service company

access to ~750 independent parking lots with a total of >3,500 individual spaces

Can we learn an optimal deployment and sampling pattern?

• sampling rate of 5-10 min

• data over 2 weeks in May 2015

• overall 2.6 million data points

Can we make Ethos’ budget go further by

• distributing a given number of sensors over a wider geographic area?

• lowering the sampling rate for better battery life?

labour: expensive

sensor: cheap

Correlation and clustering

[Figure: three example plots labelled “correlated”, “anti-correlated” and “independent”]
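
The three cases can be told apart with the Pearson correlation coefficient, which is close to +1, -1 and 0 respectively. This is a small sketch with invented occupancy counts; the talk does not say which correlation measure was used, so Pearson is an assumption here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented occupancy counts, for illustration only.
lot_a = [2, 4, 8, 12, 16, 18, 14, 6]
lot_b = [1, 3, 7, 11, 15, 17, 13, 5]    # rises and falls together with lot_a
lot_c = [18, 16, 12, 8, 4, 2, 6, 14]    # fills up when lot_a empties
lot_d = [10, 4, 12, 6, 6, 10, 8, 8]     # no linear relationship with lot_a

for label, other in [
    ("correlated", lot_b),
    ("anti-correlated", lot_c),
    ("independent", lot_d),
]:
    print(f"{label:<16} r = {pearson(lot_a, other):+.2f}")
```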

[Figure: example items lorry, coach, car, bike and skateboard]

hierarchical clustering on the basis of a feature matrix
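
To illustrate the idea, here is a bare-bones agglomerative clustering with single linkage in plain Python. The feature matrix (wheels, length in metres, motorised yes/no) is invented for the five vehicles on the slide, and single linkage is one of several possible linkage rules; the talk does not specify which features or linkage were used.

```python
import math
from itertools import combinations

# Hypothetical feature matrix: (wheels, length in metres, motorised 0/1).
features = {
    "lorry": (6, 10.0, 1),
    "coach": (6, 12.0, 1),
    "car": (4, 4.5, 1),
    "bike": (2, 1.8, 0),
    "skateboard": (4, 0.8, 0),
}


def cluster_distance(points, a, b):
    """Single linkage: distance between the closest members of two clusters."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def single_linkage(points):
    """Merge the two closest clusters until one is left.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(name,) for name in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(points, *pair),
        )
        merges.append((a, b, cluster_distance(points, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


for a, b, d in single_linkage(features):
    print(f"merge {a} + {b} at distance {d:.2f}")
```

The merge order depends entirely on the chosen features and their scaling, so in practice the columns would usually be normalised first.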

Good news: temporal occupancy pattern roughly predicts neighbours

lots in Southampton

lots around the corner of each other

750 parking lots

A caveat: Is a high degree of correlation a function of parking lot size?

finding two lots of 20 spaces that correlate

finding two lots of 3 spaces that correlate

[Figure: occupancy over the day (0:00 to 23:59) for both cases, labelled “more likely” and “less likely”]

Bootstrapping in DBSCAN clusters

Simulation: Swap the occupancy vectors between parking lots of similar size and test per grid cell whether the lots still correlate.

Verdict: In some grid cells, the occupancy level of one parking lot predicts the occupancy of most parking spaces.
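
The swap test can be sketched as a resampling procedure: compare the mean pairwise correlation of the lots in one grid cell against the same statistic for randomly drawn lots of similar size. This is one possible reading of the slide, not the project's actual code; the function names, the pool of "similar-sized" lots and all data below are assumptions.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )
    return cov / denom if denom else 0.0


def mean_pairwise_correlation(vectors):
    pairs = list(combinations(vectors, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def permutation_p_value(cell_vectors, pool_vectors, rounds=1000, seed=0):
    """Share of random draws from the pool that correlate at least as
    strongly as the lots actually found in the grid cell."""
    rng = random.Random(seed)
    observed = mean_pairwise_correlation(cell_vectors)
    hits = 0
    for _ in range(rounds):
        swapped = rng.sample(pool_vectors, len(cell_vectors))
        if mean_pairwise_correlation(swapped) >= observed:
            hits += 1
    return hits / rounds


if __name__ == "__main__":
    rng = random.Random(1)
    daily_pattern = [0, 2, 6, 10, 12, 10, 6, 2]  # invented shared rhythm
    # Lots in one grid cell: shared pattern plus a little noise.
    cell = [[v + rng.uniform(-1, 1) for v in daily_pattern] for _ in range(3)]
    # Similar-sized lots from anywhere: here simply random occupancy.
    pool = [[rng.uniform(0, 12) for _ in daily_pattern] for _ in range(30)]
    print(permutation_p_value(cell, pool))
```

A small value suggests that the correlation within the cell is unlikely to be an artefact of lot size alone.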


Better for navigation

We suggested that about ONE THIRD of the sensors may be sufficient.

Better predictive power

Suggested technology for trials

A temporary survey would have allowed us to make the same recommendation, including the insight that the provided 5-minute resolution is probably not required.
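
Whether a coarser resolution is good enough can be checked on survey data by thinning the series and measuring what is lost. A minimal sketch, assuming a lot is sampled every 5 minutes and the thinned series is rebuilt by holding the last reading; the occupancy values and helper names are invented.

```python
def downsample(series, step):
    """Keep every `step`-th reading (step=6 turns 5-minute into 30-minute data)."""
    return series[::step]


def hold_last_value(sparse, step, length):
    """Rebuild a full-resolution series by repeating each kept reading."""
    return [sparse[i // step] for i in range(length)]


def mean_absolute_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Invented occupancy of one lot, sampled every 5 minutes for two hours.
five_minute = [3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6,
               6, 6, 6, 7, 7, 7, 6, 6, 6, 5, 5, 5]

for step in (2, 3, 6):
    sparse = downsample(five_minute, step)
    rebuilt = hold_last_value(sparse, step, len(five_minute))
    error = mean_absolute_error(five_minute, rebuilt)
    print(f"every {5 * step:>2} min: mean error = {error:.2f} spaces")
```

If the error stays well below what matters for the decision at hand, the longer interval buys battery life at little cost.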

Monte Carlo simulations are great tools to assess the business value of IoT

[Figure: a base connected to a set of assets]

“A tour of my assets every Friday.”

‘cost function’: sum of all edges

“A demand-driven tour of my assets.”

‘cost function’: sum of edges needed in 7 days

[Figure: each asset carries its own probability, p1(need today) to p6(need today)]
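
The comparison between the two tours can be sketched as a small Monte Carlo simulation. Everything concrete below is assumed for illustration: the asset positions, the daily probabilities, and the simplification that stops are visited in list order rather than along an optimised route.

```python
import math
import random

BASE = (0.0, 0.0)


def tour_cost(stops):
    """Length of base -> stops in the given order -> base (sum of edges)."""
    if not stops:
        return 0.0
    route = [BASE, *stops, BASE]
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))


def simulate_week(assets, rng):
    """Seven demand-driven days: each day visit only the assets in need."""
    total = 0.0
    for _ in range(7):
        in_need = [pos for pos, p in assets if rng.random() < p]
        total += tour_cost(in_need)
    return total


def expected_weekly_cost(assets, weeks=10_000, seed=42):
    """Average demand-driven cost per week over many simulated weeks."""
    rng = random.Random(seed)
    return sum(simulate_week(assets, rng) for _ in range(weeks)) / weeks


if __name__ == "__main__":
    # Hypothetical assets: (x, y) position in km, daily probability of need.
    assets = [
        ((3.0, 4.0), 0.05), ((6.0, 8.0), 0.10), ((6.0, 0.0), 0.02),
        ((-2.0, 5.0), 0.20), ((-4.0, -3.0), 0.05), ((1.0, -6.0), 0.10),
    ]
    friday_tour = tour_cost([pos for pos, _ in assets])
    print(f"fixed Friday tour:  {friday_tour:.1f} km per week")
    print(f"demand-driven tour: {expected_weekly_cost(assets):.1f} km per week")
```

Which strategy wins depends on the probabilities: with rarely failing assets the demand-driven tour is shorter, while frequent needs can make daily trips more expensive than one weekly round.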

Hardware is often perceived as an investment that customers understand, so they anticipate its cost.

This talk is about unfounded IoT fears.

There’s an air of magic around data and analytics.

“My data problem must be special!”

Customers fear costs because they’re thinking about:

✓ unstructured data

✓ distributed ingestion and storage

Or they believe from hearsay that IoT automatically requires:

✓ real-time analytics

✓ sophisticated machine learning

My company went to an IoT conference & all I got was this t-shirt and a bunch of buzzwords.

“I need to do real-time analytics!”

Time scales: microseconds to seconds; seconds to minutes; minutes to hours; hours to weeks

Where the analytics runs: on device; on stream; in batch (in process; in database)

Example questions: “am I falling? counteract”; “battery level: should I land?”; “how many times did I stall?”; “what’s the best weather for flying?”

Type of insight: operational insight; performance insight; strategic insight

Example methods: e.g. Kalman filter; e.g. with machine learning; e.g. rules engine; e.g. summary stats
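
Of the methods named above, the Kalman filter is the one typically small enough to run on a device. As a rough illustration, here is a scalar Kalman filter that smooths noisy readings of a value assumed to stay roughly constant. The sensor readings and variances are invented, and the function name is hypothetical.

```python
def kalman_1d(measurements, process_var, measurement_var,
              initial_estimate=0.0, initial_var=1.0):
    """Scalar Kalman filter for a roughly constant value.

    Returns the list of estimates after each measurement.
    """
    x, p = initial_estimate, initial_var
    estimates = []
    for z in measurements:
        # Predict: the model is "value stays the same", uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement via the Kalman gain.
        k = p / (p + measurement_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    # Invented noisy altitude readings in metres.
    readings = [10.4, 9.7, 10.1, 10.6, 9.5, 10.2, 9.9, 10.3]
    smoothed = kalman_1d(readings, process_var=0.01, measurement_var=0.25,
                         initial_estimate=readings[0])
    for z, x in zip(readings, smoothed):
        print(f"measured {z:5.1f}  estimated {x:5.2f}")
```

The loop needs only a handful of multiplications per reading and no stored history.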

Can IoT ever be real-time?

zone 1: real-time [µs]

zone 2: real-time [ms]

zone 3: real-time [s]

Edge, fog and cloud computing

Edge

Pro:
- immediate compression from raw data to actionable information
- cuts down traffic
- fast response

Con:
- loses potentially valuable raw data
- developing analytics on embedded systems requires specialists
- compute costs valuable battery life

Fog

Pro:
- same as Edge
- closer to ‘normal’ development work
- gateways often mains-powered

Con:
- loses potentially valuable raw data

Cloud

Pro:
- compute power
- scalability
- familiarity for developers
- integration centre across all data sources
- cheapest ‘real-time’ option

Con:
- traffic

Options for real-time in cloud

some features can cost a bit, especially when you don’t really know what you’re doing and want to ‘try it out’.

a badly configured SMACK stack on your own commodity hardware can be slow and unreliable

your pre-trained classifier

My current pet hate: Deep Learning

Deep learning has delivered impressive results mimicking human reasoning, strategic thinking and creativity.

At the same time, big players have released libraries such that even ‘script kiddies’ can apply deep learning.

This is already leading to uncritical use of deep learning where other methods would be more appropriate.

Summary

‣ Preliminary surveys, data analysis and simulation can help to minimise the number of sensors and develop an optimal deployment strategy and sampling schedule.

‣ Faster analytics on bigger and better hardware are not automatically the most useful solution.

‣ A good understanding of the type of insight that is required by the business model is essential.

Zühlke can advise on options around IoT and data analytics, and provide complete solutions where needed.

Dr. Boris Adryan @BorisAdryan