data proximity: simple solutions to complex data science problems

Data Proximity: Simple solutions to complex data science problems. Jose A. Rodriguez-Serrano (@bbvadata), Ph.D. in Computer Science, Lead Data Scientist at BBVA Data & Analytics

Upload: jose-a-rodriguez-serrano

Post on 22-Jan-2017


TRANSCRIPT

Page 1: Data Proximity: Simple Solutions to Complex Data Science Problems

Data Proximity: Simple solutions to complex data science problems

Jose A. Rodriguez-Serrano, @bbvadata

Ph.D. in Computer Science

Lead Data Scientist at BBVA Data & Analytics

Page 2: Data Proximity: Simple Solutions to Complex Data Science Problems

Data science: solving problems with data (and computers)

Page 3: Data Proximity: Simple Solutions to Complex Data Science Problems

Problem 1. Undoing a traffic jam

CC https://www.flickr.com/photos/proust/

Page 4: Data Proximity: Simple Solutions to Complex Data Science Problems

Problem 2: Where was each of these pictures taken?

(GPS coordinates if possible)

Page 5: Data Proximity: Simple Solutions to Complex Data Science Problems

Problem 3: Forecast the next value of anything

?

Page 6: Data Proximity: Simple Solutions to Complex Data Science Problems

How would you solve these 3 problems?

If you had to solve all 3 problems at the same time, would you think differently?

Page 7: Data Proximity: Simple Solutions to Complex Data Science Problems

They can all be addressed with the same solution!

Dilemma:

Best solution for each problem vs. 1 acceptable solution for all the problems

Page 8: Data Proximity: Simple Solutions to Complex Data Science Problems

[Figure: six traffic sensors]

Sensors measure current traffic “state”

Page 9: Data Proximity: Simple Solutions to Complex Data Science Problems

[Figure: six traffic sensors]

Timestamp | State | Action that solved
23/09/13 18:00 | [81 54 53 9 17 98 1 20 …] | OPEN BUS GATE
25/09/13 08:54 | [154 53 91 17 98 1 20 …] | DISPLAY ALT ROUTE
25/08/13 17:56 | [23 87 65 87 24 89 89 …] | ALTER TRAFFIC LIGHT
28/08/13 20:00 | [81 34 53 9 27 98 1 20 …] | DISPLAY EVENT INFO

Sensors measure current traffic “state”

(Large) Database of Traffic Problems, States, and Solutions

Next time: Find most similar traffic state, and apply registered action.

E.g. Mounce et al., A metric for pattern-matching applications to traffic management, Transportation Research C, 2010
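The lookup described on this slide can be sketched in a few lines. This is a minimal illustration, not the metric from the cited paper: the Euclidean distance, the function name `suggest_action`, and the `HISTORY` records (loosely modeled on the slide's table and padded with made-up numbers) are all assumptions.

```python
import math

# Hypothetical log of past incidents: (sensor readings, action that solved it).
# Values are illustrative only; the slide's real state vectors are truncated.
HISTORY = [
    ([81, 54, 53, 9, 17, 98, 1, 20], "OPEN BUS GATE"),
    ([15, 53, 91, 17, 98, 1, 20, 44], "DISPLAY ALT ROUTE"),
    ([23, 87, 65, 87, 24, 89, 89, 12], "ALTER TRAFFIC LIGHT"),
    ([81, 34, 53, 9, 27, 98, 1, 70], "DISPLAY EVENT INFO"),
]


def euclidean(a, b):
    """Euclidean distance between two equal-length sensor vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def suggest_action(current_state, history=HISTORY):
    """Return the action registered for the most similar past traffic state."""
    _, action = min(history, key=lambda record: euclidean(current_state, record[0]))
    return action


# A new reading that resembles the first logged state.
print(suggest_action([80, 50, 55, 10, 18, 95, 2, 22]))
```

In practice the choice of similarity measure matters more than the lookup itself, which is why the slide cites work on a dedicated traffic metric.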

Page 10: Data Proximity: Simple Solutions to Complex Data Science Problems

Geolocalizing images just with data

Geotagged image database (e.g. Flickr)

E.g. Hays, Efros, IM2GPS: estimating geographic information from a single image, CVPR 2008

Most similar geotagged images

Find mode of locations
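The two steps on this slide (retrieve the most similar geotagged images, then find the mode of their locations) might look like the sketch below. It assumes images are already reduced to feature vectors, and it approximates the mode by binning coordinates into a coarse grid, which is a simplification chosen for brevity rather than the procedure of the cited paper. `GEOTAGGED`, `estimate_location`, and `cell_deg` are hypothetical names.

```python
import math
from collections import Counter

# Hypothetical database: (image feature vector, (latitude, longitude)).
GEOTAGGED = [
    ([0.90, 0.10, 0.30], (48.858, 2.294)),
    ([0.80, 0.20, 0.30], (48.860, 2.296)),
    ([0.85, 0.15, 0.25], (48.857, 2.293)),
    ([0.10, 0.90, 0.70], (40.689, -74.045)),
    ([0.20, 0.80, 0.60], (40.690, -74.044)),
    ([0.70, 0.30, 0.40], (41.403, 2.174)),
]


def distance(a, b):
    """Euclidean distance between two feature vectors (an assumed similarity)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def grid_cell(location, cell_deg):
    """Map a (lat, lon) pair to a coarse grid cell."""
    lat, lon = location
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def estimate_location(query_features, database=GEOTAGGED, k=4, cell_deg=1.0):
    """Estimate (lat, lon) from the densest grid cell among the k nearest images."""
    neighbors = sorted(database, key=lambda rec: distance(query_features, rec[0]))[:k]
    # Approximate the mode: count neighbors per grid cell and keep the fullest cell.
    counts = Counter(grid_cell(loc, cell_deg) for _, loc in neighbors)
    best_cell, _ = counts.most_common(1)[0]
    members = [loc for _, loc in neighbors if grid_cell(loc, cell_deg) == best_cell]
    lat = sum(p[0] for p in members) / len(members)
    lon = sum(p[1] for p in members) / len(members)
    return lat, lon


print(estimate_location([0.88, 0.12, 0.30]))
```

Taking the mode rather than the mean keeps one distant outlier among the neighbors from dragging the estimate into the middle of nowhere.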

Page 11: Data Proximity: Simple Solutions to Complex Data Science Problems

Forecast
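The slide itself carries only a figure, so the following is one plausible reading of how the same idea applies to forecasting: find the past windows that look most like the latest one, and reuse what came next. The function name `forecast_next`, the squared-difference distance, and the `window` and `k` parameters are assumptions made for this sketch.

```python
def forecast_next(series, window=3, k=2):
    """Predict the next value as the mean successor of the k most similar past windows."""
    if len(series) < window + 1:
        raise ValueError("series too short for the chosen window")
    query = series[-window:]
    candidates = []
    # Consider every past window that has a known successor
    # (this excludes the final window, which is the query itself).
    for start in range(len(series) - window):
        past = series[start:start + window]
        successor = series[start + window]
        dist = sum((a - b) ** 2 for a, b in zip(past, query))
        candidates.append((dist, successor))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(successor for _, successor in nearest) / len(nearest)


# A repeating pattern: after 1, 2, 3 the series has always continued with 4.
print(forecast_next([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3]))
```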

Page 12: Data Proximity: Simple Solutions to Complex Data Science Problems

Reasoning from “neighbor transfer”

A design pattern to quickly build data science applications

1/ Find a similar situation in your data (neighbors)

2/ Take the solution/action/output that was registered
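As a rough sketch, the two steps can be written once and reused across problems by passing in the similarity measure and the rule for combining neighbor outputs. The names `neighbor_transfer`, `distance`, and `combine`, and both toy databases, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from statistics import mean


def neighbor_transfer(query, database, distance, combine, k=1):
    """Step 1: find the k nearest records. Step 2: combine their registered outputs."""
    nearest = sorted(database, key=lambda rec: distance(query, rec[0]))[:k]
    return combine([output for _, output in nearest])


def euclidean(a, b):
    """Euclidean distance; any problem-specific similarity could be swapped in."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def majority(outputs):
    """Most frequent output among the neighbors (for categorical outputs)."""
    return Counter(outputs).most_common(1)[0][0]


# Toy databases of (situation, registered output) pairs.
labels_db = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
numbers_db = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]

# Same function, two kinds of problem: a categorical decision and a numeric estimate.
print(neighbor_transfer((5, 4), labels_db, euclidean, majority, k=3))
print(neighbor_transfer((2.4,), numbers_db, euclidean, mean, k=2))
```

Only `distance` and `combine` change from one problem to the next, which is what makes the pattern cheap to reuse.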

Page 13: Data Proximity: Simple Solutions to Complex Data Science Problems

Reasoning from “neighbor transfer”

Neighbor transfer is not new

Crucial enablers:

1/ Lots of data

2/ Good similarity measures

3/ Efficient search (HW & SW)

Page 14: Data Proximity: Simple Solutions to Complex Data Science Problems

Make things as simple as possible, but not simpler

A. Einstein

Page 15: Data Proximity: Simple Solutions to Complex Data Science Problems

Vehicle pose recognition

Rodriguez, Larlus, Dai, Data-driven detection of prominent objects, IEEE Trans. PAMI, 2015

Page 16: Data Proximity: Simple Solutions to Complex Data Science Problems

Neighbor transfer… + deep learning = doable

Rippel et al., Metric learning with adaptive density discrimination, ICLR 2016

Page 17: Data Proximity: Simple Solutions to Complex Data Science Problems

Why should I adopt that?

Page 18: Data Proximity: Simple Solutions to Complex Data Science Problems

When there’s a lot of data, sometimes simple solutions work well.

With big data, it is sometimes difficult even to beat the simple methods.

Page 19: Data Proximity: Simple Solutions to Complex Data Science Problems

Technical Debt Matters

This method is generic, and easy to maintain.

Page 20: Data Proximity: Simple Solutions to Complex Data Science Problems

Any programmer can implement it.

Page 21: Data Proximity: Simple Solutions to Complex Data Science Problems

We often think about scaling to lots of data.

Should we start thinking about scaling to lots of problems?