building intelligent data products

Upload: stephen-whitworth

Post on 19-Feb-2017

TRANSCRIPT

Page 1: Building Intelligent Data Products

building intelligent data products

Page 2: Building Intelligent Data Products

what actually is fraud

architecting flexible data ‘plumbing’

building solid data products on top of them

Page 3: Building Intelligent Data Products

stephen whitworth

2 years at Hailo as data scientist/jack of some trades out of university

product and marketplace analytics, agent based modelling, data engineering, ‘ML’ services

data science/engineering at ravelin, specifically focused on our detection capabilities

Page 4: Building Intelligent Data Products

what is ravelin?

online fraud detection and prevention platform

stream application/server data to our events API

we give fraud probability + beautiful data visualisation

backed by techstars/passion/playfair/amadeus/indeed.com founder/wonga founder amongst other great investors

Page 5: Building Intelligent Data Products

fraud?

Page 6: Building Intelligent Data Products

$14B

a dollar for every year the universe has existed

Page 7: Building Intelligent Data Products

Same day delivery

On-demand services

Page 8: Building Intelligent Data Products

‘victimless crime’

police ill-equipped to handle

low barrier to entry from dark net

3D secure - conversion killer

Page 9: Building Intelligent Data Products

traditional: human generated rules, born of deep expertise

order-centric view of the world

Page 10: Building Intelligent Data Products

hybrid: augment expertise by learning rules from data

cards don’t commit fraud, people do

Page 11: Building Intelligent Data Products

building good plumbing

Page 12: Building Intelligent Data Products

receive firehose through API

decode arbitrary data and store

extract hundreds of features

run through N models and rule engine to get probability

http/slack/whatever notification to customer

in 100-300ms (ish)
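The flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ravelin's implementation: the feature names, the rule threshold, the way model scores and rules are combined, and the `notify` callback are all hypothetical.

```python
import json
import time
from typing import Callable, Dict, List

Features = Dict[str, float]
Model = Callable[[Features], float]  # returns a fraud probability in [0, 1]


def decode(raw: bytes) -> dict:
    """Decode an arbitrary JSON event payload."""
    return json.loads(raw)


def extract_features(event: dict) -> Features:
    """Two hypothetical features; the talk mentions extracting hundreds."""
    return {
        "amount": float(event.get("amount", 0.0)),
        "orders_last_hour": float(event.get("orders_last_hour", 0)),
    }


def rule_engine(features: Features) -> float:
    """Hand-written rule with an illustrative threshold."""
    return 1.0 if features["orders_last_hour"] > 10 else 0.0


def score(raw: bytes, models: List[Model], notify: Callable[[dict], None]) -> dict:
    """decode -> features -> N models + rules -> notify, with timing."""
    start = time.perf_counter()
    features = extract_features(decode(raw))
    model_scores = [model(features) for model in models]
    # Assumption: average the models, and let a firing rule override them.
    probability = max(sum(model_scores) / len(model_scores), rule_engine(features))
    result = {
        "fraud_probability": probability,
        "elapsed_ms": (time.perf_counter() - start) * 1000,
    }
    notify(result)  # stand-in for the http/slack/whatever notification
    return result


def amount_model(features: Features) -> float:
    """Toy stand-in for a trained model."""
    return min(features["amount"] / 1000.0, 1.0)


sent: List[dict] = []
result = score(b'{"amount": 250, "orders_last_hour": 2}', [amount_model], sent.append)
print(result["fraud_probability"], len(sent))
```

Timing the whole path in one place makes it easy to check a latency budget like the 100-300ms figure against real measurements.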

Page 13: Building Intelligent Data Products

BUZZWORDS ABOUND

go

postgres

AWS

microservices

zookeeper

NSQ

python

event-driven

elasticsearch

bigquery

dynamodb

redis

Page 14: Building Intelligent Data Products
Page 15: Building Intelligent Data Products

instrumentation

Page 16: Building Intelligent Data Products

different databases for different needs

kudos if you get The Office reference

Page 17: Building Intelligent Data Products

postgres: solid, start here

dynamodb: very high throughput, low latency data

bigquery: to answer any question you could possibly have

elasticsearch: rich querying in a reasonable amount of time

graph db: haven’t decided, recommendations?

Page 18: Building Intelligent Data Products

asynchronous systems

firehoses

nice deployment patterns

‘lambda architecture’ - the append only log

services store their own interpretation of events

services are almost entirely decoupled
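A minimal in-memory sketch of the append-only log idea, where each service keeps its own offset and its own interpretation of the same events. The class names and event fields are hypothetical, and a real system would use a durable queue or log rather than a Python list.

```python
from typing import Dict, List


class EventLog:
    """Append-only log: events are only ever added, never changed or removed."""

    def __init__(self) -> None:
        self._events: List[dict] = []

    def append(self, event: dict) -> int:
        self._events.append(dict(event))  # copy so callers cannot mutate history
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset: int) -> List[dict]:
        return [dict(event) for event in self._events[offset:]]


class OrderCountService:
    """Interprets the log as 'orders per customer'."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.offset = 0
        self.orders: Dict[str, int] = {}

    def catch_up(self) -> None:
        for event in self.log.read_from(self.offset):
            if event["type"] == "order":
                customer = event["customer"]
                self.orders[customer] = self.orders.get(customer, 0) + 1
            self.offset += 1


class SpendService:
    """Interprets the same log as 'total spend per customer'."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.offset = 0
        self.spend: Dict[str, float] = {}

    def catch_up(self) -> None:
        for event in self.log.read_from(self.offset):
            if event["type"] == "order":
                customer = event["customer"]
                self.spend[customer] = self.spend.get(customer, 0.0) + event["amount"]
            self.offset += 1


log = EventLog()
counts = OrderCountService(log)
log.append({"type": "order", "customer": "alice", "amount": 20.0})
log.append({"type": "login", "customer": "alice"})
counts.catch_up()
log.append({"type": "order", "customer": "alice", "amount": 5.0})
log.append({"type": "order", "customer": "bob", "amount": 7.5})
counts.catch_up()

# A service deployed later replays the log from offset 0 to build its own view.
spend = SpendService(log)
spend.catch_up()
print(counts.orders, spend.spend)
```

Neither service knows the other exists, which is the decoupling the slide describes; the cost is that the log itself has no idea who is reading it.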

Page 19: Building Intelligent Data Products

asynchronous systems

firehoses

error propagation is challenging

no guarantees of SLA - at least as slow as your queue

hard to know who or what is consuming your data

Page 20: Building Intelligent Data Products

building data products

Page 21: Building Intelligent Data Products

‘a random forest is like a room full of experts who have seen different cases of fraud from different perspectives’
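The analogy can be made concrete with a drastically simplified stand-in for a random forest: each "expert" is a one-threshold decision stump trained on a bootstrap sample ("different cases") using one randomly chosen feature ("different perspectives"), and the room votes. The data and feature meanings are made up for illustration; a real forest grows full decision trees.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[float, float]  # hypothetical features: (order amount, orders in last hour)
Stump = Tuple[int, float]  # (feature index, threshold)


def train_stump(rows: Sequence[Row], labels: Sequence[int], feature: int) -> Stump:
    """Pick the threshold on one feature that misclassifies the fewest rows.

    The stump predicts fraud when the feature value is >= the threshold.
    """
    best_errors, best_threshold = None, None
    for threshold in sorted({row[feature] for row in rows}):
        errors = sum(
            (row[feature] >= threshold) != bool(label)
            for row, label in zip(rows, labels)
        )
        if best_errors is None or errors < best_errors:
            best_errors, best_threshold = errors, threshold
    return feature, best_threshold


def train_forest(rows, labels, n_trees: int, rng: random.Random) -> List[Stump]:
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: each expert sees a different set of cases.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random feature: each expert looks from a different perspective.
        feature = rng.randrange(len(rows[0]))
        forest.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def fraud_vote(forest: List[Stump], row: Row) -> float:
    """Fraction of experts who vote 'fraud' for this row."""
    votes = sum(row[feature] >= threshold for feature, threshold in forest)
    return votes / len(forest)


legit = [(10.0, 1.0), (20.0, 2.0), (35.0, 1.0), (50.0, 3.0), (15.0, 2.0)]
fraud = [(400.0, 8.0), (650.0, 12.0), (900.0, 15.0), (500.0, 9.0), (700.0, 10.0)]
rows = legit + fraud
labels = [0] * len(legit) + [1] * len(fraud)

forest = train_forest(rows, labels, n_trees=25, rng=random.Random(0))
print(fraud_vote(forest, (1000.0, 20.0)), fraud_vote(forest, (5.0, 0.0)))
```

Because the output is a fraction of votes rather than a yes/no answer, it can be read as a rough score and thresholded to trade conversion against fraud loss.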

Page 22: Building Intelligent Data Products

‘a random forest is like a room full of experts who have seen different cases of fraud from different perspectives’

N

Page 23: Building Intelligent Data Products

precision: of everything I flagged as fraud, what % actually was fraud?

recall: out of all of the fraudsters, what % did I catch?

implicit tradeoff between conversion and fraud loss

‘accuracy’ a useless metric for fraud
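A small worked example of why accuracy misleads on imbalanced data. The numbers are assumptions chosen for illustration: 1,000 orders with a 0.2% fraud rate. A "model" that never flags anything scores 99.8% accuracy while catching no fraud at all.

```python
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision(y_true, y_pred) -> float:
    """Of everything flagged as fraud, the share that really was fraud."""
    tp, fp, _, _ = confusion(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred) -> float:
    """Of all real fraud, the share that was flagged."""
    tp, _, fn, _ = confusion(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(y_true, y_pred) -> float:
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


# 1 = fraud, 0 = legitimate. Two fraudulent orders out of 1,000.
y_true = [1, 1] + [0] * 998

# Never flags anything.
never_flag = [0] * 1000

# Catches one of the two fraudsters, at the cost of three false alarms.
some_model = [1, 0] + [1] * 3 + [0] * 995

for name, y_pred in [("never_flag", never_flag), ("some_model", some_model)]:
    print(name, accuracy(y_true, y_pred), precision(y_true, y_pred), recall(y_true, y_pred))
```

The second model is the more useful one, yet its accuracy is lower. Precision and recall expose the tradeoff directly: false alarms cost conversion, missed fraud costs money.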

Page 24: Building Intelligent Data Products

99.8% ACCURATE

Page 25: Building Intelligent Data Products
Page 26: Building Intelligent Data Products

keep model interfaces simple

hide arbitrarily complex transformations behind it

blend global and client specific models
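One possible shape for such an interface, sketched under assumptions: a single `predict(event)` method that hides all feature transformations, and a blender that mixes a global model with a client-specific one using a fixed weight. The class names, the `client_id` field and the weighting scheme are hypothetical, not taken from the talk.

```python
from typing import Dict, Protocol


class Model(Protocol):
    """The whole interface: raw event in, fraud probability out."""

    def predict(self, event: dict) -> float: ...


class AmountModel:
    """Toy model; the transformation stays hidden behind predict()."""

    def __init__(self, scale: float) -> None:
        self.scale = scale

    def _transform(self, event: dict) -> float:
        return float(event.get("amount", 0.0))

    def predict(self, event: dict) -> float:
        return min(self._transform(event) / self.scale, 1.0)


class BlendedModel:
    """Weighted mix of a global model and an optional per-client model."""

    def __init__(
        self,
        global_model: Model,
        client_models: Dict[str, Model],
        client_weight: float = 0.5,
    ) -> None:
        self.global_model = global_model
        self.client_models = client_models
        self.client_weight = client_weight

    def predict(self, event: dict) -> float:
        global_score = self.global_model.predict(event)
        client_model = self.client_models.get(event.get("client_id", ""))
        if client_model is None:
            # New clients with no model of their own fall back to the global one.
            return global_score
        w = self.client_weight
        return (1 - w) * global_score + w * client_model.predict(event)


blended = BlendedModel(
    global_model=AmountModel(scale=1000.0),
    client_models={"client-a": AmountModel(scale=250.0)},
)
print(blended.predict({"client_id": "client-a", "amount": 200.0}))
print(blended.predict({"client_id": "client-b", "amount": 200.0}))
```

Because `BlendedModel` satisfies the same interface as the models it wraps, callers never need to know whether they are talking to one model or several.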

Page 27: Building Intelligent Data Products

building and training statistical models

currently batch

will combine with online

Page 28: Building Intelligent Data Products

RANDOM FORESTS

Page 29: Building Intelligent Data Products

‘a random forest is like a room full of experts who have seen different cases of fraud from different perspectives’

Page 30: Building Intelligent Data Products

RANDOM FORESTS

MONITORING

Page 31: Building Intelligent Data Products

probabilistic, not deterministic

dogfood - use live robot customers

run models in ‘dark mode’ to determine performance
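A sketch of what running a model in "dark mode" could look like: candidate models score the same live events, their outputs are logged, and only the live model's score is ever returned. All names here are hypothetical. Measuring real performance would also need the eventual fraud labels; this sketch only records how far each dark model disagrees with the live one.

```python
from typing import Callable, Dict, List, Optional, Tuple

Scorer = Callable[[dict], float]
LogEntry = Tuple[str, float, Optional[float]]  # (dark model name, live score, dark score)


class DarkModeScorer:
    def __init__(self, live: Scorer, dark: Dict[str, Scorer]) -> None:
        self.live = live
        self.dark = dark
        self.log: List[LogEntry] = []

    def score(self, event: dict) -> float:
        live_score = self.live(event)
        for name, model in self.dark.items():
            try:
                dark_score: Optional[float] = model(event)
            except Exception:
                # A failing dark model must never affect the live response.
                dark_score = None
            self.log.append((name, live_score, dark_score))
        return live_score  # customers only ever see the live model


def mean_disagreement(log: List[LogEntry], name: str) -> Optional[float]:
    """Average absolute gap between one dark model and the live model."""
    gaps = [abs(live - dark) for n, live, dark in log if n == name and dark is not None]
    return sum(gaps) / len(gaps) if gaps else None


def live_model(event: dict) -> float:
    return 0.5


def candidate_model(event: dict) -> float:
    return 0.75


def broken_model(event: dict) -> float:
    raise RuntimeError("bad deploy")


scorer = DarkModeScorer(live_model, {"candidate": candidate_model, "broken": broken_model})
returned = scorer.score({"order_id": 1})
print(returned, mean_disagreement(scorer.log, "candidate"))
```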

Page 32: Building Intelligent Data Products

why not deep learning? ..yet

ability to debug random forests

had nice results with keras

Page 33: Building Intelligent Data Products

serialisation and deployment: an unsolved problem

Page 34: Building Intelligent Data Products

in beta and signing up clients

looking for on-demand services/marketplaces

talk to me afterwards

Page 35: Building Intelligent Data Products

obligatory: we are hiring!

senior machine learning engineers/data scientists

[email protected] or talk to me after

Page 36: Building Intelligent Data Products

@sjwhitworth

www.ravelin.com - @ravelinhq