Deep Learning for Fraud Detection

Upload: hadoop-summit

Post on 17-Jan-2017

TRANSCRIPT

Page 1: Deep Learning for Fraud Detection

© 2014 MapR Technologies

Deep Learning for Fraud Detection

Page 2: Deep Learning for Fraud Detection

Contact Information

Ted Dunning
Chief Applications Architect at MapR Technologies

Committer & PMC for Apache’s Drill, Zookeeper & others
VP of Incubator at Apache Foundation

Email [email protected] [email protected]

Twitter @ted_dunning

Page 3: Deep Learning for Fraud Detection

Goals for Today

• Explore the state of the art for deep learning and fraud detection
• Separate at least some of the wheat from the chaff
• Provide some realistic guidance for getting results

Page 4: Deep Learning for Fraud Detection

Goals for Today

• Explore the state of the art for deep learning and fraud detection
• Separate at least some of the wheat from the chaff
• Provide some realistic guidance for getting results
• Play with cool stuff!

Page 5: Deep Learning for Fraud Detection

Agenda

• Motivation
• What are neural networks and deep learning?
• It can be simpler than you think
• But, no free lunch / you get what you pay for / other clever aphorism
• Some experiments
• Where to go from here

Page 6: Deep Learning for Fraud Detection

Motivation for Advanced Modeling in Fraud

• Neural networks have completely dominated credit card fraud detection since the late ’80s
  – Random forests and tree ensembles are often used in other kinds of fraud and churn models
• The reason is that rule-based systems simply don’t work
  – Well, they do work at first
  – Fraudsters change tactics, you add rules, interaction mayhem ensues
• And learning algorithms really do work
  – Fraudsters change tactics, you add features and retrain

Page 7: Deep Learning for Fraud Detection

So learning is good

Page 8: Deep Learning for Fraud Detection

So learning is good

But good learning is hard

Page 9: Deep Learning for Fraud Detection

So learning is good

But good learning is hard

And finding good features is really hard

Page 10: Deep Learning for Fraud Detection

Some Sample Features

• Charge size relative to previous averages for card
• Charge size relative to previous average for merchant
• Known merchant or not
• Doubled transaction
• AVS or CVV2 mismatch

Page 11: Deep Learning for Fraud Detection

Some Sample Features

• Charge size relative to previous averages for card
• Charge size relative to previous average for merchant
• Known merchant or not
• Doubled transaction
• Address Verification System or CVV2 mismatch

Page 12: Deep Learning for Fraud Detection

Some Sample Features

• Charge size relative to previous averages for card
• Charge size relative to previous average for merchant
• Known merchant or not
• Doubled transaction
• Address Verification System or Card Verification Value mismatch

Page 13: Deep Learning for Fraud Detection

Some Sample Features

• Charge size relative to previous averages for card
• Charge size relative to previous average for merchant
• Known merchant or not
• Doubled transaction
• Address Verification System or Card Verification Value mismatch
• Unusual region for card
• Unusual time-of-day relative to history
• Magstripe use if chip available
• (hundreds more)
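As a rough illustration of the first feature in the list, here is a minimal sketch of "charge size relative to previous averages for card". The function name, the `floor` guard and the sample amounts are assumptions made for this example, not something taken from the talk.

```python
from statistics import mean

def relative_charge_size(amount, previous_amounts, floor=1.0):
    """Ratio of this charge to the card's average previous charge.

    `floor` guards against dividing by a near-zero average; its value is
    an arbitrary choice for this sketch.
    """
    if not previous_amounts:
        return 1.0  # no history yet: treat the charge as typical
    return amount / max(mean(previous_amounts), floor)

history = [20.0, 35.0, 25.0, 40.0]  # hypothetical past charges for one card
print(relative_charge_size(30.0, history))   # 1.0
print(relative_charge_size(900.0, history))  # 30.0
```

A ratio near 1 says the charge looks like the card's usual spending; a ratio of 30 is the kind of value a model could learn to treat as suspicious.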

Page 14: Deep Learning for Fraud Detection

Sequence-Based Features

• Plausible pattern matching (rent a car, pay for gas at airport)
• Probe transactions (gas in wrong place, pizza, big charge)
• Previous transaction at compromised merchant
• Card velocity

Page 15: Deep Learning for Fraud Detection

Key Problems

• Good guys need data … that means that fraudsters get first chance at bat
• Good guys are careful and test systems before releasing
• Bad guys have many low-risk transactions and can change methods quickly
• In some areas, fraudsters adapt techniques in hours

Page 16: Deep Learning for Fraud Detection

Making up features is easy

Finding features that add real lift is very hard

Page 17: Deep Learning for Fraud Detection

What are neural networks and deep learning?

• Start simple … imagine we have 20 features, 0 or 1
  – Let’s yell “Fraud” if any of the features is a 1
  – Houston, we have a model
• But this model isn’t any better than a rule
• Also doesn’t have any interesting Greek letters
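The starting model on this slide fits in a few lines. This is only a sketch; the function name `yell_fraud` is made up for the example.

```python
def yell_fraud(features):
    """Flag a transaction if any of its binary features is 1."""
    return any(f == 1 for f in features)

clean = [0] * 20
suspicious = [0] * 19 + [1]
print(yell_fraud(clean))       # False
print(yell_fraud(suspicious))  # True
```

Every feature counts equally here, which is why the slide calls it no better than a rule.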

Page 18: Deep Learning for Fraud Detection

Real-world Intrudes

• We assumed all features are equally good
  – What if some are kind of poor or weak?
• Can we weight different features more or less?
  – Can we learn these weights from data?

Page 20: Deep Learning for Fraud Detection

Learning Works

• Yes. We can learn these models
• How we measure error is important
• We must have good features
• Even good features may need transformation
  – Take logs of times and monetary values
  – Subtract means, scale, bin values
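One common way to learn per-feature weights from data is logistic regression trained by gradient descent on log-loss. The sketch below assumes that choice (the talk does not name a specific method) and uses made-up toy data in which feature 0 is informative and feature 1 is noise.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, epochs=200, rate=0.1):
    """Fit weights and a bias by stochastic gradient descent on log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = p - y  # gradient of log-loss with respect to the weighted sum
            bias -= rate * err
            weights = [w - rate * err * v for w, v in zip(weights, x)]
    return weights, bias

# Toy data (hypothetical): feature 0 tracks the label, feature 1 is a coin flip.
random.seed(0)
rows, labels = [], []
for _ in range(200):
    is_fraud = random.random() < 0.3
    f0 = 1 if random.random() < (0.9 if is_fraud else 0.1) else 0
    f1 = 1 if random.random() < 0.5 else 0
    rows.append([f0, f1])
    labels.append(1 if is_fraud else 0)

weights, bias = train_logistic(rows, labels)
print(weights)  # the weight on feature 0 should dominate
```

The error measure matters here: log-loss punishes confident wrong answers heavily, which is one reason it is a popular default for this kind of model.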

Page 21: Deep Learning for Fraud Detection

Not Good Enough

• We need combinations of models
• Simple linear combinations aren’t subtle enough
• Enter multi-level models
  – Can we learn a model that uses combinations of inputs
  – Where each of those combinations is a model that we learn?

Page 22: Deep Learning for Fraud Detection

Yes, Virginia, There IS a Santa Claus

Each circle is a sum and a (soft) threshold

Arrows are multiplication by a learned weight

Page 23: Deep Learning for Fraud Detection

Errors on Output Can Propagate

Each circle sends error to each arrow

Arrows weight back-propagating errors
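The two pictures (sum plus soft threshold going forward, weighted errors going backward) can be sketched for a network with one hidden layer. The sigmoid threshold, the squared-error loss and all the numbers below are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Each unit is a weighted sum followed by a soft threshold."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def backward(x, target, w_hidden, w_out):
    """Gradients of the loss 0.5 * (output - target) ** 2."""
    hidden, output = forward(x, w_hidden, w_out)
    delta_out = (output - target) * output * (1.0 - output)
    grad_out = [delta_out * h for h in hidden]
    # The output error flows back along each arrow, scaled by that arrow's weight.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    grad_hidden = [[d * v for v in x] for d in delta_hidden]
    return grad_hidden, grad_out

x, target = [1.0, 0.0, 1.0], 1.0
w_hidden = [[0.2, -0.4, 0.1], [-0.3, 0.5, 0.7]]
w_out = [0.6, -0.2]
grad_hidden, grad_out = backward(x, target, w_hidden, w_out)
print(grad_out)
```

Training then amounts to nudging every weight a small step against its gradient and repeating over many examples.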

Page 24: Deep Learning for Fraud Detection

Success!

Triumph!

World domination!

Page 25: Deep Learning for Fraud Detection

World domination!

With some reservations because features are hard

Page 26: Deep Learning for Fraud Detection

Turtles All the Way Down – We Wish

• This learning works well for just a few layers
• This is still a big deal …
  – with cool features, we can build real systems
• With many layers, the learning no longer converges
• Well … until recently

Page 27: Deep Learning for Fraud Detection

Model Learning in an Ideal World

• If we could just learn the features
  – Maybe unsupervised, maybe supervised
  – And at the same time learn the model
• Presumably we could build models quicker
• And more easily
• And we wouldn’t have to dirty our minds with pedestrian domain knowledge

Page 28: Deep Learning for Fraud Detection

Example 1 – (not very) Deep Auto-encoder

• Let’s take an example where we can learn features
• Data is EKG traces
• We want to find anomalies
  – No supervised training

Page 29: Deep Learning for Fraud Detection

Spot the Anomaly

Anomaly?

Page 30: Deep Learning for Fraud Detection

Maybe not!

Page 31: Deep Learning for Fraud Detection

Where’s Waldo?

This is the real anomaly

Page 32: Deep Learning for Fraud Detection

Normal Isn’t Just Normal

• What we want is a model of what is normal
• What doesn’t fit the model is the anomaly
• For simple signals, the model can be simple …
• The real world is rarely so accommodating

Page 33: Deep Learning for Fraud Detection

We Do Windows

Page 42: Deep Learning for Fraud Detection

Windows on the World

• The set of windowed signals is a nice model of our original signal
• Clustering can find the prototypes
  – Fancier techniques available using sparse coding
• The result is a dictionary of shapes
• New signals can be encoded by shifting, scaling and adding shapes from the dictionary
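A minimal sketch of the windowing and clustering steps follows, using plain Lloyd's k-means from the standard library only. The sine wave stands in for an EKG trace, and the window width, step and `k` are arbitrary assumptions. The sketch encodes each window by its nearest shape only; it does not do the shifting, scaling and adding mentioned on the slide.

```python
import math
import random

def windows(signal, width, step):
    """Cut a signal into overlapping windows of `width` samples."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]

def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; the centroids are the dictionary of shapes."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

# Stand-in for an EKG trace: a noiseless periodic signal.
signal = [math.sin(2 * math.pi * t / 50) for t in range(2000)]
wins = windows(signal, width=32, step=8)
dictionary = kmeans(wins, k=10)

# Encode each window as its nearest shape and measure the reconstruction error.
errors = [math.dist(w, dictionary[nearest(w, dictionary)]) for w in wins]
print(max(errors))
```

The list `errors` is the raw material for anomaly detection: windows the dictionary cannot reproduce well are the candidates.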

Page 43: Deep Learning for Fraud Detection

Most Common Shapes (for EKG)

Page 44: Deep Learning for Fraud Detection

Reconstructed signal

[Figure panels: original signal, reconstructed signal, reconstruction error]

Reconstruction error < 1 bit / sample

Page 45: Deep Learning for Fraud Detection

An Anomaly

Original technique for finding 1-d anomaly works against reconstruction error
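The transcript does not spell out what the "original technique" is. One simple possibility, assumed here purely for illustration, is to treat the reconstruction error as a 1-d signal and flag values above a high percentile of previously seen errors. The error values below are made up.

```python
from statistics import quantiles

def anomaly_threshold(past_errors, n=1000):
    """99.9th percentile of past reconstruction errors (last of n-1 cut points)."""
    return quantiles(past_errors, n=n)[-1]

def is_anomaly(error, threshold):
    return error > threshold

past_errors = [0.01 * (i % 10) for i in range(5000)]  # hypothetical normal errors
threshold = anomaly_threshold(past_errors)
print(is_anomaly(0.05, threshold))  # False
print(is_anomaly(5.0, threshold))   # True
```

The percentile is a tuning knob: a higher one means fewer false alarms and more missed anomalies.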

Page 46: Deep Learning for Fraud Detection

Close-up of anomaly

Not what you want your heart to do.

And not what the model expects it to do.

Page 47: Deep Learning for Fraud Detection

A Different Kind of Anomaly

Page 48: Deep Learning for Fraud Detection

Some k-means Caveats

• But Eamonn Keogh says that k-means can’t work on time series
• That is silly … and kind of correct: k-means does have limits
  – Other kinds of auto-encoders are much more powerful
• More fun and code demos at
  – https://github.com/tdunning/k-means-auto-encoder

http://www.cs.ucr.edu/~eamonn/meaningless.pdf

Page 49: Deep Learning for Fraud Detection

The Limits of Clustering as Auto-encoder

• Clustering is like trying to tile your sample distribution
• Can be used to approximate a signal
• Filling a d-dimensional region with k clusters should give
• If d is large, this is no good
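The slide's formula did not survive in the transcript. A standard tiling argument, assumed here rather than taken from the talk, says the typical reconstruction error shrinks roughly like k ** (-1 / d). Under that assumption, the arithmetic below shows how fast the number of clusters must grow to halve the error.

```python
def cluster_growth(error_reduction, d):
    """Factor by which k must grow to shrink the error by `error_reduction`,
    assuming error scales like k ** (-1 / d) (an assumption, see above)."""
    return error_reduction ** d

for d in (1, 2, 10, 32):
    print(d, cluster_growth(2, d))
# 1 2
# 2 4
# 10 1024
# 32 4294967296
```

For a 32-sample window treated as a point in 32 dimensions, halving the error by brute-force tiling would need billions of times more clusters.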

Page 50: Deep Learning for Fraud Detection

Page 51: Deep Learning for Fraud Detection

Page 52: Deep Learning for Fraud Detection

Moral For Auto-encoders

• The simplest auto-encoders can be good models
• For more complex spaces/signals, more elaborate models may be required
  – Winner take (absolutely) all may be problematic
  – In particular, models that allow sparse linear combination may be better
• Consider deep learning, recurrent networks, denoising

Page 53: Deep Learning for Fraud Detection

How Does Clustering Do Reconstruction?

For normalized cluster centroids, dot-product and distance are equivalent

Page 54: Deep Learning for Fraud Detection

How Does Clustering Do Reconstruction?

Winner takes all with k-means

Page 55: Deep Learning for Fraud Detection

How Does Clustering Do Reconstruction?

Dot-product scales centroid to reconstruct
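The three statements about reconstruction can be checked on a tiny example. The centroids and the input vector are made-up numbers; the point is only that, with unit-length centroids, the nearest centroid and the largest dot product pick the same winner.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

centroids = [normalize(c) for c in ([1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 2.0])]
x = [0.5, 0.4, 0.1]

# With unit-length centroids, |x - c|^2 = |x|^2 + 1 - 2 * (x . c), so the
# nearest centroid is also the one with the largest dot product.
by_distance = min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))
by_dot = max(range(len(centroids)), key=lambda j: dot(x, centroids[j]))

# Winner takes all: only the best centroid is used, scaled by its dot product.
scale = dot(x, centroids[by_dot])
reconstruction = [scale * c for c in centroids[by_dot]]
print(by_distance, by_dot, reconstruction)
```

Read as a network, the dot products are a layer of weighted sums, the arg-max is a very hard threshold, and the scaled centroid is the output layer.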

Page 56: Deep Learning for Fraud Detection

AKA - Neural Network

Page 57: Deep Learning for Fraud Detection

What If … We Had More Layers?

Page 58: Deep Learning for Fraud Detection

Other Thoughts

• What if we allow more than one cluster to be active?
  – k-sparse learning!

Page 61: Deep Learning for Fraud Detection

Other Thoughts

• What if we allow more than one cluster to be active?
  – k-sparse learning!
• Well, almost
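Letting more than one cluster stay active can be sketched as a top-k selection over the activations. Ranking by magnitude is a choice made for this example; the slide's diagrams are not in the transcript, so this is not necessarily the variant the talk had in mind.

```python
def k_sparse_code(activations, k):
    """Keep the k largest-magnitude activations and zero out the rest."""
    order = sorted(range(len(activations)), key=lambda j: abs(activations[j]), reverse=True)
    keep = set(order[:k])
    return [a if j in keep else 0.0 for j, a in enumerate(activations)]

print(k_sparse_code([0.1, -0.9, 0.4, 0.05], k=2))  # [0.0, -0.9, 0.4, 0.0]
```

With k = 1 this reduces to the winner-takes-all behaviour of k-means.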

Page 62: Deep Learning for Fraud Detection

The Point of Deep Learning

• It isn’t just many hidden layers in a neural network
• The goal is to eliminate feature engineering by learning features as well as the classifier

Page 63: Deep Learning for Fraud Detection

Experiment 3 – Card Velocity

• Most features so far are inherent in the data
• Few are true sequence features
• Card velocity is a pure combination
  – Starting point can be anywhere
  – The issue is where the next point is relative to starting point

Page 64: Deep Learning for Fraud Detection

Card Velocity

Non-fraud steps are reasonable in terms of distance and time

Fraudulent use of card by multiple attackers results in big, fast jumps

Page 65: Deep Learning for Fraud Detection

Synthetic Data Example

• Generate random point
• Take four small steps
• If fraud, second step can be large
• Result is five positions, each in 3-d on surface of a sphere
  – Data shape is N x (5 x 3)
• Add secondary features containing step size … N x 4
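The recipe above can be sketched with the standard library alone. The step size, the fraud rate, the choice to model a "large" step as a jump to a fresh random point, and the 0.1 threshold are all assumptions for this example; the talk's own generator is not shown in the transcript.

```python
import math
import random

def random_unit_vector(rng):
    """Uniform random point on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def small_step(point, size, rng):
    """Move `size` in a random direction, then project back onto the sphere."""
    direction = random_unit_vector(rng)
    moved = [p + size * d for p, d in zip(point, direction)]
    norm = math.sqrt(sum(a * a for a in moved))
    return [a / norm for a in moved]

def make_example(fraud, rng, size=0.02):
    """Five positions: a start plus four steps; a fraud case jumps on step two."""
    path = [random_unit_vector(rng)]
    for i in range(4):
        if fraud and i == 1:
            path.append(random_unit_vector(rng))  # big jump to anywhere
        else:
            path.append(small_step(path[-1], size, rng))
    return path

def step_sizes(path):
    """Secondary features: straight-line distance between consecutive positions."""
    return [math.dist(a, b) for a, b in zip(path, path[1:])]

rng = random.Random(42)
labels = [rng.random() < 0.1 for _ in range(1000)]
paths = [make_example(y, rng) for y in labels]
positions = [[c for p in path for c in p] for path in paths]  # N x 15
steps = [step_sizes(path) for path in paths]                  # N x 4

# With the step-size feature, one threshold separates nearly all cases.
predicted = [max(s) > 0.1 for s in steps]
print(sum(p == y for p, y in zip(predicted, labels)), "of", len(labels), "correct")
```

The raw `positions` carry the same information, but a model has to discover the pairwise differences and the distance on its own.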

Page 66: Deep Learning for Fraud Detection

The Truth is Out There

• With the right feature (step-size), it is trivial to spot the fraud
• Here we show the step size between positions
• Fraud cases take a big jump that others don’t
• But they can be anywhere

Page 67: Deep Learning for Fraud Detection

But Dimensionality Bites Hard

• With the step-size feature, learning succeeds instantly with the simplest models and gets nearly perfect accuracy
• Without the step-size feature, learning with TensorFlow gets modest accuracy after substantial learning cost (work in progress, could do better with lots more tuning)
• The problem is that there are too many combinations of 15 variables; we need a very specific combination of three pair-wise diffs combined non-linearly into a distance

Page 68: Deep Learning for Fraud Detection

Page 69: Deep Learning for Fraud Detection

We have a bona fide revolution

But old tricks still pay

Page 70: Deep Learning for Fraud Detection

Greenfield Problem Landscape

Page 71: Deep Learning for Fraud Detection

Mature Problem Landscape

Page 72: Deep Learning for Fraud Detection

Summary

• There is too much to say in 40 minutes, let’s talk some more at the MapR booth
• Deep learning, especially with systems like TensorFlow, has huge promise
• Deep learning trades learning architecture engineering for feature engineering
• There are powerful middle grounds

Page 73: Deep Learning for Fraud Detection

Page 74: Deep Learning for Fraud Detection

Short Books by Ted Dunning & Ellen Friedman

• Published by O’Reilly in 2014 - 2016
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR

http://bit.ly/ebook-real-world-hadoop

http://bit.ly/mapr-tsdb-ebook

http://bit.ly/ebook-anomaly

http://bit.ly/recommendation-ebook

Page 75: Deep Learning for Fraud Detection

Streaming Architecture

by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)

Free copies at book signing today

http://bit.ly/mapr-ebook-streams

Page 76: Deep Learning for Fraud Detection

Thank You!

Page 77: Deep Learning for Fraud Detection

Q & A

@mapr maprtech

[email protected]

Engage with us!

MapR

maprtech

mapr-technologies