Deep Learning for Fraud Detection
TRANSCRIPT
© 2014 MapR Technologies
Deep Learning for Fraud Detection
Contact Information
Ted Dunning, Chief Applications Architect at MapR Technologies
Committer & PMC for Apache’s Drill, ZooKeeper & others; VP of Incubator at the Apache Software Foundation
Email: [email protected], [email protected]
Twitter @ted_dunning
Goals for Today
• Explore the state of the art for deep learning and fraud detection
• Separate at least some of the wheat from the chaff
• Provide some realistic guidance for getting results
• Play with cool stuff!
Agenda
• Motivation
• What are neural networks and deep learning?
• It can be simpler than you think
• But, no free lunch / you get what you pay for / other clever aphorism
• Some experiments
• Where to go from here
Motivation for Advanced Modeling in Fraud
• Neural networks have completely dominated credit card fraud detection since the late ’80s
– Random forests and tree ensembles are often used in other kinds of fraud and churn models
• The reason is that rule-based systems simply don’t work
– Well, they do work at first
– Fraudsters change tactics, you add rules, interaction mayhem ensues
• And learning algorithms really do work
– Fraudsters change tactics, you add features and retrain
So learning is good
But good learning is hard
And finding good features is really hard
Some Sample Features
• Charge size relative to previous averages for card
• Charge size relative to previous average for merchant
• Known merchant or not
• Doubled transaction
• Address Verification System (AVS) or Card Verification Value (CVV2) mismatch
• Unusual region for card
• Unusual time of day relative to history
• Magstripe use if chip available
• (hundreds more)
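As a sketch of how the first of these features might be computed — this is an illustration only, not the production pipeline, and `charge_ratio_features` is a hypothetical helper:

```python
import numpy as np

def charge_ratio_features(amounts):
    """For each charge, the ratio of its size to the running average of
    all earlier charges on the same card (a toy version of the
    'charge size relative to previous averages' feature)."""
    amounts = np.asarray(amounts, dtype=float)
    ratios = np.ones_like(amounts)          # first charge has no history
    for i in range(1, len(amounts)):
        ratios[i] = amounts[i] / amounts[:i].mean()
    return ratios

# A $900 charge on a card that usually sees ~$20 stands out sharply.
history = [22.0, 18.0, 20.0, 900.0]
print(charge_ratio_features(history))
```

The same ratio idea applies per merchant; production systems would also decay old history and guard against empty or zero-valued histories.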
Sequence-Based Features
• Plausible pattern matching (rent a car, pay for gas at airport)
• Probe transactions (gas in wrong place, pizza, big charge)
• Previous transaction at compromised merchant
• Card velocity
Key Problems
• Good guys need data … that means that fraudsters get first chance at bat
• Good guys are careful and test systems before releasing
• Bad guys have many low-risk transactions and can change methods quickly
• In some areas, fraudsters adapt techniques in hours
Making up features is easy
Finding features that add real lift is very hard
What are neural networks and deep learning?
• Start simple … imagine we have 20 features, each 0 or 1
– Let’s yell “Fraud!” if any of the features is a 1
– Houston, we have a model
• But this model isn’t any better than a rule
• It also doesn’t have any interesting Greek letters
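That “model” really does fit in a couple of lines — a sketch, just to make the point that it is nothing more than a rule:

```python
def rule_model(features):
    """The simplest possible 'model': yell fraud if any binary feature
    fires. No weights, no Greek letters -- exactly as good (or bad)
    as a hand-written rule."""
    return int(any(features))

print(rule_model([0] * 20))         # nothing fires -> 0
print(rule_model([0] * 19 + [1]))   # one feature fires -> 1
```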
Real-world Intrudes
• We assumed all features are equally good
– What if some are kind of poor or weak?
• Can we weight different features more or less?
– Can we learn these weights from data?
Learning Works
• Yes. We can learn these models
• How we measure error is important
• We must have good features
• Even good features may need transformation
– Take logs of times and monetary values
– Subtract means, scale, bin values
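A minimal sketch of those transformations (an assumed pipeline for illustration, using NumPy — real systems would also clip outliers and bin):

```python
import numpy as np

def transform(values):
    """Log, then center and scale -- the kind of transformation that
    even good monetary or time features usually need."""
    logged = np.log1p(values)           # take logs of monetary values
    centered = logged - logged.mean()   # subtract the mean
    return centered / logged.std()      # scale to unit variance

amounts = np.array([5.0, 20.0, 35.0, 10000.0])
print(transform(amounts))   # the huge charge is tamed but still extreme
```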
Not Good Enough
• We need combinations of models
• Simple linear combinations aren’t subtle enough
• Enter multi-level models
– Can we learn a model that uses combinations of inputs
– Where each of those combinations is a model that we learn?
Yes, Virginia, There IS a Santa Claus
Each circle is a sum and a (soft) threshold
Arrows are multiplication by a learned weight
Errors on Output Can Propagate
Each circle sends error back along each arrow
Arrows weight back-propagating errors
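The two slides above — circles as weighted sums with a soft threshold, errors propagated back along the same arrows — can be sketched as a tiny network trained from scratch. This is an illustrative toy (random binary data standing in for features, not fraud data):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 binary features; call it "fraud" if at least two fire.
X = rng.integers(0, 2, size=(200, 4)).astype(float)
y = (X.sum(axis=1) >= 2).astype(float)

W1 = rng.normal(0.0, 0.5, size=(4, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, size=(8, 1)); b2 = np.zeros(1)

lr = 1.0
for _ in range(1000):
    h = sigmoid(X @ W1 + b1)           # each circle: weighted sum + soft threshold
    p = sigmoid(h @ W2 + b2).ravel()   # output circle
    delta2 = (p - y)[:, None]          # error on the output
    delta1 = (delta2 @ W2.T) * h * (1 - h)   # error weighted back along the arrows
    W2 -= lr * h.T @ delta2 / len(X); b2 -= lr * delta2.mean(axis=0)
    W1 -= lr * X.T @ delta1 / len(X); b1 -= lr * delta1.mean(axis=0)

p = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel()
acc = ((p > 0.5) == (y > 0.5)).mean()
print(f"training accuracy: {acc:.2f}")
```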
Success! Triumph!
World domination!
World domination!
With some reservations, because features are hard
Turtles All the Way Down – We Wish
• This learning works well for just a few layers
• This is still a big deal …
– With cool features, we can build real systems
• With many layers, the learning no longer converges
• Well … until recently
Model Learning in an Ideal World
• If we could just learn the features
– Maybe unsupervised, maybe supervised
– And at the same time learn the model
• Presumably we could build models more quickly
• And more easily
• And we wouldn’t have to dirty our minds with pedestrian domain knowledge
Example 1 – (not very) Deep Auto-encoder
• Let’s take an example where we can learn features
• Data is EKG traces
• We want to find anomalies
– No supervised training
Spot the Anomaly
Anomaly?
Maybe not!
Where’s Waldo?
This is the real anomaly
Normal Isn’t Just Normal
• What we want is a model of what is normal
• What doesn’t fit the model is the anomaly
• For simple signals, the model can be simple …
• The real world is rarely so accommodating
We Do Windows
Windows on the World
• The set of windowed signals is a nice model of our original signal
• Clustering can find the prototypes
– Fancier techniques available using sparse coding
• The result is a dictionary of shapes
• New signals can be encoded by shifting, scaling and adding shapes from the dictionary
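A minimal sketch of the windows-plus-clustering idea, with a noisy sine wave standing in for the EKG (the talk’s real code lives in the k-means-auto-encoder repo; the window size, step, and cluster count here are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# A repetitive "heartbeat-like" signal: a sine wave plus a little noise.
t = np.arange(2000)
signal = np.sin(2 * np.pi * t / 50) + 0.05 * rng.normal(size=t.size)

# Chop the signal into overlapping windows.
W, step = 50, 10
windows = np.array([signal[i:i + W] for i in range(0, signal.size - W, step)])

def kmeans(data, k, iters=30):
    """Plain Lloyd's k-means: the clusters become the dictionary of shapes."""
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        d = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = data[labels == j].mean(axis=0)
    return centroids, labels

shapes, labels = kmeans(windows, k=20)
recon = shapes[labels]                    # each window -> its nearest shape
err = np.abs(windows - recon).mean()
print(f"mean reconstruction error: {err:.3f}")
```

A window that reconstructs badly — large residual against every dictionary shape — is exactly the anomaly signal described on the following slides.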
Most Common Shapes (for EKG)
[Figure: original signal vs. reconstructed signal; reconstruction error is < 1 bit / sample]
An Anomaly
The original technique for finding a 1-d anomaly works against the reconstruction error
Close-up of anomaly
Not what you want your heart to do.
And not what the model expects it to do.
A Different Kind of Anomaly
Some k-means Caveats
• But Eamonn Keogh says that k-means can’t work on time series
– http://www.cs.ucr.edu/~eamonn/meaningless.pdf
• That is silly … and kind of correct; k-means does have limits
– Other kinds of auto-encoders are much more powerful
• More fun and code demos at
– https://github.com/tdunning/k-means-auto-encoder
The Limits of Clustering as an Auto-encoder
• Clustering is like trying to tile your sample distribution
• Can be used to approximate a signal
• Filling a d-dimensional region with k clusters gives cells whose size shrinks only like k^(-1/d)
• If d is large, this is no good
Moral for Auto-encoders
• The simplest auto-encoders can be good models
• For more complex spaces/signals, more elaborate models may be required
– Winner-take-(absolutely)-all may be problematic
– In particular, models that allow sparse linear combination may be better
• Consider deep learning, recurrent networks, denoising
How Does Clustering Do Reconstruction?
For normalized cluster centroids, dot-product and distance are equivalent
Winner takes all with k-means
Dot-product scales centroid to reconstruct
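The distance/dot-product equivalence is easy to verify numerically. For unit-length centroids c, ||x − c||² = ||x||² − 2 x·c + 1, so the nearest centroid is exactly the one with the largest dot product — which is why a k-means encoder looks like a layer of a neural network (random data here, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Ten normalized centroids in five dimensions.
centroids = rng.normal(size=(10, 5))
centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

x = rng.normal(size=5)
nearest_by_distance = np.linalg.norm(centroids - x, axis=1).argmin()
nearest_by_dot = (centroids @ x).argmax()
print(nearest_by_distance == nearest_by_dot)   # True
```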
AKA - Neural Network
What If … We Had More Layers?
Other Thoughts
• What if we allow more than one cluster to be active?
– k-sparse learning!
• Well, almost
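A sketch of the k-sparse idea: instead of the single winner that k-means allows, keep the k strongest dictionary responses and reconstruct with a linear combination of just those atoms. The random dictionary below is hypothetical, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(7)

def k_sparse_reconstruct(x, dictionary, k):
    """Keep only the k most active dictionary atoms, then rebuild x as
    a least-squares linear combination of them."""
    scores = dictionary @ x
    top = np.argsort(np.abs(scores))[-k:]     # k strongest responses
    atoms = dictionary[top]                   # (k, d)
    coef, *_ = np.linalg.lstsq(atoms.T, x, rcond=None)
    return atoms.T @ coef

# Hypothetical dictionary: 20 unit-norm atoms in 8 dimensions.
D = rng.normal(size=(20, 8))
D /= np.linalg.norm(D, axis=1, keepdims=True)
x = rng.normal(size=8)

err1 = np.linalg.norm(x - k_sparse_reconstruct(x, D, 1))  # winner takes all
err3 = np.linalg.norm(x - k_sparse_reconstruct(x, D, 3))  # k-sparse, k = 3
print(err1, err3)
```

Since the top-1 atom is always among the top-3, letting more atoms be active can never reconstruct worse — that is the sense in which k-sparse coding strictly generalizes winner-take-all clustering.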
The Point of Deep Learning
• It isn’t just many hidden layers in a neural network
• The goal is to eliminate feature engineering by learning features as well as the classifier
Experiment 3 – Card Velocity
• Most features so far are inherent in the data
• Few are true sequence features
• Card velocity is a pure combination
– Starting point can be anywhere
– The issue is where the next point is relative to the starting point
Card Velocity
Non-fraud steps are reasonable in terms of distance and time
Fraudulent use of the card by multiple attackers results in big, fast jumps
Synthetic Data Example
• Generate a random point
• Take four small steps
• If fraud, the second step can be large
• Result is five positions, each in 3-d on the surface of a sphere
– Data shape is N x (5 x 3)
• Add secondary features containing step size … N x 4
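A sketch of the generator just described (the step scales `small` and `big` are assumptions for illustration; the slide does not specify them):

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(v):
    return v / np.linalg.norm(v)

def make_track(fraud, small=0.05, big=1.5):
    """One synthetic track: a random start on the unit sphere, then
    four steps; for a fraud track, the second step is large."""
    points = [unit(rng.normal(size=3))]
    for i in range(4):
        scale = big if (fraud and i == 1) else small
        points.append(unit(points[-1] + scale * rng.normal(size=3)))
    return np.array(points)                  # shape (5, 3)

def step_sizes(track):
    """The secondary features: four distances between successive points."""
    return np.linalg.norm(np.diff(track, axis=0), axis=1)

legit_max = [step_sizes(make_track(False)).max() for _ in range(200)]
fraud_max = [step_sizes(make_track(True)).max() for _ in range(200)]
print(f"median max step: legit {np.median(legit_max):.2f}, "
      f"fraud {np.median(fraud_max):.2f}")
```

The step-size features separate the two classes almost perfectly, which is exactly the point of the next slide.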
The Truth is Out There
• With the right feature (step size), it is trivial to spot the fraud
• Here we show the step size between positions
• Fraud cases take a big jump that others don’t
• But they can be anywhere
But Dimensionality Bites Hard
• With the step-size feature, learning succeeds instantly with the simplest models and gets nearly perfect accuracy
• Without the step-size feature, learning with TensorFlow gets modest accuracy after substantial learning cost (work in progress; could do better with lots more tuning)
• The problem is that there are too many combinations of 15 variables; we need a very specific combination of three pair-wise diffs combined non-linearly into a distance
We have a bona fide revolution
But old tricks still pay
Greenfield Problem Landscape
Mature Problem Landscape
Summary
• There is too much to say in 40 minutes; let’s talk some more at the MapR booth
• Deep learning, especially with systems like TensorFlow, has huge promise
• Deep learning trades feature engineering for learning-architecture engineering
• There are powerful middle grounds
Short Books by Ted Dunning & Ellen Friedman
• Published by O’Reilly in 2014–2016
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR
– http://bit.ly/ebook-real-world-hadoop
– http://bit.ly/mapr-tsdb-ebook
– http://bit.ly/ebook-anomaly
– http://bit.ly/recommendation-ebook
Streaming Architecture
by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)
Free copies at book signing today
http://bit.ly/mapr-ebook-streams
Thank You!
Q & A
Engage with us!
@mapr | maprtech | MapR | mapr-technologies