How Does Machine Learning Detect Anomalies?
Anomaly is when you don't fit the expected norm. Like "wait what is this?! It doesn't belong here." It's what the system never planned for but now has to adapt to. It's Neo in The Matrix.
- LeCrae - 2005
Big Data Paris – March 2017
WHAT IS NORMALITY?
ORIGINAL
OUTSTANDING
SLIGHTLY ODD
UGLY
HACKING CPC
Flo: Hi!
CPC: Hello, how are you?
Flo: My name is Florian
CPC: Nice to meet you.
Flo: I’m learning BASIC
CPC: Do you know GOSUB?
ODDITY ROBUST?
Flo: Hi !
CPC: Hello, how are you ?
Flo: I'm great, I just ate a dog !
CPC: Oops, did I wake you up?
Flo: No, I was just dining.
CPC: I might be your friend named Andrew.
Flo: What ?
CPC: What is your favorite name?
Flo: What ???
TRUE A.I. WILL UNDERSTAND ABNORMAL
The Counting Way
Edgeworth 1887
Discordant observations may be
defined as those which present the
appearance of differing in respect of
their law of frequency from other
observations with which they are
combined.
The Empirical Way
The Statistical Way
Grubbs' test for outliers (1950)
An outlying observation, or outlier, is one that appears to deviate markedly from the other members of the sample in which it occurs.
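As a rough illustration of the statistical way, the sketch below computes the Grubbs statistic G = max |x_i - mean| / s for a small sample and reports which value is most extreme. The `readings` values are made up for the example. Deciding whether G is significant requires a critical value derived from Student's t distribution, which is deliberately left out here.

```python
from statistics import mean, stdev

def grubbs_statistic(sample):
    """Return (G, index): the Grubbs statistic and the position of the most extreme value."""
    m = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    deviations = [abs(x - m) for x in sample]
    idx = max(range(len(sample)), key=deviations.__getitem__)
    return deviations[idx] / s, idx

# Hypothetical measurements: six similar values and one suspicious one.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 14.5]
g, idx = grubbs_statistic(readings)
print(f"most extreme value: {readings[idx]}, G = {g:.3f}")
```

The larger G is, the further the most extreme observation sits from the rest of the sample, measured in standard deviations.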
The Machine Learning Way
Supervised: rare class mining
Semi-Supervised: novelty detection
Unsupervised: outlier detection
SUPERVISED ANOMALY DETECTION
Fraud @ AMADEUS
Time limit churning
By taking advantage of various functionalities, agents are able to lock the booking of a seat for an unlimited time without issuing and paying for a ticket.
This gives them the possibility to offer an unlimited reflection period to their customers without the usual price increase.
Frequent flyer abuse
Abusive use of frequent flyer cards to be granted higher privileges.
Supervised Anomaly Detection
Data available with good and bad labels
Bad Labels are Rare
"Rare Class Mining"
We assume that any new anomaly will
be similar to some past anomaly
Learning With Unbalanced Data
https://pdfs.semanticscholar.org/239b/2210b3fbc1f4b8246437a88a668bf9a0d2c0.pdf
An overview of classification algorithms for imbalanced datasets, Vaishali Ganganwar
Oversampling: generate synthetic examples from the identified anomalies
Undersampling: select a subset of the original data
Cost Sensitive Learning: take misclassification costs into account to minimize the financial cost
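The three strategies above can be sketched in a few lines. This is a minimal illustration, not the method used at any company named in the talk: the data is randomly generated, the oversampling interpolates between two random anomalies (a simplification of SMOTE, which interpolates towards nearest neighbours), and the cost figures passed to `expected_cost` are hypothetical.

```python
import random

def undersample(majority, minority, rng):
    """Keep every minority row plus a random majority subset of the same size."""
    return rng.sample(majority, len(minority)) + minority

def oversample(minority, n_new, rng):
    """Create n_new synthetic rows by interpolating between two random minority rows."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

def expected_cost(false_negatives, false_positives, cost_fn, cost_fp):
    """Financial cost of a classifier's errors, given a cost per error type."""
    return false_negatives * cost_fn + false_positives * cost_fp

rng = random.Random(0)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]   # made-up normal rows
fraud = [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(10)]  # made-up rare class

balanced = undersample(normal, fraud, rng)   # 10 normal + 10 fraud
extra_fraud = oversample(fraud, 90, rng)     # 90 synthetic fraud rows
print(len(balanced), len(extra_fraud))
print(expected_cost(false_negatives=2, false_positives=10, cost_fn=500.0, cost_fp=5.0))
```

With a cost function like this, a model is chosen for the lowest expected cost rather than the highest accuracy, which matters when a missed fraud is far more expensive than a false alarm.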
NOVELTY DETECTION
Detect The Verge Of The Storm
Novelty Detection
Data Available with Only Normal Labels
Detect abnormalities among new
observations
Time
Density Estimates
Gaussian Mixture
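A minimal sketch of novelty detection by density estimation, assuming one-dimensional data and a single Gaussian instead of the full Gaussian mixture named on the slide. The model is fitted on normal observations only, and a new observation is flagged when its log-density falls below a threshold. The training values and the 1% threshold are hypothetical choices.

```python
import math
import random
from statistics import mean, stdev

def log_density(x, mu, sigma):
    """Log of the Gaussian probability density at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

rng = random.Random(42)
train = [rng.gauss(50.0, 5.0) for _ in range(500)]  # made-up "normal only" readings
mu, sigma = mean(train), stdev(train)

# Threshold: roughly the 1st percentile of the training log-densities.
scores = sorted(log_density(x, mu, sigma) for x in train)
threshold = scores[int(0.01 * len(scores))]

def is_novel(x):
    """True when x is less likely than almost all of the training data."""
    return log_density(x, mu, sigma) < threshold

print(is_novel(52.0), is_novel(90.0))
```

A mixture of several Gaussians follows the same recipe, with a richer density model in place of the single bell curve.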
Anomaly/Novelty detection with scikit-learn
Alexandre Gramfort
Assumption
Independent and Identically
Distributed Variables
http://fr.slideshare.net/agramfort/anomalynovelty-detection-with-scikitlearn
One Class SVM
Detect Abnormal Network Activity
Typical proportion of anomalies is between 0.1% and 1%
0.5 million data points → 1000 anomalies
Rare Mining: Fraud Detection
STAKE : 13 B$ Per Year (US, 2015)
~0.04% Of Transaction Volume
(compared to 1.60% transaction fee)
Sequence Anomaly Detection
Universal Probability Assignment
Universal Anomaly Detection: Algorithms and Applications
Markov Chain
First Letter:
Transaction Amount. Low (L) or High (H)
Second Letter:
Time Between Transactions. Low (L) or High (H)
LL: Small Transaction, Shortly After the previous one
LH: Small Transaction, Long after the previous one
Etc.
Learn Transition Probabilities on sample data
Identify Sequences that do not match
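The two steps above (learn transition probabilities, then identify sequences that do not match) can be sketched as follows. The transaction histories are invented, the add-one smoothing is an assumption made so that unseen transitions do not get zero probability, and the average log-likelihood is just one possible matching score.

```python
import math
from collections import defaultdict

STATES = ["LL", "LH", "HL", "HH"]  # amount (L/H) followed by time gap (L/H)

def learn_transitions(sequences, smoothing=1.0):
    """Estimate P(next state | previous state) from sample sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev in STATES:
        total = sum(counts[prev].values()) + smoothing * len(STATES)
        probs[prev] = {nxt: (counts[prev][nxt] + smoothing) / total for nxt in STATES}
    return probs

def avg_log_likelihood(seq, probs):
    """Mean log-probability of the transitions in seq; low values mean a poor match."""
    steps = list(zip(seq, seq[1:]))
    return sum(math.log(probs[a][b]) for a, b in steps) / len(steps)

# Hypothetical histories: mostly small, well-spaced transactions.
history = [
    ["LH", "LH", "LL", "LH", "LH", "HH", "LH"],
    ["LH", "LL", "LH", "LH", "LH", "LL", "LH"],
    ["LH", "LH", "HH", "LH", "LL", "LH", "LH"],
]
probs = learn_transitions(history)

usual = ["LH", "LH", "LL", "LH"]
suspicious = ["LH", "HL", "HL", "HL"]  # a burst of large, rapid transactions
print(avg_log_likelihood(usual, probs), avg_log_likelihood(suspicious, probs))
```

A sequence whose score falls well below the scores seen on historical data is a candidate anomaly.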
OUTLIER DETECTION
Particle Physics
Typical proportion of anomalies is 10^-4 %
2 million data points → 100 anomalies
Outlier Detection
No Labels available whatsoever
The only information available is that anomalies are "rare" and "isolated", in a sense to be determined, with respect to the rest of the data
Damage Detection
Energy: with a "predictive" preventive maintenance program, companies reduce their pump costs by 30%
https://www.rolandberger.com/publications/publication_pdf/roland_berger_predictive_maintenance_20141215.pdf
Roland Berger Report on Predictive Maintenance – November 2014
Life Cycle Costs          Classic Preventive Maintenance    Predictive Preventive Maintenance
Initial Cycle Costs       $20,600                           $20,600
Installation Costs        $83,000                           $83,000
Pump Maintenance Costs    $25,000                           $16,000
Other Maintenance Costs   $6,000                            $2,000
Total Life Cycle Costs    $134,600                          $121,600
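The totals in the table can be checked with a few lines of arithmetic. The figures are copied from the table; the dictionary keys are names chosen for this example.

```python
costs = {
    "classic": {
        "initial": 20_600,
        "installation": 83_000,
        "pump_maintenance": 25_000,
        "other_maintenance": 6_000,
    },
    "predictive": {
        "initial": 20_600,
        "installation": 83_000,
        "pump_maintenance": 16_000,
        "other_maintenance": 2_000,
    },
}

# Sum each column, then compare the two maintenance programs.
totals = {program: sum(items.values()) for program, items in costs.items()}
saving = totals["classic"] - totals["predictive"]
saving_pct = 100 * saving / totals["classic"]
print(totals, saving, round(saving_pct, 1))
```

Over the whole life cycle the difference is $13,000, a bit under 10% of the classic total, because the initial and installation costs are the same in both programs.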
Anomaly Detection: Cluster Based Detectors
Anomaly Detection: Density Based Detectors
Compare the density around a point with the density of its neighbours
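A minimal sketch of the density comparison, assuming a simplified score: density is taken as the inverse of the mean distance to the k nearest neighbours, and a point is scored by the ratio of its neighbours' mean density to its own. This follows the spirit of the Local Outlier Factor but omits its reachability distance, and the sample points are made up. Duplicate points (zero distance) are not handled.

```python
import math

def knn(points, i, k):
    """(distance, index) pairs for the k nearest neighbours of points[i]."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return dists[:k]

def density(points, i, k):
    """Inverse of the mean distance to the k nearest neighbours."""
    return k / sum(d for d, _ in knn(points, i, k))

def density_ratio(points, i, k=3):
    """Mean neighbour density divided by own density; well above 1 suggests an outlier."""
    neighbours = [j for _, j in knn(points, i, k)]
    neighbour_density = sum(density(points, j, k) for j in neighbours) / k
    return neighbour_density / density(points, i, k)

# Hypothetical data: a tight group of five points and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (2.0, 2.0)]
scores = [density_ratio(points, i) for i in range(len(points))]
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

Points inside the group score close to 1, while the isolated point scores far higher because its neighbours live in a much denser region than it does.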
Isolation Forest
Isolation Forest
Representative subset selection and outlier detection via isolation forest
Wo-Ruo Chen, Yong-Huan Yun, Ming Wen, Hong-Mei Lu, Zhi-Min Zhang and Yi-Zeng Liang
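The idea behind Isolation Forest is that anomalies are isolated by fewer random splits than normal points. The sketch below is a from-scratch, dependency-free illustration of that idea, not the implementation from the cited paper or from any library: the function names, the tree representation, the default parameters and the test data are all assumptions made for this example.

```python
import math
import random

def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(data, rng, depth, max_depth):
    """Recursively split data on a random feature at a random value."""
    if depth >= max_depth or len(data) <= 1:
        return {"size": len(data)}
    dim = rng.randrange(len(data[0]))
    lo = min(p[dim] for p in data)
    hi = max(p[dim] for p in data)
    if lo == hi:
        return {"size": len(data)}
    split = rng.uniform(lo, hi)
    left = [p for p in data if p[dim] < split]
    right = [p for p in data if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }

def path_length(x, node, depth=0):
    """Number of splits needed to reach x's leaf, plus a correction for unsplit leaves."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)

def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the data."""
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(size))
    forest = [build_tree(rng.sample(data, size), rng, 0, max_depth) for _ in range(n_trees)]
    return forest, size

def anomaly_score(x, forest, size):
    """Score in (0, 1): short average paths give scores closer to 1."""
    mean_depth = sum(path_length(x, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_depth / c(size))

rng = random.Random(1)
cluster = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]  # made-up normal data
data = cluster + [(8.0, 8.0)]                                       # one obvious outlier
forest, size = fit_forest(data)
outlier_score = anomaly_score((8.0, 8.0), forest, size)
inlier_score = anomaly_score((0.0, 0.0), forest, size)
print(round(outlier_score, 3), round(inlier_score, 3))
```

No distance or density is computed anywhere, which is part of why the technique scales well and needs little tuning.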
Security
Fake Reviews / Fake News
Opinion Fraud Detection in Online Reviews by Network Effects
Graph Analytics
http://www3.cs.stonybrook.edu/~leman/pubs/14-dami-graphanomalysurvey.pdf
Community Based
Assign nodes into communities
and detect nodes belonging to no
communities
Relational Learning
Learn using neighbors as features
Structure Based
Find rare substructures in the graph
Anomaly Detection : In Crowd
http://www.svcl.ucsd.edu/projects/anomaly/
STAKE:
Medical Imaging
http://ots.fh-brandenburg.de/downloads/abschlussarbeiten/2016-10-14%20pl_tobias_meyer.pdf
Hyper Parameter Selection for Anomaly Detection With Stack Autoencoders – a Deep Learning Application
Electrocardiograms
Electroencephalograms
Deep Learning: LSTM, Reconstruction Error
Deep Learning : Reconstruction Error
http://radar.oreilly.com/2014/07/new-approaches-to-anomaly-detection.html
Ellen Friedman
SOLVE BUSINESS CRITICAL PROBLEMS
Solve the XXX Remaining Fraud
Get the next "9" in product quality
Keep us Safe
ANOMALY DETECTION BECAME ROBUST
Robust New "General Purpose" Techniques (Isolation Forest)
Robust Specific Algorithms (Sequence Mining / Graph Mining)
Deep Learning To the Rescue
Machine Learning Challenge #1
ENDURE
Machine Learning Trade-offs
Interpretability Performance Self-Adaptation
Future with Deep Learning
Learn when you might be wrong?
Representation layer
High Reconstruction Error = Anomaly
Classification / Regression
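The rule "high reconstruction error = anomaly" can be illustrated without a neural network. The sketch below swaps the deep autoencoder for the simplest possible stand-in: a linear, one-number representation of 2-D points (their coordinate along the principal axis). Points are encoded, decoded, and scored by the squared distance between input and reconstruction. The data, the threshold rule and all names are assumptions made for this example.

```python
import math
import random

def fit_principal_axis(points):
    """Return the mean and the unit direction of largest variance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # closed form for a 2x2 covariance
    return (mx, my), (math.cos(theta), math.sin(theta))

def reconstruction_error(p, mean, axis):
    """Encode p as one number, decode it back to 2-D, return the squared error."""
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    code = dx * axis[0] + dy * axis[1]   # representation: 2 numbers -> 1
    rx = mean[0] + code * axis[0]        # decode back to 2-D
    ry = mean[1] + code * axis[1]
    return (p[0] - rx) ** 2 + (p[1] - ry) ** 2

# Made-up normal data: points close to the line y = 2x.
rng = random.Random(7)
normal = []
for _ in range(200):
    x = rng.uniform(-1, 1)
    normal.append((x, 2 * x + rng.gauss(0, 0.1)))

mean, axis = fit_principal_axis(normal)
# Hypothetical threshold: the worst error seen on the normal data.
threshold = max(reconstruction_error(p, mean, axis) for p in normal)

on_pattern = reconstruction_error((0.5, 1.0), mean, axis)    # follows y = 2x
off_pattern = reconstruction_error((0.5, -1.0), mean, axis)  # breaks the pattern
print(on_pattern < threshold, off_pattern > threshold)
```

A deep autoencoder or an LSTM replaces the straight line with a learned non-linear representation, but the decision rule stays the same.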
Thank you !