active learning for fraud prevention

Active Learning for Fraud Prevention. Venkatesh Ramanathan • May 21, 2016. ©2016 PayPal Inc. Confidential and proprietary.

Upload: hadoop-summit

Post on 07-Jan-2017


TRANSCRIPT

Page 1: Active Learning for Fraud Prevention

©2016 PayPal Inc. Confidential and proprietary.

Active Learning for Fraud Prevention

Venkatesh Ramanathan • May 21, 2016

Page 2: Active Learning for Fraud Prevention

Agenda

• Introduction
• Fraud Prevention
• Algorithm
• Experiments
• Conclusion

Page 3: Active Learning for Fraud Prevention


INTRODUCTION

Page 4: Active Learning for Fraud Prevention

About Me

• Software Engineer / Data Scientist / ML Researcher
• Ph.D. in Computer Science
• Research in face recognition, phishing/spam, and fraud prevention

Page 5: Active Learning for Fraud Prevention

About PayPal

The power of our platform

• 184M active customer accounts
• 4.9 billion payments/year
• ~300 payments/second at peak
• +2.5 million developers
• 42 petabytes of data
• 4.5T database calls/quarter

PayPal operates one of the largest private clouds in the world.

We have transformed core business processes into robust service-based platforms.

Our technology transformation enables us to:
• Process payments at tremendous scale
• Accelerate the innovation of new products
• Engage world-class developers & technologists

Page 6: Active Learning for Fraud Prevention


FRAUD PREVENTION

Page 7: Active Learning for Fraud Prevention

Fraud Prevention @ PayPal

Robust feature engineering, machine learning and statistical models

Highly scalable and multi-layered infrastructure software

Superior team of data scientists, researchers, financial and intelligence analysts


Page 8: Active Learning for Fraud Prevention

Fraud Prevention @ PayPal

Transaction Level
• Employs advanced machine learning and statistical models to flag fraudulent behavior up front
• Applies more sophisticated algorithms after the transaction is complete

Account Level
• Monitors account-level activity to identify abusive behavior
• Abusive patterns include frequent payments and suspicious profile changes

Network Level
• Monitors account-to-account interaction
• Example: frequent transfers of money from several accounts to one central account
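The network-level pattern above (several accounts sending money to one central account) can be illustrated with a simple fan-in count. This is a minimal sketch, not PayPal's method: the transaction tuples, the `flag_fan_in` name, and the `min_senders` threshold are all hypothetical.

```python
from collections import defaultdict

def flag_fan_in(transactions, min_senders=3):
    """Return receivers paid by at least `min_senders` distinct senders.

    `transactions` is an iterable of (sender, receiver, amount) tuples
    (a made-up format for this sketch).
    """
    senders_by_receiver = defaultdict(set)
    for sender, receiver, _amount in transactions:
        senders_by_receiver[receiver].add(sender)
    return {
        receiver: len(senders)
        for receiver, senders in senders_by_receiver.items()
        if len(senders) >= min_senders
    }

transactions = [
    ("a1", "hub", 20.0), ("a2", "hub", 35.0), ("a3", "hub", 15.0),
    ("a1", "hub", 10.0),  # repeat sender, counted once
    ("a4", "shop", 50.0), ("a5", "shop", 12.0),
]
flagged = flag_fan_in(transactions)
```

A real system would also need time windows and amounts; the count of distinct senders is only the simplest possible signal.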

Page 9: Active Learning for Fraud Prevention

Fraud Prevention – What are we up against?

Fraudsters are becoming increasingly smart and adaptive

Need cost-effective solutions that can model complex attack patterns not previously observed

Need scalable and computationally efficient prediction models

Page 10: Active Learning for Fraud Prevention

Fraud Prevention – What are we up against?

• Much harder to get performance lift on our flagship models
• Need to re-look at all aspects of traditional model building
• Need out-of-the-box thinking

Area we are missing (AUC 0.96)
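For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen fraud case is scored higher than a randomly chosen non-fraud case, so the "area we are missing" is 1 − AUC. The sketch below computes it by brute-force pair counting on made-up labels and scores; the slides do not say how their AUC is computed or weighted, so this is the plain unweighted definition only.

```python
def auc(labels, scores):
    """Unweighted AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the data size, so
    suitable for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up example data (1 = fraud, 0 = not fraud).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
value = auc(labels, scores)
missing_area = 1.0 - value
```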

Page 11: Active Learning for Fraud Prevention

Fraud Prevention – What can we do to build better models?

[Figure: training data matrix with columns feature1 … featureN and Target (Label), and rows d1, d2, …, dM]

• Better features
• Better labeling
• Advanced ML algorithms
• Bigger, better data

Page 12: Active Learning for Fraud Prevention


ALGORITHM – ACTIVE LEARNING

Page 13: Active Learning for Fraud Prevention

Active Learning – What is it?

• Supervised learning algorithms require data to be labeled
• Labelling is difficult, time-consuming and expensive: active learning to the rescue
• Idea – an ML algorithm can achieve better accuracy if it is allowed to “choose the data” from which it learns*
• Overcome the labelling bottleneck by posing queries (unlabeled instances) to be labeled by a human

[Figure: active learning cycle with components Unlabeled Data, Select Queries, Human Annotator, Labeled Data, (Re)Build Model, Machine Learning Model]

Source*: Burr Settles
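The cycle described on this slide (select queries, ask the annotator, rebuild the model) can be sketched on a one-dimensional toy problem. Everything here is an assumption made for illustration: the data is random, the "model" is a single threshold, and the `oracle` function stands in for the human annotator with a made-up true threshold of 0.62.

```python
import random

def oracle(x, true_threshold=0.62):
    """Hypothetical stand-in for the human annotator: returns the true label."""
    return int(x >= true_threshold)

def fit_threshold(labeled):
    """(Re)build the model: midpoint between the largest 0 and the smallest 1."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    low = max(negatives) if negatives else 0.0
    high = min(positives) if positives else 1.0
    return (low + high) / 2.0

def active_learning_loop(pool, n_queries):
    pool = list(pool)              # unlabeled data
    labeled = []                   # labeled data, grows by one per query
    threshold = fit_threshold(labeled)
    for _ in range(n_queries):
        if not pool:
            break
        # Select query: the instance closest to the decision boundary,
        # i.e. the one the current model is least certain about.
        query = min(pool, key=lambda x: abs(x - threshold))
        pool.remove(query)
        labeled.append((query, oracle(query)))
        threshold = fit_threshold(labeled)
    return threshold, labeled

rng = random.Random(0)
pool = [rng.random() for _ in range(400)]
threshold, labeled = active_learning_loop(pool, n_queries=12)
```

With only 12 labels out of 400 instances, the learned threshold lands close to the hidden one, because each query roughly halves the region where the boundary can lie.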

Page 14: Active Learning for Fraud Prevention

Active Learning – What is it?

• Scenarios
  • Membership Query Synthesis – the learner may request labels for any unlabeled instance in the input space
  • Stream-based Selective Sampling – unlabeled instances are drawn one at a time, and the learner decides whether to query or discard each one
  • Pool-based Sampling – instances are queried from a pool according to an informativeness measure
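A stream-based selective sampler can be sketched in a few lines: each incoming instance is scored once and either sent for labeling or discarded. The logistic scorer, the `margin` value, and the sample stream are illustrative assumptions, not taken from the talk.

```python
import math

def logistic_score(x, weight=1.0, bias=0.0):
    """Hypothetical current model: probability that x is positive."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

def stream_selective_sampling(stream, predict_proba, margin=0.1):
    """Query instances whose score is within `margin` of 0.5; discard the rest."""
    to_query, discarded = [], []
    for x in stream:               # instances arrive one at a time
        p = predict_proba(x)
        if abs(p - 0.5) <= margin:
            to_query.append(x)     # model is unsure: ask for a label
        else:
            discarded.append(x)    # model is confident: skip
    return to_query, discarded

stream = [-3.0, -0.2, 0.1, 2.5, 0.3]
to_query, discarded = stream_selective_sampling(stream, logistic_score)
```

Pool-based sampling differs only in that the whole pool is scored first and the most informative instances are picked by rank rather than by a fixed cutoff.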

Page 15: Active Learning for Fraud Prevention

Active Learning – What is it?

• Query Strategy Frameworks
  • Uncertainty Sampling
  • Query-By-Committee
  • Expected Model Change
  • Expected Error Reduction
  • Variance Reduction
  • Density-Weighted Methods
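For uncertainty sampling, three commonly used measures are least confidence, margin, and entropy. The sketch below implements them for a vector of class probabilities; the function names and the example probabilities are made up for illustration.

```python
import math

def least_confidence(probs):
    """Higher means more uncertain: 1 minus the top class probability."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two most likely classes. Smaller means more uncertain."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def entropy(probs):
    """Shannon entropy in nats. Higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def most_uncertain(pool_probs, measure=entropy):
    """Index of the instance that maximizes a higher-is-more-uncertain measure."""
    return max(range(len(pool_probs)), key=lambda i: measure(pool_probs[i]))

# Made-up predicted class probabilities for three unlabeled instances.
pool_probs = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]
query_index = most_uncertain(pool_probs)
```

For binary problems such as fraud / not fraud, all three measures rank instances identically; they only diverge with three or more classes.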

Page 16: Active Learning for Fraud Prevention

Active Learning – Toy Example

• Toy data – 400 instances
• Model using random sampling – 70% accuracy
• Model using active learning (uncertainty sampling) – 90% accuracy

Page 17: Active Learning for Fraud Prevention

Active Learning For Fraud Prevention – Why is it unique?

• Data is unbalanced
• Fraud labelling requires trained experts; it can't be outsourced
• Fraud labelling is time-consuming
• Fraud labelling requires more than just individual instances; it requires the transactions before and after
• Fraud labelling requires data from other entities (e.g., IP address)
• Fraud labelling requires aggregate data
• Fraud tags mature at different times (e.g., chargebacks) and are not instantaneous

Page 18: Active Learning for Fraud Prevention

Active Learning For Fraud Prevention – High Level Framework

[Figure: framework diagram with components Labeled Data, Create Bags, Deep Learning Model, GBT Model, (Re)Build Models, Unlabeled Data, Predict, Query By Committee, Human Expert, Create Statistics, Active Feature Engineering, Simulate Features]
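The Query By Committee step in this framework can be sketched as follows, assuming a two-member committee (the deep learning model and the GBT model) and assuming disagreement is measured as the absolute difference between their fraud scores. The slides do not specify the disagreement measure, so that choice, the function names, and the scores are all illustrative.

```python
def committee_disagreement(scores_a, scores_b):
    """Per-instance disagreement between two models' fraud scores (assumed measure)."""
    return [abs(a - b) for a, b in zip(scores_a, scores_b)]

def select_queries(scores_a, scores_b, budget):
    """Indices of the `budget` instances the two models disagree on most."""
    disagreement = committee_disagreement(scores_a, scores_b)
    ranked = sorted(range(len(disagreement)),
                    key=lambda i: disagreement[i], reverse=True)
    return ranked[:budget]

# Made-up scores for five unlabeled transactions.
dl_scores = [0.10, 0.90, 0.40, 0.75, 0.05]
gbt_scores = [0.12, 0.30, 0.45, 0.20, 0.06]
to_label = select_queries(dl_scores, gbt_scores, budget=2)
```

The selected indices are the transactions that would be routed to the human expert; the rest stay in the unlabeled pool.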

Page 19: Active Learning for Fraud Prevention

Modeling Algorithm – Deep Learning

[Figure: neural network with Input Layer, Hidden Layers, Output Layer]

• If a network has many layers of non-linearity, it is “deep”
• Needs a scalable platform
• Needs lots of training data

Page 20: Active Learning for Fraud Prevention

Modeling Algorithm – Deep Learning

• Network topology – feed-forward
• Key parameters
  • # of hidden layers
  • # of neurons at each hidden layer
  • Regularization
  • Activation function
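A feed-forward pass makes the key parameters concrete: the number of hidden layers is the length of `hidden_layers`, the number of neurons per layer is the number of weight rows, and the activation function is applied after each hidden layer. This is a minimal sketch with hand-picked weights, using ReLU and a sigmoid output as assumed choices; it omits training and regularization.

```python
import math

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(weights, bias, inputs):
    """One fully connected layer: one weight row and one bias per neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def forward(x, hidden_layers, output_layer):
    """Feed-forward pass: hidden layers with ReLU, then a single sigmoid output."""
    activation = x
    for weights, bias in hidden_layers:
        activation = relu(dense(weights, bias, activation))
    out_weights, out_bias = output_layer
    return sigmoid(dense(out_weights, out_bias, activation)[0])

# One hidden layer with two neurons; weights are arbitrary illustration values.
hidden_layers = [([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0])]
output_layer = ([[2.0, 3.0]], [-2.0])
score = forward([1.0, 2.0], hidden_layers, output_layer)
```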

Page 21: Active Learning for Fraud Prevention

Modeling Algorithm – Gradient Boosting Trees

• GBT = Gradient Descent + Boosting
• Fit an additive (ensemble) model in a forward stage-wise manner
• In each stage, introduce a new model to compensate for the shortcomings of the existing models
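The stage-wise idea can be sketched with one-split regression trees (stumps) and squared-error loss, where "compensating for shortcomings" means fitting each new stump to the current residuals. This is an illustrative toy on made-up data, not the H2O implementation used in the experiments.

```python
def mean(values):
    return sum(values) / len(values)

def mse(ys, preds):
    return mean([(y - p) ** 2 for y, p in zip(ys, preds)])

def fit_stump(xs, residuals):
    """Find the split on x that best fits the residuals with two constants."""
    best = None
    for split in sorted(set(xs))[1:]:   # both sides of every split are non-empty
        left = [r for x, r in zip(xs, residuals) if x < split]
        right = [r for x, r in zip(xs, residuals) if x >= split]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, left_value, right_value)
    return best[1:]

def gradient_boost(xs, ys, n_trees=20, learning_rate=0.3):
    base = mean(ys)                     # stage 0: constant model
    preds = [base] * len(ys)
    stumps, mse_history = [], [mse(ys, preds)]
    for _ in range(n_trees):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        split, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((split, left_value, right_value))
        preds = [p + learning_rate * (left_value if x < split else right_value)
                 for p, x in zip(preds, xs)]
        mse_history.append(mse(ys, preds))
    return base, stumps, mse_history

xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]
base, stumps, mse_history = gradient_boost(xs, ys)
```

`mse_history` records the training error after each stage; on training data it never increases, which is also why a separate stopping criterion is needed to avoid overfitting.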

Page 22: Active Learning for Fraud Prevention

Modeling Algorithm – Gradient Boosting Trees

• Strengths
  • No pre-processing required
  • Robust
  • Scalable
• Weaknesses
  • Overfits (need to find the proper stopping point)
  • Sensitive to noise
• Key parameters
  • # of trees
  • Max depth
  • Max observations
  • Learning rate
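One common way to find the stopping point is to track loss on held-out data after each tree and stop once it has not improved for a few rounds. The sketch below assumes such a list of validation losses is already available; the `patience` parameter and the loss values are hypothetical.

```python
def early_stopping_point(val_losses, patience=3):
    """Return how many trees to keep.

    `val_losses[k]` is the validation loss after k + 1 trees. Scanning
    stops once `patience` consecutive rounds fail to beat the best loss.
    """
    best_index, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_index, best_loss = i, loss
        elif i - best_index >= patience:
            break
    return best_index + 1

# Made-up validation losses: improvement stalls after the third tree.
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
n_trees_to_keep = early_stopping_point(val_losses, patience=3)
```

Note the trade-off visible in the example: the later, lower loss is never reached because the scan stops first. A larger `patience` would find it at the cost of training more trees.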

Page 23: Active Learning for Fraud Prevention


EXPERIMENTS

Page 24: Active Learning for Fraud Prevention

Datasets

• Training data
  • 1 year
  • 11 million transactions (1 million for active labelling)
• Test data
  • 4 months
  • 4 million transactions
• # of features
  • 500-600

Page 25: Active Learning for Fraud Prevention

Tools

• H2O
  • Open source
  • Scalable
  • Robust
  • Deep Learning & GBM implementations
• R
  • Open source
  • Active learning package

Page 26: Active Learning for Fraud Prevention

Early Results – Active Learning Shows Promise…

# of instances queried    AUC (*weighted)
0                         0.960
1,000                     0.961
10,000                    0.963
50,000                    0.971
100,000                   0.975
500,000                   0.977
1,000,000                 0.979
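One way to read these results is in terms of the "area we are missing" (1 − AUC) from the earlier slide. The sketch below takes the seven reported AUC values and computes how much of the baseline's missing area each query budget recovers. The framing is an interpretation added here, not a figure from the talk, and it ignores whatever weighting the reported AUC uses.

```python
# (instances queried, AUC) pairs as reported on this slide.
results = [
    (0, 0.960), (1_000, 0.961), (10_000, 0.963), (50_000, 0.971),
    (100_000, 0.975), (500_000, 0.977), (1_000_000, 0.979),
]

def missing_area_reduction(baseline_auc, auc_value):
    """Fraction of the baseline's missing area (1 - AUC) that has been recovered."""
    return (auc_value - baseline_auc) / (1.0 - baseline_auc)

baseline_auc = results[0][1]
reductions = [(n, missing_area_reduction(baseline_auc, a)) for n, a in results]
final_reduction = reductions[-1][1]
```

At 1,000,000 queried instances the missing area falls from 0.040 to 0.021, a reduction of about 47.5% relative to the baseline.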

Page 27: Active Learning for Fraud Prevention


CONCLUSIONS

Page 28: Active Learning for Fraud Prevention

Conclusions

• Deep learning & GBT have shown tremendous performance for fraud detection
• Active learning shows promise in improving the performance of these champion models
• Active learning also significantly reduces our labelling cost