Using Predictive Analytics for Fraud Detection Presented by: Manoj Chiba for ICFP Date: August 2017



Data is [becoming] the [new] raw material of business ~ Craig Mundie – Modified ~

If we have data, let’s look at the data. If all we have are opinions, let’s go with mine ~ Jim Barksdale ~

Agenda
• Why analytics and What is data?
• Data scientist and the Fraud Data Scientist
• Predictive Analytics
• What predictive analytics means for Fraud
• Use cases
• Fraud analytics process model

Why Analytics and What is data?

Why Analytics?
If the current rate of change and complexity were to remain constant, we would have experienced all the major milestones of the twentieth century – in a single week in 2025!

1. The creation of the automobile;
2. The first and second world war AND the Vietnam war;
3. Decoding of the DNA structure;
4. Nuclear energy;
5. Space travel;
6. The internet; and
7. Human genome sequencing

The challenge for organizations is how to navigate this and build strategies that identify the trends of the future. Analytics is postulated to be the answer! = IDENTIFICATION OF PRESENT AND FUTURE TRENDS

Consider the following in a single day… online
1. Enough information is consumed to fill ±168 Million DVDs
2. ±294 Billion emails are sent
3. ±2 Million blog posts are written
4. ±4.7 Million minutes are spent on Facebook
5. ±864,000 hours of video are uploaded on YouTube

For any analytics we need data… So what is data?

What many think data is…

Gobbledygook (noun, informal)
Language that is meaningless or is made unintelligible by excessive use of abstruse technical terms; nonsense
Synonyms: gibberish, claptrap, nonsense, balderdash, blather, garbage

The often forgotten data

The most often “forgotten” data

Macro-economic influences

Emotional Importance

Terminology

• Unstructured Data: Data that has no identifiable structure – for example, the text of email messages
• Structured Data: Data that is organised by a predetermined structure.

The data problem?

Data exists but the problem is:
• Data Mining
• Data Analysis Skills
• Understanding what it means for my business

The question shifts from what do we think, to what do we know?
95% Resides internally
34% Recognised globally

BIG data
• Volume
• Value
• Velocity
• Variety
• Veracity

While “size” of data is traditionally the hallmark of big data, the term is poor, and may be better rooted in an understanding that Big Data is about capacity to SEARCH, AGGREGATE and CROSS-REFERENCE datasets

Business Value

But where are we???

[Figure: timeline. Labels in order of appearance: Hadoop; 2006; Hype of BD; 2011; 2014; BD plateau; 2015; 2016; AI, machine learning, deep learning…]

How offerings have changed (2012)

How the offerings have changed (2016)
Courtesy of Firstmark

• Maturity has been reached….
• Trend in:
  • From Infrastructure (Developers and Engineers) to Analytics (Data Scientists and Analysts)
  • From Analytics (Data Scientists and Analysts) to Application (Business users and consumers) - In our context Fraud Detection!

What does this mean?

STOP! Who? The data scientist?

In a world of near infinite data, professionals who can fish out insights from the ocean of data we’re drowning swimming in are incredibly attractive.

~ Scott Brinker – Chiefmartec ~

[Figure: Data Science Venn diagram. Labels: Computer Science, Math & Statistics, Subject Matter Expertise, Machine Learning, Traditional Software, Traditional Research, Unicorn]

Interesting data versus Actionable data

Interesting vs Actionable

Interesting:
• Nice to know
• Does NOT help you make informed decisions
• Does NOT provide insight: Why should we care?

Actionable:
• Insights > Action
• Design Programmes
• Develop strategies
• Achieve goals

Simple example of actionable data: My Fitbit
2,259 steps
DATA > INFORMATION > INSIGHT > ACTION

The Fraud data scientist

Predictive Analytics

What is predictive analytics… Why Predictive Analytics matters…

Not what will happen… What might happen…

• Predictive Analytics, like statistics, has been around for a long time…
• So what has changed?
1. Increase in volume and type of data
2. Greater interest in data for insights
3. Computing power, and “point and click”
4. Tougher economic conditions and need for competitive differentiation: Business efficiency; ROI…..

Time for predictive analytics has come…

Why predictive analytics matters

• Descriptive
  • What are the characteristics of those who commit fraud? How do I turn my data into rules for better decisions?
  • Knowledge
• Predictive
  • How likely is a claim from a person or business with those characteristics to be fraudulent?
  • Action, Uncertainty
  • Usable probability
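The step from descriptive knowledge to a usable probability can be sketched as follows. This is a minimal illustration, not the presenter's method: the claim records, the characteristic names (`new_policy`, `late_report`) and the smoothing prior are all hypothetical assumptions.

```python
from collections import defaultdict

# Hypothetical historical claims: (characteristics, known fraud outcome).
HISTORY = [
    ({"new_policy": True,  "late_report": True},  True),
    ({"new_policy": True,  "late_report": True},  True),
    ({"new_policy": True,  "late_report": True},  False),
    ({"new_policy": True,  "late_report": False}, False),
    ({"new_policy": False, "late_report": True},  False),
    ({"new_policy": False, "late_report": False}, False),
    ({"new_policy": False, "late_report": False}, False),
    ({"new_policy": False, "late_report": False}, False),
]

def profile_key(characteristics):
    """Turn a dict of characteristics into a hashable profile."""
    return tuple(sorted(characteristics.items()))

def describe(history):
    """Descriptive step: count claims and known frauds per profile."""
    counts = defaultdict(lambda: [0, 0])  # profile -> [frauds, total]
    for characteristics, is_fraud in history:
        entry = counts[profile_key(characteristics)]
        entry[0] += int(is_fraud)
        entry[1] += 1
    return counts

def fraud_probability(counts, characteristics, prior_frauds=1, prior_total=4):
    """Predictive step: smoothed fraud rate for a new claim's profile.

    The prior (1 fraud in 4 claims) is an arbitrary assumption that keeps
    unseen or rare profiles away from 0% and 100%.
    """
    frauds, total = counts.get(profile_key(characteristics), (0, 0))
    return (frauds + prior_frauds) / (total + prior_total)

counts = describe(HISTORY)
risky = fraud_probability(counts, {"new_policy": True, "late_report": True})
safe = fraud_probability(counts, {"new_policy": False, "late_report": False})
print(f"risky profile: {risky:.2f}, safe profile: {safe:.2f}")
```

The output is a probability rather than a verdict, which matches the "what might happen" framing: a threshold on it decides which claims go to an investigator.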

Sample techniques of predictive analytics…
• Rule Induction
• Decision tree & classification
• Regression
• Clustering
• Affinity Analysis
• Nearest Neighbor
• Neural Networks
• Genetic algorithms

What does this all mean for fraud detection?

What does this mean for fraud detection and prevention
• Big Data and analytics provide powerful tools that may improve an organization’s fraud detection systems
• COMPLEMENTARY to traditional expert-based fraud-detection approaches - DOES NOT REPLACE!!!

Social networks: That is, fraudulent companies are more connected to other fraudulent companies than to non-fraudulent companies.


Contextual information: Social Network Analysis
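One simple way to use this kind of network context is to measure how much of a company's direct neighbourhood is already known to be fraudulent. The sketch below assumes a hypothetical list of company links and hypothetical fraud labels; real social network analysis typically uses richer measures than this single share.

```python
# Hypothetical company network: each pair is a link (shared director, address, ...).
LINKS = [
    ("A", "B"), ("A", "C"), ("B", "C"),   # tightly linked cluster
    ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G"),
]
KNOWN_FRAUD = {"A", "B"}  # companies with confirmed fraud (assumed labels)

def build_neighbours(links):
    """Build an undirected adjacency map from a list of links."""
    neighbours = {}
    for left, right in links:
        neighbours.setdefault(left, set()).add(right)
        neighbours.setdefault(right, set()).add(left)
    return neighbours

def fraud_exposure(neighbours, known_fraud):
    """Share of each company's direct neighbours that are known fraudsters."""
    return {
        company: len(linked & known_fraud) / len(linked)
        for company, linked in neighbours.items()
    }

exposure = fraud_exposure(build_neighbours(LINKS), KNOWN_FRAUD)
for company, share in sorted(exposure.items(), key=lambda item: -item[1]):
    print(f"{company}: {share:.2f}")
```

In this toy network the unlabelled company C has the highest exposure, so it would be the first candidate for an expert review. The score is only an indicator.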

Use Case 1: Analytics Applied to Fraud Detection

Use case of predictive analytics to detect fraud
• Context: Car insurance company in SA, operates globally
• Declining profits > increased premiums = fraudulent claims
• Historical claims data with known fraud outcomes to predict probability that new claims are fraudulent!
• Understanding what has happened
• Problem: Repair shops that inflate estimates

What we do using analytics… Geo-spatial data
• Our problem: Repair shops that inflate repair estimates

Use of Data:
• Claimants’ address (Geocoded)
• Location of repair shops
• Average claim estimate for a particular problem

Analyzing the data:
• Map areas where estimates are higher than the average
• Overlay claimants’ address

Algorithm:
• Predict based on distance claimant travels to get a repair done > WHY travelling outside a radius?
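The distance rule above can be sketched with the haversine formula. The claim records, the 30 km radius and the average estimate are hypothetical assumptions chosen for illustration; the coordinates are approximate city centres of Johannesburg and Pretoria.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def flag_claim(claim, average_estimate, radius_km=30.0):
    """Flag a claim when the claimant travelled beyond the radius AND the
    repair estimate is above the average for that type of problem."""
    distance = haversine_km(*claim["claimant"], *claim["shop"])
    return distance > radius_km and claim["estimate"] > average_estimate

# Hypothetical geocoded claims (claimant address, repair shop, estimate).
claims = [
    {"id": 1, "claimant": (-26.2041, 28.0473), "shop": (-26.1952, 28.0341), "estimate": 9000},
    {"id": 2, "claimant": (-26.2041, 28.0473), "shop": (-25.7479, 28.2293), "estimate": 15500},
    {"id": 3, "claimant": (-26.2041, 28.0473), "shop": (-25.7479, 28.2293), "estimate": 8000},
]
flagged = [c["id"] for c in claims if flag_claim(c, average_estimate=10000)]
print(flagged)
```

Only the claim that combines a long trip with an above-average estimate is flagged. A long trip on its own (claim 3) is not treated as suspicious in this sketch.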

Use case of predictive analytics to detect fraud
• Algorithm: Claimants travelling a distance to get a repair done correlates with the repair shop providing over-estimates (above average)
• > inflated estimate > potential fraud
• Outcome:
  • Reduce time required to refer questionable claims for investigation by as much as 95%.
  • Success rate in pursuing fraudulent claims from 50% to 88%!
• Healthcare in Kenya!

Use Case 2: Analytics Applied to Fraud Detection

Use case of predictive analytics to manage & prevent fraud
• Context: Insurance (Turkey)
• Mismatch between public and private profiles of individuals (narratives for claims) > Public data to serve as a reference for internal database records
• Relationship between customer profile and fraudulent claims
• Use of social media as a listening tool

What we do using analytics… Social CRM
• Our problem: Characteristics (customer behavior and fraudulent claims)

Use of Data:
• Consumers’ internal “known” data corroborated with external social data (e.g. check-in at “home” is 50 km away from registered address)
• Using social analytics (text and images; check-ins; likes etc.)

Analyzing the data:
• Build behavioral profiles from social media data;
• Overlay behavioral data with known fraudulent claims

Algorithm:
• Predict based on behavioral data PROBABILITY of fraudulent claim (relationship between customer behavior and fraudulent claims)

Send for investigation: 86% accuracy. Social analytics is only an indicator > Investigators confirm independently

Use Case 3: Analytics Applied to Fraud Detection

Use case of predictive analytics to understand credit card fraud… Early adopters
• Context: Financial institution (large impairments on CC fraud)
• “Classic” symptoms: Small purchase followed by a big one; large number of online purchases in a short period of time; spending as much as possible quickly; smaller amounts, spread across times
• Problem: “Normal” behaviour patterns of CC usage > outliers

What we do using analytics… Supervised and unsupervised learning
• Our problem: Identify characteristics of transactions that deviate from the normal behavior

Use of Data:
• 2 million+ CC holders
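A minimal unsupervised sketch of "deviation from normal behavior" is a robust outlier score per cardholder. It uses the median and the median absolute deviation (MAD); the constant 0.6745 and the 3.5 cut-off are conventional choices for this score, and the transaction history is hypothetical. It is an illustration, not the institution's model.

```python
from statistics import median

def robust_scores(amounts):
    """Modified z-score of each amount, based on the median and the
    median absolute deviation (MAD) instead of mean and standard deviation."""
    centre = median(amounts)
    mad = median(abs(a - centre) for a in amounts)
    if mad == 0:
        return [0.0 for _ in amounts]  # no spread to measure against
    return [0.6745 * (a - centre) / mad for a in amounts]

def outliers(amounts, threshold=3.5):
    """Indices of transactions whose robust score exceeds the threshold."""
    return [i for i, s in enumerate(robust_scores(amounts)) if abs(s) > threshold]

# Hypothetical card history: routine spending, then a small test purchase
# followed by a big one (one of the "classic" symptoms).
history = [120, 95, 130, 110, 105, 125, 98, 115, 5, 4200]
print(outliers(history))
```

The median-based score is used here because a single very large purchase would inflate a mean and standard deviation and partly hide itself. Both the unusually small and the unusually large purchase stand out against the cardholder's routine spending.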

Results

Business Results
• 350+ hours of pure analysis
• 3 Months understanding
• Near-real time detection of fraudulent purchases and CC use
• 76% accuracy… >85% once data issues fixed

Fraud analytics process model
1. Identify business problem
2. Identify data sources
3. Select the data
4. Clean the data
5. Transform the data
6. Analyze the data
7. Interpret, evaluate, deploy the model

Stage label in the figure: Preprocessing
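The select, clean and transform steps can be sketched on a handful of records. The field names, the raw values and the decision to drop records with a missing amount are hypothetical assumptions; a real project would choose its cleaning rules with the business problem in mind.

```python
from datetime import date

# Hypothetical raw claim records pulled from a source system.
RAW = [
    {"claim_id": "C1", "amount": "12 500", "filed": "2017-03-01", "fraud": "N", "agent": "x"},
    {"claim_id": "C2", "amount": "",       "filed": "2017-03-04", "fraud": "Y", "agent": "y"},
    {"claim_id": "C1", "amount": "12 500", "filed": "2017-03-01", "fraud": "N", "agent": "x"},
    {"claim_id": "C3", "amount": "48 000", "filed": "2017-03-09", "fraud": "Y", "agent": "z"},
]

def select(records, fields):
    """Select: keep only the fields relevant to the business problem."""
    return [{f: r[f] for f in fields} for r in records]

def clean(records):
    """Clean: drop duplicate claim ids and records with a missing amount."""
    seen, kept = set(), []
    for r in records:
        if r["amount"].strip() and r["claim_id"] not in seen:
            seen.add(r["claim_id"])
            kept.append(r)
    return kept

def transform(records):
    """Transform: convert text fields into typed values a model can use."""
    return [
        {
            "claim_id": r["claim_id"],
            "amount": float(r["amount"].replace(" ", "")),
            "filed": date.fromisoformat(r["filed"]),
            "fraud": r["fraud"] == "Y",
        }
        for r in records
    ]

prepared = transform(clean(select(RAW, ["claim_id", "amount", "filed", "fraud"])))
print(prepared)
```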

Key characteristics of successful fraud analytics models
• Statistical Accuracy
• Interpretability
• Operational efficiency
• Economic cost
• Regulatory compliance
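Statistical accuracy and economic cost can be looked at together once flagged claims have known outcomes. The sketch below is a hedged illustration: the outcome data, the investigation cost and the average fraud loss are hypothetical figures.

```python
def evaluate(outcomes, investigation_cost, average_fraud_loss):
    """Summarise a fraud model from (flagged, actually_fraud) pairs."""
    tp = sum(1 for flagged, fraud in outcomes if flagged and fraud)
    fp = sum(1 for flagged, fraud in outcomes if flagged and not fraud)
    fn = sum(1 for flagged, fraud in outcomes if not flagged and fraud)
    tn = sum(1 for flagged, fraud in outcomes if not flagged and not fraud)
    return {
        "accuracy": (tp + tn) / len(outcomes),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Cost of investigating every flagged claim plus losses on missed fraud.
        "cost": (tp + fp) * investigation_cost + fn * average_fraud_loss,
    }

# Hypothetical outcomes for 10 claims: (model flagged it, it was really fraud).
outcomes = (
    [(True, True)] * 3 + [(True, False)] * 1 +
    [(False, True)] * 1 + [(False, False)] * 5
)
report = evaluate(outcomes, investigation_cost=500, average_fraud_loss=20000)
print(report)
```

Precision is the share of flagged claims that turn out to be fraudulent, and recall is the share of fraud that gets flagged. In this toy example the single missed fraud accounts for most of the cost, which is why accuracy alone is a weak basis for choosing a model.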

With the right data
• Garbage data in > Garbage data out
• Master Data Management
  • Policies
  • Governance
  • Processes
  • Standards and Tools
• Leads to increased accuracy of predictive models

At the heart of predictive analytics

ANALYTICS
Data Science is what Data Scientists do….

Bring in thinking and expertise from a variety of fields to solve “problems”

So why are we NOT leveraging predictive analytics…
1. Data-driven company culture
2. What is the value (cost vs benefit)
3. Innovation: Saying no before trying – losing first mover advantage
4. Leadership: More data does not lead to success – making sense of the data with clear goals does!
5. Talent management: As data becomes cheaper, the complements become expensive
  1. Data Scientists with a business understanding become central – Do we have the skills? What skills do we need? What is a data scientist?
  2. Problem solving skills: logic and reasoning – the ability to know how non-traditional and traditional data sources can assist business derive and drive value

Thank You

For more information: [email protected]; [email protected]; [email protected]; 0827845769