real-time analytics in financial
TRANSCRIPT
1 © Hortonworks Inc. 2011–2016. All Rights Reserved
Real-time Analytics in Financial: Use Case, Architecture and Challenges
Yifeng Jiang (蒋逸峰), Solutions Engineer, Hortonworks, @uprush
October 26, 2016
The YAP
Map by Google Maps
http://www.wondermondo.com
Today’s Money & Financial Service
money money
Financial Service
0110010100 0110010100
Every Financial Service is a Big Data Service
→ Financial services are BIG
  – Too big to fail
→ Every financial service is eventually a big data service
  – Number of transactions
  – Number of jobs
  – Third-party data
How Big is Big Data in Financial?
→ Millions to billions of transactions per day
  – Hundreds to tens of thousands of transactions per second
→ Big Data in banking, payment, security, etc.
Big Data Use Case in Financial
http://www.forbes.com/sites/bernardmarr/2016/09/09/big-data-in-banking-how-citibank-delivers-real-business-benefits-with-their-data-first-approach/#7759859f75ed
Why Real-time Analytics in Financial?
Can you detect fraud from millions to billions of transactions per day in real time?
“The costs resulting from these anomalies is far easier to correct if spotted quickly – or even before it happens – through predictive modeling.”
Recent news that caught my attention
http://gendai.ismedia.jp/articles/-/48832
http://mainichi.jp/articles/20161012/k00/00e/040/243000c
https://roboteer-tokyo.com/archives/4415
Real-time Analytics in Financial: Use Case, Architecture & Challenges
A Simple Use Case: Real-time Surveillance
Detect abnormal transactions in a stock exchange
→ Trigger an alert if
  – A customer's buy/sell amount exceeds 500M JPY in 3 minutes
→ 300K transactions per second
→ Abnormal transactions must be detected within 10 s
Alert
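The alert condition above can be sketched as a per-customer sliding-window sum. This is only an illustration of the rule, not the deck's implementation: the class and method names are hypothetical, timestamps are assumed to be seconds, and trades are assumed to arrive in timestamp order for each customer.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3 * 60        # 3-minute window from the slide
THRESHOLD_JPY = 500_000_000    # 500M JPY from the slide


class AmountWindow:
    """Tracks each customer's traded amount over a sliding time window."""

    def __init__(self, window_seconds=WINDOW_SECONDS, threshold=THRESHOLD_JPY):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.trades = defaultdict(deque)  # customer_id -> deque of (ts, amount)
        self.totals = defaultdict(int)    # customer_id -> sum of amounts in window

    def on_trade(self, customer_id, ts, amount):
        """Record a trade; return True if the customer now exceeds the threshold."""
        queue = self.trades[customer_id]
        queue.append((ts, amount))
        self.totals[customer_id] += amount
        # Evict trades that have fallen out of the window.
        while queue and queue[0][0] <= ts - self.window_seconds:
            _, old_amount = queue.popleft()
            self.totals[customer_id] -= old_amount
        return self.totals[customer_id] > self.threshold


window = AmountWindow()
window.on_trade("c1", ts=0, amount=300_000_000)
alert = window.on_trade("c1", ts=60, amount=250_000_000)  # 550M JPY within 3 minutes
```

Keeping a running total per customer avoids re-summing the window on every trade, which matters at hundreds of thousands of transactions per second.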
Real-time Surveillance Architecture v1
[Diagram] Architecture v1. Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm); Aggregator (Storm); Master data, raw & aggregated trade (HBase + Phoenix); Surveillance Rule Engine; Surveillance Alerts.
Labeled flows: master data lookup; insert trade (raw & aggregated).
Callout: how?
Real-time Data Ingestion
→ From transaction database
  – Change Data Capture (CDC)
  – Not practical for most financial systems
→ From gateway system
  – Receive data from the gateway system
  – Send data to Kafka (as a Kafka producer)
Real-time Surveillance Architecture v1
[Diagram] Architecture v1. Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm); Aggregator (Storm); Master data, raw & aggregated trade (HBase + Phoenix); Surveillance Rule Engine; Surveillance Alerts.
Labeled flows: via CDC or gateway; master data lookup; insert trade (raw & aggregated).
Callout: overhead?
Real-time Data Lookup
→ Low-latency data store
  – Master data
  – NoSQL database: HBase (+ Phoenix), Redis
→ Use a local cache
  – LRU cache in Storm bolts
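A local LRU cache in front of the master data store might look like the sketch below. It is plain Python rather than Storm bolt code; `LruCache` and the `loader` callback are hypothetical names, and the loader stands in for a read against HBase or Redis.

```python
from collections import OrderedDict


class LruCache:
    """Least-recently-used cache that falls back to a loader on a miss."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # called on a miss, e.g. a read from the data store
        self.entries = OrderedDict()  # ordered from least to most recently used
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.loader(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


# Hypothetical master data; a dict stands in for the remote store.
master_data = {"7203": "Toyota", "6758": "Sony", "9984": "SoftBank"}
cache = LruCache(capacity=2, loader=master_data.__getitem__)
name = cache.get("7203")  # first read is a miss and goes to the store
```

Each miss costs one remote read, so the hit rate decides how much lookup overhead the enricher pays. Cached entries can go stale when master data changes, so a real deployment would also need an expiry or invalidation policy.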
Real-time Surveillance Architecture v2
[Diagram] Architecture v2, with cache. Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm); Aggregator (Storm); Master data, raw & aggregated trade (HBase + Phoenix); Surveillance Rule Engine; Surveillance Alerts.
Labeled flows: via CDC or gateway; local master data cache; master data lookup; insert trade (raw & aggregated).
Callout: exactly-once?
Exactly-Once?
Message delivery semantics
→ At-most-once: may lose data, but no duplication
→ At-least-once: no data loss, but may duplicate
→ Exactly-once: no data loss, no duplication
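One way to see where the first two semantics come from is the order of "commit the offset" and "process the message" around a failure. The simulation below is a toy model under that assumption, not the Kafka or Storm API: `consume` and its parameters are hypothetical, and the consumer crashes exactly once at a chosen offset before restarting from the last committed offset.

```python
def consume(messages, commit_before_processing, crash_at):
    """Process messages with one simulated crash; return what was processed."""
    processed = []
    committed = 0      # next offset to read after a restart
    crashed = False
    offset = 0
    while offset < len(messages):
        if commit_before_processing:
            committed = offset + 1              # commit first, process afterwards
        if offset == crash_at and not crashed:
            crashed = True
            if not commit_before_processing:
                # The work was done, but the crash hits before the commit.
                processed.append(messages[offset])
            offset = committed                  # restart from the committed offset
            continue
        processed.append(messages[offset])
        committed = offset + 1
        offset += 1
    return processed


at_most_once = consume(["a", "b", "c"], commit_before_processing=True, crash_at=1)
at_least_once = consume(["a", "b", "c"], commit_before_processing=False, crash_at=1)
```

Committing first loses message "b"; committing afterwards processes "b" twice. Neither ordering alone yields exactly-once.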
Exactly-Once!
NO real exactly-once message delivery in a distributed system
→ There is no such thing as exactly-once delivery
→ Exactly-once is an end-to-end requirement
But… people like exactly-once, especially in financial service systems
Effectively-once
Exactly-once semantics (a better phrase: “effectively-once”) with at-least-once delivery + idempotent operations
→ Kafka & Storm guarantee at-least-once
→ De-duplicate by ensuring idempotency in your application
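The difference an idempotent operation makes under at-least-once delivery can be shown with a small sketch. The record layout and function names are hypothetical; plain dicts stand in for the data store.

```python
def apply_idempotent(store, trade):
    """Upsert keyed by trade_id: replaying the same trade leaves the store unchanged."""
    store[trade["trade_id"]] = trade["amount"]


def apply_non_idempotent(totals, trade):
    """Increment a running total: replaying the same trade double-counts it."""
    customer = trade["customer_id"]
    totals[customer] = totals.get(customer, 0) + trade["amount"]


# At-least-once delivery: trade t1 is redelivered after a simulated failure.
delivered = [
    {"trade_id": "t1", "customer_id": "c1", "amount": 100},
    {"trade_id": "t2", "customer_id": "c1", "amount": 50},
    {"trade_id": "t1", "customer_id": "c1", "amount": 100},  # duplicate delivery
]

store, totals = {}, {}
for trade in delivered:
    apply_idempotent(store, trade)
    apply_non_idempotent(totals, trade)

idempotent_total = sum(store.values())  # unaffected by the duplicate
incremented_total = totals["c1"]        # inflated by the duplicate
```

The keyed upsert gives the same result no matter how many times a message is replayed, which is what makes the pipeline effectively-once even though delivery is only at-least-once.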
De-duplication in window computation
Most window computations can be made idempotent
→ Examples: aggregation, counting, etc.
→ De-duplicate messages in the window
  – Using a local in-memory state store, e.g. a Set class
[Diagram] Components: Trading Events in Kafka; Aggregator (Storm); IDRegistry (local in-memory); Aggregated Trade Data.
1. Pull data in 3m window
2. Lookup trade_id
3. Count de-duplicated events
4. Insert trade_id
5. Output aggregated data
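The five numbered steps can be sketched as a single function over the events of one window. This is a minimal illustration in plain Python rather than a Storm bolt; the function name and event fields other than `trade_id` are assumptions, and a Python `set` plays the role of the local in-memory IDRegistry.

```python
def aggregate_window(events):
    """Sum amounts per customer for one window, counting each trade_id once."""
    seen_ids = set()   # local in-memory IDRegistry, scoped to this window
    totals = {}
    duplicates = 0
    for event in events:                      # 1. events pulled for the window
        if event["trade_id"] in seen_ids:     # 2. look up trade_id
            duplicates += 1
            continue
        customer = event["customer_id"]       # 3. count only de-duplicated events
        totals[customer] = totals.get(customer, 0) + event["amount"]
        seen_ids.add(event["trade_id"])       # 4. insert trade_id
    return totals, duplicates                 # 5. output aggregated data


window_events = [
    {"trade_id": "t1", "customer_id": "c1", "amount": 100},
    {"trade_id": "t2", "customer_id": "c2", "amount": 70},
    {"trade_id": "t1", "customer_id": "c1", "amount": 100},  # redelivered
    {"trade_id": "t3", "customer_id": "c1", "amount": 50},
]
totals, duplicates = aggregate_window(window_events)
```

Because the registry only has to cover one window, it can stay in local memory and be discarded when the window closes. Returning the duplicate count also gives the pipeline something to log for auditing.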
De-duplication in non-idempotent computation
→ Exactly-once in non-idempotent computation
  – Example: joining continuous data streams
  – Global state store required: HBase, Redis
  – Batching can help reduce the number of ID lookups.
→ Exactly-once is expensive; avoid it where you can
[Diagram] Components: Click Logs in Kafka; Joiner (Storm); IDRegistry (external NoSQL); Query Logs; Joined Click Logs.
1. Pull data continuously
2. Lookup click_id
3. Lookup query
4. Insert click_id
5. Output joined click
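A batched version of the join steps might look like the following sketch. Everything here is an assumption for illustration: `IdRegistry` is an in-memory stand-in for an external store such as HBase or Redis, the record fields other than `click_id` are invented, and clicks with no matching query are simply skipped.

```python
class IdRegistry:
    """In-memory stand-in for an external, globally shared ID store."""

    def __init__(self):
        self.ids = set()
        self.round_trips = 0   # counts simulated calls to the external store

    def filter_unseen(self, ids):
        """One batched lookup: return the ids that are not registered yet."""
        self.round_trips += 1
        return {i for i in ids if i not in self.ids}

    def insert(self, ids):
        """One batched insert of newly processed ids."""
        self.round_trips += 1
        self.ids.update(ids)


def join_batch(clicks, queries, registry):
    """Join a batch of clicks to queries, emitting each click_id at most once."""
    unseen = registry.filter_unseen({c["click_id"] for c in clicks})  # lookup click_id
    joined, emitted = [], set()
    for click in clicks:
        click_id = click["click_id"]
        if click_id not in unseen or click_id in emitted:
            continue                                  # duplicate across or within batches
        query = queries.get(click["query_id"])        # lookup query
        if query is None:
            continue                                  # no matching query: skipped in this sketch
        joined.append({**click, "query": query})
        emitted.add(click_id)
    registry.insert(emitted)                          # insert click_id
    return joined                                     # output joined clicks


registry = IdRegistry()
queries = {"q1": "hbase tuning"}
first = join_batch(
    [{"click_id": "k1", "query_id": "q1"},
     {"click_id": "k1", "query_id": "q1"},
     {"click_id": "k2", "query_id": "q1"}],
    queries, registry)
second = join_batch(
    [{"click_id": "k2", "query_id": "q1"},
     {"click_id": "k3", "query_id": "q1"}],
    queries, registry)
```

Batching turns one lookup per click into two round trips per batch. The sketch does not make the output and the ID insert atomic, so a crash between the two could still produce a duplicate; closing that gap is part of why exactly-once is expensive.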
Real-time Surveillance Architecture v3
[Diagram] Architecture v3, effectively-once. Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm); Aggregator (Storm); IDRegistry (local in-memory); Master data, raw & aggregated trade (HBase + Phoenix); Surveillance Rule Engine; Surveillance Alerts.
Labeled flows: via CDC or gateway; local master data cache; master data lookup; IDRegistry lookup / insert, de-duplicate in window; insert trade (raw & aggregated).
Callout: order?
Late Messages
http://www.slideshare.net/HadoopSummit/apache-beam-a-unified-model-for-batch-and-stream-processing-data
Handling Late Messages
→ Expect late messages
  – A streaming application needs to handle out-of-order events, e.g., emit late messages to a special Kafka topic
→ Use source-generated timestamps
→ Storm's late-message support in window computation (BaseWindowedBolt)
  – withTimestampField(String fieldName)
  – withLag(Duration duration)
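The idea behind a timestamp field plus a maximum lag can be illustrated with a simplified watermark. This sketch is not Storm's actual algorithm or API: `LateEventRouter` is a hypothetical name, the watermark is simply the highest source timestamp seen minus the allowed lag, and a list stands in for the special Kafka topic that receives late messages.

```python
class LateEventRouter:
    """Routes events as on-time or late based on source-generated timestamps."""

    def __init__(self, max_lag_seconds):
        self.max_lag_seconds = max_lag_seconds
        self.max_event_ts = None   # highest source timestamp seen so far
        self.on_time = []
        self.late = []             # stand-in for a dedicated late-message topic

    def watermark(self):
        """Events older than this timestamp are treated as late."""
        if self.max_event_ts is None:
            return float("-inf")
        return self.max_event_ts - self.max_lag_seconds

    def route(self, event):
        """Return 'late' or 'on_time' and store the event accordingly."""
        if event["ts"] < self.watermark():
            self.late.append(event)
            return "late"
        if self.max_event_ts is None or event["ts"] > self.max_event_ts:
            self.max_event_ts = event["ts"]
        self.on_time.append(event)
        return "on_time"


router = LateEventRouter(max_lag_seconds=5)
results = [router.route({"ts": ts}) for ts in (100, 97, 110, 104)]
```

The event at 97 is out of order but still within the lag, so it is processed normally. The event at 104 arrives after the watermark has advanced to 105 and is set aside rather than silently dropped.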
Can I trust the data?
Duplications!
Out-of-order late messages!
Data loss?
Monitor Data Processing Pipeline Quality
Approaches to monitor data pipeline quality
→ Audit completeness
→ Output duplicated and late messages to logs for auditing.
→ Define a service level objective (SLO) for data quality and monitor the SLO.
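A completeness audit can be as simple as reconciling the IDs that entered the pipeline against the IDs that came out. The sketch below assumes both sides can be listed by ID for a given period; the function name and report keys are hypothetical.

```python
from collections import Counter


def audit_completeness(source_ids, sink_ids):
    """Compare source and sink IDs; report lost and duplicated records."""
    sink_counts = Counter(sink_ids)
    missing = sorted(set(source_ids) - set(sink_counts))             # data loss
    duplicated = sorted(i for i, n in sink_counts.items() if n > 1)  # duplicates
    return {"missing": missing, "duplicated": duplicated}


report = audit_completeness(
    source_ids=["t1", "t2", "t3"],
    sink_ids=["t1", "t3", "t3"],
)
```

Run periodically, a report like this turns "can I trust the data?" into numbers that can be tracked against an SLO.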
Define Data Processing Pipeline SLO
Design a practical SLO for the pipeline
→ Process 99.9999% of events within a few seconds
→ and 100% of events within a few hours
→ At-most-once semantics at any point of time
→ Near exactly-once semantics in near real-time
→ And exactly-once semantics eventually
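The latency part of such an SLO can be checked from measured end-to-end latencies. In this sketch the function names are hypothetical, and because the slide leaves "a few seconds" and "a few hours" unspecified, the 5-second and 3-hour targets in the usage lines are placeholder assumptions.

```python
def slo_report(latencies_seconds, fast_target_seconds, slow_target_seconds):
    """Return the fraction of events processed within each latency target."""
    total = len(latencies_seconds)
    if total == 0:
        raise ValueError("no events to evaluate")
    within_fast = sum(1 for s in latencies_seconds if s <= fast_target_seconds)
    within_slow = sum(1 for s in latencies_seconds if s <= slow_target_seconds)
    return {"within_fast": within_fast / total, "within_slow": within_slow / total}


def meets_slo(report, fast_ratio=0.999999, slow_ratio=1.0):
    """Check the report against the 99.9999% and 100% objectives."""
    return report["within_fast"] >= fast_ratio and report["within_slow"] >= slow_ratio


# Placeholder targets: 5 seconds and 3 hours.
sample = slo_report([1, 2, 3, 7200, 4], fast_target_seconds=5, slow_target_seconds=3 * 3600)
ok = meets_slo(sample)
```

A single event that takes two hours is enough to miss the fast objective in a sample this small, while the slow objective still holds.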
The Rule Engine & The Architecture
Hundreds of rules
→ A stock's trading price jumps up/down > k% and total amount > m% in K minutes
→ Single ATM cash withdrawal > k% and number of ATMs > m in K minutes
Many of these rules fit into this simple architecture!
Rule Engine
✓✗? ? ? ✓
Rule-based only?
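Rules of this shape can be expressed as predicates over the aggregated statistics of a K-minute window. The sketch below is one possible layout, not the deck's rule engine: `WindowStats`, `price_jump_rule`, and `evaluate` are hypothetical names, and the amount threshold is treated as an absolute value because the slide does not say what the percentage is relative to.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Aggregates for one stock over a K-minute window."""
    open_price: float
    last_price: float
    total_amount: float


def price_jump_rule(max_move_pct, min_amount):
    """Build a rule: price moved more than max_move_pct and amount exceeds min_amount."""
    def rule(stats):
        move_pct = abs(stats.last_price - stats.open_price) / stats.open_price * 100
        return move_pct > max_move_pct and stats.total_amount > min_amount
    return rule


def evaluate(rules, stats):
    """Return the names of all rules that fire for the given window."""
    return [name for name, rule in rules.items() if rule(stats)]


rules = {
    "jump_5pct": price_jump_rule(max_move_pct=5, min_amount=1_000_000),
    "jump_20pct": price_jump_rule(max_move_pct=20, min_amount=1_000_000),
}
fired = evaluate(rules, WindowStats(open_price=100.0, last_price=110.0, total_amount=2_000_000))
```

Because each rule only reads window aggregates, hundreds of them can share the output of the same aggregator.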
Architecture with Predictive Analytics
Real-time Surveillance Architecture with Predicate Engine
[Diagram] Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm); Aggregator (Storm); IDRegistry (local in-memory); Master data, raw & aggregated trade (HBase + Phoenix); Surveillance Rule Engine; Surveillance Predicate Engine (Storm); Surveillance Alerts; Financial Data Lake; Train Machine Learning Model (Spark).
Labeled flows: via CDC or gateway; local master data cache; master data lookup; IDRegistry lookup / insert, de-duplicate in window; insert trade (raw & aggregated); load ML model.
Lifecycle of Big Data Adoption in Financial Service Industry
1. Data Pooling and Processing: Connect data and create structure by merging, conditioning streams and archived data
2. Business Intelligence: Data mining and visualization software that reveals trends and useful information
3. Predictive Analytics: Automated analytics integrated into workflow that unlock data value and improve profitability
Hadoop-enabled Big Data Platform
Customers typically “Start Small, Think Big”
THANK YOU