Real-time Analytics in Financial

© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Real-time Analytics in Financial: Use Case, Architecture and Challenges
蒋 逸峰 (しょう いつほう / Yifeng Jiang), Solutions Engineer, Hortonworks, @uprush
October 26, 2016

Upload: yifeng-jiang

Post on 14-Apr-2017

TRANSCRIPT

Page 1: Real-time Analytics in Financial

Real-time Analytics in Financial: Use Case, Architecture and Challenges

蒋 逸峰 (しょう いつほう / Yifeng Jiang), Solutions Engineer, Hortonworks, @uprush, October 26, 2016

Page 2: Real-time Analytics in Financial

The YAP

Map by Google Maps

Page 3: Real-time Analytics in Financial

http://www.wondermondo.com

Page 4: Real-time Analytics in Financial

Today's Money & Financial Service

[Figure labels: money, money; Financial Service; 0110010100 0110010100]

Page 5: Real-time Analytics in Financial

Every Financial Service is a Big Data Service

→ Financial services are BIG
  – Too big to fail
→ Every financial service is eventually a big data service
  – Number of transactions
  – Number of jobs
  – Third party data

Page 6: Real-time Analytics in Financial

How Big is Big Data in Financial?

→ Millions to billions of transactions per day
  – Hundreds to tens of thousands of transactions per second
→ Big Data in banking, payment, security, etc.
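
As a rough check on these figures, the sketch below converts a daily transaction count into an average per-second rate. It assumes the load is spread evenly over a 24-hour day, which is an assumption made here and not something the slides state; real peak rates will be higher than the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # assumption: load spread evenly over 24 hours


def average_tps(transactions_per_day: int) -> float:
    """Average transactions per second for a given daily volume."""
    return transactions_per_day / SECONDS_PER_DAY


for daily in (10_000_000, 1_000_000_000):
    print(f"{daily:>13,} per day ~ {average_tps(daily):,.0f} per second on average")
```

Under that assumption, ten million transactions per day average a little over a hundred per second, and one billion per day average a little over eleven thousand per second.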

Page 7: Real-time Analytics in Financial

Big Data Use Case in Financial

http://www.forbes.com/sites/bernardmarr/2016/09/09/big-data-in-banking-how-citibank-delivers-real-business-benefits-with-their-data-first-approach/#7759859f75ed

Page 8: Real-time Analytics in Financial

Why Real-time Analytics in Financial?

Can you detect fraud from millions to billions of transactions per day in real time?

“The costs resulting from these anomalies is far easier to correct if spotted quickly – or even before it happens – through predictive modeling.”

Page 9: Real-time Analytics in Financial

Recent news that caught my attention

http://gendai.ismedia.jp/articles/-/48832

http://mainichi.jp/articles/20161012/k00/00e/040/243000c

Page 10: Real-time Analytics in Financial

https://roboteer-tokyo.com/archives/4415

Page 11: Real-time Analytics in Financial

Real-time Analytics in Financial: Use Case, Architecture & Challenges

Page 12: Real-time Analytics in Financial

A Simple Use Case – Real-time Surveillance

Detect abnormal transactions in a stock exchange

→ Trigger an alert if
  – A customer's buy/sell amount exceeds 500M JPY in 3 minutes
→ 300K transactions per second
→ Abnormal transactions must be detected within 10 s
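
The alert condition above can be sketched as a per-customer sliding window. This is a minimal single-process illustration, not the Storm topology from the talk; the class name `AmountSurveillance`, its method names, and the assumption that each customer's trades arrive in timestamp order are all introduced here for illustration.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3 * 60          # 3-minute window from the use case
THRESHOLD_JPY = 500_000_000      # 500M JPY from the use case


class AmountSurveillance:
    """Tracks each customer's traded amount over a sliding time window.

    Assumption: trades for one customer arrive in timestamp order.
    """

    def __init__(self, window_seconds=WINDOW_SECONDS, threshold=THRESHOLD_JPY):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._trades = defaultdict(deque)   # customer_id -> deque of (timestamp, amount)
        self._totals = defaultdict(int)     # customer_id -> running sum inside the window

    def on_trade(self, customer_id, timestamp, amount):
        """Record one trade; return True if the windowed total exceeds the threshold."""
        trades = self._trades[customer_id]
        trades.append((timestamp, amount))
        self._totals[customer_id] += amount
        # Evict trades that have fallen out of the window.
        while trades and trades[0][0] <= timestamp - self.window_seconds:
            _, old_amount = trades.popleft()
            self._totals[customer_id] -= old_amount
        return self._totals[customer_id] > self.threshold
```

Keeping a running total per customer means each trade costs amortized constant time, which matters at the stated 300K transactions per second.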

Page 13: Real-time Analytics in Financial

Architecture v1

[Diagram: Real-time Surveillance Architecture v1]
Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm); Aggregator (Storm); Master data, raw & aggregated trade (HBase + Phoenix); Surveillance Rule Engine; Surveillance Alerts
Flow labels: master data look up; insert trade (raw & aggregated)
Open question: how?

Page 14: Real-time Analytics in Financial

Real-time Data Ingestion

→ From transaction database
  – Change Data Capture (CDC)
  – Not practical for most financial systems
→ From gateway system
  – Receive data from gateway system
  – Send data to Kafka (as Kafka producer)

Page 15: Real-time Analytics in Financial

Architecture v1

[Diagram: Real-time Surveillance Architecture v1]
Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm); Aggregator (Storm); Master data, raw & aggregated trade (HBase + Phoenix); Surveillance Rule Engine; Surveillance Alerts
Flow labels: via CDC or gateway; master data look up; insert trade (raw & aggregated)
Open question: overhead?

Page 16: Real-time Analytics in Financial

Real-time Data Lookup

→ Low latency data store
  – Master data
  – NoSQL database: HBase (+ Phoenix), Redis
→ Use local cache
  – LRU cache in Storm bolts
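
A local LRU cache in front of the master data store can be sketched as follows. This is a plain in-process illustration; the `LRUCache` class and the `loader` callback (which stands in for a query against HBase or Redis) are assumptions made for this example, not code from the talk.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache that loads missing keys on demand."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader                 # called on a miss, e.g. a data store query
        self._entries = OrderedDict()

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)   # mark as most recently used
            return self._entries[key]
        value = self.loader(key)             # cache miss: go to the data store
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value
```

One design point to keep in mind: cached master data can go stale, so a real deployment would likely also need an expiry or invalidation policy, which this sketch omits.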

Page 17: Real-time Analytics in Financial

Architecture v2 – With Cache

[Diagram: Real-time Surveillance Architecture v2]
Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm); Aggregator (Storm); Master data, raw & aggregated trade (HBase + Phoenix); Surveillance Rule Engine; Surveillance Alerts
Flow labels: via CDC or gateway; local master data cache; master data look up; insert trade (raw & aggregated)
Open question (marked at two points): exactly-once?

Page 18: Real-time Analytics in Financial

Exactly-Once?

Message delivery semantics

→ At-most-once: may lose data, but no duplication
→ At-least-once: no data loss, but may duplicate
→ Exactly-once: no data loss, no duplication

Page 19: Real-time Analytics in Financial

Exactly-Once!

NO real exactly-once message delivery in a distributed system

→ There is no such thing as exactly-once delivery
→ Exactly-once is an end-to-end requirement

But… people like exactly-once, especially in financial service systems

Page 20: Real-time Analytics in Financial

Effectively-once

Exactly-once semantics (a better phrase is "effectively-once") with at-least-once + idempotent operations

→ Kafka & Storm guarantee at-least-once
→ De-duplicate by ensuring idempotency in your application
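
To illustrate why idempotency matters under at-least-once delivery, the sketch below replays a duplicated message through an idempotent write and a non-idempotent one. The message fields (`trade_id`, `customer_id`, `amount`) and both function names are hypothetical and chosen only for this example.

```python
def apply_idempotent(store: dict, message: dict) -> None:
    """Upsert keyed by trade_id: replaying the same message leaves the store unchanged."""
    store[message["trade_id"]] = message["amount"]


def apply_non_idempotent(totals: dict, message: dict) -> None:
    """Blind increment: a replayed message is counted twice."""
    customer = message["customer_id"]
    totals[customer] = totals.get(customer, 0) + message["amount"]


# At-least-once delivery: trade t1 is delivered twice (for example after a retry).
delivered = [
    {"trade_id": "t1", "customer_id": "c1", "amount": 100},
    {"trade_id": "t1", "customer_id": "c1", "amount": 100},
    {"trade_id": "t2", "customer_id": "c1", "amount": 50},
]

store, totals = {}, {}
for msg in delivered:
    apply_idempotent(store, msg)
    apply_non_idempotent(totals, msg)

print(sum(store.values()))   # total derived from the idempotent store
print(totals["c1"])          # total from blind increments, inflated by the duplicate
```

The idempotent path ends with the correct total because the duplicate overwrites the same key, while the blind increment counts trade t1 twice.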

Page 21: Real-time Analytics in Financial

De-duplication in window computation

Most window computations can be made idempotent

→ Examples: aggregation, counting, etc.
→ De-duplicate messages in the window
  – Using a local in-memory state store, e.g. a Set class

[Diagram labels: Trading Events in Kafka; Aggregator (Storm); IDRegistry (local in-memory); Aggregated Trade Data]
1. Pull data in 3m window
2. Lookup trade_id
3. Count de-duplicated events
4. Insert trade_id
5. Output aggregated data
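
The five steps above can be sketched for a single window as follows. This is a plain Python illustration, not a Storm bolt; the function name and the event fields (`trade_id`, `customer_id`, `amount`) are assumptions, and a Python `set` plays the role of the local in-memory IDRegistry.

```python
def aggregate_window(events):
    """Sum amounts per customer for one window, counting each trade_id once."""
    seen_ids = set()      # local in-memory ID registry for this window
    totals = {}
    for event in events:                      # 1. events pulled for the window
        if event["trade_id"] in seen_ids:     # 2. look up trade_id
            continue                          #    duplicate delivery: skip it
        customer = event["customer_id"]
        totals[customer] = totals.get(customer, 0) + event["amount"]  # 3. aggregate
        seen_ids.add(event["trade_id"])       # 4. insert trade_id
    return totals                             # 5. output aggregated data
```

Because the registry only needs to cover one window, it can be discarded when the window closes, which keeps its memory footprint bounded.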

Page 22: Real-time Analytics in Financial

De-duplication in non-idempotent computation

→ Exactly-once in non-idempotent computation
  – Example: join continuous data streams
  – Global state store required: HBase, Redis
  – Batching can help reduce the number of ID lookups
→ Exactly-once is expensive; avoid it if at all possible

[Diagram labels: Click Logs in Kafka; Joiner (Storm); IDRegistry (external NoSQL); Query Logs; Joined Click Logs]
1. Pull data continuously
2. Lookup click_id
3. Lookup query
4. Insert click_id
5. Output joined click
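
The join with a global ID registry, including the batching idea, can be sketched as below. The `IdRegistry` class is an in-memory stand-in for an external store such as HBase or Redis, and its methods, the `round_trips` counter, and the record fields are all hypothetical names introduced for this example.

```python
class IdRegistry:
    """Stand-in for an external store such as HBase or Redis (in-memory here)."""

    def __init__(self):
        self._ids = set()
        self.round_trips = 0          # counts simulated network calls

    def find_existing(self, ids):
        self.round_trips += 1
        return self._ids.intersection(ids)

    def add_all(self, ids):
        self.round_trips += 1
        self._ids.update(ids)


def join_batch(clicks, queries, registry):
    """Join a batch of clicks to their queries, skipping already-processed click_ids."""
    # One batched lookup for the whole batch instead of one lookup per click.
    already_done = registry.find_existing({c["click_id"] for c in clicks})
    joined, new_ids = [], set()
    for click in clicks:
        click_id = click["click_id"]
        if click_id in already_done or click_id in new_ids:
            continue                                   # duplicate delivery
        query = queries.get(click["query_id"])         # look up the matching query
        if query is None:
            continue                                   # no query yet; a real job would retry
        joined.append({**click, "query": query})
        new_ids.add(click_id)
    registry.add_all(new_ids)                          # one batched insert
    return joined
```

The sketch ignores what happens if the process fails between emitting the joined records and inserting their IDs; closing that gap is part of what makes exactly-once expensive.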

Page 23: Real-time Analytics in Financial

Architecture v3 – effectively-once

[Diagram: Real-time Surveillance Architecture v3]
Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm); Aggregator (Storm); IDRegistry (local in-memory); Master data, raw & aggregated trade (HBase + Phoenix); Surveillance Rule Engine; Surveillance Alerts
Flow labels: via CDC or gateway; local master data cache; master data look up; IDRegistry look up / insert, de-duplicate in window; insert trade (raw & aggregated)
Open question: order?

Page 24: Real-time Analytics in Financial

Late Messages

http://www.slideshare.net/HadoopSummit/apache-beam-a-unified-model-for-batch-and-stream-processing-data

Page 25: Real-time Analytics in Financial

Handling Late Messages

→ Expect late messages
  – A streaming application needs to handle out-of-order events, e.g., emit late messages to a special Kafka topic
→ Use source-generated timestamps
→ Storm's late message support in window computation (BaseWindowedBolt)
  – withTimestampField(String fieldName)
  – withLag(Duration duration)
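
The idea of tolerating a bounded amount of lateness, based on a source-generated timestamp, can be sketched without any framework. This plain Python function does not use Storm and only loosely mirrors the intent behind a timestamp field plus a maximum lag; the function name and the `event_time` field are assumptions made for this example.

```python
def split_late_events(events, lag_seconds):
    """Split events into on-time and late, based on the source-generated timestamp."""
    on_time, late = [], []
    max_seen = float("-inf")                  # highest event timestamp observed so far
    for event in events:                      # events in arrival order
        ts = event["event_time"]
        max_seen = max(max_seen, ts)
        if ts < max_seen - lag_seconds:
            late.append(event)                # would go to a dedicated "late" topic
        else:
            on_time.append(event)             # in order, or out of order within the lag
    return on_time, late
```

Events that arrive out of order but within the allowed lag are still treated as on time; only those older than the lag are set aside, so they can be audited or reprocessed later.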

Page 26: Real-time Analytics in Financial

Can I trust the data?

Duplications!
Out-of-order, late messages!
Data loss?

Page 27: Real-time Analytics in Financial

Monitor Data Processing Pipeline Quality

Approaches to monitor data pipeline quality

→ Audit completeness
→ Output duplicated and late messages to logs for auditing.
→ Define a service level objective (SLO) for data quality and monitor the SLO.
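
A completeness audit can be as simple as reconciling the IDs that entered the pipeline with the IDs that left it. The sketch below is one possible shape for such a check; the function name, its arguments, and the report keys are hypothetical and not taken from the talk.

```python
def audit_pipeline(source_ids, output_ids, late_ids, duplicate_count):
    """Compare what entered the pipeline with what came out."""
    source, output, late = set(source_ids), set(output_ids), set(late_ids)
    missing = source - output - late          # neither processed nor logged as late
    unexpected = output - source              # in the output but never ingested
    return {
        "ingested": len(source),
        "processed": len(output),
        "late": len(late),
        "duplicates": duplicate_count,
        "missing": sorted(missing),
        "unexpected": sorted(unexpected),
        "complete": not missing and not unexpected,
    }
```

Any ID reported as missing points at possible data loss, which is the case the duplicate and late-message logs alone cannot reveal.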

Page 28: Real-time Analytics in Financial

Define Data Processing Pipeline SLO

Design a practical SLO for the pipeline

→ Process 99.9999% of events within a few seconds
→ and 100% of events within a few hours

→ At-most-once semantics at any point in time
→ Near exactly-once semantics in near real-time
→ And exactly-once semantics eventually
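
The latency part of such an SLO can be checked mechanically. In the sketch below, the concrete limits (5 seconds and 3 hours) are placeholder assumptions standing in for "a few seconds" and "a few hours", and the function name and report keys are hypothetical.

```python
def slo_report(latencies, fast_limit=5.0, slow_limit=3 * 3600.0, fast_target=0.999999):
    """Check end-to-end latencies (in seconds) against a two-tier SLO.

    Assumed limits: fast_limit and slow_limit are placeholders, not values from the talk.
    """
    if not latencies:
        raise ValueError("no events to evaluate")
    fast_fraction = sum(1 for s in latencies if s <= fast_limit) / len(latencies)
    all_within_slow = all(s <= slow_limit for s in latencies)
    return {
        "fast_fraction": fast_fraction,
        "all_within_slow_limit": all_within_slow,
        "slo_met": fast_fraction >= fast_target and all_within_slow,
    }
```

Note that this only covers events that eventually completed; events that never came out of the pipeline have no latency to measure and need a separate completeness check.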

Page 29: Real-time Analytics in Financial

The Rule Engine & The Architecture

Hundreds of rules

→ A stock's trading price jumps up/down > k% and total amount > m% in K minutes
→ Single ATM cash withdrawal > k% and number of ATMs > m in K minutes

Many of these rules fit into this simple architecture!
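
Rules of this shape can be expressed as parameterised predicates over per-window metrics. The sketch below is one reading of the two example rules; the metric field names (`price_change_pct`, `amount_pct`, `withdrawal_pct`, `atm_count`) are hypothetical, and the window length K is assumed to be applied upstream when the metrics are aggregated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]   # aggregated values for one K-minute window


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Metrics], bool]


def price_jump_rule(k_percent: float, m_percent: float) -> Rule:
    """Price moved up or down by more than k% and amount exceeded m%."""
    return Rule(
        name="price_jump",
        condition=lambda w: abs(w["price_change_pct"]) > k_percent
        and w["amount_pct"] > m_percent,
    )


def atm_withdrawal_rule(k_percent: float, m_count: int) -> Rule:
    """Withdrawal exceeded k% and more than m ATMs were used."""
    return Rule(
        name="atm_withdrawal",
        condition=lambda w: w["withdrawal_pct"] > k_percent and w["atm_count"] > m_count,
    )


def evaluate(rules: List[Rule], window: Metrics) -> List[str]:
    """Return the names of all rules that fire for one window of metrics."""
    return [rule.name for rule in rules if rule.condition(window)]
```

Treating rules as data makes it straightforward to hold hundreds of them in one list and evaluate each aggregated window against all of them.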

[Figure labels: Rule Engine; ✓ ✗ ? ? ? ✓]

Rule base only?

Page 30: Real-time Analytics in Financial

Architecture -- with Predictive Analytics

[Diagram: Real-time Surveillance Architecture with Predicate Engine]
Components: Trading Data (real-time); Message Bus (Kafka); Enricher (Storm); Aggregator (Storm); IDRegistry (local in-memory); Master data, raw & aggregated trade (HBase + Phoenix); Surveillance Rule Engine; Surveillance Predicate Engine (Storm); Surveillance Alerts; Financial Data Lake; Train Machine Learning Model (Spark)
Flow labels: via CDC or gateway; local master data cache; master data look up; IDRegistry look up / insert, de-duplicate in window; insert trade (raw & aggregated); load ML model

Page 31: Real-time Analytics in Financial

Lifecycle of Big Data Adoption in Financial Service Industry

1. Data Pooling and Processing: Connect data and create structure by merging, conditioning streams and archived data
2. Business Intelligence: Data mining and visualization software that reveals trends and useful information
3. Predictive Analytics: Automated analytics integrated into workflow that unlock data value and improve profitability

Hadoop-enabled Big Data Platform

Customers typically “Start Small, Think Big”

Page 32: Real-time Analytics in Financial

THANK YOU