TRANSCRIPT
In-Memory Computing Brings Operational Intelligence to Business Challenges
DR. WILLIAM L. BAIN
SCALEOUT SOFTWARE, INC.
About the Speaker
• Dr. William Bain, Founder & CEO of ScaleOut Software:
  • Email: [email protected]
  • Ph.D. in Electrical Engineering (Rice University, 1978)
  • Career focused on parallel computing – Bell Labs, Intel, Microsoft
  • 3 prior start-ups, last acquired by Microsoft and product now ships as Network Load Balancing in Windows Server
• ScaleOut Software develops and markets In-Memory Data Grids, software for:
  • Scaling application performance with in-memory data storage
  • Analyzing live data in real time with in-memory computing
• Thirteen+ years in the market; 450+ customers, 12,000+ servers
In-Memory Computing Summit Europe 2018
Agenda
The evolution of in-memory computing for operational intelligence:
• The foundation: in-memory data grids (IMDGs)
• The challenges: using IMDGs for caching with parallel query
• Data-parallel computing: delivering operational intelligence
  • Examples in financial services
• The next step: data-parallel computing with method invocations
  • Examples in financial services and cable media
• Evolution into stream-processing: the digital twin model
  • Examples in ecommerce, logistics, IoT, medical device tracking, and more
• Combining stream-processing and data-parallel computing with an IMDG
In-Memory Data Grid (IMDG)
IMDGs provide fast, scalable, distributed in-memory data storage.
What is an IMDG?
• IMDG stores live, object-oriented data:
  • Uses a key/value storage model for large object collections.
  • Maps objects to a cluster of commodity servers with location transparency.
  • Has predictably fast (<1 msec.) data access and updates.
• Designed for transparent scaling and high availability
[Figure: logical view and physical view of an IMDG; objects are located by object key.]
Basic “CRUD” APIs:
• Create(key, obj, tout)
• Read(key)
• Update(key, obj)
• Delete(key)
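The CRUD calls listed on this slide can be pictured with a single-process stand-in. This is a minimal sketch, not the ScaleOut API: the class name `ToyGrid`, the reading of `tout` as a time-to-live in seconds, and the hash-based placement are all assumptions made for illustration.

```python
import time
import zlib


class ToyGrid:
    """Single-process stand-in for an IMDG: each 'server' is a dict."""

    def __init__(self, num_servers=4):
        self.servers = [dict() for _ in range(num_servers)]

    def _home(self, key):
        # Location transparency: the caller never chooses a server.
        return self.servers[zlib.crc32(key.encode()) % len(self.servers)]

    def create(self, key, obj, tout=None):
        # Assumption: tout is a time-to-live in seconds (None = no expiry).
        expires = None if tout is None else time.monotonic() + tout
        self._home(key)[key] = (obj, expires)

    def read(self, key):
        entry = self._home(key).get(key)
        if entry is None:
            return None
        obj, expires = entry
        if expires is not None and time.monotonic() >= expires:
            del self._home(key)[key]  # lazily drop expired objects
            return None
        return obj

    def update(self, key, obj):
        home = self._home(key)
        if key not in home:
            raise KeyError(key)
        _, expires = home[key]
        home[key] = (obj, expires)

    def delete(self, key):
        self._home(key).pop(key, None)
```

Each call touches exactly one server, which is why CRUD cost stays O(1) as servers are added.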
Wide Range of Applications for IMDGs
Vertical: Selected Use Cases
• Financial & Insurance: Online banking, Brokerage, Loan apps, Trading, Data services, Portfolio analysis, Risk management, Position updating
• Ecommerce: Shopping carts, User state, Reward programs, Instant offers, Product catalogs, Sale spikes
• Online Services: Patient records, SaaS, User preferences, Service processing, Online education, Internet provisioning, Legal analysis
• Entertainment & Communication: Online game state, Streaming media, Online bill pay, Services catalog, Gambling analysis
• Travel & Transportation: Ticketing system, Reservation system, Reservation analysis
Using IMDG as a Cache: Parallel Query
IMDGs are typically used as a distributed cache.
• Users often have a database mindset and rely on query.
• Query retrieves a set of objects with selected properties and/or tags.
• Uses all grid servers to access queried data.
• Challenges:
  • Cost = O(N) for N servers (vs. O(1) for CRUD)
  • Can create excessive network traffic
• Intermediate solution: filter methods:
  • Run Boolean method on objects to refine search.
  • Example (C#): (select stocks where region = NW).Filter(EvalPriceChanges());
  • Reduces number of objects returned to client.
  • Provides a bridge to data-parallel computing.
[Figure: Queries the IMDG in parallel. Merges the keys into a list for the client.]
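To make the query-plus-filter idea concrete, here is a small sketch in which threads stand in for grid servers. The function `parallel_query`, the sample stock data, and the `big_move` predicate (a stand-in for the slide's `EvalPriceChanges`) are hypothetical and do not reflect any vendor API.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_query(partitions, predicate, filter_fn=None):
    """Scan every partition in parallel and merge the matching keys."""

    def scan(partition):
        return [
            key
            for key, obj in partition.items()
            if predicate(obj) and (filter_fn is None or filter_fn(obj))
        ]

    # Every partition is scanned, so the cost grows with the server count.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        per_server = list(pool.map(scan, partitions))
    return sorted(key for keys in per_server for key in keys)


# Hypothetical data: two 'servers' holding stock objects.
partitions = [
    {"AAA": {"region": "NW", "change": 0.07},
     "BBB": {"region": "SE", "change": 0.09}},
    {"CCC": {"region": "NW", "change": 0.01}},
]


def in_nw(stock):
    return stock["region"] == "NW"


def big_move(stock):
    # Hypothetical Boolean filter method run next to the data.
    return abs(stock["change"]) > 0.05
```

With only the query, both NW stocks come back to the client; adding the filter runs the Boolean method on the servers and returns just `AAA`.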
The Next Step: Operational Intelligence (OI)
Goal: Provide immediate (sub-second) feedback to a system handling live data.
• An IMDG hosts live data and can introspect on that data in real time.
• This delivers much greater value than just using the grid as a cache.
• A few example use cases requiring immediate feedback within a live system:
  • Ecommerce: personalized, real-time recommendations
  • Healthcare: patient monitoring, predictive treatment
  • Equity trading: minimize risk during a trading day
  • Reservations systems: identify issues, reroute, etc.
  • Credit cards & wire transfers: detect fraud in real time
  • IoT, smart grids: predictive analytics & optimization
[Figure label: In-Memory Compute Engine]
Operational vs Business Intelligence
Operational Intelligence:
• Real-time
• Live data sets
• Gigabytes to terabytes
• In-memory storage
• Sub-seconds to seconds
• Best uses:
  • Tracking live data
  • Immediately identifying trends and capturing opportunities
  • Providing immediate feedback
Business Intelligence:
• Batch
• Static data sets
• Petabytes to exabytes
• Disk-based storage
• Minutes to hours
• Best uses:
  • Analyzing warehoused data
  • Mining for long term trends
[Figure: Big Data Analytics; Live Systems (OI), Data Center (BI)]
Value of Operational Intelligence
In-Memory Computing Delivers OI in Three Ways:
1. Captures live data with extremely low latency.
2. Continuously analyzes a live system to identify opportunities.
3. Makes automated decisions before the moment is lost.
[Figure: two value-over-time timelines, one for BI and one for OI. Labels: business event, data captured/updated, intelligence delivered, action taken, opportunity expires, potential value, opportunity cost, value captured with BI, value captured with OI, lost opportunity.]
Data-Parallel Computing for OI
IMDG can have simple, fast APIs for scalable, data-parallel computing:
• “Parallel Method Invocation” (PMI)
  • Follows IMDG’s object-oriented storage model.
  • Defines data-parallel tasks as class methods.
  • Runs class methods in parallel across cluster and performs distributed merge of results.
• Advantages:
  • Uses standard, well understood “eval/merge” paradigm from parallel supercomputing.
  • Takes advantage of cluster’s servers and cores.
  • Moves the code to the data; avoids delays due to data motion.
  • Can be used to build more complex data-parallel operators (e.g., MapReduce)
[Figure: Analyze Data (Eval), then Combine Results (Merge)]
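The eval/merge pattern behind PMI can be sketched in a few lines, with threads standing in for grid servers. The function name `parallel_method_invocation`, the sample histories, and the buy-and-hold metric are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def parallel_method_invocation(partitions, eval_fn, merge_fn):
    """Run eval_fn on each object where it is stored, then merge the results."""

    def eval_partition(objects):
        # Eval plus local merge: one partial result per 'server'.
        results = [eval_fn(obj) for obj in objects]
        return reduce(merge_fn, results) if results else None

    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = [p for p in pool.map(eval_partition, partitions)
                    if p is not None]
    return reduce(merge_fn, partials) if partials else None


# Hypothetical stock histories spread over two 'servers'.
histories = [
    [{"symbol": "AAA", "prices": [10, 12, 11]},
     {"symbol": "BBB", "prices": [5, 5, 6]}],
    [{"symbol": "CCC", "prices": [20, 30, 25]}],
]


def buy_and_hold_return(history):
    # Eval method: (return, symbol) so that max() picks the best performer.
    prices = history["prices"]
    return (prices[-1] / prices[0] - 1, history["symbol"])


best = parallel_method_invocation(histories, buy_and_hold_return, max)
```

Only the small per-server results move between servers; the stock histories themselves never leave the place where they are stored.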
Example of PMI in Financial Services
Back-testing stock trading strategies on stock histories:
• A widely used application - “embarrassingly parallel”
• Hosted an IMDG in Amazon EC2 using 75 servers holding 1 TB of stock history data in memory
• IMDG handled a continuous stream of updates (1.1 GB/s)
• Results: analyzed 1 TB in 4.1 seconds (250 GB/s).
• Observed near-linear scaling as data set and update rate grew.
The Benefit of Computing in the IMDG
• Using IMDG as a cache causes data motion for every operation.
• Network access creates a bottleneck that limits throughput and increases latency.
• Avoiding data motion enables linearly scalable throughput for growing workloads => predictable, low latency.
Data-Parallel Execution Steps
• Eval phase: each server queries local objects and runs eval and merge methods:
  • Accessing local objects avoids data motion.
  • Completes with one result object per server.
• Merge phase: all servers perform binary, distributed merge to create final result:
  • Merge runs in parallel to minimize completion time.
  • Returns final result object to client.
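The merge phase can be illustrated by a sequential simulation of pairwise merging. In a real grid the pairs within a round would merge concurrently on different servers; the function `binary_merge` and its round counter are illustrative assumptions, and the input is assumed to hold at least one per-server result.

```python
def binary_merge(partials, merge_fn):
    """Combine per-server results pairwise; returns (result, rounds)."""
    results = list(partials)
    rounds = 0
    while len(results) > 1:
        merged = [
            merge_fn(results[i], results[i + 1])
            for i in range(0, len(results) - 1, 2)
        ]
        if len(results) % 2:  # the odd one out waits for the next round
            merged.append(results[-1])
        results = merged
        rounds += 1
    return results[0], rounds
```

Because each round halves the number of outstanding results, 75 per-server results need only 7 rounds rather than 74 sequential merges.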
Example in Financial Services
• Goal: track market price fluctuations for a hedge fund and keep portfolios in balance across market sectors.
• Solution:
  • Keep portfolios of stocks (long and short positions) in an object collection within IMDG.
  • Collect market price changes in one-second snapshots.
  • Define a method which applies a snapshot to each portfolio and optionally generates an alert to rebalance.
  • Perform periodic (1/sec) parallel method invocations on the collection of portfolios.
  • Combine alerts in parallel using a second user-defined merge method.
  • Report alerts to UI every second for fund manager.
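A sketch of the two user-defined methods described in this solution follows. The actual analysis is proprietary, so the sector-weight rule, the `Portfolio` class, and the sample prices are purely hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Portfolio:
    name: str
    positions: dict  # symbol -> (shares, sector); negative shares = short
    max_sector_weight: float = 0.5  # hypothetical balance rule

    def apply_snapshot(self, prices):
        """Eval method: revalue positions; return zero or one alert in a list."""
        exposure = {}
        for symbol, (shares, sector) in self.positions.items():
            value = abs(shares) * prices[symbol]
            exposure[sector] = exposure.get(sector, 0.0) + value
        total = sum(exposure.values())
        over = sorted(
            sector for sector, value in exposure.items()
            if total and value / total > self.max_sector_weight
        )
        return [(self.name, over)] if over else []


def merge_alerts(a, b):
    """Merge method: combine two alert lists."""
    return a + b


portfolios = [
    Portfolio("fund-a", {"AAA": (100, "tech"), "BBB": (-50, "energy")}),
    Portfolio("fund-b", {"AAA": (10, "tech"), "CCC": (10, "health")}),
]
snapshot = {"AAA": 30.0, "BBB": 20.0, "CCC": 30.0}  # one-second price snapshot

alerts = []
for portfolio in portfolios:
    alerts = merge_alerts(alerts, portfolio.apply_snapshot(snapshot))
```

In the grid, the loop at the end would be replaced by a parallel method invocation that runs `apply_snapshot` on every server and merges the alert lists in a distributed fashion.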
Outputs Continuous Alerts to the UI
• PMI runs every second; it completes in 350 msec. and immediately refreshes UI.
• Encapsulates proprietary analysis algorithm.
• UI alerts trader to portfolios that need rebalancing.
• UI allows trader to examine portfolio details and determine specific positions that are out of balance.
OI Example in Logistics
Goal: Match orders to inventory in real time and report issues before committing orders for perishable goods.
• Customer’s approach:
  • Use IMDG as a cache with orders and inventory changes stored as objects by SKU in separate namespaces.
  • Perform nightly reconciliation; for each SKU:
    • Query all orders by SKU.
    • Query inventory changes by SKU.
    • Run proprietary reconciliation algorithm and generate alerts.
• Problems:
  • Very poor performance (1+ hours) due to parallel queries and data motion
  • Results not available in real time
Data-Parallel Solution
Key challenge for data-parallel computing: choose the right domain
• Solution: Stored data by SKU in the IMDG and performed data-parallel reconciliation on all SKU objects.
  • Merge operation returns alerts.
• Advantages:
  • Pre-joined orders and inventory to avoid queries.
  • Avoided network bottleneck from sending objects to external compute cluster.
  • Eliminated need for a compute cluster.
  • Reduced reconciliation time to <1 minute.
  • Enabled real-time alerting.
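The effect of choosing the SKU as the domain can be sketched as follows. The customer's reconciliation algorithm is proprietary, so the shortfall rule, the `SkuRecord` class, and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SkuRecord:
    """Orders and inventory changes pre-joined under one SKU key."""
    sku: str
    on_hand: int = 0
    orders: list = field(default_factory=list)             # ordered quantities
    inventory_changes: list = field(default_factory=list)  # signed deltas

    def reconcile(self):
        """Eval method: return a shortfall alert, or None if orders are covered."""
        available = self.on_hand + sum(self.inventory_changes)
        shortfall = sum(self.orders) - available
        return (self.sku, shortfall) if shortfall > 0 else None


def reconcile_all(records):
    """Merge step: collect the alerts produced by every SKU object."""
    return [alert for alert in (r.reconcile() for r in records) if alert]
```

Because each object already holds both the orders and the inventory changes for its SKU, `reconcile` needs no queries and no data from other servers.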
Single Method Invocation (SMI)
• Another form of data-parallel computation
• Created for a financial services application performing column-oriented computations.
• Invokes user-defined method on a single, selected object:
  • Ships parameters to invoking method.
  • Enables object to be updated.
  • Efficiently returns results to invoking client.
• Benefits:
  • Avoids data motion to/from client.
  • Encapsulates application code and stages code in the grid.
  • Minimizes latency to invoke method and return results.
[Figure: Example of a Column-Oriented Computation with SMI (grid servers)]
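The SMI idea described on this slide, sending parameters to the object rather than fetching the object, can be sketched with a toy grid. The `Grid` class, its `invoke` method, and the `append_and_average` column operation are hypothetical names, not a vendor API.

```python
class Grid:
    """Toy grid: objects stay in place; methods are invoked where they live."""

    def __init__(self):
        self._objects = {}

    def put(self, key, obj):
        self._objects[key] = obj

    def invoke(self, key, method, *params):
        # Only params and the result cross the (imaginary) network.
        return method(self._objects[key], *params)


def append_and_average(column, new_value):
    """User-defined method: update the stored column, return a small result."""
    column.append(new_value)
    return sum(column) / len(column)


grid = Grid()
grid.put("price:AAA", [10.0, 20.0])
average = grid.invoke("price:AAA", append_and_average, 30.0)
```

The column may hold many values, yet the client sends one number and receives one number back.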
How an IMDG Runs Computations
• Each grid host runs a worker process which executes application-defined methods on stored objects.
• The set of worker processes is called an invocation grid (IG).
• IG usually runs language-specific runtimes (JVM, .NET).
• IMDG can ship code to the IG workers.
• Key advantages for IGs:
  • Follows object-oriented model.
  • Avoids network bottlenecks by moving computing to the data.
  • Leverages IMDG’s cores & servers.
[Figure: Invocation Grid running alongside the In-Memory Data Grid]
IMDG Executes Data-Parallel Methods
Method execution implements a batch job on an object collection:
• Client runs a single method on all objects in a collection.
• Execution runs in parallel across the grid.
• Results are merged and returned to the client.
IMDG Executes Methods for Single Objects
Method execution runs independently for each request:
• IMDG directs request to a specific object for execution with low latency.
• IMDG executes multiple methods in parallel for high throughput.
OI Example: Tracking Cable Viewers
• Cable Company’s Goals:
  • Make real-time, personalized upsell offers.
  • Immediately respond to service issues & hotspots.
  • Track aggregate behavior to identify patterns, e.g.:
    • Total instantaneous incoming event rate from set-top boxes
    • Most popular programs and # viewers by zip code
• Requirements:
  • Track events from 10M set-top boxes with 25K events/sec (2.2B/day).
  • Correlate, cleanse, and enrich events per rules (e.g. ignore fast channel switches, match channels to programs) within 5 seconds (from current 6+ hours).
  • Refresh aggregate statistics every 10 seconds.
[Image credit: © 2011 Tammy Bruce presents LiveWire]
Implementation with both SMI and PMI
• Each set-top box is represented as an object in the IMDG
  • Object holds raw & enriched event streams, viewer parameters, and statistics
• IMDG captures incoming events by updating objects
• IMDG uses both forms of data-parallel computation to:
  • immediately update box objects to generate alerts to recommendation engine using SMI, and
  • continuously collect and report global statistics using PMI across box objects
• Demonstrates the use of an IMDG for stream processing.
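The following sketch shows how one box object might cleanse and enrich events on each update, and how an aggregate could then be computed across boxes. The five-second dwell rule, the channel guide, and all names are hypothetical; the slides do not give the actual rules.

```python
from collections import Counter

CHANNEL_TO_PROGRAM = {7: "News at Six", 9: "Cooking Hour"}  # hypothetical guide
MIN_DWELL_SECONDS = 5                                       # hypothetical rule


class SetTopBox:
    def __init__(self, zip_code):
        self.zip_code = zip_code
        self.raw_events = []  # (timestamp, channel)
        self.enriched = []    # (timestamp, program)

    def handle_event(self, timestamp, channel):
        """SMI-style update: store the raw event, then cleanse and enrich."""
        self.raw_events.append((timestamp, channel))
        # Cleanse: a channel left within MIN_DWELL_SECONDS was a fast switch.
        if self.enriched and timestamp - self.enriched[-1][0] < MIN_DWELL_SECONDS:
            self.enriched.pop()
        # Enrich: match the channel to a program.
        program = CHANNEL_TO_PROGRAM.get(channel, "unknown")
        self.enriched.append((timestamp, program))

    def current_program(self):
        return self.enriched[-1][1] if self.enriched else None


def viewers_by_zip(boxes):
    """PMI-style aggregate: count viewers per (zip code, program)."""
    counts = Counter()
    for box in boxes:
        program = box.current_program()
        if program is not None:
            counts[(box.zip_code, program)] += 1
    return counts
```

In a grid deployment, `handle_event` would run on the server that owns the box object, while `viewers_by_zip` would be split into per-server counts that are merged.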
Synthetic Workload Demonstration
Built a POC to demonstrate performance and scalability for cable vendor:
• Based on a simulated workload for San Diego metropolitan area
• Continuously correlated and cleansed telemetry from 10M simulated set-top boxes (using a synthetic load generator)
• Processed more than 30K events/second
• Enriched events with program information every second
• Tracked aggregate statistics (e.g., top 10 programs by zip code) every 10 seconds
[Figure: Real-Time Dashboard]
IMDG Processes Events with ReactiveX
ReactiveX reduces latency for each request compared to SMI:
• IMDG directs events to a specific object for handling by ReactiveX observers.
• IMDG handles multiple events in parallel for high throughput.
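The routing of events to per-object observers can be pictured with a plain observer pattern. This sketch deliberately does not use the ReactiveX library; `EventStream`, `Grid`, `stream_for`, and `post` are hypothetical names chosen for illustration.

```python
class EventStream:
    """Tiny push-based stream in the spirit of an observable (not ReactiveX)."""

    def __init__(self):
        self._observers = []

    def subscribe(self, on_next):
        self._observers.append(on_next)

    def emit(self, event):
        for on_next in self._observers:
            on_next(event)


class Grid:
    """Routes each posted event to the stream owned by the addressed object."""

    def __init__(self):
        self._streams = {}

    def stream_for(self, key):
        if key not in self._streams:
            self._streams[key] = EventStream()
        return self._streams[key]

    def post(self, key, event):
        self.stream_for(key).emit(event)


grid = Grid()
seen = []
grid.stream_for("box-1").subscribe(seen.append)
grid.post("box-1", "channel=7")
grid.post("box-2", "channel=9")  # no observer on box-2, so nothing is recorded
```

Observers are registered once and then receive every event pushed to their object, instead of a method being looked up and invoked per request.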
Stream Processing for Fast Queries
Challenge: How to query stock histories that are being continuously updated from a ticker feed?
• Requirements:
  • Must hold all prices from today’s and yesterday’s ticker feed.
  • Must support 1000s of simultaneous queries.
  • Each query must see the latest price updates.
  • Queries may have hot spots due to popular stocks.
• Current solution:
  • Replicate all stock price data across 12+ databases for simultaneous access.
  • Use a compute cluster.
  • Not clear how to keep databases coherent; expensive
Solution Using an IMDG
• Store each stock’s price history as a pair of objects (today’s and yesterday’s prices).
• Apply updates to today’s stock prices as streaming events.
• Query stock prices with fast key/value reads.
• Client caches host latest values for objects; using 2+ objects per stock minimizes data motion.
• Implement a special read mode that always reads cache and asynchronously applies updates.
• Advantages:
  • All queries have fast, predictable latency.
  • Hot spots do not affect latency.
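A rough sketch of why splitting each stock into two objects reduces data motion: yesterday's object never changes, so a caching client never has to fetch it again. The version counter used here is a synchronous stand-in and an assumption; it does not model the asynchronous read mode mentioned above, and all class names are hypothetical.

```python
class PriceGrid:
    def __init__(self):
        self.objects = {}   # (symbol, day) -> list of prices
        self.versions = {}  # (symbol, day) -> update counter

    def append_tick(self, symbol, price):
        """Streaming update: only today's object changes."""
        key = (symbol, "today")
        self.objects.setdefault(key, []).append(price)
        self.versions[key] = self.versions.get(key, 0) + 1


class CachingClient:
    def __init__(self, grid):
        self.grid = grid
        self.cache = {}   # key -> (version, prices)
        self.fetches = 0  # how many objects crossed the 'network'

    def read(self, symbol, day):
        key = (symbol, day)
        version = self.grid.versions.get(key, 0)
        cached = self.cache.get(key)
        if cached is None or cached[0] != version:
            self.fetches += 1  # only new or changed objects are transferred
            cached = (version, list(self.grid.objects.get(key, [])))
            self.cache[key] = cached
        return cached[1]
```

After a tick arrives, re-reading both objects for a stock transfers only today's object; yesterday's is served from the client cache.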
Stateful Stream-Processing on an IMDG
• IMDG is well suited to use the “digital twin” model created by Michael Grieves; popularized by Gartner.
• This model represents each data source with a grid object that holds:
  • An event collection
  • State information about the data source
  • Logic for analyzing events, updating state, and generating alerts
• Benefits:
  • Offers a structured approach to stateful stream processing.
  • Automatically correlates incoming events by data source.
  • Integrates all relevant context (events & state).
  • Enables easy deployment of application-specific logic (e.g., ML, rules engine, etc.) for analysis and alerting.
  • Provides domain for aggregate analysis and feedback.
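The three parts of a digital twin object listed above (event collection, state, and logic) map naturally onto a small class. This is a minimal sketch; the `DigitalTwin` class, the threshold rule, and the `route` helper are hypothetical and not taken from any product.

```python
class DigitalTwin:
    """One grid object per data source: events, state, and analysis logic."""

    def __init__(self, source_id, limit):
        self.source_id = source_id
        self.events = []                            # event collection
        self.state = {"limit": limit, "alerts": 0}  # state about the source

    def handle_event(self, reading):
        """Logic: analyze the event, update state, optionally return an alert."""
        self.events.append(reading)
        if reading > self.state["limit"]:
            self.state["alerts"] += 1
            return f"{self.source_id}: reading {reading} exceeds {self.state['limit']}"
        return None


twins = {}


def route(source_id, reading, limit=100):
    """Correlate by data source: every event reaches its own twin."""
    if source_id not in twins:
        twins[source_id] = DigitalTwin(source_id, limit)
    return twins[source_id].handle_event(reading)
```

Because the event handler runs inside the object that already holds the history and state, no separate lookup is needed to gather context for each event.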
Example: Tracking a Fleet of Vehicles
• Goal: Track telemetry from a fleet of cars or trucks.
  • Events indicate speed, position, and other parameters.
  • Digital twin object stores information about vehicle, driver, and destination.
  • Event handler alerts on exceptional conditions (speeding, lost vehicle).
• Periodic data-parallel analytics determines aggregate fleet performance:
  • Computes overall fuel efficiency, driver performance, vehicle availability, etc.
  • Can provide feedback to drivers to optimize operations.
[Figure label: Data-parallel analysis]
Example: Heart-Rate Watch Monitoring
Tracks heart-rate for a large population of runners:
• Heart-rate events flow from smart watches to their respective digital twin objects for analysis.
• The analysis uses wearer’s history, activity, and aggregate statistics to determine feedback and alerts.
Data Parallel Analysis Across all Digital Twins
• Uses IMDG’s in-memory compute engine to create aggregate statistics in real time.
• Results can be reported to analysts and updated every few seconds.
• Results can be used as feedback to event analysis in digital twin objects and/or reported to users.
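The feedback loop between aggregate analysis and per-twin event handling might look like the sketch below, using the heart-rate scenario. The 30% threshold, the `RunnerTwin` class, and the function names are hypothetical and carry no medical meaning.

```python
from statistics import mean


class RunnerTwin:
    def __init__(self, runner_id):
        self.runner_id = runner_id
        self.heart_rates = []
        self.population_avg = None  # feedback from the aggregate analysis

    def handle_event(self, bpm):
        """Per-event analysis that also uses the fed-back aggregate."""
        self.heart_rates.append(bpm)
        # Hypothetical rule: alert when 30% above the population average.
        if self.population_avg is not None and bpm > 1.3 * self.population_avg:
            return f"{self.runner_id}: {bpm} bpm is well above average"
        return None


def aggregate_and_feed_back(twins):
    """Data-parallel step: compute a statistic and push it to every twin."""
    latest = [t.heart_rates[-1] for t in twins if t.heart_rates]
    avg = mean(latest) if latest else None
    for twin in twins:
        twin.population_avg = avg
    return avg
```

The aggregate is refreshed periodically, while each incoming event is judged immediately against the most recent value.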
Example: Ecommerce Recommendations
Goal: Deliver real-time recommendations to 1000s of online shoppers.
• Each shopper generates a clickstream of products searched.
• Stream-processing system must:
  • Correlate clicks for each shopper and associate with shopper’s preferences.
  • Maintain a history of clicks during a shopping session.
  • Analyze clicks to create new recommendations within 100 msec.
• Analysis must:
  • Take into account the shopper’s preferences and demographics.
  • Create and use aggregate feedback on collaborative shopping behavior.
Providing Recommendations in Real Time
• Requires scalable stream-processing to analyze each click and respond in <100 ms:
  • Accept input with each event on shopper’s preferences.
  • Provide aggregate feedback on best-selling products.
Implementation Using Digital Twin Objects
Tracks shoppers as digital twins and makes real-time recommendations:
• Each DT object holds clickstream of browsed products, preferences, and demographics.
• Event handler analyzes this data and immediately updates recommendations.
• Product descriptions are kept in second object collection in the IMDG.
  • These descriptions are uploaded from the product database.
• Periodic data-parallel, batch analytics across all shoppers determine aggregate trends:
  • Examples include best selling products, average basket size, etc.
  • Used for analysis and real-time feedback
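As a sketch of how a shopper twin could combine its own clickstream with aggregate feedback, consider the following. The product catalog, the same-category rule, and every name here are hypothetical; the slides do not describe the actual recommendation logic.

```python
from collections import Counter

# Hypothetical second object collection: product -> category.
PRODUCTS = {"tent": "camping", "stove": "camping",
            "novel": "books", "atlas": "books"}


class ShopperTwin:
    def __init__(self, preferences):
        self.preferences = set(preferences)  # preferred categories
        self.clicks = []                     # clickstream for this session
        self.recommendations = []

    def handle_click(self, product, popular_products=()):
        """Event handler: record the click and refresh recommendations."""
        self.clicks.append(product)
        category = PRODUCTS[product]
        same_category = [p for p, c in PRODUCTS.items()
                         if c == category and p not in self.clicks]
        popular = [p for p in popular_products
                   if PRODUCTS[p] in self.preferences and p not in self.clicks]
        # dict.fromkeys removes duplicates while keeping order.
        self.recommendations = list(dict.fromkeys(same_category + popular))
        return self.recommendations


def most_viewed(shoppers, n=2):
    """Periodic data-parallel step: aggregate clicks across all shoppers."""
    counts = Counter(p for s in shoppers for p in s.clicks)
    return [p for p, _ in counts.most_common(n)]
```

The result of `most_viewed` would be recomputed periodically and passed into later `handle_click` calls as the aggregate feedback.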
Providing Key Aggregate Metrics
• Periodic data-parallel computation generates aggregate statistics across DT objects for all shoppers.
  • Tracks real-time shopping behavior.
  • Charts key purchasing trends.
  • Enables merchandizer to create promotions dynamically.
• Aggregate statistics can be shared with shoppers:
  • Allows shoppers to obtain collaborative feedback.
  • Examples include most viewed and best selling products.
Wrap-Up
In-memory computing enables operational intelligence.
• Challenge: using an IMDG solely as a cache does not take advantage of its ability to introspect on live data and return results in real time.
  • Users tend to view IMDGs as in-memory databases and rely heavily on queries.
• Operational intelligence can capture new opportunities that boost competitive value.
• Data-parallel computing for OI in an IMDG offers several key benefits:
  • It boosts application performance by moving code to the data, avoiding network bottlenecks.
  • It can be implemented using object-oriented constructs, which cleanly separate application code from the IMDG’s orchestration mechanisms.
  • It delivers results in real time for live data.
• Stream processing in an IMDG allows deeper introspection than previously possible:
  • IMDGs provide an excellent platform for the digital twin model, which has many applications.
www.scaleoutsoftware.com
In-Memory Computing for Operational Intelligence