towards the formal verification of data-intensive applications...
DICE Horizon 2020 Research & Innovation Action, Grant Agreement no. 644869, http://www.dice-h2020.eu
Funded by the Horizon 2020 Framework Programme of the European Union
Towards the Formal Verification of Data-Intensive Applications Through Metric Temporal Logic
ICFEM 2016, Tokyo, Nov 17th, 2016
Francesco Marconi1, Marcello M. Bersani1, Madalina Erascu2 and Matteo Rossi1
1 DEIB, Politecnico di Milano, Italy
2 Institute e-Austria Timisoara and West University of Timisoara, Timisoara, Romania
Roadmap
§ Context and Motivation
  • Data-Intensive Applications
  • Streaming DIAs
  • Quality issues
§ Our Approach
  • Formal Model
  • Decision Procedure
  • Implemented tool: D-VerT
§ Conclusions
  • Experimental Analysis
  • Future works
CONTEXT AND MOTIVATION
DICE Project
o Horizon 2020 Research & Innovation Action (RIA)
  § Quality-Aware Development for Data-Intensive applications
  § Feb 2015 - Jan 2018, 4M Euros budget
  § 9 partners (Academia & SMEs), 7 EU countries
Data-Intensive Applications (DIAs)
o Need to process data being
  § Massively large in size
  § Complex
  § Rapidly changing
o Devote most of their processing time to I/O, movement and manipulation of data.
o Rely on so-called “Big data technologies”
The Big Data Landscape
o Heterogeneous technologies
  § NoSQL, Spark, Hadoop/MapReduce, Storm, CEP, ...
o Lack of standard methodologies for development and quality analysis
o Different problems for different “kinds” of DIA
  § Batch processing, stream processing, …
o We decided to focus on a subset of DIAs
  § streaming applications
Streaming Applications
o Special case of DIAs
o Need to process an (almost) continuous flow of information
  § Stream → unbounded sequence of tuples (messages)
o Usually described by means of a topology
  § Graph of computations composed of
    • input nodes (source of data streams)
    • computational nodes → manipulate data streams
      o Calculate, Filter, Aggregate, Join, Talk to databases, etc.
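A topology of this kind can be sketched as a small directed graph. The Python below is only an illustration: the class `Topology`, its methods and the node names (`sentences`, `split`, `count`) are hypothetical and do not come from the slides or from any streaming framework.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    """Directed graph of input nodes and computational nodes (illustrative only)."""
    inputs: set = field(default_factory=set)
    computations: set = field(default_factory=set)
    # subscriptions[c] = set of nodes whose output stream node c consumes
    subscriptions: dict = field(default_factory=dict)

    def add_input(self, name):
        self.inputs.add(name)

    def add_computation(self, name, sources):
        # A computational node may only subscribe to nodes already declared.
        unknown = set(sources) - (self.inputs | self.computations)
        if unknown:
            raise ValueError(f"unknown upstream nodes: {sorted(unknown)}")
        self.computations.add(name)
        self.subscriptions[name] = set(sources)

    def downstream(self, name):
        """Nodes that receive the tuples emitted by `name`."""
        return sorted(c for c, srcs in self.subscriptions.items() if name in srcs)


# Hypothetical three-node pipeline: one source feeding two chained computations.
topo = Topology()
topo.add_input("sentences")
topo.add_computation("split", ["sentences"])
topo.add_computation("count", ["split"])
print(topo.downstream("sentences"))  # ['split']
```

Keeping the subscriptions on the consuming node mirrors the idea that each computational node declares which streams it reads.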
Quality Issues in Streaming DIAs
o Important requirements for streaming applications
  § Latency
  § Throughput
o Critical points
  § incorrect design of timing constraints
  § node failures
o might cause
  § High latency in processing tuples
  § Memory saturation
Questions
o How can we analyze and verify the presence of these kinds of quality (safety?) issues?
  § Which (application-dependent) properties could we verify?
  § Associated with which technology?
  § How can we model the system and the properties?
  § How can we automate the verification, providing “user-friendly” support to DIA designers?
State of the art
o Formal verification of distributed systems is a major research area in software engineering
o Few works try to address formal verification in the context of DIAs
  § Main focus on verifying application-independent properties related to specific frameworks
    • Reliability and load balancing of MapReduce
    • Validity of messaging flow in MapReduce
  § No modeling and verification of application-dependent properties
PROPOSED SOLUTION
Our Approach
o Focus on a specific set of technologies
  § Topology-based streaming applications
o Identify quality issues
o Select a reference technology → Apache Storm
o Devise a formal model
  § Able to capture meaningful system behavior and properties
  § Having an appropriate level of abstraction
  § Using a formalism that enables automatic verification
o Define a tool-supported mechanism for formal verification
  § Starting from a high-level application description
    • Initial version: JSON format
    • Current version: annotated UML Class diagram
Apache Storm
o Open Source Distributed Stream Processing System
o Analytics, Log Event processing, etc.
o Reliability, at-least-once semantics
o Wide adoption in production
o In Storm topologies
  § Source nodes called spouts
  § Computational nodes called bolts
Modeling choices – 1/2
o Allowing for the definition of topologies in a compositional way
  § Formalize behavior of spouts and bolts
  § Use them as building blocks for topologies
o Abstracting away
  § Deployment details
  § Message contents
  § Multi-layered message buffers
[Figure: example topologies built from spout and bolt building blocks]
Modeling choices – 2/2
o Relevant features modeled for each component
  § evolution of the states
  § timing constraints
  § evolution of its message buffer (input queue)
o Properties to verify
  § “all bolt queues have a bounded occupation level”
[Figure: component parameters shown on the slide: Parallelism, Functionality, Proc_time, Queue_threshold, avg_emit_rate, emit_amount]
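To give a feel for why an input queue may fail to stay bounded, here is a back-of-the-envelope check in Python. It is not the formal model from the slides: it assumes, purely for illustration, that each of a bolt's `parallelism` executors processes one tuple every `proc_time` time units, and it compares that capacity with an average arrival rate. The function names are hypothetical.

```python
def max_service_rate(parallelism, proc_time):
    """Tuples per time unit a bolt can process, ASSUMING each of its
    `parallelism` executors handles one tuple every `proc_time` time units."""
    return parallelism / proc_time


def queue_may_grow(arrival_rate, parallelism, proc_time):
    """True if, on average, tuples arrive faster than the bolt consumes them."""
    return arrival_rate > max_service_rate(parallelism, proc_time)


# Capacity is 4 / 0.5 = 8 tuples per time unit in both calls.
print(queue_may_grow(arrival_rate=10.0, parallelism=4, proc_time=0.5))  # True
print(queue_may_grow(arrival_rate=6.0, parallelism=4, proc_time=0.5))   # False
```

An average-rate comparison like this ignores timing nondeterminism and failures, which is the kind of behavior a formal model is meant to cover.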
Timed counter networks model
o Formal model based on CLTLoc enriched with counters describing:
  § state evolution of components
  § timing constraints
  § quantities of tuples moving throughout the topology
[Slide annotations: state evolution (→ LTL) and timing constraints are marked ⊆ CLTLoc; quantities of tuples are marked ⊈ CLTLoc → counters!]
Timed counter networks model: Verifying the property
o We formulated the property check as a satisfiability problem
  § Bounded Satisfiability Checking (BSC)
o Goal
  § Find an ultimately periodic trace violating the boundedness property
    • Having the form $\alpha (s\beta)^{\omega}$
    • $\alpha$ → prefix
    • $s\beta$ → suffix repeatable infinitely many times (loop)
o Rationale
  § If there is a growing trend in the loop → unbounded increase ad infinitum
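The rationale can be illustrated on a single counter. The Python sketch below is a simplification, not the verification procedure: it assumes the trace is given as per-step changes of one queue counter (tuples added minus tuples removed), split into a prefix and a loop, and all function names are hypothetical.

```python
from itertools import chain, cycle, islice


def unfold(prefix, loop, steps):
    """First `steps` per-step changes of the infinite trace prefix . loop^omega."""
    return list(islice(chain(prefix, cycle(loop)), steps))


def queue_sizes(prefix, loop, steps, start=0):
    """Counter value after each of the first `steps` positions."""
    sizes, size = [], start
    for change in unfold(prefix, loop, steps):
        size += change
        sizes.append(size)
    return sizes


def loop_is_growing(loop):
    """A positive net change per iteration means the counter exceeds any
    bound once the loop is repeated often enough."""
    return sum(loop) > 0


prefix, loop = [2, -1], [3, -2]
print(queue_sizes(prefix, loop, 6))  # [2, 1, 4, 2, 5, 3]
print(loop_is_growing(loop))         # True
```

Here every loop iteration adds one tuple on balance, so the values at corresponding loop positions (4, 5, ...) keep increasing.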
Decidability issues
o CLTLoc [1,2]
  § SAT is decidable and defined over timed words
  § Computed through Bounded Satisfiability Checking (BSC)
  § Implemented procedure based on SMT [3]
    • Using the Zot formal verification tool
o Decidability results cannot be extended to CLTLoc + counters
  • Contains CLTL over quantifier-free Presburger formulae [4]
o We defined a partial assessment method to guarantee the soundness of the satisfiability outcome.
1. A Tool for Deciding Continuous Time Metric Temporal Logic, Bersani, Rossi, San Pietro, 2013
2. An SMT-based approach to satisfiability checking of MITL, Bersani, Rossi, San Pietro, 2013
3. Constraint LTL Satisfiability Checking without Automata, Bersani et al., 2012
4. The effects of bounding syntactic resources on Presburger LTL, Demri, Gascon, 2006
Decision Procedure
o Given
  § CLTLoc + counters formula
  § a bound k
o Try to build a structure $\alpha s \beta s$ with $|\alpha s \beta s| = k$
  § If structure is not found (UNSAT)
    • No ultimately periodic models of length <= k exist
  § If structure is found (SAT)
    • Perform the assessment to determine its extensibility to an infinite model $\alpha (s\beta)^{\omega}$
      o If check succeeds → outcome is SAT ($\alpha s \beta$ is a counterexample)
      o If check fails → spurious result, must look for another structure
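The control flow of this procedure can be sketched as a loop around two black boxes. In the Python below, `find_structure` and `assess` are hypothetical callables standing in for the bounded satisfiability check and the assessment; they are not the API of Zot or D-VerT. Returning `"UNKNOWN"` when every candidate turns out to be spurious is an assumption of this sketch, since the slides do not say what happens in that case.

```python
def decide(find_structure, assess, k):
    """Sketch of the decision loop.

    find_structure(k, excluded) -> a candidate structure of length k that is
        not in `excluded`, or None if no such structure exists (hypothetical).
    assess(candidate) -> True if the candidate extends to an infinite model
        (hypothetical stand-in for the assessment method).
    """
    excluded = []
    while True:
        candidate = find_structure(k, excluded)
        if candidate is None:
            # No candidate at all: UNSAT within the bound k.
            # Only spurious candidates were found: inconclusive (assumption).
            return ("UNSAT", None) if not excluded else ("UNKNOWN", None)
        if assess(candidate):
            return ("SAT", candidate)   # candidate is a counterexample
        excluded.append(candidate)      # spurious, look for another one


# Toy stand-ins: three candidate structures, only "c2" passes the assessment.
CANDIDATES = ["c1", "c2", "c3"]


def toy_find(k, excluded):
    return next((c for c in CANDIDATES if c not in excluded), None)


def toy_assess(candidate):
    return candidate == "c2"


print(decide(toy_find, toy_assess, k=10))  # ('SAT', 'c2')
```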
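Assessment method
o Provides a sufficient condition for extending ad infinitum a bounded assignment of values to counters
o Intuitively, it checks whether, in the loop, the value of each variable $y$ has the same shape
  § It might differ by a non-negative offset $\Delta$
[Figure: counter values over time along the prefix α and two consecutive iterations of the loop sβ (loop 1, loop 2)]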
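One possible reading of the "same shape" intuition is sketched below in Python. It assumes the values of one counter are available for two consecutive loop iterations and checks that the second is the first shifted by a constant non-negative offset. This is an illustration of the intuition only; the exact condition used by the assessment method may differ, and the function name is hypothetical.

```python
def same_shape_offset(first_iter, second_iter):
    """Return the offset d >= 0 such that second_iter[i] == first_iter[i] + d
    for every position i, or None if no such constant offset exists."""
    if not first_iter or len(first_iter) != len(second_iter):
        return None
    offset = second_iter[0] - first_iter[0]
    if offset < 0:
        return None
    if all(b - a == offset for a, b in zip(first_iter, second_iter)):
        return offset
    return None


print(same_shape_offset([4, 2], [5, 3]))  # 1    (same shape, shifted up by 1)
print(same_shape_offset([3, 1], [3, 1]))  # 0    (identical iterations)
print(same_shape_offset([4, 2], [5, 4]))  # None (shape changes)
```

Because an offset of 0 is a valid result, callers should compare the return value with `None` rather than test its truthiness.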
D-VerT – DICE Verification Tool: Initial version (April 2016)
D-VerT – DICE Verification Tool: Current version
https://github.com/dice-project/DICE-Verification/wiki
Experimental results
o Validation through open and closed source use cases
  § Meaningful qualitative results in identifying critical points in topology design
  § Execution time strongly depends on the size of the topology and on the configurations of single components
http://dice-project.github.io/DICE-Verification/
Use case: Focused Crawler Topology, UML Design
o Typical usage example of Storm
o Fetching and indexing of media items
o From web sources
Use case: Focused Crawler Topology, Output trace
CONCLUSIONS
Wrap up
o Approach for the automated verification of topology-based data-intensive applications.
  § Definition of a formal model (TCN)
    • Extending CLTLoc metric temporal logic with discrete counters
    • Enabling automatic verification of safety properties
  § Definition of a tool-supported mechanism
    • To automatically generate formal models from a high-level application description and run verification
  § Definition of sufficient conditions for guaranteeing the soundness of the verification results
Future works
o Identification and verification of further properties
o Modeling different technologies
  § Spark, CEP, …
o New results on the correctness and completeness of the analysis of counter networks
o Tool and model improvements
Thank you
Starting formalism: Constraint LTL over clocks - CLTLoc
o Extension of LTL with TA clocks, where formulae are
  § Propositions (lightOn, lightOff, buttonOn, buttonOff)
  § Constraints over clocks (c=0, c<1, …)
  § LTL formulae
    • X(φ)
    • φ U ψ
o CLTLoc [1,2]
  § SAT is decidable and defined over timed words
  § Computed through Bounded Satisfiability Checking (BSC)
  § Implemented procedure based on SMT [3]
    • Using the Zot formal verification tool
1. A Tool for Deciding Continuous Time Metric Temporal Logic, Bersani, Rossi, San Pietro, 2013
2. An SMT-based approach to satisfiability checking of MITL, Bersani, Rossi, San Pietro, 2013
3. Constraint LTL Satisfiability Checking without Automata, Bersani et al., 2012
Example: buttonOn -> X(lightOn U buttonOff)
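To make the meaning of the example formula concrete, the Python sketch below evaluates it at one position of a finite trace, where each position is the set of propositions that hold there. Two simplifications are assumptions of this sketch: finite-trace semantics is used instead of infinite timed words, and clock constraints are ignored. The function names are hypothetical.

```python
def holds_until(trace, i, left, right):
    """left U right at position i: `right` holds at some j >= i and
    `left` holds at every position from i up to (but excluding) j."""
    for j in range(i, len(trace)):
        if right in trace[j]:
            return True
        if left not in trace[j]:
            return False
    return False  # `right` never occurs within the finite trace


def example_formula(trace, i=0):
    """buttonOn -> X(lightOn U buttonOff), evaluated at position i."""
    if "buttonOn" not in trace[i]:
        return True  # implication holds vacuously
    return i + 1 < len(trace) and holds_until(trace, i + 1, "lightOn", "buttonOff")


good = [{"buttonOn"}, {"lightOn"}, {"lightOn"}, {"buttonOff"}]
bad = [{"buttonOn"}, {"lightOn"}, set(), {"buttonOff"}]
print(example_formula(good))  # True
print(example_formula(bad))   # False: the light goes off before buttonOff
```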
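CLTLoc + counters
o $V$ is a finite set of variables over $\mathbb{N}$
o $C$ is a finite set of clock variables over $\mathbb{R}$
o $AP$ is a finite set of atomic propositions
o $\theta$ are quantifier-free Presburger (QFP) formulae over terms $\alpha := y \mid Xy$ where $y \in V$
o CLTLoc with counters formulae are defined as follows:
  $\phi := p \mid x \sim c \mid \theta \mid \phi \wedge \phi \mid \neg\phi \mid X\phi \mid Y\phi \mid \phi U \phi \mid \phi S \phi$
o where:
  § $p \in AP$, $x \in C$, $c \in \mathbb{N}$, $\sim \in \{<, =\}$
  § X, Y, U, S are the usual LTL operators.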
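As an illustration of the grammar, here is one formula that fits it. The counters $q$, $r$, $t$, the proposition $\mathit{process}$ and the clock $x$ are hypothetical names chosen for this example and do not come from the slides.

```latex
% theta: a QFP constraint over terms y | Xy.
% "The next queue size plus the tuples taken equals the current size plus the tuples added."
\theta_{ex} := (Xq + t = q + r)

% phi: built only from the operators in the grammar (conjunction, negation, x ~ c).
% "theta_ex holds, and whenever process holds the clock x is below 5."
\phi_{ex} := \theta_{ex} \wedge \neg\bigl(\mathit{process} \wedge \neg(x < 5)\bigr)
```

Writing the queue update as $Xq + t = q + r$ rather than with a subtraction keeps the constraint within addition over $\mathbb{N}$.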
Related formalisms
o Timed counter networks are mainly inspired by:
  § Vector Addition Systems with States (VASS)
    • Subclass of counter systems
    • Lossy VASS → take into account the number of messages, not their order
    • Only theoretical analysis, do not enable automatic verification
    • Timed counter networks allow timing constraints to be specified via clocks
  § Timed Petri Nets
    • Transitions firing with urgent semantics
    • Firing conditions and number of tokens consumed expressible in a quite rigid way
    • For our model we needed more flexibility
      o Possible occurrence of events
      o Express slightly more elaborate firing conditions