SERENE 2014 - 6th International Workshop on Software Engineering for Resilient Systems
http://serene.disim.univaq.it/
Session 4: Monitoring
Paper 3: Combined Error Propagation Analysis and Runtime Event Detection in Process-driven Systems
1. Quanopt Ltd.
Combined Error Propagation Analysis and Runtime Event Detection in
Process-driven Systems
Gábor Urbanics, László Gönczy, Balázs Urbán, János Hartwig, Imre Kocsis
2. Quanopt Ltd.
Motivation and our contributions
Approach
Motivational example
Design time analysis
Runtime analysis
Future work and conclusion
3. Quanopt Ltd.
Motivation
Analyse complex IT systems
o During development
o During integration
o At runtime
o Based on system models
Generate analysis for huge systems
Extendable
4. Quanopt Ltd.
Process modelling
Business process:
o Directly executed models (e.g. BPMN)
In a complex system there are many supporting resources
o We present a method for modelling business processes and supporting resources together
o Only general tools:
• Markov chains, event trees
• Too general, modelling could be hard
o Development tools
• Basic performance analysis
• Business activity monitoring
5. Quanopt Ltd.
Contributions
Multi-aspect modelling of complex (IT) systems
o Custom, general process and resource model
Qualitative error propagation analysis
o Root cause and sensitivity analysis
o Using a finite-domain constraint satisfaction problem
Runtime process monitoring
7. Quanopt Ltd.
Approach
[Figure: overview of the approach. Blocks: Process model, Resource model, Annotation model, System model, Error Propagation Analysis, Monitoring. Outputs shown: [New Monitoring Rule], [New Constraint]. Callouts: Physical and Logical; Can be imported; Failure modes; Error propagation behavior; Extra annotations for analysis.]
9. Quanopt Ltd.
Motivational example
Design time analysis capabilities
o SPOF analysis
o Process-level effects of resource faults
o Propagating process errors to the resource layer
10. Quanopt Ltd.
Case study
[Figure: case-study business process in the Business Processes Layer. Elements: Client, Form processing, Money takeover, Record transaction, Perform full check, Manual laundering check, Flag & report, Pay to $, Receipt. Decision points with Y/N branches: Large transaction?, Client checked earlier?, Laundering suspected?. A Timeout is also shown. Legend: Activity, Execution Path.]
11. Quanopt Ltd.
Process with resources
[Figure: the case-study process together with its supporting resources in three layers: Business Processes Layer, Supporting Applications Layer, Physical Resources Layer. Business process elements: Client, Form processing, Money takeover, Record transaction, Perform full check, Manual laundering check, Flag & report, Pay to $, Receipt, Timeout; decision points Large transaction?, Client checked earlier?, Laundering suspected?. Resource labels: Cashier Module, Customer & Account Identification, Compliance DB, DB, Application Server cluster, AppServ1, AppServ2, AppServ3 VM, AppServ4, DB1, DB2, Backend Server 1, Backend Server 2, Backend Server 3, Single Hypervisor, Blade Server. Legend: Activity, Resource, Dependency, Execution Path.]
12. Quanopt Ltd.
Single fault in physical layer
[Figure: the three-layer case-study model (Business Processes Layer, Supporting Applications Layer, Physical Resources Layer, including Single Hypervisor and Blade Server) annotated for use case 1. Annotations shown: Single Fault 1, Outage 1, Stuck 1. Legend: the example label "Outage 1" with the entries Resource Setup Identifier, Failure Mode, Use Case Id; Activity; Resource; Dependency; Execution Path.]
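The effect of a single physical-layer fault on the layers above it can be illustrated with a small reachability computation. This is only a sketch under stated assumptions: the dependency edges in `DEPENDS_ON` are hypothetical (the actual edges of the figure are not recoverable from the transcript), the function name `affected_by` is made up for illustration, and the presented approach uses a constraint solver rather than this plain graph traversal.

```python
from collections import deque

# Hypothetical dependency fragment: each element maps to the elements it depends on.
# These edges are illustrative assumptions, not the dependencies drawn on the slide.
DEPENDS_ON = {
    "Form processing": ["Cashier Module"],
    "Record transaction": ["DB"],
    "Cashier Module": ["AppServ3 VM"],
    "DB": ["DB1"],
    "AppServ3 VM": ["Single Hypervisor"],
    "Single Hypervisor": ["Blade Server"],
}


def affected_by(faulty):
    """Return every element that transitively depends on the faulty resource."""
    # Invert the map: resource -> elements that directly depend on it.
    dependents = {}
    for element, resources in DEPENDS_ON.items():
        for resource in resources:
            dependents.setdefault(resource, []).append(element)

    affected = set()
    queue = deque([faulty])
    while queue:
        current = queue.popleft()
        for element in dependents.get(current, []):
            if element not in affected:
                affected.add(element)
                queue.append(element)
    return affected


# Everything above the (hypothetically shared) Blade Server is impacted.
impacted = affected_by("Blade Server")
```

In this toy model a fault of the Blade Server reaches the Form processing activity, while Record transaction is unaffected because it depends on a different physical resource.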
13. Quanopt Ltd.
Effects of a single fault
[Figure: the three-layer case-study model annotated for use case 2; the physical layer shows a Virtualized HA Cluster and a Blade Server Farm. Annotations shown: Single Fault 2, Failover 2, Degraded 2, Delayed 2, Delay-incurred Cost 2. Legend: the example label "Outage 1" with the entries Resource Setup Identifier, Failure Mode, Use Case Id; Activity; Resource; Dependency; Execution Path.]
14. Quanopt Ltd.
Backwards error propagation
[Figure: the three-layer case-study model (with Virtualized HA Cluster and Blade Server Farm) annotated for use case 3. Annotations shown: SQLInjected 3, OK 3. Legend: the example label "Outage 1" with the entries Resource Setup Identifier, Failure Mode, Use Case Id; Activity; Resource; Dependency; Execution Path.]
15. Quanopt Ltd.
Motivational example
Design time analysis capabilities
o SPOF analysis
o Process-level effects of resource faults
o Propagating process errors to the resource layer
17. Quanopt Ltd.
Design time analysis
Error propagation rules
o Through the process's execution path
o Through dependencies
Translate the model to a constraint satisfaction problem (CSP)
The solution of the CSP provides the results
o Of root cause analysis
o Of sensitivity analysis
[Figure: approach overview with the labels Process model, Resource model, Annotation model, System model, Error Propagation Analysis, Monitoring]
18. Quanopt Ltd.
What is CSP?
Constraint satisfaction problem
o Problems defined mathematically
• A set of variables
• Constraints between them
A general solver can find the solution
o A single variable assignment or a list of them
o All constraints satisfied
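To make the definition concrete, here is a minimal sketch of a finite-domain CSP solved by exhaustive enumeration. The variables (`server`, `app`), their qualitative states, and the two constraints are hypothetical examples chosen for illustration; a real solver would prune the search instead of enumerating every assignment.

```python
from itertools import product


def solve(variables, constraints):
    """Enumerate all assignments over finite domains that satisfy every constraint."""
    names = list(variables)
    solutions = []
    for values in product(*(variables[name] for name in names)):
        assignment = dict(zip(names, values))
        if all(check(assignment) for check in constraints):
            solutions.append(assignment)
    return solutions


# Toy problem (hypothetical): two elements with qualitative states.
variables = {
    "server": ["ok", "outage"],
    "app": ["ok", "degraded", "outage"],
}
constraints = [
    # If the server is out, the application running on it is out as well.
    lambda a: a["server"] != "outage" or a["app"] == "outage",
    # Observation: the application is not fully working.
    lambda a: a["app"] != "ok",
]

solutions = solve(variables, constraints)
```

Each entry of `solutions` is one variable assignment in which all constraints hold, which corresponds to the "list of assignments" a general solver can return.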
19. Quanopt Ltd.
Sample mapping to CSP
[Figure: Business Processes Layer with two activities, Customer login and Form processing, connected by an execution path. Legend: Activity, Execution Path.]
(Customer_login_run)
(Form_processing_run)
20. Quanopt Ltd.
Sample mapping to CSP
[Figure: Business Processes Layer with two activities, Customer login and Form processing, connected by an execution path. Legend: Activity, Execution Path.]
(Customer_login_delay & Customer_login_run)
(Form_processing_delay)
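The two sample mappings can be read as propagation constraints along the execution path. The connective between the expressions is not preserved in the transcript, so the sketch below assumes that each pair is an implication from the upstream activity (Customer login) to the downstream one (Form processing). The helper names `implies`, `consistent`, and `solutions` are illustrative, and brute-force enumeration stands in for a real CSP solver.

```python
from itertools import product

NAMES = [
    "Customer_login_run",
    "Customer_login_delay",
    "Form_processing_run",
    "Form_processing_delay",
]


def implies(p, q):
    return (not p) or q


def consistent(s):
    # Assumed reading of the slides: each expression pair is an implication.
    return (
        implies(s["Customer_login_run"], s["Form_processing_run"])
        and implies(
            s["Customer_login_delay"] and s["Customer_login_run"],
            s["Form_processing_delay"],
        )
    )


def solutions(**fixed):
    """All Boolean assignments that satisfy the constraints and the fixed values."""
    result = []
    for values in product([False, True], repeat=len(NAMES)):
        s = dict(zip(NAMES, values))
        if consistent(s) and all(s[k] == v for k, v in fixed.items()):
            result.append(s)
    return result


# Sensitivity-style query: inject a delay into a running Customer login.
injected = solutions(Customer_login_run=True, Customer_login_delay=True)
delay_always_propagates = all(s["Form_processing_delay"] for s in injected)

# Root-cause-style query: upstream states consistent with an on-time Form processing.
on_time = solutions(Form_processing_run=True, Form_processing_delay=False)
```

Fixing some variables and inspecting the remaining solutions is what turns the same constraint set into either a forward (sensitivity) or a backward (root cause) question.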
22. Quanopt Ltd.
Runtime process monitoring
Runtime monitoring based on the same model
Rule-based online event processing
o Events captured during the execution
o Each time a rule is satisfied
• Notification can be recorded
• Update of rule-specific process metrics
Coverage checks
Annotation-based rule synthesis
[Figure: approach overview with the labels Process model, Resource model, Annotation model, System model, Error Propagation Analysis, Monitoring]
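The rule-based event processing described on this slide can be sketched as a loop that checks every captured event against every rule, records a notification, and updates a per-rule metric. Everything concrete here is an assumption made for illustration: the event fields (`activity`, `duration_s`, `status`), the 30-second threshold, and the rule names are hypothetical, and the prototype's actual rule engine is not shown in the slides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate over a single event
    hits: int = 0                    # rule-specific metric: how often the rule fired


def process(events, rules):
    """Feed events to the rules; record a notification for every satisfied rule."""
    notifications = []
    for event in events:
        for rule in rules:
            if rule.matches(event):
                rule.hits += 1
                notifications.append((rule.name, event["activity"]))
    return notifications


# Hypothetical rules and execution-log events.
rules = [
    Rule("slow_activity", lambda e: e["duration_s"] > 30),
    Rule("timeout", lambda e: e["status"] == "timeout"),
]
events = [
    {"activity": "Form processing", "duration_s": 12, "status": "done"},
    {"activity": "Manual laundering check", "duration_s": 95, "status": "timeout"},
    {"activity": "Record transaction", "duration_s": 41, "status": "done"},
]

notifications = process(events, rules)
```

Keeping the counter on the rule itself mirrors the idea of rule-specific process metrics: each rule accumulates its own statistics independently of the others.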
23. Quanopt Ltd.
Architecture of the prototype
[Figure: architecture of the prototype. Labelled groups:]
• Process Model, Resource Model, Fault Model
• Process Execution Log
• Diagnostic Rules, Propagation Rules, Tagging
• Dependability bottleneck, Process hotspots
• Runtime diagnostic metrics, Runtime alerts
25. Quanopt Ltd.
Future work
System model and fault model "libraries"
Hierarchical modelling
Hierarchical/Incremental CSP evaluation
Uncertain failure modes
Back annotation of monitoring results
o Qualitative abstraction
Precise modelling frontend
Connection with optimisation methods
26. Quanopt Ltd.
Conclusion
Design time analysis of business processes
o With the use of a resource model
o Root cause analysis
o Determine weak points
Rule-based runtime diagnostics
o Process monitoring based on event processing
o Rule synthesis
o Coverage test