Run-Time Dependability Monitoring System for Asterisk
Presented by: Balakrishnan Dasarathy & Hira Agrawal, {das,hira}@research.telcordia.com
Telcordia Technologies, Piscataway, NJ 08854, USA
Prepared for: AstriCon 2008, Glendale, AZ
September 25, 2008
2
General Problem
What is dependability?
Trustworthiness of a computer system such that reliance can justifiably be placed on the service it delivers
Example attributes: availability, security, performance, and safety
Priorities of attributes may vary from one domain to another
Particular emphasis:
Highly available, secure, distributed applications, specifically the Asterisk communications platform
Run-time defense against defects and bad configurations
Defects leading to failures remain in software despite extensive testing and verification
These defects and bad configurations manifest themselves as security threats and performance, reliability and availability problems
Ensuring dependability of distributed and networked software systems
3
Goal
Detect and mitigate application vulnerabilities at run-time, whether malicious or accidental in nature, for complex communication applications
Aim for a high detection rate with few false positives
4
Address Real Dependability Issues
Example scenario | Dependability attribute
Fraudulent call | Security
Inappropriate call preemptions | Security, Performance
Denial-of-service attacks targeting a specific extension or all extensions in a given department | Security, Availability
Unusual deterioration in call processing rates even under light call loads, arising, for example, from some type of resource exhaustion | Performance
Call spoofing where, for example, a call has inappropriate caller ID information associated with it | Security
Phone lines in unusable, “hung” states | Availability
5
Approach
We use two complementary approaches, specification-based and learning-based, both based on monitoring events.
Specification-based approach
  Needs to know the vulnerability situation a priori
  No false positives
Learning-based approach
  Can detect new potential problem situations
  Statistical learning
    Clustering and signature generation for each cluster
    More training data leads to fewer false positives
    May not be able to handle arbitrary looping
  Automata generation
    Generation of a specification from instances of good and bad behavior
    Can handle arbitrary looping
    Needs to be able to deduce state information
    May give rise to both false negatives and false positives
Leverage open source technologies
Leverage open source technologies
6
Implementation Architecture
[Figure: Implementation architecture. Components connected through an Event Bus: Asterisk with EventLib; Event Manager (adapters) for events (logs, measurements); Specification-Based Monitor (ECharts); Statistical Learner & Monitor (event sequence clustering and signature generation); Automata Learner & Monitor; Reactor (actions); API for Control. The learner implementations are labeled R/SAS and custom developed.]
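To make the event-bus arrangement concrete, here is a minimal in-process publish/subscribe sketch in Python. It is not the authors' implementation: the `EventBus`, `Reactor`, and `make_monitor` names, the topic strings, and the event fields (`Event`, `Channel`, `Exten`, `OrigExten`) are all hypothetical, and a real deployment would carry events between processes rather than through direct function calls.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]            # hypothetical event format: field name -> value
Handler = Callable[[Event], None]


class EventBus:
    """In-process publish/subscribe bus keyed by topic name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Deliver the event to every handler registered for the topic, in order.
        for handler in list(self._subscribers[topic]):
            handler(event)


class Reactor:
    """Records alarms; a real reactor would also take corrective actions."""

    def __init__(self) -> None:
        self.alarms: List[Event] = []

    def on_alarm(self, event: Event) -> None:
        self.alarms.append(event)


def make_monitor(bus: EventBus) -> Handler:
    """Hypothetical monitor: publishes an alarm when a Dial changes the extension."""
    def on_call_event(event: Event) -> None:
        if event.get("Event") == "Dial" and event.get("Exten") != event.get("OrigExten"):
            bus.publish("alarms", {"Event": "Anomaly", "Channel": event.get("Channel", "")})
    return on_call_event


bus = EventBus()
reactor = Reactor()
bus.subscribe("alarms", reactor.on_alarm)
bus.subscribe("call_events", make_monitor(bus))
bus.publish("call_events", {"Event": "Dial", "Channel": "SIP/alice-1", "Exten": "bob", "OrigExten": "bob"})
bus.publish("call_events", {"Event": "Dial", "Channel": "SIP/carol-2", "Exten": "dennis", "OrigExten": "bob"})
print(len(reactor.alarms))  # 1
```

Routing alarms through a topic rather than calling the Reactor directly means another monitor can be added by subscribing it, without touching the existing monitors.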
7
Specification-Based Techniques: Fraudulent Call
[Figure: participants Alice, Bob, Carol, and Dennis (on an external SIP server); the Asterisk server sits at the boundary between the enterprise PBX and the external world, and events are monitored there.]
All calls from a certain external number to a certain internal extension during certain after-hours periods get automatically diverted to an international number!
8
Fraudulent Call Specification Used
[Figure: specification state machines for the source channel and the destination channel, with states Ready, Down, Dialed, Ringing, Up, Linked, and Unlinked, driven by the events NewChannel, NewState, NewExtension, Dial, Link, Unlink, and Hangup. On the source channel, Dial[Exten == OrigExten] is the expected path, while Dial[Exten != OrigExten] leads to an Anomaly state (Raise Alarm!).]
The override redirects the dialed internal extension to a preprogrammed international number, thereby converting a toll call into a “toll-free” call for the caller!
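The guard on the source channel can be sketched as a small table-driven state machine. This is a simplified Python approximation, not the ECharts specification itself: the transition table covers only part of the figure and is an assumption, as are the dictionary event format and the idea that the originally dialed extension is taken from the first `NewExtension` event.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified source-channel transitions; state and event names
# follow the slide labels, but this exact table is an assumption.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Ready", "NewChannel"): "Down",
    ("Down", "NewExtension"): "Down",
    ("Down", "Dial[Exten == OrigExten]"): "Dialed",
    ("Down", "Dial[Exten != OrigExten]"): "Anomaly",
    ("Dialed", "Link"): "Linked",
    ("Linked", "Unlink"): "Unlinked",
    ("Dialed", "Hangup"): "Ready",
    ("Unlinked", "Hangup"): "Ready",
}


class SourceChannelMonitor:
    def __init__(self) -> None:
        self.state = "Ready"
        self.orig_exten: Optional[str] = None  # extension the caller originally dialed
        self.alarms: List[str] = []

    def _label(self, event: Dict[str, str]) -> str:
        name = event["Event"]
        if name == "Dial":
            # Guard from the specification: is the dial target still the original extension?
            same = event["Exten"] == self.orig_exten
            return "Dial[Exten == OrigExten]" if same else "Dial[Exten != OrigExten]"
        return name

    def feed(self, event: Dict[str, str]) -> str:
        if event["Event"] == "NewExtension" and self.orig_exten is None:
            self.orig_exten = event["Exten"]
        next_state = TRANSITIONS.get((self.state, self._label(event)))
        if next_state is None:
            return self.state  # events this sketch does not model are ignored
        self.state = next_state
        if next_state == "Anomaly":
            self.alarms.append(f"Raise Alarm! dialed {event['Exten']} instead of {self.orig_exten}")
        return self.state


fraud = SourceChannelMonitor()
for ev in [{"Event": "NewChannel"},
           {"Event": "NewExtension", "Exten": "bob"},
           {"Event": "Dial", "Exten": "dennis@mcintosh.research.telcordia.com"}]:
    fraud.feed(ev)
print(fraud.state)  # Anomaly
```

Feeding the trojan-horse scenario (the caller dials `bob`, but the dial plan dials Dennis's external SIP address) drives the machine into `Anomaly`, while a call whose Dial keeps the original extension proceeds through `Dialed`, `Linked`, and `Unlinked`.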
Dial plan containing the override:
[external]
; allow external clients to dial in
exten => _.,1,Macro(sip-client,${EXTEN})
; the following is a trojan horse that connects an external user to another external user
exten => bob,1,Dial("SIP/dennis@mcintosh.research.telcordia.com")
9
Statistical Learning Technique
Event sequence clustering and signature generation for the clusters
Based on the longest common subsequence technique
The generated clusters characterize the problem space and correspond well to observed call behavior
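The longest-common-subsequence idea behind the cluster signatures can be sketched in Python as follows. The slides do not say how a signature is derived from more than two sequences; folding pairwise LCS over the members of a cluster, as `cluster_signature` does here, is an assumed heuristic that always yields a subsequence common to every member but not necessarily the longest one. The example strings are hypothetical.

```python
from functools import reduce
from typing import List


def lcs(a: str, b: str) -> str:
    """Longest common subsequence of two event strings via dynamic programming."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one LCS string.
    out: List[str] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def cluster_signature(sequences: List[str]) -> str:
    """Heuristic signature: fold pairwise LCS over all sequences in a cluster."""
    return reduce(lcs, sequences)


print(cluster_signature(["BELMNHAbaW", "BELMNHAbaVW", "BELMNHAbaWW"]))  # BELMNHAbaW
```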
10
Mapping of Asterisk Call Events to an Alphabet
Event | Symbol
CHANNEL_EXTERNAL | A
CHANNEL_INTERNAL | B
STATE_BUSY | C
STATE_RINGING | D
STATE_RING | E
STATE_UP | F
EXTEN_ANSWER | G
EXTEN_DIAL | H
EXTEN_GOTOIF | I
EXTEN_GOTO | J
EXTEN_HANGUP | K
EXTEN_MACRO | L
EXTEN_NOOP | M
EXTEN_SET | N
EXTEN_SOFT_HANGUP | O
EXTEN_VOICEMAIL_MAIN | P
EXTEN_VOICEMAIL | Q
EXTEN_WAIT | R
HANGUP_NORMAL_CLEAR | S
HANGUP_NOANSWER | T
HANGUP_NOROUTE | U
HANGUP_INTERWORKING | V
HANGUP_UNKNOWN | W
CALLERID | a
DIAL | b
HOLD | c
LINK | d
UNHOLD | e
UNLINK | f
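Encoding a call as a string over this alphabet is a straightforward table lookup. The Python sketch below transcribes the table above; it assumes the events have already been reduced to the category names shown (how raw Asterisk events map onto categories such as `STATE_RING` is not given here), and the example call is hypothetical.

```python
from typing import Iterable

# Symbol table transcribed from the slide.
EVENT_SYMBOLS = {
    "CHANNEL_EXTERNAL": "A", "CHANNEL_INTERNAL": "B",
    "STATE_BUSY": "C", "STATE_RINGING": "D", "STATE_RING": "E", "STATE_UP": "F",
    "EXTEN_ANSWER": "G", "EXTEN_DIAL": "H", "EXTEN_GOTOIF": "I", "EXTEN_GOTO": "J",
    "EXTEN_HANGUP": "K", "EXTEN_MACRO": "L", "EXTEN_NOOP": "M", "EXTEN_SET": "N",
    "EXTEN_SOFT_HANGUP": "O", "EXTEN_VOICEMAIL_MAIN": "P", "EXTEN_VOICEMAIL": "Q",
    "EXTEN_WAIT": "R",
    "HANGUP_NORMAL_CLEAR": "S", "HANGUP_NOANSWER": "T", "HANGUP_NOROUTE": "U",
    "HANGUP_INTERWORKING": "V", "HANGUP_UNKNOWN": "W",
    "CALLERID": "a", "DIAL": "b", "HOLD": "c", "LINK": "d", "UNHOLD": "e", "UNLINK": "f",
}


def encode(events: Iterable[str]) -> str:
    """Map a call's event category names to a string; unknown names raise KeyError."""
    return "".join(EVENT_SYMBOLS[e] for e in events)


call = ["CHANNEL_INTERNAL", "STATE_RING", "EXTEN_MACRO", "EXTEN_NOOP", "EXTEN_SET",
        "EXTEN_DIAL", "CHANNEL_EXTERNAL", "DIAL", "CALLERID", "STATE_RINGING"]
print(encode(call))  # BELMNHAbaD
```

With one character per event, string algorithms such as edit distance and longest common subsequence apply directly to call traces.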
11
[Figure panels: scree plot for PCA of normalized edit distances between event sequence strings (variances); event sequence factor scores on the first two PCs (PC1 vs. PC2 scores); factor score clusters on the first two PCs, with tessellation using cluster centroids; LCS signature for each event sequence cluster; classification tree (height); new call BELMNHAbaBbaVWNJNNNIORHKU located within the tessellated event space.]
LCS signatures of the six event sequence clusters:
(1) ELMNHBbaDFFdfSS
(2) BELMNHAbaD
(3) ELMNHBbaDNJ
(4) BELMNHAbaW
(5) BELMNHbaDFFdfSS
(6) ELMNHBbaDSW
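The first step behind these plots, a matrix of normalized edit distances between event strings, can be sketched in Python. Two assumptions: the distance is plain Levenshtein distance, and it is normalized by the length of the longer string; the slides state neither. The PCA, clustering, and tessellation steps would then run on the rows of this matrix in a statistics package and are not shown.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute, or match at no cost
        prev = cur
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Assumed normalization: divide by the longer length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def distance_matrix(seqs: List[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise normalized distances (zeros on the diagonal)."""
    return [[normalized_distance(x, y) for y in seqs] for x in seqs]


sigs = ["BELMNHAbaD", "BELMNHAbaW", "ELMNHBbaDNJ"]
for row in distance_matrix(sigs):
    print([round(d, 2) for d in row])
```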
Fault Detection with Event Sequence Analysis
ALERT... BELMNHAbaBbaVWNJNNN flagged by BELMNHAbaW
Computed distance of 6.384 exceeds threshold distance of 6
BELMNHAbaBbaVWNJNNN represents the first 76% of the new call event sequence BELMNHAbaBbaVWNJNNNIORHKU
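A threshold check in the spirit of the alert above might look like the following Python sketch. It is a simplification: the slide measures distance in the principal-component space, whereas this sketch uses plain Levenshtein distance to the six cluster signatures, and the threshold of 5 is a hypothetical value chosen for illustration.

```python
from typing import List, Tuple

# Cluster signatures (1) to (6) from the slide.
SIGNATURES = ["ELMNHBbaDFFdfSS", "BELMNHAbaD", "ELMNHBbaDNJ",
              "BELMNHAbaW", "BELMNHbaDFFdfSS", "ELMNHBbaDSW"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def check_call(seq: str, signatures: List[str], threshold: float) -> Tuple[str, int, bool]:
    """Return (nearest signature, its distance, whether the distance exceeds threshold)."""
    nearest = min(signatures, key=lambda s: edit_distance(seq, s))
    dist = edit_distance(seq, nearest)
    return nearest, dist, dist > threshold


prefix = "BELMNHAbaBbaVWNJNNN"
full_call = "BELMNHAbaBbaVWNJNNNIORHKU"
nearest, dist, alert = check_call(prefix, SIGNATURES, threshold=5)
seen = len(prefix) / len(full_call)  # 19 of 25 events observed so far
print(dist, alert, round(seen, 2))   # 9 True 0.76
```

Under plain edit distance the partial sequence is 9 edits from both BELMNHAbaW and ELMNHBbaDNJ, so this sketch flags the call but does not reproduce the slide's distance of 6.384 or its choice of signature.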
12
Automata Generation
[Figure: state diagram with states s1 through s5.]
Given event sequences during learning:
s2s3s1s2s3s4s2s3s1s2s3s4s5s3s3s4s2s3s1s3s3s4s2s3
Find a “compact” machine that will accept the event sequences
Use a minimal number of sequences
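One simple way to obtain a compact machine that accepts the training sequences and tolerates arbitrary looping is a first-order successor automaton: one state per event symbol, with an edge for every adjacent pair seen in training. This Python sketch uses that technique as a stand-in, not necessarily the authors' algorithm; it treats the slide's event string as a single training sequence and treats every state as accepting, both of which are assumptions.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set

START = "<start>"  # hypothetical initial state preceding the first event


def learn_successor_automaton(sequences: List[List[str]]) -> Dict[str, Set[str]]:
    """One state per event symbol; an edge x -> y for every observed adjacent pair."""
    edges: DefaultDict[str, Set[str]] = defaultdict(set)
    for seq in sequences:
        prev = START
        for sym in seq:
            edges[prev].add(sym)
            prev = sym
    return dict(edges)


def accepts(edges: Dict[str, Set[str]], seq: List[str]) -> bool:
    """Every state is accepting, so any run that follows known edges is accepted."""
    prev = START
    for sym in seq:
        if sym not in edges.get(prev, set()):
            return False  # unseen transition: would be reported as an anomaly
        prev = sym
    return True


training = "s2s3s1s2s3s4s2s3s1s2s3s4s5s3s3s4s2s3s1s3s3s4s2s3"
tokens = re.findall(r"s\d+", training)
machine = learn_successor_automaton([tokens])
print(sorted(machine["s3"]))  # ['s1', 's3', 's4']
```

Because the self-loop on s3 was observed, the machine accepts s2 s3 s3 s3 s1 even though that exact sequence never occurred, while s2 s1 is rejected because no s2 to s1 step was seen.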
13
FSM Generation Example
14
Status
We have all three techniques working in an integrated architecture
They are robust prototype implementations; more time and effort are needed for them to mature
15
Seeking Collaboration
Need real data from operations in one or more sites for training the system
Need your input/requirements for a better run-time monitoring system for Asterisk
Open to licensing our tool as a hosted solution
We are also open to “open sourcing” our tool for community development