Event-driven solution to monitor data centers through continuous queries and machine learning
DESCRIPTION
Our presentation at DEBS'10, held in Cambridge, UK, in July 2010. It describes a solution for monitoring data centers through CEP and machine learning.
TRANSCRIPT
DEBS 2010 – 4th ACM International Conference on Distributed Event-Based Systems, Cambridge, United Kingdom
HOLMES: An event-driven solution to monitor data centers through continuous queries and machine learning
Pedro Henriques dos Santos Teixeira
Ricardo Gomes Clemente
Ronald Andreu Kaiser
Denis Almeida Vieira Jr
Topics
• Motivation
• Use Case
• The Solution
  • Overview
  • System architecture
  • CEP
  • Machine learning
  • CEP & Machine learning integration
  • Visualization and User Interface
• Conclusion
Motivation
• Dynamic, constantly growing environment
• Need to understand our environment
• Too many dependencies
• Can't afford downtime
• Monitoring can be tricky
• Anticipate the inevitable and try to avoid chaos
• 1.2K servers
• 14K+ monitored items
• Correlation
Use Case
• Big Brother Brazil
• New world record
• 151 million votes in 2 days
• Peaks of 13,500 votes per minute (~220 votes/s)
• DDoS attack detected
Overview
The System Architecture
HOLMES
System architecture – modules and their purposes
• CEP module: known problems
• Machine learning module: unknown problems
• Visualization module: situational awareness
• Storage: event history/log
CEP
• Real-time reaction to incidents is a requirement for data center monitoring
• Expression of abstract rules related to the business is desirable
• Correlation of events through user-defined queries
CEP - Esper
• Open source CEP implementation
• Supports an Event Processing Language (EPL)
• High throughput, a requirement in our context
• Easy to embed in our application
CEP – simple example
SELECT avg(response_time) FROM HTTP.win:time(5 min)
[Figure: events stream E5, E4, E3, E2, E1 inside a 5 min window; each event Ei carries a response time, shown as 4, 3, 2, 3 and 5 t.u.]
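A minimal Python sketch of what a 5-minute time window average does, written only to illustrate the semantics of the query above. It is not Esper code: the class name `TimeWindowAverage`, the use of seconds as the time unit, and the one-event-per-minute timestamps are assumptions made for this example.

```python
from collections import deque


class TimeWindowAverage:
    """Keeps the values seen in the last `window_seconds` and averages them."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Insert an event, expire old ones, and return the current average."""
        self.events.append((timestamp, value))
        self.total += value
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


# Hypothetical stream: one HTTP event per minute, response times in t.u.
window = TimeWindowAverage(window_seconds=5 * 60)
for ts, response_time in [(0, 5), (60, 3), (120, 2), (180, 3), (240, 4)]:
    current_average = window.add(ts, response_time)
```

Keeping a running total means each new event updates the average without rescanning the whole window, which matters when throughput is a requirement.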
If the number of sessions increases by 10% within a 3-minute window, the
average CPU usage of the web farm does not
increase by 5%, and the number of slow queries in
the database is higher than 10, then we have reached a
database contention situation. Raise an alarm!
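The rule above can be sketched as a plain Python predicate. This is only an illustration, not the EPL query used in HOLMES: the function names are hypothetical, the inputs are assumed to be values already aggregated at the start and end of the 3-minute window, and both increases are assumed to be relative changes (the slide does not say whether the CPU threshold is relative or in percentage points).

```python
def percent_increase(start, end):
    """Relative growth from start to end, in percent."""
    if start == 0:
        return float("inf") if end > 0 else 0.0
    return (end - start) / start * 100.0


def database_contention(sessions_start, sessions_end,
                        cpu_start, cpu_end, slow_queries):
    """Evaluate the rule over values aggregated for one 3-minute window."""
    # Assumption: "increases by 10%" means at least 10% relative growth.
    sessions_grew = percent_increase(sessions_start, sessions_end) >= 10.0
    # Assumption: "does not increase by 5%" means less than 5% relative growth.
    cpu_flat = percent_increase(cpu_start, cpu_end) < 5.0
    too_many_slow_queries = slow_queries > 10
    return sessions_grew and cpu_flat and too_many_slow_queries


# Hypothetical readings: sessions up 15%, CPU up 2.5%, 12 slow queries.
if database_contention(1000, 1150, 40.0, 41.0, 12):
    print("ALARM: database contention")
```

The point of the rule is the combination: more sessions without more CPU load on the web farm suggests requests are waiting on the database rather than being processed.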
Machine learning
“any signal, which is totally predictable, carries no information” - Shannon
Machine learning characteristics
• FRAHST learns to detect anomalous behaviors
• Unsupervised streaming algorithm
• Linear complexity in the number of data streams
FRAHST, state-of-the-art
For further information, see reference [12] in our paper.
Anomaly detection
CEP & Machine Learning Integration
• Users choose the data streams to be correlated
• CEP module aggregates events
• Notifications are raised when a rank variation is detected
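One way to picture the aggregation step is a function that turns raw events from the chosen streams into one vector per time interval, which a streaming detector could then consume. This is a hedged sketch, not the HOLMES implementation: the function name `aggregate`, the event tuple layout, averaging as the aggregation, and the 60-second interval are all assumptions, and the rank-tracking detector itself is left out.

```python
from collections import defaultdict


def aggregate(events, streams, interval):
    """Average each chosen stream per time bucket.

    events:   iterable of (timestamp, stream_name, value) tuples
    streams:  ordered list of stream names chosen by the user
    interval: bucket length, in the same unit as the timestamps
    Returns a time-ordered list of (bucket_start, vector); the vector has
    one entry per chosen stream, or None if no event arrived for it.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for timestamp, name, value in events:
        if name in streams:  # ignore streams the user did not select
            bucket_start = int(timestamp // interval) * interval
            buckets[bucket_start][name].append(value)

    rows = []
    for bucket_start in sorted(buckets):
        per_stream = buckets[bucket_start]
        vector = [
            sum(per_stream[name]) / len(per_stream[name])
            if name in per_stream else None
            for name in streams
        ]
        rows.append((bucket_start, vector))
    return rows


# Hypothetical events: (seconds, stream, value)
sample_events = [
    (0, "cpu", 10), (30, "cpu", 20), (10, "sessions", 100),
    (65, "cpu", 30), (70, "disk", 1),
]
vectors = aggregate(sample_events, ["cpu", "sessions"], interval=60)
```

Each resulting vector lines up the selected streams at the same point in time, which is the form a correlation-based detector needs as input.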
Visualization and User Interface
• Users can create Perspectives
• Real-time dashboard customization
• Events history visualization
Dashboards
Conclusion
• Successful implementation and acceptance in a real use case
• New challenges
  • Improving situational awareness & prediction
  • Making the creation of queries more intuitive
This presentation:
http://www.slideshare.net/intelie/debs2010
Our Nagios Plugin source code:
http://github.com/intelie/neb2activemq
Intelligent Monitoring with Esper:
http://esper.codehaus.org/tutorials/tutorial/presentations.html
Denis Vieira Jr. - [email protected]
Ronald Kaiser - [email protected]