spit, gather, churn - mining infrastructure data for ops intelligence
DESCRIPTION
Presentation on how to design infrastructure services for meaningful ops intelligence, and how to integrate ops intelligence as feedback for software development.
TRANSCRIPT
Spit, Gather, Churn
Mining Infrastructure Data for Ops Intelligence
Ranjib Dey
Twitter: @RanjibDey
IRC/GitHub: @ranjibd
About Me
• Senior software engineer in the CD practice group @ThoughtWorks India
• Was a system administrator @ThoughtWorks India before that
• Worked on life-science-related algorithms @Persistent Systems before that
• Master's in Bioinformatics (thesis on HPC and machine learning)
• Life Science graduate
Agenda
• What is Ops intelligence?
• Why is it needed? Implications of Ops intelligence
• Why is it important now?
• Designing intelligent infrastructure services
• What does the future look like?
• Q & A
What is Ops Intelligence?
• Suitable for fast, meaningful ops feedback to the business
• Abstracts infrastructure details
• Tech-stack neutral
• Allows forecasting
• Pre-emptive in nature
What is intelligence? Data Mining
Data → Information → Knowledge
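Here is a minimal Python sketch of the Data → Information → Knowledge progression. It assumes a hypothetical list of response-time samples and a hypothetical 500 ms p95 budget:

- The raw samples are the data.
- The summary is the information.
- The verdict against the budget is the knowledge.

```python
import math
from statistics import mean

# Data: raw response-time samples in milliseconds (hypothetical values).
samples_ms = [120, 135, 128, 900, 131, 127, 950, 125]


def summarize(samples):
    """Information: condense raw data into a few descriptive numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # nearest-rank 95th percentile
    return {"count": len(ordered), "mean": mean(ordered), "p95": ordered[rank - 1]}


def assess(summary, p95_budget_ms=500):
    """Knowledge: judge the information against a (hypothetical) latency budget."""
    return "investigate" if summary["p95"] > p95_budget_ms else "healthy"
```

The mean here (327 ms) looks tolerable. The p95 exposes the slow tail, which is why the choice of summary matters as much as the data.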
Why is it needed? Implications
• Self-serving
• Lean
• Elasticity
• Adaptive
Why is it important now?
• Market volatility has increased
• It's not development but deployment, release and maintenance that introduce delay
• Cloud is here
• Infrastructure tooling has matured
• The Continuous Delivery and DevOps movements are on
Designing intelligent infrastructure services
• End-user-driven services
• Adhere to core Unix philosophies
• Remember the ‘|’; don't create dead ends
• Feedback-driven, iterative improvement
• Think of horizontal scalability
• Infrastructure as code
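One way to honour "remember the ‘|’" is to make each ops tool a filter: it reads lines, writes lines and fails soft. The sketch below is hypothetical. The `to_json_lines` and `main` names and the JSON field names are illustrative, not from the talk. It tags raw log lines with host and service so that whatever sits downstream in the pipe can consume them.

```python
import json
import socket
import sys


def to_json_lines(lines, service, host=None):
    """Wrap raw log lines as JSON records tagged with host and service.

    Taking an iterable of lines and yielding strings keeps the function
    pipe-friendly: it works on sys.stdin, a file, or a list in a test.
    """
    host = host or socket.gethostname()
    for line in lines:
        line = line.rstrip("\n")
        if line:  # skip blank lines instead of emitting empty records
            yield json.dumps({"host": host, "service": service, "message": line})


def main(service="app", stdin=None, stdout=None):
    """Stream stdin to stdout, flushing each record so downstream tools see it at once."""
    stdin = stdin or sys.stdin
    stdout = stdout or sys.stdout
    for record in to_json_lines(stdin, service):
        stdout.write(record + "\n")
        stdout.flush()
```

Call `main()` from the script's entry point and the tool can sit in a pipeline such as `tail -f app.log | python tag_logs.py | grep timeout`. The file and script names there are hypothetical.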
Spitting out ops information
• State and Metrics
• Logs
Metrics
• A unit test for each method; a monitoring service for each infrastructure service
• A single monitoring service can have multiple metrics
• Metrics can have relationships
• These features should be configurable
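Here is a minimal sketch of one monitoring service per infrastructure service, holding several metrics:

- Thresholds come from configuration.
- Relationships between metrics are used to suppress knock-on alerts.

The `Metric` and `MonitoringService` classes, the `orders-db` service and the threshold values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Metric:
    name: str
    probe: Callable[[], float]  # returns the current reading
    threshold: float            # configurable limit; readings above it are "critical"
    depends_on: List[str] = field(default_factory=list)  # names of related metrics


@dataclass
class MonitoringService:
    """One monitoring service per infrastructure service, holding many metrics."""
    service: str
    metrics: Dict[str, Metric] = field(default_factory=dict)

    def add(self, metric: Metric) -> None:
        self.metrics[metric.name] = metric

    def check(self) -> Dict[str, str]:
        raw = {
            name: "critical" if m.probe() > m.threshold else "ok"
            for name, m in self.metrics.items()
        }
        # Use relationships: if a metric fails while something it depends on
        # has already failed, mark it "suppressed" instead of alerting twice.
        return {
            name: "suppressed"
            if raw[name] == "critical" and any(raw.get(d) == "critical" for d in m.depends_on)
            else raw[name]
            for name, m in self.metrics.items()
        }


# Hypothetical thresholds, loaded from configuration rather than hard-coded.
CONFIG = {"disk_used_pct": 90.0, "db_latency_ms": 200.0}

svc = MonitoringService("orders-db")
svc.add(Metric("disk_used_pct", lambda: 95.0, CONFIG["disk_used_pct"]))
svc.add(Metric("db_latency_ms", lambda: 350.0, CONFIG["db_latency_ms"],
               depends_on=["disk_used_pct"]))
```

Here a full disk is reported as the root cause and the latency alert it causes is suppressed. That is one possible use of metric relationships; others, such as derived metrics, would fit the same structure.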
Metrics driven infrastructure development
[Diagram: Service, Metric]
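Metrics-driven development can borrow the shape of a unit test. First state which metrics a service must expose, then let the test fail until they exist. This is a sketch under assumptions: `REQUIRED_METRICS` and the `exposed_metrics()` stand-in are hypothetical. A real version would query the monitoring system.

```python
import unittest

# Hypothetical contract: metrics each service must expose before it ships.
REQUIRED_METRICS = {
    "web": {"requests_per_sec", "error_rate"},
    "db": {"replication_lag_s"},
}


def exposed_metrics(service):
    """Stand-in for asking the monitoring system what a service reports (assumption)."""
    catalog = {
        "web": {"requests_per_sec", "error_rate", "p95_latency_ms"},
        "db": {"replication_lag_s"},
    }
    return catalog.get(service, set())


def missing_metrics(service):
    """Required metrics that the service does not yet expose."""
    return REQUIRED_METRICS.get(service, set()) - exposed_metrics(service)


class ServiceMetricsTest(unittest.TestCase):
    def test_every_service_exposes_required_metrics(self):
        for service in REQUIRED_METRICS:
            with self.subTest(service=service):
                self.assertEqual(missing_metrics(service), set())
```

Run it with `python -m unittest` pointed at the file that contains it. Adding a new entry to `REQUIRED_METRICS` turns the build red until the service reports that metric.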
Logging
• Decouple logging framework from the core services
• Have configurable logging levels
• Enforce appropriate logging and levels
• Enforce logging patterns
• Logs and logging patterns can be modeled as metrics too
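Here is a sketch of decoupled, configurable logging using Python's standard `logging` module. The core service only calls `log.info(...)` and similar methods, while the level and line pattern are set in one place. The `OPS_LOG_LEVEL` environment variable and the `LOG_FORMAT` pattern are assumptions for illustration.

```python
import logging
import os

# One enforced line pattern for every service (hypothetical layout).
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s service=%(service)s %(message)s"
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_logging(service, level=None):
    """Set up logging outside the core service code.

    The level comes from `level` if given, else the (hypothetical)
    OPS_LOG_LEVEL environment variable; unknown names fall back to INFO.
    """
    level_name = (level or os.environ.get("OPS_LOG_LEVEL", "INFO")).upper()
    logger = logging.getLogger(service)
    logger.setLevel(LEVELS.get(level_name, logging.INFO))
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    # LoggerAdapter injects the "service" field that LOG_FORMAT expects.
    return logging.LoggerAdapter(logger, {"service": service})
```

Usage: `log = configure_logging("billing")`, then `log.info("invoice generated id=%s", 42)`. Swapping handlers, for example to ship logs centrally, needs no change to the calling code.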
Metrics on Logs
[Diagram: Log, Metric on log pattern]
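Turning a log pattern into a metric can be as simple as counting matching lines per time bucket. The result is a time series that can be graphed and alerted on like any other metric. The log line layout (ISO timestamp, then level) and the example pattern are assumptions.

```python
import re
from collections import Counter

# Hypothetical log line layout: "2013-03-14T10:15:42 ERROR payment timeout ..."
LINE = re.compile(r"^(?P<minute>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2} (?P<level>[A-Z]+) ")


def pattern_metric(lines, pattern):
    """Count lines matching `pattern`, bucketed per minute, as {minute: count}."""
    rx = re.compile(pattern)
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and rx.search(line):  # ignore lines that don't fit the layout
            counts[m.group("minute")] += 1
    return dict(sorted(counts.items()))
```

For example, `pattern_metric(lines, r"ERROR .*timeout")` gives a timeout-per-minute series.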
Gathering Ops Information
• Information aggregation
• Consider how you will use it
• Metrics and Logs
• Centralized logging
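For centralized aggregation, one concrete option is Graphite's plaintext protocol. Each reading is a line `<metric path> <value> <timestamp>` sent to the Carbon listener, which uses TCP port 2003 by default. The `hosts.<host>.<metric>` naming scheme and the server name passed to `ship()` are assumptions for this sketch.

```python
import socket
import time


def graphite_lines(host, metrics, ts=None):
    """Format {name: value} readings as Graphite plaintext lines."""
    ts = int(ts if ts is not None else time.time())
    safe_host = host.replace(".", "_")  # dots are path separators in Graphite
    return [f"hosts.{safe_host}.{name} {value} {ts}\n"
            for name, value in sorted(metrics.items())]


def ship(lines, server, port=2003, timeout=5.0):
    """Send formatted lines to a central Carbon listener over TCP."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall("".join(lines).encode("ascii"))
```

Keeping formatting separate from sending makes the format testable without a live server, and lets the same lines go to a file or queue instead.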
Gathering Ops Information
• Two main patterns:
  – Time-series data
  – OLAP cubes
• Storage engine considerations:
  – Flat files
  – RRDs
  – NoSQL databases and other distributed storage systems
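Here is a sketch contrasting the two patterns:

- `rollup()` consolidates time-series points into fixed-width buckets. This is similar in spirit to an RRD's AVERAGE consolidation.
- `cube()` sums a measure over chosen dimensions. Calling it with different dimension lists gives the kinds of roll-ups an OLAP cube would precompute.

Field names such as `dc`, `service` and `errors` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rollup(points, step):
    """Average (timestamp, value) points into `step`-second buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % step].append(value)  # align to the bucket start
    return {start: mean(values) for start, values in sorted(buckets.items())}


def cube(events, dims, measure):
    """Sum `measure` for every combination of values along `dims`."""
    cells = defaultdict(float)
    for event in events:
        key = tuple(event[d] for d in dims)
        cells[key] += event[measure]
    return dict(cells)
```

Time series answer "how did this change over time?" cheaply. Cubes answer "where and in which service?" by slicing along dimensions.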
Churning Ops Information
• Visualizations
  – Charting
  – Trending
  – Customized visualizations
• Dashboards
  – Customized views for stakeholders
  – Information radiators
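Trending is what makes the forecasting mentioned earlier possible. The idea is to fit a trend and extrapolate when it crosses a threshold, for example when a disk will be full. This minimal sketch assumes a plain least-squares line is adequate; real capacity data is often seasonal and needs more than this.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def time_to_threshold(xs, ys, threshold):
    """x at which the fitted trend reaches `threshold`, or None if it is not rising."""
    slope, intercept = linear_trend(xs, ys)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope
```

With daily disk usage of 50, 52, 54 and 56 percent on days 0 to 3, the trend reaches 90 percent at day 20.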
Churning Ops Information
• Logs
  – Search
  – Index
  – Alerts and notifications on top of aggregated logs
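Log search and alerting both sit on an index. The sketch below is a toy inverted index that maps each token to line ids, supports AND queries and raises a saved-search alert. It only illustrates the idea that dedicated log tools implement at scale. The tokenization rules and the `alert_if` threshold semantics are assumptions.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9_]+")


class LogIndex:
    """Minimal inverted index over aggregated log lines."""

    def __init__(self):
        self.lines = []
        self.postings = defaultdict(set)  # token -> ids of lines containing it

    def add(self, line):
        line_id = len(self.lines)
        self.lines.append(line)
        for token in TOKEN.findall(line.lower()):
            self.postings[token].add(line_id)

    def search(self, query):
        """Return lines containing every token of `query` (AND semantics)."""
        tokens = TOKEN.findall(query.lower())
        if not tokens:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return [self.lines[i] for i in sorted(ids)]


def alert_if(index, query, max_hits):
    """Return an alert message when a saved search exceeds `max_hits`, else None."""
    hits = index.search(query)
    if len(hits) > max_hits:
        return f"ALERT: '{query}' matched {len(hits)} lines (limit {max_hits})"
    return None
```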
Validation 1: Continuous Delivery
Validation 2: Performance Enhancements
Validation 3: Holistic information
Validation 4: Meaningful information
• Meaningful alerts:
  – Nodeable http://www.nodeable.com/
• Log analytics:
  – Loggly http://loggly.com/
  – SplunkStorm https://www.splunkstorm.com/
  – Graylog2/Logstash
• Dashboards for metrics:
  – Graphite (+graphiti)
What does the future look like?
• IaaS
• Ops is not the bottleneck
• Context-aware infrastructure
• Test-driven infrastructure
• SSH is not a must
• “The machines are alive” – Jon Crosby … and they are emerging
Thank You