spit, gather, churn - mining infrastructure data for ops intelligence

Spit, Gather, Churn: Mining Infrastructure Data for Ops Intelligence

Ranjib Dey

Twitter: @RanjibDey

IRC/GitHub: @ranjibd

About Me

• Senior software engineer in the CD practice group @ThoughtWorks India

• Was a system administrator before joining @ThoughtWorks India

• Worked on life science related algorithms @Persistent Systems before that.

• Masters in Bio-Informatics (thesis on HPC, Machine Learning)

• Life Science graduate

Agenda

• What is Ops Intelligence?
• Why is it needed? Implications of Ops Intelligence
• Why is it important now?
• Designing intelligent infrastructure services
• What does the future look like?
• Q & A

What is Ops Intelligence?

• Suitable for fast, meaningful ops feedback to business
• Abstracts infrastructure details
• Tech-stack neutral
• Allows forecasting
• Pre-emptive in nature

What is intelligence? Data Mining

Data → Information → Knowledge

Why is it needed? Implications

• Self serving
• Lean
• Elasticity
• Adaptive

Why is it important now?

• Market volatility has increased
• It's not development but deployment, release, and maintenance that introduce delay
• Cloud is here
• Infrastructure tooling has matured
• The Continuous Delivery and DevOps movement is on

Designing intelligent infrastructure services

• End-user-driven services
• Adhere to core Unix philosophies
• Remember the ‘|’; don’t create dead ends
• Feedback-driven, iterative improvement
• Think of horizontal scalability
• Infrastructure as code
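
One way to read "remember the '|'" is that an ops tool should accept text on standard input and emit machine-readable text on standard output, so the next tool can consume it. The sketch below assumes a `key=value` input format; that format and the names `parse_line` and `run` are illustrative choices, not something the talk prescribes.

```python
import json
import sys


def parse_line(line):
    """Turn 'key=value key=value' text into a dict; tokens without '=' are ignored."""
    pairs = (token.split("=", 1) for token in line.split() if "=" in token)
    return dict(pairs)


def run(stdin=sys.stdin, stdout=sys.stdout):
    """Read records from stdin and write one JSON object per line to stdout."""
    for line in stdin:
        record = parse_line(line)
        if record:  # drop lines that carry no key=value data
            stdout.write(json.dumps(record, sort_keys=True) + "\n")
```

Calling `run()` at the bottom of a script would let it sit in the middle of a pipeline, with its JSON lines feeding whatever comes next instead of ending in a dead end.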

Spitting out ops information

• State and Metrics
• Logs

Metrics

• Just as every method gets a unit test, every infrastructure service gets a monitoring service
• A single monitoring service can have multiple metrics
• Metrics can have relationships
• These features should be configurable
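
A minimal sketch of these points, assuming a hypothetical `Monitor` class: one monitor per service, several raw metrics per monitor, and derived metrics that express relationships between them. The collectors are stubbed with constants here; in practice they would query the service, and both dictionaries would be loaded from configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Monitor:
    """One monitor per infrastructure service, holding several metrics."""
    service: str
    # Raw metrics: name -> function returning the current value.
    collectors: Dict[str, Callable[[], float]] = field(default_factory=dict)
    # Derived metrics: name -> function of the already collected values.
    derived: Dict[str, Callable[[Dict[str, float]], float]] = field(default_factory=dict)

    def sample(self) -> Dict[str, float]:
        values = {name: collect() for name, collect in self.collectors.items()}
        # Derived metrics express relationships between the raw metrics.
        for name, formula in self.derived.items():
            values[name] = formula(values)
        return values


# Hypothetical web service with stubbed collectors.
web = Monitor(
    service="web",
    collectors={"requests": lambda: 200.0, "errors": lambda: 5.0},
    derived={"error_rate": lambda m: m["errors"] / m["requests"]},
)
print(web.sample())
# {'requests': 200.0, 'errors': 5.0, 'error_rate': 0.025}
```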

Metrics-driven infrastructure development

[Figure: Service and Metric]

Logging

• Decouple the logging framework from the core services
• Have configurable logging levels
• Enforce appropriate logging and levels
• Enforce logging patterns
• Logs and logging patterns can be modeled as metrics too

Metrics on Log

[Figure: Log and Metric on log pattern]
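
Treating a log pattern as a metric can be as simple as counting how many lines match it. The sketch below assumes two made-up patterns and made-up sample lines; real patterns would follow whatever logging conventions the services enforce.

```python
import re
from collections import Counter

# Hypothetical patterns; in practice these would come from configuration.
PATTERNS = {
    "errors": re.compile(r"\bERROR\b"),
    "timeouts": re.compile(r"timed? ?out", re.IGNORECASE),
}


def log_metrics(lines):
    """Count, per named pattern, how many log lines match it."""
    counts = Counter({name: 0 for name in PATTERNS})
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return dict(counts)


# Made-up sample lines for illustration only.
sample = [
    "10:00:01 INFO request served",
    "10:00:02 ERROR upstream timed out",
    "10:00:03 WARN connection timeout, retrying",
]
print(log_metrics(sample))
# {'errors': 1, 'timeouts': 2}
```

The resulting counts can be shipped to the same store as any other metric, which is what makes trending and alerting on logs possible.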

Gathering Ops Information

• Information aggregation
• Consider how you will use it
• Metrics and Logs
• Centralized logging

Gathering Ops Information

• Two main patterns:
  – Time series data
  – OLAP cubes
• Storage engine considerations
  – Flat files
  – RRDs
  – NoSQL and other distributed storage systems
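
The two patterns can be loosely illustrated in a few lines: samples are grouped into fixed time buckets (the time series view) and sliced by whichever dimensions are of interest (a cube-like view). This is only a sketch over in-memory dictionaries, not an OLAP engine or an RRD; the field names `ts`, `host`, `dc`, and `value` and the sample data are assumptions.

```python
from collections import defaultdict


def rollup(samples, bucket_seconds, dims):
    """Average sample values per (time bucket, chosen dimensions)."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for s in samples:
        bucket = s["ts"] - s["ts"] % bucket_seconds  # start of the time bucket
        key = (bucket,) + tuple(s[d] for d in dims)
        sums[key][0] += s["value"]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Made-up samples for illustration only.
samples = [
    {"ts": 0, "host": "web1", "dc": "east", "value": 1.0},
    {"ts": 30, "host": "web1", "dc": "east", "value": 3.0},
    {"ts": 45, "host": "web2", "dc": "east", "value": 5.0},
    {"ts": 70, "host": "web2", "dc": "east", "value": 7.0},
]
print(rollup(samples, 60, ["host"]))
# {(0, 'web1'): 2.0, (0, 'web2'): 5.0, (60, 'web2'): 7.0}
print(rollup(samples, 60, ["dc"]))
# {(0, 'east'): 3.0, (60, 'east'): 7.0}
```

Which dimensions get rolled up is exactly the "consider how you will use it" question, and it tends to drive the choice of storage engine.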

Churning Ops Information

• Visualizations
  – Charting
  – Trending
  – Customized visualizations
• Dashboards
  – Customized views for stakeholders
  – Information radiators

Churning Ops Information

• Logs
  – Search
  – Index
  – Alerts and notifications on top of aggregated logs
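
Index, search, and alerting build on each other, which a toy inverted index can show. The sketch below assumes whitespace-delimited log lines held in memory and a simple count threshold for alerts; the function names and sample lines are hypothetical, and tools such as those listed later in this deck do this at scale.

```python
import re
from collections import defaultdict


def build_index(lines):
    """Map each lowercased word to the set of line numbers containing it."""
    index = defaultdict(set)
    for number, line in enumerate(lines):
        for word in re.findall(r"\w+", line.lower()):
            index[word].add(number)
    return index


def search(index, query):
    """Return sorted line numbers that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


def should_alert(index, query, threshold):
    """Alert when at least `threshold` aggregated lines match the query."""
    return len(search(index, query)) >= threshold


# Made-up aggregated log lines for illustration only.
logs = [
    "web1 ERROR disk full",
    "web2 INFO request served",
    "web1 ERROR upstream timeout",
]
index = build_index(logs)
print(search(index, "web1 error"))     # [0, 2]
print(should_alert(index, "error", 2))  # True
```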

Validation 1: Continuous Delivery

Validation 2: Performance Enhancements

Validation 3: Holistic information

Validation 4: Meaningful information

• Meaningful alerts:
  – Nodeable http://www.nodeable.com/
• Log analytics:
  – Loggly http://loggly.com/
  – SplunkStorm https://www.splunkstorm.com/
  – Graylog2/Logstash
• Dashboards for metrics:
  – Graphite (+graphiti)


What does the future look like?

• IaaS
• Ops is not the bottleneck
• Context-aware infrastructure
• Test-driven infrastructure
• SSH is not a must
• “The machines are alive” – Jon Crosby ... and they are emerging

Thank You
