Posted on 22-Jan-2018


Using Machine Learning to Optimize DevOps Practices

Building Learning into Monitoring and Feedback

Peter Varhol

About me

• International speaker and writer

• Degrees in Math, CS, Psychology

• Technology communicator

• Former university professor, tech journalist

• Cat owner and distance runner

• peter@petervarhol.com

Agenda

• What is machine learning?

• How is machine learning applied to DevOps?

• Challenges in training these systems

• What constitutes an issue?

• Summary and conclusions

What is Machine Learning?

• Layered algorithms that change parameters based on feedback from known data
  ◦ Can be linear or nonlinear

• Algorithms can be fixed in production or adaptive
  ◦ Fixed: algorithms do not adjust once deployed
  ◦ Adaptive: algorithms continually adjust to new data

• Usually part of a larger system
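To make the fixed/adaptive distinction concrete, here is a minimal Python sketch, assuming a single response-time metric and a simple "more than k standard deviations from baseline" rule (both illustrative assumptions, not from the talk). `FixedDetector` learns its baseline once and never changes it; `AdaptiveDetector` folds every new sample into an exponentially weighted mean and variance. The class names, the `alpha` smoothing factor, and the sample data are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical sketch: one metric (response time in ms), flagged when it is
# more than k standard deviations from the baseline. k, alpha, and the data
# are illustrative assumptions.


class FixedDetector:
    """Learns its baseline once from training data and never changes it."""

    def __init__(self, training: list[float], k: float = 3.0) -> None:
        self.mu = mean(training)
        self.sigma = stdev(training)
        self.k = k

    def is_anomalous(self, x: float) -> bool:
        return abs(x - self.mu) > self.k * self.sigma


class AdaptiveDetector:
    """Updates an exponentially weighted mean and variance on every sample."""

    def __init__(self, training: list[float], k: float = 3.0, alpha: float = 0.1) -> None:
        self.mu = mean(training)
        self.var = stdev(training) ** 2
        self.k = k
        self.alpha = alpha  # weight given to each new sample (higher = adapts faster)

    def is_anomalous(self, x: float) -> bool:
        anomalous = abs(x - self.mu) > self.k * self.var ** 0.5
        # Judge first, then fold the sample into the baseline.
        diff = x - self.mu
        self.mu += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return anomalous


baseline = [100, 102, 98, 101, 99, 100]  # made-up pre-release latencies
after_release = [110.0] * 20             # made-up: a release adds ~10 ms for good

fixed = FixedDetector(baseline)
adaptive = AdaptiveDetector(baseline)
fixed_alerts = sum(fixed.is_anomalous(x) for x in after_release)
adaptive_alerts = sum(adaptive.is_anomalous(x) for x in after_release)
print(fixed_alerts, adaptive_alerts)  # 20 1
```

After a release that permanently shifts latency, the fixed detector alerts on every sample while the adaptive one accepts the new level after the first alert. The flip side is that an adaptive baseline can also quietly absorb a slow degradation, which is one reason to keep fixed models for failure detection.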

Adaptive Systems

• Airline pricing
  ◦ Ticket prices change three times a day based on demand
  ◦ It can cost less to go farther
  ◦ It can cost less later

• Ecommerce systems
  ◦ Recommendations try to discern what else you might want
  ◦ Can I incentivize you to fill up the plane?

Why Use Adaptive?

• The “right” result will vary over time

• Trying to optimize a particular result
  ◦ Revenue

• The problem domain is not static

Confidential, Dynatrace LLC

How Are Fixed Systems Used?

• Transportation
  ◦ Self-driving cars
  ◦ Aircraft/Drones

• Ecommerce
  ◦ Recommendation engines

• Medical
  ◦ Diagnosis systems

Why Use Fixed Machine Learning Systems?

• The problem domain is static

• The expectations remain constant

• The right answer is known under most conditions

• The original algorithms remain valid over a long period of time

DevOps Practices Generate Data

• During development
  ◦ Agile metrics, JIRA issues, test case metrics

• During continuous integration
  ◦ System test metrics

• During continuous deployment
  ◦ Quality metrics for deployments

• After deployment and into production
  ◦ Application availability and performance
  ◦ Usage log files

Focus on Monitoring

• Ongoing data on availability and performance
  ◦ RUM (real user monitoring)
  ◦ Synthetic tests
  ◦ Application monitoring

• Monitoring tackles the back end of DevOps
  ◦ Identifies unhealthy trends
  ◦ Diagnoses failures and poor performance
  ◦ Recommends action

• Whether to use fixed or adaptive learning depends on your goals

Where Do Predictive Analytics Come In?

• Big data makes it possible to predict future events
  ◦ Are we going to fail?
  ◦ How will we perform during traffic surges?

• It can also explain past events
  ◦ What went wrong, and how do we fix it?

• We can rely on past data
  ◦ Adaptive systems may not perform as well
  ◦ Clear goals are needed

What Technologies Are Involved?

• Neural networks

• Genetic algorithms

• Rules engines
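Of the three, a rules engine is the easiest to show in a few lines. The sketch below is a toy forward-chaining engine in Python, not the API of any particular rules product: each hypothetical `Rule` pairs a condition on a dictionary of facts with a fact to assert, and rules keep firing until nothing changes. The metric names and thresholds are made up.

```python
from dataclasses import dataclass
from typing import Callable

Facts = dict[str, object]


@dataclass
class Rule:
    name: str
    condition: Callable[[Facts], bool]  # test against the current facts
    conclusion: tuple[str, object]      # fact to assert when the test passes


def run_rules(rules: list[Rule], facts: Facts) -> Facts:
    """Forward chaining: keep firing rules until no rule adds or changes a fact."""
    facts = dict(facts)  # work on a copy so the caller's facts are untouched
    changed = True
    while changed:
        changed = False
        for rule in rules:
            key, value = rule.conclusion
            if facts.get(key) != value and rule.condition(facts):
                facts[key] = value
                changed = True
    return facts


# Hypothetical rules and thresholds for a web service.
rules = [
    Rule("cpu", lambda f: f["cpu_pct"] > 90, ("cpu_saturated", True)),
    Rule("latency", lambda f: f["p95_ms"] > 500, ("slow", True)),
    Rule("remedy",
         lambda f: f.get("cpu_saturated") is True and f.get("slow") is True,
         ("action", "scale out")),
]

print(run_rules(rules, {"cpu_pct": 97, "p95_ms": 820})["action"])  # scale out
```

Unlike a neural network, every conclusion here is traceable to a rule a person wrote. This toy version has no conflict resolution, so two rules asserting different values for the same fact would loop forever.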

Neural Networks

• A set of layered algorithms whose variables can be adjusted through a learning process

• The learning process involves training with known inputs and outputs

• The algorithms adjust coefficients to converge on the correct answer (or not)

• You freeze the algorithms and coefficients, and deploy
  ◦ Or you optimize on a particular set of characteristics
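The train-then-freeze cycle can be sketched with the smallest possible network: a single sigmoid neuron trained by stochastic gradient descent on labelled examples. This is a simplification, not the layered network the slide describes, and the features (CPU utilisation, error rate), labels, learning rate, and epoch count are all hypothetical.

```python
import math
import random

# Hypothetical labelled history: (CPU utilisation, error rate) -> 1 if the
# service was judged unhealthy, 0 otherwise. The numbers are made up.
TRAINING = [
    ((0.20, 0.00), 0), ((0.35, 0.01), 0), ((0.50, 0.00), 0), ((0.40, 0.02), 0),
    ((0.90, 0.08), 1), ((0.95, 0.12), 1), ((0.85, 0.10), 1), ((0.80, 0.15), 1),
]


def sigmoid(z: float) -> float:
    # Split on sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data, epochs: int = 5000, lr: float = 0.5, seed: int = 0):
    """Fit one sigmoid neuron by stochastic gradient descent on log loss."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            err = sigmoid(w[0] * x1 + w[1] * x2 + b) - y  # d(loss)/d(pre-activation)
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    return (w[0], w[1]), b


# Training happens once; the coefficients are then frozen for deployment.
WEIGHTS, BIAS = train(TRAINING)


def predict_unhealthy(cpu: float, err_rate: float) -> float:
    """Deployed (fixed) model: probability of 'unhealthy' from frozen coefficients."""
    return sigmoid(WEIGHTS[0] * cpu + WEIGHTS[1] * err_rate + BIAS)


print(predict_unhealthy(0.30, 0.01) < 0.5, predict_unhealthy(0.92, 0.11) > 0.5)  # True True
```

Once `WEIGHTS` and `BIAS` are computed, `predict_unhealthy` never changes: that is the "fixed" deployment the slides describe. Retraining on fresh data would turn it into an adaptive system.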

A Sample Neural Network

Genetic Algorithms

• Use the principle of natural selection

• Create a range of possible solutions

• Try out each of them

• Choose and combine two of the better alternatives

• Rinse and repeat as necessary
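The four steps above map directly onto a small loop. This Python sketch tunes two hypothetical settings (worker threads and cache size) against a made-up cost function that stands in for a real load test; the settings, bounds, mutation rate, and population size are all illustrative assumptions. Selection here is simple truncation: the better half survives and supplies the parents.

```python
import random

# Hypothetical stand-in for a load test: pretend latency is lowest at
# 24 worker threads and a 512 MB cache. A real fitness function would run
# the system and measure it.
def simulated_latency_ms(threads: int, cache_mb: int) -> float:
    return 50 + 0.5 * (threads - 24) ** 2 + ((cache_mb - 512) / 32) ** 2


def fitness(genome: tuple[int, int]) -> float:
    return -simulated_latency_ms(*genome)  # higher fitness = lower latency


def crossover(a, b, rng):
    # Each gene (setting) comes from one parent or the other.
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(genome, rng, rate=0.2):
    threads, cache_mb = genome
    if rng.random() < rate:
        threads = min(64, max(1, threads + rng.randint(-4, 4)))
    if rng.random() < rate:
        cache_mb = min(1024, max(64, cache_mb + rng.randint(-64, 64)))
    return (threads, cache_mb)


def evolve(generations=40, pop_size=20, seed=1):
    rng = random.Random(seed)
    # 1. Create a range of possible solutions.
    population = [(rng.randint(1, 64), rng.randint(64, 1024)) for _ in range(pop_size)]
    initial_best = max(population, key=fitness)
    for _ in range(generations):
        # 2. Try out each of them, then keep the better half.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # 3. Combine two of the better alternatives (plus occasional mutation).
        children = [
            mutate(crossover(*rng.sample(survivors, 2), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children  # 4. Rinse and repeat.
    return initial_best, max(population, key=fitness)


initial_best, best = evolve()
print("initial best:", initial_best, "evolved best:", best)
```

Because the survivors carry over unchanged, the best solution can never get worse from one generation to the next; mutation keeps the search from stalling on whatever the first population happened to contain.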

Bringing in DevOps

• DevOps has data that can be used to train neural networks
  ◦ Health of the application
  ◦ Trends in application traffic and responsiveness
  ◦ Application failure

Machine Learning Helps DevOps

• Decisions are complex
  ◦ Why is the CPU maxed out?
  ◦ What is causing disk thrashing?
  ◦ Why did the network slow down?
  ◦ Why did the application fail?

• Data is massive
  ◦ Potentially thousands of data points a day

How Good Are Decisions?

• Expert versus machine

• Given the same data
  ◦ In many domains they tie

• With additional data, the human can be better

• But machine learning will get better

• Yet it is only as good as its data

We Want to Do Two Things

• Identify trends that may indicate future problems
  ◦ Increasing response times
  ◦ More page errors

• Diagnose faults once they have happened
  ◦ Why did the application fail?
  ◦ How can we fix it as quickly as possible?
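For the first goal, even a plain least-squares line over a recent window of samples can flag "increasing response times" before they breach a limit. This is a minimal Python sketch, not a machine learning model; the slope limit, the 300 ms SLO, the one-hour horizon (12 samples at an assumed 5-minute interval), and the data are hypothetical.

```python
def fit_line(ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * t + intercept for t = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return slope, y_mean - slope * t_mean


def check_trend(ys, max_slope=2.0, slo_ms=300.0, horizon=12):
    """Flag a rising trend and whether the fitted line crosses the SLO `horizon` samples ahead."""
    slope, intercept = fit_line(ys)
    projected = intercept + slope * (len(ys) - 1 + horizon)
    return slope > max_slope, projected > slo_ms


# Made-up p95 response times (ms), one sample every 5 minutes.
creeping = [200, 204, 209, 215, 221, 226, 233, 240]
steady = [200, 203, 198, 201, 199, 202, 200, 201]
print(check_trend(creeping))  # (True, True)
print(check_trend(steady))    # (False, False)
```

None of the `creeping` samples is alarming on its own; it is the fitted slope (about 5.7 ms per sample) that projects a breach within the hour.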

Fixed Algorithms Work for Some Problems

• Immediate performance and failure identification

• Diagnosis of failures and performance issues

• These are readily identifiable from known data

Adaptive Systems Supplement These Tools

• Predictions of future events
  ◦ Performance
  ◦ Availability

• The target is moving
  ◦ So we need current data to adjust the algorithms
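One simple way to forecast from a moving target is Holt's linear (double) exponential smoothing, which revises its estimate of both the current level and the trend with every new sample. This Python sketch is a swapped-in classical technique, not a method the talk names; the smoothing factors `alpha` and `beta` and the data are hypothetical.

```python
from typing import Optional


class HoltForecaster:
    """Holt's linear (double) exponential smoothing, updated one sample at a time."""

    def __init__(self, alpha: float = 0.5, beta: float = 0.3) -> None:
        self.alpha = alpha  # how quickly the level follows new data
        self.beta = beta    # how quickly the trend follows new data
        self.level: Optional[float] = None
        self.trend = 0.0

    def update(self, y: float) -> None:
        if self.level is None:  # the first sample just seeds the level
            self.level = y
            return
        prev_level = self.level
        self.level = self.alpha * y + (1 - self.alpha) * (prev_level + self.trend)
        self.trend = self.beta * (self.level - prev_level) + (1 - self.beta) * self.trend

    def forecast(self, steps_ahead: int = 1) -> float:
        return self.level + steps_ahead * self.trend


# Made-up latency series: flat at 100 ms, then climbing 5 ms per sample.
history = [100.0] * 20 + [100.0 + 5 * k for k in range(1, 21)]
model = HoltForecaster()
for y in history:
    model.update(y)
print(round(model.forecast(10)))  # 250
```

A fixed model fitted only to the flat first half would keep forecasting about 100 ms; the adaptive one picks up the new slope from current data.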

The Machine Helps the DevOps Expert

• The machine learning app provides:
  ◦ Early warning of possible performance issues and failures
  ◦ Immediate notification of failure or impending failure
  ◦ Trend analysis of data to predict unhealthy outcomes

• Machine learning is an assistant
  ◦ It can’t fix anything
  ◦ It can’t necessarily identify the root cause

What is the Goal?

• We have many ways of monitoring
  ◦ Many of them are represented at this conference

• Each measures something a little different
  ◦ Latency, response time, availability, network, DNS, ...

• Too much data can be no better than no data at all

• Machine learning can correlate across measurements
  ◦ Narrowing the focus helps eliminate false positives
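The idea behind cross-measurement correlation can be illustrated with a fixed corroboration rule: alert only when several independent monitors look abnormal at the same time. This Python sketch is a rule-based stand-in, not machine learning; a learned model would instead discover which combinations of signals actually matter. The metric names, the 3-sigma threshold, the two-signal minimum, and the data are hypothetical.

```python
from statistics import mean, stdev


def z_scores(history: dict[str, list[float]], current: dict[str, float]) -> dict[str, float]:
    """How many standard deviations each metric's current value is from its own history."""
    return {name: (current[name] - mean(vals)) / stdev(vals) for name, vals in history.items()}


def corroborated_alert(history, current, z_threshold=3.0, min_signals=2):
    """Alert only when at least `min_signals` measurements look abnormal together."""
    flagged = sorted(name for name, z in z_scores(history, current).items() if abs(z) > z_threshold)
    return len(flagged) >= min_signals, flagged


# Made-up baseline for three monitors.
history = {
    "response_ms": [200, 210, 190, 205, 195, 200],
    "error_pct": [0.5, 0.4, 0.6, 0.5, 0.5, 0.5],
    "dns_ms": [20, 22, 19, 21, 20, 18],
}
dns_blip = {"response_ms": 203, "error_pct": 0.5, "dns_ms": 30}
incident = {"response_ms": 260, "error_pct": 1.2, "dns_ms": 20}
print(corroborated_alert(history, dns_blip))  # (False, ['dns_ms'])
print(corroborated_alert(history, incident))  # (True, ['error_pct', 'response_ms'])
```

The isolated DNS spike is still reported as a flagged metric, but it does not page anyone; the incident, where response time and errors move together, does.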

Intelligent Systems Are Sometimes Wrong

• The problem domain is ambiguous

• There is no single “right” answer
  ◦ “Close enough” is good

• We don’t quite know why the software responds as it does
  ◦ We can’t easily trace code paths

Testing Machine Learning Systems

• Have objective acceptance criteria

• Test with new data

• Don’t count on all results being accurate

• Understand the architecture of the network as a part of the testing process

• Communicate the level of confidence you have in the results to management and users
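"Objective acceptance criteria" and "test with new data" can be made concrete by scoring the model on held-out incidents it never saw in training and comparing the scores with thresholds agreed in advance. This Python sketch uses precision and recall; the 0.8/0.9 thresholds and the labels are hypothetical.

```python
def evaluate(predicted: list[bool], actual: list[bool]) -> dict[str, float]:
    """Precision and recall of 'problem' predictions on held-out data."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


def acceptance_test(predicted, actual, min_precision=0.8, min_recall=0.9) -> bool:
    """Objective, agreed-in-advance criteria; thresholds here are hypothetical."""
    scores = evaluate(predicted, actual)
    return scores["precision"] >= min_precision and scores["recall"] >= min_recall


# Made-up held-out incidents the model never saw during training.
actual = [True, True, True, True, True, False, False, False, False, False]
predicted = [True, True, True, True, False, True, False, False, False, False]
print(evaluate(predicted, actual))         # {'precision': 0.8, 'recall': 0.8}
print(acceptance_test(predicted, actual))  # False
```

With only ten held-out cases, a single extra miss would drop recall from 0.8 to 0.6, which is exactly the kind of uncertainty worth communicating to management and users along with the headline numbers.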

A Cautionary Tale

• All events are not created equal

• AI systems treat events equally
  ◦ A system failure during the busy season counts the same as any other

• DevOps pros know otherwise
  ◦ They can put in extra effort in response
  ◦ And they can actually fix the problem

• We can’t automate what we don’t understand

• You need the human in the loop


Conclusions

• DevOps is a natural environment for machine learning systems
  ◦ Any activity that generates data and requires a decision is fair game

• Monitoring is low-hanging fruit

• Use fixed systems for failure detection and diagnosis, adaptive systems for trend analysis




Thank You

Peter Varhol
peter@petervarhol.com
