Analyzing Your Logs: What are they telling you?

Analyzing Your Logs: What are they telling you? Gerard Ibarra, PhD, November 2008

Post on 15-Sep-2014


DESCRIPTION

Use systems thinking and statistical analysis to learn more about your proprietary applications. Analyze their behavior based on the logs they generate. Determine patterns and trends to obviate system downtimes.

TRANSCRIPT

Page 1: Analyzing Your Logs:  What are they telling you?

Analyzing Your Logs: What are they telling you?

Gerard Ibarra, PhD, November 2008

Page 2: Analyzing Your Logs:  What are they telling you?

Goals
Systems Thinking
Definition of System: This Presentation
Log Analysis
Analysis Summary

Copyright © 2008 Buildwave Technologies, Inc. All rights reserved. 2

Page 3: Analyzing Your Logs:  What are they telling you?

Think systems first
Use statistics to understand what is going on
Get a better picture with charts
Include control charts to monitor the system

Page 4: Analyzing Your Logs:  What are they telling you?

“A system is an assemblage or combination of elements or parts forming a complex or unitary whole;…” (Blanchard, B. S., and Fabrycky, W. J., Systems Engineering and Analysis (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall, 1990)


Page 5: Analyzing Your Logs:  What are they telling you?

Systems could be any of the following:
◦ A transportation network moving items from one place to another – dynamic
◦ A bridge used to connect places together – static
◦ A set of unmanned aerial vehicles (UAV) located in a strategic region providing intelligence – dynamic
◦ A group of applications and servers acting together to perform a service – dynamic
◦ A motor for a car – static/dynamic

Page 6: Analyzing Your Logs:  What are they telling you?

Systems today are more complex than before (Using Systems Engineering to Improve RMS&L Requirements, A Government-Industry Training Workshop, various discussions, Springfield VA: November 12-13, 2008)


Page 7: Analyzing Your Logs:  What are they telling you?

Changes in one part of the system affect the system as a whole:
◦ More items to move – extra resources to process
◦ Increased traffic – longer times to cross the bridge
◦ Reduction in UAVs – changes strategies if the mission remains the same
◦ Server down – increases load; possible sales loss
◦ New and improved parts – increase inventory to maintain both motors

Page 8: Analyzing Your Logs:  What are they telling you?

Why think systems for your network?
◦ Because changes made to its parts affect its overall mission and ultimately the business as a whole. For example, the items below affect how the system operates, which in turn affects how the company can conduct its business.
    Adding or removing applications
    Modifying software/hardware configuration
    Adding or removing hardware from operations
    Improving, adding, or deleting features

Page 9: Analyzing Your Logs:  What are they telling you?

For this presentation, a system is the aggregation of applications, servers, and services working in unison to produce a common function for the use, goals, sustainment, and operations of the company

Page 10: Analyzing Your Logs:  What are they telling you?

Various ways to analyze logs: Examples
◦ Statistical
    Central Tendency
    Variation
    Skewness
    Kurtosis

Page 11: Analyzing Your Logs:  What are they telling you?

Examples Continued:
◦ Graphical
    Bar Chart
    Line Chart
    Pie Chart
    Control Charts

Page 12: Analyzing Your Logs:  What are they telling you?

Statistical – Central Tendency
◦ Determine how much central tendency there is in the log data
    Know the average number of events occurring in a system – used for a quick check of how the system is currently operating
    Compare the average number of events occurring over time – see if there are any patterns
    Look at the startup of a process – determine if the number of errors differs as time progresses

Page 13: Analyzing Your Logs:  What are they telling you?

Statistical – Central Tendency Example
◦ Use the following analytics to generate the report
    Mean
    Median
    Mode
    Quartiles

[Report screenshot: First Hour (based on 1-min aggregations over 1-hour periods)]
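The 1-minute aggregation behind this report can be sketched as follows. This is a minimal illustration, not the method the reporting tool uses: the function name `per_minute_counts` and the sample timestamps are hypothetical, and it assumes the error timestamps have already been parsed out of the log.

```python
from collections import Counter
from datetime import datetime


def per_minute_counts(timestamps, hour_start):
    """Return 60 counts, one per minute of the hour that begins at hour_start."""
    counts = Counter()
    for ts in timestamps:
        # Minute offset from the start of the hour (negative if before it).
        offset = int((ts - hour_start).total_seconds() // 60)
        if 0 <= offset < 60:
            counts[offset] += 1
    # Minutes with no events are reported as 0.
    return [counts[minute] for minute in range(60)]


# Hypothetical error timestamps, for illustration only.
hour_start = datetime(2008, 11, 3, 9, 0)
errors = [
    datetime(2008, 11, 3, 9, 0, 5),
    datetime(2008, 11, 3, 9, 0, 40),
    datetime(2008, 11, 3, 9, 2, 10),
    datetime(2008, 11, 3, 9, 59, 59),
    datetime(2008, 11, 3, 10, 0, 0),  # falls outside the hour and is ignored
]
counts = per_minute_counts(errors, hour_start)
```

The resulting list of 60 counts is the kind of input the mean, median, mode, and quartile calculations operate on.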

Page 14: Analyzing Your Logs:  What are they telling you?

◦ The mean is 2.3333 – this is the average number of times the error occurred per one-minute interval over the hour; anything more than this should raise a flag when comparing the same event over the same hour on other days

Example

        Day1  Day2  Day3  Day4  Day5
Mean    2.33  2.35  2.21  7.45  2.41
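A day-to-day comparison like this can be automated with a simple threshold. The sketch below uses the five daily means from the example; the 50% tolerance above the baseline is an assumption chosen for illustration, since the slide does not say how far above the mean a day must be before it is flagged.

```python
BASELINE = 2.3333   # mean for the reference hour, from the slide
TOLERANCE = 0.5     # hypothetical: flag anything more than 50% above baseline

# Daily means for the same hour, from the example table.
day_means = {"Day1": 2.33, "Day2": 2.35, "Day3": 2.21, "Day4": 7.45, "Day5": 2.41}


def flag_days(means, baseline, tolerance):
    """Return the days whose mean exceeds the baseline by more than the tolerance."""
    limit = baseline * (1 + tolerance)
    return [day for day, value in means.items() if value > limit]


flagged = flag_days(day_means, BASELINE, TOLERANCE)
print(flagged)  # ['Day4']
```

A tolerance of zero would also flag Day2 and Day5, which sit only slightly above the baseline, so some margin is usually needed to avoid noise.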

Page 15: Analyzing Your Logs:  What are they telling you?

◦ The median is 2 – this is the midpoint of the number of events per minute over the hour; it should be somewhat close to the mean unless the data is skewed

Example
Data (1, 1, 1, 1, 1, 1, 2, 20, 21, 22, 23, 24, 25)
Mean = 11
Median = 2
The mean is over five times the median – this should raise a flag; notice that the data is split between the ones and the twenties
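The mean-versus-median check can be reproduced with Python's standard `statistics` module. The data is the example set from the slide; the ratio limit of 2.0 is a hypothetical threshold, not something the presentation specifies.

```python
from statistics import mean, median

# Example data from the slide.
data = [1, 1, 1, 1, 1, 1, 2, 20, 21, 22, 23, 24, 25]

avg = mean(data)      # 11
mid = median(data)    # 2
ratio = avg / mid     # 5.5

RATIO_LIMIT = 2.0     # hypothetical threshold for "mean far from median"
skew_flag = ratio > RATIO_LIMIT
```

A ratio check like this fails when the median is zero, so a production version would need to handle that case separately.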

Page 16: Analyzing Your Logs:  What are they telling you?

◦ The mode is 1 – this is the most frequently occurring number of events based on one-minute aggregations over the hour; it shows where most of the data comes from and should make some sense with respect to the mean, the median, or both

Example
Data (1, 1, 1, 1, 1, 1, 5, 17, 19, 21, 23, 25, 27)
Mode = 1
Mean = 11
Median = 5

There is a wide variation between the three indices – this should raise a flag

Page 17: Analyzing Your Logs:  What are they telling you?

◦ The lower and upper quartiles are 1 and 3.5 – these are the medians of the lower half and upper half of the data, based on the Moore and McCabe or “M-and-M method” (there are various ways to calculate the quartiles)

Example
Data (0, 0, 0, 0, 0, 0, 5, 17, 19, 21, 23, 25, 27)
LQ = 0; UQ = 22
Mean ≈ 10.5
The mean is far from the LQ in terms of percentages – this should raise a flag; it could show that at the start of the period the number of errors was nil, and as time increased, so did the errors
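The M-and-M quartile calculation can be sketched as below: sort the data, leave out the overall median when the count is odd, and take the median of each half. The function name `quartiles_mm` is hypothetical, and other quartile conventions will give different values for the same data.

```python
from statistics import median


def quartiles_mm(values):
    """Lower and upper quartiles by the Moore and McCabe method.

    The overall median is excluded from both halves when the number of
    values is odd. Requires at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(upper)


# Example data from the slide.
data = [0, 0, 0, 0, 0, 0, 5, 17, 19, 21, 23, 25, 27]
lq, uq = quartiles_mm(data)
print(lq, uq)  # 0.0 22.0
```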

Page 18: Analyzing Your Logs:  What are they telling you?

Statistical – Variation
◦ Determine how much the log data varies from the mean
    The closer the data is to the mean, the less the system varies
    The less variation, typically the smoother the system operates

Page 19: Analyzing Your Logs:  What are they telling you?

Statistical – Variation Example
◦ Use the following analytics to generate the report
    Mean
    Variation

[Report screenshot: First Hour (based on 1-min aggregations over 1-hour periods)]

Page 20: Analyzing Your Logs:  What are they telling you?

◦ The mean is 2.3333 and the standard deviation is 1.91195 – the standard deviation is the amount of spread around the mean, expressed in the original units

Example
Mean = 45
StdDev = 41
The standard deviation is almost as large as the mean – this should raise a flag (note that the company could define this type of behavior as normal)
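One way to turn "the standard deviation is almost as large as the mean" into a check is the ratio of the two (the coefficient of variation). The sketch below is an assumption-laden illustration: the limit of 0.75 and the sample counts are hypothetical, and it uses the population standard deviation, whereas the slide does not say which form its report uses.

```python
from statistics import mean, pstdev

CV_LIMIT = 0.75  # hypothetical threshold; a company may define its own "normal"


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (mean must be nonzero)."""
    return pstdev(values) / mean(values)


def is_flagged(avg, std_dev, limit=CV_LIMIT):
    """True when the spread is large relative to the mean."""
    return std_dev / avg > limit


# Hypothetical per-minute counts with mean 5 and standard deviation 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
cv = coefficient_of_variation(sample)   # 0.4, not flagged

# The figures from the slide: mean 45, standard deviation 41.
slide_flag = is_flagged(45, 41)         # ratio is about 0.91, flagged
```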

Page 21: Analyzing Your Logs:  What are they telling you?

Statistical – Skewness and Kurtosis
◦ Try to find out the type of distribution the system generates
    Learn if the data is normal – good for predictions
    See how the system operates – determine if there are modes during certain periods

Page 22: Analyzing Your Logs:  What are they telling you?

Statistical – Skewness and Kurtosis Example
◦ Use the following analytics to generate the report
    Statistics

[Report screenshot: based on 1-hour aggregations over the range of the data]

Page 23: Analyzing Your Logs:  What are they telling you?

◦ The Skewness is -2.34592 – this is a measure of the symmetry of the distribution (negative means that it skews to the left and positive to the right)

◦ The Kurtosis is 8.49086 – this is the measure of how peaked the distribution is (the larger the number, the more “peaked”)

[Figure: a normal curve compared with a distribution that is skewed left and highly peaked; the mean (μ) and the region with a significant number of events are marked]

Example of a possible distribution: most of the events take place at the start of the process and peak in a short interval

Page 24: Analyzing Your Logs:  What are they telling you?

◦ A Skewness of 0.0 and a Kurtosis of 3.0 match an ideal normal distribution – great for predicting possible outcomes

[Figure: normal curve with the mean (μ) and standard deviation (σ) marked, shown against the X and Z axes]
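Skewness and kurtosis can be computed from the central moments of the data. The sketch below uses the plain population formulas, with kurtosis on the scale where a normal distribution gives 3.0; the presentation does not state which estimator its report uses, and sample-corrected formulas give somewhat different numbers. The function name and the test data are hypothetical.

```python
from statistics import fmean


def skewness_kurtosis(values):
    """Moment-based skewness and (non-excess) kurtosis.

    Assumes the values are not all identical, otherwise the variance is zero.
    """
    n = len(values)
    mu = fmean(values)
    m2 = sum((x - mu) ** 2 for x in values) / n  # variance
    m3 = sum((x - mu) ** 3 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Symmetric data gives zero skewness.
skew_sym, kurt_sym = skewness_kurtosis([1, 2, 3, 4, 5])

# A long tail toward small values gives negative skewness (skewed left).
skew_left, _ = skewness_kurtosis([1, 10, 10, 10, 10])
```

Values far from 0.0 and 3.0 suggest that predictions based on a normal distribution should be treated with caution.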

Page 25: Analyzing Your Logs:  What are they telling you?

Graphical – Bar Charts
◦ View the errors based on different periods
◦ Understand the behavior of the systems better

[Bar chart annotations: most errors on Day 2; least number of errors at 6:00 am and 5:00 pm; two instances of almost zero errors on Day 5]

Page 26: Analyzing Your Logs:  What are they telling you?

Graphical – Line Charts
◦ Get a clearer perspective on the error rates
◦ View same data, but from a different perspective

[Line chart annotations: most errors on Day 2; least number of errors at 6:00 am and 5:00 pm; two instances of almost zero errors on Day 5]

Page 27: Analyzing Your Logs:  What are they telling you?

Graphical – Line Charts
◦ Use it to forecast

[Line chart annotations: follows the same trend based on periods (Aug01 – Sep01 and Aug02 – Sep02); shows an upward trend]
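A simple way to quantify an upward trend and project it forward is a least-squares straight line. This is only a sketch under stated assumptions: the weekly error counts are hypothetical, the function name `fit_line` is invented for illustration, and a straight line ignores the repeating period-to-period pattern the chart also shows.

```python
def fit_line(ys):
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ...

    Requires at least two observations.
    """
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical error counts for five consecutive periods.
period_errors = [10, 12, 14, 16, 18]
slope, intercept = fit_line(period_errors)

# Project the next period (x = 5).
forecast = intercept + slope * len(period_errors)
print(slope, forecast)  # 2.0 20.0
```

A positive slope corresponds to the upward trend noted on the chart.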

Page 28: Analyzing Your Logs:  What are they telling you?

Graphical – Pie Charts
◦ Compare to other events
◦ Compare to system as a whole

[Pie chart annotations: errors account for less than 2% of the events in the system; a significant number of errors occurring based on the number of warnings]

Page 29: Analyzing Your Logs:  What are they telling you?

Graphical – Control Charts
◦ Monitor the system or individual subsystems
◦ Anticipate possible problems

[Control chart annotations: out of compliance; trending upwards, try to keep it from going above the UCL again]
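The control limits on such a chart can be sketched as the baseline mean plus or minus three standard deviations. This is an assumption: the presentation does not say how its UCL is computed, and charts for count data often use other formulas. The baseline and new counts below are hypothetical.

```python
from statistics import mean, pstdev


def control_limits(baseline, sigmas=3.0):
    """Return (LCL, center line, UCL) from an in-control baseline period."""
    center = mean(baseline)
    spread = pstdev(baseline)
    lcl = max(0.0, center - sigmas * spread)  # event counts cannot go below zero
    ucl = center + sigmas * spread
    return lcl, center, ucl


def out_of_compliance(points, lcl, ucl):
    """Indexes of the points that fall outside the control limits."""
    return [i for i, value in enumerate(points) if value > ucl or value < lcl]


# Hypothetical in-control counts: mean 6, standard deviation 2.
baseline = [3, 5, 5, 5, 6, 6, 8, 10]
lcl, center, ucl = control_limits(baseline)   # 0.0, 6, 12.0

# Hypothetical new observations; only the third exceeds the UCL.
new_counts = [6, 7, 13, 5, 12]
violations = out_of_compliance(new_counts, lcl, ucl)
print(violations)  # [2]
```

Anticipating problems, as opposed to detecting them, would also mean watching for runs of points trending toward the UCL before any of them crosses it.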

Page 30: Analyzing Your Logs:  What are they telling you?

Use analytics and charting to help view and understand what the system and its subsystems may be doing
◦ Look for
    Abnormalities
    Deviations
    Compliances
◦ Learn how to
    Predict
    Anticipate
    Forecast

Page 31: Analyzing Your Logs:  What are they telling you?


Most of the chart and result screen shots shown in this presentation were created in Violog. http://www.buildwave.com/violog
