Improvement of Log Pattern Extracting Algorithm Using Text Similarity ZHAO Yining Computer Network Information Center, Chinese Academy of Sciences in HPBDC18, 2018/05/21


Page 1: Improvement of Log Pattern Extracting Algorithm Using Text

Improvement of Log Pattern Extracting Algorithm Using Text Similarity

ZHAO Yining, Computer Network Information Center,

Chinese Academy of Sciences, in HPBDC18, 2018/05/21

Content

❖ CNGrid & LARGE
❖ Why Log Patterns & Extracting Algorithm
❖ Algorithm of Identical Word Rate
❖ Text Similarity Based Approach
   ➢ Improved Extracting Formation & LCS
   ➢ Experiment Result
❖ Modified Log Comparing Model
❖ Summary & Future Work

CNGrid & LARGE

❖ China National HPC Environment
   2 Operating Centers (Beijing/Hefei)
   19 Sites (200 PF + 162 PB)
   Portal with Micro-Service Architecture
   Application-oriented Global Scheduling & Predicting
   Resource Evaluation Standard & Comprehensive Evaluation Index

CNGrid & LARGE

❖ Log Analyzing Framework in Grid Environment

Log Patterns & Extracting Algorithm

❖ We want to be alerted to logs that follow certain patterns, but…
   ➢ there are too many logs for humans to read
   ➢ patterns must be summarized before alert rules can be defined

❖ A set of log patterns, in our context:
   ➢ the patterns are different from each other
   ➢ they cover all logs in the original set
   ➢ there are significantly fewer patterns than original logs

❖ The process of using log patterns
   ➢ filter out frequent normal logs
   ➢ use a log pattern extraction algorithm to get the set of patterns
   ➢ manually check the set and pick out abnormal patterns
   ➢ define rules to generate alerts for these patterns
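The three properties a pattern set should have can be phrased as simple checks. This is a minimal sketch, assuming a symmetric predicate `same_pattern(a, b)` decides whether two logs belong to one pattern, and reducing "significantly fewer" to plain "fewer". `check_pattern_set` and the first-word predicate in the usage are hypothetical placeholders, not the matching tests from these slides.

```python
from typing import Callable, Dict, Sequence


def check_pattern_set(
    logs: Sequence[str],
    patterns: Sequence[str],
    same_pattern: Callable[[str, str], bool],
) -> Dict[str, bool]:
    """Check the pattern-set properties; same_pattern is an assumed symmetric predicate."""
    # Patterns are different from each other: no pair of patterns matches.
    distinct = all(
        not same_pattern(patterns[i], patterns[j])
        for i in range(len(patterns))
        for j in range(i + 1, len(patterns))
    )
    # Covering: every log in the original set matches at least one pattern.
    covering = all(any(same_pattern(log, p) for p in patterns) for log in logs)
    # Size: simplified here to "fewer patterns than logs".
    fewer = len(patterns) < len(logs)
    return {"distinct": distinct, "covering": covering, "fewer": fewer}


def first_word_match(a: str, b: str) -> bool:
    # Hypothetical placeholder predicate, for illustration only.
    return a.split()[:1] == b.split()[:1]


logs = ["sshd session opened", "sshd session closed", "cron job started"]
print(check_pattern_set(logs, ["sshd session opened", "cron job started"], first_word_match))
# {'distinct': True, 'covering': True, 'fewer': True}
```

Swapping `first_word_match` for an IWR or text-similarity test turns this into a sanity check on the output of an extraction algorithm.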

Algorithm of Identical Word Rate

❖ Algorithm of identical word rate (IWR): a straightforward approach
   ➢ identical words
      • 2 words that are identical
      • and in the same position in the 2 original logs
   ➢ identical word rate
      • IWR = (number of identical words) / (total words)
      • predefined threshold t
      • if IWR is greater than t, the two logs belong to one pattern

❖ Process of the IWR algorithm
   ➢ set the threshold t and an initially empty pattern set P
   ➢ for each new incoming log, compute its IWR with each pattern in P
   ➢ if a pattern matches, skip to the next log; if none matches, add the log to P

❖ Significant limitation
   ➢ logs of different lengths have an IWR of ZERO!
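The IWR definition and the greedy process above can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and returning 0 for logs of different lengths, as the limitation bullet states; `iwr`, `extract_patterns_iwr` and the sample sshd-style lines are hypothetical.

```python
from typing import List


def iwr(log_a: str, log_b: str) -> float:
    """Identical word rate: identical words in the same position / total words."""
    a, b = log_a.split(), log_b.split()  # whitespace tokenization (assumption)
    # As the slide notes, logs of different lengths get an IWR of zero.
    if len(a) != len(b) or not a:
        return 0.0
    identical = sum(1 for x, y in zip(a, b) if x == y)
    return identical / len(a)


def extract_patterns_iwr(logs: List[str], t: float) -> List[str]:
    """Greedy extraction: a log is skipped if its IWR with some pattern exceeds t."""
    patterns: List[str] = []  # initially empty pattern set P
    for log in logs:
        if any(iwr(log, p) > t for p in patterns):
            continue  # matched an existing pattern: skip to the next log
        patterns.append(log)  # no pattern matched: the log becomes a new pattern
    return patterns


logs = [
    "Failed password for root from 10.0.0.7 port 22 ssh2",
    "Failed password for admin from 10.0.0.9 port 22 ssh2",
    "Failed password for invalid user bob from 10.0.0.7 port 22 ssh2",
]
# The second log matches the first (IWR = 7/9); the third has a different
# length, so its IWR is 0 and it becomes a separate pattern.
print(extract_patterns_iwr(logs, t=0.7))
```

The third log is clearly the same kind of event as the first, which is exactly the limitation the slide points out.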

Text Similarity Based Approach (1)

❖ Using text similarity to resolve the problem
   ➢ S = P × O
   ➢ S: similarity, P: proportion of common words, O: order factor

❖ Two logs l1 and l2, with L1 and L2 their respective word sets
   ➢ define P: P(l1, l2) = (|L1 ∩ L2| × 2) / (|L1| + |L2|)
   ➢ define O: O(l1, l2) = SeqSim(l1, l2) / |L1 ∩ L2|
   ➢ hence S = P × O: S(l1, l2) = (SeqSim(l1, l2) × 2) / (|L1| + |L2|), since the |L1 ∩ L2| factors cancel

❖ This way, logs of different lengths can be compared
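A small worked example of P, O and S on two logs of different lengths. This is a sketch under stated assumptions: SeqSim is instantiated with the LCS length (as the next slide does), words come from whitespace splitting, and |L1 ∩ L2| is taken as a multiset intersection so that duplicate words are counted and O never exceeds 1. `similarity_parts` and `lcs_len` are hypothetical names.

```python
from collections import Counter
from typing import List, Tuple


def lcs_len(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def similarity_parts(l1: str, l2: str) -> Tuple[float, float, float]:
    """Return (P, O, S) for two logs, with SeqSim taken as the LCS length."""
    w1, w2 = l1.split(), l2.split()
    total = len(w1) + len(w2)
    if total == 0:
        return 0.0, 0.0, 0.0
    # |L1 ∩ L2| as a multiset intersection (assumption), so SeqSim <= common.
    common = sum((Counter(w1) & Counter(w2)).values())
    seq_sim = lcs_len(w1, w2)
    p = common * 2 / total
    o = seq_sim / common if common else 0.0
    s = seq_sim * 2 / total  # equals p * o: the |L1 ∩ L2| factors cancel
    return p, o, s


l1 = "Failed password for root from 10.0.0.7 port 22 ssh2"             # 9 words
l2 = "Failed password for invalid user bob from 10.0.0.7 port 22 ssh2"  # 11 words
print(similarity_parts(l1, l2))  # (0.8, 1.0, 0.8), while IWR would be 0
# Same words in reverse order: P = 1.0, but the order factor drops O and S to 1/3.
print(similarity_parts("open file done", "done file open"))
```

The second call shows why O is needed: P alone cannot tell a reordered log from the original.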

Text Similarity Based Approach (2)

❖ Using the Longest Common Subsequence (LCS) to define SeqSim(l1, l2)
   ➢ S(l1, l2) = (|LCS(l1, l2)| × 2) / (|L1| + |L2|)
   ➢ l1 and l2 belong to the same pattern if S(l1, l2) ≥ t, where t is the predefined threshold

❖ The process of the improved log pattern extracting algorithm
   ➢ set the threshold value t, and set the initial log pattern set P to an empty set
   ➢ for each new log l from the input log set L, compute Si(l, pi) between l and every pi ∈ P using an LCS algorithm
   ➢ if no Si(l, pi) ≥ t, add l to P
   ➢ after all logs in L have been checked, return P

❖ Increases the time cost of a single comparison
   ➢ but reduces the total number of comparisons, since fewer patterns end up in P
   ➢ the extra cost can be offset by choosing a better LCS algorithm
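The improved extraction loop can be sketched as below. This is a minimal illustration, not the authors' implementation: the LCS routine is the textbook dynamic program (O(m·n) time, two rows of memory), since the slides do not say which LCS algorithm was used, and `similarity` and `extract_patterns_lcs` are hypothetical names operating on whitespace-split words.

```python
from typing import List


def lcs_len(a: List[str], b: List[str]) -> int:
    """Textbook LCS dynamic program: O(len(a) * len(b)) time, two rows of memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def similarity(l1: str, l2: str) -> float:
    """S(l1, l2) = 2 * |LCS(l1, l2)| / (|L1| + |L2|), over whitespace-split words."""
    w1, w2 = l1.split(), l2.split()
    if not w1 and not w2:
        return 1.0  # two empty logs are treated as identical
    return 2 * lcs_len(w1, w2) / (len(w1) + len(w2))


def extract_patterns_lcs(logs: List[str], t: float) -> List[str]:
    """Add a log to P only if no existing pattern reaches similarity t."""
    patterns: List[str] = []  # the initial pattern set P is empty
    for log in logs:
        if not any(similarity(log, p) >= t for p in patterns):
            patterns.append(log)
    return patterns  # returned after every log in L has been checked


logs = [
    "Failed password for root from 10.0.0.7 port 22 ssh2",
    "Failed password for admin from 10.0.0.9 port 22 ssh2",
    "Failed password for invalid user bob from 10.0.0.7 port 22 ssh2",
    "Accepted publickey for alice from 10.0.0.5 port 22 ssh2",
]
print(extract_patterns_lcs(logs, t=0.7))  # keeps logs[0] and logs[3]
```

On this toy input the IWR rule with the same threshold would keep three patterns, because the third log has a different length; with fewer patterns in P, each later log needs fewer comparisons.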

Text Similarity Based Approach (3)

❖ Experiment result
   ➢ numbers of extracted patterns

Text Similarity Based Approach (3)

❖ Experiment result
   ➢ time costs of candidate algorithms (in milliseconds)

Modified Pattern Comparing Model (1)

❖ The original model has a high time cost for searching patterns
   ➢ it has to visit the patterns one by one until the matching one is found

❖ Use a hash map to accelerate matching
   ➢ divide the pattern set into subsets by initial word
   ➢ skip the majority of patterns, which sit in irrelevant subsets

❖ Matching process:
   1. get the initial word of the log
   2. hash the word
   3. find the desired subset in the hash map
   4. compare the log with the patterns in the subset
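The four matching steps above can be sketched with a Python dict, which hashes the key internally, so steps 2 and 3 collapse into one lookup. This is a hypothetical illustration: `InitialWordIndex` and its `comparisons` counter are invented for the example, the LCS-based similarity from the earlier slides is reused as the comparison, and timestamp or host prefixes are assumed to have been stripped so the first word is meaningful.

```python
from typing import Dict, List, Optional


def lcs_len(a: List[str], b: List[str]) -> int:
    """Textbook LCS dynamic program over word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def similarity(w1: List[str], w2: List[str]) -> float:
    """LCS-based similarity S on word lists."""
    if not w1 and not w2:
        return 1.0
    return 2 * lcs_len(w1, w2) / (len(w1) + len(w2))


class InitialWordIndex:
    """Pattern set split into subsets keyed by each pattern's initial word."""

    def __init__(self, t: float) -> None:
        self.t = t
        self.subsets: Dict[str, List[List[str]]] = {}
        self.comparisons = 0  # similarity computations performed (for illustration)

    def add(self, pattern: str) -> None:
        words = pattern.split()
        key = words[0] if words else ""
        self.subsets.setdefault(key, []).append(words)

    def match(self, log: str) -> Optional[str]:
        words = log.split()
        initial = words[0] if words else ""  # 1. get the initial word of the log
        subset = self.subsets.get(initial, [])  # 2-3. dict lookup hashes the word
        for pattern in subset:  # 4. compare only with patterns in this subset
            self.comparisons += 1
            if similarity(words, pattern) >= self.t:
                return " ".join(pattern)
        return None


index = InitialWordIndex(t=0.7)
for p in [
    "Failed password for root from 10.0.0.7 port 22 ssh2",
    "Accepted publickey for alice from 10.0.0.5 port 22 ssh2",
    "session opened for user root by (uid=0)",
]:
    index.add(p)
print(index.match("Failed password for admin from 10.0.0.9 port 22 ssh2"))
print(index.comparisons)  # 1: the other two subsets are never visited
```

A log whose initial word has no subset is rejected after a single dict lookup, without any similarity computation.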

Modified Pattern Comparing Model (2)

❖ This approach cannot deal with patterns whose initial words are not fixed
   ➢ build a separate set for such unfixed patterns

❖ In the real system, we split the pattern set into 4 parts:
   ➢ fixed alert pattern set
   ➢ unfixed alert pattern set
   ➢ fixed normal pattern set
   ➢ unfixed normal pattern set

❖ When a new log comes in, it is compared against the 4 sets in turn to decide how it should be processed
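One way to combine the hash map with the unfixed sets is sketched below, consulting the four sets in the order listed. This is a hypothetical sketch: `FourSetMatcher`, the `fixed_initial` flag set when a pattern is added, and the returned labels "alert", "normal" and "unmatched" are assumptions, since the slides do not say how patterns are tagged as fixed or unfixed or what processing each outcome triggers. Fixed sets are keyed by initial word; unfixed sets cannot be, so they are scanned linearly.

```python
from typing import Dict, List, Tuple


def lcs_len(a: List[str], b: List[str]) -> int:
    """Textbook LCS dynamic program over word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def similarity(w1: List[str], w2: List[str]) -> float:
    """LCS-based similarity S on word lists."""
    if not w1 and not w2:
        return 1.0
    return 2 * lcs_len(w1, w2) / (len(w1) + len(w2))


class FourSetMatcher:
    """Hypothetical four-set model: fixed sets are hash-indexed, unfixed sets scanned."""

    # Order in which the sets are consulted, as listed on the slide.
    ORDER: List[Tuple[str, bool]] = [
        ("alert", True), ("alert", False), ("normal", True), ("normal", False)
    ]

    def __init__(self, t: float) -> None:
        self.t = t
        self.fixed: Dict[str, Dict[str, List[List[str]]]] = {"alert": {}, "normal": {}}
        self.unfixed: Dict[str, List[List[str]]] = {"alert": [], "normal": []}

    def add(self, pattern: str, kind: str, fixed_initial: bool) -> None:
        words = pattern.split()
        if fixed_initial:
            self.fixed[kind].setdefault(words[0] if words else "", []).append(words)
        else:
            self.unfixed[kind].append(words)  # cannot be keyed by initial word

    def classify(self, log: str) -> str:
        words = log.split()
        initial = words[0] if words else ""
        for kind, fixed in self.ORDER:
            if fixed:
                candidates = self.fixed[kind].get(initial, [])
            else:
                candidates = self.unfixed[kind]
            if any(similarity(words, p) >= self.t for p in candidates):
                return kind
        return "unmatched"


m = FourSetMatcher(t=0.7)
m.add("Failed password for root from 10.0.0.7 port 22 ssh2", "alert", fixed_initial=True)
m.add("Accepted publickey for alice from 10.0.0.5 port 22 ssh2", "normal", fixed_initial=True)
m.add("node-3 heartbeat received status ok", "normal", fixed_initial=False)
print(m.classify("Failed password for admin from 10.0.0.9 port 22 ssh2"))  # alert
print(m.classify("node-7 heartbeat received status ok"))  # normal, via the unfixed set
print(m.classify("kernel panic"))  # unmatched
```

Checking the alert sets first means a log that resembles both an alert and a normal pattern is reported as an alert; whether the real system intends that tie-breaking is not stated.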

Modified Pattern Comparing Model (3)

❖ Real time cost comparison between the original & modified models

[Figure: time cost in milliseconds, original model vs. modified model; four panels: cron, maillog, secure, messages]

Summary & Future Work

❖ Log patterns: used to build log recognition
❖ The IWR algorithm is not capable of matching logs of different lengths
❖ Use the idea of text similarity and LCS to improve the algorithm
❖ Modify the log comparing model to accelerate the process

❖ Future work: log pattern based analyses in CNGrid
   ➢ log pattern associations
   ➢ log flow feature modeling