Retrieving Relevant Reports from a Customer Engagement Repository Dharmesh Thakkar Zhen Ming Jiang Ahmed E. Hassan School of Computing, Queen’s University, Canada Gilbert Hamann Parminder Flora Research In Motion (RIM), Canada


Page 1:

Retrieving Relevant Reports from a Customer Engagement Repository

Dharmesh Thakkar, Zhen Ming Jiang, Ahmed E. Hassan
School of Computing, Queen’s University, Canada

Gilbert Hamann, Parminder Flora
Research In Motion (RIM), Canada

Page 2:

Software Maintenance: Customer Support

[Diagram: a support analyst creates a Customer Engagement Report and stores it in the Customer Engagement Repository. The report contains symptoms, identified problems, attempted workarounds, solutions, and the application's execution logs.]

Page 3:

Retrieving Relevant Reports

■ State of Practice:
– No systematic techniques to retrieve and use information for future engagements
– Keyword searching is limited:
• depends on the search skills and experience of the analyst and peculiarity of the problem

Page 4:

Customer Support Problem Statement

■ We want to find customers with similar operational and problem profiles

■ We can reuse prior solutions and knowledge

[Diagram: a New Customer Engagement is compared with Other Customers. Profile labels shown:
Heavy Email, Light Web, Light MDS
Light Email, Light Web, Light MDS
Light Email, Heavy Web, Light MDS
Heavy Email, Heavy Web, No MDS
Light Email, Light Web, Heavy MDS]

Page 5:

Using Logs for Customer Support

■ Execution logs are readily available and contain
– Operational Profile: usage patterns (heavy users of email from device, or to device, or light users of calendar, etc.)
– Signature Profile: specific error line patterns (connection timeout, database limits, messages queued up, etc.)

■ Find the most similar profile

Page 6:

Execution Logs

■ Contain time-stamped sequence of events at runtime

■ Readily available representatives of both feature executions and problems

<time> Queuing new mail msgid=ABC threadid=XYZ
<time> Instant message. Sending packet to client msgid=ABC threadid=XYZ
<time> New meeting request msgid=ABC threadid=XYZ
<time> Client established IMAP session emailid=ABC threadid=XYZ
<time> Client disconnected. Cannot deliver msgid=ABC threadid=XYZ
<time> New contact in address book emailid=ABC threadid=XYZ
<time> User initiated appointment deletion emailid=ABC threadid=XYZ

Page 7:

Example

[Figure: bar charts of event counts per event type (MTH, MTH more requests, MFH, View Calendar, Meeting Request, Appointment sync, Message move, Contacts, HTTP, MDS) for Other Customers C1, C2, and C3, compared with the chart for a New Support Request.]

Page 8:

Our Technique

[Diagram:
Customer Execution Logs → Convert Log Lines to Event Distribution → Event Distribution
Event Distribution + Customer Engagement Repository → Compare Event Distributions → Closest Customer Engagement Reports wrt Operational Profile
Event Distribution → Identify Signature Events → Signature Event Distribution
Signature Event Distribution + Customer Engagement Repository → Compare Event Distributions → Closest Customer Engagement Reports wrt Signature Profile
Both sets of reports form the OUTPUT RESULT SET.]

Page 9:

Log Lines to Event Distribution

■ Remove dynamic information
– Example: Given the two log lines “Open inbox user=A” and “Open inbox user=B”, map both lines to the event “Open inbox user=?”

■ Use event percentages to compare event logs for different running lengths without bias
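
The two steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes dynamic values always appear as key=value tokens (as in the slide's example), and the names `to_event` and `event_distribution`, as well as the "Send mail" log line, are hypothetical.

```python
import re
from collections import Counter

# Assumption: dynamic information shows up as key=value tokens,
# as in the slide's "Open inbox user=A" example.
DYNAMIC_VALUE = re.compile(r"(\w+)=\S+")


def to_event(log_line):
    """Replace each key=value with key=? so lines that differ only in
    their dynamic values map to the same event."""
    return DYNAMIC_VALUE.sub(r"\1=?", log_line.strip())


def event_distribution(log_lines):
    """Return {event: percentage of all non-empty log lines}."""
    counts = Counter(to_event(line) for line in log_lines if line.strip())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {event: 100.0 * n / total for event, n in counts.items()}


# Hypothetical logs: the same usage pattern, one log 50 times longer.
short_log = [
    "Open inbox user=A",
    "Open inbox user=B",
    "Send mail user=A",
    "Open inbox user=C",
]
long_log = short_log * 50

print(event_distribution(short_log))
print(event_distribution(short_log) == event_distribution(long_log))
```

Because the distribution is expressed in percentages rather than raw counts, the short and the long log produce the same profile, which is the "without bias" property the slide refers to.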


Page 10:

Compare Event Distributions


[Figure: event distributions D1, D2, D3]

■ Kullback-Leibler Divergence

■ Cosine Distance
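
Both measures can be computed directly from two event distributions. The sketch below is illustrative only: the slides do not say how events missing from one distribution are handled, so the small additive smoothing constant `eps` is an assumption, and the distributions `d1`, `d2`, `d3` are made-up values. Note that Kullback-Leibler divergence is not symmetric, so the argument order matters.

```python
import math


def _aligned(p, q):
    """Put two {event: weight} dicts on the same, ordered event list."""
    events = sorted(set(p) | set(q))
    return [p.get(e, 0.0) for e in events], [q.get(e, 0.0) for e in events]


def kl_divergence(p, q, eps=1e-9):
    """D(P || Q) = sum_i p_i * log(p_i / q_i).

    Assumption: eps is added to every entry before normalising so that
    events absent from one distribution do not cause division by zero.
    """
    pv, qv = _aligned(p, q)
    pv = [x + eps for x in pv]
    qv = [y + eps for y in qv]
    ps, qs = sum(pv), sum(qv)
    return sum((x / ps) * math.log((x / ps) / (y / qs)) for x, y in zip(pv, qv))


def cosine_distance(p, q):
    """1 - cos(angle between the two event vectors)."""
    pv, qv = _aligned(p, q)
    dot = sum(x * y for x, y in zip(pv, qv))
    norm = math.sqrt(sum(x * x for x in pv)) * math.sqrt(sum(y * y for y in qv))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


# Hypothetical distributions: d1 and d2 are similar, d3 is very different.
d1 = {"mail": 0.7, "web": 0.2, "mds": 0.1}
d2 = {"mail": 0.6, "web": 0.3, "mds": 0.1}
d3 = {"mail": 0.1, "web": 0.2, "mds": 0.7}

print(kl_divergence(d1, d2), kl_divergence(d1, d3))
print(cosine_distance(d1, d2), cosine_distance(d1, d3))
```

With either measure, a smaller value means the two logs have more similar event distributions, so d2 ranks closer to d1 than d3 does.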

Page 11:

Identify Signature Events

■ Signature Events have a different frequency when compared to events in other log files
– Example signature events: dropped connections, thread dumps, and full queues

■ Chi-square test identifies such events
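
One way to apply a chi-square test here is sketched below. The slides do not give the exact test setup, so this is an assumption: for each event, a 2x2 table compares (this event, all other events) in the log under study against the same two counts in the other log files, using the Pearson statistic without continuity correction. The threshold 3.841 is the standard critical value for 1 degree of freedom at the 0.05 level. The function names and the event counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_0_05_DF1 = 3.841


def signature_events(log_counts, other_counts, critical=CRITICAL_0_05_DF1):
    """Return events whose frequency in log_counts differs significantly
    from their frequency in other_counts (assumed 2x2 setup, see above)."""
    log_total = sum(log_counts.values())
    other_total = sum(other_counts.values())
    found = []
    for event in sorted(set(log_counts) | set(other_counts)):
        a = log_counts.get(event, 0)
        c = other_counts.get(event, 0)
        stat = chi_square_2x2(a, log_total - a, c, other_total - c)
        if stat > critical:
            found.append(event)
    return found


# Hypothetical event counts for one customer log and for the other log files.
log_counts = {
    "Queuing new mail": 480,
    "New meeting request": 490,
    "Client disconnected. Cannot deliver": 30,
}
other_counts = {
    "Queuing new mail": 4900,
    "New meeting request": 5090,
    "Client disconnected. Cannot deliver": 10,
}

print(signature_events(log_counts, other_counts))
```

In this made-up data the two everyday events occur at roughly the same rate in both places, while the disconnect event is far more frequent in the customer's log, so only it is flagged.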


Page 12:

Measuring Performance

[Venn diagram: C = Relevant Log Files, R = Retrieved Log Files, drawn over log files F1 to F5; the overlap of C and R holds the relevant files that were retrieved.]

■ Precision = 2/4 = 50%
100% precise if all the retrieved log files are relevant

■ Recall = 2/3 = 67%
100% recall if all the relevant log files are retrieved
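
The arithmetic on this slide can be reproduced with two set operations. The transcript does not preserve which of F1 to F5 fall in which set, so the assignment below is a hypothetical one chosen only to match the slide's counts (3 relevant, 4 retrieved, 2 in common).

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for two collections of log file ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = relevant & retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical membership consistent with the slide's numbers.
relevant = {"F1", "F2", "F3"}         # C: relevant log files
retrieved = {"F2", "F3", "F4", "F5"}  # R: retrieved log files

p, r = precision_recall(relevant, retrieved)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=50% recall=67%
```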

Page 13:

The Big Picture

[Diagram: the technique overview from the "Our Technique" slide. Customer Execution Logs are converted to an Event Distribution; Signature Events are identified to form a Signature Event Distribution; each distribution is compared against the Customer Engagement Repository; the OUTPUT RESULT SET holds the Closest Customer Engagement Reports wrt Operational Profile and wrt Signature Profile.]
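
The final retrieval step amounts to ranking the stored reports by distance to the new customer's profile and returning the closest ones. The sketch below assumes cosine distance as the measure and a repository stored as a plain dictionary; the report ids, the three event groups, and the percentages are all hypothetical.

```python
import math


def cosine_distance(p, q):
    """1 - cos(angle) between two {event: weight} profiles."""
    events = set(p) | set(q)
    dot = sum(p.get(e, 0.0) * q.get(e, 0.0) for e in events)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def closest_reports(new_profile, repository, k=2):
    """Return the ids of the k reports whose profiles are closest."""
    ranked = sorted(
        repository.items(),
        key=lambda item: cosine_distance(new_profile, item[1]),
    )
    return [report_id for report_id, _ in ranked[:k]]


# Hypothetical repository: report id -> event distribution (percentages).
repository = {
    "report-A": {"email": 70, "web": 20, "mds": 10},  # heavy email
    "report-B": {"email": 15, "web": 70, "mds": 15},  # heavy web
    "report-C": {"email": 10, "web": 10, "mds": 80},  # heavy MDS
}
new_customer = {"email": 65, "web": 25, "mds": 10}

print(closest_reports(new_customer, repository, k=1))
```

The same ranking can be run twice, once on the full event distribution and once on the signature event distribution, to obtain the two result lists shown in the diagram.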

Page 14:

Case Studies

■ Case Study I
– Dell DVD Store open source application
– Code instrumentation done for event logging
– Built the execution log repository by applying synthetic workloads, changing the workload parameters each time

■ Case Study II
– Globally deployed commercial application
– More than 500 unique execution events

Page 15:

Case Study Results

■ Dell DVD Store
– 100% precision and recall on both operational profile based and signature profile based retrieval

■ Commercial Application
– 100% precision and recall for signature profile based retrieval
– Results for operational profile based retrieval:

Experiment                Count of     K-L Distance           Cosine Distance
                          Log Files    Precision   Recall     Precision   Recall
Single Feature Group      28           67.71%      90.28%     67.71%      90.28%
Multiple Feature Groups   28           60.71%      80.95%     75.00%      100.00%
All Feature Groups        12           72.92%      97.22%     62.50%      83.33%
Real World Log Files      12           54.17%      72.22%     68.75%      91.67%
All the Log Files         80           59.93%      79.90%     56.72%      75.62%

Page 16:

Sources of Errors

■ Events that do not correspond directly to a particular operational feature, such as idle time events, server health check events, startup and shutdown events

■ Imbalance in the event logging

Page 17:

Imbalance in Event Logging

[Figure: three operational profiles (OP1, OP2, OP3), each showing features F1, F2, and F3 with the events E1 to E12 they log, annotated with event counts (values shown: 200, 220, 400, 450).]

Page 18:

Related Work

■ Data mining techniques on textual information [Hui and Jha, 2000]
– Cons: Limited results, depending on analyst’s search skills and peculiarity of the problem

■ Using customer usage data [Elbaum and Narla, 2004]
– Cons: Customer usage data rarely exists

■ Clustering HTTP execution logs [Menascé, 1999]
– Cons: Complex process, works only for HTTP logs

■ Software Agent Deployment to build operational profile [Ramanujam et al., 2006]
– Cons: Intrusive, complex, costly

Page 19:

Conclusion