![Page 1: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/1.jpg)
BuzzTrack: Topic Detection and Tracking in Email
IUI – Intelligent User Interfaces, January 2007
Keno Albrecht, ETH Zurich
Roger Wattenhofer, ETH Zurich
Gabor Cselle, Google
![Page 2: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/2.jpg)
Email Overload
• Email clients were not designed to handle the volume and variety of messages users are dealing with today:
  • Large volumes of email
  • Task management
  • Personal archiving or filing
  • Keeping context
[Whittaker and Sidner, 1996]
![Page 3: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/3.jpg)
Search vs. Inbox Browsing
• Fast full-text search is today's solution to finding past emails.
• But the flat inbox view of newly incoming emails hasn’t changed.
In our work, we focus on the problem of sensibly structuring emails in the inbox.
![Page 4: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/4.jpg)
Today's Email Clients: The Three-Pane View
No sense of context: unrelated messages are shown together
Important emails may drop off the “first screen”
“Thread-based” tree views are unsophisticated and may not pull in all relevant messages.
![Page 5: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/5.jpg)
BuzzTrack
Email client extension for Mozilla Thunderbird for displaying email grouped by topic.
![Page 6: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/6.jpg)
Related Work
![Page 7: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/7.jpg)
Visualizations: Conversations
Gmail (Google)
common conversation title
one entry per email, folds out on click
![Page 8: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/8.jpg)
Automatic Foldering
• Using machine learning techniques to automatically move emails into folders upon arrival
• Low accuracy rates [Bekkerman et al., 2005], conceptual problems:
  • Users need to manually create folders and seed them with data.
![Page 9: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/9.jpg)
People-Centered Email Clients
Bifrost [Bälter and Sidner, 2002]
ContactMap [Whittaker et al., 2004]
![Page 10: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/10.jpg)
Task-based Email
Example: TaskMaster [Bellotti et al., 2003]
[Screenshot callouts: thrasks; thrask contents; item contents (emails, documents, etc.)]
![Page 11: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/11.jpg)
BuzzTrack
![Page 12: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/12.jpg)
BuzzTrack
• Mozilla Thunderbird extension to automatically group related emails into topics.
• Will be distributed through website: www.buzztrack.net
• Provides a view on the user’s inbox.
![Page 13: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/13.jpg)
What’s a Topic?
• Topics are groups of emails that relate to the same idea, action, event, task, or question.
• Examples:
  • A conversation about buying a digital camera.
  • Referring a candidate for a job.
  • All emails belonging to the same newsgroup.
![Page 14: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/14.jpg)
Clustering Process
• For every new incoming email:
[Diagram: Preprocessing, Clustering, Label generation, Cluster store, BuzzTrack View in Thunderbird]
![Page 15: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/15.jpg)
Preprocessing
• Tokenization (remove HTML tags, style sheets, punctuation, and numbers)
• Language detection
• Stemming
• For topic labelling:
  • Identify parts of speech
  • Remember popular original word forms
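The tokenization step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the BuzzTrack implementation; the names `tokenize` and `_TextExtractor` are hypothetical, and language detection, stemming, and part-of-speech tagging are left out.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <style> elements."""

    def __init__(self):
        super().__init__()
        self._in_style = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style += 1

    def handle_endtag(self, tag):
        if tag == "style" and self._in_style > 0:
            self._in_style -= 1

    def handle_data(self, data):
        if self._in_style == 0:
            self.chunks.append(data)


def tokenize(body):
    """Return lowercased word tokens with HTML, punctuation, and numbers removed."""
    extractor = _TextExtractor()
    extractor.feed(body)
    extractor.close()
    text = " ".join(extractor.chunks)
    # Keep runs of letters only; this drops punctuation, digits, and underscores.
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]
```

A stemmer would then be applied to each token, with the original word forms kept on the side for label generation.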
![Page 16: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/16.jpg)
Clustering
• Single-link clustering: newly incoming emails are compared to every email in existing topics:
  • Similarity value > threshold: assigned to topic
  • Similarity value <= threshold: email starts new topic
[Diagram: a new email compared against Topic 1, Topic 2, and Topic 3]
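The assignment rule above can be sketched as below. The function names and the toy `word_overlap` similarity are hypothetical, and picking the best-scoring topic when several exceed the threshold is an assumption; the slides do not say how ties between topics are resolved.

```python
def word_overlap(a, b):
    """Toy similarity for illustration: Jaccard overlap of the word sets."""
    words_a, words_b = set(a.split()), set(b.split())
    return len(words_a & words_b) / len(words_a | words_b)


def assign_to_topic(new_email, topics, similarity, threshold):
    """Single-link assignment.

    A topic's score is the highest similarity between the new email and
    any email already in that topic. Returns the index of the topic the
    email ends up in.
    """
    best_index, best_score = None, float("-inf")
    for index, topic in enumerate(topics):
        score = max(similarity(new_email, member) for member in topic)
        if score > best_score:
            best_index, best_score = index, score
    if best_index is not None and best_score > threshold:
        topics[best_index].append(new_email)
        return best_index
    # Similarity <= threshold for every topic: the email starts a new topic.
    topics.append([new_email])
    return len(topics) - 1
```

Calling `assign_to_topic` once per incoming email, in arrival order, builds the topic list incrementally.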
![Page 17: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/17.jpg)
Features - 1
• How do we generate similarity values between emails?
• Via a linear combination of several similarity features.
• Examples:
  • Text similarity (TF-IDF values, cosine similarity metric)
  • People similarities (comparing sets of people in the From / To / Cc lines of email headers)
  • Thread membership
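Two of these features can be sketched in a few lines. The TF-IDF weighting below (raw term count times log inverse document frequency) is one common variant, and the Jaccard overlap used for people similarity is an assumption; the slides do not specify either formula, and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Map each tokenized document to a {term: tf * idf} dictionary."""
    doc_count = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: count * math.log(doc_count / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def people_similarity(people_a, people_b):
    """Jaccard overlap of two sets of addresses (an assumed measure)."""
    a, b = set(people_a), set(people_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Here `people_a` and `people_b` would each hold the combined From / To / Cc addresses of one email.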
![Page 18: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/18.jpg)
Features - 2
Other features for deriving similarities:
• Subject similarity
• Sender domain overlaps
• Sender rank and percentage
• % of email from sender that is answered
• Time passed since last email in topic
• People and reference count for email
• Known people and reference %
• Cluster size
• Has attachment
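As an illustration, two of the simpler features might be computed as below. The slides only name the features, so the exact definitions here (a 0/1 domain match, elapsed time in hours) and the function names are assumptions.

```python
from datetime import datetime


def sender_domain(address):
    """Domain part of an email address, lowercased."""
    return address.rsplit("@", 1)[-1].lower()


def domain_overlap(sender, topic_senders):
    """1.0 if the sender's domain already appears among the topic's senders."""
    topic_domains = {sender_domain(address) for address in topic_senders}
    return 1.0 if sender_domain(sender) in topic_domains else 0.0


def hours_since_last_email(email_date, topic_dates):
    """Hours between the new email and the most recent email in the topic."""
    return (email_date - max(topic_dates)).total_seconds() / 3600.0
```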
![Page 19: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/19.jpg)
Decision Score
Similarities are combined into a decision score for each email / cluster pair through a linear combination of feature values:
$\mathrm{dec}_{i,j} = w_a \cdot \mathrm{sim}_a(m_i, C_j) + w_b \cdot \mathrm{sim}_b(m_i, C_j) + \dots$
We tested two sets of weights $w_x$, both trained on a development set of emails:
• Empirical
• Linear SVM
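The linear combination can be written as a short function. The feature names and weight values in the usage note are made up for illustration; they are not the weights used in BuzzTrack.

```python
def decision_score(feature_values, weights):
    """Weighted sum of similarity feature values for one email / cluster pair.

    feature_values: {feature name: similarity value for (email, cluster)}
    weights: {feature name: weight}; must cover every feature present.
    """
    return sum(weights[name] * value for name, value in feature_values.items())
```

For example, with hypothetical weights `{"text": 0.6, "people": 0.3, "thread": 0.1}` and feature values `{"text": 0.5, "people": 1.0, "thread": 0.0}`, the score is 0.6 * 0.5 + 0.3 * 1.0 + 0.1 * 0.0.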
![Page 20: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/20.jpg)
Evaluation
• How do we evaluate clustering quality?
• Topic Detection and Tracking (TDT) competitions by NIST, aimed at clustering news articles.
• Corpus:
![Page 21: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/21.jpg)
Clustering Tasks
• The clustering task is split into subtasks:
  • New Topic Detection (NTD): Given a stream of emails, which ones start new topics?
  • Topic Tracking (TT): Given a fixed topic, which newly incoming emails belong to it?
• DET curves plot miss rate vs. false alarm rate for each possible threshold on the decision scores
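The points of a DET curve can be computed by sweeping the threshold over the observed decision scores. This is a sketch under stated assumptions: a pair is accepted when its score is strictly greater than the threshold, the input contains at least one target and one non-target, and the name `det_points` is hypothetical.

```python
def det_points(scores, labels):
    """Return (threshold, miss_rate, false_alarm_rate) for each candidate threshold.

    scores: decision scores, one per trial.
    labels: True where the trial is a real target (e.g. the email does
            belong to the topic), False otherwise.
    """
    targets = sum(1 for label in labels if label)
    non_targets = len(labels) - targets
    points = []
    for threshold in sorted(set(scores)):
        # Miss: a target whose score does not exceed the threshold.
        misses = sum(1 for s, l in zip(scores, labels) if l and s <= threshold)
        # False alarm: a non-target whose score exceeds the threshold.
        false_alarms = sum(1 for s, l in zip(scores, labels) if not l and s > threshold)
        points.append((threshold, misses / targets, false_alarms / non_targets))
    return points
```

Raising the threshold trades false alarms for misses, which is the trade-off the curve makes visible.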
![Page 22: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/22.jpg)
Results NTD
• TDT New Topic Detection Task
Miss: 3%
False alarm: 30%
[DET curve plot; arrows mark the “better” direction on each axis]
![Page 23: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/23.jpg)
Results TT
• TDT Topic Tracking Task
Miss: 8%
False alarm: 2%
[DET curve plot; arrows mark the “better” direction on each axis]
![Page 24: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/24.jpg)
Comparison
• Comparable quality to TDT for news articles [NIST 2004]
• News has less metadata; email has worse text quality.
• A wide body of work exists on improving clustering performance on news; we haven’t tapped into that yet.
![Page 25: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/25.jpg)
BuzzTrack View
• Mozilla Thunderbird plugin that provides a useful view on inbox data “for free”
• Topics contain email from the last 60 days
  • We’re interested in current email only
  • Reduces initial clustering time
• Each email is shown in one topic
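The 60-day window amounts to a date filter applied before clustering. A minimal sketch, assuming each email is a dictionary with a `date` field holding a `datetime`; the representation and the name `recent_emails` are hypothetical.

```python
from datetime import datetime, timedelta


def recent_emails(emails, now, window_days=60):
    """Keep only emails received within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    return [email for email in emails if email["date"] >= cutoff]
```

Only the emails returned here need to be clustered when the view is first built, which is where the saving in initial clustering time comes from.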
![Page 26: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/26.jpg)
![Page 27: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/27.jpg)
Demo 1: BuzzTrack
![Page 28: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/28.jpg)
BuzzTrack Panes
Topic pane:
• Provides additional info
• Starred topics
Email pane:
• Topics sorted by last incoming email
![Page 29: BuzzTrack Topic Detection and Tracking in Email](https://reader036.vdocuments.site/reader036/viewer/2022062314/56813c44550346895da5c151/html5/thumbnails/29.jpg)
Future Work
• Distribute plugin to Thunderbird users
  • Input on possible UI improvements
  • Input on clustering quality
• Different clustering styles
  • People-based
  • Thread-based
• We hope BuzzTrack will be a valuable tool for real-world users