Email Viz: Future Directions
Marti Hearst, UC Berkeley
Post on 19-Dec-2015
2
Outline
• Important Infoviz Principle
• Tough Data Mining Problem
– The infrequent important thing
• Interfaces tailored to user goals
– Intelligence Analysts
– Investigative Reporters
• Promising Future Directions
– Integration of task, viz, and content analysis
– Mixed-Initiative Interaction
4
Tough Data Mining Problem
• It’s easy to see the main trends
• But often we want the rare but unexpected and important event:
– Russian oil company example
– Schwarzenegger and Enron
– Cigarettes and kids
– Person on the periphery who is working stealthily to influence things
• Deep Throat
6
Intelligence Analysts
• Interviews with active counter-terrorist analysts
• Great diversity in
– Goals
– Computing environments
• Biggest problems are social/systemic
• Many mundane IT problems as well
7
Mundane IT Problems
• System incompatibilities
• Data reformatting
• Data cleaning
• Documenting sources
• Archiving materials
8
Intelligence Analysts: Problem 1
• Look at a series of reports, images, communication patterns
• Try to build a model of what is going on
– Follow leads
– Compare to previous situations
• Recent problem:
– Groups are changing their behavior patterns quickly
• Very little use of sophisticated software tools
9
Intelligence Analysts: Problem 2
• Given a large collection
• “Roll around” in the data
– See what has been “touched”
• Tools should indicate which parts of the collection have been examined and which have yet to be looked at, and by whom
– View data in several different ways
• Data reduction methods such as MDS, SVD, and clustering often hide important trends.
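The "touched" requirement above can be sketched as a small bookkeeping structure. This is only an illustration: the `CoverageTracker` class, its method names, and the document ids are hypothetical, not part of any tool the analysts described.

```python
from collections import defaultdict


class CoverageTracker:
    """Hypothetical helper: records which documents were examined, and by whom."""

    def __init__(self, doc_ids):
        self.doc_ids = set(doc_ids)
        self.viewers = defaultdict(set)  # document id -> analysts who opened it

    def touch(self, doc_id, analyst):
        """Mark a document as examined by the given analyst."""
        if doc_id not in self.doc_ids:
            raise KeyError(f"unknown document: {doc_id}")
        self.viewers[doc_id].add(analyst)

    def touched(self):
        """Map each examined document to the sorted list of its viewers."""
        return {doc: sorted(names) for doc, names in self.viewers.items()}

    def untouched(self):
        """Documents nobody has looked at yet."""
        return sorted(self.doc_ids - set(self.viewers))


# Made-up example data.
tracker = CoverageTracker(["r1", "r2", "r3"])
tracker.touch("r1", "alice")
tracker.touch("r1", "bob")
tracker.touch("r3", "alice")
print(tracker.untouched())
print(tracker.touched())
```

A visualization could then shade items by `touched()` and highlight the `untouched()` remainder.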
10
Intelligence Analysts: Problem 2
– Don’t show the obvious
• e.g., Cheney is president
– Don’t show what you’ve already shown
– Only show the most recent version
– Show which info is not present
• Changes in the usual pattern
• Something stops happening
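One possible reading of "something stops happening" is a correspondent who was regularly active and then goes silent. The sketch below assumes messages are reduced to sender names; `stopped_senders`, the `min_past` threshold, and the data are all invented for illustration.

```python
from collections import Counter


def stopped_senders(history, recent, min_past=3):
    """Return senders who were active in `history` but are absent from `recent`.

    history, recent: lists of sender names, one entry per message.
    min_past: assumed cutoff for counting a sender as "usually active".
    """
    past = Counter(history)
    current = set(recent)
    return sorted(
        sender
        for sender, count in past.items()
        if count >= min_past and sender not in current
    )


# Made-up example data.
history = ["ann", "ann", "ann", "bo", "bo", "bo", "bo", "cy"]
recent = ["ann", "cy"]
print(stopped_senders(history, recent))
```

A real system would need a better baseline than a fixed count, but the idea of comparing a recent window against the usual pattern is the same.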
11
Intelligence Analysts: Problem 3
• Prepare a very short executive summary for the purposes of policy making
– Really the culmination of a cascade of summaries
– Reps from different agencies meet and “pow-wow” to form a view of the situation
– Rarely, but crucially, must be able to refer back to original sources and reasoning process for purposes of accountability
12
Investigative Reporter Example
• Looking for trends in online literature
• Create, support, refute hypotheses
13
Investigative Reporter Example
What are the current main topics?
What are the new popular terms? How do they track with the news?
Clustering
Corpus-level statistics, Co-occurrence statistics
Contrasting collection statistics
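A minimal sketch of contrasting collection statistics for the "new popular terms" question: compare smoothed term frequencies in a newer subcollection against an older one. The function names, the add-one smoothing, and the sample texts are assumptions made for illustration.

```python
import re
from collections import Counter


def term_counts(docs):
    """Count lowercase word tokens across a list of document strings."""
    counts = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z']+", doc.lower()))
    return counts


def rising_terms(old_docs, new_docs, top_n=5):
    """Rank terms by how much more frequent they are in new_docs than in old_docs."""
    old, new = term_counts(old_docs), term_counts(new_docs)
    old_total, new_total = sum(old.values()), sum(new.values())
    vocab = len(set(old) | set(new))
    scores = {}
    for term in new:
        # Add-one smoothing so terms unseen in the old collection do not divide by zero.
        p_new = (new[term] + 1) / (new_total + vocab)
        p_old = (old[term] + 1) / (old_total + vocab)
        scores[term] = p_new / p_old
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Made-up example data.
old_docs = ["the captain visits the station", "the crew repairs the ship"]
new_docs = ["the hologram doctor joins the crew", "the hologram sings"]
print(rising_terms(old_docs, new_docs, top_n=1))
```

Common words such as "the" score near 1 and drop out of the top of the ranking, while terms that appear only in the newer collection rise.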
14
Investigative Reporter Example
How long after a new Star Trek series comes on the air before characters from the series appear in stories?
How often do Klingons initiate attacks against Vulcans, vs. the converse?
Named-entity recognition
Creating a list of terms
Apply the list to a subcollection
Create regex rules with POS information
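The term-list-plus-regex approach to the Klingon/Vulcan question might look roughly like the sketch below. It is a simplification: it uses plain regular expressions with a hand-written verb pattern rather than real part-of-speech tags, and the term lists, verb list, and stories are illustrative assumptions.

```python
import re

# Illustrative term lists; a real list would come from named-entity recognition.
KLINGONS = ["Worf", "Gowron", "Klingons?"]
VULCANS = ["Spock", "Tuvok", "Vulcans?"]
# Stand-in for a POS-based verb rule: a few attack verbs and their inflections.
ATTACK_VERBS = r"(?:attack(?:s|ed)?|ambush(?:es|ed)?)"


def attack_pattern(aggressors, targets):
    """Build a regex matching '<aggressor> <attack verb> [the] <target>'."""
    a = "|".join(aggressors)
    t = "|".join(targets)
    return re.compile(
        rf"\b(?:{a})\b\s+{ATTACK_VERBS}\s+(?:the\s+)?\b(?:{t})\b",
        re.IGNORECASE,
    )


def count_attacks(texts):
    """Count attack mentions in each direction across a subcollection."""
    k_on_v = attack_pattern(KLINGONS, VULCANS)
    v_on_k = attack_pattern(VULCANS, KLINGONS)
    return {
        "klingon_on_vulcan": sum(len(k_on_v.findall(t)) for t in texts),
        "vulcan_on_klingon": sum(len(v_on_k.findall(t)) for t in texts),
    }


# Made-up example data.
stories = [
    "Worf attacked Tuvok on the bridge.",
    "The Klingons ambushed the Vulcans at dawn.",
    "Spock attacks Gowron in self-defense.",
]
print(count_attacks(stories))
```

Patterns like this miss passives ("Tuvok was attacked by Worf") and anything with intervening words, which is where POS information would help.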
15
Integration
• TAKMI, by Nasukawa and Nagano, IBM Systems Journal 40(4), 2001
• The system integrates:
– Real tasks (CRM, patent analysis)
– Content analysis
– Information Visualization
23
Mixed-Initiative Interaction
• Balance control between user and agent
– In Spotfire demo, system adjusts axes after “other” category hidden
– EDA:
• User selects a subset of data based on interesting-looking grouping
• System then does stats on this subset in the background while user continues to work
• Then system notifies user of interesting trends
• See the AIDE system:
– St. Amant, R., Dinardo, M. D., and Buckner, N. (2003). Balancing Efficiency and Interpretability in an Interactive Statistical Assistant. Proceedings of IUI.
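The EDA loop above (user selects, system computes in the background, system notifies) can be sketched with a worker thread. This is not how AIDE or Spotfire work internally; `summarize`, `is_interesting`, `on_selection`, and the threshold rule are hypothetical names and assumptions chosen for illustration.

```python
import statistics
from concurrent.futures import ThreadPoolExecutor


def summarize(subset):
    """Background job: basic statistics on the user-selected subset."""
    return {
        "n": len(subset),
        "mean": statistics.mean(subset),
        "stdev": statistics.stdev(subset),
    }


def is_interesting(stats, overall_mean, threshold=2.0):
    """Assumed rule: flag subsets whose mean is far from the overall mean."""
    return abs(stats["mean"] - overall_mean) > threshold


notifications = []  # stands in for a UI notification area


def on_selection(executor, subset, overall_mean):
    """Called when the user selects a subset; returns without blocking."""
    future = executor.submit(summarize, subset)

    def notify(done):
        stats = done.result()
        if is_interesting(stats, overall_mean):
            notifications.append(stats)

    future.add_done_callback(notify)
    return future


# Made-up example data.
data = [1, 2, 3, 4, 20, 22, 24]
overall_mean = statistics.mean(data)
with ThreadPoolExecutor(max_workers=1) as pool:
    on_selection(pool, [20, 22, 24], overall_mean)
    # The user would keep working here while the statistics run.
print(notifications)
```

The user keeps control of what to look at, while the system takes the initiative only to report results it judges worth the interruption.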