LInfoVis Winter 2011 Chris Culy

Scientific Visualization of Language Data

Chris Culy, Winter 2011


What are we doing?

LInfoVis* (< Language Information Visualization, cf. InfoVis):

the visualization of language-related information, especially on computer displays

* Not a standard term (not yet, anyway)


“Visualization has to be more than pretty pictures. It has to inform. It has to challenge. It has to further our understanding. Visualizing data is not about pretty pictures.”

Robert Kosara on www.eagereyes.org


What are we not doing?

(Only language, no other data.)

Source: Lewis Carroll. Alice's Adventures in Wonderland. Ch. 3


Gray Area

Numeric information derived from language data
  e.g. frequencies, statistical measures, etc.

There are lots of chart/graphing packages
  e.g. with spreadsheets, in R, etc.

But, if there is an interesting and useful way to incorporate the language data, we'll do that
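
As a minimal sketch of the numeric side of this gray area, the following Python derives word frequencies from a short text. The tokenizer (lowercased runs of letters and apostrophes), the function name `word_frequencies`, and the sample sentence are assumptions made for illustration; they are not part of the course materials.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercased word tokens.

    The regex tokenizer is a simplifying assumption; real corpora
    usually need language-specific tokenization.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, for illustration only.
sample = "The Mouse looked at her. The Mouse said nothing."
freqs = word_frequencies(sample)

# Print the most frequent words with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

The counts on their own could go into any charting package; the open question for LInfoVis is how to keep the words themselves visible alongside the numbers.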


Corpus Clouds
http://www.eurac.edu/en/research/institutes/multilingualism/Projects/LInfoVis/CorpusClouds.html


Presentation vs. Analysis

Presentation:

  Convey information known to the author
  To an audience other than the author
  Typically static (e.g. charts in a paper)

Analysis:

  Present information that is not (well) known to the user
  Help the user understand (“make sense of”) the information
  Often interactive, though not necessarily

Different goals, different techniques


Why visualization?

The human visual system is very efficient at discovering certain patterns in large amounts of information.

The eye has on average:
  92 million rods (for light level)
  4.6 million cones (for color)
  (Curcio, C. A., Sloan, K. R., Kalina, R. E. and Hendrickson, A. E. (1990), Human photoreceptor topography. The Journal of Comparative Neurology, 292: 497–523. doi: 10.1002/cne.902920402)
  updated 10-12 times per second

Things are much more complicated than those basic numbers, but still ...

Preattentive processing: recognition of features before conscious processing

We can take advantage of this capacity to help linguists analyze language, especially in finding patterns


What makes LInfoVis special?

Textual elements are:

  Categorical, not numeric
  In general, no scale of comparison

Hearst M. 2009. Search User Interfaces. Cambridge University Press.

NB: we will (almost?) always have non-textual data, but we will always need to show the textual elements as well


What makes LInfoVis special?

Language is:

not mappable -- there is in general no more compact way to visualize language (that is humanly comprehensible)

i.e. unlike numbers, we can't map words to size, shape, color, etc.

cf. Culy, C., Lyding, V., and Dittmann, H. 2011. "xLDD: Extended Linguistic Dependency Diagrams" in Proceedings of the 15th International Conference on Information Visualisation IV2011, 12, 13 - 15 July 2011, University of London, UK. 164-169.


What makes LInfoVis special?

Linguistics has:

particular data structures (like any field)

standard ones used in different ways
  e.g. trees, feature structures, KWIC

with particular (conventional) visual representations
  e.g. dependency structures as arcs
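
To make "dependency structures as arcs" concrete, here is a small sketch of one way such data might be held before drawing. The `Arc` type, the label names, the example sentence, and the nesting-based height rule in `arc_levels` are all assumptions for illustration; they are not taken from xLDD or any other tool.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    """One dependency relation between two token positions (hypothetical type)."""
    head: int
    dep: int
    label: str


def span(arc):
    """Return the (left, right) token positions covered by an arc."""
    return (min(arc.head, arc.dep), max(arc.head, arc.dep))


def arc_levels(arcs):
    """Assign each arc a drawing height: one more than the tallest arc nested inside it."""
    levels = {}
    # Process narrow arcs first, so nested arcs already have a level.
    for arc in sorted(arcs, key=lambda a: span(a)[1] - span(a)[0]):
        lo, hi = span(arc)
        inner = [
            levels[other]
            for other in levels
            if span(other) != (lo, hi) and lo <= span(other)[0] and span(other)[1] <= hi
        ]
        levels[arc] = 1 + max(inner, default=0)
    return levels


# Illustrative sentence and analysis; the labels are assumptions.
tokens = ["The", "cat", "chased", "a", "mouse"]
arcs = [
    Arc(2, 1, "nsubj"),
    Arc(1, 0, "det"),
    Arc(2, 4, "obj"),
    Arc(4, 3, "det"),
]

levels = arc_levels(arcs)
for arc in arcs:
    print(tokens[arc.head], "->", tokens[arc.dep], arc.label, "level", levels[arc])
```

A drawing routine could then use the level as the arc's height above the text, so that an arc spanning other arcs is drawn over them.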


What makes LInfoVis special?

Linguists:

Often want to examine the original data, not just the measurements/summary
  More than some (most?) fields
  e.g. word frequencies in a text/corpus -- linguists want to be able to examine the source data, to see the words in context
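
Seeing the words in context is what a KWIC (keyword in context) display provides. The following is a minimal sketch, assuming whitespace tokenization and a fixed window of neighbouring tokens; the function name `kwic` and the sample text are hypothetical.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each match of keyword.

    Matching is case-insensitive; `window` is the number of tokens kept
    on each side (an illustrative default).
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Hypothetical sample text; whitespace splitting is a simplifying assumption.
text = "the cat sat on the mat and the dog sat by the door"
hits = kwic(text.split(), "sat", window=2)

# Right-align the left context so the keywords line up in one column.
for left, key, right in hits:
    print(f"{left:>12} | {key} | {right}")
```

Aligning the keyword in a single column is the conventional KWIC layout: the eye can scan the neighbouring words on either side.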


Goethe on seeing

Goethe

Man sieht nur das, was man weiß.

You only see what you know.

Culy

You can only visualize what you have.


The real Goethe on seeing

Man erblickt nur, was man schon weiß und versteht.

You glimpse only what you already know and understand.

Kanzler F. v. Müller, Unterhaltungen mit Goethe, 24 April 1819, cited in Lexikon Goethe-Zitate

Was man weiß, sieht man erst!

You see first what you know!

In: Einleitung in die Propyläen

That's more optimistic!


Some challenges in LInfoVis

Dealing with the categorical/non-mappable nature of language

How can we show textual data in an effective way?
  Exploit the capabilities of the human visual system
  Cater to our general cognitive capabilities
  Interaction is key


Some challenges in LInfoVis

Dealing with large amounts of data
  e.g. 2560x1440 monitor = 3,686,400 pixels, but one pixel is pretty small, and 3.7M is a lot smaller than the amount of information in a small corpus: Penn Treebank has 4.5M words, plus POS, parses, etc.
  Particular subsets of interest will be smaller, but they often (usually?) contain more information than can fit on a screen

What are effective strategies for dealing with large amounts of data?
  From a visualization perspective
  From an architectural/programming perspective
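
The arithmetic behind the screen-versus-corpus comparison can be worked through in a few lines of Python. The monitor size and the corpus size are the figures quoted above; the 60x15 pixel box per displayed word is purely an assumption chosen for illustration.

```python
# Figures quoted in the slide.
width, height = 2560, 1440
corpus_words = 4_500_000

pixels = width * height
pixels_per_word = pixels / corpus_words

# Assumption for illustration: a legible word needs roughly a 60x15 pixel box.
box_w, box_h = 60, 15
words_on_screen = (width // box_w) * (height // box_h)
share_of_corpus = words_on_screen / corpus_words

print(f"pixels on screen:      {pixels:,}")
print(f"pixels per word:       {pixels_per_word:.2f}")
print(f"legible words at once: {words_on_screen:,}")
print(f"share of the corpus:   {share_of_corpus:.4%}")
```

Even with less than one pixel per word there is no room for the text itself, and under the assumed box size only a few thousand words are readable at once, which is why filtering, aggregation, and interaction come up as strategies.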


Some challenges in LInfoVis

What are the most useful levels of abstraction for LInfoVis tools?
  i.e. what functionalities should LInfoVis components contain?


Other practical challenges

How to integrate LInfoVis into workflows
  Of people: How can LInfoVis be made useful to people doing linguistic analysis?
  Of programs: How can LInfoVis programs be integrated with other tools? e.g. WebLicht

What are the roles of LInfoVis components?
  Producer/consumer
  Read only vs. read/write (i.e. using LInfoVis tools to modify/create data)
  What's the division of labor between LInfoVis components and others?

How do we maintain the connection with the original data?


Where do LInfoVis visualizations come from?

Use existing visualizations as is

Modify and adapt existing visualizations

Add InfoVis techniques to standard linguistic diagrams

New approaches


Why components?

In many applications, the visualizations are custom-designed for the application and tightly integrated with it.

But, reinventing the wheel is not very interesting or productive.

LInfoVis visualizations could be more like graphs/charts and parsers: components that can be used with a variety of data of the same type

  Line graphs can be used with data from any field
  Parsers can be used with grammars for any language

Claims (Culy):

a) Linguistic data of the same “type” can be visualized meaningfully by the same visualization(s).

b) There are enough data sets with the same “type” to make (a) interesting, and hence components worth creating.
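
One way to picture the component idea is a function that knows only the "type" of its input, not where the data came from. The sketch below is an assumption-laden illustration rather than a design from the course: `text_bar_chart` is a hypothetical component that accepts any mapping from labels to counts, and both sample data sets are invented.

```python
def text_bar_chart(counts, width=20):
    """Render any label -> count mapping as text bars.

    The longest bar is `width` characters; the component makes no
    assumption about what the labels mean.
    """
    if not counts:
        return []
    top = max(counts.values())
    lines = []
    for label, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        bar = "#" * max(1, round(width * n / top))
        lines.append(f"{label:<6} {bar} {n}")
    return lines


# Two invented data sets of the same "type" (label -> count).
pos_counts = {"NOUN": 40, "VERB": 20, "ADJ": 10}
word_counts = {"the": 8, "cat": 2}

# The same component handles both without modification.
for data in (pos_counts, word_counts):
    print("\n".join(text_bar_chart(data)))
    print()
```

The same function serves part-of-speech counts and word counts alike, which is the sense in which data of one "type" can share a visualization.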


Structure of the course

A mix of theory and practice

Survey of visualization theory and general techniques (CuC)

Presentation of particular techniques and applications (everyone)
  Read articles, with one person responsible for presenting them

Programming exercises
  Introduction to JavaScript (as necessary)
  Basic drawing (with Java, JavaScript)
  Some higher level visualization toolkits (e.g. Processing, Protovis/D3)

Project


The project

Goal: develop a scientific visualization of some kind of linguistic data
  Start thinking about what kind of data you want to visualize, and where you'll get it

Who: Small groups
  If you are inexperienced in programming, work with someone who is more experienced

What you'll need to provide me at the end of the term:
  1. A functioning visualization, with some sample data to visualize
  2. Technical documentation of how the visualization works, and how to use it, e.g. Javadoc and help/readme/tutorial
  3. A short (~15 pages) paper describing the visualization: background, its goals, how it works, and future directions
  4. If you have gotten feedback from real or potential users, include that in the paper


Practical information

http://www.sfs.uni-tuebingen.de/~cculy/courses/W2011/vis/

[email protected]

Office: 1.07

Tel: 07071/29-7 3966

Sprechstunden (Office hours): T 14-15, Th 16-17


For next time

Read the tutorial (linked from the web site), through “Principles: visual variables (2)”