Marti Hearst
SIMS 247 Lecture 4: Graphing Multivariate Information
January 29, 1998




Follow-up to previous lecture

• Docuverse:
  – length of arc is proportional to number of subdirectories
  – radius for a given arc is long enough to contain marks for all the files in the directory

• Nightingale’s “coxcomb”
  – keep arc length constant
  – vary radius length (proportional to sqrt(freq))
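The square root in the coxcomb rule can be checked with a little arithmetic: a wedge of angle theta and radius r has area theta * r^2 / 2, so with the angle held constant, a radius proportional to sqrt(freq) gives an area proportional to freq. A minimal sketch of that check follows; the function names and the counts are made up for illustration and are not Nightingale's data.

```python
import math

def coxcomb_radii(freqs, scale=1.0):
    """Radius of each equal-angle wedge, proportional to sqrt(frequency)."""
    return [scale * math.sqrt(f) for f in freqs]

def wedge_area(radius, n_wedges):
    """Area of one wedge when the circle is split into n equal angles."""
    theta = 2 * math.pi / n_wedges
    return 0.5 * theta * radius ** 2

freqs = [1, 4, 9, 16]  # made-up counts, for illustration only
radii = coxcomb_radii(freqs)
areas = [wedge_area(r, len(freqs)) for r in radii]

# Area ratios relative to the first wedge should match the frequency ratios.
ratios = [a / areas[0] for a in areas]
print(radii)
print(ratios)
```

If the radius were made proportional to freq itself, the wedge areas would grow with freq squared and exaggerate large values.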


Today: Multivariate Information

• We see a 3D world
• How do we handle more than 3 variables?
  – multi-functioning elements
    • Tufte examples
    • cinematography example
  – multiple views


Example Data Sets

How do we handle 9 variables?
– Our web access dataset
– Factors involved in alcoholism
  • ALCOHOL
    – USE
    – AVAILABILITY
    – CONCERN ABOUT USE
    – COPING MECHANISMS
  • PERSONALITY MEASURES
    – EXTROVERSION
    – DISINHIBITION
  • OTHER
    – GENDER
    – GPA


Graphing Multivariate Information

How do we handle cases with more than three variables?
– Scatterplot matrices
– Parallel coordinates
– Multiple views
– Overlay space and time
– Interaction/animation across time


Multiple Variables: Scatterplot Matrices
(from Wegman et al.)
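A scatterplot matrix shows one small scatterplot per pair of variables, so the number of panels grows quickly with the number of variables. The sketch below only enumerates the pairs; the variable names v1..v9 are placeholders chosen to match the "9 variables" question, and the function name is hypothetical.

```python
from itertools import combinations

def scatterplot_matrix_pairs(variables):
    """Every unordered pair of variables; each pair is one distinct panel."""
    return list(combinations(variables, 2))

variables = [f"v{i}" for i in range(1, 10)]  # nine placeholder variable names
pairs = scatterplot_matrix_pairs(variables)

# A full n-by-n grid has n*n cells; only n*(n-1)/2 of them are distinct plots.
print(len(variables) ** 2, len(pairs))
```

With nine variables the grid has 81 cells but only 36 distinct pairings, which is part of why large scatterplot matrices become hard to read.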


Multiple Variables: Scatterplot Matrices
(from Schall 95)


Multiple Views: Star Plot
(Discussed in Feinberg 79. Works better with animation. Example taken from Behrens & Yu 95.)


Multiple Dimensions: Parallel Coordinates
(earthquake data; color indicates longitude, y axis indicates severity of earthquake; from Schall 95)
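In a parallel coordinates plot each variable gets its own vertical axis, and each record becomes a polyline crossing every axis at the height of its value. A minimal sketch of that mapping follows, assuming each axis is rescaled to the range 0 to 1; the function name and the sample rows are invented for illustration and are not the earthquake data.

```python
def to_parallel_coordinates(rows):
    """Rescale each column to [0, 1] and return one polyline per row.

    Each polyline is a list of (axis_index, height) points.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    polylines = []
    for row in rows:
        line = []
        for axis, value in enumerate(row):
            span = highs[axis] - lows[axis]
            # A constant column has no spread; draw it at mid-height.
            height = (value - lows[axis]) / span if span else 0.5
            line.append((axis, height))
        polylines.append(line)
    return polylines

rows = [(1.0, 200.0, 5.0), (2.0, 100.0, 5.0), (3.0, 150.0, 5.0)]  # made-up records
lines = to_parallel_coordinates(rows)
for line in lines:
    print(line)
```

Drawing the plot then amounts to connecting each polyline's points left to right, optionally coloring the line by another variable as the slide does with longitude.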


Multiple Dimensions: Multivariate Star Plot
(from Behrens & Yu 95)


Chernoff Faces

• Assumption: people have built-in face recognizers
• Map variables to features of a cartoon face
  – Example: eyes
    • location, separation, angle, shape, width
  – Example: entire face
    • area, shape, nose length, mouth location, smile curve
• Originally tongue-in-cheek, but taken seriously
• Sometimes seems to work for small numbers of points
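The variable-to-feature mapping can be sketched as a linear rescaling: each data variable is normalized and then stretched onto the drawing range of one facial feature. The feature names below come from the slide, but the numeric ranges, the assignment order, and the function name are assumptions made for illustration, not Chernoff's original parameters.

```python
# Hypothetical drawing ranges for five facial features (assumed values).
FEATURE_RANGES = {
    "eye_separation": (0.2, 0.6),
    "eye_angle": (-30.0, 30.0),
    "nose_length": (0.1, 0.4),
    "mouth_location": (0.2, 0.5),
    "smile_curve": (-1.0, 1.0),
}

def face_parameters(point, lows, highs):
    """Map a 5-variable data point onto facial feature values."""
    params = {}
    features = FEATURE_RANGES.items()
    for (name, (f_lo, f_hi)), x, lo, hi in zip(features, point, lows, highs):
        # Normalize the variable to [0, 1], then stretch to the feature range.
        t = (x - lo) / (hi - lo) if hi != lo else 0.5
        params[name] = f_lo + t * (f_hi - f_lo)
    return params

face = face_parameters([0.0, 5.0, 10.0, 2.5, 7.5], lows=[0.0] * 5, highs=[10.0] * 5)
print(face)
```

Which variable is assigned to which feature is a design choice, and it is the thing varied in the Marchette example (faces overall, eyes, mouth and eyebrows).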


Chernoff Example (Marchette)

• Three groups of points
  – each drawn from a different distribution with 5 variables
• First show scatter-plot matrix
• Then graph with Chernoff faces
  – vary faces overall
  – vary eyes
  – vary mouth and eyebrows
• Which seems to be most effective?

Chernoff Experiment (Marchette)
(figures shown over four slides)


Overlaying Space and Time
(Minard’s graph of Napoleon’s march through Russia)


A Detective Story
(Inselberg 97)

• Domain: Manufacture of computer chips
• Objectives: create batches with
  – high yield (X1)
  – high quality (X2)
• Hypothesized cause of problem:
  – 10 types of defects (X3-X12)
• Some physical properties (X13-X16)
• Approach:
  – examine data for 473 batches
  – use interactive parallel coordinates


Multidimensional Detective

• Long term objectives:
  – high quality, high yield
• Logical approach given the hypothesis:
  – try to eliminate defects
• First clue:
  – what patterns can be found among batches with high yield and quality?
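The "first clue" amounts to a range query: keep only the batches whose X1 and X2 values fall in a high interval, then look at how the remaining variables behave for those batches. A rough sketch follows; the batch values, the thresholds, and the function name are invented for illustration and are not taken from the Inselberg data.

```python
def select_batches(batches, **ranges):
    """Keep batches whose named variables fall inside (low, high) ranges."""
    return [
        b for b in batches
        if all(lo <= b[var] <= hi for var, (lo, hi) in ranges.items())
    ]

# Made-up batches with yield (X1), quality (X2) and one defect variable (X3).
batches = [
    {"id": 1, "X1": 0.9, "X2": 0.8, "X3": 0.4},
    {"id": 2, "X1": 0.3, "X2": 0.2, "X3": 0.0},
    {"id": 3, "X1": 0.8, "X2": 0.9, "X3": 0.5},
    {"id": 4, "X1": 0.9, "X2": 0.1, "X3": 0.1},
]

good = select_batches(batches, X1=(0.7, 1.0), X2=(0.7, 1.0))
good_ids = [b["id"] for b in good]
print(good_ids)
```

In an interactive parallel coordinates tool the same query is made by dragging an interval on the X1 and X2 axes, and the surviving polylines are highlighted across all other axes.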


Detectives aren’t intimidated!

X1 seems to be normally distributed; X2 bipolar


High quality yields obtained despite defects

Figure annotations:
– good batches
– some low X3 defect batches don’t appear here
– X15 breaks into two clusters (important physical property)
– at least one good batch with defects


Low-defect batches are not highest quality!

Figure annotations:
– few defects
– low yield, low quality


Original plot shows defect X6 behaves differently; exclude it from the 9-out-of-10 defects constraint; the best batches return
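One way to read this step is as a relaxed filter: require low values on the defect variables, but leave X6 out of the requirement. The sketch below illustrates that reading under stated assumptions; the threshold, the batch values, and the helper names are hypothetical, and the real analysis was done interactively on the plot rather than in code.

```python
DEFECTS = [f"X{i}" for i in range(3, 13)]  # defect variables X3..X12

def low_defect_batches(batches, exclude=(), threshold=0.0):
    """Batches whose checked defect variables are all at or below threshold."""
    checked = [d for d in DEFECTS if d not in exclude]
    return [b for b in batches if all(b[d] <= threshold for d in checked)]

def make_batch(batch_id, **defects):
    """Hypothetical helper: a batch with zero defects unless stated otherwise."""
    batch = {d: 0.0 for d in DEFECTS}
    batch.update(defects)
    batch["id"] = batch_id
    return batch

batches = [
    make_batch("a"),
    make_batch("b", X6=0.3),
    make_batch("c", X3=0.2, X6=0.3),
]

strict = [b["id"] for b in low_defect_batches(batches)]
relaxed = [b["id"] for b in low_defect_batches(batches, exclude=("X6",))]
print(strict, relaxed)
```

Dropping X6 from the constraint lets batch "b" back into the selection, which mirrors the slide's observation that the best batches return once X6 is excluded.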


Isolate the best batches.
Conclusion: defects are necessary!

The very best batch has X3 and X6 defects

Ensure this is not an outlier: look at the top few batches. The same result is found.


How to graph web page traversals?


References for this Lecture

• Behrens, John and Yu, Chong Ho. Visualization Techniques of Different Dimensions, 1995. http://seamonkey.ed.asu.edu/~behrens/asu/reports/compre/comp1.html

• Feinberg, S. E. Graphical methods in statistics. The American Statistician, 33, 165-178, 1979.

• Friendly, Michael. Gallery of Data Visualization. http://www.math.yorku.ca/SCS/Gallery
  – scan of Minard’s graph from Tufte 1983
  – multivariate means comparison

• Wegman, Edward J. and Luo, Qiang. High Dimensional Clustering Using Parallel Coordinates and the Grand Tour. Conference of the German Classification Society, Freiberg, Germany, 1996. http://galaxy.gmu.edu/papers/inter96.html

• Cook, Dennis R. and Weisberg, Sanford. An Introduction to Regression Graphics, 1995. http://stat.umn.edu/~rcode/node3.html

• Schall, Matthew. SPSS DIAMOND: a visual exploratory data analysis tool. Perspective, 18 (2), 1995. http://www.spss.com/cool/papers/diamondw.html

• Marchette, David. An Investigation of Chernoff Faces for High Dimensional Data Exploration. http://farside.nswc.navy.mil/CSI803/Dave/chern.html

• Chernoff, H. The Use of Faces to Represent Points in k-Dimensional Space Graphically. Journal of the American Statistical Association, 68, 361-368, 1973.


Next Time: Brushing and Linking

• An interactive technique
• Brushing:
  – pick out some points from one viewpoint
  – see how this affects other viewpoints
  – (Cleveland scatterplot matrix example)
• Graphs must be linked together
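The linking requirement can be sketched as several views that share one selection over the same records: brushing in one view updates the shared selection, and every other view highlights the same records in its own coordinates. The class, field names, and records below are hypothetical, and this is not how VISAGE, Attribute Explorer, or SpotFire are implemented.

```python
class LinkedViews:
    """Several 2D views over the same records, sharing one selection."""

    def __init__(self, records, views):
        self.records = records
        self.views = views      # view name -> (x_field, y_field)
        self.selected = set()   # indices of brushed records

    def brush(self, view, x_range, y_range):
        """Select the records that fall inside a rectangle drawn in one view."""
        x_field, y_field = self.views[view]
        self.selected = {
            i for i, r in enumerate(self.records)
            if x_range[0] <= r[x_field] <= x_range[1]
            and y_range[0] <= r[y_field] <= y_range[1]
        }
        return self.selected

    def highlighted_points(self, view):
        """Coordinates of the brushed records as they appear in another view."""
        x_field, y_field = self.views[view]
        return [
            (self.records[i][x_field], self.records[i][y_field])
            for i in sorted(self.selected)
        ]

records = [  # made-up records with three variables
    {"a": 1, "b": 10, "c": 100},
    {"a": 2, "b": 20, "c": 50},
    {"a": 3, "b": 30, "c": 75},
]
views = LinkedViews(records, {"a_vs_b": ("a", "b"), "b_vs_c": ("b", "c")})
views.brush("a_vs_b", x_range=(1.5, 3.5), y_range=(0, 100))
linked = views.highlighted_points("b_vs_c")
print(linked)
```

Keeping the selection as record indices rather than as screen coordinates is what lets any number of views stay in step with a single brush.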


Brushing and Linking Systems

• VISAGE: Roth et al.
• Attribute Explorer: Tweedie et al.
• SpotFire (IVEE): Ahlberg et al.