lcs.ios.ac.cn/~shil/wiki/images/c/c7/l16_evaluation.pdf


Recap of last week
• Multivariate data visualization

Recap of last week

• Final project checkpoint
– 16 teams
– Presentation order announced on the website

This Week


Evaluation for InfoVis
IV Course, Spring '14
Graduate Course of UCAS
May 16th, 2014

Focus of this week

• Most of what we have learned in this class has been the introduction of a new visualization technique, tool, or algorithm
– Parallel coordinates, scatterplot matrix, etc.
– D3, prefuse, flare, …
– Spring model, treemap layout

"Am I fooling you?"
"Is my new visualization cool?"
"Is another team's visualization solution better than mine?"

Focus of this week

• Want to learn what aspects of visualizations or systems "work"
• Want to ensure that methods are improving
• Want to ensure that a technique actually helps people and isn't just "cool"
• I need that section in my paper to get it accepted …

Strong View


Outline

• Evaluation: definitions and examples
– Hyperbolic viewer vs. Windows Explorer
– Sunburst vs. Treemap
• Theory in evaluation of InfoVis
– Taxonomy of evaluations: quantitative vs. qualitative approaches
– Seven scenarios of empirical studies in InfoVis
• Counterexample
– Dynamic network visualization

Evaluation

• Definitions
– Oxford dictionary: the making of a judgment about the amount, number, or value of something
– Wikipedia: a systematic determination of a subject's merit, worth, and significance, using criteria governed by a set of standards
• Evaluation of information visualization
– How do we determine if a visualization system is "effective"?
– Can be the visualization itself or the visual analysis process

Hyperbolic Viewer vs. Windows Explorer
Will a focus+context technique accelerate browsing?

Hyperbolic Viewer vs. Windows Explorer
• The Great CHI '97 Browse-Off
– The CHI '97 meeting in Atlanta, GA presented a panel called The Great CHI '97 Browse-Off
– The aims of the panel were partly entertainment, partly evaluative, and partly to spur on further research on the evaluation of browsers
– Six leading structure visualization and browsing technologies had an entertaining yet informative "live" comparison

Hyperbolic Viewer vs. Windows Explorer
Two heroic contestants using only a DOS command-line shell became an instant audience favorite by staying close to the leaders through the first two rounds, but the top performance was turned in by Ramana Rao using the Hyperbolic Tree technology from Inxight [Software, Inc.]. The Hyperbolic Tree proved itself to be extremely responsive, graphically efficient, and devastatingly effective in the hands of a skilled operator using novel techniques like "fanning" the data in a focus+context display.

http://www.baychi.org/calendar/19970812/

Hyperbolic Viewer vs. Windows Explorer
Can we conclude that the hyperbolic tree is the better browser?

Hyperbolic Viewer vs. Windows Explorer
No!
• Different people were operating each browser.
• Tasks were not carefully designed.
Xerox PARC researchers conducted eye-tracking studies to investigate... [Pirolli, Card, & van der Wege, AVI 2000]

Task Types

• N ~ 128 tasks were used
– Simple retrieval tasks – "Find Lake Victoria"
– Complex retrieval tasks – "Which army is led by a Generalissimo?"
– Local relational tasks – "Which religion has the most holidays?"
– Complex relational tasks – "Which Greek deity has the same name as a space mission?"

Method

• Participants
– N = 8 participants were recruited from the Stanford University Psychology Graduate program, and from Xerox PARC. Some recruits were eliminated due to problems with eye-tracking. The Stanford students were paid $50 for their participation.
• Apparatus
– We used the Hyperbolic and Explorer browsers described above. An ISCAN RK-426PC eye tracker was used to record eye fixations and saccades.
• Procedure
– The participants proceeded through (a) a familiarization phase, (b) a practice phase, (c) a test phase, and (d) a retest phase.

Initial Result: No Difference?
• Measures
– Performance time recorded for the test and retest phase tasks completed by participants
– Analysis of variance (ANOVA) test conducted (see the sketch below)
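A minimal sketch (not from the slides) of the kind of significance test the study ran: a one-way ANOVA comparing per-participant mean completion times for the two browsers. The numbers below are hypothetical, purely for illustration.

```python
# One-way ANOVA on hypothetical mean task-completion times (seconds).
from scipy.stats import f_oneway

hyperbolic_times = [38.2, 41.5, 36.9, 44.0]  # hypothetical participants
explorer_times = [39.7, 42.1, 37.5, 43.2]    # hypothetical participants

f_stat, p_value = f_oneway(hyperbolic_times, explorer_times)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
# A large p-value here would mirror the study's initial finding of
# no significant difference between the two browsers.
```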

Introduction of Information Scent

• Participants
– N = 48 Stanford University students and members of BayCHI were paid to answer our survey.
• Materials
– The questionnaire contained two questions for each of the 128 Browse-off tasks (a term, such as "Ebola virus", to be found in the tree). Along the left side of each page of the questionnaire was a tree diagram depicting the top four levels of the Browse-off tree data (the actual terms to be found were farther down in the tree).

Introduction of Information Scent

• Procedure
– For each of the 128 tasks, the instructions asked participants (1) to rate their familiarity with the term on a 7-point scale, and (2) to identify their top choices of categories for locating the answer to the task.

Introduction of Information Scent

Information Scent = the proportion of participants who correctly identified the location of the task answer from looking at upper branches in the tree.
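A minimal sketch of this definition as code, using hypothetical branch names and response counts rather than the actual Browse-off survey data.

```python
# Information scent for one task: the fraction of respondents whose chosen
# top-level branch matches the branch that actually contains the answer.
def information_scent(chosen_branches, correct_branch):
    hits = sum(1 for choice in chosen_branches if choice == correct_branch)
    return hits / len(chosen_branches)

# Hypothetical example: 48 respondents, 36 of whom picked the correct branch.
choices = ["Health"] * 36 + ["Science"] * 8 + ["Geography"] * 4
print(information_scent(choices, "Health"))  # 0.75
```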

Results Considering Information Scent
(charts comparing Hyperbolic and Windows Explorer)

Treemap vs. Sunburst (Stasko et al., IJHCS '00)

Space-filling hierarchical views: which is best?

Hierarchy Data

• Four datasets in total
• Used sample files and directories from their own systems (better than random)

Methodology

• N = 60 participants
• Each participant worked with only a small or a large hierarchy in a session
• Training at the start to learn the tool
• Order varied across participants

Tasks
• Identification (naming or pointing out) of a file based on size, specifically the largest and second largest files (Questions 1-2)
• Identification of a directory based on size, specifically the largest (Q3)
• Location (pointing out) of a file, given the entire path and name (Q4-7)
• Location of a file, given only the file name (Q8-9)
• Identification of the deepest subdirectory (Q10)
• Identification of a directory containing files of a particular type (Q11)
• Identification of a file based on type and size, specifically the largest file of a particular type (Q12)
• Comparison of two files by size (Q13)
• Location of two duplicated directory structures (Q14)
• Comparison of two directories by size (Q15)
• Comparison of two directories by number of files contained (Q16)

Hypotheses
• Treemap will be better for comparing file sizes
– Uses more of the area
• Sunburst will be better for searching files and understanding the structure
– More explicit depiction of structure
• Sunburst will be preferred overall

Result (Small Hierarchy Data)

Result (Large Hierarchy Data)

Result Analysis

• Ordering effect for Treemap on large hierarchies
– Participants did better after seeing Sunburst first
• Performance was relatively mixed; trends favored Sunburst, but not clear-cut

Subjective Performance

• Subjective preference: SB (51), TM (9), unsure (1)
• People felt that TM was better for size tasks (not borne out by the data)
• People felt that SB was better for determining which directories are inside others
– Identified it as being better for structure

Some Theory on Evaluation of Visualization

List of Events
• Journal issue whose special topic focus was Empirical Studies of Information Visualization
– International Journal of Human-Computer Studies, Nov. 2000, Vol. 53, No. 5
• BELIV workshop: Beyond Time and Errors: Novel Evaluation Methods for Visualization
– 2006, 2008, 2010, 2012, 2014; held at AVI, CHI, VisWeek

List of Papers
• Survey/overview papers
– C. Plaisant. The challenge of information visualization evaluation. In Proceedings of the Working Conference on Advanced Visual Interfaces (AVI), pages 109–116, New York, USA, 2004. ACM.
– S. Carpendale. Evaluating information visualizations. In A. Kerren, J. T. Stasko, J.-D. Fekete, and C. North, editors, Information Visualization: Human-Centered Issues and Perspectives, pages 19–45, Springer LNCS, Berlin/Heidelberg, 2007.
– T. Munzner. A nested process model for visualization design and validation. IEEE Transactions on Visualization and Computer Graphics, 15(6):921–928, 2009.
– H. Lam, E. Bertini, P. Isenberg, C. Plaisant, and S. Carpendale. Empirical studies in information visualization: Seven scenarios. IEEE Transactions on Visualization and Computer Graphics, vol. 18, pp. 1520–1536, Sep. 2012.
– …

Evaluation Method Taxonomy
• Principled Rationale
– Apply design heuristics, perceptual principles
• Informal User Study
– Have people use the visualization, observe the results
• Controlled Experiment
– Choose appropriate tasks / users to compare
– Choose metrics (time, error, what else?)
• Field Deployment or Case Studies
– Observation and interview
– Document effects on work practices
• Theoretical Analysis
– Algorithm time and space complexity
• Benchmarks (see the sketch below)
– Performance (e.g., interactive frame rates)
– Scalability to larger data sets
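A minimal sketch of the benchmark idea above: timing a stand-in layout routine on progressively larger synthetic data sets to gauge performance and scalability. The layout function here is a placeholder, not any particular visualization algorithm.

```python
# Scalability benchmark sketch: time a placeholder "layout" step as the
# data size grows by an order of magnitude each round.
import random
import time

def layout(points):
    # Stand-in for a real layout/render step (e.g., a treemap layout).
    return sorted(points)

for n in (1_000, 10_000, 100_000, 1_000_000):
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    layout(data)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"n = {n:>9,}: {elapsed_ms:.1f} ms")
```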

Desirable Factors in Evaluation
• Generalizability
• Precision
• Realism

Quantitative Evaluation Methods
• Laboratory experiments & studies
• Traditional empirical scientific experimental approach
• Steps

Quantitative Evaluation Challenges
• Conclusion Validity
– Is there a relationship?
• Internal Validity
– Is the relationship causal?
• Construct Validity
– Can we generalize to the ideas the study is based on?
• External Validity
– Can we generalize the study results to other people/places/times?
• Ecological Validity
– Does the experimental situation reflect the type of environment in which the results will be applied?

Qualitative Evaluation Methods
• Types: observation and interview
– Nested methods: experimenter observation, think-aloud protocol, collecting participant opinions
– Inspection evaluation methods: heuristics to judge
• Observational context
– In situ, laboratory, participatory
– Contextual interviews important

Qualitative Evaluation Challenges
• Sample sizes

• Subjectivity

• Analyzing qualitative data


Seven Scenarios of Evaluation on Visualization

• Meta-review: analysis of 850 Vis papers (361 with evaluation)
• Focus on evaluation scenarios

Seven Scenarios of Evaluation on Visualization

• Understanding data analysis
– Understanding environments and work practices (UWP)
– Evaluating visual data analysis and reasoning (VDAR)
– Evaluating communication through visualization (CTV)
– Evaluating collaborative data analysis (CDA)
• Understanding visualizations
– Evaluating user performance (UP)
– Evaluating user experience (UE)
– Evaluating visualization algorithms (VA)

Example: UWP

• Understanding environments and work practices
– Elicit formal requirements for design
– Study the people for whom a tool is being designed and the context of use
– Very few InfoVis papers on this topic

Tagging Each Paper

Trends

Counterexample


Summary
• Evaluation: definitions and examples
– Hyperbolic viewer vs. Windows Explorer
– Sunburst vs. Treemap
• Theory in evaluation of InfoVis
– Taxonomy of evaluations: quantitative vs. qualitative approaches
– Seven scenarios of empirical studies in InfoVis
• Counterexample
– Dynamic network visualization

Questions?

What's Next – Hot Topics in Information Visualization
