a hybrid tree-matrix visualization technique to support...
TRANSCRIPT
Context Visualisation Interaction Prototype Conclusion
Dendrogramix:a Hybrid Tree-Matrix Visualization Technique
to Support Interactive Exploration of Dendrograms
Renaud Blanch (IIHM) <[email protected]>Rémy Dautriche (IIHM) <[email protected]>
Gilles Bisson (AMA) <[email protected]>
Université Grenoble Alpes, CNRSLaboratoire d’Informatique de Grenoble
april 14, 2015<PacificVis 2015>
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Hierarchical agglomerative clustering
Hierarchical agglomerative clustering
ClassificationA greedy algorithm that recursively buildsa hierarchy of clusters:
one cluster per item;the two most similar clusters are agglomerated until thereis a single cluster left.
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Hierarchical agglomerative clustering
Hierarchical agglomerative clustering (cont.)
Inputs
a set of itemsa metric to measure the similarity of itemsa linkage, i.e. a generalization of this metric to groups ofitems
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Hierarchical agglomerative clustering
Example
ItemsPoints to classify according to their Euclidean distance.
CE
AB
D
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Hierarchical agglomerative clustering
Visual representation
DendrogramA dendrogram is the canonical representationof the clusters’ hierarchy.
B D E C A
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Hierarchical agglomerative clustering
Limitation
Items similarities are lost, no explanation for the clustering.Heatmaps are often juxtaposed to dendrograms but:
at the expense of screen space; andmaking items comparison difficult for distant items.
B D E C A
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Dendrogramix: visualisation
Dendrogramix
Dendrogramixbuilds a visualization by merging:
the hierarchy of clusters; andthe similarity matrix
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Dendrogramix: visualisation
Design
Similaritiesencoded as dot sizes.
A B C D EABCDE
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Dendrogramix: visualisation
Design (cont.)
Matrix reorderingusing optimal leaf order [BarJoseph et al., 2003].
B D E C ABDECA
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Dendrogramix: visualisation
Design (cont.)
Clusters encodingusing enclosure and luminance.
B D E C ABDECA
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Dendrogramix: visualisation
Design (cont.)
Dendrogramixtop half of the matrix, rotated.
B D E C A
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Dendrogramix: visualisation
Design (cont.)
Whole process Online Submission ID: 171
� � � � ������
� � � � ������
� � � � ������ � � � � � � � � � �
(a) (b) (c) (d) (e)
Figure 3: From similarity matrix to Dendrogramix: (a) graphic encoding of the similarity; (b) matrix reordered with an (optimal) order compatiblewith a traversal of the tree of clusters; (c) graphic encoding of clusters; (d) Dendrogramix compared to (e) classic dendrogram showing the samehierarchical clustering of the same data.
2 RELATED WORK
The clustering itself uses the well-known AHC method [16]. Den-drogramix is not the first attempt at showing together the clusteringresult and the details about items, but previous works in this areaall rely on the juxtaposition of two visualizations: the classical den-drogram and another visualization for the individual items. Den-drogramix is not a juxtaposition of two visualizations but rather amix of two visualizations. It is thus more close to recent works inthe InfoVis community about hybrid visualizations. The interactiontechniques provided by our Dendrogramix are also comparable torecent works focused on the interaction with visualizations, espe-cially interaction that exploit the structure of the data visualized toguide the user.
2.1 Hierarchical Clustering VisualizationSeveral visualization have been proposed to display the informationabout items together with the clustering result (see Wilkinson &Friendly survey [17] for a comprehensive overview of those tech-niques). The raw information on items can be shown with a heatmap [15] placed side by side with the dendrogram. Such visual-izations have been improved with interactions and overview+detailview to cope with the scaling issues encountered when dealing withlarge data sets [5].
Another way to display information about items is to show theirsimilarity matrix rather than their raw vectors, as proposed byGower & Digby [7]. It has the advantage of saving space whenthe item vectors have more dimensions than the number of items,as the similarity matrix is square and symmetric. It has also the ad-vantage of actually showing the similarity between items as seen bythe system, rather than letting the user reconstruct it from the visualsimilarity of the row vectors from the heat map which can be mis-leading. For those reasons we have also chosen to use the similaritymatrix as a starting point to provide information on items.
Our main contribution on the graphic representation is that theDendrogramix embeds the information about items into the visual-ization of the tree representing the clustering result.
2.2 Hybrid VisualizationsAnother way to cope with scaling issues is to use different repre-sentations for sub-parts of a data-set (e.g., node-link and treemapfor trees, as in elastic hierarchies [18], or node-link and adjacencymatrix for graphs, as in MatLink [8], NodeTrix [9], or TreeMa-trix [12]). The choice for a specific representation for a part ofthe data can be made interactively by the user, as in NodeTrix, ortake advantage of theoretical results about the space-efficiency ofvarious representations (e.g, for trees [10]) to automatically switchfrom a representation to another at various levels of aggregation.
Such hybrid visualizations have been used to display large den-drograms. In Stacked Trees [3], the classical dendrogram visual-ization is used for the largest clusters, but above an homogeneity
threshold the visualization switches to a stack of leaves rather thangoing on with the tree representation. This saves space at the ex-pense of loosing structural information.
Our Dendrogramix is also hybrid but in a different way: it com-bines the visualizations of two different data sets that do not havethe same structure: the input of the AHC —a matrix storing theitems similarity—, and the output of the AHC —a binary tree de-picting the clusters and their homogeneity.
2.3 Interaction TechniquesThe interaction techniques designed to explore the Dendrogramixfollow the principles of direct manipulation [14], especially the factthat they should be rapid, reversible, and incremental. The mainchallenge here is that the data structures manipulated are inherentlydiscrete. Making the interaction continuous requires the use of an-imated transitions controlled either by the system or by the userto switch between coherent states of the visualization. ZoomableTreemaps [4] is a good example of previous work that providesa continuous interaction with the discrete data structure of a treevisualized as a treemap. We borrowed from Zoomable Treemapsthe idea of using a crossing-based interaction [1] to perform an ad-vanced selection (of two clusters simultaneously in our case; of anarbitrary internal node of the treemap in their case).
Most of the interaction techniques proposed by Dendrogramixfit in the framework of interactions with hierarchical aggregationproposed by Elmqvist & Fekete [6]. However, they do not considerinteractions altering the layout (such as node reordering) in theirframework.
3 DENDROGRAMIX
A Dendrogramix is an hybrid visualization that mixes the similar-ity matrix of items with the binary tree of clusters resulting fromtheir AHC. We first describe how the visualization is built and whatkind of observations can be made using it. We then describe theinteraction techniques that allow its exploration.
3.1 VisualizationFigure 2.a shows the data set —five points of the plan— used belowto illustrate how a Dendrogramix is built. Figure 2.b shows the re-sult of an AHC of this data set using the Euclidean distance to mea-sure the similarity between two points, and the minimum distancebetween the elements of two clusters to generalize this similaritymeasure to clusters (i.e., the single linkage method).
3.1.1 ConstructionUsing the tree of clusters and the similarity matrix as inputs, theDendrogramix is built:
1. by encoding the similarity matrix using the size of the circulardots to denote similarity (large dots means similar items) —Figure 3.a;
2
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Dendrogramix: visualisation
Dendrogramix
Vis Community
Paolo
Cig
noni
Mic
hael G
arl
and
Ala
rk Josh
iJa
rke J.
van W
ijkD
iete
r Sch
mals
tieg
Ale
xander
Lex
Marc
Str
eit
Just
in T
alb
ot
Pat
Hanra
han
Chi-W
ing F
uA
ndre
w H
anso
nFr
ank
van H
am
Fern
anda B
. V
iégas
Mart
in W
att
enberg
Robert
Kosa
raR
em
co C
hang
Will
iam
Rib
ars
kyM
ing D
ong
Jing H
ua
Xia
nfe
ng G
uA
rie K
aufm
an
Kla
us
Muelle
rPa
tric
Lju
ng
Anders
Pers
son
Cla
es
Lundst
röm
Anders
Ynnerm
an
Tim
o R
opin
ski
Kla
us
Hin
rich
sZ
hic
heng L
iuJo
hn S
task
oB
ohyoung K
imJin
wook
Seo
Bongsh
in L
ee
Georg
e R
obert
son
Danyel Fi
sher
Tim
Dw
yer
Petr
a Ise
nberg
Sheela
gh C
arp
endale
Chri
stopher
Colli
ns
Anast
asi
a B
eze
rianos
Jean-D
anie
l Fe
kete
Nath
alie
Henry
Ric
he
Mic
hael J. M
cGuff
inM
aneesh
Agra
wala
Jeff
rey H
eer
Mic
hael B
ost
ock
Mela
nie
Tory
Ivan V
iola
Bart
ter
Haar
Rom
eny
Anna V
ilanova
Marc
el B
reeuw
er
Javie
r O
liván B
esc
ós
Eduard
Grö
ller
Ste
fan B
ruck
ner
Chri
stoph H
ein
zlJü
rgen W
ase
rR
aphael Fu
chs
Benja
min
Sch
indle
rR
onald
Peik
ert
Thom
as
Ert
lSebast
ian G
rott
el
Mic
hael B
urc
hD
anie
l W
eis
kopf
Tors
ten M
olle
rA
lireza
Ente
zari
Charl
es
D.
Hanse
nR
obert
Kin
caid
Tam
ara
Munzn
er
Hansp
ete
r Pfist
er
Mir
iah M
eyer
Robert
Mik
e K
irby
Ross
T.
Whit
ake
rW
on-K
i Je
ong
Jens
Sch
neid
er
Rüdig
er
West
erm
ann
Denis
Gra
canin
Kre
sim
ir M
atk
ovic
Helw
ig H
ause
rM
ark
us
Hadw
iger
Helm
ut
Dole
isch
Phili
pp M
uig
gB
ern
hard
Pre
imC
hri
stophe H
urt
er
Fern
ando V
. Pa
ulo
vic
hLu
is G
ust
avo N
onato
Clá
udio
T.
Silv
aC
arl
os
E.
Sch
eid
egger
Julia
na F
reir
eH
am
ish C
arr
Ale
xander
Wie
bel
Geri
k Sch
euerm
ann
Heik
e Jänic
keM
in C
hen
Peer-Ti
mo B
rem
er
Vale
rio P
asc
ucc
iA
ttila
Gyula
ssy
Vija
y N
ata
raja
nB
ern
d H
am
ann
Ingri
d H
otz
Hans
Hagen
Chri
stoph G
art
hKenneth
I.
Joy
Xavie
r Tr
icoch
eC
arl
-Fre
dri
k W
est
inG
ord
on L
. K
indlm
ann
Thom
as
Sch
ult
zH
ans-
Pete
r Seid
el
Holg
er
Theis
el
Tino W
ein
kauf
Hans-
Chri
stia
n H
ege
Teng-Y
ok
Lee
Han-W
ei Shen
Chaoli
Wang
Kw
an-L
iu M
aC
arl
os
D.
Corr
ea
Pete
r Li
ndst
rom
Robert
Moorh
ead
Song Z
hang
David
S.
Ebert
Wei C
hen
Min
g-Y
uen C
han
Yingca
i W
uH
uam
in Q
uShix
ia L
iuW
eiw
ei C
ui
Hong Z
hou
Xia
oru
Yuan
Jian H
uang
Frit
s H
. Po
stC
harl
P.
Both
aR
obert
Lara
mee
Eugene Z
hang
Adit
i M
aju
mder
Eze
kiel S.
Bhask
er
Aid
an S
lingsb
yJo
Wood
Jaso
n D
yke
sG
regory
Cip
riano
Mic
hael G
leic
her
David
H.
Laid
law
Danie
l A
. Keim
Ben S
hneid
erm
an
David
Thom
pso
n
People are represented as the vector of their coauthors.Similarities are computed using the cosine similarityand single linkage.
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Dendrogramix: interactions
Items Comparison
Single item highlight
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Dendrogramix: interactions
Items Comparison (cont.)
Pairwise items comparison
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Dendrogramix: interactions
Partition Selection
Partition selection
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Dendrogramix: interactions
Cluster Aggregation
Cluster Labeling
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Dendrogramix: interactions
Cluster Aggregation (cont.)
Cluster folding
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Dendrogramix: interactions
Tree Reordering
Reordering by direct manipulation
R. Blanch et al.
fils gauche (resp. droit) de sa classe parente, son glisser vers la droite (resp. gauche) est simple,il correspond à une permutation avec son frère. Dans le cas contraire, il faut remonter dansl’arbre pour trouver la première classe ancêtre pour laquelle la classe courante est incluse dansle fils gauche (resp. droit) et permuter alors les fils de celle-ci.
(a) (b) (c) (d)
FIG. 4 – Interaction de glisser-déposer pour permuter des classes, (a) état initial, (b) et (c)états intermédiaires, (d) état final après permutation.
Durant l’interaction (Figure 4 (b) et (c)), les points donnant l’information de similaritéindividuelle ne sont pas affichés pour des raisons de performance. La position de la classemanipulée est contrainte pour rester dans son parent et ne pas empiéter sur son frère. Sondéplacement prend donc la forme d’un saut : elle remonte vers la racine de son parent ; quandelle l’atteint, son frère passe au-dessous ; et enfin, elle peut redescendre de l’autre côté. Sil’utilisateur interrompt son glisser-déposer avant son terme, la fin du déplacement est animée :la classe revient à son point de départ si elle n’était pas encore passée par dessus son frère(Figure 4 (b)) ; sinon (Figure 4 (c)), elle poursuit son chemin de l’autre côté.
2.2.3 Agrégation de classes
Les classes peuvent être annotées : l’appui sur la touche entrée passe en mode édition dela classe courante, ce qui permet d’entrer ou d’éditer une étiquette pour cette classe. Cetteétiquette est affichée à proximité de la racine de la classe (Figure 5 (a), la classe courante de 5individus a été étiquetée “MSR”). Lorsqu’on clique sur une classe, elle est repliée et l’étiquetteest alors utilisée pour la montrer parmi les individus (Figure 5 (b)). Les lignes et colonnes
(a) (b)
FIG. 5 – Agrégation de classes : (a) étiquettage d’une classe ; et (b) repliement de cette classe.
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Prototype
Demo
demo
time for a demo!
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Prototype
Implementation
Portable open source code available at:<http://iihm.imag.fr/blanch/projects/dendrogramix/>.
Software used:
PythonOpenGL/GLUTscipy
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Prototype
Evaluation
Insight-based evaluation [North et al., 2011]
data set: 189 permanent researchers of the lab,characterized by the 2688 co-authorsof their 3245 publications over the last 4 years.participants: 6 senior researchers from the lab (deputydirector or group heads).two visualizations: classical dendrogramvs. Dendrogramix.
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Prototype
Evaluation (cont.)
Dendrogram
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Prototype
Evaluation (cont.)
Dendrogramix
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Prototype
Results
Insights and Value
CD DX commoncategory count value count value count
overview 6 6 7 7 5patterns 3 8 14 45 3
groups 14 34 17 43 14details 13 19 27 67 13
total 36 67 65 162 35per insight 1.86 2.49per minute 0.30 0.56 0.54 1.35
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Conclusion
Contributions
Dendrogramix is an alternative to dendrograms,
showing items similarities; and
allowing interactive exploration.
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Conclusion
Future work
Plenty of possibilities,e.g. combination with heatmaps.
Ge?ng%hints%about%(coB)clustering%
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Conclusion
Thank you!Pa
olo
Cig
noni
Mic
hael G
arl
and
Ala
rk Josh
iJa
rke J.
van W
ijkD
iete
r Sch
mals
tieg
Ale
xander
Lex
Marc
Str
eit
Just
in T
alb
ot
Pat
Hanra
han
Chi-W
ing F
uA
ndre
w H
anso
nFr
ank
van H
am
Fern
anda B
. V
iégas
Mart
in W
att
enberg
Robert
Kosa
raR
em
co C
hang
Will
iam
Rib
ars
kyM
ing D
ong
Jing H
ua
Xia
nfe
ng G
uA
rie K
aufm
an
Kla
us
Muelle
rPa
tric
Lju
ng
Anders
Pers
son
Cla
es
Lundst
röm
Anders
Ynnerm
an
Tim
o R
opin
ski
Kla
us
Hin
rich
sZ
hic
heng L
iuJo
hn S
task
oB
ohyoung K
imJin
wook
Seo
Bongsh
in L
ee
Georg
e R
obert
son
Danyel Fi
sher
Tim
Dw
yer
Petr
a Ise
nberg
Sheela
gh C
arp
endale
Chri
stopher
Colli
ns
Anast
asi
a B
eze
rianos
Jean-D
anie
l Fe
kete
Nath
alie
Henry
Ric
he
Mic
hael J. M
cGuff
inM
aneesh
Agra
wala
Jeff
rey H
eer
Mic
hael B
ost
ock
Mela
nie
Tory
Ivan V
iola
Bart
ter
Haar
Rom
eny
Anna V
ilanova
Marc
el B
reeuw
er
Javie
r O
liván B
esc
ós
Eduard
Grö
ller
Ste
fan B
ruck
ner
Chri
stoph H
ein
zlJü
rgen W
ase
rR
aphael Fu
chs
Benja
min
Sch
indle
rR
onald
Peik
ert
Thom
as
Ert
lSebast
ian G
rott
el
Mic
hael B
urc
hD
anie
l W
eis
kopf
Tors
ten M
olle
rA
lireza
Ente
zari
Charl
es
D.
Hanse
nR
obert
Kin
caid
Tam
ara
Munzn
er
Hansp
ete
r Pfist
er
Mir
iah M
eyer
Robert
Mik
e K
irby
Ross
T.
Whit
ake
rW
on-K
i Je
ong
Jens
Sch
neid
er
Rüdig
er
West
erm
ann
Denis
Gra
canin
Kre
sim
ir M
atk
ovic
Helw
ig H
ause
rM
ark
us
Hadw
iger
Helm
ut
Dole
isch
Phili
pp M
uig
gB
ern
hard
Pre
imC
hri
stophe H
urt
er
Fern
ando V
. Pa
ulo
vic
hLu
is G
ust
avo N
onato
Clá
udio
T.
Silv
aC
arl
os
E.
Sch
eid
egger
Julia
na F
reir
eH
am
ish C
arr
Ale
xander
Wie
bel
Geri
k Sch
euerm
ann
Heik
e Jänic
keM
in C
hen
Peer-Ti
mo B
rem
er
Vale
rio P
asc
ucc
iA
ttila
Gyula
ssy
Vija
y N
ata
raja
nB
ern
d H
am
ann
Ingri
d H
otz
Hans
Hagen
Chri
stoph G
art
hKenneth
I.
Joy
Xavie
r Tr
icoch
eC
arl
-Fre
dri
k W
est
inG
ord
on L
. K
indlm
ann
Thom
as
Sch
ult
zH
ans-
Pete
r Seid
el
Holg
er
Theis
el
Tino W
ein
kauf
Hans-
Chri
stia
n H
ege
Teng-Y
ok
Lee
Han-W
ei Shen
Chaoli
Wang
Kw
an-L
iu M
aC
arl
os
D.
Corr
ea
Pete
r Li
ndst
rom
Robert
Moorh
ead
Song Z
hang
David
S.
Ebert
Wei C
hen
Min
g-Y
uen C
han
Yingca
i W
uH
uam
in Q
uShix
ia L
iuW
eiw
ei C
ui
Hong Z
hou
Xia
oru
Yuan
Jian H
uang
Frit
s H
. Po
stC
harl
P.
Both
aR
obert
Lara
mee
Eugene Z
hang
Adit
i M
aju
mder
Eze
kiel S.
Bhask
er
Aid
an S
lingsb
yJo
Wood
Jaso
n D
yke
sG
regory
Cip
riano
Mic
hael G
leic
her
David
H.
Laid
law
Danie
l A
. Keim
Ben S
hneid
erm
an
David
Thom
pso
n
Thank you for your attention!<http://iihm.imag.fr/blanch/projects/dendrogramix/>
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix
Context Visualisation Interaction Prototype Conclusion
Conclusion
References
• Bar-Joseph, Demaine, Gifford, Hamel, Jaakkola, Srebro. K-ary clustering withoptimal leaf ordering for gene expression data. Bioinformatics 19(9), 1070–8. 2003.
• North, Saraiya, Duca. A comparison of benchmark task and insight evaluationmethods for information visualization.Information Visualization 10(3), 162–181. 2011.
[email protected], [email protected], [email protected] UGA, CNRS
Dendrogramix