Visual analytics talk at ISMB 2013
Visual Analytics - The human back in the loop
Jan Aerts
Biodata Analysis and Visualization
Stadius Group, ESAT
Leuven University
[email protected]
@jandot
http://orcid.org/0000-0002-6416-2717
hypothesis-driven -> data-driven
Scientific Research Paradigms (Jim Gray, Microsoft)
I have a hypothesis -> need to generate data to (dis)prove it.
I have data -> need to find hypotheses that I can test.
1st (1,000s of years ago): empirical
2nd (100s of years ago): theoretical
3rd (last few decades): computational
4th (today): data exploration
What does this mean?
• immense re-use of existing datasets
• much of initial analysis is exploratory in nature
• biologically interesting signals may be too poorly understood to be analyzed in automated fashion
• visualization is very effective in facilitating human reasoning about complex data
• automated algorithms often act as black boxes => biologists must have blind faith in the bioinformatician (and the bioinformatician in his/her own skills)
What is visualization?
T. Munzner
Data visualization framework
interactivity
visual analytics infographics
“visual analytics”
• Types of interaction (Yi et al, IEEE Transactions on Visualization and Computer Graphics, 2007)
• select -> mark something as interesting
• explore -> show me something else
• reconfigure -> show me a different arrangement
• encode -> show me a different representation
• abstract/elaborate -> show me less/more detail
• filter -> show me something conditionally
• connect -> show me connected items
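A minimal sketch of how three of these interaction types (select, filter, reconfigure) could be held as state in an interactive view. The `View` class, its method names, and the gene records are hypothetical and made up for illustration; they are not taken from Yi et al. or from any tool in this talk.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class View:
    """State of a hypothetical interactive view over a list of dict records."""
    records: list
    selected: set = field(default_factory=set)   # select: marked as interesting
    predicate: Optional[Callable] = None         # filter: condition to satisfy
    sort_key: Optional[str] = None               # reconfigure: arrangement

    def select(self, index: int) -> None:
        self.selected.add(index)

    def filter(self, predicate: Callable) -> None:
        self.predicate = predicate

    def reconfigure(self, sort_key: str) -> None:
        self.sort_key = sort_key

    def visible(self) -> list:
        """Return (index, record, is_selected) tuples for what would be drawn."""
        rows = [(i, r) for i, r in enumerate(self.records)
                if self.predicate is None or self.predicate(r)]
        if self.sort_key is not None:
            rows.sort(key=lambda row: row[1][self.sort_key])
        return [(i, r, i in self.selected) for i, r in rows]


# Made-up records, for illustration only.
view = View([
    {"gene": "g1", "expression": 2.5},
    {"gene": "g2", "expression": 0.4},
    {"gene": "g3", "expression": 1.7},
])
view.select(0)                                   # mark g1 as interesting
view.filter(lambda r: r["expression"] > 1.0)     # show records conditionally
view.reconfigure("expression")                   # show a different arrangement
shown = view.visible()
```

Keeping the interaction state separate from the records means each interaction can be undone or combined with the others without touching the underlying data.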
Visualization for biological hypothesis generation
• example: eQTL data (IEEE BioVis visualization challenge 2011)
• 500 patients (affected + non-affected)
• 7500 SNPs; gene expression data for 15 genes
• PLINK one-locus/two-locus
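To give a feel for the size of the search space, the arithmetic below counts the tests for the numbers on this slide. It assumes every SNP (one-locus) or every unordered SNP pair (two-locus) is tested against every gene; that exhaustive design is an assumption for illustration, not something the slide states.

```python
from math import comb

n_snps = 7500    # from the slide
n_genes = 15     # from the slide

# Assumption: exhaustive testing of every SNP, or SNP pair, against every gene.
one_locus_tests = n_snps * n_genes
two_locus_tests = comb(n_snps, 2) * n_genes

print(one_locus_tests)   # 112500
print(two_locus_tests)   # 421818750
```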
Aracari
Ryo Sakai
Bartlett C et al. BMC Bioinformatics (2012)
Reveal
Jäger G et al. Bioinformatics (2012)
HiTSee
Bertini E et al. IEEE Symposium on Biological Data Visualization (2011)
when do I know that my algorithm is “correct”? -> peek into the black box
input
filter 1
filter 2
output A
filter 3
output B output C
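One way to "peek into the black box" is to keep every intermediate result instead of only the final output, so each step can be inspected or plotted. The sketch below assumes a simple linear chain of filters; the diagram on the slide branches into several outputs, and the filter conditions and scores here are hypothetical.

```python
def run_pipeline(data, filters):
    """Apply named filters in order, keeping every intermediate result."""
    current = list(data)
    trace = [("input", current)]
    for name, keep in filters:
        current = [x for x in current if keep(x)]
        trace.append((name, current))
    return current, trace


# Hypothetical filters applied to made-up scores.
filters = [
    ("filter 1", lambda score: score >= 10),
    ("filter 2", lambda score: score % 2 == 0),
]
output, trace = run_pipeline([4, 12, 15, 18, 30, 7], filters)

# The trace shows how many items survive each step, and which ones.
for name, values in trace:
    print(f"{name:>8}: {len(values)} items {values}")
```

The trace is what a visualization would draw: a sudden drop in item count after one filter is an early hint that a parameter or an assumption needs a second look.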
Visualization for algorithm development
[Figure: panels labelled A, B and C]
Caleydo MatchMaker
Lex A et al. IEEE Transactions on Visualization and Computer Graphics (2010)
Meander
Pavlopoulos et al. Nucl Acids Res (2013)
Georgios Pavlopoulos
ParCoord
Boogaerts T et al. IEEE International Conference on Bioinformatics & Bioengineering (2012)
Thomas Boogaerts
Endeavour gene prioritization
Visualization for (live) interaction with analysis
• alternating between visual and automatic methods -> continuous refinement and verification of preliminary results
• misleading results: discovered at early stage
• leverage user’s (biologist’s) insights
• no black box
Cytoscape
Smoot et al. Bioinformatics (2011)
Data filtering (visual parameter setting)
TrioVis
Ryo Sakai
Sakai R et al. Bioinformatics (2013)
User-guided analysis
Spark
Nielsen et al. Genome Research (2012)
clustering
chromatin modification
DNA methylation
RNA-Seq
data samples
regions of interest
BaobabView
van den Elzen S & van Wijk J. IEEE Conference on Visual Analytics Science and Technology (2011)
decision trees
Galaxy Trackster
Goecks J et al. Nature Biotechnology (2012)
Bret Victor - Ladder of abstraction
Many challenges remain
• scalability (data processing + perception), uncertainty, “interestingness”, interaction, evaluation
• infrastructure & architecture
• fast imprecise answers with progressive refinement
• incremental re-computation
• steering computation towards data regions of interest
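A minimal sketch of "fast imprecise answers with progressive refinement" combined with incremental re-computation: a running mean that reports an estimate after every chunk and only processes the new chunk each time. The function name, chunk size, and data are hypothetical and chosen for illustration.

```python
def progressive_mean(values, chunk_size):
    """Yield (items_seen, running_mean) after each chunk of values."""
    total = 0.0
    seen = 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)      # incremental: only the new chunk is processed
        seen += len(chunk)
        yield seen, total / seen


data = [3, 5, 4, 8, 6, 4, 5, 5]   # made-up measurements
estimates = list(progressive_mean(data, chunk_size=2))

for seen, mean in estimates:
    print(f"after {seen} items: mean ~ {mean:.2f}")
```

A view can redraw after every yielded estimate, so the user sees a rough answer immediately and can stop or redirect the computation early. For the early estimates to be representative, the data would normally be processed in random order, which this sketch does not do.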
Acknowledgments
• Bioinformatics Group at Stadius, Leuven University
• in particular: Ryo Sakai, Georgios Pavlopoulos
• visualization community for examples
• Jeremy for Trackster video