turning big data and text collections into web resources
Lars Juhl Jensen
three parts
data integration
text mining
interface design
data integration
association networks
guilt by association
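Guilt by association means that a poorly characterised gene is likely to share the functions of the genes it is associated with. The sketch below shows the idea as a simple neighbour vote on a toy network. The function name, the `min_votes` threshold, the genes and the annotations are all hypothetical, and this is not how STRING itself assigns function.

```python
from collections import Counter


def predict_annotations(network, annotations, gene, min_votes=2):
    """Guilt by association: propose labels for `gene` that at least
    `min_votes` of its direct neighbours carry and that it lacks itself."""
    votes = Counter()
    for neighbour in network.get(gene, set()):
        votes.update(annotations.get(neighbour, set()))
    known = annotations.get(gene, set())
    return {label for label, n in votes.items() if n >= min_votes and label not in known}


# Hypothetical toy association network (undirected, stored in both directions).
network = {}
for a, b in [("X", "A"), ("X", "B"), ("X", "C"), ("A", "B")]:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

# Hypothetical annotations; X is the uncharacterised gene.
annotations = {"A": {"cell cycle"}, "B": {"cell cycle", "DNA repair"}, "C": {"DNA repair"}}

print(sorted(predict_annotations(network, annotations, "X")))  # ['DNA repair', 'cell cycle']
```

Requiring more than one vote trades coverage for precision: a single neighbour is weak evidence, whereas several neighbours agreeing is what makes the association informative.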
STRING
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
computational predictions
gene fusion
Korbel et al., Nature Biotechnology, 2004
experimental data
physical interactions
Jensen & Bork, Science, 2008
curated knowledge
metabolic pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
many databases
different formats
different identifiers
variable quality
not comparable
hard work
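Much of that hard work is mapping every source onto one set of identifiers before anything can be merged. The sketch below assumes a toy alias lexicon and two hypothetical databases with different naming and score scales. It keeps the raw scores per source rather than adding them up, because they are not comparable, and it reports names it cannot map instead of guessing.

```python
def build_alias_map(lexicon):
    """Map every name (case-insensitively) to a single canonical identifier."""
    alias_map = {}
    for canonical, aliases in lexicon.items():
        for alias in (canonical, *aliases):
            alias_map[alias.lower()] = canonical
    return alias_map


def integrate(sources, alias_map):
    """Put (name_a, name_b, score) records from several databases onto
    canonical identifiers. Scores stay per source because they are not
    comparable; names that cannot be mapped are reported, not guessed."""
    merged, unmapped = {}, set()
    for source, records in sources.items():
        for a, b, score in records:
            ids = (alias_map.get(a.lower()), alias_map.get(b.lower()))
            if None in ids:
                unmapped.update(name for name, i in zip((a, b), ids) if i is None)
                continue
            pair = tuple(sorted(ids))  # undirected: one key per protein pair
            merged.setdefault(pair, {})[source] = score
    return merged, unmapped


# Toy lexicon and two hypothetical sources that use different names and score scales.
lexicon = {"CDK1": ["CDC2", "cyclin dependent kinase 1"], "CCNB1": ["cyclin B1"]}
sources = {
    "db_one": [("CDC2", "CCNB1", 0.9)],
    "db_two": [("cdk1", "cyclin B1", 750), ("CDK1", "XYZ", 10)],
}
merged, unmapped = integrate(sources, build_alias_map(lexicon))
print(merged)    # {('CCNB1', 'CDK1'): {'db_one': 0.9, 'db_two': 750}}
print(unmapped)  # {'XYZ'}
```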
quality scores
von Mering et al., Nucleic Acids Research, 2005
calibrate vs. gold standard
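Calibrating against a gold standard turns each source's raw, incomparable scores into a common scale: the probability that a pair with that score is correct. The sketch below does this by binning, which is only the simplest way to do it and not necessarily the procedure used in the cited paper. The scores, the gold-standard pairs and the bin edges are hypothetical.

```python
from bisect import bisect_right


def calibrate(raw_scores, gold_pairs, bin_edges):
    """Estimate, for each score bin, the fraction of pairs that are in the
    gold standard; that fraction is the calibrated confidence for the bin."""
    hits = [0] * (len(bin_edges) + 1)
    totals = [0] * (len(bin_edges) + 1)
    for pair, score in raw_scores.items():
        b = bisect_right(bin_edges, score)  # index of the bin this score falls in
        totals[b] += 1
        hits[b] += pair in gold_pairs
    return [h / t if t else None for h, t in zip(hits, totals)]


def calibrated_score(score, bin_edges, precision):
    """Replace a raw, source-specific score by the calibrated value of its bin."""
    return precision[bisect_right(bin_edges, score)]


# Hypothetical raw scores from one source and a hypothetical gold standard.
raw = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "D"): 0.7,
       ("B", "C"): 0.2, ("A", "D"): 0.15, ("C", "D"): 0.1}
gold = {("A", "B"), ("A", "C"), ("C", "D")}
edges = [0.5]  # two bins: below 0.5, and 0.5 or above
precision = calibrate(raw, gold, edges)
print(precision)                                 # [0.3333333333333333, 0.6666666666666666]
print(calibrated_score(0.95, edges, precision))  # 0.6666666666666666
```

Once every source is calibrated this way, scores from different databases mean the same thing and can be compared directly.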
missing most of the data
text mining
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
cyclin dependent kinase 1
CDC2
expansion rules
flexible matching
cyclin dependent kinase 1
cyclin-dependent kinase 1
CDC2
hCdc2
“black list”
SDS
proteins
small molecules
compartments
tissues
diseases
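The named entity recognition steps above (lexicon, expansion rules, flexible matching, black list) can be sketched as a dictionary-based tagger. This is a minimal illustration, not the Jensen lab tagger: the single expansion rule (adding an "h" prefix to symbols such as CDC2), the normalization, the greedy longest match and the toy lexicon are all assumptions.

```python
import re


def normalize(text):
    """Flexible matching: ignore case and punctuation, so 'cyclin-dependent
    kinase 1' and 'Cyclin dependent kinase 1' give the same key."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def expand(name):
    """Hypothetical expansion rule: a symbol like CDC2 may also be written
    with a species prefix, e.g. hCdc2 for the human protein."""
    variants = {name}
    if re.fullmatch(r"[A-Za-z]+[0-9]+", name):
        variants.add("h" + name)
    return variants


def build_dictionary(lexicon, blacklist):
    """Map normalized name variants to entity identifiers, minus blacklisted names."""
    blocked = {normalize(name) for name in blacklist}
    dictionary = {}
    for entity, names in lexicon.items():
        for name in names:
            for variant in expand(name):
                key = normalize(variant)
                if key not in blocked:
                    dictionary[key] = entity
    return dictionary


def tag(text, dictionary, max_tokens=6):
    """Greedy longest-match tagging; returns (matched text, entity) pairs."""
    tokens = list(re.finditer(r"[A-Za-z0-9]+", text))
    matches, i = [], 0
    while i < len(tokens):
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            key = normalize(" ".join(t.group() for t in tokens[i:i + n]))
            if key in dictionary:
                span = text[tokens[i].start():tokens[i + n - 1].end()]
                matches.append((span, dictionary[key]))
                i += n
                break
        else:  # no name starts at this token
            i += 1
    return matches


# Toy lexicon; SDS is blacklisted as on the slide, since in text it is rarely the gene.
lexicon = {"CDK1": ["CDK1", "cyclin dependent kinase 1", "CDC2"],
           "SDS": ["SDS", "serine dehydratase"]}
dictionary = build_dictionary(lexicon, blacklist=["SDS"])
print(tag("Cyclin-dependent kinase 1 (hCdc2) was run on an SDS gel.", dictionary))
# [('Cyclin-dependent kinase 1', 'CDK1'), ('hCdc2', 'CDK1')]
```

Blacklisting a name removes only that string; the entity can still be found through its other names, here "serine dehydratase".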
information extraction
count co-mentioning
within documents
within paragraphs
within sentences
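Counting co-mentions within documents, paragraphs and sentences can be sketched as a weighted sum over the three levels. The weights below are placeholders chosen for illustration, not values from the talk, and the input format (documents as nested lists of sentence-level entity sets, e.g. the output of a tagger) is an assumption. Real scoring would also need to normalize for how often each entity is mentioned overall, which is not shown.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder weights (an assumption, not taken from the talk). The levels are
# cumulative: a pair in the same sentence also counts for its paragraph and document.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 0.2}


def pairs(entities):
    """All unordered entity pairs, each written in sorted order."""
    return set(combinations(sorted(set(entities)), 2))


def co_mention_scores(documents, weights=WEIGHTS):
    """documents: list of documents; a document is a list of paragraphs, a
    paragraph is a list of sentences, a sentence is a set of tagged entities.
    Each pair counts at most once per level per document."""
    scores = defaultdict(float)
    for doc in documents:
        sentences = [s for paragraph in doc for s in paragraph]
        found = {
            "document": pairs(e for s in sentences for e in s),
            "paragraph": set().union(*(pairs(e for s in p for e in s) for p in doc)),
            "sentence": set().union(*(pairs(s) for s in sentences)),
        }
        for level, level_pairs in found.items():
            for pair in level_pairs:
                scores[pair] += weights[level]
    return dict(scores)


# Two hypothetical tagged documents.
documents = [
    [[{"CDK1", "CCNB1"}, {"CDK1"}], [{"CCNB1"}, {"TP53"}]],
    [[{"CDK1"}, {"TP53"}]],
]
print(sorted(co_mention_scores(documents).items()))
# [(('CCNB1', 'CDK1'), 3.2), (('CCNB1', 'TP53'), 3.0), (('CDK1', 'TP53'), 4.0)]
```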
corpora
~22 million abstracts
no access
~4 million full-text articles
interface design
ease of use
web resources
simple search interface
complex relational database
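A simple search interface can sit on top of a complex relational database: the user types any name, and the back end resolves it to an entity and returns that entity's associations. The sketch below uses Python's built-in `sqlite3` only to stay self-contained. The schema, table names and association scores are hypothetical and do not describe the actual jensenlab.org databases.

```python
import sqlite3


def make_db():
    """A tiny relational back end: entities, their aliases, and associations."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE entities (id TEXT PRIMARY KEY, type TEXT);
        CREATE TABLE aliases (alias TEXT, entity_id TEXT REFERENCES entities(id));
        CREATE TABLE associations (entity_id TEXT REFERENCES entities(id),
                                   target TEXT, score REAL);
    """)
    db.execute("INSERT INTO entities VALUES ('CDK1', 'protein')")
    db.executemany("INSERT INTO aliases VALUES (?, 'CDK1')",
                   [("cdk1",), ("cdc2",), ("cyclin dependent kinase 1",)])
    # Hypothetical association scores, for illustration only.
    db.executemany("INSERT INTO associations VALUES ('CDK1', ?, ?)",
                   [("nucleus", 4.5), ("cytosol", 3.9)])
    return db


def search(db, query):
    """The whole interface is one text box: resolve the typed name to an
    entity, then return that entity's associations, best first."""
    return db.execute(
        """SELECT a.entity_id, s.target, s.score
           FROM aliases AS a
           JOIN associations AS s ON s.entity_id = a.entity_id
           WHERE a.alias = ?
           ORDER BY s.score DESC""",
        (query.strip().lower(),),
    ).fetchall()


db = make_db()
print(search(db, " CDC2 "))   # [('CDK1', 'nucleus', 4.5), ('CDK1', 'cytosol', 3.9)]
print(search(db, "unknown"))  # []
```

The joins stay hidden behind a single query string, so users never need to know how the tables are organised.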
attractiveness
data visualization
STRING
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
payload
compartments.jensenlab.org
COMPARTMENTS
compartments.jensenlab.org
TISSUES
tissues.jensenlab.org
provenance
evidence viewers
DISEASES
reusability
web services
download files
open licenses
Acknowledgments
Protein networks
Christian von Mering, Damian Szklarczyk
Michael Kuhn, Manuel Stark
Samuel Chaffron, Chris Creevey
Jean Muller, Tobias Doerks, Philippe Julien
Alexander Roth, Milan Simonovic
Jan Korbel, Berend Snel
Martijn Huynen, Peer Bork
Literature mining
Sune Frankild, Evangelos Pafilis, Janos Binder, Kalliopi Tsafou, Alberto Santos, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider, Sean O’Donoghue