Pharos: Shining Light on the Druggable Genome
Dac Trung Nguyen, Timothy Sheils, Geetha Mandava, Ajit Jadhav, Noel Southall, Rajarshi Guha
NCATS, NIH
2016 ACS Fall Meeting, Philadelphia
The interface to the KMC
• Entity browsing (filterable & linked)
• Search (full text, auto-suggest)
• Detailed view of entities
• Built on top of a robust REST API
Target Audience
Biologists & Clinical Researchers
• Characterize & validate novel targets
• Identify key small molecules or biologics
Informatics Scientists
• Data mining
• Support target validation projects
Program Staff
• Explore the research landscape
• New directions for research & funding
Infrastructure
• Built using industry standard tools
• Open Source, straightforward to run locally
• Sources at https://spotlite.nih.gov/ncats/pharos
What’s Included?
• Pharos presents data from a variety of sources, integrated by U. New Mexico
• Primary focus is the protein target
• Target related data include
– Identifiers, ontology terms, sequence, expression data, publications (curated & text mined)
• Wherever possible, targets are linked to other entities
– Small molecules, Diseases, Publications
The Data Sources
Antibodypedia.com, BioPlex, Druggable Epigenome Domains, DrugCentral, Ensembl Cross References, GO Consortium, GTEx, GWAS Catalog, HGNC, HPA, HPM, IMPC, AnimalTFDB, JAX/MGI, Panther, PubChem, PubMed, NCBI Gene, NIH RePORTER, OMIM, TIN-X, UniProt, Harmonizome, DISEASES, TISSUES, DTO, CHEMBL
Interactions inside & outside the IDG
Drug Target Ontology
• Employed as a navigation tool as well as a filtering tool
• Currently DTO terms are used as labels
• Exploring novel uses of the hierarchy
Target Ranking in PubMed
Novelty measures the scarcity of publications about a target: how much has been published about it, computed as the inverse of the sum of the target's fractional shares of papers/patents
– E.g.: Target A is mentioned in 2 papers, the first alongside 4 other targets, the second alongside 9 other targets
– Novelty = 1/(1/5 + 1/10) = 3.33
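The novelty example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic on the slide: the function name `novelty` and its input format (one count per paper, giving the total number of targets mentioned in that paper, including the target itself) are assumptions made for illustration, not part of Pharos or TIN-X.

```python
from fractions import Fraction


def novelty(targets_per_paper):
    """Inverse of the summed fractional shares of papers for one target.

    targets_per_paper: hypothetical input, one integer per paper that
    mentions the target, giving how many targets that paper mentions
    in total (the target of interest included).
    """
    # Each paper contributes 1/n, where n is the number of targets it mentions.
    total_share = sum(Fraction(1, n) for n in targets_per_paper)
    if total_share == 0:
        raise ValueError("target is not mentioned in any paper")
    return float(1 / total_share)


# Target A: one paper with 5 targets in total, one with 10 targets in total.
print(round(novelty([5, 10]), 2))  # 3.33
```

A target that is the sole subject of many papers accumulates a large sum and therefore a low novelty, while a target mentioned only in passing alongside many others keeps a high novelty.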
Importance measures the strength of the association between a target and a disease: Fractional disease-target (FDT) score
– FDT = 1/(nr targets + nr diseases) for each paper
– Bayesian smoothing is used to compare general terms (cancer) with specific ones (ovarian carcinosarcoma)
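A rough sketch of the per-paper FDT score follows. The slide gives only the per-paper formula, so summing the scores over the papers that mention both the target and the disease is an assumption made here, as are the function names and the toy data. Bayesian smoothing is not shown.

```python
def fdt(n_targets, n_diseases):
    """Per-paper fractional disease-target score, as stated on the slide."""
    return 1.0 / (n_targets + n_diseases)


def importance(papers, target, disease):
    """Aggregate FDT over papers mentioning both target and disease.

    papers: hypothetical list of (set_of_targets, set_of_diseases) tuples.
    Summing the per-paper scores is an assumption; the slide does not
    say how they are combined.
    """
    score = 0.0
    for targets, diseases in papers:
        if target in targets and disease in diseases:
            score += fdt(len(targets), len(diseases))
    return score


# Toy corpus (made up for illustration).
papers = [
    ({"A", "B"}, {"cancer"}),        # contributes 1/(2+1) for (A, cancer)
    ({"A"}, {"cancer", "asthma"}),   # contributes 1/(1+2) for (A, cancer)
    ({"B"}, {"cancer"}),             # does not mention A
]
print(importance(papers, "A", "cancer"))
```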
C Bologa, D. Cannon et al. 5/14/15 revision
Harmonizome
Ma’ayan et al. Trends Pharmacol Sci. 2014 Sep;35(9):450-60.
http://amp.pharm.mssm.edu/Harmonizome/
Harmonogram (Tclin, Kinase)
Harmonogram (Tdark, GPCR)
Compute target similarity in “data availability space”
Tdark targets whose most similar target is not Tdark
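One way to picture similarity in "data availability space" is to describe each target by the set of data sources that hold data for it and compare those sets. The sketch below makes several assumptions: Jaccard similarity stands in for whatever measure was actually used, and the function names, target IDs and source assignments are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of data-source names."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dark_targets_with_non_dark_neighbour(availability, level):
    """Map each Tdark target to its most similar target, if that one is not Tdark.

    availability: hypothetical dict, target -> set of sources with data.
    level: hypothetical dict, target -> development level (e.g. "Tdark", "Tclin").
    """
    result = {}
    for target, sources in availability.items():
        if level[target] != "Tdark":
            continue
        others = [o for o in availability if o != target]
        if not others:
            continue
        # Nearest neighbour by overlap in available data sources.
        best = max(others, key=lambda o: jaccard(sources, availability[o]))
        if level[best] != "Tdark":
            result[target] = best
    return result


# Toy data (made up for illustration).
availability = {
    "T1": {"GTEx", "HPA"},
    "T2": {"GTEx", "HPA", "ChEMBL"},
    "T3": {"PubMed"},
    "T4": {"PubMed", "OMIM"},
}
level = {"T1": "Tdark", "T2": "Tclin", "T3": "Tdark", "T4": "Tdark"}
print(dark_targets_with_non_dark_neighbour(availability, level))  # {'T1': 'T2'}
```

Here T1 is picked out because its data-availability profile most resembles that of a Tclin target, whereas T3 and T4 resemble only each other.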
Different Ways to Use Pharos
Precomputation converts analysis into browsing
Supporting Both Types of Users
• Efficient full text search, coupled to relevant auto-suggestion
– Primary entry point when exploring and for hypothesis generation
• Extensive list of facets
– Supports easy construction of complex filtering rules
• Extensive details for each target
– Linked to external and internal resources
Entity Dossier
• As you explore the knowledge base it’s useful to keep track of data
• Pharos implements a dossier function
– Analogous to e-commerce shopping carts
• Support for task-specific dossiers
• Download a dossier as a ZIP file
Entity Dossier
Visualizations
• Interactive dashboard
– Use visualizations as filters
• Inline visualizations for summary
– Radar charts, word clouds, heatmaps, …
– Context dependent drill down
• Links to external visualization resources
– MSSM harmonogram
– TIN-X (linkout & reduced version incorporated locally)
Visualization Dashboard
• Different facets visualized appropriately
• Directly filter results from visualization
Summary Visualizations
• Summarize text mined publications using word clouds, but also provide access to the list of publications
Summary Visualizations
• Consensus gene expression across three datasets (GTEx, HPA & HPM)
Original figure from Christian Stolte
Summary Visualizations
• Quickly scan targets that have similar types of data associated with them
Summary Visualizations - Drilldown
Facet Visualization
Pharos Usage
Pharos Indexing
The Long Term Vision
• Provide access to all known data about targets
– Multi-scale, multi-domain: from bioactivity to symptoms
• Intelligent summarization
– Use explicit links & computational inference to generate a natural language summary using all known data
– Influenced by the query
• The result is a biological dashboard, customized for the user and the query
Feedback
• Explore the UI, try it, break it, and let us know what works and what doesn’t
• Are there data types and relations that would help you but are not available?
http://pharos.nih.gov
Acknowledgements
• Steve Mathias, Oleg Ursu, Jeremy Yang, Jayme Holmes, Christian Bologa, Daniel Cannon, Tudor Oprea
• Stephan Schurer, Lars Juhl Jensen
• Nicholas Fernandez, Andrew Rouillard, Avi Ma’ayan
• Tomita Lab, Mike McManus, Gaia Skibinski
• Ajay Pillai, Aaron Pawlyk, Christine Colvis