QTLNetMiner - Efficient search and prioritization of gene evidence networks

Rothamsted Research, where knowledge grows

QTLNetMiner – Efficient search and prioritization of gene evidence networks

WheatIS Annual Meeting, San Diego, 9 January 2015

Keywan Hassani-Pak

Upload: keywan-hassani-pak

Post on 18-Jul-2015



Routes to candidate gene discovery

Many gene discovery routes exploit genetic or transcriptome data to produce markers for breeding or reverse genetics.

[Diagram: Traits → QTL/GWAS or Gene Expression → Candidate Genes → Prioritization → Validation → Markers for Breeding or GM, with steps numbered 1 to 3]

Gene Prioritization – a knowledge discovery challenge!

Lists of candidate genes must be evaluated against different types of evidence, which is expensive and labour-intensive:

• Orthologous genes (Arabidopsis, Rice, Yeast etc.)

• Gene expression

• Literature

• Phenotype, Gene Ontologies

• Pathways, Omics data

• Traits

Knowledge discovery process

Selection → Preprocessing → Transformation → Data mining → Interpretation/Evaluation

Data integration and transformation using Ondex

• Ondex provides parsers for many data sources that transform raw data into semantic networks

• Accession mapping or text mining links concepts from different data sources

• Updating the data warehouse requires downloading new data and re-running the integration workflow

Ondex: free, open-source, developed in Java (www.ondex.org)
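To illustrate accession mapping, here is a minimal Python sketch, assuming each parsed concept carries a set of database accessions and that concepts from different sources are linked whenever they share one. The `Concept` class, the `accession_mapping` function and the `equivalent_to` relation label are hypothetical; this is not Ondex's Java API.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Concept:
    """Hypothetical concept parsed from one data source."""
    cid: str
    source: str
    accessions: frozenset


def accession_mapping(concepts):
    """Link concepts from different sources that share an accession.

    Returns sorted (cid_a, cid_b, "equivalent_to") tuples; the relation
    label is a hypothetical placeholder.
    """
    by_accession = defaultdict(list)
    for concept in concepts:
        for acc in concept.accessions:
            by_accession[acc].append(concept)

    edges = set()
    for group in by_accession.values():
        for a, b in combinations(group, 2):
            if a.source != b.source:  # only map across data sources
                first, second = sorted((a.cid, b.cid))
                edges.add((first, second, "equivalent_to"))
    return sorted(edges)


# Usage with made-up concepts and accessions:
concepts = [
    Concept("c1", "SourceA", frozenset({"ACC1"})),
    Concept("c2", "SourceB", frozenset({"ACC1", "ACC2"})),
    Concept("c3", "SourceC", frozenset({"ACC2"})),
    Concept("c4", "SourceA", frozenset({"ACC9"})),
]
print(accession_mapping(concepts))
# [('c1', 'c2', 'equivalent_to'), ('c2', 'c3', 'equivalent_to')]
```

Collecting concepts per accession first avoids comparing every pair of concepts; real integration would also need to handle accession synonyms and ambiguous (one-to-many) mappings.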

Building a Wheat Information Network through integration of publicly available datasets

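[Figure: a network in three columns (Wheat Genes, Homology/Domains, Annotations). Wheat genes on chromosomes 5A, 5B and 5D are linked to TTG2 (41% identity, EnsemblCompara). Other nodes and labels: seed color (TO), seed coat development (GO), DNA-binding WRKY, WRKY1, "Inferred from Mutant Phenotype", PMID 19129166, PMID: 15598800, and the relations "encodes" and "text-mining". A text-mined sentence reads: "Mutations in TTG2 cause phenotypic defects in trichome development and seed color pigmentation." (PMID: 17766401)]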

QTLNetMiner – Mining large semantic networks for gene-trait discovery

Arabidopsis, Wheat, Poplar at Rothamsted

Barley in collaboration with IPK, Germany

Potato & Solanaceae in collaboration with INTA, Argentina

Animals in collaboration with Roslin Institute, UK

• Web: https://ondex.rothamsted.ac.uk/QTLNetMiner

• Code: https://github.com/KeywanHP/QTLNetMiner

QTLNetMiner search interface

• Trait-related keywords: we suggest alternative search terms to help you improve your results.

• Your QTL (in cM): define a QTL region you are interested in.

• Differentially expressed genes: include a list of gene names and see whether they are related to your keywords.
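Restricting candidates to a QTL region and to a user gene list could look like the following sketch. It assumes each gene has a chromosome and a genetic map position in cM; `Gene`, `genes_in_qtl` and `restrict_to_list` are hypothetical names, not QTLNetMiner code.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record with a genetic map position."""
    name: str
    chromosome: str
    position_cm: float  # position in centimorgans (cM)


def genes_in_qtl(genes, chromosome, start_cm, end_cm):
    """Return genes on `chromosome` lying within [start_cm, end_cm]."""
    low, high = min(start_cm, end_cm), max(start_cm, end_cm)
    return [g for g in genes
            if g.chromosome == chromosome and low <= g.position_cm <= high]


def restrict_to_list(genes, gene_list):
    """Keep genes whose names appear in a user list (case-insensitive)."""
    wanted = {name.upper() for name in gene_list}
    return [g for g in genes if g.name.upper() in wanted]


# Usage with made-up genes and positions:
genes = [Gene("G1", "5A", 10.0), Gene("G2", "5A", 25.5),
         Gene("G3", "5B", 20.0), Gene("G4", "5A", 40.0)]
in_region = genes_in_qtl(genes, "5A", 20, 45)
print([g.name for g in restrict_to_list(in_region, ["g2", "G3"])])  # ['G2']
```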

QTLNetMiner – Gene View

QTLNetMiner – Evidence View

Show all genes involved in a particular biological process
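Showing all genes for a piece of evidence amounts to inverting the gene-to-evidence relationship. A minimal sketch, assuming evidence is available as a plain mapping from gene to evidence terms; `build_evidence_index`, `genes_for` and the sample data are hypothetical:

```python
from collections import defaultdict


def build_evidence_index(gene_evidence):
    """Invert a gene -> evidence-terms mapping into evidence term -> genes."""
    index = defaultdict(set)
    for gene, terms in gene_evidence.items():
        for term in terms:
            index[term].add(gene)
    return index


def genes_for(index, term):
    """All genes linked to `term`, sorted; empty if the term is unknown."""
    return sorted(index.get(term, set()))


# Usage with made-up genes and annotations:
gene_evidence = {
    "GeneA": {"seed coat development", "DNA binding"},
    "GeneB": {"seed coat development"},
    "GeneC": {"response to salt stress"},
}
index = build_evidence_index(gene_evidence)
print(genes_for(index, "seed coat development"))  # ['GeneA', 'GeneB']
```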

QTLNetMiner – Network View

Taubert J, Hassani-Pak K, Castells-Brooke N and Rawlings C, Ondex Web: web-based visualization and exploration of heterogeneous biological networks. Bioinformatics (2013)

Multi gene evidence networks

... zoom into regions of interest

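[Figure: evidence network for the wheat genes TRAES_1AL_0404BC790, TRAES_1BL_1D865A8CC and TRAES_1DL_5BAB0B6BC, connected to WRKY43 and CML9 and to the terms calcium signalling, mechanical stimulus response and calcium ion detection (GO) and stress tolerance (TO), plus a WRKY domain node]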

Mutations of the AtCML9 gene also alter the expression of several stress-regulated genes, suggesting that AtCML9 is involved in salt stress tolerance through its effects on the ABA-mediated pathways.

Associating genes with trait terms through guilt by association in a labelled & directed multi-graph (Ondex network)

QTLNetMiner – Semantic motif search

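[Figure: an unknown gene (?) connected through the integrated knowledge network to user-supplied prior-knowledge terms: auxin, cytokinin, strigolactone, hormone, CCD, MAX, subapical shoots, axillary branching and shoot branching. Labels: Integrated knowledge network; User input (prior knowledge); Gene]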
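Guilt by association over a labelled, directed multigraph can be sketched as following sequences of edge labels (semantic motifs) from a gene and checking whether the end nodes match the user's terms. This is an illustrative assumption of how motif matching might work, not the QTLNetMiner implementation; the `MultiGraph` class, relation labels and node names are hypothetical.

```python
from collections import defaultdict


class MultiGraph:
    """Labelled, directed multigraph: a node may have several edges,
    with different labels, to the same target."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # node -> [(label, target), ...]

    def add_edge(self, source, label, target):
        self.out_edges[source].append((label, target))


def follow_motif(graph, start, motif):
    """Nodes reachable from `start` along edges whose labels match `motif` in order."""
    frontier = {start}
    for label in motif:
        frontier = {target
                    for node in frontier
                    for edge_label, target in graph.out_edges.get(node, [])
                    if edge_label == label}
    return frontier


def genes_linked_to_terms(graph, genes, motifs, query_terms):
    """Guilt by association: map each gene to the query terms any motif reaches."""
    wanted = set(query_terms)
    hits = {}
    for gene in genes:
        reached = set()
        for motif in motifs:
            reached |= follow_motif(graph, gene, motif)
        if reached & wanted:
            hits[gene] = sorted(reached & wanted)
    return hits


# Usage with a made-up network and hypothetical relation labels:
g = MultiGraph()
g.add_edge("gene1", "ortholog_of", "orthA")
g.add_edge("orthA", "involved_in", "shoot branching")
g.add_edge("orthA", "co_mentioned_with", "strigolactone")
g.add_edge("gene2", "has_domain", "domainB")
motifs = [["ortholog_of", "involved_in"],
          ["ortholog_of", "co_mentioned_with"],
          ["has_domain"]]
print(genes_linked_to_terms(g, ["gene1", "gene2"], motifs,
                            ["shoot branching", "strigolactone", "auxin"]))
# {'gene1': ['shoot branching', 'strigolactone']}
```

Restricting traversal to predefined label sequences keeps the search focused on biologically meaningful paths instead of every path in the network.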

Candidate gene prioritisation

• Genes are scored with an information retrieval metric, which reflects how relevant a term is to a gene in a collection

• We developed a metric that takes into account:

1. The amount of supporting evidence (tdf)

2. The specificity of evidence to a gene (IDFmean)

$$\mathrm{Score}(t, X) = \mathrm{tdf}(t, X) \times \mathrm{IDF}_{\mathrm{mean}}(X)$$

where t denotes the query terms and X the set of documents associated with a gene.
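The score can be computed as below. The slide does not define tdf and IDFmean precisely, so this sketch assumes tdf(t, X) counts the documents in X that mention at least one query term, and IDFmean(X) is the mean over documents d in X of log(N / n_d), where N is the number of genes and n_d the number of genes linked to d. Function names and data are hypothetical.

```python
import math
from collections import defaultdict


def gene_documents(doc_genes):
    """Invert doc_id -> genes into gene -> X (the set of its doc_ids)."""
    by_gene = defaultdict(set)
    for doc_id, genes in doc_genes.items():
        for gene in genes:
            by_gene[gene].add(doc_id)
    return by_gene


def tdf(terms, X, doc_text):
    """Assumed tdf(t, X): number of documents in X mentioning any query term
    (plain substring matching, a simplification)."""
    lowered = [t.lower() for t in terms]
    return sum(1 for d in X if any(t in doc_text[d].lower() for t in lowered))


def idf_mean(X, doc_genes, n_genes):
    """Assumed IDFmean(X): mean of log(N / n_d) over d in X, where n_d is the
    number of genes linked to document d (rarer evidence = more specific)."""
    if not X:
        return 0.0
    return sum(math.log(n_genes / len(doc_genes[d])) for d in X) / len(X)


def rank_genes(terms, doc_text, doc_genes):
    """Score(t, X) = tdf(t, X) * IDFmean(X) for every gene, best first."""
    X_by_gene = gene_documents(doc_genes)
    n_genes = len(X_by_gene)
    scores = {gene: tdf(terms, X, doc_text) * idf_mean(X, doc_genes, n_genes)
              for gene, X in X_by_gene.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Usage with made-up documents and gene links:
doc_text = {
    "d1": "Mutant shows late blight resistance",
    "d2": "LRR domain protein",
    "d3": "Induced in response to pathogen attack",
    "d4": "Seed coat development",
}
doc_genes = {"d1": {"G1"}, "d2": {"G1", "G2", "G3"},
             "d3": {"G2", "G3"}, "d4": {"G3"}}
terms = ["late blight resistance", "response to pathogen", "LRR"]
for gene, s in rank_genes(terms, doc_text, doc_genes):
    print(gene, round(s, 3))
# G1 1.099
# G3 1.003
# G2 0.405
```

All three genes have the same tdf here, so the ranking is decided by IDFmean: G1 wins because one of its two documents is linked to no other gene.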

Gene ranking – Example

Query: Phytophthora infestans|late blight resistance|response to pathogen|LRR

Scores of two example genes: 5.72 and 2.71
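Queries such as the one above combine alternative terms with `|`. A small sketch, assuming `|` acts as an OR separator and that terms match case-insensitively on word boundaries; `parse_query` and `matching_terms` are hypothetical helpers, and the evidence sentence is made up.

```python
import re


def parse_query(query):
    """Split a '|'-separated query into alternative (OR) terms (assumed semantics)."""
    return [part.strip() for part in query.split("|") if part.strip()]


def matching_terms(terms, text):
    """Query terms occurring in `text` as whole words, ignoring case."""
    return [t for t in terms
            if re.search(r"\b" + re.escape(t) + r"\b", text, flags=re.IGNORECASE)]


query = "Phytophthora infestans|late blight resistance|response to pathogen|LRR"
terms = parse_query(query)
evidence = "An LRR protein conferring late blight resistance"  # made-up evidence text
print(matching_terms(terms, evidence))  # ['late blight resistance', 'LRR']
```

Word-boundary matching avoids counting "LRR" inside longer identifiers such as "LRRK".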

Current and future development

• Make QTLNetMiner compatible with iOS, Android and Microsoft mobile devices: replace the Java applet network viewer with CytoscapeJS and the Flash GViewer with KineticJS

• Develop a federated version (Solr, RDF, SPARQL) of QTLNetMiner instead of centralised data warehousing

• Tighter integration with gene expression and variation databases to improve the gene ranking algorithm