a systematic approach to genotype-phenotype correlations

A Systematic approach to the Large-Scale Analysis of Genotype-Phenotype correlations Paul Fisher Dr. Robert Stevens Prof. Andrew Brass

Upload: fisherp

Post on 27-Jan-2015


DESCRIPTION

It is increasingly common to combine Microarray and Quantitative Trait Loci data to aid the search for candidate genes responsible for phenotypic variation. Workflows provide a means of systematically processing these large datasets and also represent a framework for the re-use and the explicit declaration of experimental methods. Here we highlight the issues facing the manual analysis of microarray and QTL data for the discovery of candidate genes underlying complex phenotypes. We show how automated approaches provide a systematic means to investigate genotype-phenotype correlations. This methodology was applied to a use case of resistance to African trypanosomiasis in the mouse. Pathways represented in the results identified Daxx as one of the candidate genes within the Tir1 QTL region.

TRANSCRIPT

Page 1: A systematic approach to Genotype-Phenotype correlations

A Systematic approach to the Large-Scale Analysis of Genotype-Phenotype correlations

Paul Fisher

Dr. Robert Stevens

Prof. Andrew Brass

Page 2: A systematic approach to Genotype-Phenotype correlations

The entire genetic identity of an individual that does not show any outward characteristics, e.g. genes, mutations

Genotype

DNA

ACTGCACTGACTGTACGTATATCT

ACTGCACTGTGTGTACGTATATCT

Mutations

Genes
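As a small illustration of what "mutations" means for the two DNA strings on this slide, the sketch below compares them position by position. The function name `point_differences` is invented for this example, and it assumes the two sequences are already aligned and of equal length.

```python
def point_differences(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, variant base) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length for a position-wise comparison")
    return [
        (position, ref_base, var_base)
        for position, (ref_base, var_base) in enumerate(zip(reference, variant), start=1)
        if ref_base != var_base
    ]


# The two sequences shown on the slide.
reference = "ACTGCACTGACTGTACGTATATCT"
variant = "ACTGCACTGTGTGTACGTATATCT"

differences = point_differences(reference, variant)
print(differences)  # [(10, 'A', 'T'), (11, 'C', 'G')]
```

Real sequence comparison needs alignment to handle insertions and deletions; this only covers substitutions.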

Page 3: A systematic approach to Genotype-Phenotype correlations

(harder to characterise)

The observable expression of genes producing notable characteristics in an individual, e.g. hair or eye colour, body mass, resistance to disease

Phenotype

vs.

Brown White and Brown

Page 4: A systematic approach to Genotype-Phenotype correlations

Genotype to Phenotype

Page 5: A systematic approach to Genotype-Phenotype correlations

Genotype Phenotype

?

Current Methods

200

What processes to investigate?

Page 6: A systematic approach to Genotype-Phenotype correlations

?

200

Microarray + QTL

Genes captured in microarray experiment and present in QTL (Quantitative Trait Loci) region

Genotype Phenotype

Metabolic pathways

Phenotypic response investigated using microarray in form of expressed genes or evidence provided through QTL mapping

Page 7: A systematic approach to Genotype-Phenotype correlations

CHR

QTL

Gene A

Gene B

Pathway A

Pathway B

Pathway linked to phenotype – high priority

Pathway not linked to phenotype – medium priority

Pathway C

Phenotype

literature

literature

literature

Gene C

Pathway not linked to QTL – low priority

Genotype
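The priority scheme in this diagram can be read as a simple rule. The sketch below is one possible reading, not the authors' implementation: the `Pathway` class, the `prioritise` function, and the gene-to-pathway memberships are all assumptions made for illustration, since the slide does not state which gene sits in which pathway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pathway:
    name: str
    genes: frozenset          # genes annotated to this pathway
    linked_to_phenotype: bool  # e.g. a link supported by the literature


def prioritise(pathway: Pathway, qtl_genes: set) -> str:
    """Rank a pathway using the three cases named on the slide."""
    if not (pathway.genes & qtl_genes):
        return "low"      # pathway not linked to the QTL
    if pathway.linked_to_phenotype:
        return "high"     # linked to the QTL and to the phenotype
    return "medium"       # linked to the QTL but not to the phenotype


# Hypothetical memberships: Gene A and Gene B lie in the QTL, Gene C does not.
qtl_genes = {"Gene A", "Gene B"}
pathways = [
    Pathway("Pathway A", frozenset({"Gene A"}), linked_to_phenotype=True),
    Pathway("Pathway B", frozenset({"Gene B"}), linked_to_phenotype=False),
    Pathway("Pathway C", frozenset({"Gene C"}), linked_to_phenotype=True),
]

priorities = {p.name: prioritise(p, qtl_genes) for p in pathways}
print(priorities)
```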

Page 8: A systematic approach to Genotype-Phenotype correlations

Issues with current approaches

Page 9: A systematic approach to Genotype-Phenotype correlations

Huge amounts of data

200+ Genes

QTL region on chromosome

Microarray

1000+ Genes

How do I look at ALL the genes systematically?

Page 10: A systematic approach to Genotype-Phenotype correlations

Hypothesis-Driven Analyses

200 QTL genes

Case: African Sleeping sickness

- parasitic infection

- Known immune response

Pick the genes involved in immunological process

40 QTL genes

Pick the genes that I am most familiar with

2 QTL genes

Biased view

Result: African Sleeping sickness

- Immune response

- Cholesterol control

- Cell death

Page 11: A systematic approach to Genotype-Phenotype correlations

Manual Methods of data analysis

Navigating through hyperlinks

No explicit methods

Human error

Tedious and repetitive

Page 12: A systematic approach to Genotype-Phenotype correlations

Implicit methods

Page 13: A systematic approach to Genotype-Phenotype correlations

Issues with current approaches

• Scale of analysis task

• User bias and premature filtering

• Hypothesis-Driven approach to data analysis

• Constant flux of data - problems with re-analysis of data

• Implicit methodologies (hyper-linking through web pages)

• Error proliferation from any of the listed issues

Solution – Automate through workflows

Page 14: A systematic approach to Genotype-Phenotype correlations

The Two W’s

• Web Services

– Technology and standard for exposing code / database by a means that can be consumed by a third party remotely

– Describes how to interact with it

• Workflows

– General technique for describing and executing a process

– Describes what you want to do

Page 15: A systematic approach to Genotype-Phenotype correlations

Taverna Workflow Workbench

http://taverna.sf.net

Page 16: A systematic approach to Genotype-Phenotype correlations

Hypothesis

Utilising the capabilities of workflows and the pathway-driven approach, we are able to provide an approach that is more:

- systematic

- efficient

- scalable

- unbiased

- unambiguous

The benefit will be that new biological results will be derived, increasing community knowledge of genotype and phenotype interactions.

Page 17: A systematic approach to Genotype-Phenotype correlations

Pathway Resource

QTL mapping study

Microarray gene expression study

Identify genes in QTL regions

Identify differentially expressed genes

Wet Lab Literature

Annotate genes with biological pathways

Annotate genes with biological pathways

Select common biological pathways

Hypothesis generation and verification

Statistical analysis

Genomic Resource
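The core of the diagram above (annotate both gene lists with pathways, then select the pathways they share) can be sketched in a few lines. This is an illustrative sketch only: the gene and pathway names are placeholders, the `gene_to_pathways` mapping stands in for a lookup against a pathway resource, and the real workflows call remote services rather than in-memory dictionaries.

```python
def pathways_for(genes, gene_to_pathways):
    """Collect every pathway annotated to at least one of the given genes."""
    found = set()
    for gene in genes:
        found.update(gene_to_pathways.get(gene, ()))
    return found


# Placeholder annotation table standing in for a pathway resource lookup.
gene_to_pathways = {
    "gene_a": {"pathway_1", "pathway_2"},
    "gene_b": {"pathway_3"},
    "gene_c": {"pathway_2"},
    "gene_d": {"pathway_4"},
}

qtl_genes = {"gene_a", "gene_b"}   # genes identified in the QTL region
de_genes = {"gene_c", "gene_d"}    # differentially expressed genes

# Select common biological pathways.
common_pathways = pathways_for(qtl_genes, gene_to_pathways) & pathways_for(de_genes, gene_to_pathways)

# QTL genes that take part in a common pathway become candidates for follow-up.
candidates = sorted(
    gene for gene in qtl_genes
    if gene_to_pathways.get(gene, set()) & common_pathways
)

print(sorted(common_pathways))  # ['pathway_2']
print(candidates)               # ['gene_a']
```

Note that no gene is discarded before the pathway step, which is the point of the approach: filtering happens on shared pathways rather than on prior expectations about individual genes.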

Page 18: A systematic approach to Genotype-Phenotype correlations

Replicated original chain of

data analysis

Page 19: A systematic approach to Genotype-Phenotype correlations

http://www.genomics.liv.ac.uk/tryps/trypsindex.html

Trypanosomiasis in Africa

Andy Brass

Steve Kemp

+ many others

Page 20: A systematic approach to Genotype-Phenotype correlations

Preliminary Results

Trypanosomiasis resistance

A strong candidate gene was found

– Daxx gene not found using manual investigation methods

– The gene was identified from analysis of biological pathway information

– Possible candidate identified by Yan et al (2004): Daxx SNP info

– Sequencing of the Daxx gene in the wet lab showed mutations that are thought to change the structure of the protein

– Mutation was published in scientific literature, noting its effect on the binding of Daxx protein to p53 protein

– p53 plays a direct role in cell death and apoptosis, one of the Trypanosomiasis phenotypes

– More genes to follow (hopefully) in publications being written

Page 21: A systematic approach to Genotype-Phenotype correlations

Shameless Plug!

A Systematic Strategy for Large-Scale Analysis of Genotype-Phenotype Correlations: Identification of candidate genes involved in African Trypanosomiasis

Fisher et al., (2007) Nucleic Acids Research doi:10.1093/nar/gkm623

• Explicitly discusses the methods we used for the Trypanosomiasis use case

• Discusses the results for Daxx and shows the mutation

• Sharing of workflows for re-use, re-purposing

Page 22: A systematic approach to Genotype-Phenotype correlations

Recycling, Reuse, Repurposing

Here’s the Science!

• Identified a candidate gene (Daxx) for Trypanosomiasis resistance.

• Manual analysis on the microarray and QTL data failed to identify this gene as a candidate.

• Unbiased analysis. Confirmed by the wet lab.

Here’s the e-Science!

• Trypanosomiasis mouse workflow reused without change in Trichuris muris infection in mice

• Identified biological pathways involved in sex dependence

• A previous two-year manual study of candidate genes had failed to do this.

Workflows now being run over Colitis/ Inflammatory Bowel Disease in Mice (without change)

Page 23: A systematic approach to Genotype-Phenotype correlations

Recycling, Reuse, Repurposing

http://www.myexperiment.org/

• Share

• Search

• Re-use

• Re-purpose

• Execute

• Communicate

• Record

Page 24: A systematic approach to Genotype-Phenotype correlations

What next?

• More use cases??

– Can be done, but not for my project

• Text Mining !!!

– Aid biologists in identifying novel links between pathways

– Link pathways to phenotype through literature

Page 25: A systematic approach to Genotype-Phenotype correlations

Pathway Resource

QTL mapping study

Microarray gene expression study

Identify genes in QTL regions

Identify differentially expressed genes

Wet Lab Literature

Annotate genes with biological pathways

Annotate genes with biological pathways

Select common biological pathways

Hypothesis generation and verification

Statistical analysis

Genomic Resource

Page 26: A systematic approach to Genotype-Phenotype correlations

CHR

QTL

Gene A

Gene B

Pathway A

Pathway B

Pathway linked to phenotype – high priority

Pathway not linked to phenotype – medium priority

Pathway C

Phenotype

literature

literature

literature

Gene C

Pathway not linked to QTL – low priority

Genotype

DONE MANUALLY

Page 27: A systematic approach to Genotype-Phenotype correlations

It can’t be that hard, right?

• PubMed contains ~17,787,763 citations to date

• Manually searching is tedious and frustrating

• Can be hard finding the links

Computers can help with data gathering and information extraction – that’s their job !!!

Page 28: A systematic approach to Genotype-Phenotype correlations

Text Mining

• A means of assisting the researcher

– Time

– Effort

– Narrow searches

• Hypothesis generation and verification

– Suggested links

– Limited corpus, but it's specific

NOT A REPLACEMENT FOR

DOMAIN EXPERTISE
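To make "suggested links" concrete, here is a deliberately crude sketch that counts how often a pathway term appears in the same abstract as a phenotype term. It is not the text-mining method proposed in the talk: the function name, the substring matching, and the three example abstracts are all invented for illustration, and a real system would need proper term recognition and a curated corpus.

```python
def cooccurrence_counts(abstracts, pathway_terms, phenotype_terms):
    """Count abstracts that mention a pathway term together with any phenotype term."""
    counts = {}
    for text in abstracts:
        lowered = text.lower()
        # Skip abstracts that never mention the phenotype.
        if not any(term.lower() in lowered for term in phenotype_terms):
            continue
        for pathway in pathway_terms:
            if pathway.lower() in lowered:
                counts[pathway] = counts.get(pathway, 0) + 1
    return counts


# Invented example abstracts; not real PubMed records.
abstracts = [
    "Apoptosis is altered during trypanosome infection.",
    "Cholesterol biosynthesis in yeast.",
    "Trypanosome infection and apoptosis signalling in mice.",
]

links = cooccurrence_counts(
    abstracts,
    pathway_terms=["apoptosis", "cholesterol biosynthesis"],
    phenotype_terms=["trypanosom"],
)
print(links)  # {'apoptosis': 2}
```

A count like this only suggests where to look; deciding whether the link is biologically meaningful remains the researcher's job.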

Page 29: A systematic approach to Genotype-Phenotype correlations

To Sum Up...

• Need for Genotype-Phenotype correlations with respect to disease control

• High-throughput data can provide links between Genotype and Phenotype

• Highlighted issues with manually conducted in silico experiments

• Improved the methods of current microarray and QTL based investigations by making them systematic

• Increased reproducibility of our methods

- workflows stored in XML based schema

- explicit declaration of services, parameters, and methods of data analysis

• Shown workflows are capable of deriving new biologically significant results

- African Trypanosomiasis in the mouse

- Infection of mice with Trichuris muris

• The workflows require expansion to accommodate new analysis techniques – text mining
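The reproducibility point about XML and explicit declaration can be illustrated with a toy example. The element names (`workflow`, `step`, `parameter`), the service names, and the parameter values below are made up for this sketch and are not Taverna's own file format; the idea is only that every service and parameter is written down where someone else can read and re-run it.

```python
import xml.etree.ElementTree as ET


def workflow_to_xml(name, steps):
    """Serialise an ordered list of steps, with their parameters, to an XML string."""
    root = ET.Element("workflow", {"name": name})
    for step in steps:
        step_el = ET.SubElement(root, "step", {"service": step["service"]})
        for key, value in sorted(step["parameters"].items()):
            ET.SubElement(step_el, "parameter", {"name": key, "value": str(value)})
    return ET.tostring(root, encoding="unicode")


# Hypothetical services and parameters, for illustration only.
steps = [
    {"service": "genes_in_qtl_region", "parameters": {"chromosome": "17"}},
    {"service": "annotate_with_pathways", "parameters": {"resource": "example_pathway_db"}},
]

document = workflow_to_xml("qtl_pathway_analysis", steps)
print(document)

# Reading the document back recovers the declared services in order.
declared_services = [el.get("service") for el in ET.fromstring(document).findall("step")]
```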

Page 30: A systematic approach to Genotype-Phenotype correlations

Many thanks to:

including: Joanne Pennock, EPSRC, OMII, myGrid, and lots more people