Orthology & Paralogy Alignment & Assembly Alastair Kerr Ph.D. [many slides borrowed from various sources]

Upload: lynne-johnson

Post on 14-Dec-2015


Orthology & Paralogy
Alignment & Assembly

Alastair Kerr Ph.D. [many slides borrowed from various sources]

Overview

Orthology & Paralogy
• Definitions and examples
• Ways to determine an ortholog
• Pre-calculations: resources

Alignment & Assembly
• Differences
• Key programs for each
• Jalview example

Homologs

Have common origins but may or may not have common activity.

Homologous or not? Often decided by an arbitrary threshold on the level of similarity found by alignment.

Homologs

…have common ancestry, but the way they are related can vary (i.e. the reasons they have diverged into different sequences can vary)

orthologs - Homologs produced by speciation. They tend to have similar function.

paralogs - Homologs produced by gene duplication. They tend to have differing functions.

Orthologous or paralogous homologs

[Figure: an early globin gene undergoes gene duplication, giving an α-chain gene and a β-chain gene. Tips: mouse α, human α, cattle α, cattle β, human β, mouse β. Labels: Orthologs (α), Orthologs (β), Paralogs (cattle), Homologs.]

Orthologs – diverged after speciation – tend to have similar function

Paralogs – diverged after gene duplication – some functional divergence occurs

Therefore, for linking similar genes between species, or performing “annotation transfer”, identify orthologs

True or False?

A1x is the ortholog in species x of A1y?

A1x is a paralog of A2x?

A1x is a paralog of A2y?

Identifying Gene/Protein Relationships from Phylogenetic trees

orthologs - Homologs produced by speciation. Gene phylogeny matches organismal phylogeny.

paralogs - Homologs produced by gene duplication. Indicated by multiple copies of a homolog in a given species, or by phylogenetic evidence of gene duplication where the gene tree does not match the organismal phylogeny.
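One common way to turn these tree-based definitions into a procedure is the species-overlap rule: an internal node of the gene tree is treated as a duplication if the same species appears on both sides of it, and as a speciation otherwise. The sketch below is a minimal illustration of that rule, not a method taken from these slides. The nested-tuple tree, the gene names (patterned on A1x, A2y) and the convention that the last character of a name is the species tag are all assumptions made for the example.

```python
from itertools import product


def leaves(tree):
    """Return the gene names at the tips of a nested-tuple binary tree."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def species_of(gene):
    """Assumed naming convention: the last character is the species tag."""
    return gene[-1]


def classify_pairs(tree, relations=None):
    """Label each gene pair by the node where the two genes last diverged.

    Species-overlap rule: if the two subtrees share a species, the node is
    treated as a duplication (pairs are paralogs); otherwise it is treated
    as a speciation (pairs are orthologs).
    """
    if relations is None:
        relations = {}
    if isinstance(tree, str):
        return relations
    left, right = tree
    left_leaves, right_leaves = leaves(left), leaves(right)
    shared = {species_of(g) for g in left_leaves} & {
        species_of(g) for g in right_leaves
    }
    label = "paralogs" if shared else "orthologs"
    for a, b in product(left_leaves, right_leaves):
        relations[frozenset((a, b))] = label
    classify_pairs(left, relations)
    classify_pairs(right, relations)
    return relations


# Hypothetical gene tree: a duplication (A1 vs A2) followed by speciation (x vs y).
example_tree = (("A1x", "A1y"), ("A2x", "A2y"))
example_relations = classify_pairs(example_tree)
for pair, label in sorted((sorted(p), l) for p, l in example_relations.items()):
    print(pair, label)
```

With this assumed tree shape, the A1/A2 split is labelled a duplication because species x and y occur on both sides, while the x/y split inside each clade is labelled a speciation. A real analysis would read the tree from a Newick file and take species from annotation rather than from the gene name.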

Gene Orthology: How to detect? Most commonly: identify reciprocal best BLAST hits (EGO, COGs,…)
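A minimal sketch of the reciprocal best hit idea, assuming the BLAST output has already been reduced to (query, subject, bit score) tuples for each search direction. The function names, gene identifiers and scores are invented for illustration and do not come from any real dataset or from the EGO or COGs pipelines.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query_id, subject_id, bit_score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Keep, for every query, the subject with the highest bit score."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy hit tables (hypothetical identifiers and scores).
human_vs_cattle = [
    ("hHBA", "cHBA", 250.0),
    ("hHBA", "cHBB", 120.0),
    ("hHBB", "cHBB", 260.0),
    ("hHBB", "cHBA", 118.0),
]
cattle_vs_human = [
    ("cHBA", "hHBA", 250.0),
    ("cHBA", "hHBB", 118.0),
    ("cHBB", "hHBB", 260.0),
    ("cHBB", "hHBA", 120.0),
]

rbh_pairs = reciprocal_best_hits(human_vs_cattle, cattle_vs_human)
print(rbh_pairs)
```

In this toy case each human gene pairs with the cattle gene that also picks it as its own best hit. Ties and score thresholds are ignored here; a real pipeline would need a rule for both.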

Example Problem:

If making comparisons between human and bovine, for example, the bovine gene dataset is still quite incomplete

Therefore, the current best hit may be a paralog, with the true ortholog not yet sequenced

[Figure labels: cattle, human, cattle, mouse]

2 Forms in 1 Species

[Figure: tree with "+" marks]

Slides from Jonathan Eisen

2 Forms in 1 Species - Gene Loss

Gene duplicated in common ancestor

[Figure: tree with "+" marks; two branches labelled "Loss"]

Unusual Distribution Pattern

[Figure: tree with "+" marks]

Unusual Distribution - Gene Loss

Gene present in ancestor

Gene lost here

[Figure: tree with "+" marks]

Unusual Distribution - Evolutionary Rate Variation?

Gene too diverged to be found

[Figure: tree with "+" marks]

Ortholog guess via synteny

A  B  C

A  ?  C

Syntenic blocks
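The synteny guess can be sketched as follows: if the genes on either side of an unassigned gene already have known orthologs, and exactly one unassigned gene sits between those orthologs in the other genome, that gene is proposed as the candidate. This is a simplified illustration under stated assumptions, not the procedure used by any particular resource. The function name, gene names and the one-gene-between-two-anchors rule are all hypothetical.

```python
from typing import Dict, List


def guess_orthologs_by_synteny(
    order_a: List[str], order_b: List[str], known: Dict[str, str]
) -> Dict[str, str]:
    """Propose orthologs for genes flanked by two already-paired neighbours.

    order_a, order_b: gene order along one chromosome in each genome.
    known: gene in genome A -> its accepted ortholog in genome B.
    """
    position_b = {gene: index for index, gene in enumerate(order_b)}
    already_paired = set(known.values())
    guesses: Dict[str, str] = {}
    for i in range(1, len(order_a) - 1):
        gene = order_a[i]
        if gene in known:
            continue
        left = known.get(order_a[i - 1])
        right = known.get(order_a[i + 1])
        if left not in position_b or right not in position_b:
            continue  # one of the flanking anchors is missing
        low, high = sorted((position_b[left], position_b[right]))
        between = [g for g in order_b[low + 1:high] if g not in already_paired]
        if len(between) == 1:  # only accept an unambiguous candidate
            guesses[gene] = between[0]
    return guesses


# Toy example mirroring A B C versus A ? C (hypothetical gene names).
genome_1 = ["A1", "B1", "C1"]
genome_2 = ["A2", "X2", "C2"]
anchors = {"A1": "A2", "C1": "C2"}
synteny_guesses = guess_orthologs_by_synteny(genome_1, genome_2, anchors)
print(synteny_guesses)
```

Sorting the two anchor positions lets the same code handle an inverted block. A guess made this way is still only a guess and would normally be checked against sequence similarity.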

ensEMBL calculations
http://www.ensembl.org

demo

OMA Browser
http://omabrowser.org

demo

Alignments and Assemblies

Alignment:
• ALL sequences come from the SAME region
• Therefore can be useless for:
    non-overlapping contigs
    PCR probes/oligos
• Good for paralogs/orthologs
• Basis for phylogeny

Assembly:
• Good for near-identical sequences
• Types:
    De-novo
    Guided [reference sequence]
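To make the contrast concrete, here is a toy sketch of the overlap step behind de novo assembly: two near-identical reads are joined when the end of one exactly matches the start of the other, and reads with no overlap cannot be joined at all. Real assemblers tolerate mismatches and work on many reads at once. The function names, the exact-match rule and the minimum overlap of 3 are assumptions chosen for illustration.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads into one contig if they overlap, else return None."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# Hypothetical reads: the first pair overlaps by "GTAC", the second pair does not.
contig = merge_reads("ATGCGTAC", "GTACGGA")
no_contig = merge_reads("ATGCGTAC", "TTTTCCCC")
print(contig, no_contig)
```

A guided assembly would instead place each read against a reference sequence, so reads that do not overlap each other can still be positioned.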

Alignment

Implicit statement: each residue in an aligned sequence is derived from the last common ancestor [LCA], so residues in the same column are assumed to share ancestry.

Therefore it is OK to look only at conserved regions or to mask non-conserved regions, especially for phylogeny.
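Masking non-conserved regions can be sketched as a per-column filter: keep a column only if few sequences have a gap there and most sequences share the same residue. The thresholds, function names and toy sequences below are assumptions for illustration; dedicated trimming tools use more careful criteria.

```python
from collections import Counter
from typing import List


def conserved_columns(
    alignment: List[str], min_identity: float = 0.8, max_gap_fraction: float = 0.2
) -> List[int]:
    """Return indices of columns that pass the gap and identity thresholds."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_sequences = len(alignment)
    keep = []
    for col in range(length):
        column = [seq[col] for seq in alignment]
        residues = [char for char in column if char != "-"]
        if not residues:
            continue  # column made only of gaps
        if column.count("-") / n_sequences > max_gap_fraction:
            continue  # too many gaps
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / n_sequences >= min_identity:
            keep.append(col)
    return keep


def mask_alignment(alignment: List[str], keep: List[int]) -> List[str]:
    """Rebuild each sequence from the retained columns only."""
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Toy alignment (hypothetical sequences).
toy_alignment = ["MKV-LA", "MKI-LA", "MRVGLA"]
kept = conserved_columns(toy_alignment)
masked = mask_alignment(toy_alignment, kept)
print(kept, masked)
```

Here identity is measured against the total number of sequences, so a gapped column is penalised twice. That is a deliberate but arbitrary choice for this sketch.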

Alignment Tools

Faster but less accurate (some better with gaps):
• Muscle
• ClustalW/X
• MAFFT

Slow but more accurate:
• *-Coffee
    T: original
    3D: uses PDB as guide (structural)
    M: uses multiple methods
• Probcons
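As one example of running these tools from a script, the sketch below wraps MAFFT, assuming the `mafft` executable is installed and on the PATH. MAFFT writes the alignment to standard output, and its `--auto` option lets it pick a strategy based on the size of the input. The helper names `build_mafft_command` and `run_mafft` and the file names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_mafft_command(input_fasta: str) -> List[str]:
    """Command line for a default MAFFT run with automatic strategy choice."""
    return ["mafft", "--auto", input_fasta]


def run_mafft(input_fasta: str, output_fasta: str) -> None:
    """Align the sequences in input_fasta and save the result as FASTA."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH")
    result = subprocess.run(
        build_mafft_command(input_fasta),
        check=True,           # raise if MAFFT exits with an error
        capture_output=True,  # the alignment arrives on stdout
        text=True,
    )
    with open(output_fasta, "w") as handle:
        handle.write(result.stdout)


# Example call (file names are placeholders):
# run_mafft("globins.fasta", "globins.aligned.fasta")
```

The resulting FASTA alignment can then be opened in an alignment editor for inspection.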

Alignment Edit Tools

NEVER use a word processor or Excel to edit alignments…

Jalview (Java Alignment Viewer)
• Good for editing
• DAS capable

[Figure: diagram of Jalview capabilities. Labels: Figure Generation; Trees; Annotation; Features; Structures; PDB; 'Standard' Formats (FASTA, MSF, CLUSTAL, PILEUP, BLC, PFAM); Distributed Annotation System; GFF; Jalview Features; Newick; Secondary Structure Prediction; Multiple Sequence Alignment; Sequences; Alignments; Clickable HTML; Images; Line Art; Analysis; Consensus; Conservation & Clustering; Visualization]

Jalview Annotation

Jalview DAS Client Functionality

[Figure label: DAS ANNOTATION SERVERS]

• Query matches ID to Authority
• Map to local reference frame
• Mouse over for feature name, links and scores
• Group features by source
• Type == colour
• Highlight start-end
• Select specific sources
• Filtered list
• Add user defined sources

Assemblers

Many free options:
• STADEN - staden.sf.net
    Original assembler, all platforms
    No longer in development
    Useless for next-gen sequencing
• MAQ and MAQView
    Installed on computers in COIL