1/29 comparative genomics. 2/29 overview of the talk comparing genomes homologies & families...
Comparative Genomics

Upload: melvin-gibbs

Post on 03-Jan-2016

Comparative Genomics

2/29

Overview of the Talk

• Comparing Genomes

• Homologies & Families

• Sequence Alignments

3/29

Evolution at the DNA Level

…ACTGACATGTACCA…

…AC----CATGCACCA…

Sequence edits: Mutation, Deletion

Rearrangements: Inversion, Translocation, Duplication
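
To make the two kinds of sequence edit concrete, here is a minimal sketch that counts matches, substitutions and gap columns in a pairwise alignment. The toy alignment is modelled on the slide, with the gap length chosen so that both rows have equal length (an assumption); `classify_columns` is a hypothetical helper name.

```python
def classify_columns(ref: str, other: str) -> dict:
    """Count match, substitution and gap columns in a pairwise alignment."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "substitution": 0, "gap": 0}
    for a, b in zip(ref, other):
        if a == "-" or b == "-":
            counts["gap"] += 1           # insertion or deletion
        elif a == b:
            counts["match"] += 1
        else:
            counts["substitution"] += 1  # point mutation
    return counts


# Toy alignment (assumed, not taken verbatim from the slide).
result = classify_columns("ACTGACATGTACCA", "AC---CATGCACCA")
print(result)  # {'match': 10, 'substitution': 1, 'gap': 3}
```

Rearrangements (inversion, translocation, duplication) cannot be read off single columns like this; they need positional information across the genome.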

4/29

Why Compare Genomes?

• We can better understand evolution and speciation.

• We can find important, functional regions of the sequence (codons, promoters, regulatory regions).

• It can help us locate genes in other species that are missing or not well-defined (also through comparison and alignments).

5/29

Comparing Genomes

Mammals have roughly 3 billion base pairs in their genomes.

Over 98% of human genes are shared with primates, with more than 95-98% similarity between genes.

Even the fruit fly shares 60% of its genes with humans! (March 2000)

Differences: gene structure, sequence

Remember… a single nucleotide change can cause diseases such as sickle cell anemia and cancer.

6/29

How Does Ensembl Predict Homology?

• Uses all the species

• Uses a representative protein (the longest) for every gene

• Builds a gene tree

• EnsemblCompara GeneTrees: Analysis of complete, duplication aware phylogenetic trees in vertebrates. Vilella AJ, Severin J, Ureta-Vidal A, Durbin R, Heng L, Birney E. Genome Res. 2008 Nov 24.

7/29

Steps in Homology Prediction

1. Load the longest protein for every gene from all species.

2. WU Blastp + SmithWaterman: longest translation of every gene against every other (Blast Reciprocal Hit / Blast Score Ratio).

3. Protein clustering; build multiple alignments (MCoffee).

4. From each alignment, build a gene tree.

5. Reconcile each gene tree with the species tree to determine internal nodes (TreeBest): orthologues, paralogues…

..MEDPATA…
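
The all-against-all comparison step can be sketched as follows. This is a minimal illustration of a Blast Score Ratio (hit score divided by the query's self-hit score) and of reciprocal best hits, not Ensembl's pipeline code. The gene names, the scores and the function names are all hypothetical, and the slide does not state which thresholds are applied.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical BLAST bit scores: scores[(query, subject)] = score.
Scores = Dict[Tuple[str, str], float]


def blast_score_ratio(scores: Scores, query: str, subject: str) -> float:
    """Score of query vs subject, normalised by the query's self-hit score."""
    return scores[(query, subject)] / scores[(query, query)]


def best_hit(scores: Scores, query: str, candidates: List[str]) -> Optional[str]:
    """Return the candidate with the highest score for this query, if any."""
    hits = [(scores[(query, c)], c) for c in candidates if (query, c) in scores]
    return max(hits)[1] if hits else None


def reciprocal_best_hits(scores: Scores, species_a: List[str],
                         species_b: List[str]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    pairs = []
    for a in species_a:
        b = best_hit(scores, a, species_b)
        if b is not None and best_hit(scores, b, species_a) == a:
            pairs.append((a, b))
    return pairs


# Toy data: two genes per species (all values invented for illustration).
scores = {
    ("hA", "hA"): 200, ("hA", "mA"): 180, ("hA", "mB"): 60,
    ("hB", "hB"): 150, ("hB", "mB"): 120, ("hB", "mA"): 50,
    ("mA", "mA"): 210, ("mA", "hA"): 178, ("mA", "hB"): 52,
    ("mB", "mB"): 160, ("mB", "hB"): 118, ("mB", "hA"): 58,
}
print(reciprocal_best_hits(scores, ["hA", "hB"], ["mA", "mB"]))
print(blast_score_ratio(scores, "hA", "mA"))
```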

8/29

Viewing Trees in Ensembl

9/29

Types of Homologues

• Orthologues: any pairwise gene relation where the ancestor node is a speciation event

• Paralogues: any pairwise gene relation where the ancestor node is a duplication event
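
The two definitions above can be sketched in code: given a gene tree whose internal nodes are already labelled, the relation between two genes depends only on the label of their most recent common ancestor. The `Node` class, the gene names and the tree below are hypothetical, for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaves
    children: List["Node"] = field(default_factory=list)


def path_to(root: Node, leaf_name: str) -> Optional[List[Node]]:
    """Return the nodes from the root down to the named leaf, or None."""
    if not root.children:
        return [root] if root.name == leaf_name else None
    for child in root.children:
        sub = path_to(child, leaf_name)
        if sub is not None:
            return [root] + sub
    return None


def relation(root: Node, gene_a: str, gene_b: str) -> str:
    """Classify a gene pair by the event at its last common ancestor."""
    if gene_a == gene_b:
        raise ValueError("need two different genes")
    path_a, path_b = path_to(root, gene_a), path_to(root, gene_b)
    if path_a is None or path_b is None:
        raise KeyError("gene not found in tree")
    lca = root
    for x, y in zip(path_a, path_b):
        if x is not y:
            break
        lca = x  # deepest node shared by both paths so far
    return "orthologue" if lca.event == "speciation" else "paralogue"


# Toy tree: an ancient duplication followed by a speciation in each copy.
tree = Node("dup", "duplication", [
    Node("spec_A", "speciation", [Node("human_A"), Node("mouse_A")]),
    Node("spec_B", "speciation", [Node("human_B"), Node("mouse_B")]),
])
print(relation(tree, "human_A", "mouse_A"))  # orthologue
print(relation(tree, "human_A", "human_B"))  # paralogue
```

Note that `human_A` and `mouse_B` also come out as paralogues here, because their common ancestor is the duplication node even though they belong to different species.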

10/29

The Gene Tree for INS (insulin precursor)

A red square is a duplication event (Paralogues)

A blue square is a speciation event (Orthologues)

Reconciliation

[Figure: a species tree and an unrooted gene tree over M, R and H. The reconciled tree marks duplication nodes and speciation nodes, three gene losses, and the gene copies M’, R’ and H’.]
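
One common textbook way to label the nodes during reconciliation is LCA mapping: each gene-tree node is mapped to the last common ancestor, in the species tree, of the species below it, and a node is called a duplication when it maps to the same species node as one of its children. The sketch below illustrates that rule only. It is not TreeBest's implementation, it assumes the gene tree has already been rooted, and the species tree ((M, R), H) and all names are assumptions.

```python
# Species tree as a child -> parent map (assumed toy topology ((M, R), H)).
SPECIES_PARENT = {"M": "MR", "R": "MR", "MR": "root", "H": "root", "root": None}


def ancestors(species):
    """Return the species node and all of its ancestors, bottom-up."""
    chain = []
    while species is not None:
        chain.append(species)
        species = SPECIES_PARENT[species]
    return chain


def species_lca(a, b):
    """Last common ancestor of two nodes in the species tree."""
    anc_a = ancestors(a)
    for node in ancestors(b):
        if node in anc_a:
            return node
    raise ValueError("nodes are not in the same tree")


def reconcile(node):
    """Return (species_node, labelled_tree) for a rooted binary gene tree.

    Leaves are strings whose first character is the species code
    (e.g. "M" or "M'"); internal nodes are (left, right) tuples.
    """
    if isinstance(node, str):
        return node[0], node
    left, right = node
    sp_left, lab_left = reconcile(left)
    sp_right, lab_right = reconcile(right)
    sp = species_lca(sp_left, sp_right)
    # Duplication if this node maps to the same species node as a child.
    event = "duplication" if sp in (sp_left, sp_right) else "speciation"
    return sp, (event, lab_left, lab_right)


# A gene tree that disagrees with the species tree implies a duplication.
_, labelled = reconcile((("M", "H"), "R"))
print(labelled)  # ('duplication', ('speciation', 'M', 'H'), 'R')
```

Gene losses, as drawn on the slide, would then be inferred from the species that are expected below a duplication node but absent from the gene tree; that step is left out here.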

12/29

Orthologue Types

What is ‘1 to 1’?

What is ‘1 to many’?

13/29

Protein Families

• How: cluster the proteins for every isoform in every species, plus UniProt proteins.

• BLASTP comparison of:

– all Ensembl ENSP…

– all metazoan (animal) proteins in UniProt
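
As a rough illustration of turning pairwise BLASTP hits into families, the sketch below groups proteins by single-linkage clustering (connected components over hits above a score cut-off). The slide does not say which clustering algorithm Ensembl uses, so this is a stand-in technique; the protein IDs, scores and cut-off are invented.

```python
from typing import Dict, List, Tuple


def cluster(proteins: List[str], hits: List[Tuple[str, str, float]],
            min_score: float) -> List[List[str]]:
    """Group proteins linked, directly or indirectly, by strong hits."""
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(p: str) -> str:
        # Follow parent links to the representative, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b, score in hits:
        if score >= min_score:
            parent[find(a)] = find(b)  # merge the two groups

    families: Dict[str, List[str]] = {}
    for p in proteins:
        families.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in families.values())


# Toy input: P1-P3 are linked by strong hits, P4 and P5 only by a weak one.
proteins = ["P1", "P2", "P3", "P4", "P5"]
hits = [("P1", "P2", 300.0), ("P2", "P3", 250.0), ("P4", "P5", 40.0)]
print(cluster(proteins, hits, min_score=100.0))
# [['P1', 'P2', 'P3'], ['P4'], ['P5']]
```

Single linkage tends to merge families through chance or multi-domain hits, which is one reason production pipelines typically use more robust graph clustering.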

14/29

Homologues Exercise

1. Find the human MYL6 gene: go to its gene summary.

2. How many paralogues does it have? Find them in the gene tree.

3. Which paralogue is closest to the human MYL6 gene? In what taxon is the common ancestor?

15/29

Pan-Compara (Ensembl Genomes)

x8: Bacillus subtilis, Escherichia coli K12, Mycobacterium tuberculosis H37Rv, Neisseria meningitidis A 4A, Pyrococcus horikoshii, Staphylococcus aureus N315, Streptococcus pneumoniae TIGR4, Streptococcus pyogenes M1 SF370

x2: Plasmodium falciparum, Plasmodium vivax

x3: Anopheles gambiae, Caenorhabditis elegans, Drosophila melanogaster

x3: Arabidopsis thaliana, Oryza sativa japonica, Vitis vinifera

x2: Saccharomyces cerevisiae, Schizosaccharomyces pombe

x13: Anolis carolinensis, Ciona savignyi, Danio rerio, Equus caballus, Gallus gallus, Homo sapiens, Macaca mulatta, Monodelphis domestica, Mus musculus, Ornithorhynchus anatinus, Pan troglodytes, Pongo pygmaeus, Xenopus tropicalis

16/29

www.ensemblgenomes.org

17/29

Families

18/29

Ensembl Proteins in the Family

19/29

Overview of the Talk

• Comparing Genomes

• Homologies and Families

• Sequence Alignments

20/29

Non-Coding Regions

• Large stretches of non-coding regions in vertebrates

• Regulatory regions of:

– Developmental genes

– Transcription factors

– miRNA

Kikuta et al., Genome Research, May 2007

21/29

Comparative Genomics today

22/29

Aligning Whole Genomes: Why?

• To identify homologous regions

• To spot problematic gene predictions

• Conserved regions could be functional

• To define syntenic regions (long regions of DNA sequence where order and orientation are highly conserved)
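
The idea of a syntenic region (conserved order and orientation) can be illustrated with a deliberately strict sketch: given anchors such as orthologous genes, ranked by position in each genome, a block is extended while the second genome's rank moves by exactly one step in the direction given by the strand. The anchor format, the function name and the data are hypothetical; real synteny methods tolerate gaps and small rearrangements.

```python
from typing import List, Tuple

# (rank in genome A, rank in genome B, strand), with strand +1 or -1.
Anchor = Tuple[int, int, int]


def syntenic_blocks(anchors: List[Anchor], min_len: int = 2) -> List[List[Anchor]]:
    """Split anchors (sorted by rank in A) into collinear runs."""
    blocks: List[List[Anchor]] = []
    current: List[Anchor] = []
    for anchor in anchors:
        if current:
            prev_b, prev_strand = current[-1][1], current[-1][2]
            # Same orientation, and B advances one step in that direction.
            extends = anchor[2] == prev_strand and anchor[1] - prev_b == prev_strand
            if not extends:
                if len(current) >= min_len:
                    blocks.append(current)
                current = []
        current.append(anchor)
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Toy anchors: a forward block, an inverted block, then an isolated gene.
anchors = [(1, 10, 1), (2, 11, 1), (3, 12, 1), (4, 30, -1), (5, 29, -1), (6, 50, 1)]
for block in syntenic_blocks(anchors):
    print(block)
```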

23/29

Aligning large genomic sequences

Difficulties:

• Requires significant computing resources

• Scalability, as more and more genomes are sequenced

• Time constraints

• As the «true» alignment is not known, it is difficult to measure alignment accuracy and to choose the right method

24/29

Whole Genome Alignments

• BLASTZ-net (nucleotide level): closer species, e.g. human – mouse

• Translated BLAT (amino acid level): more distant species, e.g. human – zebrafish

• EPO/PECAN: multispecies alignments

• ORTHEUS: used to determine ancestral alleles

25/29

Alignments Exercise

1. Find the Ensembl MYH2 gene for human and go to Region in Detail.

2. Turn on the BLASTZ alignment against cow. What part of the cow genome aligns to this region in human?

3. Jump to the region in cow.

26/29

Alignments Exercise

Go back to the human page.

• Use the Alignments (text) and Multi-species view links to explore the alignments.

27/29

Conserved Regions Exercise

Go back to Region in Detail.

• Turn on the conservation score for 31 species, and the constrained elements tracks.

• Where are the regions of high conservation?

• Click on the regulatory feature that corresponds to a highly conserved block of sequence. What is it?

28/29

Ancestral Alleles

• Go to the variation tab for rs34161789, and take the Phylogenetic Context link.

• What is the allele in the four primates?

Hint… either go to the gene tab and click on the SNP ID from the variation table, or do a new search using rs34161789.

29/29

Compara Team at EBI

• Javier Herrero

• Kathryn Beal

• Stephen Fitzgerald

• Albert Vilella

30/29

End of Course Survey

Exercises on page 43. Answers are on page 44.