Basics of Comparative Genomics, Dr G. P. S. Raghava


Upload: dora-porter

Post on 18-Dec-2015


TRANSCRIPT

Page 1: Basics of Comparative Genomics Dr G. P. S. Raghava

Basics of Comparative Genomics

Dr G. P. S. Raghava

Page 2: Basics of Comparative Genomics Dr G. P. S. Raghava

AIM: To understand the biology of organisms

Importance: More than 100 genomes sequenced, more than 250 in progress

Definition: Comparison of the set of proteins of one genome with that of another genome, plus comparison of gene location, gene order and gene regulation

Applications
– Visualization of information on the genome
– Genome annotation (prediction of genes, repeats, regulatory regions)
– Evolutionary information (gene loss, duplication, horizontal gene transfer, ancestry)
– Essential genes for cell survival
– Classification of genes based on function

Tools and Databases

Page 3: Basics of Comparative Genomics Dr G. P. S. Raghava

What is comparative genomics?

Analyzing and comparing genetic material from different species to study evolution, gene function, and inherited disease

Understanding what is unique to each species

Page 4: Basics of Comparative Genomics Dr G. P. S. Raghava

Why Comparative Genomics?

It tells us what is common and what is unique between different species at the genome level.

Genome comparison may be the surest and most reliable way to identify genes and predict their functions and interactions.
– e.g., to distinguish orthologs from paralogs

The functions of human genes and other DNA regions can be revealed by studying their counterparts in lower organisms.

Page 5: Basics of Comparative Genomics Dr G. P. S. Raghava

What is compared?

Gene location

Gene structure
– Exon number
– Exon lengths
– Intron lengths
– Sequence similarity

Gene characteristics
– Splice sites
– Codon usage
– Conserved synteny
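The gene-structure items above (exon number, exon lengths, intron lengths) can be derived from exon coordinates alone. The following is a minimal sketch, assuming exons are given as 1-based inclusive (start, end) pairs on a single strand; the function name `gene_structure` and the coordinate convention are illustrative assumptions, not something stated on the slide.

```python
def gene_structure(exons):
    """Summarise one gene from its exon coordinates.

    exons: list of (start, end) pairs, 1-based and inclusive (an assumed
    convention), all on the same strand and non-overlapping.
    """
    exons = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in exons]
    # An intron is the gap between the end of one exon and the start of the next.
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]
    return {
        "exon_number": len(exons),
        "exon_lengths": exon_lengths,
        "intron_lengths": intron_lengths,
    }
```

Computing the same summary for the counterpart gene in a second species gives two dictionaries that can be compared field by field.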

Page 6: Basics of Comparative Genomics Dr G. P. S. Raghava

A few facts from genome comparison

High degree of conservation of microbial proteins (~70% ancestral conserved regions)

Proteins related to ENERGY processes are generally found in all genomes

Proteins related to COMMUNICATION represent the most distinctive functions in each genome

INFORMATION-related proteins show complex behaviour

High frequency (~10%) of non-orthologous gene displacement

Page 7: Basics of Comparative Genomics Dr G. P. S. Raghava

A few terms

Homology: Homology is the relationship of any two characters (such as two proteins that have similar sequences) that have descended, usually through divergence, from a common ancestral character. Homologues are thus components or characters (such as genes/proteins with similar sequences) that can be attributed to a common ancestor of the two organisms during evolution.

Page 8: Basics of Comparative Genomics Dr G. P. S. Raghava

Homologues can be orthologues, paralogues, or xenologues.

Orthologues are homologues that have evolved from a common ancestral gene by speciation. They usually have similar functions.

Paralogues are homologues that are produced by duplication within a genome, followed by subsequent divergence. They often have different functions.

Xenologues are homologues that are related by an interspecies (horizontal) transfer of the genetic material for one of the homologues. The functions of xenologues are quite often similar.

Page 9: Basics of Comparative Genomics Dr G. P. S. Raghava

Analogues

Analogues are non-homologous genes/proteins that have evolved convergently from unrelated ancestors. They have similar functions although they are unrelated in either sequence or structure.

Page 10: Basics of Comparative Genomics Dr G. P. S. Raghava

Frequently used terms

Homology
– Orthologous: derived from a common ancestral gene; usually have similar functions
– Paralogous: arising from duplication of a gene within a genome; usually have different functions
– Xenologous: related by an interspecies (horizontal) transfer of genetic material; have similar functions

Analogous: not evolved from the same ancestor

Similarity: sequence similarity

Percent identity
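Percent identity is listed above without a definition. One common way to compute it from a pairwise alignment is sketched below. The convention used here (count identical residues over columns where neither sequence has a gap) is an assumption, since the slide does not say which denominator is intended, and the function name is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they have
    equal length. Other conventions divide by the full alignment length or
    by the shorter sequence; this is only one of several possible choices.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are skipped under this convention
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

Note that high percent identity indicates similarity only; whether two sequences are homologous is an inference about common ancestry made from that similarity.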

Page 11: Basics of Comparative Genomics Dr G. P. S. Raghava

Visualising Genome Information

Page 12: Basics of Comparative Genomics Dr G. P. S. Raghava

Genome Annotation

The process of adding biological information and predictions to a sequenced genome framework

Page 13: Basics of Comparative Genomics Dr G. P. S. Raghava

All-against-all Self-comparison

How?
– Make a database of the proteome
– Use each protein as a query in a similarity search against the database (BLAST, WU-BLAST or FASTA)
– Generate a matrix of alignment scores (P or E value)
: A conservative cutoff E value: 10e-6

Why?
– Number of gene families: this comparison distinguishes unique proteins from proteins that have arisen from gene duplication, and also reveals the number of gene families.
– Paralogs: significantly matched pairs of protein sequences may be paralogs.
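The grouping step above can be sketched as single-linkage clustering of the significant hits: any two proteins linked by a hit below the E-value cutoff end up in the same family, and proteins left alone are the unique ones. This is a minimal sketch under stated assumptions, not the procedure used by any particular tool. The input format (a list of `(query, subject, evalue)` tuples already parsed from the search output), the function name, and the default cutoff of `1e-6` (one reading of the slide's "10e-6") are all assumptions.

```python
from collections import defaultdict


def gene_families(proteins, hits, evalue_cutoff=1e-6):
    """Cluster a proteome into families from all-against-all search hits.

    proteins: iterable of protein identifiers in the proteome.
    hits: iterable of (query, subject, evalue) tuples; both identifiers
          are assumed to be present in `proteins`.
    Returns a list of sets; singleton sets are proteins with no
    significant match to any other protein.
    """
    proteins = list(proteins)
    parent = {p: p for p in proteins}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        # Self-hits carry no information about duplication, so skip them.
        if query != subject and evalue <= evalue_cutoff:
            parent[find(query)] = find(subject)

    families = defaultdict(set)
    for p in proteins:
        families[find(p)].add(p)
    return list(families.values())
```

The number of families is then `len(gene_families(...))`, and pairs inside a multi-member family are the candidate paralogs. Single linkage can chain unrelated proteins together through shared domains, so a stricter cutoff or a coverage filter may be needed in practice.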

Page 14: Basics of Comparative Genomics Dr G. P. S. Raghava

Between-Proteome Comparisons: Why?

To identify orthologs, gene families, and domains

Orthologs (proteins that share a common ancestry and function):
– A pair of proteins in two organisms that align along most of their lengths with a highly significant alignment score.
– These proteins perform the core biological functions shared by the two organisms.
– Two matched sequences (X in A, Y in B) may not be orthologs (e.g., Y and Z are paralogs in B, and X and Z are the orthologs).
– Identify true orthologs:
(a) highest-scoring match (best hit)
(b) E value < 0.01
(c) > 60% alignment over both proteins
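The three criteria (a) to (c) can be applied as a simple filter over search hits. The sketch below assumes each hit has already been reduced to a score, an E value, and the fraction of each protein covered by the alignment; the `Hit` record, its field names, and `candidate_orthologs` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str          # protein in organism A
    subject: str        # protein in organism B
    score: float        # alignment score (higher is better)
    evalue: float
    query_cov: float    # fraction of the query inside the alignment
    subject_cov: float  # fraction of the subject inside the alignment


def candidate_orthologs(hits, max_evalue=0.01, min_coverage=0.60):
    """Return {query: subject} for hits passing criteria (a), (b) and (c)."""
    best = {}
    for h in hits:
        # (a) keep only the highest-scoring match for each query
        if h.query not in best or h.score > best[h.query].score:
            best[h.query] = h
    return {
        q: h.subject
        for q, h in best.items()
        if h.evalue < max_evalue           # (b) E value < 0.01
        and h.query_cov > min_coverage     # (c) > 60% of the query aligned
        and h.subject_cov > min_coverage   # (c) > 60% of the subject aligned
    }
```

As the X, Y, Z case above points out, a best hit that passes these thresholds is still only a candidate: if the true ortholog has been lost or is missing from the database, the best hit may be a paralog.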

Page 15: Basics of Comparative Genomics Dr G. P. S. Raghava

Between-Proteome Comparisons: How?

1. Choose a yeast protein and perform a database similarity search of the worm proteome (WU-BLAST): a yeast-versus-worm search
2. Group the worm seqs that match the yeast query seq with a highly significant P value (10^-10 to 10^-100), and also include the yeast query seq in the group
3. From the group made in 2, choose a worm seq and search the yeast proteome, using the same P limit
4. Add any matching yeast seq to the group made in 2
5. Repeat 3 & 4 for all initially matched seqs in the group
6. Repeat 1-5 for every yeast protein
7. As in 1-6, perform a comparable worm-versus-yeast search
8. Coalesce the groups of related seqs and remove any redundancies so that every sequence is represented only once
9. Eliminate any matched pairs in which less than 80% of each seq is in the alignment
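The grouping logic of steps 1 to 8 can be sketched as follows. This sketch does not run any searches: it assumes the two search results are already available as dictionaries mapping each protein to the set of proteins it matched in the other proteome, with the P limit and the 80% coverage rule of step 9 applied beforehand (a reordering of the slide's steps, made for simplicity). All function and variable names are illustrative, and identifiers are assumed to be unique across the two proteomes.

```python
def seed_group(seed, forward, backward):
    """Steps 1-5: the seed, its matches, and the matches of those matches."""
    group = {seed}
    first_matches = forward.get(seed, set())
    group |= first_matches
    for other in first_matches:
        # Search back into the seed's own proteome from each initial match.
        group |= backward.get(other, set())
    return group


def coalesce(groups):
    """Step 8: merge groups that share a sequence so each appears only once."""
    merged = []
    for g in groups:
        g = set(g)
        remaining = []
        for m in merged:
            if m & g:
                g |= m          # absorb every existing group that overlaps
            else:
                remaining.append(m)
        remaining.append(g)
        merged = remaining
    return merged


def build_groups(yeast_hits, worm_hits):
    """Steps 6-8: seed from every yeast and every worm protein, then merge.

    yeast_hits: {yeast_id: set of matching worm_ids}
    worm_hits:  {worm_id: set of matching yeast_ids}
    """
    groups = [seed_group(y, yeast_hits, worm_hits) for y in yeast_hits]
    groups += [seed_group(w, worm_hits, yeast_hits) for w in worm_hits]
    return coalesce(groups)
```

Each resulting group is a set of related yeast and worm sequences; groups containing a single sequence correspond to proteins with no qualifying match in the other proteome.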

Page 16: Basics of Comparative Genomics Dr G. P. S. Raghava

Figure 1. Regions of the human and mouse homologous genes: coding exons (white), noncoding exons (gray), introns (dark gray), and intergenic regions (black). Corresponding strong (white) and weak (gray) alignment regions of GLASS are shown connected with arrows. Dark lines connecting the alignment regions denote very weak or no alignment. The predicted coding regions of ROSETTA in human, and the corresponding regions in mouse, are shown (white) between the genes and the alignment regions.

Page 17: Basics of Comparative Genomics Dr G. P. S. Raghava

Target Validation

Target validation involves taking steps to prove that a DNA, RNA, or protein molecule is directly involved in a disease process and is therefore a suitable target for development of a new therapeutic compound.

Genes that do not belong to an established family are critical to many disease processes and also need to be validated as potential drug targets.
