Annotation and Alignment of the Drosophila Genomes, Centro de Ciencias Genomicas, May 29, 2006

  • Annotation and Alignment of the Drosophila Genomes. Centro de Ciencias Genomicas, May 29, 2006.

  • Genes or Regulation?

    “10,516 putative orthologs have been identified as a core gene set conserved over 25–55 million years (Myr) since the pseudoobscura/melanogaster divergence”

    Cis-regulatory sequences are more conserved than random and nearby sequences between the species, but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible.

    Richards et al., Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution, Genome Res., Jan 2005.

  • http://rana.lbl.gov/drosophila/wiki/

  • BP England, U Heberlein, R Tjian. Purified Drosophila transcription factor, Adh distal factor-1 (Adf-1), binds to sites in several Drosophila promoters and activates transcription, J Biol Chem 1990.

  • S. Chatterji and L. Pachter, GeneMapper: Reference based annotation with GeneMapper, in press. http://bio.math.berkeley.edu/genemapper/

  • Genes or Regulatory Elements?

    “10,516 10,867 putative orthologs have been identified as a core gene set conserved over 25–55 million years (Myr) since the pseudoobscura/melanogaster divergence”

    Cis-regulatory sequences are more conserved than random and nearby sequences between the species, but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible.

    Richards et al., Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution, Genome Res., Jan 2005.

  • Alignment of coding sequence

    DroAna_20041206_  GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
    DroMel_4_         GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
    DroMoj_20041206_  GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
    DroPse_1_         GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
    DroSim_20040829_  GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
    DroVir_20041029_  GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
    DroYak_1_         GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
                      ****** * ****** ** ** ** *****  **** ** ** ** ** ****** * **

    Alignment of non-coding sequence

    DroAna_20041206_  CTGAAGGAAT-------TCTATATT---------AAAGAAGATTTCTCATCATTGGTTG
    DroMel_4_         CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
    DroMoj_20041206_  CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA-------
    DroPse_1_         CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
    DroSim_20040829_  CTGCGGGATTAGGAGTCATTAGAGT---------GCGGAAAAGCGG---------GTT-
    DroVir_20041029_  CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA-------
    DroYak_1_         CTGCGGGATTAGCGGTCATTGGTGT---------GAAGAATAGATC---------CTTT
                      ***    * *         *

    DroAna_20041206_  AATC-----ACTTAC
    DroMel_4_         ATTCTATGGACTCAC
    DroMoj_20041206_  ----TATTTACTCAC
    DroPse_1_         ------TGTACTTAC
    DroSim_20040829_  ATTCTATGGACTCAC
    DroVir_20041029_  ----TATTTACTCAC
    DroYak_1_         ATTTCATAAACTCAC
                               *** **

  • Alignment of coding sequence

    DroAna_20041206_  GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
    DroMel_4_         GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
    DroMoj_20041206_  GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
    DroPse_1_         GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
    DroSim_20040829_  GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
    DroVir_20041029_  GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
    DroYak_1_         GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
                      ****** * ****** ** ** ** *****  **** ** ** ** ** ****** * **

    Alignment of non-coding sequence

    droAna1.2448876      CTGAAGGAATTCTA--TATTAAAG-------------------------------
    dm2.chr2L            CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
    droMoj1.contig_2959  CTGGAATAGTTAATTTCATTGTAA---------CACATAAA--CGTTTTAAATTC
    dp3.chr4_group3      CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG
    droSim1.chr2L        CTGCGGGATTAGGAGTCATTAGAG---------TGCGGAAAAGCGGG--TTATTC
    droVir1.scaffold_6   CTGCAGCAGTTAA-ATAATTGTAA---------TAAACAA----TTCTCTAATTT
    droYak1.chr2L        CTGCGGGATTAGCGGTCATTGGTG---------TGAAGAATAGATCCT-TTATTT
                         ***    * *       * *

    droAna1.2448876      AAGATTTCTCATCATTGGTTGAATC---------------------ACTTAC
    dm2.chr2L            -----------------------------------------TATGGACTCAC
    droMoj1.contig_2959  -------------------------AAATATTT--------TATTGACTCAC
    dp3.chr4_group3      -----------------------------------------TGT--ACTTAC
    droSim1.chr2L        -----------------------------------------TATGGACTCAC
    droVir1.scaffold_6   ---------------------------------AAATATTTGGTCCACTCAC
    droYak1.chr2L        -----------------------------------------CATAAACTCAC
                                                                       *** **
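The row of asterisks under each alignment block marks the columns in which every species carries the same base. A minimal Python sketch of how such a row could be computed, assuming a column counts as conserved only when all rows agree and none of them has a gap there (`conservation_line` is a hypothetical helper, not something named on the slides):

```python
def conservation_line(rows):
    """Return a string with '*' under fully conserved columns and ' ' elsewhere."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all alignment rows must have the same length")
    marks = []
    for column in zip(*rows):
        # Conserved: no gap character and every row shows the same base.
        conserved = column[0] != "-" and all(c == column[0] for c in column)
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Last six columns of the non-coding alignment, first four species.
print(conservation_line(["ACTTAC", "ACTCAC", "ACTCAC", "ACTTAC"]))
```

For these six columns the sketch marks the first three and the last two, which agrees with the asterisks shown at the end of the non-coding blocks.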

  • Example of a conserved microRNA target

  • Richards et al., Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution, Genome Res., Jan 2005.

  • How is an alignment made from two sequences?

    >dm2.chr2L
    CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC
    >dp3.chr4_group3
    CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTACTTAC

    dm2.chr2L        CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
    dp3.chr4_group3  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG

    dm2.chr2L        TATGGACTCAC
    dp3.chr4_group3  TGT--ACTTAC

    Given two sequences of lengths n, m:

    n=50

    m=62
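The slides do not say which algorithm produced the alignments shown. As one standard possibility, here is a minimal Python sketch of Needleman-Wunsch global alignment with a flat penalty per space. The function name and the scores (`match`, `mismatch`, `space`) are illustrative assumptions, not values from the talk.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, space=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * space
    for j in range(1, m + 1):
        score[0][j] = j * space
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + space,
                              score[i][j - 1] + space)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + space:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

Different choices of the three scores can give different optimal alignments of the same two sequences, which is the situation the two alignments of the mel and pse sequences illustrate.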

  • dm2.chr2L        CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
    dp3.chr4_group3  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG

    dm2.chr2L        TATGGACTCAC
    dp3.chr4_group3  TGT--ACTTAC

    DroMel_4_  CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
    DroPse_1_  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----

    DroMel_4_  ATTCTATGGACTCAC
    DroPse_1_  ------TGTACTTAC

    Each alignment can be summarized by counting the number of matches (#M), mismatches (#X), gaps (#G), and spaces (#S).

    2(#M+#X)+#S=112, so #X, #G and #S suffice to specify a summary.
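The four counts can be read directly off the two rows of an alignment. Below is a minimal Python sketch, assuming that a space is a single `-` character and a gap is a maximal run of consecutive spaces in one row; `summarize` is a hypothetical helper name.

```python
def summarize(row_a, row_b):
    """Return (#M, #X, #G, #S) for a pairwise alignment given as two rows."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    matches = mismatches = gaps = spaces = 0
    previous = None  # row holding a space in the previous column, if any
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot consist of two spaces")
        if x == "-" or y == "-":
            current = "a" if x == "-" else "b"
            spaces += 1
            if current != previous:  # a new run of spaces starts here
                gaps += 1
            previous = current
        else:
            previous = None
            if x == y:
                matches += 1
            else:
                mismatches += 1
    return matches, mismatches, gaps, spaces


m_count, x_count, g_count, s_count = summarize("AC-GT", "ATTG-")
# Every base sits either in a paired column or opposite a space,
# so 2(#M + #X) + #S equals the combined length of the two sequences.
assert 2 * (m_count + x_count) + s_count == len("ACGT") + len("ATTG")
print(m_count, x_count, g_count, s_count)
```

The identity checked at the end is the same one the slide uses: once the sequence lengths are fixed, #M follows from #X and #S, so three numbers are enough.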

  • The summary of an alignment is a point in 3-dimensional space. For example, the two alignments just shown correspond to the points:

    (22,3,12) and (18,3,28)

    In the example of our two sequences there are 379522884096444556699773447791552717765633 different alignments, but only 53890 different summaries. So we don't need to plot that many points.

    But 53890 is still quite a large number. Fortunately, there are only 69 vertices on the convex hull of the 53890 points.

    These are the interesting ones, and we can even draw them.
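The slides do not state how the number of alignments was obtained. One common convention counts an alignment as a sequence of columns, each either a pair of bases or a base opposite a space. Under that assumption the count satisfies a three-term recurrence (the Delannoy numbers), sketched below in Python with the hypothetical helper `count_alignments`.

```python
def count_alignments(n, m):
    """Number of alignments of sequences of lengths n and m.

    Assumes every column is a pair, a space in the first row, or a space
    in the second row, so D(i, j) = D(i-1, j) + D(i, j-1) + D(i-1, j-1).
    """
    previous = [1] * (m + 1)  # D(0, j) = 1: only spaces in the first row
    for _ in range(n):
        current = [1]  # D(i, 0) = 1: only spaces in the second row
        for j in range(1, m + 1):
            current.append(previous[j] + current[j - 1] + previous[j - 1])
        previous = current
    return previous[m]


print(count_alignments(2, 2))
print(count_alignments(50, 62))
```

Python integers have arbitrary precision, so the count for sequences of the lengths on the slide is computed exactly. Whether it reproduces the figure quoted above depends on the slide using the same counting convention.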

  • For the sequences:

    >mel
    CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
    >pse
    CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC

    the alignment polytope is:
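The polytope is the convex hull of all summaries (#X, #G, #S). For very short sequences the set of summaries can be enumerated outright, and minimizing a linear cost over that set always selects a vertex of the hull. The Python sketch below is a brute-force illustration under those assumptions, not the method used to compute the polytope in the talk. The names `summaries` and `cheapest_summary` are hypothetical, and the set it builds grows quickly with sequence length.

```python
from collections import defaultdict


def summaries(a, b):
    """Set of (#X, #G, #S) triples over all alignments of a and b."""
    n, m = len(a), len(b)
    # table[(i, j, last)]: summaries of alignments of a[:i] and b[:j] whose
    # final column is a pair ("P"), a space in row a ("A"), or in row b ("B").
    table = defaultdict(set)
    table[(0, 0, "P")].add((0, 0, 0))
    for i in range(n + 1):
        for j in range(m + 1):
            for last in "PAB":
                for x, g, s in table[(i, j, last)]:
                    if i < n and j < m:  # pair a[i] with b[j]
                        table[(i + 1, j + 1, "P")].add((x + (a[i] != b[j]), g, s))
                    if i < n:  # a[i] opposite a space in row b
                        table[(i + 1, j, "B")].add((x, g + (last != "B"), s + 1))
                    if j < m:  # b[j] opposite a space in row a
                        table[(i, j + 1, "A")].add((x, g + (last != "A"), s + 1))
    return table[(n, m, "P")] | table[(n, m, "A")] | table[(n, m, "B")]


def cheapest_summary(points, x_cost, g_cost, s_cost):
    """Summary minimizing x_cost*#X + g_cost*#G + s_cost*#S."""
    return min(points, key=lambda p: x_cost * p[0] + g_cost * p[1] + s_cost * p[2])


points = summaries("CTGCGG", "CTGGAAG")
print(len(points))
print(cheapest_summary(points, x_cost=1, g_cost=2, s_cost=1))
```

Varying the three costs and recording which summary wins is one way to see why only the hull vertices matter: a summary in the interior of the hull is never the cheapest for any choice of costs.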

  • mel  CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
    pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

    mel  CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
    pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

    mel  CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
    pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

    mel  CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
    pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

    mel  CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
    pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

    mel  CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
    pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

    mel  CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
    pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

    mel  CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
    pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC
