ENCODE pseudogene updates Adam Frankish, HAVANA 13/10/05

Upload: loreen-reynolds

Post on 17-Jan-2018



TRANSCRIPT

Not added - AK
The transcripts on which this pseudogene is based do not appear to have a valid translation (only BC has a translation, and it looks spurious).
[Figure labels: Reverse strand mRNAs; Ral-GDS related protein Rgr (Rgr) pseudogene; Translation]

Not added - YalePgene_139
I have been able to reconstruct a coding gene with a full-length CDS at this locus (AC ) and, as discussed previously, would not annotate a coding gene and a pseudogene at the same locus.

The majority of the gene (from the 3' end of exon 3 to the final exon, exon 8) is supported by 100% matching (best-in-genome hits) human EST (Em:DN , Em:BG ) and mRNA (Em:BC ) evidence. Together these support a structure with an ORF extending from the start to the final exon, although there is a small gap in support in exon 5.

Using human ESTs that are not from this locus, e.g. Em:BM (approx. 70% identity at this locus; its best hit in the genome, by Ensembl SSAHA, is 100% to the KIR2DL4 gene, also on chr19), the 5' end of exon 3 and two further upstream exons can be clearly identified. All splice sites are clearly intact.

The structure contains a CDS which starts in exon 1 (which shares homology with the N-terminal sequence of several KIR2D family members), ends in the final exon and contains three immunoglobulin domains. The splice sites are preserved despite the lack of transcript evidence from the 5' end of the locus and the quite high degree of divergence between this locus and other gene family members. This suggests that the structure is correct and that it is a coding gene rather than a pseudogene.
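The argument for YalePgene_139 rests on two checks: the introns have intact splice sites, and the spliced exons give an ORF running from the start to the final exon. The sketch below illustrates both checks on a toy sequence. It is not the HAVANA annotation procedure. The function names, the forward-strand 0-based half-open exon coordinates, the restriction to canonical GT/AG introns, and the assumption that the CDS spans the whole spliced sequence (no UTRs) are all simplifications introduced here for illustration.

```python
# Illustrative sketch only: names, coordinates and the toy sequence are
# assumptions, not taken from the slides.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_canonical_splice_sites(genomic, exons):
    """Return True if every intron starts with GT and ends with AG.

    exons: sorted list of (start, end) pairs, 0-based half-open,
    on the forward strand of `genomic`.
    """
    for (_, end), (next_start, _) in zip(exons, exons[1:]):
        donor = genomic[end:end + 2]                   # first 2 bases of the intron
        acceptor = genomic[next_start - 2:next_start]  # last 2 bases of the intron
        if donor != "GT" or acceptor != "AG":
            return False
    return True


def spliced_sequence(genomic, exons):
    """Concatenate the exon sequences in order."""
    return "".join(genomic[start:end] for start, end in exons)


def has_full_length_orf(cds):
    """Return True if cds starts with ATG, ends with a stop codon,
    has a length divisible by 3 and contains no internal stop codon."""
    if len(cds) < 6 or len(cds) % 3 != 0 or not cds.startswith("ATG"):
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


# Toy locus: exon 1 = ATGGCC, intron = GTCCCCAG, exon 2 = AAATGA
genomic = "ATGGCC" + "GTCCCCAG" + "AAATGA"
exons = [(0, 6), (14, 20)]

print(has_canonical_splice_sites(genomic, exons))
print(has_full_length_orf(spliced_sequence(genomic, exons)))
```

A real locus would also need reverse-strand handling, non-canonical introns, and a CDS that starts and ends inside the exons rather than at their boundaries.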
Not added - YalePgene_139
[Figure labels: Protein; EST; mRNA; Supporting evidence]

Not added - YalePgene_139
[Figure labels: Dot plot of EST; Splice donor]

Havana+, Yale-, UCSC-
AC AC AC AF RP11-143H AC Z Z AC AC AC AC AC AC AC AC AC AC AL
We think the annotation of these as pseudogenes can be supported.

ENm001 - AC , AC
[Figure labels: heterogeneous nuclear ribonucleoprotein A1 (Hnrpa1) pseudogene; NADH dehydrogenase 2 (MTND2) pseudogene; NADH dehydrogenase 4 (MTND4) pseudogene; Yale pseudo; UCSC pseudo; New cytochrome b (CYTB) pseudogene]

ENm002 - AC
[Figure labels: Dot plot; Alignment]

ENm004 - RP1-127L4.3
[Figure labels: UCSC pseudo; Yale pseudo; HAVANA pseudo]

ENm006 - AF
olfactory receptor family pseudogene

ENm006 - RP11-143H17.1
[Figure labels: HAVANA pseudo; Frameshift]

ENm007 - AC
HAVANA LIR pseudogene

ENm008 - Z
HAVANA hemoglobin, alpha pseudogene

ENm009 - AC
olfactory receptor, family 51, subfamily N, member 1 pseudogene
[Figure labels: Frameshift]

ENm009 - AC
olfactory receptor, family 52, subfamily Y, member 1 pseudogene

ENm009 - AC
olfactory receptor, family 52, subfamily Z, member 1 pseudogene
[Figure labels: No Met; First possible Met]

ENm009 - AC
olfactory receptor, family 51, subfamily A, member 10 pseudogene
[Figure labels: Frameshift]

ENm009 - AC
Novel pseudogene

ENm013 - AC
ribosomal protein L5 (RPL5) pseudogene

ENr121 - AC
5-hydroxytryptamine (serotonin) receptor 5B (HTR5B) pseudogene
[Figure labels: Frameshift]

ENr131 - AC
UDP glycosyltransferase 1 family, polypeptide A2 pseudogene
[Figure labels: Frameshift]

ENr233 - AC
Novel pseudogene
[Figure labels: 3' truncation, ~350 aa missing, no stop]

ENr233 - AC
stereocilin (STRC) pseudogene
[Figure labels: Stop codon in exon 20]

ENr322 - AL
pseudogene similar to part of ribosomal protein L3 (RPL3)
[Figure labels: Protein dot plot; mRNA dot plot]

HAVANA pseudogene overlaps exon
Non-coding locus: AC , AC , AC , AC , AC , AC , AC , AC , RP3-477O4.5
Coding locus opposite strand: AC , RP11-143H17.1, AC , RP11-398K22.9, RP3-477O4.4
Coding locus same strand: AC , Z , AC
We believe all these pseudogenes are valid.

Non-coding locus
[Figure labels: HAVANA sialyltransferase pseudogene; Putative novel transcript; Supporting EST; Aligned proteins (column collapsed)]

Coding locus opposite strand
[Figure labels: Protein alignment; HAVANA novel pseudogene; Non-coding exon; ENm001 Pseudogene: AC ]

Coding locus same strand
[Figure labels: Frameshift; LILRA3; LILR pseudogene]

But not.
[Figure labels: In-frame stop codon; KIR2DL3 coding gene]
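The slide labels above name several disabling features: frameshift, no Met, in-frame stop codon and a missing stop. As a rough illustration, the sketch below flags those features in a candidate CDS string. It is a simplification under stated assumptions, not the method used for these annotations. The function name `disabling_features` is hypothetical, and a length that is not a multiple of three stands in here for a frameshift, which in practice is called from an alignment to the parent protein or mRNA.

```python
# Illustrative sketch only: the function name and the length-based
# frameshift proxy are assumptions, not taken from the slides.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def disabling_features(cds):
    """Return a list of disabling features found in a candidate CDS."""
    features = []
    if not cds.startswith("ATG"):
        features.append("no Met")
    if len(cds) % 3 != 0:
        # Crude proxy: real frameshift calls come from alignments.
        features.append("frameshift")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        features.append("in-frame stop codon")
    if not codons or codons[-1] not in STOP_CODONS:
        features.append("no stop")
    return features


for candidate in ["ATGGCCTGA", "GCCGCCTGA", "ATGTAAGCCTGA", "ATGGCCGCC"]:
    print(candidate, disabling_features(candidate))
```

The output is best read as a list of flags for manual curation. As the final slide indicates with KIR2DL3, an in-frame stop codon in the sequence does not by itself settle whether a locus is a pseudogene.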