
ENCODE pseudogene updates

Adam Frankish, HAVANA

6/10/05

Not added - AK125808

The transcripts on which this pseudogene is based do not appear to have a valid translation; only BC007286.1 has a translation, and it looks spurious.

Reverse strand mRNAs

Ral-GDS related protein Rgr (Rgr) pseudogene

Translation

Not added - YalePgene_139

I have been able to reconstruct a coding gene with a full-length CDS at this locus (AC009892.1) and, as discussed previously, would not annotate a coding gene and a pseudogene at the same locus. Most of the gene (from the 3' end of exon 3 to the final exon, exon 8) is supported by human EST (Em:DN998408.1, Em:BG743947.1) and mRNA (Em:BC033195.1) evidence that matches at 100% and has its best genome hit here. Together these support a structure with an ORF extending from the start to the final exon, although there is a small gap in support in exon 5. Using human ESTs not from this locus, e.g. Em:BM918119.1 (approx. 70% identity at this locus; its best hit in the genome is 100% to the KIR2DL4 gene, also on chr19, by Ensembl SSAHA), the 5' end of exon 3 and two further upstream exons can be clearly identified, and all splice sites are intact. The structure contains a CDS that starts in exon 1 (which shares homology with the N-terminal sequence of several KIR2D family members), ends in the final exon and contains three immunoglobulin domains. These splice sites are preserved despite the lack of transcript evidence from the 5' end of the locus and the quite high divergence between this locus and other gene family members, which suggests that the structure is correct and represents a coding gene rather than a pseudogene.
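The argument above rests on the reconstructed CDS having an intact ORF. As a rough illustration only, and not the HAVANA annotation procedure, the sketch below shows how a candidate CDS could be screened for the disabling features named elsewhere in these slides (no Met, frameshift, premature stop). The function name `orf_problems` and the toy sequences are hypothetical and are not taken from the locus.

```python
# Minimal sketch: screen a candidate CDS (DNA, sense strand) for features
# that would argue for a pseudogene rather than a coding gene.
# Assumption: the input is the spliced CDS only, from start to stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_problems(cds: str) -> list[str]:
    """Return the disabling features found in a candidate CDS (empty if none)."""
    cds = cds.upper()
    problems = []

    # A coding sequence is expected to begin with an initiating Met codon.
    if not cds.startswith("ATG"):
        problems.append("no initiating Met")

    # A length that is not a multiple of 3 hints at a frameshift or truncation.
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3 (possible frameshift)")

    # Split into complete codons in frame 0, ignoring any trailing partial codon.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    # The final complete codon should be a stop codon.
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")

    # Any stop codon before the last codon is a premature stop.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"premature stop at codon {index + 1}")
            break

    return problems


if __name__ == "__main__":
    # Toy sequences for illustration only.
    for name, seq in [("intact", "ATGGCCTAA"), ("early stop", "ATGTAAGCCTAA")]:
        print(name, orf_problems(seq))
```

An empty result is only consistent with a coding gene; the slides also weigh transcript support, splice-site conservation and domain content, which this sketch does not attempt to model.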

Not added - YalePgene_139

Protein EST mRNA

Supporting evidence

Not added - YalePgene_139

Dot plot of EST

Splice donor

Havana+, Yale-, UCSC-

AC006326.4-001

AC006326.2-001

AC063976.2-001

AF277315.12-001

RP11-143H17.1-001

AC009892.5-001

Z84721.2-001

Z84721.4-001

AC103710.2-001

AC103710.4-001

AC129505.5-001

AC087380.10-001

AC087380.14-001

AC002456.1-001

AC009404.5-001

AC114812.7-001

AC011330.5-001

AC011330.8-001

AL162151.3-001

We think the annotation of these as pseudogenes can be supported.

ENm001 - AC006326.2, AC006326.4

heterogeneous nuclear ribonucleoprotein A1 (Hnrpa1) pseudogene

NADH dehydrogenase 2 (MTND2) pseudogene

NADH dehydrogenase 4 (MTND4) pseudogene

Yale pseudo

UCSC pseudo

New cytochrome b (CYTB) pseudogene

ENm002 - AC063976.2

Dot plot

Alignment

ENm004 - RP1-127L4.3

UCSC pseudo

Yale pseudo

HAVANA pseudo

ENm006 - AF277315.12

olfactory receptor family pseudogene

ENm006 - RP11-143H17.1

HAVANA pseudo

Frameshift

ENm007 - AC009892.5

HAVANA LIR pseudogene

ENm008 - Z84721.4

HAVANA hemoglobin, alpha pseudogene

ENm009 - AC103710.2

olfactory receptor, family 51, subfamily N, member 1 pseudogene

Frameshift

ENm009 - AC103710.4

olfactory receptor, family 52, subfamily Y, member 1 pseudogene

ENm009 - AC129505.5

olfactory receptor, family 52, subfamily Z, member 1 pseudogene

No Met

First possible Met

ENm009 - AC087380.10

olfactory receptor, family 51, subfamily A, member 10 pseudogene

Frameshift

ENm009 - AC087380.14

Novel pseudogene

ENm013 - AC002456.1

ribosomal protein L5 (RPL5) pseudogene

ENr121 - AC009404.5

5-hydroxytryptamine (serotonin) receptor 5B (HTR5B) pseudogene

Frameshift

ENr131 - AC114812.7

UDP glycosyltransferase 1 family, polypeptide A2 pseudogene

Frameshift

ENr233 - AC011330.5

Novel pseudogene

3’ truncation ~350aa missing, no stop

ENr233 - AC011330.8

stereocilin (STRC) pseudogene

Stop codon in exon 20

ENr322 - AL162151.3

pseudogene similar to part of ribosomal protein L3 (RPL3)

Protein dot plot

mRNA dot plot