encode pseudogene call summary
DESCRIPTION
ENCODE Pseudogene Call Summary. Mark Gerstein 2005,10.27 11:00 EDT (Draft for G&T call on 2005,10.28 10:00 EDT). Pseudogene group.
TRANSCRIPT
1
ENCODE Pseudogene Call Summary
Mark Gerstein
2005,10.27 11:00 EDT
(Draft for G&T call on 2005,10.28 10:00 EDT)
2
Pseudogene group
Core people: Jennifer Harrow <[email protected]>, WEI Chia-Lin <[email protected]>, Adam Frankish <[email protected]>, "Dike, Sujit" <[email protected]>, Robert Baertsch <[email protected]>, [email protected], Deyou Zheng <[email protected]>, Yontao Lu <[email protected]>, [email protected], [email protected]
Others: "Hoyem, Tara L" <[email protected]>, Roderic Guigo Serra <[email protected]>, "Gingeras, Tom" <[email protected]>, [email protected], Suganthi Balasubramanian <[email protected]>
6 Calls: Sept. 15, 22; Oct. 6, 13, 20, 27
3
Refresher: many repetitions of the below “Venn analysis”
Havana-Gencode: 165 pseudogenes (167 - 2)
Yale: 167 pseudogenes (164 + 3)
UCSC retrogenes: 146 not expressed
[Venn diagram region counts: 81 (34); 16 (7); 33 (1); 15 (1); 17 (2); 16 (0); 54 (2)]
7 Havana agrees to be added (8, 11, 40, 59, 139, 152, 169).
4 at coding loci. [Yale agrees to delete]
1 with weak sequence identity.*
5 with “non-real” proteins.*

9 Havana agrees to be added.
2 at coding loci. [Yale agrees to delete]
1 with weak sequence identity.*
2 with “non-real” proteins.*
* Solved by consistent protein set & threshold
Numbers according to Adam’s note
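The Venn analysis of the three call sets listed above amounts to partitioning the loci into seven regions. A minimal Python sketch, assuming each group's calls have already been mapped to shared locus identifiers; the function name and the IDs below are made up for illustration and are not the real calls:

```python
def venn_regions(havana, yale, ucsc):
    """Return the seven regions of a three-set Venn diagram as a dict of sets."""
    return {
        "havana_only": havana - yale - ucsc,
        "yale_only": yale - havana - ucsc,
        "ucsc_only": ucsc - havana - yale,
        "havana_yale": (havana & yale) - ucsc,
        "havana_ucsc": (havana & ucsc) - yale,
        "yale_ucsc": (yale & ucsc) - havana,
        "all_three": havana & yale & ucsc,
    }


# Hypothetical locus IDs, for illustration only.
havana = {"p1", "p2", "p3", "p4"}
yale = {"p2", "p3", "p5"}
ucsc = {"p3", "p4", "p6"}

regions = venn_regions(havana, yale, ucsc)
counts = {name: len(ids) for name, ids in regions.items()}
```

The region counts always sum to the size of the union, which is a quick sanity check on any such table.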
4
A proposal for a qualified union with uniform criteria for boundaries
1. Identify a “good” set of human proteins – HAVANA set?
2. Remove pseudogenes (from all 4 groups) overlapping with current GENCODE exons (does GENCODE have an updated version?).
3. Create a union of the remaining pseudogenes.
4. Find the “best” matching proteins for each pseudogene, remove entries without a BLAST hit (e-value cutoff issue?).
5. Realign each pseudogene to its parent protein to produce a uniform alignment and to define the start and end coordinates.
6. Apply a threshold to sequence identity and coverage? (No.)
7. Classify pseudogenes into processed and non-processed (how?)
Overall: 222 pseudogenes
Application of the above recipe gives 198 consensus pseudogenes
Intersection set of the above is 81 (proc) + 49 (non-proc)
on browser + encode wiki + http://pseudogene.org/ENCODE
From Deyou Z. + Robert B.
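Steps 2 and 3 of the proposal (drop calls that overlap exons, then take the union of what remains) can be sketched in Python as below. This is only an illustration under stated assumptions: intervals are half-open `(chrom, start, end)` tuples, the coordinates are invented, and overlapping calls are merged by taking their outer extent, which stands in for the protein realignment of step 5 rather than reproducing it.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def qualified_union(calls, exons):
    """Drop calls overlapping any exon, then merge overlapping survivors."""
    kept = sorted(c for c in calls if not any(overlaps(c, e) for e in exons))
    merged = []
    for chrom, start, end in kept:
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Same locus as the previous call: extend it.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical coordinates, for illustration only.
calls = [
    ("chr1", 100, 200),  # call from one group
    ("chr1", 150, 260),  # call from another group at the same locus
    ("chr1", 500, 600),  # overlaps an exon, so it is removed
    ("chr2", 100, 200),
]
exons = [("chr1", 550, 650)]

union = qualified_union(calls, exons)
```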
5
Insertion into processed pseudogene
heterogeneous nuclear ribonucleoprotein A1 (HNRPA1) pseudogene (parent on Chr12)
NADH dehydrogenase 2 (MTND2) pseudogene (parent mitochondrial)
NADH dehydrogenase 4 (MTND4) pseudogene (parent mitochondrial)
cytochrome b (CYTB) pseudogene (parent mitochondrial)
First insertion event
Remnant of a second, mitochondrial insertion event (has post-insertion deletions)
Protein evidence
From Adam F.
6
Rearranged exon order in unprocessed pseudogene
Protein evidence
adaptor-related protein complex 1, beta 1 subunit (AP1B1) pseudogenes
Exon 6 Exon 3
Splice sites same as parent gene
Dot plot protein evidence vs genome
Following duplication of the AP1B1 locus, rearrangements/duplications have produced two unprocessed pseudogenes corresponding to exons 6 and 3 of the parent gene
From Adam F.
7
Rearrangement of processed pseudogene
pseudogene similar to part of ribosomal protein L3 (RPL3)
Protein dot plot
mRNA dot plot
Following insertion, one end of the RPL3 pseudogene has been flipped onto the opposite strand (with some loss of internal sequence)
From Adam F.
8
Transcription among 198 consensus pseudogenes
- Number overlapped by interrogated regions (Affy arrays): 180 (90.9%)
- Number overlapped by Yale TARs or Affy transfrags (union): 106 (53.5% of all; 58.9% of interrogated)
=> There is evidence of transcription (from TARs or transfrags) of the pseudogene or the parent gene (if cross-hybridization) for 53.5% of the consensus pseudogenes
- Number overlapped by CAGE tags: 11 (5.6%)
- Number overlapped by ditag tags: 1 (0.5%) (83 (41.9%) are overlapped by full-length ditags)
From France D.
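The two denominators used on this slide (all 198 consensus pseudogenes versus the 180 covered by interrogated regions) can be checked with a few lines of Python. The counts are taken from the slide; the helper name `pct` is just for illustration:

```python
def pct(count, total):
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


TOTAL = 198         # consensus pseudogenes
INTERROGATED = 180  # overlapped by interrogated regions
TRANSCRIBED = 106   # overlapped by TARs or transfrags (union)

share_interrogated = pct(INTERROGATED, TOTAL)
share_of_all = pct(TRANSCRIBED, TOTAL)
share_of_interrogated = pct(TRANSCRIBED, INTERROGATED)
```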
9
Pseudogene overlapped by tars/transfrags and ditags: ENCODE_consensus_187
93% similar to parent
From France D.
10
Overlaps by tar/transfrag subset
- Number overlapped by interrogated regions (Affy arrays): 180 (90.9%)
- Number overlapped by Yale TARs or Affy transfrags (union): 106 (53.5% of all; 58.9% of interrogated)
- Number overlapped by Yale TARs (union): 84 (42.4% of all; 46.7% of interrogated)
- Number overlapped by Affy transfrags (union): 102 (51.5% of all; 56.7% of interrogated)
- Number overlapped by polyA+ TARs/transfrags (union): 105 (53% of all; 58.3% of interrogated)
- Number overlapped by total RNA TARs (union): 61 (30.8% of all; 33.9% of interrogated)
From France D.
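How much the Yale and Affy evidence agree follows from the counts above by inclusion-exclusion. A short worked sketch, assuming the 106 figure is exactly the union of the 84 and 102 sets (the slide labels it that way, but the derived numbers below do not appear in the source):

```python
yale_tars = 84         # overlapped by Yale TARs
affy_transfrags = 102  # overlapped by Affy transfrags
either = 106           # overlapped by Yale TARs or Affy transfrags

# |A and B| = |A| + |B| - |A or B|
both = yale_tars + affy_transfrags - either
yale_only = yale_tars - both
affy_only = affy_transfrags - both
```

Under that assumption most pseudogenes with any TAR or transfrag overlap are supported by both data sets.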
11
ENCODE pseudogenes expression
• ENCODE pseudogenes from the intersection part of consensus set
– 49 non-processed, 125 processed
• Designed oligos (25mer, Tm 70°C)
– Either specific to pseudogene or shared between parental gene and pseudogene
From Alex R.
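The split between pseudogene-specific and shared oligos can be illustrated by comparing the k-mers of a pseudogene with those of its parent. This is a rough sketch only: the function names and toy sequences are invented, the example uses k=5 instead of 25 to stay readable, and it ignores melting temperature, the reverse strand, and near-matches, all of which a real design would have to consider.

```python
def kmers(seq, k=25):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_oligos(pseudogene, parent, k=25):
    """Split candidate k-mers of the pseudogene into specific and shared."""
    pg = kmers(pseudogene, k)
    par = kmers(parent, k)
    return {"specific": pg - par, "shared": pg & par}


# Toy sequences differing only at the last base (illustration only).
pseudogene = "AAACCCGGGT"
parent = "AAACCCGGGA"

candidates = classify_oligos(pseudogene, parent, k=5)
```

Here only the k-mer spanning the mismatch is pseudogene-specific; every other candidate would also hybridize to the parent.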
12
ENCODE pseudogenes expression #2
• 5’RACE in 12 human tissues
– Brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta
– First 96 pseudogenes 5’RACEs done in 12 tissues
– Last 78 will be done next week
• To do: pool multiple RACEs, send to Santa Clara and hybridize to Affymetrix ENCODE 20 nucleotide resolution arrays
Stylianos Antonarakis, Robert Baertsch, Jorg Drenkow, Tom Gingeras, Charlotte Henrichsen, Philipp Kapranov, Catherine Ucla, Alexandre Reymond
Affymetrix, UCSC, University of Geneva, University of Lausanne
From Alex R.
13
Expression from pseudogene locus (1) – putative novel transcript
HAVANA sialyltransferase pseudogene (RP3-477O4.5) supported by protein evidence
Putative novel transcript supported by a single EST which has a polyA site and signal
Supporting EST (100% ID)
Aligned proteins (column collapsed)
polyA site and signal
Appears to be some transcription from this locus which is supported at the 3’ end by a single EST
From Adam F.
14
Expression from pseudogene locus (2) – 5’ UTR of known gene
Frameshift
LILRA3
LILR pseudogene
Upstream pseudogene corresponds to exons 1-3 of LILR family genes, 3’ exons have been lost. EST evidence supports expression from the pseudogene locus extending to known gene LILRA3.
From Adam F.
15
Intersect Consensus Pseudogenes with ChIP-chip Hits
Factors            E2F      H3K4me3 (0h)  H3K4me3 (30h)  Sp3       STAT1
Group              UCDavis  UCSD          UCSD           Stanford  Yale
Total Hits         400      1000          1000           400       400
Known Genes (405)  145      149           154            86        15
Pseudogenes (198)  4        25            24             3         7
From Deyou Z.
16
Consensus Pseudogenes with ≥2 ChIP-chip Hits
Pgene-ID  Pgene-type     E2F  H3K4me3 (0h)  H3K4me3 (30h)  Sp3  STAT1
13        Processed      0    1             1              0    0
45        Processed      0    1             1              0    0
47        Processed      0    1             1              0    0
77        Processed      1    1             1              0    0
126       Processed      0    1             1              0    0
149       Processed      1    1             1              0    0
174       Non-Processed  0    1             1              0    0
[ 177 ]   Non-Processed  1    1             1              0    0
187       Processed      0    1             1              0    0
193       Processed      0    0             0              1    1
Has Transcriptional Evidence (intersects Gencode transcript)
From Deyou Z.
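The selection rule behind this table (keep pseudogenes hit in at least two ChIP-chip experiments) is easy to express in Python. The hit matrix below is copied from the table; the function name and the dictionary layout are illustrative choices, not something the source specifies:

```python
# Column order: E2F, H3K4me3 (0h), H3K4me3 (30h), Sp3, STAT1
hits = {
    13: (0, 1, 1, 0, 0),
    45: (0, 1, 1, 0, 0),
    47: (0, 1, 1, 0, 0),
    77: (1, 1, 1, 0, 0),
    126: (0, 1, 1, 0, 0),
    149: (1, 1, 1, 0, 0),
    174: (0, 1, 1, 0, 0),
    177: (1, 1, 1, 0, 0),
    187: (0, 1, 1, 0, 0),
    193: (0, 0, 0, 1, 1),
}


def with_min_hits(table, minimum=2):
    """IDs of pseudogenes hit in at least `minimum` experiments, sorted."""
    return sorted(pid for pid, row in table.items() if sum(row) >= minimum)


at_least_two = with_min_hits(hits)
at_least_three = with_min_hits(hits, minimum=3)
```

Raising the threshold to three leaves only the pseudogenes that carry an E2F hit in addition to both H3K4me3 time points.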
17
Example Pseudogene with Binding Hits (#177)
From Deyou Z.