Survey of Misannotations and Pseudogenes in the Arabidopsis Genome

Posted on 24-Jan-2016


Tanmay Prakash

Objectives

• Find possible misannotations
• Find possible pseudogenes

Why

• Misannotation can hinder research
• Pseudogenes can be used to study natural selection

Misannotations

Many misannotations arise when a gene prediction program labels part of a coding region as an intron because it contains a stop codon.

Gene structure: UTR | CDS | Intron | CDS | UTR

Pseudogenes

Pseudogenes are DNA sequences that no longer function but resemble the functional genes they once were. There are two types:
• Processed
• Non-processed

Common Properties of Pseudogenes
• Stop codons
• Frameshift mutations
• Lack of selective pressure

Example: deleting a single base shifts the reading frame and introduces stop codons (shown as "."):

agtacatgcataggactcgatcgactc   translates to   STCIGLDRL
agtacatgataggactcgatcgactc    translates to   ST..DSID
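The effect of the single-base deletion in the two example sequences above can be checked with a short script. This is a minimal sketch using the standard genetic code; the function name `translate` and the variable names are chosen here for illustration and do not come from the slides. Stop codons are printed as `*` rather than the `.` used on the slide.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 0; trailing bases that do not
    fill a complete codon are ignored. Stop codons are returned as '*'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


functional = "agtacatgcataggactcgatcgactc"  # sequence from the slide
disrupted = "agtacatgataggactcgatcgactc"    # same sequence with one 'c' deleted

print(translate(functional))
print(translate(disrupted))
```

The second translation contains two stop codons directly after the deletion and a different downstream amino-acid sequence, which are the two signatures of a pseudogene listed above.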

Pipeline

• Searches (BLAST, HMMER): query = protein domains; subjects = Arabidopsis introns and Arabidopsis CDS
• Outputs: genes matching in introns; genes matching in CDS
• Genes matching in both → possibly misannotated genes
• Possibly misannotated genes → check for stop codons and frameshifts, check Ka/Ks → possible pseudogenes
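The "genes matching in both" step of the pipeline amounts to intersecting two hit lists. The sketch below assumes each search has already been reduced to (gene, domain) pairs; that input format, the function name `possibly_misannotated`, and the gene and domain identifiers are all hypothetical and are not taken from the slides.

```python
def possibly_misannotated(intron_hits, cds_hits):
    """Return {gene_id: set of domain_ids} for genes where the same domain
    matches both an intron and the CDS.

    intron_hits, cds_hits: iterables of (gene_id, domain_id) pairs
    (an assumed, simplified representation of the search results).
    """
    shared = set(intron_hits) & set(cds_hits)
    genes = {}
    for gene, domain in shared:
        genes.setdefault(gene, set()).add(domain)
    return genes


# Made-up identifiers for illustration only.
intron_hits = [("gene1", "domX"), ("gene2", "domY")]
cds_hits = [("gene1", "domX"), ("gene2", "domZ"), ("gene3", "domX")]

candidates = possibly_misannotated(intron_hits, cds_hits)
print(candidates)
```

Here only gene1 is reported: gene2 matches different domains in its introns and CDS, and gene3 has no intron match at all.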

[Flowchart, misannotation steps only: BLAST and HMMER searches of protein domains (query) against Arabidopsis introns and CDS (subjects) → genes matching in introns, genes matching in exons → genes matching in both → possibly misannotated genes]

Results

346 genes (alternative gene models not counted) had matches to the same domain in both introns and exons.

299 genes (alternative gene models not counted) had matches to the same domain in an intron and its flanking exons. These are most likely misannotations.
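The stricter criterion, a domain matching an intron and its flanking exons, can be sketched as follows. The data model is an assumption made for illustration: each gene is a list of "exon"/"intron" labels in genomic order, and each domain maps to the set of feature positions it matches. "Flanking" is read here as the exons on both sides of the intron. The function name and identifiers are hypothetical.

```python
def flanked_intron_hits(features, hits_by_domain):
    """Find introns where one domain matches the intron and both neighbouring exons.

    features: list of "exon" / "intron" labels in genomic order.
    hits_by_domain: dict mapping domain_id -> set of feature indexes with a match.
    Returns a list of (domain_id, intron_index) pairs.
    """
    found = []
    for domain, hit in sorted(hits_by_domain.items()):
        for i in range(1, len(features) - 1):
            is_flanked_intron = (
                features[i] == "intron"
                and features[i - 1] == "exon"
                and features[i + 1] == "exon"
            )
            if is_flanked_intron and {i - 1, i, i + 1} <= hit:
                found.append((domain, i))
    return found


# Made-up gene structure and hits for illustration only.
features = ["exon", "intron", "exon", "intron", "exon"]
hits_by_domain = {"domX": {0, 1, 2}, "domY": {2, 3}}

flagged = flanked_intron_hits(features, hits_by_domain)
print(flagged)
```

Only domX is flagged: domY matches the second intron but only one of its neighbouring exons.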

The four domains with the most possible misannotations:

Domain       Possible Misannotations   #Domains
PF01657.7    16                        76
PF02902.8    15                        32
PF06721.1    13                        3
PF07734.2    15                        113

[Figure: Domain Family Size vs Misannotations. x-axis: Number of Domains in Family (0 to 3000); y-axis: Number of Misannotations (0 to 16)]

[Figure: Misannotation Frequency. x-axis: Number of Genes Matching Domain (0 to 10000); y-axis: Percentage Misannotation (0 to 0.6)]

[Figure: Domain Gene Frequency. x-axis: Number of Genes Matching Domain (0 to 10000); y-axis: Number of Misannotations (0 to 20)]

Future Research

• Identify pseudogenes by looking for stop codons and frameshift mutations in the introns and by checking the Ka/Ks value
• Use a more recent database of domains
• Follow the same process for the rice genome
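For reference, the Ka/Ks value mentioned above is conventionally defined as the ratio below. This is the standard textbook definition, not a formula taken from the slides, and the slides do not say which estimation method would be used.

```latex
\[
\frac{K_a}{K_s}
  = \frac{\text{nonsynonymous substitutions per nonsynonymous site}}
         {\text{synonymous substitutions per synonymous site}}
\]
```

A ratio near 1 is what neutral evolution predicts, which fits the lack of selective pressure expected of a pseudogene. A ratio well below 1 indicates purifying selection, as expected for a functional gene.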

Acknowledgement

Dr. Shin-Han Shiu
Dr. Kosuke Hanada
Dr. Melissa Lehti-Shiu
Dr. Gail Richmond
HSHSP
