MicroRNA identification based on sequence and structure alignment

Post on 02-Jan-2016

Page 1

MicroRNA identification based on sequence and structure alignment

Presented by - Neeta Jain

Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong Zhang and Yanda Li

Page 2

Outline

Introduction
Motivation
Experiment
    Materials
    Methods
Results
Conclusion

Page 3

Introduction

What are miRNAs and why are they important?

miRNAs are ~22 nt long non-coding RNAs.

They are derived from ~70 nt precursors, which typically have a hairpin structure.

Importance of miRNAs:

They are found to regulate the expression of target genes via complementary base pair interactions.

Page 4

Motivation

Since miRNAs are short (~22 nt), conventional sequence alignment methods can only find relatively close homologues.

It has been reported that miRNA genes are more conserved in their secondary structure than in their primary structure.

This paper exploits this secondary structure conservation and proposes a novel computational approach to detect miRNAs based on both sequence and structure alignment.

The authors devised a tool, miRAlign, and compared its performance with existing search methods such as BLAST and ERPIN.

Page 5

Experiment

Materials

Reference sets

The reference set consists of 1298 miRNAs from 12 species, of which 1054 were animal miRNAs.

The 1054 animal miRNAs and their precursors (1104) composed the raw training set Train_All.

Train_Sub_1: all animal miRNAs except those from C. briggsae

Train_Sub_2: all animal miRNAs except those from C. briggsae and C. elegans

Genomic sequences

Sequences of 6 species were used.

Page 6

Methods

Preprocessing

Known precursors from the training set are used as BLAST queries against the genome.

Potential regions are cut from the genome with 70 nt of flanking sequence at each end.

These regions are scanned using a 100 nt window with a 10 nt step.

Sequences that overlap repeat sequences are discarded.
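The window scan can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sliding_windows` is hypothetical, and dropping a trailing partial window is an assumption the slide does not address.

```python
from typing import Iterator, Tuple


def sliding_windows(region: str, window: int = 100, step: int = 10) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) pairs for a fixed-size window moved by `step`.

    Assumption: a region no longer than the window is returned whole, and a
    trailing stretch shorter than one step is not covered by an extra window.
    """
    if len(region) <= window:
        yield 0, region
        return
    for start in range(0, len(region) - window + 1, step):
        yield start, region[start:start + window]


# A 100 nt BLAST hit with 70 nt of flank on each side gives a 240 nt region.
toy_region = "ACGT" * 60
windows = list(sliding_windows(toy_region))
print(len(windows), windows[-1][0])
```

Each window would then go on to structure prediction as a candidate sequence.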

Page 7

Methods (contd)

miRAlign

Secondary structure prediction

Both the candidate sequence and its reverse complement are analyzed by RNAfold to predict hairpins.

Only hairpins with an MFE lower than -20 kcal/mol are retained.
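A small sketch of the two operations named here: taking the reverse complement and applying the MFE cutoff. It assumes folding has already been done by RNAfold in a separate step; the tuple layout `(name, dot_bracket, mfe)` and both function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Complement table for DNA, including N for unknown bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def keep_stable_hairpins(
    folds: Iterable[Tuple[str, str, float]], max_mfe: float = -20.0
) -> List[Tuple[str, str, float]]:
    """Keep only folds whose MFE (kcal/mol) is strictly lower than max_mfe."""
    return [fold for fold in folds if fold[2] < max_mfe]


# Toy fold results; the names, structures, and energies are made up for illustration.
toy_folds = [
    ("cand_fwd", "((((....))))", -25.3),
    ("cand_rev", "((((....))))", -20.0),
    ("cand_weak", "((......))..", -8.1),
]
stable = keep_stable_hairpins(toy_folds)
print(reverse_complement("AACG"), [name for name, _, _ in stable])
```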

Pairwise sequence alignment

Sequences from the previous step are aligned pairwise to all the ~22 nt known miRNA sequences from the training set.

The sequence similarity score between the candidate and each known mature miRNA is calculated by CLUSTALW.

If the score exceeds a user-defined threshold, the candidate/known-miRNA pair is kept for further analysis.
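The thresholding step can be illustrated as below. Note the substitution: the score here is a simple ungapped percent identity, used only as a stand-in for the CLUSTALW pairwise score, and the threshold value, function names, and toy sequences are all hypothetical.

```python
from typing import Dict, List


def best_ungapped_identity(candidate: str, mirna: str) -> float:
    """Best percent identity of `mirna` against any same-length slice of `candidate`.

    Stand-in for the CLUSTALW score: no gaps and no substitution matrix.
    """
    n = len(mirna)
    best = 0.0
    for i in range(len(candidate) - n + 1):
        matches = sum(a == b for a, b in zip(candidate[i:i + n], mirna))
        best = max(best, 100.0 * matches / n)
    return best


def candidate_pairs(candidate: str, known: Dict[str, str], threshold: float = 70.0) -> List[str]:
    """Names of known miRNAs whose score against the candidate exceeds the threshold."""
    return [name for name, seq in known.items()
            if best_ungapped_identity(candidate, seq) > threshold]


# Toy data, not real miRNA sequences.
known_toy = {"toy-1": "ACGUACGUAC", "toy-2": "UUUUUUUUUU"}
candidate_toy = "GGG" + "ACGUACGUAC" + "CCC"
kept = candidate_pairs(candidate_toy, known_toy)
print(kept)
```

Only the pairs returned here would go on to the position check and structure alignment.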

Page 8

Methods (contd)

Checking the miRNA's position on the stem-loop

Three properties of the miRNA's position are considered:

It should not be located on the terminal loop of the hairpin.
It should be located on the same arm of the hairpin as its known homologue.
Its position on the hairpin should not differ too much from that of its known homologues.
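The first two checks can be sketched on a dot-bracket structure string. This is an illustrative sketch only: it assumes a simple, unbranched hairpin, half-open coordinates `[start, end)`, and a hypothetical helper name `hairpin_arm`. The third check is left out because it depends on the position-difference measure.

```python
from typing import Optional


def hairpin_arm(structure: str, start: int, end: int) -> Optional[str]:
    """Classify a mature region [start, end) on a simple hairpin.

    Returns "5p" or "3p" if the region lies wholly on one arm, or None if it
    overlaps the terminal loop. Assumes a single unbranched stem-loop, so the
    last "(" comes before the first ")".
    """
    loop_start = structure.rfind("(") + 1  # first position after the 5' arm
    loop_end = structure.find(")")         # first position of the 3' arm
    if end <= loop_start:
        return "5p"
    if start >= loop_end:
        return "3p"
    return None


def same_arm(structure_a: str, region_a: tuple, structure_b: str, region_b: tuple) -> bool:
    """True if both regions avoid the terminal loop and sit on the same arm."""
    arm_a = hairpin_arm(structure_a, *region_a)
    arm_b = hairpin_arm(structure_b, *region_b)
    return arm_a is not None and arm_a == arm_b


toy = "((((((....))))))"  # 6 paired, 4 loop, 6 paired
print(hairpin_arm(toy, 0, 5), hairpin_arm(toy, 11, 16), hairpin_arm(toy, 4, 9))
```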

Position difference of miRNA on precursors A and B:

Page 9

Methods (contd)

RNA secondary structure alignment

RNAforester computes a pairwise structure alignment and gives a similarity score.

The score is the sum of the scores of all base (or base-pair) matches, insertions, and deletions.

The normalized similarity score of structures C and m is given as:

where,

C – Candidate sequence ; m – known pre-miRNA;

sigma_local(C,m) – raw local alignment score between C and m

Sigma(m,m) – self-alignment score of m
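One reading consistent with the two quantities defined above is a plain ratio of the local alignment score to the self-alignment score. This is an assumption, not a formula taken from the slide, and the name score_norm is hypothetical:

```latex
\mathrm{score}_{\mathrm{norm}}(C, m) = \frac{\sigma_{\mathrm{local}}(C, m)}{\sigma(m, m)}
```

Under that assumption, a candidate that aligns to m as well as m aligns to itself would reach the maximum normalized value.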

Page 10

Methods (contd)

Total similarity score

After aligning all potential homologue pairs, a total similarity score (tss) is assigned to each candidate sequence.

Where,

C- candidate sequence ; R – set composed of all C’s

Page 11

Methods (contd)

Summary -

Page 12

Results

Application on C. briggsae

Detection of miRNA homologues -

miRAlign was applied to the C. briggsae data with training set Train_Sub_1, and sensitivity and specificity were recorded.

Identification of miRNAs in distantly related species -

miRAlign was applied to the C. briggsae data with training set Train_Sub_2 (which also excludes C. elegans), and sensitivity and specificity were recorded.

Page 13

Results (contd)

Graph 1 -

Page 14

Results (contd)

Graph 2 -

Page 15

Results (contd)

Comparison of miRAlign with BLAST -

Page 16

Results (contd)

Comparison of miRAlign with ERPIN -

Page 17

Results (contd)

Other results:

miRAlign was applied to A. gambiae, and 59 putative miRNAs with tss > 35 were detected. This was validated when 38 A. gambiae miRNAs were reported in the MicroRNA Registry 6.0, 37 of which were covered by miRAlign.

miRAlign was also applied to a plant, Zea mays, and detected 28 of the 40 known Zea mays miRNAs.
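As a quick worked check of the counts on this slide, the fraction of known miRNAs recovered can be computed directly. Treating "known miRNAs recovered / known miRNAs" as sensitivity is an interpretation of these counts, and the helper name is hypothetical.

```python
def sensitivity(found: int, known: int) -> float:
    """Fraction of known miRNAs that were recovered."""
    return found / known


# (recovered, known) counts as reported on the slide.
reported = {"A. gambiae": (37, 38), "Zea mays": (28, 40)}

for species, (found, known) in reported.items():
    print(f"{species}: {found}/{known} = {sensitivity(found, known):.1%}")
```

This gives about 97.4% for A. gambiae and 70.0% for Zea mays.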

Page 18

Conclusion

By combining sequence and structure alignments, miRAlign performs better than previously reported homologue search methods.

Although miRAlign was based on animal data, the miRNAs predicted in Zea mays indicate that it can be applied to plants. Further investigation of this is underway.

Page 19

THANK YOU

Questions ??