A New Multiple Classifiers Soft Decisions Fusion Approach for Exons Prediction in DNA Sequences


Page 1: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Ismail M. El-Badawy, Ashraf M. Aziz, Senior Member, IEEE, Safa Gasser and Mohamed E. Khedr

Department of Electronics & Communications Engineering

Arab Academy for Science, Technology and Maritime Transport, Egypt

Presented by

Ismail M. El-Badawy

A New Multiple Classifiers Soft Decisions Fusion Approach for Exons Prediction in DNA Sequences

2013 IEEE International Conference on Signal and Image Processing Applications (ICSIPA 2013)

Page 2: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Outline

Introduction

DNA Structure

Predicting Exons Locations

Exons Prediction using DFT

Proposed Soft Decisions Fusion Approach

Performance Evaluation

Conclusion


Page 3: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Introduction

Digital signal processing (DSP) has proved successful in many fields, and bioinformatics is one of them.

Identification of protein-coding regions in DNA sequences is an important topic in biosignal processing and bioinformatics.

With the significant growth of sequenced genomic data, it has become important to develop computerized methods for predicting these protein-coding regions (exons) in DNA sequences.

Page 4: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

DNA Structure

DNA, or deoxyribonucleic acid, is the hereditary material in humans and almost all other organisms.

Page 5: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

DNA Structure

Organisms can be categorized into prokaryotes (e.g., bacteria) and eukaryotes (e.g., humans).

In both categories, DNA consists of genes separated by intergenic regions.

In eukaryotes, genes are further divided into protein-coding regions (exons) and non-coding regions (introns).

Page 6: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

DNA Structure

DNA is made up of nucleotides.

Nucleotides are identified by their four nitrogenous bases.

The bases pair up with each other, forming a double helix:

Adenine (A) pairs with Thymine (T)

Cytosine (C) pairs with Guanine (G)

The two DNA strands are complementary to each other.
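As a small illustration of strand complementarity, the following Python sketch derives the complementary strand from a given strand using the A-T and C-G pairing stated on this slide. The names `COMPLEMENT` and `complement` are hypothetical and chosen only for this example, and strand direction is ignored for simplicity.

```python
# Base pairing as stated on the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the complementary strand, base by base (direction ignored)."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


if __name__ == "__main__":
    print(complement("TCCGATCG"))
```

Applying `complement` twice returns the original strand, which is one way to see that each strand fully determines the other.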

Page 7: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

DNA Structure

DNA = Chain of nucleotides {A, C, G and T}.

This DNA chain (exons and introns) can be represented symbolically by a character string over a four-letter alphabet.

…TCCGATCGATCGATCTCTCTAGCGTCTACGCTATCATCGCTCTCTATTATCGCGCGATCGTCGATCGCGCGAGAGTATGCTACGTCGATCGAATTG…

Page 8: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

DNA Structure

Protein-coding regions (exons) are the portions of DNA that contain the information for producing proteins.

Page 9: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Predicting Exons Locations

Accurate prediction of exon locations in DNA sequences is important to biologists because exons are the information-bearing parts.

[Figure: block diagram. DNA sequence (TATTCCGATCGATCGATCTCTCTAGCGTCTACGCTATCATCGCTCTCTATTATCGCGCG…) -> Exons finder -> Exons locations]

Page 10: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Predicting Exons Locations

The order of the nucleotides stored in the exons spells out a code for protein synthesis.

Triplets of nucleotides (codons) in the exonic segments of DNA specify each type of amino acid according to the genetic code.

Each amino acid is encoded by one or more codons (many-to-one mapping).
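The many-to-one mapping from codons to amino acids can be illustrated with a short Python sketch. The table below is only a small excerpt of the standard genetic code (alanine, methionine and tryptophan), not the full 64-codon table, and the names `CODON_TO_AMINO_ACID`, `split_codons` and `translate` are hypothetical helpers introduced for this example.

```python
# Excerpt of the standard genetic code (DNA codons), for illustration only.
CODON_TO_AMINO_ACID = {
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",  # four codons, one amino acid
    "ATG": "Met",
    "TGG": "Trp",
}


def split_codons(exon):
    """Cut a sequence into consecutive triplets, dropping any incomplete tail."""
    usable = len(exon) - len(exon) % 3
    return [exon[i:i + 3] for i in range(0, usable, 3)]


def translate(exon):
    """Map each codon to its amino acid; codons outside the excerpt become '?'."""
    return [CODON_TO_AMINO_ACID.get(codon, "?") for codon in split_codons(exon)]


if __name__ == "__main__":
    print(translate("ATGGCTGCGTGG"))
```

Here `GCT` and `GCG` both translate to alanine, which is the many-to-one behaviour the slide refers to.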

Page 11: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Predicting Exons Locations

Previous publications have shown that exonic regions exhibit a period-3 property due to the codon structure and the non-uniform usage of codons in exonic regions.

This periodicity is absent outside the exonic segments.

……… ACGTATTCCGATCGA …………… GACTCTAGCGTCTAC ………

Page 12: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Predicting Exons Locations

Predicting exon locations with a digital signal processing (DSP) tool involves three main steps:

1. Symbolic-to-numeric mapping

2. Tracking the strength of the period-3 component using a DSP tool

3. Decision making

[Figure: block diagram. DNA sequence (…TATTCCGATCGATCGATCTCTCTAGCGTCTACGCTATCATCGCTCTCTATTATCGCGCG…) -> the three steps above -> Exons locations]

Page 13: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Exons Prediction using DFT

The sliding-window DFT is one of several DSP methods previously proposed in the field of exons prediction based on DNA spectral analysis.

[Figure: block diagram. DNA sequence (…TATTCCGATCGATCGATCTCTCTAGCGTCTACGCTATCATCGCTCTCTATTATCGCGCG…) -> Numerical Mapping -> X[n] -> Sliding Window DFT -> S[L/3] -> Exons locations]

Page 14: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Exons Prediction using DFT

Calculating the power spectrum of a windowed DNA numerical sequence at k = L/3 is sufficient, since it is expected to be large in exonic regions and small elsewhere.

[Figure: block diagram. DNA sequence -> Numerical Mapping -> X[n] -> Sliding Window DFT -> S[L/3]]
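A minimal Python sketch of this step follows. It evaluates only the DFT bin k = L/3 of each length-L window, where the DFT kernel exp(-j2πkn/L) reduces to exp(-j2πn/3), and returns the squared magnitude. The function name `period3_power`, the one-sample window step, the rectangular window and the indexing of results by window start are assumptions made for illustration; the slides do not specify these details.

```python
import cmath


def period3_power(x, L):
    """Squared magnitude of DFT bin k = L/3 for every length-L window of x.

    x may hold real or complex samples. L is assumed to be a multiple of 3
    so that k = L/3 is an integer bin. Result i belongs to the window that
    starts at sample i (an assumption of this sketch).
    """
    if L % 3 != 0:
        raise ValueError("L must be a multiple of 3")
    # At k = L/3 the kernel exp(-j*2*pi*k*n/L) becomes exp(-j*2*pi*n/3).
    kernel = [cmath.exp(-2j * cmath.pi * n / 3) for n in range(L)]
    powers = []
    for start in range(len(x) - L + 1):
        acc = sum(x[start + n] * kernel[n] for n in range(L))
        powers.append(abs(acc) ** 2)
    return powers


if __name__ == "__main__":
    periodic = [1, 0, 0] * 10      # strong period-3 component
    flat = [1] * 30                # no period-3 component
    print(period3_power(periodic, 9)[0], period3_power(flat, 9)[0])
```

The toy signals show the intended contrast: a sequence that repeats every three samples gives a large value, while a constant sequence gives a value near zero.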

Page 15: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Exons Prediction using DFT

A hard decision (exonic or intronic) is made for each nucleotide according to whether the corresponding S[L/3] value is above or below a decision threshold.

[Figure: S[L/3] -> threshold comparison -> Exons locations]
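The hard-decision rule can be sketched in a few lines of Python. The function name `hard_decisions` and the convention that a value exactly equal to the threshold counts as intronic are assumptions of this example.

```python
def hard_decisions(s_values, threshold):
    """Label each position 1 (exonic) if its S[L/3] value is above the threshold, else 0."""
    return [1 if s > threshold else 0 for s in s_values]


if __name__ == "__main__":
    print(hard_decisions([0.1, 0.6, 0.4, 0.9], threshold=0.5))
```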

Page 16: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Exons Prediction using DFT

In our work, we selected two symbolic-to-numeric mapping schemes from among those that previously showed reasonable performance.

Nucleotide     EIIP     CIS
Adenine (A)    0.1260   1
Cytosine (C)   0.1340   -j
Guanine (G)    0.0806   -1
Thymine (T)    0.1335   j

[Figure: DNA sequence (…TATTCCGATCGATCGATCTCTCTAGCGTCTACGCTATCATCGCTCTCTATTATCGCGCG…) -> Numerical Mapping -> X[n]]
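The two mapping schemes can be written down directly as lookup tables. The sketch below uses the values from the table on this slide; the names `EIIP`, `CIS` and `to_numeric` are hypothetical and chosen for this example only.

```python
# Values copied from the slide's table. CIS assigns a complex number to each base.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}
CIS = {"A": 1 + 0j, "C": -1j, "G": -1 + 0j, "T": 1j}


def to_numeric(sequence, table):
    """Convert a DNA character string into the numerical sequence X[n]."""
    return [table[base] for base in sequence.upper()]


if __name__ == "__main__":
    print(to_numeric("TATTCC", EIIP))
    print(to_numeric("TATTCC", CIS))
```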

Page 17: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Exons Prediction using DFT

[Figure: the same DNA sequence feeds two branches, EIIP Mapping -> X[n] -> Sliding Window DFT -> S[L/3] and CIS Mapping -> X[n] -> Sliding Window DFT -> S[L/3]. Each branch is shown with a plot of S[L/3] (0 to 1) versus nucleotide position (0 to 8000).]

Gene F56F11.4 contains five exons.

Page 18: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Exons Prediction using DFT

Each mapping scheme pronounces the peaks in some exonic segments better than the other scheme does.

The peaks in the exonic segments are not always consistently large, while those in the intronic segments are not always consistently low.

[Figure: two plots of S[L/3] (0 to 1) versus nucleotide position (0 to 8000), one per mapping scheme.]

Gene F56F11.4 contains five exons.

Page 19: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Proposed Soft Decisions Fusion Approach

[Figure: block diagram. The same DNA sequence feeds two branches, EIIP Mapping -> X[n] -> Sliding Window DFT -> S[L/3] and CIS Mapping -> X[n] -> Sliding Window DFT -> S[L/3]. Each branch outputs soft decisions.]

Page 20: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Proposed Soft Decisions Fusion Approach


Hard Decision (0 or 1)

Soft Decision (0 to 1)

Page 21: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Proposed Soft Decisions Fusion Approach

Each nucleotide belongs to the exonic regions with a partial membership value (i.e., the possibility of being an exonic nucleotide).

[Figure: plot with axis S[L/3]]
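The transcript does not preserve how the membership value is computed from S[L/3]. Purely as a stand-in, the Python sketch below assumes a min-max normalization that rescales the S[L/3] values of a sequence to the range 0 to 1; this is an assumption of the example, not necessarily the membership function used by the authors, and the name `soft_decisions` is hypothetical.

```python
def soft_decisions(s_values):
    """Rescale S[L/3] values to [0, 1] (assumed min-max membership function)."""
    lowest, highest = min(s_values), max(s_values)
    if highest == lowest:
        # A flat profile carries no evidence either way in this sketch.
        return [0.0] * len(s_values)
    return [(s - lowest) / (highest - lowest) for s in s_values]


if __name__ == "__main__":
    print(soft_decisions([2.0, 4.0, 6.0]))
```

Whatever membership function is used, the key property is that the output keeps graded information (0 to 1) rather than collapsing it to 0 or 1.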

Page 22: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Proposed Soft Decisions Fusion Approach

[Figure: block diagram. The same DNA sequence feeds two branches, EIIP Mapping -> X[n] -> Sliding Window DFT -> S[L/3] and CIS Mapping -> X[n] -> Sliding Window DFT -> S[L/3]. Both branches feed the DFC, which outputs the exons locations.]

Page 23: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Proposed Soft Decisions Fusion Approach

The DFC averages the two local soft decisions.

If the average exceeds 0.5 (i.e., the average possibility of being an exonic nucleotide exceeds 50%), the final decision is ‘1’; otherwise it is ‘0’.

The combined decision is more reliable than a hard decision that depends on only one classifier.

[Figure: Soft decisions -> DFC -> Exons locations]
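The fusion rule described on this slide can be sketched in Python as follows. The function name `fuse` is hypothetical; the averaging and the 0.5 threshold follow the slide text.

```python
def fuse(soft_a, soft_b, threshold=0.5):
    """Average two soft-decision sequences and output 1 where the average exceeds the threshold."""
    if len(soft_a) != len(soft_b):
        raise ValueError("both classifiers must cover the same nucleotides")
    return [1 if (a + b) / 2 > threshold else 0 for a, b in zip(soft_a, soft_b)]


if __name__ == "__main__":
    eiip_soft = [0.9, 0.2, 0.6, 0.5]
    cis_soft = [0.3, 0.4, 0.7, 0.5]
    print(fuse(eiip_soft, cis_soft))
```

In the first position one classifier alone (0.3) would have rejected the nucleotide at a 0.5 threshold, but the average (0.6) accepts it, which illustrates how the second classifier can change the outcome.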

Page 24: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Performance Evaluation Metrics

[Figure: outcomes of a prediction decision. A positive decision is either a true positive or a false positive; a negative decision is either a true negative or a false negative.]

Page 25: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Performance Evaluation Metrics

[Figure: outcomes of a prediction decision. A positive decision is either a true positive or a false positive; a negative decision is either a true negative or a false negative.]

Page 26: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Performance Evaluation Metrics


Page 27: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Performance Evaluation Metrics


Page 28: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Performance Evaluation Metrics

Area under the ROC curve (AUC) is a good indicator.

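For reference, the AUC can be computed by sweeping the decision threshold, collecting (false positive rate, true positive rate) points, and integrating with the trapezoidal rule. The Python sketch below does this under the usual definitions; the function names `roc_points` and `auc`, and the choice to treat a score equal to the threshold as positive, are assumptions of this example.

```python
def roc_points(truth, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low.

    truth holds 1 (exonic) or 0 (non-exonic); both classes must be present.
    """
    positives = sum(truth)
    negatives = len(truth) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(truth, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(truth, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Trapezoidal area under a list of (FPR, TPR) points ordered by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    print(auc(roc_points([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])))
```

An AUC of 1.0 means every exonic position scores higher than every non-exonic one, while 0.5 corresponds to chance level.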

Page 29: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Performance Evaluation Metrics

F-measure versus decision threshold is also a good indicator.
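The formulas on the metric slides were not preserved in this transcript. The Python sketch below assumes the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F-measure as their harmonic mean. The names `confusion_counts`, `f_measure` and `best_threshold` are hypothetical helpers for this example.

```python
def confusion_counts(truth, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f_measure(truth, predicted):
    """Harmonic mean of precision and recall (0.0 when there are no true positives)."""
    tp, fp, _tn, fn = confusion_counts(truth, predicted)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(truth, soft, thresholds):
    """Return (maximum F-measure, threshold) over the candidate thresholds."""
    return max(
        (f_measure(truth, [1 if s > th else 0 for s in soft]), th)
        for th in thresholds
    )


if __name__ == "__main__":
    print(best_threshold([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.1], [0.2, 0.5, 0.8]))
```

Sweeping the threshold and recording the F-measure at each value gives the F-measure versus decision threshold curve mentioned on the slide.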

Page 30: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Performance Evaluation

The MATLAB simulation is conducted on real data (the HMR195 dataset), which is available online.

It contains 195 mammalian sequences consisting of 43 single-exon and 152 multi-exon genes.

The traditional and proposed approaches are simulated using different window shapes with a constant length (L = 351), as reported in previous publications.
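As an example of a window shape, the Python sketch below generates a Bartlett (triangular) window using the common definition w[n] = 1 - |2n/(L-1) - 1|, which is zero at both ends and one at the centre. The function name `bartlett` is chosen for this example, and how exactly the authors applied the window in their MATLAB simulation is not stated in the slides; multiplying each length-L segment by the window before the DFT is the usual approach.

```python
def bartlett(L):
    """Bartlett (triangular) window of length L with zero-valued end points."""
    if L < 1:
        raise ValueError("L must be positive")
    if L == 1:
        return [1.0]
    return [1 - abs(2 * n / (L - 1) - 1) for n in range(L)]


def apply_window(segment, window):
    """Multiply a length-L segment by the window, sample by sample."""
    return [x * w for x, w in zip(segment, window)]


if __name__ == "__main__":
    w = bartlett(351)              # window length used in the slides
    print(w[0], w[175], w[350])    # ends and centre of the window
```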

Page 31: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Performance Evaluation

AUC values for the HMR195 dataset, and ROC curves plotted for the Bartlett window.

Window shape   Single classifier (EIIP)   Single classifier (CIS)   Multiple classifier
Rectangular    0.7280                     0.7398                    0.7862
Nuttall        0.7264                     0.7439                    0.7972
Parzen         0.7281                     0.7457                    0.7989
Bohman         0.7314                     0.7490                    0.8021
Blackman       0.7331                     0.7504                    0.8035
Hanning        0.7387                     0.7553                    0.8079
Hamming        0.7425                     0.7580                    0.8106
Bartlett       0.7438                     0.7589                    0.8115

Page 32: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Performance Evaluation

Percentage of exonic nucleotides detected as true positives:

Numerical scheme used by the classifier   Number of classifiers   At 10% FPR   At 20% FPR   At 30% FPR
EIIP                                      1                       43.5         56.9         66.4
CIS                                       1                       46.8         59.9         68.7
Both                                      2                       54.1         67.3         76.0

At 10% FPR, the multiple-classifier approach improves detection by 24.4% over the single classifier using EIIP and by 15.6% over the single classifier using CIS.

Page 33: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Performance Evaluation

Maximum F-measures achieved and corresponding decision thresholds for the HMR195 dataset.

                     Single classifier (EIIP)   Single classifier (CIS)   Multiple classifier
Maximum F-measure    0.4287                     0.4562                    0.5086
Decision threshold   0.029                      0.048                     0.037

The maximum F-measure improves by 18.6% over the single classifier using EIIP and by 11.5% over the single classifier using CIS.
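The quoted percentages are relative gains, i.e. (fused - single) / single. The short Python check below reproduces them from the tabulated values for the true-positive rate at 10% FPR and for the maximum F-measure; the function name `relative_gain_percent` is hypothetical.

```python
def relative_gain_percent(fused, single):
    """Relative improvement of the fused result over a single classifier, in percent."""
    return 100 * (fused - single) / single


if __name__ == "__main__":
    # True positives at 10% FPR (values from the table on the previous slide).
    print(round(relative_gain_percent(54.1, 43.5), 1))      # over EIIP
    print(round(relative_gain_percent(54.1, 46.8), 1))      # over CIS
    # Maximum F-measure (values from the table on this slide).
    print(round(relative_gain_percent(0.5086, 0.4287), 1))  # over EIIP
    print(round(relative_gain_percent(0.5086, 0.4562), 1))  # over CIS
```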

Page 34: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Conclusion

In our work, a new multiple DFT-based classifiers approach for exons prediction has been proposed.

Making soft decisions instead of hard decisions and depending on two classifiers instead of one helps in making more reliable decisions.

The prediction accuracy is enhanced at the expense of increased computational time and complexity.

Although the proposed approach has been analyzed for only two classifiers for simplicity, it can easily be extended to more than two classifiers.

Page 35: A new multiple classifiers soft decisions fusion approach for exons prediction in dna sequences

Thank You