Feature Selection using Complementary Particle Swarm Optimization for DNA Microarray Data


Uploaded by mjsky, posted on 20-Aug-2015


National Kaohsiung University of Applied Sciences

Bioinformatics Lab.

Reporter: Hua-Fang Chang

Feature Selection using Complementary Particle Swarm Optimization for DNA Microarray Data

Outline

• Introduction
• Method
• Result and Discussion
• Conclusion

Introduction (1/4)

• DNA microarray
– DNA microarray technology helps biologists analyze various disease types; it has been widely used to identify DNA types and cell types and to classify cancers.
– DNA microarray data are usually large and complex.
– Feature selection is applied to select the informative dimensions (genes) of the DNA data.
– Feature selection chooses a subset of features from the dataset, and a classifier is used to evaluate that subset.

Height  Weight  Age  Interest
166     50      23   Soccer
156     54      65   Basketball
189     90      22   Soccer
177     68      24   Soccer
165     54      63   Basketball
156     45      50   Tennis
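One way to picture a candidate feature subset is as a binary mask over the feature columns, one bit per feature, which is the kind of yes/no encoding a binary particle carries. The minimal Python sketch below applies such a mask to the example table. Treating Height, Weight, and Age as features and Interest as the class label, and the name `select_features`, are assumptions for illustration only.

```python
# Example table from the slide. Assumption: Height, Weight and Age are the
# candidate features and Interest (last column) is the class label.
FEATURES = ["Height", "Weight", "Age"]
ROWS = [
    (166, 50, 23, "Soccer"),
    (156, 54, 65, "Basketball"),
    (189, 90, 22, "Soccer"),
    (177, 68, 24, "Soccer"),
    (165, 54, 63, "Basketball"),
    (156, 45, 50, "Tennis"),
]


def select_features(rows, mask):
    """Keep the feature columns whose mask bit is 1; the label stays last."""
    if len(mask) != len(FEATURES):
        raise ValueError("mask needs one bit per feature")
    return [
        tuple(value for value, bit in zip(row[:-1], mask) if bit) + (row[-1],)
        for row in rows
    ]


# Hypothetical subset {Age}: mask bits follow the FEATURES order.
subset = select_features(ROWS, [0, 0, 1])
print(subset[0])  # (23, 'Soccer')
```

A classifier would then be trained and scored on the reduced rows; the score of the subset is what the search algorithm tries to improve.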

Introduction (2/4)

• Algorithms
– Many computational algorithms have been applied to DNA microarray data.
› Genetic Algorithm (GA), 1975
Each candidate solution has a set of properties (its chromosomes or genotype) that can be mutated and altered.
› Particle Swarm Optimization (PSO), 1995
PSO is a population-based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by the social behavior of bird flocking and fish schooling.
› Binary Particle Swarm Optimization (BPSO), 2005
In this model, a particle decides between "yes" and "no", "true" and "false", or "include" and "not include"; these binary values can also represent a real value in a binary search space.

Introduction (3/4)

› Complementary Particle Swarm Optimization:
The complementary strategy aims to improve the particles' search ability by helping particles escape from a local optimum: their positions are moved to a new region of the search space.

› K-Nearest Neighbor:
The K-Nearest Neighbor (KNN) method is used as the classifier that evaluates the selected features.

› Leave-one-out cross-validation:
Leave-one-out cross-validation (LOOCV) is used to compute the classification error rates.

Introduction (4/4)

– We propose Complementary Particle Swarm Optimization (CPSO) for feature selection on DNA microarray data.
– In standard PSO, particles may become trapped in a local optimum because of premature convergence.
– Therefore, we use a complementary strategy that moves particles to a new region of the search space so that they are not trapped in a local optimum.

Method (1/9)

• Complementary Particle Swarm Optimization (CPSO)
– PSO was developed by simulating the social behavior of organisms, such as birds in a flock or fish in a school.
– Each particle is influenced by its own past experience and by the behavior of the swarm.
– PSO has been successfully applied in many research areas, where it has produced results efficiently and at a lower cost than other methods.
– However, standard PSO operates on continuous positions, so it is not directly suitable for optimization problems in a discrete feature space.
– We propose CPSO to overcome this problem.

Method (2/9)

• Complementary strategy
– The complementary strategy aims to improve the particles' search ability by helping particles escape from a local optimum: their positions are moved to a new region of the search space.
– We use a complementary function to generate new particles, which replace 50% of the particles in the swarm.
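The replacement step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: positions are bit strings, the complementary function flips every bit (as in the coordinate example of Method (9/9)), and the worse-scoring half of the swarm is the half that gets complemented. The slides do not say which half is replaced, and the names `complement` and `apply_complementary` are hypothetical.

```python
from typing import Callable, List

Bits = List[int]


def complement(position: Bits) -> Bits:
    """Complementary function: flip every bit (0 -> 1, 1 -> 0)."""
    return [1 - b for b in position]


def apply_complementary(swarm: List[Bits],
                        fitness: Callable[[Bits], float]) -> List[Bits]:
    """Replace 50% of the swarm with complemented particles.

    Assumption: higher fitness is better, and the worse half is the half
    that is moved to its complementary position.
    """
    ranked = sorted(swarm, key=fitness, reverse=True)  # best particles first
    n_replace = len(ranked) // 2                        # 50% of the swarm
    kept = ranked[:len(ranked) - n_replace]
    moved = [complement(p) for p in ranked[len(ranked) - n_replace:]]
    return kept + moved


# Toy usage with a hypothetical fitness: the number of selected features.
swarm = [[1, 1, 0], [0, 0, 0], [1, 0, 0], [1, 1, 1]]
print(apply_complementary(swarm, fitness=sum))
# [[1, 1, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]]
```

Complementing, rather than re-randomizing, sends each replaced particle to the opposite corner of the binary search space, which is one reading of "moving their position to a new region".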

Method (3/9)

• Initialization
– Randomly initialize the particle swarm (50 particles).
– Adjust the positions of the particle swarm.
– Evaluate the fitness of the particle swarm.
– Number of iterations = 100.

Method (4/9)

• Particle update
– In CPSO, each particle is updated according to the following equations:

$$v_{id}^{new} = w \cdot v_{id}^{old} + c_1 r_1 \left(pbest_{id} - x_{id}^{old}\right) + c_2 r_2 \left(gbest_{d} - x_{id}^{old}\right)$$

$$x_{id}^{new} = x_{id}^{old} + v_{id}^{new}$$

• w is the inertia weight, which controls the impact of a particle's previous velocity.
• c1 and c2 are acceleration constants that control the distance a particle moves in each generation.
• r1 and r2 are two random numbers in [0, 1].
• $v_{id}^{new}$ and $v_{id}^{old}$ represent the new and old velocities of the particle, respectively.
• $x_{id}^{old}$ and $x_{id}^{new}$ denote the position of the current particle and of the updated particle, respectively.
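The two update equations translate directly into code. This minimal Python sketch updates one particle i across all dimensions d, using real-valued positions exactly as the equations are written; how positions are mapped to selected genes is not shown on this slide. The defaults c1 = c2 = 2.0 and the name `update_particle` are assumptions.

```python
import random
from typing import List, Optional, Tuple


def update_particle(x_old: List[float], v_old: List[float],
                    pbest: List[float], gbest: List[float],
                    w: float, c1: float = 2.0, c2: float = 2.0,
                    rng: Optional[random.Random] = None
                    ) -> Tuple[List[float], List[float]]:
    """Apply the velocity and position update equations to one particle."""
    rng = rng or random.Random()
    x_new, v_new = [], []
    for d in range(len(x_old)):
        r1, r2 = rng.random(), rng.random()  # uniform random numbers in [0, 1)
        v = (w * v_old[d]
             + c1 * r1 * (pbest[d] - x_old[d])   # pull toward own best
             + c2 * r2 * (gbest[d] - x_old[d]))  # pull toward swarm best
        v_new.append(v)
        x_new.append(x_old[d] + v)               # x_new = x_old + v_new
    return x_new, v_new


# Usage: a particle sitting on both pbest and gbest only keeps w * v_old.
x_new, v_new = update_particle([1.0, 2.0], [0.5, -1.0], [1.0, 2.0], [1.0, 2.0], w=0.5)
print(x_new, v_new)  # [1.25, 1.5] [0.25, -0.5]
```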

Method (5/9)

• The positions $x_{id}^{old}$ and $x_{id}^{new}$ are the current position (solution) and the updated particle position. We use the linearly decreasing weight (LDW) strategy to update the inertia weight w:

$$w_{LDW} = (w_{max} - w_{min}) \times \frac{Iteration_{max} - Iteration_i}{Iteration_{max}} + w_{min}$$

• $w_{max}$ and $w_{min}$ are 0.9 and 0.4, respectively. $Iteration_{max}$ and $Iteration_i$ are the maximum number of iterations and the current iteration number, respectively. With this function, the inertia weight w decreases linearly from 0.9 to 0.4 over the iterations.
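The LDW schedule is a one-line function. A minimal Python sketch with the slide's values (w_max = 0.9, w_min = 0.4) follows; the name `ldw_inertia` is hypothetical. With 100 iterations, w starts at 0.9, is 0.65 halfway, and reaches 0.4 at the last iteration.

```python
def ldw_inertia(iteration: int, max_iteration: int,
                w_max: float = 0.9, w_min: float = 0.4) -> float:
    """Linearly decreasing inertia weight: w_max at iteration 0, w_min at the end."""
    if max_iteration <= 0 or not 0 <= iteration <= max_iteration:
        raise ValueError("need max_iteration > 0 and 0 <= iteration <= max_iteration")
    return (w_max - w_min) * (max_iteration - iteration) / max_iteration + w_min


# With the slide's setting of 100 iterations, print w at a few points.
for i in (0, 50, 100):
    print(i, round(ldw_inertia(i, 100), 2))
```

A large early weight favors exploration of the search space, and the smaller late weight lets particles settle around good solutions.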

Method (6/9)

• Complementary Particle Swarm Optimization flowchart
– Start: initialize the particle swarm with random positions (x) and velocities (v).
– Compute the fitness of each particle.
– Evaluate the positions (x).
– Sort the results.
– Is the complementary condition reached? YES: apply the complementary strategy; NO: continue.
– Is the termination condition reached? YES: end; NO: return to computing the fitness.
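The flowchart can be sketched end to end in Python. This is an illustrative sketch, not the authors' implementation, and it rests on several assumptions the slides do not state: positions are bit strings (1 = feature selected) obtained from velocities with the standard BPSO sigmoid rule, velocities are clamped to ±vmax = 4, c1 = c2 = 2, the complementary condition is "gbest has not improved for three iterations", and the worse half of the swarm is complemented. The swarm size (50), iteration count (100), and LDW weights (0.9 to 0.4) come from the slides; `cpso` and the toy fitness `match` are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Bits = List[int]


def cpso(fitness: Callable[[Bits], float], n_features: int,
         n_particles: int = 50, iterations: int = 100,
         c1: float = 2.0, c2: float = 2.0, vmax: float = 4.0,
         stall_limit: int = 3, seed: int = 0) -> Tuple[Bits, float]:
    """Sketch of the CPSO flowchart for binary feature masks (higher fitness = better)."""
    rng = random.Random(seed)
    # Initialize random positions (bit strings) and velocities.
    x = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    v = [[rng.uniform(-vmax, vmax) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_fit = [fitness(p) for p in x]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    stall = 0

    for it in range(iterations):  # termination condition: iteration budget
        w = (0.9 - 0.4) * (iterations - it) / iterations + 0.4  # LDW weight
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                vel = (w * v[i][d] + c1 * r1 * (pbest[i][d] - x[i][d])
                       + c2 * r2 * (gbest[d] - x[i][d]))
                v[i][d] = max(-vmax, min(vmax, vel))
                # Assumed BPSO rule: the bit becomes 1 with probability sigmoid(v).
                x[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v[i][d])) else 0
            f = fitness(x[i])  # compute fitness, then update pbest
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], f
        best = max(range(n_particles), key=lambda i: pbest_fit[i])
        if pbest_fit[best] > gbest_fit:
            gbest, gbest_fit, stall = pbest[best][:], pbest_fit[best], 0
        else:
            stall += 1
        # Assumed complementary condition: gbest stalled for stall_limit iterations.
        if stall >= stall_limit:
            worst_first = sorted(range(n_particles), key=lambda i: fitness(x[i]))
            for i in worst_first[: n_particles // 2]:  # replace 50% of the swarm
                x[i] = [1 - b for b in x[i]]
            stall = 0
    return gbest, gbest_fit


# Toy usage: a hypothetical fitness that rewards matching a target mask.
target = [1, 0, 1, 1, 0, 0, 1, 0]


def match(bits: Bits) -> float:
    return float(sum(a == b for a, b in zip(bits, target)))


print(cpso(match, n_features=len(target)))
```

In the feature-selection setting, the fitness would presumably be the K-NN LOOCV accuracy on the genes whose bits are 1; the toy fitness only keeps the example fast and self-contained.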

Method (7/9)

• K-Nearest Neighbor (K-NN)
– Each data point is represented by its features as a point in a D-dimensional space. The K-NN classification result depends on the number K of neighbors considered.
– We use the Euclidean distance to find the K labeled samples nearest to a test sample, and the class of the test sample is decided from these K neighbors.

• Leave-one-out cross-validation (LOOCV)
– In the LOOCV procedure, the N samples are divided into one test sample and N−1 training samples; this is repeated so that each sample serves once as the test sample.
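A minimal Python sketch of K-NN with Euclidean distance and LOOCV error follows. It assumes the common majority-vote rule for combining the K neighbors and a default of K = 1; the slides specify neither, and the names `knn_predict` and `loocv_error` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[Hashable],
                query: Sequence[float], k: int = 1) -> Hashable:
    """Classify `query` by majority vote among its k nearest Euclidean neighbors."""
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def loocv_error(X: List[Sequence[float]], y: List[Hashable], k: int = 1) -> float:
    """Each sample is the test sample once; the other N-1 samples are training data."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) != y[i]:
            errors += 1
    return errors / len(X)


# Two well-separated toy classes: every held-out sample is classified correctly.
print(loocv_error([[0, 0], [0, 1], [10, 10], [10, 11]], ["a", "a", "b", "b"]))
```

When used inside feature selection, X would contain only the columns selected by a particle, so the error rate scores that particular subset.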

Method (8/9)

[Figure: PSO. A particle moves in the search space under the influence of gbest and its own pbest_i.]
• Each particle considers both gbest and its own pbest_i.

Method (9/9)

[Figure: CPSO. A particle, its pbest_i, and gbest; the particle is moved to its complementary position.]
• Example of the complementary operation:
– Coordinate axis: (6, 5)
– Convert to binary: (0110, 0101)
– Complementary: (1001, 1010)
– New position: (9, 10)
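The coordinate example can be checked with a short Python sketch: XOR with an all-ones mask flips every bit. The 4-bit width matches the example; the names `complement_value` and `complement_position` are hypothetical.

```python
def complement_value(value: int, n_bits: int) -> int:
    """Flip the n_bits low-order bits of a non-negative integer coordinate."""
    mask = (1 << n_bits) - 1
    if not 0 <= value <= mask:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    return value ^ mask  # XOR with all ones flips every bit


def complement_position(position, n_bits: int = 4):
    """Complement each coordinate of a position."""
    return tuple(complement_value(v, n_bits) for v in position)


pos = (6, 5)
print(tuple(format(v, "04b") for v in pos))       # ('0110', '0101')
new = complement_position(pos)
print(tuple(format(v, "04b") for v in new), new)  # ('1001', '1010') (9, 10)
```

Applying the complement twice returns the original position, so the operation only relocates a particle and loses no information.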

Result and Discussion (1/4)

• Data set
– The experiments use six datasets: Brain_Tumor1_GEMS, Brain_Tumor2_GEMS, DLBCL_GEMS, Leukemia1_GEMS, Prostate_Tumor_GEMS, and SRBCT_GEMS. Table I summarizes the six datasets.

Result and Discussion (2/4)

• Results
– The prediction results of Complementary Particle Swarm Optimization are superior to those of other methods reported in the literature.

Result and Discussion (3/4)

• Discussion
– In the preprocessing stage, feature selection can effectively reduce computation time without negatively affecting classification accuracy.
– Because only the selected features are used, classification relies on relatively few features, and this does not reduce classification accuracy.
– We perform a bitwise AND operation over all bits of all pbest values, where pbest is the best position found so far by each particle. In CPSO, a bit of the new complementary particle is 1 only if that bit is 1 in the pbest of every particle; otherwise it is 0.
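The AND operation over all pbest values can be sketched in Python as follows. The slide does not say how the resulting bit string is used afterwards, so this only shows the combination rule; the name `and_of_pbests` is hypothetical.

```python
from typing import List


def and_of_pbests(pbests: List[List[int]]) -> List[int]:
    """Bitwise AND across all particles' pbest bit strings."""
    if not pbests:
        raise ValueError("need at least one pbest")
    n = len(pbests[0])
    if any(len(p) != n for p in pbests):
        raise ValueError("all pbest bit strings must have the same length")
    result = [1] * n
    for p in pbests:
        result = [r & b for r, b in zip(result, p)]  # a bit survives only if 1 everywhere
    return result


pbests = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]]
print(and_of_pbests(pbests))  # [1, 0, 0, 1]
```

Only features that every particle's best position agrees on remain set, which makes the result a conservative consensus of the swarm.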

Result and Discussion (4/4)

– The purpose of this study was to improve on standard PSO.
– Some classification algorithms, such as decision trees and K-nearest neighbor, use all features to evaluate classification performance.
– Experiments show that K-NN often achieves higher classification accuracy than other classification methods. In future work, we will combine K-NN with CPSO to evaluate and compare their classification accuracy and performance.

Conclusion

• CPSO obtained the lowest classification error rate among the compared methods on the six DNA microarray datasets.
• The results on the DNA microarray datasets show that complementary particle swarm optimization is superior to Non-SVM, MC-SVM, and BPSO in terms of diversity, convergence, and computation cost.
• In the future, we intend to use different properties and other algorithms for DNA microarray data to further improve the efficacy of feature selection.

Thanks for your attention!

E-mail: [email protected]