National Kaohsiung University of Applied Sciences
Bioinformatics Lab.
Reporter: Hua-Fang Chang
Feature Selection using Complementary Particle Swarm Optimization for DNA Microarray Data
Outline
• Introduction
• Method
• Result and Discussion
• Conclusion
• DNA microarray
– DNA microarrays help biologists analyze various disease types; they are widely used to identify DNA types and cells and to classify cancers.
– DNA microarray data are usually large and complex.
– Feature selection is applied to select the informative DNA dimensions (features).
– Feature selection chooses a subset of features from the dataset, and a classifier is then used to evaluate that subset.
Introduction (1/4)
Height  Weight  Age  Interest
166     50      23   Soccer
156     54      65   Basketball
189     90      22   Soccer
177     68      24   Soccer
165     54      63   Basketball
156     45      50   Tennis
Introduction (2/4)
• Algorithm
– Many computational algorithms have been applied to DNA microarray data.
› Genetic Algorithm (GA), 1975
Each candidate solution has a set of properties (its chromosomes or genotype) that can be mutated and altered.
› Particle Swarm Optimization (PSO), 1995
PSO is a population-based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by the social behavior of bird flocking or fish schooling.
› Binary Particle Swarm Optimization (BPSO), 2005
In this model, each particle decides "yes" or "no", "true" or "false", "include" or "not include", and so on; these binary values can also represent a real value in a binary search space.
› Complementary Particle Swarm Optimization:
The complementary strategy aims to improve the particles' search ability: it helps particles escape a local optimum by moving their positions to a new region of the search space.
› K-Nearest Neighbor:
The K-Nearest Neighbor (KNN) method is used to classify samples based on the selected features.
› Leave-one-out cross-validation:
Leave-one-out cross-validation (LOOCV) is used to compute classification error rates.
Introduction (3/4)
– We propose a Complementary Particle Swarm Optimization for DNA microarray data.
– In standard PSO, particles may get trapped in a local optimum because the swarm converges prematurely.
– Therefore, we use the complementary strategy to keep particles from being trapped in a local optimum by moving them to a new region of the search space.
Introduction (4/4)
• Complementary Particle Swarm Optimization (CPSO)
– PSO was developed by simulating the social behavior of organisms, such as birds in a flock or fish in a school.
– Each particle is affected by its own past experience and by the behavior of the swarm.
– PSO has been successfully applied in many research areas; it produces results more efficiently and at lower cost than other methods.
– However, PSO is not suitable for optimization problems in a discrete feature space.
– We propose a Complementary Particle Swarm Optimization (CPSO) to overcome this problem.
Method (1/9)
• Complementary
– The complementary strategy aims to improve the particles' search ability: it helps particles escape a local optimum by moving their positions to a new region of the search space.
– We use the complementary function to generate new particles, which replace 50% of the particles in the swarm.
Method (2/9)
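The replacement step on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: particles are taken to be bit lists, higher fitness is taken to be better, and the worst-scoring half is the half that gets replaced. The slide only says that 50% of the particles are replaced, so the selection rule and the function names here are hypothetical.

```python
def complement(bits):
    """Flip every bit of a binary position: 0 -> 1 and 1 -> 0."""
    return [1 - b for b in bits]


def replace_half_with_complements(swarm, fitness):
    """Return a new swarm where half of the particles are replaced by complements.

    Assumption (not stated on the slide): higher fitness is better, and the
    worst-scoring half of the swarm is the half that is replaced.
    """
    # Particle indices ordered from best to worst fitness.
    order = sorted(range(len(swarm)), key=lambda i: fitness[i], reverse=True)
    n_replace = len(swarm) // 2
    new_swarm = [list(p) for p in swarm]
    for i in order[len(swarm) - n_replace:]:
        new_swarm[i] = complement(swarm[i])
    return new_swarm


# Toy swarm of four 4-bit particles with made-up fitness values.
swarm = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 1, 0]]
fitness = [0.9, 0.4, 0.7, 0.2]
new_swarm = replace_half_with_complements(swarm, fitness)
```

In this toy run the two lowest-scoring particles (indices 1 and 3) are flipped bit by bit, while the two best particles are left unchanged.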
• Initialization
– Randomly initialize the particle swarm (number of particles = 50).
– Adjust the positions of the particle swarm.
– Evaluate the fitness of the particle swarm.
– Number of iterations = 100.
Method (3/9)
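A minimal sketch of the random initialization, assuming a binary encoding in which bit d of a particle marks whether feature d is selected. The swarm size (50) and iteration count (100) come from the slide; the binary encoding, the velocity range [-1, 1], and the helper name `init_swarm` are assumptions made for illustration.

```python
import random

N_PARTICLES = 50     # swarm size given on the slide
N_ITERATIONS = 100   # iteration count given on the slide


def init_swarm(n_features, n_particles=N_PARTICLES, seed=None):
    """Create random binary positions and random real-valued velocities.

    Assumptions: one bit per feature, velocities drawn uniformly from [-1, 1].
    """
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocities = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
                  for _ in range(n_particles)]
    return positions, velocities


positions, velocities = init_swarm(n_features=8, seed=0)
```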
• Particle update
– In CPSO, each particle is updated according to the following equations:
$$v_{id}^{new} = w \cdot v_{id}^{old} + c_1 r_1 \left(pbest_{id} - x_{id}^{old}\right) + c_2 r_2 \left(gbest_{d} - x_{id}^{old}\right)$$
$$x_{id}^{new} = x_{id}^{old} + v_{id}^{new}$$
• where $w$ is the inertia weight that controls the impact of a particle's previous velocity.
• $c_1$ and $c_2$ are acceleration constants that control the distance a particle moves in each generation.
• $r_1$ and $r_2$ are two random numbers in [0, 1].
• $v_{id}^{new}$ and $v_{id}^{old}$ represent the velocities of the new and old particles, respectively.
• $x_{id}^{old}$ and $x_{id}^{new}$ denote the positions of the current particle and the updated particle, respectively.
Method (4/9)
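The two update equations can be written out as a short sketch. It covers only the real-valued velocity and position update as stated on the slide; how the result is mapped back to a binary feature mask is not described there and is left out. The default values c1 = c2 = 2.0 and the choice to draw r1 and r2 separately for each dimension are assumptions.

```python
import random


def update_particle(x_old, v_old, pbest, gbest, w, c1=2.0, c2=2.0, rng=random):
    """Apply the velocity and position update to one particle.

    v_new[d] = w*v_old[d] + c1*r1*(pbest[d] - x_old[d]) + c2*r2*(gbest[d] - x_old[d])
    x_new[d] = x_old[d] + v_new[d]
    Assumptions: c1 = c2 = 2.0, and r1, r2 are redrawn for every dimension d.
    """
    x_new, v_new = [], []
    for d in range(len(x_old)):
        r1, r2 = rng.random(), rng.random()
        v = (w * v_old[d]
             + c1 * r1 * (pbest[d] - x_old[d])
             + c2 * r2 * (gbest[d] - x_old[d]))
        v_new.append(v)
        x_new.append(x_old[d] + v)
    return x_new, v_new


# When the particle already sits on both pbest and gbest, only inertia acts.
x_new, v_new = update_particle(
    x_old=[1.0, 2.0], v_old=[0.5, -0.5],
    pbest=[1.0, 2.0], gbest=[1.0, 2.0], w=0.9,
)
```

In the usage example the attraction terms vanish, so the new velocity is just w times the old velocity, which makes the role of the inertia weight easy to see.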
• $x_{id}^{old}$ and $x_{id}^{new}$ are the current position (solution) and the updated position of the particle. We use the linearly decreasing weight (LDW) strategy to update the inertia weight $w$:
$$w_{LDW} = (w_{max} - w_{min}) \times \frac{Iteration_{max} - Iteration_{i}}{Iteration_{max}} + w_{min}$$
• $w_{max}$ and $w_{min}$ are 0.9 and 0.4, respectively. $Iteration_{max}$ and $Iteration_{i}$ are the maximal number of iterations and the current iteration number, respectively. The function makes the inertia weight $w$ decrease linearly from 0.9 to 0.4 over the iterations.
Method (5/9)
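As a quick check of the LDW formula, the sketch below evaluates it at the start, middle, and end of a run. The function name `ldw_inertia` is hypothetical; the bounds 0.9 and 0.4 are the values given on the slide.

```python
def ldw_inertia(iteration, max_iterations, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight for the current iteration."""
    return (w_max - w_min) * (max_iterations - iteration) / max_iterations + w_min


w_start = ldw_inertia(0, 100)     # first iteration: w equals w_max
w_middle = ldw_inertia(50, 100)   # halfway: midpoint of w_max and w_min
w_end = ldw_inertia(100, 100)     # last iteration: w equals w_min
```

A large weight early on favors exploration, and the smaller weight near the end favors fine-tuning around the best positions found so far.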
• Complementary Particle Swarm Optimization flowchart
[Flowchart: Start → initialize the particle swarm with random position (x) and velocity (v) → compute fitness → evaluate position (x) → sequence (sort) the results → complementary condition reached? (YES: apply the complementary operation; NO: skip it) → termination condition reached? (YES: End; NO: repeat)]
Method (6/9)
• K-Nearest Neighbor (K-NN)
– Each data point is represented by its own features in a D-dimensional space. K-NN classifies a sample according to the classes of its K nearest neighbors.
– We use the Euclidean distance between a testing sample and the samples of known type; the K nearest known samples decide the type of the testing sample.
• Leave-one-out cross-validation (LOOCV)
– In the LOOCV procedure, the N samples are divided into one testing sample and N-1 training samples; this is repeated so that each sample serves once as the testing sample.
Method (7/9)
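The K-NN and LOOCV steps can be combined into a fitness function for a candidate feature subset, sketched below in plain Python. The toy data reuse the Height/Weight/Age/Interest table from the introduction; K = 1, the binary `mask` encoding, and all function names are assumptions, since the slides do not state them.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X, train_y, query, k=1):
    """Majority vote among the k training samples nearest to the query."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def loocv_error(X, y, mask, k=1):
    """LOOCV error rate of K-NN using only the features where mask is 1."""
    cols = [d for d, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # assumption: an empty feature subset is scored as worst
    Xs = [[row[d] for d in cols] for row in X]
    errors = 0
    for i in range(len(Xs)):
        # Hold out sample i and train on the remaining N-1 samples.
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, Xs[i], k) != y[i]:
            errors += 1
    return errors / len(Xs)


# Columns: height, weight, age (values from the example table).
X = [[166, 50, 23], [156, 54, 65], [189, 90, 22],
     [177, 68, 24], [165, 54, 63], [156, 45, 50]]
y = ["Soccer", "Basketball", "Soccer", "Soccer", "Basketball", "Tennis"]
error_age_only = loocv_error(X, y, mask=[0, 0, 1])
```

With only the age feature selected, five of the six samples are classified correctly; the single tennis sample has no same-class neighbor left in training once it is held out, so it is always misclassified.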
PSO
[Figure: a particle moves by considering gbest and pbest_i.]
Method (8/9)
CPSO
[Figure: a particle with pbest_i and gbest, and its complementary position.]
Coordinate axis: (6, 5)
Convert to binary: (0110, 0101)
Complementary: (1001, 1010)
Resulting coordinates: (9, 10)
Method (9/9)
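The coordinate example on this slide, (6, 5) to (0110, 0101) to (1001, 1010) to (9, 10), can be reproduced with a few lines of code. The 4-bit width matches the example; the function names are hypothetical.

```python
def complement_coordinate(value, n_bits=4):
    """Convert an integer to n_bits binary digits, flip each bit, convert back."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value does not fit in n_bits")
    bits = format(value, "0{}b".format(n_bits))
    flipped = "".join("1" if b == "0" else "0" for b in bits)
    return int(flipped, 2)


def complement_point(point, n_bits=4):
    """Apply the bitwise complement to every coordinate of a position."""
    return tuple(complement_coordinate(v, n_bits) for v in point)


moved = complement_point((6, 5))
```

Applying the complement twice returns the original point, so the operation always maps a particle to the mirrored region of the encoded search space rather than to a random one.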
• Data set
– The data comprise six datasets: Brain_Tumor1_GEMS, Brain_Tumor2_GEMS, DLBCL_GEMS, Leukemia1_GEMS, Prostate_Tumor_GEMS, and SRBCT_GEMS. Table I shows the information for the six datasets.
Result and Discussion (1/4)
• Results
– The prediction results of Complementary Particle Swarm Optimization are superior to those of other methods from the literature.
Result and Discussion (2/4)
• Discussion
– In the preprocessing stage, feature selection can effectively reduce the computation time without negatively affecting classification accuracy.
– Feature selection uses relatively few features, since only the selected features need to be used. This does not affect the classification accuracy in a negative way.
– We perform an ‘and’ logic operation over all bits of all pbest values, where pbest is the best previous position of each particle. In CPSO, if a bit is recorded as {1} in the pbest of every particle, then the new bit of the complementary will be {1} as well after the ‘and’ logic operation is performed; otherwise it is {0}.
Result and Discussion (3/4)
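The ‘and’ operation over the pbest values can be illustrated with a small sketch, assuming each pbest is a bit list of equal length. The function name `and_of_pbests` and the toy values are hypothetical.

```python
def and_of_pbests(pbests):
    """Bitwise AND across all pbest vectors.

    A bit of the result is 1 only if that bit is 1 in the pbest of every particle.
    """
    n_bits = len(pbests[0])
    return [int(all(p[d] == 1 for p in pbests)) for d in range(n_bits)]


# Three particles with 4-bit pbest positions (made-up values).
pbests = [[1, 1, 0, 1],
          [1, 0, 0, 1],
          [1, 1, 1, 1]]
combined = and_of_pbests(pbests)
```

Only the first and last bits are set in every pbest, so only those two bits survive in the combined vector.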
– The purpose of this study was to improve on standard PSO.
– Some classification algorithms, such as decision trees and K-nearest neighbor, use all features to evaluate classification performance.
– Experiments show that K-NN often achieves higher classification accuracy than other classification methods. In future work, we will combine K-NN with CPSO to evaluate and compare their classification accuracy and performance.
Result and Discussion (4/4)
• The classification error rate obtained by the CPSO method is the lowest when compared with several other methods on six DNA microarray datasets.
• The results on the DNA microarray datasets show that Complementary Particle Swarm Optimization is superior to Non-SVM, MC-SVM, and BPSO in terms of diversity, convergence, and computation cost.
• In the future, we intend to use different properties and other algorithms for DNA microarray data in order to further enhance feature selection efficacy.
Conclusion
Thanks for your attention!
E-mail: [email protected]