Paper Presentation April 10, 2006 Rui Min Topic in Bioinformatics, Dr. Charles Yan - Training HMM structure with genetic algorithm for biological sequence analysis


Paper Presentation

April 10, 2006

Rui Min

Topic in Bioinformatics, Dr. Charles Yan

- Training HMM structure with genetic algorithm for biological sequence analysis

Overview

• An automatic means of optimizing the structure of HMMs

• Genetic algorithm (GA) for optimizing the HMM structure

• Experiments on two models

– Promoter model of C. jejuni

– Coding region model of C. jejuni

• Conclusion

Train HMM structure by GA

Problems

• Biologically interpretable structure

• Controllable complexity

Method

• Combine Baum-Welch training with a GA; the combined procedure is called GA for hidden Markov models (GA-HMM).

Flowchart

Genetic Operations (I)

• Selection

– Roulette wheel selection

– Stochastic universal sampling
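The two selection schemes named on this slide can be sketched as follows. This is a minimal illustration of the textbook versions of both methods, not the paper's implementation; the function names, the `fitness` list of non-negative scores, and the `rng` argument are assumptions made for the example.

```python
import random


def roulette_wheel(fitness, n, rng=random):
    """Draw n indices independently, each with probability proportional to fitness."""
    total = sum(fitness)
    picks = []
    for _ in range(n):
        r = rng.uniform(0, total)
        acc = 0.0
        for i, f in enumerate(fitness):
            acc += f
            if r <= acc:
                picks.append(i)
                break
        else:
            # Guard against floating-point rounding at the top of the wheel.
            picks.append(len(fitness) - 1)
    return picks


def stochastic_universal_sampling(fitness, n, rng=random):
    """Draw n indices using one random offset and n equally spaced pointers."""
    total = sum(fitness)
    step = total / n
    start = rng.uniform(0, step)
    picks = []
    acc = 0.0  # cumulative fitness of all individuals before index i
    i = 0
    for k in range(n):
        pointer = start + k * step
        # Advance to the individual whose fitness segment contains the pointer.
        while acc + fitness[i] < pointer and i < len(fitness) - 1:
            acc += fitness[i]
            i += 1
        picks.append(i)
    return picks
```

Roulette wheel selection spins once per pick, so the number of copies of each individual can vary a lot between runs. Stochastic universal sampling spins once in total, which keeps the number of copies close to the expected value for each individual.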

Genetic Operations (II)

• Mutation

Genetic Operations (III)

• Crossover

Selective Baum-Welch

• The log-likelihood of model k
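The formula from this slide did not survive in the transcript. As a reminder of what the quantity is, the sketch below computes log P(sequence | model) for a discrete HMM with the standard scaled forward algorithm. It is a generic illustration, not the paper's expression; the function name and the `start`, `trans`, `emit` parameter layout are assumptions made for the example.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: return log P(obs | model) for a discrete HMM.

    start[s]     -- probability of starting in state s
    trans[r][s]  -- probability of moving from state r to state s
    emit[s][sym] -- probability that state s emits symbol sym
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step and weight by the emission of obs[t].
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # Rescale to avoid underflow; the log of the scales sums to the log-likelihood.
        alpha = [a / scale for a in alpha]
        loglik += math.log(scale)
    return loglik
```

For a set of training sequences, the per-sequence values would be summed to give the log-likelihood of one candidate model.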

Fitness value

• Fitness

Experiment I: promoter model of C. jejuni

• Parameters

Structure

Comparison

Experiment II: coding region model of C. jejuni

Experiment II: coding region model of C. jejuni

Conclusion

• Drawbacks

– Biologically interpretable structure

– No novel types of architecture

– No large HMM structures

– These may be addressed in future work

• Merits

– Capable of dealing with substructures

– Shows an application of GAs in bioinformatics

Unstated Aspects

• Too many constant parameters

– Probability of population for Baum-Welch training

– Percentage of training/validation

– Number of iterations

– Are these values the best choices?

• Unclear parameters

– Termination condition

– The distribution of results: t-test?

– The specific crossover scheme: single-point?

Questions & Discussion