An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns

Zhigang Zheng, Yanchang Zhao, Ziye Zuo, and Longbing Cao. PAKDD 2010.

Outline
- Motivation
- Problem Definition
- GA-Based Negative Sequential Pattern Mining Algorithm
- Experiments
- Conclusion

Motivation
- Negative sequential patterns focus on negative relationships between itemsets: absent items are taken into consideration.
- Drawback: the search space for mining negative patterns is much bigger than that for positive ones, so huge numbers of negative candidates will be generated. For example, 10 distinct frequent 1-item positive patterns yield 10^3 3-item positive candidates but 20^3 3-item negative candidates, because each item can appear either positive or negated.

Motivation (Cont.)
- Based on a genetic algorithm (GA), each generation passes good genes on to the next generation through crossover and mutation, without generating candidates. A dynamic fitness function and a pruning method are used to improve performance.

Problem Definition
- A sequence is an ordered list of elements.
- An element e_i consists of one or more items. For example, a sequence may consist of 4 elements, one of which, (c,d), includes two items.
- Positive sequences, negative sequences, and the max. positive subsequence of a sequence are then defined formally.
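
The candidate counts in the Motivation example can be checked by brute force. This is a minimal Python sketch under stated assumptions: a "3-item candidate" is taken to be a sequence of three single-item elements (repetition allowed), and each position may hold an item either present or negated. The max. positive subsequence helper assumes the common reading, not spelled out in the text of these slides, that it keeps the positive elements of a sequence in their original order. All function names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List, Tuple

# Hypothetical representation: an element is (items, negated). Per the
# problem definition, all items in one element share the same sign.
Element = Tuple[FrozenSet[str], bool]


def count_3item_candidates(n_items: int) -> Tuple[int, int]:
    """Count sequences of three single-item elements over n_items items."""
    items = [f"i{k}" for k in range(n_items)]
    # Positive only: each of the 3 positions holds one of n items -> n**3.
    positive = sum(1 for _ in product(items, repeat=3))
    # With negation: each position holds an item, present or negated -> (2n)**3.
    signed = [(item, negated) for item in items for negated in (False, True)]
    with_negation = sum(1 for _ in product(signed, repeat=3))
    return positive, with_negation


def max_positive_subsequence(seq: List[Element]) -> List[Element]:
    """Assumed definition: keep the positive elements, in their original order."""
    return [element for element in seq if not element[1]]


if __name__ == "__main__":
    print(count_3item_candidates(10))  # (1000, 8000)
    s = [(frozenset("a"), False), (frozenset("b"), True), (frozenset("cd"), False)]
    print(max_positive_subsequence(s))  # keeps (a) and (c,d), drops the negated (b)
```

Of the 8000 signed sequences, 1000 are purely positive. Applying the ban on consecutive negative elements still leaves 5 of the 8 sign patterns (PPP, PPN, PNP, NPP, NPN), i.e. 5000 sequences, so the negative search space stays several times larger than the positive one.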

Problem Definition (Cont.)
- Negative sequential patterns: a negative sequence s is a pattern if sup(s) ≥ min_sup.
- Items in the same element must be all positive or all negative; an element that mixes positive and negative items is not allowed.
- Two or more consecutive negative elements are not accepted.
- For each negative item in a negative pattern, its positive counterpart is required to be frequent.

Negative Matching
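
The constraints above, together with one possible reading of negative matching, can be sketched as follows. The matching rule is an assumption, since the slides do not state it in text: a data sequence matches a pattern if some embedding of the pattern's positive elements exists such that no negated element occurs (as a subset of a data element) in the gap where it sits. Each element carries a single sign flag, which enforces the all-positive-or-all-negative rule by construction. The representation and all function names are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

Element = Tuple[FrozenSet[str], bool]  # (items, negated); one sign per element
DataSequence = List[FrozenSet[str]]    # observed data: positive itemsets only


def is_valid(pattern: Sequence[Element], frequent_items: Set[str]) -> bool:
    """Check the pattern constraints listed on the slide."""
    # Mixed-sign elements cannot be expressed with one sign flag per element,
    # so only the remaining two rules need explicit checks.
    for k, (items, negated) in enumerate(pattern):
        if negated and k > 0 and pattern[k - 1][1]:
            return False  # two consecutive negative elements
        if negated and not items <= frequent_items:
            return False  # a negated item whose positive item is not frequent
    return True


def _occurs(items: FrozenSet[str], data: DataSequence, lo: int, hi: int) -> bool:
    """True if some data element in data[lo:hi] contains all of items."""
    return any(items <= data[k] for k in range(lo, hi))


def matches(data: DataSequence, pattern: Sequence[Element]) -> bool:
    """Assumed negative matching for a valid pattern (see is_valid)."""

    def search(pi: int, pos: int) -> bool:
        # pi: next pattern element; pos: first data position still unused.
        if pi == len(pattern):
            return True
        items, negated = pattern[pi]
        if not negated:
            return any(items <= data[k] and search(pi + 1, k + 1)
                       for k in range(pos, len(data)))
        if pi + 1 == len(pattern):
            # Trailing negation: items must be absent from pos to the end.
            return not _occurs(items, data, pos, len(data))
        # Valid patterns have no consecutive negatives, so pattern[pi + 1] is
        # positive; place it at k with the negated items absent from data[pos:k].
        nxt = pattern[pi + 1][0]
        return any(nxt <= data[k] and not _occurs(items, data, pos, k)
                   and search(pi + 2, k + 1)
                   for k in range(pos, len(data)))

    return search(0, 0)


def support(db: List[DataSequence], pattern: Sequence[Element]) -> float:
    """Fraction of data sequences that match the pattern."""
    return sum(matches(d, pattern) for d in db) / len(db)


if __name__ == "__main__":
    a, not_b, c = (frozenset("a"), False), (frozenset("b"), True), (frozenset("c"), False)
    db = [
        [frozenset("a"), frozenset("c")],
        [frozenset("a"), frozenset("b"), frozenset("c")],
        [frozenset("ab"), frozenset("c")],
        [frozenset("c")],
    ]
    pattern = [a, not_b, c]
    print(is_valid(pattern, {"a", "b", "c"}))  # True
    print(support(db, pattern))  # 0.5
```

Under this reading, ⟨a ¬b c⟩ matches ⟨(a) (c)⟩ but not ⟨(a) (b) (c)⟩; a different matching definition could give different support values.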

GA-Based Negative Sequential Pattern Mining Algorithm
- Population and Selection
- Crossover and Mutation
- Pruning
- Algorithm Flow

Population and Selection
- Initial population: all 1-item frequent positive and negative patterns.
- Selection: the top K individuals with the highest dynamic fitness are kept.
- A fitness function is used to evaluate the individuals and decide which are best for the next generation.
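
Initialisation and top-K selection might look like the sketch below. The dynamic fitness function is not defined on this slide, so the sketch takes fitness as a parameter; the demo fitness, one_element_support, is a hypothetical stand-in that only handles 1-element patterns. Selection uses heapq.nlargest from the Python standard library.

```python
import heapq
from typing import Callable, FrozenSet, Iterable, List, Tuple

Element = Tuple[FrozenSet[str], bool]  # (items, negated)
Pattern = Tuple[Element, ...]
DataSequence = List[FrozenSet[str]]


def initial_population(frequent_items: Iterable[str]) -> List[Pattern]:
    """All 1-item positive and negative patterns over the frequent items."""
    population: List[Pattern] = []
    for item in sorted(frequent_items):
        population.append(((frozenset([item]), False),))  # <item>
        population.append(((frozenset([item]), True),))   # <not item>
    return population


def select_top_k(population: List[Pattern],
                 fitness: Callable[[Pattern], float], k: int) -> List[Pattern]:
    """Keep the k individuals with the highest fitness."""
    return heapq.nlargest(k, population, key=fitness)


def one_element_support(db: List[DataSequence], pattern: Pattern) -> float:
    """Hypothetical stand-in fitness, valid for 1-element patterns only."""
    items, negated = pattern[0]
    hits = 0
    for seq in db:
        present = any(items <= element for element in seq)
        # A negated pattern is supported by sequences where the items are absent.
        hits += (not present) if negated else present
    return hits / len(db)


if __name__ == "__main__":
    db = [
        [frozenset("a"), frozenset("bc")],
        [frozenset("a"), frozenset("c")],
        [frozenset("b")],
        [frozenset("ac")],
    ]
    population = initial_population({"a", "b", "c"})
    best = select_top_k(population, lambda p: one_element_support(db, p), k=2)
    print(best)  # the positive patterns <a> and <c>, each with support 0.75
```

Passing fitness as a callable keeps selection independent of how the dynamic fitness is actually computed, so a different scoring rule can be swapped in without touching the GA loop.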

Crossover and Mutation
Crossover
- Parents with different lengths are allowed to cross over with each other.
- Crossover may happen at different positions, producing sequential patterns of varied lengths.
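
Crossover between parents of different lengths can be sketched as a one-point crossover with an independent cut point in each parent, so the two offspring can differ in length from both parents. This is a generic GA operator chosen for illustration, not necessarily the exact operator used in the paper; offspring with two consecutive negative elements are discarded to respect the problem definition. Function names are hypothetical.

```python
import random
from typing import FrozenSet, Tuple

Element = Tuple[FrozenSet[str], bool]  # (items, negated)
Pattern = Tuple[Element, ...]


def crossover_at(p1: Pattern, p2: Pattern, i: int, j: int) -> Tuple[Pattern, Pattern]:
    """Exchange tails: p1 is cut before index i, p2 before index j."""
    return p1[:i] + p2[j:], p2[:j] + p1[i:]


def no_double_negation(p: Pattern) -> bool:
    """True if no two adjacent elements are both negative."""
    return not any(x[1] and y[1] for x, y in zip(p, p[1:]))


def crossover(p1: Pattern, p2: Pattern, rng: random.Random) -> Tuple[Pattern, ...]:
    """One-point crossover with an independent cut point in each parent."""
    # Cut points in 1..len keep at least one element from each parent's head,
    # so neither child is empty; different cuts give children of new lengths.
    i = rng.randint(1, len(p1))
    j = rng.randint(1, len(p2))
    children = crossover_at(p1, p2, i, j)
    # Drop offspring that violate the "no consecutive negatives" rule.
    return tuple(child for child in children if no_double_negation(child))


if __name__ == "__main__":
    a, not_b, c, d = ((frozenset("a"), False), (frozenset("b"), True),
                      (frozenset("c"), False), (frozenset("d"), False))
    parent1, parent2 = (a, not_b, c), (d, c)
    print(crossover_at(parent1, parent2, 1, 1))  # (a, c) and (d, not_b, c)
    print(crossover(parent1, parent2, random.Random(42)))
```

Because the cut points are drawn independently, a 3-element and a 2-element parent can produce, for example, a 2-element and a 3-element child, which is how patterns of varied lengths enter the population.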

Crossover and Mutation (Cont.)
Mutation
- Mutation helps prevent the population from contracting to one particular frequent pattern.

Pruning
- Ex. c = …; c′ = … is the max. positive subsequence of c, and … 0.
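
Mutation and pruning are sketched below under explicit assumptions. The mutation operator (flip one element's sign, or replace its items with a random frequent item) is hypothetical, and a sign flip may create consecutive negative elements, so mutated patterns should be re-checked against the constraints. The pruning rule follows the pruning example: a candidate is dropped when its max. positive subsequence (assumed to be its positive elements, in order) occurs in no data sequence. If matching a negative pattern requires containing its max. positive subsequence, such a candidate has zero support anyway, so it can be discarded without examining its negative elements. Function names are hypothetical.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

Element = Tuple[FrozenSet[str], bool]  # (items, negated)
Pattern = Tuple[Element, ...]
DataSequence = List[FrozenSet[str]]


def mutate(p: Pattern, frequent_items: Sequence[str], rng: random.Random) -> Pattern:
    """Hypothetical mutation: modify one randomly chosen element."""
    k = rng.randrange(len(p))
    items, negated = p[k]
    if rng.random() < 0.5:
        new = (items, not negated)  # flip the sign of element k
    else:
        new = (frozenset([rng.choice(frequent_items)]), negated)  # new item, same sign
    # The result may break the no-consecutive-negatives rule; filter afterwards.
    return p[:k] + (new,) + p[k + 1:]


def contains(data: DataSequence, positives: Sequence[FrozenSet[str]]) -> bool:
    """Greedy subsequence test: each itemset must be a subset of a later element."""
    pos = 0
    for items in positives:
        while pos < len(data) and not items <= data[pos]:
            pos += 1
        if pos == len(data):
            return False
        pos += 1  # the matched element cannot be reused
    return True


def prune(candidates: List[Pattern], db: List[DataSequence]) -> List[Pattern]:
    """Drop candidates whose max. positive subsequence has zero support."""
    kept = []
    for cand in candidates:
        mps = [items for items, negated in cand if not negated]
        if any(contains(data, mps) for data in db):
            kept.append(cand)
    return kept


if __name__ == "__main__":
    a, not_b, c, d = ((frozenset("a"), False), (frozenset("b"), True),
                      (frozenset("c"), False), (frozenset("d"), False))
    db = [[frozenset("a"), frozenset("c")], [frozenset("b")]]
    print(prune([(a, not_b, c), (d, not_b)], db))  # only (a, not_b, c) survives
    print(mutate((a, not_b, c), ["a", "b", "c"], random.Random(7)))
```

The pruning check scans only positive itemsets, which is cheaper than full negative matching; the greedy earliest-match scan in contains is sufficient for testing plain subsequence containment.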