Efficient and Discovery of Patterns in Sequences Data

By: M. Sai Sampath (09311A1912), N. Prasad Rao (09311A1918), Ch. Rajesh (09311A1938)

Upload: mallipeddi-sai-sampath

Post on 14-Apr-2015



ABSTRACT

Existing sequence mining algorithms mostly focus on mining for subsequences. However, a large class of applications, such as biological DNA and protein motif mining, require efficient mining of “approximate” patterns that are contiguous. The few existing algorithms that can be applied to such contiguous approximate pattern mining have drawbacks like poor scalability, lack of guarantees in finding the pattern, and difficulty in adapting to other applications.

In this paper, we present a new algorithm called Flexible and Accurate Motif Detector (FLAME). FLAME is a flexible suffix-tree-based algorithm that can be used to find frequent patterns with a variety of definitions of motif (pattern) models. It is also accurate, as it always finds the pattern if it exists. Using both real and synthetic data sets, we demonstrate that FLAME is fast, scalable, and outperforms existing algorithms on a variety of performance metrics.

In addition, based on FLAME, we also address a more general problem, named extended structured motif extraction, which allows mining frequent combinations of motifs under relaxed constraints.


EXISTING SYSTEM

Existing sequence mining algorithms mostly focus on mining for subsequences. Existing algorithms for structured motif mining can mine these patterns only if the user specifies the minimum and maximum number of gaps between the simple motifs.


Disadvantages: poor scalability, lack of guarantees in finding the pattern, and difficulty in adapting to other applications.


PROPOSED SYSTEM

This method is primarily focused on finding pairs (or sets) of motifs that co-occur in the data set within a short distance of each other. It considers only a simple mismatch-based definition of noise and does not consider other, more complex motif models.


Advantages: FLAME is able to identify many true biological motifs, and it never misses any matches.


MODULES

Doctor Module
Admin Module
Technician Module
FLAMES Module


MODULES DESCRIPTION

Doctor Module: This module is used to send mail to other doctors, the admin, and lab technicians. Doctors can view patient entry details and patient test details, edit their personal details, and search test results using the FLAME algorithm.

Admin Module: This module is used to enter patient and doctor registration details and to send the doctor's username and password by mail. The admin can view test details and can send and view mails using the inbox. The admin acts as an intermediary between doctors and lab technicians.


Technician Module: This module is used to enter patient test results and to edit those details. The lab technician can send mails to others and view mails from the inbox. The lab technician works independently and is not allowed to access doctor and patient details without admin permission.

FLAMES Module: This module can be used to find the (L, M, s, k) motifs. For ease of exposition, we explain the algorithm using an (L, d, k) model, and then describe how we extend it to the full-fledged (L, M, s, k) model.
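The slides do not define the (L, d, k) model. The sketch below assumes the usual reading: a motif is a model string of length L that matches at least k contiguous windows of the data with at most d mismatches (Hamming distance) per window. It is a brute-force illustration of that definition only, not the FLAME algorithm; the names find_ldk_motifs and hamming and the default DNA alphabet are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_ldk_motifs(sequence, L, d, k, alphabet="ACGT"):
    """Return {model: support} for every length-L model string that matches
    at least k windows of `sequence` with at most d mismatches each.

    Assumed (L, d, k) definition; brute force over all |alphabet|**L models.
    """
    # Every contiguous window of length L in the data.
    windows = [sequence[i:i + L] for i in range(len(sequence) - L + 1)]
    motifs = {}
    for letters in product(alphabet, repeat=L):
        model = "".join(letters)
        # Support = number of windows within Hamming distance d of the model.
        support = sum(1 for w in windows if hamming(model, w) <= d)
        if support >= k:
            motifs[model] = support
    return motifs
```

For example, on the sequence ACGTACGTACGA with L = 4, d = 1, k = 3, the model ACGT has support 3: two exact occurrences plus the window ACGA, which differs in one position. The cost grows with the number of model strings times the number of windows, which is the kind of exhaustive search a suffix-tree-based approach is meant to avoid.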


The approach we take in FLAME explores the space of all possible models. To carry out this exploration efficiently, we first construct two suffix trees: a suffix tree on the actual data set that contains counts in each node (called the data suffix tree), and a suffix tree on the set of all possible model strings (called the model suffix tree). This second set is typically the set of all strings of length L over the alphabet.
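A minimal sketch of the count-carrying data structure, under stated assumptions: an uncompressed trie of suffixes truncated to depth L stands in for the data suffix tree, and a single model string is passed in rather than building a model suffix tree. The names Node, build_data_trie, and approximate_support are hypothetical, and the mismatch-budget traversal is an illustration, not the traversal used by FLAME.

```python
class Node:
    """Trie node; count = occurrences in the data of the string spelled by the path."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_data_trie(sequence, max_depth):
    """Insert every suffix, truncated to max_depth symbols, into a counted trie."""
    root = Node()
    for start in range(len(sequence)):
        node = root
        for symbol in sequence[start:start + max_depth]:
            node = node.children.setdefault(symbol, Node())
            node.count += 1
    return root


def approximate_support(node, model, d):
    """Count data substrings of length len(model) within Hamming distance d
    of `model`, spending one unit of the budget d per mismatching symbol."""
    if not model:
        return node.count
    total = 0
    for symbol, child in node.children.items():
        cost = 0 if symbol == model[0] else 1
        if cost <= d:
            total += approximate_support(child, model[1:], d - cost)
    return total
```

Because each node stores how many times its path label occurs, the support of a model is obtained by summing counts at depth L instead of rescanning the data for every model string. Branches that exceed the mismatch budget are pruned early.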


SYSTEM SPECIFICATION

Hardware Requirements:
System : Pentium IV, 2.4 GHz
Hard Disk : 40 GB
Floppy Drive : 1.44 MB
Monitor : 14" Colour Monitor
Mouse : Optical Mouse
RAM : 512 MB
Keyboard : 101-key Keyboard


Software Requirements:
Operating System : Windows XP
Coding Language : ASP.NET with C#
Database : SQL Server 2005


UML DIAGRAMS

Use Case Diagram:


Sequence Diagram:


Collaboration Diagram:


State Diagram:


Activity Diagram:
