Feature Selection of Medical Diagnosis Data Using Genetic Algorithm and Data Mining

Upload: bhargav-srinivasan

Post on 11-Feb-2017

TRANSCRIPT

Page 1: Medical data diagnosis

Feature Selection of Medical Diagnosis Data Using Genetic Algorithm and Data Mining

Page 2: Medical data diagnosis

Introduction

● Data mining - Helps us make sense of, and find patterns in, the huge collections of data that are obtained. It automates prediction and minimizes human interaction.

● Feature selection - Redundant and irrelevant variables and predictors need to be removed from the data to simplify the model, remove noise, and prevent overfitting.

● Genetic algorithm - Simulates a process of natural selection.

Page 3: Medical data diagnosis

How it works

Step 1: Obtain the Data

Step 2: Feature Selection

Step 3: Genetic Algorithm

[Flow diagram; labels: Medical Data, Mined Data, Disease-specific data, Mathematical model, New Records, Prediction, Appropriate Treatment]

Page 4: Medical data diagnosis

Data Mining

Data is obtained from the medical records of patients in hospitals and clinics.

We tabulate the data of each patient into a number of parameters or variables.

This data is the training data for the program.

Page 5: Medical data diagnosis

These are collected and tabulated medical records, called the training data. The data of 14 persons has been recorded: their age, BMI (Body Mass Index), hereditary factor, and vision, along with the result, which is the risk of a medical condition. This data is subjected to feature selection, where irrelevant variables are eliminated. It is passed through a filter algorithm to obtain a better training data set. This data is crucial in predicting a new patient's disease.
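A minimal sketch of how such tabulated records could be represented in code. The field names follow the variables named on this slide; the field types, the `PatientRecord` and `split` names, and the two sample rows are assumptions made up for illustration and are not the 14 records from the slide.

```python
from dataclasses import dataclass, astuple


@dataclass
class PatientRecord:
    age: int
    bmi: float
    hereditary: bool  # assumed to be a yes/no flag
    vision: str       # assumed to be categorical
    risk: str         # the result column: risk of the medical condition


# Made-up sample rows, for illustration only.
records = [
    PatientRecord(34, 22.5, False, "normal", "low"),
    PatientRecord(61, 31.0, True, "impaired", "high"),
]


def split(rows):
    """Separate the input variables from the result column."""
    inputs = [astuple(row)[:-1] for row in rows]
    outputs = [row.risk for row in rows]
    return inputs, outputs


inputs, outputs = split(records)
```

Keeping the result column separate from the input variables makes it easy to drop or reorder input variables later without touching the outcome.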

Page 6: Medical data diagnosis

Feature Selection

● Selecting a subset of the training data which is relevant to the specific disease.

● Removes irrelevant and redundant variables.

● Find the variables that affect the outcome the most, discard the rest.

● Improves processing time and prevents overfitting.

● Three major methods

○ Filter Method

○ Wrapper Method

○ Embedded Method

Page 7: Medical data diagnosis

Feature Selection Methods

Filter Method

Picks out intrinsic properties of the data, independent of any model

Two-step process: (1) Ranking, (2) Subset Selection

Fast and less prone to overfitting

Wrapper Method

Scores candidate subsets by training the model on them, so unlike the Filter method it can detect possible interactions between variables

More specific, but increases computation time

Embedded Method

Embedded into the model construction process

Combines the advantages of the Filter and Wrapper methods
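The two-step Filter method (ranking, then subset selection) could be sketched as follows. The slides do not say which ranking score is used; absolute Pearson correlation with the outcome is an assumption chosen here for simplicity. The `shoe_size` column and all values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / sqrt(var_x * var_y)


def filter_select(columns, target, k):
    """Step 1: rank features by |correlation| with the target.
    Step 2: keep the k highest-ranked features."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Made-up toy values, for illustration only.
columns = {
    "age": [25, 40, 55, 60, 35, 70],
    "bmi": [22, 30, 31, 35, 24, 33],
    "shoe_size": [9, 8, 8, 9, 8, 9],  # hypothetical irrelevant variable
}
risk = [0, 1, 1, 1, 0, 1]  # 1 = at risk, 0 = not at risk

selected = filter_select(columns, risk, k=2)
print(selected)  # ['bmi', 'age']
```

Because each feature is scored on its own, this kind of ranking is cheap to compute, but it cannot see interactions between variables, which is the gap the Wrapper method addresses.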

Page 8: Medical data diagnosis

Genetic Algorithm

● Genetic Algorithms are adaptive optimization methods that mimic natural evolution processes via non-exhaustive searches among randomly generated solutions.

● Inspired by mechanisms of natural evolution such as inheritance, mutation, selection, and crossover.

● Application in medicine: clinical decision support.

● The data is considered to be the population; every record in the data is treated like an “individual” and its output is treated as its score.

● We select the best individuals and apply the Genetic Algorithm to create new individuals, repeating this until we get a population of the best individuals.

Page 10: Medical data diagnosis

Process of Genetic Algorithm

Initial Population: Assess the population (data) and assign scores to each of them

Fitness Evaluation: Find the outputs of the inputs and find the best individuals

Mating/Mutation: Two selected inputs can be mated with a chance of mutation to obtain an input with hopefully a better output

Quality Check: Check if the population has sufficient quality. If yes, end the process. Else, repeat the process
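The four stages above can be sketched as a small loop. This is an illustration under stated assumptions, not the presenters' implementation. Each individual here is a bit mask over the feature names from the slides, which is one common encoding for feature selection and differs from the slide's framing of each record as an individual. The `USEFUL` set and the `fitness` function are hypothetical stand-ins; a real fitness function would train and validate a model on the chosen features. The quality check is simplified to testing the best individual against a target score.

```python
import random

FEATURES = ["age", "bmi", "hereditary", "vision"]  # variables named on the slides
# Hypothetical stand-in: pretend these are the features that matter.
USEFUL = {"age", "bmi", "hereditary"}


def fitness(mask):
    """Score a feature subset: +1 per useful feature kept, -1 per other feature kept."""
    return sum(
        1 if name in USEFUL else -1
        for name, bit in zip(FEATURES, mask)
        if bit
    )


def mate(parent_a, parent_b, rng, mutation_rate=0.1):
    """Single-point crossover followed by an independent chance of flipping each bit."""
    point = rng.randrange(1, len(parent_a))
    child = parent_a[:point] + parent_b[point:]
    return [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]


def run_ga(pop_size=8, target=3, max_generations=50, seed=0):
    rng = random.Random(seed)
    # Initial population: random feature masks.
    population = [[rng.randint(0, 1) for _ in FEATURES] for _ in range(pop_size)]
    for _ in range(max_generations):
        # Fitness evaluation: score every individual, best first.
        population.sort(key=fitness, reverse=True)
        # Quality check: stop once the best individual is good enough.
        if fitness(population[0]) >= target:
            break
        # Mating/mutation: keep the better half and refill with their children.
        parents = population[: pop_size // 2]
        children = [
            mate(*rng.sample(parents, 2), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


best = run_ga()
print([name for name, bit in zip(FEATURES, best) if bit], fitness(best))
```

Keeping the parents in the next generation means the best score never decreases from one generation to the next, at the cost of less diversity in the population.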

Page 11: Medical data diagnosis

In Conclusion

● Data Mining and Genetic Algorithm techniques yield efficient results in the diagnosis of a disease.

● Feature selection methods enable elimination of irrelevant variables and generation of a better training set.

● The prediction for a new record is accurate, and the mathematical model takes less time to generate it.

● Time is critical in the diagnosis of disease: early treatment results in higher success rates for curing the disease.

Page 12: Medical data diagnosis

Thank You