

ISSN(Online) : 2319 - 8753

ISSN (Print) : 2347 - 6710

International Journal of Innovative Research in Science,

Engineering and Technology

(An ISO 3297: 2007 Certified Organization)

Vol. 4, Special Issue 6, May 2015

Copyright to IJIRSET www.ijirset.com 327

Prediction of Heart Disease Using Naïve Bayes Algorithm

R. Karthiyayini¹, S. Chithaara²

Assistant Professor, Department of Computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu, India¹

PG Scholar, Department of Computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu, India²

ABSTRACT: The healthcare environment is generally perceived as being "information rich" yet "knowledge poor". A wealth of data is available within healthcare systems, but there is a lack of effective analysis tools to discover hidden relationships and trends in that data. Knowledge discovery and data mining have found numerous applications in business and scientific domains, and valuable knowledge can be discovered by applying data mining techniques to healthcare systems. This paper examines the potential use of Naïve Bayes, a classification-based data mining technique, on massive volumes of healthcare data. The healthcare industry collects huge amounts of healthcare data which, unfortunately, are not "mined" to discover hidden information. A Naïve Bayes classifier is used for data preprocessing and effective decision making. Using medical profiles such as age, sex, blood pressure and blood sugar, it can predict the likelihood of patients getting heart disease. The focus of this paper is the prediction of heart disease using the Naïve Bayes algorithm.

KEYWORDS: Heart Disease, Naïve Bayes, Data mining.

I. INTRODUCTION

Knowledge discovery in databases is a well-defined process consisting of several distinct steps. "Data mining is the non-trivial extraction of implicit, previously unknown and potentially useful information about data". Data mining technology provides a user-oriented approach to discovering novel and hidden patterns in data. The discovered knowledge can be used by healthcare administrators to improve the quality of service. A major challenge facing healthcare organizations (hospitals, medical centres) is the provision of quality services at affordable costs. Quality service implies diagnosing patients correctly and administering treatments that are effective. Hospitals must also minimize the cost of clinical tests. They can achieve these results by employing appropriate computer-based information and/or decision support systems. Healthcare data is massive. It includes patient-centric data, resource management data and transformed data. Healthcare organizations must have the ability to analyze this data. Treatment records of millions of patients can be stored and computerized, and data mining techniques may help answer several important and critical questions related to healthcare.

With the availability of integrated information through huge patient repositories, the perception of clinicians, patients and payers is shifting from qualitative visualization of clinical data to a demand for more quantitative assessment of information, supported by all clinical and imaging data. Medical diagnosis is a significant yet intricate task that needs to be carried out precisely and efficiently. Clinical decisions are often made based on doctors' intuition and experience rather than on the knowledge-rich data hidden in the database. Drawing on that data is promising, as data modeling and analysis tools such as data mining have the potential to generate a knowledge-rich environment that can significantly improve the quality of clinical decisions.

EXISTING SYSTEM

The healthcare industry collects huge amounts of healthcare data which, unfortunately, are not "mined" to discover hidden information. Clinical decisions are often made based on doctors' intuition and experience rather than on the knowledge-rich data hidden in the database. This practice leads to unwanted biases, errors and excessive medical costs, which affect the quality of service provided to patients. Many healthcare organizations struggle to make use of the data they collect. Their online transaction processing (OLTP) systems are not integrated for decision making and pattern analysis.

PROPOSED SYSTEM

Knowledge discovery in databases is a well-defined process consisting of several distinct steps.

1. Data mining is the core step, which results in the discovery of hidden but useful knowledge from massive databases.

2. For a successful healthcare organization, it is important to empower management and staff with data warehousing based on critical thinking. Data warehousing can be supported by decision support tools such as data marts, OLAP and data mining tools.

3. With data stored in a multidimensional format, OLAP makes it possible to analyze potentially large amounts of data with very fast response times.

4. OLAP gives users the ability to go through the data and drill down or roll up through the various dimensions defined by the data structure.
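The roll-up and drill-down operations mentioned in the list above can be illustrated with a minimal sketch. The records, the dimension names (`year`, `gender`, `age_band`) and the `aggregate` helper are all hypothetical and are not taken from the paper's system; a real OLAP engine would precompute and index such aggregates.

```python
from collections import Counter

# Hypothetical fact records: one (year, gender, age_band) tuple per case.
CASES = [
    (2013, "F", "40-59"),
    (2013, "M", "60+"),
    (2014, "M", "40-59"),
    (2014, "M", "60+"),
    (2014, "F", "60+"),
]
DIMENSIONS = ("year", "gender", "age_band")


def aggregate(cases, dims):
    """Count cases grouped by the chosen dimensions."""
    idx = [DIMENSIONS.index(d) for d in dims]
    return Counter(tuple(case[i] for i in idx) for case in cases)


# Roll up: summarize along a single dimension.
by_year = aggregate(CASES, ["year"])
# Drill down: add a dimension to see finer detail within each year.
by_year_gender = aggregate(CASES, ["year", "gender"])

print(dict(by_year))
print(dict(by_year_gender))
```

Rolling up and drilling down differ only in how many dimensions are kept in the grouping key, which is why a single helper covers both.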

The proposed system consists of four phases:

- USER ENROLLMENT
- TRAINING SET MAINTENANCE
- STORAGE OF RELEVANT USER PROFILE
- REPORT GENERATION

1.1 USER ENROLLMENT:

In the healthcare environment, data is collected, mined and stored in particular locations. Users can access those locations by performing certain operations and can view the patterns generated. The users are doctors, researchers and members of the public, who can enter the website and are allowed to view the records. Doctors are the only authorized persons who can update, insert, view and delete records. Hospitals must be registered properly; the details of their patients are then mined and records are maintained. This lets users access patient details easily, with a complete and consistent record of patients and their respective diseases. Data mining is performed only for particular diseases and only for registered hospitals. Users can also consult the information provided, such as symptoms, causes, locations, the number of hospitals and brief descriptions.
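The access rules described above can be sketched as a small permission table. This is an illustration only: the `Role` names, the `PERMISSIONS` mapping and the `is_allowed` helper are assumptions, since the paper does not show how the website enforces access.

```python
from enum import Enum


class Role(Enum):
    DOCTOR = "doctor"
    RESEARCHER = "researcher"
    PUBLIC = "public"


# Assumed mapping: only doctors may modify records; other users are read-only.
PERMISSIONS = {
    Role.DOCTOR: {"view", "insert", "update", "delete"},
    Role.RESEARCHER: {"view"},
    Role.PUBLIC: {"view"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the given role may perform the action on patient records."""
    return action in PERMISSIONS.get(role, set())


print(is_allowed(Role.DOCTOR, "delete"))
print(is_allowed(Role.RESEARCHER, "update"))
```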

1.2 TRAINING SET MAINTENANCE:

Hospitals that wish to register need to contact the administrator. The administrator provides login IDs for the doctors working in registered hospitals. Doctors who want to become members of the website but work in non-registered hospitals can also create an account directly. This does not bypass authentication; rather, it widens participation and so supports better knowledge discovery. Such outside doctors register their details and send them to the administrator. The administrator checks the details and, once verification is complete, sends the ID and password to the respective doctor's email address. If any other problem arises, the administrator is contacted immediately; internal operations are handled by the administrator only.
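The verification step for outside doctors might look like the following sketch. The `DoctorApplication` type, the `approve` function and the `doc-` ID format are hypothetical; the paper only states that the administrator checks the details and then emails an ID and password.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoctorApplication:
    name: str
    email: str
    verified: bool = False
    login_id: Optional[str] = None


def approve(app: DoctorApplication, details_ok: bool) -> Optional[str]:
    """Admin step: issue credentials only if the submitted details check out.

    Returns a one-time password to be emailed to the doctor, or None if the
    application is rejected.
    """
    if not details_ok:
        return None
    app.verified = True
    app.login_id = "doc-" + secrets.token_hex(4)  # assumed ID format
    return secrets.token_urlsafe(12)


accepted = DoctorApplication("Dr. A", "a@example.org")
password = approve(accepted, details_ok=True)
rejected = DoctorApplication("Dr. B", "b@example.org")
no_password = approve(rejected, details_ok=False)
```

Credentials are generated with the standard `secrets` module rather than `random`, since they guard access to patient records.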


1.3 STORAGE OF RELEVANT USER PROFILE:

Doctors can log in with their user name and are then able to view patient records. Patient records are inserted and updated by the doctor and entered in the database, so that each patient's record is kept up to date. Hospitals are registered and issued a registration ID, which allows users to view the registered hospitals, their lists of doctors and patient details. This data is used to generate patterns using the Naïve Bayesian classifier. The patterns show the occurrence of diseases and their effects.

1.4 REPORT GENERATION:

The maintained patient records are then used to generate patterns. The patterns show the rate of death due to heart diseases and the associated risk factors. They help the user make the right decision through effective mining of data from a number of hospitals, including the percentage attributable to each cause. The patterns are presented as different charts:

- Year chart
- Age chart
- Gender chart

The probabilities of the states are computed and compared using the Naïve Bayesian algorithm.
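The data behind the year, age and gender charts could be computed roughly as follows. This is a sketch under stated assumptions: the record fields, the age band edges and the `percent_by` helper are illustrative and do not come from the paper, and the four records are made-up toy data.

```python
from collections import Counter

# Hypothetical patient records; field names are illustrative.
records = [
    {"year": 2013, "age": 45, "gender": "M", "heart_disease": True},
    {"year": 2013, "age": 62, "gender": "F", "heart_disease": False},
    {"year": 2014, "age": 58, "gender": "M", "heart_disease": True},
    {"year": 2014, "age": 71, "gender": "F", "heart_disease": True},
]


def age_band(age: int) -> str:
    """Bucket an age into a coarse band (band edges are an assumption)."""
    if age < 40:
        return "<40"
    return "40-59" if age < 60 else "60+"


def percent_by(records, key):
    """Percentage of records with heart disease within each group."""
    total, positive = Counter(), Counter()
    for r in records:
        group = key(r)
        total[group] += 1
        positive[group] += int(r["heart_disease"])
    return {g: 100.0 * positive[g] / total[g] for g in total}


year_chart = percent_by(records, lambda r: r["year"])
age_chart = percent_by(records, lambda r: age_band(r["age"]))
gender_chart = percent_by(records, lambda r: r["gender"])
```

Each resulting dictionary maps a group label to a percentage and can be handed to any charting library.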

II. ARCHITECTURE


1. ALGORITHM:

1.1 PREDICTION ALGORITHM:

Bayesian classification is both a supervised learning method and a statistical method for classification. It assumes an underlying probabilistic model, which allows uncertainty about the model to be captured in a principled way by determining the probabilities of the outcomes. It can solve diagnostic and predictive problems.

1.2 STEPS:

The Naïve Bayes algorithm is based on Bayes' theorem. The steps in the algorithm are as follows:

1. Each data sample is represented by an n-dimensional feature vector, $X = (x_1, x_2, \ldots, x_n)$, depicting n measurements made on the sample from n attributes, respectively $A_1, A_2, \ldots, A_n$.

2. Suppose that there are m classes, $C_1, C_2, \ldots, C_m$. Given an unknown data sample X (i.e., having no class label), the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X. That is, X is assigned to class $C_i$ if and only if:

$$P(C_i \mid X) > P(C_j \mid X) \quad \text{for all } 1 \le j \le m,\ j \ne i$$

Thus we maximize $P(C_i \mid X)$. The class $C_i$ for which $P(C_i \mid X)$ is maximized is called the maximum a posteriori hypothesis. By Bayes' theorem,

$$P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}$$

3. As $P(X)$ is constant for all classes, only $P(X \mid C_i)\, P(C_i)$ need be maximized. If the class prior probabilities are not known, then it is commonly assumed that the classes are equally likely, i.e. $P(C_1) = P(C_2) = \ldots = P(C_m)$, and we would therefore maximize $P(X \mid C_i)$. Otherwise, we maximize $P(X \mid C_i)\, P(C_i)$.
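The steps above can be sketched in a few lines of Python. The listed steps stop before saying how $P(X \mid C_i)$ is estimated, so this sketch relies on the standard naïve assumption that attributes are conditionally independent given the class, so that $P(X \mid C_i)$ is the product of the per-attribute probabilities $P(x_k \mid C_i)$, each estimated by relative frequency. The attribute encoding and the six training samples are made-up toy data, not the paper's dataset, and `train` and `predict` are hypothetical names.

```python
from collections import Counter, defaultdict


def train(samples, labels):
    """Collect class counts and per-attribute value counts from categorical data."""
    class_counts = Counter(labels)
    # value_counts[(class, attribute_index)][value] -> count
    value_counts = defaultdict(Counter)
    for x, c in zip(samples, labels):
        for k, v in enumerate(x):
            value_counts[(c, k)][v] += 1
    return class_counts, value_counts


def predict(x, class_counts, value_counts):
    """Return the class maximizing P(X|Ci)P(Ci), plus the score of every class."""
    n = sum(class_counts.values())
    scores = {}
    for c, count in class_counts.items():
        score = count / n  # prior P(Ci)
        for k, v in enumerate(x):
            # Relative-frequency estimate of P(x_k | Ci); no smoothing applied.
            score *= value_counts[(c, k)][v] / count
        scores[c] = score
    return max(scores, key=scores.get), scores


# Toy attributes: (age band, sex, blood pressure, blood sugar).
samples = [
    ("60+", "M", "high", "high"),
    ("60+", "F", "high", "normal"),
    ("40-59", "M", "high", "high"),
    ("40-59", "F", "normal", "normal"),
    ("<40", "M", "normal", "normal"),
    ("<40", "F", "normal", "high"),
]
labels = ["yes", "yes", "yes", "no", "no", "no"]

class_counts, value_counts = train(samples, labels)
label, scores = predict(("60+", "M", "high", "normal"), class_counts, value_counts)
print(label, scores)
```

Because no smoothing is applied, any attribute value never seen with a class drives that class's score to zero. Laplace smoothing is a common remedy, but it is not described in the paper and is left out here.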

2. SCREEN SHOTS:

[Screenshots not recovered. Surviving labels: heart disease prediction; age, year and gender charts.]


III. CONCLUSION

This paper addressed the problem of constraining and summarizing different data mining algorithms. It focused on using different algorithms for predicting combinations of several target attributes. It presented an intelligent and effective heart attack prediction method using data mining. First, an efficient approach was presented for extracting significant patterns from heart disease data warehouses for the efficient prediction of heart attack. Based on the calculated significant weightage, the frequent patterns with values greater than a predefined threshold were chosen for the prediction of heart attack. All these models could answer complex queries in predicting heart attack. The system classifies the given data into different categories and also predicts the risk of heart disease when an unknown sample is given as input. The system can serve as a training tool for medical students, and as a helping hand for doctors.
