Analysis and Classification of Respiratory Health Risks with Respect to Air Pollution Levels

Ruhul Amin Dicken, North South University, Bangladesh
SNPD 2015

2015/10/26 (Mon.) Chang Wei-Yuan @ MakeLab Lab Meeting

Keywords: data mining; health problem; decision tree; air pollution; respiratory diseases.

Upload: wei-yuan-chang

Post on 11-Apr-2017



Outline
§ Introduction
§ Data Description
§ Methodology
§ Evaluation
§ Conclusion

Introduction
§ Air pollution is the presence of harmful materials that cause adverse effects on human lives.
  – the problem becomes more serious as cities grow
§ Bangladesh is facing this problem because of its continuously increasing population.
  – this study focuses on Dhaka, a city in a developing country

Goal
§ This paper studies the relationship between air pollutants and patient admissions.
  – focused on the case of Dhaka, Bangladesh
  – K-means method: clusters the different air pollutants across seasons
  – CART method: classifies patients according to their rate of admission

Data Description: Air Pollution
§ Air quality data are collected monthly from Dhaka City
  – by CASE (Clean Air and Sustainable Environment)
  – collected: air pollutants and meteorological variables

Column:  stations  time      SO2    NO2    …  solar  rainfall  …
Type:    string    datetime  float  float  …  float  float     …

Data Description: Diseases
§ Respiratory disease data are collected monthly from NIDCH
  – for each disease

Column:  location  time      Age group  COPD     ILD      BroCar
Type:    string    datetime  string     integer  integer  integer

• COPD: chronic obstructive pulmonary disease
• ILD: interstitial lung disease
• BroCar: bronchogenic/bronchial carcinoma

Methodology
§ Clustering using the k-means algorithm
§ Classification using CART analysis

Clustering
§ Air quality data
  – using k-means with k=4

Clustering
§ Air quality data
  – correlation among the air data attributes
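The slide does not say which correlation measure was used. A minimal sketch, assuming the Pearson coefficient and made-up monthly readings (the `air` values and both function names are illustrative, not taken from the paper):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Map every pair of column names to its Pearson correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Made-up monthly readings, for illustration only (not the CASE data).
air = {
    "SO2": [10.0, 12.0, 15.0, 9.0],
    "NO2": [20.0, 24.0, 30.0, 18.0],
    "rainfall": [300.0, 250.0, 100.0, 320.0],
}
matrix = correlation_matrix(air)
```

In this toy data NO2 is exactly twice SO2, so that pair correlates perfectly, while rainfall moves against both pollutants.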

Clustering
§ Respiratory disease admissions data
  – using k-means with k=3
  – High (H), Medium (M), Low (L)

Before clustering:
Column:  location  time      Age group  COPD     ILD      BroCar
Type:    string    datetime  string     integer  integer  integer

After clustering:
Column:  location  time      Age group  COPD   ILD    BroCar
Type:    string    datetime  string     level  level  level
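The conversion from admission counts to levels can be sketched as follows. This is a minimal one-dimensional k-means under stated assumptions: the deterministic initialisation, the function names, and the sample counts are all hypothetical, and the slides do not describe the exact procedure used.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar values; returns the centroids in ascending order."""
    ordered = sorted(values)
    # Deterministic start (an assumption): spread the centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Keep the previous centroid when a group ends up empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)

def to_levels(values, labels=("L", "M", "H")):
    """Replace each count with the label of its nearest centroid (lowest centroid = L)."""
    centroids = kmeans_1d(values, k=len(labels))
    return [
        labels[min(range(len(labels)), key=lambda i: abs(v - centroids[i]))]
        for v in values
    ]

# Made-up monthly COPD admission counts, for illustration only.
copd_admissions = [3, 5, 4, 20, 22, 19, 60, 58, 61]
levels = to_levels(copd_admissions)
```

Sorting the centroids before labelling is what ties the smallest cluster to L and the largest to H.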

Classification
§ The air pollution data are used as features and the clustered medical data as class labels
  – to generate a decision tree that predicts the level of hospital admissions
  – for each age group and each disease

Column:  stations  time      SO2    …  solar  …  diseases
Type:    string    datetime  float  …  float  …  level

Column:  location  time      Age group  COPD   ILD    BroCar
Type:    string    datetime  string     level  level  level
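One way to assemble such a labelled table is to pair each month's pollutant readings with that month's admission level. The sketch below assumes the two tables are joined on the `time` column and that there is a single air-quality series; the field names, the helper `build_training_set`, and all values are hypothetical.

```python
# Hypothetical monthly air-quality records (one row per month).
air_rows = [
    {"time": "2013-01", "SO2": 14.2, "NO2": 31.0, "rainfall": 5.0},
    {"time": "2013-07", "SO2": 4.1, "NO2": 12.5, "rainfall": 380.0},
]
# Hypothetical clustered admissions: one level per month and age group.
admission_rows = [
    {"time": "2013-01", "age_group": "50+", "ILD": "H"},
    {"time": "2013-07", "age_group": "50+", "ILD": "L"},
    {"time": "2013-07", "age_group": "24-49", "ILD": "M"},
]

def build_training_set(air_rows, admission_rows, age_group, disease):
    """Pair each month's pollutant readings with that month's admission level."""
    air_by_month = {row["time"]: row for row in air_rows}
    features, labels = [], []
    for row in admission_rows:
        # Skip other age groups and months with no air-quality record.
        if row["age_group"] != age_group or row["time"] not in air_by_month:
            continue
        air = air_by_month[row["time"]]
        features.append({k: v for k, v in air.items() if k != "time"})
        labels.append(row[disease])
    return features, labels

X, y = build_training_set(air_rows, admission_rows, age_group="50+", disease="ILD")
```

Calling the helper once per age group and disease yields one training set per decision tree.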

Classification
§ The decision trees were generated using three different split criteria
  – (i) Information Gain
  – (ii) Gini Index
  – (iii) Gain Ratio
§ The two best trees were then selected for our results.
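The slides list the three criteria without formulas. A minimal sketch of the standard textbook definitions, assumed here rather than taken from the paper (function names and the toy labels are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

def gini_decrease(parent, children):
    """Gini impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    return gini(parent) - sum(len(ch) / n * gini(ch) for ch in children)

def gain_ratio(parent, children):
    """Information gain divided by the entropy of the split sizes."""
    n = len(parent)
    split_info = -sum(len(ch) / n * math.log2(len(ch) / n) for ch in children if ch)
    return information_gain(parent, children) / split_info if split_info > 0 else 0.0

# Toy admission levels: a split that separates the classes perfectly,
# and one that leaves both children as mixed as the parent.
parent = ["H", "H", "L", "L"]
perfect_split = [["H", "H"], ["L", "L"]]
useless_split = [["H", "L"], ["H", "L"]]
```

A tree builder would evaluate every candidate split with one of these scores and keep the highest-scoring one at each node.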

Evaluation
§ ILD for the 24-49 age group

Evaluation
§ ILD for the 50+ age group

Evaluation
• To be considered applicable to real-world scenarios, a model must have an accuracy higher than 50%.
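The applicability rule above can be sketched as a simple check. The function names and the sample predictions are hypothetical; only the 50% threshold comes from the slide.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true admission level."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def is_applicable(predicted, actual, threshold=0.5):
    """A model counts as applicable only if its accuracy is strictly above the threshold."""
    return accuracy(predicted, actual) > threshold

# Made-up predictions against made-up true levels.
score = accuracy(["H", "M", "L", "L"], ["H", "M", "M", "L"])
```

Note that a model sitting exactly at 50% does not pass, since the rule requires accuracy higher than the threshold.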

Conclusion
§ The COPD and ILD models proved applicable, but the bronchial carcinoma model was not applicable in real life because of its low accuracy.
§ Other factors related to the diagnosis of disease play a more important role; air pollution levels alone are not enough to build a sufficient classification model.

Thanks for listening.
