Seoul Fine Dust Data Analysis
TRANSCRIPT
Principles and Practice in Data Mining
2012314261 LEE DONG HEE
Seoul City Weather Data Analysis
2016. 12. 09.
Prof. Seo yuran
INDEX
01 PROBLEM
02 ANALYSIS PROCESS
03 CONCLUSION
01 | PROBLEM
1.1 Background
In recent years, high concentrations of local pollution have occurred due to regional characteristics. It is therefore necessary to analyze the causes scientifically.
1.2 Purpose
The relationship between fine dust and meteorological factors is identified and formulated through statistical techniques, providing a basis for the prediction and management of fine dust in Seoul.
1.3 Data Source
ASOS (Automated Synoptic Observing System): 59 variables
• Temperature
• Wind Speed
• Sunshine
• …
PM10: 1 variable
• Fine dust
▪ Area : Seoul City
▪ Period : 2010 - 2015
▪ Rows : 2190 (365 X 6)
▪ Columns : 60 (59 + 1)
02 | ANALYSIS PROCESS
1. Exploring Data
2. Refining Data
3. Creating Model & Verification
2.1 Exploring Data
Basic Statistics

| | Mean Temperature | Mean Wind Speed | Precipitation | Mean Relative Humidity | Radiation | Sunshine | PM |
|---|---|---|---|---|---|---|---|
| NA | 0 | 0 | 1320 | 0 | 13 | 1 | 8 |
| MIN | -14.5 | 1.1 | 0 | 20.1 | 0.25 | 0 | 3.9 |
| MEDIAN | 14 | 2.5 | 1.5 | 60 | 11.57 | 7.2 | 41.05 |
| MEAN | 12.68 | 2.69 | 10.03 | 60.33 | 12.24 | 6.29 | 45.95 |
| MAX | 31.8 | 7.5 | 301.5 | 99.8 | 29.74 | 13.5 | 658.2 |
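The summary rows above (NA count, minimum, median, mean, maximum) could be reproduced per column with a sketch like the following. It uses only the Python standard library. The function name `summarize` and the sample values are illustrative assumptions, not the original data or code.

```python
import statistics
from typing import Optional, Sequence


def summarize(values: Sequence[Optional[float]]) -> dict:
    """Return NA count, min, median, mean and max, ignoring missing values (None)."""
    observed = [v for v in values if v is not None]
    return {
        "NA": len(values) - len(observed),
        "MIN": min(observed),
        "MEDIAN": statistics.median(observed),
        "MEAN": statistics.mean(observed),
        "MAX": max(observed),
    }


# Hypothetical PM readings with one missing day (not the actual Seoul data).
pm_sample = [41.0, None, 35.5, 52.0, 48.5]
print(summarize(pm_sample))
```

Missing values are excluded before the statistics are computed, which is one plausible way to obtain both an NA count and a non-zero minimum for the same column.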
PM Trend Graph (x-axis: Date, y-axis: PM)
Missing Values (NA) and Outliers
NA count per variable (before ☞ after replacement)
Precipitation : 1320 ☞ 0
Radiation : 13 ☞ 0
Sunshine : 1 ☞ 0
PM : 8 ☞ 0
All missing values are changed to 0, because NA indicates that there is no value, meaning that the meteorological instrument observed nothing.
Boxplot of PM
The outlier value is 658.2, and the next-highest value is 292. The outlier is therefore changed to 300, because an outlier is generally replaced with the upper limit of the data.
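The two cleaning rules described here (NA replaced with 0, values above the chosen upper limit replaced with 300) can be sketched as follows. The function names and the short sample series are hypothetical; only the fill value 0, the cap 300 and the values 292 and 658.2 come from the slide.

```python
from typing import List, Optional


def fill_missing(values: List[Optional[float]], fill: float = 0.0) -> List[float]:
    """Replace missing observations (None) with a fixed value."""
    return [fill if v is None else v for v in values]


def cap_outliers(values: List[float], upper: float = 300.0) -> List[float]:
    """Replace any value above `upper` with `upper`."""
    return [min(v, upper) for v in values]


# Hypothetical PM series containing a missing value and the 658.2 outlier.
pm_raw = [45.0, None, 292.0, 658.2]
pm_clean = cap_outliers(fill_missing(pm_raw))
print(pm_clean)  # [45.0, 0.0, 292.0, 300.0]
```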
Correlation Coefficient
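As a reminder of what is being computed, a Pearson correlation coefficient between PM and one meteorological variable can be sketched as below. This assumes the slide refers to Pearson's r, which is not stated explicitly. The function name and the sample numbers are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily values (not the actual Seoul data).
wind_speed = [1.1, 2.5, 3.0, 4.2, 2.0]
pm = [60.0, 45.0, 40.0, 30.0, 52.0]
print(round(pearson(wind_speed, pm), 3))
```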
2.2 Refining Data
Add a variable : Degree (factor type)

| Fine Dust Level | PM (µg/m3) |
|---|---|
| Good | 0 ~ 30 |
| Normal | 31 ~ 80 |
| Bad | 81 ~ 150 |
| Very Bad | 151 ~ |

Variables after adding Degree: Mean Temperature, Mean Wind Speed, Precipitation, Mean Relative Humidity, Radiation, Sunshine, PM, Degree
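The level table can be turned into the Degree variable with a small mapping function. The sketch below assumes that each upper bound is inclusive and that fractional values between two listed ranges (for example 30.5) fall into the next level. The slide does not specify this, and the function name is hypothetical.

```python
def pm_to_degree(pm: float) -> str:
    """Map a PM concentration (µg/m3) to a fine dust level."""
    if pm <= 30:
        return "Good"
    if pm <= 80:
        return "Normal"
    if pm <= 150:
        return "Bad"
    return "Very Bad"


for value in (12.0, 45.95, 120.0, 300.0):
    print(value, pm_to_degree(value))
```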
Create Standardized Data
Because the scales differ for each variable, the variables are standardized for accurate modeling.
Separate Train Set and Test Set randomly from the dataset
Train Set : 80 %, Test Set : 20 %
• The Train Set is used to train the model.
• The Test Set is used to evaluate the performance of the trained model.
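A minimal sketch of both steps, using only the standard library, is shown below. It assumes z-score standardization with the sample standard deviation and a seeded shuffle for the random split. The slides state neither choice, and the function names and the seed are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score: subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def train_test_split(rows: Sequence, train_ratio: float = 0.8,
                     seed: int = 42) -> Tuple[list, list]:
    """Shuffle row indices reproducibly, then cut at the train ratio."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(len(rows) * train_ratio)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


rows = list(range(2190))  # stand-in for the 2190 daily records
train, test = train_test_split(rows)
print(len(train), len(test))  # 1752 438
```

In practice the mean and standard deviation would usually be estimated on the Train Set only and then applied to the Test Set, so that no information from the test data leaks into the model.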
2.3 Creating Model
Variable Selection
• These variables were selected from the 60 variables in the raw data, based on a relevant paper about weather. Therefore, no general variable selection method was used to choose the variables in this process.
Selected variables: Mean Temperature, Mean Wind Speed, Precipitation, Mean Relative Humidity, Radiation, Sunshine, PM, Degree
Analysis Methods
• PCA
• Multinomial Logistic Regression
• Neural Network
PCA
PCA : Parallel Analysis
Parallel analysis suggests that the number of factors = 3
PCA : 3 Principal Components
• PC1 = 0.08425922*Mean Temperature - 0.01601653*Mean Wind Speed + 0.36003954*Precipitation + 0.52249043*Mean Relative Humidity - 0.51271657*Radiation - 0.57196228*Sunshine
• PC2 = -0.7977584*Mean Temperature + 0.2431814*Mean Wind Speed - 0.1878170*Precipitation - 0.2777519*Mean Relative Humidity - 0.4220239*Radiation - 0.1179781*Sunshine
• PC3 = 0.080656658*Mean Temperature + 0.899196707*Mean Wind Speed + 0.387333525*Precipitation + 0.009005798*Mean Relative Humidity + 0.160962973*Radiation + 0.094458149*Sunshine
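Each principal component score is a weighted sum of the six variables. The sketch below applies the loadings listed above to one day of data, assuming the inputs are the standardized values from section 2.2. The names `LOADINGS`, `pc_scores` and the sample day are hypothetical.

```python
import math
from typing import Dict, List, Sequence

VARIABLES = ["Mean Temperature", "Mean Wind Speed", "Precipitation",
             "Mean Relative Humidity", "Radiation", "Sunshine"]

# Loadings copied from the slide, in the order of VARIABLES.
LOADINGS: Dict[str, List[float]] = {
    "PC1": [0.08425922, -0.01601653, 0.36003954,
            0.52249043, -0.51271657, -0.57196228],
    "PC2": [-0.7977584, 0.2431814, -0.1878170,
            -0.2777519, -0.4220239, -0.1179781],
    "PC3": [0.080656658, 0.899196707, 0.387333525,
            0.009005798, 0.160962973, 0.094458149],
}


def pc_scores(z: Sequence[float]) -> Dict[str, float]:
    """Project one standardized observation onto the three components."""
    if len(z) != len(VARIABLES):
        raise ValueError("expected one value per variable")
    return {name: sum(w * v for w, v in zip(weights, z))
            for name, weights in LOADINGS.items()}


# Hypothetical standardized values for a single day.
z_day = [0.5, -0.2, 0.0, 1.0, -0.8, -1.1]
print(pc_scores(z_day))

# Sanity check: each loading vector should have approximately unit length.
for name, weights in LOADINGS.items():
    print(name, round(math.sqrt(sum(w * w for w in weights)), 4))
```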
a. Multinomial Logistic Regression
b. Neural Network (Basic Variables)
c. Neural Network (Principal Component)
2.3 Verification
Confusion Matrix : Accuracy

| Model | Accuracy |
|---|---|
| Multinomial Logistic Regression | 62.01% |
| Neural Network (Basic Variable) | 62.92% |
| Neural Network (Principal Component) | 68.27% |
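Accuracy is the share of test rows on the diagonal of the confusion matrix. The sketch below shows the calculation on a made-up 4x4 matrix for the four Degree levels. The counts are hypothetical and do not reproduce the reported figures.

```python
from typing import Sequence


def accuracy(confusion: Sequence[Sequence[int]]) -> float:
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Rows: actual level, columns: predicted level
# (Good, Normal, Bad, Very Bad). Hypothetical counts.
confusion = [
    [50, 20, 0, 0],
    [30, 200, 10, 0],
    [0, 40, 20, 2],
    [0, 3, 3, 2],
]
print(f"{accuracy(confusion):.2%}")
```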
03 | CONCLUSION
Correlation analysis showed that the correlation coefficients between PM10 concentration and the meteorological factors ranged from -0.200 to 0.058, i.e. only weak linear relationships.
In the model results, the principal components gave higher prediction accuracy than the basic variables, and the neural network gave higher prediction accuracy than multinomial logistic regression.