![Page 1: Noise and Outlier Detection - IJSkt.ijs.si/petra_kralj/IPS_DM_1516/NoiseAndOutlierDetection.pdf · Motivation Noise in data negatively affect data mining results. (Zhu et al., 2004)](https://reader033.vdocuments.site/reader033/viewer/2022050102/5f41799502e99a14dc7fb1fb/html5/thumbnails/1.jpg)
Noise and Outlier Detection
BORUT SLUBAN
DATA MINING AND KNOWLEDGE DISCOVERY
Anomalies?
Errors in the data – noise
Animals of white color
Exceptions or Outliers
Herd of sheep
Motivation
Noise in data negatively affects data mining results (Zhu et al., 2004).
False medical diagnosis (classification noise) can have serious consequences (Gamberger et al., 2003).
Outlier detection has proved effective in detecting network intrusion and bank fraud (Aggarwal and Yu, 2001).
Detecting noise and outliers
Used for:
Improving machine learning performance through cleaning of training data
Data understanding and knowledge expansion by discovering potentially interesting exceptional cases in data
Detecting noise and outliers
Nature
Follows certain patterns
Adheres to the laws of physics
Is not random
Detecting noise and outliers
Errors and exceptions are:
Inconsistencies with common patterns
Great deviations from expected values
Hard to describe
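To make "great deviations from expected values" concrete, here is a minimal sketch that is not taken from the slides. It assumes a single numeric attribute, treats the median as the expected value, and uses the median absolute deviation (MAD) as the scale. The function name `mad_outliers`, the threshold of 3.5, and the sample readings are all illustrative assumptions.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Assumption for this sketch: the median stands in for the "expected
    value", and the median absolute deviation (MAD) is the scale, so a
    few extreme points cannot inflate it.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread to measure deviations against
    # 0.6745 rescales the MAD so scores are comparable to z-scores
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical sensor readings with one value far from the rest
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 25.0, 10.0]
flagged = mad_outliers(readings)  # indices of suspected exceptions
```

A flagged index only marks a candidate: whether it is an error or an interesting exception still needs a domain expert, which is why such cases are "hard to describe".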
Detecting noise and outliers
Identify the “laws” of the data
Build models
Patterns and rules = “laws” of the data
Errors and exceptions
Do NOT obey the laws (models)
Classification noise filtering
Model the data
What can’t be modeled is considered noise
Can use any learning algorithm
(Brodley & Friedl, 1999)
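The idea above can be sketched as a cross-validated classification filter: train a model on part of the data and flag held-out instances whose predicted label disagrees with the given one. This is a simplified illustration under stated assumptions, not the setup from the cited paper. The nearest-centroid learner, the round-robin fold split, the function names, and the toy data are all hypothetical choices made to keep the sketch dependency-free. Any learning algorithm could take the learner's place.

```python
def nearest_centroid_fit(X, y):
    """Learn one mean vector (centroid) per class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_centroid_predict(model, row):
    """Predict the class whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, model[c]))
    return min(model, key=dist)


def classification_filter(X, y, n_folds=3):
    """Flag instances misclassified by a model trained on the other folds."""
    noisy = []
    for fold in range(n_folds):
        # Round-robin split: instance i belongs to fold i % n_folds
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        model = nearest_centroid_fit([X[i] for i in train_idx],
                                     [y[i] for i in train_idx])
        noisy += [i for i in test_idx
                  if nearest_centroid_predict(model, X[i]) != y[i]]
    return sorted(noisy)


# Toy data: two well-separated clusters, one deliberately mislabeled point
X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [1.1, 1.0],
     [5.0, 5.2], [4.8, 5.1], [5.1, 4.9], [5.0, 5.0]]
y = ["a", "a", "a", "b",  # index 3 lies in cluster "a" but is labeled "b"
     "b", "b", "b", "b"]
suspects = classification_filter(X, y)  # indices considered noise
```

Each instance is judged only by a model that never saw it during training, so a mislabeled instance cannot vouch for itself.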
![Page 10: Noise and Outlier Detection - IJSkt.ijs.si/petra_kralj/IPS_DM_1516/NoiseAndOutlierDetection.pdf · Motivation Noise in data negatively affect data mining results. (Zhu et al., 2004)](https://reader033.vdocuments.site/reader033/viewer/2022050102/5f41799502e99a14dc7fb1fb/html5/thumbnails/10.jpg)
Example Workflow
Ensembles
Combine predictions of various models
To overcome weaknesses or bias of individual models
Averaging, Majority voting, Consensus voting, Ranking, etc.
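As a rough illustration of two of the voting schemes listed above, the sketch below combines the outputs of several noise filters. It assumes each filter returns a set of flagged instance indices. The function name `combine_filters` and the example sets are hypothetical and not part of the original workflows.

```python
def combine_filters(flags_per_filter, scheme="majority"):
    """Combine noise flags from several filters.

    flags_per_filter: list of sets of instance indices, one set per filter.
    scheme: "majority" keeps instances flagged by more than half of the
            filters; "consensus" keeps only those flagged by all of them.
    """
    n = len(flags_per_filter)
    votes = {}
    for flagged in flags_per_filter:
        for i in flagged:
            votes[i] = votes.get(i, 0) + 1
    if scheme == "consensus":
        needed = n
    elif scheme == "majority":
        needed = n // 2 + 1
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return sorted(i for i, v in votes.items() if v >= needed)


# Hypothetical outputs of three different noise filters
filter_outputs = [{3, 7, 12}, {3, 12}, {3, 5}]
majority = combine_filters(filter_outputs, "majority")
consensus = combine_filters(filter_outputs, "consensus")
```

Consensus voting is the more conservative choice: it removes fewer clean instances but may leave more noise in the data, while majority voting trades in the opposite direction.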
![Page 12: Noise and Outlier Detection - IJSkt.ijs.si/petra_kralj/IPS_DM_1516/NoiseAndOutlierDetection.pdf · Motivation Noise in data negatively affect data mining results. (Zhu et al., 2004)](https://reader033.vdocuments.site/reader033/viewer/2022050102/5f41799502e99a14dc7fb1fb/html5/thumbnails/12.jpg)
Example Workflows: Ensembles of noise filters
![Page 13: Noise and Outlier Detection - IJSkt.ijs.si/petra_kralj/IPS_DM_1516/NoiseAndOutlierDetection.pdf · Motivation Noise in data negatively affect data mining results. (Zhu et al., 2004)](https://reader033.vdocuments.site/reader033/viewer/2022050102/5f41799502e99a14dc7fb1fb/html5/thumbnails/13.jpg)
Example Workflows: NoiseRank
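The slides show NoiseRank only as a workflow, so the following is a loose sketch of the ranking idea rather than the actual implementation. It assumes each instance is scored by the (optionally weighted) number of detection algorithms that flag it, and that instances are then presented in descending order of score for expert inspection. The function name, detector names, and weights are hypothetical.

```python
def rank_by_votes(detections, weights=None):
    """Rank instances by the weighted number of detectors that flag them.

    detections: dict mapping a detector name to its set of flagged indices.
    weights: optional dict mapping a detector name to a vote weight
             (defaults to 1.0 per detector).
    Returns (index, score) pairs, highest score first; ties by index.
    """
    weights = weights or {name: 1.0 for name in detections}
    score = {}
    for name, flagged in detections.items():
        for i in flagged:
            score[i] = score.get(i, 0.0) + weights[name]
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical detector outputs
detections = {
    "filter_a": {3, 7, 12},
    "filter_b": {3, 12},
    "filter_c": {3, 5},
}
ranking = rank_by_votes(detections)
```

Unlike a voting threshold, a ranking removes nothing automatically. It leaves the decision about how far down the list to inspect to the user.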
Try it out
Noise filtering using ensembles (with performance evaluation)
http://clowdflows.org/workflow/245/
NoiseRank
http://clowdflows.org/workflow/115/
Clowdflows:
Noise Handling
Orange, Weka classification
Performance evaluation
Need help or advice: [email protected]