TABLE OF CONTENTS
ACKNOWLEDGEMENTS ......................................................................................................... II
LIST OF TABLES ............................................................................................................. VII
LIST OF FIGURES ............................................................................................................ IX
ACRONYMS ...................................................................................................................... XI
CHAPTER 1 INTRODUCTION .......................................................................................1
CHAPTER 2 REVIEW OF LITERATURE ......................................................................3
2.1 WHAT IS AN OUTLIER? ....................................................................................................... 3
2.2 HISTORY OF OUTLIERS ...................................................................................................... 4
2.3 IMPORTANCE OF DETECTING OUTLIERS .......................................................................... 5
2.4 CAUSES OF OUTLIERS ........................................................................................................ 6
2.4.1 OUTLIERS IN SURVEY DATA SETS ........................................................................................... 7
2.4.2 NATURAL VARIATION .......................................................................................................... 9
2.4.3 CONTAMINATION ................................................................................................................. 9
2.5 EFFECTS OF OUTLIERS ..................................................................................................... 10
2.5.1 DAMAGING EFFECTS OF OUTLIERS .................................................................................... 10
2.5.2 BENEFITS OF OUTLIERS IN THE DATA SET ......................................................................... 11
2.6 MASKING AND SWAMPING EFFECTS OF THE OUTLIERS ................................................. 11
2.6.1 MASKING EFFECT............................................................................................................... 12
2.6.2 SWAMPING EFFECT ............................................................................................................ 12
2.7 APPLICATIONS OF OUTLIER DETECTING TECHNIQUES ................................................. 13
2.7.1 FRAUD DETECTION ............................................................................................................ 13
2.7.2 MEDICAL DATA.................................................................................................................. 13
2.7.3 COMMUNITY BASED DISEASES .......................................................................................... 14
2.7.4 SPORTS DATA ANALYSIS ................................................................................................... 14
2.7.5 DETECTING MEASUREMENT ERRORS ................................................................................ 14
2.8 PREVIOUS TECHNIQUES ................................................................................................... 15
2.8.1 TUKEY’S METHOD (BOXPLOT) .......................................................................................... 17
2.8.2 METHOD BASED ON MEDCOUPLE ...................................................................................... 18
CHAPTER 3 SPLIT SAMPLE SKEWNESS .................................................................. 20
3.1 INTRODUCTION ................................................................................................................. 20
3.2 SKEWNESS ......................................................................................................................... 21
3.3 VARIOUS MEASURES OF SKEWNESS ................................................................................ 22
3.3.1 MOMENT BASED MEASURE OF SKEWNESS ....................................................................... 22
3.3.2 PEARSON SKEWNESS .......................................................................................................... 23
3.3.3 QUARTILE SKEWNESS ........................................................................................................ 23
3.3.4 OCTILE SKEWNESS ............................................................................................................. 24
3.3.5 MEDCOUPLE ....................................................................................................................... 24
3.4 SPLIT SAMPLE SKEWNESS (SSS)...................................................................................... 25
3.5 METHODOLOGY: BOOTSTRAP TESTS FOR SKEWNESS ......................................................... 28
3.6 POWER AND SIZE OF THE TEST ....................................................................................... 31
3.7 MERITS OF THE NEW TECHNIQUE .................................................................................. 40
CHAPTER 4 SPLIT SAMPLE SKEWNESS BASED BOX PLOTS ............................... 41
4.1 INTRODUCTION ................................................................................................................. 41
4.2 INTRODUCTION ................................................................................................................. 41
4.3 PROBLEM STATEMENT ..................................................................................................... 43
4.4 PROPOSED TECHNIQUE .................................................................................................... 44
4.4.1 CONSTRUCTION ................................................................................................................. 45
4.4.2 BENEFITS/ADVANTAGES OF SPLIT SAMPLE SKEWNESS ADJUSTED TECHNIQUE .............. 46
4.5 HYPOTHETICAL DATA EXAMPLE .................................................................................... 48
4.6 HYPOTHESIS AND METHODOLOGY ................................................................................. 48
4.7 THEORETICAL APPROACH ............................................................................................... 50
4.8 CONVENTIONAL APPROACH: BEST AND WORST CASE IN CONTEXT OF PERCENTAGE OUTLIERS ............ 56
4.9 COMPARISON OF SSSBB TECHNIQUE WITH KIMBER’S APPROACH ............................. 58
4.9.1 COMPARISON IN SYMMETRIC DISTRIBUTION ........................................................................ 59
4.9.2 COMPARISON IN SKEWED DISTRIBUTIONS ............................................................................ 60
4.10 CONCLUSION ..................................................................................................................... 64
CHAPTER 5 MODIFIED HUBERT VANDERVIEREN BOXPLOT ............................. 65
5.1 INTRODUCTION ................................................................................................................. 65
5.2 PROBLEM STATEMENT ..................................................................................................... 66
5.3 MODIFIED HUBERT VANDERVIEREN BOXPLOT (MHVBP) ........................................... 69
5.4 CONSTRUCTION OF TECHNIQUE BY PROPOSED MODIFICATION .................................. 69
5.5 HYPOTHESIS AND METHODOLOGY ................................................................................. 70
5.6 THEORETICAL APPROACH AND SIMULATION STUDY .................................................... 71
5.7 SIZE OF TESTS ................................................................................................................... 72
5.8 POWER OF THE TEST ........................................................................................................ 73
5.9 CONVENTIONAL APPROACH: BEST AND WORST CASE IN CONTEXT OF PERCENTAGE OUTLIERS ............ 79
5.10 ARTIFICIAL OUTLIER EXAMPLE ..................................................................................... 80
5.11 CONCLUSION ..................................................................................................................... 82
CHAPTER 6 MEDCOUPLE BASED SPLIT SAMPLE SKEWNESS ADJUSTED TECHNIQUE ............ 83
6.1 INTRODUCTION ................................................................................................................. 83
6.2 PROPOSED MODIFICATION .............................................................................................. 84
6.3 MONTE CARLO SIMULATION STUDY .............................................................................. 85
6.4 COMPARISON OF FENCES PRODUCED BY MCSSSBB AND HVBP TECHNIQUES WITH TRUE 95 PERCENT CENTRAL BOUNDARY ............ 85
6.5 CONVENTIONAL APPROACH: BEST AND WORST CASE IN CONTEXT OF PERCENTAGE OUTLIERS ............ 93
6.6 CONCLUSION ..................................................................................................................... 95
CHAPTER 7 APPLICATIONS....................................................................................... 96
SUMMARY ...................................................................................................................................... 96
7.1 STOCK RETURN DATA SET ............................................................................................... 96
7.2 BABY BIRTH WEIGHT DATA ............................................................................................ 99
7.3 COMPARISON OF TUKEY’S TECHNIQUE AND SSSBB IN BABY BIRTH WEIGHT DATA ............ 101
7.4 COST BENEFIT ANALYSIS ............................................................................................... 103
CHAPTER 8 CONCLUSIONS AND RECOMMENDATIONS .................................... 105
8.1 CONCLUSIONS ................................................................................................................. 105
8.2 ADVANTAGES OF SPLIT SAMPLE SKEWNESS BASED BOXPLOT ................................... 106
8.3 ADVANTAGES OF THE MODIFIED HUBERT VANDERVIEREN BOXPLOT ...................... 106
8.4 RECOMMENDATIONS ...................................................................................................... 106
8.5 FUTURE WORK ............................................................................................................... 107
BIBLIOGRAPHY ................................................................................................................... 108
APPENDIX ....................................................................................................................... 112
PREVIOUS TECHNIQUES ............................................................................................. 112