content


Upload: joanne-wong

Post on 09-Apr-2016


DESCRIPTION

content of thesis

TRANSCRIPT

Page 1: Content


TABLE OF CONTENTS

ACKNOWLEDGEMENTS ......................................................................................................... II

LIST OF TABLES ............................................................................................................. VII

LIST OF FIGURES ............................................................................................................ IX

ACRONYMS ...................................................................................................................... XI

CHAPTER 1 INTRODUCTION .......................................................................................1

CHAPTER 2 REVIEW OF LITERATURE ......................................................................3

2.1 WHAT IS AN OUTLIER? ....................................................................................................... 3

2.2 HISTORY OF OUTLIERS ...................................................................................................... 4

2.3 IMPORTANCE OF DETECTING OUTLIERS .......................................................................... 5

2.4 CAUSES OF OUTLIERS ........................................................................................................ 6

2.4.1 OUTLIERS IN SURVEY DATA SETS ........................................................................................... 7

2.4.2 NATURAL VARIATION .......................................................................................................... 9

2.4.3 CONTAMINATION ................................................................................................................. 9

2.5 EFFECTS OF OUTLIERS ..................................................................................................... 10

2.5.1 DAMAGING EFFECTS OF OUTLIERS .................................................................................... 10

2.5.2 BENEFITS OF OUTLIERS IN THE DATA SET ......................................................................... 11

2.6 MASKING AND SWAMPING EFFECTS OF THE OUTLIERS ................................................. 11

2.6.1 MASKING EFFECT .............................................................................................................. 12

2.6.2 SWAMPING EFFECT ............................................................................................................ 12

2.7 APPLICATIONS OF OUTLIER DETECTING TECHNIQUES ................................................. 13

2.7.1 FRAUD DETECTION ............................................................................................................ 13

2.7.2 MEDICAL DATA ................................................................................................................. 13

2.7.3 COMMUNITY BASED DISEASES .......................................................................................... 14

2.7.4 SPORTS DATA ANALYSIS ................................................................................................... 14

2.7.5 DETECTING MEASUREMENT ERRORS ................................................................................ 14

2.8 PREVIOUS TECHNIQUES ................................................................................................... 15

2.8.1 TUKEY’S METHOD (BOXPLOT) .......................................................................................... 17

2.8.2 METHOD BASED ON MEDCOUPLE ...................................................................................... 18

CHAPTER 3 SPLIT SAMPLE SKEWNESS .................................................................. 20

3.1 INTRODUCTION ................................................................................................................. 20


3.2 SKEWNESS ......................................................................................................................... 21

3.3 VARIOUS MEASURES OF SKEWNESS ................................................................................ 22

3.3.1 MOMENT BASED MEASURE OF SKEWNESS ....................................................................... 22

3.3.2 PEARSON SKEWNESS .......................................................................................................... 23

3.3.3 QUARTILE SKEWNESS ........................................................................................................ 23

3.3.4 OCTILE SKEWNESS ............................................................................................................. 24

3.3.5 MEDCOUPLE ....................................................................................................................... 24

3.4 SPLIT SAMPLE SKEWNESS (SSS) ..................................................................................... 25

3.5 METHODOLOGY: BOOTSTRAP TESTS FOR SKEWNESS ......................................................... 28

3.6 POWER AND SIZE OF THE TEST ....................................................................................... 31

3.7 MERITS OF THE NEW TECHNIQUE .................................................................................. 40

CHAPTER 4 SPLIT SAMPLE SKEWNESS BASED BOX PLOTS ............................... 41

4.1 INTRODUCTION ................................................................................................................. 41

4.2 INTRODUCTION ................................................................................................................. 41

4.3 PROBLEM STATEMENT ..................................................................................................... 43

4.4 PROPOSED TECHNIQUE .................................................................................................... 44

4.4.1 CONSTRUCTION ................................................................................................................. 45

4.4.2 BENEFITS/ADVANTAGES OF SPLIT SAMPLE SKEWNESS ADJUSTED TECHNIQUE .............. 46

4.5 HYPOTHETICAL DATA EXAMPLE .................................................................................... 48

4.6 HYPOTHESIS AND METHODOLOGY ................................................................................. 48

4.7 THEORETICAL APPROACH ............................................................................................... 50

4.8 CONVENTIONAL APPROACH: BEST AND WORST CASE IN CONTEXT OF PERCENTAGE OUTLIERS ............................................ 56

4.9 COMPARISON OF SSSBB TECHNIQUE WITH KIMBER’S APPROACH ............................. 58

4.9.1 COMPARISON IN SYMMETRIC DISTRIBUTION ........................................................................ 59

4.9.2 COMPARISON IN SKEWED DISTRIBUTIONS ............................................................................ 60

4.10 CONCLUSION ..................................................................................................................... 64

CHAPTER 5 MODIFIED HUBERT VANDERVIEREN BOXPLOT ............................. 65

5.1 INTRODUCTION ................................................................................................................. 65

5.2 PROBLEM STATEMENT ..................................................................................................... 66

5.3 MODIFIED HUBERT VANDERVIEREN BOXPLOT (MHVBP) ........................................... 69

5.4 CONSTRUCTION OF TECHNIQUE BY PROPOSED MODIFICATION .................................. 69

5.5 HYPOTHESIS AND METHODOLOGY ................................................................................. 70

5.6 THEORETICAL APPROACH AND SIMULATION STUDY .................................................... 71

5.7 SIZE OF TESTS ................................................................................................................... 72

5.8 POWER OF THE TEST ........................................................................................................ 73

5.9 CONVENTIONAL APPROACH: BEST AND WORST CASE IN CONTEXT OF PERCENTAGE OUTLIERS ............................................ 79


5.10 ARTIFICIAL OUTLIER EXAMPLE ..................................................................................... 80

5.11 CONCLUSION ..................................................................................................................... 82

CHAPTER 6 MEDCOUPLE BASED SPLIT SAMPLE SKEWNESS ADJUSTED TECHNIQUE ............................................ 83

6.1 INTRODUCTION ................................................................................................................. 83

6.2 PROPOSED MODIFICATION .............................................................................................. 84

6.3 MONTE CARLO SIMULATION STUDY .............................................................................. 85

6.4 COMPARISON OF FENCES PRODUCED BY MCSSSBB AND HVBP TECHNIQUES WITH TRUE 95 PERCENT CENTRAL BOUNDARY ............................................ 85

6.5 CONVENTIONAL APPROACH: BEST AND WORST CASE IN CONTEXT OF PERCENTAGE OUTLIERS ............................................ 93

6.6 CONCLUSION ..................................................................................................................... 95

CHAPTER 7 APPLICATIONS ...................................................................................... 96

SUMMARY ...................................................................................................................................... 96

7.1 STOCK RETURN DATA SET ............................................................................................... 96

7.2 BABY BIRTH WEIGHT DATA ............................................................................................ 99

7.3 COMPARISON OF TUKEY’S TECHNIQUE AND SSSBB IN BABY BIRTH WEIGHT DATA ............................................ 101

7.4 COST BENEFIT ANALYSIS ............................................................................................... 103

CHAPTER 8 CONCLUSIONS AND RECOMMENDATIONS .................................... 105

8.1 CONCLUSIONS ................................................................................................................. 105

8.2.1 ADVANTAGES OF SPLIT SAMPLE SKEWNESS BASED BOXPLOT ................................... 106

8.3 ADVANTAGES OF THE MODIFIED HUBERT VANDERVIEREN BOXPLOT ...................... 106

8.4 RECOMMENDATIONS ...................................................................................................... 106

8.5.1 FUTURE WORK ............................................................................................................... 107

BIBLIOGRAPHY ................................................................................................................... 108

APPENDIX ....................................................................................................................... 112

PREVIOUS TECHNIQUES ............................................................................................. 112