introduction to the theory of measure and some concepts of...

120
1 Introduction to the theory of measure and some concepts of descriptive statistics and inferential statistics Lecturer: Giuseppe Santucci (Some slides taken from slideshare.net)

Upload: others

Post on 11-Jan-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

1

Introduction to the theory of measure and some concepts of descriptive statistics

and inferential statistics

Lecturer: Giuseppe Santucci

(Some slides taken from slideshare.net)

Page 2: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

2

Why do we measure?

• We cannot govern what we cannot measure (cf. De Marco, 1982)

• Measures, in the field of Software Engineering, are meant for:– Verifying how far certain parameters of quality are from the

reference values – Identifying deviations from temporal and resource allocation

planning– Identifying productivity indicators– Validating the effect of strategies aimed at improving the

development process (quality, productivity, planning, cost control)

• We measure for monitoring and taking decisions

Page 3: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

3

Pragmatically

Define quality requirements

Inplicit or explicit needs

SW (developed or mainteined)

Select metric Define

referece levels Define evaluation

criteria

Take the measure

Assessment

Check with reference

level

Page 4: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

4

Measurement process

• Rating: definition of reference levels

• Metrics provide quantitative values which do not inherently correspond to quality judgments

• We have to map quantitative data to a qualitative scale

Organization’sresponsibility

Page 5: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

5

Basics of mesure theory

Page 6: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

6

Definition of a measure scale

• A measure scale consists of:

– a set of empirical relations S

• A is taller than B

– a set if numerical relations R

• 180 cm > 160 cm

– a mapping between S and R

• A is taller than B of 20 cm

Page 7: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

7

Measurement Scales

• To measure different variables, we have five measurement scales:

• Nominal Scale

• Ordinal Scale

• Interval Scale

• Ratio Scale

• Absolute Scale

Page 8: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

8

Nominal Scale

• Nominal scale classifies persons or objects into two or more categories

• Members of a category have a common set of characteristics, and each member may only belong to one category

• Other names: categorical, discontinuous, dichotomous (only two categories)

Page 9: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

9

Nominal Scale

• A pre-defined non ordered set of distinct values

– E.g., possible types of programming errors (syntactical, semantic, etc.) without defining a hierarchy of worseness of errors

– Possible operators {= , !=}

– If we use this scale the average value makes sense only if we want to check the frequency by which certain measures fall into certain categories

Page 10: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

10

Ordinal Scale

• Ordinal variables allow us to rank the order of the items we measure in terms of which has less and which has more of the quality represented by the variable, but still they do not allow us to say "how much more“

• Possible operators {=, !=, >, <}

• Example: Ranking students (A,B,C,D), CMMI levels

Page 11: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

11

Ordinal Scale

• Ordinal scale classifies subjects and rank them in terms of how they possess the characteristic of interest

• Members are placed in terms of highest to lowest, or most to least

• Ordinal scales do not, however, state how much difference there is between two adjacent ranks

• On some scale it is assumed that the distance btw the ranks is equal but we have to be careful if we want to compute and use the average

• E.g. Likert scale on a experimental therapy– 1: recovered; 2: light complications; 3: medium complications; 4:

hard complications; 5: death– 50 recovered and 50 death medium complications

Page 12: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

12

Interval Scale

• Interval scales allow us to rank the order of the items that are measured, and to quantify and compare the sizes of differences between them

• For example: – students performance on a spelling test a score of 16 will be

higher than 14 and lower than 18 and the difference between them is 2 points (equal intervals)

• Interval scales normally have an arbitrary minimum and maximum point – e.g. 0 to 20

• A score of 0 in a spelling test does not represent an absence of spelling knowledge, nor does a score of 20 represent perfect spelling knowledge

Page 13: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

13

Interval Scale

• Interval scale requires a precise definition of the unit of measure to be used

• An example of interval scale is the temperature in C° or F°

• Integer or real values• Possible operators {=, !=, <, >, +, -}• The presence of an arbitral zero implies that you

cannot compare the magnitude of two values, e.g., 80 °F is NOT four times hotter than 20° F, while you can say that there are 60° of difference

Page 14: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

14

Ratio Scale

• Very similar to interval scale• It has all the properties of interval scales, and it has an

absolute (not arbitrary) zero point • Height, weight, speed, and temperature in Kelvin degree

are examples of ratio scales • Possible operators {=, !=, <, >, +, -, *, /}• For example, we can say that a person who runs a mile in 5

minutes is twice faster than a person who runs the mile in 10 minutes

• Because ratio scales are often used in physical measurements (where absolute zero exists), they are not often employed in educational research and testing

• Ratio s. Interval s. Ordinal s. Nominal s.

Page 15: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

15

Absolute scale

• In this scale we count the actual occurences of entities

– E.g. Lines of Code (LOC) constituting a program

Page 16: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

16

Choosing a scale

• The choice of a scale depends on the attribute to be measured

• The scale chosen must correspond to a set of relations which are valid for the attribute

• Numerical values associated with the attribute must correspond to the actual empirical relations between the measured entities

• For example, it is not possible to determine if a product is reliable twice or three times, etc. we will choose either an ordinal or a nominal scale

Page 17: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

17

Synopsis of measure scales (1)

Page 18: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

18

Types of measures

Page 19: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

19

Ratio and proportion

• Ratio: The result of a division between two values that come from two different and disjoint domains. The result is multiplied by 100– E.g. , (Males/females) * 100

– It can have values above and below 100

– E.g. , (LOC/lines of comments) * 100

• Proportion: The result of a division between two values where the dividend contributes to the divisor, e.g., a/(a+b)– E.g., n. of satisfied users/n. of users

– It can assume values between 0 and 1

– Often the divisor is composed of various elements for which we want to compute the proportions

• E.g. a+b+c=N; a/N + b/N + c/N = 1

– A fraction is a proportion between real values

Page 20: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

20

Percentage

• A proportion or fraction expressed by normalizing the divisor to 100– E.g., defects in the requirements were 15%, in

the design 25%, in the code 60%

– The percentage must be used by indicating the involved values

– The use of percentage must be avoided when values are less than 30-50

– E.g. defects were 25% in the requirements, 15% in the design, and 60% in the code

– E.g. defects in the project were 20, 5 in the requirements, 3 in the design, and 12 in the code

Page 21: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

21

Rate

• Identifies a value associated with the dynamics of a phenomenon

• Typically it measures the change of a quantity (y) wrt the unity of another quantity (x) on which it depends

• Usually x is the time• E.g., crude rate of births in a certain year

– (N/P)*k• N: births in the observed year• P: population (computed in the middle of the year)• K: a constant, typically 1000

Page 22: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

22

Rate

• Elements of the divisor can become or produce elements of the dividend

– This means there is a “risk exposure”

• It is crude because births are not produced by all the population but only by fertile women

• We can improve the previous formula by using P’ instead of P, where P’ is the number of women btw 15 and 44 years old (not crude)

(N/P')*k

Page 23: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

Software Engineering example

• Defect rate = (D/OFE) *k dove

– D=Defect observed within the observation period

– OFE =Opportunities For Error:

• Software example: Defect rate: (D/KLOC)* 1000 dove

• D=Defect observed within the observation period (e.g., one year)

• It is a "crude“ rate : KLOC do not coincide with OFE

– One defect may rise from more than one LOC

– One LOC may generate more than one error

23

Page 24: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

24

Definition, working definition, and measures

• Before to enter in the scope of the various measures and reference systems we recall some basic notions of the Theory of Measure

• Let’s use an example based on the following statement“the more rigorous the final part of the sw development

process the higher the quality of the sw released to the customer”

• In order to accept/reject such statement we have to better define certain concepts:– Development process: requirements analysis, design,, …,

integration,…., acceptance tests– Final part of the development process: integration and

associated testing

Page 25: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

25

Definition, working definition, and measures

• Rigorous: that adheres to the process documentation (quality manual)

– This is still vague, we need some indicators:

• E.g., if there is an inspection of the code we can use the working

definition of: the percentage of code actually inspected

• For the quality of the inspection we can use a working definition

based on a Likert scale with 5 values

– 1: low quality, … 5: high quality

• Testing rigorousness could be associated with the working definition

of the percentage of tested LOC

• Testing effectiveness could be associated with the working definition

of the number of removed defects per KLOC

– Quality of released software: number errors per KLOC discovered

during the system testing

– Working definitions can be debatable, however they are

– not ambiguous and

– they can be measured

Page 26: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

26

Definition, working definition, and measures

• Now we can rephrase the previous statement through the following hypotheses:

1. The greater the percentage of tested KLOC, the lesser the number of errors per KLOC discovered during the system testing

2. The greater the effectiveness of the inspection the lesser is the number of errors per KLOC discovered during the system testing

3. The greater the efficacy of tests, in terms of discovered errors, the lesser the number of errors per KLOC discovered during the system test

Page 27: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

27

...Definition, working definition, and measures

• The example shows the importance of measures and the need of different levels of abstraction

Theory

Statements

Hypotheses

Data analysis

Concepts

Definitions

Working definitions

Measures

Asbtract level

Empirical level

A set of statements related to concepts

Are “formalized” through definitions and imply one or more working definitions

In order to be validated we need measures formalized by working definitions

Final part of the processaimed at validating the theory. It is substantiated by measuring real world entities

Page 28: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

28

Definition, working definition, and measures

• In order to validate the theory previously formulated we need to:

– introduce a unit of analysis e.g., component, project, etc.

– validate the chosen indicators, i.e., means to collect and interpret measurements

– perform statistical analysis in order to validate the hypotheses e.g., analysis of the variance

Page 29: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

29

Definition, working definition, and measures

• For example, we want to validate our theory through the hypothesis 1.

• We have obtained the following measurements through a unit of analysis consisting of 9 experiments (3 with 50%, 3 with 70%, and 3 with 90%)– Percentage of testes LOC : 50%, 70%, and 90%

– Errors discovered during system testing: 20/KLOC, 15/KLOC, and 12/KLOC respectively

Page 30: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

30

Definition, working definition, and measures

• Without a correct analysis of data e.g. analysis of variance such as ANOVA, we cannot be sure of the statistic significance of results

• For example if 20 is the average of {19 20 21}, 15 of {15 15 15}, and 12 of {11 12 13} we would feel safe

• On the contrary if 20 is the average of {5 5 50}, 15 of {1 4 40} and 12 of {3 3 30}…

• We will come back to this later

Page 31: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

31

Basics of descriptive statistics (recall of main concepts)

Page 32: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

32

Mean, variance, and standard deviation

• Consider a population of n known elements on which we want to perform a measurement

• E.g., the age of students in this class {x1,…,xn}

• We define the following parameters:

– Mean m= (x1+ x2+... +xn)/n

– Variance var=[(x1-µ)2+ (x2-µ)2 +...(xn-µ)2]/n

– Standard deviation s=var1/2

• Usually the variance is indicated by s2

Page 33: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

33

Further observations on data

• Median

• Mode

• Percentile (Quartile)

• Frequency distribution

Page 34: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

34

Percentile, quartile

• A percentile (or centile) is the value of a variable below which a

certain percent of observations fall

• So the 20th percentile is the value (or score) below which 20 percent

of the observations may be found

• A quartile is any of the three values which divide the sorted data set

into four equal parts, so that each part represents one fourth of the

sampled population

first decile

first quartile

median (second quartile)

third quartile

Page 35: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

35

Mean

• Fancy Formula

– µ = Xi/n

Page 36: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

36

Mean

Sample data:

98cm

76cm

82cm

54cm

90cm

How to calculate:

98+76+82+54+90 = 400cm

400cm/5 = 80cm

Page 37: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

37

Median

• It’s the second quartile• It’s the value at the middle of the ordered

distribution, such that the 50% of the elements have a value equal or less than it, and the other 50% has a greater value

• The median is the middle data point in an ordered set

• In order to compute the median we need a scale which is at least an ordinal scale

• To determine the median, sort the data from smallest to largest and find the middle data point

Page 38: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

38

Median

Sample data:

98cm

76cm

82cm

54cm

90cm

Page 39: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

39

Median

Sample data:

98cm

76cm

82cm

54cm

90cm

Rearranged Data:

54cm

76cm

82cm

90cm

98cm

Page 40: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

40

Median

Sample data:

98cm

76cm

82cm

54cm

90cm

Rearranged Data:

54cm

76cm

82cm

90cm

98cm

Page 41: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

41

Median

• If there is an even number of data, there will be two middle points.

• To find the median, take the average of those two points.

Page 42: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

42

Median

Sample Data:

4ml

8ml

12ml

2ml

Page 43: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

43

Median

Sample Data:

4ml

8ml

12ml

2ml

Rearranged Data:

2ml

4ml

8ml

12ml

Page 44: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

44

Median

Sample Data:

4ml

8ml

12ml

2ml

Rearranged Data:

2ml

4ml

8ml

12ml

4 + 8 = 12ml

12/2 = 6ml

Page 45: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

45

Mode

• The mode is the most frequently occurring data point.

• To find the mode, arrange the data from smallest to largest, and then determine which amount occurs most often.

Page 46: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

46

Mode

Sample Data:

20g 23g

30g 30g

22g 27g

25g 20g

23g 24g

23g 25g

20g 23g

Page 47: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

47

Mode

Sample Data:

20g 23g

30g 30g

22g 27g

25g 20g

23g 24g

23g 25g

20g 23g

Rearranged Data:

20g 20g 20g

22g

23g 23g 23g 23g

24g

25g 25g

27g

30g 30g

Page 48: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

48

Range

• The range is the distance between the smallest and largest data point.

• To calculate, determine the smallest data point and the largest data point, then subtract the smallest from the largest.

Page 49: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

49

Range

Sample data:

98cm

76cm

82cm

54cm

90cm

Rearranged Data:

54cm

76cm

82cm

90cm

98cm

Page 50: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

50

Range

Sample data:

98cm

76cm

82cm

54cm

90cm

Rearranged Data:

54cm

76cm

82cm

90cm

98cm

98cm – 54cm = 44cm

Page 51: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

51

Frequency distribution• It is obtained by ordering the values observed and by indicating, for

each of them, the corresponding frequency

• Typically the n elements follow a Normal distribution (or Gaussian)

• The analysis techniques for estimating the distribution of a sample is

beyond the aims of this course

Page 52: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

52

Normal distributionP

mm+1.96s

68.26% of data

95% of data

m+sm-sm-1.96s

X

f(x)=ae-(x-b)2/c2

Usually the Gaussian is put at the center of the Y axis by putting X=X-µ

Page 53: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

53

Some examples of normal distributions

Page 54: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

54

Exercise on statistical characterization of a dataset

Page 55: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

55

Reference parameters

• Let S a multi-set of n values belonging to a scale at least ordinal

• Let x1…xn a partial order on such values• In order to characterize such data we can compute:

– The mean µ– The standard deviation s and the variance s2

– The maximum and the minimum value x1 and xn

– The median Me (Q2)– The first and the third quartile: Q1 and Q3– The mode– The frequency distribution

• You can compute all this values with the help of Excel

Page 56: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

56

Mean, standard deviation, variance, and median

• Mean, standard deviation, and variance computation is banal– µ= (x1+ x2+... +xn)/n– s2=[(x1-µ)2+ (x2-µ)2 +...(xn-µ)2]/n– s=(s2)1/2

• If collected data are odd then the median is the number which divides into two parts the ordered list of data:– 1 3 7 9 12 25 77 : Me = 9

• If the collected data are even then the median is the mean of the two central numbers:– 1 3 7 9 12 25 77 99 : Me = (9+12)/2=10.5

Page 57: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

57

The kth percentile

• The index of the kth percentile is given by: Ik = (n+1)*k /100

• From this index you can compute the exact number by taking the index of the two integers before and after Ik

• E.g., n=14. Let’s compute the 23th percentile– I23 = (14+1)*23 /100 = 3.45– The value of the 23th percentile is between the 3rd and

the 4th value (x3 and x4)– Numerically its value is x3 + (x4 – x3) * 0.45 (linear

interpolation)

Page 58: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

58

Example

• Given the following n=10 values

7 7 10 15 23 27 29 35 47 99

– Already ordered for the sake of simplicity

• Compute the mean, the 1st quartile, the median, and the 3rd quartile

Page 59: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

59

Example

• Given the following n=10 values 7 7 10 15 23 27 29 35 47 99– Already ordered for the sake of simplicity

• Compute the mean, the 1st quartile, the median, and the 3rd quartile

• The mean:µ=(7+7+10+15+23+27+29+35+47+99)/10=299/10=29.9

• The index of the 1st quartile Q1 is:– I25 = (10+1)*25 /100 = 2.75 --> Q1 is between the 2nd and the 3rd values

• Hence, Q1= x2 + (x3 – x2) * 0.75 = 7+(10-7) * 0.75 =9.25

• The median Me (or Q2) is (23+27)/2=25

• The index of the 3rd quartile Q3 is:– I75 = (10+1)*75 /100 = 8.25 -> Q3 is between the 8th and 9th values

• Hence Q3= x8 + (x9 – x8) * 0.25 = 35+(47-35) * 0.25 =38

Page 60: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

60

Graphical representation of parameters : Boxplot

Page 61: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

A more interesting example

• 800 SE grades

• Mean= 24,9

• Are you happy with that?

61

Page 62: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

62

Quality of a measure

Page 63: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

63

Reliability of a measure

• Reliability is the consistency of your measurement– The degree to which an instrument measures the same

way each time it is used under the same condition with the same subjects

– It is the repeatability of your measurement – A measure is considered reliable if a person's score on the

same test given twice is similar– It is important to remember that reliability is not

measured, it is estimated– Typically you can characterize this quality aspect by

analyzing the variance s2 of repeated measures of the same value

– The smaller s2 the more reliable the measure

Page 64: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

64

Validity of a measure

• Validity is the strength of our conclusions, inferences or propositions

• Is the measure measuring what we actually are looking for?

• The best available approximation to the truth or falsity of a given inference, proposition or conclusion. Cook and Campbell (1979)

• In short, were we right?

• E.g. We want measure the comprehension of my classes• We can count the number of questions and use it as an indicator

• No questions means full understanding?

• For more concrete measure it coincides with accuracy– E.g., weight, volume

Page 65: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

65

Reliable and not

validValid and not

reliable

Reliable and

valid

Reliability and validity of a measure

Page 66: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

66

Errors in measuring• The result of a measure is a real number M which should

capture the true value T of the phenomenon under analysis

• Experiences indicate that if we perform more measures of

the same quantity rarely we obtain equal values

– The measured values (M) are always different from the true value T

• The difference btw the measured value and the true one is

called total error (ET)

M = T + ET

MeasureTrue value

Total error

Page 67: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

67

Errors in measuring

• By performing a measure we cannot determine with certainty the true value of the measured quantity, we produce an estimation

– We have to consider the types of error in measuring

ET = Esystematic+ Erandom

They are typically present in statistical methods

Esystematic

Erandom

Influences validity

Influences reliability

Page 68: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

68

Errors in measuring• Systematic errors influence validity• They occur constantly• E.g. , a scale which configuration was wrong and adds always 1kg

more than the true weight

– measure= T + 1kg + random variation : M= T + Es + Er• The measure is not valid

• If we assume that there can be only Er we have : M= T + Er– If Er is really due to a random event its contribution on the average

can be ignored (expected value E(Er)=0) – The mean of an infinite number of measures (observations) is E(M)=T

hence the measure is valid– A technique that exploits this principle is to repeat the measure N

times and compute the mean

Page 69: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

69

Errors in measuring

• What is this effect of a random error on the reliability?• Intuitively, the smaller the error the less the influence

M = T + Er var(M) = var(T) + var(Er)

• The reliability of a measure is the ratio between the variance of the measured quantity and the variance of the metric

rm=var(T)/var(M) = [var(M)-var(Er)]/var(M)= 1- [var(Er)/var(M)]

• Reliability value is between 1 and 0 (1 is the best value)

• In summary– Systematic errors influence validity– Random errors influence reliability

Page 70: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

70

Evaluation of reliability

• We can evaluate reliability in various ways• Reliability is defined as var(T)/var(M)• In the context of software engineering we use the double test• The idea is to repeat many times two measurements

• M1 = T + Er1• M2 = T + Er2

• And compute the correlation between the various measures which gives us the reliability of the measure

– rm = Correlation(M1,M2)= var(T)/var(M)

• Software engineering automatic measurements have reliability 1: this method makes sense only for manual activities such as inspections

• For example two persons can perform the same inspection e.g., count deviations from standard coding for a certain Java class using a checklist. By repeating this pairwise inspection many times we can compute the reliability of the inspecting method (or of the checklist)

Page 71: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

71

Correlation

• Indicates if a relationship btw two variables holds

• The most popular correlation is Pearson’s which can have values btw -1 (negative correlation) and +1 (positive correlation)

• Only for linear relations

Page 72: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

Correlation

72

Page 73: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

Correlation with Excel

73

Page 74: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

74

Inferential statistics

Page 75: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

75

Reference parameters

• We want to analyze a population of M elements (M is unknown) through a sample of N elements {x1, … ,xN}.

• We identify the following parameters– Mean (sample)µ= (x1+ x2+... +xN)/n– Variance (sample) var=[(x1-µ)2+ (x2-µ)2 +...(xN-µ)2]/(n-1)– Standard deviation (sample) s=var1/2

– Percentile /median / mode

• These parameters are random variables which value depends on the causality of the sample

• Typically (hopefully) the elements of the sample have a normal distribution (Gaussian, bell-shaped) and we can perform estimation

Page 76: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

76

The problem

• We work with inferential statistics (vs. descriptive)

• We want to infer properties by using a sample of the data

• The statistical characterization of our sample e.g., the mean, is different from the actual data’s one– The greater the size of the sample the lower the

difference

• What is the trend of this difference?

• Inferential statistics allows us to estimate the error

Page 77: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

77

Confidence interval

• Under the assumption of a normal distribution we can estimate the probability that the mean of a sample of n elements differs from the actual mean more than a certain quantity

• Such difference is related to the probability of error that we can tolerate

– The error probability is proportional to the standard deviation of the sample and inversely proportional to the size of the sample

Page 78: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

78

Confidence interval• A confidence interval (CI) is a particular kind of interval estimate of a

population parameter

• Instead of estimating the parameter by a single value, we use an interval that is likely to include such a parameter

• Confidence intervals are used to indicate the reliability of an estimate

• How likely the interval is to contain the parameter is determined by the confidence level or confidence coefficient

• Increasing the desired confidence level will widen the confidence interval

• A confidence interval is always qualified by a particular confidence level, usually expressed as a percentage

– E.g., 95% confidence interval

• The end points of the confidence interval are referred to as confidence limits

Page 79: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

79

Confidence interval• If the value of a parameter using a sample is X, with

confidence interval [a,b] at confidence level P, then any value outside the interval [a,b] will be significantly different from X with a significance level α = 1 − P

• A typical value for P is 95%

• For P=95% the mean of the sample will differ from the true mean of a quantity that can be calculated by the following formulas

– 2.77*s/N1/2 if N=5

– 2.26*s/N1/2 if N=10

– 2.09*s/N1/2 if N=40

– 1.96*s/N1/2 if N ”large" (implemented in Excel)

Page 80: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

80

Confidence interval95% of observed data

m-1.96 s m+1.96 sm

1.96*s/N1/2 if N is large

Page 81: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

81

Example

• We have collected 10 exam grades 26, 21, 29, 26, 21, 28, 27, 26, 29, 272.26*s/N1/2 (N=10)

• µ= 26 • s= 2.87• 2.26*s/101/2 = 2.05• The true mean (26.29), with a confidence level of 95%, falls within [26-

2.05, 26+2.05] = [23.95, 28.05]

• If we collect 20 scores• µ= 26.75 • s= 3.02• 2.26*s/201/2 =1.41• The true mean (26.29), with a confidence level of 95%, falls within [26.75-

1.41, 26.75+1.41] = [25.34, 28.16]

Page 82: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

82

Hypotheses verification

Page 83: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

83

Hypotheses verification

• Often we need to compare different repeated measures – e.g., results coming from different methods

• We can perform appropriate statistic tests• A statistic test consist of challenging the hypothesis

that the means of different samples are the same• The hypothesis that all true means are equal indicates

that we assume that all observed differences are random– This hypothesis is called null hypothesis

• The test is performed by fixing a priori the probability of having an error (α)

Page 84: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

84

Hypotheses verification

• A statistical hypothesis is a statement on the distribution of one or more random variables

• It is indicated by the letter H• We pose the following question

– Given two samples of the same measure, Ca and Cb, which is the probability that they come from the same population?

• We compare two alternative opposite hypotheses– H0 (null hypothesis).

H0: a = b• is the parameter of interest e.g., the mean

– H1 (alternative hypothesis). H1: a != b

• With a value for the error probability (α)• The typical limit value for α is 0.05:

– α> 0.05 -> we accept H0– α<=0.05 -> we accept H1

• α is the probability of rejecting H0 when it is true

Page 85: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

85

Example

• We have developed two interfaces for the same application. We want to understand which one is the perceived as the best one by the users

• We interview 7 users who have used prototype A and 7 users that have used prototype B

• We analyze the answers to the questions which are associated with a ratio scale from 1 to 6 (1 low degree of satisfaction, 6 high degree of satisfaction)

• We observe the following results

286,37/)2661161(. am

143,37/)1426135(. bm

Page 86: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

86

Two sample means

• Do the samples come from the same population?

• A rough answer using the confidence interval

m1

m2

m1

m2m1

m2

Two means

coming from two

samples

Two confidence intervals

that likely share

the same mean

Two

confidence

intervals

that likely

DO NOT

share the

same mean

Page 87: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

87

Example

• Is the difference significant?

• Performing a statistical test implies the risk of making errors

• In practice we have two types of errors (and their associated probability)

– Reject H0 when it is true, 1st type error

– Accept H0 when it is false, 2nd type error

• Additionally we define

– Protection (1- α): probability to accept H0 when it is true

Page 88: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

88

T-test• A technique that allows to compare the difference between two

mean values• It exploits a comparison between means and standard deviation

• The actual t value can be compared with the distribution of t assuming that a and b come from the same population– It is expressed by a table that indicates the (α) that the observed

difference between the means (and, obviously, lower values) is random, according to the degree of freedom (sample dimension)

– Given N samples and the mean we have N-1 degree of freedom for a single sample:

µ= (x1+ x2+... +xN)/N

126,0||

22

n

t

ba

ba

ss

mm

Page 89: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

89

t-test table

t distribution for 10 degrees of freedomDistinguish two-tails (α/2) from one-tail (α)

t

α

Page 90: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

90

t-test table (Gaussian!)

• Compute the degrees of freedom: (7-1)+(7-1)=12

• Look at the corresponding row the value of the probability (α/2 in case of two-tail): 0.025-->2.179

• The value of t, 0.126 is less or equal than 2.179 hence α is greater than 0.05 we have to choose H0

126,0||

22

n

t

ba

ba

ss

mm

286,37/)2661161(. am

143,37/)2425135(. bm

Any value lower than 2.179 is casual (α=0.05)

Page 91: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

91

T-test with Excel

Page 92: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

92

Another example

714,37/)3444443(. am

143,37/)3433333(. bm

449,2||

22

n

t

ba

ba

ss

mm

Page 93: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

93

t-test table

714,37/)3444443(. am

143,37/)3433333(. bm

449,2||

22

n

t

ba

ba

ss

mm

• Compute the degrees of freedom: (7-1)+(7-1)=12

• Look at the corresponding row the value of the probability (α/2 in case of two-tail): 0.025-->2.179

• The value of t, 2.449 is greater than 2.179 hence α is smaller than 0.05 we can reject H0

Any value greater than 2.179 is NOT casual (α=0.05)

Page 94: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

94

T-test with Excel

Page 95: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

95

p-value

• A value of p equal to 0.05 indicates 5% probability that the difference between the observed means is random

• 0.05 is considered a boundary value:• A measurement is considered valid for values of p<= 0.05

• If:– p<=0.005 the measurement is classified as statistically

significant– p<=0.001 highly significant

• These values are arbitrary, although they are widely used

Page 96: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

96

Probability of error

• If we accept H1 the probability to be wrong is α or less (it depends on the actual p value)

• If we accept H0 the probability to be wrong is (1-α) or more

Page 97: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

97

What does it happen if we have more than two samples?

• If we perform [n*(n-1)/2 ]=3 t-test with a=0.05 the 0.95 probability quickly degrades 0.95*0.95*0.95= 0.86

• With n=5 we need 10 comparisons....

• For n>2 samples we use a technique based-on analysis of variance (ANOVA)

• ANOVA is conceptually similar to t-test, but it considers all means at once

Page 98: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

Analysis Of VAriance

• The analysis of variance (ANOVA) is a statistic test aimed at verifying hypothesis

• Such techniques allow to compare two or more samples through comparing the internal variability within the groups (Varw) with the variability between the groups (VarB)

• The null hypothesis assumes that data of all groups have the same distribution, and that any observed difference is casual

• The idea is that if the variability within the groups is much higher than the variability between the groups than the observed difference is caused by the internal variability.

• The most popular and known set of techniques is based on comparing the variance, and uses the random Snedecor variable F (similar to the t variable for the t test)

• Notice: t-test and Anova on two groups are perfectly equivalent

98

Page 99: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

ANOVA hypotheses

• ANOVA is a general technique that can be used to test thehypothesis that the means among two or more groups areequal, under the assumption that the sampled populationsare normally distributed.

The hypotheses are the followings:

– H0: μ1 = μ2 =…μK

– H1: at least two among the means are different

99

Page 100: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

100

How ANOVA works

• We have

I > 2 different samples: {C1, … ,CI }

• Each sample is assumed to have the same number J of objects (although this is not mandatory)

Yij is the j-th observation on the i-th sample

Where:

II

i

i /)(1

mm

JYJ

j

iji /)(1

m− Mean of sample i :

− General mean:

Page 101: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

101

Fisher test

• The random Snedecor variable F :

• Once the degrees of freedom are known (numerator and denominator) it ispossible to evaluate the probability associated with the values of F

• For a fixed this test tells us whether to accept the null hypothesis ( F<F(I-1),(I(J-1)) ) orreject it ( F>F(I-1),(I(J-1)) )

)]1(/[

)1/(

JISS

ISSF

W

B

))1((,1 JIIF

Page 102: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

102

Fisher test

• Decision criterion:

Page 103: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

Example

F-Distribution: criticalvalue at 0.05

numerator

denominator

Page 104: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

104

Example (2 samples for the sake of simplicity)

28,37/)2661161( am

14,37/)1426135( bm

21,314/)24251352661161( m

2222222 )28,32()28,36()28,36()28,31()28,31()28,36()28,31(WSS

0714,0])21,314,3()21,328,3[(7 22 BSS

29,62)14,31()14,34()14,32()14,36()14,31()14,33()14,35( 2222222

01376,0)]17(2/[29,62

)12/(0714,0

F 75,412,1 F<<

Accept Ho

I=2

J=7

Page 105: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

105

Excel example

Page 106: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

106

Analysis of variance (ANOVA) on tested KLOC and trend of defects

• Most often the attempt of demonstrating an hypothesis substantiates in searching a relation between two variables: if we change A then B changes (following a certain rule)

• For example: we try to demonstrate that if the proportion KT of tested KLOC increases then the trend of defect DR in the first year after the release decreases

• Proportion KT=(tested KLOC)/KLOC (for the sake of simplicity expressed as percentage)

• Trend DR=(D/KLOC)*k (let k be 1)

Page 107: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

107

Dependent and independent variables

• In such analyses we call

– Independent variables those ones that are manipulated to the aim of verifying a hypothesis

– Dependent variables those ones that are observed and that depend (hopefully) on independent ones

• In our example

– KT= independent variable

– DR= dependent variable

Page 108: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

108

Not significant

Page 109: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

109

Highly significant

Page 110: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

110

Our sample set

• Let’s assume that in the considered software house we observe testing activities for three different fixed percentage values:

– 50 % 70% 90%

• and each test activity has been performed on 5 software packages for one year.

• We compute the mean of DR and obtain the following table

Anova allows us to evaluate the probability

that the difference between

the means is random

Page 111: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

111

Post hoc test (1)

• The experiment just performed tells us that the three samples do not belong to the same population

• The probability that the three samples belong to the same population is a=0.000166

• This does not imply that they belong to threedifferent populations! For example, the 70% group and the 90% group could have the same mean…

• It is necessary to compare the single pairs n*(n-1)/2=3

• The problem has been studied in the last years and yet there is not a definitive solution

Page 112: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

112

Post hoc test : Fisher protected test

• This is the method most often applied

• Pairwise tests are used only after ANOVA has confirmed the significance f differences

• For example

1. 50, 70 and 90 do not have the same mean (p=0.000166) Anova at 3

2. 50 and 70 have different means (P1=0.012) Anova at 2 or t-test

3. 70 and 90 have different means (P2=0.010) Anovaat 2 or t-test

4. 50 and 90 have different means (P3=0.00037) Anovaat 2 or t-test

• In this case we can say that 50, 70, and90 have all different means:

– (1-P1)(1-P2)(1-P3)=0.977 and

– P123=1-0.977=0.022

50-70

70-90

50-90

Page 113: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

113

Exercise with ANOVA

Page 114: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

114

Experiment on naming convention

• The quality management system of Acme software house is studying the influence of naming convention on the readability of code, with special attention to the integration phase

• More specifically, the quality management system set up 3 different techniques of naming convention NC1, NC2, NC3 and wants to investigate ifs there is a relationship between the applied technique and the number of errors identified during the integration test phase

• Design an experiment for validating this hypothesis

Page 115: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

115

Dependent and independent variables Structure of the experiment

• Independent variable: type of adopted naming convention technique (NC1, NC2, NC3)

• Dependent variable: number of errors found during the integration test /KLOC

• The different techniques for naming convention are used in three similar projects (an attempt to minimize the influence of other factors) A, B, C.

• The developers involved in and the methodology followed by the three projects are similar

• During the integration phase are anlysed: 6 classes of project A, 6 classes of project B, and six of project C:– A (15, 14, 20, 22, 19, 20) mean 18.3 var 9.9– B (23, 25, 22, 28, 19, 20) mean 22.8 var 11.0– C (15, 16, 15, 18, 16, 22) mean 17.0 var 7.2

Page 116: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

116

3 hypotheses for applying ANOVA

• 1 ) Independence– k samples taken random from k populations

• 2) Normal distribution– K samples with normal distribution– or symmetry between distributions of small samples

(boxplot)– or big large sample (n>30) : central limit theorem– Alternatively: non-parametric tests

• 3) Same variance– k samples with same variance– Levene test (not supported by Excel!)– or empirical test (Excel)

Page 117: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

117

Normal distribution?

Reasonably symmetric wrt the mean

Page 118: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

118

The central limit theorem

• Let X1, X2, X3, …, Xn be a sequence of n independent and identically distributed random variables each having finite values of expectation µ and variance σ2 > 0.

• The central limit theorem states that as the sample size n increases the distribution of the sample average of these random variables approaches the normal distribution with a mean µ and variance σ2/n irrespective of the shape of the common distribution of the individual terms Xi.

• We define the random variable Sn given by:

• x is the arithmetic average between xj

• Snwill converge in distribution to the standard normal distribution, with expected value 0, and variance 1, as n approaches infinity

Page 119: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

119

Let’s assume the hypotheses are satisfied Anova single factor

Null hypothesis Ho-> µa=µb=µc

We can reject the null hypothesis (P=0.00834)Hence at least one mean is different

Page 120: Introduction to the theory of measure and some concepts of ...santucci/SW_Engineering/Material/00_theoryOf... · Lecturer: Giuseppe Santucci ... • We cannot govern what we cannot

120

Post hoc test : Fisher protected test

• After a ANOVA test that reject the null hypothesis we can perform n(n-1)/2 t-test

B>AB>Cµa= µc

Hence we can discard BWe don’t know what to chose between A and C

ok: A and B have different means

ok: B and C have different means

I have to accept the null hypothesis