
Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression

Statistics, Visualization and More Using "R" (298.916)

Block I+II: Distributions & simulations, loops and functions, linear regression

Ass.-Prof. Dr. Wolfgang Trutschnig

Research group for Stochastics/Statistics
Department for Mathematics

University Salzburg

www.trutschnig.net

Salzburg, March 2017

Wolfgang Trutschnig

Statistics, Visualization and More Using ”R” (298.916)


Plan for today:

- Some standard probability distributions (uniform, exponential, normal) which will be used throughout the seminar
- Generating samples from these distributions
- Loops and if/ifelse
- Writing your own R functions
- Pearson vs. Spearman (rank) correlation
- First steps in linear regression
- Exercises


Uniform distribution U(a, b)

- Range: interval [a, b] (with a < b).
- The density f is given by
  $$f(x) = \frac{1}{b-a}\,\mathbf{1}_{[a,b]}(x).$$
- For X ∼ U(a, b) we have
  $$E(X) = \frac{a+b}{2}, \qquad V(X) = \frac{(b-a)^2}{12}.$$
- Where does this distribution naturally appear?
- Generate a sample of size n = 10,000:

    n <- 10000
    x <- runif(n, min = -1, max = 1)
    hist(x, probability = TRUE)

Figure: Histogram of a sample of size 10,000 from X ∼ U(−1, 1)
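The formulas for E(X) and V(X) can be checked numerically. A minimal sketch, assuming the same parameters a = −1, b = 1 as in the histogram; the seed and the names mean_theo and var_theo are illustrative choices and not part of the slides.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
a <- -1; b <- 1
n <- 10000
x <- runif(n, min = a, max = b)

# theoretical moments of U(a, b)
mean_theo <- (a + b) / 2
var_theo  <- (b - a)^2 / 12

# empirical counterparts; they should be close to the theoretical values
c(empirical = mean(x), theoretical = mean_theo)
c(empirical = var(x),  theoretical = var_theo)
```

The larger n is chosen, the closer the empirical values should get to the theoretical ones.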


Exponential distribution E(λ) (λ > 0)

- Range: interval [0, ∞).
- The density f is given by
  $$f(x) = \lambda e^{-\lambda x}\,\mathbf{1}_{[0,\infty)}(x).$$
- For X ∼ E(λ) we have
  $$E(X) = \frac{1}{\lambda}, \qquad V(X) = \frac{1}{\lambda^2}.$$
- Where does this distribution naturally appear?
- Generate a sample of size n = 10,000:

    n <- 10000
    lambda <- 3
    x <- rexp(n, rate = lambda)
    hist(x, probability = TRUE)

Figure: Histogram of a sample of size 10,000 from X ∼ E(3)
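To see how well the histogram matches the density, the density can be drawn on top of it. A minimal sketch, assuming λ = 3 as above; the seed, the number of breaks and the colour are arbitrary choices.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 10000
lambda <- 3
x <- rexp(n, rate = lambda)

# histogram of the sample with the density lambda * exp(-lambda * x) on top
hist(x, probability = TRUE, breaks = 50, main = "", xlab = "x")
curve(lambda * exp(-lambda * x), from = 0, to = max(x), add = TRUE, col = "red")

# sample mean and variance next to E(X) = 1/lambda and V(X) = 1/lambda^2
c(mean(x), 1 / lambda)
c(var(x), 1 / lambda^2)
```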


Normal distribution N(µ, σ²)

- Range: R
- Density of N(µ, σ²):
  $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
- For X ∼ N(µ, σ²) we have
  $$E(X) = \mu, \qquad V(X) = \sigma^2.$$
- The most important case is X ∼ N(0, 1), for which we have E(X) = 0, V(X) = 1.
- Generate a sample of size n = 10,000:

    n <- 10000
    mu <- 0; sigma <- 1
    x <- rnorm(n, mean = mu, sd = sigma)
    hist(x, probability = TRUE)

Figure: Histogram of a sample of size 10,000 from X ∼ N(0, 1)
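Note that rnorm is parameterized by the standard deviation σ (argument sd), not by the variance σ². As a quick plausibility check of the sample, the share of observations within one, two and three standard deviations can be compared with the theoretical probabilities. A minimal sketch; the seed and the names emp and theo are illustrative.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 10000
x <- rnorm(n, mean = 0, sd = 1)

# empirical share of observations with |x| <= k for k = 1, 2, 3
emp  <- sapply(1:3, function(k) mean(abs(x) <= k))
# theoretical probabilities P(|X| <= k) = 2 * Phi(k) - 1
theo <- 2 * pnorm(1:3) - 1

round(rbind(empirical = emp, theoretical = theo), 4)
```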


Normal distribution

- Why 'normal' distribution?

Example (Coin tossing)

- A fair coin is tossed n times.
- x_1, x_2, ..., x_n denote the results (0 or 1).
- Calculate the standardized value
  $$z := 2\sqrt{n}\,(\bar{x}_n - 0.5) = \sqrt{n}\,\frac{\bar{x}_n - 0.5}{\sqrt{0.5 \cdot 0.5}}.$$

    n <- 100
    x <- sample(c(0, 1), n, replace = TRUE)
    z <- sqrt(n) * (mean(x) - 0.5) / sqrt(0.5 * 0.5)

- Repeat R = 20,000 times and plot the histogram of z_1, ..., z_R.

Figure: Histograms of z_1, ..., z_R for n = 30, n = 100, n = 500 and n = 5000 (axes: x, frequency).
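The snippet above computes a single value z. One possible way to carry out the R = 20,000 repetitions is sketched below; the helper name one_z and the use of replicate are choices made here for illustration, the slides introduce explicit for-loops for this purpose later on.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
R <- 20000    # number of repetitions
n <- 100      # number of tosses per repetition

# one standardized mean of n fair coin tosses (hypothetical helper)
one_z <- function(n) {
  x <- sample(c(0, 1), n, replace = TRUE)
  sqrt(n) * (mean(x) - 0.5) / sqrt(0.5 * 0.5)
}

# repeat the experiment R times and collect the results in a vector
z <- replicate(R, one_z(n))

hist(z, probability = TRUE, xlim = c(-4, 4), main = paste0("n=", n),
     xlab = "x", ylab = "frequency")
curve(dnorm(x), add = TRUE, col = "red")   # N(0, 1) density for comparison
```

Changing n to 30, 500 or 5000 gives the other panels of the figure.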


Normal distribution

Example (Rolling a die)

- A die is rolled n times.
- x_1, x_2, ..., x_n denote the results.
- Calculate the standardized value
  $$z := \sqrt{n}\,\frac{\bar{x}_n - 3.5}{\sqrt{35/12}}.$$

    n <- 100
    x <- sample(1:6, n, replace = TRUE)
    z <- sqrt(n) * (mean(x) - 3.5) / sqrt(35 / 12)

- Repeat R = 20,000 times and plot the histogram of z_1, ..., z_R.

Figure: Histograms of z_1, ..., z_R for n = 30, n = 100, n = 500 and n = 5000 (axes: x, frequency).


Normal distribution

Example (Uniform distribution)

- Suppose that X ∼ U(0, 1) and that x_1, x_2, ..., x_n denote a sample of X.
- Calculate the standardized value
  $$z := \sqrt{n}\,\frac{\bar{x}_n - 1/2}{\sqrt{1/12}}.$$

    n <- 100
    x <- runif(n, 0, 1)
    z <- sqrt(n) * (mean(x) - 0.5) / sqrt(1 / 12)

- Repeat R = 20,000 times and plot the histogram of z_1, ..., z_R.

Figure: Histograms of z_1, ..., z_R for n = 30, n = 100, n = 500 and n = 5000 (axes: x, frequency).


Normal distribution

- In each of the considered cases the standardized mean approximately had an N(0, 1)-distribution.
- ... we observed examples of the central limit theorem (CLT).
- The general result is as follows:

Theorem (CLT)

Suppose that (X_n)_{n∈N} is an i.i.d. sequence of random variables with finite variance V(X_1) = σ² > 0. Set µ := E(X_1) and define Z_n as
$$Z_n := \frac{\sum_{i=1}^n X_i - n\mu}{\sqrt{n}\,\sigma} = \sqrt{n}\,\frac{\overline{X}_n - \mu}{\sigma}$$
for every n ∈ N. Then $F_{Z_n}(x) \to \Phi(x)$ for n → ∞ and arbitrary x ∈ R (Φ denotes the distribution function of N(0, 1)).
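The statement of the theorem can be illustrated for a distribution not yet considered in the examples. The sketch below uses E(3), for which µ = σ = 1/λ, and compares the empirical distribution function of Z_n with Φ at a few points; R, n, the seed and the evaluation points are arbitrary choices.

```r
set.seed(1)                               # arbitrary seed
R <- 10000; n <- 200
lambda <- 3
mu <- 1 / lambda; sigma <- 1 / lambda     # mean and standard deviation of E(lambda)

# each row of the matrix is one sample of size n
X <- matrix(rexp(R * n, rate = lambda), nrow = R, ncol = n)
Z <- sqrt(n) * (rowMeans(X) - mu) / sigma

# empirical distribution function of Z_n versus Phi at a few points
pts <- c(-2, -1, 0, 1, 2)
round(rbind(F_Zn = ecdf(Z)(pts), Phi = pnorm(pts)), 3)
```

For small n the two rows differ noticeably because E(λ) is skewed; the differences should shrink as n grows.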


Loops

Learning by doing - let's have a look at an example:

    1 R <- 10000
    2 erg <- rep(0, R)
    3 n <- 500
    4 for (i in 1:R) {
    5   x <- runif(n, 0, 1)
    6   erg[i] <- sqrt(n) * (mean(x) - 0.5) / sqrt(1 / 12)
    7 }
    8 hist(erg, col = "lightblue", main = "", xlab = "x", ylab = "frequency",
           probability = TRUE, xlim = c(-4, 4), breaks = 35)

- Line 2: construct a vector named 'erg' of length R containing only zeros.
- Line 4: repeat the same procedure R times; save the result of the first run in the 1st coordinate of 'erg', the result of the second run in the 2nd coordinate, the result of the third run in the 3rd coordinate, and so on until run number R.
- Line 8: plot a histogram of the resulting values.

Solve exercises 01 - 03 in the R-script R-Codes-R-SVm01.R.
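The same loop can also be written with numeric() and seq_len(), or replaced by vapply. The following sketch is an alternative formulation, not the version from the slides; with the same seed both variants should draw the same random numbers and hence produce the same vector.

```r
R <- 1000; n <- 500

set.seed(1)                           # arbitrary seed
erg <- numeric(R)                     # same as rep(0, R)
for (i in seq_len(R)) {               # seq_len(R) also behaves well for R = 0
  x <- runif(n, 0, 1)
  erg[i] <- sqrt(n) * (mean(x) - 0.5) / sqrt(1 / 12)
}

set.seed(1)                           # same seed, so the same random numbers
erg2 <- vapply(seq_len(R), function(i) {
  x <- runif(n, 0, 1)
  sqrt(n) * (mean(x) - 0.5) / sqrt(1 / 12)
}, numeric(1))                        # each run returns one number

all.equal(erg, erg2)
```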


If and ifelse statements

Learning by doing - let's have a look at two examples of if-statements:

    1 x <- rnorm(1)
    2 if (x < 0) { print("Negative value") }

- Line 1: draw a sample of size one from X ∼ N(0, 1).
- Line 2: if the value is negative, print 'Negative value'.

    n <- 1000
    x <- rnorm(n)
    z <- rep(0, n)
    for (i in 1:n) {
      if (x[i] >= 0) { z[i] <- 1 }
    }
    mean(z)

- What is the code doing?


If and ifelse statements

- Loops containing if-statements can often be avoided by using the vectorized function 'ifelse' instead.
- The subsequent code is significantly faster than the previous snippet:

    n <- 1000
    x <- rnorm(n)
    z <- ifelse(x >= 0, 1, 0)
    mean(z)

Solve exercises 04 - 07 in the R-script R-Codes-R-SVm01.R.
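That the loop and the ifelse version produce the same result can be checked directly on one and the same sample. A minimal sketch; the names z1, z2 and p3 and the third variant are additions made here for illustration.

```r
set.seed(1)          # arbitrary seed
n <- 1000
x <- rnorm(n)

# version 1: explicit loop with an if-statement
z1 <- rep(0, n)
for (i in 1:n) {
  if (x[i] >= 0) z1[i] <- 1
}

# version 2: vectorized ifelse
z2 <- ifelse(x >= 0, 1, 0)

# version 3: a logical vector is treated as 0/1 when averaged
p3 <- mean(x >= 0)

c(mean(z1), mean(z2), p3)   # all three give the share of non-negative values
```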


R-functions

Learning by doing - let's have a look at the following simple function:

    # functions:
    my.fun <- function(n) {
      x <- runif(n, -1, 1)
      a <- min(x)
      b <- max(x)
      res <- c(a, b)
      return(res)
    }

    my.fun(100)

- What does the function do?
- How can the function be applied?
- NB: A function takes some arguments/data as input, does some calculations and then returns a result as output.
- Any structure (vector, data.frame, list, etc.) can serve as input and as output.
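As an illustration of the last point, a function can return a list with several named components. The function my.summary below is a hypothetical example and not part of the course material.

```r
# hypothetical helper: takes a numeric vector, returns a list
my.summary <- function(x) {
  res <- list(
    n     = length(x),           # number of observations
    range = c(min(x), max(x)),   # smallest and largest value
    mean  = mean(x)              # arithmetic mean
  )
  return(res)
}

out <- my.summary(c(2, 4, 9))
out$range   # access a single component of the returned list by name
out$mean
```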


R-functions

- Let's extend the previous function 'my.fun' so that the user can also choose the parameters of the uniform distribution and so that the function automatically produces a histogram:

    my.fun2 <- function(n, a = 0, b = 1) {
      x <- runif(n, min = a, max = b)
      hist(x, probability = TRUE, col = "lightblue")
      # from here on a and b hold the sample minimum and maximum
      a <- min(x)
      b <- max(x)
      res <- c(a, b)
      return(res)
    }

    my.fun2(1000, a = 3, b = 5)

Solve exercises 08 - 09 in the R-script R-Codes-R-SVm01.R.


Exercise 10: A small simulation study concerning GPS bias

- Ten students of geoinformatics want to test GPS-based distance measurements.
- One after another, they record the GPS coordinates of the 100m starting line (on the outer track) in a nearby athletics stadium, walk along the outer track to the finishing line, and record the GPS coordinates again.
- Each of them repeats this procedure 50 times.
- For each of the 500 pairs they calculate the distance in meters.
- Given the sample size of n = 500 they expect the mean distance to be pretty close to 100m (why?).
- All the bigger is the surprise when the mean distance turns out to be roughly 102m.
- What went wrong - just bad luck?


Exercise 10: A small simulation study concerning GPS bias

[Figure: scatter plot of points; x-axis from 0 to 150, y-axis from −40 to 40.]


Exercise 10: A small simulation study concerning GPS bias

- What went wrong - just bad luck?
- We answer the question by means of simulations.
- Assume that the starting point S and the end point Z have the following exact coordinates: S = (0, 0), Z = (100, 0).
- S′, Z′ denote the measured coordinates; F = (X_1, Y_1) denotes the measurement error in S, G = (X_2, Y_2) the measurement error in Z.
- In other words,
  $$S' = S + (X_1, Y_1) = (X_1, Y_1), \qquad Z' = Z + (X_2, Y_2) = (100 + X_2, Y_2).$$
- The measured distance d is therefore given by
  $$d = \sqrt{(100 + X_2 - X_1)^2 + (Y_2 - Y_1)^2}.$$
- To simplify matters we assume that the errors follow a normal distribution, i.e. X_1, X_2, Y_1, Y_2 ∼ N(0, σ²).
- Consider the case σ = 15.


Exercise 10: A small simulation study concerning GPS bias

Exercise 10:

- Simulate n = 100,000 (or more) distance measurements.
- Calculate the corresponding distances d_1, ..., d_n.
- Calculate the mean distance $\bar{d}_n$ - is it greater or smaller than 100m?
- Produce a boxplot of the calculated distances.
- Analyze what happens if σ² is increased or reduced.
- Find a possible explanation of the observation made.
- Write a function with the sample size n as input parameter which produces a boxplot of the distances and returns the mean $\bar{d}_n$.
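One possible starting point for the last item is sketched below. It is not the official solution: the function name sim.gps, the arguments sigma and plot, and the assumption that the four error components are independent are choices made here.

```r
# hypothetical function; sigma = 15 as in the case considered above
sim.gps <- function(n, sigma = 15, plot = TRUE) {
  # measurement errors at S = (0, 0) and Z = (100, 0), assumed independent
  X1 <- rnorm(n, 0, sigma); Y1 <- rnorm(n, 0, sigma)
  X2 <- rnorm(n, 0, sigma); Y2 <- rnorm(n, 0, sigma)
  # measured distances between S' and Z'
  d <- sqrt((100 + X2 - X1)^2 + (Y2 - Y1)^2)
  if (plot) boxplot(d, horizontal = TRUE, xlab = "measured distance (m)")
  return(mean(d))
}

set.seed(1)        # arbitrary seed
sim.gps(100000)    # mean of the simulated distances
```

Calling the function with different values of sigma is one way to approach the question about increasing or reducing σ².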


Quick Reminder: Pearson correlation coefficient ρ

Figure: What is the correlation coefficient of the drawn sample? (scatter plot; axes: x, y)

- The graphic depicts a sample (x_1, y_1), ..., (x_n, y_n).
- Give a rough estimate of the correlation coefficient ρ of the sample.
- How can ρ be calculated?
- Let s_x (resp. s_y) denote the standard deviation of the x-coordinates (resp. y-coordinates) of the sample, i.e.
  $$s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x}_n)^2}, \qquad s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (y_i - \bar{y}_n)^2}.$$


Quick Reminder: Pearson correlation coefficient ρ

- Let s_xy denote the (empirical) covariance of the sample, i.e.
  $$s_{xy} = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x}_n)(y_i - \bar{y}_n).$$
- The (Pearson) correlation coefficient ρ_xy is defined as
  $$\rho_{xy} = \frac{s_{xy}}{s_x\, s_y}$$
  if s_x, s_y > 0.
- In our case we get ρ_xy = 0.97464.
- How can this value be interpreted?
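The definition can be translated into R line by line and compared with the built-in function cor. A minimal sketch on simulated data (not the sample shown in the figure); the seed and the way y is generated are arbitrary choices.

```r
set.seed(1)                    # arbitrary seed
n <- 50
x <- rnorm(n)
y <- 2 * x + rnorm(n)          # simulated sample, not the data from the slides

# standard deviations and covariance with the factor 1 / (n - 1)
sx  <- sqrt(sum((x - mean(x))^2) / (n - 1))
sy  <- sqrt(sum((y - mean(y))^2) / (n - 1))
sxy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

rho <- sxy / (sx * sy)
c(by_hand = rho, built_in = cor(x, y))   # both values should coincide
```

In the same way sx, sy and sxy should agree with sd(x), sd(y) and cov(x, y).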


Quick Reminder: Pearson correlation coefficient ρ

Properties of ρ

- Whenever ρ_xy exists (i.e. whenever s_x, s_y > 0) we have −1 ≤ ρ_xy ≤ 1.
- We have ρ_xy = ρ_yx. As a consequence we will simply write ρ in the sequel.
- ρ = 1 if and only if (x_1, y_1), ..., (x_n, y_n) lie on a straight line with positive slope.
- ρ = −1 if and only if (x_1, y_1), ..., (x_n, y_n) lie on a straight line with negative slope.
- In case of ρ = 0 we call the sample (x_1, y_1), ..., (x_n, y_n) uncorrelated.
- ρ is not a measure of dependence in general - it only measures linear dependence.
- ρ = 0 means that there is no linear dependence.
- If instead of (x_1, y_1), ..., (x_n, y_n) we consider (2x_1, 3y_1), ..., (2x_n, 3y_n), what happens to ρ?
- If instead of (x_1, y_1), ..., (x_n, y_n) we consider (−2x_1, −3y_1), ..., (−2x_n, −3y_n), what happens to ρ?


Quick Reminder: Pearson correlation coefficient ρ

- If instead of (x_1, y_1), ..., (x_n, y_n) we consider (−2x_1, −3y_1), ..., (−2x_n, −3y_n), what happens to ρ?

    file <- url("http://www.trutschnig.net/geo_reg1.RData")
    load(file)
    A <- geo_reg1
    head(geo_reg1)

    cor(A$x, A$y)
    cor(2 * A$x, 3 * A$y)
    cor(-2 * A$x, -3 * A$y)
    cor(-2 * A$x, 3 * A$y)

- ρ does not change if x and y are rescaled by factors of the same sign; if the factors have different signs, ρ changes its sign.
- ρ changes, however, under non-linear transformations:
- If instead of (x_1, y_1), ..., (x_n, y_n) we consider (x_1^3, y_1^3), ..., (x_n^3, y_n^3), then we get ρ = 0.9.
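The behaviour of ρ under these transformations can also be tried out without the data file. The sketch below uses simulated data as a stand-in for A$x and A$y, so the numerical values differ from those on the slides; only the pattern is of interest.

```r
set.seed(1)                     # arbitrary seed
x <- rnorm(200)
y <- x + rnorm(200, sd = 0.5)   # simulated data standing in for A$x, A$y

r <- cor(x, y)
c(original  = r,
  same_pos  = cor(2 * x, 3 * y),     # both factors positive: rho unchanged
  same_neg  = cor(-2 * x, -3 * y),   # both factors negative: rho unchanged
  different = cor(-2 * x, 3 * y),    # different signs: rho changes its sign
  cubed     = cor(x^3, y^3))         # non-linear transformation: rho changes in general
```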


Spearman rank correlation ρS

- Assume we want a measure quantifying whether there is a monotonic relationship between the x- and the y-coordinates of a sample (x_1, y_1), ..., (x_n, y_n).
- 'Monotonic relationship' (or concordance) in the sense that if the x-coordinates increase then the y-coordinates increase as well (they grow or fall together).
- There is no need for the relationship to be linear.
- One natural idea is to work with ranks - best explained by some simple examples:

    x1 <- c(3, 1, 4, 15, 13)
    r1 <- rank(x1)
    x1
    # [1]  3  1  4 15 13
    r1
    # [1] 2 1 3 5 4


Spearman rank correlation ρS

    x1 <- c(3, 1, 3, 15, 13)
    r1 <- rank(x1)
    x1
    # [1]  3  1  3 15 13
    r1
    # [1] 2.5 1.0 2.5 5.0 4.0

- The values are sorted - the rank rk(x_i) of observation x_i is its position in the sorted sample.
- In case of ties the average of the corresponding ranks is used (other choices are available as options of the function).
- From (x_1, y_1), ..., (x_n, y_n) we get the sample ranks (rk_x(x_1), rk_y(y_1)), ..., (rk_x(x_n), rk_y(y_n)).
- rk_x(x_i) is the rank of observation x_i among x_1, ..., x_n.
- rk_y(y_i) is the rank of observation y_i among y_1, ..., y_n.
- The Spearman rank correlation is defined as the Pearson correlation of these ranks.
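The alternative treatments of ties mentioned above are selected via the argument ties.method of rank. A short sketch using the vector with ties from the example; only two of the available methods are shown.

```r
x1 <- c(3, 1, 3, 15, 13)

r.avg   <- rank(x1)                          # default "average": 2.5 1.0 2.5 5.0 4.0
r.min   <- rank(x1, ties.method = "min")     # tied values share the smallest rank
r.first <- rank(x1, ties.method = "first")   # ties are broken by order of appearance

rbind(average = r.avg, min = r.min, first = r.first)
```

For the Spearman rank correlation the default (average ranks) is the relevant choice.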


Spearman rank correlation ρS

Example

- Consider the following sample of size n = 5:

        x        y   rk.x   rk.y
     3.05    10.21      2      2
     1.38     2.19      1      1
     4.32    19.31      3      3
    15.51   241.08      5      5
     7.08    50.81      4      4

- What can be seen?
- We get ρ_S = 1:

    # E: data frame holding the sample above (columns x and y)
    cor(rank(E$x), rank(E$y))
    cor(E$x, E$y, method = "spearman")
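The object E is not defined on the slide. The sketch below builds it from the five observations of the table, assuming E is a data frame with columns x and y, and adds the Pearson correlation of the raw values for comparison.

```r
# the sample from the table, stored under the name E used in the code above
E <- data.frame(x = c(3.05, 1.38, 4.32, 15.51, 7.08),
                y = c(10.21, 2.19, 19.31, 241.08, 50.81))

E$rk.x <- rank(E$x)
E$rk.y <- rank(E$y)
E

rho.S.ranks <- cor(E$rk.x, E$rk.y)                  # Pearson correlation of the ranks
rho.S       <- cor(E$x, E$y, method = "spearman")   # built-in Spearman correlation
rho         <- cor(E$x, E$y)                        # Pearson correlation of the raw values

c(rho.S.ranks, rho.S, rho)
```

The ranks of x and y coincide for every observation, which is why ρ_S = 1 although the points do not lie on a straight line.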


Spearman rank correlation ρS

Properties of ρ_S:

- Whenever ρ_S exists we have −1 ≤ ρ_S ≤ 1.
- ρ_S is symmetric too.
- ρ_S = 1 if and only if for each pair (x_i, y_i), (x_j, y_j) we have x_i ≤ x_j if and only if y_i ≤ y_j.
- ρ_S = −1 if and only if for each pair (x_i, y_i), (x_j, y_j) we have x_i ≤ x_j if and only if y_i ≥ y_j.
- ρ_S is not a measure of dependence in general - it only measures monotonic dependence (aka concordance).
- ρ_S = 0 means that there is no monotonic dependence.
- If instead of (x_1, y_1), ..., (x_n, y_n) we consider (2x_1, 3y_1), ..., (2x_n, 3y_n), what happens to ρ_S?
- If instead of (x_1, y_1), ..., (x_n, y_n) we consider (−2x_1, −3y_1), ..., (−2x_n, −3y_n), what happens to ρ_S?
- If instead of (x_1, y_1), ..., (x_n, y_n) we consider (x_1^3, y_1^3), ..., (x_n^3, y_n^3), what happens to ρ_S?


Spearman rank correlation ρS

    file <- url("http://www.trutschnig.net/geo_reg1.RData")
    load(file)
    A <- geo_reg1
    head(geo_reg1)

    cor(A$x, A$y, method = "spearman")
    cor(2 * A$x, 3 * A$y, method = "spearman")
    cor(-2 * A$x, -3 * A$y, method = "spearman")
    cor(A$x^3, A$y^3, method = "spearman")

- For all four cases we get ρ_S = 0.9633945.
- Easy to verify: ρ_S is invariant under monotonic transformations (both increasing or both decreasing).
- Let's add two outliers to A and see how ρ and ρ_S change.
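The invariance of ρ_S under monotonic transformations can be checked on any sample. The sketch below uses simulated data as a stand-in for A$x and A$y, so the value of ρ_S differs from the one on the slide; the exponential function is added here as a further increasing transformation.

```r
set.seed(1)                     # arbitrary seed
x <- rnorm(200)
y <- x + rnorm(200, sd = 0.5)   # simulated data standing in for A$x, A$y

rs <- cor(x, y, method = "spearman")
res <- c(original = rs,
         scaled   = cor(2 * x, 3 * y, method = "spearman"),
         negated  = cor(-2 * x, -3 * y, method = "spearman"),
         cubed    = cor(x^3, y^3, method = "spearman"),
         exp      = cor(exp(x), exp(y), method = "spearman"))
res   # all entries should be identical
```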


Spearman rank correlation ρS

[Figure: scatter plot of the sample with the two added points; x-axis from −2 to 10, y-axis from −5 to 10.]

    Dazu <- data.frame(x = c(10, 10.3), y = c(2, 2.4))   # the two additional points
    A1 <- rbind(A, Dazu)
    plot(A1)
    cor(A1$x, A1$y)
    cor(A1$x, A1$y, method = "spearman")

- Which is more influenced by the two new points?
- We get ρ = 0.8187617 (before: ρ = 0.97464).
- Moreover ρ_S = 0.9349794 (before: ρ_S = 0.9633945).
- ρ is less robust against outliers than ρ_S.
- Rank-based quantities are generally robust.
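The same experiment can be repeated on simulated data. In the sketch below A.sim and A1.sim are hypothetical stand-ins for A and A1; the two added points are the ones used above, everything else (seed, sample size, model for y) is an arbitrary choice.

```r
set.seed(1)                              # arbitrary seed
x <- rnorm(100)
y <- 2 * x + rnorm(100, sd = 0.5)        # simulated, strongly correlated sample
A.sim <- data.frame(x = x, y = y)        # hypothetical stand-in for A

# the two added points lie far to the right but have moderate y-values
A1.sim <- rbind(A.sim, data.frame(x = c(10, 10.3), y = c(2, 2.4)))

# decrease of rho and rho_S caused by the two points
drop.pearson  <- cor(A.sim$x, A.sim$y) - cor(A1.sim$x, A1.sim$y)
drop.spearman <- cor(A.sim$x, A.sim$y, method = "spearman") -
                 cor(A1.sim$x, A1.sim$y, method = "spearman")
c(pearson = drop.pearson, spearman = drop.spearman)
```

With this setup the decrease of ρ should be much larger than that of ρ_S, in line with the values reported above.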


Some examples and exercises

[Figure: two scatter plots (axes: x, y), shown without correlation values; left panel with x from −4 to 4 and y from −3 to 3, right panel with both axes from 0 to 1000.]


Some examples and exercises

[Figure: two scatter plots (axes: x, y); left panel titled rho=0.0477 with x from −4 to 4 and y from −3 to 3, right panel titled rho_S=0.0469 with both axes from 0 to 1000.]


Some examples and exercises

[Figure: two scatter plots (axes: x, y), shown without correlation values; left panel with x and y from −4 to 4, right panel with both axes from 0 to 1000.]


Some examples and exercises

[Figure: two scatter plots (axes: x, y); left panel titled rho=0.7266 with x and y from −4 to 4, right panel titled rho_S=0.7013 with both axes from 0 to 1000.]


Some examples and exercises

Figure: Two scatterplots of (x, y). Left panel: x from −4 to 4, y from −20 to 20. Right panel: both axes from 0 to 1000.

Page 35: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Some examples and exercises

Figure: Two scatterplots of (x, y). Left panel (x from −4 to 4, y from −20 to 20), title: rho=−0.9175. Right panel (both axes from 0 to 1000), title: rho_S=−0.9652.

Page 36: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Some examples and exercises

Figure: Two scatterplots of (x, y). Left panel: both axes from 0.0 to 1.0. Right panel: both axes from 0 to 1000.

Page 37: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Some examples and exercises

Figure: Two scatterplots of (x, y). Left panel (both axes from 0.0 to 1.0), title: rho=0.0143. Right panel (both axes from 0 to 1000), title: rho_S=0.0251.

Page 38: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Some examples and exercises

Figure: Two scatterplots of (x, y). Left panel: x from −3 to 3, y from 0 to 8. Right panel: both axes from 0 to 1000.

Page 39: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Some examples and exercises

Figure: Two scatterplots of (x, y). Left panel (x from −3 to 3, y from 0 to 8), title: rho=0.8576. Right panel (both axes from 0 to 1000), title: rho_S=0.9422.
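The comparison of ρ and ρ_S in these examples can be tried out on simulated data. The following is only a sketch: the seed, the sample size and the relationship y = exp(x) plus noise are assumptions made for illustration, not the data behind the slides. It also checks that the Spearman correlation coincides with the Pearson correlation of the ranks.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n <- 1000                          # assumed sample size
x <- rnorm(n)
y <- exp(x) + rnorm(n, sd = 0.1)   # hypothetical monotone but nonlinear relationship

rho   <- cor(x, y)                        # Pearson correlation
rho_S <- cor(x, y, method = "spearman")   # Spearman (rank) correlation

# Spearman correlation = Pearson correlation of the ranks
rho_ranks <- cor(rank(x), rank(y))

c(rho = rho, rho_S = rho_S, rho_ranks = rho_ranks)
```

Plotting `rank(x)` against `rank(y)` gives a scatterplot with both axes running from 1 to n.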

Page 40: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Some examples and exercises

Solve Exercise 11 and Exercise 12 in the R-script R-Codes-R-SVm01.R.

Exercise 13: Can you find a sample (x1, y1), . . . , (xn, yn) for which the Pearson correlation ρ and the Spearman correlation ρS have different signs?

Hint: Running simulations is never a bad idea; simulate five x-coordinates and five y-coordinates from U(0, 1) and calculate ρ and ρS; repeat several times.

Solve Exercise 14 in the R-script R-Codes-R-SVm01.R.
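The hint of Exercise 13 could be set up roughly as follows. This is a sketch under assumptions: the seed, the number of repetitions `runs` and the names `res` and `different_sign` are chosen here for illustration and are not taken from R-Codes-R-SVm01.R.

```r
set.seed(123)        # arbitrary seed, only for reproducibility
runs <- 1000         # hypothetical number of repetitions
res <- data.frame(rho = numeric(runs), rho_S = numeric(runs))

for (i in 1:runs) {
  x <- runif(5)      # five x-coordinates from U(0,1)
  y <- runif(5)      # five y-coordinates from U(0,1)
  res$rho[i]   <- cor(x, y)
  res$rho_S[i] <- cor(x, y, method = "spearman")
}

# runs in which Pearson and Spearman correlation have different sign
different_sign <- res$rho * res$rho_S < 0
sum(different_sign)
head(res[different_sign, ])
```

To obtain a concrete sample, one would additionally store x and y of a run in which `different_sign` is TRUE.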

Page 41: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

What is regression all about - a general perspective

Known:

I We know that there is a relationship between quantities X and Y of the following form:

$$Y = r(X) + \varepsilon \qquad (1)$$

I r is an unknown function and ε is a random error fulfilling E(ε) = 0.

I Usually we also assume that ε is not influenced by X (this might be too restrictive a condition in various situations).

I We call X the predictor and Y the response.

Wanted:

I Based on observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ from (1) we want to determine/estimate the function r (why?).

I If we have a good estimator $\hat{r}$ of r then we can predict Y for arbitrary values of X by considering $\hat{r}(X)$.

Page 42: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

A real-life example

Example (Offer optimization in supermarkets)

I A supermarket chain wants to optimize their offers.

I If the price is only reduced by 5% then the sales numbers will only go up a bit.

I If the price is reduced by 50% then the sales numbers will go up a lot but the company might earn less because the margin is too small.

I Objective: Determine the optimal price reduction in the sense that the supermarket’s profit is maximal.

I X ...price reduction (absolute or percentage) of a certain product.

I Y ...net earnings (based on this product).

I Y = r(X ) + ε.

I What do you think: Is a model based solely on price reduction as predictor a good one?

I Which other predictors would you choose?

Page 43: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Linear regression

Figure: Prediction at the point x = 1.5? (Scatterplot of the sample; x-axis from −2 to 4, y-axis from −5 to 10.)

I The graphic depicts measurements $(x_1, y_1), \ldots, (x_n, y_n)$.

I It is known that the data comes from the following linear model

$$Y = \underbrace{aX + b}_{r(X)} + \varepsilon.$$

I In other words: $y_i = a x_i + b + \varepsilon_i$ for $i \in \{1, \ldots, n\}$.

I $\varepsilon_i$ ... samples of the random error ε fulfilling E(ε) = 0 that do not influence each other and are not influenced by $x_i$.

I Wanted: Forecast the y-value at the point x = 1.5.

Page 44: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Linear regression

Figure: Prediction at the point x = 1.5? (Scatterplot of the sample; x-axis from −2 to 4, y-axis from −5 to 10.)

I How would you predict the value at the point x = 1.5?

I Problem: We do not know the parameters a and b.

I Choose a and b in such a way that the straight line y = ax + b fits the data in the best possible way.

I Denote the optimal values by $\hat{a}$ and $\hat{b}$.

I Given $\hat{a}$ and $\hat{b}$, predict $\hat{y} = \hat{a} \cdot 1.5 + \hat{b}$.

I Which of the following straight lines fits best?

Page 45: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Linear regression

Figure: Scatterplot of the sample (x-axis from −2 to 4, y-axis from −5 to 10).

Page 46: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Linear regression

I Choose those values for a and b that minimize the prediction errors at the points in the sample.

I Choosing a and b as parameters we would forecast $a x_i + b$ for $x_i$.

I The error $r_i$ we make is $r_i = y_i - (a x_i + b) = y_i - a x_i - b$.

I The sum of all squared errors is given by

$$F(a, b) := \sum_{i=1}^{n} (y_i - a x_i - b)^2 \qquad (2)$$

I Choose a and b in such a way that F(a, b) is minimal.

I Analytic calculation yields the following optimal values

$$\hat{a} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} = \frac{s_{xy}}{s_x^2} \qquad (3)$$

$$\hat{b} = \bar{y}_n - \hat{a}\, \bar{x}_n. \qquad (4)$$

I For our given sample we get $\hat{a} = 2.010$ and $\hat{b} = 0.897$.

I The forecast at the point x = 1.5 therefore is $\hat{y} = 2.01 \cdot 1.5 + 0.897 = 3.912$.
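Formulas (2) to (4) can be checked numerically. The sketch below uses simulated data, since the sample from the slides is not reproduced here; the seed, the sample size, the true values a = 2, b = 1 and the helper name `sse` are assumptions made for illustration.

```r
set.seed(42)                      # arbitrary seed
n <- 50                           # hypothetical sample size
x <- runif(n, -3, 4)
y <- 2 * x + 1 + rnorm(n)         # assumed true values a = 2, b = 1

# formulas (3) and (4)
a_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b_hat <- mean(y) - a_hat * mean(x)

# the same slope via empirical covariance and variance: s_xy / s_x^2
a_hat_cov <- cov(x, y) / var(x)

# sum of squared errors F(a, b) from (2); 'sse' is a name chosen here
sse <- function(a, b) sum((y - a * x - b)^2)

fit <- lm(y ~ x)
coef(fit)                         # order: intercept (b_hat), slope (a_hat)
c(b_hat, a_hat)
c(sse(a_hat, b_hat), sse(2, 1))   # the first value is never the larger one
```

The common factor 1/(n − 1) in `cov` and `var` cancels, which is why both ways of computing the slope agree.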

Page 47: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Linear regression

Figure: Plot of the sample illustrating the errors $r_i$ (x-axis from −3 to 4, y-axis from −5 to 10).

Page 48: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Linear regression

I Before fitting linear models in R some additional observations:

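I The estimated slope $\hat{a} = \frac{s_{xy}}{s_x^2}$ looks a bit like the Pearson correlation $\rho = \frac{s_{xy}}{s_x s_y}$.

I Using both expressions we get

$$\hat{a} = \rho\, \frac{s_y}{s_x}$$

I Increasing x by one standard deviation $s_x$ changes the forecast $\hat{y}$ by $\rho$ standard deviations $s_y$; in fact

$$\hat{r}(x + s_x) = \hat{a}(x + s_x) + \hat{b} = \underbrace{\hat{a}x + \hat{b}}_{\hat{y}} + \hat{a} s_x = \hat{y} + \rho\, \frac{s_y}{s_x}\, s_x = \hat{y} + \rho s_y.$$

I How do we quantify whether our optimal model offers a good explanation of the data?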
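The relation between slope and correlation can be verified on any sample. A minimal sketch, assuming simulated data (seed, sample size, model and the point `x0` are chosen here only for illustration):

```r
set.seed(7)                       # arbitrary seed
n <- 200                          # hypothetical sample size
x <- rnorm(n, mean = 1, sd = 2)
y <- -1.5 * x + 3 + rnorm(n)      # hypothetical linear model

fit   <- lm(y ~ x)
a_hat <- unname(coef(fit)["x"])   # estimated slope
rho   <- cor(x, y)                # Pearson correlation

a_hat
rho * sd(y) / sd(x)               # same value as a_hat

# increasing x by s_x changes the forecast by rho * s_y
x0 <- 0.5                         # arbitrary point
p  <- predict(fit, newdata = data.frame(x = c(x0, x0 + sd(x))))
diff(p)
rho * sd(y)
```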

Page 49: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Linear regression

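I A natural idea is the coefficient of determination $R^2$.

I Easy to show:

$$\sum_{i=1}^{n} (y_i - \bar{y}_n)^2 = \sum_{i=1}^{n} \underbrace{(y_i - \hat{y}_i)^2}_{r_i^2} + \sum_{i=1}^{n} (\hat{y}_i - \bar{y}_n)^2$$

I Variance of $y_1, \ldots, y_n$ equals the variance of the residuals plus the variance of the forecasts $\hat{y}_1, \ldots, \hat{y}_n$.

I Calculate

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y}_n)^2} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y}_n)^2}{\sum_{i=1}^{n} (y_i - \bar{y}_n)^2} \qquad (5)$$

I $R^2$ is the portion of y-variance explained by the model.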
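The decomposition and formula (5) can be reproduced step by step. The following sketch assumes simulated data; the names `sst`, `sse` and `ssr` are chosen here and do not appear in the course material.

```r
set.seed(11)                      # arbitrary seed
n <- 100                          # hypothetical sample size
x <- runif(n, -3, 4)
y <- 2 * x + 1 + rnorm(n)         # hypothetical linear model

fit   <- lm(y ~ x)
y_hat <- fitted(fit)              # forecasts at the sample points

sst <- sum((y - mean(y))^2)       # total sum of squares
sse <- sum((y - y_hat)^2)         # sum of squared residuals r_i^2
ssr <- sum((y_hat - mean(y))^2)   # sum of squares of the forecasts

c(sst = sst, sse_plus_ssr = sse + ssr)   # both entries coincide

R2 <- 1 - sse / sst
c(R2, ssr / sst, summary(fit)$r.squared) # three ways to obtain R^2
```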

Page 50: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Linear regression

Figure: Two scatterplots of (x, y). Left panel (x from −2 to 4, y from −5 to 10), title: R^2=0.9499. Right panel (x from −2 to 4, y from 0.0 to 7.5), title: R^2=0.1044.

Page 51: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Linear regression

Properties of $R^2$:

I We have $0 \le R^2 \le 1$.

I The higher $R^2$ the higher the percentage of variance explained by the model.

I If $R^2$ is close to 1 then the model explains the data very well.

I If $R^2$ is close to 0 the model does not help much to explain the data.

I There should be a strong interrelation between $R^2$ and the correlation ρ of the original sample $(x_1, y_1), \ldots, (x_n, y_n)$...

I Calculations in R will make this clear.
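One such calculation might look as follows. It is a sketch on simulated data (seed, sample size and model are assumptions) and compares $R^2$ with the squared Pearson correlation for a linear model with one predictor and intercept.

```r
set.seed(3)                           # arbitrary seed
n <- 80                               # hypothetical sample size
x <- rnorm(n)
y <- 0.5 * x - 1 + rnorm(n, sd = 2)   # hypothetical model with noisy errors

fit <- lm(y ~ x)
R2  <- summary(fit)$r.squared
rho <- cor(x, y)

c(R2 = R2, rho_squared = rho^2)       # both entries coincide
```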

Page 52: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Linear regression

file <- url("http://www.trutschnig.net/geo_reg1.RData")
load(file)
head(geo_reg1)
A <- geo_reg1

model <- lm(data = A, y ~ x)   # use whatever name you want instead of model
summary(model)

I yields

Call:
lm(formula = y ~ x, data = A)

Residuals:
     Min       1Q   Median       3Q      Max
-3.07477 -0.63681 -0.03544  0.70030  1.95308

I and

Page 53: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Linear regression

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.89704    0.11406   7.865 1.13e-11 ***
x            2.00965    0.05035  39.917  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.03 on 84 degrees of freedom
Multiple R-squared:  0.9499,    Adjusted R-squared:  0.9493
F-statistic:  1593 on 1 and 84 DF,  p-value: < 2.2e-16

I Calculate the prediction for x = 1.5

ND <- data.frame(x = c(1.5))         # new data: the x-value to predict at
p <- predict(model, newdata = ND)    # forecast from the fitted model
p                                    # 3.91152
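`predict` also accepts several new x-values at once, and its result agrees with evaluating $\hat{b} + \hat{a}x$ directly. The sketch below uses a simulated stand-in for the data frame A, because the downloaded dataset is not reproduced here; the seed, the model and the new x-values are assumptions.

```r
set.seed(5)                                # arbitrary seed
A <- data.frame(x = runif(60, -3, 4))      # hypothetical stand-in for the data
A$y <- 2 * A$x + 1 + rnorm(60)

model <- lm(y ~ x, data = A)
ND <- data.frame(x = c(1.5, 2, 2.5))       # several new x-values at once
p  <- predict(model, newdata = ND)

# the same forecasts from the estimated coefficients: b_hat + a_hat * x
cf <- coef(model)
by_hand <- cf["(Intercept)"] + cf["x"] * ND$x

cbind(ND, predict = p, by_hand = by_hand)
```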

Page 54: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Exercises

Exercise 15:

I Load the dataset geo_reg1.RData (see R-Code, end of part 01 in linear regression).

I Produce a scatterplot of the data including the regression line.

I Add the values of the estimated parameters $\hat{a}$ and $\hat{b}$ in the title of the plot.

I Produce a boxplot of the residuals $r_1, \ldots, r_n$.

I Calculate ρ and ρS of the data.

I Forecast r(x) for x ∈ {0, 0.1, 0.2, . . . , 0.9, 1}.

Page 55: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Exercises

Exercise 16:

I The dataset ’brainhead.txt’ (see R-Code) contains brain weight (grams) and head size (cm^3) for 237 adults.

I Fit a linear regression with ’weight’ as response and ’cm3’ as explanatory variable.

I Plot the data together with the regression line.

I Calculate the corresponding $R^2$.

I Calculate the ten biggest residuals (biggest in the sense of absolute value). How many men and how many women are in the ’top ten’?

Page 56: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

@Performance

Summary @univariate linear regression

I (x1, y1), . . . , (xn, yn) are observations from the model Y = aX + b + ε.

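I Here ε is a random error fulfilling E(ε) = 0; set $\sigma^2 = V(\varepsilon)$.

I In other words: $y_i = a x_i + b + \varepsilon_i$ for every $i \in \{1, \ldots, n\}$.

I Using least squares we got the following estimators $\hat{a}$ of a and $\hat{b}$ of b

$$\hat{a} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} = \frac{s_{xy}}{s_x^2} \qquad (6)$$

$$\hat{b} = \bar{y}_n - \hat{a}\, \bar{x}_n. \qquad (7)$$

I We hope to get $\hat{a} \approx a$ and $\hat{b} \approx b$, i.e. we hope that the estimates are close to the true values.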

I Will this always be the case?

I When can we expect to get good estimates?

Page 57: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

@Performance

# one simulation
a <- 2; b <- 1
n <- 100
x <- runif(n, -3, 4)        # generate random x values
error <- rnorm(n, 0, 1)     # error from normal distribution N(0,1)
y <- a * x + b + error
A <- data.frame(x = x, y = y)
plot(A)
model <- lm(data = A, y ~ x)
abline(model)
summary(model)

I yields

Call:
lm(formula = y ~ x, data = A)

Residuals:
     Min       1Q   Median       3Q      Max
-2.20647 -0.66814 -0.09888  0.77627  1.95348

Page 58: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

@Performance

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.76146    0.10149   7.503 2.87e-11 ***
x            2.09902    0.04686  44.790  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9631 on 98 degrees of freedom
Multiple R-squared:  0.9534,    Adjusted R-squared:  0.9529
F-statistic:  2006 on 1 and 98 DF,  p-value: < 2.2e-16

sum(model$residuals^2) / (n - 2)

I yields

[1] 0.9275251
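The quantity computed last is the usual estimate of the error variance $\sigma^2$ and equals the squared residual standard error reported by `summary`. A minimal sketch of this check on freshly simulated data (the seed is an assumption, so the numbers differ from the output above):

```r
set.seed(9)                       # arbitrary seed
n <- 100
x <- runif(n, -3, 4)
y <- 2 * x + 1 + rnorm(n, 0, 1)   # true error variance sigma^2 = 1

model <- lm(y ~ x)
sigma2_hat <- sum(residuals(model)^2) / (n - 2)

sigma2_hat
summary(model)$sigma^2            # squared residual standard error, same value
```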

Page 59: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

@Performance

# several runs
R <- 1000
E <- data.frame(a = rep(0, R), b = rep(0, R))

a <- 2; b <- 1
n <- 100
for (i in 1:R) {
  x <- runif(n, -3, 4)      # generate random x values
  error <- rnorm(n, 0, 1)
  y <- a * x + b + error
  A <- data.frame(x = x, y = y)
  model <- lm(data = A, y ~ x)
  E[i, ] <- as.numeric(coefficients(model))[2:1]   # store slope first, then intercept
}

I yields (first six rows of E)

         a         b
1 1.994841 0.8079434
2 1.987354 1.0237531
3 1.951075 0.9251133
4 1.999110 1.0721703
5 1.996653 0.8200005
6 1.968700 1.0383586

Page 60: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

@Performance

       a               b
 Min.   :1.858   Min.   :0.6841
 1st Qu.:1.966   1st Qu.:0.9280
 Median :2.001   Median :0.9994
 Mean   :2.002   Mean   :0.9989
 3rd Qu.:2.035   3rd Qu.:1.0722
 Max.   :2.162   Max.   :1.3031

I What does the table tell us?

I A graphical overview also helps to interpret the results.

Page 61: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

@Performance

Figure: Scatterplot of the simulated estimates (a on the horizontal axis, about 1.9 to 2.1; b on the vertical axis, about 0.8 to 1.2). Title: sample size n= 100

Page 62: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Influence of the parameters at stake

Natural related questions:

I What happens if the sample size n is increased?

I The more info the better the estimates should (on average) be!

I What other parameter in the simulation could have an influence on the quality of the estimates?

I Answer: The variance $\sigma^2$ of ε is important.

I The higher the variance the poorer the estimates.

I Repeat the simulation (several runs) for higher and lower sample size and vary the variance of the error.
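One way to organise these repetitions is to wrap the simulation in a function. The sketch below is not taken from R-Codes-R-SVm01.R: the function name `simulate_estimates`, its arguments, the seed and the reduced number of runs R = 200 are assumptions made for illustration. Note that `sigma` is the standard deviation of the error, so `sigma = 4` corresponds to $\sigma^2 = 16$.

```r
# hypothetical helper: performs R runs and returns the estimates as data.frame
simulate_estimates <- function(n, sigma, R = 200, a = 2, b = 1) {
  E <- data.frame(a = numeric(R), b = numeric(R))
  for (i in 1:R) {
    x <- runif(n, -3, 4)                          # random x values
    y <- a * x + b + rnorm(n, 0, sigma)           # errors with sd sigma
    E[i, ] <- as.numeric(coef(lm(y ~ x)))[2:1]    # order: slope, intercept
  }
  E
}

set.seed(2017)                                    # arbitrary seed
E_small <- simulate_estimates(n = 100,  sigma = 1)
E_large <- simulate_estimates(n = 1000, sigma = 1)
E_noisy <- simulate_estimates(n = 100,  sigma = 4)

# spread of the slope estimates in the three settings
sapply(list(n100 = E_small, n1000 = E_large, n100_sigma4 = E_noisy),
       function(E) sd(E$a))
```

Calling `plot(E_small)` or `summary(E_small)` gives a graphical or tabular overview of one setting.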

Page 63: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Influence of the parameters at stake

Figure: Scatterplot of the simulated estimates (a on the horizontal axis, about 1.9 to 2.1; b on the vertical axis, about 0.8 to 1.2). Title: sample size n= 100

Page 64: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Influence of the parameters at stake

Figure: Scatterplot of the simulated estimates (a on the horizontal axis, about 1.96 to 2.08; b on the vertical axis, about 0.85 to 1.10). Title: sample size n= 500

Page 65: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Influence of the parameters at stake

Figure: Scatterplot of the simulated estimates (a on the horizontal axis, about 1.950 to 2.050; b on the vertical axis, about 0.90 to 1.10). Title: sample size n= 1000

Page 66: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Influence of the parameters at stake

Figure: Scatterplot of the simulated estimates (a on the horizontal axis, about 1.98 to 2.01; b on the vertical axis, about 0.96 to 1.02). Title: sample size n= 10000

Page 67: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Influence of the parameters at stake

Figure: Scatterplot of the simulated estimates (a on the horizontal axis, about 1.9 to 2.1; b on the vertical axis, about 0.8 to 1.2). Title: sample size n= 100

Page 68: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Influence of the parameters at stake

Figure: Scatterplot of the simulated estimates (a on the horizontal axis, about 1.8 to 2.4; b on the vertical axis, about 0.4 to 1.6). Title: sample size n= 100, sigma^2=4

Page 69: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Influence of the parameters at stake

Figure: Scatterplot of the simulated estimates (a on the horizontal axis, about 1.6 to 2.4; b on the vertical axis, about 0 to 2). Title: sample size n= 100, sigma^2=16

Page 70: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Exercises

Exercise 17: Modify the last 30 lines of the R-Code R-Codes-R-SVm01.R to do the following:

I Simulate a sample of size n = 100 from the model Y = 0.5X − 1 + ε whereby ε ∼ N (0, 0.5).

I Produce a scatterplot of the data including the regression line; include the estimated parameters $\hat{a}$ and $\hat{b}$ in the title of the scatterplot.

I Produce a boxplot of the residuals $r_1, \ldots, r_n$.

I Calculate ρ and ρS of the data.

I Forecast r(x) for x ∈ {0, 0.1, 0.2, . . . , 0.9, 1}.

Page 71: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Exercises

Exercise 18: Modify the last 30 lines of the R-Code R-Codes-R-SVm01.R to do the following:

I Simulate a sample of size n = 100 from the model Y = 0.5X − 1 + ε with ε ∼ N (0, 0.5).

I Save the estimated parameters $\hat{a}$ and $\hat{b}$ in a data.frame A.

I Repeat the previous two steps R = 1000 times.

I Produce a boxplot of the estimates $\hat{a}_1, \ldots, \hat{a}_R$ and a boxplot of the estimates $\hat{b}_1, \ldots, \hat{b}_R$.

I Calculate the biggest, the smallest and the median value of $\hat{a}_1, \ldots, \hat{a}_R$.

I Calculate the biggest, the smallest and the median value of $\hat{b}_1, \ldots, \hat{b}_R$.

I Repeat the previous steps for bigger sample size and/or for bigger variance of the errors.

Page 72: Statistics, Visualization and More Using 'R' (298.916 ... · Standard probability distributions Loops, if/ifelse, R-functions Correlation Regression in general Linear regression Statistics,

Exercises

Exercise 19:

I In the literature and in bad courses one frequently sees the claim that regression only works if the errors are normally distributed.

I Consider U(−1, 1)-distributed errors using the command error=runif(n,-1,1) and repeat the tasks in Exercise 17 and Exercise 18 for this situation.

I Do we also get good results in this setting?
