using r tool for probability and statistics

15
Probability and Statistics Lab n o 1 Report Nazli Temur - April ,2015 PROBABILITY&STATISTICS - NAZLI TEMUR 1

Upload: universite-nice-sophia-antipolis

Post on 14-Apr-2017

74 views

Category:

Engineering


8 download

TRANSCRIPT

Page 1: Using R Tool for Probability and Statistics

Probability and Statistics Lab no 1 Report Nazli Temur - April ,2015

PROBABILITY&STATISTICS - NAZLI TEMUR �1

Page 2: Using R Tool for Probability and Statistics

IntroductionThis lab includes 5 main exercises that should be completed by the help of R Tool.I achived to complete all the exercises except 5th one and this report includes a small

brief as per exercises along with R codes&outcomes.

Exercise 1

1.1 Generate 3 random vectors of size 10000 from different distributions .

• A uniform distribution between 0 and 1.

unif <-runif(10000,0.0,1.0)

• AnormaldistributionN(0,10)

norm<-rnorm(10000,0,sqrt(10))

• A exponential distribution of parameter λ = 2

rexp(10000,2)

a) What is the number of bins to be used to represent the corresponding histograms according to Sturge’s rule?

Technically, Sturges’ rule is a number-of-bins rule rather than a bin-width rule.

> number_of_bin=log(10000,base=2)+1

> number_of_bin

[1] 14.28771

PROBABILITY&STATISTICS - NAZLI TEMUR �2

n=1+log2 N

Page 3: Using R Tool for Probability and Statistics

b) What is the bin size according to the Normal Reference rule?

For Uniform : ((24*(sd(unif)^2)*sqrt(pi))/10000)^(1/3)

0.0706738

For Normal : ((24*(sd(norm)^2)*sqrt(pi))/10000)^(1/3)

0.3470349

For Exponantial : ((24*(sd(exp)^2)*sqrt(pi))/10000)^(1/3)

0.1013582

c) What is the number of bins for each sample vector you have generated according to the Normal Reference Rule ?

For Uniform :

> unif_n=NULL

> unif_max=length(unif)

> unif_min=0

> unif_n=(unif_max-unif_min)/unif_h

> unit_n [1] 141495.2

PROBABILITY&STATISTICS - NAZLI TEMUR �3

Page 4: Using R Tool for Probability and Statistics

For Normal :

> norm_n=NULL //number

> norm_max=length(norm) // number of elements

> norm_max

[1] 10000

> norm_min=0

> norm_n=(norm_max-norm_min)/norm_h

> norm_n //number of elements divided by width of bin equally gives number of bin

[1] 28815.54

For Exponantial :

> exp_n=NULL

> exp_max=length(exp)

> exp_min=0

> exp_n=(exp_max-exp_min)/exp_h

> exp_n

[1] 98660.04

PROBABILITY&STATISTICS - NAZLI TEMUR �4

Page 5: Using R Tool for Probability and Statistics

d)   Represent the histograms (R is using Sturge’s rule with improvements, hence you can just use hist(X)) , cdfs and boxplots of each random vector.

hist(unif)

boxplot(unif)

plot.ecdf(unif)

hist(norm)

boxplot(norm)

plot.ecdf(norm)

hist(exp)

boxplot(exp)

plot.ecdf(exp)

PROBABILITY&STATISTICS - NAZLI TEMUR �5

Page 6: Using R Tool for Probability and Statistics

1.2 For each random vector, compute the empirical variance and the empirical IQR and plot those pairs in a graph.

Varvector=NULL IQRvector=NULL

for(V in seq(1,1000,by=50)) { + x<-rnorm(1000,0,sqrt(V)) + IQRvector=c(IQRvector,IQR(x)) + Varvector=c(Varvector,var(x)) } plot(IQRvector,Varvector)

PROBABILITY&STATISTICS - NAZLI TEMUR �6

Page 7: Using R Tool for Probability and Statistics

Exercise 2

2. E[1/X] vs. 1/E[X]

Let us consider the family of uniform distributions in the interval [100 − v, 100 + v] for v > 0

2.1. What are the mean/variance of the family?

x=[a,b] //a =100-v b=100+v

E=[a+b]/2 //mean

V= [b-a]^2/12 //variance

E=(100+v-(100-v))/2 =100 it means the mean is not depend the variance of this uniform distribution of interval.

V=((100+v) -(100-v))^2 /12 =(2v)^2/12 = v^2/3 which means, the variance is impacted exponentially depend on the v value.

2.2. For each v ∈ {1, 2, . . . 30}, draw a random vector of size 1000, compute its empirical variance v[X] as well as E[1/X] (simply mean(1/x) in R). Plot the pairs (E[1/X] − 1/E[X],

> for(v in seq(1,30,by=1))

+ { E=(100-v)+(100+v)/2

+ V=((100+v)-(100-v))^2/12

+ Vector_x<-rnorm(1000,E,V)

+ }

> for(v in seq(1,30,by=1))

+ { E=(100-v)+(100+v)/2

PROBABILITY&STATISTICS - NAZLI TEMUR �7

Page 8: Using R Tool for Probability and Statistics

+ V=((100+v)-(100-v))^2/12

+ Vector_y<-rnorm(1000,1/E,V)

+ }

> plot(Vector_x,Vector_y)

Exercise 3

3. Dependence vs. similar distribution

3.1. Draw a random variable X and a random variable Y (both of size 10000) from the same exponen- tial distribution of parameter λ = 2. Plot the qqplot and the scatterplot of X and Y . The scatterplot is simply obtained by plot(X,Y). In the scatterplot, it might be useful to zoom in where the mass is. You can adjust the x-axis (resp. y-axis) between the 10-th and 90-th quantiles of X (resp. Y) with the command :

> X<-rexp(10000,2)> Y<-rexp(10000,2)> plot(X,Y,main="Scatter Plot")> qqplot(X,Y,main="QQ Plot")

PROBABILITY&STATISTICS - NAZLI TEMUR �8

Page 9: Using R Tool for Probability and Statistics

For Adjusment :

> min_x=quantile(X,0.1)> max_x=quantile(X,0.9)> min_y=quantile(Y,0.1)> max_y=quantile(Y,0.9)> X2<-X[X>min_x&X<max_x]> Y2<-Y[Y>min_y&Y<max_y]> plot(X2,Y2,main="Adjusted Scatter Plot")> qqplot(X2,Y2,main="Adjusted QQ Plot")>

3.2. Let Z = log(X) + 5. Plot the qqplot and the scatterplot of X and Z. Comment the results

PROBABILITY&STATISTICS - NAZLI TEMUR �9

Page 10: Using R Tool for Probability and Statistics

The distribution of new vector Z follows the same distribution.We can see this via QQ Plot. and If we try to draw a scatter plot it will look like line because there is a relation between Z and X such that Z=a(x)+c , because a is a log of X vector the line will be convergent like logarithm function.

>Z<-log(X)+5> qqplot(Z,X,main=" QQ Plot X-Z”)> Z2<-log(X2)+5> qqplot(Z2,X2,main="Adjusted QQ Plot X2-Z2")

PROBABILITY&STATISTICS - NAZLI TEMUR �10

Page 11: Using R Tool for Probability and Statistics

Exercise 4

4. Loss Events4.1 Data Cleaning

myfile=scan("~/Desktop/LAB/147.32.125.132.loss.txt")Read 3439 items

min=quantile(myfile,0.1)max=quantile(myfile,0.9)X<-myfileX2<-X[X>min&X<max]X2boxplot(X,X2)

myfile2=scan("~/Desktop/LAB/195.204.26.25.loss.txt")Read 16091 items

min2=quantile(myfile2,0.1)max2=quantile(myfile2,0.9)Y<-myfile2Y2<-Y[Y>min&Y<max]Y2boxplot(Y,Y2)

PROBABILITY&STATISTICS - NAZLI TEMUR �11

Page 12: Using R Tool for Probability and Statistics

4.2 Assessing the exponential hypothesis

4.2.1. For each of the 2 connections (the cleaned versions obtained from the previous question), estimate the parameter of the exponential distribution that should model it.

First File

> myfile=scan("~/Desktop/LAB/147.32.125.132.loss.txt")

> Read 3439 items

> min=quantile(myfile,0.1)

> max=quantile(myfile,0.9)

> X<-myfile

> X2<-X[X>min&X<max]

> Mean_vector_x=NULL

> for(V in seq(1,1000,by=1)) {

+ x<-rnorm(1000,mean(X2),sqrt(var(X2)))

+ y<-sample(x,10)

+ Mean_vector_x<-c(Mean_vector_x,mean(y))

+ }

+ > hist(Mean_vector_x,main=“Sample Means")

+ > plot(Mean_vector_x,main=“Sample Means”)

Second File

PROBABILITY&STATISTICS - NAZLI TEMUR �12

Page 13: Using R Tool for Probability and Statistics

Second File

> myfile2=scan("~/Desktop/LAB/195.204.26.25.loss.txt")

Read 16091 items

> min2=quantile(myfile2,0.1)

> max2=quantile(myfile2,0.9)

> Y<-myfile2

> Y2<-Y[Y>min&Y<max]

> Mean_vector_y=NULL

> for(V in seq(1,1000,by=1)) {

+ x<-rnorm(1000,mean(Y2),sqrt(var(Y2)))

+ y<-sample(x,10)

+ Mean_vector_y<-c(Mean_vector_y,mean(y))

+ }

+ > hist(Mean_vector_y,main=“Sample Means of Second File ")

+ > plot(Mean_vector_y,main=“Sample Means of Second File ")

PROBABILITY&STATISTICS - NAZLI TEMUR �13

Page 14: Using R Tool for Probability and Statistics

4.2.2 For each of the 2 connections, generate a random vector following the exponential distribution of size 1000, represent the qqplot of each vector and the corresponding trace. Comment.

qqplot(Mean_vector_x,Mean_vector_y)

Exercise 5

5. Central limit theorem

• A uniform distribution between 0 and 1. • AnormaldistributionN(0,10) • A exponential distribution of parameter λ = 2

5.1 Report in a table the empirical (resp. theoretical) mean and standard deviation for each random vector (resp. random variable).

5.2 Prove that we are in the conditions of the theorem for each vector.

PROBABILITY&STATISTICS - NAZLI TEMUR �14

Page 15: Using R Tool for Probability and Statistics

5.3 Towards which distribution should ︎(n)(Sn − μ) should converge in each case.

5.4 Represent in a table with three columns (one for each original distribution) and two rows corresponding to: • the histogram of the original distributions • S10

5.5 Report also the empirical mean and standard deviation for S10 for all cases.

PROBABILITY&STATISTICS - NAZLI TEMUR �15