using r tool for probability and statistics

Probability and Statistics Lab no 1 Report Nazli Temur - April ,2015

PROBABILITY&STATISTICS - NAZLI TEMUR �1

IntroductionThis lab includes 5 main exercises that should be completed by the help of R Tool.I achived to complete all the exercises except 5th one and this report includes a small

brief as per exercises along with R codes&outcomes.

Exercise 1

1.1 Generate 3 random vectors of size 10000 from different distributions .

• A uniform distribution between 0 and 1.

unif <-runif(10000,0.0,1.0)

• AnormaldistributionN(0,10)

norm<-rnorm(10000,0,sqrt(10))

• A exponential distribution of parameter λ = 2

rexp(10000,2)

a) What is the number of bins to be used to represent the corresponding histograms according to Sturge’s rule?

Technically, Sturges’ rule is a number-of-bins rule rather than a bin-width rule.

> number_of_bin=log(10000,base=2)+1

> number_of_bin

[1] 14.28771


n=1+log2 N

b) What is the bin size according to the Normal Reference rule?

For Uniform : ((24*(sd(unif)^2)*sqrt(pi))/10000)^(1/3)

0.0706738

For Normal : ((24*(sd(norm)^2)*sqrt(pi))/10000)^(1/3)

0.3470349

For Exponantial : ((24*(sd(exp)^2)*sqrt(pi))/10000)^(1/3)

0.1013582

c) What is the number of bins for each sample vector you have generated according to the Normal Reference Rule ?

For Uniform :

> unif_n=NULL

> unif_max=length(unif)

> unif_min=0

> unif_n=(unif_max-unif_min)/unif_h

> unit_n [1] 141495.2


For Normal :

> norm_n=NULL //number

> norm_max=length(norm) // number of elements

> norm_max

[1] 10000

> norm_min=0

> norm_n=(norm_max-norm_min)/norm_h

> norm_n //number of elements divided by width of bin equally gives number of bin

[1] 28815.54

For Exponantial :

> exp_n=NULL

> exp_max=length(exp)

> exp_min=0

> exp_n=(exp_max-exp_min)/exp_h

> exp_n

[1] 98660.04


d) Represent the histograms (R is using Sturge’s rule with improvements, hence you can just use hist(X)) , cdfs and boxplots of each random vector.

hist(unif)

boxplot(unif)

plot.ecdf(unif)

hist(norm)

boxplot(norm)

plot.ecdf(norm)

hist(exp)

boxplot(exp)

plot.ecdf(exp)


1.2 For each random vector, compute the empirical variance and the empirical IQR and plot those pairs in a graph.

Varvector=NULL IQRvector=NULL

for(V in seq(1,1000,by=50)) { + x<-rnorm(1000,0,sqrt(V)) + IQRvector=c(IQRvector,IQR(x)) + Varvector=c(Varvector,var(x)) } plot(IQRvector,Varvector)


Exercise 2

2. E[1/X] vs. 1/E[X]

Let us consider the family of uniform distributions in the interval [100 − v, 100 + v] for v > 0

2.1. What are the mean/variance of the family?

x=[a,b] //a =100-v b=100+v

E=[a+b]/2 //mean

V= [b-a]^2/12 //variance

E=(100+v-(100-v))/2 =100 it means the mean is not depend the variance of this uniform distribution of interval.

V=((100+v) -(100-v))^2 /12 =(2v)^2/12 = v^2/3 which means, the variance is impacted exponentially depend on the v value.

2.2. For each v ∈ {1, 2, . . . 30}, draw a random vector of size 1000, compute its empirical variance v[X] as well as E[1/X] (simply mean(1/x) in R). Plot the pairs (E[1/X] − 1/E[X],

> for(v in seq(1,30,by=1))

+ { E=(100-v)+(100+v)/2

+ V=((100+v)-(100-v))^2/12

+ Vector_x<-rnorm(1000,E,V)

+ }

> for(v in seq(1,30,by=1))

+ { E=(100-v)+(100+v)/2


+ V=((100+v)-(100-v))^2/12

+ Vector_y<-rnorm(1000,1/E,V)

+ }

> plot(Vector_x,Vector_y)

Exercise 3

3. Dependence vs. similar distribution

3.1. Draw a random variable X and a random variable Y (both of size 10000) from the same exponential distribution of parameter λ = 2. Plot the qqplot and the scatterplot of X and Y . The scatterplot is simply obtained by plot(X,Y). In the scatterplot, it might be useful to zoom in where the mass is. You can adjust the x-axis (resp. y-axis) between the 10-th and 90-th quantiles of X (resp. Y) with the command :

> X<-rexp(10000,2)> Y<-rexp(10000,2)> plot(X,Y,main="Scatter Plot")> qqplot(X,Y,main="QQ Plot")


For Adjusment :

> min_x=quantile(X,0.1)> max_x=quantile(X,0.9)> min_y=quantile(Y,0.1)> max_y=quantile(Y,0.9)> X2<-X[X>min_x&X<max_x]> Y2<-Y[Y>min_y&Y<max_y]> plot(X2,Y2,main="Adjusted Scatter Plot")> qqplot(X2,Y2,main="Adjusted QQ Plot")>

3.2. Let Z = log(X) + 5. Plot the qqplot and the scatterplot of X and Z. Comment the results


The distribution of new vector Z follows the same distribution.We can see this via QQ Plot. and If we try to draw a scatter plot it will look like line because there is a relation between Z and X such that Z=a(x)+c , because a is a log of X vector the line will be convergent like logarithm function.

>Z<-log(X)+5> qqplot(Z,X,main=" QQ Plot X-Z”)> Z2<-log(X2)+5> qqplot(Z2,X2,main="Adjusted QQ Plot X2-Z2")


Exercise 4

4. Loss Events4.1 Data Cleaning

myfile=scan("~/Desktop/LAB/147.32.125.132.loss.txt")Read 3439 items

min=quantile(myfile,0.1)max=quantile(myfile,0.9)X<-myfileX2<-X[X>min&X<max]X2boxplot(X,X2)

myfile2=scan("~/Desktop/LAB/195.204.26.25.loss.txt")Read 16091 items

min2=quantile(myfile2,0.1)max2=quantile(myfile2,0.9)Y<-myfile2Y2<-Y[Y>min&Y<max]Y2boxplot(Y,Y2)


4.2 Assessing the exponential hypothesis

4.2.1. For each of the 2 connections (the cleaned versions obtained from the previous question), estimate the parameter of the exponential distribution that should model it.

First File

> myfile=scan("~/Desktop/LAB/147.32.125.132.loss.txt")

> Read 3439 items

> min=quantile(myfile,0.1)

> max=quantile(myfile,0.9)

> X<-myfile

> X2<-X[X>min&X<max]

> Mean_vector_x=NULL

> for(V in seq(1,1000,by=1)) {

+ x<-rnorm(1000,mean(X2),sqrt(var(X2)))

+ y<-sample(x,10)

+ Mean_vector_x<-c(Mean_vector_x,mean(y))

+ }

+ > hist(Mean_vector_x,main=“Sample Means")

+ > plot(Mean_vector_x,main=“Sample Means”)

Second File


Second File

> myfile2=scan("~/Desktop/LAB/195.204.26.25.loss.txt")

Read 16091 items

> min2=quantile(myfile2,0.1)

> max2=quantile(myfile2,0.9)

> Y<-myfile2

> Y2<-Y[Y>min&Y<max]

> Mean_vector_y=NULL

> for(V in seq(1,1000,by=1)) {

+ x<-rnorm(1000,mean(Y2),sqrt(var(Y2)))

+ y<-sample(x,10)

+ Mean_vector_y<-c(Mean_vector_y,mean(y))

+ }

+ > hist(Mean_vector_y,main=“Sample Means of Second File ")

+ > plot(Mean_vector_y,main=“Sample Means of Second File ")


4.2.2 For each of the 2 connections, generate a random vector following the exponential distribution of size 1000, represent the qqplot of each vector and the corresponding trace. Comment.

qqplot(Mean_vector_x,Mean_vector_y)

Exercise 5

5. Central limit theorem

• A uniform distribution between 0 and 1. • AnormaldistributionN(0,10) • A exponential distribution of parameter λ = 2

5.1 Report in a table the empirical (resp. theoretical) mean and standard deviation for each random vector (resp. random variable).

5.2 Prove that we are in the conditions of the theorem for each vector.


5.3 Towards which distribution should ︎(n)(Sn − μ) should converge in each case.

5.4 Represent in a table with three columns (one for each original distribution) and two rows corresponding to: • the histogram of the original distributions • S10

5.5 Report also the empirical mean and standard deviation for S10 for all cases.


using r tool for probability and statistics

Engineering