assignment 2 answers
8/2/2019 Assignment 2 Answers
ENCH 701/ENEN 621 Solution for Assignment #2
Question 1 10 points
Given:
Using the dataset below, assume a good/bad criterion that Y > 1 is good and cutoff x0 = 10%.
X, % Y
10.3 1.1
10.5 2.3
10.7 1.5
11 2.1
11 2.3
11.2 1.1
11.3 4.8
11.4 2
11.5 6.6
11.7 3.3
11.8 1.2
11.8 5.3
11.9 5
12 4.6
12.2 4.7
X, % Y
12.2 7.4
12.4 1.3
12.4 6.6
12.7 7.4
13.1 2
13.3 1.2
13.5 19
13.6 3.2
13.8 11
14 2.7
14.1 14
14.2 3.1
14.7 1.5
15 5.6
15.4 16
X, % Y
15.6 3.7
15.6 7.7
15.8 5.5
16 23
16.7 32
19.3 3
8.1 0.01
4.8 0.01
4.8 0.01
8.8 0.01
5.4 0.01
7.3 0.01
5 0.01
6.8 0.02
4.6 0.02
X, % Y
6.9 0.04
4.9 0.04
4.5 0.04
5.2 0.09
9.2 0.1
3.8 0.12
5.7 0.13
4.2 0.13
6.8 0.13
9.9 0.17
8.3 0.19
6.1 0.21
4.4 0.26
6.8 0.26
3.6 0.28
X, % Y
8.3 0.32
9.5 0.4
7.3 0.47
8.2 0.48
6.3 0.51
3.5 0.56
5 0.56
7.6 0.6
9.3 0.66
6.7 0.67
8.5 0.78
8.6 0.91
8.3 0.97
11.9 0.84
11 0.24
X, % Y
4.8 1.8
6.3 1.4
6.7 1.6
7 2.9
7.4 2
7.7 1.3
8.3 1.7
9.3 1.1
9.4 1.8
9.5 1.2
9.9 1.7
Required:
a. Express the probabilities of being in the NW, SE, SW, and NE quadrants as probabilities conditioned on X. 3 pts
b. What X cutoff equalizes the NW and SE errors? Is this value the same as the X cutoff which minimizes the total error? 5 pts
c. Explain why using the cutoff which equalizes the errors would be better than simply assuming Prob(Y > 1) = Prob(NW) + Prob(NE). 2 pts
Solution:
a.
Prob(Y at NW | X < 10) = 11/(11 + 37) = 0.229
Prob(Y at SW | X < 10) = 37/(11 + 37) = 0.771
Prob(Y at NE | X > 10) = 36/(36 + 2) = 0.947
Prob(Y at SE | X > 10) = 2/(36 + 2) = 0.053
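As a quick check of the arithmetic in part a, the conditional probabilities can be recomputed from the four quadrant counts. This is a minimal sketch; the function name `conditional_probs` and the dictionary keys are illustrative and not part of the assignment.

```python
def conditional_probs(nw, sw, ne, se):
    """Return quadrant probabilities conditioned on the side of the X cutoff."""
    left = nw + sw    # points with X below the cutoff
    right = ne + se   # points with X above the cutoff
    return {
        "NW|X<x0": nw / left,
        "SW|X<x0": sw / left,
        "NE|X>x0": ne / right,
        "SE|X>x0": se / right,
    }

# Quadrant counts quoted in the solution (cutoff x0 = 10%, good means Y > 1)
probs = conditional_probs(nw=11, sw=37, ne=36, se=2)
for name, value in probs.items():
    print(f"Prob({name}) = {value:.3f}")
```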
b.
At 8.6
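One way to look for the equalizing cutoff is to scan candidate cutoffs and count the two kinds of error at each one. The sketch below assumes the X values transcribed from the Question 1 table, split by whether Y > 1; the names `X_GOOD`, `X_BAD`, and `errors` are illustrative, and candidate cutoffs are placed between data values so that no point sits exactly on a cutoff.

```python
# X values (%) from the Question 1 dataset, split by the good/bad criterion.
X_GOOD = [  # points with Y > 1
    10.3, 10.5, 10.7, 11, 11, 11.2, 11.3, 11.4, 11.5, 11.7, 11.8, 11.8,
    11.9, 12, 12.2, 12.2, 12.4, 12.4, 12.7, 13.1, 13.3, 13.5, 13.6, 13.8,
    14, 14.1, 14.2, 14.7, 15, 15.4, 15.6, 15.6, 15.8, 16, 16.7, 19.3,
    4.8, 6.3, 6.7, 7, 7.4, 7.7, 8.3, 9.3, 9.4, 9.5, 9.9,
]
X_BAD = [  # points with Y <= 1
    8.1, 4.8, 4.8, 8.8, 5.4, 7.3, 5, 6.8, 4.6, 6.9, 4.9, 4.5, 5.2, 9.2,
    3.8, 5.7, 4.2, 6.8, 9.9, 8.3, 6.1, 4.4, 6.8, 3.6, 8.3, 9.5, 7.3, 8.2,
    6.3, 3.5, 5, 7.6, 9.3, 6.7, 8.5, 8.6, 8.3, 11.9, 11,
]

def errors(cutoff):
    """Return (NW, SE) misclassification counts for a given X cutoff."""
    nw = sum(x < cutoff for x in X_GOOD)   # good points that X labels as bad
    se = sum(x > cutoff for x in X_BAD)    # bad points that X labels as good
    return nw, se

# Candidate cutoffs 3.05, 3.15, ..., 19.95 fall between the one-decimal data values.
candidates = [3.05 + 0.1 * k for k in range(170)]
equal = [round(c, 2) for c in candidates if errors(c)[0] == errors(c)[1]]
best = min(candidates, key=lambda c: sum(errors(c)))

print("cutoffs with NW == SE:", equal)
print("cutoff with smallest total error:", round(best, 2), errors(best))
```

Comparing the two printed results addresses the second half of part b: whether the equalizing cutoff is also the cutoff with the smallest total error.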
c.
For the dataset we have, Prob(Y > 1) is best estimated to be 47/86 = 0.55. However, we want to
estimate whether Y > 1 for areas where we have not measured Y but have measured X. We
know we make mistakes about Y > 1 when we use X. However, by choosing the X cutoff value
to be approximately 8.7%, the NW and SE errors offset each other (the number of points with X
above the cutoff, NE + SE, then equals the number with Y > 1, NE + NW), so we minimize the
error of estimating Prob(Y > 1).
Question 2: 5 pts each = 20 points
Given:
The dataset in Question 1
Required:
a) Calculate and plot the sample CDF of each of X, Y, and log(Y) using pi = (i − 1/2)/N and pi =
i/(N + 1). Plot the sample CDFs using the different probability formulas on the same plot.
1) Where do the CDFs differ most?
2) By how much are the deciles and IQR changed in value? (Make a table to show results.)
b) Calculate the sample PDFs and test the results for stability by shifting the bins by half a bin
and replotting.
c) Using a probability plot for each of X, Y, and log(Y), conclude whether any of the variables
could come from a normal distribution.
d) What characteristic of the Y values has a significant effect on the interpretations of the
probability plots for Y and log(Y)?
Solution:
a.
[Figure: sample CDF of X (%), p versus X, using pi = (i − 1/2)/N and pi = i/(N + 1)]
                            X                                Y                             Log(Y)
Statistic     (i-1/2)/N    i/(N+1)       diff  (i-1/2)/N    i/(N+1)       diff  (i-1/2)/N    i/(N+1)       diff
Median             9.35       9.35          0       1.15       1.15          0       0.06       0.06          0
IQR                5.37        5.4       0.03       2.86       2.89       0.03       1.11       1.13       0.02
1st Decile         4.74       4.71       0.03       0.02       0.02          0      -1.69      -1.74       0.05
2nd Decile         6.17       6.14       0.03       0.13       0.13          0      -0.89      -0.89          0
3rd Decile         7.04       7.01       0.03       0.34       0.33       0.01      -0.47      -0.48       0.01
4th Decile         8.25       8.24       0.01       0.67       0.67          0      -0.17      -0.17          0
5th Decile         9.35       9.35          0       1.15       1.15          0       0.06       0.06          0
6th Decile        10.81      10.82       0.01       1.65       1.66       0.01       0.22       0.22          0
7th Decile        11.78      11.79       0.01       2.27       2.29       0.02       0.36       0.36          0
8th Decile        12.82      12.94       0.12       4.63       4.66       0.03       0.67       0.67          0
9th Decile        14.65      14.79       0.14       6.96       7.12       0.16       0.84       0.85       0.01
[Figure: sample CDF of Y, p versus Y, using pi = (i − 1/2)/N and pi = i/(N + 1)]
[Figure: sample CDF of log(Y), p versus log(Y), using pi = (i − 1/2)/N and pi = i/(N + 1)]
The choice of probability formula makes only a small difference for the log(Y) case. The
differences are also quite small for the X and Y cases, except near the endpoints, where they
are somewhat larger.
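The comparison of the two probability formulas can be sketched as follows. This is an illustration only: the formula labels `"midpoint"` and `"mean_rank"`, the linear interpolation between plotted points, and the short sample are assumptions, not the method used to build the table above.

```python
def plotting_positions(n, formula):
    """Return the probabilities p_i, i = 1..n, for the chosen formula."""
    if formula == "midpoint":      # p_i = (i - 1/2) / N
        return [(i - 0.5) / n for i in range(1, n + 1)]
    if formula == "mean_rank":     # p_i = i / (N + 1)
        return [i / (n + 1) for i in range(1, n + 1)]
    raise ValueError(f"unknown formula: {formula}")

def quantile(data, p, formula):
    """Interpolate the sample CDF (sorted data versus p_i) at probability p."""
    xs = sorted(data)
    ps = plotting_positions(len(xs), formula)
    if p <= ps[0]:
        return xs[0]
    if p >= ps[-1]:
        return xs[-1]
    for k in range(1, len(xs)):
        if p <= ps[k]:
            frac = (p - ps[k - 1]) / (ps[k] - ps[k - 1])
            return xs[k - 1] + frac * (xs[k] - xs[k - 1])

# Illustrative sample only (a few X values), not the full 86-point dataset.
sample = [3.5, 4.8, 6.3, 8.3, 9.5, 11.0, 12.4, 14.1, 16.0, 19.3]
for p in (0.1, 0.5, 0.9):
    q_mid = quantile(sample, p, "midpoint")
    q_rank = quantile(sample, p, "mean_rank")
    print(f"p = {p}: midpoint = {q_mid:.2f}, mean rank = {q_rank:.2f}, "
          f"diff = {abs(q_mid - q_rank):.2f}")
```

Both formulas give p = 0.5 at the middle of the ordered sample, so the median is unaffected; the probabilities differ most at the smallest and largest ranks.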
b.
Using the rule of thumb, bin size = 5(Xmax − Xmin)/N = 5(19.3 − 3.5)/86 = 0.92, so a bin size of 1 might work.
Stability is easier to judge with both histograms on the same plot. Here, we see fairly good
stability: mode locations are in similar positions and the general shape is retained.
Once again, stability is good for both of these histograms.
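The stability test can be sketched by binning the same data twice with the bin edges offset by half a bin. The helper names and the short data list below are illustrative assumptions; only the rule of thumb 5(max − min)/N comes from the solution.

```python
import math

def rule_of_thumb_width(data):
    """Bin width from the rule used above: 5 * (max - min) / N."""
    return 5 * (max(data) - min(data)) / len(data)

def histogram(data, width, origin):
    """Count points per bin; bin k covers [origin + k*width, origin + (k+1)*width)."""
    counts = {}
    for x in data:
        k = math.floor((x - origin) / width)
        counts[k] = counts.get(k, 0) + 1
    # Key each bin by its left edge so the two histograms can be compared.
    return {round(origin + k * width, 6): c for k, c in sorted(counts.items())}

# Illustrative subset of X values, not the full dataset.
data = [3.5, 4.8, 5.2, 6.3, 6.8, 7.3, 8.3, 8.6, 9.3, 9.9,
        10.5, 11.0, 11.8, 12.4, 13.5, 14.7, 16.0, 19.3]
width = 1.0
print("original bins:", histogram(data, width, origin=0.0))
print("shifted bins: ", histogram(data, width, origin=-0.5 * width))
```

If the modes and overall shape survive the half-bin shift, the histogram is reasonably stable at that bin width.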
c. X and log(Y) might come from a normal distribution, since most of the points fall near a straight
line, but there is a problem at the lower end of the datasets. This could be caused either by poor
sampling (e.g., the sampling avoided areas with smaller X or Y values, or the measurement
apparatus stopped measuring below a minimum amount) or by the variable coming from a mixed
distribution.
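A probability plot of this kind can be built by pairing each sorted value with the standard normal quantile of its probability. The sketch below assumes pi = (i − 1/2)/N and uses a correlation coefficient as a rough straightness measure; both choices, and the function names, are assumptions rather than the procedure used for the plots in this solution.

```python
from statistics import NormalDist, mean

def probability_plot_points(data):
    """Pair each sorted value with the standard normal quantile of its p_i."""
    xs = sorted(data)
    n = len(xs)
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return zs, xs

def straightness(zs, xs):
    """Pearson correlation between normal quantiles and sorted data (1 = straight line)."""
    mz, mx = mean(zs), mean(xs)
    sxy = sum((z - mz) * (x - mx) for z, x in zip(zs, xs))
    szz = sum((z - mz) ** 2 for z in zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / (szz * sxx) ** 0.5

# Illustrative values only; a run of identical small values bends the lower end.
y = [0.01, 0.01, 0.01, 0.04, 0.13, 0.26, 0.56, 0.97, 1.5, 2.3, 4.8, 7.4, 19.0]
zs, ys = probability_plot_points(y)
print(f"straightness of Y: {straightness(zs, ys):.3f}")
```

A value near 1 means the points lie close to a straight line; the plot itself is still needed to see where the departures occur, such as at the lower end discussed above.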
[Figure: probability plot of X; axes labeled NDRV and EDRV]
d. Y appears to have a lower limit on the measurement, which is particularly apparent in the
log(Y) histogram and probability plots. 7 measurements at 0.01 suggest truncation at 0.01. This
same type of problem occurs in time-to-fail testing, when the testing is stopped before all
components fail.
Question 3: 20 points
Required:
[Figure (Question 2): probability plot of Y; axes labeled NDRV and EDRV]
[Figure (Question 2): probability plot of log(Y); axes labeled NDRV and EDRV]
Create a 10x10 grid in Excel and label each row i = 1, 2, ..., 10, and each column j = 1, 2, ...,
10. Then assign a random number Sij ~ U(0, 1) and assign a value M(i,j) to each cell, where
M = 1 if Sij > p and M = 0 otherwise.
For each set of 100 values of M, check whether there is a percolating blob from left to right. Fill
in the following table, based on 10 sets of 100 values of M for each value of p.
p      Number of times there is percolation out of 10 realizations
0.2
0.4
0.6
0.8
Suppose that a thin slice of rock has porosity only as one-sized voids. Given the results above,
what would the porosity have to be for a pathway for fluid flow to exist across the slice?
Solution:
p      porosity = 1 − p    Number of times there is percolation out of 10 realizations
0.2    0.8                 1
0.4    0.6                 7/10
0.6    0.4                 2/10
0.8    0.2                 0
Porosity should be at least 0.4
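The Excel experiment can also be sketched in code. The version below assumes that cells with M = 1 are the open (pore) cells and that two cells are connected only when they share an edge; the assignment does not state the connectivity rule, so that choice, the function names, and the random seed are assumptions.

```python
import random
from collections import deque

def percolates(grid):
    """True if open cells (value 1) connect the left column to the right column."""
    n_rows, n_cols = len(grid), len(grid[0])
    queue = deque((i, 0) for i in range(n_rows) if grid[i][0] == 1)
    seen = set(queue)
    while queue:
        i, j = queue.popleft()
        if j == n_cols - 1:
            return True
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # edge-sharing neighbors only
            a, b = i + di, j + dj
            if (0 <= a < n_rows and 0 <= b < n_cols
                    and grid[a][b] == 1 and (a, b) not in seen):
                seen.add((a, b))
                queue.append((a, b))
    return False

def count_percolating(p, realizations=10, size=10, rng=random):
    """Number of realizations with a left-to-right path, using M = 1 if S > p."""
    hits = 0
    for _ in range(realizations):
        grid = [[1 if rng.random() > p else 0 for _ in range(size)]
                for _ in range(size)]
        hits += percolates(grid)
    return hits

rng = random.Random(0)
for p in (0.2, 0.4, 0.6, 0.8):
    n_hits = count_percolating(p, rng=rng)
    print(f"p = {p}, porosity = {1 - p:.1f}: {n_hits} of 10 realizations percolate")
```

With only 10 realizations per value of p, the counts vary from run to run, so individual results need not match the table above.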
Question 4: 15 pts derivation, 5 pts MC = 20
Required:
Derive the PDF for Z, where Z = (W + X + Y)/3 and W, X, Y are uniformly distributed with
0 and 1 as endpoints. Compare your answer using Monte Carlo for 30, 300, and 3000 total
realizations on a crossplot with the 3 curves and where the Monte Carlo CDF and the theoretical
CDF values are on the vertical and horizontal axes.
Solution:
[Figure (Question 3 solution): number of times there is percolation out of 10 realizations versus porosity]
For a uniform distribution:
$$f_X(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$
Assume:
$$Z' = W + X + Y, \qquad Z = \frac{Z'}{3} \ \ (Z' = 3Z), \qquad S = W + X$$
a = 0, b = 1
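First, deriving the PDF of S:
$$f_W(w) = \begin{cases} 1, & 0 \le w \le 1 \\ 0, & \text{otherwise} \end{cases} \qquad f_X(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
$$f_S(s) = \int_{-\infty}^{\infty} f_W(w)\, f_X(s - w)\, dw$$
If $0 \le w \le 1$ and $0 \le s - w \le 1$, then $0 \le s \le 2$.
for $0 \le s \le 1$:
$$f_S(s) = \int_{0}^{s} 1 \cdot 1 \, dw = s$$
for $1 \le s \le 2$:
$$f_S(s) = \int_{s-1}^{1} 1 \cdot 1 \, dw = 2 - s$$
[Figure: PDFs f_X and f_W versus X, W]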
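Now deriving the PDF of Z':
$$f_S(s) = \begin{cases} s, & 0 \le s \le 1 \\ 2 - s, & 1 \le s \le 2 \\ 0, & \text{otherwise} \end{cases} \qquad f_Y(y) = \begin{cases} 1, & 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
$$f_{Z'}(z') = \int_{-\infty}^{\infty} f_S(s)\, f_Y(z' - s)\, ds$$
If $0 \le s \le 2$ and $0 \le z' - s \le 1$, then $0 \le z' \le 3$.
for $0 \le z' \le 1$:
$$f_{Z'}(z') = \int_{0}^{z'} s \, ds = \frac{z'^2}{2}$$
for $1 \le z' \le 2$:
$$f_{Z'}(z') = \int_{z'-1}^{1} s \, ds + \int_{1}^{z'} (2 - s) \, ds = -z'^2 + 3z' - \frac{3}{2}$$
for $2 \le z' \le 3$:
$$f_{Z'}(z') = \int_{z'-1}^{2} (2 - s) \, ds = \frac{z'^2}{2} - 3z' + \frac{9}{2}$$
[Figure: PDF f_S versus S]
So: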
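$$f_{Z'}(z') = \begin{cases} \dfrac{z'^2}{2}, & 0 \le z' \le 1 \\ -z'^2 + 3z' - \dfrac{3}{2}, & 1 \le z' \le 2 \\ \dfrac{z'^2}{2} - 3z' + \dfrac{9}{2}, & 2 \le z' \le 3 \\ 0, & \text{otherwise} \end{cases}$$
$$Z' = 3Z, \qquad Z = \frac{Z'}{3}, \qquad \frac{dZ'}{dZ} = 3, \qquad f_Z(z) = f_{Z'}(3z) \left| \frac{dZ'}{dZ} \right| = 3 f_{Z'}(3z)$$
$$f_Z(z) = \begin{cases} 3 \cdot \dfrac{(3z)^2}{2}, & 0 \le 3z \le 1 \\ 3 \left[ -(3z)^2 + 3(3z) - \dfrac{3}{2} \right], & 1 \le 3z \le 2 \\ 3 \left[ \dfrac{(3z)^2}{2} - 3(3z) + \dfrac{9}{2} \right], & 2 \le 3z \le 3 \\ 0, & \text{otherwise} \end{cases}$$
or
$$f_Z(z) = \begin{cases} \dfrac{27 z^2}{2}, & 0 \le z \le 1/3 \\ -27 z^2 + 27 z - \dfrac{9}{2}, & 1/3 \le z \le 2/3 \\ \dfrac{27 z^2}{2} - 27 z + \dfrac{27}{2}, & 2/3 \le z \le 1 \\ 0, & \text{otherwise} \end{cases}$$
[Figure: PDF f_Z' versus z']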
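The CDFs of Z' and Z are:
$$F_{Z'}(z') = \begin{cases} \dfrac{z'^3}{6}, & 0 \le z' \le 1 \\ -\dfrac{z'^3}{3} + \dfrac{3 z'^2}{2} - \dfrac{3 z'}{2} + \dfrac{1}{2}, & 1 \le z' \le 2 \\ \dfrac{z'^3}{6} - \dfrac{3 z'^2}{2} + \dfrac{9 z'}{2} - \dfrac{7}{2}, & 2 \le z' \le 3 \end{cases}$$
$$F_Z(z) = \begin{cases} \dfrac{9 z^3}{2}, & 0 \le z \le 1/3 \\ -9 z^3 + \dfrac{27 z^2}{2} - \dfrac{9 z}{2} + \dfrac{1}{2}, & 1/3 \le z \le 2/3 \\ \dfrac{9 z^3}{2} - \dfrac{27 z^2}{2} + \dfrac{27 z}{2} - \dfrac{7}{2}, & 2/3 \le z \le 1 \end{cases}$$
[Figure: PDF f_Z versus z]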
This comparison of the analytical and MC results shows the analytical result is correct and that, as the
number of MC runs increases, the tails are better defined. Even 30 runs are sufficient for the middle part,
but the extremes need more runs to check these low-probability events.
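The analytical-versus-Monte-Carlo check described above can be sketched as follows. The sketch assumes that N counts realizations of Z (three uniform draws each) and that the Monte Carlo CDF uses pi = (i − 1/2)/N; the function names and the seed are illustrative.

```python
import random

def cdf_z(z):
    """Theoretical CDF of Z = (W + X + Y)/3 for independent U(0, 1) variables."""
    if z <= 0:
        return 0.0
    if z <= 1 / 3:
        return 4.5 * z**3
    if z <= 2 / 3:
        return -9 * z**3 + 13.5 * z**2 - 4.5 * z + 0.5
    if z <= 1:
        return 4.5 * z**3 - 13.5 * z**2 + 13.5 * z - 3.5
    return 1.0

def crossplot_points(n, rng):
    """Return (theoretical CDF, Monte Carlo CDF) pairs for n realizations of Z."""
    zs = sorted((rng.random() + rng.random() + rng.random()) / 3 for _ in range(n))
    return [(cdf_z(z), (i - 0.5) / n) for i, z in enumerate(zs, start=1)]

rng = random.Random(1)
for n in (30, 300, 3000):
    pts = crossplot_points(n, rng)
    worst = max(abs(theory - mc) for theory, mc in pts)
    print(f"N = {n}: largest |theoretical - MC| = {worst:.3f}")
```

On a crossplot, points that follow the 1:1 line indicate agreement between the Monte Carlo and theoretical CDFs.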
The plots below are not required.
[Figure: CDF versus Z for MC30, MC300, MC3000, and the theoretical CDF]
[Figure: crossplot of MC of 30 realizations versus analytical solution]
[Figure: crossplot of MC of 300 realizations versus analytical solution]
Question 5: 5 pts each = 15 points
Given:
Marvin Moneybags is the chief financial officer of a small oil company, ABC Enterprises. ABC
has two oil fields, one in Peru (Field X) and the other in south Texas (Field Y). The 10th, 50th,
and 90th percentiles of the oil reserves in X are estimated to be RX,0.1, RX,0.5, RX,0.9, respectively,
and RY,0.1, RY,0.5, RY,0.9 for Field Y. To get the overall oil value distribution for the company,
Marvin calculates RT,0.1 = RX,0.1 + RY,0.1, and similarly for the 50th and 90th percentiles.
Required:
a) Has Marvin calculated the 10th, 50th, and 90th percentiles of the total asset correctly? Why or
why not?
b) If he has not calculated correctly, what are the probabilities of RT,0.1, RT,0.5, and RT,0.9?
c) Justify your answer using an example with Monte Carlo.
Solution:
a) No. Marvin is wrong. Combining events requires calculating the combined probability. For example,
the event that the reserves of both fields X and Y are at the 10% points could be an event with 1%
probability.
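b) Since X and Y are independent:
Prob(RX ≤ RX,p) = p
Prob(RY ≤ RY,p) = p
If both fields are at or below their p percentiles, then RT ≤ RT,p, so
Prob(RT ≤ RT,p) ≥ p·p = p^2
Prob(RX > RX,p) = 1 − p
Prob(RY > RY,p) = 1 − p
If both fields are above their p percentiles, then RT > RT,p, so
Prob(RT > RT,p) ≥ (1 − p)(1 − p) = (1 − p)^2
Prob(RT ≤ RT,p) = 1 − Prob(RT > RT,p) ≤ 1 − (1 − p)^2
So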
[Figure (Question 4): crossplot of MC of 3000 realizations versus analytical solution]
RT,0.1 actually has a probability somewhere between 0.01 (= 0.1^2) and 0.19 (= 1 − 0.9^2)
RT,0.5 has a probability somewhere between 0.25 and 0.75
RT,0.9 has a probability somewhere between 0.81 and 0.99
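The three ranges above follow from evaluating p^2 and 1 − (1 − p)^2 at p = 0.1, 0.5, and 0.9. A minimal sketch of that arithmetic, with an illustrative function name:

```python
def total_percentile_bounds(p):
    """Range for Prob(RT <= RX,p + RY,p) when the two fields are independent."""
    lower = p ** 2                # both fields at or below their p percentile
    upper = 1 - (1 - p) ** 2      # complement of both fields above it
    return lower, upper

for p in (0.1, 0.5, 0.9):
    lower, upper = total_percentile_bounds(p)
    print(f"p = {p}: between {lower:.2f} and {upper:.2f}")
```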
c)
MC:
200 realizations
X ~ a triangular distribution with fX(x) = 2x for 0 ≤ x ≤ 1
Y ~ uniform(0, 1)
T = X + Y
X0.1 = 0.30
Y0.1 = 0.08
T0.1 = 0.66
X0.1 + Y0.1 = 0.38, so adding the 10th percentiles of the components underestimates the correct
10th percentile, 0.66.
X0.5 = 0.73
Y0.5 = 0.51
T0.5 = 1.17
X0.5 + Y0.5 = 1.24
X0.9 = 0.95
Y0.9 = 0.93
T0.9 = 1.66
X0.9 + Y0.9 = 1.88
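The Monte Carlo example can be reproduced along these lines. This sketch draws X by the inverse-CDF method (for fX(x) = 2x the CDF is x^2, so X = sqrt(U)) and uses many more realizations than the 200 above to reduce sampling noise; the `percentile` helper, the seed, and the sample size are assumptions, so the printed values will not match the 200-run figures exactly.

```python
import random

def percentile(values, p):
    """Simple sample percentile: the value at rank int(p * N) in the sorted list."""
    xs = sorted(values)
    k = min(len(xs) - 1, int(p * len(xs)))
    return xs[k]

rng = random.Random(2)
n = 20000  # assumption: larger than the 200 realizations quoted above
x = [rng.random() ** 0.5 for _ in range(n)]  # inverse CDF of fX(x) = 2x on [0, 1]
y = [rng.random() for _ in range(n)]         # uniform(0, 1)
t = [a + b for a, b in zip(x, y)]            # total T = X + Y

for p in (0.1, 0.5, 0.9):
    xp, yp, tp = percentile(x, p), percentile(y, p), percentile(t, p)
    print(f"p = {p}: X = {xp:.2f}, Y = {yp:.2f}, T = {tp:.2f}, X + Y = {xp + yp:.2f}")
```

The sum of the component percentiles falls below the true percentile of T at the low end and above it at the high end, which is the pattern reported in the example.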
[Figure: CDFs of X, Y, and T, with T lower limit and T upper limit]
Question 6: 15 points total
Given:
Suppose X has the PDF
$$f_X(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$
and Y = X^2.
Required:
Mathematically evaluate and plot the CDF of Y. (10 pts) On the same plot, show the CDF of Y
obtained using Monte Carlo and N = 10, 100, and 200. (5 pts)
Solution:
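Method 1:
$$Y = X^2, \qquad \frac{dY}{dx} = 2x, \qquad X = \sqrt{Y}$$
$$f_Y(y) = f_X(x) \left| \frac{dx}{dy} \right| = \frac{1}{b - a} \cdot \frac{1}{2x} = \frac{1}{b - a} \cdot \frac{1}{2\sqrt{y}} = \frac{0.5}{(b - a)\sqrt{y}}$$
$$F_Y(y) = \int_{a^2}^{y} \frac{0.5}{(b - a)\sqrt{u}} \, du = \frac{\sqrt{y} - a}{b - a}, \qquad a^2 \le y \le b^2 \ \ (a \ge 0)$$
Method 2:
$$F_Y(y) = \text{Prob}(Y \le y) = \text{Prob}(X \le \sqrt{y}) = F_X(x) = \frac{x - a}{b - a} = \frac{y^{0.5} - a}{b - a}$$
Assume: a = 0, b = 1,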
This comparison of the Monte Carlo and analytical solutions is a useful check that the analytical
solution is correct. It also shows us that MC with 10 runs is not sufficient to characterize the
CDF. 100 or 200 runs are better, and about 1000 would be even better.
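The effect of the number of runs can be sketched by measuring the largest gap between the Monte Carlo CDF and the analytical CDF of Y = X^2. The sketch assumes 0 ≤ a < b, uses pi = (i − 1/2)/N for the Monte Carlo CDF, and fixes an arbitrary seed; the function names are illustrative.

```python
import random

def cdf_y(y, a=0.0, b=1.0):
    """Analytical CDF of Y = X**2 for X ~ U(a, b), assuming 0 <= a < b."""
    if y <= a ** 2:
        return 0.0
    if y >= b ** 2:
        return 1.0
    return (y ** 0.5 - a) / (b - a)

def max_gap(n, rng, a=0.0, b=1.0):
    """Largest |analytical CDF - Monte Carlo CDF| over n simulated values of Y."""
    ys = sorted(rng.uniform(a, b) ** 2 for _ in range(n))
    return max(abs(cdf_y(y, a, b) - (i - 0.5) / n)
               for i, y in enumerate(ys, start=1))

rng = random.Random(3)
for n in (10, 100, 200, 1000):
    print(f"N = {n}: largest gap = {max_gap(n, rng):.3f}")
```

The gap typically shrinks as N grows, although any single run with a small N can look better or worse than expected.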
[Figure: CDF of Y from MC of 10, MC of 100, MC of 200, and the analytical solution]