assignment 2 answers

8/2/2019 Assignment 2 Answers


ENCH 701/ENEN 621 Solution for Assignment #2

Question 1 10 points

Given:

Using the dataset below, assume a good/bad criterion that Y > 1 is good and cutoff x0 = 10%.

X, % Y

10.3 1.1

10.5 2.3

10.7 1.5

11 2.1

11 2.3

11.2 1.1

11.3 4.8

11.4 2

11.5 6.6

11.7 3.3

11.8 1.2

11.8 5.3

11.9 5

12 4.6

12.2 4.7


12.2 7.4

12.4 1.3

12.4 6.6

12.7 7.4

13.1 2

13.3 1.2

13.5 19

13.6 3.2

13.8 11

14 2.7

14.1 14

14.2 3.1

14.7 1.5

15 5.6

15.4 16


15.6 3.7

15.6 7.7

15.8 5.5

16 23

16.7 32

19.3 3

8.1 0.01

4.8 0.01

4.8 0.01

8.8 0.01

5.4 0.01

7.3 0.01

5 0.01

6.8 0.02

4.6 0.02


6.9 0.04

4.9 0.04

4.5 0.04

5.2 0.09

9.2 0.1

3.8 0.12

5.7 0.13

4.2 0.13

6.8 0.13

9.9 0.17

8.3 0.19

6.1 0.21

4.4 0.26

6.8 0.26

3.6 0.28


8.3 0.32

9.5 0.4

7.3 0.47

8.2 0.48

6.3 0.51

3.5 0.56

5 0.56

7.6 0.6

9.3 0.66

6.7 0.67

8.5 0.78

8.6 0.91

8.3 0.97

11.9 0.84

11 0.24


4.8 1.8

6.3 1.4

6.7 1.6

7 2.9

7.4 2

7.7 1.3

8.3 1.7

9.3 1.1

9.4 1.8

9.5 1.2

9.9 1.7

Required:

a. Express the probabilities of being in the NW, SE, SW, and NE quadrants as probabilities conditioned on X. 3 pts

b. What is the X cutoff which will equalize the NW and SE errors? Is this value the same as the X cutoff which minimizes the total error? 5 pts

c. Explain why using the cutoff which equalizes the errors would be better than simply assuming Prob(Y > 1) = Prob(NW) + Prob(NE). 2 pts

Solution:

a.

Prob(Y at NW | X < 10) = 11/(11 + 37) = 0.229

Prob(Y at SW | X < 10) = 37/(11 + 37) = 0.771

Prob(Y at NE | X > 10) = 36/(36 + 2) = 0.947

Prob(Y at SE | X > 10) = 2/(36 + 2) = 0.053

b.

At 8.6
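As a cross-check of parts a and b, the quadrant counts and the NW/SE errors can be tallied directly from the dataset. The sketch below is illustrative only: the names `DATA` and `quadrant_counts`, and the choice of trial cutoffs midway between adjacent distinct X values, are assumptions and not part of the original solution.

```python
# Dataset from Question 1 as (X %, Y) pairs, transcribed from the table above.
DATA = [
    (10.3, 1.1), (10.5, 2.3), (10.7, 1.5), (11, 2.1), (11, 2.3), (11.2, 1.1),
    (11.3, 4.8), (11.4, 2), (11.5, 6.6), (11.7, 3.3), (11.8, 1.2), (11.8, 5.3),
    (11.9, 5), (12, 4.6), (12.2, 4.7), (12.2, 7.4), (12.4, 1.3), (12.4, 6.6),
    (12.7, 7.4), (13.1, 2), (13.3, 1.2), (13.5, 19), (13.6, 3.2), (13.8, 11),
    (14, 2.7), (14.1, 14), (14.2, 3.1), (14.7, 1.5), (15, 5.6), (15.4, 16),
    (15.6, 3.7), (15.6, 7.7), (15.8, 5.5), (16, 23), (16.7, 32), (19.3, 3),
    (8.1, 0.01), (4.8, 0.01), (4.8, 0.01), (8.8, 0.01), (5.4, 0.01), (7.3, 0.01),
    (5, 0.01), (6.8, 0.02), (4.6, 0.02), (6.9, 0.04), (4.9, 0.04), (4.5, 0.04),
    (5.2, 0.09), (9.2, 0.1), (3.8, 0.12), (5.7, 0.13), (4.2, 0.13), (6.8, 0.13),
    (9.9, 0.17), (8.3, 0.19), (6.1, 0.21), (4.4, 0.26), (6.8, 0.26), (3.6, 0.28),
    (8.3, 0.32), (9.5, 0.4), (7.3, 0.47), (8.2, 0.48), (6.3, 0.51), (3.5, 0.56),
    (5, 0.56), (7.6, 0.6), (9.3, 0.66), (6.7, 0.67), (8.5, 0.78), (8.6, 0.91),
    (8.3, 0.97), (11.9, 0.84), (11, 0.24), (4.8, 1.8), (6.3, 1.4), (6.7, 1.6),
    (7, 2.9), (7.4, 2), (7.7, 1.3), (8.3, 1.7), (9.3, 1.1), (9.4, 1.8),
    (9.5, 1.2), (9.9, 1.7),
]


def quadrant_counts(data, x0, y0=1.0):
    """Count points per quadrant: N/S by Y > y0, E/W by X > x0."""
    counts = {"NW": 0, "NE": 0, "SW": 0, "SE": 0}
    for x, y in data:
        north_south = "N" if y > y0 else "S"
        east_west = "E" if x > x0 else "W"
        counts[north_south + east_west] += 1
    return counts


# Part a: conditional probabilities at the stated cutoff x0 = 10%.
c = quadrant_counts(DATA, 10.0)
p_nw_given_west = c["NW"] / (c["NW"] + c["SW"])
p_se_given_east = c["SE"] / (c["NE"] + c["SE"])
print(c, round(p_nw_given_west, 3), round(p_se_given_east, 3))

# Part b: sweep trial cutoffs placed midway between adjacent distinct X values
# (an assumed convention, so that no data point sits exactly on a cutoff).
xs = sorted({x for x, _ in DATA})
cutoffs = [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]
errors = {x0: quadrant_counts(DATA, x0) for x0 in cutoffs}
equalizing = [x0 for x0, q in errors.items() if q["NW"] == q["SE"]]
best_total = min(cutoffs, key=lambda x0: errors[x0]["NW"] + errors[x0]["SE"])
print("equalizing cutoff(s):", equalizing)
print("cutoff with smallest NW + SE:", best_total, errors[best_total])
```

On this dataset the counts at x0 = 10 reproduce the 11, 37, 36, and 2 used in part a, and the only trial cutoff with equal NW and SE counts lies between 8.6 and 8.8. The same sweep can be used to check whether the cutoff with the smallest NW + SE total coincides with the equalizing one.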



c.

For the dataset we have, Prob(Y > 1) is best estimated to be 47/86 = 0.55. However, we want to estimate whether Y > 1 for areas where we have not measured Y but have measured X. We know we make mistakes about Y > 1 when we use X. However, by choosing the X cutoff value to be approximately 8.7%, we minimize the error of estimating Prob(Y > 1): when the NW and SE counts are equal, the two kinds of misclassification cancel in the estimated proportion.

Question 2: 5 pts each = 20 points

Given:

The dataset in Question 1

Required:

a) Calculate and plot the sample CDF of each of X, Y, and log(Y) using p_i = (i - 1/2)/N and p_i = i/(N + 1). Plot the sample CDFs using the different probability formulas on the same plot.

   1) Where do the CDFs differ most?

   2) By how much are the deciles and IQR changed in value? (Make a table to show results.)

b) Calculate the sample PDFs and test the results for stability by shifting the bins by 1/2 bin and replotting.

c) Using a probability plot for each of X, Y, and log(Y), conclude whether any of the variables could come from a normal distribution.

d) What characteristic of the Y values has a significant effect on the interpretations of the probability plots for Y and log(Y)?

Solution:

a.

[Figure: sample CDF of X; p versus X, %; curves for p_i = (i - 1/2)/N and p_i = i/(N + 1)]

              X                                Y                                log(Y)
              (i-1/2)/N   i/(N+1)   diff       (i-1/2)/N   i/(N+1)   diff       (i-1/2)/N   i/(N+1)   diff
Median        9.35        9.35      0          1.15        1.15      0          0.06        0.06      0
IQR           5.37        5.4       0.03       2.86        2.89      0.03       1.11        1.13      0.02
1st Decile    4.74        4.71      0.03       0.02        0.02      0          -1.69       -1.74     0.05
2nd Decile    6.17        6.14      0.03       0.13        0.13      0          -0.89       -0.89     0
3rd Decile    7.04        7.01      0.03       0.34        0.33      0.01       -0.47       -0.48     0.01
4th Decile    8.25        8.24      0.01       0.67        0.67      0          -0.17       -0.17     0
5th Decile    9.35        9.35      0          1.15        1.15      0          0.06        0.06      0
6th Decile    10.81       10.82     0.01       1.65        1.66      0.01       0.22        0.22      0
7th Decile    11.78       11.79     0.01       2.27        2.29      0.02       0.36        0.36      0
8th Decile    12.82       12.94     0.12       4.63        4.66      0.03       0.67        0.67      0
9th Decile    14.65       14.79     0.14       6.96        7.12      0.16       0.84        0.85      0.01

[Figure: sample CDF of Y; p versus Y; curves for p_i = (i - 1/2)/N and p_i = i/(N + 1)]

[Figure: sample CDF of log(Y); p versus log(Y); curves for p_i = (i - 1/2)/N and p_i = i/(N + 1)]

The choice of probability formula makes only a small difference for the log(Y) case. The

differences are also quite small for the X and Y cases except for near the endpoints, where they

are somewhat larger.
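The size and location of the difference between the two probability formulas can be checked without the data, since both depend only on the rank i and on N. The following is a minimal sketch for N = 86; the function name and the labels `hazen` and `weibull` (the conventional names for these two plotting positions) are not used in the assignment and are introduced here only for illustration.

```python
def plotting_positions(n):
    """Return both sets of plotting positions for ranks i = 1..n."""
    hazen = [(i - 0.5) / n for i in range(1, n + 1)]   # p_i = (i - 1/2)/N
    weibull = [i / (n + 1) for i in range(1, n + 1)]   # p_i = i/(N + 1)
    return hazen, weibull


n = 86  # number of points in the Question 1 dataset
hazen, weibull = plotting_positions(n)

# Difference between the two formulas at every rank.
diffs = [h - w for h, w in zip(hazen, weibull)]
largest = max(range(n), key=lambda k: abs(diffs[k]))
smallest = min(range(n), key=lambda k: abs(diffs[k]))

print(f"largest  |difference| = {abs(diffs[largest]):.4f} at rank {largest + 1}")
print(f"smallest |difference| = {abs(diffs[smallest]):.4f} at rank {smallest + 1}")
```

Algebraically the difference is (i - (N + 1)/2) / (N(N + 1)), so it vanishes near the median and is largest in magnitude at i = 1 and i = N, which is consistent with the endpoint behaviour noted above.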

b.

Using the rule of thumb, bin size: 5(Xmax - Xmin)/N = 5(19.3 - 3.5)/86 = 0.92, so bin size = 1 might work.

Stability is easier to judge with both histograms on the same plot. Here, we see fairly good

stability; mode locations are in similar positions and general shape is retained.

Once again, stability is good for both of these histograms.
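The stability test amounts to counting the same values twice with the bin edges moved. A minimal sketch follows, assuming a bin width of 1 and a shift of half a bin; the `histogram` helper and the ten-value `sample` (a handful of X values picked from the dataset for brevity) are illustrative choices, not the full calculation.

```python
import math


def histogram(values, width, origin):
    """Count values in bins [origin + k*width, origin + (k + 1)*width)."""
    counts = {}
    for v in values:
        k = math.floor((v - origin) / width)
        left_edge = origin + k * width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


# Rule of thumb from the solution: 5 * (Xmax - Xmin) / N.
x_min, x_max, n = 3.5, 19.3, 86
rule_of_thumb = 5 * (x_max - x_min) / n
print(f"rule-of-thumb bin size = {rule_of_thumb:.2f}")

# A few X values from the dataset, used only to keep the example short.
sample = [3.5, 4.8, 5.0, 6.8, 8.3, 9.9, 10.3, 11.0, 12.2, 19.3]

base = histogram(sample, width=1.0, origin=3.0)      # bins start at 3.0
shifted = histogram(sample, width=1.0, origin=3.5)   # same bins moved by 1/2 bin
print(base)
print(shifted)
```

With the full dataset, the two sets of counts would be plotted together; a histogram is judged stable when the modes and overall shape survive the shift.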



c. X and log(Y) might come from a normal distribution since most of the points fall near a straight line, but there is a problem at the lower end of the datasets. This could be caused either by poor sampling (e.g., the sampling avoided areas with smaller X or Y values, or the measurement apparatus stopped measuring below a minimum amount) or by the variable coming from a mixed distribution.
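The straight-line judgement on a probability plot can be made concrete by pairing each sorted value with the standard normal quantile at its plotting position. The sketch below is one way to do this, assuming the p_i = (i - 1/2)/N formula; the helper names and the use of a correlation coefficient as a rough straightness measure are assumptions added for illustration, and the two synthetic samples are not the assignment data.

```python
import math
from statistics import NormalDist, mean


def normal_probability_plot(values):
    """Return (standard normal quantile, sorted value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def straightness(pairs):
    """Pearson correlation of the plotted points; near 1 means nearly straight."""
    qs, vs = zip(*pairs)
    mq, mv = mean(qs), mean(vs)
    s_qv = sum((q - mq) * (v - mv) for q, v in pairs)
    s_qq = sum((q - mq) ** 2 for q in qs)
    s_vv = sum((v - mv) ** 2 for v in vs)
    return s_qv / math.sqrt(s_qq * s_vv)


# Synthetic check: an exactly normal-shaped sample and a skewed transform of it.
n = 50
normal_sample = [10 + 3 * NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed_sample = [math.exp(v / 3) for v in normal_sample]

r_normal = straightness(normal_probability_plot(normal_sample))
r_skewed = straightness(normal_probability_plot(skewed_sample))
print(f"normal-shaped sample: r = {r_normal:.3f}")
print(f"skewed sample:        r = {r_skewed:.3f}")
```

A single correlation value hides where the departure occurs, so the plot itself is still needed to see a problem confined to the lower end of the data.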

[Figure: probability plot of X; axes labeled NDRV and EDRV]

d. Y appears to have a lower limit on the measurement, which is particularly apparent in the

log(Y) histogram and probability plots. 7 measurements at 0.01 suggest truncation at 0.01. This

same type of problem occurs in time-to-fail testing, when the testing is stopped before all

components fail.

Question 3: 20 points

Required:

[Figure (Question 2c): probability plot of Y; axes labeled NDRV and EDRV]

[Figure (Question 2c): probability plot of log(Y); axes labeled NDRV and EDRV]

Create a 10x10 grid in Excel and label each row i = 1, 2, ..., 10, and each column j = 1, 2, ..., 10. Then assign a random number Sij ~ U(0, 1) and assign a value M(i,j) to each cell, where M = 1 if Sij > p and M = 0 otherwise.

For each set of 100 values of M, check whether there is a percolating blob from left to right. Fill in the following table, based on 10 sets of 100 values of M for each value of p.

p      Number of times there is percolation out of 10 realizations
0.2
0.4
0.6
0.8

Suppose that a thin slice of rock has porosity only as one-sized voids. Given the results above,

what would the porosity have to be for a pathway for fluid flow to exist across the slice?

Solution:



p      porosity = 1 - p    Number of times there is percolation out of 10 realizations
0.2    0.8                 10/10
0.4    0.6                 7/10
0.6    0.4                 2/10
0.8    0.2                 0/10

Porosity should be at least 0.4.
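The Excel experiment can also be reproduced programmatically. The sketch below is one possible implementation, not the method used for the table: it assumes that cells with M = 1 are the open (void) cells, that a blob is a set of open cells joined by shared edges (not diagonals), and that percolation means an open path from the left column to the right column. The function names and the fixed seed are illustrative.

```python
import random
from collections import deque


def random_grid(p, size=10, rng=random):
    """M[i][j] = 1 if S_ij > p, else 0, with S_ij ~ U(0, 1)."""
    return [[1 if rng.random() > p else 0 for _ in range(size)] for _ in range(size)]


def percolates_left_to_right(grid):
    """Breadth-first search from open cells in the left column to the right column."""
    rows, cols = len(grid), len(grid[0])
    queue = deque((r, 0) for r in range(rows) if grid[r][0] == 1)
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        if c == cols - 1:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # edge-sharing neighbours
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 1 and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def count_percolations(p, realizations=10, seed=0):
    """Number of realizations (out of `realizations`) that percolate."""
    rng = random.Random(seed)
    return sum(
        percolates_left_to_right(random_grid(p, rng=rng)) for _ in range(realizations)
    )


for p in (0.2, 0.4, 0.6, 0.8):
    print(f"p = {p}, porosity = {1 - p:.1f}, percolations = {count_percolations(p)}/10")
```

With only 10 realizations per value of p the counts vary noticeably from run to run, so raising `realizations` is a simple way to see how sharply the transition in porosity is defined.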

Question 4: 15 pts derivation, 5 pts MC = 20

Required:

Derive the PDF for Z, where Z = (W + X + Y)/3 and W, X, Y are uniformly distributed with 0 and 1 as endpoints. Compare your answer using Monte Carlo for 30, 300, and 3000 total realizations on a crossplot with the 3 curves, where the Monte Carlo CDF and the theoretical CDF values are on the vertical and horizontal axes.

Solution:

For a uniform distribution:

\[
f_X(x) =
\begin{cases}
\dfrac{1}{b - a}, & a \le x \le b \\
0, & \text{otherwise}
\end{cases}
\]

[Figure (Question 3): number of times there is percolation out of 10 realizations versus porosity]

Assume:

\[
Z' = W + X + Y = 3Z, \qquad S = W + X
\]

a = 0, b = 1

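First, derive the PDF of S = W + X:

\[
f_W(w) =
\begin{cases}
1, & 0 \le w \le 1 \\
0, & \text{otherwise}
\end{cases}
\qquad
f_X(x) =
\begin{cases}
1, & 0 \le x \le 1 \\
0, & \text{otherwise}
\end{cases}
\]

\[
f_S(s) = \int_{-\infty}^{\infty} f_W(w)\, f_X(s - w)\, dw
\]

If 0 ≤ w ≤ 1 and 0 ≤ x ≤ 1 then 0 ≤ s ≤ 2.

For 0 ≤ s ≤ 1:

\[
f_S(s) = \int_{0}^{s} 1 \cdot 1 \, dw = s
\]

For 1 ≤ s ≤ 2:

\[
f_S(s) = \int_{s-1}^{1} 1 \cdot 1 \, dw = 2 - s
\]

[Figure: f_X, f_W versus X, W]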

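Now derive the PDF of Z' = S + Y:

\[
f_S(s) =
\begin{cases}
s, & 0 \le s \le 1 \\
2 - s, & 1 \le s \le 2 \\
0, & \text{otherwise}
\end{cases}
\qquad
f_Y(y) =
\begin{cases}
1, & 0 \le y \le 1 \\
0, & \text{otherwise}
\end{cases}
\]

\[
f_{Z'}(z') = \int_{-\infty}^{\infty} f_S(s)\, f_Y(z' - s)\, ds
\]

If 0 ≤ s ≤ 2 and 0 ≤ y ≤ 1 then 0 ≤ z' ≤ 3.

For 0 ≤ z' ≤ 1:

\[
f_{Z'}(z') = \int_{0}^{z'} s \, ds = \frac{z'^2}{2}
\]

For 1 ≤ z' ≤ 2:

\[
f_{Z'}(z') = \int_{z'-1}^{1} s \, ds + \int_{1}^{z'} (2 - s) \, ds = -z'^2 + 3z' - \frac{3}{2}
\]

For 2 ≤ z' ≤ 3:

\[
f_{Z'}(z') = \int_{z'-1}^{2} (2 - s) \, ds = \frac{z'^2}{2} - 3z' + \frac{9}{2}
\]

So:

\[
f_{Z'}(z') =
\begin{cases}
\dfrac{z'^2}{2}, & 0 \le z' \le 1 \\
-z'^2 + 3z' - \dfrac{3}{2}, & 1 \le z' \le 2 \\
\dfrac{z'^2}{2} - 3z' + \dfrac{9}{2}, & 2 \le z' \le 3 \\
0, & \text{otherwise}
\end{cases}
\]

[Figure: f_S versus S]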

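Since Z' = 3Z, Z = Z'/3 and dZ'/dZ = 3, so

\[
f_Z(z) = f_{Z'}(z') \left| \frac{dz'}{dz} \right| = 3\, f_{Z'}(3z)
\]

\[
f_Z(z) =
\begin{cases}
3\left[\dfrac{(3z)^2}{2}\right], & 0 \le 3z \le 1 \\
3\left[-(3z)^2 + 3(3z) - \dfrac{3}{2}\right], & 1 \le 3z \le 2 \\
3\left[\dfrac{(3z)^2}{2} - 3(3z) + \dfrac{9}{2}\right], & 2 \le 3z \le 3 \\
0, & \text{otherwise}
\end{cases}
\]

or

\[
f_Z(z) =
\begin{cases}
\dfrac{27 z^2}{2}, & 0 \le z \le 1/3 \\
-27 z^2 + 27 z - \dfrac{9}{2}, & 1/3 \le z \le 2/3 \\
\dfrac{27 z^2}{2} - 27 z + \dfrac{27}{2}, & 2/3 \le z \le 1 \\
0, & \text{otherwise}
\end{cases}
\]

[Figure: f_Z' versus z']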

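The CDFs of Z' and Z are:

\[
F_{Z'}(z') =
\begin{cases}
\dfrac{z'^3}{6}, & 0 \le z' \le 1 \\
-\dfrac{z'^3}{3} + \dfrac{3 z'^2}{2} - \dfrac{3 z'}{2} + \dfrac{1}{2}, & 1 \le z' \le 2 \\
\dfrac{z'^3}{6} - \dfrac{3 z'^2}{2} + \dfrac{9 z'}{2} - \dfrac{7}{2}, & 2 \le z' \le 3
\end{cases}
\]

\[
F_Z(z) =
\begin{cases}
\dfrac{(3z)^3}{6}, & 0 \le z \le 1/3 \\
-\dfrac{(3z)^3}{3} + \dfrac{3 (3z)^2}{2} - \dfrac{3 (3z)}{2} + \dfrac{1}{2}, & 1/3 \le z \le 2/3 \\
\dfrac{(3z)^3}{6} - \dfrac{3 (3z)^2}{2} + \dfrac{9 (3z)}{2} - \dfrac{7}{2}, & 2/3 \le z \le 1
\end{cases}
\]

[Figure: f_Z versus z]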

This comparison of the analytical and MC results shows the analytical result is correct and that, as the

number of MC runs increases, the tails are better defined. Even 30 runs is sufficient for the middle part,

but the extremes need more runs to check these low-probability events.
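The crossplot comparison can be sketched in a few lines. The code below assumes the piecewise cubic CDF obtained by integrating the PDF of Z, treats each Monte Carlo realization as one value of Z (one possible reading of "total realizations"), and uses (i - 1/2)/N as the Monte Carlo CDF value; the function names and seed are illustrative choices.

```python
import random


def cdf_z(z):
    """Theoretical CDF of Z = (W + X + Y)/3 for independent U(0, 1) inputs."""
    if z <= 0:
        return 0.0
    if z <= 1 / 3:
        return 9 * z**3 / 2
    if z <= 2 / 3:
        return -9 * z**3 + 27 * z**2 / 2 - 9 * z / 2 + 0.5
    if z <= 1:
        return 9 * z**3 / 2 - 27 * z**2 / 2 + 27 * z / 2 - 3.5
    return 1.0


def crossplot_points(n, seed=0):
    """Pairs (theoretical CDF, Monte Carlo CDF) for n realizations of Z."""
    rng = random.Random(seed)
    zs = sorted((rng.random() + rng.random() + rng.random()) / 3 for _ in range(n))
    return [(cdf_z(z), (i - 0.5) / n) for i, z in enumerate(zs, start=1)]


for n in (30, 300, 3000):
    # Points on the 45-degree line mean the two CDFs agree.
    worst = max(abs(theory - mc) for theory, mc in crossplot_points(n))
    print(f"n = {n:4d}: largest |theoretical - Monte Carlo| = {worst:.3f}")
```

Plotting the pairs returned by `crossplot_points` for the three values of n gives the three curves requested; the largest departure from the 45-degree line should generally shrink as n grows.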

The plots below are not required.

[Figure: CDF versus Z for MC30, MC300, MC3000, and the theoretical CDF]

[Figure: crossplot of MC of 30 realizations versus analytical solution]

[Figure: crossplot of MC of 300 realizations versus analytical solution]

Question 5: 5 pts each = 15 points

Given:

Marvin Moneybags is the chief financial officer of a small oil company, ABC Enterprises. ABC has two oil fields, one in Peru (Field X) and the other in south Texas (Field Y). The 10th, 50th, and 90th percentiles of the oil reserves in X are estimated to be RX,0.1, RX,0.5, RX,0.9, respectively, and RY,0.1, RY,0.5, RY,0.9 for Field Y. To get the overall oil value distribution for the company, Marvin calculates RT,0.1 = RX,0.1 + RY,0.1, and similarly for the 50th and 90th percentiles.

Required:

a) Has Marvin calculated the 10th, 50th, and 90th percentiles of the total asset correctly? Why or why not?

b) If he has not calculated correctly, what are the probabilities of RT,0.1, RT,0.5, and RT,0.9?

c) Justify your answer using an example with Monte Carlo.

Solution:

a) No. Marvin is wrong. Combining events requires calculating the combined probability. For example,

the event that the reserves of both fields X and Y are at the 10% points could be an event with 1%

probability.

b) Since X and Y are independent, and writing RT,p for Marvin's sum RX,p + RY,p:

Prob(RX ≤ RX,p) = p

Prob(RY ≤ RY,p) = p

Prob(RX ≤ RX,p and RY ≤ RY,p) = p·p = p^2, and in that case RT ≤ RT,p, so Prob(RT ≤ RT,p) ≥ p^2

Prob(RX > RX,p) = 1 - p

Prob(RY > RY,p) = 1 - p

Prob(RX > RX,p and RY > RY,p) = (1 - p)(1 - p) = (1 - p)^2, and in that case RT > RT,p, so Prob(RT > RT,p) ≥ (1 - p)^2

Prob(RT ≤ RT,p) = 1 - Prob(RT > RT,p) ≤ 1 - (1 - p)^2

So

RT,0.1 actually has probability somewhere between 0.01 (= 0.1^2) and 0.19 (= 1 - 0.9^2)

RT,0.5 has probability somewhere between 0.25 and 0.75

RT,0.9 has probability somewhere between 0.81 and 0.99

[Figure (Question 4): crossplot of MC of 3000 realizations versus analytical solution]

c)

MC:

200 realizations,

X ~ a triangular function with fX(x) = 2x for 0 ≤ x ≤ 1.

Y ~ uniform(0,1)

T = X + Y

p      Xp      Yp      Tp      Xp + Yp
0.1    0.30    0.08    0.66    0.38
0.5    0.73    0.51    1.17    1.24
0.9    0.95    0.93    1.66    1.88

Xp + Yp at p = 0.1 is 0.38, so the correct 10th percentile, 0.66, is underestimated by adding the 10th percentiles of the components. The 50th and 90th percentiles are overestimated (1.24 versus 1.17 and 1.88 versus 1.66).

[Figure: CDFs of X, Y, and T, with T_lowerlimit and T_upperlimit curves]
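A Monte Carlo check of Marvin's shortcut along the lines of part c can be sketched as follows. It assumes X is sampled by the inverse-CDF method (for fX(x) = 2x on 0 to 1 the CDF is x^2, so X = sqrt(U)), Y ~ U(0, 1), and a simple rank-based percentile; the function names, seed, and percentile convention are illustrative, so the numbers will not match the 200-realization values quoted in the solution exactly.

```python
import random


def percentile(values, p):
    """Simple empirical percentile: the sorted value at rank floor(p * n)."""
    ordered = sorted(values)
    return ordered[min(int(p * len(ordered)), len(ordered) - 1)]


def simulate(n=200, seed=0):
    """Return rows (p, Xp + Yp, Tp, fraction of T at or below Xp + Yp)."""
    rng = random.Random(seed)
    xs = [rng.random() ** 0.5 for _ in range(n)]  # triangular: F(x) = x^2
    ys = [rng.random() for _ in range(n)]          # uniform on (0, 1)
    ts = [x + y for x, y in zip(xs, ys)]           # total T = X + Y
    rows = []
    for p in (0.1, 0.5, 0.9):
        marvin = percentile(xs, p) + percentile(ys, p)   # Marvin's shortcut
        actual_prob = sum(t <= marvin for t in ts) / n   # its true percentile level
        rows.append((p, marvin, percentile(ts, p), actual_prob))
    return rows


for p, marvin, t_p, actual_prob in simulate():
    print(f"p = {p}: Xp + Yp = {marvin:.2f}, Tp = {t_p:.2f}, "
          f"Prob(T <= Xp + Yp) = {actual_prob:.2f}")
```

The last column is the quantity asked for in part b: the probability level that Marvin's sum actually corresponds to, which can be compared with the bounds p^2 and 1 - (1 - p)^2.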

Question 6: 15 points total

Given:

Suppose X has the PDF

\[
f_X(x) =
\begin{cases}
\dfrac{1}{b - a}, & a \le x \le b \\
0, & \text{otherwise}
\end{cases}
\]

and Y = X^2,

Required:

Mathematically evaluate and plot the CDF of Y. (10 pts) On the same plot, show the CDF of Y obtained using Monte Carlo and N = 10, 100, and 200. (5 pts)

Solution:

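Method 1:

Y = X^2, so dY/dX = 2X and X = \sqrt{Y} (taking a ≥ 0 so that the transformation is one-to-one). For a^2 ≤ y ≤ b^2:

\[
f_Y(y) = f_X(x) \left| \frac{dx}{dy} \right| = \frac{1}{b - a} \cdot \frac{1}{2x} = \frac{1}{b - a} \cdot \frac{1}{2\sqrt{y}} = \frac{0.5}{b - a} \cdot \frac{1}{\sqrt{y}}
\]

\[
F_Y(y) = \int_{a^2}^{y} \frac{0.5}{b - a} \cdot \frac{1}{\sqrt{t}} \, dt = \frac{\sqrt{y} - a}{b - a}
\]

Method 2:

\[
F_Y(y) = \text{Prob}(Y \le y) = \text{Prob}(X \le \sqrt{y}) = F_X(x) = \frac{x - a}{b - a} = \frac{y^{0.5} - a}{b - a}
\]

Assume: a = 0, b = 1.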

This comparison of Monte Carlo and analytical solution makes a useful check that the analytical solution is correct. It also shows us that the MC with 10 runs is not sufficient to characterize the CDF. 100 or 200 runs are better, and about 1000 would be even better.
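The effect of the number of runs can be quantified by the largest gap between the Monte Carlo CDF and the analytical one. The sketch below assumes a = 0 and b = 1, the analytical CDF (sqrt(y) - a)/(b - a), and i/N as the empirical CDF value at the i-th sorted sample; the function names and seed are illustrative.

```python
import random


def cdf_y(y, a=0.0, b=1.0):
    """Analytical CDF of Y = X**2 for X ~ U(a, b), assuming 0 <= a < b."""
    if y <= a * a:
        return 0.0
    if y >= b * b:
        return 1.0
    return (y ** 0.5 - a) / (b - a)


def empirical_cdf(n, a=0.0, b=1.0, seed=0):
    """Sorted Monte Carlo values of Y paired with the empirical CDF i/n."""
    rng = random.Random(seed)
    ys = sorted(rng.uniform(a, b) ** 2 for _ in range(n))
    return [(y, i / n) for i, y in enumerate(ys, start=1)]


for n in (10, 100, 200):
    worst = max(abs(p - cdf_y(y)) for y, p in empirical_cdf(n))
    print(f"N = {n:3d}: largest |Monte Carlo - analytical| = {worst:.3f}")
```

Because the gap typically shrinks only in proportion to 1/sqrt(N), going from 200 to about 1000 runs gives a visible but modest further improvement.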

[Figure: CDF versus Y for MC of 10, MC of 100, MC of 200, and the analytical solution]
