assignment 2 answers

Upload: ivonne-navas

Post on 06-Apr-2018


  • 8/2/2019 Assignment 2 Answers


    ENCH 701/ENEN 621 Solution for Assignment #2

    Question 1 10 points

    Given:

    Using the dataset below, assume a good/bad criterion that Y > 1 is good and cutoff x0 = 10%.

    X, %   Y    | X, %   Y    | X, %   Y    | X, %   Y    | X, %   Y    | X, %   Y
    10.3   1.1  | 12.2   7.4  | 15.6   3.7  | 6.9    0.04 | 8.3    0.32 | 4.8    1.8
    10.5   2.3  | 12.4   1.3  | 15.6   7.7  | 4.9    0.04 | 9.5    0.4  | 6.3    1.4
    10.7   1.5  | 12.4   6.6  | 15.8   5.5  | 4.5    0.04 | 7.3    0.47 | 6.7    1.6
    11     2.1  | 12.7   7.4  | 16     23   | 5.2    0.09 | 8.2    0.48 | 7      2.9
    11     2.3  | 13.1   2    | 16.7   32   | 9.2    0.1  | 6.3    0.51 | 7.4    2
    11.2   1.1  | 13.3   1.2  | 19.3   3    | 3.8    0.12 | 3.5    0.56 | 7.7    1.3
    11.3   4.8  | 13.5   19   | 8.1    0.01 | 5.7    0.13 | 5      0.56 | 8.3    1.7
    11.4   2    | 13.6   3.2  | 4.8    0.01 | 4.2    0.13 | 7.6    0.6  | 9.3    1.1
    11.5   6.6  | 13.8   11   | 4.8    0.01 | 6.8    0.13 | 9.3    0.66 | 9.4    1.8
    11.7   3.3  | 14     2.7  | 8.8    0.01 | 9.9    0.17 | 6.7    0.67 | 9.5    1.2
    11.8   1.2  | 14.1   14   | 5.4    0.01 | 8.3    0.19 | 8.5    0.78 | 9.9    1.7
    11.8   5.3  | 14.2   3.1  | 7.3    0.01 | 6.1    0.21 | 8.6    0.91 |
    11.9   5    | 14.7   1.5  | 5      0.01 | 4.4    0.26 | 8.3    0.97 |
    12     4.6  | 15     5.6  | 6.8    0.02 | 6.8    0.26 | 11.9   0.84 |
    12.2   4.7  | 15.4   16   | 4.6    0.02 | 3.6    0.28 | 11     0.24 |

    Required:

    a. Express the probabilities of being in the NW, SE, SW, and NE quadrants as probabilities conditioned on X. 3 pts

    b. What is the X cutoff that equalizes the NW and SE errors? Is this value the same as the X cutoff that minimizes the total error? 5 pts

    c. Explain why using the cutoff that equalizes the errors would be better than simply assuming Prob(Y > 1) = Prob(NW) + Prob(NE). 2 pts

    Solution:

    a.

    Prob(Y at NW | X < 10) = 11/(11 + 37) = 0.229

    Prob(Y at SW | X < 10) = 37/(11 + 37) = 0.771

    Prob(Y at NE | X > 10) = 36/(36 + 2) = 0.947

    Prob(Y at SE | X > 10) = 2/(36 + 2) = 0.053

    b.

    At 8.6
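The quadrant counts behind parts a and b can be checked with a short script. This is a minimal sketch, not the method used in the original solution: the names `RAW`, `PAIRS`, and `quadrant_counts` are introduced here for illustration, the data are transcribed from the Question 1 table, and the trial cutoff of 8.7% is taken from the value quoted in part c.

```python
# X (%) and Y values transcribed from the Question 1 table, flattened as "x y x y ...".
RAW = """
10.3 1.1 10.5 2.3 10.7 1.5 11 2.1 11 2.3 11.2 1.1 11.3 4.8 11.4 2 11.5 6.6 11.7 3.3
11.8 1.2 11.8 5.3 11.9 5 12 4.6 12.2 4.7 12.2 7.4 12.4 1.3 12.4 6.6 12.7 7.4 13.1 2
13.3 1.2 13.5 19 13.6 3.2 13.8 11 14 2.7 14.1 14 14.2 3.1 14.7 1.5 15 5.6 15.4 16
15.6 3.7 15.6 7.7 15.8 5.5 16 23 16.7 32 19.3 3 8.1 0.01 4.8 0.01 4.8 0.01 8.8 0.01
5.4 0.01 7.3 0.01 5 0.01 6.8 0.02 4.6 0.02 6.9 0.04 4.9 0.04 4.5 0.04 5.2 0.09 9.2 0.1
3.8 0.12 5.7 0.13 4.2 0.13 6.8 0.13 9.9 0.17 8.3 0.19 6.1 0.21 4.4 0.26 6.8 0.26 3.6 0.28
8.3 0.32 9.5 0.4 7.3 0.47 8.2 0.48 6.3 0.51 3.5 0.56 5 0.56 7.6 0.6 9.3 0.66 6.7 0.67
8.5 0.78 8.6 0.91 8.3 0.97 11.9 0.84 11 0.24 4.8 1.8 6.3 1.4 6.7 1.6 7 2.9 7.4 2
7.7 1.3 8.3 1.7 9.3 1.1 9.4 1.8 9.5 1.2 9.9 1.7
"""
values = [float(v) for v in RAW.split()]
PAIRS = list(zip(values[0::2], values[1::2]))  # 86 (X, Y) pairs


def quadrant_counts(pairs, x_cut, y_cut=1.0):
    """Count points per quadrant: N/S splits on Y > y_cut, E/W on X > x_cut."""
    counts = {"NW": 0, "NE": 0, "SW": 0, "SE": 0}
    for x, y in pairs:
        north_south = "N" if y > y_cut else "S"
        east_west = "E" if x > x_cut else "W"
        counts[north_south + east_west] += 1
    return counts


at_10 = quadrant_counts(PAIRS, 10.0)
west = at_10["NW"] + at_10["SW"]
east = at_10["NE"] + at_10["SE"]
print(at_10)                                  # NW 11, NE 36, SW 37, SE 2
print(round(at_10["NW"] / west, 3))           # Prob(NW | X < 10) -> 0.229
print(round(at_10["SE"] / east, 3))           # Prob(SE | X > 10) -> 0.053

at_8_7 = quadrant_counts(PAIRS, 8.7)          # trial cutoff, assumed from part c
print(at_8_7["NW"], at_8_7["SE"])             # 7 7: the two error counts are equal
```

Calling `quadrant_counts` over a range of trial cutoffs is one way to scan for the point where the NW and SE counts cross.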


    c.

    For the dataset we have, Prob(Y > 1) is best estimated to be 47/86 = 0.55. However, we want to estimate whether Y > 1 for areas where we have not measured Y but have measured X. We know we make mistakes about Y > 1 when we use X. By choosing the X cutoff value to be approximately 8.7%, however, we minimize the error of estimating Prob(Y > 1).

    Question 2: 5 pts ea =20 points

    Given:

    The dataset in Question 1

    Required:

    a) Calculate and plot the sample CDF of each of X, Y, and log(Y) using p_i = (i - 1/2)/N and p_i = i/(N + 1). Plot the sample CDFs using the different probability formulas on the same plot.
       1) Where do the CDFs differ most?
       2) By how much are the deciles and IQR changed in value? (Make a table to show results.)

    b) Calculate the sample PDFs and test the results for stability by shifting the bins by half a bin width and replotting.

    c) Using a probability plot for each of X, Y, and log(Y), conclude whether any of the variables could come from a normal distribution.

    d) What characteristic of the Y values has a significant effect on the interpretations of the probability plots for Y and log(Y)?

    Solution:

    a.

    [Figure: sample CDF of X; p versus X, %, with curves for p_i = (i - 1/2)/N and p_i = i/(N + 1)]


                  X                               Y                               log(Y)
                  (i - 1/2)/N  i/(N+1)  diff      (i - 1/2)/N  i/(N+1)  diff      (i - 1/2)/N  i/(N+1)  diff
    Median        9.35         9.35     0         1.15         1.15     0         0.06         0.06     0
    IQR           5.37         5.4      0.03      2.86         2.89     0.03      1.11         1.13     0.02
    1st Decile    4.74         4.71     0.03      0.02         0.02     0         -1.69        -1.74    0.05
    2nd Decile    6.17         6.14     0.03      0.13         0.13     0         -0.89        -0.89    0
    3rd Decile    7.04         7.01     0.03      0.34         0.33     0.01      -0.47        -0.48    0.01
    4th Decile    8.25         8.24     0.01      0.67         0.67     0         -0.17        -0.17    0
    5th Decile    9.35         9.35     0         1.15         1.15     0         0.06         0.06     0
    6th Decile    10.81        10.82    0.01      1.65         1.66     0.01      0.22         0.22     0
    7th Decile    11.78        11.79    0.01      2.27         2.29     0.02      0.36         0.36     0
    8th Decile    12.82        12.94    0.12      4.63         4.66     0.03      0.67         0.67     0
    9th Decile    14.65        14.79    0.14      6.96         7.12     0.16      0.84         0.85     0.01

    [Figure: sample CDF of Y; p versus Y, with curves for p_i = (i - 1/2)/N and p_i = i/(N + 1)]

    [Figure: sample CDF of log(Y); p versus log(Y), with curves for p_i = (i - 1/2)/N and p_i = i/(N + 1)]


    The choice of probability formula makes only a small difference for the log(Y) case. The differences are also quite small for the X and Y cases, except near the endpoints, where they are somewhat larger.
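The endpoint behaviour follows directly from the two probability formulas. The sketch below (function and variable names are illustrative, not from the assignment) evaluates both formulas for N = 86 ranks and reports where the gap between them is largest.

```python
def plotting_positions(n):
    """Return both sets of sample-CDF probabilities for ranks i = 1..n."""
    p_half = [(i - 0.5) / n for i in range(1, n + 1)]   # p_i = (i - 1/2)/N
    p_plus1 = [i / (n + 1) for i in range(1, n + 1)]    # p_i = i/(N + 1)
    return p_half, p_plus1


n = 86  # number of points in the Question 1 dataset
p_half, p_plus1 = plotting_positions(n)
gaps = [abs(a - b) for a, b in zip(p_half, p_plus1)]
largest = max(gaps)
# Ranks (1-based) at which the two formulas disagree the most.
ranks_with_largest_gap = [i + 1 for i, g in enumerate(gaps) if abs(g - largest) < 1e-12]
print(ranks_with_largest_gap, round(largest, 4))  # [1, 86] 0.0057
```

The gap shrinks to nearly zero around the median rank, which is consistent with the unchanged median in the table.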

    b.

    Using the rule of thumb, bin size: 5(Xmax - Xmin)/86 = 0.92, so bin size = 1 might work.

    Stability is easier to judge with both histograms on the same plot. Here, we see fairly good

    stability; mode locations are in similar positions and general shape is retained.

    Once again, stability is good for both of these histograms.
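A minimal sketch of the bin-width rule and a shifted-bin comparison follows. The helper names and the short `sample` list are hypothetical and only illustrate the mechanics; the extremes 3.5 and 19.3 are the smallest and largest X values in the dataset.

```python
import math


def bin_width_rule(x_min, x_max, n):
    """Rule of thumb quoted in the solution: 5 (Xmax - Xmin) / N."""
    return 5 * (x_max - x_min) / n


def histogram(values, width, origin):
    """Count values per bin; bin k covers [origin + k*width, origin + (k+1)*width)."""
    counts = {}
    for v in values:
        k = math.floor((v - origin) / width)
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))


width = bin_width_rule(3.5, 19.3, 86)
print(round(width, 2))  # 0.92, so a bin size of 1 is a reasonable choice

# A handful of X values, for illustration only (not the full dataset).
sample = [3.6, 4.4, 4.8, 5.0, 5.2, 6.8, 6.8, 7.3, 8.3, 9.9]
base = histogram(sample, 1.0, origin=3.0)
shifted = histogram(sample, 1.0, origin=2.5)   # same bins moved by half a bin
print(base)
print(shifted)
```

If the mode locations and overall shape survive the half-bin shift, the histogram can be considered stable in the sense used above.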


    c. X and log(Y) might come from a normal distribution, since most of the points fall near a straight line, but there is a problem at the lower end of the datasets. This could be caused either by poor sampling (e.g., the sampling avoided areas with smaller X or Y values, or the measurement apparatus stopped measuring below a minimum amount) or by the variable coming from a mixed distribution.
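One common way to build a normal probability plot is to pair the sorted data with standard normal quantiles at the plotting positions. The sketch below assumes that construction and p_i = (i - 1/2)/N; the function names and the correlation-based straightness score are illustrative additions, not part of the original solution.

```python
from math import exp
from statistics import NormalDist, mean


def normal_scores(n):
    """Standard normal quantiles at p_i = (i - 1/2)/n for i = 1..n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def straightness(sample):
    """Correlation between sorted data and normal scores; near 1 means a near-linear plot."""
    ys = sorted(sample)
    xs = normal_scores(len(ys))
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Synthetic check: exactly normal quantiles plot as a straight line,
# while their exponentials (a lognormal shape) bend away from it.
normal_like = [5 + 2 * z for z in normal_scores(50)]
skewed = [exp(z) for z in normal_scores(50)]
print(round(straightness(normal_like), 4), round(straightness(skewed), 4))
```

Plotting `sorted(sample)` against `normal_scores(len(sample))` gives the visual version of the same check.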

    [Figure: probability plot of X (axes labeled NDRV and EDRV)]


    d. Y appears to have a lower limit on the measurement, which is particularly apparent in the

    log(Y) histogram and probability plots. 7 measurements at 0.01 suggest truncation at 0.01. This

    same type of problem occurs in time-to-fail testing, when the testing is stopped before all

    components fail.

    [Figure: probability plot of Y (axes labeled NDRV and EDRV)]

    [Figure: probability plot of log(Y) (axes labeled NDRV and EDRV)]

    Question 3: 20 points

    Required:


    Create a 10x10 grid in Excel and label each row i = 1, 2, ..., 10, and each column j = 1, 2, ..., 10. Then assign a random number S_ij ~ U(0, 1) and assign a value M(i,j) to each cell, where M = 1 if S_ij > p and M = 0 otherwise.

    For each set of 100 values of M, check whether there is a percolating blob from left to right. Fill

    in the following table, based on 10 sets of 100 values of M for each value of p.

    p      Number of times there is percolation out of 10 realizations
    0.2
    0.4
    0.6
    0.8

    Suppose that a thin slice of rock has porosity only as one-sized voids. Given the results above,

    what would the porosity have to be for a pathway for fluid flow to exist across the slice?

    Solution:


    p      porosity = 1 - p    Number of times there is percolation out of 10 realizations
    0.2    0.8                 1
    0.4    0.6                 7/10
    0.6    0.4                 2/10
    0.8    0.2                 0

    Porosity should be at least 0.4
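The grid experiment can also be run outside Excel. The following is a sketch under stated assumptions: cells with M = 1 are treated as pores, two pores are connected only if they share an edge (4-neighbour connectivity, which the assignment does not specify), and the function names and seed are hypothetical. Counts from a run will vary with the random numbers and need not match the table above.

```python
import random
from collections import deque


def random_grid(p, size=10, rng=random):
    """M = 1 where S > p with S ~ U(0, 1), else 0, so Prob(M = 1) = 1 - p."""
    return [[1 if rng.random() > p else 0 for _ in range(size)] for _ in range(size)]


def percolates_left_to_right(grid):
    """Breadth-first search from open cells in the left column to the right column."""
    n_rows, n_cols = len(grid), len(grid[0])
    queue = deque((r, 0) for r in range(n_rows) if grid[r][0] == 1)
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        if c == n_cols - 1:
            return True
        # Assumed connectivity: edge-sharing neighbours only.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < n_rows and 0 <= cc < n_cols
            if inside and grid[rr][cc] == 1 and (rr, cc) not in seen:
                seen.add((rr, cc))
                queue.append((rr, cc))
    return False


def count_percolating(p, realizations=10, seed=0):
    """Number of realizations (out of `realizations`) with a left-to-right path."""
    rng = random.Random(seed)
    return sum(percolates_left_to_right(random_grid(p, rng=rng)) for _ in range(realizations))


for p in (0.2, 0.4, 0.6, 0.8):
    print(p, round(1 - p, 1), count_percolating(p))
```

Allowing diagonal connections would make percolation more likely at a given porosity, so the connectivity assumption matters when comparing with hand-counted grids.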

    Question 4: 15 pts derivation, 5 pts MC = 20

    Required:

    Derive the PDF for Z, where Z = (W + X + Y)/3 and W, X, Y are uniformly distributed with 0 and 1 as endpoints. Compare your answer using Monte Carlo for 30, 300, and 3000 total realizations on a crossplot with the 3 curves, where the Monte Carlo CDF and the theoretical CDF values are on the vertical and horizontal axes.

    Solution:

    For a uniform distribution:

    $$f_X(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

    [Figure (Question 3): number of times there is percolation out of 10 realizations versus porosity]

    Assume:

    $$Z = \frac{W + X + Y}{3}, \qquad Z' = 3Z = W + X + Y, \qquad S = W + X$$

    a = 0, b = 1

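    First, derive the PDF of S:

    $$f_W(w) = \begin{cases} 1 & 0 \le w \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad f_X(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

    $$f_S(s) = \int_{-\infty}^{\infty} f_W(s - x)\, f_X(x)\, dx, \qquad f_W(s - x) = \begin{cases} 1 & 0 \le s - x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

    If $0 \le W \le 1$ and $0 \le X \le 1$, then $0 \le S \le 2$.

    For $0 \le s \le 1$:

    $$f_S(s) = \int_0^{s} 1 \cdot 1 \, dx = s$$

    For $1 \le s \le 2$:

    $$f_S(s) = \int_{s-1}^{1} 1 \cdot 1 \, dx = 2 - s$$

    [Figure: f_X and f_W versus X, W]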


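    Now derive the PDF of Z' = S + Y:

    $$f_S(s) = \begin{cases} s & 0 \le s \le 1 \\ 2 - s & 1 \le s \le 2 \\ 0 & \text{otherwise} \end{cases} \qquad f_Y(y) = \begin{cases} 1 & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

    $$f_{Z'}(z') = \int_{-\infty}^{\infty} f_Y(z' - s)\, f_S(s)\, ds, \qquad f_Y(z' - s) = \begin{cases} 1 & 0 \le z' - s \le 1 \\ 0 & \text{otherwise} \end{cases}$$

    If $0 \le S \le 2$ and $0 \le Y \le 1$, then $0 \le Z' \le 3$.

    For $0 \le z' \le 1$:

    $$f_{Z'}(z') = \int_0^{z'} s \, ds = \frac{z'^2}{2}$$

    For $1 \le z' \le 2$:

    $$f_{Z'}(z') = \int_{z'-1}^{1} s \, ds + \int_{1}^{z'} (2 - s) \, ds = -z'^2 + 3z' - \frac{3}{2}$$

    For $2 \le z' \le 3$:

    $$f_{Z'}(z') = \int_{z'-1}^{2} (2 - s) \, ds = \frac{9}{2} - 3z' + \frac{z'^2}{2}$$

    [Figure: f_S versus S]

    So:

    $$f_{Z'}(z') = \begin{cases} \dfrac{z'^2}{2} & 0 \le z' \le 1 \\ -z'^2 + 3z' - \dfrac{3}{2} & 1 \le z' \le 2 \\ \dfrac{9}{2} - 3z' + \dfrac{z'^2}{2} & 2 \le z' \le 3 \\ 0 & \text{otherwise} \end{cases}$$

    $$Z' = 3Z, \qquad Z = \frac{Z'}{3}, \qquad \frac{dZ}{dZ'} = \frac{1}{3}, \qquad f_Z(z) = f_{Z'}(z') \left| \frac{dz'}{dz} \right| = 3 f_{Z'}(3z)$$

    $$f_Z(z) = \begin{cases} 3 \cdot \dfrac{(3z)^2}{2} & 0 \le 3z \le 1 \\ 3 \left[ -(3z)^2 + 3(3z) - \dfrac{3}{2} \right] & 1 \le 3z \le 2 \\ 3 \left[ \dfrac{9}{2} - 3(3z) + \dfrac{(3z)^2}{2} \right] & 2 \le 3z \le 3 \\ 0 & \text{otherwise} \end{cases}$$

    or

    $$f_Z(z) = \begin{cases} \dfrac{27 z^2}{2} & 0 \le z \le 1/3 \\ -27 z^2 + 27 z - \dfrac{9}{2} & 1/3 \le z \le 2/3 \\ \dfrac{27}{2} - 27 z + \dfrac{27 z^2}{2} & 2/3 \le z \le 1 \\ 0 & \text{otherwise} \end{cases}$$

    [Figure: f_{Z'} versus z']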


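    The CDFs of Z' and Z are:

    $$F_{Z'}(z') = \begin{cases} \dfrac{z'^3}{6} & 0 \le z' \le 1 \\ \dfrac{3}{2} z'^2 - \dfrac{z'^3}{3} - \dfrac{3}{2} z' + \dfrac{1}{2} & 1 \le z' \le 2 \\ \dfrac{9}{2} z' - \dfrac{3}{2} z'^2 + \dfrac{z'^3}{6} - \dfrac{7}{2} & 2 \le z' \le 3 \end{cases}$$

    $$F_Z(z) = \begin{cases} \dfrac{(3z)^3}{6} & 0 \le z \le 1/3 \\ \dfrac{3}{2} (3z)^2 - \dfrac{(3z)^3}{3} - \dfrac{3}{2} (3z) + \dfrac{1}{2} & 1/3 \le z \le 2/3 \\ \dfrac{9}{2} (3z) - \dfrac{3}{2} (3z)^2 + \dfrac{(3z)^3}{6} - \dfrac{7}{2} & 2/3 \le z \le 1 \end{cases}$$

    [Figure: f_Z versus z]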


    This comparison of the analytical and MC results shows the analytical result is correct and that, as the

    number of MC runs increases, the tails are better defined. Even 30 runs is sufficient for the middle part,

    but the extremes need more runs to check these low-probability events.
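The comparison can be reproduced with a few lines of code. This is a sketch rather than the original spreadsheet workflow: `cdf_z` implements the piecewise theoretical CDF of Z through Z' = 3Z, `max_cdf_gap` is a hypothetical helper, and the empirical CDF is assumed to use p_i = (i - 1/2)/N with an arbitrary seed.

```python
import random


def cdf_z(z):
    """Theoretical CDF of Z = (W + X + Y)/3 for independent U(0, 1) inputs."""
    t = 3.0 * z  # t plays the role of Z' = 3Z
    if t <= 0:
        return 0.0
    if t <= 1:
        return t**3 / 6
    if t <= 2:
        return -(t**3) / 3 + 1.5 * t**2 - 1.5 * t + 0.5
    if t <= 3:
        return t**3 / 6 - 1.5 * t**2 + 4.5 * t - 3.5
    return 1.0


def max_cdf_gap(n, seed=1):
    """Largest |theoretical CDF - empirical CDF| over n Monte Carlo realizations."""
    rng = random.Random(seed)
    zs = sorted((rng.random() + rng.random() + rng.random()) / 3 for _ in range(n))
    return max(abs(cdf_z(z) - (i - 0.5) / n) for i, z in enumerate(zs, start=1))


for n in (30, 300, 3000):
    print(n, round(max_cdf_gap(n), 3))
```

Plotting the empirical probabilities against `cdf_z` of the sorted realizations gives the crossplot requested in the question; points on the 1:1 line indicate agreement.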

    The plots below are not required.

    [Figure: CDF versus Z for MC30, MC300, MC3000, and the theoretical CDF]

    [Figure: crossplot of MC of 30 realizations versus analytical solution]

    [Figure: crossplot of MC of 300 realizations versus analytical solution]


    Question 5: 5 pts each = 15 points

    Given:

    Marvin Moneybags is the chief financial officer of a small oil company, ABC Enterprises. ABC has two oil fields, one in Peru (Field X) and the other in south Texas (Field Y). The 10th, 50th, and 90th percentiles of the oil reserves in X are estimated to be $R_{X,0.1}$, $R_{X,0.5}$, $R_{X,0.9}$, respectively, and $R_{Y,0.1}$, $R_{Y,0.5}$, $R_{Y,0.9}$ for Field Y. To get the overall oil value distribution for the company, Marvin calculates $R_{T,0.1} = R_{X,0.1} + R_{Y,0.1}$, and similarly for the 50th and 90th percentiles.

    Required:

    a) Has Marvin calculated the 10th, 50th, and 90th percentiles of the total asset correctly? Why or why not?

    b) If he has not calculated correctly, what are the probabilities of $R_{T,0.1}$, $R_{T,0.5}$, and $R_{T,0.9}$?

    c) Justify your answer using an example with Monte Carlo.

    Solution:

    a) No. Marvin is wrong. Combining events requires calculating the combined probability. For example,

    the event that the reserves of both fields X and Y are at the 10% points could be an event with 1%

    probability.

    b) Since X and Y are independent:

    $$\mathrm{Prob}(R_X \le R_{X,p}) = p$$

    $$\mathrm{Prob}(R_Y \le R_{Y,p}) = p$$

    $$\mathrm{Prob}(R_T \le R_{T,p}) = p \cdot p = p^2$$

    $$\mathrm{Prob}(R_X > R_{X,p}) = 1 - p$$

    $$\mathrm{Prob}(R_Y > R_{Y,p}) = 1 - p$$

    $$\mathrm{Prob}(R_T > R_{T,p}) = (1 - p)(1 - p) = (1 - p)^2$$

    $$\mathrm{Prob}(R_T \le R_{T,p}) = 1 - \mathrm{Prob}(R_T > R_{T,p}) = 1 - (1 - p)^2$$

    So

    $R_{T,0.1}$ actually has a probability somewhere between 0.01 ($= 0.1^2$) and 0.19 ($= 1 - 0.9^2$)

    $R_{T,0.5}$ has a probability somewhere between 0.25 and 0.75

    $R_{T,0.9}$ has a probability somewhere between 0.81 and 0.99

    [Figure (Question 4): crossplot of MC of 3000 realizations versus analytical solution]

    c)

    MC: 200 realizations, with

    X ~ a triangular distribution with $f_X(x) = 2x$ for $0 \le x \le 1$

    Y ~ uniform(0, 1)

    T = X + Y

    Percentile    X       Y       T       X + Y (percentiles added)
    10th          0.30    0.08    0.66    0.38
    50th          0.73    0.51    1.17    1.24
    90th          0.95    0.93    1.66    1.88

    The correct 10th percentile of T, 0.66, is underestimated by adding the 10th percentiles of the components (0.38).
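A sketch of the same Monte Carlo experiment is given below, using the distributions stated in the solution. The helper names, the seed, the simple rank-based percentile, and the larger run count of 20000 (chosen here to reduce sampling noise relative to the 200 realizations in the solution) are assumptions made for illustration.

```python
import random


def percentile(sorted_values, p):
    """Simple empirical percentile: the value at the rank closest to p*(n - 1)."""
    return sorted_values[round(p * (len(sorted_values) - 1))]


def simulate(n, seed=2):
    rng = random.Random(seed)
    xs = [rng.random() ** 0.5 for _ in range(n)]  # inverse-CDF draw for f_X(x) = 2x on [0, 1]
    ys = [rng.random() for _ in range(n)]         # Y ~ U(0, 1)
    ts = [x + y for x, y in zip(xs, ys)]          # T = X + Y
    return sorted(xs), sorted(ys), sorted(ts)


xs, ys, ts = simulate(20000)
results = {}
for p in (0.1, 0.5, 0.9):
    summed = percentile(xs, p) + percentile(ys, p)       # Marvin's shortcut
    true_t = percentile(ts, p)                           # percentile of the total
    prob_below = sum(t <= summed for t in ts) / len(ts)  # actual probability of the shortcut value
    results[p] = (summed, true_t, prob_below)
    print(p, round(summed, 2), round(true_t, 2), round(prob_below, 3))
```

The last column shows the probability actually attached to the summed percentiles, which can be compared with the bounds given in part b.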

    [Figure: CDFs of X, Y, and T, with T lower-limit and T upper-limit curves; CDF versus X, Y, T]


    Question 6: 15 points total

    Given:

    Suppose X has the PDF

    $$f_X(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

    and $Y = X^2$.

    Required:

    Mathematically evaluate and plot the CDF of Y. (10 pts) On the same plot, show the CDF of Y obtained using Monte Carlo and N = 10, 100, and 200. (5 pts)

    Solution:

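    Method 1:

    $Y = X^2$, so $dY/dX = 2X$ and $X = Y^{0.5}$ (taking $0 \le a < b$ so that the square root is single-valued).

    $$f_Y(y) = f_X(x) \left| \frac{dx}{dy} \right| = \frac{1}{b-a} \cdot \frac{1}{2x} = \frac{1}{b-a} \cdot \frac{1}{2 y^{0.5}} = \frac{0.5\, y^{-0.5}}{b-a}$$

    $$F_Y(y) = \int_{a^2}^{y} \frac{0.5\, u^{-0.5}}{b-a} \, du = \frac{y^{0.5} - a}{b-a}$$

    Method 2:

    $$F_Y(y) = F_X(x) = \frac{x - a}{b - a} = \frac{y^{0.5} - a}{b - a}$$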

    Assume: a = 0, b = 1,


    This comparison of the Monte Carlo and analytical solutions is a useful check that the analytical solution is correct. It also shows that MC with 10 runs is not sufficient to characterize the CDF. 100 or 200 runs are better, and about 1000 would be even better.
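The check can be sketched in code as follows. The analytical CDF uses the form (y^0.5 - a)/(b - a) and assumes 0 <= a < b; the default a = 0, b = 1, the helper names, the seed, and the use of p_i = (i - 1/2)/N for the empirical CDF are illustrative choices.

```python
import random


def cdf_y(y, a=0.0, b=1.0):
    """Analytical CDF of Y = X**2 for X ~ U(a, b), assuming 0 <= a < b."""
    if y <= a * a:
        return 0.0
    if y >= b * b:
        return 1.0
    return (y**0.5 - a) / (b - a)


def max_cdf_gap(n, a=0.0, b=1.0, seed=3):
    """Largest |analytical CDF - empirical CDF| over n Monte Carlo realizations."""
    rng = random.Random(seed)
    ys = sorted(rng.uniform(a, b) ** 2 for _ in range(n))
    return max(abs(cdf_y(y, a, b) - (i - 0.5) / n) for i, y in enumerate(ys, start=1))


for n in (10, 100, 200):
    print(n, round(max_cdf_gap(n), 3))
```

The gap for small N depends strongly on the particular random draws, so repeating the run with different seeds gives a better feel for how many realizations are needed.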

    [Figure: CDF versus Y for MC of 10, MC of 100, MC of 200, and the analytical solution]