parametric distributions - trinity college dublin
Post on 20-Dec-2021
1
Parametric Distributions
2
Definitions
• A random variable, X, is a map from the
result of an experiment or observation to the
real numbers.
• The cumulative distribution function of a
random variable is defined through the
probability measure as $F_X(z) = P(X \le z)$.
• This is often written F(z).
3
Properties of F
• F() is non-decreasing.
• F(z) tends to 0 as z → −∞ and increases to 1
as z → +∞.
• Note that F() is right continuous.
• For any such F(), a random variable with F() as
its cdf can be constructed. (Skorokhod Representation)
4
• Where F(x) can be written as the integral
from minus infinity to x of some function,
f(z),
– Then f(z) is termed a density (or pdf).
– Where this is expressible as a discrete sum, the
discrete function f(j) is also termed a pdf (more precisely, a probability mass function).
• A pdf will tell us which values of the R.V.
are most likely.
5
Important Note
• Thus, this idea is very general.
• Lots of F()s are possible.
• A closed functional form for F() and f() is not required.
• Exercise:
– Draw some ‘possible’ cdfs.
– Check that they fulfil the conditions.
– A note about empirical cdfs.
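One way to approach this exercise in R is sketched below. The choice of the standard Normal cdf as the candidate, the grid limits, and the simulated sample of 50 points are all assumptions made for illustration; ecdf() is base R's empirical cdf.

```r
# Candidate cdf: the standard Normal cdf, evaluated on a grid (grid limits are arbitrary).
z <- seq(-6, 6, length.out = 1001)
F_z <- pnorm(z)

# Check the conditions numerically on the grid.
is_nondecreasing <- all(diff(F_z) >= 0)   # never goes down
left_value <- F_z[1]                      # should be close to 0
right_value <- F_z[length(F_z)]           # should be close to 1

# Empirical cdf of an illustrative simulated sample: a right-continuous step function.
set.seed(1)
sample_x <- rnorm(50)
F_hat <- ecdf(sample_x)

# Draw the candidate cdf (black) and the empirical cdf (red).
plot(z, F_z, type = "l", xlab = "z", ylab = "F(z)")
lines(z, F_hat(z), type = "s", col = 2)
```

The empirical cdf jumps by 1/50 at each observation, so it satisfies the same conditions as any other cdf.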
6
Example
• A lecturer is thinking of doing building work in his house, but is waiting to hear about the profits from a venture he was involved in before deciding whether to proceed.
• He knows he will get at least €10,000 net from the project.
• Things are going well, and it is likely that the actual returns will be around €20,000.
• There is an outside chance that €40,000 could be returned, but this is unlikely.
7
Example II
• A lecturer takes about 22 minutes to cycle
to work.
• On a good day, and pedaling hard, he can
make it in 15 minutes. The fastest he has
done it is 12 minutes.
• It would take 90 minutes to walk, so this is
a realistic upper bound for cycling.
8
Example III
• An SS MSISS is going on a J1 trip to America for
90 days.
• They expect to “socialise” with friends about three
times a week.
• Socialising results in a severe
hangover about 25% of the time.
• How many paracetamol are likely required for the
duration of the visit? (Assume 1 tablet per
hangover)
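A rough sketch of how this question might be tackled in R follows. It assumes that 90 days at three outings a week rounds to 39 outings, and that each outing independently leads to a hangover with probability 0.25, so the number of tablets is Binomial. Both assumptions are illustrative simplifications, not part of the slide.

```r
# Assumed number of outings: 90 days is 90/7 weeks, at three outings per week.
n_outings <- round(90 / 7 * 3)
p_hangover <- 0.25

# Expected number of tablets under the Binomial assumption (n * p).
expected_tablets <- n_outings * p_hangover

# Smallest number of tablets that covers the trip with probability at least 0.95.
tablets_95 <- qbinom(0.95, size = n_outings, prob = p_hangover)

# Full distribution of the number of tablets needed.
tablets <- 0:n_outings
prob_tablets <- dbinom(tablets, size = n_outings, prob = p_hangover)
plot(tablets, prob_tablets, type = "h", xlab = "tablets", ylab = "probability")
```

Packing only the expected number would leave a substantial chance of running out, which is why the quantile is the more useful summary here.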
9
Parametric Forms
• Over the years, mathematicians have examined
functions that have the properties described.
• Many of these have arisen through considering
combinations of other simple functions.
• These functions have parameters, which can be
modified to change the shape of the curve.
• However, the overall functional form stays the
same.
10
Advantages of Parametric Distributions
• Properties and behaviours well understood.
• Moments can readily be calculated.
• Black box software available.
• Can readily communicate models to colleagues.
• Sufficiently flexible for most purposes.
• As realistic as empirical functions and may be more physically justifiable.
11
Disadvantages
• May not exactly match application (ease of
use vs tool availability compromise.)
• Results may be sensitive to distributional
assumptions.
• Sometimes easy to program without a full
understanding of what is going on –
downside of black box.
12
Some Models
• Bernoulli - Br(x|q) - dbinom(size = 1)
• Binomial - Bi(x|q,n) - dbinom()
• Poisson - Pn(x|l) - dpois()
• Beta - Be(x|a,b) - dbeta()
• Uniform - Un(x|a,b) - dunif()
• Gamma - Ga(x|a,b) - dgamma()
• Exponential - Ex(x|q) - dexp()
• Normal - N(x|m,s) - dnorm()
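The R functions listed above are the density versions. Each family also has p (cdf), q (quantile) and r (random draw) versions in base R. A short sketch, with parameter values picked arbitrarily for illustration:

```r
# The four prefixes, shown for the Normal family.
dens_at_0 <- dnorm(0, mean = 0, sd = 1)    # density at 0
cdf_at_2 <- pnorm(2, mean = 0, sd = 1)     # P(X <= 2)
q_975 <- qnorm(0.975, mean = 0, sd = 1)    # 97.5% quantile
set.seed(1)
draws <- rnorm(5, mean = 0, sd = 1)        # five random draws

# Bernoulli is Binomial with size = 1: P(X = 1) equals the success probability.
bern_p1 <- dbinom(1, size = 1, prob = 0.3)

# Exponential is Gamma with shape 1 (same rate).
exp_dens <- dexp(2, rate = 1.5)
gamma_dens <- dgamma(2, shape = 1, rate = 1.5)
```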
13
Binomial
• Bi(x|q,n)
• Pdf f(x) =
– $\binom{n}{x} q^{x} (1-q)^{n-x}$, x = 0, 1, ..., n
• E(x) = nq
• Var(x) = nq(1-q)
• Graph for n=9 and
q=0.5.
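A sketch of how the graph for n=9 and q=0.5 might be reproduced in R, checking dbinom() against the formula written out by hand and the moments against nq and nq(1-q):

```r
n <- 9
q <- 0.5
x <- 0:n

# Density from base R and from the formula on the slide.
f <- dbinom(x, size = n, prob = q)
f_by_hand <- choose(n, x) * q^x * (1 - q)^(n - x)

# Moments computed directly from the density.
mean_x <- sum(x * f)                 # should equal n * q
var_x <- sum((x - mean_x)^2 * f)     # should equal n * q * (1 - q)

# Spike plot, since the distribution is discrete.
plot(x, f, type = "h", xlab = "x", ylab = "f(x)")
```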
14
Binomial
• Cdf
• This is a step function,
since X can only take
integer values.
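The step behaviour can be seen directly in R. This sketch reuses n=9 and q=0.5 from the previous slide and builds the cdf with stepfun() from base R; the object name F_step is just a label chosen here.

```r
n <- 9
q <- 0.5

# cdf at the integers 0..n; it is the running sum of the density.
cdf_at_integers <- pbinom(0:n, size = n, prob = q)

# Between integers the cdf is flat: F(3.7) is the same as F(3).
flat_between <- pbinom(3.7, size = n, prob = q) == pbinom(3, size = n, prob = q)

# Right-continuous step function: 0 below 0, then jumps at each integer.
F_step <- stepfun(0:n, c(0, cdf_at_integers))
plot(F_step, verticals = FALSE, main = "", xlab = "x", ylab = "F(x)")
```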
15
Normal (Gaussian)
• N(x|m,s)
• Pdf f(x) =
– $c \exp\{-0.5\, s^{-2} (x-m)^2\}$, with $c = 1/(s\sqrt{2\pi})$
• E(x) = m
• Var(x) = $s^2$
• Graph for m=0 and
s=1.0.
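A sketch of the graph for m=0 and s=1 in R, with the density written out by hand and compared against dnorm(). The plotting range of -4 to 4 is an assumption; the slide does not state it.

```r
m <- 0
s <- 1
x <- seq(-4, 4, length.out = 801)

# Normalising constant and density written out from the formula.
c_norm <- 1 / (s * sqrt(2 * pi))
f_by_hand <- c_norm * exp(-0.5 * (x - m)^2 / s^2)

# Base R density; note that dnorm() takes the standard deviation, not the variance.
f <- dnorm(x, mean = m, sd = s)

# The density should integrate to 1 over the real line.
total_mass <- integrate(dnorm, -Inf, Inf, mean = m, sd = s)$value

plot(x, f, type = "l", xlab = "x", ylab = "f(x)")
```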
16
Normal
• Cdf
• This is smooth since
the underlying rv is
continuous.
• Note that neither 0 nor
1 is reached in the
plotted region.
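The same point can be checked numerically. In this sketch the plotted region is assumed to be -4 to 4; within it the cdf gets close to 0 and 1 but never attains either.

```r
x <- seq(-4, 4, length.out = 801)
cdf <- pnorm(x, mean = 0, sd = 1)

# Smallest and largest cdf values inside the plotted region.
lowest <- min(cdf)
highest <- max(cdf)

plot(x, cdf, type = "l", xlab = "x", ylab = "F(x)")
abline(h = c(0, 1), lty = 2)   # the limits that are approached but not reached
```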
17
Choosing Models
• Thus, for example, if one is interested in a
smoothly varying quantity, such as response rate,
then one might consider ‘modeling’ it using a
Normal distribution.
• If an ‘expert’ tells you that response rate is likely
to be around 7%, but could go from 5% to 9%,
neither of which is very likely, what values of
parameters for a Normal model might represent
this ‘belief’?
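One possible way to turn such a statement into parameters is sketched below. It assumes the mean is 7% and reads "5% to 9%" as a central interval holding a chosen probability (95% or 99%). That reading is an assumption and other interpretations are equally defensible. The helper sd_from_interval() is a hypothetical name introduced here.

```r
# Hypothetical helper: standard deviation such that mean +/- half_width
# contains the stated central probability under a Normal model.
sd_from_interval <- function(half_width, coverage) {
  half_width / qnorm((1 + coverage) / 2)
}

m <- 7                                                         # "around 7%"
sd_95 <- sd_from_interval(half_width = 2, coverage = 0.95)     # about 1.02
sd_99 <- sd_from_interval(half_width = 2, coverage = 0.99)     # about 0.78

# Check: probability of falling between 5% and 9% under each choice.
check_95 <- pnorm(9, m, sd_95) - pnorm(5, m, sd_95)
check_99 <- pnorm(9, m, sd_99) - pnorm(5, m, sd_99)
```

The stronger the reading of "neither of which is very likely", the smaller the standard deviation that results.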
18
R
• Access via web page - also on lab machines.
• Command line interface.
x <- seq(0, 5, length.out = 1000)
y <- dgamma(x, 2, 2)
plot(x, y, type = "l", col = 1)
• Sets up vector, x, taking 1000 equally spaced values between
0 and 5.
• Sets up y to be the Gamma pdf (shape 2, rate 2) evaluated at each x.
• Plots y as a function of x, as a line plot, in black.
19
Norm (7,0.5) vs (7,0.8)
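A comparison of this kind might be drawn as follows. The sketch assumes the second number in each pair is the standard deviation (in percentage points), consistent with the N(x|m,s) notation used earlier; the plotting range is also an assumption.

```r
x <- seq(4, 10, length.out = 601)

# Two candidate Normal models for the response rate (in %).
y_narrow <- dnorm(x, mean = 7, sd = 0.5)
y_wide <- dnorm(x, mean = 7, sd = 0.8)

plot(x, y_narrow, type = "l", col = 1, xlab = "response rate (%)", ylab = "density")
lines(x, y_wide, col = 2)
legend("topright", legend = c("sd = 0.5", "sd = 0.8"), col = c(1, 2), lty = 1)

# Probability each model assigns to the range 5% to 9%.
p_narrow <- pnorm(9, 7, 0.5) - pnorm(5, 7, 0.5)
p_wide <- pnorm(9, 7, 0.8) - pnorm(5, 7, 0.8)
```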
20
Issues
• What if the ‘belief’ says that high response rates are more likely than low ones (skew)?
• Can you draw a density that might match?
• What if there is likely to be a response rate of around 6%, but if by chance a marketing stunt that is being run next week gets air time on radio, then the rate will be around 10%?
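The marketing stunt scenario suggests a mixture of two Normals. In the sketch below the probability of getting air time (w = 0.2) and the common standard deviation (0.5) are hypothetical values chosen only to show the shape; the slide gives neither.

```r
w <- 0.2        # hypothetical probability that the stunt gets air time
s <- 0.5        # hypothetical spread around each centre

# Mixture density: mostly around 6%, with a second bump around 10%.
d_mix <- function(x) (1 - w) * dnorm(x, 6, s) + w * dnorm(x, 10, s)

# Numerical check on a fine grid that the density has total mass 1.
dx <- 0.01
x <- seq(0, 16, by = dx)
total_mass <- sum(d_mix(x)) * dx

# Simulate: first decide whether the stunt happened, then draw the rate.
set.seed(1)
n <- 100000
stunt <- rbinom(n, size = 1, prob = w)
rate <- rnorm(n, mean = ifelse(stunt == 1, 10, 6), sd = s)

plot(x, d_mix(x), type = "l", xlab = "response rate (%)", ylab = "density")
```

The mean of the mixture is (1 - w) * 6 + w * 10, which sits between the two bumps in a region that is itself unlikely.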
21
Exercise
• Write down a pdf for
– Skewed distribution
– Truncated distribution
– Mixture of distributions
• Show (in outline) that there exists a random
variable, which has as its pdf the quantity
that you have written down.
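As a hint for the truncated case, here is one possible sketch: a Normal density restricted to an interval and rescaled, together with an inverse-cdf sampler that exhibits a random variable with that pdf. The values m = 7, s = 1 and the limits 5 and 9 are illustrative assumptions, and d_trunc() and r_trunc() are hypothetical names.

```r
m <- 7
s <- 1
lower <- 5
upper <- 9

# Probability mass the untruncated Normal places on [lower, upper].
mass <- pnorm(upper, m, s) - pnorm(lower, m, s)

# Truncated pdf: zero outside the interval, rescaled inside so it integrates to 1.
d_trunc <- function(x) ifelse(x >= lower & x <= upper, dnorm(x, m, s) / mass, 0)

# Sampler: push a uniform through the inverse of the truncated cdf.
r_trunc <- function(n) {
  u <- runif(n)
  qnorm(pnorm(lower, m, s) + u * mass, mean = m, sd = s)
}

total_mass <- integrate(d_trunc, lower, upper)$value
set.seed(1)
draws <- r_trunc(10000)
```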
22
Gamma Distribution
• Ga(x|a,b) – shape
a and rate b
• Pdf f(x) =
– $c\, x^{a-1} \exp(-bx)$, x > 0, with $c = b^a / \Gamma(a)$
• E(x) = a/b
• Var(x) = $a/b^2$
• Graph a=2, b=2
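A sketch checking the Gamma density and moments for a=2, b=2 in R. The plotting range of 0 to 5 matches the earlier R slide.

```r
a <- 2
b <- 2
x <- seq(0, 5, length.out = 1000)

# Density from the formula, with normalising constant c = b^a / Gamma(a).
c_gamma <- b^a / gamma(a)
f_by_hand <- c_gamma * x^(a - 1) * exp(-b * x)

# Base R density; naming the arguments avoids confusing rate with scale.
f <- dgamma(x, shape = a, rate = b)

# Moments by numerical integration: should be a/b and a/b^2.
mean_x <- integrate(function(t) t * dgamma(t, shape = a, rate = b), 0, Inf)$value
var_x <- integrate(function(t) (t - mean_x)^2 * dgamma(t, shape = a, rate = b), 0, Inf)$value

plot(x, f, type = "l", xlab = "x", ylab = "f(x)")
```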
23
Use in modeling
• Thus, instead of fixing deterministic aspects of the
model, we can allow inputs to be defined by
parametric distributions.
• We still need to fix the parameters of the
distributions, but this may be much more realistic
than fixing values.
• Elicitation is the term given to the assignment of
parameters based on ‘expert opinion.’
24
Method
• Thus, we have the following method at the modeling step:
– Determine a ‘realistic’ model for the situation (conditional on particular values of inputs.)
– Examine which inputs have the biggest impact on the output variable of main interest.
– Model the uncertainty of the inputs through a probability distribution.
– Examine the impact on outputs.
25
Practicalities
• This can be done by:
– Examining the moments of the combinations of
random variables.
– Analytically (gives exact answer, but messy.)
– Simulation from the distributions of interest.
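To compare the first and last options, the sketch below takes the product of two independent inputs. The distributions X ~ Normal(10, sd 2) and Y ~ Gamma(shape 2, rate 2) are hypothetical choices for illustration. For independent X and Y, E(XY) = E(X)E(Y) and Var(XY) = Var(X)Var(Y) + Var(X)E(Y)^2 + Var(Y)E(X)^2.

```r
# Hypothetical inputs: moments known from the parametric forms.
mean_x <- 10; var_x <- 2^2          # X ~ Normal(10, sd = 2)
mean_y <- 2 / 2; var_y <- 2 / 2^2   # Y ~ Gamma(shape 2, rate 2)

# Moments of the product, using independence.
exact_mean <- mean_x * mean_y
exact_var <- var_x * var_y + var_x * mean_y^2 + var_y * mean_x^2

# The same quantities by simulation.
set.seed(1)
n <- 100000
x <- rnorm(n, mean = 10, sd = 2)
y <- rgamma(n, shape = 2, rate = 2)
z <- x * y
sim_mean <- mean(z)
sim_var <- var(z)

# Simulation also gives the whole distribution, not just the moments.
hist(z, breaks = 100, main = "", xlab = "x * y")
```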
26
Simulation
• In order to ‘simulate’ values from the distribution of interest we need a system of generating random numbers.
• It suffices to be able to generate numbers from a Uniform[0,1) distribution.
• If this can be done, then any random variable can be simulated.
• Example: in Excel, NORMSINV(RAND()) passes a uniform random number through the inverse standard Normal cdf.
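The same idea in R, as a minimal sketch: uniform draws are pushed through an inverse cdf. qnorm() and qexp() are base R quantile functions. The Exponential case is also written out by hand, since its inverse cdf has the closed form -log(1 - u)/rate; the rate of 2 is an arbitrary choice.

```r
set.seed(1)
n <- 100000
u <- runif(n)                 # uniform random numbers

# Standard Normal draws via the inverse cdf.
z <- qnorm(u)

# Exponential draws: built-in quantile function and the closed-form inverse.
rate <- 2
e_builtin <- qexp(u, rate = rate)
e_by_hand <- -log(1 - u) / rate

# Compare the simulated Normal values with the target density.
hist(z, breaks = 100, freq = FALSE, main = "", xlab = "z")
curve(dnorm(x), add = TRUE, col = 2)
```

In practice rnorm() and rexp() would be used directly; the point here is only that uniform numbers are enough.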
27
Exercise
• Examine each of the distributions listed
earlier in lectures.
• For each one, you should produce a pdf and
cdf for various parameters of interest.
• These graphs can readily be constructed in R. E.g., dnorm, pnorm.
28
Exercise II
• For the Norseman problem, examine the
impact of a response rate which is
unknown, but a priori believed to be
Normal, with mean 6% and standard
deviation 0.6%.
• Additionally, you might consider the impact
of Gamma distributed orders, with shape 10
and rate 12.
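A possible starting point for the simulation is sketched below. The details of the Norseman problem are not given in these slides, so the mailing size n_mailed and the way the inputs are combined (mailing size times response rate times order size) are placeholders assumed purely for illustration and should be replaced by the actual model. The units of the Gamma-distributed orders are also not stated here.

```r
set.seed(1)
n_sims <- 100000

# Uncertain inputs, as specified in the exercise.
response_rate <- rnorm(n_sims, mean = 0.06, sd = 0.006)   # 6% with sd 0.6%
order_size <- rgamma(n_sims, shape = 10, rate = 12)       # mean 10/12

# Placeholder model (an assumption, not the lecture's model).
n_mailed <- 10000                                         # hypothetical mailing size
output <- n_mailed * response_rate * order_size

# Summarise the impact of input uncertainty on the output.
output_summary <- quantile(output, c(0.05, 0.5, 0.95))
hist(output, breaks = 100, main = "", xlab = "simulated output")
```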