
Cheat Sheet

•   Some Theorems Involving Sets: (U denotes the universal set, ∅ the empty set, and A′ the complement of A.)
(1) If A ⊂ B and B ⊂ C, then A ⊂ C.
(2) A ∪ B = B ∪ A.
(3) A ∪ (B ∪ C) = (A ∪ B) ∪ C = A ∪ B ∪ C.
(4) A ∩ B = B ∩ A.
(5) A ∩ (B ∩ C) = (A ∩ B) ∩ C = A ∩ B ∩ C.
(6) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
(7) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
(8) A − B = A ∩ B′.
(9) If A ⊂ B, then B′ ⊂ A′ (equivalently, A′ ⊃ B′).
(10) A ∪ ∅ = A, A ∩ ∅ = ∅.
(11) A ∪ U = U, A ∩ U = A.
(12a) (A ∪ B)′ = A′ ∩ B′.
(12b) (A ∩ B)′ = A′ ∪ B′.
(13) A = (A ∩ B) ∪ (A ∩ B′).

•   Fundamental principle of counting: If an operation consists of k steps, of which the first can be done in n1 ways, for each of these the second step can be done in n2 ways, for each of the first two the third step can be done in n3 ways, and so forth, then the whole operation can be done in n1 · n2 · · · nk ways.

•   Permutations and Combinations:
The number of permutations of n distinct objects taken r at a time is nPr = n!/(n − r)! for r = 0, 1, 2, …, n.
The number of permutations of n objects of which n1 are of one kind, n2 are of a second kind, …, nk are of a kth kind, where n = n1 + n2 + · · · + nk, is nPn1,n2,…,nk = n!/(n1! · n2! · · · nk!).
The number of combinations of n distinct objects taken r at a time is C(n, r) = n!/(r!(n − r)!) for r = 0, 1, 2, …, n.
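A quick numerical check of these counting formulas (a minimal Python sketch; the numbers are purely illustrative):

```python
from math import factorial, perm, comb

n, r = 7, 3
# Permutations of n distinct objects taken r at a time: n!/(n-r)!
print(perm(n, r), factorial(n) // factorial(n - r))                   # 210 210

# Combinations: n!/(r!(n-r)!)
print(comb(n, r), factorial(n) // (factorial(r) * factorial(n - r)))  # 35 35

# Permutations of n objects with repeated kinds, e.g. the letters of "STATISTICS":
# n = 10, with 3 S's, 3 T's, 1 A, 2 I's, 1 C, giving n!/(n1! * n2! * ... * nk!)
counts = [3, 3, 1, 2, 1]
total = factorial(sum(counts))
for c in counts:
    total //= factorial(c)
print(total)  # 50400
```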

•   Binomial coefficient: (x + y)^n = Σ_{r=0}^{n} C(n, r) x^(n−r) y^r, where C(n, r) is the binomial coefficient defined above.

•   Some Important Theorems on Probability: Let Ai (i = 1, 2, 3, …, n) be events of the sample space S. Then
(1) If A1 ⊂ A2, then P(A1) ≤ P(A2).
(2) For every event A, 0 ≤ P(A) ≤ 1.
(3) If A′ is the complement of A, then P(A′) = 1 − P(A).
(4) P(∅) = 0.
(5) If A and B are any two events, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B). More generally, if A1, A2, A3 are any three events, then P(A1 ∪ A2 ∪ A3) = P(A1) + P(A2) + P(A3) − P(A1 ∩ A2) − P(A2 ∩ A3) − P(A3 ∩ A1) + P(A1 ∩ A2 ∩ A3).

•   Conditional probability of B given A: P(B|A) = P(A ∩ B)/P(A), or equivalently P(A ∩ B) = P(A)P(B|A). For any three events A1, A2, A3, we have P(A1 ∩ A2 ∩ A3) = P(A1)P(A2|A1)P(A3|A1 ∩ A2).

•   Independent Events: Events A and B are independent if and only if P(A ∩ B) = P(A)P(B), or in other words P(B|A) = P(B). If A and B are independent, then A′ and B′ are also independent.

•   Bayes’ Theorem: If B1, B2, …, Bk constitute a partition of a sample space S and P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A in S such that P(A) ≠ 0,
P(Br|A) = P(Br) · P(A|Br) / Σ_{i=1}^{k} P(Bi) · P(A|Bi)   for r = 1, 2, …, k.
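A small worked example of Bayes’ theorem in Python (the priors and conditional probabilities below are made up for illustration):

```python
# Partition B1, B2, B3 of the sample space, with priors P(Bi)
prior = [0.5, 0.3, 0.2]
# Conditional probabilities P(A|Bi) for some event A
likelihood = [0.02, 0.05, 0.10]

# Rule of total probability: P(A) = sum_i P(Bi) * P(A|Bi)
p_a = sum(p * l for p, l in zip(prior, likelihood))

# Bayes' theorem: P(Br|A) = P(Br) * P(A|Br) / P(A)
posterior = [p * l / p_a for p, l in zip(prior, likelihood)]
print(p_a)        # ~0.045
print(posterior)  # ~[0.222, 0.333, 0.444]; the posteriors sum to 1
```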

•   Probability distributions:
I. Discrete case.
Discrete probability distributions (or probability functions): If X is a discrete random variable, the function given by f(x) = P(X = x) for each x within the range of X is called the probability distribution of X. f(x) is a probability distribution if and only if (1) f(x) ≥ 0, and (2) Σ_x f(x) = 1.
Distribution functions for discrete random variables: If X is a discrete random variable, the function given by F(x) = P(X ≤ x) = Σ_{t≤x} f(t) for −∞ < x < ∞, where f(t) is the value of the probability distribution of X at t, is called the distribution function of X. If F(x) is a distribution function, then (1) F(−∞) = 0 and F(∞) = 1, and (2) if a ≤ b, then F(a) ≤ F(b) for any real numbers a and b.
If the range of a random variable X consists of the values x1 < x2 < x3 < · · · < xn, then f(x1) = F(x1) and f(xi) = F(xi) − F(xi−1) for i = 2, 3, …, n.
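A concrete illustration of these definitions (a minimal sketch using an arbitrary three-point distribution):

```python
# An arbitrary discrete distribution on x = 1, 2, 3
f = {1: 0.2, 2: 0.5, 3: 0.3}

# The two defining properties of a probability distribution
assert all(p >= 0 for p in f.values())
assert abs(sum(f.values()) - 1.0) < 1e-12

# Distribution function F(x) = P(X <= x) = sum of f(t) over t <= x
def F(x):
    return sum(p for t, p in f.items() if t <= x)

print(F(1), F(2), F(3))   # ~0.2 ~0.7 ~1.0
# Recovering the pmf from the CDF: f(x_i) = F(x_i) - F(x_{i-1})
print(F(2) - F(1), f[2])  # both ~0.5
```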

II. Continuous case.
Continuous probability density functions: A function with values f(x), defined over the set of all real numbers, is called a probability density function of the continuous random variable X if and only if P(a < X < b) = ∫_a^b f(x) dx, where f(x) has the properties (1) f(x) ≥ 0, and (2) ∫_{−∞}^{∞} f(x) dx = 1.
Distribution functions for continuous random variables: If X is a continuous random variable and the value of its probability density at t is f(t), then the function given by F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt is called the distribution function of X. If F(x) is a distribution function, then (1) F(−∞) = 0 and F(∞) = 1, and (2) if a ≤ b, then F(a) ≤ F(b) for any real numbers a and b.

If f(x) and F(x) are the values of the probability density and the distribution function of the continuous random variable X at x, then P(a < X < b) = F(b) − F(a) for any real a and b with a ≤ b, and f(x) = dF(x)/dx where the derivative exists.
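For instance, with the illustrative density f(x) = e^(−x) for x ≥ 0 (an assumption made purely for this sketch), F(x) = 1 − e^(−x), and the two ways of computing P(a < X < b) agree:

```python
from math import exp

# Illustrative density: f(x) = e^{-x} for x >= 0, 0 otherwise
def f(x):
    return exp(-x) if x >= 0 else 0.0

# Its distribution function: F(x) = 1 - e^{-x} for x >= 0, 0 otherwise
def F(x):
    return 1.0 - exp(-x) if x >= 0 else 0.0

# P(a < X < b) = F(b) - F(a), compared with a crude midpoint-rule integral of f
a, b = 0.5, 2.0
n = 100_000
dx = (b - a) / n
riemann = sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx
print(F(b) - F(a), riemann)   # both approximately 0.4712
```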

•   Joint distributions:
Joint probability distributions: If X and Y are discrete random variables, the function given by f(x, y) = P(X = x, Y = y) for each pair of values (x, y) within the range of X and Y is called the joint probability distribution of X and Y. f(x, y) is a joint probability distribution if and only if (1) f(x, y) ≥ 0 and (2) Σ_x Σ_y f(x, y) = 1.
Joint distribution functions: If X and Y are discrete random variables, the function given by F(x, y) = P(X ≤ x, Y ≤ y) = Σ_{s≤x} Σ_{t≤y} f(s, t) for −∞ < x < ∞ and −∞ < y < ∞, where f(s, t) is the value of the joint probability distribution of X and Y at (s, t), is called the joint distribution function of X and Y. If F(x, y) is the value of the joint distribution function of two discrete random variables X and Y at (x, y), then (1) F(−∞, −∞) = 0, (2) F(∞, ∞) = 1, and (3) if a < b and c < d, then F(a, c) ≤ F(b, d).
Marginal probability functions: If X and Y are discrete random variables and f(x, y) is the value of their joint probability distribution at (x, y), the function given by g(x) = Σ_y f(x, y) for each x within the range of X is called the marginal distribution of X. Correspondingly, the function given by h(y) = Σ_x f(x, y) for each y within the range of Y is called the marginal distribution of Y.
Conditional distribution: If f(x, y) is the value of the joint probability distribution of the discrete random variables X and Y at (x, y), and h(y) is the value of the marginal distribution of Y at y, the function given by f(x|y) = f(x, y)/h(y) (h(y) ≠ 0) for each x within the range of X is called the conditional distribution of X given Y = y. Correspondingly, if g(x) is the value of the marginal distribution of X at x, the function w(y|x) = f(x, y)/g(x) (g(x) ≠ 0) for each y within the range of Y is called the conditional distribution of Y given X = x.
Independent random variables: Suppose that X and Y are discrete random variables. If the events X = x and Y = y are independent for all x and y, then we say that X and Y are independent random variables. In that case, P(X = x, Y = y) = P(X = x) · P(Y = y), or equivalently f(x, y) = f(x|y) · h(y) = g(x) · h(y). Conversely, if for all x and y the joint probability function f(x, y) can be expressed as the product of a function of x alone and a function of y alone (which are then the marginal probability functions of X and Y), then X and Y are independent. If f(x, y) cannot be so expressed, then X and Y are dependent.
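A small numeric illustration of marginals, a conditional distribution, and the independence check (the joint table below is made up, and it deliberately does not factor):

```python
# Joint probability distribution f(x, y) of two discrete random variables
# (an arbitrary, illustrative table; keys are (x, y) pairs)
f = {
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.40, (1, 1): 0.20,
}
xs = sorted({x for x, _ in f})
ys = sorted({y for _, y in f})

assert abs(sum(f.values()) - 1.0) < 1e-12   # property (2) of a joint distribution

# Marginal distributions g(x) = sum_y f(x, y) and h(y) = sum_x f(x, y)
g = {x: sum(f[(x, y)] for y in ys) for x in xs}
h = {y: sum(f[(x, y)] for x in xs) for y in ys}
print(g)   # ~{0: 0.4, 1: 0.6}
print(h)   # ~{0: 0.5, 1: 0.5}

# Conditional distribution of X given Y = 0: f(x|0) = f(x, 0)/h(0)
print({x: f[(x, 0)] / h[0] for x in xs})    # ~{0: 0.2, 1: 0.8}

# Independence would require f(x, y) = g(x) * h(y) for every (x, y)
independent = all(abs(f[(x, y)] - g[x] * h[y]) < 1e-12 for x in xs for y in ys)
print(independent)   # False -- this particular X and Y are dependent
```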

•   Mathematical expectation:
I. Discrete case. If X is a discrete random variable and f(x) is the value of its probability distribution at x, the expected value of X is E(X) = Σ_x x f(x). The expected value of g(X) is given by E[g(X)] = Σ_x g(x) f(x).
II. Continuous case. If X is a continuous random variable and f(x) is the value of its probability density at x, the expected value of X is E(X) = ∫_{−∞}^{∞} x f(x) dx. The expected value of g(X) is given by E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx.
III. Joint case. If X and Y are discrete random variables and f(x, y) is the value of their joint probability distribution at (x, y), the expected value of g(X, Y) is E[g(X, Y)] = Σ_x Σ_y g(x, y) · f(x, y).
Some theorems on expectation.
(1) If a and b are any constants, then E(aX + b) = aE(X) + b.
(2) If X and Y are any random variables, then E(X + Y) = E(X) + E(Y).
(3) If X and Y are independent random variables, then E(XY) = E(X)E(Y).
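A quick check of the discrete definitions and of theorem (1), using the same kind of made-up pmf as in the earlier sketches:

```python
# Arbitrary discrete distribution
f = {0: 0.2, 1: 0.5, 2: 0.3}

def E(g, dist):
    """Expected value of g(X): E[g(X)] = sum_x g(x) f(x)."""
    return sum(g(x) * p for x, p in dist.items())

mean = E(lambda x: x, f)
print(mean)                        # E(X)   = 1.1
print(E(lambda x: x * x, f))       # E(X^2) = 1.7
# Theorem (1): E(aX + b) = a*E(X) + b
a, b = 3, 2
print(E(lambda x: a * x + b, f), a * mean + b)   # both ~5.3
```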

•   Moments:
rth moment of X about the origin. The rth moment about the origin of a random variable X, denoted by µ′_r, is the expected value of X^r; symbolically, µ′_r = E(X^r) = Σ_x x^r f(x) for r = 0, 1, 2, … when X is discrete, and µ′_r = E(X^r) = ∫_{−∞}^{∞} x^r f(x) dx when X is continuous. µ′_1 is called the mean of the distribution of X, and it is denoted by µ.
rth moment of X about the mean. The rth moment about the mean of a random variable X, denoted by µ_r, is the expected value of (X − µ)^r; symbolically, µ_r = E[(X − µ)^r] = Σ_x (x − µ)^r f(x) for r = 0, 1, 2, … when X is discrete, and µ_r = E[(X − µ)^r] = ∫_{−∞}^{∞} (x − µ)^r f(x) dx when X is continuous. µ_2 is called the variance of the distribution of X, and it is denoted by σ², var(X), or V(X); σ, the positive square root of the variance, is called the standard deviation.

Some theorems on variance.
(1) σ² = µ′_2 − µ² = E(X²) − [E(X)]².
(2) If a and b are any constants, then var(aX + b) = a² var(X).
(3) If X and Y are independent random variables, then var(X ± Y) = var(X) + var(Y).
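Continuing the same illustrative pmf, a check of the shortcut formula σ² = E(X²) − [E(X)]² and of theorem (2):

```python
f = {0: 0.2, 1: 0.5, 2: 0.3}

def E(g):
    return sum(g(x) * p for x, p in f.items())

mu = E(lambda x: x)
var = E(lambda x: (x - mu) ** 2)          # second moment about the mean
print(var, E(lambda x: x * x) - mu ** 2)  # both ~0.49 -- the shortcut formula agrees

# Theorem (2): var(aX + b) = a^2 * var(X); the shift b does not matter
a, b = 3, 2
mu_ab = E(lambda x: a * x + b)
var_ab = E(lambda x: (a * x + b - mu_ab) ** 2)
print(var_ab, a ** 2 * var)               # both ~4.41
```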

•   Chebyshev’s inequality. Suppose that X is a random variable (discrete or continuous) having mean µ and variance σ², both finite. Then for any positive number ε, P(|X − µ| ≥ ε) ≤ σ²/ε², or, with ε = kσ, P(|X − µ| ≥ kσ) ≤ 1/k².
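A numeric check of the bound against the same illustrative pmf (the exact tail probabilities always stay at or below 1/k²):

```python
from math import sqrt

f = {0: 0.2, 1: 0.5, 2: 0.3}
mu = sum(x * p for x, p in f.items())
sigma = sqrt(sum((x - mu) ** 2 * p for x, p in f.items()))

for k in (1.0, 1.5, 2.0):
    tail = sum(p for x, p in f.items() if abs(x - mu) >= k * sigma)
    print(k, tail, 1 / k ** 2)   # the exact tail never exceeds the Chebyshev bound 1/k^2
```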

•   Moment generating function. The moment-generating function of a random variable X, where it exists, is given by M_X(t) = E[e^(tX)] = Σ_x e^(tx) f(x) when X is discrete, and M_X(t) = E[e^(tX)] = ∫_{−∞}^{∞} e^(tx) f(x) dx when X is continuous. Differentiating recovers the moments about the origin: d^r M_X(t)/dt^r evaluated at t = 0 equals µ′_r.
Some theorems on moment generating functions. If a and b are constants, then
(1) M_{X+a}(t) = E[e^((X+a)t)] = e^(at) M_X(t);
(2) M_{bX}(t) = E[e^(bXt)] = M_X(bt);
(3) M_{(X+a)/b}(t) = E[e^(((X+a)/b)t)] = e^(at/b) M_X(t/b).
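A symbolic check of the "differentiate at t = 0" rule with sympy (a third-party symbolic library), using the Bernoulli(θ) variable defined later in this sheet, whose MGF follows directly from the definition as (1 − θ) + θe^t:

```python
import sympy as sp

t, theta = sp.symbols('t theta')

# MGF of a Bernoulli(theta) variable: E[e^{tX}] = (1 - theta)*e^{t*0} + theta*e^{t*1}
M = (1 - theta) + theta * sp.exp(t)

# The r-th derivative at t = 0 gives the r-th moment about the origin
m1 = sp.diff(M, t, 1).subs(t, 0)   # E(X)   = theta
m2 = sp.diff(M, t, 2).subs(t, 0)   # E(X^2) = theta
print(m1, m2)
print(sp.simplify(m2 - m1 ** 2))   # the variance, theta*(1 - theta), in expanded form
```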

•   Product Moments.
Product moment about the origin. The rth and sth product moment about the origin of the random variables X and Y, denoted by µ′_{r,s}, is the expected value of X^r Y^s; symbolically, µ′_{r,s} = E(X^r Y^s) = Σ_x Σ_y x^r y^s f(x, y) for r = 0, 1, 2, … and s = 0, 1, 2, … when X and Y are discrete. Note that µ′_{1,0} = E(X), which we denote here by µ_X, and that µ′_{0,1} = E(Y), which we denote here by µ_Y.
Product moment about the mean. The rth and sth product moment about the means of the random variables X and Y, denoted by µ_{r,s}, is the expected value of (X − µ_X)^r (Y − µ_Y)^s; symbolically, µ_{r,s} = E[(X − µ_X)^r (Y − µ_Y)^s] = Σ_x Σ_y (x − µ_X)^r (y − µ_Y)^s f(x, y) for r = 0, 1, 2, … and s = 0, 1, 2, … when X and Y are discrete. µ_{1,1} is called the covariance of X and Y, and it is denoted by σ_{XY}, cov(X, Y), or C(X, Y).
Some theorems on covariance.
(1) σ_{XY} = cov(X, Y) = E(XY) − E(X)E(Y) = µ′_{1,1} − µ_X µ_Y.
(2) If X and Y are independent random variables, then σ_{XY} = cov(X, Y) = 0.
(3) var(X ± Y) = var(X) + var(Y) ± 2 cov(X, Y).
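Using the same illustrative joint table as above, a check of covariance theorem (1):

```python
f = {(0, 0): 0.10, (0, 1): 0.30, (1, 0): 0.40, (1, 1): 0.20}

E_X  = sum(x * p for (x, y), p in f.items())
E_Y  = sum(y * p for (x, y), p in f.items())
E_XY = sum(x * y * p for (x, y), p in f.items())

cov_shortcut   = E_XY - E_X * E_Y                                    # E(XY) - E(X)E(Y)
cov_definition = sum((x - E_X) * (y - E_Y) * p for (x, y), p in f.items())
print(cov_shortcut, cov_definition)   # both ~-0.1; nonzero, so X and Y are dependent
```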

•   Conditional expectation. If X is a discrete random variable and f(x|y) is the value of the conditional probability distribution of X given Y = y at x, then
Conditional expectation. The conditional expectation of u(X) given Y = y is E[u(X)|y] = Σ_x u(x) f(x|y).
Conditional mean. The conditional mean of the random variable X given Y = y is µ_{X|y} = E(X|y) = Σ_x x f(x|y).
Conditional variance. The conditional variance of the random variable X given Y = y is σ²_{X|y} = E[(X − µ_{X|y})²|y] = E(X²|y) − µ²_{X|y}.
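Again with the illustrative joint table, the conditional mean and variance of X given Y = 0:

```python
f = {(0, 0): 0.10, (0, 1): 0.30, (1, 0): 0.40, (1, 1): 0.20}

h0 = sum(p for (x, y), p in f.items() if y == 0)        # marginal h(0) = 0.5
cond = {x: f[(x, 0)] / h0 for x in (0, 1)}              # conditional distribution f(x|0)

mean_given_0 = sum(x * p for x, p in cond.items())      # E(X|0)   ~0.8
ex2_given_0  = sum(x * x * p for x, p in cond.items())  # E(X^2|0) ~0.8
var_given_0  = ex2_given_0 - mean_given_0 ** 2          # sigma^2_{X|0} ~0.16
print(mean_given_0, var_given_0)
```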

•   Special Probability Distributions.
The Discrete Uniform Distribution. A random variable X has a discrete uniform distribution if and only if its probability distribution is given by f(x) = 1/k for x = x1, x2, …, xk, where xi ≠ xj when i ≠ j.
The Bernoulli Distribution. A random variable X has a Bernoulli distribution if and only if its probability distribution is given by f(x; θ) = θ^x (1 − θ)^(1−x) for x = 0, 1. µ = θ, σ² = θ(1 − θ), and µ′_r = θ for r = 1, 2, 3, ….

The Binomial Distribution. A random variable X has a binomial distribution if and only if its probability distribution is given by b(x; n, θ) = C(n, x) θ^x (1 − θ)^(n−x) for x = 0, 1, 2, …, n. M_X(t) = [1 + θ(e^t − 1)]^n, µ = nθ, and σ² = nθ(1 − θ).
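A direct computation from this formula (illustrative parameter values), confirming that the probabilities sum to 1 and that the mean and variance match nθ and nθ(1 − θ):

```python
from math import comb

def binom_pmf(x, n, theta):
    """b(x; n, theta) = C(n, x) * theta^x * (1 - theta)^(n - x)."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

n, theta = 10, 0.3
pmf = [binom_pmf(x, n, theta) for x in range(n + 1)]
print(sum(pmf))                                             # ~1.0
print(sum(x * p for x, p in enumerate(pmf)))                # mean ~ n*theta = 3.0
mean = n * theta
print(sum((x - mean) ** 2 * p for x, p in enumerate(pmf)))  # variance ~ n*theta*(1-theta) = 2.1
```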

The Negative Binomial and Geometric Distribution. A random variable X has a negative binomial distribution if and only if its probability distribution is given by b*(x; k, θ) = C(x − 1, k − 1) θ^k (1 − θ)^(x−k) for x = k, k + 1, k + 2, …. Also b*(x; k, θ) = (k/x) · b(k; x, θ), µ = k/θ, and σ² = (k/θ)(1/θ − 1). For k = 1 we obtain the geometric distribution, denoted by g(x; θ).

The Hypergeometric Distribution. A random variable X has a hypergeometric distribution if and only if its probability distribution is given by h(x; n, N, M) = C(M, x) · C(N − M, n − x)/C(N, n) for x = 0, 1, 2, …, n, with x ≤ M and n − x ≤ N − M. µ = nM/N, σ² = nM(N − M)(N − n)/{N²(N − 1)}.

The Binomial Approximation to the Hypergeometric Distribution. As a rule of thumb, if n does not exceed 5 percent of N, we may use binomial probabilities (with θ = M/N) in place of hypergeometric probabilities.

The Poisson Distribution. A random variable X has a Poisson distribution if and only if its probability distribution is given by p(x; λ) = λ^x e^(−λ)/x! for x = 0, 1, 2, …, where λ is the mean number of successes. µ = σ² = λ, M_X(t) = e^(λ(e^t − 1)).

The Poisson Approximation to the Binomial Distribution. The Poisson distribution will provide a good approximation to binomial probabilities when n ≥ 20 and θ ≤ 0.05. When n ≥ 100 and nθ < 10, the approximation will generally be excellent.
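A numeric look at this approximation, taking n = 100 and θ = 0.02 (so λ = nθ = 2; the parameter values are illustrative):

```python
from math import comb, exp, factorial

n, theta = 100, 0.02
lam = n * theta   # Poisson parameter lambda = n*theta

for x in range(6):
    binom   = comb(n, x) * theta ** x * (1 - theta) ** (n - x)
    poisson = lam ** x * exp(-lam) / factorial(x)
    print(x, round(binom, 4), round(poisson, 4))
# e.g. at x = 2 the binomial gives about 0.2734 and the Poisson about 0.2707
```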

•   Measures of Central Tendency

Mean. The mean is equal to the sum of all the values in the data set divided by the number of valuesin the data set.

Median.   The median is the middle score for a set of data that has been arranged in order of magnitude.

Mode.  The mode is the most frequently occurring value in a set of values.
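These three measures are available directly in Python's standard library; a minimal illustration with an invented data set:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]       # invented data set
print(statistics.mean(data))     # (2+3+3+5+7+10)/6 = 5
print(statistics.median(data))   # middle of the ordered values: (3+5)/2 = 4.0
print(statistics.mode(data))     # most frequently occurring value: 3
```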

•   Sampling Distributions.
Sample Mean and Sample Variance. If X1, X2, …, Xn constitute a random sample, then the sample mean is given by X̄ = (Σ_{i=1}^{n} Xi)/n and the sample variance is given by S² = Σ_{i=1}^{n} (Xi − X̄)²/(n − 1).

Some theorems on sampling distributions of means.

(1) If X1, X2, …, Xn constitute a random sample from an infinite population with mean µ and variance σ², then E(X̄) = µ and var(X̄) = σ²/n.
(2) For any positive constant c, the probability that X̄ will take on a value between µ − c and µ + c is at least 1 − σ²/(nc²). When n → ∞, this probability approaches 1.
(3) If X1, X2, …, Xn constitute a random sample from an infinite population with mean µ, variance σ², and moment-generating function M_X(t), then the limiting distribution of Z = (X̄ − µ)/(σ/√n) as n → ∞ is the standard normal distribution.
(4) If X̄ is the mean of a random sample of size n from a normal population with mean µ and variance σ², its sampling distribution is a normal distribution with mean µ and variance σ²/n.
(5) If X̄ is the mean of a random sample of size n taken without replacement from a finite population of size N with mean µ and variance σ², then E(X̄) = µ and var(X̄) = (σ²/n) · (N − n)/(N − 1).
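A quick Monte Carlo sanity check of theorem (1), drawing samples from a Uniform(0, 1) population (the sample size, number of repetitions, and seed are arbitrary choices for this sketch):

```python
import random
import statistics

random.seed(0)
mu, sigma2 = 0.5, 1 / 12     # mean and variance of a Uniform(0, 1) population
n, reps = 25, 20_000         # sample size and number of simulated samples

means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

print(statistics.fmean(means))      # close to mu = 0.5
print(statistics.variance(means))   # close to sigma2/n = 1/300 ~ 0.0033
```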

•   The Estimation of Means. If θ̂1 and θ̂2 are values of the random variables Θ̂1 and Θ̂2 such that P(Θ̂1 < θ < Θ̂2) = 1 − α for some specified probability 1 − α, we refer to the interval θ̂1 < θ < θ̂2 as a (1 − α)100% confidence interval for θ. The probability 1 − α is called the degree of confidence, and the endpoints of the interval are called the lower and upper confidence limits.

Some theorems on estimation of means.

(1) If X̄, the mean of a random sample of size n from a normal population with the known variance σ², is to be used as an estimator of the mean of the population, the probability is 1 − α that the error will be less than z_{α/2} · σ/√n.
(2) If x̄ is the value of the mean of a random sample of size n from a normal population with the known variance σ², then x̄ − z_{α/2} · σ/√n < µ < x̄ + z_{α/2} · σ/√n is a (1 − α)100% confidence interval for the mean of the population.
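A sketch of theorem (2) in Python, assuming a known σ and a 95% degree of confidence (the observations and σ below are invented):

```python
import statistics
from math import sqrt

sample = [4.9, 5.1, 5.3, 4.7, 5.0, 5.2, 4.8, 5.4]   # invented observations
sigma = 0.25                                        # assumed known population std. deviation
alpha = 0.05                                        # 95% degree of confidence

n = len(sample)
xbar = statistics.fmean(sample)
z = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96
half_width = z * sigma / sqrt(n)

print(f"{xbar - half_width:.3f} < mu < {xbar + half_width:.3f}")   # the confidence interval
```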