generalized variance multivariate normal distribution
Post on 04-Feb-2017
239 Views
Preview:
TRANSCRIPT
Lecture #4 - 9/14/2005 Slide 1 of 57
Generalized VarianceMultivariate Normal Distribution
Lecture 4September 14, 2005Multivariate Analysis
Overview
Last Time
Today’s Lecture
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 2 of 57
Last Time
■ Matrices and vectors.
◆ Eigenvalues.
◆ Eigenvectors.
◆ Determinants.
■ Basic descriptive statistics using matrices:◆ Mean vectors.
◆ Covariance Matrices.
◆ Correlation Matrices.
Overview
Last Time
Today’s Lecture
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 3 of 57
Today’s Lecture
■ Generalized Variance (Chapter 3, Section 4).
■ Multivariate Normal Distribution (Chapter 4).
Overview
Generalized Variance
Generalized Sample Variance
Generalized Sample Variance
With RTotal Sample Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 4 of 57
Sample Covariance Matrix
■ Recall the sample covariance matrix:
S =1
n − 1
n∑
i=1
(xi − x)2 =1
n − 1(X − 1x′)′(X − 1x′).
■ The overall sample covariance matrix gives a picture of thecovariation between each variable in the sample.
■ A single-number summary of this matrix can be provided bythe generalized variance.
■ Although the generalized variance is not used frequently, youwill see in later slides that this value is part of the multivariatenormal distribution.
Overview
Generalized Variance
Generalized Sample Variance
Generalized Sample Variance
With RTotal Sample Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 5 of 57
Generalized Sample Variance
■ Generalized Sample Variance is computed by |S| (thedeterminant of the sample covariance matrix).
■ To begin, imagine a multidimensional cube (an ellipsoid) thatrepresents the end points of all the column vectors in thesample matrix X.
◆ Covariance matrix describes the overall spread of thatshape in each direction (if we could plot it).
◆ The equation (x − x)S−1(x − x) = a2 describes anequation where all points are equal distant from the meanx (i.e., it will form an ellipsoid).
◆ That ellipsoid will have axes proportional to the squareroots of the eigenvalues of S.
◆ The volume of that ellipsoid is equal to |S|1/2 (notice whathappens if we have a zero eigenvalue?
Overview
Generalized Variance
Generalized Sample Variance
Generalized Sample Variance
With RTotal Sample Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 6 of 57
Generalized Sample Variance With R
■ The generalized sample variance is dependent upon thescale of the variables in the sample.
■ Because of the scale of these variables, the generalizedsample variance can be hard to interpret (much likevariances and covariances).
◆ The scale of a single variable may have a disproportionateimpact on the generalized variance (e.g., your sample hasthe variables of GPA and income in US dollars).
Overview
Generalized Variance
Generalized Sample Variance
Generalized Sample Variance
With RTotal Sample Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 7 of 57
Generalized Sample Variance With R
■ Rather than using |S|, the sample correlation matrix can beused |R|.
■ The interpretation of the GSV is the same - the volume of theellipsoid that is formed by the standardized variables in thesample.
■ The difference in magnitude is proportional to the product ofthe variances of the variables in the sample.
■ SAS Example #1...
Overview
Generalized Variance
Generalized Sample Variance
Generalized Sample Variance
With RTotal Sample Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 8 of 57
Total Sample Variance
■ Another way to characterize the sample variance is with thetotal variance.
■ Total variance equals tr(S) = s11 + s22 + . . . + spp.
■ Describes the variability of the data without taking in toaccount the covariances.
■ Like generalized variance, the total sample variance reflectsthe overall spread of the data.
■ Many multivariate techniques refer use total sample variancein computation of variance accounted for.
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 9 of 57
Multivariate Normal Distribution
■ The generalization of the well-known normal distribution tomultiple variables is called the multivariate normaldistribution (MVN).
■ Many multivariate techniques rely on this distribution in somemanner.
■ Although real data may never come from a true MVN, theMVN provides a robust approximation, and has many nicemathematical properties.
■ Furthermore, because of the central limit theorem, manymultivariate statistics converge to the MVN distribution as thesample size increases.
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 10 of 57
Univariate Normal Distribution
■ The univariate normal distribution function is:
f(x) =1√
2πσ2e−[(x−µ)/σ]2/2
■ The mean is µ.
■ The variance is σ2.
■ The standard deviation is σ.
■ Standard notation for normal distributions is N(µ, σ2), whichwill be extended for the MVN distribution.
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 11 of 57
Univariate Normal Distribution
N(0, 1)
−6 −4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
Univariate Normal Distribution
x
f(x)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 12 of 57
Univariate Normal Distribution
N(0, 2)
−6 −4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
Univariate Normal Distribution
x
f(x)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 13 of 57
Univariate Normal Distribution
N(3, 1)
−6 −4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
Univariate Normal Distribution
x
f(x)
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 14 of 57
UVN - Notes
■ Recall that the area under the curve for the univariate normaldistribution is a function of the variance/standard deviation.
■ In particular:
P (µ − σ ≤ X ≤ µ + σ) = 0.683
P (µ − 2σ ≤ X ≤ µ + 2σ) = 0.954
■ Also note the term in the exponent:
(
(x − µ)
σ
)2
= (x − µ)(σ2)−1(x − µ)
■ This is the square of the distance from x to µ in standarddeviation units, and will be generalized for the MVN.
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 15 of 57
MVN
■ The multivariate normal distribution function is:
f(x) =1
(2π)p/2|Σ|1/2e−(x−µ)Σ
−1
(x−µ)/2
■ The mean vector is µ.
■ The covariance matrix is Σ.
■ Standard notation for multivariate normal distributions isNp(µ,Σ).
■ Visualizing the MVN is difficult for more than two dimensions,so I will demonstrate some plots with two variables - thebivariate normal distribution.
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 16 of 57
Bivariate Normal Plot #1
µ =
[
0
0
]
,Σ =
[
1 0
0 1
]
−4
−2
0
2
4
−4
−2
0
2
40
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 17 of 57
Bivariate Normal Plot #1a
µ =
[
0
0
]
,Σ =
[
1 0
0 1
]
−4 −3 −2 −1 0 1 2 3 4−4
−3
−2
−1
0
1
2
3
4
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 18 of 57
Bivariate Normal Plot #2
µ =
[
0
0
]
,Σ =
[
1 0.5
0.5 1
]
−4
−2
0
2
4
−4
−2
0
2
40
0.05
0.1
0.15
0.2
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 19 of 57
Bivariate Normal Plot #2
µ =
[
0
0
]
,Σ =
[
1 0.5
0.5 1
]
−4 −3 −2 −1 0 1 2 3 4−4
−3
−2
−1
0
1
2
3
4
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 20 of 57
MVN Contours
■ The lines of the contour plots denote places of equalprobability mass for the MVN distribution.
■ These contours can be constructed from the eigenvaluesand eigenvectors of the covariance matrix.
◆ The direction of the ellipse axes are in the direction of theeigenvalues.
◆ The length of the ellipse axes are proportional to theconstant times the eigenvector.
■ Specifically:
(x − µ)Σ−1(x − µ) = c2
has ellipsoids centered at µ, and has axes ±c√
λiei.
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 21 of 57
MVN Contours, Continued
■ Contours are useful because they provide confidenceregions for data points from the MVN distribution.
■ The multivariate analog of a confidence interval is given byan ellipsoid, where c is from the Chi-Squared distributionwith p degrees of freedom.
■ Specifically:
(x − µ)Σ−1(x − µ) = χ2p(α)
provides the confidence region containing 1 − α of theprobability mass of the MVN distribution.
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 22 of 57
MVN Contour Example
■ Imagine we had a bivariate normal distribution with:
µ =
[
0
0
]
,Σ =
[
1 0.5
0.5 1
]
■ The covariance matrix has eigenvalues and eigenvectors:
λ =
[
1.5
0.5
]
, E =
[
0.707 −0.707
0.707 0.707
]
■ We want to find a contour where 95% of the probability willfall, corresponding to χ2
2(0.05) = 5.99
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 23 of 57
MVN Contour Example
■ This contour will be centered at µ.
■ Axis 1:
µ ±√
5.99 × 1.5
[
0.707
0.707
]
=
[
2.12
2.12
]
,
[
−2.12
−2.12
]
■ Axis 2:
µ ±√
5.99 × 0.5
[
−0.707
0.707
]
=
[
−1.22
1.22
]
,
[
1.22
−1.22
]
Overview
Generalized Variance
MVN
Univariate Review
MVN
MVN Contours
MVN Properties
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 24 of 57
MVN Properties
■ The MVN distribution has some convenient properties.
■ If X has a multivariate normal distribution, then:
1. Linear combinations of X are normally distributed.
2. All subsets of the components of X have a MVNdistribution.
3. Zero covariance implies that the correspondingcomponents are independently distributed.
4. The conditional distributions of the components are MVN.
Overview
Generalized Variance
MVN
Distributions
UVN CLT
Multi CLT
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 25 of 57
Distribution of x and S
Recall back in Univariate statistics you discussed the CentralLimit Theorem (CLT)
It stated that, if the set of n observations x1, x2, . . . , xn werenormal or not...
■ The distribution of x would be normal with mean equal to µand variance σ2/n
■ We were also told that (n − 1)s2/σ2 had a Chi-Squaredistribution with n − 1 degrees of freedom
■ Note: We ended up using these pieces of information forhypothesis testing such as t-test and ANOVA.
Overview
Generalized Variance
MVN
Distributions
UVN CLT
Multi CLT
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 26 of 57
Distribution of x and S
We also have a Multivariate Central Limit Theorem (CLT)
It states that, if the set of n observations x1, x2, . . . , xn aremultivariate normal or not...
■ The distribution of x would be normal with mean equal to µ
and variance/covariance matrix Σ/n
■ We are also told that (n − 1)S will have a Wishartdistribution, Wp(n − 1,Σ), with n − 1 degrees of freedom
◆ This is the multivariate analogue to a Chi-Squaredistribution.
■ Note: We will end up using some of this information formultivariate hypothesis testing.
Overview
Generalized Variance
MVN
Distributions
UVN CLT
Multi CLT
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 27 of 57
Distribution of x and S
■ Therefore, let x1, x2, . . . , xn be independent observationsfrom a population with mean µ and covariance Σ
■ The following are true:
◆√
n(
X − µ)
is approximately Np(0,Σ).
◆ n (X − µ)′ S−1 (X − µ) is approximately χ2
p.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 28 of 57
Assessing Normality
■ Recall from earlier that IF the data have a Multivariatenormal distribution then all of the previously discussedproperties will hold.
■ So we will want to have at least a set of test/methods toassess Multivariate Normality.
◆ We said that if all marginal distributions are NOT normalthen the joint distribution can not be MVN. So we will firsttalk about the assessing normality
◆ We also said even if all marginals are normally distributedthe joint is not necessarily MVN, So we will then assessMultivariate Normality.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 29 of 57
Assessing Normality
■ We will find that there are two ways to assessnormality/MVN.
1. By comparing the distribution of your observations (orsome transformation of your observations) to some knowndistribution. (These are commonly called Q-Q plots)
2. By computing some set of statistics and obtaining ap-value (i.e., compute a statistic with a known distributionand determine how extreme the statistic is compared to anull hypothesis).
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 30 of 57
Assessing Univariate Normality
We begin with Assessing Univariate Normality using a Q-Qplot.■ A Q-Q plot is a plot that matches the Quantiles of the
observed data with the Quantiles of a specific distribution.
■ A Quantile (commonly called a percentile) is that value suchthat a specific proportion p of the population will score at orbelow.
◆ For example the .5 quantile of a N(0,1) is 0.
■ In our case the Quantiles of a specific distribution will be anormal, N(0, 1).
◆ It could be a N(x, s2x), if preferred.
■ There should be a linear relationship between the quantilesof the observed data with their theoretical quantiles(assuming the distribution) if they follow the samedistribution.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 31 of 57
Constructing a Q-Q plot
Lets assume that we have n observations x1, x2, . . . , xn. Toconstruct a Q-Q plot we:
1. Order the observations from smallest to largest (i.e.,x(1) ≤ y(2) ≤ . . . ≤ x(n) ).
2. Next we define the ith point, x(i), as the (i − .5)/n quantile.
■ We could use i/n but can cause problems.
3. Based on a N(0, 1) distribution we compute the quantilevalues q1, q2, . . . , qn (this is typically done using a table orcomputer).
4. Finally plot (x(i), qi), and if they follow the same distribution(Normal) they should form a line.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 32 of 57
Example Q-Q plot
Lets assume that we have 5 observations: 3, 6, 4, 5, 2:
First we order them
y(i) (i − .5)/n qi
23456
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 33 of 57
Example Q-Q plot
Lets assume that we have 5 observations: 3, 6, 4, 5, 2:
Next compute quantiles
y(i) (i − .5)/n qi
2 (1 − .5)/5 = .1
3 (2 − .5)/5 = .3
4 (3 − .5)/5 = .5
5 (4 − .5)/5 = .7
6 (5 − .5)/5 = .9
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 34 of 57
Example Q-Q plot
Lets assume that we have 5 observations: 3, 6, 4, 5, 2:
Finally compute quantiles values assuming N(0, 1) (i.e., this isa z-score)
y(i) (i − .5)/n qi
2 (1 − .5)/5 = .1 -1.283 (2 − .5)/5 = .3 -0.524 (3 − .5)/5 = .5 0.005 (4 − .5)/5 = .7 0.526 (5 − .5)/5 = .9 1.28
and plot
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 35 of 57
Example Q-Q plot
Notice how it follows nearly a straight line
Figure 1: Q-Q plot
Q
1.5 1.0 .5 0.0 -.5 -1.0 -1.5
Y
7
6
5
4
3
2
1
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 36 of 57
Three Tests for Normality
■ The remaining methods are tests for normality
■ In each case it is computing a statistic and checking forsignificance.
■ I will mention these because in some journals this ismentioned.
■ However, because we will be mostly interested in MVN onecould quickly check for normality and the if happy, test forMVN.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 37 of 57
Test 1
Begin by computing Skewness and Kurtosis
Skewness,√
b1
√
b1 =
√n
∑ni=1(yi − y)3
[∑n
i=1(yi − y)2]3/2
Kurtosis, b2
b2 =n
∑ni=1(yi − y)4
[∑n
i=1(yi − y)2]2
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing Normality
Uni Norm
Make a Q-Q plot
Example Q-Q plot
Normality Tests
Test 1
Other Tests
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 38 of 57
Other Tests
Other tests can be gathered in SAS (note the null hypothesis isalways that the data is normally distributed).proc univariate data=mydata normal plot;var x1-x5;run;
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Assessing MVN
Scatter Plots
Q-Q Plots
Tests
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 39 of 57
Assessing MVN
■ Notice that many of the procedures that we discussed(Including the Q-Q plot) required that we rank order theobservations.
■ If instead of single observations, we have a set of nobservations of p variables (x1, x1, . . . , xn) ordering allobservations is much more difficult.
■ In fact, the book mentions that any test for MVN is far moredifficult and often has little power because of the number ofobservations on p-space, but some check should be done.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Assessing MVN
Scatter Plots
Q-Q Plots
Tests
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 40 of 57
Scatter Plots
Here we will discuss two graphical methods
■ Possibly the easiest method to assess MVN is to use scatterplots.
■ If there are not a large number of variables consider lookingscatter plots of all pairs of variables.
◆ Recall that one property of a MVN is that all subsets ofvariables should also be multivariate (in this casebivariate) normal.
◆ This means any relationship between variables should belinear and otherwise should have a random pattern
■ Could also look at all sets of three variables should also havetri-variate normal distributions.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Assessing MVN
Scatter Plots
Q-Q Plots
Tests
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 41 of 57
Q-Q Plots
As an alternative (for example if there are too many variableswe could use a Q-Q plot
■ Our Q-Q plot will be based on D2 = (x − x)′S−1(y − y).Note, x is each entity.
◆ Recall that D2 = (x − µ)′Σ−1(x − µ) has a χ2p distribution
with p degrees of freedom.
◆ We could use that to compute a Q-Q plot (i.e., use the χ2p
distribution to get the values of qi).
◆ However in estimating the mean vector and variancecovariance matrix this plot could be misleading.
■ Instead, some suggest using a function of D2.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Assessing MVN
Scatter Plots
Q-Q Plots
Tests
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 42 of 57
Q-Q Plots
We will define a new variable ui, where
ui = D2i .
1. Order the observations from smallest to largest (i.e.,u(1) ≤ u(2) ≤ . . . ≤ u(n) ).
2. Next we define the ith point, u(i), as the (i − .5)/n quantile.
3. Based on a χ2p distribution, we compute the quantile values
q1, q2, . . . , qn (this is typically done using a table orcomputer) where:
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Assessing MVN
Scatter Plots
Q-Q Plots
Tests
Outliers
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 43 of 57
Tests
■ Tests based on the Skewness and Kurtosis also exist.
■ These are a generalizations of the univariate tests.
■ You will not be tested on this, but know where it is just incase you ever need a test.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 44 of 57
Outliers
■ Detecting outliers is difficult in Multiple dimensions.
◆ Can’t simply order them and look for extreme values.
◆ May not be able to see in only 2-D plots.
◆ Could be different degrees of recording errors.
■ Here I discuss one method for detecting Outliers
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 45 of 57
Outliers Using D2
Wilk’s statistics was designed to detect a single outlier:■ Compute w where
w = maxi
|(n − 2)S−i||(n − 1)S|
■ To simplify it can be shown that w also equals:
w = 1 −nD2
(n)
(n − 1)2
■ The distribution for w is the F distribution, however the onlyunknown is D2
(n) and so we can just make decisions basedon it.
■ Therefore a test can be made that is based on the D2, whichis also computed to assess MVN
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 46 of 57
Example
Here we consider three variables: time it takes people to get toclass, number of years in school, overall happiness.
■ Just as a generic procedure I would
1. Evaluate each variable individually
2. Evaluate each pair (if this is reasonable)
3. Evaluate MVN using a Q-Q plot
4. Evaluate the reasonableness of any outliers (this can alsobe done in the previous steps.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 47 of 57
Example Q-Q plots
First we us a Q-Q plot for each variable.
Figure 2: Q-Q plot of Time
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 48 of 57
Example Q-Q plots
Figure 3: Q-Q plot of Year
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 49 of 57
Example Q-Q plots
Figure 4: Q-Q plot of Happy
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 50 of 57
Example Bivariate plots
■ So if we decide that everything looks OK then we move on tothe bivariate plots.
■ If they do not look OK we can consider using some kind ofrobust analyses or consider a transformation.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 51 of 57
Example Bivariate plots
Figure 5: Time versus Year
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 52 of 57
Example Bivariate plots
Figure 6: Time versus Happy
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 53 of 57
Example Bivariate plots
Figure 7: Year versus Happy
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Outliers
Outliers Using D2
Example
Example Q-Q plots
Example Bivariate plots
MV Q-Q Plot
Transformations
Wrapping Up
Lecture #4 - 9/14/2005 Slide 54 of 57
MV Q-Q Plot
Finally, the Multivariate Q-Q plot.
Figure 8: Year versus Happy
■ We can also look for an outlier.■ Largest D2 = 13.4
■ Compare it to Table A.6 for 30 observation and 3 variables.◆ Max D2 is 12.24 with p-value=.05 and 14.14 with
p-value=.01.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
TransformationsTransformations to Near
Normality
Wrapping Up
Lecture #4 - 9/14/2005 Slide 55 of 57
Transformations to Near Normality
A few types of data must be transformed prior to doinganalyses that assume data comes from a MVN distribution:
■ Counts, y, are transformed with√
y.
■ Proportions, p, are transformed with logit(p) = 12 log
(
p1−p
)
■ Correlations, r, are transformed with (Fisher’s)
z(r) = 12 log
(
1+r1−r
)
.
Variations of transformations exist, as do other techniques.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Final Thought
Next Class
Lecture #4 - 9/14/2005 Slide 56 of 57
Final Thought
■ The multivariate normaldistribution is an analog tothe univariate normaldistribution.
■ The MVN distribution willplay a large role in theupcoming weeks.
■ We can finally put the background material to rest, and beginlearning some practical statistics.
Overview
Generalized Variance
MVN
Distributions
Assessing Uni Normality
Assessing MV Normality
Outliers
Transformations
Wrapping Up
Final Thought
Next Class
Lecture #4 - 9/14/2005 Slide 57 of 57
Next Time
■ Statistical Analyses - Mean Vector Inference.
top related