II. The Multivariate Normal Distribution

“…it is not enough to know that a sample could have come from a normal population; we must be clear that it is at the same time improbable that it has come from a population differing so much from the normal as to invalidate the use of the ‘normal theory’ tests in further handling of the material.” E. S. Pearson, 1930 (quoted on page 1 of Tests of Normality, Henry C. Thode, Jr., 2002)


  • Slide 1
  • II. The Multivariate Normal Distribution - “…it is not enough to know that a sample could have come from a normal population; we must be clear that it is at the same time improbable that it has come from a population differing so much from the normal as to invalidate the use of the ‘normal theory’ tests in further handling of the material.” E. S. Pearson, 1930 (quoted on page 1 of Tests of Normality, Henry C. Thode, Jr., 2002)
  • Slide 2
  • A. Review of the Univariate Normal Distribution. Normal Probability Distribution - expresses the probabilities of outcomes for a continuous random variable x with a particular symmetric and unimodal distribution. This density function is given by f(x) = (1/(σ√(2π))) e^(-(x - μ)²/(2σ²)), where μ = mean, σ = standard deviation, π ≈ 3.14159, and e ≈ 2.71828.
  • Slide 3
  • but the probability is given by P(a ≤ x ≤ b) = ∫ₐᵇ (1/(σ√(2π))) e^(-(x - μ)²/(2σ²)) dx. This looks like a difficult integration problem! Will I have to integrate this function every time I want to calculate probabilities for some normal random variable?
  • Slide 4
  • Characteristics of the normal probability distribution are: -there are an infinite number of normal distributions, each defined by its unique combination of the mean μ and standard deviation σ; μ determines the central location and σ determines the spread or width -the distribution is symmetric about μ -it is unimodal, with mean = median = mode (μ = Md = Mo) -it is asymptotic with respect to the horizontal axis -the area under the curve is 1.0 -it is neither platykurtic nor leptokurtic -it follows the empirical rule: approximately 68% of observations lie within one standard deviation of the mean, about 95% within two, and about 99.7% within three
  • Slide 5
  • Normal distributions with the same mean but different standard deviations:
  • Slide 6
  • Normal distributions with the same standard deviation but different means:
  • Slide 7
  • The Standard Normal Probability Distribution - the probability distribution associated with any normal random variable (usually denoted z) that has μ = 0 and σ = 1. There are tables that can be used to obtain the results of the integration for the standard normal random variable.
  • Slide 8
  • Some of the tables work from the cumulative standard normal probability distribution (the probability that a random value selected from the standard normal random variable falls between -∞ and some given value b, i.e., P(-∞ < z ≤ b)). There are tables that give the results of the integration (Table 1 of the Appendices in J&W).
  • Slide 9
  • Cumulative Standard Normal Distribution (J&W Table 1)
  • Slide 10
  • Let's focus on a small part of the Cumulative Standard Normal Probability Distribution Table. Example: for a standard normal random variable z, what is the probability that z is between -∞ and 0.43? The table gives P(-∞ < z ≤ 0.43) = 0.6664.
  • Slide 11
  • Example: for a standard normal random variable z, what is the probability that z is between 0 and 2.0?
  • Slide 12
  • Again, looking at a small part of the Cumulative Standard Normal Probability Distribution Table, we find that the probability that a standard normal random variable z is between -∞ and 2.00 is 0.9772.
  • Slide 13
  • Example: for a standard normal random variable z, what is the probability that z is between 0 and 2.0? Area of probability = 0.9772 - 0.5000 = 0.4772 (0.5000 is the area below z = 0).
  • Slide 14
  • What is the probability that z is at least 2.0? Area of probability = 1.0000 - 0.9772 = 0.0228.
  • Slide 15
  • What is the probability that z is between -1.5 and 2.0? (From the previous example, the area between 0 and 2.0 is 0.4772.)
  • Slide 16
  • Again, looking at a small part of the Cumulative Standard Normal Probability Distribution Table, we find that the probability that a standard normal random variable z is between -∞ and 1.50 is 0.9332.
  • Slide 17
  • What is the probability that z is between -1.5 and 2.0? The area between -1.5 and 0 is 0.5000 - 0.0668 = 0.4332 and the area between 0 and 2.0 is 0.4772, so the area of probability is 0.4332 + 0.4772 = 0.9104.
  • Slide 18
  • Notice we could find the probability that z is between -1.5 and 2.0 another way! The area below 2.0 is 0.9772 and the area below -1.5 is 1.0000 - 0.9332 = 0.0668, so the area of probability is 0.9772 - 0.0668 = 0.9104.
  • Slide 19
  • There are often multiple ways to use the Cumulative Standard Normal Probability Distribution Table to find the probability that a standard normal random variable z is between two given values! How do you decide which to use? - Do what you understand (make yourself comfortable) and - DRAW THE PICTURE!!!
  • Slide 20
  • Notice we could also calculate the probability that z is between -1.5 and 2.0 yet another way! The area between -1.5 and 0 is 0.9332 - 0.5000 = 0.4332, and the area between 0 and 2.0 is again 0.4772.
  • Slide 21
  • What is the probability that z is between -2.0 and -1.5? The area between -2.0 and 0 is 0.5000 - 0.0228 = 0.4772 and the area between -1.5 and 0 is 0.4332, so the area of probability is 0.4772 - 0.4332 = 0.0440.
  • Slide 22
  • What is the probability that z is exactly 1.5? The area below 1.5 is 0.9332, but the probability that a continuous random variable takes on any single exact value is 0 (why?). (The table lookups in this section are verified numerically in the sketch below.)
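The table lookups above are easy to check numerically. Here is a minimal sketch in Python using scipy.stats.norm (purely illustrative - the slides themselves use printed tables and, later, SAS):

    from scipy.stats import norm

    print(norm.cdf(2.0))                      # P(z <= 2.0) = 0.9772
    print(norm.cdf(2.0) - 0.5)                # P(0 <= z <= 2.0) = 0.4772
    print(1.0 - norm.cdf(2.0))                # P(z >= 2.0) = 0.0228
    print(norm.cdf(2.0) - norm.cdf(-1.5))     # P(-1.5 <= z <= 2.0) = 0.9104
    print(norm.cdf(-1.5) - norm.cdf(-2.0))    # P(-2.0 <= z <= -1.5) = 0.0440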
  • Slide 23
  • Other tables work from the half standard normal probability distribution (the probability that a random value selected from the standard normal random variable falls between 0 and some given value b > 0, i.e., P(0 ≤ z ≤ b)). There are tables that give the results of the integration as well.
  • Slide 24
  • Standard Normal Distribution
  • Slide 25
  • Let's focus on a small part of the Standard Normal Probability Distribution Table. Example: for a standard normal random variable z, what is the probability that z is between 0 and 0.43? The table gives P(0 ≤ z ≤ 0.43) = 0.1664.
  • Slide 26
  • Example: for a standard normal random variable z, what is the probability that z is between 0 and 2.0?
  • Slide 27
  • Again, looking at a small part of the Standard Normal Probability Distribution Table, we find that the probability that a standard normal random variable z is between 0 and 2.00 is 0.4772.
  • Slide 28
  • Example: for a standard normal random variable z, what is the probability that z is between 0 and 2.0? Area of probability = 0.4772.
  • Slide 29
  • What is the probability that z is at least 2.0? Area of probability = 0.5000 - 0.4772 = 0.0228.
  • Slide 30
  • What is the probability that z is between -1.5 and 2.0? (The area between 0 and 2.0 is 0.4772.)
  • Slide 31
  • Again, looking at a small part of the Standard Normal Probability Distribution Table, we find that the probability that a standard normal random variable z is between 0 and 1.50 is 0.4332.
  • Slide 32
  • What is the probability that z is between -1.5 and 2.0? The area between -1.5 and 0 is 0.4332 and the area between 0 and 2.0 is 0.4772, so the area of probability is 0.4332 + 0.4772 = 0.9104.
  • Slide 33
  • What is the probability that z is between -2.0 and -1.5? The area between -2.0 and 0 is 0.4772 and the area between -1.5 and 0 is 0.4332, so the area of probability is 0.4772 - 0.4332 = 0.0440.
  • Slide 34
  • What is the probability that z is exactly 1.5? The area between 0 and 1.5 is 0.4332, but the probability that a continuous random variable takes on any single exact value is 0 (why?).
  • Slide 35
  • z-Transformation - mathematical means by which any normal random variable with a mean μ and standard deviation σ can be converted into a standard normal random variable: to make the mean equal to 0, we simply subtract μ from each observation in the population; to then make the standard deviation equal to 1, we divide the results of the first step by σ. The resulting transformation is given by z = (x - μ)/σ.
  • Slide 36
  • Example: for a normal random variable x with a mean of 5 and a standard deviation of 3, what is the probability that x is between 5.0 and 7.0?
  • Slide 37
  • Using the z-transformation, we can restate the problem in the following manner: P(5.0 ≤ x ≤ 7.0) = P((5.0 - 5.0)/3.0 ≤ z ≤ (7.0 - 5.0)/3.0) = P(0.00 ≤ z ≤ 0.67), then use the standard normal probability table to find the ultimate answer: P(0.00 ≤ z ≤ 0.67) = 0.2486.
  • Slide 38
  • which graphically looks like this: an area of probability of 0.2486 between z = 0.00 and z = 0.67. (A numerical check follows below.)
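The same answer can be computed directly; this sketch rounds z to two decimals to match the table lookup:

    from scipy.stats import norm

    mu, sigma = 5.0, 3.0
    z = round((7.0 - mu) / sigma, 2)       # z = (x - mu)/sigma = 0.67
    p = norm.cdf(z) - norm.cdf(0.0)        # P(5.0 <= x <= 7.0) = P(0 <= z <= 0.67)
    print(z, round(p, 4))                  # 0.67 0.2486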
  • Slide 39
  • Why is the normal probability distribution considered so important? -many random variables are naturally normally distributed -many distributions, such as the Poisson and the binomial, can be approximated by the normal distribution (Central Limit Theorem) -the distributions of many statistics, such as the sample mean and the sample proportion, are approximately normal if the sample is sufficiently large (also the Central Limit Theorem)
  • Slide 40
  • B. The Multivariate Normal Distribution. The univariate normal distribution has a generalized form in p dimensions - the p-dimensional normal density function is f(x) = (2π)^(-p/2) |Σ|^(-1/2) e^(-(x - μ)'Σ⁻¹(x - μ)/2), where -∞ < xᵢ < ∞, i = 1, …, p. This p-dimensional normal density function is denoted by Nₚ(μ, Σ), where (x - μ)'Σ⁻¹(x - μ) is the squared generalized distance from x to μ.
  • Slide 41
  • The simplest multivariate normal distribution is the bivariate (2-dimensional) normal distribution, which has the density function f(x) = (2π)⁻¹ |Σ|^(-1/2) e^(-(x - μ)'Σ⁻¹(x - μ)/2), where -∞ < xᵢ < ∞, i = 1, 2. This 2-dimensional normal density function is denoted by N₂(μ, Σ), where (x - μ)'Σ⁻¹(x - μ) is the squared generalized distance from x to μ.
  • Slide 42
  • We can easily find the inverse of the covariance matrix (by using Gauss-Jordan elimination or some other technique): for Σ = [[σ₁₁, σ₁₂], [σ₁₂, σ₂₂]], Σ⁻¹ = (1/(σ₁₁σ₂₂ - σ₁₂²)) [[σ₂₂, -σ₁₂], [-σ₁₂, σ₁₁]]. Now we use the previously established relationship σ₁₂ = ρ₁₂√σ₁₁√σ₂₂ to establish that |Σ| = σ₁₁σ₂₂ - σ₁₂² = σ₁₁σ₂₂(1 - ρ₁₂²).
  • Slide 43
  • By substitution we can now write the squared distance as (x - μ)'Σ⁻¹(x - μ) = (1/(1 - ρ₁₂²)) [((x₁ - μ₁)/√σ₁₁)² + ((x₂ - μ₂)/√σ₂₂)² - 2ρ₁₂((x₁ - μ₁)/√σ₁₁)((x₂ - μ₂)/√σ₂₂)]
  • Slide 44
  • which means that we can rewrite the bivariate normal probability density function as f(x₁, x₂) = (1/(2π√(σ₁₁σ₂₂(1 - ρ₁₂²)))) e^(-(squared generalized distance)/2), with the squared generalized distance written out as on the previous slide. (A numerical sketch of this construction follows below.)
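Here is a small numerical sketch of the construction above (the mean vector, covariance matrix, and evaluation point are hypothetical, chosen only for illustration); it builds the density from the squared generalized distance and checks the result against scipy:

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([0.0, 0.0])                  # hypothetical mean vector
    Sigma = np.array([[1.0, 0.5],
                      [0.5, 1.0]])             # hypothetical covariance matrix
    x = np.array([1.0, -0.5])                  # hypothetical evaluation point

    # squared generalized distance (x - mu)' Sigma^{-1} (x - mu)
    d2 = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)

    # bivariate normal density assembled from the definition
    p = 2
    f = np.exp(-0.5 * d2) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

    print(np.isclose(f, multivariate_normal(mu, Sigma).pdf(x)))   # True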
  • Slide 45
  • Graphically, the bivariate normal probability density function is a bell-shaped surface over the (x₁, x₂) plane. All points of equal density are called a contour, defined for p dimensions as all x such that (x - μ)'Σ⁻¹(x - μ) = c².
  • Slide 46
  • The contours of f(x₁, x₂) for constant c (all x such that (x - μ)'Σ⁻¹(x - μ) = c²) form concentric ellipsoids centered at μ with axes ±c√λᵢ eᵢ, where Σeᵢ = λᵢeᵢ, i = 1, 2.
  • Slide 47
  • The general form of contours for a bivariate normal probability distribution where the variables have equal variance (σ₁₁ = σ₂₂) is relatively easy to derive: first we need the eigenvalues of Σ, which are λ₁ = σ₁₁ + σ₁₂ and λ₂ = σ₁₁ - σ₁₂.
  • Slide 48
  • Next we need the eigenvectors of Σ, which are e₁' = (1/√2, 1/√2) and e₂' = (1/√2, -1/√2).
  • Slide 49
  • -for a positive covariance σ₁₂, the first eigenvalue and its associated eigenvector lie along the 45° line running through the centroid μ. What do you suppose happens when the covariance is negative? Why?
  • Slide 50
  • -for a negative covariance σ₁₂, the second eigenvalue and its associated eigenvector lie at right angles to the 45° line running through the centroid μ. What do you suppose happens when the covariance is zero? Why?
  • Slide 51
  • What do you suppose happens when the two random variables X₁ and X₂ are uncorrelated (i.e., ρ₁₂ = 0)? The joint density factors into the product of the marginals: f(x₁, x₂) = f(x₁)f(x₂).
  • Slide 52
  • -for a covariance σ₁₂ of zero, the two eigenvalues are equal and the eigenvectors are equal to those above (except for signs) - one runs along the 45° line running through the centroid μ and the other is perpendicular to it.
  • Slide 53
  • Contours also have an important probability interpretation - the solid ellipsoid of x values satisfying (x - μ)'Σ⁻¹(x - μ) ≤ χ²ₚ(α) has probability 1 - α, i.e., P[(x - μ)'Σ⁻¹(x - μ) ≤ χ²ₚ(α)] = 1 - α. (A sketch combining the contour geometry and this probability statement follows below.)
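A sketch of the contour geometry and its probability interpretation, under assumed values (equal variances of 1.0 and a positive covariance of 0.6, both hypothetical):

    import numpy as np
    from scipy.stats import chi2

    Sigma = np.array([[1.0, 0.6],
                      [0.6, 1.0]])      # equal variances, positive covariance
    lam, V = np.linalg.eigh(Sigma)      # eigenvalues (ascending) and eigenvectors

    print(lam)        # [0.4, 1.6], i.e., sigma11 -/+ sigma12
    print(V[:, 1])    # major-axis direction, +/-(0.707, 0.707): the 45-degree line

    # the ellipsoid (x - mu)' Sigma^{-1} (x - mu) <= c2 contains probability 1 - alpha
    alpha = 0.50
    c2 = chi2.ppf(1 - alpha, df=2)      # chi-square quantile, here 1.386
    print(np.sqrt(c2 * lam))            # half-axis lengths c * sqrt(lambda_i)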
  • Slide 54
  • C. Properties of the Multivariate Normal Distribution. For any multivariate normal random vector X: 1. The density has maximum value at x = μ, i.e., the mean is equal to the mode!
  • Slide 55
  • 2. The density is symmetric along its constant density contours and is centered at μ, i.e., the mean is equal to the median! 3. Linear combinations of the components of X are normally distributed. 4. All subsets of the components of X have a (multivariate) normal distribution. 5. Zero covariance implies that the corresponding components of X are independently distributed. 6. Conditional distributions of the components of X are (multivariate) normal.
  • Slide 56
  • D. Some Important Results Regarding the Multivariate Normal Distribution. 1. If X ~ Nₚ(μ, Σ), then any linear combination a'X ~ N₁(a'μ, a'Σa). Furthermore, if a'X ~ N₁(a'μ, a'Σa) for every a, then X ~ Nₚ(μ, Σ).
  • Slide 57
  • 2. If X ~ Nₚ(μ, Σ), then any set of q linear combinations AX ~ N_q(Aμ, AΣA'), where A is a q×p matrix of constants. Furthermore, if d is a conformable vector of constants, then X + d ~ Nₚ(μ + d, Σ).
  • Slide 58
  • 3. If X ~ Nₚ(μ, Σ), then all subsets of X are (multivariate) normally distributed; i.e., for any partition X' = [X₁' X₂'] with corresponding μ' = [μ₁' μ₂'] and Σ = [[Σ₁₁, Σ₁₂], [Σ₂₁, Σ₂₂]], we have X₁ ~ N_q(μ₁, Σ₁₁) and X₂ ~ N_(p-q)(μ₂, Σ₂₂).
  • Slide 59
  • 4. If X₁ ~ N_q1(μ₁, Σ₁₁) and X₂ ~ N_q2(μ₂, Σ₂₂) are independent, then Cov(X₁, X₂) = Σ₁₂ = 0. And if [X₁' X₂']' is multivariate normal with covariance matrix [[Σ₁₁, Σ₁₂], [Σ₂₁, Σ₂₂]], then X₁ and X₂ are independent iff Σ₁₂ = 0. And if X₁ ~ N_q1(μ₁, Σ₁₁) and X₂ ~ N_q2(μ₂, Σ₂₂) are independent, then [X₁' X₂']' ~ N_(q1+q2)([μ₁' μ₂']', [[Σ₁₁, 0], [0, Σ₂₂]]).
  • Slide 60
  • 5. If X ~ Nₚ(μ, Σ) and |Σ| > 0, then (X - μ)'Σ⁻¹(X - μ) ~ χ²ₚ, and the Nₚ(μ, Σ) distribution assigns probability 1 - α to the solid ellipsoid {x : (x - μ)'Σ⁻¹(x - μ) ≤ χ²ₚ(α)}.
  • Slide 61
  • 6. Let Xⱼ ~ Nₚ(μⱼ, Σ), j = 1, …, n, be mutually independent (note the common covariance matrix Σ). Then V₁ = c₁X₁ + … + cₙXₙ ~ Nₚ(Σⱼ cⱼμⱼ, (Σⱼ cⱼ²)Σ). Furthermore, V₁ and V₂ = b₁X₁ + … + bₙXₙ are jointly multivariate normal with covariance matrix [[(Σⱼ cⱼ²)Σ, (b'c)Σ], [(b'c)Σ, (Σⱼ bⱼ²)Σ]], so V₁ and V₂ are independent if b'c = 0!
  • Slide 62
  • E. Sampling From a Multivariate Normal Distribution and Maximum Likelihood Estimation. Let Xⱼ ~ Nₚ(μ, Σ), j = 1, …, n, represent a random sample. Since the Xⱼ's are mutually independent and each has distribution Nₚ(μ, Σ), their joint density is the product of their marginal densities, i.e., ∏ⱼ f(xⱼ) = (2π)^(-np/2) |Σ|^(-n/2) e^(-Σⱼ(xⱼ - μ)'Σ⁻¹(xⱼ - μ)/2). As a function of μ and Σ, this is the likelihood for fixed observations xⱼ, j = 1, …, n.
  • Slide 63
  • Maximum Likelihood Estimation - estimation of parameter values by finding estimates that maximize the likelihood of the sample data on which they are based (select estimated values for parameters that best explain the sample data collected). Maximum Likelihood Estimates - the estimates of parameter values that maximize the likelihood of the sample data on which they are based. For a multivariate normal distribution, we would like to obtain the maximum likelihood estimates of the parameters μ and Σ given the sample data X we have collected. To simplify our efforts we will need to utilize some properties of the trace to rewrite the likelihood function in another form.
  • Slide 64
  • For a k×k symmetric matrix A and a k×1 vector x: -x'Ax = tr(x'Ax) = tr(Axx') -tr(A) = Σᵢ λᵢ, where the λᵢ, i = 1, …, k, are the eigenvalues of A. These two results can be used to simplify the joint density of n mutually independent random observations Xⱼ, each with distribution Nₚ(μ, Σ): we first rewrite (xⱼ - μ)'Σ⁻¹(xⱼ - μ) = tr[Σ⁻¹(xⱼ - μ)(xⱼ - μ)'].
  • Slide 65
  • Then we rewrite Σⱼ(xⱼ - μ)'Σ⁻¹(xⱼ - μ) = tr[Σ⁻¹ Σⱼ(xⱼ - μ)(xⱼ - μ)'], since the trace of the sum of matrices is equal to the sum of their individual traces.
  • Slide 66
  • We can further state that Σⱼ(xⱼ - μ)(xⱼ - μ)' = Σⱼ(xⱼ - x̄)(xⱼ - x̄)' + n(x̄ - μ)(x̄ - μ)', because the cross-product terms are both matrices of zeros.
  • Slide 67
  • Substitution of these two results yields an alternative expression of the joint density of a random sample from a p-dimensional normal population: (2π)^(-np/2) |Σ|^(-n/2) e^(-tr[Σ⁻¹(Σⱼ(xⱼ - x̄)(xⱼ - x̄)' + n(x̄ - μ)(x̄ - μ)')]/2). Substitution of the observed values x₁, …, xₙ into the joint density yields the likelihood function for the corresponding sample X, which is often denoted L(μ, Σ).
  • Slide 68
  • So for observed values x₁, …, xₙ that comprise a random sample X drawn from a p-dimensional normally distributed population, the likelihood function is L(μ, Σ) = (2π)^(-np/2) |Σ|^(-n/2) e^(-tr[Σ⁻¹(Σⱼ(xⱼ - x̄)(xⱼ - x̄)' + n(x̄ - μ)(x̄ - μ)')]/2).
  • Slide 69
  • Finally, note that we can express the exponent of the likelihood function in many ways - one alternate expression will be particularly convenient: Σⱼ(xⱼ - μ)'Σ⁻¹(xⱼ - μ) = tr[Σ⁻¹ Σⱼ(xⱼ - x̄)(xⱼ - x̄)'] + n(x̄ - μ)'Σ⁻¹(x̄ - μ).
  • Slide 70
  • which, by another substitution, yields the likelihood function L(μ, Σ) = (2π)^(-np/2) |Σ|^(-n/2) e^(-{tr[Σ⁻¹ Σⱼ(xⱼ - x̄)(xⱼ - x̄)'] + n(x̄ - μ)'Σ⁻¹(x̄ - μ)}/2). Again, keep in mind that we are pursuing estimates of μ and Σ that maximize the likelihood function L(μ, Σ) for a given random sample X.
  • Slide 71
  • This result will also be helpful in deriving the maximum likelihood estimates of μ and Σ: for a p×p symmetric positive definite matrix B and scalar b > 0, (1/|Σ|^b) e^(-tr(Σ⁻¹B)/2) ≤ (1/|B|^b) (2b)^(pb) e^(-bp) for all positive definite Σ of dimension p×p, with equality holding only for Σ = (1/(2b))B.
  • Slide 72
  • Now we are ready for maximum likelihood estimation of μ and Σ. For a random sample X₁, …, Xₙ from a normal population with mean μ and covariance Σ, the maximum likelihood estimators μ̂ and Σ̂ of μ and Σ are μ̂ = X̄ and Σ̂ = (1/n) Σⱼ(Xⱼ - X̄)(Xⱼ - X̄)' = ((n - 1)/n)S. Their observed values for observed data x₁, …, xₙ, namely x̄ and (1/n) Σⱼ(xⱼ - x̄)(xⱼ - x̄)', are the maximum likelihood estimates of μ and Σ. (A numerical sketch follows below.)
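A minimal numerical sketch of these estimators (the population parameters and the simulated sample are purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([1.0, 2.0], [[2.0, 0.3],
                                             [0.3, 1.0]], size=200)   # n x p sample
    n = X.shape[0]

    mu_hat = X.mean(axis=0)                   # MLE of mu: the sample mean vector
    centered = X - mu_hat
    Sigma_hat = centered.T @ centered / n     # MLE of Sigma: divisor n, not n - 1
    S = centered.T @ centered / (n - 1)       # the usual unbiased sample covariance

    print(mu_hat)
    print(Sigma_hat)                          # equals ((n - 1)/n) * S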
  • Slide 73
  • Note that the maximum of the likelihood is achieved at μ̂ = x̄ and Σ̂ = ((n - 1)/n)S; since the exponent then reduces to -np/2, we have that L(μ̂, Σ̂) = (2π)^(-np/2) |Σ̂|^(-n/2) e^(-np/2), i.e., a constant times |Σ̂|^(-n/2), where |Σ̂| is the generalized variance.
  • Slide 74
  • It can be shown that maximum likelihood estimators (or MLEs) possess an invariance property: if θ̂ is the MLE of θ, then the MLE of f(θ) is f(θ̂). Thus we can say -the MLE of μ'Σ⁻¹μ is μ̂'Σ̂⁻¹μ̂ -the MLE of √σᵢᵢ is √σ̂ᵢᵢ, where σ̂ᵢᵢ = (1/n) Σⱼ(xⱼᵢ - x̄ᵢ)² is the MLE of Var(Xᵢ).
  • Slide 75
  • It can also be shown that x̄ and S are sufficient for the multivariate normal joint density, i.e., the density depends on the entire set of observations x₁, …, xₙ only through x̄ and S. Thus, we refer to X̄ and S as the sufficient statistics for the multivariate normal distribution. Sufficient Statistics - contain all information necessary to evaluate a particular density for a given sample.
  • Slide 76
  • F. The Sampling Distributions of X̄ and S. The assumption that X₁, …, Xₙ constitute a random sample with mean μ and covariance Σ completely determines the sampling distributions of X̄ and S. For a univariate normal distribution, X̄ is normal with mean μ and variance σ²/n. Analogously, for the multivariate (p ≥ 2) case (i.e., X is normal with mean μ and covariance Σ), X̄ is normal with mean μ and covariance (1/n)Σ.
  • Slide 77
  • Similarly, for a random sample X₁, …, Xₙ from a univariate normal distribution with mean μ and variance σ², (n - 1)s² = Σⱼ(Xⱼ - X̄)² is distributed as σ² times a chi-square variable with n - 1 degrees of freedom. Analogously, for the multivariate (p ≥ 2) case (i.e., X is normal with mean μ and covariance Σ), (n - 1)S = Σⱼ(Xⱼ - X̄)(Xⱼ - X̄)' is Wishart distributed with n - 1 degrees of freedom (denoted W_(n-1)(·|Σ)), where W_m(·|Σ) is the distribution of Σⱼ₌₁ᵐ ZⱼZⱼ' for independent Zⱼ ~ Nₚ(0, Σ).
  • Slide 78
  • Some important properties of the Wishart distribution: -the Wishart distribution exists only if n > p -if A₁ ~ W_m1(A₁|Σ) and A₂ ~ W_m2(A₂|Σ) are independent (note the common covariance matrix Σ), then A₁ + A₂ ~ W_(m1+m2)(A₁ + A₂|Σ) -and if A ~ W_m(A|Σ), then CAC' ~ W_m(CAC'|CΣC'). (A simulation sketch follows below.)
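The Wishart connection can be checked by simulation; this sketch (with a hypothetical Σ) compares the average of many simulated (n - 1)S matrices against the mean of the W_(n-1)(·|Σ) distribution, which is (n - 1)Σ:

    import numpy as np
    from scipy.stats import wishart

    rng = np.random.default_rng(1)
    Sigma = np.array([[2.0, 0.3],
                      [0.3, 1.0]])        # hypothetical covariance matrix
    n = 50

    sims = []
    for _ in range(2000):
        X = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
        S = np.cov(X, rowvar=False)       # unbiased sample covariance
        sims.append((n - 1) * S)

    print(np.mean(sims, axis=0))                   # approximately (n - 1) * Sigma
    print(wishart.mean(df=n - 1, scale=Sigma))     # the exact Wishart mean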
  • Slide 79
  • When it exists, the Wishart distribution has density w_(n-1)(A|Σ) = |A|^((n-p-2)/2) e^(-tr(AΣ⁻¹)/2) / (2^(p(n-1)/2) π^(p(p-1)/4) |Σ|^((n-1)/2) ∏ᵢ₌₁ᵖ Γ((n - i)/2)) for a symmetric positive definite matrix A.
  • Slide 80
  • G. Large Sample Behavior of X̄ and S. -The (Univariate) Central Limit Theorem - suppose that X̄ is the mean of n independent contributions Vᵢ with approximately equivalent variability. Then the distribution of X̄ becomes approximately normal as the sample size increases, no matter what form the underlying population distribution takes. -Convergence in Probability - a random variable X is said to converge in probability to a given constant value c if, for any prescribed accuracy ε > 0, P[-ε < X - c < ε] approaches 1 as n → ∞.
  • Slide 81
  • -The Law of Large Numbers - let Y₁, …, Yₙ constitute independent observations from a population with mean E[Y] = μ. Then Ȳ converges in probability to μ as n increases without bound, i.e., P[-ε < Ȳ - μ < ε] approaches 1 as n → ∞.
  • Slide 82
  • Multivariate implications of the Law of Large Numbers include: P[-ε < X̄ - μ < ε] approaches 1 as n → ∞, and P[-ε < S - Σ < ε] approaches 1 as n → ∞, or similarly P[-ε < Sₙ - Σ < ε] approaches 1 as n → ∞ (elementwise, where Sₙ is the covariance matrix with divisor n). These happen very quickly!
  • Slide 83
  • These statements are sometimes written as X̄ →ᴾ μ and S →ᴾ Σ, or similarly Sₙ →ᴾ Σ, where →ᴾ denotes convergence in probability.
  • Slide 84
  • -These results can be used to support the (Multivariate) Central Limit Theorem: let X₁, …, Xₙ constitute independent observations from any population with mean μ and finite (nonsingular) covariance Σ. Then √n(X̄ - μ) is approximately Nₚ(0, Σ) for n large relative to p. This can be restated as X̄ approximately distributed as Nₚ(μ, (1/n)Σ), again for n large relative to p.
  • Slide 85
  • Because the sample covariance matrix S (or Sₙ) converges to the population covariance matrix Σ so quickly (i.e., at relatively small values of n - p), we often substitute the sample covariance for the population covariance with little concern for the ramifications, so we have √n(X̄ - μ) approximately Nₚ(0, S) for n large relative to p. This can be restated as X̄ approximately distributed as Nₚ(μ, (1/n)S), again for n large relative to p.
  • Slide 86
  • One final important result due to the CLT - by substitution, n(X̄ - μ)'S⁻¹(X̄ - μ) is approximately χ²ₚ distributed for n large relative to p. (A simulation sketch follows below.)
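This last result holds even when the parent population is decidedly non-normal, which a short simulation illustrates (the exponential parent and the sample sizes are arbitrary choices):

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(2)
    p, n, reps = 3, 100, 5000
    d2 = np.empty(reps)

    for i in range(reps):
        X = rng.exponential(1.0, size=(n, p))   # non-normal parent, mean vector = 1
        xbar = X.mean(axis=0)
        S = np.cov(X, rowvar=False)
        diff = xbar - 1.0
        d2[i] = n * diff @ np.linalg.inv(S) @ diff

    # if n(xbar - mu)' S^{-1} (xbar - mu) is approximately chi-square with p df,
    # about 95% of the simulated distances fall below the 0.95 quantile
    print(np.mean(d2 <= chi2.ppf(0.95, df=p)))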
  • Slide 87
  • H. Assessing the Assumption of Normality. There are two general circumstances in multivariate statistics under which the assumption of multivariate normality is crucial: -the technique to be used relies directly on the raw observations Xⱼ -the technique to be used relies directly on the sample mean vector X̄ (including techniques which rely on distances of the form n(X̄ - μ)'S⁻¹(X̄ - μ)). In either of these situations, the quality of inferences to be made depends on how closely the true parent population resembles the assumed multivariate normal form!
  • Slide 88
  • Based on the properties of the Multivariate Normal Distribution, we know -all linear combinations of the individual normal components are normal -the contours of the multivariate normal density are concentric ellipsoids. These facts suggest investigation of the following questions (in one or two dimensions): -Do the marginal distributions of the elements of X appear normal? What about a few linear combinations? -Do the bivariate scatterplots appear ellipsoidal? -Are there any unusual looking observations (outliers)?
  • Slide 89
  • Tools frequently used for assessing univariate normality include -the empirical rule -dot plots (for small sample sets) and histograms or stem & leaf plots (for larger samples) -goodness-of-fit tests such as the Chi-Square GOF Test and the Kolmogorov-Smirnov Test -the test developed by Shapiro and Wilk [1965], called the Shapiro-Wilk test -Q-Q plots (of the sample quantiles against the expected quantiles for each observation given normality)
  • Slide 90
  • Example - suppose we had the following fifteen (ordered) sample observations on some random variable X: 1.43, 1.62, 2.46, 2.48, 2.97, 4.03, 4.47, 5.76, 6.61, 6.68, 6.79, 7.46, 7.88, 8.92, 9.42. Do these data support the assertion that they were drawn from a normal parent population?
  • Slide 91
  • In order to assess normality by the empirical rule, we need to compute the generalized distance from the centroid (convert the data to standardized values zⱼ = (xⱼ - x̄)/s); for our data we have x̄ ≈ 5.27 and s ≈ 2.67. Nine of the observations (or 60%) lie within one standard deviation of the mean, and all fifteen of the observations lie within two standard deviations of the mean - does this support the assertion that they were drawn from a normal parent population?
  • Slide 92
  • A simple dot plot of the data over the range -1 to 11 doesn't seem to tell us much (of course, fifteen data points isn't much to go on). How about a histogram? That doesn't seem to tell us much either!
  • Slide 93
  • We could use SAS to calculate the Shapiro-Wilk test statistic and corresponding p-value:

DATA stuff;
INPUT x;
LABEL x='Observed Values of X';
CARDS;
1.43
1.62
2.46
2.48
2.97
4.03
4.47
5.76
6.61
6.68
6.79
7.46
7.88
8.92
9.42
;
PROC UNIVARIATE DATA=stuff NORMAL;
TITLE4 'Using PROC UNIVARIATE for tests of univariate normality';
VAR x;
RUN;
  • Slide 94
  • Tests for Normality (PROC UNIVARIATE output):

Test                 Statistic           p Value
Shapiro-Wilk         W      0.935851     Pr < W      0.3331
Kolmogorov-Smirnov   D      0.159493     Pr > D     >0.1500
Cramer-von Mises     W-Sq   0.058767     Pr > W-Sq  >0.2500
Anderson-Darling     A-Sq   0.362615     Pr > A-Sq  >0.2500

[PROC UNIVARIATE also produces a stem & leaf plot, a boxplot, and a normal probability plot for these data.]
  • Slide 95
  • Or a Q-Q plot: -put the observed values in ascending order - call these the x_(j) -calculate the continuity-corrected cumulative probability level (j - 0.5)/n for the sample data -find the standard normal quantiles (values of the N(0,1) distribution) that have cumulative probability level (j - 0.5)/n - call these the q_(j), i.e., find z such that P(Z ≤ z) = (j - 0.5)/n -plot the pairs (q_(j), x_(j)). If the points lie on/near a straight line, the observations support the contention that they could have been drawn from a normal parent population. (A sketch of these calculations follows below.)
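Here is a sketch of the Q-Q construction, together with the Looney & Gulledge correlation discussed a few slides ahead, applied to the fifteen observations from the SAS example above:

    import numpy as np
    from scipy.stats import norm

    x = np.sort([1.43, 1.62, 2.46, 2.48, 2.97, 4.03, 4.47, 5.76,
                 6.61, 6.68, 6.79, 7.46, 7.88, 8.92, 9.42])     # the x_(j)
    n = len(x)

    levels = (np.arange(1, n + 1) - 0.5) / n    # continuity-corrected (j - 0.5)/n
    q = norm.ppf(levels)                        # standard normal quantiles q_(j)

    # plotting (q, x) gives the Q-Q plot; the correlation r_Q measures linearity
    r_Q = np.corrcoef(q, x)[0, 1]
    print(round(r_Q, 4))
    # compare with the J&W Table 4.2 critical points for n = 15:
    # 0.9503 (alpha = 0.10), 0.9389 (alpha = 0.05), 0.9216 (alpha = 0.01)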
  • Slide 96
  • The results of calculations for the Q-Q plot look like this:
  • Slide 97
  • and the resulting Q-Q plot looks like this: there don't appear to be great departures from the straight line drawn through the points, but it doesn't fit terribly well, either…
  • Slide 98
  • Looney & Gulledge [1985] suggest calculating the Pearson correlation coefficient between the q_(j) and the x_(j) (a test has even been developed); the formula for the correlation coefficient is r_Q = Σⱼ(x_(j) - x̄)(q_(j) - q̄) / (√(Σⱼ(x_(j) - x̄)²) √(Σⱼ(q_(j) - q̄)²)). Critical points for the test of normality are given in Table 4.2 (page 182) of J&W (note we reject the hypothesis of normality if r_Q is less than the critical value).
  • Slide 99
  • For our previous example, the intermediate calculations are given in the table below:
  • Slide 100
  • Evaluation of the Pearson correlation coefficient between the q_(j) and x_(j) yields r_Q ≈ 0.975 (the sketch above computes this value from the data). The sample size is n = 15, so critical points for the test of normality are 0.9503 at α = 0.10, 0.9389 at α = 0.05, and 0.9216 at α = 0.01. Since r_Q exceeds each of these critical points, we do not reject the hypothesis of normality at any of these levels of α.
  • Slide 101
  • When addressing the issue of multivariate normality, these tools aid in assessment of normality for the univariate marginal distributions. However, we should also consider bivariate marginal distributions (each of which should be normal if the overall joint distribution is multivariate normal). The methods most commonly used for assessing bivariate normality are -scatter plots -Chi-Square Plots
  • Slide 102
  • Example - suppose we had fifteen (ordered) sample observations on some random variables X₁ and X₂. Do these data support the assertion that they were drawn from a bivariate normal parent population?
  • Slide 103
  • The scatter plot of the pairs (x₁, x₂) supports the assertion that these data were drawn from a bivariate normal distribution (and that they have little or no correlation).
  • Slide 104
  • To create a Chi-Square plot, we will need to calculate the squared generalized distance from the centroid for each observation xⱼ: d²ⱼ = (xⱼ - x̄)'S⁻¹(xⱼ - x̄). For our bivariate data we compute x̄ and S, then each d²ⱼ.
  • Slide 105
  • so the squared generalized distances d²ⱼ from the centroid are computed for each observation; we then order the observations relative to their squared generalized distances, obtaining d²_(1) ≤ d²_(2) ≤ … ≤ d²_(n).
  • Slide 106
  • We then find the corresponding percentile q_c,2[(j - 0.5)/n] of the Chi-Square distribution with p = 2 degrees of freedom, and create a scatter plot of the pairs (d²_(j), q_c,2[(j - 0.5)/n]). If these points lie on a straight line, the data support the assertion that they were drawn from a bivariate normal parent population. (A sketch of the computation follows below.)
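A sketch of the chi-square plot computation; because the example's bivariate data table did not survive transcription, a simulated sample stands in for the observations:

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(3)
    X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.4],
                                             [0.4, 1.0]], size=15)   # stand-in data
    n, p = X.shape

    xbar = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.sort([(xj - xbar) @ S_inv @ (xj - xbar) for xj in X])    # ordered d2_(j)

    q = chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)   # chi-square percentiles
    # plotting (d2, q) gives the chi-square plot; near-linearity (correlation near 1)
    # supports bivariate normality
    print(np.corrcoef(q, d2)[0, 1])
    print(np.mean(d2 <= chi2.ppf(0.50, df=p)))   # should be near 0.5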
  • Slide 107
  • These data don't seem to support the assertion that they were drawn from a bivariate normal parent population - possible outliers!
  • Slide 108
  • Some suggest also looking to see if roughly half the squared distances d²ⱼ are less than or equal to q_c,p(0.50) (i.e., lie within the ellipsoid containing 50% of all potential p-dimensional observations). For our example, 7 of our fifteen observations (about 46.67%) lie less than q_c,2(0.50) = 1.386 standardized units from the centroid (i.e., within the ellipsoid containing 50% of all potential p-dimensional observations). Note that the Chi-Square plot can easily be extended to p > 2 dimensions. Note also that some researchers calculate the correlation between the d²_(j) and the q_c,p[(j - 0.5)/n]; for our example this is 0.8952.
  • Slide 109
  • I. Outlier Detection. Detecting outliers (extreme or unusual observations) in p ≥ 2 dimensions is very tricky. Consider the following situation: an observation may fall inside the 90% confidence interval for X₁ and inside the 90% confidence interval for X₂ yet still fall outside the 90% confidence ellipsoid for (X₁, X₂) jointly, so univariate screening alone can miss multivariate outliers.
  • Slide 110
  • A strategy for multivariate outlier detection: -Look for univariate outliers: standardized values; dot plots, histograms, stem & leaf plots; Shapiro-Wilk test, GOF tests; Q-Q plots and correlation. -Look for bivariate outliers: generalized squared distances; scatter plots (perhaps a scatter plot matrix); Chi-Square plots and correlation. -Look for p-dimensional outliers: generalized squared distances; Chi-Square plots and correlation. Note that NO STRATEGY guarantees detection of outliers!
  • Slide 111
  • Here are the calculated standardized values (the zⱼᵢ's) and squared generalized distances (the d²ⱼ's) for our previous data: one observation looks a little unusual in p = 2 space…
  • Slide 112
  • J. Transformations to Near Normality. Transformations to make nonnormal data approximately normal are usually suggested by -theory -the raw data. Some common transformations include:

Original Scale     Transformed Scale
Counts y           √y
Proportions p̂      logit(p̂) = (1/2) ln[p̂/(1 - p̂)]
Correlations r     Fisher's z(r) = (1/2) ln[(1 + r)/(1 - r)]
  • Slide 113
  • For continuous random variables, an appropriate transformation can usually be found among the family of power transformations x^λ. Box and Cox [1964] suggest an approach to finding an appropriate transformation from this family. Box and Cox consider the slightly modified family of power transformations x^(λ) = (x^λ - 1)/λ for λ ≠ 0, and x^(λ) = ln x for λ = 0.
  • Slide 114
  • For observations x₁, …, xₙ, the Box-Cox choice of the appropriate power λ for the normalizing transformation is that which maximizes ℓ(λ) = -(n/2) ln[(1/n) Σⱼ(xⱼ^(λ) - x̄^(λ))²] + (λ - 1) Σⱼ ln xⱼ, where x̄^(λ) = (1/n) Σⱼ xⱼ^(λ) and xⱼ^(λ) = (xⱼ^λ - 1)/λ.
  • Slide 115
  • We then evaluate ℓ(λ) at many points on a short interval (say [-1, 1] or [-2, 2]), plot the pairs (λ, ℓ(λ)), and look for a maximum point λ*. Often a logical value of λ near λ* is chosen. (A sketch of the grid search follows below.)
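A sketch of the grid search for λ*, implementing ℓ(λ) exactly as defined above and reusing the fifteen univariate observations from the earlier example:

    import numpy as np

    def boxcox_loglik(lmbda, x):
        # profile log-likelihood l(lambda) for the Box-Cox transformation
        n = len(x)
        y = np.log(x) if lmbda == 0 else (x ** lmbda - 1.0) / lmbda
        return -0.5 * n * np.log(np.var(y)) + (lmbda - 1.0) * np.sum(np.log(x))

    x = np.array([1.43, 1.62, 2.46, 2.48, 2.97, 4.03, 4.47, 5.76,
                  6.61, 6.68, 6.79, 7.46, 7.88, 8.92, 9.42])

    grid = np.linspace(-2.0, 2.0, 401)
    ll = np.array([boxcox_loglik(lam, x) for lam in grid])
    print(grid[np.argmax(ll)])    # the grid value of lambda maximizing l(lambda)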
  • Slide 116
  • Unfortunately, ℓ is very volatile as λ changes (which creates some other analytic problems to overcome). Thus we consider another transformation to avoid this additional problem: zⱼ^(λ) = (xⱼ^λ - 1)/(λẋ^(λ-1)) for λ ≠ 0 and zⱼ^(0) = ẋ ln xⱼ, where ẋ is the geometric mean of the responses and is frequently calculated as the antilog of (1/n) Σⱼ log xⱼ.
  • Slide 117
  • The λ that results in minimum variance of this transformed variable also maximizes our previous criterion ℓ(λ), and the nth power of the divisor, (ẋ^(λ-1))ⁿ, is the appropriate Jacobian of the transformation (which converts the responses xᵢ into the zᵢ^(λ)'s). From this point forward, proceed by substituting the zⱼ^(λ)'s for the xⱼ's in the previous analysis.
  • Slide 118
  • Note that: -the value of λ generated by the Box-Cox transformation is only optimal in a mathematical sense - use something close that has some meaning -an approximate confidence interval for λ can be found -other means for estimating λ exist -if we are dealing with a response variable, transformations are often used to stabilize the variance -for a p-dimensional sample, transformations are considered independently for each of the p variables -while the Box-Cox methodology may help convert each marginal distribution to near normality, it does not guarantee that the resulting transformed set of p variables will have a multivariate normal distribution.