
Page 1: 1 Multivariate Statistical Data Analysis with Its Applications Hua-Kai Chiou Ph.D., Assistant Professor Department of Statistics, NDMC hkchiou@rs590.ndmc.edu.tw

Multivariate Statistical Data Analysis with Its Applications

Hua-Kai Chiou, Ph.D., Assistant Professor
Department of Statistics, NDMC
hkchiou@rs590.ndmc.edu.tw

September, 2005


Agenda

1. Introduction
2. Examining Your Data
3. Sampling & Estimation
4. Hypothesis & Testing
5. Multiple Regression Analysis
6. Logistic Regression
7. Multivariate Analysis of Variance
8. Principal Components Analysis
9. Factor Analysis
10. Cluster Analysis
11. Discriminant Analysis
12. Multidimensional Scaling
13. Canonical Correlation Analysis
14. Conjoint Analysis
15. Structural Equation Modeling


1. Introduction


Some Basic Concepts of MVA

• What is Multivariate Analysis (MVA)?
• Impact of the Computer Revolution
• Multivariate Analysis Defined
• Measurement Scales
• Types of Multivariate Techniques


• Dependence technique – the objective is prediction of the dependent variable(s) by the independent variable(s), e.g., regression analysis.

• Dependent variable – presumed effect of, or response to, a change in the independent variable(s).

• Dummy variable – nonmetrically measured variable transformed into a metric variable by assigning 1 or 0 to a subject, depending on whether it possesses a particular characteristic.

• Effect size – estimate of the degree to which the phenomenon being studied (e.g., correlation or difference in means) exists in the population.
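As a minimal sketch of the dummy-variable definition above, the Python snippet below recodes a nonmetric variable into 0/1 indicator columns. The function name `dummy_code`, the choice of a reference category, and the sample values are illustrative assumptions, not part of the original slides.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator list per category other than `reference`.

    A subject gets 1 in a column if it belongs to that category, else 0.
    The reference category is represented by 0 in every column.
    """
    categories = sorted(set(values) - {reference})
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical nonmetric variable with three categories coded 1, 2, 3.
buying_situation = [1, 2, 3, 2, 1]
dummies = dummy_code(buying_situation, reference=1)
```

A variable with k categories needs only k − 1 dummy columns, because membership in the reference category is implied when all columns are 0.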


• Indicator – single variable used in conjunction with one or more other variables to form a composite measure.

• Interdependence technique – classification of statistical techniques in which the variables are not divided into dependent and independent sets (e.g., factor analysis).

• Metric data – also called quantitative data, interval data, or ratio data. These measurements identify or describe subjects (or objects) not only on the possession of an attribute but also by the amount or degree to which the subject may be characterized by the attribute. For example, a person’s age and weight are metric data.


• Multicollinearity – extent to which a variable can be explained by the other variables in the analysis. As multicollinearity increases, it complicates the interpretation of the variate as it is more difficult to ascertain the effect of any single variable, owing to their interrelationships.

• Nonmetric data – also called qualitative data.

• Power – probability of correctly rejecting the null hypothesis when it is false, that is, correctly finding a hypothesized relationship when it exists. Determined as a function of (1) the statistical significance level (α) set by the researcher for a Type I error, (2) the sample size used in the analysis, and (3) the effect size being examined.
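To make the three determinants of power concrete, here is a small sketch for one specific case: a two-sided, one-sample z-test with a standardized effect size. The choice of test and the function name `z_test_power` are assumptions made for illustration; the slides do not prescribe a particular test.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    effect_size: true mean difference in standard-deviation units
    n:           sample size
    alpha:       significance level (Type I error probability)
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)  # two-sided cut-off
    shift = effect_size * sqrt(n)                 # mean of the test statistic
    upper_tail = 1 - std_normal.cdf(critical - shift)
    lower_tail = std_normal.cdf(-critical - shift)
    return upper_tail + lower_tail


# Power rises with a larger sample, a larger effect, or a looser alpha.
power_small_sample = z_test_power(0.5, n=30)
power_large_sample = z_test_power(0.5, n=60)
```

When the effect size is zero the function returns α itself, which matches the definition of a Type I error.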


• Practical significance – means of assessing multivariate analysis results based on their substantive findings rather than their statistical significance. Whereas statistical significance determines whether the result is attributable to chance, practical significance assesses whether the result is useful.

• Reliability – extent to which a variable or set of variables is consistent in what it is intended to measure. Reliability relates to the consistency of the measure(s).

• Validity – extent to which a measure or set of measures correctly represents the concept of study. Validity is concerned with how well the concept is defined by the measure(s).


• Type I error – probability of incorrectly rejecting the null hypothesis.

• Type II error – probability of incorrectly failing to reject the null hypothesis, that is, the chance of not finding a correlation or mean difference when it does exist.

• Variate – linear combination of variables formed in the multivariate technique by deriving empirical weights applied to a set of variables specified by the researcher.
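The variate defined above is simply a weighted sum of the variables. The sketch below shows the arithmetic; the weights and the observation are made-up numbers, whereas in practice the weights are estimated by the multivariate technique.

```python
def variate_value(weights, observation):
    """Return w1*X1 + w2*X2 + ... + wn*Xn for one observation."""
    if len(weights) != len(observation):
        raise ValueError("weights and observation must have the same length")
    return sum(w * x for w, x in zip(weights, observation))


# Hypothetical weights and one observation on three variables.
weights = [0.5, -0.25, 1.0]
observation = [4.0, 2.0, 3.0]
value = variate_value(weights, observation)  # 2.0 - 0.5 + 3.0
```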


• The Relationship between Multivariate Dependence Methods

Analysis of Variance (ANOVA)
$Y_1 = X_1 + X_2 + X_3 + \dots + X_n$
Y: (metric); X: (nonmetric)

Multivariate Analysis of Variance (MANOVA)
$Y_1 + Y_2 + Y_3 + \dots + Y_n = X_1 + X_2 + X_3 + \dots + X_n$
Y: (metric); X: (nonmetric)

Canonical Correlation
$Y_1 + Y_2 + Y_3 + \dots + Y_n = X_1 + X_2 + X_3 + \dots + X_n$
Y: (metric, nonmetric); X: (metric, nonmetric)

Discriminant Analysis
$Y_1 = X_1 + X_2 + X_3 + \dots + X_n$
Y: (nonmetric); X: (metric)

Multiple Regression Analysis
$Y_1 = X_1 + X_2 + X_3 + \dots + X_n$
Y: (metric); X: (metric, nonmetric)

Conjoint Analysis
$Y_1 = X_1 + X_2 + X_3 + \dots + X_n$
Y: (metric, nonmetric); X: (nonmetric)

Structural Equation Modeling
$Y_1 = X_{11} + X_{12} + X_{13} + \dots + X_{1n}$
$Y_2 = X_{21} + X_{22} + X_{23} + \dots + X_{2n}$
$\dots$
$Y_m = X_{m1} + X_{m2} + X_{m3} + \dots + X_{mn}$
Y: (metric); X: (metric, nonmetric)


What type of relationship is being examined?

• Dependence: How many variables are being predicted?
  – Multiple relationships of dependent and independent variables: Structural equation modeling
  – Several dependent variables in a single relationship: What is the measurement scale of the dependent variable?
      Metric: What is the measurement scale of the predictor variable?
          Metric: Canonical correlation analysis
          Nonmetric: Multivariate analysis of variance (MANOVA)
      Nonmetric: Canonical correlation analysis with dummy variables
  – One dependent variable in a single relationship: What is the measurement scale of the dependent variable?
      Metric: Multiple regression; Conjoint analysis
      Nonmetric: Multiple discriminant analysis; Linear probability models

• Interdependence: Is the structure of relationships among:
  – Variables: Factor analysis
  – Cases/Respondents: Cluster analysis
  – Objects: How are the attributes measured?
      Metric: Multidimensional scaling
      Nonmetric: Multidimensional scaling; Correspondence analysis


A Structured Approach to Multivariate Model Building

Stage 1: Define the research problem, objectives, and multivariate technique to be used
Stage 2: Develop the analysis plan
Stage 3: Evaluate the assumptions underlying the multivariate technique
Stage 4: Estimate the multivariate model and assess overall model fit
Stage 5: Interpret the variate(s)
Stage 6: Validate the multivariate model


2. Examining Your Data


HATCO Case

• Primary Database
– This example investigates a business-to-business case from existing customers of HATCO.
– The primary database consists of 100 observations on 14 separate variables.

• Three types of information were collected:
– The perceptions of HATCO, 7 attributes (X1 – X7);
– The actual purchase outcomes, 2 specific measures (X9, X10);
– The characteristics of the purchasing companies, 5 characteristics (X8, X11 – X14).


Table 2.1 Description of Database Variables (Hair et al., 1998)

Variable  Description                Variable Type  Rating Scale

Perceptions of HATCO
X1        Delivery Speed             Metric         0 – 10
X2        Price Level                Metric         0 – 10
X3        Price Flexibility          Metric         0 – 10
X4        Manufacturer’s Image       Metric         0 – 10
X5        Overall Service            Metric         0 – 10
X6        Salesforce Image           Metric         0 – 10
X7        Product Quality            Metric         0 – 10

Purchase Outcomes
X9        Usage Level                Metric         100-point percentage
X10       Satisfaction Level         Metric         0 – 10

Purchaser Characteristics
X8        Size of Firm               Nonmetric      {0,1}
X11       Specification Buying       Nonmetric      {0,1}
X12       Structure of Procurement   Nonmetric      {0,1}
X13       Type of Industry           Nonmetric      {0,1}
X14       Type of Buying Situation   Nonmetric      {1,2,3}


Fig 2.1 Scatter Plot Matrix of Metric Variables (Hair et al., 1998)


Fig 2.2 Examples of Multivariate Graphical Displays (Hair et al., 1998)


Missing Data

• A missing data process is any systematic event external to the respondent (e.g. data entry errors or data collection problems) or action on the part of the respondent (such as refusal to answer) that leads to missing values.

• The impact of missing data is detrimental not only through its potential “hidden” biases of the results but also in its practical impact on the sample size available for analysis.


• Understanding the missing data
– Ignorable missing data
– Remediable missing data

• Examining the pattern of missing data


Table 2.2 Summary Statistics of Pretest Data (Hair et al., 1998)


Table 2.3 Assessing the Randomness of Missing Data through Group Comparisons of Observations with Missing versus Valid Data (Hair et al., 1998)


Table 2.4 Assessing the Randomness of Missing Data through Dichotomized Variable Correlations and the Multivariate Test for Missing Completely at Random (MCAR) (Hair et al., 1998)


Table 2.5 Comparison of Correlations Obtained with All-Available (Pairwise), Complete Case (Listwise), and Mean Substitution Approaches (Hair et al., 1998)
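The three approaches named in the Table 2.5 caption differ only in which cases enter the correlation. The sketch below illustrates this on a tiny made-up data set (not the HATCO pretest data); `None` marks a missing value, and all function names are illustrative assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pairwise_corr(rows, i, j):
    """All-available: use every case observed on both variables i and j."""
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    return pearson(*zip(*pairs))


def listwise_corr(rows, i, j):
    """Complete case: use only cases with no missing value on any variable."""
    complete = [r for r in rows if all(v is not None for v in r)]
    return pearson([r[i] for r in complete], [r[j] for r in complete])


def mean_substitution_corr(rows, i, j):
    """Replace each missing value with the mean of the observed values."""
    def fill(col):
        observed = [r[col] for r in rows if r[col] is not None]
        mean = sum(observed) / len(observed)
        return [mean if r[col] is None else r[col] for r in rows]
    return pearson(fill(i), fill(j))


# Hypothetical data: five cases, three variables, two missing values.
rows = [
    (1.0, 2.0, 5.0),
    (2.0, 3.0, None),
    (3.0, None, 3.0),
    (4.0, 9.0, 2.0),
    (6.0, 10.0, 1.0),
]
r_pairwise = pairwise_corr(rows, 0, 1)          # uses 4 cases
r_listwise = listwise_corr(rows, 0, 1)          # uses 3 cases
r_mean_sub = mean_substitution_corr(rows, 0, 1)  # uses all 5 cases
```

Listwise deletion discards the most cases, which is the sample-size cost of missing data noted earlier in this section.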


Table 2.6 Results of the Regression and EM Imputation Methods (Hair et al., 1998)


Outliers

• Four classes of outliers:
– Procedural error
– Extraordinary event that can be explained
– Extraordinary observation that has no explanation
– Observations that fall within the ordinary range of values on each of the variables but are unique in their combination of values across the variables.

• Detecting outliers
– Univariate detection
– Bivariate detection
– Multivariate detection


Outlier Detection

• Univariate detection threshold:
– For small samples, within ±2.5 standardized variable values
– For larger samples, within ±3 or ±4 standardized variable values

• Bivariate detection threshold:
– Varying between 50 and 90 percent of the ellipse representing the normal distribution.

• Multivariate detection:
– The Mahalanobis distance $D^2$
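The univariate and multivariate checks above can be sketched in a few lines. This is an illustrative sketch, not the procedure used to produce the tables in Hair et al.: the sample values are made up, the Mahalanobis function handles only two variables (so the 2×2 covariance matrix can be inverted by hand), and sample variances use the n − 1 divisor.

```python
from math import sqrt


def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def univariate_outliers(values, threshold=2.5):
    """Indices of cases whose |z| exceeds the threshold."""
    return [i for i, z in enumerate(standardize(values)) if abs(z) > threshold]


def mahalanobis_d2(xs, ys):
    """Mahalanobis D^2 of each (x, y) case from the sample centroid."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the covariance matrix
    d2 = []
    for x, y in zip(xs, ys):
        dx, dy = x - mean_x, y - mean_y
        # (dx, dy) times the inverse covariance matrix times (dx, dy)'
        d2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2


# Hypothetical ratings; the last case is far from the rest.
ratings = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 9.0]
flagged = univariate_outliers(ratings, threshold=2.5)

distances = mahalanobis_d2([1, 2, 3, 4, 5], [2, 1, 4, 3, 6])
```

Unlike the univariate z-score, $D^2$ accounts for the correlation between variables, so it can flag a case that is ordinary on each variable but unusual in combination.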


Table 2.7 Identification of Univariate and Bivariate Outliers (Hair et al., 1998)


Fig 2.3 Graphical Identification of Bivariate Outliers (Hair et al., 1998)


Table 2.8 Identification of Multivariate Outliers (Hair et al., 1998)


Testing the Assumptions of Multivariate Analysis

• Graphical analyses of normality
– Kurtosis refers to the peakedness or flatness of the distribution compared with the normal distribution.
– Skewness appears in the normal probability plot as an arc, either above or below the diagonal.

• Statistical tests of normality

$z_{\text{skewness}} = \dfrac{\text{skewness}}{\sqrt{6/N}}; \qquad z_{\text{kurtosis}} = \dfrac{\text{kurtosis}}{\sqrt{24/N}}$
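The two z statistics can be computed directly from the data. The sketch below assumes simple moment-based estimates of skewness and excess kurtosis (statistical packages often apply small-sample corrections, so their values may differ slightly); the function name and the example data are illustrative.

```python
from math import sqrt


def normality_z_scores(values):
    """Return (z_skewness, z_kurtosis) for a sample.

    skewness = m3 / m2**1.5 and kurtosis = m4 / m2**2 - 3 (excess kurtosis),
    where mk is the k-th central moment; each is divided by its approximate
    standard error, sqrt(6/N) and sqrt(24/N) respectively.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3
    return skewness / sqrt(6 / n), kurtosis / sqrt(24 / n)


# A symmetric sample has zero skewness; a long right tail gives a positive z.
z_skew_sym, z_kurt_sym = normality_z_scores([1, 2, 3, 4, 5])
z_skew_right, _ = normality_z_scores([1, 1, 1, 2, 10])
```

Each z value is compared with a critical value of the standard normal distribution, for example ±1.96 at the .05 level.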


Fig 2.4 Normal Probability Plots and Corresponding Univariate Distribution (Hair et al., 1998)


Homoscedasticity vs. Heteroscedasticity

• Homoscedasticity is an assumption related primarily to dependence relationships between variables.

• Although the dependent variables must be metric, this concept of an equal spread of variance across independent variables can be applied whether the independent variables are metric or nonmetric.
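For a nonmetric independent variable, equal spread means the metric dependent variable has similar variance in every group. One common check is Levene's test; the sketch below computes the mean-centered version of its statistic. The function name and the example groups are illustrative assumptions, and the p-value step (comparison with an F distribution on k − 1 and N − k degrees of freedom) is left out.

```python
def levene_statistic(groups):
    """Levene's W for equality of variances across groups (mean-centered)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of each value from its own group mean.
    deviations = []
    for g in groups:
        mean = sum(g) / len(g)
        deviations.append([abs(v - mean) for v in g])

    group_means = [sum(d) / len(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total

    between = sum(
        len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, group_means)
    )
    within = sum(
        (z - m) ** 2 for d, m in zip(deviations, group_means) for z in d
    )
    return (n_total - k) / (k - 1) * between / within


# Same spread, different level: W is 0. Different spread: W is positive.
w_equal = levene_statistic([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]])
w_unequal = levene_statistic([[1, 2, 3, 4, 5], [0, 5, 10, 15, 20]])
```

A large W suggests heteroscedasticity, that is, unequal dispersion of the dependent variable across the groups.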


Fig 2.5 Scatter Plots of Homoscedastic and Heteroscedastic Relationships (Hair et al., 1998)


Fig 2.6 Normal Probability Plots of Metric Variables (Hair et al., 1998)


Table 2.9 Distributional Characteristics, Testing for Normality, and Possible Remedies (Hair et al., 1998)


Fig 2.7 Transformation of X2 (Price Level) to Achieve Normality (Hair et al., 1998)


Table 2.10 Testing for Homoscedasticity (Hair et al., 1998)


3. Sampling Distribution


Understanding sampling distributions

• A histogram is constructed from a frequency table. The intervals are shown on the X-axis and the number of scores in each interval is represented by the height of a rectangle located above the interval.
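A minimal sketch of the frequency table behind a histogram follows. The interval convention (each interval includes its lower bound and excludes its upper bound), the function name, and the scores are assumptions made for illustration.

```python
def frequency_table(scores, lower, width, n_intervals):
    """Count the scores falling in each of n_intervals equal-width intervals.

    Interval i covers [lower + i*width, lower + (i+1)*width).
    Scores outside the covered range are ignored.
    """
    counts = [0] * n_intervals
    for score in scores:
        index = int((score - lower) // width)
        if 0 <= index < n_intervals:
            counts[index] += 1
    return counts


# Hypothetical scores grouped into the intervals [0, 5) and [5, 10).
scores = [1, 2, 2, 3, 5, 7, 8, 9]
counts = frequency_table(scores, lower=0, width=5, n_intervals=2)

# Each count is the height of the rectangle drawn above its interval.
for i, count in enumerate(counts):
    print(f"{i * 5:>2}-{i * 5 + 5:<2} " + "#" * count)
```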


• A bar graph is much like a histogram, differing in that the columns are separated from each other by a small distance. Bar graphs are commonly used for qualitative variables.


What is a normal distribution?

• Normal distributions are a family of distributions that have the same general shape. They are symmetric with scores more concentrated in the middle than in the tails. Normal distributions are sometimes described as bell shaped. The height of a normal distribution can be specified mathematically in terms of two parameters: the mean (μ) and the standard deviation (σ).
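The height mentioned above is given by the normal density, $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. The short sketch below evaluates it; the function name `normal_height` is an illustrative choice.

```python
from math import exp, pi, sqrt


def normal_height(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


# The curve peaks at the mean and is symmetric around it.
peak = normal_height(0.0, mu=0.0, sigma=1.0)
left = normal_height(-1.5, mu=0.0, sigma=1.0)
right = normal_height(1.5, mu=0.0, sigma=1.0)
```

Changing the mean shifts the curve along the X-axis, while a larger standard deviation makes it lower and wider.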
