Download - Mba2216 week 11 data analysis part 02

Data Analysis Part 2: Variances, Regression, Correlation


MBA2216 BUSINESS RESEARCH PROJECT

by Stephen Ong

Visiting Fellow, Birmingham City University, UK

Visiting Professor, Shenzhen University

17. Understand the concept of analysis of variance (ANOVA)

18. Interpret an ANOVA table

19. Apply and interpret simple bivariate correlations

22. Interpret a correlation matrix

23. Understand simple (bivariate) regression

24. Understand the least-squares estimation technique

25. Interpret regression output including the tests of hypotheses tied to specific parameter coefficients

27. Understand what multivariate statistical analysis involves and know the two types of multivariate analysis

28. Interpret results from multiple regression analysis

29. Interpret results from multivariate analysis of variance (MANOVA)

LEARNING OUTCOMES

30. Interpret basic exploratory factor analysis results

31. Know what multiple discriminant analysis can be used to do

32. Understand how cluster analysis can identify market segments


Remember this: garbage in, garbage out! If data is collected improperly or coded incorrectly, then the research results are “garbage”.


EXHIBIT 19.1 Overview of the Stages of Data Analysis

Relationship Among the t Test, Analysis of Variance, Analysis of Covariance, and Regression

Metric Dependent Variable

One Independent Variable
  Binary: t Test

One or More Independent Variables
  Categorical (Factorial): Analysis of Variance
    One Factor: One-Way Analysis of Variance
    More than One Factor: N-Way Analysis of Variance
  Categorical and Interval: Analysis of Covariance
  Interval: Regression

The Z-Test for Comparing Two Proportions

Z-Test for Differences of Proportions: Tests the hypothesis that proportions are significantly different for two independent samples or groups.

Requires a sample size greater than thirty.

The hypothesis is: H0: π1 = π2

It may be restated as: H0: π1 − π2 = 0


Z-Test statistic for differences in large random samples:

$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{S_{p_1 - p_2}}$$

p1 = sample proportion of successes in Group 1

p2 = sample proportion of successes in Group 2

(π1 − π2) = hypothesized population proportion 1 minus hypothesized population proportion 2

Sp1−p2 = pooled estimate of the standard error of the difference of proportions


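To calculate the standard error of the differences in proportions:

$$S_{p_1 - p_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where p̄ is the pooled estimate of the proportion of successes across both samples, q̄ = 1 − p̄, and n1 and n2 are the two sample sizes.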
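A minimal sketch of the two-proportion Z-test described above, assuming the pooled proportion is computed as total successes over total sample size. The function name and the sample counts are hypothetical and chosen only for illustration.

```python
import math

def two_proportion_z(successes1, n1, successes2, n2):
    """Z statistic for H0: pi1 - pi2 = 0, using the pooled standard error."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    # Pooled proportion of successes across both samples (assumed definition)
    p_bar = (successes1 + successes2) / (n1 + n2)
    q_bar = 1 - p_bar
    # Standard error of the difference between the two proportions
    se = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    # Under H0 the hypothesized difference (pi1 - pi2) is zero
    z = ((p1 - p2) - 0) / se
    return z, se

# Hypothetical data: 35 of 100 successes in Group 1, 40 of 100 in Group 2
z, se = two_proportion_z(35, 100, 40, 100)
print(f"Z = {z:.3f}, standard error = {se:.4f}")
```

The resulting Z would then be compared with the critical value for the chosen significance level (for example, ±1.96 for a two-tailed test at 0.05).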

One-Way Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA): An analysis involving the investigation of the effects of one treatment variable on an interval-scaled dependent variable.

A hypothesis-testing technique to determine whether statistically significant differences in means occur between two or more groups.

A method of comparing variances to make inferences about the means.

The substantive hypothesis tested is: At least one group mean is not equal to another group mean.

Partitioning Variance in ANOVA

Total Variability

Grand Mean: The mean of a variable over all observations.

SST = Total of (Observed Value − Grand Mean)²

Between-Groups Variance: The sum of squared differences between each group mean and the grand mean, weighted by group size and summed over all groups for a given set of observations.

SSB = Total of n_group (Group Mean − Grand Mean)²

Within-Group Error or Variance: The sum of squared differences between the observed values and their group mean for a given set of observations. Also known as total error variance.

SSE = Total of (Observed Value − Group Mean)²

The F-Test

F-Test: Used to determine whether there is more variability in the scores of one sample than in the scores of another sample.

Variance components (SSE, SSB, SST) are used to compute F-ratios:

$$F = \frac{\text{Variance between groups}}{\text{Variance within groups}}$$
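The partitioning of variance and the F-ratio can be sketched in a few lines. This is an illustrative example with made-up data; it assumes the usual degrees of freedom for one-way ANOVA (number of groups minus 1 between groups, number of observations minus number of groups within groups), which the slides do not state explicitly.

```python
def one_way_anova(groups):
    """Return SST, SSB, SSE and the F-ratio for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    n = len(all_values)          # total number of observations
    c = len(groups)              # number of groups
    grand_mean = sum(all_values) / n

    # Total variability: every observation against the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    # Between-groups: each group mean against the grand mean, weighted by group size
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: every observation against its own group mean
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    f_ratio = (ssb / (c - 1)) / (sse / (n - c))
    return sst, ssb, sse, f_ratio

# Hypothetical sales figures under three promotion levels
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
sst, ssb, sse, f_ratio = one_way_anova(groups)
print(f"SST={sst:.2f}  SSB={ssb:.2f}  SSE={sse:.2f}  F={f_ratio:.2f}")
```

Note that SST equals SSB plus SSE, which is what "partitioning" the variance means.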

EXHIBIT 22.6 Interpreting ANOVA


SPSS Windows

One-way ANOVA can be efficiently performed using the program COMPARE MEANS and then One-way ANOVA. To select this procedure using SPSS for Windows, click:

Analyze>Compare Means>One-Way ANOVA …

N-way analysis of variance and analysis of covariance can be performed using GENERAL LINEAR MODEL. To select this procedure using SPSS for Windows, click:

Analyze>General Linear Model>Univariate …

SPSS Windows: One-Way ANOVA

1. Select ANALYZE from the SPSS menu bar.

2. Click COMPARE MEANS and then ONE-WAY ANOVA.

3. Move “Sales [sales]” into the DEPENDENT LIST box.

4. Move “In-Store Promotion [promotion]” to the FACTOR box.

5. Click OPTIONS.

6. Click Descriptive.

7. Click CONTINUE.

8. Click OK.

SPSS Windows: Analysis of Covariance


1. Select ANALYZE from the SPSS menu bar.

2. Click GENERAL LINEAR MODEL and then UNIVARIATE.

3. Move “Sales [sales]” into the DEPENDENT VARIABLE box.

4. Move “In-Store Promotion [promotion]” to the FIXED FACTOR(S) box. Then move “Coupon [coupon]” to the FIXED FACTOR(S) box as well.

5. Move “Clientel [clientel]” to the COVARIATE(S) box.

6. Click OK.

The Basics

Measures of Association: Refers to a number of bivariate statistical techniques used to measure the strength of a relationship between two variables.

The chi-square (χ²) test provides information about whether two or more less-than-interval variables are interrelated.

Correlation analysis is most appropriate for interval or ratio variables.

Regression can accommodate either less-than interval or interval independent variables, but the dependent variable must be continuous.

EXHIBIT 23.1 Bivariate Analysis: Common Procedures for Testing Association

Simple Correlation Coefficient

Correlation coefficient (r): A statistical measure of the covariation, or association, between two at-least interval variables.

Covariance: Extent to which two variables are associated systematically with each other.

The correlation coefficient ranges from +1 to −1:
  Perfect positive linear relationship = +1
  Perfect negative (inverse) linear relationship = −1
  No correlation = 0

Correlation coefficient for two variables (X, Y):

$$r_{xy} = r_{yx} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$
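As a rough illustration, the correlation coefficient for two variables can be computed directly from deviations about the means. The function name and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squared deviations
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.4f}")

# A perfectly inverse linear relationship gives r = -1
r_inverse = pearson_r(x, [-v for v in x])
print(f"r (inverse) = {r_inverse:.1f}")
```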

EXHIBIT 23.2 Scatter Diagram to Illustrate Correlation Patterns

Correlation, Covariance, and Causation

When two variables covary (i.e. vary systematically), they display concomitant variation.

This systematic covariation does not in and of itself establish causality.

For example, a rooster’s crow covaries with the rising of the sun, but the rooster does not cause the sun to rise.

Coefficient of Determination

Coefficient of Determination (R²): A measure obtained by squaring the correlation coefficient; the proportion of the total variance of a variable accounted for by knowing the value of another variable.

Measures that part of the total variance of Y that is accounted for by knowing the value of X.

$$R^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$

Correlation Matrix

Correlation matrix: The standard form for reporting correlation coefficients for more than two variables.

Statistical Significance: The procedure for determining statistical significance is the t-test of the significance of a correlation coefficient.
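The sketch below builds a small correlation matrix and computes a t statistic for one coefficient. It assumes the commonly used form t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom, which the slides do not spell out; the variable names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def correlation_matrix(data):
    """Correlate every variable with every other; returns a nested dict."""
    names = list(data)
    return {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

def t_for_r(r, n):
    """t statistic for H0: population correlation = 0 (assumed formula, n - 2 d.f.)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical salesperson data
data = {
    "sales": [10, 12, 15, 19, 24],
    "calls": [3, 4, 4, 6, 7],
    "absences": [5, 3, 4, 2, 1],
}
matrix = correlation_matrix(data)
for row in data:
    print(row.ljust(9), "  ".join(f"{matrix[row][col]:6.3f}" for col in data))

t_sales_calls = t_for_r(matrix["sales"]["calls"], n=5)
print(f"t for sales vs. calls = {t_sales_calls:.2f}")
```

The matrix is symmetric with ones on the diagonal, which is why published correlation matrices usually show only the lower triangle.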

EXHIBIT 23.4 Pearson Product-Moment Correlation Matrix for Salesperson Example

Regression Analysis

Simple (Bivariate) Linear Regression: A measure of linear association that investigates straight-line relationships between a continuous dependent variable and an independent variable that is usually continuous, but can be a categorical dummy variable.

The Regression Equation (Y = α + βX)

Y = the continuous dependent variable

X = the independent variable

α = the Y intercept (where the regression line intercepts the Y axis)

β = the slope coefficient (rise over run)

Regression Line and Slope (figure: Y plotted against X, with the fitted line Ŷ = â + β̂X)

The Regression Equation

Parameter Estimate Choices

β is indicative of the strength and direction of the relationship between the independent and dependent variable.

α (Y intercept) is a fixed point that is considered a constant (how much Y can exist without X).

Standardized Regression Coefficient (β): Estimated coefficient of the strength of relationship between the independent and dependent variables.

Expressed on a standardized scale where higher absolute values indicate stronger relationships (range is from −1 to 1).

The Regression Equation (cont’d)

Parameter Estimate Choices

Raw regression estimates (b1)

Raw regression weights retain the scale metric, which is both their advantage and their key disadvantage.

If the purpose of the regression analysis is forecasting, then raw parameter estimates must be used.

In other words, raw estimates are appropriate when the researcher is interested only in prediction.

Standardized regression estimates (β)

Standardized regression estimates have the advantage of a constant scale.

Standardized regression estimates should be used when the researcher is testing explanatory hypotheses.
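To make the contrast concrete, the sketch below fits the same hypothetical data twice, once with X in dollars and once in thousands of dollars. It assumes the standard conversion for simple regression, standardized β = b × (s_X / s_Y); the function name and figures are illustrative only.

```python
import math

def regression_weights(x, y):
    """Return (raw slope b, standardized slope beta) for a simple regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    b_raw = s_xy / s_xx
    # Rescale the raw slope by the ratio of standard deviations (assumed conversion)
    beta_std = b_raw * math.sqrt(s_xx / s_yy)
    return b_raw, beta_std

# Hypothetical data: advertising spend and unit sales
spend_dollars = [1000, 2000, 3000, 4000, 5000]
spend_thousands = [v / 1000 for v in spend_dollars]
sales = [2, 4, 5, 4, 5]

b_dollars, beta_dollars = regression_weights(spend_dollars, sales)
b_thousands, beta_thousands = regression_weights(spend_thousands, sales)

print(f"raw b (dollars)      = {b_dollars:.4f}")
print(f"raw b (thousands)    = {b_thousands:.4f}")
print(f"standardized beta    = {beta_dollars:.4f} and {beta_thousands:.4f}")
```

The raw weight changes by a factor of 1,000 when the unit changes, while the standardized weight stays the same, which is why raw weights suit forecasting in original units and standardized weights suit comparing strength of relationships.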

EXHIBIT 23.5 The Advantage of Standardized Regression Weights

EXHIBIT 23.6 Relationship of Sales Potential to Building Permits Issued

EXHIBIT 23.7 The Best Fit Line or Knocking Out the Pins

Ordinary Least-Squares (OLS) Method of Regression Analysis

OLS: Guarantees that the resulting straight line will produce the least possible total error in using X to predict Y.

Generates a straight line that minimizes the sum of squared deviations of the actual values from this predicted regression line.

No straight line can completely represent every dot in the scatter diagram.

There will be a discrepancy between most of the actual scores (each dot) and the predicted score (Ŷ).

Uses the criterion of attempting to make the least amount of total error in prediction of Y from X.

Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)


$$Y_i = \hat{\alpha} + \hat{\beta} X_i + e_i$$

The equation means that the value of Y for any value of X (X_i) is determined as the estimated intercept coefficient, plus the estimated slope coefficient multiplied by X_i, plus some error.
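A minimal sketch of the least-squares estimates for a single predictor, using the textbook closed-form solution (slope = sum of cross-products over sum of squared X deviations). The data are hypothetical, and the comparison line at the end is only there to show that moving away from the OLS line increases the squared error.

```python
def ols_fit(x, y):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_errors(x, y, intercept, slope):
    """Total squared discrepancy between actual and predicted Y values."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = ols_fit(x, y)
sse_ols = sum_squared_errors(x, y, intercept, slope)
# Any other straight line, such as this arbitrary one, fits worse
sse_other = sum_squared_errors(x, y, intercept + 0.5, slope - 0.1)

print(f"Y-hat = {intercept:.2f} + {slope:.2f} X")
print(f"SSE for OLS line = {sse_ols:.2f}, SSE for another line = {sse_other:.2f}")
```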

© 2010 South-Western/Cengage Learning. All rights reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.


Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)

Statistical Significance of Regression Model

F-test (regression): Determines whether more variability is explained by the regression or unexplained by the regression.


Statistical Significance Of Regression Model ANOVA Table:


R²

The proportion of variance in Y that is explained by X (or vice versa).

A measure obtained by squaring the correlation coefficient; that proportion of the total variance of a variable that is accounted for by knowing the value of another variable.

$$R^2 = \frac{3{,}398.49}{3{,}882.40} = 0.875$$

EXHIBIT 23.8 Simple Regression Results for Building Permit Example

EXHIBIT 23.9 OLS Regression Line

Simple Regression and Hypothesis Testing

The explanatory power of regression lies in hypothesis testing. Regression is often used to test relational hypotheses.

The outcome of the hypothesis test involves two conditions that must both be satisfied:

1. The regression weight must be in the hypothesized direction. Positive relationships require a positive coefficient and negative relationships require a negative coefficient.

2. The t-test associated with the regression weight must be significant.

What is Multivariate Data Analysis?

Research that involves three or more variables, or that is concerned with underlying dimensions among multiple variables, will involve multivariate statistical analysis.

Methods analyze multiple variables or even multiple sets of variables simultaneously.

Business problems that involve multivariate data analysis include:
  most employee motivation research
  customer psychographic profiles
  research that seeks to identify viable market segments

The “Variate” in Multivariate

Variate: A mathematical way in which a set of variables can be represented with one equation.

A linear combination of variables, each contributing to the overall meaning of the variate based upon an empirically derived weight.

A function of the measured variables involved in an analysis: Vk = f (X1, X2, . . . , Xm)

EXHIBIT 24.1 Which Multivariate Approach Is Appropriate?


Classifying Multivariate Techniques

Dependence Techniques: Explain or predict one or more dependent variables.

Needed when hypotheses involve distinction between independent and dependent variables.

Types:
  Multiple regression analysis
  Multiple discriminant analysis
  Multivariate analysis of variance
  Structural equations modeling

Classifying Multivariate Techniques (cont’d)

Interdependence Techniques: Give meaning to a set of variables or seek to group things together.

Used when researchers examine questions that do not distinguish between independent and dependent variables.

Types:
  Factor analysis
  Cluster analysis
  Multidimensional scaling

Classifying Multivariate Techniques (cont’d)

Influence of Measurement Scales

The nature of the measurement scales will determine which multivariate technique is appropriate for the data.

Selection of a multivariate technique requires consideration of the types of measures used for both independent and dependent sets of variables.

Nominal and ordinal scales are nonmetric.

Interval and ratio scales are metric.


EXHIBIT 24.2 Which Multivariate Dependence Technique Should I Use?


EXHIBIT 24.3 Which Multivariate Interdependence Technique Should I Use?

Analysis of Dependence

General Linear Model (GLM): A way of explaining and predicting a dependent variable based on fluctuations (variation) from its mean due to changes in independent variables.

$$\hat{Y}_i = \mu + \Delta X + \Delta F + \Delta XF$$

μ = a constant (overall mean of the dependent variable)

ΔX and ΔF = changes due to main effect independent variables (experimental variables) and blocking independent variables (covariates or grouping variables)

ΔXF = the change due to the combination (interaction effect) of those variables

Interpreting Multiple Regression

Multiple Regression Analysis: An analysis of association in which the effects of two or more independent variables on a single, interval-scaled dependent variable are investigated simultaneously.

$$Y_i = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_nX_n + e_i$$

Dummy variable: The way a dichotomous (two-group) independent variable is represented in regression analysis by assigning a 0 to one group and a 1 to the other.

Multiple Regression Analysis

A Simple Example

Assume that a toy manufacturer wishes to explain store sales (dependent variable) using a sample of stores from Canada and Europe.

Several hypotheses are offered:

H1: Competitor’s sales are related negatively to sales.

H2: Sales are higher in communities with a sales office than when no sales office is present.

H3: Grammar school enrollment in a community is related positively to sales.

Multiple Regression Analysis (cont’d)

Statistical Results of the Multiple Regression

Regression Equation:

$$Y = 102.18 + 0.387X_1 + 115.2X_2 + 6.73X_3$$

Coefficient of multiple determination (R²) = 0.845

F-value = 14.6, p < 0.05


Regression Coefficients in Multiple Regression

Partial correlation: The correlation between two variables after taking into account the fact that they are correlated with other variables too.

R² in Multiple Regression

The coefficient of multiple determination in multiple regression indicates the percentage of variation in Y explained by all independent variables.


Multiple Regression Analysis (cont’d)

Statistical Significance in Multiple Regression

F-test: Tests statistical significance by comparing the variation explained by the regression equation to the residual error variation.

Allows for testing of the relative magnitudes of the sum of squares due to the regression (SSR) and the error sum of squares (SSE).

$$F = \frac{SSR/k}{SSE/(n-k-1)} = \frac{MSR}{MSE}$$

Degrees of Freedom (d.f.)

k = number of independent variables

n = number of observations or respondents

Calculating Degrees of Freedom (d.f.)

d.f. for the numerator = k

d.f. for the denominator = n − k − 1
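The F-ratio for a regression model can be sketched directly from the sums of squares and the degrees of freedom given above. The function name and the input figures are hypothetical and are not taken from the toy manufacturer example.

```python
def regression_f(ssr, sse, k, n):
    """F-ratio for a regression with k independent variables and n observations."""
    df_numerator = k
    df_denominator = n - k - 1
    msr = ssr / df_numerator      # mean square due to regression
    mse = sse / df_denominator    # mean square error
    return msr / mse, df_numerator, df_denominator

# Hypothetical sums of squares: 3 predictors, 24 observations
f_value, df1, df2 = regression_f(ssr=600.0, sse=200.0, k=3, n=24)
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

The computed F would then be compared with the critical F value for (k, n − k − 1) degrees of freedom at the chosen significance level.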

EXHIBIT 24.4 Interpreting Multiple Regression Results

ANOVA (n-way) and MANOVA

Multivariate Analysis of Variance (MANOVA): A multivariate technique that predicts multiple continuous dependent variables with multiple categorical independent variables.

ANOVA (n-way) and MANOVA (cont’d)

Interpreting N-way (Univariate) ANOVA

1. Examine overall model F-test result. If significant, proceed.

2. Examine individual F-tests for individual variables.

3. For each significant categorical independent variable, interpret the effect by examining the group means.

4. For each significant, continuous covariate, interpret the parameter estimate (b).

5. For each significant interaction, interpret the means for each combination.

Discriminant Analysis

A statistical technique for predicting the probability that an object will belong in one of two or more mutually exclusive categories (dependent variable), based on several independent variables.

To calculate discriminant scores, the linear function used is:

$$Z_i = b_1X_{1i} + b_2X_{2i} + \dots + b_nX_{ni}$$

Discriminant Analysis Example

$$Z = b_1X_1 + b_2X_2 + b_3X_3 = 0.069X_1 + 0.013X_2 + 0.0007X_3$$
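A discriminant score is simply a weighted sum of an object's values on the independent variables. The sketch below shows that calculation only; the weights and the two objects are hypothetical, and estimating the weights from data is outside its scope.

```python
def discriminant_score(weights, values):
    """Linear discriminant score Z = b1*X1 + b2*X2 + ... + bn*Xn."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(b * x for b, x in zip(weights, values))

# Hypothetical discriminant weights for three independent variables
weights = [0.5, 0.2, 0.1]

# Two hypothetical objects to be scored
object_a = [10, 5, 20]
object_b = [2, 1, 4]

score_a = discriminant_score(weights, object_a)
score_b = discriminant_score(weights, object_b)
print(f"Z for object A = {score_a:.2f}")
print(f"Z for object B = {score_b:.2f}")
```

In practice the scores are compared against a cutoff (or converted to group membership probabilities) to assign each object to a category.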

EXHIBIT 24.5 Multivariate Dependence Techniques Summary

Factor Analysis

Statistically identifies a reduced number of factors from a larger number of measured variables.

Types:

Exploratory factor analysis (EFA): performed when the researcher is uncertain about how many factors may exist among a set of variables.

Confirmatory factor analysis (CFA): performed when the researcher has strong theoretical expectations about the factor structure before performing the analysis.

EXHIBIT 24.6 A Simple Illustration of Factor Analysis

Factor Analysis (cont’d)

How Many Factors

Eigenvalues are a measure of how much variance is explained by each factor.

Common rule: Base the number of factors on the number of eigenvalues greater than 1.0.

Factor Loading: Indicates how strongly a measured variable is correlated with a factor.


Factor Rotation: A mathematical way of simplifying factor analysis results to better identify which variables “load on” which factors.

Most common procedure is varimax rotation.

Data Reduction Technique: Approaches that summarize the information from many variables into a reduced set of variates formed as linear combinations of measured variables.

The rule of parsimony: an explanation involving fewer components is better than one involving many more.


Creating Composite Scales with Factor Results

When a clear pattern of loadings exists, the researcher may take a simpler approach by summing the variables with high loadings and creating a summated scale.

Very low loadings suggest a variable does not contribute much to the factor.

The reliability of each summated scale is tested by computing a coefficient alpha estimate.
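As an illustration of the reliability check, coefficient alpha can be computed from the item variances and the variance of the summated scale. The sketch assumes the standard Cronbach's alpha formula, α = k/(k − 1) × (1 − Σ item variances / variance of totals), which the slides name but do not write out; the item scores are hypothetical.

```python
def sample_variance(values):
    """Sample variance with n - 1 in the denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def coefficient_alpha(items):
    """Coefficient alpha for a summated scale.

    items: one list of scores per item, all in the same respondent order.
    """
    k = len(items)
    # Summated scale: add up each respondent's scores across items
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(sample_variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / sample_variance(totals))

# Hypothetical ratings from five respondents on three items that load on one factor
items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 5],
    [4, 2, 5, 3, 4],
]
alpha = coefficient_alpha(items)
print(f"coefficient alpha = {alpha:.3f}")
```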


Communality: A measure of the percentage of a variable’s variation that is explained by the factors.

A relatively high communality indicates that a variable has much in common with the other variables taken as a group.

Communality for any variable is equal to the sum of the squared loadings for that variable.
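The communality rule above translates into a one-line calculation per variable. The loading matrix here is hypothetical (three variables, two factors) and is not taken from any exhibit.

```python
def communalities(loadings):
    """Sum of squared loadings for each variable (one row per variable)."""
    return [sum(value ** 2 for value in row) for row in loadings]

# Hypothetical rotated loadings: rows are variables, columns are factors
loadings = [
    [0.9, 0.1],   # variable 1 loads mainly on factor 1
    [0.8, 0.2],   # variable 2 loads mainly on factor 1
    [0.1, 0.7],   # variable 3 loads mainly on factor 2
]

result = communalities(loadings)
for index, value in enumerate(result, start=1):
    print(f"variable {index}: communality = {value:.2f}")
```

A variable with a low communality shares little variance with the extracted factors and may be a candidate for removal.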


Total Variance Explained: Squaring and totaling the loadings on a factor, then dividing the total by the number of variables, provides an estimate of the variance in a set of variables explained by that factor.

This explanation of variance is much the same as R² in multiple regression.


SPSS Windows

To select this procedure using SPSS for Windows, click:

Analyze>Data Reduction>Factor …

SPSS Windows: Principal Components

1. Select ANALYZE from the SPSS menu bar.

2. Click DATA REDUCTION and then FACTOR.

3. Move “Prevents Cavities [v1],” “Shiny Teeth [v2],” “Strengthen Gums [v3],” “Freshens Breath [v4],” “Tooth Decay Unimportant [v5],” and “Attractive Teeth [v6]” into the VARIABLES box.

4. Click on DESCRIPTIVES. In the pop-up window, in the STATISTICS box check INITIAL SOLUTION. In the CORRELATION MATRIX box, check KMO AND BARTLETT’S TEST OF SPHERICITY and also check REPRODUCED. Click CONTINUE.

5. Click on EXTRACTION. In the pop-up window, for METHOD select PRINCIPAL COMPONENTS (default). In the ANALYZE box, check CORRELATION MATRIX. In the EXTRACT box, check EIGEN VALUE OVER 1(default). In the DISPLAY box, check UNROTATED FACTOR SOLUTION. Click CONTINUE.

6. Click on ROTATION. In the METHOD box, check VARIMAX. In the DISPLAY box, check ROTATED SOLUTION. Click CONTINUE.

7. Click on SCORES. In the pop-up window, check DISPLAY FACTOR SCORE COEFFICIENT MATRIX. Click CONTINUE.

8. Click OK.

Cluster Analysis

Cluster analysis: A multivariate approach for grouping observations based on similarity among measured variables.

Cluster analysis is an important tool for identifying market segments.

Cluster analysis classifies individuals or objects into a small number of mutually exclusive and exhaustive groups.

Objects or individuals are assigned to groups so that there is great similarity within groups and much less similarity between groups.

The cluster should have high internal (within-cluster) homogeneity and external (between-cluster) heterogeneity.
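The idea of assigning objects so that groups are internally similar can be sketched with a bare-bones k-means loop. This is a generic illustration (assign each point to its nearest center, then recompute the centers), not the specific algorithm any statistics package uses; the points and starting centers are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def k_means(points, initial_centers, max_iterations=100):
    """Return (cluster index for each point, final cluster centers)."""
    centers = [tuple(c) for c in initial_centers]
    assignment = None
    for _ in range(max_iterations):
        # Assign every point to its nearest center
        new_assignment = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the solution is stable
        assignment = new_assignment
        # Move each center to the mean of the points assigned to it
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return assignment, centers

# Hypothetical respondents measured on two dimensions
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
assignment, centers = k_means(points, initial_centers=[(1, 1), (8, 8)])
print("cluster membership:", assignment)
print("cluster centers:", centers)
```

Each respondent ends up in exactly one cluster, which matches the requirement that groups be mutually exclusive and exhaustive.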

EXHIBIT 24.7 Clusters of Individuals on Two Dimensions


EXHIBIT 24.8 Cluster Analysis of Test-Market Cities


SPSS Windows

To select this procedure using SPSS for Windows, click:

Analyze>Classify>Hierarchical Cluster …

Analyze>Classify>K-Means Cluster …

Analyze>Classify>Two-Step Cluster

SPSS Windows: Hierarchical Clustering


1. Select ANALYZE from the SPSS menu bar.

2. Click CLASSIFY and then HIERARCHICAL CLUSTER.

3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box.

4. In the CLUSTER box, check CASES (default option). In the DISPLAY box, check STATISTICS and PLOTS (default options).

5. Click on STATISTICS. In the pop-up window, check AGGLOMERATION SCHEDULE. In the CLUSTER MEMBERSHIP box, check RANGE OF SOLUTIONS. Then, for MINIMUM NUMBER OF CLUSTERS, enter 2 and for MAXIMUM NUMBER OF CLUSTERS, enter 4. Click CONTINUE.

6. Click on PLOTS. In the pop-up window, check DENDROGRAM. In the ICICLE box, check ALL CLUSTERS (default). In the ORIENTATION box, check VERTICAL. Click CONTINUE.

7. Click on METHOD. For CLUSTER METHOD, select WARD’S METHOD. In the MEASURE box, check INTERVAL and select SQUARED EUCLIDEAN DISTANCE. Click CONTINUE.

8. Click OK.

SPSS Windows: K-Means Clustering


1. Select ANALYZE from the SPSS menu bar.

2. Click CLASSIFY and then K-MEANS CLUSTER.

3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box.

4. For NUMBER OF CLUSTERS, select 3.

5. Click on OPTIONS. In the pop-up window, in the STATISTICS box, check INITIAL CLUSTER CENTERS and CLUSTER INFORMATION FOR EACH CASE. Click CONTINUE.

6. Click OK.

SPSS Windows: Two-Step Clustering


1. Select ANALYZE from the SPSS menu bar.

2. Click CLASSIFY and then TWO-STEP CLUSTER.

3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the CONTINUOUS VARIABLES box.

4. For DISTANCE MEASURE, select EUCLIDEAN.

5. For NUMBER OF CLUSTERS, select DETERMINE AUTOMATICALLY.

6. For CLUSTERING CRITERION, select AKAIKE’S INFORMATION CRITERION (AIC).

7. Click OK.

Multidimensional Scaling

Multidimensional Scaling: Measures objects in multidimensional space on the basis of respondents’ judgments of the similarity of objects.

EXHIBIT 24.9 Perceptual Map of Six Graduate Business Schools: Simple Space


SPSS Windows

The multidimensional scaling program allows individual differences as well as aggregate analysis using ALSCAL. The level of measurement can be ordinal, interval or ratio. Both the direct and the derived approaches can be accommodated.

To select multidimensional scaling procedures using SPSS for Windows, click:

Analyze>Scale>Multidimensional Scaling …

The conjoint analysis approach can be implemented using regression if the dependent variable is metric (interval or ratio).

This procedure can be run by clicking:

Analyze>Regression>Linear …

SPSS Windows: MDS

First convert similarity ratings to distances by subtracting each value of Table 21.1 from 8. The form of the data matrix has to be square symmetric (diagonal elements zero and distances above and below the diagonal). See SPSS file Table 21.1 Input.

1. Select ANALYZE from the SPSS menu bar.

2. Click SCALE and then MULTIDIMENSIONAL SCALING (ALSCAL).

3. Move “Aqua-Fresh [AquaFresh],” “Crest [Crest],” “Colgate [Colgate],” “Aim [Aim],” “Gleem [Gleem],” “Ultra Brite [UltraBrite],” “Ultra-Brite [var00007],” “Close-Up [CloseUp],” “Pepsodent [Pepsodent],” and “Sensodyne [Sensodyne]” into the VARIABLES box.

4. In the DISTANCES box, check DATA ARE DISTANCES. SHAPE should be SQUARE SYMMETRIC (default).

5. Click on MODEL. In the pop-up window, in the LEVEL OF MEASUREMENT box, check INTERVAL. In the SCALING MODEL box, check EUCLIDEAN DISTANCE. In the CONDITIONALITY box, check MATRIX. Click CONTINUE.

6. Click on OPTIONS. In the pop-up window, in the DISPLAY box, check GROUP PLOTS, DATA MATRIX and MODEL AND OPTIONS SUMMARY. Click CONTINUE.

7. Click OK.
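The preparatory conversion of similarity ratings to distances (subtracting each rating from 8) can be sketched as follows. The three-brand rating matrix is hypothetical, not Table 21.1, and the code assumes the off-diagonal ratings are already symmetric.

```python
def similarities_to_distances(similarity, subtract_from=8):
    """Convert a square symmetric similarity matrix into a distance matrix.

    Off-diagonal entries become (subtract_from - rating); the diagonal is set to zero.
    """
    size = len(similarity)
    distance = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i != j:
                distance[i][j] = subtract_from - similarity[i][j]
    return distance

# Hypothetical similarity ratings among three brands (higher = more similar)
similarity = [
    [0, 5, 2],
    [5, 0, 6],
    [2, 6, 0],
]

distance = similarities_to_distances(similarity)
for row in distance:
    print(row)
```

Brands rated as highly similar end up with small distances, so they will sit close together on the resulting perceptual map.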


EXHIBIT 24.10 Summary of Multivariate Techniques for Analysis of Interdependence

Further Reading

Cooper, D.R. and Schindler, P.S. (2011) Business Research Methods, 11th edn, McGraw Hill.

Zikmund, W.G., Babin, B.J., Carr, J.C. and Griffin, M. (2010) Business Research Methods, 8th edn, South-Western.

Saunders, M., Lewis, P. and Thornhill, A. (2012) Research Methods for Business Students, 6th edn, Prentice Hall.

Saunders, M. and Lewis, P. (2012) Doing Research in Business & Management, FT Prentice Hall.
