Analysis of Data – Basic Concepts
DESCRIPTION
14. Analysis of Data – Basic Concepts. National Central University, Department of Information Management, 范錚強. mailto: [email protected]. Updated 2013.05. Topics: descriptive statistics (describing the characteristics of the sample, chiefly how the research sample differs from the population), exploratory versus confirmatory data analysis, and exploratory data displays (Ch. 16) such as scatter plots, bar charts, and pie charts.
TRANSCRIPT
1 NCU Information Management, 范錚強
14. Analysis of Data – Basic Concepts
National Central University, Department of Information Management, 范錚強
mailto: [email protected]
Updated 2013.05
Descriptive Statistics
Describe the characteristics of the sample.
The main thing to show is: how exactly does your research sample differ from the population?
Exploratory Data Analysis
Exploratory
Confirmatory
Some exploratory data displays (Ch. 16)
Scatter plot
Bar chart, pie chart
Frequency table
Histogram
Cross tabulation
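Two of the displays listed above, the frequency table and the cross tabulation, can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the survey records and the function names are hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical survey records: (gender, preferred_brand). Illustration only.
records = [
    ("F", "A"), ("M", "B"), ("F", "A"), ("M", "A"),
    ("F", "B"), ("M", "B"), ("F", "A"), ("M", "B"),
]

def frequency_table(values):
    """Return {value: (count, percent)} for one categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: (n, 100.0 * n / total) for v, n in counts.items()}

def cross_tabulation(pairs):
    """Return {(row_value, column_value): count} for two categorical variables."""
    return dict(Counter(pairs))

brand_freq = frequency_table(brand for _, brand in records)
crosstab = cross_tabulation(records)

for brand, (count, pct) in sorted(brand_freq.items()):
    print(f"brand {brand}: n={count}, {pct:.1f}%")
for (gender, brand), count in sorted(crosstab.items()):
    print(f"gender {gender} x brand {brand}: {count}")
```

The frequency table summarizes one variable at a time; the cross tabulation counts joint occurrences of two variables, which is the input a chi-square test would need.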
Statistical Procedures
Descriptive Statistics
Inferential Statistics
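Confirmatory Studies
Hypothesis testing
Research hypothesis
Null hypothesis H0
Refutation: starting from the research hypothesis you want to verify, set up its opposite as a "straw man", H0. (The original research hypothesis is the alternative hypothesis in statistical terms.)
Use statistics to reject H0; the alternative hypothesis thereby gains support.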
Types of Hypotheses
Null
H0: μ = 50 mpg
H0: μ ≤ 50 mpg
H0: μ ≥ 50 mpg
Alternate
HA: μ ≠ 50 mpg
HA: μ > 50 mpg
HA: μ < 50 mpg
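The difference between a two-tailed alternative (HA: μ ≠ 50 mpg) and a one-tailed alternative (HA: μ > 50 mpg) shows up in how the p-value is computed. The sketch below assumes a one-sample Z test with a known population standard deviation; the sample mean, standard deviation, and sample size are hypothetical values chosen for illustration, not data from the slides.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """One-sample Z test: return (z, two_tailed_p, upper_tailed_p)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    upper_p = 1.0 - NormalDist().cdf(z)              # HA: mu > mu0
    two_p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))   # HA: mu != mu0
    return z, two_p, upper_p

# Hypothetical sample: mean 52.5 mpg, sigma 14, n = 100, tested against H0: mu = 50.
z, p_two, p_upper = z_test(sample_mean=52.5, mu0=50.0, sigma=14.0, n=100)
print(f"z = {z:.3f}, two-tailed p = {p_two:.4f}, one-tailed p = {p_upper:.4f}")
```

With these assumed numbers the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, which is why the direction of the alternative hypothesis has to be fixed before looking at the data.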
Two-Tailed Test of Significance
One-Tailed Test of Significance
Decision Rule
Take no corrective action if the analysis shows that one cannot reject the null hypothesis.
Statistical Decisions
Tests of Significance
Nonparametric statistics: "weak" statistics
Parametric statistics: "strong" statistics
Assumptions for Using Parametric Tests
Independent observations
Normal distribution
Equal variances
Interval or ratio scales
Advantages of Nonparametric Tests
Easy to understand and use
Usable with nominal data
Appropriate for ordinal data
Appropriate for non-normal population distributions
How to Select a Test
How many samples are involved?
If two or more samples are involved, are the individual cases independent or related?
Is the measurement scale nominal, ordinal, interval, or ratio?
Recommended Statistical Techniques

Measurement scale: Nominal
  One-sample case: Binomial; χ² one-sample test
  Two-sample tests, related samples: McNemar
  Two-sample tests, independent samples: Fisher exact test; χ² two-samples test
  k-sample tests, related samples: Cochran Q
  k-sample tests, independent samples: χ² for k samples

Measurement scale: Ordinal
  One-sample case: Kolmogorov-Smirnov one-sample test; Runs test
  Two-sample tests, related samples: Sign test; Wilcoxon matched-pairs test
  Two-sample tests, independent samples: Median test; Mann-Whitney U; Kolmogorov-Smirnov; Wald-Wolfowitz
  k-sample tests, related samples: Friedman two-way ANOVA
  k-sample tests, independent samples: Median extension; Kruskal-Wallis one-way ANOVA

Measurement scale: Interval and Ratio
  One-sample case: t-test; Z test
  Two-sample tests, related samples: t-test for paired samples
  Two-sample tests, independent samples: t-test; Z test
  k-sample tests, related samples: Repeated-measures ANOVA
  k-sample tests, independent samples: One-way ANOVA; n-way ANOVA
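The three selection questions (how many samples, related or independent, which measurement scale) amount to a table lookup. The sketch below encodes the recommended techniques as a Python dictionary; the key names and the `recommend` helper are hypothetical conveniences, and "chi-square" stands in for χ².

```python
# Keys: (measurement scale, number of samples, sample relationship).
# The relationship is None for the one-sample case.
RECOMMENDED_TESTS = {
    ("nominal", "one", None): ["Binomial", "chi-square one-sample test"],
    ("nominal", "two", "related"): ["McNemar"],
    ("nominal", "two", "independent"): ["Fisher exact test", "chi-square two-samples test"],
    ("nominal", "k", "related"): ["Cochran Q"],
    ("nominal", "k", "independent"): ["chi-square for k samples"],
    ("ordinal", "one", None): ["Kolmogorov-Smirnov one-sample test", "Runs test"],
    ("ordinal", "two", "related"): ["Sign test", "Wilcoxon matched-pairs test"],
    ("ordinal", "two", "independent"): [
        "Median test", "Mann-Whitney U", "Kolmogorov-Smirnov", "Wald-Wolfowitz"],
    ("ordinal", "k", "related"): ["Friedman two-way ANOVA"],
    ("ordinal", "k", "independent"): ["Median extension", "Kruskal-Wallis one-way ANOVA"],
    ("interval_ratio", "one", None): ["t-test", "Z test"],
    ("interval_ratio", "two", "related"): ["t-test for paired samples"],
    ("interval_ratio", "two", "independent"): ["t-test", "Z test"],
    ("interval_ratio", "k", "related"): ["Repeated-measures ANOVA"],
    ("interval_ratio", "k", "independent"): ["One-way ANOVA", "n-way ANOVA"],
}

def recommend(scale, samples, relation=None):
    """Return the recommended tests for a design, or an empty list if unknown."""
    return RECOMMENDED_TESTS.get((scale, samples, relation), [])

print(recommend("ordinal", "two", "independent"))
print(recommend("interval_ratio", "k", "related"))
```

A lookup like this only narrows the candidates; checking the assumptions of the chosen test remains a separate step.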
Measures of Association: Interval/Ratio
Pearson correlation coefficient: for continuous, linearly related variables
Correlation ratio (eta): for nonlinear data or relating a main effect to a continuous dependent variable
Biserial: one continuous and one dichotomous variable with an underlying normal distribution
Partial correlation: three variables; relating two with the third's effect taken out
Multiple correlation: three variables; relating one variable with two others
Bivariate linear regression: predicting one variable from another's scores
Pearson’s Product Moment Correlation r
Is there a relationship between X and Y?
What is the magnitude of the relationship?
What is the direction of the relationship?
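The three questions above map directly onto the value of r: whether it differs from zero, its absolute size, and its sign. A minimal sketch of the product-moment formula follows, using only the standard library; the data points are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
magnitude = abs(r)                                   # strength of the relationship
direction = "positive" if r > 0 else "negative"      # sign of the relationship
print(f"r = {r:.3f} ({direction}, magnitude {magnitude:.3f})")
```

Whether an r of this size is statistically significant depends on the sample size and is a separate test.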
Scatterplots of Relationships
Scatterplots
Interpretation of Correlations
X causes Y
Y causes X
X and Y are activated by one or more other variables
X and Y influence each other reciprocally
Artifact Correlations
Interpretation of Coefficients
A coefficient is not remarkable simply because it is statistically significant! It must be practically meaningful.
Coefficient of Determination: r²
Total proportion of variance in Y explained by X
Desired r²: 80% or more
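For reference, the "proportion of variance explained" can be written out as follows. This is the standard definition for a least-squares bivariate linear regression with an intercept, where the fitted value is written ŷ and the mean of Y is ȳ; the notation is an assumption, since the slide gives no formula.

```latex
r^2
  = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}
  = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

As a worked example, a correlation of r = 0.90 gives r² = 0.81, meaning that 81% of the variance in Y is accounted for by X.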
Classifying Multivariate Techniques
Dependency
Interdependency
Multivariate Techniques
Right Questions. Trusted Insight.
When using sophisticated techniques you want to rely on the knowledge of the researcher.
Harris Interactive promises you can trust their experienced research professionals to draw the right conclusions from the collected data.
Dependency Techniques
Multiple Regression
Discriminant Analysis
MANOVA
Structural Equation Modeling (SEM)
Conjoint Analysis
Uses of Multiple Regression
Develop a self-weighting estimating equation to predict values for a DV
Control for confounding variables
Test and explain causal theories
Generalized Regression Equation
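The equation shown on this slide did not survive in the transcript. The standard textbook form of the multiple regression equation is given below as a reference; the symbols (β for coefficients, ε for the error term) are the conventional ones and are assumed here rather than taken from the slide.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon
```

Here β₀ is the constant (intercept), each βᵢ is the regression coefficient of predictor Xᵢ, and ε is the error term.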
Multiple Regression Example
Selection Methods
Forward
Backward
Stepwise
Evaluating and Dealing with Multicollinearity
Choose one of the variables and delete the other
Create a new variable that is a composite of the others
Collinearity Statistics, VIF: 1.000, 2.289, 2.289, 2.748, 3.025, 3.067
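The variance inflation factor is conventionally defined as VIF = 1 / (1 − R²), where R² comes from regressing one predictor on all the other predictors. The sketch below applies that definition in both directions; the function names are hypothetical, and reading the slide's VIF values back into R² is an illustration, not something the slide states.

```python
def vif(r_squared):
    """Variance inflation factor for a predictor whose regression on the
    other predictors has the given R-squared."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R-squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)

def r_squared_from_vif(vif_value):
    """Invert the definition: the R-squared implied by a reported VIF."""
    if vif_value < 1.0:
        raise ValueError("VIF cannot be below 1")
    return 1.0 - 1.0 / vif_value

# A VIF of 1.000 means the predictor is uncorrelated with the others.
print(vif(0.0))
# A VIF of 2.289 implies roughly 56% of the predictor's variance is shared.
print(round(r_squared_from_vif(2.289), 3))
```

The larger the VIF, the more a coefficient's variance is inflated by overlap with the other predictors, which is what motivates dropping or combining variables.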
Discriminant Analysis

A.
                                        Predicted Success
Actual Group       Number of Cases      0               1
Unsuccessful (0)   15                   13 (86.70%)     2 (13.30%)
Successful (1)     15                   3 (20.00%)      12 (80.00%)
Note: Percent of "grouped" cases correctly classified: 83.33%

B.
              Unstandardized    Standardized
X1            .36084            .65927
X2            2.61192           .57958
X3            .53028            .97505
Constant      12.89685
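The "percent correctly classified" figure follows directly from the classification table: correct predictions lie on the diagonal. A minimal sketch using the counts from the slide (13, 2, 3, 12); the variable names are hypothetical.

```python
# Rows: actual group (0 = unsuccessful, 1 = successful).
# Columns: predicted group (0, 1). Counts are taken from the slide.
confusion = [
    [13, 2],
    [3, 12],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(confusion)))
hit_rate = correct / total

# Row percentages: share of each actual group assigned to each predicted group.
row_percent = [[100.0 * cell / sum(row) for cell in row] for row in confusion]

print(f"correctly classified: {100 * hit_rate:.2f}%")
for actual, row in enumerate(row_percent):
    print(f"actual {actual}: " + ", ".join(f"{p:.2f}%" for p in row))
```

The computed row percentages (86.67% and 13.33%) differ from the slide's 86.70% and 13.30% only by rounding.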
MANOVA
MANOVA Output
Bartlett’s Test
MANOVA Homogeneity-of-Variance Tests
Multivariate Tests of Significance
Univariate Tests of Significance
Structural Equation Modeling (SEM)
Model specification
Estimation
Evaluation of fit
Respecification of the model
Interpretation and communication
Interdependency Techniques
Factor Analysis
Cluster Analysis
Multidimensional Scaling
Factor Analysis
Factor Matrices

                       A. Unrotated Factors       B. Rotated Factors
Variable               I        II       h²       I        II
A                      0.70     -0.40    0.65     0.79     0.15
B                      0.60     -0.50    0.61     0.75     0.03
C                      0.60     -0.35    0.48     0.68     0.10
D                      0.50     0.50     0.50     0.06     0.70
E                      0.60     0.50     0.61     0.13     0.77
F                      0.60     0.60     0.72     0.07     0.85
Eigenvalue             2.18     1.39
Percent of variance    36.3     23.2
Cumulative percent     36.3     59.5
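The summary rows of the unrotated matrix can be recomputed from the loadings: a variable's communality (h²) is the sum of its squared loadings across factors, and a factor's eigenvalue is the sum of squared loadings down its column. The sketch below checks this against the slide's numbers; the variable names are hypothetical.

```python
# Unrotated loadings from the slide: variable -> (factor I, factor II).
unrotated = {
    "A": (0.70, -0.40),
    "B": (0.60, -0.50),
    "C": (0.60, -0.35),
    "D": (0.50, 0.50),
    "E": (0.60, 0.50),
    "F": (0.60, 0.60),
}

# Communality: sum of squared loadings across factors (one value per variable).
communality = {v: sum(l * l for l in loads) for v, loads in unrotated.items()}

# Eigenvalue: sum of squared loadings down each factor's column.
eigenvalues = [sum(loads[k] ** 2 for loads in unrotated.values()) for k in range(2)]

# Percent of variance: eigenvalue divided by the number of variables.
percent = [100.0 * e / len(unrotated) for e in eigenvalues]

for v, h2 in communality.items():
    print(f"{v}: h2 = {h2:.2f}")
print("eigenvalues:", [round(e, 2) for e in eigenvalues])
print("percent of variance:", [round(p, 1) for p in percent])
```

Each standardized variable contributes one unit of variance, so dividing an eigenvalue by the number of variables (six here) gives the percent of variance that factor accounts for.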
Orthogonal Factor Rotations
Factor Matrix, Metro U MBA Study

Variable   Course                   Factor 1   Factor 2   Factor 3   Communality
V1         Financial Accounting     0.41       0.71       0.23       0.73
V2         Managerial Accounting    0.01       0.53       -0.16      0.31
V3         Finance                  0.89       -0.17      0.37       0.95
V4         Marketing                -0.60      0.21       0.30       0.49
V5         Human Behavior           0.02       -0.24      -0.22      0.11
V6         Organization Design      -0.43      -0.09      -0.36      0.32
V7         Production               -0.11      -0.58      -0.03      0.35
V8         Probability              0.25       0.25       -0.31      0.22
V9         Statistical Inference    -0.43      0.43       0.50       0.62
V10        Quantitative Analysis    0.25       0.04       0.35       0.19
Eigenvalue                          1.83       1.52       0.95
Percent of variance                 18.30      15.20      9.50
Cumulative percent                  18.30      33.50      43.00
Varimax Rotated Factor Matrix

Variable   Course                   Factor 1   Factor 2   Factor 3
V1         Financial Accounting     0.84       0.16       -0.06
V2         Managerial Accounting    0.53       -0.10      0.14
V3         Finance                  -0.01      0.90       -0.37
V4         Marketing                -0.11      -0.24      0.65
V5         Human Behavior           -0.13      -0.14      -0.27
V6         Organization Design      -0.08      -0.56      -0.02
V7         Production               -0.54      -0.11      -0.22
V8         Probability              0.41       -0.02      -0.24
V9         Statistical Inference    0.07       0.02       0.79
V10        Quantitative Analysis    -0.02      0.42       0.09
Cluster Analysis
Select sample to cluster
Define variables
Compute similarities
Select mutually exclusive clusters
Compare and validate clusters
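The middle steps above (compute similarities, then select mutually exclusive clusters) can be sketched with a small agglomerative procedure. The slides do not name a clustering method, so the single-linkage rule and Euclidean distance used here are assumptions, and the points are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def single_linkage(points, k):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    (closest = smallest distance between any pair of members) until k remain."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters = [[i] for i in range(len(points))]  # start with one cluster per case
    while len(clusters) > k:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical cases measured on two variables.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 1.0)]
three = single_linkage(points, k=3)
two = single_linkage(points, k=2)
print(three)
print(two)
```

Running the procedure for several values of k gives one membership column per solution, and the sequence of merges is what a dendrogram draws.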
Cluster Membership

                                                           Number of Clusters
Film                        Country    Genre       Case    5    4    3    2
Cyrano de Bergerac          France     DramaCom    1       1    1    1    1
Il y a des Jours            France     DramaCom    4       1    1    1    1
Nikita                      France     DramaCom    5       1    1    1    1
Les Noces de Papier         Canada     DramaCom    6       1    1    1    1
Leningrad Cowboys . . .     Finland    Comedy      19      2    2    2    2
Storia de Ragazzi . . .     Italy      Comedy      13      2    2    2    2
Conte de Printemps          France     Comedy      2       2    2    2    2
Tatie Danielle              France     Comedy      3       2    2    2    2
Crimes and Misdem . . .     USA        DramaCom    7       3    3    3    2
Driving Miss Daisy          USA        DramaCom    9       3    3    3    2
La Voce della Luna          Italy      DramaCom    12      3    3    3    2
Che Hora E                  Italy      DramaCom    14      3    3    3    2
Attache-Moi                 Spain      DramaCom    15      3    3    3    2
White Hunter Black . . .    USA        PsyDrama    10      4    4    3    2
Music Box                   USA        PsyDrama    8       4    4    3    2
Dead Poets Society          USA        PsyDrama    11      4    4    3    2
La Fille aux All . . .      Finland    PsyDrama    18      4    4    3    2
Alexandrie, Encore . . .    Egypt      DramaCom    16      5    3    3    2
Dreams                      Japan      DramaCom    17      5    3    3    2
Dendrogram