DESIGN & ANALYSIS OF SOCIAL SURVEYS
COMPARATIVE PERSPECTIVES ACROSS
TIME & SPACE
MORE EFFICIENT CROSS-NATIONAL DATA ANALYSIS
WITH
MULTIPLE CORRESPONDENCE ANALYSIS
Fifth International Conference
Social Science Methodology
Cologne, Germany, October 2000
John Kochevar
17 Monument Square
Boston, MA 02129
OUTLINE
I. Introduction
II. Commercial Cross-National Research
III. Theoretical and Practical Problems
IV. An Analysis Approach and Methodology
V. Examples
- Fear of Hypoglycemia in Four Nations
- Job Stress among Women in Five Nations
VI. Summary
EFFICIENT METHODS FOR MINING CROSS-NATIONAL DATA
INTRODUCTION
COMPARATIVE CROSS-NATIONAL RESEARCH IS VERY DIFFICULT
THREE LESSONS
• There are severe theoretical and practical constraints in comparative cross-national research.
• Exploratory data analysis is necessary for most cross-national research.
• Path analysis logic and multiple correspondence analysis can be used together for comprehensive, convenient and efficient data mining.
COMMERCIAL CROSS-NATIONAL RESEARCH
A GROWING SEGMENT
BOOKS ON CROSS-NATIONAL RESEARCH IN HARVARD LIBRARIES
• We used a keyword search on Harvard’s online catalog to estimate the number of books published under “cross-national,” “cross-cultural” and “comparative studies” over the last 40 years. We counted books that appeared to be quantitative in nature.
• The number of books on cross-national research peaked in the 1970s and has leveled off since.
• This is not a precise count. Other books on cross-national research may appear under different keywords. We counted only English-language references.
[Chart: Number of Books, 1960 to 1999. Vertical axis: 0 to 30. Horizontal axis: publication period (labels 1960-65, 1971-75, 1981-85, 1991-95).]
INCOME FROM INTERNATIONAL RESEARCH
TOP 50 US RESEARCH FIRMS
Note: Data collected by the Council of American Survey Research Organizations (CASRO).
Source: Honomichl Top 50, Marketing News, June 5, 2000; May 28, 1990.
[Chart: Total US$ Revenues and Percent International for the top 50 US research firms, 1990 and 2000. Values shown: total revenues of $2.6 billion and $6.8 billion; percent international of 31% and 39%.]
PROBLEMS
WEAK THEORY MESSY DATA
CROSS-NATIONAL COMMERCIAL RESEARCH
THEORETICAL PROBLEMS
• Concept Equivalence
- Awareness and understanding of products vary strongly across nations.
- Beliefs, attitudes and values are seldom equivalent.
• Too many variables and too few theories
- Behavior is multivariate complex.
- Marketing “Theory” is a fashion industry.
CROSS-NATIONAL COMMERCIAL RESEARCH
PRACTICAL PROBLEMS
• Rating scales are not used the same way across nations.
- Surface validity is often weak.
- Reliability is weak. Intervals are not equal.
• Statistical assumptions are not justified.
- Normal distribution
- Homogeneity of variance
- Interaction effects
• Clients are not well-trained.
- Not interested in methodology
- Cannot read data displays
- Have the patience of angry children
• Data is often dirty.
- Sampling error/Non-response problems
- Interviewer error
- Problems in editing, coding, tabulating
If you love surveys or sausages, you should not watch either being made . . .
Source: Otto von Bismarck: “Wenn Sie Gesetze und Würste mögen, dann sollten Sie niemals bei der Herstellung von beiden zuschauen.” (“If you like laws and sausages, you should never watch either being made.”)
CONCEPTUAL APPROACH AND METHODOLOGY
COMPREHENSIVE ROBUST UNDERSTANDABLE
ASSUME NOTHING / CHECK EVERYTHING
Multivariate
Hierarchical Effect Models
Effects are universal. Nations are intervening.
Blocked indicators
Dependent variables. Simon/Blalock approach.
Two-stage analysis
Check distributions Eliminate outliers
Seek interaction effects Multiple Correspondence Analysis
Practical Display results in the simplest format.
CONCEPTUAL METHODOLOGICAL
1. Conceptual Model. Decide on dependent variable. Organize similar independent variables into “blocks.” Organize blocks into a path model or hierarchical levels of influence.
2. Recode data. Categorical level of analysis.
3. Universal Analysis. Run separate analyses on each block of indicators with the dependent variable(s). Remove outliers and rerun analyses.
4. Nation Analysis. Introduce nations into each Universal analysis to determine extent of “universality.”
5. Most important predictors. Run the strongest predictors from each block of indicators. Introduce nations in a second analysis.
6. Check results. Examine the most important findings using crosstabs and other statistical techniques.
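Step 2 (recoding to a categorical level) can be sketched as follows. This is a minimal illustration, not the author's code: the age bands are taken from the hypoglycemia charts in this deck, while the function names and the respondent record are hypothetical.

```python
from bisect import bisect_left

# Upper bounds and labels follow the age bands on the hypoglycemia charts.
AGE_UPPER_BOUNDS = [19, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
AGE_LABELS = ["LT 20", "20-25", "26-30", "31-35", "36-40", "41-45",
              "46-50", "51-55", "56-60", "61-65", "66-70", "71+"]


def age_band(age):
    """Map an age in whole years to its categorical band."""
    return AGE_LABELS[bisect_left(AGE_UPPER_BOUNDS, age)]


def recode(respondent):
    """Return a copy of a respondent record with age recoded to a category."""
    recoded = dict(respondent)
    recoded["age"] = age_band(respondent["age"])
    return recoded


# Hypothetical respondent, for illustration only.
print(recode({"age": 43, "sex": "Female", "nation": "Spain"}))
```

Once every variable is a category, interval and distribution assumptions about rating scales no longer enter the analysis, which is the point of recoding before running MCA.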
DATA MINING WITH MCA
STAGES OF ANALYSIS
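The MCA runs in the stages above can be illustrated with a small, dependency-free sketch. It assumes the textbook formulation of MCA as correspondence analysis of the respondent-by-category indicator matrix, and it extracts axes by power iteration so that no numerical library is needed. The function names and the respondent records are hypothetical, and the software actually used for the charts in this deck is not stated in the source.

```python
import math


def indicator_matrix(rows):
    """One-hot code respondents (dicts of variable -> category)."""
    labels = sorted({(var, cat) for row in rows for var, cat in row.items()})
    col = {label: j for j, label in enumerate(labels)}
    Z = [[0.0] * len(labels) for _ in rows]
    for i, row in enumerate(rows):
        for var, cat in row.items():
            Z[i][col[(var, cat)]] = 1.0
    return labels, Z


def power_axis(A, iters=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    J = len(A)
    v = [math.sin(j + 1.0) for j in range(J)]  # fixed, arbitrary start vector
    for _ in range(iters):
        w = [sum(A[a][b] * v[b] for b in range(J)) for a in range(J)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, [0.0] * J
        v = [x / norm for x in w]
    Av = [sum(A[a][b] * v[b] for b in range(J)) for a in range(J)]
    return sum(x * y for x, y in zip(v, Av)), v


def mca(rows, n_axes=2):
    """Return (principal inertias, {category label: coordinates})."""
    labels, Z = indicator_matrix(rows)
    n, J = len(Z), len(labels)
    total = sum(map(sum, Z))
    r = [sum(row) / total for row in Z]                             # row masses
    c = [sum(Z[i][j] for i in range(n)) / total for j in range(J)]  # column masses
    # Standardized residuals of the correspondence matrix Z / total.
    S = [[(Z[i][j] / total - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(J)] for i in range(n)]
    A = [[sum(S[i][a] * S[i][b] for i in range(n)) for b in range(J)]
         for a in range(J)]
    inertias, coords = [], {label: [] for label in labels}
    for _ in range(n_axes):
        lam, v = power_axis(A)
        inertias.append(lam)
        for j, label in enumerate(labels):
            # Principal coordinate of category j on this axis.
            coords[label].append(v[j] * math.sqrt(max(lam, 0.0)) / math.sqrt(c[j]))
        # Deflate so the next pass finds the next axis.
        A = [[A[a][b] - lam * v[a] * v[b] for b in range(J)] for a in range(J)]
    return inertias, coords


# Hypothetical respondents, for illustration only.
rows = [
    {"fear": "High", "alone": "Yes", "nation": "Spain"},
    {"fear": "High", "alone": "Yes", "nation": "Spain"},
    {"fear": "High", "alone": "No", "nation": "Spain"},
    {"fear": "Low", "alone": "No", "nation": "France"},
    {"fear": "Low", "alone": "No", "nation": "France"},
    {"fear": "High", "alone": "No", "nation": "France"},
    {"fear": "Low", "alone": "Yes", "nation": "UK"},
    {"fear": "Low", "alone": "No", "nation": "UK"},
]
inertias, coords = mca(rows)
for label, xy in coords.items():
    print(label, [round(x, 2) for x in xy])
```

In this sketch a "Universal" run would omit the nation variable and a "Nation" run would include it, so the two sets of coordinates can be compared. Categories that plot close together tend to be chosen by the same respondents.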
EXAMPLE I
FEAR OF HYPOGLYCEMIA
CROSS-NATIONAL COMMERCIAL SURVEYS
FEAR OF HYPOGLYCEMIA
• An American pharmaceutical manufacturer wanted to know the causes of fear of hypoglycemia (low blood sugar) among diabetics.
• Sample: Diabetics in four nations: United Kingdom, Germany, Spain, France.
• Questionnaires were self-completed under supervision.
• Approximately 50 respondents in each nation. N=195.
• Data quality varied. There was high variance in the main fear rating.
FACTORS CAUSING FEAR OF HYPOGLYCEMIA
[Diagram: three blocks of predictors leading to the Fear Rating]
Demographics: Age, Sex, Education, Living Conditions
Diabetes: Type of Diabetes, Years of Diabetes, Insulin Use, Type of Insulin
Experiences: Type of Symptoms, Frequency of Symptoms, Frequency of Worry, Reasons for Worry
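FEAR OF HYPOGLYCEMIA AND NATIONALITY
[Diagrams: two path models leading to the Fear Rating. Blocks in both: Demographics, Diabetes, Experience (Symptoms), Worry (Frequency, Reasons) and Nationality. In the “Independent” model, Nationality is one of the independent blocks. In the “Intervening” model, Nationality sits between the other blocks and the Fear Rating.]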
SEVERE HYPOGLYCEMIA FEAR - DEMOGRAPHICS: UNIVERSAL
[MCA chart: category points 1 to 24 plotted in two dimensions]
Intensity of Worry: 1 Low, 2 High
Age: 3 LT 20, 4 20-25, 5 26-30, 6 31-35, 7 36-40, 8 41-45, 9 46-50, 10 51-55, 11 56-60, 12 61-65, 13 66-70, 14 71+
Sex: 15 Male, 16 Female
Education: 17 Middle school or less, 18 High school, 19 College, 20 Post-Grad
Living Conditions: 21 Live alone, 22 Live with children but no other adults, 23 Live with at least one other adult, 24 Live with children and at least one other adult
SEVERE HYPOGLYCEMIA FEAR - DEMOGRAPHICS: NATION INFLUENCE
[MCA chart: category points 1 to 28 plotted in two dimensions]
Intensity of Worry: 1 Low, 2 High
Age: 3 LT 20, 4 20-25, 5 26-30, 6 31-35, 7 36-40, 8 41-45, 9 46-50, 10 51-55, 11 56-60, 12 61-65, 13 66-70, 14 71+
Sex: 15 Male, 16 Female
Education: 17 Middle school or less, 18 High school, 19 College, 20 Post-Grad
Living Conditions: 21 Live alone, 22 Live with children but no other adults, 23 Live with at least one other adult, 24 Live with children and at least one other adult
Nation: 25 UK, 26 Germany, 27 France, 28 Spain
SEVERE HYPOGLYCEMIA FEAR - STRONG PREDICTORS: UNIVERSAL
[MCA chart: category points 1 to 29 plotted in two dimensions]
Intensity of Worry: 1 Low, 2 High
Age: 3 LT 20, 4 20-25, 5 26-30, 6 31-35, 7 36-40, 8 41-45, 9 46-50, 10 51-55, 11 56-60, 12 61-65, 13 66-70, 14 71+
Sex: 15 Male, 16 Female
Living Conditions: 17 Live alone, 18 Live with children but no other adults, 19 Live with at least one other adult, 20 Live with children and at least one other adult
Frequency of Worry: 21 Weekly, 22 Less Often, 23 Never
Reasons for Worry: 24 Fear of Associated Symptoms, 25 Difficult to Manage, 26 Fear of Coma, 27 Get it While Asleep, 28 Get it While Alone / No one to help, 29 Unpleasant
SEVERE HYPOGLYCEMIA FEAR - STRONG PREDICTORS: NATION INFLUENCE
[MCA chart: category points 1 to 33 plotted in two dimensions]
Intensity of Worry: 1 Low, 2 High
Age: 3 LT 20, 4 20-25, 5 26-30, 6 31-35, 7 36-40, 8 41-45, 9 46-50, 10 51-55, 11 56-60, 12 61-65, 13 66-70, 14 71+
Sex: 15 Male, 16 Female
Living Conditions: 17 Live alone, 18 Live with children but no other adults, 19 Live with at least one other adult, 20 Live with children and at least one other adult
Frequency of Worry: 21 Weekly, 22 Less Often, 23 Never
Reasons for Worry: 24 Fear of Associated Symptoms, 25 Difficult to Manage, 26 Fear of Coma, 27 Get it While Asleep, 28 Get it While Alone / No one to help, 29 Unpleasant
Nation: 30 UK, 31 Germany, 32 France, 33 Spain
FEAR OF HYPOGLYCEMIA
BEHIND THE MCA CHARTS
[Table: percent of respondents with High and Low fear by “Get it while alone” (Yes, No, Total), for Spain and France, split into “Fear of associated symptoms” and “No fear of associated symptoms”. Panel values:]
High 100% 33% 50% / Low - 67 50
High 100% 92% 93% / Low - 8 8
High 86% 71% 74% / Low 14 29 26
High 88% 52% 60% / Low 13 48 40
χ² = .09, χ² = 1.333, χ² = .64, χ² = 3.268
• This table illustrates part of the data in the final “Top predictors” chart. The results seem to be contradictory and require close attention.
• The MCA chart shows that “Get it while alone” and “Fear of associated symptoms” interact with fear more strongly for Spanish diabetics. The data shown here indicate that the interactions are stronger for France.
• Overall, there was a stronger relationship between Spain and high fear. The MCA chart does not include the data point “No fear of associated symptoms.” The data point of France, 32, is pulled toward this invisible point and away from high fear. The display is correct, but incomplete. Take care. . .
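Crosstabs like the one behind the charts can be checked with a Pearson chi-square statistic. The sketch below is illustrative only: the source reports percentages and χ² values but not cell counts, so the counts here are hypothetical. They were chosen to reproduce one reported pattern (100%, 33% and 50% High fear) and they yield 1.333, one of the χ² values on the slide. No continuity correction is applied, and the function assumes every row and column total is nonzero.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are High/Low fear,
# columns are "Get it while alone" Yes/No.
counts = [[1, 1],
          [0, 2]]
print(round(chi_square(counts), 3))
```

With about 50 respondents per nation, many cells hold very few cases, so statistics of this kind are a rough check rather than a formal test.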
1. Data Display. We show only two examples from the total analysis. In the full analysis we ran universal and national MCA’s for variables blocked as “Demographics,” “Diabetes,” and “Experiences.” Those variables highly associated with the dependent variable were run in the final “Top predictor” MCA charts. There were eight display charts.
2. Final Predictors and “Causality”. The distance between a predictor and the dependent variable is approximately their χ² distance (the higher the χ², the closer the points), with other relationships taken into account. Logically, we need to know time order and if-and-only-if conditions to infer causality. Practically speaking, variables that are causally related show stronger relationships than those that are not. We use our judgement to pick the top predictors that have the strongest relationships with the dependent variable.
3. Interpretation of Axes. Initially we interpreted the axes in presentations to clients. They disagreed with our interpretations and could not agree on their own. In general, MCA appeals to commercial clients because it summarizes important interactions and subgroups in their overall context. We seldom interpret latent variables implicit in axis loadings.
4. Severe Hypoglycemia and Fear. The final table shows that some factors, e.g. “Fear of Coma”, “Unpleasant”, “Get it while asleep”, were associated with fear of severe hypoglycemia, independent of other variables. The strongest relationships, however, were interactions of several variables, e.g. “Fear of associated symptoms” (e.g. “confusion”), “Get it while alone” and Spanish nationality. None of the strong relationships changed when nationality was introduced, so we conclude that nationality does not intervene.
So why are the Spanish experiencing fear so intensely? Subsequent research determined that the Spanish gave themselves more insulin injections and positively valued the initial symptoms of hypoglycemia because these otherwise unpleasant symptoms indicated their sugar was under control. They chose to risk hypoglycemia and experience its symptoms more than other nationalities. Germans, for example, gave themselves fewer injections, allowed their sugar to run high, and ultimately experienced more long term toxic consequences of their illness, e.g. diabetic foot amputations.
SEVERE HYPOGLYCEMIA FEAR: NOTES ON INTERPRETATION
EXAMPLE II
WOMEN AND WORK STRESS
CROSS-NATIONAL COMMERCIAL SURVEYS
WOMEN AND WORK STRESS
• An American women’s magazine wanted to know the causes of job stress among working women.
• Sample: Working women magazine readers in five nations: United States, Japan, Germany, Brazil and Australia.
• Questionnaires were in magazines, self-completed and returned by mail (N=22,500). We randomly sampled returns.
• Final sample N=4,500.
• Data quality varied.
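Drawing the analysis sample from the mailed returns can be sketched as follows. This assumes a simple random sample without replacement of 4,500 from the 22,500 returns. The source says only that returns were randomly sampled, so the design (for example, whether it was stratified by nation) is an assumption, as are the function name and the ID scheme.

```python
import random


def sample_returns(return_ids, n, seed=0):
    """Draw n returned questionnaires at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(return_ids, n)


# Hypothetical IDs for the 22,500 returned questionnaires.
returned = list(range(1, 22501))
analysis_sample = sample_returns(returned, 4500)
print(len(analysis_sample))
```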
FACTORS ASSOCIATED WITH WORK STRESS
[Diagram: blocks of factors leading to Work Stress (Incidence, Duration, Severity)]
Demographics: Age, Education, Income
Personality: Perfectionism, Stress is Stimulating
Work Motivations: Career Goals, Reasons for Working
Home Factors: Home Problems, Children, Marital Status
Work Factors: Environment, Control, Social support, Occupation
Culture: Nationality
WORK STRESS AND DEMOGRAPHICS: UNIVERSAL
[MCA chart: category points 1 to 22 plotted in two dimensions]
Stress: 1 Low, 2 Moderate, 3 High
Income - Quintiles: 4 Low, 5, 6 Medium, 7, 8 High
Education: 9 Less than HS, 10 High School, 11 Vocational/Trade, 12 Jr Coll/Assoc Some College, 13 College, 14 Grad Prof School
Age: 15 18-25, 16 26-30, 17 31-35, 18 36-40, 19 41-45, 20 46-50, 21 51-55, 22 56+
WORK STRESS AND DEMOGRAPHICS: COUNTRY INFLUENCE
[MCA chart: category points 1 to 27 plotted in two dimensions]
Stress: 1 Low, 2 Moderate, 3 High
Income - Quintiles: 4 Low, 5, 6 Medium, 7, 8 High
Education: 9 Less than HS, 10 High School, 11 Vocational/Trade, 12 Jr Coll/Assoc Some College, 13 College, 14 Grad Prof School
Age: 15 18-25, 16 26-30, 17 31-35, 18 36-40, 19 41-45, 20 46-50, 21 51-55, 22 56+
Country: 23 US, 24 Japan, 25 Australia, 26 Germany, 27 Brazil
WORK STRESS AND STRONG PREDICTORS: UNIVERSAL
[MCA chart: category points 1 to 59 plotted in two dimensions]
Stress: 1 Low, 2 Moderate, 3 High
Age: 4 18-25, 5 26-30, 6 31-35, 7 36-40, 8 41-45, 9 46-50, 10 51-55, 11 56+
Education: 12 Less than high school, 13 High School, 14 Voc/Trade, 15 Jr college/some educ, 16 College, 17 Graduate school
18 Work to support myself
Perfectionist - Describes: 19 Very well, 20 Somewhat, 21 Describes, 22 Somewhat does not, 23 Does not
Stimulated by stress/pressure: 24 Yes, 25 Pressure, not stress, 26 Seldom
27 Moved, 28 Changed jobs
Occupation: 29 Managers, 30 Professionals, 31 Craftsmen, 32 Technicians and Admin., 33 Bureaucratized Service, 34 Commercialized Service, 35 Routinized workers, 36 Laborers, 37 Marginal Workers (Students)
38 No privacy, 39 Too many interruptions, 40 Too much work to do a good job
Can control pace - describes: 41 Very well, 42 Somewhat, 43 Describes, 44 Somewhat doesn’t, 45 Does not
Hours per week: 46 1-20, 47 21-34, 48 35-39, 49 40, 50 41-45, 51 46-59, 52 60+
Colleagues under stress: 53 Majority, 54 A few, 55 Don’t know
Female Work Friends: 56 None, 57 1-2, 58 3-5, 59 6+
WORK STRESS AND STRONG PREDICTORS: NATION INFLUENCE
[MCA chart: category points 1 to 64 plotted in two dimensions]
Stress: 1 Low, 2 Moderate, 3 High
Age: 4 18-25, 5 26-30, 6 31-35, 7 36-40, 8 41-45, 9 46-50, 10 51-55, 11 56+
Education: 12 Less than high school, 13 High School, 14 Voc/Trade, 15 Jr college/some educ, 16 College, 17 Graduate school
18 Work to support myself
Perfectionist - Describes: 19 Very well, 20 Somewhat, 21 Describes, 22 Somewhat does not, 23 Does not
Stimulated by stress/pressure: 24 Yes, 25 Pressure, not stress, 26 Seldom
27 Moved, 28 Changed jobs
Occupation: 29 Managers, 30 Professionals, 31 Craftsmen, 32 Technicians and Admin., 33 Bureaucratized Service, 34 Commercialized Service, 35 Routinized workers, 36 Laborers, 37 Marginal Workers (Students)
38 No privacy, 39 Too many interruptions, 40 Too much work to do a good job
Can control pace - describes: 41 Very well, 42 Somewhat, 43 Describes, 44 Somewhat doesn’t, 45 Does not
Hours per week: 46 1-20, 47 21-34, 48 35-39, 49 40, 50 41-45, 51 46-59, 52 60+
Colleagues under stress: 53 Majority, 54 A few, 55 Don’t know
Female Work Friends: 56 None, 57 1-2, 58 3-5, 59 6+
Country: 60 US, 61 Japan, 62 Australia, 63 Germany, 64 Brazil
DO CAUSES OF STRESS INTERACT? TOTAL SAMPLE
• This table shows the cumulative impact of the major stress predictors.
• There is a small cumulative impact with the addition of each problem.
• A total of 130 respondents reported all four problems.
Those Reporting High Stress
Too Much Work to do a Good Job: 45%
+ Low Pace Control: 54%
+ Majority of Colleagues Under Stress: 59%
+ Too Many Interruptions: 60%
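The cumulative tabulation above can be sketched as a filter over respondent records: keep those who report the first k problems, then compute the share reporting high stress. The field names and the records below are hypothetical and only illustrate the logic.

```python
PROBLEMS = ["too_much_work", "low_pace_control",
            "colleagues_stressed", "too_many_interruptions"]


def cumulative_high_stress(records, problems=PROBLEMS):
    """For k = 1..len(problems), return (last problem added, group size,
    percent high stress) among respondents reporting the first k problems."""
    results = []
    for k in range(1, len(problems) + 1):
        group = [r for r in records if all(r[p] for p in problems[:k])]
        if not group:
            results.append((problems[k - 1], 0, None))
            continue
        pct = 100.0 * sum(r["high_stress"] for r in group) / len(group)
        results.append((problems[k - 1], len(group), pct))
    return results


def respondent(flags, high_stress):
    """Build a hypothetical record from four 0/1 problem flags."""
    record = {p: bool(f) for p, f in zip(PROBLEMS, flags)}
    record["high_stress"] = high_stress
    return record


# Hypothetical respondents, for illustration only.
records = [
    respondent([1, 1, 1, 1], True),
    respondent([1, 0, 0, 0], False),
    respondent([1, 1, 0, 0], True),
    respondent([0, 0, 0, 0], False),
]
for name, size, pct in cumulative_high_stress(records):
    print(name, size, pct)
```

The group shrinks with each added condition, so the group size is worth reporting next to each percentage.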
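IS GERMANY UNIQUE?
Too Much Work to Do a Good Job?
All Countries Average, High Stress: Yes 45%, No 23%
Strength of Relationship: 22%
Strength by Country: Germany 29%, Australia 19%, US 20%, Japan 10%, Brazil 9%
Cannot Control Pace?
All Countries Average, High Stress: Yes 44%, No 28%
Strength of Relationship: 16%
Strength by Country: Germany 23%, Australia 16%, US 16%, Japan 15%, Brazil 7%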
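The “Strength of Relationship” figures are consistent with a simple percentage-point difference in high stress between respondents who report a problem and those who do not (45 - 23 = 22 and 44 - 28 = 16 for the all-countries averages). The sketch below assumes that definition; the function name is hypothetical.

```python
def strength(pct_yes, pct_no):
    """Percentage-point gap in high stress between respondents who
    report a problem (Yes) and those who do not (No)."""
    return pct_yes - pct_no


# All-countries averages from the slide.
too_much_work = strength(45, 23)       # Too Much Work to Do a Good Job?
cannot_control_pace = strength(44, 28)  # Cannot Control Pace?
print(too_much_work, cannot_control_pace)
```

Computing the same difference within each country gives the per-country strengths that are compared with the all-countries figure.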
1. Data Display. We show only four charts from a much larger analysis. In addition, we have not displayed some data points. For example, “No privacy” has two values, “Describes” and “Does not Describe.” We do not show “Does not Describe” on the chart. This makes it possible to fit many more variables on the chart, and it is consistent with our data mining approach. However, the absence of a cause (e.g. “Does not Describe”) can have unique interactions. We examine the initial computer results and plot such points only when they are important.
2. Occupation. We coded results to a standard used by job safety researchers, except in Japan. At the time of the survey there was a controversy concerning stress-related worker deaths (karoshi) in Japan. Japanese workers were classified only as “part-time” or “full-time” under the orders of a unit manager (not a researcher) who was afraid the results might reflect badly on certain businesses. Unfortunately, almost all female Japanese workers are customarily classified as part-time workers, and all tables with occupation as a variable are distorted by the unique relationship between Japan and occupation.
3. Job stress. The Universal table supports the findings of earlier studies showing that immediate factors in the worker’s environment (“Control of Pace”, “Too much work”) are the most important causes of worker stress. This relationship held while controlling for a variety of other relationships and was even stronger in the presence of other work conditions, e.g. “Colleagues under stress.” German women tended to experience more stress in response to bad work conditions. Some of this, but not all, can be explained by the higher proportion of factory workers in the German sample.
We are conducting additional analyses to determine if actions taken to alleviate stress, e.g. exercise, dancing, drinking, may account for the differences in experienced work stress across nations.
WORKING WOMEN AND WORK STRESS: NOTES ON INTERPRETATION
SUMMARY
• Theory is weak in cross-national commercial research. Exploratory data analysis is required.
• Data is messy. Error is very high. Use categorical level of measurement.
• Interaction effects are common. Systematically identify effects on single dependent variable. Use multiple correspondence analysis.
• Results are multivariate complex. Determine the most important predictors among blocks of similar predictors. Use causal path logic.
• Display all results. Show the comparative effects of nations for each block of indicators.
• Check results using alternative analyses.
SUMMARY
• Exploratory. Comprehensive.
• Logic is obvious.
• Few assumptions about data.
• Interactions become apparent.
• Efficient.
• Easy to read displays.
CORRESPONDENCE ANALYSIS FOR DATA MINING
ADVANTAGES
REFERENCES
Problems of Cross-national Comparative Research
The problems have been well known for many years.
Armer, M., Grimshaw, A.D. (Eds.), Comparative Social Research: Methodological Problems and Strategies. New York: Wiley, 1973.
Kohn, M.L., Cross-National Research as an Analytic Strategy. American Sociological Review, 1987, Vol. 52, 713-731.
Van de Vijver, F., Leung, K. Methods and Data Analysis for Cross-Cultural Research. London: Sage, 1997.
Correspondence Analysis
The application of correspondence analysis for detecting interactions was noted by Hayashi.
Hayashi, C., Suzuki, T. Quantitative Approach to a Cross-Societal Research. Annals of the Institute of Statistical Mathematics, Vol. 27, 1975, 1-32.
Logic and Strategy
Hayashi, C., The Quantitative Study of National Character. In Sasaki, M. (Ed.) Values and Attitudes Across Nations, Boston: Brill, 1998, 91-114.
Jones, L. (Ed.) The Collected Works of John W. Tukey: Volume IV, Philosophy and Principles of Data Analysis. Monterey, CA: Wadsworth & Brooks/Cole, 1986.
Simon, H. A. Spurious Correlation: A Causal Interpretation. Journal of the American Statistical Association. 1954, Vol. 49, 467-479.
Blalock, H.M. Multiple Indicators and the Causal Approach to Measurement Error. American Journal of Sociology, 1969, Vol. 75, 264-272.