PowerPoint presentation -...
27/11/2013
Research Methods Lesson Nº10
Introduction
• To answer your research question you need to:
– Organize data
– Analyse data
– Interpret data
But remember: IF DATA IS COLLECTED BADLY, THE ANALYSIS IS WORTHLESS.
Introduction
• Data analysis usually employs variables. A variable is a characteristic of the population or sample being observed.
– Quantitative variables are expressed numerically. They are the basis of quantitative research.
  • Discrete variables are restricted to whole numbers.
  • Continuous variables can take any value within a given range.
– Qualitative variables are not measured numerically.
Types of Variables
1. Nominal variables: involve categories and they must be mutually exclusive (male/female).
2. Ordinal variables: measure the intensity of something by ordering categories (agree,..., indifferent,... strongly disagree). Spaces between categories, however, may not be equal.
3. Interval/ratio variables: allow differences between values to be measured. Ratio variables also have a true zero point (total absence of the variable), e.g. age, income.
Exploring and representing data
• The analysis of data starts with the data layout.
• Record data using numerical codes.
• Data is usually entered in table format. This table is called a data matrix.
– Each column usually represents a single variable.
– Each row contains the values of the variables for an individual case or time period.
          Id   Age   Gender   Service   Employed
Case 1     1    27      1        3         1
Case 2     2    39      2        1         2
Case 3     3    34      1        2         1
Exploring (Cont.)
• Graphs help to explore and understand
your data
• What should diagrams and tables
include?
– Clear descriptive title
– Clear axis labels / row and column headings
– Units of measurement
– Logical sequence of bars / columns and rows
– Sources of data
– Explanations for every abbreviation
– Size of the sample employed
EXPLORING AND REPRESENTING DATA
• Individual variable
– Frequency distribution
– Graphs
– Statistics
• Several variables
– Graphs
– Contingency tables
– Relationships
Exploratory analysis
• Frequency distribution table
• To achieve clarity use percentages
Nationality Frequency
Portuguese 20
French 25
Spanish 15
Mexican 5
What are the defects of this table?
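The advice to use percentages can be illustrated with a minimal Python sketch. Python is an assumption here (the slides themselves work in Excel), and the variable names are illustrative. It turns the nationality counts from the table into relative frequencies and adds the total that the table lacks.

```python
# Frequencies copied from the nationality table on the slide.
frequencies = {"Portuguese": 20, "French": 25, "Spanish": 15, "Mexican": 5}

total = sum(frequencies.values())

# Relative frequency of each category, in percent, rounded to 2 decimals.
percentages = {name: round(100 * count / total, 2)
               for name, count in frequencies.items()}

for name, count in frequencies.items():
    print(f"{name:<12}{count:>4}{percentages[name]:>8.2f}%")
print(f"{'Total':<12}{total:>4}{100:>8.2f}%")
```

With the percentages and the total shown, the reader can compare categories without doing the arithmetic.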
Practice1.xls shows the age of the participants:
43 35 48 43 11 15 38 21 25 37 17 31 49 11 11 47 42 25 27 36 27 13 13 35 23 47 34 23 27 30 41 25 41 18 24 18 18 28 40 32 40 24 26 47 30 22 48 30 37 23 33 42 29 47 32 29 24 19 35 35 20 31 46 36 45 31 44 26 20 44 27 23 43 31 40 27 23 28 46 28
• First, identify the number of intervals or groups (n). A rule of thumb is to take the smallest n such that 2^n > number of cases.
• In this case the number of cases is 80, so n = 7 (2^6 = 64 < 80 < 2^7 = 128).
• The interval width is calculated as (largest value - smallest value) / n = (49 - 11) / 7 = 5.43.
• Rounding the width up to 6 gives 7 intervals, which must start at the smallest value and cover the largest:
• 11-16; 17-22; ...; 47-52
• Then calculate the frequency for each interval.
• Excel commands: MAX(range), MIN(range); FREQUENCY(range of data; range of intervals), entered with Ctrl+Shift+Enter.
The result will be:
Age interval   Frequency   Relative frequency %   Cumulative frequency %
11-16               6              7.50                   7.50
17-22               9             11.25                  18.75
23-28              21             26.25                  45.00
29-34              13             16.25                  61.25
35-40              12             15.00                  76.25
41-46              12             15.00                  91.25
47-52               7              8.75                 100.00
TOTAL              80            100.00
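The table above can be reproduced outside Excel. The following is a minimal Python sketch (Python is an assumption; the course uses Excel's MAX, MIN and FREQUENCY) that applies the 2^n rule of thumb to the 80 ages of Practice1.xls, rounds the interval width up, and counts the cases per interval. Variable names are illustrative.

```python
import math

# The 80 ages listed for Practice1.xls.
ages = [43, 35, 48, 43, 11, 15, 38, 21, 25, 37, 17, 31, 49, 11, 11, 47,
        42, 25, 27, 36, 27, 13, 13, 35, 23, 47, 34, 23, 27, 30, 41, 25,
        41, 18, 24, 18, 18, 28, 40, 32, 40, 24, 26, 47, 30, 22, 48, 30,
        37, 23, 33, 42, 29, 47, 32, 29, 24, 19, 35, 35, 20, 31, 46, 36,
        45, 31, 44, 26, 20, 44, 27, 23, 43, 31, 40, 27, 23, 28, 46, 28]

n_cases = len(ages)

# Rule of thumb: the smallest n such that 2**n > number of cases.
n_intervals = 1
while 2 ** n_intervals <= n_cases:
    n_intervals += 1

# Raw width (max - min) / n, rounded up to a whole number of years.
raw_width = (max(ages) - min(ages)) / n_intervals
width = math.ceil(raw_width)

# Count how many ages fall in each interval, starting at the smallest age.
start = min(ages)
frequencies = [0] * n_intervals
for age in ages:
    frequencies[(age - start) // width] += 1

# Build the rows: label, frequency, relative % and cumulative %.
rows = []
cumulative = 0.0
for i, freq in enumerate(frequencies):
    low = start + i * width
    high = low + width - 1
    relative = 100 * freq / n_cases
    cumulative += relative
    rows.append((f"{low}-{high}", freq, relative, cumulative))
    print(f"{low}-{high:<6}{freq:>4}{relative:>8.2f}{cumulative:>8.2f}")
```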
PRACTICE 1 AGE.xls
• Calculate the frequency
• Relative frequency
• Cumulative frequency
Frequency Distributions using Pivot Tables (WINDOWS)
• Insert > Pivot Table (in French Excel: tableau croisé dynamique)
• It will ask you for the data range. Be sure there are no blank rows or columns.
Values of the variable selected: frequencies, percentages…
Variables
The table updates every time you change any parameter. Solution: copy the results.
What kind of values can it show? Left click and choose Value configuration.
To group data go to Table Tools > Options > Group Categories.
If the original data changes you have to update the pivot table: Options > Update.
Frequency Distributions in Excel Macintosh
tableau croisé dynamique
Values of the variable selected: frequencies, percentages…
Variables
The table updates every time you change any parameter. Solution: copy the results.
What kind of values can it show? Left click and choose Value configuration.
If the original data changes you have to update the pivot table.
PRACTICE 2 Titanic.xls
• Calculate the frequency by gender
• Relative frequency
PRACTICE 1 AGE.xls
• Calculate the frequency by intervals of width 6
• Relative frequency
Graphs: Bar Charts
• A number of separate bars whose height represents the data values.
• They make it easy to identify the highest and lowest frequencies.
• To see the precise values, select the corresponding option.
• The scale of the axis could exaggerate findings.
[Bar chart: Number of individuals by class (1st, 2nd, 3rd, Crew); vertical axis from 0 to 1000]
WINDOWS: Insert > Type of Graph
For Pivot Tables, graphs are defined in the Options menu.
Individual variables (Cont.)
Distributions
• Bar charts for discrete data; histograms for continuous data.
• In histograms the bars are contiguous, to depict the continuous nature of the categories.
Individual variables (Cont.)
Pie Chart
– Each slice represents the percentage of cases falling in each category.
– It should not have too many segments.
– To show percentages in Excel…
[Pie chart: Percentage of individuals by class: 1st 15%, 2nd 13%, 3rd 32%, Crew 40%]
To show percentages: select the graph, click Design, and choose the graph design that displays them.
Charts
Pivot Tables define Graphs in the Options menu
Type of Graph
Individual variables (Cont.)
Line Graphs
– Show trends over time.
– They show peaks, and their slope indicates the rate of any increase or decrease.
– The independent variable must be on the horizontal axis.
– You can add a trend line.
[Line graph: Failure Rate, 2001 to 2010; vertical axis from 0 to 50]
PRACTICE 1 AGE.xls
• Build the Bar chart and the Pie chart.
Statistics
• Statistics is a branch of mathematics concerned with the analysis of numerical data.
• It can be divided into descriptive and inferential statistics.
– Descriptive indicators summarize data. They focus on two aspects:
  • The central tendency;
  • The dispersion.
– Inferential indicators try to deduce the characteristics of the population from a sample.
Statistics
• Descriptive
– Central tendency (location)
– Dispersion
• Inferential
– Significance testing
– Hypothesis testing
Measures of Location
• The Average can be interpreted in three
different ways
1. The Mode
2. The Median
3. The Mean or arithmetic mean
• Analyze all of them
Measures of Location
1. The Mode is the value that occurs most frequently. = MODE(range of data)
– It can be misleading if it is found at the end of a range of data.
– Moreover, there could be several modes.
2. The Median is the middle value, so half of the ranked scores are below it and half above.
= MEDIAN(range of data)
– It is not disturbed by extreme values.
Measures of Location
3. The Mean, or arithmetic mean, is the sum of the values divided by the number of values.
= AVERAGE(Array)
– Unlike the other two measures, it uses all the values in the distribution for its calculation.
– It can be distorted by extreme values.
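The three measures of location can be compared on a small example. This is a minimal Python sketch using the standard `statistics` module (Python 3.8 or later for `multimode`). The ages below are hypothetical, chosen so that there are two modes and one extreme value; they are not taken from the practice files.

```python
import statistics

# Hypothetical ages: two modes (23 and 27) and one extreme value (90).
ages = [23, 27, 27, 23, 30, 31, 90]

modes = statistics.multimode(ages)   # every most-frequent value
median = statistics.median(ages)     # middle value of the ranked data
mean = statistics.mean(ages)         # sum of values / number of values

# Same data without the extreme value, to see which measures move.
trimmed = [age for age in ages if age != 90]
median_trimmed = statistics.median(trimmed)
mean_trimmed = statistics.mean(trimmed)

print("modes:", modes)
print("median:", median, "-> without 90:", median_trimmed)
print("mean:", round(mean, 2), "-> without 90:", round(mean_trimmed, 2))
```

The median stays the same when the extreme value is dropped, while the mean changes noticeably, which is the distortion described on the slide.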
Measures of Location
• HOWEVER, the value of the median alone does not tell much, e.g. the median wage is 1,000 €. Are salaries clustered together or are they more widely spread?
• Salaries: 200; 250; 1,000; 2,000; 2,500.
• Salaries: 800; 900; 1,000; 1,100; 1,250.
• Knowing the spread of your data allows a more representative analysis.
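The two salary lists can be checked quickly. This minimal Python sketch (an illustration, not part of the Excel workflow) shows that both lists share the same median while their ranges differ widely; the function name `summary` is illustrative.

```python
import statistics

# The two salary lists from the slide, in euros.
salaries_a = [200, 250, 1000, 2000, 2500]
salaries_b = [800, 900, 1000, 1100, 1250]

def summary(values):
    """Return the median and the range (largest minus smallest value)."""
    return {"median": statistics.median(values),
            "range": max(values) - min(values)}

summary_a = summary(salaries_a)
summary_b = summary(salaries_b)
print("A:", summary_a)
print("B:", summary_b)
```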
Describing the Dispersion
• The following indicators help to measure the representativeness of the mean.
• The Range: subtract the smallest value from the largest. It should be used when values are clustered together.
• Quartiles method:
– The median is the midpoint; 50% of the data lie below it.
– The distribution can be divided into 4 equal parts: =QUARTILE(Array;quart)
– The lower quartile (1st quartile) is the value below which a quarter of your data values fall.
[Diagram: the ranked cases are split into four groups of 25% by Q1, Q2 (the median value) and Q3, running from the minimum value to the maximum value. The inter-quartile range spans Q1 to Q3.]
     Minimum   Q1   Q2 (median)   Q3   Maximum
A       0      15       20        25      47
B       0       5       20        40      47
Describing the Dispersion
• Inter-quartile range: the difference between the upper and lower quartiles, i.e. the range of the middle 50 per cent of values.
• It is not affected by extreme values.
• You can also calculate the range for other fractions of a variable's distribution.
= PERCENTILE(array; k), where k is the percentile expressed as a fraction between 0 and 1
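Quartiles, the inter-quartile range and percentiles can be sketched in Python with `statistics.quantiles` (Python 3.8 or later). The choice of `method="inclusive"` is an assumption made here because, as far as I know, it uses the same interpolation as Excel's QUARTILE and PERCENTILE; other software may give slightly different quartiles. The data are the two salary lists from the earlier slide, and the helper names are illustrative.

```python
import statistics

# The two salary lists from the earlier slide, in euros.
salaries_a = [200, 250, 1000, 2000, 2500]
salaries_b = [800, 900, 1000, 1100, 1250]

def quartiles(values):
    """Return [Q1, Q2, Q3] using inclusive interpolation."""
    return statistics.quantiles(values, n=4, method="inclusive")

def interquartile_range(values):
    """Return Q3 - Q1, the spread of the middle 50% of the values."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

def percentile(values, k):
    """Return the k-th percentile, with k a whole number from 1 to 99."""
    return statistics.quantiles(values, n=100, method="inclusive")[k - 1]

print("A quartiles:", quartiles(salaries_a), "IQR:", interquartile_range(salaries_a))
print("B quartiles:", quartiles(salaries_b), "IQR:", interquartile_range(salaries_b))
print("A 90th percentile:", percentile(salaries_a, 90))
```

Both lists have the same median, but the inter-quartile range separates them clearly.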
Describing the Dispersion
• Other measures of how the values differ from the mean:
– Variance,
– Standard deviation and
– Coefficient of variation.
• Variance: the average squared deviation from the mean. =VARP(Array) if the array is the population; =VAR(Array) if you are using a sample.
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$ (population form; the sample form divides by $N - 1$)
• Because the deviations are squared, negative deviations do not cancel out positive ones.
• However, this brings a problem of units of measurement: the variance is expressed in squared units.
Describing the Dispersion
• Standard deviation is the positive square root of the variance. = STDEV(Array)
• The smaller it is, the more concentrated the data are around the mean, so the mean is more representative.
• But the size of the standard deviation depends in part on the size of the mean itself.
• How can we solve this problem?
Describing the Dispersion
• The coefficient of variation is calculated as the standard deviation divided by the mean, multiplied by 100.
=(STDEV(Array)/AVERAGE(Array))*100
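The variance, standard deviation and coefficient of variation can be sketched in Python with the standard `statistics` module. Here `pvariance` plays the role of VARP, `variance` of VAR and `stdev` of STDEV (sample standard deviation). The data are the two salary lists from the earlier slide; the function name is illustrative.

```python
import statistics

# The two salary lists from the earlier slide, in euros.
salaries_a = [200, 250, 1000, 2000, 2500]
salaries_b = [800, 900, 1000, 1100, 1250]

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

population_variance_b = statistics.pvariance(salaries_b)  # divides by N
sample_variance_b = statistics.variance(salaries_b)       # divides by N - 1
sample_stdev_b = statistics.stdev(salaries_b)             # square root of the sample variance

cv_a = coefficient_of_variation(salaries_a)
cv_b = coefficient_of_variation(salaries_b)

print("B population variance:", population_variance_b)
print("B sample variance:", sample_variance_b)
print("B sample standard deviation:", round(sample_stdev_b, 2))
print("CV of A (%):", round(cv_a, 2))
print("CV of B (%):", round(cv_b, 2))
```

Because the coefficient of variation is relative to the mean, it allows the spread of variables with different means or units to be compared.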
The variance and standard deviation using Pivot tables
Left click and choose Value configuration.
PRACTICE 1 AGE.xls
• Calculate with Excel commands:
– 1st, 2nd and 3rd quartile
– Variance
– Standard deviation
– Coefficient of variation
• Calculate the same with Pivot tables
ANALYSIS TOOLPAK FOR EXCEL
WINDOWS
ADD INS: Analysis Tools & Solvers
Data > Data Analysis > Descriptive Statistics
MAC
ADD INS: Analysis Tools & Solvers
Analysis Toolpak & Analysis toolpak
DATA ANALYSIS > Descriptive Statistics
Microsoft Office 2011 for Mac
Download the free version of StatPlus
Watch the video:
http://www.youtube.com/watch?v=F61HTaUsxH8
PRACTICE 1 AGE.xls
• Use the Excel Analysis ToolPak to generate a summary of the statistics for AGE.
Frequency Distribution Table
Count of Gender
Class         Gender    Total
1st           Female      145
              Male        180
Total 1st                 325
2nd           Female      106
              Male        179
Total 2nd                 285
3rd           Female      196
              Male        510
Total 3rd                 706
Crew          Female       23
              Male        862
Total Crew                885
Grand total              2201
First move Class and then Gender to the Row values box.
Finally, move Gender to Values.
Graphical Comparison
• Multiple bar chart to compare highest and lowest values.
• Stacked bar chart to compare totals.
• Compound bar chart to compare proportions.
PRACTICE 2 Titanic.xls
• Represent the frequency by class and
gender using tables and graphs
Graphical Comparison
• A multiple line graph to compare trends and conjunctions.
[Multiple line graph: one line for Male (Hombre) and one for Female (Mujer), 2000 to 2003; vertical axis from 0 to 14]
Graphical Comparison
• Comparative proportional pie charts to compare proportions and totals.
[Pie charts: Total population and age structure]
Graphical Comparison
• Multiple boxplots to compare the distribution of values.
[Boxplots showing the upper quartile, the median and the lower quartile. Panels: distribution not skewed with data compressed; distribution skewed with data compressed; distribution not skewed with data elongated]
This type of graph is not available in Excel, but it is in SPSS.
Contingency Tables
• Contingency tables, or cross tabulations, show specific values and the interdependence of categorical variables.
Class    Not survived         Survived             Total
1st        122   37.54%         203   62.46%         325   100.00%
2nd        167   58.60%         118   41.40%         285   100.00%
3rd        528   74.79%         178   25.21%         706   100.00%
Crew       673   76.05%         212   23.95%         885   100.00%
Total    1,490   67.70%         711   32.30%       2,201   100.00%
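The row percentages in the contingency table can be recomputed from the counts. This is a minimal Python sketch (Python is an assumption; the course builds the table with an Excel pivot table). The dictionary layout and the function name are illustrative.

```python
# Counts copied from the contingency table (class by survival).
counts = {
    "1st":  {"Not survived": 122, "Survived": 203},
    "2nd":  {"Not survived": 167, "Survived": 118},
    "3rd":  {"Not survived": 528, "Survived": 178},
    "Crew": {"Not survived": 673, "Survived": 212},
}

def row_percentages(row):
    """Express each cell as a percentage of its row total."""
    row_total = sum(row.values())
    return {label: round(100 * value / row_total, 2)
            for label, value in row.items()}

# Column totals give the "Total" row of the table.
totals = {
    "Not survived": sum(row["Not survived"] for row in counts.values()),
    "Survived": sum(row["Survived"] for row in counts.values()),
}

for name, row in counts.items():
    print(name, row, row_percentages(row))
print("Total", totals, row_percentages(totals))
```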
Rows
Columns
Values shown
Relationships Between Variables
• How does a variable relate to another
variable?
• What is the strength of the relationship, and
is it statistically significant?
• Two types of questions:
– Measures of association: Is there a relationship between wage and gender? (This asks whether two variables are related.)
– Inferential statistics (tests of differences): Are consumption habits the same for older and younger passengers? (This checks whether the results from a sample can be generalized to the population.)
Measures of Association
• Two variables are said to be associated if one variable varies in accordance with the other.
• Two types of relationship:
– Cause-and-effect relationship: a change in one or more (independent) variables causes a change in another (dependent) variable, e.g. wage disparities related to the gender of workers.
– Correlation: a change in one variable is accompanied by a change in another variable, but it is not clear which variable caused the other to change.
Measures of Association
• Scatter graph
• Cross tabulation
• Correlation coefficient
• Significance testing
• Regression
Measures of Association
Scatter Graph
• It plots the values of one variable against those of another for each case.
• The dependent variable is always placed on the vertical (y) axis. The independent variable is placed on the horizontal (x) axis.
• It shows whether the relationship is linear.
• You can distinguish whether there are different groups.
• It reveals outliers (check that they are not data entry errors).
• You can add a trend line.
[Scatter graph: Test B (vertical axis, 520 to 660) against Test A (horizontal axis, 0 to 500)]
Measures of Association
Cross Tabulation
The dependent variable forms the columns and the independent variable forms the rows.
           SURVIVED (DEPENDENT)
           No                   Yes                  Total
Female        80   17.02%         390   82.98%         470   100.00%
Male       1,410   81.46%         321   18.54%       1,731   100.00%
Total      1,490   67.70%         711   32.30%       2,201   100.00%
To compare females and males, we calculate the percentage that each cell represents within its row.
Measures of Association
Cross Tabulation
• The size of the percentage difference between the rows, within a column, is an indication of the strength of the association. The closer the figure is to 100%, the stronger the association. Here it is about 64 percentage points (the survival rate of females against that of males), so gender appears to have strongly influenced the chances of survival.
• If there are too many categories or values in the dependent variable, you should group them so that relationships are easier to find.
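The 64-point difference can be checked directly from the counts of the cross tabulation. This is a minimal Python sketch; the dictionary layout and the function name `survival_rate` are illustrative.

```python
# Counts copied from the gender by survival cross tabulation.
counts = {
    "Female": {"No": 80, "Yes": 390},
    "Male":   {"No": 1410, "Yes": 321},
}

def survival_rate(row):
    """Percentage of the row total that falls in the 'Yes' column."""
    return 100 * row["Yes"] / (row["No"] + row["Yes"])

female_rate = survival_rate(counts["Female"])
male_rate = survival_rate(counts["Male"])

# Percentage-point difference between the two rows in the 'Yes' column.
difference = female_rate - male_rate

print(f"Female: {female_rate:.2f}%  Male: {male_rate:.2f}%")
print(f"Difference: {difference:.2f} percentage points")
```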
Identify the variables that you want to show in the table
Independent variable
Establish what variable is the Dependent
Values (frequency, percentage…)
Correlation Coefficient
• It measures the degree to which two ranked or quantifiable variables co-vary. It is a measure of association that indicates how strongly two variables are related.
= CORREL(array1,array2)
• It takes values from -1 to 1, indicating the direction and strength of the association.
• It measures linear association.
• It can be misleading: you can get a value of 0.8 and conclude that there is a positive linear relationship when the relationship is in fact curvilinear.
        Var1    Var2
Var1    1
Var2    0.82    1
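The warning about curvilinear relationships can be illustrated with a minimal Python sketch. It implements the Pearson correlation coefficient by hand (the quantity I understand CORREL to return) and applies it to hypothetical data: a perfectly linear series and a purely quadratic one. The function and variable names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

# Hypothetical data for illustration only.
x = list(range(1, 11))
linear = [2 * value + 1 for value in x]   # exact straight line
curved = [value ** 2 for value in x]      # exact curve, not a line
falling = [-value for value in x]         # exact line with negative slope

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
r_falling = pearson_r(x, falling)

print("linear:", round(r_linear, 4))
print("curved:", round(r_curved, 4))
print("falling:", round(r_falling, 4))
```

The quadratic series still gives a coefficient close to 1, so a high value alone does not prove that the relationship is linear; a scatter graph should be checked as well.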
Correlation Coefficient
• A correlation coefficient is simply a number. It does not imply any causal relationship.
• You can find a high correlation between foot size and quality of handwriting, but both are driven by a third variable, age, which increases both.
• Causation could be supposed if:
– The association is repeated in different circumstances
– A plausible explanation can be offered
– No equally plausible third variable could cause changes in both variables together.
Regression
• Measuring the size of the impact of one variable on another is often more practical than measuring only the strength of their relationship.
• Regression analysis is a set of statistical procedures that estimates the values of a dependent variable based on the values of one or more independent variables.
Regression
• Regression analysis basically calculates the equation of the best-fit line, which in the case of linear regression takes the form:
• Y = a + b*X or
• Y = a + b1*X1 + b2*X2 + ... + bp*Xp (multiple regression analysis)
• Here a is the constant that represents the point at which the line crosses the y axis, b is the coefficient representing the slope of the line, and Y and X are the dependent and independent variables respectively.
• You estimate the values of a and b, and then substitute X to get the estimated Y.
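Estimating a and b for the simple case Y = a + b*X can be sketched in a few lines of Python using ordinary least squares, which is the usual way a best-fit line is computed. The hours and wages below are hypothetical values for illustration; they are not taken from Titanic.xls, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates (a, b) for Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

def predict(a, b, x):
    """Substitute X into the fitted equation to get the estimated Y."""
    return a + b * x

# Hypothetical weekly hours worked and wages.
hours = [10, 20, 30, 40, 50]
wages = [190, 350, 560, 700, 910]

a, b = fit_line(hours, wages)
estimated_wage = predict(a, b, 35)

print(f"Y = {a:.2f} + {b:.2f} * X")
print("estimated wage for 35 hours:", round(estimated_wage, 2))
```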
[Figure: Regression Line]
Regression
STRENGTH OF A CAUSE-AND-EFFECT RELATIONSHIP
• Coefficient of determination, R^2 (not to be confused with the regression coefficient b).
• It measures the proportion of the variation in the dependent variable that can be explained statistically by the independent variable or variables.
• It can take any value between 0 and 1.
• If 50% of the variation can be explained, the coefficient will be 0.5. In practice its value is rarely above 0.8.
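The coefficient of determination can be computed as one minus the share of the variation left unexplained by the fitted line. This is a minimal Python sketch for the simple linear case, using hypothetical hours and wages (not data from Titanic.xls); the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates (a, b) for Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b

def r_squared(xs, ys):
    """Proportion of the variation in ys explained by the fitted line."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    total_variation = sum((y - mean_y) ** 2 for y in ys)
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - unexplained / total_variation

# Hypothetical weekly hours worked and wages.
hours = [10, 20, 30, 40, 50]
wages = [190, 350, 560, 700, 910]

r2 = r_squared(hours, wages)

# A series that lies exactly on a line is fully explained.
r2_exact = r_squared(hours, [2 * h + 1 for h in hours])

print("R^2:", round(r2, 4))
print("R^2 for an exact line:", round(r2_exact, 4))
```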
• If you include a categorical variable, you have to analyse it as a set of dummy variables. For example, if you have a variable with 3 categories, you create 2 dummies.
Dept.             Original variable   Var1   Var2
Family Studies            1             1      0
Biology                   2             0      1
Business                  3             0      0
Dept = Department (1 = Family Studies, 2 = Biology, 3 = Business)
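The dummy coding in the table can be sketched in Python. The last category (Business) is treated as the reference category, coded 0 on both dummies, as in the table. The function name and the sample list of cases are hypothetical.

```python
# Categories in the order used on the slide; the last one is the reference.
categories = ["Family Studies", "Biology", "Business"]

def to_dummies(value):
    """Return (Var1, Var2): one 0/1 dummy per non-reference category."""
    if value not in categories:
        raise ValueError(f"unknown department: {value}")
    return tuple(1 if value == category else 0
                 for category in categories[:-1])

# Hypothetical cases, for illustration only.
departments = ["Biology", "Business", "Family Studies", "Biology"]
coded = [to_dummies(department) for department in departments]

for department, dummies in zip(departments, coded):
    print(f"{department:<16}{dummies}")
```

Using one dummy fewer than the number of categories avoids redundant information: a case that is 0 on every dummy must belong to the reference category.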
WINDOWS: Data > Data Analysis
– Regression
– Correlation Coefficient
Data Analysis: Regression, Correlation Coefficient
OUTCOME FROM EXCEL
Example: regression analysis of wages on hours worked, from Titanic.xls
• The multiple correlation coefficient represents the relationship between the observed values of the dependent variable and those predicted.
Regression Statistics
Multiple correlation coefficient    0.9978
Determination coefficient R^2       0.9956
R^2 adjusted                        0.9951
Standard error                     81.3712
Observations                       2201
             Coefficients   Standard error   T statistic   Prob.   Lower 95%   Upper 95%
Intercept         0             #N/A            #N/A        #N/A     #N/A        #N/A
Variable 1       85.58          0.12           705.27       0.00     85.34       85.82
PRACTICE 2 Titanic.xls
Do the regression analysis between
weekly hours of work and wages.