27/11/2013


Research Methods Lesson Nº10

Introduction

• To answer your research question you need to:

– Organize the data

– Analyse the data

– Interpret it

But remember: IF DATA IS COLLECTED BADLY, THE ANALYSIS IS WORTHLESS.

• Data analysis usually employs variables. A variable is a characteristic of the population or sample observed.

– Quantitative variables are those that are expressed numerically. They are the basis of quantitative research.

• Discrete variables are restricted to whole numbers.

• Continuous variables can take any value within a given range.

– Qualitative variables are those that are not measured numerically.

Types of Variables

1. Nominal variables: involve categories, which must be mutually exclusive (male/female).

2. Ordinal variables: measure the intensity of something by ordering categories (agree, ..., indifferent, ..., strongly disagree). The spaces between categories, however, may not be equal.

3. Interval/ratio variables: allow differences between values to be measured. In the case of ratio variables there is a true zero point (total absence of the variable), e.g. age, income.

Exploring and representing data

• The analysis of data starts with the data layout.

• Record data using numerical codes.

• Data is usually entered in table format. This table is called a data matrix.

– Each column usually represents a single variable.

– Each row contains the values of the variables for an individual case or time period.

          Id   Age   Gender   Service   Employed
Case 1     1    27        1         3          1
Case 2     2    39        2         1          2
Case 3     3    34        1         2          1
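As a rough sketch, the data matrix could be held in Python as a list of rows, one per case, with one key per variable. The meaning given to the gender codes below is an assumption for illustration only; the slide does not say what each code stands for.

```python
# Each row is one case; each key is one variable (one column of the data matrix).
data_matrix = [
    {"id": 1, "age": 27, "gender": 1, "service": 3, "employed": 1},
    {"id": 2, "age": 39, "gender": 2, "service": 1, "employed": 2},
    {"id": 3, "age": 34, "gender": 1, "service": 2, "employed": 1},
]

# Hypothetical code book: the labels attached to codes 1 and 2 are assumed.
GENDER_LABELS = {1: "female", 2: "male"}


def column(rows, variable):
    """Return all values of one variable, i.e. one column of the matrix."""
    return [row[variable] for row in rows]


ages = column(data_matrix, "age")
genders = [GENDER_LABELS[code] for code in column(data_matrix, "gender")]

print(ages)     # [27, 39, 34]
print(genders)  # ['female', 'male', 'female']
```

Keeping the numerical codes in the matrix and the labels in a separate code book mirrors the advice to record data using numerical codes.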

Exploring (Cont.)

• Graphs help to explore and understand your data.

• What should diagrams and tables include?

– Clear descriptive title

– Clear axis labels / row and column headings

– Units of measurement

– Logical sequence of bars / columns and rows

– Sources of data

– Explanations for every abbreviation

– Size of the sample employed


EXPLORING AND REPRESENTING DATA

Individual variable: frequency distribution, graphs, statistics

Several variables: graphs, contingency tables, relationships


Exploratory analysis

• Frequency distribution table

• To achieve clarity use percentages

Nationality   Frequency
Portuguese           20
French               25
Spanish              15
Mexican               5

Defects?
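One likely answer to the question is that the table shows neither a total nor percentages. A minimal Python sketch, using only the four frequencies from the table, adds both:

```python
# Frequencies copied from the nationality table.
frequencies = {"Portuguese": 20, "French": 25, "Spanish": 15, "Mexican": 5}

total = sum(frequencies.values())

# Relative frequency (%) of each category.
relative = {name: 100 * count / total for name, count in frequencies.items()}

print(f"{'Nationality':<12}{'Freq.':>6}{'%':>8}")
for name, count in frequencies.items():
    print(f"{name:<12}{count:>6}{relative[name]:>8.1f}")
print(f"{'Total':<12}{total:>6}{100:>8.1f}")
```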

Practice1.xls shows the age of the participants:

43 35 48 43 11 15 38 21 25 37 17 31 49 11 11 47 42 25 27 36 27 13 13 35 23 47 34 23 27 30 41 25 41 18 24 18 18 28 40 32 40 24 26 47 30 22 48 30 37 23 33 42 29 47 32 29 24 19 35 35 20 31 46 36 45 31 44 26 20 44 27 23 43 31 40 27 23 28 46 28

• First, identify the number of intervals or groups (n). It can be calculated with the rule of thumb 2^n > number of cases.

• In this case the number of cases is 80; therefore n = 7 (2^6 = 64 is too small, 2^7 = 128 is enough).

• The interval width is calculated as (largest value - smallest value) / n = (49 - 11) / 7 = 5.43, which is rounded up to 6.

• Consequently there are 7 intervals of width 6, which must start at the smallest value and cover the largest:

• 11-16; 17-22; ...; 47-52

• Then you calculate the frequency for each interval.

• Excel commands: MAX(range), MIN(range), FREQUENCY(range of data; range of intervals), entered with Ctrl+Shift+Enter.
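The same steps can be sketched in Python as a cross-check on the spreadsheet. The ages are the 80 values listed for Practice1.xls; the variable names are illustrative, and rounding the width up to a whole number is the choice made in the steps above.

```python
import math

ages = [
    43, 35, 48, 43, 11, 15, 38, 21, 25, 37, 17, 31, 49, 11, 11, 47, 42, 25, 27, 36,
    27, 13, 13, 35, 23, 47, 34, 23, 27, 30, 41, 25, 41, 18, 24, 18, 18, 28, 40, 32,
    40, 24, 26, 47, 30, 22, 48, 30, 37, 23, 33, 42, 29, 47, 32, 29, 24, 19, 35, 35,
    20, 31, 46, 36, 45, 31, 44, 26, 20, 44, 27, 23, 43, 31, 40, 27, 23, 28, 46, 28,
]

n_cases = len(ages)

# Rule of thumb: the smallest n such that 2**n > number of cases.
n_intervals = 1
while 2 ** n_intervals <= n_cases:
    n_intervals += 1

# Interval width, rounded up to a whole number of years.
width = math.ceil((max(ages) - min(ages)) / n_intervals)

# Count the cases falling in each interval, starting at the smallest value.
start = min(ages)
counts = [0] * n_intervals
for age in ages:
    counts[(age - start) // width] += 1

# Build the table: interval label, frequency, relative % and cumulative %.
rows = []
cumulative = 0.0
for i, count in enumerate(counts):
    low = start + i * width
    high = low + width - 1  # ages are whole numbers, so 11-16 holds six values
    percent = 100 * count / n_cases
    cumulative += percent
    rows.append((f"{low}-{high}", count, percent, cumulative))

for label, count, percent, cum in rows:
    print(f"{label:<8}{count:>4}{percent:>8.2f}{cum:>8.2f}")
```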

The result will be:

Age interval   Frequency   Relative frequency (%)   Cumulative frequency (%)
11-16                  6                     7.50                       7.50
17-22                  9                    11.25                      18.75
23-28                 21                    26.25                      45.00
29-34                 13                    16.25                      61.25
35-40                 12                    15.00                      76.25
41-46                 12                    15.00                      91.25
47-52                  7                     8.75                     100.00
TOTAL                 80                   100.00


PRACTICE 1 AGE.xls

• Calculate the frequency

• Relative frequency

• Cumulative frequency

Frequency Distributions using Pivot Tables (Windows)

• Go to Insert and choose Pivot Table (tableau croisé dynamique in the French version of Excel).

• It will ask you for the data range. Be sure there are no blank rows or columns.

Variables

Values of the variable selected: frequencies, percentages, ...

The table updates every time you change any parameter. Solution: copy the results.

What kind of values can it show? Left click and choose Value configuration.

To group data, go to Table Tools, Options, Group Categories.

If the original data changes you have to update the pivot table: Options, Update.


Frequency Distributions in Excel for Macintosh

Pivot Table (tableau croisé dynamique)

Variables

Values of the variable selected: frequencies, percentages, ...

The table updates every time you change any parameter. Solution: copy the results.

What kind of values can it show? Left click and choose Value configuration.

If the original data changes you have to update the pivot table.

PRACTICE 2 Titanic.xls

• Calculate the frequency by gender

• Relative frequency

PRACTICE 1 AGE.xls

• Calculate the frequency by intervals of width 6

• Relative frequency

Graphs: Bar Charts

• A number of separate bars whose height represents the data values.

• They make it easy to identify the highest and lowest frequencies.

• To see the precise values, select the corresponding option.

• The scale of the axis can exaggerate findings.

[Figure: bar chart "Number of individuals by class"; vertical axis from 0 to 1000; bars: 1st, 2nd, 3rd, Crew]

Windows: go to Insert and choose the type of graph. For pivot tables, graphs are defined in the Options menu.

• Use bar charts for discrete data and histograms for continuous data.

• In a histogram the bars are contiguous, to depict the continuous nature of the categories.

Individual variables (Cont.)

Distributions

Individual variables (Cont.): Pie Chart

– Each slice represents the percentage of cases falling in each category.

– It should not have too many segments.

– To show percentages in Excel…

[Figure: pie chart "Percentage of individuals by class": 1st 15%, 2nd 13%, 3rd 32%, Crew 40%]

To show percentages, select the graph and click Design.

Charts: choose the type of graph. For pivot tables, graphs are defined in the Options menu.

Individual variables (Cont.): Line Graphs

– They show trends over time.

– They show peaks, and their slope indicates the rate of any increase or decrease.

– The independent variable must be on the horizontal axis.

– You can add a trend line.

[Figure: line graph "Failure Rate", 2001 to 2010; vertical axis from 0 to 50]


PRACTICE 1 AGE.xls

• Build the Bar chart and the Pie chart.

Statistics

• Statistics is a branch of mathematics concerned with the analysis of numerical data.

• It can be divided into descriptive and inferential statistics:

– Descriptive indicators try to summarize data. They focus on two aspects:

• the central tendency;

• the dispersion.

– Inferential indicators try to infer the characteristics of the population from a sample.

Statistics

Descriptive: central tendency (location), dispersion

Inferential: significance testing, hypothesis testing

Measures of Location

• The average can be interpreted in three different ways:

1. The Mode

2. The Median

3. The Mean or arithmetic mean

• Analyze all of them

1. The Mode is the value that occurs most frequently. =MODE(range of data)

– It can be misleading if it is found at the end of a range of data.

– Moreover, there could be several modes.

2. The Median is the middle value, so half of the ranked scores are below it and half above. =MEDIAN(range of data)

– It is not disturbed by extreme values.

3. The Mean, or arithmetic mean, is the sum of the values divided by the number of values. =AVERAGE(Array)

– Unlike the other two measures, it uses all the values in the distribution for its calculation.

– It can be distorted by extreme values.
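As an illustration of the three measures, the sketch below uses Python's standard statistics module on a small made-up set of scores (the data are invented for this example, not taken from the practice files). One extreme value is enough to pull the mean away while the mode and the median stay put.

```python
import statistics

# Invented scores with one extreme value (100).
scores = [1, 2, 2, 3, 4, 5, 100]

modes = statistics.multimode(scores)   # list of the most frequent value(s)
median = statistics.median(scores)     # middle value of the ranked scores
mean = statistics.mean(scores)         # sum of the values / number of values

print("mode(s):", modes)
print("median: ", median)
print("mean:   ", round(mean, 2))

# A distribution can have several modes.
two_modes = statistics.multimode([1, 1, 2, 2, 3])
print("several modes:", two_modes)
```

The roles match the Excel commands MODE, MEDIAN and AVERAGE named in the slides; multimode is used here because it returns every mode rather than just one.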

• HOWEVER, the value of the median alone does not tell much, e.g. the median wage is 1,000 €. Are salaries clustered together or are they more widely spread?

• Salaries: 200; 250; 1,000; 2,000; 2,500.

• Salaries: 800; 900; 1,000; 1,100; 1,250.

• Knowing the spread of your data allows a more representative analysis.
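A short Python check of the two salary lists makes the point concrete: both have the same median, but very different spreads. The names group_a and group_b are labels chosen for this sketch.

```python
import statistics

group_a = [200, 250, 1000, 2000, 2500]
group_b = [800, 900, 1000, 1100, 1250]


def summarize(salaries):
    """Return the median and the range (largest minus smallest value)."""
    return {
        "median": statistics.median(salaries),
        "range": max(salaries) - min(salaries),
    }


summary_a = summarize(group_a)
summary_b = summarize(group_b)

print("A:", summary_a)  # A: {'median': 1000, 'range': 2300}
print("B:", summary_b)  # B: {'median': 1000, 'range': 450}
```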

Describing the Dispersion

• The following indicators help to measure the representativeness of the mean.

• The Range: subtract the smallest value from the largest. It should be used when values are clustered together.

• Quartiles method:

– The median is the midpoint; 50% of the data lie below it.

– The distribution can be divided into 4 equal parts: =QUARTILE(Array;quart)

– The lower quartile (1st quartile) is the value below which a quarter of your data values fall.

[Figure: a distribution split into four groups of 25% of the cases each, bounded by the minimum value, Q1, Q2 (the median value), Q3 and the maximum value; the inter-quartile range spans Q1 to Q3]

     Minimum   Q1   Q2 (median)   Q3   Maximum
A          0   15            20   25        47
B          0    5            20   40        47

Describing the Dispersion

• Inter-quartile range: the difference between the upper and lower quartiles, i.e. the spread of the middle 50 per cent of values.

• It is not affected by extreme values.

• You can also calculate the range for other fractions of the variable's distribution. =PERCENTILE(Array; k), where k is the percentile expressed as a fraction between 0 and 1.
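A minimal sketch of quartiles and the inter-quartile range in Python follows. It assumes two toy data sets of five values each, chosen so that their quartiles reproduce the A and B figures shown earlier (same range, different middle spread). The method="inclusive" option of statistics.quantiles interpolates between the ranked values and should mirror Excel's QUARTILE and PERCENTILE, though that equivalence is stated here as an assumption.

```python
import statistics

# Toy data sets (assumed for illustration): same minimum and maximum.
data_a = [0, 15, 20, 25, 47]
data_b = [0, 5, 20, 40, 47]


def quartiles(values):
    """Return (Q1, Q2, Q3) using inclusive interpolation."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


def iqr(values):
    """Inter-quartile range: Q3 minus Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


print("A:", quartiles(data_a), "IQR =", iqr(data_a))
print("B:", quartiles(data_b), "IQR =", iqr(data_b))

# Other fractions of the distribution, e.g. the 90th percentile of A.
p90_a = statistics.quantiles(data_a, n=100, method="inclusive")[89]
print("90th percentile of A:", p90_a)
```

Both sets have a range of 47, but the inter-quartile range separates them clearly.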

Describing the Dispersion

• Other measures calculate how the values differ from the mean:

– Variance,

– Standard deviation and

– Coefficient of variation.

• Variance: the average squared deviation from the mean. =VARP(Array) if the array is the whole population; =VAR(Array) if you are using a sample.

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 \quad \text{(population)}, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \quad \text{(sample)}$$

• It is in squared units, so the negative deviations do not cancel out the positive ones.

• However, this brings a problem of units of measurement.

Describing the Dispersion

• Standard deviation: the positive square root of the variance. =STDEV(Array)

• The smaller it is, the more concentrated the data are around the mean, so the mean is more representative.

• But the size of the standard deviation depends in part on the size of the mean itself.

• How can we solve this problem?

Describing the Dispersion

• Coefficient of variation: the standard deviation divided by the mean and then multiplied by 100.

=(STDEV(Array)/AVERAGE(Array))*100
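To see the three dispersion measures side by side, the sketch below applies Python's statistics module to the two salary lists used earlier. The pairing with Excel is by role: pvariance divides by N like VARP, while variance and stdev divide by n - 1 like VAR and STDEV. The function name dispersion is an illustrative choice.

```python
import statistics

group_a = [200, 250, 1000, 2000, 2500]
group_b = [800, 900, 1000, 1100, 1250]


def dispersion(values):
    """Return variance, standard deviation and coefficient of variation."""
    mean = statistics.mean(values)
    sample_sd = statistics.stdev(values)          # square root of the sample variance
    return {
        "population_variance": statistics.pvariance(values),  # divides by N
        "sample_variance": statistics.variance(values),       # divides by n - 1
        "stdev": sample_sd,
        "cv_percent": 100 * sample_sd / mean,     # standard deviation relative to the mean
    }


result_a = dispersion(group_a)
result_b = dispersion(group_b)

for name, result in (("A", result_a), ("B", result_b)):
    print(name, {key: round(value, 2) for key, value in result.items()})
```

Because the coefficient of variation is expressed relative to the mean, it allows the spread of group A and group B to be compared even when their means differ.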

The variance and standard deviation using pivot tables: left click and choose Value configuration.

PRACTICE 1 AGE.xls

• Calculate with Excel commands:

– 1st, 2nd and 3rd quartile

– Variance

– Standard deviation

– Coefficient of Variation

• Calculate the same with Pivot tables

ANALYSIS TOOLPAK FOR EXCEL

Windows: Add-ins, Analysis Tools & Solvers

Then go to Data, Data Analysis and choose Descriptive Statistics.

Mac: Add-ins, Analysis Tools & Solvers, Analysis ToolPak

Then go to Data Analysis and choose Descriptive Statistics.

Microsoft Office 2011 for Mac

Download the free version of StatPlus.

Watch the video: http://www.youtube.com/watch?v=F61HTaUsxH8

PRACTICE 1 AGE.xls

• Use the Excel Analysis ToolPak to generate a summary of the statistics for AGE.

Frequency Distribution Table

Count of Gender

Class        Gender    Total
1st          Female      145
             Male        180
Total 1st                325
2nd          Female      106
             Male        179
Total 2nd                285
3rd          Female      196
             Male        510
Total 3rd                706
Crew         Female       23
             Male        862
Total Crew               885
Grand total             2201

First move Class and then Gender to the Rows box. Finally, move Gender to the Values box.

Graphical Comparison

• Multiple bar chart: to compare highest and lowest values.

• Stacked bar chart: to compare totals.

• Compound bar chart: to compare proportions.

PRACTICE 2 Titanic.xls

• Represent the frequency by class and gender using tables and graphs.

• Multiple line graph: to compare trends and conjunctions (points where the lines meet).

[Figure: multiple line graph, 2000 to 2003; vertical axis from 0 to 14; series: Hombre (men), Mujer (women)]

• Comparative proportional pie charts: to compare proportions and totals.

[Figure: proportional pie charts, "Total population and age structure"]

• Multiple boxplots: to compare the distribution of values.

[Figure: three boxplots with the upper quartile, median and lower quartile marked. Panels: distribution not skewed with data compressed; distribution skewed with data compressed; distribution not skewed with data elongated]

This type of graph is not available in Excel 2013 or earlier, but it is in SPSS.

Contingency Tables

• Contingency tables, or cross tabulations, show specific values and the interdependence of categorical variables.

Class    Not survived          Survived              Total
1st        122   37.54%          203   62.46%          325   100.00%
2nd        167   58.60%          118   41.40%          285   100.00%
3rd        528   74.79%          178   25.21%          706   100.00%
Crew       673   76.05%          212   23.95%          885   100.00%
Total    1,490   67.70%          711   32.30%        2,201   100.00%
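The row percentages in a contingency table like this one can be reproduced with a few lines of Python. The counts are those of the table; the dictionary layout and the function name are choices made for this sketch.

```python
# Counts per class: (not survived, survived), copied from the table.
counts = {
    "1st": (122, 203),
    "2nd": (167, 118),
    "3rd": (528, 178),
    "Crew": (673, 212),
}


def row_percentages(table):
    """Express each cell as a percentage of its row total, rounded to 2 decimals."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells)
        result[row_label] = [round(100 * cell / row_total, 2) for cell in cells]
    return result


percentages = row_percentages(counts)
for row_label, (not_survived, survived) in percentages.items():
    print(f"{row_label:<5}{not_survived:>8.2f}%{survived:>8.2f}%")
```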

[Screenshot: pivot table settings showing the Rows, Columns and Values fields]

Relationships Between Variables

• How does a variable relate to another variable?

• What is the strength of the relationship, and is it statistically significant?

• Two types of questions:

– Measures of association: Is there a relationship between wage and gender? (This states a relationship between variables.)

– Inferential statistics (tests of differences): Are consumption habits the same for older and younger passengers? (This checks whether the results from a sample can be generalized to the population.)

Measures of Association

• Two variables are said to be associated if one variable varies in accordance with the other.

• Two types of relationship:

– Cause-and-effect relationship: a change in one or more (independent) variables causes a change in another (dependent) variable, e.g. wage disparities related to the gender of workers.

– Correlation: a change in one variable is accompanied by a change in another variable, but it is not clear which variable caused the other to change.

Measures of Association: scatter graph, cross tabulation, correlation coefficient, significance testing, regression

Measures of Association: Scatter Graph

• It plots the values of one variable against those of another for each case.

• The dependent variable is always placed on the vertical (y) axis. The independent variable is placed on the horizontal (x) axis.

• It shows whether the relation is linear.

• You can distinguish whether there are different groups.

• It reveals outliers (check that they are not data entry errors).

• You can add a trend line.

[Figure: scatter graph of Test B (vertical axis, 520 to 660) against Test A (horizontal axis, 0 to 500)]

Measures of Association: Cross Tabulation

The dependent variable forms the columns and the independent variable forms the rows.

          SURVIVED (DEPENDENT)
          No                    Yes                   Total
Female       80   17.02%          390   82.98%          470   100.00%
Male      1,410   81.46%          321   18.54%        1,731   100.00%
Total     1,490   67.70%          711   32.30%        2,201   100.00%

To compare the difference between Female and Male, we calculate the percentage they represent per row.

• The size of the percentage difference between the rows within a column is an indication of the strength of the association. The closer the figure is to 100%, the stronger the association. Here it is about 64 percentage points, so it looks like gender determined the chances of survival.

• If there are too many categories or values in the dependent variable, you should group them so that relationships are easier to find.
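The percentage difference can be worked out directly from the counts in the gender cross tabulation. The sketch below is only an illustration of that arithmetic; the dictionary keys are names chosen here.

```python
# Counts from the cross tabulation of gender (rows) by survival (columns).
survival = {
    "Female": {"no": 80, "yes": 390},
    "Male": {"no": 1410, "yes": 321},
}


def survival_rate(cells):
    """Percentage of the row that survived."""
    return 100 * cells["yes"] / (cells["no"] + cells["yes"])


female_rate = survival_rate(survival["Female"])
male_rate = survival_rate(survival["Male"])

# Percentage difference between the two rows of the "Yes" column.
difference = female_rate - male_rate

print(f"Female survival: {female_rate:.2f}%")
print(f"Male survival:   {male_rate:.2f}%")
print(f"Difference:      {difference:.0f} percentage points")
```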

Identify the variables that you want to show in the table:

– the independent variable,

– which variable is the dependent one,

– the values (frequency, percentage, ...).

Correlation Coefficient

• It measures the degree to which two ranked or quantifiable variables co-vary. It is a measure of association that indicates how strongly two variables are related. =CORREL(array1,array2)

• It takes values from -1 to 1, indicating the direction and strength of the association.

• It measures linear association.

• It can be misleading: you can get a value of 0.8 and conclude that there is a positive linear relation when in fact the relation is curvilinear.

        Var1   Var2
Var1       1
Var2    0.82      1
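The warning about curvilinear relations can be checked with a hand-written Pearson coefficient. This is a sketch with invented data: one series is exactly linear in x, the other is x squared, and the coefficient is high in both cases.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


xs = list(range(1, 11))            # 1, 2, ..., 10
linear = [2 * x + 1 for x in xs]   # exactly linear in x
curved = [x ** 2 for x in xs]      # curvilinear in x

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)

print(f"linear: r = {r_linear:.3f}")
print(f"curved: r = {r_curved:.3f}")
```

A coefficient close to 1 for the curved series shows why a scatter graph should be inspected before concluding that a relation is linear.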

• A correlation coefficient is simply a number. It does not imply any causal relationship.

• You can find a high correlation between the size of feet and the quality of handwriting. But this is caused by age, which increases both.

• Causation could be supposed if:

– The association is repeated in different circumstances.

– A plausible explanation can be offered.

– No equally plausible third variable could cause changes in both variables together.

Regression

• Measuring the size of the impact of one variable on another is more practical than measuring the strength of a relationship.

• Regression analysis is a set of statistical procedures that estimates the values of a dependent variable based on the values of one or more independent variables.

• Regression analysis basically calculates the equation of the best-fit line, which in the case of linear regression takes the form:

• Y = a + b*X or

• Y = a + b1*X1 + b2*X2 + ... + bp*Xp (multiple regression analysis)

• Here a is the constant that represents the point at which the line crosses the y axis, b is the coefficient representing the slope of the line, and Y and X are the dependent and independent variables respectively.

• You estimate the values of a and b, and then, substituting X, you get the estimated Y.
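For the simple case Y = a + b*X, the estimates of a and b can be sketched with ordinary least squares, which is one common way of finding the best-fit line (the slides do not name the estimation method, so this is an assumption). The five data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx             # slope of the line
    a = mean_y - b * mean_x   # point where the line crosses the y axis
    return a, b


# Invented observations that lie close to Y = 1 + 2*X.
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.1, 7.0, 9.2, 10.8]

a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f}*X")

# Substituting X gives the estimated Y.
estimated_y = a + b * 6
print(f"estimated Y for X = 6: {estimated_y:.2f}")
```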

[Figure: regression line]

STRENGTH OF A CAUSE-AND-EFFECT RELATIONSHIP

• Regression coefficient or coefficient of determination, R^2.

• It measures the proportion of the variation in the dependent variable that can be explained statistically by the independent variable or variables.

• It can take any value between 0 and 1.

• If 50% of the variation can be explained, the coefficient will be 0.5. Its value is rarely above 0.8.
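The idea of "proportion of variation explained" can be sketched as one minus the ratio of unexplained to total variation. The example below uses invented data and a least-squares line (an assumed estimation method); the function names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b


def r_squared(observed, predicted):
    """Share of the variation in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    total = sum((y - mean_obs) ** 2 for y in observed)
    unexplained = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - unexplained / total


# Invented observations for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.1, 7.0, 9.2, 10.8]

a, b = fit_line(xs, ys)
predicted = [a + b * x for x in xs]
r2 = r_squared(ys, predicted)

print(f"R^2 = {r2:.4f}")
```

With these near-linear data almost all the variation is explained, so R^2 is close to 1; predicting the mean for every case would give 0.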

• If you include a categorical variable, you have to analyse it as a set of dummy variables. For example, if you have a variable with 3 categories you create 2 dummies.

Dept.             Original variable   Var1   Var2
Family Studies                    1      1      0
Biology                           2      0      1
Business                          3      0      0

Dept = Department (1 = Family Studies, 2 = Biology, 3 = Business)
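The dummy coding in the table can be sketched as a small function: every category except one reference category gets its own 0/1 variable. Taking Business (code 3) as the reference follows the table; the function name is illustrative.

```python
# Codes of the original variable, as given in the slide.
DEPARTMENTS = {1: "Family Studies", 2: "Biology", 3: "Business"}


def dummy_code(dept_code, reference=3):
    """Return [Var1, Var2] for a department code, with `reference` coded as all zeros."""
    if dept_code not in DEPARTMENTS:
        raise ValueError(f"unknown department code: {dept_code}")
    dummy_categories = [code for code in sorted(DEPARTMENTS) if code != reference]
    return [1 if dept_code == code else 0 for code in dummy_categories]


for code, name in DEPARTMENTS.items():
    print(f"{name:<16}{code:>3}  {dummy_code(code)}")
```

Three categories need only two dummies because the reference category is identified by both dummies being 0.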

Windows: go to Data, Data Analysis and choose Regression or Correlation Coefficient.

OUTCOME FROM EXCEL

Example: regression analysis of wages according to the hours worked, from Titanic.xls

• The multiple correlation coefficient represents the relationship between the observed values of the dependent variable and those predicted.

Regression statistics
Multiple correlation coefficient     0.9978
Coefficient of determination R^2     0.9956
Adjusted R^2                         0.9951
Standard error                      81.3712
Observations                           2201

             Coefficients   Standard error   t statistic   Prob.   Lower 95%   Upper 95%
Intercept               0             #N/A          #N/A    #N/A        #N/A        #N/A
Variable 1          85.58             0.12        705.27    0.00       85.34       85.82

PRACTICE 2 Titanic.xls

• Do the regression analysis between weekly hours of work and wages.

