quantitative data analysis
Post on 09-Jul-2016
6CN010 - Dissertation
Quantitative Data Analysis
• Coding
• Descriptive statistics
–Measure of distribution
–Measure of central tendency
–Measure of dispersion
Quantitative data analysis
• Processing the data collected for analysis requires coding.
• Coding – converting data into numeric form for analysis.
• Knowing how the data is going to be analysed is essential to designing surveys.
• This has to be decided at the start – not after the data has been collected!
From survey to usable data
• Check the level of measurement – make sure it is appropriate for envisaged analysis.
• Variables can be defined into types according to the level of mathematical scaling that can be carried out on the data.
• 4 types of data or levels of measurement:
1. Nominal
2. Ordinal
3. Interval
4. Ratio
Coding
• Nominal (categorical) data comprises categories that cannot be rank ordered – each category is just different.
• Because the categories have no order, codes can be assigned in any order, but it is good practice to code in order of appearance.
Coding nominal variables
What is your gender? (please tick)
Male
Female
Did you enjoy the film? (please tick)
Yes 0
No 1
Sometimes coded 0 and 1.
• Ordinal data comprises categories that can be rank ordered.
• Coding is similar to nominal data, but the codes follow the rank order.
• Ranking can run low to high or high to low.
Coding ordinal variables
How satisfied are you with the level of service you have received? (please tick)
Very satisfied 5
Somewhat satisfied 4
Neutral 3
Somewhat dissatisfied 2
Very dissatisfied 1
What is your highest level of qualification? (please tick)
Degree or higher 1
Level 3 or equivalent 2
Level 2 or equivalent 3
Level 1 or equivalent 4
No qualifications 5
• Scale data – interval and ratio data
• Data is in numeric format (£50, £100, £150).
• It can be measured on a continuous scale.
• The distance between values can be observed and therefore measured.
• Data can be placed in rank order.
• Coding scale data – just enter the value (age = 52, number of bedrooms in a house = 3).
Coding interval and ratio variables
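The coding schemes above can be sketched in Python. This is only an illustration, not a prescribed tool: the film and satisfaction code books reuse the codes shown on the slides, while the gender codes (1 and 2), the helper name `code_response` and the sample responses are assumptions made up for the example.

```python
# Code books: map each answer category to a numeric code.
# FILM_CODES and SATISFACTION_CODES follow the slides;
# GENDER_CODES is an assumed scheme (coded in order of appearance).
GENDER_CODES = {"Male": 1, "Female": 2}
FILM_CODES = {"Yes": 0, "No": 1}
SATISFACTION_CODES = {
    "Very satisfied": 5,
    "Somewhat satisfied": 4,
    "Neutral": 3,
    "Somewhat dissatisfied": 2,
    "Very dissatisfied": 1,
}


def code_response(answer, code_book):
    """Return the numeric code for an answer, or None if it is not in the code book."""
    return code_book.get(answer)


# Hypothetical questionnaire returns (one dict per respondent).
responses = [
    {"gender": "Female", "film": "Yes", "satisfaction": "Neutral", "age": 52},
    {"gender": "Male", "film": "No", "satisfaction": "Very satisfied", "age": 34},
]

coded = [
    {
        "gender": code_response(r["gender"], GENDER_CODES),
        "film": code_response(r["film"], FILM_CODES),
        "satisfaction": code_response(r["satisfaction"], SATISFACTION_CODES),
        "age": r["age"],  # scale data: just enter the value
    }
    for r in responses
]

print(coded)
```

Keeping the code book separate from the data makes it easy to check, before the survey goes out, that every answer option has a code suited to the planned analysis.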
• Descriptive statistics describe what the data is or what the data shows.
• Descriptive statistics are different from inferential statistics.
• Inferential statistics are used to infer conclusions from the data and to generalise to the population.
• Descriptive statistics here means analysing one variable at a time (univariate analysis).
Descriptive statistics
Distribution is a summary for each variable of the ‘frequency’ or number of times a value or range of values occurs.
Examples:
• number and percentage of male and female participants
• ages of research participants.
Measures of distribution
• A frequency table is one of the most common methods used to describe a single variable.
• Used to describe nominal or ordinal variables – those with a category (yes & no, strongly agree to strongly disagree and so on).
• Shows number and/or percentage of the occurrence of a category within a variable.
• Frequency distributions can be depicted in two ways:
– table
– graph
Frequency tables (1/3)
Example
Frequency table (2/3)
Age range       Number   Percentage
Less than 20       150         19.9
20 – 49            250         33.1
50 – 64            180         23.8
65 – 80            100         13.3
Over 80             75          9.9
Total              755        100.0
Example: other things you may see
Frequency table (3/3)
Age range                 Number   Percentage   Valid percentage   Cumulative percentage
Less than 20                 150         19.3               19.9                    19.9
20 – 49                      250         32.2               33.1                    53.0
50 – 64                      180         23.2               23.8                    76.8
65 – 80                      100         12.9               13.3                    90.1
Over 80                       75          9.8                9.9                   100.0
Total                        755         97.4              100.0                   100.0
Missing (not recorded)        20          2.6
Total                        775        100.0
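As a rough sketch of where these columns come from, the snippet below rebuilds the table from the counts alone. The variable names are illustrative. Each percentage is rounded independently here, so one or two values can differ from the slide's table by 0.1.

```python
# Counts per age range, taken from the example table.
counts = {
    "Less than 20": 150,
    "20 - 49": 250,
    "50 - 64": 180,
    "65 - 80": 100,
    "Over 80": 75,
}
missing = 20  # cases where age was not recorded

valid_total = sum(counts.values())   # cases with a recorded age
grand_total = valid_total + missing  # all cases

rows = []
running = 0
for label, n in counts.items():
    running += n
    rows.append({
        "age_range": label,
        "number": n,
        "percent": round(100 * n / grand_total, 1),        # share of all cases
        "valid_percent": round(100 * n / valid_total, 1),  # share of recorded cases
        "cumulative_percent": round(100 * running / valid_total, 1),
    })

for row in rows:
    print(row)
print("Missing:", missing, round(100 * missing / grand_total, 1))
```

The percentage column divides by all 775 cases, whereas the valid and cumulative percentages divide by the 755 cases with a recorded age.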
• Measures of central tendency quantify the location of the middle or centre of a data set – what the typical or average score or result of a data set is.
• That is, they identify a typical value that best summarises the distribution of values in a variable.
• There are three main measures:
1 Mean – Average
$\bar{x} = \frac{180 + 220 + 280 + 320 + 380}{5} = 276$
2 Mode – Most frequently occurring value (280)
180 220 280 320 280 180 350 280 330 220
Measures of central tendency
2 Mode (cont’d)
• Bi-modal: a distribution with two most frequently occurring values (two pronounced views or patterns of response).
• Multi-modal: a distribution with more than two modes (potentially several pronounced views or patterns of response).
Measures of central tendency (Cont’d)
3 Median
Median is the midpoint in a distribution when the values are arranged in ascending or descending order.
180 220 280 320 380 → median = 280
Where there is an even number of observations, the median is the average of the two middle values.
180 220 280 300 320 380 → median = (280 + 300) / 2 = 290
Measures of central tendency (Cont’d)
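The three measures can be checked with Python's standard `statistics` module. This is a small sketch using the figures from the slides; the `bimodal_example` list is made-up data added only to show how more than one mode appears.

```python
import statistics

five_values = [180, 220, 280, 320, 380]
ten_values = [180, 220, 280, 320, 280, 180, 350, 280, 330, 220]
six_values = [180, 220, 280, 300, 320, 380]

mean_value = statistics.mean(five_values)     # sum of the values divided by 5
mode_value = statistics.mode(ten_values)      # most frequently occurring value
median_odd = statistics.median(five_values)   # middle value of an odd-sized set
median_even = statistics.median(six_values)   # average of the two middle values

# multimode returns every value tied for most frequent, which is how a
# bi-modal or multi-modal distribution shows up.
bimodal_example = [1, 1, 2, 3, 3]  # made-up data
modes = statistics.multimode(bimodal_example)

print(mean_value, mode_value, median_odd, median_even, modes)
```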
Appropriate measure
Level of measurement    Measure of central tendency
Nominal                 Mode
Ordinal                 Median and mode
Interval/Ratio          Mean, median and mode
Measures of dispersion: statistical measures that summarise the amount of spread or variation in the distribution of values in a variable.
So, how values are spread within a distribution.
There are a number of different measures (applicable to interval or ratio data):
• Range
• Standard deviation
• Variance
Measures of dispersion
Type                  Description
Range                 Difference between the highest (maximum) and lowest (minimum) value in the distribution of values.
Variance              Measure of the spread of the values around the mean.
Standard deviation    Square root of the variance; shows how far, on average, the values lie from the mean of the sample data.
Range is the difference between the highest and lowest value in the distribution of values.
Example: Weekly income of 10 people:
£180 £220 £280 £320 £280 £180 £310 £280 £330 £220
Range is maximum income minus minimum income: £330 - £180 = £150.
Range
Ordinal data can be ordered, so it can also give information on range.
Example: Survey question – How useful did you find the book?
Responses: Very useful, Very un-useful, Useful, Un-useful, Very un-useful, Useful, Very useful, Very useful, Useful, Un-useful
Range is from “Very useful” to “Very un-useful”.
Range – using ordinal data
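Both range examples can be reproduced in a few lines of Python. The sketch below assumes rank codes of 1 to 4 for the usefulness scale (the slides do not give codes for this question) and reads the ten responses off the example.

```python
# Numeric (ratio) data: weekly income of 10 people, in pounds.
incomes = [180, 220, 280, 320, 280, 180, 310, 280, 330, 220]
income_range = max(incomes) - min(incomes)

# Ordinal data: assumed rank codes, 1 = lowest, 4 = highest.
USEFULNESS_RANK = {
    "Very un-useful": 1,
    "Un-useful": 2,
    "Useful": 3,
    "Very useful": 4,
}
answers = [
    "Very useful", "Very un-useful", "Useful", "Un-useful", "Very un-useful",
    "Useful", "Very useful", "Very useful", "Useful", "Un-useful",
]

# For ordinal data the range is reported as the lowest and highest category;
# subtracting the codes would not give a meaningful distance.
lowest = min(answers, key=USEFULNESS_RANK.get)
highest = max(answers, key=USEFULNESS_RANK.get)

print(f"Income range: {income_range}")
print(f"Usefulness ranges from {lowest!r} to {highest!r}")
```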
• Interquartile range (IQR) is another range measure, but this time it looks at the data in terms of quarters or percentiles.
• The ordered data is divided into four parts (quarters), each containing 25% of the observations.
Interquartile range
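[Figure: diagram of a distribution from Min to Max, marking Q1 (25th percentile), Q2 (median, 50th percentile) and Q3 (75th percentile); the IQR spans Q1 to Q3 and the range spans Min to Max.]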
• IQR is the range of the middle 50% of the data. Because it uses only the middle 50%, it is not affected by extreme minimum or maximum values (outliers).
• Outliers – values at the extreme lower or upper end of the distribution. They are atypical, infrequent observations.
• These will influence the (arithmetic) mean. Why?
10 people record their height:
160, 162, 164, 166, 168, 170, 172, 174, 176 and 200 cm tall. With those values the mean is 171.2 cm.
(200 cm is the outlier – take it out and the mean is 168 cm.)
Interquartile range
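The effect of the 200 cm outlier can be seen by computing the mean, range and IQR with and without it. This is a sketch only: the slides do not say which quartile method to use, so it assumes the default ("exclusive") method of `statistics.quantiles`, and other software may give slightly different quartiles. The helper names `iqr` and `value_range` are made up for the example.

```python
import statistics

heights = [160, 162, 164, 166, 168, 170, 172, 174, 176, 200]
without_outlier = heights[:-1]  # drop the 200 cm observation


def iqr(values):
    """Interquartile range: Q3 minus Q1, using the default 'exclusive' method."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def value_range(values):
    """Range: maximum minus minimum."""
    return max(values) - min(values)


summary = {
    "mean": (statistics.mean(heights), statistics.mean(without_outlier)),
    "range": (value_range(heights), value_range(without_outlier)),
    "iqr": (iqr(heights), iqr(without_outlier)),
}
for name, (with_outlier, without) in summary.items():
    print(f"{name}: with outlier = {with_outlier}, without = {without}")
```

The mean and especially the range shift when the outlier is removed, while the IQR changes far less, because it only looks at the middle half of the ordered values.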
• Where the mean is a measure of the centre of a group of numbers, the variance is the measure of the spread.
• It involves measuring the distance between each of the values and the mean.
• To calculate the variance:
1. calculate the mean
2. for each value in the distribution subtract the mean and then square the result (the squared difference)
3. add up those squared differences and divide by the total number of scores minus one (the average squared difference for a sample).
Variance = Sum of (observed value – mean score)² / (Total number of scores – 1)
• The larger the variance value the further the observed values of the data set are dispersed from the mean.
• A variance value of zero means all observed values are the same as the mean.
Variance
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$$
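The three calculation steps can be followed literally in Python. As an illustration, this sketch reuses the weekly income figures from the range example (that choice of data is an assumption, not part of the variance slides) and divides by N - 1, as in the formula.

```python
incomes = [180, 220, 280, 320, 280, 180, 310, 280, 330, 220]

# Step 1: calculate the mean.
mean_income = sum(incomes) / len(incomes)

# Step 2: subtract the mean from each value and square the result.
squared_differences = [(x - mean_income) ** 2 for x in incomes]

# Step 3: sum the squared differences and divide by N - 1 (sample variance).
variance = sum(squared_differences) / (len(incomes) - 1)

print(mean_income, sum(squared_differences), round(variance, 2))
```

Because the differences are squared, the variance is expressed in squared units (here, pounds squared), which is why the standard deviation is usually reported alongside it.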
• Standard deviation = the square root of the variance.
• Because the square root is taken, the result is in the original data units. E.g. if the variable is height recorded in cm, the standard deviation is also in cm.
• Standard deviation: roughly how far, on average, each value is from the mean.
$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N - 1}}$$
Standard deviation
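To illustrate the point about units, the sketch below computes the standard deviation of the height data from the outlier example, once without and once with the 200 cm value. Reusing that data here is an assumption made for illustration; `statistics.variance` and `statistics.stdev` both use the N - 1 (sample) form.

```python
import math
import statistics

heights_cm = [160, 162, 164, 166, 168, 170, 172, 174, 176]  # outlier removed
with_outlier = heights_cm + [200]

variance_cm2 = statistics.variance(heights_cm)  # sample variance, in cm squared
sd_cm = math.sqrt(variance_cm2)                 # square root brings it back to cm
sd_with_outlier = statistics.stdev(with_outlier)

print(variance_cm2, round(sd_cm, 2), round(sd_with_outlier, 2))
```

Like the mean, the standard deviation is sensitive to outliers: adding the single 200 cm observation makes it noticeably larger.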
Appropriate descriptive statistics: summary
Level of measurement: Univariate analysis
Nominal
– Frequency table: count, %, valid %, cumulative %.
– Measure of central tendency: mode.
– Measure of dispersion: no measure.
Ordinal
– Frequency table: count, %, valid %, cumulative %.
– Measure of central tendency: mode, median.
– Measure of dispersion: no measure.
Interval/Ratio
– Frequency table: count, %, valid %, cumulative %.
– Measure of central tendency: mode, median, mean.
– Measure of dispersion: range, variance, standard deviation.
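One way to apply this summary in practice is a small helper that only reports the statistics suited to a variable's level of measurement. The function `summarise` and the sample data below are hypothetical, written for illustration. The use of `median_low` for ordinal codes is a design choice of this sketch, so that the median is always an actual category.

```python
import statistics

LEVELS = ("nominal", "ordinal", "interval", "ratio")


def summarise(values, level):
    """Return the descriptive statistics suited to a level of measurement.

    `values` are numeric codes (nominal/ordinal) or raw numbers (interval/ratio).
    """
    if level not in LEVELS:
        raise ValueError(f"unknown level of measurement: {level!r}")

    summary = {"mode": statistics.multimode(values)}
    if level == "ordinal":
        # median_low always returns an observed value, so the result is a
        # real category code rather than a half-way number.
        summary["median"] = statistics.median_low(values)
    if level in ("interval", "ratio"):
        summary["median"] = statistics.median(values)
        summary["mean"] = statistics.mean(values)
        summary["range"] = max(values) - min(values)
        summary["variance"] = statistics.variance(values)
        summary["standard_deviation"] = statistics.stdev(values)
    return summary


film_codes = [0, 1, 0, 0, 1]             # made-up nominal data (Yes = 0, No = 1)
satisfaction_codes = [5, 4, 4, 3, 1, 2]  # made-up ordinal data
incomes = [180, 220, 280, 320, 280, 180, 310, 280, 330, 220]  # ratio data

print(summarise(film_codes, "nominal"))
print(summarise(satisfaction_codes, "ordinal"))
print(summarise(incomes, "ratio"))
```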