Block XI | Research | Lesson 1
DESCRIPTIVE STATISTICS
Dr. Telia Avendano Posecion
September 15, 2015 (10:00AM-12:00 PM)
SGD 7B | Acanto, Maquilang, Roldan


SUMMARY/OUTLINE
I. Data Analysis & Collection
A. Getting Ready for Data Analysis
B. Ten Commandments of Data Collection
C. Data Analysis
D. Levels of Measurement
1. Nominal
2. Ordinal
3. Interval
4. Ratio
II. Statistics
A. Selecting Statistical Analysis
B. Major Uses of Statistical Procedures
1. Inferential Statistics
2. Descriptive Statistics
a. Frequency Distribution
b. Graphical Presentation of Data
c. Summary Statistics
i. Measures of Central Tendency
a) Mean
b) Mode
c) Median
d) Distributions
- Symmetrical
- Positively Skewed
- Negatively Skewed
ii. Measures of Variability
a) Range
b) Standard Deviation
c) Variance
- Coefficient of Variation
iii. Measures of Location
a) Percentile
b) Quartile
c) Decile

DATA ANALYSIS & COLLECTION
A. Getting Ready for Data Analysis
1. Research proposal
2. Data collection form
3. Data collection
4. Data processing: looking for errors and for completeness, arranging, and encoding
5. Data analysis

B. Ten Commandments of Data Collection
1. Think of the data you have to collect to answer your question.
2. Think about where you will be getting your data. There should be a catchment area.

3. Make sure that the data collection form you are using is clear and easy to use. Tools should be validated and undergo pre-testing.

4. Once you transfer your scores to your data collection form, make a duplicate copy of the data file and keep it in a separate location. Always have a backup.

5. Do not rely on other people to collect or transfer your data unless you have personally trained them and are confident that they understand the data collection process as well as you do.

6. Plan a detailed schedule of when and where you will be collecting your data. A Gantt chart is made at this stage.

7. As soon as possible, cultivate possible sources for your participant pool.

8. Try to follow up on subjects who missed their testing session or interview. Make sure that the data collection form is completely filled in before leaving, because it is very difficult to follow up respondents.

9. Never discard original data. You may have missed something.

10. Follow the previous 9.

C. Data Analysis
- The process of summarizing trends and patterns observed in the data
- Determines major differentials or relationships among the variables used in the study
- The application of appropriate statistical tests to a set of data to answer the objectives of a study

The type of data analysis to use depends on:
- The objective of the study (to describe groups, determine sensitivity, identify risk factors, etc.)
- The scale of measurement of the data or variables being dealt with, which is very important too

D. Levels of Measurement
1. Nominal: the simplest level; no mathematical values; a categorical scale. We do not measure; we count the number of observations with or without the attribute of interest.

2. Ordinal: observations are ranked into two or more orders, so some observations are greater than others, but the distances between ranks are not equal (e.g., small, medium, large). Ordinal data are usually analyzed with nonparametric statistical tools.


3. Interval: data have numerical values, and the distances between adjacent values are equal, but there is no true zero point.

4. Ratio: same as interval, with the addition of a meaningful zero point.

STATISTICS
- A powerful tool for organizing and understanding data
- Provides ways to represent and describe groups, summarize results, and evaluate data
- All about summarizing

A. Selecting Statistical Analysis: Initial Flowchart

B. Two Major Uses of Statistical Procedures
1. Descriptive Statistics
- The first step in the analysis of data is to describe them
- Simplify and organize data
- Describe some of the characteristics of the distribution of scores you have collected
- Demographic data are usually described first

2. Inferential Statistics
- Interpret what the data mean
- Help you make decisions about how the data you collected relate to your original hypothesis
- Help you make generalizations, but be careful: how far you can generalize depends on the sampling method, among other things

DESCRIPTIVE STATISTICS
A. Frequency distribution
B. Graphical presentation of data
C. Summary statistics

A. Frequency Distribution
- The simplest way to organize and summarize data at a glance: "How often?"
- A list of the number of participants who fall in each category
- It is helpful to convert frequencies to percentages
- Usually used if the data are nominal
- If there are many possible scores between the highest and lowest scores, the frequency distribution will be long and almost as difficult to read as the original data. In that case we use a grouped frequency distribution, which shortens the table to a more manageable size.
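A minimal Python sketch of a frequency distribution with percentages and of a grouped frequency distribution, using only the standard library. The blood-type data, the integer test scores, and the class width of 10 are hypothetical, chosen only for illustration; `frequency_table` and `grouped_frequency` are hypothetical helper names.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency, percentage) rows, most frequent first."""
    counts = Counter(values)
    n = len(values)
    return [(value, freq, 100 * freq / n) for value, freq in counts.most_common()]

def grouped_frequency(scores, width, start):
    """Count integer scores in classes [start, start + width - 1], ...; empty classes are omitted."""
    counts = Counter((score - start) // width for score in scores)
    rows = []
    for k in sorted(counts):
        low = start + k * width
        rows.append((low, low + width - 1, counts[k]))
    return rows

# Hypothetical nominal data: blood types of 10 participants
blood_types = ["O", "A", "O", "B", "O", "A", "AB", "O", "A", "B"]
for value, freq, pct in frequency_table(blood_types):
    print(f"{value:>2}  {freq}  {pct:.0f}%")

# Hypothetical integer test scores grouped into classes of width 10
scores = [52, 57, 61, 64, 66, 68, 71, 73, 75, 78, 82, 85, 91]
for low, high, freq in grouped_frequency(scores, width=10, start=50):
    print(f"{low}-{high}: {freq}")
```

Five class rows are much easier to scan than thirteen individual scores, which is the point of grouping.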

Sometimes it is helpful to categorize participants on the basis of more than one variable at the same time. This is called cross-tabulation. Cross-tabulations can help us see relationships between nominal measures. The usual form is a 2x2 table, sometimes called a contingency table.

Table 1. Contingency table of males and females who are pro- and anti-Reproductive Health (RH) Bill

Sex       Pro-RH Bill   Anti-RH Bill   Total
Male           38            12           50
Female         24            26           50
Total          62            38          100
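The sketch below, assuming each respondent is recorded as a (sex, stance) pair, rebuilds Table 1 by cross-tabulation. The individual records are expanded from the table's own counts, so no new data are introduced; `cross_tabulate` is a hypothetical helper.

```python
from collections import Counter

# Individual responses expanded from the counts in Table 1
records = (
    [("Male", "pro")] * 38 + [("Male", "anti")] * 12
    + [("Female", "pro")] * 24 + [("Female", "anti")] * 26
)

def cross_tabulate(pairs):
    """Count each (row, column) combination plus the row and column totals."""
    cells = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    return cells, row_totals, col_totals

cells, row_totals, col_totals = cross_tabulate(records)

print(f"{'Sex':<8}{'pro':>6}{'anti':>6}{'Total':>7}")
for sex in ("Male", "Female"):
    print(f"{sex:<8}{cells[(sex, 'pro')]:>6}{cells[(sex, 'anti')]:>6}{row_totals[sex]:>7}")
print(f"{'Total':<8}{col_totals['pro']:>6}{col_totals['anti']:>6}{len(records):>7}")

# Row percentages make the relationship between the two variables easier to see
for sex in ("Male", "Female"):
    pct_pro = 100 * cells[(sex, "pro")] / row_totals[sex]
    print(f"{sex}: {pct_pro:.0f}% pro RH Bill")
```

The row percentages (38/50 = 76% of males vs. 24/50 = 48% of females) show the relationship between sex and stance more directly than the raw counts.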

B. Graphical Presentation of Data
- "One picture is worth a thousand words."
- Clarifies a data set
- Most people find a graphic representation easier to understand than other statistical procedures
- Helps interpret a summary statistic or statistical set


Examples:
1. Bar Graph (used if the data are discrete/nominal, with categories)

2. Histogram (used if the data are continuous)

Bar graph vs. histogram: in a bar graph, the bars represent different categories, so they can be rearranged. A histogram uses continuous data, and its bins represent ranges of values rather than categories.

3. Pie Chart: used to show how a whole is divided when more than two categories are being compared

4. Component Bar Graph: used to show categories within a category (each bar is divided into subcategories)

5. Frequency Polygon: a graphical device for understanding the shapes of distributions. Frequency polygons serve the same purpose as histograms but are especially helpful for comparing sets of data. They are also a good choice for displaying cumulative frequency distributions.

6. Line Graph (for data over time, e.g., days of the week or years)


7. Scatter Plot (shows relationships and correlations between two variables)

Types of Graphs Commonly Used in Presenting

C. Summary Statistics
Measures of Central Tendency
- Mean, Median, Mode
Measures of Variability
- Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Location
- Percentile, Decile, Quartile

MEASURES OF CENTRAL TENDENCY
1. Mean
- The average
- The most commonly used measure of central tendency, unless the distribution is skewed
- May be influenced by extreme values; when a distribution is skewed, the mean is the measure most strongly affected
- Not ordinarily used with ordinal data because of the arbitrary nature of an ordinal scale
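A quick illustration with Python's `statistics.mean`, assuming five hypothetical scores, of how a single extreme value pulls the mean:

```python
from statistics import mean

scores = [70, 72, 74, 75, 79]        # hypothetical scores
with_outlier = scores[:-1] + [150]   # replace the highest score with an extreme value

print(mean(scores))         # 74
print(mean(with_outlier))   # 88.2
```

One extreme score moves the mean by more than 14 points, even though four of the five scores are unchanged.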

2. Median
- The score in a distribution above which one half of the scores lie
- To find it, order the scores from lowest to highest, then select the middle score; if n is even, take the mean of the two middle scores
- Used if the distribution is skewed, and often used to compute an average when extreme scores are involved
- Used with ordinal data
- To determine the position of the median in the ordered data, the following formula may be used: position of the median = (n + 1)/2
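A sketch of the procedure, assuming the common position rule (n + 1)/2 in the ordered data; `median` here is a hypothetical helper (the standard library's `statistics.median` gives the same results). The second call shows that replacing the top score with an extreme value leaves the median unchanged.

```python
def median(values):
    """Sort the scores, then take the middle one (mean of the two middle ones if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    position = (n + 1) / 2               # 1-based position of the median
    if n % 2 == 1:
        return ordered[int(position) - 1]
    # n even: the position falls halfway between two scores, so average them
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

print(median([70, 72, 74, 75, 79]))    # 74
print(median([70, 72, 74, 75, 150]))   # 74, unaffected by the extreme score
print(median([3, 1, 4, 2]))            # 2.5
```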

3. Mode
- The most frequently occurring value
- An excellent choice if you want a general overview of which class or category occurs most frequently
- A distribution may have more than one mode: bimodal (2 modes) or multimodal (more than 2)
- Common mistake: the mode is the value, not the frequency of that value
- Easily computed but unstable; it can be affected by a change in only one or two scores
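A small sketch using `statistics.multimode` (Python 3.8+), which returns every most-frequent value, so it handles bimodal and multimodal data. The score lists are hypothetical.

```python
from collections import Counter
from statistics import multimode

unimodal = [2, 3, 3, 5, 7]      # hypothetical scores
bimodal = [1, 1, 2, 4, 4, 6]    # hypothetical scores

print(multimode(unimodal))      # [3]
print(multimode(bimodal))       # [1, 4] (bimodal)

# Common mistake: the mode is the value (3), not its frequency (2)
value, frequency = Counter(unimodal).most_common(1)[0]
print(value, frequency)         # 3 2

# Instability: changing a single score (one 3 becomes a 5) moves the mode
print(multimode([2, 3, 5, 5, 7]))  # [5]
```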

Choice of Measure of Central Tendency (very important)
Depends on:
a. Nature of the distribution
b. Concept of central tendency desired
c. Scale/level of measurement

Guidelines to help you decide which measure of central tendency is best:
- The mean is used for numerical data with a symmetric distribution.
- The median is used for ordinal data, or for numerical data if the distribution is skewed.
- The mode is used primarily for bimodal distributions.
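These guidelines can be encoded as a small decision helper. This is a hypothetical sketch: the level and shape labels are assumptions, and returning the mode for nominal data follows the standard convention that only the mode is meaningful for unordered categories.

```python
def suggested_center(level, shape="symmetric"):
    """Hypothetical helper: pick a measure of central tendency.

    level: "nominal", "ordinal", "interval", or "ratio"
    shape: "symmetric", "skewed", or "bimodal"
    """
    if level not in ("nominal", "ordinal", "interval", "ratio"):
        raise ValueError(f"unknown level of measurement: {level!r}")
    if shape not in ("symmetric", "skewed", "bimodal"):
        raise ValueError(f"unknown distribution shape: {shape!r}")
    if level == "nominal" or shape == "bimodal":
        return "mode"      # categories have no order or magnitude; bimodal data have two peaks
    if level == "ordinal" or shape == "skewed":
        return "median"    # ranks only, or numerical data pulled by a long tail
    return "mean"          # numerical (interval/ratio) data with a symmetric distribution

print(suggested_center("ratio"))               # mean
print(suggested_center("interval", "skewed"))  # median
print(suggested_center("ordinal"))             # median
print(suggested_center("nominal"))             # mode
```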


4. Distributions
Symmetric/normal
- Bell-shaped curve
- Most participants are near the middle of the distribution
- The mean, median, and mode are located at the same point, the center of the distribution

Positively skewed
- The direction of skew is indicated by the tail (skewness depends on the tail)
- Most of the scores pile up near the bottom (low end)
- Skewed to the right: the tail points toward the high values, and the mean lies to the right of the median

Negatively skewed
- Most of the scores pile up near the high or positive side
- Skewed to the left (clue: look at the mean; it is to the left of the median)
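A rough heuristic, sketched with hypothetical data: compare the mean with the median. This is only a rule of thumb with known exceptions, not a formal skewness coefficient; `skew_direction` is a hypothetical helper.

```python
from statistics import mean, median

def skew_direction(values):
    """Rule of thumb: the mean is pulled toward the tail of a skewed distribution."""
    m, md = mean(values), median(values)
    if m > md:
        return "positively skewed (tail to the right)"
    if m < md:
        return "negatively skewed (tail to the left)"
    return "symmetric"

# Hypothetical data sets
print(skew_direction([1, 2, 2, 3, 3, 3, 4, 4, 5]))  # symmetric
print(skew_direction([1, 1, 2, 2, 3, 10]))          # positively skewed (tail to the right)
print(skew_direction([1, 8, 9, 9, 10, 10]))         # negatively skewed (tail to the left)
```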

MEASURES OF VARIABILITY
- One of the most important concepts in research
- Measures of variability describe the spread, or degree of variability, present in the distribution, whereas measures of central tendency only indicate the tendency of the values to clump together
- Natural variability among participants or samples can often mask the effects of the variables under study
- Most research designs and statistical procedures were developed to control or minimize the effects of natural variability in scores
- Some variables show large differences between participants, others small ones

Important points to remember: scores do vary, and the degree of variability can be quantified.

1. Range
- The simplest measure of variability
- The difference between the highest and the lowest value/score (highest minus lowest)
- Too unstable, because it depends on only two scores; a single deviant score can dramatically affect the range
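A minimal sketch with hypothetical scores showing how one deviant score changes the range; `value_range` is a hypothetical helper name.

```python
def value_range(values):
    """Range = highest score minus lowest score."""
    return max(values) - min(values)

scores = [60, 62, 65, 67, 70]        # hypothetical scores
print(value_range(scores))           # 10
print(value_range(scores + [20]))    # 50: a single deviant score changes the range a lot
```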

2. Variance
- A better measure of variability than the range, since it uses all the scores in quantifying the degree of variability in the data
- Has statistical properties that make it useful in inferential statistics
- Answers the question "On average, how much do the scores in the sample differ from the mean of the sample?"
- Takes the mean as the reference point and takes into account the deviation of each individual observation from the mean
- It is the average of the squared deviations from the mean
- Another definition of the mean is the score around which the sum of the deviations equals zero


- The more variability in a group, the higher the variance; the more homogeneous the group, the lower the variance
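A sketch with hypothetical scores showing that the deviations from the mean sum to zero and that the variance is the average squared deviation. Dividing by N corresponds to `statistics.pvariance`; many texts use the sample variance, which divides by n - 1 (`statistics.variance`). Which form the lecture intended for samples is an assumption left open here.

```python
from statistics import mean, pvariance, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical scores

m = mean(scores)                      # 5
deviations = [x - m for x in scores]
print(sum(deviations))                # 0: deviations from the mean always sum to zero

squared = [d ** 2 for d in deviations]
avg_sq = sum(squared) / len(scores)   # average squared deviation (divide by N)
print(avg_sq)                         # 4.0
print(pvariance(scores))              # population variance, same value as avg_sq
print(variance(scores))               # sample variance, divides by n - 1 instead
```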

3. Standard Deviation
- The square root of the variance
- Transforms the variance back into the same units as the original scores

For ungrouped data:
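As a sketch of the ungrouped-data calculation, assuming the population form (square root of the average squared deviation) and hypothetical values in kg, the standard deviation returns to the original units; `statistics.stdev` gives the sample form with n - 1.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev

weights_kg = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values in kg

var = pvariance(weights_kg)   # units: kg squared
sd = sqrt(var)                # back in kg, the same units as the data
print(var, sd)
print(pstdev(weights_kg))     # library shortcut for the same population SD
print(stdev(weights_kg))      # sample SD, dividing by n - 1 instead of N
```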

Coefficient of Variation
- A measure of relative dispersion that expresses the standard deviation as a percentage of the mean
- Use it when the units of measurement of the variables being compared are different (e.g., weight in kg vs. height in cm), or when the means differ markedly
- A more peaked plot shows less variability
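A minimal sketch of CV = (SD / mean) x 100%, assuming hypothetical weights and heights and the population SD; `coefficient_of_variation` is a hypothetical helper and assumes a positive mean.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CV (%) = standard deviation / mean x 100 (assumes a positive mean)."""
    return pstdev(values) / mean(values) * 100

weight_kg = [50, 55, 60, 65, 70]        # hypothetical weights
height_cm = [150, 155, 160, 165, 170]   # hypothetical heights

# Both SDs are about 7.07, but in different units, so they cannot be compared directly
print(f"weight: SD = {pstdev(weight_kg):.2f} kg, CV = {coefficient_of_variation(weight_kg):.1f}%")
print(f"height: SD = {pstdev(height_cm):.2f} cm, CV = {coefficient_of_variation(height_cm):.1f}%")
```

Relative to its mean, weight varies more (about 11.8%) than height (about 4.4%) in this hypothetical sample, which the raw SDs alone would not reveal.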

MEASURES OF LOCATION
- Percentile: one of the 99 values of a variable that divide the distribution into 100 equal parts
- Decile: one of the 9 values of a variable that divide the distribution into 10 equal parts
- Quartile: one of the 3 values of a variable that divide the distribution into 4 equal parts
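A sketch using `statistics.quantiles` (Python 3.8+), which returns n - 1 cut points dividing the data into n equal parts. Its default "exclusive" interpolation is one of several conventions, so other software may report slightly different values. The scores 1 to 100 are hypothetical.

```python
from statistics import median, quantiles

scores = list(range(1, 101))   # hypothetical: the scores 1, 2, ..., 100

quartiles = quantiles(scores, n=4)      # 3 cut points -> 4 equal parts
deciles = quantiles(scores, n=10)       # 9 cut points -> 10 equal parts
percentiles = quantiles(scores, n=100)  # 99 cut points -> 100 equal parts

print(len(quartiles), len(deciles), len(percentiles))  # 3 9 99
print(quartiles[1] == median(scores))   # True: the second quartile is the median
```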

Percentile
- The most frequent type of measure used to report the results of standardized tests and anthropometric measurements
- These scores are normed on very large groups in which the scores form an approximately normal distribution
- A person’s percentile rank is a very close estimate of the percentage of persons who could be expected to score lower than that person
- The easiest score to understand (e.g., NMAT scores)
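A minimal sketch of a percentile rank, assuming the simple definition "percentage of the norm group scoring strictly lower" (other definitions count ties differently). The norm group and the helper `percentile_rank` are hypothetical.

```python
from bisect import bisect_left

def percentile_rank(norm_scores, score):
    """Percentage of the norm group scoring strictly lower than `score`."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)   # number of scores strictly below `score`
    return 100 * below / len(ordered)

norm_group = [55, 60, 62, 65, 70, 72, 75, 80, 85, 90]   # hypothetical norm group
print(percentile_rank(norm_group, 80))   # 70.0
```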

Interquartile Range (IQR)
- A measure of spread: the difference between the third (upper) quartile and the first (lower) quartile, IQR = Q3 - Q1
- Whereas the range describes where the first and last data items in a set are, the IQR describes where the middle 50% of the data lie
- Primarily used to build box plots
- Can be used to find outliers in a data set
- Used in conjunction with the mean and standard deviation, it can also be used to test whether or not a population has a normal distribution
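A sketch of IQR-based outlier detection, assuming Tukey's common convention of flagging values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR (the multiplier 1.5 is a convention, not stated in the lecture). The data and the helper `iqr_outliers` are hypothetical; quartiles come from `statistics.quantiles` with its default method.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return (Q1, Q3, IQR, outliers) using the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in values if x < low or x > high]
    return q1, q3, iqr, outliers

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]   # hypothetical data with one extreme value
q1, q3, iqr, outliers = iqr_outliers(data)
print(q1, q3, iqr, outliers)   # 2.75 8.25 5.5 [40]
```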

Reference: Dr. Posecion’s Lecture
