b11r01 - descriptive statistics.docx
SUMMARY/OUTLINE
I. Data Analysis & Collection
   A. Getting Ready for Data Analysis
   B. 10 Commandments for Data Analysis
   C. Data Analysis
   D. Levels of Measurements
      1. Nominal
      2. Ordinal
      3. Interval
      4. Ratio
II. Statistics
   A. Selecting Statistical Analysis
   B. Major Uses of Statistical Procedures
      1. Inferential Statistics
      2. Descriptive Statistics
         a. Frequency Distribution
         b. Graphical Presentation of Data
         c. Summary Statistics
            i. Measures of Central Tendency
               a) Mean
               b) Mode
               c) Median
               d) Distributions
                  - Symmetrical
                  - Positively Skewed
                  - Negatively Skewed
            ii. Measures of Variability
               a) Range
               b) Standard Deviation
               c) Variance
                  - Coefficient of Variability
            iii. Measures of Location
               a) Percentile
               b) Quartile
               c) Decile
DATA ANALYSIS & COLLECTION
A. Getting Ready for Data Analysis
1. Research proposal
2. Data collection form
3. Data collection
4. Data processing - looking for errors, checking for completeness, arranging, and encoding
5. Data analysis
B. Ten Commandments of Data Collection
1. Think of the data you have to collect to answer your question.
2. Think about where you will be getting your data. There should be a catchment area.
3. Make sure that the data collection form you are using is clear and easy to use. Tools should be validated and undergo pre-testing.
4. Once you transfer your scores to your data collection form, make a duplicate copy of the data file and keep it in a separate location. Always have a backup.
5. Do not rely on other people to collect or transfer your data unless you have personally trained them and are confident that they understand the data collection process as well as you do.
6. Plan a detailed schedule of when and where you will be collecting your data. This is where we make a Gantt chart.
7. As soon as possible, cultivate possible sources for your participant pool.
8. Try to follow up on subjects who missed their testing session or interview. Make sure that the data collection form is completely filled in before leaving. It is very difficult to follow up with respondents afterwards.
9. Never discard original data. You may have missed something.
10. Follow the previous 9.
C. Data Analysis
- Process of summarizing trends and patterns observed in the data
- Determines major differentials or relationships among variables used in the study
- Application of appropriate statistical tests on a set of data to answer the objectives of a study
- The type of data analysis to use depends on the:
  - Objective of the study (to describe groups, determine sensitivity, identify risk factors, etc.)
  - Scale of measurement of the data or variables being dealt with (also very important)
D. Levels of Measurement
1. Nominal
- Simplest level; no mathematical value; categorical scale
- We do not measure; we count the number of observations with or without the attribute of interest
2. Ordinal
- Observations are ranked into two or more ordered categories (e.g., small, medium, large)
- Some observations are greater than others, but the distance between ranks is not the same
- Typically analyzed with nonparametric statistical tools
SGD 7B | Acanto, Maquilang, Roldan
Block XI | Research | Lesson 1
DESCRIPTIVE STATISTICS
Dr. Telia Avendano Posecion
September 15, 2015 (10:00 AM-12:00 PM)
3. Interval
- Data have numerical values, and the distances between adjacent values are equal
4. Ratio
- Same as interval, with the addition of a meaningful zero point
STATISTICS
- Powerful tool for organizing and understanding data
- Provides ways to represent and describe groups, summarize results, and evaluate data
- All about summarizing
A. Selecting Statistical Analysis: Initial Flowchart
B. Two Major Uses of Statistical Procedures
1. Descriptive Statistics
- First step in the analysis of data is to describe them
- Simplify and organize data
- Describe some of the characteristics of the distribution of scores you have collected
- Demographic data are usually described first
2. Inferential Statistics
- Interpret what the data mean
- Help you make decisions about how the data you collected relate to your original hypothesis
- Help you make generalizations, but be careful: how far you can generalize depends on the sampling method, among other things
DESCRIPTIVE STATISTICS
A. Frequency distribution
B. Graphical representation of data
C. Summary statistics
A. Frequency Distribution
- Simplest way to organize and summarize data at a glance
- Answers the question "How often?"
- A list of the number of participants who fall in a particular category
- It is helpful to convert frequencies to percentages
- Usually used if the data are nominal
- If there are many possible scores between the highest and lowest scores, the frequency distribution will be long and almost as difficult to read as the original data
- In that case we use a grouped frequency distribution, which shortens the table to a more manageable size
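A minimal sketch of both ideas in Python: a frequency table with percentages for nominal data, and a grouped frequency distribution for scores. The blood types, the scores, the class width of 10, and the function names are all hypothetical, chosen only for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, percent) rows, sorted by category."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in sorted(counts.items())]

def grouped_frequency(scores, width):
    """Count integer scores in equal-width class intervals."""
    # Each score is assigned to the interval starting at a multiple of `width`.
    counts = Counter((s // width) * width for s in scores)
    return [(lo, lo + width - 1, counts[lo]) for lo in sorted(counts)]

# Hypothetical nominal data: one frequency per category, plus percentages.
blood_types = ["A", "O", "B", "O", "A", "O", "AB", "O", "B", "O"]
for cat, n, pct in frequency_table(blood_types):
    print(f"{cat:<3}{n:>3}{pct:>7.1f}%")

# Hypothetical scores: 15 distinct values collapse into 5 class intervals.
scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 79, 82, 85, 88, 91, 94]
for lo, hi, n in grouped_frequency(scores, width=10):
    print(f"{lo}-{hi}: {n}")
```

The ungrouped table would need one row per distinct score; grouping trades that detail for a table short enough to read at a glance.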
Sometimes it is helpful to categorize participants on the basis of more than one variable at the same time. This is called cross-tabulation. Cross-tabulations can help us see relationships between nominal measures. The result is usually a 2x2 table, sometimes called a contingency table.
Table 1. Contingency table of males and females pro- and anti-Reproductive Health (RH) Bill

Sex      Pro RH Bill   Anti RH Bill   Total
Male     38            12             50
Female   24            26             50
Total    62            38             100
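As a sketch of how a contingency table is usually read, the counts from Table 1 can be converted to row percentages so the two sexes can be compared directly. The dictionary layout and function names below are assumptions made for illustration; only the counts come from the table.

```python
# Counts from Table 1: rows are sex, columns are position on the RH Bill.
table = {
    "Male":   {"pro": 38, "anti": 12},
    "Female": {"pro": 24, "anti": 26},
}

def row_percentages(table):
    """Convert each row's counts to percentages of that row's total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result

def column_totals(table):
    """Sum each column across all rows."""
    totals = {}
    for cells in table.values():
        for col, n in cells.items():
            totals[col] = totals.get(col, 0) + n
    return totals

percentages = row_percentages(table)
totals = column_totals(table)

for sex, cells in percentages.items():
    print(f"{sex}: {cells['pro']:.0f}% pro, {cells['anti']:.0f}% anti")
print("Column totals:", totals)
```

Read this way, 76% of the males but only 48% of the females in the table are pro RH Bill, which is the kind of relationship between nominal measures that a cross-tabulation makes visible.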
B. Graphical Presentation of Data
- "One picture is worth a thousand words"
- Clarifies a data set
- Most people find a graphic representation easier to understand than other statistical procedures
- Helps interpret a summary statistic or statistical set
Examples:
1. Bar Graph - used if the data are discrete/nominal with categories

2. Histogram - used if the data are continuous

Bar graph vs histogram
- In a bar graph, the bars represent different categories, so they could be rearranged.
- Histograms use continuous data, where the bins represent ranges of data rather than categories.

3. Pie Chart - used if more than two categories (parts of a whole) are being compared

4. Component Bar Graph - used to show categories that are themselves subdivided into further categories

5. Frequency Polygon - a graphical device for understanding the shapes of distributions. Frequency polygons serve the same purpose as histograms but are especially helpful for comparing sets of data. They are also a good choice for displaying cumulative frequency distributions.

6. Line Graph - used for data ordered in time (the horizontal axis can be days of the week, years, etc.)
7. Scatter-plot (relationships and correlations)
Types of Graphs Commonly Used in Presenting
C. Summary Statistics
- Measures of Central Tendency: mean, median, mode
- Measures of Variability: range, variance, standard deviation, coefficient of variability
- Measures of Location: percentile, decile, quartile
MEASURES OF CENTRAL TENDENCY
1. Mean
- The average
- Most commonly used measure of central tendency unless the distribution is skewed
- When a distribution is skewed, the mean is the measure most strongly affected
- May be influenced by extreme values
- Not ordinarily used with ordinal data because of the arbitrary nature of an ordinal scale
2. Median
- The score in a distribution above which one half of the scores lie
- Order the scores from lowest to highest, then select the middle score as the median
- If n is even, take the mean of the two middle scores
- Used if the distribution is skewed
- Often used to compute an average when extreme scores are involved
- Used with ordinal data
- To determine the position of the median in the ordered scores, the following formula may be used: median position = (n + 1) / 2, where n is the number of scores
3. Mode
- Most frequently occurring value
- An excellent choice if you want a general overview of which class or category occurs most frequently
- A distribution may have more than one mode: bimodal (two modes) or multimodal (more than two)
- Common mistake: the mode is the value, not the frequency of that value
- Easily computed but unstable; can be affected by a change in only one or two scores

Choice of Measure of Central Tendency (very important)
Depends on:
a. Nature of the distribution
b. Concept of central tendency desired
c. Scale/level of measurement
Guidelines to help you decide which measure of central tendency is best:
Mean is used for numerical data and for symmetric distribution.
Median is used for ordinal data or for numerical data if the distribution is skewed
Mode is used primarily for bimodal distributions
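The guidelines above can be illustrated with a small sketch using Python's standard `statistics` module. The scores are hypothetical, with one deliberately extreme value (40) to show how differently the mean and the median react to it.

```python
import statistics

# Hypothetical scores; 40 is an extreme value.
scores = [2, 3, 3, 4, 5, 5, 5, 6, 40]

mean = statistics.mean(scores)        # pulled upward by the extreme value
median = statistics.median(scores)    # middle score of the ordered list
modes = statistics.multimode(scores)  # list of the most frequent value(s)

# Same scores without the extreme value, for comparison.
trimmed = scores[:-1]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)  # n is even: mean of the 2 middle scores

print(f"with 40:    mean={mean:.2f}, median={median}, mode(s)={modes}")
print(f"without 40: mean={mean_trimmed:.3f}, median={median_trimmed}")

# A distribution can have more than one mode.
bimodal_modes = statistics.multimode([1, 1, 2, 3, 3])
print("bimodal example:", bimodal_modes)
```

Dropping the single extreme score changes the mean from about 8.11 to 4.125, while the median only moves from 5 to 4.5, which is why the median is preferred for skewed data.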
4. Distributions
Symmetric / normal
- Bell-shaped curve
- Most participants are near the middle of the distribution
- The mean, median, and mode are located at the same point, the center of the distribution

Positively skewed
- The direction of the skew is indicated by the tail (skewness depends on the tail)
- Most of the scores pile up near the low end
- Skewed to the right

Negatively skewed
- Most of the scores pile up near the high or positive side
- Skewed to the left (clue: look at the mean; it is to the left of the median)
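The "look at the mean" clue can be sketched as a small check that compares the mean with the median. This is only a rule of thumb (it can mislead for some unusual distributions), and the function name and sample data are hypothetical.

```python
import statistics

def skew_direction(scores):
    """Guess the direction of skew by comparing the mean with the median."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    if mean > median:
        return "positive"   # long right tail pulls the mean above the median
    if mean < median:
        return "negative"   # long left tail pulls the mean below the median
    return "symmetric"

right_tailed = [1, 2, 2, 3, 3, 3, 4, 15]        # scores pile up at the low end
left_tailed = [1, 12, 13, 13, 13, 14, 14, 15]   # scores pile up at the high end
balanced = [1, 2, 3, 4, 5]

print(skew_direction(right_tailed))
print(skew_direction(left_tailed))
print(skew_direction(balanced))
```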
MEASURES OF VARIABILITY
- One of the most important concepts in research
- Measures of variability describe the spread or degree of variability present in a distribution, while measures of central tendency give information only about the tendency of the values to clump together
- The simplest example is the highest score minus the lowest score (the range)
- Natural variability among participants or samples can often mask the effects of the variables under study
- Most research designs and statistical procedures were developed to control or minimize the effects of natural variability of scores
- Some variables may have large differences between participants, others small
- Important points to remember:
  - Scores do vary
  - The degree of variability can be quantified
1. Range
- Simplest measure of variability
- Difference between the highest and the lowest value/score
- Too unstable because it depends on only two scores
- A single deviant score can dramatically affect the range of scores
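A short sketch of why the range is unstable: adding one deviant score to an otherwise tight set of hypothetical scores changes the range sixfold.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)

scores = [70, 72, 75, 78, 80]   # hypothetical scores
with_outlier = scores + [20]    # the same scores plus one deviant value

range_before = score_range(scores)
range_after = score_range(with_outlier)

print(range_before)  # 80 - 70
print(range_after)   # 80 - 20
```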
2. Variance
- A better measure of variability than the range, since it uses all the scores in quantifying the degree of variability in the data
- Has statistical properties that make it useful in inferential statistics
- It answers the question "On average, how much do the scores in the sample differ from the mean of the sample?"
- Takes the mean as the reference point
- Takes into account the deviation of each individual observation from the mean
- It is the average of the squared deviations from the mean
- Another definition of the mean is the score around which the sum of the deviations equals zero
- The more variability in a group, the higher the value of the variance; the more homogeneous the group, the lower the variance
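The definition of the variance can be traced step by step in a minimal sketch. The scores are hypothetical. Whether the lecture divided by n or by n - 1 is not stated, so both are computed here: dividing by n matches "the average of the squared deviations", while n - 1 is the usual choice for a sample.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores

mean = statistics.mean(scores)
deviations = [x - mean for x in scores]

# The deviations from the mean always sum to zero ...
sum_of_deviations = sum(deviations)

# ... which is why they are squared before averaging.
squared = [d ** 2 for d in deviations]
population_variance = sum(squared) / len(scores)        # divide by n
sample_variance = sum(squared) / (len(scores) - 1)      # divide by n - 1

print("mean:", mean)
print("sum of deviations:", sum_of_deviations)
print("population variance:", population_variance)
print("sample variance:", round(sample_variance, 3))

# A more homogeneous group has a lower variance.
homogeneous = [4, 5, 5, 5, 5, 5, 5, 6]
homogeneous_variance = statistics.pvariance(homogeneous)
print("homogeneous group variance:", homogeneous_variance)
```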
3. Standard Deviation
- Square root of the variance
- Transforms the variance back into the same units as the original scores
- For ungrouped data:
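The standard definitions for ungrouped data can be written as follows. This is a sketch under assumed notation ($x_i$ for the scores, $\bar{x}$ for the sample mean, $\mu$ for the population mean, $n$ and $N$ for the number of scores); it is not stated here whether the lecture divided by $n$ or by $n - 1$, so both common forms are shown.

```latex
% Sample standard deviation (divides by n - 1)
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}

% Population standard deviation (divides by N)
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}
```

In both forms the quantity under the square root is the variance, so the standard deviation is expressed in the same units as the original scores.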
Coefficient of Variation
- A measure of relative dispersion that expresses the standard deviation as a percentage of the mean
- Use it when the units of measurement of the variables being compared are different (e.g., weight in kg vs. height in cm)
- Or when the means differ markedly
- A more peaked plot shows less variability
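A minimal sketch of the weight-versus-height comparison, assuming the sample standard deviation is used. The weights, heights, and function name are hypothetical. The two sets are built to have the same standard deviation, so only the coefficient of variation shows which one varies more relative to its mean.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (sample SD assumed)."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

weights_kg = [50, 55, 60, 65, 70]        # hypothetical weights
heights_cm = [150, 155, 160, 165, 170]   # hypothetical heights

cv_weight = coefficient_of_variation(weights_kg)
cv_height = coefficient_of_variation(heights_cm)

print(f"SD weight = {statistics.stdev(weights_kg):.2f} kg, CV = {cv_weight:.1f}%")
print(f"SD height = {statistics.stdev(heights_cm):.2f} cm, CV = {cv_height:.1f}%")
```

Because the coefficient of variation is unit-free, the two percentages can be compared directly even though kilograms and centimeters cannot.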
MEASURES OF LOCATION
- A percentile is one of the 99 values of a variable that divide the distribution into 100 equal parts
- A decile is one of the 9 values of a variable that divide the distribution into 10 equal parts
- A quartile is one of the 3 values of a variable that divide the distribution into 4 equal parts
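These definitions map directly onto `statistics.quantiles` in Python's standard library, which returns the n - 1 cut points that divide the data into n equal parts. The data (the integers 1 to 99) are hypothetical, and the function's default interpolation method is assumed; other software may use slightly different rules and give slightly different cut points.

```python
import statistics

data = list(range(1, 100))   # hypothetical data: the integers 1 to 99

quartiles = statistics.quantiles(data, n=4)      # 3 cut points, 4 equal parts
deciles = statistics.quantiles(data, n=10)       # 9 cut points, 10 equal parts
percentiles = statistics.quantiles(data, n=100)  # 99 cut points, 100 equal parts

print(len(quartiles), len(deciles), len(percentiles))
print("quartiles:", quartiles)
print("deciles:", deciles)
```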
Percentile
- Most frequent type of measure used to report the results of standardized tests and anthropometric measurements
- These scores are normed on very large groups in which the scores form an approximately normal distribution
- A person's percentile rank is a very close estimate of the percentage of persons who could be expected to score lower than that person
- Easiest score to understand
- Ex. NMAT scores
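A percentile rank can be sketched as the percentage of scores in a reference group that fall below a given score. The function name and the norm group are hypothetical, and this simple "strictly below" definition is an assumption; some conventions also count half of the tied scores.

```python
def percentile_rank(score, group):
    """Percentage of scores in `group` that are strictly lower than `score`."""
    below = sum(1 for x in group if x < score)
    return 100 * below / len(group)

# Hypothetical norm group of 10 test scores.
norm_group = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank = percentile_rank(85, norm_group)
print(f"A score of 85 is at percentile rank {rank:.0f}")
```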
Interquartile Range (IQR)
- It is a measure of spread
- It is the difference between the third (upper) quartile and the first (lower) quartile of a set of data: IQR = Q3 - Q1
- It describes where the middle half of the data lies, whereas the range describes where the first and last data items are
- It is primarily used to build box plots
- The formula can be used to find outliers in a data set
- It can also be used, in conjunction with the mean and standard deviation, to check whether or not a population has a normal distribution
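A minimal sketch of the IQR and of one common way to use it to flag outliers. The 1.5 x IQR cutoff is the conventional box-plot rule and is an assumption here, since the notes do not say which rule the lecture used. The data and function name are hypothetical, and the quartiles use the default method of `statistics.quantiles`.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (q1, q3, iqr, outliers) using the k * IQR fence rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return q1, q3, iqr, outliers

# Hypothetical data with one suspiciously large value.
data = [3, 5, 7, 8, 9, 10, 11, 12, 13, 15, 40]

q1, q3, iqr, outliers = iqr_outliers(data)
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print("outliers:", outliers)
```

Unlike the range, the IQR ignores the extreme quarter of the data at each end, so the value 40 is flagged as an outlier without inflating the measure of spread itself.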
Reference: Dr. Posecion's Lecture