Chapter 3: Organizing Data

Upload: horace-norris

Post on 19-Jan-2016


TRANSCRIPT

Page 1: Chapter 3: Organizing Data. Raw data is useless to us unless we can meaningfully organize and summarize it (descriptive statistics). Organization techniques

Chapter 3: Organizing Data

Page 2

• Raw data is useless to us unless we can meaningfully organize and summarize it (descriptive statistics).

• Organization techniques include:

– Tables such as frequency distributions

– Graphs such as histograms, bar graphs, line graphs, pie charts, stem-and-leaf plots, and scatterplots

Page 3

Frequency Distribution

• A frequency distribution is a table that lists all the categories or values of a variable as well as the corresponding number of occurrences or responses for each category or value of the variable (its frequency, or how often the category occurs).

• Frequency distributions can be used for both categorical variables (nominal or ordinal) and numerical variables (interval or ratio).

Page 4

To create a frequency distribution for categorical data:

• First create a list of all the categories or values of the variable and then count the number of times each different category or value occurred in the data.

• Then find the percentage of respondents for each category.

• Set up your basic table to have 3 columns: (1) the list of categories or the values of the variable of interest, (2) the frequency count, and (3) the percentage.

• When dealing with ordinal variables, make sure the categories are ranked (lowest to highest or highest to lowest).
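The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the responses, the category names, and the helper `frequency_table` are hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical survey responses (not from the slides), used only for illustration.
responses = ["Agree", "Disagree", "Agree", "Neutral", "Agree",
             "Disagree", "Agree", "Neutral", "Agree", "Disagree"]

# For an ordinal variable, state the ranking explicitly (lowest to highest)
# instead of relying on alphabetical order.
ranked_categories = ["Disagree", "Neutral", "Agree"]

def frequency_table(data, categories):
    """Return rows of (category, frequency, percentage) in the given order."""
    counts = Counter(data)
    total = len(data)
    return [(c, counts[c], 100 * counts[c] / total) for c in categories]

table = frequency_table(responses, ranked_categories)

# Three columns: category, frequency count, percentage.
print(f"{'Category':<10}{'Frequency':>10}{'Percent':>10}")
for category, freq, pct in table:
    print(f"{category:<10}{freq:>10}{pct:>9.1f}%")
```

Passing the category order as an argument keeps ordinal categories ranked, which a plain alphabetical sort would not guarantee.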

Page 5

Examples of frequency distributions for nominal variables:

Page 6

Examples of frequency distributions for ordinal variables:

Page 7

Statistical Software

• Many statistical software packages exist. Most of their output looks similar, so the skill of reading output carries over from one program to another. The textbook shows output from SPSS (Statistical Package for the Social Sciences). Other useful programs include StatCrunch, SAS (Statistical Analysis System), and even Excel, among many others.

• Let’s look at SPSS output for a frequency distribution.

Page 8

• Notice this is similar to our frequency distributions, but has a few extra columns.

• Percent is calculated the same way as before.

• Valid percent accounts for the missing data. It divides the frequency by the total minus the missing data (1485 for this example).

• Cumulative percent is a running total (based on valid %).
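The three percentage columns can be reproduced by hand. The sketch below uses made-up counts (they are not the SPSS output from the slide) to show how percent, valid percent, and cumulative percent relate to one another.

```python
# Hypothetical counts, NOT the SPSS output from the slide.
valid_counts = {"Low": 20, "Medium": 50, "High": 10}
missing = 20

total = sum(valid_counts.values()) + missing  # all cases, including missing
valid_total = total - missing                 # cases with a usable answer

rows = []
running = 0.0
for category, freq in valid_counts.items():
    percent = 100 * freq / total              # denominator includes missing cases
    valid_percent = 100 * freq / valid_total  # denominator excludes missing cases
    running += valid_percent                  # running total of valid percent
    rows.append((category, freq, percent, valid_percent, running))

print(f"{'Category':<10}{'Freq':>6}{'Percent':>9}{'Valid %':>9}{'Cum. %':>9}")
for category, freq, percent, valid_percent, cumulative in rows:
    print(f"{category:<10}{freq:>6}{percent:>9.1f}{valid_percent:>9.1f}{cumulative:>9.1f}")
```

With these assumed counts, the cumulative percent column ends at 100 because it is built from valid percent, while the percent column sums to less than 100 because of the missing cases.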

Page 9

• It is important to be able to utilize the frequency distribution to interpret the data and answer questions.

• Use the valid percent column when answering percentage questions.

Page 10

• The two common ways to represent frequency distributions of categorical data are bar graphs and pie charts.

• For a bar graph, place the categories on the horizontal axis and either the frequency or the percent on the vertical axis.

Page 11

• For a pie chart, make sure each sector is labeled, appropriately sized, and contains the percent.
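One way to size the sectors appropriately is to give each category the same share of the 360-degree circle as its share of the responses. The sketch below assumes that convention; the category names and counts are hypothetical.

```python
# Hypothetical category counts (not from the slides).
counts = {"Car": 12, "Bus": 6, "Bike": 4, "Walk": 2}
total = sum(counts.values())

sectors = {}
for category, freq in counts.items():
    percent = 100 * freq / total  # label printed on the sector
    angle = 360 * freq / total    # sector size in degrees
    sectors[category] = (percent, angle)

for category, (percent, angle) in sectors.items():
    print(f"{category:<6}{percent:>6.1f}%{angle:>8.1f} degrees")
```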

Page 12

Simple Frequency Distribution for Numerical Data

• Frequency distributions for numerical data are either simple (each individual value is listed with its frequency) or grouped (values are combined into equal-sized classes, and the frequency of each class is listed).

• Grouped frequency distributions are used when we have a large number of observations.

Page 13

• Examples:

Page 14

To construct a simple frequency distribution:

1) Find the lowest and highest numbers.

2) In column form, write in ascending order all the consecutive numbers from the lowest to highest.

3) Count the frequency.

Page 15

Example: Construct a simple frequency distribution.

7 9 5 7 9 7 5 7 9 6
10 7 6 5 7 8 10 6 9 6
8 12 8 8 7 5 7 6 8 7
5 6 9 7 6 8 7 5 5 6
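The three construction steps can be checked against this example with a short sketch. It assumes the run of numbers is read as four rows of ten single scores, 40 values in total; the variable names are illustrative.

```python
from collections import Counter

# The 40 scores from the example, read as four rows of ten values.
scores = [7, 9, 5, 7, 9, 7, 5, 7, 9, 6,
          10, 7, 6, 5, 7, 8, 10, 6, 9, 6,
          8, 12, 8, 8, 7, 5, 7, 6, 8, 7,
          5, 6, 9, 7, 6, 8, 7, 5, 5, 6]

# Step 1: find the lowest and highest numbers.
lowest, highest = min(scores), max(scores)

# Step 2: list every consecutive number from lowest to highest,
# including values that never occur.
# Step 3: count the frequency of each (Counter returns 0 for absent values).
counts = Counter(scores)
table = [(value, counts[value]) for value in range(lowest, highest + 1)]

print(f"{'Score':>5}{'Frequency':>11}")
for value, freq in table:
    print(f"{value:>5}{freq:>11}")
```

Because step 2 lists all consecutive numbers, the score 11 appears in the table with a frequency of 0 even though it never occurs in the data.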

Page 16

• In a grouped frequency distribution, the numbers are usually grouped into equal-sized ranges called class intervals.

• Each class interval contains a lower class limit and an upper class limit.

• The class width is how wide each interval is, and should usually be equal amongst intervals.

• If the data contains an extremely small or large value, it might not be possible to have intervals of equal width. In this case, use an open-end class interval.

Page 17

Example: Identify the class width for the following class intervals.

Page 18

• The class mid-point is the average of the lower and upper class limits.

• The mid-point of a class interval 10-15 would be: (10 + 15) / 2 = 12.5

• What is the class mid-point for a class interval of 20-40?

Page 19

Steps to Construct a Grouped Frequency Table

1) Find the highest and lowest values in the dataset and subtract to find the range.

2) Decide on the desired number of classes (it should be between 5 and 20) and then compute the class width by dividing the range by the desired number of classes. Note: There is no clear right or wrong answer for the number of classes.

3) Select a starting point (lower class limit). Use either the lowest number or a convenient number slightly lower than the lowest score.

4) Add the class width to the starting point to get the second lower limit. List the lower limits in a vertical column and enter the corresponding upper limits. Then fill in the values.

Page 20

Example: The daily high temperature in degrees F for the month of July in Carucciville was as follows:

85 83 96 101 97 90 106 102 82
106 104 72 89 85 97 85 94 100
92 96 104 102 75 99 92 79 94
102 76 99

Construct a grouped frequency distribution for the data.

Page 21


Note: There are 8 classes instead of the desired 7 due to rounding.
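The four construction steps can be traced on the temperature data. The table from the slide did not survive in this transcript, so the sketch below rests on assumptions: the class width is rounded up to a whole number, and the starting point is taken to be 70, a convenient number slightly below the lowest value. Those choices happen to give 8 classes, consistent with the note above, but they may not match the original table exactly.

```python
import math

# Daily high temperatures from the example (30 values as listed).
temps = [85, 83, 96, 101, 97, 90, 106, 102, 82,
         106, 104, 72, 89, 85, 97, 85, 94, 100,
         92, 96, 104, 102, 75, 99, 92, 79, 94,
         102, 76, 99]

desired_classes = 7

# Step 1: range = highest - lowest.
data_range = max(temps) - min(temps)

# Step 2: class width = range / desired classes, rounded up (assumption).
class_width = math.ceil(data_range / desired_classes)

# Step 3: starting point; 70 is an assumed convenient value below the minimum of 72.
start = 70

# Step 4: keep adding the class width to get each lower limit, then count.
classes = []
lower = start
while lower <= max(temps):
    upper = lower + class_width - 1  # whole-number data: 70-74, 75-79, ...
    freq = sum(lower <= t <= upper for t in temps)
    classes.append((lower, upper, freq))
    lower += class_width

print(f"{'Class':<10}{'Frequency':>10}")
for lower, upper, freq in classes:
    print(f"{f'{lower}-{upper}':<10}{freq:>10}")
```

Starting at 72 instead of 70 would cover the data in exactly 7 classes, which shows how the choice of starting point, together with rounding the width, changes the final number of classes.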

Page 22

We can also find the relative frequencies for each class.
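A relative frequency is a class frequency divided by the total number of observations. The sketch below assumes the temperature data was grouped into classes of width 5 starting at 70; the original table is not available in this transcript, so the class labels and counts are an assumption.

```python
# Class frequencies for the temperature data, assuming width-5 classes from 70.
class_freqs = {"70-74": 1, "75-79": 3, "80-84": 2, "85-89": 4,
               "90-94": 5, "95-99": 6, "100-104": 7, "105-109": 2}

n = sum(class_freqs.values())  # total number of observations

# Relative frequency = class frequency / total.
relative = {label: freq / n for label, freq in class_freqs.items()}

print(f"{'Class':<10}{'Freq':>6}{'Rel. freq':>11}")
for label, freq in class_freqs.items():
    print(f"{label:<10}{freq:>6}{relative[label]:>11.3f}")
```

The relative frequencies always sum to 1 (or 100 when expressed as percentages), which is a quick check on the arithmetic.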

Page 23

• Sometimes, if a variable is continuous, the values recorded in the study may be rounded off.

• In these instances, we want to know the real limits or class boundaries.

• The real limits of a continuous variable are usually the values that are above or below the recorded value by one-half of the place value to which the numbers were rounded.

• Example: Say we are examining height and rounding to the nearest centimeter. If 130 cm is recorded, the real limits are 129.5-130.5 cm, because anything in those limits would result in us recording 130 cm.
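The half-unit rule can be written as a small helper. The function names `real_limits` and `class_boundaries` are illustrative, and `unit` stands for the place value the data was rounded to (1 for the nearest whole centimeter).

```python
def real_limits(recorded, unit=1.0):
    """Real limits of one recorded value rounded to the nearest `unit`."""
    half = unit / 2
    return (recorded - half, recorded + half)

def class_boundaries(lower_limit, upper_limit, unit=1.0):
    """Class boundaries: extend each class limit outward by half a unit."""
    half = unit / 2
    return (lower_limit - half, upper_limit + half)

# Height recorded as 130 cm, rounded to the nearest centimeter.
print(real_limits(130))           # (129.5, 130.5)

# A hypothetical class interval of 70-74 with whole-number data.
print(class_boundaries(70, 74))   # (69.5, 74.5)
```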

Page 24

Example: Find the real class limits or the class boundaries for the following class intervals:

Page 25

A histogram is a graph in which vertical bars represent the frequency of occurrence of scores in a distribution; the area of each bar corresponds to the frequency of its class.

Page 26

A relative frequency histogram has the same shape as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
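The claim that both histograms share one shape can be seen in a rough text rendering. The class frequencies below are hypothetical, the bars are drawn sideways with `#` characters because plain text cannot draw vertical bars, and `text_histogram` is an illustrative helper rather than a standard function.

```python
# Hypothetical class frequencies (not from the slides).
freqs = {"10-19": 2, "20-29": 5, "30-39": 8, "40-49": 4, "50-59": 1}
n = sum(freqs.values())

# Relative frequency = frequency / total.
rel_freqs = {label: f / n for label, f in freqs.items()}

def text_histogram(heights, scale=1.0):
    """One line per class: a bar of '#' proportional to the height, then the height."""
    return [f"{label:>6} | {'#' * round(h * scale)} {h:g}"
            for label, h in heights.items()]

freq_hist = text_histogram(freqs)
# Rescale by n so the bars are drawn at a comparable size; only the axis labels differ.
rel_hist = text_histogram(rel_freqs, scale=n)

print("Frequency histogram")
print("\n".join(freq_hist))
print("Relative frequency histogram")
print("\n".join(rel_hist))
```

The bars in the two printouts have identical lengths; only the numbers marking their heights change from counts to proportions.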