psychology 510/511 lecture 3 · web viewmore than you ever wanted to know about the standard...


PSY 5100 Lecture 2

Numeric Descriptors. What is described?

What kinds of characteristics can a collection of numbers have?

People can be kind, aloof, gregarious, tall, friendly, mean, spacy, etc. Cities can be forward-looking, violent, progressive, etc. Cars can be fast, economical, stylish, ugly, heavy, etc.

Just as there are certain characteristics which seem to "belong" to people or cities or cars, there are a few characteristics which "belong" to collections of numbers and which statisticians feel should be mentioned whenever an attempt is made to describe a collection.

The Big Three Characteristics of data

1: Central Tendency

The first characteristic is called the central tendency. (It's also called "average" value, location, and expected value.) It reflects the sizes of the numbers in the collection.

Consider the following weights: 230, 260, 305, 195. Compare them with the following: 115, 120, 105, 94, 110, 115, 100, 90, 85.

Even though the second collection has more scores in it, the central tendency of the first is larger. The scores in the first collection are larger than those in the second.
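As a quick check on this comparison, the sketch below computes the arithmetic mean of each collection with Python's standard `statistics` module. The mean is used here only as one convenient index of central tendency (it is defined later in these notes); the variable names are illustrative.

```python
from statistics import mean

# The two collections of weights from the text
heavier = [230, 260, 305, 195]
lighter = [115, 120, 105, 94, 110, 115, 100, 90, 85]

# Fewer scores, but larger ones: the first collection has the larger mean
print(mean(heavier))            # 247.5
print(round(mean(lighter), 1))  # 103.8
```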

2: Variability

The second important characteristic of collections of numbers is the variability of the values. It is also called the dispersion, heterogeneity or width of the values. This characteristic reflects the differences between the values. If all the values are close to each other we say that variability is small. If the values in the collection are quite different from each other, we say that variability is large.

Consider the following collection: 150, 155, 158, 160, 153, 156, 152. Compare it with: 85, 175, 305, 95, 130.

Note that the scores in the second collection are quite different from each other. Thus, the second collection is more variable than the first.
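Two numeric indexes that are defined later in these notes, the range and the sample standard deviation, both agree that the second collection is far more variable. A minimal sketch using only the standard library (the helper name `value_range` is illustrative):

```python
from statistics import stdev

# The two collections from the variability example
tight = [150, 155, 158, 160, 153, 156, 152]
spread = [85, 175, 305, 95, 130]

def value_range(scores):
    """Largest score minus smallest score."""
    return max(scores) - min(scores)

print(value_range(tight), value_range(spread))  # 10 220
# Sample (N - 1) standard deviation of each collection
print(round(stdev(tight), 1), round(stdev(spread), 1))
```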

3: Shape

Shape refers to the way score values are positioned on the number line. In some distributions, the scores are all piled up on one side or the other. In others, the scores are piled up in the middle.

Shape will be considered in detail after graphical methods of description have been introduced.

Other Characteristics

4. Relation between paired values.

We will consider the relation or correlation between paired data later in the course.

Measures of CT and Variability - 1 5/16/2023


Numeric Measures of Central Tendency and Variability

Howell Chapters 4 & 5

Pros and Cons of Tables and Graphs

Pros

1. Easy for laypeople to understand.

2. Many are fairly easy to construct.

3. Show the complexities of distributions and comparisons of distributions – central tendency, variability, shape, outliers all in one presentation.

4. Particularly good for identifying problem distributions and outliers.

5. Don’t require or assume specific distribution shape, such as normality.

Cons (relative to numeric summaries)

1. Take up space.

2. Are not amenable to further computations – no analog to a mean of means, for example.

3. Richness of information may make you crazy.

4. Not useful for generalizing from samples to populations.

Numeric Summaries

Single values chosen to represent a characteristic of data.

Measures of central tendency – Single values chosen to represent the central tendency of a collection.

Measures of variability – Single values chosen to represent variability of a collection.

Measures of skewness – Single values chosen to represent skewness of a distribution.

Measures of kurtosis – Single values chosen to represent how similar the shape of a distribution is to the shape of the normal distribution.

Looking ahead

Measures of correlation – The extent to which values of one variable covary with paired values of another variable.

There are fewer than 20 measures that you’ll have to be able to interpret as a data analyst.


Measures of central tendency

From worst to best

The Mode:

Definition: Value that occurred most frequently in the collection.

Example data: 5 6 7 7 7 7 8 9 10 11 13 Mode is 7

Problems: 1) Often not uniquely defined, especially with small samples (several values may tie for most frequent).

E.g., What’s the mode of 3,4,5,5,6,7,8,8,9?

2) Very unstable (unreliable) from sample to sample.

Should only be reported . . .

1) When it dominates the data, e.g., 70% of scores are one value.

2) When data are nominal, e.g., gender, ethnic group, in which case other quantitative measures are not appropriate.

Don’t report it (on penalty of lost points) in other situations
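Python's `statistics.multimode` (Python 3.8 and later) returns every value tied for the highest frequency, which makes the "no single mode" problem from the second example visible. A minimal sketch:

```python
from statistics import multimode

single = [5, 6, 7, 7, 7, 7, 8, 9, 10, 11, 13]
double = [3, 4, 5, 5, 6, 7, 8, 8, 9]

print(multimode(single))  # [7]
print(multimode(double))  # [5, 8]: two values tie, so there is no single mode
```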


The median

Conceptual definition: Value above which and below which 50% of scores fall.

Example data: How about: 2 4 6 8 Hmm. We need to be more precise.

Operational definition: 1) Order the scores. 2) For odd N, the median is the middle score in the ordered list.

For even N, the median is the average of the two middle scores in the ordered list.

Example 1 – N is odd

Xs: 81, 69, 77, 93, 96, 99, 83, 85, 75, 89, 94

Ordered: 69, 75, 77, 81, 83, 85, 89, 93, 94, 96, 99. Median is 85.

Xs: 81, 69, 77, 93, 96, 99, 85, 85, 75, 89, 94

Ordered: 69, 75, 77, 81, 85, 85, 89, 93, 94, 96, 99. Median is 85.

Example 2 – N is even

X’s: 81, 69, 77, 93, 96, 99, 83, 85, 75, 89, 94, 57

Ordered: 57, 69, 75, 77, 81, 83, 85, 89, 93, 94, 96, 99. Median is (83+85)/2 = 84.
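The operational definition translates almost line for line into code. This is a minimal sketch (the function name is illustrative) that reproduces both examples above; the standard library's `statistics.median` follows the same odd/even rule.

```python
def median(scores):
    """Median using the operational definition in the text."""
    if not scores:
        raise ValueError("median of an empty collection is undefined")
    ordered = sorted(scores)   # 1) order the scores
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]    # odd N: the middle score
    # even N: average of the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_n = [81, 69, 77, 93, 96, 99, 83, 85, 75, 89, 94]
even_n = odd_n + [57]

print(median(odd_n))   # 85
print(median(even_n))  # 84.0
```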

Pros

1. Gives an indication of the center of the distribution.

2. Usually not affected by outliers. E.g.,

Median of 69, 75, 77, 81, 83, 85, 89, 93, 94, 96, 999 is 85.

So the 999 didn’t affect it. Robust with respect to outliers.

3. All in all, a very useful measure.

Cons

1. For normally distributed data with absolutely no outliers, the median is slightly less stable from sample to sample than the mean.

2. Not a part of the normal distribution formula. Not descended from royalty.


The mean

Definition: Arithmetic average of the scores.

Weighted sum of the scores with weighting equal to 1/N.

Symbols

Group:    Sample        Population
Symbol:   X̄ or MX       µ (pronounced “myou”)

Pros

1. Good heritage – comes from royalty. It’s a part of the normal distribution formula.

2. For normally distributed data with no outliers, most stable from sample to sample.

3. Computation is straightforward, doesn’t involve sorting.

Cons

1. Can be dramatically affected by outliers.

For example, the mean of 69, 75, 77, 81, 83, 85, 89, 93, 94, 96, 99 from above is 85.5.

But the mean of 69, 75, 77, 81, 83, 85, 89, 93, 94, 96, 999 is 167.4, a value not close to ANY of the original scores. Compare this with the median of the above data. You should always compute both and compare them.

2. Related to the above, many analysts feel that the mean is unrepresentative of skewed data. So you should routinely compute the median AND the mean. If they’re approximately equal, then use the mean. If they’re different, then the median is probably more appropriate.
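Following the advice to always compute both, this sketch reproduces the outlier comparison with the standard `statistics` module. The list names are illustrative.

```python
from statistics import mean, median

clean = [69, 75, 77, 81, 83, 85, 89, 93, 94, 96, 99]
with_outlier = [69, 75, 77, 81, 83, 85, 89, 93, 94, 96, 999]  # last score replaced by 999

for label, scores in [("clean", clean), ("with outlier", with_outlier)]:
    # Compute both measures and compare them, as the text recommends
    print(label, round(mean(scores), 1), median(scores))
# clean 85.5 85
# with outlier 167.4 85
```

A large gap between the two numbers, as in the second line, is the signal to prefer the median.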


If you mated a cat that says “meow” and a cow that says “moo”, the offspring would say “mu”.

[Figure: Worst-to-Best scale placing the Mode, the Median, and the Mean]


Trimmed mean

Definition: Mean of the scores remaining after the largest K% and smallest K% have been removed. Typically, K is 5.

Having your cake and eating it too - gives the benefits of the mean without the sensitivity to extreme values.

Olympic tradition: in some judged events, the most extreme judges’ scores are dropped before the rest are averaged.

Pros

1. Less affected by outliers.

2. May be more reliable from sample to sample than the median.

Cons

1. Still not representative of skewed data in my view.
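A minimal sketch of the trimmed mean. The definition says "largest K% and smallest K%" but not how to round K% of N; this sketch ASSUMES floor(K × N) scores are dropped from each end (the function name is illustrative). Note that with only 11 scores, 5% trimming drops nothing at all.

```python
import math

def trimmed_mean(scores, k=0.05):
    """Mean after removing the largest and smallest k-fraction of the scores.

    Assumption: floor(k * N) scores are dropped from each end.
    """
    if not scores:
        raise ValueError("trimmed mean of an empty collection is undefined")
    if not 0 <= k < 0.5:
        raise ValueError("k must be in [0, 0.5)")
    ordered = sorted(scores)
    cut = math.floor(k * len(ordered))
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)

with_outlier = [69, 75, 77, 81, 83, 85, 89, 93, 94, 96, 999]

print(round(trimmed_mean(with_outlier, k=0.10), 2))  # 85.89: 69 and 999 dropped
print(round(trimmed_mean(with_outlier, k=0.05), 2))  # 167.36: 5% of 11 rounds down to 0
```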

When to use the various measures of Central Tendency

Memorize this table. Make a locket out of it.

I. Numeric Variables

Distribution shape:         Unimodal and Symmetric (US)    Skewed
No outliers                 Mean                           Median
Outliers may be present     Median or Trimmed Mean         Median

Some common skewed distributions: Salaries, Time to criterion, Housing prices

II. Nominal Variables.

The mode is the only measure that makes sense when you're attempting to summarize nominal data.


Measures of Variability

1. The Range

Definition: Difference between the largest score and the smallest score.

Two problems:

1. Range is restricted whenever score values are restricted.

Use of 5-point scales on questionnaires is a good example.

2. Range is unstable from sample to sample.

Don’t use as the primary measure.


2. The Interquartile Range

Quartiles: Points identifying "quarters" of a distribution.

Conceptual Definitions

Q4 Fourth Quartile The value below which 4/4ths of the distribution falls.

Q3 Third Quartile The value below which 3/4ths of the distribution falls.

Q2 Second Quartile The value below which 2/4ths of the distribution falls.

Q1 First Quartile The value below which 1/4th of the distribution falls.

Q0 "Zeroth" Quartile The value below which 0/4ths of the distribution falls.

Operational Definitions

Q4 The largest score in the distribution.

Q3 The median of the upper half of the distribution. (If N is odd, include the overall median in the upper half.)

Q2 The overall median of the collection. Compute using the median formula.

Q1 The median of the lower half of the distribution. (If N is odd, include the overall median in the lower half.)

Q0 The smallest score in the distribution.

Interquartile Range: The distance (on the number line) between Q1 and Q3 - between the first quartile and the third quartile.

IQR = Q3 - Q1

Interpretation

The distance or interval size required to contain the middle 50% of the scores.

If the middle 50% is contained in a small area, the distribution is quite "crowded" - the scores are close to each other; the distribution has little variability.

If the middle 50% is contained in a wide area, the distribution is sparse - the scores are far from each other; the distribution has much variability.


Example - A distribution with an even number of scores (N = 14).

75 65 50 45 40 40 35 35 30 30 30 25 25 10

Upper half of distribution: 75 65 50 45 40 40 35 (median = Q3 = 45)
Lower half of distribution: 35 30 30 30 25 25 10 (median = Q1 = 30)

IQR = 45 – 30 = 15.

Example - A distribution with an odd number of scores (N = 11).

65 50 45 40 35 35 30 25 25 20 15

Upper half of distribution: 65 50 45 40 35 35 (median = Q3 = (45 + 40)/2 = 42.5)
Lower half of distribution: 35 30 25 25 20 15 (median = Q1 = (25 + 25)/2 = 25)

Note that 35, the overall median, is included in both the lower and upper halves.

IQR = 42.5 – 25 = 17.5
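The operational definitions of the quartiles can be sketched in a few lines of Python. This sketch follows the rule in the text (for odd N, the overall median goes into both halves) and reproduces the two worked examples. Statistical packages often use other percentile rules, so their quartiles can differ slightly from these; the function names are illustrative.

```python
def median(xs):
    """Median of a non-empty list (odd N: middle score; even N: mean of the middle two)."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(scores):
    """Q0..Q4 using the lecture's operational definitions."""
    ordered = sorted(scores)
    n = len(ordered)
    half = (n + 1) // 2        # half size; includes the overall median when N is odd
    lower = ordered[:half]
    upper = ordered[n - half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

def iqr(scores):
    """Interquartile range: Q3 - Q1."""
    _, q1, _, q3, _ = quartiles(scores)
    return q3 - q1

even_example = [75, 65, 50, 45, 40, 40, 35, 35, 30, 30, 30, 25, 25, 10]
odd_example = [65, 50, 45, 40, 35, 35, 30, 25, 25, 20, 15]

print(iqr(even_example))  # 15
print(iqr(odd_example))   # 17.5
```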


Data Examples illustrating the Interquartile Range

Conscientiousness scale scores from the Bias Study Questionnaire Packet administered at the beginning of the semester in 2008. Each person’s score was the mean of either 10 items (IPIP) or 12 items (NEO-FFI). For each, the response scale was a 5-point scale, numbered from 1 to 5.

Distribution of Conscientiousness scores from the IPIP Personality Questionnaire.

About 50% of scores are between 3.25 and 4.

Distribution of Conscientiousness scores from the NEO-FFI Personality Questionnaire

About 50% of the scores are between 3.3 and 4.1.

Both the IPIP questionnaire at the top and the NEO questionnaire at the bottom were scored on the same 5-point scale.

The two distributions are pretty nearly identical and about equally variable.


Statistics: ncon (NEO-FFI Conscientiousness)
N Valid 189
N Missing 0
Mean 3.70767
Median 3.83333
Std. Deviation .574311
Range 2.750
Percentiles: 25th 3.33333; 50th 3.83333; 75th 4.12500

Interquartile range = 4.12 – 3.33 = 0.79

Statistics: icon (IPIP Conscientiousness)
N Valid 189
N Missing 0
Mean 3.59418
Median 3.60000
Std. Deviation .614729
Range 3.000
Percentiles: 25th 3.25000; 50th 3.60000; 75th 4.00000

Interquartile range = 4.00 – 3.25 = 0.75


3. Variance Start here on 9/3/17.

Why do we need another measure? We need a measure that reflects EVERY score.

Definition 1: The sum of the squared differences of all the scores from the mean divided by N.

This is the “dividing by N” definition. Use this formula for populations.

Definition 2: The sum of the squared differences of the scores from the mean divided by N-1.

This is called, you guessed it, the “dividing by N-1” definition. Use this formula for samples.
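The two definitions differ only in the divisor. A minimal sketch, using a small made-up set of scores (not from the lecture), compares hand-written versions with the standard library's `statistics.pvariance` (divide by N) and `statistics.variance` (divide by N - 1); the helper names are illustrative.

```python
from statistics import pvariance, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative scores, mean = 5

def sum_sq_dev(xs):
    """Sum of squared differences of the scores from their mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def var_by_n(xs):
    """Definition 1: divide by N (populations)."""
    return sum_sq_dev(xs) / len(xs)

def var_by_n_minus_1(xs):
    """Definition 2: divide by N - 1 (samples)."""
    return sum_sq_dev(xs) / (len(xs) - 1)

print(var_by_n(scores), pvariance(scores))         # both equal 4
print(var_by_n_minus_1(scores), variance(scores))  # both equal 32/7
```

Taking `math.sqrt` of either result gives the corresponding standard deviation.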

The variance is a useful theoretical measure of variability, but it’s not useful as a descriptive measure because it’s in squared units. (If scores are in pounds, the variance is in squared pounds.)

Variance is part of the normal distribution formula, so it has good roots.

Variance is a part of many formulas (e.g., t, F) in inferential statistics.

Pros

1. Reflects the value of every score in the sample or population.

2. Is part of the Normal Distribution formula, so it has good roots.

3. Is a key quantity in many inferential statistics.

Cons

1. Since the deviations are squared, the value of the variance often does not look like the raw data.


4. Standard Deviation

Definition 1: Square root of the sum of the squared differences of the scores from the mean divided by N. That is, the standard deviation is the square root of the variance. This definition is for populations.

Definition 2: Square root of the sum of the squared differences of the scores from the mean divided by N-1. This definition is for samples.

Wait! Is this déjà vu all over again? Do these formulas seem familiar?

They should, because the standard deviation is simply the square root of the variance.

Symbols and formulas for the Variance and Standard Deviation

Sample variance: symbol S², $S^2 = \frac{\Sigma(X - \text{Mean})^2}{N - 1}$

Sample standard deviation: symbol S, $S = \sqrt{\frac{\Sigma(X - \text{Mean})^2}{N - 1}}$

Population variance: symbol σ², $\sigma^2 = \frac{\Sigma(X - \text{Mean})^2}{N}$

Population standard deviation: symbol σ, $\sigma = \sqrt{\frac{\Sigma(X - \text{Mean})^2}{N}}$

Pros of the standard deviation

1. Good roots – is in the normal distribution formula.

2. Most reliable for normal distributions (with no outliers).

Cons of the standard deviation

1. Inflated by the presence of outliers. Can be dramatically inflated by them.

2. What’s it mean??

In spite of the Cons, the standard deviation is the most frequently used measure of variability.


Wrap up – when to use each measure of variability

                      US distribution                  Skewed distribution
No outliers           Standard deviation               IQR
Outliers possible     IQR                              IQR

Measures of Central Tendency and Variability – Biderman’s recommendations

                      US distribution                  Skewed distribution
No outliers           Mean, Standard deviation         Median, IQR
Outliers possible     Median or Trimmed Mean, IQR      Median, IQR

Of course, since it’s so easy to compute ALL of the measures, you don’t have to restrict yourself to just one. Compute them all.

Use a comparison of the different measures to give yourself a better understanding of your data.

If all measures of central tendency are about equal to each other, that tells you your data are well-behaved.

If all measures of variability are what you would expect, that also tells you your data are well-behaved.
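"Compute them all" is easy to automate. A minimal sketch, with an illustrative `summarize` helper, applied to the lists from the median and mean examples. It ASSUMES quartiles from `statistics.quantiles(..., method="inclusive")`, which is not the hinge rule described earlier, so its IQR can differ slightly from a hand computation on other data.

```python
from statistics import mean, median, quantiles, stdev

def summarize(scores):
    """Several measures at once, so they can be compared."""
    # Assumption: quartiles via the "inclusive" interpolation method
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "mean": mean(scores),
        "median": median(scores),
        "sd": stdev(scores),  # N - 1 (sample) definition
        "iqr": q3 - q1,
    }

clean = [69, 75, 77, 81, 83, 85, 89, 93, 94, 96, 99]
with_outlier = [69, 75, 77, 81, 83, 85, 89, 93, 94, 96, 999]

for label, scores in [("clean", clean), ("with outlier", with_outlier)]:
    s = summarize(scores)
    print(label, {name: round(value, 1) for name, value in s.items()})
```

In the second line the mean and SD jump while the median and IQR stay the same, which is the kind of disagreement between measures that signals data that are not well-behaved.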


More than you ever wanted to know about the Standard deviation

Assume you have a large (e.g., N >= 30) collection of scores that are unimodal and symmetric.

1. About 2/3 of the scores will be within 1 SD of the mean

2. About 95% of the scores will be within 2 SDs of the mean

So, if you scored 2 standard deviations above the mean in Conscientiousness (IPIP scale: mean ≈ 3.6, SD ≈ .61), what would be your approximate score? 2 SDs above the mean would be 3.6 + .61 + .61 = 4.82. Two SDs below is 3.6 – 1.22 = 2.38, or about 2.4.

We use these facts to get a “feeling” for a collection of scores and to identify persons who are extreme.
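For an exactly normal distribution, the "about 2/3" and "about 95%" figures are 68.3% and 95.4%. A minimal sketch using the standard library's `statistics.NormalDist`, which also redoes the Conscientiousness arithmetic with the rounded IPIP mean and SD (variable names are illustrative):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1
within_1 = z.cdf(1) - z.cdf(-1)
within_2 = z.cdf(2) - z.cdf(-2)
print(round(within_1, 3), round(within_2, 3))  # 0.683 0.954

# IPIP Conscientiousness, rounded mean and SD from the SPSS output
mean_c, sd_c = 3.6, 0.61
upper_2sd = mean_c + 2 * sd_c
lower_2sd = mean_c - 2 * sd_c
print(round(upper_2sd, 2), round(lower_2sd, 2))  # 4.82 2.38
```

Real data are only approximately normal, so observed proportions will differ somewhat from these values.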


[Figure: distribution marked at Mean - 2 SD, Mean - SD, Mean, Mean + SD, and Mean + 2 SD; about 2/3 of scores fall within 1 SD of the mean and about 95% within 2 SDs]


Measures of distribution shape

Measure of skewness

A popular measure of skewness is the following, given by

Kirk, R. (1999). Statistics: An introduction. 4th Ed. New York: Harcourt Brace.

$\text{Skewness} = \frac{\Sigma(X - \text{Mean})^3 / N}{S^3}$

In English: The sum of the cubed deviations of scores from the mean divided by N, then divided by the cube of the standard deviation.

Or, the average of the cubed deviations of scores from the mean then divided by the cube of the standard deviation.

Interpretation of values

Value of Skewness measure Interpretation

Larger than 0 Positively skewed distribution

0 Symmetric distribution

Less than 0 Negatively skewed distribution
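A minimal sketch of Kirk's skewness formula, ASSUMING S is the N - 1 sample standard deviation (the symbol used for samples in the table of symbols). Packages such as SPSS report a bias-adjusted version, so their values are likely to differ somewhat, especially for small samples. The function name is illustrative.

```python
from statistics import mean, stdev

def skewness(scores):
    """Kirk's skewness: mean cubed deviation divided by S cubed.

    Assumption: S is the N - 1 sample standard deviation.
    """
    n = len(scores)
    m = mean(scores)
    s = stdev(scores)
    return (sum((x - m) ** 3 for x in scores) / n) / s ** 3

print(skewness([1, 2, 3, 4, 5]))       # 0.0: symmetric
print(skewness([1, 1, 1, 2, 10]) > 0)  # True: long tail to the right
```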


Example of the skewness statistic

1. Salaries from the Employee Data file.

2. Extroversion scores of 109 UTC students


Statistics: hext (Extroversion)
N Valid 109
N Missing 1
Skewness -.220
Std. Error of Skewness .231

[Histogram of hext (x-axis 0.00 to 8.00, y-axis Frequency): Mean = 4.4582, Std. Dev. = 0.95104, N = 109]

Statistics: salary (Current Salary)
N Valid 474
N Missing 0
Skewness 2.125
Std. Error of Skewness .112

[Histogram of Current Salary (x-axis $0 to $140,000, y-axis Frequency): Mean = $34,419.57, Std. Dev. = $17,075.661, N = 474]


Kurtosis

Kurtosis refers to the relationship of the shape of a distribution to the shape of the Normal Distribution.

Kirk gives the following measure of Kurtosis

$\text{Kurtosis} = \frac{\Sigma(X - \text{Mean})^4 / N}{S^4} - 3$

In English: The sum of the deviations of scores from the mean raised to the fourth power, divided by N, then divided by the standard deviation raised to the fourth power, and finally minus 3.

The average of the 4th-powered deviations from the mean divided by the standard deviation to the 4th power, then minus 3.

Interpretation

Value of Kurtosis measure Interpretation

Larger than 0 More peaked than the Normal distribution

0 Same peakedness as the Normal distribution.

Less than 0 Less peaked (flatter) than the Normal distribution.
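The kurtosis formula can be sketched the same way, again ASSUMING S is the N - 1 sample standard deviation; SPSS uses a bias-adjusted version, so its values may differ somewhat for small samples. The function name and test data are illustrative.

```python
from statistics import mean, stdev

def kurtosis(scores):
    """Kirk's kurtosis: mean 4th-power deviation divided by S**4, minus 3.

    Assumption: S is the N - 1 sample standard deviation.
    """
    n = len(scores)
    m = mean(scores)
    s = stdev(scores)
    return (sum((x - m) ** 4 for x in scores) / n) / s ** 4 - 3

print(round(kurtosis([1, 2, 3, 4, 5]), 3))  # -1.912: flatter than normal
print(kurtosis([0] * 10 + [-10, 10]) > 0)   # True: peaked center, heavy tails
```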


Example

1. Extroversion scores of 109 UTC students

Although it’s not immediately apparent from the histogram, according to the Kurtosis measure the distribution is slightly less peaked than the Normal Distribution.


Statistics: hext (Extroversion)
N Valid 109
N Missing 1
Kurtosis -.371
Std. Error of Kurtosis .459

[Histogram of hext (x-axis 0.00 to 7.00, y-axis Frequency): Mean = 4.4582, Std. Dev. = 0.95104, N = 109]


2. Measure of Affect from responses of 1195 students to the HEXACO questionnaire.


Statistics: hmeanesemtrocrmmpmn (HEX mean M,Mp,Mn from ESEMTROCR MMpMn model)
N Valid 1195
N Missing 613
Mean .0000
Median .0173
Mode .16 (a)
Std. Deviation .51524
Variance .265
Skewness -.112
Std. Error of Skewness .071
Kurtosis -.087
Std. Error of Kurtosis .141

a. Multiple modes exist. The smallest value is shown.


Missing Data

Why consider missing data here? Because the presence of missing data complicates the computation and representation of data using numeric summaries.

Reasons for missing data include:

1) respondents failing to answer questions in a survey.

2) values incorrectly entered into the computer.

3) values that represent “Don’t Know” or “Don’t Care” or “Won’t tell you” responses.

In SPSS parlance, a missing value is an actual value that was put into the data not as a valid data value but to represent the fact that a score is, in fact, missing.

In SPSS, an empty cell in the data editor means that the value that should have been there is missing. But in many situations, an actual value must be recorded when there is a missing response. Such values are the “missing values” we’re dealing with here.

If you’re saving data as a text file for use in another program, it is often easiest to make sure that every cell in the data editor has something in it prior to saving.

Missing values are not a terribly important issue when frequency distributions and graphs are used to summarize data because they’re just part of the summary.

But when a statistic is to be computed, values that “don’t count” should not be included in the computation. The statistical package has to be told that such values are special and are not to be included in computation of statistics.

Missing data are represented in SPSS in two ways.

1) Empty cells in the Data Editor window. These are called SYSTEM MISSING.

2) Actual values entered into the Data Editor window but given “Missing Value” status by you.

In Rcmdr, the NA symbol is used to represent missingness.

In Excel, only empty cells are recognized as missing values.
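Outside a statistics package, the same bookkeeping has to be done by hand. A minimal sketch, ASSUMING a hypothetical coding scheme in which `None` plays the role of an empty (system-missing) cell and 99 is a user-declared missing-value code such as "Don't know"; the names `MISSING_CODES` and `valid` are illustrative.

```python
from statistics import mean

# Hypothetical coding: None = empty cell, 99 = declared missing-value code
MISSING_CODES = {99}

raw = [4, 5, 99, 3, None, 4, 99, 5]

def valid(values, missing_codes=MISSING_CODES):
    """Keep only values that should count in a statistic."""
    return [v for v in values if v is not None and v not in missing_codes]

naive = mean([v for v in raw if v is not None])  # 99s wrongly treated as scores
print(round(naive, 2))   # 31.29
print(mean(valid(raw)))  # 4.2
```

The naive mean is the mistake the text warns about: the package (or your code) must be told which values "don't count."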


To tell SPSS that one of the values of a variable is to be treated as a “Missing Value”,

1) Click on the “Variable View” tab at the lower left of the Data Editor window.

2) Click under “Missing” in the same row as the variable for which missing values are to be declared.

3) Enter the values to be treated as missing in the dialog box shown below.


Comparing Groups Using SPSS

One of the goals of the ATV study was to determine whether severity of injury was related to whether or not the rider was wearing a helmet.

SPSS FREQUENCIES output for helmet

Suppose we wished to compare injury severity of persons who were wearing a helmet at the time of their accident with the injury severity of persons who were not wearing a helmet.

We might also be interested in the ISS scores of persons for whom no information was available on helmet use.

The variable, helmet, can be used to form three groups – helmet wearers, non-wearers, and no information groups.

Most statistical programs have special procedures for comparing descriptive statistics between groups defined by a variable such as helmet.


The SPSS EXPLORE procedure – a procedure for group comparisons

A procedure in SPSS designed to allow comparison of groups using a variety of descriptive techniques. We’ll compare ISS scores of helmet users vs. non-helmet users.

Analyze > Descriptive Statistics > Explore

The EXPLORE main dialog window (Data are the ATV data)

Analysis specifics


I told the program to give me histograms.


The EXPLORE Output


I clicked on Options and told the program to include reports for missing values.


Whew!!

Note that no statistical tests, such as t-tests or analysis of variance, are reported by EXPLORE.


It appears that the “Info Unavailable” group also had no very high ISS values.

Note that the Helmet group had no patients with very high ISS values.

Note that only the No-helmet group had patients with very high ISS values.

The Histograms

Note:

1. I stacked the histograms vertically – following the rule for comparing groups using histograms.

2. I manipulated the histograms so that they would have equal x-axis labels and equal column widths. You must also do this for every comparison involving histograms that you submit to me.


To manipulate x-axis labels in SPSS.

1. Double-click on the histogram to open the Chart Editor window.

2. Double-click on one of the x-axis numbers

3. Then click on Scale and choose the appropriate scale values – in this case I chose 0, 80, and 10 for Minimum, Maximum, and Major Increment

4. Click on Apply.


To manipulate column width in SPSS

1. Double-click the figure to open the Chart Editor window.

2. Double-click on a column.

3. Click on Binning, then click on Custom, and enter the desired width. I entered 5.

4. Click on Apply.


I clicked on the “Summarize by groups…” button.

Comparing Groups in Rcmdr . . .

R: Load Packages > Rcmdr

Data > Import Data > from SPSS dataset: ATVDataForClass050906.sav

Statistics > Summaries > Numerical summaries . . .

> numSummary(ATVData[,"iss"], groups=ATVData$helmet, statistics=c("mean",
+ "sd", "IQR", "quantiles"), quantiles=c(0,.25,.5,.75,1))
        mean       sd IQR 0% 25% 50% 75% 100% data:n
no  11.39244 8.647624  11  1   5   9  16   75    344
yes  7.84127 4.749215   6  1   4   8  10   25     63
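For readers working outside Rcmdr, a grouped summary like the one above can be sketched in plain Python. This is a hypothetical helper, not an equivalent of `numSummary`; it is run on made-up toy scores, NOT the ATV data. Quartiles use `statistics.quantiles(..., method="inclusive")`, intended to mirror R's default quantile rule; treat that correspondence as an assumption.

```python
from collections import defaultdict
from statistics import mean, quantiles, stdev

def summarize_by_group(values, groups):
    """n, mean, SD and IQR of `values` separately for each level of `groups`.

    Pairs in which either the value or the group is missing (None) are skipped.
    Each group needs at least two scores.
    """
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        if value is not None and group is not None:
            by_group[group].append(value)
    summary = {}
    for group, xs in by_group.items():
        q1, _, q3 = quantiles(xs, n=4, method="inclusive")
        summary[group] = {"n": len(xs), "mean": mean(xs), "sd": stdev(xs), "iqr": q3 - q1}
    return summary

# Made-up toy scores, NOT the ATV data
iss = [10, 20, 30, 5, 7, 9, 12]
helmet = ["no", "no", "no", "yes", "yes", "yes", None]
print(summarize_by_group(iss, helmet))
```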


I clicked on the “Plot by groups…” button.

Graphs > Histogram . . .

As was the case with the SPSS histogram, it’s clear to see that the No helmet group had larger ISS values.


I clicked on the “Plot by groups…” button.

Dot plots in rcmdr by helmet group

Same conclusion from the dot plot – more larger ISS values in the No helmet group.

WEAR YOUR HELMET!!!


Making use of both scale level and scale variability - skipped in 2017

In psychology, we often create scale scores by summing or averaging the responses to several items, all of which refer to the same construct, such as conscientiousness, for example.

Often, the items are responded to on a 1-7 scale, with 1 meaning “Least like me” and 7 meaning “Most like me.”

Example items: “I keep my room orderly.” “I strive to get ahead.”

We typically think only of the level of a psychological variable, how big the responses to all items making up the scale were.

But what about how different an individual’s responses were from item to item – the variability of responses?

Data: IPIP Conscientiousness Scale. Excerpt from Data Editor

gencon is the typical Conscientiousness scale score

sgencon is the standard deviation of responses to the 10 conscientiousness items.


Compare lines 1 and 8 – both have the same scale level (4.00) but 8 is much more variable than 1.

Compare lines 17 and 20 – both have the same variability (1.07) but 20 has a higher scale value than 17.

These examples suggest that both levels and variabilities are exhibited by the responses to questionnaires.

Are these differences of any use to us???


Distributions of level and variability . . .

We looked at both the level of responses – the typical score computed from a questionnaire – and also the variability of responses to Conscientiousness items.

Note that both distributions are approximately unimodal and symmetric, although the distribution of standard deviations is slightly positively skewed.

We’ve found, as have probably more than 100 other researchers, that level of conscientiousness (gencon in the above graph) is a valid predictor of GPA. It’s not a perfect predictor, but it has been found to be statistically significant in a vast majority of studies. People who score high on conscientiousness scales generally get better grades than people with the same intelligence who score lower on conscientiousness.

Now here’s something that is almost new to our research here at UTC: We have found that variability in self-reported conscientiousness (sgencon in the above) is ALSO a valid predictor of GPA. Only about 5 studies have found that – all of them conducted here at UTC. The relationship is inverse. People who are more inconsistent in their self-reports (who have higher sgencon values) have slightly LOWER GPAs than people who are more consistent.

So both level and variability may be of use to us.


The largest levels of Conscientiousness predicted high GPAs. People who are high in level of conscientiousness have higher GPAs.

But the smallest variabilities of Conscientiousness predicted high GPAs. People who are less variable in their report of conscientiousness have higher GPAs.