Notes from Fraenkel and Wallen

How to Design and Evaluate Research in Education

By Jack R. Fraenkel and Norman E. Wallen

Chapter 1 The Nature of Research

Ways of knowing
  Sensory experience (incomplete/undependable)
  Agreement with others (common knowledge can be wrong)
  Experts’ opinion (they can be mistaken)
  Logic/reasoning things out (can be based on false premises)
Why research is of value
  Scientific research (using the scientific method) is more trustworthy than expert/colleague opinion, intuition, etc.

Chapter 1 - continued The Nature of Research

Scientific Method (testing ideas in the public arena)
  Put guesses (hypotheses) to tests and see how they hold up
  All aspects of investigations are public and described in detail so anyone who questions the results can repeat the study for themselves
  Replication is a key component of the scientific method


Scientific Method (requires freedom of thought and public procedures that can be replicated)

  Identify the problem or question
  Clarify the problem
  Determine information needed and how to obtain it
  Organize the information obtained
  Interpret the results

All conclusions are tentative and subject to change as new evidence is uncovered (don’t PROVE things)


Types of Research

Experimental (most conclusive of methods)
  Researcher tries different treatments (independent variable) to see their effects (dependent variable)
  In simple experiments, compare 2 methods and try to control all extraneous variables that might affect the outcome
  Need control over assignment to treatment and control groups (to make sure they are equivalent)
  Sometimes use single-subject research (intensive study of a single individual or group over time)


(Types of Research continued)
Correlational Research
  Looks at existing relationships between 2 or more variables to make better predictions
Causal Comparative Research
  Intended to establish cause and effect but cannot assign subjects to treatment/control
  Limited interpretations (there could be a common cause for both the presumed cause and the effect, e.g., stress causes both smoking and cancer)
  Used for identifying possible causes; similar to correlation


(Types of Research continued)
Survey Research
  Determine/describe characteristics of a group
  Descriptive survey in writing or by interview
  Provides lots of information from large samples
  Three main problems: clarity of questions, honesty of respondents, return rates
Ethnographic Research (qualitative)
  In-depth research to answer WHY questions
  Some is historical (biography, phenomenology, case study, grounded theory)


(Types of Research continued)
Historical Research
  Study the past, often using existing documents, to reconstruct what happened
  Establishing truth of documents is essential
Action Research (differs from above types)
  Not concerned with generalizations to other settings
  Focus on information to change conditions in a particular situation (may use all the above methods)
Each of these methods is valuable for a different purpose


General Research Types
  Descriptive (describe state of affairs using surveys, ethnography, etc.)
  Associational (goes beyond description to see how things are related, so phenomena can be better understood, using correlational/causal-comparative methods)
  Intervention (try intervening to see effects, using experiments)


Quantitative v. Qualitative
Quantitative (numbers)
  Facts/feelings separate
  World is single reality
  Researcher removed
  Established research design
  Experiment prototype
  Generalization emphasized


Meta-Analysis
  Locate all the studies on a topic and synthesize results using statistical techniques (average the results)
Critical Analysis of Research (some say all research is flawed)
  Question of reality (there are only individual perceptions of it)
  Question of communication (words are subjective)
  Question of values (no objectivity, only social constructs)
  Question of unstated assumptions (researchers don’t clarify assumptions that guide them)
  Question of societal consequences (research serves political purposes that are conservative or oppressive; preserves the status quo)

Chapter 1 - continued The Nature of Research

Overview of the Research Process (Fig. 1.4)

Introduction chapter
  Problem statement that includes some background info and justification for the study
  Exploratory question or hypothesis (relationship among variables clearly defined); goes last in the chapter
  Definitions (in operational terms)
  Review of related literature (other studies of the topic read and summarized to shed light on what is already known)


Overview of the Research Process (Fig. 1.4)
Methods chapter
  1. Subjects (sample, population, method to select sample)
  2. Instruments (tests/measures described in detail and with rationale for their use)
  3. Procedures (what, when, where, how, and with whom); give schedule/dates, describe materials used, design of study, and possible biases/threats to validity
  4. Data analysis (how data will be analyzed to answer research questions or test hypothesis)

Chapter 2 The Research Problem

Statement of the Problem (identify a problem/area of concern to investigate)
  Must be feasible, clear, significant, ethical
Research Questions (serve as focus of investigation; see list, p. 28)
  Some info must be collected that answers them (must be researchable)
  Cannot research “should” questions
  See diagram, p. 29

Chapter 2 - Continued The Research Problem

RQ should be feasible (can be investigated with available resources)
RQ should be clear (specifically define terms used; operational definitions are needed, but give both)
  Constitutive definitions (dictionary meaning)
  Operational definitions (specific actions/steps to measure the term; e.g., IQ = time to solve a puzzle, where <20 sec. is high, 20-40 is med., 40+ is low)
RQ should be significant (worth investigating; how does it contribute to the field and who can use the info)
RQs often investigate relationships (two characteristics/qualities tied together)
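The puzzle-time example of an operational definition can be written out as a small Python sketch. The function name is hypothetical, and because the notes give overlapping bands (20-40 and 40+), the code assumes that exactly 20 seconds counts as medium and exactly 40 seconds counts as low.

```python
def classify_iq_by_puzzle_time(seconds: float) -> str:
    """Map puzzle-solving time (in seconds) to the level named in the notes."""
    if seconds < 0:
        raise ValueError("time cannot be negative")
    if seconds < 20:
        return "high"
    if seconds < 40:  # assumption: exactly 20 seconds counts as medium
        return "medium"
    return "low"      # assumption: exactly 40 seconds counts as low


# Each subject's measured time becomes a category by following fixed steps.
for t in (12, 20, 35, 40):
    print(t, classify_iq_by_puzzle_time(t))
```

The point of the sketch is that an operational definition leaves no room for judgment: anyone applying the same steps to the same time gets the same label.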

Chapter 3 Variables and Hypotheses

Important to study relationships
  Sometimes just want to describe (use RQ)
  Usually want to look for patterns/connections
  Hypothesis predicts the existence of a relationship
Variables (anything that can vary in measure; opposite of constant)
  Variables must be clearly defined
  Often investigate relationship between variables

Chapter 3 - Continued Variables and Hypotheses

Variable Classifications (Fig. 3.4, p. 42)
  Quantitative (variables measured as a matter of degree, using real numbers; e.g., age, number of kids)
  Categorical (no variation in degree; either in a category or not; e.g., gender, hair color)
  Independent: the cause (aka the manipulated, treatment, or experimental variable)
  Dependent: the effect (aka outcome variable)
  Extraneous: uncontrolled IVs (see Fig. 3.2, p. 46)
All extraneous variables must be accounted for in an experiment

Chapter 3 - Continued

Variables and Hypotheses

Hypotheses – predictions about the possible outcome of a study; sometimes several hypotheses from one RQ (Fig. 3.3)
  RQ: Will athletes have a higher GPA than nonathletes?
  H: Athletes will have higher GPAs than nonathletes
Advantages to stating a hypothesis as well as RQ
  Clarifies/focuses research to make prediction based on previous research/theory
  Multiple supporting tests that confirm a hypothesis strengthen it
Disadvantages
  Can lead to bias in methods (conscious or unconscious) to try to support hypothesis
  Sometimes miss other important info due to focus on hypothesis (peer review/replication is a check on this)

Chapter 3 - Continued

Variables and Hypotheses

Some hypotheses are more important than others
Directional v. nondirectional
  Directional says which group will score higher/do better
  Nondirectional just indicates there will be a difference, but not who will score higher/do better
  Directional is more risky, so be careful/tentative in using directional ones

Chapter 4 Ethics and Research

Examples of unethical practices
  Requiring participation from powerless (students)
  Using minors without parental permission
  Deleting data that don’t agree w/ hypothesis
  Invading privacy of subjects
  Physically or psychologically harming subjects
APA statement of ethical principles in research
  Each student must sign one and have it signed by workplace supervisor

Chapter 4 - Continued

Ethics and Research

Protecting participants from harm requires informed consent
  Subjects must know the purpose of the study and possible benefits/harm; participation is voluntary and they can withdraw without penalty at any time (Fig. 4.3, p. 59)
Researchers should ask: Could subjects be harmed? Is there another way to get the info? Is the info valuable enough to justify the study?
Researchers must ensure confidentiality of data (limit access; no names if possible; tell subjects whether data are confidential or anonymous)
Deceiving subjects is sometimes necessary (Milgram study); ask if results justify the ethical lapse
When deception is used, subjects should be okay with it afterward (and they can refuse use of their data)

Chapter 4 - Continued Ethics and Research

Research with children
  Parental consent required (signed permission from parents)
  APA Ethics in Research Form addresses this also
Regulation of Research (National Research Act of 1974)
  If federal funding is received, must have an IRB to check: risks to subjects, informed consent guidelines met, debriefing plans for subjects
  HHS made changes in 1981 so that educational research is exempt under certain conditions

Chapter 5 Review of the Literature

Value of the Literature Review
  Glean ideas from others interested in the topic
  See results of related studies (must be able to evaluate those objectively)
Types of sources
  General References – indexes of primary sources and abstracts (ERIC, Psych Abstracts)
  Primary Sources – publications where researchers report their results (peer reviewed/refereed journals)
  Secondary Sources – publications where authors describe works of others (encyclopedias, trade books, textbooks)

Chapter 5 - Continued Review of the Literature

Steps in the Literature Review (manual or electronic); see examples, p. 74
  Define problem as precisely as possible
  Review some secondary sources*
  Review some general reference works*
  Formulate search terms (keywords/descriptors)
  Search general references for primary sources
  Obtain and read primary sources (make notes/summarize)
*May be based on existing knowledge or previous reading

Chapter 5 - Continued

Review of the Literature

Making notes
  Include problem/purpose; hypotheses/RQ; procedures w/ subjects/methods; findings/conclusions; citation!
Searching strategies: use Boolean operators (AND, OR, NOT)
Searching the web: be careful of reliability
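How the three Boolean operators narrow or widen a search can be illustrated with a short Python sketch. The records, keywords, and the `search` helper are all made up for illustration; real databases such as ERIC have their own query syntax, which this does not reproduce.

```python
# Hypothetical index records: each has an id and a set of descriptors.
records = [
    {"id": 1, "keywords": {"reading", "motivation", "elementary"}},
    {"id": 2, "keywords": {"math", "motivation", "college"}},
    {"id": 3, "keywords": {"math", "motivation", "secondary"}},
    {"id": 4, "keywords": {"reading", "assessment", "elementary"}},
]


def search(records, all_of=(), any_of=(), none_of=()):
    """Return ids of records matching the AND / OR / NOT term lists."""
    hits = []
    for rec in records:
        kw = rec["keywords"]
        if not all(term in kw for term in all_of):             # AND: every term required
            continue
        if any_of and not any(term in kw for term in any_of):  # OR: at least one term
            continue
        if any(term in kw for term in none_of):                # NOT: excluded terms
            continue
        hits.append(rec["id"])
    return hits


# motivation AND (reading OR math) NOT college
print(search(records, all_of=["motivation"], any_of=["reading", "math"], none_of=["college"]))
```

AND and NOT shrink the result set, while OR enlarges it, which is why descriptors joined with OR are usually synonyms or closely related terms.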

Writing up the Literature Review
  Introduction – describes problem and justification for study
  Body – discusses related studies together (#2, p. 88)
  Summary – ties literature together/gives conclusions arising from literature
  Reference list
Don’t replace a review of primary sources with meta-analysis (a combined review of all available research on a topic w/ results averaged)

End Part 1

Chapter 6 Sampling

Sample – any group on which info is obtained
Population – group that researcher is trying to represent
  Population must be defined first; the more closely defined, the easier to do, but the less generalizable
Study a subset of the population because it is cheaper, faster, easier, and, if done right, gets the same results as a census (study of whole pop.)
Accessible population – the group you are able to realistically generalize to; may differ from target population

Chapter 6 - Continued Sampling

(Random v. Nonrandom Sampling)
Random – every population element has an equal and independent chance to participate
  Uses names in a hat or a table of random numbers
  Elimination of bias in selecting the sample is most important (meaning the researcher does not influence who gets selected)
  Ensuring sufficient sample size is second most important
Nonrandom/purposive – troubles with representativeness/generalizing


(Random Sampling Methods)
Simple random sampling
  Names in a hat or table of random numbers (p. 99)
  Larger samples more likely to represent pop.
  Any difference between population and sample is random and small (called random sampling error)
Stratified random sampling
  Ensures small subgroups (strata) are represented
  Normally proportional to their part of pop.
  Break pop. into strata, then randomly select within strata
  Multistage sampling (see p. 94)
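Simple and stratified random sampling can be sketched in Python with the standard `random` module standing in for names in a hat. The function names and the toy population are hypothetical; the stratified version assumes proportional allocation with rounding, so the total can differ slightly from the requested size when the proportions do not divide evenly.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n members so every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n, seed=None):
    """Break the population into strata, then sample randomly within each
    stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))  # proportional share
        sample.extend(rng.sample(members, k))
    return sample


# Toy population: 80 members of stratum "A" and 20 of stratum "B".
people = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]
print(simple_random_sample(people, 10, seed=1))
print(stratified_sample(people, lambda p: p[0], 10, seed=1))
```

With simple random sampling the small stratum may be missed by chance; the stratified draw guarantees it is represented at its population share.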


(Random Sampling Methods, cont.)
Cluster random sampling
  Select groups as sample units rather than individuals
  REQUIRES a large number of groups/clusters
  Multistage sampling (see p. 94)
Systematic (Nth) sampling
  Considered random if the list is randomly ordered, or nonrandom if the list is systematically ordered (w/ random starting point)
  Divide pop size by sample size to get N (ps/ss=N)
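The ps/ss=N rule for systematic sampling can be sketched as follows. The function name is hypothetical, and the sketch assumes the interval is rounded down to a whole number and the starting point is chosen at random within the first interval.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Take every Nth member, where N = population size // sample size."""
    interval = len(population) // sample_size  # N = ps / ss, rounded down
    if interval < 1:
        raise ValueError("sample size cannot exceed population size")
    start = random.Random(seed).randrange(interval)  # random starting point
    return [population[start + i * interval] for i in range(sample_size)]


roster = list(range(100))  # stand-in for an ordered list of 100 names
print(systematic_sample(roster, 10, seed=0))
```

If the roster itself is ordered in some meaningful way (for example by grade), every Nth pick can line up with that pattern, which is the sense in which the notes call the method nonrandom.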


(Non-Random Sampling Methods)
Systematic can be nonrandom if list is ordered
Convenience sampling
  Using group that is handy/available (or volunteers)
  Avoid, if possible, since such groups tend not to be representative due to their homogeneity
  Report large number of demographic factors to show likelihood of representativeness
Purposive sampling
  Using personal judgment to select sample that should be representative (e.g., this faculty seems to represent all teachers) OR selecting those who are known to have needed info (interested in talking only to those in power)
  Snowball is a type (used with hard-to-identify groups such as addicts)


Sample size affects accuracy of representation
  Larger sample means less chance of error
  Minimum is 30; upper limit is 1,000 (see table)
External validity – how well sample generalizes to the population
  Representative sample is required (not the same thing as variety in a sample)
  High participation rate is needed
  Multiple replications enhance generalization when nonrandom sampling is used
  Ecological generalization (generalizing to other settings/conditions, such as using a method tested in math for English class)

Video 17

Chapter 7 Instrumentation (Measurement)

Data – information researchers obtain about subjects
  Demographic data are characteristics of subjects such as age, gender, education level, etc.
  Assessment data are scores on tests, observations, etc. (the device used to measure these is called the measurement instrument)
Key questions in data measurement/instrumentation
  Where and when will data be collected
  How often will data be collected
  Who will collect the data

Chapter 7 - Continued Instrumentation

Validity – measures what it is supposed to (accurate)
Reliability – a measure that consistently gives same readings (repeatable)
Objectivity – absence of subjective judgments (need to eliminate subjectivity in measuring)
Usability of instruments
  Consider ease of administration; time to administer; clarity of directions; ease of scoring; cost; reliability/validity data availability


(Classifying Data Collection Instruments)
By the group providing the data
  Researcher instruments (researcher observes student performance and records)
  Subject instruments (subjects record data about themselves, such as taking test)
  Others/Informants (3rd party reports about subjects such as teacher rates students)
By where instrument came from
  Preference is for existing ones (www.ericae.net, MMY)
  Can develop your own (requires time, effort, skill, testing; see p. 125)
By response type
  Written response – preferred – objective tests, rating checklist
  Performance instruments – measure procedure, product


(Examples of Data Collection Instruments)
Researcher Completed Instruments
  Rating scales (mark a place on a continuum, for example numeric rating 1=poor to 5=excellent)
  Interview schedules (complete scales as interview takes place; use precoding; beware of dishonesty)
  Tally sheets (for counting/recording frequency of behavior, remarks, activities, etc.)
  Flow charts (to record interactions in a room)
  Anecdotal records (need to be specific and factual)
  Time/Motion logs (record what took place and when)


(Examples of Data Collection Instruments)
Subject Completed Instruments
  Questionnaires (question clarity to reader essential)
  Self checklists
  Attitude scales (Likert is one type; how much subject agrees/disagrees with descriptive statements about a topic indicates a positive/negative attitude toward topic)
  Semantic differential (good/bad; poor/excellent ratings)
  Personality profiles
  Achievement/Aptitude tests
  Performance tests
  Projective devices (Rorschach Ink Blot Test)
  Sociometric devices (peer ratings)

Chapter 7 - Continued

Instrumentation

Item Formats
  Selection items or closed response (T/F; Yes/No; Right/Wrong; Multiple choice)
  Supply items or open ended (short answer; essay)
  Unobtrusive measures (no intrusion into event; usually direct observation and recording)
Types of Scores
  Raw scores (initial score or count obtained, w/out context)
  Derived scores (raw scores translated to meaningful usage with standardized process)
    Age/Grade equivalence; Percentile ranks; Standard scores (how far a score is from a given reference point, e.g., z and T scores)
  Which to use depends on the purpose; usually standard scores used
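Turning raw scores into the two standard scores named above can be sketched in Python. This assumes the usual definitions, z = (raw score − mean) / standard deviation and T = 50 + 10z, and assumes the group itself is the reference group (population standard deviation); the function names and scores are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(raw):
    """Distance of each raw score from the group mean, in SD units."""
    m = mean(raw)
    sd = pstdev(raw)  # assumption: the group is treated as the whole reference group
    if sd == 0:
        raise ValueError("all scores are identical; z is undefined")
    return [(x - m) / sd for x in raw]


def t_scores(raw):
    """Rescale z scores to a mean of 50 and an SD of 10."""
    return [50 + 10 * z for z in z_scores(raw)]


raw = [60, 70, 80]
print(z_scores(raw))
print(t_scores(raw))
```

A raw score equal to the mean gets z = 0 and T = 50, which is why T scores avoid the negative values that z scores produce for below-average scorers.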


Norm Referenced v. Criterion Referenced Tests
  Norm referenced scores give a score relative to a reference group (the norm group)
  Criterion referenced scores determine if a criterion has been mastered
    These are used to improve instruction since they indicate what students can or cannot do, or do or do not know


(Measurement Scales)
Nominal (in name only)
  Numbers are only name tags; they have no mathematical value (gender: 1=male and 2=female OR race: 1=Blk, 2=Wht, 3=other)
Ordinal (in name, plus relative order)
  Numbers show relative position, but not quantity (grade level, finishing place in a race)
Interval (in name w/ order AND equal distance)
  Numbers show quantity in equal intervals, but with an arbitrary zero (can have negative numbers; degrees C or F)
Ratio (in name, w/ order, eq. distance AND absolute zero)
  Numbers show quantity with base of zero, where zero means the construct is absent
Higher levels are more precise; collect data at highest level possible; some statistics only work with higher level data

Chapter 7 - Continued Instrumentation

(Preparing for Data Analysis)
Scoring data – use exact same format for each test and describe scoring method in text
Tabulating and Coding – carefully transfer data from source documents to computer
  Give each test an ID number
  Any words must be coded with numerical values
  Report codes in text of research report

Video 18

Chapter 8 Validity and Reliability (Quality of instruments is important)

Validity is most important aspect of measures
  Means accuracy, correctness, usefulness of instrument
  Validation is the process of collecting and analyzing evidence to support inferences based on an instrument
  Test publishers usually give a statement of intended use as well as evidence to support validity
  Reliability (consistency in scoring) is part of validity

Chapter 8 - Continued Validity and Reliability

(Three ways to establish validity)
Content validity – is entire content of construct covered by test, are important parts emphasized?
  Established by expert judgment
  Face validity is part of this
Criterion validity – is there consistency between the instrument and some predicted or concurrent criterion?
  Established by empirical evidence using validity coefficient (-1 to +1 scores)
  Correlate scores of the test with the criterion (SAT and GPA in college)


(Three ways to establish validity)
Construct validity – does the measure correctly identify those with different levels of the construct?
  Established with empirical evidence
  Correlate scores on test with known indicator of the construct (prisoners score low on test of ethics)
Validity problems come from systematic error (also known as bias; something the researcher did wrong)


Reliability means that scores are consistent from one time measuring to the next
  Can have a reliable measure that may not be valid
  Must be reliable to be valid
  See p. 166, target shooting
Errors of measurement – there is always some variation from measure to measure
  Look at reliability coefficient to determine reliability


(Three ways to establish reliability)
Test/Retest – give the same test (of enduring trait) to the same people at two times and correlate the scores
Equivalent forms – give two parallel forms of a test to the same people and correlate scores
Internal consistency – several methods
  Split halves (score two halves of test and correlate scores)
  KR-21 and Cronbach Alpha – correlate each item to overall score
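As an illustration of internal consistency, here is a minimal Python sketch of Cronbach's alpha. It uses the standard variance form, alpha = k/(k−1) × (1 − sum of item variances / variance of total scores), which is a more specific computation than the short description in the notes; the function name and the score table are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per person, all of the same length k."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 4 people answering a 3-item scale.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 5, 4],
]
print(round(cronbach_alpha(scores), 2))
```

When people who score high on one item also score high on the others, the total-score variance is large relative to the item variances and alpha approaches 1.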


Standard Error of Measurement – variations in measurement result in some error which is reported

Scoring Agreement – for subjective tests or direct observations (check of internal reliability)

Validity and Reliability should be addressed in all research (including qualitative)

Chapter 9 Internal Validity

(The IV really caused a change in the DV)
Threats
  Subject characteristics/selection bias – when subjects in the study or in treatment/control groups differ from each other (on age, gender, ability, etc.)
  Loss of subjects/Mortality – must address question of whether those dropping out are different from those who do not
  Location/Experiment variables – characteristics of the school, classroom, etc. may interfere with the cause/effect relationship (keep constant for both groups)

Chapter 9 - Continued Internal Validity

(The IV really caused a change in the DV)
Threats (continued)
  Instrumentation – need constant application and scoring of instruments
    Instrument decay – when scoring varies due to fatigue
    Data collector characteristics (age, gender, etc.) influence results; use same collector or randomly assign
    Data collector bias – unconscious or conscious distortion of data (use single or double blind technique)
  Testing – pretest sensitization can occur or subjects can figure out acceptable answers

Chapter 9 - Continued Internal Validity

(The IV really caused a change in the DV) Threats (continued)

History – an external occurrence that interferes with relationship between IV and DV

Maturation – changes in relationship between IV and DV due to passage of time/growth of subj

Attitudes of Subjects – Hawthorne or guinea pig effects, novelty effects and demoralization may occur

Regression (toward the mean) – Low scorers do better in subsequent tests; high scorers do worse

Implementation – experiment differs for groups

Chapter 9 - Continued Internal Validity

(The IV really caused a change in the DV)
How to minimize threats:
  Standardized conditions
  Collect and report demographic characteristics of subjects
  Identify/report details of study
  Select a design to minimize effects (true randomized experimental designs are best)
See page 189, Fig. 9.10 for threats summary

End Part 2

Chapter 13 Experimental Research

Most powerful design
Used to establish cause and effect by manipulating (influencing) an IV (independent variable, aka treatment or experimental variable) to see its effect on a DV (dependent variable, aka criterion or outcome variable)
Goes beyond description and prediction

Chapter 13 - Continued Experimental Research

(Characteristics of Experimental Research)
Comparison of groups (at least two groups of subjects, called treatment and control groups)
Manipulation of the IV (experimenter changes something for the treatment group that’s different from the control group)
Randomization (true experiments require random assignment into treatment/control conditions, after random selection of subjects to participate in study)
  Assignment takes place at start of experiment
  Do not use already formed groups
  Groups should be equivalent (any differences due to chance)
  Randomization eliminates threats from extraneous variables
  Groups must be sufficiently large to be equivalent
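Random assignment at the start of an experiment can be sketched in a few lines of Python. The function name and subject IDs are hypothetical, and the sketch assumes two equal-sized conditions (with an odd number of subjects, the control group gets the extra one).

```python
import random


def randomly_assign(subjects, seed=None):
    """Shuffle the subject list and split it into treatment and control."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # chance alone decides the order
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


groups = randomly_assign(range(1, 21), seed=42)  # subject IDs 1-20
print(groups["treatment"])
print(groups["control"])
```

Because the researcher has no say in who lands in which group, any remaining differences between the groups are due to chance, and they tend to shrink as the groups get larger.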

Chapter 13 - Continued Experimental Research

(Control of Extraneous Variables)
All extraneous variables must be controlled to eliminate threats to validity/rival hypotheses
  Ensure groups are equivalent to begin with, using randomization
  Hold certain variables constant (e.g., age, IQ) or build them into the design
  Use matching when necessary
  Use subjects as their own controls (treat same group first in control condition then in treatment OR use pretest/posttest on same group)
  Use analysis of covariance to statistically equate nonequivalent groups


(Group Designs)
Weak Designs
  One Shot Case Study (X O)
    One group exposed to treatment then DV is measured
    No controls
    Example: Try new teaching method then see how students do on posttest
  One Group Pretest-Posttest Design (O X O)
    Adds a pretest but no control group
  Static-Group Comparison Design
    X1 O
    X2 O
    Need control for different subject characteristics
  Static Group Pretest/Posttest Design (adds a pretest)

Chapter 13 - Continued Experimental Research

(Group Designs)
True Experimental Designs
  Randomized Posttest Only Design (random assignment to treatment/control, then posttest)
    R X1 O
    R O
  Randomized Pretest/Posttest Control Group (controls history, maturation, etc.)
    R O X1 O
    R O X2 O

  Randomized Solomon 4-Group Design combines the above two (eliminates testing threat; problem is number of subjects needed)
  Random Assignment w/ Matching
    Match pairs on factors that influence DV then randomly assign to treatment or control (number of subjects is limited because those with no match are eliminated)
    Statistical matching can be done using predicted scores


(Group Designs)
Quasi Experimental Designs
  Matching only – different from random assignment w/ matching (uses existing groups)
    Match subjects in treatment and control groups on known extraneous variables
    If possible, use multiple groups, and randomly assign them
  Counterbalanced – Each group exposed to all the same treatments but in different order
  Time series – Repeated treatments and observations over a period of time (both before and after treatment)
  Factorial designs – Multiple IVs or DVs investigated simultaneously (e.g., look for interactions between 2 IVs)


(Controlling Threats to Internal Validity)
See Table 13.1, p. 284 for advantages/disadvantages of each design
To evaluate the likelihood of a threat to internal validity in experiments ask:
  What are the known extraneous factors?
  Do the groups differ on them?
  How were they controlled?
Researchers need tight control for experiments to be successful
See pp. 288-289 for questions to evaluate a published article
See evaluation of selected article on pp. 290-299

Chapter 15 Correlation Research

(Predicting Outcomes Through Association)
Correlational research involves study of existing relationships between two variables
  Descriptive in nature
  Often a precursor to experimental research
  Positive correlation is Hi/Hi and Lo/Lo (coeff. +r)
  Negative correlation is Hi/Lo and Lo/Hi (-r)
Purpose is to explain relationships or to predict outcomes

Chapter 15 - continued Correlation Research

(Predicting Outcomes Through Association)
Explanatory studies examine relationship to identify possible cause/effect
  Relationship might or MIGHT NOT mean causation
  For causation: 1) A before B; 2) A and B related; 3) Rule out other causes of B (need experiment)
Prediction studies identify predictors of criteria (e.g., HS GPA and College GPA)
  Scatterplots with regression line/equation predict scores numerically
  The stronger the correlation the better the prediction
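How a regression line turns a predictor into a predicted criterion score can be sketched with ordinary least squares in plain Python. The GPA figures below are invented for illustration only, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_new, slope, intercept):
    """Read the predicted criterion score off the regression line."""
    return intercept + slope * x_new


# Made-up data: high school GPA (predictor) and college GPA (criterion).
hs_gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
college_gpa = [1.8, 2.4, 2.7, 3.3, 3.6]
slope, intercept = fit_line(hs_gpa, college_gpa)
print(round(predict(3.2, slope, intercept), 2))
```

The prediction is only as good as the correlation behind it: with a weak relationship the points scatter widely around the line and individual predictions miss by a lot.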

Chapter 15 – continued Correlation Research

(Predicting Outcomes Through Association)
Complex Correlation Techniques, such as multiple regression, allow use of several predictors for one criterion
  Coefficient of multiple correlation (R) gives strength of correlation between predictors and criterion
  Coefficient of determination (r²) is the amount x and y vary together
  Discriminant function analysis is for non-quantitative criterion (predict which group someone will be in)
  Other techniques also used (factor analysis, path analysis, structural modeling)

Chapter 15 - continued Correlation Research

(Steps in the process)
  Problem selection – usually it’s “are x and y related?” or “how well does p predict c?”
  Sample – random selection of at least 30
  Measurement – need quantitative data
  Design/Procedures – need two measures on each subject
  Data collection – usually both measures close in time
  Data analysis – correlation coefficient, r, and plot (r is -1 to +1, and the closer to plus or minus 1, the stronger the relationship)
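The data analysis step can be sketched by computing the Pearson correlation coefficient r directly from its definition. The function name and the paired scores are hypothetical; a real study would use at least 30 subjects, as the steps above say.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two measures taken on the same subjects."""
    if len(x) != len(y):
        raise ValueError("need two measures on each subject")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a measure with no variation has no correlation")
    return sxy / sqrt(sxx * syy)


# Made-up paired scores for six subjects (far fewer than a real study needs).
hours_studied = [1, 2, 3, 4, 5, 6]
test_score = [52, 55, 61, 60, 68, 71]
r = pearson_r(hours_studied, test_score)
print(round(r, 2), round(r ** 2, 2))  # r and the coefficient of determination
```

Squaring r gives the coefficient of determination, the share of variation the two measures have in common.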


(Interpreting Correlation Coefficients)
General guidelines:
  +.75 to +1.0   Very strong relationship
  +.50 to +.75   Moderately strong relationship
  +.25 to +.50   Weak relationship
  +.00 to +.25   Low to no relationship
Need .5 or better for prediction of any use, and .65 for accurate predictions
Reliability coefficients should be .7 up
Validity coefficients should be .5 up
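The guideline bands can be encoded as a small lookup in Python. Two assumptions are made here that the notes do not state: negative coefficients are judged by their absolute value, and a value sitting exactly on a boundary (e.g., .50) is placed in the stronger band. The function names are hypothetical.

```python
def describe_strength(r: float) -> str:
    """Label a correlation coefficient using the guideline bands."""
    size = abs(r)  # assumption: the sign gives direction, not strength
    if size > 1:
        raise ValueError("r must be between -1 and +1")
    if size >= 0.75:
        return "very strong"
    if size >= 0.50:  # assumption: boundary values go to the stronger band
        return "moderately strong"
    if size >= 0.25:
        return "weak"
    return "low to no relationship"


def usable_for_prediction(r: float) -> bool:
    """Guideline from the notes: need .5 or better for prediction of any use."""
    return abs(r) >= 0.5


for r in (0.82, -0.60, 0.30, 0.10):
    print(r, describe_strength(r), usable_for_prediction(r))
```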


(Threats to Internal Validity in Correlation Research)
Remember correlation is not causation (lurking variables)
  Subject characteristics – may get different correlations w/ different ability levels, gender, etc. (can control with partial correlation)
  Location – testing conditions can impact results
  Instrumentation problems – helps to standardize instrument and data collection for both groups
  Testing – pretest interference and sensitization possible
  Mortality – be careful if there is a large loss from one group being tested


(Questions to ask to avoid threats to internal validity)
  What factors could affect the variables being studied?
  Does any factor affect BOTH variables? (this is where threats occur)
  Figure a way to control any lurking variables

Chapter 16 Causal Comparative Research

(Ex Post Facto)
Determines cause (or effect) that has occurred and looks for effect (or cause) from it
  Start w/ differences in groups and examine them
  Example: difference in math abilities of male/female students
  No random assignment to treatment (it already occurred)
  Associational like correlation but primarily interested in cause/effect
  IV either cannot (ethnicity) or should not (smoking) be manipulated

Chapter 16 - continued Causal Comparative Research

(Ex Post Facto)
Often an alternative to experimental (faster and cheaper)
Serious limitation is lack of control over threats to internal validity
Need to remember the cause may be the effect; they may only be related and there is some other variable that is the cause (lurker)
Remember three canons of causation

Chapter 16 - continued

Causal Comparative (CC) Research (CC versus Correlational Research)

Both are associational (looking for relationship)
Both are often prelude to experiments
Neither involves manipulation of variables
CC works with different groups; correlation examines one group on different variables
Correlation is measured w/ coefficient while CC compares means/medians/percents of group members

Chapter 16 - continued

Causal Comparative (CC) Research (CC versus Experimental Research)

Both compare group scores of some type
In experimental the IV is manipulated, but not in CC (already took place)
CC does not provide as strong evidence as experimental for cause and effect

Chapter 16 - continued Causal Comparative (CC) Research

(Steps in CC Research)
Problem formation – identify phenomena and look for causes or consequences of them
  Sometimes several alternate hypotheses investigated
Sample – define (operationally) characteristics of study carefully, then select individuals who possess them
  Groups should be homogeneous in regard to several important variables (to control for them as causes) then match control/exp groups on one or more variables (smoking study matched on 19 variables)
Instruments – use any type to compare the groups
Design – basic CC involves 2 or more groups that differ on variable of interest (basic design: one group possesses the trait (athlete), the other doesn’t; compare DV (GPA))

Chapter 16 - continued

Causal Comparative (CC) Research (Threats to Internal Validity in CC Research)

Subject characteristics – since researchers don’t select subjects and form groups, there may be unidentified lurking variables
  Can use matching to control for any identified differences, but it limits sample size
  Can find or create homogeneous groups (for example compare only high GPA students to other high GPA students) on attitudes toward x
  Statistical matching – adjusts posttest scores based on some initial difference
Other threats – location, instrument, history, maturation, loss of subjects can be concerns
Need to control as many as possible to eliminate alternate hypotheses

Chapter 16 - continued

Causal Comparative (CC) Research
(Evaluating threats to Internal Validity in CC Research)

Questions to ask
What factors are known to affect the variable being studied?
What is the likelihood the comparison groups differ on these factors?
How well did the design identify and control for these?
For example consider subject characteristics such as socioeconomic status, gender, ethnicity, job skills; mortality rates in groups; location (schools differ); instrument (different data collectors and/or biases)
Data Analysis in CC – often compare means of groups; with 2 categorical variables use crosstabs (crossbreak tables) to compare percents by groups

Text gives example study
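A crossbreak comparison for two categorical variables can be sketched in a few lines of Python. The records below are invented for illustration (athlete status versus honor roll); they are not the text's example study, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical records: (group, outcome). Both variables are categorical.
records = [
    ("athlete", "honor roll"), ("athlete", "honor roll"),
    ("athlete", "not honor roll"), ("athlete", "not honor roll"),
    ("non-athlete", "honor roll"), ("non-athlete", "not honor roll"),
    ("non-athlete", "not honor roll"), ("non-athlete", "not honor roll"),
]

def crossbreak_row_percents(pairs):
    """Return {group: {outcome: percent of that group's row}}."""
    cell_counts = Counter(pairs)
    row_totals = Counter(group for group, _ in pairs)
    outcomes = sorted({outcome for _, outcome in pairs})
    return {
        group: {o: 100 * cell_counts[(group, o)] / total for o in outcomes}
        for group, total in row_totals.items()
    }

table = crossbreak_row_percents(records)
for group, row in table.items():
    print(group, row)
```

Row percents (rather than raw counts) make groups of different sizes comparable, which is why the groups go on the side of the table.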

Chapter 17
Survey Research
(Used to describe what people think/do/believe)

Types
Cross sectional - provides a snapshot in time
Longitudinal - collects data at different points in time to study changes over time
Trend study - random sample each year on same topic
Cohort study - sample from same cohort members year after year
Panel study - same individuals surveyed year after year (mortality a problem over long time periods)
Often surveys are the data collection instrument in correlational (or cc/exp’l) studies

Chapter 17 - Continued
Survey Research
(Steps to conduct survey research)

Define the problem
Needs to be important enough that respondents will invest their time to complete it
Must be based on clear objectives
Identify the target population
Defined by sample unit or unit of analysis (unit can be a person, school, classroom, district, etc.)
Survey a sample or do a census of the population


(Steps to conduct survey research)
Methods of data collection

Direct administration to a group (such as at a meeting) - good response rate, limited generalizability

Mail survey (inexpensive way to get large amount of data from widespread pop) - lower response rates, not in-depth info, illiterate missed

Telephone survey (cheap/fast) - response rates higher due to encouragement (“I’m not selling…”); miss some pop members, interviewer bias possible

Personal interviews (face-to-face has good response rate but time and cost high) - lack anonymity, interviewer bias


(Steps to conduct survey research)
Select the sample (randomly, but check to see respondents are qualified to answer)
Pilot test can indicate likely response rate and problems with data collection or sample
Prepare instrument (questionnaire and interview schedule)
Appearance important - look short and easy
Clarity in questions is essential


(Steps to conduct survey research)
Question types (same questions need to be asked of all respondents)
Closed ended (multiple choice) - easier to complete, score, analyze
Categories must be all inclusive, mutually exclusive
Open ended - easy to write, hard to analyze and hard on respondents
See examples p. 403

Chapter 10
Descriptive Statistics
(Tools to summarize data)

Descriptive statistics describe many scores with just one or two indices (such as mean or median)
Sample of a pop is described w/ indices called statistics
Entire pop is described w/ indices called parameters
Types of data (words or numbers)
Quantitative data – scales measure how much (test scores, amount of money spent, etc.); Interval, Ratio, and sometimes Ordinal, variables
Categorical data – total number of objects in a category (ethnicity, gender, etc.); Nominal and sometimes Ordinal, variables

Chapter 10 - Continued
Descriptive Statistics
(Summarizing Quantitative Data)

Frequency distributions or tables show the layout of the data (see text example p. 201)
Frequency polygons – show where most scores are and how spread out data are
Pay attention to shape (positive, negative skews)
Normal curves – smoothed polygons – most scores in the center, fewer in the tails – many variables follow a normal shape (height, weight, age, etc.)
Normal curves are the foundation for inferential statistics


(Summarizing Quantitative Data)
Averages – measures of central tendency
Three indices tell what is a typical score
Mode – most frequent score
Median – middle score (50th percentile)
Mean – takes into account all scores
Which to use depends on what you are trying to show
See example pp. 205/206
Spreads – measures of variation or dispersion
Three indices tell how closely scores cluster together
Range (highest – lowest); a crude indicator of spread
Standard deviation (roughly the typical distance of each score from the mean)
Smaller SD means less spread out, larger one means more spread out
Quartiles, percentiles, IQR, boxplots
SD and normal curves…68/95/99.7 rule
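The indices of center and spread can be computed with Python's standard `statistics` module. A minimal sketch on invented quiz scores (the data are hypothetical, and `pstdev` treats the scores as the whole group rather than a sample):

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical quiz scores

mode = statistics.mode(scores)           # most frequent score
median = statistics.median(scores)       # middle score (50th percentile)
mean = statistics.mean(scores)           # takes every score into account
score_range = max(scores) - min(scores)  # highest minus lowest
sd = statistics.pstdev(scores)           # SD of the scores as a complete group

print(f"mode={mode} median={median} mean={mean} range={score_range} SD={sd}")
```

Here the mode (4), median (4.5) and mean (5) all differ slightly, which is the point of the note that the choice depends on what you are trying to show.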


(Summarizing Quantitative Data)
Standard scores and the normal curve
Standard scores use a common scale for all scores
z scores are simplest – tell how far from the mean in SD units
Score on mean then z=0; score 1 SD above then z=1.0; 1 SD below then z=-1.0, etc.
Use mean and SD to calculate z scores so you can compare apples/oranges (p. 210)
z = (any score – mean) / standard deviation
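The z-score formula as a short Python sketch. The two tests and their means and SDs are made up to show the apples/oranges comparison; they are not the example on p. 210.

```python
def z_score(score, mean, sd):
    """Distance of a score from the mean, in SD units."""
    return (score - mean) / sd

# Hypothetical: one student, two tests on different scales.
z_math = z_score(80, mean=70, sd=10)    # 1 SD above the math mean
z_reading = z_score(60, mean=50, sd=5)  # 2 SDs above the reading mean

print(z_math, z_reading)  # 1.0 2.0
```

The raw math score is higher, but relative to each group the reading score is the stronger one.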


(Summarizing Quantitative Data)
Probability based on z scores
The area under the normal curve accounts for 100% of the scores
A z-table gives percent of scores from any score to the mean (Appendix, pp. A-4/5)
The probability for getting higher or lower than any given score can then be calculated
T-scores are often used because negative z scores are awkward (in practice all T-scores are positive)
Multiply z times 10, then add 50 (p. 212 Table 10.15)
Standard test scores often given with T-scores and percents above/below the given score
Note…use z and T scores only with NORMAL distributions!
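A z-table lookup and the T-score conversion can be sketched with `statistics.NormalDist` (Python 3.8+), which stands in for the printed table here. The z value of 1.0 is just an illustrative choice.

```python
from statistics import NormalDist

def t_score(z):
    """Convert a z score to a T-score: multiply by 10, then add 50."""
    return 10 * z + 50

standard_normal = NormalDist()  # mean 0, SD 1

z = 1.0
pct_below = 100 * standard_normal.cdf(z)  # percent of scores below z
pct_mean_to_z = pct_below - 50            # what a z-table reports (mean to z)
pct_above = 100 - pct_below               # percent of scores above z

print(f"T={t_score(z)}  below={pct_below:.2f}%  "
      f"mean-to-z={pct_mean_to_z:.2f}%  above={pct_above:.2f}%")
```

As the notes warn, these percents are only meaningful when the scores are roughly normally distributed.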


(Summarizing Quantitative Data)
Correlation examines relationships between two quantitative variables (interval/ratio data)
Scatterplot shows the relationship visually
Use it to check for pattern in data (hi/hi or hi/lo?)
If linear pattern, can use Pearson’s r coefficient
Use it to look for strength (scatteredness)
Pay attention to outliers (p. 215/216 examples)
Correlation coefficient is a numerical indicator of the strength of the relationship
Pearson’s product-moment coefficient (r) is for linear data (-1 to +1)
Eta is for curved data
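Pearson's r can be computed directly from its definition. A minimal sketch with invented data (hours studied versus test score); the function name and numbers are illustrative only:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]         # hypothetical hours studied
scores = [52, 60, 61, 70, 77]   # hypothetical test scores

r = pearson_r(hours, scores)
print(f"r={r:.2f}  r-squared={r * r:.2f}")
```

A value near +1 reflects a strong hi/hi pattern; a scatterplot should still be checked first, since r only describes linear relationships.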


(Summarizing Categorical Data)
Frequency tables
Give percents for ease in interpreting
Crossbreak or crosstabulations for relationships (IV goes on the side, then give row percents)
Bar charts and pie charts used
Bars for ordered categories
Pies for unordered categories

Chapter 11
Inferential Statistics

Inferences about a population based on data from a sample
Answers questions about how likely a sample is to represent some parameter about a population
Inferential test used depends on the level of data (quantitative or categorical)

Chapter 11 - Continued
Inferential Statistics
(The logic of inferential statistics)

Sampling error
Samples differ from their parent populations (no two samples are the same)
Difference is called sampling error
Distribution of sample means (the sampling distribution)
The means of a large collection of random samples of at least 30 follow a normal curve pattern
Its mean (mean of means) is the mean of the population
Its SD (SD of means) is the standard error of the mean (SEM)
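The sampling distribution can be illustrated with a small simulation. This is a sketch under assumed values (a normal population with mean 100 and SD 15, samples of 30, a fixed random seed); none of these numbers come from the text. With the population SD known, the theoretical SEM is SD/√n.

```python
import math
import random
import statistics

rng = random.Random(42)            # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 100, 15         # assumed population values
SAMPLE_SIZE, NUM_SAMPLES = 30, 2000

# Draw many random samples and record each sample's mean.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

mean_of_means = statistics.mean(sample_means)   # should sit near POP_MEAN
sd_of_means = statistics.stdev(sample_means)    # empirical SEM
theoretical_sem = POP_SD / math.sqrt(SAMPLE_SIZE)

print(f"mean of means={mean_of_means:.2f}")
print(f"SD of means={sd_of_means:.2f}  theoretical SEM={theoretical_sem:.2f}")
```

No two sample means are identical, yet they cluster tightly around the population mean, and their spread is much smaller than the spread of individual scores.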

Chapter 11 - Continued
Inferential Statistics
(The logic of inferential statistics)

Standard error of the mean (SEM)
It’s the SD of the sampling distribution
Since the distribution is normal, ±1 SEM has 68% of cases; ±2 SEM has 95%; ±3 SEM has 99.7%
Once we can estimate the mean and SD of the sampling distribution, we can determine how likely it is that a particular sample mean came from that population
e.g. Mean of pop=100, SD of the sampling distribution (SEM)=10 and draw a sample with a mean of 110, yes could be from that pop…but if draw a sample with a mean of 140, most likely NOT from that pop…since it is +4 SEM from the mean (almost zero probability)
Express means as z scores; a z score more than 2 SEM from the mean is going to occur less than 5% of the time (2.5% each side)


(The logic of inferential statistics)
Estimating the SEM
It is estimated from the SD of the sample, adjusted for sample size: SEM = SD/√(n-1)
Confidence Intervals (CI)
Use the SEM to indicate boundaries
95% of the time a pop mean will be within ±2 SEM of the sample mean (actually ±1.96 SEM)
If sample mean IQ=85 (& SEM=2) then 95% of the time the pop mean IQ will be 85 ± 1.96(2) or 85 ± 3.92, which is 81.08 to 88.92; 99% CI = 85 ± 2.58(2) = 79.84 to 90.16
Can be 95% confident that true pop mean is 81.08-88.92
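The IQ example can be reproduced in a few lines. The function names are illustrative, and the sample SD of 10 with n=26 is an assumed pairing chosen only because it yields the SEM of 2 used in the notes.

```python
import math

def sem_from_sample(sd, n):
    """Estimate the SEM from a sample SD, as in the notes: SD / sqrt(n - 1)."""
    return sd / math.sqrt(n - 1)

def confidence_interval(sample_mean, sem, z_cutoff):
    """Return (lower, upper) bounds: sample mean plus/minus z_cutoff SEMs."""
    margin = z_cutoff * sem
    return (sample_mean - margin, sample_mean + margin)

sem = sem_from_sample(sd=10, n=26)           # assumed SD and n, giving SEM = 2
ci95 = confidence_interval(85, sem, 1.96)    # 95% cutoff
ci99 = confidence_interval(85, sem, 2.58)    # 99% cutoff

print(f"95% CI: {ci95[0]:.2f} to {ci95[1]:.2f}")
print(f"99% CI: {ci99[0]:.2f} to {ci99[1]:.2f}")
```

Asking for more confidence (99% instead of 95%) widens the interval; a larger sample shrinks the SEM and narrows it.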


(The logic of inferential statistics)
Probability is a predicted occurrence such as 5 in 100 times (5% or .05)
In previous example, the probability of the population mean being outside the 95% CI (of 81.08 to 88.92) is 5%
Usually comparing more than one mean
Examine difference in 2 sample means to see how likely the difference in the sample is to represent a true difference in the population…is it due to a true difference in the pop or only due to sampling error
The SEM of the difference between sample means, called the SED or standard error of the difference, is used, and w/in ±1 SED is 68%; ±2 SED is 95%; ±3 SED is 99.7%


(Hypothesis Testing)
A hypothesis is a predicted relationship
Usually comparing means, proportions, or looking for correlations between groups
The heart of infer. stats…is the relationship found in the sample most likely due to a relationship in the pop, or just due to random sampling error?
The null hypothesis is stated and tested
THE NULL ALWAYS SAYS THERE IS NO RELATIONSHIP OR DIFFERENCE!!!


(Hypothesis Testing)
Research hypothesis is what you really think is going on; opposite of the null
Example of hypothesis test
H0 (null) is that mean1=mean2, meaning the mean scores are equal OR the difference between the mean scores is 0
The distribution for a difference of zero between the means is a normal curve centered on zero
As diff between means gets larger, meaning further from the center (in SED units), the more likely it is to represent a true diff in the pop means
If the prob is .05 or less, reject null…called a statistically significant difference (some fields use .01 or .001)
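The logic of testing a difference between two means can be sketched as follows. The means and SEMs are invented, and combining the two SEMs as the square root of the sum of their squares is an assumption that holds for independent samples; the normal curve is used in place of a t-table, which is reasonable only for large samples.

```python
import math
from statistics import NormalDist

def standard_error_of_difference(sem1, sem2):
    """SED for two independent samples (assumed formula)."""
    return math.sqrt(sem1 ** 2 + sem2 ** 2)

def two_tailed_test(mean1, mean2, sed):
    """Return (z, p): distance from zero in SED units and its two-tailed probability."""
    z = (mean1 - mean2) / sed
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical group results.
sed = standard_error_of_difference(3, 4)
z, p = two_tailed_test(85, 73, sed)

decision = "reject the null" if p <= 0.05 else "do not reject the null"
print(f"SED={sed}  z={z:.2f}  p={p:.4f}  -> {decision}")
```

Here the difference of 12 points is 2.4 SEDs from zero, so the probability of getting it by sampling error alone falls below .05.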


(Hypothesis Testing Process)
State the research hypothesis (Ha or Hr)
State the null (H0) (Remember NO)
Obtain the sample statistics (means, proportions, correlations)
Determine the probability of getting the sample results just by chance if the null is true
Small probability (p<.05) means reject null; there is a significant difference (or correlation) in pop.
Large probability (p>.05) means do not reject; there is no significant difference (or correl) in pop.
Note: Just because finding is statistically significant does not mean it is a practical difference (given a large enough sample most are significant)

Chapter 11 - Continued
Inferential Statistics
(Hypothesis Testing)

One tailed versus two tailed tests
When literature strongly indicates the need for directional hypothesis then do a one-tail
In a one tail all 5% is on one side (2-tailed cutoff is 1.96 SD while 1-tailed cutoff is 1.65)
Type I (alpha) versus Type II error
See Figure 11.16, p. 240
Type I – reject a true null; Type II – fail to reject a false null
Inversely related errors
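The one-tailed and two-tailed cutoffs come straight from the normal curve. A short sketch using `statistics.NormalDist`; the `reject_null` helper is hypothetical and assumes the directional hypothesis predicts a positive z.

```python
from statistics import NormalDist

ALPHA = 0.05
std_normal = NormalDist()

# Two-tailed: 2.5% in each tail. One-tailed: all 5% in one tail.
two_tailed_cutoff = std_normal.inv_cdf(1 - ALPHA / 2)  # about 1.96
one_tailed_cutoff = std_normal.inv_cdf(1 - ALPHA)      # about 1.645 (1.65 in the notes)

def reject_null(z, two_tailed=True):
    """Compare an observed z with the cutoff for the chosen kind of test."""
    if two_tailed:
        return abs(z) >= two_tailed_cutoff
    return z >= one_tailed_cutoff

print(f"{two_tailed_cutoff:.2f} {one_tailed_cutoff:.3f}")
print(reject_null(1.8), reject_null(1.8, two_tailed=False))
```

A z of 1.8 is significant one-tailed but not two-tailed, which is why the directional hypothesis has to be justified by the literature before the data are seen.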


(Inference Techniques)
Parametric tests (for quantitative I/R data from normal distributions of sample size 30+)
t-tests compare means of two groups (can be independent or correlated/paired samples)
ANOVA tests compare means of two or more groups (use post hoc)
Correlations t-test (with computers just use significance of r)
Nonparametric tests (for categorical data and I/R from non-normal pops or small samples)
Mann Whitney U compares ranks of two groups
Kruskal Wallis Oneway ANOVA compares ranks of two plus groups
Chi-square test (compares proportions)

Power of tests – use parametrics and increase sample size
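As one concrete case, the chi-square statistic for a crossbreak table can be computed by hand. The counts below are invented; 3.841 is the standard .05 critical value for 1 degree of freedom (a 2x2 table).

```python
def chi_square_statistic(observed):
    """Chi-square for a table of counts: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if group and outcome were unrelated (the null).
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Hypothetical counts: rows are two groups, columns are yes/no responses.
observed = [[30, 20],
            [15, 35]]

CRITICAL_VALUE_DF1 = 3.841  # .05 level, 1 degree of freedom
chi2 = chi_square_statistic(observed)
print(f"chi-square={chi2:.2f}  significant={chi2 > CRITICAL_VALUE_DF1}")
```

Statistical packages report the exact probability instead of a table lookup, but the comparison against the critical value is the same decision.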

Chapter 12
Statistics in Perspective

Approaches to research
Either 2 or more groups compared OR variables in 1 group studied AND data are either categorical or quantitative
Comparing groups on quantitative data
Can compare freq distributions (histograms), m. of center, and m. of spread OR all three
Interpretation – improves with experience…need to know when something statistically significant is not practically significant
Calculate effect size - look at size of difference or delta Δ…if it is greater than .5, practically significant
Use infer. stats judiciously, paying attention to size of diff. and sample size and method it is based on
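A sketch of the effect size check. Dividing the mean difference by the comparison group's SD is one common definition and is assumed here; the notes only give the .5 rule of thumb, and the numbers are invented.

```python
def effect_size_delta(mean_experimental, mean_comparison, sd_comparison):
    """Mean difference expressed in comparison-group SD units (assumed definition)."""
    return (mean_experimental - mean_comparison) / sd_comparison

# Hypothetical group results.
delta = effect_size_delta(78, 72, 10)
practically_significant = abs(delta) > 0.5  # rule of thumb from the notes

print(f"delta={delta:.2f}  practically significant={practically_significant}")
```

Unlike a p value, delta does not grow with sample size, so it separates a large difference from a small one that merely reached significance in a big sample.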

Chapter 12 - continued
Statistics in Perspective

Relating variables within group w/ quant data
Scatterplot and correl coeff – examine plot carefully
Beyond significance pay attn to size of r and especially to r-squared
Examine how sample data collected
Comparing groups w/ categorical data
Use freq and percent in crossbreak tables
Look at summary stats carefully and pay attn to sample size
Relating variables within a group with categorical data – use one sample chi-square

Chapter 12 - continued
Statistics in Perspective

Recap
Use graphics and numbers
Pay attention to outliers
Pay attention to magnitude of differences
Use inference tests for generalizing purposes and examine sampling
Use multiple techniques and CIs
