Chapter 01 Terminology


Upload: goldielou

Post on 27-Dec-2015


DESCRIPTION

Statistics Chapter 1 overview

TRANSCRIPT

Page 1: CHAPTER 01 Terminology

Terminology


Introduction


Statistics (as opposed to statistic) is the science

of gathering, organizing, analyzing, and

interpreting numerical and categorical

information.

“Statistics is the art of decision making in the

presence of uncertainty”

CPT


Descriptive Statistics involve methods

of organizing, picturing, and

summarizing information from samples

or populations (Chapters 1-3)

Inferential Statistics involve methods of

using information from a sample to draw

conclusions regarding the population

(Chapters 7-11)


[Diagram: Descriptive Statistics, Probability, Inferential Statistics]


1. Choose a topic and identify the problem to be addressed.

2. Background research: past research, including historical descriptive statistics and references.

3. Develop a conjecture or hypothesis.

4. Design an experiment.

5. Gather additional information through experimentation and observation.

6. Analyze and interpret the results; if the hypothesis is rejected, then repeat Steps 3 through 6.

7. Formulate conclusions and draw inferences.


A population is a group of individual persons, objects, or items whose characteristics one wishes to better understand and from which samples are taken for statistical measurement. The size of the population is denoted N.

The population data is the complete collection of information from all of the individuals or subjects of interest in a given study.


A census is a survey of every individual in the

population and the information gathered is called

the population data.

Recall: the population data is the complete

collection of information from all of the

individuals or subjects of interest in a given

study.


A sample is a partial collection of information for only some of the individuals or subjects of interest in a given study.

The sample size, n, of a sample is the number of

observations that constitute the sample. Whereas

the size of a population is denoted by the capital

letter N, the size of the sample is denoted by the

lowercase letter n.


Descriptive Statistics involve methods of

organizing and summarizing information (data)

and presenting it numerically or visually

(graphically).

Stem-and-leaf, Frequency Tables, and

Contingency Tables

Bar Charts, Pie Charts, and Histograms

Scatter Plots

Among others…


The distribution of the data is a list of all the

values recorded in a sample; that is, the observed

outcomes and their frequency.

Distributions can be given in tabular form as a

frequency or contingency table or illustrated

graphically in the form of a bar chart or

histogram.


Characteristics of Distribution

Location: The mean, median, and mode are central tendencies

Shape: Uniform distribution is symmetric, not skewed

Spread: The range of information between minimum and maximum


Location within a distribution is a specific value

within the domain of the data such as the

extremes: minimum and maximum, central

tendencies, etc.


Outliers are data values that belong to the distribution but lie outside the overall pattern (cluster) of the graph; that is, extreme values that can distort the interpretation of the data by creating misleading statistics.


Common Locations

Extremes: Minimum and Maximum

Central Tendencies: Mean, Median, and Mode (among other weighted or trimmed means)

Percentiles: A value, x_p%, such that p% of the observed values are to the left

Quartiles: Percentiles with p% = 0%, 25%, 50%, 75%, 100%
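As a rough illustration of percentiles and quartiles, the sketch below uses Python's standard `statistics.quantiles`. The data set is hypothetical, and the "inclusive" interpolation method is an assumption; other textbooks and software use slightly different conventions and may report slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical sample, not taken from the slides.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Cut the ordered data into four equal parts; the three cut points are the
# 25th, 50th, and 75th percentiles (Q1, the median, and Q3).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# The 0th and 100th percentiles are the extremes (minimum and maximum).
five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)  # (1, 3.0, 5.0, 7.0, 9)
```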


Shape of a distribution describes the symmetry

or lack thereof (skewness).

Data that is symmetric exhibits balance and self-

similarity whereas skewness is a measure of the

asymmetry.


Common Shapes

Uniform: Equal frequencies; no mode

Bell Shaped: The mean equals the median equals the mode; looks like a bell

Symmetric: The above are symmetric

Skewed: Left: mean < median; Right: mean > median
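The mean-versus-median comparison for skewed data can be tried on small samples. This is a minimal sketch with hypothetical data; the function name `skew_direction` is invented for illustration, and the comparison is a rule of thumb rather than a formal skewness measure.

```python
from statistics import mean, median

def skew_direction(values):
    """Compare mean and median: mean > median suggests right skew,
    mean < median suggests left skew."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"
    if m < med:
        return "left"
    return "none detected"

# Hypothetical samples.
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]       # one large value pulls the mean up
left_tail = [1, 17, 18, 18, 19, 19, 19, 20]  # one small value pulls the mean down
balanced = [1, 2, 3, 4, 5]                   # mean and median coincide

for sample in (right_tail, left_tail, balanced):
    print(mean(sample), median(sample), skew_direction(sample))
```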


Spread of a distribution is a measure which

indicates how the data values are distributed, a

measure of the dispersion or variability within a

group of values. Some appropriate measures

include:

Range (from minimum to maximum)

Variance (mean square error)

Standard Deviation (square root of variance)

where error = observed value – expected value
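These measures of spread can be computed directly from their definitions. The sketch below is one possible reading: it takes the mean as the expected value and divides by n (the population form); a sample variance would divide by n − 1 instead. The data are hypothetical.

```python
from math import sqrt

def spread(values):
    """Range, variance (mean squared error), and standard deviation."""
    n = len(values)
    expected = sum(values) / n                 # the mean plays the expected value
    errors = [x - expected for x in values]    # error = observed - expected
    variance = sum(e * e for e in errors) / n  # mean of the squared errors
    return {
        "range": max(values) - min(values),
        "variance": variance,
        "std_dev": sqrt(variance),
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
result = spread(data)
print(result)  # {'range': 7, 'variance': 4.0, 'std_dev': 2.0}
```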


Common Measures

Extremes: Minimum and Maximum

Central Tendencies: Mean, Median, and Mode (among other weighted or trimmed means)

Deviations: Range, Variance, and Standard Deviation

Frequency & Proportions: Count, Relative Frequency, Rate


An extreme is a characteristic farthest removed

from the ordinary; common extremes are the

minimum, which is the least observed value in a

sample, and the maximum, the greatest observed

value in a sample.


A central tendency is a measure of a characteristic that clusters around a central value; the data are used to estimate it. Such measures are called averages; there are three common measures: the mode, median, and mean.


Common Central Tendencies

Mode: Most frequently observed value

Median: A value such that 50% of the observed values fall to the left

Mean: Sum of all data divided by number of data points (equal weights)

Weighted Mean: Average such that the weights are not equal

Trimmed Mean: Average such that some of the weights are zero
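The five central tendencies can be compared side by side on one small data set. This is a sketch under assumptions: the data and weights are hypothetical, the helper names are invented for illustration, and the trimmed mean shown drops a fixed number of values from each end, which is only one common way to trim.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def weighted_mean(values, weights):
    """Average in which the weights are not necessarily equal."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def trimmed_mean(values, k):
    """Mean after giving the k smallest and k largest values zero weight."""
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [2, 3, 3, 5, 7, 10, 40]  # hypothetical sample; 40 is an outlier

print(modes(data))                      # [3]
print(median(data))                     # 5
print(mean(data))                       # 10
print(trimmed_mean(data, 1))            # 5.6
print(weighted_mean([80, 90], [1, 3]))  # 87.5
```

The outlier pulls the mean up to 10, while the median and the trimmed mean stay near the bulk of the data.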


Common Deviations

Range: Maximum minus minimum

IQR (Interquartile Range): Q3 – Q1

Variance: Expected value of the squared differences (errors)

Standard Deviation: Square root of the variance


Common Frequency & Proportions

Count: 1, 2, 3, …

Frequency: The number of times a given value occurs (a count)

Relative Frequency: The ratio of the frequency of one value to the sample size

Rate: Part in ratio to the whole; the relative frequency
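Frequencies and relative frequencies are simple to tabulate. The following is a small sketch with hypothetical eye-color data; the function name `frequency_table` is an assumption made for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Map each observed value to (frequency, relative frequency)."""
    counts = Counter(values)
    n = len(values)  # sample size
    return {value: (count, count / n) for value, count in sorted(counts.items())}

# Hypothetical sample of n = 8 individuals.
eye_colors = ["blue", "brown", "brown", "green", "brown", "blue", "brown", "green"]

table = frequency_table(eye_colors)
for value, (count, share) in table.items():
    print(value, count, share)
```

The relative frequencies always sum to 1, since each is a part in ratio to the whole.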


A statistical measure is sensitive if the

computed value changes readily even if a

single observed data value is different. Also, a

statistical method is sensitive if the decision

changes radically based on the assumptions

we made to develop it.

A statistical method is robust if the decision

is not strongly dependent on the assumptions.

Not formally defined in the text


The degree of confidence represents the

proportion of times the statistical methodology

used captures the true state of nature.

This value will be denoted (1 − α)%, where α is the level of significance.


An event is considered statistically significant if

its occurrence is unlikely to happen by chance.

This value (the level of significance) will be denoted by α%.


Inferential Statistics involves methods of

analyzing and interpreting descriptive statistics

to draw conclusions regarding a particular

characteristic in the population with a certain

degree of assurance based on a preset level of

significance and specified assumptions.


Hypothesis Testing is the use of a statistical

method in arguing for or against a hypothesized

value based on observed information and using

this information to make a decision regarding an

initial hypothesis and an alternative hypothesis.

Not formally defined in chapter 1


An experiment is the method (procedure) that

we follow to obtain data or information.

An experiment design is the art of planning

and executing experiments designed to gather

information (data) from the population, N, in

such a way as to ensure the sample, n, is

representative of the population, N.


Individuals are the people, places, or things

included in a study and for which information is

gathered. In medical research studies,

individuals are referred to as the subjects in the

study.


A variable is a distinct characteristic of an individual to be observed and measured. These observed data can be qualitative or quantitative.

A qualitative (categorical) variable is a variable that describes the individual by placing the individual into a category or group.

A quantitative (numerical) variable is a variable that takes on a real value or numerical measurement for which sums, differences, and ratios have meaning.


Regression is a statistical procedure used to

estimate the relationship among variables,

specifically between the primary (response)

variable of interest and all other variables.


In a study, the response variable is the primary

variable of interest; that is, the objective in the

given study.

This variable is also referred to as the dependent

variable (although the relationship of

dependence or cause-and-effect is yet to be

determined).


The explanatory variables are the extraneous variables that have been measured but are not the primary variable of interest; they are used to understand the behavior of the response (primary) variable. These variables are also referred to as independent variables (although the relationship of dependent/independent has not yet been established).


Lurking variable(s) are the unknown variables

that have not been measured; however, they do

contribute to the response (primary) variable and

are not included as an explanatory variable.


Correlation is a measure of association between

a response variable and an explanatory variable.

Correlation measures the strength and direction

of a simple linear relationship, that is, a straight

line.
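To make "strength and direction of a straight-line relationship" concrete, the sketch below computes the Pearson correlation coefficient from its usual definition. The slides do not name a specific coefficient, so using Pearson's r is an assumption, and the data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect rising line, -1 a perfect falling line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

explanatory = [1, 2, 3, 4, 5]
print(pearson_r(explanatory, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive line)
print(pearson_r(explanatory, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative line)

# A perfect but curved relationship (y = x squared) has zero linear correlation.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0
```

The last case shows why correlation describes only the linear part of a relationship: the two variables are strongly related, yet r is 0.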


Causation is more than a measure of association

between a response variable and an explanatory

variable. It also implies direct cause or

dependence.

Correlation between two events may be a

common response to a lurking variable.


Confounding variables are the variables that

have been measured and are significantly

contributing variables; however, their

independent contributions to the subject response

are indistinguishable and are not deemed

significantly contributing in the larger model.


Discrete/Continuous


An instrument is any means by which

information is gathered or measured such as an

exam, survey, or other rulers such as a barometer,

thermometer, etc.


A parameter is a numerical measure that describes the outlined characteristic of the population such as central tendencies (mean, median, mode, and proportion), spread (range, variance, and standard deviation), and shape (symmetric and skewed). In general, when a specific parameter is not specified, the lowercase Greek letter theta (θ) is used to denote a population parameter.


Common Parameters

Mean: μ

Variance: σ²

Standard Deviation: σ

Proportions: p

Correlation: ρ


A sample survey is a survey of only some of the

individuals in the population and the information

gathered is called the sample data.

The number of individuals included in the

sample survey is called the sample size, n.

The sample data is a subset of the population data, often denoted x1, x2, …, xn.


A statistic is a numerical measure that yields an estimate of a population parameter; that is, a numerical measure that uses the data from the sample to estimate the outlined characteristic of the population. This is as opposed to Statistics, the study of how to gather, organize, analyze, and interpret information.


POPULATION IS TO SAMPLE AS CENSUS IS TO SAMPLE SURVEY

POPULATION IS TO SAMPLE AS PARAMETER IS TO STATISTIC


Measure involves any standard of comparison, estimation, or judgment; property of an individual given a numerical value; a quantity, a count, a degree, a rate, or a proportion. In terms of data collections, the measured values are referred to as the outcomes or observed values. Two types of measure are discrete and continuous.


A discrete measure is such that the possible observed outcomes are separate, distinct, and countable, such as a count.

Discrete measures are such that the outcomes can

be enumerated: one, two, three, etc.


Examples of Discrete Measures

Number: Number of children in a family tree. Depending on the number of generations included in the tree, there can be 1, 2, 3, …, but not 1.5; there is nothing between 1 and 2, or 2 and 3, etc.

Count: Count of whole beans. Depending on the number of pods included, there can be 1, 2, 3, …, but not 1.2, since the count is restricted to whole numbers.

Frequency: Frequency of blue-eyed men and green-eyed women.


A continuous measure is such that the set of

possible observed outcomes are infinite and

uncountable.

Continuous measures are dense; that is, between any two values (outcomes) there exists another value (outcome), such as a mean or rate.


Examples of Continuous Measures

Length: Length of a road. It can measure 1 mile or 2 miles, and between these possible measures exist 1.5 miles and 1.24 miles; in fact, between any two values there exist other possible values.

Height: Height of a man. A man can be 5 feet tall or 6 feet tall, and between these potential values exist 5.5 feet, 5.14 feet, etc. While we might not have an instrument precise enough to measure 1/100th of a foot, this measure exists.

Age: Age of a woman. Between this moment and the next, there is a continuous existence. Between 1 yr and 2 yrs, 1.8 yrs exists, etc.


Samplings


Validity refers to the degree of accuracy to which a study reflects the specific concept or characteristic that the analyst or researcher is attempting to measure. Internal validity is the degree to which one can draw valid conclusions about the causal effect between variables. External validity is the degree to which one can extend the findings that are relevant to subjects and settings outside those included in the experimental design.


For example, when evaluating a class of 180 students from a single mass lecture of STA 2023, can this information be used to evaluate all students taking STA 2023, given that there is more than one section taught by different instructors?

Internal Validity – drawing conclusions about the specific subjects inside the study.

External Validity – the ability to extend conclusions to subjects outside of the study.


Bias is a consistent deviation of the statistics to

one side of the parameter.

LOW BIAS HIGH BIAS


For example, when weighing out coffee to be

ground and brewed at a coffee shop, the

employee forgets to zero-out the scale with the

cup used to measure the coffee. This leads to the

coffee measured in the cup to be off by the

weight of the cup.

Solution: add the weight of the cup in coffee to

each cup.


Variability measures the degree of dispersion within a given data set. Some common measures of dispersion include range, mean (average) deviation, standard deviation, variance, inter-quartile range, and mean difference. Variability can appear as gaps in the data when illustrated graphically.


Reliable refers to the precision (consistency) of the actual measuring instrument or procedure.

A reliable measure is a (precise) measurement such that the random error is small.


Valid (Accurate)

Reliable (Precise)

We like samples to

represent the population

and the measures taken to

represent the parameters

estimated. These statistics

need to be a valid measure,

accurately estimating the

parameter with low bias as

well as be reliable,

measured with such

precision as to have low

variability when estimating

the parameter using

statistics.


ACCURACY (VALID MEASURE)

HITS THE TARGET’S “BULL’S EYE”

PRECISION (RELIABLE MEASURE) HITS THE SAME LOCATION REPEATEDLY
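The target analogy can be put in numbers: bias measures how far the estimates sit from the parameter on average (accuracy), and the spread of the estimates measures how tightly they cluster (precision). The following is a minimal sketch; the parameter value, the two sets of estimates, and the function name `bias_and_spread` are all hypothetical.

```python
from statistics import mean, pstdev

def bias_and_spread(estimates, parameter):
    """Return (bias, spread): the average miss and the scatter of the estimates."""
    return mean(estimates) - parameter, pstdev(estimates)

parameter = 50  # hypothetical true value (the "bull's eye")

precise_but_biased = [55, 56, 55, 54, 55]      # tight cluster, off target
accurate_but_imprecise = [40, 60, 45, 55, 50]  # centered on target, scattered

bias_a, spread_a = bias_and_spread(precise_but_biased, parameter)
bias_b, spread_b = bias_and_spread(accurate_but_imprecise, parameter)
print(bias_a, round(spread_a, 2))  # high bias, low variability
print(bias_b, round(spread_b, 2))  # low bias, high variability
```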


ACCURATE INACCURATE


PRECISE IMPRECISE


PRECISE IMPRECISE


Nominal, Ordinal, Interval, & Ratio


Common Levels of Measure

Nominal: Data that consist of names, labels, or categories

Ordinal: Data that can be arranged in order; however, differences between data values cannot be determined or are meaningless

Interval: Data that can be ordered and differences have meaning, but ratios do not (equal distances, but no fixed zero)

Ratio: Ordinal and interval, but ratios have meaning (equal distances and fixed zero)


A nominal measure is one that measures a

characteristic of an individual by name only;

information in the form of categorical data where

the order of the categories is not relevant.

Names only – no calculations can be performed.


Examples of Nominal Measures

Surnames: Can be made ordinal if considered alphabetically, but otherwise, this is a name only.

SSN: There are relations among the digits that make up such numbers, but there is not a true ordering, difference, or ratio.

Zip Code: While these codes can be “ordered” numerically, the order is arbitrary and therefore not meaningful – the zip code 33617 is not “less than” the zip code 33620 – the only difference is geographical.

Gender: Male/Female: these are clearly labels for which there is no order other than “alphabetically”; however, it is meaningless to argue “less than” or “greater than” in general.


An ordinal measure is one that measures a

characteristic of an individual by the rank order

(1st, 2nd, 3rd, etc.) of the entities measured or by

implied ordering such as worst, bad, good, great.

Ordering the measured outcomes.


A simple ranking imposes an order on the

measured characteristic of an individual and the

set of natural numbers by defining a relationship

that establishes the position within a sequence of

outcomes "ranked higher than," "ranked lower

than," or "ranked equal to."

Imposing an ordinal scale.


A Likert scale establishes the hierarchy within a

sequence of outcomes.

For example, “how attractive is a person on a

scale from 1 to 10,” 1 meaning not very

attractive to a 10 which represents perfect

attraction.


Examples of Ordinal Measures

Ranking: What is the best-selling flavor of ice cream?

Likert Scale: A five-point scale by which to evaluate an instructor: poor, unsatisfactory, satisfactory, good, great.

Dress Size & Shoe Size: “Sizes” are inconsistent between designers. A size “0” is smaller than a size “2,” which is smaller than a “4,” but this does not mean the difference between a “2” and a “4” is the same as the difference between a “0” and a “2.” Furthermore, a “4” is not twice as large as a “2”; this ratio has no meaning.


An interval measure is one that measures a

characteristic of an individual where differences

between measures have meaning; that is, the

distance between two adjacent units is the same

but there is no meaningful zero point. An interval

measure is such that sums and averages have

meaning; however, ratios do not have meaning.

Sums (differences) but not ratios.


Examples of Interval Measures

Time of Day: If your watch reads 12:05 and mine reads 12:07, then my watch reads a later time than yours; hence the measure is at least ordinal. Moreover, there is a 2-minute difference; therefore this measure is interval. It is not ratio since 12:07 in ratio to 12:05 has no meaning.

Temperature: If the daytime temperature is 50°F in New York and 100°F in Miami, then it is 50°F hotter in Miami than it is in New York. While the ratio of 100°F to 50°F is 2, this ratio has no meaning and is therefore an invalid measure. You cannot say 100°F is “twice as hot” as 50°F. Temperature in kelvins, which has an absolute zero, can be treated as ratio; however, temperature on the Fahrenheit or Celsius scale is interval.
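One way to see why the Fahrenheit ratio has no meaning is to convert the same two temperatures to Celsius: the difference rescales consistently, but the ratio changes with the choice of scale. The sketch below uses the standard conversion formula; the variable names are illustrative.

```python
def fahrenheit_to_celsius(deg_f):
    """Standard conversion: subtract 32, then scale by 5/9."""
    return (deg_f - 32) * 5 / 9

new_york_f, miami_f = 50.0, 100.0
new_york_c = fahrenheit_to_celsius(new_york_f)  # 10.0
miami_c = fahrenheit_to_celsius(miami_f)

ratio_f = miami_f / new_york_f  # 2.0 on the Fahrenheit scale
ratio_c = miami_c / new_york_c  # about 3.78 on the Celsius scale

diff_f = miami_f - new_york_f   # 50 Fahrenheit degrees
diff_c = miami_c - new_york_c   # the same gap in Celsius degrees (50 * 5/9)

print(ratio_f, round(ratio_c, 2))
print(diff_f, round(diff_c, 2))
```

A true ratio measure, such as height, gives the same ratio in feet or in inches because both scales share the same zero.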


A ratio measure is one that measures a

characteristic of an individual where not only do

differences between measures have meaning, but

ratios also have meaning. That is, a measure in

which any two adjoining values are the same

distance apart and there is a true zero point. Ratio

measures have fixed zeros; that is, an interval

measure with a true zero.


Examples of Ratio Measures

Time Past Noon: At 2:00, the measure is 2 hours past noon and at 4:00, the measure is 4 hours past noon. 4 hours is greater than 2 hours; hence at least ordinal. The difference between 4 hours and 2 hours is 2 hours, which has meaning; hence at least interval. Moreover, the ratio of 4 hours to 2 hours is 2, that is, 4 hours is twice as much time as 2 hours; thus this measure is ratio.

Height: If you are 6 feet tall and your child is 3 feet tall, then you are taller than your child (ordinal), you are 3 feet taller than your child (interval), and you are twice as tall as your child (ratio). Therefore, this measure is ratio.

Age: If you are 36 years old and your child is 12 years old, then you are older than your child (ordinal), you are 24 years older than your child (interval), and you are three times as old as your child (ratio). Therefore, this measure is ratio.


Changing Level of Measure

Ratio: What is your yearly salary? (a continuous scale)

Interval: What is your income bracket? (a discrete scale) 0-9,999, 10,000-19,999, 20,000-29,999, 30,000-39,999, 40,000-49,999, 50,000-59,999, etc., where the difference between intervals is 10,000

Ordinal: What is your tax bracket? (a discrete scale) 0-9,999, 10,000-39,999, 40,000-59,999, 60,000 or more, where differences are not well-defined

Nominal: In what currency are you paid? Dollar, Yen, Euro, etc. (ordinal if you consider exchange rates)


SRS, Systematic, Cluster, Stratified, etc.


Samples: Simple Random Samples, Systematic Samples, Cluster Samples, Stratified Samples, Convenience Samples


A random sample is a sample of size n taken

from a population of size N in such a way that

each individual observed has an equally likely

chance of being selected.


A simple random sample (SRS) is such that

(1) each individual has an equally likely chance

of being selected as well as

(2) all groups of size n have an equally likely

chance of being selected.
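A simple random sample can be drawn in software instead of from a chart of random digits. This is a minimal sketch using Python's standard `random.sample`, which selects without replacement so that every group of size n is equally likely; the population of 100 numbered individuals and the seed are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement; every group of size n is equally likely."""
    rng = random.Random(seed)  # a seed only makes the sketch repeatable
    return rng.sample(population, n)

population = list(range(1, 101))  # hypothetical population, N = 100
sample = simple_random_sample(population, 10, seed=2023)  # sample size n = 10
print(sorted(sample))
```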


Common Sampling Schemes

Systematic: Using a system to select

Cluster: Using clusters of individuals that are pre-existing

Stratified: Using “clusters” of individuals selected by a specified strata

Convenience (Volunteer Response): Using individuals who are conveniently surveyed

Multi-stage: More than one stage of sampling done in succession


Systematic sampling is a sample such that every

kth individual or item is measured.

Every 3rd: 1, not 2, not 3, 4, not 5, not 6, 7

Every 5th: 1, 6, 11,16,… or 2, 7, 12, 17,…

or 5,10,15,20…. Etc.
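The "every kth" patterns listed above can be reproduced with a slice. This is a small sketch: the list of 20 numbered individuals is hypothetical, and the `start` parameter (which individual is picked first) is an assumption standing in for however the starting point is chosen in practice, often at random.

```python
def systematic_sample(items, k, start=0):
    """Take every kth item, beginning at position `start` (0-based)."""
    return items[start::k]

individuals = list(range(1, 21))  # hypothetical list of 20 numbered individuals

print(systematic_sample(individuals, 3))           # [1, 4, 7, 10, 13, 16, 19]
print(systematic_sample(individuals, 5))           # [1, 6, 11, 16]
print(systematic_sample(individuals, 5, start=1))  # [2, 7, 12, 17]
print(systematic_sample(individuals, 5, start=4))  # [5, 10, 15, 20]
```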


Cluster sampling is such that groups are selected based on pre-existing groupings that are arbitrary to the individual and not based on any characteristic of the individual.

In the country, by region

In the state, by zip code

In the state or nation, by area code

For example, in a state, randomly selecting five counties and surveying 100 individuals from each
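The county example can be sketched in two steps: pick whole clusters at random, then survey within each chosen cluster. Everything below is hypothetical, including the 12 counties of 500 residents each and the function name `cluster_sample`.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Pick whole pre-existing groups at random, then survey within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], per_cluster) for name in chosen}

# Hypothetical state: 12 counties, 500 residents each.
counties = {
    f"county_{c:02d}": [f"c{c:02d}_resident_{r}" for r in range(500)]
    for c in range(12)
}

survey = cluster_sample(counties, n_clusters=5, per_cluster=100, seed=1)
print({name: len(people) for name, people in survey.items()})
```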


Stratified sampling is such that individuals are first grouped by specific characteristics such as gender and then samples are taken from each group or stratum.

Individuals grouped by gender

Individuals grouped by age

Individuals grouped by race

For example, grouping individuals by gender, male/female, then selecting 100 individuals from each group
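In contrast to cluster sampling, stratified sampling groups individuals by a characteristic first and then samples from every group. A minimal sketch, assuming a hypothetical roster of 600 individuals and an invented helper named `stratified_sample`:

```python
import random

def stratified_sample(individuals, stratum_of, per_stratum, seed=None):
    """Group individuals by a characteristic, then sample from every stratum."""
    rng = random.Random(seed)
    strata = {}
    for person in individuals:
        strata.setdefault(stratum_of(person), []).append(person)
    return {label: rng.sample(members, per_stratum) for label, members in strata.items()}

# Hypothetical roster of (id, gender) pairs: 300 individuals in each group.
roster = [(i, "female" if i % 2 == 0 else "male") for i in range(600)]

picked = stratified_sample(roster, stratum_of=lambda p: p[1], per_stratum=100, seed=1)
print({label: len(members) for label, members in picked.items()})
```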


Convenience sampling is such that individuals

are selected based upon ease of access. Such

sampling techniques are prone to bias. An

example of a convenience sampling is a

volunteer response.

Individuals as they passed by

Individuals willing to call in on a talk show

Individuals who agree to take online surveys


Multistage sampling is such that more than one

sampling technique is employed in the gathering

of information.

First stratify by gender, then systematically

take every other individual in each group.

First cluster individuals by state, then poll

these regions using mailers which individuals

have the option to fill out at their convenience


Too Regular

Implausible Numbers Inconsistencies

Missing Information

Non-Adherers

Non-sampling Error

Hidden Agenda

Hidden Bias

Survey Error

Under-coverage

Incorrect Arithmetic


Control, Randomization, Replication, & Enough Information


An observational study is an experiment

designed to observe without interference from

the observer in that every effort is made not to

sway the subject response or lead a subject in

their response.

Do not sway individuals!


Common Observational Studies

Retrospective studies: Historical data (past)

Cross Sectional: Single point in time (present)

Prospective studies (Longitudinal): Data gathered over an extended period of time (future)


An experimental study is an experiment designed to be observed with interference from the observer in that specific treatments are applied to the individuals, in an effort to measure differences in the subject response.

Note: the treatments used in an experiment are intended to sway the outcome of the subject response.

Subject → Treatment → Response (Outcome)


A treatment is any condition set forth that is

applied to the individual or subject in an effort to

determine differences among a variety of

treatments as compared to each other or a control

group.


A control group is a group created for sake of

comparison. This group can be one of the

treatment groups or a group that receives a false

treatment called a placebo.

Experimental Group: Subject → Treatment → Response (Outcome)

Control Group: Subject → No Treatment (placebo) or Secondary Treatment → Response (Outcome)


The placebo effect occurs when a subject

receives a false treatment (such as a sugar pill) or

no treatment, but (incorrectly) believes he or she

is in fact receiving treatment and responds

favorably.


In an experimental design, a block is a group of

individuals stratified based on a similar

characteristic and given treatments.

A block design is an experimental design in

which individuals or subjects are grouped into

categories or blocks and then test blocks are

treated as experimental units given different

treatments.


A randomized-block design is an experimental

design in which individual subjects are matched

based on a specific variable. The subjects are

then put into blocks of the same size as the

number of treatments and then each block is

assigned to different treatment groups randomly.


A (single) blind experiment is an experiment in

which individual subjects do not know the

treatment they receive; however, the researcher is

aware.

A double blind experiment is an experiment in

which neither the individual subjects nor the

researcher are aware of who received what

treatment.


Principles of Experimental Design

Control: A comparative or control group

Randomization: Selected at random

Replication: To verify validity and reliability

Enough Information: More important in inferential statistics and not so much in descriptive statistics


Stages of Sampling

Population: Define population of concern

Sampling Frame: The set of variables to be measured

Sampling Method: Systematic, Cluster, Stratified, etc.

Sampling Size (n): Large enough n (compared to N)

Experimental Design (ED): Implement sampling plan (ED)

Sampling: Action of data collection


Medical Trials and Simulations


Medical Trials

IRB: Institutional Review Board

IEC: Independent Ethics Committee

ERB: Ethical Review Board

Informed Consent: Requires that the individual (1) be informed and (2) give consent


Anonymity is when no personal information is taken. A coding system is in place to allow the subject to get the information regarding a survey without giving out any personal information; that is, the information is not personally identifiable.

Confidentiality is when personal information is given, but not shared. Only the statistical summaries are made available to other organizations or persons involved in the study.


Informed consent is when the individual

person is both informed of the ramifications

involved in the study and gives consent to

participate in the knowledge of such things as

side effects.


Simulation is the imitation of a natural

process using general characteristics or

behaviors in an effort to mimic or model the

natural system.

“A simulation is only as good as the underlying

analytical model"

CPT

Can be used to verify statistical methods.


Examples of Simulation

Use a fair die to simulate the tossing of a fair coin.

ONE POSSIBILITY: Let evens represent a head and odds represent a tail. Hence the sequence 1, 5, 4, 6, 5 would represent T, T, H, H, T.
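The even/odd rule is easy to check in code. The sketch below applies it to the sequence of rolls given above and then to ten simulated rolls; the function name `die_to_coin` and the seed are assumptions made for illustration.

```python
import random

def die_to_coin(roll):
    """Even faces (2, 4, 6) count as a head, odd faces (1, 3, 5) as a tail."""
    return "H" if roll % 2 == 0 else "T"

slide_rolls = [1, 5, 4, 6, 5]
slide_tosses = [die_to_coin(r) for r in slide_rolls]
print(slide_tosses)  # ['T', 'T', 'H', 'H', 'T']

# Simulated rolls of a fair die; each face is equally likely, so heads and
# tails each have probability 3/6 = 1/2.
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(10)]
tosses = [die_to_coin(r) for r in rolls]
print(rolls, tosses)
```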


Random digit chart is the table of digits

selected at random and placed in a table in

Appendix B which can be used to simulate or

sample data.

07892632401926795457


Examples of Simulation using Random Digits

A man has a 60% chance of having a boy and a 40% chance of having a girl. Use the random digit chart to simulate the birth order of three children.

Random digits: 07892632401926795457

Let 0-5 represent a boy and 6-9 represent a girl; hence, the sequence of random digits 078 would simulate the sequence of children: boy, girl, girl.
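The digit-to-outcome rule can be written as a short function. This sketch reuses the string of random digits from the slide; the function name `simulate_births` is an assumption made for illustration.

```python
RANDOM_DIGITS = "07892632401926795457"  # digits quoted on the slide

def simulate_births(digits, count=3):
    """Digits 0-5 (six of ten, 60%) mean a boy; digits 6-9 (40%) mean a girl."""
    return ["boy" if int(d) <= 5 else "girl" for d in digits[:count]]

first_family = simulate_births(RANDOM_DIGITS)
print(first_family)  # ['boy', 'girl', 'girl']

# Reading on in groups of three digits simulates further families.
second_family = simulate_births(RANDOM_DIGITS[3:])
print(second_family)  # ['girl', 'boy', 'girl']
```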


Randomization or random charts can be used to sample or re-sample the data.

For example, if there are 100 data points available and we only need 30, then we can randomly select this sample by enumerating the data and using the random chart to select the required number with or without replacement. With replacement, we can resample 200 times even though there are only half this many data points to start – this technique is called bootstrapping.
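Both kinds of selection described above are available in Python's standard `random` module. This is a sketch under assumptions: the 100 data points are simply enumerated 1 to 100, and the seed is fixed only so the run is repeatable.

```python
import random

rng = random.Random(42)       # fixed seed only so the sketch is repeatable
data = list(range(1, 101))    # the 100 available data points, enumerated

# Without replacement: 30 distinct data points.
subsample = rng.sample(data, 30)

# With replacement: 200 draws from only 100 points, so repeats must occur.
# Resampling with replacement is the basis of bootstrapping.
resample = rng.choices(data, k=200)

print(len(subsample), len(set(subsample)))
print(len(resample), len(set(resample)))
```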


Examples of Sampling using Random Digits

A committee of four is to be selected from a group of ten individuals: A, B, C, D, E, F, G, H, I, and J. Using the random set of digits 07892632401926795457, generate a random committee. Explain.

Let 0 represent A, 1 B, 2 C, 3 D, 4 E, 5 F, 6 G, 7 H, 8 I, and 9 J. Using the random set of digits 9263, generate a random committee as follows: 9 → J, 2 → C, 6 → G, 3 → D.
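The committee selection can be sketched as a lookup from digit to letter. One detail is an assumption not spelled out on the slide: a repeated digit is skipped, since the same person cannot fill two seats. The function name `committee_from_digits` is invented for illustration.

```python
MEMBERS = "ABCDEFGHIJ"  # 0 -> A, 1 -> B, ..., 9 -> J

def committee_from_digits(digits, size):
    """Read digits left to right, skipping anyone already chosen."""
    chosen = []
    for d in digits:
        person = MEMBERS[int(d)]
        if person not in chosen:  # assumption: repeats are skipped
            chosen.append(person)
        if len(chosen) == size:
            break
    return chosen

print(committee_from_digits("9263", 4))                  # ['J', 'C', 'G', 'D']
print(committee_from_digits("07892632401926795457", 4))  # ['A', 'H', 'I', 'J']
```

Starting at a different position in the chart gives a different, equally valid committee, which is why the digits 9263 and the start of the full string produce different answers.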


Descriptive Statistics vs.

Inferential Statistics

Population vs. Sample

N vs. n

Census vs. Sample Survey

Representative Samples

Sampling Techniques

Simulations

Re-sampling


Statistical Perspective

Biologists have microscopes

Physicists have telescopes

Statisticians have kaleidoscopes
