Basic Statistical Concepts and Methods


Upload: ahmed-refat

Post on 27-Jan-2015


Category:

Education



DESCRIPTION

Statistics is the science of dealing with numbers.  It is used for collection, summarization, presentation and analysis of data. Statistics provides a way of organizing data to get information on a wider and more formal (objective) basis than relying on personal experience (subjective).

TRANSCRIPT

Page 1: Basic Statistical Concepts and Methods

Ahmed-Refat-ZU

Basic Statistical Concepts and Methods

Ahmed-Refat AG Refat

FOM-ZU

Page 2: Basic Statistical Concepts and Methods

Definition of Statistics

Statistics is the science of dealing with numbers.

It is used for the collection, summarization, presentation and analysis of data.

Statistics provides a way of organizing data to get information on a wider and more formal (objective) basis than relying on personal experience (subjective).

Page 3: Basic Statistical Concepts and Methods

Uses of medical statistics

Medical statistics are used in:

1- Planning, monitoring and evaluating community health care programs.
2- Epidemiological research studies.
3- Diagnosis of community health problems.
4- Comparison of health status and diseases in different countries and in one country over the years.
5- Forming standards for the different biological measurements such as weight and height.
6- Differentiating between diseased and normal groups.

Page 4: Basic Statistical Concepts and Methods

Types of data

Any aspect of an individual that is measured is called a variable. Variables are either:

1- Quantitative or 2- Qualitative.

1- Quantitative data: numerical data.

Discrete data: usually whole numbers, such as the number of cases of a certain disease or the number of hospital beds (no decimal fraction).

Continuous data: measured on a continuous scale, e.g. height, weight, age (a decimal fraction can be present).

Page 5: Basic Statistical Concepts and Methods

1- Quantitative data

Quantitative data are numerical data. There are two types:

A- Discrete data: usually whole numbers, such as the number of cases of a certain disease or the number of hospital beds (no decimal fraction).

B- Continuous data: measured on a continuous scale, e.g. height, weight, age (a decimal fraction can be present).

Page 6: Basic Statistical Concepts and Methods

2- Qualitative data

Qualitative data are non-numerical data, subdivided into two types:

A- Categorical (nominal) data: purely descriptive, implying no ordering of any kind, such as sex or area of residence.

B- Ordinal data: imply some kind of ordering, such as:
- Level of education
- Socio-economic status
- Degree of severity of disease

Page 7: Basic Statistical Concepts and Methods

Presentation Of Data

The first step in statistical analysis is to present the data in a way that is easy to understand.

The two basic ways for data presentation are:

1. Tabular presentation.

2. Graphical presentation

Page 8: Basic Statistical Concepts and Methods

Tabulation

Some rules for the construction of tables:

1- The table must be self-explanatory.
2- Title: written at the top of the table to define precisely the content, the place and the time.
3- Clear headings for the columns and rows, with units of measurement.
4- The size of the table depends on the number of classes, usually between 2 and 10 rows or classes. The choice depends on the form of the data and the requirements of the distribution. Too few classes may obscure some information, and too many will differ little from the raw data.

Page 9: Basic Statistical Concepts and Methods

Types of tables

For qualitative data, draw a simple table, e.g. a list table: count the number of observations (frequencies) in each category.

For quantitative data, we have to form a frequency distribution table.

List tables (2 columns, one value for each measured variable)

Frequency distribution tables

Page 10: Basic Statistical Concepts and Methods

Types of tables

List: a table consisting of two columns, the first giving an identification of the observational unit and the second giving the value of the variable for that unit.

Example: the number of patients in each hospital department:

Department      Patients
Medicine        100
Surgery         80
ENT             28
Ophthalmology   30

Page 11: Basic Statistical Concepts and Methods

Frequency Distribution tables

FDTs are used for the presentation of qualitative (and quantitative discrete) data by recording the number of observations in each category.

These counts are called frequencies.

For these data there are no classes and no intervals.

Page 12: Basic Statistical Concepts and Methods

Frequency Distribution tables

An FDT for quantitative continuous data consists of a series of classes (intervals) together with the number of observations (frequency) whose values fall within the interval of each class.

Page 13: Basic Statistical Concepts and Methods

Frequency Distribution tables

EXAMPLE (1): Assume we have a group of 20 individuals whose blood groups were as follows: A, AB, AB, O, B, A, A, B, B, AB, O, AB, AB, A, B, B, B, A, O, A. We want to present these data in a table.

Question: what is the type of these data?

Page 14: Basic Statistical Concepts and Methods

How to Construct a Frequency Distribution table: Four Steps

(Title, Table, No., %)

1- Put a title

2- Draw columns and rows

3- Enumerate the individuals in each category

4- Calculate the relative frequency (%)

Page 15: Basic Statistical Concepts and Methods

How to Construct a Frequency Distribution table: Four Steps

1- Put a title, e.g. "Distribution of the studied individuals according to their blood group".

2- Draw a table (columns and rows):
First column heading > studied variable, "Blood Group"
2nd column heading > "Frequency-Number"
3rd column heading > "Percentage %"

Page 16: Basic Statistical Concepts and Methods

Frequency Distribution tables

3- Enumerate the individuals in each blood group, i.e. 6 individuals with blood group A, 6 with blood group B, 5 with AB and 3 with blood group O.

Make sure that the total number of individuals in all blood groups is 20 (the number of the studied group).

Page 17: Basic Statistical Concepts and Methods

Frequency Distribution tables

4- Calculate the relative frequency (%) of each blood group by dividing the frequency of that group by the total number of individuals and multiplying by 100, i.e. the percentage of group A = 6 / 20 x 100 = 30%, group B = 6 / 20 x 100 = 30%, group AB = 5 / 20 x 100 = 25% and group O = 3 / 20 x 100 = 15%. The final table will be:

Blood Group   Frequency-Number   Percentage %
A             6                  30
B             6                  30
AB            5                  25
O             3                  15
Total         20                 100
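The four steps for the blood group example can also be sketched in Python. This is only an illustration, not part of the original slides; the function name `frequency_table` and the column layout are assumptions made for the example.

```python
from collections import Counter

# Blood groups of the 20 individuals from Example (1).
blood_groups = ["A", "AB", "AB", "O", "B", "A", "A", "B", "B", "AB",
                "O", "AB", "AB", "A", "B", "B", "B", "A", "O", "A"]


def frequency_table(values):
    """Return (category, frequency, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]


rows = frequency_table(blood_groups)

# Title, column headings, one row per category, then the total.
print("Distribution of the studied individuals according to their blood group")
print(f"{'Blood Group':<12}{'Number':>8}{'Percentage %':>14}")
for category, number, percent in rows:
    print(f"{category:<12}{number:>8}{percent:>14.1f}")
print(f"{'Total':<12}{len(blood_groups):>8}{100.0:>14.1f}")
```

Checking that the frequencies add up to 20 and the percentages to 100 is a quick way to catch a counting mistake.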

Page 18: Basic Statistical Concepts and Methods

Frequency Distribution tables

What is your conclusion?

Page 19: Basic Statistical Concepts and Methods

Frequency Distribution tables

We can conclude from this table that blood groups A and B are the most common groups and the rarest is group O (judging by the percentage of each group).

So presenting data in a table helps in deducing facts and simplifies the information compared with raw data.

Page 20: Basic Statistical Concepts and Methods

Frequency Distribution tables

EXAMPLE (3): The following data are systolic blood pressure measurements (mmHg) of 30 patients with hypertension. Present these data in a frequency table:

150, 155, 160, 154, 162, 170, 165, 155, 190, 186, 180, 178, 195, 200, 180, 156, 173, 188, 173, 189, 190, 177, 186, 177, 174, 155, 164, 163, 172, 160.

Question: what is the type of these data?

Page 21: Basic Statistical Concepts and Methods

Frequency Distribution tables

Four Steps

1- Put a title, e.g. "Frequency distribution of blood pressure measurements (mmHg) among a group of hypertensive patients".

2- Draw a table (columns and rows):
First column heading > studied variable, "Blood Pressure-mm Hg"
2nd column heading > "Frequency-Number"
3rd column heading > "Percentage %"

Page 22: Basic Statistical Concepts and Methods

Frequency Distribution tables

3- In the first column we have to classify blood pressure into categories or classes because we have a large sample (N=30) and the measured variable is of continuous type (not discrete as in the previous examples).

Page 23: Basic Statistical Concepts and Methods

Frequency Distribution tables

Construction of classes

Calculate the range of the observations: subtract the lowest value of blood pressure from the highest value (the highest was 200 and the lowest was 150); the difference is 50.

Determine the number of classes and the width of the class intervals. Let the class interval be 10, so we will have 50/10 = 5 classes.

Enumerate the frequency by the tally method.

Calculate the exact frequency and the relative frequency.

Page 24: Basic Statistical Concepts and Methods

Frequency Distribution tables

Construction of classes

Determine the number of classes you want to display (not too few, about 2, and not too many, more than 8); it is a matter of trial and judgment.

Let the class interval = 10 mmHg; we will have 5 classes. If we choose 5 mmHg as the class interval width we will obtain 10 classes (too long a table).

We must maintain a constant width for all intervals.

Choose the upper and lower limits of the classes: start with the lowest value, i.e. 150.

List the intervals in order, every 10.
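The construction of classes for the blood pressure data can be sketched as below. This is a minimal illustration, not the author's table: the function name `class_frequencies` is hypothetical, and placing the highest reading (200) in the last class is an assumption made so that 50/10 gives exactly 5 classes.

```python
import math

# Systolic blood pressure (mmHg) of the 30 patients from Example (3).
sbp = [150, 155, 160, 154, 162, 170, 165, 155, 190, 186, 180, 178, 195, 200,
       180, 156, 173, 188, 173, 189, 190, 177, 186, 177, 174, 155, 164, 163,
       172, 160]


def class_frequencies(values, width):
    """Tally values into equal-width classes starting at the lowest value.

    Returns a list of (lower limit of class, frequency) pairs.
    """
    low, high = min(values), max(values)
    n_classes = max(1, math.ceil((high - low) / width))  # range / width
    counts = [0] * n_classes
    for v in values:
        # Assumption: the highest value is counted in the last class.
        index = min((v - low) // width, n_classes - 1)
        counts[index] += 1
    return [(low + i * width, counts[i]) for i in range(n_classes)]


table = class_frequencies(sbp, width=10)
total = len(sbp)
print(f"{'Blood pressure (mmHg)':<24}{'Number':>8}{'Percentage %':>14}")
for lower, frequency in table:
    print(f"{str(lower) + '-':<24}{frequency:>8}{100 * frequency / total:>14.1f}")
print(f"{'Total':<24}{total:>8}{100.0:>14.1f}")
```

Calling `class_frequencies(sbp, width=5)` gives 10 classes, which illustrates why the narrower interval produces too long a table.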

Page 25: Basic Statistical Concepts and Methods


Page 26: Basic Statistical Concepts and Methods

2-Graphical Presentation

The diagram should:

Be simple
Be easy to understand
Save a lot of words
Be self-explanatory
Have a clear title indicating its content
Be fully labeled

The y axis (vertical) is usually used for frequency.

Page 27: Basic Statistical Concepts and Methods

2-Graphical Presentation

Graphic presentations are used to illustrate and clarify information. Tables are essential in the presentation of scientific data, and diagrams are complementary, summarizing these tables in an easy, attractive and simple way.

Page 28: Basic Statistical Concepts and Methods

Graphical Presentation 1-Bar chart

It is used for presenting discrete or qualitative data. It represents the measured value (or %) by separated rectangles of constant width whose lengths are proportional to the frequency.

Types:

>>> Simple, >>> Multiple, >>> Component

Page 29: Basic Statistical Concepts and Methods

Graphical Presentation 1-Bar chart- Simple

[Figure: simple bar chart, "Mean maternal age of three studied groups". X axis: the studied groups (group I, group II, group III). Y axis: mean age in years (24 to 27).]

Page 30: Basic Statistical Concepts and Methods

Graphical Presentation 1-Bar chart

Multiple bar chart: each observation has more than one value, represented by a group of bars. Examples: percentage of males and females in different countries, percentage of deaths from heart diseases in old and young age, mode of delivery (cesarean or vaginal) in different female age groups.

Page 31: Basic Statistical Concepts and Methods

Graphical Presentation 1-Bar chart- Multiple

[Figure: multiple bar chart. Groups: Cancer, Anemia. Bars: Males, Females.]

Page 32: Basic Statistical Concepts and Methods


Graphical Presentation 1-Bar chart

Component bar chart : subdivision of a single bar to indicate the composition of the total divided into sections according to their relative proportion.

Page 33: Basic Statistical Concepts and Methods

Graphical Presentation 1-Bar chart

Component bar chart: for example, two countries are compared in their socio-economic standard of living. Each bar represents one country and its height is 100%. It is divided horizontally into 3 components (low, moderate and high socio-economic classes), each represented by a different color or shading.

Page 34: Basic Statistical Concepts and Methods

Graphical Presentation 1-Bar chart- Component

[Figure: component bar chart, "Comparison between Egypt and USA in socio-economic standard of living". X axis: Egypt, USA. Y axis: percentage of population (0% to 100%). Components: high, moderate, low.]

Page 35: Basic Statistical Concepts and Methods

Graphical Presentation 2-Pie diagram:

Consists of a circle whose area represents the total frequency (100%), divided into segments.

Each segment represents a proportional share of the total frequency.

Page 36: Basic Statistical Concepts and Methods

Graphical Presentation 2-Pie diagram:

[Figure: pie diagram, "Percentage of causes of child death in Egypt". Segments: diarrhea 50%, chest infection 30%, congenital 10%, accident 10%.]

Page 37: Basic Statistical Concepts and Methods

Graphical Presentation 3- Histogram:

It is very similar to the bar chart, with the difference that the rectangles or bars are adjacent (without gaps).

It is used for presenting a class frequency table (continuous data).

Each bar represents a class; its height represents the frequency (number of cases) and its width represents the class interval.

Page 38: Basic Statistical Concepts and Methods

Graphical Presentation 3- Histogram:

[Figure: histogram, "Distribution of studied group according to their height". X axis: height in cm (classes 100-, 110-, 120-, 130-, 140-, 150-). Y axis: number of individuals (0 to 30).]

Page 39: Basic Statistical Concepts and Methods

Graphical Presentation 4 -Frequency Polygon

Derived from a histogram by connecting the midpoints of the tops of the rectangles. The line connecting these midpoints is called the frequency polygon. We can draw the polygon without the rectangles, which gives a simpler form of line graph.

A special type of frequency polygon is the Normal Distribution Curve.

Page 40: Basic Statistical Concepts and Methods


Graphical Presentation 5 - Scatter diagram

- It is useful to represent the relationship between two numeric measurements, each observation being represented by a point corresponding to its value on each axis

Page 41: Basic Statistical Concepts and Methods

This scatter diagram shows a positive (direct) relationship between NAG and the albumin/creatinine ratio among diabetic patients.

[Figure: scatter diagram, "Correlation between NAG and albumin creatinine ratio in group of early diabetics". X axis: albumin creatinine ratio (0 to 0.35). Y axis: NAG (0 to 35).]

Page 42: Basic Statistical Concepts and Methods

In negative correlation, the points are scattered in a downward direction, meaning that the relation between the two studied measurements is inverse, i.e. if one measure increases the other decreases, as shown in the following graph.

[Figure: scatter diagram, "Correlation between Doppler velocimetry (RI) and baby birth weight". X axis: baby weight in kg (1.5 to 4.5). Y axis: RI (0 to 1).]

Page 43: Basic Statistical Concepts and Methods

Graphical Presentation 6- Line graph:

It is a diagram showing the relationship between two numeric variables (as in the scatter diagram), but the points are joined together to form a line (either a broken line or a smooth curve).

[Figure: line graph, "Changes in body temperature of a patient after use of antibiotic". X axis: time in hours (1 to 7). Y axis: temperature (36 to 39.5).]

Page 44: Basic Statistical Concepts and Methods


Normal Distribution Curve

Page 45: Basic Statistical Concepts and Methods

Normal Distribution curve

The NDC is a graphical presentation (frequency polygon) of any quantitative biologic variable.

The Normal Distribution Curve is the frequency polygon of a quantitative variable measured in a large number of individuals. It is a form of presentation of the frequency distribution of biologic variables such as weight, height, hemoglobin level and blood pressure, or any continuous data.

It plays a major role in the techniques of statistical analysis.

Page 46: Basic Statistical Concepts and Methods


Page 47: Basic Statistical Concepts and Methods

Characteristics of Normal Distribution curve

1- It is a bell-shaped, continuous curve.
2- It is symmetrical, i.e. it can be divided vertically into two equal halves.
3- The tails never touch the base line but extend to infinity in either direction.
4- The mean, median and mode values coincide.
5- It is described by two parameters: the arithmetic mean determines the location of the center of the curve, and the standard deviation represents the scatter around the mean.

Page 48: Basic Statistical Concepts and Methods

Areas under the normal curve

Mean (X) ± 1 SD covers about 68% of the total area (34% on each side of the mean).

Mean (X) ± 2 SD covers about 95% of the total area (47.5% on each side of the mean).

Mean (X) ± 3 SD covers about 99.7% of the total area (49.85% on each side of the mean).
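These areas can be checked numerically. The sketch below is an illustration only; it uses the standard relation between the normal curve and the error function, and the helper name `area_within` is hypothetical.

```python
import math


def area_within(k):
    """Fraction of a normal distribution lying within mean ± k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    total = 100 * area_within(k)
    # By symmetry, half of the area lies on each side of the mean.
    print(f"mean ± {k} SD: {total:.1f}% of the area ({total / 2:.2f}% on each side)")
```

The commonly quoted 95% corresponds more precisely to mean ± 1.96 SD; ± 2 SD is the usual rounded rule of thumb.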

Page 49: Basic Statistical Concepts and Methods

Skewed data

If we represent collected data by a frequency polygon and the resulting curve does not resemble the normal distribution curve (with all its characteristics), then these data are not normally distributed.

Page 50: Basic Statistical Concepts and Methods

Causes of Skewed Curve (Not Normally Distributed Data)

The curve may be skewed to the right or to the left.

This may be because the data collected are from:

1. a certain heterogeneous group

2. or a diseased or abnormal population

Therefore the results obtained from these data cannot be applied or generalized to the whole population.

Page 51: Basic Statistical Concepts and Methods

NDC can be used to distinguish normal from abnormal measurements.

Example: we have an NDC for hemoglobin levels for a population of normal adult males with mean ± SD = 11 ± 1.5.

We obtain a hemoglobin reading for an individual = 8.1 and we want to know if he is normal or anemic.

If this reading lies within the area under the curve at 95% of normal (i.e. mean ± 2 SD), he will be considered normal. If his reading is lower, then he is anemic.

Page 52: Basic Statistical Concepts and Methods

The normal range for hemoglobin in this example will be:

The higher level of hemoglobin: 11 + 2 (1.5) = 14.

The lower level of hemoglobin: 11 – 2 (1.5) = 8.

i.e. the normal range of hemoglobin of adult males is from 8 to 14.

Our sample (8.1) lies within this range, therefore this individual is normal because his reading lies within the 95% of his population.
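The same normal range check can be written as a small Python sketch. The function names are hypothetical, and the numbers are the ones used in this example (mean 11, SD 1.5, reading 8.1).

```python
def normal_range(mean, sd, k=2):
    """Return the (lower, upper) limits of mean ± k SD."""
    return mean - k * sd, mean + k * sd


def is_within_normal_range(reading, mean, sd, k=2):
    """True if the reading lies inside mean ± k SD (limits included)."""
    lower, upper = normal_range(mean, sd, k)
    return lower <= reading <= upper


lower, upper = normal_range(mean=11, sd=1.5)
print(f"Normal range: {lower} to {upper}")
print("Reading 8.1 is normal:", is_within_normal_range(8.1, mean=11, sd=1.5))
```

Counting a reading exactly on the limit as normal is a choice made for this sketch; the slides do not say how a borderline value should be treated.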

Page 53: Basic Statistical Concepts and Methods

Data Summarization

To summarize data, we need one or two parameters that can describe the data:

1. Measures of central tendency, which describe the center of the data

2. Measures of dispersion, which show how the data are scattered around the center.

Page 54: Basic Statistical Concepts and Methods

Measures of central tendency

A variable usually has a point (center) around which the observed values lie. Such averages are called measures of central tendency. The three most commonly used averages are:

1. The arithmetic mean:

2. The Median

3. The Mode

Page 55: Basic Statistical Concepts and Methods

1- The arithmetic mean:

The sum of the observations divided by the number of observations:

$\bar{x} = \dfrac{\sum x}{n}$

Where:
$\bar{x}$ = the mean
$\sum$ denotes "the sum of"
$x$ = the values of the observations
$n$ = the number of observations

Page 56: Basic Statistical Concepts and Methods

1- The arithmetic mean:

Example: In a study the ages of 5 students were: 12, 15, 10, 17, 13.

Mean = sum of observations / number of observations

Then the mean $\bar{x}$ = (12 + 15 + 10 + 17 + 13) / 5 = 67 / 5 = 13.4 years

Page 57: Basic Statistical Concepts and Methods

Calculation of Mean For frequency Distribution Data

In the case of frequency distribution data we calculate the mean by this equation:

$\bar{x} = \dfrac{\sum f x}{n}$, where $f$ = frequency.

For example: we want to calculate the mean incubation period of this group.

Page 58: Basic Statistical Concepts and Methods


Calculation of Mean For frequency Distribution Data

Page 59: Basic Statistical Concepts and Methods

Calculation of Mean For frequency Distribution Data with class intervals

If the data are presented in a frequency table with class intervals, we calculate the mean by the same equation, $\bar{x} = \dfrac{\sum f x_1}{n}$, where $x_1$ denotes the midpoint of the class interval.

Example: calculate the mean blood pressure of the following group:
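The table for this example is not reproduced in the transcript. As a minimal sketch of the midpoint method only, the code below uses assumed classes (the tally of the 30 systolic blood pressure readings from Example (3) with a class width of 10 mmHg); both the class limits and the function name `grouped_mean` are illustrative assumptions, not the slide's own table.

```python
# (lower limit, upper limit, frequency) for each class.
# Assumed classes for illustration, not the table from the slide.
classes = [
    (150, 160, 6),
    (160, 170, 6),
    (170, 180, 8),
    (180, 190, 6),
    (190, 200, 4),
]


def grouped_mean(classes):
    """Mean of grouped data: sum of (frequency x class midpoint) / n."""
    n = sum(frequency for _, _, frequency in classes)
    total = sum(frequency * (lower + upper) / 2
                for lower, upper, frequency in classes)
    return total / n


print(f"Mean blood pressure = {grouped_mean(classes):.1f} mmHg")
```

Because every observation in a class is replaced by the class midpoint, the grouped mean is an approximation of the mean of the raw readings.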

Page 60: Basic Statistical Concepts and Methods


Page 61: Basic Statistical Concepts and Methods


Page 62: Basic Statistical Concepts and Methods

2- Median

It is the middle observation in a series of observations after arranging them in ascending or descending order.

The rank of the median is (n + 1)/2 if the number of observations is odd. If the number is even, the median is the mean of the two middle observations, ranked n/2 and (n/2) + 1.

Page 63: Basic Statistical Concepts and Methods

2- Median

Calculate the median of the following data: 5, 6, 8, 9, 11. n = 5 (odd).

- The rank of the median = (n + 1) / 2, i.e. (5 + 1) / 2 = 3.

The median is the third value in this group when the data are arranged in ascending (or descending) order.

- So the median is 8 (the third value).

Page 64: Basic Statistical Concepts and Methods

2- Median

- If the number of observations is even, the median is calculated as follows, e.g. 5, 6, 8, 9; n = 4.

- The rank of the median = n / 2, i.e. 4 / 2 = 2. The second value is 6 if the data are arranged in ascending order and 8 if arranged in descending order, therefore the median is the mean of both observations, i.e. (6 + 8) / 2 = 7.

Page 65: Basic Statistical Concepts and Methods

2- Median

For simplicity we can apply the same equation used for odd numbers, i.e. (n + 1) / 2. The median rank will be (4 + 1) / 2 = 2.5, i.e. the median lies between the second and third values, 6 and 8; take their mean = 7.
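The odd and even cases can be combined in one short function. This is a sketch for illustration (Python's standard library also offers `statistics.median`); the function name `median` is simply chosen to match the slides.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: rank (n + 1) / 2, which is index n // 2 when counting from 0.
        return ordered[middle]
    # Even n: mean of the values ranked n / 2 and n / 2 + 1.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([5, 6, 8, 9, 11]))  # odd number of observations
print(median([5, 6, 8, 9]))      # even number of observations
```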

Page 66: Basic Statistical Concepts and Methods

3- Mode

The most frequently occurring value in the data is the mode.

Example: 5, 6, 7, 5, 10. The mode in these data is 5 since the number 5 is repeated twice. Sometimes there is more than one mode and sometimes there is no mode, especially in a small set of observations.

Page 67: Basic Statistical Concepts and Methods

3- Mode

Example: 20, 18, 14, 20, 13, 14, 30, 19. There are two modes, 14 and 20.

Example: 300, 280, 130, 125, 240, 270. There is no mode.

Data may be unimodal, bimodal or have no mode.
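The three situations (one mode, two modes, no mode) can be sketched as follows. The function name `modes` and the convention of returning an empty list when no value repeats are assumptions made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([5, 6, 7, 5, 10]))                   # one mode
print(modes([20, 18, 14, 20, 13, 14, 30, 19]))   # two modes
print(modes([300, 280, 130, 125, 240, 270]))     # no mode
```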

Page 68: Basic Statistical Concepts and Methods

Advantages and disadvantages of the measures of central tendency:

- Mean: the preferred measure of central tendency since it takes into account each individual observation, but its main disadvantage is that it is affected by extreme values.

Page 69: Basic Statistical Concepts and Methods

Advantages and disadvantages of the measures of central tendency:

- Median: a useful descriptive measure if there are one or two extremely high or low values.

- Mode: seldom used.

Page 70: Basic Statistical Concepts and Methods

Measures of Dispersion

A measure of dispersion describes the degree of variation, scatter or dispersion of the data around its central value (dispersion = variation = spread = scatter).

1. Range (R)
2. Variance (V)
3. Standard Deviation (SD)
4. Coefficient of Variation (CoV)

Page 71: Basic Statistical Concepts and Methods

1- Range:

The range is the difference between the largest and smallest values. It is the simplest measure of variation.

Disadvantages: it is based on only two of the observations and gives no idea of how the other observations are arranged between these two.

Also, it tends to be larger when the size of the sample increases.

Page 72: Basic Statistical Concepts and Methods

2- Variance

If we want the average of the differences between the mean and each observation in the data, we have to subtract each value from the mean, then sum these differences and divide by the number of observations:

$V = \dfrac{\sum (\bar{x} - x)}{n}$

Page 73: Basic Statistical Concepts and Methods

2- Variance

$V = \dfrac{\sum (\bar{x} - x)}{n}$

The value of this equation will be equal to zero, because the differences between each value and the mean have negative and positive signs that cancel out to zero on algebraic summation.

Page 74: Basic Statistical Concepts and Methods

2- Variance

To overcome this zero we square the difference between the mean and each value so the sign will always be positive. For a sample, the sum is divided by n - 1 rather than n. Thus we get:

$V = \dfrac{\sum (\bar{x} - x)^2}{n - 1}$

Page 75: Basic Statistical Concepts and Methods

3- Standard Deviation SD

The main disadvantage of the variance is that it is in the square of the units used. So it is more convenient to express the variation in the original units by taking the square root of the variance. This is called the standard deviation (SD). Therefore $SD = \sqrt{V}$,

i.e. $SD = \sqrt{\dfrac{\sum (\bar{x} - x)^2}{n - 1}}$
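The reasoning from signed deviations to variance and SD can be traced step by step in code. This is an illustrative sketch; the observations 5, 7, 10, 12 and 16 are taken from the coefficient of variation example in this deck, and the variable names are assumptions.

```python
import math

observations = [5, 7, 10, 12, 16]

n = len(observations)
mean = sum(observations) / n

# Signed deviations cancel out, so their sum (and their average) is zero.
sum_of_deviations = sum(mean - x for x in observations)

# Squaring removes the sign; dividing by n - 1 gives the sample variance.
sum_of_squares = sum((mean - x) ** 2 for x in observations)
variance = sum_of_squares / (n - 1)

# The square root brings the result back to the original units.
sd = math.sqrt(variance)

print(f"mean = {mean}")
print(f"sum of (mean - x) = {sum_of_deviations}")
print(f"sum of (mean - x)^2 = {sum_of_squares}")
print(f"variance = {variance}")
print(f"SD = {sd:.1f}")
```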

Page 76: Basic Statistical Concepts and Methods

4- Coefficient of variation CoV

The coefficient of variation expresses the standard deviation as a percentage of the sample mean.

C.V. = SD / mean x 100

C.V. is useful when we are interested in the relative size of the variability in the data.

Example: if we have the observations 5, 7, 10, 12 and 16, their mean will be 50 / 5 = 10.

SD = √ ((25 + 9 + 0 + 4 + 36) / (5 - 1)) = √ (74 / 4) = 4.3

C.V. = 4.3 / 10 x 100 = 43%

Page 77: Basic Statistical Concepts and Methods

Example

Calculate the mean, variance, SD and C.V. from the following measurements: 5, 7, 10, 12 and 16.

Mean = (5 + 7 + 10 + 12 + 16) / 5 = 10.

Variance = (25 + 9 + 0 + 4 + 36) / (5 - 1) = 74 / 4 = 18.5

SD = √ 18.5 = 4.3

C.V. = 4.3 / 10 x 100 = 43%

Page 78: Basic Statistical Concepts and Methods

Example

Another set of observations is 2, 2, 5, 10 and 11. Their mean = 30 / 5 = 6.

SD = √ ((16 + 16 + 1 + 16 + 25) / (5 – 1)) = √ (74 / 4) = 4.3

C.V. = 4.3 / 6 x 100 = 71.7%

Both sets of observations have the same SD but differ in C.V., because the data in the first group are more homogeneous relative to their mean (so the C.V. is lower), while the data in the second group are more heterogeneous (so the C.V. is higher).
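The comparison of the two sets of observations can be reproduced with Python's standard `statistics` module, whose `stdev` uses the n - 1 denominator. The helper name `coefficient_of_variation` is an assumption for this sketch.

```python
import statistics


def coefficient_of_variation(values):
    """Sample SD (n - 1 denominator) as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


first = [5, 7, 10, 12, 16]
second = [2, 2, 5, 10, 11]

for name, data in (("first", first), ("second", second)):
    print(f"{name}: mean = {statistics.mean(data)}, "
          f"SD = {statistics.stdev(data):.1f}, "
          f"C.V. = {coefficient_of_variation(data):.1f}%")
```

Both sets share the same sum of squared deviations (74), which is why their SDs match while their C.V. values differ.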

Page 79: Basic Statistical Concepts and Methods

Example

In a study where age was recorded, the observed values were: 6, 8, 9, 7, 6 (5 observations). Calculate the mean, SD, range, mode and median.

- The mean = sum of observations / their number = (6 + 8 + 9 + 7 + 6) / 5 = 36 / 5 = 7.2

Page 80: Basic Statistical Concepts and Methods

Examples

The variance = sum of the squared differences (mean minus observation) / (number of observations – 1):

[(7.2 – 6)² + (7.2 – 8)² + (7.2 – 9)² + (7.2 – 7)² + (7.2 – 6)²] / (5 – 1)

= [(1.2)² + (–0.8)² + (–1.8)² + (0.2)² + (1.2)²] / 4 = 6.8 / 4 = 1.7

- So the variance = 1.7

Page 81: Basic Statistical Concepts and Methods

Examples

- The S.D. = √ 1.7 = 1.3

- Range = 9 – 6 = 3

- The mode is 6

- The median: first we have to arrange the data in ascending order, i.e. 6, 6, 7, 8, 9.

The rank of the median = (n + 1) / 2, i.e. (5 + 1) / 2 = 3, therefore the median is the third value, i.e. median = 7
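The whole worked example for the ages 6, 8, 9, 7, 6 can be checked in a few lines with Python's standard `statistics` module. This is only a verification sketch; the dictionary name `summary` is an assumption.

```python
import statistics

ages = [6, 8, 9, 7, 6]

summary = {
    "mean": statistics.mean(ages),
    "variance": statistics.variance(ages),  # divides by n - 1
    "SD": statistics.stdev(ages),
    "range": max(ages) - min(ages),
    "mode": statistics.mode(ages),
    "median": statistics.median(ages),
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```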

Page 82: Basic Statistical Concepts and Methods


Inferential statistics

Inference involves making a Generalization about a larger group of individuals on the basis of a subset or sample.

Page 83: Basic Statistical Concepts and Methods

Inferential statistics: Hypothesis Testing

In hypothesis testing we want to find out whether the observed variation among samples is explained by chance alone (i.e. random sampling variation) or is due to a real difference between groups.

Page 84: Basic Statistical Concepts and Methods

Hypothesis Testing

It involves conducting a test of statistical significance, quantifying the chance that random sampling variation may account for the observed results. In hypothesis testing, we are asking whether the sample mean, for example, is consistent with a certain hypothesized value for the population mean.

Page 85: Basic Statistical Concepts and Methods

Hypothesis Testing

The method of assessing the hypothesis is known as the significance test.

Significance testing is a method for assessing whether a result is likely to be due to chance or due to a real effect.

Page 86: Basic Statistical Concepts and Methods

Hypothesis Testing –Steps

>>> Formulate Hypothesis

>>> Collect the Data

>>> Test Your Hypothesis

>>> Accept or Reject Your Hypothesis

Page 87: Basic Statistical Concepts and Methods

Null and alternative hypotheses

In hypothesis testing, specific hypotheses (the null and alternative hypotheses) are formulated and tested.

The null hypothesis H0 means: X1 = X2, or X1 – X2 = 0, i.e. there is no difference between X1 and X2.

The alternative hypothesis H1 means: X1 > X2 or X1 < X2.

Page 88: Basic Statistical Concepts and Methods

Null and alternative hypotheses

The null hypothesis means that there is no difference between X1 and X2. If we reject the null hypothesis, i.e. there is a difference between the two readings, the alternative is either H1: X1 < X2 or H2: X1 > X2. In other words, the null hypothesis is rejected because X1 is different from X2.

Page 89: Basic Statistical Concepts and Methods

General principles of significance tests

1. Set up a null hypothesis and its alternative.

2. Find the value of the test statistic.

3. Refer the value of the test statistic to a known distribution which it would follow if the null hypothesis were true.

Page 90: Basic Statistical Concepts and Methods

General principles of significance tests

4. Conclude that the data are consistent or inconsistent with the null hypothesis.

If the data are not consistent with the null hypothesis, the difference is said to be statistically significant. If the data are consistent with the null hypothesis, we accept it, i.e. the difference is statistically insignificant.

Page 91: Basic Statistical Concepts and Methods

General principles of significance tests P<0.05

In medicine, we usually consider that differences are significant if the probability (P) is less than 0.05. This means that if the null hypothesis is true, we shall make a wrong decision fewer than 5 times in a hundred.

Page 92: Basic Statistical Concepts and Methods

Tests of significance

The selection of the test of significance depends essentially on the type of data that we have.

1- Quantitative data (means and SD): t test, paired t test and ANOVA

2- Qualitative data: Chi-square test and Z test.

Page 93: Basic Statistical Concepts and Methods

Tests of significance

Comparison of means:

1- Comparing two means of large samples using the normal distribution (z test or SND, standard normal deviate):

If we have a large sample size, i.e. 60 or more, and it follows a normal distribution, then we have to use the z-test.

z = (population mean - sample mean) / SD, where for a sample mean the SD in the denominator is the standard error of the mean (SD / √n). If the absolute value of z is greater than 2, then there is a significant difference.

Page 94: Basic Statistical Concepts and Methods

Tests of significance

This is because the normal range for any biological reading lies between the mean value of the population ± 2 SD (this range includes 95% of the area under the normal distribution curve).
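A minimal sketch of the z calculation for a sample mean is given below. It takes the denominator to be the standard error of the mean, SD / √n, which is an assumption about how the SD in the slide's formula is meant when a sample mean is compared with a population mean. The function name and all the numbers are hypothetical, chosen only to show the arithmetic.

```python
import math


def z_statistic(sample_mean, population_mean, sd, n):
    """z = (sample mean - population mean) / (SD / sqrt(n))."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error


# Hypothetical figures: 100 individuals with a mean of 11.6,
# compared with a population mean of 11 and SD of 1.5.
z = z_statistic(sample_mean=11.6, population_mean=11, sd=1.5, n=100)
print(f"z = {z:.2f}")
print("Significant at the |z| > 2 rule of thumb:", abs(z) > 2)
```

The sign of z only shows the direction of the difference, so the comparison with 2 is made on its absolute value.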

Page 95: Basic Statistical Concepts and Methods

Student’s t-test

2-Comparing two means of small samples using t-test:

If we have a small sample size (less than 60), we can use the t distribution instead of the normal distribution.

t = (mean1 - mean2) / √(SD1²/n1 + SD2²/n2)
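A minimal sketch of the t calculation from summary statistics follows. It assumes the denominator of the formula is the square root of SD1²/n1 + SD2²/n2 (the standard error of the difference between the two means). The function name and the numbers are hypothetical.

```python
import math


def two_sample_t(mean1: float, sd1: float, n1: int,
                 mean2: float, sd2: float, n2: int) -> float:
    """t = (mean1 - mean2) / sqrt(SD1^2/n1 + SD2^2/n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


if __name__ == "__main__":
    # Hypothetical groups: mean 12, SD 3, n = 9 versus mean 8, SD 4, n = 16.
    t = two_sample_t(12, 3, 9, 8, 4, 16)
    print(f"t = {t:.3f}")
```

The resulting t is then referred to the t distribution table at the appropriate degrees of freedom.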

Page 96: Basic Statistical Concepts and Methods

The calculated value of t is compared with the value in the t distribution table at the corresponding degrees of freedom. If the calculated t is less than the tabulated value, the difference between the samples is insignificant.

If the calculated t is larger than the tabulated value, the difference is significant, i.e. the null hypothesis is rejected.

t-test

Page 98: Basic Statistical Concepts and Methods

3-paired t-test:

If we are comparing repeated observations on the same individual, or differences between paired data, we use the paired t-test, in which the analysis is carried out using the mean and standard deviation of the differences between the pairs.

Paired t-test
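The paired analysis described above can be sketched as follows: take the difference within each pair, then divide the mean difference by its standard error (SD of the differences / √n). The slides do not spell this formula out; it is the usual paired t statistic, and the data below are hypothetical.

```python
import math
import statistics


def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """Return (t, degrees of freedom) for paired observations."""
    if len(before) != len(after):
        raise ValueError("paired data need the same number of observations")
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)  # sample SD (divides by n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical readings on the same four individuals before and after treatment.
    before = [140, 150, 135, 160]
    after = [130, 145, 130, 150]
    t, df = paired_t(before, after)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```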

Page 99: Basic Statistical Concepts and Methods

4-comparing several means:

Sometimes we need to compare more than two means. This can be done with several t-tests, but that is not only tedious, it can also lead to spurious significant results. Therefore we use what is called analysis of variance, or ANOVA.

ANOVA

Page 100: Basic Statistical Concepts and Methods

4-comparing several means:

There are two main types: one-way analysis of variance and two-way analysis of variance. One-way analysis of variance is appropriate when the subgroups to be compared are defined by just one factor, for example a comparison between the means of different socio-economic classes. Two-way analysis of variance is used when the subdivision is based upon two factors.

ANOVA

Page 101: Basic Statistical Concepts and Methods

The main idea in the analysis of variance is that we take into account the variability both within the groups and between the groups. The value of F is the ratio of the between-groups mean square (MS) to the within-groups mean square.

F = between-groups MS / within-groups MS

ANOVA
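The F ratio for a one-way analysis of variance can be sketched in plain Python as below. The mean squares are computed in the usual way (sum of squares divided by its degrees of freedom); the group data and the function name are hypothetical.

```python
def one_way_anova_f(groups: list[list[float]]) -> float:
    """F = between-groups mean square / within-groups mean square."""
    k = len(groups)                           # number of groups
    n_total = sum(len(g) for g in groups)     # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variability of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


if __name__ == "__main__":
    # Three hypothetical groups, e.g. three socio-economic classes.
    groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    print(f"F = {one_way_anova_f(groups):.2f}")
```

The F value is then referred to the F distribution table with (k - 1) and (N - k) degrees of freedom.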

Page 102: Basic Statistical Concepts and Methods

b-Qualitative variables:

1)Chi-squared test:

Qualitative data are arranged in a table formed of rows and columns. The categories of one variable define the rows and the categories of the other variable define the columns.

Chi-Squared Test

Page 103: Basic Statistical Concepts and Methods

A chi-squared test is used to test whether there is an association between the row variable and the column variable or, in other words whether the distribution of individuals among the categories of one variable is independent of their distribution among the categories of the other.

X² = Σ (O - E)² / E, summed over all cells of the table

Chi-Squared Test

Page 104: Basic Statistical Concepts and Methods

1)Chi-squared test:

degrees of freedom = (number of rows - 1) x (number of columns - 1)

O = observed value in the table

E = expected value, calculated as follows:

E = (Rt x Ct) / GT = (row total x column total) / grand total

Chi-Squared Test
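The expected values and the chi-squared statistic can be computed cell by cell, as in the sketch below. It sums (O - E)² / E over all cells and applies no continuity correction. The 2 x 2 table is hypothetical and is not the table used in the slides' example.

```python
def chi_squared(table: list[list[float]]) -> tuple[float, int]:
    """Return (chi-squared statistic, degrees of freedom) for a rows x columns table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # E = row total x column total / grand total
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


if __name__ == "__main__":
    # Hypothetical observed counts.
    observed = [[10, 20],
                [20, 10]]
    statistic, df = chi_squared(observed)
    print(f"chi-squared = {statistic:.2f}, d.f. = {df}")
```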

Page 105: Basic Statistical Concepts and Methods

Page 106: Basic Statistical Concepts and Methods

From the table of X² values, with degrees of freedom = (3 rows - 1) x (3 columns - 1) = 2 x 2 = 4, the critical value at the 0.05 level of significance is 9.49. Therefore we conclude that there is a significant relation between socioeconomic level and the degree of intelligence (because the calculated value of X² is greater than the tabulated value).

Chi-Squared Test

Page 107: Basic Statistical Concepts and Methods

2) Z test for comparing two percentages:

z = (p1 - p2) / √(p1q1/n1 + p2q2/n2)

where p1 = percentage in the 1st group, p2 = percentage in the 2nd group, q1 = 100 - p1, q2 = 100 - p2, n1 = sample size of group 1 and n2 = sample size of group 2.

The Z test is significant (at the 0.05 level) if the absolute value of the result is greater than 2.

Z Test

Page 108: Basic Statistical Concepts and Methods

Example: group 1 includes 50 patients, of whom 5 are anemic, and group 2 contains 60 patients, of whom 20 are anemic. To find out whether groups 1 and 2 differ statistically in the prevalence of anemia, we calculate the z test.

p1 = 5/50 = 10%, p2 = 20/60 = 33%, q1 = 100 - 10 = 90, q2 = 100 - 33 = 67

Z Test

Page 109: Basic Statistical Concepts and Methods

z = (10 - 33) / √(10x90/50 + 33x67/60)

z = -23 / √(18 + 36.85) = -23 / 7.4 = -3.1

Therefore there is a statistically significant difference between the percentages of anemia in the studied groups (because the absolute value of z, 3.1, is greater than 2).

Z Test
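The anemia example can be checked with a few lines of code. The sketch below uses the percentages exactly as rounded on the slides (10% and 33%); the function name is an assumption made for illustration.

```python
import math


def z_two_percentages(p1: float, n1: int, p2: float, n2: int) -> float:
    """z = (p1 - p2) / sqrt(p1*q1/n1 + p2*q2/n2), with p and q in percent."""
    q1 = 100 - p1
    q2 = 100 - p2
    standard_error = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    return (p1 - p2) / standard_error


if __name__ == "__main__":
    # Group 1: 5 of 50 anemic (10%); group 2: 20 of 60 anemic (about 33%).
    z = z_two_percentages(10, 50, 33, 60)
    print(f"z = {z:.1f}, significant = {abs(z) > 2}")
```

The sign of z only reflects which group is subtracted from which; the comparison with 2 uses its absolute value.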

Page 110: Basic Statistical Concepts and Methods

c-Correlation and regression:

Correlation measures the closeness of the association between two continuous variables, while linear regression gives the equation of the straight line that best describes and enables the prediction of one variable from the other.

Correlation & regression

Page 111: Basic Statistical Concepts and Methods

1-Correlation:

In correlation, the closeness of the association is measured by the correlation coefficient, r. The value of r ranges between +1 and -1. A value of +1 or -1 means perfect correlation, while 0 means no correlation. An r value near zero means weak correlation, while a value near +1 or -1 means strong correlation. The sign (+ or -) denotes the direction of the correlation:

Correlation & regression

Page 112: Basic Statistical Concepts and Methods

1-Correlation:

A +ve correlation means that when one variable increases the other also increases, while a -ve correlation means that when one variable increases the other decreases.

Correlation
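The slides do not give a formula for r. A minimal sketch, assuming the usual Pearson product-moment coefficient is meant, is shown below with hypothetical data that produce a perfect +ve and a perfect -ve correlation.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two continuous variables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    print(pearson_r(x, [2, 4, 6, 8]))   # y rises with x: r close to +1
    print(pearson_r(x, [8, 6, 4, 2]))   # y falls as x rises: r close to -1
```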

Page 113: Basic Statistical Concepts and Methods

2- Linear regression:

Like correlation, linear regression is used to determine the relation between two variables, and it also predicts the change in one variable due to changes in the other. For linear regression, the independent variable has to be distinguished from the dependent variable.

Linear regression

Page 114: Basic Statistical Concepts and Methods

2- Linear regression:

Linear regression not only allows assessment of the presence of an association between the independent and dependent variables, but also allows prediction of the dependent variable for a particular value of the independent variable. However, regression should not be used for prediction outside the range of the original data. A t-test is also used to assess the level of significance. The dependent variable in linear regression must be continuous.

Linear regression
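A minimal sketch of fitting the straight line by ordinary least squares and using it for prediction is given below. The slides do not state the fitting method, so least squares is an assumption. The function names and the data are hypothetical, and the prediction helper refuses values outside the range of the original data, in line with the caution above.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept of y (dependent) on x (independent)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_new: float, slope: float, intercept: float,
            x_min: float, x_max: float) -> float:
    """Predict y for x_new, but only inside the observed range of x."""
    if not x_min <= x_new <= x_max:
        raise ValueError("x_new lies outside the range of the original data")
    return intercept + slope * x_new


if __name__ == "__main__":
    x = [1, 2, 3, 4]          # hypothetical independent variable
    y = [3, 5, 7, 9]          # hypothetical dependent variable
    slope, intercept = fit_line(x, y)
    print(f"y = {intercept:.1f} + {slope:.1f} x")
    print(predict(2.5, slope, intercept, min(x), max(x)))
```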

Page 115: Basic Statistical Concepts and Methods

[Figure: scatter plot of the correlation between Doppler velocimetry (RI) and baby birth weight. x-axis: baby weight in kg (1.5 to 4.5); y-axis: RI (0 to 1).]

Page 116: Basic Statistical Concepts and Methods

3-Multiple regression:

Situations frequently occur in which we are interested in the dependence of a dependent variable on several independent variables, not just one. The test of significance used is the analysis of variance (F test).

Multiple regression

Page 117: Basic Statistical Concepts and Methods

1. How do you select a representative sample of 100 students from a primary school? Use all possible methods of sample selection.

2. How to select a primary school from a rural area and another school from an urban area in Egypt?

Page 118: Basic Statistical Concepts and Methods

What type of sample is each of the following?

1. Lottery to select a winner

2. Hospitalized patients with SLE

3. Every 6th patient coming to an outpatient clinic

4. Random 20 females and 20 males out of a group of 100 persons

5. All workers in a factory chosen from all factories in a certain governorate

Page 119: Basic Statistical Concepts and Methods

Present the following data by a suitable table & graph

Infant mortality rates in 2006 in some countries were as follows: Egypt = 25/1000, USA = 10/1000, Sweden = 12/1000 and Pakistan = 30/1000.

Page 120: Basic Statistical Concepts and Methods

Present the following data by a suitable table & graph

The body weights (kg) of a group of male children were as follows:

12, 22, 18, 17, 28, 20, 16, 21, 19, 16, 27, 21 kg

and those of a group of female children were as follows:

16, 23, 19, 29, 18, 22, 17, 15, 21, 21, 24 kg

Page 121: Basic Statistical Concepts and Methods

The weight (Kg ) of a pregnant

Page 122: Basic Statistical Concepts and Methods
