introduction, cont. ( script # 2 )


Upload: justden09

Post on 06-Apr-2018


8/3/2019 Introduction, Cont. ( script # 2 )

http://slidepdf.com/reader/full/introduction-cont-script-2- 1/11


BIOSTATISTICS

Wednesday >>> 15th Feb. 2012

Dr. Atallah Z. Rabi

Slide (19): The data types

As we have discussed before, data are numbers about whatever we are interested in, such as weight, blood pressure, the number of students, or the students’ grades. So the data we collect are numbers.

If I want to know your weight, either I will ask you or I will weigh you; numbers collected this way are called weight data. Data are divided into two types: continuous data and categorical (or discrete) data. Categorical means that we have groups, or categories. For example, male or female is a category: a person belongs either to the male category or to the female category, but not to both.

Another example is the human blood group (the ABO system); the state of health, healthy or sick, is another.

As you notice, discrete data are usually mutually exclusive: if a person belongs to one category, he belongs to that category only. A person can’t be male and female at the same time, and can’t have blood group A and blood group B at the same time.

So if a person is in one category, he doesn’t belong to any other category, and the category counts add up to the total of the population.

Let’s say that I have 100 people: 30 of them belong to blood group A, 18 belong to blood group B, and 32 belong to blood group AB. A smart person like you will automatically know how many people have blood group O. There are 20, because the categories are mutually exclusive: if 80 people out of 100 have A, B or AB, the remaining 20 must be O.

So this is what we mean by mutually exclusive. The categories also usually cover everybody: all of the population is included, and we have no exclusions.
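The blood group arithmetic can be checked with a short Python sketch. The counts are the ones from the example; the names `known_counts` and `remaining_count` are illustrative and not part of the lecture.

```python
# Counts from the lecture's example; group O is deliberately left out.
TOTAL_PEOPLE = 100
known_counts = {"A": 30, "B": 18, "AB": 32}


def remaining_count(total, counts):
    """Return how many individuals fall in the one category not listed.

    Valid only when the categories are mutually exclusive and together
    cover the whole population.
    """
    return total - sum(counts.values())


group_o = remaining_count(TOTAL_PEOPLE, known_counts)
print(group_o)  # 20
```

The subtraction only works because nobody is counted twice and nobody is left out.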


So this is what we call categorical, or discrete, data. There is no continuity in the numbers: we have 18 individuals with blood group B, not 18.3. Continuous data, however, are what we usually measure or weigh, so the number is not discrete; it is continuous.

If we ask someone, “What is your weight?”, the answer might be 67 Kg. But on a measuring scale we may get 67.3 Kg, and on a more precise scale 67.35 Kg. The value can always be refined further, and this continuity is what characterizes continuous data.

Continuous data are used to record measurements of individuals that can take any value within an acceptable range. What do we mean by “within an acceptable range”?

It means I couldn’t say that the weight of a student like you is 1000 Kg, since it is known to be less than 100, and we couldn’t say that his weight is 15 Kg. The acceptable range is specific to the measurement that we are concerned with.

Another example is the range of your secondary school certificate grades: it is between 85 and 100, so no one would have 105 or 65. The acceptable range varies from one source of data to another. Take the pulse rate, for example: the acceptable range for individuals like you is usually between 60 and 80, and for the doctor it will be between 80 and 100.

And they say in medicine that the total number of your heart beats is fixed, so the faster your heart beats, the closer you are to the end of your life.

Most of the time, the data we collect describe the people we are concerned with: their weights, grades, pulse rate, blood pressure, blood group, income, or whatever we want. So the data give a good description of the population we are concerned with.

And this is why the relevant statistics are called descriptive statistics: they describe the population we are concerned with. Take the JUST students, for example. If we are concerned with their distribution across faculties, we have the medicine faculty, engineering faculty, agriculture faculty, nursing faculty, etc.

The number of students in each faculty describes the distribution of the students according to faculty.

We may also have other descriptions, for example the distribution according to gender (male or female), or according to year: 1st year, 2nd year, 3rd year, 4th year, and in the medical and dental schools 5th year and 6th year.

So this is what we call descriptive statistics.

Slide (20): First look at the data

Our goal is to show you how to get a first look at the data and get ready to do more elaborate procedures. Descriptive statistics are a numerical summary of the data, and they should be clear and easily interpreted.

What do we mean by clear? When I talk about descriptive statistics, I should state what I want to describe. Do I want to describe the students in JUST according to their year level? According to their nationality (they say that we have 57 nationalities in JUST)? According to their blood groups? And so on.

So we have to be clear, and the basis of the description should be stated. The categorization of students should follow that stated basis, and it should cover all cases.

When I say according to the year level, we have 1st, 2nd, 3rd, 4th, 5th and 6th year, and no more than that.

According to degree level, we have either the bachelor degree or the graduate degree. We shouldn’t need any other categories, and nobody should need to ask a question like: what about those in the 7th year or 9th year? Do we have people like them? If we do, then we should include them in our descriptive statistics.

Slide (21): Measures of central tendency

Now in the data we have what we call measures of central tendency. What do we mean by central tendency? It means that most of the numbers cluster around a certain value. For example, when the doctor gives the M391 test (which is our course, by the way), one of the students will ask: what is the students’ average?

So the average is one of the measures of central tendency.

Descriptive data, or any data about a group except for categories, tend towards the center, and this is what we call central tendency. We have to be careful with measures of central tendency, the mean for example.

The Mean is the arithmetic average. For example, if 3 people were in hospital for 8, 10 and 30 days respectively, what is the mean time? I have to add all of them and divide by the number of them, so the average is 16 days (48/3).
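As a minimal sketch, the same calculation in Python; the function name `mean` and the variable `hospital_days` are chosen here for illustration only.

```python
def mean(values):
    """Arithmetic average: sum of the values divided by how many there are."""
    return sum(values) / len(values)


hospital_days = [8, 10, 30]
print(mean(hospital_days))  # 16.0
```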

We have to be careful with the mean, because sometimes it is misleading. For example, a 6-year-old girl wants to have a birthday party, and she invites friends of her own age. Then one of her friends brings her grandmother with her, and the grandmother’s age is 90. We have 6 friends and our girl (6+1=7), each aged 6 years, so (6*7=42); adding the grandmother’s age gives (42+90=132). Now we divide 132 by the number of people at the party, which is 8, and this makes the average age of those at the birthday party 16.5 years.

It is misleading to say that the average age of those at the birthday party is about 16 years, so we have to be careful about this.

And this is why we have another measure of central tendency, which is called the median. The Median is the value at which 50% of the numbers or measurements are higher and 50% of them are lower, so it divides the data into two groups: higher and lower. When we calculate the median, what is the first step?

We should arrange the data into ascending or descending order.

The number of data values can be odd or even.

An odd number of values would be 17, 23, 29, 31, etc., and in that case the median is the value in the middle. Let’s say we have 31 data values; what is the median after arranging them in ascending or descending order?

It will be value number 16, because there are 15 values above it and another 15 values below it. So this is the odd case.

However, if we have an even number of values, 32 for example, then after arranging the data we won’t have one number in the middle; we will have two. We add them and divide by 2, and the result is the median.

So the first step is to arrange the data into ascending or descending order; then we look at the middle, or median, value.

As a result, if the number of data values is odd the median will be one number, and if it is even we will have two middle numbers, which we add together and then divide the sum by 2.

There are equations that we usually use. If the number is odd, like 31, we use {rank = (n+1)/2}, where (n) is the number of data values, here 31, so {(31+1)/2} equals 16; this is why we say (n+1). In the case of an even number of values, the median lies between the two middle values, at ranks n/2 and (n/2)+1.
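The rank rule for odd and even counts can be sketched in Python as follows. This is one straightforward way to write it, not code from the lecture, and the function name `median` is illustrative.

```python
def median(values):
    """Median via the rank rule: sort first, then take the middle."""
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        rank = (n + 1) // 2           # e.g. n = 31 gives rank 16
        return ordered[rank - 1]      # ranks start at 1, list indexes at 0
    lower = ordered[n // 2 - 1]       # rank n/2
    upper = ordered[n // 2]           # rank n/2 + 1
    return (lower + upper) / 2        # average of the two middle values


print(median(range(1, 32)))  # 31 values: 16
print(median(range(1, 33)))  # 32 values: 16.5
```

With the values 1 to 31 the median is the 16th value; with 1 to 32 it is the average of the 16th and 17th values.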

So the mean is the arithmetic average, and the median is the value that divides the data we have into two equal parts: 50% above it and 50% below it.

And the Mode is the most common value; we don’t usually use it. For example, at our birthday party the ages are 6, 6, 6, 6, 6, 6, 6 and 90, so the mode is 6 and the median is also 6, but the mean is 16.5.
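The three measures for the birthday party can be compared with Python's standard `statistics` module. The variable name `party_ages` is illustrative; the ages are those from the example.

```python
import statistics

party_ages = [6, 6, 6, 6, 6, 6, 6, 90]  # seven children and one grandmother

print(statistics.mean(party_ages))    # 16.5
print(statistics.median(party_ages))  # 6.0
print(statistics.mode(party_ages))    # 6
```

The single value of 90 pulls the mean up to 132/8 = 16.5, while the median and the mode stay at the age of the children.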

Slide (22): Mean calculation

Now I think you don’t have any problem with calculating the mean.

Slide (23): Measures of dispersion

Now we have another way of describing the data. To capture differences like those in the birthday party example, we have what we call the measures of dispersion. Dispersion is the variation around the mean, so the more the values vary around the mean, the larger the measures of dispersion.


We have 3 measures of dispersion, and sometimes we also use the quartiles, especially the difference between the 3rd quartile and the 1st quartile. But these are the most commonly used measures of dispersion, meaning the variation around the mean:

1) Range.

2) Variance.

3) Standard deviation.

The Range is the difference between the highest value and the smallest value in our data, so {range = the highest value – the smallest value}.

So how do we calculate the range? We identify the lowest value and the highest value, and subtract the lowest from the highest.

The Variance is the sum (∑) of the squares of the differences between each value and the mean, divided by the sample size (for a sample, by the sample size minus 1).

The Standard deviation is the square root of the variance.

Slide (24): Formulae for measures of variation

And now we will see how to calculate the range. The range is the difference between $X_{\max}$ and $X_{\min}$, so range $= X_{\max} - X_{\min}$.

The sum of squares is the sum (∑) of the squared differences between each value and the mean. The value is $X_i$ and the mean is $\bar{X}$, so the sum of squares $= \sum (X_i - \bar{X})^2$.

The sample variance is usually abbreviated $s^2$. The Sample variance is the sum of the squared differences between each value ($X_i$) and the mean ($\bar{X}$), divided by (n – 1): $s^2 = \sum (X_i - \bar{X})^2 / (n - 1)$. Let’s say that the mean is 8 and the value is 6; then (6 – 8) is (–2), and its square is 4.

We square the differences in order to overcome the minus values. Otherwise, if we add up all the differences between the measurements and the mean, we will get 0. So this is the sample variance.
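As a worked sketch, the range, the deviations and the sample variance can be computed step by step in Python. The data are the hospital stays (8, 10 and 30 days) from the earlier mean example; all variable names are illustrative.

```python
import math

hospital_days = [8, 10, 30]           # data reused from the mean example
n = len(hospital_days)
x_bar = sum(hospital_days) / n        # sample mean, 16.0

data_range = max(hospital_days) - min(hospital_days)   # X_max - X_min

deviations = [x - x_bar for x in hospital_days]        # [-8.0, -6.0, 14.0]
sum_of_squares = sum(d ** 2 for d in deviations)       # 64 + 36 + 196

sample_variance = sum_of_squares / (n - 1)             # s^2, divides by n - 1
sample_sd = math.sqrt(sample_variance)                 # s, positive square root

print(data_range)        # 22
print(sum(deviations))   # 0.0
print(sample_variance)   # 148.0
```

The raw deviations cancel out to zero, which is why they are squared before being added.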

Now why do we divide the sum by (n – 1)?

This is important, because it will be with us for the rest of the course. (n – 1) is what we call the degrees of freedom. Recall what I mentioned earlier: we have 100 people divided into categories according to their blood groups, and once we know three values, the fourth one is known automatically.

So this is what we call the degrees of freedom, which is = (n – 1).

Take these three values, for example. I could give them any numbers I want, say 18, 32 and 30. Someone else might not agree with me and say that, according to her sample or the group she studied, the values are 20, 30 and 25. So she gives different values, but the fourth one is fixed anyway, and we can leave it unmentioned, because it is determined by the three values that she gave.

The fourth value according to the doctor equals {100 – (18+32+30)} = {100 – 80} = 20, and the fourth value according to her group equals {100 – (20+30+25)} = {100 – 75} = 25. So, as we said before, the fourth value is fixed anyway because it depends on the other 3 values.

This is why we have the degrees of freedom, which is = (n – 1): we can assign any values to all but one, and the last one is determined by the numbers we gave.
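One way to connect this idea to the variance is through the deviations from the mean: because they always sum to zero, once n – 1 of them are known the last one is fixed. The sketch below illustrates this with the hospital stays of 8, 10 and 30 days (mean 16); the helper name `last_deviation` is hypothetical.

```python
def last_deviation(free_deviations):
    """Given n - 1 deviations from the mean, return the one remaining.

    Deviations from the mean always sum to zero, so the last one is fixed.
    """
    return -sum(free_deviations)


# Stays of 8, 10 and 30 days have mean 16, so the deviations are -8, -6 and 14.
print(last_deviation([-8, -6]))  # 14
```

Only two of the three deviations are free to vary, which matches n – 1 = 2 degrees of freedom.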

So again, the sample variance is the sum of the squares of ($X_i - \bar{X}$), that is, each value minus the mean (the arithmetic average of the data), divided by (n – 1).

The variance of the population is $\sigma^2 = \sum (X_i - \mu)^2 / N$: the sum (∑) of the squares of ($X_i$) minus (µ), divided by N.

Notice that (µ) is the mean of the population while ($\bar{X}$), “X bar”, is the mean of the sample. Also notice that (N) is the size of the whole population, whereas (n) is the size of the sample.

Here, with the population, we divide by N and not by (n – 1). Do you know why?

Because the population is usually very large, it wouldn’t make much difference whether we divide by the whole population size or by the population size minus 1. Also, (µ) is the true population mean rather than an estimate calculated from a sample, so no degree of freedom is lost.

Let’s say that the population of students in JUST is 20000: it wouldn’t make much difference if I divided by 20000 or 19999, and that is why we divide by (N).
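How little the choice of divisor matters for a large group can be checked with Python's `statistics` module, where `pvariance` divides by N and `variance` divides by n – 1. The list `large` is a hypothetical stand-in for 20000 measurements, not real student data, and `divisor_ratio` is an illustrative helper.

```python
import statistics


def divisor_ratio(data):
    """Ratio of the (n - 1)-divisor variance to the N-divisor variance."""
    return statistics.variance(data) / statistics.pvariance(data)


small = [8, 10, 30]          # three hospital stays from the mean example
large = list(range(20000))   # hypothetical stand-in for 20000 measurements

print(divisor_ratio(small))  # 3/2: a big difference for a tiny sample
print(divisor_ratio(large))  # 20000/19999: almost no difference
```

The ratio is always N/(N – 1), so it shrinks towards 1 as the group grows.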

* Remember this information very well, because the doctor was going to give a bonus of 0.25 extra mark to the student who answered; sadly, no one answered.

The Standard deviation is the positive square root of the variance: a square root can be + or –, so we take the positive value.

So the standard deviation of the sample is the square root of $s^2$, which equals $s$.

And the standard deviation of the population is the square root of $\sigma^2$, which equals $\sigma$.

The Coefficient of variation is calculated by dividing the standard deviation by the mean and then multiplying by 100. Sometimes two groups of data have the same standard deviation, so the same variation around the mean, but different means. In order to show the difference between this group of data and that group, we have what we call the coefficient of variation.

And again, the coefficient of variation = (the standard deviation / the mean) * 100.
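A minimal sketch of the coefficient of variation in Python follows. The two groups are hypothetical data chosen so that both have a sample standard deviation of 2 but different means; the function name is illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """CV as a percentage: (sample standard deviation / mean) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


group_a = [8, 10, 12]      # hypothetical data: mean 10, standard deviation 2
group_b = [98, 100, 102]   # hypothetical data: mean 100, standard deviation 2

print(coefficient_of_variation(group_a))  # about 20 (percent)
print(coefficient_of_variation(group_b))  # about 2 (percent)
```

The spread is identical in absolute terms, but relative to the mean it is ten times larger in the first group.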

In statistics there is something called the standard error, and it always arises with samples. Whenever we take a sample, which is a small group of people, we collect data from them, generalize the results to the population, and make inferences about the population. The Standard error of the mean = the standard deviation / the square root of the sample size.

Now which is higher, the standard error of a small sample or of a large sample?

The standard error of the small sample is higher, because the larger the sample size, the closer we come to the population size, and so the better our sample represents the population.

If I have this class of 100 students and I take a sample of 10 students, the standard error would be the standard deviation / the square root of 10 (the square root of 10 is approximately 3.2).

Now if I take a sample of 20, the standard error will be less, because I divide the standard deviation (assumed to be the same as when the sample was 10) by the square root of (n), which here is the square root of 20 (approximately 4.5).
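The comparison between the two sample sizes can be sketched in Python. The lecture gives no value for the standard deviation, so `ASSUMED_SD = 12.0` is purely a hypothetical number for illustration, as is the function name `standard_error`.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)


ASSUMED_SD = 12.0  # hypothetical value; the lecture does not give one

print(round(standard_error(ASSUMED_SD, 10), 2))  # 3.79
print(round(standard_error(ASSUMED_SD, 20), 2))  # 2.68
```

With the same standard deviation, doubling the sample size divides the standard error by the square root of 2.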

Slide (25): Line histogram showing distribution of HR in women

This is what we call data presentation, here a histogram of the distribution.

Here is an example of how we present our data: after collecting the data, we have to make a data presentation.

This is the pulse rate between 175 and 105, and this is what we call reasonably correct or appropriate.

This heart rate is usually for very old women, like grandmothers (it also works for our 90-year-old grandmother).

This is what we call a line histogram, and we also have what is called a bar histogram.


On Monday we will do exercises on how to calculate the mean, the median, the mode, the standard deviation, the variance and the coefficient of variation; we will take examples and tables and calculate them.

And now you are free to go … ^_^

And Thank you …

 The End

Done by: Raja’ Amin El-haddad

Life in lines …

Life makes everyone wonder and say wow !!!

Every day is a big surprise …

And every day is a big gift …

So let’s fly above the clouds …

Carrying happiness and leaving sadness …

Because that’s life, and we are all travelling on its journey …

So leave the anger …

And make sure you are a good passenger …
