new statistics
PROBABILITY AND STATISTICS
BY ENGR. JORGE P. BAUTISTA
COURSE OUTLINE
I. Introduction to Statistics
II. Tabular and Graphical Representation of Data
III. Measures of Central Tendencies, Locations and Variations
IV. Measure of Dispersion and Correlation
V. Probability and Combinatorics
VI. Discrete and Continuous Distributions
VII. Hypothesis Testing
Text and References
Statistics: A Simplified Approach by Punsalan and Uriarte, 1998, Rex Textbook
Probability and Statistics by Johnson, 2008, Wiley
Counterexamples in Probability and Statistics by Romano and Siegel, 1986, Chapman and Hall
Introduction to Statistics
Definition
1. In its plural sense, statistics is a set of numerical data, e.g. vital statistics, monthly sales, exchange rates, etc.
2. In its singular sense, statistics is a branch of science that deals with the collection, presentation, analysis and interpretation of data.
General uses of Statistics
a. Aids in decision making by providing comparisons of data, explaining actions that have taken place, justifying a claim or assertion, predicting future outcomes and estimating unknown quantities
b. Summarizes data for public use
Examples on the role of Statistics
- In biological and medical sciences, it helps researchers discover relationships worthy of further attention.
Ex. A doctor can use statistics to determine to what extent an increase in blood pressure depends upon age.
- In social sciences, it guides researchers and helps them support theories and models that cannot stand on rationale alone.
Ex. Empirical studies use statistics to obtain the socio-economic profile of the middle class to form new socio-political theories.
- In business, a company can use statistics to forecast sales, design products, and produce goods more efficiently.
Ex. A pharmaceutical company can apply statistical procedures to find out if a new formula is indeed more effective than the one being used.
- In engineering, it can be used to test properties of various materials.
Ex. A quality controller can use statistics to estimate the average lifetime of the products produced by their current equipment.
Fields of Statistics
a. Statistical methods or applied statistics:
1. Descriptive - comprises those methods concerned with the collection, description, and analysis of a set of data without drawing conclusions or inferences about a larger set.
2. Inferential - comprises those methods concerned with making predictions or inferences about a larger set of data using only the information gathered from a subset of this larger set.
b. Statistical theory or mathematical statistics - deals with the development and exposition of theories that serve as a basis of statistical methods.
Descriptive VS Inferential
DESCRIPTIVE
• A bowler wants to find his bowling average for the past 12 months.
• A housewife wants to determine the average weekly amount she spent on groceries in the past 3 months.
• A politician wants to know the exact number of votes he received in the last election.
INFERENTIAL
• A bowler wants to estimate his chance of winning a game based on his current season averages and the averages of his opponents.
• A housewife would like to predict, based on last year’s grocery bills, the average weekly amount she will spend on groceries this year.
• A politician would like to estimate, based on opinion polls, his chance of winning in the upcoming election.
Population as Differentiated from Sample
The word population refers to groups or aggregates of people, animals, objects, materials, happenings or things of any form. This means that there are populations of students, teachers, supervisors, principals, laboratory animals, trees, manufactured articles, birds and many others. If your interest is in a few members of the population chosen to represent its characteristics or traits, these members constitute a sample. The measures of the population are called parameters, while those of the sample are called estimates or statistics.
The Variable
A variable refers to a characteristic or property whereby the members of a group or set vary or differ from one another. A constant, in contrast, refers to a property whereby the members of the group do not differ from one another.
Variables can be classified according to functional relationship as independent or dependent. If you treat variable y as a function of variable z, then z is your independent variable and y is your dependent variable. This means that the value of y, say academic achievement, depends on the value of z.
Variables according to continuity of values:
1. Continuous variables – these are variables whose levels can take continuous values. Examples are height, weight, length and width.
2. Discrete variables – these are variables whose values or levels cannot take the form of a decimal. An example is the size of a particular family.
Variables according to scale of measurement:
1. Nominal – this refers to a property of the members of a group defined by an operation which allows making statements only of equality or difference. For example, individuals can be classified according to their sex or skin color. Color is an example of a nominal variable.
2. Ordinal – it is defined by an operation whereby members of a particular group are ranked. In this operation, we can state that one member is greater or less than the others on a criterion, rather than saying only that he/it is equal to or different from the others, as with a nominal variable.
3. Interval – this refers to a property defined by an operation which permits making statements of equality of intervals rather than just statements of sameness or difference and greater than or less than. An interval variable does not have a “true” zero point, although for convenience a zero point may be assigned.
4. Ratio – is defined by the operation which permits making statements of equality of ratios in addition to statements of sameness or difference, greater than or less than, and equality or inequality of differences. This means that one level or value may be described as double, triple or five times another, and so on.
Assignment no. 1
I. Make a list of at least 5 mathematicians or scientists who contributed to the field of statistics. State their contributions.
II. With your knowledge of statistics, give a real-life situation in which statistics is applied. Expand your answer.
III. When can a variable be considered independent and dependent? Give an example for your answer.
IV. Enumerate some uses of statistics. Do you think that any science will develop without tests of hypotheses? Why?
Examples of Scales of Measurement
1. Nominal Level
Ex. Sex: M-Male, F-Female
Marital Status: 1-single, 2-married, 3-widowed, 4-separated
2. Ordinal Level
Ex. Teaching Ratings: 1-poor, 2-fair, 3-good, 4-excellent
3. Interval Level
Ex. IQ, temperature
4. Ratio Level
Ex. Age, no. of correct answers in exam
Data Collection Methods
1. Survey Method – questions are asked to obtain information, either through a self-administered questionnaire or a personal interview.
2. Observation Method – makes possible the recording of behavior but only at the time of occurrence (e.g. traffic counts, reactions to a particular stimulus).
3. Experimental Method – a method designed for collecting data under controlled conditions. An experiment is an operation where there is actual human interference with the conditions that can affect the variable under study.
4. Use of existing studies – e.g. census, health statistics, weather reports.
5. Registration Method – e.g. car registration, student registration, hospital admission and ticket sales.
Tabular Representation
A frequency distribution is defined as the arrangement of the gathered data by categories together with their corresponding frequencies and class marks or midpoints. The class frequency is the number of observations belonging to a class interval. A class interval is a grouping defined by its lower and upper limits. The class boundaries are the points halfway between the upper limit of one class and the lower limit of the next.
Frequency of Nominal Data
Ex. Male and female college students majoring in Chemistry
SEX FREQUENCY
MALE 23
FEMALE 107
TOTAL 130
Frequency of Ordinal Data
Ex. Frequency distribution of employee perception on the behavior of their administrators
Perception Frequency
Strongly favorable 10
favorable 11
Slightly favorable 12
Slightly unfavorable 14
Unfavorable 22
Strongly unfavorable 31
total 100
Frequency Distribution Table
Definition:
1. Raw data – the set of data in its original form.
2. Array – an arrangement of observations according to their magnitude, either in increasing or decreasing order.
Advantages: it is easier to detect the smallest and largest values and easy to find the measures of position.
Grouped Frequency of Interval Data
Given the following raw scores in an Algebra examination,
47 56 42 28 56 41 56 55 59
78 50 55 57 38 62 52 66 65
79 33 34 37 47 42 68 62 54
80 68 48 56 39 77 80 62 71
57 52 60 70
1. Compute the range, R = H – L, and the number of classes, K = 1 + 3.322 log n, where n = number of observations.
2. Divide the range by 10 to 15 to determine the acceptable size of the interval. Hint: most frequency distributions have an odd number as the size of the interval. The advantage is that the midpoints of the intervals will be whole numbers.
3. Organize the class intervals. See to it that the lowest interval begins with a number that is a multiple of the interval size.
4. Tally each score in the class interval it belongs to.
5. Count the tallies and summarize them under column (f). Then add the frequencies to get the total number of cases (N).
6. Determine the class boundaries, UCB and LCB (upper and lower class boundary).
7. Compute the midpoint for each class interval and put it in column (M).
M = (LS + HS) / 2, where LS and HS are the lowest and highest scores of the class interval
8. Compute the cumulative frequency (cf) distributions for less than and greater than and put them in columns cf< and cf>. You can now interpret the data.
9. Compute the relative frequency distribution. This can be obtained by
RF% = CF/TF x 100%, where CF = class frequency and TF = total frequency
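The steps above can be sketched in Python for the 40 algebra scores. This is only an illustration, not the lecturer's worked solution: the interval size of 5 is an assumption (it is consistent with the first class having LCB 24.5, midpoint 27 and UCB 29.5), and the variable names are made up for the example.

```python
import math

scores = [
    47, 56, 42, 28, 56, 41, 56, 55, 59,
    78, 50, 55, 57, 38, 62, 52, 66, 65,
    79, 33, 34, 37, 47, 42, 68, 62, 54,
    80, 68, 48, 56, 39, 77, 80, 62, 71,
    57, 52, 60, 70,
]

n = len(scores)
score_range = max(scores) - min(scores)   # step 1: R = H - L
k = 1 + 3.322 * math.log10(n)             # step 1: suggested number of classes

width = 5                                 # assumed interval size (odd, so midpoints are whole)
start = (min(scores) // width) * width    # step 3: lowest interval starts at a multiple of width

rows = []
cum = 0
lower = start
while lower <= max(scores):
    upper = lower + width - 1
    f = sum(lower <= s <= upper for s in scores)   # steps 4-5: tally and count
    cum += f
    rows.append({
        "interval": (lower, upper),
        "f": f,
        "lcb": lower - 0.5,                # step 6: class boundaries
        "ucb": upper + 0.5,
        "midpoint": (lower + upper) / 2,   # step 7
        "cf_less": cum,                    # step 8
        "cf_greater": n - cum + f,
        "rf_percent": 100 * f / n,         # step 9
    })
    lower += width

for r in rows:
    print(r)
```

With these assumptions the table has 12 classes, from 25-29 to 80-84, which is more than the roughly 6 classes suggested by K; the choice of interval size is a judgment call.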
Graphical Representation
The data can be graphically presented according to their scale or level of measurement.
1. Pie chart or circle graph. The pie chart at the right is the enrollment from elementary to master’s degree of a certain university. The total population is 4350 students.
2. Histogram or bar graph – this graphical representation can be used for nominal, ordinal or interval data. For a nominal bar graph, the bars are set apart rather than connected since the categories are not continuous. For ordinal and interval data, the bars should be joined to emphasize the degree of differences.
Given the bar graph of how students rate their library:
A - strongly favorable, 90
B - favorable, 48
C - slightly favorable, 88
D - slightly unfavorable, 48
E - unfavorable, 15
F - strongly unfavorable, 25
The Histogram of Person’s Age with Frequency of Travel
age freq RF
19-20 20 39.2%
21-22 21 41.2%
23-24 4 7.8%
25-26 4 7.8%
27-28 2 3.9%
total 51 100%
Exercises
From the previous grouped data on algebra scores,
a. Draw its histogram using the frequency on the y axis and the midpoints on the x axis.
b. Draw the line graph or frequency polygon using the frequency on the y axis and the midpoints on the x axis.
c. Draw the less than and greater than ogives of the data. An ogive is a cumulation of frequencies by class intervals. For the greater than ogive, let the y axis be cf> and the x axis be LCB; for the less than ogive, let the y axis be cf< and the x axis be UCB.
d. Plot the relative frequency, using the relative frequency in percent on the y axis and the midpoints on the x axis.
[Figure: histogram and line graph (frequency polygon) of the algebra scores. y axis: frequency f (0 to 9); x axis: class midpoints. For the first class the midpoint is 27, with LCB = 24.5 and UCB = 29.5.]
[Figure: less than ogive. y axis: cf less than (0 to 40); x axis: UCB (29.5 to 84.5).]
[Figure: greater than ogive. y axis: cf greater than (0 to 40); x axis: LCB (24.5 to 79.5).]
Assignment No. 2
Given the scores in a statistics examination,
33 38 56 35 70 44 81 44 80
47 45 72 45 50 51 51 52 66
54 54 53 56 84 58 56 57 70
55 56 39 56 59 72 63 89 63
60 69 65 61 62 64 64 69 60
65 53 66 66 67 67 68 68 69
66 66 67 70 59 40 71 73 60
73 73 73 73 73 73 74 73 73
74 79 74 74 70 73 46 74 74
75 74 75 75 76 55 77 78 73
79 48 81 44 84 77 88 63 85
73
1. Construct the class interval, frequency table, class midpoint (use a whole number midpoint), less than and greater than cumulative frequency, upper and lower boundary and relative frequency.
2. Plot the histogram, frequency polygon, and ogives.
3. Draw the pie chart and bar graph of the plans of computer science students with respect to attending a seminar. Compute the relative frequency of each.
A - will not attend = 45
B - probably will not attend = 30
C - probably will attend = 40
D - will attend = 25
Measures of Centrality and Location
Mean for Ungrouped Data
X’ = ΣX / N
where X’ = the mean
ΣX = the sum of all scores/data
N = the total number of cases
Mean for Grouped Data
X’ = ΣfM / N
where X’ = the mean
M = the midpoint
fM = the product of the frequency and each midpoint
N = total number of cases
Ex.
1. Find the mean of 10, 20, 25, 30, 30, 35, 40 and 50.
2. Given the grades of 50 students in a statistics class
Class interval f
10-14 4
15-19 3
20-24 12
25-29 10
30-34 6
35-39 6
40-44 6
45-49 3
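As a check on the two formulas, here is a minimal Python sketch applied to the two examples. The variable names are illustrative only; the midpoint M of each class is taken as the average of its lower and upper limits.

```python
# Ungrouped data: X' = sum of X / N
scores = [10, 20, 25, 30, 30, 35, 40, 50]
mean_ungrouped = sum(scores) / len(scores)

# Grouped data: X' = sum of f*M / N, with M the class midpoint
classes = [
    ((10, 14), 4),
    ((15, 19), 3),
    ((20, 24), 12),
    ((25, 29), 10),
    ((30, 34), 6),
    ((35, 39), 6),
    ((40, 44), 6),
    ((45, 49), 3),
]
total_f = sum(f for _, f in classes)                        # N
sum_fm = sum(f * (lo + hi) / 2 for (lo, hi), f in classes)  # sum of f*M
mean_grouped = sum_fm / total_f

print(mean_ungrouped)  # 30.0
print(mean_grouped)    # about 28.8
```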
The weighted mean. The weighted arithmetic mean of given groups of data is the average of the means of all groups, with each group mean counted according to its weight.
WX’ = ΣXw / N
where WX’ = the weighted mean
w = the weight of X
ΣXw = the sum of the products of each X and its weight
N = Σw = the sum of the weights
Ex. Find the weighted mean of the four group means below:
Group, i 1 2 3 4
Xi 60 50 70 75
Wi 10 20 40 50
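The weighted mean of the four groups can be computed with a few lines of Python. This is a sketch of the formula WX’ = ΣXw / Σw; the variable names are made up for the example.

```python
group_means = [60, 50, 70, 75]   # Xi
weights = [10, 20, 40, 50]       # Wi

sum_xw = sum(x * w for x, w in zip(group_means, weights))  # sum of X*w
sum_w = sum(weights)                                       # N = sum of w
weighted_mean = sum_xw / sum_w

print(sum_xw, sum_w)             # 8150 120
print(round(weighted_mean, 2))   # 67.92
```

Note that the plain average of the four means is 63.75; the weighted mean is higher because the groups with the largest weights also have the highest means.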
Median for Ungrouped Data
The median of ungrouped data is the centermost score in a distribution arranged in order of magnitude.
Mdn = (X_(N/2) + X_((N + 2)/2)) / 2 if N is even
Mdn = X_((N + 1)/2) if N is odd
Ex. Find the median of the following sets of scores:
Score A: 12, 15, 19, 21, 6, 4, 2
Score B: 18, 22, 31, 12, 3, 9, 11, 8
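The two cases of the median formula can be sketched in Python as follows. The function name is illustrative; the scores must be sorted first, and positions are shifted by one because Python lists are indexed from 0.

```python
def median(values):
    """Median of ungrouped data using the odd/even position formulas."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # N odd: the ((N + 1) / 2)th score
        return ordered[(n + 1) // 2 - 1]
    # N even: average of the (N / 2)th and ((N + 2) / 2)th scores
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

score_a = [12, 15, 19, 21, 6, 4, 2]
score_b = [18, 22, 31, 12, 3, 9, 11, 8]

print(median(score_a))  # 12
print(median(score_b))  # 11.5
```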
Median for Grouped Data
Procedure:
1. Compute the cumulative frequency less than.
2. Find N/2.
3. Locate the class interval in which the middle case (the N/2th case) falls, and determine the exact limits of this interval.
4. Apply the formula
Mdn = L + [(N/2 – F)i]/fm
where L = exact lower limit of the interval containing the median (the median class)
F = the sum of all frequencies preceding the median class
fm = frequency of the median class
i = size of the class interval
N = total number of cases
Ex. Find the median of the given frequency table.
class interval f cf<
25-29 3 3
30-34 5 8
35-39 10 18
40-44 15 33
45-49 15 48
50-54 15 63
55-59 21 84
60-64 8 92
65-69 6 98
70-74 2 100
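The procedure can be sketched in Python as below. The function name and the (lower limit, upper limit, frequency) layout are assumptions made for the example; the cumulative frequency is recomputed from the f column rather than read from the table.

```python
def grouped_median(classes):
    """Mdn = L + (N/2 - F) * i / fm for a list of (lower, upper, f) classes."""
    n = sum(f for _, _, f in classes)
    width = classes[0][1] - classes[0][0] + 1   # i, the size of the class interval
    cum_before = 0                              # F, frequencies before the current class
    for lower, upper, f in classes:
        if cum_before + f >= n / 2:             # this class contains the N/2th case
            exact_lower = lower - 0.5           # L, the exact lower limit
            return exact_lower + (n / 2 - cum_before) * width / f
        cum_before += f

table = [
    (25, 29, 3), (30, 34, 5), (35, 39, 10), (40, 44, 15), (45, 49, 15),
    (50, 54, 15), (55, 59, 21), (60, 64, 8), (65, 69, 6), (70, 74, 2),
]

print(round(grouped_median(table), 2))  # 50.17
```

Here N/2 = 50 falls in the class 50-54, so L = 49.5, F = 48 and fm = 15.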
Mode of Ungrouped Data
It is defined as the data value or specific score which has the highest frequency.
Find the mode of the following data.
Data A: 10, 11, 13, 15, 17, 20
Data B: 2, 3, 4, 4, 5, 7, 8, 10
Data C: 3.5, 4.8, 5.5, 6.2, 6.2, 6.2, 7.3, 7.3, 7.3, 8.8
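A small Python sketch of the definition, using `collections.Counter` from the standard library. The function name is made up, and treating a data set in which every value occurs once as having no mode is an assumed convention, not something the text states.

```python
from collections import Counter

def modes(values):
    """Return all values that share the highest frequency (empty list if none repeats)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []          # assumed convention: no value repeats, so no mode
    return sorted(v for v, c in counts.items() if c == top)

data_a = [10, 11, 13, 15, 17, 20]
data_b = [2, 3, 4, 4, 5, 7, 8, 10]
data_c = [3.5, 4.8, 5.5, 6.2, 6.2, 6.2, 7.3, 7.3, 7.3, 8.8]

print(modes(data_a))  # []
print(modes(data_b))  # [4]
print(modes(data_c))  # [6.2, 7.3]
```

Data C has two values tied for the highest frequency, so it is bimodal.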
Mode of Grouped Data
For grouped data, the mode is roughly the midpoint of the interval containing the largest number of cases (the modal class). A more exact value is given by
Mdo = L + [d1/(d1 + d2)]i
where L = exact lower limit of the interval containing the mode (the modal class)
d1 = the difference between the frequency of the modal class and the frequency of the interval preceding the modal class
d2 = the difference between the frequency of the modal class and the frequency of the interval after the modal class
i = size of the class interval
Ex. Find the mode of the given frequency table.
class interval f cf<
25-29 3 3
30-34 5 8
35-39 10 18
40-44 15 33
45-49 15 48
50-54 15 63
55-59 21 84
60-64 8 92
65-69 6 98
70-74 2 100
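The formula Mdo = L + [d1/(d1 + d2)]i can be sketched in Python as follows. The function name and data layout are illustrative. Two choices here are assumptions: if several classes tie for the highest frequency the first one is used, and a missing neighbour class is treated as having frequency 0.

```python
def grouped_mode(classes):
    """Mdo = L + d1 / (d1 + d2) * i for a list of (lower, upper, f) classes."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                 # position of the modal class (first if tied)
    lower, upper, f_modal = classes[m]
    width = upper - lower + 1                   # i
    f_before = freqs[m - 1] if m > 0 else 0
    f_after = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    return (lower - 0.5) + d1 / (d1 + d2) * width

table = [
    (25, 29, 3), (30, 34, 5), (35, 39, 10), (40, 44, 15), (45, 49, 15),
    (50, 54, 15), (55, 59, 21), (60, 64, 8), (65, 69, 6), (70, 74, 2),
]

print(round(grouped_mode(table), 2))  # 56.08
```

The modal class is 55-59 (f = 21), so L = 54.5, d1 = 21 - 15 = 6 and d2 = 21 - 8 = 13.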
Exercises
1. Determine the mean, median and mode of the age of 15 students in a certain class.
15, 18, 17, 16, 19, 18, 23, 24, 18, 16, 17, 20, 21, 19
2. To qualify for a scholarship, a student should have garnered an average score of 2.25. Determine if a certain student is qualified for a scholarship.
Subject no. of units grade
A 1 2.0
B 2 3.0
C 3 1.5
D 3 1.25
E 5 2.0
3. Find the mean, median and mode of the given grouped data.
Classes f
11-22 2
23-34 8
35-46 11
47-58 19
59-70 14
71-82 5
83-94 1
Quartiles refer to the values that divide the distribution into four equal parts. There are 3 quartiles, represented by Q1, Q2 and Q3. Q1 is the value at or below which the first one fourth of the distribution, arranged by magnitude, falls. Q2, the second quartile, corresponds to the median. Q3, the third quartile, is the value at or below which three fourths of the distribution falls.
[Figure: the position of the quartiles in a given set of data, from L (lowest) to H (highest). Q1 = 1st quartile, Q2 = 2nd quartile, Q3 = 3rd quartile.]
For grouped data, the computing formula of the kth quartile, where k = 1, 2, 3, is given by
Qk = L + [(kn/4 - F)/fm]i
where L = lower class boundary of the kth quartile class
F = cumulative frequency before the kth quartile class
fm = frequency of the kth quartile class
i = size of the class interval
n = total number of cases
Exercises
Compute the value of the first and third quartile of the given data.
class interval f cf<
25-29 3 3
30-34 5 8
35-39 10 18
40-44 15 33
45-49 15 48
50-54 15 63
55-59 21 84
60-64 8 92
65-69 6 98
70-74 2 100
Decile: If the given data is divided into ten equal parts, then we have nine points of division known as deciles. They are denoted by D1, D2, D3, D4, … and D9.
Dk = L + [(kn/10 – F)/fm]i
where k = 1, 2, 3, 4, …, 9
Exercises
Compute the value of the third, fifth and seventh decile of the given data.
class interval f cf<
25-29 3 3
30-34 5 8
35-39 10 18
40-44 15 33
45-49 15 48
50-54 15 63
55-59 21 84
60-64 8 92
65-69 6 98
70-74 2 100
Percentiles refer to those values that divide a distribution into one hundred equal parts. There are 99 percentiles, represented by P1, P2, P3, P4, P5, … and P99. When we say 55th percentile, we are referring to the value at or below which 55/100 of the data fall.
Pk = L + [(kn/100 – F)/fm]i
where k = 1, 2, 3, 4, 5, …, 99
Exercises
Compute the value of the 30th, 55th, 68th and 88th percentile of the given data.
class interval f cf<
25-29 3 3
30-34 5 8
35-39 10 18
40-44 15 33
45-49 15 48
50-54 15 63
55-59 21 84
60-64 8 92
65-69 6 98
70-74 2 100
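The quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), so one Python function can sketch all three. The function name, the `parts` parameter and the data layout are made up for this example; the cumulative frequencies are recomputed from the f column.

```python
def grouped_quantile(classes, k, parts):
    """L + ((k*n/parts - F) / fm) * i for a list of (lower, upper, f) classes.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    n = sum(f for _, _, f in classes)
    target = k * n / parts
    cum_before = 0                               # F
    for lower, upper, f in classes:
        if f > 0 and cum_before + f >= target:   # class containing the target position
            width = upper - lower + 1            # i
            return (lower - 0.5) + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("k must be between 0 and parts")

table = [
    (25, 29, 3), (30, 34, 5), (35, 39, 10), (40, 44, 15), (45, 49, 15),
    (50, 54, 15), (55, 59, 21), (60, 64, 8), (65, 69, 6), (70, 74, 2),
]

q1 = grouped_quantile(table, 1, 4)
q3 = grouped_quantile(table, 3, 4)
d5 = grouped_quantile(table, 5, 10)
p88 = grouped_quantile(table, 88, 100)
print(round(q1, 2), round(q3, 2), round(d5, 2), round(p88, 2))
```

For this table the sketch gives Q1 ≈ 41.83 and Q3 ≈ 57.36; D5 equals the median, and P88 = 62.0.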
Assignment no. 3
I. The rates per hour in pesos of 12 employees of a certain company were taken and are shown below.
44.75, 44.75, 38.15, 39.25, 18.00, 15.75, 44.75, 39.25, 18.50, 65.25, 71.25, 77.50
a. Find the mean, median and mode.
b. If the value 15.75 was incorrectly written as 45.75, what measure of central tendency will be affected? Support your answer.
II. The final grades of a student in six subjects were tabulated below.
Subj units final grade
Algebra 3 60
Religion 2 90
English 3 75
Pilipino 3 86
PE 1 98
History 3 70
a. Determine the weighted mean.
b. If the subjects were of equal number of units, what would be his average?
III. The ages of qualified voters in a certain barangay were taken and are shown below
Class Interval Frequency
18-23 20
24-29 25
30-35 40
36-41 52
42-47 30
48-53 21
54-59 12
60-65 6
66-71 4
72-77 1
a. Find the mean, median and mode.
b. Find the 1st and 3rd quartile.
c. Find the 4th and 6th decile.
d. Find the 25th and 75th percentile.
Measure of Variation
The range is considered to be the simplest measure of variation. It is the difference between the highest and the lowest value in the distribution.
R = H – L
For grouped data, the range is the difference between the highest upper class boundary and the lowest lower class boundary.
Example: Find the range of the given grouped data in slide no. 59.
Semi-interquartile Range
This value is obtained by getting one half of the difference between the third and the first quartile.
Q = (Q3 – Q1)/2
Example: Find the semi-interquartile range of the previous example in slide no. 59.
Average Deviation
The average deviation refers to the arithmetic mean of the absolute deviations of the values from the mean of the distribution. This measure is sometimes known as the mean absolute deviation.
AD = Σ│x – x’│ / n
where x = the individual values
x’ = mean of the distribution
n = number of values
Steps in solving for AD
1. Arrange the values in a column according to magnitude.
2. Compute the value of the mean x’.
3. Determine the deviations (x – x’).
4. Convert the deviations in step 3 into positive deviations. Use the absolute value sign.
5. Get the sum of the absolute deviations in step 4.
6. Divide the sum in step 5 by n.
Example:
1. Consider the following values: 16, 13, 9, 6, 15, 7, 11, 12
Find the average deviation.
For grouped data:
AD = Σf│x – x’│ / n
where f = frequency of each class
x = midpoint of each class
x’ = mean of the distribution
n = total number of frequency
Example: Find the average deviation of the given data.
Classes f
11-22 2
23-34 8
35-46 11
47-58 19
59-70 14
71-82 5
83-94 1
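Both average deviation formulas can be sketched with one Python function, since the ungrouped case is the grouped case with every frequency equal to 1. The function name and the optional `freqs` parameter are made up for this example; for grouped data the values passed in are the class midpoints.

```python
def average_deviation(values, freqs=None):
    """AD = sum of f * |x - mean| / n; freqs defaults to 1 for ungrouped data."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n

# Ungrouped example
values = [16, 13, 9, 6, 15, 7, 11, 12]
ad_ungrouped = average_deviation(values)

# Grouped example: midpoints of 11-22, 23-34, ..., 83-94
classes = [(11, 22, 2), (23, 34, 8), (35, 46, 11), (47, 58, 19),
           (59, 70, 14), (71, 82, 5), (83, 94, 1)]
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]
ad_grouped = average_deviation(midpoints, freqs)

print(ad_ungrouped)           # 2.875
print(round(ad_grouped, 2))   # 12.36
```

In the ungrouped example the mean is 11.125; in the grouped example the mean of the distribution is 51.3.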
Variance
For ungrouped data
s^2 = Σ(x – x’)^2 / n
Example: Find the variance of 16, 13, 9, 6, 15, 7, 11, 12
For grouped data
s^2 = Σf(x – x’)^2 / n
where f = frequency of each class
x = midpoint of each class interval
x’ = mean of the distribution
n = total number of frequency
Example: Find the variance of the given data.
Classes f
11-22 2
23-34 8
35-46 11
47-58 19
59-70 14
71-82 5
83-94 1
Standard Deviation
s = √(s^2)
For ungrouped data
s = √[Σ(x – x’)^2 / n]
For grouped data
s = √[Σf(x – x’)^2 / n]
Find the standard deviation of the previous examples for ungrouped and grouped data.
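A minimal Python sketch of the variance and standard deviation formulas, applied to the same ungrouped values and grouped classes used in the earlier examples. The function names are illustrative. The formulas divide by n, as in the text, not by n - 1.

```python
import math

def variance(values, freqs=None):
    """s^2 = sum of f * (x - mean)^2 / n; freqs defaults to 1 for ungrouped data."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n

def std_dev(values, freqs=None):
    """s = square root of the variance."""
    return math.sqrt(variance(values, freqs))

# Ungrouped example
values = [16, 13, 9, 6, 15, 7, 11, 12]

# Grouped example: class midpoints and frequencies
classes = [(11, 22, 2), (23, 34, 8), (35, 46, 11), (47, 58, 19),
           (59, 70, 14), (71, 82, 5), (83, 94, 1)]
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

print(variance(values), round(std_dev(values), 2))
print(round(variance(midpoints, freqs), 2), round(std_dev(midpoints, freqs), 2))
```

For the ungrouped values the variance is 11.359375 (s is about 3.37); for the grouped data the variance is about 248.16 (s is about 15.75).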
Assignment no. 4
I. Compute the semi-interquartile range, average deviation, variance and standard deviation of test III of assignment no. 3.
II. Compute the semi-interquartile range, average deviation, variance and standard deviation of test I of assignment no. 3.