descriptive statistics (1)
TRANSCRIPT
-
5/26/2018 Descriptive Statistics (1)
-
WHAT IS STATISTICS
Statistics is defined as the science of collecting, organizing, presenting,
analyzing, and interpreting data to assist in making more effective decisions.
OR
A collection of numerical information is called statistics.
Dr. Iftikhar Hussain Adil
-
WHAT IS STATISTICS
Broadly defined, it is the science, technology and art of extracting information from observational data, with an emphasis on solving real-world problems.
It is a logic and methodology for the measurement of uncertainty and for examination of the consequences of that uncertainty in the planning and interpretation of experimentation and observation.
-
TYPES OF STATISTICS
Statistical Methods
Descriptive Statistics
Inferential Statistics
-
TYPES OF STATISTICS
DESCRIPTIVE STATISTICS
Methods of organizing, summarizing, and presenting data in an informative way.
INFERENTIAL STATISTICS
The methods used to determine something about a population on the basis of a sample.
-
DESCRIPTIVE STATISTICS
-
Inferential Statistics
Inferential statistics aims to draw conclusions about a wider population beyond the dataset or sample that was actually observed.
-
Population versus Sample
A population is the complete set of all items that interest an investigator. The population size, N, can be very large or even infinite.
e.g. All the registered voters of Pakistan
All the students at NUST
A sample is an observed subset of the population values, with sample size given by n.
-
Sampling Techniques
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Possible strata: (male and female strata; resident and non-resident strata; White, Black, Hispanic, and Asian strata; Protestant, Catholic, Jewish, Muslim, etc., strata)
Clustered Sampling
Sample of Convenience
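The first three techniques in the list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student population, the function names and the fixed seeds are all hypothetical, and the systematic version assumes the population size is at least n.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Every item has the same chance of being selected."""
    return random.Random(seed).sample(population, n)

def systematic_sample(population, n, seed=0):
    """Random start, then every k-th item, where k = N // n."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Split the population into strata, then sample randomly within each."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 100 students stored as (id, gender) pairs.
students = [(i, "M" if i % 2 == 0 else "F") for i in range(100)]

srs = simple_random_sample(students, 10)
systematic = systematic_sample(students, 10)
stratified = stratified_sample(students, lambda s: s[1], 5)
print(srs, systematic, stratified, sep="\n")
```

The stratified version guarantees that both gender strata appear in the sample, which a simple random sample does not.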
-
Parameter and Statistic
A parameter is a specific characteristic of a population. A statistic is a specific characteristic of a sample.
e.g. NBS surveyed its students to determine the average daily expense. From a sample of 80 students, the average expense was computed as Rs. 133.
What is the population?
What is the sample?
What is the parameter?
What is the statistic?
Is Rs. 133 a parameter or a statistic?
-
Types of Variables
Variable: A characteristic of an item or individual that will be analyzed using statistics.
e.g. Gender, party affiliation of registered voters, household (HH) income of citizens who live in a specific geographic area, publishing category of a book (hardcover, trade paperback, mass-market paperback, textbook), number of televisions in a household, etc.
-
Example (Types of variables)
Reg #  Gender  Age   FA/FSC or equivalent  Family Members
1      M       18.2  67                    4
2      F       19    70                    3
3      M       20    80                    5
4      F       19.4  85                    6
5      F       20.6  73                    3
6      M       21    76                    4
7      F       20.3  67                    5
8      F       19.8  89                    4
-
Types of Variables
Categorical Variables
A categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values. Categorical variables are often used to represent categorical data.
The values of these variables are selected from an established list of categories.
e.g. Male/Female, Pass/Fail, SA, A, D, SD (strongly agree, agree, disagree, strongly disagree)
Numerical Variables
The values of these variables involve a counted or measured value.
-
Types of Variables
Discrete Variables: The values of these variables are counts.
e.g. Number of people living in a household (HH)
Continuous Variables: These variables have continuous values, and any value can theoretically occur, limited only by the precision of the measuring process. e.g. time to complete a task, air pressure in a tyre.
-
Levels of Measurement
Levels of measurement often dictate the calculations that can be done to summarize and present the data. They also determine the statistical tests that should be performed.
e.g. Balls in a bag are of different colors, such as brown, yellow, blue, green, orange or red.
-
Types of Levels of Measurement
Ratio Level Data: When a scale consists not only of equidistant points but also has a meaningful zero point, we refer to it as a ratio scale.
Ratio scales are the most sophisticated of scales, since they incorporate all the characteristics of nominal, ordinal and interval scales. e.g. income data
-
Properties of Ratio Level
Equal differences in the characteristic are represented by equal differences in the numbers assigned to the classifications.
Can be added or subtracted, i.e. X1 + X2 or X1 - X2 is possible
Can be multiplied or divided, i.e. X1 * X2 or X1 / X2 is possible
Can be ordered, i.e. X1 < X2 or X1 > X2
Meaningful zero point
-
Types of Levels of Measurement
Interval Scale: An interval scale supports differences (x2 - x1) and comparisons (x2 > x1 or x1 > x2), but not ratios.
e.g. 100° is not twice as warm as 50°
(no true zero point, no ratio, but x2 > x1 or x1 > x2)
Ordinal Scale: When items are classified according to having more or less of a characteristic, the scale used is referred to as an ordinal scale. This scale is common in marketing, satisfaction and attitudinal research. e.g. Excellent, very good, good, fair, poor (no zero point, no equal gap, no ratio, just comparison)
-
Types of Levels of Measurement
Nominal Scale: A discrete classification of data in which the data are neither measured nor ordered; subjects are merely allocated to distinct categories, for example male or female; married, unmarried, widowed or separated. (No ratio, no zero point, no equal gap and no comparison)
-
Example
A sample of customers in a specialty ice cream store was asked a series of questions.
What is your favorite flavor of ice cream?
How many times do you eat ice cream?
Do you have children under the age of ten living in your home?
Have you tried our latest ice cream flavor?
-
Self Review 1-1
Chicago-based Market Facts asked a sample of 1,960 consumers to try a newly developed chicken dinner by Boston Market. Of the 1,960 sampled, 1,176 said they would purchase the dinner if it is marketed.
(a) What could Market Facts report to Boston Market regarding acceptance of the chicken dinner in the population?
(b) Is this an example of descriptive statistics or inferential statistics? Explain.
-
DESCRIPTIVE STATISTICS
FREQUENCY DISTRIBUTION
A grouping of data into mutually exclusive classes showing the number of observations in each. The raw data are more easily interpreted if organized into a frequency distribution.
How to find the maximum of the data
How to find the minimum of the data
Where is the cluster of data
What is the typical price of a vehicle
-
DESCRIPTIVE STATISTICS
Step 1: Decide on the number of classes.
Step 2: Determine the class interval or width.
Step 3: Set the individual class limits.
Step 4: Tally the vehicle selling prices into the classes.
-
DESCRIPTIVE STATISTICS
Step 5: Count the number of items in each class.
Class frequency: The number of observations in each class.
Class midpoint
Class interval
Relative frequency
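Steps 1 to 5 can be sketched as a small Python function. Several things here are assumptions and not taken from the slides: the "2 to the k" rule for choosing the number of classes, the choice of the smallest value as the first lower limit, the half-open classes [lower, upper), the restriction to integer data, and the hypothetical price figures.

```python
def frequency_distribution(data):
    """Build a frequency table for integer data (a sketch, see assumptions)."""
    n = len(data)
    # Step 1: number of classes k, the smallest k with 2**k > n (assumed rule).
    k = 1
    while 2 ** k <= n:
        k += 1
    lowest, highest = min(data), max(data)
    # Step 2: class width just large enough that k classes cover the data.
    width = (highest - lowest) // k + 1
    # Steps 3-5: set class limits, tally each value, count per class.
    counts = [0] * k
    for x in data:
        counts[(x - lowest) // width] += 1
    rows = []
    for i, frequency in enumerate(counts):
        lower = lowest + i * width
        rows.append({
            "lower": lower,
            "upper": lower + width,          # upper limit is exclusive
            "midpoint": lower + width / 2,
            "frequency": frequency,
            "relative": frequency / n,
        })
    return rows

# Hypothetical vehicle selling prices, in thousands of dollars.
prices = [15, 17, 18, 20, 21, 21, 22, 23, 24, 25, 26, 27, 28, 30, 33, 35]
table = frequency_distribution(prices)
for row in table:
    print(row)
```

For these 16 values the sketch produces five classes of width 5, starting at 15, with frequencies 3, 6, 4, 2 and 1.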
-
Self Review 2.2
Barry Bonds of the San Francisco Giants established a new single season home run record by hitting 73 home runs during the 2001 Major League Baseball season. The longest of these home runs traveled 488 feet and the shortest 320 feet. You need to construct a frequency distribution of these home run lengths.
(a) How many classes would you use?
(b) What class interval would you suggest?
(c) What actual classes would you suggest?
-
Exercise Page 31
1. A set of data consists of 38 observations. How many classes would you recommend for the frequency distribution?
2. A set of data consists of 45 observations between $0 and $29. What size would you recommend for the class interval?
3. A set of data consists of 230 observations between $235 and $567. What class interval would you recommend?
4. A set of data contains 53 observations. The lowest value is 42 and the largest is 129. The data are to be organized into a frequency distribution.
a. How many classes would you suggest?
b. What would you suggest as the lower limit of the first class?
-
5. Wachesaw Manufacturing, Inc. produced the following number of units the last 16 days: 27, 27, 27, 28, 27, 25, 25, 28, 26, 28, 26, 28, 31, 30, 26, 26
The information is to be organized into a frequency distribution.
a. How many classes would you recommend?
b. What class interval would you suggest?
c. What lower limit would you recommend for the first class?
d. Organize the information into a frequency distribution and determine the relative frequency distribution.
e. Comment on the shape of the distribution.
-
HISTOGRAM
A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis.
The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.
-
HISTOGRAM
-
Frequency Polygon
It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies.
Cumulative frequency distribution
Cumulative frequency polygon
-
Frequency Polygon
-
Cumulative Frequency Polygon
-
Pareto Diagram
A Pareto diagram is a bar chart that displays the frequency of defect causes.
Line Graphs
-
Bar Charts
A bar chart can be used to depict any of the levels of measurement: nominal, ordinal, interval, or ratio.
The level of education is an ordinal scale variable and is reported on the horizontal axis.
-
Difference between Histogram and Bar Chart
In a histogram, the horizontal axis refers to the ratio scale variable, here the vehicle selling price. This is a continuous variable; hence there is no space between the bars. Another difference between a bar chart and a histogram is the vertical scale. In a histogram the vertical axis is the frequency or number of observations. In a bar chart the vertical scale refers to an amount.
-
DESCRIPTIVE STATISTICS
Measures of Location
Measures of Variability
Measure of Relative Position
Measure of Shape
-
Measures of Location
POPULATION MEAN:
For raw data, that is, data that have not been grouped in a frequency distribution, the population mean is the sum of all the values in the population divided by the number of values in the population.
Or, in symbols: $\mu = \frac{\sum X}{N}$
-
Measures of Location
The Sample Mean:
For raw data, that is, ungrouped data, the mean is the sum of all the sampled values divided by the total number of sampled values.
Or, in symbols: $\bar{X} = \frac{\sum X}{n}$
-
Measures of Location
Examples: To obtain grade A, Ben must achieve an average of at least 80 percent in five tests. If his average mark for the first four tests is 78, what is the lowest mark he can get in his fifth test and still obtain grade A?
The speeds, to the nearest mile per hour, of 120 vehicles passing a checkpoint were recorded and grouped into the table below. Estimate the mean of this distribution.
Speed (mph)      21-25  26-30  31-35  36-45  46-60
No. of vehicles  22     48     25     16     9
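Both examples can be worked through in a short Python sketch. The variable names are illustrative, and the grouped mean relies on the usual assumption that every observation in a class sits at the class midpoint, so the result is an estimate.

```python
# Example 1: Ben needs a total of 5 * 80 marks; four tests averaging 78
# already contribute 4 * 78 marks.
required_total = 5 * 80
current_total = 4 * 78
lowest_fifth_mark = required_total - current_total
print(lowest_fifth_mark)  # 88

# Example 2: estimated mean of the grouped speed data.
classes = [(21, 25), (26, 30), (31, 35), (36, 45), (46, 60)]
frequencies = [22, 48, 25, 16, 9]

# Assumption: each class is represented by its midpoint.
midpoints = [(low + high) / 2 for low, high in classes]
n = sum(frequencies)
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
print(n, round(grouped_mean, 2))  # 120 31.67
```

The estimate is 3800 / 120, or about 31.67 mph.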
-
Measures of Location
Properties of Mean
1. Every set of interval- or ratio-level data has a mean.
2. All the values are included in computing the mean.
3. The mean is unique.
4. The sum of the deviations of each value from the mean will always be zero.
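Property 4 follows directly from the definition of the mean. A short derivation for a sample of n values, using the notation $X_i$ and $\bar{X}$ as an assumption about the symbols:

```latex
\sum_{i=1}^{n} \left( X_i - \bar{X} \right)
  = \sum_{i=1}^{n} X_i - n\bar{X}
  = n\bar{X} - n\bar{X}
  = 0,
\qquad \text{since } \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i .
```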
-
The Weighted Mean
The weighted mean is a special case of the arithmetic mean. It occurs when there are several observations of the same value.
-
Example: A candidate obtained the following results at NBS
Quizzes  Mid  Assignments  Final
92%      95%  90%          65%
The regulations state that quizzes have a weight of 15%, assignments 10%, mid 25% and final 50%. What is the candidate's final percentage?
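The example can be checked with a small Python sketch. The function name and dictionary layout are illustrative; the weighted mean is computed as the sum of weight times value divided by the sum of the weights.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    total_weight = sum(weights[name] for name in values)
    return sum(values[name] * weights[name] for name in values) / total_weight

scores = {"quizzes": 92, "mid": 95, "assignments": 90, "final": 65}
weights = {"quizzes": 15, "mid": 25, "assignments": 10, "final": 50}

final_percentage = weighted_mean(scores, weights)
print(f"{final_percentage:.2f}")  # 79.05
```

The unweighted average of the four scores would be 85.5, so the heavy weight on the final pulls the result down to 79.05.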
-
The Median:
The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest.
-
Properties of Median
The median is unique.
It is not affected by extremely large or small values.
It can be computed for ratio-level, interval-level, and ordinal-level data.
-
MODE: The value of the observation that appears most frequently.
-
Properties of Mode
It is a robust measure.
Some data sets have no mode, and others have more than one mode.
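The median and mode properties can be illustrated with Python's standard `statistics` module. The ages and family sizes are the columns of the example table earlier in the deck; the income figures are hypothetical and only show how one extreme value moves the mean but not the median.

```python
import statistics

# Columns from the example table (Age and Family Members).
ages = [18.2, 19, 20, 19.4, 20.6, 21, 20.3, 19.8]
family_members = [4, 3, 5, 6, 3, 4, 5, 4]

median_age = statistics.median(ages)                    # midpoint of ordered values
modal_family_size = statistics.mode(family_members)     # most frequent value

# Hypothetical incomes: replacing the largest value with an extreme one
# changes the mean a lot but leaves the median unchanged.
incomes = [30, 35, 40, 45, 50]
incomes_extreme = [30, 35, 40, 45, 5000]
print(statistics.median(incomes), statistics.median(incomes_extreme))  # 40 40
print(statistics.mean(incomes), statistics.mean(incomes_extreme))      # 40 1030

# multimode returns every value tied for most frequent.
two_modes = statistics.multimode([1, 1, 2, 2, 3])
```

With eight ages the median is the average of the two middle values, 19.8 and 20.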
-
Geometric Mean
The geometric mean is useful in finding the average of percentages, ratios, indexes, or growth rates.
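As a minimal sketch, the geometric mean of n positive values is the n-th root of their product. The two growth rates below are hypothetical, and the function name is illustrative.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

# Hypothetical growth: +5% in year one, +15% in year two,
# expressed as growth factors.
growth_factors = [1.05, 1.15]

average_factor = geometric_mean(growth_factors)
average_rate_pct = (average_factor - 1) * 100
print(round(average_rate_pct, 2))  # about 9.89, below the arithmetic mean of 10
```

Compounding the average factor twice reproduces the actual two-year growth of 1.05 * 1.15, which the arithmetic mean of the rates would overstate.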
-
Measures of Variability
Why Study Dispersion
1. An average may not be representative of the data when the spread is large.
2. A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions.
A small value for a measure of dispersion indicates that the data are clustered closely, say, around the arithmetic mean. The mean is therefore considered representative of the data. Conversely, a large measure of dispersion indicates that the mean is not reliable.
-
Measures of Variability
Range
The range is based on the largest and the smallest values in the data set. It is the difference between the largest and smallest values.
Range = Largest value - Smallest value
-
MEAN DEVIATION
The arithmetic mean of the absolute values of the deviations from the arithmetic mean.
-
Advantages and Drawback of Mean Deviation
It uses all the values in the computation.
It is easy to understand.
It uses absolute values, which are difficult to work with, so this measure is not frequently used.
-
VARIANCE: The arithmetic mean of the squared deviations from the mean.
STANDARD DEVIATION: The square root of the variance.
Population Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Sample Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
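The measures of variability defined so far (range, mean deviation, variance and standard deviation) can be computed side by side. This sketch uses a small hypothetical data set chosen so the arithmetic is easy to follow; the sample variance uses the usual n - 1 divisor.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
n = len(data)
mean = sum(data) / n               # 5.0

data_range = max(data) - min(data)                        # largest - smallest
mean_deviation = sum(abs(x - mean) for x in data) / n     # mean absolute deviation

squared_deviations = sum((x - mean) ** 2 for x in data)   # 32.0
population_variance = squared_deviations / n              # divide by N
sample_variance = squared_deviations / (n - 1)            # divide by n - 1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(data_range, mean_deviation, population_variance, population_sd)
# 7 1.5 4.0 2.0
```

The sample variance (32 / 7) is slightly larger than the population variance (32 / 8) because of the smaller divisor.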
-
CHEBYSHEV'S THEOREM
For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least
$1 - 1/k^2$
where k is any constant greater than 1.
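A minimal sketch of the bound, with an illustrative function name, shows the guaranteed minimum proportion for a few values of k:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 4))
# 1.5 0.5556
# 2 0.75
# 3 0.8889
```

So for any distribution, whatever its shape, at least 75 percent of the values lie within two standard deviations of the mean.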
-
EMPIRICAL RULE
For a symmetrical, bell-shaped frequency distribution, approximately 68 percent of the observations will lie within plus and minus one standard deviation of the mean; about 95 percent of the observations will lie within plus and minus two standard deviations of the mean; and practically all (99.7 percent) will lie within plus and minus three standard deviations of the mean.
-
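Quartiles, Deciles, and Percentiles
A percentile (or centile) is the value of a variable below which a certain percent of observations fall.
Lp = (n + 1) * P / 100
where Lp is the location of the P-th percentile in the ordered data and n is the number of observations.
e.g. 91, 75, 61, 101, 43, 104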
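The location formula can be applied to the six values on the slide. One assumption is added here: when Lp is not a whole number, the value is found by linear interpolation between the two neighbouring ordered observations, which is a common textbook convention and may not be the one intended on the slide.

```python
def percentile(data, p):
    """Value at location Lp = (n + 1) * p / 100 in the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    location = (n + 1) * p / 100
    whole = int(location)            # position of the lower neighbour (1-based)
    fraction = location - whole      # how far to move toward the next value
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)   # assumed linear interpolation

values = [91, 75, 61, 101, 43, 104]
q1 = percentile(values, 25)       # L25 = 1.75
median = percentile(values, 50)   # L50 = 3.5
q3 = percentile(values, 75)       # L75 = 5.25
print(q1, median, q3)  # 56.5 83.0 101.75
```

The ordered data are 43, 61, 75, 91, 101, 104, so the median location 3.5 falls halfway between 75 and 91.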
-
Box Plots
A box plot is a graphical display, based on quartiles, that helps us picture a set of data.
To construct a box plot, we need only five statistics: the minimum value, Q1 (the first quartile), the median, Q3 (the third quartile), and the maximum value.
-
Outlier: An outlier is a value that is inconsistent with the rest of the data.
Interquartile Range:
The interquartile range is the distance between the first and the third quartile.
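The five box-plot statistics and the interquartile range can be computed with Python's `statistics.quantiles`, whose "exclusive" method uses the (n + 1) location rule. The outlier test below is an assumption: the slides do not state a rule, so the sketch uses the common convention of flagging values more than 1.5 times the interquartile range beyond Q1 or Q3. The extra value 250 is hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, q2, q3, max(data)

def find_outliers(data, factor=1.5):
    """Values beyond factor * IQR from the quartiles (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [x for x in data if x < low or x > high]

values = [91, 75, 61, 101, 43, 104]
summary = five_number_summary(values)
iqr = summary[3] - summary[1]
print(summary, iqr)  # (43, 56.5, 83.0, 101.75, 104) 45.25

# Adding one hypothetical extreme value shows the rule at work.
print(find_outliers(values), find_outliers(values + [250]))  # [] [250]
```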
-
Skewness
Symmetric: In a symmetric set of observations the mean and median are equal and the data values are evenly spread around these values. The data values below the mean and median are a mirror image of those above.
Positively Skewed: A set of values is skewed to the right or positively skewed if there is a single peak and the values extend much further to the right of the peak than to the left of the peak. In this case the mean is larger than the median.
-
Skewness
Negatively Skewed: In a negatively skewed distribution there is a single peak, but the observations extend further to the left, in the negative direction, than to the right. In a negatively skewed distribution the mean is smaller than the median.
Bimodal: A bimodal distribution has two peaks (a distribution with more than two is multimodal). This is often the case when the values are from two populations.
-
How to Assess Skewness with the Help of a Boxplot
Symmetric
The distance from Min to Q2 = Q2 to Max
The distance from Min to Q1 = Q3 to Max
The distance from Q1 to Q2 = Q2 to Q3
-
How to Assess Skewness with the Help of a Boxplot
Right Skewed
The distance from Q2 to Max > Min to Q2
The distance from Q3 to Max > Min to Q1
The distance from Q2 to Q3 > Q1 to Q2
-
How to Assess Skewness with the Help of a Boxplot
Left Skewed
The distance from Min to Q2 > Q2 to Max
The distance from Min to Q1 > Q3 to Max
The distance from Q1 to Q2 > Q2 to Q3
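The three sets of boxplot comparisons (symmetric, right skewed, left skewed) can be written as one small function. The function name, the "inconclusive" label for mixed cases, and the example five-number summaries are all hypothetical; real data rarely satisfy the equalities exactly, so in practice a tolerance would be needed.

```python
def boxplot_shape(minimum, q1, q2, q3, maximum):
    """Classify skewness from the five box-plot statistics."""
    # Each pair is (distance on the left side, distance on the right side).
    pairs = [
        (q2 - minimum, maximum - q2),   # Min to Q2 versus Q2 to Max
        (q1 - minimum, maximum - q3),   # Min to Q1 versus Q3 to Max
        (q2 - q1, q3 - q2),             # Q1 to Q2 versus Q2 to Q3
    ]
    if all(left == right for left, right in pairs):
        return "symmetric"
    if all(right > left for left, right in pairs):
        return "right skewed"
    if all(left > right for left, right in pairs):
        return "left skewed"
    return "inconclusive"   # the three comparisons disagree

print(boxplot_shape(0, 25, 50, 75, 100))   # symmetric
print(boxplot_shape(0, 10, 20, 50, 100))   # right skewed
print(boxplot_shape(0, 50, 80, 90, 100))   # left skewed
```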
-
Skewness
-
Measures of Skewness
-
Univariate Vs Bivariate
Scatter Diagram
A graph that we use to show the relationship between two variables is called a scatter diagram.
CONTINGENCY TABLE
A table used to classify observations according to two identifiable characteristics.
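A contingency table can be built by counting how often each pair of categories occurs. The records below are hypothetical (gender and pass/fail result for eight students), and the nested-dictionary layout is just one convenient choice.

```python
from collections import Counter

# Hypothetical observations: (gender, result) for eight students.
records = [
    ("M", "Pass"), ("F", "Pass"), ("M", "Fail"), ("F", "Pass"),
    ("F", "Fail"), ("M", "Pass"), ("F", "Pass"), ("M", "Pass"),
]

counts = Counter(records)                      # frequency of each pair
row_labels = sorted({gender for gender, _ in records})
col_labels = sorted({result for _, result in records})

# Rows are the first characteristic, columns the second.
table = {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}

print("     " + "  ".join(col_labels))
for r in row_labels:
    print(f"{r:<5}" + "  ".join(f"{table[r][c]:>4}" for c in col_labels))
```

Each cell holds the number of observations sharing both characteristics, and the cell counts add up to the number of records.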
-
Stem and Leaf Plot
-
Stem and leaf