descriptive statistics (1)

Upload: zeeshan-abdullah

Post on 16-Oct-2015


TRANSCRIPT

  • 5/26/2018 Descriptive Statistics (1)


    WHAT IS STATISTICS

    Statistics is defined as the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.

    OR

    A collection of numerical information is called statistics.

    Dr. Iftikhar Hussain Adil


    WHAT IS STATISTICS

    Broadly defined, it is the science, technology and art of extracting information from observational data, with an emphasis on solving real world problems.

    It is a logic and methodology for the measurement of uncertainty and for examination of the consequences of that uncertainty in the planning and interpretation of experimentation and observation.


    TYPES OF STATISTICS

    Statistical Methods

    Descriptive Statistics

    Inferential Statistics


    TYPES OF STATISTICS

    DESCRIPTIVE STATISTICS

    Methods of organizing, summarizing, and presenting data in an informative way.

    INFERENTIAL STATISTICS

    The methods used to determine something about a population on the basis of a sample.


    DESCRIPTIVE STATISTICS


    Inferential Statistics

    Inferential statistics aims to draw conclusions about a wider population that lies outside your dataset or sample.


    Population versus Sample

    A population is the complete set of all items that interest an investigator. The population size, N, can be very large or even infinite.

    e.g. All the registered voters of Pakistan

    All the students at NUST

    A sample is an observed subset of the population values, with sample size given by n.


    Sampling Techniques

    Simple Random Sampling

    Systematic Sampling

    Stratified Sampling. Possible strata: male and female; resident and non-resident; White, Black, Hispanic, and Asian; Protestant, Catholic, Jewish, Muslim, etc.

    Clustered Sampling

    Sample of Convenience
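    The first three techniques can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 student IDs, the even/odd rule for assigning strata, and all function names are hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw n items without replacement; every item is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, start=0):
    """Take every k-th item starting at index `start`, with k = len(population) // n."""
    k = len(population) // n
    return population[start::k][:n]

def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Split the population into strata, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: student IDs 1..100; even IDs are labelled "F", odd IDs "M".
students = list(range(1, 101))
srs = simple_random_sample(students, 10)
systematic = systematic_sample(students, 10, start=3)
stratified = stratified_sample(students, lambda s: "F" if s % 2 == 0 else "M", 5)
```

    The fixed seed only makes the sketch repeatable; a real survey would not hard-code it.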


    Parameter and Statistic

    A parameter is a specific characteristic of a population. A statistic is a specific characteristic of a sample.

    e.g. NBS surveyed its students to determine the average daily expense. From a sample of 80 students, the average expense was computed as Rs. 133.

    What is the population?

    What is the sample?

    What is the parameter?

    What is the statistic?

    Is Rs. 133 a parameter or a statistic?


    Types of Variables

    Variable: A characteristic of an item or individual that will be analyzed by using statistics.

    e.g. Gender; party affiliation of registered voters; household (HH) income of citizens who live in a specific geographic area; publishing category (hardcover, trade paperback, mass-market paperback, textbook) of a book; number of televisions in a household, etc.


    Example (Types of variables)

    Reg #   Gender   Age    FA/FSC or equivalent   Family Members
    1       M        18.2   67                     4
    2       F        19     70                     3
    3       M        20     80                     5
    4       F        19.4   85                     6
    5       F        20.6   73                     3
    6       M        21     76                     4
    7       F        20.3   67                     5
    8       F        19.8   89                     4


    Types of Variables

    Categorical Variables

    A categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values. Categorical variables are often used to represent categorical data.

    The values of these variables are selected from an established list of categories.

    e.g. Male/Female, Pass/Fail, SA, A, D, SD

    Numerical Variables

    The values of these variables involve a counted or measured value.


    Types of Variables

    Discrete Variables: The values of these variables are counts.

    e.g. Number of people living in a household (HH)

    Continuous Variables: These variables have continuous values, and any value can theoretically occur, limited only by the precision of the measuring process. e.g. time to complete a task, air pressure in a tyre.


    Levels of Measurement

    Levels of measurement often dictate the calculations that can be done to summarize and present the data. They also determine the statistical tests that should be performed.

    e.g. Balls in a bag are of different colors, like brown, yellow, blue, green, orange or red.


    Types of Levels of Measurement

    Ratio Level Data: When a scale consists not only of equidistant points but also has a meaningful zero point, we refer to it as a ratio scale.

    Ratio scales are the most sophisticated of scales, since they incorporate all the characteristics of nominal, ordinal and interval scales. e.g. income data


    Properties of Ratio Level

    Equal differences in the characteristic are represented by equal differences in the numbers assigned to the classifications.

    Can be added or subtracted, i.e. X1 + X2 or X1 - X2 is possible

    Can be multiplied or divided, i.e. X1 * X2 or X1 / X2 is possible

    Can be ordered, i.e. X1 and X2 can be compared (X1 < X2 or X1 > X2)

    Meaningful zero point


    Types of Levels of Measurement

    Interval Scale: An interval scale supports differences (x2 - x1) and order comparisons (x2 > x1 or x1 > x2), but not ratios.

    e.g. 100° is not twice as warm as 50°

    (no true zero point and no ratio, but differences and order comparisons are meaningful)
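    A small sketch of why the ratio fails on an interval scale. It assumes the temperatures in the example are in degrees Celsius (the slide does not name the unit) and converts them to kelvin, which has a true zero point.

```python
def celsius_to_kelvin(celsius):
    """Shift to the kelvin scale, whose zero is a true zero point."""
    return celsius + 273.15

# Ratio taken directly on the interval scale (assumed Celsius): looks like "twice as warm".
naive_ratio = 100 / 50

# Ratio taken on a ratio scale: only about 15 percent higher.
kelvin_ratio = celsius_to_kelvin(100) / celsius_to_kelvin(50)
```

    The difference of 50 degrees is the same on both scales; only the ratio changes.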

    Ordinal Scale: When items are classified according to having more or less of a characteristic, the scale used is referred to as an ordinal scale. This scale is common in marketing, satisfaction and attitudinal research. e.g. Excellent, very good, good, fair, poor (no zero point, no equal gap, no ratio, just comparison)


    Types of Levels of Measurement

    Nominal Scale: A discrete classification of data, in which data are neither measured nor ordered but subjects are merely allocated to distinct categories, for example male, female; married, unmarried, widowed or separated (no ratio, no zero point, no equal gap and no comparison)


    Example

    A sample of customers in a specialty ice cream store was asked a series of questions.

    What is your favorite flavor of ice cream?

    How many times do you eat ice cream?

    Do you have children under the age of ten living in your home?

    Have you tried our latest ice cream flavor?


    Self Review 1-1

    Chicago-based Market Facts asked a sample of 1,960 consumers to try a newly developed chicken dinner by Boston Market. Of the 1,960 sampled, 1,176 said they would purchase the dinner if it is marketed.

    (a) What could Market Facts report to Boston Market regarding acceptance of the chicken dinner in the population?

    (b) Is this an example of descriptive statistics or inferential statistics? Explain.


    DESCRIPTIVE STATISTICS

    FREQUENCY DISTRIBUTION

    A grouping of data into mutually exclusive classes showing the number of observations in each. The raw data are more easily interpreted if organized into a frequency distribution.

    How to find the maximum of the data

    How to find the minimum of the data

    Where is the cluster of data

    What is the typical price of a vehicle


    DESCRIPTIVE STATISTICS

    Step 1: Decide on the number of classes.

    Step 2: Determine the class interval or width.

    Step 3: Set the individual class limits.

    Step 4: Tally the vehicle selling prices into the classes.


    DESCRIPTIVE STATISTICS

    Step 5: Count the number of items in each class.

    Class frequency: The number of observations in each class.

    Class midpoint

    Class interval

    Relative frequency
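    The five steps can be sketched in Python as follows. Several things here are assumptions rather than part of the slides: the "2 to the k" rule for choosing the number of classes, rounding the width up to a whole number, starting the first class at the minimum, and the 16 hypothetical prices.

```python
import math

def frequency_distribution(values):
    n = len(values)
    # Step 1 (assumed rule): smallest k such that 2**k >= n.
    k = 1
    while 2 ** k < n:
        k += 1
    # Step 2: class interval (width), rounded up to a whole number.
    lowest, highest = min(values), max(values)
    width = max(1, math.ceil((highest - lowest) / k))
    # Step 3: class limits; each class includes its lower limit and excludes its upper limit.
    limits = [(lowest + i * width, lowest + (i + 1) * width) for i in range(k)]
    # Steps 4 and 5: tally each value into its class and count.
    counts = [0] * k
    for v in values:
        index = min(int((v - lowest) // width), k - 1)  # the maximum goes into the last class
        counts[index] += 1
    return [
        {"class": lim, "midpoint": (lim[0] + lim[1]) / 2,
         "frequency": c, "relative": c / n}
        for lim, c in zip(limits, counts)
    ]

# Hypothetical selling prices (in thousands), not taken from the slides.
prices = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31, 33, 35, 36, 38, 40, 44]
table = frequency_distribution(prices)
```

    Each row carries the class limits, the class midpoint, the class frequency and the relative frequency (class frequency divided by the total number of observations).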


    Self Review 2.2

    Barry Bonds of the San Francisco Giants established a new single season home run record by hitting 73 home runs during the 2001 Major League Baseball season. The longest of these home runs traveled 488 feet and the shortest 320 feet. You need to construct a frequency distribution of these home run lengths.

    (a) How many classes would you use?

    (b) What class interval would you suggest?

    (c) What actual classes would you suggest?


    Exercise Page 31

    1. A set of data consists of 38 observations. How many classes would you recommend for the frequency distribution?

    2. A set of data consists of 45 observations between $0 and $29. What size would you recommend for the class interval?

    3. A set of data consists of 230 observations between $235 and $567. What class interval would you recommend?

    4. A set of data contains 53 observations. The lowest value is 42 and the largest is 129. The data are to be organized into a frequency distribution.

    a. How many classes would you suggest?

    b. What would you suggest as the lower limit of the first class?


    5. Wachesaw Manufacturing, Inc. produced the following number of units the last 16 days: 27, 27, 27, 28, 27, 25, 25, 28, 26, 28, 26, 28, 31, 30, 26, 26

    The information is to be organized into a frequency distribution.

    a. How many classes would you recommend?

    b. What class interval would you suggest?

    c. What lower limit would you recommend for the first class?

    d. Organize the information into a frequency distribution and determine the relative frequency distribution.

    e. Comment on the shape of the distribution.


    HISTOGRAM

    A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.


    Frequency Polygon

    It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies.

    Cumulative frequency distribution

    Cumulative frequency polygon
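    A short sketch of the points each polygon would connect. The class limits and frequencies are hypothetical, and plotting the cumulative frequencies against the upper class limits is a common convention assumed here, not something the slides state.

```python
from itertools import accumulate

# Hypothetical classes and class frequencies.
class_limits = [(12, 20), (20, 28), (28, 36), (36, 44)]
frequencies = [3, 4, 5, 4]

# Frequency polygon: one point per class at (class midpoint, class frequency).
polygon_points = [((lo + hi) / 2, f) for (lo, hi), f in zip(class_limits, frequencies)]

# Cumulative frequency distribution: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative frequency polygon: (upper class limit, cumulative frequency).
cumulative_points = [(hi, c) for (_, hi), c in zip(class_limits, cumulative)]
```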


    Pareto Diagram

    A Pareto diagram is a bar chart that displays the frequency of defect causes.

    Line Graphs


    Bar Charts

    A bar chart can be used to depict any of the levels of measurement: nominal, ordinal, interval, or ratio.

    The level of education is an ordinal scale variable and is reported on the horizontal axis.


    Difference b/w Histogram and Bar Chart

    In a histogram, the horizontal axis refers to the ratio scale variable (vehicle selling price). This is a continuous variable; hence there is no space between the bars. Another difference between a bar chart and a histogram is the vertical scale. In a histogram the vertical axis is the frequency or number of observations. In a bar chart the vertical scale refers to an amount.


    DESCRIPTIVE STATISTICS

    Measures of Location

    Measures of Variability

    Measure of Relative Position

    Measure of Shape


    Measures of Location

    POPULATION MEAN:

    For raw data, that is, data that has not been grouped in a frequency distribution, the population mean is the sum of all the values in the population divided by the number of values in the population.

    Or

    $\mu = \frac{\sum X}{N}$


    Measures of Location

    The Sample Mean:

    For raw data, that is, ungrouped data, the mean is the sum of all the sampled values divided by the total number of sampled values.

    Or

    $\bar{X} = \frac{\sum X}{n}$


    Measures of Location

    Examples: To obtain grade A, Ben must achieve an average of at least 80 percent in five tests. If his average mark for the first four tests is 78, what is the lowest mark he can get in his fifth test and still obtain grade A?

    The speeds, to the nearest mile per hour, of 120 vehicles passing a check point were recorded and grouped into the table below. Estimate the mean of this distribution.

    Speed (mph)       21-25   26-30   31-35   36-45   46-60
    No. of vehicles   22      48      25      16      9
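    Both examples reduce to short arithmetic, sketched here in Python. The grouped mean uses the usual class-midpoint approximation, which is an assumption about the intended method; the variable names are illustrative.

```python
# Ben needs a total of 5 * 80 marks; four tests averaging 78 already give 4 * 78.
required_total = 5 * 80
current_total = 4 * 78
lowest_fifth_mark = required_total - current_total

# Speed classes (mph) and vehicle counts from the table.
classes = [(21, 25), (26, 30), (31, 35), (36, 45), (46, 60)]
counts = [22, 48, 25, 16, 9]

# Each class is represented by its midpoint, weighted by its frequency.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
total_vehicles = sum(counts)
grouped_mean = sum(m * f for m, f in zip(midpoints, counts)) / total_vehicles
```

    Under these assumptions Ben needs at least 88 in the fifth test, and the estimated mean speed is 3800 / 120, about 31.7 mph.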


    Measures of Location

    Properties of Mean

    1. Every set of interval- or ratio-level data has a mean.

    2. All the values are included in computing the mean.

    3. The mean is unique.

    4. The sum of the deviations of each value from the mean will always be zero.
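    Property 4 follows directly from the definition of the mean. A one-line derivation, written for a sample of n values with mean $\bar{X}$ (the index notation is added here for clarity):

```latex
\sum_{i=1}^{n} (X_i - \bar{X})
  = \sum_{i=1}^{n} X_i - n\bar{X}
  = n\bar{X} - n\bar{X}
  = 0
```

    The middle step uses $\sum X_i = n\bar{X}$, which is the sample mean formula rearranged.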


    The Weighted Mean

    The weighted mean is a special case of the arithmetic mean. It occurs when there are several observations of the same value.


    Example: A candidate obtained the following results at NBS

    Quizzes   Mid   Assignments   Final
    92%       95%   90%           65%

    The regulations state that quizzes have a weight of 15%, assignments 10%, mid 25% and final 50%. What is the candidate's final percentage?
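    The example can be worked as a weighted mean, sum of (weight × score) divided by the sum of the weights. A minimal Python sketch; the dictionary keys are illustrative names.

```python
scores = {"quizzes": 92, "mid": 95, "assignments": 90, "final": 65}
weights = {"quizzes": 0.15, "mid": 0.25, "assignments": 0.10, "final": 0.50}

# Weighted mean: each score counts in proportion to its weight.
weighted_mean = sum(scores[k] * weights[k] for k in scores) / sum(weights.values())
```

    Because the weights sum to 1, the result is simply 13.8 + 23.75 + 9 + 32.5 = 79.05 percent.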


    The Median:

    The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest.


    Properties of Median

    The median is unique.

    It is not affected by extremely large or small values.

    It can be computed for ratio-level, interval-level, and ordinal-level data.


    MODE: The value of the observation that appears most frequently.


    Properties of Mode

    It is a robust measure.

    Some data sets have no mode, and some have more than one mode.
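    These properties of the median and mode can be checked with Python's standard `statistics` module. The data values are hypothetical, chosen to include one extreme value and two modes.

```python
import statistics

# Hypothetical data with one extreme value (40).
values = [3, 5, 5, 7, 9, 9, 40]

median = statistics.median(values)    # middle value of the ordered data
mean = statistics.mean(values)        # pulled upward by the extreme value
modes = statistics.multimode(values)  # every value tied for most frequent

# Making the extreme value ten times larger leaves the median unchanged.
median_with_larger_outlier = statistics.median([3, 5, 5, 7, 9, 9, 400])

# When no value repeats, every value ties, so there is no single mode.
modes_without_repeats = statistics.multimode([1, 2, 3])
```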


    Geometric Mean

    The geometric mean is useful in finding the average of percentages, ratios, indexes, or growth rates.
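    A hedged sketch for growth rates, using `statistics.geometric_mean` (available from Python 3.8). The two growth rates are hypothetical, and expressing each rate as a growth factor (1 + rate) is the usual convention assumed here.

```python
import statistics

# Hypothetical growth of +5% then +15%, written as growth factors.
growth_factors = [1.05, 1.15]

# Geometric mean: the n-th root of the product of the n factors.
average_factor = statistics.geometric_mean(growth_factors)
average_growth_rate = average_factor - 1

# For comparison, the arithmetic mean of the same factors.
arithmetic_factor = sum(growth_factors) / len(growth_factors)
```

    The geometric mean comes out slightly below the arithmetic mean of 1.10, so applying it twice reproduces the actual two-period growth of 1.05 × 1.15.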


    Measures of Variability

    Why Study Dispersion

    1. The average may not be representative when the spread is large.

    2. A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions.

    A small value for a measure of dispersion indicates that the data are clustered closely, say, around the arithmetic mean. The mean is therefore considered representative of the data. Conversely, a large measure of dispersion indicates that the mean is not reliable.


    Measures of Variability

    Range

    The range is based on the largest and the smallest values in the data set. It is the difference between the largest and the smallest value.

    Range = Largest value - Smallest value


    MEAN DEVIATION

    The arithmetic mean of the absolute values of the deviations from the arithmetic mean.
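    The definition translates directly into code. A minimal sketch with hypothetical data; the function name is illustrative.

```python
def mean_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

# Hypothetical data: mean is 6, absolute deviations are 4, 2, 0, 2, 4.
data = [2, 4, 6, 8, 10]
md = mean_deviation(data)
```

    Here the absolute deviations sum to 12, so the mean deviation is 12 / 5 = 2.4.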


    Advantages and Drawback of Mean Deviation

    It uses all the values in the computation.

    It is easy to understand.

    It uses absolute values, which are difficult to work with, so this measure is not frequently used.


    VARIANCE: The arithmetic mean of the squared deviations from the mean.

    STANDARD DEVIATION: The square root of the variance.

    Population Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

    Sample Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
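    A short check of the two variances with Python's `statistics` module, assuming the standard convention that the population variance divides by N and the sample variance by n - 1. The data are hypothetical.

```python
import statistics

# Hypothetical data: mean 6, squared deviations 16, 4, 0, 4, 16 (sum 40).
data = [2, 4, 6, 8, 10]

population_variance = statistics.pvariance(data)  # 40 / 5
sample_variance = statistics.variance(data)       # 40 / (5 - 1)

# Standard deviation is the square root of the corresponding variance.
population_sd = statistics.pstdev(data)
sample_sd = statistics.stdev(data)
```

    The sample variance is always the larger of the two for the same data, because it divides the same sum of squares by a smaller number.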


    CHEBYSHEV'S THEOREM

    For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least

    $1 - \frac{1}{k^2}$

    where k is any constant greater than 1.
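    The bound is easy to tabulate. A minimal sketch; the function name is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

bound_k2 = chebyshev_lower_bound(2)  # at least 75% within 2 standard deviations
bound_k3 = chebyshev_lower_bound(3)  # at least about 88.9% within 3
```

    These are guaranteed minimums for any distribution, so real data sets usually have a larger share within the same distance.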


    EMPIRICAL RULE

    For a symmetrical, bell-shaped frequency distribution, approximately 68 percent of the observations will lie within plus and minus one standard deviation of the mean; about 95 percent of the observations will lie within plus and minus two standard deviations of the mean; and practically all (99.7 percent) will lie within plus and minus three standard deviations of the mean.
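    The three percentages can be reproduced from the normal distribution, which is the bell-shaped curve the rule is usually based on (an assumption; the slide only says "bell-shaped"). The sketch uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
```

    The computed shares are roughly 0.683, 0.954 and 0.997, matching the 68, 95 and 99.7 percent quoted in the rule.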


    Quartiles, Deciles, and Percentiles

    A percentile (or centile) is the value of a variable below which a certain percent of observations fall.

    Location of the P-th percentile: $L_p = (n + 1)\frac{P}{100}$

    Example data: 91, 75, 61, 101, 43, 104
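    A sketch of the location formula applied to the six values. The slide gives only the location $L_p$; interpolating linearly between the two neighbouring ordered values when $L_p$ is not a whole number is an assumption made here.

```python
def percentile(values, p):
    """Value at location L_p = (n + 1) * p / 100 in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100   # 1-based position in the ordered data
    lower = int(location)          # whole part of the location
    fraction = location - lower    # fractional part, used for interpolation
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [91, 75, 61, 101, 43, 104]  # ordered: 43, 61, 75, 91, 101, 104
q1, median, q3 = (percentile(data, p) for p in (25, 50, 75))
```

    With n = 6 the locations are 1.75, 3.5 and 5.25, giving Q1 = 56.5, median = 83 and Q3 = 101.75 under this interpolation.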


    Box Plots

    A box plot is a graphical display, based on quartiles, that helps us picture a set of data.

    To construct a box plot, we need only five statistics: the minimum value, Q1 (the first quartile), the median, Q3 (the third quartile), and the maximum value.


    Outlier: An outlier is a value that is inconsistent with the rest of the data.

    Inter Quartile Range:

    The inter quartile range is the distance between the first and the third quartile.
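    The five statistics needed for a box plot, and the inter quartile range, can be computed with `statistics.quantiles`. The sketch reuses the six values from the percentile slide and assumes the `"exclusive"` method, which places quartiles at the (n + 1) P/100 locations.

```python
import statistics

data = [91, 75, 61, 101, 43, 104]

# Quartiles at the (n + 1) * P / 100 locations.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
interquartile_range = q3 - q1
```

    For these values the summary is 43, 56.5, 83, 101.75, 104 and the inter quartile range is 45.25.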


    Skewness

    Symmetric: In a symmetric set of observations the mean and median are equal and the data values are evenly spread around these values. The data values below the mean and median are a mirror image of those above.

    Positively Skewed: A set of values is skewed to the right or positively skewed if there is a single peak and the values extend much further to the right of the peak than to the left of the peak. In this case the mean is larger than the median.


    Skewness

    Negatively Skewed: In a negatively skewed distribution there is a single peak but the observations extend further to the left, in the negative direction, than to the right. In a negatively skewed distribution the mean is smaller than the median.

    Bimodal: A bimodal distribution will have two or more peaks. This is often the case when the values are from two populations.


    How to Assess Skewness with the Help of a Boxplot

    Symmetric

    The distance from Min to Q2 = the distance from Q2 to Max

    The distance from Min to Q1 = the distance from Q3 to Max

    The distance from Q1 to Q2 = the distance from Q2 to Q3


    How to Assess Skewness with the Help of a Boxplot

    Right Skewed

    The distance from Q2 to Max > the distance from Min to Q2

    The distance from Q3 to Max > the distance from Min to Q1

    The distance from Q2 to Q3 > the distance from Q1 to Q2


    How to Assess Skewness with the Help of a Boxplot

    Left Skewed

    The distance from Min to Q2 > the distance from Q2 to Max

    The distance from Min to Q1 > the distance from Q3 to Max

    The distance from Q1 to Q2 > the distance from Q2 to Q3
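    The three boxplot comparisons can be collected into one small function. This is a sketch: the function name, the "mixed" label for cases where the comparisons disagree, and the example five-number summaries are all hypothetical.

```python
def boxplot_skew(minimum, q1, q2, q3, maximum):
    """Classify skewness from a five-number summary using the three distance comparisons."""
    # Each pair is (left-hand distance, right-hand distance).
    comparisons = [
        (q2 - minimum, maximum - q2),  # Min to Q2 versus Q2 to Max
        (q1 - minimum, maximum - q3),  # Min to Q1 versus Q3 to Max
        (q2 - q1, q3 - q2),            # Q1 to Q2 versus Q2 to Q3
    ]
    if all(left == right for left, right in comparisons):
        return "symmetric"
    if all(right > left for left, right in comparisons):
        return "right skewed"
    if all(left > right for left, right in comparisons):
        return "left skewed"
    return "mixed"  # the three comparisons do not agree

# Hypothetical five-number summaries.
right = boxplot_skew(0, 1, 2, 4, 10)
left = boxplot_skew(0, 6, 8, 9, 10)
even = boxplot_skew(0, 1, 2, 3, 4)
```

    Real data rarely satisfy the equalities exactly, so in practice the "symmetric" test would need a tolerance.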


    Measures of Skewness


    Univariate Vs Bivariate

    Scatter Diagram

    A graph we use to show the relationship between two variables is called a scatter diagram.

    CONTINGENCY TABLE

    A table used to classify observations according to two identifiable characteristics.
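    A contingency table can be built by counting pairs of characteristics. A minimal sketch with `collections.Counter`; the six observations (gender and exam result) are hypothetical.

```python
from collections import Counter

# Hypothetical observations: (gender, result) for six students.
observations = [
    ("M", "Pass"), ("F", "Pass"), ("F", "Fail"),
    ("M", "Pass"), ("F", "Pass"), ("M", "Fail"),
]

# Count how often each combination of the two characteristics occurs.
cell_counts = Counter(observations)

row_labels = sorted({gender for gender, _ in observations})     # ["F", "M"]
column_labels = sorted({result for _, result in observations})  # ["Fail", "Pass"]

# One row per gender, one column per result.
contingency_table = [[cell_counts[(r, c)] for c in column_labels] for r in row_labels]
```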


    Stem and Leaf Plot
