
  • Quantitative Applications in Management and Research

    Amity Directorate of Distance & Online Education



    Preface

    It gives me immense pleasure to bring out the Students' Study Material for the subject Quantitative Applications in Management and Research. The matter is presented in an easy way and covers the particular needs of the course. The purpose of the course is to help students acquire the mathematical skills required in the field of management, and the material is arranged so as to allow the progressive learning of quantitative techniques.


    Index

    S.I. Nos. Chapter No. Subject Page No.

    1 Chapter 1 Introduction to Quantitative Analysis 3-10

    2 Chapter 2 Data Analysis 11-24

    3 Chapter 3 Correlation Analysis 25-34

    4 Chapter 4 Regression Analysis 35-41

    5 Chapter 5 Probability & Probability distribution 42-55

    6 Chapter 6 Time Series 55-69

    7 Key to End Chapter quizzes 70-71

    8 Bibliography 72


    Chapter-I

    Introduction to Quantitative Analysis

    Contents:

    1.1 Introduction

    1.2 Decision - Making and Quantitative Techniques.

    1.2.1 Elements of any decision are

    1.3 Quantitative Applications in Management- an overview

    1.4 Application of Quantitative methods in business & Management

    1.4.1 Finance -Budgeting and Investments

    1.4.2 Purchasing, Procurement and Exploration

    1.4.3 Production Management

    1.4.4 Marketing

    1.4.5 Personnel Management

    1.4.6 Research and Development


    Chapter-I Introduction to Quantitative Analysis

    1.1 Introduction

    Decision-making is an essential and dominating part of the management process. Although authorities sometimes differ in their definitions of the basic functions of management, everybody agrees that one is not a manager unless he has some authority to plan, organise and control the activities of an enterprise and the behaviour of others. Within this context, decision-making may be viewed as the power to determine what plans will be made and how activities will be organised and controlled. The right to make decisions is an integral part of the right of authority upon which the entire concept of management rests. Essentially, then, decision-making pervades the activities of every business manager. Further, since the management is engaged in a continuous process of decision-making in carrying out the key managerial functions of planning, organising, directing and controlling, we can go to the extent of saying that management may be regarded as equivalent to decision-making.

    Traditionally, decision-making has been considered purely an art, a talent which is acquired over a period of time through experience. It has been considered so because a variety of individual styles can be traced in the handling and successful solution of similar types of managerial problems in actual business. However, the environment in which management has to operate nowadays is complex and fast changing, and there is a greater need for supplementing the art of decision-making with systematic and scientific methods. A systematic approach to decision-making is necessary because today's business and the environment in which it functions are far more complex than in the past, and the cost of making errors is becoming graver with time. Most business decisions cannot be made simply on the basis of rule of thumb, commonsense and/or snap judgment. Commonsense may be misleading, and snap judgments may have painful implications. For a large business, a single wrong decision may not only be ruinous but may also have ramifications in national or even international economies. As such, present-day managements cannot rely solely on a trial-and-error approach, and managers have to be more sophisticated: they should employ scientific methods to help them make proper choices. Thus, decision makers in the business world of today must understand the scientific methodology for making decisions.

    1.2 Decision - Making and Quantitative Techniques

    Managerial decision-making is a process by which management, when faced with a problem, chooses a specific course of action from a set of possible options. In making a decision, a business manager attempts to choose the course of action which is most effective in the given circumstances in attaining the goals of the organisation. The various types of decision-making situations that a manager might encounter can be listed as follows:

    1. Decisions under certainty, where all facts are known fully and for sure, or under uncertainty, where the event that would actually occur is not known but probabilities can be assigned to the various possible occurrences.

    2. Decisions for one time period only, called static decisions, or a sequence of interrelated decisions made either simultaneously or over several time periods, called dynamic decisions.

    3. Decisions where the opponent is nature (digging an oil well, for example) or a rational opponent (for instance, setting the advertising strategy when the actions of competitors have to be considered).

    These classes of decision-making situations are not mutually exclusive, and a given situation may exhibit characteristics from each class. Stocking of an item for sale in a certain trade fair, for instance, illustrates a static decision-making situation where uncertainty exists and nature is the opponent.

    1.2.1 Elements of any decision are:

    i. a decision-maker, who could be an individual, group, organisation, or society;
    ii. a set of possible actions that may be taken to solve the decision problem;
    iii. a set of possible states that might occur;
    iv. a set of consequences (pay-offs) associated with the various combinations of courses of action and the states that may occur; and
    v. the relationship between the pay-offs and the values of the decision-maker.

    In an actual decision-making situation, the definition and identification of the alternatives, the states and the consequences are the most difficult, albeit not the most crucial, aspects of the decision problem.

    In real life, some decision-making situations are simple while others are not. Complexities in decision situations arise due to several factors: the complicated manner of interaction of the economic, political, technological, environmental and competitive forces in society; the limited resources of an organisation; the values, risk attitudes and knowledge of the decision-makers; and the like. For example, a company's decision to introduce a new product will be influenced by such considerations as market conditions, labour rates and availability, and investment requirements and availability of funds. The decision will be of multidimensional response, including the production methodology, cost and quality of the product, price, package design, and marketing and advertising strategy. The results of the decision would conceivably affect every segment of the organisation.

    The essential idea of the quantitative approach to decision-making is that if the factors that influence the decisions can be identified and quantified, then it becomes easier to resolve the complexity of decision-making situations. Thus, in dealing with complex problems, we may use the tools of quantitative analysis. In fact, a large number of business problems have been given a quantitative representation with varying degrees of success, and this has led to a general approach variously designated as operations research (or operational research), management science, systems analysis, decision analysis, decision science, etc. Quantitative analysis now extends to several areas of business operations and represents probably the most effective approach to handling some types of decision problems.

    A significant benefit of attaining some degree of proficiency with quantitative methods is exhibited in the way problems are perceived and formulated. A problem has to be well defined before it can be formulated into a well-structured framework for solution. This requires an orderly and organised way of thinking.

    Two observations may be made here. First, it should be understood clearly that a decision does not become a good and right decision merely because it is made within an orderly and mathematically precise framework. Quantification at best is an aid to business judgment, not its substitute. A certain degree of constructive scepticism is as desirable in considering a quantitative analysis of business decisions as it is in any other process of decision-making. Further, some allowance should be made for qualitative factors involving morale, motivation, leadership, etc., which cannot be ignored; but they should not be allowed to dominate to such an extent that the quantitative analysis looks like an interesting academic exercise, but a worthless one. In fact, the manager should seek a balance between quantitative and qualitative factors. Second, it may be noted that the various names for quantitative analysis (operations research, management science, etc.) connote more or less the same general approach. We shall not attempt to discuss the differences among the various labels, as doing so is prone to create more heat than light; we only note that the basic reason for so many titles is that the field is relatively new and there is no consensus regarding which fields of knowledge it includes.

    1.3 Quantitative Applications in Management- an overview

    The objective of quantitative research is to develop and employ mathematical models, theories and/or hypotheses pertaining to natural phenomena. The process of measurement is central to quantitative research because it provides the fundamental connection between empirical observation and the mathematical expression of quantitative relationships.

    Quantitative research is generally approached using scientific methods, which include:

    i. The generation of models, theories and hypotheses
    ii. The development of instruments and methods for measurement
    iii. Experimental control and manipulation of variables
    iv. Collection of empirical data
    v. Modeling and analysis of data
    vi. Evaluation of results

    Quantitative methods are research techniques that are used to gather quantitative data, i.e., information dealing with numbers and anything that is measurable. Statistics, tables and graphs are often used to present the results of these methods.

    1.4 Application of Quantitative methods in business & Management

    The tools and techniques of quantitative methods used in the areas of management decision-making can be outlined as follows:

    1.4.1 Finance -Budgeting and Investments

    i. Cash-flow analysis, long range capital requirement, dividend policies, investments portfolios.

    ii. Credit policies, credit risks and delinquent account procedures.

    iii. Claim and complaint procedures.

    1.4.2 Purchasing, Procurement and Exploration

    i. Rules for buying supplies under stable or varying prices.

    ii. Determination of quantities and timing of purchases.

    iii. Bidding policies.

    iv. Strategies for exploration and exploitation of raw material sources.

    v. Replacements policies.

    1.4.3 Production Management

    i. Physical distribution

    a) Location and size of warehouses, distribution centers and retail outlets.

    b) Distribution policy.

    ii. Facilities Planning
    a) Numbers and location of factories, warehouses, hospitals, etc.
    b) Loading and unloading facilities for railroads and trucks; determining the transport schedule.

    iii. Manufacturing
    a) Production scheduling and sequencing.
    b) Stabilisation of production and employment, training, layoffs and optimum product mix.

    iv. Maintenance and Project Scheduling
    a) Maintenance policies and preventive maintenance.
    b) Maintenance crew sizes.
    c) Project scheduling and allocation of resources.


    1.4.4 Marketing

    i. Product selection, timing, competitive actions.

    ii. Number of salesmen, frequency of calling on accounts, per cent of time spent on prospects.

    iii. Advertising media with respect to cost and time.

    1.4.5 Personnel Management

    i. Selection of suitable personnel on minimum salary.

    ii. Mixes of age and skills.

    iii. Recruitment policies and assignment of jobs.

    1.4.6 Research and Development

    i. Determination of the areas of concentration of research and development.

    ii. Project selection.

    iii. Determination of time cost trade-off and control of development projects.

    iv. Reliability and alternative design.


    Chapter-I Introduction to Quantitative Analysis

    End Chapter quizzes : I

    Ques 1. Traditionally, decision-making has been considered purely as an

    a. Art b. Science c. Social Science d. Mathematics

    Ques 2. Managerial decision-making is a process by which management chooses a specific course of action from a set of

    a. Restricted options b. Possible options. c. No options d. None

    Ques 3. Decisions for one time period only are called

    a. dynamic decisions b. static decisions c. Both d. None

    Ques 4. Decision Making can be done under

    a. Certainty b. Uncertainty c. Both d. None

    Ques 5. Decision-maker could be

    a. an individual b. group c. society d. All the above

    Ques 6. Quantitative research is generally approached using scientific methods, which include:

    a. The generation of models, theories and hypotheses b. Experimental control and manipulation of variables c. Modeling and analysis of data d. All the above


    Ques 7. Quantitative research provides the fundamental connection between

    a. empirical observation and mathematical expression b. empirical observation and qualitative expression c. empirical observation and social expression d. empirical observation and all expression

    Ques 8. Numbers and location of factories, warehouses, hospitals, etc comes under

    a. Maintenance and Project scheduling b. Purchasing, Procurement and Exploration c. Facilities Planning d. Physical distribution

    Ques 9. Selection of suitable personnel on minimum salary comes under

    a. Production Management b. Personnel management c. Research and Development d. Finance -Budgeting and Investments

    Ques 10. Most of the business decisions can be made on the basis of

    a. Rule of thumb b. Commonsense c. Snap judgment. d. Quantitative Techniques


    Chapter-II

    Data Analysis

    Contents:

    2.1 Introduction

    2.1.1 Types of Data

    2.2 Some Definitions

    2.3 Frequency Distribution:

    2.3.1 Graphical presentation of Frequency distribution

    2.4 Measure of Central tendency

    2.4.1 Arithmetic Mean

    2.4.2 Median

    2.4.3 Mode

    2.5 Measure of Dispersion

    2.5.1 Range

    2.5.2 Mean Deviation

    2.5.3 Variance and standard deviation

    2.5.4 The Coefficient of Variation


    Chapter-II Data Analysis

    2.1 Introduction

    Statistics is a branch of applied mathematics concerned with the collection and interpretation of quantitative data and the use of probability theory to estimate population parameters. Statistical methods can be used to summarise or describe a collection of data; this is called descriptive statistics.

    Data: A collection of values to be used for statistical analysis.

    A dictionary defines data as facts or figures from which conclusions may be drawn. Data may consist of numbers, words, or images, particularly as measurements or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which information and knowledge are derived. Thus, technically, "data" is a collective or plural noun, and datum is its singular form. Data can be classified as either numeric or non-numeric. Specific terms are used as follows:

    2.1.1 Types of Data

    I. Qualitative data are non-numeric.

    1. {Poor, Fair, Good, Better, Best}, colours (ignoring any physical causes), and types of material {straw, sticks, bricks} are examples of qualitative data.

    2. Qualitative data are often termed categorical data. Some books use the terms individual and variable to refer to the objects and characteristics described by a set of data. They also stress the importance of exact definitions of these variables, including what units they are recorded in. The reason the data were collected is also important.

    II. Quantitative data are numeric.

    Quantitative data are further classified as either discrete or continuous.

    Discrete data are numeric data that have a finite number of possible values. A classic example of discrete data is a finite subset of the counting numbers, {1, 2, 3, 4, 5}, perhaps corresponding to {Strongly Disagree ... Strongly Agree}. When data represent counts, they are discrete; an example might be how many students were absent on a given day. Counts are usually considered exact and integer.

    Continuous data have infinite possibilities: 1.4, 1.41, 1.414, 1.4142, 1.41421, ... The real numbers are continuous with no gaps or interruptions. Physically measurable quantities such as length, volume, time, mass, etc. are generally considered continuous. At the physical level (microscopically), especially for mass, this may not be true, but for normal life situations it is a valid assumption.

    Data analysis is a process of gathering, modeling, and transforming data with the goal of highlighting useful information, suggesting conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.

    2.2 Some Definitions

    Raw Data: Data collected in original form.
    Frequency: The number of times a certain value or class of values occurs.
    Frequency Distribution: The organisation of raw data in table form with classes and frequencies.
    Categorical Frequency Distribution: A frequency distribution in which the data are only nominal or ordinal.
    Ungrouped Frequency Distribution: A frequency distribution of numerical data in which the raw data are not grouped.
    Grouped Frequency Distribution: A frequency distribution in which several numbers are grouped into one class.
    Class Limits: Separate one class in a grouped frequency distribution from another. The limits could actually appear in the data and have gaps between the upper limit of one class and the lower limit of the next.
    Class Boundaries: Separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting 0.5 units from the lower class limit, and the upper class boundary by adding 0.5 units to the upper class limit.
    Class Width: The difference between the upper and lower boundaries of any class. The class width is also the difference between the lower limits of two consecutive classes, or the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.
    Class Mark (Midpoint): The number in the middle of the class, found by adding the upper and lower limits and dividing by two. It can also be found by adding the upper and lower boundaries and dividing by two.
    Cumulative Frequency: The number of values less than the upper class boundary for the current class; a running total of the frequencies.
    Relative Frequency: The frequency divided by the total frequency; the proportion of values falling in that class.
    Cumulative Relative Frequency: The running total of the relative frequencies, or the cumulative frequency divided by the total frequency; the proportion of values which are less than the upper class boundary.
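    The definitions above can be sketched in code. The following Python fragment is a minimal illustration (the sample values and the class limits are arbitrary assumptions, not from the text): it tabulates frequency, class boundaries, class marks, relative frequency and cumulative frequency for a grouped distribution.

```python
# Illustrative sketch of a grouped frequency distribution.
# The raw data and class limits below are assumed for the example.
raw = [12, 15, 17, 21, 22, 24, 25, 28, 31, 33]
classes = [(10, 19), (20, 29), (30, 39)]          # class limits

rows = []
cum = 0
for lower, upper in classes:
    freq = sum(lower <= x <= upper for x in raw)  # frequency of the class
    cum += freq                                   # running (cumulative) total
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - 0.5, upper + 0.5), # no gap between classes
        "midpoint": (lower + upper) / 2,          # class mark
        "frequency": freq,
        "relative": freq / len(raw),              # proportion in the class
        "cumulative": cum,
    })
```

Note how the boundaries (9.5-19.5, 19.5-29.5, ...) leave no gap between classes, while the limits (10-19, 20-29, ...) do, exactly as the definitions state.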

    2.3 Frequency Distribution

    The distribution of empirical data is called a frequency distribution and consists of a count of the number of occurrences of each value. If the data are continuous, then a grouped frequency distribution is used. Typically, a distribution is portrayed using a frequency polygon or a histogram. Mathematical distributions are often used to define distributions; the normal distribution is, perhaps, the best-known example. Many empirical distributions are approximated well by mathematical distributions such as the normal distribution.

    Grouped Frequency Distribution: A grouped frequency distribution is a frequency distribution in which frequencies are displayed for ranges of data rather than for individual values. For example, the distribution of heights might be calculated by defining one-inch ranges. The frequency of individuals with various heights rounded off to the nearest inch would then be tabulated.

    2.3.1 Graphical presentation of Frequency distribution:

    Histogram

    A histogram is a graphical display of tabulated frequencies. A histogram is the graphical version of a table that shows

    what proportion of cases fall into each of several or many specified categories.

    Figure 2.1: Histogram

    Example of a histogram of 100 values

    Advantages

    Visually strong

    Can compare to normal curve

    Usually vertical axis is a frequency count of items falling into each category

    Disadvantages

    Cannot read exact values because data is grouped into categories

    More difficult to compare two data sets

    Use only with continuous data

    Frequency Polygons

    Frequency polygons are a graphical device for understanding the shapes of distributions. They serve the same

    purpose as histograms, but are especially helpful in comparing sets of data. Frequency polygons are also a good

    choice for displaying cumulative frequency distributions.


    To create a frequency polygon, start just as for a histogram by choosing a class interval. Then draw an X-axis representing the values of the scores in your data. Mark the middle of each class interval with a tick mark, and label it with the middle value represented by the class. Draw the Y-axis to indicate the frequency of each class. Place a point in the middle of each class interval at the height corresponding to its frequency. Finally, connect the points. You should include one class interval below the lowest value in your data and one above the highest value, so that the graph touches the X-axis on both sides.

    Figure 2.2: Histogram/Frequency Polygons

    Advantages

    Visually appealing

    Can compare to normal curve

    Can compare two data sets

    Disadvantages

    Anchors at both ends may imply zero as data points

    Use only with continuous data

    Frequency Curve

    A smooth curve which corresponds to the limiting case of a histogram computed for a frequency distribution

    of a continuous distribution as the number of data points becomes very large.


    Figure 2.3 : Histogram/Frequency Polygons/Frequency Curve

    Advantages

    Visually appealing

    Disadvantages

    Anchors at both ends may imply zero as data points

    Use only with continuous data

    2.4 Measure of Central tendency

    Central tendency is the centre or middle of a distribution. There are many measures of central tendency; the most common are the mean, median and mode. The centre of a distribution can be defined in three ways:

    1. the point on which the distribution would balance,
    2. the value whose average absolute deviation from all the other values is minimised, and
    3. the value whose average squared deviation from all the other values is minimised.

    The mean is the point on which a distribution would balance, the median is the value that minimises the sum of absolute deviations, and the mean is also the value that minimises the sum of squared deviations.
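    These minimisation properties can be checked numerically. The sketch below (the data set is an arbitrary assumption) scans a grid of candidate centres and confirms that the sum of squared deviations is smallest at the mean, while the sum of absolute deviations is smallest at the median.

```python
# Check that the mean minimises squared deviations and the
# median minimises absolute deviations, on an assumed data set.
data = [2, 2, 3, 4, 14]

def sum_sq(c):
    return sum((x - c) ** 2 for x in data)   # sum of squared deviations

def sum_abs(c):
    return sum(abs(x - c) for x in data)     # sum of absolute deviations

mean = sum(data) / len(data)                 # 5.0
median = sorted(data)[len(data) // 2]        # 3 (middle of 5 sorted values)

# scan candidate centres 0.0, 0.1, ..., 15.0 for the minimisers
candidates = [c / 10 for c in range(0, 151)]
best_sq = min(candidates, key=sum_sq)        # should equal the mean
best_abs = min(candidates, key=sum_abs)      # should equal the median
```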

    2.4.1 Arithmetic Mean

    The arithmetic mean is the most common measure of central tendency. For a data set, the mean is the sum of the observations divided by the number of observations; basically, the mean describes the central location of the data. For a given set of data with observations x1, x2, ..., xn, the arithmetic mean is defined as:

    Mean = (x1 + x2 + ... + xn) / n

    The weighted arithmetic mean is used if one wants to combine average values from samples of the same population with different sample sizes. With weights w1, w2, ..., wn:

    Weighted Mean = (w1x1 + w2x2 + ... + wnxn) / (w1 + w2 + ... + wn)


    Example 1:

    Observations: 12, 15, 20, 22, 30
    Weights: 2, 5, 7, 6, 1

    Find the weighted mean.

    Observations (xi)   Weights (wi)   xiwi
    12                  2              24
    15                  5              75
    20                  7              140
    22                  6              132
    30                  1              30
    Total               21             401

    Weighted Mean = 401 / 21 = 19.10
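    The worked example above translates directly to code. This small sketch computes the weighted mean of Example 1, sum(wi * xi) / sum(wi):

```python
# Weighted arithmetic mean for Example 1.
observations = [12, 15, 20, 22, 30]
weights = [2, 5, 7, 6, 1]

weighted_sum = sum(w * x for w, x in zip(weights, observations))  # 401
total_weight = sum(weights)                                       # 21
weighted_mean = weighted_sum / total_weight                       # about 19.10
```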

    Advantages

    can be specified using an equation, and therefore can be manipulated algebraically

    is the most sufficient of the three estimators

    is the most efficient of the three estimators

    is unbiased

    Disadvantages

    is very sensitive to extreme scores (i.e., low resistance)

    its value is unlikely to be one of the actual data points

    requires an interval scale

    2.4.2 Median

    The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, the median is not unique, so one often takes the mean of the two middle values.

    For an odd number of observations:

    Median = the ((n + 1)/2)th observation.

    For an even number of observations:

    Median = the average of the (n/2)th and (n/2 + 1)th observations.

    Here are the sample test scores you have seen so often:

    100, 100, 99, 98, 92, 91, 91, 90, 88, 87, 87, 85, 85, 85, 80, 79, 76, 72, 67, 66, 45

    The "middle" score of this group could easily be seen as 87. Why? Exactly half of the scores lie above 87 and half lie below it. Thus, 87 is in the middle of this set of scores; this score is known as the median. In this example there are 21 scores, so the eleventh score in the ordered set is the median score (87), because ten scores lie on either side of it. If there were an even number of scores, say 20, the median would fall halfway between the tenth and eleventh scores in the ordered set; we would find it by adding those two scores together and dividing by two.
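    The odd/even rules above can be sketched as a short function. The data are the sample test scores from the text, and Python's standard statistics.median applies the same rule, so it is used here as a cross-check.

```python
# Median via the (n + 1)/2 rule (odd n) and the mean-of-middle-two rule (even n).
import statistics

scores = [100, 100, 99, 98, 92, 91, 91, 90, 88, 87, 87,
          85, 85, 85, 80, 79, 76, 72, 67, 66, 45]

def median(values):
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:                            # odd n: the ((n + 1)/2)th observation
        return s[n // 2]
    return (s[n // 2 - 1] + s[n // 2]) / 2    # even n: mean of the two middle values

m = median(scores)                            # 87, matching the text
```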

    Advantages

    is unbiased

    is unaffected by extreme scores (i.e., high resistance)

    doesn't require the use of an interval scale; as long as you can order the scores along some continuum, you can find the median

    Disadvantages

    cannot be specified using an equation, so it cannot be manipulated algebraically

    is the least sufficient of the three estimators

    is less efficient than the mean

    2.4.3 Mode

    The mode is the most frequently occurring value; it is the most common value in a distribution. The mode of 3, 4, 4, 5, 5, 5, 8 is 5. Note that the mode may be very different from the mean and the median. With continuous data such as response time measured to many decimals, the frequency of each value is one, since no two scores will be exactly the same. Therefore the mode of continuous data is normally computed from a grouped frequency distribution. Table 2.1 shows a grouped frequency distribution for the target response time data. Since the interval with the highest frequency is 600-700, the mode is the middle of that interval (650).


    Table 2.1: Grouped frequency distribution

    Range Frequency

    500-600 3

    600-700 6

    700-800 5

    800-900 5

    900-1000 0

    1000-1100 1
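    Reading the mode off a grouped table, as described above, can be sketched as follows: find the class with the highest frequency (the modal class) and take its midpoint. The data are the response-time frequencies from Table 2.1.

```python
# Mode from a grouped frequency distribution (Table 2.1):
# the modal class is the interval with the highest frequency,
# and the mode is taken as the midpoint of that interval.
table = {(500, 600): 3, (600, 700): 6, (700, 800): 5,
         (800, 900): 5, (900, 1000): 0, (1000, 1100): 1}

modal_class = max(table, key=table.get)   # interval with highest frequency
mode = sum(modal_class) / 2               # midpoint of the modal class
```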

    Advantages

    represents a number that actually occurred in the data

    represents the largest number of scores, so the probability of getting that score is greater than the probability of getting any of the other scores if an observation is chosen at random

    is unaffected by extreme scores (i.e., high resistance)

    is unbiased

    doesn't require an interval scale

    Disadvantages

    depends on how we group the data

    cannot be specified using an equation, so it cannot be manipulated algebraically

    is less sufficient than the mean

    is less efficient than the mean

    2.5 Measure of Dispersion

    Measures of dispersion provide a summary of how much the points in a data set vary, e.g. how spread out or how volatile they are.

    In measuring dispersion, it is necessary to know both the amount of variation and the degree of variation. The former is designated by absolute measures of dispersion, expressed in the denomination of the original variates, while the latter is designated by relative measures of dispersion.

    Absolute measures can be divided into positional measures based on some items of the series, such as (i) range and (ii) quartile deviation or semi-interquartile range, and measures based on all items in the series, such as (i) mean deviation and (ii) standard deviation. The relative measure in each of the above cases is called the coefficient of the respective measure. For purposes of comparison between two or more series with varying sizes or numbers of items, varying central values or differing units of measurement, only relative measures can be used.

    The following are the important methods of studying variation:

    1. Range

    2. Mean deviation

    3. Standard deviation and Variance (which is closely related to standard deviation)

    4. The Coefficient of Variation

    2.5.1 Range

    Range is the simplest of the summary measures of variation. It is also the crudest and the most prone to error. It is computed as the difference between the largest value (H) and the smallest value (L) in a data set:

    Absolute range: Range = H - L

    Relative range: Coefficient of range = (H - L) / (H + L)

    For example, for the data set {2, 2, 3, 4, 14}:

    Range = 14 - 2 = 12

    Coefficient of range = (14 - 2) / (14 + 2) = 12 / 16 = 0.75

    Example: You are given the following data: 3, 6, 9, 11. Compute the sample range.

    Solution: H = 11, L = 3, so Range = H - L = 11 - 3 = 8.
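    Both formulas can be sketched in a few lines and checked against the two worked examples above:

```python
# Range (H - L) and coefficient of range ((H - L) / (H + L)).
def value_range(data):
    return max(data) - min(data)              # H - L

def coefficient_of_range(data):
    h, low = max(data), min(data)
    return (h - low) / (h + low)              # (H - L) / (H + L)

r1 = value_range([2, 2, 3, 4, 14])            # 12
c1 = coefficient_of_range([2, 2, 3, 4, 14])   # 0.75
r2 = value_range([3, 6, 9, 11])               # 8
```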

    2.5.2 Mean Deviation

    Mean Deviation can be calculated from any value of Central Tendency, viz. Mean, Median, Mode. Accordingly, Mean

    Deviation can be of the following types:

    Mean Deviation about Mean


    Mean Deviation about Median

    Mean Deviation about Mode

Mean Deviation about Mean = Σ |Xi - XMean| / n

Properties of Mean Deviation about Mean:

The mean absolute deviation from the mean is less than or equal to the standard deviation.

The sum of the signed deviations of any data set from its mean is always zero.

The mean absolute deviation is the average of the absolute deviations from the mean and is a common measure of forecast error in time series analysis.

For example, for the data set {2, 2, 3, 4, 14}, with mean = 5:

Mean deviation about mean = ( |2 - 5| + |2 - 5| + |3 - 5| + |4 - 5| + |14 - 5| ) / 5 = (3 + 3 + 2 + 1 + 9) / 5 = 18 / 5 = 3.6
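The mean absolute deviation computed above can be checked with a short Python sketch (illustrative only):

```python
def mean_deviation(values):
    """Mean absolute deviation about the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

print(mean_deviation([2, 2, 3, 4, 14]))  # 3.6
```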

    2.5.3 Variance and standard deviation

Variance and standard deviation are the most common of all the measures of variation.

Variance is a measure of statistical dispersion, indicating how the possible values are spread around the mean. Thus, variance indicates the variability of the values. A smaller value implies a smaller variation from the mean.

The positive square root of the variance is called the Standard Deviation.

Let us consider an example with the values 4, 6, 5, 5 (total = 20, mean = 5):

Values   Xi - XMean   (Xi - XMean)²
4        -1           1
6         1           1
5         0           0
5         0           0
                      Sum = 2

Variance = 2 / 4 = 1/2

S.D. = √(1/2) ≈ 0.707

    2.5.4 The Coefficient of Variation

The Coefficient of Variation is a measure of variation expressed as a percentage of the sample mean:

CV = (S / XMean) × 100
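The variance, standard deviation and coefficient of variation for the four-value example above can be computed as follows (a minimal Python sketch using the population divisor n, as in the worked example):

```python
import math

def variance(values):
    """Population variance: mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def std_dev(values):
    """Standard deviation: positive square root of the variance."""
    return math.sqrt(variance(values))

def coeff_of_variation(values):
    """Coefficient of variation as a percentage of the mean."""
    return std_dev(values) / (sum(values) / len(values)) * 100

print(variance([4, 6, 5, 5]))            # 0.5
print(round(std_dev([4, 6, 5, 5]), 3))   # 0.707
print(round(coeff_of_variation([4, 6, 5, 5]), 2))
```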


    Chapter-II Data Analysis

    End Chapter quizzes: II

    Ques 1. Singular form of the data is

    a. Datum b. Stratum c. Date d. Data

    Ques 2. Graphical presentation of Frequency distribution can be done by

    a. Histogram b. Frequency polygons c. Frequency Curve d. All the three

    Ques 3. Which one is unaffected by extreme scores

    a. Mean b. Median c. Mode d. Range

Ques 4. Which one is not a Measure of Dispersion

    a. Range b. Mean deviation c. Histogram d. Standard deviation

Ques 5. Chaya took 7 math tests in one marking period. What is the range of her test scores?

    89, 73, 84, 91, 87, 77, 94

    a. 25 b. 21 c. 13 d. 15

Ques 6. In a crash test, 11 cars were tested to determine what impact speed was required to obtain minimal bumper damage. Find the mode of the speeds given in miles per hour below.

    24, 15, 18, 20, 18, 22, 20, 26, 18, 26, 24

    a. 18 b. 20 c. 18.6 d. 15


    Ques 7. A survey conducted by an automobile company showed the number of cars per household and the corresponding probabilities. Find the standard deviation.

    Number of cars X 1 2 3 4

    Probability P(X) 0.32 0.51 0.12 0.05

    a. 4.24 b. 0.63 c. 0.79 d. 1.9

    Ques 8. The given data shows the number of burgers sold at a bakery in the last 14 weeks. 17, 13, 18, 17, 13, 16, 18, 19, 17, 13, 16, 18, 20, 19 Find the median number of burgers sold.

    a. 18.5 b. 17 c. 18 d. 17.5

Ques 9. Histograms can be constructed for

    a. Discrete data b. Continuous data c. Both d. none

Ques 10. Which is called a positional average

    a. Mean b. Median c. Mode d. None


    Chapter-III

    Correlation Analysis

    Contents:

    3.1 Introduction

    3.2 Types of Correlation

    3.2.1 Positive and Negative

    3.2.2 Simple, partial and multiple

    3.2.3 Linear and non-linear

    3.3 Degrees of Correlation

    3.3.1 Perfect correlation

    3.3.2 Limited degrees of correlation

    3.3.3 Absence of correlation

    3.4 Methods of Determining Correlation

    3.4.1 Scatter Plot

3.4.2 Karl Pearson's coefficient of correlation

3.4.3 Spearman's Rank-correlation coefficient


    Chapter-III Correlation Analysis

    3.1 Introduction

    Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. For

    example, height and weight are related; taller people tend to be heavier than shorter people. The relationship isn't

    perfect. People of the same height vary in weight, and you can easily think of two people you know where the shorter

    one is heavier than the taller one. Nonetheless, the average weight of people 5'5'' is less than the average weight of

    people 5'6'', and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how much of

the variation in people's weights is related to their heights.

Although this correlation is fairly obvious, your data may contain unsuspected correlations. You may also suspect

    there are correlations, but don't know which are the strongest. An intelligent correlation analysis can lead to a greater

    understanding of your data.

    3.2 Types of Correlation

    I. Positive and Negative

    II. Simple, partial and multiple

    III. Linear and non-linear

    3.2.1 Positive and Negative Correlation

    Positive Correlation

    If the higher scores on X are generally paired with the higher scores on Y, and the lower scores on X are

    generally paired with the lower scores on Y, then the direction of the correlation between two variables is

    positive.

    Negative Correlation

    If the higher scores on X are generally paired with the lower scores on Y, and the lower scores on X are

    generally paired with the higher scores on Y, then the direction of the correlation between two variables is

    negative.

    Figure: 3.1 Positive, Negative and No Correlation


    3.2.2 Simple, partial and multiple

    The distinction between simple, partial and multiple Correlation is based upon the number of variables studied

    Simple Correlation

    Correlation between only two variables, e.g. Correlation between age and height, correlation between yield of

    rice and amount of rainfall in a given area are examples of Simple Correlation

    Multiple Correlation

When the correlation among three or more variables is studied simultaneously, it is called multiple

Correlation

    Partial Correlation

In this we recognize more than two variables but consider only two variables to be influencing each other, the

effect of the other influencing variables being kept constant. The correlation between the two variables, keeping the

other variables constant, is called partial correlation

    1 X1-Yield of rice

    2 X2-Amount of Rainfall

    3 X3-Amount of fertilizers

    4 X4-Type of soil

    5 X5-Advanced technologies used.

    Correlation analysis of X1, X2, X3, X4 and X5 is an example of Multiple Correlation whereas if we only

    study the relation between X1 and X2 keeping other variables constant it would be an example of Partial

    Correlation between yield of rice and amount of rainfall.


    3.2.3 Linear and non-linear

    The nature of the graph gives us the idea of the linear type of correlation between two variables. If the graph is in

    a straight line, the correlation is called a "linear correlation" and if the graph is not in a straight line, the correlation

    is non-linear or curvi-linear

    3.3 Degrees of Correlation

    3.3.1 Perfect correlation

If two variables change in the same direction and in the same proportion, the correlation between the two is perfect positive. According to Karl Pearson, the coefficient of correlation in this case is +1. On the other hand, if the variables change in opposite directions and in the same proportion, the correlation is perfect negative; its coefficient of correlation is -1. In practice we rarely come across these types of correlation.

    3.3.2 Limited degrees of correlation

If two variables are neither perfectly correlated nor completely uncorrelated, then we term the correlation as limited correlation. It may be positive, negative or zero, but it lies within the limits ±1.

    3.3.3 Absence of correlation

If two series of two variables exhibit no relation between them, or a change in one variable does not lead to a change in the other variable, then we can firmly say that there is no correlation between the two variables. In such a case the coefficient of correlation is 0.

Table: 3.1 Meaning of (r) in the Correlation Coefficient

r value    Relationship Between X and Y
r = +1.0   Strong positive: as X goes up, Y always also goes up
r = +0.5   Weak positive: as X goes up, Y tends to usually also go up
r =  0     No correlation: X and Y are not correlated
r = -0.5   Weak negative: as X goes up, Y tends to usually go down
r = -1.0   Strong negative: as X goes up, Y always goes down

    3.4 Methods of Determining Correlation

    1 Scatter Plot

    2 Karl Pearsons coefficient of correlation

    3 Spearmans Rank-correlation coefficient.


    3.4.1 Scatter Plot (Scatter diagram or dot diagram)

In this method the values of the two variables are plotted on a graph paper. One is taken along the horizontal (x-axis) and the other along the vertical (y-axis). By plotting the data, we get points (dots) on the graph which are generally scattered, hence the name Scatter Plot.

    The manner in which these points are scattered, suggest the degree and the direction of correlation. The

    degree of correlation is denoted by r and its direction is given by the signs positive and negative.

    Figure: 3.2 Positive, Negative and No Correlation

    positive correlation negative correlation no correlation

3.4.2 Karl Pearson's coefficient of correlation

It gives a numerical expression for the measure of correlation. It is denoted by r. The value of r gives the magnitude of correlation and its sign denotes the direction. It is defined as

r = Cov(x, y) / √(Var x · Var y)

Table: 3.2 Correlation coefficient between advertisement expenditure (X) and sales (Y)

X (Rs. lakhs)   Y (Rs. crore)   (X - XMean)²   (Y - YMean)²   (X - XMean)(Y - YMean)
4               16              0.1849         1.6641         0.5547
6               29              2.4649         137.124        18.4789
10              43              31.0249        661.004        143.2047
5               20              0.3249         5.7100         1.5447
1               3               11.7649        204.204        49.0147
2               4               5.9049         176.624        32.2947
3               6               2.0449         127.464        16.1447
ΣX = 31         ΣY = 121        Σ = 53.7143    Σ = 1310.794   Σ = 261.2371


    X Mean = 4.43 and Y Mean = 17.29

    Sum of squared deviations in advertisement expenditure = 53.71

    Sum of squared deviations of sales = 1310.79

    Sum of cross products (SP) = 261.24

    Calculation of the Pearson r

r = 261.24 / √((53.71)(1310.79)) = 261.24 / √70402.53

r = 261.24 / 265.33 = +0.985

    Interpretation

The magnitude of the correlation between advertisement expenditure and sales is 0.985. The direction of the relationship is positive: as advertisement expenditure increases, so do sales of the commodity.
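A quick way to verify this calculation is to compute Pearson's r directly (an illustrative Python sketch; note that using unrounded means gives r ≈ 0.982, slightly different from the hand computation above, which rounds the means to two decimal places):

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

ad_spend = [4, 6, 10, 5, 1, 2, 3]     # X (Rs. lakhs)
sales    = [16, 29, 43, 20, 3, 4, 6]  # Y (Rs. crore)
print(round(pearson_r(ad_spend, sales), 3))
```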

3.4.3 Spearman's Rank-correlation coefficient

The most precise way to compare several pairs of data is to use a statistical test - this establishes whether the correlation is really significant or if it could have been the result of chance alone.

Spearman's Rank correlation coefficient is a technique which can be used to summarise the strength and direction (negative or positive) of a relationship between two variables.

The result will always be between +1 and -1.

    Method - calculating the coefficient

    Create a table from your data.

    Rank the two data sets. Ranking is achieved by giving the ranking '1' to the biggest number in a column, '2'

    to the second biggest value and so on. The smallest value in the column will get the lowest ranking. This

    should be done for both sets of measurements.

    Tied scores are given the mean (average) rank. For example, the three tied scores of 1 euro in the example

    below are ranked fifth in order of price, but occupy three positions (fifth, sixth and seventh) in a ranking

hierarchy of ten. The mean rank in this case is calculated as (5 + 6 + 7) ÷ 3 = 6.

    Find the difference in the ranks (d): This is the difference between the ranks of the two values on each row

    of the table. The rank of the second value (price) is subtracted from the rank of the first (distance from the

    museum).


Square the differences (d²) to remove negative values, and then sum them (Σd²).

Table: 3.3 Spearman's Rank Correlation

Convenience   Distance from   Rank   Price of 50cl   Rank   Difference between   d²
Store         CAM (m)                bottle (€)             the ranks (d)
1             50              10     1.80            2      8                    64
2             175             9      1.20            3.5    5.5                  30.25
3             270             8      2.00            1      7                    49
4             375             7      1.00            6      1                    1
5             425             6      1.00            6      0                    0
6             580             5      1.20            3.5    1.5                  2.25
7             710             4      0.80            9      -5                   25
8             790             3      0.60            10     -7                   49
9             890             2      1.00            6      -4                   16
10            980             1      0.85            8      -7                   49
                                                            Σd² =                285.5

Calculate the coefficient (R) using the formula below. The answer will always be between +1.0 (a perfect positive correlation) and -1.0 (a perfect negative correlation).

When written in mathematical notation the Spearman Rank formula looks like this:

R = 1 - (6 Σd²) / (n³ - n)

Now to put all these values into the formula.

Find Σd² by adding up all the values in the d² column. In our example this is 285.5. Multiplying this by 6 gives 1713.

Now for the bottom line of the equation. The value n is the number of sites at which you took measurements, which in our example is 10. Substituting these values into n³ - n we get 1000 - 10 = 990.

We now have: R = 1 - (1713 / 990) = 1 - 1.73 = -0.73.


What does this R value of -0.73 mean?

The R value of -0.73 suggests a fairly strong negative relationship.
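The whole procedure, including the averaging of tied ranks, can be sketched in Python (an illustrative sketch reproducing the convenience-store example above):

```python
def ranks_desc(values):
    """Assign rank 1 to the largest value; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: R = 1 - 6*sum(d^2) / (n^3 - n)."""
    rx, ry = ranks_desc(x), ranks_desc(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n ** 3 - n)

distance = [50, 175, 270, 375, 425, 580, 710, 790, 890, 980]
price = [1.80, 1.20, 2.00, 1.00, 1.00, 1.20, 0.80, 0.60, 1.00, 0.85]
print(round(spearman(distance, price), 2))  # -0.73
```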


    Chapter-III Correlation

    End Chapter quizzes: III

    Ques.1. If the higher scores on X are paired with the lower scores on Y then the correlation between two variables is

    a. Positive b. Negative. c. No correlation d. Unknown

    Ques.2. The value of r gives the magnitude of correlation and sign denotes its

    a. Value b. Direction c. Both d. None

    Ques.3. When correlation between three or more variables are studied simultaneously, then it is called

    a. Simple Correlation b. Partial Correlation c. multiple Correlation d. All of the above

    Ques.4. If the graph between two variables gives a straight line, the correlation is called a

    a. linear correlation b. Curvi linear correlation c. Absence of correlation d. Simple correlation

Ques.5. If two variables change in the same direction and in the same proportion, the correlation between the two is

    a. Perfect negative b. Perfect positive c. Limited positive d. Limited Negative

Ques.6. The correlation coefficient, r = 0, implies

    a. Perfect negative b. Perfect positive c. No correlation d. Limited correlation

    Ques.7 Which of the following is a stronger correlation than -.54? a. 0 b. -.45 c. .45 d. -.67


    Ques.8 If the correlation between body weight and annual income were high and positive, we could conclude that:

    (a) High incomes cause people to eat more food. (b) Low incomes cause people to eat less food. (c) High income people tend to spend a greater proportion of their income on food than low income people,

    on average. (d) High income people tend to be heavier than low income people, on average.

    Ques.9 Men tend to marry women who are slightly younger than themselves. Suppose that every man married a woman who was exactly .5 of a year younger than themselves. Which of the following is CORRECT? (a) The correlation is -.5. (b) The correlation is .5. (c) The correlation is 1. (d) The correlation is -1.

    Ques.10. National consumer magazine reported the following correlations. The correlation between car weight and car reliability is -0.30. The correlation between car weight and annual maintenance cost is 0.20.

    Which of the following statements are true? I. Heavier cars tend to be less reliable. II. Heavier cars tend to cost more to maintain. III. Car weight is related more strongly to reliability than to maintenance cost.

    a. I only b. II only c. III only d. I, II, and III


    Chapter-IV Regression Analysis

    Contents: 4.1 Introduction 4.2 Regression Equations 4.3 How to Find the Regression Equation 4.4 Properties of the Regression coefficients 4.5 Difference between Correlation and Regression


    Chapter-IV Regression Analysis

    4.1 Introduction

    Regression analysis is a technique used for the modeling and analysis of numerical data consisting of values of a

dependent variable (response variable) and of one or more independent variables (explanatory variables). The

    dependent variable in the regression equation is modeled as a function of the independent variables,

    corresponding parameters ("constants"), and an error term. The error term is treated as a random variable. It

    represents unexplained variation in the dependent variable. The parameters are estimated so as to give a "best

    fit" of the data. Most commonly the best fit is evaluated by using the least squares method, but other criteria have

    also been used.

    There are two types of variables in Regression Analysis.

    1 Dependent variable

    2 Independent variable

The dependent variable is also known as the regressed, predicted or explained variable. The independent variable is also known as the regressor, predictor or explanatory variable.

    Simple regression is used to examine the relationship between one dependent and one independent variable. After

    performing an analysis, the regression statistics can be used to predict the dependent variable when the independent

    variable is known. Regression goes beyond correlation by adding prediction capabilities.

    The regression line (known as the least squares line) is a plot of the expected value of the dependent variable for

    all values of the independent variable. Technically, it is the line that "minimizes the squared residuals". The

    regression line is the one that best fits the data on a scatterplot.

In the regression equation, y is the dependent variable and x is the independent variable. Here are three equivalent ways to mathematically describe a linear regression model.

1 y = intercept + (slope × x) + error

2 y = constant + (coefficient × x) + error

3 y = a + b·x + e

    The slope quantifies the steepness of the line. It equals the change in Y for each unit change in X. It is expressed in

    the units of the Y-axis divided by the units of the X-axis. If the slope is positive, Y increases as X increases. If the

    slope is negative, Y decreases as X increases.


    Figure: 4.1 Regression line

    The Y intercept is the Y value of the line when X equals zero. It defines the elevation of the line.

    For two variables X and Y, we will have two regression lines and they show mutual relationship between two

    variables. The regression line of Y on X gives the most probable estimate of the values of Y for given values of X

whereas the regression line of X on Y gives the most probable estimate of the values of X for given values of Y.

Only one regression line: in case of perfect correlation (r = ±1), both lines of regression coincide and we get only one line.

    4.2 Regression Equations

    Regression Equations are algebraic expressions of the regression lines.

    Regression Equation of Y on X

    Y=a +b X

    According to the principle of least squares, the normal equations for estimating a and b are

ΣY = Na + bΣX

ΣXY = aΣX + bΣX²

    Regression Equation of X on Y

    X=a +b Y

    According to the principle of least squares, the normal equations for estimating a and b are

ΣX = Na + bΣY

ΣXY = aΣY + bΣY²

    Regression Equation from Deviations taken from Arithmetic means of X and Y

Y - YMean = byx (X - XMean)

byx is the regression coefficient of Y on X:

byx = Σxy / Σx²

where x = X - XMean and y = Y - YMean are deviations from the means.


    4.3 How to Find the Regression Equation

    Five randomly selected students took a math aptitude test before they began their statistics course. The Statistics

    Department has three questions.

    i. What linear regression equation best predicts statistics performance, based on math aptitude scores?

    ii. If a student made an 80 on the aptitude test, what grade would we expect her to make in statistics?

    iii. How well does the regression equation fit the data?

    In the table below, the xi column shows scores on the aptitude test. Similarly, the yi column shows statistics grades.

    The last two rows show sums and mean scores that we will use to conduct the regression analysis.

Table: 4.1

Student   xi    yi    (xi - xMean)   (yi - yMean)   (xi - xMean)²   (yi - yMean)²   (xi - xMean)(yi - yMean)
1         95    85    17             8              289             64              136
2         85    95    7              18             49              324             126
3         80    70    2              -7             4               49              -14
4         70    65    -8             -12            64              144             96
5         60    70    -18            -7             324             49              126
Sum       390   385                                 730             630             470
Mean      78    77

The regression equation is a linear equation of the form:

y - ymean = byx (x - xmean)

byx is the regression coefficient of y on x:

byx = Σxy / Σx² = 470 / 730 = 0.643836

y - 77 = 0.643836 (x - 78)

y = 0.643836 x + 26.78082

    Once you have the regression equation, using it is a snap. Choose a value for the independent variable (x), perform

    the computation, and you have an estimated value (y) for the dependent variable.


    In our example, the independent variable is the student's score on the aptitude test. The dependent variable

    is the student's statistics grade. If a student made an 80 on the aptitude test, the estimated statistics grade would be:

y = 0.643836 x + 26.78082 = 0.643836 × 80 + 26.78082 = 51.51 + 26.78 = 78.29
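The least-squares fit and the prediction above can be reproduced with a short Python sketch (illustrative only):

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

aptitude = [95, 85, 80, 70, 60]
grades   = [85, 95, 70, 65, 70]
a, b = fit_line(aptitude, grades)
print(round(a, 5), round(b, 6))  # intercept 26.78082, slope 0.643836
print(round(a + b * 80, 2))      # predicted statistics grade for a score of 80: 78.29
```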

    4.4 Properties of the Regression coefficients

1. The correlation coefficient is the geometric mean of the regression coefficients:

r² = byx × bxy

2. If one of the regression coefficients is greater than unity, the other must be less than unity, since byx × bxy = r² ≤ 1.

3. Both the regression coefficients will have the same sign.

4. The correlation coefficient will have the same sign as the regression coefficients.

5. The arithmetic mean of the regression coefficients is greater than or equal to the correlation coefficient.
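These properties can be checked numerically on the aptitude/grade data from section 4.3 (a small illustrative sketch):

```python
import math

x = [95, 85, 80, 70, 60]   # aptitude scores
y = [85, 95, 70, 65, 70]   # statistics grades
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

byx = sxy / sxx                     # regression coefficient of y on x
bxy = sxy / syy                     # regression coefficient of x on y
r = sxy / math.sqrt(sxx * syy)      # correlation coefficient

print(abs(byx * bxy - r ** 2) < 1e-12)   # property 1: r^2 = byx * bxy
print((byx + bxy) / 2 >= abs(r))         # property 5: AM of coefficients >= |r|
```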

    4.5 Difference between Correlation and Regression

    The difference between regression and correlation needs to be emphasised. Both methods attempt to describe the

    association between two (or more) variables, and are often confused by students and professional scientists alike!

1 Correlation makes no a priori assumption as to whether one variable is dependent on the other(s) and is not concerned with the functional relationship between the variables; instead it gives an estimate of the degree of association between them. In fact, correlation analysis tests for interdependence of the variables.

2 Regression attempts to describe the dependence of a variable on one (or more) explanatory variables; it implicitly assumes that there is a one-way causal effect from the explanatory variable(s) to the response variable, regardless of whether the path of effect is direct or indirect.


    Chapter-IV Regression Analysis End Chapter quizzes: IV

    Ques.1.In Regression Analysis the dependent variable is also known as

    a. Regressed variable b. Regressor variable c. Random variable d. All of the above

    Ques.2. Simple regression is used to examine the relationship between

    a. two dependent variables b. two independent variables c. one dependent and one independent variable d. two dependent and one independent variable

    Ques.3. In Regression Analysis, one regression line is obtained in case if

a. r = +1 b. r = -1 c. r = ±1 d. r = 0

Ques.4. byx is the regression coefficient of Y on X

a. byx = Σxy / Σx²

b. byx = Σxy / Σy²

c. byx = Σy² / Σx²

d. byx = Σx² / Σxy

    Ques.5. If one of the regression coefficients is greater than unity, the other must be

    a. greater than unity b. less than unity c. equals to unity d. Not known

    Ques.6. Both the regression coefficients will have

    a. same sign b. opposite sign c. Not known d. None

Ques.7 If y is the dependent variable and x is the independent variable, then the linear regression model will be

a. x = a + b y + e b. y = b x c. x = b y d. y = a + b x + e

Ques.8. The arithmetic mean of the regression coefficients is ----------- than the correlation coefficient

    a. Smaller b. Greater c. Equals to d. None

    Ques.9 A regression equation was computed to be Y = 35 + 6X. The value of 35 indicates that:

a. An increase in one unit of X will result in an increase of 35 in Y b. The coefficient of correlation is 35 c. The coefficient of determination is 35 d. The regression line crosses the Y-axis at 35

Ques.10. After performing an analysis, the regression statistics can be used to predict the dependent variable when the ------------ variable is known

    a. Independent b. dependent c. correlation coefficient d. All of the above


    Chapter-V

    Probability & Probability distribution

    Contents:

    5.1 Introduction

    5.1.1 Definition of Probability:

    5.1.2. Axioms of Probability

    5.1.3. How to Compute Probability:

5.2 Addition Law of Probability

    5.3 Multiplication Law of Probability

    5.4 Probability Distribution

    5.5 Binomial Distribution

    5.5.1 Mean of Binomial Distribution

    5.6. Poisson Distribution

    5.6.1 Mean and variance of Poisson distribution

    5.7. Normal Distribution or Normal Curve

    5.7.1. Characteristics of Normal Distribution

    5.7.2. Empirical Rule



    Chapter-V Probability & Probability distribution

    5.1 Introduction

    Mathematically, the probability that an event will occur is expressed as a number between 0 and 1. Notationally, the

    probability of event A is represented by P (A).

    If P (A) equals zero, there is no chance that the event A will occur.

    If P (A) is close to zero, there is little likelihood that event A will occur.

If P(A) is close to one, there is a strong chance that event A will occur.

    If P (A) equals one, event A will definitely occur.

The sum of the probabilities of all possible outcomes in a statistical experiment is equal to one. This means, for example, that if an experiment can have three possible outcomes (A, B, and C), then

P(A) + P(B) + P(C) = 1.

5.1.1 Definition of Probability

Let an event A happen in m ways and fail in n ways, where all ways are equally likely to occur. Then the probability of the happening of event A is defined as

P(A) = m / (m + n)

From the above, it may be noted that P(A) = p is such that 0 ≤ p ≤ 1. P(Ā) = q = 1 - p is the probability of the complementary event, and 0 ≤ q ≤ 1.

Associated with each event A in the sample space S is the probability of A, P(A).

    5.1.2. Axioms of Probability

Axioms:

1. P(A) ≥ 0

2. P(S) = 1, where S is the sample space

3. P(A ∪ B) = P(A) + P(B) if A and B are mutually exclusive

e.g., P(ace or king) = P(ace) + P(king) = 1/13 + 1/13 = 2/13.

Theorems about probability can be proved using these axioms, and these theorems can be used in probability calculations:

P(A) = 1 - P(Ā)

P(A ∪ B) = P(A) + P(B) - P(A ∩ B) (for events that are not mutually exclusive)

e.g. P(ace or black) = P(ace) + P(black) - P(ace and black) = 4/52 + 26/52 - 2/52 = 28/52 = 7/13

Some More Definitions:

    Here we define and explain certain term which are used frequently.

    (i) Trial and Event: Let an experiment be repeated under essentially the same conditions and let it result in any one of the several

    possible outcomes. Then the experiment is called a trial and the possible outcomes are known as event or cases. In a throw of a

    coin the turning of head or tail is called an event and the throwing of a coin is called a trial.

    (ii) Exhaustive events: The total number of all possible outcomes in any trial in known as exhaustive events or exhaustive cases.

    In a throw of a coin, the possible outcomes are head and tail i.e., these are two exhaustive cases. In the experiment of rolling a

    die, the outcomes 1,2,3,4,5,6(six cases) are exhaustive.

(iii) Favourable events: The events which entail the required happening are said to be favourable events. For example, in a throw of a die, for the event "an even number", the outcomes 2, 4 and 6 are favourable events.

    (iv) Mutually exclusive events: Two events are known as mutually exclusive when the occurrence of one of them, excludes the

    occurrence of the other, e.g. while tossing a coin, we either get a head or tail but not both.

(v) Independent event: Two events are independent when the actual happening of one does not influence in any way the happening of the other. In throwing two coins at a time, the outcome of one is independent of the other. But in case a card is drawn from a pack of well-shuffled cards and is not replaced, then the second draw of a card is dependent on the first draw. The second draw is then a dependent event.

(vi) Equally likely events: Two events are said to be equally likely if neither can be expected to occur in preference to the other. For example, in a throw of a coin the two cases, head and tail, are equally likely to come.

    (vii) Conditional Probability: The probability of happening an event A, such that event B has happened, is called the conditional

    probability of happening of A on the condition that B has already happened. It is usually denoted by P (A/B).

    5.1.3. How to Compute Probability (Equally Likely Outcomes)

    Sometimes, a statistical experiment can have n possible outcomes, each of which is equally likely. Suppose a subset of r

outcomes is classified as "successful" outcomes.

    The probability that the experiment results in a successful outcome (S) is:

    P(S) = (Number of successful outcomes) / (Total number of equally likely outcomes) = r / n

    Consider the following experiment. An urn has 10 marbles. Two marbles are red, three are green, and five are blue. If an

    experimenter randomly selects 1 marble from the urn, what is the probability that it will be green?

    In this experiment, there are 10 equally likely outcomes, three of which are green marbles. Therefore, the probability of choosing

    a green marble is 3/10 or 0.30.

    The probability of an event refers to the likelihood that the event will occur
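The marble example can be expressed directly in code (a minimal sketch using exact fractions):

```python
from fractions import Fraction

def prob(successes, total):
    """P(S) = successful outcomes / total equally likely outcomes."""
    return Fraction(successes, total)

# Urn with 10 marbles: 2 red, 3 green, 5 blue.
p_green = prob(3, 10)
print(p_green, float(p_green))  # 3/10 0.3
```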

    5.2. Addition Law of Probability

If P1, P2, P3, …, Pn are the probabilities of n mutually exclusive events E1, E2, E3, …, En respectively, then the probability p that one of these events will happen is given by

p = P1 + P2 + P3 + … + Pn

i.e. p = P(E1 ∪ E2 ∪ E3 ∪ … ∪ En) = P(E1) + P(E2) + P(E3) + … + P(En)

    5.3 Multiplication Law of Probability

If there are two events E1 and E2 whose respective probabilities are known, then the probability that both will happen simultaneously is the product of the probability of one and the conditional probability of the other, given that the first has occurred:

P(AB) = P(A) × P(B/A)

Note:

(i) If E1 and E2 are independent events, then P(E2/E1) is the same as P(E2), so P(E1E2) = P(E1) · P(E2).

(ii) If P1, P2, P3, …, Pn are the probabilities of independent events E1, E2, E3, …, En respectively, then the probability p that all the events happen simultaneously is given by

p = P1 · P2 · P3 … Pn


    (iii) If P is the probability that an event will happen in one trial, then the probability that it will happen in a succession of r trials

    is

    = P.P.P..P = Pr

    (iv) If P1, P2, P3, Pn be the probabilities that certain events E1, E2, E3, En happen, then the probability they

    do not happen at all i.e., they all fail, is q1. q2. q3 qn = (1- p1). (1-p2). (1-pn) Hence the probability in which at least one of these events must happen is given by 1-q1, q2, q3, qn = 1 {( 1- p1). (1-p2). (1-pn)}
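Note (iv) can be illustrated with a small Python sketch; the three probabilities used here are made-up values for illustration:

```python
# Probabilities of three independent events (hypothetical values)
probs = [0.5, 0.4, 0.2]

# Probability that none of the events happens: product of (1 - p_i)
p_none = 1.0
for p in probs:
    p_none *= (1 - p)

# Probability that at least one event happens
p_at_least_one = 1 - p_none
print(round(p_none, 2))          # 0.24
print(round(p_at_least_one, 2))  # 0.76
```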

    5.4 Probability Distribution

    When a variable X takes the value xi with probability pi (i = 1, 2, 3, ..., n), then X is called a random variable or stochastic

    variable. The values x1, x2, x3, ..., xn of the random variable X with their respective probabilities p1, p2, p3, ...,

    pn constitute a probability distribution of the variable X.

    Mean or Expected Value and Variance: Let a random variable X assume the values x1, x2, x3, ..., xn with respective

    probabilities p1, p2, p3, ..., pn; then the mean or expected value of X is defined as

    E(X) = μ = p1x1 + p2x2 + p3x3 + ... + pnxn = Σ pixi.

    The variance of the random variable X is given by

    Var(X) = E[(X - μ)²] = Σ pi(xi - μ)²

    This can be simplified to the more convenient form

    Var(X) = Σ pixi² - μ² = E(X²) - [E(X)]²
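As a quick check of these formulas, here is a minimal Python sketch (the distribution below is a made-up example):

```python
# A hypothetical discrete distribution: values and their probabilities
xs = [0, 1, 2, 3]
ps = [0.1, 0.2, 0.3, 0.4]

# Mean: E(X) = sum of p_i * x_i
mean = sum(p * x for p, x in zip(ps, xs))

# Variance: E(X^2) - [E(X)]^2
ex2 = sum(p * x * x for p, x in zip(ps, xs))
variance = ex2 - mean ** 2

print(round(mean, 2))      # 2.0
print(round(variance, 2))  # 1.0
```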

    5.5 Binomial Distribution

    A random variable X which takes the values 0, 1, 2, ..., n is said to follow a binomial distribution

    if its probability function is given by

    P(X = r) = p(r) = nCr p^r q^(n-r),  r = 0, 1, 2, ..., n,

    where p, q > 0 such that p + q = 1.

    Let the probability of the happening of an event A in one trial be p and the probability of its not

    happening be 1 - p = q.

    We assume that there are n independent trials, in which the event A happens r times and fails to happen

    n - r times. One such outcome may be shown as

    AAA...A  ĀĀ...Ā        ...(1)
    (r times) (n - r times)

    where A indicates the happening of the event and Ā its failure, with P(A) = p and P(Ā) = q. Since the trials are independent, the arrangement (1) has probability

    p.p...p x q.q...q = p^r q^(n-r)

    Clearly (1) is merely one order of arranging r A's and (n - r) Ā's, and

    the number of different arrangements of r A's and (n - r) Ā's is nCr. Therefore,

    Probability of the event happening exactly r times = nCr p^r q^(n-r),  (r = 0, 1, 2, ..., n)

    = (r + 1)th term in the expansion of (q + p)^n.

    If r = 0, probability of the event happening 0 times = nC0 q^n p^0 = q^n

    If r = 1, probability of the event happening 1 time = nC1 q^(n-1) p

    If r = 2, probability of the event happening 2 times = nC2 q^(n-2) p^2

    If r = 3, probability of the event happening 3 times = nC3 q^(n-3) p^3

    and so on.

    These terms are clearly the successive terms in the expansion of (q + p)^n.

    Hence it is called the binomial distribution.

    Condition for the Applicability of Binomial Distribution:

    While using the formula of the binomial distribution in solving any problem, the following conditions must be satisfied:

    (a) There should be a finite number of trials.

    (b) The trials do not depend on each other.

    (c) Each trial should have only two possible outcomes, either a success or a failure.

    (d) The probability of success or failure is the same for all the trials.

    5.5.1 Mean of Binomial Distribution

    If X is a binomial variate with parameters n and p, then

    P(X = r) = p(r) = nCr p^r q^(n-r),  r = 0, 1, 2, ..., n.

    The mean of the binomial distribution is E(X) = np, and its variance is npq.

    Example: The probability that a pen manufactured by a company will be defective is 1/10. If 12 such pens are manufactured, find

    the probability that (i) exactly two will be defective, (ii) at least two will be defective, (iii) none will be defective.

    Solution: The probability of a defective pen is 1/10 = 0.1.

    The probability of a non-defective pen is 1 - 0.1 = 0.9. Here n = 12.

    (i) The probability that exactly two will be defective

    = 12C2 (0.1)^2 (0.9)^10 = 0.2301

    (ii) The probability that at least two will be defective

    = 1 - (probability that either none or one is defective)

    = 1 - [12C0 (0.9)^12 + 12C1 (0.1)(0.9)^11] = 0.3410

    (iii) The probability that none will be defective

    = 12C0 (0.9)^12 = 0.2824
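The three answers can be verified with a short Python sketch that applies the binomial formula directly:

```python
from math import comb

n, p = 12, 0.1   # 12 pens, defect probability 0.1
q = 1 - p

def binom_pmf(r):
    """P(X = r) = nCr * p^r * q^(n-r)"""
    return comb(n, r) * p**r * q**(n - r)

p_exactly_two = binom_pmf(2)
p_at_least_two = 1 - binom_pmf(0) - binom_pmf(1)
p_none = binom_pmf(0)

print(round(p_exactly_two, 4))   # 0.2301
print(round(p_at_least_two, 4))  # 0.341
print(round(p_none, 4))          # 0.2824
```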

    Example: A die is thrown 8 times and it is required to find the probability that 3 will show (i) exactly 2 times, (ii) at least seven

    times, (iii) at least once.

    Solution: The probability of throwing 3 in a single trial = p = 1/6

    The probability of not throwing 3 in a single trial = q = 5/6

    (i) P(getting 3 exactly 2 times) = 8C2 q^6 p^2 = 28 (5/6)^6 (1/6)^2 ≈ 0.2605

    (ii) P(getting 3 at least seven times) = P(getting 3 seven or eight times)

    = P(7) + P(8) = 8C7 q^1 p^7 + 8C8 q^0 p^8 = (40 + 1)/6^8 ≈ 0.0000244

    (iii) P(getting 3 at least once)

    = P(getting 3 one or two or three or four or five or six or seven or eight times)

    = P(1) + P(2) + P(3) + P(4) + P(5) + P(6) + P(7) + P(8)

    = 1 - P(getting 3 zero times) = 1 - 8C0 q^8 p^0 = 1 - (5/6)^8 ≈ 0.7674
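The same die probabilities can be computed exactly in Python with fractions:

```python
from math import comb
from fractions import Fraction

n = 8
p = Fraction(1, 6)  # probability of throwing a 3
q = 1 - p

def pmf(r):
    """P(exactly r threes in 8 throws) = 8Cr * p^r * q^(8-r)"""
    return comb(n, r) * p**r * q**(n - r)

p_exactly_two = pmf(2)
p_at_least_seven = pmf(7) + pmf(8)
p_at_least_once = 1 - pmf(0)

print(round(float(p_exactly_two), 4))  # 0.2605
print(p_at_least_seven)                # 41/1679616
print(round(float(p_at_least_once), 4))  # 0.7674
```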

    5.6. Poisson Distribution

    The Poisson distribution is generally used when measuring the number of occurrences of something (# of successes) over an

    interval or time period.

    The assumptions of a Poisson probability distribution are:


    The probability of the occurrence of an event is constant for all subintervals.

    There can be no more than one occurrence in each subinterval.

    Occurrences are independent; that is, the number of occurrences in any non-overlapping intervals is independent of

    one another.

    The random variable X is said to follow the Poisson probability distribution if it has the probability function:

    P(X = x) = (e^(-λ) λ^x) / x!,  x = 0, 1, 2, ...

    where λ is the mean number of occurrences per interval.

    5.6.1 The mean and variance of the Poisson probability distribution are:

    μx = E(X) = λ and

    σx² = E[(X - μx)²] = λ

    The Poisson probability distribution is an important discrete probability distribution for a number of applications, including:

    1. The number of failures in a large computer system during a given day

    2. The number of delivery trucks to arrive at a central warehouse in an hour

    3. The number of customers to arrive for flights during each 15-minute time interval from 3:00 PM to 6:00 PM on weekdays

    4. The number of customers to arrive at a checkout aisle in your local grocery store during a particular time interval

    Example: On an average Friday, a waitress gets no tip from 5 customers. Find the probability that she will get no tip from 7

    customers this Friday.

    The waitress averages 5 customers that leave no tip on Fridays: λ = 5.

    Random variable X: the number of customers that leave her no tip this Friday.

    We are interested in P(X = 7) = (e^(-5) 5^7) / 7! = 0.1044.

    So, the probability that 7 customers will leave no tip this Friday is 0.1044.
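This Poisson calculation can be reproduced with a few lines of Python:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

# lambda = 5 customers leaving no tip on an average Friday
p_seven = poisson_pmf(7, 5)
print(round(p_seven, 4))  # 0.1044
```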

    5.7. Normal Distribution or Normal Curve:

    The normal distribution is probably the most important and widely used continuous distribution. A variable that follows it is called a normal random

    variable, and its probability distribution is called a normal distribution. The following are the characteristics of the normal

    distribution:


    5.7.1. Characteristics of the Normal Distribution:

    1. It is bell shaped and is symmetrical about its mean.

    2. It is asymptotic to the horizontal axis, i.e., it extends indefinitely in either direction from the mean.

    3. It is a continuous distribution.

    4. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a different normal distribution.

    Thus, the normal distribution is completely described by two parameters: mean and standard deviation.

    5. Total area under the curve sums to 1, i.e., the area of the distribution on each side of the mean is 0.5.

    6. It is unimodal, i.e., values mound up only in the center of the curve

    A normal distribution in a variate X with mean μ and variance σ² is a statistical distribution with probability density function

    f(x) = (1 / (σ√(2π))) e^(-(x - μ)² / (2σ²))

    on the domain x ∈ (-∞, ∞).

    The standard normal distribution is obtained by taking μ = 0 and σ² = 1 in a general normal

    distribution. An arbitrary normal distribution can be converted to a standard normal distribution by changing variables to z = (x - μ)/σ, so dz = dx/σ, yielding

    f(z) = (1 / √(2π)) e^(-z²/2)

    5.7.2. Empirical Rule

    All normal density curves satisfy the following property, which is often referred to as the Empirical Rule:

    68% of the observations fall within 1 standard deviation of the mean, that is, between μ - σ and μ + σ.

    95% of the observations fall within 2 standard deviations of the mean, that is, between μ - 2σ and μ + 2σ.

    99.7% of the observations fall within 3 standard deviations of the mean, that is, between μ - 3σ and μ + 3σ.

    Thus, for a normal distribution, almost all values lie within 3 standard deviations of the mean.
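These percentages can be checked numerically using the standard normal CDF, which Python's math.erf expresses via Φ(z) = ½(1 + erf(z/√2)):

```python
from math import erf, sqrt

def within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal distribution."""
    # By symmetry this equals erf(k / sqrt(2))
    return erf(k / sqrt(2))

print(round(within(1), 4))  # 0.6827
print(round(within(2), 4))  # 0.9545
print(round(within(3), 4))  # 0.9973
```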

    Figure: 5.1 Normal Distribution or Normal Curve


    Example

    The total weight of 8 people chosen at random follows a normal distribution with a mean of 550kg and a standard deviation of

    150kg.

    What's the probability that the total weight of 8 people exceeds 600kg?

    First sketch a diagram.

    Figure: 5.2 Normal area curve

    The mean is 550kg and we are interested in the area that is greater than 600kg.

    z = (x - μ) / σ

    Here x = 600kg,

    μ, the mean = 550kg

    σ, the standard deviation = 150kg

    z = (600 - 550) / 150

    z = 50 / 150

    z = 0.33

    Table: 5.1


    Look in the table down the left hand column for z = 0.3,

    and across under 0.03.

    The number in the table is the tail area for z = 0.33, which is 0.3707.

    This is the probability that the weight will exceed 600kg.

    Our answer is

    "The probability that the total weight of 8 people exceeds 600kg is 0.37 correct to 2

    figures."
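The table lookup can be reproduced with math.erf; using z rounded to 0.33, as in the table, gives the same tail area:

```python
from math import erf, sqrt

mean, sd = 550, 150
x = 600

z = round((x - mean) / sd, 2)  # 0.33, as read from the z-table

# Tail area P(Z > z) for the standard normal distribution
tail = 0.5 - 0.5 * erf(z / sqrt(2))
print(z)               # 0.33
print(round(tail, 4))  # 0.3707
```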


    Chapter-V Probability & Probability distribution

    End Chapter quizzes : V

    Ques.1.A coin is tossed three times. What is the probability that it lands on heads exactly one time?

    a. 0.125 b. 0.250 c. 0.333 d. 0.375

    Ques.2.P(A U B) is the probability that __________ will occur

    a. A b. B c. A and B d. A or B or both

    Ques.3. The events in an experiment are _____________ if only one can occur at a time

    a. mutually exclusive b. non-mutually exclusive c. mutually inclusive d. independent

    Ques.4. A die is rolled, find the probability that an even number is obtained.

    a. 1/2 b. 1/3 c. 1/4 d. 1/5

    Ques.5. Which of these numbers cannot be a probability?

    a. 0.00001 b. 0.5 c. 1.001 d. 0

    Ques.6. For the normal distribution, the mean plus and minus 1.96 standard deviations will include what

    percent of the observations?

    a. 80%

    b. 84%

    c. 90%

    d. 95%

    Ques.7. Normal distribution is a

    a. Discrete distribution b. Continuous distribution c. Both d. None


    Ques.8. Mean Of Binomial Distribution is given by

    a. p b. np c. npq d. n

    Ques.9. The probability of happening an event A, such that event B has happened, is called

    a. disjoint probability b. independent probability c. conditional probability d. dependent probability

    Ques.10. If A and B are mutually exclusive, then P (A U B) =

    a. P (A) b. P (A) + P (B) c. P (B) d. P (A) + P (B) - P (A ∩ B)


    Chapter-VI Time Series

    Contents: 6.1 Introduction 6.1.1. Role of time Series

    6.2. Components of a time series

    6.2.1 Secular Trend 6.2.2 Seasonal variation 6.2.3 Cyclical variation 6.2.4 Irregular variation

    6.3. Measurement of Trends

    6.3.1 Freehand method 6.3.2 The method of semi-averages 6.3.3 The method of moving averages 6.3.4 The method of curve fitting by the Principle of Least Squares

    6.4 Mathematical Models

    6.4.1 Additive model 6.4.2 Multiplicative model 6.4.3 Mixed models


    6.1 Introduction

    Time is money in business activities, and the dynamic decision technologies presented here

    have become necessary tools for a wide range of managerial decisions in which time and money

    are directly related. In making strategic decisions under uncertainty, we all make forecasts. We may not think that we

    are forecasting, but our choices are directed by our anticipation of the results of our actions or inactions.

    Indecision and delays are the parents of failure. This chapter is intended to help managers and administrators do a better

    job of anticipating, and hence a better job of managing uncertainty, by using effective forecasting and other predictive

    techniques.

    A time series is a chronological sequence of observations on a particular variable. Usually the observations are taken

    at regular intervals (days, months, years), but the sampling could be irregular.

    A time series analysis consists of two steps:

    (1) building a model that represents a time series,

    (2) using the model to predict (forecast) future values.

    The time series can be represented as a curve that evolves over time. Forecasting the time series means that we

    extend the historical values into the future, where the measurements are not available yet.

    There are some subtleties in the definition of a time-series forecast. For example, the historical data might be daily

    sales, but you need monthly forecasts. Grouping the values according to a certain period (e.g., month) is called

    time-series aggregation.
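A minimal Python sketch of such aggregation, grouping hypothetical daily sales into monthly totals (the figures are made up for illustration):

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales observations (date, amount)
daily_sales = [
    (date(2024, 1, 5), 120),
    (date(2024, 1, 20), 80),
    (date(2024, 2, 3), 150),
    (date(2024, 2, 14), 50),
]

# Group daily values by (year, month) to obtain a monthly series
monthly = defaultdict(int)
for d, amount in daily_sales:
    monthly[(d.year, d.month)] += amount

print(dict(monthly))  # {(2024, 1): 200, (2024, 2): 200}
```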

    The following are a few examples of time series data:

    1. Profits earned by a company for each of the past five years.

    2. Workers employed by a company for each of the past 15 years.

    3. Number of students registered for the MBA programme of an institute for each of the past five years.

    4. The weekly wholesale price index for each of the past 30 weeks.

    5. Number of fatal road accidents in Delhi for each day for the past two months.

    6.1.1. Role of time Series

    1. A time series analysis enables one to study movements such as cycles that fluctuate around the trend. Knowledge of the cyclical pattern in certain series of data will be helpful in making generalisations about the concerned business or industry.

    2. The analysis of a time series enables us to understand past behaviour or performance. We can know how the data have changed over time and find out the probable reasons responsible for such changes. If the past performance, say of a company, has been poor, it can take corrective measures to arrest the poor performance.

    3. A time series analysis helps directly in business planning. A firm can know the long-term trend in the sale of its products. It can find out at what rate sales have been increasing over the years. This may help it in making projections of its sales for the next few years and in planning the procurement of raw material, equipment and manpower accordingly.

    4. A time series analysis enables one to make meaningful comparisons between two or more series regarding the rate or type of growth. For example, growth in consumption at the national level can be compared with that in the national income over a specified period. Such comparisons are of considerable importance to business and industry.

    5. A time series analysis helps in evaluating current accomplishments. The actual performance can be compared with the expected performance and the causes of variation analysed. For example, if we know how much of the effect on business is due to seasonality, we may devise ways and means of ironing out the seasonal influence or decreasing it by producing commodities with complementary seasons.

    6.2. Components of a time series

    1 Secular Trend - the smooth long term direction of a time series

    2 Seasonal Variation - patterns of change in a time series within a year which tend to repeat each year

    3 Cyclical Variation - the rise and fall of a time series over periods longer than one year

    4 Irregular Variation - classified into:

    Episodic - unpredictable but identifiable

    Residual - also called chance fluctuation and unidentifiable

    6.2.1 Secular Trend

    With the first type of change, secular trend, the value of the variable tends to increase or decrease over a long period

    of time. The steady increase in the cost of living recorded by the Consumer Price Index is an example of secular

    trend. From year to year, the cost of living varies a great deal, but if we examine a long-term period, we

    see that the trend is toward a steady increase. Figure shows a secular trend in an increasing but fluctuating time

    series.

    Figure: 6.1 Secular trend


    6.2.2 Seasonal variation

    The second kind of change in time-series data is seasonal variation. As we might expect from the name, seasonal

    variation involves patterns of change within a year that tend to be repeated from year to year. For example, a

    physician can expect a substantial increase in the number of flu cases every winter and of poison ivy every summer.

    Since these are regular patterns, they are useful in forecasting the future. In Figure 6.2, we see a seasonal variation.

    Notice how it peaks in the fourth quarter of each year.

    1 Sales of ice cream will be higher in summer than in winter, and sales of overcoats will be higher in autumn

    than in spring.

    2 Shops might expect higher sales shortly before Christmas or in their winter and summer sales.

    3 Sales might be higher on Friday and Saturday than on Monday.

    4 The telephone network may be heavily used at a certain times of the day (such as mid-morning and mid-

    afternoon) and much less used at other times (such as in the middle of the night)

    Figure: 6.2 Seasonal variation

    [The figure shows quarterly sales of Wildcat sailboats (in millions of dollars) from July 2001 to July 2004, with a repeating within-year seasonal pattern superimposed on a linear trend.]

    6.2.3 Cyclical variation

    The third type of variation seen in a time series is cyclical fluctuation. The most common example of cyclical

    fluctuation is the business cycle. Over time, there are years when the business cycle hits a peak above the trend line.

    At other times, business activity is likely to slump, hitting a low point below the trend line. The time between hitting

    peaks or falling to low points is at least 1 year, and it can be as many as 15 or 20 years. Figure 6.3 illustrates a

    typical pattern of cyclical fluctuation above and below a secular trend line. Note that the cyclical movements do not

    follow any regular pattern but move in a somewhat unpredictable manner.

    Figure: 6.3 cyclical variation

    [The figure plots cyclical activity against time, cycling through the phases Z1 (decline), V1 (depression), Z2 (improvement) and P1, P2 (prosperity).]

    Figure: 6.4 Business Cycle

    [The figure depicts the four phases of the business cycle: prosperity, decline, depression, and improvement.]


    Figure: 6.5 Cyclical Components

    [The figure plots a cyclical component Ct, ranging from about 0.90 to 1.15, over periods 1 to 8 (1997 to 2003), from the start to the end of a cycle.]

    These are medium-term changes in results caused by circumstances which repeat in cycles. In business, cyclical

    variations are commonly associated with economic cycles: successive booms and slumps in the economy.

    Economic cycles may last a few years. Cyclical variations are longer term than seasonal variations.

    6.2.4 Irregular variation

    Irregular variation is the fourth type of change in time-series analysis. In many situations, the value of a variable may be

    completely unpredictable, changing in a random manner; irregular variations describe such movements. The effects of the Middle East conflict in 1973 and the Iraqi situation in 1990 on gasoline

    prices in the United States are examples of irregular variation. Figure 6.6 illustrates irregular variation.

    Figure: 6.6 Irregular variation