deloitte supply chain analytics workbook

Deloitte Consulting Advanced Analytics Group Presents: Supply Chain Analytics Unit 1 Workbook

Contents

Welcome to Supply Chain Analytics Unit 1

How to Use this Workbook

Section 1 – Fundamentals of Operations Research (Part I)

Section 2 – Network Problems (Part I)

Section 3 – Applied Statistics (Part I)

Section 4 – Fundamentals of Operations Research (Part II)

Section 5 – Network Problems (Part II)

Section 6 – Applied Statistics (Part II)

Solutions

Deloitte Advanced Analytics Group

Supply Chain Analytics Unit 1 Workbook 1

Welcome to Supply Chain Analytics Unit 1

One of Deloitte’s top priorities is to support the development of skills and knowledge that enable practitioners to provide the highest level of client service. In support of this objective, Deloitte’s Advanced Analytics Group (DAAG) created a set of courses and learning materials to expand the client service and technical capabilities of practitioners interested in Supply Chain Analytics.

Supply Chain Analytics Unit 1 comprises six courses that serve as prerequisites for Unit 2. Unit 2 introduces advanced topics in Supply Chain Analytics such as Network, Inventory and Transport Optimization. Unit 1 provides the foundation and knowledge needed to solve the business problems outlined in Unit 2. The Unit 1 courses should be taken in the order in which they are presented.



How to Use this Workbook

This workbook is designed to support the Unit 1 Supply Chain Analytics training by providing the tools and information needed to complete it. This workbook will:

• Summarize key learning objectives

• Provide an opportunity for reflection and a framework for understanding what can occur on client engagements

• Provide application-based activities to embed learning and make it practical

• Point to resources and tools that will assist in applying learning objectives

As you proceed through Unit 1, have this workbook available to complete all of the activities and maximize the impact of the learning. The Course Information and Activities section has suggested activities to help you apply what you are learning and prepare you for Unit 2.



Section 1 – Fundamentals of Operations Research (Part I)

Basic Concepts of Linear Programming

Overview of Linear Programming

Linear Programming is a technique for arriving at an optimal decision that is subject to various factors and constraints. A Linear Programming problem consists of two parts: an objective function and a set of constraints.

An objective function can be maximized or minimized.

Constraints are usually in the form of inequalities. Constraints exist because certain limitations restrict the range of a variable’s possible values.

Approach to Problem Solving

• Identify the objective of the problem

• Identify the decision variables and constraints on them

• Write the objective function and constraints in terms of the decision variables

• Add any implicit constraints

• Arrange the equations into an organized format

Assumptions in Linearity

• Proportionality

• Additivity

• Divisibility

• Certainty

Exercise Question 1.1: A diet is to contain at least 200 grams of carbohydrates, 100 grams of fat and 150 grams of protein. Two foods, A and B, are available. Food A costs $2 per pound and food B costs $4 per pound. A pound of food A contains 10 grams of carbohydrates, 20 grams of fat and 15 grams of protein. A pound of food B contains 25 grams of carbohydrates, 10 grams of fat and 20 grams of protein. Formulate the problem as a Linear Programming problem so as to find the minimum cost for a diet that consists of a mixture of these two foods and also meets the minimum requirements.

Food Type      Carbohydrates (g)   Fat (g)   Protein (g)   Cost ($ per pound)
A              10                  20        15            2
B              25                  10        20            4
Requirement    200                 100       150

Nutrient contents for foods A and B are per pound of food.

Review the correct answer in the Solutions section.

Exercise Reflection: Use the space below to note what you have learned about solving the previous Linear Programming problem.

________________________________________________________________

________________________________________________________________

The general form for maximized objective function and constraints in Linear Programming is represented as follows.
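A common way to write this general form is sketched below. The symbols c_j (objective coefficients), a_ij (constraint coefficients), b_i (right-hand sides), m (number of constraints) and n (number of decision variables) are assumed here for illustration and are not taken from the workbook.

```latex
\begin{aligned}
\text{Maximize } \quad & Z = \sum_{j=1}^{n} c_j x_j \\
\text{subject to } \quad & \sum_{j=1}^{n} a_{ij} x_j \le b_i, \qquad i = 1, \dots, m \\
& x_j \ge 0, \qquad j = 1, \dots, n
\end{aligned}
```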



Linear Programming Optimization Methods

Graphical Method of Solution

The graphical method is a simple way to solve Linear Programming problems when there are two decision variables, x1 and x2. We usually write these decision variables as x and y instead of x1 and x2.

The graphical method includes two major steps:

• The determination of the solution space that defines the feasible region

• The determination of the optimal solution from the feasible region

Defining the Feasible Region

The following three steps are used to determine the feasible solution of a Linear Programming problem:

1. Since the two decision variables x and y are non-negative, consider only the first quadrant of the xy-plane

2. Draw the line for each constraint

• Each line divides the first quadrant into two regions

• Area under constraint 1: All the points in this area satisfy the inequality 3x + 4y ≤ 12

• Area under constraint 2: All the points in this area satisfy the inequality 5x + 3y ≤ 15

3. Each point that lies in both areas meets all the constraints

Thus, the intersection of the two areas is the feasible region (the set of feasible solutions) of the Linear Programming problem.



Optimal Solution

When a Linear Programming problem has an optimal solution, it occurs at a corner point of the feasible region. One way to find it is to plot the objective function for some arbitrary value, such as 6x + 5y = 12. Since we want to maximize 6x + 5y, we then plot another line for a larger value, such as 6x + 5y = 20.

• This line is parallel to the first line and lies further out in the direction in which the objective function increases. To maximize 6x + 5y, we keep moving the line in that direction

• We can move the line until it is about to leave the feasible region. The last point it touches before leaving is a corner point. For the two constraints above, this is the intersection of the lines 3x + 4y = 12 and 5x + 3y = 15, at (24/11, 15/11), or approximately (2.18, 1.36)

• This point is the feasible point with the highest value of the objective function and is therefore optimal
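The corner-point idea can also be checked numerically. The following is a minimal sketch, assuming the two constraints 3x + 4y ≤ 12 and 5x + 3y ≤ 15 and the objective 6x + 5y used in the discussion above. It intersects every pair of boundary lines, keeps the feasible intersections and evaluates the objective at each. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# Constraints written as a*x + b*y <= c; the last two encode x >= 0 and y >= 0.
CONSTRAINTS = [
    (3, 4, 12),   # 3x + 4y <= 12
    (5, 3, 15),   # 5x + 3y <= 15
    (-1, 0, 0),   # -x <= 0, i.e. x >= 0
    (0, -1, 0),   # -y <= 0, i.e. y >= 0
]


def objective(x, y):
    """Objective function from the example: 6x + 5y."""
    return 6 * x + 5 * y


def corner_points(constraints):
    """Intersect every pair of boundary lines and keep the feasible points."""
    points = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:  # parallel boundary lines never intersect
            continue
        x = Fraction(c1 * b2 - c2 * b1, det)  # Cramer's rule
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y <= c for a, b, c in constraints):
            points.add((x, y))
    return points


corners = corner_points(CONSTRAINTS)
best = max(corners, key=lambda p: objective(*p))
print("corner points:", sorted(corners))
print("best corner:", best, "objective:", objective(*best))
```

Exact fractions are used so that the feasibility checks are not affected by rounding. Enumerating corners is only practical for very small problems; larger problems are normally handed to a solver.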

Exercise Question 1.2: Using the graphical method of solution, find the feasible solution for the problem of a decorative item dealer whose objective is to maximize the profit function below.

Objective Function: Z = 50x + 18y

Constraints:

2x + y ≤ 100

x + y ≤ 80

x ≥ 0, y ≥ 0


Exercise Reflection: Use the space below to note the important things you have learned about solving the previous Linear Programming problem using the graphical method.



________________________________________________________________

________________________________________________________________

Business Applications

Linear programming is used to support business decisions that involve multiple trade-offs and require an optimal outcome under a given set of conditions. While it has roots in operations and supply chain, it has applications across business functions and service lines.

Marketing Applications

• Media Selection

• Market Research

Financial Applications

• Portfolio Selection

• Financial Planning

Production Management

• A Make-or-Buy Decision

• Production Scheduling

• Workforce Assignment

Product-Mix Applications

• SKU Rationalization

• Blending Problems



Section 2 – Network Problems (Part I)

Overview of Network Problems

Fundamentals of Network Flow Problems

Network Flow Problems are applied to business issues that can be formulated in a network structure with nodes and arcs, and solved using special-purpose algorithms.

Common Terms

Node: A specific location in a network. Nodes can be of various types, such as origin, destination, and transshipment nodes.

Arc: A connector of two nodes, and the path between nodes along which materials move. Arcs can be one-way or two-way in nature.

Flow: The movement of materials or resources between nodes along an arc.

Capacity: A limitation on the amount of material that can flow through an arc. Arcs can possess both lower and upper capacity constraints.

Business Applications of Network Flow Problems

• Distribution and transportation systems

• Telecommunication networks

• Oil & gas

• Aerospace

• Manufacturing

• Telecommunications



Illustrative Examples of Business Applications of Network Flow Problems

Distribution Networks

• Sample Business Application: What quantity of goods should be sent from which plant given demand at a distribution center (DC)?

• Physical Analog of Nodes: Plants, Distribution Centers, Warehouses

• Physical Analog of Arcs: Road, Rail and Air Routes

• Flow: Materials, Goods, Finished Products

Transportation Systems

• Sample Business Application: What is the maximal number of vehicles that can be routed through a road system?

• Physical Analog of Nodes: Intersections, Airports, Rail Yards

• Physical Analog of Arcs: Highways, Airline Routes, Railbeds

• Flow: Passengers, Freight, Vehicles, Operators

Manufacturing Scheduling

• Sample Business Application: What is the optimal assignment of jobs to machines?

• Physical Analog of Nodes: Machines, Jobs

• Physical Analog of Arcs: Processing Time

• Flow: Assignment and Sequencing of Jobs

Overview of Solution Methods

Solution Methods for Network Flow Problems

Common network flow problems can be solved primarily using three methods:

• Rule-Based Algorithms: Problem-specific, optimal, less flexible

• Linear Programming-Based Optimization: More flexible, time-consuming, commercial solver tool-based

• Heuristics: Easy, but may be sub-optimal

Considerations for Choice of Solution Method

Solution Driven Considerations

• Problem Size

• Problem Complexity

• Desired Accuracy Levels

• Impact of Assumptions

Resource Driven Considerations

• Availability of Solvers

• Availability of Trained Resources

• Cost Implications

• Available Time



Shortest Path Problem

Overview of the Shortest Path Problem

The Shortest Path Problem is a network problem with the primary objective of finding the shortest route between a pair of nodes in a network. There are multiple forms of this problem, and most forms have corresponding specialized algorithms that are more efficient than the standard algorithm.

Decision: Which arcs to travel on?

Objective: Minimize the distance (or time) from the origin to the destination.

Dijkstra's Algorithm – Standard Form

Step 1: Assign the permanent label [0,S] to the starting node (Node 1). Here 0 is the distance from the node to itself, and S indicates that it is the starting node.

Step 2: Assign tentative labels to the nodes that can be reached directly from Node 1. In a label, the first number is the direct distance from Node 1, and the second number is the preceding node on the route from Node 1.

Step 3: Identify the tentatively labeled node with the smallest distance value and declare that node permanently labeled. If all nodes are now permanently labeled, go to Step 5.

Step 4: For each node that is not permanently labeled and can be reached directly from the new permanently labeled node:

• If the node has a tentative label, calculate the distance from Node 1 through the new permanently labeled node. If this is less than the distance in the existing label, replace the tentative label with the new distance and preceding node. The label remains tentative

• If the node is not yet labeled, create a tentative label showing the distance from Node 1 through the new permanently labeled node

Then go to Step 3.

Step 5: Each permanent label gives the shortest distance from Node 1 to that node and the preceding node on the shortest route. To find the shortest route to a given node, start at that node and work backwards along the preceding nodes until Node 1 is reached.
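The labeling steps above can be sketched in code. This is a minimal illustration, assuming non-negative arc lengths and a hypothetical five-node network that is not the network from any exercise in this workbook. A priority queue replaces the manual search for the smallest tentative label, which is a common implementation choice rather than part of the steps as stated.

```python
import heapq


def dijkstra(graph, source):
    """Return (distance, predecessor) maps for shortest paths from source.

    graph maps each node to a dict {neighbor: non-negative arc length}.
    """
    distance = {source: 0}       # best distance found so far for each labeled node
    predecessor = {source: None}
    permanent = set()            # nodes whose label is final
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node in permanent:
            continue             # stale queue entry for an already final node
        permanent.add(node)      # Step 3: smallest tentative label becomes permanent
        for neighbor, length in graph[node].items():
            candidate = dist + length
            # Step 4: create or improve the tentative label of the neighbor
            if neighbor not in permanent and candidate < distance.get(neighbor, float("inf")):
                distance[neighbor] = candidate
                predecessor[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    return distance, predecessor


def route(predecessor, target):
    """Step 5: walk back along preceding nodes to recover the route."""
    path = []
    while target is not None:
        path.append(target)
        target = predecessor[target]
    return path[::-1]


# Hypothetical network: node -> {neighbor: arc length}
network = {
    1: {2: 7, 3: 9, 5: 14},
    2: {1: 7, 3: 10, 4: 15},
    3: {1: 9, 2: 10, 4: 11, 5: 2},
    4: {2: 15, 3: 11},
    5: {1: 14, 3: 2},
}
dist, pred = dijkstra(network, 1)
print(dist[4], route(pred, 4))
```

In this sketch the `distance` and `predecessor` entries together play the role of the [distance, preceding node] labels.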



Linear Programming Method – Standard Form

xij = Binary variable indicating whether the arc between the ith and jth nodes is chosen

cij = The distance or length of arc (i,j)
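One standard way to write the model that uses these variables is sketched below. The origin node s and destination node t are symbols assumed here for illustration. Each constraint states that one unit of flow leaves the origin, one unit arrives at the destination, and every other node passes on whatever it receives.

```latex
\begin{aligned}
\text{Minimize } \quad & \sum_{(i,j)} c_{ij} x_{ij} \\
\text{subject to } \quad & \sum_{j} x_{ij} - \sum_{j} x_{ji} =
\begin{cases}
1 & \text{if } i = s \\
-1 & \text{if } i = t \\
0 & \text{otherwise}
\end{cases} \\
& x_{ij} \in \{0, 1\}
\end{aligned}
```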

Variations to Shortest Path Problem

• Single-Source Shortest Path Problem

• Single-Destination Shortest Path Problem

• All-Pairs Shortest Path Problem

Sample List of Algorithms for Shortest Path Problem

• Dijkstra's Algorithm

• Bellman-Ford Algorithm

• A* Search Algorithm

• Floyd-Warshall Algorithm

• Johnson's Algorithm

• Perturbation Theory

Business Applications of the Shortest Path Problem

The Shortest Path Algorithms determine the path with the least weight (weight can be cost, distance, time, etc.) between a pair of nodes in a network. Some business applications of these algorithms include:

• Flight reservations

• Internet packet routing

• Driving directions

• Telecom network routing



Exercise Question 2.1: Choose the arcs to travel on such that the distance between node 1 and node 8 is minimized.


Exercise Reflection: Use the space below to note the important things you have learned about solving the previous Shortest Path problem.

________________________________________________________________

________________________________________________________________

Minimum Spanning Tree Problem

Overview of the Minimum Spanning Tree Problem

The Minimum Spanning Tree Problem is typically used to connect all the nodes of a given network such that the total weight of the arcs used is minimized. A Minimum Spanning Tree provides the optimal set of arcs with minimal total arc cost, time, distance or other similar measure.

Decision: Which arcs to choose such that all nodes are connected to the network?

Objective: Minimize the total weight of the arcs chosen.



Linear Programming Method – Standard Form

To solve this problem, the nodes of the network are divided in every possible way into two subsets, A and B, that together make up the whole network. For every such division, at least one chosen arc must connect A to B.

xij = Binary variable indicating whether the arc between the ith and jth nodes, in a network of n nodes, is chosen

cij = The distance or length of arc (i,j)

A = Every possible subset of nodes within the network

B = Complement of A

Note: The optimal solution will use exactly (n-1) arcs to connect a network of n nodes. Using more than (n-1) arcs always creates at least one loop and therefore includes redundant arcs.
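The (n-1) arc property can be seen in a short sketch of Kruskal's Algorithm, which adds arcs in order of increasing weight and skips any arc that would close a loop. The network below is hypothetical and the function name is illustrative only.

```python
def kruskal(num_nodes, arcs):
    """Return the arcs of a minimum spanning tree.

    arcs is a list of (weight, i, j) tuples with nodes numbered 0..num_nodes-1.
    """
    parent = list(range(num_nodes))  # union-find forest, one set per node

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    tree = []
    for weight, i, j in sorted(arcs):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:         # arc joins two components, so no loop is formed
            parent[root_i] = root_j
            tree.append((weight, i, j))
    return tree


# Hypothetical 5-node network; weights could be distance, cost or time.
arcs = [(4, 0, 1), (2, 0, 2), (5, 1, 2), (10, 1, 3), (3, 2, 3), (7, 3, 4), (8, 2, 4)]
tree = kruskal(5, arcs)
print(tree, sum(weight for weight, _, _ in tree))
```

For a connected network of n nodes the returned list always contains n-1 arcs, here four arcs for five nodes.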

Variations to the Minimum Spanning Tree Problem

• Optimum Communication Spanning Tree

• Steiner Trees

Sample List of Algorithms for Minimum Spanning Tree Problem

• Prim’s Algorithm

• Kruskal’s Algorithm

• Boruvka’s Algorithm

• Reverse-Delete Algorithm

• Edmonds’ Algorithm

Business Applications of the Minimum Spanning Tree Problem

The Minimum Spanning Tree Problem is used to determine the smallest spanning tree that is needed to connect a set of nodes in a network. The typical variables include distance, cost, time, etc. Some business applications of this problem include:

• Design of telecommunications networks

• Airline routing

• Design of lightly used transportation networks to minimize the total cost of providing the links

• Finding routes with maximum bottleneck capacity in a computer network

• Network design of high voltage electrical transmission lines

Exercise: Identify the Minimum Spanning Tree for the sample problem below.



Question 2.2: Fly High Airlines wants to establish connectivity to all the major ports in the country leveraging the shortest distance route. Connect all the ports in the network such that the overall distance of the network is minimized.



________________________________________________________________

________________________________________________________________

Maximal Flow Problem

Overview of the Maximal Flow Problem

The Maximal Flow Problem is used to determine the maximum amount of flow of a given item (vehicles, fluid, materials, etc.) that can enter and exit a network in a specific period of time. Flow is transmitted through each node in the network as efficiently as possible. Typically, each arc is subject to certain flow restrictions (vehicles per hour, gallons per hour), and the maximum capacity restriction is referred to as the flow capacity for that arc. In its simplest form, it is assumed that for each node, inflow to the node is equal to the outflow from the node (no inventory). In this case, capacity restrictions are not assigned to the nodes.

Decision: How much flow on each arc?

Objective: Maximize flow through the network from an origin to a destination.



Linear Programming Method – Standard Form

To solve this problem, add a new arc from Node n (the output node) back to Node 1 (the input node). The flow on this arc equals the total flow through the network, so it is the quantity to be maximized. There is one variable per arc, representing the quantity of flow through that arc, and one flow-conservation constraint per node.

xij = The flow across the arc from the ith to the jth node

uij = Maximal capacity of the arc from the ith to the jth node
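Under the description above, the model can be written as follows. This is a sketch that assumes the added return arc runs from Node n to Node 1 and has no capacity limit, so its flow is written x_{n1}.

```latex
\begin{aligned}
\text{Maximize } \quad & x_{n1} \\
\text{subject to } \quad & \sum_{j} x_{ij} - \sum_{j} x_{ji} = 0 \qquad \text{for every node } i \\
& 0 \le x_{ij} \le u_{ij} \qquad \text{for every original arc } (i,j)
\end{aligned}
```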



Variations to Maximal Flow Problem

• Capacity Constraints

• Max-Flow Min-Cut Theorem

Sample List of Algorithms for Maximal Flow Problem

• Ford-Fulkerson Algorithm

• Edmonds-Karp Algorithm

• Dinitz Blocking Flow Algorithm

• General Push-Relabel Maximum Flow Algorithm

Business Applications of the Maximal Flow Problem

The Maximal Flow Problem can be used to determine the optimal flow of materials (such as vehicles, oil, etc.) through each arc of a given network such that the amount of flow through the entire network is maximized. Some business applications of this problem include:

• Oil flow through a pipeline network

• Project selection

• Airline scheduling

• Material flow through a company’s distribution network

• Water supply through a system of aqueducts
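As an illustration of how such a problem can be solved without a Linear Programming solver, the following is a minimal sketch of the Edmonds-Karp approach: repeatedly find a shortest path with spare capacity by breadth-first search and push as much flow along it as its tightest arc allows. The four-node network and its capacities are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest paths found by breadth-first search.

    capacity maps (i, j) to the flow capacity of the arc from i to j.
    """
    residual = dict(capacity)
    neighbors = {}
    for i, j in capacity:
        residual.setdefault((j, i), 0)          # reverse arc lets flow be undone
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)

    total = 0
    while True:
        # Breadth-first search for a path with spare capacity on every arc.
        previous = {source: None}
        queue = deque([source])
        while queue and sink not in previous:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in previous and residual[(node, nxt)] > 0:
                    previous[nxt] = node
                    queue.append(nxt)
        if sink not in previous:
            return total                         # no augmenting path remains

        # Find the bottleneck along the path, then push that much flow.
        path = []
        node = sink
        while previous[node] is not None:
            path.append((previous[node], node))
            node = previous[node]
        bottleneck = min(residual[arc] for arc in path)
        for i, j in path:
            residual[(i, j)] -= bottleneck
            residual[(j, i)] += bottleneck
        total += bottleneck


# Hypothetical network: (from node, to node) -> capacity per hour
capacity = {(1, 2): 10, (1, 3): 10, (2, 4): 4, (2, 3): 2, (3, 4): 9}
print(max_flow(capacity, 1, 4))
```

In this example the arcs entering node 4 have a combined capacity of 4 + 9 = 13, and the sketch finds a flow that uses all of it.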

Exercise Question 2.3: The local water conservation authority is constructing new ducts for water supply in the city. The capacity of each duct is provided in the network representation below. Choose the routes that maximize the water flow from node 1 to node 9.



________________________________________________________________

________________________________________________________________



Section 3 – Applied Statistics (Part I)

Statistical Tools

There are several commercially available statistical tools and packages for solving statistics problems. Even common software such as Microsoft Excel has “Add-Ins” with significant statistical capabilities.

Using Statistical Tools

Primary tools used to solve the different statistical problems:

Course Topic                            Primary Tools
Simple Regression and Correlation       MS Excel, SPSS, SAS, Systat
Multiple Regression and Correlation     MS Excel, SPSS, SAS, Systat
Time Series Analysis and Forecasting    SPSS, SAS, Systat
Discriminant and Logit Analysis         SPSS, SAS, Systat
Factor Analysis and Clustering          SPSS, SAS, Systat

Key analyses supported by the Microsoft Excel Analysis ToolPak:

• Regression

• Sampling

• Rank and percentile

• t-Test: Two Sample for Means

• Correlation

• Covariance



Probability Distributions

Random Variables

A variable is random if it assumes different values as a result of the outcome of a random experiment, for example, a coin toss.

There are two types of random variables:

Discrete Random Variable

A discrete random variable is one for which the number of possible outcomes can be counted, and for each possible outcome, there is a measurable and positive probability. Example: Number of days it rains in a given month, number of patients visiting a clinic on each day of the previous week.

Continuous Random Variable

A continuous random variable is one for which the number of possible outcomes is infinite, even if lower and upper bounds exist. Example: The actual amount of daily rainfall between zero and 10 inches is a continuous random variable because it can take on an infinite number of values.

Expected Value of a Random Variable

The expected value of a random variable is obtained by multiplying each value that the random variable can assume by the probability of occurrence of that value, and then adding up all these products.

$$E(x) = x_1 P_1 + x_2 P_2 + \dots + x_n P_n$$

$x_i$ = The ith value of the random variable

$P_i$ = Probability of occurrence of that value

n = Number of values the random variable can assume (a positive integer)
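The calculation can be sketched in a few lines of code. The example below uses the roll of one fair die, chosen here only for illustration, and exact fractions so that the probabilities sum to exactly one.

```python
from fractions import Fraction


def expected_value(distribution):
    """Sum of value * probability over a {value: probability} mapping."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to one")
    return sum(value * prob for value, prob in distribution.items())


# Roll of one fair die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))  # 7/2, i.e. 3.5
```

Note that the expected value, 3.5, is not itself a possible outcome of the roll. It is the long-run average over many rolls.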

Exercise

Question 3.1: Suppose Jim goes to two movies 10% of all weekends, he goes to one movie 40% of the time, and he goes to no movies 50% of the time. What is the expected value for the number of movies he goes to during a weekend?




Exercise Reflection: Use the space below to note the important things you have learned about solving the previous Expected Value problem.

________________________________________________________________

________________________________________________________________

Probability Distributions

Probability distributions arise from experiments where the outcome is subject to chance. A Probability Distribution describes the probabilities of all the possible outcomes for a random variable, such as getting tails on the toss of a coin or the probability that a call center representative will convert a sale on a given call.

Characteristics

• The probability of all possible outcomes must sum to one

• It is a listing of the probabilities of all the outcomes that could result if an experiment was conducted

Example

A simple Probability Distribution is that for the roll of one fair die: there are six possible outcomes and each one has a probability of 1/6, so they sum to one.

The Probability Distribution of all the possible returns on the S&P index is a more complex version of the same idea.

A frequency distribution is different from a Probability Distribution. A frequency distribution is a listing of the observed frequencies of all the outcomes of an experiment that was actually conducted. A Probability Distribution is a listing of the probabilities of all the outcomes that could result if the experiment were conducted.



Types of Probability Distributions

The nature of the experiment dictates which Probability Distribution may be appropriate for modeling the resulting random outcomes.

There are two types of probability distributions:

Discrete Probability Distribution (appropriate for discrete random variables)

Discrete Probability Distributions can assume only certain outcomes. The outcomes are mutually exclusive. Examples:

• The number of students in a class

• The number of children in a family

• The number of cars entering a carwash in an hour

• The number of home mortgages approved by Coastal Federal Bank last week

Continuous Probability Distribution (used for continuous random variables)

Continuous Probability Distributions can assume an infinite number of values within a given range. Examples:

• The distance students travel to class

• The time it takes an executive to drive to work

• The length of an afternoon nap

• The length of time of a particular phone call

Types of Discrete Probability Distributions

Binomial: Binomial Distributions describe discrete data resulting from an experiment known as the Bernoulli Process.

Poisson: Poisson Distributions express the probability of a number of events occurring in a fixed period of time if these events occur at a known average rate and independently of the time since the last event.

Binomial Distributions

Standard Formula for Binomial Distributions

The following standard notations apply:

p(r) = Probability of r successes in n trials

p = Characteristic probability or probability of success

q = 1 – p = Probability of failure

r = Number of successes desired

n = Number of trials undertaken

μ = Population mean

σ = Standard deviation

! denotes “factorial”; 5! = 5*4*3*2*1 = 120
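In this notation, the standard binomial formula and the corresponding mean and standard deviation are usually written as follows. This is the textbook form, stated here as a reference.

```latex
p(r) = \frac{n!}{r!\,(n-r)!}\; p^{r} q^{\,n-r},
\qquad \mu = np,
\qquad \sigma = \sqrt{npq}
```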



Exercise Question 3.2: The probability of converting a sale on any given call for an outbound call center representative is 0.6. If the representative takes 6 calls per hour, what is the probability that he/she will convert exactly 2 sales?

Solution: Apply the binomial formula just discussed with the following values:

n = 6; r = 2 for exactly two sales (r can take the values 0, 1, 2, …, 6); p = 0.6; q = 0.4

The Binomial Distributions for a variety of situations can be calculated in this manner and are illustrated below.
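As a sketch of the calculation, the full distribution for this call center example (n = 6, p = 0.6) can be tabulated with a short script. The function name is illustrative only.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 6, 0.6  # six calls per hour, 0.6 chance of a sale on each call
distribution = [binomial_pmf(r, n, p) for r in range(n + 1)]
for r, prob in enumerate(distribution):
    print(f"P(r = {r}) = {prob:.5f}")
```

For r = 2 this evaluates 15 × 0.6² × 0.4⁴ = 0.13824.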

Reflective Question: What is the probability that he/she will convert up to 5 sales per hour?

Exercise Reflection: Use the space below to note the important things you have learned about solving problems using Binomial Distributions.

________________________________________________________________

________________________________________________________________



Poisson Distributions

Standard Formula for Poisson Distributions

The following standard notations apply:

f(x) = Probability of x occurrences in an interval

λ = Mean number of occurrences in an interval

e = 2.71828 (base of the natural logarithm, rounded)

μ = Population mean

σ = Standard deviation
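In this notation, the standard Poisson formula and the corresponding mean and standard deviation are usually written as follows. This is the textbook form, stated here as a reference.

```latex
f(x) = \frac{\lambda^{x} e^{-\lambda}}{x!},
\qquad \mu = \lambda,
\qquad \sigma = \sqrt{\lambda}
```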

Types of Continuous Probability Distributions - Normal Distributions

A Probability Distribution is called continuous if its cumulative distribution function is continuous.

Description

• A Normal Distribution, also known as the Gaussian distribution, describes continuous data where the random variable can assume any value within a given range, and the Probability Distribution is continuous

• The Normal Distribution is very important in statistics as it has properties that make it applicable to a wide variety of situations, and it comes close to matching the observed frequency distributions of many phenomena

• The areas under the curve represent probabilities, and the total area under the normal curve is 1.00

• Because the tails never reach the horizontal axis, the theoretical model assigns a small probability to values that are impossible in practice, but little accuracy is lost by ignoring values far out in the tails

• Although the Normal Distribution is continuous, it can be used to approximate a discrete distribution such as the Binomial Distribution whenever np and nq are both at least 5



Characteristics

• The curve is bell-shaped and has a single peak (unimodal)

• The mean of the normally distributed population lies at the center of the normal curve

• Due to its symmetry, the mean, median and mode are of the same value

• The tails of the Normal Distribution extend indefinitely and never touch the horizontal axis

Standard Deviation

The areas under the curve represent probabilities, and the total area under the normal curve is 1.00. It can be noted that:

• Approximately 68% of the values in a normally distributed population lie within +/- 1 standard deviation from the mean

• Approximately 95.5% of the values in a normally distributed population lie within +/- 2 standard deviations from the mean

• Approximately 99.7% of the values in a normally distributed population lie within +/- 3 standard deviations from the mean
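These percentages can be reproduced from the normal curve itself. The sketch below relies on the identity that the probability of lying within k standard deviations of the mean equals erf(k / √2), where erf is the error function from the Python standard library.

```python
from math import erf, sqrt


def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within +/- {k} standard deviations: {coverage(k):.4f}")
```

The result does not depend on the particular mean or standard deviation, which is why the same three percentages apply to every Normal Distribution.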



Testing for Normality

Graphical Method:

• Comparing the histogram of all residuals (error terms) with a normal curve is a quick check of the normality of the data

• The normal probability plot is a more formal graphical tool to confirm normality. In a normal probability plot, the data are plotted against a theoretical Normal Distribution in such a way that the points should form an approximate straight line. Departures from this straight line indicate departures from normality

• More rigorous statistical methods used to test for normality include Pearson’s Chi-Square Test, the Anderson-Darling Test, and the Shapiro-Wilk Test

• When removing data that lie two to three standard deviations from the mean, always go back and verify that other metrics (spend, revenue, etc.) are not disproportionately affected or reduced

Testing data for normality is critical: if the data are assumed to be normal when they are not, keeping only the values within ±2σ or ±3σ of the mean may exclude important data points.

Other Common Probability Distributions

Distribution Description

Continuous Uniform Distribution:

• The Continuous Uniform Distribution, U(a,b), is a family of Probability Distributions such that, for each member of the family, all intervals of the same length on the distribution's support are equally probable

• Probability Density Function: $f(x) = \frac{1}{b-a}$ for $a \le x \le b$; $f(x) = 0$ for $x < a$ or $x > b$

• Population Mean = $(a + b) / 2$

• Variance = $(b - a)^2 / 12$

• Standard Deviation = $(b - a) / \sqrt{12}$

• One of the most common applications of this distribution is to generate random numbers

• One of the most common applications of this distribution is to generate random numbers

Exponential Distribution:

• The Exponential Distribution describes the time between events in a process in which events occur continuously and independently at a constant average rate

• Probability Density Function: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$; $f(x) = 0$ for $x < 0$, where $\lambda > 0$ is the parameter of the distribution, called the rate parameter

• Population Mean = $1 / \lambda$

• Variance = $1 / \lambda^2$

• Standard Deviation = $1 / \lambda$

• Service times of bank tellers, call center agents, etc. may be modeled as Exponential Distributions. Other applications include situations where certain events occur with a constant probability per unit length

Gamma Distribution

• The Gamma Distribution is a two-parameter family of continuous Probability Distributions. It has a scale parameter θ and a shape parameter k

• Probability Density Function: $f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\theta^{k}\,\Gamma(k)}$ for $x > 0$ and $k, \theta > 0$

• Population Mean = $k\theta$

• Variance = $k\theta^2$

• Standard Deviation = $\theta\sqrt{k}$

• The Gamma Distribution is frequently used to model waiting times; for instance, in life testing, the waiting time until death is a random variable which is frequently modeled with a Gamma Distribution

Student’s t Distribution

• Student's t-Distribution (or simply the t-distribution) is a Probability Distribution used to estimate the mean of a normally distributed population when the sample size is small

• Probability Density Function:

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

where ν is the number of degrees of freedom and Γ is the gamma function

• Population Mean = 0 for ν > 1, otherwise undefined

• Variance = ν / (ν - 2) for ν > 2, infinite for 1 < ν ≤ 2, otherwise undefined

• Standard Deviation = √[ν / (ν - 2)] for ν > 2

• Student’s t-Distribution is used when the population standard deviation must be estimated from the data

Sampling Techniques

Overview

Sampling is the part of statistical practice concerned with the selection of individual observations intended to yield knowledge about a population of concern, especially for the purposes of statistical inference.

The stages of the sampling process are:

• Defining the population of concern

• Specifying a sampling frame, a set of items or possible events to measure

• Specifying a sampling method for selecting items or events from the frame

• Determining the sample size

• Implementing the sampling plan

• Sampling and data collecting

• Reviewing the sampling process



Central Limit Theorem

The Central Limit Theorem states that the sampling distribution of the mean approaches normality as the sample size increases.

• This relationship between the shape of a Population Distribution and the shape of the sampling distribution of the mean is called the Central Limit Theorem

• The importance of this theorem is that it permits us to use sample statistics to make inferences about population parameters without knowing anything about the nature of the distribution for that population other than what we can get from the sample

The charts below illustrate that the distribution of sample means approaches normality as the sample size increases. Since the Normal Distribution is fully described by just two parameters (mean and standard deviation), we can use it to better estimate the characteristics of the entire population.

[Figure: distributions of sample means for sample sizes n = 1, n = 5, n = 10 and n = 25]
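The same effect can be explored with a small simulation. The sketch below is an illustration under stated assumptions: it draws samples from a skewed population (an exponential distribution with mean 1, chosen arbitrarily here) and summarizes the sample means for the four sample sizes shown. The number of samples and the seed are also arbitrary.

```python
import random
import statistics


def sample_means(sample_size, num_samples=2000, seed=0):
    """Means of repeated samples drawn from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


for n in (1, 5, 10, 25):
    means = sample_means(n)
    print(
        f"n = {n:2d}: mean of sample means = {statistics.fmean(means):.3f}, "
        f"spread = {statistics.stdev(means):.3f}"
    )
```

The mean of the sample means stays close to the population mean, while their spread shrinks roughly in proportion to 1/√n. A histogram of `means` for each n would show the shape becoming more bell-like as n grows.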

Types of Sampling Techniques

There are two types of sampling techniques:

• Judgment Sampling

• Random Sampling

Methods of Random Sampling

• Simple Random Sampling

• Systematic Sampling

• Stratified Sampling

• Cluster Sampling
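The four random sampling methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the sampling frame of 100 customer records, the `region` field and the helper names are all hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed for repeatability
# Hypothetical sampling frame: 100 customers, 60 in NJ and 40 in DE.
frame = [{"id": i, "region": "NJ" if i < 60 else "DE"} for i in range(100)]

def simple_random(frame, n):
    """Every item has the same chance of selection."""
    return rng.sample(frame, n)

def systematic(frame, n):
    """Pick a random start, then every k-th item."""
    step = len(frame) // n
    start = rng.randrange(step)
    return frame[start::step][:n]

def stratified(frame, n, key):
    """Sample each stratum in proportion to its share of the frame."""
    strata = {}
    for item in frame:
        strata.setdefault(item[key], []).append(item)
    chosen = []
    for members in strata.values():
        k = round(n * len(members) / len(frame))
        chosen.extend(rng.sample(members, k))
    return chosen

def cluster(frame, n_clusters, cluster_size):
    """Split the frame into clusters, then keep every item of a few clusters."""
    clusters = [frame[i:i + cluster_size] for i in range(0, len(frame), cluster_size)]
    picked = rng.sample(clusters, n_clusters)
    return [item for group in picked for item in group]

strat = stratified(frame, 10, "region")
print(sum(1 for item in strat if item["region"] == "NJ"), "of 10 drawn from NJ")
```

Stratified sampling keeps the 60/40 regional split in the sample, whereas simple random sampling only matches it on average.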

Common examples of sampling bias are:

• Data Mining Bias

• Sample Selection Bias

• Survivorship Bias

• Look-Ahead Bias

• Time Period Bias



Hypothesis Testing

Description of Hypotheses Testing Hypothesis testing is a method of making statistical decisions using experimental data. It decides whether experimental results contain enough information to cast doubt on conventional wisdom.

Steps in Hypothesis Testing

• Begin with an assumption or a hypothesis that is made about a population parameter

• Collect sample data and conduct statistical analysis for the sample, which is then used to determine the likelihood that the hypothesized population parameter is correct

Key Concepts

• Null Hypothesis: This is the statement of the assumed or hypothesized value of the population parameter before we begin sampling. It is denoted by H0 and is the default, conservative assumption. The test examines whether the data provide sufficient evidence to reject it in favor of the alternate.

• Alternate Hypothesis: Whenever the null hypothesis is rejected, the conclusion that is accepted is called the alternate hypothesis. It is denoted by HA or H1

• Significance Level: It is defined as the probability of making a decision to reject the null hypothesis when the null hypothesis is actually true (i.e., a false positive, also called a Type I error)

• Two-Tailed and One-Tailed Tests: A two-tailed test will reject the null hypothesis if the sample mean is significantly higher or lower than the hypothesized population mean, so it has two rejection regions. This can be contrasted with the one-tailed test, where there is only one rejection region

• Standard Error: The standard error of a method of measurement or estimation is the standard deviation of the sampling distribution associated with the estimation method

Five-Step Process for Hypotheses Testing

Step 1: State your hypotheses. Decide whether this is a two-tailed or one-tailed test. Select a level of significance appropriate for this decision.

Step 2: Decide which distribution (t or z) is appropriate (from the table below) and find the critical values for the chosen level of significance from the appropriate table.

Step 3: Calculate the standard error of the sample statistic. Use the standard error to convert the observed value of the sample statistic to a standardized value.

Step 4: Sketch the distribution and mark the position of the standardized sample value and the critical values for the test.

Step 5: Compare the value of the standardized sample statistic with the critical values for this test and interpret the result.



Decision Table for Distribution Selection

Exercise Question 3.4: Fizz-O, a leading cola manufacturing and distribution company, is considering expanding its operations in New Jersey and Delaware. As a part of developing its expansion strategy, Fizz-O wants to establish if the average annual consumption of cola in these two states is different from that of the entire US. Fizz-O’s marketing team has already conducted a survey across 400 people (identified using random sampling) in each of the two states, and determined the state-wise cola consumption levels: the sample average is 1.6 gallons/year in NJ and 2.0 gallons/year in DE. It is known that the average cola consumption across the US is 1.2 gallons/year with standard deviation = 6.

Solution:

Step 1: State your hypotheses and decide whether this is a two-tailed or one-tailed test.

Define the Hypothesis: H0: μ = 1.2 gallons/year (the state average equals the US average); HA: μ ≠ 1.2 gallons/year

Decide Test:

This is a two-tailed test because our business decision is impacted if the state-level annual average consumption is either higher than or lower than the national annual average consumption.

Reflective Question: What will be the appropriate level of significance for this decision?
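The five-step process can be traced for this exercise with a short script. It is a sketch under two assumptions: a significance level of 0.05 (the workbook leaves the level as a reflective question), and a z-test, since the population standard deviation is given and n = 400 is large. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(sample_mean, pop_mean, pop_sd, n, alpha):
    """Return (z, critical value, reject H0?) for a two-tailed z-test."""
    standard_error = pop_sd / sqrt(n)               # Step 3: standard error
    z = (sample_mean - pop_mean) / standard_error   # Step 3: standardized value
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # Step 2: critical value
    return z, z_crit, abs(z) > z_crit               # Step 5: compare

ALPHA = 0.05  # assumed significance level, not specified in the exercise
results = {
    state: two_tailed_z_test(mean, pop_mean=1.2, pop_sd=6, n=400, alpha=ALPHA)
    for state, mean in {"NJ": 1.6, "DE": 2.0}.items()
}
for state, (z, z_crit, reject) in results.items():
    verdict = "reject H0" if reject else "do not reject H0"
    print(f"{state}: z = {z:.2f}, critical = ±{z_crit:.2f} -> {verdict}")
```

With a standard error of 6 / √400 = 0.3, the NJ sample gives z ≈ 1.33 (inside ±1.96) and the DE sample gives z ≈ 2.67 (outside ±1.96). A different choice of significance level could change these conclusions.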



Exercise Reflection: Use the space below to note the important things you have learned about solving the previous Hypothesis testing problem.

________________________________________________________________

________________________________________________________________



Section 4 – Fundamentals of Operations Research (Part II)

Basic Concepts of Integer Programming

Integer Programming Concepts When all the variables are required to be integers, the integer program is called an All-Integer Program. When some, but not all, of the variables are required to be integers, the integer program is called a Mixed Integer Program.

In many applications of Integer Programming, one or more integer variables are required to equal either 0 or 1. Such variables are called binary variables. If all variables are 0-1 variables, it is a 0-1 Integer Program.

The Linear Program that results from dropping the integer requirements is called the Linear Program Relaxation of the Integer Program.

Cost of Production In many fixed cost applications, the cost of production has two components:

• Set up, which is a fixed cost

• Variable Cost, which is directly related to the production quantity

Set up cost is included in a model for a production application using binary variables (1 to produce, 0 not to produce).
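One common way to write the two cost components is the fixed-charge form below. It is a sketch rather than the workbook's own formulation: f_j (set up cost), c_j (variable cost per unit), x_j (production quantity), y_j (binary produce / do not produce variable) and M_j (an upper bound on the production of product j) are assumed symbols.

```latex
\begin{aligned}
\min \quad & \sum_{j} \left( f_j\, y_j + c_j\, x_j \right) \\
\text{s.t.} \quad & 0 \le x_j \le M_j\, y_j \quad \text{for all } j, \\
& y_j \in \{0, 1\} \quad \text{for all } j.
\end{aligned}
```

The linking constraint x_j ≤ M_j y_j forces y_j = 1, and therefore the set up cost, whenever any quantity of product j is produced.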

Exercise Question 4.1: Three raw materials are used to produce three products (in tons): a fuel additive, a solvent base, and a laundry detergent.



The company has 20 tons of Material A, 5 tons of Material B, and 21 tons of Material C, and is interested in determining the optimal production quantities for the upcoming planning period.

Solution Step 1: Formulate the Linear Program

Step 2: Conversion to Integer Programming Form

Reflective Question: What will be the final Cost Model for the problem?

Exercise Reflection: Use the space below to note the important things you have learned about solving problems using the Integer Programming model.

________________________________________________________________

________________________________________________________________



Sensitivity Analysis

Sensitivity Analysis Sensitivity analysis is the study of how the changes in the coefficients of a Linear Program affect the optimal solution.

Optimization Using Excel Solver Excel Solver is an optimization add-in for Microsoft Excel that can be used to solve Linear Programs. You can install Microsoft Excel Solver by selecting the Microsoft Office Button > Excel Options > Add-Ins > Solver Add-In.

Other integer linear programming software packages available on the market include:

• MPSX – MIP

• OSL

• CPLEX

• LINDO



Section 5 – Network Problems (Part II)

Minimum Cost Flow Problem

Overview of the Minimum Cost Flow Problem The Minimum Cost Flow Problem is used to send flow from a set of supply nodes to a set of demand nodes through the arcs of a network, at minimum total cost, and without violating the lower and upper bounds on flows through the arcs. This problem is used for moving only one product / commodity at a time.

Decision: Which arcs are to be used, given the lower and upper bounds on each arc?

Objective: Minimize total cost

For each arc:

x = Cost of transportation per unit
y = Lower capacity constraint
z = Upper capacity constraint

For each supply node: a = Available supply of commodity X

For each demand node: b = Demand for commodity X

Linear Programming Method – Standard Form

i = Index for origins, i = 1, 2, 3…m
j = Index for destinations, j = 1, 2, 3…n
c_ij = Cost per unit shipped from origin i to destination j
s_i = Supply or capacity in units at origin i
d_j = Demand in units at destination j
l_ij = Lower bound on the flow from origin i to destination j
u_ij = Capacity on the flow from origin i to destination j
x_ij = Number of units shipped from origin i to destination j, where x_ij is only defined for arcs that exist in the network
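Using the notation above, the model can be written roughly as follows. This is a reconstruction under assumptions, not the workbook's original equation: it treats every node as either an origin or a destination, and intermediate (transshipment) nodes would need additional flow-balance constraints.

```latex
\begin{aligned}
\min \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j=1}^{n} x_{ij} \le s_i \quad && i = 1, \dots, m \\
& \sum_{i=1}^{m} x_{ij} = d_j \quad && j = 1, \dots, n \\
& l_{ij} \le x_{ij} \le u_{ij} \quad && \text{for every arc } (i, j) \text{ in the network}
\end{aligned}
```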

Variations to Minimum Cost Flow Problem

• Assignment Problem

• Transshipment Problem

• Transportation Problem

• Shortest Path Problem

• Maximal Flow Problem

• Unbalanced Minimum Cost Flow Problems

Sample List of Algorithms for Minimum Cost Flow Problem

• Negative Cycle Algorithm

• Successive Shortest Path Algorithm

• Primal-Dual Algorithm

• Out-of-Kilter Algorithm



Transportation Problem

Overview of the Transportation Problem The Transportation Problem can be used to minimize the cost of shipping goods from multiple origins to multiple destinations. It is typically used in distribution planning, where the quantity of goods available at a supply location is limited, and the quantity of goods required at each demand location is known. This is a more specific form of the Minimum Cost Flow problem.

Decision: How much to ship along each arc between any origin and destination?

Objective: Minimize shipping cost.

Linear Programming Method – Standard Form

i = Index for origins, i = 1, 2, 3…m
j = Index for destinations, j = 1, 2, 3…n
x_ij = Number of units shipped from origin i to destination j
c_ij = Cost per unit shipped from origin i to destination j
s_i = Supply or capacity in units at origin i
d_j = Demand in units at destination j



Variations to Transportation Problem

Assignment Problem:

• All supply and demand values equal 1 and the amount shipped over each arc is either 0 or 1

• Primarily used for assignment of resources to specific tasks such as project staffing in large corporations and deployment of armed forces personnel

Supply vs. Demand Problem:

• Total supply is not equal to total demand

• For this type of business problem, you can create a dummy supply node or a dummy demand node, which acts as a catch-all for the excess demand or excess supply

Other Problems:

• Objective function is maximized rather than minimized (e.g., profit criterion)

• Routes that have specified capacity restrictions or minimums

• Some routes may be unacceptable

Sample List of Algorithms for Transportation Problem

• Northwest Corner rule

• Minimum Cost Method

• Vogel’s Approximation Method

• Stepping Stone Method

• Modified Distribution Method
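Of the methods listed, the Northwest Corner rule is the simplest to sketch. It builds an initial feasible shipping plan (generally not the cheapest one), which methods such as Stepping Stone or Modified Distribution can then improve. The supply, demand and cost figures below are hypothetical.

```python
def northwest_corner(supply, demand):
    """Initial feasible plan for a balanced transportation problem."""
    supply, demand = list(supply), list(demand)
    if sum(supply) != sum(demand):
        raise ValueError("unbalanced problem: add a dummy node first")
    plan = [[0] * len(demand) for _ in supply]
    i = j = 0
    while i < len(supply) and j < len(demand):
        qty = min(supply[i], demand[j])  # ship as much as possible in this cell
        plan[i][j] = qty
        supply[i] -= qty
        demand[j] -= qty
        if supply[i] == 0:
            i += 1   # origin exhausted: move down
        else:
            j += 1   # destination satisfied: move right
    return plan

# Hypothetical data: 3 origins, 3 destinations, total supply = total demand = 75.
supply = [20, 30, 25]
demand = [10, 35, 30]
costs = [[8, 6, 10],
         [9, 12, 13],
         [14, 9, 16]]

plan = northwest_corner(supply, demand)
total = sum(costs[i][j] * plan[i][j] for i in range(3) for j in range(3))
print(plan, total)
```

For this data the rule ships 10 and 10 units from the first origin, 25 and 5 from the second, and 25 from the third, for a total cost of 905. Because the rule ignores costs, this is only a starting point.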

Multi-Commodity Flow Problem

Overview of the Multi-Commodity Flow Problem The Multi-Commodity Flow Problem is a Network Flow Problem that has multiple commodities flowing through a network, where each commodity has different supply and demand nodes and each arc has capacity restrictions. When flows must be whole numbers the problem is computationally hard, so approximate algorithms are commonly used for large instances.

Decision: How much quantity of each commodity should be sent through each arc, given supply-demand and capacity constraints?

Objective: Flow assignment that satisfies the constraints.



For each arc:

x = Cost of transportation per unit (varies for each commodity)
y = Lower capacity constraint
z = Upper capacity constraint

For each supply node:

a = Available supply of commodity A
b = Available supply of commodity B
c = Available supply of commodity C

For each demand node:

d = Demand for commodity A
e = Demand for commodity B
f = Demand for commodity C

Linear Programming Method – Standard Form

k = Index for commodities, k = 1, 2, 3…K
c^k_ij = Cost per unit of commodity k along arc (i,j)
u_ij = Capacity on arc (i,j)
s^k_i = Available supply of commodity k at node i
d^k_j = Required quantity (demand) of commodity k at node j
x^k_ij = Flow of commodity k along arc (i,j), where x^k_ij is defined only for those arcs that exist in the network
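With this notation, the minimum cost version of the problem can be sketched as below. It is a reconstruction under assumptions rather than the workbook's original equation: the arc capacity u_ij is taken to be shared by all commodities, and each node's net outflow of commodity k is set equal to its supply minus its demand.

```latex
\begin{aligned}
\min \quad & \sum_{k} \sum_{(i,j)} c^{k}_{ij}\, x^{k}_{ij} \\
\text{s.t.} \quad & \sum_{k} x^{k}_{ij} \le u_{ij} \quad && \text{for every arc } (i, j) \\
& \sum_{j} x^{k}_{ij} - \sum_{j} x^{k}_{ji} = s^{k}_{i} - d^{k}_{i} \quad && \text{for every node } i \text{ and commodity } k \\
& x^{k}_{ij} \ge 0
\end{aligned}
```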



Variations to Multi-Commodity Flow Problem

• Minimum Cost Multi-Commodity Flow Problem: This problem is applied where there is a cost associated with sending flow on each arc that needs to be minimized

• Maximum Multi-Commodity Flow Problem: This problem is applied where there are no hard demands on each commodity, but the total throughput has to be maximized

• Maximum concurrent flow problem: This problem is applied where the task is to maximize the minimal fraction of the flow of each commodity to its demand

Sample List of Algorithms for Multi-Commodity Flow Problem

• Dantzig-Wolfe Decomposition

• Frank-Wolfe Algorithm

• Lagrangian Relaxation

• Augmented Lagrangian Relaxation

• Proximal Decomposition

Which Problem to Choose?

Common Limitations of Network Flow Problems

• Most business problems may not perfectly fit into the format of a particular Network Flow Problem. These problems can be used as a basis for conceptualizing other heuristics

• Most special purpose algorithms can be used to solve only single objectives. It may be necessary to use Linear Programming or other heuristics if there are additional constraints or objectives

• When arc values in a particular network are negative, customized algorithms need to be used

• Depending on the nature of the business problem, objectives may need to be maximized or minimized. For example, given that the Shortest Path Algorithm always identifies a minimum value solution, it may not be ideal to apply the algorithm to situations that involve a profit criterion



Dynamic Programming

Overview of Dynamic Programming Dynamic Programming is a problem solving approach that decomposes a large, complex problem into multiple smaller problems that are easier to solve. The Dynamic Programming approach results in the optimal solution for the large problem once all the smaller problems have been solved.

Dynamic Programming Method – Standard Form

x_n = State variable, which represents the input to stage n (the output from stage n + 1)
d_n = Decision variable at stage n
t_n = Stage transformation function that determines the stage n output
r_n = Return function for stage n, which represents the payoff or value for a stage
N = Number of stages in the dynamic program; the stage index n varies from 1 to N

The general expression for the stage transformation function is x_(n-1) = t_n(x_n, d_n)

The general expression for the return function is r_n(x_n, d_n)
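The stage, state, decision, transformation and return notation maps directly onto a recursive solution. The sketch below applies it to a small knapsack-style loading problem. The item weights, values and capacity are hypothetical, and the function names simply mirror the notation above.

```python
from functools import lru_cache

# Hypothetical data: stage n decides how many units of item n to load.
ITEMS = {1: (2, 65), 2: (3, 80), 3: (1, 30)}  # stage -> (weight, value) per unit
CAPACITY = 5                                   # state entering stage N = 3

def t(n, x, d):
    """Stage transformation: x_(n-1) = t_n(x_n, d_n), the capacity left over."""
    return x - ITEMS[n][0] * d

def r(n, x, d):
    """Return function r_n(x_n, d_n): value earned at stage n."""
    return ITEMS[n][1] * d

@lru_cache(maxsize=None)
def best(n, x):
    """Best total return from stages n, n-1, ..., 1 given input state x."""
    if n == 0:
        return 0, ()
    options = []
    for d in range(x // ITEMS[n][0] + 1):      # every feasible decision
        value, plan = best(n - 1, t(n, x, d))  # solve the smaller problem
        options.append((r(n, x, d) + value, plan + ((n, d),)))
    return max(options)

value, plan = best(3, CAPACITY)
print(value, dict(plan))
```

For this data the best plan loads one unit of item 3 and two units of item 1 for a total value of 160. Caching `best` means each (stage, state) sub-problem is solved only once, which is the core idea of the approach.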



Section 6 – Applied Statistics (Part II)

Simple Regression and Correlation

Overview of Simple Regressions and Correlations Regressions and correlations deal with the determination of relationships between variables. Both regression and correlation analyses help to determine the nature and strength of a relationship between variables.

Regression analysis is used to develop an estimating equation, which is a mathematical description of the relationship between a known variable and an unknown variable.

Correlation analysis is used to determine the degree to which the variables are related. In essence, correlation analysis is used to decide how well the estimating equation actually describes the relationship.

Causality between Variables There is often a causal relationship between the independent and dependent variables. For example, in the relationship between advertising spend and sales, an increase in advertising spend causes an increase in sales. Regression and correlation by themselves, however, show association rather than causation.



Scatter Diagram A Scatter Diagram is a chart on which each observation is plotted as a point, with the independent variable on one axis and the dependent variable on the other.

Some of the uses of Scatter Diagram are:

• Helps visually identify if there are any patterns to indicate that the variables are related

• Identifies the kind of line and required estimation equation that describes the relationship

Different types of scatter diagrams are:



Equation for a Straight Line

Equation for a Straight Line To fit a regression line mathematically, choose the line that minimizes the total squared error between the estimated points on the line and the actual observed points that were used to draw it.

Squaring the errors magnifies (or penalizes) larger errors, and prevents positive and negative errors from cancelling each other out.



Formulas for the Method of Least Squares

The slope of a line (b) obtained using linear least squares fitting is called the Regression Coefficient.
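For reference, the standard least squares formulas for the line Ŷ = a + bX are shown below. The symbols X̄ and Ȳ (sample means) and n (number of observations) are assumed notation, since the workbook's own formula panel is not reproduced in this text.

```latex
b = \frac{\sum XY - n\,\bar{X}\,\bar{Y}}{\sum X^{2} - n\,\bar{X}^{2}},
\qquad
a = \bar{Y} - b\,\bar{X}
```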

Estimating the Regression Equation Several statistical packages are readily available that estimate the regression equation and provide the coefficients.



Example: Microsoft Excel

For any set of values for X and Y, Excel can be used to rapidly plot the linear trend line and derive the regression equation.

Standard Error of Estimate The reliability of the estimating equation is measured by the Standard Error of Estimate.

Standard Error of Estimate is denoted by se. It measures the variability, or scatter, of the observed values around the regression line. Statistical packages calculate the se and provide the value as the output.

Formula for Standard Error
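The usual form of the formula for simple regression is given below, assuming Ŷ denotes the value estimated by the regression line and n the number of observations. The divisor n − 2 reflects the two coefficients estimated from the data.

```latex
s_e = \sqrt{\frac{\sum \left( Y - \hat{Y} \right)^{2}}{n - 2}}
```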

Interpretation Assuming the observed points are normally distributed around the regression line, the se can be used to form bounds around the regression line as follows:

• Approximately 68% of the points can be found within a band of +/- 1 se around the regression line

• Approximately 95.5% of the points can be found within a band of +/- 2 se around the regression line

• Approximately 99.7% of the points can be found within a band of +/- 3 se around the regression line



Correlation Analysis Correlation Analysis is a statistical tool that is used to describe the degree to which one variable is linearly related to another. It is used in conjunction with regression analysis to measure how well the regression line explains the variation of the dependent variable.

The sign of r indicates the direction of the relationship between the two variables.

• r² = 1 and r = 1, means that the two variables are perfectly correlated and the slope of the line is positive

• r² = 0 and r = 0, means that the two variables are not at all correlated

• r² = 1 and r = -1, means that the two variables are perfectly negatively correlated and the slope of the line is negative

For example, if r² = 0.45, it means that only 45% of the total variation in the dependent variable is explained by the regression line. It is important to note that r² measures only the strength of a linear relationship between two variables.
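The quantities discussed in this section (regression coefficients, standard error of estimate, r² and r) can be computed directly. The sketch below uses five hypothetical (X, Y) observations and standard textbook formulas; the function names are illustrative.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least squares intercept a and slope b for Y = a + bX."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b = sxy / sxx
    return y_bar - b * x_bar, b

def fit_quality(xs, ys, a, b):
    """Standard error of estimate, r squared, and r (signed like the slope)."""
    n = len(xs)
    y_bar = sum(ys) / n
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total variation
    r2 = 1 - sse / sst
    r = sqrt(r2) if b >= 0 else -sqrt(r2)
    return sqrt(sse / (n - 2)), r2, r

# Hypothetical observations, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
se, r2, r = fit_quality(xs, ys, a, b)
print(f"Y = {a:.2f} + {b:.2f}X, se = {se:.3f}, r2 = {r2:.2f}, r = {r:.3f}")
```

For this data the fitted line is Y = 2.2 + 0.6X with r² = 0.6, so 60% of the variation in Y is explained by the line.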



Multiple Regression and Correlation

Overview of Multiple Regression More than one variable is used to estimate the dependent variable to increase the accuracy of the estimate. For example, there is a positive relationship between demand for sunglasses and various demographic characteristics (age, income) of the buyers – that is, demand varies directly with changes in their characteristics.

This process is called multiple regression and correlation, and is based on the same assumptions and processes we discussed in simple regression.

Example Sale of Beer = β0 + β1 × (Temperature) + β2 × (NASDAQ Levels) + β3 × (Price of Beer) + β4 × (…) + β5 × (…) + …

Three Step Process – Multiple Regression and Correlation Analysis Step 1: Describe the Multiple Regression Equation

Step 2: Examine the Multiple Regression Standard Error of Estimate

Step 3: Use Multiple Correlation Analysis to determine how well the regression equation describes the observed data and refine the model by adding or changing the terms as necessary

Assumptions Some of the key assumptions in Multiple Regression Analysis are :

• Normality

• Linearity

• Reliability

• Homoscedasticity

Standard Estimating Equation for Multiple Regression



The multiple regression equation contains several types of terms that are introduced based on the situation. Some of the types are:

• Linear Terms: Terms that affect the dependent variable linearly – X1, X2

• Non-Linear Terms: Terms that affect the dependent variable non-linearly – X3²

• Dummy Variables: Terms that represent qualitative factors like gender and can have discrete values or levels

• Interaction Variables: Terms that represent the combined effect of two independent variables on the dependent variable – X1X2

Sample Multiple Regression Equation

Dummy / Binary Variables Dummy, or Binary variable regression models involve usage of categorical (non-quantitative) variables with two or more levels. The number of dummy variables used is one less than the number of levels of the categorical variable.

Examples • Gender is a categorical variable with two levels that can be coded as 0 and 1

• States in the U.S. is a categorical variable with 50 possible levels
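The "one less than the number of levels" rule can be illustrated with a small coding helper. This is a sketch: the function name and the choice of the alphabetically first level as the baseline are assumptions.

```python
def dummy_code(values, baseline=None):
    """Encode a categorical variable with (levels - 1) 0/1 dummy columns."""
    levels = sorted(set(values))
    baseline = levels[0] if baseline is None else baseline
    columns = [level for level in levels if level != baseline]
    rows = [[1 if v == level else 0 for level in columns] for v in values]
    return columns, rows

# Three levels -> two dummy variables; "DE" becomes the baseline (all zeros).
columns, rows = dummy_code(["NJ", "DE", "NY", "DE"])
print(columns)  # names of the dummy variables
print(rows)     # one row of 0/1 values per observation
```

The baseline level needs no column of its own because it is implied when all dummy variables are 0.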

Interaction Variables An interaction variable is a variable often used in regression analysis, formed by the multiplication of two independent variables. An interaction regression model is used when the response to one independent variable varies with the level of another independent variable.

Multiple Regression model equation with interaction term:

Y = β0 + β1X1 + β2X2 + β3X1X2 + ε

where β3X1X2 is the interaction term



Standard Error of Estimate for Multiple Regression
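The usual form of this formula is shown below, assuming n observations, k independent variables and Ŷ as the value estimated by the regression equation. It generalizes the simple regression case, where k = 1 gives a divisor of n − 2.

```latex
s_e = \sqrt{\frac{\sum \left( Y - \hat{Y} \right)^{2}}{n - k - 1}}
```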



Overview and Effect of Multicollinearity (Model Issue)

• Multicollinearity is a statistical phenomenon in which two or more predictor variables in a Multiple Regression model are highly correlated with each other, making it difficult to separate their individual effects on the dependent variable

• While conducting Multiple Regression analysis, the regression coefficients become less reliable as the degree of correlation between the independent variables increases

• Variables that are each highly significant in separate simple regressions can, when combined in a Multiple Regression, be collectively very significant but individually not significant

• Although it may still be possible to make estimations when Multicollinearity is present, results may change erratically in response to small changes in the model or the data

• This is particularly important because it is not possible to accurately predict how the dependent variable will change as you tweak any independent variable that is correlated with another independent variable

Indicators of Multicollinearity (Model Issue)

• Large changes in regression coefficients when an independent variable or additional observations are added

• The model as a whole does a good job explaining the data, but none/few of the coefficients are statistically significant by themselves

• Variance Inflation Factor (VIF) of > 5, where VIF = 1 / (1 - R²) and R² is obtained by regressing that independent variable on the other independent variables
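For a model with only two independent variables, the R² in the VIF formula is simply the squared correlation between them, which makes the calculation easy to sketch. The data below is hypothetical and the function names are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors R^2 is their squared correlation."""
    return 1 / (1 - pearson_r(x1, x2) ** 2)

# Hypothetical predictors: the second pair is almost a copy of x1.
x1 = [1, 2, 3, 4, 5]
x2_moderate = [2, 4, 5, 4, 5]
x2_collinear = [1.1, 2.0, 2.9, 4.2, 5.0]

vif_moderate = vif_two_predictors(x1, x2_moderate)
vif_collinear = vif_two_predictors(x1, x2_collinear)
print(f"moderate: {vif_moderate:.1f}, near-collinear: {vif_collinear:.0f}")
```

The moderately correlated pair gives a VIF of 2.5, below the threshold of 5, while the near-duplicate predictor gives a VIF far above it.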

Common Remedies for Multicollinearity (Model Issue)

• Drop one of the variables that is causing the Multicollinearity, at the risk of introducing bias in the coefficients of the remaining variables

• Obtain more data

Overview and Effects (Heteroscedasticity & Error Trends)

• Ideally, residuals or error terms are randomly scattered around 0 (the horizontal line), providing a relatively even distribution. Heteroscedasticity is indicated when the residuals are not evenly scattered around the line. Example: The error term could vary or increase with each observation, something that is often the case with cross-sectional or time series measurements

• Heteroscedasticity does not mean your coefficients are wrong, but rather that the model becomes less accurate as you increase term values

• Heteroscedasticity often occurs when there is a large difference among the sizes of the observations

• Other trends in the residuals (e.g., a nonlinear pattern) can indicate that terms are missing from the model



Detection and Remedy (Heteroscedasticity & Error Trends)

• Residual plots (plots of the error terms) in Multiple Regression Analysis allow visual detection of heteroscedasticity

• Dealing with heteroscedasticity is reasonably straightforward but a little technical. Techniques are widely available and can be found through textbooks, SMEs, etc.

• For dealing with other error trends, you need to add additional terms to your model. For example, if you see a parabola in the error terms, you should try adding an x² term

Exercise: Using what you’ve just learned, interpret the output for the following problem:

Question 6.1: For this problem, we’ll return to the case of Moondrop Airline Corporation (MAC) from the Network Problems course. MAC has expanded its operations to cover 15 terminals and has recently conducted a survey across these terminals for the month of February. The information collected covers sales, spend on promotions, number of competing airlines at that terminal and the number of passengers who flew for free.

Solution Step 1: Input



Step 2: Output

Reflective Question: To arrive at the multiple regression equation, how are the coefficients interpreted?

Exercise Reflection: Use the space below to note the important things you have learned about solving problems using the Multiple Regression and Correlation Analysis.

________________________________________________________________

________________________________________________________________



Factor Analysis, Clustering & Discriminant Analysis

Overview of Factor Analysis Factor analysis, which is closely related to Principal Component Analysis (PCA), is a statistical method used for data reduction and summarization. Observed variables are represented in terms of variables which are unobserved (factors). It investigates whether a number of variables of interest are linearly related to a small number of unobserved factors.

Benefits of Factor Analysis The primary benefit of Factor Analysis is that a large number of correlated variables can be reduced to a manageable level:

• A smaller number of factors results in ease of interpretation and reduced complexity

• Effects of Multicollinearity are eliminated as the factors are orthogonal to each other

Commercially available statistics packages can be used to conduct this analysis. Excel does not have a built-in capability to conduct Factor Analysis.

Factor Tables The parameters (coefficients) of the linear function between observed variables and unobserved factors are provided in the output table:

Variables              Factors
                       Luxury    Factor 2  Factor 3
Prestige               0.7655    0.1242    0.3343
Strong Brand           0.9876    0.3423    0.5684
Variable 3             0.4566    0.4533    0.8977
Variable 4             0.3424    0.9856    0.3455
…                      0.4666    0.6753    0.3453
World-class Service    0.7643    0.2342    0.5564
Value for Money        0.1226    0.4674    0.7896
…                      0.6773    0.3433    0.8996
…                      0.3453    0.8772    0.3453



Variables Each variable is weighted proportionately to its involvement in the factor. The more involved a variable, the higher the score (positive or negative depending on the direction of relation). Scores on multiple variables of each sample can be converted to a limited number of factors using a linear equation derived from a factor loading table:

Factors The factor scores are unobserved and abstract; therefore, they cannot be interpreted directly.

Once we have factor scores, we can use them as independent variables in regression as follows:

Types of Factor Analysis

• Exploratory Factor Analysis

• Confirmatory Factor Analysis

Applications of Factor Analysis Some of the business situations in which Factor Analysis is used are:

• Behavioral sciences and psychometrics

• Social sciences

• Marketing

• Product management

• Operations research

• Other applied sciences that deal with large quantities of data



Overview of Clustering

Clustering is used to identify the intrinsic grouping in a set of objects and classify them into relatively homogeneous groups (called clusters) so that objects from the same cluster are more similar to each other than objects from different clusters.

Cluster Dendrogram A dendrogram is a graphical representation of a hierarchy of nested cluster solutions starting from a one-cluster solution all the way through to an n-cluster solution.

Drawing a perpendicular line through the dendrogram corresponding to a particular distance shows the cluster solution at that level of distance.
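The idea of cutting a dendrogram at a chosen distance can be sketched with a tiny hierarchical (agglomerative) procedure. The version below uses single linkage, which is one assumed choice among several linkage rules, on hypothetical one-dimensional data.

```python
def linkage_distance(a, b):
    """Single linkage: distance between the closest members of two clusters."""
    return min(abs(x - y) for x in a for y in b)

def agglomerate(points, cut_distance=float("inf")):
    """Merge the closest clusters until the next merge would exceed cut_distance.

    Returns the clusters at the cut and the distances (dendrogram heights)
    at which merges took place.
    """
    clusters = [(p,) for p in points]
    heights = []
    while len(clusters) > 1:
        dist, i, j = min(
            (linkage_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > cut_distance:
            break
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        heights.append(dist)
    return sorted(clusters), heights

points = [1.0, 1.5, 5.0, 5.2, 9.0]  # hypothetical observations
clusters_at_1, _ = agglomerate(points, cut_distance=1.0)
_, all_heights = agglomerate(points)
print(clusters_at_1)
print([round(h, 1) for h in all_heights])
```

Cutting at a distance of 1.0 leaves three clusters, while letting the procedure run to completion records the four merge heights that a dendrogram would display.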

Methods of Clustering

• Hierarchical Methods

• Partitioning Methods

Applications of Clustering

• Market segmentation

• Market structure analysis

• Petroleum geology

• Data mining

• Pattern recognition

• Image analysis

• Biology and numerical taxonomy

Overview of Discriminant Analysis The objective of Discriminant Analysis is to classify objects (people, items, etc.) into two or more groups based on the features of the objects.



Approaches to Discriminant Analysis Discriminant analysis is an analysis of dependence method where the dependent variable is categorical in nature, dividing the set of observations into mutually exclusive and collectively exhaustive groups.

A categorical variable classifies objects into categories (e.g., good/bad, high/medium/low, etc.). Typically, G – 1 variables (each a binary indicator) describe membership in G mutually exclusive and collectively exhaustive groups. The output of discriminant analysis is an equation (similar to the regression equation) involving independent variables which calculate the discriminant score, and also a cut-off score to identify membership of each of the items into groups. Commercially available statistics packages can be used to conduct this analysis.

Discriminant Analysis Tool Output – Standard Form The canonical discriminant function is used to calculate a score for each object whose group membership we want to predict. The group to which the object belongs is decided by comparing its score with a calculated cut-off score.

Canonical Discriminant Function Coefficients:

Functions at Group Centroids:

For each object, the discriminant score can be calculated using the equation. This score can be compared to the cut-off score to determine into which group the item can be classified.
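The scoring and cut-off comparison can be sketched as follows. Everything numeric here is hypothetical (the coefficients, the group centroids and the applicants), and the cut-off is taken as the midpoint of the two group centroids, which is an assumption that suits two groups of equal size.

```python
def discriminant_score(constant, coefficients, features):
    """Score = constant + sum of coefficient x feature value."""
    return constant + sum(coefficients[name] * features[name] for name in coefficients)

def classify(score, centroids):
    """Assign the group whose side of the midpoint cut-off the score falls on."""
    (group_a, centroid_a), (group_b, centroid_b) = centroids.items()
    cutoff = (centroid_a + centroid_b) / 2
    high, low = (group_a, group_b) if centroid_a > centroid_b else (group_b, group_a)
    return high if score > cutoff else low

# Hypothetical tool output for a two-group credit scoring example.
CONSTANT = -4.0
COEFFICIENTS = {"income": 0.05, "years_as_customer": 0.3}
CENTROIDS = {"good credit": 1.2, "bad credit": -1.2}

applicants = [
    {"income": 60, "years_as_customer": 5},
    {"income": 40, "years_as_customer": 2},
]
labels = [
    classify(discriminant_score(CONSTANT, COEFFICIENTS, person), CENTROIDS)
    for person in applicants
]
print(labels)
```

The first applicant scores above the cut-off of 0 and is placed in the "good credit" group, while the second scores below it. With unequal group sizes a weighted cut-off would normally be used instead.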

Common Methods of Discriminant Analysis

• Fisher’s Approach

• Mahalanobis' Approach

Applications of Discriminant Analysis

• Product management

• Marketing research

• Bankruptcy prediction

• Credit scoring

• Face recognition



Solutions

Solution 1.1: Let the diet contain x units of A and y units of B. Total cost = 2x + 4y

Objective Function: Minimize Z = 2x + 4y

Constraints:

10x + 25y ≥ 200

20x + 10y ≥ 100

15x + 20y ≥ 150

x ≥ 0, y ≥ 0

Solution 1.2: Step 1: Since x ≥ 0 and y ≥ 0, we consider only the first quadrant of the xy-plane

Step 2: We draw straight lines for the equations

2x + y = 100

x + y = 80

To determine two points on the straight line 2x + y = 100:

Put y = 0: 2x = 100, so x = 50

(50, 0) is a point on the line

Put x = 0: y = 100

(0, 100) is the other point on the line

Plotting these two points on graph paper, draw the line that represents 2x + y = 100.



This line divides the first quadrant into two regions, say R1 and R2. Choose a point, say (1, 0), in R1. Since (1, 0) satisfies the inequality 2x + y ≤ 100, R1 is the required region for the constraint 2x + y ≤ 100.

Similarly, draw the straight line x + y = 80 by joining the points (0, 80) and (80, 0). Find the required region, say R1', for the constraint x + y ≤ 80.

The intersection of the regions R1 and R1' is the feasible region of the Linear Programming problem. Therefore every point in the shaded region OABC is a feasible solution, since it satisfies all the constraints, including the non-negativity constraints.



Solution 2.1:

Solution 2.2:



Solution 2.3:

Solution 3.1: If the possible outcomes for an experiment are a1, a2, . . .,an, and if the probabilities of these outcomes are p1, p2, . . ., pn then the expected value is

E = a1 p1 + a2 p2 + … + an pn

Expected Value E = 0(0.50) + 1(0.40) + 2(0.10) = 0.6
