How to Measure Anything – ppxppx.ca/wp-content/uploads/2016/05/plenary-douglas-hubbard.pdf
TRANSCRIPT
Hubbard Decision Research 2 South 410 Canterbury Ct Glen Ellyn, Illinois 60137
www.hubbardresearch.com
© Hubbard Decision Research, 2015
A Presentation for PPX
How to Measure
Anything
Two Related Questions
• Are some things really immeasurable?
• Which decision-making methods work – or can be improved?
Your Most Important Decision
The Meta Decision: The decision you make about how you are going to make decisions
Can Analysis Or Expertise Be A “Placebo”?
Gathering more information makes you feel more confident but, at some point, begins to reduce decision quality while confidence continues to increase.
Examples from research:
• Collecting data on horse races to predict outcomes (Tsai, Klayman, Hastie)
• Interaction with others to improve project estimates (Heath, Gonzalez)
• Collecting more data about investments to improve returns (Andreassen)
In short, we should assume increased confidence from analysis is a “placebo”. Real benefits have to be measured.
[Chart: actual performance vs. analysis effort]
Decisions and Meta Decisions
Decision Methods
• Expert intuition
• Some kind of “scoring” method
• Voting
• Consensus
• Accounting-style estimates
• Statistical methods & other advanced decision analysis

The Meta-Decision Criteria
• Is there evidence it improves estimates and decisions?
• Is there evidence it makes estimates and decisions worse?
• Is the evidence based on large controlled experiments or just anecdotes or “industry best practices”?
• Does it quantify risk?
• What measurements does it promote and why?
• Is it practical – have organizations learned and applied it?
Expert vs. Quantitative Models

Paul Meehl assessed 150 studies comparing human experts to statistical models in many fields (predicting football games, the prognosis of liver disease, etc.).

“There is no controversy in social science which shows such a large body of qualitatively diverse studies coming out so uniformly in the same direction as this one. [With hardly] a half dozen studies showing even a weak tendency in favor of the [human expert], it is time to draw a practical conclusion.”

Philip Tetlock tracked a total of over 82,000 forecasts from 284 political experts in a 20-year study covering elections, policy effects, wars, the economy and more.

“It is impossible to find any domain in which humans clearly outperformed crude extrapolation algorithms, less still sophisticated statistical ones.”
Reasons For Not Measuring
Have you heard (or said) any of these?
“We don’t have sufficient data…”
“Each situation is too unique and complex to apply scientific analysis of historical data...”
“There is too much error and bias in the data for it to be worth the effort to gather it...”
“There are so many factors affecting this, this measurement alone tells us nothing...”
The implied (and unjustified) conclusion from each of these is….
“…therefore we are better off relying on our experience.”
Do “Scores” and “Scales” Work?
Researchers uncovered several unintended consequences of simple ordinal scales and using words for probabilities:
• Scales obscure (rather than alleviate) the lack of information (Budescu)
• Arbitrary partitions have unexpected effects on scoring behavior (Fox)
• The added error makes them “worse than useless” (Cox)

[Chart: 23 NATO officers’ estimates of probabilities for events described using common terms used in communicating likelihoods in intelligence reports (e.g. “War between X and Y is…”), spread across roughly 10%–90%. Excerpt from: Richards Heuer, The Psychology of Intelligence Analysis, Center for the Study of Intelligence, CIA, 1999]
The Misunderstandings Behind an “Immeasurable”
• CONCEPT of measurement – the definition of measurement itself is widely misunderstood.
• OBJECT of measurement – the thing being measured is not well defined.
• METHOD of measurement – many procedures of empirical observation are misunderstood.
The Concept of Measurement
• Measurement: A quantitatively expressed reduction in uncertainty based on observation
The Object of Measurement
• What does it mean?
• Why do you care about it?
• What do you see when there is more of it?
• Are there lots of parts you need to identify?
Methods of Measurement
1. A sample of 5:
– Suppose you are extremely uncertain about how much time per day is spent in some activity in a company of 10,000 people.
– Imagine you randomly sample 5 people out of the company and they spend an amount of time in this activity as shown by the data points below.
– Is this statistically significant?
– Is it possible to estimate the chance the median time spent per person per day is between 15 and 40 minutes?

2. A sample of one:
– Imagine a crate full of marbles.
– Green marbles make up a randomly chosen share (a uniform distribution of 0%–100%); the rest are red.
– If you randomly choose one marble without seeing the rest, and it turns out to be red, what is the chance the majority are red?

[Chart: minutes per day in activity X for the five sampled people]
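Both questions have exact answers, which a short simulation can confirm. This is a minimal sketch (not from the slides): the first part checks the "Rule of Five" (with 5 random samples, the population median lies between the smallest and largest sample 93.75% of the time), the second checks the single-marble crate (the answer is 75%).

```python
import random

random.seed(42)

# Two Monte Carlo checks of the slide's questions. Both answers follow from
# theory alone; the simulation just confirms the counter-intuitive math.
trials = 100_000

# 1. The "Rule of Five": each sampled value falls below the population median
#    with probability 1/2, so the median is bracketed by the sample's min and
#    max unless all 5 fall on the same side: 1 - 2*(1/2)^5 = 93.75%.
hits = 0
for _ in range(trials):
    below = [random.random() < 0.5 for _ in range(5)]
    if any(below) and not all(below):
        hits += 1
rule_of_five = hits / trials
print(f"Rule of Five: {rule_of_five:.3f} (theory: 0.9375)")

# 2. The single-marble crate: the green share is uniform on [0, 1]; one
#    randomly drawn marble turns out red. P(majority red | one red draw) = 75%.
red_draws = majority_red = 0
for _ in range(trials):
    green_share = random.random()
    if random.random() > green_share:      # this draw came up red
        red_draws += 1
        if green_share < 0.5:              # reds are the majority
            majority_red += 1
print(f"Marble crate: {majority_red / red_draws:.3f} (theory: 0.75)")
```

The point of both examples is that even a tiny sample, far short of any "statistical significance" threshold, measurably reduces uncertainty.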
Your Intuition About Sample Information Is Wrong
“Our thesis is that people have strong intuitions about random sampling…these intuitions are wrong in fundamental respects...[and] are shared by naive subjects and by trained scientists”
– Amos Tversky and Daniel Kahneman, Psychological Bulletin, 1971
• Experts are not immune to widely held misconceptions about probabilities and statistics – especially if they vaguely remember some college stats.
• These misconceptions lead many experts to believe they lack data for assessing uncertainties or they need some ideal amount before anything can be inferred.
Can We Improve Subjective Expert Judgement?
• Experts have an important role in estimates and decisions – but important biases and errors need to be considered.
• Fortunately, we can reduce those errors with training and mathematical assistance.
“Experience is inevitable. Learning is not.” Decision science researcher Paul Schoemaker
Monte Carlo: How to Model Uncertainty in Decisions
• Simply put, Monte Carlo Models approximate the probability of certain outcomes by running multiple trial runs, called simulations, using random variables.
• In the oil industry there is a correlation between the use of quantitative risk analysis methods and financial performance – and the improvement started after using the quantitative methods. (F. Macmillan, 2000)
• Data at NASA from over 100 space missions showed that Monte Carlo simulations beat other methods for estimating cost, schedule and risks (I published this in The Failure of Risk Management and OR/MS Today).
• More about Monte Carlo simulations in Module A-3
[Diagram: uncertain inputs shown as ranges – interest or discount rate (4%–8%), increase in profits ($30–$70MM), reduction in costs ($20–$40MM), gains in productivity (10%–30%) – combining into an NPV estimate ($1M–$5M, ?)]
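The slide's cascade of ranges can be sketched as a Monte Carlo in a few lines. This is a minimal illustration, not the slide's actual model: the 90% confidence intervals and the $100MM up-front cost are invented assumptions, and each input is sampled as a normal whose 90% interval matches the given range.

```python
import random

random.seed(1)

# Minimal Monte Carlo: sample each uncertain input from a normal whose
# 90% CI matches an expert range, then combine into a 5-year NPV.
# All ranges and the $100MM up-front cost are illustrative assumptions.
def from_90ci(lo, hi):
    """Sample a normal distribution with the given 90% confidence interval."""
    mean = (lo + hi) / 2
    sd = (hi - lo) / 3.29            # a 90% CI spans ~3.29 standard deviations
    return random.gauss(mean, sd)

trials = 10_000
npvs = []
for _ in range(trials):
    rate = from_90ci(0.04, 0.08)                 # discount rate
    annual_gain = (from_90ci(30, 70)             # increase in profits ($MM)
                   + from_90ci(20, 40))          # reduction in costs ($MM)
    npv = sum(annual_gain / (1 + rate) ** t for t in range(1, 6)) - 100
    npvs.append(npv)

npvs.sort()
print(f"Median NPV: ${npvs[trials // 2]:.0f}MM")
print(f"P(NPV < 0): {sum(v < 0 for v in npvs) / trials:.1%}")
```

The output is a distribution of outcomes rather than a single point estimate – which is exactly what lets you quantify risk (e.g. the chance the NPV is negative).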
Overconfidence
“Overconfident professionals sincerely believe they have expertise, act as experts and look like experts. You will have to struggle to remind yourself that they may be in the grip of an illusion.”
Daniel Kahneman, Psychologist, Economics Nobel
“It’s not what you don’t know that will hurt you, it’s what you know that ain’t so.”
Mark Twain
1997 Calibration Experiment
• In January 1997, Doug conducted a calibration training experiment with 16 IT industry analysts and 16 CIOs to test if calibrated people were better at putting odds on uncertain future events.
• The analysts were calibrated and all 32 subjects were asked to predict 20 IT industry events.

[Chart: percent correct vs. assessed chance of being correct for Giga analysts and Giga clients, with the “ideal” confidence line, statistical error bands, and the number of responses at each confidence level. Source: Hubbard Decision Research]
Measuring & Removing Estimation Inconsistency
• No matter how much experience experts have, they appear to be unable to apply what they learned consistently.
• Methods that statistically “smooth” their estimates show reduced error in several studies for many different kinds of problems.*
[Chart: reduction in errors (0%–30%) from statistically smoothing first estimates into second estimates, across HDR studies (R&D portfolio priorities, battlefield fuel forecasts, IT portfolio priorities) and other published studies (cancer patient recovery, changes in stock prices, mental illness prognosis, psychology course grades, business failures)]
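One common way to "smooth" expert estimates is a judgment-bootstrapping regression: fit a simple model to the expert's own judgments, then replace each judgment with the model's output, which strips out case-by-case inconsistency. The sketch below is illustrative only – the single cue (e.g. project size) and the expert's scores are invented data, not from any study cited here.

```python
# A minimal sketch of "smoothing" expert estimates via a judgment-bootstrapping
# regression (pure stdlib). Each case pairs one observable cue with the
# expert's priority score; both columns are illustrative assumptions.
cases = [(3, 62), (5, 55), (8, 41), (2, 70), (6, 58), (4, 60), (7, 47), (1, 66)]

n = len(cases)
mean_x = sum(x for x, _ in cases) / n
mean_y = sum(y for _, y in cases) / n

# Ordinary least squares for a one-cue linear model of the expert's judgments.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in cases)
         / sum((x - mean_x) ** 2 for x, _ in cases))
intercept = mean_y - slope * mean_x

# The "smoothed" estimates: the model's reconstruction of the expert's
# judgment policy, minus the expert's random inconsistency.
smoothed = [intercept + slope * x for x, _ in cases]

# Residual variance is the inconsistency the smoothing removes.
y_var = sum((y - mean_y) ** 2 for _, y in cases) / n
resid_var = sum((y - s) ** 2 for (_, y), s in zip(cases, smoothed)) / n
print(f"slope={slope:.2f}, variance removed: {1 - resid_var / y_var:.0%}")
```

The surprising research result the slide cites is that this model of the expert often predicts outcomes better than the expert it was fitted to.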
Applied Information Economics
• AIE is a practical application of quantitative methods to decision analysis problems.
• Goal: optimizing uncertainty reduction – balancing measurably improved decisions against analysis effort.
• It answers two questions:
– Given the current uncertainty, what is the best decision?
– What additional analysis or measurements are justified?
Uses of Applied Information Economics
AIE was applied initially to IT business cases. But over the last 20 years it has also been applied to other decision analysis problems in all areas of business cases, performance metrics, risk analysis, and portfolio prioritization.

IT
• Prioritizing IT portfolios
• Risk of software development
• Value of better information
• Value of better security
• Risk of obsolescence and optimal technology upgrades
• Value of infrastructure
• Performance metrics for the business value of applications

Engineering
• Risks of major engineering projects
• Risk of mine flooding

Business
• Movie / film project selection
• New product development
• Pharmaceuticals
• Medical devices
• Publishing
• Real estate

Government & Non-Profit
• Environmental policy
• Sustainable agriculture
• Procurement methods
• Grants management

Military
• Forecasting battlefield fuel consumption
• Effectiveness of combat training to reduce roadside bomb / IED casualties
• R&D portfolios
The Value of Information
The Formula For The Value of Information:

EVI = [ Σ_{j=1..z} p(r_j) · max{ V_1·p(Θ_1|r_j), V_2·p(Θ_2|r_j), …, V_k·p(Θ_k|r_j) } ] − EV_i*

where r_1…r_z are the possible measurement results, Θ_1…Θ_k the possible states, V_i the payoff of decision alternative i, and EV_i* the expected value of the best alternative before the measurement.

…or, in its simplest form: “The cost of being wrong times the chance of being wrong”
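The simplest form turns into arithmetic directly. The numbers below are illustrative assumptions, not from the slides:

```python
# The "simplest form" of information value: expected opportunity loss,
# i.e. the chance of being wrong times the cost of being wrong.
# Both inputs below are illustrative assumptions.
chance_of_being_wrong = 0.25     # e.g. P(the approved project fails)
cost_of_being_wrong = 400_000    # loss if it is approved and it fails ($)

# This is the Expected Value of Perfect Information: an upper bound on
# what any measurement of this variable could possibly be worth.
evpi = chance_of_being_wrong * cost_of_being_wrong
print(f"EVPI = ${evpi:,.0f}")    # prints: EVPI = $100,000
```

Here the EVPI comes out to $100,000: a measurement that reduces the uncertainty costs less than that, or it is not worth doing.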
The Measurement Inversion
In a business case, the economic value of measuring a variable is usually inversely proportional to the measurement attention it typically gets.

[Diagram: variables ranked from most measured / lowest information value to least measured / highest information value – initial cost; long-term costs; cost saving benefit other than labor productivity; labor productivity; revenue enhancement; technology adoption rate; project completion]
Real Examples of Measurement Inversion
Subject | What they would have measured | What they needed to measure
New procurement system for government | Detailed “time and motion” study of the procurement process | The price savings from using reverse auctions
Battlefield fuel forecasting | Chance of enemy contact, forecasts of vehicle maintenance | The difference in mileage between paved and gravel roads
Risks of flooding in mining operations | Drilling test holes all over the mine | How much water the main pumps can handle
Market for new pharmaceutical products | The adoption rate of the new drug in all global regions | The duration of phase 1 testing, chance of a particular clinical outcome
Impact of pesticide regulation | The value of saving endangered species | Whether pesticide regulation ever saves any endangered species
IT security | People who attended training, external threats | Internal theft incidents
Increasing Value & Cost of Info.
• EVPI – Expected Value of Perfect Information
• ECI – Expected Cost of Information
• EVI – Expected Value of Information

[Chart: value or cost ($0 to $$$) vs. certainty (low to high) – the EVI curve approaches the EVPI (perfect information) ceiling while the ECI curve rises; an “aim for this range” region is marked]
Making the Best Decisions*

1. Define the Decision – Identify relevant variables and set up the “business case” for the decision using these variables.
2. Model the Current State of Uncertainty – Initially use calibrated estimates (hence the calibration training) and then actual measurements.
3. Compute the Value of Additional Information – Determine what to measure and how much effort to spend on measuring it.
4. Is there significant value to more information?
– Yes: Measure where the information value is high – reduce uncertainty using any of the methods – then return to step 3.
– No: Optimize the Decision – Use the quantified risk/return boundary of the decision makers to determine which decision is preferred.

*Covered more in all subsequent modules
Inconsistent Risk Judgements
• Studies have shown risk aversion changes due to what should be irrelevant external factors, including:
– Being around smiling people
– Recalling an event causing fear
– Recalling an event causing anger
– A recent win in an unrelated decision
– A recent loss in an unrelated decision
Quantifying Risk Aversion
Acceptable Risk/Return Boundary
• The simplest element of Harry Markowitz’s Nobel Prize-winning method “Modern Portfolio Theory” is documenting how much risk an investor accepts for a given return.
• The “Investment Boundary” states how much risk an investor is willing to accept for a given return.
• For our purposes, we modified Markowitz’s approach a bit.
[Chart: investment boundary – expected IRR over 5 years (0%–200%) vs. chance of a negative IRR (0%–50%), dividing the region of acceptable investments from the region of unacceptable investments]
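A documented boundary like this becomes a mechanical acceptance test. The sketch below is illustrative only: the boundary points (maximum acceptable chance of a negative IRR at each expected return) are invented, and the boundary is linearly interpolated between them.

```python
# A minimal sketch of checking investments against a quantified risk/return
# boundary. Each point pairs an expected 5-year IRR with the maximum
# acceptable chance of a negative IRR; the points are illustrative assumptions.
boundary = [(0.10, 0.05), (0.50, 0.20), (1.00, 0.35), (2.00, 0.50)]

def max_acceptable_risk(expected_irr):
    """Interpolate the boundary: highest tolerable P(IRR < 0) at this return."""
    if expected_irr <= boundary[0][0]:
        return boundary[0][1]
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        if expected_irr <= x1:
            t = (expected_irr - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return boundary[-1][1]

def acceptable(expected_irr, p_negative_irr):
    return p_negative_irr <= max_acceptable_risk(expected_irr)

print(acceptable(0.75, 0.15))   # True: risk is below the boundary at a 75% IRR
print(acceptable(0.30, 0.30))   # False: too risky for a 30% IRR
```

Once the decision makers have stated the boundary, every investment with a quantified return and risk can be classified without re-litigating risk tolerance case by case.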
Real Benefits: Life Technologies
• An HDR client, Life Technologies, was forecasting revenue of new products in the biotech lab equipment industry.
• Their own analysis of our models showed that both random error and systematic error (consistently overestimating revenue) were reduced.
• Total reduction of error for forecasting revenue was 76%.
[Chart: ratio of estimated vs. actual revenue – experts vs. the AIE model]
Key Advantages of AIE
Every component of AIE is based on methods that showed measurable improvements over expert intuition across a large number of trials, as reported in peer-reviewed journals.
AIE explicitly addresses the measurement inversion problem by computing the value of information as a basis for all measurements.
AIE quantifies uncertainty and risk in a manner that is mathematically meaningful (i.e. can be used in probabilistic models).
With over 95 examples from a variety of industries, the method has become well-defined and repeatable.
Parting Thought
• Your most important decision is how to make decisions
• Misconceptions about measurements get in the way of improving decisions
• Many popular methods have been effectively “debunked” by the research
• Methods that show a measurable improvement exist and are practical
Questions?
Contact: Doug Hubbard Hubbard Decision Research [email protected] www.hubbardresearch.com 630 858 2788
Reactions: Fuel for the Marines
“The biggest surprise was that we can save so much fuel. We freed up vehicles because we didn’t have to move as much fuel. For a logistics person that's critical. Now vehicles that moved fuel can move ammunition.”
– Luis Torres, Fuel Study Manager, Office of Naval Research
“What surprised me was that [the model] showed most fuel was burned on logistics routes. The study even uncovered that tank operators would not turn tanks off if they didn’t think they could get replacement starters. That’s something that a logistician in 100 years probably wouldn’t have thought of.”
– Chief Warrant Officer Terry Kunneman, Bulk Fuel Planning, HQ Marine Corps
Selected Sources
• Tsai, C., Klayman, J., Hastie, R., “Effects of Amount of Information on Judgment Accuracy and Confidence,” Organizational Behavior and Human Decision Processes, Vol. 107, No. 2, 2008, pp. 97–105
• Heath, C., Gonzalez, R., “Interaction with Others Increases Decision Confidence but Not Decision Quality: Evidence against Information Collection Views of Interactive Decision Making,” Organizational Behavior and Human Decision Processes, Vol. 61, No. 3, 1995, pp. 305–326
• Andreassen, P., “Judgmental Extrapolation and Market Overreaction: On the Use and Disuse of News,” Journal of Behavioral Decision Making, Vol. 3, No. 3, 1990, pp. 153–174
• Williams, M., Dennis, A., Stam, A., Aronson, J., “The Impact of DSS Use and Information Load on Errors and Decision Quality,” European Journal of Operational Research, Vol. 176, No. 1, 2007, pp. 468–481
• Knutson, B., et al., “Nucleus Accumbens Activation Mediates the Influence of Reward Cues on Financial Risk Taking,” NeuroReport, Vol. 19, No. 5, March 2008, pp. 509–513
• A small study presented at the Cognitive Neuroscience Society meeting in 2009 by a grad student at the University of Michigan showed that simply being briefly exposed to smiling faces makes people more risk tolerant in betting games.
• Risk preferences show a strong correlation to testosterone levels – which change daily (Sapienza, Zingales, Maestripieri, 2009).
• Recalling past events that involved fear and anger changes the perception of risk (Lerner, Keltner, 2001).
Calibration Exercise: Ranges
For the following questions, provide a range (an upper and lower bound) that you are 90% certain contains the correct answer:
Lower Bound
Upper Bound
Napoleon Bonaparte was born what year?
What is the average weight of an adult male African elephant (tons)?
The Coliseum in Rome held how many spectators?
How many countries are in NATO?
In what year did Newton publish the Laws of Gravitation?
Calibration Exercise: True/False
For each statement below, answer whether you believe it is true or false and provide a percentage confidence that your answer is correct. Confidence is any value from 50% (“no idea”) to 100% (certainty).

True or False? | % Confidence

Brazil has a larger population than Spain.
A hockey puck will fit in a golf hole.
The Yangtze River is the longest river in Asia.
Mars is always further away from Earth than Venus is from Earth.
The movie Titanic still holds the record for box office receipts in the first six weeks.
Challenges to Capturing Uncertainty
• Experts are systematically overconfident
• Experts are highly inconsistent
• Proper probabilistic inferences are counter-intuitive (optional content)
Calibration Answers

Napoleon Bonaparte was born what year? 1769
What is the average weight of an adult male African elephant (tons)? 3.5 tons
The Coliseum in Rome held how many spectators? 50,000
How many countries are in NATO? 28
In what year did Newton publish the Laws of Gravitation? 1687

True or False?
Brazil has a larger population than Spain. True
A hockey puck will fit in a golf hole. True
The Yangtze River is the longest river in Asia. True
Mars is always further away from Earth than Venus is from Earth. False
The movie Titanic still holds the record for box office receipts in the first six weeks. False
“Low Resolution” Example of Bayes for Ranges
• You can apply Bayes to estimating parameters (e.g. mean, median, etc.) of populations with continuous values (income, leisure time per week, etc.)
• Just identify possible population distribution types and the probability of each type – resolution can be as fine as you like.
[Chart: probability of each candidate distribution type (A–D) vs. number of samples]
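A toy version of this "low resolution" Bayes can be written in a few lines: a handful of candidate population distributions with prior probabilities, updated one sample at a time. The four candidate normals and the observed samples below are invented for illustration.

```python
import math

# "Low resolution" Bayes over distribution types: four candidate population
# distributions (A-D) with equal priors, updated one sample at a time.
# The candidate normals and the samples are illustrative assumptions.
candidates = {"A": (20, 5), "B": (30, 10), "C": (45, 10), "D": (60, 15)}
priors = {name: 0.25 for name in candidates}

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def update(beliefs, sample):
    """One Bayes step: posterior is prior times likelihood, renormalized."""
    post = {name: p * normal_pdf(sample, *candidates[name])
            for name, p in beliefs.items()}
    total = sum(post.values())
    return {name: v / total for name, v in post.items()}

for x in [28, 33, 31]:                 # three observed samples
    priors = update(priors, x)
print({name: round(p, 3) for name, p in priors.items()})
```

After just three samples near 30, the posterior concentrates on candidate B (mean 30): the "resolution" here is only four distribution types, but it can be made as fine as you like.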
A Google Forecast
• Using both Google Trends and previous Bureau of Labor Statistics data produces a powerful predictive model for unemployment
[Chart: unemployment index, 2005–2010 (60%–160%) – a prediction based on Google searches on “unemployment” tracks the BLS report released 3–4 weeks later]
Facebook Forecasts
• The sampling bias of social data may actually be desirable.
• Studies done at Harvard Medical Center show how to forecast – not just track – flu outbreaks using friend connections.
• Tracking more connected people can be better than tracking the original randomly selected people.
• A group of people who are nominated friends (who tend to be more popular) exhibited what they called a “friend shift” in the data.
Twitter Forecasts
Other predictions:
• Dow Jones
• Presidential approval ratings
• Consumer confidence

[Chart: predicted revenue ($M) vs. actual revenue ($M) – the Hollywood Stock Exchange (HSX) forecasts vs. Twitter counts; correlation = 0.9]
How Uncertainty Changes With Sample Size
• Management seems to vaguely recall (incorrectly) some concepts with names like “statistical significance”
• But, in fact, there is pervasive misuse of these terms – all without doing any math
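Doing the math takes only a few lines. The sketch below is illustrative, not from the slides: it draws samples of increasing size from an invented normal population (mean 30, sd 12, standing in for "minutes per day" style data) and shows the 90% confidence interval on the mean narrowing roughly as 1/sqrt(n) – no fixed "significance" threshold is needed before anything can be inferred.

```python
import math
import random
import statistics

random.seed(7)

# An invented population: "minutes per day" style data, normal(30, 12).
population = [random.gauss(30, 12) for _ in range(10_000)]

z90 = 1.645          # normal z-score for a 90% interval
widths = {}
for n in [5, 20, 80, 320]:
    sample = random.sample(population, n)
    mean = statistics.mean(sample)
    # Standard error of the mean shrinks as 1/sqrt(n), and the CI with it.
    sem = statistics.stdev(sample) / math.sqrt(n)
    widths[n] = 2 * z90 * sem
    print(f"n={n:>3}: 90% CI = {mean - z90 * sem:6.1f} .. {mean + z90 * sem:6.1f}")
```

Even n=5 yields a usable interval; each quadrupling of the sample roughly halves its width.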
A Little Background
• How to Measure Anything*
– There are no real “immeasurables.”
– Computing the value of information radically changes what you measure.
• The Failure of Risk Management*
– Popular risk management and risk analysis methods show no evidence of improving decisions – and may make things worse.
• Pulse
– Vast data from social media and other places on the internet and mobile devices creates a new, powerful measurement instrument.

*Required reading in:
• Society of Actuaries exam prep
• Courses at multiple universities including Harvard, Stanford, MIT, and the Naval Postgraduate School