How to Measure Anything – ppxppx.ca/wp-content/uploads/2016/05/plenary-douglas-hubbard.pdf
TRANSCRIPT
Hubbard Decision Research 2 South 410 Canterbury Ct Glen Ellyn, Illinois 60137
www.hubbardresearch.com
© Hubbard Decision Research, 2015
A Presentation for PPX
How to Measure
Anything
Two Related Questions
• Are some things really immeasurable?
• Which decision-making methods work – or can be improved?
Your Most Important Decision
The Meta Decision: The decision you make about how you are going to make decisions
Can Analysis Or Expertise Be A “Placebo”?
Gathering more information makes you feel more confident but, at some point, begins to reduce decision quality while confidence continues to increase.
Examples from research:
• Collecting data on horse races to predict outcomes (Tsai, Klayman, Hastie)
• Interaction with others to improve project estimates (Heath, Gonzalez)
• Collecting more data about investments to improve returns (Andreassen)
In short, we should assume increased confidence from analysis is a “placebo”. Real benefits have to be measured.
[Chart: actual performance vs. analysis effort]
Decisions and Meta Decisions
Decision Methods
• Expert intuition
• Some kind of “scoring” method
• Voting
• Consensus
• Accounting-style estimates
• Statistical methods & other advanced decision analysis

The Meta-Decision Criteria
• Is there evidence it improves estimates and decisions?
• Is there evidence it makes estimates and decisions worse?
• Is the evidence based on large controlled experiments or just anecdotes or “industry best practices”?
• Does it quantify risk?
• What measurements does it promote and why?
• Is it practical – have organizations learned and applied it?
Expert vs. Quantitative Models

Paul Meehl assessed 150 studies comparing human experts to statistical models in many fields (predicting football games, the prognosis of liver disease, etc.).

“There is no controversy in social science which shows such a large body of qualitatively diverse studies coming out so uniformly in the same direction as this one. [With hardly] a half dozen studies showing even a weak tendency in favor of the [human expert], it is time to draw a practical conclusion.”

Philip Tetlock tracked a total of over 82,000 forecasts from 284 political experts in a 20-year study covering elections, policy effects, wars, the economy and more.

“It is impossible to find any domain in which humans clearly outperformed crude extrapolation algorithms, less still sophisticated statistical ones.”
Reasons For Not Measuring
Have you heard (or said) any of these?
“We don’t have sufficient data…”
“Each situation is too unique and complex to apply scientific analysis of historical data...”
“There is too much error and bias in the data for it to be worth the effort to gather it...”
“There are so many factors affecting this, this measurement alone tells us nothing...”
The implied (and unjustified) conclusion from each of these is….
“…therefore we are better off relying on our experience.”
Do “Scores” and “Scales” Work?
Researchers uncovered several unintended consequences of simple ordinal scales and using words for probabilities:
• Scales obscure (rather than alleviate) the lack of information (Budescu)
• Arbitrary partitions have unexpected effects on scoring behavior (Fox)
• The added error makes them “worse than useless” (Cox)

[Chart: 23 NATO officers’ estimates of probabilities for events described using common terms used in communicating likelihoods in intelligence reports (e.g. “War between X and Y is…”), spread across roughly 10%–90%. Excerpt from: Richards Heuer, The Psychology of Intelligence Analysis, Center for the Study of Intelligence, CIA, 1999]
The Misunderstandings Behind an “Immeasurable”
• CONCEPT of measurement – the definition of measurement itself is widely misunderstood.
• OBJECT of measurement – the thing being measured is not well defined.
• METHOD of measurement – many procedures of empirical observation are misunderstood.
The Concept of Measurement
• Measurement: A quantitatively expressed reduction in uncertainty based on observation
The Object of Measurement
• What does it mean?
• Why do you care about it?
• What do you see when there is more of it?
• Are there lots of parts you need to identify?
Methods of Measurement
1. A sample of 5:
– Suppose you are extremely uncertain about how much time per day is spent in some activity in a company of 10,000 people.
– Imagine you randomly sample 5 people out of the company and they spend an amount of time in this activity as shown by the data points below.
– Is this statistically significant?
– Is it possible to estimate the chance the median time spent per person per day is between 15 and 40 minutes?

2. A sample of one:
– Imagine a crate full of marbles.
– Green marbles make up a randomly chosen share (a uniform distribution of 0%–100%); the rest are red.
– If you randomly choose one marble without seeing the rest, and it turns out to be red, what is the chance the majority are red?

[Chart: minutes per day in activity X for the five sampled people]
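Both questions have exact answers, which a short simulation can confirm. This is a minimal sketch (not from the slides): the first part checks the "Rule of Five" (with 5 random samples, the population median lies between the smallest and largest sample 93.75% of the time), the second checks the single-marble crate (the answer is 75%).

```python
import random

random.seed(42)

# Two Monte Carlo checks of the slide's questions. Both answers follow from
# theory alone; the simulation just confirms the counter-intuitive math.
trials = 100_000

# 1. The "Rule of Five": each sampled value falls below the population median
#    with probability 1/2, so the median is bracketed by the sample's min and
#    max unless all 5 fall on the same side: 1 - 2*(1/2)^5 = 93.75%.
hits = 0
for _ in range(trials):
    below = [random.random() < 0.5 for _ in range(5)]
    if any(below) and not all(below):
        hits += 1
rule_of_five = hits / trials
print(f"Rule of Five: {rule_of_five:.3f} (theory: 0.9375)")

# 2. The single-marble crate: the green share is uniform on [0, 1]; one
#    randomly drawn marble turns out red. P(majority red | one red draw) = 75%.
red_draws = majority_red = 0
for _ in range(trials):
    green_share = random.random()
    if random.random() > green_share:      # this draw came up red
        red_draws += 1
        if green_share < 0.5:              # reds are the majority
            majority_red += 1
print(f"Marble crate: {majority_red / red_draws:.3f} (theory: 0.75)")
```

The point of both examples is that even a tiny sample, far short of any "statistical significance" threshold, measurably reduces uncertainty.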
Your Intuition About Sample Information Is Wrong
“Our thesis is that people have strong intuitions about random sampling…these intuitions are wrong in fundamental respects...[and] are shared by naive subjects and by trained scientists”
– Amos Tversky and Daniel Kahneman, Psychological Bulletin, 1971
• Experts are not immune to widely held misconceptions about probabilities and statistics – especially if they vaguely remember some college stats.
• These misconceptions lead many experts to believe they lack data for assessing uncertainties or they need some ideal amount before anything can be inferred.
Can We Improve Subjective Expert Judgement?
• Experts have an important role in estimates and decisions – but important biases and errors need to be considered.
• Fortunately, we can reduce those errors with training and mathematical assistance.
“Experience is inevitable. Learning is not.” Decision science researcher Paul Schoemaker
Monte Carlo: How to Model Uncertainty in Decisions
• Simply put, Monte Carlo Models approximate the probability of certain outcomes by running multiple trial runs, called simulations, using random variables.
• In the oil industry there is a correlation between the use of quantitative risk analysis methods and financial performance – and the improvement started after using the quantitative methods. (F. Macmillan, 2000)
• Data at NASA from over 100 space missions showed that Monte Carlo simulations beat other methods for estimating cost, schedule and risks (I published this in The Failure of Risk Management and OR/MS Today).
• More about Monte Carlo simulations in Module A-3
[Diagram: uncertain inputs shown as ranges – interest or discount rate (4%–8%), increase in profits ($30–$70MM), reduction in costs ($20–$40MM), gains in productivity (10%–30%) – combining into an NPV estimate ($1M–$5M, ?)]
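The slide's cascade of ranges can be sketched as a Monte Carlo in a few lines. This is a minimal illustration, not the slide's actual model: the 90% confidence intervals and the $100MM up-front cost are invented assumptions, and each input is sampled as a normal whose 90% interval matches the given range.

```python
import random

random.seed(1)

# Minimal Monte Carlo: sample each uncertain input from a normal whose
# 90% CI matches an expert range, then combine into a 5-year NPV.
# All ranges and the $100MM up-front cost are illustrative assumptions.
def from_90ci(lo, hi):
    """Sample a normal distribution with the given 90% confidence interval."""
    mean = (lo + hi) / 2
    sd = (hi - lo) / 3.29            # a 90% CI spans ~3.29 standard deviations
    return random.gauss(mean, sd)

trials = 10_000
npvs = []
for _ in range(trials):
    rate = from_90ci(0.04, 0.08)                 # discount rate
    annual_gain = (from_90ci(30, 70)             # increase in profits ($MM)
                   + from_90ci(20, 40))          # reduction in costs ($MM)
    npv = sum(annual_gain / (1 + rate) ** t for t in range(1, 6)) - 100
    npvs.append(npv)

npvs.sort()
print(f"Median NPV: ${npvs[trials // 2]:.0f}MM")
print(f"P(NPV < 0): {sum(v < 0 for v in npvs) / trials:.1%}")
```

The output is a distribution of outcomes rather than a single point estimate – which is exactly what lets you quantify risk (e.g. the chance the NPV is negative).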
Overconfidence
“Overconfident professionals sincerely believe they have expertise, act as experts and look like experts. You will have to struggle to remind yourself that they may be in the grip of an illusion.”
Daniel Kahneman, Psychologist, Economics Nobel
“It’s not what you don’t know that will hurt you, it’s what you know that ain’t so.”
Mark Twain
1997 Calibration Experiment
• In January 1997, Doug conducted a calibration training experiment with 16 IT industry analysts and 16 CIOs to test if calibrated people were better at putting odds on uncertain future events.
• The analysts were calibrated and all 32 subjects were asked to predict 20 IT industry events.

[Chart: percent correct vs. assessed chance of being correct for Giga analysts and Giga clients, with the “ideal” confidence line, statistical error bands, and the number of responses at each confidence level. Source: Hubbard Decision Research]
Measuring & Removing Estimation Inconsistency
• No matter how much experience experts have, they appear to be unable to apply what they learned consistently.
• Methods that statistically “smooth” their estimates show reduced error in several studies for many different kinds of problems.*
[Chart: reduction in errors (0%–30%) from statistically smoothing first estimates into second estimates, across HDR studies (R&D portfolio priorities, battlefield fuel forecasts, IT portfolio priorities) and other published studies (cancer patient recovery, changes in stock prices, mental illness prognosis, psychology course grades, business failures)]
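One common way to "smooth" expert estimates is a judgment-bootstrapping regression: fit a simple model to the expert's own judgments, then replace each judgment with the model's output, which strips out case-by-case inconsistency. The sketch below is illustrative only – the single cue (e.g. project size) and the expert's scores are invented data, not from any study cited here.

```python
# A minimal sketch of "smoothing" expert estimates via a judgment-bootstrapping
# regression (pure stdlib). Each case pairs one observable cue with the
# expert's priority score; both columns are illustrative assumptions.
cases = [(3, 62), (5, 55), (8, 41), (2, 70), (6, 58), (4, 60), (7, 47), (1, 66)]

n = len(cases)
mean_x = sum(x for x, _ in cases) / n
mean_y = sum(y for _, y in cases) / n

# Ordinary least squares for a one-cue linear model of the expert's judgments.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in cases)
         / sum((x - mean_x) ** 2 for x, _ in cases))
intercept = mean_y - slope * mean_x

# The "smoothed" estimates: the model's reconstruction of the expert's
# judgment policy, minus the expert's random inconsistency.
smoothed = [intercept + slope * x for x, _ in cases]

# Residual variance is the inconsistency the smoothing removes.
y_var = sum((y - mean_y) ** 2 for _, y in cases) / n
resid_var = sum((y - s) ** 2 for (_, y), s in zip(cases, smoothed)) / n
print(f"slope={slope:.2f}, variance removed: {1 - resid_var / y_var:.0%}")
```

The surprising research result the slide cites is that this model of the expert often predicts outcomes better than the expert it was fitted to.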
Applied Information Economics
• AIE is a practical application of quantitative methods to decision analysis problems.
• Goal: optimizing uncertainty reduction – balancing measurably improved decisions against analysis effort.
• It answers two questions:
– Given the current uncertainty, what is the best decision?
– What additional analysis or measurements are justified?
Uses of Applied Information Economics
AIE was applied initially to IT business cases. But over the last 20 years it has also been applied to other decision analysis problems in all areas of business cases, performance metrics, risk analysis, and portfolio prioritization.

IT
• Prioritizing IT portfolios
• Risk of software development
• Value of better information
• Value of better security
• Risk of obsolescence and optimal technology upgrades
• Value of infrastructure
• Performance metrics for the business value of applications

Engineering
• Risks of major engineering projects
• Risk of mine flooding

Business
• Movie / film project selection
• New product development
• Pharmaceuticals
• Medical devices
• Publishing
• Real estate

Government & Non-Profit
• Environmental policy
• Sustainable agriculture
• Procurement methods
• Grants management

Military
• Forecasting battlefield fuel consumption
• Effectiveness of combat training to reduce roadside bomb / IED casualties
• R&D portfolios
The Value of Information
The Formula For The Value of Information:

EVI = [ Σ_{j=1..z} p(r_j) · max{ V_1·p(Θ_1|r_j), V_2·p(Θ_2|r_j), …, V_k·p(Θ_k|r_j) } ] − EV_i*

where r_1…r_z are the possible measurement results, Θ_1…Θ_k the possible states, V_i the payoff of decision alternative i, and EV_i* the expected value of the best alternative before the measurement.

…or, in its simplest form: “The cost of being wrong times the chance of being wrong”
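The simplest form turns into arithmetic directly. The numbers below are illustrative assumptions, not from the slides:

```python
# The "simplest form" of information value: expected opportunity loss,
# i.e. the chance of being wrong times the cost of being wrong.
# Both inputs below are illustrative assumptions.
chance_of_being_wrong = 0.25     # e.g. P(the approved project fails)
cost_of_being_wrong = 400_000    # loss if it is approved and it fails ($)

# This is the Expected Value of Perfect Information: an upper bound on
# what any measurement of this variable could possibly be worth.
evpi = chance_of_being_wrong * cost_of_being_wrong
print(f"EVPI = ${evpi:,.0f}")    # prints: EVPI = $100,000
```

Here the EVPI comes out to $100,000: a measurement that reduces the uncertainty costs less than that, or it is not worth doing.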
The Measurement Inversion
In a business case, the economic value of measuring a variable is usually inversely proportional to the measurement attention it typically gets.

[Diagram: variables ranked from most measured / lowest information value to least measured / highest information value – initial cost; long-term costs; cost saving benefit other than labor productivity; labor productivity; revenue enhancement; technology adoption rate; project completion]
Real Examples of Measurement Inversion
Subject | What they would have measured | What they needed to measure
New procurement system for government | Detailed “time and motion” study of the procurement process | The price savings from using reverse auctions
Battlefield fuel forecasting | Chance of enemy contact, forecasts of vehicle maintenance | The difference in mileage between paved and gravel roads
Risks of flooding in mining operations | Drilling test holes all over the mine | How much water the main pumps can handle
Market for new pharmaceutical products | The adoption rate of the new drug in all global regions | The duration of phase 1 testing, chance of a particular clinical outcome
Impact of pesticide regulation | The value of saving endangered species | Whether pesticide regulation ever saves any endangered species
IT security | People who attended training, external threats | Internal theft incidents
Increasing Value & Cost of Info.
• EVPI – Expected Value of Perfect Information
• ECI – Expected Cost of Information
• EVI – Expected Value of Information

[Chart: value or cost ($0 to $$$) vs. certainty (low to high) – the EVI curve approaches the EVPI (perfect information) ceiling while the ECI curve rises; an “aim for this range” region is marked]
Making the Best Decisions*

1. Define the Decision – Identify relevant variables and set up the “business case” for the decision using these variables.
2. Model the Current State of Uncertainty – Initially use calibrated estimates (hence the calibration training) and then actual measurements.
3. Compute the Value of Additional Information – Determine what to measure and how much effort to spend on measuring it.
4. Is there significant value to more information?
– Yes: Measure where the information value is high – reduce uncertainty using any of the methods – then return to step 3.
– No: Optimize the Decision – Use the quantified risk/return boundary of the decision makers to determine which decision is preferred.

*Covered more in all subsequent modules
Inconsistent Risk Judgements
• Studies have shown risk aversion changes due to what should be irrelevant external factors, including:
– Being around smiling people
– Recalling an event causing fear
– Recalling an event causing anger
– A recent win in an unrelated decision
– A recent loss in an unrelated decision
Quantifying Risk Aversion
Acceptable Risk/Return Boundary
• The simplest element of Harry Markowitz’s Nobel Prize-winning method “Modern Portfolio Theory” is documenting how much risk an investor accepts for a given return.
• The “Investment Boundary” states how much risk an investor is willing to accept for a given return.
• For our purposes, we modified Markowitz’s approach a bit.
[Chart: investment boundary – expected IRR over 5 years (0%–200%) vs. chance of a negative IRR (0%–50%), dividing the region of acceptable investments from the region of unacceptable investments]
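A documented boundary like this becomes a mechanical acceptance test. The sketch below is illustrative only: the boundary points (maximum acceptable chance of a negative IRR at each expected return) are invented, and the boundary is linearly interpolated between them.

```python
# A minimal sketch of checking investments against a quantified risk/return
# boundary. Each point pairs an expected 5-year IRR with the maximum
# acceptable chance of a negative IRR; the points are illustrative assumptions.
boundary = [(0.10, 0.05), (0.50, 0.20), (1.00, 0.35), (2.00, 0.50)]

def max_acceptable_risk(expected_irr):
    """Interpolate the boundary: highest tolerable P(IRR < 0) at this return."""
    if expected_irr <= boundary[0][0]:
        return boundary[0][1]
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        if expected_irr <= x1:
            t = (expected_irr - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return boundary[-1][1]

def acceptable(expected_irr, p_negative_irr):
    return p_negative_irr <= max_acceptable_risk(expected_irr)

print(acceptable(0.75, 0.15))   # True: risk is below the boundary at a 75% IRR
print(acceptable(0.30, 0.30))   # False: too risky for a 30% IRR
```

Once the decision makers have stated the boundary, every investment with a quantified return and risk can be classified without re-litigating risk tolerance case by case.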
Real Benefits: Life Technologies
• An HDR client, Life Technologies, was forecasting revenue of new products in the biotech lab equipment industry.
• Their own analysis of our models showed that both random error and systematic error (consistently overestimating revenue) were reduced.
• Total reduction of error for forecasting revenue was 76%.
[Chart: ratio of estimated vs. actual revenue – experts vs. the AIE model]
Key Advantages of AIE
Every component of AIE is based on methods that showed measurable improvements over expert intuition across a large number of trials, as reported in peer-reviewed journals.
AIE explicitly addresses the measurement inversion problem by computing the value of information as a basis for all measurements.
AIE quantifies uncertainty and risk in a manner that is mathematically meaningful (i.e. can be used in probabilistic models).
With over 95 examples from a variety of industries, the method has become well-defined and repeatable.
Parting Thought
• Your most important decision is how to make decisions
• Misconceptions about measurements get in the way of improving decisions
• Many popular methods have been effectively “debunked” by the research
• Methods that show a measurable improvement exist and are practical
Questions?
Contact: Doug Hubbard Hubbard Decision Research [email protected] www.hubbardresearch.com 630 858 2788
Reactions: Fuel for the Marines
“The biggest surprise was that we can save so much fuel. We freed up vehicles because we didn’t have to move as much fuel. For a logistics person that's critical. Now vehicles that moved fuel can move ammunition.”
– Luis Torres, Fuel Study Manager, Office of Naval Research
“What surprised me was that [the model] showed most fuel was burned on logistics routes. The study even uncovered that tank operators would not turn tanks off if they didn’t think they could get replacement starters. That’s something that a logistician in 100 years probably wouldn’t have thought of.”
– Chief Warrant Officer Terry Kunneman, Bulk Fuel Planning, HQ Marine Corps
Selected Sources
• Tsai, C., Klayman, J., Hastie, R., “Effects of Amount of Information on Judgment Accuracy and Confidence,” Organizational Behavior and Human Decision Processes, Vol. 107, No. 2, 2008, pp. 97–105
• Heath, C., Gonzalez, R., “Interaction with Others Increases Decision Confidence but Not Decision Quality: Evidence against Information Collection Views of Interactive Decision Making,” Organizational Behavior and Human Decision Processes, Vol. 61, No. 3, 1995, pp. 305–326
• Andreassen, P., “Judgmental Extrapolation and Market Overreaction: On the Use and Disuse of News,” Journal of Behavioral Decision Making, Vol. 3, No. 3, 1990, pp. 153–174
• Williams, M., Dennis, A., Stam, A., Aronson, J., “The Impact of DSS Use and Information Load on Errors and Decision Quality,” European Journal of Operational Research, Vol. 176, No. 1, 2007, pp. 468–481
• Knutson, B., et al., “Nucleus Accumbens Activation Mediates the Influence of Reward Cues on Financial Risk Taking,” NeuroReport, Vol. 19, No. 5, March 2008, pp. 509–513
• A small study presented at the Cognitive Neuroscience Society meeting in 2009 by a grad student at the University of Michigan showed that simply being briefly exposed to smiling faces makes people more risk tolerant in betting games.
• Risk preferences show a strong correlation to testosterone levels – which change daily (Sapienza, Zingales, Maestripieri, 2009).
• Recalling past events that involved fear and anger changes the perception of risk (Lerner, Keltner, 2001).
Calibration Exercise: Ranges
For the following questions, provide a range (an upper and lower bound) that you are 90% certain contains the correct answer:
Lower Bound
Upper Bound
Napoleon Bonaparte was born what year?
What is the average weight of an adult male African elephant (tons)?
The Coliseum in Rome held how many spectators?
How many countries are in NATO?
In what year did Newton publish the Laws of Gravitation?
Calibration Exercise: True/False
For each statement below, answer whether you believe it is true or false and provide a percentage confidence that your answer is correct. Confidence is any value from 50% (“no idea”) to 100% (certainty).

True or False? | % Confidence

Brazil has a larger population than Spain.
A hockey puck will fit in a golf hole.
The Yangtze River is the longest river in Asia.
Mars is always further away from Earth than Venus is from Earth.
The movie Titanic still holds the record for box office receipts in the first six weeks.
Challenges to Capturing Uncertainty
• Experts are systematically overconfident
• Experts are highly inconsistent
• Proper probabilistic inferences are counter-intuitive (optional content)
Calibration Answers

Napoleon Bonaparte was born what year? 1769
What is the average weight of an adult male African elephant (tons)? 3.5 tons
The Coliseum in Rome held how many spectators? 50,000
How many countries are in NATO? 28
In what year did Newton publish the Laws of Gravitation? 1687

True or False?
Brazil has a larger population than Spain. True
A hockey puck will fit in a golf hole. True
The Yangtze River is the longest river in Asia. True
Mars is always further away from Earth than Venus is from Earth. False
The movie Titanic still holds the record for box office receipts in the first six weeks. False
“Low Resolution” Example of Bayes for Ranges
• You can apply Bayes to estimating parameters (e.g. mean, median, etc.) of populations with continuous values (income, leisure time per week, etc.)
• Just identify possible population distribution types and the probability of each type – resolution can be as fine as you like.
[Chart: probability of each candidate distribution type (A–D) vs. number of samples]
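A toy version of this "low resolution" Bayes can be written in a few lines: a handful of candidate population distributions with prior probabilities, updated one sample at a time. The four candidate normals and the observed samples below are invented for illustration.

```python
import math

# "Low resolution" Bayes over distribution types: four candidate population
# distributions (A-D) with equal priors, updated one sample at a time.
# The candidate normals and the samples are illustrative assumptions.
candidates = {"A": (20, 5), "B": (30, 10), "C": (45, 10), "D": (60, 15)}
priors = {name: 0.25 for name in candidates}

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def update(beliefs, sample):
    """One Bayes step: posterior is prior times likelihood, renormalized."""
    post = {name: p * normal_pdf(sample, *candidates[name])
            for name, p in beliefs.items()}
    total = sum(post.values())
    return {name: v / total for name, v in post.items()}

for x in [28, 33, 31]:                 # three observed samples
    priors = update(priors, x)
print({name: round(p, 3) for name, p in priors.items()})
```

After just three samples near 30, the posterior concentrates on candidate B (mean 30): the "resolution" here is only four distribution types, but it can be made as fine as you like.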
A Google Forecast
• Using both Google Trends and previous Bureau of Labor Statistics data produces a powerful predictive model for unemployment
[Chart: unemployment index, 2005–2010 (60%–160%) – a prediction based on Google searches on “unemployment” tracks the BLS report released 3–4 weeks later]
Facebook Forecasts
• The sampling bias of social data may actually be desirable.
• Studies done at Harvard Medical Center show how to forecast – not just track – flu outbreaks using friend connections.
• Tracking more connected people can be better than tracking the original randomly selected people.
• A group of people who are nominated friends (who tend to be more popular) exhibited what they called a “friend shift” in the data.
Twitter Forecasts
Other predictions:
• Dow Jones
• Presidential approval ratings
• Consumer confidence

[Chart: predicted revenue ($M) vs. actual revenue ($M) – the Hollywood Stock Exchange (HSX) forecasts vs. Twitter counts; correlation = 0.9]
How Uncertainty Changes With Sample Size
• Management seems to vaguely recall (incorrectly) some concepts with names like “statistical significance”
• But, in fact, there is pervasive misuse of these terms – all without doing any math
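Doing the math takes only a few lines. The sketch below is illustrative, not from the slides: it draws samples of increasing size from an invented normal population (mean 30, sd 12, standing in for "minutes per day" style data) and shows the 90% confidence interval on the mean narrowing roughly as 1/sqrt(n) – no fixed "significance" threshold is needed before anything can be inferred.

```python
import math
import random
import statistics

random.seed(7)

# An invented population: "minutes per day" style data, normal(30, 12).
population = [random.gauss(30, 12) for _ in range(10_000)]

z90 = 1.645          # normal z-score for a 90% interval
widths = {}
for n in [5, 20, 80, 320]:
    sample = random.sample(population, n)
    mean = statistics.mean(sample)
    # Standard error of the mean shrinks as 1/sqrt(n), and the CI with it.
    sem = statistics.stdev(sample) / math.sqrt(n)
    widths[n] = 2 * z90 * sem
    print(f"n={n:>3}: 90% CI = {mean - z90 * sem:6.1f} .. {mean + z90 * sem:6.1f}")
```

Even n=5 yields a usable interval; each quadrupling of the sample roughly halves its width.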
A Little Background
• How to Measure Anything*
– There are no real “immeasurables.”
– Computing the value of information radically changes what you measure.
• The Failure of Risk Management*
– Popular risk management and risk analysis methods show no evidence of improving decisions – and may make things worse.
• Pulse
– Vast data from social media and other places on the internet and mobile devices creates a new, powerful measurement instrument.

*Required reading in:
• Society of Actuaries exam prep
• Courses at multiple universities including Harvard, Stanford, MIT, and the Naval Postgraduate School