Bootstrapping – the neglected approach to uncertainty. European Real Estate Society Conference, Eindhoven, Netherlands, 15-18 June 2011. Paul Kershaw, University of South Australia.


Bootstrapping – the neglected approach to uncertainty

European Real Estate Society Conference

Eindhoven, Netherlands, 15-18 June 2011

Paul Kershaw
University of South Australia

Slide 2

Overview

• The history of confidence intervals
• Pedagogical predilection to a parametric view
• Real estate research is NOT normal
• Do not provide a measure of probability
• Enter the Jackknife
• Monte Carlo simulation & Bootstrapping
• The basic algorithms
• Real World Applications
• A better mousetrap

Slide 3

Introduction

• The origins of hypothesis testing date back to 1279
• Confidence intervals were derived in 1937
• A confidence interval estimates the uncertainty about the true value of some population parameter
• There was a 50-year lag before medical journals, for example, advocated their use
• The lazy approach is to assume a normal distribution

Slide 4

Not Normal

• Very little about real estate can be considered to follow a Normal distribution, including: prices, land area, building area, age, number of bedrooms, location, physical condition, construction, tenant’s covenant, heating, etc.

• Linear regression techniques are regularly applied, and averages, standard errors and parametric confidence intervals proffered. Why?

• Is it because we are taught to do it that way, or because we teach it that way: glossing over the ignored assumptions and just asking for a number from the printout?

Slide 5

Not a measure of Probability

• This raises the question “what is the confidence interval of a correlation coefficient?” and leads to the second question “why is it so rarely reported?”

• What is a realistic confidence interval for a computer-generated valuation using a linear regression model?

• Most proprietary AVMs provide their own, often ill-defined, assessment of accuracy that is usually somewhat nebulous.

Slide 6

Enter the Jackknife

• Early efforts in the 1950s revolved around the Jackknife (Quenouille, M).

• The Jackknife provides a technique for estimating the bias and standard error of an estimate irrespective of the shape of the underlying distribution.

• The following example is based upon the work of Efron, B; 1993. Each data point is a law school class: LSAT is the average score for the class on a national law test, and GPA is the average undergraduate grade-point average for the class.

Slide 7

Sample Data

LSAT   GPA
576    3.39
635    3.30
558    2.81
578    3.03
666    3.44
580    3.07
555    3.00
661    3.43
651    3.36
605    3.13
653    3.12
575    2.74
545    2.76
572    2.88
594    2.96

[Scatter plot: GPA (vertical axis, 2.7 to 3.5) against LSAT (horizontal axis, 540 to 680)]

Correlation = 0.7764

Jackknife 95% Confidence Interval = 0.73 to 0.89

Slide 8

The basic algorithms

Compute sample statistics on n separate samples of size n-1. Each sample is the original data with a single observation omitted.

Jackknife Heuristic:
• Remove one data point only and calculate the statistic of interest to give estimate 1
• Repeat for each data point to give estimates 2, 3, 4 … n
• Calculate the percentiles of interest to obtain the confidence interval
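
The heuristic above can be sketched in a few lines of Python, using the LSAT and GPA values from the Sample Data slide and the correlation coefficient as the statistic of interest. This is a minimal illustration rather than the author's own calculation: the function names and the interpolation rule used for the percentiles are assumptions, and a different percentile convention will move the interval end points slightly.

```python
import math

# Law school data from the Sample Data slide (LSAT, GPA) for 15 classes.
LSAT = [576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594]
GPA = [3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36, 3.13, 3.12, 2.74,
       2.76, 2.88, 2.96]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def jackknife_estimates(xs, ys, stat):
    """Return the n leave-one-out estimates of stat(xs, ys)."""
    return [
        stat(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]


def percentile(values, p):
    """Percentile by linear interpolation between order statistics
    (an assumed convention; the slides do not specify one)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


full_r = pearson(LSAT, GPA)
loo = jackknife_estimates(LSAT, GPA, pearson)
lower, upper = percentile(loo, 2.5), percentile(loo, 97.5)
print(f"correlation = {full_r:.4f}")
print(f"jackknife percentile interval = {lower:.2f} to {upper:.2f}")
```

With only n = 15 leave-one-out estimates, the 2.5% and 97.5% percentiles sit close to the smallest and largest estimates, so the interval is sensitive to single influential observations.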

Slide 9

Jackknife Calculations

Slide 10

Monte Carlo & Bootstrapping

• Monte Carlo simulation caught the imagination of practitioners and researchers following Hertz, David; 1964, Harvard Business Review

• Monte Carlo simulation uses repeated sampling to determine the properties of some result of interest

• The re-sampling is carried out with replacement

• If we apply this technique to the previous Jackknife data we would be Bootstrapping [Adventures of Baron Munchausen]

• Bootstrapping is repeatedly re-sampling with replacement, calculating the statistic of interest and recording its distribution.
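
As an illustration of applying re-sampling with replacement to the earlier Jackknife data, the sketch below bootstraps the LSAT/GPA correlation. It is a sketch under stated assumptions, not the author's implementation: the function names, the fixed seed, the 1000 iterations and the simple order-statistic percentile rule are all choices made here for illustration.

```python
import math
import random

# Law school data from the Sample Data slide (LSAT, GPA) for 15 classes.
LSAT = [576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594]
GPA = [3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36, 3.13, 3.12, 2.74,
       2.76, 2.88, 2.96]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlations(xs, ys, iterations=1000, seed=2011):
    """Resample (x, y) pairs with replacement and record each correlation."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    n = len(xs)
    results = []
    while len(results) < iterations:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        # Skip the (very rare) resample in which one variable is constant,
        # because the correlation is undefined there.
        if len(set(bx)) > 1 and len(set(by)) > 1:
            results.append(pearson(bx, by))
    return results


reps = sorted(bootstrap_correlations(LSAT, GPA))
lower = reps[int(0.025 * len(reps))]        # 26th smallest of 1000
upper = reps[int(0.975 * len(reps)) - 1]    # 975th smallest of 1000
print(f"bootstrap 95% percentile interval = {lower:.2f} to {upper:.2f}")
```

The pairs are resampled together so that each bootstrap sample keeps the link between a class's LSAT and its GPA.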

Slide 11

Bootstrap Algorithm

Remark: to calculate the dispersion of the mean

DataArray() = n data points
MeanResults(1000)

For i = 1 to 1000
    Sum = 0
    For j = 1 to n
        Sum = Sum + DataArray(RandomBetween(1,n))
    Next j
    MeanResults(i) = Sum / n
Next i
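
The same algorithm might look as follows in Python. This is a direct, hedged translation: the input is the LSAT column from the Sample Data slide, used here only as illustrative data, and the seed and the way the percentiles are read off the sorted results are assumptions.

```python
import random
import statistics

# LSAT scores from the Sample Data slide, used purely as illustrative input.
data = [576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594]


def bootstrap_means(data_array, iterations=1000, seed=2011):
    """Mirror of the pseudocode: n draws with replacement per iteration."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    n = len(data_array)
    mean_results = []
    for i in range(iterations):
        total = 0.0
        for j in range(n):
            # randrange(n) plays the role of RandomBetween(1, n) with 0-based indexing
            total += data_array[rng.randrange(n)]
        mean_results.append(total / n)
    return mean_results


mean_results = bootstrap_means(data)
ordered = sorted(mean_results)
print("std deviation of the mean:", statistics.stdev(mean_results))
print("2.5% and 97.5% percentiles:", ordered[24], ordered[974])
```

The spread of `mean_results` is the dispersion of the mean that the remark refers to; its standard deviation estimates the standard error without any distributional assumption.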

Slide 12

Real World Application 1

What annoys me most – residential price change reporting and hot spotting

Below are sale prices for Q4 2010 and Q1 2011 for Detached houses in Aberfoyle Park, South Australia

[Two charts of individual sale prices, one per quarter: vertical axis $0 to $700,000, horizontal axis sale number]

           Q4 2010     Q1 2011     Change
Median     $382,500    $385,000     0.65%
Average    $409,932    $391,946    -4.39%

Slide 13

Bootstrap Results 1000 iterations

The degree of uncertainty is clearly illustrated.

The median has a 95% “confidence interval” of ….

                     Bootstrap mean   Std Deviation   2.5%        97.5%
Average   Q4 2010    $409,138         $12,037         $384,635    $431,877
          Q1 2011    $392,358         $10,549         $372,403    $413,303
Median    Q4 2010    $384,431         $15,276         $361,000    $420,150
          Q1 2011    $387,346         $12,359         $350,000    $410,000

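Confidence Interval

                     2.5%        97.5%       Extreme change
Average   Q4 2010    $384,635    $431,877    -13.77%
          Q1 2011    $372,403    $413,303      7.45%
Median    Q4 2010    $361,000    $420,150    -16.70%
          Q1 2011    $350,000    $410,000     13.57%

The change figures are the extremes implied by the two intervals: the lowest compares the Q1 2011 lower bound with the Q4 2010 upper bound, and the highest compares the Q1 2011 upper bound with the Q4 2010 lower bound.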
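
The percentage figures in the confidence interval table can be reproduced from the interval bounds alone. The short sketch below shows the arithmetic; the dictionary layout and the function name are illustrative assumptions, while the dollar values are copied from the bootstrap results above.

```python
# 95% bootstrap bounds (2.5% and 97.5% percentiles) copied from the results table.
bounds = {
    "Average": {"Q4 2010": (384_635, 431_877), "Q1 2011": (372_403, 413_303)},
    "Median": {"Q4 2010": (361_000, 420_150), "Q1 2011": (350_000, 410_000)},
}


def change_range(before, after):
    """Smallest and largest percentage change consistent with two intervals."""
    before_low, before_high = before
    after_low, after_high = after
    lowest = (after_low / before_high - 1) * 100   # worst case: high start, low finish
    highest = (after_high / before_low - 1) * 100  # best case: low start, high finish
    return lowest, highest


ranges = {
    name: change_range(q["Q4 2010"], q["Q1 2011"]) for name, q in bounds.items()
}
for name, (lowest, highest) in ranges.items():
    print(f"{name}: {lowest:.2f}% to {highest:.2f}%")
# prints:
# Average: -13.77% to 7.45%
# Median: -16.70% to 13.57%
```

Set against the reported point changes of 0.65% for the median and -4.39% for the average, these ranges show how little a single quarter-on-quarter figure can support.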

Slide 14

A better mousetrap

• The traditional approach is to select n from n with replacement, calculate the statistic of interest, and repeat m times

• This is inefficient for most statistics of interest, including the mean, median, standard deviation or correlation coefficient

• For example, the mean is sum/n
• If for each iteration we remove just one random element and replace it with another random element, we can adjust the sum by subtracting the value of the removed element and adding the value of the incoming element

• If n is, say, 50 we save 48 mathematical operations
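
One possible reading of this incremental update, for the mean, is sketched below. The function name, seed and starting resample are assumptions, and the LSAT scores from the Sample Data slide stand in as illustrative input. Note that consecutive means produced this way share n-1 elements and are therefore strongly correlated, unlike independent resamples, so more iterations may be needed for a comparable picture of the distribution.

```python
import random

# LSAT scores from the Sample Data slide, used purely as illustrative input.
data = [576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594]


def incremental_resample_means(values, iterations=1000, seed=2011):
    """Swap one random element of the current resample per iteration and
    update the running sum instead of re-adding all n values."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    n = len(values)
    # Start from one conventional resample of n draws with replacement.
    sample = [values[rng.randrange(n)] for _ in range(n)]
    total = sum(sample)
    means = []
    for _ in range(iterations):
        slot = rng.randrange(n)               # position in the resample to overwrite
        incoming = values[rng.randrange(n)]   # fresh draw from the original data
        total += incoming - sample[slot]      # one subtraction and one addition
        sample[slot] = incoming
        means.append(total / n)
    return means, sample, total


means, final_sample, final_total = incremental_resample_means(data)
print("first five means:", [round(m, 2) for m in means[:5]])
```

The running total always equals the sum of the current resample, which is what makes the shortcut valid for the mean; statistics such as the median would need a different incremental structure.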

Slide 15

Summary

• The bootstrap is simple to implement

• The results are meaningful and easy to interpret

• No specious assumptions regarding underlying distributions are required

• Widely accepted

• It should be embraced by all researchers and practitioners

Slide 16

Yesteryear’s Joys

B. Efron (1979). Bootstrap Methods: Another Look at the Jackknife. Annals of Statistics, Volume 7, Number 1, 1-26.