why interpolation? · 2017-04-05 · 1 spatial interpolation and prediction cp204c ©radke 2017...


Spatial Interpolation and Prediction

cp204c© Radke 2017

Spatial Interpolation

“Everything is related to everything else, but close things are more closely related.”

Once again … the 1st law of geography

W. Tobler, UCSB

Readings

Bolstad, Paul. 2015. GIS Fundamentals: A First Text on Geographic Information Systems. Eider Press (1st edition pp. 333-350, 2nd edition pp. 395-421, 3rd edition pp. 437-470, 4th edition pp. 473-520, 5th edition pp. 519-559).


Definition: Estimating the value of a variable of interest at an un-sampled location based on the values measured at sampled locations.

[Figure: sample points surrounding the location where a value is estimated]


Assumes a field-based conceptual model of space: the variable of interest varies continuously over the study area.

ü Temperature (urban heat island)

ü Elevation, precipitation

ü Soil type, vegetation type, geology

ü Fire risk, erosion potential, property values

ü Concentrations of students attending schools

ü Air pollution modeling

Why interpolation?

We cannot sample everywhere:

ü Too expensive, too tedious, or physically impossible (vegetation, ground water, public opinion)

ü Some locations are inaccessible, off limits, or not clearly visible (property value, household amenities)

ü Some locations are inaccessible even to high-resolution remote sensing satellites (cloud cover, forest canopy, roof tops)


Typical Inputs / Output of Interpolation

ü Points to Points

ü Points to Lines: contours (i.e., isolines)

ü Points to vector polygons

ü Points to raster grids

Sample data:

ü Location: x, y coordinates

ü Variable of interest that varies spatially (i.e., Z-value)

ü Time of data capture

Sampling Strategy

ü Number of samples

ü Type of sample:

• Random

• Uniform

• Cluster

• Adaptive sampling: fewer samples taken in homogeneous areas

Systematic – Regular Lattice

ü Regular spatial interval

ü Square or triangulated pattern

Bolstad, GIS Fundamentals

Random – Poisson Process

ü Each location has an equal probability of being selected

ü No location influences any other potential selection


Cluster - Neighborhoods

ü Could be a stratified random clustering

ü Could be a systematic or regular clustering pattern
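The systematic, random, and cluster patterns described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the rectangular study area, and the `spread` parameter controlling cluster size are hypothetical and do not come from the slides. The random pattern uses a fixed number of uniform draws.

```python
import random

def systematic_sample(xmin, ymin, xmax, ymax, nx, ny):
    """Regular lattice: nx-by-ny points at a fixed spacing, centred in their cells."""
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    return [(xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy)
            for j in range(ny) for i in range(nx)]

def random_sample(xmin, ymin, xmax, ymax, n, rng):
    """Every location is equally likely and no draw influences another."""
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]

def cluster_sample(xmin, ymin, xmax, ymax, n_clusters, per_cluster, spread, rng):
    """Random cluster centres, each surrounded by a tight group of points."""
    points = []
    for cx, cy in random_sample(xmin, ymin, xmax, ymax, n_clusters, rng):
        for _ in range(per_cluster):
            # Clamp to the study area so offsets near the edge stay inside it.
            x = min(max(cx + rng.uniform(-spread, spread), xmin), xmax)
            y = min(max(cy + rng.uniform(-spread, spread), ymin), ymax)
            points.append((x, y))
    return points

rng = random.Random(42)  # fixed seed so the example is repeatable
lattice = systematic_sample(0, 0, 100, 100, 5, 4)
scattered = random_sample(0, 0, 100, 100, 20, rng)
clustered = cluster_sample(0, 0, 100, 100, 4, 5, 3.0, rng)
```

All three calls produce 20 locations over the same area, which makes the patterns easy to compare on a plot.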



Adaptive – Intelligent Sampling

ü More sampling of data where patterns shift through space.

ü Sampling pattern dictated by data variability.

ü Example – surface or elevation points.


[Figure: adaptive sampling of elevation points. Source: Bolstad, GIS Fundamentals]

Interpolation Methods

ü Several methods that vary in approach & complexity

ü All methods use the sample points to estimate values at un-sampled locations

ü Yet they usually produce different results from the same sample data points, because the underlying mathematical formulas / models and the parameters used in estimation differ

Main Characteristics of Interpolation Methods

ü Global vs. Local

ü Exact vs. Inexact

ü Deterministic vs. Stochastic

Global vs. Local Estimators

ü Global: estimates use all sample points to estimate values at un-sampled locations

ü Local: estimates are based on neighboring points

“Everything is related to everything else, but close things are more closely related.”

• W. Tobler, 1st law of geography

Exact vs. Inexact Estimators:

ü Exact: the output surface reproduces the measured values at the input sample locations

ü Inexact: the output surface may hold estimates even at the original sample locations


Deterministic vs. Stochastic Methods

ü Deterministic: based on a mathematical model

ü Stochastic: based on a geostatistical model that incorporates random variation and accounts for spatial autocorrelation

…. more on this later

Evaluating Spatial Interpolation Results - Validation

One simple approach:

ü Withhold a small subset of the sample points from the interpolation process

ü Compare the estimated values at the withheld sample points with the observed values at those locations.
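The withhold-and-check approach can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the 20% hold-out fraction, the root-mean-square error (RMSE) summary, the synthetic surface, and the stand-in inverse distance interpolator are all assumptions made for the example. Any function with the same `(samples, x, y)` signature could be passed in instead.

```python
import math
import random

def idw(samples, x0, y0, power=2.0):
    """Stand-in interpolator: inverse distance weighted estimate at (x0, y0)."""
    num = den = 0.0
    for x, y, z in samples:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return z
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den

def holdout_rmse(samples, interpolate, fraction=0.2, seed=0):
    """Withhold a fraction of the samples, predict them, and return the RMSE."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    # Squared difference between estimate and observation at each withheld point.
    sq_err = [(interpolate(train, x, y) - z) ** 2 for x, y, z in test]
    return math.sqrt(sum(sq_err) / len(sq_err))

# Synthetic samples on the surface z = 0.5x + 0.2y (illustrative data only).
rng = random.Random(1)
samples = [(x, y, 0.5 * x + 0.2 * y)
           for x, y in ((rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(50))]
rmse = holdout_rmse(samples, idw)
```

A lower RMSE on the withheld points suggests a better fit for this data set, so the same call can be repeated with different methods or parameters to compare them.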

Spatial Interpolation Algorithms

A very brief review of the most commonly used spatial interpolation techniques

Algorithm – same as – a recipe

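Nearly all of the techniques reviewed here can be written as a weighted average of the sampled values:

$$\hat{z}(x_0) = \sum_{i=1}^{n} \lambda_i \, z(x_i)$$

Ø where $\hat{z}$ is the estimated value of an attribute at the point of interest $x_0$,

Ø $z$ is the observed value at the sampled point $x_i$, $\lambda_i$ is the weight assigned to the sampled point, and

Ø $n$ represents the number of sampled points used for the estimation (Webster and Oliver, 2001).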

Spatial Interpolation Techniques

Deterministic Methods:

ü Natural neighbors: Thiessen polygons

ü IDW: inverse distance weighted

ü Spline functions

Geostatistical Methods:

ü Kriging

Nearest Neighbors

The nearest neighbors (NN) method predicts the value of an attribute at an un-sampled point based on the value of the nearest sample. It draws perpendicular bisectors between the sampled points (n), forming Thiessen (or Dirichlet/Voronoi) polygons (Vi, i = 1, 2, …, n).

The estimate of the attribute at any un-sampled point within polygon Vi is the measured value at the single nearest sampled point xi, that is, $\hat{z}(x_0) = z(x_i)$. The weights are:

$$\lambda_i = \begin{cases} 1 & \text{if } x_0 \in V_i \\ 0 & \text{otherwise} \end{cases}$$

where $\lambda_i$ is the weight assigned to the sampled point $x_i$.

Thiessen Polygons

ü Aka Nearest Neighbor Interpolation

ü One point – the nearest point, is used to assign value to an unsampled location

ü Space is partitioned using Delaunay Triangulation to create Thiessen (aka, Voronoi or Dirichlet) polygons.

ü Each location within a polygon is closer to that polygon's sample point than to any other sample point

ü Defines Areas of influence
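Because every location inside a Thiessen polygon takes the value of that polygon's sample point, the estimate can be computed without building the polygons at all: finding the closest sample gives the same answer. A minimal sketch, assuming samples are stored as hypothetical `(x, y, z)` tuples:

```python
import math

def nearest_neighbor(samples, x0, y0):
    """Return the z-value of the sample closest to (x0, y0).

    Equivalent to looking up which Thiessen polygon contains (x0, y0).
    """
    nearest = min(samples, key=lambda s: math.hypot(s[0] - x0, s[1] - y0))
    return nearest[2]

# Illustrative samples: (x, y, z)
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
value = nearest_neighbor(samples, 1.0, 8.0)  # closest sample is (0, 10)
```

The resulting surface is a set of flat plateaus with abrupt steps at the polygon boundaries.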


Triangular Irregular Network

The triangular irregular network (TIN) was developed by Peucker (Poiker) and co-workers (Little, Fowler, Mark 1978) for digital elevation modeling. It avoids the redundancies of the altitude matrix in the grid system.

[Figure: Voronoi (or Thiessen) polygons and the corresponding Delaunay triangulation (TIN)]

Thiessen Polygons

ü Thiessen polygon boundaries are the perpendicular bisectors of straight lines drawn between two neighboring points (Delaunay triangulation, in red)

[Figure: police stations and their Voronoi (Thiessen) polygons, an unconstrained allocation solution (point-polygon class)]

[Figure: Thiessen polygon interpolation. Source: Bolstad, GIS Fundamentals, 3rd edition]

Thiessen Polygon Interpolation

ü Local estimator: estimates are based on the nearest sample point

ü Exact: sample values are maintained in output

ü Deterministic: mathematical model based on Delaunay triangulation


Thiessen Polygon - Natural Neighbors

ü A new Voronoi polygon (beige) is created around the interpolation point (red star).

ü For each neighbor, the area of the portion of its original polygon that became incorporated in the tile of the new point is calculated.

ü These areas are scaled to sum to 1 and are used as weights for the corresponding samples.
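The area-stealing idea can be approximated on a grid without constructing any polygons. The sketch below is a discrete approximation rather than an exact geometric implementation: it counts the grid cells that would switch from their original nearest sample to the new interpolation point, and uses those counts as the areas. The function names, the `bounds` rectangle, and the `cells` resolution are hypothetical.

```python
import math

def natural_neighbor_weights(samples, x0, y0, bounds, cells=200):
    """Approximate natural neighbor weights for (x0, y0) on a cells-by-cells grid."""
    xmin, ymin, xmax, ymax = bounds
    dx = (xmax - xmin) / cells
    dy = (ymax - ymin) / cells
    stolen = [0] * len(samples)  # grid cells the new tile takes from each sample
    for r in range(cells):
        gy = ymin + (r + 0.5) * dy
        for c in range(cells):
            gx = xmin + (c + 0.5) * dx
            # Owner of this cell in the original Voronoi partition.
            dists = [math.hypot(x - gx, y - gy) for x, y, _ in samples]
            owner = min(range(len(samples)), key=dists.__getitem__)
            # The cell joins the new tile if the new point is closer than its owner.
            if math.hypot(x0 - gx, y0 - gy) < dists[owner]:
                stolen[owner] += 1
    total = sum(stolen)
    if total == 0:
        raise ValueError("new tile is empty: point coincides with a sample or grid is too coarse")
    return [s / total for s in stolen]  # scaled to sum to 1

def natural_neighbor(samples, x0, y0, bounds, cells=200):
    """Weighted average of the sample values using the stolen-area weights."""
    weights = natural_neighbor_weights(samples, x0, y0, bounds, cells)
    return sum(w * z for w, (_, _, z) in zip(weights, samples))

# Four illustrative samples at the corners of a 10 x 10 study area.
samples = [(0.0, 0.0, 0.0), (10.0, 0.0, 10.0), (0.0, 10.0, 10.0), (10.0, 10.0, 20.0)]
centre = natural_neighbor(samples, 5.0, 5.0, (0.0, 0.0, 10.0, 10.0), cells=100)
```

At the centre of the square each corner gives up roughly the same area, so the estimate is close to the mean of the four values. A finer grid tightens the approximation at the cost of run time.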

Inverse Distance Weighting

[Figure: fixed-radius search around the estimate location, showing the sample points and the estimated value]

The inverse distance weighting or inverse distance weighted (IDW) method estimates the values of an attribute at un-sampled points using a linear combination of the values at sampled points, weighted by an inverse function of the distance from the point of interest to each sampled point.

The assumption is that sampled points closer to the un-sampled point are more similar to it in value than those farther away (Tobler's Law, referenced once again). The weights can be expressed as:

$$\lambda_i = \frac{1 / d_i^{\,p}}{\sum_{j=1}^{n} 1 / d_j^{\,p}}$$

where $d_i$ is the distance between $x_0$ and $x_i$, $p$ is a power parameter, and $n$ represents the number of sampled points used for the estimation.

Local, exact, deterministic method
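A minimal sketch of IDW, assuming samples stored as hypothetical `(x, y, z)` tuples. Each weight is proportional to 1 / d^p and the weights are normalised to sum to 1. The parameters `p` (power) and `n` (number of nearest samples used) follow the symbols in the text, but the function itself is illustrative, not a reference implementation.

```python
import math

def idw(samples, x0, y0, p=2.0, n=None):
    """IDW estimate at (x0, y0) from the n nearest samples (all samples if n is None)."""
    ranked = sorted(samples, key=lambda s: math.hypot(s[0] - x0, s[1] - y0))
    used = ranked if n is None else ranked[:n]
    num = den = 0.0
    for x, y, z in used:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return z          # exact: a coincident sample is returned unchanged
        w = 1.0 / d ** p      # weight before normalisation
        num += w * z
        den += w
    return num / den          # dividing by den normalises the weights to sum to 1

# Illustrative samples: (x, y, z)
samples = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0), (0.0, 3.0, 30.0)]
gentle = idw(samples, 1.0, 0.0, p=1.0)
steep = idw(samples, 1.0, 0.0, p=4.0)
```

With the higher power, the estimate at (1, 0) sits much closer to the value of the nearest sample (10), which is the localising effect of the power parameter.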

Inverse Distance Weighting

[Figure: distance d_i between the estimate location x0 and the sample point xi]

IDW Interpolation

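$$Z_j = \frac{\sum_i Z_i / d_{ij}^{\,n}}{\sum_i 1 / d_{ij}^{\,n}}$$

Zj = estimated value at location j

Zi = measured value at sample point i; dij = distance from sample point i to location j

i = index over the sample points considered, here 3 points

n = user-defined exponent that can be used to increase the weight of nearby points, here n = 1 (no exponent). In this notation n plays the role of the power parameter p.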


ü The main factor affecting the accuracy of IDW is the value of the power parameter.

ü Weights diminish as the distance increases, especially when the value of the power parameter increases, so nearby samples have a heavier weight and more influence on the estimation, and the resultant spatial interpolation is local.

The weight factor

The choice of power parameter and neighborhood size is arbitrary: it rests on random choice or personal whim.

IDW Interpolation

ü Where the Voronoi method is based on the closest point, IDW derives an estimate based on a user-defined parameter for the number of sample points to consider (i.e., the search radius).

• The larger the number of sample points, the smoother the resulting surface, up to the point where all sample points are used for every estimate and the method becomes global.

ü The user can also input a power parameter.

• The higher the power, the greater the influence of nearby points, but the resulting surface is not as smooth as with a lower power.

IDW Interpolation

[Figure: IDW surfaces for different numbers of sample points (i) and weight exponents (n)]

IDW Interpolation

[Figure: IDW surfaces using 12 sample points (n = 12) with power p = 1 (linear), p = 2, and p = 4]

IDW Interpolation

Local points have more influence

Spline Interpolation

ü Technique is named after a spline, a flexible ruler that was used by draftsmen to draw a smooth road from a set of survey points.

ü The spline creates the smoothest possible line along the set of points.


Splines

ü Spline functions, which are based on a set of polynomial functions, serve the same purpose as the bendy ruler with a set of sample points.

Splines

ü For surface creation, spline functions are like bending a rubber sheet to pass through all the sample points, while minimizing the total curvature of the surface.

ü Usually but not always exact interpolations, as exactness may not result in a smooth surface.

ü As with the IDW method, you can input the number of points to consider in the estimate.

• The more points, the more distant sample pts impact the local estimate and the smoother the overall surface.
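The draftsman's ruler has a direct one-dimensional analogue: the natural cubic spline, a chain of cubic polynomials that passes through every point with no kinks. The sketch below is illustrative only. It works along a single profile rather than over a surface, and the function name and the "natural" end conditions (zero curvature at both ends) are assumptions for the example.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing and contain at least three knots.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the interior second derivatives m[1..n-1];
    # the natural end conditions fix m[0] = m[n] = 0.
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n)]
    # Thomas algorithm: forward elimination ...
    for k in range(1, n - 1):
        factor = h[k] / diag[k - 1]
        diag[k] -= factor * h[k]
        rhs[k] -= factor * rhs[k - 1]
    # ... then back substitution.
    m = [0.0] * (n + 1)
    for k in range(n - 2, -1, -1):
        m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def spline(x):
        # Locate the interval [xs[i], xs[i+1]] containing x (clamped at the ends).
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return spline

# Illustrative elevation profile: distance along a transect vs. height.
profile = natural_cubic_spline([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 1.0])
height = profile(1.5)
```

This version is an exact interpolator: evaluating `profile` at any knot returns the sampled height. Smoothing variants relax that condition in exchange for a less wiggly curve.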

Splines

ü Splines are good spatial interpolators for gently varying surfaces like elevation, water tables, pollution concentrations.

Splines

Polynomial

Polynomial comes from poly- (meaning "many") and -nomial (in this case meaning "term") ... so it says "many terms"

A polynomial can have:

• constants (like 3, -20, or ½)

• variables (like x and y)

• exponents (like the 2 in y²)

These can be combined using addition, subtraction, multiplication and division, but never division by a variable.

Global polynomial interpolation (GPI)

ü Global polynomial interpolation (GPI) fits a smooth surface that is defined by a mathematical function (a polynomial) to the input sample points.

ü The global polynomial surface changes gradually and captures coarse-scale pattern in the data.

ü Conceptually, GPI is like taking a piece of paper and fitting it between the raised points (raised to the height of value).

Spatial regression

Use observations of dependent variables, independent variables, and sample coordinates to develop a prediction equation.

Zi = f(xi,yi,ai,bj)


Trend Surface:

ü A spatial regression in which one fits a statistical model, the trend surface, through the measured points.

ü Trend surfaces are the most accurate when you need to fit a smoothly varying surface such as the mean daily temperature over a large area.
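A first-order trend surface is a tilted plane, z = b0 + b1·x + b2·y, fitted to all the samples by least squares. The sketch below is one way to do this in plain Python. The function names are hypothetical, the small Gaussian elimination helper stands in for a linear algebra library, and higher-order surfaces would simply add terms such as x², xy, and y² to each row.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_trend_surface(samples):
    """Least-squares fit of z = b0 + b1*x + b2*y; returns [b0, b1, b2]."""
    rows = [(1.0, x, y) for x, y, _ in samples]
    zs = [z for _, _, z in samples]
    # Normal equations: (X^T X) b = X^T z.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]
    return solve(xtx, xtz)

def predict(coeffs, x, y):
    """Evaluate the fitted plane at (x, y)."""
    b0, b1, b2 = coeffs
    return b0 + b1 * x + b2 * y

# Illustrative samples lying exactly on the plane z = 2 + 0.5x - 0.25y.
samples = [(x, y, 2.0 + 0.5 * x - 0.25 * y)
           for x, y in [(0, 0), (10, 0), (0, 10), (10, 10), (5, 3)]]
coeffs = fit_trend_surface(samples)
```

Because the plane is fitted to all points at once, this is a global and generally inexact estimator: with noisy data the surface will not pass through the individual samples.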

[Figure: simple spatial regression, comparing the original surface with the fitted trend surface]

Kriging

ü A set of geostatistical estimators

ü Standard (i.e., non-spatial) statistical methods are based on the assumption of independence / normal distribution of data values, which is violated by spatial autocorrelation.

ü Geostatistical models account for spatial autocorrelation – a measure of the tendency of nearby points to have similar values.

Kriging

ü Kriging is based on 3 main components of the sample data: the spatial trend, spatial autocorrelation, and random variation.

ü These three are combined in a mathematical model to create an estimation function.

ü The function is then applied to the data for the sample points and used to estimate values over the surface of the study area.

ü The semivariogram plots the semivariance over lag distances

ü Semivariance is typically small at small lag distances and increases to a plateau

Kriging - Semivariogram

ü The semivariogram is defined as

$$\gamma(s_i, s_j) = \tfrac{1}{2}\,\mathrm{var}\bigl(Z(s_i) - Z(s_j)\bigr)$$

where var is the variance.

ü If two locations, si and sj, are close to each other in terms of the distance measure of d(si, sj), you expect them to be similar, so the difference in their values, Z(si) - Z(sj), will be small. As si and sj get farther apart, they become less similar, so the difference in their values, Z(si) - Z(sj), will become larger.
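In practice the semivariogram is estimated from the samples by grouping all pairs of points into lag-distance bins and averaging half the squared difference in their values within each bin. The sketch below uses this common estimator. The function name, the `(x, y, z)` tuple layout, and the equal-width bins are assumptions for the example.

```python
import math

def empirical_semivariogram(samples, lag_width, n_lags):
    """Return [(mean distance, semivariance, pair count)] for each non-empty lag bin."""
    sq_sums = [0.0] * n_lags   # sum of squared value differences per bin
    d_sums = [0.0] * n_lags    # sum of pair distances per bin
    counts = [0] * n_lags
    for a in range(len(samples)):
        xa, ya, za = samples[a]
        for b in range(a + 1, len(samples)):
            xb, yb, zb = samples[b]
            d = math.hypot(xa - xb, ya - yb)
            k = int(d // lag_width)        # index of the lag bin for this pair
            if k < n_lags:
                sq_sums[k] += (za - zb) ** 2
                d_sums[k] += d
                counts[k] += 1
    # Semivariance is half the mean squared difference within the bin.
    return [(d_sums[k] / counts[k], sq_sums[k] / (2.0 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k] > 0]

# Illustrative transect where the value rises steadily with distance.
samples = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 2.0), (3.0, 0.0, 3.0)]
bins = empirical_semivariogram(samples, lag_width=1.0, n_lags=4)
```

For this transect the semivariance grows with lag distance, matching the expectation that nearby points are more alike than distant ones.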


Kriging - Semivariogram

The height that the semivariogram reaches when it levels off is called the sill.

nugget effect + partial sill = sill

The distance at which the semivariogram levels off to the sill is called the range.

[Figure: semivariogram annotated with the nugget, partial sill, sill, and range]

Kriging - Covariance function

ü The covariance function is defined to be

$$C(s_i, s_j) = \mathrm{cov}\bigl(Z(s_i), Z(s_j)\bigr)$$

where cov is the covariance. Covariance is a scaled version of correlation. When two locations, si and sj, are close to each other, you expect them to be similar, and their covariance will be large. As si and sj get farther apart, they become less similar, and their covariance decreases toward zero.

Kriging - Covariance function

The covariance function decreases with distance.

[Figure: covariance function annotated with the nugget, partial sill, sill, and range]

Kriging – Semivariogram & Covariance function

ü The relationship between the semivariogram and the covariance function:

$$\gamma(s_i, s_j) = \text{sill} - C(s_i, s_j)$$

ü Semivariogram and covariance both measure the strength of statistical correlation as a function of distance.

ü There are some instances when semivariograms exist, but covariance functions do not.

ü There are no hard-and-fast rules on choosing the "best" semivariogram model.

ü The process of modeling semivariograms and covariance functions fits a semivariogram or covariance curve to your empirical data.

ü The goal is to achieve the best fit, and also incorporate your knowledge of the phenomenon in the model.

ü The model will then be used in your predictions.

ü The sill, range, and nugget are the important characteristics of the model.

ü Nugget: the semivariance at the smallest lag distances, where autocorrelation is highest. Theoretically, the semivariance should be zero when the lag distance is zero, so a nonzero nugget is an indicator of error in the sample measurements.

ü Sill: the semivariance at the plateau. It can be thought of as the natural variation when there is little autocorrelation.

ü Range: the lag distance at which the sill is reached.
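To make the three parameters concrete, the sketch below uses the spherical model, one widely used semivariogram shape. The slides do not prescribe a particular model, so this choice, the function names, and the parameter values are assumptions. The covariance helper applies the relationship γ = sill − C given above.

```python
def spherical(h, nugget, partial_sill, range_):
    """Spherical semivariogram: rises from the nugget and levels off at the sill."""
    if h <= 0.0:
        return 0.0                       # semivariance is zero at zero lag
    if h >= range_:
        return nugget + partial_sill     # the sill, reached at the range
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def covariance(h, nugget, partial_sill, range_):
    """Covariance implied by the model: C(h) = sill - gamma(h)."""
    sill = nugget + partial_sill
    return sill - spherical(h, nugget, partial_sill, range_)

# Illustrative parameters: nugget 0.5, partial sill 2.0 (sill 2.5), range 100.
curve = [spherical(h, 0.5, 2.0, 100.0) for h in (0.0, 25.0, 50.0, 100.0, 150.0)]
```

The values in `curve` start at zero, jump to just above the nugget for small lags, and stay flat at the sill once the lag exceeds the range.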



Kriging Recap

ü Kriging resembles IDW in that distance and weights are used to estimate values at unsampled locations.

ü However, IDW uses a coarse weighting scheme (inverse distance), while Kriging uses the semivariance to calculate weights that minimize error in the predicted values.
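The difference from IDW shows up in how the weights are obtained: instead of a fixed formula, they come from solving a small linear system built from the semivariogram. The sketch below shows ordinary kriging, one common variant. It is illustrative only: the exponential semivariogram, its sill and range values, the function names, and the plain-Python solver are all assumptions, and a real analysis would first fit the model to the empirical semivariogram.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def exponential_model(h, sill=1.0, range_=10.0):
    """Assumed semivariogram: exponential, no nugget, practical range range_."""
    return sill * (1.0 - math.exp(-3.0 * h / range_))

def ordinary_kriging(samples, x0, y0, gamma=exponential_model):
    """Return (estimate, kriging variance, weights) at (x0, y0)."""
    n = len(samples)
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    # Semivariances between all sample pairs, bordered by the sum-to-one constraint.
    a = [[gamma(dist(samples[i], samples[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Semivariances between each sample and the prediction location.
    b = [gamma(dist(s, (x0, y0))) for s in samples] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]   # mu is the Lagrange multiplier
    estimate = sum(w * s[2] for w, s in zip(weights, samples))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights

# Four illustrative samples at the corners of a square.
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0), (10.0, 10.0, 40.0)]
estimate, variance, weights = ordinary_kriging(samples, 5.0, 5.0)
```

Alongside the estimate, the function returns a kriging variance, the kind of accuracy measure that deterministic methods such as IDW do not provide.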

Kriging Pros & Cons

Pros

• Because they are based on statistical models, Kriging methods can produce evaluative measures of the accuracy of the predictions.

• Effective method when samples are sparse.

• In theory, Kriging methods should produce optimal interpolation weights.

Cons

• Much more complex and nuanced process than the deterministic spatial interpolation methods.

• There are no hard-and-fast rules on choosing the "best" semivariogram model.

• Much more computationally intensive.

Kriging Surface

[Figure: original contours compared with Kriging contours]

Kriging

ü A detailed review of Kriging is beyond the scope of this presentation and course. Dedicated self-study and/or a course on geostatistics/spatial statistics (such as: ESPMc177 / LD ARCHc177) is needed to better understand and appropriately apply the method.

ü For more info, see:

• Burrough & McDonnell, Principles of GIS, Chap. 5 & 6

• Bailey & Gatrell, Interactive Spatial Data Analysis
