why interpolation? · 2017-04-05 · 1 spatial interpolation and prediction cp204c ©radke 2017...
Spatial Interpolation and Prediction
cp204c © Radke 2017
Spatial Interpolation
“Everything is related to everything else, but close things are more closely related.”
Once again … the 1st law of geography
W. Tobler, UCSB
Readings
✓ Bolstad, Paul. 2015. GIS Fundamentals: A First Text on Geographic Information Systems. Eider Press (1st edition pp. 333-350, 2nd edition pp. 395-421, 3rd edition pp. 437-470, 4th edition pp. 473-520, 5th edition pp. 519-559).
Spatial Interpolation
Definition: Estimating the value of a variable of interest at an un-sampled location based on the values measured at sampled locations.
[Figure: an estimated value at an un-sampled location surrounded by sample points]
Spatial Interpolation
Assumes a field-based conceptual model of space – that a variable of interest varies continuously over the study area.
✓ Temperature (urban heat island)
✓ Elevation, precipitation
✓ Soil type, vegetation type, geology
✓ Fire risk, erosion potential, property values
✓ Concentrations of students attending schools
✓ Air pollution modeling
Why interpolation?
We cannot sample everywhere
✓ Too expensive, too tedious, physically impossible (vegetation, ground water, public opinion)
✓ Some locations inaccessible, off limits or not clearly visible (property value, household amenities)
✓ Some locations inaccessible to even high resolution remote sensing satellites (cloud cover, forest canopy, roof tops)
Typical Inputs / Output of Interpolation
✓ Points to points
✓ Points to lines: contours (i.e., isolines)
✓ Points to vector polygons
✓ Points to raster grids
Sample data:
✓ Location: x, y coordinates
✓ Variable of interest that varies spatially (i.e., Z-value)
✓ Time of data capture
Sampling Strategy
✓ Number of samples
✓ Type of sample:
• Random,
• Uniform,
• Cluster,
• Adaptive sampling: fewer samples taken in homogeneous areas
Systematic – Regular Lattice
✓ Regular spatial interval
✓ Square or triangulated pattern
Bolstad, GIS Fundamentals
Random – Poisson Process
✓ Each location has an equal probability of being selected
✓ No location influences any other potential selection
Bolstad, GIS Fundamentals
Cluster - Neighborhoods
✓ Could be a stratified random clustering
✓ Could be a systematic or regular clustering pattern
Bolstad, GIS Fundamentals
Adaptive – Intelligent Sampling
✓ More sampling of data where patterns shift through space.
✓ Sampling pattern dictated by data variability.
✓ Example: surface or elevation points.
Bolstad, GIS Fundamentals
[Figure: adaptive sampling of elevation points. Source: Bolstad, GIS Fundamentals]
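The systematic, random, and cluster strategies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 100 x 100 study area, and the cluster spread are all hypothetical, and adaptive sampling is left out because it depends on the variability of the data being sampled.

```python
import random

def systematic_sample(xmin, ymin, xmax, ymax, nx, ny):
    """Regular lattice: nx by ny points, one at the center of each grid cell."""
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    return [(xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy)
            for j in range(ny) for i in range(nx)]

def random_sample(xmin, ymin, xmax, ymax, n, rng):
    """Every location is equally likely and no draw influences another."""
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]

def cluster_sample(centers, per_cluster, spread, rng):
    """Random points scattered within +/- spread of each cluster center."""
    return [(cx + rng.uniform(-spread, spread), cy + rng.uniform(-spread, spread))
            for cx, cy in centers for _ in range(per_cluster)]

rng = random.Random(42)                               # fixed seed for repeatability
lattice = systematic_sample(0, 0, 100, 100, 5, 4)     # 5 x 4 lattice
scattered = random_sample(0, 0, 100, 100, 20, rng)    # 20 independent points
centers = random_sample(0, 0, 100, 100, 4, rng)       # 4 randomly placed clusters
clustered = cluster_sample(centers, 5, 3.0, rng)      # 5 points per cluster
print(len(lattice), len(scattered), len(clustered))
```

All three calls return the same number of points, so the patterns can be compared at equal sampling effort.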
Interpolation Methods
✓ Several methods that vary in approach & complexity
✓ All methods use the sample points to estimate values at un-sampled locations
✓ Yet they usually produce different results from the same sample points, because the underlying mathematical formulas / models and the parameters used in estimation differ
Main Characteristics of Interpolation Methods
✓ Global vs. Local
✓ Exact vs. Inexact
✓ Deterministic vs. Stochastic
Global vs. Local Estimators
✓ Global: use all sample points to estimate values at un-sampled locations
✓ Local: estimates are based on neighboring points
“Everything is related to everything else, but close things are more closely related.”
• W. Tobler, 1st law of geography
Exact vs. Inexact Estimators:
✓ Exact: the output surface keeps the measured values at the input sample locations
✓ Inexact: the output surface may hold estimates even at the original sample locations
Deterministic vs. Stochastic Methods
✓ Deterministic: based on a mathematical model
✓ Stochastic: based on a geostatistical model that incorporates random variation and accounts for spatial autocorrelation
Evaluating Spatial Interpolation Results - Validation
One simple approach:
✓ Withhold a small subset of the sample points from the interpolation process
✓ Compare the estimated values at the withheld sample points with the observed values at those locations.
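The withholding approach can be sketched as follows. Everything here is an assumption made for illustration: the helper names, the 20% hold-out fraction, the use of root-mean-square error as the comparison measure, the synthetic surface, and the simple inverse-distance interpolator used as the method under test.

```python
import math
import random

def idw(x0, y0, samples, power=2.0):
    """Stand-in interpolator: inverse-distance weighted mean of (x, y, z) samples."""
    num = den = 0.0
    for x, y, z in samples:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return z                      # exactly on a sample point
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den

def holdout_rmse(samples, interpolate, fraction=0.2, seed=0):
    """Withhold a fraction of the samples, predict them, return the RMSE."""
    pts = samples[:]
    random.Random(seed).shuffle(pts)
    k = max(1, int(len(pts) * fraction))
    withheld, training = pts[:k], pts[k:]
    squared = [(interpolate(x, y, training) - z) ** 2 for x, y, z in withheld]
    return math.sqrt(sum(squared) / len(squared))

# Synthetic test surface (hypothetical): z = 0.5x + 0.2y sampled at 50 random points
rng = random.Random(1)
samples = []
for _ in range(50):
    x, y = rng.uniform(0, 100), rng.uniform(0, 100)
    samples.append((x, y, 0.5 * x + 0.2 * y))

rmse = holdout_rmse(samples, idw)
print(round(rmse, 2))
```

Running the same function with different interpolators or parameters gives a simple basis for comparing them on the same withheld points.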
Spatial Interpolation Algorithms
A very brief review of the most commonly used spatial interpolation techniques
Algorithm – same as – a recipe
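Most of these techniques estimate the value as a weighted average of the sampled values:
$$\hat{z}(x_0) = \sum_{i=1}^{n} \lambda_i \, z(x_i)$$
where $\hat{z}(x_0)$ is the estimated value of an attribute at the point of interest $x_0$, $z(x_i)$ is the observed value at the sampled point $x_i$, $\lambda_i$ is the weight assigned to the sampled point, and $n$ represents the number of sampled points used for the estimation (Webster and Oliver, 2001).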
Spatial Interpolation Techniques
Deterministic Methods:
✓ Natural neighbors: Thiessen polygons
✓ IDW: inverse distance weighted
✓ Spline functions
Geostatistical Methods:
✓ Kriging
Nearest Neighbors
The nearest neighbors (NN) method predicts the value of an attribute at an un-sampled point based on the value of the nearest sample. Drawing perpendicular bisectors between the sampled points (n) forms Thiessen (or Dirichlet/Voronoi) polygons ($V_i$, $i = 1, 2, \dots, n$).
The estimate of the attribute at any un-sampled point $x_0$ within polygon $V_i$ is the measured value at the nearest single sampled data point $x_i$, that is, $\hat{z}(x_0) = z(x_i)$. The weights are:
$$\lambda_i = \begin{cases} 1 & \text{if } x_0 \in V_i \\ 0 & \text{otherwise} \end{cases}$$
where $\lambda_i$ is the weight assigned to the sampled point $x_i$.
Thiessen Polygons
✓ Aka Nearest Neighbor Interpolation
✓ One point, the nearest point, is used to assign a value to an unsampled location
✓ Space is partitioned into Thiessen (aka Voronoi or Dirichlet) polygons, which can be derived from the Delaunay triangulation of the sample points.
✓ Each location within a polygon is closer to that polygon's sample point than to any other sample point
✓ Defines areas of influence
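Because every location inside a Thiessen polygon takes the value of that polygon's sample point, the interpolation can be sketched without building the polygons at all: just look up the closest sample. The function name and the three sample points below are hypothetical.

```python
import math

def nearest_neighbor(x0, y0, samples):
    """Return the z value of the sample closest to (x0, y0).

    samples is a list of (x, y, z) tuples. Ties go to the first sample listed.
    """
    return min(samples, key=lambda s: math.hypot(s[0] - x0, s[1] - y0))[2]

samples = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 30.0)]

print(nearest_neighbor(1, 1, samples))   # closest to the sample at (0, 0)
print(nearest_neighbor(9, 2, samples))   # closest to the sample at (10, 0)

# Filling a coarse raster reproduces the blocky "areas of influence" pattern
grid = [[nearest_neighbor(x, y, samples) for x in range(0, 11, 5)]
        for y in range(0, 11, 5)]
```

The surface is exact at the sample points and changes abruptly at polygon boundaries, which is why the result looks like a mosaic instead of a smooth surface.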
Triangular Irregular Network
The triangular irregular network (TIN) was developed by Peucker (Poiker) and co-workers (Little, Fowler, Mark 1978) for digital elevation modeling that avoids the redundancies of the altitude matrix in the grid system
[Figure: Voronoi (Thiessen) polygons and the corresponding Delaunay triangulation (TIN)]
Thiessen Polygons
✓ Thiessen polygon boundaries are the perpendicular bisectors of straight lines drawn between two neighboring points (Delaunay triangulation, in red)
[Figure: police stations and their Voronoi (Thiessen) polygons, an unconstrained allocation solution (point-polygon class)]
Thiessen Polygon Interpolation
Bolstad, 3rd edition, GIS Fundamentals
Thiessen Polygon Interpolation
✓ Local estimator: estimates are based on the nearest sample point
✓ Exact: sample values are maintained in output
✓ Deterministic: mathematical model based on Delaunay triangulation
Thiessen Polygon - Natural Neighbors
✓ A new Voronoi polygon (beige) is created around the interpolation point (red star).
✓ For each neighbor, the area of the portion of its original polygon that became incorporated in the tile of the new point is calculated.
✓ These areas are scaled to sum to 1 and are used as weights for the corresponding samples, so each weight is the proportion of overlap between the new polygon and one of the initial polygons.
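The area-stealing idea can be approximated on a raster without constructing any polygons: count the cells that switch owner when the interpolation point is added. This is a rough sketch, not an exact polygon-overlay implementation. The function name, the cell count, and the padding are hypothetical, and it assumes the interpolation point lies well inside the cloud of samples and that the samples are not all on one vertical or horizontal line.

```python
def natural_neighbor_weights(x0, y0, samples, cells=200, pad=0.5):
    """Approximate natural neighbor weights for (x0, y0) by counting raster cells.

    samples is a list of (x, y) tuples; returns one weight per sample.
    """
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    xmin, ymin = min(xs) - pad * width, min(ys) - pad * height
    dx = (1 + 2 * pad) * width / cells
    dy = (1 + 2 * pad) * height / cells
    stolen = [0] * len(samples)
    for j in range(cells):
        cy = ymin + (j + 0.5) * dy
        for i in range(cells):
            cx = xmin + (i + 0.5) * dx
            d2 = [(sx - cx) ** 2 + (sy - cy) ** 2 for sx, sy in samples]
            owner = min(range(len(samples)), key=d2.__getitem__)
            # The cell joins the new polygon if the new point is closer than its owner
            if (x0 - cx) ** 2 + (y0 - cy) ** 2 < d2[owner]:
                stolen[owner] += 1
    total = sum(stolen)
    return [count / total for count in stolen]

corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
values = [10.0, 20.0, 30.0, 40.0]
weights = natural_neighbor_weights(0.5, 0.5, corners)
estimate = sum(w * z for w, z in zip(weights, values))
print([round(w, 2) for w in weights], round(estimate, 1))
```

For a point at the center of a square of samples, each neighbor gives up roughly the same area, so the weights come out close to one quarter each.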
[Figure: sample points inside a fixed radius around the location where the value is estimated]
Inverse Distance Weighting
The inverse distance weighting or inverse distance weighted (IDW) method estimates the values of an attribute at un-sampled points using a linear combination of the values at sampled points, weighted by an inverse function of the distance from the point of interest to each sampled point.
The assumption is that sampled points closer to the un-sampled point are more similar to it in their values than those further away (Tobler's Law, once again referenced). The weights can be expressed as:
$$\lambda_i = \frac{1 / d_i^{\,p}}{\sum_{j=1}^{n} 1 / d_j^{\,p}}$$
where $d_i$ is the distance between $x_0$ and $x_i$, $p$ is a power parameter, and $n$ represents the number of sampled points used for the estimation.
Local, exact, deterministic method
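A minimal sketch of IDW, assuming samples are given as (x, y, z) tuples. The function name and the sample values are hypothetical; `p` is the power parameter and `n` limits the estimate to the n nearest samples, matching the symbols used above.

```python
import math

def idw(x0, y0, samples, p=2.0, n=None):
    """Inverse distance weighted estimate at (x0, y0).

    samples: list of (x, y, z) tuples
    p:       power parameter
    n:       number of nearest samples to use (None means all of them)
    """
    ranked = sorted((math.hypot(x - x0, y - y0), z) for x, y, z in samples)
    if n is not None:
        ranked = ranked[:n]
    if ranked[0][0] == 0.0:
        return ranked[0][1]               # exact: return the measured value
    raw = [1.0 / d ** p for d, _ in ranked]
    total = sum(raw)                      # normalizing makes the weights sum to 1
    return sum(w / total * z for w, (_, z) in zip(raw, ranked))

samples = [(0, 0, 10.0), (4, 0, 20.0), (0, 3, 30.0)]
print(idw(0, 0, samples))                 # on a sample point
print(idw(2, 0, samples, p=1, n=2))       # midway between the first two samples
print(idw(2, 0, samples, p=2))            # all three samples, squared distances
```

The early return at zero distance is what makes the method exact, and restricting the estimate to the n nearest samples is what makes it local.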
Inverse Distance Weighting
[Figure: distance d_i between the location being estimated, x0, and sample point xi]
IDW Interpolation
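$$Z_j = \frac{\sum_i Z_i / d_{ij}^{\,n}}{\sum_i 1 / d_{ij}^{\,n}}$$
Zj = estimated value at location j
Zi = measured value at sample point i
dij = distance from sample point i to location j
i = index over the sample pts considered, here 3
n = user defined exponent that can be used to increase the weight of nearby pts, here n = 1 (no exponent). On this slide n is the exponent (the power parameter p), not the number of points.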
The weight factor
✓ The main factor affecting the accuracy of IDW is the value of the power parameter.
✓ Weights diminish as the distance increases, especially when the value of the power parameter increases, so nearby samples have a heavier weight and more influence on the estimation, and the resulting spatial interpolation is local.
✓ The choice of power parameter and neighborhood size is arbitrary, based on random choice or personal whim.
IDW Interpolation
✓ Where the Voronoi method is based on the closest point, IDW derives an estimate based on a user defined parameter for the number of sample pts to consider (i.e., search radius).
• The larger the number of sample points, the smoother the resulting surface, up to the point where all sample points are used and one value is estimated for the entire output surface.
✓ The user can also input a power parameter
• The higher the power, the greater the influence of nearby points, but the resulting surface is not as smooth as with a lower power.
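The effect of the power parameter is easiest to see on the weights themselves. The sketch below uses three hypothetical distances (1, 2 and 4 units) and prints the normalized weights for several powers; the function name is an assumption.

```python
def idw_weights(distances, p):
    """Normalized inverse-distance weights for a list of non-zero distances."""
    raw = [1.0 / d ** p for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

distances = [1.0, 2.0, 4.0]   # hypothetical distances to three sample points
for p in (1, 2, 4):
    print(p, [round(w, 3) for w in idw_weights(distances, p)])
```

With these distances the nearest sample carries about 57% of the weight at p = 1 and about 94% at p = 4, which is why high powers give surfaces dominated by the closest point.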
IDW Interpolation
[Figure: IDW surfaces for different numbers of sample points (i) and weight exponents (n)]
IDW Interpolation
[Figure: IDW surfaces using 12 sample points with p = 1 (linear), p = 2, and p = 4]
IDW Interpolation
Local points have more influence
Spline Interpolation
✓ Technique is named after a spline, a flexible ruler that was used by draftsmen to draw a smooth road from a set of survey points.
✓ The spline creates the smoothest possible line along the set of points.
Splines
✓ Spline functions, which are based on a set of polynomial functions, serve the same purpose as the bendy ruler with a set of sample points.
Splines
✓ For surface creation, spline functions are like bending a rubber sheet to pass through all the sample points, while minimizing the total curvature of the surface.
✓ Usually but not always exact interpolators, as exactness may not result in a smooth surface.
✓ As with the IDW method, you can input the number of points to consider in the estimate.
• The more points, the more distant sample pts impact the local estimate and the smoother the overall surface.
Splines
✓ Splines are good spatial interpolators for gently varying surfaces like elevation, water tables, pollution concentrations.
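The bendy-ruler idea can be sketched in one dimension with a natural cubic spline: piecewise cubic polynomials that pass through every survey point with continuous slope and curvature. This is only the 1-D analogue, not the surface splines a GIS package fits; the function name and the survey points are hypothetical, and the x values are assumed to be strictly increasing.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1                                   # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    M = [0.0] * (n + 1)                               # second derivatives, zero at both ends
    if n >= 2:
        # Tridiagonal system for the interior second derivatives
        a = [h[i - 1] for i in range(1, n)]
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        c = [h[i] for i in range(1, n)]
        d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n)]
        for k in range(1, n - 1):                     # forward elimination
            m = a[k] / b[k - 1]
            b[k] -= m * c[k - 1]
            d[k] -= m * d[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = d[-1] / b[-1]
        for k in range(n - 3, -1, -1):                # back substitution
            sol[k] = (d[k] - c[k] * sol[k + 1]) / b[k]
        M[1:n] = sol

    def evaluate(x):
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)   # interval containing x
        t0, t1 = xs[i + 1] - x, x - xs[i]
        return ((M[i] * t0 ** 3 + M[i + 1] * t1 ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - M[i] * h[i] / 6.0) * t0
                + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6.0) * t1)

    return evaluate

survey_x = [0.0, 1.0, 2.0, 3.0, 4.0]
survey_y = [0.0, 1.0, 0.0, 1.0, 0.0]
road = natural_cubic_spline(survey_x, survey_y)
print([round(road(x), 3) for x in (0.0, 0.5, 1.0, 1.5, 2.0)])
```

The curve reproduces each survey value exactly, which corresponds to the exact case described above; smoothing variants relax that condition.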
Splines
Polynomial
Polynomial comes from poly- (meaning "many") and -nomial (in this case meaning "term") ... so it says "many terms"
A polynomial can have:
• constants (like 3, -20, or ½)
• variables (like x and y)
• exponents (like the 2 in y²)
… that can be combined using addition, subtraction, multiplication and division
A polynomial can have constants, variables and exponents, but never division by a variable.
Global polynomial interpolation (GPI)
✓ Global polynomial interpolation (GPI) fits a smooth surface that is defined by a mathematical function (a polynomial) to the input sample points.
✓ The global polynomial surface changes gradually and captures coarse-scale pattern in the data.
✓ Conceptually, GPI is like taking a piece of paper and fitting it between the raised points (raised to the height of value).
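The flat sheet of paper corresponds to a first-order polynomial, z = a + b·x + c·y, fitted to all sample points at once by least squares. The sketch below is an illustration under that assumption: the function names and sample values are hypothetical, and higher-order surfaces would simply add terms such as x², xy and y².

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_plane(samples):
    """Least-squares coefficients (a, b, c) of z = a + b*x + c*y."""
    rows = [(1.0, x, y) for x, y, _ in samples]
    zs = [z for _, _, z in samples]
    # Normal equations: (X^T X) coefficients = X^T z
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]
    return solve(A, rhs)

samples = [(0, 0, 2.0), (10, 0, 7.5), (0, 10, -0.5), (10, 10, 4.0), (5, 5, 3.5)]
a, b, c = fit_plane(samples)
residuals = [z - (a + b * x + c * y) for x, y, z in samples]
print(round(a, 3), round(b, 3), round(c, 3))
```

The residuals are generally not zero, which shows that a global polynomial is an inexact interpolator: the surface passes between the points instead of through them.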
Spatial regression
Use observations of dependent variables, independent variables, and sample coordinates to develop a prediction equation.
Zi = f(xi,yi,ai,bj)
Trend Surface:
✓ A spatial regression in which a statistical model, the trend surface, is fitted through the measured points.
✓ Trend surfaces are most accurate when you need to fit a smoothly varying surface, such as the mean daily temperature over a large area.
[Figure: simple spatial regression, comparing the original surface with the trend surface]
Kriging
✓ A set of geostatistical estimators
✓ Standard (i.e., non-spatial) statistical methods are based on the assumption of independence / normal distribution of data values, which is violated by spatial autocorrelation.
✓ Geostatistical models account for spatial autocorrelation, a measure of the tendency of nearby points to have similar values.
Kriging
✓ Kriging is based on 3 main components of the sample data: the spatial trend, spatial autocorrelation, and random variation.
✓ These three are combined in a mathematical model to create an estimation function.
✓ The function is then applied to the data for the sample points and used to estimate values over the surface of the study area.
✓ The semivariogram plots the semivariance over lag distances
✓ Semivariance is typically small at small lag distances and increases to a plateau
Kriging - Semivariogram
✓ The semivariogram is defined as
γ(si, sj) = ½ var(Z(si) - Z(sj))
where var is the variance.
✓ If two locations, si and sj, are close to each other in terms of the distance measure d(si, sj), you expect them to be similar, so the difference in their values, Z(si) - Z(sj), will be small. As si and sj get farther apart, they become less similar, so the difference in their values, Z(si) - Z(sj), will become larger.
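In practice the semivariogram is estimated from the samples by grouping pairs of points into lag-distance bins and averaging half the squared difference in each bin. The sketch below assumes that common estimator; the function name, bin width, and the tiny transect data set are hypothetical.

```python
import math
from collections import defaultdict

def empirical_semivariogram(samples, lag_width, max_lag):
    """Return [(lag bin center, semivariance, pair count), ...].

    samples: list of (x, y, z) tuples. Pairs separated by max_lag or more are ignored.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(samples)):
        xi, yi, zi = samples[i]
        for j in range(i + 1, len(samples)):
            xj, yj, zj = samples[j]
            h = math.hypot(xi - xj, yi - yj)
            if h >= max_lag:
                continue
            k = int(h // lag_width)                # lag bin index
            sums[k] += 0.5 * (zi - zj) ** 2        # half the squared difference
            counts[k] += 1
    return [((k + 0.5) * lag_width, sums[k] / counts[k], counts[k])
            for k in sorted(counts)]

# Four samples along a transect, values rising steadily with distance
transect = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 2.0), (3.0, 0.0, 3.0)]
for center, gamma, pairs in empirical_semivariogram(transect, 1.0, 4.0):
    print(center, gamma, pairs)
```

Semivariance grows with lag here because the values keep diverging along the transect; real data usually level off once points are far enough apart to be unrelated.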
Kriging - Semivariogram
The height that the semivariogram reaches when it levels off is called the sill.
nugget effect + the partial sill = the sill
The distance at which the semivariogram levels off to the sill is called the range.
[Figure: semivariogram with the nugget, partial sill, sill, and range labeled]
Kriging - Covariance function
✓ The covariance function is defined to be
C(si, sj) = cov(Z(si), Z(sj)),
where cov is the covariance. Covariance is a scaled version of correlation. When two locations, si and sj, are close to each other, you expect them to be similar, and their covariance (a correlation) will be large. As si and sj get farther apart, they become less similar, and their covariance becomes zero.
Kriging - Covariance function
[Figure: covariance function decreasing with distance, with the nugget, partial sill, sill, and range labeled]
Kriging – Semivariogram & Covariance function
✓ The relationship between the semivariogram and the covariance function:
γ(si, sj) = sill - C(si, sj),
✓ Semivariogram and covariance both measure the strength of statistical correlation as a function of distance.
✓ There are some instances when semivariograms exist, but covariance functions do not.
✓ There are no hard-and-fast rules on choosing the "best" semivariogram model.
✓ The process of modeling semivariograms and covariance functions fits a semivariogram or covariance curve to your empirical data.
✓ The goal is to achieve the best fit, and also incorporate your knowledge of the phenomenon in the model.
✓ The model will then be used in your predictions.
✓ The sill, range, and nugget are the important characteristics of the model.
Kriging - Semivariogram
✓ Nugget: initial semivariance when autocorrelation is highest. Theoretically, the semivariance should be zero when the lag distance is zero. Thus, the nugget is an indicator of the error in the sample measurements
✓ Sill is the point of plateau: this can be thought of as the natural variation when there is little autocorrelation
✓ Range is the lag distance at which the sill is reached.
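To make the nugget, sill and range concrete, the sketch below uses the spherical model, one common choice of semivariogram curve. The slides do not name a particular model, so the choice of model, the function names, and the parameter values are all assumptions made for illustration. The covariance is derived from the relationship γ = sill - C.

```python
def spherical_semivariogram(h, nugget, partial_sill, range_):
    """Spherical model: rises from the nugget and levels off at the sill."""
    if h <= 0.0:
        return 0.0                        # semivariance is zero at zero lag
    if h >= range_:
        return nugget + partial_sill      # the sill, reached at the range
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def covariance(h, nugget, partial_sill, range_):
    """Covariance from the relationship C = sill - gamma."""
    sill = nugget + partial_sill
    return sill - spherical_semivariogram(h, nugget, partial_sill, range_)

nugget, partial_sill, range_ = 0.2, 0.8, 50.0    # hypothetical fitted parameters
for h in (0.0, 10.0, 25.0, 50.0, 80.0):
    print(h,
          round(spherical_semivariogram(h, nugget, partial_sill, range_), 3),
          round(covariance(h, nugget, partial_sill, range_), 3))
```

The semivariance jumps to the nugget just beyond zero lag, climbs by the partial sill, and stays at the sill beyond the range, while the covariance falls to zero over the same distance.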
Kriging - Semivariogram
Kriging Recap
✓ Kriging resembles IDW in that distance and weights are used to estimate values at unsampled locations.
✓ However, IDW uses a coarse weighting scheme (inverse distance), while Kriging uses the semivariance to calculate weights that minimize error in the predicted values.
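How semivariance turns into weights can be sketched with ordinary kriging, one member of the Kriging family. This is a minimal illustration and not the procedure of any particular GIS package: the spherical model and its parameters are assumed rather than fitted, the function names and sample values are hypothetical, and no trend is modeled. The weights solve a linear system built from the semivariances between samples, with one extra equation forcing the weights to sum to 1.

```python
import math

def spherical(h, nugget=0.0, partial_sill=1.0, range_=10.0):
    """Assumed semivariogram model (parameters are hypothetical, not fitted)."""
    if h <= 0.0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ordinary_kriging(x0, y0, samples, gamma=spherical):
    """Return (estimate, kriging variance, weights) at (x0, y0)."""
    n = len(samples)
    # Semivariances between every pair of samples, plus the sum-to-one constraint
    A = [[gamma(math.hypot(xi - xj, yi - yj)) for xj, yj, _ in samples] + [1.0]
         for xi, yi, _ in samples]
    A.append([1.0] * n + [0.0])
    # Semivariances between each sample and the prediction location
    b = [gamma(math.hypot(xi - x0, yi - y0)) for xi, yi, _ in samples] + [1.0]
    solution = solve(A, b)
    weights, mu = solution[:n], solution[n]       # mu is the Lagrange multiplier
    estimate = sum(w * z for w, (_, _, z) in zip(weights, samples))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights

samples = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0), (0.0, 3.0, 30.0), (5.0, 5.0, 25.0)]
estimate, variance, weights = ordinary_kriging(2.0, 2.0, samples)
print(round(estimate, 2), round(variance, 3), [round(w, 3) for w in weights])
```

Unlike IDW, the result includes a kriging variance for each prediction, which is the kind of accuracy measure a statistical model makes possible.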
Kriging Pros & Cons
Pros
• Because they are based on statistical models, Kriging methods can produce evaluative measures of the accuracy of the predictions.
• Effective method when samples are sparse.
• In theory, Kriging methods should produce optimal interpolation weights.
Cons
• Much more complex and nuanced process than the deterministic spatial interpolation methods.
• There are no hard-and-fast rules on choosing the "best" semivariogram model.
• Much more computationally intensive.
Kriging Surface
[Figure: original contours compared with Kriging contours]
Kriging
✓ A detailed review of Kriging is beyond the scope of this presentation and course. Dedicated self-study and/or a course on geostatistics/spatial statistics (such as: ESPMc177 / LD ARCHc177) is needed to better understand and appropriately apply the method.
✓ For more info, see:
• Burrough & McDonnell, Principles of GIS, Chap. 5 & 6
• Bailey & Gatrell, Interactive Spatial Data Analysis