Review of digital soil mapping steps

Review of DSM steps

Review of DSM concept
• Computer-assisted soil mapping
– Computer models replace tacit models
– Computer models link soil-forming factors (Soil, Climate, Organism, Relief, Parent material)
– Produces digital soil maps (with error/uncertainty maps)

• Its important features:
– High-quality and spatially explicit input data
– Computer and internet applications
– Soil survey principles
– GIS and remote sensing principles
– Computer and geostatistical modelling

• Key points to consider
– Adequate understanding of data and DSM steps
– Optimum data mining
– Documentation of steps and data
– Accuracy assessment and reporting

DSM steps
• Step 1: Decide on mapping focus
– Study area
– Geographic projection
– Mapping soil property or class (categorical) values
• Step 2: Identify data needs and sources
– Data needs (SCORPAN factors)
– Data sources (availability of SCORPAN factors)
• Step 3: Data collection and documentation
– Database creation
– Metadata development
• Step 4: Input data preparation
– GIS database development
– GIS operations (clipping, re-projection, format translation)
– Digital terrain analysis (digital terrain parameters)
– Image/raster analysis (correction, re-sampling, normalization, PCA)
• Step 5: Spatial prediction
• Step 6: Accuracy assessment

Summary steps
1. Get data (Institute, Internet)
2. Develop GIS database (MS Office, QGIS)
3. Terrain modelling (SAGA)
4. Image transformation (ILWIS)
5. Soil mapping (R, RStudio)

Detailed DSM steps
1. Choice of focus (soil type, soil property, scale) and study area
2. Data identification, collection and documentation
   1. SCORPAN factors
   2. Decide minimum dataset
   3. Data sources: national libraries, internet, NGOs, colleagues
3. Data organization
   1. Develop input database
   2. Choose appropriate scale/resolution
4. Choice of tools
   1. Database tools (MS Office)
   2. GIS and image pre-processing tools
   3. Spatial interpolation and regression tools
5. Methodical application of tools and documentation
6. Accuracy assessment and reporting

1. Choice of focus and study area

• Interest (national, regional, etc)

• Data availability

• Other factors

Deciding on minimum dataset

Figure: options ranked along an arrow labelled "Increasing level of data adequacy".

Which soil data is available?
– Detailed soil map with legend and soil data
– Soil point data with site description
– Detailed soil map with legend
– No data

Which environmental covariate is available?
– All covariates: Climate (C), Organism (O), Relief (R), Parent material (P)
– At least 3 covariates, including Relief (R) and Organism (O)
– At least 2 covariates, including Relief (R)
– Only one covariate
– No data
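The covariate categories in the minimum-dataset chart can be written as a small rule set. The sketch below is in Python purely for illustration (the course itself works in QGIS, SAGA, ILWIS and R). The function name and the returned labels are hypothetical. Combinations the chart does not list (for example, two covariates without Relief) are returned as "not covered by the chart" rather than guessed.

```python
# Hypothetical helper: the name and return labels are illustrative only.
SCORPAN_COVARIATES = {"C", "O", "R", "P"}  # Climate, Organism, Relief, Parent material


def covariate_category(available):
    """Map a collection of available covariate codes to the chart's category."""
    have = set(available) & SCORPAN_COVARIATES
    if have == SCORPAN_COVARIATES:
        return "all covariates"
    if len(have) >= 3 and {"R", "O"} <= have:
        return "at least 3 covariates including R and O"
    if len(have) >= 2 and "R" in have:
        return "at least 2 covariates including R"
    if len(have) == 1:
        return "only one covariate"
    if not have:
        return "no data"
    # The chart lists no category for this combination.
    return "not covered by the chart"


print(covariate_category(["R", "O", "P"]))
print(covariate_category(["R"]))
```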

2. Data identification and collection

• Required data – soil-forming factors as input GIS layers
– S – soil (profiles, soil map)
– C – climate (mean rainfall)
– O – organism (NDVI, land cover/use)
– R – relief (elevation and terrain parameters)
– P – parent material (geology, soil map)
– N – spatial coordinates

• Data sources
– National institutions, NGOs, research institutes

– Regional databases

– Global databases – use of internet

3. Data organization

Methods for data organization

– Input point-data organized in MS Office

– GIS database developed in QGIS

4. Choice of tools
• Data organization
– MS Office for point-data
– QGIS for spatial data
• 1st stage: Data pre-processing
– Georeferencing using QGIS
– Coordinate transformation using QGIS
– Digital terrain analysis using SAGA
– Study area clipping and vector-raster conversion using QGIS
• 2nd stage: Data pre-processing
– Pixel re-sampling using ILWIS
– Image transformation (to normal distribution) using ILWIS
– Principal Component Analysis
• Spatial modelling
– Spatial interpolation using R

Sequence of input data organization

1. Gather and align point-data in spreadsheet

– Develop database in Spreadsheet (MS Office)

– Create at least 4 columns – ID, X, Y, Attribute

– Check consistency: missing, outlying, or factor entries

– Create a sheet for documentation (description)

2. Gather and align other SCORPAN factors

– Download and georeference scanned maps in QGIS

– Download and stack multiband image layers in QGIS

– Download single-band images and visualize in QGIS

– Note and document map/image characteristics

3. Choosing the right scale/pixel size and samples

1. Pixel-samples formula: a cartographic rule, which is mathematically given as

$$\text{pixel} = 0.0791 \times \sqrt{\frac{\text{Area}}{\text{Samples}}}$$

Use the model to estimate required samples for a given resolution

2. Largest pixel size of input SCORPAN factors
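The pixel-samples rule can be used in both directions: to get a pixel size from a sample count, or the required samples for a target resolution. The sketch below is in Python for illustration only. It reads the rule as pixel = 0.0791 × sqrt(Area / Samples) and assumes area in square metres and pixel size in metres; the slide does not state units. The function names and the 100 km² study area are hypothetical.

```python
import math

K = 0.0791  # constant from the slide's cartographic rule


def pixel_size(area, samples):
    """Pixel size implied by the rule: pixel = 0.0791 * sqrt(area / samples)."""
    if area <= 0 or samples <= 0:
        raise ValueError("area and samples must be positive")
    return K * math.sqrt(area / samples)


def samples_needed(area, pixel):
    """Invert the rule: samples = area * (0.0791 / pixel) ** 2, rounded up."""
    if area <= 0 or pixel <= 0:
        raise ValueError("area and pixel must be positive")
    return math.ceil(area * (K / pixel) ** 2)


# Assumed units: area in square metres, pixel size in metres.
area_m2 = 100 * 1_000_000  # hypothetical 100 km^2 study area
print(round(pixel_size(area_m2, 250), 1))  # pixel size supported by 250 samples
print(samples_needed(area_m2, 100))        # samples needed for a 100 m pixel
```

Rounding the sample count up means the pixel size implied by that count never exceeds the target resolution.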

Methodical application of tools

1. Display input data in QGIS and note projection

2. Decide on one projection (not decimal degrees)

3. Ensure all input data conform to the projection. If yes, they should overlay. If not, investigate:

– Layer properties (general/metadata)

– Displayed layer coordinates, values, pixel sizes

– The layer should be in the study area

4. Clip all data to the study area using QGIS/ILWIS

5. Export DEM as Arc Info ASCII to SAGA

6. Fill sinks and carry out digital terrain analysis in SAGA

7. Export the terrain parameters as Arc Info ASCII for 2nd stage pre-processing in ILWIS

8. Resample layers to a common pixel size in ILWIS

9. Histogram analysis and normalization of images

10. Image transformation and SCORPAN maplist creation

11. Carry out principal component analysis (PCA)

12. Choose the best (optimum) PCs from PCA layers

13. Remove undefined areas from the chosen PCs

14. Import data into R

15. Carry out spatial regression in R using the R code

16. Split the data in R for calibration and validation

17. Carry out accuracy assessment

18. Report on map output
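Step 16 splits the point data into a calibration set and a validation (holdout) set. The course does this in R; the sketch below uses Python only to illustrate the idea. The function name, the 25% validation fraction and the fixed seed are assumptions, not values from the slides.

```python
import random


def split_calibration_validation(records, validation_fraction=0.25, seed=42):
    """Randomly split records into (calibration, validation) lists.

    validation_fraction and seed are illustrative defaults (assumptions).
    """
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(records)
    if len(shuffled) < 2:
        raise ValueError("need at least two records to split")
    # A seeded generator keeps the split reproducible for documentation.
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical point records: (ID, X, Y, Attribute)
points = [(i, 500000 + 10 * i, 9800000 + 5 * i, 1.0 + 0.1 * i) for i in range(20)]
calibration, validation = split_calibration_validation(points)
print(len(calibration), len(validation))
```

Fixing the seed makes the split repeatable, which fits the emphasis on documenting each step.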

DSM accuracy assessment & reporting

• Digital soil maps are not 100% accurate due to:

– Gaps and errors in input data

– Modelling errors

– Sampling errors

• Uncertainty represents the departure of soil map output from truth

• Assessment and visualization of uncertainty is a standard procedure in DSM

• For a continuous-variable map, uncertainty is expressed by:
– Using geostatistical models
– Comparing map output with independent validation data

• Using a geostatistical model
– Variance map output
– The map shows where samples were taken
– Sampling points have low uncertainties

[Figure panels: Map output; Uncertainty map]

• Holdout validation
– Use independent samples
– Compare map values with holdout samples
– True uncertainty

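• Comparison statistics
– Mean error (ME)
– Root mean square error (RMSE)
– Correlation (r)

• RMSE and ME should be close to zero

$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{pred}_i - \mathrm{val}_i\right)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{pred}_i - \mathrm{val}_i\right)^{2}}$$

Here pred_i is the map value and val_i the validation value at holdout sample i, with n holdout samples.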
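The three comparison statistics can be computed directly from paired map predictions and holdout values. The sketch below is in Python for illustration (the course performs this step in R). It takes ME as the mean of (pred − val), RMSE as the square root of the mean squared difference, and assumes Pearson's coefficient for the correlation. The function names and sample numbers are hypothetical.

```python
import math


def _check(pred, val):
    if len(pred) != len(val) or len(pred) == 0:
        raise ValueError("pred and val must be non-empty and of equal length")


def mean_error(pred, val):
    """ME: average of (pred_i - val_i); measures bias."""
    _check(pred, val)
    return sum(p - v for p, v in zip(pred, val)) / len(pred)


def rmse(pred, val):
    """RMSE: square root of the average squared difference."""
    _check(pred, val)
    return math.sqrt(sum((p - v) ** 2 for p, v in zip(pred, val)) / len(pred))


def pearson_r(pred, val):
    """Pearson correlation between predictions and validation values."""
    _check(pred, val)
    n = len(pred)
    mp, mv = sum(pred) / n, sum(val) / n
    cov = sum((p - mp) * (v - mv) for p, v in zip(pred, val))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sv = math.sqrt(sum((v - mv) ** 2 for v in val))
    return cov / (sp * sv)


# Hypothetical map predictions and holdout observations
pred = [2.0, 3.0, 5.0, 4.0]
val = [1.0, 3.0, 4.0, 6.0]
print(mean_error(pred, val), rmse(pred, val), pearson_r(pred, val))
```

In this made-up example the positive and negative errors cancel, so ME is zero while RMSE is not. The two statistics are therefore best reported together.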

Accuracy assessment of categorical maps

• For a categorical map, holdout validation data are used

• Accuracy is expressed by:
– Confusion matrix
– Agreement plot
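A confusion matrix cross-tabulates the classes observed at holdout points against the classes read from the map. The sketch below is in Python for illustration only. The class names are made up, and overall accuracy (the share of points on the matrix diagonal) is added as an assumed summary measure; the slides name only the confusion matrix and the agreement plot.

```python
from collections import Counter


def confusion_matrix(observed, predicted):
    """Return (classes, matrix) with rows = observed and columns = predicted."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty, equal length")
    classes = sorted(set(observed) | set(predicted))
    counts = Counter(zip(observed, predicted))
    matrix = [[counts[(o, p)] for p in classes] for o in classes]
    return classes, matrix


def overall_accuracy(matrix):
    """Share of holdout points on the diagonal (assumed summary measure)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical soil classes at six holdout points
observed = ["Acrisol", "Acrisol", "Vertisol", "Vertisol", "Ferralsol", "Ferralsol"]
predicted = ["Acrisol", "Vertisol", "Vertisol", "Vertisol", "Ferralsol", "Acrisol"]

classes, matrix = confusion_matrix(observed, predicted)
print(classes)
for name, row in zip(classes, matrix):
    print(name, row)
print(round(overall_accuracy(matrix), 2))
```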

Find some resources online

• Search Google for: YouTube FAO DSM Module
– Freely downloadable
