Multiple Criteria for Evaluating Land Cover
Classification Algorithms
Summary of a paper
by R.S. DeFries and
Jonathan Cheung-Wai Chan
April, 2000
Remote Sensing of Environment
Premise of the paper:
Proposes criteria for assessing algorithms for supervised land cover classification.
For land classification analysis to be operational, more automated procedures will be required.
Land cover monitoring using remotely-sensed satellite data is becoming more common.
Larger volumes of data at higher quality are becoming more readily available.
No single machine learning algorithm has been shown superior for all situations.
Supervised classification
Training stage (define useful land cover categories with spectral response patterns from training data of known cover).
Classification stage (reassign image pixels to land cover categories based on the match with defined spectral attributes).
Output stage (develop a categorized data set as maps, tables, or GIS data files).
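The three stages can be sketched with a deliberately simple classifier. The snippet below is illustrative only: the paper uses decision trees, not this minimum-distance-to-means rule, and every pixel value and class name here is made up.

```python
import math

# Hypothetical toy data: each "pixel" is a tuple of spectral band values.
# Training stage: pixels of known cover type define each class's mean
# spectral response pattern.
training_data = {
    "forest":  [(0.05, 0.40), (0.06, 0.45), (0.04, 0.42)],
    "water":   [(0.02, 0.03), (0.03, 0.02), (0.02, 0.04)],
    "pasture": [(0.10, 0.30), (0.12, 0.28), (0.11, 0.31)],
}

def train(samples_by_class):
    """Compute the mean spectral response per land cover category."""
    means = {}
    for label, pixels in samples_by_class.items():
        n = len(pixels)
        means[label] = tuple(sum(band) / n for band in zip(*pixels))
    return means

def classify(pixel, means):
    """Classification stage: assign the pixel to the closest class mean."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))

means = train(training_data)
# Output stage: a categorized data set (here, just a list of labels).
image = [(0.05, 0.41), (0.02, 0.03), (0.11, 0.29)]
print([classify(p, means) for p in image])  # ['forest', 'water', 'pasture']
```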
Example: Supervised classification
Training data Classified image
Objectives of this study:
Compare three machine learning algorithms for supervised land cover classification, based on four criteria, using two different data sets.
Data sets
8 km AVHRR data (Advanced Very High Resolution Radiometer from NOAA)
30 m Landsat Thematic Mapper scene (from the Pucallpa, Peru area)
Note: Reliable land cover classifications had been derived for both data sets based on expert knowledge (used in place of ground measurements).
1984 AVHRR data: included 6 channels at 8 km resolution.
1996 Landsat TM scene: included 5 bands at 30 m resolution.
Approximately 9000 pixels can be overlaid on the 8 km AVHRR data.
8 km AVHRR data
To train the classifiers: Overlaid Landsat scenes on AVHRR. Each pixel was labeled as a cover type based on interpretation of the Landsat scene.
To test the classification results: Obtained a random sample of 10,000 pixels from the final classification results of a previous study (they believe their test data has a high degree of confidence).
30 m Landsat Thematic Mapper scene
To train the classifiers: Data were selected by sampling the results of a previous study (5958 pixels).
To test the classification results: Data were randomly selected as an additional 12,084 pixels (although not independently derived, they were used to illustrate the evaluation criteria).
The three algorithms compared:
1. C5.0 decision tree (standard)
2. Decision tree w/ “Bagging”
3. Decision tree w/ “Boosting”
Note: Bagging and boosting (2 & 3) are refinements of (1) that build multiple iterations of classifiers. They can be applied to any supervised classification algorithm.
What is a decision tree?
a machine learning technique (algorithm) that analyzes data, recognizes patterns, and predicts through repeated learning instances
useful when it is important for humans to understand the classification structure
successfully applied to satellite data for extraction of land cover categories
1. C5.0 decision tree
predicts classes by repeatedly partitioning a data set into homogeneous subsets
variables are used to split subsets into further subsets
the most important component is the method used to estimate splits at each “node” of the tree
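To make "estimating splits" concrete: tree learners in the C4.5/C5.0 family score candidate splits by the reduction in entropy (information gain; C5.0 actually uses the related gain ratio). Below is a minimal pure-Python sketch with invented single-band data, not the paper's implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a set of class labels (0 = perfectly homogeneous)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature, threshold):
    """Entropy reduction from splitting on `feature <= threshold`."""
    left  = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
    right = [l for r, l in zip(rows, labels) if r[feature] >  threshold]
    if not left or not right:
        return 0.0  # degenerate split: no information gained
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder

# Toy example: one spectral band separates the two classes perfectly,
# so splitting on it removes all entropy.
rows   = [(0.1,), (0.2,), (0.8,), (0.9,)]
labels = ["water", "water", "forest", "forest"]
print(information_gain(rows, labels, feature=0, threshold=0.5))  # 1.0
```

The split chosen at each node is simply the (feature, threshold) pair with the highest such score; the tree then recurses on each resulting subset.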
2. C5.0 decision tree w/“Bagging”
takes repeated samples of the training data and generates a decision tree for each sample
a final classification result is obtained by plurality vote of the individual classifiers
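A minimal sketch of the bagging idea, assuming pure Python and toy single-band data; the per-class-mean "classifier" here is just a stand-in for the C5.0 tree the paper actually uses.

```python
import random
from collections import Counter, defaultdict

def train_centroid_classifier(sample):
    """Stand-in for a decision tree: classify a pixel by the nearest
    per-class mean of band 0 (kept deliberately simple)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (band0,), label in sample:
        sums[label] += band0
        counts[label] += 1
    means = {label: sums[label] / counts[label] for label in counts}
    return lambda pixel: min(means, key=lambda lab: abs(pixel[0] - means[lab]))

def bagging(train_set, n_classifiers=10, seed=0):
    """Train one classifier per bootstrap sample; predict by plurality vote."""
    rng = random.Random(seed)
    classifiers = []
    for _ in range(n_classifiers):
        # Sample the training set with replacement (same size as original).
        sample = [rng.choice(train_set) for _ in train_set]
        classifiers.append(train_centroid_classifier(sample))
    def predict(pixel):
        votes = Counter(clf(pixel) for clf in classifiers)
        return votes.most_common(1)[0][0]  # plurality vote
    return predict

train_set = ([((x,), "water") for x in (0.05, 0.10, 0.15, 0.20)]
             + [((x,), "forest") for x in (0.70, 0.80, 0.90, 0.95)])
predict = bagging(train_set)
print(predict((0.12,)), predict((0.85,)))  # water forest
```

Averaging many classifiers trained on resampled data is what gives bagging its stability: no single noisy observation can dominate the final vote.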
3. C5.0 decision tree w/“Boosting”
the entire training set is used to generate the decision tree, with a weight assigned to each training observation
subsequent decision tree iterations focus on misclassified observations
a final classification result is obtained by plurality vote of the individual classifiers
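The reweighting loop can be sketched in the AdaBoost style (the family C5.0's boosting belongs to). Everything below is an invented illustration: the one-split "stump" stands in for a full tree, and the data and class names are made up.

```python
import math
from collections import Counter

def train_weighted_stump(train_set, weights):
    """Weighted one-split classifier on band 0 (a stand-in for a full tree)."""
    best = None
    for t in sorted({x[0] for x, _ in train_set}):
        for lo, hi in (("water", "forest"), ("forest", "water")):
            err = sum(w for w, (x, y) in zip(weights, train_set)
                      if (lo if x[0] <= t else hi) != y)
            if best is None or err < best[0]:
                best = (err, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: lo if x[0] <= t else hi

def boost(train_set, train_classifier, n_rounds=5):
    """AdaBoost-style loop: reweight misclassified observations so later
    rounds focus on them; predict by a weighted vote of all rounds."""
    n = len(train_set)
    weights = [1.0 / n] * n                 # every observation starts equal
    rounds = []                             # (classifier, vote weight) pairs
    for _ in range(n_rounds):
        clf = train_classifier(train_set, weights)
        err = sum(w for w, (x, y) in zip(weights, train_set) if clf(x) != y)
        if err == 0:                        # perfect on training data: stop
            rounds.append((clf, 1.0))
            break
        if err >= 0.5:                      # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        rounds.append((clf, alpha))
        # Up-weight misclassified, down-weight correct, then renormalize.
        weights = [w * math.exp(alpha if clf(x) != y else -alpha)
                   for w, (x, y) in zip(weights, train_set)]
        total = sum(weights)
        weights = [w / total for w in weights]
    def predict(x):
        votes = Counter()
        for clf, alpha in rounds:
            votes[clf(x)] += alpha
        return votes.most_common(1)[0][0]
    return predict

train_set = [((0.1,), "water"), ((0.2,), "water"),
             ((0.8,), "forest"), ((0.9,), "forest")]
predict = boost(train_set, train_weighted_stump)
print(predict((0.15,)), predict((0.85,)))  # water forest
```

Unlike bagging, every round sees the full training set; only the observation weights change between iterations.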
One of the most important criteria in selecting an appropriate algorithm: the degree of human interpretation and involvement in the classification process.
Example: supervised classification (need for time-intensive collection of training data) vs. unsupervised classification (no training data).
As a result:
There are always trade-offs between accuracy, computational speed, and ability to automate the classification process.
Four assessment criteria were evaluated in the study:
Classification accuracy – overall, mean class, and adjusted (accounts for unequal costs of misclassification, which will vary)
Computational resources required
Stability of the algorithms w/r/t minor variability in input data
Robustness to noise in the training data (includes random noise in input and mislabeling of cover type in training data)
Summary: Results
Accuracy is comparable between the three algorithms using the two data sets.
The Bagging and Boosting algorithms are more stable and more robust to noise in the training data.
The Bagging algorithm is the most costly, and the standard decision tree is the least costly, in terms of computational resources.
The End
Thank you for listening
Accuracy
Accuracy is one of the primary criteria for comparing algorithms in the literature.
Accuracy = % pixels correctly classified in the test set.
In this study, all three algorithms provide fairly similar accuracies (generally within 5%).
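The overall and mean-class measures can be written down directly; the "adjusted" variant would additionally weight each error by a misclassification-cost matrix, which is omitted here. All data values below are made up.

```python
def overall_accuracy(predicted, actual):
    """Percent of test pixels whose predicted class matches the reference."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

def mean_class_accuracy(predicted, actual):
    """Average of per-class accuracies, so rare classes count equally."""
    classes = set(actual)
    per_class = []
    for c in classes:
        pairs = [(p, a) for p, a in zip(predicted, actual) if a == c]
        per_class.append(sum(p == a for p, a in pairs) / len(pairs))
    return 100.0 * sum(per_class) / len(classes)

predicted = ["forest", "forest", "water", "forest", "water"]
actual    = ["forest", "forest", "water", "water", "water"]
print(overall_accuracy(predicted, actual))     # 80.0
print(mean_class_accuracy(predicted, actual))  # ~83.3
```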
Computational resources
Likely to be a key consideration in machine learning, where “amount of work done” is used as a measure of operations performed.
Standard tree: requires the fewest resources.
Bagging: number of operations increases in proportion to the number of samples used.
Boosting: number of operations increases in proportion to the number of iterations used.
Stability of algorithm
The algorithm should ideally produce stable results given minor variability in input data; otherwise, it may incorrectly indicate land cover changes when none occurred.
Variable input data can be common if training data are from the same locations.
Test method: random sampling generated 10 training sets (to approximate minor variation).
Bagging and Boosting provide more stable classification (less sensitivity to variation) than a standard decision tree.
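The stability test can be mimicked in a few lines: draw several random training sets from the same labeled pool, retrain on each, and compare the resulting classifications pixel by pixel. The agreement measure below is an invented illustration, not the paper's exact statistic.

```python
import random

def random_training_sets(pool, n_sets=10, size=50, seed=0):
    """Draw several training sets from one labeled pool to approximate
    minor variation in the input data."""
    rng = random.Random(seed)
    return [rng.sample(pool, size) for _ in range(n_sets)]

def classification_agreement(labels_a, labels_b):
    """Fraction of pixels assigned the same class by two classification runs;
    an unstable algorithm yields low agreement between runs."""
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)

# Hypothetical pool: 100 labeled single-band pixels.
pool = [((i / 100,), "water" if i < 50 else "forest") for i in range(100)]
training_sets = random_training_sets(pool)
print(len(training_sets), len(training_sets[0]))  # 10 50
print(classification_agreement(["water", "water", "forest"],
                               ["water", "forest", "forest"]))  # 0.666...
```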
Robustness to noise
Remotely sensed data is likely to be noisy due to signal saturation, missing scans, mislabeling, or problems with the sensor or viewing geometry.
Test methods: 1) random noise in input (introduced zero values randomly to simulate missing data); 2) mislabeling of cover type in training data (assigned one class to all training pixels derived from 3 Landsat scenes).
Bagging and Boosting appear substantially more robust than the standard C5.0 decision tree.
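The two noise tests can be simulated with small helpers: one zeroes out random band values (missing data), the other forces a fraction of training labels to a single wrong class. This is a loose sketch of the paper's procedure, which mislabeled all pixels from three specific Landsat scenes rather than a random fraction.

```python
import random

def add_random_zeros(pixels, fraction, seed=0):
    """Simulate missing scans: set a random fraction of band values to zero."""
    rng = random.Random(seed)
    return [tuple(0.0 if rng.random() < fraction else band for band in bands)
            for bands in pixels]

def mislabel(labels, wrong_class, fraction, seed=0):
    """Simulate mislabeled training data: force a random fraction of
    observations to a single (wrong) cover type."""
    rng = random.Random(seed)
    return [wrong_class if rng.random() < fraction else label
            for label in labels]

pixels = [(0.1, 0.4), (0.2, 0.3), (0.8, 0.1)]
labels = ["water", "water", "forest"]
print(add_random_zeros(pixels, fraction=1.0))    # every band value zeroed
print(mislabel(labels, "forest", fraction=0.0))  # unchanged
```

Retraining each algorithm on the corrupted data and comparing error rates against the clean baseline reproduces the shape of the robustness comparison.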
Noise: random noise
The standard C5.0 decision tree clearly has higher error rates and lower stability.
Bagging appears slightly more stable than boosting for the Landsat data.
Noise: mislabeling of cover type in training data
Causes more problems in terms of stability for the decision tree algorithms than random noise does.
The standard C5.0 decision tree is the least stable and has the highest error of all the algorithms.
Some applications of results:
These same criteria can be applied to other types of algorithms, such as:
Neural networks
Maximum likelihood
Unsupervised classification