Multiple Criteria for Evaluating Land Cover
Classification Algorithms
Summary of a paper
by R.S. DeFries and
Jonathan Cheung-Wai Chan
April, 2000
Remote Sensing of Environment
Premise of the paper:
Proposes criteria for assessing algorithms for supervised land cover classification.
For land classification analysis to be operational, more automated procedures will be required.
Land cover monitoring using remotely-sensed satellite data is becoming more common.
Larger volumes of data at higher quality are becoming more readily available.
No single machine learning algorithm has been shown superior for all situations.
Supervised classification
Training stage (define useful land cover categories with spectral response patterns from training data of known cover).
Classification stage (reassign image pixels to land cover categories based on the match with defined spectral attributes).
Output stage (develop a categorized data set as maps, tables, or GIS data files).
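The three stages can be sketched with a deliberately simple classifier. The snippet below is illustrative only: the paper uses decision trees, not this minimum-distance-to-means rule, and every pixel value and class name here is made up.

```python
import math

# Hypothetical toy data: each "pixel" is a tuple of spectral band values.
# Training stage: pixels of known cover type define each class's mean
# spectral response pattern.
training_data = {
    "forest":  [(0.05, 0.40), (0.06, 0.45), (0.04, 0.42)],
    "water":   [(0.02, 0.03), (0.03, 0.02), (0.02, 0.04)],
    "pasture": [(0.10, 0.30), (0.12, 0.28), (0.11, 0.31)],
}

def train(samples_by_class):
    """Compute the mean spectral response per land cover category."""
    means = {}
    for label, pixels in samples_by_class.items():
        n = len(pixels)
        means[label] = tuple(sum(band) / n for band in zip(*pixels))
    return means

def classify(pixel, means):
    """Classification stage: assign the pixel to the closest class mean."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))

means = train(training_data)
# Output stage: a categorized data set (here, just a list of labels).
image = [(0.05, 0.41), (0.02, 0.03), (0.11, 0.29)]
print([classify(p, means) for p in image])  # ['forest', 'water', 'pasture']
```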
Example: Supervised classification
Training data Classified image
Objectives of this study:
Compare three machine learning algorithms for supervised land cover classification, based on four criteria, using two different data sets.
Data sets
8 km AVHRR data (Advanced Very High Resolution Radiometer from NOAA)
30 m Landsat Thematic Mapper scene (from the Pucallpa, Peru area)
Note: Reliable land cover classifications had been derived for both data sets based on expert knowledge (used in place of ground measurements).
1984 AVHRR data: included 6 channels at 8 km resolution.
1996 Landsat TM scene: included 5 bands at 30 m resolution.
Approximately 9000 pixels can be overlaid on the 8 km AVHRR data.
8 km AVHRR data
To train the classifiers: Overlaid Landsat scenes on AVHRR. Each pixel was labeled as a cover type based on interpretation of the Landsat scene.
To test the classification results: Obtained a random sample of 10,000 pixels from the final classification results of a previous study (they believe their test data has a high degree of confidence).
30 m Landsat Thematic Mapper scene
To train the classifiers: Data were selected by sampling the results of a previous study (5958 pixels).
To test the classification results: Data were randomly selected as an additional 12,084 pixels (although not independently derived, they were used to illustrate the evaluation criteria).
The three algorithms compared:
1. C5.0 decision tree (standard)
2. Decision tree w/ “Bagging”
3. Decision tree w/ “Boosting”
Note: Bagging and boosting (2 & 3) are refinements of (1) that build multiple iterations of classifiers. They can be applied to any supervised classification algorithm.
What is a decision tree?
a machine learning technique (algorithm) that analyzes data, recognizes patterns, and predicts through repeated learning instances
useful when it is important for humans to understand the classification structure
successfully applied to satellite data for extraction of land cover categories
1. C5.0 decision tree
predicts classes by repeatedly partitioning a data set into homogeneous subsets
variables are used to split subsets into further subsets
the most important component is the method used to estimate splits at each “node” of the tree
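To make "estimating splits" concrete: tree learners in the C4.5/C5.0 family score candidate splits by the reduction in entropy (information gain; C5.0 actually uses the related gain ratio). Below is a minimal pure-Python sketch with invented single-band data, not the paper's implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a set of class labels (0 = perfectly homogeneous)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature, threshold):
    """Entropy reduction from splitting on `feature <= threshold`."""
    left  = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
    right = [l for r, l in zip(rows, labels) if r[feature] >  threshold]
    if not left or not right:
        return 0.0  # degenerate split: no information gained
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder

# Toy example: one spectral band separates the two classes perfectly,
# so splitting on it removes all entropy.
rows   = [(0.1,), (0.2,), (0.8,), (0.9,)]
labels = ["water", "water", "forest", "forest"]
print(information_gain(rows, labels, feature=0, threshold=0.5))  # 1.0
```

The split chosen at each node is simply the (feature, threshold) pair with the highest such score; the tree then recurses on each resulting subset.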
2. C5.0 decision tree w/“Bagging”
takes repeated samples of the training data and generates a decision tree for each sample
a final classification result is obtained by plurality vote of the individual classifiers
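A minimal sketch of the bagging idea, assuming pure Python and toy single-band data; the per-class-mean "classifier" here is just a stand-in for the C5.0 tree the paper actually uses.

```python
import random
from collections import Counter, defaultdict

def train_centroid_classifier(sample):
    """Stand-in for a decision tree: classify a pixel by the nearest
    per-class mean of band 0 (kept deliberately simple)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (band0,), label in sample:
        sums[label] += band0
        counts[label] += 1
    means = {label: sums[label] / counts[label] for label in counts}
    return lambda pixel: min(means, key=lambda lab: abs(pixel[0] - means[lab]))

def bagging(train_set, n_classifiers=10, seed=0):
    """Train one classifier per bootstrap sample; predict by plurality vote."""
    rng = random.Random(seed)
    classifiers = []
    for _ in range(n_classifiers):
        # Sample the training set with replacement (same size as original).
        sample = [rng.choice(train_set) for _ in train_set]
        classifiers.append(train_centroid_classifier(sample))
    def predict(pixel):
        votes = Counter(clf(pixel) for clf in classifiers)
        return votes.most_common(1)[0][0]  # plurality vote
    return predict

train_set = ([((x,), "water") for x in (0.05, 0.10, 0.15, 0.20)]
             + [((x,), "forest") for x in (0.70, 0.80, 0.90, 0.95)])
predict = bagging(train_set)
print(predict((0.12,)), predict((0.85,)))  # water forest
```

Averaging many classifiers trained on resampled data is what gives bagging its stability: no single noisy observation can dominate the final vote.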
3. C5.0 decision tree w/“Boosting”
the entire training set is used to generate the decision tree, with a weight assigned to each training observation
subsequent decision tree iterations focus on misclassified observations
a final classification result is obtained by plurality vote of the individual classifiers
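The reweighting loop can be sketched in the AdaBoost style (the family C5.0's boosting belongs to). Everything below is an invented illustration: the one-split "stump" stands in for a full tree, and the data and class names are made up.

```python
import math
from collections import Counter

def train_weighted_stump(train_set, weights):
    """Weighted one-split classifier on band 0 (a stand-in for a full tree)."""
    best = None
    for t in sorted({x[0] for x, _ in train_set}):
        for lo, hi in (("water", "forest"), ("forest", "water")):
            err = sum(w for w, (x, y) in zip(weights, train_set)
                      if (lo if x[0] <= t else hi) != y)
            if best is None or err < best[0]:
                best = (err, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: lo if x[0] <= t else hi

def boost(train_set, train_classifier, n_rounds=5):
    """AdaBoost-style loop: reweight misclassified observations so later
    rounds focus on them; predict by a weighted vote of all rounds."""
    n = len(train_set)
    weights = [1.0 / n] * n                 # every observation starts equal
    rounds = []                             # (classifier, vote weight) pairs
    for _ in range(n_rounds):
        clf = train_classifier(train_set, weights)
        err = sum(w for w, (x, y) in zip(weights, train_set) if clf(x) != y)
        if err == 0:                        # perfect on training data: stop
            rounds.append((clf, 1.0))
            break
        if err >= 0.5:                      # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        rounds.append((clf, alpha))
        # Up-weight misclassified, down-weight correct, then renormalize.
        weights = [w * math.exp(alpha if clf(x) != y else -alpha)
                   for w, (x, y) in zip(weights, train_set)]
        total = sum(weights)
        weights = [w / total for w in weights]
    def predict(x):
        votes = Counter()
        for clf, alpha in rounds:
            votes[clf(x)] += alpha
        return votes.most_common(1)[0][0]
    return predict

train_set = [((0.1,), "water"), ((0.2,), "water"),
             ((0.8,), "forest"), ((0.9,), "forest")]
predict = boost(train_set, train_weighted_stump)
print(predict((0.15,)), predict((0.85,)))  # water forest
```

Unlike bagging, every round sees the full training set; only the observation weights change between iterations.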
One of the most important criteria in selecting an appropriate algorithm: the degree of human interpretation and involvement in the classification process.
Example: supervised classification (need for time-intensive collection of training data) vs. unsupervised classification (no training data).
As a result:
There are always trade-offs between accuracy, computational speed, and ability to automate the classification process.
Four assessment criteria were evaluated in the study:
Classification accuracy – overall, mean class, and adjusted (accounts for unequal costs of misclassification, which will vary)
Computational resources required
Stability of the algorithms w/r/t minor variability in input data
Robustness to noise in the training data (includes random noise in input and mislabeling of cover type in training data)
Summary: Results
Accuracy is comparable between the three algorithms using the two data sets.
The Bagging and Boosting algorithms are more stable and more robust to noise in the training data.
The Bagging algorithm is the most costly, and the standard decision tree is the least costly, in terms of computational resources.
The End
Thank you for listening
Accuracy
Accuracy is one of the primary criteria for comparing algorithms in the literature.
Accuracy = % pixels correctly classified in the test set.
In this study, all three algorithms provide fairly similar accuracies (generally within 5%).
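The overall and mean-class measures can be written down directly; the "adjusted" variant would additionally weight each error by a misclassification-cost matrix, which is omitted here. All data values below are made up.

```python
def overall_accuracy(predicted, actual):
    """Percent of test pixels whose predicted class matches the reference."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

def mean_class_accuracy(predicted, actual):
    """Average of per-class accuracies, so rare classes count equally."""
    classes = set(actual)
    per_class = []
    for c in classes:
        pairs = [(p, a) for p, a in zip(predicted, actual) if a == c]
        per_class.append(sum(p == a for p, a in pairs) / len(pairs))
    return 100.0 * sum(per_class) / len(classes)

predicted = ["forest", "forest", "water", "forest", "water"]
actual    = ["forest", "forest", "water", "water", "water"]
print(overall_accuracy(predicted, actual))     # 80.0
print(mean_class_accuracy(predicted, actual))  # ~83.3
```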
Computational resources
Likely to be a key consideration in machine learning, where “amount of work done” is used as a measure of operations performed.
Standard tree: requires the fewest resources.
Bagging: number of operations increases in proportion to the number of samples used.
Boosting: number of operations increases in proportion to the number of iterations used.
Stability of algorithm
The algorithm should ideally produce stable results given minor variability in input data; otherwise, it may incorrectly indicate land cover changes when none occurred.
Variable input data can be common if training data are from the same locations.
Test method: random sampling generated 10 training sets (to approximate minor variation).
Bagging and Boosting provide more stable classification (less sensitivity to variation) than a standard decision tree.
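The stability test can be mimicked in a few lines: draw several random training sets from the same labeled pool, retrain on each, and compare the resulting classifications pixel by pixel. The agreement measure below is an invented illustration, not the paper's exact statistic.

```python
import random

def random_training_sets(pool, n_sets=10, size=50, seed=0):
    """Draw several training sets from one labeled pool to approximate
    minor variation in the input data."""
    rng = random.Random(seed)
    return [rng.sample(pool, size) for _ in range(n_sets)]

def classification_agreement(labels_a, labels_b):
    """Fraction of pixels assigned the same class by two classification runs;
    an unstable algorithm yields low agreement between runs."""
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)

# Hypothetical pool: 100 labeled single-band pixels.
pool = [((i / 100,), "water" if i < 50 else "forest") for i in range(100)]
training_sets = random_training_sets(pool)
print(len(training_sets), len(training_sets[0]))  # 10 50
print(classification_agreement(["water", "water", "forest"],
                               ["water", "forest", "forest"]))  # 0.666...
```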
Robustness to noise
Remotely sensed data is likely to be noisy due to signal saturation, missing scans, mislabeling, or problems with the sensor or viewing geometry.
Test methods: 1) random noise in input (introduced zero values randomly to simulate missing data); 2) mislabeling of cover type in training data (assigned one class to all training pixels derived from 3 Landsat scenes).
Bagging and Boosting appear substantially more robust than the standard C5.0 decision tree.
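The two noise tests can be simulated with small helpers: one zeroes out random band values (missing data), the other forces a fraction of training labels to a single wrong class. This is a loose sketch of the paper's procedure, which mislabeled all pixels from three specific Landsat scenes rather than a random fraction.

```python
import random

def add_random_zeros(pixels, fraction, seed=0):
    """Simulate missing scans: set a random fraction of band values to zero."""
    rng = random.Random(seed)
    return [tuple(0.0 if rng.random() < fraction else band for band in bands)
            for bands in pixels]

def mislabel(labels, wrong_class, fraction, seed=0):
    """Simulate mislabeled training data: force a random fraction of
    observations to a single (wrong) cover type."""
    rng = random.Random(seed)
    return [wrong_class if rng.random() < fraction else label
            for label in labels]

pixels = [(0.1, 0.4), (0.2, 0.3), (0.8, 0.1)]
labels = ["water", "water", "forest"]
print(add_random_zeros(pixels, fraction=1.0))    # every band value zeroed
print(mislabel(labels, "forest", fraction=0.0))  # unchanged
```

Retraining each algorithm on the corrupted data and comparing error rates against the clean baseline reproduces the shape of the robustness comparison.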
Noise: random noise
The standard C5.0 decision tree clearly has higher error rates and lower stability.
Bagging appears slightly more stable than boosting for the Landsat data.
Noise: mislabeling of cover type in training data
Causes more problems in terms of stability for the decision tree algorithms than random noise does.
The standard C5.0 decision tree is the least stable and has the highest error of all the algorithms.
Some applications of results:
These same criteria can be applied to other types of algorithms, such as:
Neural networks
Maximum likelihood
Unsupervised classification