Master’s thesis
Physical Geography and Quaternary Geology, 60 Credits
Department of Physical Geography
Evaluating Multitemporal Sentinel-2 data for Forest
Mapping using Random Forest
Marc Nelson
NKA 189
2017
Preface
This Master’s thesis is Marc Nelson’s degree project in Physical Geography and Quaternary
Geology at the Department of Physical Geography, Stockholm University. The Master’s
thesis comprises 60 credits (two terms of full-time studies).
Cooperation with Metria.
Supervisors have been Helle Skånes and Marika Wennbom at the Department of Physical
Geography, Stockholm University. Examiner has been Ian Brown at the Department of
Physical Geography, Stockholm University.
The author is responsible for the contents of this thesis.
Stockholm, 4 September 2017
Steffen Holzkämper
Director of studies
Abstract
The mapping of land cover using remotely sensed data is most effective when a robust
classification method is employed. Random forest is a modern machine learning algorithm
that has recently gained interest in the field of remote sensing due to its non-parametric
nature, which may be better suited to handle complex, high-dimensional data than
conventional techniques. In this study, the random forest method is applied to remote
sensing data from the European Space Agency’s new Sentinel-2 satellite program, which was
launched in 2015 yet remains relatively untested in scientific literature using non-simulated
data. In a study site of boreo-nemoral forest in Ekerö municipality, Sweden, a classification is
performed for six forest classes based on CadasterENV Sweden, a multi-purpose land cover
mapping and change monitoring program. The performance of Sentinel-2’s Multi-Spectral
Imager is investigated in the context of time series to capture phenological conditions,
optimal band combinations, as well as the influence of sample size and ancillary inputs.
Using two images from spring and summer of 2016, an overall map accuracy of 86.0% was
achieved. The red edge, shortwave infrared, and visible red bands were confirmed to be of
high value. Important factors contributing to the result include the timing of image
acquisition, use of a feature reduction approach to decrease the correlation between spectral
channels, and the addition of ancillary data that combines topographic and edaphic
information. The results suggest that random forest is an effective classification technique
that is particularly well suited to high-dimensional remote sensing data.
Table of contents
Abstract
Table of contents
List of Tables and figures
1. Introduction
2. Background
2.1 CadasterENV and reference data
2.2 Machine learning and Random Forest classification
2.3 Accuracy Assessment
3. Methods
3.1 Study area
3.2 Satellite and ancillary data
3.3 Implementation of random forest method
3.4 Multitemporal imagery
3.5 Ancillary data
3.6 Sample size
3.7 Optimal band combinations and variable importance
3.8 Optimal random forest parameters
3.9 Validation
4. Results
4.1 Multitemporal imagery
4.2 The influence of ancillary data on random forest classification
4.3 The influence of sample size on classification accuracy
4.4 Band combinations and variable importance
4.5 Optimal random forest parameters
4.6 Final result: random forest classification map
5. Discussion
5.1 Multitemporal imagery
5.2 Ancillary data
5.3 Sample size
5.4 Band combinations and variable importance
5.5 Optimal random forest parameters
6. Conclusion
7. References
8. Appendix
List of Tables and figures
Table 1: CadasterENV forest class definitions.
Table 2: Correlation coefficients matrix for selected Sentinel-2 bands, July 21 image.
Table 3: Confusion matrix for random forest classification.
Table A1: Data inputs.
Figure 1: Sentinel-2 MSI bands vs. spatial resolution.
Figure 2: Visualization of random forest classification.
Figure 3: (a) Location of Sentinel-2 tile 33VXF (outlined in red), (b) location of Ekerö municipality (outlined in green).
Figure 4: Mean spectral signatures (TOA reflectance) of CadasterENV classes for selected Sentinel-2 bands derived from reference data.
Figure 5: Random forest method flowchart.
Figure 6: Overall accuracies for various Sentinel-2 scene image combinations.
Figure 7: Sample size vs overall accuracy, where training samples are randomly removed from the model.
Figure 8: CadasterENV map produced using random forest classification, Ekerö municipality.
Figure 9: (a) Example portion of CadasterENV map produced using random forest classification, (b) 2015 CIR orthophoto for comparison.
Figure 10: Class distribution (% forest area) among forested areas in Ekerö municipality.
1. Introduction
Land cover mapping plays an essential role in various applications relating to land
management and conservation, particularly with regard to predicting the geographic
distribution and biophysical dynamics of natural and agricultural areas, providing vital
information to a wide range of applied research (Ban et al., 2015; Baret and Buis, 2008; Foody,
2002). Remote sensing methods have become integral in the field of Landscape Ecology,
where environmental patterns influence ecological processes and shape the interactions
between organisms and their environment. In this context, landscape can be defined as an
area of land consisting of a mosaic of different habitats, with interactions between Earth’s
surface/atmosphere and the organisms that inhabit it, the scale of which is dependent on the
target organism(s) of interest (Turner, 1989). Earth observation missions via remote sensing
provide access to spatial and temporal realms of landscape observation unavailable to
researchers restricted to field investigation.
Machine learning techniques have become increasingly popular for a range of applications
relating to the classification of data across the scientific community and notably within a
remote sensing context, due in part to the evolving nature of satellite datasets, which have
become progressively larger and more complex over time. A wide variety of classification
methods are available, and the determination of which algorithms are best-suited to specific
applications has been of great interest to researchers (Elder and Lee, 1997; Michie et al., 1994).
Recently, there has been a focus on aggregating classification models into “ensembles”, due
to their ability to further improve the accuracy of well-established classification algorithms.
The ensemble phenomenon was noticed independently by several researchers who were
investigating methods ranging from decision trees (Ho et al., 1990) and mathematical theory
(Kleinberg, 1990) to neural networks (Hansen and Salamon, 1990). Ensembles were further
developed via the bootstrap aggregation technique by Breiman (1996) and the adaptive
boosting methods of Freund and Schapire (1996). These techniques were soon adopted by the
remote sensing community for use in classifying land cover using satellite imagery (Pal,
2005; Gislason, Benediktsson, and Sveinsson, 2006).
The European Space Agency’s Sentinel-2 satellites were designed to improve upon the
technology and experience drawn from the SPOT and Landsat missions of previous decades
(Drusch et al., 2012). The Sentinel-2 mission comprises a pair of polar orbiting satellites
carrying a Multi-Spectral Imager (MSI) sensor with a spatial resolution of 10 meters in the
visible and near infrared bands, 20 meters in 3 Red-edge bands and 2 shortwave infrared
bands, and 60 meters in 3 atmospheric bands (Figure 1). While not a hyperspectral system,
Sentinel-2’s innovative sensors allow for more spectral continuity than preceding multi-
spectral systems aboard the Landsat 8 and SPOT6/7 satellites, earning it the informal
designation as a “super-spectral” imager (Verrelst et al., 2015). Particularly promising are the
red edge bands, due to their capacity to detect slight differences in chlorophyll content,
making Sentinel-2 theoretically well-suited for forestry applications (Horler, Dockray, and
Barber, 1983; Laurin et al., 2016).
Figure 1. Sentinel-2 MSI bands vs. spatial resolution. The aerosol, water vapor, and cirrus bands (B1, B9, B10) were not used in this study (ESA, 2017).
Other key features of the Sentinel-2 mission include the MSI’s wide swath (290 km) and its
low revisit time. Sentinel-2A was launched in June 2015 and has been providing images of
Earth’s surface, on a global scale, at 10-day intervals at the equator. Sentinel-2B was launched
into the same orbit, phased 180° apart from Sentinel-2A, on March 7, 2017, bringing the
temporal resolution of the Sentinel-2 mission down to 5 days at the equator, and 2-3 days at
mid-latitudes (ESA, 2017). The high temporal resolution enables unprecedented detail in the
monitoring of vegetation phenology and biotic/abiotic land surface changes for a freely-
available, non-commercial system (Banskota et al., 2014; Wulder and Coops, 2014). To the
best knowledge of the author, only one study evaluating Sentinel-2’s spectral bands in the
context of forest classification using non-simulated data currently exists, the work of
Immitzer, Vuolo, and Atzberger (2016).
The classification of heterogeneous landscapes with low inter-class and high intra-class
spectral variability poses a challenge in remote sensing research (Ghimire et al., 2010).
Several studies have shown that the use of multi-seasonal acquisitions can help increase
spectral separation between land cover types, though the datasets have a higher degree of
complexity (Lunetta and Balogh, 1999; Oetter et al., 2001; Waske and Braun, 2009; Wolter et
al., 1995; Yuan et al., 2005). With such high-dimensional, multitemporal satellite imagery,
machine learning algorithms have demonstrated superior accuracy and efficiency in the
classification of data over conventional parametric techniques such as the widely-used
maximum likelihood classification method (Hansen et al., 1996; Huang et al., 2002; Rogan et
al., 2003).
A simple, robust classification method that is able to handle noise, complex measurement
spaces, and a limited number of reference samples in relation to the size of the study area
would be of great value to the field of remote sensing (DeFries and Chan, 2000; Rogan et al.,
2008). As the Sentinel-2 mission is relatively new and thus the available literature on the
performance of its spectral channels in the context of forest classification is limited, this study
seeks to contribute to the state-of-the-art by investigating the following research questions:
(1) how can Sentinel-2 be best utilized for future land cover classification studies, at a
regional to national scale? (2) How do band combinations, time series, ancillary data, and
sample size affect the map accuracy, and how can these variables be adjusted to produce an
optimal result? (3) How effective are machine learning techniques in the context of remote
sensing?
2. Background
2.1 CadasterENV and reference data
CadasterENV Sweden is a multi-purpose land cover mapping and change monitoring
program with the goal of building a homogenous, nationwide land cover database. Funded
by the European Space Agency, the primary users include the Swedish Environmental
Protection Agency, Swedish Forest Agency, County Administration Boards of Sweden, the
Board of Agriculture, Statistics Sweden, and the Cadaster and Land Registration Authority.
It is currently under development, with preliminary versions of 3 different test plots based
on SPOT-5 and Pleiades satellite data combined with ancillary information including LiDAR-
derived metrics, with classifications performed using the Maximum Likelihood classification
method (MLC) in ERDAS IMAGINE software. CadasterENV was designed to maintain full
compatibility regarding thematic details with the older KNAS (Kontinuerlig
Naturtypskartering Av Skyddade områden) land cover database, which has been used since
2002 to communicate Sweden’s land cover statistics both domestically and internationally.
The most recent version of CadasterENV uses the Maximum Likelihood classification
technique with new Sentinel-2 imagery.
Descriptions of the CadasterENV thematic classes are shown in Table 1. Due to a lack of
distinct samples in the reference data and its similarity to class 1.1.6, the deciduous forest
with hardwood class (1.1.7) was grouped with the deciduous hardwood forest class, forming a
single class for deciduous forest containing hardwoods, labeled 1.1.6 for simplicity.
Table 1. CadasterENV forest class definitions. Class 1.1.7 was grouped with 1.1.6 for analysis in this study.
Code    Name                                  Definition
1.1.1   Pine forest                           >70% CC = pine
1.1.2   Spruce forest                         >70% CC = spruce
1.1.3   Mixed coniferous forest               >70% CC = pine or spruce, neither having >70% CC
1.1.4   Mixed coniferous / deciduous forest   Neither coniferous nor deciduous forest has >70% CC
1.1.5   Deciduous forest                      >70% CC = deciduous (non-hardwood)
1.1.6   Deciduous hardwood forest             >70% CC = deciduous, where >50% = hardwood
1.1.7   Deciduous forest with hardwood        >70% CC = deciduous, where 20%-50% = hardwood
The forest classes defined for the CadasterENV product relate to the user requirements of the
various national agencies and institutions that intend to use the map product for their
specific operational needs, particularly the Swedish Environmental Protection Agency.
CadasterENV’s mixture of both single-species classes (e.g. Pine forest) with classes that can
contain multiple species (e.g. Deciduous forest) represents a departure from common
practices regarding the categorization of land cover components (Anderson, 1976; Fassnacht
et al., 2016). Fassnacht et al. (2016) state that any species-level classification should by
definition be for single trees or stands consisting of only a single species, and that approaches
focusing on larger spatial units that do not allow for the separation of individual trees are
only appropriate for classifying whole forest stands or assessing species mixtures. This is
the case when using Sentinel-2’s resampled 10 meter pixel resolution to map Swedish forest
types: the areal extent of tree crowns of the primary species (apart from oak) does not
typically reach 10 m2, so the imagery can be considered inappropriate for species-level
classification at the individual tree scale. Furthermore, pixel-based classification of forest groups
containing mixed species, where the spatial resolution of the sensor is too low to distinguish
individual tree crowns, is an approach that has been avoided due to interactions between
background signature and intra-species variability. These interactions have been shown to obfuscate
classifications of stands comprising a single species, especially when the geographic extent
involves small scales over large areas. It is in these cases where the co-occurrence of same-
species stands but with dissimilar canopy closure and background signal can introduce
classification problems due to the reduction of statistical separability of different classes in
the spectral space (Carleer and Wolff, 2004; Cushnie, 1987).
It is important to note that CadasterENV is affected by this problem of classifying mixed
classes at the pixel level, but it is ultimately the end user who decides the data structure that
is best suited for their analytical or operational needs (Fassnacht et al., 2016; Foody, 2002).
This demonstrates how the remote sensing perspective, delineating forest types in terms of
percentages of species-specific crown cover present in the upper canopy layer, often doesn’t
align with standard definitions in traditional applications where forest stand composition
and species mixture is often characterized as species-specific basal area or standing timber
volume measurements (Fassnacht et al., 2016). Since most CadasterENV applications involve
the sustainable management of forest resources, it follows that thematic classes may reflect
this delineation bias.
2.2 Machine learning and Random Forest classification
Machine learning is a term used to describe a system of automatic learning via the
generalization of information. Machine learning algorithms have emerged as a superior
method for classification of complex data, with applications ranging from large scale
association studies for genetic diseases to air quality prediction to growth models for
agricultural crop yield estimation (Scornet, Biau, and Vert, 2015). They are particularly well
suited to models with high data dimensionality, such as multitemporal spectral data in
remote sensing studies, due to their non-parametric nature (Hansen et al., 1996; Huang et al.,
2002; Rogan et al., 2003). Non-parametric classifiers differ from conventional parametric
classification methods such as the widely-used maximum likelihood classification (MLC) in
that they do not rely on assumptions about the data distribution (e.g. normality, which is
characterized by mean and variance), and are generally more accurate than parametric
techniques (Foody, 1995; Friedl and Brodley, 1997). Breiman (2001) notes that data in
ecological models often do not conform to assumptions of independence, homoscedasticity,
and multivariate normality. Another drawback of parametric classifiers such as MLC is their
sensitivity to the Hughes phenomenon, where with a fixed number of training samples, predictive
power decreases as the number of input variables increases (Dalponte et al., 2013; Hughes, 1968).
Waske and Braun (2009) suggested that due to the relative independence of the spectral
content of satellite acquisitions from different dates, a method based on random selection of
input features is well-suited to analyses employing multitemporal imagery. Immitzer et al.
(2012) and Pant et al. (2013) have also confirmed that a method which employs random
subset selection of the data performs well when using mixed sets of inputs to classify data, as
is the case with multitemporal satellite imagery.
Random forest is a type of non-parametric machine learning algorithm. This method has
been selected to perform a pixel-based classification for this study, due to its combination of
ease-of-use, robustness to noise, as well as its demonstrated performance in past remote
sensing studies (Pal, 2005; Rodriguez-Galiano et al., 2012). Additionally, random forest is
computationally light as well as being simple to set up and automate compared to other non-
parametric classifiers that have produced highly accurate results in land cover mapping,
such as Support Vector Machine (Atkinson and Tatnall, 1997; Fassnacht et al., 2016). Carreiras
et al. (2006) found that a decision tree based classifier ensemble for mapping agricultural and
pasture land using multitemporal SPOT4 imagery significantly outperformed several other
common approaches (MLC, k-nearest neighbor, and simple decision tree) when using the
same study area and reference data. Furthermore, the availability of open-source
implementations in environments such as R and Python has facilitated the proliferation of non-parametric
machine learning techniques.
One of the key advantages of random forest is its ability to exploit the strengths of a
group of individual classifiers while avoiding the weaknesses of any single classifier
(Ghimire et al., 2010; Kotsiantis and Pintelas, 2004). It does not classify data with the
assumption of normality, allowing more complex patterns and relationships to be identified.
A “forest” of binary decision trees is grown via bootstrap aggregation (also referred to as
“bagging”) of samples from an original dataset. At each node of a tree, a random subset of
input variables is chosen as split candidates, and the best split is calculated within this
subset based on the Gini criterion, a measure of impurity (Breiman, 2001) (Figure 2a, 2b). The
random subset selection reduces the strength of any individual tree, but also decreases the
degree to which the trees are correlated. Because bootstrap samples are drawn with
replacement, some training data may not be used at all in a given tree, while others may be
used multiple times. As a result, greater classifier accuracy and stability are achieved, as
the model remains robust to slight variations in the input data when its prediction is
aggregated over hundreds or thousands of votes (Breiman, 2001; Rodriguez-Galiano et al.,
2012).
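As a minimal illustration of the Gini criterion mentioned above, the following Python sketch computes the impurity of a set of class labels and the size-weighted impurity of a candidate split. The function names and the toy labels are illustrative assumptions and are not taken from the implementation used in this study.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Toy node with two hypothetical classes (not data from this study).
parent = ["pine", "pine", "spruce", "spruce"]
pure_split = split_impurity(["pine", "pine"], ["spruce", "spruce"])
mixed_split = split_impurity(["pine", "spruce"], ["pine", "spruce"])
```

A split that separates the two classes completely lowers the weighted impurity to zero, whereas a split that leaves both children evenly mixed does not improve on the parent node. A tree would prefer the former among the candidate variables available at that node.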
The classification of new data occurs by taking the majority vote among the outcomes of all
decision trees constructed in the forest (Genuer, Poggi, and Tuleau-Malot, 2010; Immitzer et al.,
2012) (Figure 2c). Similar studies have confirmed that random forest’s bootstrap aggregation
technique makes classification models less sensitive to noise, which may be defined as
errors in the data that are unrelated to target reflectance (Chan and Paelinckx, 2008; Pal
and Mather, 2003; Rodriguez-Galiano et al., 2012). Bagging also reduces sensitivity to
overfitting, where a model is tailored so closely to a specific dataset that it loses its
ability to accurately generalize the underlying trends to unseen data.
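The two mechanisms described here, drawing a bootstrap sample and combining tree outputs by majority vote, can be sketched in a few lines of Python. This is a simplified illustration only: the helper names are hypothetical, the votes are made up, and no actual decision trees are trained.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (one bagging iteration)."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
sample = bootstrap_sample(list(range(10)), rng)

# Hypothetical votes from five trees for a single pixel.
votes = ["spruce", "pine", "spruce", "mixed coniferous", "spruce"]
winner = majority_vote(votes)
```

Because sampling is done with replacement, a bootstrap sample has the same size as the original dataset but typically contains repeated items and omits others.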
Figure 2. Visualization of random forest classification. (a) Reference data (v) is labeled according to in-situ species composition. (b) This data set is fitted to a random forest model in the training step where a ‘forest’ of binary classification trees of a random subset of input variables is ‘grown’. (c) The forest is then used in the prediction of new data (Criminisi et al., 2011).
Breiman’s seminal paper on the random forest algorithm describes its practical
advantages: it runs efficiently on large databases, handles a large number of input
variables, and provides an internal unbiased estimate of the generalization error and of input
variable importance (Breiman, 2001). To estimate classification error, approximately 37% of
the original dataset is left out of the bootstrap sample at each bootstrap iteration; these
samples are referred to as the out-of-bag (OOB) data. Each decision tree is applied to its
own set of OOB samples, and the predictions are compared to the true class labels to generate
an unbiased approximation of the model’s generalization error. Theoretically, this feature
negates the need for independent validation (Breiman, 2001; Lawrence et al., 2006; Prasad et al., 2006).
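The figure of approximately 37% follows from sampling with replacement: the probability that a given sample is never drawn in n draws is (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The Python sketch below checks this both analytically and by simulation. The function names are illustrative, and the value 663 is simply the number of field samples collected for this study (Section 3.3), used here as an example size.

```python
import math
import random


def oob_fraction_expected(n):
    """Probability that a given sample is missed by n draws with replacement."""
    return (1.0 - 1.0 / n) ** n


def oob_fraction_simulated(n, rng):
    """Fraction of n samples left out of one bootstrap sample of size n."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1.0 - len(drawn) / n


rng = random.Random(42)
expected = oob_fraction_expected(663)           # example size: 663 field samples
simulated = oob_fraction_simulated(10000, rng)  # one simulated bootstrap draw
limit = math.exp(-1)                            # limiting value, about 0.368
```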
The OOB samples are also used, through random permutation of the input variables, to
calculate the mean decrease in accuracy (MDA) of a variable, or the degree to which the accuracy
of the model decreases when the values of a particular variable are randomly permuted
(Breiman, 2001; Immitzer et al., 2012). As a result, the importance of a model’s input
variables can be ranked. This variable importance metric is calculated as the difference
between the misclassification rate of the randomly permuted OOB data and that of the original
OOB data for each variable, averaged over all trees and divided by the standard error. This feature
is particularly valuable in a remote sensing context due to the common research question of
identifying which satellite bands contribute most to successful classification, and to what
degree (Fassnacht et al., 2016). For these reasons, the random forest approach appears well-
suited to handle the influx of new remote sensing data, the size and dimensionality of which
continue to increase over time.
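The permutation idea behind the mean decrease in accuracy can be sketched as follows. This is a deliberately simplified illustration, not the random forest procedure itself: it uses a single hypothetical stand-in classifier rather than per-tree OOB samples, and it omits the averaging over trees and the division by the standard error. All names and data are made up.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model predicts the reference label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)


def permutation_importance(model, rows, labels, n_features, rng):
    """Decrease in accuracy after shuffling one feature column at a time."""
    baseline = accuracy(model, rows, labels)
    decreases = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        decreases.append(baseline - accuracy(model, permuted, labels))
    return decreases


# Hypothetical stand-in for a fitted classifier: it only looks at feature 0.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0


rng = random.Random(1)
rows = [[i / 20, rng.random()] for i in range(20)]  # feature 1 is pure noise
labels = [toy_model(row) for row in rows]
importance = permutation_importance(toy_model, rows, labels, 2, rng)
```

Shuffling the noise feature leaves the accuracy unchanged, so its importance is zero, while shuffling the informative feature can only lower the accuracy. Ranking variables by this decrease gives the kind of band ranking discussed above.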
2.3 Accuracy Assessment
One of the most useful facets of the random forest algorithm is its ability to generate an
internal, unbiased estimate of classification error from the OOB dataset (Breiman, 2001).
While this is an attractive feature for purposes of estimation and testing model sensitivity,
most of the literature pertaining to random forest classification does not rely on it exclusively
to report accuracy levels, with some exceptions (Naidoo et al., 2012). Indeed, Evans et al.
(2011) specifically recommend the inclusion of cross-validation in random forest studies,
even while acknowledging random forest’s general ability to produce reliable accuracy
estimates without it.
The kappa statistic has long been used as a measure of map agreement for classification
accuracy assessment. In recent years, however, researchers have demonstrated kappa’s
tendency to produce redundant or misleading information in certain circumstances. After
reviewing the findings of Pontius Jr and Millones (2011), it was decided that the kappa
statistic would be avoided in favor of cross validation and its derivatives: overall, user’s, and
producer’s accuracy. Overall accuracy is simply the percentage of samples correctly
classified from the entire sample dataset. Overall accuracy does not, however, reveal how
error is distributed between classes. User’s and producer’s accuracy represent
individual category accuracies, and are obtained by comparing the predicted data with the
field reference points. The difference between them lies in perspective: producer’s accuracy
tells the map producer how well the reference samples of a class were classified on the map,
while user’s accuracy tells the map user how likely it is that a class shown on the map is
actually present on the ground (Congalton and Green, 1999).
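The three measures can be derived from a confusion matrix as in the following Python sketch. The layout (rows as mapped classes, columns as reference classes) is an assumed convention, and the counts are invented for illustration; they are not results from this study.

```python
def accuracy_metrics(matrix):
    """Overall, user's and producer's accuracy from a square confusion matrix.

    Assumed layout: rows = mapped (predicted) class, columns = reference class.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(k)) / total
    # User's accuracy: correct pixels divided by all pixels mapped to the class.
    users = [matrix[i][i] / sum(matrix[i]) for i in range(k)]
    # Producer's accuracy: correct pixels divided by all reference pixels of the class.
    producers = [matrix[i][i] / sum(matrix[r][i] for r in range(k)) for i in range(k)]
    return overall, users, producers


# Made-up counts for two classes, not results from this study.
toy = [[40, 10],
       [5, 45]]
overall, users, producers = accuracy_metrics(toy)
```

For this toy matrix the overall accuracy is 85%, while the per-class values differ (user's accuracy of 80% and 90%, producer's accuracy of about 89% and 82%), which shows how the overall figure can hide the distribution of error between classes.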
3. Methods
3.1 Study area
The study area for this project is the municipality of Ekerö, a group of islands in lake
Mälaren in southeast Sweden (Figure 3). With a land surface area of approximately 218 km2,
Ekerö consists of rural landscape dominated by boreo-nemoral forest, with elevations
between 0 and 82 meters above sea level (KSLA, 2015). The average annual temperature is
approximately 7.6 °C, and the mean annual precipitation is 531 mm (SMHI, 2017). Primary
tree species include Scots pine (Pinus sylvestris), Norway spruce (Picea abies), pedunculate
oak (Quercus robur), European aspen (Populus tremula), black alder (Alnus glutinosa), and
birch species (Betula sp.). Other species include Norway maple (Acer platanoides), European
ash (Fraxinus excelsior), small-leaved lime / Linden (Tilia cordata), and willow species (Salix
sp.).
Figure 3. (a) Location of Sentinel-2 tile 33VXF (outlined in red), (b) location of Ekerö municipality (outlined in green).
3.2 Satellite and ancillary data
The use of multitemporal imagery in the classification of land cover, which is characterized
not only by spatial patterns but also its temporal dynamics, has been shown to increase
spectral separability between land cover classes as it represents the phenological vegetation
condition (Lunetta and Balogh, 1999; Oetter et al., 2001; Wolter et al., 1995; Waske and Braun,
2009). A combination of acquisitions at different times of the year is preferred because it
captures seasonal variations relevant to the discrimination of surface types (Brisco and Brown,
1995; Pax-Lenney et al., 1996; Pax-Lenney and Woodcock, 1997; Schriever and Congalton,
1995).
A total of three Sentinel-2 images of the Stockholm/Ekerö region from 2016 were included for
use in this study (see Appendix). The dates May 2, July 21, and August 28 were selected as
they 1) contain negligible cloud and/or haze cover, and 2) occur as close to the vegetation
period (after leaf foliation and before defoliation of principal broadleaved tree species) as
possible. It should be noted that the May 2 image is likely a bit too early in the season to be
an ideal springtime acquisition, with some of the principal tree species in Ekerö likely to not
yet have undergone leaf setting. Nonetheless, it was the best springtime scene available due
to heavy cloud/haze cover in the rest of the May – early June images. Outside of the
vegetation season, leaf defoliation causes the spectral information to relate more to the
understory vegetation layer than the canopy layer for broadleaved species (Jensen et al.,
2012).
The images were accessed in the Level 2A processing designation, which includes
radiometric and geometric corrections with ortho-rectification and spatial registration on a
global reference system with sub-pixel accuracy. Atmospheric correction is performed using
ESA’s S2AC algorithm, based on Atmospheric/Topographic Correction for Satellite Imagery
(ATCOR; Richter and Schlaepfer, 2011), which employs the LIBRADTRAN radiative transfer
model (Mayer and Kylling, 2005). Sentinel-2’s Level 2A product provides Bottom Of
Atmosphere (BOA) reflectance, which is derived from the associated Level 1C product and
processed using ESA’s Sentinel-2 Toolbox. Resampling is performed with a constant Ground
Sampling Distance of 10, 20, and 60 m depending on the spatial resolution of the different
spectral bands (ESA, 2017). All bands are resampled to 10 meter spatial resolution for use
within this project.
Figure 4. Mean spectral signatures (BOA reflectance) of CadasterENV classes for selected Sentinel-2 bands derived from reference data. (a) Average mean values extracted from May 2 image, (b) from July 21 image.
Following Franklin (1998) and Rodriguez-Galiano et al. (2012), topographic variables
including the Digital Elevation Model (DEM) derivatives slope, aspect, and curvature at 2
meter spatial resolution were used as ancillary data (see Appendix). In addition, a modified
Topographic Wetness Index, a relative measure of moisture status (see Appendix) was
included (Buchanan et al., 2014). The modification comes in the form of being a combination
of two rasters with different weights: a Depth-to-water (70% weight) raster containing soil
data, resampled from 2 to 10 meter spatial resolution, and a standard Topographic Wetness
Index (TWI), derived from a DEM at 10 meter resolution (30% weight). Taken together, the
combined product called Soil Topographic Index (STI) provides information on both
topographic and edaphic conditions, a proxy for soil transmissivity. Compared to a standard
TWI, the STI incorporates the role of soil conductivity, allowing for better predictions of
potentially saturated areas where soils are not uniform (Sivapalan and Wood, 1987).
Buchanan et al. (2014) found that a weighted modified STI correlated better with observed
soil moisture patterns than TWI (based solely on topography) alone.
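The weighted combination described above can be sketched as a cell-wise weighted sum of two co-registered grids. The thesis states only the weights (70% Depth-to-water, 30% TWI); the min-max rescaling step, the function names, and the toy values below are assumptions made for illustration, and the sketch does not address how the two inputs were oriented or scaled in the actual product.

```python
def rescale(grid):
    """Min-max rescale a 2D grid to the range 0..1 (an assumed preprocessing step)."""
    values = [v for row in grid for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in grid]


def weighted_index(dtw, twi, w_dtw=0.7, w_twi=0.3):
    """Cell-wise weighted sum of two co-registered grids of equal shape."""
    return [[w_dtw * a + w_twi * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(dtw, twi)]


# Made-up 2 x 2 grids standing in for the 10 m rasters.
dtw = rescale([[0.0, 2.0], [4.0, 1.0]])
twi = rescale([[5.0, 9.0], [7.0, 5.0]])
sti = weighted_index(dtw, twi)
```

In practice both rasters would first need to share the same grid, which is why the Depth-to-water raster was resampled from 2 to 10 meter resolution before being combined.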
3.3 Implementation of random forest method
The random forest process as implemented in this study is shown in Figure 5. Three Sentinel-2
scenes of the Ekerö/Stockholm region (Sentinel-2 tile 33VXF) from May 2, July 21,
and August 28, 2016, were downloaded at processing level 2A. Spectral
information and ancillary data were extracted to field reference points, which had been
assigned a CadasterENV class and visually confirmed with an orthophoto. A random forest
model was then fitted to this data, and the OOB error estimate was used to iteratively
determine optimal input parameters in terms of band combinations, time series, ancillary
data, ntree/mtry parameters, and sample size. The fitted model with the lowest estimated
OOB error was used to predict a raster, the accuracy of which was verified via 10-fold cross
validation. This raster may be compared with an existing map product and/or used to determine
where to focus future field data collection, though this step was not undertaken in this
study due to the uncertainty of the existing CadasterENV product’s accuracy.
Reference data was collected by the municipality of Ekerö during the summer of 2016
(unpublished report). A total of 663 samples were collected based on a random stratified
sampling strategy of land cover types from an existing biotope database (see Appendix).
Each of these points was assigned a CadasterENV class based on the description of the site
species composition. The desired number of samples per class was then chosen following
previous studies employing the random forest algorithm, in which the training data contained
an equal number of samples per class to avoid feature bias (Guo et al., 2011; Rodriguez-Galiano
et al., 2012). Chen et al. (2004) reported that disproportionate class representation in the
training data of a random forest classifier resulted in biased classifications: the majority
class (the class with the greatest number of training samples) is over-represented in the
bootstrap samples, causing the minority class to be under-represented.
Figure 5. Flowchart, random forest method as implemented in this study. Optional further steps shown in grey (not performed in this study).
To avoid this problem, supplementary training data was added via interpretation of 2D aerial
photography, an approach successfully employed for random forest land cover classification
by Immitzer et al. (2012). A color infrared orthophoto from 2015 with 0.25 m spatial resolution,
provided by Lantmäteriet (Swedish Land Survey), was visually interpreted on screen in
ArcMap software version 10.3.1 to confirm the integrity of the existing reference data. The
original reference dataset was determined to contain 2 classes that had over 120 records, so a
sample upscaling method was chosen to balance the dataset. Supplementary records were
added using a random stratified approach based on the current version of the CadasterENV
product, resulting in a balanced training set of 720 samples, or 120 samples per class.
This more than covers the general recommendation of having a minimum of 20 to 100
samples per class by Congalton and Green (2008) to account for intra-class spectral
variability. To determine the effect of sample size on overall map accuracy, an additional 30
samples were added to each class for a sample size accuracy assessment, though these extra
samples were not included for use in producing the final classification map.
3.4 Multitemporal imagery
To examine the effect of using multiple acquisition scenes, a random forest model was fitted
to 7 different time series combinations, using all available bands: May 2 single image, July 21
single image, August 28 single image, May 2 and July 21 combination, May 2 and August 28
combination, July 21 and August 28 combination, and May 2, July 21, and August 28
combination. The results were verified via 10-fold cross validation, and the resulting overall
accuracies were compared to determine the optimal time series combination.
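The seven time series combinations listed above are simply all non-empty subsets of the three acquisition dates. As a small illustrative sketch (the variable names are hypothetical), they can be enumerated as follows:

```python
from itertools import combinations

scenes = ["May 2", "July 21", "August 28"]

# All non-empty subsets of the three scenes: 3 singles, 3 pairs, 1 triple.
scene_combinations = [
    combo
    for size in range(1, len(scenes) + 1)
    for combo in combinations(scenes, size)
]

for combo in scene_combinations:
    print(" / ".join(combo))
```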
3.5 Ancillary data
To test the potential impact of adding ancillary data, random forest models were fitted with
common topographical metrics, including slope, aspect, and curvature. In addition, a
modified Soil Topographic Index (STI) was tested. The effect of these inputs on estimated
OOB error was evaluated with both single-date spectral data and various band
combinations of multitemporal data. The results were verified using 10-fold cross validation,
and the resulting overall accuracies of the maps produced were used to determine which, if
any, ancillary data was useful to the model.
3.6 Sample size
As the collection of field data requires time and human labor, but is nonetheless essential in
validating the results of remote sensing studies, it is of great importance to determine the
number of samples needed to produce a desired classification result (Lippitt et al., 2008; Pal
and Mather, 2003; Rogan et al., 2003, 2008). To examine the effect of sample size on
classification accuracy, samples were randomly removed from the dataset and set aside for
testing classification accuracy. The following sample numbers per class were tested: 150
(baseline), 100, 50, 25, and 10. Because the samples randomly removed were set aside for
testing, 100 training samples meant 50 testing samples, 50 training samples meant 100 testing
samples, 25 training samples meant 125 testing samples, and 10 training samples meant 140
testing samples. The overall accuracies of maps produced from models fitted with the
various sample size amounts were then compared to one another.
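The relationship between training and testing samples per class described above can be sketched as a random split of the 150 baseline samples. The function name, the sample identifiers, and the fixed random seed are assumptions made for illustration only.

```python
import random

BASELINE_PER_CLASS = 150
training_sizes = [150, 100, 50, 25, 10]


def split_class_samples(sample_ids, n_train, seed=0):
    """Randomly keep n_train samples for training; the rest are set aside for testing."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


one_class = list(range(BASELINE_PER_CLASS))  # hypothetical sample IDs for one class
split_sizes = {}
for n_train in training_sizes:
    train, test = split_class_samples(one_class, n_train)
    split_sizes[n_train] = len(test)  # number of testing samples per class
```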
3.7 Optimal band combinations and variable importance
In classification analyses, the determination of variable importance plays a critical role in the
interpretation of data and in understanding the underlying phenomena that influence the
classifier (Strobl et al., 2007). Feature reduction approaches are commonly used in order to
determine the best predictors of hyper-dimensional feature spaces (Fassnacht et al., 2016).
Furthermore, they have been shown to improve classification accuracies in a number of
studies (Clark and Roberts, 2012). There are two types of feature reduction techniques:
feature selection and feature extraction. Feature extraction procedures, such as the widely
used principal component analysis (PCA), generate a new, reduced set of bands in which the
information content is refined to minimize correlation (Singh, 1989). In contrast, feature
selection techniques identify a subset of the original variables, which allow for the
interpretation of the importance of selected predictors. Although Fassnacht et al. (2014)
found that feature selection approaches are less efficient in improving classification accuracy,
this study aims to identify the most useful inputs among the original predictors, to assess the
performance of Sentinel-2’s MSI system. Therefore, a stepwise feature selection approach
was taken.
To determine the relative importance of each spectral band, an iterative feature selection
approach was carried out, following the results of Guyon et al. (2002), who investigated
different variable selection methods in a machine learning context. Reflectance values for all
10 bands, for all three acquisition dates (May 2, July 21, August 28, 2016), were extracted to
the reference data points. The default settings for the random forest model were used (ntree
= 500, mtry = 5 [square root of # of inputs, rounded down]) to fit a random forest model. A
Recursive (or backward) Feature Elimination was then employed (Kohavi, 2000). The process
was carried out in two different ways: first, individual bands were eliminated stepwise from
the model based on their ranking in the random forest algorithm’s internal variable
importance score, based on Mean Decrease in Accuracy.
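The stepwise removal described above could be sketched as a generic backward elimination loop. This is not the implementation used in the study: the ranking function is a stand-in for a refitted random forest's Mean Decrease in Accuracy ranking, and the fixed importance scores are invented solely to make the sketch runnable.

```python
def backward_elimination(features, rank_least_important, min_features):
    """Remove one feature per step until min_features remain.

    rank_least_important(current_features) must return the feature to drop.
    In the setting of this study it would be derived from a model refitted
    at every step (not implemented here).
    Returns the list of feature subsets visited, largest first.
    """
    current = list(features)
    history = [list(current)]
    while len(current) > min_features:
        worst = rank_least_important(current)
        current.remove(worst)
        history.append(list(current))
    return history


# Hypothetical, fixed importance scores (illustration only).
toy_importance = {"B2": 0.1, "B3": 0.2, "B4": 0.9, "B6": 0.8, "B7": 0.7, "B12": 0.6}


def least_important(current):
    return min(current, key=toy_importance.get)


steps = backward_elimination(toy_importance, least_important, min_features=3)
```

Unlike this toy ranking, a real run would recompute the importance scores after each refit, since they depend on which variables remain in the model.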
To test the algorithm for any variable selection bias, a second method was tested, removing
individual bands stepwise based on their performance in terms of actual OOB error estimate
averaged over 5 separate trials. These two approaches, beginning with all 10 bands for all 3
images, were carried out until only 3 bands for each image remained. The results were
confirmed via 10-fold cross validation. Additionally, a correlation matrix was calculated to
determine the degree to which the different bands are correlated (Table 2). A correlation
matrix shows the correlation coefficients that represent the relationship between two
variables, a measure of dependency (Snedecor and Cochran, 1968). Here, cell values of the
individual band rasters of the Sentinel-2 tile 33VXF for the July 21 image are presented in
terms of their relationship to another individual band layer, calculated as the
covariance between them divided by the product of their standard deviations.
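The correlation coefficient defined above (covariance divided by the product of the standard deviations) can be written out as a short sketch. The function name and the toy values are illustrative and do not come from the Sentinel-2 rasters.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of covariance and variances cancel, so sums suffice.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy cell values for three hypothetical band layers.
band_a = [1.0, 2.0, 3.0, 4.0]
band_b = [2.0, 4.0, 6.0, 8.0]   # rises with band_a
band_c = [8.0, 6.0, 4.0, 2.0]   # falls as band_a rises
r_ab = pearson_r(band_a, band_b)
r_ac = pearson_r(band_a, band_c)
```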
3.8 Optimal random forest parameters
Only two parameters must be set in the random forest model: the number of trees to grow
(ntree), and the number of random split variables (mtry). Due to the Strong Law of Large
Numbers, generalization error converges after a certain number of trees are grown, thus the
random forest algorithm does not overfit the data when using large values for ntree (Feller,
1968). As with the random subset selection feature previously mentioned, reducing the
number of split variables (mtry) diminishes the classification strength of the individual trees,
but makes them less correlated to each other (Breiman, 1996). Hence, it is important to strike
the correct balance between ntree and mtry, choosing the combination that produces the lowest
generalization error.
There is also the option to sample with or without replacement, that is, whether samples are
“put back” after being randomly selected so that they can be selected
again. Sampling without replacement was chosen for all fitted models in this study
after reviewing the findings of Strobl et al. (2007), who established that sampling with
replacement can introduce bias into the variable selection mechanism. The random forest
process was implemented in the Marine Geospatial Ecology Tools toolbox in ArcMap
version 10.3.1 (Roberts et al., 2010).
For the time series and band combination where estimated OOB error rate was lowest,
following the results of the previous steps, a number of different values of trees grown
(ntree) and number of randomly selected split variables (mtry) were tried. Since the random
forest algorithm’s default value for ntree is 500, and values used in several previous random
forest studies ranged from 100 to 2500, the following values of ntree were tested: 50, 100, 200,
300, 400, 500, 700, 1000, 1500, 2000, and 3000. As the default value for mtry is the square root
of the number of input variables, rounded down, and the maximum number of inputs across
all band and image combinations was 31 (3 images with 10 bands each, plus STI), values of mtry
between 1 and 5 were tested for every value of ntree. The combination of values for
ntree and mtry that produced the lowest OOB error estimate was selected to predict a final
raster classification map.
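The parameter search described above can be sketched as a grid over the tested ntree and mtry values. The helper names are hypothetical, and the OOB error values passed to the selection function would have to come from fitted models, which are not part of this sketch.

```python
import math
from itertools import product


def default_mtry(n_inputs):
    """Square root of the number of input variables, rounded down."""
    return math.isqrt(n_inputs)


ntree_values = [50, 100, 200, 300, 400, 500, 700, 1000, 1500, 2000, 3000]
mtry_values = range(1, 6)

# Every (ntree, mtry) pair that would be evaluated.
parameter_grid = list(product(ntree_values, mtry_values))


def best_parameters(oob_error_by_params):
    """Return the (ntree, mtry) pair with the lowest OOB error estimate."""
    return min(oob_error_by_params, key=oob_error_by_params.get)
```

With 31 inputs the default mtry is 5, which is why the tested range of 1 to 5 includes the default as its upper bound.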
3.9 Validation
To validate the results of the random forest classifier, a 10-fold cross validation technique
was employed. 10-fold cross validation, a variant of the k-fold cross validation technique
where k=10, allows all reference data to be used for both training and validation with each
individual observation used for validation exactly once. The dataset is randomly split into 10
equal-sized subsets. 9 of these subsets are then selected and together form a training dataset,
and the remaining single subset is used for testing. This step is repeated 10 times so that each
subset serves as the test set exactly once, and the results are subsequently aggregated into one
confusion matrix (Kohavi, 1995; Snee, 1977). This approach was chosen as an improvement over
traditional simple data splitting, dividing the data so that two-thirds is used for training and
one-third for testing, due to the potential for substantial loss in modeling capability
associated with a forfeiture of useful data for the classifier (Seni and Elder, 2010). A forest
mask, part of the Ekerö biotope database, was used to separate non-forested areas for the
production of classification maps.
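The splitting step of the 10-fold cross validation described above can be sketched as follows. This is a generic illustration, not the routine used in the study; the function name and the fixed random seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly split sample indices into k folds and return (train, test) pairs.

    Each index appears in exactly one test fold.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test_fold))
    return splits


# 720 reference samples (120 per class for 6 classes), as in this study.
splits = k_fold_indices(720, k=10)
```

A model would be fitted on each training part and evaluated on the matching test part, with the ten sets of predictions pooled into a single confusion matrix.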
4. Results
4.1 Multitemporal imagery
The overall classification accuracies for 7 different image combinations are shown in Figure
6. Classifications performed using two images, with all available bands included, were on
average over 5.3% more accurate, in terms of overall accuracy, than those employing single-
date acquisitions, with values ranging between 2.0% (July 21 single date vs July 21 / August
28 combination) and 7.7% (August 28 single date vs May 2 / August 28 combination). This
provides further support to the well-established idea that multitemporal imagery can
improve classification accuracy (Lunetta and Balogh, 1999; Oetter et al., 2001; Waske and
Braun, 2009; Wolter et al., 1995; Yuan et al., 2005). The May 2 / July 21
combination yielded the highest overall map accuracy. Interestingly, the three-image
combination that added the late summer (August 28) image was the next best time series
combination in terms of overall accuracy, but performed slightly worse than the May 2 / July 21
combination, with a 1.1% reduction. This is consistent with the results of Hill et al. (2010) and
Mickelson et al. (1998), whose research suggests that adding images is not
necessary when the phenological variations of the study area can be captured with fewer.
Figure 6. Overall accuracies for maps produced using various Sentinel-2 scene image combinations. Colors refer to acquisition date / combination (May 2 = green, July 21 = blue, Aug. 28 = orange).
Even amongst individual CadasterENV classes, the May 2 / July 21 combination proved most
consistent in obtaining the highest producer’s and user’s accuracies of all the time series
combinations, achieving 98.0% producer’s accuracy in the pine class and 94.7% user’s
accuracy in the spruce class. The exceptions were the spruce and mixed coniferous classes,
where the May 2 single image fared slightly better with the producer’s accuracy of the mixed
coniferous class and user’s accuracy of the spruce class, obtaining 94.0% and 97.0%,
respectively. Additionally, the July 21 / August 28 combination outperformed the May 2 /
July 21 combination in producer’s accuracy in the spruce class and user’s accuracy in the
mixed coniferous class. In the former case, the July 21 / August 28 combination beat the May
2 / July 21 combination by over 10%, reaching 83.7%, and over 6% in the latter instance with
89.8%. The single date August 28 image had the poorest overall performance in terms of
overall accuracy (78.6%) and producer’s / user’s accuracy (worst performer in 3 of 6 classes);
however, the July 21 single date image performed similarly in regard to producer’s accuracy.
4.2 The influence of ancillary data on random forest classification
None of the common topographic variables (slope, aspect, curvature) improved
the model’s classification accuracy, confirming the results of Engler et al.
(2013). In contrast, the inclusion of the modified STI increased overall accuracy in
multitemporal models by nearly 2%. This result corroborates the results of Buchanan et al.
(2014), who demonstrated the usefulness of combining soil data with topographic indices
and its ability to better predict water-saturated areas compared to metrics derived solely
from topography. This improvement was especially pronounced in the classes containing
deciduous trees: mixed coniferous/deciduous forest (3.3% average increase), deciduous
forest (2.3% average increase), and deciduous hardwood forest (2.7% increase). This effect
was even more prominent in single-date classification trials, with the deciduous hardwood
class showing a 9% increase and deciduous forest showing 4% improvement in a
classification of the July 21 image using all 10 bands.
4.3 The influence of sample size on classification accuracy
Compared to the baseline 150 samples per class, it was found that 50 samples (one third of
the total) could be randomly removed and a model fitted on the remaining 100 would retain an
accuracy within 2.5% of the baseline (Figure 7). When 100 samples (two thirds of
the total) were removed, the accuracy was within 6% of the baseline. With 125 samples
removed (approx. 83.3% of the total), accuracy was 8.8% lower than the baseline. Finally,
with only 10 samples per class (approximately 93.3% of total samples removed), accuracy
dropped 9.4% compared to the baseline.
Figure 7. Sample size vs overall accuracy, where training samples are randomly removed from the model.
4.4 Band combinations and variable importance
The correlation coefficient matrix for the July 21 image can be seen in Table 2, with clusters of
highly correlated bands (highlighted in red) standing out. First, a high degree of correlation
exists among the 3 bands in the visible range. Second, the 2nd and 3rd red edge bands and the
two near infrared bands display a high correlation. Both shortwave infrared bands were
strongly correlated to each other and to red edge band 1, with the SWIR1 band showing a
higher correlation to red edge bands 2 and 3 and the NIR bands than SWIR2. The lowest
correlations (highlighted in blue) occurred between the visible bands and the NIR and red
edge 2 and 3 bands. Red edge band 1, unlike the other 2 red edge bands, showed strong
correlation with the visible bands, particularly the green and red bands.
[Figure 7 plot. x-axis: Number of Training Samples per Class (10, 25, 50, 100, 150); y-axis: Overall Accuracy (%).]
Table 2. Correlation coefficient matrix for selected Sentinel-2 bands, July 21 image.
Band    blue    green   red     NIR 1   SWIR 1  SWIR 2  RE 1    RE 2    RE 3    NIR 2
blue    1.000   0.964   0.953   0.219   0.477   0.691   0.780   0.300   0.225   0.197
green   0.964   1.000   0.976   0.421   0.653   0.813   0.899   0.497   0.424   0.400
red     0.953   0.976   1.000   0.338   0.632   0.825   0.879   0.424   0.348   0.325
NIR 1   0.219   0.421   0.338   1.000   0.856   0.667   0.698   0.982   0.987   0.987
SWIR 1  0.477   0.653   0.632   0.856   1.000   0.932   0.887   0.900   0.869   0.867
SWIR 2  0.691   0.813   0.825   0.667   0.932   1.000   0.940   0.737   0.684   0.673
RE 1    0.780   0.899   0.879   0.698   0.887   0.940   1.000   0.771   0.712   0.697
RE 2    0.300   0.497   0.424   0.982   0.900   0.737   0.771   1.000   0.993   0.991
RE 3    0.225   0.424   0.348   0.987   0.869   0.684   0.712   0.993   1.000   0.997
NIR 2   0.197   0.400   0.325   0.987   0.867   0.673   0.697   0.991   0.997   1.000
Color key: very low (<0.35), low (0.35-0.65), moderate (0.65-0.85), high (0.85-0.95), very high (>0.95)
To determine variable importance, two variations of a Recursive Feature Elimination
approach were evaluated (Fassnacht et al., 2014). First, random forest’s own variable importance
ranking, successfully employed in a non-parametric machine learning environment by
Diaz-Uriarte and Alvarez de Andres (2006) and Guyon et al. (2002), was tested. Second, a method
that did not rely on random forest’s internal variable importance rankings was used to verify
the results. Starting with all available bands and ancillary inputs, variables were removed
stepwise on the basis of their actual OOB error estimate, averaged over 5 trials. Both
methods produced similar results. After accuracy verification via 10-fold cross validation, it
was found that the lowest error rates were achieved with a 4-band combination of the
following spectral channels: one band in the visible range, two bands in the red edge range,
and one shortwave infrared band.
The visible band that produced the best results was the red visible band (band 4). The
shortwave infrared band that yielded the highest accuracy was SWIR2 (band 12). This
finding corroborates the results from Immitzer, Vuolo and Atzberger (2016), who
determined the robustness of a combination of red edge, SWIR, and visible bands for
classification of tree species in central Europe. It should be noted, however, that their
findings indicated the visible blue band was more important than the visible red band, in
conflict with the results of this study.
The most important red edge band was band 6, and the best results were achieved when
combined with a second red edge band (band 7). Using red edge band 5 in combination with
band 6 produced very similar results to the band 6 / band 7 red edge band combination. This
combination generally produced slightly better results in both producer’s and user’s
accuracy for the coniferous classes, but slightly worse producer’s and user’s accuracy for the
deciduous and mixed coniferous-deciduous classes. The only noteworthy difference between
the accuracies of these two combinations occurred with producer’s accuracy in the deciduous
forest class. The band 6/7 red edge combination achieved a producer’s accuracy of 85.3%,
while the band 5/6 red edge combination achieved 81.3%. Furthermore, the overall map
accuracy of the 4/6/7/12 combination was nearly half a percent higher than the 4/5/6/12
combination, earning 85.3% and 84.9%, respectively. Put another way, the latter combination
produced a mere 4 more net misclassified samples out of the total of 720. The 4/6/7/12 combination
was then selected to create the final classification map.
4.5 Optimal random forest parameters
For the time series and band combination where estimated OOB error rate was lowest, the
May 2 and July 21 combination (bands 4, 6, 7, 12), the OOB error estimate stabilized once
reaching 200 trees, confirming both the robustness of Breiman’s (2001) use of 500 trees as a
default setting, as well as the findings of Immitzer et al. (2012) and Rodriguez-Galiano et al.
(2012). Nonetheless, the OOB error estimate was 0.2% lower with ntree=2000 compared with
ntree=200, so ntree=2000 was selected for the production of the final map. OOB error
estimates for the optimal value of ntree, 2000, were 0.3% to 1.4% lower than those obtained
with other values of ntree between 50 and 3000. For the number of split variables, mtry, the
default setting (square root of number of predictor variables, rounded down) produced the
best results, slightly outperforming the default value plus 1. As 9 input variables (4 bands
for each of 2 images, plus STI) were used to fit the model, the default value of mtry was 3. This
finding confirms the suitability of using the default value of the number of random split variables.
Using the default value of mtry, OOB error estimates were between 0.2% and 1.0% lower
than those obtained with other values of mtry between 1 and 5.
4.6 Final result: random forest classification map
After separate analyses of the impact of Recursive Feature Elimination for different band
combinations, the use of multitemporal imagery, ancillary data, model parameters, and
sample size, a final classification raster was produced, representing the established optimal
model conditions, shown in Figures 8 and 9a. This map was predicted from a model fitted with
the parameters ntree=2000 and mtry=3 using the red visible band 4, two red edge bands
(bands 6 and 7), the shortwave infrared band 12, as well as the extracted STI value, for the
two image acquisitions May 2 and July 21, 2016, using all 120 samples per class.
Figure 8. CadasterENV map produced using random forest classification, Ekerö municipality (inset of figure X outlined in blue).
10-fold cross validation showed this map achieving an overall accuracy of 86.0%. The confusion
matrix is shown in Table 3. Producer’s accuracy varied between 90.8% for deciduous
hardwood forest and 81.7% for mixed coniferous-deciduous forest, and user’s accuracy varied
between 93.2% for deciduous hardwood and 81.8% for mixed coniferous forest. For
coniferous forest classes, confusion occurred only with other coniferous classes and the
mixed coniferous-deciduous class, with one exception: a single hardwood record, located in
an area that appears to have been clear-cut since the time of the orthophoto acquisition
(which was used to add the record), was misclassified as pine. For deciduous classes,
confusion occurred only with other deciduous classes and the mixed coniferous-deciduous
class.
Figure 9. (a) Example portion of CadasterENV map produced using random forest classification, (b) 2015 CIR orthophoto for comparison (see Appendix).
Table 3. Confusion matrix for random forest classification. Producer’s accuracy highlighted in red, user’s accuracy highlighted in blue, overall accuracy highlighted in green.
Class       Pine    Spruce  Mix con  Mixed   Deciduous  Hardwood  User's
Pine        107     3       7        7       0          0         86.3%
Spruce      1       99      10       3       0          0         87.6%
Mix con     4       14      99       4       0          0         81.8%
Mixed       7       4       4        98      6          0         82.4%
Deciduous   0       0       0        8       107        11        84.9%
Hardwood    1       0       0        0       7          109       93.2%
Producer's  89.2%   82.5%   82.5%    81.7%   89.2%      90.8%     86.0%
Figure 10 shows the relative distribution of forest classes in terms of percent forest area. A
total of 1,027,629 pixels were identified as forest in Ekerö municipality, comprising an area of
102.8 km², approximately 47% of Ekerö’s surface area. The most common CadasterENV
forest class is mixed coniferous-deciduous forest with 30.9 km², or 30.1% of all forested area
in Ekerö. The next most prevalent class is pine forest with an area of 23.7 km², or 23.1% of
forest in Ekerö. Deciduous hardwood comprises 21.7 km² of forest, or 21.2% of the total.
Deciduous forest has a total area of 12.6 km², 12.3% of the total. Mixed coniferous forest
comprises 7.4 km² of forest, or 7.2%. Of all forest classes, spruce forest is the least represented
at 6.5 km², 6.3% of forest cover in Ekerö.
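The area figures above follow from the pixel count, as the short calculation below shows. The 10 m pixel size is an assumption inferred from the reported numbers (1,027,629 pixels for 102.8 km²), and the variable names are illustrative.

```python
PIXEL_SIZE_M = 10  # assumed pixel edge length in meters
forest_pixels = 1027629

# Each pixel covers PIXEL_SIZE_M ** 2 square meters; 1 km2 = 1e6 m2.
forest_area_km2 = forest_pixels * PIXEL_SIZE_M ** 2 / 1e6

# Class areas in km2 as reported in the text.
class_area_km2 = {
    "Mixed coniferous-deciduous": 30.9,
    "Pine": 23.7,
    "Deciduous hardwood": 21.7,
    "Deciduous": 12.6,
    "Mixed coniferous": 7.4,
    "Spruce": 6.5,
}
total_class_area = sum(class_area_km2.values())
class_share = {
    name: 100 * area / total_class_area for name, area in class_area_km2.items()
}
```

Because the shares here are recomputed from rounded areas, they can differ from the reported percentages in the last digit.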
Figure 10. Class distribution (% forest area) among forested areas in Ekerö municipality.
5. Discussion
5.1 Multitemporal imagery
As different plant species respond to biological processes such as pigmentation and
senescence in unique ways and at different rates, it follows that using multitemporal imagery
to capture differences in foliar presentation would help separate forest classes that may be
spectrally similar in any one single acquisition image (Chuine and Beaubien, 2001; Dymond
et al., 2002). Less well studied, but suggested by previous research nonetheless, is the finding
that although a time series generally improves accuracy by capturing the vegetation’s
phenological condition, there appears to be a limit beyond which one finds a tradeoff
between information efficacy and redundancy, where additional scenes contain unnecessary
or irrelevant details that affect the classifier’s ability to discriminate between classes (Hill et
al., 2010; Mickelson et al., 1998).
Indeed, Hill et al. (2010) reported that image quality, in terms of timing, is more important to
improving classification accuracy than the quantity of images used. In this study, the May 2 /
July 21 combination produced the best result, 1.1% better in terms of overall accuracy when
compared to a model that included spectral data from the August 28 image, verified via 10-
fold cross validation. This shows that while the time series aspect of classification is
important, more data in the form of additional acquisitions after a certain quantity is reached
does not necessarily produce a better predictive model, as data dimensionality increases.
Hughes (1968) laid out the theoretical justification for requiring more reference data to
compensate for the added layer of complexity to the model in these instances, which was
subsequently confirmed in a remote sensing context by Hill et al. (2010) and Key et al. (2001).
Furthermore, additional images can potentially contain noise related to different canopy
illumination angles and intensity, among other effects, weakening the discriminatory power
of the model for class separation. It is important to note that the suitability of specific
acquisition dates does not necessarily transfer from year to year, as annual phenology is
affected by both short and long term weather and climatic dynamics (Chuine and Beaubien,
2001). Furthermore, one must take into account the decreasing length of the growing season
the further north the study site is located.
Because the May 2 single image classification achieved higher accuracies than the July 21 and
August 28 single images, and the spring-summer (May 2 / July 21) combination was more
accurate than the summer-late summer (July 21 / August 28) combination, one can infer that
the springtime acquisitions were most critical in the separation of the classes. This finding
corroborates past studies that have highlighted the importance, when discriminating forest
classes containing deciduous trees in temperate forests with a single-date image, of acquiring
data either at the start or at the end of the growing season. Key et al. (2001) and
Schriever & Congalton (1995) found that mid-autumn was the optimal time, noting that the
greatest differences amongst target vegetation types in factors relating to the biophysical
properties of the plant structure are likely to be observed during senescence. However, trees
during autumn senescence are more susceptible to unpredictable frosts and wind conditions
that may contribute to premature leaf removal. Furthermore, in choosing between spring
and autumn images for use in discriminating between land cover classes, Sweden’s
geographical position at high latitudes becomes a decisive factor.
The study areas of Key et al. (2001) and Schriever & Congalton (1995) are both located in
Eastern North America, between latitudes 39°N and 43°N. In contrast, Ekerö, Sweden, is
located between latitudes 59°N and 60°N. Because solar zenith angles increase with distance
from the equator and with time from the summer solstice, the effects
of shadows increase, the signal to noise ratio decreases, and the atmospheric path
lengthens, affecting the spectral distribution of the irradiance (Hawotte et al., 2016). An
early-to-mid May acquisition, the approximate time of foliation of primary tree species in Ekerö, is
roughly 6 weeks away from the summer solstice occurring on June 21. A mid-autumn image
occurring in early-to-mid October, on the other hand, is approximately 14 weeks after the
summer solstice. The effects of shadows, signal to noise ratio, and spectral distribution of the
irradiance thus affect a mid-autumn image to a greater degree. Therefore, while a mid-
autumn acquisition may have been optimal at latitudes which are closer to the equator, one
should exercise caution in assuming the same will be true at higher latitudes.
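The geometric argument above can be illustrated with a rough calculation of the solar zenith angle at local solar noon. The sketch uses a common cosine approximation for solar declination and takes the noon zenith angle as the absolute difference between latitude and declination; the chosen dates, the latitude of 59.3°N, and the function names are assumptions for illustration and are not taken from the cited studies.

```python
import math


def solar_declination_deg(day_of_year):
    """Approximate solar declination in degrees (simple cosine approximation)."""
    return -23.44 * math.cos(2 * math.pi * (day_of_year + 10) / 365)


def noon_zenith_deg(latitude_deg, day_of_year):
    """Solar zenith angle at local solar noon, in degrees."""
    return abs(latitude_deg - solar_declination_deg(day_of_year))


EKERO_LAT = 59.3           # assumed value within the 59-60 degree N range given in the text
MAY_10, OCT_8 = 130, 281   # day of year in a non-leap year

may_zenith = noon_zenith_deg(EKERO_LAT, MAY_10)
october_zenith = noon_zenith_deg(EKERO_LAT, OCT_8)
```

Under these assumptions the noon zenith angle is markedly larger for the October date than for the May date, in line with the reasoning that a mid-autumn image is more affected by shadows and a longer atmospheric path.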
With regard to individual classes, generally very high producer’s and user’s accuracies were
achieved. Only the May 2 / July 21 image combination achieved the highest producer’s and
user’s accuracy in 4 of the 6 forest classes, demonstrating the importance of optimizing
image acquisition timing as it relates to the capturing of phenological differences needed to
discriminate between classes. The May 2 single date classification attained the highest
producer’s accuracy for the mixed coniferous class at 94.0%, but user’s accuracy was the
worst of all combinations at 78.3%. This demonstrates the tradeoff between accurate
classification and actual map reliability on the ground. Interestingly, the May 2 single date
classification had the highest user’s accuracy in the spruce class at 97.0%, though producer’s
accuracy was 65.3%, the lowest for all combinations. The opposite was true for the July 21 /
August 28 image combination, achieving significantly better producer’s accuracy in the
spruce class and user’s accuracy in the mixed coniferous class compared to the May 2 / July
21 combination.
The ability of an image or a combination of images to achieve both the highest user’s
accuracy in one category and the lowest user’s accuracy in another, for example with the
spruce and mixed coniferous classes in the May 2 image, can be explained by their confusion
with other classes. Only one sample from another class, deciduous hardwood, was
misclassified as spruce, but 17 of the 49 spruce samples were misclassified as either mixed
coniferous, mixed coniferous-deciduous, or pine. At the same time, the May 2 / July 21
combination had two mixed coniferous samples misclassified as spruce, but only 13 of 49
spruce samples were misclassified as either mixed coniferous, mixed coniferous-deciduous,
or pine. The May 2 image, with only half the available spectral information compared to the
May 2 / July 21 combination, had a harder time distinguishing amongst the 4 classes
containing coniferous trees, even if it performed slightly better in terms of the percentage of spruce samples
correctly classified. Therefore, the May 2 / July 21 image combination produced much better
overall results when taking into account not only how many samples were correctly
classified but also the reliability of that classification to a user of the map on the ground.
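The spruce figures for the May 2 image can be reproduced from the counts quoted above. The following is a minimal sketch in Python, not the procedure used in this study. The two-class layout (spruce versus all other classes combined) and the number of correctly mapped non-spruce samples are assumptions made only to complete the matrix, and the single commission error is taken to be a sample from another class that was mapped as spruce.

```python
import numpy as np

def producers_users_accuracy(confusion, class_index):
    """Return (producer's, user's) accuracy for one class.

    Rows of `confusion` are reference classes, columns are mapped classes.
    """
    confusion = np.asarray(confusion, dtype=float)
    correct = confusion[class_index, class_index]
    producers = correct / confusion[class_index, :].sum()  # 1 - omission error
    users = correct / confusion[:, class_index].sum()      # 1 - commission error
    return producers, users

# Counts from the text: 49 spruce reference samples, 17 of them mapped to
# other classes, and 1 sample from another class mapped as spruce.
# The 200 correctly mapped "other" samples are a hypothetical placeholder;
# they do not affect the spruce accuracies.
may2 = [[32, 17],
        [1, 200]]

producers, users = producers_users_accuracy(may2, class_index=0)
print(f"producer's accuracy: {producers:.1%}")
print(f"user's accuracy: {users:.1%}")
```

With 32 of 49 spruce samples correct, the producer's accuracy is 32/49 (65.3%), while the user's accuracy is 32/33 (97.0%) because only one foreign sample ended up in the spruce column.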
It is important to note that since the mixed coniferous class is categorized as having a
compositional mix of two existing classes, pine and spruce, a certain level of confusion
within these classes is a logical consequence and can be expected. As previously noted, the
inclusion of classes containing both individual species and separate classes comprising a mix
of those species is not an ideal classification scheme in the remote sensing context, due to
interactions between background signature and intra-species variability, related to the
reduction of statistical separability of different classes in the spectral space (Carleer and
Wolff, 2004; Cushnie, 1987; Fassnacht et al., 2016). The poorest performing single image
classification was the August 28 acquisition, the only case that had the lowest producer’s and
user’s accuracy in 3 of 6 classes. This is likely due to the timing of the August 28 image, with
late summer having the lowest spectral variability between forest classes when compared to
the spring (May 2) and high summer (July 21) images.
5.2 Ancillary data
Though topographic variables have been shown in previous studies to improve classification
accuracy, including those employing the random forest algorithm, their contribution to the
model appears to be dependent on both the scale and physical characteristics of the study
area. Engler et al. (2013) confirmed the limited benefit of common topographical variables in
a study area of 200 km², though Zimmerman et al. (2007) found they were crucial when
working in a larger study area of 60,000 km². Rodriguez-Galiano et al. (2012) found that
elevation was among the most important variables in a study of the Granada Province of
Spain with a large area (13,000 km²) and large elevation range (0-3480 m.a.s.l.) of target land
cover classes. Therefore, it can be inferred that if the study area is large enough and/or
heterogeneous enough for the vegetation observed to be potentially affected by broader
changes in the environmental gradients of the land it inhabits, topographic variables will
provide a great deal of explanatory power to classification models. With smaller or more
homogeneous study areas, the scale of topographic variables as they relate to spectral
information is likely to be limited and thus cannot contribute to successful discrimination of
land cover types. Another potential reason topographic variables are useful at small scales is
that intra-class variability can be expected to increase with study area size, compromising
intra-class spectral separability in the process (Rogan et al., 2008). As the study area in this
project is approximately 218 km², it follows that the results support those found by Engler et
al. (2013), where the study area was also relatively small in size.
In contrast to the ineffectiveness of the aforementioned topographic variables in improving
overall accuracy, the inclusion of the Soil Topographic Wetness Index (STI) proved useful. Overall
accuracies were nearly 2% higher in multitemporal models and 9% higher in single-date
models when it was added as an input variable in random forest. The integration of soil and
topographic information provides a metric of soil transmissivity that appears to
help the algorithm discriminate between classes where topographic variables alone do not
contain enough information to do so, in this case likely due to the small study area size. This
finding suggests that in areas where soil patterns are not uniform, the STI predicts areas of
potential saturation even where spectral signatures may be similar (Sivapalan and Wood,
1987). This helps to explain the significant improvement STI gives in classification accuracy
to the deciduous forest classes, where mixed spectral signatures occurring within a pixel are
likely to occur due to the possibility of multiple species, a characteristic of the forest class
definition. Here, the STI provides useful information to the algorithm to aid in the separation
of classes where the algorithm has difficulty doing so with spectral information alone. The
effect is even more noticeable in classifications performed with single-date images, as even
less spectral information is available. These results confirm Buchanan et al. (2014), suggesting
that an STI can indeed allow for better predictions of potentially saturated areas than
topographic indexes alone. In areas where soils are non-uniform and when soil/topographic
information is available, the STI metric should be considered as a complement to spectral
data in future forest classification studies.
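For readers unfamiliar with the index, the sketch below shows how a soil topographic index is commonly computed. It assumes the widely used form STI = ln(a / (tan β · K_sat · D)), where a is the upslope contributing area per unit contour length, β the local slope, K_sat the saturated hydraulic conductivity and D the soil depth. The function name, argument names and sample values are illustrative only, and this is not necessarily the exact formulation used to derive the STI layer in this study.

```python
import numpy as np

def soil_topographic_index(contrib_area, slope_rad, k_sat, soil_depth):
    """STI = ln(a / (tan(beta) * K_sat * D)), evaluated per pixel."""
    a = np.asarray(contrib_area, dtype=float)
    tan_beta = np.tan(np.asarray(slope_rad, dtype=float))
    transmissivity = np.asarray(k_sat, dtype=float) * np.asarray(soil_depth, dtype=float)
    # Flat pixels would divide by zero, so clamp the slope term to a small value.
    tan_beta = np.maximum(tan_beta, 1e-6)
    return np.log(a / (tan_beta * transmissivity))

# Two hypothetical pixels with identical topography but different soils:
# the less transmissive soil receives the higher (wetter) index.
sti = soil_topographic_index(
    contrib_area=[50.0, 50.0],          # m^2 per m of contour
    slope_rad=np.radians([5.0, 5.0]),
    k_sat=[1.0, 0.1],                   # m/day
    soil_depth=[1.0, 1.0],              # m
)
print(sti)
```

Because the two pixels share the same topography, a purely topographic wetness index would give them identical values; only the soil term separates them, which is the extra information the STI contributes.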
5.3 Sample size
One of the most pertinent questions in remote sensing studies is the determination of the
quantity of reference data needed to achieve a specific classification accuracy threshold.
Several issues must be considered: the need to take enough samples to represent the inter-
and intra-species variation in the target sample area, the level of accuracy desired, as well as
cost, both in economic and temporal terms (Lippitt et al., 2008; Pal and Mather, 2003; Rogan
et al., 2003, 2008). Furthermore, Fassnacht et al. (2016) note that the amount of training data
needed varies with the target species or classes under investigation, the methods applied,
and the requirements of the end user. The topic is especially important to future land cover
classification projects, which require funding in order to collect field data. As much of
Sweden’s landscape is characterized by dense forest, alpine areas, or terrain that may
otherwise be difficult to access, it is of high importance to investigate the relationship
between the amount of reference data and accuracy objectives before dispatching resources
into the field.
While Jensen (2005) recommended that the minimum number of training records be equal to
ten times the number of input variables for parametric classification methods, various
studies have shown this approach to be insufficient for non-parametric machine learning
methods such as random forest (Foody, 1995; Foody and Arora, 1997; Pal, 2005; Pal and
Mather, 2003). The amount of training data should be more sensitive to study areas and land
cover classes exhibiting high variability, and not simply which classification method was
employed.
Regardless of the method used, more data beats a cleverer algorithm (Domingos, 2012). All
classifiers essentially function by grouping together similar samples of data, the main
difference being how “similarity” is defined. Rodriguez-Galiano et al. (2012) found that with
a 70% reduction in sample size, overall accuracy remained within 5% of the baseline
accuracy obtained using all samples. In this study, a 70% reduction in sample size resulted in
overall accuracy remaining within approximately 6% of the baseline. In comparison,
however, Rodriguez-Galiano et al. (2012) achieved higher stability up until this 70%
threshold, and much lower stability beyond it, whereas in this study less stability was
achieved before the 70% threshold, but not as much instability occurred beyond it. This may
be explained by the great difference in scale between the two study sites, with the reference
study’s site being much larger in areal extent, as well as having nearly double the number of
land cover classes to classify. Understandably, a model with more classes would have
greater sensitivity to the effects of a heavy reduction in sample size, as samples must represent
the intra-class variability of the study area. Indeed, Reese (2011) recorded a 9.3% drop in
accuracy when reducing the total number of samples by just 25% in a random forest model.
This demonstrates the need to consider reference data not only in terms of quantity, but also
quality.
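The kind of sample-size experiment discussed above can be sketched as follows. This is a minimal illustration using scikit-learn and synthetic data, not the workflow or data of this study: the dataset from `make_classification`, the reduction fractions and the forest size are all assumptions chosen for demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for reference data: 6 classes, 9 input variables.
X, y = make_classification(n_samples=1200, n_features=9, n_informative=6,
                           n_redundant=2, n_classes=6, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

rng = np.random.default_rng(0)
accuracy_by_fraction = {}
for fraction in (1.0, 0.7, 0.5, 0.3, 0.1):
    # Keep a stratified random subset of the training samples.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y_train == c),
                   size=max(2, int(fraction * np.sum(y_train == c))),
                   replace=False)
        for c in np.unique(y_train)
    ])
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train[keep], y_train[keep])
    accuracy_by_fraction[fraction] = model.score(X_test, y_test)

for fraction, acc in accuracy_by_fraction.items():
    print(f"{fraction:.0%} of training samples: overall accuracy {acc:.3f}")
```

A fraction of 0.3 corresponds to the 70% reduction discussed in the text. Holding the test set fixed while shrinking only the training set keeps the accuracy figures comparable across fractions.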
As the reference data used in this study was collected in the context of creating/validating a
biotope database, the data was inherently flawed for use in this study. Notably, species
composition for all reference points was based not on % crown cover, but relative number of
species present by stem count. Furthermore, even though the data was confirmed visually
using an orthophoto, there was no standard radius by which these counts were made in the
field, which clearly affects the integrity of classifications that are performed on 10 meter
pixels. This problem is likely to have affected the accuracy in this study, but it was an
unavoidable issue as the author was unable to perform their own reference data collection.
5.4 Band combinations and variable importance
In determining which bands are the most valuable, it is important to note that there is
multicollinearity between the variables. Ranking the input variables on the basis of their
individual contribution to a model’s accuracy may not say much about how one may
optimize the inputs for increasing overall classification accuracy. Any particular variable on
its own may not be among the most important according to the stepwise Recursive Feature
Elimination process, but combined with other spectral channels, may play a role in
producing the most accurate result. For example, the red edge bands were shown to be less
important than either of the SWIR bands, but a model with both SWIR bands and no red
edge bands is less accurate than one that includes one of each. As there are complex
correlative relationships between the different spectral channels, one must try various
combinations based not just on band number, but also the spectral class they belong to
(visible, red edge, NIR, SWIR), as correlation is typically highest within these classes.
Calculating correlation metrics can help steer this process. Once highly correlated input
variables are identified, the variable that produces the most accurate classification output
amongst them can be included in the model and the rest excluded, to not add redundant
information which reduces the performance of the classifier.
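The correlation-guided selection described above can be sketched as a simple greedy filter. This is an illustrative Python example under stated assumptions: the function name, the 0.9 threshold, the synthetic band values and the per-variable scores are hypothetical and are not taken from this study.

```python
import numpy as np

def drop_correlated(X, names, scores, threshold=0.9):
    """Greedy filter: visit variables from best to worst individual score and
    keep one only if |r| with every variable already kept is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in np.argsort(scores)[::-1]:
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in sorted(kept)]

# Hypothetical reflectances: two near-duplicate "SWIR" columns and one
# independent "red edge" column.
rng = np.random.default_rng(0)
swir_a = rng.normal(size=500)
swir_b = swir_a + rng.normal(scale=0.05, size=500)
red_edge = rng.normal(size=500)
X = np.column_stack([swir_a, swir_b, red_edge])

selected = drop_correlated(X, ["swir_a", "swir_b", "red_edge"],
                           scores=[0.80, 0.78, 0.70])
print(selected)
```

Here the second SWIR column is discarded even though its individual score is higher than that of the red edge column, mirroring the observation that one band from each spectral class can outperform two highly correlated bands from the same class.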
Only one of the three visible bands and one of the two SWIR bands were needed due to the
high degree of correlation between them, based on the results of the correlation matrix.
Adding between 1 and 3 red edge bands produced similar results, with the best results
produced using red edge bands 6 and 7 and SWIR band 12. Immitzer, Vuolo and Atzberger
(2016) also confirmed the high value of the red edge and SWIR Sentinel-2 bands for the
identification of tree species in central Europe. Interestingly, they showed the blue visible
(band 2) as being the most important visible spectral channel, whereas in this study the red band
produced the lowest classification error rates. This may be due to a greater emphasis on
coniferous trees in the former study, attempting to classify Silver fir (Abies alba) and European
larch (Larix decidua) in addition to pine and spruce. Indeed, Pu and Liu (2011) relate the blue
band’s special relevance to coniferous trees to their reduced relative photosynthetic activity
in the blue spectrum. Immitzer, Vuolo and Atzberger (2016) did, however, note that the
timing of their single-date, late-summer acquisition most likely influenced the variable
importance ranking.
The limited usefulness of the NIR bands 8 and 8a in classifying similar forest types using
random forest was also confirmed by Immitzer, Vuolo and Atzberger (2016). Additionally,
the correlation matrix showed that NIR band 8 is highly correlated with the red edge bands.
As the bands are characterized by multicollinearity, when combined with spectral
information from the visible and SWIR range, the red edge bands appear to be better at
discriminating between classes than the NIR bands.
Quantifying variable importance aids in understanding the factors that drive a classification
model’s ability to separate data into distinct classes. One of the more perplexing issues with
analyzing the way random forests work is the black-box nature of the algorithm, which
arises from the subtleties of different procedural features of the method, namely bootstrap
aggregation (bagging) and the classification tree node split criterion, which is based on the
Gini impurity index. While bagging is one of the most computationally efficient techniques
to handle large, high-dimensional datasets, finding an appropriate classification model in
one step is not possible due to the scale and complexity of the problem (Bühlmann and Yu,
2002; Kleiner et al., 2014; Wager et al., 2014).
The combination of bagging and random selection of input variables, the tree nodes of which
are then split using the Gini criterion, makes the algorithm very challenging to scrutinize
with rigorous mathematics. This phenomenon can also be described as a difficulty in
examining the effects of randomization and a highly data-dependent tree construction
process simultaneously. Research attempting to explain the mathematical properties of
random forest tends to adopt methodological procedures that are not data-driven, unlike the
practical applications which are data-dependent, and thus a gap is created between theory
and practice (Scornet, Biau, and Vert, 2015).
Strobl et al. (2007) note that the Gini index, the node-splitting rule employed by the random
forest algorithm, can be biased in certain cases. In their analysis, it was determined
experimentally that with a growing number of input variables, the robustness of the variable
importance calculation decreases. This is because given a fixed number of true predictor
variables, the addition of further inputs decreases the chance a true predictor is randomly
selected, and the less often a variable is selected, the less likely it will be considered to be of
high importance (Genuer, Poggi, and Tuleau-Malot, 2010). Furthermore, the variable
importance ranking may be biased if there is a discrepancy in the scale of measurement
between the input variables. As the scale of measurement is not an indicator of true variable
importance, this can lead to a biased ranking of variables (Strobl et al., 2007). Since the Gini
index is computed for all possible node splits within the range of a given variable, and the
variable selected for the next split is that which produces the highest overall Gini split
criterion, variables that are greater in scale have a better chance to randomly produce a good
criterion value (Strobl et al., 2008).
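To make the split criterion concrete, the sketch below computes the Gini impurity of a node and scans all candidate thresholds of a single variable for the largest impurity decrease. It is a simplified illustration with hypothetical toy data, not the internal implementation of any random forest package. The loop over candidate thresholds shows why a variable with more distinct values gets more opportunities to produce a good criterion value.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(values, labels):
    """Scan every candidate threshold of one variable and return the
    (threshold, impurity decrease) pair with the largest decrease."""
    values, labels = np.asarray(values), np.asarray(labels)
    parent = gini_impurity(labels)
    best = (None, 0.0)
    for threshold in np.unique(values)[:-1]:
        left, right = labels[values <= threshold], labels[values > threshold]
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / len(labels)
        if parent - weighted > best[1]:
            best = (threshold, parent - weighted)
    return best

# Toy node: one variable separates two hypothetical classes perfectly.
values = [0.10, 0.15, 0.20, 0.60, 0.65, 0.70]
labels = ["pine", "pine", "pine", "spruce", "spruce", "spruce"]
threshold, decrease = best_split(values, labels)
print(threshold, decrease)
```

In this toy node the parent impurity is 0.5 and the split at 0.20 yields two pure children, so the full 0.5 is recovered as the impurity decrease.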
5.5 Optimal random forest parameters
The variation in OOB generalization error when changing the ntree and mtry
parameters is related to the balance between increasing the strength of each tree and the
degree to which they are correlated to each other. Increasing the number of randomly
selected split variables improves the strength of any particular tree, but also has a
detrimental effect in that it increases correlation between them, which reduces overall
accuracy when used to predict unseen data. Hence, these parameters should be tuned in
order to determine the values of each that result in the lowest generalization error. The
results of this study confirm that random forest’s default parameters are capable of
producing close to optimal results. The default value for number of random split variables,
mtry, consistently produced the best result.
The number of trees needed to be grown in order to reach a convergence of generalization
error within 0.2% of the best result was determined to be 200 with 2 images, 120
reference points per class, and 9 input variables. Rodriguez-Galiano et al. (2012) reached
convergence after 100 trees, also using 2 images, 120 reference points per class, and 9 input
variables. Immitzer et al. (2012) reached a convergence of generalization error after 250 trees
using 1 image, 215 samples per class, and 8 input variables. The results of this study fall
in the same neighborhood, demonstrating that random forest’s default value of
ntree=500 is more than sufficient to reach a convergence of OOB generalization error. Beyond
this number, growing more trees made no substantial improvements in the model’s
accuracy, only increasing computation time.
Breiman (2001) states that this convergence phenomenon arises from the Strong Law of
Large Numbers, noting that overfitting does not occur when adding more trees (Feller, 1968).
Indeed, a forest of several hundred trees appears to be sufficient to compensate for the
inherent instability of any individual tree contained in the model, one of Breiman’s objectives
in developing the algorithm. Nonetheless, performing an investigation of the effect of forest
size on estimated generalization error proved useful in tuning this parameter a bit further, as
increasing the number of trees grown to 2000 reduced the estimated generalization error by
0.2% compared with 200 trees. No improvement was made when increasing the number of
trees to 3000.
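A parameter search of the kind discussed in this section can be sketched with scikit-learn, where `n_estimators` and `max_features` play the roles of ntree and mtry. This is a minimal sketch on synthetic data, not the tuning procedure of this study: the dataset, the grid values and the variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 6 classes, roughly 120 samples per class, 9 input variables.
X, y = make_classification(n_samples=720, n_features=9, n_informative=6,
                           n_redundant=2, n_classes=6, n_clusters_per_class=1,
                           random_state=0)

oob_error = {}
for mtry in (1, 3, 9):                 # 3 = floor(sqrt(9)), the usual default
    for ntree in (100, 200, 500):
        forest = RandomForestClassifier(n_estimators=ntree, max_features=mtry,
                                        oob_score=True, random_state=0)
        forest.fit(X, y)
        # OOB error is estimated from the samples left out of each bootstrap.
        oob_error[(mtry, ntree)] = 1.0 - forest.oob_score_

best = min(oob_error, key=oob_error.get)
for (mtry, ntree), err in sorted(oob_error.items()):
    print(f"mtry={mtry} ntree={ntree} OOB error={err:.3f}")
print("lowest OOB error at (mtry, ntree) =", best)
```

Because the OOB estimate comes for free from the bootstrap samples, no separate validation set is needed to compare parameter settings, which keeps all reference data available for training.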
6. Conclusion
The main sources of error in this study can be attributed to the following: 1) a forest
classification system which contains a mixture of species-specific as well as mixed-species
classes, 2) the use of a May 2 acquisition, the best spring image available, which is likely too
early to capture leaf-on conditions for all target species in the Ekerö region, and 3) reference
data which was not intended specifically for use in this study. Addressing these issues
would likely contribute to better results. Other recommendations to improve accuracy in
future studies include the incorporation of fuzzy logic for mixed forest classes (Foody, 2002),
high-resolution LiDAR data to assess stand characteristics (Nordkvist et al., 2012), and textural
analysis (Lu et al., 2014). If an orthophoto from a different time period than the satellite
acquisitions is used to check the integrity of reference data, consider the effects of the time
lag (new clearcuts, construction, etc.).
This study sought to address the performance of Sentinel-2 data for use in classifying forest
types in Ekerö, Sweden. To best utilize this data in future studies, several recommendations
are of note. The use of multitemporal data is helpful in separating the classes of forest types
that may be spectrally similar in any single time frame, though one should be careful to
select the appropriate timing of the images to maximize phenological differences, so as to not
add redundant information which will not be of use to the classifier. Reducing the number of
bands via a feature reduction approach to exclude highly-correlated spectral information can
increase overall accuracy. It is suggested to take more than one approach to determine
relative variable importance for optimal band combinations and, in the case of machine
learning algorithms, not to rely solely on internal variable importance rankings. Calculating
a correlation matrix can help to make sense of the findings. Ancillary data may complement
the spectral information in a model to produce better results, but the scale and physical
characteristics of the study site should be taken into consideration. Topographic information
alone may not boost performance of the model, but when combined with soil data may
prove useful. While the amount and type of reference data needed to produce an optimal
result depends on a number of factors, consider that more data beats a better model. Lastly,
the random forest machine learning algorithm proved to be a simple, flexible method for
land cover classification analysis. As it is resistant to overfitting and does not rely on
assumptions of data distribution, it appears to be well suited to handle future remote sensing
data that continues to increase in dimensionality.
6. References
Abu-Mostafa, Y.S., Magdon-Ismail, M. and Lin, H.T., 2012. Learning from data (Vol. 4). New York, NY, USA:
AMLBook.
Anderson, J.R., 1976. A land use and land cover classification system for use with remote sensor data (Vol. 964). US
Government Printing Office.
Atkinson, P.M. and Tatnall, A.R.L., 1997. Introduction neural networks in remote sensing. International Journal of
remote sensing, 18(4), pp.699-709.
Ban, Y., Gong, P. and Giri, C., 2015. Global land cover mapping using Earth observation satellite data: Recent
progresses and challenges. ISPRS journal of photogrammetry and remote sensing (Print), 103(1), pp.1-6.
Banskota, A., Kayastha, N., Falkowski, M.J., Wulder, M.A., Froese, R.E. and White, J.C., 2014. Forest monitoring
using Landsat time series data: a review. Canadian Journal of Remote Sensing, 40(5), pp.362-384.
Baret, F. and Buis, S., 2008. Estimating canopy characteristics from remote sensing observations: Review of
methods and associated problems. In Advances in land remote Sensing (pp. 173-201). Springer Netherlands.
Benediktsson, J.A., Swain, P.H. and Ersoy, O.K., 1990. Neural network approaches versus statistical methods in
classification of multisource remote sensing data.
Breiman, L., 1996. Bagging predictors. Machine learning, 24(2), pp.123-140.
Breiman, L., 2001. Random forests. Machine learning, 45(1), pp.5-32.
Brisco, B. and Brown, R.J., 1995. Multidate SAR/TM synergism for crop classification in western
Canada. Photogrammetric Engineering and Remote Sensing, 61(8), pp.1009-1014.
Buchanan, B.P., Fleming, M., Schneider, R.L., Richards, B.K., Archibald, J., Qiu, Z. and Walter, M.T., 2014.
Evaluating topographic wetness indices across central New York agricultural landscapes. Hydrology and
Earth System Sciences, 18(8), p.3279.
Bühlmann, P. and Yu, B., 2002. Analyzing bagging. Annals of Statistics, pp.927-961.
Carleer, A. and Wolff, E., 2004. Exploitation of very high resolution satellite data for tree species
identification. Photogrammetric Engineering & Remote Sensing, 70(1), pp.135-140.
Carreiras, J.M., Pereira, J.M., Campagnolo, M.L. and Shimabukuro, Y.E., 2006. Assessing the extent of
agriculture/pasture and secondary succession forest in the Brazilian Legal Amazon using SPOT
VEGETATION data. Remote Sensing of Environment, 101(3), pp.283-298.
Chan, J.C.W. and Paelinckx, D., 2008. Evaluation of Random Forest and Adaboost tree-based ensemble
classification and spectral band selection for ecotope mapping using airborne hyperspectral
imagery. Remote Sensing of Environment, 112(6), pp.2999-3011.
Chen, C., Liaw, A. and Breiman, L., 2004. Using random forest to learn imbalanced data. University of California,
Berkeley, 110.
Chuine, I. and Beaubien, E.G., 2001. Phenology is a major determinant of tree species range. Ecology Letters, 4(5),
pp.500-510.
Congalton, R.G. and Green, K., 2008. Assessing the accuracy of remotely sensed data: principles and practices. CRC
press.
Criminisi, A., Shotton, J. and Konukoglu, E., 2011. Decision forests for classification, regression, density
estimation, manifold learning and semi-supervised learning. Microsoft Research Cambridge, Tech. Rep.
MSRTR-2011-114, 5(6), p.12.
Cushnie, J.L., 1987. The interactive effect of spatial resolution and degree of internal variability within land-cover
types on classification accuracies. International Journal of Remote Sensing, 8(1), pp.15-29.
Dalponte, M., Orka, H.O., Gobakken, T., Gianelle, D. and Næsset, E., 2013. Tree species classification in boreal
forests with hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 51(5), pp.2632-2645.
DeFries, R.S. and Chan, J.C.W., 2000. Multiple criteria for evaluating machine learning algorithms for land cover
classification from satellite data. Remote Sensing of Environment, 74(3), pp.503-515.
Díaz-Uriarte, R. and De Andres, S.A., 2006. Gene selection and classification of microarray data using random
forest. BMC bioinformatics, 7(1), p.3.
Domingos, P., 2012. A few useful things to know about machine learning. Communications of the ACM, 55(10),
pp.78-87.
Drusch, M., Del Bello, U., Carlier, S., Colin, O., Fernandez, V., Gascon, F., Hoersch, B., Isola, C., Laberinti, P.,
Martimort, P. and Meygret, A., 2012. Sentinel-2: ESA's optical high-resolution mission for GMES
operational services. Remote Sensing of Environment, 120, pp.25-36.
Dymond, C.C., Mladenoff, D.J. and Radeloff, V.C., 2002. Phenological differences in Tasseled Cap indices
improve deciduous forest classification. Remote Sensing of Environment, 80(3), pp.460-472.
Engler, R., Waser, L.T., Zimmermann, N.E., Schaub, M., Berdos, S., Ginzler, C. and Psomas, A., 2013. Combining
ensemble modeling and remote sensing for mapping individual tree species at high spatial
resolution. Forest Ecology and Management, 310, pp.64-73.
European Space Agency. 2017. Sentinel-2 - Missions - Sentinel Online. [ONLINE] Available
at: https://sentinel.esa.int/web/sentinel/missions/sentinel-2. [Accessed 23 May 2017].
Evans, J.S., Murphy, M.A., Holden, Z.A. and Cushman, S.A., 2011. Modeling species distribution and change
using random forest. In Predictive species and habitat modeling in landscape ecology (pp. 139-159). Springer
New York.
Fassnacht, F.E., Latifi, H., Stereńczak, K., Modzelewska, A., Lefsky, M., Waser, L.T., Straub, C. and Ghosh, A.,
2016. Review of studies on tree species classification from remotely sensed data. Remote Sensing of
Environment, 186, pp.64-87.
Fassnacht, F.E., Neumann, C., Förster, M., Buddenbaum, H., Ghosh, A., Clasen, A., Joshi, P.K. and Koch, B., 2014.
Comparison of feature reduction algorithms for classifying tree species with hyperspectral data on three
central European test sites. IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing, 7(6), pp.2547-2561.
Feller, W., 1968. An introduction to probability theory and its applications: volume I (Vol. 3). New York: John Wiley &
Sons.
Fernández-Delgado, M., Cernadas, E., Barro, S. and Amorim, D., 2014. Do we need hundreds of classifiers to
solve real world classification problems. J. Mach. Learn. Res, 15(1), pp.3133-3181.
Foody, G.M., 1995. Land cover classification by an artificial neural network with ancillary
information. International Journal of Geographical Information Systems, 9(5), pp.527-542.
Foody, G.M., 2002. Status of land cover classification accuracy assessment. Remote sensing of environment, 80(1),
pp.185-201.
Foody, G.M. and Arora, M.K., 1997. An evaluation of some factors affecting the accuracy of classification by an
artificial neural network. International Journal of Remote Sensing, 18(4), pp.799-810.
Franklin, J., 1998. Predicting the distribution of shrub species in southern California from climate and terrain‐
derived variables. Journal of Vegetation Science, 9(5), pp.733-748.
Freund, Y. and Schapire, R.E., 1996, July. Experiments with a new boosting algorithm. In icml (Vol. 96, pp. 148-
156).
Friedl, M.A. and Brodley, C.E., 1997. Decision tree classification of land cover from remotely sensed data. Remote
sensing of environment, 61(3), pp.399-409.
Genuer, R., Poggi, J.M. and Tuleau-Malot, C., 2010. Variable selection using random forests. Pattern Recognition
Letters, 31(14), pp.2225-2236.
Ghimire, B., Rogan, J. and Miller, J., 2010. Contextual land-cover classification: incorporating spatial dependence
in land-cover classification models using random forests and the Getis statistic. Remote Sensing Letters, 1(1),
pp.45-54.
Gislason, P.O., Benediktsson, J.A. and Sveinsson, J.R., 2004, September. Random forest classification of
multisource remote sensing and geographic data. In Geoscience and Remote Sensing Symposium, 2004.
IGARSS'04. Proceedings. 2004 IEEE International (Vol. 2, pp. 1049-1052). IEEE.
Guo, L., Chehata, N., Mallet, C. and Boukir, S., 2011. Relevance of airborne lidar and multispectral image data for
urban scene classification using Random Forests. ISPRS Journal of Photogrammetry and Remote Sensing, 66(1),
pp.56-66.
Guyon, I., Weston, J., Barnhill, S. and Vapnik, V., 2002. Gene selection for cancer classification using support
vector machines. Machine learning, 46(1), pp.389-422.
Hansen, L.K. and Salamon, P., 1990. Neural network ensembles. IEEE transactions on pattern analysis and machine
intelligence, 12(10), pp.993-1001.
Hansen, M., Dubayah, R. and DeFries, R., 1996. Classification trees: an alternative to traditional land cover
classifiers. International journal of remote sensing, 17(5), pp.1075-1081.
Hawotte, F., Radoux, J., Chomé, G. and Defourny, P., 2016. Assessment of Automated Snow Cover Detection at
High Solar Zenith Angles with PROBA-V. Remote Sensing, 8(9), p.699.
Heller, R.C., Doverspike, G.E. and Aldrich, R.C., 1964. Identification of tree species on large-scale panchromatic and
color aerial photographs. US Department of Agriculture, Forest Service.
Hill, R.A., Wilson, A.K., George, M. and Hinsley, S.A., 2010. Mapping tree species in temperate deciduous
woodland using time‐series multi‐spectral data. Applied Vegetation Science, 13(1), pp.86-99.
Ho, T.K., Hull, J.J. and Srihari, S.N., 1990, June. Combination of structural classifiers. In Proc. IAPR Workshop
Syntatic and Structural Pattern Recog (pp. 123-137).
Horler, D.N.H., Dockray, M. and Barber, J., 1983. The red edge of plant leaf reflectance. International Journal of
Remote Sensing, 4(2), pp.273-288.
Huang, C., Davis, L.S. and Townshend, J.R.G., 2002. An assessment of support vector machines for land cover
classification. International Journal of remote sensing, 23(4), pp.725-749.
Hughes, G., 1968. On the mean accuracy of statistical pattern recognizers. IEEE transactions on information
theory, 14(1), pp.55-63.
Immitzer, M., Atzberger, C. and Koukal, T., 2012. Tree species classification with random forest using very high
spatial resolution 8-band WorldView-2 satellite data. Remote Sensing, 4(9), pp.2661-2693.
Immitzer, M., Vuolo, F. and Atzberger, C., 2016. First experience with Sentinel-2 data for crop and tree species
classifications in central Europe. Remote Sensing, 8(3), p.166.
Jensen, J.R., 2005. Introductory digital image processing: a remote sensing perspective (No. Ed. 3). Prentice-Hall Inc.
Upper Saddle River, NJ.
Jensen, R.R., Hardin, P.J. and Hardin, A.J., 2012. Classification of urban tree species
using hyperspectral imagery. Geocarto International, 27(5), pp.443-458.
Jones, H.G. and Vaughan, R.A., 2010. Remote sensing of vegetation: principles, techniques, and applications. Oxford:
Oxford University Press.
Key, T., Warner, T.A., McGraw, J.B. and Fajvan, M.A., 2001. A comparison of multispectral and multitemporal
information in high spatial resolution imagery for classification of individual tree species in a temperate
hardwood forest. Remote Sensing of Environment, 75(1), pp.100-112.
Kleinberg, E.M., 1990. Stochastic discrimination. Annals of Mathematics and Artificial intelligence, 1(1), pp.207-239.
Kleiner, A., Talwalkar, A., Sarkar, P. and Jordan, M.I., 2014. A scalable bootstrap for massive data. Journal of the
Royal Statistical Society: Series B (Statistical Methodology), 76(4), pp.795-816.
Kohavi, R., 1995, August. A study of cross-validation and bootstrap for accuracy estimation and model selection.
In Ijcai (Vol. 14, No. 2, pp. 1137-1145).
Kotsiantis, S. and Pintelas, P., 2004. Combining bagging and boosting. International Journal of Computational
Intelligence, 1(4), pp.324-333.
KSLA (The Royal Swedish Academy of Agriculture and Forestry), 2015. Forests and forestry in Sweden.
GeoJournal doi: 10.1007/BF00578267.
Laurin, G.V., Puletti, N., Hawthorne, W., Liesenberg, V., Corona, P., Papale, D., Chen, Q. and Valentini, R., 2016.
Discrimination of tropical forest types, dominant species, and mapping of functional guilds by
hyperspectral and simulated multispectral Sentinel-2 data. Remote Sensing of Environment, 176, pp.163-176.
Lawrence, R.L., Wood, S.D. and Sheley, R.L., 2006. Mapping invasive plants using hyperspectral imagery and
Breiman Cutler classifications (RandomForest). Remote Sensing of Environment, 100(3), pp.356-362.
Lee, S.S. and Elder, J.F., 1997. Bundling heterogeneous classifiers with advisor perceptrons. White Paper.
Lippitt, C.D., Rogan, J., Li, Z., Eastman, J.R. and Jones, T.G., 2008. Mapping Selective Logging in Mixed
Deciduous Forest. Photogrammetric Engineering & Remote Sensing, 74(10), pp.1201-1211.
Lu, D., Li, G., Kuang, W. and Moran, E., 2014. Methods to extract impervious surface areas from satellite images.
International Journal of Digital Earth, 7(2), pp.93-112.
Lunetta, R.S. and Balogh, M.E., 1999. Application of multi-temporal Landsat 5 TM imagery for wetland
identification. Photogrammetric Engineering and Remote Sensing, 65(11), pp.1303-1310.
Manyara, C.G. and Lein, J.K., 1994. Exploring the Suitability of Fuzzy Set Theory in Image Classification: A
Comparative Study Applied to the Mau Forest Area Kenya. In Proceedings of American Society for
Photogrammetry and Remote Sensing/American Congress on Surveying and Mapping, Annual Convention (pp.
384-391).
Mayer, B.A. and Kylling, A., 2005. The libRadtran software package for radiative transfer calculations-description
and examples of use. Atmospheric Chemistry and Physics, 5(7), pp.1855-1877.
Melgani, F. and Bruzzone, L., 2004. Classification of hyperspectral remote sensing images with support vector
machines. IEEE Transactions on geoscience and remote sensing, 42(8), pp.1778-1790.
Michie, D., Spiegelhalter, D.J. and Taylor, C.C., 1994. Machine learning, neural and statistical classification.
Mickelson, J.G., Civco, D.L. and Silander, J.A., 1998. Delineating forest canopy species in the northeastern United
States using multi-temporal TM imagery. Photogrammetric engineering and remote sensing, 64, pp.891-904.
Naidoo, L., Cho, M.A., Mathieu, R. and Asner, G., 2012. Classification of savanna tree species, in the Greater
Kruger National Park region, by integrating hyperspectral and LiDAR data in a Random Forest data
mining environment. ISPRS Journal of Photogrammetry and Remote Sensing, 69, pp.167-179.
Nordkvist, K., Granholm, A.H., Holmgren, J., Olsson, H. and Nilsson, M., 2012. Combining optical satellite data
and airborne laser scanner data for vegetation classification. Remote sensing letters, 3(5), pp.393-401.
Oetter, D.R., Cohen, W.B., Berterretche, M., Maiersperger, T.K. and Kennedy, R.E., 2001. Land cover mapping in
an agricultural setting using multiseasonal Thematic Mapper data. Remote Sensing of Environment, 76(2),
pp.139-155.
Pal, M., 2005. Random forest classifier for remote sensing classification. International Journal of Remote
Sensing, 26(1), pp.217-222.
Pal, M. and Mather, P.M., 2003. An assessment of the effectiveness of decision tree methods for land cover
classification. Remote sensing of environment, 86(4), pp.554-565.
Pant, P., Heikkinen, V., Hovi, A., Korpela, I., Hauta-Kasari, M. and Tokola, T., 2013. Evaluation of simulated
bands in airborne optical sensors for tree species identification. Remote Sensing of Environment, 138, pp.27-37.
Lenney, M.P., Woodcock, C.E., Collins, J.B. and Hamdi, H., 1996. The status of agricultural lands in Egypt: the use
of multitemporal NDVI features derived from Landsat TM. Remote Sensing of Environment, 56(1), pp.8-20.
Pax-Lenney, M. and Woodcock, C.E., 1997. Monitoring agricultural lands in Egypt with multitemporal Landsat
TM imagery: How many images are needed?. Remote Sensing of Environment, 59(3), pp.522-529.
Pontius Jr, R.G. and Millones, M., 2011. Death to Kappa: birth of quantity disagreement and allocation
disagreement for accuracy assessment. International Journal of Remote Sensing, 32(15), pp.4407-4429.
Prasad, A.M., Iverson, L.R. and Liaw, A., 2006. Newer classification and regression tree techniques: bagging and
random forests for ecological prediction. Ecosystems, 9(2), pp.181-199.
Pu, R. and Liu, D., 2011. Segmented canonical discriminant analysis of in situ hyperspectral data for identifying
13 urban tree species. International Journal of Remote Sensing, 32(8), pp.2207-2226.
Reese, H., 2011. Classification of Sweden’s forest and alpine vegetation using optical satellite and inventory data (Vol.
2011, No. 86).
Richter, R. and Schlaepfer, D., 2011. Atmospheric/topographic correction for satellite imagery: ATCOR-2/3 User
Guide Vers. 8.0.2. DLR - German Aerospace Center, Remote Sensing Data Center.
Roberts, J.J., Best, B.D., Dunn, D.C., Treml, E.A. and Halpin, P.N., 2010. Marine Geospatial Ecology Tools: An
integrated framework for ecological geoprocessing with ArcGIS, Python, R, MATLAB, and
C++. Environmental Modelling & Software, 25(10), pp.1197-1207.
Rodriguez-Galiano, V.F., Ghimire, B., Rogan, J., Chica-Olmo, M. and Rigol-Sanchez, J.P., 2012. An assessment of
the effectiveness of a random forest classifier for land-cover classification. ISPRS Journal of Photogrammetry
and Remote Sensing, 67, pp.93-104.
Rogan, J., Franklin, J., Stow, D., Miller, J., Woodcock, C. and Roberts, D., 2008. Mapping land-cover modifications
over large areas: A comparison of machine learning algorithms. Remote Sensing of Environment, 112(5),
pp.2272-2283.
Rogan, J., Miller, J., Stow, D., Franklin, J., Levien, L. and Fischer, C., 2003. Land-cover change monitoring with
classification trees using Landsat TM and ancillary data. Photogrammetric Engineering & Remote
Sensing, 69(7), pp.793-804.
Schriever, J.R. and Congalton, R.G., 1995. Evaluating seasonal variability as an aid to cover-type mapping from
Landsat Thematic Mapper data in the Northeast. Photogrammetric Engineering and Remote Sensing, 61(3),
pp.321-327.
Scornet, E., Biau, G. and Vert, J.P., 2015. Consistency of random forests. The Annals of Statistics, 43(4), pp.1716-1741.
Seni, G. and Elder, J.F., 2010. Ensemble methods in data mining: improving accuracy through combining
predictions. Synthesis Lectures on Data Mining and Knowledge Discovery, 2(1), pp.1-126.
Singh, A., 1989. Review article: Digital change detection techniques using remotely-sensed data. International
journal of remote sensing, 10(6), pp.989-1003.
Sivapalan, M., Beven, K. and Wood, E.F., 1987. On hydrologic similarity: 2. A scaled model of storm runoff
production. Water Resources Research, 23(12), pp.2266-2278.
Swedish Meteorological and Hydrological Institute. 2017. SMHI Öppna Data - Meteorologiska observationer.
[ONLINE] Available at: http://opendata-download-metobs.smhi.se/explore/#. [Accessed 25 May 2017].
Snedecor, G.W. and Cochran, W.G., 1968. Statistical methods. The Iowa State University Press, Iowa, USA, pp.1-435.
Snee, R.D., 1977. Validation of regression models: methods and examples. Technometrics, 19(4), pp.415-428.
Strobl, C., Boulesteix, A.L., Zeileis, A. and Hothorn, T., 2007. Bias in random forest variable importance measures:
Illustrations, sources and a solution. BMC bioinformatics, 8(1), p.25.
Strobl, C., Boulesteix, A.L., Kneib, T., Augustin, T. and Zeileis, A., 2008. Conditional variable importance for
random forests. BMC bioinformatics, 9(1), p.307.
Treitz, P.M., Howarth, P.J., Suffling, R.C. and Smith, P., 1992. Application of detailed ground information to
vegetation mapping with high spatial resolution digital imagery. Remote Sensing of Environment, 42(1),
pp.65-82.
Turner, M.G., 1989. Landscape ecology: the effect of pattern on process. Annual review of ecology and
systematics, 20(1), pp.171-197.
Verrelst, J., Rivera, J.P., Veroustraete, F., Muñoz-Marí, J., Clevers, J.G., Camps-Valls, G. and Moreno, J., 2015.
Experimental Sentinel-2 LAI estimation using parametric, non-parametric and physical retrieval methods–
A comparison. ISPRS Journal of Photogrammetry and Remote Sensing, 108, pp.260-272.
Wager, S., Hastie, T. and Efron, B., 2014. Confidence intervals for random forests: the jackknife and the
infinitesimal jackknife. Journal of Machine Learning Research, 15(1), pp.1625-1651.
Waske, B. and Braun, M., 2009. Classifier ensembles for land cover mapping using multitemporal SAR
imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 64(5), pp.450-457.
Wolter, P.T., Mladenoff, D.J., Host, G.E. and Crow, T.R., 1995. Improved forest classification in the Northern Lake
States using multi-temporal Landsat imagery. Photogrammetric Engineering & Remote Sensing, 61(9),
pp.1129-1143.
Wulder, M.A. and Coops, N.C., 2014. Make Earth observations open access. Nature, 513(7516), p.30.
Yuan, F., Sawaya, K.E., Loeffelholz, B.C. and Bauer, M.E., 2005. Land cover classification and change analysis of
the Twin Cities (Minnesota) Metropolitan Area by multitemporal Landsat remote sensing. Remote sensing of
Environment, 98(2), pp.317-328.
Zimmermann, N.E., Edwards, T.C., Moisen, G.G., Frescino, T.S. and Blackard, J.A., 2007. Remote sensing‐based
predictors improve distribution models of rare, early successional and broadleaf tree species in
Utah. Journal of applied ecology, 44(5), pp.1057-1067.
6. Appendix
Table A1. Data inputs.
Name | Description (resolution/MMU) | Date of production/acquisition | Source
Ekerö prototype biotope database | 0.1 – 0.25 ha | 2016 | © Stockholm University and Metria
DEM | 2 m | 2009 | © Lantmäteriet
Sentinel-2 imagery, level 2A | 10 – 20 m | 2016 | © Copernicus Sentinel data [2016], ESA
Soil Topographic Wetness Index | 10 m | 2016 | © Metria
CIR orthophoto | 0.25 m | 2015 | © Lantmäteriet