
Master’s thesis

Physical Geography and Quaternary Geology, 60 Credits

Department of Physical Geography

Evaluating Multitemporal Sentinel-2 data for Forest

Mapping using Random Forest

Marc Nelson

NKA 189

2017

Preface

This Master’s thesis is Marc Nelson’s degree project in Physical Geography and Quaternary

Geology at the Department of Physical Geography, Stockholm University. The Master’s

thesis comprises 60 credits (two terms of full-time studies).

Cooperation with Metria.

Supervisors have been Helle Skånes and Marika Wennbom at the Department of Physical

Geography, Stockholm University. Examiner has been Ian Brown at the Department of

Physical Geography, Stockholm University.

The author is responsible for the contents of this thesis.

Stockholm, 4 September 2017

Steffen Holzkämper

Director of studies


Abstract

The mapping of land cover using remotely sensed data is most effective when a robust

classification method is employed. Random forest is a modern machine learning algorithm

that has recently gained interest in the field of remote sensing due to its non-parametric

nature, which may be better suited to handle complex, high-dimensional data than

conventional techniques. In this study, the random forest method is applied to remote

sensing data from the European Space Agency’s new Sentinel-2 satellite program, which was

launched in 2015 yet remains relatively untested in scientific literature using non-simulated

data. In a study site of boreo-nemoral forest in Ekerö municipality, Sweden, a classification is

performed for six forest classes based on CadasterENV Sweden, a multi-purpose land cover

mapping and change monitoring program. The performance of Sentinel-2’s Multi-Spectral

Imager is investigated in the context of time series to capture phenological conditions,

optimal band combinations, as well as the influence of sample size and ancillary inputs.

Using two images from spring and summer of 2016, an overall map accuracy of 86.0% was

achieved. The red edge, shortwave infrared, and visible red bands were confirmed to be of

high value. Important factors contributing to the result include the timing of image

acquisition, use of a feature reduction approach to decrease the correlation between spectral

channels, and the addition of ancillary data that combines topographic and edaphic

information. The results suggest that random forest is an effective classification technique

that is particularly well suited to high-dimensional remote sensing data.


Table of contents

Abstract
Table of contents
List of Tables and figures
1. Introduction
2. Background
2.1 CadasterENV and reference data
2.2 Machine learning and Random Forest classification
2.3 Accuracy Assessment
3. Methods
3.1 Study area
3.2 Satellite and ancillary data
3.3 Implementation of random forest method
3.4 Multitemporal imagery
3.5 Ancillary data
3.6 Sample size
3.7 Optimal band combinations and variable importance
3.8 Optimal random forest parameters
3.9 Validation
4. Results
4.1 Multitemporal imagery
4.2 The influence of ancillary data on random forest classification
4.3 The influence of sample size on classification accuracy
4.4 Band combinations and variable importance
4.5 Optimal random forest parameters
4.6 Final result: random forest classification map
5. Discussion
5.1 Multitemporal imagery
5.2 Ancillary data
5.3 Sample size
5.4 Band combinations and variable importance
5.5 Optimal random forest parameters
6. Conclusion
7. References
8. Appendix


List of Tables and figures

Table 1: CadasterENV forest class definitions
Table 2: Correlation coefficients matrix for selected Sentinel-2 bands, July 21 image
Table 3: Confusion matrix for random forest classification
Table A1: Data inputs
Figure 1: Sentinel-2 MSI bands vs. spatial resolution
Figure 2: Visualization of random forest classification
Figure 3: (a) Location of Sentinel-2 tile 33VXF (outlined in red), (b) location of Ekerö municipality (outlined in green)
Figure 4: Mean spectral signatures (TOA reflectance) of CadasterENV classes for selected Sentinel-2 bands derived from reference data
Figure 5: Random forest method flowchart
Figure 6: Overall accuracies for various Sentinel-2 scene image combinations
Figure 7: Sample size vs overall accuracy, where training samples are randomly removed from the model
Figure 8: CadasterENV map produced using random forest classification, Ekerö municipality
Figure 9: (a) Example portion of CadasterENV map produced using random forest classification, (b) 2015 CIR orthophoto for comparison
Figure 10: Class distribution (% forest area) among forested areas in Ekerö municipality


1. Introduction

Land cover mapping plays an essential role in various applications relating to land

management and conservation, particularly with regard to predicting the geographic

distribution and biophysical dynamics of natural and agricultural areas, providing vital

information to a wide range of applied research (Ban et al., 2015; Baret and Buis, 2008; Foody,

2002). Remote sensing methods have become integral in the field of Landscape Ecology,

where environmental patterns influence ecological processes and shape the interactions

between organisms and their environment. In this context, landscape can be defined as an

area of land consisting of a mosaic of different habitats, with interactions between Earth’s

surface/atmosphere and the organisms that inhabit it, the scale of which is dependent on the

target organism(s) of interest (Turner, 1989). Earth observation missions via remote sensing

provide access to spatial and temporal realms of landscape observation unavailable to

researchers restricted to field investigation.

Machine learning techniques have become increasingly popular for a range of applications

relating to the classification of data across the scientific community and notably within a

remote sensing context, due in part to the evolving nature of satellite datasets, which have

become progressively larger and more complex over time. A wide variety of classification

methods are available, and the determination of which algorithms are best-suited to specific

applications has been of great interest to researchers (Elder and Lee, 1997; Michie et al., 1994).

Recently, there has been a focus on aggregating classification models into “ensembles”, due

to their ability to further improve the accuracy of well-established classification algorithms.

The ensemble phenomenon was noticed independently by several researchers who were

investigating methods ranging from decision trees (Ho et al., 1990) and mathematical theory

(Kleinberg, 1990) to neural networks (Hansen and Salamon, 1990). Ensemble methods were further

developed via the bootstrap aggregation technique of Breiman (1996) and the adaptive

boosting methods of Freund and Schapire (1996). These techniques were soon adopted by the

remote sensing community for use in classifying land cover using satellite imagery (Pal,

2005; Gislason, Benediktsson, and Sveinsson, 2006).

The European Space Agency’s Sentinel-2 satellites were designed to improve upon the

technology and experience drawn from the SPOT and Landsat missions of previous decades

(Drusch et al., 2012). The Sentinel-2 mission comprises a pair of polar orbiting satellites

carrying a Multi-Spectral Imager (MSI) sensor with a spatial resolution of 10 meters in the

visible and near infrared bands, 20 meters in 3 Red-edge bands and 2 shortwave infrared

bands, and 60 meters in 3 atmospheric bands (Figure 1). While not a hyperspectral system,

Sentinel-2’s innovative sensors allow for more spectral continuity than preceding multi-

spectral systems aboard the Landsat 8 and SPOT6/7 satellites, earning it the informal

designation as a “super-spectral” imager (Verrelst et al., 2015). Particularly promising are the

red edge bands, due to their capacity to detect slight differences in chlorophyll content,


making Sentinel-2 theoretically well-suited for forestry applications (Horler, Dockray, and

Barber, 1983; Laurin et al., 2016).

Figure 1. Sentinel-2 MSI bands vs. spatial resolution. The aerosol, water vapor, and cirrus bands (B1, B9, B10) were not used in this study (ESA, 2017).

Other key features of the Sentinel-2 mission include the MSI’s wide swath (290 km) and its

low revisit time. Sentinel-2A was launched in June 2015 and has been providing images of

Earth’s surface, on a global scale, at 10-day intervals at the equator. Sentinel-2B was launched

into the same orbit, phased 180° apart from Sentinel-2A, on March 7, 2017, bringing the

temporal resolution of the Sentinel-2 mission down to 5 days at the equator, and 2-3 days at

mid-latitudes (ESA, 2017). The high temporal resolution enables unprecedented detail in the

monitoring of vegetation phenology and biotic/abiotic land surface changes for a freely-

available, non-commercial system (Banskota et al., 2014; Wulder and Coops, 2014). To the

best knowledge of the author, only one study evaluating Sentinel-2’s spectral bands in the

context of forest classification using non-simulated data currently exists, the work of

Immitzer, Vuolo, and Atzberger (2016).

The classification of heterogeneous landscapes with low inter-class and high intra-class

spectral variability poses a challenge in remote sensing research (Ghimire et al., 2010).

Several studies have shown that the use of multi-seasonal acquisitions can help increase

spectral separation between land cover types, though the datasets have a higher degree of

complexity (Lunetta and Balogh, 1999; Oetter et al., 2001; Waske and Braun, 2009; Wolter et

al., 1995; Yuan et al., 2005). With such high-dimensional, multitemporal satellite imagery,

machine learning algorithms have demonstrated superior accuracy and efficiency in the

classification of data over conventional parametric techniques such as the widely-used

maximum likelihood classification method (Hansen et al., 1996; Huang et al., 2002; Rogan et

al., 2003).


A simple, robust classification method that is able to handle noise, complex measurement

spaces, and a limited number of reference samples in relation to the size of the study area

would be of great value to the field of remote sensing (DeFries and Chan, 2000; Rogan et al.,

2008). As the Sentinel-2 mission is relatively new and thus the available literature on the

performance of its spectral channels in the context of forest classification is limited, this study

seeks to contribute to the state-of-the-art by investigating the following research questions:

(1) How can Sentinel-2 be best utilized for future land cover classification studies, at a

regional to national scale? (2) How do band combinations, time series, ancillary data, and

sample size affect the map accuracy, and how can these variables be adjusted to produce an

optimal result? (3) How effective are machine learning techniques in the context of remote

sensing?

2. Background

2.1 CadasterENV and reference data

CadasterENV Sweden is a multi-purpose land cover mapping and change monitoring

program with the goal of building a homogeneous, nationwide land cover database. Funded

by the European Space Agency, the primary users include the Swedish Environmental

Protection Agency, Swedish Forest Agency, County Administration Boards of Sweden, the

Board of Agriculture, Statistics Sweden, and the Cadaster and Land Registration Authority.

It is currently under development, with preliminary versions of 3 different test plots based

on SPOT-5 and Pleiades satellite data combined with ancillary information including LiDAR-

derived metrics, with classifications performed using the Maximum Likelihood classification

method (MLC) in ERDAS IMAGINE software. CadasterENV was designed to maintain full

compatibility regarding thematic details with the older KNAS (Kontinuerlig

Naturtypskartering Av Skyddade områden) land cover database, which has been used since

2002 to communicate Sweden’s land cover statistics both domestically and internationally.

The most recent version of CadasterENV uses the Maximum Likelihood classification

technique with new Sentinel-2 imagery.

The CadasterENV thematic classes are defined in Table 1. Due to a lack of

distinct samples in the reference data and its similarity to class 1.1.6, the deciduous forest

with hardwood class (1.1.7) was grouped with deciduous hardwood forest, forming a single

class for deciduous forest containing hardwoods, labeled 1.1.6 for simplicity.


Table 1. CadasterENV forest class definitions. Class 1.1.7 was grouped with 1.1.6 for analysis in this study.

Code  | Name                                | Definition
1.1.1 | Pine forest                         | >70% CC = pine
1.1.2 | Spruce forest                       | >70% CC = spruce
1.1.3 | Mixed coniferous forest             | >70% CC = pine or spruce, neither having >70% CC
1.1.4 | Mixed coniferous / deciduous forest | Neither coniferous nor deciduous forest has >70% CC
1.1.5 | Deciduous forest                    | >70% CC = deciduous (non-hardwood)
1.1.6 | Deciduous hardwood forest           | >70% CC = deciduous, where >50% = hardwood
1.1.7 | Deciduous forest with hardwood      | >70% CC = deciduous, where 20%-50% = hardwood
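The definitions in Table 1 can be read as a decision rule on cover fractions. The following is a minimal sketch in Python, not the procedure used by CadasterENV. The function and argument names are hypothetical, CC is read here as crown cover, the hardwood percentage is assumed to be measured relative to the deciduous cover, and deciduous stands with less than 20% hardwood are assumed to fall in class 1.1.5.

```python
def cadasterenv_forest_class(pine, spruce, deciduous, hardwood_share=0.0):
    """Return a CadasterENV forest class code from cover fractions (0-1).

    pine, spruce, deciduous: fractions of the total cover (assumed to sum to 1).
    hardwood_share: assumed to be the hardwood fraction of the deciduous cover.
    """
    if pine > 0.70:
        return "1.1.1"  # Pine forest
    if spruce > 0.70:
        return "1.1.2"  # Spruce forest
    if pine + spruce > 0.70:
        return "1.1.3"  # Mixed coniferous forest (neither species alone >70%)
    if deciduous > 0.70:
        if hardwood_share > 0.50:
            return "1.1.6"  # Deciduous hardwood forest
        if hardwood_share >= 0.20:
            return "1.1.7"  # Deciduous forest with hardwood (merged into 1.1.6 in this study)
        return "1.1.5"  # Deciduous forest (non-hardwood)
    return "1.1.4"  # Mixed coniferous / deciduous forest


# Illustrative values only, not taken from the reference data.
example_class = cadasterenv_forest_class(pine=0.4, spruce=0.4, deciduous=0.2)
```

Under these assumptions, a stand with 40% pine, 40% spruce and 20% deciduous cover is assigned to class 1.1.3, since conifers together exceed 70% but neither species does so alone.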

The forest classes defined for the CadasterENV product relate to the user requirements of the

various national agencies and institutions that intend to use the map product for their

specific operational needs, particularly the Swedish Environmental Protection Agency.

CadasterENV’s mixture of both single-species classes (e.g. Pine forest) with classes that can

contain multiple species (e.g. Deciduous forest) represents a departure from common

practices regarding the categorization of land cover components (Anderson, 1976; Fassnacht

et al., 2016). Fassnacht et al. (2016) state that any species level classification should by

definition be for single trees or stands consisting only of a single species, and approaches

focusing on larger spatial units that do not allow for the separation of individual trees are

only appropriate for classification of whole forest stands or to assess species mixtures. This is

the case when using Sentinel-2’s resampled 10 meter pixel resolution to map Swedish forest

types, where the areal extent of tree crowns of primary species (apart from oak) does not

typically reach 10 m² and the data can thus be considered inappropriate for species level classification

at the individual tree scale. Furthermore, pixel-based classification of forest groups

containing mixed species, where the spatial resolution of the sensor is too low to distinguish

individual tree crowns, is an approach that has been avoided due to interactions between

background signature and intra-species variability. This has been shown to obfuscate

classifications of stands comprising a single species, especially when the geographic extent

involves small scales over large areas. It is in these cases where the co-occurrence of same-

species stands but with dissimilar canopy closure and background signal can introduce

classification problems due to the reduction of statistical separability of different classes in

the spectral space (Carleer and Wolff, 2004; Cushnie, 1987).

It is important to note that CadasterENV is affected by this problem of classifying mixed

classes at the pixel level, but it is ultimately the end user who decides the data structure that

is best suited for their analytical or operational needs (Fassnacht et al., 2016; Foody, 2002).

This demonstrates how the remote sensing perspective, delineating forest types in terms of

percentages of species-specific crown cover present in the upper canopy layer, often doesn’t

align with standard definitions in traditional applications where forest stand composition

and species mixture is often characterized as species-specific basal area or standing timber

volume measurements (Fassnacht et al., 2016). Since most CadasterENV applications involve


the sustainable management of forest resources, it follows that thematic classes may reflect

this delineation bias.

2.2 Machine learning and Random Forest classification

Machine learning is a term used to describe a system of automatic learning via the

generalization of information. Machine learning algorithms have emerged as a superior

method for classification of complex data, with applications ranging from large scale

association studies for genetic diseases to air quality prediction to growth models for

agricultural crop yield estimation (Scornet, Biau, Vert, 2015). They are particularly well

suited to models with high data dimensionality, such as multitemporal spectral data in

remote sensing studies, due to their non-parametric nature (Hansen et al., 1996; Huang et al.,

2002; Rogan et al., 2003). Non-parametric classifiers differ from conventional parametric

classification methods such as the widely-used maximum likelihood classification (MLC) in

that they do not rely on assumptions of data distribution e.g. normality, a function of mean

and variance, and are generally more accurate than parametric techniques (Foody, 1995;

Friedl and Brodley, 1997). Breiman (2001) notes that data in ecological models often do not

conform to assumptions of independence, homoscedasticity, and multivariate normality.

Another drawback of parametric classifiers such as MLC is their sensitivity to the

Hughes phenomenon, where with a fixed number of training samples, predictive power

decreases as the number of input variables increases (Dalponte et al., 2013; Hughes, 1968).

Waske and Braun (2009) suggested that due to the relative independence of the spectral

content of satellite acquisitions from different dates, a method based on random selection of

input features is well-suited to analyses employing multitemporal imagery. Immitzer et al.

(2012) and Pant et al. (2013) have also confirmed that a method which employs random

subset selection of the data performs well when using mixed sets of inputs to classify data, as

is the case with multitemporal satellite imagery.

Random forest is a type of non-parametric machine learning algorithm. This method has

been selected to perform a pixel-based classification for this study, due to its combination of

ease-of-use, robustness to noise, as well as its demonstrated performance in past remote

sensing studies (Pal, 2005; Rodriguez-Galiano et al., 2012). Additionally, random forest is

computationally light as well as being simple to set up and automate compared to other non-

parametric classifiers that have produced highly accurate results in land cover mapping,

such as Support Vector Machine (Atkinson and Tatnall, 1997; Fassnacht et al., 2016). Carreiras

et al. (2006) found that a decision tree based classifier ensemble for mapping agricultural and

pasture land using multitemporal SPOT4 imagery significantly outperformed several other

common approaches (MLC, k-nearest neighbor, and simple decision tree) when using the

same study area and reference data. Furthermore, the availability of implementations in

open-source environments such as R and Python has facilitated the proliferation of non-parametric

machine learning techniques.


One of the key advantages of random forest is its ability to exploit the strengths of a

group of individual classifiers while avoiding the weaknesses of any single classifier

(Ghimire et al., 2010; Kotsiantis and Pintelas, 2004). It does not classify data with the

assumption of normality, allowing more complex patterns and relationships to be identified.

A “forest” of binary decision trees is grown via bootstrap aggregation (also referred to as

“bagging”) of samples from an original dataset. At each node of a tree, a random subset of input variables is

considered as split candidates, and the best split is calculated within this

subset based on the Gini criterion, a measure of impurity (Breiman, 2001) (Figure 2a, 2b). The

random subset selection reduces the strength of any individual tree, but also decreases the

degree to which the trees are correlated. Because bootstrap samples are drawn with replacement, some training data may not be used at all in a given tree,

while others may be used multiple times. As a result, greater classifier accuracy and stability

are achieved, as the model remains robust to slight variations in the input data when

predictions are aggregated over hundreds or thousands of trees (Breiman, 2001; Rodriguez-Galiano et al.,

2012).
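The Gini-based split search over a random subset of input variables can be illustrated with a short sketch. This is not the implementation used in the study. The function names `gini` and `best_split`, the `mtry` argument, and the toy band values are assumptions made for illustration, and the exhaustive threshold search is the simplest possible strategy rather than an optimized one.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, mtry, rng):
    """Search a random subset of `mtry` features for the split with the
    lowest weighted Gini impurity.

    Returns (feature_index, threshold, impurity); feature_index is None
    when no split improves on the impurity of the unsplit node.
    """
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), mtry)
    best = (None, None, gini(labels))  # start from the unsplit node
    for j in candidates:
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue  # a split must send samples to both children
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if weighted < best[2]:
                best = (j, threshold, weighted)
    return best


# Toy data: two made-up "bands" per sample; values are illustrative only.
rows = [[0.10, 0.40], [0.12, 0.42], [0.30, 0.41], [0.33, 0.39]]
labels = ["pine", "pine", "deciduous", "deciduous"]
feature, threshold, impurity = best_split(rows, labels, mtry=2, rng=random.Random(0))
```

In this toy case the first band separates the two classes perfectly, so the search returns feature 0 with a weighted impurity of zero. With `mtry` smaller than the number of features, different nodes would see different candidate variables, which is what decorrelates the trees.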

The classification of new data occurs by taking the majority vote among the outcomes of all

decision trees constructed in the forest (Genuer, Poggi, Tuleau-Malot, 2010; Immitzer et al.,

2012) (Figure 2c). Similar studies have confirmed that random forest’s bootstrap aggregation

technique allows classification models to be less sensitive to noise, which may be defined as

errors in the data which are unrelated to target reflectance (Chan and Paelinckx, 2008; Pal

and Mather, 2003; Rodriguez-Galiano et al., 2012). Bagging also reduces sensitivity to

overfitting, where a model is too tailored to a specific dataset to a point which reduces its

ability to accurately generalize the underlying trends to unseen data.


Figure 2. Visualization of random forest classification. (a) Reference data (v) is labeled according to in-situ species composition. (b) A random forest model is fitted to this data set in the training step, where a ‘forest’ of binary classification trees, each using random subsets of input variables, is ‘grown’. (c) The forest is then used in the prediction of new data (Criminisi et al., 2011).

Breiman’s seminal paper on the random forest algorithm describes its practical

advantages: it runs efficiently on large databases, handles a large number of input

variables, and provides an internal unbiased estimate of the generalization error and of input

variable importance (Breiman, 2001). To estimate classification error, approximately 37% of

the original dataset is left out of the bootstrap sample at each bootstrap iteration; these samples are referred to

as the out-of-bag (OOB) data. Each decision tree is applied to the OOB samples that were not used to construct it,

and the predictions are compared to the true class labels to generate an unbiased

approximation of the model’s generalization error. Theoretically, this feature negates the

need for independent validation (Breiman, 2001; Lawrence et al., 2006; Prasad et al., 2006).

The OOB samples are also used, through random permutation of the input variables, to

calculate the mean decrease in accuracy (MDA) of a variable, or the degree to which the accuracy of

the model decreases when the values of a particular variable are randomly permuted (Breiman, 2001; Immitzer et al.,

2012). As a result, the importance of a model’s input variables can be ranked. This variable

importance metric is the difference between the misclassification rate of the

randomly permuted OOB data and that of the original OOB data for each variable, averaged over all trees and divided by the standard error. This feature

is particularly valuable in a remote sensing context due to the common research question of

identifying which satellite bands contribute most to successful classification, and to what

degree (Fassnacht et al., 2016). For these reasons, the random forest approach appears well-

suited to handle the influx of new remote sensing data, the size and dimensionality of which

continue to increase over time.
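The figure of roughly 37% for the out-of-bag share follows from sampling with replacement: the probability that a given sample is never drawn in n draws is (1 − 1/n)^n, which tends to 1/e ≈ 0.368 as n grows. The sketch below checks this numerically. It is an illustration only; the function name `oob_fraction`, the number of repetitions, and the random seed are arbitrary choices, and n is set to the 663 reference samples reported for this study purely as an example.

```python
import math
import random


def oob_fraction(n_samples, rng):
    """Draw one bootstrap sample (with replacement) of size n_samples and
    return the fraction of the original samples that were never drawn."""
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1.0 - len(drawn) / n_samples


rng = random.Random(42)
n = 663  # example size; any reasonably large n gives a similar result
repetitions = 500
simulated = sum(oob_fraction(n, rng) for _ in range(repetitions)) / repetitions
expected = (1.0 - 1.0 / n) ** n  # exact expectation for a single sample
limit = math.exp(-1)             # limiting value as n grows
```

Both the simulated mean and the closed-form expectation should lie close to the limiting value, which is why each tree has about a third of the reference data available as an internal test set.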

2.3 Accuracy Assessment

One of the most useful facets of the random forest algorithm is its ability to generate an

internal, unbiased estimate of classification error from the OOB dataset (Breiman, 2001).

While this is an attractive feature for purposes of estimation and testing model sensitivity,

most of the literature pertaining to random forest classification doesn’t rely on it exclusively

to report accuracy levels, with some exceptions (Naidoo et al., 2012). Indeed, Evans et al.

(2011) specifically recommend the inclusion of cross-validation in random forest studies,

even while acknowledging random forest’s general ability to produce reliable accuracy

estimates without it.

The kappa statistic has long been used as a measure of map agreement for classification

accuracy assessment. In recent years, however, researchers have demonstrated kappa’s

tendency to produce redundant or misleading information in certain circumstances. After

reviewing the findings of Pontius Jr and Millones (2011), it was decided that the kappa

statistic would be avoided in favor of cross validation and its derivatives: overall, user’s, and

producer’s accuracy. Overall accuracy is simply the percentage of samples correctly

classified from the entire sample dataset. Overall accuracy does not, however, reveal how

error is distributed between classes. User’s and producer’s accuracy are ways of representing

individual category accuracies, and are obtained by comparing the predicted data with the

field reference points. The difference between them lies in the perspectives of the map

producer in identifying the integrity of pixel classifications on the map, and the field user

determining whether the map classification is actually represented on the ground (Congalton

and Green, 1999).
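The three accuracy measures can be computed directly from a confusion matrix, as in the following sketch. The function name and the example counts are made up for illustration and are not results from this study. The matrix is assumed to be arranged with map (predicted) classes as rows and reference classes as columns, and every class is assumed to occur at least once in both.

```python
def accuracy_metrics(matrix):
    """Compute overall, user's and producer's accuracy from a square
    confusion matrix given as matrix[predicted][reference]."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    overall = correct / total
    # User's accuracy: correct / all samples mapped to the class (row total).
    users = [matrix[i][i] / sum(matrix[i]) for i in range(k)]
    # Producer's accuracy: correct / all reference samples of the class (column total).
    producers = [matrix[i][i] / sum(matrix[r][i] for r in range(k)) for i in range(k)]
    return overall, users, producers


# Made-up 3-class example (rows = map classes, columns = reference classes).
example = [
    [40, 5, 5],
    [3, 45, 2],
    [7, 10, 33],
]
overall, users, producers = accuracy_metrics(example)
```

In this example the second class has a user's accuracy of 45/50 but a producer's accuracy of only 45/60, showing how the two measures can diverge for the same class even when overall accuracy looks acceptable.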

3. Methods

3.1 Study area

The study area for this project is the municipality of Ekerö, a group of islands in Lake

Mälaren in southeast Sweden (Figure 3). With a land surface area of approximately 218 km²,

Ekerö consists of rural landscape dominated by boreo-nemoral forest, with elevations

between 0 and 82 meters above sea level (KSLA, 2015). The average annual temperature is

approximately 7.6 °C, and the mean annual precipitation is 531 mm (SMHI, 2017). Primary

tree species include Scots pine (Pinus sylvestris), Norway spruce (Picea abies), pedunculate

oak (Quercus robur), European aspen (Populus tremula), black alder (Alnus glutinosa), and


birch species (Betula sp.). Other species include Norway maple (Acer platanoides), European

ash (Fraxinus excelsior), small-leaved lime / Linden (Tilia cordata), and willow species (Salix

sp.).

Figure 3. (a) Location of Sentinel-2 tile 33VXF (outlined in red), (b) location of Ekerö municipality (outlined in green).

3.2 Satellite and ancillary data

The use of multitemporal imagery in the classification of land cover, which is characterized

not only by spatial patterns but also its temporal dynamics, has been shown to increase

spectral separability between land cover classes as it represents the phenological vegetation

condition (Lunetta and Balogh, 1999; Oetter et al., 2001; Wolter et al., 1995; Waske and Braun,

2009). A combination of acquisitions at different times of the year is preferred because it

captures seasonal variations relevant to the discrimination of surface types (Brisco and Brown,

1995; Pax-Lenney et al., 1996; Pax-Lenney and Woodcock, 1997; Schriever and Congalton,

1995).

A total of three Sentinel-2 images of the Stockholm/Ekerö region from 2016 were included for

use in this study (see Appendix). The dates May 2, July 21, and August 28 were selected as

they 1) contain negligible cloud and/or haze cover, and 2) occur as close to the vegetation

period (after leaf foliation and before defoliation of principal broadleaved tree species) as


possible. It should be noted that the May 2 image is likely somewhat too early in the season to be

an ideal springtime acquisition, with some of the principal tree species in Ekerö likely to not

yet have undergone leaf setting. Nonetheless, it was the best springtime scene available due

to heavy cloud/haze cover in the rest of the May – early June images. Outside of the

vegetation season, leaf defoliation causes the spectral information to relate more to the

understory vegetation layer than the canopy layer for broadleaved species (Jensen et al.,

2012).

The images were accessed in the Level 2A processing designation, which includes

radiometric and geometric corrections with ortho-rectification and spatial registration on a

global reference system with sub-pixel accuracy. Atmospheric correction is performed using

ESA’s S2AC algorithm, based on Atmospheric/Topographic Correction for Satellite Imagery

(ATCOR; Richter and Schlaepfer, 2011), which employs the LIBRADTRAN radiative transfer

model (Mayer and Kylling, 2005). Sentinel-2’s Level 2A product provides Bottom Of

Atmosphere (BOA) reflectance, which is derived from the associated Level 1C product and

processed using ESA’s Sentinel-2 Toolbox. Resampling is performed with a constant Ground

Sampling Distance of 10, 20, and 60 m depending on the spatial resolution of the different

spectral bands (ESA, 2017). All bands are resampled to 10 meter spatial resolution for use

within this project.


Figure 4. Mean spectral signatures (BOA reflectance) of CadasterENV classes for selected Sentinel-2 bands derived from reference data. (a) Mean values extracted from May 2 image, (b) from July 21 image.

Following Franklin (1998) and Rodriguez-Galiano et al. (2012), topographic variables

including the Digital Elevation Model (DEM) derivatives slope, aspect, and curvature at 2

meter spatial resolution were used as ancillary data (see Appendix). In addition, a modified

Topographic Wetness Index, a relative measure of moisture status (see Appendix), was

included (Buchanan et al., 2014). The modified index is a weighted combination

of two rasters: a depth-to-water raster containing soil

data, resampled from 2 to 10 meter spatial resolution (70% weight), and a standard Topographic Wetness

Index (TWI), derived from a DEM at 10 meter resolution (30% weight). The

combined product, called the Soil Topographic Index (STI), provides information on both

topographic and edaphic conditions, a proxy for soil transmissivity. Compared to a standard

TWI, the STI incorporates the role of soil conductivity, allowing for better predictions of

potentially saturated areas where soils are not uniform (Sivapalan and Wood, 1987).

Buchanan et al. (2014) found that a weighted modified STI correlated better with observed

soil moisture patterns than TWI (based solely on topography) alone.
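The weighted combination behind the STI can be sketched as a cell-by-cell sum of the two rasters. This is an illustration under stated assumptions, not the processing chain used in the study: the function name is hypothetical, the rasters are represented as nested lists, and both inputs are assumed to be already co-registered on the same 10 meter grid and rescaled to a comparable range (here 0 to 1), a step the text does not describe.

```python
def soil_topographic_index(dtw, twi, w_dtw=0.7, w_twi=0.3):
    """Weighted cell-by-cell combination of two co-registered rasters.

    dtw: depth-to-water raster, twi: topographic wetness index raster,
    both given as nested lists of equal shape with comparable value ranges.
    """
    return [
        [w_dtw * d + w_twi * t for d, t in zip(dtw_row, twi_row)]
        for dtw_row, twi_row in zip(dtw, twi)
    ]


# Toy 2 x 2 rasters with made-up, rescaled values.
dtw = [[0.0, 1.0], [0.5, 0.5]]
twi = [[1.0, 0.0], [0.5, 1.0]]
sti = soil_topographic_index(dtw, twi)
```

With the 70/30 weighting, a cell where only the depth-to-water value is high scores more than twice as much as a cell where only the TWI is high, which reflects the greater emphasis placed on soil information.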

[Figure 4 plot, panels (a) and (b). Legend: Pine, Spruce, Mixed coniferous, Mixed forest, Deciduous, Deciduous hardwood.]

3.3 Implementation of random forest method

The random forest process as implemented in this study is shown in Figure 5. Three Sentinel-

2 scenes of the Ekerö/Stockholm region (Sentinel-2 tile 33VXF) on the dates May 2, July 21,

and August 28, 2016, were downloaded in the level 2A processing designation. Spectral

information and ancillary data were extracted to field reference points, which had been

assigned a CadasterENV class and visually confirmed with an orthophoto. A random forest

model was then fitted to this data, and the OOB error estimate was used to iteratively

determine optimal input parameters in terms of band combinations, time series, ancillary

data, ntree/mtry parameters, and sample size. The fitted model with the lowest estimated

OOB error was used to predict a raster, the accuracy of which was verified via 10-fold cross

validation. This raster may be compared with an existing map product and/or used to determine

where to focus future field data collection, though this step was not undertaken in this

study due to the uncertainty of the existing CadasterENV product’s accuracy.

Reference data was collected by the municipality of Ekerö during the summer of 2016

(unpublished report). A total of 663 samples were collected based on a random stratified

sampling strategy of land cover types from an existing biotope database (see Appendix).

Each of these points was assigned a CadasterENV class based on the description of the site

species composition. The desired number of samples per class was then chosen following

previous studies employing the random forest algorithm where the training data contained

an equal number of samples per class to avoid feature bias (Guo et al., 2011; Rodriguez-Galiano

et al., 2012). Chen et al. (2004) reported that disproportionate representation of classes in the

training data of a random forest classifier resulted in biased classifications: the majority class

(the class with the greatest frequency of training samples) was over-represented in the bootstrap

samples, causing the minority class to be under-represented.

Figure 5. Flowchart, random forest method as implemented in this study. Optional further steps shown in grey (not performed in this study).

To avoid this problem, supplementary training data was added via 2D aerial photography

interpretation, an approach successfully employed in a random forest land cover classification context

by Immitzer et al. (2012). A color infrared orthophoto from 2015, 0.25 m spatial resolution,

provided by Lantmäteriet (Swedish Land Survey), was visually interpreted on screen in

ArcMap software version 10.3.1 to confirm the integrity of the existing reference data. The

original reference dataset was determined to contain 2 classes that had over 120 records, so a

sample upscaling method was chosen to balance the dataset. Supplementary records were

added using a random stratified approach based on the current version of the CadasterENV

product, resulting in a proportionate training set of 720 samples, or 120 samples per class.

This more than covers the general recommendation of having a minimum of 20 to 100

samples per class by Congalton and Green (2008) to account for intra-class spectral

variability. To determine the effect of sample size on overall map accuracy, an additional 30

samples were added to each class for a sample size accuracy assessment, though these extra

samples were not included for use in producing the final classification map.

3.4 Multitemporal imagery

To examine the effect of using multiple acquisition scenes, a random forest model was fitted

to 7 different time series combinations, using all available bands: May 2 single image, July 21

single image, August 28 single image, May 2 and July 21 combination, May 2 and August 28

combination, July 21 and August 28 combination, and May 2, July 21, and August 28

combination. The results were verified via 10-fold cross validation, and the resulting overall

accuracies were compared to determine the optimal time series combination.
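The seven combinations listed above are simply all non-empty subsets of the three acquisition dates. A minimal sketch of that enumeration follows; the variable names are illustrative only.

```python
from itertools import combinations

dates = ["May 2", "July 21", "August 28"]

# Every non-empty subset of the three acquisition dates.
date_sets = [
    combo
    for size in range(1, len(dates) + 1)
    for combo in combinations(dates, size)
]

for combo in date_sets:
    print(" + ".join(combo))
```

Each entry in `date_sets` would correspond to one fitted model, with the bands of the listed scenes stacked as predictors.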

3.5 Ancillary data

To test the potential impact of adding ancillary data, random forest models were fitted with

common topographical metrics, including slope, aspect, and curvature. In addition, a

modified Soil Topographic Index (STI) was tested. The effect of these inputs on estimated OOB error

was evaluated with both single-date spectral data and various band

combinations of multitemporal data. The results were verified using 10-fold cross validation,

and the resulting overall accuracies of the maps produced were used to determine which, if

any, ancillary data was useful to the model.

3.6 Sample size

As the collection of field data requires time and human labor, but is nonetheless essential in

validating the results of remote sensing studies, it is of great importance to determine the

number of samples needed to produce a desired classification result (Lippitt et al., 2008; Pal

and Mather, 2003; Rogan et al., 2003, 2008). To examine the effect of sample size on

classification accuracy, samples were randomly removed from the dataset and set aside for

testing classification accuracy. The following sample numbers per class were tested: 150

(baseline), 100, 50, 25, and 10. Because the samples randomly removed were set aside for

testing, 100 training samples meant 50 testing samples, 50 training samples meant 100 testing

samples, 25 training samples meant 125 testing samples, and 10 training samples meant 140

testing samples. The overall accuracies of maps produced from models fitted with the

various sample sizes were then compared to one another.
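The hold-out scheme described above can be sketched as follows. This is an illustrative reconstruction, not the procedure actually run in the study: the function `split_per_class`, the fixed seed, and the integer sample IDs are all assumptions.

```python
import random

def split_per_class(samples_by_class, n_train, seed=0):
    """Randomly keep n_train samples per class for training; the rest go to testing."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, samples in samples_by_class.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train[label] = shuffled[:n_train]
        test[label] = shuffled[n_train:]
    return train, test

# Hypothetical data: 150 sample IDs for each of two classes.
data = {"pine": list(range(150)), "spruce": list(range(150, 300))}
for n_train in (100, 50, 25, 10):
    train, test = split_per_class(data, n_train)
    print(n_train, len(test["pine"]))
```

With 150 samples per class, the testing counts come out as 50, 100, 125, and 140, matching the figures given in the text.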

3.7 Optimal band combinations and variable importance

In classification analyses, the determination of variable importance plays a critical role in the

interpretation of data and in understanding the underlying phenomena that influence the

classifier (Strobl et al., 2007). Feature reduction approaches are commonly used in order to

determine the best predictors of hyper-dimensional feature spaces (Fassnacht et al., 2016).

Furthermore, they have been shown to improve classification accuracies in a number of

studies (Clark and Roberts, 2012). There are two types of feature reduction techniques:

feature selection and feature extraction. Feature extraction procedures, such as the widely

used principal component analysis (PCA), generate a new, reduced set of bands in which the

information content is refined to minimize correlation (Singh, 1989). In contrast, feature

selection techniques identify a subset of the original variables, which allow for the

interpretation of the importance of selected predictors. Although Fassnacht et al. (2014)

found that feature selection approaches are less efficient in improving classification accuracy,

this study aims to identify the most useful inputs among the original predictors, to assess the

performance of Sentinel-2’s MSI system. Therefore, a stepwise feature selection approach

was taken.

To determine the relative importance of each spectral band, an iterative feature selection

approach was carried out, following the results of Guyon et al. (2002), who investigated

different variable selection methods in a machine learning context. Reflectance values for all

10 bands, for all three acquisition dates (May 2, July 21, August 28, 2016), were extracted to

the reference data points. The default settings for the random forest model were used (ntree

= 500, mtry = 5 [square root of # of inputs, rounded down]) to fit a random forest model. A

Recursive (or backward) Feature Elimination was then employed (Kohavi, 2000). The process

was carried out in two different ways: first, individual bands were eliminated stepwise from

the model based on their ranking in the random forest algorithm’s internal variable

importance score, based on Mean Decrease in Accuracy.
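The stepwise elimination loop described above can be sketched generically as follows. In the real workflow the ranking would come from refitting the random forest at each step and reading its Mean Decrease in Accuracy scores. Here `rank_fn` is a hypothetical placeholder for that step, and the band names and scores are invented for illustration.

```python
def backward_elimination(features, rank_fn, min_features):
    """Drop the lowest-ranked feature one step at a time.

    rank_fn(features) must return a dict mapping each feature to an
    importance score (higher means more important).
    """
    remaining = list(features)
    history = [tuple(remaining)]
    while len(remaining) > min_features:
        scores = rank_fn(remaining)
        weakest = min(remaining, key=lambda f: scores[f])
        remaining.remove(weakest)
        history.append(tuple(remaining))
    return history

# Hypothetical fixed scores standing in for a refitted model's importance ranking.
toy_scores = {"B2": 0.1, "B4": 0.9, "B6": 0.8, "B7": 0.7, "B12": 0.6}
steps = backward_elimination(
    list(toy_scores), lambda feats: {f: toy_scores[f] for f in feats}, 3
)
```

Keeping the full `history` makes it possible to compare the error estimate of every intermediate feature set rather than only the final one.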

To test the algorithm for any variable selection bias, a second method was tested, removing

individual bands stepwise based on their performance in terms of actual OOB error estimate

averaged over 5 separate trials. These two approaches, beginning with all 10 bands for all 3

images, were carried out until only 3 bands for each image remained. The results were

confirmed via 10-fold cross validation. Additionally, a correlation matrix was calculated to

determine the degree to which the different bands are correlated (Table 2). A correlation

matrix shows the correlation coefficients that represent the relationship between two

variables, a measure of dependency (Snedecor and Cochran, 1968). Here, cell values of the

individual band rasters of the Sentinel-2 tile 33VXF for the July 21 image are presented in

terms of their relationship to another individual band layer, calculated as a ratio of the

covariance between them divided by the product of their standard deviations.
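Written out, the ratio described above is the Pearson correlation coefficient for two band layers $X$ and $Y$ with $n$ cells each (the symbols are introduced here for illustration only):

```latex
r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
       = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Values of $r_{XY}$ close to 1 indicate that two bands carry largely redundant information, while values close to 0 indicate little linear dependency.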

3.8 Optimal random forest parameters

Only two parameters must be set in the random forest model: the number of trees to grow

(ntree), and the number of random split variables (mtry). Due to the Strong Law of Large

Numbers, generalization error converges after a certain number of trees are grown, thus the

random forest algorithm does not overfit the data when using large values for ntree (Feller,

1968). As with the random subset selection feature previously mentioned, reducing the

number of split variables (mtry) diminishes the classification strength of the individual trees,

but makes them less correlated to each other (Breiman, 1996). Hence, it is important to strike

the correct balance between ntree and mtry with the combination that produces the lowest

generalization error.

There is also the option to sample with or without replacement, which can be described as

“putting back” samples after they’ve been randomly selected, allowing them to be selected

again or not. Sampling without replacement was chosen for all fitted models in this study

after reviewing the findings of Strobl et al. (2007), who established that sampling with

replacement can introduce bias into the variable selection mechanism. The random forest

process was implemented in the Marine Geospatial Ecology Tools toolbox in ArcMap

version 10.3.1 (Roberts et al., 2010).
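The difference between the two sampling modes can be illustrated with the standard library. This is a toy sketch only: the sample IDs, the seed, and the subsample size of 6 are arbitrary assumptions and do not reflect the settings used in the study.

```python
import random

rng = random.Random(42)
sample_ids = list(range(10))  # hypothetical reference sample IDs

# With replacement: the same ID may be drawn more than once.
with_replacement = rng.choices(sample_ids, k=10)

# Without replacement: each ID is drawn at most once.
without_replacement = rng.sample(sample_ids, k=6)
```

In both cases the samples not drawn for a given tree form its out-of-bag set, which is what the OOB error estimate is computed on.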

For the time series and band combination where estimated OOB error rate was lowest,

following the results of the previous steps, a number of different values of trees grown

(ntree) and number of randomly selected split variables (mtry) were tried. Since the random

forest algorithm’s default value for ntree is 500, and values used in several previous random

forest studies ranged from 100 to 2500, the following values of ntree were tested: 50, 100, 200,

300, 400, 500, 700, 1000, 1500, 2000, and 3000. As the default value for mtry is the square root

of the number of input variables, rounded down, and the maximum number of

inputs across all band and image combinations was 31 (3 images, 10 bands, STI), values of mtry

between 1 and 5 were tested, for all combinations of ntree. The combination of values for

ntree and mtry that produced the lowest OOB error estimate was selected to predict a final

raster classification map.
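The parameter search described above amounts to a small grid. The sketch below only builds the grid and computes the default mtry; the model fitting and OOB error lookup for each pair are omitted, and the helper name `default_mtry` is an assumption.

```python
from itertools import product
from math import isqrt

def default_mtry(n_inputs):
    """Square root of the number of inputs, rounded down."""
    return isqrt(n_inputs)

ntree_values = [50, 100, 200, 300, 400, 500, 700, 1000, 1500, 2000, 3000]
mtry_values = range(1, 6)

# One (ntree, mtry) pair per model to be fitted.
grid = list(product(ntree_values, mtry_values))
print(default_mtry(31), len(grid))
```

For 31 inputs the default mtry is 5, which is why the tested range of 1 to 5 covers the default and all smaller values.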

3.9 Validation

To validate the results of the random forest classifier, a 10-fold cross validation technique

was employed. 10-fold cross validation, a variant of the k-fold cross validation technique

where k=10, allows all reference data to be used for both training and validation with each

individual observation used for validation exactly once. The dataset is randomly split into 10

equal-sized subsets. 9 of these subsets are then selected and together form a training dataset,

and the remaining single subset is used for testing. This step is repeated 10 times so that each

subset serves once as the test set, and the results are then aggregated into one confusion

matrix (Kohavi, 1995; Snee, 1977). This approach was chosen as an improvement over

traditional simple data splitting, dividing the data so that two-thirds is used for training and

one-third for testing, due to the potential for substantial loss in modeling capability

associated with a forfeiture of useful data for the classifier (Seni and Elder, 2010). A forest

mask, part of the Ekerö biotope database, was used to separate non-forested areas for the

production of classification maps.
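The fold construction described above can be sketched with the standard library alone. This is a minimal illustration that assumes the sample count divides evenly by k, as 720 does for k = 10; the function name and seed are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(720, k=10))
```

A model would be fitted on each `train` list and evaluated on the matching `test` list, with the ten sets of predictions pooled into a single confusion matrix.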

4. Results

4.1 Multitemporal imagery

The overall classification accuracies for 7 different image combinations are shown in Figure

6. Classifications performed using two images, with all available bands included, were on

average over 5.3% more accurate, in terms of overall accuracy, than those employing single-

date acquisitions, with values ranging between 2.0% (July 21 single date vs July 21 / August

28 combination) and 7.7% (August 28 single date vs May 2 / August 28 combination). This

provides further support to the well-established idea that multitemporal imagery can

improve classification accuracy (Lunetta and Balogh, 1999; Oetter et al., 2001; Waske and

Braun, 2009; Wolter et al., 1995; Yuan et al., 2005). It was determined that the May 2 / July 21

combination yielded the highest overall map accuracy. Interestingly, the three-image

combination that added the late summer (August 28) image was the next best time series in

terms of overall accuracy, but performed slightly worse than the May 2 / July 21

combination, with a 1.1% reduction. This confirms the results of Hill et al. (2010) and

Mickelson et al. (1998), whose research suggests that adding further images is not

necessary when the phenological variations of the study area can be captured with fewer.

Figure 6. Overall accuracies for maps produced using various Sentinel-2 scene image combinations. Colors refer to acquisition date / combination (May 2 = green, July 21 = blue, Aug. 28 = orange).

Even amongst individual CadasterENV classes, the May 2 / July 21 combination proved most

consistent in obtaining the highest producer’s and user’s accuracies of all the time series

combinations, achieving 98.0% producer’s accuracy in the pine class and 94.7% user’s

accuracy in the spruce class. The exceptions were the spruce and mixed coniferous classes,

where the May 2 single image fared slightly better with the producer’s accuracy of the mixed

coniferous class and user’s accuracy of the spruce class, obtaining 94.0% and 97.0%,

respectively. Additionally, the July 21 / August 28 combination outperformed the May 2 /

July 21 combination in producer’s accuracy in the spruce class and user’s accuracy in the

mixed coniferous class. In the former case, the July 21 / August 28 combination beat the May

2 / July 21 combination by over 10%, reaching 83.7%, and over 6% in the latter instance with

89.8%. The single date August 28 image had the poorest performance in terms of

overall accuracy (78.6%) and producer’s / user’s accuracy (worst performer in 3 of 6 classes);

however, the July 21 single date image performed similarly in regard to producer’s accuracy.

4.2 The influence of ancillary data on random forest classification

None of the common topographic variables (slope, aspect, curvature) improved

the model’s classification accuracy, confirming the results of Engler et al.

(2013). In contrast, the inclusion of the modified STI increased overall accuracy in

multitemporal models by nearly 2%. This result corroborates the results of Buchanan et al.

(2014), who demonstrated the usefulness of combining soil data with topographic indices

and its ability to better predict water-saturated areas compared to metrics derived solely

from topography. This improvement was especially pronounced in the classes containing

deciduous trees: mixed coniferous/deciduous forest (3.3% average increase), deciduous

forest (2.3% average increase), and deciduous hardwood forest (2.7% increase). This effect

was even more prominent in single-date classification trials, with the deciduous hardwood

class showing a 9% increase and deciduous forest showing 4% improvement in a

classification of the July 21 image using all 10 bands.

4.3 The influence of sample size on classification accuracy

Compared to the baseline of 150 samples per class, it was found that 50 samples (one third of

the total) could be randomly removed and a model fitted on the remaining 100 would still reach an

accuracy within 2.5% of the baseline (Figure 7). When 100 samples (two thirds of

the total) were removed, the accuracy was within 6% of the baseline. With 125 samples

removed (approx. 83.3% of the total), accuracy was 8.8% lower than the baseline. Finally,

with only 10 samples per class (approximately 93.3% of total samples removed), accuracy

dropped 9.4% compared to the baseline.

Figure 7. Sample size vs overall accuracy, where training samples are randomly removed from the model.

4.4 Band combinations and variable importance

The correlation coefficient matrix for the July 21 image can be seen in Table 2, with clusters of

highly correlated bands (highlighted in red) standing out. First, a high degree of correlation

exists among the 3 bands in the visible range. Second, the 2nd and 3rd red edge bands and the

two near infrared bands display a high correlation. Both shortwave infrared bands were

strongly correlated to each other and red edge band 1, with the SWIR1 band showing a

higher correlation to the red edge and NIR bands than SWIR2. The lowest correlations

(highlighted in blue) were between the visible bands and both the NIR bands and red edge bands 2 and 3. Red

edge band 1, unlike the other 2 red edge bands, showed strong correlation with the visible

bands, particularly the green and red bands.

[Figure 7 plot: x-axis, Number of Training Samples per Class (10, 25, 50, 100, 150); y-axis, Overall Accuracy (%).]

Table 2. Correlation coefficient matrix for selected Sentinel-2 bands, July 21 image.

         blue    green   red     NIR 1   SWIR 1  SWIR 2  RE 1    RE 2    RE 3    NIR 2
blue     1.000   0.964   0.953   0.219   0.477   0.691   0.780   0.300   0.225   0.197
green    0.964   1.000   0.976   0.421   0.653   0.813   0.899   0.497   0.424   0.400
red      0.953   0.976   1.000   0.338   0.632   0.825   0.879   0.424   0.348   0.325
NIR 1    0.219   0.421   0.338   1.000   0.856   0.667   0.698   0.982   0.987   0.987
SWIR 1   0.477   0.653   0.632   0.856   1.000   0.932   0.887   0.900   0.869   0.867
SWIR 2   0.691   0.813   0.825   0.667   0.932   1.000   0.940   0.737   0.684   0.673
RE 1     0.780   0.899   0.879   0.698   0.887   0.940   1.000   0.771   0.712   0.697
RE 2     0.300   0.497   0.424   0.982   0.900   0.737   0.771   1.000   0.993   0.991
RE 3     0.225   0.424   0.348   0.987   0.869   0.684   0.712   0.993   1.000   0.997
NIR 2    0.197   0.400   0.325   0.987   0.867   0.673   0.697   0.991   0.997   1.000

Correlation classes: very low (<0.35), low (0.35-0.65), moderate (0.65-0.85), high (0.85-0.95), very high (>0.95)

To determine variable importance, two variations of a Recursive Feature Elimination

approach were evaluated (Fassnacht et al., 2014). First, random forest’s own variable importance

ranking, successfully employed in a non-parametric machine learning environment by Diaz-

Uriarte and Alvarez de Andres (2006) and Guyon et al. (2002), was tested. Second, a method

that did not rely on random forest’s internal variable importance rankings was used to verify

the results. Starting with all available bands and ancillary inputs, variables were removed

stepwise on the basis of their actual OOB error estimate, averaged over 5 trials. Both

methods produced similar results. After accuracy verification via 10-fold cross validation, it

was found that the lowest error rates were achieved with a 4-band combination of the

following spectral channels: one band in the visible range, two bands in the red edge range,

and one shortwave infrared band.

The visible band that produced the best results was the red visible band (band 4). The

shortwave infrared band that yielded the highest accuracy was SWIR2 (band 12). This

finding corroborates the results from Immitzer, Vuolo and Atzberger (2016), who

determined the robustness of a combination of red edge, SWIR, and visible bands for

classification of tree species in central Europe. It should be noted, however, that their

findings indicated the visible blue band was more important than the visible red band, in

conflict with the results of this study.

The most important red edge band was band 6, and the best results were achieved when

combined with a second red edge band (band 7). Using red edge band 5 in combination with

band 6 produced very similar results to the band 6 / band 7 red edge band combination. The

band 5 / band 6 combination generally produced slightly better results in both producer’s and user’s

accuracy for the coniferous classes, but slightly worse producer’s and user’s accuracy for the

deciduous and mixed coniferous-deciduous classes. The only noteworthy difference between

the accuracies of these two combinations occurred with producer’s accuracy in the deciduous

forest class. The band 6/7 red edge combination achieved a producer’s accuracy of 85.3%,

while the band 5/6 red edge combination achieved 81.3%. Furthermore, the overall map

accuracy of the 4/6/7/12 combination was nearly half a percent higher than the 4/5/6/12

combination, earning 85.3% and 84.9%, respectively. Put another way, the latter combination

produced a mere 4 more net misclassified samples out of the total of 720. The 4/6/7/12 combination

was then selected to create the final classification map.

4.5 Optimal random forest parameters

For the time series and band combination where estimated OOB error rate was lowest, the

May 2 and July 21 combination (bands 4, 6, 7, 12), the OOB error estimate stabilized once

reaching 200 trees, confirming both the robustness of Breiman’s (2001) use of 500 trees as a

default setting, as well as the findings of Immitzer et al. (2012) and Rodriguez-Galiano et al.

(2012). Nonetheless, the OOB error estimate was 0.2% lower with ntree=2000 compared with

ntree=200, so ntree=2000 was selected for the production of the final map. OOB error

estimates for the optimal value of ntree, 2000, were 0.3% to 1.4% lower than those obtained with

other values of ntree between 50 and 3000. For the number of split variables, mtry, the

default setting (square root of number of predictor variables, rounded down) produced the

best results, slightly outperforming the default value plus 1. As 9 input variables (2 images, 4

bands + STI) were used to fit the model, the default value of mtry was 3. This finding

confirms the suitability of using the default value of the number of random split variables.

Using the default value of mtry, OOB error estimates were between 0.2% and 1.0% lower

than with other values of mtry between 1 and 5.

4.6 Final result: random forest classification map

After separate analyses of the impact of Recursive Feature Elimination for different band

combinations, the use of multitemporal imagery, ancillary data, model parameters, and

sample size, a final classification raster was produced, representing the established optimal

model conditions, shown in Figures 8 and 9a. This map was predicted from a model fitted with

the parameters ntree=2000 and mtry=3 using the red visible band 4, two red edge bands

(bands 6 and 7), the shortwave infrared band 12, as well as the extracted STI value, for the

two image acquisitions May 2 and July 21, 2016, using all 120 samples per class.

Figure 8. CadasterENV map produced using random forest classification, Ekerö municipality (inset of figure X outlined in blue).

10-fold cross validation showed this map achieving overall accuracy of 86.0%. The confusion

matrix is shown in Table 3. Producer’s accuracy varied between 90.8% for deciduous

hardwood forest and 81.7% for mixed coniferous-deciduous forest, and user’s accuracy varied between

93.2% for deciduous hardwood and 81.8% for mixed coniferous forest. For

coniferous forest classes, confusion occurred only with other coniferous classes and the

mixed coniferous-deciduous class, with one exception: a single hardwood record, located in

an area that appears to have been clear-cut since the time of the orthophoto acquisition

(which was used to add the record), was misclassified as pine. For deciduous classes,

confusion occurred only with other deciduous classes and the mixed coniferous-deciduous

class.

Figure 9. (a) Example portion of CadasterENV map produced using random forest classification, (b) 2015 CIR orthophoto for comparison (see Appendix).

Table 3. Confusion matrix for random forest classification. Producer’s accuracy highlighted in red, user’s accuracy highlighted in blue, overall accuracy highlighted in green.

Class       Pine    Spruce  Mix con  Mixed   Deciduous  Hardwood  User's
Pine        107     3       7        7       0          0         86.3%
Spruce      1       99      10       3       0          0         87.6%
Mix con     4       14      99       4       0          0         81.8%
Mixed       7       4       4        98      6          0         82.4%
Deciduous   0       0       0        8       107        11        84.9%
Hardwood    1       0       0        0       7          109       93.2%
Producer's  89.2%   82.5%   82.5%    81.7%   89.2%      90.8%     86.0%
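The accuracy figures in Table 3 can be recomputed from the cell counts. The sketch below copies the counts from the table and assumes that user's accuracy is taken along rows and producer's accuracy along columns, which is the orientation implied by the table's margins.

```python
classes = ["Pine", "Spruce", "Mix con", "Mixed", "Deciduous", "Hardwood"]
# Counts copied from Table 3; each row ends in the user's accuracy column,
# so user's accuracy is computed per row and producer's accuracy per column.
matrix = [
    [107, 3, 7, 7, 0, 0],
    [1, 99, 10, 3, 0, 0],
    [4, 14, 99, 4, 0, 0],
    [7, 4, 4, 98, 6, 0],
    [0, 0, 0, 8, 107, 11],
    [1, 0, 0, 0, 7, 109],
]

n = len(classes)
correct = [matrix[i][i] for i in range(n)]
users = [correct[i] / sum(matrix[i]) for i in range(n)]
producers = [correct[j] / sum(row[j] for row in matrix) for j in range(n)]
overall = sum(correct) / sum(map(sum, matrix))

for name, u, p in zip(classes, users, producers):
    print(f"{name:<10} user's {u:6.1%}  producer's {p:6.1%}")
print(f"Overall accuracy {overall:.1%}")
```

Every column sums to 120, the number of reference samples per class, and the 619 correctly classified samples out of 720 give the reported overall accuracy of 86.0%.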

Figure 10 shows the relative distribution of forest classes in terms of percent forest area. A

total of 1027629 pixels were identified as forest in Ekerö municipality, comprising an area of

102.8 km², approximately 47% of Ekerö’s surface area. The most common CadasterENV

forest class is mixed coniferous-deciduous forest with 30.9 km², or 30.1% of all forested area

in Ekerö. The next most prevalent class is pine forest with an area of 23.7 km², or 23.1% of

forest in Ekerö. Deciduous hardwood comprises 21.7 km² of forest, or 21.2% of the total.

Deciduous forest has a total area of 12.6 km², 12.3% of the total. Mixed coniferous forest

comprises 7.4 km² of forest, or 7.2%. Of all forest classes, spruce forest is the least represented

at 6.5 km², 6.3% of forest cover in Ekerö.
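The reported forest area follows directly from the pixel count and the 10 m pixel size. A short check, with illustrative variable names, is given below.

```python
PIXEL_SIZE_M = 10          # all bands were resampled to 10 m
forest_pixels = 1_027_629  # forest pixel count reported in the text

# Each pixel covers 10 m x 10 m = 100 square metres.
area_km2 = forest_pixels * PIXEL_SIZE_M ** 2 / 1_000_000
print(f"{area_km2:.1f} km^2")
```

The same conversion applies to each class: dividing a class area by the total forest area gives its share of forest cover.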

Figure 10. Class distribution (% forest area) among forested areas in Ekerö municipality.

5. Discussion

5.1 Multitemporal imagery

As different plant species respond to biological processes such as pigmentation and

senescence in unique ways and at different rates, it follows that using multitemporal imagery

to capture differences in foliar presentation would help separate forest classes that may be

spectrally similar in any one single acquisition image (Chuine and Beaubien, 2001; Dymond

et al., 2002). Less well-studied but suggested by previous research nonetheless is the finding

that although a time series generally improves accuracy via capturing vegetation’s

phenological condition, there appears to be a limit where once reached, one finds a tradeoff

between information efficacy and redundancy, where additional scenes contain unnecessary

or irrelevant details that affect the classifier’s ability to discriminate between classes (Hill et

al., 2010; Mickelson et al., 1998).

Indeed, Hill et al. (2010) reported that image quality, in terms of timing, is more important to

improving classification accuracy than the quantity of images used. In this study, the May 2 /

July 21 combination produced the best result, 1.1% better in terms of overall accuracy when

compared to a model that included spectral data from the August 28 image, verified via 10-

fold cross validation. This shows that while the time series aspect of classification is

important, more data in the form of additional acquisitions after a certain quantity is reached

does not necessarily produce a better predictive model, as data dimensionality increases.

Hughes (1968) laid out the theoretical justification for requiring more reference data to

compensate for the added layer of complexity to the model in these instances, which was

subsequently confirmed in a remote sensing context by Hill et al. (2010) and Key et al. (2001).

Furthermore, additional images can potentially contain noise related to different canopy

illumination angles and intensity, among other effects, weakening the discriminatory power

of the model for class separation. It is important to note that the suitability of specific

acquisition dates is not universally applicable from year to year, as annual phenology is

affected by both short and long term weather and climatic dynamics (Chuine and Beaubien,

2001). Furthermore, one must take into account the decreasing length of the growing season

the further north the study site is located.

Because the May 2 single image classification achieved higher accuracies than the July 21 and

August 28 single images, and the spring-summer (May 2 / July 21) combination was more

accurate than the summer-late summer (July 21 / August 28) combination, one can infer that

the springtime acquisitions were most critical in the separation of the classes. This finding

corroborates past studies which have highlighted the importance of acquiring data for the

discrimination of forest classes containing deciduous trees in temperate forests using a

single-date image either at the start or at the end of the growing season. Key et al. (2001) and

Schriever & Congalton (1995) found that mid-autumn was the optimal time, noting that the

greatest differences amongst target vegetation types in factors relating to the biophysical

properties of the plant structure are likely to be observed during senescence. However, trees

during autumn senescence are more susceptible to unpredictable frosts and wind conditions

that may contribute to premature leaf removal. Furthermore, in choosing between spring

and autumn images for use in discriminating between land cover classes, Sweden’s

geographical position at high latitudes becomes a decisive factor.

The study areas of Key et al. (2001) and Schriever & Congalton (1995) are both located in

Eastern North America, between latitudes 39°N and 43°N. In contrast, Ekerö, Sweden, is

located between latitudes 59°N and 60°N. As solar zenith angles increase as position moves

further away from the equator and further away in time from the summer solstice, the effects

of shadows increase, a reduction in the signal to noise ratio occurs, and the atmospheric path

is longer, affecting the spectral distribution of the irradiance (Hawotte et al., 2016). An early-

to-mid May acquisition, the approximate time of foliation of primary tree species in Ekerö, is

roughly 6 weeks away from the summer solstice occurring on June 21. A mid-autumn image

occurring in early-to-mid October, on the other hand, is approximately 14 weeks after the

summer solstice. The effects of shadows, signal to noise ratio, and spectral distribution of the

irradiance thus affect a mid-autumn image to a greater degree. Therefore, while a mid-

autumn acquisition may have been optimal at latitudes which are closer to the equator, one

should exercise caution in assuming the same will be true at higher latitudes.

With regard to individual classes, generally very high producer’s and user’s accuracies were

achieved. Only the May 2 / July 21 image combination achieved the highest producer’s and

user’s accuracy in 4 of the 6 forest classes, demonstrating the importance of optimizing

image acquisition timing as it relates to the capturing of phenological differences needed to

discriminate between classes. The May 2 single date classification attained the highest

producer’s accuracy for the mixed coniferous class at 94.0%, but user’s accuracy was the

worst of all combinations at 78.3%. This demonstrates the tradeoff between accurate

classification and actual map reliability on the ground. Interestingly, the May 2 single date

classification had the highest user’s accuracy in the spruce class at 97.0%, though producer’s

accuracy was 65.3%, the lowest for all combinations. The opposite was true for the July 21 /

August 28 image combination, achieving significantly better producer’s accuracy in the

spruce class and user’s accuracy in the mixed coniferous class compared to the May 2 / July

21 combination.

The ability of an image or a combination of images to achieve both the highest user’s

accuracy in one category and the lowest user’s accuracy in another, for example with the

spruce and mixed coniferous classes in the May 2 image, can be explained by their confusion

with other classes. Only one sample from another class, deciduous hardwood, was

misclassified as spruce, but 17 of the 49 spruce samples were misclassified as either mixed

coniferous, mixed coniferous-deciduous, or pine. At the same time, the May 2 / July 21

combination had two mixed coniferous samples misclassified as spruce, but only 13 of 49

spruce samples were misclassified as either mixed coniferous, mixed coniferous-deciduous,

or pine. The May 2 image, with only half the available spectral information compared to the

May 2 / July 21 combination, had a harder time distinguishing amongst the 4 classes

containing coniferous trees, even if it performed slightly better in terms of % spruce samples

correctly classified. Therefore, the May 2 / July 21 image combination produced much better

overall results when taking into account not only how many samples were correctly

classified but also the reliability of that classification to a user of the map on the ground.
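
The spruce figures for the May 2 image can be reproduced from the sample counts. The sketch below assumes the standard definitions: producer's accuracy is the number of correctly classified reference samples divided by all reference samples of the class, and user's accuracy is the number of correctly classified samples divided by all samples mapped to the class. The function names are hypothetical, and the single commission error is inferred from the reported 97.0% user's accuracy.

```python
def producers_accuracy(correct: int, reference_total: int) -> float:
    """Share of reference samples of a class that were classified correctly."""
    return correct / reference_total

def users_accuracy(correct: int, mapped_total: int) -> float:
    """Share of samples mapped to a class that truly belong to it."""
    return correct / mapped_total

# May 2 single-date classification, spruce class.
reference_spruce = 49   # spruce reference samples (from the text)
omitted = 17            # spruce samples assigned to other classes (from the text)
correct = reference_spruce - omitted
committed = 1           # assumption: one non-spruce sample mapped as spruce

pa = producers_accuracy(correct, reference_spruce)
ua = users_accuracy(correct, correct + committed)
print(f"producer's accuracy: {pa:.1%}")  # 65.3%
print(f"user's accuracy:     {ua:.1%}")  # 97.0%
```

The same class can therefore score very differently on the two measures: many spruce samples were missed (low producer's accuracy), yet almost everything mapped as spruce was in fact spruce (high user's accuracy).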

It is important to note that since the mixed coniferous class is categorized as having a

compositional mix of two existing classes, pine and spruce, a certain level of confusion

within these classes is a logical consequence and can be expected. As previously noted, the

inclusion of classes containing both individual species and separate classes comprising a mix

of those species is not an ideal classification scheme in the remote sensing context, due to

interactions between background signature and intra-species variability, related to the

reduction of statistical separability of different classes in the spectral space (Carleer and

Wolff, 2004; Cushnie, 1987; Fassnacht et al., 2016). The poorest performing single image

classification was the August 28 acquisition, the only case that had the lowest producer’s and

user’s accuracy in 3 of 6 classes. This is likely due to the timing of the August 28 image, with

late summer having the lowest spectral variability between forest classes when compared to

the spring (May 2) and high summer (July 21) images.

5.2 Ancillary data

Though topographic variables have been shown in previous studies to improve classification

accuracy, including those employing the random forest algorithm, their contribution to the


model appears to be dependent on both the scale and physical characteristics of the study

area. Engler et al. (2013) confirmed the limited benefit of common topographical variables in

a study area of 200 km², though Zimmermann et al. (2007) found they were crucial when

working in a larger study area of 60,000 km². Rodriguez-Galiano et al. (2012) found that

elevation was among the most important variables in a study of the Granada Province of

Spain with a large area (13,000 km²) and large elevation range (0-3480 m.a.s.l.) of target land

cover classes. Therefore, it can be inferred that if the study area is large enough and/or

heterogeneous enough for the vegetation observed to be potentially affected by broader

changes in the environmental gradients of the land it inhabits, topographic variables will

provide a great deal of explanatory power to classification models. With smaller or more

homogenous study areas, the scale of topographic variables as they relate to spectral

information is likely to be limited and thus cannot contribute to successful discrimination of

land cover types. Another potential reason topographic variables are useful in larger study areas is

that intra-class variability can be expected to increase with study area size, compromising

inter-class spectral separability in the process (Rogan et al., 2008). As the study area in this

project is approximately 218 km², it follows that the results support those found by Engler et

al. (2013), where the study area was also relatively small in size.

In contrast to the ineffectiveness of the aforementioned topographic variables in improving

overall accuracy, the inclusion of the Soil Topographic Wetness Index proved useful. Overall

accuracies were nearly 2% higher in multitemporal models and 9% higher in single-date

models when it was added as an input variable in random forest. The integration of soil and

topographic information provides a metric of soil transmissivity that appears to

help the algorithm discriminate between classes, where topographic variables alone do not

contain enough information to do so, in this case likely due to the small study area size. This

finding suggests that in areas where soil patterns are not uniform, the STI predicts areas of

potential saturation even where spectral signatures may be similar (Sivapalan and Wood,

1987). This helps to explain the significant improvement STI gives in classification accuracy

to the deciduous forest classes, where mixed spectral signatures occurring within a pixel are

likely to occur due to the possibility of multiple species, a characteristic of the forest class

definition. Here, the STI provides useful information to the algorithm to aid in the separation

of classes where the algorithm has difficulty doing so with spectral information alone. The

effect is even more noticeable in classifications performed with single-date images, as even

less spectral information is available. These results confirm Buchanan et al. (2014), suggesting

that an STI can indeed allow for better predictions of potentially saturated areas than

topographic indexes alone. In areas where soils are non-uniform and when soil/topographic

information is available, the STI metric should be considered as a complement to spectral

data in future forest classification studies.
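
To illustrate why soil information can separate cells that terrain alone cannot, the sketch below compares a purely topographic wetness index with one common formulation of a soil topographic index, in which upslope area is divided by slope and by soil transmissivity (saturated hydraulic conductivity times soil depth). The exact formulation used in this study is not restated here, so the formulas, function names, units and input values are all assumptions for illustration.

```python
import math

def topographic_wetness_index(upslope_area: float, slope_rad: float) -> float:
    """TWI = ln(a / tan(beta)); a is upslope area per unit contour length (m)."""
    return math.log(upslope_area / math.tan(slope_rad))

def soil_topographic_index(upslope_area: float, slope_rad: float,
                           k_sat: float, soil_depth: float) -> float:
    """STI = ln(a / (T * tan(beta))), with transmissivity T = k_sat * soil_depth."""
    transmissivity = k_sat * soil_depth  # assumed units: (m/day) * m
    return math.log(upslope_area / (transmissivity * math.tan(slope_rad)))

# Two hypothetical cells with identical terrain but different soils.
slope = math.radians(5.0)
area = 200.0
twi = topographic_wetness_index(area, slope)  # identical for both cells
sti_clay = soil_topographic_index(area, slope, k_sat=0.05, soil_depth=0.5)
sti_sand = soil_topographic_index(area, slope, k_sat=5.0, soil_depth=1.5)
print(f"TWI (both cells): {twi:.2f}")
print(f"STI, low-transmissivity soil:  {sti_clay:.2f}")
print(f"STI, high-transmissivity soil: {sti_sand:.2f}")
```

Both cells receive the same topographic index, but the low-transmissivity cell receives a higher soil topographic index, flagging it as more prone to saturation. This is the kind of extra contrast a classifier can exploit when spectral signatures are similar.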


5.3 Sample size

One of the most pertinent questions in remote sensing studies is the determination of the

quantity of reference data needed to achieve a specific classification accuracy threshold.

Several issues must be considered: the need to take enough samples to represent the inter-

and intra-species variation in the target sample area, the level of accuracy desired, as well as

cost, both in economic and temporal terms (Lippitt et al., 2008; Pal and Mather, 2003; Rogan

et al., 2003, 2008). Furthermore, Fassnacht et al. (2016) note that the amount of training data

needed varies with the target species or classes under investigation, the methods applied,

and the requirements of the end user. The topic is especially important to future land cover

classification projects, which require funding in order to collect field data. As much of

Sweden’s landscape is characterized by dense forest, alpine areas, or terrain that may

otherwise be difficult to access, it is of high importance to investigate the relationship

between the amount of reference data and accuracy objectives before dispatching resources

into the field.

While Jensen (2005) recommended that the minimum number of training records be equal to

ten times the number of input variables for parametric classification methods, various

studies have shown this approach to be insufficient for non-parametric machine learning

methods such as random forest (Foody, 1995; Foody and Arora, 1997; Pal, 2005; Pal and

Mather, 2003). The amount of training data should be more sensitive to study areas and land

cover classes exhibiting high variability, and not simply which classification method was

employed.

Regardless of the method used, more data beats a cleverer algorithm (Domingos, 2012). All

classifiers essentially function by grouping together similar samples of data, the main

difference being how “similarity” is defined. Rodriguez-Galiano et al. (2012) found that with

a 70% reduction in sample size, overall accuracy remained within 5% of the baseline

accuracy obtained using all samples. In this study, a 70% reduction in sample size resulted in

overall accuracy remaining within approximately 6% of the baseline. In comparison,

however, Rodriguez-Galiano et al. (2012) achieved higher stability up until this 70%

threshold, and much lower stability beyond it, whereas in this study less stability was

achieved before the 70% threshold, but not as much instability occurred beyond it. This may

be explained by the great difference in scale between the two study sites, with the reference

study’s site being much larger in areal extent, as well as having nearly double the amount of

land cover classes to classify. Understandably, a model with more classes would have

greater sensitivity to the effects of a heavy reduction in sample size, as samples must represent

the intra-class variability of the study area. Indeed, Reese (2011) recorded a 9.3% drop in

accuracy when reducing the total number of samples by just 25% in a random forest model.

This demonstrates the need to consider reference data not only in terms of quantity, but also

quality.
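
A sample-size experiment of this kind can be set up by drawing class-stratified random subsets of the reference data and retraining the classifier on each subset. The sketch below shows only the subsampling step, assuming six classes with 120 reference points each; the function name, sample identifiers and reduction levels are hypothetical, and the retraining and accuracy assessment are left out.

```python
import random
from collections import defaultdict

def stratified_subsample(samples, fraction_kept, seed=0):
    """Keep the same fraction of reference samples in every class.

    samples: list of (sample_id, class_label) tuples.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample_id, label in samples:
        by_class[label].append(sample_id)
    kept = []
    for label, ids in sorted(by_class.items()):
        n_keep = max(1, round(len(ids) * fraction_kept))
        kept.extend((sample_id, label) for sample_id in rng.sample(ids, n_keep))
    return kept

# Hypothetical reference set: 6 classes with 120 points each.
classes = [f"class_{k}" for k in range(6)]
reference = [(f"{c}_{i}", c) for c in classes for i in range(120)]

for reduction in (0.0, 0.25, 0.5, 0.7, 0.9):
    subset = stratified_subsample(reference, 1.0 - reduction)
    per_class = len(subset) // len(classes)
    print(f"{reduction:.0%} reduction -> {len(subset)} samples ({per_class} per class)")
```

Stratifying by class keeps the class balance constant across reduction levels, so that changes in accuracy reflect the amount of data rather than a shift in class proportions.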


As the reference data used in this study was collected in the context of creating/validating a

biotope database, the data was inherently flawed for use in this study. Notably, species

composition for all reference points was based not on % crown cover, but relative number of

species present by stem count. Furthermore, even though the data was confirmed visually

using an orthophoto, there was no standard radius by which these counts were made in the

field, which clearly affects the integrity of classifications that are performed on 10 meter

pixels. This problem is likely to have affected the accuracy in this study, but it was an

unavoidable issue as the author was unable to perform their own reference data collection.

5.4 Band combinations and variable importance

In determining which bands are the most valuable, it is important to note that there is

multicollinearity between the variables. Ranking the input variables on the basis of their

individual contribution to a model’s accuracy may not say much about how one may

optimize the inputs for increasing overall classification accuracy. Any particular variable on

its own may not be among the most important according to the stepwise Recursive Feature

Elimination process, but combined with other spectral channels, may play a role in

producing the most accurate result. For example, the red edge bands were shown to be less

important than either of the SWIR bands, but a model with both SWIR bands and no red

edge bands is less accurate than one that includes one of each. As there are complex

correlative relationships between the different spectral channels, one must try various

combinations based not just on band number, but also the spectral class they belong to

(visible, red edge, NIR, SWIR), as correlation is typically highest within these classes.

Calculating correlation metrics can help steer this process. Once highly correlated input

variables are identified, the variable that produces the most accurate classification output

amongst them can be included in the model and the rest excluded, to not add redundant

information which reduces the performance of the classifier.
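
The correlation-guided selection described above can be sketched as a greedy filter: visit the bands from most to least accurate and keep a band only if it is not highly correlated with a band already kept. The reflectance values, band labels, ranking and the 0.9 threshold below are hypothetical and serve only to show the mechanics.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def prune_correlated(bands, ranking, threshold=0.9):
    """Visit bands from best to worst (by `ranking`) and keep a band only if
    |r| with every already-kept band is below `threshold`."""
    kept = []
    for name in ranking:
        if all(abs(pearson(bands[name], bands[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical reflectance values for five pixels (not data from the study).
bands = {
    "B02_blue": [0.030, 0.035, 0.041, 0.052, 0.060],
    "B04_red":  [0.028, 0.034, 0.040, 0.050, 0.061],
    "B06_re":   [0.210, 0.180, 0.260, 0.190, 0.240],
    "B11_swir": [0.120, 0.150, 0.110, 0.170, 0.130],
    "B12_swir": [0.060, 0.076, 0.054, 0.086, 0.064],
}
# Hypothetical ranking, e.g. by single-band classification accuracy.
ranking = ["B04_red", "B12_swir", "B06_re", "B02_blue", "B11_swir"]
print(prune_correlated(bands, ranking))
```

With these made-up values the blue band is dropped because it closely tracks the red band, and one SWIR band is dropped because it closely tracks the other, leaving one representative per correlated group.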

Only one of the three visible bands and one of the two SWIR bands were needed due to the

high degree of correlation between them, based on the results of the correlation matrix.

Adding between 1 and 3 red edge bands produced similar results, with the best results

produced using red edge bands 6 and 7 and SWIR band 12. Immitzer, Vuolo and Atzberger

(2016) also confirmed the high value of the red edge and SWIR Sentinel-2 bands for the

identification of tree species in central Europe. Interestingly, they showed the blue visible

(band 2) as being the most important visible spectral channel, whereas in this study the red band

produced the lowest classification error rates. This may be due to a greater emphasis on

coniferous trees in the former study, which attempted to classify silver fir (Abies alba) and European

larch (Larix decidua) in addition to pine and spruce. Indeed, Pu and Liu (2011) relate the blue

band’s special relevance to coniferous trees to their reduced relative photosynthetic activity

in the blue spectrum. Immitzer, Vuolo and Atzberger (2016) did, however, note that the

timing of the single-date, late-summer acquisition most likely influenced the variable

importance ranking.


The limited usefulness of the NIR bands 8 and 8a in classifying similar forest types using

random forest was also confirmed by Immitzer, Vuolo and Atzberger (2016). Additionally,

the correlation matrix showed that NIR band 8 is highly correlated with the red edge bands.

As the bands are characterized by multicollinearity, when combined with spectral

information from the visible and SWIR range, the red edge bands appear to be better at

discriminating between classes than the NIR bands.

Quantifying variable importance aids in understanding the factors that drive a classification

model’s ability to separate data into distinct classes. One of the more perplexing issues with

analyzing the way random forests work is the black-box nature of the algorithm, which

arises from the subtleties of different procedural features of the method, namely bootstrap

aggregation (bagging) and the classification tree node split criterion, which is based on the

Gini impurity index. While bagging is one of the most computationally efficient techniques

to handle large, high-dimensional datasets, finding an appropriate classification model in

one step is not possible due to the scale and complexity of the problem (Bühlmann and Yu,

2002; Kleiner et al., 2014; Wager et al., 2014).
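
As a reminder of what the node split criterion measures, the sketch below computes the Gini impurity of a set of class labels and the decrease in impurity obtained by splitting a single variable at a threshold. It is a minimal illustration with hypothetical values for one band and two classes, not the implementation used by any particular random forest package.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(values, labels, threshold):
    """Decrease in Gini impurity when splitting at `values <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted

def best_split(values, labels):
    """Try midpoints between sorted unique values; return (threshold, decrease).

    Assumes at least two distinct values.
    """
    unique = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    return max(((t, gini_decrease(values, labels, t)) for t in candidates),
               key=lambda pair: pair[1])

# Hypothetical single-band reflectances for two classes.
values = [0.10, 0.12, 0.15, 0.30, 0.33, 0.35]
labels = ["pine", "pine", "pine", "spruce", "spruce", "spruce"]
threshold, decrease = best_split(values, labels)
print(f"best threshold ~ {threshold:.3f}, impurity decrease = {decrease:.2f}")
```

In a random forest this search is repeated at every node, over a random subset of variables and on a bootstrap sample, which is part of why the resulting model is hard to inspect analytically.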

The combination of bagging and random selection of input variables, the tree nodes of which

are then split using the Gini criterion, makes the algorithm very challenging to scrutinize

with rigorous mathematics. This phenomenon can also be described as a difficulty in

examining the effects of randomization and a highly data-dependent tree construction

process simultaneously. Research attempting to explain the mathematical properties of

random forest tends to adopt methodological procedures that are not data-driven, unlike the

practical applications which are data-dependent, and thus a gap is created between theory

and practice (Scornet, Biau, and Vert, 2015).

Strobl et al. (2007) note that the Gini index, the node-splitting rule employed by the random

forest algorithm, can be biased in certain cases. In their analysis, it was determined

experimentally that with a growing number of input variables, the robustness of the variable

importance calculation decreases. This is because given a fixed number of true predictor

variables, the addition of further inputs decreases the chance a true predictor is randomly

selected, and the less often a variable is selected, the less likely it will be considered to be of

high importance (Genuer, Poggi, and Tuleau-Malot, 2010). Furthermore, the variable

importance ranking may be biased if there is a discrepancy in the scale of measurement

between the input variables. As the scale of measurement is not an indicator of true variable

importance, this can lead to a biased ranking of variables (Strobl et al., 2007). Since the Gini

index is computed for all possible node splits within the range of a given variable, and the

variable selected for the next split is that which produces the highest overall Gini split

criterion, variables that are greater in scale have a better chance to randomly produce a good

criterion value (Strobl et al., 2008).
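
The dilution effect of adding inputs can be quantified with a simple counting argument. If a fixed number of true predictors sits among the inputs and each node considers mtry candidates drawn at random without replacement, the chance that at least one true predictor is offered at a node follows from the hypergeometric distribution. The sketch below assumes three true predictors and an mtry equal to the integer square root of the number of inputs (a common default for classification forests); both are illustrative assumptions.

```python
import math

def prob_true_predictor_offered(n_inputs: int, n_true: int, mtry: int) -> float:
    """P(at least one true predictor among mtry candidates drawn without replacement)."""
    return 1.0 - math.comb(n_inputs - n_true, mtry) / math.comb(n_inputs, mtry)

n_true = 3  # hypothetical number of informative variables
for n_inputs in (9, 18, 36, 90):
    mtry = int(math.sqrt(n_inputs))  # assumed default for classification
    p = prob_true_predictor_offered(n_inputs, n_true, mtry)
    print(f"{n_inputs:>3} inputs, mtry={mtry}: P = {p:.3f}")
```

Under these assumptions the probability falls steadily as uninformative inputs are added, even though mtry grows with the number of inputs, which is in line with the reduced robustness of importance rankings noted above.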


5.5 Optimal random forest parameters

The variation in OOB generalization error when changing the ntree and mtry

parameters is related to the balance between increasing the strength of each tree and the

degree to which they are correlated to each other. Increasing the number of randomly

selected split variables improves the strength of any particular tree, but also has a

detrimental effect in that it increases correlation between them, which reduces overall

accuracy when used to predict unseen data. Hence, these parameters should be tuned in

order to determine the values of each that result in the lowest generalization error. The

results of this study confirm that random forest’s default parameters are capable of

producing close to optimal results. The default value for number of random split variables,

mtry, consistently produced the best result.
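
The balance between strength and correlation is captured by the upper bound on generalization error given by Breiman (2001), restated here for reference. The symbols follow that paper rather than notation used elsewhere in this thesis: PE* is the generalization error of the forest, s is the strength of the individual trees, and ρ̄ is the mean correlation between them.

```latex
PE^{*} \le \frac{\bar{\rho}\,\left(1 - s^{2}\right)}{s^{2}}
```

The bound shrinks when strength increases and grows when correlation increases, so a larger mtry helps only as long as the gain in strength outweighs the added correlation between trees.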

The number of trees needed to be grown in order to reach a convergence of generalization

error within 0.2% of the best result was determined to be 200 with 2 images, 120

reference points per class, and 9 input variables. Rodriguez-Galiano et al. (2012) reached

convergence after 100 trees, also using 2 images, 120 reference points per class, and 9 input

variables. Immitzer et al. (2012) reached a convergence of generalization error after 250 trees

using 1 image, 215 samples per class, and 8 input variables. The results of this study place

my findings in the same neighborhood, demonstrating that random forest’s default value of

ntree=500 is more than sufficient to reach a convergence of OOB generalization error. Beyond

this number, growing more trees made no substantial improvements in the model’s

accuracy, only increasing computation time.
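
The convergence criterion used above can be expressed as a small helper that scans an OOB error curve and returns the smallest forest size from which the error stays within a tolerance of the best value. The error values below are hypothetical, and the tolerance of 0.002 assumes that "0.2%" refers to percentage points of error expressed as a fraction.

```python
def trees_to_converge(oob_error_by_ntree, tolerance=0.002):
    """Smallest ntree from which OOB error stays within `tolerance` of the minimum.

    oob_error_by_ntree: dict mapping number of trees -> OOB error (as a fraction).
    Returns None if the largest forest is not within the tolerance.
    """
    ordered = sorted(oob_error_by_ntree)
    best = min(oob_error_by_ntree.values())
    for i, ntree in enumerate(ordered):
        if all(oob_error_by_ntree[m] - best <= tolerance for m in ordered[i:]):
            return ntree
    return None

# Hypothetical OOB error curve (not results from the study).
oob = {10: 0.210, 50: 0.172, 100: 0.161, 200: 0.155, 500: 0.154, 1000: 0.154}
print(trees_to_converge(oob))  # 200
```

Requiring every larger forest to stay within the tolerance guards against declaring convergence on a single lucky value in a noisy error curve.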

Breiman (2001) states that this convergence phenomenon arises from the Strong Law of

Large Numbers, noting that overfitting does not occur when adding more trees (Feller, 1968).

Indeed, a forest of several hundred trees appears to be sufficient to compensate for the

inherent instability of any individual tree contained in the model, one of Breiman’s objectives

in developing the algorithm. Nonetheless, performing an investigation of the effect of forest

size on estimated generalization error proved useful in tuning this parameter a bit further, as

increasing the number of trees grown to 2000 reduced the estimated generalization error by

0.2% compared with 200 trees. No improvement was made when increasing the number of

trees to 3000.

6. Conclusion

The main sources of error in this study can be attributed to the following: 1) a forest

classification system which contains a mixture of species-specific as well as mixed-species

classes, 2) the use of a May 2 acquisition, the best spring image available, which is likely too

early to capture leaf-on conditions for all target species in the Ekerö region, and 3) reference

data which was not intended specifically for use in this study. Addressing these issues

would likely contribute to better results. Other recommendations to improve accuracy in

future studies include the incorporation of fuzzy logic for mixed forest classes (Foody, 2002),


high-resolution LiDAR data to assess stand characteristics (Nordkvist et al., 2012), and textural

analysis (Lu et al., 2014). If an orthophoto from a different time period than the satellite

acquisitions is used to check the integrity of reference data, consider the effects of the time

lag (new clearcuts, construction, etc.).

This study sought to address the performance of Sentinel-2 data for use in classifying forest

types in Ekerö, Sweden. To best utilize this data in future studies, several recommendations

are of note. The use of multitemporal data is helpful in separating the classes of forest types

that may be spectrally similar in any single time frame, though one should be careful to

select the appropriate timing of the images to maximize phenological differences, so as to not

add redundant information which will not be of use to the classifier. Reducing the number of

bands via a feature reduction approach to exclude highly-correlated spectral information can

increase overall accuracy. It is suggested to take more than one approach to determine

relative variable importance for optimal band combinations, and in the case of machine

learning algorithms, not relying solely on internal variable importance rankings. Calculating

a correlation matrix can help to make sense of the findings. Ancillary data may complement

the spectral information in a model to produce better results, but the scale and physical

characteristics of the study site should be taken into consideration. Topographic information

alone may not boost performance of the model, but when combined with soil data may

prove useful. While the amount and type of reference data needed to produce an optimal

result depends on a number of factors, consider that more data beats a better model. Lastly,

the random forest machine learning algorithm proved to be a simple, flexible method for

land cover classification analysis. As it is resistant to overfitting and doesn’t rely on

assumptions of data distribution, it appears to be well suited to handle future remote sensing

data that continues to increase in dimensionality.


6. References

Abu-Mostafa, Y.S., Magdon-Ismail, M. and Lin, H.T., 2012. Learning from data (Vol. 4). New York, NY, USA:

AMLBook.

Anderson, J.R., 1976. A land use and land cover classification system for use with remote sensor data (Vol. 964). US

Government Printing Office.

Atkinson, P.M. and Tatnall, A.R.L., 1997. Introduction neural networks in remote sensing. International Journal of

remote sensing, 18(4), pp.699-709.

Ban, Y., Gong, P. and Giri, C., 2015. Global land cover mapping using Earth observation satellite data: Recent

progresses and challenges. ISPRS journal of photogrammetry and remote sensing (Print), 103(1), pp.1-6.

Banskota, A., Kayastha, N., Falkowski, M.J., Wulder, M.A., Froese, R.E. and White, J.C., 2014. Forest monitoring

using Landsat time series data: a review. Canadian Journal of Remote Sensing, 40(5), pp.362-384.

Baret, F. and Buis, S., 2008. Estimating canopy characteristics from remote sensing observations: Review of

methods and associated problems. In Advances in land remote Sensing (pp. 173-201). Springer Netherlands.

Benediktsson, J.A., Swain, P.H. and Ersoy, O.K., 1990. Neural network approaches versus statistical methods in

classification of multisource remote sensing data.

Breiman, L., 1996. Bagging predictors. Machine learning, 24(2), pp.123-140.

Breiman, L., 2001. Random forests. Machine learning, 45(1), pp.5-32.

Brisco, B. and Brown, R.J., 1995. Multidate SAR/TM synergism for crop classification in western

Canada. Photogrammetric Engineering and Remote Sensing, 61(8), pp.1009-1014.

Buchanan, B.P., Fleming, M., Schneider, R.L., Richards, B.K., Archibald, J., Qiu, Z. and Walter, M.T., 2014.

Evaluating topographic wetness indices across central New York agricultural landscapes. Hydrology and

Earth System Sciences, 18(8), p.3279.

Bühlmann, P. and Yu, B., 2002. Analyzing bagging. Annals of Statistics, pp.927-961.

Carleer, A. and Wolff, E., 2004. Exploitation of very high resolution satellite data for tree species

identification. Photogrammetric Engineering & Remote Sensing, 70(1), pp.135-140.

Carreiras, J.M., Pereira, J.M., Campagnolo, M.L. and Shimabukuro, Y.E., 2006. Assessing the extent of

agriculture/pasture and secondary succession forest in the Brazilian Legal Amazon using SPOT

VEGETATION data. Remote Sensing of Environment, 101(3), pp.283-298.

Chan, J.C.W. and Paelinckx, D., 2008. Evaluation of Random Forest and Adaboost tree-based ensemble

classification and spectral band selection for ecotope mapping using airborne hyperspectral

imagery. Remote Sensing of Environment, 112(6), pp.2999-3011.

Chen, C., Liaw, A. and Breiman, L., 2004. Using random forest to learn imbalanced data. University of California,

Berkeley, 110.

Chuine, I. and Beaubien, E.G., 2001. Phenology is a major determinant of tree species range. Ecology Letters, 4(5),

pp.500-510.

Congalton, R.G. and Green, K., 2008. Assessing the accuracy of remotely sensed data: principles and practices. CRC

press.

Criminisi, A., Shotton, J. and Konukoglu, E., 2011. Decision forests for classification, regression, density

estimation, manifold learning and semi-supervised learning. Microsoft Research Cambridge, Tech. Rep.

MSRTR-2011-114, 5(6), p.12.


Cushnie, J.L., 1987. The interactive effect of spatial resolution and degree of internal variability within land-cover

types on classification accuracies. International Journal of Remote Sensing, 8(1), pp.15-29.

Dalponte, M., Orka, H.O., Gobakken, T., Gianelle, D. and Næsset, E., 2013. Tree species classification in boreal

forests with hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 51(5), pp.2632-2645.

DeFries, R.S. and Chan, J.C.W., 2000. Multiple criteria for evaluating machine learning algorithms for land cover

classification from satellite data. Remote Sensing of Environment, 74(3), pp.503-515.

Díaz-Uriarte, R. and De Andres, S.A., 2006. Gene selection and classification of microarray data using random

forest. BMC bioinformatics, 7(1), p.3.

Domingos, P., 2012. A few useful things to know about machine learning. Communications of the ACM, 55(10),

pp.78-87.

Drusch, M., Del Bello, U., Carlier, S., Colin, O., Fernandez, V., Gascon, F., Hoersch, B., Isola, C., Laberinti, P.,

Martimort, P. and Meygret, A., 2012. Sentinel-2: ESA's optical high-resolution mission for GMES

operational services. Remote Sensing of Environment, 120, pp.25-36.

Dymond, C.C., Mladenoff, D.J. and Radeloff, V.C., 2002. Phenological differences in Tasseled Cap indices

improve deciduous forest classification. Remote Sensing of Environment, 80(3), pp.460-472.

Engler, R., Waser, L.T., Zimmermann, N.E., Schaub, M., Berdos, S., Ginzler, C. and Psomas, A., 2013. Combining

ensemble modeling and remote sensing for mapping individual tree species at high spatial

resolution. Forest Ecology and Management, 310, pp.64-73.

European Space Agency. 2017. Sentinel-2 - Missions - Sentinel Online. [ONLINE] Available

at: https://sentinel.esa.int/web/sentinel/missions/sentinel-2. [Accessed 23 May 2017].

Evans, J.S., Murphy, M.A., Holden, Z.A. and Cushman, S.A., 2011. Modeling species distribution and change

using random forest. In Predictive species and habitat modeling in landscape ecology (pp. 139-159). Springer

New York.

Fassnacht, F.E., Latifi, H., Stereńczak, K., Modzelewska, A., Lefsky, M., Waser, L.T., Straub, C. and Ghosh, A.,

2016. Review of studies on tree species classification from remotely sensed data. Remote Sensing of

Environment, 186, pp.64-87.

Fassnacht, F.E., Neumann, C., Förster, M., Buddenbaum, H., Ghosh, A., Clasen, A., Joshi, P.K. and Koch, B., 2014.

Comparison of feature reduction algorithms for classifying tree species with hyperspectral data on three

central European test sites. IEEE Journal of Selected Topics in Applied Earth Observations and Remote

Sensing, 7(6), pp.2547-2561.

Feller, W., 1968. An introduction to probability theory and its applications: volume I (Vol. 3). New York: John Wiley &

Sons.

Fernández-Delgado, M., Cernadas, E., Barro, S. and Amorim, D., 2014. Do we need hundreds of classifiers to

solve real world classification problems. J. Mach. Learn. Res, 15(1), pp.3133-3181.

Foody, G.M., 1995. Land cover classification by an artificial neural network with ancillary

information. International Journal of Geographical Information Systems, 9(5), pp.527-542.

Foody, G.M., 2002. Status of land cover classification accuracy assessment. Remote sensing of environment, 80(1),

pp.185-201.

Foody, G.M. and Arora, M.K., 1997. An evaluation of some factors affecting the accuracy of classification by an

artificial neural network. International Journal of Remote Sensing, 18(4), pp.799-810.

Franklin, J., 1998. Predicting the distribution of shrub species in southern California from climate and terrain‐

derived variables. Journal of Vegetation Science, 9(5), pp.733-748.


Freund, Y. and Schapire, R.E., 1996, July. Experiments with a new boosting algorithm. In icml (Vol. 96, pp. 148-

156).

Friedl, M.A. and Brodley, C.E., 1997. Decision tree classification of land cover from remotely sensed data. Remote

sensing of environment, 61(3), pp.399-409.

Genuer, R., Poggi, J.M. and Tuleau-Malot, C., 2010. Variable selection using random forests. Pattern Recognition

Letters, 31(14), pp.2225-2236.

Ghimire, B., Rogan, J. and Miller, J., 2010. Contextual land-cover classification: incorporating spatial dependence

in land-cover classification models using random forests and the Getis statistic. Remote Sensing Letters, 1(1),

pp.45-54.

Gislason, P.O., Benediktsson, J.A. and Sveinsson, J.R., 2004, September. Random forest classification of

multisource remote sensing and geographic data. In Geoscience and Remote Sensing Symposium, 2004.

IGARSS'04. Proceedings. 2004 IEEE International (Vol. 2, pp. 1049-1052). IEEE.

Guo, L., Chehata, N., Mallet, C. and Boukir, S., 2011. Relevance of airborne lidar and multispectral image data for

urban scene classification using Random Forests. ISPRS Journal of Photogrammetry and Remote Sensing, 66(1),

pp.56-66.

Guyon, I., Weston, J., Barnhill, S. and Vapnik, V., 2002. Gene selection for cancer classification using support

vector machines. Machine learning, 46(1), pp.389-422.

Hansen, L.K. and Salamon, P., 1990. Neural network ensembles. IEEE transactions on pattern analysis and machine

intelligence, 12(10), pp.993-1001.

Hansen, M., Dubayah, R. and DeFries, R., 1996. Classification trees: an alternative to traditional land cover

classifiers. International journal of remote sensing, 17(5), pp.1075-1081.

Hawotte, F., Radoux, J., Chomé, G. and Defourny, P., 2016. Assessment of Automated Snow Cover Detection at

High Solar Zenith Angles with PROBA-V. Remote Sensing, 8(9), p.699.

Heller, R.C., Doverspike, G.E. and Aldrich, R.C., 1964. Identification of tree species on large-scale panchromatic and

color aerial photographs. US Department of Agriculture, Forest Service.

Hill, R.A., Wilson, A.K., George, M. and Hinsley, S.A., 2010. Mapping tree species in temperate deciduous

woodland using time‐series multi‐spectral data. Applied Vegetation Science, 13(1), pp.86-99.

Ho, T.K., Hull, J.J. and Srihari, S.N., 1990, June. Combination of structural classifiers. In Proc. IAPR Workshop

Syntactic and Structural Pattern Recog (pp. 123-137).

Horler, D.N.H., Dockray, M. and Barber, J., 1983. The red edge of plant leaf reflectance. International Journal of

Remote Sensing, 4(2), pp.273-288.

Huang, C., Davis, L.S. and Townshend, J.R.G., 2002. An assessment of support vector machines for land cover

classification. International Journal of remote sensing, 23(4), pp.725-749.

Hughes, G., 1968. On the mean accuracy of statistical pattern recognizers. IEEE transactions on information

theory, 14(1), pp.55-63.

Immitzer, M., Atzberger, C. and Koukal, T., 2012. Tree species classification with random forest using very high

spatial resolution 8-band WorldView-2 satellite data. Remote Sensing, 4(9), pp.2661-2693.

Immitzer, M., Vuolo, F. and Atzberger, C., 2016. First experience with Sentinel-2 data for crop and tree species

classifications in central Europe. Remote Sensing, 8(3), p.166.

Jensen, J.R., 2005. Introductory digital image processing: a remote sensing perspective (No. Ed. 3). Prentice-Hall Inc.

Upper Saddle River, NJ.

Jensen, R.R., Hardin, P.J. and Hardin, A.J., 2012. Classification of urban tree species

using hyperspectral imagery. Geocarto International, 27(5), pp.443-458.


Jones, H.G. and Vaughan, R.A., 2010. Remote sensing of vegetation: principles, techniques, and applications. Oxford:

Oxford University Press.

Key, T., Warner, T.A., McGraw, J.B. and Fajvan, M.A., 2001. A comparison of multispectral and multitemporal

information in high spatial resolution imagery for classification of individual tree species in a temperate

hardwood forest. Remote Sensing of Environment, 75(1), pp.100-112.

Kleinberg, E.M., 1990. Stochastic discrimination. Annals of Mathematics and Artificial Intelligence, 1(1), pp.207-239.

Kleiner, A., Talwalkar, A., Sarkar, P. and Jordan, M.I., 2014. A scalable bootstrap for massive data. Journal of the

Royal Statistical Society: Series B (Statistical Methodology), 76(4), pp.795-816.

Kohavi, R., 1995, August. A study of cross-validation and bootstrap for accuracy estimation and model selection.

In Ijcai (Vol. 14, No. 2, pp. 1137-1145).

Kotsiantis, S. and Pintelas, P., 2004. Combining bagging and boosting. International Journal of Computational

Intelligence, 1(4), pp.324-333.

KSLA (The Royal Swedish Academy of Agriculture and Forestry), 2015. Forests and forestry in Sweden.

GeoJournal doi: 10.1007/BF00578267.

Laurin, G.V., Puletti, N., Hawthorne, W., Liesenberg, V., Corona, P., Papale, D., Chen, Q. and Valentini, R., 2016.

Discrimination of tropical forest types, dominant species, and mapping of functional guilds by

hyperspectral and simulated multispectral Sentinel-2 data. Remote Sensing of Environment, 176, pp.163-176.

Lawrence, R.L., Wood, S.D. and Sheley, R.L., 2006. Mapping invasive plants using hyperspectral imagery and

Breiman Cutler classifications (RandomForest). Remote Sensing of Environment, 100(3), pp.356-362.

Lee, S.S. and Elder, J.F., 1997. Bundling heterogeneous classifiers with advisor perceptrons. White Paper.

Lippitt, C.D., Rogan, J., Li, Z., Eastman, J.R. and Jones, T.G., 2008. Mapping Selective Logging in Mixed

Deciduous Forest. Photogrammetric Engineering & Remote Sensing, 74(10), pp.1201-1211.

Lu, D., Li, G., Kuang, W. and Moran, E., 2014. Methods to extract impervious surface areas from satellite images.

International Journal of Digital Earth, 7(2), pp.93-112.

Lunetta, R.S. and Balogh, M.E., 1999. Application of multi-temporal Landsat 5 TM imagery for wetland

identification. Photogrammetric Engineering and Remote Sensing, 65(11), pp.1303-1310.

Manyara, C.G. and Lein, J.K., 1994. Exploring the Suitability of Fuzzy Set Theory in Image Classification: A

Comparative Study Applied to the Mau Forest Area Kenya. In Proceedings of American Society for

Photogrammetry and Remote Sensing/American Congress on Surveying and Mapping, Annual Convention (pp.

384-391).

Mayer, B.A. and Kylling, A., 2005. The libRadtran software package for radiative transfer calculations-description

and examples of use. Atmospheric Chemistry and Physics, 5(7), pp.1855-1877.

Melgani, F. and Bruzzone, L., 2004. Classification of hyperspectral remote sensing images with support vector

machines. IEEE Transactions on geoscience and remote sensing, 42(8), pp.1778-1790.

Michie, D., Spiegelhalter, D.J. and Taylor, C.C., 1994. Machine learning, neural and statistical classification.

Mickelson, J.G., Civco, D.L. and Silander, J.A., 1998. Delineating forest canopy species in the northeastern United

States using multi-temporal TM imagery. Photogrammetric engineering and remote sensing, 64, pp.891-904.

Naidoo, L., Cho, M.A., Mathieu, R. and Asner, G., 2012. Classification of savanna tree species, in the Greater

Kruger National Park region, by integrating hyperspectral and LiDAR data in a Random Forest data

mining environment. ISPRS Journal of Photogrammetry and Remote Sensing, 69, pp.167-179.

Nordkvist, K., Granholm, A.H., Holmgren, J., Olsson, H. and Nilsson, M., 2012. Combining optical satellite data

and airborne laser scanner data for vegetation classification. Remote sensing letters, 3(5), pp.393-401.

Oetter, D.R., Cohen, W.B., Berterretche, M., Maiersperger, T.K. and Kennedy, R.E., 2001. Land cover mapping in

an agricultural setting using multiseasonal Thematic Mapper data. Remote Sensing of Environment, 76(2),

pp.139-155.

Pal, M., 2005. Random forest classifier for remote sensing classification. International Journal of Remote

Sensing, 26(1), pp.217-222.

Pal, M. and Mather, P.M., 2003. An assessment of the effectiveness of decision tree methods for land cover

classification. Remote sensing of environment, 86(4), pp.554-565.

Pant, P., Heikkinen, V., Hovi, A., Korpela, I., Hauta-Kasari, M. and Tokola, T., 2013. Evaluation of simulated

bands in airborne optical sensors for tree species identification. Remote Sensing of Environment, 138, pp.27-

37.

Pax-Lenney, M., Woodcock, C.E., Collins, J.B. and Hamdi, H., 1996. The status of agricultural lands in Egypt: the use

of multitemporal NDVI features derived from Landsat TM. Remote Sensing of Environment, 56(1), pp.8-20.

Pax-Lenney, M. and Woodcock, C.E., 1997. Monitoring agricultural lands in Egypt with multitemporal landsat

TM imagery: How many images are needed?. Remote Sensing of Environment, 59(3), pp.522-529.

Pontius Jr, R.G. and Millones, M., 2011. Death to Kappa: birth of quantity disagreement and allocation

disagreement for accuracy assessment. International Journal of Remote Sensing, 32(15), pp.4407-4429.

Prasad, A.M., Iverson, L.R. and Liaw, A., 2006. Newer classification and regression tree techniques: bagging and

random forests for ecological prediction. Ecosystems, 9(2), pp.181-199.

Pu, R. and Liu, D., 2011. Segmented canonical discriminant analysis of in situ hyperspectral data for identifying

13 urban tree species. International Journal of Remote Sensing, 32(8), pp.2207-2226.

Reese, H., 2011. Classification of Sweden’s forest and alpine vegetation using optical satellite and inventory data (Vol.

2011, No. 86).

Richter, R. and Schlaepfer, D., 2011. Atmospheric/topographic correction for satellite imagery: ATCOR-2/3 User

Guide Vers. 8.0.2. DLR (German Aerospace Center), Remote Sensing Data Center.

Roberts, J.J., Best, B.D., Dunn, D.C., Treml, E.A. and Halpin, P.N., 2010. Marine Geospatial Ecology Tools: An

integrated framework for ecological geoprocessing with ArcGIS, Python, R, MATLAB, and

C++. Environmental Modelling & Software, 25(10), pp.1197-1207.

Rodriguez-Galiano, V.F., Ghimire, B., Rogan, J., Chica-Olmo, M. and Rigol-Sanchez, J.P., 2012. An assessment of

the effectiveness of a random forest classifier for land-cover classification. ISPRS Journal of Photogrammetry

and Remote Sensing, 67, pp.93-104.

Rogan, J., Franklin, J., Stow, D., Miller, J., Woodcock, C. and Roberts, D., 2008. Mapping land-cover modifications

over large areas: A comparison of machine learning algorithms. Remote Sensing of Environment, 112(5),

pp.2272-2283.

Rogan, J., Miller, J., Stow, D., Franklin, J., Levien, L. and Fischer, C., 2003. Land-cover change monitoring with

classification trees using Landsat TM and ancillary data. Photogrammetric Engineering & Remote

Sensing, 69(7), pp.793-804.

Schriever, J.R. and Congalton, R.G., 1995. Evaluating seasonal variability as an aid to cover-type mapping from

Landsat Thematic Mapper data in the Northeast. Photogrammetric Engineering and Remote Sensing, 61(3),

pp.321-327.

Scornet, E., Biau, G. and Vert, J.P., 2015. Consistency of random forests. The Annals of Statistics, 43(4), pp.1716-

1741.

Seni, G. and Elder, J.F., 2010. Ensemble methods in data mining: improving accuracy through combining

predictions. Synthesis Lectures on Data Mining and Knowledge Discovery, 2(1), pp.1-126.

Singh, A., 1989. Review article digital change detection techniques using remotely-sensed data. International

journal of remote sensing, 10(6), pp.989-1003.

Sivapalan, M., Beven, K. and Wood, E.F., 1987. On hydrologic similarity: 2. A scaled model of storm runoff

production. Water Resources Research, 23(12), pp.2266-2278.

Swedish Meteorological and Hydrological Institute. 2017. SMHI Öppna Data - Meteorologiska observationer.

[ONLINE] Available at: http://opendata-download-metobs.smhi.se/explore/#. [Accessed 25 May 2017].

Snedecor, G.W. and Cochran, W.G., 1968. Statistical methods. The Iowa State University Press, Iowa, USA, pp.1-435.

Snee, R.D., 1977. Validation of regression models: methods and examples. Technometrics, 19(4), pp.415-428.

Strobl, C., Boulesteix, A.L., Zeileis, A. and Hothorn, T., 2007. Bias in random forest variable importance measures:

Illustrations, sources and a solution. BMC bioinformatics, 8(1), p.25.

Strobl, C., Boulesteix, A.L., Kneib, T., Augustin, T. and Zeileis, A., 2008. Conditional variable importance for

random forests. BMC bioinformatics, 9(1), p.307.

Treitz, P.M., Howarth, P.J., Suffling, R.C. and Smith, P., 1992. Application of detailed ground information to

vegetation mapping with high spatial resolution digital imagery. Remote Sensing of Environment, 42(1),

pp.65-82.

Turner, M.G., 1989. Landscape ecology: the effect of pattern on process. Annual review of ecology and

systematics, 20(1), pp.171-197.

Verrelst, J., Rivera, J.P., Veroustraete, F., Muñoz-Marí, J., Clevers, J.G., Camps-Valls, G. and Moreno, J., 2015.

Experimental Sentinel-2 LAI estimation using parametric, non-parametric and physical retrieval methods–

A comparison. ISPRS Journal of Photogrammetry and Remote Sensing, 108, pp.260-272.

Wager, S., Hastie, T. and Efron, B., 2014. Confidence intervals for random forests: the jackknife and the

infinitesimal jackknife. Journal of Machine Learning Research, 15(1), pp.1625-1651.

Waske, B. and Braun, M., 2009. Classifier ensembles for land cover mapping using multitemporal SAR

imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 64(5), pp.450-457.

Wolter, P.T., Mladenoff, D.J., Host, G.E. and Crow, T.R., 1995. Improved forest classification in the Northern Lake

States using multi-temporal Landsat imagery. Photogrammetric Engineering & Remote Sensing, 61(9),

pp.1129-1143.

Wulder, M.A. and Coops, N.C., 2014. Make Earth observations open access. Nature, 513(7516), p.30.

Yuan, F., Sawaya, K.E., Loeffelholz, B.C. and Bauer, M.E., 2005. Land cover classification and change analysis of

the Twin Cities (Minnesota) Metropolitan Area by multitemporal Landsat remote sensing. Remote sensing of

Environment, 98(2), pp.317-328.

Zimmermann, N.E., Edwards, T.C., Moisen, G.G., Frescino, T.S. and Blackard, J.A., 2007. Remote sensing‐based

predictors improve distribution models of rare, early successional and broadleaf tree species in

Utah. Journal of applied ecology, 44(5), pp.1057-1067.

6. Appendix

Table A1. Data inputs.

Name | Description (resolution/MMU) | Date of production/acquisition | Source
--- | --- | --- | ---
Ekerö prototype biotope database | 0.1 – 0.25 ha | 2016 | © Stockholm University and Metria
DEM | 2 m | 2009 | © Lantmäteriet
Sentinel-2 imagery, level 2A | 10 – 20 m | 2016 | © Copernicus Sentinel data [2016], ESA
Soil Topographic Wetness Index | 10 m | 2016 | © Metria
CIR orthophoto | 0.25 m | 2015 | © Lantmäteriet