Simulating highly disturbed vegetation distribution: the case of China’s Jing-Jin-Ji region

Sangui Yi 1,2, Jihua Zhou 1, Liming Lai 1, Hui Du 1, Qinglin Sun 1,2, Liu Yang 1,2, Xin Liu 1,2, Benben Liu 1,2 and Yuanrun Zheng 1

1 Key Laboratory of Plant Resources, West China Subalpine Botanical Garden, Institute of Botany, Chinese Academy of Sciences, Beijing, China
2 University of Chinese Academy of Sciences, Beijing, China

Submitted 20 January 2020; Accepted 10 August 2020; Published 2 September 2020
Corresponding author: Yuanrun Zheng, zhengyr@ibcas.ac.cn
Academic editor: Andrea Ghermandi
DOI 10.7717/peerj.9839
Copyright 2020 Yi et al. Distributed under Creative Commons CC-BY 4.0. OPEN ACCESS

ABSTRACT

Background. Simulating vegetation distribution is an effective method for identifying vegetation distribution patterns and trends. The primary goal of this study was to determine the best simulation method for vegetation in an area that is heavily affected by human disturbance.

Methods. We used climate, topographic, and spectral data as the input variables for four machine learning models (random forest (RF), decision tree (DT), support vector machine (SVM), and maximum likelihood classification (MLC)) on three vegetation classification units (vegetation group (I), vegetation type (II), and formation and subformation (III)) in Jing-Jin-Ji, one of China’s most developed regions. We used a total of 2,789 vegetation points for model training and 974 vegetation points for model assessment.

Results. Our results showed that the RF method was the best of the four models, as it could effectively simulate vegetation distribution in all three classification units. The DT method could only simulate vegetation distribution in units I and II, while the other two models could not simulate vegetation distribution in any of the units. Kappa coefficients indicated that the DT and RF methods had more accurate predictions for units I and II than for unit III. The three vegetation classification units were most affected by six variables: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). Variable Combination 7, including annual mean temperature, annual precipitation, mean diurnal range, and precipitation of driest month, produced the highest simulation accuracy.

Conclusions. We determined that the RF model was the most effective for simulating vegetation distribution in all classification units present in the Jing-Jin-Ji region. The RF model produced high-accuracy vegetation distributions in classification units I and II, but relatively low accuracy in classification unit III. Four climate variables were sufficient for vegetation distribution simulation in such a region.

Subjects: Ecology, Ecosystem Science, Climate Change Biology, Natural Resource Management, Environmental Impacts
Keywords: Vegetation distribution model, Vegetation classification unit, Important predictor variable, Jing-Jin-Ji region

How to cite this article: Yi S, Zhou J, Lai L, Du H, Sun Q, Yang L, Liu X, Liu B, Zheng Y. 2020. Simulating highly disturbed vegetation distribution: the case of China’s Jing-Jin-Ji region. PeerJ 8:e9839 http://doi.org/10.7717/peerj.9839



INTRODUCTION

Vegetation is an essential component of terrestrial ecosystems and landscapes (Editorial Committee of Vegetation Map of China, 2007). Environmental research, resource management, and conservation planning require vegetation distribution maps (Franklin, 2010) to better understand, use, and monitor vegetation. Vegetation patterns and distributions are affected by the climate (Chen et al., 2015; Zhang et al., 2018) and other disturbances, particularly those caused by changes in land use (Hansen et al., 2013; Wehkamp et al., 2018). Human disturbances, such as industrialization, urbanization, population growth, and land use change for agricultural use, strongly influence the environment by greatly altering vegetation patterns, making exact mapping a significant challenge (Xie, Sha & Yu, 2008; Zhou et al., 2016).

Field surveys, the traditional method used to map vegetation, are costly and labor-intensive (Newell & Leathwick, 2005; Zhou et al., 2016). Mapping using remote sensing data is also a popular method that has been used over the last 30 years (Xie, Sha & Yu, 2008). This method makes it possible to obtain a wide range of reliable data from remote sensing images, and it updates vegetation boundaries by visually interpreting images and field surveys (Zhang et al., 2008). However, determining vegetation units and their boundaries by visual interpretation can produce inaccurate results. Researchers may get different results from the same images for the same study areas (Bie & Beckett, 1973; Pfeffer, Pebesma & Burrough, 2003). Furthermore, field survey and remote sensing methods manually draw vegetation unit boundaries based on climate, elevation, and soil type information, which can be inaccurate in transition areas (Zhang et al., 2008). Using simulation models in combination with field and remote sensing data may be an effective alternative for mapping vegetation.

Changes in the environment can affect vegetation composition, structure, function, and spatial distribution. Environmental variables have been used to simulate the global distribution of vegetation (Dilts et al., 2015; Mod et al., 2016). Simulation models are usually developed to test how environmental variables control vegetation distribution (Guisan & Zimmermann, 2000). Modern remote sensing data and software make it more convenient than ever before to produce predictive vegetation maps (Franklin, 1995).

Predictive vegetation mapping uses environmental variables and various models based on niche theory and gradient analysis to visualize communities in geographic space (Dilts et al., 2015; Lany et al., 2019). Other methods based on statistics and machine learning have also been used to simulate vegetation distribution. Predictive vegetation mapping includes various statistical methods, such as the generalized linear model, the generalized additive model, and multivariate statistical approaches (Lany et al., 2019; Prasad, Iverson & Liaw, 2006). Recently, machine learning modeling methods have been used to map the distribution of both vegetation communities and individual species. These methods include the support vector machine (SVM), decision tree (DT), and artificial neural network (Guisan & Zimmermann, 2000; Hastie, Tibshirani & Friedman, 2009; Zhou et al., 2016). These machine learning models have fewer limitations and can produce more reliable results than traditional vegetation modeling methods (Hastie, Tibshirani & Friedman, 2009). Advanced machine learning techniques can integrate spectral and spatial predictors and improve classification accuracy by retaining important information about vegetation composition and structural differences (Sesnie et al., 2010). Machine learning models efficiently and cost-effectively produce vegetation maps without the general inaccuracies caused by visual interpretation (Franklin, 2010).

The Jing-Jin-Ji region, also known as the Beijing-Tianjin-Hebei urban agglomeration, is the center of northern Chinese politics, culture, and economy. Because of its extent, it faces significant problems such as unbalanced regional development and the struggle between economic growth and limited resources. The region’s larger cities, including Beijing and Tianjin, have large populations, developed economies, and abundant educational resources. However, these big cities face issues of limited natural resources and serious ecological and environmental pollution. In particular, Beijing’s large population requires limited resources such as water, land, and vegetation (Wang & Gong, 2018). Breaking up administrative divisions may be the best method to coordinate regional development (Wang et al., 2019). The new Xiong’an area, located in Hebei province, is being constructed to relocate some of Beijing’s population. The development of areas like Xiong’an is affected by the surrounding natural environment. To better integrate the environmental carrying capacity and socioeconomic development of the Jing-Jin-Ji region, including the new Xiong’an area, accurate vegetation maps with temporal resolution are necessary. The most updated vegetation map of the Jing-Jin-Ji region is the Vegetation Map of the People’s Republic of China (VMC), with a scale of 1:1,000,000 (Editorial Committee of Vegetation Map of China, 2007). Most of its data come from a field survey conducted between 1980 and 1990, meaning its temporal and spatial scales are both outdated.

In this study, we integrated geospatial, climate, and spectral data to simulate vegetation distribution through four different models across three vegetation classification units. This research differed from that of Zhou et al. (2016) in three ways. First, our study area was the Jing-Jin-Ji region, which is located in the North China Plain and affected by high socioeconomic disturbance, while the Qilian Mountain area studied by Zhou et al. (2016) is characterized by complex terrain but without high socioeconomic disturbance. Second, the predictive variables, as well as the combinations of these variables, were different from those of Zhou et al. (2016). Third, we compared four modeling methods for simulating vegetation distribution at three vegetation classification levels, while Zhou et al. (2016) used only three models at two vegetation classification levels. Our primary objectives were to (1) determine the best modeling method for vegetation affected by high socioeconomic disturbance, (2) create an improved vegetation map of the Jing-Jin-Ji region, (3) determine the predictive abilities of different models across different vegetation classification units, and (4) determine which variables enhanced the classification accuracy for vegetation mapping.

MATERIALS & METHODS

Study area

The Jing-Jin-Ji region is located in the northern part of the North China Plain. Its location ranges from 113°04′ to 119°53′E and 36°01′ to 42°37′N, and it is bordered by Taihang Mountain in the west, Yanshan Mountain in the north, and the Bohai Sea in the east. The region includes the municipalities of Beijing and Tianjin and Hebei province (Fig. 1). Jing-Jin-Ji has a population of approximately 110 million people and covers an area of approximately 216,000 km² (Wang et al., 2019). The region is a temperate monsoon climate zone with an elevation range of −14 to 2,837 m (Fig. 1). The annual precipitation ranges from 305 to 711 mm, with increased precipitation at lower altitudes. The annual mean temperature ranges from −3 to 14 °C, with colder averages at higher elevations. The amount of precipitation in the region gradually decreases moving from the southeast to the northwest, while temperature changes show the reverse pattern.

Figure 1 The location and DEM of the Jing-Jin-Ji region. Full-size DOI: 10.7717/peerj.9839/fig-1


Vegetation and training data

The VMC, completed in 2007 based on field survey data, included eight vegetation groups (I), 15 vegetation types (II), and 75 formations and subformations (III) from the Jing-Jin-Ji region. However, some of the map’s vegetation unit areas are very small and difficult to distinguish. To ensure that enough training and assessment point data could be randomly selected in units II and III, we selected eight units I, 12 units II, and 39 units III from the study area (Table 1). Cultivated vegetation is mainly distributed in low areas with an altitude range of −14 to 254 m and an annual mean temperature range of 7 to 14 °C. Major cultivated plants include winter wheat and coarse grains. Scrub and grass-forb communities are mainly distributed in the north at elevations ranging from 254 to 1,440 m.

We obtained model training and assessment data on vegetation composition from field surveys and other publications. We collected a total of 3,763 vegetation points, with 2,789 of those used for model training and 974 used for model assessment. Each unit III had at least 80 vegetation points, with at least 60 of those used for model training and 20 used for model assessment. The model training and assessment data were randomly selected for each unit III. Additionally, we increased the credibility of the model assessment by first rasterizing the vector VMC onto the same grid as the modeled data and then assessing the data using the Kappa coefficient (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016).
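The per-unit random selection described above can be illustrated with a small sketch. The text does not give the exact sampling procedure, so the function name `split_points_by_unit`, the fixed hold-out of 20 points per unit, the seed, and the synthetic point identifiers below are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def split_points_by_unit(points, n_assess=20, seed=0):
    """Randomly hold out n_assess points per unit; the rest go to training.

    points: iterable of (point_id, unit) pairs. Hypothetical input format.
    """
    by_unit = defaultdict(list)
    for point_id, unit in points:
        by_unit[unit].append(point_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, assess = [], []
    for unit in sorted(by_unit):
        ids = by_unit[unit][:]
        rng.shuffle(ids)
        assess.extend((pid, unit) for pid in ids[:n_assess])
        train.extend((pid, unit) for pid in ids[n_assess:])
    return train, assess


# Synthetic example: three units with 80 points each (not the study's data).
points = [(f"u{u}_p{i}", u) for u in (1, 2, 3) for i in range(80)]
train, assess = split_points_by_unit(points)
```

With 80 points per unit, this yields 60 training and 20 assessment points for each unit, matching the minimums stated in the text; units with more points would simply contribute more training points.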

Geospatial, climate, and spectral data

We derived geospatial variables, including elevation, slope, and aspect, from the 30 m resolution Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM; Zhao et al., 2018). We then resampled these data to a 500 × 500 m grid cell size using the cubic technique in ArcGIS 10.3 (Wu et al., 2019).

We downloaded the climate data, including 19 bioclimatic variables at ~1 km resolution, from WorldClim (Fick & Hijmans, 2017) at http://worldclim.org. These climate data were also resampled to a 500 × 500 m grid cell size using the cubic technique in ArcGIS 10.3 (Wu et al., 2019). Climatic variables are important for plant ecophysiology (Mod et al., 2016) and are commonly used as bioclimatic limits in vegetation models (Sitch et al., 2003).

We acquired the MYD09A1 500M product data (sinusoidal projection; path 4 and row 26, path 4 and row 27, path 5 and row 26, path 5 and row 27) from summer (July 20, 2013) and winter (January 17, 2013) as MODIS images from the Geospatial Data Cloud at http://www.gscloud.cn. Our image pre-processing included image subset, mosaicking, and clipping in ENVI 5.2 (Deng, 2010). We obtained the land surface albedo in bands 1–7 directly from the MYD09A1 500M product and calculated the indices’ effectiveness at reflecting vegetation information (Price, Guo & Stiles, 2002; Zhou et al., 2016).

Since vegetation indices can provide information on both vegetation and environment (Bannari, Morin & Bonn, 1995), these indices are more sensitive than single spectral bands at detecting green vegetation (Bannari, Morin & Bonn, 1995; Cohen & Goward, 2004). Therefore, vegetation indices can be used for image interpretation, vegetation discrimination and prediction, and land use change detection (Bannari, Morin & Bonn, 1995; Cohen & Goward, 2004; Zhou et al., 2016). We tested the vegetation discrimination of 14 vegetation indices (Table 2).

Table 1 Classification units of the vegetation of China.

| Vegetation groups (I) | Vegetation types (II) | Formations and sub-formations (III) |
|---|---|---|
| 0 No vegetation | 0 No vegetation | 0 No vegetation |
| 1 Needleleaf forest | 1 Temperate needleleaf forest | 1 Pinus tabulaeformis forest |
| 2 Broadleaf forest | 2 Temperate broadleaf deciduous forest | 2 Quercus mongolica forest; 3 Quercus liaotungensis forest; 4 Quercus variabilis forest; 5 Robinia pseudoacacia forest; 6 Salix matsudana forest; 7 Populus davidiana forest; 8 Betula platyphylla forest |
| 3 Scrub | 3 Temperate broadleaf deciduous scrub | 9 Corylus heterophylla scrub; 10 Lespedeza bicolor scrub; 11 Prunus armeniaca var. ansu scrub; 12 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosa scrub; 13 Cotinus coggygria var. cinerea scrub; 14 Spiraea spp. scrub; 15 Ostryopsis davidiana scrub |
| 4 Steppe | 4 Temperate grass-forb meadow steppe | 16 Stipa baicalensis forb meadow steppe; 17 Filifolium sibiricum grass-forb meadow steppe |
| 4 Steppe | 5 Temperate needlegrass arid steppe | 18 Aneurolepidium chinense needlegrass steppe; 19 Stipa krylovii steppe; 20 Stipa bungiana steppe; 21 Thymus mongolicus needlegrass steppe |
| 5 Grass-forb community | 6 Temperate grass-forb community | 22 Bothriochloa ischaemum community; 23 Bothriochloa ischaemum community; 24 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosa, Bothriochloa ischaemum scrub and grass community; 25 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosa, Themeda triandra var. japonica scrub and grass community |
| 6 Meadow | 7 Temperate grass and forb meadow | 26 Arundinella hirta, Spodiopogon sibiricus forb meadow; 27 Carex spp. forb meadow |
| 6 Meadow | 8 Temperate grass and forb halophytic meadow | 28 Achnatherum splendens halophytic meadow; 29 Suaeda glauca halophytic meadow |
| 7 Swamp | 9 Cold-temperate and temperate swamp | 30 Phragmites communis swamp |
| 8 Cultural vegetation | 10 One crop annually and cold-resistant economic crops | 31 Spring wheat, naked oats, buckwheat, potatoes, flax |
| 8 Cultural vegetation | 11 One crop annually, cold-resistant economic crops, and deciduous orchards | 32 Coarse grains |
| 8 Cultural vegetation | 12 Three crops two years and two crops annually non irrigation, deciduous orchards | 33 Winter wheat, coarse grains; 34 Coarse grains; 35 Rice; 36 Winter wheat, corn, cotton; 37 Apple, pear orchard; 38 Winter wheat, corn, Chinese sorghum, sweet potatoes, cotton, tobacco, peanut, sesame, apple, pear, hawthorn, persimmon, walnut, pomegranate, grape; 39 Winter wheat, coarse grains (loamy soil) |

Table 2 The vegetation indices.

| Indices | Abbreviation | Formula |
|---|---|---|
| Ratio vegetation index | RVI | NIR/Red |
| Brightness index | BI | 0.2909×Blue + 0.2493×Green + 0.4806×Red + 0.5568×NIR + 0.4438×SWIR1 + 0.1706×SWIR2 |
| Green vegetation index | GI | −0.2728×Blue − 0.2174×Green − 0.5508×Red + 0.7221×NIR + 0.0733×SWIR1 − 0.1648×SWIR2 |
| Wetness index | WI | 0.1446×Blue + 0.1761×Green + 0.3322×Red + 0.3396×NIR − 0.6210×SWIR1 − 0.4186×SWIR2 |
| Differenced vegetation index | DVI | NIR − Red |
| Green ratio | GR | NIR/Green |
| Mid-infrared ratio | MR | NIR/SWIR1 |
| Soil-adjusted vegetation index | SAVI | (1.5×(NIR − Red))/(NIR + Red + 0.5) |
| Optimization of soil-adjusted vegetation index | OSAVI | (1.16×(NIR − Red))/(NIR + Red + 0.16) |
| Atmospherically resistant vegetation index | ARVI | (NIR − (2×Red − Blue))/(NIR + (2×Red − Blue)) |
| Normalized difference vegetation index | NDVI | (NIR − Red)/(NIR + Red) |
| Enhanced vegetation index | EVI | 2.5×[(NIR − Red)/(NIR + 6×Red − 7.5×Blue + 1)] |
| Normalized difference tillage index | NDTI | (SWIR1 − SWIR2)/(SWIR1 + SWIR2) |
| Normalized difference senescent vegetation index | NDSVI | (SWIR1 − Red)/(SWIR1 + Red) |
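As an illustration of how the Table 2 formulas translate into a per-pixel calculation, the following minimal sketch computes a subset of the indices from named band reflectances. The function name `vegetation_indices` and the reflectance values are hypothetical, and the mapping from MODIS band numbers to Blue, Green, Red, NIR, SWIR1, and SWIR2 is not stated in the text, so the function takes named bands rather than band numbers.

```python
def vegetation_indices(blue, green, red, nir, swir1, swir2):
    """Compute a subset of the Table 2 indices for one pixel's reflectances."""
    return {
        "RVI": nir / red,
        "DVI": nir - red,
        "MR": nir / swir1,
        "NDVI": (nir - red) / (nir + red),
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "NDTI": (swir1 - swir2) / (swir1 + swir2),
        # Brightness index: weighted sum of all six bands (Table 2 coefficients).
        "BI": (0.2909 * blue + 0.2493 * green + 0.4806 * red
               + 0.5568 * nir + 0.4438 * swir1 + 0.1706 * swir2),
    }


# Hypothetical reflectances for a densely vegetated pixel (not study data).
pixel = vegetation_indices(blue=0.04, green=0.08, red=0.05,
                           nir=0.40, swir1=0.20, swir2=0.10)
```

In practice the same expressions would be applied to whole raster bands at once; the scalar form is used here only to keep the example dependency-free.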

To determine the distribution predictive ability of different variables, we grouped the variables into different combinations based on the results of the Pearson correlation. We only used less correlated variables (|R| < 0.7, Pearson correlation) (Chala et al., 2017) in Combinations 1–9 (Table 3), then used variable combinations as input predictor variables to simulate vegetation distribution. Combination 1 included the less correlated variables of the summer land surface albedos from bands 1 to 7. Combination 2 included the less correlated variables of the winter land surface albedos from bands 1 to 7. Combination 3 included the less correlated variables in Combinations 1 and 2. Combination 4 included the less correlated variables of the summer vegetation indices. Combination 5 included the less correlated variables of the winter vegetation indices. Combination 6 included the less correlated variables in Combinations 4 and 5. Combination 7 included the less correlated variables from the 19 bioclimatic variables. Combination 8 included the less correlated variables from the 19 bioclimatic variables and three geospatial variables. Combination 9 included the less correlated variables in Combinations 3, 6, and 8. Combinations 10 and 11 represented the top 10 most important variables of the DT and RF methods, respectively, with Combination 9 in vegetation unit I (Table 3). The SVM and maximum likelihood classification (MLC) methods only output the simulation results of variable Combinations 1 to 6, likely due to the training samples’ weak separability (Deng, 2010).

Table 3 Variable combinations. Note: DT10 and RF10 represent the top 10 important variables of the decision tree (DT) and random forest (RF) methods, respectively, with Combination 9 at the vegetation group level. The vegetation indices and their abbreviations are shown in Table 2.

| Number | Variable combinations |
|---|---|
| 1 | Summer land surface albedos of bands 1 and 5 |
| 2 | Winter land surface albedos of bands 1 and 6 |
| 3 | Summer land surface albedos of bands 1 and 5; Winter land surface albedos of bands 1 and 6 |
| 4 | Summer vegetation indices: BI, WI, MR, NDVI, EVI |
| 5 | Winter vegetation indices: MR, NDVI, EVI, NDTI |
| 6 | Summer vegetation indices: BI, WI, MR, NDVI, EVI; Winter vegetation indices: MR, NDVI, EVI, NDTI |
| 7 | Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month |
| 8 | Slope, Aspect, Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month |
| 9 | Summer land surface albedos of band 1; Winter land surface albedos of band 6; Summer vegetation indices: BI, WI, MR, NDVI, EVI; Winter vegetation indices: MR, NDVI, EVI, NDTI; Slope, Aspect, Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month |
| 10 | DT10: Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month, Slope, Winter vegetation indices: MR, Summer land surface albedos of band 1, Summer vegetation indices: BI and EVI, Winter land surface albedos of band 6 |
| 11 | RF10: Annual precipitation, Annual mean temperature, Mean diurnal range, Slope, Summer vegetation indices: BI, MR, NDVI and EVI, Winter vegetation indices: MR and NDVI |
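The correlation screening described above (keeping only variables with |R| < 0.7) can be sketched as follows. The text does not say how one variable was chosen from each correlated pair, so the greedy, order-dependent rule used here is an assumption, as are the names `pearson` and `select_less_correlated` and the toy sample values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def select_less_correlated(variables, threshold=0.7):
    """Greedy screen (assumed rule): keep a variable only if its |r| with
    every already-kept variable is below the threshold."""
    kept = []
    for name in variables:
        if all(abs(pearson(variables[name], variables[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept


# Toy samples (hypothetical): band2 is almost a multiple of band1.
samples = {
    "band1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "band2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "band5": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = select_less_correlated(samples)
```

Because the rule is greedy, the result depends on the order in which variables are visited; a different tie-breaking rule (for example, keeping the variable with the higher model importance) would be equally consistent with the text.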

Vegetation distribution models

We used DT, RF, MLC, and SVM vegetation distribution models in this study. The DT model is a divisive, monothetic, and supervised classifier often used for species distribution modeling and related applications (Franklin, 2010). It is computationally fast and easy to understand and implement. It uses classification or regression algorithms to generate classification rules and then visualizes those rules into simple tree graphics (Hastie, Tibshirani & Friedman, 2009; Zhou et al., 2016). The DT model calculates the most significant variables contributing to the model (Deng, 2010). We used a DT with five layers, 40 samples in the smallest parent node, and 10 samples in the smallest child node.

The RF model is an ensemble method that has been applied in risk assessment and species distribution modeling studies (Cutler et al., 2007; Zhang & Dong, 2017). The RF model creates and combines different DTs to produce considerably more accurate classifications that are unaffected by noise or overtraining (Burai et al., 2015; Cutler et al., 2007; Gislason, Benediktsson & Sveinsson, 2006). The RF model also calculates the most significant variables that contribute to the model (Cutler et al., 2007). Running an RF model requires defined parameters, including tree number, number of randomly selected features, and node impurity function. We generated the RF model in EnMAP-Box, a license-free and platform-independent software interface designed to process hyperspectral remote sensing data, which was developed by the Humboldt University of Berlin. EnMAP-Box includes built-in applications for processing hyperspectral data, such as SVM and RF for classification of image data (Held et al., 2014). We used the default settings in EnMAP-Box with 100 trees. The number of randomly selected features was equal to the square root of the number of all features, and we used a Gini coefficient for the node impurity function (Rabe et al., 2014; Ma, Gao & Gu, 2019; van der Linden et al., 2015; Zhou et al., 2016).
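Two of the RF settings named above, the Gini node impurity and the square-root rule for the number of randomly selected features, can be made concrete with a short sketch. This is not EnMAP-Box code; the function names are hypothetical, and rounding the square root down to an integer is an assumption, since the text does not say how non-integer values are handled.

```python
from collections import Counter
from math import sqrt


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def features_per_split(n_features):
    """Number of features tried at each split under the square-root rule.
    Rounding down is an assumption; at least one feature is always used."""
    return max(1, int(sqrt(n_features)))


# A pure node has impurity 0; an even two-class node has impurity 0.5.
pure = gini_impurity(["forest"] * 4)
mixed = gini_impurity(["forest", "forest", "scrub", "scrub"])

# Combination 7 has 4 variables and Combination 9 has 17 (Table 3).
tried_comb7 = features_per_split(4)
tried_comb9 = features_per_split(17)
```

Under these assumptions, each split would consider 2 candidate variables for Combination 7 and 4 for Combination 9.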

The MLC model is one of the most commonly used supervised image classification methods. MLC’s classification rules use the statistics of the Gaussian probability density function to assign each pixel to the class with the highest probability. Although the MLC method usually generates similar or more accurate classifications than other methods, it is not applicable when there are fewer training samples than input predictors (Burai et al., 2015; Zhou et al., 2016).

The SVM model is a supervised machine learning model used for classification and regression. It is a complex and widely used method that can output more accurate predictions (Burai et al., 2015) than other methods. The SVM model searches for an optimal plane in a multidimensional space to divide all sample elements into two categories, making the distance between the closest points in the two classes as large as possible (Kabacoff, 2016). Running an SVM model requires a defined kernel parameter g and regularization parameter c. In this study, we generated the SVM model in the EnMAP-Box. We applied the default EnMAP-Box settings for the SVM model, in which the parameter g ranged from 0.01 to 1000 and the parameter c ranged from 0.1 to 1000. Parameters g and c were tested using a grid search with internal performance estimation, and we used those with the best performance for data training (Lin et al., 2014; van der Linden et al., 2014; van der Linden et al., 2015).

We generated the predicted vegetation maps of the three classification units using the DT, RF, MLC, and SVM methods with a resolution of 500 m. We selected all 11 variable combinations as the input variables for each method. The DT and RF method results indicated which variables were most important for vegetation discrimination.


Model assessment

We used the VMC and a total of 974 vegetation points to assess the overall accuracy and Kappa coefficient of every predicted vegetation map. Kappa coefficient values ranging from 0.4 to 0.55 indicated moderate agreement, from 0.56 to 0.8 indicated substantial agreement, and from 0.81 to 1 indicated almost perfect agreement (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016). When the Kappa coefficient value was greater than 0.4, the assessed predicted map was considered acceptable.
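The two assessment metrics can be computed from a confusion matrix of reference versus predicted classes. The sketch below uses the standard definition of Cohen's Kappa and the agreement bands quoted above; the function names, the label for values under 0.4, the rounding to two decimals before banding, and the example matrix are assumptions for illustration and are not taken from the study.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix.

    confusion[i][j] is the number of points of reference class i that were
    predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


def agreement_label(kappa):
    """Map a Kappa value to the agreement bands used in the text."""
    k = round(kappa, 2)  # bands are given to two decimals
    if k >= 0.81:
        return "almost perfect"
    if k >= 0.56:
        return "substantial"
    if k >= 0.40:
        return "moderate"
    return "below moderate"  # hypothetical label; not named in the text


# Hypothetical two-class example (not study data).
oa, kappa = accuracy_and_kappa([[40, 10], [5, 45]])
label = agreement_label(kappa)
```

For this example matrix, 85 of 100 points are classified correctly, chance agreement is 0.5, and the resulting Kappa of 0.7 falls in the substantial band.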

RESULTS

Unit I modeling and assessment

The RF model’s results were better than the results of the DT, MLC, and SVM models (Table 4). The RF model had a Kappa coefficient larger than 0.4 when using variable Combinations 6 to 11 assessed by field point data, with an overall accuracy of 50% to 72%. The RF model had a Kappa coefficient larger than 0.56 when using variable Combinations 7 to 11 assessed by field data, with an overall accuracy of 68% to 72%. The RF model had the highest Kappa coefficient of 0.66 and the highest overall accuracy of 72% when using variable Combination 7. The DT model had a Kappa coefficient larger than 0.4 when using variable Combinations 7 to 11 assessed by field point data, with an overall accuracy of 54% to 56%. The DT model had no Kappa coefficient larger than 0.56 when using all variable combinations. After VMC assessment, we found the highest Kappa coefficient was 0.38, with an overall accuracy of 57%, in the RF model using variable Combinations 9 to 11 (Table 4, Fig. 2).

Unit II modeling and assessment

The RF model results were better than the results of the other three models. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 66%–70% and 54%–55% for field point data and VMC assessments, respectively. The RF model using Combinations 7 to 11 had a Kappa coefficient larger than 0.56 and an overall accuracy of 66%–70% when assessed by field point data. The RF model had the highest Kappa coefficient of 0.65 and the highest overall accuracy of 70% when using variable Combination 7. The DT model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 53%–55% and 65%–72% for field point data and VMC assessments, respectively. The DT model had the highest Kappa coefficient of 0.54 and overall accuracy of 72% when using variable Combination 7. When assessed by the VMC, the DT model had a larger Kappa coefficient and greater overall accuracy than the RF model (Table 5, Fig. 3).

UnitIIIm

odelingand

assessment

Only

theRFmodelcould

simulate

vegetationdistribution

inunit

IIITheRFmodel

usingvariable

Com

binations7to

11had

aKappa

coefficientlarger

than04

andan

overallaccuracyof55

ndash58assessed

byfield

pointdataTheRFmodelusing

variableCom

bination7had

thehighestK

appacoefficientof057

(theonly

modelw

ithaKappa

coefficientlargerthan056)and

thehighestoverallaccuracy

of58assessed

byfield

point

Yietal(2020)PeerJDOI107717peerj9839

1026

Table 4 Model assessment of vegetation groups by field point data and VMC. Variable combinations were shown in Table 3.

Variable      Decision tree               Random forest               Support vector machine      Maximum likelihood classification
combination   Point data    VMC           Point data    VMC           Point data    VMC           Point data    VMC
              OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC
1             34     0.18   55     0.22   37     0.24   32     0.09   36     0.21   53     0.21   23     0.08   11     0.02
2             38     0.20   52     0.23   39     0.27   37     0.13   35     0.20   55     0.24   18     0.07   9      0.03
3             45     0.31   54     0.26   47     0.36   45     0.21   41     0.27   54     0.27   24     0.12   15     0.05
4             32     0.16   46     0.16   42     0.30   42     0.17   37     0.22   57     0.26   11     0.04   3      0.01
5             31     0.11   59     0.14   44     0.32   44     0.19   36     0.22   51     0.22   9      0.04   4      0.02
6             41     0.26   44     0.18   50     0.40   52     0.27   42     0.29   54     0.27   13     0.08   4      0.03
7             54     0.45   57     0.34   72     0.66   55     0.35
8             55     0.46   56     0.35   69     0.63   56     0.37
9             55     0.46   53     0.34   68     0.61   57     0.38
10            55     0.46   53     0.33   69     0.63   57     0.38
11            56     0.46   56     0.36   68     0.62   57     0.38

Notes. VMC, the Vegetation Map of the People's Republic of China; OA, Overall accuracy (%); KC, Kappa coefficient. The published table highlights Kappa coefficients larger than 0.56 and Kappa coefficients larger than 0.4 and less than 0.56.


Figure 2 The modeled vegetation map of vegetation groups with the highest accuracy by four methods and the VMC in the Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.


data. The Kappa coefficients in all models were less than 0.4 when assessed by the VMC (Table 6, Fig. 4).
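The overall accuracy (OA) and Kappa coefficient (KC) reported in Tables 4 to 6 are both derived from a confusion matrix of reference classes versus modeled classes. The following Python sketch shows the standard calculation of the two statistics. It is an illustration only, not the assessment code used in this study; the function name `overall_accuracy_and_kappa` and the two-class example matrix are hypothetical.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of assessment points whose reference
    class is i and whose modeled class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no points")

    # Observed agreement: share of points on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Chance agreement: product of row and column marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    if expected == 1.0:
        raise ValueError("kappa is undefined when only one class is present")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class example (rows: reference, columns: modeled).
example = [[40, 10],
           [5, 45]]
oa, kc = overall_accuracy_and_kappa(example)
print(f"OA = {oa:.2f}, KC = {kc:.2f}")
```

In this illustrative matrix, 85 of 100 points lie on the diagonal (OA = 0.85), the chance agreement is 0.50, and the Kappa coefficient is therefore 0.70. Because Kappa discounts chance agreement, it can be low even when OA looks moderate, as in several VMC assessments in Tables 4 to 6.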

The abbreviations are the same as in Table 4.

Important variables
For the RF model, eight of the top 10 most important variables were the same across the different vegetation units: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio and NDVI of winter vegetation index, brightness index and NDVI of summer vegetation index). For the DT model, nine of the top 10 most important variables were the same across the different vegetation units: four climate variables (annual mean temperature, mean diurnal range, precipitation of the driest month, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio of winter vegetation index, brightness index of summer vegetation index, summer surface albedo of band 1, and winter surface albedo of band 6) (Table 7).

DISCUSSION
Vegetation classification units
Vegetation classification is an important and complex system with multiple levels. Higher level classification methods not only accurately classify vegetation, but they can also describe


Table 5 Model assessment of vegetation types by field point data and VMC. Variable combinations were shown in Table 3.

Variable      Decision tree               Random forest               Support vector machine      Maximum likelihood classification
combination   Point data    VMC           Point data    VMC           Point data    VMC           Point data    VMC
              OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC
1             42     0.24   63     0.33   32     0.22   23     0.09   32     0.18   40     0.18   6      0.02   7      0.00
2             44     0.27   58     0.31   34     0.23   30     0.14   31     0.18   44     0.24   5      0.02   14     0.00
3             43     0.30   58     0.35   44     0.34   38     0.22   37     0.26   43     0.25   9      0.05   13     0.00
4             36     0.20   47     0.20   39     0.29   31     0.15   32     0.19   43     0.21   13     0.07   6      0.02
5             32     0.14   59     0.23   41     0.31   36     0.19   34     0.22   43     0.22   6      0.03   6      0.03
6             36     0.23   45     0.24   47     0.38   44     0.27   40     0.29   43     0.25   14     0.09   21     0.06
7             55     0.46   72     0.54   70     0.65   54     0.41
8             53     0.44   68     0.52   68     0.63   55     0.43
9             54     0.45   65     0.49   66     0.60   55     0.43
10            54     0.45   65     0.49   68     0.63   55     0.43
11            53     0.44   68     0.52   67     0.62   55     0.43

Notes. VMC, the Vegetation Map of the People's Republic of China; OA, Overall accuracy (%); KC, Kappa coefficient. The published table highlights Kappa coefficients larger than 0.56 and Kappa coefficients larger than 0.4 and less than 0.56.


Figure 3 The modeled vegetation map of vegetation types with the highest accuracy by four methods and the VMC in the Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.


ecosystem diversity even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions that are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as plant sets sharing similar perturbation response effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine


Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations were shown in Table 3.

Variable      Decision tree               Random forest               Support vector machine      Maximum likelihood classification
combination   Point data    VMC           Point data    VMC           Point data    VMC           Point data    VMC
              OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC
1             23     0.14   19     0.08   20     0.18   5      0.02   11     0.09   6      0.03   8      0.06   8      0.04
2             22     -0.04  49     0.04   19     0.17   6      0.03   13     0.11   7      0.04   8      0.06   13     0.05
3             26     0.14   45     0.23   29     0.27   9      0.07   21     0.19   10     0.07   12     0.09   13     0.07
4             30     0.20   30     0.04   22     0.20   7      0.04   16     0.14   6      0.03   9      0.07   8      0.04
5             33     0.01   67     0.00   22     0.20   7      0.04   15     0.13   5      0.03   11     0.09   10     0.04
6             26     0.15   22     0.02   31     0.30   11     0.08   21     0.19   8      0.06   12     0.09   15     0.08
7             33     0.20   52     0.27   58     0.57   23     0.20
8             27     0.17   34     0.18   55     0.54   23     0.20
9             25     0.15   22     0.15   55     0.53   22     0.20
10            30     0.17   41     0.22   56     0.55   23     0.21
11            31     0.20   41     0.22   56     0.55   23     0.20

Notes. VMC, the Vegetation Map of the People's Republic of China; OA, Overall accuracy (%); KC, Kappa coefficient. The published table highlights Kappa coefficients larger than 0.56 and Kappa coefficients larger than 0.4 and less than 0.56.


Figure 4 The modeled vegetation map of formations and sub-formations with the highest accuracy by four methods and the VMC in the Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.


learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracy of the vegetation distribution simulations in units I and II was similar to each other and higher than unit III's simulation (Tables 4–6).

Different model performances
We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same


Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices were shown in Table 2.

Vegetation groups
Rank  Decision tree                     Standardized  Random forest                         Normalized
      important variables               importance    important variables                   importance
1     Annual mean temperature           1.00          Annual mean temperature               3.68
2     Annual precipitation              0.88          Slope                                 2.94
3     Slope                             0.80          Mean diurnal range                    2.60
4     Winter vegetation index MR        0.36          Annual precipitation                  2.38
5     Mean diurnal range                0.33          Summer vegetation index BI            1.88
6     Summer surface albedo of band 1   0.29          Winter vegetation index NDVI          1.37
7     Summer vegetation index BI        0.28          Summer vegetation index EVI           1.36
8     Precipitation of driest month     0.25          Winter vegetation index MR            1.30
9     Summer vegetation index EVI       0.23          Summer vegetation index NDVI          1.22
10    Winter surface albedo of band 6   0.19          Summer vegetation index MR            1.12

Vegetation types
Rank  Decision tree                     Standardized  Random forest                         Normalized
      important variables               importance    important variables                   importance
1     Annual mean temperature           1.00          Annual mean temperature               3.51
2     Slope                             0.83          Slope                                 3.35
3     Annual precipitation              0.51          Mean diurnal range                    3.06
4     Winter vegetation index MR        0.30          Annual precipitation                  2.80
5     Mean diurnal range                0.28          Summer vegetation index BI            1.84
6     Summer vegetation index EVI       0.22          Winter vegetation index NDVI          1.61
7     Precipitation of driest month     0.21          Winter vegetation index MR            1.45
8     Summer vegetation index BI        0.20          Summer vegetation index WI            1.31
9     Summer surface albedo of band 1   0.19          Precipitation of driest month         1.24
10    Winter surface albedo of band 6   0.14          Summer vegetation index NDVI          1.22

Formations and sub-formations
Rank  Decision tree                     Standardized  Random forest                         Normalized
      important variables               importance    important variables                   importance
1     Annual mean temperature           1.00          Annual mean temperature               4.16
2     Annual precipitation              0.86          Annual precipitation                  3.28
3     Slope                             0.63          Mean diurnal range                    3.25
4     Mean diurnal range                0.52          Slope                                 2.24
5     Precipitation of driest month     0.52          Precipitation of driest month         2.16
6     Winter vegetation index MR        0.40          Summer vegetation index BI            1.83
7     Summer surface albedo of band 1   0.32          Summer vegetation index NDVI          1.70
8     Summer vegetation index BI        0.32          Winter vegetation index NDVI          1.61
9     Summer vegetation index WI        0.31          Winter vegetation index MR            1.47
10    Winter surface albedo of band 6   0.28          Summer vegetation indices EVI and MR  1.32


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogenous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010), and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60%–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and the use of sufficiently representative training samples and appropriate input variables. In our study, the SVM and MLC models produced output only for variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
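The contrast between a single decision tree and a random forest can be sketched in a few lines of Python. This is a toy illustration under strong simplifications, not the software used in this study: each tree is reduced to a one-split stump, so the random subset of predictors is drawn once per tree, and the function names (`best_stump`, `train_forest`, `predict`), the two predictors (temperature, slope), and the class labels are all hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(rows, labels, feature_ids):
    """Best one-split tree (feature, threshold, left label, right label)
    among the given features, chosen by weighted Gini impurity."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no valid split: fall back to the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (None, None, majority, majority)
    return best[1:]


def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train stumps on bootstrap samples, each with a random predictor subset."""
    rng = random.Random(seed)
    n = len(rows)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # bootstrap sample of points
        feats = rng.sample(all_features, n_features)  # random subset of predictors
        forest.append(best_stump([rows[i] for i in idx],
                                 [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps in the forest."""
    votes = Counter()
    for f, threshold, left_label, right_label in forest:
        if f is None or row[f] <= threshold:
            votes[left_label] += 1
        else:
            votes[right_label] += 1
    return votes.most_common(1)[0][0]


# Hypothetical training points: (annual mean temperature, slope).
rows = [(5, 20), (6, 22), (7, 25), (8, 30), (12, 2), (13, 3), (14, 4), (15, 5)]
labels = ["forest"] * 4 + ["grassland"] * 4
forest = train_forest(rows, labels)
print(predict(forest, (4, 35)), predict(forest, (16, 1)))
```

A single stump trained on one bootstrap sample depends heavily on which points happen to be drawn, which mirrors the instability of the DT model noted above; averaging many such trees by majority vote is what makes the ensemble more stable.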

Important variables in vegetation classification models
Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important


variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.
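As an example of a spectral variable that reflects vegetation information, NDVI is conventionally computed from near-infrared (NIR) and red reflectance as (NIR - red) / (NIR + red). The sketch below illustrates that conventional formula only. The function name `ndvi`, the reflectance values, and the handling of a zero denominator are assumptions for illustration; the index definitions actually used in this study are those given in its Table 2.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    nir and red are surface reflectances for the near-infrared and red
    bands (assumed here to be non-negative numbers on the same scale).
    """
    denominator = nir + red
    if denominator == 0:
        # Assumption: a pixel with no signal in either band is given 0.
        return 0.0
    return (nir - red) / denominator


# Hypothetical pixels: dense summer canopy versus sparse winter cover.
summer_pixel = ndvi(nir=0.50, red=0.10)
winter_pixel = ndvi(nir=0.25, red=0.20)
print(round(summer_pixel, 3), round(winter_pixel, 3))
```

Computing the same index from summer and winter images gives two distinct predictors, which is how seasonal indices such as the summer and winter NDVI in Table 7 can carry different information.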

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models to >80% for most forest types. Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors, such as soil moisture, pH, and nutrients, should be considered. Other factors, such as actual light, disturbance, biotic interactions, land use, and bioclimatic information, could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy
In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data


preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and to the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS
Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (Mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS
We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was funded by the National Key R&D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Grant Disclosures
The following grant information was disclosed by the authors:
National Key R&D Program of China: 2018YFC0506903.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0.

REFERENCES
Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.

Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.

Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: DR W. Junk Publishers.

Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.


Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.


Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.


Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box-A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.


Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box. Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.


Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704X.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.



INTRODUCTION

Vegetation is an essential component of terrestrial ecosystems and landscapes (Editorial Committee of Vegetation Map of China, 2007). Environmental research, resource management, and conservation planning require vegetation distribution maps (Franklin, 2010) to better understand, use, and monitor vegetation. Vegetation patterns and distributions are affected by the climate (Chen et al., 2015; Zhang et al., 2018) and other disturbances, particularly those caused by changes in land use (Hansen et al., 2013; Wehkamp et al., 2018). Human disturbances, such as industrialization, urbanization, population growth, and land use change for agriculture, strongly influence the environment by greatly altering vegetation patterns, making exact mapping a significant challenge (Xie, Sha & Yu, 2008; Zhou et al., 2016).

Field surveys, the traditional method used to map vegetation, are costly and labor-intensive (Newell & Leathwick, 2005; Zhou et al., 2016). Mapping with remote sensing data has also been a popular method over the last 30 years (Xie, Sha & Yu, 2008). This method makes it possible to obtain a wide range of reliable data from remote sensing images, and vegetation boundaries are updated by visually interpreting the images alongside field surveys (Zhang et al., 2008). However, determining vegetation units and their boundaries by visual interpretation can produce inaccurate results: researchers may get different results from the same images for the same study areas (Bie & Beckett, 1973; Pfeffer, Pebesma & Burrough, 2003). Furthermore, in both field survey and remote sensing methods, vegetation unit boundaries are drawn manually based on climate, elevation, and soil type information, which can be inaccurate in transition areas (Zhang et al., 2008). Using simulation models in combination with field and remote sensing data may be an effective alternative for mapping vegetation.

Changes in the environment can affect vegetation composition, structure, function, and spatial distribution. Environmental variables have been used to simulate the global distribution of vegetation (Dilts et al., 2015; Mod et al., 2016). Simulation models are usually developed to test how environmental variables control vegetation distribution (Guisan & Zimmermann, 2000). Modern remote sensing data and software make it more convenient than ever before to produce predictive vegetation maps (Franklin, 1995).

Predictive vegetation mapping uses environmental variables and various models based on niche theory and gradient analysis to visualize communities in geographic space (Dilts et al., 2015; Lany et al., 2019). Other methods based on statistics and machine learning have also been used to simulate vegetation distribution. Predictive vegetation mapping includes various statistical methods, such as the generalized linear model, the generalized additive model, and multivariate statistical approaches (Lany et al., 2019; Prasad, Iverson & Liaw, 2006). Recently, machine learning modeling methods have been used to map the distribution of both vegetation communities and individual species. These methods include the support vector machine (SVM), decision tree (DT), and artificial neural network (Guisan & Zimmermann, 2000; Hastie, Tibshirani & Friedman, 2009; Zhou et al., 2016). These machine learning models have fewer limitations and can produce more reliable results than traditional vegetation modeling methods (Hastie, Tibshirani & Friedman, 2009). Advanced machine learning techniques can integrate spectral and spatial predictors and improve classification accuracy by retaining important information about vegetation composition and structural differences (Sesnie et al., 2010). Machine learning models efficiently and cost-effectively produce vegetation maps without the general inaccuracies caused by visual interpretation (Franklin, 2010).

The Jing-Jin-Ji region, also known as the Beijing-Tianjin-Hebei urban agglomeration, is the center of northern Chinese politics, culture, and economy. Because of its extent, it faces significant problems such as unbalanced regional development and the struggle between economic growth and limited resources. The region's larger cities, including Beijing and Tianjin, have large populations, developed economies, and abundant educational resources. However, these big cities face issues of limited natural resources and serious ecological and environmental pollution. In particular, Beijing's large population requires limited resources such as water, land, and vegetation (Wang & Gong, 2018). Breaking up administrative divisions may be the best method to coordinate regional development (Wang et al., 2019). The new Xiong'an area, located in Hebei province, is being constructed to relocate some of Beijing's population. The development of areas like Xiong'an is affected by the surrounding natural environment. To better integrate the environmental carrying capacity and socioeconomic development of the Jing-Jin-Ji region, including the new Xiong'an area, accurate vegetation maps with adequate temporal resolution are necessary. The most recent vegetation map of the Jing-Jin-Ji region is the Vegetation Map of the People's Republic of China (VMC), with a scale of 1:1,000,000 (Editorial Committee of Vegetation Map of China, 2007). Most of its data come from a field survey conducted between 1980 and 1990, meaning its temporal and spatial scales are both outdated.

In this study, we integrated geospatial, climate, and spectral data to simulate vegetation distribution through four different models across three vegetation classification units. This research differed from that of Zhou et al. (2016) in three ways. Firstly, our research area was the Jing-Jin-Ji region, located in the North China Plain and affected by high socioeconomic disturbance, while the Qilian Mountain area studied by Zhou et al. (2016) is characterized by complex terrain but without high socioeconomic disturbance. Secondly, the predictive variables, as well as the combinations of these variables, were different from those of Zhou et al. (2016). Thirdly, we compared four modeling methods for simulating the distribution of vegetation at three vegetation classification levels, while only three models were used for simulation at two vegetation classification levels by Zhou et al. (2016). Our primary objectives were to: (1) determine the best modeling method for vegetation affected by high socioeconomic disturbance; (2) create an improved vegetation map of the Jing-Jin-Ji region; (3) determine the predictive abilities of different models across different vegetation classification units; and (4) determine which variables enhanced the classification accuracy for vegetation mapping.

MATERIALS & METHODS

Study area

The Jing-Jin-Ji region is located in the northern part of the North China Plain. It extends from 113°04′ to 119°53′E and from 36°01′ to 42°37′N and is bordered by Taihang Mountain in the west, Yanshan Mountain in the north, and the Bohai Sea in the east. The region includes Beijing, Tianjin, and Hebei Province (Fig. 1). Jing-Jin-Ji has a population of approximately 110 million people and covers an area of approximately 216,000 km² (Wang et al., 2019). The region lies in a temperate monsoon climate zone, with an elevation range of −14 to 2,837 m (Fig. 1). The annual precipitation ranges from 305 to 711 mm, with increased precipitation at lower altitudes. The annual mean temperature ranges from −3 to 14 °C, with colder averages at higher elevations. The amount of precipitation in the region gradually decreases moving from the southeast to the northwest, while temperature changes show the reverse pattern.

Figure 1 The location and DEM of the Jing-Jin-Ji region. Full-size DOI: 10.7717/peerj.9839/fig-1


Vegetation and training data

The VMC, completed in 2007 and based on field survey data, included eight vegetation groups (I), 15 vegetation types (II), and 75 formations and subformations (III) in the Jing-Jin-Ji region. However, some of the map's vegetation unit areas are very small and difficult to distinguish. To ensure that enough training and assessment point data could be randomly selected in units II and III, we selected eight units I, 12 units II, and 39 units III from the study area (Table 1). Cultivated vegetation is mainly distributed in low areas, with an altitude range of −14 to 254 m and an annual mean temperature range of 7 to 14 °C. Major cultivated plants include winter wheat and coarse grains. Scrub and grass-forb communities are mainly distributed in the north, at elevations ranging from 254 to 1,440 m.

We obtained model training and assessment data on vegetation composition from field surveys and other publications. We collected a total of 3,763 vegetation points, with 2,789 of those used for model training and 974 used for model assessment. Each unit III had at least 80 vegetation points, with at least 60 of those used for model training and 20 used for model assessment. The model training and assessment data were randomly selected for each unit III. Additionally, we increased the credibility of the model assessment by first rasterizing the vector VMC onto the same grid as the modeled data and then assessing the data using the Kappa coefficient (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016).
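The per-unit random selection of training and assessment points can be illustrated with a short sketch. This is not the authors' procedure: the function name, the fixed 25% assessment fraction, the random seed, and the example points are all assumptions chosen so that a unit with 80 points yields 60 training and 20 assessment points.

```python
import random
from collections import defaultdict


def split_points_by_unit(points, assess_fraction=0.25, seed=0):
    """Randomly split (point_id, unit_label) pairs into training and
    assessment sets, drawing the assessment share separately in each unit.

    assess_fraction and seed are illustrative assumptions, not values
    reported in the study.
    """
    rng = random.Random(seed)
    by_unit = defaultdict(list)
    for point in points:
        by_unit[point[1]].append(point)

    train, assess = [], []
    for unit in sorted(by_unit):
        members = by_unit[unit][:]
        rng.shuffle(members)  # random selection within the unit
        n_assess = round(len(members) * assess_fraction)
        assess.extend(members[:n_assess])
        train.extend(members[n_assess:])
    return train, assess


# Hypothetical example: two units with 80 points each.
points = [(i, "unit_A") for i in range(80)] + [(i, "unit_B") for i in range(80, 160)]
train, assess = split_points_by_unit(points)
```

Splitting inside each unit, rather than across the pooled points, keeps rare units represented in both sets.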

Geospatial, climate, and spectral data

We derived geospatial variables, including elevation, slope, and aspect, from the 30 m resolution Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM; Zhao et al., 2018). We then resampled these data to a 500 × 500 m grid cell size using the cubic technique in ArcGIS 10.3 (Wu et al., 2019).

We downloaded the climate data, including 19 bioclimatic variables at ∼1 km resolution, from WorldClim (Fick & Hijmans, 2017) at http://worldclim.org. These climate data were also resampled to a 500 × 500 m grid cell size using the cubic technique in ArcGIS 10.3 (Wu et al., 2019). Climatic variables are important for plant ecophysiology (Mod et al., 2016) and are commonly used as bioclimatic limits in vegetation models (Sitch et al., 2003).

We acquired the MYD09A1 500M product data (sinusoidal projection; path 4 and row 26, path 4 and row 27, path 5 and row 26, path 5 and row 27) from summer (July 20, 2013) and winter (January 17, 2013) as MODIS images from the Geospatial Data Cloud at http://www.gscloud.cn. Our image pre-processing included image subsetting, mosaicking, and clipping in ENVI 5.2 (Deng, 2010). We obtained the land surface albedo in bands 1–7 directly from the MYD09A1 500M product and calculated the indices' effectiveness at reflecting vegetation information (Price, Guo & Stiles, 2002; Zhou et al., 2016).

Since vegetation indices can provide information on both vegetation and environment (Bannari, Morin & Bonn, 1995), these indices are more sensitive than single spectral bands at detecting green vegetation (Bannari, Morin & Bonn, 1995; Cohen & Goward, 2004). Therefore, vegetation indices can be used for image interpretation, vegetation discrimination and prediction, and land use change detection (Bannari, Morin & Bonn, 1995; Cohen & Goward, 2004; Zhou et al., 2016). We tested the vegetation discrimination of 14 vegetation indices (Table 2).

Table 1 Classification units of the vegetation of China.

| Vegetation groups (I) | Vegetation types (II) | Formations and sub-formations (III) |
|---|---|---|
| 0 No vegetation | 0 No vegetation | 0 No vegetation |
| 1 Needleleaf forest | 1 Temperate needleleaf forest | 1 Pinus tabulaeformis forest |
| 2 Broadleaf forest | 2 Temperate broadleaf deciduous forest | 2 Quercus mongolica forest; 3 Quercus liaotungensis forest; 4 Quercus variabilis forest; 5 Robinia pseudoacacia forest; 6 Salix matsudana forest; 7 Populus davidiana forest; 8 Betula platyphylla forest |
| 3 Scrub | 3 Temperate broadleaf deciduous scrub | 9 Corylus heterophylla scrub; 10 Lespedeza bicolor scrub; 11 Prunus armeniaca var. ansu scrub; 12 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosa scrub; 13 Cotinus coggygria var. cinerea scrub; 14 Spiraea spp. scrub; 15 Ostryopsis davidiana scrub |
| 4 Steppe | 4 Temperate grass-forb meadow steppe | 16 Stipa baicalensis forb meadow steppe; 17 Filifolium sibiricum grass-forb meadow steppe |
| 4 Steppe | 5 Temperate needlegrass arid steppe | 18 Aneurolepidium chinense needlegrass steppe; 19 Stipa krylovii steppe; 20 Stipa bungeana steppe; 21 Thymus mongolicus needlegrass steppe |
| 5 Grass-forb community | 6 Temperate grass-forb community | 22 Bothriochloa ischaemum community; 23 Bothriochloa ischaemum community; 24 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosa, Bothriochloa ischaemum scrub and grass community; 25 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosa, Themeda triandra var. japonica scrub and grass community |
| 6 Meadow | 7 Temperate grass and forb meadow | 26 Arundinella hirta, Spodiopogon sibiricus forb meadow; 27 Carex spp. forb meadow |
| 6 Meadow | 8 Temperate grass and forb halophytic meadow | 28 Achnatherum splendens halophytic meadow; 29 Suaeda glauca halophytic meadow |
| 7 Swamp | 9 Cold-temperate and temperate swamp | 30 Phragmites communis swamp |
| 8 Cultural vegetation | 10 One crop annually and cold-resistant economic crops | 31 Spring wheat, naked oats, buckwheat, potatoes, flax |
| 8 Cultural vegetation | 11 One crop annually, cold-resistant economic crops and deciduous orchards | 32 Coarse grains |
| 8 Cultural vegetation | 12 Three crops two years and two crops annually, non-irrigation, deciduous orchards | 33 Winter wheat, coarse grains; 34 Coarse grains; 35 Rice; 36 Winter wheat, corn, cotton; 37 Apple, pear orchard; 38 Winter wheat, corn, Chinese sorghum, sweet potatoes, cotton, tobacco, peanut, sesame, apple, pear, hawthorn, persimmon, walnut, pomegranate, grape; 39 Winter wheat, coarse grains (loamy soil) |

Table 2 The vegetation indices.

| Indices | Abbreviation | Formula |
|---|---|---|
| Ratio vegetation index | RVI | NIR/Red |
| Brightness index | BI | 0.2909 Blue + 0.2493 Green + 0.4806 Red + 0.5568 NIR + 0.4438 SWIR1 + 0.1706 SWIR2 |
| Green vegetation index | GI | −0.2728 Blue − 0.2174 Green − 0.5508 Red + 0.7221 NIR + 0.0733 SWIR1 − 0.1648 SWIR2 |
| Wetness index | WI | 0.1446 Blue + 0.1761 Green + 0.3322 Red + 0.3396 NIR − 0.6210 SWIR1 − 0.4186 SWIR2 |
| Differenced vegetation index | DVI | NIR − Red |
| Green ratio | GR | NIR/Green |
| Mid-infrared ratio | MR | NIR/SWIR1 |
| Soil-adjusted vegetation index | SAVI | 1.5(NIR − Red)/(NIR + Red + 0.5) |
| Optimization of soil-adjusted vegetation index | OSAVI | 1.16(NIR − Red)/(NIR + Red + 0.16) |
| Atmospherically resistant vegetation index | ARVI | (NIR − (2Red − Blue))/(NIR + (2Red − Blue)) |
| Normalized difference vegetation index | NDVI | (NIR − Red)/(NIR + Red) |
| Enhanced vegetation index | EVI | 2.5[(NIR − Red)/(NIR + 6Red − 7.5Blue + 1)] |
| Normalized difference tillage index | NDTI | (SWIR1 − SWIR2)/(SWIR1 + SWIR2) |
| Normalized difference senescent vegetation index | NDSVI | (SWIR1 − Red)/(SWIR1 + Red) |
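The index formulas of Table 2 can be evaluated per pixel from six reflectance values, as in the sketch below. The function name and the example reflectances are hypothetical, and the mapping from MODIS band numbers to the Blue, Green, Red, NIR, SWIR1, and SWIR2 inputs is left to the caller because it is not stated in the text.

```python
def vegetation_indices(blue, green, red, nir, swir1, swir2):
    """Return the 14 indices of Table 2 for one pixel of surface reflectance."""
    bands = (blue, green, red, nir, swir1, swir2)

    def weighted(coefficients):
        # Linear combination used by the brightness, greenness, and wetness indices.
        return sum(c * b for c, b in zip(coefficients, bands))

    return {
        "RVI": nir / red,
        "BI": weighted((0.2909, 0.2493, 0.4806, 0.5568, 0.4438, 0.1706)),
        "GI": weighted((-0.2728, -0.2174, -0.5508, 0.7221, 0.0733, -0.1648)),
        "WI": weighted((0.1446, 0.1761, 0.3322, 0.3396, -0.6210, -0.4186)),
        "DVI": nir - red,
        "GR": nir / green,
        "MR": nir / swir1,
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5),
        "OSAVI": 1.16 * (nir - red) / (nir + red + 0.16),
        "ARVI": (nir - (2 * red - blue)) / (nir + (2 * red - blue)),
        "NDVI": (nir - red) / (nir + red),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "NDTI": (swir1 - swir2) / (swir1 + swir2),
        "NDSVI": (swir1 - red) / (swir1 + red),
    }


# Hypothetical reflectances for a densely vegetated pixel.
pixel = vegetation_indices(blue=0.04, green=0.08, red=0.06,
                           nir=0.40, swir1=0.20, swir2=0.10)
```

Ratio-based indices are undefined where a denominator is zero (for example over no-data pixels), so a production version would need to mask such pixels first.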

To determine the predictive ability of different variables, we grouped the variables into different combinations based on the results of the Pearson correlation. We only used less correlated variables (|R| < 0.7, Pearson correlation) (Chala et al., 2017) in Combinations 1–9 (Table 3), then used the variable combinations as input predictor variables to simulate vegetation distribution. Combination 1 included the less correlated variables of the summer land surface albedos from bands 1 to 7. Combination 2 included the less correlated variables of the winter land surface albedos from bands 1 to 7. Combination 3 included the less correlated variables in Combinations 1 and 2. Combination 4 included the less correlated variables of the summer vegetation indices. Combination 5 included the less correlated variables of the winter vegetation indices. Combination 6 included the less correlated variables in Combinations 4 and 5. Combination 7 included the less correlated variables from the 19 bioclimatic variables. Combination 8 included the less correlated variables from the 19 bioclimatic variables and three geospatial variables. Combination 9 included the less correlated variables in Combinations 3, 6, and 8. Combinations 10 and 11 represented the top 10 most important variables of the DT and RF methods, respectively, with Combination 9 in vegetation unit I (Table 3). The SVM and maximum likelihood classification (MLC) methods only output the simulation results of variable Combinations 1 to 6, likely due to the training samples' weak separability (Deng, 2010).

Table 3 Variable combinations. Note: DT10 and RF10 represent the top 10 important variables of the decision tree (DT) and random forest (RF) methods, respectively, with Combination 9 at the vegetation group level. The vegetation indices and their abbreviations are shown in Table 2.

| Number | Variable combinations |
|---|---|
| 1 | Summer land surface albedos of bands 1 and 5 |
| 2 | Winter land surface albedos of bands 1 and 6 |
| 3 | Summer land surface albedos of bands 1 and 5; winter land surface albedos of bands 1 and 6 |
| 4 | Summer vegetation indices: BI, WI, MR, NDVI, EVI |
| 5 | Winter vegetation indices: MR, NDVI, EVI, NDTI |
| 6 | Summer vegetation indices: BI, WI, MR, NDVI, EVI; winter vegetation indices: MR, NDVI, EVI, NDTI |
| 7 | Annual mean temperature, annual precipitation, mean diurnal range, precipitation of driest month |
| 8 | Slope, aspect, annual mean temperature, annual precipitation, mean diurnal range, precipitation of driest month |
| 9 | Summer land surface albedos of band 1; winter land surface albedos of band 6; summer vegetation indices: BI, WI, MR, NDVI, EVI; winter vegetation indices: MR, NDVI, EVI, NDTI; slope, aspect, annual mean temperature, annual precipitation, mean diurnal range, precipitation of driest month |
| 10 | DT10: annual mean temperature, annual precipitation, mean diurnal range, precipitation of driest month, slope, winter vegetation indices MR, summer land surface albedos of band 1, summer vegetation indices BI and EVI, winter land surface albedos of band 6 |
| 11 | RF10: annual precipitation, annual mean temperature, mean diurnal range, slope, summer vegetation indices BI, MR, NDVI, and EVI, winter vegetation indices MR and NDVI |
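The screening for less correlated variables (|R| < 0.7) used to build Combinations 1–9 could look like the sketch below. The text does not say which member of a correlated pair was retained, so the greedy, order-dependent rule here is an assumption, as are the function names and the toy values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def select_less_correlated(variables, threshold=0.7):
    """Keep a variable only if |R| with every already-kept variable is below
    the threshold. Variables are visited in insertion order (an assumption)."""
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy values sampled at five hypothetical points.
samples = {
    "band1": [0.10, 0.20, 0.30, 0.40, 0.50],
    "band2": [0.11, 0.19, 0.32, 0.41, 0.52],  # nearly collinear with band1
    "band5": [0.50, 0.10, 0.40, 0.20, 0.30],
}
selected = select_less_correlated(samples)
```

Because the rule is order-dependent, placing the ecologically most meaningful variables first determines which ones survive; a constant variable would need to be removed beforehand, since its correlation is undefined.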

Vegetation distribution models

We used DT, RF, MLC, and SVM vegetation distribution models in this study. The DT model is a divisive, monothetic, and supervised classifier often used for species distribution modeling and related applications (Franklin, 2010). It is computationally fast and easy to understand and implement. It uses classification or regression algorithms to generate classification rules and then visualizes those rules as simple tree graphics (Hastie, Tibshirani & Friedman, 2009; Zhou et al., 2016). The DT model calculates the most significant variables contributing to the model (Deng, 2010). We used a DT with five layers, 40 samples in the smallest parent node, and 10 samples in the smallest child node.

The RF model is an ensemble method that has been applied in risk assessment and species distribution modeling studies (Cutler et al., 2007; Zhang & Dong, 2017). The RF model creates and combines different DTs to produce considerably more accurate classifications that are unaffected by noise or overtraining (Burai et al., 2015; Cutler et al., 2007; Gislason, Benediktsson & Sveinsson, 2006). The RF model also calculates the most significant variables that contribute to the model (Cutler et al., 2007). Running an RF model requires defined parameters, including the number of trees, the number of randomly selected features, and the node impurity function. We generated the RF model in EnMAP-Box, a license-free and platform-independent software interface designed to process hyperspectral remote sensing data, which was developed by the Humboldt University of Berlin. EnMAP-Box has built-in applications for processing hyperspectral data, such as SVM and RF for the classification of image data (Held et al., 2014). We used the default settings in EnMAP-Box with 100 trees. The number of randomly selected features was equal to the square root of the number of all features, and we used the Gini coefficient as the node impurity function (Rabe et al., 2014; Ma, Gao & Gu, 2019; van der Linden et al., 2015; Zhou et al., 2016).
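Two of the RF settings named above, the Gini node impurity and the square-root rule for the number of randomly selected features, can be made concrete with a small sketch. It is not EnMAP-Box code: the function names are hypothetical, and rounding the square root down is an assumption, since the text does not state how a non-integer value is handled.

```python
from collections import Counter
from math import sqrt


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two child nodes of a candidate split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def features_per_split(n_features):
    """Square-root rule; flooring is an assumption made for this sketch."""
    return max(1, int(sqrt(n_features)))


# A node with two classes in equal shares, and a split that separates them.
mixed_node = ["forest", "forest", "scrub", "scrub"]
node_impurity = gini_impurity(mixed_node)
perfect_split = split_impurity(["forest", "forest"], ["scrub", "scrub"])

# Combination 9 lists 17 variables in Table 3.
n_candidates = features_per_split(17)
```

Each tree chooses, at every node, the split with the lowest weighted impurity among the randomly drawn candidate features, which is what decorrelates the trees in the ensemble.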

The MLC model is one of the most commonly used supervised image classification methods. MLC's classification rules use the statistics of the Gaussian probability density function to assign each pixel to the class with the highest probability. Although the MLC method usually generates similar or more accurate classifications than other methods, it is not applicable when there are fewer training samples than input predictors (Burai et al., 2015; Zhou et al., 2016).

The SVM model is a supervised machine learning model used for classification and regression. It is a complex and widely used method that can output more accurate predictions than other methods (Burai et al., 2015). The SVM model searches for an optimal plane in a multidimensional space to divide all sample elements into two categories, making the distance between the closest points in the two classes as large as possible (Kabacoff, 2016). Running an SVM model requires a defined kernel parameter g and regularization parameter c. In this study, we generated the SVM model in EnMAP-Box and applied its default settings, in which the parameter g ranged from 0.01 to 1,000 and the parameter c ranged from 0.1 to 1,000. Parameters g and c were tested using a grid search with internal performance estimation, and we used those with the best performance for data training (Lin et al., 2014; van der Linden et al., 2014; van der Linden et al., 2015).
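A grid search over g and c can be sketched as follows. Only the parameter ranges come from the text; the power-of-ten spacing, the function names, and the toy scoring function (a stand-in for the internal performance estimate that a real SVM fit would supply) are assumptions made for illustration.

```python
from itertools import product
from math import log10


def log10_grid(low_exponent, high_exponent):
    """Powers of ten from 10**low_exponent to 10**high_exponent (assumed spacing)."""
    return [10.0 ** e for e in range(low_exponent, high_exponent + 1)]


def grid_search(g_values, c_values, score):
    """Evaluate score(g, c) on every grid cell and return (best_score, g, c)."""
    best = None
    for g, c in product(g_values, c_values):
        value = score(g, c)
        if best is None or value > best[0]:
            best = (value, g, c)
    return best


g_values = log10_grid(-2, 3)  # 0.01 ... 1000
c_values = log10_grid(-1, 3)  # 0.1 ... 1000


def toy_score(g, c):
    # Hypothetical stand-in for cross-validated accuracy, peaking at g=1, c=100.
    return -((log10(g) - 0.0) ** 2 + (log10(c) - 2.0) ** 2)


best_score, best_g, best_c = grid_search(g_values, c_values, toy_score)
```

In practice `toy_score` would be replaced by a function that trains an SVM with the given g and c on part of the training points and returns its accuracy on the held-out part.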

We generated the predicted vegetation maps of the three classification units using the DT, RF, MLC, and SVM methods with a resolution of 500 m. We selected all 11 variable combinations as the input variables for each method. The DT and RF method results indicated which variables were most important for vegetation discrimination.


Model assessment

We used the VMC and a total of 974 vegetation points to assess the overall accuracy and Kappa coefficient of every predicted vegetation map. Kappa coefficient values ranging from 0.4 to 0.55 indicated moderate agreement, from 0.56 to 0.8 indicated substantial agreement, and from 0.81 to 1 indicated almost perfect agreement (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016). When the Kappa coefficient value was greater than 0.4, the assessed predicted map was considered acceptable.
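The two assessment statistics can be computed from paired reference and predicted labels, as in the sketch below. The function names and the ten example labels are hypothetical; the agreement bands follow the thresholds quoted above, with values below 0.4 treated as not acceptable.

```python
from collections import Counter


def overall_accuracy_and_kappa(reference, predicted):
    """Overall accuracy and Cohen's Kappa for two equal-length label lists."""
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Agreement expected by chance, from the marginal class frequencies.
    expected = sum(ref_counts[k] * pred_counts[k] for k in ref_counts) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


def agreement_level(kappa):
    """Map a Kappa value onto the agreement bands used in this study."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.56:
        return "substantial"
    if kappa >= 0.4:
        return "moderate"
    return "not acceptable"


# Hypothetical assessment points: reference class vs. class on the predicted map.
reference = ["forest"] * 4 + ["scrub"] * 4 + ["steppe"] * 2
predicted = ["forest", "forest", "forest", "scrub",
             "scrub", "scrub", "scrub", "forest",
             "steppe", "scrub"]
oa, kappa = overall_accuracy_and_kappa(reference, predicted)
level = agreement_level(kappa)
```

Kappa is lower than the overall accuracy because it discounts the agreement that the class frequencies alone would produce; the formula is undefined when the expected agreement equals 1, which happens only if both label lists contain a single shared class.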

RESULTS

Unit I modeling and assessment

The RF model's results were better than the results of the DT, MLC, and SVM models (Table 4). The RF model had a Kappa coefficient larger than 0.4 when using variable Combinations 6 to 11 assessed by field point data, with an overall accuracy of 50% to 72%. The RF model had a Kappa coefficient larger than 0.56 when using variable Combinations 7 to 11 assessed by field data, with an overall accuracy of 68% to 72%. The RF model had the highest Kappa coefficient of 0.66 and the highest overall accuracy of 72% when using variable Combination 7. The DT model had a Kappa coefficient larger than 0.4 when using variable Combinations 7 to 11 assessed by field point data, with an overall accuracy of 54% to 56%. The DT model had no Kappa coefficient larger than 0.56 with any variable combination. After VMC assessment, we found that the highest Kappa coefficient was 0.38, with an overall accuracy of 57%, in the RF model using variable Combinations 9 to 11 (Table 4, Fig. 2).

Unit II modeling and assessment

The RF model results were better than the results of the other three models. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 66%–70% and 54%–55% for field point data and VMC assessments, respectively. The RF model using Combinations 7 to 11 had a Kappa coefficient larger than 0.56 and an overall accuracy of 66%–70% when assessed by field point data. The RF model had the highest Kappa coefficient of 0.65 and the highest overall accuracy of 70% when using variable Combination 7. The DT model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 53%–55% and 65%–72% for field point data and VMC assessments, respectively. The DT model had the highest Kappa coefficient of 0.54 and overall accuracy of 72% when using variable Combination 7. When assessed by the VMC, the DT model had a larger Kappa coefficient and greater overall accuracy than the RF model (Table 5, Fig. 3).

Unit III modeling and assessment

Only the RF model could simulate vegetation distribution in unit III. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4 and an overall accuracy of 55%–58% assessed by field point data. The RF model using variable Combination 7 had the highest Kappa coefficient of 0.57 (the only model with a Kappa coefficient larger than 0.56) and the highest overall accuracy of 58% assessed by field point data. The Kappa coefficients in all models were less than 0.4 when assessed by the VMC (Table 6, Fig. 4).

Table 4 Model assessment of vegetation groups by field point data and VMC. Variable combinations are shown in Table 3.

| Variable combination | DT point OA | DT point KC | DT VMC OA | DT VMC KC | RF point OA | RF point KC | RF VMC OA | RF VMC KC | SVM point OA | SVM point KC | SVM VMC OA | SVM VMC KC | MLC point OA | MLC point KC | MLC VMC OA | MLC VMC KC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 34 | 0.18 | 55 | 0.22 | 37 | 0.24 | 32 | 0.09 | 36 | 0.21 | 53 | 0.21 | 23 | 0.08 | 11 | 0.02 |
| 2 | 38 | 0.20 | 52 | 0.23 | 39 | 0.27 | 37 | 0.13 | 35 | 0.20 | 55 | 0.24 | 18 | 0.07 | 9 | 0.03 |
| 3 | 45 | 0.31 | 54 | 0.26 | 47 | 0.36 | 45 | 0.21 | 41 | 0.27 | 54 | 0.27 | 24 | 0.12 | 15 | 0.05 |
| 4 | 32 | 0.16 | 46 | 0.16 | 42 | 0.30 | 42 | 0.17 | 37 | 0.22 | 57 | 0.26 | 11 | 0.04 | 3 | 0.01 |
| 5 | 31 | 0.11 | 59 | 0.14 | 44 | 0.32 | 44 | 0.19 | 36 | 0.22 | 51 | 0.22 | 9 | 0.04 | 4 | 0.02 |
| 6 | 41 | 0.26 | 44 | 0.18 | 50 | 0.40* | 52 | 0.27 | 42 | 0.29 | 54 | 0.27 | 13 | 0.08 | 4 | 0.03 |
| 7 | 54 | 0.45* | 57 | 0.34 | 72 | 0.66** | 55 | 0.35 | | | | | | | | |
| 8 | 55 | 0.46* | 56 | 0.35 | 69 | 0.63** | 56 | 0.37 | | | | | | | | |
| 9 | 55 | 0.46* | 53 | 0.34 | 68 | 0.61** | 57 | 0.38 | | | | | | | | |
| 10 | 55 | 0.46* | 53 | 0.33 | 69 | 0.63** | 57 | 0.38 | | | | | | | | |
| 11 | 56 | 0.46* | 56 | 0.36 | 68 | 0.62** | 57 | 0.38 | | | | | | | | |

Notes: VMC, the Vegetation Map of the People's Republic of China; DT, decision tree; RF, random forest; SVM, support vector machine; MLC, maximum likelihood classification; OA, overall accuracy (%); KC, Kappa coefficient. ** the Kappa coefficient larger than 0.56; * the Kappa coefficient larger than 0.4 and less than 0.56.

Figure 2 The modeling vegetation map of vegetation groups with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1. Full-size DOI: 10.7717/peerj.9839/fig-2

The abbreviations are the same as in Table 4.

Important variables

For the RF model, eight of the top 10 most important variables were the same across the different vegetation units: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and four spectral variables (mid-infrared ratio and NDVI of the winter vegetation indices, brightness index and NDVI of the summer vegetation indices). For the DT model, nine of the top 10 most important variables were the same across the different vegetation units: four climate variables (annual mean temperature, mean diurnal range, precipitation of the driest month, and annual precipitation), one geospatial variable (slope), and four spectral variables (mid-infrared ratio of the winter vegetation indices, brightness index of the summer vegetation indices, summer surface albedo of band 1, winter surface albedo of band 6) (Table 7).

DISCUSSIONVegetation classification unitsVegetation classification is an important and complex system with multiple levels Higherlevel classificationmethods not only accurately classify vegetation but they can also describe

Yi et al (2020) PeerJ DOI 107717peerj9839 1226

Table 5 Model assessment of vegetation types by field point data and VMC. Variable combinations are shown in Table 3.

Variable     Decision tree           Random forest           Support vector machine  Maximum likelihood classification
combination  Point data  VMC         Point data  VMC         Point data  VMC         Point data  VMC
             OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC
1            42   0.24   63   0.33   32   0.22   23   0.09   32   0.18   40   0.18   6    0.02   7    0.00
2            44   0.27   58   0.31   34   0.23   30   0.14   31   0.18   44   0.24   5    0.02   14   0.00
3            43   0.30   58   0.35   44   0.34   38   0.22   37   0.26   43   0.25   9    0.05   13   0.00
4            36   0.20   47   0.20   39   0.29   31   0.15   32   0.19   43   0.21   13   0.07   6    0.02
5            32   0.14   59   0.23   41   0.31   36   0.19   34   0.22   43   0.22   6    0.03   6    0.03
6            36   0.23   45   0.24   47   0.38   44   0.27   40   0.29   43   0.25   14   0.09   21   0.06
7            55   0.46   72   0.54   70   0.65   54   0.41
8            53   0.44   68   0.52   68   0.63   55   0.43
9            54   0.45   65   0.49   66   0.60   55   0.43
10           54   0.45   65   0.49   68   0.63   55   0.43
11           53   0.44   68   0.52   67   0.62   55   0.43

Notes. VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy; KC, Kappa coefficient. Highlighting in the original table distinguishes Kappa coefficients larger than 0.56 from those larger than 0.4 and less than 0.56.


Figure 3 The modeling vegetation map of vegetation types with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-3

ecosystem diversity, even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions that are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as plant sets sharing similar perturbation response effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine


Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations are shown in Table 3.

Variable     Decision tree           Random forest           Support vector machine  Maximum likelihood classification
combination  Point data  VMC         Point data  VMC         Point data  VMC         Point data  VMC
             OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC
1            23   0.14   19   0.08   20   0.18   5    0.02   11   0.09   6    0.03   8    0.06   8    0.04
2            22   -0.04  49   0.04   19   0.17   6    0.03   13   0.11   7    0.04   8    0.06   13   0.05
3            26   0.14   45   0.23   29   0.27   9    0.07   21   0.19   10   0.07   12   0.09   13   0.07
4            30   0.20   30   0.04   22   0.20   7    0.04   16   0.14   6    0.03   9    0.07   8    0.04
5            33   0.01   67   0.00   22   0.20   7    0.04   15   0.13   5    0.03   11   0.09   10   0.04
6            26   0.15   22   0.02   31   0.30   11   0.08   21   0.19   8    0.06   12   0.09   15   0.08
7            33   0.20   52   0.27   58   0.57   23   0.20
8            27   0.17   34   0.18   55   0.54   23   0.20
9            25   0.15   22   0.15   55   0.53   22   0.20
10           30   0.17   41   0.22   56   0.55   23   0.21
11           31   0.20   41   0.22   56   0.55   23   0.20

Notes. VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy; KC, Kappa coefficient. Highlighting in the original table distinguishes Kappa coefficients larger than 0.56 from those larger than 0.4 and less than 0.56.


Figure 4 The modeling vegetation map of formations and sub-formations with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-4

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracies of the vegetation distribution simulations in units I and II were similar to each other and higher than that of the unit III simulation (Tables 4–6).

Different model performances

We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same


Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices are shown in Table 2.

Vegetation groups

Rank  Decision tree                    Standardized  Random forest                         Normalized
      important variable               importance    important variable                    importance
1     Annual mean temperature          1.00          Annual mean temperature               368
2     Annual precipitation             0.88          Slope                                 294
3     Slope                            0.80          Mean diurnal range                    260
4     Winter vegetation index MR       0.36          Annual precipitation                  238
5     Mean diurnal range               0.33          Summer vegetation index BI            188
6     Summer surface albedo of band 1  0.29          Winter vegetation index NDVI          137
7     Summer vegetation index BI       0.28          Summer vegetation index EVI           136
8     Precipitation of driest month    0.25          Winter vegetation index MR            130
9     Summer vegetation index EVI      0.23          Summer vegetation index NDVI          122
10    Winter surface albedo of band 6  0.19          Summer vegetation index MR            112

Vegetation types

Rank  Decision tree                    Standardized  Random forest                         Normalized
      important variable               importance    important variable                    importance
1     Annual mean temperature          1.00          Annual mean temperature               351
2     Slope                            0.83          Slope                                 335
3     Annual precipitation             0.51          Mean diurnal range                    306
4     Winter vegetation index MR       0.30          Annual precipitation                  28
5     Mean diurnal range               0.28          Summer vegetation index BI            184
6     Summer vegetation index EVI      0.22          Winter vegetation index NDVI          161
7     Precipitation of driest month    0.21          Winter vegetation index MR            145
8     Summer vegetation index BI       0.20          Summer vegetation index WI            131
9     Summer surface albedo of band 1  0.19          Precipitation of driest month         124
10    Winter surface albedo of band 6  0.14          Summer vegetation index NDVI          122

Formations and sub-formations

Rank  Decision tree                    Standardized  Random forest                         Normalized
      important variable               importance    important variable                    importance
1     Annual mean temperature          1.00          Annual mean temperature               416
2     Annual precipitation             0.86          Annual precipitation                  328
3     Slope                            0.63          Mean diurnal range                    325
4     Mean diurnal range               0.52          Slope                                 224
5     Precipitation of driest month    0.52          Precipitation of driest month         216
6     Winter vegetation index MR       0.4           Summer vegetation index BI            183
7     Summer surface albedo of band 1  0.32          Summer vegetation index NDVI          17
8     Summer vegetation index BI       0.32          Winter vegetation index NDVI          161
9     Summer vegetation index WI       0.31          Winter vegetation index MR            147
10    Winter surface albedo of band 6  0.28          Summer vegetation indices EVI and MR  132


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogeneous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Combining many such trees reduces the instability of any single tree. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010) and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables. In our study, the SVM and MLC models produced simulation output only for variable combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
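The difference between a single tree and a random forest, as described above, can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the software or settings used in this study: each "tree" is reduced to a one-split stump, so drawing a random predictor subset per tree coincides with drawing it per split, and the two predictors and class labels in the example data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Find the one-feature threshold split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No valid split in this sample: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train stumps on bootstrap samples, each with a random predictor subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)      # predictor subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]


# Hypothetical training points: (annual mean temperature, slope) -> class
X = [(5, 20), (6, 25), (7, 22), (12, 3), (13, 5), (14, 2)]
y = ["forest", "forest", "forest", "grassland", "grassland", "grassland"]
forest = fit_forest(X, y)
print(predict_forest(forest, (4, 30)), predict_forest(forest, (15, 1)))
```

Any single stump depends strongly on which points fall into its bootstrap sample, whereas the majority vote is far less sensitive to them. This is the stabilizing effect that motivates preferring the RF model over a single DT model.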

Important variables in vegetation classification models

Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important


variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables capture different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.
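As a concrete example of a spectral variable that reflects vegetation information, the NDVI listed among the important variables is computed from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The sketch below applies this standard formula to hypothetical summer and winter reflectance values. The function name, the sample values, and the choice to return 0.0 when both bands are zero are assumptions made for illustration.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    Returns 0.0 when both reflectances are zero (an arbitrary choice
    made here to avoid division by zero).
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


# Hypothetical surface reflectance for one pixel in two seasons
summer_ndvi = ndvi(nir=0.50, red=0.10)
winter_ndvi = ndvi(nir=0.25, red=0.15)
print(round(summer_ndvi, 3), round(winter_ndvi, 3))
```

Using the same index for two seasons, as in the variable lists of this study, lets a classifier exploit seasonal differences between vegetation classes.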

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models (>80% for most forest types). Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7). Climate variables were important as well, since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors, such as soil moisture, pH, and nutrients, should be considered. Other factors, such as actual light, disturbance, biotic interactions, land use, and bioclimatic information, could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models, which will require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).
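Slope, the geospatial variable shared by all models, is typically derived from a digital elevation model (DEM). The sketch below shows one common way to do this, using central differences at an interior grid cell. It is an assumption for illustration and not necessarily the procedure used in this study; the elevation grid and the 30 m cell size are hypothetical.

```python
import math


def slope_degrees(dem, row, col, cell_size):
    """Slope at an interior DEM cell from central differences, in degrees."""
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Hypothetical 3 x 3 elevation grid (m): a plane rising 3 m per 30 m cell
dem = [
    [100.0, 103.0, 106.0],
    [100.0, 103.0, 106.0],
    [100.0, 103.0, 106.0],
]
center_slope = slope_degrees(dem, 1, 1, cell_size=30.0)
print(round(center_slope, 2))
```

For this grid the gradient is 0.1 in one direction and 0 in the other, so the slope equals the arctangent of 0.1, roughly 5.7 degrees.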

Other factors affecting classification accuracy

In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data


preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and has reduced the predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated version has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS

Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable of simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and was the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (Mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS

We thank two anonymous reviewers and the editor for their efforts in reviewing this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding

This work was funded by the National Key R&D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Grant Disclosures

The following grant information was disclosed by the authors:
National Key R&D Program of China: 2018YFC0506903.

Competing Interests

The authors declare there are no competing interests.

Author Contributions

• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0

REFERENCES

Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.

Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.

Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: Dr. W. Junk Publishers.

Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.


Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.


Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.


Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

Van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box - A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.


Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box. Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.


Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.

Page 3: Simulating highly disturbed vegetation distribution: …management, and conservation planning require vegetation distribution maps (Franklin, 2010) to better understand, use, and monitor

2009). Advanced machine learning techniques can integrate spectral and spatial predictors and improve classification accuracy by retaining important information about vegetation composition and structural differences (Sesnie et al., 2010). Machine learning models efficiently and cost-effectively produce vegetation maps without the general inaccuracies caused by visual interpretation (Franklin, 2010).

The Jing-Jin-Ji region, also known as the Beijing-Tianjin-Hebei urban agglomeration, is the center of northern Chinese politics, culture, and economy. Because of its extent, it faces significant problems, such as unbalanced regional development and the struggle between economic growth and limited resources. The region's larger cities, including Beijing and Tianjin, have large populations, developed economies, and abundant educational resources. However, these big cities face issues of limited natural resources and serious ecological and environmental pollution. In particular, Beijing's large population requires limited resources such as water, land, and vegetation (Wang & Gong, 2018). Breaking up administrative divisions may be the best method to coordinate regional development (Wang et al., 2019). The new Xiong'an area, located in Hebei province, is being constructed to relocate some of Beijing's population. The development of areas like Xiong'an is affected by the surrounding natural environment. To better integrate the environmental carrying capacity and socioeconomic development of the Jing-Jin-Ji region, including the new Xiong'an area, accurate vegetation maps with temporal resolution are necessary. The most updated vegetation map of the Jing-Jin-Ji region is the Vegetation Map of the People's Republic of China (VMC), with a scale of 1:1,000,000 (Editorial Committee of Vegetation Map of China, 2007). Most of its data come from a field survey conducted between 1980 and 1990, meaning its temporal and spatial scales are both outdated.

In this study, we integrated geospatial, climate, and spectral data to simulate vegetation distribution through four different models across three vegetation classification units. This study differs from that of Zhou et al. (2016) in three ways. First, our study area was the Jing-Jin-Ji region, which is located in the North China Plain and affected by high socioeconomic disturbance, whereas the Qilian Mountain area studied by Zhou et al. is characterized by complex terrain but without high socioeconomic disturbance. Second, the predictive variables, as well as the combinations of these variables, were different from those of Zhou et al. (2016). Third, we compared four modeling methods for simulating vegetation distribution at three vegetation classification levels, whereas Zhou et al. (2016) used only three models at two vegetation classification levels. Our primary objectives were to: (1) determine the best modeling method for vegetation affected by high socioeconomic disturbance; (2) create an improved vegetation map of the Jing-Jin-Ji region; (3) determine the predictive abilities of different models across different vegetation classification units; and (4) determine which variables enhanced the classification accuracy for vegetation mapping.

MATERIALS & METHODS

Study area

The Jing-Jin-Ji region is located in the northern part of the North China Plain. Its location ranges from 113°04′ to 119°53′E and 36°01′ to 42°37′N and is bordered by Taihang

Figure 1 The location and DEM of the Jing-Jin-Ji region. Full-size DOI: 10.7717/peerj.9839/fig-1

Mountain in the west, Yanshan Mountain in the north, and the Bohai Sea in the east. The region includes the Beijing, Tianjin, and Hebei provinces (Fig. 1). Jing-Jin-Ji has a population of approximately 110 million people and covers an area of approximately 216,000 km² (Wang et al., 2019). The region is a temperate monsoon climate zone with an elevation range of −14 to 2,837 m (Fig. 1). The annual precipitation ranges from 305 to 711 mm, with increased precipitation at lower altitudes. The annual mean temperature ranges from −3 to 14 °C, with colder averages at higher elevations. The amount of precipitation in the region gradually decreases moving from the southeast to the northwest, while temperature changes show the reverse pattern.

Vegetation and training data

The VMC, completed in 2007 based on field survey data, included eight vegetation groups (I), 15 vegetation types (II), and 75 formations and subformations (III) from the Jing-Jin-Ji region. However, some of the map's vegetation unit areas are very small and difficult to distinguish. To ensure that enough training and assessment point data could be randomly selected in units II and III, we selected eight units I, 12 units II, and 39 units III from the study area (Table 1). Cultivated vegetation is mainly distributed in low areas with an altitude range of −14 to 254 m and an annual mean temperature range of 7 to 14 °C. Major cultivated plants include winter wheat and coarse grains. Scrub and grass-forb communities are mainly distributed in the north, at elevations ranging from 254 to 1,440 m.

We obtained model training and assessment data on vegetation composition from field surveys and other publications. We collected a total of 3,763 vegetation points, with 2,789 of those used for model training and 974 used for model assessment. Each unit III had at least 80 vegetation points, with at least 60 of those used for model training and 20 used for model assessment. The model training and assessment data were randomly selected for each unit III. Additionally, we increased the credibility of the model assessment by first rasterizing the vector VMC onto the same grid as the modeled data and then assessing the data using the Kappa coefficient (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016).
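The per-unit random selection described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_points`, the 25% assessment share, and the fixed seed are assumptions introduced here; only the minimums of 60 training and 20 assessment points per unit III come from the text.

```python
import random
from collections import defaultdict


def split_points(points, assess_fraction=0.25, min_train=60, min_assess=20, seed=42):
    """Randomly split (point_id, unit_label) pairs into training and
    assessment sets, unit by unit.

    assess_fraction and seed are hypothetical choices for this sketch;
    min_train and min_assess follow the minimums stated in the text.
    """
    rng = random.Random(seed)
    by_unit = defaultdict(list)
    for point_id, unit in points:
        by_unit[unit].append(point_id)

    train, assess = [], []
    for unit in sorted(by_unit):
        ids = by_unit[unit][:]
        rng.shuffle(ids)  # random selection within the unit
        n_assess = max(min_assess, int(len(ids) * assess_fraction))
        n_train = len(ids) - n_assess
        if n_train < min_train:
            raise ValueError(f"unit {unit!r} has too few points: {len(ids)}")
        assess.extend((pid, unit) for pid in ids[:n_assess])
        train.extend((pid, unit) for pid in ids[n_assess:])
    return train, assess


# Two made-up units with 80 and 100 points.
points = [(f"p{i}", "unit_A") for i in range(80)] + [(f"q{i}", "unit_B") for i in range(100)]
train, assess = split_points(points)
```

A unit with fewer than 80 points raises an error instead of silently producing an undersized training or assessment set.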

Geospatial, climate, and spectral data

We derived geospatial variables, including elevation, slope, and aspect, from the 30 m resolution Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM; Zhao et al., 2018). We then resampled these data to a 500 × 500 m grid cell size using the cubic technique in ArcGIS 10.3 (Wu et al., 2019).

We downloaded the climate data, including 19 bioclimatic variables at ∼1 km resolution, from WorldClim (Fick & Hijmans, 2017) at http://worldclim.org. These climate data were also resampled to a 500 × 500 m grid cell size using the cubic technique in ArcGIS 10.3 (Wu et al., 2019). Climatic variables are important for plant ecophysiology (Mod et al., 2016) and are commonly used as bioclimatic limits in vegetation models (Sitch et al., 2003).

We acquired the MYD09A1 500M product data (sinusoidal projection; path 4 and row 26, path 4 and row 27, path 5 and row 26, path 5 and row 27) from summer (July 20, 2013) and winter (January 17, 2013) as MODIS images from the Geospatial Data Cloud at http://www.gscloud.cn. Our image pre-processing included image subset, mosaicking, and clipping in ENVI 5.2 (Deng, 2010). We obtained the land surface albedo in bands 1-7 directly from the MYD09A1 500M product and calculated the indices' effectiveness at reflecting vegetation information (Price, Guo & Stiles, 2002; Zhou et al., 2016).

Since vegetation indices can provide information on both vegetation and environment (Bannari, Morin & Bonn, 1995), these indices are more sensitive than single spectral bands at detecting green vegetation (Bannari, Morin & Bonn, 1995; Cohen & Goward, 2004). Therefore, vegetation indices can be used for image interpretation, vegetation discrimination and prediction, and land use change detection (Bannari, Morin & Bonn,

Table 1 Classification units of the vegetation of China.

Each vegetation group (I) is followed by its indented vegetation types (II), and each type by its indented formations and sub-formations (III).

0 No vegetation
    0 No vegetation
        0 No vegetation
1 Needleleaf forest
    1 Temperate needleleaf forest
        1 Pinus tabulaeformis forest
2 Broadleaf forest
    2 Temperate broadleaf deciduous forest
        2 Quercus mongolica forest
        3 Quercus liaotungensis forest
        4 Quercus variabilis forest
        5 Robinia pseudoacacia forest
        6 Salix matsudana forest
        7 Populus davidiana forest
        8 Betula platyphylla forest
3 Scrub
    3 Temperate broadleaf deciduous scrub
        9 Corylus heterophylla scrub
        10 Lespedeza bicolor scrub
        11 Prunus armeniaca var. ansu scrub
        12 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosa scrub
        13 Cotinus coggygria var. cinerea scrub
        14 Spiraea spp. scrub
        15 Ostryopsis davidiana scrub
4 Steppe
    4 Temperate grass-forb meadow steppe
        16 Stipa baicalensis forb meadow steppe
        17 Filifolium sibiricum grass-forb meadow steppe
    5 Temperate needlegrass arid steppe
        18 Aneurolepidium chinense needlegrass steppe
        19 Stipa krylovii steppe
        20 Stipa bungiana steppe
        21 Thymus mongolicus needlegrass steppe
5 Grass-forb community
    6 Temperate grass-forb community
        22 Bothriochloa ischaemum community
        23 Bothriochloa ischaemum community
        24 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosus, Bothriochloa ischaemum scrub and grass community
        25 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosus, Themeda triandra var. japonica scrub and grass community
6 Meadow
    7 Temperate grass and forb meadow
        26 Arundinella hirta, Spodiopogon sibiricus forb meadow
        27 Carex spp. forb meadow
    8 Temperate grass and forb holophytic meadow
        28 Achnatherum splendens holophytic meadow
        29 Suaeda glauca holophytic meadow
7 Swamp
    9 Cold-temperate and temperate swamp
        30 Phragmites communis swamp
8 Cultural vegetation
    10 One crop annually and cold-resistant economic crops
        31 Spring wheat, naked oats, buckwheat, potatoes, flax
    11 One crop annually, cold-resistant economic crops and deciduous orchards
        32 Coarse grains
    12 Three crops two years and two crops annually, non irrigation, deciduous orchards
        33 Winter wheat, coarse grains
        34 Coarse grains
        35 Rice
        36 Winter wheat, corn, cotton
        37 Apple, pear orchard
        38 Winter wheat, corn, Chinese sorghum, sweet potatoes, cotton, tobacco, peanut, sesame, apple, pear, hawthorn, persimmon, walnut, pomegranate, grape
        39 Winter wheat, coarse grains (loamy soil)

Table 2 The vegetation indices.

Indices                                            Abbreviation  Formula
Ratio vegetation index                             RVI           NIR/Red
Brightness index                                   BI            0.2909Blue + 0.2493Green + 0.4806Red + 0.5568NIR + 0.4438SWIR1 + 0.1706SWIR2
Green vegetation index                             GI            -0.2728Blue - 0.2174Green - 0.5508Red + 0.7221NIR + 0.0733SWIR1 - 0.1648SWIR2
Wetness index                                      WI            0.1446Blue + 0.1761Green + 0.3322Red + 0.3396NIR - 0.6210SWIR1 - 0.4186SWIR2
Differenced vegetation index                       DVI           NIR - Red
Green ratio                                        GR            NIR/Green
Mid-infrared ratio                                 MR            NIR/SWIR1
Soil-adjusted vegetation index                     SAVI          (1.5(NIR - Red))/((NIR + Red + 0.5))
Optimization of soil-adjusted vegetation index     OSAVI         (1.16(NIR - Red))/((NIR + Red + 0.16))
Atmospherically resistant vegetation index         ARVI          (NIR - (2Red - Blue))/(NIR + (2Red - Blue))
Normalized difference vegetation index             NDVI          (NIR - Red)/(NIR + Red)
Enhanced vegetation index                          EVI           2.5[(NIR - Red)/(NIR + 6Red - 7.5Blue + 1)]
Normalized difference tillage index                NDTI          (SWIR1 - SWIR2)/(SWIR1 + SWIR2)
Normalized difference senescent vegetation index   NDSVI         (SWIR1 - Red)/(SWIR1 + Red)

1995; Cohen & Goward, 2004; Zhou et al., 2016). We tested the vegetation discrimination of 14 vegetation indices (Table 2).
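As an illustration of the formulas in Table 2, the sketch below computes all 14 indices for a single pixel. It assumes the surface reflectances have already been assigned to the names Blue, Green, Red, NIR, SWIR1, and SWIR2; which product band fills each role is not specified here, and the function name and sample values are hypothetical.

```python
def vegetation_indices(blue, green, red, nir, swir1, swir2):
    """Return the 14 indices of Table 2 for one pixel.

    Inputs are reflectances (floats). The band-to-name assignment is
    left to the caller and is an assumption of this sketch.
    """
    return {
        "RVI": nir / red,
        "BI": (0.2909 * blue + 0.2493 * green + 0.4806 * red
               + 0.5568 * nir + 0.4438 * swir1 + 0.1706 * swir2),
        "GI": (-0.2728 * blue - 0.2174 * green - 0.5508 * red
               + 0.7221 * nir + 0.0733 * swir1 - 0.1648 * swir2),
        "WI": (0.1446 * blue + 0.1761 * green + 0.3322 * red
               + 0.3396 * nir - 0.6210 * swir1 - 0.4186 * swir2),
        "DVI": nir - red,
        "GR": nir / green,
        "MR": nir / swir1,
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5),
        "OSAVI": 1.16 * (nir - red) / (nir + red + 0.16),
        "ARVI": (nir - (2 * red - blue)) / (nir + (2 * red - blue)),
        "NDVI": (nir - red) / (nir + red),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "NDTI": (swir1 - swir2) / (swir1 + swir2),
        "NDSVI": (swir1 - red) / (swir1 + red),
    }


# Made-up reflectances for a densely vegetated pixel.
idx = vegetation_indices(blue=0.05, green=0.08, red=0.06, nir=0.40, swir1=0.20, swir2=0.10)
```

Ratio-type indices are undefined when their denominator is zero, so a raster implementation would need to mask such pixels first.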

To determine the distribution predictive ability of different variables, we grouped the variables into different combinations based on the results of the Pearson correlation. We only used less correlated variables (|R| < 0.7, Pearson correlation) (Chala et al., 2017) in Combinations 1–9 (Table 3), then used variable combinations as input predictor variables to simulate vegetation distribution. Combination 1 included the less correlated variables of the summer land surface albedos from bands 1 to 7. Combination 2 included the less correlated variables of the winter land surface albedos from bands 1 to 7. Combination 3 included the less correlated variables in Combinations 1 and 2. Combination 4 included

Table 3 Variable combinations.
Note: DT10 and RF10 represent the top 10 important variables of the decision tree (DT) and random forest (RF) methods with Combination 9 at the vegetation group level, respectively. The vegetation indices and their abbreviations are shown in Table 2.

Number  Variables combinations
1       Summer land surface albedos of band 1 and 5
2       Winter land surface albedos of band 1 and 6
3       Summer land surface albedos of band 1 and 5; Winter land surface albedos of band 1 and 6
4       Summer vegetation indices: BI, WI, MR, NDVI, EVI
5       Winter vegetation indices: MR, NDVI, EVI, NDTI
6       Summer vegetation indices: BI, WI, MR, NDVI, EVI; Winter vegetation indices: MR, NDVI, EVI, NDTI
7       Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month
8       Slope, Aspect, Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month
9       Summer land surface albedos of band 1; Winter land surface albedos of band 6; Summer vegetation indices: BI, WI, MR, NDVI, EVI; Winter vegetation indices: MR, NDVI, EVI, NDTI; Slope, Aspect, Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month
10      DT10: Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month, Slope, Winter vegetation indices: MR, Summer land surface albedos of band 1, Summer vegetation indices: BI and EVI, Winter land surface albedos of band 6
11      RF10: Annual precipitation, Annual mean temperature, Mean diurnal range, Slope, Summer vegetation indices: BI, MR, NDVI and EVI, Winter vegetation indices: MR and NDVI

the less correlated variables of the summer vegetation indices. Combination 5 included the less correlated variables of the winter vegetation indices. Combination 6 included the less correlated variables in Combinations 4 and 5. Combination 7 included the less correlated variables from the 19 bioclimatic variables. Combination 8 included the less correlated variables from the 19 bioclimatic variables and three geospatial variables. Combination 9 included the less correlated variables in Combinations 3, 6, and 8. Combinations 10 and 11 represented the top 10 most important variables of the DT and RF methods with Combination 9 in vegetation unit I, respectively (Table 3). The SVM and maximum likelihood classification (MLC) methods only output the simulation results of variable Combinations 1 to 6, likely due to the training samples' weak separability (Deng, 2010).
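One way to obtain a set of less correlated variables is a greedy filter on pairwise Pearson correlations. The sketch below is an assumption about how such a screening could be done, not the procedure used in this study: the text states only the 0.7 threshold, while the greedy keep-in-order rule, the function names, and the sample values are introduced here for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Constant sequences (zero variance) are not handled in this sketch.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_less_correlated(variables, threshold=0.7):
    """Keep a variable only if |R| with every already kept variable is
    below the threshold. Variables are visited in insertion order, so the
    result depends on that order (a hypothetical design choice)."""
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up sample values at five points.
variables = {
    "band1": [1, 2, 3, 4, 5],
    "band2": [2, 4, 6, 8, 10.5],   # almost a multiple of band1
    "band5": [5, 1, 4, 2, 3],
}
selected = select_less_correlated(variables)
```

Here `band2` is dropped because it is nearly collinear with `band1`, while `band5` is kept.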

Vegetation distribution models

We used DT, RF, MLC, and SVM vegetation distribution models in this study. The DT model is a divisive, monothetic, and supervised classifier often used for species distribution modeling and related applications (Franklin, 2010). It is computationally fast

and easy to understand and implement. It uses classification or regression algorithms to generate classification rules and then visualizes those rules into simple tree graphics (Hastie, Tibshirani & Friedman, 2009; Zhou et al., 2016). The DT model calculates the most significant variables contributing to the model (Deng, 2010). We used a DT with five layers, 40 samples in the smallest parent node, and 10 samples in the smallest child node.

The RF model is an ensemble method that has been applied in risk assessment and species distribution modeling studies (Cutler et al., 2007; Zhang & Dong, 2017). The RF model creates and combines different DTs to produce considerably more accurate classifications that are unaffected by noise or overtraining (Burai et al., 2015; Cutler et al., 2007; Gislason, Benediktsson & Sveinsson, 2006). The RF model also calculates the most significant variables that contribute to the model (Cutler et al., 2007). Running an RF model requires defined parameters, including tree number, number of randomly selected features, and node impurity function. We generated the RF model in EnMAP-Box, a license-free and platform-independent software interface designed to process hyperspectral remote sensing data, which was developed by the Humboldt University of Berlin. EnMAP-Box includes built-in applications for processing hyperspectral data, such as SVM and RF for the classification of image data (Held et al., 2014). We used the default settings in EnMAP-Box with 100 trees. The number of randomly selected features was equal to the square root of the number of all features, and we used a Gini coefficient for the node impurity function (Rabe et al., 2014; Ma, Gao & Gu, 2019; van der Linden et al., 2015; Zhou et al., 2016).
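The two RF settings named above, the Gini node impurity and the square-root rule for the number of randomly selected features, can be written out in a few lines. This is a generic sketch of those two quantities, not EnMAP-Box code; the function names are hypothetical, and rounding the square root down is an assumption.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate split; lower is better."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def features_per_split(n_features):
    """Number of features drawn at random at each split (square-root rule).

    Rounding down is an assumption of this sketch.
    """
    return max(1, int(math.sqrt(n_features)))


node = ["forest", "forest", "scrub", "scrub"]
impurity_before = gini_impurity(node)                                       # mixed node
impurity_after = split_impurity(["forest", "forest"], ["scrub", "scrub"])   # pure children
```

A split that separates the two classes completely lowers the weighted impurity from 0.5 to 0, which is the kind of gain a tree looks for among the randomly drawn features.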

The MLC model is one of the most commonly used supervised image classification methods. MLC's classification rules use the statistics of the Gaussian probability density function to assign each pixel to the class with the highest probability. Although the MLC method usually generates similar or more accurate classifications than other methods, it is not applicable when there are fewer training samples than input predictors (Burai et al., 2015; Zhou et al., 2016).
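The decision rule described above can be illustrated with a simplified Gaussian classifier. To stay within the standard library, this sketch treats the predictors as independent (a diagonal covariance matrix), whereas MLC as implemented in image processing software generally uses the full class covariance matrix; the function names and sample values are hypothetical.

```python
import math


def fit_gaussian_classes(samples):
    """Estimate per-class means and variances from training vectors.

    samples maps a class label to a list of feature vectors. Each class
    needs at least two samples and non-zero variance in every feature.
    Simplification (assumption): features are treated as independent.
    """
    params = {}
    for label, rows in samples.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        variances = [sum((r[d] - means[d]) ** 2 for r in rows) / (n - 1) for d in range(dims)]
        params[label] = (means, variances)
    return params


def log_likelihood(x, means, variances):
    """Log of the Gaussian probability density of x for one class."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def classify(x, params):
    """Assign x to the class with the highest likelihood."""
    return max(params, key=lambda label: log_likelihood(x, *params[label]))


# Made-up two-feature training samples for two classes.
training = {
    "forest": [[0.80, 0.20], [0.70, 0.25], [0.75, 0.22]],
    "crop": [[0.30, 0.50], [0.35, 0.55], [0.32, 0.52]],
}
params = fit_gaussian_classes(training)
label = classify([0.72, 0.21], params)
```

The variance estimates need several samples per class to be stable, which relates to the sample-size limitation noted above.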

The SVM model is a supervised machine learning model used for classification and regression. It is a complex and widely used method that can output more accurate predictions (Burai et al., 2015) than other methods. The SVM model searches for an optimal plane in a multidimensional space to divide all sample elements into two categories, making the distance between the closest points in the two classes as large as possible (Kabacoff, 2016). Running an SVM model requires a defined kernel parameter g and regularization parameter c. In this study, we generated the SVM model in the EnMAP-Box. We applied the EnMAP-Box default settings for the SVM model, in which parameter g ranged from 0.01 to 1,000 and parameter c ranged from 0.1 to 1,000. Parameters g and c were tested using a grid search with internal performance estimation, and we used those with the best performance for data training (Lin et al., 2014; van der Linden et al., 2014; van der Linden et al., 2015).

We generated the predicted vegetation maps of the three classification units using the DT, RF, MLC, and SVM methods with a resolution of 500 m. We selected all 11 variable combinations as the input variables for each method. The DT and RF method results indicated which variables were most important for vegetation discrimination.

Model assessment

We used the VMC and a total of 974 vegetation points to assess the overall accuracy and Kappa coefficient of every predicted vegetation map. Kappa coefficient values ranging from 0.4 to 0.55 indicated moderate agreement, from 0.56 to 0.8 indicated substantial agreement, and from 0.81 to 1 indicated almost perfect agreement (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016). When the Kappa coefficient value was greater than 0.4, the assessed predicted map was considered acceptable.

RESULTS

Unit I modeling and assessment

The RF model's results were better than the results of the DT, MLC, and SVM models (Table 4). The RF model had a Kappa coefficient larger than 0.4 when using variable Combinations 6 to 11 assessed by field point data, with an overall accuracy of 50% to 72%. The RF model had a Kappa coefficient larger than 0.56 when using variable Combinations 7 to 11 assessed by field data, with an overall accuracy of 68% to 72%. The RF model had the highest Kappa coefficient of 0.66 and the highest overall accuracy of 72% when using variable Combination 7. The DT model had a Kappa coefficient larger than 0.4 when using variable Combinations 7 to 11 assessed by field point data, with an overall accuracy of 54% to 56%. The DT model had no Kappa coefficient larger than 0.56 when using all variable combinations. After VMC assessment, we found the highest Kappa coefficient was 0.38, with an overall accuracy of 57%, in the RF model using variable Combinations 9 to 11 (Table 4; Fig. 2).

Unit II modeling and assessment

The RF model results were better than the results of the other three models. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 66%–70% and 54%–55% for field point data and VMC assessments, respectively. The RF model using Combinations 7 to 11 had a Kappa coefficient larger than 0.56 and an overall accuracy of 66%–70% when assessed by field point data. The RF model had the highest Kappa coefficient of 0.65 and the highest overall accuracy of 70% when using variable Combination 7. The DT model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 53%–55% and 65%–72% for field point data and VMC assessments, respectively. The DT model had the highest Kappa coefficient of 0.54 and overall accuracy of 72% when using variable Combination 7. The DT model had a larger Kappa coefficient and greater overall accuracy than the RF model when assessed by the VMC (Table 5; Fig. 3).

Unit III modeling and assessment

Only the RF model could simulate vegetation distribution in unit III. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4 and an overall accuracy of 55%–58% assessed by field point data. The RF model using variable Combination 7 had the highest Kappa coefficient of 0.57 (the only model with a Kappa coefficient larger than 0.56) and the highest overall accuracy of 58% assessed by field point data. The Kappa coefficients in all models were less than 0.4 when assessed by the VMC (Table 6; Fig. 4).

Table 4 Model assessment of vegetation groups by field point data and VMC. Variable combinations are shown in Table 3.

Variable     Decision tree               Random forest               Support vector machine      Maximum likelihood classification
combination  Point data    VMC           Point data    VMC           Point data    VMC           Point data    VMC
             OA   KC       OA   KC       OA   KC       OA   KC       OA   KC       OA   KC       OA   KC       OA   KC
1            34   0.18     55   0.22     37   0.24     32   0.09     36   0.21     53   0.21     23   0.08     11   0.02
2            38   0.20     52   0.23     39   0.27     37   0.13     35   0.20     55   0.24     18   0.07     9    0.03
3            45   0.31     54   0.26     47   0.36     45   0.21     41   0.27     54   0.27     24   0.12     15   0.05
4            32   0.16     46   0.16     42   0.30     42   0.17     37   0.22     57   0.26     11   0.04     3    0.01
5            31   0.11     59   0.14     44   0.32     44   0.19     36   0.22     51   0.22     9    0.04     4    0.02
6            41   0.26     44   0.18     50   0.40*    52   0.27     42   0.29     54   0.27     13   0.08     4    0.03
7            54   0.45*    57   0.34     72   0.66**   55   0.35
8            55   0.46*    56   0.35     69   0.63**   56   0.37
9            55   0.46*    53   0.34     68   0.61**   57   0.38
10           55   0.46*    53   0.33     69   0.63**   57   0.38
11           56   0.46*    56   0.36     68   0.62**   57   0.38

Notes: VMC, the Vegetation Map of the People's Republic of China. ** the Kappa coefficient larger than 0.56; * the Kappa coefficient larger than 0.4 and less than 0.56. OA, Overall accuracy; KC, Kappa coefficient.

Figure 2 The modeling vegetation map of vegetation groups with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1. Full-size DOI: 10.7717/peerj.9839/fig-2
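The two assessment measures used throughout the results, overall accuracy and the Kappa coefficient, can be computed from paired reference and predicted labels as sketched below. The function names and sample labels are hypothetical; the agreement classes follow the thresholds given in the model assessment section, with the assumption that values between 0.55 and 0.56 fall in the moderate class.

```python
def overall_accuracy_and_kappa(reference, predicted):
    """Return (overall accuracy, Cohen's Kappa) for two label sequences.

    Kappa = (po - pe) / (1 - pe), where po is the observed agreement and
    pe the agreement expected by chance from the class proportions.
    The degenerate case pe == 1 is not handled in this sketch.
    """
    n = len(reference)
    labels = set(reference) | set(predicted)
    po = sum(r == p for r, p in zip(reference, predicted)) / n
    pe = sum((reference.count(l) / n) * (predicted.count(l) / n) for l in labels)
    return po, (po - pe) / (1 - pe)


def agreement_level(kappa):
    """Map a Kappa value to the agreement classes used in this study."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.56:
        return "substantial"
    if kappa >= 0.4:
        return "moderate"
    return "below the acceptance threshold"


# Made-up labels at four assessment points.
reference = ["forest", "forest", "scrub", "scrub"]
predicted = ["forest", "forest", "scrub", "forest"]
oa, kappa = overall_accuracy_and_kappa(reference, predicted)
level = agreement_level(kappa)
```

In this toy case three of four points agree (overall accuracy 0.75), but half of that agreement is expected by chance, so the Kappa coefficient is only 0.5.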

The abbreviations are the same as in Table 4.

Important variables

For the RF model, eight of the top 10 most important variables were the same across the different vegetation units: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio and NDVI of winter vegetation index, brightness index and NDVI of summer vegetation index). For the DT model, nine of the top 10 most important variables were the same across the different vegetation units: four climate variables (annual mean temperature, mean diurnal range, precipitation of the driest month, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio of winter vegetation index, brightness index of summer vegetation index, summer surface albedo of band 1, winter surface albedo of band 6) (Table 7).

DISCUSSION

Vegetation classification units

Vegetation classification is an important and complex system with multiple levels. Higher level classification methods not only accurately classify vegetation, but they can also describe

Table 5 Model assessment of vegetation types by field point data and VMC. Variable combinations are shown in Table 3.

Variable     Decision tree               Random forest               Support vector machine      Maximum likelihood classification
combination  Point data    VMC           Point data    VMC           Point data    VMC           Point data    VMC
             OA   KC       OA   KC       OA   KC       OA   KC       OA   KC       OA   KC       OA   KC       OA   KC
1            42   0.24     63   0.33     32   0.22     23   0.09     32   0.18     40   0.18     6    0.02     7    0.00
2            44   0.27     58   0.31     34   0.23     30   0.14     31   0.18     44   0.24     5    0.02     14   0.00
3            43   0.30     58   0.35     44   0.34     38   0.22     37   0.26     43   0.25     9    0.05     13   0.00
4            36   0.20     47   0.20     39   0.29     31   0.15     32   0.19     43   0.21     13   0.07     6    0.02
5            32   0.14     59   0.23     41   0.31     36   0.19     34   0.22     43   0.22     6    0.03     6    0.03
6            36   0.23     45   0.24     47   0.38     44   0.27     40   0.29     43   0.25     14   0.09     21   0.06
7            55   0.46*    72   0.54*    70   0.65**   54   0.41*
8            53   0.44*    68   0.52*    68   0.63**   55   0.43*
9            54   0.45*    65   0.49*    66   0.60**   55   0.43*
10           54   0.45*    65   0.49*    68   0.63**   55   0.43*
11           53   0.44*    68   0.52*    67   0.62**   55   0.43*

Notes: VMC, the Vegetation Map of the People's Republic of China. ** the Kappa coefficient larger than 0.56; * the Kappa coefficient larger than 0.4 and less than 0.56. OA, Overall accuracy; KC, Kappa coefficient.

Figure 3 The modeling vegetation map of vegetation types with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1. Full-size DOI: 10.7717/peerj.9839/fig-3

ecosystem diversity, even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions that are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as plant sets sharing similar perturbation response effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine

Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations are shown in Table 3.

Variable     Decision tree               Random forest               Support vector machine      Maximum likelihood classification
combination  Point data    VMC           Point data    VMC           Point data    VMC           Point data    VMC
             OA   KC       OA   KC       OA   KC       OA   KC       OA   KC       OA   KC       OA   KC       OA   KC
1            23   0.14     19   0.08     20   0.18     5    0.02     11   0.09     6    0.03     8    0.06     8    0.04
2            22   -0.04    49   0.04     19   0.17     6    0.03     13   0.11     7    0.04     8    0.06     13   0.05
3            26   0.14     45   0.23     29   0.27     9    0.07     21   0.19     10   0.07     12   0.09     13   0.07
4            30   0.20     30   0.04     22   0.20     7    0.04     16   0.14     6    0.03     9    0.07     8    0.04
5            33   0.01     67   0.00     22   0.20     7    0.04     15   0.13     5    0.03     11   0.09     10   0.04
6            26   0.15     22   0.02     31   0.30     11   0.08     21   0.19     8    0.06     12   0.09     15   0.08
7            33   0.20     52   0.27     58   0.57**   23   0.20
8            27   0.17     34   0.18     55   0.54*    23   0.20
9            25   0.15     22   0.15     55   0.53*    22   0.20
10           30   0.17     41   0.22     56   0.55*    23   0.21
11           31   0.20     41   0.22     56   0.55*    23   0.20

Notes: VMC, the Vegetation Map of the People's Republic of China. ** the Kappa coefficient larger than 0.56; * the Kappa coefficient larger than 0.4 and less than 0.56. OA, Overall accuracy; KC, Kappa coefficient.

Figure 4 The modeling vegetation map of formations and sub-formations with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1. Full-size DOI: 10.7717/peerj.9839/fig-4

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracy of the vegetation distribution simulations in units I and II was similar to each other and higher than unit III's simulation (Tables 4–6).

Different model performances

We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same

Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices are shown in Table 2.

Vegetation groups Vegetation types Formations and sub-formations

Decision tree Random forest Decision tree Random forest Decision tree Random forest

Important variables StandardizedImportance

Importantvariables

Normalizedimportance

Importantvariables

StandardizedImportance

Importantvariables

Normalizedimportance

Importantvariables

StandardizedImportance

Importantvariables

Normalizedimportance

1 Annual mean temperature 100 Annual mean temperature 368 Annual mean temperature 100 Annual mean temperature 351 Annual mean temperature 100 Annual mean temperature 416

2 Annual precipitation 088 Slope 294 Slope 083 Slope 335 Annual precipitation 086 Annual precipitation 328

3 Slope 080 Mean diurnal range 260 Annual precipitation 051 Mean diurnal range 306 Slope 063 Mean diurnal range 325

4 Winter vegetation index MR 036 Annual precipitation 238 Winter vegetation index MR 030 Annual precipitation 28 Mean diurnal range 052 Slope 224

5 Mean diurnal range 033 Summer vegetation index BI 188 Mean diurnal range 028 Summer vegetation index BI 184 Precipitation of driest month 052 Precipitation of driest month 216

6 Summer surface albedo of band 1 029 Winter vegetation index NDVI 137 Summer vegetation index EVI 022 Winter vegetation index NDVI 161 Winter vegetation index MR 04 Summer vegetation index BI 183

7 Summer vegetation index BI 028 Summer vegetation index EVI 136 Precipitation of driest month 021 Winter vegetation index MR 145 Summer surface albedo of band 1 032 Summer vegetation index NDVI 17

8 Precipitation of driest month 025 Winter vegetation index MR 130 Summer vegetation index BI 020 Summer vegetation index WI 131 Summer vegetation index BI 032 Winter vegetation index NDVI 161

9 Summer vegetation index EVI 023 Summer vegetation index NDVI 122 Summer surface albedo of band 1 019 Precipitation of driest month 124 Summer vegetation index WI 031 Winter vegetation index MR 147

10 Winter surface albedo of band 6 019 Summer vegetation index MR 112 Winter surface albedo of band 6 014 Summer vegetation index NDVI 122 Winter surface albedo of band 6 028 Summer vegetation indices EVI and MR 132


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogeneous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010), and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables. In our study, the SVM and MLC models produced simulation results only for variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
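The two sources of randomness that distinguish the RF model from a single DT (trees grown on resampled data, and splits chosen from a random subset of predictors) can be illustrated with a minimal pure-Python sketch. This is not the software used in the study. It is a deliberately simplified toy that assumes one-split trees ("stumps") and draws the predictor subset once per tree rather than at every split; the function names and the two-variable sample data (annual mean temperature, annual precipitation) are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Pick the (feature, threshold) split with the fewest misclassifications."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (None, None, majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    if f is None:
        return l_lab
    return l_lab if row[f] <= t else r_lab


def fit_forest(rows, labels, n_trees=25, n_sub=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random predictor subset."""
    rng = random.Random(seed)
    n, n_feat = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        feats = rng.sample(range(n_feat), n_sub)      # random subset of predictors
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Classify by majority vote over all trees."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


# Hypothetical training points: [annual mean temperature (deg C), annual precipitation (mm)]
rows = [[2, 800], [4, 760], [5, 720], [6, 700], [10, 500], [11, 480], [12, 450], [13, 400]]
labels = ["forest"] * 4 + ["cultivated"] * 4
forest = fit_forest(rows, labels)
print(predict_forest(forest, [1, 900]), predict_forest(forest, [20, 300]))
```

A single stump depends heavily on which points happen to be in its sample, which mirrors the instability noted for the DT model; the vote across many resampled trees is what stabilizes the RF prediction.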

Important variables in vegetation classification models

Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important


variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models to >80% for most forest types. Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors such as soil moisture, pH, and nutrients should be considered. Other factors such as actual light, disturbance, biotic interactions, land use, and bioclimatic information could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy

In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data


preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS

Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and was the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (the mid-infrared ratio of the winter vegetation index and the brightness index of the summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS

We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding

This work was funded by the National Key R&D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Grant Disclosures

The following grant information was disclosed by the authors:
National Key R&D Program of China: 2018YFC0506903.

Competing Interests

The authors declare there are no competing interests.

Author Contributions

• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0.

REFERENCES

Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.

Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.

Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: Dr. W. Junk Publishers.

Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.


Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.


Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.


Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box-A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.


Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box. Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.


Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq E De, Ou XK, Wulf R De, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.



Figure 1 The location and DEM of the Jing-Jin-Ji region. Full-size DOI: 10.7717/peerj.9839/fig-1

Mountain in the west, Yanshan Mountain in the north, and the Bohai Sea in the east. The region includes the Beijing, Tianjin, and Hebei provinces (Fig. 1). Jing-Jin-Ji has a population of approximately 110 million people and covers an area of approximately 216,000 km² (Wang et al., 2019). The region is a temperate monsoon climate zone with an elevation range of −14 to 2,837 m (Fig. 1). The annual precipitation ranges from 305 to 711 mm, with increased precipitation at lower altitudes. The annual mean temperature ranges from −3 to 14 °C, with colder averages at higher elevations. The amount of precipitation in the region gradually decreases moving from the southeast to the northwest, while temperature changes show the reverse pattern.


Vegetation and training data

The VMC, completed in 2007 and based on field survey data, included eight vegetation groups (I), 15 vegetation types (II), and 75 formations and subformations (III) from the Jing-Jin-Ji region. However, some of the map's vegetation unit areas are very small and difficult to distinguish. To ensure that enough training and assessment point data could be randomly selected in units II and III, we selected eight units I, 12 units II, and 39 units III from the study area (Table 1). Cultivated vegetation is mainly distributed in low areas with an altitude range of −14 to 254 m and an annual mean temperature range of 7 to 14 °C. Major cultivated plants include winter wheat and coarse grains. Scrub and grass-forb communities are mainly distributed in the north at elevations ranging from 254 to 1,440 m.

We obtained model training and assessment data on vegetation composition from field surveys and other publications. We collected a total of 3,763 vegetation points, with 2,789 of those used for model training and 974 used for model assessment. Each unit III had at least 80 vegetation points, with at least 60 of those used for model training and 20 used for model assessment. The model training and assessment data were randomly selected for each unit III. Additionally, we increased the credibility of the model assessment by first rasterizing the vector VMC onto the same grid as the modeled data and then assessing the data using the Kappa coefficient (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016).
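The Kappa coefficient compares the observed agreement between a reference map and a modeled map with the agreement expected by chance from the class frequencies alone. A minimal sketch of that calculation is shown below; the function name and the ten-cell example are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa between two equally long sequences of class labels."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: share of cells where both maps show the same class
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:  # both maps contain a single identical class
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical example: ten grid cells, vegetation group codes from a reference map and a model
reference_map = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
modeled_map = [1, 1, 2, 2, 2, 2, 3, 3, 3, 1]
kappa = cohen_kappa(reference_map, modeled_map)
print(round(kappa, 3))
```

Here eight of ten cells agree (observed agreement 0.80) while chance agreement is 0.33, so kappa is (0.80 − 0.33) / (1 − 0.33), noticeably lower than the raw agreement.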

Geospatial, climate, and spectral data

We derived geospatial variables, including elevation, slope, and aspect, from the 30 m resolution Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM; Zhao et al., 2018). We then resampled these data to a 500 × 500 m grid cell size using the cubic technique in ArcGIS 10.3 (Wu et al., 2019).

We downloaded the climate data, including 19 bioclimatic variables at ~1 km resolution, from WorldClim (Fick & Hijmans, 2017) at http://worldclim.org. These climate data were also resampled to a 500 × 500 m grid cell size using the cubic technique in ArcGIS 10.3 (Wu et al., 2019). Climatic variables are important for plant ecophysiology (Mod et al., 2016) and are commonly used as bioclimatic limits in vegetation models (Sitch et al., 2003).

We acquired the MYD09A1 500M product data (sinusoidal projection; path 4 and row 26, path 4 and row 27, path 5 and row 26, path 5 and row 27) from summer (July 20, 2013) and winter (January 17, 2013) as MODIS images from the Geospatial Data Cloud at http://www.gscloud.cn. Our image pre-processing included image subsetting, mosaicking, and clipping in ENVI 5.2 (Deng, 2010). We obtained the land surface albedo in bands 1–7 directly from the MYD09A1 500M product and calculated the indices' effectiveness at reflecting vegetation information (Price, Guo & Stiles, 2002; Zhou et al., 2016).

Since vegetation indices can provide information on both vegetation and environment (Bannari, Morin & Bonn, 1995), these indices are more sensitive than single spectral bands at detecting green vegetation (Bannari, Morin & Bonn, 1995; Cohen & Goward, 2004). Therefore, vegetation indices can be used for image interpretation, vegetation discrimination and prediction, and land use change detection (Bannari, Morin & Bonn,


Table 1 Classification units of the vegetation of China.

| Vegetation groups (I) | Vegetation types (II) | Formations and sub-formations (III) |
| --- | --- | --- |
| 0 No vegetation | 0 No vegetation | 0 No vegetation |
| 1 Needleleaf forest | 1 Temperate needleleaf forest | 1 Pinus tabulaeformis forest |
| 2 Broadleaf forest | 2 Temperate broadleaf deciduous forest | 2 Quercus mongolica forest |
| | | 3 Quercus liaotungensis forest |
| | | 4 Quercus variabilis forest |
| | | 5 Robinia pseudoacacia forest |
| | | 6 Salix matsudana forest |
| | | 7 Populus davidiana forest |
| | | 8 Betula platyphylla forest |
| 3 Scrub | 3 Temperate broadleaf deciduous scrub | 9 Corylus heterophylla scrub |
| | | 10 Lespedeza bicolor scrub |
| | | 11 Prunus armeniaca var. ansu scrub |
| | | 12 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosa scrub |
| | | 13 Cotinus coggygria var. cinerea scrub |
| | | 14 Spiraea spp. scrub |
| | | 15 Ostryopsis davidiana scrub |
| 4 Steppe | 4 Temperate grass-forb meadow steppe | 16 Stipa baicalensis forb meadow steppe |
| | | 17 Filifolium sibiricum grass-forb meadow steppe |
| | 5 Temperate needlegrass arid steppe | 18 Aneurolepidium chinense needlegrass steppe |
| | | 19 Stipa krylovii steppe |
| | | 20 Stipa bungiana steppe |
| | | 21 Thymus mongolicus needlegrass steppe |
| 5 Grass-forb community | 6 Temperate grass-forb community | 22 Bothriochloa ischaemum community |
| | | 23 Bothriochloa ischaemum community |
| | | 24 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosus, Bothriochloa ischaemum scrub and grass community |
| | | 25 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosus, Themeda triandra var. japonica scrub and grass community |
| 6 Meadow | 7 Temperate grass and forb meadow | 26 Arundinella hirta, Spodiopogon sibiricus forb meadow |
| | | 27 Carex spp. forb meadow |
| | 8 Temperate grass and forb halophytic meadow | 28 Achnatherum splendens halophytic meadow |
| | | 29 Suaeda glauca halophytic meadow |
| 7 Swamp | 9 Cold-temperate and temperate swamp | 30 Phragmites communis swamp |
| 8 Cultural vegetation | 10 One crop annually and cold-resistant economic crops | 31 Spring wheat, naked oats, buckwheat, potatoes, flax |
| | 11 One crop annually, cold-resistant economic crops and deciduous orchards | 32 Coarse grains |
| | 12 Three crops two years and two crops annually, non-irrigation, deciduous orchards | 33 Winter wheat, coarse grains |
| | | 34 Coarse grains |
| | | 35 Rice |
| | | 36 Winter wheat, corn, cotton |
| | | 37 Apple, pear orchard |
| | | 38 Winter wheat, corn, Chinese sorghum, sweet potatoes, cotton, tobacco, peanut, sesame, apple, pear, hawthorn, persimmon, walnut, pomegranate, grape |
| | | 39 Winter wheat, coarse grains (loamy soil) |

Table 2 The vegetation indices.

| Indices | Abbreviation | Formula |
|---|---|---|
| Ratio vegetation index | RVI | NIR / Red |
| Brightness index | BI | 0.2909 × Blue + 0.2493 × Green + 0.4806 × Red + 0.5568 × NIR + 0.4438 × SWIR1 + 0.1706 × SWIR2 |
| Green vegetation index | GI | -0.2728 × Blue - 0.2174 × Green - 0.5508 × Red + 0.7221 × NIR + 0.0733 × SWIR1 - 0.1648 × SWIR2 |
| Wetness index | WI | 0.1446 × Blue + 0.1761 × Green + 0.3322 × Red + 0.3396 × NIR - 0.6210 × SWIR1 - 0.4186 × SWIR2 |
| Differenced vegetation index | DVI | NIR - Red |
| Green ratio | GR | NIR / Green |
| Mid-infrared ratio | MR | NIR / SWIR1 |
| Soil-adjusted vegetation index | SAVI | 1.5 × (NIR - Red) / (NIR + Red + 0.5) |
| Optimization of soil-adjusted vegetation index | OSAVI | 1.16 × (NIR - Red) / (NIR + Red + 0.16) |
| Atmospherically resistant vegetation index | ARVI | (NIR - (2 × Red - Blue)) / (NIR + (2 × Red - Blue)) |
| Normalized difference vegetation index | NDVI | (NIR - Red) / (NIR + Red) |
| Enhanced vegetation index | EVI | 2.5 × [(NIR - Red) / (NIR + 6 × Red - 7.5 × Blue + 1)] |
| Normalized difference tillage index | NDTI | (SWIR1 - SWIR2) / (SWIR1 + SWIR2) |
| Normalized difference senescent vegetation index | NDSVI | (SWIR1 - Red) / (SWIR1 + Red) |

1995; Cohen & Goward, 2004; Zhou et al., 2016). We tested the vegetation discrimination of 14 vegetation indices (Table 2).
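As an illustration of how the indices in Table 2 can be computed for a single pixel, the following minimal Python sketch implements a subset of them. The function name `vegetation_indices`, the argument names, and the example reflectance values are hypothetical and are not taken from the study; inputs are assumed to be surface reflectances with non-zero denominators.

```python
def vegetation_indices(blue, green, red, nir, swir1, swir2):
    """Compute a subset of the Table 2 indices for one pixel.

    All inputs are assumed to be surface reflectances (roughly 0 to 1).
    The caller must make sure that no denominator is zero.
    """
    red_blue = 2 * red - blue  # atmospheric correction term used by ARVI
    return {
        "RVI": nir / red,
        "DVI": nir - red,
        "GR": nir / green,
        "MR": nir / swir1,
        "NDVI": (nir - red) / (nir + red),
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5),
        "OSAVI": 1.16 * (nir - red) / (nir + red + 0.16),
        "ARVI": (nir - red_blue) / (nir + red_blue),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "NDTI": (swir1 - swir2) / (swir1 + swir2),
        "NDSVI": (swir1 - red) / (swir1 + red),
    }


if __name__ == "__main__":
    # Hypothetical reflectances for a densely vegetated pixel.
    pixel = vegetation_indices(blue=0.04, green=0.08, red=0.05,
                               nir=0.45, swir1=0.25, swir2=0.15)
    for name, value in pixel.items():
        print(f"{name}: {value:.3f}")
```

The same function can be applied once to a summer image and once to a winter image to obtain the seasonal index variables.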

To determine the distribution predictive ability of different variables, we grouped the variables into different combinations based on the results of the Pearson correlation. We only used less correlated variables (|R| < 0.7, Pearson correlation) (Chala et al., 2017) in Combinations 1–9 (Table 3), then used variable combinations as input predictor variables to simulate vegetation distribution. Combination 1 included the less correlated variables of the summer land surface albedos from bands 1 to 7. Combination 2 included the less correlated variables of the winter land surface albedos from bands 1 to 7. Combination 3 included the less correlated variables in Combinations 1 and 2. Combination 4 included


Table 3 Variable combinations. Note: DT10 and RF10 represent the top 10 important variables of decision tree (DT) and random forest (RF) methods with Combination 9 in the vegetation group level, respectively. The vegetation indices and their abbreviations were shown in Table 2.

| Number | Variables combinations |
|---|---|
| 1 | Summer land surface albedos of band 1 and 5 |
| 2 | Winter land surface albedos of band 1 and 6 |
| 3 | Summer land surface albedos of band 1 and 5; Winter land surface albedos of band 1 and 6 |
| 4 | Summer vegetation indices: BI, WI, MR, NDVI, EVI |
| 5 | Winter vegetation indices: MR, NDVI, EVI, NDTI |
| 6 | Summer vegetation indices: BI, WI, MR, NDVI, EVI; Winter vegetation indices: MR, NDVI, EVI, NDTI |
| 7 | Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month |
| 8 | Slope, Aspect, Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month |
| 9 | Summer land surface albedos of band 1; Winter land surface albedos of band 6; Summer vegetation indices: BI, WI, MR, NDVI, EVI; Winter vegetation indices: MR, NDVI, EVI, NDTI; Slope, Aspect, Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month |
| 10 | DT10: Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month, Slope; Winter vegetation indices: MR; Summer land surface albedos of band 1; Summer vegetation indices: BI and EVI; Winter land surface albedos of band 6 |
| 11 | RF10: Annual precipitation, Annual mean temperature, Mean diurnal range, Slope; Summer vegetation indices: BI, MR, NDVI and EVI; Winter vegetation indices: MR and NDVI |

the less correlated variables of the summer vegetation indices. Combination 5 included the less correlated variables of the winter vegetation indices. Combination 6 included the less correlated variables in Combinations 4 and 5. Combination 7 included the less correlated variables from the 19 bioclimatic variables. Combination 8 included the less correlated variables from the 19 bioclimatic variables and three geospatial variables. Combination 9 included the less correlated variables in Combinations 3, 6, and 8. Combinations 10 and 11 represented the top 10 most important variables of the DT and RF methods with Combination 9 in vegetation unit I, respectively (Table 3). The SVM and maximum likelihood classification (MLC) methods only output the simulation results of variable Combinations 1 to 6, likely due to the training samples' weak separability (Deng, 2010).
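The text does not state how the less correlated variables were picked once the pairwise correlations were known. One possible procedure, sketched below purely as an assumption, is a greedy filter that keeps a variable only if its absolute Pearson correlation with every variable already kept is below 0.7. The function names and the toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)  # undefined for constant columns


def select_less_correlated(columns, threshold=0.7):
    """Greedy filter (an assumed procedure, not the authors' documented one).

    `columns` maps a variable name to its sampled values. Variables are
    visited in insertion order and kept only if |r| < threshold against
    every variable kept so far.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical albedo samples: band2 closely tracks band1, band5 does not.
    albedos = {
        "band1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "band2": [2.0, 4.0, 6.0, 8.0, 10.5],
        "band5": [3.0, 1.0, 4.0, 1.0, 5.0],
    }
    print(select_less_correlated(albedos))
```

Because the filter is order dependent, a different visiting order can keep a different but equally valid subset.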

Vegetation distribution models
We used DT, RF, MLC, and SVM vegetation distribution models in this study. The DT model is a divisive, monothetic, and supervised classifier often used for species distribution modeling and related applications (Franklin, 2010). It is computationally fast and easy to understand and implement. It uses classification or regression algorithms to generate classification rules and then visualizes those rules into simple tree graphics (Hastie, Tibshirani & Friedman, 2009; Zhou et al., 2016). The DT model calculates the most significant variables contributing to the model (Deng, 2010). We used a DT with five layers, 40 samples in the smallest parent node, and 10 samples in the smallest child node.
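The three DT settings above act as stopping rules during tree growth. The sketch below shows one way to express them; it assumes that "five layers" means at most five levels of splits below the root, which the text does not confirm, and the constant and function names are hypothetical.

```python
MAX_DEPTH = 5          # assumed reading of "five layers"
MIN_PARENT_SIZE = 40   # smallest node that may still be split
MIN_CHILD_SIZE = 10    # smallest node a split may produce


def split_allowed(depth, n_parent, n_left, n_right):
    """Return True if a candidate split respects all three stopping rules.

    depth   : depth of the node being split (root = 0)
    n_parent: number of training samples in that node
    n_left, n_right: samples that would fall into each child
    """
    return (
        depth < MAX_DEPTH
        and n_parent >= MIN_PARENT_SIZE
        and min(n_left, n_right) >= MIN_CHILD_SIZE
    )


if __name__ == "__main__":
    print(split_allowed(depth=0, n_parent=100, n_left=60, n_right=40))
    print(split_allowed(depth=2, n_parent=100, n_left=95, n_right=5))
```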

The RF model is an ensemble method that has been applied in risk assessment and species distribution modeling studies (Cutler et al., 2007; Zhang & Dong, 2017). The RF model creates and combines different DTs to produce considerably more accurate classifications that are unaffected by noise or overtraining (Burai et al., 2015; Cutler et al., 2007; Gislason, Benediktsson & Sveinsson, 2006). The RF model also calculates the most significant variables that contribute to the model (Cutler et al., 2007). Running an RF model requires defined parameters, including tree number, number of randomly selected features, and node impurity function. We generated the RF model in EnMAP-Box, a license-free and platform-independent software interface designed to process hyperspectral remote sensing data, which was developed by the Humboldt University of Berlin. There are in-built applications aimed at the processing of hyperspectral data, such as SVM and RF for classification of image data, in the EnMAP-Box (Held et al., 2014). We used the default settings in EnMAP-Box with 100 trees. The number of randomly selected features was equal to the square root of the number of all features, and we used a Gini coefficient for the node impurity function (Rabe et al., 2014; Ma, Gao & Gu, 2019; van der Linden et al., 2015; Zhou et al., 2016).
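The RF parameters named above can be illustrated with a few lines of Python. This is a generic sketch of the Gini impurity, the square-root rule for the number of features tried at each split, and majority voting across trees; it is not EnMAP-Box code, the function names are hypothetical, and rounding the square root down is an assumption.

```python
import math
from collections import Counter

N_TREES = 100  # number of trees stated in the text


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def features_per_split(n_features):
    """Square-root rule for the random feature subset (rounded down here)."""
    return max(1, int(math.sqrt(n_features)))


def majority_vote(tree_predictions):
    """Class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


if __name__ == "__main__":
    print(gini_impurity(["forest", "forest", "scrub", "scrub"]))
    print(features_per_split(17))
    print(majority_vote(["forest", "scrub", "forest"]))
```

A pure node has an impurity of 0, so a split is preferred when it lowers the size-weighted impurity of the child nodes.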

The MLC model is one of the most commonly used supervised image classification methods. MLC's classification rules use the statistics of the Gaussian probability density function to assign each pixel to the class with the highest probability. Although the MLC method usually generates similar or more accurate classifications than other methods, it is not applicable when there are fewer training samples than input predictors (Burai et al., 2015; Zhou et al., 2016).
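The decision rule described above can be sketched as follows. To stay short, this example treats the predictors as independent (a diagonal covariance matrix) and gives every class the same prior; a full MLC implementation would use the complete class covariance matrices. All names and sample values are hypothetical.

```python
import math


def fit_gaussian_classes(samples):
    """Estimate per-class means and variances for each predictor.

    `samples` maps a class label to a list of feature vectors. Each class
    needs at least two samples and a non-zero variance in every predictor.
    """
    params = {}
    for label, rows in samples.items():
        n = len(rows)
        dims = range(len(rows[0]))
        means = [sum(r[d] for r in rows) / n for d in dims]
        variances = [sum((r[d] - means[d]) ** 2 for r in rows) / (n - 1)
                     for d in dims]
        params[label] = (means, variances)
    return params


def log_likelihood(x, means, variances):
    """Log of the Gaussian density under the independence assumption."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def classify(x, params):
    """Assign x to the class with the highest likelihood."""
    return max(params, key=lambda label: log_likelihood(x, *params[label]))


if __name__ == "__main__":
    # Hypothetical training pixels: (summer NDVI, annual mean temperature).
    training = {
        "forest": [[0.8, 12.0], [0.7, 11.0], [0.9, 13.0]],
        "steppe": [[0.3, 5.0], [0.2, 4.0], [0.4, 6.0]],
    }
    model = fit_gaussian_classes(training)
    print(classify([0.75, 11.5], model))
```

The need to estimate variances (and, in the full method, covariances) per class is why the approach breaks down when a class has fewer training samples than predictors.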

The SVM model is a supervised machine learning model used for classification and regression. It is a complex and widely used method that can output more accurate predictions (Burai et al., 2015) than other methods. The SVM model searches for an optimal plane in a multidimensional space to divide all sample elements into two categories, making the distance between the closest points in the two classes as large as possible (Kabacoff, 2016). Running an SVM model requires a defined kernel parameter g and regularization parameter c. In this study, we generated the SVM model in the EnMAP-Box. We applied the default EnMAP-Box settings for the SVM model, in which the parameter g ranged from 0.01 to 1,000 and the parameter c ranged from 0.1 to 1,000. Parameters g and c were tested using a grid search with internal performance estimation, and we used those with the best performance for data training (Lin et al., 2014; van der Linden et al., 2014; van der Linden et al., 2015).
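The grid search over g and c can be sketched as below. Two points are assumptions rather than statements from the text: that g is the width parameter of a radial basis function (RBF) kernel, and that the grid is spaced by powers of ten. The `score` callback stands in for the internal performance estimate (for example, cross-validated accuracy) and is hypothetical.

```python
import math
from itertools import product

# Candidate values spanning the ranges given in the text (assumed spacing).
G_VALUES = [10.0 ** k for k in range(-2, 4)]  # 0.01 ... 1000
C_VALUES = [10.0 ** k for k in range(-1, 4)]  # 0.1 ... 1000


def rbf_kernel(x, y, g):
    """RBF kernel exp(-g * ||x - y||^2), assuming g is the kernel width."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * squared_distance)


def grid_search(score):
    """Return the (g, c) pair with the highest score(g, c)."""
    return max(product(G_VALUES, C_VALUES), key=lambda pair: score(*pair))


if __name__ == "__main__":
    # Toy score that peaks at g = 1 and c = 10; a real run would train an
    # SVM for each pair and return its estimated accuracy.
    def toy_score(g, c):
        return -abs(math.log10(g)) - abs(math.log10(c) - 1.0)

    print(grid_search(toy_score))
```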

We generated the predicted vegetation maps of the three classification units using the DT, RF, MLC, and SVM methods with a resolution of 500 m. We selected all 11 variable combinations as the input variables for each method. The DT and RF method results indicated which variables were most important for vegetation discrimination.


Model assessment
We used the VMC and a total of 974 vegetation points to assess the overall accuracy and Kappa coefficient of every predicted vegetation map. Kappa coefficient values ranging from 0.4 to 0.55 indicated moderate agreement, from 0.56 to 0.8 indicated substantial agreement, and from 0.81 to 1 indicated almost perfect agreement (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016). When the Kappa coefficient value was greater than 0.4, the assessed predicted map was considered acceptable.
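For readers who want to reproduce the two assessment measures, the sketch below computes overall accuracy and Cohen's Kappa coefficient from a confusion matrix and maps the result to the agreement levels listed above. The function names and the example matrix are hypothetical; the matrix is assumed to have reference classes in rows and predicted classes in columns.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa_coefficient(matrix):
    """Cohen's Kappa: (observed - chance agreement) / (1 - chance agreement)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    observed = overall_accuracy(matrix)
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - chance) / (1 - chance)  # undefined if chance == 1


def agreement_level(kappa):
    """Agreement classes as used in this study."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.56:
        return "substantial"
    if kappa >= 0.4:
        return "moderate"
    return "below the acceptance threshold"


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix with 100 validation points.
    confusion = [[40, 10],
                 [5, 45]]
    k = kappa_coefficient(confusion)
    print(overall_accuracy(confusion), k, agreement_level(k))
```

Kappa is lower than overall accuracy whenever part of the agreement could have arisen by chance, which is why the two measures are reported side by side.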

RESULTS
Unit I modeling and assessment
The RF model's results were better than the results of the DT, MLC, and SVM models (Table 4). The RF model had a Kappa coefficient larger than 0.4 when using variable Combinations 6 to 11 assessed by field point data, with an overall accuracy of 50% to 72%. The RF model had a Kappa coefficient larger than 0.56 when using variable Combinations 7 to 11 assessed by field data, with an overall accuracy of 68% to 72%. The RF model had the highest Kappa coefficient of 0.66 and the highest overall accuracy of 72% when using variable Combination 7. The DT model had a Kappa coefficient larger than 0.4 when using variable Combinations 7 to 11 assessed by field point data, with an overall accuracy of 54% to 56%. The DT model had no Kappa coefficient larger than 0.56 when using all variable combinations. After VMC assessment, we found the highest Kappa coefficient was 0.38, with an overall accuracy of 57%, in the RF model using variable Combinations 9 to 11 (Table 4; Fig. 2).

Unit II modeling and assessment
The RF model results were better than the results of the other three models. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 66%–70% and 54%–55% for field point data and VMC assessments, respectively. The RF model using Combinations 7 to 11 had a Kappa coefficient larger than 0.56 and an overall accuracy of 66%–70% when assessed by field point data. The RF model had the highest Kappa coefficient of 0.65 and the highest overall accuracy of 70% when using variable Combination 7. The DT model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 53%–55% and 65%–72% for field point data and VMC assessments, respectively. The DT model had the highest Kappa coefficient of 0.54 and overall accuracy of 72% when using variable Combination 7. When assessed by the VMC, the DT model had a larger Kappa coefficient and greater overall accuracy than the RF model (Table 5; Fig. 3).

Unit III modeling and assessment
Only the RF model could simulate vegetation distribution in unit III. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4 and an overall accuracy of 55%–58% assessed by field point data. The RF model using variable Combination 7 had the highest Kappa coefficient of 0.57 (the only model with a Kappa coefficient larger than 0.56) and the highest overall accuracy of 58% assessed by field point


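Table 4 Model assessment of vegetation groups by field point data and VMC. Variable combinations were shown in Table 3.

| Variable combinations | DT point OA | DT point KC | DT VMC OA | DT VMC KC | RF point OA | RF point KC | RF VMC OA | RF VMC KC | SVM point OA | SVM point KC | SVM VMC OA | SVM VMC KC | MLC point OA | MLC point KC | MLC VMC OA | MLC VMC KC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 34 | 0.18 | 55 | 0.22 | 37 | 0.24 | 32 | 0.09 | 36 | 0.21 | 53 | 0.21 | 23 | 0.08 | 11 | 0.02 |
| 2 | 38 | 0.20 | 52 | 0.23 | 39 | 0.27 | 37 | 0.13 | 35 | 0.20 | 55 | 0.24 | 18 | 0.07 | 9 | 0.03 |
| 3 | 45 | 0.31 | 54 | 0.26 | 47 | 0.36 | 45 | 0.21 | 41 | 0.27 | 54 | 0.27 | 24 | 0.12 | 15 | 0.05 |
| 4 | 32 | 0.16 | 46 | 0.16 | 42 | 0.30 | 42 | 0.17 | 37 | 0.22 | 57 | 0.26 | 11 | 0.04 | 3 | 0.01 |
| 5 | 31 | 0.11 | 59 | 0.14 | 44 | 0.32 | 44 | 0.19 | 36 | 0.22 | 51 | 0.22 | 9 | 0.04 | 4 | 0.02 |
| 6 | 41 | 0.26 | 44 | 0.18 | 50 | 0.40 | 52 | 0.27 | 42 | 0.29 | 54 | 0.27 | 13 | 0.08 | 4 | 0.03 |
| 7 | 54 | 0.45 | 57 | 0.34 | 72 | 0.66 | 55 | 0.35 | | | | | | | | |
| 8 | 55 | 0.46 | 56 | 0.35 | 69 | 0.63 | 56 | 0.37 | | | | | | | | |
| 9 | 55 | 0.46 | 53 | 0.34 | 68 | 0.61 | 57 | 0.38 | | | | | | | | |
| 10 | 55 | 0.46 | 53 | 0.33 | 69 | 0.63 | 57 | 0.38 | | | | | | | | |
| 11 | 56 | 0.46 | 56 | 0.36 | 68 | 0.62 | 57 | 0.38 | | | | | | | | |

Notes: DT, decision tree; RF, random forest; SVM, support vector machine; MLC, maximum likelihood classification; VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient. The original table highlights Kappa coefficients larger than 0.56 and those larger than 0.4 and less than 0.56.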


Figure 2 The modeling vegetation map of vegetation groups with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

data. The Kappa coefficients in all models were less than 0.4 when assessed by the VMC (Table 6; Fig. 4).

The abbreviations are the same as in Table 4.

Important variables
For the RF model, eight of the top 10 most important variables were the same across the different vegetation units: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio and NDVI of winter vegetation index, brightness index and NDVI of summer vegetation index). For the DT model, nine of the top 10 most important variables were the same across the different vegetation units: four climate variables (annual mean temperature, mean diurnal range, precipitation of the driest month, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio of winter vegetation index, brightness index of summer vegetation index, summer surface albedo of band 1, winter surface albedo of band 6) (Table 7).

DISCUSSION
Vegetation classification units
Vegetation classification is an important and complex system with multiple levels. Higher level classification methods not only accurately classify vegetation, but they can also describe


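Table 5 Model assessment of vegetation types by field point data and VMC. Variable combinations were shown in Table 3.

| Variable combinations | DT point OA | DT point KC | DT VMC OA | DT VMC KC | RF point OA | RF point KC | RF VMC OA | RF VMC KC | SVM point OA | SVM point KC | SVM VMC OA | SVM VMC KC | MLC point OA | MLC point KC | MLC VMC OA | MLC VMC KC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 42 | 0.24 | 63 | 0.33 | 32 | 0.22 | 23 | 0.09 | 32 | 0.18 | 40 | 0.18 | 6 | 0.02 | 7 | 0.00 |
| 2 | 44 | 0.27 | 58 | 0.31 | 34 | 0.23 | 30 | 0.14 | 31 | 0.18 | 44 | 0.24 | 5 | 0.02 | 14 | 0.00 |
| 3 | 43 | 0.30 | 58 | 0.35 | 44 | 0.34 | 38 | 0.22 | 37 | 0.26 | 43 | 0.25 | 9 | 0.05 | 13 | 0.00 |
| 4 | 36 | 0.20 | 47 | 0.20 | 39 | 0.29 | 31 | 0.15 | 32 | 0.19 | 43 | 0.21 | 13 | 0.07 | 6 | 0.02 |
| 5 | 32 | 0.14 | 59 | 0.23 | 41 | 0.31 | 36 | 0.19 | 34 | 0.22 | 43 | 0.22 | 6 | 0.03 | 6 | 0.03 |
| 6 | 36 | 0.23 | 45 | 0.24 | 47 | 0.38 | 44 | 0.27 | 40 | 0.29 | 43 | 0.25 | 14 | 0.09 | 21 | 0.06 |
| 7 | 55 | 0.46 | 72 | 0.54 | 70 | 0.65 | 54 | 0.41 | | | | | | | | |
| 8 | 53 | 0.44 | 68 | 0.52 | 68 | 0.63 | 55 | 0.43 | | | | | | | | |
| 9 | 54 | 0.45 | 65 | 0.49 | 66 | 0.60 | 55 | 0.43 | | | | | | | | |
| 10 | 54 | 0.45 | 65 | 0.49 | 68 | 0.63 | 55 | 0.43 | | | | | | | | |
| 11 | 53 | 0.44 | 68 | 0.52 | 67 | 0.62 | 55 | 0.43 | | | | | | | | |

Notes: DT, decision tree; RF, random forest; SVM, support vector machine; MLC, maximum likelihood classification; VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient. The original table highlights Kappa coefficients larger than 0.56 and those larger than 0.4 and less than 0.56.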


Figure 3 The modeling vegetation map of vegetation types with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

ecosystem diversity, even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions that are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as plant sets sharing similar perturbation response effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine


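Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations were shown in Table 3.

| Variable combinations | DT point OA | DT point KC | DT VMC OA | DT VMC KC | RF point OA | RF point KC | RF VMC OA | RF VMC KC | SVM point OA | SVM point KC | SVM VMC OA | SVM VMC KC | MLC point OA | MLC point KC | MLC VMC OA | MLC VMC KC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 23 | 0.14 | 19 | 0.08 | 20 | 0.18 | 5 | 0.02 | 11 | 0.09 | 6 | 0.03 | 8 | 0.06 | 8 | 0.04 |
| 2 | 22 | -0.04 | 49 | 0.04 | 19 | 0.17 | 6 | 0.03 | 13 | 0.11 | 7 | 0.04 | 8 | 0.06 | 13 | 0.05 |
| 3 | 26 | 0.14 | 45 | 0.23 | 29 | 0.27 | 9 | 0.07 | 21 | 0.19 | 10 | 0.07 | 12 | 0.09 | 13 | 0.07 |
| 4 | 30 | 0.20 | 30 | 0.04 | 22 | 0.20 | 7 | 0.04 | 16 | 0.14 | 6 | 0.03 | 9 | 0.07 | 8 | 0.04 |
| 5 | 33 | 0.01 | 67 | 0.00 | 22 | 0.20 | 7 | 0.04 | 15 | 0.13 | 5 | 0.03 | 11 | 0.09 | 10 | 0.04 |
| 6 | 26 | 0.15 | 22 | 0.02 | 31 | 0.30 | 11 | 0.08 | 21 | 0.19 | 8 | 0.06 | 12 | 0.09 | 15 | 0.08 |
| 7 | 33 | 0.20 | 52 | 0.27 | 58 | 0.57 | 23 | 0.20 | | | | | | | | |
| 8 | 27 | 0.17 | 34 | 0.18 | 55 | 0.54 | 23 | 0.20 | | | | | | | | |
| 9 | 25 | 0.15 | 22 | 0.15 | 55 | 0.53 | 22 | 0.20 | | | | | | | | |
| 10 | 30 | 0.17 | 41 | 0.22 | 56 | 0.55 | 23 | 0.21 | | | | | | | | |
| 11 | 31 | 0.20 | 41 | 0.22 | 56 | 0.55 | 23 | 0.20 | | | | | | | | |

Notes: DT, decision tree; RF, random forest; SVM, support vector machine; MLC, maximum likelihood classification; VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient. The original table highlights Kappa coefficients larger than 0.56 and those larger than 0.4 and less than 0.56.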


Figure 4 The modeling vegetation map of formations and sub-formations with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracy of the vegetation distribution simulations in units I and II was similar to each other and higher than unit III's simulation (Tables 4–6).

Different model performances
We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same


Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices were shown in Table 2. Values in parentheses are the standardized importance for the decision tree (DT) and the normalized importance for the random forest (RF).

| Rank | Vegetation groups, DT | Vegetation groups, RF | Vegetation types, DT | Vegetation types, RF | Formations and sub-formations, DT | Formations and sub-formations, RF |
|---|---|---|---|---|---|---|
| 1 | Annual mean temperature (1.00) | Annual mean temperature (3.68) | Annual mean temperature (1.00) | Annual mean temperature (3.51) | Annual mean temperature (1.00) | Annual mean temperature (4.16) |
| 2 | Annual precipitation (0.88) | Slope (2.94) | Slope (0.83) | Slope (3.35) | Annual precipitation (0.86) | Annual precipitation (3.28) |
| 3 | Slope (0.80) | Mean diurnal range (2.60) | Annual precipitation (0.51) | Mean diurnal range (3.06) | Slope (0.63) | Mean diurnal range (3.25) |
| 4 | Winter vegetation index MR (0.36) | Annual precipitation (2.38) | Winter vegetation index MR (0.30) | Annual precipitation (2.8) | Mean diurnal range (0.52) | Slope (2.24) |
| 5 | Mean diurnal range (0.33) | Summer vegetation index BI (1.88) | Mean diurnal range (0.28) | Summer vegetation index BI (1.84) | Precipitation of driest month (0.52) | Precipitation of driest month (2.16) |
| 6 | Summer surface albedo of band 1 (0.29) | Winter vegetation index NDVI (1.37) | Summer vegetation index EVI (0.22) | Winter vegetation index NDVI (1.61) | Winter vegetation index MR (0.4) | Summer vegetation index BI (1.83) |
| 7 | Summer vegetation index BI (0.28) | Summer vegetation index EVI (1.36) | Precipitation of driest month (0.21) | Winter vegetation index MR (1.45) | Summer surface albedo of band 1 (0.32) | Summer vegetation index NDVI (1.7) |
| 8 | Precipitation of driest month (0.25) | Winter vegetation index MR (1.30) | Summer vegetation index BI (0.20) | Summer vegetation index WI (1.31) | Summer vegetation index BI (0.32) | Winter vegetation index NDVI (1.61) |
| 9 | Summer vegetation index EVI (0.23) | Summer vegetation index NDVI (1.22) | Summer surface albedo of band 1 (0.19) | Precipitation of driest month (1.24) | Summer vegetation index WI (0.31) | Winter vegetation index MR (1.47) |
| 10 | Winter surface albedo of band 6 (0.19) | Summer vegetation index MR (1.12) | Winter surface albedo of band 6 (0.14) | Summer vegetation index NDVI (1.22) | Winter surface albedo of band 6 (0.28) | Summer vegetation indices EVI and MR (1.32) |


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogeneous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010), and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables. In our study, the SVM and MLC models only output simulation results for variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).

Important variables in vegetation classification models
Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models to >80% for most forest types. Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors, such as soil moisture, pH, and nutrients, should be considered. Other factors, such as actual light, disturbance, biotic interactions, land use, and bioclimatic information, could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy
In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS
Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (Mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS
We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was funded by the National Key R&D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Yi et al. (2020), PeerJ, DOI 10.7717/peerj.9839

Grant Disclosures
The following grant information was disclosed by the authors:
National Key R&D Program of China: 2018YFC0506903.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0

REFERENCES
Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.
Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.
Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: Dr. W. Junk Publishers.
Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.

Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.
Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.
Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.
Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.
Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.
Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.
Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.
De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.
Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.
Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.
Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.
Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.
Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.
Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.

Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.
Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.
Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.
Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.
Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.
Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.
Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.
Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.
Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.
Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.
Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.
Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.
Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.
Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.
Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.

Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.
Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.
van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box-A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.
Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.
Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.
Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.
Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.
Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.
Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.
Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.
Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.
Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.
Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.
Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.

Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box, Version 2.0. Available at https://www.enmap.org.
Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.
Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.
Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.
Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.
Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.
Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.
Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.
Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.
Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.
Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.
Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.

Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.
Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.
Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.
Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).
Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.
Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.
Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.

Vegetation and training data
The VMC, completed in 2007 and based on field survey data, included eight vegetation groups (I), 15 vegetation types (II), and 75 formations and subformations (III) from the Jing-Jin-Ji region. However, some of the map's vegetation unit areas are very small and difficult to distinguish. To ensure that enough training and assessment point data could be randomly selected in units II and III, we selected eight units I, 12 units II, and 39 units III from the study area (Table 1). Cultivated vegetation is mainly distributed in low areas, with an altitude range of −14 to 254 m and an annual mean temperature range of 7 to 14 °C. Major cultivated plants include winter wheat and coarse grains. Scrub and grass-forb communities are mainly distributed in the north, at elevations ranging from 254 to 1,440 m.

We obtained model training and assessment data on vegetation composition from field surveys and other publications. We collected a total of 3,763 vegetation points, with 2,789 of those used for model training and 974 used for model assessment. Each unit III had at least 80 vegetation points, with at least 60 of those used for model training and 20 used for model assessment. The model training and assessment data were randomly selected for each unit III. Additionally, we increased the credibility of the model assessment by first rasterizing the vector VMC onto the same grid as the modeled data and then assessing the data using the Kappa coefficient (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016).
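The overall accuracy and Kappa coefficient used for assessment can be sketched in a few lines of plain Python. This is a minimal illustration of Cohen's Kappa, not the authors' assessment workflow; the function name and the ten labels are hypothetical.

```python
from collections import Counter


def overall_accuracy_and_kappa(reference, predicted):
    """Return (overall accuracy, Cohen's Kappa) for two equal-length label lists."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: share of points where the map matches the reference.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the marginal class counts, summed over classes.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        return observed, 1.0  # degenerate case: a single class everywhere
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical vegetation-group labels for ten assessment points.
reference = [1, 1, 2, 2, 3, 3, 3, 8, 8, 8]
predicted = [1, 2, 2, 2, 3, 3, 8, 8, 8, 8]
oa, kappa = overall_accuracy_and_kappa(reference, predicted)
print(f"overall accuracy = {oa:.2f}, Kappa = {kappa:.2f}")
```

Because Kappa subtracts the agreement expected by chance, it is lower than the overall accuracy whenever a few classes dominate the sample.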

Geospatial, climate, and spectral data
We derived geospatial variables, including elevation, slope, and aspect, from the 30 m resolution Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM; Zhao et al., 2018). We then resampled these data to a 500 × 500 m grid cell size using the cubic technique in ArcGIS 10.3 (Wu et al., 2019).

We downloaded the climate data, including 19 bioclimatic variables at ∼1 km resolution, from WorldClim (Fick & Hijmans, 2017) at http://worldclim.org. These climate data were also resampled to a 500 × 500 m grid cell size using the cubic technique in ArcGIS 10.3 (Wu et al., 2019). Climatic variables are important for plant ecophysiology (Mod et al., 2016) and are commonly used as bioclimatic limits in vegetation models (Sitch et al., 2003).

We acquired the MYD09A1 500M product data (sinusoidal projection; path 4 and row 26, path 4 and row 27, path 5 and row 26, path 5 and row 27) from summer (July 20, 2013) and winter (January 17, 2013) as MODIS images from the Geospatial Data Cloud at http://www.gscloud.cn. Our image pre-processing included image subset, mosaicking, and clipping in ENVI 5.2 (Deng, 2010). We obtained the land surface albedo in bands 1-7 directly from the MYD09A1 500M product and calculated the indices' effectiveness at reflecting vegetation information (Price, Guo & Stiles, 2002; Zhou et al., 2016).

Since vegetation indices can provide information on both vegetation and environment (Bannari, Morin & Bonn, 1995), these indices are more sensitive than single spectral bands at detecting green vegetation (Bannari, Morin & Bonn, 1995; Cohen & Goward, 2004). Therefore, vegetation indices can be used for image interpretation, vegetation discrimination and prediction, and land use change detection (Bannari, Morin & Bonn,

Table 1 Classification units of the vegetation of China.

Vegetation groups (I) | Vegetation types (II) | Formations and sub-formations (III)
0 No vegetation | 0 No vegetation | 0 No vegetation
1 Needleleaf forest | 1 Temperate needleleaf forest | 1 Pinus tabulaeformis forest
2 Broadleaf forest | 2 Temperate broadleaf deciduous forest | 2 Quercus mongolica forest; 3 Quercus liaotungensis forest; 4 Quercus variabilis forest; 5 Robinia pseudoacacia forest; 6 Salix matsudana forest; 7 Populus davidiana forest; 8 Betula platyphylla forest
3 Scrub | 3 Temperate broadleaf deciduous scrub | 9 Corylus heterophylla scrub; 10 Lespedeza bicolor scrub; 11 Prunus armeniaca var. ansa scrub; 12 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosa scrub; 13 Cotinus coggygria var. cinerea scrub; 14 Spiraea spp. scrub; 15 Ostryopsis davidiana scrub
4 Steppe | 4 Temperate grass-forb meadow steppe | 16 Stipa baicalensis forb meadow steppe; 17 Filifolium sibiricum grass-forb meadow steppe
4 Steppe | 5 Temperate needlegrass arid steppe | 18 Aneurolepidium chinense needlegrass steppe; 19 Stipa krylovii steppe; 20 Stipa bungiana steppe; 21 Thymus mongolicus needlegrass steppe
5 Grass-forb community | 6 Temperate grass-forb community | 22 Bothriochloa ischaemum community; 23 Bothriochloa ischaemum community; 24 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosus, Bothriochloa ischaemum scrub and grass community; 25 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosus, Themeda triandra var. japonica scrub and grass community
6 Meadow | 7 Temperate grass and forb meadow | 26 Arundinella hirta, Spodiopogon sibiricus forb meadow; 27 Carex spp. forb meadow
6 Meadow | 8 Temperate grass and forb halophytic meadow | 28 Achnatherum splendens halophytic meadow; 29 Suaeda glauca halophytic meadow
7 Swamp | 9 Cold-temperate and temperate swamp | 30 Phragmites communis swamp
8 Cultural vegetation | 10 One crop annually and cold-resistant economic crops | 31 Spring wheat, naked oats, buckwheat, potatoes, flax
8 Cultural vegetation | 11 One crop annually, cold-resistant economic crops, and deciduous orchards | 32 Coarse grains
8 Cultural vegetation | 12 Three crops two years and two crops annually, non irrigation, deciduous orchards | 33 Winter wheat, coarse grains; 34 Coarse grains; 35 Rice; 36 Winter wheat, corn, cotton; 37 Apple, pear orchard; 38 Winter wheat, corn, Chinese sorghum, sweet potatoes, cotton, tobacco, peanut, sesame, apple, pear, hawthorn, persimmon, walnut, pomegranate, grape; 39 Winter wheat, coarse grains (loamy soil)

Table 2 The vegetation indices.

Indices | Abbreviation | Formula
Ratio vegetation index | RVI | NIR/Red
Brightness index | BI | 0.2909Blue + 0.2493Green + 0.4806Red + 0.5568NIR + 0.4438SWIR1 + 0.1706SWIR2
Green vegetation index | GI | −0.2728Blue − 0.2174Green − 0.5508Red + 0.7221NIR + 0.0733SWIR1 − 0.1648SWIR2
Wetness index | WI | 0.1446Blue + 0.1761Green + 0.3322Red + 0.3396NIR − 0.6210SWIR1 − 0.4186SWIR2
Differenced vegetation index | DVI | NIR − Red
Green ratio | GR | NIR/Green
Mid-infrared ratio | MR | NIR/SWIR1
Soil-adjusted vegetation index | SAVI | (1.5(NIR − Red))/(NIR + Red + 0.5)
Optimization of soil-adjusted vegetation index | OSAVI | (1.16(NIR − Red))/(NIR + Red + 0.16)
Atmospherically resistant vegetation index | ARVI | (NIR − (2Red − Blue))/(NIR + (2Red − Blue))
Normalized difference vegetation index | NDVI | (NIR − Red)/(NIR + Red)
Enhanced vegetation index | EVI | 2.5[(NIR − Red)/(NIR + 6Red − 7.5Blue + 1)]
Normalized difference tillage index | NDTI | (SWIR1 − SWIR2)/(SWIR1 + SWIR2)
Normalized difference senescent vegetation index | NDSVI | (SWIR1 − Red)/(SWIR1 + Red)

1995; Cohen & Goward, 2004; Zhou et al., 2016). We tested the vegetation discrimination of 14 vegetation indices (Table 2).
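As a rough illustration of how the Table 2 formulas translate into computation, the sketch below evaluates a subset of the indices for a single pixel. The function name and the reflectance values are hypothetical, and the band assignment noted in the comments (MODIS band 1 = Red, 2 = NIR, 3 = Blue, 4 = Green, 6 = SWIR1, 7 = SWIR2) is our assumption rather than a mapping stated in the text.

```python
def vegetation_indices(blue, green, red, nir, swir1, swir2):
    """Compute a subset of the Table 2 indices from surface reflectances (0-1 scale).

    Assumed band mapping (not stated in the text): MODIS band 1 = red,
    2 = NIR, 3 = blue, 4 = green, 6 = SWIR1, 7 = SWIR2.
    """
    return {
        "RVI": nir / red,
        "DVI": nir - red,
        "MR": nir / swir1,
        "NDVI": (nir - red) / (nir + red),
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "BI": (0.2909 * blue + 0.2493 * green + 0.4806 * red
               + 0.5568 * nir + 0.4438 * swir1 + 0.1706 * swir2),
        "NDTI": (swir1 - swir2) / (swir1 + swir2),
    }


# Hypothetical reflectances for one densely vegetated summer pixel.
indices = vegetation_indices(blue=0.04, green=0.08, red=0.05,
                             nir=0.40, swir1=0.20, swir2=0.10)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

The same expressions apply unchanged to whole rasters if the scalar inputs are replaced by arrays of equal shape.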

To determine the distribution predictive ability of different variables, we grouped the variables into different combinations based on the results of the Pearson correlation. We only used less correlated variables (|R| < 0.7, Pearson correlation) (Chala et al., 2017) in Combinations 1–9 (Table 3), then used variable combinations as input predictor variables to simulate vegetation distribution. Combination 1 included the less correlated variables of the summer land surface albedos from bands 1 to 7. Combination 2 included the less correlated variables of the winter land surface albedos from bands 1 to 7. Combination 3 included the less correlated variables in Combinations 1 and 2. Combination 4 included

Table 3 Variable combinations.
Note: DT10 and RF10 represent the top 10 important variables of the decision tree (DT) and random forest (RF) methods with Combination 9 at the vegetation group level, respectively. The vegetation indices and their abbreviations are shown in Table 2.

Number | Variable combinations
1 | Summer land surface albedos of bands 1 and 5
2 | Winter land surface albedos of bands 1 and 6
3 | Summer land surface albedos of bands 1 and 5; Winter land surface albedos of bands 1 and 6
4 | Summer vegetation indices: BI, WI, MR, NDVI, EVI
5 | Winter vegetation indices: MR, NDVI, EVI, NDTI
6 | Summer vegetation indices: BI, WI, MR, NDVI, EVI; Winter vegetation indices: MR, NDVI, EVI, NDTI
7 | Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month
8 | Slope, Aspect, Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month
9 | Summer land surface albedos of band 1; Winter land surface albedos of band 6; Summer vegetation indices: BI, WI, MR, NDVI, EVI; Winter vegetation indices: MR, NDVI, EVI, NDTI; Slope, Aspect, Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month
10 | DT10: Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month, Slope, Winter vegetation indices MR, Summer land surface albedos of band 1, Summer vegetation indices BI and EVI, Winter land surface albedos of band 6
11 | RF10: Annual precipitation, Annual mean temperature, Mean diurnal range, Slope, Summer vegetation indices BI, MR, NDVI, and EVI, Winter vegetation indices MR and NDVI

the less correlated variables of the summer vegetation indices. Combination 5 included the less correlated variables of the winter vegetation indices. Combination 6 included the less correlated variables in Combinations 4 and 5. Combination 7 included the less correlated variables from the 19 bioclimatic variables. Combination 8 included the less correlated variables from the 19 bioclimatic variables and three geospatial variables. Combination 9 included the less correlated variables in Combinations 3, 6, and 8. Combinations 10 and 11 represented the top 10 most important variables of the DT and RF methods with Combination 9 in vegetation unit I, respectively (Table 3). The SVM and maximum likelihood classification (MLC) methods only output the simulation results of variable Combinations 1 to 6, likely due to the training samples' weak separability (Deng, 2010).
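One way to apply the |R| < 0.7 screening is a greedy filter that keeps a variable only if it is weakly correlated with every variable already kept. The text does not state how the authors chose which member of a correlated pair to retain, so the ordering rule, the function names, and the sample values below are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_less_correlated(variables, threshold=0.7):
    """Greedy filter: keep a variable only if |r| < threshold with every kept one.

    Variables are visited in dictionary order, so earlier entries win ties
    (an assumption; the selection order is not given in the text).
    """
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values sampled at five vegetation points.
variables = {
    "annual_mean_temperature": [7.0, 9.0, 11.0, 12.0, 14.0],
    "mean_temp_warmest_quarter": [19.5, 21.0, 23.5, 24.0, 26.5],  # tracks temperature
    "annual_precipitation": [520.0, 610.0, 480.0, 650.0, 540.0],
}
selected = select_less_correlated(variables)
print(selected)
```

In this toy example the warmest-quarter temperature is dropped because it is almost collinear with annual mean temperature, while precipitation is retained.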

Vegetation distribution models
We used DT, RF, MLC, and SVM vegetation distribution models in this study. The DT model is a divisive, monothetic, and supervised classifier often used for species distribution modeling and related applications (Franklin, 2010). It is computationally fast

and easy to understand and implement. It uses classification or regression algorithms to generate classification rules and then visualizes those rules into simple tree graphics (Hastie, Tibshirani & Friedman, 2009; Zhou et al., 2016). The DT model calculates the most significant variables contributing to the model (Deng, 2010). We used a DT with five layers, 40 samples in the smallest parent node, and 10 samples in the smallest child node.
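The DT settings above can be approximated with scikit-learn, used here only as a stand-in since the text does not name the DT software at this point. Mapping "five layers" to `max_depth=5`, the parent-node size to `min_samples_split=40`, and the child-node size to `min_samples_leaf=10` is our interpretation, and the predictors and labels are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in data: 400 vegetation points x 4 predictor variables.
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int) + (X[:, 2] > 1).astype(int)

tree = DecisionTreeClassifier(
    max_depth=5,           # assumed equivalent of "five layers"
    min_samples_split=40,  # smallest parent node
    min_samples_leaf=10,   # smallest child node
    random_state=0,
)
tree.fit(X, y)

# Impurity-based importances indicate which variables drive the splits.
importances = tree.feature_importances_
print(importances.round(3))
```

The two node-size limits stop the tree from carving out very small groups of points, which limits overfitting on sparsely sampled vegetation units.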

The RF model is an ensemble method that has been applied in risk assessment and species distribution modeling studies (Cutler et al., 2007; Zhang & Dong, 2017). The RF model creates and combines different DTs to produce considerably more accurate classifications that are robust to noise and overtraining (Burai et al., 2015; Cutler et al., 2007; Gislason, Benediktsson & Sveinsson, 2006). The RF model also calculates the most significant variables that contribute to the model (Cutler et al., 2007). Running an RF model requires defined parameters, including tree number, number of randomly selected features, and node impurity function. We generated the RF model in EnMAP-Box, a license-free and platform-independent software interface designed to process hyperspectral remote sensing data, which was developed by the Humboldt University of Berlin. The EnMAP-Box has built-in applications for processing hyperspectral data, such as SVM and RF for the classification of image data (Held et al., 2014). We used the default settings in EnMAP-Box with 100 trees. The number of randomly selected features was equal to the square root of the number of all features, and we used the Gini index for the node impurity function (Rabe et al., 2014; Ma, Gao & Gu, 2019; van der Linden et al., 2015; Zhou et al., 2016).
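The three RF parameters named above (tree number, number of randomly selected features, node impurity function) map directly onto a scikit-learn configuration. This is a sketch with scikit-learn as a stand-in for the EnMAP-Box implementation, not a reproduction of it; the variable names and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
# Hypothetical predictor names and synthetic values for 300 points.
names = ["annual_mean_temperature", "annual_precipitation", "slope", "summer_BI"]
X = rng.normal(size=(300, len(names)))
y = (X[:, 0] > 0).astype(int) + (X[:, 1] > 0.5).astype(int)

forest = RandomForestClassifier(
    n_estimators=100,     # 100 trees
    max_features="sqrt",  # square root of the number of features per split
    criterion="gini",     # Gini impurity for node splitting
    random_state=42,
)
forest.fit(X, y)

# Rank variables by impurity-based importance, highest first.
ranked = sorted(zip(names, forest.feature_importances_),
                key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

A ranking of this kind is the sort of output from which a "top 10 variables" combination such as RF10 could be assembled.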

The MLC model is one of the most commonly used supervised image classification methods. MLC's classification rules use the statistics of the Gaussian probability density function to assign each pixel to the class with the highest probability. Although the MLC method usually generates similar or more accurate classifications than other methods, it is not applicable when there are fewer training samples than input predictors (Burai et al., 2015; Zhou et al., 2016).
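The Gaussian decision rule behind MLC can be written compactly with NumPy. The sketch below assumes equal prior probabilities for all classes and uses hypothetical function names and synthetic two-variable data; it is not the implementation used in the study.

```python
import numpy as np


def fit_mlc(X, y):
    """Estimate a mean vector and covariance matrix for each class."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False)
        # Inverting cov fails when a class has too few samples relative to
        # the number of predictors, which is the limitation noted above.
        params[label] = (mean, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return params


def predict_mlc(params, X):
    """Assign each row to the class with the highest Gaussian log-likelihood."""
    labels = list(params)
    scores = []
    for label in labels:
        mean, inv_cov, logdet = params[label]
        diff = X - mean
        mahalanobis = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        scores.append(-0.5 * (logdet + mahalanobis))  # equal priors assumed
    return np.array(labels)[np.argmax(np.array(scores), axis=0)]


rng = np.random.default_rng(7)
# Two synthetic, well separated classes in a two-variable feature space.
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(6.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

params = fit_mlc(X, y)
accuracy = float((predict_mlc(params, X) == y).mean())
print(f"training accuracy = {accuracy:.2f}")
```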

The SVM model is a supervised machine learning model used for classification and regression. It is a complex and widely used method that can output more accurate predictions (Burai et al., 2015) than other methods. The SVM model searches for an optimal plane in a multidimensional space to divide all sample elements into two categories, making the distance between the closest points in the two classes as large as possible (Kabacoff, 2016). Running an SVM model requires a defined kernel parameter g and regularization parameter c. In this study, we generated the SVM model in the EnMAP-Box. We applied the EnMAP-Box default settings to the SVM model, in which the parameter g ranged from 0.01 to 1,000 and the parameter c ranged from 0.1 to 1,000. Parameters g and c were tested using a grid search with internal performance estimation, and we used those with the best performance for data training (Lin et al., 2014; van der Linden et al., 2014; van der Linden et al., 2015).
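A grid search over g and c with internal performance estimation can be sketched with scikit-learn as a stand-in for the EnMAP-Box tool. The RBF kernel, the powers-of-ten grid spacing, and the three-fold cross-validation are assumptions made for illustration; only the parameter ranges come from the text, and the data are synthetic.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Two synthetic classes of 60 points each in a two-variable feature space.
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 2)),
               rng.normal(4.0, 1.0, size=(60, 2))])
y = np.array([0] * 60 + [1] * 60)

# Ranges from the text; the powers-of-ten spacing is an assumption.
g_values = [10.0 ** k for k in range(-2, 4)]  # 0.01 ... 1000
c_values = [10.0 ** k for k in range(-1, 4)]  # 0.1 ... 1000

search = GridSearchCV(
    SVC(kernel="rbf"),                  # RBF kernel assumed
    {"gamma": g_values, "C": c_values},
    cv=3,                               # internal performance estimation
)
search.fit(X, y)
best = search.best_params_
print(best, round(search.best_score_, 3))
```

By default `GridSearchCV` refits the best-scoring pair on all training points, which parallels using the best-performing parameters for data training.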

We generated the predicted vegetation maps of the three classification units using the DT, RF, MLC, and SVM methods with a resolution of 500 m. We selected all 11 variable combinations as the input variables for each method. The DT and RF method results indicated which variables were most important for vegetation discrimination.

Model assessment
We used the VMC and a total of 974 vegetation points to assess the overall accuracy and Kappa coefficient of every predicted vegetation map. Kappa coefficient values ranging from 0.4 to 0.55 indicated moderate agreement, from 0.56 to 0.8 indicated substantial agreement, and from 0.81 to 1 indicated almost perfect agreement (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016). When the Kappa coefficient value was greater than 0.4, the assessed predicted map was considered acceptable.

RESULTS
Unit I modeling and assessment
The RF model's results were better than the results of the DT, MLC, and SVM models (Table 4). The RF model had a Kappa coefficient larger than 0.4 when using variable Combinations 6 to 11, assessed by field point data, with an overall accuracy of 50% to 72%. The RF model had a Kappa coefficient larger than 0.56 when using variable Combinations 7 to 11, assessed by field data, with an overall accuracy of 68% to 72%. The RF model had the highest Kappa coefficient of 0.66 and the highest overall accuracy of 72% when using variable Combination 7. The DT model had a Kappa coefficient larger than 0.4 when using variable Combinations 7 to 11, assessed by field point data, with an overall accuracy of 54% to 56%. The DT model had no Kappa coefficient larger than 0.56 when using all variable combinations. After VMC assessment, we found the highest Kappa coefficient was 0.38, with an overall accuracy of 57%, in the RF model using variable Combinations 9 to 11 (Table 4, Fig. 2).

Unit II modeling and assessment
The RF model results were better than the results of the other three models. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 66–70% and 54–55% for field point data and VMC assessments, respectively. The RF model using Combinations 7 to 11 had a Kappa coefficient larger than 0.56 and an overall accuracy of 66–70% when assessed by field point data. The RF model had the highest Kappa coefficient of 0.65 and the highest overall accuracy of 70% when using variable Combination 7. The DT model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 53–55% and 65–72% for field point data and VMC assessments, respectively. The DT model had the highest Kappa coefficient of 0.54 and overall accuracy of 72% when using variable Combination 7. The DT model had a larger Kappa coefficient and greater overall accuracy when assessed by VMC rather than the RF model (Table 5, Fig. 3).

Unit III modeling and assessment
Only the RF model could simulate vegetation distribution in unit III. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4 and an overall accuracy of 55–58%, assessed by field point data. The RF model using variable Combination 7 had the highest Kappa coefficient of 0.57 (the only model with a Kappa coefficient larger than 0.56) and the highest overall accuracy of 58%, assessed by field point


Table 4 Model assessment of vegetation groups by field point data and VMC. Variable combinations are shown in Table 3.

Variable      Decision tree               Random forest               Support vector machine      Maximum likelihood classification
combination   Point data    VMC           Point data    VMC           Point data    VMC           Point data    VMC
              OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC
1             34     0.18   55     0.22   37     0.24   32     0.09   36     0.21   53     0.21   23     0.08   11     0.02
2             38     0.20   52     0.23   39     0.27   37     0.13   35     0.20   55     0.24   18     0.07   9      0.03
3             45     0.31   54     0.26   47     0.36   45     0.21   41     0.27   54     0.27   24     0.12   15     0.05
4             32     0.16   46     0.16   42     0.30   42     0.17   37     0.22   57     0.26   11     0.04   3      0.01
5             31     0.11   59     0.14   44     0.32   44     0.19   36     0.22   51     0.22   9      0.04   4      0.02
6             41     0.26   44     0.18   50     0.40   52     0.27   42     0.29   54     0.27   13     0.08   4      0.03
7             54     0.45   57     0.34   72     0.66   55     0.35   -      -      -      -      -      -      -      -
8             55     0.46   56     0.35   69     0.63   56     0.37   -      -      -      -      -      -      -      -
9             55     0.46   53     0.34   68     0.61   57     0.38   -      -      -      -      -      -      -      -
10            55     0.46   53     0.33   69     0.63   57     0.38   -      -      -      -      -      -      -      -
11            56     0.46   56     0.36   68     0.62   57     0.38   -      -      -      -      -      -      -      -

Notes: VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient. The published table distinguishes Kappa coefficients larger than 0.56 from those larger than 0.4 and less than 0.56.


Figure 2 The modeling vegetation map of vegetation groups with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size: DOI 10.7717/peerj.9839/fig-2

data. The Kappa coefficients in all models were less than 0.4 when assessed by the VMC (Table 6; Fig. 4).

The abbreviations are the same as in Table 4.

Important variables

For the RF model, eight of the top 10 most important variables were the same across the different vegetation units: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio and NDVI of winter vegetation index, brightness index and NDVI of summer vegetation index). For the DT model, nine of the top 10 most important variables were the same across the different vegetation units: four climate variables (annual mean temperature, mean diurnal range, precipitation of the driest month, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio of winter vegetation index, brightness index of summer vegetation index, summer surface albedo of band 1, and winter surface albedo of band 6) (Table 7).
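The counts of shared variables reported above can be checked with a simple set intersection over the top-10 lists in Table 7. The sketch below is illustrative only: the shortened variable labels are our own, and the tenth random forest entry for formations and sub-formations ("Summer vegetation indices EVI and MR") is assumed to stand for two tied variables.

```python
# Top-10 variables per vegetation unit, transcribed from Table 7 with
# shortened labels (the labels themselves are an assumption of this sketch).
RF_TOP10 = {
    "vegetation groups": {
        "annual mean temperature", "slope", "mean diurnal range",
        "annual precipitation", "summer BI", "winter NDVI", "summer EVI",
        "winter MR", "summer NDVI", "summer MR",
    },
    "vegetation types": {
        "annual mean temperature", "slope", "mean diurnal range",
        "annual precipitation", "summer BI", "winter NDVI", "winter MR",
        "summer WI", "precipitation of driest month", "summer NDVI",
    },
    "formations and sub-formations": {
        "annual mean temperature", "annual precipitation",
        "mean diurnal range", "slope", "precipitation of driest month",
        "summer BI", "summer NDVI", "winter NDVI", "winter MR",
        "summer EVI", "summer MR",  # tied tenth entry, split in two
    },
}
DT_TOP10 = {
    "vegetation groups": {
        "annual mean temperature", "annual precipitation", "slope",
        "winter MR", "mean diurnal range", "summer albedo band 1",
        "summer BI", "precipitation of driest month", "summer EVI",
        "winter albedo band 6",
    },
    "vegetation types": {
        "annual mean temperature", "slope", "annual precipitation",
        "winter MR", "mean diurnal range", "summer EVI",
        "precipitation of driest month", "summer BI",
        "summer albedo band 1", "winter albedo band 6",
    },
    "formations and sub-formations": {
        "annual mean temperature", "annual precipitation", "slope",
        "mean diurnal range", "precipitation of driest month", "winter MR",
        "summer albedo band 1", "summer BI", "summer WI",
        "winter albedo band 6",
    },
}


def shared_variables(top10_by_unit):
    """Return the variables that appear in the top 10 of every unit."""
    return set.intersection(*top10_by_unit.values())


print(len(shared_variables(RF_TOP10)), sorted(shared_variables(RF_TOP10)))
print(len(shared_variables(DT_TOP10)), sorted(shared_variables(DT_TOP10)))
```

Under these assumptions the intersection contains eight variables for the RF model and nine for the DT model, matching the counts given in the text.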

DISCUSSION

Vegetation classification units

Vegetation classification is an important and complex system with multiple levels. Higher level classification methods not only accurately classify vegetation, but they can also describe


Table 5 Model assessment of vegetation types by field point data and VMC. Variable combinations are shown in Table 3.

Variable      Decision tree               Random forest               Support vector machine      Maximum likelihood classification
combination   Point data    VMC           Point data    VMC           Point data    VMC           Point data    VMC
              OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC
1             42     0.24   63     0.33   32     0.22   23     0.09   32     0.18   40     0.18   6      0.02   7      0.00
2             44     0.27   58     0.31   34     0.23   30     0.14   31     0.18   44     0.24   5      0.02   14     0.00
3             43     0.30   58     0.35   44     0.34   38     0.22   37     0.26   43     0.25   9      0.05   13     0.00
4             36     0.20   47     0.20   39     0.29   31     0.15   32     0.19   43     0.21   13     0.07   6      0.02
5             32     0.14   59     0.23   41     0.31   36     0.19   34     0.22   43     0.22   6      0.03   6      0.03
6             36     0.23   45     0.24   47     0.38   44     0.27   40     0.29   43     0.25   14     0.09   21     0.06
7             55     0.46   72     0.54   70     0.65   54     0.41   -      -      -      -      -      -      -      -
8             53     0.44   68     0.52   68     0.63   55     0.43   -      -      -      -      -      -      -      -
9             54     0.45   65     0.49   66     0.60   55     0.43   -      -      -      -      -      -      -      -
10            54     0.45   65     0.49   68     0.63   55     0.43   -      -      -      -      -      -      -      -
11            53     0.44   68     0.52   67     0.62   55     0.43   -      -      -      -      -      -      -      -

Notes: VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient. The published table distinguishes Kappa coefficients larger than 0.56 from those larger than 0.4 and less than 0.56.


Figure 3 The modeling vegetation map of vegetation types with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size: DOI 10.7717/peerj.9839/fig-3

ecosystem diversity even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions that are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as plant sets sharing similar perturbation response effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine


Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations are shown in Table 3.

Variable      Decision tree               Random forest               Support vector machine      Maximum likelihood classification
combination   Point data    VMC           Point data    VMC           Point data    VMC           Point data    VMC
              OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC
1             23     0.14   19     0.08   20     0.18   5      0.02   11     0.09   6      0.03   8      0.06   8      0.04
2             22     -0.04  49     0.04   19     0.17   6      0.03   13     0.11   7      0.04   8      0.06   13     0.05
3             26     0.14   45     0.23   29     0.27   9      0.07   21     0.19   10     0.07   12     0.09   13     0.07
4             30     0.20   30     0.04   22     0.20   7      0.04   16     0.14   6      0.03   9      0.07   8      0.04
5             33     0.01   67     0.00   22     0.20   7      0.04   15     0.13   5      0.03   11     0.09   10     0.04
6             26     0.15   22     0.02   31     0.30   11     0.08   21     0.19   8      0.06   12     0.09   15     0.08
7             33     0.20   52     0.27   58     0.57   23     0.20   -      -      -      -      -      -      -      -
8             27     0.17   34     0.18   55     0.54   23     0.20   -      -      -      -      -      -      -      -
9             25     0.15   22     0.15   55     0.53   22     0.20   -      -      -      -      -      -      -      -
10            30     0.17   41     0.22   56     0.55   23     0.21   -      -      -      -      -      -      -      -
11            31     0.20   41     0.22   56     0.55   23     0.20   -      -      -      -      -      -      -      -

Notes: VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient. The published table distinguishes Kappa coefficients larger than 0.56 from those larger than 0.4 and less than 0.56.


Figure 4 The modeling vegetation map of formations and sub-formations with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size: DOI 10.7717/peerj.9839/fig-4

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracies of the vegetation distribution simulations in units I and II were similar to each other and higher than that of unit III (Tables 4–6).

Different model performances

We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same


Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices are shown in Table 2.

Vegetation groups

Rank  Decision tree: important variable    Standardized importance   Random forest: important variable      Normalized importance
1     Annual mean temperature              1.00                      Annual mean temperature                368
2     Annual precipitation                 0.88                      Slope                                  294
3     Slope                                0.80                      Mean diurnal range                     260
4     Winter vegetation index MR           0.36                      Annual precipitation                   238
5     Mean diurnal range                   0.33                      Summer vegetation index BI             188
6     Summer surface albedo of band 1      0.29                      Winter vegetation index NDVI           137
7     Summer vegetation index BI           0.28                      Summer vegetation index EVI            136
8     Precipitation of driest month        0.25                      Winter vegetation index MR             130
9     Summer vegetation index EVI          0.23                      Summer vegetation index NDVI           122
10    Winter surface albedo of band 6      0.19                      Summer vegetation index MR             112

Vegetation types

Rank  Decision tree: important variable    Standardized importance   Random forest: important variable      Normalized importance
1     Annual mean temperature              1.00                      Annual mean temperature                351
2     Slope                                0.83                      Slope                                  335
3     Annual precipitation                 0.51                      Mean diurnal range                     306
4     Winter vegetation index MR           0.30                      Annual precipitation                   28
5     Mean diurnal range                   0.28                      Summer vegetation index BI             184
6     Summer vegetation index EVI          0.22                      Winter vegetation index NDVI           161
7     Precipitation of driest month        0.21                      Winter vegetation index MR             145
8     Summer vegetation index BI           0.20                      Summer vegetation index WI             131
9     Summer surface albedo of band 1      0.19                      Precipitation of driest month          124
10    Winter surface albedo of band 6      0.14                      Summer vegetation index NDVI           122

Formations and sub-formations

Rank  Decision tree: important variable    Standardized importance   Random forest: important variable      Normalized importance
1     Annual mean temperature              1.00                      Annual mean temperature                416
2     Annual precipitation                 0.86                      Annual precipitation                   328
3     Slope                                0.63                      Mean diurnal range                     325
4     Mean diurnal range                   0.52                      Slope                                  224
5     Precipitation of driest month        0.52                      Precipitation of driest month          216
6     Winter vegetation index MR           0.4                       Summer vegetation index BI             183
7     Summer surface albedo of band 1      0.32                      Summer vegetation index NDVI           17
8     Summer vegetation index BI           0.32                      Winter vegetation index NDVI           161
9     Summer vegetation index WI           0.31                      Winter vegetation index MR             147
10    Winter surface albedo of band 6      0.28                      Summer vegetation indices EVI and MR   132


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogeneous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010) and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables.

In our study, the SVM and MLC models produced output only for variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
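The random forest mechanism described above (many trees grown on resampled data, each split chosen from a random subset of predictors, and a majority vote across trees) can be sketched in a few lines. This is a deliberately simplified illustration and not the software used in this study: every tree is reduced to a single split (a "stump"), and the three predictors, the two class labels, and all function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Fit a one-split tree using only the given candidate features."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # not a real split
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:
        # No usable split: always predict the majority class.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, n_candidates=2, seed=0):
    """Bagging plus a random subset of predictors for each tree's split."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        candidates = rng.sample(range(n_features), n_candidates)
        boot_rows = [rows[i] for i in sample]
        boot_labels = [labels[i] for i in sample]
        forest.append(fit_stump(boot_rows, boot_labels, candidates))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical training points: (temperature, precipitation, constant band).
rows = [(1.0, 10.0, 5.0), (2.0, 12.0, 5.0), (3.0, 11.0, 5.0),
        (8.0, 30.0, 5.0), (9.0, 32.0, 5.0), (10.0, 31.0, 5.0)]
labels = ["grassland"] * 3 + ["forest"] * 3

forest = fit_forest(rows, labels)
print(predict_forest(forest, (0.5, 5.0, 5.0)))
print(predict_forest(forest, (11.0, 40.0, 5.0)))
```

With this toy data the first probe point should be voted "grassland" and the second "forest". A single tree fitted to one bootstrap sample can be misled by an unlucky sample, whereas the vote across many resampled trees is more stable, which is consistent with the instability of the DT model noted above.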

Important variables in vegetation classification models

Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important


variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models, to >80% for most forest types. Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors, such as soil moisture, pH, and nutrients, should be considered. Other factors, such as actual light, disturbance, biotic interactions, land use, and bioclimatic information, could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy

In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data


preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS

Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable of simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (Mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.
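The summary above can be retraced from the highest field-point Kappa coefficients in Tables 4 to 6. The sketch below assumes that a model "could simulate" a unit when its best Kappa coefficient exceeds 0.4; that cutoff is our reading of the thresholds flagged in the table notes, not a rule stated in this section, and the function name is hypothetical.

```python
# Highest Kappa coefficient per model and unit, assessed by field point
# data, read from Tables 4 (unit I), 5 (unit II), and 6 (unit III).
BEST_KAPPA = {
    "RF": {"I": 0.66, "II": 0.65, "III": 0.57},
    "DT": {"I": 0.46, "II": 0.46, "III": 0.20},
    "SVM": {"I": 0.29, "II": 0.29, "III": 0.19},
    "MLC": {"I": 0.12, "II": 0.09, "III": 0.09},
}


def units_simulated(model, threshold=0.4):
    """List the units where the model's best Kappa exceeds the threshold.

    The 0.4 default is an assumed cutoff for "could simulate".
    """
    return [unit for unit, kappa in BEST_KAPPA[model].items() if kappa > threshold]


for model in BEST_KAPPA:
    print(model, units_simulated(model))
```

Under this assumption the RF model passes in all three units, the DT model in units I and II, and the SVM and MLC models in none, which agrees with the conclusions drawn here.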

ACKNOWLEDGEMENTS

We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding

This work was funded by the National Key R&D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Grant Disclosures

The following grant information was disclosed by the authors:
National Key R&D Program of China: 2018YFC0506903.

Competing Interests

The authors declare there are no competing interests.

Author Contributions

• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0

REFERENCES

Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.

Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.

Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: DR W. Junk Publishers.

Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.


Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.


Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.


Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box: a toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.


Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box. Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.


Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.


Table 1 Classification units of the vegetation of China.

Levels: Vegetation groups (I) > Vegetation types (II) > Formations and sub-formations (III)

0 No vegetation
  0 No vegetation
    0 No vegetation
1 Needleleaf forest
  1 Temperate needleleaf forest
    1 Pinus tabulaeformis forest
2 Broadleaf forest
  2 Temperate broadleaf deciduous forest
    2 Quercus mongolica forest
    3 Quercus liaotungensis forest
    4 Quercus variabilis forest
    5 Robinia pseudoacacia forest
    6 Salix matsudana forest
    7 Populus davidiana forest
    8 Betula platyphylla forest
3 Scrub
  3 Temperate broadleaf deciduous scrub
    9 Corylus heterophylla scrub
    10 Lespedeza bicolor scrub
    11 Prunus armeniaca var. ansa scrub
    12 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosa scrub
    13 Cotinus coggygria var. cinerea scrub
    14 Spiraea spp. scrub
    15 Ostryopsis davidiana scrub
4 Steppe
  4 Temperate grass-forb meadow steppe
    16 Stipa baicalensis forb meadow steppe
    17 Filifolium sibiricum grass-forb meadow steppe
  5 Temperate needlegrass arid steppe
    18 Aneurolepidium chinense needlegrass steppe
    19 Stipa krylovii steppe
    20 Stipa bungiana steppe
    21 Thymus mongolicus needlegrass steppe
5 Grass-forb community
  6 Temperate grass-forb community
    22 Bothriochloa ischaemum community
    23 Bothriochloa ischaemum community
    24 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosus, Bothriochloa ischaemum scrub and grass community
    25 Vitex negundo var. heterophylla, Zizyphus jujuba var. spinosus, Themeda triandra var. japonica scrub and grass community
6 Meadow
  7 Temperate grass and forb meadow
    26 Arundinella hirta, Spodiopogon sibiricus forb meadow
    27 Carex spp. forb meadow
  8 Temperate grass and forb halophytic meadow
    28 Achnatherum splendens halophytic meadow
    29 Suaeda glauca halophytic meadow
7 Swamp
  9 Cold-temperate and temperate swamp
    30 Phragmites communis swamp
8 Cultural vegetation
  10 One crop annually and cold-resistant economic crops
    31 Spring wheat, naked oats, buckwheat, potatoes, flax
  11 One crop annually, cold-resistant economic crops and deciduous orchards
    32 Coarse grains
  12 Three crops two years and two crops annually, non irrigation deciduous orchards
    33 Winter wheat, coarse grains
    34 Coarse grains
    35 Rice
    36 Winter wheat, corn, cotton
    37 Apple, pear orchard
    38 Winter wheat, corn, Chinese sorghum, sweet potatoes, cotton, tobacco, peanut, sesame, apple, pear, hawthorn, persimmon, walnut, pomegranate, grape
    39 Winter wheat, coarse grains (loamy soil)

Table 2 The vegetation indices.

Indices | Abbreviation | Formula
Ratio vegetation index | RVI | NIR / Red
Brightness index | BI | 0.2909 Blue + 0.2493 Green + 0.4806 Red + 0.5568 NIR + 0.4438 SWIR1 + 0.1706 SWIR2
Green vegetation index | GI | -0.2728 Blue - 0.2174 Green - 0.5508 Red + 0.7221 NIR + 0.0733 SWIR1 - 0.1648 SWIR2
Wetness index | WI | 0.1446 Blue + 0.1761 Green + 0.3322 Red + 0.3396 NIR - 0.6210 SWIR1 - 0.4186 SWIR2
Differenced vegetation index | DVI | NIR - Red
Green ratio | GR | NIR / Green
Mid-infrared ratio | MR | NIR / SWIR1
Soil-adjusted vegetation index | SAVI | 1.5 (NIR - Red) / (NIR + Red + 0.5)
Optimization of soil-adjusted vegetation index | OSAVI | 1.16 (NIR - Red) / (NIR + Red + 0.16)
Atmospherically resistant vegetation index | ARVI | (NIR - (2 Red - Blue)) / (NIR + (2 Red - Blue))
Normalized difference vegetation index | NDVI | (NIR - Red) / (NIR + Red)
Enhanced vegetation index | EVI | 2.5 [(NIR - Red) / (NIR + 6 Red - 7.5 Blue + 1)]
Normalized difference tillage index | NDTI | (SWIR1 - SWIR2) / (SWIR1 + SWIR2)
Normalized difference senescent vegetation index | NDSVI | (SWIR1 - Red) / (SWIR1 + Red)

1995; Cohen & Goward, 2004; Zhou et al., 2016). We tested the vegetation discrimination of 14 vegetation indices (Table 2).
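As an illustration of how such indices are derived from band values, the following minimal Python sketch computes a subset of the Table 2 indices for a single pixel. The function name, the argument names, and the reflectance values are hypothetical, and the formulas follow Table 2 as we read it (with decimal points restored); the sketch is not the processing chain used in this study.

```python
def vegetation_indices(blue, green, red, nir, swir1, swir2):
    """Return a subset of the Table 2 indices for one pixel.

    Inputs are assumed to be surface reflectances on a 0-1 scale
    (an assumption of this sketch, not a statement about the study data).
    """
    return {
        "RVI": nir / red,                                   # ratio vegetation index
        "DVI": nir - red,                                   # differenced vegetation index
        "GR": nir / green,                                  # green ratio
        "MR": nir / swir1,                                  # mid-infrared ratio
        "NDVI": (nir - red) / (nir + red),
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5),
        "OSAVI": 1.16 * (nir - red) / (nir + red + 0.16),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "NDTI": (swir1 - swir2) / (swir1 + swir2),
        "NDSVI": (swir1 - red) / (swir1 + red),
    }


# Hypothetical reflectances for a densely vegetated pixel.
pixel = vegetation_indices(blue=0.04, green=0.08, red=0.05,
                           nir=0.45, swir1=0.20, swir2=0.10)
for name, value in pixel.items():
    print(f"{name}: {value:.3f}")
```

In practice the same expressions would be applied band-wise to whole summer and winter images rather than to single pixels.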

To determine the distribution predictive ability of different variables, we grouped the variables into different combinations based on the results of the Pearson correlation. We only used less correlated variables (|R| < 0.7, Pearson correlation) (Chala et al., 2017) in Combinations 1–9 (Table 3), then used the variable combinations as input predictor variables to simulate vegetation distribution. Combination 1 included the less correlated variables of the summer land surface albedos from bands 1 to 7. Combination 2 included the less correlated variables of the winter land surface albedos from bands 1 to 7. Combination 3 included the less correlated variables in Combinations 1 and 2. Combination 4 included


Table 3 Variable combinations.
Note: DT10 and RF10 represent the top 10 important variables of decision tree (DT) and random forest (RF) methods with Combination 9 in the vegetation group level, respectively. The vegetation indices and their abbreviations were shown in Table 2.

Number | Variable combinations
1 | Summer land surface albedos of band 1 and 5
2 | Winter land surface albedos of band 1 and 6
3 | Summer land surface albedos of band 1 and 5; Winter land surface albedos of band 1 and 6
4 | Summer vegetation indices: BI, WI, MR, NDVI, EVI
5 | Winter vegetation indices: MR, NDVI, EVI, NDTI
6 | Summer vegetation indices: BI, WI, MR, NDVI, EVI; Winter vegetation indices: MR, NDVI, EVI, NDTI
7 | Annual mean temperature; Annual precipitation; Mean diurnal range; Precipitation of driest month
8 | Slope; Aspect; Annual mean temperature; Annual precipitation; Mean diurnal range; Precipitation of driest month
9 | Summer land surface albedos of band 1; Winter land surface albedos of band 6; Summer vegetation indices: BI, WI, MR, NDVI, EVI; Winter vegetation indices: MR, NDVI, EVI, NDTI; Slope; Aspect; Annual mean temperature; Annual precipitation; Mean diurnal range; Precipitation of driest month
10 | DT10: Annual mean temperature; Annual precipitation; Mean diurnal range; Precipitation of driest month; Slope; Winter vegetation indices: MR; Summer land surface albedos of band 1; Summer vegetation indices: BI and EVI; Winter land surface albedos of band 6
11 | RF10: Annual precipitation; Annual mean temperature; Mean diurnal range; Slope; Summer vegetation indices: BI, MR, NDVI and EVI; Winter vegetation indices: MR and NDVI

the less correlated variables of the summer vegetation indices. Combination 5 included the less correlated variables of the winter vegetation indices. Combination 6 included the less correlated variables in Combinations 4 and 5. Combination 7 included the less correlated variables from the 19 bioclimatic variables. Combination 8 included the less correlated variables from the 19 bioclimatic variables and three geospatial variables. Combination 9 included the less correlated variables in Combinations 3, 6, and 8. Combinations 10 and 11 represented the top 10 most important variables of the DT and RF methods with Combination 9 in vegetation unit I, respectively (Table 3). The SVM and maximum likelihood classification (MLC) methods only output the simulation results of variable Combinations 1 to 6, likely due to the training samples' weak separability (Deng, 2010).
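The correlation screening described above can be sketched as follows. The text gives only the threshold (an absolute Pearson correlation below 0.7); the greedy, order-dependent keep/drop rule, the function names, and the toy values below are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def select_less_correlated(variables, threshold=0.7):
    """Keep a variable only if |r| < threshold against every variable kept so far.

    The greedy, first-come ordering is an assumption of this sketch.
    """
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values sampled at five points.
demo = {
    "band1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "band2": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly proportional to band1
    "band5": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related to band1
}
selected = select_less_correlated(demo)
print(selected)
```

Because the rule is order-dependent, a different ordering of the candidate variables can retain a different but equally valid subset.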

Vegetation distribution models
We used DT, RF, MLC, and SVM vegetation distribution models in this study. The DT model is a divisive, monothetic, and supervised classifier often used for species distribution modeling and related applications (Franklin, 2010). It is computationally fast


and easy to understand and implement. It uses classification or regression algorithms to generate classification rules and then visualizes those rules as simple tree graphics (Hastie, Tibshirani & Friedman, 2009; Zhou et al., 2016). The DT model calculates the most significant variables contributing to the model (Deng, 2010). We used a DT with five layers, 40 samples in the smallest parent node, and 10 samples in the smallest child node.

The RF model is an ensemble method that has been applied in risk assessment and species distribution modeling studies (Cutler et al., 2007; Zhang & Dong, 2017). The RF model creates and combines different DTs to produce considerably more accurate classifications that are robust to noise and overtraining (Burai et al., 2015; Cutler et al., 2007; Gislason, Benediktsson & Sveinsson, 2006). The RF model also calculates the most significant variables that contribute to the model (Cutler et al., 2007). Running an RF model requires defined parameters, including tree number, number of randomly selected features, and node impurity function. We generated the RF model in EnMAP-Box, a license-free and platform-independent software interface designed to process hyperspectral remote sensing data, which was developed by the Humboldt University of Berlin. EnMAP-Box has built-in applications for processing hyperspectral data, such as SVM and RF for the classification of image data (Held et al., 2014). We used the default settings in EnMAP-Box with 100 trees. The number of randomly selected features was equal to the square root of the number of all features, and we used a Gini coefficient for the node impurity function (Rabe et al., 2014; Ma, Gao & Gu, 2019; van der Linden et al., 2015; Zhou et al., 2016).
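Two of the RF settings named above, the Gini node impurity and the square-root rule for the number of randomly selected features, can be illustrated with a short Python sketch. It is not the EnMAP-Box implementation: the helper names, the 16 hypothetical feature names, and the rounding of the square root are assumptions.

```python
import math
import random
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def features_per_split(n_features):
    """Square-root rule; rounding to the nearest integer is an assumption."""
    return max(1, round(math.sqrt(n_features)))


def draw_candidate_features(feature_names, rng):
    """Randomly pick the features that one split is allowed to consider."""
    return rng.sample(feature_names, features_per_split(len(feature_names)))


# A pure node has impurity 0; an evenly mixed two-class node has impurity 0.5.
pure = gini_impurity(["forest"] * 4)
mixed = gini_impurity(["forest", "forest", "scrub", "scrub"])

# Sixteen hypothetical predictor names give four candidates per split.
names = [f"var{i}" for i in range(16)]
candidates = draw_candidate_features(names, random.Random(0))
print(pure, mixed, candidates)
```

Each tree in the forest repeats this random draw at every split, which is what decorrelates the individual trees.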

The MLC model is one of the most commonly used supervised image classification methods. MLC's classification rules use the statistics of the Gaussian probability density function to assign each pixel to the class with the highest probability. Although the MLC method usually generates similar or more accurate classifications than other methods, it is not applicable when there are fewer training samples than input predictors (Burai et al., 2015; Zhou et al., 2016).
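The decision rule described above can be sketched in a few lines of Python. To stay self-contained, the sketch assumes independent features (a diagonal covariance matrix), which is a simplification; a full-covariance MLC estimates a covariance matrix per class, and that matrix becomes singular when a class has fewer training samples than predictors. Class names and feature values are hypothetical.

```python
import math


def fit_gaussians(samples):
    """Estimate per-class feature means and variances.

    samples maps a class name to a list of feature vectors.
    """
    params = {}
    for cls, rows in samples.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / (n - 1)
            for col, m in zip(columns, means)
        ]
        params[cls] = (means, variances)
    return params


def log_likelihood(x, means, variances):
    """Log of the Gaussian density, assuming independent features."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
        for xi, m, var in zip(x, means, variances)
    )


def classify(x, params):
    """Assign x to the class with the highest likelihood."""
    return max(params, key=lambda cls: log_likelihood(x, *params[cls]))


# Hypothetical training pixels described by two features (e.g., NDVI and albedo).
training = {
    "forest": [[0.80, 0.10], [0.75, 0.12], [0.85, 0.08]],
    "steppe": [[0.30, 0.25], [0.35, 0.22], [0.25, 0.28]],
}
model = fit_gaussians(training)
print(classify([0.78, 0.11], model), classify([0.28, 0.26], model))
```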

The SVM model is a supervised machine learning model used for classification and regression. It is a complex and widely used method that can output more accurate predictions (Burai et al., 2015) than other methods. The SVM model searches for an optimal hyperplane in a multidimensional space to divide all sample elements into two categories, making the distance between the closest points in the two classes as large as possible (Kabacoff, 2016). Running an SVM model requires a defined kernel parameter g and regularization parameter c. In this study, we generated the SVM model in the EnMAP-Box. We applied the EnMAP-Box default settings for the SVM model, in which parameter g ranged from 0.01 to 1000 and parameter c ranged from 0.1 to 1000. Parameters g and c were tested using a grid search with internal performance estimation, and we used those with the best performance for data training (Lin et al., 2014; van der Linden et al., 2014; van der Linden et al., 2015).
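The grid search over g and c can be sketched as below. Only the parameter ranges come from the text; the power-of-ten grid spacing is an assumption, and `toy_score` is a hypothetical stand-in for the internal performance estimate (for example, a cross-validated accuracy) that a real search would compute by training an SVM for each pair.

```python
import math
from itertools import product


def log10_grid(low_exp, high_exp):
    """Powers of ten from 10**low_exp to 10**high_exp (spacing is an assumption)."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]


def grid_search(score, g_values, c_values):
    """Return the (g, c) pair with the highest score(g, c)."""
    return max(product(g_values, c_values), key=lambda gc: score(*gc))


g_values = log10_grid(-2, 3)  # 0.01 ... 1000
c_values = log10_grid(-1, 3)  # 0.1 ... 1000


def toy_score(g, c):
    """Hypothetical performance surface peaking at g = 0.1, c = 100."""
    return -((math.log10(g) + 1) ** 2) - (math.log10(c) - 2) ** 2


best_g, best_c = grid_search(toy_score, g_values, c_values)
print(best_g, best_c)
```

The selected pair would then be used to train the final model on all training points.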

We generated the predicted vegetation maps of the three classification units using the DT, RF, MLC, and SVM methods with a resolution of 500 m. We selected all 11 variable combinations as the input variables for each method. The DT and RF method results indicated which variables were most important for vegetation discrimination.


Model assessment
We used the VMC and a total of 974 vegetation points to assess the overall accuracy and Kappa coefficient of every predicted vegetation map. Kappa coefficient values ranging from 0.4 to 0.55 indicated moderate agreement, from 0.56 to 0.8 indicated substantial agreement, and from 0.81 to 1 indicated almost perfect agreement (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016). When the Kappa coefficient value was greater than 0.4, the assessed predicted map was considered acceptable.
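For readers unfamiliar with these measures, the sketch below computes overall accuracy and Cohen's Kappa coefficient from a confusion matrix and maps Kappa onto the agreement classes listed above. The confusion matrix is hypothetical, and the handling of values that fall exactly on or between the class boundaries is an assumption.

```python
def overall_accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix.

    matrix[i][j] counts points of reference class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


def agreement_level(kappa):
    """Agreement classes as given in the text; boundary handling is an assumption."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.56:
        return "substantial"
    if kappa >= 0.4:
        return "moderate"
    return "below the acceptance threshold"


# Hypothetical two-class confusion matrix with 100 assessment points.
confusion = [[40, 10],
             [10, 40]]
accuracy, kappa = overall_accuracy_and_kappa(confusion)
print(f"OA = {accuracy:.2f}, Kappa = {kappa:.2f}, {agreement_level(kappa)}")
```

Kappa is lower than overall accuracy here because half of the observed agreement would be expected by chance alone.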

RESULTS
Unit I modeling and assessment
The RF model's results were better than the results of the DT, MLC, and SVM models (Table 4). The RF model had a Kappa coefficient larger than 0.4 when using variable Combinations 6 to 11 assessed by field point data, with an overall accuracy of 50% to 72%. The RF model had a Kappa coefficient larger than 0.56 when using variable Combinations 7 to 11 assessed by field data, with an overall accuracy of 68% to 72%. The RF model had the highest Kappa coefficient of 0.66 and the highest overall accuracy of 72% when using variable Combination 7. The DT model had a Kappa coefficient larger than 0.4 when using variable Combinations 7 to 11 assessed by field point data, with an overall accuracy of 54% to 56%. The DT model had no Kappa coefficient larger than 0.56 when using all variable combinations. After VMC assessment, we found the highest Kappa coefficient was 0.38, with an overall accuracy of 57%, in the RF model using variable Combinations 9 to 11 (Table 4; Fig. 2).

Unit II modeling and assessment
The RF model results were better than the results of the other three models. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 66–70% and 54–55% for field point data and VMC assessments, respectively. The RF model using Combinations 7 to 11 had a Kappa coefficient larger than 0.56 and an overall accuracy of 66–70% when assessed by field point data. The RF model had the highest Kappa coefficient of 0.65 and the highest overall accuracy of 70% when using variable Combination 7. The DT model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 53–55% and 65–72% for field point data and VMC assessments, respectively. The DT model had the highest Kappa coefficient of 0.54 and overall accuracy of 72% when using variable Combination 7. When assessed by the VMC, the DT model had a larger Kappa coefficient and greater overall accuracy than the RF model (Table 5; Fig. 3).

Unit III modeling and assessment
Only the RF model could simulate vegetation distribution in unit III. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4 and an overall accuracy of 55–58%, assessed by field point data. The RF model using variable Combination 7 had the highest Kappa coefficient of 0.57 (the only model with a Kappa coefficient larger than 0.56) and the highest overall accuracy of 58%, assessed by field point


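Table 4 Model assessment of vegetation groups by field point data and VMC. Variable combinations were shown in Table 3. Each cell gives OA (%), KC; a dash marks combinations for which the model produced no result.

Variable combination | Decision tree, point data | Decision tree, VMC | Random forest, point data | Random forest, VMC | Support vector machine, point data | Support vector machine, VMC | Maximum likelihood classification, point data | Maximum likelihood classification, VMC
1 | 34, 0.18 | 55, 0.22 | 37, 0.24 | 32, 0.09 | 36, 0.21 | 53, 0.21 | 23, 0.08 | 11, 0.02
2 | 38, 0.20 | 52, 0.23 | 39, 0.27 | 37, 0.13 | 35, 0.20 | 55, 0.24 | 18, 0.07 | 9, 0.03
3 | 45, 0.31 | 54, 0.26 | 47, 0.36 | 45, 0.21 | 41, 0.27 | 54, 0.27 | 24, 0.12 | 15, 0.05
4 | 32, 0.16 | 46, 0.16 | 42, 0.30 | 42, 0.17 | 37, 0.22 | 57, 0.26 | 11, 0.04 | 3, 0.01
5 | 31, 0.11 | 59, 0.14 | 44, 0.32 | 44, 0.19 | 36, 0.22 | 51, 0.22 | 9, 0.04 | 4, 0.02
6 | 41, 0.26 | 44, 0.18 | 50, 0.40 | 52, 0.27 | 42, 0.29 | 54, 0.27 | 13, 0.08 | 4, 0.03
7 | 54, 0.45 | 57, 0.34 | 72, 0.66 | 55, 0.35 | - | - | - | -
8 | 55, 0.46 | 56, 0.35 | 69, 0.63 | 56, 0.37 | - | - | - | -
9 | 55, 0.46 | 53, 0.34 | 68, 0.61 | 57, 0.38 | - | - | - | -
10 | 55, 0.46 | 53, 0.33 | 69, 0.63 | 57, 0.38 | - | - | - | -
11 | 56, 0.46 | 56, 0.36 | 68, 0.62 | 57, 0.38 | - | - | - | -

Notes: VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy; KC, Kappa coefficient. Highlighted classes in the source table: the kappa coefficient larger than 0.56; the kappa coefficient larger than 0.4 and less than 0.56.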


Figure 2 The modeling vegetation map of vegetation groups with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-2

data. The Kappa coefficients in all models were less than 0.4 when assessed by the VMC (Table 6; Fig. 4).

The abbreviations were the same as in Table 4.

Important variables
For the RF model, eight of the top 10 most important variables were the same across the different vegetation units: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio and NDVI of winter vegetation index, brightness index and NDVI of summer vegetation index). For the DT model, nine of the top 10 most important variables were the same across the different vegetation units: four climate variables (annual mean temperature, mean diurnal range, precipitation of the driest month, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio of winter vegetation index, brightness index of summer vegetation index, summer surface albedo of band 1, winter surface albedo of band 6) (Table 7).

DISCUSSION
Vegetation classification units
Vegetation classification is an important and complex system with multiple levels. Higher level classification methods not only accurately classify vegetation, but they can also describe


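Table 5 Model assessment of vegetation types by field point data and VMC. Variable combinations were shown in Table 3. Each cell gives OA (%), KC; a dash marks combinations for which the model produced no result.

Variable combination | Decision tree, point data | Decision tree, VMC | Random forest, point data | Random forest, VMC | Support vector machine, point data | Support vector machine, VMC | Maximum likelihood classification, point data | Maximum likelihood classification, VMC
1 | 42, 0.24 | 63, 0.33 | 32, 0.22 | 23, 0.09 | 32, 0.18 | 40, 0.18 | 6, 0.02 | 7, 0.00
2 | 44, 0.27 | 58, 0.31 | 34, 0.23 | 30, 0.14 | 31, 0.18 | 44, 0.24 | 5, 0.02 | 14, 0.00
3 | 43, 0.30 | 58, 0.35 | 44, 0.34 | 38, 0.22 | 37, 0.26 | 43, 0.25 | 9, 0.05 | 13, 0.00
4 | 36, 0.20 | 47, 0.20 | 39, 0.29 | 31, 0.15 | 32, 0.19 | 43, 0.21 | 13, 0.07 | 6, 0.02
5 | 32, 0.14 | 59, 0.23 | 41, 0.31 | 36, 0.19 | 34, 0.22 | 43, 0.22 | 6, 0.03 | 6, 0.03
6 | 36, 0.23 | 45, 0.24 | 47, 0.38 | 44, 0.27 | 40, 0.29 | 43, 0.25 | 14, 0.09 | 21, 0.06
7 | 55, 0.46 | 72, 0.54 | 70, 0.65 | 54, 0.41 | - | - | - | -
8 | 53, 0.44 | 68, 0.52 | 68, 0.63 | 55, 0.43 | - | - | - | -
9 | 54, 0.45 | 65, 0.49 | 66, 0.60 | 55, 0.43 | - | - | - | -
10 | 54, 0.45 | 65, 0.49 | 68, 0.63 | 55, 0.43 | - | - | - | -
11 | 53, 0.44 | 68, 0.52 | 67, 0.62 | 55, 0.43 | - | - | - | -

Notes: VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy; KC, Kappa coefficient. Highlighted classes in the source table: the kappa coefficient larger than 0.56; the kappa coefficient larger than 0.4 and less than 0.56.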


Figure 3 The modeling vegetation map of vegetation types with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-3

ecosystem diversity, even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions that are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as plant sets sharing similar perturbation response effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine


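Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations were shown in Table 3. Each cell gives OA (%), KC; a dash marks combinations for which the model produced no result.

Variable combination | Decision tree, point data | Decision tree, VMC | Random forest, point data | Random forest, VMC | Support vector machine, point data | Support vector machine, VMC | Maximum likelihood classification, point data | Maximum likelihood classification, VMC
1 | 23, 0.14 | 19, 0.08 | 20, 0.18 | 5, 0.02 | 11, 0.09 | 6, 0.03 | 8, 0.06 | 8, 0.04
2 | 22, -0.04 | 49, 0.04 | 19, 0.17 | 6, 0.03 | 13, 0.11 | 7, 0.04 | 8, 0.06 | 13, 0.05
3 | 26, 0.14 | 45, 0.23 | 29, 0.27 | 9, 0.07 | 21, 0.19 | 10, 0.07 | 12, 0.09 | 13, 0.07
4 | 30, 0.20 | 30, 0.04 | 22, 0.20 | 7, 0.04 | 16, 0.14 | 6, 0.03 | 9, 0.07 | 8, 0.04
5 | 33, 0.01 | 67, 0.00 | 22, 0.20 | 7, 0.04 | 15, 0.13 | 5, 0.03 | 11, 0.09 | 10, 0.04
6 | 26, 0.15 | 22, 0.02 | 31, 0.30 | 11, 0.08 | 21, 0.19 | 8, 0.06 | 12, 0.09 | 15, 0.08
7 | 33, 0.20 | 52, 0.27 | 58, 0.57 | 23, 0.20 | - | - | - | -
8 | 27, 0.17 | 34, 0.18 | 55, 0.54 | 23, 0.20 | - | - | - | -
9 | 25, 0.15 | 22, 0.15 | 55, 0.53 | 22, 0.20 | - | - | - | -
10 | 30, 0.17 | 41, 0.22 | 56, 0.55 | 23, 0.21 | - | - | - | -
11 | 31, 0.20 | 41, 0.22 | 56, 0.55 | 23, 0.20 | - | - | - | -

Notes: VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy; KC, Kappa coefficient. Highlighted classes in the source table: the kappa coefficient larger than 0.56; the kappa coefficient larger than 0.4 and less than 0.56.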


Figure 4 The modeling vegetation map of formations and sub-formations with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-4

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracy of the vegetation distribution simulations in units I and II was similar to each other and higher than unit III's simulation (Tables 4–6).

Different model performances
We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same


Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices were shown in Table 2.

Vegetation groups
Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance
1 | Annual mean temperature | 1.00 | Annual mean temperature | 3.68
2 | Annual precipitation | 0.88 | Slope | 2.94
3 | Slope | 0.80 | Mean diurnal range | 2.60
4 | Winter vegetation index MR | 0.36 | Annual precipitation | 2.38
5 | Mean diurnal range | 0.33 | Summer vegetation index BI | 1.88
6 | Summer surface albedo of band 1 | 0.29 | Winter vegetation index NDVI | 1.37
7 | Summer vegetation index BI | 0.28 | Summer vegetation index EVI | 1.36
8 | Precipitation of driest month | 0.25 | Winter vegetation index MR | 1.30
9 | Summer vegetation index EVI | 0.23 | Summer vegetation index NDVI | 1.22
10 | Winter surface albedo of band 6 | 0.19 | Summer vegetation index MR | 1.12

Vegetation types
Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance
1 | Annual mean temperature | 1.00 | Annual mean temperature | 3.51
2 | Slope | 0.83 | Slope | 3.35
3 | Annual precipitation | 0.51 | Mean diurnal range | 3.06
4 | Winter vegetation index MR | 0.30 | Annual precipitation | 2.8
5 | Mean diurnal range | 0.28 | Summer vegetation index BI | 1.84
6 | Summer vegetation index EVI | 0.22 | Winter vegetation index NDVI | 1.61
7 | Precipitation of driest month | 0.21 | Winter vegetation index MR | 1.45
8 | Summer vegetation index BI | 0.20 | Summer vegetation index WI | 1.31
9 | Summer surface albedo of band 1 | 0.19 | Precipitation of driest month | 1.24
10 | Winter surface albedo of band 6 | 0.14 | Summer vegetation index NDVI | 1.22

Formations and sub-formations
Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance
1 | Annual mean temperature | 1.00 | Annual mean temperature | 4.16
2 | Annual precipitation | 0.86 | Annual precipitation | 3.28
3 | Slope | 0.63 | Mean diurnal range | 3.25
4 | Mean diurnal range | 0.52 | Slope | 2.24
5 | Precipitation of driest month | 0.52 | Precipitation of driest month | 2.16
6 | Winter vegetation index MR | 0.4 | Summer vegetation index BI | 1.83
7 | Summer surface albedo of band 1 | 0.32 | Summer vegetation index NDVI | 1.7
8 | Summer vegetation index BI | 0.32 | Winter vegetation index NDVI | 1.61
9 | Summer vegetation index WI | 0.31 | Winter vegetation index MR | 1.47
10 | Winter surface albedo of band 6 | 0.28 | Summer vegetation indices EVI and MR | 1.32


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogenous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010) and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables. In our study, the SVM and MLC models only output simulation results for variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).

Important variables in vegetation classification models
Variable selection is directly related to the vegetation distribution model’s ability to capture important environmental factors (Mod et al., 2016). Models predict the important variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models (>80% for most forest types). Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors, such as soil moisture, pH, and nutrients, should be considered. Other factors, such as actual light, disturbance, biotic interactions, land use, and bioclimatic information, could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy
In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region’s vegetation no longer coincides with the VMC’s assessment.

CONCLUSIONS
Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (Mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS
We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was funded by the National Key R & D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Grant Disclosures
The following grant information was disclosed by the authors:
National Key R & D Program of China: 2018YFC0506903.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0.

REFERENCES
Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298. Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-

photo interpretation paphos area (cyprus) Photogrammetria 29(6)189ndash202DOI 1010160031-8663(73)90001-x

Box EO 1981Macroclimate and plant forms an introduction to predictive modeling inphytogeography (Tasks for vegetation science 1) London DR W Junk Publishers

Box EO 1996 Plant functional types and climate at the global scale Journal of VegetationScience 7(3)309ndash320 DOI 1023073236274


Boyd DS Sanchez-Hernandez C Foody GM 2006Mapping a specific class for priorityhabitats monitoring from satellite sensor data International Journal of RemoteSensing 27(13)2631ndash2644 DOI 10108001431160600554348

Burai P Deak B Valko O Tomor T 2015 Classification of herbaceous vegeta-tion using airborne hyperspectral imagery Remote Sensing 7(2)2046ndash2066DOI 103390rs70202046

Chala D Zimmermann NE Brochmann C Bakkestuen V 2017Migration corridorsfor alpine plants among the lsquosky islandsrsquo of eastern Africa do they or did they existAlpine Botany 127133ndash144 DOI 101007s00035-017-0184-z

Chen LY Li H Zhang PJ Zhao X Zhou LH Liu TY HuHF Bai YF Shen HH FangJY 2015 Climate and native grassland vegetation as drivers of the communitystructures of shrub-encroached grasslands in Inner Mongolia China LandscapeEcology 30(9)1627ndash1641 DOI 101007s10980-014-0044-9

Editorial Committee of VegetationMap of China the Chinese Academy of Sciences2007 The Vegetation Map of the Peoplersquos Republic of China (11 000 000) BeijingGeological Publishing House

CohenWB Goward SN 2004 Landsatrsquos role in ecological applications of remote sens-ing Bioscience 54(6)535ndash545 DOI 1016410006-3568(2004)054[0535lrieao]20co2

Cutler DR Edwards TC Beard KH Cutler A Hess KT Gibson J Lawler JJ2007 Random forests for classification in ecology Ecology 88(11)2783ndash2792DOI 10189007-05391

De Colstoun ECB Story MH Thompson C Commisso K Smith TG Irons JR2003 National Park vegetation mapping using multitemporal Landsat 7 dataand a decision tree classifier Remote Sensing of Environment 85(3)316ndash327DOI 101016s0034-4257(03)00010-5

Deng SB 2010 ENVI remote sensing image processing method Beijing Science pressDilts TEWeisberg PJ Dencker CM Chambers JC 2015 Functionally relevant climate

variables for arid lands a climatic water deficit approach for modelling desert shrubdistributions Journal of Biogeography 42(10)1986ndash1997 DOI 101111jbi12561

Dobrowski SZ Safford HD Cheng YB Ustin SL 2008Mapping mountain vegetationusing species distribution modeling image-based texture analysis and object-basedclassification Applied Vegetation Science 11(4)499ndash508 DOI 1031702008-7-18560

Dormann CFWoodin SJ 2002 Climate change in the Arctic using plant functionaltypes in a meta-analysis of field experiments Functional Ecology 16(1)4ndash17DOI 101046j0269-8463200100596x

Faber-Langendoen D Keeler-Wolf T Meidinger D Tart D Hoagland B Josse CNavarro G Ponomarenko S Saucier J-P Weakley A Comer P 2014 EcoVeg anew approach to vegetation description and classification Ecological Monographs84(4)533ndash561 DOI 10189013-23341

Fick SE Hijmans RJ 2017WorldClim 2 new 1-km spatial resolution climate surfacesfor global land areas International Journal of Climatology 37(12)4302ndash4315DOI 101002joc5086


Franklin J 1995 Predictive vegetation mapping geographic modelling of biospatialpatterns in relation to environmental gradients Progress in Physical Geography19(4)474ndash499 DOI 101177030913339501900403

Franklin J 2010Mapping species distributions spatial inference and prediction Cam-bridge Cambridge University Press

Gislason PO Benediktsson JA Sveinsson JR 2006 Random forests for land cover classi-fication Pattern Recognition Letters 27(4)294ndash300 DOI 101016jpatrec200508011

Guisan A Zimmermann NE 2000 Predictive habitat distribution models in ecologyEcological Modelling 135147ndash186 DOI 101016s0304-3800(00)00354-9

Gunton RM Polce C KuninWE 2015 Predicting ground temperatures across Euro-pean landscapesMethods in Ecology and Evolution 6(5)532ndash542DOI 1011112041-210x12355

HansenMC Potapov PV Moore R Hancher M Turubanova SA Tyukavina A ThauD Stehman SV Goetz SJ Loveland TR Kommareddy A Egorov A Chini L JusticeCO Townshend JRG 2013High-resolution global maps of 21st-century forestcover change Science 342(6160)850ndash853 DOI 101126science1244693

Haslem A Callister KE Avitabile SC Griffioen PA Kelly LT NimmoDG Spence-Bailey LM Taylor RSWatson SJ Brown L Bennett AF Clarke MF 2010 Aframework for mapping vegetation over broad spatial extents a technique to aidland management across jurisdictional boundaries Landscape and Urban Planning97(4)296ndash305 DOI 101016jlandurbplan201007002

Hastie T Tibshirani R Friedman J 2009 The elements of statistical learning data mininginference and prediction Second edition Berlin Springer

HeldM Rabe A Jakimow B Van der Linden S Hostert P 2014 EnMAP-Box ManualVersion 20 Germany Humboldt-Universitaumlt zu Berlin

Jarnevich CS Stohlgren TJ Kumar S Morisette JT Holcombe TR 2015 Caveatsfor correlative species distribution modeling Ecological Informatics 296ndash15DOI 101016jecoinf201506007

Jiang H Zhao D Cai Y An S 2012 A method for application of classification treemodels to map aquatic vegetation using remotely sensed images from differentsensors and dates Sensors 12(9)12437ndash12454 DOI 103390s120912437

Kabacoff RI 2016 R in action data analysis and graphics with R Second edition BeijingPosts amp Telecom Press

Landis JR Koch GG 1977 The measurement of observer agreement for categorical dataBiometrics 33(1)159ndash174 DOI 1023072529310

Langford ZL Kumar J Hoffman FM Breen AL Iversen CM 2019 Arctic vegetationmapping using unsupervised training datasets and convolutional neural networksRemote Sensing 11(1)69 DOI 103390rs11010069

Lany NK Zarnetske PL Finley AO McCullough DG 2019 Complementary strengthsof spatially-explicit and multi-species distribution models Ecography 421ndash11DOI 101111ecog04728


Li CWang J Wang L Hu L Gong P 2014 Comparison of classification algorithmsand training sample sizes in urban land classification with landsat thematic mapperimagery Remote Sensing 6(2)964ndash983 DOI 103390rs6020964

Lin H Yue CWu X Xu H Zheng X 2014 Remote sense images classification byEnmap-Box model Journal of Southwest Forestry University 2(34)67ndash71 (In Chinesewith English abstract) DOI 103969jissn2095-1914201402013

van der Linden S Rabe A HeldM Jakimow B Leitao PJ Okujeni A Schwieder MSuess S Hostert P 2015 The EnMAP-Box-A toolbox and application program-ming interface for EnMAP data processing Remote Sensing 7(9)11249ndash11266DOI 103390rs70911249

Van der Linden S Rabe A HeldMWirth F Suess S Okujeni A Hostert P 2014ImageSVM Classification Manual for Application imageSVM version 30 GermanyHumboldt-Universitaumlt zu Berlin

MaHJ Gao XH Gu XT 2019 Random forest classification of Landsat 8 imageryfor the complex terrain area based on the combination of spectral topographicand texture information Journal of Geo-information Science 21(3)359ndash371DOI 1012082dpxxkx2019180346

ModHK Scherrer D LuotoM Guisan A 2016What we use is not what we knowenvironmental predictors in plant distribution models Journal of Vegetation Science27(6)1308ndash1322 DOI 101111jvs12444

Muchoney D Borak J Chi H Friedl M Gopal S Hodges J Morrow N Strahler A 2000Application of the MODIS global supervised classification model to vegetation andland cover mapping of Central America International Journal of Remote Sensing21(6ndash7)1115ndash1138 DOI 101080014311600210100

Newell CL Leathwick JR 2005Mapping Hurunui forest community distribution usingcomputer models (Science for Conservation 251) Wellington Department ofConservation

Oke OA Thompson KA 2015 Distribution models for mountain plant species the valueof elevation Ecological Modelling 30172ndash77 DOI 101016jecolmodel201501019

Pal M Mather PM 2005 Support vector machines for classification in remote sensingInternational Journal of Remote Sensing 26(5)1007ndash1011DOI 10108001431160512331314083

PengWL Bai ZP Liu XN Cao T 2002 Introduction to remote sensing Beijing HigherEducation Press

Pfeffer K Pebesma EJ Burrough PA 2003Mapping alpine vegetation using vegeta-tion observations and topographic attributes Landscape Ecology 18(8)759ndash776DOI 101023BLAND000001447178787d0

Prasad AM Iverson LR Liaw A 2006 Newer classification and regression treetechniques bagging and random forests for ecological prediction Ecosystems9(2)181ndash199 DOI 101007s10021-005-0054-1

Price KP Guo XL Stiles JM 2002 Optimal Landsat TM band combinations and vegeta-tion indices for discrimination of six grassland types in eastern Kansas InternationalJournal of Remote Sensing 23(23)5031ndash5042 DOI 10108001431160210121764


Rabe A Jakimow B HeldM Van der Linden S Hostert P 2014 EnMAP-Box Version20 Available at httpswwwenmaporg

Sanchez-Hernandez C Boyd DS Foody GM 2007Mapping specific habitats from re-motely sensed imagery support vector machine and support vector data descriptionbased classification of coastal saltmarsh habitats Ecological Informatics 2(2)83ndash88DOI 101016jecoinf200704003

Sesnie SE Finegan B Gessler PE Thessler S Bendana ZR Smith AMS 2010 Themultispectral separability of Costa Rican rainforest types with support vectormachines and Random Forest decision trees International Journal of Remote Sensing31(11)2885ndash2909 DOI 10108001431160903140803

Sesnie SE Gessler PE Finegan B Thessler S 2008 Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and changedetection in complex neotropical environments Remote Sensing of Environment112(5)2145ndash2159 DOI 101016jrse200708025

Sitch S Smith B Prentice IC Arneth A Bondeau A CramerW Kaplan JOLevis S LuchtW Sykes MT Thonicke K Venevsky S 2003 Evaluation ofecosystem dynamics plant geography and terrestrial carbon cycling in theLPJ dynamic global vegetation model Global Change Biology 9(2)161ndash185DOI 101046j1365-2486200300569x

SongMH Zhou CP Ouyang H 2005 Simulated distribution of vegetation types inresponse to climate change on the Tibetan Plateau Journal of Vegetation Science16(3)341ndash350 DOI 101111j1654-11032005tb02372x

Wang L Gong BL 2018 Collaborative governance of ecological space in Beijing-Tianjin-Hebei region Journal of Tianjin Administration Institute 20(5)38ndash44 (Chinese)DOI 1016326jcnki1008-7168201805005

Wang X Liu G Coscieme L Giannetti BF Hao Y Zhang Y BrownMT 2019 Studyon the emergy-based thermodynamic geography of the Jing-Jin-Ji region combinedmultivariate statistical data with DMSP-OLS nighttime lights data EcologicalModelling 3971ndash15 DOI 101016jecolmodel201901021

Wehkamp J Pietsch SA Fuss S Gusti M ReuterWH Koch N Kindermann G KraxnerF 2018 Accounting for institutional quality in global forest modeling Environmen-tal Modelling amp Software 102250ndash259 DOI 101016jenvsoft201801020

Weng ES Zhou GS 2006Modeling distribution changes of vegetation in Chinaunder future climate change Environmental Modeling amp Assessment 11(1)45ndash58DOI 101007s10666-005-9019-1

WuYWWang NL Li Z Chen AA Guo ZM Qie YF 2019 The effect of thermalradiation from surrounding terrain on glacier surface temperatures retrieved fromremote sensing data a case study from Qiyi Glacier China Remote Sensing ofEnvironment 2311ndash9 DOI 101016jrse2019111267

Xie YC Sha ZY YuM 2008 Remote sensing imagery in vegetation mapping a reviewJournal of Plant Ecology 1(1)9ndash23 DOI 101093jpertm005


Zhang G Biradar CM Xiao X Dong J Zhou Y Qin Y Zhang Y Liu F DingM ThomasRJ 2018 Exacerbated grassland degradation and desertification in Central Asiaduring 2000-2014 Ecological Applications 28(2)442ndash456 DOI 101002eap1660

ZhangWT DongW 2017 SPSS statistical analysis advanced tutorial Third editionBeijing Higher Education Press

Zhang Z Clercq EDe Ou XKWulf RDe Verbeke L 2008Mapping dominantvegetation communities at Meili Snow Mountain Yunnan Province China usingsatellite imagery and plant community data Geocarto International 23(2)135ndash153DOI 10108010106040701337410

ZhaoMS Neilson RP Yan XD DongWJ 2002Modelling the vegetation of Chinaunder changing climate Acta Geographica Sinica 57(1)28ndash38 (Chinese)

Zhao X Su Y Hu T Chen L Gao SWang R Jin S Guo Q 2018 A global correctedSRTM DEM product for vegetated areas Remote Sensing Letters 9(4)393ndash402DOI 1010802150704x20181425560

Zheng Y Xie Z Jiang L Shimizu H Drake S 2006 Changes in Holdridge Life Zone di-versity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40years Journal of Arid Environments 66(1)113ndash126 DOI 101016jjaridenv200509005

Zhou J Lai L Guan T CaiW Gao N Zhang X Yang D Cong Z Zheng Y 2016 Com-parison modeling for alpine vegetation distribution in an arid area EnvironmentalMonitoring and Assessment 188408 DOI 101007s10661-016-5417-x



Table 1 (continued).

Vegetation groups (I)    Vegetation types (II)                                                                  Formations and sub-formations (III)
                         10 One crop annually and cold-resistant economic crops                                 31 Spring wheat, naked oats, buckwheat, potatoes, flax
                         11 One crop annually, cold-resistant economic crops, and deciduous orchards            32 Coarse grains
                                                                                                                33 Winter wheat, coarse grains
                                                                                                                34 Coarse grains
                                                                                                                35 Rice
                                                                                                                36 Winter wheat, corn, cotton
                                                                                                                37 Apple, pear orchard
                                                                                                                38 Winter wheat, corn, Chinese sorghum, sweet potatoes, cotton, tobacco, peanut, sesame, apple, pear, hawthorn, persimmon, walnut, pomegranate, grape
8 Cultural vegetation    12 Three crops two years and two crops annually, non irrigation, deciduous orchards    39 Winter wheat, coarse grains (loamy soil)

Table 2 The vegetation indices.

Indices                                             Abbreviation    Formula
Ratio vegetation index                              RVI             NIR/Red
Brightness index                                    BI              0.2909Blue + 0.2493Green + 0.4806Red + 0.5568NIR + 0.4438SWIR1 + 0.1706SWIR2
Green vegetation index                              GI              -0.2728Blue - 0.2174Green - 0.5508Red + 0.7221NIR + 0.0733SWIR1 - 0.1648SWIR2
Wetness index                                       WI              0.1446Blue + 0.1761Green + 0.3322Red + 0.3396NIR - 0.6210SWIR1 - 0.4186SWIR2
Differenced vegetation index                        DVI             NIR - Red
Green ratio                                         GR              NIR/Green
Mid-infrared ratio                                  MR              NIR/SWIR1
Soil-adjusted vegetation index                      SAVI            (1.5(NIR - Red))/((NIR + Red + 0.5))
Optimization of soil-adjusted vegetation index      OSAVI           (1.16(NIR - Red))/((NIR + Red + 0.16))
Atmospherically resistant vegetation index          ARVI            (NIR - (2Red - Blue))/(NIR + (2Red - Blue))
Normalized difference vegetation index              NDVI            (NIR - Red)/(NIR + Red)
Enhanced vegetation index                           EVI             2.5[(NIR - Red)/(NIR + 6Red - 7.5Blue + 1)]
Normalized difference tillage index                 NDTI            (SWIR1 - SWIR2)/(SWIR1 + SWIR2)
Normalized difference senescent vegetation index    NDSVI           (SWIR1 - Red)/(SWIR1 + Red)

1995; Cohen & Goward, 2004; Zhou et al., 2016). We tested the vegetation discrimination of 14 vegetation indices (Table 2).
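The index formulas in Table 2 can be sketched as a small function. This is only an illustration, not the authors' processing chain: the function name and the sample reflectance values are hypothetical, and the decimal coefficients (for example 1.5 and 0.5 in SAVI, 7.5 in EVI, and the brightness, greenness, and wetness weights) are assumed to be the values the extracted table has lost its decimal points from.

```python
def vegetation_indices(blue, green, red, nir, swir1, swir2):
    """Compute the 14 indices of Table 2 for one pixel.

    Inputs are assumed to be surface reflectances (0-1) for the six bands.
    """
    return {
        "RVI": nir / red,
        "BI": (0.2909 * blue + 0.2493 * green + 0.4806 * red
               + 0.5568 * nir + 0.4438 * swir1 + 0.1706 * swir2),
        "GI": (-0.2728 * blue - 0.2174 * green - 0.5508 * red
               + 0.7221 * nir + 0.0733 * swir1 - 0.1648 * swir2),
        "WI": (0.1446 * blue + 0.1761 * green + 0.3322 * red
               + 0.3396 * nir - 0.6210 * swir1 - 0.4186 * swir2),
        "DVI": nir - red,
        "GR": nir / green,
        "MR": nir / swir1,
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5),
        "OSAVI": 1.16 * (nir - red) / (nir + red + 0.16),
        "ARVI": (nir - (2 * red - blue)) / (nir + (2 * red - blue)),
        "NDVI": (nir - red) / (nir + red),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "NDTI": (swir1 - swir2) / (swir1 + swir2),
        "NDSVI": (swir1 - red) / (swir1 + red),
    }


# Hypothetical reflectances for a single vegetated pixel.
pixel = vegetation_indices(blue=0.05, green=0.08, red=0.10,
                           nir=0.40, swir1=0.20, swir2=0.10)
for name, value in pixel.items():
    print(f"{name}: {value:.3f}")
```

In practice the same arithmetic would be applied band-wise to whole rasters; the scalar version is used here only to keep the formulas easy to check against the table.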

To determine the distribution predictive ability of different variables, we grouped the variables into different combinations based on the results of the Pearson correlation. We only used less correlated variables (|R| < 0.7, Pearson correlation) (Chala et al., 2017) in Combinations 1–9 (Table 3), then used variable combinations as input predictor variables to simulate vegetation distribution. Combination 1 included the less correlated variables of the summer land surface albedos from bands 1 to 7. Combination 2 included the less correlated variables of the winter land surface albedos from bands 1 to 7. Combination 3 included the less correlated variables in Combinations 1 and 2. Combination 4 included


Table 3 Variable combinations.
Note: DT10 and RF10 represent the top 10 important variables of decision tree (DT) and random forest (RF) methods with Combination 9 in the vegetation group level, respectively. The vegetation indices and their abbreviations were shown in Table 2.

Number    Variables combinations
1         Summer land surface albedos of band 1 and 5
2         Winter land surface albedos of band 1 and 6
3         Summer land surface albedos of band 1 and 5
          Winter land surface albedos of band 1 and 6
4         Summer vegetation indices: BI, WI, MR, NDVI, EVI
5         Winter vegetation indices: MR, NDVI, EVI, NDTI
6         Summer vegetation indices: BI, WI, MR, NDVI, EVI
          Winter vegetation indices: MR, NDVI, EVI, NDTI
7         Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month
8         Slope, Aspect, Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month
9         Summer land surface albedos of band 1
          Winter land surface albedos of band 6
          Summer vegetation indices: BI, WI, MR, NDVI, EVI
          Winter vegetation indices: MR, NDVI, EVI, NDTI
          Slope, Aspect, Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month
10        DT10: Annual mean temperature, Annual precipitation, Mean diurnal range, Precipitation of driest month, Slope, Winter vegetation indices MR, Summer land surface albedos of band 1, Summer vegetation indices BI and EVI, Winter land surface albedos of band 6
11        RF10: Annual precipitation, Annual mean temperature, Mean diurnal range, Slope, Summer vegetation indices BI, MR, NDVI and EVI, Winter vegetation indices MR and NDVI

the less correlated variables of the summer vegetation indices. Combination 5 included the less correlated variables of the winter vegetation indices. Combination 6 included the less correlated variables in Combinations 4 and 5. Combination 7 included the less correlated variables from the 19 bioclimatic variables. Combination 8 included the less correlated variables from the 19 bioclimatic variables and three geospatial variables. Combination 9 included the less correlated variables in Combinations 3, 6, and 8. Combinations 10 and 11 represented the top 10 most important variables of the DT and RF methods with Combination 9 in vegetation unit I, respectively (Table 3). The SVM and maximum likelihood classification (MLC) methods only output the simulation results of variable Combinations 1 to 6, likely due to the training samples’ weak separability (Deng, 2010).
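The correlation screening described above can be sketched as follows. The text only states that variables with an absolute Pearson correlation below 0.7 were retained; it does not say which member of a correlated pair was dropped, so the greedy, order-dependent rule below is an assumption, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def select_less_correlated(variables, threshold=0.7):
    """Keep a variable only if |r| < threshold against every variable kept so far.

    `variables` maps a variable name to its sampled values. The result depends
    on dictionary order: earlier variables take priority (an assumed rule).
    """
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" is almost a multiple of "a", while "c" is only weakly related.
demo = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],
    "c": [3, 1, 4, 1, 5],
}
print(select_less_correlated(demo))
```

A priority order based on ecological relevance, rather than input order, would be an equally reasonable way to decide which variable of a correlated pair survives.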

Vegetation distribution models
We used DT, RF, MLC, and SVM vegetation distribution models in this study. The DT model is a divisive, monothetic, and supervised classifier often used for species distribution modeling and related applications (Franklin, 2010). It is computationally fast and easy to understand and implement. It uses classification or regression algorithms to generate classification rules, and then visualizes those rules into simple tree graphics (Hastie, Tibshirani & Friedman, 2009; Zhou et al., 2016). The DT model calculates the most significant variables contributing to the model (Deng, 2010). We used a DT with five layers, 40 samples in the smallest parent node, and 10 samples in the smallest child node.

The RF model is an ensemble method that has been applied in risk assessment and species distribution modeling studies (Cutler et al., 2007; Zhang & Dong, 2017). The RF model creates and combines different DTs to produce considerably more accurate classifications that are unaffected by noise or overtraining (Burai et al., 2015; Cutler et al., 2007; Gislason, Benediktsson & Sveinsson, 2006). The RF model also calculates the most significant variables that contribute to the model (Cutler et al., 2007). Running an RF model requires defined parameters, including tree number, number of randomly selected features, and node impurity function. We generated the RF model in EnMAP-Box, a license-free and platform-independent software interface designed to process hyperspectral remote sensing data, which was developed by the Humboldt University of Berlin. There are in-built applications aimed at the processing of hyperspectral data, such as SVM and RF for classification of image data, in the EnMAP-Box (Held et al., 2014). We used the default settings in EnMAP-Box with 100 trees. The number of randomly selected features was equal to the square root of the number of all features, and we used a Gini coefficient for the node impurity function (Rabe et al., 2014; Ma, Gao & Gu, 2019; van der Linden et al., 2015; Zhou et al., 2016).
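Two of the RF settings named above, the Gini node impurity and the square-root rule for the number of features tried at each split, can be illustrated with a short sketch. This is not the EnMAP-Box implementation; the function names are hypothetical, and rounding the square root down to an integer is an assumption, since the text does not state how non-integer values were handled.

```python
from collections import Counter
from math import isqrt


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def features_per_split(n_features):
    """Number of randomly selected features per split (square-root rule).

    Rounding down is an assumption made for this sketch.
    """
    return max(1, isqrt(n_features))


# A node holding two vegetation classes in equal shares is maximally impure
# for a two-class problem; a single-class node has zero impurity.
mixed_node = ["forest", "forest", "grassland", "grassland"]
pure_node = ["forest", "forest", "forest"]
print(gini_impurity(mixed_node), gini_impurity(pure_node))

# With 17 predictors, the square-root rule tries 4 features at each split.
print(features_per_split(17))
```

A split is preferred when it lowers the sample-weighted impurity of the child nodes relative to the parent, which is how the impurity function steers tree growth.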

The MLC model is one of the most commonly used supervised image classification methods. MLC’s classification rules use the statistics of the Gaussian probability density function to assign each pixel to the class with the highest probability. Although the MLC method usually generates similar or more accurate classifications than other methods, it is not applicable when there are fewer training samples than input predictors (Burai et al., 2015; Zhou et al., 2016).

The SVM model is a supervised machine learning model used for classification and regression. It is a complex and widely used method that can output more accurate predictions (Burai et al., 2015) than other methods. The SVM model searches for an optimal plane in a multidimensional space to divide all sample elements into two categories, making the distance between the closest points in the two classes as large as possible (Kabacoff, 2016). Running an SVM model requires a defined kernel parameter g and regularization parameter c. In this study, we generated the SVM model in the EnMAP-Box. We applied the EnMAP-Box default settings for the SVM model, in which parameter g ranged from 0.01 to 1,000 and parameter c ranged from 0.1 to 1,000. Parameters g and c were tested using a grid search with internal performance estimation, and we used those with the best performance for data training (Lin et al., 2014; van der Linden et al., 2014; van der Linden et al., 2015).

We generated the predicted vegetation maps of the three classification units using the DT, RF, MLC, and SVM methods with a resolution of 500 m. We selected all 11 variable combinations as the input variables for each method. The DT and RF method results indicated which variables were most important for vegetation discrimination.


Model assessment
We used the VMC and a total of 974 vegetation points to assess the overall accuracy and Kappa coefficient of every predicted vegetation map. Kappa coefficient values ranging from 0.4 to 0.55 indicated moderate agreement, from 0.56 to 0.8 indicated substantial agreement, and from 0.81 to 1 indicated almost perfect agreement (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016). When the Kappa coefficient value was greater than 0.4, the assessed predicted map was considered acceptable.
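The two assessment statistics can be sketched from paired reference and predicted labels. This is a minimal illustration using the standard definitions of overall accuracy and Cohen’s Kappa, not the authors’ software; the function names and sample labels are hypothetical, and the handling of values that fall between the stated bands (for example between 0.55 and 0.56) is an assumption.

```python
from collections import Counter


def overall_accuracy_and_kappa(reference, predicted):
    """Return (overall accuracy, Cohen's Kappa) for paired class labels."""
    n = len(reference)
    # Observed agreement: share of points whose predicted class matches.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: sum over classes of the product of marginal shares.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    # Undefined when expected == 1 (all points in one class); not handled here.
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


def agreement_label(kappa):
    """Map a Kappa value to the agreement bands quoted in the text."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.56:
        return "substantial"
    if kappa >= 0.4:
        return "moderate"
    return "below the acceptance threshold"


# Hypothetical assessment points: F = forest, G = grassland, S = shrubland.
reference = ["F", "F", "G", "G", "S", "S"]
predicted = ["F", "F", "G", "S", "S", "S"]
oa, kappa = overall_accuracy_and_kappa(reference, predicted)
print(f"OA = {oa:.2%}, Kappa = {kappa:.2f} ({agreement_label(kappa)})")
```

Kappa discounts the agreement expected by chance, which is why a map can show a moderate overall accuracy and still have a low Kappa when one class dominates the assessment points.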

RESU

LTSUnitIm

odelingand

assessment

TheRFmodelrsquos

resultswere

betterthan

theresults

oftheDTM

LCand

SVM

models

(Table4)The

RFmodelhad

aKappa

coefficientlarger

than04

when

usingvariable

Com

binations6to

11assessed

byfield

pointdatawith

anoverallaccuracy

of50to

72

TheRFmodelhad

aKappa

coefficientlargerthan056

when

usingvariable

Com

binations7to

11assessed

byfield

datawith

anoverallaccuracy

of68to

72The

RFmodelhad

thehighestK

appacoefficientof066

andthe

highestoverallaccuracyof72

when

usingvariable

Com

bination7The

DTmodelhad

aKappa

coefficientlargerthan04

when

usingvariable

Com

binations7to

11assessed

byfield

pointdatawith

anoverallaccuracy

of54to

56The

DTmodelhad

noKappa

coefficientlargerthan

056when

usingallvariable

combinationsA

fterVMCassessm

entwefound

thehighestK

appacoefficientw

as038

with

anoverallaccuracy

of57in

theRFmodelusing

variableCom

binations9to

11(Table

4Fig2)

Unit II modeling and assessment
The RF model results were better than the results of the other three models. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 66–70% and 54–55% for field point data and VMC assessments, respectively. The RF model using Combinations 7 to 11 had a Kappa coefficient larger than 0.56 and an overall accuracy of 66–70% when assessed by field point data. The RF model had the highest Kappa coefficient of 0.65 and the highest overall accuracy of 70% when using variable Combination 7. The DT model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 53–55% and 65–72% for field point data and VMC assessments, respectively. The DT model had the highest Kappa coefficient of 0.54 and overall accuracy of 72% when using variable Combination 7. When assessed by the VMC, the DT model had a larger Kappa coefficient and greater overall accuracy than the RF model (Table 5, Fig. 3).

Unit III modeling and assessment
Only the RF model could simulate vegetation distribution in unit III. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4 and an overall accuracy of 55–58% assessed by field point data. The RF model using variable Combination 7 had the highest Kappa coefficient of 0.57 (the only model with a Kappa coefficient larger than 0.56) and the highest overall accuracy of 58% assessed by field point


Table 4 Model assessment of vegetation groups by field point data and VMC. Variable combinations were shown in Table 3.

Variable     Decision tree               Random forest               Support vector machine      Maximum likelihood classification
combination  Point data    VMC           Point data    VMC           Point data    VMC           Point data    VMC
             OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC
1            34     0.18   55     0.22   37     0.24   32     0.09   36     0.21   53     0.21   23     0.08   11     0.02
2            38     0.20   52     0.23   39     0.27   37     0.13   35     0.20   55     0.24   18     0.07   9      0.03
3            45     0.31   54     0.26   47     0.36   45     0.21   41     0.27   54     0.27   24     0.12   15     0.05
4            32     0.16   46     0.16   42     0.30   42     0.17   37     0.22   57     0.26   11     0.04   3      0.01
5            31     0.11   59     0.14   44     0.32   44     0.19   36     0.22   51     0.22   9      0.04   4      0.02
6            41     0.26   44     0.18   50     0.40   52     0.27   42     0.29   54     0.27   13     0.08   4      0.03
7            54     0.45   57     0.34   72     0.66   55     0.35   -      -      -      -      -      -      -      -
8            55     0.46   56     0.35   69     0.63   56     0.37   -      -      -      -      -      -      -      -
9            55     0.46   53     0.34   68     0.61   57     0.38   -      -      -      -      -      -      -      -
10           55     0.46   53     0.33   69     0.63   57     0.38   -      -      -      -      -      -      -      -
11           56     0.46   56     0.36   68     0.62   57     0.38   -      -      -      -      -      -      -      -

Notes: VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient. Highlighting in the original table distinguished Kappa coefficients larger than 0.56 from those larger than 0.4 and less than 0.56.


Figure 2 The modeling vegetation map of vegetation groups with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-2

data. The Kappa coefficients in all models were less than 0.4 when assessed by the VMC (Table 6, Fig. 4).

The abbreviations were the same as in Table 4.

Important variables
For the RF model, eight of the top 10 most important variables were the same across the different vegetation units: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and four spectral variables (mid-infrared ratio and NDVI of the winter vegetation index, and brightness index and NDVI of the summer vegetation index). For the DT model, nine of the top 10 most important variables were the same across the different vegetation units: four climate variables (annual mean temperature, mean diurnal range, precipitation of the driest month, and annual precipitation), one geospatial variable (slope), and four spectral variables (mid-infrared ratio of the winter vegetation index, brightness index of the summer vegetation index, summer surface albedo of band 1, and winter surface albedo of band 6) (Table 7).
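The counts of shared variables reported above can be checked with a short Python sketch that intersects the top-10 lists of Table 7 across the three classification units. The lists are transcribed from Table 7; the short labels, the dictionary names, and the helper function are our own illustrative choices, not part of the study's workflow.

```python
# Top-10 variables per classification unit, transcribed from Table 7.
# Short labels are illustrative: "summer BI" = brightness index of the
# summer vegetation index, "winter MR" = mid-infrared ratio of the winter
# vegetation index, and so on.
RF_TOP10 = {
    "vegetation groups": {
        "annual mean temperature", "slope", "mean diurnal range",
        "annual precipitation", "summer BI", "winter NDVI", "summer EVI",
        "winter MR", "summer NDVI", "summer MR",
    },
    "vegetation types": {
        "annual mean temperature", "slope", "mean diurnal range",
        "annual precipitation", "summer BI", "winter NDVI", "winter MR",
        "summer WI", "precipitation of driest month", "summer NDVI",
    },
    # Rank 10 lists two indices (EVI and MR), so this set has 11 entries.
    "formations and sub-formations": {
        "annual mean temperature", "annual precipitation", "mean diurnal range",
        "slope", "precipitation of driest month", "summer BI", "summer NDVI",
        "winter NDVI", "winter MR", "summer EVI", "summer MR",
    },
}

DT_TOP10 = {
    "vegetation groups": {
        "annual mean temperature", "annual precipitation", "slope", "winter MR",
        "mean diurnal range", "summer albedo band 1", "summer BI",
        "precipitation of driest month", "summer EVI", "winter albedo band 6",
    },
    "vegetation types": {
        "annual mean temperature", "slope", "annual precipitation", "winter MR",
        "mean diurnal range", "summer EVI", "precipitation of driest month",
        "summer BI", "summer albedo band 1", "winter albedo band 6",
    },
    "formations and sub-formations": {
        "annual mean temperature", "annual precipitation", "slope",
        "mean diurnal range", "precipitation of driest month", "winter MR",
        "summer albedo band 1", "summer BI", "summer WI", "winter albedo band 6",
    },
}


def shared_variables(top10_by_unit):
    """Variables that appear in the top-10 list of every classification unit."""
    return set.intersection(*top10_by_unit.values())


rf_shared = shared_variables(RF_TOP10)
dt_shared = shared_variables(DT_TOP10)
print(len(rf_shared), sorted(rf_shared))
print(len(dt_shared), sorted(dt_shared))
```

The intersections contain eight variables for the RF model and nine for the DT model, in line with the counts given in the text.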

DISCUSSION
Vegetation classification units
Vegetation classification is an important and complex system with multiple levels. Higher-level classification methods not only accurately classify vegetation, but they can also describe


Table 5 Model assessment of vegetation types by field point data and VMC. Variable combinations were shown in Table 3.

Variable     Decision tree               Random forest               Support vector machine      Maximum likelihood classification
combination  Point data    VMC           Point data    VMC           Point data    VMC           Point data    VMC
             OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC
1            42     0.24   63     0.33   32     0.22   23     0.09   32     0.18   40     0.18   6      0.02   7      0.00
2            44     0.27   58     0.31   34     0.23   30     0.14   31     0.18   44     0.24   5      0.02   14     0.00
3            43     0.30   58     0.35   44     0.34   38     0.22   37     0.26   43     0.25   9      0.05   13     0.00
4            36     0.20   47     0.20   39     0.29   31     0.15   32     0.19   43     0.21   13     0.07   6      0.02
5            32     0.14   59     0.23   41     0.31   36     0.19   34     0.22   43     0.22   6      0.03   6      0.03
6            36     0.23   45     0.24   47     0.38   44     0.27   40     0.29   43     0.25   14     0.09   21     0.06
7            55     0.46   72     0.54   70     0.65   54     0.41   -      -      -      -      -      -      -      -
8            53     0.44   68     0.52   68     0.63   55     0.43   -      -      -      -      -      -      -      -
9            54     0.45   65     0.49   66     0.60   55     0.43   -      -      -      -      -      -      -      -
10           54     0.45   65     0.49   68     0.63   55     0.43   -      -      -      -      -      -      -      -
11           53     0.44   68     0.52   67     0.62   55     0.43   -      -      -      -      -      -      -      -

Notes: VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient. Highlighting in the original table distinguished Kappa coefficients larger than 0.56 from those larger than 0.4 and less than 0.56.


Figure 3 The modeling vegetation map of vegetation types with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-3

ecosystem diversity even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions that are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as plant sets sharing similar perturbation response effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine


Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations were shown in Table 3.

Variable     Decision tree               Random forest               Support vector machine      Maximum likelihood classification
combination  Point data    VMC           Point data    VMC           Point data    VMC           Point data    VMC
             OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC
1            23     0.14   19     0.08   20     0.18   5      0.02   11     0.09   6      0.03   8      0.06   8      0.04
2            22     −0.04  49     0.04   19     0.17   6      0.03   13     0.11   7      0.04   8      0.06   13     0.05
3            26     0.14   45     0.23   29     0.27   9      0.07   21     0.19   10     0.07   12     0.09   13     0.07
4            30     0.20   30     0.04   22     0.20   7      0.04   16     0.14   6      0.03   9      0.07   8      0.04
5            33     0.01   67     0.00   22     0.20   7      0.04   15     0.13   5      0.03   11     0.09   10     0.04
6            26     0.15   22     0.02   31     0.30   11     0.08   21     0.19   8      0.06   12     0.09   15     0.08
7            33     0.20   52     0.27   58     0.57   23     0.20   -      -      -      -      -      -      -      -
8            27     0.17   34     0.18   55     0.54   23     0.20   -      -      -      -      -      -      -      -
9            25     0.15   22     0.15   55     0.53   22     0.20   -      -      -      -      -      -      -      -
10           30     0.17   41     0.22   56     0.55   23     0.21   -      -      -      -      -      -      -      -
11           31     0.20   41     0.22   56     0.55   23     0.20   -      -      -      -      -      -      -      -

Notes: VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient. Highlighting in the original table distinguished Kappa coefficients larger than 0.56 from those larger than 0.4 and less than 0.56.


Figure 4 The modeling vegetation map of formations and sub-formations with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-4

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracies of the vegetation distribution simulations in units I and II were similar to each other and higher than the accuracy of the unit III simulation (Tables 4–6).

Different model performances
We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same


Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices were shown in Table 2.

Vegetation groups
Rank  Decision tree variable              Standardized importance   Random forest variable                Normalized importance
1     Annual mean temperature             1.00                      Annual mean temperature               368
2     Annual precipitation                0.88                      Slope                                 294
3     Slope                               0.80                      Mean diurnal range                    260
4     Winter vegetation index MR          0.36                      Annual precipitation                  238
5     Mean diurnal range                  0.33                      Summer vegetation index BI            188
6     Summer surface albedo of band 1     0.29                      Winter vegetation index NDVI          137
7     Summer vegetation index BI          0.28                      Summer vegetation index EVI           136
8     Precipitation of driest month       0.25                      Winter vegetation index MR            130
9     Summer vegetation index EVI         0.23                      Summer vegetation index NDVI          122
10    Winter surface albedo of band 6     0.19                      Summer vegetation index MR            112

Vegetation types
Rank  Decision tree variable              Standardized importance   Random forest variable                Normalized importance
1     Annual mean temperature             1.00                      Annual mean temperature               351
2     Slope                               0.83                      Slope                                 335
3     Annual precipitation                0.51                      Mean diurnal range                    306
4     Winter vegetation index MR          0.30                      Annual precipitation                  28
5     Mean diurnal range                  0.28                      Summer vegetation index BI            184
6     Summer vegetation index EVI         0.22                      Winter vegetation index NDVI          161
7     Precipitation of driest month       0.21                      Winter vegetation index MR            145
8     Summer vegetation index BI          0.20                      Summer vegetation index WI            131
9     Summer surface albedo of band 1     0.19                      Precipitation of driest month         124
10    Winter surface albedo of band 6     0.14                      Summer vegetation index NDVI          122

Formations and sub-formations
Rank  Decision tree variable              Standardized importance   Random forest variable                Normalized importance
1     Annual mean temperature             1.00                      Annual mean temperature               416
2     Annual precipitation                0.86                      Annual precipitation                  328
3     Slope                               0.63                      Mean diurnal range                    325
4     Mean diurnal range                  0.52                      Slope                                 224
5     Precipitation of driest month       0.52                      Precipitation of driest month         216
6     Winter vegetation index MR          0.4                       Summer vegetation index BI            183
7     Summer surface albedo of band 1     0.32                      Summer vegetation index NDVI          17
8     Summer vegetation index BI          0.32                      Winter vegetation index NDVI          161
9     Summer vegetation index WI          0.31                      Winter vegetation index MR            147
10    Winter surface albedo of band 6     0.28                      Summer vegetation indices EVI and MR  132


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogeneous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010) and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables. In our study, the SVM and MLC models produced output only for variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
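The difference between a single decision tree and a random forest described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the software or settings used in this study: the function names, the Gini impurity split criterion, the tree depth limit, and the toy training points are all hypothetical choices made for the example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, threshold) pair that lowers impurity most, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, n_try, rng, depth=0, max_depth=5):
    """Grow one tree; each split considers a random subset of n_try predictors."""
    features = rng.sample(range(len(rows[0])), n_try)
    split = best_split(rows, labels, features) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(row, y) for row, y in zip(rows, labels) if row[f] <= t]
    right = [(row, y) for row, y in zip(rows, labels) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left],
                       n_try, rng, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right],
                       n_try, rng, depth + 1, max_depth))


def predict_tree(tree, row):
    """Follow the splits down to a leaf label (labels must not be tuples)."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def build_forest(rows, labels, n_trees, n_try, seed=0):
    """Bagging: every tree is grown on a bootstrap sample of the training points."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_try, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Toy training points (invented): two predictors per point, two classes.
rows = ([(float(i), float(11 - i)) for i in range(1, 11)]
        + [(float(i), float(51 - i)) for i in range(21, 31)])
labels = ["grassland"] * 10 + ["forest"] * 10

# Single decision tree: all training points, all predictors at every split.
single_tree = build_tree(rows, labels, n_try=2, rng=random.Random(0))
# Random forest: bootstrap samples plus one random predictor per split.
forest = build_forest(rows, labels, n_trees=25, n_try=1)

print(predict_tree(single_tree, (0.0, 0.0)), predict_forest(forest, (100.0, 100.0)))
```

Averaging the votes of many trees grown on different bootstrap samples and predictor subsets is what reduces the instability of a single tree noted above.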

Important variables in vegetation classification models
Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important


variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.
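As an example of a spectral variable that reflects vegetation information, NDVI (one of the vegetation indices listed in Table 7) is computed from red and near-infrared reflectance using its standard definition. The sketch below is illustrative only: the function name, the handling of a zero denominator, and the reflectance values are assumptions and do not describe how the indices were derived in this study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    total = nir + red
    if total == 0:
        # Assumption: report 0 for a pixel with no signal instead of dividing by zero.
        return 0.0
    return (nir - red) / total


# Invented reflectance values: vegetation reflects strongly in the near infrared
# and absorbs red light, so its NDVI is higher than that of bare ground.
dense_vegetation = ndvi(nir=0.5, red=0.1)
bare_ground = ndvi(nir=0.3, red=0.25)
print(f"{dense_vegetation:.2f} {bare_ground:.2f}")
```

Higher values indicate denser green vegetation, which is why a winter and a summer NDVI can help separate vegetation classes with different phenology.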

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models, to >80% for most forest types. Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors such as soil moisture, pH, and nutrients should be considered. Other factors such as actual light, disturbance, biotic interactions, land use, and bioclimatic information could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy
In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data


preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS
Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable of simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (mid-infrared ratio of the winter vegetation index and brightness index of the summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS
We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was funded by the National Key R&D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Grant Disclosures
The following grant information was disclosed by the authors:
National Key R&D Program of China: 2018YFC0506903.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0

REFERENCES
Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.
Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, paphos area (cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.
Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: DR W. Junk Publishers.
Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.


Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.
Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.
Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.
Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.
Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.
Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.
Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.
De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.
Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science press.
Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.
Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.
Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.
Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.
Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.


Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.
Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.
Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.
Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.
Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.
Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.
Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.
Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.
Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.
Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.
Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.
Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.
Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.
Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.
Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.


Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.
Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.
van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box-A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.
Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.
Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.
Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.
Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.
Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.
Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.
Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.
Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.
Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.
Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.
Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.


Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box. Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.


Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.



Table 3 Variable combinations.
Note: DT10 and RF10 represent the top 10 important variables of the decision tree (DT) and random forest (RF) methods with Combination 9 at the vegetation group level, respectively. The vegetation indices and their abbreviations are shown in Table 2.

Number  Variable combinations
1       Summer land surface albedos of band 1 and 5
2       Winter land surface albedos of band 1 and 6
3       Summer land surface albedos of band 1 and 5; Winter land surface albedos of band 1 and 6
4       Summer vegetation indices BI, WI, MR, NDVI, EVI
5       Winter vegetation indices MR, NDVI, EVI, NDTI
6       Summer vegetation indices BI, WI, MR, NDVI, EVI; Winter vegetation indices MR, NDVI, EVI, NDTI
7       Annual mean temperature; Annual precipitation; Mean diurnal range; Precipitation of driest month
8       Slope; Aspect; Annual mean temperature; Annual precipitation; Mean diurnal range; Precipitation of driest month
9       Summer land surface albedos of band 1; Winter land surface albedos of band 6; Summer vegetation indices BI, WI, MR, NDVI, EVI; Winter vegetation indices MR, NDVI, EVI, NDTI; Slope; Aspect; Annual mean temperature; Annual precipitation; Mean diurnal range; Precipitation of driest month
10      DT10: Annual mean temperature; Annual precipitation; Mean diurnal range; Precipitation of driest month; Slope; Winter vegetation indices MR; Summer land surface albedos of band 1; Summer vegetation indices BI and EVI; Winter land surface albedos of band 6
11      RF10: Annual precipitation; Annual mean temperature; Mean diurnal range; Slope; Summer vegetation indices BI, MR, NDVI and EVI; Winter vegetation indices MR and NDVI

the less correlated variables of the summer vegetation indices. Combination 5 included the less correlated variables of the winter vegetation indices. Combination 6 included the less correlated variables in Combinations 4 and 5. Combination 7 included the less correlated variables from the 19 bioclimatic variables. Combination 8 included the less correlated variables from the 19 bioclimatic variables and three geospatial variables. Combination 9 included the less correlated variables in Combinations 3, 6, and 8. Combinations 10 and 11 represented the top 10 most important variables of the DT and RF methods with Combination 9 in vegetation unit I, respectively (Table 3). The SVM and maximum likelihood classification (MLC) methods only output the simulation results of variable Combinations 1 to 6, likely due to the training samples' weak separability (Deng, 2010).

Vegetation distribution models

We used DT, RF, MLC, and SVM vegetation distribution models in this study. The DT model is a divisive, monothetic, and supervised classifier often used for species distribution modeling and related applications (Franklin, 2010). It is computationally fast and easy to understand and implement. It uses classification or regression algorithms to generate classification rules and then visualizes those rules into simple tree graphics (Hastie, Tibshirani & Friedman, 2009; Zhou et al., 2016). The DT model calculates the most significant variables contributing to the model (Deng, 2010). We used a DT with five layers, 40 samples in the smallest parent node, and 10 samples in the smallest child node.
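The DT settings described above can be illustrated with a minimal sketch. It assumes scikit-learn (the text does not name the software used for the DT model) and uses synthetic data in place of the vegetation training points. The mapping of "five layers" to `max_depth=5`, of the smallest parent node to `min_samples_split=40`, and of the smallest child node to `min_samples_leaf=10` is an assumption made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the training points: 6 predictors, 3 classes.
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

# Assumed mapping of the settings in the text onto scikit-learn parameters:
#   five layers                     -> max_depth=5
#   40 samples, smallest parent node -> min_samples_split=40
#   10 samples, smallest child node  -> min_samples_leaf=10
dt = DecisionTreeClassifier(
    max_depth=5, min_samples_split=40, min_samples_leaf=10, random_state=0
)
dt.fit(X, y)

# The fitted classification rules can be printed as a simple text tree.
feature_names = [f"var_{i}" for i in range(X.shape[1])]  # hypothetical names
rules = export_text(dt, feature_names=feature_names)
print(rules.splitlines()[0])

# Impurity-based variable importances, one per predictor, summing to 1.
importances = dt.feature_importances_
```

Ranking `importances` in descending order gives a variable ranking of the kind the DT model reports, although the scaling used by other software may differ.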

The RF model is an ensemble method that has been applied in risk assessment and species distribution modeling studies (Cutler et al., 2007; Zhang & Dong, 2017). The RF model creates and combines different DTs to produce considerably more accurate classifications that are unaffected by noise or overtraining (Burai et al., 2015; Cutler et al., 2007; Gislason, Benediktsson & Sveinsson, 2006). The RF model also calculates the most significant variables that contribute to the model (Cutler et al., 2007). Running an RF model requires defined parameters, including tree number, number of randomly selected features, and node impurity function. We generated the RF model in EnMAP-Box, a license-free and platform-independent software interface designed to process hyperspectral remote sensing data, which was developed by the Humboldt University of Berlin. EnMAP-Box has built-in applications for processing hyperspectral data, such as SVM and RF for the classification of image data (Held et al., 2014). We used the default settings in EnMAP-Box with 100 trees. The number of randomly selected features was equal to the square root of the number of all features, and we used the Gini coefficient as the node impurity function (Rabe et al., 2014; Ma, Gao & Gu, 2019; van der Linden et al., 2015; Zhou et al., 2016).
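The three RF parameters named above (tree number, number of randomly selected features, and node impurity function) can be sketched as follows. This is not the EnMAP-Box implementation; it assumes scikit-learn as a stand-in and synthetic data, and the out-of-bag score is added only as an illustrative internal check.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the training points: 9 predictors, 3 classes.
X, y = make_classification(
    n_samples=600, n_features=9, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

rf = RandomForestClassifier(
    n_estimators=100,     # 100 trees
    max_features="sqrt",  # sqrt(9) = 3 randomly selected features per split
    criterion="gini",     # Gini node impurity
    oob_score=True,       # out-of-bag accuracy estimate (illustrative extra)
    random_state=0,
)
rf.fit(X, y)

# Rank predictors from most to least important (impurity-based importance).
ranking = np.argsort(rf.feature_importances_)[::-1]
top3 = ranking[:3]
```

The importance measure here is scikit-learn's impurity-based one; the normalized importance reported by other tools may be computed differently, so only the ranking idea carries over.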

The MLC model is one of the most commonly used supervised image classification methods. MLC's classification rules use the statistics of the Gaussian probability density function to assign each pixel to the class with the highest probability. Although the MLC method usually generates similar or more accurate classifications than other methods, it is not applicable when there are fewer training samples than input predictors (Burai et al., 2015; Zhou et al., 2016).
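The Gaussian decision rule can be made concrete with a short sketch. It assumes equal class priors and a separate covariance matrix per class, uses NumPy only, and runs on synthetic two-class data; the class name `GaussianMLC` is hypothetical. The sample-size check reflects the limitation noted above, applied here per class as an assumption: a class covariance matrix cannot be inverted unless the class has more samples than predictors.

```python
import numpy as np


class GaussianMLC:
    """Per-class Gaussian maximum likelihood classifier (equal priors)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        n_features = X.shape[1]
        for c in self.classes_:
            Xc = X[y == c]
            # The sample covariance is singular unless the class has
            # more samples than predictors.
            if Xc.shape[0] <= n_features:
                raise ValueError(
                    f"class {c}: need more than {n_features} samples"
                )
            mean = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False)
            _, logdet = np.linalg.slogdet(cov)
            self.params_.append((mean, np.linalg.inv(cov), logdet))
        return self

    def log_likelihood(self, X):
        # Gaussian log density per class, dropping the constant term
        # that is identical for all classes.
        scores = []
        for mean, inv_cov, logdet in self.params_:
            d = X - mean
            maha = np.einsum("ij,jk,ik->i", d, inv_cov, d)
            scores.append(-0.5 * (maha + logdet))
        return np.column_stack(scores)

    def predict(self, X):
        # Assign each sample to the class with the highest likelihood.
        return self.classes_[np.argmax(self.log_likelihood(X), axis=1)]


# Two well-separated synthetic classes with two predictors each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
y = np.repeat([0, 1], 50)

mlc = GaussianMLC().fit(X, y)
accuracy = (mlc.predict(X) == y).mean()
```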

The SVM model is a supervised machine learning model used for classification and regression. It is a complex and widely used method that can output more accurate predictions than other methods (Burai et al., 2015). The SVM model searches for an optimal plane in a multidimensional space to divide all sample elements into two categories, making the distance between the closest points in the two classes as large as possible (Kabacoff, 2016). Running an SVM model requires a defined kernel parameter g and regularization parameter c. In this study, we generated the SVM model in the EnMAP-Box. We applied the EnMAP-Box default settings for the SVM model, in which parameter g ranged from 0.01 to 1000 and parameter c ranged from 0.1 to 1000. Parameters g and c were tested using a grid search with internal performance estimation, and we used those with the best performance for data training (Lin et al., 2014; van der Linden et al., 2014; van der Linden et al., 2015).
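A grid search over g and c of the kind described above can be sketched as follows. This is an illustration with scikit-learn rather than the EnMAP-Box routine. The RBF kernel, the power-of-ten grid spacing, and three-fold cross-validation as the internal performance estimate are all assumptions; only the parameter ranges come from the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the training points: 6 predictors, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

# Ranges from the text; the spacing between grid values is assumed.
param_grid = {
    "gamma": np.logspace(-2, 3, 6),  # g: 0.01, 0.1, ..., 1000
    "C": np.logspace(-1, 3, 5),      # c: 0.1, 1, ..., 1000
}

# Cross-validated accuracy serves as the internal performance estimate.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
search.fit(X, y)

# The best-performing pair is refit on all training data by default.
best = search.best_params_
```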

We generated the predicted vegetation maps of the three classification units using the DT, RF, MLC, and SVM methods with a resolution of 500 m. We selected all 11 variable combinations as the input variables for each method. The DT and RF method results indicated which variables were most important for vegetation discrimination.


Model assessment

We used the VMC and a total of 974 vegetation points to assess the overall accuracy and Kappa coefficient of every predicted vegetation map. Kappa coefficient values ranging from 0.4 to 0.55 indicated moderate agreement, from 0.56 to 0.8 indicated substantial agreement, and from 0.81 to 1 indicated almost perfect agreement (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016). When the Kappa coefficient value was greater than 0.4, the assessed predicted map was considered acceptable.
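The two assessment metrics can be computed from paired reference and predicted labels, as in the sketch below. It uses the standard definitions: overall accuracy is the observed proportion of agreement p_o, and the Kappa coefficient is (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by chance. The function names and the label "below moderate" are hypothetical, and rounding to two decimals before applying the thresholds is an assumption.

```python
from collections import Counter


def overall_accuracy_and_kappa(reference, predicted):
    """Return (overall accuracy, Cohen's Kappa) for two label sequences."""
    n = len(reference)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: share of points whose labels match.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement from the marginal class frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / n**2
    if expected == 1.0:
        raise ValueError("Kappa is undefined when only one class occurs")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


def agreement_label(kappa):
    """Map a Kappa value to the agreement classes used in the text."""
    k = round(kappa, 2)
    if k >= 0.81:
        return "almost perfect"
    if k >= 0.56:
        return "substantial"
    if k >= 0.40:
        return "moderate"
    return "below moderate"  # hypothetical label for values under 0.4


# Tiny worked example with two classes (F = forest, G = grassland).
oa, kappa = overall_accuracy_and_kappa(
    ["F", "F", "G", "G"], ["F", "F", "G", "F"]
)
label = agreement_label(kappa)
```

In this example three of four points agree, so the overall accuracy is 0.75, the chance agreement is 0.5, and Kappa is 0.5, which falls in the moderate class.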

RESULTS

Unit I modeling and assessment

The RF model's results were better than the results of the DT, MLC, and SVM models (Table 4). The RF model had a Kappa coefficient larger than 0.4 when using variable Combinations 6 to 11 assessed by field point data, with an overall accuracy of 50% to 72%. The RF model had a Kappa coefficient larger than 0.56 when using variable Combinations 7 to 11 assessed by field data, with an overall accuracy of 68% to 72%. The RF model had the highest Kappa coefficient of 0.66 and the highest overall accuracy of 72% when using variable Combination 7. The DT model had a Kappa coefficient larger than 0.4 when using variable Combinations 7 to 11 assessed by field point data, with an overall accuracy of 54% to 56%. The DT model had no Kappa coefficient larger than 0.56 when using all variable combinations. After VMC assessment, we found the highest Kappa coefficient was 0.38, with an overall accuracy of 57%, in the RF model using variable Combinations 9 to 11 (Table 4, Fig. 2).

Unit II modeling and assessment

The RF model results were better than the results of the other three models. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 66%–70% and 54%–55% for field point data and VMC assessments, respectively. The RF model using Combinations 7 to 11 had a Kappa coefficient larger than 0.56 and an overall accuracy of 66%–70% when assessed by field point data. The RF model had the highest Kappa coefficient of 0.65 and the highest overall accuracy of 70% when using variable Combination 7. The DT model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 53%–55% and 65%–72% for field point data and VMC assessments, respectively. The DT model had the highest Kappa coefficient of 0.54 and overall accuracy of 72% when using variable Combination 7. The DT model had a larger Kappa coefficient and greater overall accuracy than the RF model when assessed by the VMC (Table 5, Fig. 3).

Unit III modeling and assessment

Only the RF model could simulate vegetation distribution in unit III. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4 and an overall accuracy of 55%–58% assessed by field point data. The RF model using variable Combination 7 had the highest Kappa coefficient of 0.57 (the only model with a Kappa coefficient larger than 0.56) and the highest overall accuracy of 58% assessed by field point


Table 4 Model assessment of vegetation groups by field point data and VMC. Variable combinations were shown in Table 3.

Variable      Decision tree           Random forest           Support vector machine  Maximum likelihood classification
combinations  Point data  VMC         Point data  VMC         Point data  VMC         Point data  VMC
              OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC
1             34   0.18   55   0.22   37   0.24   32   0.09   36   0.21   53   0.21   23   0.08   11   0.02
2             38   0.20   52   0.23   39   0.27   37   0.13   35   0.20   55   0.24   18   0.07   9    0.03
3             45   0.31   54   0.26   47   0.36   45   0.21   41   0.27   54   0.27   24   0.12   15   0.05
4             32   0.16   46   0.16   42   0.30   42   0.17   37   0.22   57   0.26   11   0.04   3    0.01
5             31   0.11   59   0.14   44   0.32   44   0.19   36   0.22   51   0.22   9    0.04   4    0.02
6             41   0.26   44   0.18   50   0.40   52   0.27   42   0.29   54   0.27   13   0.08   4    0.03
7             54   0.45   57   0.34   72   0.66   55   0.35
8             55   0.46   56   0.35   69   0.63   56   0.37
9             55   0.46   53   0.34   68   0.61   57   0.38
10            55   0.46   53   0.33   69   0.63   57   0.38
11            56   0.46   56   0.36   68   0.62   57   0.38

Notes: VMC, the Vegetation Map of the People's Republic of China.
the kappa coefficient larger than 0.56; the kappa coefficient larger than 0.4 and less than 0.56.
OA, Overall accuracy; KC, Kappa coefficient.


Figure 2 The modeling vegetation map of vegetation groups with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-2

data. The Kappa coefficients in all models were less than 0.4 when assessed by the VMC (Table 6, Fig. 4).

The abbreviations are the same as in Table 4.

Important variables

For the RF model, eight of the top 10 most important variables were the same across the different vegetation units: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio and NDVI of winter vegetation index, brightness index and NDVI of summer vegetation index). For the DT model, nine of the top 10 most important variables were the same across the different vegetation units: four climate variables (annual mean temperature, mean diurnal range, precipitation of the driest month, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio of winter vegetation index, brightness index of summer vegetation index, summer surface albedo of band 1, winter surface albedo of band 6) (Table 7).
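The counts of shared variables can be checked directly from the rankings in Table 7 by intersecting the top-10 sets of the three vegetation units. The sketch below does this with plain Python sets. The shortened variable names are introduced here for illustration, and the tied tenth entry of the RF model for formations and sub-formations (summer EVI and MR) is entered as two variables, which is an assumption about how to treat the tie.

```python
# Top-10 variables per unit, transcribed from Table 7 (names shortened).
rf_top10 = {
    "vegetation groups": {
        "annual mean temperature", "slope", "mean diurnal range",
        "annual precipitation", "summer BI", "winter NDVI", "summer EVI",
        "winter MR", "summer NDVI", "summer MR",
    },
    "vegetation types": {
        "annual mean temperature", "slope", "mean diurnal range",
        "annual precipitation", "summer BI", "winter NDVI", "winter MR",
        "summer WI", "precipitation of driest month", "summer NDVI",
    },
    "formations and sub-formations": {
        "annual mean temperature", "annual precipitation",
        "mean diurnal range", "slope", "precipitation of driest month",
        "summer BI", "summer NDVI", "winter NDVI", "winter MR",
        "summer EVI", "summer MR",  # tied tenth entry, kept as two variables
    },
}

dt_groups_and_types = {
    "annual mean temperature", "annual precipitation", "slope", "winter MR",
    "mean diurnal range", "summer albedo band 1", "summer BI",
    "precipitation of driest month", "summer EVI", "winter albedo band 6",
}
dt_top10 = {
    "vegetation groups": dt_groups_and_types,
    "vegetation types": dt_groups_and_types,  # same ten variables, new order
    "formations and sub-formations": {
        "annual mean temperature", "annual precipitation", "slope",
        "mean diurnal range", "precipitation of driest month", "winter MR",
        "summer albedo band 1", "summer BI", "summer WI",
        "winter albedo band 6",
    },
}

# Variables that appear in the top 10 of all three units.
shared_rf = set.intersection(*rf_top10.values())
shared_dt = set.intersection(*dt_top10.values())

print(len(shared_rf), sorted(shared_rf))
print(len(shared_dt), sorted(shared_dt))
```

The intersections contain eight variables for the RF model and nine for the DT model, matching the counts stated above.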

DISCUSSION

Vegetation classification units

Vegetation classification is an important and complex system with multiple levels. Higher level classification methods not only accurately classify vegetation, but they can also describe


Table 5 Model assessment of vegetation types by field point data and VMC. Variable combinations were shown in Table 3.

Variable      Decision tree           Random forest           Support vector machine  Maximum likelihood classification
combinations  Point data  VMC         Point data  VMC         Point data  VMC         Point data  VMC
              OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC
1             42   0.24   63   0.33   32   0.22   23   0.09   32   0.18   40   0.18   6    0.02   7    0.00
2             44   0.27   58   0.31   34   0.23   30   0.14   31   0.18   44   0.24   5    0.02   14   0.00
3             43   0.30   58   0.35   44   0.34   38   0.22   37   0.26   43   0.25   9    0.05   13   0.00
4             36   0.20   47   0.20   39   0.29   31   0.15   32   0.19   43   0.21   13   0.07   6    0.02
5             32   0.14   59   0.23   41   0.31   36   0.19   34   0.22   43   0.22   6    0.03   6    0.03
6             36   0.23   45   0.24   47   0.38   44   0.27   40   0.29   43   0.25   14   0.09   21   0.06
7             55   0.46   72   0.54   70   0.65   54   0.41
8             53   0.44   68   0.52   68   0.63   55   0.43
9             54   0.45   65   0.49   66   0.60   55   0.43
10            54   0.45   65   0.49   68   0.63   55   0.43
11            53   0.44   68   0.52   67   0.62   55   0.43

Notes: VMC, the Vegetation Map of the People's Republic of China.
the kappa coefficient larger than 0.56; the kappa coefficient larger than 0.4 and less than 0.56.
OA, Overall accuracy; KC, Kappa coefficient.


Figure 3 The modeling vegetation map of vegetation types with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-3

ecosystem diversity, even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions that are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as plant sets sharing similar perturbation response effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine


Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations were shown in Table 3.

Variable      Decision tree           Random forest           Support vector machine  Maximum likelihood classification
combinations  Point data  VMC         Point data  VMC         Point data  VMC         Point data  VMC
              OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC     OA   KC
1             23   0.14   19   0.08   20   0.18   5    0.02   11   0.09   6    0.03   8    0.06   8    0.04
2             22   -0.04  49   0.04   19   0.17   6    0.03   13   0.11   7    0.04   8    0.06   13   0.05
3             26   0.14   45   0.23   29   0.27   9    0.07   21   0.19   10   0.07   12   0.09   13   0.07
4             30   0.20   30   0.04   22   0.20   7    0.04   16   0.14   6    0.03   9    0.07   8    0.04
5             33   0.01   67   0.00   22   0.20   7    0.04   15   0.13   5    0.03   11   0.09   10   0.04
6             26   0.15   22   0.02   31   0.30   11   0.08   21   0.19   8    0.06   12   0.09   15   0.08
7             33   0.20   52   0.27   58   0.57   23   0.20
8             27   0.17   34   0.18   55   0.54   23   0.20
9             25   0.15   22   0.15   55   0.53   22   0.20
10            30   0.17   41   0.22   56   0.55   23   0.21
11            31   0.20   41   0.22   56   0.55   23   0.20

Notes: VMC, the Vegetation Map of the People's Republic of China.
the kappa coefficient larger than 0.56; the kappa coefficient larger than 0.4 and less than 0.56.
OA, Overall accuracy; KC, Kappa coefficient.


Figure 4 The modeling vegetation map of formations and sub-formations with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-4

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracy of the vegetation distribution simulations in units I and II was similar to each other and higher than unit III's simulation (Tables 4–6).

Different model performances

We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same


Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices were shown in Table 2.

Vegetation groups
Rank  Decision tree                                              Random forest
      Important variables               Standardized importance  Important variables                   Normalized importance
1     Annual mean temperature           1.00                     Annual mean temperature               3.68
2     Annual precipitation              0.88                     Slope                                 2.94
3     Slope                             0.80                     Mean diurnal range                    2.60
4     Winter vegetation index MR        0.36                     Annual precipitation                  2.38
5     Mean diurnal range                0.33                     Summer vegetation index BI            1.88
6     Summer surface albedo of band 1   0.29                     Winter vegetation index NDVI          1.37
7     Summer vegetation index BI        0.28                     Summer vegetation index EVI           1.36
8     Precipitation of driest month     0.25                     Winter vegetation index MR            1.30
9     Summer vegetation index EVI       0.23                     Summer vegetation index NDVI          1.22
10    Winter surface albedo of band 6   0.19                     Summer vegetation index MR            1.12

Vegetation types
Rank  Decision tree                                              Random forest
      Important variables               Standardized importance  Important variables                   Normalized importance
1     Annual mean temperature           1.00                     Annual mean temperature               3.51
2     Slope                             0.83                     Slope                                 3.35
3     Annual precipitation              0.51                     Mean diurnal range                    3.06
4     Winter vegetation index MR        0.30                     Annual precipitation                  2.8
5     Mean diurnal range                0.28                     Summer vegetation index BI            1.84
6     Summer vegetation index EVI       0.22                     Winter vegetation index NDVI          1.61
7     Precipitation of driest month     0.21                     Winter vegetation index MR            1.45
8     Summer vegetation index BI        0.20                     Summer vegetation index WI            1.31
9     Summer surface albedo of band 1   0.19                     Precipitation of driest month         1.24
10    Winter surface albedo of band 6   0.14                     Summer vegetation index NDVI          1.22

Formations and sub-formations
Rank  Decision tree                                              Random forest
      Important variables               Standardized importance  Important variables                   Normalized importance
1     Annual mean temperature           1.00                     Annual mean temperature               4.16
2     Annual precipitation              0.86                     Annual precipitation                  3.28
3     Slope                             0.63                     Mean diurnal range                    3.25
4     Mean diurnal range                0.52                     Slope                                 2.24
5     Precipitation of driest month     0.52                     Precipitation of driest month         2.16
6     Winter vegetation index MR        0.4                      Summer vegetation index BI            1.83
7     Summer surface albedo of band 1   0.32                     Summer vegetation index NDVI          1.7
8     Summer vegetation index BI        0.32                     Winter vegetation index NDVI          1.61
9     Summer vegetation index WI        0.31                     Winter vegetation index MR            1.47
10    Winter surface albedo of band 6   0.28                     Summer vegetation indices EVI and MR  1.32


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogeneous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010) and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables. In our study, the SVM and MLC models only output simulation results for variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).

Important variables in vegetation classification models

Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models (>80% for most forest types). Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors such as soil moisture, pH, and nutrients should be considered. Other factors such as actual light, disturbance, biotic interactions, land use, and bioclimatic information could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy

In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and to the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region’s vegetation no longer coincides with the VMC’s assessment.

CONCLUSIONS

Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (Mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS

We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding

This work was funded by the National Key R&D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures

The following grant information was disclosed by the authors:
National Key R&D Program of China: 2018YFC0506903.

Competing Interests

The authors declare there are no competing interests.

Author Contributions

• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0

REFERENCES

Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.
Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.
Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: Dr W. Junk Publishers.
Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.

Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.
Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.
Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the ‘sky islands’ of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.
Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.
Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People’s Republic of China (1:1,000,000). Beijing: Geological Publishing House.
Cohen WB, Goward SN. 2004. Landsat’s role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.
Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.
De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.
Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.
Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.
Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.
Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.
Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.
Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.

Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.
Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.
Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.
Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.
Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.
Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.
Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.
Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.
Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.
Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.
Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.
Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.
Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.
Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.
Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.

Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.
Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.
van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box: a toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.
Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.
Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.
Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.
Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.
Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.
Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.
Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.
Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.
Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.
Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.
Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.

Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box, Version 2.0. Available at https://www.enmap.org.
Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.
Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.
Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.
Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.
Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.
Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.
Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.
Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.
Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.
Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.
Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.

Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.
Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.
Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.
Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).
Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.
Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.
Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.


and easy to understand and implement. It uses classification or regression algorithms to generate classification rules and then visualizes those rules into simple tree graphics (Hastie, Tibshirani & Friedman, 2009; Zhou et al., 2016). The DT model calculates the most significant variables contributing to the model (Deng, 2010). We used a DT with five layers, 40 samples in the smallest parent node, and 10 samples in the smallest child node.

The RF model is an ensemble method that has been applied in risk assessment and species distribution modeling studies (Cutler et al., 2007; Zhang & Dong, 2017). The RF model creates and combines different DTs to produce considerably more accurate classifications that are robust to noise and overtraining (Burai et al., 2015; Cutler et al., 2007; Gislason, Benediktsson & Sveinsson, 2006). The RF model also calculates the most significant variables that contribute to the model (Cutler et al., 2007). Running an RF model requires defined parameters, including tree number, number of randomly selected features, and node impurity function. We generated the RF model in EnMAP-Box, a license-free and platform-independent software interface designed to process hyperspectral remote sensing data, which was developed by the Humboldt University of Berlin. EnMAP-Box has built-in applications for processing hyperspectral data, such as SVM and RF for the classification of image data (Held et al., 2014). We used the default settings in EnMAP-Box with 100 trees. The number of randomly selected features was equal to the square root of the number of all features, and we used the Gini coefficient for the node impurity function (Rabe et al., 2014; Ma, Gao & Gu, 2019; van der Linden et al., 2015; Zhou et al., 2016).
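As a minimal sketch of the RF settings described above (100 trees, a number of randomly selected features equal to the square root of the number of all features, and the Gini coefficient as the node impurity function), the two quantities can be written in plain Python. This is illustrative only and is not the EnMAP-Box implementation; the function names, the rounding down of the square root, and the example class labels are assumptions.

```python
import math
from collections import Counter

N_TREES = 100  # tree number stated in the text


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate split into two child nodes."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def features_per_split(n_features):
    """Square-root rule; rounding down is an assumption of this sketch."""
    return max(1, int(math.sqrt(n_features)))


# Hypothetical labels at a node: a split that separates the classes
# perfectly has zero weighted impurity.
node = ["forest", "forest", "shrub", "shrub"]
print(gini_impurity(node))
print(split_impurity(["forest", "forest"], ["shrub", "shrub"]))
print(features_per_split(16))
```

Each tree in the ensemble would choose, at every node, the split with the lowest weighted impurity among the randomly selected features.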

The MLC model is one of the most commonly used supervised image classification methods. MLC’s classification rules use the statistics of the Gaussian probability density function to assign each pixel to the class with the highest probability. Although the MLC method usually generates similar or more accurate classifications than other methods, it is not applicable when there are fewer training samples than input predictors (Burai et al., 2015; Zhou et al., 2016).
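The MLC rule described above can be sketched as follows. This is a simplified illustration, not the software used in the study: it assumes that the input variables are independent within each class (a diagonal covariance matrix), whereas a full MLC estimates the complete covariance matrix per class, which is one reason the method needs more training samples than predictors. The class names, pixel values, and function names are hypothetical.

```python
import math


def fit_class_stats(samples):
    """Estimate the per-class mean and variance of every input variable.

    samples maps a class label to a list of training vectors.
    """
    stats = {}
    for label, pixels in samples.items():
        n = len(pixels)
        n_vars = len(pixels[0])
        means = [sum(p[j] for p in pixels) / n for j in range(n_vars)]
        variances = [
            sum((p[j] - means[j]) ** 2 for p in pixels) / (n - 1)
            for j in range(n_vars)
        ]
        stats[label] = (means, variances)
    return stats


def log_likelihood(pixel, means, variances):
    """Log of the Gaussian density, assuming independent variables."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
        for x, m, v in zip(pixel, means, variances)
    )


def classify(pixel, stats):
    """Assign the pixel to the class with the highest likelihood."""
    return max(stats, key=lambda label: log_likelihood(pixel, *stats[label]))


# Hypothetical two-variable training data for two classes.
training = {
    "forest": [(0.10, 0.60), (0.12, 0.65), (0.09, 0.58)],
    "grass": [(0.30, 0.40), (0.28, 0.42), (0.33, 0.37)],
}
stats = fit_class_stats(training)
print(classify((0.11, 0.62), stats))
```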

The SVM model is a supervised machine learning model used for classification and regression. It is a complex and widely used method that can output more accurate predictions than other methods (Burai et al., 2015). The SVM model searches for an optimal plane in a multidimensional space to divide all sample elements into two categories, making the distance between the closest points in the two classes as large as possible (Kabacoff, 2016). Running an SVM model requires a defined kernel parameter g and regularization parameter c. In this study, we generated the SVM model in EnMAP-Box. We applied the default EnMAP-Box settings for the SVM model, in which parameter g ranged from 0.01 to 1,000 and parameter c ranged from 0.1 to 1,000. Parameters g and c were tested using a grid search with internal performance estimation, and we used those with the best performance for data training (Lin et al., 2014; van der Linden et al., 2014; van der Linden et al., 2015).
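The grid search over g and c can be illustrated with a short sketch. The parameter ranges come from the text, but the spacing of the grid (powers of ten) is an assumption, and `toy_score` is a hypothetical stand-in for the internal performance estimate; it does not train an SVM.

```python
import itertools
import math


def log_grid(start_exp, stop_exp):
    """Powers of ten from 10**start_exp to 10**stop_exp, inclusive."""
    return [10.0 ** e for e in range(start_exp, stop_exp + 1)]


G_VALUES = log_grid(-2, 3)  # 0.01 to 1,000 (range from the text)
C_VALUES = log_grid(-1, 3)  # 0.1 to 1,000 (range from the text)


def grid_search(score_fn, g_values, c_values):
    """Return (best_score, g, c) over every (g, c) pair in the grid."""
    best = None
    for g, c in itertools.product(g_values, c_values):
        score = score_fn(g, c)
        if best is None or score > best[0]:
            best = (score, g, c)
    return best


def toy_score(g, c):
    """Hypothetical score surface that peaks at g = 1 and c = 10."""
    return -(math.log10(g) ** 2 + (math.log10(c) - 1.0) ** 2)


best_score, best_g, best_c = grid_search(toy_score, G_VALUES, C_VALUES)
print(best_g, best_c)
```

In practice the score function would be replaced by a cross-validated accuracy estimate for an SVM trained with the given g and c.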

We generated the predicted vegetation maps of the three classification units using the DT, RF, MLC, and SVM methods with a resolution of 500 m. We selected all 11 variable combinations as the input variables for each method. The DT and RF method results indicated which variables were most important for vegetation discrimination.


Model assessment

We used the VMC and a total of 974 vegetation points to assess the overall accuracy and Kappa coefficient of every predicted vegetation map. Kappa coefficient values ranging from 0.4 to 0.55 indicated moderate agreement, from 0.56 to 0.8 indicated substantial agreement, and from 0.81 to 1 indicated almost perfect agreement (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016). When the Kappa coefficient value was greater than 0.4, the assessed predicted map was considered acceptable.

RESULTS

Unit I modeling and assessment

The RF model’s results were better than the results of the DT, MLC, and SVM models (Table 4). The RF model had a Kappa coefficient larger than 0.4 when using variable Combinations 6 to 11 assessed by field point data, with an overall accuracy of 50% to 72%. The RF model had a Kappa coefficient larger than 0.56 when using variable Combinations 7 to 11 assessed by field data, with an overall accuracy of 68% to 72%. The RF model had the highest Kappa coefficient of 0.66 and the highest overall accuracy of 72% when using variable Combination 7. The DT model had a Kappa coefficient larger than 0.4 when using variable Combinations 7 to 11 assessed by field point data, with an overall accuracy of 54% to 56%. The DT model had no Kappa coefficient larger than 0.56 when using all variable combinations. After VMC assessment, we found the highest Kappa coefficient was 0.38, with an overall accuracy of 57%, in the RF model using variable Combinations 9 to 11 (Table 4; Fig. 2).

Unit II modeling and assessment

The RF model results were better than the results of the other three models. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 66–70% and 54–55% for field point data and VMC assessments, respectively. The RF model using Combinations 7 to 11 had a Kappa coefficient larger than 0.56 and an overall accuracy of 66–70% when assessed by field point data. The RF model had the highest Kappa coefficient of 0.65 and the highest overall accuracy of 70% when using variable Combination 7. The DT model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 53–55% and 65–72% for field point data and VMC assessments, respectively. The DT model had the highest Kappa coefficient of 0.54 and overall accuracy of 72% when using variable Combination 7. When assessed by the VMC, the DT model had a larger Kappa coefficient and greater overall accuracy than the RF model (Table 5; Fig. 3).

Unit III modeling and assessment

Only the RF model could simulate vegetation distribution in unit III. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4 and an overall accuracy of 55–58% assessed by field point data. The RF model using variable Combination 7 had the highest Kappa coefficient of 0.57 (the only model with a Kappa coefficient larger than 0.56) and the highest overall accuracy of 58% assessed by field point

Table 4 Model assessment of vegetation groups by field point data and VMC. Variable combinations are shown in Table 3.

Variable      Decision tree           Random forest           Support vector machine  Maximum likelihood classification
combination   Point data  VMC         Point data  VMC         Point data  VMC         Point data  VMC
              OA  KC      OA  KC      OA  KC      OA  KC      OA  KC      OA  KC      OA  KC      OA  KC
1             34  0.18    55  0.22    37  0.24    32  0.09    36  0.21    53  0.21    23  0.08    11  0.02
2             38  0.20    52  0.23    39  0.27    37  0.13    35  0.20    55  0.24    18  0.07    9   0.03
3             45  0.31    54  0.26    47  0.36    45  0.21    41  0.27    54  0.27    24  0.12    15  0.05
4             32  0.16    46  0.16    42  0.30    42  0.17    37  0.22    57  0.26    11  0.04    3   0.01
5             31  0.11    59  0.14    44  0.32    44  0.19    36  0.22    51  0.22    9   0.04    4   0.02
6             41  0.26    44  0.18    50  0.40*   52  0.27    42  0.29    54  0.27    13  0.08    4   0.03
7             54  0.45*   57  0.34    72  0.66**  55  0.35    -   -       -   -       -   -       -   -
8             55  0.46*   56  0.35    69  0.63**  56  0.37    -   -       -   -       -   -       -   -
9             55  0.46*   53  0.34    68  0.61**  57  0.38    -   -       -   -       -   -       -   -
10            55  0.46*   53  0.33    69  0.63**  57  0.38    -   -       -   -       -   -       -   -
11            56  0.46*   56  0.36    68  0.62**  57  0.38    -   -       -   -       -   -       -   -

Notes: VMC, the Vegetation Map of the People’s Republic of China; OA, overall accuracy (%); KC, Kappa coefficient; -, no value given.
** Kappa coefficient larger than 0.56.
* Kappa coefficient larger than 0.4 and less than 0.56.

Figure 2 Modeled vegetation maps of vegetation groups with the highest accuracy for the four methods, and the VMC, in the Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People’s Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-2

data. The Kappa coefficients in all models were less than 0.4 when assessed by the VMC (Table 6; Fig. 4).
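The two assessment measures reported throughout the results, overall accuracy and the Kappa coefficient, can both be derived from a confusion matrix. The sketch below is a generic illustration with a hypothetical two-class matrix (rows taken as reference classes and columns as predicted classes, which is an assumption); the agreement labels follow the thresholds given in the model assessment section, with values between 0.55 and 0.56 treated as moderate.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa_coefficient(matrix):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Agreement classes used in this study (after Landis & Koch, 1977)."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.56:
        return "substantial"
    if kappa >= 0.4:
        return "moderate"
    return "below the acceptance threshold"


# Hypothetical confusion matrix for two vegetation classes.
matrix = [
    [40, 10],
    [5, 45],
]
print(overall_accuracy(matrix))
print(kappa_coefficient(matrix))
print(agreement_label(kappa_coefficient(matrix)))
```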

The abbreviations are the same as in Table 4.

Important variables

For the RF model, eight of the top 10 most important variables were the same across the different vegetation units: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio and NDVI of winter vegetation index, brightness index and NDVI of summer vegetation index). For the DT model, nine of the top 10 most important variables were the same across the different vegetation units: four climate variables (annual mean temperature, mean diurnal range, precipitation of the driest month, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio of winter vegetation index, brightness index of summer vegetation index, summer surface albedo of band 1, winter surface albedo of band 6) (Table 7).

DISCUSSION

Vegetation classification units

Vegetation classification is an important and complex system with multiple levels. Higher level classification methods not only accurately classify vegetation, but they can also describe

Table 5 Model assessment of vegetation types by field point data and VMC. Variable combinations are shown in Table 3.

Variable      Decision tree           Random forest           Support vector machine  Maximum likelihood classification
combination   Point data  VMC         Point data  VMC         Point data  VMC         Point data  VMC
              OA  KC      OA  KC      OA  KC      OA  KC      OA  KC      OA  KC      OA  KC      OA  KC
1             42  0.24    63  0.33    32  0.22    23  0.09    32  0.18    40  0.18    6   0.02    7   0.00
2             44  0.27    58  0.31    34  0.23    30  0.14    31  0.18    44  0.24    5   0.02    14  0.00
3             43  0.30    58  0.35    44  0.34    38  0.22    37  0.26    43  0.25    9   0.05    13  0.00
4             36  0.20    47  0.20    39  0.29    31  0.15    32  0.19    43  0.21    13  0.07    6   0.02
5             32  0.14    59  0.23    41  0.31    36  0.19    34  0.22    43  0.22    6   0.03    6   0.03
6             36  0.23    45  0.24    47  0.38    44  0.27    40  0.29    43  0.25    14  0.09    21  0.06
7             55  0.46*   72  0.54*   70  0.65**  54  0.41*   -   -       -   -       -   -       -   -
8             53  0.44*   68  0.52*   68  0.63**  55  0.43*   -   -       -   -       -   -       -   -
9             54  0.45*   65  0.49*   66  0.60**  55  0.43*   -   -       -   -       -   -       -   -
10            54  0.45*   65  0.49*   68  0.63**  55  0.43*   -   -       -   -       -   -       -   -
11            53  0.44*   68  0.52*   67  0.62**  55  0.43*   -   -       -   -       -   -       -   -

Notes: VMC, the Vegetation Map of the People’s Republic of China; OA, overall accuracy (%); KC, Kappa coefficient; -, no value given.
** Kappa coefficient larger than 0.56.
* Kappa coefficient larger than 0.4 and less than 0.56.

Figure 3 Modeled vegetation maps of vegetation types with the highest accuracy for the four methods, and the VMC, in the Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People’s Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-3

ecosystem diversity, even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions that are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as sets of plants that share similar responses to perturbation and similar effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine

Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations are shown in Table 3.

Variable      Decision tree           Random forest           Support vector machine  Maximum likelihood classification
combination   Point data  VMC         Point data  VMC         Point data  VMC         Point data  VMC
              OA  KC      OA  KC      OA  KC      OA  KC      OA  KC      OA  KC      OA  KC      OA  KC
1             23  0.14    19  0.08    20  0.18    5   0.02    11  0.09    6   0.03    8   0.06    8   0.04
2             22  -0.04   49  0.04    19  0.17    6   0.03    13  0.11    7   0.04    8   0.06    13  0.05
3             26  0.14    45  0.23    29  0.27    9   0.07    21  0.19    10  0.07    12  0.09    13  0.07
4             30  0.20    30  0.04    22  0.20    7   0.04    16  0.14    6   0.03    9   0.07    8   0.04
5             33  0.01    67  0.00    22  0.20    7   0.04    15  0.13    5   0.03    11  0.09    10  0.04
6             26  0.15    22  0.02    31  0.30    11  0.08    21  0.19    8   0.06    12  0.09    15  0.08
7             33  0.20    52  0.27    58  0.57**  23  0.20    -   -       -   -       -   -       -   -
8             27  0.17    34  0.18    55  0.54*   23  0.20    -   -       -   -       -   -       -   -
9             25  0.15    22  0.15    55  0.53*   22  0.20    -   -       -   -       -   -       -   -
10            30  0.17    41  0.22    56  0.55*   23  0.21    -   -       -   -       -   -       -   -
11            31  0.20    41  0.22    56  0.55*   23  0.20    -   -       -   -       -   -       -   -

Notes: VMC, the Vegetation Map of the People’s Republic of China; OA, overall accuracy (%); KC, Kappa coefficient; -, no value given.
** Kappa coefficient larger than 0.56.
* Kappa coefficient larger than 0.4 and less than 0.56.

Figure 4 Modeled vegetation maps of formations and sub-formations with the highest accuracy for the four methods, and the VMC, in the Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People’s Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-4

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracies of the vegetation distribution simulations in units I and II were similar to each other and higher than that of unit III (Tables 4–6).

Different model performances
We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same


Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices were shown in Table 2.

Vegetation groups
Rank  Important variable (DT)           Standardized importance  Important variable (RF)               Normalized importance
1     Annual mean temperature           1.00                     Annual mean temperature               3.68
2     Annual precipitation              0.88                     Slope                                 2.94
3     Slope                             0.80                     Mean diurnal range                    2.60
4     Winter vegetation index MR        0.36                     Annual precipitation                  2.38
5     Mean diurnal range                0.33                     Summer vegetation index BI            1.88
6     Summer surface albedo of band 1   0.29                     Winter vegetation index NDVI          1.37
7     Summer vegetation index BI        0.28                     Summer vegetation index EVI           1.36
8     Precipitation of driest month     0.25                     Winter vegetation index MR            1.30
9     Summer vegetation index EVI       0.23                     Summer vegetation index NDVI          1.22
10    Winter surface albedo of band 6   0.19                     Summer vegetation index MR            1.12

Vegetation types
Rank  Important variable (DT)           Standardized importance  Important variable (RF)               Normalized importance
1     Annual mean temperature           1.00                     Annual mean temperature               3.51
2     Slope                             0.83                     Slope                                 3.35
3     Annual precipitation              0.51                     Mean diurnal range                    3.06
4     Winter vegetation index MR        0.30                     Annual precipitation                  2.8
5     Mean diurnal range                0.28                     Summer vegetation index BI            1.84
6     Summer vegetation index EVI       0.22                     Winter vegetation index NDVI          1.61
7     Precipitation of driest month     0.21                     Winter vegetation index MR            1.45
8     Summer vegetation index BI        0.20                     Summer vegetation index WI            1.31
9     Summer surface albedo of band 1   0.19                     Precipitation of driest month         1.24
10    Winter surface albedo of band 6   0.14                     Summer vegetation index NDVI          1.22

Formations and sub-formations
Rank  Important variable (DT)           Standardized importance  Important variable (RF)               Normalized importance
1     Annual mean temperature           1.00                     Annual mean temperature               4.16
2     Annual precipitation              0.86                     Annual precipitation                  3.28
3     Slope                             0.63                     Mean diurnal range                    3.25
4     Mean diurnal range                0.52                     Slope                                 2.24
5     Precipitation of driest month     0.52                     Precipitation of driest month         2.16
6     Winter vegetation index MR        0.4                      Summer vegetation index BI            1.83
7     Summer surface albedo of band 1   0.32                     Summer vegetation index NDVI          1.7
8     Summer vegetation index BI        0.32                     Winter vegetation index NDVI          1.61
9     Summer vegetation index WI        0.31                     Winter vegetation index MR            1.47
10    Winter surface albedo of band 6   0.28                     Summer vegetation indices EVI and MR  1.32

DT, decision tree; RF, random forest.


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogenous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010) and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables. In our study, the SVM and MLC models produced output only for variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
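The two sources of randomness that the discussion attributes to the RF model (trees built on resampled data subsets, and splits chosen from a random subset of predictor variables) can be illustrated with a minimal sketch. This is not the software or configuration used in the study: each "tree" here is a single split (a stump), and the function names, the toy sites, and the parameters `n_trees`, `n_features`, and `seed` are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label (ties resolved by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """Find the single split (feature, threshold) with the fewest errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: fall back to a constant prediction
        m = majority(labels)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Bagging plus random feature subsets, the two ingredients named in the text."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]        # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)   # random predictors
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_stump(s, row) for s in forest])


# Hypothetical toy sites: [annual mean temperature (deg C), slope (deg)].
rows = [[4, 25], [5, 30], [6, 28], [7, 22], [12, 3], [13, 5], [14, 2], [15, 6]]
labels = ["forest"] * 4 + ["grassland"] * 4
forest = fit_forest(rows, labels)
print(predict_forest(forest, [4, 30]), predict_forest(forest, [15, 2]))
```

A single stump fitted to one resample can be unstable, which mirrors the instability the text reports for the DT model; voting across many resampled trees is what damps that variance in this sketch.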

Important variables in vegetation classification models
Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important


variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.
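As a small illustration of a spectral variable that reflects vegetation information, the widely used normalized difference vegetation index can be computed from near-infrared and red reflectance. The sketch below uses the standard NDVI definition; the study's own index definitions are given in its Table 2, and the reflectance values and the zero-denominator rule here are assumptions for illustration only.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    total = nir + red
    if total == 0:
        # Assumption: report 0 for pixels with no signal instead of dividing by zero.
        return 0.0
    return (nir - red) / total


# Hypothetical reflectances: dense canopy reflects strongly in NIR, weakly in red.
print(round(ndvi(0.5, 0.1), 3))    # vegetated pixel, value well above 0
print(round(ndvi(0.25, 0.20), 3))  # sparsely vegetated pixel, value near 0
```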

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models (>80% for most forest types). Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors, such as soil moisture, pH, and nutrients, should be considered. Other factors, such as actual light, disturbance, biotic interactions, land use, and bioclimatic information, could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy
In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data


preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS
Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (Mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS
We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was funded by the National Key R&D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Grant Disclosures
The following grant information was disclosed by the authors:
National Key R&D Program of China: 2018YFC0506903.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0.

REFERENCES
Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.

Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, paphos area (cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.

Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: DR W. Junk Publishers.

Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.


Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.


Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.


Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box - A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.


Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box. Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.


Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000–2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.



Model assessment
We used the VMC and a total of 974 vegetation points to assess the overall accuracy and Kappa coefficient of every predicted vegetation map. Kappa coefficient values ranging from 0.4 to 0.55 indicated moderate agreement, from 0.56 to 0.8 indicated substantial agreement, and from 0.81 to 1 indicated almost perfect agreement (Landis & Koch, 1977; Weng & Zhou, 2006; Zhou et al., 2016). When the Kappa coefficient value was greater than 0.4, the assessed predicted map was considered acceptable.

RESULTS
Unit I modeling and assessment
The RF model's results were better than the results of the DT, MLC and SVM models (Table 4). The RF model had a Kappa coefficient larger than 0.4 when using variable Combinations 6 to 11 assessed by field point data, with an overall accuracy of 50% to 72%. The RF model had a Kappa coefficient larger than 0.56 when using variable Combinations 7 to 11 assessed by field data, with an overall accuracy of 68% to 72%. The RF model had the highest Kappa coefficient of 0.66 and the highest overall accuracy of 72% when using variable Combination 7. The DT model had a Kappa coefficient larger than 0.4 when using variable Combinations 7 to 11 assessed by field point data, with an overall accuracy of 54% to 56%. The DT model had no Kappa coefficient larger than 0.56 when using all variable combinations. After VMC assessment, we found the highest Kappa coefficient was 0.38, with an overall accuracy of 57%, in the RF model using variable Combinations 9 to 11 (Table 4, Fig. 2).

Unit II modeling and assessment
The RF model results were better than the results of the other three models. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 66%–70% and 54%–55% for field point data and VMC assessments, respectively. The RF model using Combinations 7 to 11 had a Kappa coefficient larger than 0.56 and an overall accuracy of 66%–70% when assessed by field point data. The RF model had the highest Kappa coefficient of 0.65 and the highest overall accuracy of 70% when using variable Combination 7. The DT model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4, with overall accuracies of 53%–55% and 65%–72% for field point data and VMC assessments, respectively. The DT model had the highest Kappa coefficient of 0.54 and overall accuracy of 72% when using variable Combination 7. The DT model had a larger Kappa coefficient and greater overall accuracy than the RF model when assessed by the VMC (Table 5, Fig. 3).

Unit III modeling and assessment
Only the RF model could simulate vegetation distribution in unit III. The RF model using variable Combinations 7 to 11 had a Kappa coefficient larger than 0.4 and an overall accuracy of 55%–58% assessed by field point data. The RF model using variable Combination 7 had the highest Kappa coefficient of 0.57 (the only model with a Kappa coefficient larger than 0.56) and the highest overall accuracy of 58% assessed by field point


Table 4 Model assessment of vegetation groups by field point data and VMC. Variable combinations were shown in Table 3.

             Decision tree             Random forest             Support vector machine    Maximum likelihood classification
             Point data   VMC          Point data   VMC          Point data   VMC          Point data   VMC
Combination   OA     KC    OA     KC    OA     KC    OA     KC    OA     KC    OA     KC    OA     KC    OA     KC
1             34   0.18    55   0.22    37   0.24    32   0.09    36   0.21    53   0.21    23   0.08    11   0.02
2             38   0.20    52   0.23    39   0.27    37   0.13    35   0.20    55   0.24    18   0.07     9   0.03
3             45   0.31    54   0.26    47   0.36    45   0.21    41   0.27    54   0.27    24   0.12    15   0.05
4             32   0.16    46   0.16    42   0.30    42   0.17    37   0.22    57   0.26    11   0.04     3   0.01
5             31   0.11    59   0.14    44   0.32    44   0.19    36   0.22    51   0.22     9   0.04     4   0.02
6             41   0.26    44   0.18    50   0.40    52   0.27    42   0.29    54   0.27    13   0.08     4   0.03
7             54   0.45    57   0.34    72   0.66    55   0.35
8             55   0.46    56   0.35    69   0.63    56   0.37
9             55   0.46    53   0.34    68   0.61    57   0.38
10            55   0.46    53   0.33    69   0.63    57   0.38
11            56   0.46    56   0.36    68   0.62    57   0.38

Notes.
VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient.
The table marks Kappa coefficients larger than 0.56 and Kappa coefficients larger than 0.4 and less than 0.56.


Figure 2 The modeling vegetation map of vegetation groups with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A); random forest model (B); support vector machine (C); maximum likelihood classification (D); the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-2

data. The Kappa coefficients in all models were less than 0.4 when assessed by the VMC (Table 6; Fig. 4).
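The two assessment measures reported throughout Tables 4–6 can be illustrated with a short calculation. The sketch below is not the authors' code: it assumes the standard definitions of overall accuracy and Cohen's Kappa coefficient, the confusion matrix is hypothetical, and the function names are chosen only for this example. The two thresholds (0.4 and 0.56) are the ones named in the table notes.

```python
from typing import List


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa_coefficient(matrix: List[List[int]]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (observed - expected) / (1 - expected)


def kappa_band(kappa: float) -> str:
    """Bands built from the thresholds in the notes to Tables 4-6."""
    if kappa > 0.56:
        return "larger than 0.56"
    if kappa > 0.4:
        return "larger than 0.4, up to 0.56"
    return "not larger than 0.4"


# Hypothetical 3-class confusion matrix (rows: reference, columns: predicted).
example = [
    [30, 5, 5],
    [4, 20, 6],
    [6, 5, 19],
]
oa = overall_accuracy(example)   # 69 of 100 samples on the diagonal
kc = kappa_coefficient(example)  # (0.69 - 0.34) / (1 - 0.34)
print(f"OA = {oa:.2f}, KC = {kc:.2f} ({kappa_band(kc)})")
```

In this hypothetical case an overall accuracy of 0.69 corresponds to a Kappa coefficient of about 0.53, which shows why the two measures are reported side by side: Kappa discounts the agreement expected by chance.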

The abbreviations are the same as in Table 4.

Important variables
For the RF model, eight of the top 10 most important variables were the same across the different vegetation units: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio and NDVI of winter vegetation index, brightness index and NDVI of summer vegetation index). For the DT model, nine of the top 10 most important variables were the same across the different vegetation units: four climate variables (annual mean temperature, mean diurnal range, precipitation of the driest month, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio of winter vegetation index, brightness index of summer vegetation index, summer surface albedo of band 1, winter surface albedo of band 6) (Table 7).
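The counts of shared variables can be checked directly from the top-10 lists in Table 7. The following sketch transcribes those lists and intersects them across the three classification units; the short labels and the function name are this example's own, and the tied tenth entry for the RF model in unit III ("EVI and MR") is entered as two variables, which is an assumption about how to read that table cell.

```python
# Top-10 variables per classification unit, transcribed from Table 7.
# Labels are shortened for this sketch (e.g. "summer BI" = summer vegetation index BI).
rf_top10 = {
    "vegetation groups": {
        "annual mean temperature", "slope", "mean diurnal range",
        "annual precipitation", "summer BI", "winter NDVI", "summer EVI",
        "winter MR", "summer NDVI", "summer MR",
    },
    "vegetation types": {
        "annual mean temperature", "slope", "mean diurnal range",
        "annual precipitation", "summer BI", "winter NDVI", "winter MR",
        "summer WI", "precipitation of driest month", "summer NDVI",
    },
    "formations and sub-formations": {
        "annual mean temperature", "annual precipitation", "mean diurnal range",
        "slope", "precipitation of driest month", "summer BI", "summer NDVI",
        "winter NDVI", "winter MR", "summer EVI", "summer MR",  # tied 10th entry
    },
}

dt_top10 = {
    "vegetation groups": {
        "annual mean temperature", "annual precipitation", "slope", "winter MR",
        "mean diurnal range", "summer albedo band 1", "summer BI",
        "precipitation of driest month", "summer EVI", "winter albedo band 6",
    },
    "vegetation types": {
        "annual mean temperature", "slope", "annual precipitation", "winter MR",
        "mean diurnal range", "summer EVI", "precipitation of driest month",
        "summer BI", "summer albedo band 1", "winter albedo band 6",
    },
    "formations and sub-formations": {
        "annual mean temperature", "annual precipitation", "slope",
        "mean diurnal range", "precipitation of driest month", "winter MR",
        "summer albedo band 1", "summer BI", "summer WI", "winter albedo band 6",
    },
}


def shared_variables(top10_by_unit):
    """Variables that appear in the top-10 list of every classification unit."""
    return set.intersection(*top10_by_unit.values())


shared_rf = shared_variables(rf_top10)
shared_dt = shared_variables(dt_top10)
print(len(shared_rf), sorted(shared_rf))
print(len(shared_dt), sorted(shared_dt))
```

With the lists as transcribed, the intersection contains eight variables for the RF model and nine for the DT model, in line with the counts stated in the text.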

DISCUSSION
Vegetation classification units
Vegetation classification is an important and complex system with multiple levels. Higher-level classification methods not only accurately classify vegetation, but they can also describe


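Table 5 Model assessment of vegetation types by field point data and VMC. Variable combinations are shown in Table 3.

| Variable combination | DT point OA | DT point KC | DT VMC OA | DT VMC KC | RF point OA | RF point KC | RF VMC OA | RF VMC KC | SVM point OA | SVM point KC | SVM VMC OA | SVM VMC KC | MLC point OA | MLC point KC | MLC VMC OA | MLC VMC KC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 42 | 0.24 | 63 | 0.33 | 32 | 0.22 | 23 | 0.09 | 32 | 0.18 | 40 | 0.18 | 6 | 0.02 | 7 | 0.00 |
| 2 | 44 | 0.27 | 58 | 0.31 | 34 | 0.23 | 30 | 0.14 | 31 | 0.18 | 44 | 0.24 | 5 | 0.02 | 14 | 0.00 |
| 3 | 43 | 0.30 | 58 | 0.35 | 44 | 0.34 | 38 | 0.22 | 37 | 0.26 | 43 | 0.25 | 9 | 0.05 | 13 | 0.00 |
| 4 | 36 | 0.20 | 47 | 0.20 | 39 | 0.29 | 31 | 0.15 | 32 | 0.19 | 43 | 0.21 | 13 | 0.07 | 6 | 0.02 |
| 5 | 32 | 0.14 | 59 | 0.23 | 41 | 0.31 | 36 | 0.19 | 34 | 0.22 | 43 | 0.22 | 6 | 0.03 | 6 | 0.03 |
| 6 | 36 | 0.23 | 45 | 0.24 | 47 | 0.38 | 44 | 0.27 | 40 | 0.29 | 43 | 0.25 | 14 | 0.09 | 21 | 0.06 |
| 7 | 55 | 0.46† | 72 | 0.54† | 70 | 0.65‡ | 54 | 0.41† |  |  |  |  |  |  |  |  |
| 8 | 53 | 0.44† | 68 | 0.52† | 68 | 0.63‡ | 55 | 0.43† |  |  |  |  |  |  |  |  |
| 9 | 54 | 0.45† | 65 | 0.49† | 66 | 0.60‡ | 55 | 0.43† |  |  |  |  |  |  |  |  |
| 10 | 54 | 0.45† | 65 | 0.49† | 68 | 0.63‡ | 55 | 0.43† |  |  |  |  |  |  |  |  |
| 11 | 53 | 0.44† | 68 | 0.52† | 67 | 0.62‡ | 55 | 0.43† |  |  |  |  |  |  |  |  |

Notes: DT, decision tree; RF, random forest; SVM, support vector machine; MLC, maximum likelihood classification; point, field point data; VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient.
‡ Kappa coefficient larger than 0.56.
† Kappa coefficient larger than 0.4 and less than 0.56.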


Figure 3 The modeling vegetation map of vegetation types with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A); random forest model (B); support vector machine (C); maximum likelihood classification (D); the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-3

ecosystem diversity, even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions that are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as plant sets sharing similar perturbation response effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine


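Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations are shown in Table 3.

| Variable combination | DT point OA | DT point KC | DT VMC OA | DT VMC KC | RF point OA | RF point KC | RF VMC OA | RF VMC KC | SVM point OA | SVM point KC | SVM VMC OA | SVM VMC KC | MLC point OA | MLC point KC | MLC VMC OA | MLC VMC KC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 23 | 0.14 | 19 | 0.08 | 20 | 0.18 | 5 | 0.02 | 11 | 0.09 | 6 | 0.03 | 8 | 0.06 | 8 | 0.04 |
| 2 | 22 | −0.04 | 49 | 0.04 | 19 | 0.17 | 6 | 0.03 | 13 | 0.11 | 7 | 0.04 | 8 | 0.06 | 13 | 0.05 |
| 3 | 26 | 0.14 | 45 | 0.23 | 29 | 0.27 | 9 | 0.07 | 21 | 0.19 | 10 | 0.07 | 12 | 0.09 | 13 | 0.07 |
| 4 | 30 | 0.20 | 30 | 0.04 | 22 | 0.20 | 7 | 0.04 | 16 | 0.14 | 6 | 0.03 | 9 | 0.07 | 8 | 0.04 |
| 5 | 33 | 0.01 | 67 | 0.00 | 22 | 0.20 | 7 | 0.04 | 15 | 0.13 | 5 | 0.03 | 11 | 0.09 | 10 | 0.04 |
| 6 | 26 | 0.15 | 22 | 0.02 | 31 | 0.30 | 11 | 0.08 | 21 | 0.19 | 8 | 0.06 | 12 | 0.09 | 15 | 0.08 |
| 7 | 33 | 0.20 | 52 | 0.27 | 58 | 0.57‡ | 23 | 0.20 |  |  |  |  |  |  |  |  |
| 8 | 27 | 0.17 | 34 | 0.18 | 55 | 0.54† | 23 | 0.20 |  |  |  |  |  |  |  |  |
| 9 | 25 | 0.15 | 22 | 0.15 | 55 | 0.53† | 22 | 0.20 |  |  |  |  |  |  |  |  |
| 10 | 30 | 0.17 | 41 | 0.22 | 56 | 0.55† | 23 | 0.21 |  |  |  |  |  |  |  |  |
| 11 | 31 | 0.20 | 41 | 0.22 | 56 | 0.55† | 23 | 0.20 |  |  |  |  |  |  |  |  |

Notes: DT, decision tree; RF, random forest; SVM, support vector machine; MLC, maximum likelihood classification; point, field point data; VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient.
‡ Kappa coefficient larger than 0.56.
† Kappa coefficient larger than 0.4 and less than 0.56.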


Figure 4 The modeling vegetation map of formations and sub-formations with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A); random forest model (B); support vector machine (C); maximum likelihood classification (D); the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-4

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracy of the vegetation distribution simulations in units I and II was similar to each other and higher than unit III's simulation (Tables 4–6).

Different model performances
We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same


Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices are shown in Table 2.

Vegetation groups

| Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 3.68 |
| 2 | Annual precipitation | 0.88 | Slope | 2.94 |
| 3 | Slope | 0.80 | Mean diurnal range | 2.60 |
| 4 | Winter vegetation index MR | 0.36 | Annual precipitation | 2.38 |
| 5 | Mean diurnal range | 0.33 | Summer vegetation index BI | 1.88 |
| 6 | Summer surface albedo of band 1 | 0.29 | Winter vegetation index NDVI | 1.37 |
| 7 | Summer vegetation index BI | 0.28 | Summer vegetation index EVI | 1.36 |
| 8 | Precipitation of driest month | 0.25 | Winter vegetation index MR | 1.30 |
| 9 | Summer vegetation index EVI | 0.23 | Summer vegetation index NDVI | 1.22 |
| 10 | Winter surface albedo of band 6 | 0.19 | Summer vegetation index MR | 1.12 |

Vegetation types

| Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 3.51 |
| 2 | Slope | 0.83 | Slope | 3.35 |
| 3 | Annual precipitation | 0.51 | Mean diurnal range | 3.06 |
| 4 | Winter vegetation index MR | 0.30 | Annual precipitation | 2.8 |
| 5 | Mean diurnal range | 0.28 | Summer vegetation index BI | 1.84 |
| 6 | Summer vegetation index EVI | 0.22 | Winter vegetation index NDVI | 1.61 |
| 7 | Precipitation of driest month | 0.21 | Winter vegetation index MR | 1.45 |
| 8 | Summer vegetation index BI | 0.20 | Summer vegetation index WI | 1.31 |
| 9 | Summer surface albedo of band 1 | 0.19 | Precipitation of driest month | 1.24 |
| 10 | Winter surface albedo of band 6 | 0.14 | Summer vegetation index NDVI | 1.22 |

Formations and sub-formations

| Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 4.16 |
| 2 | Annual precipitation | 0.86 | Annual precipitation | 3.28 |
| 3 | Slope | 0.63 | Mean diurnal range | 3.25 |
| 4 | Mean diurnal range | 0.52 | Slope | 2.24 |
| 5 | Precipitation of driest month | 0.52 | Precipitation of driest month | 2.16 |
| 6 | Winter vegetation index MR | 0.4 | Summer vegetation index BI | 1.83 |
| 7 | Summer surface albedo of band 1 | 0.32 | Summer vegetation index NDVI | 1.7 |
| 8 | Summer vegetation index BI | 0.32 | Winter vegetation index NDVI | 1.61 |
| 9 | Summer vegetation index WI | 0.31 | Winter vegetation index MR | 1.47 |
| 10 | Winter surface albedo of band 6 | 0.28 | Summer vegetation indices EVI and MR | 1.32 |


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogeneous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model.

The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010), and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables.

In our study, the SVM and MLC models produced simulation output only for variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
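The contrast drawn above between a single tree and the RF approach (many trees, each grown on a resampled data subset and split on a random subset of predictors) can be made concrete with a toy sketch. This is not the software or configuration used in the study: the trees are reduced to one-split "stumps", the training data are hypothetical, and all names (`fit_stump`, `fit_forest`, `predict_forest`) are invented for this illustration.

```python
import random
from collections import Counter
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (predictor values, class label 0 or 1)
Stump = Tuple[int, float, int]    # (feature index, threshold, label above threshold)


def fit_stump(data: List[Sample], features: List[int]) -> Stump:
    """Pick the single split with the fewest training errors among the given features."""
    best: Stump = (features[0], 0.0, 1)
    best_errors = len(data) + 1
    for f in features:
        values = sorted({x[f] for x, _ in data})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            for label_above in (0, 1):
                errors = sum(
                    (label_above if x[f] > t else 1 - label_above) != y
                    for x, y in data
                )
                if errors < best_errors:
                    best, best_errors = (f, t, label_above), errors
    return best


def predict_stump(stump: Stump, x: List[float]) -> int:
    f, t, label_above = stump
    return label_above if x[f] > t else 1 - label_above


def fit_forest(data: List[Sample], n_trees: int = 25,
               n_features_per_tree: int = 1, seed: int = 0) -> List[Stump]:
    """Grow each stump on a bootstrap sample and a random subset of predictors."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(data) for _ in data]  # rows drawn with replacement
        subset = rng.sample(range(n_features), n_features_per_tree)
        forest.append(fit_stump(bootstrap, subset))
    return forest


def predict_forest(forest: List[Stump], x: List[float]) -> int:
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(s, x) for s in forest)
    return votes.most_common(1)[0][0]


# Hypothetical training points: two predictors, two well-separated classes.
training_data: List[Sample] = (
    [([0.1 * i, 0.05 * i], 0) for i in range(10)]
    + [([2 + 0.1 * i, 2 + 0.05 * i], 1) for i in range(10)]
)
forest = fit_forest(training_data)
print(predict_forest(forest, [0.5, 0.2]), predict_forest(forest, [2.5, 2.2]))
```

Because every stump sees a different resample and a different predictor, no single unlucky split decides the outcome; the vote averages over them, which is one common explanation for the greater stability of RF relative to a single DT.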

Important variables in vegetation classification models
Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important


variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.
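As an illustration of how a vegetation index condenses spectral information into a single input variable, the sketch below computes the NDVI from near-infrared and red reflectance using its standard definition, (NIR − Red) / (NIR + Red). The exact index definitions used in this study are those given in Table 2; the reflectance values and the zero-denominator handling here are assumptions made only for the example.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from NIR and red reflectance."""
    denominator = nir + red
    if denominator == 0:
        return 0.0  # assumption: treat a pixel with no signal as neutral
    return (nir - red) / denominator


# Hypothetical reflectance values for two pixels.
dense_canopy = ndvi(nir=0.5, red=0.1)   # strong NIR reflectance, low red
bare_surface = ndvi(nir=0.2, red=0.2)   # no contrast between the bands
print(round(dense_canopy, 2), round(bare_surface, 2))
```

The index is bounded between −1 and 1 for non-negative reflectance, so seasonal versions of it (such as the summer and winter NDVI variables in Table 7) are directly comparable across pixels.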

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models to >80% for most forest types. Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors, such as soil moisture, pH, and nutrients, should be considered. Other factors, such as actual light, disturbance, biotic interactions, land use, and bioclimatic information, could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy
In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data


preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS
Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (Mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS
We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was funded by the National Key R & D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Grant Disclosures
The following grant information was disclosed by the authors:
National Key R & D Program of China: 2018YFC0506903.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0.

REFERENCES
Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.
Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.
Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: DR W. Junk Publishers.
Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.


Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.
Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.
Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.
Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.
Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.
Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.
Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.
De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.
Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.
Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.
Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.
Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.
Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.
Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.


Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.
Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.
Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.
Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.
Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.
Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.
Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.
Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.
Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.
Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.
Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.
Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.
Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.
Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.
Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.


Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.
Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.
van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box-A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.
Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.
Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.
Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.
Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.
Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.
Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.
Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.
Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.
Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.
Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.
Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.


Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box. Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.


Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.



Table 4 Model assessment of vegetation groups by field point data and VMC. Variable combinations are shown in Table 3.

Variable      Decision tree           Random forest           Support vector machine  Maximum likelihood classification
combination   Point data  VMC         Point data  VMC         Point data  VMC         Point data  VMC
              OA    KC    OA    KC    OA    KC    OA    KC    OA    KC    OA    KC    OA    KC    OA    KC
1             34    0.18  55    0.22  37    0.24  32    0.09  36    0.21  53    0.21  23    0.08  11    0.02
2             38    0.20  52    0.23  39    0.27  37    0.13  35    0.20  55    0.24  18    0.07  9     0.03
3             45    0.31  54    0.26  47    0.36  45    0.21  41    0.27  54    0.27  24    0.12  15    0.05
4             32    0.16  46    0.16  42    0.30  42    0.17  37    0.22  57    0.26  11    0.04  3     0.01
5             31    0.11  59    0.14  44    0.32  44    0.19  36    0.22  51    0.22  9     0.04  4     0.02
6             41    0.26  44    0.18  50    0.40  52    0.27  42    0.29  54    0.27  13    0.08  4     0.03
7             54    0.45  57    0.34  72    0.66  55    0.35
8             55    0.46  56    0.35  69    0.63  56    0.37
9             55    0.46  53    0.34  68    0.61  57    0.38
10            55    0.46  53    0.33  69    0.63  57    0.38
11            56    0.46  56    0.36  68    0.62  57    0.38

Notes. VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient. Cell formatting in the original table marked kappa coefficients larger than 0.56 and those larger than 0.4 and less than 0.56.


Figure 2 The modeling vegetation map of vegetation groups with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-2

data. The Kappa coefficients in all models were less than 0.4 when assessed by the VMC (Table 6, Fig. 4).
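Overall accuracy (OA) and the Kappa coefficient (KC), the two scores reported in Tables 4–6, can both be derived from a confusion matrix. The following is a minimal sketch of that calculation; the function names and the 3-class example counts are illustrative assumptions and are not data from this study.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa_coefficient(matrix):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix (rows: reference class, columns: predicted class).
example = [
    [30, 5, 5],
    [4, 25, 6],
    [6, 5, 14],
]
oa = overall_accuracy(example)
kc = kappa_coefficient(example)
print(f"OA = {oa:.2f}, KC = {kc:.2f}")
```

Because kappa subtracts the agreement expected by chance, it is always lower than OA for an imperfect classifier, which is why a model can show a moderate OA and still have a low KC.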

The abbreviations are the same as in Table 4.

Important variables
For the RF model, eight of the top 10 most important variables were the same across the different vegetation units: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and four spectral variables (mid-infrared ratio and NDVI of the winter vegetation index, and brightness index and NDVI of the summer vegetation index). For the DT model, nine of the top 10 most important variables were the same across the different vegetation units: four climate variables (annual mean temperature, mean diurnal range, precipitation of the driest month, and annual precipitation), one geospatial variable (slope), and four spectral variables (mid-infrared ratio of the winter vegetation index, brightness index of the summer vegetation index, summer surface albedo of band 1, and winter surface albedo of band 6) (Table 7).
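The counts of shared variables stated above can be checked by intersecting the top-10 lists in Table 7. The sketch below does this; the variable names are shortened transcriptions of the Table 7 entries, the tied tenth RF entry for formations and sub-formations is entered as two variables, and the dictionary and function names are illustrative assumptions.

```python
# Top-10 variable lists transcribed from Table 7 (abbreviations as in Table 2).
RF_TOP10 = {
    "vegetation groups": [
        "Annual mean temperature", "Slope", "Mean diurnal range",
        "Annual precipitation", "Summer BI", "Winter NDVI", "Summer EVI",
        "Winter MR", "Summer NDVI", "Summer MR",
    ],
    "vegetation types": [
        "Annual mean temperature", "Slope", "Mean diurnal range",
        "Annual precipitation", "Summer BI", "Winter NDVI", "Winter MR",
        "Summer WI", "Precipitation of driest month", "Summer NDVI",
    ],
    "formations and sub-formations": [
        "Annual mean temperature", "Annual precipitation", "Mean diurnal range",
        "Slope", "Precipitation of driest month", "Summer BI", "Summer NDVI",
        "Winter NDVI", "Winter MR", "Summer EVI", "Summer MR",
    ],
}
DT_TOP10 = {
    "vegetation groups": [
        "Annual mean temperature", "Annual precipitation", "Slope", "Winter MR",
        "Mean diurnal range", "Summer albedo band 1", "Summer BI",
        "Precipitation of driest month", "Summer EVI", "Winter albedo band 6",
    ],
    "vegetation types": [
        "Annual mean temperature", "Slope", "Annual precipitation", "Winter MR",
        "Mean diurnal range", "Summer EVI", "Precipitation of driest month",
        "Summer BI", "Summer albedo band 1", "Winter albedo band 6",
    ],
    "formations and sub-formations": [
        "Annual mean temperature", "Annual precipitation", "Slope",
        "Mean diurnal range", "Precipitation of driest month", "Winter MR",
        "Summer albedo band 1", "Summer BI", "Summer WI", "Winter albedo band 6",
    ],
}


def shared_variables(top_lists):
    """Return the variables present in every unit's top-10 list."""
    return set.intersection(*(set(names) for names in top_lists.values()))


rf_shared = shared_variables(RF_TOP10)
dt_shared = shared_variables(DT_TOP10)
print(len(rf_shared), sorted(rf_shared))
print(len(dt_shared), sorted(dt_shared))
```

The intersections contain eight variables for the RF model and nine for the DT model, matching the counts given in the text.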

DISCUSSION
Vegetation classification units
Vegetation classification is an important and complex system with multiple levels. Higher level classification methods not only accurately classify vegetation, but they can also describe


Table 5 Model assessment of vegetation types by field point data and VMC. Variable combinations are shown in Table 3.

Variable      Decision tree           Random forest           Support vector machine  Maximum likelihood classification
combination   Point data  VMC         Point data  VMC         Point data  VMC         Point data  VMC
              OA    KC    OA    KC    OA    KC    OA    KC    OA    KC    OA    KC    OA    KC    OA    KC
1             42    0.24  63    0.33  32    0.22  23    0.09  32    0.18  40    0.18  6     0.02  7     0.00
2             44    0.27  58    0.31  34    0.23  30    0.14  31    0.18  44    0.24  5     0.02  14    0.00
3             43    0.30  58    0.35  44    0.34  38    0.22  37    0.26  43    0.25  9     0.05  13    0.00
4             36    0.20  47    0.20  39    0.29  31    0.15  32    0.19  43    0.21  13    0.07  6     0.02
5             32    0.14  59    0.23  41    0.31  36    0.19  34    0.22  43    0.22  6     0.03  6     0.03
6             36    0.23  45    0.24  47    0.38  44    0.27  40    0.29  43    0.25  14    0.09  21    0.06
7             55    0.46  72    0.54  70    0.65  54    0.41
8             53    0.44  68    0.52  68    0.63  55    0.43
9             54    0.45  65    0.49  66    0.60  55    0.43
10            54    0.45  65    0.49  68    0.63  55    0.43
11            53    0.44  68    0.52  67    0.62  55    0.43

Notes. VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient. Cell formatting in the original table marked kappa coefficients larger than 0.56 and those larger than 0.4 and less than 0.56.


Figure 3 The modeling vegetation map of vegetation types with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-3

ecosystem diversity, even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions that are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as plant sets sharing similar perturbation response effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine


Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations are shown in Table 3.

Variable      Decision tree           Random forest           Support vector machine  Maximum likelihood classification
combination   Point data  VMC         Point data  VMC         Point data  VMC         Point data  VMC
              OA    KC    OA    KC    OA    KC    OA    KC    OA    KC    OA    KC    OA    KC    OA    KC
1             23    0.14  19    0.08  20    0.18  5     0.02  11    0.09  6     0.03  8     0.06  8     0.04
2             22    -0.04 49    0.04  19    0.17  6     0.03  13    0.11  7     0.04  8     0.06  13    0.05
3             26    0.14  45    0.23  29    0.27  9     0.07  21    0.19  10    0.07  12    0.09  13    0.07
4             30    0.20  30    0.04  22    0.20  7     0.04  16    0.14  6     0.03  9     0.07  8     0.04
5             33    0.01  67    0.00  22    0.20  7     0.04  15    0.13  5     0.03  11    0.09  10    0.04
6             26    0.15  22    0.02  31    0.30  11    0.08  21    0.19  8     0.06  12    0.09  15    0.08
7             33    0.20  52    0.27  58    0.57  23    0.20
8             27    0.17  34    0.18  55    0.54  23    0.20
9             25    0.15  22    0.15  55    0.53  22    0.20
10            30    0.17  41    0.22  56    0.55  23    0.21
11            31    0.20  41    0.22  56    0.55  23    0.20

Notes. VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy (%); KC, Kappa coefficient. Cell formatting in the original table marked kappa coefficients larger than 0.56 and those larger than 0.4 and less than 0.56.


Figure 4 The modeling vegetation map of formations and sub-formations with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-4

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracies of the vegetation distribution simulations in units I and II were similar to each other and higher than that of the unit III simulation (Tables 4–6).

Different model performances
We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same


Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices are shown in Table 2.

Vegetation groups
Rank  Decision tree variable              Standardized importance  Random forest variable                Normalized importance
1     Annual mean temperature             1.00                     Annual mean temperature               368
2     Annual precipitation                0.88                     Slope                                 294
3     Slope                               0.80                     Mean diurnal range                    260
4     Winter vegetation index MR          0.36                     Annual precipitation                  238
5     Mean diurnal range                  0.33                     Summer vegetation index BI            188
6     Summer surface albedo of band 1     0.29                     Winter vegetation index NDVI          137
7     Summer vegetation index BI          0.28                     Summer vegetation index EVI           136
8     Precipitation of driest month       0.25                     Winter vegetation index MR            130
9     Summer vegetation index EVI         0.23                     Summer vegetation index NDVI          122
10    Winter surface albedo of band 6     0.19                     Summer vegetation index MR            112

Vegetation types
Rank  Decision tree variable              Standardized importance  Random forest variable                Normalized importance
1     Annual mean temperature             1.00                     Annual mean temperature               351
2     Slope                               0.83                     Slope                                 335
3     Annual precipitation                0.51                     Mean diurnal range                    306
4     Winter vegetation index MR          0.30                     Annual precipitation                  28
5     Mean diurnal range                  0.28                     Summer vegetation index BI            184
6     Summer vegetation index EVI         0.22                     Winter vegetation index NDVI          161
7     Precipitation of driest month       0.21                     Winter vegetation index MR            145
8     Summer vegetation index BI          0.20                     Summer vegetation index WI            131
9     Summer surface albedo of band 1     0.19                     Precipitation of driest month         124
10    Winter surface albedo of band 6     0.14                     Summer vegetation index NDVI          122

Formations and sub-formations
Rank  Decision tree variable              Standardized importance  Random forest variable                Normalized importance
1     Annual mean temperature             1.00                     Annual mean temperature               416
2     Annual precipitation                0.86                     Annual precipitation                  328
3     Slope                               0.63                     Mean diurnal range                    325
4     Mean diurnal range                  0.52                     Slope                                 224
5     Precipitation of driest month       0.52                     Precipitation of driest month         216
6     Winter vegetation index MR          0.4                      Summer vegetation index BI            183
7     Summer surface albedo of band 1     0.32                     Summer vegetation index NDVI          17
8     Summer vegetation index BI          0.32                     Winter vegetation index NDVI          161
9     Summer vegetation index WI          0.31                     Winter vegetation index MR            147
10    Winter surface albedo of band 6     0.28                     Summer vegetation indices EVI and MR  132


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogeneous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010).

Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010), and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables.

In our study, the SVM and MLC models produced output only for variable combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
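The difference between a single decision tree and a random forest described above (many trees grown on resampled data, each split drawn from a random subset of predictors, and a majority vote) can be illustrated with a short pure-Python sketch. This is a toy illustration under stated assumptions, not the implementation used in this study: all function names, the tree depth, the number of trees, and the eight training points (annual mean temperature, annual precipitation) are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(rows, labels, n_sub, rng, depth=0, max_depth=5):
    """Grow one tree; every split considers only n_sub randomly chosen features."""
    if depth < max_depth and len(set(labels)) > 1:
        split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_sub))
        if split is not None:
            f, t = split
            branches = []
            for keep_left in (True, False):
                idx = [i for i, row in enumerate(rows) if (row[f] <= t) == keep_left]
                branches.append(grow_tree([rows[i] for i in idx],
                                          [labels[i] for i in idx],
                                          n_sub, rng, depth + 1, max_depth))
            return (f, t, branches[0], branches[1])
    return Counter(labels).most_common(1)[0][0]  # leaf: majority class (a string)


def predict_tree(tree, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def grow_forest(rows, labels, n_trees=25, n_sub=1, seed=0):
    """Grow n_trees trees, each on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], n_sub, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Hypothetical training points: (annual mean temperature, annual precipitation).
rows = [(5.0, 300), (6.0, 350), (7.0, 320), (5.5, 280),
        (12.0, 600), (13.0, 650), (11.5, 620), (12.5, 580)]
labels = ["grassland"] * 4 + ["forest"] * 4

forest = grow_forest(rows, labels)
print(predict_forest(forest, (4.0, 250)))
print(predict_forest(forest, (14.0, 700)))
```

Averaging votes over many randomized trees is what reduces the instability of a single tree: an unlucky bootstrap sample or feature draw affects one vote rather than the final class.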

Important variables in vegetation classification models
Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important


variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables capture different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.
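As an illustration of a spectral variable that reflects vegetation information, the sketch below computes NDVI, one of the vegetation indices listed in Table 7, using its standard definition (NIR − Red) / (NIR + Red). The reflectance values are hypothetical, and the handling of a zero denominator is an assumption; the exact preprocessing used in this study is not reproduced here.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    denominator = nir + red
    if denominator == 0:
        return 0.0  # assumption: treat zero total reflectance as NDVI 0
    return (nir - red) / denominator


# Hypothetical reflectance pairs (NIR, red).
dense_canopy = ndvi(0.45, 0.05)   # strong NIR reflectance, low red reflectance
sparse_cover = ndvi(0.25, 0.20)   # NIR and red reflectance nearly equal
print(round(dense_canopy, 2), round(sparse_cover, 2))
```

Healthy vegetation reflects strongly in the near infrared and absorbs red light, so higher NDVI values indicate denser green cover.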

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models to >80% for most forest types. Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors, such as soil moisture, pH, and nutrients, should be considered. Other factors, such as actual light, disturbance, biotic interactions, land use, and bioclimatic information, could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy
In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data


preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS
Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (mid-infrared ratio of the winter vegetation index and brightness index of the summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS
We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was funded by the National Key R & D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Grant Disclosures
The following grant information was disclosed by the authors:
National Key R & D Program of China: 2018YFC0506903.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0.

REFERENCES
Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.

Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, paphos area (cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.

Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: DR W. Junk Publishers.

Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.


Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.


Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.

Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box: A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.

Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box, Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.

Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.

Figure 2 The modeling vegetation map of vegetation groups with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People’s Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-2

data. The Kappa coefficients in all models were less than 0.4 when assessed by the VMC (Table 6, Fig. 4).

The abbreviations are the same as in Table 4.

Important variables

For the RF model, eight of the top 10 most important variables were the same across the different vegetation units: three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio and NDVI of winter vegetation index, brightness index and NDVI of summer vegetation index). For the DT model, nine of the top 10 most important variables were the same across the different vegetation units: four climate variables (annual mean temperature, mean diurnal range, precipitation of the driest month, and annual precipitation), one geospatial variable (slope), and four spectral variables (Mid-infrared ratio of winter vegetation index, brightness index of summer vegetation index, summer surface albedo of band 1, winter surface albedo of band 6) (Table 7).

Table 5 Model assessment of vegetation types by field point data and VMC. Variable combinations are shown in Table 3.

| Variable combination | DT point OA | DT point KC | DT VMC OA | DT VMC KC | RF point OA | RF point KC | RF VMC OA | RF VMC KC | SVM point OA | SVM point KC | SVM VMC OA | SVM VMC KC | MLC point OA | MLC point KC | MLC VMC OA | MLC VMC KC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 42 | 0.24 | 63 | 0.33 | 32 | 0.22 | 23 | 0.09 | 32 | 0.18 | 40 | 0.18 | 6 | 0.02 | 7 | 0.00 |
| 2 | 44 | 0.27 | 58 | 0.31 | 34 | 0.23 | 30 | 0.14 | 31 | 0.18 | 44 | 0.24 | 5 | 0.02 | 14 | 0.00 |
| 3 | 43 | 0.30 | 58 | 0.35 | 44 | 0.34 | 38 | 0.22 | 37 | 0.26 | 43 | 0.25 | 9 | 0.05 | 13 | 0.00 |
| 4 | 36 | 0.20 | 47 | 0.20 | 39 | 0.29 | 31 | 0.15 | 32 | 0.19 | 43 | 0.21 | 13 | 0.07 | 6 | 0.02 |
| 5 | 32 | 0.14 | 59 | 0.23 | 41 | 0.31 | 36 | 0.19 | 34 | 0.22 | 43 | 0.22 | 6 | 0.03 | 6 | 0.03 |
| 6 | 36 | 0.23 | 45 | 0.24 | 47 | 0.38 | 44 | 0.27 | 40 | 0.29 | 43 | 0.25 | 14 | 0.09 | 21 | 0.06 |
| 7 | 55 | 0.46 | 72 | 0.54 | 70 | 0.65 | 54 | 0.41 | | | | | | | | |
| 8 | 53 | 0.44 | 68 | 0.52 | 68 | 0.63 | 55 | 0.43 | | | | | | | | |
| 9 | 54 | 0.45 | 65 | 0.49 | 66 | 0.60 | 55 | 0.43 | | | | | | | | |
| 10 | 54 | 0.45 | 65 | 0.49 | 68 | 0.63 | 55 | 0.43 | | | | | | | | |
| 11 | 53 | 0.44 | 68 | 0.52 | 67 | 0.62 | 55 | 0.43 | | | | | | | | |

Notes: VMC, the Vegetation Map of the People’s Republic of China; DT, decision tree; RF, random forest; SVM, support vector machine; MLC, maximum likelihood classification; OA, overall accuracy; KC, Kappa coefficient. Highlighted thresholds: Kappa coefficient larger than 0.56; Kappa coefficient larger than 0.4 and less than 0.56.

Figure 3 The modeling vegetation map of vegetation types with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People’s Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-3

DISCUSSION

Vegetation classification units

Vegetation classification is an important and complex system with multiple levels. Higher-level classification methods not only accurately classify vegetation, but they can also describe ecosystem diversity, even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions that are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations are shown in Table 3.

| Variable combination | DT point OA | DT point KC | DT VMC OA | DT VMC KC | RF point OA | RF point KC | RF VMC OA | RF VMC KC | SVM point OA | SVM point KC | SVM VMC OA | SVM VMC KC | MLC point OA | MLC point KC | MLC VMC OA | MLC VMC KC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 23 | 0.14 | 19 | 0.08 | 20 | 0.18 | 5 | 0.02 | 11 | 0.09 | 6 | 0.03 | 8 | 0.06 | 8 | 0.04 |
| 2 | 22 | -0.04 | 49 | 0.04 | 19 | 0.17 | 6 | 0.03 | 13 | 0.11 | 7 | 0.04 | 8 | 0.06 | 13 | 0.05 |
| 3 | 26 | 0.14 | 45 | 0.23 | 29 | 0.27 | 9 | 0.07 | 21 | 0.19 | 10 | 0.07 | 12 | 0.09 | 13 | 0.07 |
| 4 | 30 | 0.20 | 30 | 0.04 | 22 | 0.20 | 7 | 0.04 | 16 | 0.14 | 6 | 0.03 | 9 | 0.07 | 8 | 0.04 |
| 5 | 33 | 0.01 | 67 | 0.00 | 22 | 0.20 | 7 | 0.04 | 15 | 0.13 | 5 | 0.03 | 11 | 0.09 | 10 | 0.04 |
| 6 | 26 | 0.15 | 22 | 0.02 | 31 | 0.30 | 11 | 0.08 | 21 | 0.19 | 8 | 0.06 | 12 | 0.09 | 15 | 0.08 |
| 7 | 33 | 0.20 | 52 | 0.27 | 58 | 0.57 | 23 | 0.20 | | | | | | | | |
| 8 | 27 | 0.17 | 34 | 0.18 | 55 | 0.54 | 23 | 0.20 | | | | | | | | |
| 9 | 25 | 0.15 | 22 | 0.15 | 55 | 0.53 | 22 | 0.20 | | | | | | | | |
| 10 | 30 | 0.17 | 41 | 0.22 | 56 | 0.55 | 23 | 0.21 | | | | | | | | |
| 11 | 31 | 0.20 | 41 | 0.22 | 56 | 0.55 | 23 | 0.20 | | | | | | | | |

Notes: VMC, the Vegetation Map of the People’s Republic of China; DT, decision tree; RF, random forest; SVM, support vector machine; MLC, maximum likelihood classification; OA, overall accuracy; KC, Kappa coefficient. Highlighted thresholds: Kappa coefficient larger than 0.56; Kappa coefficient larger than 0.4 and less than 0.56.

Figure 4 The modeling vegetation map of formations and sub-formations with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People’s Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-4

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as plant sets sharing similar perturbation response effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The simulation accuracies in units I and II were similar to each other and higher than the accuracy in unit III (Tables 4–6).
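As a reading aid for the accuracy measures reported in Tables 4–6, the sketch below shows how overall accuracy (OA) and the Kappa coefficient (KC) are conventionally computed from a confusion matrix of reference versus predicted classes. It is a minimal illustration under stated assumptions, not the software used in this study; the function name `accuracy_and_kappa` and the toy counts are hypothetical.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts assessment points whose reference class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: share of points on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Chance agreement: products of row and column marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    if expected == 1.0:
        # Degenerate case (a single class everywhere): report kappa as 0.
        return observed, 0.0
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical 3-class example; the counts are invented for illustration.
toy = [
    [30, 5, 5],
    [4, 20, 6],
    [6, 4, 20],
]
oa, kc = accuracy_and_kappa(toy)
print(f"OA = {oa:.2f}, KC = {kc:.2f}")
```

Because Kappa discounts agreement expected by chance, a classifier that assigns every point to one dominant class can reach a high OA while its Kappa stays at zero, which is why the two measures are reported side by side.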

Different model performances

We were interested in vegetation distribution modeling’s ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same

Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices are shown in Table 2.

Vegetation groups

| Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 3.68 |
| 2 | Annual precipitation | 0.88 | Slope | 2.94 |
| 3 | Slope | 0.80 | Mean diurnal range | 2.60 |
| 4 | Winter vegetation index MR | 0.36 | Annual precipitation | 2.38 |
| 5 | Mean diurnal range | 0.33 | Summer vegetation index BI | 1.88 |
| 6 | Summer surface albedo of band 1 | 0.29 | Winter vegetation index NDVI | 1.37 |
| 7 | Summer vegetation index BI | 0.28 | Summer vegetation index EVI | 1.36 |
| 8 | Precipitation of driest month | 0.25 | Winter vegetation index MR | 1.30 |
| 9 | Summer vegetation index EVI | 0.23 | Summer vegetation index NDVI | 1.22 |
| 10 | Winter surface albedo of band 6 | 0.19 | Summer vegetation index MR | 1.12 |

Vegetation types

| Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 3.51 |
| 2 | Slope | 0.83 | Slope | 3.35 |
| 3 | Annual precipitation | 0.51 | Mean diurnal range | 3.06 |
| 4 | Winter vegetation index MR | 0.30 | Annual precipitation | 2.8 |
| 5 | Mean diurnal range | 0.28 | Summer vegetation index BI | 1.84 |
| 6 | Summer vegetation index EVI | 0.22 | Winter vegetation index NDVI | 1.61 |
| 7 | Precipitation of driest month | 0.21 | Winter vegetation index MR | 1.45 |
| 8 | Summer vegetation index BI | 0.20 | Summer vegetation index WI | 1.31 |
| 9 | Summer surface albedo of band 1 | 0.19 | Precipitation of driest month | 1.24 |
| 10 | Winter surface albedo of band 6 | 0.14 | Summer vegetation index NDVI | 1.22 |

Formations and sub-formations

| Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 4.16 |
| 2 | Annual precipitation | 0.86 | Annual precipitation | 3.28 |
| 3 | Slope | 0.63 | Mean diurnal range | 3.25 |
| 4 | Mean diurnal range | 0.52 | Slope | 2.24 |
| 5 | Precipitation of driest month | 0.52 | Precipitation of driest month | 2.16 |
| 6 | Winter vegetation index MR | 0.4 | Summer vegetation index BI | 1.83 |
| 7 | Summer surface albedo of band 1 | 0.32 | Summer vegetation index NDVI | 1.7 |
| 8 | Summer vegetation index BI | 0.32 | Winter vegetation index NDVI | 1.61 |
| 9 | Summer vegetation index WI | 0.31 | Winter vegetation index MR | 1.47 |
| 10 | Winter surface albedo of band 6 | 0.28 | Summer vegetation indices EVI and MR | 1.32 |

classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogenous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010), and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables. In our study, the SVM and MLC models only produced simulation results for variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
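The contrast between a single tree and a forest described above can be illustrated with a small, self-contained sketch of bootstrap aggregation with random predictor subsets. It is a deliberate simplification and not the implementation used in this study: each "tree" is reduced to a one-split stump, whereas a real random forest grows deep trees and draws a new random predictor subset at every split. All function names, the two predictors, and the toy samples are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Pick the single-threshold split, over the allowed features, with fewest errors."""
    best = None
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No usable split (constant predictors): always predict the majority class.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random predictor subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)   # random predictor subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


# Invented samples: (annual mean temperature, slope) for two vegetation classes.
rows = [(i, 9 - i) for i in range(10)] + [(20 + i, 29 - i) for i in range(10)]
labels = ["grass"] * 10 + ["forest"] * 10
forest = fit_forest(rows, labels, n_trees=25, n_features=1, seed=1)
print(predict_forest(forest, (-5, -5)), predict_forest(forest, (40, 40)))
```

Averaging many trees trained on different resamples and predictor subsets is what reduces the instability of a single tree; the vote in `predict_forest` is the aggregation step.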

Important variables in vegetation classification models

Variable selection is directly related to the vegetation distribution model’s ability to capture important environmental factors (Mod et al., 2016). Models predict the important variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.
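NDVI is one of the vegetation indices listed among the important spectral variables (Table 7). As a minimal sketch of how such an index is derived from reflectance, the function below applies the standard definition NDVI = (NIR - Red) / (NIR + Red). The function name, the zero-denominator convention, and the reflectance values are assumptions made for illustration; they are not taken from the study's data.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    denom = nir + red
    if denom == 0:
        # Convention chosen here for pixels with no signal in either band.
        return 0.0
    return (nir - red) / denom


# Invented reflectance pairs: dense canopy versus sparsely vegetated soil.
print(round(ndvi(0.50, 0.10), 3))
print(round(ndvi(0.30, 0.25), 3))
```

Computing the index separately for a summer and a winter image yields two seasonal predictors, which is how the summer and winter NDVI variables in Table 7 can both enter the same model.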

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models, to >80% for most forest types. Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7). Different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold, and the significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.
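To make the slope variable concrete, the sketch below derives slope in degrees at an interior cell of a gridded elevation model using central differences. It assumes slope is computed from a regular digital elevation model; the function name, the 30 m cell size, and the toy grid are hypothetical, and GIS packages often use a weighted 3 × 3 kernel rather than this simpler scheme.

```python
import math


def slope_degrees(dem, row, col, cell_size):
    """Slope (degrees) at an interior DEM cell using central differences."""
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * cell_size)
    # Gradient magnitude is rise over run; atan converts it to an angle.
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Invented 3 x 3 elevation grid (metres) rising 30 m per 30 m cell eastward.
dem = [
    [0.0, 30.0, 60.0],
    [0.0, 30.0, 60.0],
    [0.0, 30.0, 60.0],
]
print(round(slope_degrees(dem, 1, 1, 30.0), 1))
```

Aspect can be obtained from the same two partial derivatives, which is why slope and aspect are usually produced together from one elevation grid.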

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors, such as soil moisture, pH, and nutrients, should be considered. Other factors, such as actual light, disturbance, biotic interactions, land use, and bioclimatic information, could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy

In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region’s vegetation no longer coincides with the VMC’s assessment.

CONCLUSIONS

Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (Mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS

We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding

This work was funded by the National Key R & D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures

The following grant information was disclosed by the authors:
National Key R & D Program of China: 2018YFC0506903.

Competing Interests

The authors declare there are no competing interests.

Author Contributions

• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0.

REFERENCES

Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.

Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.

Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: Dr W. Junk Publishers.

Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.

Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the ‘sky islands’ of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People’s Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat’s role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.

Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.

Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat Thematic Mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box-A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.

Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box. Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.

Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.

Table 5 Model assessment of vegetation types by field point data and VMC. Variable combinations were shown in Table 3.

Variable     Decision tree             Random forest             Support vector machine    Maximum likelihood classification
combination  Point data   VMC          Point data   VMC          Point data   VMC          Point data   VMC
             OA    KC     OA    KC     OA    KC     OA    KC     OA    KC     OA    KC     OA    KC     OA    KC
1            42    0.24   63    0.33   32    0.22   23    0.09   32    0.18   40    0.18   6     0.02   7     0.00
2            44    0.27   58    0.31   34    0.23   30    0.14   31    0.18   44    0.24   5     0.02   14    0.00
3            43    0.30   58    0.35   44    0.34   38    0.22   37    0.26   43    0.25   9     0.05   13    0.00
4            36    0.20   47    0.20   39    0.29   31    0.15   32    0.19   43    0.21   13    0.07   6     0.02
5            32    0.14   59    0.23   41    0.31   36    0.19   34    0.22   43    0.22   6     0.03   6     0.03
6            36    0.23   45    0.24   47    0.38   44    0.27   40    0.29   43    0.25   14    0.09   21    0.06
7            55    0.46   72    0.54   70    0.65   54    0.41
8            53    0.44   68    0.52   68    0.63   55    0.43
9            54    0.45   65    0.49   66    0.60   55    0.43
10           54    0.45   65    0.49   68    0.63   55    0.43
11           53    0.44   68    0.52   67    0.62   55    0.43

Notes: VMC, the Vegetation Map of the People’s Republic of China; OA, overall accuracy (%); KC, kappa coefficient. Two kappa coefficient classes are distinguished: larger than 0.56, and larger than 0.4 and less than 0.56.

Figure 3 The modeling vegetation map of vegetation types with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People’s Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

ecosystem diversity, even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions that are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as plant sets sharing similar perturbation response effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine

Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations were shown in Table 3.

Variable     Decision tree             Random forest             Support vector machine    Maximum likelihood classification
combination  Point data   VMC          Point data   VMC          Point data   VMC          Point data   VMC
             OA    KC     OA    KC     OA    KC     OA    KC     OA    KC     OA    KC     OA    KC     OA    KC
1            23    0.14   19    0.08   20    0.18   5     0.02   11    0.09   6     0.03   8     0.06   8     0.04
2            22    -0.04  49    0.04   19    0.17   6     0.03   13    0.11   7     0.04   8     0.06   13    0.05
3            26    0.14   45    0.23   29    0.27   9     0.07   21    0.19   10    0.07   12    0.09   13    0.07
4            30    0.20   30    0.04   22    0.20   7     0.04   16    0.14   6     0.03   9     0.07   8     0.04
5            33    0.01   67    0.00   22    0.20   7     0.04   15    0.13   5     0.03   11    0.09   10    0.04
6            26    0.15   22    0.02   31    0.30   11    0.08   21    0.19   8     0.06   12    0.09   15    0.08
7            33    0.20   52    0.27   58    0.57   23    0.20
8            27    0.17   34    0.18   55    0.54   23    0.20
9            25    0.15   22    0.15   55    0.53   22    0.20
10           30    0.17   41    0.22   56    0.55   23    0.21
11           31    0.20   41    0.22   56    0.55   23    0.20

Notes: VMC, the Vegetation Map of the People’s Republic of China; OA, overall accuracy (%); KC, kappa coefficient. Two kappa coefficient classes are distinguished: larger than 0.56, and larger than 0.4 and less than 0.56.

Figure 4 The modeling vegetation map of formations and sub-formations with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People’s Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracy of the vegetation distribution simulations in units I and II was similar, and higher than that of unit III (Tables 4–6).
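The overall accuracy (OA) and kappa coefficient (KC) reported in Tables 5 and 6 are standard summaries of a confusion matrix. The following is a minimal sketch of how the two measures are usually computed; the function name and the toy matrix are illustrative assumptions and are not taken from the study’s data or software.

```python
import math


def overall_accuracy_and_kappa(confusion):
    """Return (OA, kappa) for a square confusion matrix.

    confusion[i][j] counts assessment points whose reference class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    if n == 0:
        raise ValueError("confusion matrix has no observations")

    # Observed agreement: share of points on the diagonal (this is the OA).
    observed = sum(confusion[i][i] for i in range(k)) / n

    # Chance agreement: expected diagonal share given the row and column totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    if expected == 1.0:
        return observed, math.nan  # kappa is undefined in this degenerate case
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class example (values invented for illustration only).
confusion = [[40, 10],
             [20, 30]]
oa, kappa = overall_accuracy_and_kappa(confusion)
print(f"OA = {oa:.2f}, kappa = {kappa:.2f}")  # prints "OA = 0.70, kappa = 0.40"
```

Because kappa subtracts the agreement expected by chance, a model can show a moderate OA and still have a kappa close to zero, which is the pattern seen for several VMC-based assessments in Table 6.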

Different model performances
We were interested in vegetation distribution modeling’s ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same

Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices were shown in Table 2.

Vegetation groups
Rank  Decision tree                    Standardized  Random forest                         Normalized
      important variables              importance    important variables                   importance
1     Annual mean temperature          1.00          Annual mean temperature               3.68
2     Annual precipitation             0.88          Slope                                 2.94
3     Slope                            0.80          Mean diurnal range                    2.60
4     Winter vegetation index MR       0.36          Annual precipitation                  2.38
5     Mean diurnal range               0.33          Summer vegetation index BI            1.88
6     Summer surface albedo of band 1  0.29          Winter vegetation index NDVI          1.37
7     Summer vegetation index BI       0.28          Summer vegetation index EVI           1.36
8     Precipitation of driest month    0.25          Winter vegetation index MR            1.30
9     Summer vegetation index EVI      0.23          Summer vegetation index NDVI          1.22
10    Winter surface albedo of band 6  0.19          Summer vegetation index MR            1.12

Vegetation types
Rank  Decision tree                    Standardized  Random forest                         Normalized
      important variables              importance    important variables                   importance
1     Annual mean temperature          1.00          Annual mean temperature               3.51
2     Slope                            0.83          Slope                                 3.35
3     Annual precipitation             0.51          Mean diurnal range                    3.06
4     Winter vegetation index MR       0.30          Annual precipitation                  2.80
5     Mean diurnal range               0.28          Summer vegetation index BI            1.84
6     Summer vegetation index EVI      0.22          Winter vegetation index NDVI          1.61
7     Precipitation of driest month    0.21          Winter vegetation index MR            1.45
8     Summer vegetation index BI       0.20          Summer vegetation index WI            1.31
9     Summer surface albedo of band 1  0.19          Precipitation of driest month         1.24
10    Winter surface albedo of band 6  0.14          Summer vegetation index NDVI          1.22

Formations and sub-formations
Rank  Decision tree                    Standardized  Random forest                         Normalized
      important variables              importance    important variables                   importance
1     Annual mean temperature          1.00          Annual mean temperature               4.16
2     Annual precipitation             0.86          Annual precipitation                  3.28
3     Slope                            0.63          Mean diurnal range                    3.25
4     Mean diurnal range               0.52          Slope                                 2.24
5     Precipitation of driest month    0.52          Precipitation of driest month         2.16
6     Winter vegetation index MR       0.40          Summer vegetation index BI            1.83
7     Summer surface albedo of band 1  0.32          Summer vegetation index NDVI          1.70
8     Summer vegetation index BI       0.32          Winter vegetation index NDVI          1.61
9     Summer vegetation index WI       0.31          Winter vegetation index MR            1.47
10    Winter surface albedo of band 6  0.28          Summer vegetation indices EVI and MR  1.32

classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogeneous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010), and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables. In our study, the SVM and MLC models produced output only for variable combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
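The two sources of randomness attributed to the RF model above (trees grown on data subsets, and splits chosen from a random subset of predictor variables) can be illustrated with a deliberately small sketch. It is a toy under stated assumptions, not the software used in this study: each tree is reduced to a single split (a stump), and the predictor values, class names, and parameter defaults are invented for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """Pick the single threshold split with the fewest training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: predict the majority class everywhere
        lab = majority(labels)
        return (0, float("inf"), lab, lab)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Grow stumps on bootstrap samples, each with a random predictor subset."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(p), n_features)     # random predictor subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    return majority([l if row[f] <= t else r for f, t, l, r in forest])


# Hypothetical training points: [annual mean temperature, annual precipitation, slope]
rows = [[4.0, 650.0, 20.0], [5.0, 700.0, 25.0], [6.0, 620.0, 18.0],
        [5.5, 680.0, 22.0], [4.5, 640.0, 24.0],
        [10.0, 400.0, 5.0], [11.0, 380.0, 3.0], [12.0, 420.0, 6.0],
        [10.5, 390.0, 4.0], [11.5, 410.0, 2.0]]
labels = ["forest"] * 5 + ["grassland"] * 5

forest = fit_forest(rows, labels)
print(predict(forest, [3.0, 750.0, 30.0]))
print(predict(forest, [13.0, 350.0, 1.0]))
```

Averaging many such weak, decorrelated trees is what makes the ensemble more stable than a single decision tree, which matches the contrast drawn between the RF and DT models in this section.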

Important variables in vegetation classification models
Variable selection is directly related to the vegetation distribution model’s ability to capture important environmental factors (Mod et al., 2016). Models predict the important

variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.
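As an illustration of a spectral variable that reflects vegetation information, the sketch below computes the normalized difference vegetation index (NDVI), one of the indices listed in Table 7, from red and near-infrared reflectance using its standard definition. The reflectance values and the dictionary keys are hypothetical, and the zero-denominator convention is an assumption of this sketch; the other indices in Table 7 are defined in Table 2 and are not reproduced here.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for a single pixel.

    NDVI = (NIR - Red) / (NIR + Red), using surface reflectance values.
    """
    total = nir + red
    if total == 0:
        return 0.0  # assumed convention for pixels with no signal
    return (nir - red) / total


# Hypothetical reflectances for one pixel in two seasons (illustration only).
summer = {"nir": 0.50, "red": 0.10}
winter = {"nir": 0.30, "red": 0.25}

# Seasonal indices can enter a model as separate predictor variables.
predictors = {
    "summer_ndvi": ndvi(summer["nir"], summer["red"]),
    "winter_ndvi": ndvi(winter["nir"], winter["red"]),
}
print({name: round(value, 3) for name, value in predictors.items()})
```

Dense green canopies reflect strongly in the near infrared and absorb red light, so the summer value here is much higher than the winter value, which is the kind of seasonal contrast that helps separate vegetation classes.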

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models to >80% for most forest types. Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors, such as soil moisture, pH, and nutrients, should be considered. Other factors, such as actual light, disturbance, biotic interactions, land use, and bioclimatic information, could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy
In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data

preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and has reduced the predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region’s vegetation no longer coincides with the VMC’s assessment.

CONCLUSIONS
Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS
We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was funded by the National Key R&D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors:
National Key R&D Program of China: 2018YFC0506903.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0.

REFERENCES
Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.

Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.

Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: DR W. Junk Publishers.

Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.

Yi et al (2020) PeerJ DOI 107717peerj9839 2126

Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.


Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.


Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box-A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.


Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box. Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.


Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.


Figure 3 The modeling vegetation map of vegetation types with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-3

ecosystem diversity even during global changes (Faber-Langendoen et al., 2014). Plants in different vegetation classification units have different spectral characteristics and climatic conditions, which are the basis for vegetation distribution simulation. Thus, models using the same variables to simulate the vegetation distribution of different classification units may produce different classification accuracies (Dobrowski et al., 2008; Prasad, Iverson & Liaw, 2006). Map accuracy has been found to be a function of which classification system and categories are used (Muchoney et al., 2000).

Previous studies have explored vegetation distribution simulation using different vegetation classification systems. Plant functional types (PFTs), defined as plant sets sharing similar perturbation response effects on dominant ecosystem processes, have been used to simulate vegetation distribution, as seen in the Biome and Box system models (Box, 1981; Box, 1996; Dormann & Woodin, 2002), with positive simulation results (Box, 1981; Song, Zhou & Ouyang, 2005; Weng & Zhou, 2006). The Mapped Atmosphere-Plant-Soil System (MAPSS) model was also used to simulate vegetation distribution using vegetation life forms, leaf area index, leaf morphology, and leaf longevity (Zhao et al., 2002). Other researchers studied potential vegetation distribution using the Holdridge life zone model, with positive vegetation pattern results (Zheng et al., 2006). When the IGBP classification system was applied to simulate vegetation distribution at a regional scale, the map estimate accuracy was upwards of 80% (Muchoney et al., 2000). In this study, we used machine


Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations are shown in Table 3.

      Decision tree               Random forest               Support vector machine      Maximum likelihood classification
      Point data    VMC           Point data    VMC           Point data    VMC           Point data    VMC
Comb. OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC     OA     KC
1     23     0.14   19     0.08   20     0.18   5      0.02   11     0.09   6      0.03   8      0.06   8      0.04
2     22     −0.04  49     0.04   19     0.17   6      0.03   13     0.11   7      0.04   8      0.06   13     0.05
3     26     0.14   45     0.23   29     0.27   9      0.07   21     0.19   10     0.07   12     0.09   13     0.07
4     30     0.20   30     0.04   22     0.20   7      0.04   16     0.14   6      0.03   9      0.07   8      0.04
5     33     0.01   67     0.00   22     0.20   7      0.04   15     0.13   5      0.03   11     0.09   10     0.04
6     26     0.15   22     0.02   31     0.30   11     0.08   21     0.19   8      0.06   12     0.09   15     0.08
7     33     0.20   52     0.27   58     0.57   23     0.20
8     27     0.17   34     0.18   55     0.54   23     0.20
9     25     0.15   22     0.15   55     0.53   22     0.20
10    30     0.17   41     0.22   56     0.55   23     0.21
11    31     0.20   41     0.22   56     0.55   23     0.20

Notes: Comb., variable combination; VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy; KC, kappa coefficient. The published table marks two groups of values: kappa coefficients larger than 0.56, and kappa coefficients larger than 0.4 and less than 0.56.
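The two statistics reported in Table 6 can be illustrated with a short sketch. It assumes the standard definitions of overall accuracy and Cohen's kappa computed from a confusion matrix of assessment points; the function name and the example counts are hypothetical and are not taken from the study data.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (OA, kappa) for a square confusion matrix.

    confusion[i][j] is the number of assessment points whose reference
    class is i and whose simulated class is j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    if n == 0:
        raise ValueError("confusion matrix contains no points")

    # Observed agreement: share of points on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n

    # Chance agreement: product of row and column marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    if expected == 1.0:
        # Kappa is undefined when chance agreement is already perfect.
        return observed, float("nan")
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical two-class example (made-up counts, not study data).
example = [[40, 10],
           [5, 45]]
oa, kappa = overall_accuracy_and_kappa(example)
```

For this made-up matrix the observed agreement is 0.85 and the chance agreement is 0.50, so kappa is approximately 0.70. A high OA paired with a kappa near zero, as in some VMC columns of Table 6, indicates agreement that is largely attributable to chance.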


Figure 4 The modeling vegetation map of formations and sub-formations with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-4

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracies of the vegetation distribution simulations in units I and II were similar to each other and higher than that of unit III (Tables 4–6).

Different model performances
We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same


Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices are shown in Table 2.

Vegetation groups
Rank  Decision tree: important variable   Standardized  Random forest: important variable     Normalized
                                          importance                                          importance
1     Annual mean temperature             1.00          Annual mean temperature               3.68
2     Annual precipitation                0.88          Slope                                 2.94
3     Slope                               0.80          Mean diurnal range                    2.60
4     Winter vegetation index MR          0.36          Annual precipitation                  2.38
5     Mean diurnal range                  0.33          Summer vegetation index BI            1.88
6     Summer surface albedo of band 1     0.29          Winter vegetation index NDVI          1.37
7     Summer vegetation index BI          0.28          Summer vegetation index EVI           1.36
8     Precipitation of driest month       0.25          Winter vegetation index MR            1.30
9     Summer vegetation index EVI         0.23          Summer vegetation index NDVI          1.22
10    Winter surface albedo of band 6     0.19          Summer vegetation index MR            1.12

Vegetation types
Rank  Decision tree: important variable   Standardized  Random forest: important variable     Normalized
                                          importance                                          importance
1     Annual mean temperature             1.00          Annual mean temperature               3.51
2     Slope                               0.83          Slope                                 3.35
3     Annual precipitation                0.51          Mean diurnal range                    3.06
4     Winter vegetation index MR          0.30          Annual precipitation                  2.8
5     Mean diurnal range                  0.28          Summer vegetation index BI            1.84
6     Summer vegetation index EVI         0.22          Winter vegetation index NDVI          1.61
7     Precipitation of driest month       0.21          Winter vegetation index MR            1.45
8     Summer vegetation index BI          0.20          Summer vegetation index WI            1.31
9     Summer surface albedo of band 1     0.19          Precipitation of driest month         1.24
10    Winter surface albedo of band 6     0.14          Summer vegetation index NDVI          1.22

Formations and sub-formations
Rank  Decision tree: important variable   Standardized  Random forest: important variable     Normalized
                                          importance                                          importance
1     Annual mean temperature             1.00          Annual mean temperature               4.16
2     Annual precipitation                0.86          Annual precipitation                  3.28
3     Slope                               0.63          Mean diurnal range                    3.25
4     Mean diurnal range                  0.52          Slope                                 2.24
5     Precipitation of driest month       0.52          Precipitation of driest month         2.16
6     Winter vegetation index MR          0.4           Summer vegetation index BI            1.83
7     Summer surface albedo of band 1     0.32          Summer vegetation index NDVI          1.7
8     Summer vegetation index BI          0.32          Winter vegetation index NDVI          1.61
9     Summer vegetation index WI          0.31          Winter vegetation index MR            1.47
10    Winter surface albedo of band 6     0.28          Summer vegetation indices EVI and MR  1.32
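In the decision tree columns of Table 7, the top-ranked variable always has an importance of 1.00. One common convention that produces such a scale is to divide every raw importance score by the largest one. The sketch below illustrates that convention only; whether the software used in the study applies exactly this scaling is an assumption, and the function name and raw scores are hypothetical.

```python
def scale_to_top_variable(raw_importance):
    """Rescale importance scores so the most important variable equals 1.0.

    raw_importance maps a variable name to a non-negative raw score.
    This is an assumed convention, not a documented step of the study.
    """
    top = max(raw_importance.values())
    if top <= 0:
        raise ValueError("at least one importance score must be positive")
    return {name: score / top for name, score in raw_importance.items()}


# Hypothetical raw scores (made-up numbers for illustration only).
raw = {
    "Annual mean temperature": 50.0,
    "Annual precipitation": 44.0,
    "Slope": 40.0,
}
scaled = scale_to_top_variable(raw)
ranking = sorted(scaled, key=scaled.get, reverse=True)
```

Scaling of this kind preserves the ranking within one model but does not make the decision tree and random forest columns directly comparable, since the two importance measures are defined differently.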


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogeneous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010), and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and the use of sufficiently representative training samples and appropriate input variables. In our study, the SVM and MLC models produced simulation results only for variable combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
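The two sources of randomness attributed to the RF model above (trees grown on resampled data subsets, and splits chosen from a random subset of predictor variables) can be sketched in a few lines. This is a deliberately simplified illustration, not the software used in the study: each "tree" is a single-split stump, a real random forest redraws the predictor subset at every node of a deeper tree, and all function names and toy values below are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(rows, labels, feature_ids):
    """Fit a one-split tree, searching only the given candidate predictors."""
    majority = Counter(labels).most_common(1)[0][0]
    best = None  # (impurity, feature, threshold, left class, right class)
    for f in feature_ids:
        values = sorted({row[f] for row in rows})
        # Candidate thresholds are midpoints between neighbouring values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no split possible: always predict the majority class
        return lambda row: majority
    _, f, threshold, left_class, right_class = best
    return lambda row: left_class if row[f] <= threshold else right_class


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Grow stumps on bootstrap samples with random predictor subsets."""
    rng = random.Random(seed)
    n, n_total = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        feats = rng.sample(range(n_total), n_features)   # random predictor subset
        trees.append(fit_stump([rows[i] for i in idx],
                               [labels[i] for i in idx], feats))
    return trees


def predict(trees, row):
    """Classify one row by majority vote over all trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]


# Made-up toy data: [annual mean temperature, slope] per training point.
toy_rows = [[4.0, 25.0], [5.0, 30.0], [6.0, 22.0], [7.0, 28.0],
            [11.0, 5.0], [12.0, 3.0], [13.0, 8.0], [14.0, 6.0]]
toy_labels = ["forest"] * 4 + ["steppe"] * 4
forest = fit_forest(toy_rows, toy_labels)
```

Because every tree sees a different resample and a different predictor, an unlucky split in one tree is outvoted by the others, which is one way to read the observation that the single DT model was less stable than the RF model.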

Important variables in vegetation classification models
Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important


variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.
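As an illustration of how a vegetation index condenses spectral bands into a single vegetation-related variable, the sketch below computes the normalized difference vegetation index (NDVI), one of the indices listed in Table 7, using its standard definition (NIR − Red) / (NIR + Red). The function name and reflectance values are hypothetical; the band selection and preprocessing used in the study are not reproduced here.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance.

    Returns NaN when both bands are zero, because the ratio is undefined.
    """
    total = nir + red
    if total == 0:
        return float("nan")
    return (nir - red) / total


# Made-up surface reflectance values for illustration only.
dense_canopy = ndvi(nir=0.5, red=0.1)    # strong NIR reflectance, low red
bare_surface = ndvi(nir=0.3, red=0.3)    # equal reflectance in both bands
```

Healthy vegetation reflects strongly in the near infrared and absorbs red light, so the first value is clearly positive while the second is zero. Computing such an index separately for summer and winter imagery yields the seasonal index variables of the kind ranked in Table 7.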

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models, to >80% for most forest types. Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7). Climate variables ranked highly as well, since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors such as soil moisture, pH, and nutrients should be considered. Other factors such as actual light, disturbance, biotic interactions, land use, and bioclimatic information could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models, which will require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).
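For readers unfamiliar with how a slope variable is derived from elevation data, the sketch below applies Horn's finite-difference method to a 3 × 3 window of a digital elevation model. This method is a widely used GIS default and is shown here as an assumption; the text does not state which slope algorithm was used in the study, and the function name and elevation values are hypothetical.

```python
import math


def slope_degrees(window, cell_size):
    """Slope in degrees at the centre of a 3x3 elevation window (Horn's method).

    window is three rows of three elevations, listed north to south, in the
    same length unit as cell_size.
    """
    (a, b, c), (d, _centre, f), (g, h, i) = window
    # Weighted east-west and north-south elevation gradients.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Made-up elevations: terrain rising 30 m per 30 m cell towards the east.
tilted = [[100.0, 130.0, 160.0],
          [100.0, 130.0, 160.0],
          [100.0, 130.0, 160.0]]
steep = slope_degrees(tilted, cell_size=30.0)

flat = slope_degrees([[100.0] * 3] * 3, cell_size=30.0)
```

A rise equal to the cell size corresponds to a 45 degree slope, and a constant surface gives 0 degrees. Aspect can be derived from the same two gradients, which is how slope and aspect are typically obtained together from a single elevation grid.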

Other factors affecting classification accuracy
In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data


preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and to the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS
Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (Mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS
We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was funded by the National Key R&D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Grant Disclosures
The following grant information was disclosed by the authors:
National Key R&D Program of China: 2018YFC0506903.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data AvailabilityThe following information was supplied regarding data availability

Data is available at the 4TUCentre for Research Data Yi Sangui (2020) Sample data forsimulation in Jing-Jin-Ji region 4TUResearchData Dataset httpsdoiorg104121uuid1b27dc6b-b77e-4f18-b035-e8a249f595c0

REFERENCESBannari A Morin D Bonn F 1995 A review of vegetation indices Remote Sensing

Reviews 13(1ndash2)95ndash120 DOI 10108002757259509532298Bie SW Beckett PHT 1973 Comparison of four independent soil surveys by air-

photo interpretation paphos area (cyprus) Photogrammetria 29(6)189ndash202DOI 1010160031-8663(73)90001-x

Box EO 1981Macroclimate and plant forms an introduction to predictive modeling inphytogeography (Tasks for vegetation science 1) London DR W Junk Publishers

Box EO 1996 Plant functional types and climate at the global scale Journal of VegetationScience 7(3)309ndash320 DOI 1023073236274

Yi et al (2020) PeerJ DOI 107717peerj9839 2126

Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.


Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.


Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box: a toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.


Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box, Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.


Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000–2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.



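Table 6 Model assessment of formations and subformations by field point data and VMC. Variable combinations were shown in Table 3.

| Variable combination | DT point OA | DT point KC | DT VMC OA | DT VMC KC | RF point OA | RF point KC | RF VMC OA | RF VMC KC | SVM point OA | SVM point KC | SVM VMC OA | SVM VMC KC | MLC point OA | MLC point KC | MLC VMC OA | MLC VMC KC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 23 | 0.14 | 19 | 0.08 | 20 | 0.18 | 5 | 0.02 | 11 | 0.09 | 6 | 0.03 | 8 | 0.06 | 8 | 0.04 |
| 2 | 22 | −0.04 | 49 | 0.04 | 19 | 0.17 | 6 | 0.03 | 13 | 0.11 | 7 | 0.04 | 8 | 0.06 | 13 | 0.05 |
| 3 | 26 | 0.14 | 45 | 0.23 | 29 | 0.27 | 9 | 0.07 | 21 | 0.19 | 10 | 0.07 | 12 | 0.09 | 13 | 0.07 |
| 4 | 30 | 0.20 | 30 | 0.04 | 22 | 0.20 | 7 | 0.04 | 16 | 0.14 | 6 | 0.03 | 9 | 0.07 | 8 | 0.04 |
| 5 | 33 | 0.01 | 67 | 0.00 | 22 | 0.20 | 7 | 0.04 | 15 | 0.13 | 5 | 0.03 | 11 | 0.09 | 10 | 0.04 |
| 6 | 26 | 0.15 | 22 | 0.02 | 31 | 0.30 | 11 | 0.08 | 21 | 0.19 | 8 | 0.06 | 12 | 0.09 | 15 | 0.08 |
| 7 | 33 | 0.20 | 52 | 0.27 | 58 | 0.57‡ | 23 | 0.20 | | | | | | | | |
| 8 | 27 | 0.17 | 34 | 0.18 | 55 | 0.54† | 23 | 0.20 | | | | | | | | |
| 9 | 25 | 0.15 | 22 | 0.15 | 55 | 0.53† | 22 | 0.20 | | | | | | | | |
| 10 | 30 | 0.17 | 41 | 0.22 | 56 | 0.55† | 23 | 0.21 | | | | | | | | |
| 11 | 31 | 0.20 | 41 | 0.22 | 56 | 0.55† | 23 | 0.20 | | | | | | | | |

Notes: DT, decision tree; RF, random forest; SVM, support vector machine; MLC, maximum likelihood classification; point, field point data; VMC, the Vegetation Map of the People's Republic of China; OA, overall accuracy; KC, kappa coefficient.
‡ The kappa coefficient is larger than 0.56.
† The kappa coefficient is larger than 0.4 and less than 0.56.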
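The OA and KC columns of Table 6 are the kind of figures that can be derived from a confusion matrix of reference classes versus simulated classes. The sketch below is a minimal illustration using the standard definitions of overall accuracy and Cohen's kappa; the function name and the 2 × 2 example matrix are hypothetical and are not taken from the study's data or software.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of assessment points whose reference class
    is i and whose simulated class is j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    if n == 0:
        raise ValueError("confusion matrix contains no points")

    # Observed agreement: share of points on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n

    # Chance agreement: expected diagonal share if rows and columns were independent.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class example (not data from the study).
confusion = [
    [40, 10],  # reference class A: 40 simulated as A, 10 as B
    [5, 45],   # reference class B: 5 simulated as A, 45 as B
]
oa, kc = overall_accuracy_and_kappa(confusion)
print(f"OA = {oa:.2f}, KC = {kc:.2f}")  # OA = 0.85, KC = 0.70
```

Because kappa subtracts the agreement expected by chance, a model can show a moderate OA and still have a KC close to zero, as in several rows of Table 6.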


Figure 4 The modeled vegetation map of formations and sub-formations with the highest accuracy by four methods and the VMC in the Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1. Full-size DOI: 10.7717/peerj.9839/fig-4

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracies of the vegetation distribution simulations in units I and II were similar to each other and higher than that of unit III (Tables 4–6).

Different model performances

We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same classification research, especially when combined with environmental variables (Li et al., 2014).

Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices were shown in Table 2.

Vegetation groups

| Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 3.68 |
| 2 | Annual precipitation | 0.88 | Slope | 2.94 |
| 3 | Slope | 0.80 | Mean diurnal range | 2.60 |
| 4 | Winter vegetation index MR | 0.36 | Annual precipitation | 2.38 |
| 5 | Mean diurnal range | 0.33 | Summer vegetation index BI | 1.88 |
| 6 | Summer surface albedo of band 1 | 0.29 | Winter vegetation index NDVI | 1.37 |
| 7 | Summer vegetation index BI | 0.28 | Summer vegetation index EVI | 1.36 |
| 8 | Precipitation of driest month | 0.25 | Winter vegetation index MR | 1.30 |
| 9 | Summer vegetation index EVI | 0.23 | Summer vegetation index NDVI | 1.22 |
| 10 | Winter surface albedo of band 6 | 0.19 | Summer vegetation index MR | 1.12 |

Vegetation types

| Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 3.51 |
| 2 | Slope | 0.83 | Slope | 3.35 |
| 3 | Annual precipitation | 0.51 | Mean diurnal range | 3.06 |
| 4 | Winter vegetation index MR | 0.30 | Annual precipitation | 2.8 |
| 5 | Mean diurnal range | 0.28 | Summer vegetation index BI | 1.84 |
| 6 | Summer vegetation index EVI | 0.22 | Winter vegetation index NDVI | 1.61 |
| 7 | Precipitation of driest month | 0.21 | Winter vegetation index MR | 1.45 |
| 8 | Summer vegetation index BI | 0.20 | Summer vegetation index WI | 1.31 |
| 9 | Summer surface albedo of band 1 | 0.19 | Precipitation of driest month | 1.24 |
| 10 | Winter surface albedo of band 6 | 0.14 | Summer vegetation index NDVI | 1.22 |

Formations and sub-formations

| Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 4.16 |
| 2 | Annual precipitation | 0.86 | Annual precipitation | 3.28 |
| 3 | Slope | 0.63 | Mean diurnal range | 3.25 |
| 4 | Mean diurnal range | 0.52 | Slope | 2.24 |
| 5 | Precipitation of driest month | 0.52 | Precipitation of driest month | 2.16 |
| 6 | Winter vegetation index MR | 0.4 | Summer vegetation index BI | 1.83 |
| 7 | Summer surface albedo of band 1 | 0.32 | Summer vegetation index NDVI | 1.7 |
| 8 | Summer vegetation index BI | 0.32 | Winter vegetation index NDVI | 1.61 |
| 9 | Summer vegetation index WI | 0.31 | Winter vegetation index MR | 1.47 |
| 10 | Winter surface albedo of band 6 | 0.28 | Summer vegetation indices EVI and MR | 1.32 |
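The decision tree columns of Table 7 report importance on a scale where the top-ranked variable scores 1.00. One common way to obtain such a scale, assumed here rather than confirmed by the source, is to divide every raw importance score by the largest one. The raw scores and the function name below are hypothetical, chosen only so that the scaled values reproduce the first three decision tree entries for vegetation groups.

```python
def standardize_importance(raw_scores):
    """Scale importance scores so the largest equals 1.00, sorted from high to low."""
    top = max(raw_scores.values())
    if top <= 0:
        raise ValueError("at least one importance score must be positive")
    ranked = sorted(raw_scores.items(), key=lambda item: item[1], reverse=True)
    return {name: round(score / top, 2) for name, score in ranked}


# Hypothetical raw importance scores (not values from the study).
raw = {
    "Slope": 0.040,
    "Annual mean temperature": 0.050,
    "Annual precipitation": 0.044,
}
scaled = standardize_importance(raw)
for rank, (name, value) in enumerate(scaled.items(), start=1):
    print(rank, name, f"{value:.2f}")
```

Scaling of this kind preserves the ranking within one model, but scores from different models (for example the decision tree and random forest columns) are not directly comparable in magnitude.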

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogeneous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010) and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables. In our study, the SVM and MLC models only produced simulation results for variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
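The random forest idea described above (many trees, each built from a data subset and a random subset of predictors) can be sketched in a few lines. The code below is a simplified illustration and not the implementation used in the study: each "tree" is a one-split stump, every stump is trained on a bootstrap sample with a randomly chosen predictor, and the forest predicts by majority vote. The toy temperature and precipitation values, the class labels, and all function names are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    """Find the one-split rule with the fewest training errors among the given features."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2  # midpoint between neighbouring values
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # every candidate feature is constant in this sample
        label = majority(labels)
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]              # sample rows with replacement
        feats = rng.sample(range(len(rows[0])), n_features)     # random subset of predictors
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    return majority([predict_stump(stump, row) for stump in forest])


# Hypothetical training points: (annual mean temperature, annual precipitation).
rows = [(5.0, 800.0), (6.0, 750.0), (7.0, 700.0), (12.0, 400.0), (13.0, 350.0), (14.0, 300.0)]
labels = ["forest", "forest", "forest", "grassland", "grassland", "grassland"]

forest = fit_forest(rows, labels)
print(predict_forest(forest, (6.0, 760.0)))
print(predict_forest(forest, (13.0, 330.0)))
```

A stump built from an unlucky bootstrap sample can vote for the wrong class, but the majority vote absorbs such errors, which is one intuition for why an ensemble tends to be more stable than a single tree.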

Important variables in vegetation classification models

Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models identify the important variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables carry different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.
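As an example of a spectral variable that reflects vegetation information, the NDVI listed among the important variables is conventionally computed from near-infrared and red reflectance as (NIR − Red) / (NIR + Red). The sketch below uses this standard formula; the exact index definitions used in the study are given in its Table 2 and are not reproduced here, and the reflectance values and pixel names are hypothetical.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return math.nan  # undefined when both bands are zero (e.g. no-data pixels)
    return (nir - red) / total


# Hypothetical (NIR, red) reflectance pairs for three pixels.
pixels = {
    "dense canopy": (0.50, 0.10),
    "sparse grass": (0.30, 0.20),
    "bare soil": (0.25, 0.25),
}
values = {name: ndvi(nir, red) for name, (nir, red) in pixels.items()}
for name, value in values.items():
    print(f"{name}: NDVI = {value:.2f}")
```

Higher values indicate a stronger contrast between near-infrared and red reflectance, which is why indices of this kind can help separate vegetated from non-vegetated surfaces.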

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models (>80% for most forest types). Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors such as soil moisture, pH, and nutrients should be considered. Other factors such as actual light, disturbance, biotic interactions, land use, and bioclimatic information could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models, which will require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy

In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and has reduced the predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated version has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS

Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (the mid-infrared ratio of the winter vegetation index and the brightness index of the summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.

ACKNOWLEDGEMENTS

We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding

This work was funded by the National Key R&D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Grant Disclosures

The following grant information was disclosed by the authors:
National Key R&D Program of China: 2018YFC0506903.

Competing Interests

The authors declare there are no competing interests.


Kabacoff RI 2016 R in action data analysis and graphics with R Second edition BeijingPosts amp Telecom Press

Landis JR Koch GG 1977 The measurement of observer agreement for categorical dataBiometrics 33(1)159ndash174 DOI 1023072529310

Langford ZL Kumar J Hoffman FM Breen AL Iversen CM 2019 Arctic vegetationmapping using unsupervised training datasets and convolutional neural networksRemote Sensing 11(1)69 DOI 103390rs11010069

Lany NK Zarnetske PL Finley AO McCullough DG 2019 Complementary strengthsof spatially-explicit and multi-species distribution models Ecography 421ndash11DOI 101111ecog04728

Yi et al (2020) PeerJ DOI 107717peerj9839 2326

Li CWang J Wang L Hu L Gong P 2014 Comparison of classification algorithmsand training sample sizes in urban land classification with landsat thematic mapperimagery Remote Sensing 6(2)964ndash983 DOI 103390rs6020964

Lin H Yue CWu X Xu H Zheng X 2014 Remote sense images classification byEnmap-Box model Journal of Southwest Forestry University 2(34)67ndash71 (In Chinesewith English abstract) DOI 103969jissn2095-1914201402013

van der Linden S Rabe A HeldM Jakimow B Leitao PJ Okujeni A Schwieder MSuess S Hostert P 2015 The EnMAP-Box-A toolbox and application program-ming interface for EnMAP data processing Remote Sensing 7(9)11249ndash11266DOI 103390rs70911249

Van der Linden S Rabe A HeldMWirth F Suess S Okujeni A Hostert P 2014ImageSVM Classification Manual for Application imageSVM version 30 GermanyHumboldt-Universitaumlt zu Berlin

MaHJ Gao XH Gu XT 2019 Random forest classification of Landsat 8 imageryfor the complex terrain area based on the combination of spectral topographicand texture information Journal of Geo-information Science 21(3)359ndash371DOI 1012082dpxxkx2019180346

ModHK Scherrer D LuotoM Guisan A 2016What we use is not what we knowenvironmental predictors in plant distribution models Journal of Vegetation Science27(6)1308ndash1322 DOI 101111jvs12444

Muchoney D Borak J Chi H Friedl M Gopal S Hodges J Morrow N Strahler A 2000Application of the MODIS global supervised classification model to vegetation andland cover mapping of Central America International Journal of Remote Sensing21(6ndash7)1115ndash1138 DOI 101080014311600210100

Newell CL Leathwick JR 2005Mapping Hurunui forest community distribution usingcomputer models (Science for Conservation 251) Wellington Department ofConservation

Oke OA Thompson KA 2015 Distribution models for mountain plant species the valueof elevation Ecological Modelling 30172ndash77 DOI 101016jecolmodel201501019

Pal M Mather PM 2005 Support vector machines for classification in remote sensingInternational Journal of Remote Sensing 26(5)1007ndash1011DOI 10108001431160512331314083

PengWL Bai ZP Liu XN Cao T 2002 Introduction to remote sensing Beijing HigherEducation Press

Pfeffer K Pebesma EJ Burrough PA 2003Mapping alpine vegetation using vegeta-tion observations and topographic attributes Landscape Ecology 18(8)759ndash776DOI 101023BLAND000001447178787d0

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.

Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box, Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.

Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.

Figure 4 The modeling vegetation map of formations and sub-formations with highest accuracy by four methods and the VMC in Jing-Jin-Ji region. Decision tree model (A), random forest model (B), support vector machine (C), maximum likelihood classification (D), the Vegetation Map of the People's Republic of China (VMC) (E). The legend represents vegetation groups shown in Table 1.

Full-size DOI: 10.7717/peerj.9839/fig-4

learning models and a hierarchical classification system from the VMC to determine the best modeling method for vegetation affected by high socioeconomic disturbance at various classification levels. In the VMC, unit I was the highest classification level, mainly based upon community appearance; unit II was the second highest level, mainly based upon community and climate appearance; and unit III was the medium classification level, based upon the dominant species. The accuracies of the vegetation distribution simulations in units I and II were similar to each other and higher than that of the unit III simulation (Tables 4–6).

Different model performances

We were interested in vegetation distribution modeling's ability to forecast and respond to environmental changes and vegetation pattern management at local to global scales. Vegetation distribution predictions can help explain the relationship between plants and their abiotic and biotic environments (Franklin, 2010). To benefit from ecosystem service functions, people can design vegetation distributions according to distribution and abundance patterns and trends (Hastie, Tibshirani & Friedman, 2009). Vegetation classification has become a widely used ecological method due to a number of new statistical and machine learning methods used alongside mapped biological and environmental data to model vegetation distributions over large spatial scales at higher resolutions (Cutler et al., 2007). Different image classification methods are rarely used together in the same classification research, especially when combined with environmental variables (Li et al., 2014).

Table 7 Top ten most important variables of models in the different vegetation classification units. The abbreviations of indices are shown in Table 2.

Vegetation groups

| Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 3.68 |
| 2 | Annual precipitation | 0.88 | Slope | 2.94 |
| 3 | Slope | 0.80 | Mean diurnal range | 2.60 |
| 4 | Winter vegetation index MR | 0.36 | Annual precipitation | 2.38 |
| 5 | Mean diurnal range | 0.33 | Summer vegetation index BI | 1.88 |
| 6 | Summer surface albedo of band 1 | 0.29 | Winter vegetation index NDVI | 1.37 |
| 7 | Summer vegetation index BI | 0.28 | Summer vegetation index EVI | 1.36 |
| 8 | Precipitation of driest month | 0.25 | Winter vegetation index MR | 1.30 |
| 9 | Summer vegetation index EVI | 0.23 | Summer vegetation index NDVI | 1.22 |
| 10 | Winter surface albedo of band 6 | 0.19 | Summer vegetation index MR | 1.12 |

Vegetation types

| Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 3.51 |
| 2 | Slope | 0.83 | Slope | 3.35 |
| 3 | Annual precipitation | 0.51 | Mean diurnal range | 3.06 |
| 4 | Winter vegetation index MR | 0.30 | Annual precipitation | 2.8 |
| 5 | Mean diurnal range | 0.28 | Summer vegetation index BI | 1.84 |
| 6 | Summer vegetation index EVI | 0.22 | Winter vegetation index NDVI | 1.61 |
| 7 | Precipitation of driest month | 0.21 | Winter vegetation index MR | 1.45 |
| 8 | Summer vegetation index BI | 0.20 | Summer vegetation index WI | 1.31 |
| 9 | Summer surface albedo of band 1 | 0.19 | Precipitation of driest month | 1.24 |
| 10 | Winter surface albedo of band 6 | 0.14 | Summer vegetation index NDVI | 1.22 |

Formations and sub-formations

| Rank | Decision tree: important variable | Standardized importance | Random forest: important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 4.16 |
| 2 | Annual precipitation | 0.86 | Annual precipitation | 3.28 |
| 3 | Slope | 0.63 | Mean diurnal range | 3.25 |
| 4 | Mean diurnal range | 0.52 | Slope | 2.24 |
| 5 | Precipitation of driest month | 0.52 | Precipitation of driest month | 2.16 |
| 6 | Winter vegetation index MR | 0.4 | Summer vegetation index BI | 1.83 |
| 7 | Summer surface albedo of band 1 | 0.32 | Summer vegetation index NDVI | 1.7 |
| 8 | Summer vegetation index BI | 0.32 | Winter vegetation index NDVI | 1.61 |
| 9 | Summer vegetation index WI | 0.31 | Winter vegetation index MR | 1.47 |
| 10 | Winter surface albedo of band 6 | 0.28 | Summer vegetation indices EVI and MR | 1.32 |

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogenous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010), and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables. In our study, only the SVM and MLC models' output simulated the results of variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
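The contrast drawn above between a single decision tree and a random forest can be illustrated with a small sketch. It is a toy written for this explanation, not the software or settings used in the study: the function names, the tree depth limit, the number of trees, and the two-variable training points are all hypothetical. Each tree is grown on its own bootstrap sample (the "data subsets"), each split looks at only a random subset of predictors, and the forest predicts by majority vote.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(rows, labels, n_sub, rng, depth=0, max_depth=5):
    """Grow one tree; each split considers only n_sub randomly chosen predictors."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_sub))
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], n_sub, rng, depth + 1, max_depth),
            grow_tree([r for r, _ in right], [y for _, y in right], n_sub, rng, depth + 1, max_depth))


def predict_tree(tree, row):
    """Walk from the root to a leaf; leaves are plain class labels."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def grow_forest(rows, labels, n_trees=25, seed=0):
    """Bagging: every tree is trained on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(rows[0]) ** 0.5))  # predictors tried per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], n_sub, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Hypothetical training points: (annual mean temperature in deg C, annual precipitation in mm).
points = [(2, 300), (3, 320), (4, 350), (5, 380), (6, 400),
          (11, 550), (12, 600), (13, 620), (14, 650), (15, 700)]
classes = ["steppe"] * 5 + ["forest"] * 5
forest = grow_forest(points, classes)
print(predict_forest(forest, (1, 250)), predict_forest(forest, (16, 750)))
```

Because each tree sees a different resample and a different predictor subset, the errors of individual trees tend to differ, and the vote averages them out. That is one common explanation for why a forest is more stable than any single tree.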

Important variables in vegetation classification models

Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.
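As a concrete case of a spectral variable that reflects vegetation information, the normalized difference vegetation index (NDVI, one of the indices listed in Table 7) contrasts near-infrared and red reflectance. The sketch below uses the standard NDVI definition; the function name, the zero-denominator convention, and the reflectance values are assumptions made for illustration, and the other indices in Table 7 (EVI, BI, MR, WI) are not reproduced here.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 here when both bands are zero."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


# Hypothetical surface reflectances (near-infrared, red) for three pixels.
pixels = {"dense canopy": (0.50, 0.10), "bare soil": (0.30, 0.25), "water": (0.02, 0.05)}
for name, (nir, red) in pixels.items():
    print(f"{name}: NDVI = {ndvi(nir, red):.2f}")
```

Green vegetation reflects strongly in the near-infrared and absorbs red light, so it yields values closer to 1, while soil and water fall near or below zero.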

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models to >80% for most forest types. Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.
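Slope is usually derived from a digital elevation model rather than measured directly. The following sketch shows one simple way to do this with central differences on a square-celled grid. It is an assumption for illustration, not necessarily the procedure used to produce the slope variable in this study; GIS packages often use other neighborhood schemes, and the function name and elevation values are hypothetical.

```python
import math


def slope_degrees(dem, row, col, cell_size):
    """Slope in degrees at an interior cell of a square-celled elevation grid."""
    # Elevation change per unit distance in the column (x) and row (y) directions.
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2.0 * cell_size)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2.0 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Hypothetical 3 x 3 grid (metres) rising 30 m per 30 m cell toward the east.
dem = [[0.0, 30.0, 60.0],
       [0.0, 30.0, 60.0],
       [0.0, 30.0, 60.0]]
print(round(slope_degrees(dem, 1, 1, 30.0), 1))
```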

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors, such as soil moisture, pH, and nutrients, should be considered. Other factors, such as actual light, disturbance, biotic interactions, land use, and bioclimatic information, could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy

In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS

Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (Mid-infrared ratio of winter vegetation index and brightness index of summer vegetation index). We recommend using the RF model to produce or improve the vegetation maps in areas of high human disturbance.
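For readers unfamiliar with the Kappa coefficient used to rank the models, it can be computed from a confusion matrix of reference versus predicted classes. The sketch below assumes the standard Cohen's kappa definition, (observed agreement minus chance agreement) divided by (1 minus chance agreement). The function name and the matrix values are invented for illustration and are not results from this study.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: reference, columns: predicted)."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)) / total ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when all points fall in a single class")
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class assessment of 100 points.
confusion = [[40, 10],
             [10, 40]]
print(round(cohen_kappa(confusion), 2))
```

Unlike overall accuracy, kappa discounts the agreement that would be expected by chance given the class proportions, which matters when some vegetation classes are much more common than others.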

ACKNOWLEDGEMENTS

We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding

This work was funded by the National Key R & D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures

The following grant information was disclosed by the authors:
National Key R & D Program of China: 2018YFC0506903.

Competing Interests

The authors declare there are no competing interests.

Author Contributions

• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0

REFERENCES

Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.

Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.

Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: Dr. W. Junk Publishers.

Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.

Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.

Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.

Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box - A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM Iverson LR Liaw A 2006 Newer classification and regression treetechniques bagging and random forests for ecological prediction Ecosystems9(2)181ndash199 DOI 101007s10021-005-0054-1

Price KP Guo XL Stiles JM 2002 Optimal Landsat TM band combinations and vegeta-tion indices for discrimination of six grassland types in eastern Kansas InternationalJournal of Remote Sensing 23(23)5031ndash5042 DOI 10108001431160210121764

Yi et al (2020) PeerJ DOI 107717peerj9839 2426

Rabe A Jakimow B HeldM Van der Linden S Hostert P 2014 EnMAP-Box Version20 Available at httpswwwenmaporg

Sanchez-Hernandez C Boyd DS Foody GM 2007Mapping specific habitats from re-motely sensed imagery support vector machine and support vector data descriptionbased classification of coastal saltmarsh habitats Ecological Informatics 2(2)83ndash88DOI 101016jecoinf200704003

Sesnie SE Finegan B Gessler PE Thessler S Bendana ZR Smith AMS 2010 Themultispectral separability of Costa Rican rainforest types with support vectormachines and Random Forest decision trees International Journal of Remote Sensing31(11)2885ndash2909 DOI 10108001431160903140803

Sesnie SE Gessler PE Finegan B Thessler S 2008 Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and changedetection in complex neotropical environments Remote Sensing of Environment112(5)2145ndash2159 DOI 101016jrse200708025

Sitch S Smith B Prentice IC Arneth A Bondeau A CramerW Kaplan JOLevis S LuchtW Sykes MT Thonicke K Venevsky S 2003 Evaluation ofecosystem dynamics plant geography and terrestrial carbon cycling in theLPJ dynamic global vegetation model Global Change Biology 9(2)161ndash185DOI 101046j1365-2486200300569x

SongMH Zhou CP Ouyang H 2005 Simulated distribution of vegetation types inresponse to climate change on the Tibetan Plateau Journal of Vegetation Science16(3)341ndash350 DOI 101111j1654-11032005tb02372x

Wang L Gong BL 2018 Collaborative governance of ecological space in Beijing-Tianjin-Hebei region Journal of Tianjin Administration Institute 20(5)38ndash44 (Chinese)DOI 1016326jcnki1008-7168201805005

Wang X Liu G Coscieme L Giannetti BF Hao Y Zhang Y BrownMT 2019 Studyon the emergy-based thermodynamic geography of the Jing-Jin-Ji region combinedmultivariate statistical data with DMSP-OLS nighttime lights data EcologicalModelling 3971ndash15 DOI 101016jecolmodel201901021

Wehkamp J Pietsch SA Fuss S Gusti M ReuterWH Koch N Kindermann G KraxnerF 2018 Accounting for institutional quality in global forest modeling Environmen-tal Modelling amp Software 102250ndash259 DOI 101016jenvsoft201801020

Weng ES Zhou GS 2006Modeling distribution changes of vegetation in Chinaunder future climate change Environmental Modeling amp Assessment 11(1)45ndash58DOI 101007s10666-005-9019-1

WuYWWang NL Li Z Chen AA Guo ZM Qie YF 2019 The effect of thermalradiation from surrounding terrain on glacier surface temperatures retrieved fromremote sensing data a case study from Qiyi Glacier China Remote Sensing ofEnvironment 2311ndash9 DOI 101016jrse2019111267

Xie YC Sha ZY YuM 2008 Remote sensing imagery in vegetation mapping a reviewJournal of Plant Ecology 1(1)9ndash23 DOI 101093jpertm005

Yi et al (2020) PeerJ DOI 107717peerj9839 2526

Zhang G Biradar CM Xiao X Dong J Zhou Y Qin Y Zhang Y Liu F DingM ThomasRJ 2018 Exacerbated grassland degradation and desertification in Central Asiaduring 2000-2014 Ecological Applications 28(2)442ndash456 DOI 101002eap1660

ZhangWT DongW 2017 SPSS statistical analysis advanced tutorial Third editionBeijing Higher Education Press

Zhang Z Clercq EDe Ou XKWulf RDe Verbeke L 2008Mapping dominantvegetation communities at Meili Snow Mountain Yunnan Province China usingsatellite imagery and plant community data Geocarto International 23(2)135ndash153DOI 10108010106040701337410

ZhaoMS Neilson RP Yan XD DongWJ 2002Modelling the vegetation of Chinaunder changing climate Acta Geographica Sinica 57(1)28ndash38 (Chinese)

Zhao X Su Y Hu T Chen L Gao SWang R Jin S Guo Q 2018 A global correctedSRTM DEM product for vegetated areas Remote Sensing Letters 9(4)393ndash402DOI 1010802150704x20181425560

Zheng Y Xie Z Jiang L Shimizu H Drake S 2006 Changes in Holdridge Life Zone di-versity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40years Journal of Arid Environments 66(1)113ndash126 DOI 101016jjaridenv200509005

Zhou J Lai L Guan T CaiW Gao N Zhang X Yang D Cong Z Zheng Y 2016 Com-parison modeling for alpine vegetation distribution in an arid area EnvironmentalMonitoring and Assessment 188408 DOI 101007s10661-016-5417-x

Yi et al (2020) PeerJ DOI 107717peerj9839 2626

Page 17: Simulating highly disturbed vegetation distribution: …management, and conservation planning require vegetation distribution maps (Franklin, 2010) to better understand, use, and monitor

Table 7. Top ten most important variables of the models in the different vegetation classification units. The abbreviations of the indices are given in Table 2. Decision tree (DT) values are standardized importance; random forest (RF) values are normalized importance.

Vegetation groups

| Rank | DT important variable | Standardized importance | RF important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 3.68 |
| 2 | Annual precipitation | 0.88 | Slope | 2.94 |
| 3 | Slope | 0.80 | Mean diurnal range | 2.60 |
| 4 | Winter vegetation index MR | 0.36 | Annual precipitation | 2.38 |
| 5 | Mean diurnal range | 0.33 | Summer vegetation index BI | 1.88 |
| 6 | Summer surface albedo of band 1 | 0.29 | Winter vegetation index NDVI | 1.37 |
| 7 | Summer vegetation index BI | 0.28 | Summer vegetation index EVI | 1.36 |
| 8 | Precipitation of driest month | 0.25 | Winter vegetation index MR | 1.30 |
| 9 | Summer vegetation index EVI | 0.23 | Summer vegetation index NDVI | 1.22 |
| 10 | Winter surface albedo of band 6 | 0.19 | Summer vegetation index MR | 1.12 |

Vegetation types

| Rank | DT important variable | Standardized importance | RF important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 3.51 |
| 2 | Slope | 0.83 | Slope | 3.35 |
| 3 | Annual precipitation | 0.51 | Mean diurnal range | 3.06 |
| 4 | Winter vegetation index MR | 0.30 | Annual precipitation | 2.8 |
| 5 | Mean diurnal range | 0.28 | Summer vegetation index BI | 1.84 |
| 6 | Summer vegetation index EVI | 0.22 | Winter vegetation index NDVI | 1.61 |
| 7 | Precipitation of driest month | 0.21 | Winter vegetation index MR | 1.45 |
| 8 | Summer vegetation index BI | 0.20 | Summer vegetation index WI | 1.31 |
| 9 | Summer surface albedo of band 1 | 0.19 | Precipitation of driest month | 1.24 |
| 10 | Winter surface albedo of band 6 | 0.14 | Summer vegetation index NDVI | 1.22 |

Formations and sub-formations

| Rank | DT important variable | Standardized importance | RF important variable | Normalized importance |
|---|---|---|---|---|
| 1 | Annual mean temperature | 1.00 | Annual mean temperature | 4.16 |
| 2 | Annual precipitation | 0.86 | Annual precipitation | 3.28 |
| 3 | Slope | 0.63 | Mean diurnal range | 3.25 |
| 4 | Mean diurnal range | 0.52 | Slope | 2.24 |
| 5 | Precipitation of driest month | 0.52 | Precipitation of driest month | 2.16 |
| 6 | Winter vegetation index MR | 0.4 | Summer vegetation index BI | 1.83 |
| 7 | Summer surface albedo of band 1 | 0.32 | Summer vegetation index NDVI | 1.7 |
| 8 | Summer vegetation index BI | 0.32 | Winter vegetation index NDVI | 1.61 |
| 9 | Summer vegetation index WI | 0.31 | Winter vegetation index MR | 1.47 |
| 10 | Winter surface albedo of band 6 | 0.28 | Summer vegetation indices EVI and MR | 1.32 |


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogeneous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010) and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables. In our study, only the SVM and MLC models' output simulated the results of variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
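The contrast between a single tree and a random forest described above can be illustrated with a minimal sketch. It is not the software configuration used in this study: the one-split "stump" learner, the function names (`fit_stump`, `fit_forest`, `predict_forest`), and any sample points passed to them are assumptions made purely for illustration. Each tree is trained on a bootstrap sample with a random subset of predictors, and the forest predicts by majority vote. Because a stump has a single split, drawing the predictor subset once per tree is equivalent here to drawing it at every split.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Fit a one-split tree: choose the (feature, threshold) with fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No valid split exists (all rows identical): always predict the majority class.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random predictor subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feats = rng.sample(range(len(rows[0])), n_features)  # random predictor subset
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees in the forest."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]
```

A single stump depends heavily on which samples and predictors it happens to see, whereas the vote over many resampled stumps averages out that instability, which is the property attributed to the RF model in the paragraph above.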

Important variables in vegetation classification models

Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.
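As an example of a spectral variable that reflects vegetation information, the normalized difference vegetation index (NDVI, one of the indices listed in Table 7) is the standard ratio (NIR - Red) / (NIR + Red). The sketch below assumes surface reflectance inputs; the function name and the convention of returning 0.0 when both bands are zero are hypothetical choices and are not taken from the processing chain of this study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from near-infrared and red reflectance."""
    denom = nir + red
    if denom == 0:
        return 0.0  # assumed convention for pixels with no signal in either band
    return (nir - red) / denom
```

Dense green vegetation reflects strongly in the near-infrared and weakly in the red, so it yields values close to 1, while bare soil and built-up surfaces yield values near 0.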

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models (>80%) for most forest types. Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7). Climate variables ranked highly as well (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.
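To make the slope variable concrete, the following sketch derives slope in degrees from a gridded digital elevation model (DEM) using central differences. It is an illustration only: the function name, the list-of-lists DEM layout, and the `cell_size` parameter are assumptions, and GIS packages often use other estimators (for example, a 3 x 3 weighted kernel), so values may differ slightly from those used in this study.

```python
import math


def slope_degrees(dem, row, col, cell_size):
    """Slope at an interior DEM cell, in degrees, from central differences.

    dem is a list of rows of elevations; cell_size is the grid spacing
    in the same units as the elevations.
    """
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * cell_size)
    # The slope angle is the arctangent of the gradient magnitude.
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
```

A surface that rises by one cell width per cell in a single direction gives a gradient magnitude of 1 and therefore a slope of 45 degrees, while a flat surface gives 0.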

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors such as soil moisture, pH, and nutrients should be considered. Other factors, such as actual light, disturbance, biotic interactions, land use, and bioclimatic information, could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models, which will require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy

In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and to the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS

Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable at simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (the mid-infrared ratio of the winter vegetation index and the brightness index of the summer vegetation index). We recommend using the RF model to produce or improve vegetation maps in areas of high human disturbance.
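Because the model ranking above rests on the Kappa coefficient, a short sketch of how Cohen's kappa is computed from a confusion matrix may be useful. The function name and the matrix layout (rows as reference classes, columns as predicted classes) are assumptions for illustration; the confusion matrices of this study are not reproduced here.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (list of rows of counts)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (n * n)
    # Undefined (division by zero) when expected agreement equals 1.
    return (observed - expected) / (1 - expected)
```

Kappa discounts the agreement that would be expected by chance from the class marginals, so it is a stricter measure than overall accuracy when some vegetation classes dominate the assessment points.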

ACKNOWLEDGEMENTS

We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding

This work was funded by the National Key R&D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Grant Disclosures

The following grant information was disclosed by the authors: National Key R&D Program of China: 2018YFC0506903.

Competing Interests

The authors declare there are no competing interests.

Author Contributions

• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0

REFERENCES

Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.

Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.

Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: Dr. W. Junk Publishers.

Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.


Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.


Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.


Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box - a toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.


Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box. Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.

Yi et al (2020) PeerJ DOI 107717peerj9839 2526

Zhang G Biradar CM Xiao X Dong J Zhou Y Qin Y Zhang Y Liu F DingM ThomasRJ 2018 Exacerbated grassland degradation and desertification in Central Asiaduring 2000-2014 Ecological Applications 28(2)442ndash456 DOI 101002eap1660

ZhangWT DongW 2017 SPSS statistical analysis advanced tutorial Third editionBeijing Higher Education Press

Zhang Z Clercq EDe Ou XKWulf RDe Verbeke L 2008Mapping dominantvegetation communities at Meili Snow Mountain Yunnan Province China usingsatellite imagery and plant community data Geocarto International 23(2)135ndash153DOI 10108010106040701337410

ZhaoMS Neilson RP Yan XD DongWJ 2002Modelling the vegetation of Chinaunder changing climate Acta Geographica Sinica 57(1)28ndash38 (Chinese)

Zhao X Su Y Hu T Chen L Gao SWang R Jin S Guo Q 2018 A global correctedSRTM DEM product for vegetated areas Remote Sensing Letters 9(4)393ndash402DOI 1010802150704x20181425560

Zheng Y Xie Z Jiang L Shimizu H Drake S 2006 Changes in Holdridge Life Zone di-versity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40years Journal of Arid Environments 66(1)113ndash126 DOI 101016jjaridenv200509005

Zhou J Lai L Guan T CaiW Gao N Zhang X Yang D Cong Z Zheng Y 2016 Com-parison modeling for alpine vegetation distribution in an arid area EnvironmentalMonitoring and Assessment 188408 DOI 101007s10661-016-5417-x

Yi et al (2020) PeerJ DOI 107717peerj9839 2626


classification research, especially when combined with environmental variables (Li et al., 2014).

In this study, the RF model performed better than the DT, SVM, and MLC models across the three classification levels. This finding was consistent with the results of other studies that found that the RF method modeled vegetation distribution better than other methods (Prasad, Iverson & Liaw, 2006). The DT model divided the data into homogeneous subgroups according to the range of predictor variable values. The DT model was generally able to handle a large number of independent variables and could build a tree model faster than the other methods. However, the DT model was somewhat unstable for vegetation distribution modeling and had lower classification accuracy (Zhou et al., 2016). The RF model generated a large number of independent trees through data subsets and developed a split in every tree model using a random subset of predictor variables. Therefore, we concluded that the RF model was generally better than the DT model. The SVM model was developed from statistical learning methods and discriminated class samples by locating potentially nonlinear or multiple linear boundaries between individual training points (Burai et al., 2015). The aim of the MLC model was to maximize the overall probability that a pixel is correctly assigned to a class. However, the MLC model requires a large number of training samples, which limits its application (Sesnie et al., 2010). Previous research has shown that classification accuracies when using the SVM classifier were higher than those of the MLC model (Pal & Mather, 2005; Boyd, Sanchez-Hernandez & Foody, 2006; Sanchez-Hernandez, Boyd & Foody, 2007; Sesnie et al., 2010). Because the model had fewer requirements, the DT method provided significantly more accurate classifications than those of the MLC model (Boyd, Sanchez-Hernandez & Foody, 2006). Other studies found that the RF and SVM models were similarly accurate (65.3% and 66.6%, respectively) (Sesnie et al., 2010), and that the RF, MLC, DT, and SVM models performed similarly and reasonably well when simulating land use classification (Li et al., 2014). In addition to the methods mentioned above, an artificial neural network implemented at a regional scale produced classification accuracies of 60–80% (Muchoney et al., 2000; Haslem et al., 2010). In the Arctic, this method provided the most accurate vegetation mapping (Langford et al., 2019). The similarly positive results of these models may be due to the relatively large differences between classification objects and their use of sufficiently representative training samples and appropriate input variables. In our study, only the SVM and MLC models' output simulated the results of variable Combinations 1 to 6. This may be due to the poor separability of the training samples, as the models could not recognize the training points or their vegetation categories (Jarnevich et al., 2015). The Jing-Jin-Ji region has many types of vegetation with very small distribution areas, so the selected training points may have been insufficient. Future training points for these vegetation types should be selected using field surveys, and more suitable models for modeling global vegetation distribution should be developed and tested (Jiang et al., 2012).
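The two sources of randomness that distinguish the RF model from a single DT (a bootstrap data subset per tree and a random subset of predictors per split) can be illustrated with a toy sketch. This is not the software or configuration used in the study: the one-split trees ("stumps"), the function names, and the six sample points are assumptions made purely for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Fit a one-split tree: pick the (feature, threshold) with fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must leave samples on both sides
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No valid split (all sampled rows identical): predict the majority class.
        maj = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), maj, maj)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_sub=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random predictor subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap: sample with replacement
        feats = rng.sample(range(len(rows[0])), n_sub)   # random subset of predictors
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over the per-tree predictions."""
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical training points: (elevation-like value, temperature-like value).
rows = [(1, 10), (2, 11), (3, 12), (8, 20), (9, 21), (10, 22)]
labels = ["grass", "grass", "grass", "forest", "forest", "forest"]
forest = fit_forest(rows, labels)
```

Because each stump sees a different resample and a different predictor, a single unlucky tree has little influence on the vote, which is one common explanation for the lower variance of RF relative to a single DT.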

Important variables in vegetation classification models

Variable selection is directly related to the vegetation distribution model's ability to capture important environmental factors (Mod et al., 2016). Models predict the important variables that drive the distribution of vegetation (Prasad, Iverson & Liaw, 2006). Vegetation distribution is predominantly driven by temperature, precipitation, and topographical variables (Franklin, 1995; Mod et al., 2016; Prasad, Iverson & Liaw, 2006), specifically those related to physiological tolerance, site energy, and moisture balance (Franklin, 1995). In addition to environmental variables, some spectral variables are used as input variables. However, the overuse of spectral variables can actually decrease discrimination accuracy, meaning that only spectral variables reflecting vegetation information should be selected, such as those related to the visible spectrum, infrared spectrum, and vegetation indices (Price, Guo & Stiles, 2002; Zhou et al., 2016). Different variables respond to different information. Spectral variables directly reflect land surface object information, while geospatial and climatic variables reveal information about the vegetative environment.
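As a generic illustration of a spectral variable that reflects vegetation information, the widely used normalized difference vegetation index (NDVI) combines the red and near-infrared reflectances of a pixel. NDVI is given here only as a familiar example of a vegetation index; it is not necessarily one of the indices used in this study, and the function name and reflectance values are assumptions.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel's reflectances."""
    total = nir + red
    if total == 0:
        return None  # undefined when both bands are zero
    return (nir - red) / total


# Hypothetical reflectances: dense vegetation reflects strongly in the near infrared.
vegetated = ndvi(nir=0.5, red=0.1)
bare = ndvi(nir=0.2, red=0.2)
```

Values near 1 indicate dense green vegetation, and values near 0 indicate bare or sparsely vegetated surfaces.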

Terrain, an important variable in vegetation distribution models, has long been used to improve map accuracy, especially for regions with large elevation differences (Dobrowski et al., 2008; Oke & Thompson, 2015). Sesnie et al. (2010) found that adding elevation as a predictive variable dramatically improved the accuracies of the SVM and RF models, to >80% for most forest types. Slopes with similar elevations but different aspects have very different soil and vegetation temperatures (Gunton, Polce & Kunin, 2015; Mod et al., 2016). Dobrowski et al. (2008) highlighted the importance of slope and aspect when mapping vegetation communities in the Sierra Nevada. Slope was also an important variable in this study (Table 7), since different types of vegetation require different precipitation and temperature levels and have different tolerances to extreme heat and cold. The significance of these climate variables (annual mean temperature, temperature range, and annual precipitation) has been validated in other studies (Prasad, Iverson & Liaw, 2006; Sesnie et al., 2008). We looked at two surface albedo indices (the summer surface albedo of band 1 and the winter surface albedo of band 6). Sesnie et al. (2010) combined elevation and spectral band data to increase the classification accuracy to a satisfactory level for most forest types. De Colstoun et al. (2003) obtained high accuracies (80%) when classifying coniferous, temperate broad-leaf, and mixed forest types using Landsat ETM+ bands. Other studies have used different vegetation index variables (Price, Guo & Stiles, 2002; Zhou et al., 2016) specific to their study areas and data.
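Slope and aspect are usually derived from a digital elevation model rather than measured directly. The sketch below uses Horn's finite-difference method on a 3 x 3 elevation window, a convention common in GIS software. It is an assumption that a method of this kind applies here, since the text does not state how the terrain variables were computed, and the function name and sample windows are hypothetical.

```python
import math


def slope_aspect(window, cell_size):
    """Return (slope_deg, aspect_deg) for the centre cell of a 3x3 elevation window.

    Rows run north to south and columns west to east. Aspect is the downslope
    direction in degrees clockwise from north, or None on flat ground.
    """
    (a, b, c), (d, _e, f), (g, h, i) = window
    # Horn's weighted differences: eastward and southward elevation gradients.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cell_size)
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None
    # Angle of the downhill vector measured counterclockwise from east,
    # then converted to a compass bearing.
    angle = math.degrees(math.atan2(dz_dy, -dz_dx))
    aspect = (90.0 - angle) % 360.0
    return slope, aspect


# Hypothetical windows: terrain rising toward the east, and toward the north.
east_rising = slope_aspect([[0, 1, 2], [0, 1, 2], [0, 1, 2]], cell_size=1.0)
north_rising = slope_aspect([[2, 2, 2], [1, 1, 1], [0, 0, 0]], cell_size=1.0)
```

A surface rising toward the east faces west (aspect 270), and one rising toward the north faces south (aspect 180), which is the distinction behind the temperature differences between slopes noted above.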

The input variables used in our vegetation distribution model are not exhaustive. Ecophysiologically meaningful predictors, such as soil moisture, pH, and nutrients, should be considered. Other factors, such as actual light, disturbance, biotic interactions, land use, and bioclimatic information, could also be incorporated into vegetation distribution models (Dobrowski et al., 2008; Mod et al., 2016; Prasad, Iverson & Liaw, 2006; Sesnie et al., 2010). We suggest building more ecophysiologically sound vegetation distribution models that require a collaborative effort across the ecological, geographical, and environmental sciences (Mod et al., 2016).

Other factors affecting classification accuracy

In addition to classification units, models, and input variables, classification accuracy is affected by other factors, including algorithm error and image data (Li et al., 2014). We must acknowledge the existence of errors in random sample selection, modeling, and data preprocessing algorithms. Remote sensing data sources, as well as the date and processing of selected images, vary, resulting in different simulated values and accuracies (Price, Guo & Stiles, 2002). Remote sensing images with high spectral and spatial resolutions provide rich spectral and ground information, moderately improving the predictive ability of the vegetation distribution model (Peng et al., 2002). However, the use of high spectral and spatial resolution images creates a greater demand for data access, larger computer storage capacities, and faster data processors (Price, Guo & Stiles, 2002), which is why we did not use high spectral and spatial resolution images in this study. Moreover, some cultivated vegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by human disturbance, which affects their water-heat conditions and soil nutrition. Urbanization reduces vegetation, transforming some areas into industrial, commercial, and residential land. This has led to the direct or indirect pollution of the water, soil, and air, and to the reduced predictive ability of vegetation distribution models. The VMC we used for model assessment was published in 2007, and no updated study has been published over the past 10 years. The current state of the Jing-Jin-Ji region's vegetation no longer coincides with the VMC's assessment.

CONCLUSIONS

Our main objective was to determine the best simulation method for vegetation affected by high socioeconomic disturbance in the Jing-Jin-Ji region. The RF model was the most capable of simulating vegetation distribution across all three units. The DT model could simulate the vegetation distribution in units I and II. The SVM and MLC models could not simulate the distribution in any of the three units. Based on the Kappa coefficient, the RF model was generally better than the DT model and was the most suitable model for simulating vegetation distribution in the Jing-Jin-Ji region. The most important variables affecting vegetation classification accuracy were three climate variables (annual mean temperature, mean diurnal range, and annual precipitation), one geospatial variable (slope), and two spectral variables (the mid-infrared ratio of the winter vegetation index and the brightness index of the summer vegetation index). We recommend using the RF model to produce or improve vegetation maps in areas of high human disturbance.
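For readers who want to reproduce the model ranking, the Kappa coefficient can be computed from a confusion matrix of reference against simulated classes. The sketch below implements the standard Cohen's Kappa formula. The function name and the two-class counts are illustrative assumptions and are not results from this study.

```python
def cohen_kappa(confusion):
    """Cohen's Kappa for a square confusion matrix (rows: reference, columns: simulated)."""
    n = sum(sum(row) for row in confusion)
    if n == 0:
        raise ValueError("confusion matrix is empty")
    k = len(confusion)
    # Observed agreement: share of points on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[r][c] for r in range(k)) for c in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(rt * ct for rt, ct in zip(row_totals, col_totals)) / (n * n)
    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical assessment of 50 points in two vegetation classes.
kappa = cohen_kappa([[20, 5], [10, 15]])
```

Here the observed agreement is 0.7 and the chance agreement is 0.5, so Kappa is 0.4. Unlike overall accuracy, Kappa discounts the agreement that would be expected by chance alone.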

ACKNOWLEDGEMENTS

We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding

This work was funded by the National Key R&D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures

The following grant information was disclosed by the authors:
National Key R&D Program of China: 2018YFC0506903.

Competing Interests

The authors declare there are no competing interests.

Author Contributions

• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0

REFERENCES

Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.

Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-X.

Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: Dr. W. Junk Publishers.

Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.


Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:LRIEAO]2.0.CO;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/S0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.


Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/S0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210X.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.


Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box-A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.


Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box, Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.


Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704X.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.


Page 19: Simulating highly disturbed vegetation distribution: …management, and conservation planning require vegetation distribution maps (Franklin, 2010) to better understand, use, and monitor

variables that drive the distribution of vegetation (Prasad Iverson amp Liaw 2006) Vegetationdistribution is predominantly driven by temperature precipitation and topographicalvariables (Franklin 1995Mod et al 2016 Prasad Iverson amp Liaw 2006) specifically thoserelated to physiological tolerance site energy and moisture balance (Franklin 1995) Inaddition to environmental variables some spectral variables are used as input variablesHowever the overuse of spectral variables can actually decrease discrimination accuracymeaning that only spectral variables reflecting vegetation information should be selectedsuch as those related to the visible spectrum infrared spectrum and vegetation indices(Price Guo amp Stiles 2002 Zhou et al 2016) Different variables respond to differentinformation Spectral variables directly reflect land surface object information whilegeospatial and climatic variables reveal information about the vegetative environment

Terrain an important variable in vegetation distribution models has long been used toimprove map accuracy especially for regions with large elevation differences (Dobrowskiet al 2008 Oke amp Thompson 2015) Sesnie et al (2010) found that adding elevation asa predictive variable dramatically improved the accuracies of the SVM and RF modelsgt80 for most forest types Slopes with similar elevations but different aspects have verydifferent soil and vegetation temperatures (Gunton Polce amp Kunin 2015Mod et al 2016)Dobrowski et al (2008) highlighted the importance of slope and aspect when mappingvegetation communities in the Sierra Nevada Slope was also an important variable inthis study (Table 7) since different types of vegetation require different precipitation andtemperature levels and have different tolerances to extreme heat and cold The significanceof these climate variables (annual mean temperature temperature range and annualprecipitation) has been validated in other studies (Prasad Iverson amp Liaw 2006 Sesnie etal 2008) We looked at two surface albedo indices (the summer surface albedo of band1 and the winter surface albedo of band 6) Sesnie et al (2010) combined elevation andspectral band data to increase the classification accuracy to a satisfactory level for mostforest types De Colstoun et al (2003) obtained high accuracies (80) when classifyingconiferous temperate broad-leaf and mixed forest types using Landsat ETM+ bandsOther studies have used different vegetation index variables (Price Guo amp Stiles 2002Zhou et al 2016) specific to their study areas and data

The input variables used in our vegetation distribution model are not exhaustiveEcophysiologically meaningful predictors such as soil moisture pH and nutrients shouldbe considered Other factors such as actual light disturbance biotic interactions landuse and bioclimatic information could also be incorporated into vegetation distributionmodels (Dobrowski et al 2008 Mod et al 2016 Prasad Iverson amp Liaw 2006 Sesnie et al2010) We suggest building more ecophysiologically sound vegetation distribution modelsthat require a collaborative effort across the ecological geographical and environmentalsciences (Mod et al 2016)

Other factors affecting classification accuracyIn addition to classification units models and input variables classification accuracy isaffected by other factors including algorithm error and image data (Li et al 2014) Wemust acknowledge the existence of errors in random sample selection modeling and data

Yi et al (2020) PeerJ DOI 107717peerj9839 1926

preprocessing algorithms Remote sensing data sources as well as the date and processingof selected images vary resulting in different simulated values and accuracies (Price Guoamp Stiles 2002) Remote sensing images with high spectral and spatial resolutions providerich spectral and ground information moderately improving the predictive ability of thevegetation distribution model (Peng et al 2002) However the use of high spectral andspatial resolution images creates a greater demand for data access larger computer storagecapacities and faster data processors (Price Guo amp Stiles 2002) which is why we did notuse high spectral and spatial resolution images in this study Moreover some cultivatedvegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by humandisturbance which affects their water-heat conditions and soil nutrition Urbanizationreduces vegetation transforming some areas into industrial commercial and residentialland This has led to the direct or indirect pollution of the water soil and air and thereduced predictive ability of vegetation distribution models The VMC we used for modelassessment was published in 2007 and no updated study has been published over the past10 years The current state of the Jing-Jin-Ji regionrsquos vegetation no longer coincides withthe VMCrsquos assessment

CONCLUSIONSOur main objective was to determine the best simulation method for vegetation affectedby high socioeconomic disturbance in the Jing-Jin-Ji region The RF model was the mostcapable at simulating vegetation distribution across all three units The DT model couldsimulate the vegetation distribution in units I and II The SVM andMLCmodels could notsimulate the distribution in any of the three units Based on the Kappa coefficient the RFmodel was generally better than the DT model and the most suitable model for simulatingvegetation distribution in the Jing-Jin-Ji region The most important variables affectingvegetation classification accuracy were three climate variables (annual mean temperaturemean diurnal range and annual precipitation) one geospatial variable (slope) and twospectral variables (Mid-infrared ratio of winter vegetation index and brightness index ofsummer vegetation index) We recommend using the RF model to produce or improvethe vegetation maps in areas of high human disturbance

ACKNOWLEDGEMENTS
We thank two anonymous reviewers and the editor for their effort to review this manuscript.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was funded by the National Key R & D Program of China (2018YFC0506903). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Yi et al. (2020), PeerJ, DOI 10.7717/peerj.9839 20/26

Grant Disclosures
The following grant information was disclosed by the authors:
National Key R & D Program of China: 2018YFC0506903.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0.

REFERENCES

Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.

Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.

Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: Dr W. Junk Publishers.

Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.

Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the 'sky islands' of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People's Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat's role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.

Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.

Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box-A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution, using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.

Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box. Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.

Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.

Page 20: Simulating highly disturbed vegetation distribution: …management, and conservation planning require vegetation distribution maps (Franklin, 2010) to better understand, use, and monitor

preprocessing algorithms Remote sensing data sources as well as the date and processingof selected images vary resulting in different simulated values and accuracies (Price Guoamp Stiles 2002) Remote sensing images with high spectral and spatial resolutions providerich spectral and ground information moderately improving the predictive ability of thevegetation distribution model (Peng et al 2002) However the use of high spectral andspatial resolution images creates a greater demand for data access larger computer storagecapacities and faster data processors (Price Guo amp Stiles 2002) which is why we did notuse high spectral and spatial resolution images in this study Moreover some cultivatedvegetation and shelter forests in the Jing-Jin-Ji region are greatly affected by humandisturbance which affects their water-heat conditions and soil nutrition Urbanizationreduces vegetation transforming some areas into industrial commercial and residentialland This has led to the direct or indirect pollution of the water soil and air and thereduced predictive ability of vegetation distribution models The VMC we used for modelassessment was published in 2007 and no updated study has been published over the past10 years The current state of the Jing-Jin-Ji regionrsquos vegetation no longer coincides withthe VMCrsquos assessment

CONCLUSIONSOur main objective was to determine the best simulation method for vegetation affectedby high socioeconomic disturbance in the Jing-Jin-Ji region The RF model was the mostcapable at simulating vegetation distribution across all three units The DT model couldsimulate the vegetation distribution in units I and II The SVM andMLCmodels could notsimulate the distribution in any of the three units Based on the Kappa coefficient the RFmodel was generally better than the DT model and the most suitable model for simulatingvegetation distribution in the Jing-Jin-Ji region The most important variables affectingvegetation classification accuracy were three climate variables (annual mean temperaturemean diurnal range and annual precipitation) one geospatial variable (slope) and twospectral variables (Mid-infrared ratio of winter vegetation index and brightness index ofsummer vegetation index) We recommend using the RF model to produce or improvethe vegetation maps in areas of high human disturbance

ACKNOWLEDGEMENTSWe thank two anonymous reviewers and the editor for their effort to review thismanuscript

ADDITIONAL INFORMATION AND DECLARATIONS

FundingThis work was funded by the National Key R amp D Program of China (2018YFC0506903)The funders had no role in study design data collection and analysis decision to publishor preparation of the manuscript

Yi et al (2020) PeerJ DOI 107717peerj9839 2026

Grant DisclosuresThe following grant information was disclosed by the authorsNational Key R amp D Program of China 2018YFC0506903

Competing InterestsThe authors declare there are no competing interests

Author Contributionsbull Sangui Yi conceived and designed the experiments performed the experiments analyzedthe data prepared figures andor tables authored or reviewed drafts of the paper andapproved the final draftbull Jihua Zhou conceived and designed the experiments prepared figures andor tables andapproved the final draftbull Liming Lai performed the experiments prepared figures andor tables and approvedthe final draftbull Hui Du performed the experiments authored or reviewed drafts of the paper andapproved the final draftbull Qinglin Sun analyzed the data authored or reviewed drafts of the paper and approvedthe final draftbull Liu Yang and Xin Liu analyzed the data prepared figures andor tables and approvedthe final draftbull Benben Liu analyzed the data prepared figures andor tables authored or revieweddrafts of the paper and approved the final draftbull Yuanrun Zheng conceived and designed the experiments prepared figures andor tablesauthored or reviewed drafts of the paper and approved the final draft

Data AvailabilityThe following information was supplied regarding data availability

Data is available at the 4TUCentre for Research Data Yi Sangui (2020) Sample data forsimulation in Jing-Jin-Ji region 4TUResearchData Dataset httpsdoiorg104121uuid1b27dc6b-b77e-4f18-b035-e8a249f595c0

REFERENCESBannari A Morin D Bonn F 1995 A review of vegetation indices Remote Sensing

Reviews 13(1ndash2)95ndash120 DOI 10108002757259509532298Bie SW Beckett PHT 1973 Comparison of four independent soil surveys by air-

photo interpretation paphos area (cyprus) Photogrammetria 29(6)189ndash202DOI 1010160031-8663(73)90001-x

Box EO 1981Macroclimate and plant forms an introduction to predictive modeling inphytogeography (Tasks for vegetation science 1) London DR W Junk Publishers

Box EO 1996 Plant functional types and climate at the global scale Journal of VegetationScience 7(3)309ndash320 DOI 1023073236274

Yi et al (2020) PeerJ DOI 107717peerj9839 2126

Boyd DS Sanchez-Hernandez C Foody GM 2006Mapping a specific class for priorityhabitats monitoring from satellite sensor data International Journal of RemoteSensing 27(13)2631ndash2644 DOI 10108001431160600554348

Burai P Deak B Valko O Tomor T 2015 Classification of herbaceous vegeta-tion using airborne hyperspectral imagery Remote Sensing 7(2)2046ndash2066DOI 103390rs70202046

Chala D Zimmermann NE Brochmann C Bakkestuen V 2017Migration corridorsfor alpine plants among the lsquosky islandsrsquo of eastern Africa do they or did they existAlpine Botany 127133ndash144 DOI 101007s00035-017-0184-z

Chen LY Li H Zhang PJ Zhao X Zhou LH Liu TY HuHF Bai YF Shen HH FangJY 2015 Climate and native grassland vegetation as drivers of the communitystructures of shrub-encroached grasslands in Inner Mongolia China LandscapeEcology 30(9)1627ndash1641 DOI 101007s10980-014-0044-9

Editorial Committee of VegetationMap of China the Chinese Academy of Sciences2007 The Vegetation Map of the Peoplersquos Republic of China (11 000 000) BeijingGeological Publishing House

CohenWB Goward SN 2004 Landsatrsquos role in ecological applications of remote sens-ing Bioscience 54(6)535ndash545 DOI 1016410006-3568(2004)054[0535lrieao]20co2

Cutler DR Edwards TC Beard KH Cutler A Hess KT Gibson J Lawler JJ2007 Random forests for classification in ecology Ecology 88(11)2783ndash2792DOI 10189007-05391

De Colstoun ECB Story MH Thompson C Commisso K Smith TG Irons JR2003 National Park vegetation mapping using multitemporal Landsat 7 dataand a decision tree classifier Remote Sensing of Environment 85(3)316ndash327DOI 101016s0034-4257(03)00010-5

Deng SB 2010 ENVI remote sensing image processing method Beijing Science pressDilts TEWeisberg PJ Dencker CM Chambers JC 2015 Functionally relevant climate

variables for arid lands a climatic water deficit approach for modelling desert shrubdistributions Journal of Biogeography 42(10)1986ndash1997 DOI 101111jbi12561

Dobrowski SZ Safford HD Cheng YB Ustin SL 2008Mapping mountain vegetationusing species distribution modeling image-based texture analysis and object-basedclassification Applied Vegetation Science 11(4)499ndash508 DOI 1031702008-7-18560

Dormann CFWoodin SJ 2002 Climate change in the Arctic using plant functionaltypes in a meta-analysis of field experiments Functional Ecology 16(1)4ndash17DOI 101046j0269-8463200100596x

Faber-Langendoen D Keeler-Wolf T Meidinger D Tart D Hoagland B Josse CNavarro G Ponomarenko S Saucier J-P Weakley A Comer P 2014 EcoVeg anew approach to vegetation description and classification Ecological Monographs84(4)533ndash561 DOI 10189013-23341

Fick SE Hijmans RJ 2017WorldClim 2 new 1-km spatial resolution climate surfacesfor global land areas International Journal of Climatology 37(12)4302ndash4315DOI 101002joc5086

Yi et al (2020) PeerJ DOI 107717peerj9839 2226

Franklin J 1995 Predictive vegetation mapping geographic modelling of biospatialpatterns in relation to environmental gradients Progress in Physical Geography19(4)474ndash499 DOI 101177030913339501900403

Franklin J 2010Mapping species distributions spatial inference and prediction Cam-bridge Cambridge University Press

Gislason PO Benediktsson JA Sveinsson JR 2006 Random forests for land cover classi-fication Pattern Recognition Letters 27(4)294ndash300 DOI 101016jpatrec200508011

Guisan A Zimmermann NE 2000 Predictive habitat distribution models in ecologyEcological Modelling 135147ndash186 DOI 101016s0304-3800(00)00354-9

Gunton RM Polce C KuninWE 2015 Predicting ground temperatures across Euro-pean landscapesMethods in Ecology and Evolution 6(5)532ndash542DOI 1011112041-210x12355

HansenMC Potapov PV Moore R Hancher M Turubanova SA Tyukavina A ThauD Stehman SV Goetz SJ Loveland TR Kommareddy A Egorov A Chini L JusticeCO Townshend JRG 2013High-resolution global maps of 21st-century forestcover change Science 342(6160)850ndash853 DOI 101126science1244693

Haslem A Callister KE Avitabile SC Griffioen PA Kelly LT NimmoDG Spence-Bailey LM Taylor RSWatson SJ Brown L Bennett AF Clarke MF 2010 Aframework for mapping vegetation over broad spatial extents a technique to aidland management across jurisdictional boundaries Landscape and Urban Planning97(4)296ndash305 DOI 101016jlandurbplan201007002

Hastie T Tibshirani R Friedman J 2009 The elements of statistical learning data mininginference and prediction Second edition Berlin Springer

HeldM Rabe A Jakimow B Van der Linden S Hostert P 2014 EnMAP-Box ManualVersion 20 Germany Humboldt-Universitaumlt zu Berlin

Jarnevich CS Stohlgren TJ Kumar S Morisette JT Holcombe TR 2015 Caveatsfor correlative species distribution modeling Ecological Informatics 296ndash15DOI 101016jecoinf201506007

Jiang H Zhao D Cai Y An S 2012 A method for application of classification treemodels to map aquatic vegetation using remotely sensed images from differentsensors and dates Sensors 12(9)12437ndash12454 DOI 103390s120912437

Kabacoff RI 2016 R in action data analysis and graphics with R Second edition BeijingPosts amp Telecom Press

Landis JR Koch GG 1977 The measurement of observer agreement for categorical dataBiometrics 33(1)159ndash174 DOI 1023072529310

Langford ZL Kumar J Hoffman FM Breen AL Iversen CM 2019 Arctic vegetationmapping using unsupervised training datasets and convolutional neural networksRemote Sensing 11(1)69 DOI 103390rs11010069

Lany NK Zarnetske PL Finley AO McCullough DG 2019 Complementary strengthsof spatially-explicit and multi-species distribution models Ecography 421ndash11DOI 101111ecog04728

Yi et al (2020) PeerJ DOI 107717peerj9839 2326

Li CWang J Wang L Hu L Gong P 2014 Comparison of classification algorithmsand training sample sizes in urban land classification with landsat thematic mapperimagery Remote Sensing 6(2)964ndash983 DOI 103390rs6020964

Lin H Yue CWu X Xu H Zheng X 2014 Remote sense images classification byEnmap-Box model Journal of Southwest Forestry University 2(34)67ndash71 (In Chinesewith English abstract) DOI 103969jissn2095-1914201402013

van der Linden S Rabe A HeldM Jakimow B Leitao PJ Okujeni A Schwieder MSuess S Hostert P 2015 The EnMAP-Box-A toolbox and application program-ming interface for EnMAP data processing Remote Sensing 7(9)11249ndash11266DOI 103390rs70911249

Van der Linden S Rabe A HeldMWirth F Suess S Okujeni A Hostert P 2014ImageSVM Classification Manual for Application imageSVM version 30 GermanyHumboldt-Universitaumlt zu Berlin

MaHJ Gao XH Gu XT 2019 Random forest classification of Landsat 8 imageryfor the complex terrain area based on the combination of spectral topographicand texture information Journal of Geo-information Science 21(3)359ndash371DOI 1012082dpxxkx2019180346

ModHK Scherrer D LuotoM Guisan A 2016What we use is not what we knowenvironmental predictors in plant distribution models Journal of Vegetation Science27(6)1308ndash1322 DOI 101111jvs12444

Muchoney D Borak J Chi H Friedl M Gopal S Hodges J Morrow N Strahler A 2000Application of the MODIS global supervised classification model to vegetation andland cover mapping of Central America International Journal of Remote Sensing21(6ndash7)1115ndash1138 DOI 101080014311600210100

Newell CL Leathwick JR 2005Mapping Hurunui forest community distribution usingcomputer models (Science for Conservation 251) Wellington Department ofConservation

Oke OA Thompson KA 2015 Distribution models for mountain plant species the valueof elevation Ecological Modelling 30172ndash77 DOI 101016jecolmodel201501019

Pal M Mather PM 2005 Support vector machines for classification in remote sensingInternational Journal of Remote Sensing 26(5)1007ndash1011DOI 10108001431160512331314083

PengWL Bai ZP Liu XN Cao T 2002 Introduction to remote sensing Beijing HigherEducation Press

Pfeffer K Pebesma EJ Burrough PA 2003Mapping alpine vegetation using vegeta-tion observations and topographic attributes Landscape Ecology 18(8)759ndash776DOI 101023BLAND000001447178787d0

Prasad AM Iverson LR Liaw A 2006 Newer classification and regression treetechniques bagging and random forests for ecological prediction Ecosystems9(2)181ndash199 DOI 101007s10021-005-0054-1

Price KP Guo XL Stiles JM 2002 Optimal Landsat TM band combinations and vegeta-tion indices for discrimination of six grassland types in eastern Kansas InternationalJournal of Remote Sensing 23(23)5031ndash5042 DOI 10108001431160210121764

Yi et al (2020) PeerJ DOI 107717peerj9839 2426

Rabe A Jakimow B HeldM Van der Linden S Hostert P 2014 EnMAP-Box Version20 Available at httpswwwenmaporg

Sanchez-Hernandez C Boyd DS Foody GM 2007Mapping specific habitats from re-motely sensed imagery support vector machine and support vector data descriptionbased classification of coastal saltmarsh habitats Ecological Informatics 2(2)83ndash88DOI 101016jecoinf200704003

Sesnie SE Finegan B Gessler PE Thessler S Bendana ZR Smith AMS 2010 Themultispectral separability of Costa Rican rainforest types with support vectormachines and Random Forest decision trees International Journal of Remote Sensing31(11)2885ndash2909 DOI 10108001431160903140803

Sesnie SE Gessler PE Finegan B Thessler S 2008 Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and changedetection in complex neotropical environments Remote Sensing of Environment112(5)2145ndash2159 DOI 101016jrse200708025

Sitch S Smith B Prentice IC Arneth A Bondeau A CramerW Kaplan JOLevis S LuchtW Sykes MT Thonicke K Venevsky S 2003 Evaluation ofecosystem dynamics plant geography and terrestrial carbon cycling in theLPJ dynamic global vegetation model Global Change Biology 9(2)161ndash185DOI 101046j1365-2486200300569x

SongMH Zhou CP Ouyang H 2005 Simulated distribution of vegetation types inresponse to climate change on the Tibetan Plateau Journal of Vegetation Science16(3)341ndash350 DOI 101111j1654-11032005tb02372x

Wang L Gong BL 2018 Collaborative governance of ecological space in Beijing-Tianjin-Hebei region Journal of Tianjin Administration Institute 20(5)38ndash44 (Chinese)DOI 1016326jcnki1008-7168201805005

Wang X Liu G Coscieme L Giannetti BF Hao Y Zhang Y BrownMT 2019 Studyon the emergy-based thermodynamic geography of the Jing-Jin-Ji region combinedmultivariate statistical data with DMSP-OLS nighttime lights data EcologicalModelling 3971ndash15 DOI 101016jecolmodel201901021

Wehkamp J Pietsch SA Fuss S Gusti M ReuterWH Koch N Kindermann G KraxnerF 2018 Accounting for institutional quality in global forest modeling Environmen-tal Modelling amp Software 102250ndash259 DOI 101016jenvsoft201801020

Weng ES Zhou GS 2006Modeling distribution changes of vegetation in Chinaunder future climate change Environmental Modeling amp Assessment 11(1)45ndash58DOI 101007s10666-005-9019-1

WuYWWang NL Li Z Chen AA Guo ZM Qie YF 2019 The effect of thermalradiation from surrounding terrain on glacier surface temperatures retrieved fromremote sensing data a case study from Qiyi Glacier China Remote Sensing ofEnvironment 2311ndash9 DOI 101016jrse2019111267

Xie YC Sha ZY YuM 2008 Remote sensing imagery in vegetation mapping a reviewJournal of Plant Ecology 1(1)9ndash23 DOI 101093jpertm005

Yi et al (2020) PeerJ DOI 107717peerj9839 2526

Zhang G Biradar CM Xiao X Dong J Zhou Y Qin Y Zhang Y Liu F DingM ThomasRJ 2018 Exacerbated grassland degradation and desertification in Central Asiaduring 2000-2014 Ecological Applications 28(2)442ndash456 DOI 101002eap1660

ZhangWT DongW 2017 SPSS statistical analysis advanced tutorial Third editionBeijing Higher Education Press

Zhang Z Clercq EDe Ou XKWulf RDe Verbeke L 2008Mapping dominantvegetation communities at Meili Snow Mountain Yunnan Province China usingsatellite imagery and plant community data Geocarto International 23(2)135ndash153DOI 10108010106040701337410

ZhaoMS Neilson RP Yan XD DongWJ 2002Modelling the vegetation of Chinaunder changing climate Acta Geographica Sinica 57(1)28ndash38 (Chinese)

Zhao X Su Y Hu T Chen L Gao SWang R Jin S Guo Q 2018 A global correctedSRTM DEM product for vegetated areas Remote Sensing Letters 9(4)393ndash402DOI 1010802150704x20181425560

Zheng Y Xie Z Jiang L Shimizu H Drake S 2006 Changes in Holdridge Life Zone di-versity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40years Journal of Arid Environments 66(1)113ndash126 DOI 101016jjaridenv200509005

Zhou J Lai L Guan T CaiW Gao N Zhang X Yang D Cong Z Zheng Y 2016 Com-parison modeling for alpine vegetation distribution in an arid area EnvironmentalMonitoring and Assessment 188408 DOI 101007s10661-016-5417-x

Yi et al (2020) PeerJ DOI 107717peerj9839 2626

Grant Disclosures
The following grant information was disclosed by the authors:
National Key R&D Program of China: 2018YFC0506903.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Sangui Yi conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Jihua Zhou conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Liming Lai performed the experiments, prepared figures and/or tables, and approved the final draft.
• Hui Du performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Qinglin Sun analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.
• Liu Yang and Xin Liu analyzed the data, prepared figures and/or tables, and approved the final draft.
• Benben Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Yuanrun Zheng conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:

Data is available at the 4TU.Centre for Research Data: Yi, Sangui (2020): Sample data for simulation in Jing-Jin-Ji region. 4TU.ResearchData. Dataset. https://doi.org/10.4121/uuid:1b27dc6b-b77e-4f18-b035-e8a249f595c0.

REFERENCES

Bannari A, Morin D, Bonn F. 1995. A review of vegetation indices. Remote Sensing Reviews 13(1–2):95–120 DOI 10.1080/02757259509532298.

Bie SW, Beckett PHT. 1973. Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria 29(6):189–202 DOI 10.1016/0031-8663(73)90001-x.

Box EO. 1981. Macroclimate and plant forms: an introduction to predictive modeling in phytogeography (Tasks for vegetation science 1). London: Dr W. Junk Publishers.

Box EO. 1996. Plant functional types and climate at the global scale. Journal of Vegetation Science 7(3):309–320 DOI 10.2307/3236274.

Boyd DS, Sanchez-Hernandez C, Foody GM. 2006. Mapping a specific class for priority habitats monitoring from satellite sensor data. International Journal of Remote Sensing 27(13):2631–2644 DOI 10.1080/01431160600554348.

Burai P, Deak B, Valko O, Tomor T. 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sensing 7(2):2046–2066 DOI 10.3390/rs70202046.

Chala D, Zimmermann NE, Brochmann C, Bakkestuen V. 2017. Migration corridors for alpine plants among the ‘sky islands’ of eastern Africa: do they, or did they exist? Alpine Botany 127:133–144 DOI 10.1007/s00035-017-0184-z.

Chen LY, Li H, Zhang PJ, Zhao X, Zhou LH, Liu TY, Hu HF, Bai YF, Shen HH, Fang JY. 2015. Climate and native grassland vegetation as drivers of the community structures of shrub-encroached grasslands in Inner Mongolia, China. Landscape Ecology 30(9):1627–1641 DOI 10.1007/s10980-014-0044-9.

Editorial Committee of Vegetation Map of China, the Chinese Academy of Sciences. 2007. The Vegetation Map of the People’s Republic of China (1:1 000 000). Beijing: Geological Publishing House.

Cohen WB, Goward SN. 2004. Landsat’s role in ecological applications of remote sensing. Bioscience 54(6):535–545 DOI 10.1641/0006-3568(2004)054[0535:lrieao]2.0.co;2.

Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. 2007. Random forests for classification in ecology. Ecology 88(11):2783–2792 DOI 10.1890/07-0539.1.

De Colstoun ECB, Story MH, Thompson C, Commisso K, Smith TG, Irons JR. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85(3):316–327 DOI 10.1016/s0034-4257(03)00010-5.

Deng SB. 2010. ENVI remote sensing image processing method. Beijing: Science Press.

Dilts TE, Weisberg PJ, Dencker CM, Chambers JC. 2015. Functionally relevant climate variables for arid lands: a climatic water deficit approach for modelling desert shrub distributions. Journal of Biogeography 42(10):1986–1997 DOI 10.1111/jbi.12561.

Dobrowski SZ, Safford HD, Cheng YB, Ustin SL. 2008. Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science 11(4):499–508 DOI 10.3170/2008-7-18560.

Dormann CF, Woodin SJ. 2002. Climate change in the Arctic: using plant functional types in a meta-analysis of field experiments. Functional Ecology 16(1):4–17 DOI 10.1046/j.0269-8463.2001.00596.x.

Faber-Langendoen D, Keeler-Wolf T, Meidinger D, Tart D, Hoagland B, Josse C, Navarro G, Ponomarenko S, Saucier J-P, Weakley A, Comer P. 2014. EcoVeg: a new approach to vegetation description and classification. Ecological Monographs 84(4):533–561 DOI 10.1890/13-2334.1.

Fick SE, Hijmans RJ. 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302–4315 DOI 10.1002/joc.5086.

Franklin J. 1995. Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography 19(4):474–499 DOI 10.1177/030913339501900403.

Franklin J. 2010. Mapping species distributions: spatial inference and prediction. Cambridge: Cambridge University Press.

Gislason PO, Benediktsson JA, Sveinsson JR. 2006. Random forests for land cover classification. Pattern Recognition Letters 27(4):294–300 DOI 10.1016/j.patrec.2005.08.011.

Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186 DOI 10.1016/s0304-3800(00)00354-9.

Gunton RM, Polce C, Kunin WE. 2015. Predicting ground temperatures across European landscapes. Methods in Ecology and Evolution 6(5):532–542 DOI 10.1111/2041-210x.12355.

Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, Thau D, Stehman SV, Goetz SJ, Loveland TR, Kommareddy A, Egorov A, Chini L, Justice CO, Townshend JRG. 2013. High-resolution global maps of 21st-century forest cover change. Science 342(6160):850–853 DOI 10.1126/science.1244693.

Haslem A, Callister KE, Avitabile SC, Griffioen PA, Kelly LT, Nimmo DG, Spence-Bailey LM, Taylor RS, Watson SJ, Brown L, Bennett AF, Clarke MF. 2010. A framework for mapping vegetation over broad spatial extents: a technique to aid land management across jurisdictional boundaries. Landscape and Urban Planning 97(4):296–305 DOI 10.1016/j.landurbplan.2010.07.002.

Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning: data mining, inference, and prediction. Second edition. Berlin: Springer.

Held M, Rabe A, Jakimow B, Van der Linden S, Hostert P. 2014. EnMAP-Box Manual, Version 2.0. Germany: Humboldt-Universität zu Berlin.

Jarnevich CS, Stohlgren TJ, Kumar S, Morisette JT, Holcombe TR. 2015. Caveats for correlative species distribution modeling. Ecological Informatics 29:6–15 DOI 10.1016/j.ecoinf.2015.06.007.

Jiang H, Zhao D, Cai Y, An S. 2012. A method for application of classification tree models to map aquatic vegetation using remotely sensed images from different sensors and dates. Sensors 12(9):12437–12454 DOI 10.3390/s120912437.

Kabacoff RI. 2016. R in action: data analysis and graphics with R. Second edition. Beijing: Posts & Telecom Press.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 DOI 10.2307/2529310.

Langford ZL, Kumar J, Hoffman FM, Breen AL, Iversen CM. 2019. Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing 11(1):69 DOI 10.3390/rs11010069.

Lany NK, Zarnetske PL, Finley AO, McCullough DG. 2019. Complementary strengths of spatially-explicit and multi-species distribution models. Ecography 42:1–11 DOI 10.1111/ecog.04728.

Li C, Wang J, Wang L, Hu L, Gong P. 2014. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sensing 6(2):964–983 DOI 10.3390/rs6020964.

Lin H, Yue C, Wu X, Xu H, Zheng X. 2014. Remote sense images classification by Enmap-Box model. Journal of Southwest Forestry University 2(34):67–71 (In Chinese with English abstract) DOI 10.3969/j.issn.2095-1914.2014.02.013.

van der Linden S, Rabe A, Held M, Jakimow B, Leitao PJ, Okujeni A, Schwieder M, Suess S, Hostert P. 2015. The EnMAP-Box-A toolbox and application programming interface for EnMAP data processing. Remote Sensing 7(9):11249–11266 DOI 10.3390/rs70911249.

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.

Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box. Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.

Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.

Yi et al (2020) PeerJ DOI 107717peerj9839 2626

Page 22: Simulating highly disturbed vegetation distribution: …management, and conservation planning require vegetation distribution maps (Franklin, 2010) to better understand, use, and monitor

Boyd DS Sanchez-Hernandez C Foody GM 2006Mapping a specific class for priorityhabitats monitoring from satellite sensor data International Journal of RemoteSensing 27(13)2631ndash2644 DOI 10108001431160600554348

Burai P Deak B Valko O Tomor T 2015 Classification of herbaceous vegeta-tion using airborne hyperspectral imagery Remote Sensing 7(2)2046ndash2066DOI 103390rs70202046

Chala D Zimmermann NE Brochmann C Bakkestuen V 2017Migration corridorsfor alpine plants among the lsquosky islandsrsquo of eastern Africa do they or did they existAlpine Botany 127133ndash144 DOI 101007s00035-017-0184-z

Chen LY Li H Zhang PJ Zhao X Zhou LH Liu TY HuHF Bai YF Shen HH FangJY 2015 Climate and native grassland vegetation as drivers of the communitystructures of shrub-encroached grasslands in Inner Mongolia China LandscapeEcology 30(9)1627ndash1641 DOI 101007s10980-014-0044-9

Editorial Committee of VegetationMap of China the Chinese Academy of Sciences2007 The Vegetation Map of the Peoplersquos Republic of China (11 000 000) BeijingGeological Publishing House

CohenWB Goward SN 2004 Landsatrsquos role in ecological applications of remote sens-ing Bioscience 54(6)535ndash545 DOI 1016410006-3568(2004)054[0535lrieao]20co2

Cutler DR Edwards TC Beard KH Cutler A Hess KT Gibson J Lawler JJ2007 Random forests for classification in ecology Ecology 88(11)2783ndash2792DOI 10189007-05391

De Colstoun ECB Story MH Thompson C Commisso K Smith TG Irons JR2003 National Park vegetation mapping using multitemporal Landsat 7 dataand a decision tree classifier Remote Sensing of Environment 85(3)316ndash327DOI 101016s0034-4257(03)00010-5

Deng SB 2010 ENVI remote sensing image processing method Beijing Science pressDilts TEWeisberg PJ Dencker CM Chambers JC 2015 Functionally relevant climate

variables for arid lands a climatic water deficit approach for modelling desert shrubdistributions Journal of Biogeography 42(10)1986ndash1997 DOI 101111jbi12561

Dobrowski SZ Safford HD Cheng YB Ustin SL 2008Mapping mountain vegetationusing species distribution modeling image-based texture analysis and object-basedclassification Applied Vegetation Science 11(4)499ndash508 DOI 1031702008-7-18560

Dormann CFWoodin SJ 2002 Climate change in the Arctic using plant functionaltypes in a meta-analysis of field experiments Functional Ecology 16(1)4ndash17DOI 101046j0269-8463200100596x

Faber-Langendoen D Keeler-Wolf T Meidinger D Tart D Hoagland B Josse CNavarro G Ponomarenko S Saucier J-P Weakley A Comer P 2014 EcoVeg anew approach to vegetation description and classification Ecological Monographs84(4)533ndash561 DOI 10189013-23341

Fick SE Hijmans RJ 2017WorldClim 2 new 1-km spatial resolution climate surfacesfor global land areas International Journal of Climatology 37(12)4302ndash4315DOI 101002joc5086

Yi et al (2020) PeerJ DOI 107717peerj9839 2226

Franklin J 1995 Predictive vegetation mapping geographic modelling of biospatialpatterns in relation to environmental gradients Progress in Physical Geography19(4)474ndash499 DOI 101177030913339501900403

Franklin J 2010Mapping species distributions spatial inference and prediction Cam-bridge Cambridge University Press

Gislason PO Benediktsson JA Sveinsson JR 2006 Random forests for land cover classi-fication Pattern Recognition Letters 27(4)294ndash300 DOI 101016jpatrec200508011

Guisan A Zimmermann NE 2000 Predictive habitat distribution models in ecologyEcological Modelling 135147ndash186 DOI 101016s0304-3800(00)00354-9

Gunton RM Polce C KuninWE 2015 Predicting ground temperatures across Euro-pean landscapesMethods in Ecology and Evolution 6(5)532ndash542DOI 1011112041-210x12355

HansenMC Potapov PV Moore R Hancher M Turubanova SA Tyukavina A ThauD Stehman SV Goetz SJ Loveland TR Kommareddy A Egorov A Chini L JusticeCO Townshend JRG 2013High-resolution global maps of 21st-century forestcover change Science 342(6160)850ndash853 DOI 101126science1244693

Haslem A Callister KE Avitabile SC Griffioen PA Kelly LT NimmoDG Spence-Bailey LM Taylor RSWatson SJ Brown L Bennett AF Clarke MF 2010 Aframework for mapping vegetation over broad spatial extents a technique to aidland management across jurisdictional boundaries Landscape and Urban Planning97(4)296ndash305 DOI 101016jlandurbplan201007002

Hastie T Tibshirani R Friedman J 2009 The elements of statistical learning data mininginference and prediction Second edition Berlin Springer

HeldM Rabe A Jakimow B Van der Linden S Hostert P 2014 EnMAP-Box ManualVersion 20 Germany Humboldt-Universitaumlt zu Berlin

Jarnevich CS Stohlgren TJ Kumar S Morisette JT Holcombe TR 2015 Caveatsfor correlative species distribution modeling Ecological Informatics 296ndash15DOI 101016jecoinf201506007

Jiang H Zhao D Cai Y An S 2012 A method for application of classification treemodels to map aquatic vegetation using remotely sensed images from differentsensors and dates Sensors 12(9)12437ndash12454 DOI 103390s120912437

Kabacoff RI 2016 R in action data analysis and graphics with R Second edition BeijingPosts amp Telecom Press

Landis JR Koch GG 1977 The measurement of observer agreement for categorical dataBiometrics 33(1)159ndash174 DOI 1023072529310

Langford ZL Kumar J Hoffman FM Breen AL Iversen CM 2019 Arctic vegetationmapping using unsupervised training datasets and convolutional neural networksRemote Sensing 11(1)69 DOI 103390rs11010069

Lany NK Zarnetske PL Finley AO McCullough DG 2019 Complementary strengthsof spatially-explicit and multi-species distribution models Ecography 421ndash11DOI 101111ecog04728

Yi et al (2020) PeerJ DOI 107717peerj9839 2326

Li CWang J Wang L Hu L Gong P 2014 Comparison of classification algorithmsand training sample sizes in urban land classification with landsat thematic mapperimagery Remote Sensing 6(2)964ndash983 DOI 103390rs6020964

Lin H Yue CWu X Xu H Zheng X 2014 Remote sense images classification byEnmap-Box model Journal of Southwest Forestry University 2(34)67ndash71 (In Chinesewith English abstract) DOI 103969jissn2095-1914201402013

van der Linden S Rabe A HeldM Jakimow B Leitao PJ Okujeni A Schwieder MSuess S Hostert P 2015 The EnMAP-Box-A toolbox and application program-ming interface for EnMAP data processing Remote Sensing 7(9)11249ndash11266DOI 103390rs70911249

Van der Linden S Rabe A HeldMWirth F Suess S Okujeni A Hostert P 2014ImageSVM Classification Manual for Application imageSVM version 30 GermanyHumboldt-Universitaumlt zu Berlin

MaHJ Gao XH Gu XT 2019 Random forest classification of Landsat 8 imageryfor the complex terrain area based on the combination of spectral topographicand texture information Journal of Geo-information Science 21(3)359ndash371DOI 1012082dpxxkx2019180346

ModHK Scherrer D LuotoM Guisan A 2016What we use is not what we knowenvironmental predictors in plant distribution models Journal of Vegetation Science27(6)1308ndash1322 DOI 101111jvs12444

Muchoney D Borak J Chi H Friedl M Gopal S Hodges J Morrow N Strahler A 2000Application of the MODIS global supervised classification model to vegetation andland cover mapping of Central America International Journal of Remote Sensing21(6ndash7)1115ndash1138 DOI 101080014311600210100

Newell CL Leathwick JR 2005Mapping Hurunui forest community distribution usingcomputer models (Science for Conservation 251) Wellington Department ofConservation

Oke OA Thompson KA 2015 Distribution models for mountain plant species the valueof elevation Ecological Modelling 30172ndash77 DOI 101016jecolmodel201501019

Pal M Mather PM 2005 Support vector machines for classification in remote sensingInternational Journal of Remote Sensing 26(5)1007ndash1011DOI 10108001431160512331314083

PengWL Bai ZP Liu XN Cao T 2002 Introduction to remote sensing Beijing HigherEducation Press

Pfeffer K Pebesma EJ Burrough PA 2003Mapping alpine vegetation using vegeta-tion observations and topographic attributes Landscape Ecology 18(8)759ndash776DOI 101023BLAND000001447178787d0

Prasad AM Iverson LR Liaw A 2006 Newer classification and regression treetechniques bagging and random forests for ecological prediction Ecosystems9(2)181ndash199 DOI 101007s10021-005-0054-1

Price KP Guo XL Stiles JM 2002 Optimal Landsat TM band combinations and vegeta-tion indices for discrimination of six grassland types in eastern Kansas InternationalJournal of Remote Sensing 23(23)5031ndash5042 DOI 10108001431160210121764

Yi et al (2020) PeerJ DOI 107717peerj9839 2426

Rabe A Jakimow B HeldM Van der Linden S Hostert P 2014 EnMAP-Box Version20 Available at httpswwwenmaporg

Sanchez-Hernandez C Boyd DS Foody GM 2007Mapping specific habitats from re-motely sensed imagery support vector machine and support vector data descriptionbased classification of coastal saltmarsh habitats Ecological Informatics 2(2)83ndash88DOI 101016jecoinf200704003

Sesnie SE Finegan B Gessler PE Thessler S Bendana ZR Smith AMS 2010 Themultispectral separability of Costa Rican rainforest types with support vectormachines and Random Forest decision trees International Journal of Remote Sensing31(11)2885ndash2909 DOI 10108001431160903140803

Sesnie SE Gessler PE Finegan B Thessler S 2008 Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and changedetection in complex neotropical environments Remote Sensing of Environment112(5)2145ndash2159 DOI 101016jrse200708025

Sitch S Smith B Prentice IC Arneth A Bondeau A CramerW Kaplan JOLevis S LuchtW Sykes MT Thonicke K Venevsky S 2003 Evaluation ofecosystem dynamics plant geography and terrestrial carbon cycling in theLPJ dynamic global vegetation model Global Change Biology 9(2)161ndash185DOI 101046j1365-2486200300569x

SongMH Zhou CP Ouyang H 2005 Simulated distribution of vegetation types inresponse to climate change on the Tibetan Plateau Journal of Vegetation Science16(3)341ndash350 DOI 101111j1654-11032005tb02372x

Wang L Gong BL 2018 Collaborative governance of ecological space in Beijing-Tianjin-Hebei region Journal of Tianjin Administration Institute 20(5)38ndash44 (Chinese)DOI 1016326jcnki1008-7168201805005

Wang X Liu G Coscieme L Giannetti BF Hao Y Zhang Y BrownMT 2019 Studyon the emergy-based thermodynamic geography of the Jing-Jin-Ji region combinedmultivariate statistical data with DMSP-OLS nighttime lights data EcologicalModelling 3971ndash15 DOI 101016jecolmodel201901021

Wehkamp J Pietsch SA Fuss S Gusti M ReuterWH Koch N Kindermann G KraxnerF 2018 Accounting for institutional quality in global forest modeling Environmen-tal Modelling amp Software 102250ndash259 DOI 101016jenvsoft201801020

Weng ES Zhou GS 2006Modeling distribution changes of vegetation in Chinaunder future climate change Environmental Modeling amp Assessment 11(1)45ndash58DOI 101007s10666-005-9019-1

WuYWWang NL Li Z Chen AA Guo ZM Qie YF 2019 The effect of thermalradiation from surrounding terrain on glacier surface temperatures retrieved fromremote sensing data a case study from Qiyi Glacier China Remote Sensing ofEnvironment 2311ndash9 DOI 101016jrse2019111267

Xie YC Sha ZY YuM 2008 Remote sensing imagery in vegetation mapping a reviewJournal of Plant Ecology 1(1)9ndash23 DOI 101093jpertm005

Yi et al (2020) PeerJ DOI 107717peerj9839 2526

Zhang G Biradar CM Xiao X Dong J Zhou Y Qin Y Zhang Y Liu F DingM ThomasRJ 2018 Exacerbated grassland degradation and desertification in Central Asiaduring 2000-2014 Ecological Applications 28(2)442ndash456 DOI 101002eap1660

ZhangWT DongW 2017 SPSS statistical analysis advanced tutorial Third editionBeijing Higher Education Press

Zhang Z Clercq EDe Ou XKWulf RDe Verbeke L 2008Mapping dominantvegetation communities at Meili Snow Mountain Yunnan Province China usingsatellite imagery and plant community data Geocarto International 23(2)135ndash153DOI 10108010106040701337410

ZhaoMS Neilson RP Yan XD DongWJ 2002Modelling the vegetation of Chinaunder changing climate Acta Geographica Sinica 57(1)28ndash38 (Chinese)

Zhao X Su Y Hu T Chen L Gao SWang R Jin S Guo Q 2018 A global correctedSRTM DEM product for vegetated areas Remote Sensing Letters 9(4)393ndash402DOI 1010802150704x20181425560

Zheng Y Xie Z Jiang L Shimizu H Drake S 2006 Changes in Holdridge Life Zone di-versity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40years Journal of Arid Environments 66(1)113ndash126 DOI 101016jjaridenv200509005

Zhou J Lai L Guan T CaiW Gao N Zhang X Yang D Cong Z Zheng Y 2016 Com-parison modeling for alpine vegetation distribution in an arid area EnvironmentalMonitoring and Assessment 188408 DOI 101007s10661-016-5417-x

Yi et al (2020) PeerJ DOI 107717peerj9839 2626

Page 23: Simulating highly disturbed vegetation distribution: …management, and conservation planning require vegetation distribution maps (Franklin, 2010) to better understand, use, and monitor

Franklin J 1995 Predictive vegetation mapping geographic modelling of biospatialpatterns in relation to environmental gradients Progress in Physical Geography19(4)474ndash499 DOI 101177030913339501900403

Franklin J 2010Mapping species distributions spatial inference and prediction Cam-bridge Cambridge University Press

Gislason PO Benediktsson JA Sveinsson JR 2006 Random forests for land cover classi-fication Pattern Recognition Letters 27(4)294ndash300 DOI 101016jpatrec200508011

Guisan A Zimmermann NE 2000 Predictive habitat distribution models in ecologyEcological Modelling 135147ndash186 DOI 101016s0304-3800(00)00354-9

Gunton RM Polce C KuninWE 2015 Predicting ground temperatures across Euro-pean landscapesMethods in Ecology and Evolution 6(5)532ndash542DOI 1011112041-210x12355

HansenMC Potapov PV Moore R Hancher M Turubanova SA Tyukavina A ThauD Stehman SV Goetz SJ Loveland TR Kommareddy A Egorov A Chini L JusticeCO Townshend JRG 2013High-resolution global maps of 21st-century forestcover change Science 342(6160)850ndash853 DOI 101126science1244693

Haslem A Callister KE Avitabile SC Griffioen PA Kelly LT NimmoDG Spence-Bailey LM Taylor RSWatson SJ Brown L Bennett AF Clarke MF 2010 Aframework for mapping vegetation over broad spatial extents a technique to aidland management across jurisdictional boundaries Landscape and Urban Planning97(4)296ndash305 DOI 101016jlandurbplan201007002

Hastie T Tibshirani R Friedman J 2009 The elements of statistical learning data mininginference and prediction Second edition Berlin Springer

HeldM Rabe A Jakimow B Van der Linden S Hostert P 2014 EnMAP-Box ManualVersion 20 Germany Humboldt-Universitaumlt zu Berlin

Jarnevich CS Stohlgren TJ Kumar S Morisette JT Holcombe TR 2015 Caveatsfor correlative species distribution modeling Ecological Informatics 296ndash15DOI 101016jecoinf201506007

Jiang H Zhao D Cai Y An S 2012 A method for application of classification treemodels to map aquatic vegetation using remotely sensed images from differentsensors and dates Sensors 12(9)12437ndash12454 DOI 103390s120912437

Kabacoff RI 2016 R in action data analysis and graphics with R Second edition BeijingPosts amp Telecom Press

Landis JR Koch GG 1977 The measurement of observer agreement for categorical dataBiometrics 33(1)159ndash174 DOI 1023072529310

Langford ZL Kumar J Hoffman FM Breen AL Iversen CM 2019 Arctic vegetationmapping using unsupervised training datasets and convolutional neural networksRemote Sensing 11(1)69 DOI 103390rs11010069

Lany NK Zarnetske PL Finley AO McCullough DG 2019 Complementary strengthsof spatially-explicit and multi-species distribution models Ecography 421ndash11DOI 101111ecog04728

Yi et al (2020) PeerJ DOI 107717peerj9839 2326

Li CWang J Wang L Hu L Gong P 2014 Comparison of classification algorithmsand training sample sizes in urban land classification with landsat thematic mapperimagery Remote Sensing 6(2)964ndash983 DOI 103390rs6020964

Lin H Yue CWu X Xu H Zheng X 2014 Remote sense images classification byEnmap-Box model Journal of Southwest Forestry University 2(34)67ndash71 (In Chinesewith English abstract) DOI 103969jissn2095-1914201402013

van der Linden S Rabe A HeldM Jakimow B Leitao PJ Okujeni A Schwieder MSuess S Hostert P 2015 The EnMAP-Box-A toolbox and application program-ming interface for EnMAP data processing Remote Sensing 7(9)11249ndash11266DOI 103390rs70911249

Van der Linden S, Rabe A, Held M, Wirth F, Suess S, Okujeni A, Hostert P. 2014. ImageSVM Classification, Manual for Application: imageSVM version 3.0. Germany: Humboldt-Universität zu Berlin.

Ma HJ, Gao XH, Gu XT. 2019. Random forest classification of Landsat 8 imagery for the complex terrain area based on the combination of spectral, topographic and texture information. Journal of Geo-information Science 21(3):359–371 DOI 10.12082/dpxxkx.2019.180346.

Mod HK, Scherrer D, Luoto M, Guisan A. 2016. What we use is not what we know: environmental predictors in plant distribution models. Journal of Vegetation Science 27(6):1308–1322 DOI 10.1111/jvs.12444.

Muchoney D, Borak J, Chi H, Friedl M, Gopal S, Hodges J, Morrow N, Strahler A. 2000. Application of the MODIS global supervised classification model to vegetation and land cover mapping of Central America. International Journal of Remote Sensing 21(6–7):1115–1138 DOI 10.1080/014311600210100.

Newell CL, Leathwick JR. 2005. Mapping Hurunui forest community distribution using computer models (Science for Conservation 251). Wellington: Department of Conservation.

Oke OA, Thompson KA. 2015. Distribution models for mountain plant species: the value of elevation. Ecological Modelling 301:72–77 DOI 10.1016/j.ecolmodel.2015.01.019.

Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26(5):1007–1011 DOI 10.1080/01431160512331314083.

Peng WL, Bai ZP, Liu XN, Cao T. 2002. Introduction to remote sensing. Beijing: Higher Education Press.

Pfeffer K, Pebesma EJ, Burrough PA. 2003. Mapping alpine vegetation using vegetation observations and topographic attributes. Landscape Ecology 18(8):759–776 DOI 10.1023/B:LAND.0000014471.78787.d0.

Prasad AM, Iverson LR, Liaw A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199 DOI 10.1007/s10021-005-0054-1.

Price KP, Guo XL, Stiles JM. 2002. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. International Journal of Remote Sensing 23(23):5031–5042 DOI 10.1080/01431160210121764.

Rabe A, Jakimow B, Held M, Van der Linden S, Hostert P. 2014. EnMAP-Box. Version 2.0. Available at https://www.enmap.org.

Sanchez-Hernandez C, Boyd DS, Foody GM. 2007. Mapping specific habitats from remotely sensed imagery: support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecological Informatics 2(2):83–88 DOI 10.1016/j.ecoinf.2007.04.003.

Sesnie SE, Finegan B, Gessler PE, Thessler S, Bendana ZR, Smith AMS. 2010. The multispectral separability of Costa Rican rainforest types with support vector machines and Random Forest decision trees. International Journal of Remote Sensing 31(11):2885–2909 DOI 10.1080/01431160903140803.

Sesnie SE, Gessler PE, Finegan B, Thessler S. 2008. Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sensing of Environment 112(5):2145–2159 DOI 10.1016/j.rse.2007.08.025.

Sitch S, Smith B, Prentice IC, Arneth A, Bondeau A, Cramer W, Kaplan JO, Levis S, Lucht W, Sykes MT, Thonicke K, Venevsky S. 2003. Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model. Global Change Biology 9(2):161–185 DOI 10.1046/j.1365-2486.2003.00569.x.

Song MH, Zhou CP, Ouyang H. 2005. Simulated distribution of vegetation types in response to climate change on the Tibetan Plateau. Journal of Vegetation Science 16(3):341–350 DOI 10.1111/j.1654-1103.2005.tb02372.x.

Wang L, Gong BL. 2018. Collaborative governance of ecological space in Beijing-Tianjin-Hebei region. Journal of Tianjin Administration Institute 20(5):38–44 (Chinese) DOI 10.16326/j.cnki.1008-7168.2018.05.005.

Wang X, Liu G, Coscieme L, Giannetti BF, Hao Y, Zhang Y, Brown MT. 2019. Study on the emergy-based thermodynamic geography of the Jing-Jin-Ji region: combined multivariate statistical data with DMSP-OLS nighttime lights data. Ecological Modelling 397:1–15 DOI 10.1016/j.ecolmodel.2019.01.021.

Wehkamp J, Pietsch SA, Fuss S, Gusti M, Reuter WH, Koch N, Kindermann G, Kraxner F. 2018. Accounting for institutional quality in global forest modeling. Environmental Modelling & Software 102:250–259 DOI 10.1016/j.envsoft.2018.01.020.

Weng ES, Zhou GS. 2006. Modeling distribution changes of vegetation in China under future climate change. Environmental Modeling & Assessment 11(1):45–58 DOI 10.1007/s10666-005-9019-1.

Wu YW, Wang NL, Li Z, Chen AA, Guo ZM, Qie YF. 2019. The effect of thermal radiation from surrounding terrain on glacier surface temperatures retrieved from remote sensing data: a case study from Qiyi Glacier, China. Remote Sensing of Environment 231:1–9 DOI 10.1016/j.rse.2019.111267.

Xie YC, Sha ZY, Yu M. 2008. Remote sensing imagery in vegetation mapping: a review. Journal of Plant Ecology 1(1):9–23 DOI 10.1093/jpe/rtm005.

Zhang G, Biradar CM, Xiao X, Dong J, Zhou Y, Qin Y, Zhang Y, Liu F, Ding M, Thomas RJ. 2018. Exacerbated grassland degradation and desertification in Central Asia during 2000-2014. Ecological Applications 28(2):442–456 DOI 10.1002/eap.1660.

Zhang WT, Dong W. 2017. SPSS statistical analysis advanced tutorial. Third edition. Beijing: Higher Education Press.

Zhang Z, Clercq EDe, Ou XK, Wulf RDe, Verbeke L. 2008. Mapping dominant vegetation communities at Meili Snow Mountain, Yunnan Province, China using satellite imagery and plant community data. Geocarto International 23(2):135–153 DOI 10.1080/10106040701337410.

Zhao MS, Neilson RP, Yan XD, Dong WJ. 2002. Modelling the vegetation of China under changing climate. Acta Geographica Sinica 57(1):28–38 (Chinese).

Zhao X, Su Y, Hu T, Chen L, Gao S, Wang R, Jin S, Guo Q. 2018. A global corrected SRTM DEM product for vegetated areas. Remote Sensing Letters 9(4):393–402 DOI 10.1080/2150704x.2018.1425560.

Zheng Y, Xie Z, Jiang L, Shimizu H, Drake S. 2006. Changes in Holdridge Life Zone diversity in the Xinjiang Uygur Autonomous Region (XUAR) of China over the past 40 years. Journal of Arid Environments 66(1):113–126 DOI 10.1016/j.jaridenv.2005.09.005.

Zhou J, Lai L, Guan T, Cai W, Gao N, Zhang X, Yang D, Cong Z, Zheng Y. 2016. Comparison modeling for alpine vegetation distribution in an arid area. Environmental Monitoring and Assessment 188:408 DOI 10.1007/s10661-016-5417-x.
