Originally published as: Hoffimann, J., Bufe, A., Caers, J. (2019): Morphodynamic Analysis and Statistical Synthesis of Geomorphic Data: Application to a Flume Experiment. - Journal of Geophysical Research, 124, 11, pp. 2561–2578. DOI: http://doi.org/10.1029/2019JF005245


Morphodynamic Analysis and Statistical Synthesis of Geomorphic Data: Application to a Flume Experiment

Júlio Hoffimann1, Aaron Bufe2, and Jef Caers3

1Department of Energy Resources Engineering, Stanford University, Stanford, CA, USA, 2German Research Center for Geosciences, Potsdam, Germany, 3Department of Geological Sciences, Stanford University, Stanford, CA, USA

Abstract Many Earth surface processes are studied using field, experimental, or numerical modeling data sets that represent a small subset of possible outcomes observed in nature. Based on these data, deterministic models can be built that describe the “average” evolution of a system. However, these models commonly cannot account for the complex variability of many processes or present a quantitative statement of uncertainty. To assess such uncertainty, stochastic models are needed that can mimic spatial as well as temporal variability. A common limitation for applying stochastic models to Earth surface processes is a lack of data and methods that allow constraining the full spatiotemporal variability of these models. In this paper, we propose a Bayesian framework for calibrating input parameters to stochastic models of morphodynamic systems using time series of image data from the field, or from numerical and laboratory experiments. The framework consists of generating synthetic time series of images using the stochastic model and rejecting those time series that do not reproduce key morphodynamic statistics of the available data sets. The calibrated stochastic model allows us to quantify both the spatial and temporal uncertainty about the evolution of the morphodynamic systems of interest. For demonstration purposes, we apply the framework to a single flume experiment of braided river channels evolving under steady water and sediment discharges, but it can be used more generally to quantify spatiotemporal uncertainty for any time series of morphodynamic data for which key statistics can be defined.

1. Introduction

The evolution of many Earth surface systems (such as landscapes or landforms) is governed by processes that are highly variable in both space and time and that are characterized by complex internal (termed “autogenic”) dynamics. Yet, this known variability is challenging to constrain with deterministic approaches that model the average behavior of a process. Deterministic models have been used with great success to characterize, for example, incision into uplifting bedrock (Howard & Kerby, 1983; Kirby & Whipple, 2012; Sklar & Dietrich, 2004), erosion of hillslopes by creep (Roering et al., 1999; Braun et al., 2001) or mass movements (Bishop, 1959; Iverson, 2000), morphodynamics of alluvial channels (Bryant et al., 1995; Howard & Knutson, 1984; Parker, 1979), the transport of sediment by rivers (Meyer-Peter & Müller, 1948), the kinematics of folds and faults (Suppe, 1983), or the evolution of entire landscapes (Tucker & Hancock, 2010). However, these models tend to not constrain the significant variability that may occur. For example, Ma et al. (2014) estimated that macroscopically averaged formulas to predict bed load transport rates could deviate by as much as 1 to 2 orders of magnitude from field data. Such prediction errors motivate alternative treatments of Earth surface processes.

Stochastic models can complement deterministic models by describing the ensemble of possible states or possible spatiotemporal evolutions of a system (Tarantola, 2006). Stochastic models are based on a probabilistic treatment of physical processes, and include a collection of input parameters that are assigned probability distributions instead of deterministic values. Hence, the output of these models varies as a function of the random draw of input parameters. Stochastic models have been used in the Earth sciences to describe, for example, the occurrence of landslides and their impact on topography (Convertino et al., 2013; Moon et al., 2015), the synchronization of snowfall and avalanche release (Crouzy et al., 2015), the transport of sediment on hillslopes (Furbish et al., 2018) and in rivers (Ancey et al., 2008; Furbish et al., 2012; Liang et al., 2015; Lopez, 2003), or the buildup of stratigraphy (Straub & Wang, 2013). Compared with deterministic models, stochastic models are commonly less constrained to the physics governing the modeled system and are more likely to produce unrealistic configurations. Hence, in order to be useful, stochastic models need to be extensively calibrated with available data.

RESEARCH ARTICLE
10.1029/2019JF005245

Key Points:
• We propose a Bayesian framework to calibrate stochastic models of surface dynamics using morphodynamic statistics from time series of images
• We propose a stochastic model for braided channel movement and an algorithm to sample it in a computer
• We use a flume experiment data set to illustrate the framework and to quantify uncertainty in the movement of braided rivers

Correspondence to: J. Hoffimann, [email protected]

Citation: Hoffimann, J., Bufe, A., & Caers, J. (2019). Morphodynamic analysis and statistical synthesis of geomorphic data: Application to a flume experiment. Journal of Geophysical Research: Earth Surface, 124, 2561–2578. https://doi.org/10.1029/2019JF005245

Received 5 JUL 2019
Accepted 9 OCT 2019
Accepted article online 1 NOV 2019
Published online 19 NOV 2019

©2019. American Geophysical Union. All Rights Reserved.

HOFFIMANN ET AL. STATISTICAL ANALYSIS AND SYNTHESIS OF GEOMORPHIC DATA 2561

Journal of Geophysical Research: Earth Surface 10.1029/2019JF005245

Unfortunately, the Earth surface presents to us only a small set of “natural experiments” (i.e., data) that evolve under a particular set of boundary conditions. Moreover, financial and time constraints may prohibit repeating experiments in the field or the laboratory often enough to constrain the range of possible evolutions of a landform or landscape. With such limitations, it can be difficult to calibrate stochastic models and to capture the full spatiotemporal dynamics of a system and corresponding uncertainties. Here, we present a Bayesian framework to calibrate stochastic models of morphodynamic systems on the basis of key morphodynamic statistics computed with available time series of image data. By morphodynamic system, we refer to a system that is characterized by changes in shape, structure, and position of objects such as channels, watersheds, dunes, or mountain ranges.

Our framework is applicable to any time series of image data containing morphodynamic changes. For illustration purposes, we specifically develop and demonstrate the framework using a time series of images from a single laboratory experiment that records the autogenic movement of braided alluvial channels with constant water and sediment discharges. Alluvial river channels that migrate across floodplains or alluvial fans are a prominent example of morphodynamic systems, and their movements and configurations are subject to large temporal and spatial variations. For this specific demonstration, our approach consists of defining a space-time stochastic model for braided alluvial channel movements, a set of key summary statistics related to morphological evolution of the channel network, and a method to generate a large number of synthetic time series that can reproduce the statistical behavior of the original experiment. The stochastic model is based on the conceptual understanding that braided channels exhibit frequent morphological changes of small magnitude and rare morphological changes of large magnitude, where the small magnitude component can be conceptualized as the gradual back and forth migration of braided channels, and the large magnitude component as major reorganizations of the channel system that occur, for example, during the abrupt rerouting of water from an old into a new channel (also known as avulsions).

Following the development of the framework, we discuss its applicability to other morphodynamic systems, its potential to quantify statistical similarities between data and models, and limitations that could be addressed in future work.

2. Experimental Data

Here, we demonstrate our general methodology with imagery data of braided alluvial channels that move in unconsolidated sediments. Although the details of the approach are tuned to this particular data set, the methodology can be adapted to any imagery data of a system that can be represented by frequent small-scale variability around a morphological state and large-scale infrequent reorganizations of that state. Moreover, many parts in the methodology are modular and can be replaced by similar components.

Laboratory experiments allow detailed observations of geomorphic processes under controlled boundary conditions and data collection at a high resolution in both time and space (Bufe et al., 2016; Esposito et al., 2018; Ganti et al., 2016; Paola et al., 2009). Here, we use photographic data from an experiment of braided channel movement in noncohesive sand under steady water and sediment discharges and a steady base level (data available as “Run 7” in Bufe et al., 2016). The experiment was conducted in a rectangular sand-filled wooden box with dimensions of 4.8 × 3.0 × 0.6 m (see Figure 3 in Bufe et al., 2016). A steady discharge of sediment (15.8 ml/s) and blue-dyed water (790 ml/s) was fed from a point source at one of the short sides of the rectangular basin and flowed over a weir at the opposite side. Overhead photographs were collected every 1 min and at a spatial resolution of 1 mm. Throughout the experiment, the system was dominated by sediment bypass during which >80% of the input sediment was transported across the entire basin, and the overall surface change was relatively small (see Figure 5 in Bufe et al., 2019). Under these circumstances, channel movement occurred primarily by gradual channel migration with infrequent channel avulsions (sudden changes in channel position) (Bufe et al., 2019; Jerolmack & Mohrig, 2007). At the beginning of the experiment, multiple unincised braided channels migrated back and forth across the basin, but at ∼450 min, a major autogenic incision event occurred that concentrated all water into a single channel. In the remainder of the experiment, the system slowly widened the incised valley, but the experiment did not continue long enough for the system to return to its initial condition. Thus, within the existing data, this single event brings the channel system into an “absorbing” state, from which the system does not recover. Because we cannot represent such absorbing states with our model, we restricted our analysis to the first 400 min of data. In the discussion, we address possible solutions to this limitation that can be incorporated into the model with additional work. Following Bufe et al. (2016), we cropped all photographs to the basin extent and converted them to binary images with a value of one for wet pixels and a value of zero for dry pixels in the basin. Finally, we upscaled (i.e., coarsened) these binary images from 3,000 × 4,600 pixels to 100 × 150 pixels for faster analysis. We denote the resulting sequence of binary images as $\mathfrak{I} = (I_1, I_2, \ldots, I_n)$, with $n = 400$.

3. Stochastic Process of Surface Dynamics

In this section, we introduce a stochastic process (and model) that attempts to mimic the physical behavior of the morphodynamic process of braided channel migration in space and time. The stochastic process is very general and should be applicable to a range of other morphodynamic systems. Here, we first provide a conceptual description, followed by a detailed mathematical description.

3.1. Conceptual Model

A common characteristic of the morphological variability of many Earth surface processes is that high-frequency, low-magnitude changes are interrupted by infrequent and large changes to the system. For example, many braided channels gradually migrate back and forth across a restricted active fluvial area, and this migration is interrupted by less frequent abrupt shifts in channel position (avulsions) (Jerolmack & Mohrig, 2007). We model such systems as consisting of “modes” and transitions between “modes.” Modes are defined as states of the system where spatial patterns at the surface remain similar (relative to the large changes), while large-scale variations are modeled as infrequent and represent transitions into other modes.

To turn this conceptual idea into the definition of a stochastic model, we will define (1) the modes of the system, (2) transitions between modes that have certain frequencies and transition probabilities, and (3) the small-scale random variations within a mode.

3.2. Modes of the System

Here, we rely on previous work that showed how many fluvial processes can be decomposed into a reduced set of modes (Scheidt et al., 2016). In the case of a time series of imagery data, each mode can be represented by a single image that is representative of a group of images, which are more similar to each other than to images in other modes. As shown in Scheidt et al. (2016), with a reduced set of such representative images the total variability seen across all images is maintained.

The key step in defining modes is to find groups (or clusters) of images of the experiment based on a measure of morphological similarity (or dissimilarity). Here, we use the modified Hausdorff distance (MHD) $d_{MH}$ as our measure of dissimilarity between any two “wet-dry” images (or frames) of the experiment. The MHD is a powerful measure of dissimilarity between shapes, and it is used in many computer vision applications (Dubuisson & Jain, 1994; Huttenlocher et al., 1993). In simple terms, the MHD between a body A and a body B is the maximum of two values, $d_{AB}$ and $d_{BA}$, where $d_{AB}$ is the mean of the set of Euclidean distances between each point on A and the closest point on B, and where $d_{BA}$ is a similar mean of the set of Euclidean distances between each point on B and the closest point on A. This measure is therefore sensitive to both a change of shape and a movement of a body in space, which in this case correspond to differences in channel shapes and channel positions between two images.
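As a minimal sketch of this definition (not the authors' implementation), the MHD between two binary wet-dry images can be computed directly from the coordinates of the wet pixels. The function names `wet_pixels`, `mean_nearest_distance`, and `modified_hausdorff` are illustrative, images are assumed to be nested lists of 0/1 values, and the brute-force nearest-point search is only practical for small images.

```python
import math

def wet_pixels(image):
    """Return (row, col) coordinates of all pixels equal to 1 (wet)."""
    return [(r, c) for r, row in enumerate(image)
            for c, v in enumerate(row) if v == 1]

def mean_nearest_distance(a, b):
    """Mean, over points in `a`, of the Euclidean distance to the closest point in `b`."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def modified_hausdorff(img_a, img_b):
    """MHD = max(d_AB, d_BA) between the wet pixels of two binary images."""
    a, b = wet_pixels(img_a), wet_pixels(img_b)
    if not a or not b:
        # Assumption of this sketch: the distance is undefined without wet pixels.
        raise ValueError("both images need at least one wet pixel")
    return max(mean_nearest_distance(a, b), mean_nearest_distance(b, a))

# Toy example: a straight vertical channel shifted sideways by two pixels.
left = [[1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
right = [[0, 0, 1, 0],
         [0, 0, 1, 0],
         [0, 0, 1, 0]]
print(modified_hausdorff(left, left))   # 0.0 (identical shapes)
print(modified_hausdorff(left, right))  # 2.0 (every pixel moved by two columns)
```

Applying such a function to every pair of frames would give the pairwise distance matrix on which the clustering operates; for larger images a spatial index or a distance transform would be a more efficient choice than the double loop used here.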

Figure 1. Modes (or morphological states) of the system obtained with DBSCAN. (a) Distances between images shown in t-SNE space. The Euclidean distances between points in t-SNE space approximate the modified Hausdorff distances. Fifteen modes (or clusters) are highlighted in gray color. For each mode, the image that represents the cluster centroid is shown. (b) The modified Hausdorff distance for all binary “wet-dry” image pairs.

In order to find groups of images that are similar and define a mode, we apply density-based spatial clustering of applications with noise (DBSCAN) (Ester et al., 1996) to the MHD values between images. DBSCAN is a scalable clustering algorithm designed for arbitrarily shaped point sets (each point is a “wet-dry” image) that exploits the MHD matrix between all pairs of images of the experiment (Figure 1b). In this algorithm, a threshold of dissimilarity determines whether or not any two images belong to the same cluster. Hence, the threshold determines the number of clusters or modes of the system. The higher the threshold of dissimilarity, the smaller is the number of clusters, and the larger is the number of images within each cluster. In principle, the threshold can be selected based on a physical understanding of the system. For example, a threshold for the data set of braided channel movements could be chosen to capture a given avulsion magnitude. For demonstration purposes, in this study we select the threshold based on a visual inspection of clusters. We note that other distance measures, clustering algorithms, and thresholds could be used. To visualize the clustering results, we use t-distributed stochastic neighbor embedding (t-SNE) (van der Maaten & Hinton, 2008); t-SNE is a multidimensional scaling method that embeds information of distances between high-dimensional objects into lower dimensions. The method converts the MHD values between images (Figure 1b) into probabilities of being similar, and then solves a probabilistic optimization problem to place images (represented as points) into a 2-D t-SNE space (Figure 1a). In this space, small distances between images (points) indicate morphological similarity. This information is rather qualitative, and the axes in Figure 1a do not have a physical meaning. Using a threshold of dissimilarity selected based on visual inspection, we get 15 clusters (modes) that are well distinguished (highlighted in gray color) in t-SNE space (Figure 1a). We will denote this finite set of modes in our example as

$$\mathfrak{M} = \{M_1, M_2, \ldots, M_m\}, \quad m = 15 \tag{1}$$

Each image (or point) in Figure 1a is colored with its time index (1 to 400 min) from the real experiment. Some clusters like cluster $M_1$ and $M_{15}$ are far apart in t-SNE space, and that means that images in $M_1$ are very different from images in $M_{15}$ in terms of morphology. Moreover, not all modes have similar “size” in the sense that some modes contain more images than others. The number of images in each cluster/mode is here interpreted as a proxy for the likelihood of the system to occupy that mode in the limit in which the system has reached a steady state. Therefore, we normalize the cluster sizes $|M_1|, |M_2|, \ldots, |M_m|$ to obtain a set of probabilities $\pi_j = |M_j| / \sum_i |M_i|$ for each mode of the system. We say that the system spends a fraction $\pi_j$ of the time in mode $M_j$.
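The normalization of cluster sizes can be illustrated with a few lines of code. This is only a sketch: the name `stationary_distribution` and the toy labels are hypothetical, and every frame is assumed to have been assigned to exactly one mode.

```python
from collections import Counter

def stationary_distribution(labels):
    """Map each mode label to the fraction of frames assigned to that mode."""
    counts = Counter(labels)              # |M_j| for every mode j
    total = sum(counts.values())          # sum_i |M_i|
    return {mode: n / total for mode, n in sorted(counts.items())}

# Hypothetical cluster labels for ten frames and three modes.
labels = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
pi = stationary_distribution(labels)
print(pi)  # {1: 0.4, 2: 0.3, 3: 0.3}
```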

3.3. Transitions Between Modes

The simplest stochastic process that models the transition between a pair of modes needs at least two parameters: (1) a transition probability that describes the probability of the new mode given the current mode and (2) a frequency of transitions that describes how often transitions occur.

Figure 2. Generation of synthetic images with random variations around a mode. Representative images of each mode (first column) are smoothed (second column). This smoothed version is used to generate synthetic images via image quilting (last three columns). Synthetic images reproduce the main morphological features of the mode and plot within the mode in t-SNE space.

Here, we model transitions as a function of the similarity between modes. Thus, transitions between modes that “look” similar (as defined by being close in t-SNE space; Figure 1a) are more likely than transitions between more distant modes. In physical terms, small rearrangements are more likely than major reconfigurations of the channel network, as suggested by many frequency-magnitude scalings of surface processes. The system transitions between these modes according to transition probabilities $P_{ij} = \Pr(M_i \to M_j)$. These transition probabilities, a matrix of size $m \times m$, must satisfy a set of constraints in order to reproduce the probability distribution $\pi$ that describes the fraction of time of the system within each mode. In Markov chain theory, $\pi$ is termed a stationary distribution of the system, and many transition probabilities lead to the same stationary distribution. In Appendix A, we propose a model of transition probabilities that exploits the Euclidean distance $D_{ij}$ between modes $M_i$ and $M_j$ in the t-SNE space. These transition probabilities are approximately proportional to

$$P_{ij} \propto \frac{1}{\sqrt{\sigma^2}} \exp\left(-\frac{1}{2} \frac{D_{ij}^2}{\sigma^2}\right) \tag{2}$$

with $\sigma > 0$ a dispersion parameter. The role of the dispersion parameter $\sigma$ can be understood as follows: A high dispersion implies a high chance of transitions occurring between very distinct modes, which is equivalent to a high chance of extreme morphological changes, or major channel movements in our experiment with braided alluvial channels. In addition to reflecting the dispersion, the obtained transition probabilities $P_{ij}$ reproduce the stationary distribution (i.e., $\pi P = \pi$).
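To make the two requirements concrete (a Gaussian dependence on distance and $\pi P = \pi$), the sketch below builds a small transition matrix. It does not reproduce the convex optimization of Appendix A; instead it swaps in a Metropolis-Hastings construction, which is one standard way to obtain a chain with a prescribed stationary distribution. The function names and the three-mode example values are hypothetical.

```python
import math

def gaussian_proposal(D, sigma):
    """Row-normalized Gaussian kernel: Q[i][j] proportional to exp(-D[i][j]^2 / (2 sigma^2))."""
    Q = []
    for row in D:
        w = [math.exp(-0.5 * (d / sigma) ** 2) for d in row]
        s = sum(w)
        Q.append([x / s for x in w])
    return Q

def metropolis_transition_matrix(D, pi, sigma):
    """Transition matrix with stationary distribution `pi` (Metropolis-Hastings).

    Assumes all entries of `pi` are positive and that the kernel does not underflow.
    """
    m = len(pi)
    Q = gaussian_proposal(D, sigma)
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                accept = min(1.0, (pi[j] * Q[j][i]) / (pi[i] * Q[i][j]))
                P[i][j] = Q[i][j] * accept
        P[i][i] = 1.0 - sum(P[i])  # rejected proposals keep the chain in mode i
    return P

# Hypothetical distances between three modes and their occupation fractions.
D = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 2.0],
     [3.0, 2.0, 0.0]]
pi = [0.5, 0.3, 0.2]
P = metropolis_transition_matrix(D, pi, sigma=1.5)
pi_next = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
print(pi_next)  # equals pi up to floating-point rounding
```

Increasing `sigma` in this sketch flattens the kernel, so transitions between distant modes become more likely, which mirrors the interpretation of the dispersion parameter given above.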

The transition probability based on the dispersion parameter models how the stochastic process moves between modes, but a second parameter is needed to describe the frequency of transitions and, thus, how long the process remains within a single mode. We propose to model the frequency of transitions as a homogeneous Poisson process with fixed rate parameter $\lambda$ (Cox & Isham, 1980), which is equivalent to modeling the interarrival times of transitions using an exponential distribution with parameter $\lambda$. The Poisson process is easy to simulate and well suited for large changes that are not clustered in time, such as the transitions between modes. For systems in which extreme changes are clustered in time (e.g., earthquakes and aftershocks), heterogeneous point processes with varying rates may be more appropriate than a Poisson process. We emphasize that our methodology is agnostic to this modeling choice and other processes can be implemented into our framework.
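The equivalence between a homogeneous Poisson process and exponential interarrival times can be checked numerically. In the sketch below, the rate of 0.05 transitions per minute is a hypothetical value chosen for illustration, and `transition_times` is an illustrative helper rather than part of the published method.

```python
import random

def transition_times(rate, horizon, rng):
    """Transition times in (0, horizon] built from exponential interarrival times."""
    times = []
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

rng = random.Random(0)
rate = 0.05       # hypothetical: one transition every 20 min on average
horizon = 400.0   # minutes, the length of the analyzed record
counts = [len(transition_times(rate, horizon, rng)) for _ in range(2000)]
mean_count = sum(counts) / len(counts)
print(mean_count)  # should be close to rate * horizon = 20
```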

3.4. Random Variations Within a Mode

In order to allow the generation of a large number of alternative data sets, we produce synthetic images for each of the modes of the experiment. Within a single mode, the system remains similar, and changes in the river network remain relatively small. In previous work (Hoffimann et al., 2017; Scheidt et al., 2016), we showed that the variability generated within such modes can be reproduced by geostatistical simulations that model the patterns within that mode. We use an image quilting algorithm (Hoffimann et al., 2017), an idea borrowed from computer graphics, to produce a number of images that are a variation of a given training image (Figure 2), but various other geostatistical simulation algorithms could be used (Arpat & Caers, 2007; Mariethoz et al., 2010; Strebelle, 2002; Tahmasebi et al., 2014; Yang et al., 2016; Zhang et al., 2006). Our choice of image quilting is motivated by its computational performance and ability to reproduce complex shapes without the need to fine tune parameters. In simple terms, image quilting stitches together tiles that were randomly extracted from training images to produce new images with similar spatial patterns. Here, the training images are the centroids of the modes selected through the clustering process. A centroid of a mode (i.e., cluster of points in t-SNE space) is a point in the mode that is nearest to the center (i.e., arithmetic mean of points) of the mode. When multiple centroids exist for a mode, the final centroid (or training image) is selected at random from the list of centroids.

In image quilting, the random selection of patterns from the training image can be controlled with auxiliary variables (Hoffimann et al., 2017). In order to ensure that random variations remain within the mode (with high probability), we introduce a blurred version of the training image as an auxiliary variable to constrain the rearrangement of patterns in the tank. Based on this blurred image and the training image itself, a new synthetic image is produced at the initial resolution. Without blurring (i.e., auxiliary variable = training image), image quilting would not produce any change in the training image (Hoffimann et al., 2017). On the other hand, for very strong blurring, the synthetic images could fall outside the mode boundaries. The idea of using auxiliary variables to constrain morphology in geostatistical simulation is not new and has been applied before, for example, for generating random elevation maps around river centerlines (Pirot et al., 2014). Figure 2 illustrates the image quilting results for two modes of the system and for a small smoothing window (e.g., 3 × 3 pixels). The resulting synthetic image is not based on a physical understanding of the morphodynamic process (in this case, braided channel movement). It is simply a random variation of the training image that presents a given morphological similarity to the training image. Therefore, physically impossible images can be produced, but the variability around the mode of the synthetic images has to be statistically similar to the natural variability. It is possible to develop additional tests that ensure physical plausibility including, for example, the continuity of channels or a smooth variation of channel width. A key assumption behind our approach is that statistical tests that are appropriate to the time and spatial scales of investigation can be used effectively to (1) reduce the number of nonphysical synthetic time series and (2) select synthetic time series that quantify the variability in the overall behavior of the real system, even if some features may be unphysical on smaller scales.

3.5. Implementation and Monte Carlo

Our method generates synthetic time series of image data (i.e., movies) using a combination of Markov, Poisson, and image quilting processes. A random sequence of mode occupations is determined based on the transition probabilities (Markov process). Then, a number of images (Poisson process) is randomly drawn from each of the modes using the image quilting process. In technical terms, we present an algorithm termed Markov-Poisson sampling that samples from the above defined stochastic process, which has two parameters introduced in section 3.3: the process rate $\lambda$ and the process dispersion $\sigma$. This process is now written as $S_{\lambda,\sigma}(x, t)$ in space ($x$) and time ($t$). Each sample consists of a time series of synthetic images for times $t = 1, 2, \ldots, T$. Algorithm 1 takes as input the system configuration (the set of modes, $\mathfrak{M}$, the set of Euclidean distances in t-SNE space, $D$, and the stationary distribution $\pi$), the stochastic process parameters ($\lambda$, $\sigma$), and a length $T$ for the sample (in our example $T$ is 400 min).

First, the transition matrix $P$ is obtained via convex optimization (see Appendix A) as a function of $(D, \pi, \sigma)$. An empty movie $S$ is initialized and one mode of the system $s$ is drawn at random from the stationary distribution $\pi$. For each iteration in the loop, the algorithm determines the mode $s'$ that the system occupies at that iteration based on the mode $s$ of the previous iteration and the transition probabilities $P$. The algorithm then determines the time $\Delta t$ that the system spends in that mode and creates synthetic images from the centroid of the mode $M_{s'}$ to be appended as frames in the movie $S$. In order to generate multiple samples (Monte Carlo), we simply execute Algorithm 1 multiple times.
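The loop described in this paragraph can be sketched as follows. This is a hedged reading of the prose, not a transcription of Algorithm 1: the transition matrix is passed in ready-made, the callback `synthesize` is a hypothetical stand-in for image quilting around a mode centroid, and the conversion of $\Delta t$ into a whole number of one-minute frames (at least one per visit) is an assumption of this sketch.

```python
import random

def markov_poisson_sample(P, pi, rate, T, synthesize, rng):
    """Return a movie of T frames from the Markov-Poisson loop described above."""
    modes = list(range(len(pi)))
    movie = []                                        # empty movie S
    s = rng.choices(modes, weights=pi)[0]             # initial mode drawn from pi
    while len(movie) < T:
        s_new = rng.choices(modes, weights=P[s])[0]   # Markov step: s -> s'
        dt = rng.expovariate(rate)                    # time spent in mode s'
        n_frames = max(1, round(dt))                  # assumption: one frame per minute
        for _ in range(min(n_frames, T - len(movie))):
            movie.append(synthesize(s_new, rng))      # stand-in for image quilting
        s = s_new
    return movie

# Hypothetical three-mode configuration; this P satisfies pi P = pi.
P = [[0.60, 0.30, 0.10],
     [0.50, 0.30, 0.20],
     [0.25, 0.30, 0.45]]
pi = [0.5, 0.3, 0.2]
movie = markov_poisson_sample(P, pi, rate=0.05, T=400,
                              synthesize=lambda mode, rng: mode,  # returns the mode index only
                              rng=random.Random(42))
print(len(movie), sorted(set(movie)))
```

Calling the function repeatedly with different random generators would give the Monte Carlo ensemble; in a real application `synthesize` would return an image rather than a mode index.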

The key parameters in this algorithm are the Poisson process parameter $\lambda$ that controls the probability of a transition to some new mode and the dispersion $\sigma$ that, for any transition that occurs, describes the probability of picking the new mode based on the distance between modes. So far, we considered sampling from yet uncalibrated parameters, and in the next section, we explain how distributions $f(\lambda)$ and $f(\sigma)$ on these parameters can be learned based on the physical experiment.


4. Bayesian Inference

In this section, we address two questions. (1) (falsification) Can the series of synthetic images produced by the stochastic process $S_{\lambda,\sigma}(x, t)$, with given prior distributions $f(\lambda)$ and $f(\sigma)$, reproduce key statistics of the experiment $\mathfrak{I}$? (2) (inversion) What is the posterior distribution of parameters $\lambda$ and $\sigma$ given the experiment, i.e., what is $f(\lambda, \sigma \mid \mathfrak{I})$? The falsification is a necessary step for the inversion, and it is fundamental because it tests the hypothesis that the stochastic process can, in general, produce synthetic models that are realistic alternative outcomes of the experiment. We propose that a model can be considered equivalent to the experiment if key test statistics are similar (within some difference threshold $\epsilon$). Here, we define three key test statistics that are relevant for the flume experiment, but the falsification and inversion can include any type and number of statistical summaries that are appropriate for the given type of data and the desired details of the inferences drawn from the data.

4.1. Definition of Test Statistics

Many forms of statistical summaries exist that can be used to describe morphological features (Liang et al., 2016). Here, we use statistical measures common in time series analysis to summarize the morphodynamic variability of the geomorphic system on both short and long timescales.

The long-term temporal variability in dynamic systems such as floods, landslides, or earthquakes is commonly represented by return levels and return periods (equivalent to frequency-magnitude relationships) (Dalrymple, 1960; Gutenberg & Richter, 1944; Hovius et al., 1997). The same return level analysis can be applied to morphological changes resulting from braided channel movements. We calculate a measure of difference between pairs of consecutive images $d(t)$ for the original time series of image data $\mathfrak{I}$ and for all model runs (Figure 3a):

$$d(t) = d_{MH}(I_t, I_{t+1}), \quad t = 1, 2, \ldots, n - 1 \tag{3}$$

Figure 3. Return statistics for the experiment. (a) Time series $d(t)$ of the modified Hausdorff distance between consecutive binary images of the experiment. Abrupt changes in morphology generate a spike in the plot, for example, between $t$ = 162 min and $t$ = 163 min of the experiment. (b) Return levels and return periods computed from the time series.

Again, we use the modified Hausdorff distance $d_{MH}$ as a measure of difference between images that is sensitive to the variability of both the shape and position of objects. Next, we calculate return statistics on the magnitude of change in the time series of modified Hausdorff differences between consecutive images (Figure 3). In general, a return level for large magnitudes $X$ and return period $T$ (here $T$ is in minutes) implies that a change of shape and position of magnitude $X(T)$ is expected to occur, on average, once every period of $T$. In other words, $X(T)$ is the magnitude of change that is exceeded in any given minute with probability $1/T$. Return levels are calculated by block maxima (Coles, 2001) in which blocks of increasing size $T$ are created within which the maximum change (the highest calculated modified Hausdorff distance between two sequential images) is retained. Obviously, the larger the blocks $T$, the larger the return level, but the smaller the total number of blocks of that size. Figure 3b shows the empirical return levels of the time series $d(t)$ of the experiment.
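A minimal sketch of the block-maxima calculation is given below. The text does not state how the maxima of the individual blocks are combined into one return level, so the sketch assumes their mean and drops an incomplete trailing block; both choices, the function name `return_level`, and the toy series are illustrative assumptions.

```python
def return_level(d, period):
    """Mean of the maxima over consecutive, non-overlapping blocks of length `period`.

    Assumptions of this sketch: block maxima are summarized by their mean,
    and a trailing block shorter than `period` is ignored.
    """
    n_blocks = len(d) // period
    if n_blocks == 0:
        raise ValueError("period is longer than the time series")
    maxima = [max(d[b * period:(b + 1) * period]) for b in range(n_blocks)]
    return sum(maxima) / n_blocks

# Toy series of distances between consecutive frames (hypothetical values).
d = [0.2, 0.5, 0.1, 0.9, 0.3, 0.4, 1.5, 0.2]
levels = {T: return_level(d, T) for T in (1, 2, 4, 8)}
print(levels)  # longer return periods give larger return levels for this series
```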

Another common way of studying large events is through the use of generalized extreme value distributions or generalized Pareto distributions (Coles, 2001). These distributions are characterized by an extreme value index ξ, which is a measure of how “heavy” or “long” the tail of the distribution is. Here, we estimate ξ for the time series of the modified Hausdorff distances of consecutive images, d(t), via the mean excess plot (Beirlant, 2004). The mean excess plot is constructed by calculating an extreme value index for a subset of the k largest d(t) values, where k varies between 1 (only the largest d(t) value retained) and 400 (all values in the time series are retained) (Figure 4). For small k (between ∼1 and 20 values) the extreme value index varies strongly with k, that is, the mean excess plot shows high variance. In contrast, for a large number of retained distance values (k > 150), the estimate of the extreme value index deviates from the true value (i.e., the estimator is biased) (Beirlant, 2004) (Figure 4). An extreme value index of ξ ≈ 0.15 is found for k = 20–150. This positive value implies that morphological changes in the experiment are rare and large and show Pareto behavior (Ganti et al., 2011).

Figure 4. Mean excess plot for the experiment showing an extreme value index of approximately ξ = 0.15, and hence a Pareto behavior of extreme morphological changes. Retaining more maxima than 25 leads to high bias (i.e., deviation from the true unknown index). A number of maxima much smaller than 25 leads to high variance (i.e., indices that are too sensitive to fluctuations in the data).

Figure 5. Empirical variograms of wetted area. (a) Wetted area w(t) as a function of time, shown in blue color. A smooth mean is filtered out (dashed gray line), then the empirical variogram is computed based on the detrended time series, shown in black color. (b) The empirical variogram computed from the wet area process. The correlation length (or range) is ≈20 min. For lags larger than the correlation length, the variations in wet area are uncorrelated.
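The dependence of the estimated index on k can be illustrated with a short sketch. The text does not name the estimator behind the plot; the Hill estimator used below is a common choice for a positive index and is an assumption of this example, as are the function names and the synthetic Pareto-like data.

```python
import math

def hill_estimator(values, k):
    """Hill estimate of the extreme value index from the k largest values.

    The (k + 1)-th largest value serves as threshold, so the data must be
    positive and k must satisfy 1 <= k < len(values).
    """
    ordered = sorted(values, reverse=True)
    if not 1 <= k < len(ordered):
        raise ValueError("k must satisfy 1 <= k < len(values)")
    threshold = ordered[k]
    return sum(math.log(x / threshold) for x in ordered[:k]) / k

def index_versus_k(values, k_max):
    """Estimates for k = 1, ..., k_max, i.e., the curve one would plot against k."""
    return [(k, hill_estimator(values, k)) for k in range(1, k_max + 1)]

# Deterministic toy data: exact Pareto quantiles with tail index xi = 0.5.
xi_true = 0.5
n = 2000
sample = [((i + 0.5) / n) ** (-xi_true) for i in range(n)]
curve = index_versus_k(sample, k_max=200)
```

With real data the curve is noisy for small k and drifts for large k, so the index is usually read off a stable plateau in between, in the same spirit as the k = 20–150 range quoted above.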

Return levels and extreme value indices are particularly useful to characterize large magnitude changes (extremes) that have relatively long return periods (e.g., changes between modes of the system). In order to compare the small-scale spatial and temporal variability between the experimental and the synthetic time series, we construct variograms of the temporal evolution of wetted area in the basin, w(t). The wetted area is simply the number of wet pixels in an image and reflects the movement, expansion, and contraction of the stream network (Bufe et al., 2016; Wickert et al., 2013). In turn, the variogram is a statistical summary that quantifies the temporal dependence of the time series for increasing time lags (Chiles & Delfiner, 1999). For data that are temporally correlated (in other words, the evolution of wetted area follows a clear pattern), the variogram increases as the time lag increases. At some time interval, termed “correlation length,” the variogram stops increasing with the time lag, and the data are completely uncorrelated. In other words, beyond the correlation timescale, the evolution of the wetted area is completely uncorrelated with the initial image. The shape of the variogram will depend on the evolution of the wetted area and, therefore, on the characteristics of the rates and patterns of reworking of the basin area by the braided channels. For example, the correlation length is expected to scale with the migration rates of channels. Before calculating the variogram, we remove long timescale trends in wetted area by calculating the residual to a filtered mean of the data (Figure 5a). The removal of any long timescale trends is important to ensure that the obtained variograms are a good summary of short timescale variations. We calculate the empirical variogram (Chiles & Delfiner, 1999) of the detrended series $w_0(t)$ (Figure 5b):

$$\hat{\gamma}(\delta t) = \frac{1}{2|N(\delta t)|} \sum_{(t,t') \in N(\delta t)} \left(w_0(t) - w_0(t')\right)^2 \tag{4}$$

with δt the time lag, and N(δt) the set of pairs of times (t, t′) that are δt units apart. For the experiment, the correlation length is ≈20 min. Therefore, the variogram analysis of the wetted area provides relevant statistics for the morphodynamics of the system on short (minute) to medium (≈20 min) timescales.
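Equation (4) and the detrending step can be sketched as follows for a regularly sampled series. The centered moving average stands in for the "filtered mean", whose exact filter is not specified in the text; the window length, function names, and toy series are likewise assumptions of this example.

```python
def moving_average(series, half_window):
    """Centered moving average; the window is truncated at the ends of the series."""
    n = len(series)
    smooth = []
    for t in range(n):
        window = series[max(0, t - half_window):min(n, t + half_window + 1)]
        smooth.append(sum(window) / len(window))
    return smooth

def detrend(series, half_window):
    """Residual w0(t) = w(t) minus the smooth mean."""
    return [w - m for w, m in zip(series, moving_average(series, half_window))]

def empirical_variogram(residual, max_lag):
    """gamma_hat(lag): sum of squared differences over all pairs, divided by 2 |N(lag)|."""
    gamma = {}
    for lag in range(1, max_lag + 1):
        pairs = [(residual[t], residual[t + lag]) for t in range(len(residual) - lag)]
        gamma[lag] = sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))
    return gamma

# Toy wetted-area series (hypothetical): a linear trend plus a period-2 oscillation.
w = [100 + 0.5 * t + (1 if t % 2 else -1) for t in range(60)]
gamma = empirical_variogram(detrend(w, half_window=5), max_lag=4)
```

For this toy series the variogram is large at odd lags and close to zero at even lags, because the oscillation repeats every two samples; a real wetted-area record would instead rise toward a plateau at the correlation length.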

On the basis of this correlation timescale, we choose to compare the experimental data and the model time series with the variogram of the wetted area for δt < 20 min. In turn, we consider return levels only for return periods above 50 min, a duration well above this correlation timescale in which morphological changes occur as rare extreme events.

Figure 6. Results of the falsification step. Test statistics of all model runs (gray curves) are compared against the experiment statistics (blue curves and vertical dashed line). The synthetic samples span the experiment statistics and should, in principle, be able to reproduce these statistics with the correct choice of parameters. (a) Variogram statistics, (b) return statistics, and (c) extreme value index.

4.2. Statistical Falsification

Here, we compare the return level plot, the extreme value index, and the variogram of the wetted area estimated from the experimental data to the model runs in order to test whether the stochastic process can, in principle, produce models that are statistically similar to the experiment. This falsification consists of performing Monte Carlo simulations of the stochastic process with varying λ and σ and comparing the statistical measures generated with the values of the experiment. The model is falsified if the statistical behavior of the real experiment lies outside the range of statistical behaviors of all model runs. In that case either (1) the conceptual model is wrong, (2) the prior distributions are wrong, or (3) both are wrong. In contrast, if the test statistic results of the real experiment lie within the range of results from the synthetic time series, the model can, in principle, reproduce the behavior of the experiment if the correct combination of λ and σ is used.
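The falsification test can be sketched as a range check. Interpreting "lies outside the range" as a pointwise comparison against the minimum and maximum of all model runs is an assumption of this example, as are the function names and the toy numbers.

```python
def within_envelope(observed, simulated_runs):
    """True if every observed value lies between the min and max of the runs at that index."""
    for i, obs in enumerate(observed):
        column = [run[i] for run in simulated_runs]
        if not min(column) <= obs <= max(column):
            return False
    return True

def is_falsified(observed_stats, simulated_stats):
    """The model is falsified if any observed statistic leaves the simulated range.

    `observed_stats` maps a statistic name to a curve (list of floats);
    `simulated_stats` maps the same name to a list of curves, one per run.
    """
    return any(
        not within_envelope(observed_stats[name], simulated_stats[name])
        for name in observed_stats
    )

# Hypothetical statistics for one experiment and two Monte Carlo runs.
observed = {"variogram": [0.2, 0.5, 0.7], "return_levels": [3.0, 5.0], "xi": [0.15]}
simulated = {
    "variogram": [[0.1, 0.3, 0.4], [0.4, 0.8, 1.0]],
    "return_levels": [[2.0, 4.0], [6.0, 9.0]],
    "xi": [[-0.1], [0.4]],
}
falsified = is_falsified(observed, simulated)
```

A pointwise envelope is a permissive criterion: passing it shows only that the prior covers the experiment, not that any single run matches all statistics at once.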

Falsification requires stating the prior distributions f(λ) and f(σ). Because our type of statistical model is new, and because we have gained insight into the model's sensitivity to input parameters during the development process, we chose a 2-order-of-magnitude-wide prior for both parameters. By trial and error, we noticed that widening the prior further would only increase computational time and lead to models that were visually quite different from the experiment. We sampled parameters uniformly from f(λ, σ) = f(λ)f(σ) = Uniform(0.001, 0.1) × Uniform(0.1, 10.0). The uniform distribution reflects our indifference in choosing any particular value in the stated ranges. Another reasonable choice for the rate parameter λ (i.e., a positive reciprocal parameter) would be a Jeffreys prior (Tarantola, 2001), but we did not implement this prior here. In any case, the prior is chosen according to a maximum-entropy criterion, wherein there is value in being as faithful to what is not known as there is in considering what is known. Notice that samples from the prior distribution are uncorrelated. It is only later with Bayesian inversion that correlations may (and often will) emerge.
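A minimal sketch of sampling from the stated prior is given below, together with the log-uniform alternative that a Jeffreys prior for a positive scale-type parameter would imply (density proportional to 1/λ). The function names, seed, and sample sizes are assumptions of this example; the log-uniform sampler was not used in the paper.

```python
import math
import random

LAMBDA_RANGE = (0.001, 0.1)  # rate parameter, two orders of magnitude
SIGMA_RANGE = (0.1, 10.0)    # dispersion parameter, two orders of magnitude

def sample_uniform_prior(rng):
    """Independent uniform draws of (lambda, sigma), as stated in the text."""
    return rng.uniform(*LAMBDA_RANGE), rng.uniform(*SIGMA_RANGE)

def sample_log_uniform(rng, low, high):
    """Draw with density proportional to 1/x on [low, high]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # fixed seed, only for reproducibility of the sketch
uniform_draws = [sample_uniform_prior(rng) for _ in range(1000)]
jeffreys_lambdas = [sample_log_uniform(rng, *LAMBDA_RANGE) for _ in range(1000)]

# Share of draws in the lowest decade of the lambda range under each prior.
share_uniform = sum(lam < 0.01 for lam, _ in uniform_draws) / len(uniform_draws)
share_jeffreys = sum(lam < 0.01 for lam in jeffreys_lambdas) / len(jeffreys_lambdas)
```

The uniform prior places roughly a tenth of its draws in the lowest decade of λ, whereas the log-uniform prior places about half there, so the two choices weight slow systems quite differently.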

We find that with the chosen prior distribution, the experiment lies well within the test statistic results of all model runs (Figure 6) and that the model is not falsified. This similarity in the test statistics does not prove that the stochastic model is true. However, the test against three key test statistics strengthens the hypothesis that the stochastic model can produce realistic alternative outcomes of the experiment.

In the remainder of the analysis, we continue with only the variogram of the wetted area and the return statistics of extreme morphological changes. The extreme value index characterizes long timescale changes similarly to the return levels and does not contribute additional constraints on system behavior (i.e., Bayesian inversion results with and without ξ should be statistically equivalent).

4.3. Bayesian Inversion

Given that the stochastic model can reproduce key statistical summaries of the experiment, we now proceed with estimating the posterior distribution f(λ, σ | ℑ). By posterior, we mean that the distribution of λ and σ is constrained to match the experimental statistics within some range of allowed variability. This is a key step to find those synthetic image time series that can be considered alternative realizations of the original data set. Our method relies on approximate Bayesian computation (ABC). ABC is a method whereby, using Monte Carlo, multiple samples of the parameters are generated and then accepted or rejected according to some measure of difference with the observations and a stated difference threshold ε. The method is approximate in the sense that ε is not zero, which allows the statistics of the models to deviate (up to the threshold ε) from the statistics of the experiment (Marin et al., 2012). We chose ABC as our inversion method because it does not require additional distributional assumptions on the stochastic model nor on the measure of difference. The measure of difference here is simply the difference between the simulated statistics and the observed statistics. The choice of the difference threshold is guided by computational resource limits. Ideally, ε is chosen as close to zero as possible. An ε of zero would imply that only those synthetic experiments that have the exact same test statistics as the real experiment would be chosen. However, choosing a small or zero ε leads to a low number of models that are accepted in a finite amount of computational time. Therefore, a common methodology for choosing ε given finite computational resources is to gradually increase the threshold until a satisfactory number of models is obtained. The approximation error associated with choosing a positive ε as opposed to zero is only understood in very simple situations where one assumes independent and identically distributed additive error (Van Der Vaart et al., 2018; Wilkinson, 2013). In more complex situations, the impact of a nonzero ε on the comparability between natural and synthetic time series has to be assessed by other means. We do not explore the impact of the similarity threshold ε on the pattern of braided channels here, but we note that a systematic investigation into the effect of ε would be useful.
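The practice of gradually increasing the threshold until enough models are accepted can be sketched as follows, given the distances of a batch of already simulated models. The geometric growth schedule, the starting value, the function name, and the toy distances are assumptions of this example.

```python
def grow_threshold(distances, target, start=1e-4, factor=2.0):
    """Increase epsilon geometrically until at least `target` distances fall below it.

    Assumes finite distances; the guard below ensures the loop terminates.
    """
    if target > len(distances):
        raise ValueError("not enough simulations to reach the target")
    epsilon = start
    while sum(d < epsilon for d in distances) < target:
        epsilon *= factor
    return epsilon

# Hypothetical distances between simulated and observed statistics.
distances = [0.0003, 0.002, 0.0007, 0.05, 0.0009, 0.3, 0.0012, 0.01]
epsilon = grow_threshold(distances, target=4)
accepted = [d for d in distances if d < epsilon]
```

The returned threshold depends on both the schedule and the number of simulations available, which is one reason the effect of ε on the results deserves the systematic study suggested above.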

We denote $\hat{\eta}_\xi$ (extremal behavior as indicated by the return level plot) and $\hat{\eta}_\gamma$ (short timescale behavior as indicated by the variogram) the summary statistics based on the experiment and denote $\eta_\xi = \eta_\xi(S_{\lambda,\sigma})$ and $\eta_\gamma = \eta_\gamma(S_{\lambda,\sigma})$ the statistics generated by a sample of the stochastic model. We define a measure of difference between these functions, $d_\eta\left((\hat{\eta}_\xi, \hat{\eta}_\gamma), (\eta_\xi, \eta_\gamma)\right)$, using functional analysis (Rudin, 1991), and then we perform the following Algorithm 2, which represents an ABC.


Figure 7. Results from the inversion step showing histograms of the frequency and dispersion parameters of all accepted model runs as well as a scatter plot of the two parameters. Contours in the scatter plot highlight the density of points. Two regions that both reproduce the test statistics of the physical experiment (low-λ, high-σ and high-λ, low-σ) are visible in the parameter space and are separated by the red dashed line. The black solid line is a regression line illustrating the negative correlation between the two parameters.

The algorithm takes as input the prior distribution on input parameters (f(λ), f(σ)), the statistics of the experiment ($\hat{\eta}_\xi$, $\hat{\eta}_\gamma$), a threshold ε, and the desired number of accepted posterior models L. We chose to generate L = 400 accepted model runs, and the threshold value of ε was selected to allow reaching L in a reasonable computing time (here ε = $10^{-3}$). At each iteration, samples of the input parameters (λ, σ) are drawn from the prior distribution, then Markov-Poisson sampling (Algorithm 1) is performed, and statistics are extracted from the synthetic movie S. If the distance $d_\eta$ to the experiment statistics is smaller than the threshold ε, then the parameters (λ, σ) are accepted. Because ABC is computationally intensive, particularly for small threshold ε, we use the prior samples generated in the falsification step (section 4.2) to build a kernel density estimator kde(λ, σ, $d_\eta$) and to sample triplets (λ, σ, $d_\eta$) efficiently from the prior distribution. In this case, a triplet is discarded whenever $d_\eta$ ≥ ε.
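The accept/reject loop described above can be sketched with a generic ABC rejection sampler. The simulator below is a deliberately simple stand-in (mean waiting time of a Poisson process with rate λ) and not the Markov-Poisson image model of the paper; the function names, the observed value, the threshold, and the draw limit are all assumptions of this example, and the kernel density shortcut is not reproduced.

```python
import random

def abc_rejection(sample_prior, simulate, distance, observed, epsilon, n_accept, rng,
                  max_draws=100_000):
    """Plain ABC rejection: keep prior draws whose simulated statistic is within epsilon."""
    accepted = []
    for _ in range(max_draws):
        theta = sample_prior(rng)
        if distance(simulate(theta, rng), observed) < epsilon:
            accepted.append(theta)
            if len(accepted) == n_accept:
                break
    return accepted

# --- toy stand-ins (hypothetical, not the paper's model) ---
def sample_prior(rng):
    """Uniform prior on the rate parameter lambda."""
    return rng.uniform(0.001, 0.1)

def simulate(lam, rng, n_events=50):
    """Summary statistic: mean waiting time between n_events events of rate lam."""
    return sum(rng.expovariate(lam) for _ in range(n_events)) / n_events

def distance(simulated, observed):
    """Absolute difference between simulated and observed summary statistics."""
    return abs(simulated - observed)

rng = random.Random(1)  # fixed seed, only for reproducibility of the sketch
posterior = abc_rejection(sample_prior, simulate, distance, observed=20.0,
                          epsilon=1.0, n_accept=100, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

In this toy setting the accepted rates cluster near 1/20 = 0.05, the value for which the expected waiting time matches the observed statistic. Replacing `simulate` and `distance` with the image-based model and $d_\eta$ would leave the loop itself unchanged.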

The posterior distribution f(λ, σ | ℑ) is shown in Figure 7. Examples of accepted model runs and their summary statistics are shown in Figure 8. We observe two broad regions in the parameter space that both produce acceptable models and cannot be distinguished with the chosen test statistics (Figure 7). These two regions (to the left and right of the red dashed line) correspond to two types of system behavior: (1) a highly active fluvial system with frequent changes between modes (high λ), but a strong preference for changes between modes that are morphologically close (low σ), and (2) a fluvial system with fewer changes between modes (low λ) but changes that are, on average, more extreme (high σ). In other words, the rate and dispersion of mode changes are negatively correlated. These regions likely correspond to systems that are dominated by lateral migration (frequent, more gradual, and small-scale morphological changes) and by avulsions (infrequent, abrupt, and more extreme morphological changes), respectively.

Having obtained the posterior distribution on the input parameters λ and σ of the stochastic model, we can compile a large number of synthetic realizations of the initial time series, each generated with parameters drawn from the accepted posterior distribution of λ and σ (posterior samples S ∼ $S_{\lambda,\sigma}(x, t)$).


Figure 8. Examples of model results generated with the posterior distribution f(λ, σ | ℑ) in comparison with the experiment. (a) The time series of modified Hausdorff distance between consecutive images d(t), and (b) the time series of wetted area w(t) computed on posterior samples (synthetic movies). Similar patterns of extreme morphological changes and short timescale variability in the wetted area are produced.

5. Discussion

5.1. Applicability and Use of the Stochastic Model

Above, we presented a framework to calibrate a stochastic model of braided channel movement on the basis of morphodynamic statistics derived from experimental time series of image data. For any parameter or behavior that can be characterized by the chosen statistics, the synthetic time series generated with the calibrated model can be considered as alternative realizations of the experimental data. In general, our approach can be used on any time-varying image data that map changes in the shape and position of objects, and in which the morphological changes can be characterized by small-scale variability interrupted by large and infrequent morphological changes. Examples of such data are satellite images, aerial photos, or time-lapse photos of natural and experimental geomorphological systems that are moving, deforming, eroding, or aggrading. We note that confidence in the parameters of the model (including confidence in the observed modes, the corresponding transition probabilities, and the rates of small-scale change) increases with the length of the original experimental data. Unfortunately, the length of a time series necessary to attain a certain level of confidence about process parameters is difficult to define in general. It will be dependent on the characteristic timescale for the morphological changes of the investigated process, on the chosen test statistics, and, perhaps, on the resolution of the data. A closer investigation into how the statistical parameters vary with the length of the experimental data could lead to a better understanding of the minimum length that is necessary for applying our proposed methodology.

The ability to generate, from a calibrated stochastic model, multiple synthetic time series of the evolution of a morphodynamic system can serve to address inquiries that cannot be easily addressed with a small number of observations. For example, a single time series of data on channel movements can be enough to derive distributions of the magnitude and frequency of channel movements. However, given a position of channels at some time t, predicting the probability of the position and shape of channels at time t + 1 remains difficult without a very large amount of data. In essence, large data sets of synthetic time series provide constraints on the variability of channel positions in space in addition to the temporal variability. Thus, it becomes possible to predict the location of channels at any given time with an estimate of uncertainty, and to model reworking rates of channel banks and sediment residence times as a function of location with implications for sediment dynamics (Hancock & Anderson, 2002; Martin et al., 2009; Paola et al., 2001), flooding hazards (Mutton & Haque, 2004), or weathering processes (Bradley & Tucker, 2013; Torres et al., 2017). Similar analyses of the spatiotemporal variability could be performed for time series of other processes such as bedload transport, hillslope erosion, distributions of precipitation, or the triggering of avalanches and landslides.

Our stochastic model is modular in the sense that components (such as the statistical tests) can be exchanged to build simpler or more complex models. In principle, the model can be adapted to any morphodynamic system that has, at least statistically, similar behavior to rivers in a dynamic steady state. This modularity is useful to explore and incorporate conceptual understanding of different systems. Moreover, it allows incorporating multiple alternative analyses of the same system, for example, by replacing the image quilting module by any other numerical model that creates synthetic data.

Beyond the scope of this work, the test statistics that are used to falsify the synthetic time series can be used to quantitatively compare any two morphodynamic systems, including theoretical models, laboratory models, or field observations. A possible approach to rank laboratory-scale experiments and/or theoretical models quantitatively by their similarity to a field data set could consist of (1) learning the parameters of a stochastic model using field observations and appropriate morphodynamic statistics, (2) simulating the stochastic model with Markov-Poisson sampling, and (3) comparing the synthetic and theoretical time series of images with the same morphodynamic statistics used for learning the parameters of the stochastic model. The more statistics are used, the more robust the ranking will be.

5.2. Limitations to the Simulated Variability

Our method creates variability within a series of modes that were learned with experimental data as well as a sequence of transitions between these modes. Therefore, the synthetic data can simulate small-scale variability (such as braided channel migration) as well as large-scale reorganizations of the system (such as avulsions). However, the transitions between modes are limited to the range of observed modes, and the model does not capture the possibility of large-scale changes that have not been captured in the input time series. For example, the algorithm does not allow generating avulsions to areas that have not seen any occupation by channels, and general patterns of occupation within the basin will be maintained. Including a method to generate new modes constrained by the range of observed variability of modes would be a significant step forward to predicting the behavior of systems for which limited time series data exist. Projections of extreme events beyond the range of data are frequently done, for example, by extrapolating frequency-magnitude relationships beyond the largest observed events. Equivalent approaches that include spatial extrapolation based on the known set of modes could be adopted in a stochastic model to generate new modes. With such a model, system-wide changes that are larger than the observed changes could be explored.

A similar limitation applies to modes in the data that are very different from all other modes and that the system does not exit from. Such absorbing states can cause premature termination of the synthetic time series, which means that these states can be reached earlier than in the actual experimental record, via Markov-Poisson sampling. The implications of premature terminations are many. For example, empirical variograms estimated from synthetic time series can only be compared with each other if the time series have approximately the same length, or at least a minimum common length that is greater than the correlation length of the process. Had we opted for modeling absorbing states, we would need to reject many more synthetic time series with ABC given that some of these time series would be too short, and with a negligible number of morphological changes from which to learn statistics. Moreover, absorbing states are hard to distinguish from ordinary states. When a system does not recover from a mode, it could be that the mode represents an absorbing state, or that the data series was not long enough to observe the recovery. An example of such an apparent absorbing state occurred within our data after 450 min of the experiment, when a major autogenic incision event was recorded that brought the system to a state that was very different from all of the other states. The system remained in this morphological state (or mode) until the end of the experiment, and no transition was observed out of that state. However, it is likely that with a long-enough data series we would have observed several oscillations of incision and aggradation as observed in other experiments (Kim et al., 2014). Here we decided to avoid this state, but techniques to distinguish absorbing from ordinary states and methods to include absorbing states in the stochastic model would be useful to model very large and infrequent events that are not fully captured by a data series of limited length.


6. Conclusions

In this work, we proposed a general framework to calibrate a stochastic model and generate, from a single time series of image data, a collection of alternative realizations of the time series reproducing a set of morphodynamic statistics of interest. We demonstrated the framework with an example data set containing overhead photographs of braided alluvial channels and showed that synthetic time series were statistically equivalent to the real experimental data. The equivalence was evaluated using two test statistics: the variogram of a time series of wetted area and return levels of morphological change. We observed that these synthetic time series could be grouped into systems that are dominated by infrequent large reorganizations and into systems that are characterized by frequent gradual changes. Additional test statistics should allow us to narrow the range of acceptable system behaviors further.

Future research could apply the proposed framework to multiple data sets to understand the sensitivity of the proposed statistics to controlled boundary conditions. Longer time series could also provide a better understanding of some core modeling assumptions made in this initial work, and help improve the proposed stochastic model. As an alternative research direction, the proposed framework could be adapted to quantitatively compare results of numerical models and laboratory experiments with field-scale observations.

Appendix A: Markov Models of Transition Probabilities

Given a stationary distribution π over a finite set of states, we would like to find a homogeneous Markov process, represented by a transition matrix P, that reproduces the distribution π in the long run. Because there are many such matrices, we discuss here a few interesting models.

First, we recall that any transition matrix P must satisfy the normalization constraints $0 \le P_{ij} \le 1$ and $\sum_j P_{ij} = 1$. In order to reproduce a stationary distribution π, the matrix P must additionally satisfy the invariance property πP = π. The resulting feasibility problem always admits a solution, namely the identity matrix. However, this trivial solution is not useful. In the following sections, we use the term stationarity constraints to refer to the properties just described and design objective functions to enforce nontrivial transitions.
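The stationarity constraints are easy to check numerically. The sketch below, with hypothetical helper names and a toy distribution π, confirms that the identity matrix is feasible, shows one nontrivial feasible matrix (every row equal to π), and shows that a matrix with uniform rows is a valid transition matrix but does not preserve a nonuniform π.

```python
def is_transition_matrix(P, tol=1e-9):
    """Normalization constraints: entries in [0, 1] and every row summing to one."""
    return all(
        all(-tol <= p <= 1 + tol for p in row) and abs(sum(row) - 1) <= tol
        for row in P
    )

def preserves(pi, P, tol=1e-9):
    """Invariance property pi P = pi, checked column by column."""
    m = len(pi)
    return all(abs(sum(pi[i] * P[i][j] for i in range(m)) - pi[j]) <= tol for j in range(m))

pi = [0.5, 0.3, 0.2]                              # toy stationary distribution
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rows_equal_pi = [pi[:] for _ in pi]               # every row is pi itself
uniform_rows = [[1 / 3] * 3 for _ in pi]          # every row is uniform

checks = {
    "identity": is_transition_matrix(identity) and preserves(pi, identity),
    "rows_equal_pi": is_transition_matrix(rows_equal_pi) and preserves(pi, rows_equal_pi),
    "uniform_rows": is_transition_matrix(uniform_rows) and preserves(pi, uniform_rows),
}
```

Since several distinct matrices pass both checks, an objective function is needed to single one out, which is what the two models below provide.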

A1. Maximum-Entropy Transitions

Among the transition matrices P satisfying the stationarity constraints, we select the matrix $P_e$ that maximizes the entropy $H_i(P_{i\cdot}) = -\sum_j P_{ij} \log P_{ij}$ of each conditional distribution $P_{i\cdot} = \Pr(M \mid M_i)$. Equivalently, we seek the matrix $P_e$ that maximizes the total transition entropy $H(P) = \sum_i H_i(P_{i\cdot}) = -\sum_i \sum_j P_{ij} \log P_{ij}$ as follows:

$$\text{Find } P_e = \arg\max_P H(P) \quad \text{subject to stationarity constraints}$$

Maximizing this concave objective function corresponds to the extreme case in which transition probabilities between states/modes are similar. The problem can be solved with standard optimization software.
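As a hedged aside, the first-order conditions indicate the form of the maximizer. The derivation below assumes that all entries of the optimum are strictly positive, so that the bound constraints are inactive, and introduces Lagrange multipliers $\mu_i$ (row normalization) and $\nu_j$ (invariance); these symbols are not part of the original text.

```latex
\begin{aligned}
\mathcal{L}(P,\mu,\nu) &= -\sum_{i}\sum_{j} P_{ij}\log P_{ij}
  + \sum_i \mu_i\Bigl(1-\sum_j P_{ij}\Bigr)
  + \sum_j \nu_j\Bigl(\pi_j-\sum_i \pi_i P_{ij}\Bigr),\\
\frac{\partial\mathcal{L}}{\partial P_{ij}} &= -\log P_{ij} - 1 - \mu_i - \pi_i\nu_j = 0
\;\Longrightarrow\;
P_{ij} = \frac{\exp(-\pi_i\nu_j)}{\sum_{k}\exp(-\pi_i\nu_k)} .
\end{aligned}
```

The multipliers $\nu_j$ still have to be found numerically from πP = π. In the special case of a uniform stationary distribution, constant $\nu_j$ satisfies the constraints and every transition probability equals 1/m, consistent with the statement that maximum entropy makes the transition probabilities similar.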

A2. Distance-Based Transitions

The previous model of transition probabilities has a major limitation; namely, it does not take into consideration the distances between the states/modes. We design an alternative objective function based on the pairwise Euclidean distances $D \in \mathbb{R}^{m \times m}$ between the modes in t-SNE space and parameterize it with a dispersion parameter σ > 0 as follows.

We introduce a reference transition matrix Q = KE with $E = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{D^2}{\sigma^2}\right)$ the exponentiation of the distances taken entrywise, π = 3.1415… (unfortunately the same symbol is widely used for the stationary distribution), and with the diagonal normalization matrix $K_{ii} = \left(\sum_j E_{ij}\right)^{-1}$. The optimal transition matrix is then obtained by minimizing the Kullback-Leibler divergence $D_{KL}(P_{i\cdot} \,\|\, Q_{i\cdot}) = \sum_j P_{ij} \log \frac{P_{ij}}{Q_{ij}}$ between the conditional distributions $P_{i\cdot}$ and $Q_{i\cdot}$ for all i = 1, 2, …, m:

$$\text{Find } P_\sigma = \arg\min_P \sum_i D_{KL}(P_{i\cdot} \,\|\, Q_{i\cdot}) \quad \text{subject to stationarity constraints}$$

In this convex optimization problem, the parameter σ > 0 controls the entropy of the system. For small σ, the transitions are allowed only between states of similar morphology. For large σ, more extreme morphological changes are allowed. This interpretation is based on the following theoretical result.

The maximum-entropy transitions $P_e$ are recovered in the limit $\lim_{\sigma \to \infty} P_\sigma = P_e$.


Figure A1. Distance-based transition probabilities converging to maximum-entropy transition probabilities. The norm of the difference vanishes with increasing dispersion σ > 0.

To prove this result, we recall that the entropy of a discrete random variable X with probability mass function $p : \Omega \to [0, 1]$ can be written as follows:

$$\begin{aligned}
H(X) &= -\sum_{x \in \Omega} p(x) \log p(x) \\
&= \sum_{x \in \Omega} p(x) \log \frac{1}{p(x)} \\
&= \log|\Omega| - \sum_{x \in \Omega} p(x) \log \frac{p(x)}{1/|\Omega|} \\
&= \log|\Omega| - D_{KL}\left(p \,\Big\|\, \frac{1}{|\Omega|}\right)
\end{aligned}$$

Maximizing entropy H(X) is therefore equivalent to minimizing the relative entropy $D_{KL}\left(p \,\|\, \frac{1}{|\Omega|}\right)$ with the uniform distribution $1/|\Omega|$. The proof is concluded by observing that $\lim_{\sigma \to \infty} Q_{i\cdot} = \frac{1}{|\Omega|}$ for all i = 1, 2, …, m.
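The identity between entropy and relative entropy to the uniform distribution can be verified numerically. The probability vector below is an arbitrary example, and the helper names are assumptions of this sketch.

```python
import math

def entropy(p):
    """Shannon entropy with natural logarithm; zero-probability terms contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) for distributions on the same finite set."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

p = [0.5, 0.25, 0.125, 0.125]             # arbitrary example distribution
uniform = [1 / len(p)] * len(p)           # uniform distribution on the same set
lhs = entropy(p)
rhs = math.log(len(p)) - kl_divergence(p, uniform)
```

Both sides agree up to floating-point rounding, so a distribution that moves closer to uniform in the Kullback-Leibler sense gains exactly that much entropy.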

Alternatively, we illustrate how the norm of the difference $\|P_\sigma - P_e\|$ vanishes as a function of dispersion in Figure A1.

In the paper, we focus on the distance-based transition model $P_\sigma$ given that the maximum-entropy model $P_e$ can be obtained by making σ arbitrarily large.
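The role of σ in the reference matrix Q = KE of section A2 can be sketched as follows. The code builds only Q; obtaining $P_\sigma$ additionally requires solving the constrained Kullback-Leibler problem, which is not attempted here. The distance matrix, the σ values, and the function name are assumptions of this example.

```python
import math

def reference_matrix(D, sigma):
    """Row-normalized Gaussian kernel of the pairwise distances: Q = K E."""
    E = [[math.exp(-0.5 * d * d / sigma ** 2) / math.sqrt(2 * math.pi * sigma ** 2)
          for d in row] for row in D]
    # K is diagonal with K_ii = 1 / sum_j E_ij, so each row of E is divided by its sum.
    return [[e / sum(row) for e in row] for row in E]

# Hypothetical pairwise distances between three modes.
D = [[0.0, 1.0, 4.0],
     [1.0, 0.0, 3.0],
     [4.0, 3.0, 0.0]]
Q_small = reference_matrix(D, sigma=0.1)     # nearly the identity: modes rarely change
Q_large = reference_matrix(D, sigma=1000.0)  # nearly uniform rows: any change is allowed
```

The Gaussian prefactor cancels in the row normalization, so only the ratio of distance to σ matters: small σ concentrates each row on the mode itself and its nearest neighbors, while large σ drives every row toward the uniform distribution used in the proof above.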

References

Ancey, C., Davison, A. C., BΓΆhm, T., Jodeau, M., & Frey, P. (2008). Entrainment and motion of coarse particles in a shallow water streamdown a steep slope. Journal of Fluid Mechanics, 595, 83–114. https://doi.org/10.1017/S0022112007008774

Arpat, G. B., & Caers, J. (2007). Conditional simulation with patterns. Mathematical Geology, 39(2), 177–203. https://doi.org/10.1007/s11004-006-9075-3

Beirlant, J. (2004). Statistics of extremes: Theory and applications. Hoboken, NJ: Wiley.Bishop, A. (1959). The principle of effective stress.Bradley, D. N., & Tucker, G. E. (2013). The storage time, age, and erosion hazard of laterally accreted sediment on the floodplain of a

simulated meandering river. Journal of Geophysical Research: Earth Surface, 118, 1308–1319. https://doi.org/10.1002/jgrf.20083Braun, J., Heimsath, A. M., & Chappell, J. (2001). Sediment transport mechanisms on soil-mantled hillslopes. Geology, 29, 683–686. https://

doi.org/10.1130/0091-7613(2001)029<0683:STMOSM>2.0.CO;2Bryant, M., Falk, P., & Paola, C. (1995). Experimental study of avulsion frequency and rate of deposition. Geology, 23, 365–368.

https://doi.org/10.1130/0091-7613(1995)023<0365:ESOAFA>2.3.CO;2Bufe, A., Paola, C., & Burbank, D. W. (2016). Fluvial bevelling of topography controlled by lateral channel mobility and uplift rate. Nature

Geoscience, 9(9), 706–710. https://doi.org/10.1038/ngeo2773Bufe, A., Turowski, J. M., Burbank, D. W., Paola, C., Wickert, A. D., & Tofelde, S. (2019). Controls on the lateral channel-migration rate

of braided channel systems in coarse non-cohesive sediment. Earth Surface Processes and Landforms, 44, 2823–2836. https://doi.org/10.1002/esp.4710

Chiles, J.-P., & Delfiner, P. (1999). Geostatistics: Modeling spatial uncertainty. Wiley series in probability and statistics. Applied probabilityand statistics section https://doi.org/10.1007/s11004-012-9429-y

Coles, S. (2001). An introduction to statistical modeling of extreme values. London New York: Springer.Convertino, M., Troccoli, A., & Catani, F. (2013). Detecting fingerprints of landslide drivers: A MaxEnt model. Journal of Geophysical

Research: Earth Surface, 118, 1367–1386. https://doi.org/10.1002/jgrf.20099

AcknowledgmentsThis work was supported byCoordenação de Aperfeicoamento dePessoal de NΓ­vel Superior (CAPES) andby the Department of EnergyResources Engineering at StanfordUniversity. The data used in this workare available as β€œRun 7” in Bufe et al.(2016).


Cox, D., & Isham, V. (1980). Point processes (Chapman & Hall/CRC Monographs on Statistics & Applied Probability): Chapman and Hall/CRC. ISBN 9780412219108

Crouzy, B., Forclaz, R., Sovilla, B., Corripio, J., & Perona, P. (2015). Quantifying snowfall and avalanche release synchronization: A case study. Journal of Geophysical Research: Earth Surface, 120, 183–199. https://doi.org/10.1002/2014JF003258

Dalrymple, T. (1960). Flood-frequency analyses manual of hydrology: Part 3 (Tech. rep.): US Geological Survey.

Dubuisson, M.-P., & Jain, A. K. (1994). A modified Hausdorff distance for object matching. Proceedings of 12th International Conference on Pattern Recognition, 1(1), 566–568. https://doi.org/10.1109/ICPR.1994.576361

Esposito, C. R., Di Leonardo, D., Harlan, M., & Straub, K. M. (2018). Sediment storage partitioning in alluvial stratigraphy: The influence of discharge variability. Journal of Sedimentary Research, 88, 717–726. https://doi.org/10.2110/jsr.2018.36

Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise (pp. 226–231): AAAI Press.

Furbish, D. J., Haff, P. K., Roseberry, J. C., & Schmeeckle, M. W. (2012). A probabilistic description of the bed load sediment flux: 1. Theory. Journal of Geophysical Research, 117, F03031. https://doi.org/10.1029/2012JF002352

Furbish, D. J., Roering, J. J., Almond, P., & Doane, T. H. (2018). Soil particle transport and mixing near a hillslope crest: 1. Particle ages and residence times. Journal of Geophysical Research: Earth Surface, 123, 1052–1077. https://doi.org/10.1029/2017JF004315

Ganti, V., Chadwick, A. J., Hassenruck-Gudipati, H. J., & Lamb, M. P. (2016). Avulsion cycles and their stratigraphic signature on an experimental backwater-controlled delta. Journal of Geophysical Research: Earth Surface, 121, 1651–1675. https://doi.org/10.1002/2016JF003915

Ganti, V., Straub, K. M., Foufoula-Georgiou, E., & Paola, C. (2011). Space-time dynamics of depositional systems: Experimental evidence and theoretical modeling of heavy-tailed statistics. Journal of Geophysical Research, 116, F02011. https://doi.org/10.1029/2010jf001893

Gutenberg, B., & Richter, C. F. (1944). Frequency of earthquakes in California. Bulletin of the Seismological Society of America, 34(4), 185–188.

Hancock, G. S., & Anderson, R. S. (2002). Numerical modeling of fluvial strath-terrace formation in response to oscillating climate. GSA Bulletin, 114(9), 1131. https://doi.org/10.1130/0016-7606(2002)114<1131:NMOFST>2.0.CO;2

Hoffimann, J., Scheidt, C., Barfod, A., & Caers, J. (2017). Stochastic simulation by image quilting of process-based geological models. Computers & Geosciences, 106, 18–32. https://doi.org/10.1016/j.cageo.2017.05.012

Hovius, N., Stark, C. P., & Allen, P. A. (1997). Sediment flux from a mountain belt derived by landslide mapping. Geology, 25, 231–234. https://doi.org/10.1130/0091-7613(1997)025<0231:SFFAMB>2.3.CO;2

Howard, A. D., & Kerby, G. (1983). Channel changes in badlands. Geological Society of America Bulletin, 94, 739–752. https://doi.org/10.1130/0016-7606(1983)94<739:CCIB>2.0.CO;2

Howard, A. D., & Knutson, T. R. (1984). Sufficient conditions for river meandering: A simulation approach. Water Resources Research, 20, 1659–1667. https://doi.org/10.1029/WR020i011p01659

Huttenlocher, D. P., Klanderman, G. A., & Rucklidge, W. J. (1993). Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9), 850–863. https://doi.org/10.1109/34.232073

Iverson, R. M. (2000). Landslide triggering by rain infiltration. Water Resources Research, 36, 1897–1910. https://doi.org/10.1029/2000WR900090

Jerolmack, D. J., & Mohrig, D. (2007). Conditions for branching in depositional rivers. Geology, 35(5), 463. https://doi.org/10.1130/G23308A.1

Kim, W., Petter, A., Straub, K., & Mohrig, D. (2014). Investigating the autogenic process response to allogenic forcing, pp. 127–138. https://doi.org/10.1002/9781118920435.ch5

Kirby, E., & Whipple, K. X. (2012). Expression of active tectonics in erosional landscapes. Journal of Structural Geology, 44, 54–75. https://doi.org/10.1016/j.jsg.2012.07.009

Liang, M., Van Dyk, C., & Passalacqua, P. (2016). Quantifying the patterns and dynamics of river deltas under conditions of steady forcing and relative sea level rise. Journal of Geophysical Research: Earth Surface, 121, 465–496. https://doi.org/10.1002/2015JF003653

Liang, M., Voller, V. R., & Paola, C. (2015). A reduced-complexity model for river delta formation – Part 1: Modeling deltas with channel dynamics. Earth Surface Dynamics, 285, 54–75. https://doi.org/10.5194/esurf-3-67-2015

Lopez, S. (2003). Channelized reservoir modeling: A stochastic process-based approach (Theses), École Nationale Supérieure des Mines de Paris.

Ma, H., Heyman, J., Fu, X., Mettra, F., Ancey, C., & Parker, G. (2014). Bed load transport over a broad range of timescales: Determination of three regimes of fluctuations. Journal of Geophysical Research: Earth Surface, 119, 2653–2673. https://doi.org/10.1002/2014JF003308

Mariethoz, G., Renard, P., & Straubhaar, J. (2010). The direct sampling method to perform multiple-point geostatistical simulations. Water Resources Research, 46, W11536. https://doi.org/10.1029/2008WR007621

Marin, J.-M., Pudlo, P., Robert, C. P., & Ryder, R. J. (2012). Approximate Bayesian computational methods. Statistics and Computing, 22(6), 1167–1180. https://doi.org/10.1007/s11222-011-9288-2

Martin, J., Sheets, B., Paola, C., & Hoyal, D. (2009). Influence of steady base-level rise on channel mobility, shoreline migration, and scaling properties of a cohesive experimental delta. Journal of Geophysical Research, 114, F03017. https://doi.org/10.1029/2008JF001142

Meyer-Peter, E., & Müller, R. (1948). Formulas for bed-load transport. In International Association for Hydraulic Structures Research.

Moon, S., Shelef, E., & Hilley, G. E. (2015). Recent topographic evolution and erosion of the deglaciated Washington Cascades inferred from a stochastic landscape evolution model. Journal of Geophysical Research: Earth Surface, 120, 856–876. https://doi.org/10.1002/2014JF003387

Mutton, D., & Haque, C. E. (2004). Human vulnerability, dislocation and resettlement: Adaptation processes of river-bank erosion-induced displacees in Bangladesh. Disasters, 28(1), 41–62. https://doi.org/10.1111/j.0361-3666.2004.00242.x

Paola, C., Mullin, J., Ellis, C., Mohrig, D., Swenson, J., Parker, G., et al. (2001). Experimental stratigraphy. GSA Today, 11(7), 4–9.

Paola, C., Straub, K., Mohrig, D., & Reinhardt, L. (2009). The “unreasonable effectiveness” of stratigraphic and geomorphic experiments. Earth-Science Reviews, 97(1-4), 1–43. https://doi.org/10.1016/j.earscirev.2009.05.003

Parker, G. (1979). Hydraulic geometry of active gravel rivers. Journal of the Hydraulics Division-ASCE, 105, 1185–1201.

Pirot, G., Straubhaar, J., & Renard, P. (2014). Simulation of braided river elevation model time series with multiple-point statistics. Geomorphology, 214, 148–156. https://doi.org/10.1016/j.geomorph.2014.01.022

Roering, J. J., Kirchner, J. W., & Dietrich, W. E. (1999). Evidence for nonlinear, diffusive sediment transport on hillslopes and implications for landscape morphology. Water Resources Research, 35, 853–870. https://doi.org/10.1029/1998WR900090

Rudin, W. (1991). Functional analysis. McGraw-Hill Higher Education.

Scheidt, C., Fernandes, A. M., Paola, C., & Caers, J. (2016). Quantifying natural delta variability using a multiple-point geostatistics prior uncertainty model. Journal of Geophysical Research: Earth Surface, 121, 1800–1818. https://doi.org/10.1002/2016JF003922


Sklar, L. S., & Dietrich, W. E. (2004). A mechanistic model for river incision into bedrock by saltating bed load. Water Resources Research, 40, W06301. https://doi.org/10.1029/2003WR002496

Straub, K. M., & Wang, Y. (2013). Influence of water and sediment supply on the long-term evolution of alluvial fans and deltas: Statistical characterization of basin-filling sedimentation patterns. Journal of Geophysical Research: Earth Surface, 118, 1602–1616. https://doi.org/10.1002/jgrf.20095

Strebelle, S. (2002). Conditional simulation of complex geological structures using multiple-point statistics. Mathematical Geology, 34(1), 1–21. https://doi.org/10.1023/A:1014009426274

Suppe, J. (1983). Geometry and kinematics of fault-bend folding. American Journal of Science, 283, 684–721. https://doi.org/10.2475/ajs.283.7.684

Tahmasebi, P., Sahimi, M., & Caers, J. (2014). MS-CCSIM: Accelerating pattern-based geostatistical simulation of categorical variables using a multi-scale search in Fourier space. Computers and Geosciences, 67, 75–88. https://doi.org/10.1016/j.cageo.2014.03.009

Tarantola, A. (2001). Logarithmic parameters.

Tarantola, A. (2006). Popper, Bayes and the inverse problem (Vol. 2, pp. 492–494). https://doi.org/10.1038/nphys375

Torres, M. A., Limaye, A. B., Ganti, V., Lamb, M. P., West, A. J., & Fischer, W. W. (2017). Model predictions of long-lived storage of organic carbon in river deposits. Earth Surface Dynamics, 5(4), 711–730. https://doi.org/10.5194/esurf-5-711-2017

Tucker, G. E., & Hancock, G. R. (2010). Modelling landscape evolution. Earth Surface Processes and Landforms, 35(1), 28–50. https://doi.org/10.1002/esp.1952

Van Der Vaart, E., Prangle, D., & Sibly, R. M. (2018). Taking error into account when fitting models using Approximate Bayesian Computation. Ecological Applications, 28, 267–274. https://doi.org/10.1002/eap.1656

van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.

Wickert, A. D., Martin, J. M., Tal, M., Kim, W., Sheets, B., & Paola, C. (2013). River channel lateral mobility: Metrics, time scales, and controls. Journal of Geophysical Research: Earth Surface, 118, 396–412. https://doi.org/10.1029/2012jf002386

Wilkinson, R. D. (2013). Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Statistical Applications in Genetics and Molecular Biology, 12, 129–141. https://doi.org/10.1515/sagmb-2013-0010

Yang, L., Hou, W., Cui, C., & Cui, J. (2016). GOSIM: A multi-scale iterative multiple-point statistics algorithm with global optimization. Computers and Geosciences, 89, 57–70. https://doi.org/10.1016/j.cageo.2015.12.020

Zhang, T., Switzer, P., & Journel, A. (2006). Filter-based classification of training image patterns for spatial simulation. Mathematical Geology, 38(1), 63–80. https://doi.org/10.1007/s11004-005-9004-x
