
Locality, Statefulness, and Causality in Distributed Information Systems

Concerning the Scale Dependence Of System Promises

Mark Burgess

Aljabr Inc. / ChiTek-i AS

Abstract—Several popular best-practice manifestos for IT design and architecture use terms like ‘stateful’, ‘stateless’, ‘shared nothing’, etc., and describe ‘fact based’ or ‘functional’ descriptions of causal evolution to describe computer processes, especially in cloud computing. The concepts are used ambiguously and sometimes in contradictory ways, which has led to many imprecise beliefs about their implications. This paper outlines the simple view of state and causation in Promise Theory, which accounts for the scaling of processes and the relativity of different observers in a natural way. It’s shown that the concepts of statefulness or statelessness are artifacts of observational scale and causal bias towards functional evaluation. If we include feedback loops, recursion, and process convergence, which appear acausal to external observers, the arguments about (im)mutable state need to be modified in a scale-dependent way. In most cases the intended focus of such remarks is not terms like ‘statelessness’ but process predictability. A simple principle may be substituted in most cases as a guide to system design: the principle of the separation of dynamic scales.

Understanding data reliance and the ability to keep stable promises is of crucial importance to the consistency of data pipelines, and distributed client-server interactions, albeit in different ways. With increasingly data intensive processes over widely separated distributed deployments, e.g. in the Internet of Things and AI applications, the effects of instability need a more careful treatment.

These notes are part of an initiative to engage with thinkers and practitioners towards a more rational and disciplined language for systems engineering for the era of ubiquitous extended-cloud computing.

CONTENTS

I Introduction
II Notation and definitions
III Informal ideas about state and causality
  III-A The role of scale
  III-B Popular ideas
  III-C Process history, entropy, and timescales
  III-D The point of usage
  III-E The importance of forgetting—indistinguishability
  III-F Summary: proper invariants
IV Locality and distinguishability
  IV-A Localization or spatial partitioning
  IV-B Scaling of state
  IV-C Sequences or temporal partitioning
  IV-D Distinguishability, partitions, and redundancy
  IV-E Invariance
  IV-F Sharing versus partitioning
V States, events, and messages
  V-A Propagation of influence by state messages
  V-B Scaling of local state
  V-C State localization at different scales
VI Stateful (memory) processes
  VI-A Short memory processes: linearity
  VI-B Long memory processes
  VI-C Invariant definitions of stateless
  VI-D Transactions on scale T
  VI-E Scale dependence of state and causality
VII Causality and event driven propagation
  VII-A Past, present, and now
  VII-B Facts, messages, and event horizons
  VII-C Repeatability and fixed points

arXiv:1909.09357v1 [cs.DC] 20 Sep 2019

  VII-D Blocking and non-blocking promises
  VII-E Synchronous and asynchronous signals
VIII Fault propagation
  VIII-A Downstream principle
IX Applications
  IX-A Data pipelines in the extended cloud (IoT)
  IX-B Microservices
  IX-C Manifesto promises
X Summary
Appendix
References

I. INTRODUCTION

Scaling and reliability of functional systems is a popular topic, not least for distributed systems. In Software Engineering, slogans, manifestos, and best practice frameworks dominate this discussion, and the academic work on the subject is sparse and has not kept up with technology. Amongst those slogans, terms like ‘stateless architecture’ and ‘immutable infrastructure’ have come to be used to describe a design pattern for software processes [1], and it is even advocated by some influencers as a principle, especially in cloud computing¹. This is not without controversy [2], [3], and the lack of agreement about what it means may well be due to its casual usage: what does ‘state’ refer to (state of what), and what exactly is the application versus its platform infrastructure? A quick search reveals that there are various definitions of statelessness, all informal, and usually entangled with specific use-cases. The problems therefore begin when a usage in one domain spills across into another, bringing confusion. Counter-proposals involve their own terms and slogans, with about the same level of rigour, and eventually these become quasi-religious convictions rather than rational strategies.

This note is an attempt to disentangle some of these ideas and outline a reference model that could outlast more than a single generation of technologies and practices. Some of the issues have been discussed before in [4]. The concise summary is that the terms ‘stateless’, ‘immutable’, and so on, are largely scapegoats for a number of other concepts that fall under the headings of ‘dependency’, ‘reliability’, and ‘fault propagation’. However, I believe that the concepts of locality, state, and causality are the actual essential

¹The terminology is part of a pattern of ‘responsibility free’ computing that now includes ‘serverless’ as a pattern for consuming a resource without responsibility for its underlying dependencies.

ingredients to understand. A few authors have attempted to explain parts of these issues in the past, but invariably partially in the course of advocating a particular recommended practice, so the audience is left with an incomplete understanding.

The outline of the paper is as follows: in section III I review a few rhetorical statements in the popular literature to set up the context for the discussion. I explain how no process can be truly stateless, so we need to understand what authors mean when they use ‘stateless’ in a rhetorical sense. I explain how the observability of state is scale dependent, and determines boundaries whose partitioning changes the semantics of promises at different scales. The important scale is the one at which an observer assesses the system (this is often a ‘client’ in a computing setting). In section VII I discuss the implications of state in causal determinism, as this is mixed into what authors are trying to explain. I extend the usual ‘past causes future’ functional view of causality to include concurrent, asynchronous, and feedback processes, convergent semantics, and desired end states, which are classically acausal on the microscopic scale. Finally, to complete the popular manifestos, I briefly talk about continuity and reproducibility (replaying congruent causal sequences) and the implications of partitioning (modularity) strategy. Fault domains are a common idea, but often argued incorrectly. I try to restate some of the popular claims to make them more formally correct, and explain why their original statement is flawed.

Given the scope of the audience, and the importance of reaching as many readers as possible, my goal is to err on the side of pedagogy and keep the paper as non-technical as possible—without devolving into unjustified opinionation. I shall try to provide just enough justification within the semi-formal language of promises.

II. NOTATION AND DEFINITIONS

In line with previous work [5]–[7], I’ll use the language of Promise Theory [8] to describe system interactions at a high level. In a promise theoretic model, any system is a collection of agents. Usually, agents will be active processes. Agents represent internalized processes that can make and keep generalized promises to one another [8].

The generic label for agents in Promise Theory is Ai, where Latin subscripts i, j, k, . . . number distinguishable agents for convenience (these effectively become coordinates for the agents). We shall often use the symbols Si and Rj, instead, for agents to emphasize their roles as source (initiator) and receiver (reactive). So the schematic flow of reasoning is:

1) S offers (+ promises) data.
2) R accepts (− promises) or rejects the data, either in full or in part.
3) R observes and forms an assessment αR(.) of what it receives.
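As an illustrative aside, the three-step flow above might be sketched in ordinary Python. This is not part of the Promise Theory formalism in [8]; all class and method names here are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    data: str
    def offer(self):
        # 1) S offers (+) the data; an offer alone transfers nothing.
        return self.data

@dataclass
class Receiver:
    accepted: list = field(default_factory=list)
    def accept(self, offered, take=True):
        # 2) R accepts (-) or rejects the offer, in full or in part.
        if take:
            self.accepted.append(offered)
    def assess(self):
        # 3) R forms its own assessment alpha_R(.) of what it received.
        return {"received": len(self.accepted)}

S = Source("v1")
R = Receiver()
R.accept(S.offer())
assert R.assess() == {"received": 1}
```

The point of the sketch is that transfer requires both promises: nothing reaches R’s assessment unless R’s accept (−) promise is kept, regardless of what S offers.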


III. INFORMAL IDEAS ABOUT STATE AND CAUSALITY

The meaning of state can be pursued on many levels. Suffice it to say that no decision process or computation can proceed without an interior dependence on some kind of state [9]. State basically refers to any information that characterizes a process, over some interior timescale, and may be stored anywhere within a hierarchy of agents and subagents that characterize the process. For example, a clock is a process that maintains an interior state counter.

The state concept therefore spans the full pantheon of memory storage, from what is kept in the registers of chips, to configuration files, source code, or to long term databases—but no two authors will necessarily agree on which states are the relevant ones to their arguments, or why they choose to treat one kind of state differently to another.

A. The role of scale

Scale plays a role in localizing state. Data may be localized to a geographical region, a datacentre, a host, a container, a function, or even a register. Some authors play the game of offloading state from one location to another in order to claim statelessness; but that isn’t a scale invariant characteristic.

I shall argue that genuine characteristics of a system are those that can be described as invariant properties—i.e. ideas that are not demolished by a simple change of perspective. In a virtualized world of cloud computing, the meaning of being ‘within’ a process, entity, or agent is ambiguous, because we may draw the system boundary almost anywhere to focus on specific issues or to capture extent—and, across the many articles written about statelessness, there is little agreement about what storage level one should be talking about.

One author may call a process ‘stateless’ or ‘immutable’, meaning that all decisions except unavoidable input-output should be based on state that is frozen and held invariant before the specific execution of the process (see figure 1). The scale-dependence of state management, over space and time, was also the origin of the so-called ‘configuration management wars’ [10]–[14]. Another reader may consider prepackaged frozen configuration choices to belong to a phase of the processing itself, on a larger timescale, and the lifetime of a single process is further part of a longer meta-process, involving many clients, in which continuous delivery of upgrades to changes of dependencies are interleaved with the keeping of client promises. Authors thus cherry-pick the meaning of state to suit their arguments.

It’s especially important to revisit the topic of state in cloud computing, where some definitions concerning locality need to be reconsidered in a scale invariant way; cloud processes often scale elastically to some extent. Moreover, virtualization adds layers to its meaning: from source code, configuration, container packaging, runtime environment, virtual machine, physical host, etc. Developers constantly jump

[Figure 1: diagram of a process timeline, with labelled elements: start, code, configuration, runtime state, input/output, end.]

Fig. 1: Processes exhibit state on all manner of timescales. Some state is frozen into the initial state of the packaging, some is allowed to change. The main question is, over what timescale (or part of the process) does the state remain invariant?

between the concerns of different levels: from programming, to continuous delivery, ‘DevOps’, configuration management, serverless, etc.

Various perspectives on these issues have been expressed over the years [15]–[17]. The Twelve Factor App [1] is a widely cited best-practice manifesto, which advocates that developers should execute applications as ‘one or more stateless processes’, and that such apps ought to have a ‘share nothing’ architecture to avoid contention. Recommendations then go on to explain how necessary state can still be kept, after all, by employing ‘backing services’, and how caching of certain objects is ‘allowed’. This is confusing to say the least. Processes should be stateless only when we say so? There should be a simple invariant principle.

Nevertheless, there is a hint of a suggestion in the principle of favouring transactional rather than continuous processing, for a particular scale and meaning of ‘transaction’. So-called ‘sticky sessions’ that tie multiple web transactions to a particular server context and location are explicitly rejected in [1]. However, if one takes an extended session to mean a ‘complete’ dialogue over a business process, including reliable TCP and TLS negotiations, etc., then it’s no longer clear that ‘stateless’, as implied, has an unambiguous meaning.

Some platforms, like Kubernetes [18], have been designed with a notion of statelessness in mind, but later extended their models to include state. Newer additions, such as ‘service mesh’ and sidecars, act as state managers on behalf of processes and propagate state to reintegrate weakly coupled systems in a stronger manner. This suggests that the absence of state itself is not the real problem the guidelines are clawing at, but that the rejection of what is perceived as stateful behaviour is really an attempt to address concerns about localization (scale), speed (timescales), and fault tolerance (spread prevention). All of this needs to be scaled to cope with the extended cloud of ubiquitous embedded devices.


B. Popular ideas

A quick online search and query reveals a number of definitions about statelessness, which point a finger at state but discuss reliability. These definitions are generally tied to a single case, and generalize only by implication². For example:

‘When an application is stateless, the server does not store any state about the client session. Instead, the session data is stored on the client and passed to the server as needed.’

This definition refers to client and server roles in a two-agent interaction. Similarly:

‘A stateful service keeps state ‘between the connections’ or session interactions, whereas a stateless service does not.’

In this case ‘between the connections’ is intended to mean that access to a service is transactional and that some data persist between independent transactions when a process is stateful. The preoccupation with connection indicates the author assumes a client-server style application. Wikipedia talks only about stateless applications, which it defines to mean:

‘that no session information is retained by the receiver’

The substitution of ‘connections’ for ‘transaction’ belies a focus on client-server computing at a particular scale, and the assumption that single exchanges are safely invariant while longer exchanges are not. That would depend on the extent to which the composition of exchanges were ‘locked’ (e.g. mutex locks) and the data could go missing in case of interruption. The excerpt also distinguishes the roles of sender and receiver as part of the concept, as for a protocol, implying a directional arrow from client to server. In general developers tend to focus their thinking on the preferred scale of the subtask they are working on, even as ideas about DevOps and Continuous Delivery ask them to rethink those ideas on a larger scale, for development continuity.
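The client-server distinction in the quoted definitions can be made concrete with a minimal sketch (invented names, not drawn from any of the quoted sources): a ‘stateful’ server retains session data between requests, while a ‘stateless’ server requires the client to resubmit its context with every request.

```python
class StatefulServer:
    """Keeps session state between requests (the 'stateful' case)."""
    def __init__(self):
        self.sessions = {}              # session state lives on the server

    def handle(self, session_id, increment):
        total = self.sessions.get(session_id, 0) + increment
        self.sessions[session_id] = total
        return total

class StatelessServer:
    """Retains nothing; all context must arrive with each request."""
    def handle(self, request):
        return request["total_so_far"] + request["increment"]

stateful = StatefulServer()
assert stateful.handle("abc", 5) == 5
assert stateful.handle("abc", 3) == 8    # result depends on retained state

stateless = StatelessServer()
# The client must carry its accumulated state in every request.
assert stateless.handle({"total_so_far": 5, "increment": 3}) == 8
```

Note that the state has not vanished in the second case; it has merely been relocated to the client, which is precisely the scale-dependent offloading discussed above.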

‘Think of stateless as if a service is a hardware chip. All computation needs short term storage like registers and stack and maybe heap. What happens when we lose power? A service that calculates some value and returns a result can be considered purely stateless. Purely stateful would be a service that maintains state like a game server tracking scores and players in a game world.’

In this view, scale plays a key role. A short lifetime for data (as measured in the proper time of the process³, rather than wall-clock time) means stateless, while persistent and reusable means stateful. As I’ll show later, this view of stateless approximates the idea of memoryless systems (section III-E), and very

²I choose not to cite the ephemeral sources for these ‘quotes’ as they are easily found, sometimes paraphrased, and pulled out of longer discussions. I hope readers agree that they are representative of the state of thinking.

³For a definition of proper time, see [5].

long term data that are ‘invariant’ over the effective lifetime of the process can be separated out and treated differently (see section IV-E). A refinement of this:

‘Stateful means written to localhost.

Semi-stateful means 1:1 write over the network.

Semi-stateless means n+1 networks relay eventual write.

Stateless means held in memory until dereferenced.’

The scale dependence becomes more evident here. The reference to ‘in memory’ suggests a short lived, once-only usage of state, versus persistent storage again. The reference to networking is less clear: which network are we referring to? The interior host bus is a network alongside the LAN/WAN. In the past, the preference for the processor bus was about relative speeds: interior communication was much faster than LAN/WAN communication. Today, it is impossible to know whether interior host bus or exterior LAN/WAN connection will be faster. The goal of avoiding an architecture that relies on a network connection, to disk storage, or to a remote service, therefore doesn’t stand scrutiny in the cloud era, as even in-memory process state might be paged out to disk, or retained in a hash table for extended usage. With this in mind, where exactly is the imagined line between runtime state and persistent state in the architecture?

C. Process history, entropy, and timescales

Lamport was probably the first author to appreciate the relativity of time in computer science, as a succession of causally ordered events [19]. The transmission of messages, carrying causal influence, plays a central role in understanding what happens in computation, both locally and in a distributed system. Processes that depend only on a current local register set, i.e. not on the recent past or the extended history of all such sets, are called path independent or memoryless (see appendix)⁴. This concept will be most useful to explaining what authors are trying to express in ‘stateless’.
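The distinction between path independent and path dependent behaviour can be sketched in a few lines of Python (an illustration only; the names are invented, and the ‘prediction’ is deliberately trivial):

```python
def memoryless_step(x):
    # Path independent: the output is a fixed function of the
    # current input alone, with no recollection of earlier calls.
    return 2 * x + 1

class PredictiveProcess:
    """Keeps a history of past samples, so it is not memoryless."""
    def __init__(self):
        self.history = []

    def step(self, x):
        self.history.append(x)
        # A trivial stand-in for prediction: the mean of all samples so far.
        return sum(self.history) / len(self.history)

# Same input, same output, always:
assert memoryless_step(3) == memoryless_step(3) == 7

# The predictive process gives different answers to the same input,
# because its output depends on the path taken to reach it:
p = PredictiveProcess()
assert p.step(10) == 10.0
assert p.step(20) == 15.0
assert p.step(20) != 15.0     # third call: mean of [10, 20, 20]
```

This is the sense in which predictive systems, discussed next, can never be memoryless.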

Predictive systems can never be memoryless, for instance, because they explicitly use past experiences—not only current state—to predict the near future, involving a computation over multiple samples collected over multiple proper times. Weather modelling is the archetypal case in point. Small differences in the data sets can lead to large changes in the predictions. The dependence of data processing on history is utterly susceptible to scaling arguments. Similarly, convergent systems that move towards a fixed point or attractor, at a future time, rely on memory to recognize their final desired end state.

⁴I sometimes call them ballistic processes, because we tend to treat computation as if it behaves something like a game of billiards: the mere sending of some data provokes an immediate involuntary reaction, fully deterministic. This is used to argue for ‘push’ over ‘pull’ methods, for instance, and is completely wrong.


It’s hard to generalize about the role of causality, because it is so dependent on the nature of interactions in a system, but I need to make a few comments on this because the integrity of data sources has been challenged in some commentaries, e.g. [3], [20]. Some authors have argued that we should never throw away data about past events because it might be needed later to ‘recover’ from some fault. Apart from being unsustainable⁵, the premise of this argument is wrong from a causal perspective; indeed, systems must eventually forget their past over some timescale. The question, again, reduces to understanding the relevant timescales. Sometimes regulatory bodies insist on data retention, for legal reasons, up to some statute of limitations. This can be factored into the policy and separated from the data that need to be accessed dynamically at runtime, becoming effectively a part of a different application.

Does it matter whether we take data transaction by transaction, or in bulk as a sum of all transactions (such as an aggregate database or a file)? The process by which data arrived in the past is only relevant if it affects the promises it makes at the time of usage by another agent. For some authors, a ‘database’ is merely a cache for a long linear process of accreting the past, i.e. that the current state of a database is the sum of all past facts transacted at its entry point [3], [20]. This is the ‘retarded conditional’ view of data: by integrating differential elements from some beginning to some end one calculates the answer transaction by transaction.

‘From this perspective, the contents of the database hold a caching of the latest record values in the logs. The truth is the log. The database is a cache of a subset of the log. That cached subset happens to be the latest value of each record and index value from the log.’ [20]

This quote summarizes the retarded view of data in which the current state is merely an arbitrary point in a deterministic trajectory—a classic Turing machine argument. In this view, all the causal information lies in the past. The argument goes: the present is a function that does not alter the past (which is true). It’s then assumed that the function is a linear function, formed from a sequence of ‘deltas’ or state changes that can be stored in a log and added together to yield the current state. The latter is only true for reversible linearized (memoryless) processes, localized to a single point of entry. Popular techniques of this include the use of ‘actors’, ‘pure functions’, and even mutex locks, with associated costs (i.e. without interior reads or writes to exterior data sources)⁶. Neither of these assumptions generalizes to distributed cloud computing, as I’ll show in section VII. There is also an ‘advanced conditional’ view of determining outcome, which is also known as ‘desired end state’, in which future states are the relevant driver of behaviour.
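The limitation of the log-replay argument can be seen in a small sketch (a hypothetical illustration; the update rules are invented): replaying a log reconstructs the state exactly when updates are linear, but an update rule that depends on the existing state makes the result sensitive to ordering, so the bag of deltas alone no longer determines the present.

```python
from functools import reduce

# Linear case: the state is just the fold (sum) of all deltas in the log,
# so the 'database' really is a cache of the log.
log = [+100, -30, +5]
state = reduce(lambda acc, delta: acc + delta, log, 0)
assert state == 75
# Reordering a linear log changes nothing: addition commutes.
assert sum(reversed(log)) == 75

# Non-linear case: the same deltas applied with a state-dependent rule.
def nonlinear(acc, delta):
    return acc * 2 + delta

forward = reduce(nonlinear, [+100, -30, +5], 0)
reordered = reduce(nonlinear, [+5, -30, +100], 0)
assert forward == 345 and reordered == 60   # history (ordering) matters
```

In the non-linear case, the exact interleaving with other processes becomes part of the causal record, which is precisely the information a flat log of deltas fails to capture.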

⁵The argument goes: disk storage is cheap today, so why wouldn’t you store everything forever? There are plenty of reasons. For example, the escalating power cost of storage alone is a reason. Our current experience with the crisis over cheap plastics should be a wake-up call for anyone advocating an end to garbage collection.

⁶See also the approach to network data consistency taken in [21].

Example 1: The functional programming manifesto claims that programs will be deterministic and reproducible if functions are defined following these axioms: (i) they are ‘total functions’, in the mathematical sense (i.e. they return an output for every input); (ii) they are deterministic, i.e. they return the same output for the same input (which assumes they do not implicitly rely on variant configuration or database lookups, and are immune to ‘noise’ over all timescales involved in the process); and (iii) they alter no exterior state other than computing their promised output. The composition of such objects would certainly be deterministic, but the axioms are often violated in practice, e.g. by system faults and by inattention to environmental noise. The naive view is that programs are perfectly isolated and that, if programmers do nothing, nothing will happen. In practice, there is no such isolation context for distributed systems, and it’s up to programmers to explicitly perform noise correction fast enough to maintain these axioms.
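A minimal sketch of these axioms (with invented names, for illustration only): the first function satisfies (i)–(iii), while the second looks functional but silently violates axiom (ii) by depending on exterior, time-varying state.

```python
def pure_total(x: int) -> int:
    # (i) total: defined for every int; (ii) deterministic;
    # (iii) alters no exterior state.
    return x * x + 1

_config = {"scale": 1}     # exterior state, mutable at runtime

def impure(x: int) -> int:
    # Depends implicitly on _config, which may change between calls,
    # violating axiom (ii) in practice even though the code looks pure.
    return x * _config["scale"]

assert pure_total(3) == pure_total(3) == 10

a = impure(3)
_config["scale"] = 2       # e.g. a configuration push mid-process
b = impure(3)
assert a == 3 and b == 6   # same input, different output
```

The violation here is invisible from the function’s signature, which is why such axioms must be actively maintained rather than assumed.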

It feeds the non-relativistic view that one can absolutely capture ‘facts’ about the source of information, which can then be preserved and treated as immutable. The error in this argument is that, as soon as a sample of data has been transported into storage, it is no longer the source view: it’s the observer view of the data store. One would have to transport all relevant context into the data snapshot. This may be a simple discriminator, but it’s still an arbitrary view that doesn’t remove the uncertainties and doesn’t warrant the preservation of an expired context without an understanding of its significance.

In order to recover a snapshot of state, it’s argued that one should never delete any of the contributing facts in the logs. After all, in a linear system, the current snapshot is merely the balance of all previous transactions within the system; but this is simplistic. In a non-linear system, there is no such separation of process timescales, and we would still need the full past history, including all leakages of noise and interleaved processes, to understand the present in general, because computations are not always linearizable (see section VI-A). The final outcome becomes strongly dependent on the particular moment at which data were collected (a kind of ‘butterfly effect’)—so the current snapshot of the database contains information that is not in the journal⁷.

Even if our system is linear, and we keep all data in an eternal timeseries, searching backwards takes time, so we index data, but to do so imposes a rising cost (in energy and labelling), possibly identifying unique instances, by GUID or quasi-universal timestamps, and so on. There is a reason we aggregate data and use caches and latest summaries: to localize relevant context, and separate it from other data whose

⁷I suspect that the underlying and unspoken aim of advocating ‘stateless’ and ‘throw away nothing’ approaches is actually to linearize systems and make them as deterministic as possible by weak coupling. Alas, the rising cost of this, in some cases, is prohibitive and ultimately unsustainable, so alternative strategies should probably be considered.


meaning has gone into the mix of entropy. What every system designer needs to consider carefully is the extent to which we flatten a dynamical process into a timeless, static database model. It’s okay to do so as long as you accept the loss of a relative temporal reference, and the accumulation of entropy. It does not necessarily imply that the past is lost. Process time information is lost to entropy by design in most systems—and this is not wrong; relational databases focus entirely on static semantics of data, not on process histories. We now have a pantheon of time-series databases that focus entirely on brute force history, without attention to ‘scaled semantics’. By this, I mean that timeseries typically involve many pattern scales, such as by hour, by week, by month, etc., and that these are treated as issues for post hoc analysis rather than being built into the data model in an efficient manner. There is a policy choice—a choice of semantics that can’t be stipulated in general. If we want to get systems to behave ‘properly’ as well as efficiently, we need to select an appropriate causal policy for what is ‘proper’, and be aware that these choices are inherently scale dependent in both space and time.

D. The point of usage

Before summarizing, I want to make one morepoint about the importance of the recipient in thedetermination of so-called facts. Facts are a kind ofpromise, but it takes two agents and two promises topass on facts: a sender and a receiver. Past facts aretherefore not really as immutable as is often claimed.There is the original value offered by a source S to areceiver R:

S \xrightarrow{+V} R    (1)

and there is the moment at which the value is acceptedand used:

R \xrightarrow{-V} S.    (2)

It is this latter promise that actually passes on the information and creates an event [5]. We ask: can the promises be kept invariantly? The conditions for agent R might have changed, between the keeping of these two promises, even if S is somehow etched in stone. Promise Theory predicts that it’s not the time of origin of the data that matters to its causal influence, but rather the moment at which the data are accepted into the timeline of the next agent—just as in an electric circuit with feedback, which is the inspiration for control theory.

If one dabbles in synchronous versus asynchronous processing, this may matter: if the timescale over which the behaviour of an agent changes is comparable to the timescale over which you sample data, the data basically become random variables, by the receiver’s hand (not the source’s). Sufficient immutability can be assured in a few ways: e.g. by assembling the states one promises to depend on before processing, to decouple independent processes. That way the processes can continue at their own rate

and still avoid such issues8. This is what functions do in programming: automatic variables (by value) copy the value into private workspace.
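The ‘assemble before processing’ idea can be sketched in a few lines of Python. The configuration names and the locking scheme below are illustrative assumptions, not taken from the paper:

```python
import copy
import threading

# Hypothetical shared configuration that another process may mutate at any time.
shared_config = {"rate": 1.0, "threshold": 10}
lock = threading.Lock()

def process(batch):
    # Assemble (copy) every promised dependency *before* processing begins,
    # so the computation sees one frozen snapshot -- the by-value semantics
    # of function arguments, applied at the process scale.
    with lock:
        cfg = copy.deepcopy(shared_config)
    # From here on, concurrent writers to shared_config cannot alter this run.
    return [x * cfg["rate"] for x in batch if x > cfg["threshold"]]

result = process([5, 12, 20])  # only values above the snapshotted threshold survive
```

The snapshot decouples the two processes: the writer and the reader can now proceed at their own rates, as described above.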

There are two approaches: trusting the source and trusting the receiver.

• If one trusts the promise of invariance (immutability) of the source S (as one does in timeseries databases, as a trusted second hand source), then keeping state there becomes a policy choice and one reads information directly from that source. This second hand information replaces the source of ‘truth’, and is not the same. This assumes that there is also an invariant key for looking up the data, which is understood by both parties and that relationship is also constant.

• If one trusts the receiver to sample and keep the information invariant (as one does in using private local variables in programming, and in ‘immutable images’ in cloud computing) then policy abhors reading any new information from outside the boundary of containment. Derivative processes, like that of R, which depend on data from a source S, thus capture all their dependencies before beginning to keep any subsequent promises—freezing them and rendering them immutable as a matter of policy.

It seems that, in neither case is there any guarantee of the invariance of the data, or the ability to replay the same interactions multiple times, as that is entirely a policy decision for R. In general data come from many different sources, with conditions that are quite unequal, and merely sampling these into a trusted repository does not alter that. In fact it adds a second layer of trust, by the Intermediate Agent Law (see 7.2.2 of [8]).

What matters is not whether we cache data in a database, or keep each update in a journal. What matters is whether the data can be relied upon not to change over the course of trying to use them. This often assumes implicitly that there is a single correct dependency value for each moment in time, with an ability to ‘roll back’, yet this notion has been debunked [22].

Coarse graining time and separating interior from exterior time: this is what we do in functional programming. It introduces the full range of process causal viewpoints.

‘Mutable state needs to be contained.’

There is a causal twist here, in the form of Nyquist’s sampling law, and Shannon’s error correction law [23], [24] (for a review, see [25]). If a system has knowledge of a correct state (where correct is promised as a matter of policy), then no

8This idea was built into the design of CFEngine, a realtime maintenance tool, as a safety measure, and was rarely understood by users.



unintended deviations from that state will be measured by an observer if restored quickly enough. We take for granted that such feedback processes are on-going at a low level of memory in all our technology at all times. The same principles may also be applied at a higher level, as maintenance procedures9. If problems are fixed before a fault can be sampled downstream, there will be no propagation—and the system will be invariant by virtue of dynamic equilibrium [27]. This is how data consensus and memory error correction work, for instance. The key issue is whether the relevant user can observe any change in the state or not.
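A toy discrete-time simulation can illustrate the sampling argument. All parameters here are hypothetical: if a convergent repair loop runs on a shorter timescale than the observer’s sampling, faults are corrected before they can be observed downstream.

```python
import random

# Hypothetical simulation: a maintenance loop restores the promised state
# between faults, faster than a downstream observer samples it.
PROMISED = "ok"

def run(steps, repair_every, sample_every, fault_rate=0.3, seed=1):
    rng = random.Random(seed)
    state, observed = PROMISED, []
    for t in range(1, steps + 1):
        if rng.random() < fault_rate:
            state = "faulty"              # unintended deviation (noise)
        if t % repair_every == 0:
            state = PROMISED              # convergent maintenance operation
        if t % sample_every == 0:
            observed.append(state)        # the downstream observer samples
    return observed

# Repair on every tick, sampling every 10 ticks: the observer never sees a fault.
fast_repair = run(steps=100, repair_every=1, sample_every=10)
# Repair every 50 ticks, sampling every tick: deviations become observable.
slow_repair = run(steps=100, repair_every=50, sample_every=1)
```

The point is not that faults stop occurring, but that the separation of timescales makes them invisible to the observer: the system is invariant by dynamic equilibrium.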

The Twelve Factor App manifesto claims to avoid software erosion, which implies that there should be a maintenance process at work. In the various cloud manifestos, there has been a focus on reorganizing the maintenance process so that dependent information is embedded and assumed invariant (as in a transaction), by freezing ‘golden images’. If errors are detected post hoc, due to state drift, one deletes the process, replacing it with a fresh copy (see the car example below), which is accepted as a matter of policy. Corrective actions post hoc (instead of preventative actions) then require some kind of ‘rollback’ on the scale of the promised transaction.

Without a preventative error correction underlying the use of runtime state based on a ‘fixed image’, a post hoc correction is still needed in that dynamical runtime portion of state, but if the error has been observed by any process, its influence will already be too late. In a kernel or database monitor, keeping validating transactions is relatively easy (given deep memory error correction), but as the scale of interactions grows (e.g. in microservices and service meshes), isolation becomes less likely.

E. The importance of forgetting—indistinguishability

Dependency on process history adds baggage (process mass [25]), tangling up changes in dependencies with data going back in time. Memoryless processes are ‘agile’ or ‘cheap’ to run because they have little baggage. This allows changes to be made to them easily. Of course, that should not be taken to mean that all changes will be simply localized, with no effect on other processes.

Keeping processes agile and independent of history seems to be a way to address quick reproducibility. Reproducibility has nothing to do with computation unless it requires replaying an entire journal of transactions to achieve it. It has more to do with trust in the stability of promises. We build businesses and institutions on reproducibility, because it allows anyone to verify a result and repair possible errors, when something is judged to go wrong. At such a

9The relationship with the strategy used by CFEngine to define ‘convergent operators’ [10], [26], is interesting. If you deal with pure functions, you cannot have maintenance of persistent agents. You redefine maintenance as the death and rebirth of an agent, with associated loss of runtime state. Runtime state is contained mutable state. But for a memory process it affects the behaviour of the agent in ‘realtime’, i.e. in band of the function’s I/O channel.

time, the idea is that we can forget a state of the system we consider erroneous, and replace it with a ‘proper’ policy-acceptable one.

If a process is interrupted, and some of the contributing past information is lost, we believe that this must compromise the reproducibility of the outcome. This is not necessarily true, as explained below, but let’s continue. The result is that we make transactions that carry all relevant data bundled with them, and keep a copy until the transaction has successfully been prosecuted. If the transaction should fail to be confirmed, we can repeat it. The implicit assumption here is that there will be no effect on either agent (the receiver or the source) unless the transaction completes successfully. Repetition should also be ‘safe’, i.e. convergent to a definite outcome, not just a ‘first come first serve’ (FCFS) in a random walk.

If there are errors on multiple scales, we may have to go back across cumulative transactions on multiple scales to repeat the transaction. So, if one builds systems at scale on the basis of transactional determinism, we are doomed to keeping ever growing amounts of data, up to the size of the largest transaction. The cost grows in relation to history (time) not in relation to scale of parallel instances (space).

If any data are left ‘floating in limbo’, in extended ‘stateful’ sessions, it is argued that those states could be lost and data may go missing. This is not about statefulness at the receiver, but about when the data are discarded from the source so that the receiver state can no longer be reproduced. This naively seems to return us to a justification for the idea of never throwing away any data, discussed above, but this is not so. We simply need to preserve data until confirmation of receipt—as in reliable transfer protocols. In other words: we must assess when data have already played their role in the next stage of the computational pipeline. Next we need to define the scale of that remark: on what process scale do we need confirmation of ‘ok to delete’? If we treat transactions as packet by packet over a session, then a process crash could lose data. But if we treat completion as the confirmation of a promise kept that depends on the data, then scaled transactions can be constructed using locks.

Safety under repetition is the much neglected method of assuring certainty in systems. Idempotence is sometimes mentioned, but most authors think this means remembering which transactions are completed on a FCFS basis without checking for contradictions.

Example 2: Numbering of transactions, like in TCP, is one way to maintain coherence of order, but this is not always meaningful without ad hoc assumptions. In a data pipeline, for example, you can number items, but the numbers assume that both ends have a clear sense of how the arrival of data will take place in order to combine multiple sources meaningfully. So, while a 1:N transport can be regularized by partial ordering, N:1 aggregation cannot. Numbering promises process as ‘intentional’ events, but random arrivals have no such coherence, making the processes non-reproducible unless the entire history is captured and used as the future source of truth. That assumed truth may not be a faithful representation of the original source processes. As always, the receiver determines the semantics of data.

Fixed point convergence is the more economical key to reproducibility, because a fixed point is the only certain way of guiding a system with random behaviour to a known state. It requires knowledge of a future state to which the system is headed. Absolute invariant future state is simple and cheap to manage. Relative future state is fragile and susceptible to faults and cumulative errors of execution. In data pipelines, for instance, this needs to be treated very carefully (see our Koalja work, for instance [28]).

Example 3 (State of a car): If you crash your car, the car will be lost if it is a unique, one-of-a-kind design. But if the memory of its design (its ideal state, minus the wear of age and mileage) is kept elsewhere, as a separate manufacturing process, then the car can be replaced—but not its runtime state, i.e. the precise details of what it was doing at the time of the crash (including its passenger inventory). If the car client is not fussy, it may overlook a few details and be satisfied with an equivalent.

Some of the state you are happy to forget, some you are attached to. There is no fact present in the car that can tell you how to discriminate this line. We don’t always need to remember the past, sometimes only a single future ‘desired’ state. Indeed, it’s desirable to forget the past as it’s just in the way—and sends you a regular bill.

Example 4: A statute of limitations, or causality horizon. If you build on advanced boundary conditions, you need no memory. Memoryless processes have a very short horizon. A function is basically a mutex lock around a private cache of function argument values. The completion of the function is a tick of its clock at the scale of functions, and therefore a pure function is memoryless at that scale.

We might conclude that favouring ‘statelessness’ is to push responsibility for preserving state backwards along a causal chain (into the past): onto the sources rather than the receivers. But this is not necessarily true. Trajectories also depend on the policy of future states and the rules of transaction, and there is significant freedom to redefine causal constraints by shifting responsibility between these, over different timescales. We have to ask: which of the agents in this chain is the most fragile? Why should events from the past be more important or ‘correct’ than what happens in the future? Many will answer that ‘the past determines the future’, but this is naive determinism. Promises about future states also contain causal information, and it is not correct once one accounts properly for scaling. So we need to return to look at the ways in which causal propagation takes place and is scale dependent.

F. Summary: proper invariants

To summarize, we appear to have succumbed to the trap of obsessing over illusory detail rather than focusing on the key question: how can a stable promise be kept? So what we seem to be struggling to express is a design decision (a promise) about which processes will be considered atomic at each scale, which is equivalent to expressing which data we are potentially willing to lose.

Assumption 1 (Promise manifesto): The central question about systems is: will the outcome of promises be invariant to the conditions under which the promise is kept or not—and does that matter to the promisee?10

From this perspective, statelessness actually seems to imply a preference to use ‘current state’ in short-lived, ephemeral interactions, over which dependencies can be treated as approximate invariants. If all agent interactions are kept short, as measured by their own proper time, and relative to the scale of their larger process’s exterior time,

\frac{\Delta t_{\text{interior}}}{\Delta t_{\text{exterior}}} \ll 1    (3)

i.e.

\frac{\Delta t_{\text{transaction}}}{\Delta t_{\text{process}}} \ll 1.    (4)

then a process will tend to a state of statistical invariance. This relative timescale argument could perhaps be used as a definition of ‘micro’ in microservice. It makes dynamical sense: it’s a linearization of a potentially non-linear process. We should understand that as a design constraint. The choice enables eventually consistent outcomes over a sample set, but there may be other sample sets that have not reached the equilibrium. The best promise (no guarantee) of stability is to ensure that updates have plenty of time to reach equilibrium, by separating timescales.

Example 5: If a dependency changes every second, and a process promises output every few seconds, there is insufficient time for the process to promise invariance. However, if changes to dependencies occur only once per year, then processes lasting a few seconds can be considered invariant in practice, by (3).
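The separation condition (3) amounts to a trivial design check. A sketch in Python, where the threshold ratio is an arbitrary assumption (the condition only demands ‘much less than one’):

```python
def effectively_invariant(dt_dependency_change, dt_transaction, ratio=0.01):
    """Treat a dependency as invariant when the transaction timescale is much
    shorter than the interval between changes to the dependency, per (3)."""
    return dt_transaction / dt_dependency_change <= ratio

# Dependency changes every second; the process emits output every few seconds:
fast_dependency = effectively_invariant(dt_dependency_change=1.0, dt_transaction=3.0)
# Dependency changes once a year; transactions last a few seconds:
slow_dependency = effectively_invariant(dt_dependency_change=365 * 24 * 3600,
                                        dt_transaction=3.0)
```

Here `fast_dependency` is false and `slow_dependency` is true, matching the two cases of Example 5.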

There is an implicit separation of concerns in talking about state: the part of state that we care about, in the current context, and the part we don’t. This suggests a natural partitioning by policy of scales for each relative process, rather than a universal best practice guideline. The final point about ‘good enough replacement’ leads us to consider the role of observability and distinguishability in deciding outcomes [5].

The issue in question seems to be: over what timescale can some form of state be considered dependable (invariant relative to the receiver), from the perspective of all stakeholders in the system? This includes at least the role of the client (when data are uploaded) and the server (receiver of uploaded data). For the remainder of the paper, I’ll therefore focus on the dynamical principles of keeping promises across a multitude of scales.

10The recipient may have arranged for contingencies in advance.



IV. LOCALITY AND DISTINGUISHABILITY

The focus of the paper, from here on, will be to illustrate how a few key concepts behind the promises authors try to capture in rhetorical usage may be formalized. The central concepts are mainly spacetime concepts, about order, scale, and observation [5].

A. Localization or spatial partitioning

The virtue of source code modularity, for the separation of semantic concerns, is doctrine in computer science. Localization of process execution in space is a form of modularity too, that we call scaling of execution context—‘containment’ for short. Today, virtual machines and container technologies are the tools for achieving such spatial localization, erecting barriers that are supposed to limit the exchange of influence between interior and exterior. Isolation from an influence X implies causal independence of X (see section VIII-A). In Promise Theory, bare agents that make no promises are assumed independent of all causal influence a priori.

Fig. 2: Causal influence can flow forwards or backwards along the direction of an interaction (defined arbitrarily). At the ends of an N:M interaction, say from clients to servers, data may be shared or kept in entirely separate scopes, in either direction. Where nothing is shared, we often say that the data are ‘sharded’ or have private scope. Integrating shards implies reintroducing a shared resource in the agent that aggregates them however, so we don’t escape sharing; we only delay its onset in order to acquire partial invariance of data for other processes.

In Promise Theory, every active or passive part of a system is an agent. The definition of an agent also defines a scale, and an isolation boundary for that scale. Elementary agents are the smallest observable scale of a system, and superagent clusters of them form larger scales, where agents work together to keep collaborative promises.

Modularity is achieved by partitioning the process into separate agents whose interactions are defined by promises.

Definition 1 (Partitioning of a process): A subdivision of a process agent into a number of non-overlapping subagents, in such a way that the mutual promises between the subagents are exposed to exterior observers.

Agents may be decomposed by space or by time if they are distinguishable by some label. In other words, if agents are numbered in order, or labelled with names or types, they can be separated into subdivisions using the labels they promise. Examples include the division of a larger process into microservices, or the partitioning of a database into shards, perhaps curated by intermediaries with APIs between them, but the principle doesn’t refer to any particular technology or set of assumptions.
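Partitioning by promised labels can be sketched as stable-hash sharding. The record structure and shard count below are illustrative assumptions, not from the paper:

```python
import zlib

def partition_by_label(records, num_shards):
    """Assign each record to exactly one shard, using a stable hash of the
    label it promises -- a subdivision into non-overlapping subagents."""
    shards = {i: [] for i in range(num_shards)}
    for rec in records:
        shards[zlib.crc32(rec["key"].encode()) % num_shards].append(rec)
    return shards

records = [{"key": k, "value": v} for k, v in [("a", 1), ("b", 2), ("c", 3)]]
shards = partition_by_label(records, 2)
# Non-overlapping: every record lands in exactly one shard.
assert sum(len(s) for s in shards.values()) == len(records)
```

The stable hash plays the role of the promised label: any agent that knows it can route to the right shard without consulting shared state.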

Now, let’s formalize the hierarchy of agents involved in representing data processing, starting with the easy parts. This helps to establish the language of promises and use of terminology.

B. Scaling of state

Every memory location in a system that can record state is an agent that can promise to hold a value11. All states are memory agents, and partitionings lead to separation of states that keep different promises—sometimes called ‘sharding’ (see figures 2 and 3):

Fig. 3: Distinguishability of agents to clients determines whether they can be partitioned or whether they form a redundant set. Agents are distinguished by the promises they make, which in turn are states of the agent. Together the states of a system form a configuration. Some states are promised to exterior agents and some have private scope. In general the scope of a promised value is contained by a certain scale, which we call a semantic boundary. Redundant agents are indistinguishable. Non-redundant shards make different promises.

Definition 2 (Variable): An agent or subagent V promising a key-value pair, i.e. a name and a value, representable as a single promise:

V \xrightarrow{+(\text{name},\text{value})} S,    (5)

within a scope S.

The internal variables of a process agent are what one normally thinks of as the state of the agent.

Definition 3 (State of a variable): Let V be any variable (or set of variables), on the interior of a process

11This applies to the nodes in any state machine too.



agent S, which takes values from any set of distinguishable elements X. The state of V is the value of V ∈ X, promised by the source S to any outside agent A?:

S \xrightarrow{+V} A_?.    (6)

The problem with this definition is that the state is only observable to the process agent O outside of A, if both the source S promises the information as an exterior promise, and the observer also promises to accept and use the promised information:

Definition 4 (Observable state of a variable): Let the state of a variable V_S ∈ X, promised by S, be sampled on any occasion by an observer O:

S \xrightarrow{+V_S} O    (7)

O \xrightarrow{-V_O} S,    (8)

where V_O ∈ X. Then the observable state of the variable is V_S ∩ V_O.
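Definition 4 amounts to a set intersection, which can be written directly. The value sets below are purely illustrative:

```python
def observable_state(promised, accepted):
    """The state an observer can actually resolve is the overlap between what
    the source promises (+V_S) and what the observer accepts and uses (-V_O)."""
    return promised & accepted

V_S = {"red", "green", "blue"}          # values the source S promises
V_O = {"green", "blue", "ultraviolet"}  # values the observer O can accept
# Only the overlap is observable; 'red' and 'ultraviolet' are invisible to
# the interaction, because no matching (+/-) pair of promises covers them.
assert observable_state(V_S, V_O) == {"green", "blue"}
```

The asymmetry is the point: neither agent alone determines the observable state; it is a property of the interaction.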

The role played by the observer in this definition is crucial. It underlines how relativity will play a role in all stateful phenomena12. If a state is persistent or invariant to order N samples, then multiple samples, by an observer O, lead to the same value for some number of samples N. It is clear that a variable, even of order 1, implies the existence of memory that can be sampled by some process that carries information to an observer. State is an observer issue rather than a provider issue.

Definition 5 (State of an agent): The total state of an agent consists of the states of its interior variables.

In effect, each variable is a subagent member of a larger process (superagent). This tells us that the boundary where we choose to define the edge of a process plays an important role in the way we describe its behaviours (process, container, group, pod, host, etc). Moreover, since it relates to promises, whatever else it may be, the statefulness of an agent S is an assessment made by each recipient observer involved in promised interactions.

C. Sequences or temporal partitioning

There is also localization in time: when a process’s trajectory starts and ends (see figure 1), and whether information is fed into it only at those endpoints as immutable constants or ‘invariants’ of the process, or whether information is accepted into it and modifies the process as it evolves. The purpose of ‘functions’ in programming is to promise that only the I/O channels belonging to the function (the arguments and the return value) lead to change. This is hard to assure on a larger scale, however, as computer code is only one in a mixture of overlapping processes whose dependencies lead to mixing. Seeking invariance of promises is the key to process stability. A process may be called adiabatic [29] if exterior information does not alter promise definitions over the timescale of interactions that rely on it—meaning that a process’s

12The example of observing the inconsistent state of a clock was discussed in reference [5].

promises are invariant over the interval during which they are being kept, with no configuration changes13. If process fragments are partitioned end to end, they form a sequence. If they coexist, either starting or ending at a common agent at a common time, then they may be called concurrent.
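The contrast between a function whose only channels are its declared I/O and one that samples mutable exterior state can be shown in a few lines (the names are illustrative assumptions):

```python
# Exterior, mutable dependency -- changes on its own (exterior) timescale.
exchange_rate = 1.1

def convert_impure(amount):
    # Reads exterior state: its promise changes whenever the exterior does.
    return amount * exchange_rate

def convert_pure(amount, rate):
    # All dependencies pass through the function's own I/O channel, so the
    # promise is invariant (adiabatic) over any interval, by construction.
    return amount * rate

# The pure form gives the same answer for the same arguments, always.
assert convert_pure(100, 1.1) == convert_pure(100, 1.1)

exchange_rate = 2.0
# convert_impure(100) now yields a different value for the same argument.
```

The pure form is adiabatic in the sense above: nothing outside its argument list can alter the promise while it is being kept.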

Localization allows us to partition processes in space and time, holding certain aspects constant over the duration of a sub-process.

• Time localization leads to promise invariance for agents.

• Space locality leads to privacy of scope and non-interference.

This is a form of lock-free synchronization. Proponents of the Actor Model will find these principles familiar [30].

D. Distinguishability, partitions, and redundancy

In order to distinguish partitioned agents from redundant agents, partitions must be distinguishable by the agents that interact with them.

Definition 6 (Redundant agents): Two agents A_1 and A_2 are observationally redundant if they make the same promises to an observer A_3, and A_3 accepts the promises equally, i.e.

A_1 \xrightarrow{+X} A_3    (9)

A_2 \xrightarrow{+X} A_3    (10)

A_3 \xrightarrow{-Y} A_1    (11)

A_3 \xrightarrow{-Y} A_2    (12)

where X ∩ Y ≠ ∅.

An observer that discriminates between two agents making identical promises may be called a discriminator. Such agents are the basis of all decision making based on data. In principle, discrimination at a single agent location can be made on the basis of space (source agent) or arrival time, but each simultaneous time step sampled by the discriminator represents a new causal decision, so that must depend on how we define the scale of timesteps and the discriminator itself—a distributed superagent discriminator has to be able to promise interior time coherence. As always, the scale of encapsulation over which we can assume invariance (‘coherence’) plays the main complicating role.

Definition 7 (Partitioned agents): Let two collections of agents P_1 = {A_1, . . .} and P_2 = {A_2, . . .} be

13Confusion ensues for many when considering the origins of such change. There may be intentional change, such as a code change or a manual input of data, and there may be unintentional (hidden) change to a dependency presumed invariant. A lot of rhetoric has been exchanged around ‘never touch the system and it will never go wrong’, but ‘if it fails, don’t fix it—replace it’. These are policy decisions, not unique recipes for handling change, but they are rooted in the idea that invariance is a solid foundation for process continuity. They may have different causal outcomes.



partitions, as discriminated by an observer A3, then:

P_1 \xrightarrow{+X_1} A_3    (13)

P_2 \xrightarrow{+X_2} A_3    (14)

A_3 \xrightarrow{-Y} A_1    (15)

A_3 \xrightarrow{-Y} A_2    (16)

where X_1 ∩ Y ≠ ∅, X_2 ∩ Y ≠ ∅, and X_1 ∩ X_2 = ∅.

Note that a partitioning is a superagent, i.e.

Lemma 1 (Partitions are agents): If agents A_i are at scale n, then a partition is a superagent at scale n+1.

This follows trivially from the definitions, without any restrictions on what other promises the agents may make.

Example 6: ‘Shared nothing’ agents cannot be completely redundant. They make a priori uncorrelated and uncalibrated promises, so they cannot promise determined redundancy; they are merely random and possibly similar. As soon as they accept a common source of calibration (by cooperative dependency on a single source), or achieve dynamical equilibration (as in data consistency protocols), they share something from O(1) to O(N^2). In practice, the claim of ‘shared nothing’ is overstated, as agents share determining configuration that only changes on a long timescale, making it effectively invariant, according to (3).

E. Invariance

We can now define the meaning of invariance of an interaction, as a process involving pairs of agents (on any scale): a source and an observer:

Definition 8 (Invariance of a promise π): An agent’s (exterior) promise may be called invariant if the body of the promise is constant for the lifetime of the promise.

Theorem 1 (Invariant promises): An agent may be invariant with respect to exterior change if and only if it contains all its dependencies and promises them constant.

To prove this, suppose an agent A makes a promise of X to an observer O, conditionally on the promise of another agent A_D being kept, which promises a dependency D, then:

\pi_A : A \xrightarrow{+X|D} O    (17)

\pi_{AT} : A \xrightarrow{-D} A_D    (18)

\pi_D : A_D \xrightarrow{+D} A.    (19)

In addition, A promises its value of D to be constant:

\pi_C : A \xrightarrow{D=\text{const}} O    (20)

If we form the superagent from {A, A_D}, from the two collaborating agents, then

\{A, A_D\} \xrightarrow{+X|D} O,    (21)

\{A, A_D\} \xrightarrow{D=\text{const}} O,    (22)

which is unconditional after A assesses the promise to provide D has been kept, i.e. α_A(π_D) ≠ 0. Now,

since D is promised constant, X|D → X|const, and this is invariant under D. If the promise π_A is made unconditionally, then D = ∅, and the result is trivially true.

Example 7: This theorem may be considered the basis for freezing all dependencies and configurations internally in containers before execution, e.g. in Docker or using fixed images in the cloud. But it also applies to dynamical configuration engines, like CFEngine etc, where the policy for D is fixed and the promise keeping is maintained dynamically by ‘self-healing’ based on fixed policy. In either case, the promise may be broken because the result cannot be guaranteed. In the case of containment, one is trusting the integrity of the containment, which can only be assured for non-runtime state by making it read-only. In the case of dynamical configuration, runtime state can also be repaired by dynamical equilibrium in the presence of noise. So the required invariance is not dependent on a particular strategy. A dynamical configuration is more expensive in processing, but may prevent errors before they occur. Static containment may appear cheap, in terms of runtime process resources, but is more likely to result in exterior consequences that breach containment, because the timescale of exposure to non-corrective actions is maximal (the lifetime of the process) rather than a regular shorter maintenance interval.

We can reduce this to a very simple expression of invariance for agents, as a whole:

Definition 9 (Invariance of an agent A): An agent is invariant if it promises to accept nothing from any other agent.

From the proof, we see that this is a scale dependent assertion, since we may always partition the agent internally such that one interior partition makes a promise on which the other interior partition depends, entirely within the boundary of the agent, leaving its exterior promise unconditional.

The key assumption in this argument is the absence of unintended change, by impositions, such as noise, that systems are fragile to. Many developers believe that there is no noise in systems, only the programmed change, because a lot of it has been eliminated by low level error correction14.

As long as ‘a system’ of choice interacts with some other agency, it is not the total system, merely an arbitrary partitioning of it. If an agent promises to accept nothing, i.e. make no (-) promises, then its interior state will be invariant for as long as that promise can be kept. We may assume that this is the actual goal of systems that serve users.

F. Sharing versus partitioning

Partitioning is naturally the opposite of sharing. The original definition of ‘shared nothing’ for databases was ‘neither memory nor peripheral storage is shared among processors’ [31]. In the cloud era, we need a more generalized abstraction to cope with the branching technologies.

Definition 10 (‘Shared nothing’ agent): An agent keeps all of its promises unconditionally (makes no

14In band ‘self healing’ configuration engines are essentially noise error correction processes, on a fairly long timescale of minutes to hours, which may be too slow to maintain invariance for busy processes.



assisted promises), from its own intrinsic capabilities, i.e. it makes no promises that require the assistance of another agent.

An example would be a unikernel architecture without network service interactions and private disk. Since ‘shared nothing’ is the default assumption of ‘autonomous’ behaviour for agents in Promise Theory, we see the utility of promises in describing these issues: every dependency has to be revealed as a promise to see the channels that constrain process operation. A simple consequence of defining this is that agents that play the mediating roles of hubs, switches, or routers (as in figure 2) for promises of any kind violate the condition above.

Lemma 2 (Hubs violate ‘shared nothing’): Any node that operates as a point of confluence, or of divergence like a switch, and connects a sharding of process messages, promises to partake in sharing and violates the assumptions of ‘shared nothing’ in the broader sense. Shared nothing involves promises that do not depend on one another for any of their resources. This even includes power supply at the deepest level.

The degree of sensitivity to sharing depends on the possible variance of the dependency. If we seek to depend only on invariants, then there is only weak coupling. The shorter the timescale for variation (the more active a dependency is), the greater its effect on the system promises.

The connection between state and partitioning lies in what information is used to distinguish process agents. Process trajectories trace the evolution of causal relationships from agent to agent, at whatever scale an observer can witness. Some agents may make indistinguishable promises, leading to redundant parallelism, or they can promise full distinguishability, leading to branching and switching (decision making).

Distinguishability of promises is what enables non-shared futures, i.e. sharding and switching (see figure 2). In switching, a process selects from a set of possible futures based on the state of variable data. Each decision partitions possible outcomes into branches, or ‘many worlds’ futures. If branches are indistinguishable (contain only the same redundant information, both in initial conditions and runtime state) then the branching process is memoryless, and the superposition of agents acts as a single superagent on a larger scale. If they are distinguishable, the program takes on a new course.

Occasionally, different process flows merge into a single one. This happens with pull requests in software development, for example. It also happens in data pipelines where source information gets aggregated into batches. Branching (+ promises) costs nothing, but merging timelines (- promises) requires causal intervention, and the input of new information in the form of state-dependent selection criteria. Thus we do not escape the cost of a memory process by branching as long as there is a need for the branches to be merged15.

Example 8: In network packet delivery, i.e. ‘routing’, for instance, the decision about which route to take is variable according to a separate parallel process, but that might be made on the same timescale as the running process from which the data arises. This is then non-linear (see section VI-A). It is always downstream (receiver) promises that carry the greatest responsibility and the greatest potential cost—hence the downstream principle (see section VIII-A).

V. STATES, EVENTS, AND MESSAGES

State refers to information, which may be relied upon by some process to determine its next steps. This includes monitoring systems that may rely on the state to trigger exterior processes. The total state of the world is all the observable information in the universe (including chains of indirect dependency). That which cannot be observed by an agent cannot influence its future. Information may be hosted by agents in the form of any promisable properties, and may overlap between human and computer subsystems, for instance as part of a larger business process. Partial state may be localized to a particular container, or be aggregated over those locations (multiple agents), as well as over time (multiple episodes)16.

States get passed between agents and their processes, coupling them together. When a sample of data arrives, it normally leads to a transition from one state to another [9]. This is the meaning of a message.

Definition 11 (Message): A discrete unit of data, i.e. a state transition carried between agents.

A message channel is a pair of promises:

$$S \xrightarrow{+M_S} R, \qquad (23)$$
$$R \xrightarrow{-M_R} S, \qquad (24)$$

where $M_S \cap M_R \neq \emptyset$, which forms a non-empty promise binding to share messages unidirectionally from a sender to a receiver.
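The binding condition $M_S \cap M_R \neq \emptyset$ can be sketched directly. This is an illustrative model only: the names `channel_body`, `M_S`, and `M_R` are not part of Promise Theory notation, and message types are modelled here as plain sets.

```python
# A minimal sketch of a message channel as a (+)/(-) promise binding,
# assuming message types are modelled as plain sets (illustrative only).

def channel_body(offered, accepted):
    """Return the effective channel: the overlap between what the
    sender promises to give (+M_S) and the receiver promises to
    accept (-M_R). An empty overlap means no binding exists."""
    return offered & accepted

# Sender S promises the message types it will emit; receiver R
# promises the types it is willing to accept.
M_S = {"heartbeat", "data", "ack"}
M_R = {"data", "ack", "log"}

binding = channel_body(M_S, M_R)
assert binding == {"data", "ack"}                 # non-empty: a channel exists
assert channel_body({"data"}, {"log"}) == set()   # empty: no binding
```

The point of the sketch is that the channel is defined by the overlap of the two promises, not by either promise alone.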

We understand events as ‘happenings’. In physics, we attribute coordinates to events, in space and time; this is the origin of many confusions. In computer science, coordinates refer to process signposts and program counters, which are not generally helpful to know externally (their scope is local) [5]. The key difference between a message and an event is that events are observational in character.

Definition 12 (Event): A discrete unit of process in which an atomic state change is observed or sampled.

We further imagine processes being driven by a flow of events, like a stream; that’s because observers serialize them as a matter of policy, based on the limitations of their cognitive processes—and situate themselves downstream of the outcomes they are

15This is basically the reason why Continuous Delivery advocates recommend developing software in a single branch. Contention can then be resolved in band, since software development is a largely stateful process, in spite of modularity.

16These two methods correspond to frequency (space) and Bayesian (time) interpretations of statistical state.


interested in. A message only becomes an event when it is sampled (i.e. accepted by a receiver) and generates a change of state, becoming a tick in the clock of that process.

Definition 13 (Event or Message Driven Agent): Any agent R that can promise the occurrence of an event E, conditionally on its sampling of message M from a source S, with an average rate λ:

$$R \xrightarrow{+E|M} O \qquad (25)$$

i.e. R can promise an observer O that it acknowledges an event E on receipt of a message M. By Promise Theory axioms, this assumes the prior promises (or impositions):

$$S \stackrel{+M|\lambda}{\Longrightarrow} R \quad \text{or} \quad S \xrightarrow{+M|\lambda} R \qquad (26)$$
$$R \xrightarrow{-M|\mu} S \qquad (27)$$

where µ is the queue service rate.
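Definition 13 can be sketched as a small state machine: a message becomes an event only when the receiver samples it, and each accepted sample ticks the receiver's interior clock. The class and method names below (`Receiver`, `sample`) are illustrative assumptions, not notation from the paper.

```python
# Sketch: a message only becomes an event when it is sampled by a
# receiver and causes an interior state transition (Definitions 11-13).

class Receiver:
    """An event-driven agent R that acknowledges an event E
    conditionally on its sampling of a message M."""

    def __init__(self):
        self.clock = 0     # interior 'tick' count of the agent
        self.events = []   # acknowledged events, in sampled order

    def sample(self, message):
        """Accepting a message changes interior state: the agent's
        clock ticks and an event E|M is recorded."""
        self.clock += 1
        self.events.append(("E", message))
        return self.events[-1]

r = Receiver()
r.sample("M1")
r.sample("M2")
assert r.clock == 2                               # two ticks of proper time
assert r.events == [("E", "M1"), ("E", "M2")]     # serialized by the observer
```

A message that is never sampled leaves `clock` and `events` untouched, which is exactly the sense in which it is not yet an event for this process.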

A message is the transmission of a proposed state change. If accepted, there is a response on the interior of the agent, and there may or may not be a response on the exterior. This includes heartbeats and repetitions of all kinds; no prejudice should be inferred about what is important and unimportant state.

A. Propagation of influence by state messages

We see that state does not lead to influence unless it is observed, and there are conditional promises that use it to promise conditional behaviour. This is the behaviour of a ‘switch’.

We can now think about this in terms of dependencies. In order for remote state to affect a local process, a source agent has to share it, then the receiver has to observe it, accept it, and subsequently alter its behaviour according to it (see section VI-A).

From section IV, what we call the beginning and end of a process is a scale-dependent characterization. We make a choice about which agents we want to include at a given scale. The common understanding of processes has some basic elements, however. Each process has a lifecycle of major states that characterize it, which we can call epochs in the lifecycle of the process.

• Definition (of promises).

• Initialization of resources (Initial state).

• Execution (keeping promises).

• Termination (Final State).

This is basically the same model one has of any dynamical system in mathematics or physics. It maps onto the equivalent steps in, e.g., solving differential equations:

• Definition of the equation.

• Initial boundary condition.

• Find the propagator that computes the derivative states.

• Final boundary condition.

These are the major elements we use to describe the causality of a system, and fix its trajectory.

In a cloud setting, these correspond to

• Building software

• Configuring software settings

• Executing the software (runtime)

• The desired end state (outcome)

There is state in each of these epoch timescales. We are free to redefine the placement of changes, e.g. keeping code or configuration invariant during execution, or writing code or configuration that rewrites itself (as in learning systems).

B. Scaling of local state

A proper discussion about the localization of state can only be made with reference to a theory of scaling. What is local at one scale is composed of many locations on a smaller scale17. We therefore need to decide on the agents, or units of localization: what do we mean by entity, agent, location in a given context? As mentioned above, a certain locale could refer to anything from a single chip register to a distributed database, depending on the author’s state of mind!

Computer processes are made up of agents, which are discrete processing units. They sometimes work together in clusters—represented here as superagents. By making promises, they form many patterns such as client-server interactions, data pipelines, object models, microservices, container pods, backup servers, redundant failover, etc. Promise Theory provides a simple view of scaling, based on boundary semantics, that easily accounts for the cases found in IT [4]. We can thus ask, to what extent are promises (e.g. about state) within or without a boundary? Is state implicated in decision-making at the level of a conditional promise on the interior or exterior of an agent boundary? How is state implicated in propagation of assisted promise-keeping?

Locality refers then to the ability to draw a semantically defined boundary around an agent (i.e. one based on what it promises rather than on where it happens to reside) and decide what is on its interior (local) and what is exterior to it (non-local). Every system of agents that interacts with other agents breaches its boundary or grows it to accommodate new members, so the definition of a system ‘module’ is always an ad hoc matter. Modules are often chosen based on functional separation in IT18.

Part of the confusion in the colloquial use of ‘stateless’ is that ‘state’ itself refers to an implicit and specific scale for many authors, namely whatever

17The description of a virtual hierarchy of perimeter boundaries around resources leads to a kind of ‘Gauss law’ for promises made by process agents. Any promise of state expressible externally must come from interior process memory.

18I’ve argued that one should instead be guided by The Principle Of Separation Of Timescales if predictability and stability are the primary goal [5], [7], [32].


favoured object they happen to be working on, such as a programming class, a process container, a cluster, or a host computer. Software engineering does not teach practitioners to think across multiple scales. State may therefore refer to all scales, from interior microstates to aggregate macrostates, and refer to real or virtual space. In order to observe and measure state, it needs to persist relative to the process that samples it (i.e. for some finite number of samples or duration of proper process time). Different processes tick at different rates, and interactions often lead to waiting. The issues of observation were discussed in [5].

A notion of ‘total state’ may be accumulated over many interactions, either laterally across many redundant concurrent processes, or longitudinally over multiple similar interactions, such as in data collection and machine learning applications. Already it seems clear that we need to distinguish different kinds of state, and that the intended use of the data plays a part in what is objectionable about statefulness to some authors. If we think of sampling as an information channel, in the Shannon sense, then the separation of timescales amounts to partitioning process samples into different channels according to the timescales over which we assume that certain state we rely on will be invariant, i.e. constant with respect to multiple samples.
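The channel partitioning by timescale can be sketched numerically. The function below classifies sampled keys as ‘slow’ (effectively invariant over the observation window) or ‘fast’ (actively varying); the threshold, channel names, and function name are all illustrative assumptions, not anything defined in the text.

```python
# Sketch of separating samples into channels by timescale of change,
# using a simple threshold on observed change frequency (the
# threshold and channel names are illustrative assumptions).

def partition_by_timescale(samples, max_changes_for_slow=1):
    """Group samples per key; a key whose value changed at most
    `max_changes_for_slow` times over the window is treated as
    invariant ('slow'), otherwise as variable ('fast')."""
    history = {}
    for key, value in samples:
        history.setdefault(key, []).append(value)
    channels = {"slow": [], "fast": []}
    for key, values in history.items():
        changes = sum(1 for a, b in zip(values, values[1:]) if a != b)
        channel = "slow" if changes <= max_changes_for_slow else "fast"
        channels[channel].append(key)
    return channels

# Configuration is invariant over the window; load varies sample to sample.
samples = [("config", 1), ("config", 1), ("config", 1),
           ("load", 10), ("load", 30), ("load", 5)]
ch = partition_by_timescale(samples)
assert ch == {"slow": ["config"], "fast": ["load"]}
```

A dependency landing in the ‘slow’ channel is one we may treat as an invariant for the purposes of the promises made over that window.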

C. State localization at different scales

If we redefine the agent boundaries or partitions of a system, we can shift state formally from one location to another, but we can’t do so without altering the promises kept by the outer boundary. This assumes, naturally, that state is depended on for a purpose. Free state is irrelevant baggage.

We can try to summarize statelessness without referring to a particular case, like client-server or data pipeline, or even to a particular scale, while—at the same time—unifying semantics and dynamics for the process:

Definition 14 (Locally stateful): A locally stateful process is one in which memory is kept on the interior of a process agent or superagent cluster. This memory is promised for as long as the process agent’s dependent exterior promises persist, and access to process memory occurs over interior channels.

Definition 15 (Non-locally stateful): A non-locally stateful agent is a composite agent, in which any persistent process memory accumulated over the history of interactions is partitioned and kept independently of the agent mediating an exterior conditional promise (see figure 4). The mediating agent is then merely a conduit for state that persists in an agent belonging to a ‘backing’ process partition. The loss of the mediating agent does not incur a loss of partitioned process memory for the collaboration.

This deliberate indirection—pushing state out of one agent and into a dependency—seems implicitly to reference shared resources and risk mitigation, not whether state is promised or used19. So reference to

19The intent of the Twelve-Factor App manifesto seems principally to address risk and local contention.

Fig. 4: A partitioning of a process ‘entity’ or superagent. Within its semantic boundary there are stateful parts and stateless parts. Should we argue these as separate or integrated? The exterior promises are conditional on the interior state dependency, but the mediating agents are stateless (memoryless). This approach was used in Kubernetes, for example, where container services were initially assumed weakly stateless, with possible database services partitioned into separate containers and storage services. Later, this was rationalized to weaken the claim of statelessness.

state is a red herring for intended purpose, and it conceals assumptions about the timescales and number of times over which the state will be used before it changes. Such matters are critical and therefore the assumptions are unacceptable.

Fig. 5: Conditionality is causality. The states that are causally implicated in a conditional promise include initial state, including code and prior configuration; then there is runtime state and short-lived intermediate computational state, which may be reconstructible from code and inputs. All these combine to an output that may be strongly or weakly dependent on the full set: how many intermediate states are involved in computing a virtual transaction? Is the scope of that state private or shared between distinct exchanges? These are all questions of process scale.

The key question about state, then, is not whether it is retained, but rather whether or not it is used as a dependency in the keeping of a larger promise. If the loss or latency of such state gets in the way of a larger dependent promise being kept, then one would be better served by a collaborative architecture in which that risk may be mitigated. The term ‘shared nothing architecture’ [31] is more accurate than ‘stateless’ to address this. It implies a form of sharding or partitioning of agency in a system: possibly at either the client side or the server side. Both ends can end


up having to deal with inconsistent promises20.

Whether we keep state in primary RAM rather than in secondary disk storage, or even tertiary services like databases, is not the issue. The issue is how we depend on it, i.e. what happens if it’s lost. What sources do we trust to keep stable promises?

Unfortunately, a ‘shared nothing’ partitioning of dependencies to localize causal interference has its own problems. A set of services (e.g. for web, database, and storage) is already made up of separate processes, even if they run on the same host, or with the same common storage. Just how much separation is ‘shared nothing’? If they all share a common purpose, then they must be connected by something. Should we wrap them in layers of virtualization (containers, virtual machines, etc), or run them on different hosts, in different racks, in different datacentres? By handing off state to another agent that serves it up as a backing service, we only introduce a new shared dependency.

VI. STATEFUL (MEMORY) PROCESSES

We can now put these key elements together to understand causal dependence, or chains and transition matrices across networks of agents. What’s key about a process is what invariant information we have to constrain its trajectory. Initial and final conditions are the available external fixed points. The process rules (promises) may also implicitly contain fixed point behaviour, such that the process converges to a ‘desired state’.

When processes depend on one another, they observe one another’s states. The amount of memory they have, internally, defines the extent of their dependence on their own causal past or future, both as memory for storing their program and for representing decisions as a ‘log’ or ‘journal’ of prior states.

A. Short memory processes: linearity

A perspective which addresses increasingly popular ideas about complexity and chaotic behaviours is process linearity. Linearity is related to weak coupling, and addresses the relative scales of process interactions (see the earlier paper on observability [5]). Non-linearity is associated with memory behaviour—behaviours in which past interactions change the system so that new interactions experience a modified system. This is learning behaviour. A non-linear process cannot simply be replaced or restarted without access to a complete history of interactions, synchronized for all times, because its outcomes depend on that unique memory of interactions. Non-linear agents cannot be redundant, as their unique histories distinguish them.

A system may be called linear if it comprises conditional promises that are Markov processes of

20A huge amount of discussion centres on data consensus sharing protocols for server (+) promises, but almost nothing is written about the responsibility of the receivers (-) who ultimately shoulder the burden of dealing with inconsistency. For an industrial example, see [33].

order no greater than 1 (Markov processes are described in the appendix). In other words, if the process is independent of past inputs on a scale that goes back n > 1 samples into the past (where the ‘past’ is defined to mean a chain of prior samples). This matters when the delivery of data could be carried out in a transactional way, but the promised methods that receive and process the data are changing concurrently as part of an independent process. Dependency graphs may span multiple processes implicitly. They might be quite invisible in program code.
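The contrast between an order-1 (memoryless) process and a history-dependent one can be sketched in a few lines. The names below are illustrative: a ‘linear’ process in the text's sense computes its next output from the current input alone, while a process with memory lets past interactions leak into every subsequent outcome.

```python
# Sketch contrasting an order-1 (memoryless) process with a
# history-dependent one (names are illustrative assumptions).

def memoryless_step(x):
    """Order-1 Markov: the output is a fixed function of the current
    input alone, so replaying any input gives the same result
    regardless of history."""
    return x * 2

class HistoryProcess:
    """Non-linear in the text's sense: past inputs modify the system,
    so repeated identical inputs yield different outputs."""
    def __init__(self):
        self.seen = 0          # accumulated interaction memory
    def step(self, x):
        self.seen += 1         # the system changes with each interaction
        return x * 2 + self.seen

# The memoryless process is trivially repeatable:
assert memoryless_step(3) == memoryless_step(3) == 6

# The memory process gives different answers for the same input:
h = HistoryProcess()
first, second = h.step(3), h.step(3)
assert first != second         # history leaks into the outcome
```

This is why a `HistoryProcess`-like agent cannot simply be replaced or restarted: a fresh instance does not reproduce the behaviour of one with an accumulated history.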

Consider a simple interaction of the ‘client-server’ variety, in which an agent C (in the role of client) promises or imposes a request c onto an agent S (in the role of server), which is accepted by S, i.e.

$$C \xrightarrow{+c} S \qquad (28)$$
$$S \xrightarrow{-c} C. \qquad (29)$$

The absorption of c by S implies that a state has changed in S, for some timescale that persists sufficiently long to enable a response r to be returned. Let’s say that the response is a simple storage lookup, like a database record or a web page. This acts as a key-value pair, where the key is c and the value is r(c), which depends on c:

$$S \xrightarrow{+r(c)\,|\,c} C. \qquad (30)$$

In order for S to make this conditional promise, it has to contain the state variable V = r(c) on its interior. The state variable is persistent, so S is clearly part of a system that promises state. Now, it might ‘outsource’ this capability to another agent (a backing service, in the vocabulary of [1]). Then we have an assisted promise [8]. Suppose the assisting or backing agent is D; then S hands off responsibility for state to a subordinate agent, and must therefore make an assisted promise that depends both on the client request c and the promise of state storage d:

$$S \xrightarrow{+r(c)\,|\,c,d} C, \qquad (31)$$
$$S \xrightarrow{-d} D, \qquad (32)$$
$$S \xrightarrow{-c} C, \qquad (33)$$

where S hands off the request to its subordinate:

$$S \xrightarrow{+c'(c)\,|\,c} D \qquad (35)$$
$$D \xrightarrow{-c'} S \qquad (36)$$
$$D \xrightarrow{+d(c')\,|\,c'} S \qquad (37)$$
$$S \xrightarrow{-d(c')} D \qquad (38)$$

As long as each dependence is a Markov process, forming a Markov chain, the dependency on c is linear.
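The assisted promise chain above can be sketched as a delegation of lookup responsibility. This is a minimal sketch under stated assumptions: the class names `Server` and `Backend`, and the transform `c.lower()` standing in for c′(c), are all illustrative, not part of the formalism.

```python
# Sketch of the assisted promise chain: the server S keeps its
# conditional promise +r(c)|c,d by delegating storage to a backing
# agent D. Names and the c -> c'(c) transform are illustrative.

class Backend:
    """Agent D: promises a value d(c') conditional on the forwarded
    request c'."""
    def __init__(self, records):
        self.records = records
    def lookup(self, c_prime):
        return self.records[c_prime]

class Server:
    """Agent S: accepts request c, hands off c'(c) to D, and returns
    r(c) assembled from D's answer, so r depends on both c and d."""
    def __init__(self, backend):
        self.backend = backend
    def request(self, c):
        c_prime = c.lower()               # S transforms c into c'(c)
        d = self.backend.lookup(c_prime)  # S relies on D's promise
        return {"key": c, "value": d}     # r(c), conditional on c and d

backend = Backend({"page1": "<html>hello</html>"})
server = Server(backend)
assert server.request("PAGE1") == {"key": "PAGE1",
                                   "value": "<html>hello</html>"}
```

Because each step is a pure function of the message it receives, every link in the chain is Markov of order 1, and the dependency on c stays linear in the sense defined next.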

Definition 16 (Linear conditional promise): A conditional promise π is linear with respect to a dependency d iff

$$\pi : S \xrightarrow{+V(d)\,|\,d} R \qquad (39)$$

implies that $\partial V/\partial d = \text{const}$ over the life of π (see appendix).

15

Page 16: Locality, Statefulness, and Causality in Distributed ... · distributed deployments, e.g. in the Internet of Things and AI applications, the effects of instability need a more careful

Linearity literally implies that a functional dependence on d is linear (of polynomial order 1), and does not alter the functional form of the promise V(d). The dependency d does not alter the promise, except to act as a lookup key. If we were to repeat the keeping of the promise over some timescale, i.e. over some chain of promise keeping assessments, an observer would not assess there to be any difference in the result of V(d), over a number of samples T. The promise is therefore invariant over a timescale T.

The qualification of a bounded interval T is important, because no system is truly invariant for all future history (see figure 1). Changes do occur to systems and their promises: new versions of software promises are made, for example. The real issue is whether one can redefine a process to ensure that invariants are fixed somehow before runtime execution starts, and all the way up to when it ends21.

Lemma 3 (Linear promises and weak coupling): The need to wait for state history increases the service time for a queue, increasing the ratio $\lambda_R/\lambda_S$.

We need to define clear timescales for the assertions (promises) we make. Slowly varying changes decouple from changes that occur on the timescale of the promise, because each sampling of a linear system is an independent variable, and a sequence that depends on multiple samples is independent of the sample if the sample has already been integrated (e.g. refactored) into the definition of π22.

B. Long memory processes

Long memory processes depend on the sequences of states that led to their current state: e.g. does it matter which route you used to enter the city? This is the typical domain of machine learning.

The memory required to keep this promise determines a minimum scale for the process. Long memory processes cannot be stateless, in any definition, but it may be possible to separate part of a long memory process and isolate certain subagents whose behaviour is memoryless.

C. Invariant definitions of stateless

Given the popular usage of the term ‘stateless’, it seems appropriate to accommodate the commonplace ideas with a clearer definition, so that we do least violence to present day intuitions. This leads to what I’ll call weak statelessness:

Definition 17 (Weakly stateless process): A memoryless process (Markov process of order 1) promises that its interior memory of past interactions is the empty set:

$$A \xrightarrow{+V(t)=\emptyset} \ast. \qquad (40)$$

21This is a more precise expression of what people mean by ‘immutable containers’.

22This is what happens, for example, when runtime incidents lead to iterated bug fixes in software and new promises are made that incorporate past states into the initial conditions of current promises (feedback loops). The process of Continuous Delivery renders such changes on the same timescale as runtime transactions, which makes sensitivity to change higher.

The definition is only weak because it doesn’t say much about what other behaviours the process may have. Implicitly, it suggests that the next outcome of the process can only depend on the inputs at each step. Inputs could easily include data from long-term exterior memory. The key point is that the promises that are purely local to the weakly stateless process are decoupled from, i.e. invariant for, all possible input-output transitions, as in (83).

Memory processes, or stateful processes, are those that are not weakly stateless.

Definition 18 (Stateful (memory) process): A process that promises:

$$A \xrightarrow{+V(t)\neq\emptyset} \ast. \qquad (41)$$

When these two kinds of process are composed, to form a superagent on a larger scale, the result is naturally stateful.

Lemma 4 (Stateful + stateless = stateful): An agent that promises to be both stateful and weakly stateless is stateful by composition.

The proof is trivial:

$$A \xrightarrow{+\emptyset} \ast, \quad A \xrightarrow{+V(t)} \ast \quad \equiv \quad A \xrightarrow{+V(t)} \ast \qquad (42)$$

If we want to be strict in the definition of statelessness (what we might call a purely ballistic process) then the agent responsible has to refuse all input.

Definition 19 (Strongly stateless process): A process that has no exterior (-) promises to accept input from any source during its lifetime. The agent’s promise is thus completely constant: it does not rely on the order or substance of any other information.
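Lemma 4 can be sketched directly from Definitions 17 and 18: in the composite, a single member with non-empty interior memory makes the whole superagent stateful. The `interior_memory` attribute and the predicate name below are illustrative assumptions.

```python
# Sketch of Lemma 4: composing a weakly stateless agent with a
# stateful one yields a stateful superagent (names are illustrative).

class StatelessAgent:
    interior_memory = ()       # V(t) = empty set, always

class StatefulAgent:
    def __init__(self):
        self.interior_memory = ["previous-request"]   # V(t) non-empty

def superagent_is_stateful(agents):
    """A composite agent promises non-empty interior memory if any
    member does, so statefulness dominates under composition."""
    return any(len(a.interior_memory) > 0 for a in agents)

assert not superagent_is_stateful([StatelessAgent()])
assert superagent_is_stateful([StatelessAgent(), StatefulAgent()])
```

The asymmetry is the point: statelessness is not preserved by aggregation, which is what makes it a scale-dependent property.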

Fig. 6: A client-server system with a backend can treat the backend as part of a service, or as a separate service. If the exterior promises remain the same, then these configurations are indistinguishable. We are always free to compose or decompose agents at scale n into agents at scale n − 1 or n + 1 by redrawing the boundaries around modules. This shifts a discussion about interior to exterior or vice versa, but cannot affect the outcome observed by an agent on a scale greater than the total system.

What we surmise is that basically all non-trivial processes must be stateful on some scale, because a promise of stateful behaviour overrides a promise of stateless behaviour on any scale.


D. Transactions on scale T

It’s usual to define transactions in terms of atomicity and consistency. Here we can define the concepts more simply using invariance of promises:

Definition 20 (Transaction at scale T): A transaction is the promise of an invariant sequence of messages $M_1, M_2, \ldots, M_T$, of length/number T, accepted by a process agent A, whose memory of the messages is also invariant over the sequence, and contains all the data needed to keep the conditional promise

$$A \xrightarrow{+X\,|\,M_1,M_2,\ldots,M_T} \ast \qquad (43)$$

In other words, the agent A doesn’t let go of the information from its cache until it is acknowledged by the receiver. Failures on a large enough scale can still wipe out all the information of the transaction, but this adds some assurance of invariance if the data survive the transaction.

With this definition, we do not presuppose any model or scale for the meaning of a transaction, as long as the transacting agent is invariant over the completion of its promised task and the data require no dependencies. The virtue of this definition is to make such transactions repeatable, as all the conditions of the transaction are self-contained, and thus invariant. Put another way, transactions turn messages into scalable autonomous (super)agents, without exterior dependencies beyond their promised scale T.

Lemma 5 (Transactions are repeatable): Any valid transaction at scale T leads to a repeatable process, given the same message and conditional promise.

Notice that the process is only memoryless if T = 1, i.e. we choose a particular scale, but the all-important invariance is scale independent. Also note that it’s important to distinguish between events and transactions, which many authors fail to do. The invariant properties of transactions are not shared by arbitrary messages, so favouring a transactional system is not the same as favouring a message or event driven system.
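A transaction at scale T, and the repeatability of Lemma 5, can be sketched as an agent that retains the whole message sequence until commit. The class design below is an illustrative assumption: the outcome is computed as a pure function of the cached, invariant sequence, so replaying the commit gives the same result.

```python
# Sketch of a transaction at scale T (Definition 20): the agent keeps
# the whole message sequence M1..MT in its cache until the result is
# produced, which makes the outcome repeatable (illustrative only).

class TransactionAgent:
    def __init__(self, T):
        self.T = T          # promised scale of the transaction
        self.cache = []     # invariant memory of accepted messages

    def accept(self, message):
        self.cache.append(message)

    def commit(self):
        """Keep the conditional promise +X|M1..MT only when the full
        invariant sequence is present; the cache is retained, so the
        same transaction can be replayed."""
        if len(self.cache) != self.T:
            raise RuntimeError("incomplete transaction")
        return tuple(self.cache)   # X is a pure function of the cache

a = TransactionAgent(T=3)
for m in ["M1", "M2", "M3"]:
    a.accept(m)

# Repeatable (Lemma 5): committing twice from the same cached state
# yields the same outcome.
assert a.commit() == a.commit() == ("M1", "M2", "M3")
```

With T = 1 this degenerates to a memoryless process, matching the remark above that the choice of T is a choice of scale, while invariance of the cached sequence is what does the real work.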

E. Scale dependence of state and causality

Under scaling transformations that aggregate processes by causal dependence, the foregoing discussion should make it clear that we can state quite strongly:

Theorem 2 (Statelessness is scale dependent): A process that is weakly stateless at scale n may be stateful when causal promises are composed or decomposed at scales n − 1 and n + 1.

The proof of this is elementary. Consider agents $A_1$ and $A_2$ that make promises that are stateless and stateful respectively, such that $A_1$ depends on the promise of $A_2$:

$$A_1 \xrightarrow{\text{stateless}} \ast \qquad (44)$$
$$A_2 \xrightarrow{\text{stateful}} \ast \qquad (45)$$
$$\{A_1, A_2\} \xrightarrow{\text{stateful}} \ast \qquad (46)$$

This theorem renders statements like ‘transactions cannot span entities’ [17] meaningless, as there is no plausible definition of an entity without a clear specification of scale.

Causality itself is about the transmission of prior state along the trajectory of each autonomous process, so causality must itself be scale dependent. Indeed, as we’ll see, influences may appear to be determined by states that are only reached in a process’s future. Time does not follow a simple imperative ballistic view of prior state. In the frame of the process itself (the proper time), acausal changes frequently take place, by advanced boundary information.

VII. CAUSALITY AND EVENT DRIVEN PROPAGATION

Several authors have commented on the importance of time relativity for understanding process execution [34]–[36]—already bringing insights from spacetime relativity, and ‘many worlds’ interpretations of Kripke and Everett [37], [38]. Time has been the domain of physics for centuries, and it would be a mistake not to pay attention to the full range of patterns developed there. To fully understand causality in distributed systems we need to expand the simplistic understanding of universal past, now, and future into a local view in which causal behaviour depends on all three in a scale dependent way.

A. Past, present, and now

Past, now, and future are concepts about the order of events relative to a process of observation. What an observer calls ‘now’ is the state expressed by its clock, i.e. a snapshot of its complete interior state (see interior time [5]). Obviously, this is not a scale invariant assertion—if we step back, or zoom in, the boundary between interior and exterior is altered23. Agents may be aggregated into superagents, which are the smallest grains on a larger scale.

The common view of causation is the retarded view:

Definition 21 (Retarded process): In a chain of dependent promises, a process depends on an invariant initial state or boundary condition. The final state of the agent does not play a role in determining the outcome of the process.

Example 9: In a process to build a tower, the balance of the project bank account starts with the invariant boundary condition of zero money. Its final state is a sum of transactions related to that initial state. The final outcome of the tower plays no role in determining the final amount in the bank account.

The contrary view, often used in radio engineering, is:

Definition 22 (Advanced process): (includes desired end-state, recursion, etc). In a chain of dependent promises, a process depends on an invariant final state or boundary condition. The initial state of the agent does not play a role in determining the outcome of the process.

Example 10: In the space race to the moon, the final invariant outcome of the process was to land

23This is why the concept of a microservice architecture versus monolith has no invariant definition either.


a person on the moon. The chain of transactionsleading to that point was not dependent on the initialconditions of the project.

In the latter case, the final state of an agent is implicitly or self-determined, and the promises work backwards to search for a path to reach it from the undefined initial state. This approach is used in transactional ‘rollback’, for instance. It’s also how a GPS navigation system works. A process to solve a Rubik’s cube is also anchored in the invariant future state (the desired end state of ordered colour [36]).

Example 11 (Proper time clocks): For example, the increments of time for a cloud process could be measured by ticks that represent the starting and stopping of container processes. Or we could count each function call as a tick of a clock, or each statement. This is not nit-picking: it matters to the issue of causality how we define the evolution of progress.

In a flowchart view of programming, which represents the most common imperative view of time, the future is thought of as a function of the past:

$T_{next} = f(T_{this})$.   (47)

Each prior statement leads inevitably to the next by an implicit jump instruction in the process counter. No statement changes the past, because everything advances at the same rate: the result of each statement is the essentially deterministic keeping of an exterior promise of its agent.
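The retarded, flowchart view of equation (47) can be sketched as a pure state-transition loop. This is an illustrative sketch only; the transition function `f` and the integer state are hypothetical stand-ins, not from the paper.

```python
# Retarded (imperative) evolution: each state is a function of the prior one.
# The state type and the transition function f are hypothetical illustrations.

def f(t: int) -> int:
    """One deterministic step: the implicit 'jump' to the next statement."""
    return t + 1

def run_retarded(t_initial: int, ticks: int) -> int:
    """Evolve forward from an invariant initial condition: T_next = f(T_this)."""
    t = t_initial
    for _ in range(ticks):
        t = f(t)  # the past fully determines the next state
    return t

print(run_retarded(0, 5))  # → 5
```

The final state here is entirely a consequence of the initial boundary condition, as in Definition 21.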

At a function call level, this is somewhat ambiguous, because a function call involves recursion, which poses the promise of the outcome before the execution that keeps the promise. I.e., in the assignment x := f(y), the right hand side assumes that f(y) exists, which involves stack frames to create a sideways dimension of ‘subtime’²⁴, whose incorporation into the process is acausal from the perspective of a programmer. Past and future get muddled by an assignment that behaves like an advanced boundary condition, while a subroutine advances with a locally retarded boundary condition of the function argument. The discrete scaling of a process into lumps, or subroutines, implies that time does not run in a simple fashion for any observer outside the system (see figure 7).

Functions that are non-deterministic may also employ data that are not accounted for by the promises of the agent [22]. Such systems are known to be irreversible, but can be made to behave consistently by using advanced (exterior future) boundary values.

Definition 23 (Interior feedback): Interior feedback at scale n is a causal sequence of messages whose channel runs counter to the direction of the system’s proper time at scale n + 1. It is unobservable from the exterior of the agent containing it. In other words, the process clock only ticks after iterations and interior subtime machinations have reached an outcome that can keep the agent’s exterior promise.

²⁴ I borrow this phrase from Paul Borrill, and elsewhere use the term ‘interior time’.

Fig. 7: Feedback on the functional scale, including iteration and recursion, leads to apparent causation in the reverse direction relative to large-scale exterior time (against the observed flow as seen by an exterior observer), but because this is unobservable, the promised outcome, measured by exterior clock time, always appears to flow in a constant direction from start to finish. This is what we interpret as from past to future. The direction of time on a large scale is from left to right, but inside the subroutines it may be counter to this monotonic progress.

Feedback may appear causal or acausal depending on the scale of the agent making that assessment. This only illustrates how the meaning of time is naturally complicated in distributed multiscale processes. It’s neither deep nor trivial.
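The interior feedback of Definition 23 can be sketched with a simple iterative computation: the caller sees one exterior tick (a single assignment), while the interior subtime loops until the exterior promise can be kept. The function name and tick counting here are hypothetical illustrations.

```python
# Hypothetical sketch: interior 'subtime' iterations hidden behind one exterior tick.

def interior_sqrt(y: float, tol: float = 1e-12) -> tuple[float, int]:
    """Newton iteration: interior feedback loops until the result can keep
    the exterior promise; returns the value and the count of interior ticks."""
    x, ticks = max(y, 1.0), 0
    while abs(x * x - y) > tol:
        x = 0.5 * (x + y / x)  # feedback: the estimate feeds back into itself
        ticks += 1
    return x, ticks

# Exterior view: one assignment, one tick of the exterior clock.
value, interior_ticks = interior_sqrt(2.0)
print(round(value, 6))  # → 1.414214 (interior_ticks varies with tolerance)
```

The interior loop runs ‘against’ exterior time in the sense that its iterations are invisible to any observer who only sees the assignment complete.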

Definition 24 (Exterior feedback): Exterior feedback is the same phenomenon seen from the perspective of the inside: a dependency from downstream of the process (the causal future) which is merged with a dependency from upstream (the causal past).

B. Facts, messages, and event horizons

Messages are the transport mechanism for program transitions between agents. Events are the observation of a state transition by any agent O. The preservation of an event, as an immutable fact, is not a priori guaranteed by any agent. It is a design choice (a promise) made by the receiver, whose default is non-immutability. An invariant promise by an agent needs interior memory to remember it, and, since all resources are finite, including memory, there must also be a cut-off lifetime for such facts to be remembered (an event horizon). As the scale of a dependent process grows, it encompasses an increasing amount of memory, which implies a growing power cost and increased interior time latency for data retrieval. Eventually, the time taken to recall prior facts must become much greater than the lifetime of the agent’s promise.

Example 12: This cost has been made clear in the early blockchains, where coherence or consistency of the chain (the transaction journal) is the causal promise.

The idea that the past informs the future is too simplistic for distributed processes. A model of computation is a model of causally ordered events, but the order of causality is actually undefined, because we are free to place certain information in the rules of propagation and other information in the boundary conditions, instead of all in one place.

Einstein taught us that causality is what an observer sees. The arrival of messages, leading to events, defines a perceived direction of time for each observer independently. It is always measured at the scale of whatever observer assesses it. What happens on the interior, including acausal feedback loops, is usually discounted (see figure 7). Interior process sequence numbers may be used to pay for determinism of causal ordering by coarse graining time, i.e. by paying for order preservation with a delay in interior time; this may not appear to delay the process on the exterior, but adds a cost in terms of unique distinguishability of messages to sender and receiver. Other information, like desired end states, fixed points and other ‘attractors’, may bring about convergence around states that only exist in the relative future, and a process only ends (in interior time or subtime) when that future state has been reached²⁵.

C. Repeatability and fixed points

The true goal of information systems as tools is to strive for repeatability or predictability. It should now be clear that this is about the larger goal of arranging invariance over process conditions, i.e. dependencies. The surrogates that often stand in place of this, such as statelessness and causal ordering, are themselves non-invariant characteristics and should therefore be avoided.

A common mistake is to try to assure invariance by acting ‘only once’ (the FCFS random walk approach to state, rather than the determined fixed point), for example in the delivery of a transaction. We might number transactions, like TCP sequence numbers, and tick them off a checklist as they are completed. This leads to a growing process memory (a stateful process). It can be replaced by a memoryless local process using advanced causation.
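The two strategies can be contrasted in a few lines. This sketch is illustrative and hypothetical: the class, the integer ‘state’, and the delta messages stand in for real transactions.

```python
# Hypothetical contrast between the two strategies described in the text.

# 1) 'Only once' (retarded): a growing memory of completed sequence numbers.
class OnceOnlyApplier:
    def __init__(self):
        self.seen: set[int] = set()  # grows without bound (a stateful process)

    def apply(self, seq: int, delta: int, state: int) -> int:
        if seq in self.seen:
            return state  # a duplicate delivery must be remembered to be skipped
        self.seen.add(seq)
        return state + delta

# 2) Convergent (advanced): no memory of deliveries; replays are harmless.
def converge(state: int, desired: int) -> int:
    return desired  # idempotent: any number of replays yields the same outcome

s = 0
once = OnceOnlyApplier()
for seq, delta in [(1, 5), (1, 5), (2, 3)]:  # message 1 is delivered twice
    s = once.apply(seq, delta, s)
print(s)                            # → 8, but only because seq 1 was remembered

print(converge(converge(0, 8), 8))  # → 8, with no delivery memory at all
```

The first variant must carry its checklist forever; the second gets the same outcome by anchoring on the desired end state.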

Advanced causation (treating the end state as a fixed point) has many uses, e.g. for desired state policy enforcement. Systems whose interior states are changing may not have homogeneous transitions across different replays, but a choice of a fixed attractor is equivalent to inline error correction.

Relying on things that happen only once is a non-invariant procedure (changing the sampling timeout can change yes into no). Messages may be repeated or lost, and isolation from interference is not a promise that can be kept easily (process isolation is often the first thing violated by intrusions and security exploits). If we seek a deeper level of safety, it makes sense to rely not on the keeping of promises that are fed as data, but on the characteristics that are more likely to be preserved, such as convergence to fixed points²⁶. The surest means to achieve repeatability is to maintain the promises on a timescale shorter than that at which they are sampled. This is the Nyquist sampling theorem in action.

Advanced propagation determines the outcome based on a desired state $x_D$:

$x_{end} = f(x_{any})$   (48)
$x_{end} = f(x_{end})$   (49)

²⁵ This is Turing’s halting problem.
²⁶ This explains the value of in-band configuration maintenance for security and safety in a live setting. Instead of relying on isolation, one hopes for isolation but validates it with a competing immunity process (‘trust but verify’).

We see that the final value is insensitive to the initial value, which is in strict opposition to the functional idea of the past forming immutable facts. The immutable fact lies in the definition of the function itself, which refers to an ‘inevitable’ future state.

The outcome is idempotent when it reaches its final state, not after a certain number of transactions ‘once only’ has been reached [10], [32]. The approach is what the immune system does, and was used famously in CFEngine [10], [26] and later configuration tools²⁷. It’s also the approach used in pull requests, and GPS locators. The processes are designed to favour a predetermined outcome. The outcome will only become an event in the agent’s future, and will only be observable as a future event by other agents that depend on it.

On the interior of a process, a fixed point of a chain satisfies conditional promises:

$A \xrightarrow{+X_p | X_i} A'$
$A \xrightarrow{+X_p | X_p} A'$.   (50)

The more familiar retarded process is a Markov chain, to some order, and has no deterministic end state unless the agents keep their promises perfectly, which is essentially impossible to promise.

D. Blocking and non-blocking promises

Conditional promises are ‘blocking’. They are marked as ‘kept’ only when a precondition has been met. Unconditional promises are non-blocking. In a sense, a promise of an advanced convergent state is a ‘blocking algorithm’: the process exits when the final state has been reached. In this case it is not waiting for input, as in blocking I/O, but rather for the keeping of an interior promise. It doesn’t refer to any particular message, because it acts as a quasi-invariant condition. As long as the condition is not met, nothing will proceed. If the process drifts in and out of compliance, due to other subtime processes, then blocking may add exterior latency.

It only makes sense to speak of non-blocking in a shared time environment, i.e. an agent that has interactions with more than one dependency. Since each agent has its own process clock, the agent that shares communications with these has to share its own clock with all its dependencies, violating the ‘shared nothing’ notion. Analogous to reaching consensus equilibrium, any agent with multiple dependencies does have to wait for all of them to keep their promises, else it cannot keep its own promise. To summarize: an agent with multiple dependencies need not wait for dependencies in any particular order (as they are symmetrical with respect to the current promise, at scale n), but it must wait for all of them in total, at scale n + 1 (see figure 8).

²⁷ This distinction and its scale dependence was the basis for the configuration management wars of the 2000s. It was argued that an initial state process was required along with complete congruence of steps (requiring total isolation at a high level) [12]. The converse was argued: by creating closed operations in which the outcome was assured at a low level, the dependence on exterior ordering could be relaxed (which corresponds to a non-blocking execution policy) [10], [26]. The latter is just a micro-encapsulation of the former. The process is the same on different scales, but the latter is ‘reactive’ in the sense of the Reactive Manifesto [39].

Fig. 8: Any active N : M relationship will result in partitioning of interior time at a hub. The granularity of that partitioning is an interior policy decision for the agent. Coroutines, threads, and serial blocking are three common variants of scheduling strategy for sharing the agent’s state machinery between the neighbouring agents.

I believe a more correct formulation of the Reactive Manifesto’s call for non-blocking processes [39] is to decompose an agent into subagents, i.e. partition them as non-ordered dependencies of larger scale time-ordered dependencies (see figure 8). In a single process, there must be a hub that receives the results of such concurrent partitions (‘threads’), which must block on a larger scale to aggregate the results. A process (super)agent cannot be non-blocking at its exterior scale of actions without breaking such a conditional promise. That would alter its causal behaviour. However, the promise to depend on m mutually independent dependencies should not imply an arbitrarily imposed order; independent concurrent processes need not be starved of a shared time resource because of the need to wait for a subset of them.
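The hub behaviour described above can be sketched with standard concurrency primitives: subagents run without any imposed mutual order at scale n, while the hub blocks for the whole set at scale n + 1. This is an illustrative sketch; the dependency names and the stand-in workload are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical sketch: subagents (threads) keep their promises concurrently;
# the hub imposes no order among them (scale n) but blocks for the whole
# set before keeping its own exterior promise (scale n+1).

def keep_promise(name: str, work: int) -> tuple[str, int]:
    total = sum(range(work))  # stand-in for an independent dependency's work
    return name, total

deps = {"d1": 10, "d2": 100, "d3": 1000}
results = {}
with ThreadPoolExecutor() as hub:
    futures = [hub.submit(keep_promise, n, w) for n, w in deps.items()]
    for fut in as_completed(futures):  # arrival order is not promised
        name, value = fut.result()
        results[name] = value

# Only now, with all dependencies kept, can the hub keep its exterior promise.
print(sorted(results))  # → ['d1', 'd2', 'd3']
```

No individual dependency is waited for in a fixed order, but the hub’s exterior tick only occurs after all of them, in total.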

Example 13: A program need not suspend parallel threads or co-routines while one of them is waiting for a result. Waiting is a serial property, not a parallel one. The use of ‘synchronous’ or ‘asynchronous’ about communication implicitly uses the clock of one process to measure the progress of another, which has no invariant meaning.

The reason for this nitpicking is that such language may mislead readers into thinking that i) waiting is never necessary, and ii) all processes can be made more responsive by parallelization. Not suspending parallel threads does not make a serially dependent thread asynchronous as measured according to its proper time. It may or may not be perceived that way according to some exterior clock time.

These points are certainly pedantic, but we need clarity as an antidote to ‘best practices’ that are merely advocated on the basis of unclear language, without explanation.

Convergence has no dependence on past state, as long as the final outcome (desired end state) is a system invariant over each extended epoch of the system. It is an effective counter-strategy to the risk of non-linear divergence. For state that gets reused on multiple occasions, it is a strategy for maintenance that builds intrinsic stability into systems [32].

State convergence is also the way we render alternatives indistinguishable (by forgetting incidental past), and therefore engineer greater stability through fault tolerance. The instinct to throw away the past may run counter to what many software developers are trained to do, i.e. to keep every distinct case separate in its own context, but it actually leads to greater certainty in the future. I’ll go out on a limb and predict that we need a greater focus on advanced causation for sustainable and scalable computing in the future.

E. Synchronous and asynchronous signals

Synchronous means literally simultaneous, at the same time, i.e. measured within the same clock tick interval. Yet, when we speak of synchronous or asynchronous communication, we are talking about serial processes which (by definition) do not all happen at the same moment. The only way to resolve this muddle is to define interior and exterior time over coarse-grained steps. This is effectively what we do in locking critical sections in computer code. Many substeps of interior time can add up to a single step of exterior time; the exterior promise waits for these to complete and be accepted before producing a tick.

In a distributed world of many clocks, synchrony is a meaningless aspiration [19]. Synchronous can only mean ‘observed in the same interval’, i.e. according to the same clock. Time intervals are not invariant (they are covariant, i.e. they change with scale and observer reference frame). Asynchronous implies that an agent may wait for an unspecified interval after receiving a signal before completing its dependent promise.

Both of these pertain to conditional promises:

$A \xrightarrow{Y} S$   (51)
$S \xrightarrow{-Y} A$   (52)
$S \xrightarrow{X|Y} R$   (53)
$R \xrightarrow{-X} S$   (54)

Since we can only measure time differences locally, at a single agent’s clock, a synchronous response would imply that the difference in S’s clock time between keeping promises (52) and (53) was minimized.

An asynchronously-kept conditional promise would imply an arbitrary delay between keeping promises (52) and (53). Thus synchronicity is a policy decision to set a scale for a ‘timeout’. The semantics of faults also need to be considered in these promises: a ‘fault’ may also be interpreted as a promise outcome in this loose description.
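The idea that synchronicity is a timeout policy, judged on the receiver’s own clock between promises (52) and (53), can be sketched with a queue and a thread. The server function, the delay, and the 0.5 s policy threshold are hypothetical illustrations.

```python
import queue
import threading
import time

# Hypothetical sketch: 'synchronous' vs 'asynchronous' is a timeout policy,
# judged on the receiver's own clock between promises (52) and (53).

def server(inbox: queue.Queue, outbox: queue.Queue, delay: float) -> None:
    y = inbox.get()          # keep promise (52): accept Y
    time.sleep(delay)        # interior subtime before the conditional X|Y
    outbox.put(f"X|{y}")     # keep promise (53)

inbox: queue.Queue = queue.Queue()
outbox: queue.Queue = queue.Queue()
threading.Thread(target=server, args=(inbox, outbox, 0.05), daemon=True).start()

inbox.put("Y")
try:
    # Policy decision: call the exchange 'synchronous' if S replies within 0.5 s.
    reply = outbox.get(timeout=0.5)
except queue.Empty:
    reply = None  # past the timeout, we deem the promise asynchronously kept
print(reply)  # → X|Y
```

Changing only the timeout constant changes the verdict from synchronous to asynchronous, with no change in the agents themselves.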


The definition is more complicated when there is a shared channel (figure 8):

$S_1 \xrightarrow{Y_1} R$   (56)
$S_2 \xrightarrow{Y_2} R$   (57)
$\ldots$   (58)
$S_m \xrightarrow{Y_m} R$   (59)
$R \xrightarrow{-Y} S_1, S_2, \ldots, S_m$   (60)
$R \xrightarrow{X|Y_1, Y_2, \ldots, Y_m} A$   (61)
$A \xrightarrow{-X} R$   (62)

Event driven systems may be synchronous or asynchronous. This is a timescale issue.

Lemma 6 (Synchronous or asynchronous events): The prerequisite for triggering a conditional promise is the sampling of all conditional events; no proper time interval for a response is implied by this ordering.

A promise of a minimum response time after collection of dependencies is limited by other (perhaps hidden) dependencies, e.g. CPU rate, memory speed, scheduling commitments. It can be promised explicitly by R, but has no absolute meaning for A, which samples according to its own clock.

Usually developers think about synchrony from their own perspective: their viewpoint is that of an exterior observer (like a monitoring system, not drawn in the figure), which has instantaneous knowledge of the states of the agents. This is essentially a bad habit we take for granted in everyday life, because we live in a relatively slow world in which signalling is very fast. The promise (51) occurred when the final result was ‘in the bank’ at R, according to its own clock. This clock is not usually distinguished from the clocks of the other agents, thus effectively assuming a single global Newtonian view of time. Alas, in order to observe those agents directly, the same promise relationships as in (61) are needed. Whether these are given synchronously or asynchronously is scale dependent: a matter of definition, not an observable fact, because it relies entirely on the definitions of the final observer after it has sampled arriving signals. This is indeterminate, because to know this promise would be to ignore the causal independence, or violate the autonomy, of the agents.

The implicit goal of the manifestos seems to be to render total systems as close to causally deterministic as possible, recreating the Newtonian view of a global past. This is possible, but only at the expense of a rate of total interior time that gets slower by at least $N^2$ for N interior agents, as we know from consensus systems.

VIII. FAULT PROPAGATION

Many developers believe that modularity prevents the propagation of faults (hence the interest in microservices and object classes). To isolate an agent from a dependency, one must withdraw all promises which use that dependency; abstaining might be an effective strategy against the spread of consequences, but it also invalidates the agent’s purpose. Locality may help to limit the propagation of faults, if causes are themselves modularized and downstream clients make appropriate promises to recover (see section VIII-A), but this is not a certainty. It depends on how we define the semantics of separation.

The term ‘fault domain’ is widely used to imply a kind of semi-permeable membrane that prevents faults from having consequences beyond a certain perimeter. Security perimeters (firewalls) are a common example. Such barriers may select only specific messages from a wider set, but they cannot prevent the propagation of influence unless there is independence of the promises made by those modules.

A modular system may, on the other hand, help to pinpoint the source of a fault, by attaching a name to a region, if the chain of causal outcomes leaves traces of the name in the states of agents as they propagate.

A. Downstream principle

Locality gives a surprisingly simple and consistent interpretation of responsibility for keeping promises [7]. The recipient of a promise carries the bulk of the burden of outcome. The so-called Downstream Principle, in which agents have responsibility for seeking alternatives when promised outcomes are not delivered by upstream sources, follows from the causal independence of agents, i.e. from agent autonomy.

In a chain of promises, dependencies are ‘upstream’ (the servers or sources of the flow of influence) and the beneficiaries or clients are ‘downstream’. The assurance of the final promise outcome follows a ‘downstream principle’: the agent farthest downstream has both access and opportunity to observe and correct (or absorb) faults, and hence the greatest causal responsibility for adapting to a promise not being kept. In other words, the greater the distance from the point of promise-making, the less causal responsibility an agent has in contributing to its outcome. Statelessness doesn’t play a large role in this principle; we only observe that the natural situation for state is either far upstream or far downstream (at the ends of the chain of dependency). This is tidy, for sure, so it helps developers, but it also enables efficient scaling and a regularity of promised patterns.

Fig. 9: Responsibility for success in a promise chain flows downstream. With feedback, upstream and downstream are scale-dependent concepts.

This is not a moral assessment; it is a purely pragmatic observation about cause and effect. However, it is interesting that it is in opposition to what is conventionally assumed about fault-tree hierarchies and root cause analysis, which point a finger of blame at the first choice of promise provider. The explanation for this apparent contradiction can be found in the bi-directionality of promise bindings required for propagation of influence. By the conditional promise law, a promise that is conditional on another promise being kept (either by the same agent or by a third party) is not a promise, unless the other promise is made by the same agent. This clarifies and documents diminished responsibility.

We can now attempt a limited but tenable definition of responsibility [7]:

Definition 25 (Causal responsibility): When an agent relies on a dependency promise in order to keep its own conditional promise, causal responsibility refers to the agent’s freedom to obtain a promised outcome by its own autonomous choice of interaction, especially in the presence of redundant alternatives.

In Promise Theory, we track provenance, or causation, with conditional promises, as chains of promises. Keeping each promise is the responsibility of the agent that makes the promise (the promiser). However, from the conditional promise law, an agent making a conditional promise has not made a complete promise at all unless it also promises to acquire the thing its promise is conditioned on. Thus the promise depends on the promises of other agents: a shared responsibility, which makes the promise more fragile, as it depends on the promises of both the first agent AND the second. Consider the scenario in figure 10.

Fig. 10: A conditional promise chain, showing service delivery based on a dependency.

$D \xrightarrow{+d} S$   (64)
$S \xrightarrow{-d} D$   (65)
$S \xrightarrow{+S(d)} R$   (66)
$R \xrightarrow{-d} S$   (67)

where

$S \xrightarrow{+S(d)} R \;\equiv\; \left\{\; S \xrightarrow{+S|d} R, \;\; S \xrightarrow{-d} R \;\right\}$   (68)

This system is fragile because the recipient has only a single choice; it has a single point of failure. The recipient could seek out redundant alternatives to provide the service S. Nothing can improve the situation for these agents from outside, since what happens beyond the horizon of the next agent in the chain of promise relationships is beyond the control of the recipient, and is thus beyond the limit of any possible responsibility; but there is the possibility to improve matters for the client by promising redundancy on a larger scale.

Fig. 11: A redundant conditional promise chain, showing service delivery based on a dependency.

Now consider the same scenario with redundancy along the chain (see figure 11):

$D_1 \xrightarrow{+d} S_1$   (69)
$D_2 \xrightarrow{+d} S_1$   (70)
$S_1 \xrightarrow{-d} \{D_1, D_2\}$   (71)
$S_1 \xrightarrow{+S|d} R$   (72)
$S_1 \xrightarrow{-d} R$   (73)
$S_2 \xrightarrow{+S} R$   (74)
$R \xrightarrow{-S(d)} S_1$   (75)
$R \xrightarrow{-S} S_2$   (76)
$R \xrightarrow{-d} \{S_1, S_2\}$   (77)

In this second scenario, both the server S1 and the recipient can choose from two providers of the promises they are trying to use. For the final recipient R, the fact that the promise from S1 has a dependency is irrelevant, as there is nothing it can do about that except to acquire a second provider, who may or may not have a dependency too. The only security the recipient R has is to have a choice of providers. No matter how hard the providers S1 and S2 try to keep their promises of service, unforeseen circumstances may prevent them from doing so. Indeed, R may itself be negligent in receiving their services.

This suggests that, while responsibility for keeping a promise lies with each source agent, only the final recipient can be considered responsible for securing a successful promise outcome. It’s up to the client to acquire alternative sources, so any state should commute across these parallel alternatives. This can be handled by avoiding state, by leaving state with the client, or by arranging for consensus about state [40], [41].
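The downstream principle can be sketched as a client exercising its causal responsibility by trying redundant providers in turn. This is a hypothetical illustration: the provider functions, the error type, and the request value all stand in for real promise bindings.

```python
# Hypothetical sketch of the downstream principle: the final recipient R
# secures the outcome by choosing among redundant providers S1, S2.

def s1(request: str) -> str:
    raise ConnectionError("upstream dependency d not kept")

def s2(request: str) -> str:
    return f"service({request})"

def recipient(request: str, providers) -> str:
    """R's causal responsibility: try autonomous alternatives in turn."""
    last_error = None
    for provide in providers:
        try:
            return provide(request)  # first provider to keep its promise wins
        except ConnectionError as err:
            last_error = err         # absorb the fault, try the next alternative
    raise RuntimeError("no provider kept its promise") from last_error

print(recipient("d", [s1, s2]))  # → service(d)
```

No upstream agent can do this on R’s behalf; only the recipient has both the access and the opportunity to absorb the fault.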

We now see the concision and utility of the promise theoretic view. The assisted promise law [8] states that if an interior promise π by an agent A is converted into an assisted promise π′, through an intermediate agent I, we can draw a new boundary around {A, I} and no memoryless process will be able to distinguish them, since the semantics of the promises are equal, and any difference in the dynamical response times of the promise keeping is not observable without reference measurements to calibrate with.

IX. APPLICATIONS

Given the length of the paper already, let’s only briefly review the application of the principles described in the paper to some key IT areas. Software systems are a superposition of several hierarchically dependent processes that may change independently:

• Code developments change.

• Version deployment changes.

• Operational runtime changes.

The goal of a developer or system architect is to list the invariant and variant characteristics of a system over its separable timescales.

A. Data pipelines in the extended cloud (IoT)

The extended cloud consists of datacentre services and the coming edge services, where data are collected, known as the Internet of Things (IoT). It comprises fast localized resources where users compete for shared services (Amazon, Google, Microsoft Azure, etc.), and slower delocalized resources which are naturally partitioned and where data originate (mobile phones, home computing, distributed sensor nets, etc.).

The challenges involve processing data where resources are available, avoiding the movement of large amounts of data unnecessarily:

• Invariant policy for aggregation when data come together at a hub.

• Invariant promises between independent stages of the pipeline.

• Invariant handling of message promises with non-deterministic arrival.

• More.

B. Microservices

The development architecture known as microservices has become popular for cloud native development [42], contrasted with the pejorative ‘monolith’. The name seems to refer to the separation of modules as formally independent processes that may be hosted by different cloud instances. The key innovation here is in sharding the development process (not just the code) to enable development agility and full lifecycle ownership of code²⁸. It should be clear that this is a non-invariant characterization of a system, which depends on fixing a particular scale. If we increase the boundary of a set of modules to incorporate all their interactions, we get a new monolith on a larger spatial scale, and on a longer timescale.

²⁸ Previously it was common for development and operations to be dealt with by separate teams with weak ties. Microservices embodies an implicit promise for both phases to be managed by the same team, or equivalently by different members with strong ties.

We can apply the principle of separation of timescales for approximate invariance of promises:

• All developer team processes tick on different clocks (version control submissions may be counted as ticks of the development clock, for instance). Every hub that integrates changes combines clocks into a policy-based outcome.

• Code is invariant when it refers to identical source code: changes to any part of the code, including all dependencies, as well as platform dependencies, count as a new version. A proper set of promises would include a packaging of all dependencies. Due to the separation of containers and platform, this may not be practical today. With future unikernel platforms, this would be realizable down to the level of the processor hardware.

• A code base can vary independently of the platform in which it runs. Similarly, the platform may vary independently of the codebase as a dependency.

• Latency between communicating processes applies both to runtime communication and to code changes, where dependencies are involved. If the promises made by one service change, downstream processes also have to adapt.

Modular separation of service instances reduces dependencies between interior details of teamwork, and refactors or renormalizes them into computer code dependencies. Key promises are moved from the exterior to the interior of agents. Whether promises concern the interior or exterior of a certain superagent seems not to matter to the outcome of the arbitrary boundary of the system, since the boundary choice is not an invariant. It might matter to the teams and the runtime efficiency, of course.

C. Manifesto promises

Some brief comments on some of the manifesto promises that developers are encouraged to keep:

• Elasticity is the promise to spawn new agents and feed them data at a rate determined by the width of the bottleneck.

• Responsiveness is a promise to continue in a ‘timely manner’, i.e. to minimize the response time for imposed requests. To keep the promise of a consistent response time assumes that the response is invariant under changes in the message size, etc. When agents collectively respond to messages, there may need to be coordination and scaling of the messages to inform every partial agent of an arrival (so-called domain events).


• Message based design does not make a system responsive, because the sending of a message is not a driver of the response; the sampling of the message is. Event driven systems may be synchronous or asynchronous; they are just conditional promises. What makes a system responsive is the policy (promise) to schedule a dependent agent ‘quickly’ on receiving a message. The challenge is to define whose clock gets to decide what ‘quickly’ means.

• If a promised output depends on receiving multiple messages, from the same or different sources (as is common in data pipelines), then the policy needs to include specifics about how messages arriving at possibly different rates will be combined and ordered (see the Koalja pipeline, for example [28]).

• Asynchronous messaging is a policy choice, which informally implies a buffer queue between agents, decoupling them at the source rather than imposing on them at the receiver end. A strictly synchronous system would have to drop many messages, because no two systems can be fully synchronous unless downstream processing bandwidth is always greater than upstream, at every stage (like a river delta).

• Failures are contained within each component, isolating components from each other and thereby ensuring that parts of the system can fail and recover without compromising the system as a whole. Recovery of each component is delegated to another (external) component, and high availability is ensured by replication where necessary. The client of a component is not burdened with handling its failures.

• Reactive Systems [39] can adapt to changes in the input rate by increasing or decreasing the resources allocated to service these inputs. This implies designs that try to eliminate contention points or central bottlenecks, resulting in the ability to shard or replicate components and distribute inputs among them. Reactive Systems support predictive, as well as reactive, scaling algorithms by providing relevant live performance measures.

• Reactive Systems rely on asynchronous message-passing to establish a boundary between components that ensures loose coupling, isolation, and location transparency. This boundary also provides the means to delegate failures as messages. Employing explicit message-passing enables load management, elasticity, and flow control by shaping and monitoring the message queues in the system and applying back-pressure when necessary. Location transparent messaging as a means of communication makes it possible for the management of failure to work with the same constructs and semantics across a cluster or within a single host.

• Redundancy enables a client, for whom a transaction fails, to promise a ‘retry’ from its side of a binding, with an alternative provider. The redundant choices only need to be ‘good enough alternatives’ to satisfy a client. In practice, they might be subtly different versions, a different model of vehicle (‘I’m sorry we don’t have your first choice today’). In reliable services, like TCP, users are guaranteed some kind of a response (but not a delivery time), and the stateful nature of the delivery mechanism (which routes are taken by packets) is not revealed to the end users. That doesn’t make TCP stateless; it just shifts the responsibility for keeping state into a shared responsibility, in which the client assumes its natural downstream role.

• Consistency. In the IT industry, the responsibility for promise outcomes is almost uniquely apportioned to the service provider—we require the promise of data consensus between alternative providers, which is impossible in general over an arbitrary finite interval of time. A safer strategy is to render clients insensitive to the variations amongst redundant alternatives, using the intrinsic stability of fixed points and idempotent operations (see section VII-C). One tries to make redundant agents indistinguishable from one another [5]. One way to do this is to remove their dependence on interior state. However, purely stateless behaviour is neither efficient nor desirable, because it doesn’t take care of adapting to user needs without multiplying every possible combination of choices as statically independent pathways through a system. This applies to the arguments about ‘immutability’ too. The cost of redefining state from a runtime condition to an initial condition may be high.
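As a minimal sketch of this idempotence strategy (the `Replica` class and `converge` operation here are hypothetical, not drawn from any cited system): repeated or redundant applications of a convergent operation leave every replica in the same final state, so a client cannot distinguish which provider served it.

```python
# Sketch: an idempotent, convergent 'set-to-desired-value' operation.
# Applying it once or many times, via any redundant replica, yields the
# same final state, making the replicas indistinguishable to a client.

class Replica:
    def __init__(self):
        self.state = {}

    def converge(self, key, desired):
        """Idempotent: drives state toward a fixed point, regardless of history."""
        self.state[key] = desired
        return self.state[key]

a, b = Replica(), Replica()

# Retries and redundant deliveries do not change the outcome:
for replica in (a, b, a, a, b):
    replica.converge("timeout", 30)

assert a.state == b.state == {"timeout": 30}
```

Because the operation promises a fixed point rather than an increment, retries against any ‘good enough’ alternative are harmless.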

X. SUMMARY

The reproducibility and functional stability of systems of agents depend on a few key principles, of which the separation of dynamical scales is the most important. In systems engineering, the engineer basically figures out how to distribute a collection of process promises using the criteria of state localization and longevity, alongside a graph of conditional causal influence. Popular discourse is imprecise in its terminology. This paper offers a set of concepts that are not wrong, which could be used as a reference model.

On the matter of statelessness, non-trivial processes can never be fully stateless, but independence of certain states over certain regions of a system can be strategically motivated. When we talk about statelessness, there is an implicit downstream observer in the picture—an agent that will receive an outcome, acting in the role of observer—perhaps a service

client. When a service is stateless, the client cannot distinguish outcomes based on its history of prior interactions (the client doesn’t make a mark or leave a dent in the server!). The same principle applies to client-server interactions like classic Web applications, and to data pipelines [28]. In either case one is definitely interested in being able to store data with integrity in a service: storage servers, filesystems, databases, and caches all make this promise explicitly. The goal is to enable stateful behaviour without loss of efficiency, continuity, or stability on the scale of a total application. By making use of agent autonomy, developers often try to isolate what are commonly referred to as ‘fault domains’.

From what I can tell, statelessness is something of a red herring in the story of reliability and scalability, as it’s not an invariant characterization; it refers to a preferred scale and viewpoint. The proper invariants in a process are the promises, including conditional switching rules, that link process agents into a system, and how interactions are localized during their keeping. Rhetoric aside, the actual goal of ‘statelessness’ seems not to be to abhor state, but rather to contain it inside transactional elements, whose outcome is not trusted until the promised outcome has been assessed as kept. Containment has nothing to do with fault localization; rather, it is a prerequisite for transactional changes to be preserved together for long enough to complete the transactions under invariant conditions.
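One way to read this containment idea is the familiar stage-then-commit pattern; the following is a sketch under that interpretation (the function name and file paths are illustrative), using an atomic rename as the ‘commit’ step so that uncommitted state never becomes visible:

```python
# Sketch of 'containment' of state inside a transactional element:
# the new state is staged privately and only promoted (committed) once
# the outcome has been assessed as kept (durably written). Illustrative only.
import os
import tempfile

def transactional_write(path, data: bytes):
    # Stage the change in a private, uncommitted location.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # assess: the data is durably staged
        os.replace(tmp, path)      # commit: atomic promotion of the outcome
    except BaseException:
        os.unlink(tmp)             # abort: discard the contained state
        raise

target = os.path.join(tempfile.gettempdir(), "config.txt")
transactional_write(target, b"retries=3\n")
```

Observers only ever see the state before or after the commit, never a partial intermediate.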

Localization of state (the scope of memory behaviour) concerns a tradeoff between decision flow and resources:

• Locally stateless: means ‘memoryless’, which may imply limited transactional invertibility, but long-lived stability and sustainability. We would be better served by noting which agents are memoryless in their transformations.

• Stateful: means ‘memory process’. Which changes depend not only on incoming data but on the current state of the agent? All processes are stateful on some scale. The stateful parts bring potential for fragility and possibly non-linear behaviour, and may be unsustainable unless there is voluntary obsolescence of history29.

• Fixed point behaviour: a memory process referring to an invariant future state, with intrinsic process stability, which does not require any runtime memory, a priori, only a maintenance process that counters state drift and converges, in the absence of complete isolation from external change [10], [29].
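A minimal sketch of the fixed-point idea (the desired-state dictionary and `maintain` operator are illustrative): maintenance repeatedly drives actual state toward an invariant desired state, needing no memory of past perturbations, and becomes idempotent once the fixed point is reached.

```python
# Sketch of fixed-point maintenance: a convergent operator is applied
# repeatedly to counter drift; no runtime memory of past perturbations
# is needed, only the invariant desired state. Names are illustrative.

DESIRED = {"service": "running", "port": 8080}

def maintain(actual: dict) -> dict:
    """Convergent repair: drive actual state to the promised fixed point."""
    repaired = dict(actual)
    repaired.update(DESIRED)   # counter any drift in the promised keys
    return repaired

state = {"service": "stopped", "port": 8080, "uptime": 12}  # drifted
state = maintain(state)
assert maintain(state) == state   # fixed point: further maintenance is a no-op
```

Note that keys outside the promise (here `uptime`) are left alone; the promise only constrains what it names.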

The strategy of engineering around fixed points remains largely unappreciated across software engineering and management [32]. The legacy of industrial commoditization is still with us, and our old-fashioned thinking favours the pattern of replacing defective

29While memory is crucial for some tasks, it shouldn’t be accumulated without good reason—that’s called a memory leak.

parts with fresh ‘clean’ parts (like changing the air filters). This process favours the builder, but may be a wasteful strategy for the system as a whole, especially when repair can be automated cheaply. Because state drift can’t be prevented in practice, we need either maintenance over some timescale, or voluntary obsolescence (apoptosis) of processes. There will need to be a greater focus on the benefits of advanced causation in order to scale sustainably in the future. Disposability of systems and their runtime state is linked to a transactional breakdown of process, which in turn is linked to a message strategy. There is nothing wrong with these approaches, but they may not be substantially better from all viewpoints—they favour a developer viewpoint rather than a client viewpoint, a system viewpoint, or a sustainability viewpoint.

The remaining freedoms in process design lie in the avoidance of serial contention (as in Amdahl’s law) and mutual coherence (distributed consistency), by partitioning activity into independent timelines. Locality of dynamics through service access points addresses the trade-off between space and time:

• Parallel or partitioned agents: implies shorter queueing, per agent or partition, at the expense of more agents over more space to configure, maintain, and power.

• Serialised monolithic agents: implies fewer agents, i.e. less space used, perhaps at the expense of longer response time due to contention in queues or collisions.

This is not the same as the decomposition of semantics in code within boundaries.
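The queueing-versus-space tradeoff can be given rough numbers with Amdahl’s law (the serial fractions and agent counts below are purely illustrative): the speedup from N parallel agents is capped by the fraction of work that still contends serially.

```python
# Back-of-envelope consequence of Amdahl's law: speedup from N parallel
# or partitioned agents is limited by the serial fraction s of the work
# that still contends on shared state. Numbers are illustrative.

def amdahl_speedup(serial_fraction: float, n_agents: int) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_agents)

# With 10% serial contention, adding agents quickly hits diminishing returns:
print(round(amdahl_speedup(0.10, 8), 2))     # → 4.71
print(round(amdahl_speedup(0.10, 1000), 2))  # → 9.91, near the 1/s = 10 cap
```

Adding agents buys space for time only until the serial contention dominates; beyond that, more agents just cost more to configure, maintain, and power.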

Finally, let me add a few words about memory processes, given the current focus on learning systems, such as so-called ‘Artificial Intelligence’, etc. Any learning system is a memory process, by design. This also applies to any system that keeps memory that may exert an influence over causally related outcomes. Reasoning processes are state machines, by any measure—attempting to describe them as purely ballistic transactional phenomena, by focusing on only a small part of their processes, only delays inevitable consequences.
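To make the contrast concrete, here is a toy sketch (all names illustrative) of a memoryless transformation versus a memory process whose responses drift with accumulated history:

```python
# Toy contrast between a memoryless (ballistic) agent and a memory
# process: the learner's answers depend on its accumulated history,
# so identical inputs need not yield identical outputs over time.

def memoryless(x: float) -> float:
    return 2.0 * x          # depends only on the current input

class Learner:
    def __init__(self):
        self.history = []   # accumulated state: a memory process

    def respond(self, x: float) -> float:
        self.history.append(x)
        # the output drifts with everything seen so far
        return x + sum(self.history)

l = Learner()
assert memoryless(3.0) == memoryless(3.0)       # reproducible
first, second = l.respond(3.0), l.respond(3.0)  # same input twice
print(first, second)   # → 6.0 9.0: history changes the outcome
```

Any observer who sees only a single transaction of the learner would naturally, but wrongly, describe it as stateless.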

Locality of promises, at a stated scale, may be the preferred way to describe behaviours relative to the cost and availability of collaborative resources. The unspoken assumption in a lot of cases is that ‘local interior resources’ are cheaper and faster than ‘non-local remote resources’, and that ‘remote resources’ are safer from failures by spreading the potential for failures over a wider area. This has all been turned upside down several times in the virtualized world, though. New technologies alter these basic assumptions frequently, so we need an approach based on the idea that the perturbations leading to faults are themselves phenomena with scales of their own.

Acknowledgment: I’m grateful to Kenny Bastani, Jonas Boner, Andrew Cowie, and Daniel North for commenting on drafts and offering references. Thanks also to Colin Breck and Tristran Slominski for a thorough reading.

APPENDIX

We can supply an invariant definition of memorylessness using stochastic processes. Statelessness refers to memorylessness in the sense of a Markov process: the dependence on current state is inevitable as long as there is input to a process, but dependence of behaviour on an accumulation of state over many iterations is what people usually mean by stateful behaviour.

Variables are embedded agents that keep simple promises to remember a value. A stateful process accumulates parametric data over a number of interior times t_0, t_1, t_2, \ldots, so that we could write each promise made by the process as a function of all those times:

\pi : A \xrightarrow{\;+V[d(t_0),d(t_1),\ldots] \,\mid\, d(t_0),d(t_1),\ldots\;} A' \;\simeq\; A \xrightarrow{\;+V[d(t)] \,\mid\, d(t)\;} A' \qquad (78)

which schematically means that

\frac{\partial \pi}{\partial d} \neq 0, \qquad (79)

\frac{\partial \pi}{\partial t} \neq 0. \qquad (80)

i.e. the promise made by the agent is not constant over the interactions it promises. The memory d(t) is accumulated from some initial time, making the promise evolve. This is called a memory process. The converse of a memory process is a memoryless process. A memoryless process may be constant in time, or it can depend on only the last known state, like a ballistic trajectory (imagine billiard balls whose change in behaviour is entirely determined by the last ball to strike them)30.

Memoryless processes are also called Markov processes (see figure 5). Their behaviour is ‘ballistic’ in the sense that the arrival of a prerequisite state effectively triggers the release of what is promised by each agent. A Markov process is usually described as a chain of agents that satisfy a condition about random processes, based on probabilities [43].

Definition 26 (Markov Chain): Let X_n be a discrete random variable, for non-negative integer n = 0, 1, 2, \ldots, taking values in \{x\}, such that

P(X_n = s \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}) = P(X_n = s \mid X_{n-1} = x_{n-1}) \qquad (81)

for all n \geq 1 and s \in \{x\}. A Markov process has a transition matrix or scattering matrix for the discrete set of states X_n:

T^{(n)}_{ij} = P(X_{n+1} = j \mid X_n = i) \qquad (82)

30This is an interesting example because in Newtonian mechanics, a collision may be memoryless, but the trajectory isn’t. The momentum of the balls is conserved and remembers the sum effect of prior collisions. This is one of many ways to illustrate how memory and state are scale dependent.

If this transition matrix is independent of n, i.e. T^{(n)}_{ij} = p_{ij} for all n, then the chain is said to be homogeneous or translationally invariant.
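A homogeneous chain in the sense of Definition 26 can be sketched as follows (the two-state transition matrix is an arbitrary example): the same matrix governs every step, and each step consults only the current state, never the trajectory that led to it.

```python
# Sketch of a homogeneous Markov chain per Definition 26: the transition
# matrix T[i][j] = P(X_{n+1}=j | X_n=i) is the same at every step n,
# and the next state depends only on the current one (memorylessness).
import random

T = [[0.9, 0.1],   # from state 0: stay with prob 0.9, move with 0.1
     [0.5, 0.5]]   # from state 1: equal chance of either

def step(i: int, rng: random.Random) -> int:
    return 0 if rng.random() < T[i][0] else 1

rng = random.Random(1)  # seeded for reproducibility
x = 0
trajectory = [x]
for _ in range(10):
    x = step(x, rng)    # depends only on x, not on the trajectory so far
    trajectory.append(x)
print(trajectory)
```

The accumulated `trajectory` list exists only for the observer; the process itself never reads it, which is exactly what makes it memoryless.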

Probabilities, in the usual sense, are globally defined, but we can replace them with assessments in Promise Theory, which are the local equivalent. Each observer agent O in a system may form its own assessment α_O(π) of the probability that a promise π will be kept, for any definition of probability. Then the above definitions apply for any local observer by the association:

O^{(n)}_{ij} = \alpha_O(X_{n+1} = j \mid X_n = i). \qquad (83)

In a Markov scattering process, each input leads to a unique output by a fixed rule. The scattering doesn’t depend on the order of the inputs nor their relative frequencies. The scattering matrix does not remember past inputs; everything depends on the last one. This makes the scattering agent autonomous or causally independent31.

Definition 27 (Causally independent): An agent is causally invariant under a promised influence x if it does not depend on the parameter x, so that

x \to x' \;\Longrightarrow\; V(x) = V(x'), \quad \forall x, x'. \qquad (84)

In a quasi-differential shorthand, we might also be tempted to write:

\frac{\partial V}{\partial x} = 0. \qquad (85)

Finally, we should note that this should not be taken to mean that V is differentiable, as no such mathematical property exists in the real world. We can construct state-space extended generalizations that include averaging, and so on, so I’ll ask the forbearance of readers and follow common practice, using this as a shorthand for the independence of V of some parameter x. For a popular discussion of the meaning of this, see [27].

REFERENCES

[1] A. Wiggins. The twelve-factor app. 12factor.net, 2017.
[2] J. Boner. Stateful service design considerations for the kubernetes stack. InfoQ, Architecture and Design, 2018.
[3] J. Boner. Serverless needs a bolder, stateful vision. The Newstack, Serverless, 2019.
[4] M. Burgess. Spacetimes with semantics (ii). http://arxiv.org/abs/1505.01716, 2015.
[5] M. Burgess. From observability to significance in distributed systems. arXiv:1907.05636 [cs.MA], 2019.
[6] M. Burgess and W. Louth. Preserving the significance of distributed observations. unpublished, 2019.
[7] M. Burgess. A Treatise on Systems: Volume 2: Intentional systems with faults, errors, and flaws. in progress, 2004-.
[8] J.A. Bergstra and M. Burgess. Promise Theory: Principles and Applications. χtAxis Press, 2014.
[9] H. Lewis and C. Papadimitriou. Elements of the Theory of Computation, Second edition. Prentice Hall, New York, 1997.

31This may also be used as a definition of the autonomy of the agents.

[10] M. Burgess. Configurable immunity model of evolving configuration management. Science of Computer Programming, 51:197, 2004.
[11] S. Traugott and J. Huddleston. Bootstrapping an infrastructure. Proceedings of the Twelfth Systems Administration Conference (LISA XII) (USENIX Association: Berkeley, CA), page 181, 1998.
[12] S. Traugott. Why order matters: Turing equivalence in automated systems administration. Proceedings of the Sixteenth Systems Administration Conference (LISA XVI) (USENIX Association: Berkeley, CA), page 99, 2002.
[13] L. Kanies. Isconf: Theory, practice, and beyond. Proceedings of the Seventeenth Systems Administration Conference (LISA XVII) (USENIX Association: Berkeley, CA), page 115, 2003.
[14] A. Couch, J. Hart, E.G. Idhaw, and D. Kallas. Seeking closure in an open world: A behavioural agent approach to configuration management. Proceedings of the Seventeenth Systems Administration Conference (LISA XVII) (USENIX Association: Berkeley, CA), page 129, 2003.
[15] M.I. Seltzer and C. Small. Self-monitoring and self-adapting operating systems. Proceedings of the Sixth Workshop on Hot Topics in Operating Systems, Cape Cod, Massachusetts, USA. IEEE Computer Society Press, 1997.
[16] T. Moors. A critical review of "end-to-end arguments in system design". In 2002 IEEE International Conference on Communications. Conference Proceedings. ICC 2002 (Cat. No.02CH37333), volume 2, pages 1214–1219, April 2002.
[17] P. Helland. Life beyond distributed transactions: an apostate's opinion. ACM Queue (Distributed Computing), 14(5), 2016 (2007).
[18] Brendan Burns. How kubernetes changes operations. ;login:, 40(5), 2015.
[19] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, July 1978.
[20] P. Helland. Immutability changes everything. ACM Queue (Databases), 13(9), 2016.
[21] P. Borrill, M. Burgess, A. Karp, and A. Kasuya. Spacetime-entangled networks (i): relativity and observability of stepwise consensus. arXiv:1807.08549 [cs.DC], 2018.
[22] M. Burgess and A. Couch. On system rollback and totalized fields: An algebraic approach to system change. J. Log. Algebr. Program., 80(8):427–443, 2011.
[23] C.E. Shannon and W. Weaver. The mathematical theory of communication. University of Illinois Press, Urbana, 1949.
[24] T.M. Cover and J.A. Thomas. Elements of Information Theory. (J. Wiley & Sons, New York), 1991.
[25] M. Burgess. Smart Spacetime. χtAxis Press, 2019.
[26] M. Burgess. A site configuration engine. Computing Systems (MIT Press: Cambridge MA), 8:309, 1995.
[27] M. Burgess. In Search of Certainty: the science of our information infrastructure. χtAxis Press, 2013.
[28] M. Burgess and E. Prangsma. Koalja: from data plumbing to smart workspaces in the extended cloud. arXiv:1907.01796 [cs.DC], 2019.
[29] M. Burgess. On the theory of system administration. Science of Computer Programming, 49:1, 2003.
[30] C. Hewitt, P. Bishop, and R. Steiger. A universal modular actor formalism for artificial intelligence. In Proc. of the 3rd IJCAI, pages 235–245, Stanford, MA, 1973.
[31] M. Stonebraker. The case for shared nothing architecture. Database Engineering, 9(1), 1986.
[32] M. Burgess. A Treatise on Systems: Volume 1: Analytical description of human-information networks. in progress, 2004-.
[33] C. Breck. From a time-series database to a key operational technology for the enterprise: Part ii, March 2018.
[34] R. Hickey. Are we there yet? Talk at QConn, 2009. http://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey.
[35] J. Boner. Life beyond the illusion of the present (talk at voxxeddays, zurich). 2016.
[36] K. Carter. Broken promises (talk at jfokus). 2018.
[37] S.A. Kripke. Semantical considerations in modal logic. Acta Philosophica Fennica, 16:83–94, 1963.
[38] H. Everett. Theory of the Universal Wavefunction. PhD thesis, Princeton, 1956.
[39] J. Boner, D. Farley, R. Kuhn, and M. Thompson. The reactive manifesto. https://www.reactivemanifesto.org/.
[40] Leslie Lamport. Paxos Made Simple. SIGACT News, 32(4):51–58, December 2001.
[41] Diego Ongaro and John Ousterhout. In search of an understandable consensus algorithm. In Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference, USENIX ATC'14, pages 305–320, Berkeley, CA, USA, 2014. USENIX Association.
[42] S. Newman. Building Microservices. O'Reilly, 2015.
[43] G.R. Grimmett and D.R. Stirzaker. Probability and random processes (3rd edition). Oxford Scientific Publications, Oxford, 2001.
