morpheus: safe and flexible dynamic updates for sdnsmwh/papers/morpheus-submitted.pdf · using...

Morpheus: Safe and Flexible Dynamic Updates for SDNs

Karla SaurUniversity of [email protected]

Joseph CollardUMass Amherst

[email protected]

Nate FosterCornell University

[email protected] Guha

UMass [email protected]

Laurent VanbeverETH Zurich

[email protected]

Michael HicksUniversity of [email protected]

ABSTRACTSDN controllers must be periodically modified to add fea-tures, improve performance, and fix bugs, but current tech-niques for implementing dynamic updates are inadequate.Simply halting old controllers and bringing up new onescan cause state to be lost, which often leads to incorrectbehavior—e.g., if the state represents hosts blacklisted by afirewall, then traffic that should be blocked may be allowedto pass through. Techniques based on record and replay canreconstruct state automatically, but they are expensive to de-ploy and can lead to incorrect behavior. Problematic scenar-ios are especially likely to arise in distributed controllers andwith semantics-altering updates.

This paper presents a new approach to implementing dy-namic controller updates based on explicit state transfer. In-stead of attempting to infer state changes automatically—anapproach that is expensive and fundamentally incomplete—our framework gives programmers effective tools for imple-menting correct updates that avoid major disruptions. Wedevelop primitives that enable programmers to directly (andeasily, in most cases) initialize the new controller’s state asa function of old state and we design protocols that ensureconsistent behavior during the transition. We also present aprototype implementation called Morpheus, and evaluate itseffectiveness on representative case studies.

1. INTRODUCTIONSoftware-defined networking (SDN) controllers are com-

plex software systems that must simultaneously imple-ment a range of interacting services such as topologydiscovery, routing, traffic monitoring, load balancing,authentication, access control, and many others. Likeany non-trivial system, SDN controllers must be peri-odically updated to add features, improve performance,and fix bugs. However, in many networks, downtimeis unacceptable, so updates must be deployed dynam-ically, while the network is in operation and in such away as to minimize disruptions.

In general, dynamic updates differ from static onesin that while modifying the program code they mustalso be concerned with the current execution state. In

SDN, this state can be divided into the internal statestored on controllers (e.g., in memory, file systems, ordatabases), and the external state stored on switches(e.g., in forwarding rules). A key challenge is thatupdated code may make different assumptions aboutstate—e.g., using different formats to represent internaldata structures or installing different rules on switches.This challenge is exacerbated in SDN, where state doesnot reside at a single location but is instead distributedacross multiple controllers and switches.

Existing approaches. Most SDN controllers today em-ploy one of two strategies for performing dynamic up-dates, distinguished by how they attempt to ensure cor-rect, post-update execution state.• In simple restart, the system halts the old controllerand begins executing a fresh copy of the new controller.In doing so, the internal state of the old controller isdiscarded (except for persistent state—e.g., stored ina database), under the assumption that any necessarystate can be reconstructed by the new controller afterthe restart. One simple strategy for recovering inter-nal state is to delete the forwarding rules installed onswitches so future network events are sent to the con-troller, which can use them to populate its state. Thisbehavior is available by default in open-source SDNplatforms such as POX [5] and Floodlight [2].• In record and replay, the system maintains a trace ofnetwork events received by the old controller. During anupdate, the system first replays the logged events to thenew controller to “warm up” its internal state and thenswaps in the new controller for the old one. By givingthe new controller access to the events that were used togenerate the internal and external state for the old con-troller, it is possible to avoid the issues that arise withless direct mechanisms for reconstructing state. Recordand replay has been used effectively in several previoussystems including HotSwap [30], OpenNF [11], and a re-cent system for managing middleboxes [28]. A relatedapproach is to attempt to reconstruct the state fromthe existing forwarding rules on the switches, ratherthan from a log. According to private discussions with

1

SDN operators, this approach is often adapted by proac-tive controllers that do not make frequent changes tonetwork state (e.g., destination-based forwarding alongshortest paths).

Unfortunately, neither approach constitutes a general-purpose solution to the dynamic update problem: theyoffer little control over how state is reconstructed andcan impose excessive performance penalties. Simplerestarts discard internal state, which can be expensiveor impossible to reproduce. In addition, there is noguarantee that the reconstructed state will be harmo-nious with the assumptions being made by end hosts—i.e, existing flows may be routed along different paths oreven to different destinations, breaking socket connec-tions in applications. Record and replay can reproducea harmonious state, but requires a complex logging sys-tem that can be expensive to run, and still providesno guarantees about correctness—e.g., in cases wherethe new controllers would have generated a different setof events than the old controller did. Reconstructingcontroller state from existing forwarding rules can belaborious and error prone, and is risky in the face of in-evitable switch failures. We illustrate these issues usingexamples in Section 2.

Our approach: Update by state transfer. The techniquesjust discussed indirectly update network state after anupdate. This paper proposes a more general and flexi-ble alternative: to properly support dynamic updates,operators should be able to directly update the internalstate of the controller, as a function of its current state.We call this idea dynamic update by state transfer.

To support this dynamic update technique, controllersmust offer three features: (1) they need a way of mak-ing their internal state available; (2) they need a way ofinitializing a new controller’s state, starting from (a pos-sibly transformed version of) the old controller’s state;and (3) they need a way to coordinate behavior acrosscomponents when updates happen, to make sure thatthe update process yields a consistent result. This basicapproach has been advocated in prior work on dynamicsoftware updates, which has shown that these require-ments are relatively easy to meet [13, 21].

Update by state transfer directly addresses the per-formance and correctness problems of prior approaches.There is no need to log events, and there is no need toprocess many events at the controller, whether by re-playing old events or by inducing the delivery of newones by wiping rules. Moreover, the operator has com-plete control over the post-update network state, en-suring that it is harmonious with the network—e.g.,preserving existing flows and policies. The main costsare that the network programmer must write a func-tion (we call it µ) to initialize the new controller state,given the old controller state, and the controller plat-

form must provide protocols for coordinating across dis-tributed nodes.

Fortunately, our experience (and that of dynamic soft-ware updates generally [13, 21]) is that for real-worldupdates this function is not difficult to write and canoften be partially automated. The changes to the con-troller platform needed to support state initializationand coordination between nodes adds complexity, butthey are one-time changes and are not difficult to im-plement. Moreover, the cost of coordinating controllernodes is likely to be reasonable, since the distributednodes that make up the controller are likely to be rela-tively small and either co-located on the same machineor connected through a fast, mostly reliable network.

Morpheus: A controller with update by state transfer.We have implemented our approach in Morpheus, a newdistributed controller platform based on Frenetic [10, 7,22], described in Section 3. Our design is based on adistributed architecture similar to the one used in indus-trial controllers such as Onix [17] and ONOS [8]. Weconsidered modifying a simpler open-source controllersuch as POX or Floodlight [5, 2], but decided to build anew distributed controller to provide evidence that up-date by state transfer will work in industrial settings.

Morpheus employs a NIB, basic controller replicas,and standard applications for computing and installingforwarding paths, each running as separate applications.Persistent internal state is stored in the NIB, which canbe accessed by any application. When an applicationstarts (or restarts, e.g., after a crash) it connects to theNIB to access the state it needs, and publishes updatesto the state while it runs. Applications coordinate ruledeployments to switches via controller replicas, whichuse NetKAT [7] to combine policies into a unified pol-icy, and can use consistent updates [22] to push rules toswitches.

Supporting update by state transfer requires only afew additions to Morpheus’s basic design, described inSection 4. The relevant state is already available formodification in the NIB, so we just need a means ofmodifying that state to work with the new versions.

We also need to coordinate the update across the af-fected applications. To see why this is important, con-sider a situation in which we have several routing ap-plication replicas, each responsible for a subset of theoverall collection of switches. Now suppose we wish todeploy a dynamic update that changes which paths for-ward traffic through the network. It is clear we must up-date all of the replicas in a coordinated manner, or elsesome of the replicas could implement old paths and oth-ers implement new paths, leading to anomalies includ-ing loops, black holes, etc. Morpheus’s simple coordina-tion protocol operates in three steps: (1) quiescence—the affected applications are signaled and paused before

2

the update begins; (2) installation—the µ function isregistered with the NIB for the purposes of transform-ing the state; and (3) restart—the updated applicationsare restarted, using µ to update the NIB state (in a co-ordinated way). After the state is updated, they sendupdated policies to the controller replicas which com-pose them and generate rules to install on switches.

Using Morpheus we have written several applications,and several versions of each, including a stateful fire-wall, topology discovery, routing, and load balancing.Through a series of experiments, described in Section 5,we demonstrate the advantages of update by state trans-fer, compared to simple restarts and record-and-replay.In essence, there is far less disruption, and no incorrectbehavior. We also find that the µ functions are rela-tively simple, and an investigation of changes to open-source controllers suggests that µ functions for realisticapplication evolutions would be simple as well.

Summary. This paper’s contributions are as follows:

• We study the problem of performing dynamic up-dates to SDN controllers and identify fundamentallimitations of current approaches.

• We propose a new, general-purpose solution to dy-namic update problem for SDNs—dynamic updateby state transfer. With this solution, the program-mer explicitly transforms old state to be used withthe new controller, and an accompanying protocolcoordinates the update across distributed nodes.

• We describe a prototype implementation of theseideas in the Morpheus system.

• We present several applications as case studies aswell as experiments showing that Morpheus imple-ments updates correctly and with far less disrup-tion than current approaches.

Next, we present the design and implementation of Mor-pheus (§2-4), our experimental evaluation (§5), and adiscussion of related work and conclusion (§6-7).

2. OVERVIEWThis section explains why existing approaches for han-

dling dynamic updates to SDN controllers are inade-quate in general, and provides detailed motivation forour approach based on state transfer.

2.1 Simple RestartAs an example, suppose the SDN controller imple-

ments a stateful firewall, as depicted in Figure 1. Thetopology consists of a single switch connected to trustedinternal hosts and untrusted external hosts. Initiallythe switch has no forwarding rules, so it diverts allpackets to the controller. When the controller receivesa packet from a trusted internal host, it records the

Switch

ConnectionsipSrc

10.0.1.1

ipDst legitimate

192.168.1.2 Y

10.0.1.1 192.168.1.3 Y

192.168.99.1 10.0.1.1 N

Internal Hosts External Hosts

Stateful Firewall

Figure 1: Example application: stateful firewall.

internal-external host pair in its (internal) state andpunches a hole in the firewall so the hosts can commu-nicate. Conversely, if the controller receives a packetfrom an external host first, it logs the connection at-tempt and drops the packet.

Now suppose the programmer wishes to update thefirewall so that if an external host tries to initiate morethan n connections, then it is blacklisted from all fu-ture communications. With simple restart, the old con-troller would be swapped out for a new controller thatcontains no record of connections initiated by internaland external hosts. In the absence of any other informa-tion about the state prior to the update, the controllerwould delete the rules installed on the switch to matchits own internal state, which is empty. This leads to acorrectness problem:1 If the external host of an activeconnection sends the first few packets after the rules arewiped, then those packets will be interpreted as unso-licited connection attempts. The host could easily beblacklisted even though it is merely participating in aconnection initiated previously by an internal host.

This problem stems from the fact that the old con-troller’s state is discarded by the simple restart. Inthis example, it could be avoided by storing key inter-nal state outside of the controller process’s memory—e.g., in a separate network information base (NIB), asis done in controllers such as Onix [17]—and indeed, wedo exactly this in our Morpheus controller. However, ingeneral, safe dynamic updates require more than exter-nalized state, as we discuss in Section 2.3—e.g., in thecase that the new version expects the state in a newformat and multiple controllers share this state.

2.2 Record and ReplayAt first glance, record and replay seems like it might

offer a fully automatic solution to dynamic controller

1It may also be disruptive: if unmatched traffic is sent tothe controller, then the new controller will essentially inducea DDoS attack against itself as a flood of packets stream in.

3

Switch

Active FlowsipSrc

10.0.1.1

tcpSrc replica

6782 1

10.0.1.1 8153 2

10.0.1.2 8728 3

External Hosts Server Replicas

Load Balancer

Figure 2: Example application: load balancer.

updates. The HotSwap (HS) system [30], a noteworthyexample of this approach, records a trace of the eventsreceived by the old controller and replays them to thenew controller to “warm up” its internal state beforeswapping it in. For the stateful firewall, HS would re-play the network events for each connection initiatedby an internal host and so would easily reconstruct theset of existing connections, avoiding the problems withthe simple restart approach. Moreover, because recordand replay works with events drawn from a standardAPI like OpenFlow, it is fully “black box”—the im-plementation details of the old and new controllers areimmaterial.

Unfortunately, record and replay has several limita-tions that prevent it from being a full solution to thedynamic update problem. One obvious issue is over-head: in general, unless the system has prior knowledgeof the new controller’s functionality (which it will not, ingeneral), the system will have to record (and replay) allrelevant events that contributed to the network’s cur-rent state. Doing this can be prohibitively expensive ina large, long-running network.

Another issue is that the recorded trace may notmake sense for the new controller, and therefore replay-ing it may result in an incorrect state, for the followingreasons. The new controller may, in general, behave dif-ferently than the old one—e.g., it may install differentforwarding rules on switches. As such, if the new con-troller had been used from the start, these rules mighthave caused a different set of network events to be gen-erated than those that were actually recorded. Suchevents could have been induced directly due to differentrules (e.g., because they handle fewer or more pack-ets compared to the old rules) or they might have beeninduced indirectly (e.g., because the new rules elicit dif-ferent responses from the hosts that are communicatingvia the network). Establishing that an update is cor-rect under these circumstances is extremely difficult in

general.To illustrate, consider the example of a server load

balancer, as depicted in Figure 2. The topology con-sists of a single switch with one port connected to anetwork of external hosts and another n ports connectedto back-end server replicas. Initially, the switch has norules, so all packets are diverted to the controller. Uponreceiving a new connection from an external host, thecontroller picks a server replica (e.g., uniformly at ran-dom) and installs rules that forward traffic in both di-rections between the host and the selected server. Thecontroller also records the selected server in its internalstate (e.g., so it can correctly repopulate the forwardingrules if the switch drops out and later reconnects).

Now suppose the programmer wishes to dynamicallydeploy a new version of the controller where the selec-tion function selects the least loaded server and alsoputs a cap c on the number of open connections toany given server, and refuses connections that wouldcause a servers to exceed that cap. During replay, thenew controller would receive a network event for eachexisting connection request. However, it would remapthose connections to the least loaded server instead ofthe server previously selected by the old controller. Ingeneral, the discrepancy between these two load balanc-ing strategies will break connection affinity—a differentserver replica may receive the ith packet in a flow andreset the connection.

Attempting to reconstruct the controller state fromquerying the switch state could also be problematic. Al-though the new controller would have the informationneeded to generate forwarding rules that preserve con-nection affinity, writing the controller to retrieve thisinformation is potentially laborious, error-prone workfor the programmer. And it may require modificationsto the code; e.g., if the new controller uses staticallyallocated data structures to keep track of active flows(something that is possible due to the cap c), it may beincorrect to exceed the cap. We would prefer a solutionthat is simpler and more systematic.

2.3 Solution: Update by state transferThis paper proposes a different approach to the SDN

dynamic update problem. Rather than attempting todevelop fully automated solutions that handle certainsimple cases but are more awkward or impossible in oth-ers, we propose a general-purpose solution that attacksthe fundamental issue: dynamically updating the state.The above approaches attempt to indirectly reconstructa reasonable state, but they lack sufficient precision andperformance to fully solve the problem.

Our approach, which we call update by state transfer,solves the dynamic update problem by giving the pro-grammer direct access to the running controller’s state,call it σ, along with a way enabling the new controller

4

with an existing state, call it σ′, such that the new statecan be constructed as a function, call it µ, of the oldstate so that σ′ = µ(σ). In addition, our approach re-quires a means to signal the controller that an updateis available so that it can quiesce prior to performingthe update. This mechanism ensures that σ is consis-tent (e.g., is not in the middle of being changed) beforeusing µ to compute σ′.

Consider the problematic examples presented thusfar. For both the firewall update and the load balanc-ing update, the state transfer approach is trivial andeffective: setting the µ function to a no-op (i.e., iden-tity function) grandfathers in existing connections andthe new semantics is applied to new connections. Pleas-antly, for the load-balancing update, any newly addedreplicas will receive all new connection requests untilthe load balances out.

Another feature of update by state transfer is that itpermits the developer to more easily address updatesthat are backward-incompatible, such as the load bal-ancer with a cap c discussed above. In these situations,the current network conditions may not constitute onesthat could ever be reached had the new controller beenstarted from scratch. With state transfer, the operatorcan either allow this situation temporarily by preserv-ing the existing state, with the new policy effectivelyenforced once the number goes below the cap. Or shecan choose to kill some connections, to immediately re-spect the cap. The choice is hers. By contrast, priorapproaches will have unpredictable effects: some con-nections may be reset while others may be inadvertentlygrandfathered in, unbeknownst to the controller.

In addition to its expressiveness benefits, update bystate transfer has benefits to performance: it adds nooverhead to normal operation (no logging), and far lessdisruption at update-time (only the time to quiesce thecontroller and update the state). The main cost is thatthe network service developer needs to write µ, whichwill not always be a no-op. For example, if we updateda routing algorithm from using link counts to usingcurrent bandwidth measurements, the controller statewould have to change to include this additional state.Fortunately, according to our experience (and that of asubstantial body of work in the related area of dynamicsoftware updating), µ tends to be relatively simple, andits construction can be at least partially automated.

3. MORPHEUS CONTROLLERTo provide a concrete setting for experimenting with

dynamic SDN updates, we have implemented a newdistributed controller called Morpheus, implemented inPython and based on the Frenetic libraries [10, 7, 22].Our design follows the basic structure used in a num-ber of industrial controllers including Onix [17] andONOS [8], but adds a number of features designed to

Platform

EventsConfigs

NIBfw_allowed

10.0.0.1:3456

10.0.0.2:80

Switch Switch Switch

RoutingTopologyFirewall

Platform

UpdateCoordinator

Install

Pause

Put

Get

Figure 3: Morpheus architecture.

facilitate staging and deployment of live updates. Thismeans that it should be easy to adapt update techniquesdeveloped in the context of Morpheus for use in othercontrollers as well. We should note that our aim is tosupport updates to the applications running on the con-troller, but not necessarily the controller itself. In thefuture, we plan to investigate extensions that will alsosupport updates to the controller infrastructure—e.g.,migrating to a new protocol for communicating withSDN switches.

3.1 ArchitectureMorpheus’s architecture is shown in Figure 3. The

controller is structured as a distributed system in whichnodes communicate via well-defined message-passing in-terfaces. Morpheus provides four types of nodes:

• platform nodes (platform), which are responsi-ble for managing low-level interactions with SDNswitches and interfacing with applications,

• a network information base (nib), which providespersistent storage for application state,

• an update coordinator (updc), which implementsdistributed protocols for staging and deploying up-dates, and

• application nodes (topology, routing, etc.), whichimplement specific kinds of functionality, such asdiscovering topology or computing shortest pathsthrough the topology.

Each node executes as a separate OS-level process, sup-porting concurrent execution and isolation. Processesalso make it easy to use existing OS tools to safelyspawn, execute, replicate, kill, and restart nodes.

3.2 ComponentsWe now describe Morpheus’s components in detail.

5

Platform. The most basic components are platformnodes, which implement basic controller functionality:accepting connections from switches, negotiating fea-tures, responding to keep-alive messages, installing for-warding rules, etc. The platform nodes implement asimple interface that provides commands for interactingwith switches:

• event() returns the next network event,

• update(pol) sets the network configuration to pol,specified using NetKAT [7],

• pkt out(sw,pkt,pt) injects packet pkt into thenetwork at sw and pt,

as well as commands for synchronizing with the updcduring dynamic updates:

• pause() temporarily stops propagating configura-tions to the network, and

• resume() resumes propagating configurations.

When multiple Morpheus applications are operating,the platform nodes make every network event avail-able to each application by default. If needed, filter-ing can be applied to prevent some applications fromseeing some network events. Likewise, the policies pro-vided by each application are combined into a singlenetwork-wide policy using NetKAT’s modular compo-sition operators [7]. For scalability and fault tolerance,Morpheus would typically use several platform nodesthat each manage a subset of the switches. These nodeswould communicate with each other to merge their sep-arate event streams into a single stream, and similarlyfor NetKAT policies. For simplicity, our current imple-mentation uses a single platform node to manage allof the switches in the network.

Network Information Base. Morpheus applications storepersistent state in the nib. The information in the nibis guaranteed to be preserved across application exe-cutions, thereby avoiding disruption if individual ap-plications stop and restart. The nib provides a simpleinterface to a NoSQL key-value store, which can be usedto store persistent state.2 Morpheus’s store is currentlybased on Redis [6] and although it currently uses a singlenode, Redis supports clustering for better scalability.

Data stored in the NIB is divided among concep-tual namespaces, organized according to the applica-tions that use it. For example, a firewall applicationmight store information in the nib in the fw allowed

namespace about which hosts are currently allowed. Anapplication may access data in multiple namespaces,

2Obviously, applications may also maintain their own in-memory state for efficiency reasons, but this state is lost onrestart.

where it might be the conceptual data owner for one,but a consumer of another. For example, our topol-ogy application discovers the structure of the networkby interacting with the platform nodes, and storesthe topology persistently in the topology namespace.

Redis does not support namespaces directly (someother NoSQL databases do) so we encode the names-pace as a prefix of the keys under which we store aapplication’s data values. Many Morpheus applicationsalso use Redis’ built-in publish-subscribe mechanism tohandle frequently changing data. For example, topol-ogy publishes a notification to a channel any of thekeys in the topology namespace changes, and routingsubscribes to this channel and updates its routing con-figuration appropriately when it receives a notificationthat the topology has changed.

Applications. Applications running on top of Morpheusfollow a common design pattern. Upon startup, theyconnect with the nib to retrieve any relevant persistentstate. The application then adds to, and retrieves from,the persistent store any other necessary data depend-ing on its function. For example, topology discoversand stores hosts, switches, edges, and additional infor-mation about the topology in the nib, and when rout-ing starts up it reads this information and then addsthe least-cost paths to each destination. During normaloperation, applications are reactive: they will processevents from the platform and from other applications(e.g., via the pub-sub mechanism). In response, theywill make changes to the NIB state and push out a newNetKAT program via the update function on the plat-form nodes, which will update in the switches.

Update Coordinator. Because Morpheus has a distrib-uted architecture, dynamic updates require coordina-tion between nodes. Morpheus uses an update coor-dinator (or updc) that manages interactions betweennodes during an update. We discuss these interactionsin detail in the next section.

4. DYNAMIC UPDATES WITH MORPHEUSMorpheus’s design supports dynamic updates by al-

lowing important state to persist in the NIB betweenversions while providing a way to transform that statewhen required by an update. To ensure consistent se-mantics, Morpheus’s updc node organizes updates tothe affected applications using a simple protocol. Thissection describes this protocol, and then describes someexample updates that we have performed.

4.1 Update protocolTo deploy an update, the operator provides updc

with the following update specification:

• New versions of the affected applications’ code

6

• A state transformation function µ that maps theexisting persistent state in affected namespaces intoa format suited to the new application versions.

As a convenience, the application programmer canwrite µ in a domain-specific language (DSL) we de-veloped for writing transformers over JSON values (in-spired by Kitsune’s xfgen language [13]), illustrated brieflyin Sections 4.2 and 4.3. This language’s programs arecompiled to Python code that takes an old JSON valueand produces an updated version of it.3 Alternatively,the user can write µ using standard Python code.

Given the update specification, updc then executesa distributed protocol that steps through four distinctphases: (i) application quiescence, (ii) code installationand state transformation, (iii) application restart, and(iv) controller resumption.

1. Quiesce the applications. updc begins by signalingthe applications designated for an update. The appli-cations complete any ongoing work and shut down, sig-naling updc they have done so. (A timeout is used toforcibly shut down applications that do not exit grace-fully.) At the same time, updc sends the list of applica-tions to the platform, which will temporarily suppressany rules updates made by those applications, whichcould be stale. Once all applications have exited, andthe platform has indicated it has begun blocking therules, Morpheus has reached quiescence.

2. Install the update in the nib. Next, updc installs theadministrator-provided µ functions at the nib. The nibverifies that these functions make sense, e.g., that if therequest is to update for namespace nodes from versionsv3->v4, then the current nib should contain namespacenodes at version v3. All transformations will be appliedlazily, as part of in step 4.

3. Restart the applications. Now updc begins the pro-cess of resuming operation. updc signals the new ver-sions of the affected applications to start up. These ap-plications connect to the nib, and the nib ensures thatthe applications’ requested version matches the versionjust installed in the nib. The applications then retrieverelevant state stored in the nib, and compute and pushthe new rules to the platform. The platform re-ceives and holds the new rulesets. It will push themonce it has received rules (or otherwise been signaled)from all of the updated applications, to ensure that therules were generated from consistent software versions.Once the platform has received rules from all updatedapplications, it will remove the old rules previously cre-ated by the updated applications and install the newrules on the switches.3While the programmer currently must write µ, automatedassistance is also possible [13, 21].

4. Resume operation. At this point, the update is fullyloaded and the applications proceed as normal. As theapplications access data in the nib, any installed µ func-tion is applied lazily. In particular, when an applicationqueries a particular key, if that key’s value has not yetbeen transformed, the transformer is invoked at thattime and the data is updated.

The rest of this section describes some example up-dates we have implemented in Morpheus for a statefulfirewall, and for topology and routing applications.

4.2 Update example: FirewallWe developed three different versions of a stateful

firewall, and defined updates between them.

• firewall↽ permits bidirectional flows betweeninternal and external hosts as long as the connec-tion is initiated from the inside. When the con-troller sees an outbound packet from internal hostS to external host H, it installs forwarding rulespermitting communication between the two.

• firewall acts like firewall↽ but only in-stalls the rules permitting bidirectional flows af-ter seeing returning traffic following an internalconnection request. (It might do this to preventattacks on the forwarding table originating from acompromised host within the network.)

• firewall� adds to firewall the ability totime out connections (and uninstall their forward-ing rules) after some period of inactivity betweenthe two hosts.

firewall↽ defines a namespace fw allowed thatkeeps track of connections initiated by trusted hosts,represented as JSON values:

{ "trusted_ip": "10.0.0.1",

"trusted_port": 3456,

"untrusted_ip": "10.0.0.2",

"untrusted_port": 80 }

Updating from firewall↽ to firewall requiresthe addition of a new namespace, called fw pending;the keys in this namespace track the internal hosts thathave sent a packet to an external host but have notheard back yet. Once the return packet is received,the host pair is moved to the fw allowed namespace.For this update, no transformer function is needed: allconnections established under the firewall↽ regimecan be allowed to persist, and new connections will gothrough the two-step process.4

4We could also imagine moving all currently approved con-nections to the pending list, but the resulting removal offorwarding rules would be unnecessarily disruptive.

7

Updating from firewall to firewall� requiresupdating the data in the fw pending and fw allowed

namespaces, by adding two fields to the JSON valuesthey map to, last count and time created, where theformer counts the number of packets exchanged betweenan internal and external host as of the time stored in thelatter. Every N seconds (for some N , like 3), the fire-wall application will query the nib to see if the packetcount has changed. If so, it stores the new count andtime. If not, it removes the (actual or pending) route.

In our DSL we can express the transformation fromfirewall to firewall� data for the fw allowed

namespace as follows:

for fw_allowed:* ns_v0->ns_v1 {

INIT ["last_count"] {$out = 0}

INIT ["time_created"] {$out = time.time()}

};

This states that for every key in the namespace, itscorresponding JSON value is updated from version ns v0

(corresponding to firewall) to ns v1 (correspond-ing to firewall�) by adding two JSON fields. Wecan safely initialize the last count field to 0 becausethis is a lower bound on the actual exchanged pack-ets, and we can initialize time created to the currenttime. Both values will be updated at the next timeout.In general, our DSL can express transformations thatinvolve adding, renaming, deleting field names, modi-fying any data stored in the fields, and also renamingthe keys themselves. The DSL is detailed in full in aseparate work [24] focusing on such updates.

The above code will be compiled to Python code thatis stored (as a string) in Redis and associated with thenew version. The existing data will be transformed asthe new version accesses it via the NIB accessor API.When the new version of the program retrieves con-nection information from the nib, the transformationwould add the two new fields to the existing JSON valueshown earlier in this section:

key : fw_allowed:10.0.0.1_3456_10.0.0.2_80

value: { "trusted_port": 3456,

"untrusted_port": 80,

"trusted_ip": "10.0.0.1",

"untrusted_ip": "10.0.0.2",

"last_count": 0,

"time_created": 1426167581.566535 }

4.3 Coordination: Routing and TopologyIn the above example, the firewall is storing its own

data in the nib with no intention of sharing it with anyother applications. As such, we could have killed theapplication, installed the update, and started the newversion. However, when multiple applications share thesame data and its format changes in a backward-in-compatible manner, then it’s critical that we employ

the update protocol described in Section 4.1, whichgracefully coordinates the updates to applications withshared data.

As an example coordinated update, recall from Sec-tion 3 that our routing and topology applicationsshare topology information stored in the nib. In its firstversion, topology merely stores information abouthosts, switches, and the links that connect them. Therouting application computes per-source/destinationroutes, assuming nothing about the capacity or usage oflinks. In the next version, topology regularly queriesthe switches for port statistics and stores the movingaverage of each link’s bitrate in the nib. This informa-tion is then used by routing when computing paths.The result should be better load balancing when multi-ple paths exist, between hosts.

Updating from the first to the second version in Mor-pheus requires adding a field to the JSON object foredges, to add the measured bitrate. The transformer µsimply initializes this field to 1, indicating the defaultvalue for traffic on the link as follows:

for edge:* ns_v0->ns_v1 {

INIT ["weight"] {$out = 1}

};

As such, the initial run of the routing algorithm willreproduce the existing routes because all initial valueswill be the same, ensuring stability. Subsequent rout-ing computations will account for and store the addedusage information and thus better balance the routes.

5. EXPERIMENTS AND EVALUATIONIn this section, we report the results of experiments

where we dynamically update several canonical SDNapplications: a load balancer, a firewall, and a rout-ing application. We implement three dynamic updatemechanisms: state transfer using Morpheus, simple restart,and record and replay. In all cases, state transfer isfast and disruption-free, whereas the other techniquescause a variety of problems, from network churn todropped connections. We ran all experiments usingMininet HiFi [12], on an Intel(R) Core(TM) i5-4250UCPU @ 1.30GHz with 8GB RAM. We report the aver-age of 10 trials.

5.1 FirewallFigure 4 illustrates a dynamic update to the firewall,

described in Section 4.2, from firewall↽ to fire-wall and then to firewall�. The figure showsthe result of simple restart (where all data is stored inmemory and lost on restart) and state transfer (wheredata is stored in the NIB). We do not depict recordand replay, which happens to perform as well as statetransfer for this example (as per Section 2.2).

For the experiment, we used a single switch withtwo ports (with 1 MBPS bandwidth) connected to two

8

Update to V1 Update to V2

500 k

1 M

10s 20sTime

Ban

dw

idth

Firewall: simple restart

Update to V1 Update to V2

500 k

1 M

10s 20sTime

Ban

dw

idth

Firewall: Morpheus

Figure 4: Firewall Update

hosts. One host is designated the client inside the fire-wall and the other is the server outside the firewall. Us-ing iperf, we establish a TCP connection from the clientto the server. The figure plots the bandwidth reportedby iperf over time. In both experiments, we update tofirewall after 10 seconds and firewall� after20 seconds.

Using simple restart, the figure shows that bandwidthdrops significantly during updates. This is unsurpris-ing, since a newly started firewall doesn’t remember ex-isting connections. Therefore, firewall and fire-wall� first block all packets from the server to theclient, until the client sends a packet, which restoresfirewall state. In contrast, Morpheus doesn’t drop anypackets because state is seamlessly transformed fromone version to the next.

5.2 Routing and TopologyFigure 5 shows the effect of updating routing and

topology applications (described in section 4.3), wherethe initial version uses shortest paths and the final ver-sion takes current usage into account. The experimentuses four switches connected in a diamond-shaped topol-ogy with a client and server on either end. Therefore,there are two paths of equal length through the net-work. The client establishes two iperf TCP connections

Update

Regenerate Routes

500 k

1 M

20s 40sTime

Ban

dw

idth

Host A

Host B

Routing and Topology: simple restart

Update

Regenerate Routes

500 k

1 M

20s 40sTime

Ban

dw

idth

Host A

Host B

Routing and Topology: Morpheus

Figure 5: Routing and Topology Discovery Update

to the server.Initially, both connections are routed along the same

path because the first version of topology and rout-ing pick the same shortest path. The links along thepath are 1MBPS, therefore each connection gets 500K-BPS by fair-sharing. After 20 seconds elapse, we up-date both applications: the new version of topologystores link-utilization information in the NIB and thenew version of routing using this information to bal-ance traffic across links. After the update, each connec-tion should be mapped to a unique path, thus increasinglink utilization and the bandwidth reported by iperf.

Using simple restart, both connections are disruptedfor 10 seconds, which is how long topology takesto learn the network topology. Until the topology islearned, routing can’t route traffic for either connec-tion. Morpheus is much less disruptive. Since the statetransfer function preserves topology information, thenew routing module maps each connection to a uniquepath. The connection that is not moved (Host B) suf-fers no disruption and gracefully jumps to use 1MBPSbandwidth. The connection that is moved (Host A) isbriefly disrupted as several switch tables are updated.Even this disruption could be avoided using a consistentupdate [22].

Table 1 breaks down the time to run the update pro-

9

startapps restart rout topo platformexit begins push push resume

0.00s 0.05s 0.11s 1.67s 1.68s 1.70s

Table 1: Update Quiescence Times for Topology andRouting (Median of 11 trials)

tocol for this update. It takes .05s for both topologyand routing to receive the signal to exit at their qui-escent points and shut down, and for the platform toalso receive the signal and pause. At .11s, both applica-tions restart, begin pulling from the nib, and begin per-forming computations. At 1.67s and 1.68s respectively,the routing and topology applications send theirnewly computed rules to the platform. The plat-form holds on to the rules until it ensures it has re-ceived the rules from both apps, and then platformpushes both sets of rules to the switches and unpauses.This entire process takes 1.70s, with most of the timetaken by simply restarting the application (as would berequired in the simple case anyway). In general, theamount of time to update multiple applications safelywill vary based on number of applications, the amountof state to restore, and the type computations to beperformed to generate the rules, but the overhead (com-pared to a restart) seems acceptable.

5.3 Load BalancerFigure 6 shows the effect of updating a load-balancer

that maps incoming connections to a set of server repli-cas. For this experiment, in addition to the simplerestart and Morpheus experiments, we also report thebehavior of record-and-replay which consists of record-ing the packet-in events and replaying them after restart.After 40 seconds, we bring an additional server onlineand update the application to also map connections tothis server. To avoid disconnecting clients, existing con-nections should not be moved after the update.

As shown in the figure, both simple restart and record-and-replay cause disconnections, whereas state transfercauses no disruptions, since the state is preserved. Asdiscussed in Section 2.2, replaying the recorded packet-ins will cause the three connections to be evenly dis-tributed across the three servers. Similarly, for the sim-ple restart, the connections will be evenly distributedwhen the clients attempt to reconnect. Therefore, oneconnection is erroneously mapped to the new servermid-stream, which terminates the connection.

5.4 Programmer EffortStarting from a Morpheus application/service, there

are two main additional tasks required to enable dy-namic update: writing code to quiesce an applicationprior to an update, and writing a µ transformer func-

tion to change the state. In this subsection we discussboth tasks, showing that both are straightforward.

Quiescence. The application developer must write codeto check for notifications from the nib that an updateis available, and if so to complete any outstanding tasksand gracefully exit. These tasks would include storingany additional state in the nib and/or notifying externalparties. For all of our examples, this work was quitesimple, amounting to 8 lines of code.

Transforming the state. Writing the function µ to trans-form the state was also straightforward. For firewall,as described in Section 4.2, we wrote 4 lines of DSLcode to initialize new fields to desired values so thatthe fields could be read with the correct data. Simi-larly for our applications topology and routing, asdescribed in Section 4.3, we wrote 3 lines of DSL codeto initialize the weight field to a default value. For theload balancer, no µ function was necessary, as nostate was transformed, only directly transferred to thenew version of the program.

We also looked at the application histories of othercontrollers to get a sense of how involved writing a µfunction might be for updates that occur “in the wild.”In particular, we looked at GitHub commits from 2012–2014 for OpenDaylight [4] and POX [5] applications.We examined applications such as a host tracker, atopology manager, a Dijkstra router, an L2 learningswitch, a NAT, and a MAC blocker. Several of the ap-plication changes consisted only of updates to the appli-cation logic, such as multiple changes to POX’s IP loadbalancer in 2013. For them, no µ would be necessary.We also found that many of the application changesinvolved adding state, or making small changes to ex-isting state. For example, an update to OpenDaylight’shost tracker on November 18, 2013 converted the rep-resentation of an InetAddress to a IHostId to allowfor more flexibility and to store some additional statesuch as the data layer address. To create this update,the administrator would write µ to initialize the datalayer address for all stored hosts, if known, or add somedummy value to indicate that the data layer address wasnot known. An update to POX’s host tracker on June2, 2013 added two booleans to the state to indicate ifthe host tracker should install flows or should suppressARP replies. To create this update, the administratorwould write µ to initialize these to True in the nib. Tosum up, while the size of µ scales with the size of thechange in state being made, in practice, we found thatthe effort to write µ is minimal.

6. RELATED WORKMorpheus represents the first general-purpose solu-

tion to the problem of dynamically updating SDN con-

10

Update0 M

5 M

10 M

20s 40sTime

Ban

dw

idth

Conn A

Conn B

Conn C

Load balancer: simple restart

Update0 M

5 M

10 M

20s 40sTime

Ban

dw

idth

Conn A

Conn B

Conn C

Load balancer: replay

Update0 M

5 M

10 M

20s 40sTime

Ban

dw

idth

Conn A

Conn B

Conn C

Load balancer: Morpheus

Figure 6: Load Balancer Results

trollers (and by extension, updating the networks theymanage). We argued this point extensively in Section 2,specifically comparing to alternative techniques involv-ing controller restarts and record and replay (exempli-fied by the HotSwap system [30]). In this section weprovide comparison to other work that provides somesolution to the dynamic update problem.

Graceful control-plane updates. Several previous workshave looked at the problem of updating control-planesoftware. In-Service Software Upgrades (ISSU) [1, 3]minimize control-plane downtime in high-end routersupon an OS upgrade by installing the new control soft-ware in parallel with the old one, on different blade andsynchronizing the state automatically. Other researchproposals go even further and allow other routers to re-spond correctly to topology changes that affect packetforwarding, while waiting for a peer to restart its con-trol plane [26, 27]. In general, most routing protocolshave mechanisms to rebuild their state when the controlsoftware (re)starts (cf. [19, 23]), e.g., by querying thestate of neighboring routers.

The key difference between these works and Mor-pheus is that Morpheus aims to support unanticipated,semantic changes to control-plane software, possibly ne-cessitating a change in state representation, whereasISSU and normal routing protocols cannot.5 In addi-tion, Morpheus is general-purpose (due to its focus onSDN), and not tied to a specific protocol design (e.g., arouting protocol).

Distributed Controllers. Distributed SDN controller ar-chitectures such as Onix [18], Hyperflow [29], ONOS [8]or Ravana [16] can create new controller instances andsynchronize state among them using a consistent store.Morpheus’s distributed design is inspired by the designof these controllers, which aim to provide scalability,fault-tolerance and reliability, and can support simple

5Cisco only supports ISSU between releases within a rolling18-month window [9]. Outside of this window, a hard-resetof the control-plane has to be done.

updates in which the shared state is unchanged betweenversions (and/or is backward compatible). However, tothe best of our knowledge these systems have not lookedclosely at the controller upgrade problem when (partsof) the control program itself must be upgraded in asemantics-changing manner, especially when the newcontroller may use different data structures and algo-rithms than the old one. Morpheus handles this sit-uation using the update protocol defined in Section 4,which quiesces the controller instances, initiates a trans-formation of the shared store’s data according to theprogrammer’s specification (if needed), and then startsthe new controller versions. We believe this same ap-proach could be applied to these distributed controllersas well.

Dynamic Software Upgrades. The approach we take inMorpheus is inspired by a line of work on dynamic soft-ware updating (DSU) [15, 20, 13, 21], which advocatesthe same basic approach: pause a program threads atquiescent points, transform and transfer state into thenew version of the program, and resume execution in theupdated version of a program. Most prior DSU workhas focused on updating a running process (“bring-ing the new code to the old (but transformed) data”)whereas for Morpheus the same effect is achieved bystarting a new process with the relevant state (“bring-ing the old (but transformed) data to the new code”).We used our KVolve [24] system for dynamically evolveRedis databases in our implementation of Morpheus.While prior DSU work has considered the problem up-dating network software generally [25] (including for“active” networks [14]), ours is the first to apply ageneral-purpose solution to (distributed) software-definednetwork controllers in particular.

7. CONCLUSIONSThis paper has proposed dynamic update by state

transfer as a general-purpose approach to dynamicallyupdate software-defined network controllers. The ap-proach works by providing direct access to the relevant

11

state in the running controller, and initializing the newcontroller’s state as function of the existing state. Thisapproach is in contrast to alternatives that attempt toautomatically reproduce the relevant state, but may notalways succeed. We implemented the approach as partof Morpheus, a new SDN controller whose design is in-spired by industrial-style controllers. Morpheus pro-vides means to specify transformations in a persistentstore, and employs an update coordination protocol tosafely deploy the transformation. Experiments withMorpheus show that dynamic update by state trans-fer is both natural and effective: it supports seamlessupdates to live networks at low overhead and little pro-grammer effort, while prior approaches would result indisruption, incorrect behavior, or both.

8. REFERENCES[1] Cisco IOS In Service Software Upgrade.

http://tinyurl.com/acjng7k.[2] Floodlight. http://floodlight.openflowhub.org/.[3] Juniper Networks. Unified ISSU Concepts.

http://tinyurl.com/9wbjzhy.[4] OpenDaylight. http://www.opendaylight.org.[5] Pox. http://www.noxrepo.org/pox/about-pox/.[6] Redis. http://redis.io/.[7] C. J. Anderson, N. Foster, A. Guha, J.-B. Jeannin,

D. Kozen, C. Schlesinger, and D. Walker. Netkat: Semanticfoundations for networks. In POPL, 2014.

[8] P. Berde, M. Gerola, J. Hart, Y. Higuchi, M. Kobayashi,T. Koide, B. Lantz, B. O’Connor, P. Radoslavov, W. Snow,and G. Parulkar. ONOS: Towards an open, distributedSDN OS. In HotSDN, pages 1–6, 2014.

[9] Cisco Systems. Cisco IOS In-Service Software Upgrade.http://www.cisco.com/c/dam/en/us/products/collateral/ios-nx-os-software/high-availability/prod_qas0900aecd8044c333.pdf.

[10] N. Foster, R. Harrison, M. J. Freedman, C. Monsanto,J. Rexford, A. Story, and D. Walker. Frenetic: A networkprogramming language. In ICFP, 2011.

[11] A. Gember-Jacobson, R. Viswanathan, C. Prakash,R. Grandl, J. Khalid, S. Das, and A. Akella. OpenNF:Enabling innovation in network function control. InSIGCOMM, 2014.

[12] N. Handigol, B. Heller, V. Jeyakumar, B. Lantz, andN. McKeown. Reproducible network experiments usingcontainer-based emulation. In CoNEXT, 2012.

[13] C. M. Hayden, K. Saur, E. K. Smith, M. Hicks, and J. S.Foster. Efficient, general-purpose dynamic softwareupdating for c. ACM Transactions on ProgrammingLanguages and Systems (TOPLAS), 36(4):13, Oct. 2014.

[14] M. Hicks and S. Nettles. Active networking means evolution(or enhanced extensibility required). In H. Yashuda, editor,International Working Conference on Active Networks(IWAN), volume 1942 of Lecture Notes in ComputerScience, pages 16–32. Springer-Verlag, October 2000.

[15] M. Hicks and S. M. Nettles. Dynamic software updating.ACM Transactions on Programming Languages andSystems (TOPLAS), 27(6):1049–1096, November 2005.

[16] N. Katta, H. Zhang, M. Freedman, and J. Rexford. Ravana:Controller fault-tolerance in software-defined networking.In SOSR, 2015.

[17] T. Koponen, M. Casado, N. Gude, J. Stribling,L. Poutievski, M. Zhu, R. Ramanathan, Y. Iwata, H. Inoue,T. Hama, et al. Onix: A distributed control platform forlarge-scale production networks. In OSDI, volume 10, pages1–6, 2010.

[18] T. Koponen, M. Casado, N. Gude, J. Stribling,L. Poutievski, M. Zhu, R. Ramanathan, Y. Iwata, H. Inoue,T. Hama, and S. Shenker. Onix: A distributed controlplatform for large-scale production networks. In OSDI.USENIX Association, Oct. 2010.

[19] J. Moy, P. Pillay-Esnault, and A. Lindem. Graceful OSPFRestart. RFC 3623, 2003.

[20] I. Neamtiu and M. Hicks. Safe and timely dynamic updatesfor multi-threaded programs. In PLDI, June 2009.

[21] L. Pina, L. Veiga, and M. Hicks. Rubah: DSU for Java on astock JVM. In OOPSLA, 2014.

[22] M. Reitblatt, N. Foster, J. Rexford, C. Schlesinger, andD. Walker. Abstractions for network update. InSIGCOMM, 2012.

[23] S. Sangli, E. Chen, R. Fernando, J. Scudder, andY. Rekhter. Graceful Restart Mechanism for BGP. RFC4724, Jan. 2007.

[24] K. Saur, T. Dumitras, and M. Hicks. Evolving nosqldatabases without downtime, Apr. 2015.

[25] M. E. Segal and O. Frieder. On-the-fly programmodification: Systems for dynamic updating. IEEESoftware, pages 53–65, March 1993.

[26] A. Shaikh, R. Dube, and A. Varma. Avoiding instabilityduring graceful shutdown of OSPF. In INFOCOM, 2002.

[27] A. Shaikh, R. Dube, and A. Varma. Avoiding instabilityduring graceful shutdown of multiple OSPF routers.IEEE/ACM Transactions on Networking, 14(3):532 –542,june 2006.

[28] J. Sherry, P. Gao, S. Basu, A. Panda, A. Krishnamurthy,C. Macciocco, M. Manesh, J. Martins, S. Ratnasamy,L. Rizzo, and S. Shenker. Rollback recovery formiddleboxes. In SIGCOMM, 2015.

[29] A. Tootoonchian and Y. Ganjali. Hyperflow: A distributedcontrol plane for openflow. In Proceedings of the 2010Internet Network Management Conference on Research onEnterprise Networking, INM/WREN’10, pages 3–3,Berkeley, CA, USA, 2010. USENIX Association.

[30] L. Vanbever, J. Reich, T. Benson, N. Foster, andJ. Rexford. Hotswap: Correct and efficient controllerupgrades for software-defined networks. In HotSDN, 2013.

12

morpheus: safe and flexible dynamic updates for sdnsmwh/papers/morpheus-submitted.pdf · using...

Documents