
Integrated Network Service Processing Using Programmable Network Devices

Christoph Schuba, Jason Goldschmidt, Kevin Kalajan, and Michael F. Speer

Sun Labs
16 Network Circle
Menlo Park, CA 94025

SMLI TR-2005-138    May 2005

Abstract:

Project NEon is the investigation of a paradigm shift away from special purpose network appliances to an integrated way to architect, operate, and manage data plane network services. We were interested in evaluating the benefits of data flow management and enforcement inside the data center edge.

Starting in the early 1990's, network service functions that had traditionally been performed within servers moved into special purpose network appliances (e.g., firewalls and load balancers). The main driving force behind this trend was the need to place these functions inline into the data plane and operate these services at line rates. This technical solution has served as a sound business model for network appliance vendors during the last decade. However, as this approach became more prevalent and network speeds increased, its performance and manageability limits became apparent, too.

The NEon architecture strictly divides the control plane, an instance of which is called the control plane policy manager, and the data plane, instances of which are called programmable rule enforcement devices. The control plane policy manager and programmable rule enforcement devices are separated through standard interfaces and protocols that are still being defined by standards bodies such as the Network Processor Forum (NPF) and the Internet Engineering Task Force (IETF). Our prototypes generated valuable lessons, including the validation of the standard APIs and the IETF ForCES (Forwarding and Control Element Separation) protocol under consideration.

email addresses: [email protected], [email protected], [email protected], [email protected]


© 2005 Sun Microsystems, Inc. All rights reserved. The SML Technical Report Series is published by Sun Microsystems Laboratories, of Sun Microsystems, Inc. Printed in U.S.A.

Unlimited copying without fee is permitted provided that the copies are not made nor distributed for direct commercial advantage, and credit to the source is given. Otherwise, no part of this work covered by copyright hereon may be reproduced in any form or by any means graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an information retrieval system, without the prior written permission of the copyright owner.

TRADEMARKS
Sun, Sun Microsystems, the Sun logo, Java, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.

All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. UNIX is a registered trademark in the United States and other countries, exclusively licensed through X/Open Company, Ltd.

For information regarding the SML Technical Report Series, contact Jeanie Treichel, Editor-in-Chief <[email protected]>. All technical reports are available online on our website, http://research.sun.com/techrep/.


Integrated Network Service Processing Using Programmable Network Devices

Christoph Schuba, Jason Goldschmidt, Kevin Kalajan, Michael F. Speer

Network Technology Office
Sun Microsystems, Inc.
4150 Network Circle
Santa Clara, CA 95054

Extended Abstract

Starting in the early 1990's, network service functions that had traditionally been performed within servers moved into special purpose network appliances (e.g., firewalls and load balancers). The main driving force behind this trend was the need to place these functions inline into the data plane and operate these services at line rates. This technical solution has served as a sound business model for network appliance vendors during the last decade.

However, as this approach became more prevalent and network speeds increased, its limits and disadvantages became apparent, too. For example, individual appliances have problems dealing with ever-increasing bandwidth requirements. Also, stringing appliances together to build network service architectures in data centers hit architectural scaling and performance limitations. Finally, such architectures became notoriously difficult to manage. By early 2004, appliances that combine a small set of related services as an integrated solution were starting to be offered in the marketplace. Project NEon is the first proponent of driving this approach to its logical extreme.

Project NEon is the investigation of a paradigm shift away from special purpose network appliances to an integrated way to architect, operate, and manage data plane network services. We were interested in evaluating the benefits of data flow management and enforcement inside the data center edge. We therefore investigated ways to architect data flow management over high bandwidth network connections feeding data centers, and focused our attention on handling data flows vs. individual packets. The project took on an iterative process of thought, analysis, prototyping, and experimental evaluation.


The NEon architecture strictly divides the control plane, an instance of which is called the control plane policy manager, and the data plane, instances of which are called programmable rule enforcement devices. The control plane policy manager and programmable rule enforcement devices are separated through standard interfaces and protocols that are still being defined by standards bodies such as the Network Processor Forum (NPF) and the Internet Engineering Task Force (IETF). Our prototypes validated the standard APIs and the IETF ForCES (Forwarding and Control Element Separation) protocol under consideration.

This standards-based separation enables the use of commercial-off-the-shelf (COTS) network processor (NPU) technology. The vision is that NPF standards become the default way to program these devices. NPF API calls are translated to vendor-specific APIs, thus offering application portability. We validated our claim using three very different hardware platforms: a network processor chip called PolicyEdge from a startup firm called FastChip, a modified content load balancer blade from Sun's blade server platform B1600, and the Sun Secure Application Switch – N2000 Series. In the NEon architecture, the network edge can take advantage of the COTS performance curve while trivially coping with programmable rule enforcement device hardware heterogeneity.

The control plane policy manager accepts as input network services policies such as a firewall and load balancer configuration, application feedback such as server load information, environmental feedback such as current physical link status, and administrative input. It aggregates these policies in real time and creates the optimal configuration for the programmable rule enforcement devices under its control. We named this semantic aggregation and optimization with the term rule crunching. It allows for the creation of a unified rule set which can be loaded into the programmable rule enforcement device. By having a unified rule set, classification and action processing only need to be performed once for any given flow, offering a per-packet latency reduction in the data path.

Given such a network edge programming platform, it becomes possible to automatically provision network services in response to application needs inside the network. The artificial division that exists today between the tasks of system administrators (on the one hand) and network administrators (on the other) is removed. The NEon architecture enables automatic and semantically consistent programming of the network edge as an integral part of service provisioning. Fine-grained feedback can now be automatically given to the network, to make decisions about where flows go, how they are shaped, etc. Such automatic feedback is obviously preferable over having the network make these types of decisions independent of the application or service, based on past or magically predicted future user behavior.

The remainder of this document is structured as follows. Section 1 introduces and motivates the technical problem we are addressing, followed by Section 2 that explains the NEon architecture. Section 3 dives into more detail and describes project NEon milestones and accomplishments in its two phases. Section 4 focuses on the lessons learned. We conclude with a summary of ongoing and future work.


1. Introduction

Project NEon is an evolutionary architecture to address some of the problems encountered in data center networking today. The NEon approach centralizes the control plane, leading to a more manageable and scalable infrastructure on which to integrate the data center networking services and support data center application needs.

Project NEon is motivated by five major trends taking place in data center networking:

• Trend 1: Evolution of Tier 0 network services

We use the term Tier 0 to refer to networking equipment that resides in data centers between the router and its first bank of web servers (this set of web servers is typically referred to as Tier 1). Examples of such devices (and/or functions) include: firewalls, load balancers, and reverse proxies. A more complete list can be found in Table 1 below.

Such devices began the migration from packets to flows, as they maintain a notion of session or flow (sometimes including associated per-session state), often in the context of multiple networking layers. For example, one load balancer may implement algorithms based just on TCP (layer 4) information, while another load balancer may base its load-balancing decisions on HTTP state (e.g., cookies) and thus is considered to be a layer 7 load balancer.

• Trend 2: Exploitation of benefits of decomposition

At first, many Tier 0 services were implemented as software processes on general purpose servers. These services have evolved into dedicated components, implemented with specialized software and/or custom hardware in various form factors (appliances or blades). This evolution underscores the increasing importance of Tier 0 network processing.

Often this decomposition happened in multiple phases. As an example, consider SSL acceleration: instead of spending expensive data center server cycles on SSL cryptographic operations, data center planners found it more cost-effective to offload this function to multiple, less expensive systems. The initial benefit was that server load was reduced, resulting in increased CPU resources. Once these functions were offloaded, they were isolated and optimized as specialized Tier 0 network services; hardware and management software (e.g., distributed certificate management) evolved. This architectural change has improved individual, general purpose server performance, resulting in overall improved data center efficiency.

This pattern of evolution is evident in most Tier 0 platforms, which has resulted in a broad proliferation in devices along with an associated efficiency and management dilemma, i.e., an unexpectedly high management cost for the intended efficiency benefits.


• Trend 3: Proliferation of Tier 0 functions/devices

An increasingly large portion of data center equipment is spent on Tier 0 functions. Table 1 below summarizes popular Tier 0 devices implemented within data centers. While the proliferation of such services can be taken as an indication of their importance, the increased number of individual systems in the data path makes the management of packet flows more difficult and impinges on the limits of scalability.

Authentication              Intrusion Detection       Content-Based Routing
Authorization               Layer 4 Load Balancing    Availability Monitoring
Logging                     Layer 7 Load Balancing    Bandwidth Management
Performance Monitoring      NAT                       Caching
Session State Management    NAPT                      Site Partitioning
Reverse Proxying            Content Filtering         Site Selection
SSL Proxying                Differentiation           Transcoding
Crypto Acceleration         Firewalling               Virus Protection
Policy/Transactional QoS    DOS Prevention            ...

Table 1 List of popular Tier 0 services/functions

• Trend 4: Scaling and manageability problems with existing architectures

The pipe/speed trade off
Data center managers have a range of choices for external network connectivity ranging from a small number of high-speed links to a larger number of low-speed links. As termination costs (TC) for individual links have not decreased at the same rate as the cost for their underlying physical links, the TC is representing an increasingly large cost. This change in economics indicates that fewer, high-speed links will tend to be more cost-effective than a larger number of low-speed links.

The ramification of these economies is that high-speed links are only useful if the corresponding Tier 0 devices and back-end Tier 1-3 systems can cope with data at the corresponding rates, where Tier 2-3 systems implement the application and database tiers. In today's data center architectures, there are limited opportunities and possibilities to deal with such changes in scale. Centralized flow management presents a solution to this and other problems of data center networking.

Prioritizing rules and services
The proliferation of Tier 0 devices/functions has created a service management challenge: prioritizing the order and importance of various elements. The greater the number of Tier 0 elements deployed, the more complex their prioritization can become. For example, should a denial of service (DoS) prevention element examine data traffic before a load balancer to avoid a DoS attack impacting the load balancer, or should a load balancer be placed before DoS prevention elements to ensure their (DoS prevention elements) scalability? This scenario is just one example of many more complicated issues that can arise as an increasingly large number of Tier 0 elements are deployed.

The traditional approach of discrete elements that are physically cabled in a specific order inhibits flexibility. It is rather costly, time consuming, and difficult to implement a change in flow processing order. Dynamically altering the order of flow processing, so-called soft-cabling, is, to the best of our knowledge, still under development. As network and business processing conditions change, dynamic alteration of element priority could provide new and valuable benefits to enterprise security, competitiveness, productivity, and total cost-of-ownership (TCO).

One good packet classification deserves another
The multitude of Tier 0 elements is likely to be supplied by multiple vendors. This expectation implies that packet classification and packet processing are performed by each device, effectively forcing a single flow to traverse processing stacks of the individual devices in a serial fashion. The worst-case scenario occurs when each packet must proceed all the way to layer 7 (and back down) on every single device, e.g., an anti-spam element followed by an anti-virus element. Such duplicated processing (i.e., of the packet classification step) is wasteful and should be avoided.

Even if Tier 0 elements deployed in one data center were from a single vendor, today's implementations still treat each element as a separate functional unit, requiring multiple packet classifications.

Alternatively, if packet classification can be centralized, and analysis of various parts of the flow were made available to the appropriate Tier 0 subsystems, overall packet processing latency would be reduced, a greater degree of efficiency could be observed, and improved TCO could be achieved.

Load balancing load balancers
As data flows and traffic are non-deterministic, data center managers cannot guarantee that a given set of Tier 0 devices can process incoming flows at required rates. The desire to meet traffic load demands requires distributing incoming traffic flows amongst multiple instances of each Tier 0 element. For example, a given firewall appliance may not be able to keep up with the rate of incoming flows at peak traffic periods. This scenario requires a load balancer to be placed in front of a set of firewalls that can then share and consequently accommodate the load.

Today, various Tier 0 device manufacturers are implementing their own load balancing systems (e.g., Checkpoint's ClusterXL). Implementing vendor-specific load balancing for each Tier 0 element plots the path to a wasteful long-term solution for a plethora of Tier 0 elements. And even if the same load balancer product were used for multiple Tier 0 element functions, the resulting system would be difficult to manage and costly to maintain, thereby inhibiting scalability and flexibility.

Multiple management consoles
Without centralized flow management, each Tier 0 element requires an individual management console with its associated user interface and replicated administrative functions (e.g., IP address, subnet, gateway, etc.). Assuming that a typical data center has a firewall, load balancer, virus scanner, anti-spam engine, and DOS prevention element, six different administrative consoles would have to be mastered, patched, updated, etc. Changing priorities of any element requires physical recabling and is not easy to do as data center networking conditions may change in real-time.

A single management console for all Tier 0 services, as with centralized flow management, enables dynamic reconfiguration, virtualization, and massively improved TCO for the entire data center. Note: the point is not that the Tier 0 element management console will be completely centralized, but, instead, that a portion (shared by all Tier 0 elements) would become centralized. A second-generation approach could leverage a common API to integrate the actual Tier 0 element configuration within a single framework.

Lack of a feedback loop
The evolution of individual elements for Tier 0 functions has followed the changes in technology and economics of the data center. Some Tier 0 elements could offer substantially increased TCO if they could make dynamic and improved decisions based on interactive and dynamic information from the Tier 2 application services. Ideally, a feedback loop would exist between Tier 2 application subsystems (creating flow contents) and Tier 0 elements handling flows.

A good example of this is the concept of dynamic differentiated services (a.k.a. diffserv). If a Tier 0 element is used to control flow priorities, it is really only useful if it can dynamically alter the priority of a flow, based on its state. Because the relevant state is maintained in an application which is the only entity aware of any state changes, such applications need a way to communicate their state changes to their diffserv Tier 0 element.

If Tier 0 elements are indeed self-contained, individual processing elements, without centralized flow management, then it is not possible to create a feedback loop as described above. However, if centralized flow management is used offering the desired interfaces and functionality, an application may invoke a method within the flow management subsystem which in turn informs the Tier 0 diffserv element that a given flow should now be given higher (or lower) priority/bandwidth. An example would be an electronic commerce application where a customer decides to purchase an item after browsing the store. Once the application detects the customer's intent (i.e., an item is placed into the shopping basket), the flow priority can be upgraded and the customer consequently experiences better end-to-end response time.


Another example would be an anti-spam element detecting repetitive spam from a specific IP address and then informing the firewall element (via the centralized control plane policy manager) to drop all traffic from the relevant IP address. Alternatively, a dynamic flow rule could be applied forcing the firewall function to be performed before spam detection for a particular IP address.

• Trend 5: Latency vs. bandwidth

Imagine an environment where no matter how large the network pipe to the data center becomes, all Tier 0 devices could keep up with the various traffic flows at peak rate. Even in this case, the issue of latency is unavoidable. As Tier 0 processing elements are added in series and as flows are processed by packet classification engines for each new element, the end-to-end latency for each packet increases significantly, even if the processing time spent at each individual element appears to be negligible.

One consequence of the concept of centralized flow management is to be able to eliminate the repeated traversal of multiple (optimized or generic) packet classifiers. The centralized control plane policy manager would be responsible for configuring a single packet classification engine that is shared by all cooperating Tier 0 elements, in a priority order defined by policy. This design eliminates repeated, unnecessary packet classification.

With these major trends in mind, Project NEon is aimed at exploring solutions to these data center networking problems and at enabling a new approach to providing better experiences for our customers as well as their customers.

2. The NEon Architecture

This section serves as a high-level overview of the NEon architecture, describing its main components and interfaces. Further details on the evolution of the architecture as well as its components, programming models, etc. are presented in the following sections.

Core to NEon's architecture is the concept of centralized flow management. Centralized flow management represents a novel architectural choice for building a data center network. Moreover, centralized flow management can live at many boundaries of a data center network, for example the edge of the data center network, horizontally scaled servers, or at the edge of a hosted server farm. The architecture consists of a combination of components as described below plus well defined interfaces and models of interaction between the components. This approach offers a simplified control plane management model while providing scalability to maximum line rate in the data plane processing. The separation of control and data plane in this architecture allows for independent evolution of both the control and data plane.


The NEon architecture consists of the following components:

• Control plane policy manager (CPPM)
The control plane policy manager aggregates and integrates service rules and actions for data traffic policies of multiple network services such as firewalling, load balancing, SLA monitoring, and address translation (see Table 1 for a list of network services), into a single policy of rules and actions that implement these services. This unification of data traffic policies provides for the virtualization of services while preserving the policy enforcement and mapping of service semantics to physical hardware semantics. The control plane policy manager provides a point of observation into the logical connection between virtualized network services and how individual network flows would be processed. This observability can be very powerful when establishing the correctness of rule sets, identifying sub-optimal paths, simulating data traffic processing, and debugging specific network related issues.
Rule unification is accomplished through an algorithm called rule crunching. It detects conflicts between every pair of rules (i.e., determines if multiple rule patterns match the same flows) and resolves such conflicts through creating action lists to be executed on packets for these flows. The individual action lists are sorted according to the relative priority order of the Tier 0 networking services the CPPM manages and according to the type of matching hardware supported by the managed programmable rule enforcement devices. In a final step, a rule database is created that is to be installed and enforced in the connected programmable rule enforcement devices. (A minimal code sketch of this aggregation step follows the component list below.)


Figure 1 The NEon architecture (diagram: network service(s), an application agent, an environment agent, and the administrator connect through an adaptation layer to the Control Plane Policy Manager (CPPM), which programs the Programmable Rule Enforcement Device (PRED) sitting on the fat pipe (2.5-40 Gb/s) in front of servers 1..n)


• Programmable Rule Enforcement Devices (PRED)
A programmable rule enforcement device is a programmable classification and action engine that implements high performance packet processing at line speeds. For the purpose of our investigation we concentrated on speeds of 2.5 Gbps and higher. Programmable rule enforcement devices integrate a common programming interface (being standardized in the Network Processing Forum - NPF) to allow for classification and action programming. This interface enables the separation of data path and control plane processing. Programmable rule enforcement devices consist of an aggregation of logical function blocks that allow for the instantiation of network services such as load balancing, firewalling, fail-over, QoS, content inspection/layer 7 processing, and address management. Logical function blocks provide high-level packet processing functions such as packet classification, flow identification, and action processing through programmable dispatch engines (e.g., the insertion of meta-data into packets of identified flows to aid processing further downstream from the programmable rule enforcement device).

• Network service(s)
Network services in this architecture are the management portion and policy input of Tier 0 services such as firewalling, load balancing, and SLA monitoring. Their policies get fed into the control plane policy manager for further aggregation, optimization, conflict detection, etc.

• Application agent(s)
The application agent interface enables application driven networking. Application agents are responsible for trapping application or middleware events and forwarding them to the control plane policy manager.

• Environmental agent(s)
Environmental agents provide input to the CPPM to enable it to adapt to certain environmental and fault conditions such as server outages. The ability to dynamically tune the steering of packet flows is a powerful step in providing dynamic and highly available networked services.

• Adaptation layers
The NEon architecture is defined with open standards in mind. It therefore includes adaptation layers between its components. One of these adaptation layers resides between the control plane policy manager and the programmable rule enforcement device. It provides a separation of the control plane and the data plane. Moreover, it allows for an evolution of the data plane and the control plane independent of each other. As of mid 2004, this layer is being standardized in two industry wide efforts – IETF's ForCES working group and in the Network Processing Forum (NPF). Other adaptation layers provide interfaces for control plane applications (network services), networked applications, and environmental agents to drive the configuration of programmable rule enforcement devices controlled by the control plane policy manager.
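To make the rule crunching step more concrete, the following minimal sketch (in Java, the language later used for the Phase Two CPPM stub) shows how per-service rule sets expressed as pattern/action-list pairs could be aggregated into a single unified rule set in service-priority order. All class and method names are hypothetical, and the conflict test is simplified to identical patterns; the sketch does not reproduce the actual CPPM implementation or any NPF/ForCES API.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch of the CPPM's rule aggregation ("rule crunching").
    final class Rule {
        final String pattern;              // e.g., a 5-tuple filter expression
        final List<String> actionList;     // actions applied to matching flows
        Rule(String pattern, List<String> actionList) {
            this.pattern = pattern;
            this.actionList = actionList;
        }
    }

    final class ServicePolicy {
        final String serviceName;          // e.g., "firewall", "load balancer"
        final int priority;                // relative Tier 0 service priority
        final List<Rule> rules;
        ServicePolicy(String serviceName, int priority, List<Rule> rules) {
            this.serviceName = serviceName;
            this.priority = priority;
            this.rules = rules;
        }
    }

    final class ControlPlanePolicyManager {
        // Merge all services' rules into one unified rule set. Rules whose
        // patterns collide (simplified here to string equality) have their
        // action lists concatenated in service-priority order, so the PRED
        // classifies each flow only once and then runs the combined actions.
        List<Rule> crunch(List<ServicePolicy> services) {
            List<ServicePolicy> byPriority = new ArrayList<>(services);
            byPriority.sort(Comparator.comparingInt(s -> s.priority));
            Map<String, List<String>> unified = new LinkedHashMap<>();
            for (ServicePolicy service : byPriority) {
                for (Rule rule : service.rules) {
                    unified.computeIfAbsent(rule.pattern, k -> new ArrayList<>())
                           .addAll(rule.actionList);
                }
            }
            List<Rule> unifiedRuleSet = new ArrayList<>();
            unified.forEach((pattern, actions) -> unifiedRuleSet.add(new Rule(pattern, actions)));
            return unifiedRuleSet;         // ready to be installed on the PRED(s)
        }
    }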


3. Phases of Project NEon

Project NEon was divided into two phases. The first phase of the project was about establishing the principles and dimensions that Project NEon would explore, developing the architectural model, and building an initial prototype of the NEon architecture as proof of concept. In its second phase we refined the architecture even further, provided a new programming model for control plane applications, and established rule crunching as a method for virtualizing network services such as load balancing and firewalling.

3.1 Project NEon - Phase One

The primary goals of Phase One were

• the investigation of which dimensions matter in flow-based data processing,
• the development and prototyping of the NEon Architecture as proof of concept,
• the design and implementation of a first control plane policy manager, and
• an initial exploration of programmable rule enforcement.

Phase One generated a thorough understanding of which characteristics are important when working with data flows, and the completion of a first set of prototypes of individual architecture components.

3.1.1 Dimensions of flow-based data processing

Before defining the NEon architecture, we investigated which characteristics might be important when working with data flows compared to processing individual packets without context. We arrived at the following list of dimensions that we considered most important. It might not be complete, but it allowed us to agree on the dimensions we wanted to consider and for which to architect our system. These dimensions can help define which space of flow management a given architecture is capturing.

• Stateful vs. stateless
We call a network service stateful if an enforcement device may have to update state information for packets it processes. Further packet processing would then depend on the updated state information. Stateless network services maintain no per packet or per connection state information.

An example of a stateful network service is an ftp-capable IP packet filter; ftp control connections would augment the currently exercised firewall policy to allow its subsequent ftp data connections through the firewall.

An example of a stateless network service is a simple load-balancing scheme that sends UDP DNS requests to randomly selected DNS servers inside the network.


• Concurrency of state
Some network services can be applied in both directions, but they can operate independently from each other. Others may need to share consistent state. This aspect has a relationship to the stateful vs. stateless discussion above, but is a different point. It asks if shared state may need to be accessed concurrently for multiple connections, be they inbound, outbound, or both!

Example: NAT; a network address translator needs to have a consistent view of the address translation database at all times, in both directions.

• Basic vs. advanced packet processing
Some network services do not need to modify any packet content (header or payload). We call those basic services. Network services that are not basic are called advanced. Basic services, for example, can be executed in parallel to reduce latency.

Example: A basic network service is a packet filtering firewall that (on network egress) is configured to drop all IP packets that have source addresses that are not valid in the connected network. This policy helps protect against IP source address spoofing by being a good netizen. An example of an advanced network service is a load balancing service that modifies MAC addresses of packets to steer them to the server that was chosen to service this flow.

• Direction (inbound vs. outbound vs. bidirectional)
There are many network services that make sense to apply either to inbound or outbound connections, or to both.

Example: A module that predetermines a network stack call graph that packets have to follow makes sense to apply only in the inbound direction. However, a scanner module that filters all electronic mail and documents containing keywords such as confidential and proprietary may be applied only to certain outbound connections. Finally, all traffic may be checked for correct CRC fields.

• Boundary (ingress vs. egress)
This is a more subtle point than the question of flow direction above. This can be explained perhaps best using an example. The most flexible firewalls in the market apply their rule sets to traffic as it enters the firewall on one NIC and again as it leaves on another NIC. This approach protects, for example, against IP fragmentation attacks. Some older firewalls filtered only once on the inbound NIC, and did not filter fragments at all. This meant that subsequent fragments could overwrite incomplete IP packets in the reassembly buffer. The reassembled packet had then become a packet with malicious content, but there was no more filtering in place to detect it!

• Locality (remote vs. local)
Can the network service be provided by the NE itself (i.e., local) or does it need to be provided by a remote device? Processing may be way too expensive or too infrequent to be done on the NE. Furthermore, there might be existing devices that perform their particular network services so well that providing them as part of the NEon platform would not be desirable.

• Depth of inspection (control information vs. payload)
This point asks the question how much packet processing work (e.g., pattern matching, bit manipulations, and the like) can be done at line rate in the data path. Depending on the answer, one enables a slew of local network services, or forces them to be provided remotely. Where this boundary is likely to be depends on the network service in question, on the amount of processing it takes to perform the services in question, and on the capability of the available hardware.

3.1.2 Development of the NEon architecture

The NEon architecture was designed with these characteristics in mind. We wanted to include both stateless and stateful (including concurrent) processing; however, we did not get to work on stateful processing at line rates until late in the project. At the time of this writing, there is ongoing work investigating mechanisms that enable very efficient stateful and concurrent processing for flows carrying, for example, iSCSI traffic, pipelined HTTP requests, or multiple XML objects. Advanced processing was included from the very beginning, starting with our investigations of various candidates for programmable rule enforcement technologies. During our prototyping efforts we concentrated on inbound traffic, ingress, and local enforcement only, because we believe that our design is straightforward to apply to outbound or bi-directional traffic. Ongoing prototyping work in the area of distributed programmable rule enforcement has set out to provide a proof of concept that our ideas indeed work as expected. Both ingress and egress processing independent of traffic direction are a special case of distributed, programmable rule enforcement. Finally, depth of processing depends directly on the capabilities of the underlying PRED.

The original NEon architecture is depicted in Figure 2 below. It includes the following components: the network services adaptation layer (NSAL), the rule engine (became the control plane policy manager), the network element adaptation layer (NEAL), and devices (became programmable rule enforcement devices).


The NSAL addresses the problem of allowing multiple network services (e.g., firewall, load balancing, SSL processing, etc.) to interact with the rule engine to enable flow management for these services. Hence, the NSAL was designed to be the interface for network services to express their policies to a virtualized enforcement device. The rule engine (control plane policy manager in today's architecture) consists of two interacting components, an exception manager and a rule cruncher. The rule cruncher operates on rules and policies, whereas the exception manager deals with initial packets of new data flows that cannot yet be handled by the programmable rule enforcement device because they had not been linked to flows known to the PRED.

As input, the rule cruncher accepts rules from external entities, such as Tier 0 network services or application agents, and policies, such as priorities between network service rules or actions. The purpose of the rule cruncher is to create a set of low level rules that can be enforced by high performance network processors that operate in the PRED on the data path. To accomplish that, the rule cruncher determines if rules are conflicting, i.e., if they are to be applied to multiple traffic flows. If that is the case, the algorithm assigns a policy-based priority in which the conflicting rules' actions must be applied to correctly enforce the original, high level rule set. The priorities to resolve these conflicts are assigned via the administrative input and controls in the architecture diagram in Figure 2.

The NEAL, which later became the FEAL (Flow Enforcement Adaptation Layer), was our first design to insulate the control plane policy manager from programmable rule enforcement devices, to hide their heterogeneity. Already during the first phase of this project, it became clear that an adoption of standards-based work in this interface area was desirable to be able to take advantage of industry advances in programmable rule enforcement device technology.


Figure 2 The NEon architecture in project Phase One


During this phase of the project, both adaptation layers were designed to contain a generic part and service-specific parts (device-specific parts, respectively) to accommodate that network services (enforcement devices, respectively) have generic commonalities, but also individual characteristics.

The network element device (now called programmable rule enforcement device) provides the enforcement of the unified rule set produced by the rule engine. Programmable rule enforcement devices are populated with the rules consisting of patterns and lists of actions for packets matching the patterns.

3.1.3 Exploring flow management

When we investigated which functionality should be part of the control plane policy manager, we found that its main role should be the transparent mapping of network service semantics to generic packet processing hardware. To accomplish this goal, we first asked what typical Tier 0 services have in common and what makes up their individual value add, i.e., how they could be virtualized. We found that they all work on a match/action-dispatch model, first identifying which rule applies to a given packet, followed by executing the associated action. If no rule matches, typically a default action is executed. We realized that we could express what individual network services do through their rule sets that identify which flows to match and what to do with them if they do match. Given such policies we designed a first algorithm to combine them into a single rule set to be enforced by the underlying hardware. We named this class of algorithms rule crunchers.
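The match/action-dispatch model just described can be summarized in a few lines of code. The sketch below (Java, with hypothetical names) shows a rule engine that classifies a packet once, executes the action list of the first matching rule, and falls back to a default action otherwise; it only illustrates the model, not any NEon component.

    import java.util.List;
    import java.util.function.Predicate;

    // Hypothetical illustration of the match/action-dispatch model.
    final class Packet { /* header and payload fields omitted */ }

    final class FlowRule {
        final Predicate<Packet> pattern;   // which flows does this rule match?
        final List<Runnable> actionList;   // what to do with matching packets
        FlowRule(Predicate<Packet> pattern, List<Runnable> actionList) {
            this.pattern = pattern;
            this.actionList = actionList;
        }
    }

    final class MatchActionEngine {
        private final List<FlowRule> ruleSet;
        private final Runnable defaultAction;
        MatchActionEngine(List<FlowRule> ruleSet, Runnable defaultAction) {
            this.ruleSet = ruleSet;
            this.defaultAction = defaultAction;
        }
        // Classify once, then dispatch the associated action list.
        void process(Packet packet) {
            for (FlowRule rule : ruleSet) {
                if (rule.pattern.test(packet)) {
                    rule.actionList.forEach(Runnable::run);
                    return;                // first match wins
                }
            }
            defaultAction.run();           // no rule matched
        }
    }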

3.1.4 Exploring programmable rule enforcement

Network processors or network processor units (NPU) are chips designed specifically for working on network traffic at high data rates. They differ from general purpose CPU technology by typically managing many separate hardware threads designed for performing functions related to networking. By employing these hardware threads, NPUs are capable of scaling to high data rates by parallelizing processing tasks. They also differ from general purpose CPUs in the connective interfaces they support, which are typically designed for connecting to standard switch fabrics and networking interfaces. Applications that run on NPUs are typically designed around aggregating a large number of data flows and delivering those flows to a disaggregate set of end points. These applications can provide such networking services as IP forwarding, load balancing, firewalling, virus scanning, SSL acceleration, or bandwidth management. Applications implemented on NPUs are typically designed to operate at line-rate, meaning that they introduce almost no latency into the processing path. It should be noted that these applications are different from those commonly considered to be part of TCP offload technology. Those applications involve aggregating connections sent to a distinct end point, which is a different problem employing a similar hardware solution.


After establishing the conceptual model for the architecture, we then set out to prototype the architecture and explore various aspects of it. In Phase One, the programmable rule enforcement device leveraged the technology of a high speed classification and network processing engine from a company called FastChip, Inc. – a semiconductor company that has since gone out of business. FastChip's NPU-like device was called PolicyEdge, a device that we used as the vehicle to enforce policies for packet flows.

In this phase of the project, the control plane policy manager (CPPM) was tailored to the peculiarities of PolicyEdge's capabilities. We established some of the processing requirements for programmable rule enforcement devices; however, our view was limited by the programming model we had in mind based on our experience with the PolicyEdge device. It was programmed by building up tree-based data structures that represented the parsing and processing parameters to configure its hardware-based state machine. Data packets would then drive the state machine, possibly getting modified during traversal. In spite of its limitations, working with this technology gave us valuable insight into what kind of processing is possible, at which speeds, and what data - in principle - is necessary to configure such packet processing engines.

3.2 Project NEon - Phase Two

The second phase of project NEon began in the Spring of 2003. It represented a renewed focus on management of network services and the development of a more generic programming model for both enforcement and policy semantics. Through increasing our investment in standards organizations, such as the Network Processing Forum (NPF) and the Internet Engineering Task Force (IETF), the team was able to develop a very concise programming model to take the place of the flow enforcement adaptation layer (FEAL). Also, through our activities with both standards bodies and numerous interactions with the NPU vendor community, we realized how the rule crunching algorithm defined in Phase One was optimized primarily for the classification algorithm specifically employed by FastChip and was incompatible with the evolving industry standard programming models. This led to the development of new rule crunching algorithms. Phase Two generated an experimental and theoretical evaluation of the rule crunching algorithm and a fully functional and demonstrable prototype.

3.2.1 NEon programming model

The Network Processing Forum has defined a programming model for the functional configuration and operational management of network processing elements (NPE). This model is based on the functional composition of the programmable rule enforcement application. A particular enforcement application can be expressed by the topology of connected logical function blocks. The NPF Functional Model [1] and the logical function block (LFB) template [2] define a mechanism for representing a specific enforcement application. An example of a programmable rule enforcement application might be one that implements server load balancing. For example, Figure 3 shows the functional decomposition of a layer 4 server load balancer:

Using this modeling of logical function blocks, the NPF software working group is able to determine each functional component requiring configuration (i.e., the 5-tuple classifier) as well as components requiring operational management (i.e., the Ingress block). LFBs serve only to model programmable rule enforcement; they are not meant to influence or restrict implementation. However, for a vendor to implement a programmable rule enforcement device that is NPF compliant, their implementation must be capable of being modeled using logical function blocks defined by the NPF. An API [3] has been defined for an application to perform topology discovery of LFBs implemented by a particular NPE or programmable rule enforcement device (PRED).

The NPF programming model is based on layering APIs as a way of abstracting hardware heterogeneity and, in the case of higher level service APIs, hiding functional decomposition. Each LFB may require an interface for configuration. Be it updating a classifier rule or setting the token bucket size of a meter, the APIs used to configure LFBs are referred to as Functional APIs (FAPI). The NPF has published the FAPI Model and Usage Guide as a way toward defining compatible functional APIs. The following represent functional APIs which are being developed by the NPF: Classification, IPv4/IPv6 Forwarding and Next Hop, Metering, Traffic Queuing, and IPSec.

Just as FAPIs are used to abstract away vendor specific implementations, Service APIs (SAPI) are used to hide the functional composition required to implement a service. SAPIs may interface with lower level FAPIs or directly with vendor specific APIs (VSAPI) to configure an underlying programmable rule enforcement application. Examples of existing SAPI efforts are IPv4/IPv6 Routing, MPLS, IPSec, DiffServ, Server Load Balancing, SSL Acceleration, and High Availability.
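To illustrate the layering idea (not the actual NPF C APIs, whose signatures are defined in the forum's specifications), the following hypothetical Java sketch shows how a server load balancing SAPI call could fan out into FAPI-level configuration of the logical function blocks from Figure 3.

    import java.util.List;

    // Hypothetical sketch of SAPI-over-FAPI layering; names are illustrative.
    interface ClassifierFapi {             // functional API for the 5-tuple classifier LFB
        void addFiveTupleRule(String fiveTuple, String actionTag);
    }

    interface SchedulerFapi {              // functional API for the load balancer/scheduler LFB
        void addRealServer(String serverAddress, int weight);
    }

    final class ServerLoadBalancingSapi {  // service API hiding the functional decomposition
        private final ClassifierFapi classifier;
        private final SchedulerFapi scheduler;
        ServerLoadBalancingSapi(ClassifierFapi classifier, SchedulerFapi scheduler) {
            this.classifier = classifier;
            this.scheduler = scheduler;
        }
        // One service-level call configures the classifier and scheduler LFBs
        // that together implement the layer 4 load balancer of Figure 3.
        void createVirtualService(String virtualIp, int port, List<String> realServers) {
            classifier.addFiveTupleRule("tcp,*,*," + virtualIp + "," + port, "lb-group-1");
            for (String server : realServers) {
                scheduler.addRealServer(server, 1);
            }
        }
    }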


Figure 3 Functional decomposition of a layer 4 server load balancer according to the NPF functional model (logical function blocks shown: RX/Ingress, 5-tuple Classifier, Load Balancer/Scheduler, IP NAT or IP Tunnel or L2 Encap, TX/Egress)


The NEon team has been working with the NPF to define this programming model and to ensure that it is compatible with the architecture defined by this project. This modeling effort has also served as a way to drive a close collaboration between the IETF Forwarding and Control Element Separation Working Group (ForCES WG) and the NPF. This collaboration has gone quite smoothly because there is a significant overlap between contributing companies and individuals in both organizations. The NPF software framework [4] recognizes and details the connection between the APIs defined by the NPF and the protocol to be defined by the ForCES working group. The combination of the two became the explicit programming model for Phase Two of this project.

For the Phase Two prototype, the team focused on implementing the programming model depicted in Figure 4 below.

This programming model served as a replacement for the FEAL defined in Phase One. Based on the progress by both the NPF and IETF, it was decided that the prototyped programming model would be written in the spirit of the FAPI and ForCES model. The team did look to leverage the ongoing work by the NPF in defining a FAPI for the Generic Packet Classifier (GCLASS) configuration [5]. The use of this FAPI allows for some basic operations such as managing rule sets and querying filter capabilities. The GCLASS FAPI is well suited for defining rules (the pairing of patterns and actions) for filters used to inspect fixed header protocols such as TCP/UDP. It is not suited for content-based packet classification for protocols such as HTTP. As a temporary placeholder, the team defined a ForCES-esque protocol in the spirit of the ForCES model [6] and requirements [7].

Figure 4 Programming model at the CPPM/PRED interface (diagram labels: Control Plane Policy Manager with rule cruncher (RC), rule database (RuleDB), and FEAL; ForCES; FAPI and vendor specific API on the Programmable Rule Enforcement Device)

To complete the mapping of existing standards work to adaptation layers defined in the NEon architecture, it was determined that NPF SAPIs may eventually serve as the programming interface for the network service adaptation layer (NSAL). Current efforts in the Network Computing Working Group are in the process of defining SAPIs for server load balancing and SSL acceleration. This realization, however, was not conceived as part of the Phase Two prototype. No existing standards have been identified to take the place of the application and environmental agent adaptation layers (AAAL, EAAL).

3.2.2 Rule crunching

The major piece of intellectual property that was created and prototyped as part of Phase Two was the longest prefix rule cruncher (LPRC) algorithm. The reason for developing a new rule crunching algorithm was that the previous one was well designed for FastChip's classifier, but not for those from other vendors. A very common classification mechanism is to execute the actions associated with a first match rule. The Phase One rule crunching algorithm was found to be incompatible with such mechanisms. In particular, the classifier implemented by the Sun Fire™ B10n content load balancing blade worked off a longest prefix matching mechanism. Also, the NPF GCLASS FAPI did not consider the configuration of such classifiers where rule match continuation is employed. It was determined that the algorithm should be redesigned to consider rules where patterns with longer prefixes match before those with shorter prefixes, meaning that more specific rules would match before less specific ones. The new algorithm also needed to define the concept of an action list: a chain of actions to be executed sequentially. The action list concept was derived from the original Firehose design. It also in turn complemented and partially contributed to the work done by the NPF DiffServ SAPI TG to create an action chaining mechanism.
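The core idea behind the LPRC can be sketched as follows. Restricted to IPv4 destination prefixes and using hypothetical names, the code checks whether one rule's prefix covers another's; if so, the more specific rule (which wins under longest prefix matching) is rewritten to carry both services' actions concatenated in priority order, so neither service's semantics are lost. This illustrates the principle only; it is not the actual LPRC implementation.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of longest-prefix conflict resolution.
    final class PrefixRule {
        final int prefix;                  // IPv4 network address (host byte order)
        final int prefixLength;            // 0..32
        final List<String> actionList;
        PrefixRule(int prefix, int prefixLength, List<String> actionList) {
            this.prefix = prefix;
            this.prefixLength = prefixLength;
            this.actionList = actionList;
        }
        // True if this (equal or less specific) prefix contains the other prefix.
        boolean covers(PrefixRule other) {
            if (prefixLength > other.prefixLength) return false;
            int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
            return (prefix & mask) == (other.prefix & mask);
        }
    }

    final class LongestPrefixCruncher {
        // Crunch two rules from services with higher/lower priority. If the
        // less specific rule covers the more specific one, traffic matching
        // the specific prefix would otherwise see only the specific rule, so
        // that rule is given the concatenated action list in priority order.
        static List<PrefixRule> crunch(PrefixRule higher, PrefixRule lower) {
            List<PrefixRule> out = new ArrayList<>();
            PrefixRule specific = higher.prefixLength >= lower.prefixLength ? higher : lower;
            PrefixRule general = specific == higher ? lower : higher;
            if (general.covers(specific)) {
                List<String> merged = new ArrayList<>(higher.actionList);
                merged.addAll(lower.actionList);
                out.add(new PrefixRule(specific.prefix, specific.prefixLength, merged));
                out.add(general);          // still matches the non-overlapping traffic
            } else {                       // no conflict: keep both rules unchanged
                out.add(higher);
                out.add(lower);
            }
            return out;
        }
    }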

It was later determined that the LPRC could be modified to support rule sets that defined rule ordering using either longest prefix, ordered precedence, or both. A proof of concept of this modification was created, but was not realized as part of the Phase Two prototype.

By studying the LPRC we determined that one potential application of rule crunching, besides the original intent of single classification driven programmable rule enforcement, is rule verification. Through the operation of rule crunching, rules from different network services are compared, conflicts are resolved, and new rules are created as a representation of that conflict resolution. Rules resulting from conflict resolution will have a new action list which is the concatenation, in priority order, of the actions from the conflicting rules. Once rule crunching has been completed, a unified rule set that represents the aggregation of all contributing network services is created. By prioritizing actions within an action list, the physical flow of packets through the segment of the data center is preserved. For instance, if a firewalling action is to be performed before a load balancing action, based on these services' priority relationship, this ordering must be preserved and represented in their combined action list.

By observing the resultant concatenated action lists, it is possible to validate network service configurations. This goal can be achieved by, but is not limited to, implementing any of the following approaches (one such check is sketched in code after the lists below):


• presentation of unified rule set to an administrator for manual verification
• flagging of potential misconfiguration as defined by administrator policy
• simulation of work load processing to meet a success criteria
• flagging or notification of unnecessary or extraneous rules

After validating the rule set, all flagged items, through either manual inspection, simulation, or programmatic identification, should be handled appropriately. This handling may include, but is not limited to:

• manual changes to network services configuration
• rule set modification (either manual or programmatic)
• action list modification (either manual or programmatic)
• no operation in the case of false positives
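As one example of the verification ideas listed above, the hypothetical sketch below flags rules in a crunched rule set that can never match because an earlier rule already covers every flow they describe and terminates processing; the pattern comparison is abstracted behind a covers test and all names are illustrative.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of flagging extraneous (shadowed) rules.
    final class RuleVerifier {
        interface Pattern { boolean covers(Pattern other); }

        static final class UnifiedRule {
            final Pattern pattern;
            final boolean terminal;        // true if its action list ends packet processing
            UnifiedRule(Pattern pattern, boolean terminal) {
                this.pattern = pattern;
                this.terminal = terminal;
            }
        }

        // Returns indices of rules shadowed by an earlier terminal rule.
        static List<Integer> findShadowedRules(List<UnifiedRule> unifiedRuleSet) {
            List<Integer> flagged = new ArrayList<>();
            for (int i = 0; i < unifiedRuleSet.size(); i++) {
                for (int j = 0; j < i; j++) {
                    UnifiedRule earlier = unifiedRuleSet.get(j);
                    if (earlier.terminal && earlier.pattern.covers(unifiedRuleSet.get(i).pattern)) {
                        flagged.add(i);    // rule i can never be reached
                        break;
                    }
                }
            }
            return flagged;
        }
    }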

3.2.3 Evolution of prototype

In order to prototype and study the concepts conceived in Phase Two, four project milestones were defined. The goal of the first milestone was to use an existing software PRED (uPRED) as a platform for development, implement the ForCES-esque protocol, and provide a mechanism for encapsulating NPF API calls over this protocol. To prove and complete this programming model implementation, it was necessary to implement a stub control plane policy manager (CPPM) process using the Java platform which also implements the ForCES-esque protocol and the necessary NPF encapsulation.
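The prototype's actual message formats are not reproduced in this report; the sketch below only illustrates, with an invented type-length-value framing and port number, what encapsulating a rule-configuration call from a Java CPPM stub to a software PRED over a ForCES-like control connection could look like.

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    // Hypothetical sketch of a ForCES-like control connection; the opcode,
    // framing, and port number are invented for illustration only.
    final class ForcesLikeClient {
        private static final int OP_ADD_CLASSIFIER_RULE = 1;

        private final String predHost;
        private final int predPort;

        ForcesLikeClient(String predHost, int predPort) {
            this.predHost = predHost;
            this.predPort = predPort;
        }

        // Encapsulate a textual 5-tuple pattern and its action in a simple
        // type-length-value frame and send it to the (software) PRED.
        void addRule(String fiveTuplePattern, String action) throws IOException {
            byte[] payload = (fiveTuplePattern + "|" + action).getBytes(StandardCharsets.UTF_8);
            try (Socket socket = new Socket(predHost, predPort);
                 DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
                out.writeInt(OP_ADD_CLASSIFIER_RULE);   // type
                out.writeInt(payload.length);            // length
                out.write(payload);                      // value
                out.flush();
            }
        }
    }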

The second milestone resulted from the most intensive prototyping effort of the project, implementing the CPPM, the LPRC, and stub applications to drive network service policy inputs. The CPPM and LPRC implementations focused on manipulating network service policies defining five-tuple rules (protocol, source/destination IP addresses, and source/destination ports). Five-tuples as pattern filters were selected because they are used to define the policies of many popular network services. During this milestone, we were able to make many algorithmic improvements to the longest prefix rule crunching algorithm.

As a way of proving both our programming model and architecture using an existing hardware PRED, the third milestone was started. The goal was to take the PUMA content load balancer and turn it into a more generic PRED. The work made it possible to implement generic classification, firewalling, and IP forwarding actions in the Sun Fire B10n firmware. Milestone three built upon this work by enabling the configuration of rules using NPF API calls encapsulated over the ForCES protocol. The intention was to replace the uPRED with this high performance programmable rule enforcement device.

After combining all these components and demonstrating a complete NEon platform, consisting of the CPPM and B10n-PRED, we defined a milestone to analyze the performance of the LPRC. The goal was to develop software for studying the CPPM and produce an analysis report of the LPRC implementation. The output of this milestone was a description of the analysis methodology and observations, a presentation documenting our findings, and an addition to the LPRC white paper.

Furthermore, we explored what it would mean to manage multiple, heterogeneous, programmable rule enforcement devices, loosely termed distributed, programmable rule enforcement. We developed a management module that takes as input the policy that needs to be enforced on all data traffic entering/exiting a network domain and the capabilities of each programmable rule enforcement device. It generates the set of rules appropriate for each PRED. Upon any dynamic update of the policy, the management module ensures that updates are propagated to all impacted devices. Dynamic updates may result from changes in the network (e.g., link/device failure), or could be inserted by an administrator (e.g., insertion of a new filter in all firewalls).
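A possible shape for such a management module is sketched below; the capability model and names are hypothetical, and the sketch only shows the core step of assigning each rule of the global policy to the devices able to enforce it.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch of distributing a global rule set across PREDs.
    final class DistributedRuleManager {
        interface PredCapabilities {
            boolean canEnforce(String pattern, List<String> actionList);
        }

        static final class GlobalRule {
            final String pattern;
            final List<String> actionList;
            GlobalRule(String pattern, List<String> actionList) {
                this.pattern = pattern;
                this.actionList = actionList;
            }
        }

        // For each device, select the subset of the global policy it can enforce.
        static Map<String, List<GlobalRule>> distribute(List<GlobalRule> globalPolicy,
                                                        Map<String, PredCapabilities> devices) {
            Map<String, List<GlobalRule>> perDevice = new HashMap<>();
            for (Map.Entry<String, PredCapabilities> device : devices.entrySet()) {
                List<GlobalRule> assigned = new ArrayList<>();
                for (GlobalRule rule : globalPolicy) {
                    if (device.getValue().canEnforce(rule.pattern, rule.actionList)) {
                        assigned.add(rule);
                    }
                }
                perDevice.put(device.getKey(), assigned);
            }
            return perDevice;              // pushed to the PREDs; re-run on policy updates
        }
    }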

Another area of investigation had us focus on gathering and communicating information and events from the middleware/application layers to Tier 0 network devices. We created an event communication system between the end host servers and the control plane policy manager through a Java-based event manager. This approach gives network devices visibility into observations that only end-host servers have and that are today unavailable to network devices. This idea enables an agile and responsive virtualization of the server tier based on the requirements and demands of applications.
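As an illustration of the kind of event such a system could carry (echoing the shopping-basket example from Section 1), the hypothetical sketch below shows an application agent publishing a flow event to the CPPM's event manager; the prototype's actual event API is not reproduced here.

    // Hypothetical sketch of an application agent forwarding an event to the CPPM.
    final class FlowEvent {
        final String flowId;               // identifies the customer's flow, e.g., a 5-tuple
        final String eventType;            // e.g., "ITEM_ADDED_TO_BASKET"
        FlowEvent(String flowId, String eventType) {
            this.flowId = flowId;
            this.eventType = eventType;
        }
    }

    interface CppmEventManager {
        void publish(FlowEvent event);     // delivers the event to the control plane policy manager
    }

    final class ShoppingCartAgent {
        private final CppmEventManager eventManager;
        ShoppingCartAgent(CppmEventManager eventManager) {
            this.eventManager = eventManager;
        }
        // Called by the e-commerce application when the customer shows purchase
        // intent; the CPPM can react by raising the flow's priority on the PRED.
        void onItemAddedToBasket(String customerFlowId) {
            eventManager.publish(new FlowEvent(customerFlowId, "ITEM_ADDED_TO_BASKET"));
        }
    }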

Finally, we started studying layer 5-7 XML data processing as a first step to determine how deeply we can process data at line rate. We focused on designing algorithms and methodologies for load balancing and flow redirection of XML-based traffic streams. In particular, we focused on applications exchanging data using XML where packets containing XML data needed to be routed to the appropriate resource (server cluster) based on the document content and state. This investigation included packet level filtering based on XML tags and document content, assigning priorities to important transmissions (e.g., financial transactions could get prioritized over general browsing), and bandwidth sharing based on end user priority level.

4. Lessons Learned

Project NEon investigated a paradigm shift away from special purpose network appliances to an integrated way to architect, operate, and manage data plane network services. We were interested in evaluating the benefits of data flow management and enforcement inside the data center edge. We therefore investigated ways to architect data flow management over high bandwidth network connections feeding data centers, and focused our attention on handling data flows rather than individual packets. The project followed an iterative process of thought, analysis, prototyping, and experimental evaluation.

In the NEon architecture, we chose to strictly divide the control plane, an instance of which we called the control plane policy manager, and the data plane, instances of which we called programmable rule enforcement devices. This division allowed us to encapsulate heterogeneous, programmable rule enforcement hardware into its own layer, thus steering clear of the mingling of policy and enforcement that is often found in networking appliances.

The control plane policy manager accepts as input network service policies such as a firewall and load balancer configuration, application feedback such as server load information, environmental feedback such as current physical link status, and administrative input. Such policies, we found, could be combined when patterns overlapped (i.e., their description matched multiple flows), breaking ground for the invention of what we termed rule crunching. Through rule crunching, multiple rule sets of a number of networking services are transformed into an optimized set of rules that is subsequently enforced by programmable rule enforcement devices.

In the control plane, this approach enabled experimentation with different algorithmic approaches to optimize what the hardware needed to enforce. We found that a wide range of networking services can be expressed through their policies, which we called rule sets. Each rule is expressed as a pattern and an associated action list. Each pattern determines the network flow to which its associated action list needs to be applied. As a result, we were able to virtualize networking services: they were implemented by configuring our programmable rule enforcement hardware, yet at a higher layer they were abstracted into just the policy required to drive such services. By having a unified rule set, classification and action processing only needed to be performed once for any given flow, offering a per-packet latency reduction in the data path.
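
The following sketch captures the rule abstraction described above: each rule pairs a pattern with a priority-ordered action list, and crunching merges per-service rules whose patterns describe the same flow into a single rule with a concatenated action list. Names and pattern syntax are illustrative only.

    // Illustrative rule abstraction: a pattern plus a priority-ordered action list.
    import java.util.ArrayList;
    import java.util.List;

    public class UnifiedRuleSketch {
        record Rule(String pattern, List<String> actions) { }

        public static void main(String[] args) {
            // Two per-service rules whose patterns happen to describe the same flow.
            Rule firewall = new Rule("tcp */* -> 192.0.2.10/32:80", List.of("permit"));
            Rule slb      = new Rule("tcp */* -> 192.0.2.10/32:80", List.of("rewrite-dst(pool-A)"));

            // The crunched rule carries the concatenated action list, so classification
            // happens once per packet instead of once per appliance.
            List<String> merged = new ArrayList<>(firewall.actions());
            merged.addAll(slb.actions());
            System.out.println(new Rule(firewall.pattern(), merged));
        }
    }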

We experimented with patterns whose matching priority is based on longest prefix matching as well as ordered precedence, and found that a single algorithm can consider both types at the same time. We also experimented with the notion of offline vs. online rule crunching. Our prototype implements offline rule crunching, where all rules to be crunched are known before the algorithm commences. When a new rule is added to the input policy, the algorithm starts over on the updated input set of rules. Online rule crunching, in contrast, exploits the idea that only an incremental update may be necessary if a single rule is added. However, once we analyzed the runtime behavior of the offline rule crunching algorithm, we found that it operated so efficiently that we saw no need to pursue online rule crunching further.
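
A schematic of this offline behavior is sketched below: every change to the input rule set re-runs the cruncher over the entire set. The crunch() body is a placeholder for the real algorithm, not a reproduction of it.

    // Offline crunching sketch: any rule addition restarts crunching on the full input set.
    import java.util.ArrayList;
    import java.util.List;

    public class OfflineCruncher {
        private final List<String> inputRules = new ArrayList<>();
        private List<String> unified = List.of();

        public void addRule(String rule) {
            inputRules.add(rule);
            unified = crunch(inputRules);   // start over on the updated input set
        }

        // Placeholder: the real algorithm resolves pattern conflicts and merges actions.
        private List<String> crunch(List<String> rules) {
            return List.copyOf(rules);
        }

        public List<String> unifiedRuleSet() { return unified; }

        public static void main(String[] args) {
            OfflineCruncher c = new OfflineCruncher();
            c.addRule("fw: deny tcp any -> any:23");
            c.addRule("slb: tcp any -> vip:80 => pool-A");
            System.out.println(c.unifiedRuleSet());
        }
    }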

We analyzed the longest prefix rule crunching (LPRC) algorithm using three types of data sets. First, we used hand-crafted data sets to convince ourselves of its correctness and to explore boundary conditions (e.g., many pattern conflicts). Second, we used real-world Tier 0 configurations, and third, we used large, generated data sets to explore LPRC scalability. We varied n, the number of rules per service, and m, the number of services. We found that the algorithm has low complexity if the conflict rate (the number of conflicts divided by the total number of comparisons) is low. In that case, the expected and experimentally validated complexity was O(n^2) in the number of rules and O(log m) in the number of services (if n·m = c, i.e., the total number of rules over a varying number of services is kept constant). If the conflict rate converged to 100%, the complexity would be O(2^m) in the number of services for a constant number of rules. The algorithm has a complexity of O(n^3) for a single service if all rules of the input rule set conflicted with each other. We found that, with the exception of some very unrealistic boundary cases, the conflict rate always converged toward 0, which meant the algorithm never exhibited its cubic or exponential complexities. Finally, we defined the intersect rate to be the number of intersect conflicts divided by the number of comparisons. As the intersect rate approached 0 in the limit of n·m approaching infinity, the median size of the rule set to be installed in programmable rule enforcement devices was Θ of the size of the input rule set, i.e., it grew proportionally to the input size.

Our implementation of the rule crunching algorithms employed data structures that relied on the correct ordering of the input rules. If one rule was added to the input set, it would very likely have to be inserted somewhere in the middle of the input set, thus, on average, requiring the algorithm to be rerun over half of the input set. We are nevertheless certain there is great value in online rule crunching, especially in circumstances where input rule sets are very large, where rules from many different services are to be combined, or where a great number of conflicts exists among rule patterns. In all these scenarios, we expect a significant performance gain from performing only the incremental work compared to the offline counterpart, even if, for example, more costly data structures have to be maintained.

We explored the idea of semantically validating rule sets in the control plane policy manager. When validating unified rule sets, a control plane policy manager may compare the unified rule set against one or more defined policies. Alternatively, it may apply the unified rule set to either captured or manually specified simulated network packets. Furthermore, it could identify extraneous rules or actions, or present its unified rule set for manual verification. If errors are identified, the input rule sets would be corrected automatically or manually.
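
One way to picture such a validation pass is sketched below: simulated packets are evaluated against both an input policy and the unified rule set, and any disagreement is flagged. The packet encoding and the policies are deliberately simplified stand-ins, not the validation logic of the CPPM.

    // Replay simulated packets against a per-service policy and the crunched rule set.
    import java.util.List;
    import java.util.function.Predicate;

    public class RuleSetValidator {
        public static void main(String[] args) {
            // Simplified "policies": does a packet (encoded as "proto:dstPort") pass?
            Predicate<String> firewallPolicy = p -> !p.endsWith(":23");   // block telnet
            Predicate<String> unifiedRuleSet = p -> !p.endsWith(":23");   // crunched output

            List<String> simulatedPackets = List.of("tcp:80", "tcp:23", "udp:53");
            for (String pkt : simulatedPackets) {
                if (firewallPolicy.test(pkt) != unifiedRuleSet.test(pkt)) {
                    System.out.println("Unified rule set diverges from policy for " + pkt);
                }
            }
            System.out.println("Validation pass complete.");
        }
    }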

A control plane policy manager and its enforcers are separated through standard interfaces and protocols that are still being defined by standards bodies such as the Network Processor Forum (NPF) and the Internet Engineering Task Force (IETF). Our prototypes validated the existing APIs and the IETF ForCES (Forwarding and Control Element Separation) protocol under consideration. Lessons learned during this effort were fed back to the standards bodies, and the APIs and protocols changed accordingly.

This standards-based separation enables the use of commercial off-the-shelf (COTS) network processor (NPU) technology. The vision is that NPF standards become the default way to program these devices. NPF API calls are translated to vendor-specific APIs, thus offering application portability. We validated this claim using three quite different hardware platforms: a network processor chip called PolicyEdge from a startup firm called FastChip, a modified content load balancer blade from Sun's B1600 blade server platform, and the Sun Secure Application Switch N2000 Series. In the NEon architecture, the network edge can take advantage of the COTS performance curve while trivially coping with programmable rule enforcement device hardware heterogeneity.
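
The portability argument can be pictured with the following sketch, in which one generic rule-installation call is translated by per-vendor back ends. The interface and method names are invented for illustration and do not reproduce the actual NPF API signatures.

    // One generic call, several vendor-specific back ends: the CPPM stays unchanged.
    public class PredPortabilitySketch {
        interface PredBackend { void installRule(String pattern, String action); }

        static class B10nBackend implements PredBackend {
            public void installRule(String pattern, String action) {
                System.out.println("B10n firmware call: " + pattern + " -> " + action);
            }
        }

        static class NpuBackend implements PredBackend {
            public void installRule(String pattern, String action) {
                System.out.println("NPU microcode table write: " + pattern + " -> " + action);
            }
        }

        // The control plane programs every PRED through the same call, regardless of vendor.
        static void program(PredBackend pred) {
            pred.installRule("tcp */* -> 192.0.2.10/32:80", "forward(pool-A)");
        }

        public static void main(String[] args) {
            program(new B10nBackend());
            program(new NpuBackend());
        }
    }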

Given such a network edge programming platform, it becomes possible to automatically provision network services in response to application needs inside the network. The artificial division that exists today between the tasks of system administrators on the one hand and network administrators on the other is removed. The NEon architecture enables automatic and semantically consistent programming of the network edge as an integral part of service provisioning. Fine-grained feedback can now be given to the network automatically, to inform decisions about where flows go, how they are shaped, and so on. Such automatic feedback is clearly preferable to having the network make these types of decisions independently, based on past or magically predicted future user behavior.

During the course of our investigation, it became clear that there are a number of gradual transition scenarios for phasing this architecture into, for example, existing data center deployments. To start, consider a Tier 0 network service deployment as is popular today. First, a control plane policy manager can be added that uses the existing network service appliances as its programmable rule enforcement devices. The policy fed to these devices is just what they would be configured with today. In this initial step, the control plane policy manager does nothing but forward the individual policies to their matching enforcers. Later, though, a real programmable rule enforcement device could be added. By real we mean a device that is agnostic of the particular service it needs to implement, in other words, one that is more generic in its functionality. For example, the programmable rule enforcement device is not tied to providing just a load balancing service. Instead, it could perform generic packet classification, followed by a number of actions such as dropping, marking, changing field contents, or recalculating checksums. The control plane policy manager could now start crunching rules and distribute rule sets, as appropriate, to this generic programmable rule enforcement device. Over time, the individual appliances could be completely replaced by more and more versatile and powerful programmable rule enforcement devices, while preserving the view of individual, albeit virtual, network services for the network service administrators. Finally, the network service input rule sets could be generated automatically in response to application provisioning, and augmented by environmental events.

5. Future Work

We have not fully answered the question of to what extent NEon-based technologies will be appropriate for processing application layer content within programmable rule enforcement devices. The open challenges are closely related to the lack of standardization in how layer 7 rules are specified by network services. In the case of layer 4, and any rules for fixed-offset protocols, performing policy aggregation and rule crunching is well understood, partially because of our ability to concretely describe the patterns related to these rules in an industry-standard way. There are subtle differences related to translating rule and pattern grammars to the lower level semantics of a device (e.g., in the case of patterns that express port ranges and devices that do not support such semantics); however, the problems in this space are well understood. For layer 7, rules are specified in a number of different ways. Given, for example, the following three diverse approaches, it is obvious why devising a common way of dealing with layer 7 rules from many different network services is difficult:

• One way is to utilize a predefined protocol definition and to supply name-value pairs for the desired fields. This approach can be seen in many HTTP load balancers that allow administrators to express rules over URLs, COOKIEs, URIs, etc. (see the sketch after this list). Network services that employ this approach typically support only a specific or limited set of protocols.

• Another approach is to use regular expressions to be applied to the packet payload and/or procedural languages for traversing the packet. This approach is employed by intrusion detection systems that need to look for packet signatures.

• Yet another approach is more difficult to describe because rules are expressed in binary form and/or through proprietary mechanisms. This practice is employed by many subscription-based network services like anti-virus protection.
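
The sketch below illustrates the first, name-value pair style of layer 7 rule, roughly as an HTTP load balancer might accept it; the field names, cookie, and pool are assumptions made for this example.

    // Name-value pair layer 7 rule: route premium sessions requesting /images/*
    // to a dedicated pool. Field names are illustrative only.
    import java.util.Map;

    public class HttpRuleSketch {
        public static void main(String[] args) {
            Map<String, String> pattern = Map.of(
                    "URL", "/images/*",
                    "COOKIE.session", "gold");
            String action = "forward(pool-premium)";
            System.out.println(pattern + " => " + action);
        }
    }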

Another dimension driving the confusion around adapting NEon for layer 7 is how classification is implemented. Layer 7 classification is a current point of controversy in the network processor space. Traditionally, layer 7 classification was done on a small subset of packets as a slow path, or exception path, operation. Lower level classification is used to direct packets into this exception path so that deeper classification can be performed in software. The evolution of network processors has started a move to perform deep classification in the fast path, which is difficult to accomplish because payload inspection commonly requires connection termination and/or packet queueing. Some hardware classifiers allow classifications that span multiple packets without termination, though their application is limited and they are better suited for monitoring and inspection-based network services.

For implementations that do employ connection termination, a lower level classification is typically used to drive packets to different processing paths. Because this lower level classification provides a point of differentiation that is typically defined on a per-application-service basis, different deep classifiers are used for processing packets related to various application services, meaning that these layer 7 rules are typically tightly coupled with a layer 4 definition of an application service rule. An example is directing web server traffic for URL inspection. Thus, a simple definition of layer 7 rules to adapt for rule crunching would be to specify name-value pairs for predefined protocol definitions.

As described in our findings, adapting the rule crunching algorithm for different filter types can be achieved by understanding two things: how to determine the match relationship between the patterns of two rules, and how to create a pattern that describes the intersection between these two rules. For the name-value pair approach, this might not be too difficult, given that such patterns can be properly compared and combined. Combining patterns becomes difficult if regular expressions are used in the value definition. This challenge is common for today's layer 7 load balancers, which allow wildcard and regular expression definitions for application layer information.
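
To illustrate the two primitives just mentioned for one simple filter type, the following toy sketch decides whether two port-range patterns overlap and, if so, constructs their intersection. It is a model of the idea only, not of the NEon cruncher.

    // Match relationship and intersection for port-range patterns.
    public class RangeIntersectSketch {
        record PortRange(int lo, int hi) {
            boolean overlaps(PortRange o) { return lo <= o.hi && o.lo <= hi; }
            PortRange intersect(PortRange o) {
                return overlaps(o) ? new PortRange(Math.max(lo, o.lo), Math.min(hi, o.hi)) : null;
            }
        }

        public static void main(String[] args) {
            PortRange firewall = new PortRange(0, 1023);   // "privileged ports"
            PortRange slb      = new PortRange(80, 80);    // "web traffic"
            System.out.println("intersection: " + firewall.intersect(slb));  // [80, 80]
        }
    }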

Considering the application of rule crunching to these types of rules, it appears that the same benefits would apply as they do for layers 2-4. One area of concern is the potential for rule explosion, which occurs as the rate of intersection increases. Another is the creation of inefficient patterns for classification if regular expressions are supported.

Another unresolved issue is how to deal with actions that change the identity of the packet. The output of rule crunching is a unified rule set where each rule is composed of a pattern and an action list. The action list can have one or more elements describing the sequence in which different network service functions are applied. By default, all processing sequences for packets start with some sort of classification action. Classification is used to determine the identity of the packet, whether it belongs to an already established flow, and which actions need to be performed before the packet can be delivered to its destination. Some actions, like load balancing or NAT, might change the identity of a packet by rewriting the parts that are used to determine flow information. For instance, some load balancers rewrite the destination IP address when selecting a server from a load balancing group.

Consider a scenario where one of these actions is executed before others in a list. Action lists are created by concatenating the actions from different network service policies into a priority-ordered list. These lists simulate the physical path that a packet would take when traversing multiple network service appliance boxes. If an action changes the identity of the packet, it might change how the packet should be handled and thus invalidates the rest of the action list. An obvious way to deal with this is to reclassify the packet. However, the reclassification must occur with some context about where the packet is in the processing path. Consider, for example, an action list composed of firewall, load balancing, and diffserv actions, in that priority order. If the load balancing action changes the identity of the packet and thus would invalidate the diffserv action, a reclassification is needed. However, the reclassification must be based only on rules containing actions related to diffserv, because firewall processing has already been performed. The NEon team has not yet devoted the time to identify a mechanism for tracking the packet context in order to apply the appropriate set of actions when reclassification is needed.
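
One way the needed context tracking could look is sketched below: the action list is walked in priority order, and when an action rewrites the packet's identity, reclassification is restricted to the services that have not yet run. The types and service names are illustrative assumptions, not a resolved design.

    // Walk an action list; on an identity-changing action, reclassify against the
    // remaining services only (e.g., diffserv, but not the firewall that already ran).
    import java.util.List;

    public class ReclassifySketch {
        record Action(String service, String name, boolean changesIdentity) { }

        static void process(List<Action> actions) {
            for (int i = 0; i < actions.size(); i++) {
                Action a = actions.get(i);
                System.out.println("apply " + a.service() + ":" + a.name());
                if (a.changesIdentity()) {
                    List<Action> remaining = actions.subList(i + 1, actions.size());
                    System.out.println("reclassify against services of " + remaining);
                    break;
                }
            }
        }

        public static void main(String[] args) {
            process(List.of(
                    new Action("firewall", "permit", false),
                    new Action("slb", "rewrite-dst(pool-A)", true),
                    new Action("diffserv", "mark(AF21)", false)));
        }
    }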

Acknowledgments

Special thanks to Robert Bressler, who initiated the NEon thinking at Sun, and to Lisa Pavey for her leadership and support. Thanks also to our team alumni Cindy Dones and to our summer interns Ahsan Habib, Mohamed Hefeeda, Nan Jin, Sumantra Kundu, Santashil PalChaudhuri, George Porter, Ioan Raicu, and Aaron Striegel. We are grateful for many fruitful discussions with Danny Cohen, Raphael Rom, Ariel Hendel, Jochen Behrens, Ranjit Henry, Paul Philips, and Tim Knight.


About the Authors

Christoph Schuba is a senior staff engineer in the Security Program Office at Sun Microsystems, Inc. He studied mathematics and management information systems at the Universität Heidelberg and the Universität Mannheim in Germany. As a Fulbright scholar, he earned his M.S. and Ph.D. in Computer Science from Purdue University in 1993 and 1997, performing most of his dissertation research in the Computer Science Laboratory at the Xerox Palo Alto Research Center (PARC). Christoph has taught graduate courses in computer and network security, cryptography, operating systems, and distributed systems at the Universität Heidelberg, Germany, at the International University in Bruchsal, Germany, and at San José State University. His research interests include network and computer system security, high-speed networking, and distributed systems.

Jason Goldschmidt is a member of the technical staff in the Network Systems Group at Sun Microsystems, Inc. Jason joined Sun in the summer of 2000 after receiving his B.S. in Computer Science and Engineering from Bucknell University. His research interests include network processing, protocols, and scalable architectures.

Kevin Kalajan is a strategist for Sun Microsystems in the Market Development Organization in Menlo Park, California. He majored in Computer Science at Columbia University in the City of New York, where he graduated cum laude and with departmental honors. He has nearly 16 years of experience at Sun Microsystems, during which time he has released over 30 networking products and has been issued 5 networking-related patents. His research interests include horizontally scaled systems and wireless networks.

Michael Speer received his B.S. in Computer Engineering in 1986 from the University of California, San Diego. Michael's research interests include scalable network systems and service architectures for a variety of network deployments, including Data Center Networking and Telecommunications Networking. Today, Michael works in Netra Systems and Networking as a Senior Staff Engineer on next generation ATCA blade/chassis platforms. Michael is a member of the IEEE Computer Society, the ACM, and USENIX.
