darwin: customizable resource management for value-added ...hzhang/papers/icnp98.pdf · darwin:...

12
Darwin: Customizable Resource Management for Value-Added Network Services Prashant Chandra , Allan Fisher , Corey Kosak , T. S. Eugene Ng , Peter Steenkiste , Eduardo Takahashi , Hui Zhang Electrical and Computer Engineering, School of Computer Science Carnegie Mellon University Email: prashant,alf,kosak,eugeneng,prs,takahasi,hzhang @cs.cmu.edu Abstract The Internet is rapidly changing from a set of wires and switches that carry packets into a sophisticated infrastructure that delivers a set of complex value-added services to end users. Services can range from bit transport all the way up to distributed value-added services like video teleconferencing, data mining, and distributed interactive simulations. Before such services can be supported in a general and dynamic manner, we have to develop appropriate resource manage- ment mechanisms. These resource management mechanisms must make it possible to identify and allocate resources that meet service or application requirements, support both iso- lation and controlled dynamic sharing of resources across organizations sharing physical resources, and be customiz- able so services and applications can tailor resource usage to optimize their performance. The Darwin project is developing a set of customizable resource management mechanisms that support value-added services. In this paper we present these mechanisms, describe their implementation in a prototype system, and describe the results of a series of proof-of-concept experiments. 1. Introduction There is a flurry of activity in the networking community developing advanced services networks. Although the focus of these efforts varies widely from per-flow service definitions like IntServ to service frameworks like Xbind, they share the overall goal of evolving the Internet service model from what is essentially a bitway pipe to a sophisticated infrastructure capable of supporting novel advanced services. In this paper, we consider a network environment that comprises not only communication services, but storage and computation resources as well. By packaging stor- age/computation resources together with communication ser- vices, service providers will be able to support sophisticated value-added services such as intelligent caching, video/audio transcoding and mixing, virtual reality games, and data min- ing. In such a service-oriented network, value-added services can be composed in a hierarchical fashion: applications invoke This work was sponsored by the Defense Advanced Research Projects Agency under contract N66001-96-C-8528. high-level service providers,which may in turn invoke services from lower-level service providers; the most basic service is provided by bitway and computation providers. Since services can be composed hierarchically, both applications and service providers will be able to combine their own resources with resources or services delivered by other providers to create a high-quality service for their clients. The design of such a service-oriented network poses chal- lenges in several areas, such as resource discovery, resource management, service composition, billing, and security. In this paper, we focus on the resource management architec- ture and algorithms for such a network. In particular, we observe that service-oriented networks have several impor- tant differences from traditional networks that make exist- ing network resource management inadequate. First, while traditional communication-oriented network services are pro- vided by switches and links, value-added services will have to manage a broader set of resources that includes computa- tion, storage, and services from other providers. Moreover, interdependencies between different types of resources can be exploited by value-added service providers to achieve higher efficiency. For example, by using compression techniques, one can make tradeoffs between network bandwidth and CPU cycles. Similarly, storage can sometimes be traded for band- width. Furthermore, value-added services are likely to have service-specific notions of Quality of Service (QoS) that would be difficult to capture in any fixed framework provided by the network. Therefore, the network must allow service providers to make resource tradeoffs based on their own notion of qual- ity. Finally, it must not only be possible to make the above tradeoffs at startup, but also to adapt resource management to the changing network conditions on an ongoing basis. The challenge is to accommodate service-specific qualities of ser- vice and resource management policies for a large number of service providers. To support these new requirements, we argue that resource management mechanisms should be flexible so that resource management policies are customizable by applications and service providers. We identify three dimensions along which customization is needed: space (what network resources are needed), time (how the resources are applied over time), and organization (how the resources are shared with other organi-

Upload: others

Post on 02-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Darwin: Customizable Resource Management for Value-Added ...hzhang/papers/ICNP98.pdf · Darwin: Customizable Resource Management for Value-Added Network Services Prashant Chandra

Darwin: CustomizableResourceManagementfor Value-AddedNetwork Services

PrashantChandra�, Allan Fisher

�, Corey Kosak

�,

T. S.EugeneNg�, PeterSteenkiste

�, EduardoTakahashi

�, Hui Zhang

��

ElectricalandComputerEngineering,� Schoolof ComputerScienceCarnegieMellon University

Email: � prashant,alf,kosak,eugeneng,prs,takahasi,hzhang � @cs.cmu.eduAbstract

TheInternet is rapidly changingfrom a setof wires andswitchesthatcarry packetsinto a sophisticatedinfrastructurethat deliversa set of complex value-addedservicesto endusers.Servicescanrangefrombit transportall thewayup todistributedvalue-addedserviceslike videoteleconferencing,data mining,and distributedinteractivesimulations. Beforesuch servicescan be supportedin a general and dynamicmanner, we haveto develop appropriate resource manage-mentmechanisms.Theseresource managementmechanismsmustmakeit possibleto identify and allocateresourcesthatmeetserviceor applicationrequirements,supportboth iso-lation and controlled dynamicsharing of resources acrossorganizationssharing physicalresources, and be customiz-ablesoservicesandapplicationscantailor resourceusagetooptimizetheir performance.

The Darwin project is developinga set of customizableresource managementmechanismsthat supportvalue-addedservices.In thispaperwepresentthesemechanisms,describetheir implementationin a prototypesystem,anddescribetheresultsof a seriesof proof-of-conceptexperiments.

1. Intr oductionThereis a flurry of activity in the networkingcommunity

developingadvancedservicesnetworks. Although the focusof theseeffortsvarieswidely from per-flow servicedefinitionslike IntServto serviceframeworkslike Xbind, they sharetheoverallgoalof evolving theInternetservicemodelfrom whatis essentiallya bitway pipe to a sophisticatedinfrastructurecapableof supportingnovel advancedservices.

In this paper, we considera network environment thatcomprisesnot only communicationservices, but storageand computationresourcesas well. By packagingstor-age/computationresourcestogetherwith communicationser-vices,serviceproviderswill beableto supportsophisticatedvalue-addedservicessuchasintelligentcaching,video/audiotranscodingandmixing, virtual reality games,anddatamin-ing. In sucha service-orientednetwork,value-addedservicescanbecomposedin ahierarchicalfashion:applicationsinvoke

Thisworkwassponsoredby theDefenseAdvancedResearchProjectsAgencyundercontractN66001-96-C-8528.

high-levelserviceproviders,whichmayin turninvokeservicesfrom lower-level serviceproviders; the mostbasicserviceisprovidedbybitwayandcomputationproviders.Sinceservicescanbecomposedhierarchically, bothapplicationsandserviceproviderswill be able to combinetheir own resourceswithresourcesor servicesdeliveredby otherprovidersto createahigh-qualityservicefor theirclients.

Thedesignof sucha service-orientednetworkposeschal-lengesin several areas,suchasresourcediscovery, resourcemanagement,servicecomposition,billing, and security. Inthis paper, we focus on the resourcemanagementarchitec-ture and algorithmsfor such a network. In particular, weobserve that service-orientednetworkshave several impor-tant differencesfrom traditional networksthat make exist-ing network resourcemanagementinadequate.First, whiletraditionalcommunication-orientednetworkservicesarepro-vided by switchesand links, value-addedserviceswill haveto managea broadersetof resourcesthat includescomputa-tion, storage,andservicesfrom otherproviders. Moreover,interdependenciesbetweendifferenttypesof resourcescanbeexploitedby value-addedserviceprovidersto achieve higherefficiency. For example,by using compressiontechniques,onecanmaketradeoffs betweennetworkbandwidthandCPUcycles. Similarly, storagecansometimesbetradedfor band-width. Furthermore,value-addedservicesarelikely to haveservice-specificnotionsof Qualityof Service(QoS)thatwouldbedifficult to capturein any fixedframework providedby thenetwork.Therefore,thenetworkmustallow serviceprovidersto makeresourcetradeoffs basedon their own notionof qual-ity. Finally, it mustnot only be possibleto makethe abovetradeoffs atstartup,but alsoto adaptresourcemanagementtothe changingnetworkconditionson an ongoingbasis. Thechallengeis to accommodateservice-specificqualitiesof ser-vice andresourcemanagementpoliciesfor a largenumberofserviceproviders.

To supportthesenew requirements,wearguethatresourcemanagementmechanismsshouldbeflexible so that resourcemanagementpolicies are customizableby applicationsandserviceproviders. We identify threedimensionsalongwhichcustomizationis needed:space(what networkresourcesareneeded),time (how theresourcesareappliedover time), andorganization(how theresourcesaresharedwith otherorgani-

Page 2: Darwin: Customizable Resource Management for Value-Added ...hzhang/papers/ICNP98.pdf · Darwin: Customizable Resource Management for Value-Added Network Services Prashant Chandra

zations).Additionally, themechanismsalongall threedimen-sionsneedto to beintegrated,sothatthey canleverageoff oneanother.

TheDarwin project is developinga comprehensive setofcustomizableresourcemanagementtools that supportvalue-addedservices. In this paperwe first identify key resourcemanagementrequirements(Section2). We thenoutline andmotivatethe Darwin architecture(Section3). In Sections4through7 we describeeachof the four componentsof theDarwin architecturein more detail, and we illustrate theiroperationusing several proof of conceptexperiments. Wedemonstratethe integratedoperationof Darwin in Section8,discussrelatedworkin Section9,andsummarizein Section10.

2. CustomizableResourceManagement2.1.ResourceManagementRequirements

Theattributesof value-addedservicesdescribedin theIn-troductionsuggestseveralrequirementsfor resourcemanage-ment. We organizetheserequirementsinto the followingclasses:resourcemanagementin space,acrossorganizationalboundaries,andin time.� space: Servicesneedto be ableto allocatea rich set

of resources(links, switch capacity, storagecapacity,computeresources,etc.) to meet their needs. Re-sourcesshouldbe allocatedin a coordinatedfashion,sincethereareoftendependenciesbetweenresources,andthereshouldbesupportfor globaloptimizationofresourceuse.� organizations: Serviceproviderswill want to reservesomeresourcessothey canmeetcertainminimalQoSrequirements;at the sametime, they will want to dy-namicallyshareresourcesfor efficiency. Mechanismsareneededto isolateor dynamicallyshareresourcesina controlledfashionacrossorganizations.� time: Conditionsin thenetworkanduserrequirementschangeover time. Servicesmustbe able to quicklychangehow resourcesareusedandallocated,so theycanoperateundera broadrangeof conditions.Becauseof thebroaddiversityin servicesandbecauseser-

viceproviderswill wantto differentiatethemselvesfrom theircompetitors,thereis astrongneedfor customizationthatcutsacrossthesethreedimensions:providers will want to cus-tomizewhatresourcesthey allocate,how they shareresourceswith other organizations,andhow they adapttheir resourceuseover time.

2.2.Darwin ResourceManagementMechanismsWhile it mayseemattractiveto haveasingleresourceman-

agementmechanismthatmeetstheabove requirements,thereareseveral reasonswhy an integratedsetof mechanismsispreferable.First,resourcemanagementactivitiescoverawidedynamicrangein bothtimeandspace:sometasksareinvokedrarely(e.g.only atapplicationsetuptime),whereasothersareinvokedvery often (e.g. on every packetpassingthrougha

router); differentactivities also have differentcosts. Simi-larly, sometasksinvolve only local resources(e.g.thosein asingleswitch),whereasothersmight involveresourcesacrossmany networks(e.g.optimizingthepositionof avideoconfer-enceserver in a multicountryconferencecall). Finally, sometasksrequiredetailednetworkknowledge(e.g.selectingtheright QoS parametersfor a guaranteedsession),while oth-ersrequiredomainspecificknowledge(e.g.determiningthecomputationalcostof a MPEG to JPEGconversion). Giventhe diversity of thesetasks,softwareengineeringprinciplesargueagainstbuilding a singlemonolithic,complex resourcemanagementmechanism. Instead,the Darwin architecturecomprisesasmall family of relatedmechanisms:� High-levelresourceallocation: thismechanism,some-

times called a resourceor servicebroker, performsglobalallocationof theresourcesbasedonahigh-levelapplicationrequest,typically usingdomainknowledgefor optimization.Its tasksincludeperformingtradeoffsbetweenservices(e.g.tradingcomputationfor commu-nication) accordingto the application-selectedvaluemetric(e.g.minimizecost,maximizeservicequality).It mustalsoperformcoordinatedallocationsfor inter-dependentresources(e.g.sincetheamountof processorpowerrequiredbyasoftwaretranscoderisproportionalto thebandwidthof videodataflowing throughit, thesetwo allocationsmustbe correlated.)We alsowant toprovide the ability in somecasesto interconnectin-compatibleservicesor endpoints,for exampleauto-matically insertinga transcoderservicebetweentwootherwiseincompatiblevideoconferenceparticipants.Thebrokerin theDarwinsystemis calledXena[8].� Low-level resource allocation: this mechanismis asignalingprotocolthat providesan interfacebetweenXena’s abstractview of the network and low-levelnetwork resources. It has to allocate real networkresources–bandwidth,buffers, cycles,memory–whilehidingthedetailsof networkheterogeneityfromXena.TheDarwinsignalingprotocolis calledBeagle[9].� Runtimeresource management: this mechanismin-jectsapplicationor servicespecificbehavior into thenetwork. Ratherthanperformingruntimeadaptationat flow endpoints(wherethe informationprovidedbynetworkfeedbackispotentiallystaleandinaccurate),itallows rapid local runtimeadaptationat theswitchingpointsin thenetwork’s interior in responseto changesin networkbehavior. Darwin runtime customizationis basedoncontrol delegates, Java codesegmentsthatexecuteon routers.� HierarchicalScheduling: thismechanismprovidesiso-lation and controlledsharingof individual resourcesamongserviceproviders. For eachphysicalresource,sharingand thus contentionexist at multiple levels:at the physicalresourcelevel amongmultiple serviceproviders,at the serviceprovider level amonglower

Page 3: Darwin: Customizable Resource Management for Value-Added ...hzhang/papers/ICNP98.pdf · Darwin: Customizable Resource Management for Value-Added Network Services Prashant Chandra

4 431

2

Comm

Library

Appl

Appl

Library

Comm

Appl

Library

Comm

Beag

Appl

Library

Comm

Appl

Library

Comm

Appl

Library

Comm

Appl

Library

Comm

DelBeag

Hierarc.Local RM

ClassifierSched

Del

Hierarc.Local RM

ClassifierSched

Hierarc.Local RM

ClassifierSched

Beag Del

Xena

Service Providers

Figure 1. Darwin components

level serviceprovidersor organizations,andat theap-plicationlevel amongindividualflows. Thehierarchi-calschedulerallowsvariousentities(resourceowners,serviceproviders,applications)to independentlyspec-ify differentresourcesharingpoliciesandensurethatall thesepolicy requirementsaresatisfiedsimultane-ously. DarwinusestheHierarchicalFairServiceCurve(H-FSC)scheduler.Thesemechanismswere chosenbecausethey cover the

space,organization,and time requirementsoutlined in Sec-tion 2, andbecausethey supportresourcemanagementon avarietyof time scalesandscopes.

The different resourcemanagementmechanismscan betied togetherusing the abstractionof a virtual network. Avirtual network,sometimesalsocalleda virtual mesh[23] orsupranet[12], is thesetof resourcesthatareallocatedandman-agedin an integratedfashionto meetspecificneeds.Virtualnetworkscanbeusedat theapplicationandserviceproviderlevel. An applicationmeshwould be createdin responsetoanapplicationservicerequest,while a servicemeshcapturesthe resourcescontrolledby the provider to meetservicere-quests.A virtual networknot only capturesresources,but italsocanincludestateandcodethatrepresentsor implementsresourcemanagementpolicies that areappropriatefor thoseresources.For example,theDarwindelegatesthatimplementcustomizedruntimeresourcemanagementareassociatedwiththevirtual network. Thecreationof thevirtual networkpro-videsanopportunityto doglobalresourceoptimizationandtoestablishstatein thevirtualnetworkthatwill speedupruntimeadaptation.

3. Darwin3.1.Darwin SystemArchitecture

Figure1 showshow thecomponentsin theDarwinsystemwork togetherto managenetworkresources.Applications(1)runningon end-pointscansubmitrequestsfor service(2) toa resourcebroker(Xena). Theresourcebrokeridentifiestheresourcesneededto satisfytherequest,andpassesthis infor-

Classifier Scheduler

Local Resource Manager

Beagle Delegates

Route lookup

Routing

Control API Event Notification

XenaOther Beagle Entities

OtherRoutingEntities

ApplicationsOther Delegates

Figure 2. Darwin node software architecture

mation(3) to asignalingprotocol,Beagle(4),whichallocatestheresources.For eachresource,Beagleinteractswith a localresourcemanagerto acquireandsetup theresource.Thelo-cal resourcemanagermodifieslocal state,suchasthatof thepacketclassifierandschedulershown in thefigure,sothatthenew applicationwill receive theappropriatelevel of service.The signalingprotocolcan alsosetup delegates. Through-out this process,appropriateresourceauthorizationsmustbemade. Resourcebrokershave to know what resourcepoolsthey are allowed to use,and the signalingprotocol and lo-cal resourcemanagersmustbe able to validatethe resourceallocationrequestandsetupappropriatebilling or charging.

Figure2 shows the componentson eachswitch nodeandtheir most important interactions. The bottom part of thepicturecorrespondsto thedataplane.Thefocusin this com-ponentis on simplicity and high throughput. The top halfcorrespondsto the control plane. Activities in the controlplanehappenon a coarsertime scale;althoughthereis onlya limited setof resourcesavailableto supportcontrol activi-ties,thereis moreroomin thecontrolplanefor customizationandintelligent decisionmaking. We expect that in general,thelocalresourcemanagerwill executeonaCPUcloseto thedatapath.Routing,signalinganddelegates,ontheotherhand,arenot astightly coupledto thedatapath,andcouldrunonaseparateprocessor.

The Darwin architectureis similar in many ways to tra-ditional resourcemanagementstructures. For example,theresourcemanagementmechanismsfor the Internetthat havebeendefinedin theIETFin thelastfew yearsrely onQoSrout-ing [28, 16] (resourcebrokers),RSVP[32] (signalingsimilarto Beagle),and local resourcemanagersthat set up packetclassifiersandschedulers.Themorerecentproposalsfor dif-ferentiatedservice[33] requiresimilar entities. The specificresponsibilitiesof the entitiesdiffer, of course,in thesepro-posals.In Darwin,we emphasizetheneedfor customizationof resourcemanagement,hierarchicalresourcemanagement(link sharing),andsupportfor not only communication,butalsocomputationandstorageresources.

Page 4: Darwin: Customizable Resource Management for Value-Added ...hzhang/papers/ICNP98.pdf · Darwin: Customizable Resource Management for Value-Added Network Services Prashant Chandra

� ��

� �

� ��

� ��

� �

� �� � � � � � � � � �

� � � � � � � � � � ! " � �

Figure 3. Darwin IP testbed topology

3.2.Darwin TestbedTheDarwinsystemhasbeenimplemented;andthroughout

thispaperweshow theresultsof avarietyof experimentscon-ductedon the Darwin networktestbed.The topologyof thetestbedis shown in Figure3. The threeroutersarePentiumII 266 MHz PCsrunningFreeBSD2.2.5. The endsystemsm1 throughm8 are Digital Alpha 21064A 300 MHz work-stationsrunningDigital Unix 4.0. The endsystemsp1 andp2 arePentiumPro200Mhz PCsrunningNetBSD1.2DandFreeBSD2.2.5respectively, ands1 is aSunUltrasparcwork-stationrunningSolaris2.5. All links exceptthatbetweens1andwhiteface arefull-duplex point-to-pointEthernetlinksconfigurableaseither100Mbpsor 10 Mbps. Thelink to s1runsonly at10Mbps.

In the remainderof the paperwe describeXena,Beagle,delegates,andhierarchicalschedulingin moredetail. Figure4showstheexamplethatwewill usethroughoutthepaper. Thetop of the figure shows the input to Xena. The centerpartshows the information that Xenapassesto Beagle,and thebottomsectionshowstheinformationusedto setup thelocalresourcemanager.

4. Xena: ResourceAllocation in SpaceTheprocessof allocatingresources,eitherby anapplica-

tion or provider, hasthreecomponents.Thefirst componentis resourcediscovery: locatingavailable resourcesthat canpotentiallybeusedto meetapplicationrequirements.This re-quiresa resourcediscovery protocol. Thesecondcomponentis solvinganoptimizationproblem:identifying theresourcesthat areneededto meetthe applicationrequirements,whilemaximizingquality and/orminimizing cost. Finally, the re-sourceshave to beallocatedby contactingthe providersthatown them. In our architecture,the first two functionsareperformedby a servicebroker called Xena, while the thirdfunctionis performedby thesignalingprotocolBeagle(Sec-tion 7).

4.1.XenaDesignThe applicationprovides its resourcerequestto Xena in

the form of an application input graph, an annotatedgraphstructurethatspecifiesdesiredservices(asnodesin thegraph)and the communicationflows that connectthem (asedges).

# $ % & ' % ( ) * + , # $ % & ( + , -# . / 0 1 2 1 3 4 5 5 6 7 8 9 2 9 : ; < 7 = > > 9 ? 9 @ ;7 A 9 B ; < 7 = > 9 = C > ;# D / 0 1 2 1 3 4 5 5 6 7 8 9 2 9 : ; < 7 = > > 9 C 9 @ ;7 A 9 B ; < 7 = C 9 C > ;# E F G B F < C ># H F G B F < = >I J K L M L K N O PQ P R S T U V W T X Y Z [ \ ] Y ^ _ ` a b Y \ c

d e e ` Y f a b Y \ c

g h g i g j g kl m g ]

l

n L V o O o W p L W O P q o R N U L J rs t u vw x u y z { | | x } | ~ � }� u � � � z } { | | x } | ~ � }s t u v � z } �s � ~ � � s � � � � � � � � � � �� � � � � � � � � � � �� � � � � � � � � � �

� V � U R o O P  p O o L J r

¡ L o ¢ N O P£ U U P L W O ¢ L R J ¤ V T p ¥ ¦¥ §

¥ ¨¥ ©

¥ ª

] « ¬ ­ ® ¯ l °

] ± ¬ ² ® ¯ l °

³ « ¬ ­ ® ¯ l °

³ ± ¬ ­ ® ¯ l °´ µ a c ¶ f \ Z [ µg jg k

g hg i · ¸¥ ¹

¥ º¥ »

¼ ½ ¾ ¿ À Á Â¥ à ¿ Ä Å Â À ¿

Æ Ç Â È ¿ É Ê Ë ¿Ì Í Î ÏÐ Í Ï Ñ Ò Í

Ð Í Ï Ñ Ò ÍÓ Ô Õ Ï Ò Ö Í × Ô Ø Ù Õ ÍÚ Ï Î Ï Ñ Í Ù

Û Ü ¥ ©¥ ¨

X Y Z [ \ Ý Y ¶ e ` a Þß ² ® ¯ l \ µ ­ ® ¯ l àX Y Z [ \ Ý Y ¶ e ` a Þß ² ® ¯ l \ µ ­ ® ¯ l àX Y Z [ \ ] \ _ µ f [ß ² ® ¯ l \ µ ­ ® ¯ l àX Y Z [ \ ] \ _ µ f [ß ² ® ¯ l \ c ` Þ à] Y ^ _ ` a b Y \ cß a e e á ¶ e [ f Y â Y f b Þ e [ à] Y ^ _ ` a b Y \ cß a e e á ¶ e [ f Y â Y f b Þ e [ à

] Y ^ _ ` a b Y \ cß a e e á ¶ e [ f Y â Y f b Þ e [ à] Y ^ _ ` a b Y \ cß a e e á ¶ e [ f Y â Y f b Þ e [ à

¥ ¦ã ã

ã㣠U U P L W O ¢ L R JI J U N ¢ q o O U pä å æ

ä ç æ

X Y Z [ \ ] Y ^ _ ` a b Y \ cd e e ` Y f a b Y \ c

g h g i g j g k« è « é

± é ê± ë

« è é èl

l m g ]ä ì æ X Y Z [ \ ] Y ^ _ ` a b Y \ cd e e ` Y f a b Y \ c

g h g ig j« è « é

± é ê± ë

ll

í î ï � L � ð V o P L J V ñ ò ó í ô ï £ T U V J ñ � L � ð V o P L J V

· õ

Figure 4. Handling an application service re-quest in Darwin

The annotationscan vary in their level of abstractionfromconcretespecifications(place this nodeat networkaddressX) to moreabstractdirectives(this noderequiresa serviceofclassS). Theseannotationsaredirectlyrelatedto thedegreeofcontrol the applicationwishesto exert over the allocation: ameshwith fewer (or moreabstract)constraintspresentsmoreopportunitiesfor optimizationthana highly specifiedone.

In themostconstrainedspecification,theapplicationspeci-fiesthenetworkaddresseswheretheservicesshouldbeplaced,theservicesthemselves,andtheQoSparametersfor theflowsthatconnectthem. In this styleof specification,Xena’s opti-mizationopportunitiesarelimited to coarserouting: selectingcommunicationserviceproviders for the flows in the mesh.In a lessconstrainedspecification,the applicationcan leavethe networkaddressof a serviceunspecified.This providesan additionaldegreeof freedom: Xenanow hasthe abilityto placenodesand routeflows. In addition,the applicationcanleave the exact QoSparametersunspecified,but insteadindicatethe flow’s semanticcontent. An exampleof a flowspecificationmightbeMotion JPEG,with specificframerateandqualityparameters.In additionto providing sufficient in-formationto maintainmeaningfulapplicationsemantics,thisapproachgivesXenatheopportunityto optimizecostor qual-ity by insertingsemantics-preservingtransformationsto the

Page 5: Darwin: Customizable Resource Management for Value-Added ...hzhang/papers/ICNP98.pdf · Darwin: Customizable Resource Management for Value-Added Network Services Prashant Chandra

mesh.For example,whenaMotion JPEGflow needsto crossa congestednetworksegment,Xenacaninsertmatchedpairof transcodersat two endsof thenetworksegment. Thefirsttranscoderconverts the flow to a more bandwidthefficientcoding format (suchas MPEG or H.261), and then convertit backto JPEGon the far side. Anotheroptimizationis thelowest-costtype unification: a groupof nodesreceiving thesamemulticastflow (say, a videostream)needto agreewiththe senderon the encodingthat is used. If thereis no sin-gle encodingacceptableto all parties,Xenacaninsert“typeconverter”nodesappropriately.

A feasiblesolution to the resourceselectionproblem isonethatsatisfiesall theconstraintsbasedon service-specificknowledgeor applicationspecification,e.g. entitiesat flowendpointsmustagreewith the type of datathat will be ex-changed.Givenasetof feasiblesolutions,Xenaevaluateseachaccordingtotheoptimizationcriteria.In Xena,theseoptimiza-tion criteriaareencodedby anapplication-specifiedobjectivefunctionthatmapscandidatesolutionstoanumericvalue:thisfunctionis composedof asumof terms,whereeachtermrep-resentsthe“quality” of aparticularlayoutchoice.Thisallowsapplicationsto defineservicequalityin anapplication-specificway. For tractability’ssakethe objective functionscurrentlymustbe selectedfrom a small set of built-in functionsthatrepresentuseful objectives; e.g. minimize price, maximizethroughput,minimize network delay. By using applicationspecificcriteriato guideresourceselection,Xenain effect al-lowsapplicationstocustomizethedefinitionof servicequality.

4.2.ImplementationOurcurrentimplementationof Xenaincludestheinterfaces

to the other systementities(applications,serviceproviders,and Beagle),plus the solving engineand optimizationsde-scribedabove. The applicationinterfaceallows the spec-ification of requestthat include nodes, flows, types, andtranscoders.ThecurrentXenaimplementationdoesnothaveageneralresourcediscoveryprotocol. Instead,it offersamech-anismthroughwhich servicescan register their availabilityand capabilities. This information allows Xena to build acoarsedatabaseof availablecommunicationandcomputationresources,andanestimateof their currentutilization. Addi-tionally,Xenamaintainsadatabasethatmapsserviceandflowtypes(e.g.transcodingor transmittingMPEGof acertainqual-ity) to their effective resourcerequirements(e.g.CPUcyclesor Mbps). Finally, thereis adatabasethatcontainsthevarioussemantics-preservingtransformations(e.g. JPEG-to-MPEG)andhow to instantiatethem.

Currently,Xenaexpressesitsoptimizationproblemin termsof an integer linear program,and turns it over to a solverpackage[5] that generatesa sequenceof successively bettersolutionsat successively greatercomputationcost. Sincetheoptimizationproblemis generallyNP-hard,this approachisonly appropriatefor small to mediumsizeproblems. Workis in progressondefiningheuristicsandothersimplifying as-

sumptionsthatwill maketheproblemtractable.Thisapproachwill necessarilytradequality for performance;i.e. ourgoalisto find high quality (but not, in general,optimal)solutionsata reasonablecost.

4.3.Example

Consideran application in which four scientistscom-municatevia a videoconferencingtool that usessoftwareMPEG/JPEGcodersandcollaborateon a distributedsimula-tion thatrunsoveraneight-nodedistributedcomputingtestbed(depictedin Figure4(b)).

Figure4(a) shows the abstractresourcemeshsuppliedtoXena. For the sakeof clarity, somedetail has beenomit-ted;for example,wehavedepictedcommunicationflowing inonedirectiononly. For thevideoconferencingtool, sincethescientistsarephysicallylocatedattheirmachines,theapplica-tion providesto Xenaspecificnetworkaddresses(m1,m2,m5,m6) for the nodesparticipatingin the videoconference.Thenodesarealsoannotatedby therequestedservicetypes(VideoSource,VideoDisplay). Note that the videosourceat m2 iscapableof emittingonly MPEG.Thenodesareconnectedbyamultipoint-to-multipoint flow (only partiallydepicted).Thisflow specificationdescribesonly theconnectivity betweenthenodes;the flow’s exact QoSparametersareleft unspecified.For thedistributedsimulation,theapplicationdoesnotspecifywhatnodesshouldparticipate.

Thescientistsrequesttheminimizecostoptimizationstrat-egy andat thetime of therequest,computationis costlyrela-tive to communicationbecauseof existing loads.This meansthat,eventhoughit is simplestto useMPEGfor thevideo,thelesscomputation-intensiveall-JPEGsolutionismoredesirable(Figure4). Xenacancompensatefor m2’s inability to emitJPEGby employingtheMPEG-JPEGconverterregisteredatm4; despitethe detourthroughm4 andits computationcost,this solutioncomesin at an overall lower cost than the all-MPEGapproach.Moreover, dueto severe loadson m7, m8,andthe routerwhiteface, which areexpressedasa highcost,the leastcostlysolutioncollocatesthedistributedsimu-lation nodeswith thevideoconferencenodes,at m1, m2, m5,m6. Oncenodeplacementandrouteselectionhasoccurred,XenainvokesBeagleto performtheresourceallocation.Theinputsto Beagleareshown in Figure4(b); they areexplainedin moredetailwhenwediscusssignalingin Section7.

5.CustomizableRuntime ResourceManagementWediscusshow delegatescanperformcustomizedruntime

resourcemanagement,describethedelegateruntimeenviron-mentimplementedin Darwin,andillustratedelegateoperationusinganexample.

5.1.Delegates

We usethe term “delegate” for codethat is sentby ap-plicationsor serviceproviders to networknodesin order to

Page 6: Darwin: Customizable Resource Management for Value-Added ...hzhang/papers/ICNP98.pdf · Darwin: Customizable Resource Management for Value-Added Network Services Prashant Chandra

Shared Reservation Sel. Drop A. Sel. Drop0

5

10

15

20

25F

ram

e R

ate

(fps

)

0.4 0.5

8.47.5

19

15

21

18

Clip 1

Clip 2

Figure 5. Video quality under four scenarios

implementcustomizedmanagementof their dataflows. Del-egatesexecuteondesignatedroutersandcanmonitorthenet-work statusandaffect resourcemanagementon thoseroutersthrougha control API, asshown in Figure2. Delegatescanbeviewedasanapplicationof active networks[25]: networkuserscanaddfunctionality to the network. It is, however, averyfocusedapplication.Delegateoperationsarerestrictedtotraffic management,sodelegateswantto learnaboutchangesin resourceavailability andmustbe ableto changeresourceuse.

A critical designdecisionfor delegatesis thedefinitionofthecontrolinterface,i.e. theAPI thatdelegatesuseto interactwith theenvironment.If theAPI is toorestrictive,it will limittheusefulnessof delegates,while toomuchfreedomcanmakethesystemlessefficient. Thedefinitionof the API is drivenby the needto supportresourcemanagementandit includesfunctionsin a numberof categories:� Delegatescanmonitorthenetworkstatus,e.g. conges-

tion conditions.� Delegatescan changehow resourcesare distributedacrossflows: splittingandmergingof flows,changingtheir resourceallocationandsharingrules,controllingselective packetdropping.� Delegatescansendandreceive messages,for exampleto coordinateactivities with peerson otherroutersorto interactwith theapplicationonendpoints.� Finally, delegatescan affect routing, for example toreroutea flow insidethevirtual networkfor loadbal-ancingreasons,or to directa flow to a computeserverthatwill performdatamanipulationsto, for example,addcompressionor encryptionto a flow.

Theuseof delegatesraisessignificantsafetyandsecurityconcerns.Delegatesarein generaluntrusted,sotherouterhasto ensurethat they cannotcorruptstatein therouteror causeother problems. This can be achieved througha variety ofrun time mechanisms(e.g. building “sandbox”that restrictswhat thedelegatecanaccess)andcompiletime mechanisms(e.g. proof carryingcode).A relatedissueis thatof security.At setuptime, therouterhasto makesurethatthedelegateis

Methods Description

add Add nodein schedulerhierarchydel Deletenodefrom schedulerhierarchyset Changeparam.onschedulerqueue

dsc on Activateselective discardin classifierdsc off Deactivateselective discardin classifierprobe Readschedulerqueuestate

reqMonitor Requestasync.cong.notificationretrieve Retrieveschedulerhierarchy

Table 1. Methods available at the Delegate/LocalResour ce Manager API

beingprovidedby a legitimateuser, andat runtime,thelocalresourcemanagerhasto makesurethatthedelegateonly actsonflows thatit is allowedto manage.Theauthenticationandauthorizationof delegateswill beperformedjointly by thesig-nalingprotocolandthelocalresourcemanager;authenticationandauthorizationhave notyetbeenimplemented.

5.2.Implementation

Our currentframework for delegatesis basedon Java andusesthe Kaffe Java virtual machine[29], capableof just-in-time(JIT) compilationandavailablefor many platforms.Thisenvironmentgivesusacceptableperformance,portability,andsafetyfeaturesinheritedfrom thelanguage.Delegatesareex-ecutedasJava threadsinsidethe virtual machine“sandbox.”Currently, delegatescanrun with differentstaticpriority lev-els, althougha more controlledenvironmentwith real-timeexecutionguaranteesis desirable.

The API that givesusersaccessto resourcemanagementfunctionsand event notification is implementedas a set ofnative methodsthat call the local resourcemanager, whichrunsin the kernel. Table1 presentsthe methodsof the Javaclassdelegate.Clschd, which implementsthe API tothepacketclassifierandscheduler. Communicationwasbuilton top of standardjava.net classes.Reroutingfunctionshave not beenimplementedyet. While this environmentissufficient for experimentation,it is not complete. It needssupportfor authenticationand mechanismsto monitor andlimit the amountof resourcesusedby delegates. We alsoexpecttheAPI to evolveover time.

5.3.Experiment

Wepresentanexampleof how delegatescanbeusedto docustomizedruntimeresourcemanagement.

Supposea network carriestwo typesof flows, dataandMPEG video, similar to the example in Figure 4. If pack-ets aredroppedrandomlyduring congestion,the quality ofthe video degradesvery quickly. The reasonis that MPEGstreamshave threetypesof framesof differentimportance:Iframes(intracoded)areself contained,P frames(predictive)usesapreviousI or Pframefor motioncompensationandthus

Page 7: Darwin: Customizable Resource Management for Value-Added ...hzhang/papers/ICNP98.pdf · Darwin: Customizable Resource Management for Value-Added Network Services Prashant Chandra

dependon this previous frame,andB frames(bidirectional-predictive)use(andthusdependon)previousandsubsequentI or P frames. Becauseof theseinter-framedependencies,losingI framesis extremelydamaging,while B framesaretheleastcritical.

WedirectthreeflowsovertheAspen-Timberlinelink of thetestbed:2 MPEG video streamsandan unconstrainedUDPstream.Bothvideosourcessendatarateof 30frames/second,andour performancemetric is the rateof correctlyreceivedframes.Figure5 comparestheperformanceof four scenarios.In thefirst scenario,thevideoanddatapacketsaretreatedthesame,andtherandompacketlossesresultin avery low framerate. In thesecondcase,thevideostreamssharea bandwidthreservationequalto thesumof theaveragevideobandwidths.This improvesperformance,but thevideostreamsareburstyandtherandompacketlossduringpeaktransfersmeansthatlessthana third of the framesarereceived correctly. In thethird scenario,wealsoplaceadelegateonAspen.Thedelegatemonitorsthe lengthof queueusedby video streams,and ifthe queuegrows beyond a threshold,it instructsthe packetclassifierto identify anddropB frames;B framesaremarkedwith anapplication-specificidentifier. Packetdroppingstopswhenthequeuesizedropsbelow asecondthreshold.Figure5shows that is quiteeffective: the framerateroughlydoubles.Thereasonis thatby restrictingthe presenceof B framesinthequeue,theI andP framesareprotectedfrom corruption.

Whiledelegatesprovideanelegantwayof selectivelydrop-pingB frames,thesameeffectcouldbeachievedby associat-ing differentprioritieswith differentframetypes,i.e. layeredvideocoding.In scenariofour weuseadelegateto implementamoresophisticatedcustomizeddroppolicy. In scenariothree,typically too many B framesaredropped,becauseall flowsaresimultaneouslyaffected.A betterapproachis to only droptheB framesof a subsetof thevideostreams,assumingthatis sufficient to relieve congestion.Theadvantageof having adelegatecontrolselectivepacketdroppingis thatit canimple-mentcustomizedpoliciesfor controllingwhatvideostreamsare degraded. Scenariofour in Figure 5 shows the resultsfor a simple“time sharing”policy, whereevery few secondsthe delegateswitchesthe streamthathasB framesdropped.This improvesperformanceby another10-20%.Policiesthatdifferentiatebetweenflowscouldsimilarly beimplemented.

6. Hierar chical SchedulingFor anindividualphysicalresource,sharingandthuscon-

tentionexist at multiple levels: at thephysicalresourcelevelamongmultipleserviceproviders,attheserviceproviderlevelamonglower level serviceprovidersor organizations,andatthe applicationlevel amongindividual flows. Theserela-tionshipscan be representedby a resourcetree: eachnoderepresentsoneentity (resourceowner, serviceprovider, ap-plication),thesliceof thevirtual resourceallocatedto it, thetraffic aggregatesupportedby it, andthepolicy of managingthe virtual resource;eacharc representsthe virtual resource

owner/userrelationships.Figure4(c)illustratearesourcesub-treefor anapplication.

Theability to customizeresourcemanagementpoliciesatall sharinglevelsfor aresourceis oneof thekey requirementsand distinctive featuresfor service-orientednetworks. Thechallengeis to designschedulingalgorithmsthatcansimulta-neouslysatisfydiversepoliciessetby differententitiesin theresourcemanagementtree.ThissectiondescribestheHierar-chicalFair ServiceCurve(H-FSC)schedulingalgorithm(firstdescribedin [24]) whichmeetstheabove constraint.

6.1.H-FSC Algorithm and CustomizationFeaturesIn a H-FSCscheduler, associatedwith eachlink is a class

hierarchythatspecifiestheresourcemanagementpolicy. Eachinterior classrepresentssomeaggregateof traffic flows thataremanagedby an entity suchas the link owner, a serviceprovider, and so on. The resourcemanagementpolicy foreachentity is thenmappedto onethatcanbeimplementedbytheFair ServiceCurve (FSC)algorithm. Thegoal of the H-FSCalgorithmis tosimultaneouslysatisfyall theFSCpoliciesof all entitiesin thehierarchy.

In FSC,a streamö is saidto beguaranteeda servicecurve÷ùøûúýü þ, if for any time ÿ 2, thereexists a time ÿ 1 � ÿ 2, which

is the beginning one of stream ö ’s backloggedperiods(notnecessarilyincluding ÿ 2), suchthatthefollowing holds÷ ø ú ÿ 2 � ÿ 1

þ���� ø ú ÿ 1 � ÿ 2þ � (1)

where� ø ú ÿ 1 � ÿ 2

þis theamountof servicereceivedby sessionö duringthetime interval

ú ÿ 1 � ÿ 2� .Consideran entity that manages� streams,whereeach

streamcanbe an applicationflow, or a flow aggregate. Theamountof resourcethe entity managesis the serviceguar-anteedby its parententity, denotedby

÷ ú ÿ þ . Without goinginto the detailsof the FSCalgorithm,we statethat the FSCalgorithmcan(1) guaranteetheservicecurvesfor all streamsif thestability condition �ø � 1

÷ ø ú ÿ þ�� ÷ ú ÿ þ holds;(2) fairlyallocatetheexcessserviceif someflowscannotusetheirguar-anteedservice.

Wedecidedto usetheH-FSCschedulerin Darwinbecauseof thefollowing two importantpropertiesof H-FSC.First,aslongasthestabilityconditionholds,H-FSCallowsFSCpoli-ciesto becombinedarbitrarily in a hierarchywhile satisfyingall of their requirementssimultaneously. Second,the FSCalgorithm,which is usedat eachlevel in a H-FSCscheduler,providesa generalframework for implementingmany poli-cies. For example,by ensuringa minimumasymptoticslopeof

÷ùø ú ÿ þ independentof the numberof competingstreams,aguaranteedbandwidthserviceis provided to stream ö . Byassigning

÷ùøûú ÿ þ to be÷ ú ÿ þ ü��ùø�� � � 1

� � for all streams,aweightedfair servicein which streamö hasa weightof

�ùøis

implemented.Unlike variousfair queueingalgorithms,FSCdecouplesdelayandbandwidthallocation.

Due to the flexibility of FSCschedulersin implementinggeneralpoliciesandtheflexiblity of H-FSCin arbitrarilyinte-gratingvariousFSCpolicies,variousentities(resourceown-

Page 8: Darwin: Customizable Resource Management for Value-Added ...hzhang/papers/ICNP98.pdf · Darwin: Customizable Resource Management for Value-Added Network Services Prashant Chandra

FFT Only Shared Res: 20+20 Res: 400

50

100

150

200

250

300

350

400C

omm

unic

atio

n tim

e (s

)

2851

311

365

236 232206

166

0.5 KB

1 KB

Figure 6. FFT comm unication times under vari-ous scenarios.

ers, serviceproviders, applications)sharingthe resourceatdifferent levels can independentlycustomizeor specify thepoliciesof managingtheir shareof thevirtual resources.

6.2.ImplementationThe implementationof the H-FSC schedulerin Darwin

requirestwo extracomponents:apacketclassifierandanAPIfor signalingprotocols.

Beforethey canbescheduled,incomingdatapacketsmustbeclassifiedandmappedto packetqueues.To supporttrafficflowsof variousgranularities,wehavedevisedahighlyflexibleclassificationscheme.Eachflow is describedbya9 parameterflowdescriptor: sourceIPaddress,CIDR-stylesourceaddressmask,destinationIP address,CIDR-styledestinationaddressmask,protocolnumber, sourceport number, destinationportnumber, applicationspecificID (carriedas an option in theIP header),andCIDR-styleapplicationspecificID mask. Azero denotesa “don’t care” parameter. Using this scheme,flowssuchasend-to-endTCPconnections,aggregatesof traf-fic betweennetworks,andWWW, FTP, TELNETservicescanbe specified. With the applicationspecificID, we canevensub-divideanend-to-endtraffic flow into applicationspecificsub-flows,suchasthedifferentframetypesin a MPEGflow.ThecontrolAPI exportedby thescheduleris discussedin Sec-tion 5. Theprocessingoverheadassociatedwith classificationandqueueingin our implementationis suitablylow for useinourDarwintestbed.Classificationoverheadis 3 � sperpacketwith cachingon a 200 MHz PentiumPro system. Averagequeueingoverheadis around9 � s whenthereare1000flowsin thesystem.This low overheadallows usto easilysupportlink speedof 100Mbpsin ourprototypenetworktestbed.

6.3.ExperimentsWe presenttheresultsof a setof experimentsthatdemon-

stratetheimportanceof beingableto reserve resourcesandtocontroldynamicbandwidthsharing.

In thefirst setof experiments,we runtwo distributedcom-putations,FastFourierTransforms(FFTs),simultaneouslytoshow theeffect of reservingresources.Hostsm1, m4, andm7participatein thefirst mesh(FFT-1) on0.5 KB of datafor 96

Link

UDP

60 Mbps

FFT-1 FFT-2 UDP

20 Mbps 60 Mbps 40 Mbps

FFT-2FFT-1

(a) (b)

20 Mbps

Link

Figure 7. Resour ce trees in FFT experiments.(a) Per-flow reserv ation. (b) Aggregate reserv a-tion.

iterations.Hostsm2,m5 andm8 participatein thesecondFFTmesh(FFT-2) on 1 KB of datafor 32 iterations(pleasereferto Figure 3 for the testbedtopology). All networkconnec-tions in this setof experimentsareconfiguredat 100 Mbps.Theall-to-all communicationin theFFT oftendominatestheexecutiontime. FFT usesTCP for communicationand thecommunicationpatternis highly bursty: the burst datarateis approximately80 Mbps. Figure6 summarizestheresults.Scenario1 showsthebasecase:in anidlenetworkandwithoutreservations,thecommunicationtimesare27.66s and51.05s for FFT-1 andFFT-2 respectively.

In Scenario2, we introducetwo 90 Mbps UDP trafficstreamsfrom m3 to p1 and from m6 to p1, causingcon-gestionon theinter-routerlinks. SincetheFFTsuseTCPforcommunication,their performancedegradessignificantlybe-causethecompetingtraffic doesnotusecongestioncontrol. Toimprove the FFTs’ performance,protectionboundariesmustbe drawn betweenthe backgroundtraffic and the FFTs. InScenario3, we usereservationsto guaranteethat eachFFTmeshgetsat least20 Mbs of bandwdithon eachinter-router;Figure7(a) shows the correspondingresourcetree. The re-mainingbandwidthis givento thebackgroundtraffic. Underthisper-flow reservationscheme,thecommunicationtimesoftheFFTsimprovedby 24%and36%. In Scenario4, weapplya sharedreservation of 40 Mbps for both FFTs;Figure7(b)shows theresourcetree. SincetheFFT traffic is very bursty,theadvantageof dynamicallysharingthereservedbandwidthis significant: thecommunicationtime is reducedby another13%and29%. We concludethat it is importantto beabletoreserve resources,not only for flows,but alsofor flow aggre-gates.

In oursecondsetof experiments,wedemonstratehow hier-archicalschedulingcanbeusedto controldynamicbandwidthsharing. Considera distributedinteractive simulationappli-cationthatcombinesanFFTwith aninteractivecomponenentsuchasa videostreamor sharedwhite-board.In this experi-ment,theFFT (1 KB of dataand32 iterations)usesm2, m5,andm8, while theinteractiveadaptive componentis modeledby aTCPconnectionfromp2 tos1. In addition,backgroundtraffic is modeledby a full-blast UDP traffic streamfrom m6to p1. With “vanilla” resourcedistribution, we reserve 40Mbpsfor theFFT mesh,1 Mbpsfor theTCPstream,andthe

Page 9: Darwin: Customizable Resource Management for Value-Added ...hzhang/papers/ICNP98.pdf · Darwin: Customizable Resource Management for Value-Added Network Services Prashant Chandra

Non-hierarchical Hierarchicalscheduling scheduling

FFTcomm.time (s) 70.76 73.18TCPBW (Mbps) 1.41 5.30

Table 2. Scheduling performance comparison.

remainingbandwidthis givento thebackgroundUDP stream(Figure8(a)). With this reservationscheme,theTCPstreamachievesa throughputof 1.41Mbps,while theFFT commu-nicationtime is 70.76s(Table2).

With hierarchicalresourcemanagementwegroupflowsasis shown in Figure8(b). While theoverall reservationfor theapplicationremainsthesame,thehierarchyspecifiesthatTCP(FFT)getsthefirst chanceatusingany bandwidthleft unusedby FFT (TCP).The result is that the throughputof the TCPstreamis now 5.30 Mbps, a 276% improvement,while thecommunicationtime of theFFT increasesby aninsignificantamountof 3%to73.18s(Table2). Thisexampledemonstratesthathierarchicalschedulingmakesit possiblefor applicationsto cooperatively sharethereservedresourcesmoreeffectivelywithin theapplicationboundaryandthatspecifyingspecificre-sourcetrees,applicationsandserviceproviderscancustomizehow resourcesareshared.

7. Beagle:SignalingTheBeaglesignalingprotocolprovidessupportfor thecus-

tomizableresourceallocationmodelof Darwin. While tradi-tional signalingprotocolssuchasRSVP[32] andPNNI [3]which operateon individual flows, Beagleoperateson vir-tual networksor meshes.We elaborateon somekey Beaglefeaturesin this section— moredetailscanbefoundin [9].

7.1.BeagleDesign

On the input side,Beagleinterfaceswith Xenato obtainthevirtual networkspecificationgeneratedby Xena.Thevir-tual networkis describedasa list of flows anddelegates,plusresourcesharingspecificationsthatdescribehow flowswithina meshshareresourcesamongstthem. The examplevirtualnetworkin Figure4(b) shows two video flows andtwo dis-tributedinteractive simulationflows. Eachflow is specifiedby a flow descriptoras describedin Section6.2 and infor-mationsuchasa tspecanda flowspecobject,asin the IETFIntServworkinggroupmodel.A delegateis characterizedbyits resourcerequirements(onCPU,memory, andstorage),itsruntimeenvironment,andalist of flowsthatthedelegateneedsto manipulate.The delegateruntimeenvironmentis charac-terizedby a codetype(e.g. Java,Active-X) andruntimetype(e.g. JDK 1.0.2, WinSock 2.1, etc.). The virtual networktypically alsoincludesa numberof designatedrouterswhichidentify themeshcore.In theexample,AspenandTimberlinearethedesignatedrouters.

FFT

UDPUDP

Link Link

FFT TCP

40 Mbps 59 Mbps

(b)(a)

Appl

TCP

1 Mbps

59 Mbps41 Mbps

40 Mbps 1 Mbps

Figure 8. Resour ce trees in distrib uted interac-tive sim ulation experiments. (a) Per-flow reser -vation. (b) Hierarchical reserv ation.After receiving arequest,Beagleissuesasequenceof flow

setupmessagesto thedifferentnodesin themesh,eachmes-sagespecifyingwhat total resourcesare neededon a link,plusa groupingtreethatspecifieshow theseresourcesshouldbe applied. An efficient setupof a meshincludesthe es-tablishmentof the core,and individual flow setupsinitiatedby the sendersor receivers that rendezvouswith the core indesignatedrouters. Our initial implementationusessimpleper-flow setupsanda hardstateapproachbasedon three-wayhandshakesbetweenadjacentrouters. Futurework includesoptimizingmeshsetupandevaluatingtheuseof soft stateforsomeor all of themeshstate.

Oneachnode,Beaglepassesa resourcetrees(Figure4(c))to theLocalResourceManagertoallocateresourcesfor flows.This interfaceis similar to theonedescribedin Section5 (Ta-ble 1). Beaglealsoestablishesdelegatesonto switch nodes(for resourcemanagementdelegates),or computeand stor-agenodes(for dataprocessingdelegates).For eachdelegaterequest,Beaglelocatestheappropriateruntimeenvironment,initializesthelocal resourcemanagerhandlesandflow reser-vationstate,andinstantiatesthedelegate. Thehandlesallowthedelegateto give resourcemanagementinstructionsto thelocal resourcemanagerfor the flows associatedwith it. Inthe future,Beaglewill alsoprovide supportfor communica-tion channelsbetweencontroldelegatesbelongingto thesameserviceandalsoprovide mechanismsto recover from controldelegatefailures.

7.2.ResourceDistributionIn Section6 we describedhow dynamicresourcesharing

canbecontrolledandcustomizedby specifyinganappropri-ate resourcetreefor eachresource.This could be achievedby having applicationsor Xenaspecifythe resourcetreestoBeagle,so that it can install themon eachnode. Therearetwo problemswith this approach.First, how onespecifiesaresourcetree is networkspecificand it is unrealisticto ex-pectapplicationsandbrokersto dealwith this heterogeneity.Second,applicationsandbrokersdonotspecifyeachphysicallink; insteadthey usevirtual links that may represententire

Page 10: Darwin: Customizable Resource Management for Value-Added ...hzhang/papers/ICNP98.pdf · Darwin: Customizable Resource Management for Value-Added Network Services Prashant Chandra

subnets.Beagleusesthe hierarchical groupingtreeabstrac-tion to dealwith bothproblems:it isanabstractrepresentationof thesharinghierarchythatcanbemappedontoeachlink byBeagle.Figure4(b) givesanexample.

The hierarchicalgrouping tree encodesthe hierarchicalsharingstructureof all flows sharinga virtual link. Onceit knows theactualflows thatsharea particularphysicallinkin thenetwork,Beagleprunesthehierarchicalgroupingtree,eliminatingflowsthatdonotexistatthatlink. Todealwith net-work heterogeneity, interior nodesin the hierarchicalgroup-ing treehave genericQoSservicetypesassociatedwith theminsteadof network-specificsharingspecifications.The leafnodesof thegroupingtreerepresentflowswhoseQoSrequire-mentsareexpressedby individualflowspecs.Service-specificrulesdescribehow child nodeflowspecsareaggregatedintoparentnodeflowspecsin deriving aphysicallink resourcetreefrom the groupingtree. This involvespruningthe groupingtreeto eliminateflowsthatdonotexist ataparticularlink andconvertingflowspecsat eachnodeinto appropriatelow-levelscheduler-specificparameters,suchasa weightfor hierarchi-cal weightedfair shareschedulers[4] or a servicecurve fortheHierarchicalFair ServiceCurvescheduler[24].

7.3.Temporal SharingThere are often resourcesharingopportunitieson time

scales larger than what can be expressedin tspecsandflowspecs.For example,a conferencingapplicationmayen-surethatat mosttwo videostreamsareactive at any time,oranapplicationmay like to associatean aggregatebandwidthrequirementfor a group of best-effort flows. Applicationsandresourcebrokerscanspecifythis application-specificin-formation by handingBeagletemporalsharingobjectsthatlist setsof flow combinationsandtheir associatedaggregateflowspecs. Beaglecan then usethis information to reduceresourcerequirementsfor agroupof flowssharinga link.

The temporalsharingobject is similar in spirit to the re-sourcesharingapproachusedin the Tenet-2scheme[17].However, controlof flow routingusingdesignatedroutersal-lowsusto takebetteradvantageof sharingopportunities.ThetemporalsharingobjectalsogeneralizesRSVP’snotionof re-sourcereservationstyles.However, RSVPlimits aggregationtoflowswithin amulticastsessionandtheaggregateflowspecsmustbetheresultof eitherasumor aleastupperbound(LUB)operationon theindividualflowspecs.In Beagle,thetempo-ral sharingobjectcanbeusedto grouparbitraryflows withinanapplicationmeshandany aggregateflowspeccanbeused.As anexample,in Figure4(b),thedistributedinteractivesim-ulation applicationassociatesan aggregateControlledLoadserviceflowspecwith thetwo simulationflows.

The hierarchicalgroupingtree and the temporalsharingobjectsbothdefinewaysin whichanapplicationcantailor re-sourceallocationwithin themesh.However, they areseparateconceptsandareorthogonalto eachother. If both typesofsharingarespecified,theresourcetreeis derivedby applying

Figure 9. Xena output showing service layout

thetemporalsharingspecificationatevery level of thetree.Ifthe temporalsharingobject lists flow groupsthat do not fallunderthesameparentnodein theresourcetree,thetemporalsharingbehavior is ignored.Theexamplein Figure4(b)showsthe useof both sharingobjects. The resultinglink resourcesub-treesat links � 1 and � 2, assumingtheuseof hierarchicalweightedfair shareschedulers[4], areshown in Figure4(c).

7.4.Experimental EvaluationToevaluatetheperformanceof theBeagleprototypeimple-

mentation,we measuredend-to-endsetuplatenciesfor flowsanddelegates,andper-routerflow setupprocessingtimesonthe Darwin testbed.The experimentinvolvedsettingup thevirtual networkshown in Figure4(b). All measurementsre-portedareaveragesfrom 100 runs. The averageend-to-endlatency through two routerswas 7.5msfor flow setupand3.8msfor delegate setup. The flow setupprocessingtimeon eachrouterwas2.4ms,about68%of which wasspentininteractingwith thelocal resourcemanager. This involvesad-missioncontrolandsettingupflow statein thepacketclassifierandscheduler. The currentBeagleprototypesupportsabout425flow setupspersecond.This is comparableto connectionsetuptimesreportedfor variousATM switches.However, weexpectimprovementin theseresultsby optimizingtheimple-mentation.

8. Darwin SystemDemonstrationThissectiondemonstratesthevariouspiecesof theDarwin

systemin action. The example usedin demonstratingthesystemis the sameas the one shown in Figure 4(b). Theservicelayoutproducedby Xenais shown in Figure9 whichisascreenshotof arunningXenasystem.As shown in Figure9,

Page 11: Darwin: Customizable Resource Management for Value-Added ...hzhang/papers/ICNP98.pdf · Darwin: Customizable Resource Management for Value-Added Network Services Prashant Chandra

Figure 10. Scheduler graphical user interfaceshowing the resour ce tree and bandwidth plots

Xenainstantiatesa transcoderdelegateon m4 to convert theMPEG video flow to JPEGformat. Xena also selectsm1and m2 as simulationendpoints. Beagletakesthis servicelayout andallocatesresourcesfor the video andsimulationflows on Timberline and Aspen. Figure 10 is a screenshotof a user-interfaceprogramfor theH-FSCschedulershowingthe classifierand schedulerstateinstalledon link � 1 in thedirectionfrom Aspento Timberline.Figure10alsoshowsthebandwidthplotsfor thevideo(light grey) andsimulation(darkgrey) flowsat thatlink.

9. RelatedWorkTherehasrecentlybeena lot of work aspartof theXbind

[19, 31] andTINA [13, 26] effortsto defineaservice-orientedarchitecturefor telecommunicationnetworks.Therearesev-eraldifferencesbetweenthemandDarwin. First,servicesen-visionedby Xbind andTINA aremostlytelecommunications-oriented. The value-addedservicesdescribedin this paperintegratecomputation,storage,andcommunicationresources.Second,the value-addedservicesin their context areusuallyrestrictedto the control plane,e.g. signaling. Darwin sup-portscustomizedservicesin thedataplane(controlledsharingof resources,processing,andstorage)andthe control plane(signalingandresourcebrokering). Finally, while the focusof both TINA and Xbind is on developing an openobject-orientedprogrammingmodel for rapid creationanddeploy-mentof services,thefocusof Darwinis ondevelopingspecificresourcemanagementmechanismsthatcanbecustomizedtomeetservice-specificneeds.While Xbind andTINA have sofar primarily beenusedasa developmentframework for tra-ditional ATM and telecommunicationnetwork managementmechanisms,they could potentially alsobe usedas a basisfor the developmentof customizableresourcemanagementmechanisms.

Over the pastdecademuchwork hasgoneinto definingQoSmodelsanddesigningassociatedresourcemanagement

mechanismsfor bothATM andIP networks[10,14,27]. Thishasresultedin specificQoSservicemodelsbothfor ATM [1]and IP [7, 30, 22]. This hasalso resultedin the develop-mentof QoSroutingprotocols[3, 28, 21, 20] andsignalingprotocols[2, 3, 32]. A closely relatedissuebeing investi-gatedin the IP communityis link sharing[15], the problemof how organizationscan sharenetwork resourcesin a pre-setway, while allowing the flexibility of distributing unusedbandwidthto otherusers. Darwin differs from theseeffortsin severalaspects.First, while mostof this work focusesoncommuncationservices,Darwin addressesboth bitway andvalue-addedserviceproviders. Second,most QoS modelsonly supportQoSon a per-flow basis [7, 1]. Exceptionsaretheconceptof VP andVC in ATM, andIP differentialservicemodel[6, 18, 11, 33], but theseefforts arevery restrictedei-ther in thetypeof hierarchythey supportor in thenumberoftraffic aggregatesfor whichQoScanbeprovided. In contrast,Darwin usesvirtual networksto defineservice-specificQoSandsupportscontrolledresourcesharingamongdynamicallydefinedtraffic aggregatesof differentgranularities. Finally,while theseeffortsprovideresourcemanagementmechanismson thespace,timeandorganizationaldimensions,themecha-nismsoperatelargelyin anisolatedanduncoordinatedfashion.On theotherhand,Darwin takesan integratedview towardsresourcemanagementalongthesethreedimensions.

The ideaof “active networks”hasrecentlyattracteda lotof attention.In anactivenetwork,packetscarrycodethatcanchangethebehavior of thenetwork[25]. TheDarwin projecttoucheson this conceptin two ways. First, servicedelegatesarean exampleof active packets,althougha very restrictedone: delegatesaretypically downloadedto a specificnodeatserviceinvocationtime,andremainin actionfor theduration.Second,Darwin’s facilities for managingboth computationandcommunicationresourcesvia virtual networkscanhelpto solvekey resourceallocationproblemsfacedby activenet-works.

10.SummaryWe have designeda resourcemanagementsystemcalled

Darwin for service-orientednetworksthat takesanintegratedview towardsresourcemanagementalong space,time, andorganizationdimensions. The Darwin systemconsistsoffour inter-relatedresourcemanagementmechanisms:resourcebrokerscalled Xena, signalingprotocol called Beagle,run-time resourcemanagementusingdelegates,andhierarchicalschedulingalgorithmsbasedonservicecurves.Thekey prop-erty of all thesemechanismsis that they canbe customizedaccordingto servicespecificneeds.While thesemechanismsaremosteffective whenthey work togetherin Darwin, eachmechanismcan also be usedin a plug-and-playfashion intraditional QoS architectures,e.g., Beaglefor RSVP, Xenafor resourcebrokers,and hierarchicalschedulingfor trafficcontrol. We have a proof-of-conceptimplementationof theDarwin systemandpreliminaryexperimentalresultsto vali-

Page 12: Darwin: Customizable Resource Management for Value-Added ...hzhang/papers/ICNP98.pdf · Darwin: Customizable Resource Management for Value-Added Network Services Prashant Chandra

datethearchitecture.TheDarwin prototypedescribedin this paperimplements

thevisionanddemonstratessomeof thepossibilities,butmuchworkremainstobedone.Futureversionswill befarmorescal-able,bothin termsof routingin largetopologiesandin termsof aggregateprocessingof largenumbersof flows. Securityfeaturesarecurrentlyrudimentary, andexplicit authentication,authorizationandencryptionmethodsremainto be incorpo-rated. Hierarchicalresourcemanagementhas beenimple-mentedonly for networkcomponents,andmustbeextendedto computationandstorageresources.Finally, a numberoftopics, while important to a completenetwork, are beyondthe scopeof the the currentproject: tools for servicecre-ation,mechanismsfor automateddiscovery of resources,anddetailedaccountingof resourcecommitmentanduse.

References[1] ATM Forum Traffic ManagementSpecificationVersion4.0,

October1995.ATM Forum/95-0013R8.[2] ATM User-NetworkInterfaceSpecification.Version4.0,1996.

ATM Forumdocument.[3] PrivateNetwork-NetworkInterfaceSpecificationVersion1.0,

March1996.ATM Forumdocument- af-pnni-0055.000.[4] J. Bennettand H. Zhang. Hierarchicalpacketfair queueing

algorithms.In Proceedingsof theSIGCOMM’96 Symposiumon CommunicationsArchitecturesand Protocols, pages143–156,Stanford,August1996.ACM.

[5] M. Berkelaar. lp solve: aMixedIntegerLinearProgramsolver.ftp://ftp.es.ele.tue.nl/pub/lpsolve/,September1997.

[6] S.Blake. Someissuesandapplicationsof packetmarkingfordifferentiatedservices,July 1997. InternetDraft.

[7] R. Braden,D. Clark,andS.Shenker. Integratedservicesin theinternetarchitecture:an overview, June1994. InternetRFC1633.

[8] P. Chandra,A. Fisher, C. Kosak,, andP. Steenkiste.Networksupportfor application-orientedqos. In Proceedingsof SixthInternationalWorkshoponQualityof Service, pages187–195,Napa,California,USA, May 1998.IEEE.

[9] P. Chandra,A. Fisher, andP. Steenkiste.Beagle: A resourceallocationprotocolfor anapplication-awareinternet.TechnicalReportCMU-CS-98-150,CarnegieMellon University, August1998.

[10] D. Clark,S.Shenker,andL. Zhang.Supportingreal-timeappli-cationsin anintegratedservicespacketnetwork: Architectureandmechanism.In Proceedingsof ACMSIGCOMM’92, pages14–26,Baltimore,Maryland,Aug. 1992.

[11] D. ClarkandJ.Wroclawski. An approachto serviceallocationin the internet,July 1997. Internetdraft, draft-clark-diff-svc-alloc-00.txt,work in progress.

[12] L. Delgrossiand D. Ferrari. A virtual network serviceforintegrated-servicesinternetworks.In Proceedingsof the7thIn-ternationalWorkshopon NetworkandOperatingSystemSup-port for Digital Audio and Video, pages307–311,St. Louis,May 1997.

[13] F. Dupuy, C. Nilsson, and Y. Inoue. The tina consortium:Towardnetworkingtelecommunicationsinformationservices.IEEE CommunicationsMagazine, 33(11):78–83,November1995.

[14] D. Ferrari and D. Verma. A schemefor real-timechannelestablishmentin wide-areanetworks. IEEEJournalonSelectedAreasin Communications, 8(3):368–379,Apr. 1990.

[15] S.FloydandV. Jacobson.Link-sharingandresourcemanage-mentmodelsfor packetnetworks.IEEE/ACM TransactionsonNetworking, 3(4),Aug. 1995.

[16] R.Guerin,D. Williams,T. Przygienda,S.Kamat,andA. Orda.QoSRoutingMechanismsandOSPFExtensions.IETF Inter-netDraft<draft-guerin-qos-routing-ospf-03.txt>,March1998.Work in progress.

[17] A. Gupta,W. Howe, M. Moran, and Q. Nguyen. Resourcesharingin multi-partyrealtimecommunication.In Proceedingsof INFOCOM95, Boston,MA, Apr. 1995.

[18] K. Kilkki. Simpleintegratedmediaaccess(sima),June1997.InternetDraft.

[19] A. Lazar,K.-S.Lim, andF. Marconcini.Realizingafoundationfor programmabilityof atm networkswith the binding archi-tecture. IEEE Journal on SelectedAreasin Communication,14(7):1214–1227,September1996.

[20] Q. Ma and P. Steenkiste.On path selectionfor traffic withbandwidthguarantees.In Fifth IEEEInternationalConferenceonNetworkProtocols, pages191–202,Atlanta,October1997.IEEE.

[21] Q. Ma andP. Steenkiste.Quality of serviceroutingfor trafficwith performanceguarantees.In IFIP InternationalWorkshopon Quality of Service, pages115–126,New York, May 1997.IFIP.

[22] S.Shenker, C. Partridge,andR. Guerin.Specificationof guar-anteedqualityof service,September1997.IETF RFC2212.

[23] P. Steenkiste,A. Fisher, and H. Zhang. Darwin: Resourcemanagementin application-aware networks. TechnicalRe-portCMU-CS-97-195,CarnegieMellon University, December1997.

[24] I. Stoica,H. Zhang,andT. S.E. Ng. A HierarchicalFair Ser-viceCurveAlgorithmfor Link-Sharing,Real-TimeandPriorityService.In Proceedingsof theSIGCOMM’97 SymposiumonCommunicationsArchitecturesandProtocols, pages249–262,Cannes,September1997.ACM.

[25] D. TennenhouseandD. Wetherall.Towardsandactivenetworkarchitecture.ComputerCommunicationReview, 26(2):5–18,April 1996.

[26] Tinaconsortium.http://www.tinac.com.[27] J.S.Turner. New directionsin communications(or whichway

to the information age?). IEEE CommunicationsMagazine,24(10):8–15,October1986.

[28] Z. Wang and J. Crowcroft. Quality-of-ServiceRouting forSupportingMultimediaApplications.IEEEJSAC,14(7):1288–1234,September1996.

[29] T. Wilkinson. KAFFE - A virtual machineto run Java code.http://www.kaffe.org/.

[30] J. Wroclawski. Specificationof the controlled-loadnetworkelementservice,September1997.IETF RFC2211.

[31] ProjectX-Bind. http://comet.ctr.columbia.edu/xbind.[32] L. Zhang,S. Deering,D. Estrin,S. Shenker, andD. Zappala.

RSVP:A New ResourceReservationProtocol.IEEECommu-nicationsMagazine, 31(9):8–18,Sept.1993.

[33] L. Zhang, V. Jacobson,and K. Nichols. A Two-bit Dif-ferentiatedServicesArchitecturefor the Internet,December1997. Internetdraft, draft-nichols-diff-svc-arch-00.txt,Workin progress.