Application Portability in Cloud Computing: An Abstraction Driven Perspective

Ajith Ranabahu, E. Michael Maximilien, Amit Sheth, and Krishnaprasad Thirunarayan

Abstract—Cloud computing has changed the way organizations create, manage, and evolve their applications. While the abundance of computing resources at low cost opens up many possibilities for migrating applications to the cloud, this migration also comes at a price. Cloud applications, in many cases, depend on certain provider specific features or services. In moving applications to the cloud, application developers face the challenge of balancing these dependencies to avoid vendor lock-in. We present an abstraction-driven approach to address the application portability issues and focus on the application development process. We also present our theoretical basis and experience in two practical projects where we have applied the abstraction driven approach.

Index Terms—Cloud computing, Domain Specific Languages, Application Generation

1 INTRODUCTION

Cloud computing is one of the most notable evolutions in computing. The availability of seemingly unlimited, readily provisioned, pay-per-use computing resources has not only spawned a number of new industries but has also changed the mindset of all information technology (IT) centric businesses. Larger tech businesses now offload their overflow computing requirements to computing clouds, while technology startups use them to establish their IT infrastructure without a heavy up-front capital expenditure.

The adoption of clouds by organizations, however, does not imply that all the challenges in using computing clouds have been well understood. Clouds offer access to cheap, abundant computing resources, but the most appropriate utilization of these resources is still limited by the unavailability of relevant software stacks. For example, infrastructure as a service (IaaS) clouds offer the ability to quickly and programmatically provision computing instances, but it is up to the user programs to make use of this capability, say by dynamically load balancing.

Some cloud service providers offer platform services where the difficulties in scaling and load management are transparent to certain types of user programs, e.g., Web applications. User programs merely adhere to a set of predefined software frameworks and the platform takes care of the otherwise mundane tasks such as load balancing. These platforms, however, are focused on limited technical domains and thus are not applicable across all types of applications. For example, Google App Engine (GAE)¹, one of the leading cloud platform service providers, supports only a limited set of Web development frameworks and two data storage options. Similarly, the Windows Azure cloud² primarily supports the .NET platform and has limited support for other languages and frameworks.

• Ajith Ranabahu, Amit Sheth and Krishnaprasad Thirunarayan are with the Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis) Center, Wright State University, Dayton, OH 45435. E-mail: {ajith,amit,tk}@knoesis.org

• E. Michael Maximilien is with IBM Research at 650 Harry Road, San Jose, CA 95120. E-mail: [email protected]

As illustrated by these examples, the current cloud computing landscape consists of a large number of heterogeneous service offerings, ranging from infrastructure oriented services to specific software based services. These differences result in application architectures dictated by service provider specific features, ultimately resulting in non-portable, vendor-locked applications. Many incidents have repeatedly shown that this is indeed a serious pitfall in adopting the cloud. Two incidents in this regard, recorded publicly, are listed below.

1) The Amazon Elastic Compute Cloud (EC2) became unavailable on April 21st, 2011 for about 12 hours due to a network misconfiguration³. Many popular startups, including Foursquare, Reddit, and Quora, were unable to function during this period. None of these services was able to restore its functionality until EC2 was fixed.

2) The Microsoft Azure cloud became unavailable for about 3 hours on February 28th, 2012 due to a leap year (February 29th) time calculation bug⁴ in the Azure platform software. Microsoft cloud services were not fully restored until a fix was deployed. Microsoft later issued service credits to all the customers affected by the outage.

1. http://code.google.com/appengine/
2. http://www.windowsazure.com/en-us/
3. http://aws.amazon.com/message/65648/
4. http://goo.gl/tt0Pp

Digital Object Identifier 10.1109/TSC.2013.25 1939-1374/13/$31.00 © 2013 IEEE

IEEE TRANSACTIONS ON SERVICES COMPUTING
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


These incidents provide evidence that being locked into a cloud service provider is indeed an important issue to consider. We take a fundamentally different mindset and use an application oriented perspective where users focus on describing the application behavior rather than its implementation. This matches well with the perspective of typical cloud service consumers, who expect a certain functionality from their application and are oblivious to the actual underlying service provisioning mechanism. The service providers, on the other hand, design their interaction patterns, input parameters and service interfaces taking a utilization perspective. This mismatch is essentially the root cause of the majority of the portability challenges we see today in cloud computing.

In our solution, cloud service consumers use abstract languages to specify their programs. A software infrastructure transforms the user program specifications to the required, provider specific software components. These transformations are generic and can mostly be automated. If these specifications are kept at a sufficiently high level such that cloud-specific features are not directly exposed, they can be automatically compiled and customized for a variety of cloud environments. This process is described in detail in Section 3.

There are many aspects to consider in achieving this grand vision. In this paper, we focus only on one salient aspect that we consider to be the primary building block of cloud program portability: functional specification abstraction. Functional abstractions provide high level specifications of the core business logic of a program.

Our contributions in this paper are the following:

1) We present a set of fundamental transformational conditions that apply to translating abstract functional specifications to executable cloud programs.

2) We outline the practical impact of these conditions and their applicability in determining the feasibility of an abstraction driven solution for a given domain.

3) We present the metrics and lessons learned from two successful projects that use abstract specifications to generate cloud applications.

The upcoming sections are organized as follows. We introduce the background material in Section 2. Next, we provide an overview of the use of abstractions and establish the core area of interest in Section 3. Sections 4 and 5 discuss in detail the theoretical aspects of language transformations and their impact. Section 6 presents an evaluation using two practical applications, followed by Section 7 where we discuss our experience. Finally, we present a discussion (Section 8) and conclude.

2 BACKGROUND

Our approach has its basis in the fundamentals of programming languages. In this section, we briefly cover the necessary details as well as the pertinent background on languages.

2.1 Defining the Abstract Concept

We use a modified version of the definition of abstract concept provided by Kleppe [1].

The abstraction level of a concept present in a software language is the amount of detail required to either represent (for data) or execute (for processes) this concept in terms of the perceived (virtual) zero level.

The perceived zero level in this definition refers to the baseline that the abstraction level is measured from. The absolute zero line for a software language is the computer hardware. In other words, every computer program has to be converted to hardware interpretable machine instructions if it is to be executed. Yet, with the advancement and sophistication of high level computer languages and their tools, the zero line may be considered at a much higher level when constructing programs. This elevated zero line is what is referred to as a virtual zero line [1].

A fitting example can be found in object oriented programming (OOP). In OOP, all the program design happens using objects as the primitive building blocks, thus perceiving a virtual zero line at the level of objects. The objects, defined further in terms of data structures to hold their state and methods to transform their state, will obviously need to be mapped to memory and instructions that can be executed on hardware. However, such transformations can be mechanically and transparently performed by established software frameworks (compiler, linker libraries) and hence, the program designer can conveniently assume objects to be the lowest level of abstraction.

2.2 Language Specification

Language theory states that a language specification (L) requires three elements to be described:

1) An abstract syntax model (ASM): This is the high level model of the language, often invisible and used directly inside the language interpreter mechanism. Also known as the conceptual model, the ASM can be represented as a directed, labeled graph.

2) One or more concrete syntax models (CSM): This is generally the syntax seen by the programmers and what is typically referred to as the language. A single ASM may have more than one related CSM.

3) A set of transformations (mapping): A mapping from the ASM to the CSM (defined per CSM) specifying the conversion of the ASM to concrete syntax and vice versa. These transformations are reversible, i.e., a program representation can be transformed losslessly from ASM to CSM and vice versa.
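The relationship between one ASM and several CSMs can be illustrated with a minimal Python sketch. It assumes a toy language of integer literals and binary expressions; the names `Num`, `BinOp`, `to_prefix`, `to_infix`, and `from_prefix` are hypothetical and chosen only for illustration. The sketch shows two concrete syntaxes rendered from the same abstract model, and a lossless round trip for one of them.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    """ASM node for an integer literal (hypothetical toy language)."""
    value: int


@dataclass(frozen=True)
class BinOp:
    """ASM node for a binary expression."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]


def to_prefix(e: Expr) -> str:
    """ASM -> first concrete syntax (parenthesized prefix notation)."""
    if isinstance(e, Num):
        return str(e.value)
    return f"({e.op} {to_prefix(e.left)} {to_prefix(e.right)})"


def to_infix(e: Expr) -> str:
    """ASM -> second concrete syntax (fully parenthesized infix notation)."""
    if isinstance(e, Num):
        return str(e.value)
    return f"({to_infix(e.left)} {e.op} {to_infix(e.right)})"


def from_prefix(text: str) -> Expr:
    """First concrete syntax -> ASM, the reverse direction of to_prefix."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos: int):
        tok = tokens[pos]
        if tok == "(":
            op = tokens[pos + 1]
            left, pos = parse(pos + 2)
            right, pos = parse(pos)
            if tokens[pos] != ")":
                raise ValueError("expected ')'")
            return BinOp(op, left, right), pos + 1
        return Num(int(tok)), pos + 1

    expr, _ = parse(0)
    return expr


asm = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
print(to_prefix(asm))
print(to_infix(asm))
print(from_prefix(to_prefix(asm)) == asm)
```

Because both printers read the same model, adding a further concrete syntax would only require another printer and parser pair; the abstract model itself would stay unchanged.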

There are three more elements relevant to a language specification:

1) A semantic description: a description of the meaning of the program or model, including a model of the semantic domain.

2) Required language interfaces: a definition of what the programs need from programs written in other languages.

3) Offered language interfaces: a definition of what parts of the programs are available to programs written in other languages.

The semantic description warrants our special attention. All language specifications require a semantic specification, yet this semantic specification is hardly ever provided in a formal notation. Instead, it is provided as prose, i.e., a rigorous but informal text for the benefit of the programmers at large. Formal semantic specifications are indispensable only for advanced developmental activities such as the construction of compilers and interpreters, program transformation and verification tools, etc.

We direct the reader to Kleppe [1] for a thorough coverage of the fundamental language theory concepts.

2.3 Domain Specific Language

Van Deursen et al. [2] state that

a domain-specific language (DSL) is a programming language or executable specification language that offers, through appropriate notations and abstractions, expressive power focused on, and usually restricted to, a particular problem domain.

A domain in this case is the set of entities and their associated operations, selected specifically to represent and operate on these entities in a restricted context. Domains can be of varying degrees of granularity. Some domains are highly constrained, while others have a much larger scope. For example, matrices would be a more constrained domain while mathematics is a domain with a much larger scope.

A DSL, although technically a programming language, has an entirely different focus. Hence, some activities considered critical in the construction of a general purpose programming language (GPPL) are not treated with similar importance in a DSL. In the next section, we discuss the concepts of modeling and establish the relationship between a GPPL and a DSL.

2.4 Modeling and Metamodeling

A model can be thought of as an abstraction of a system or its environment or both [3]. Models are represented in many forms, ranging from textual languages to graphical notations. Unified Modeling Language (UML) [4] is one such modeling language software engineers are familiar with.

A metamodel defines the abstractions used by the modeling notation. The metamodel acts as the schema for a model, defining the permissible components and the constraints applicable on the model. Thus, based on conformity, one can create a hierarchy of models [5] as illustrated in Figure 2(a). Specifications higher than meta-metamodels are typically not useful in the context of model driven software development.

Metamodels are important in our research since we consider a DSL to be a representation of a domain model. Hence, the ASM of the DSL is the domain metamodel.

Fig. 1: Standard metamodel for Mathematical expressions

To illustrate the relationship between metamodels and ASMs, consider the simple domain of unary and binary mathematical expressions. The high level concepts that encapsulate this domain are operator and expression. Expression may further be specialized as unary and binary expressions, and number may also be added as a subclass of expression to support literal values. These concepts, arranged in a graph (Figure 1), define the metamodel for mathematical expressions in a graphical form (see [1] for a detailed version of this example). We have omitted some details for brevity and depict this model as a directed, labeled graph.

Fig. 2: The modeling hierarchy and its relationship to model transformation. (a) The typical four layers of modeling. (b) The relationship between model, metamodel, and model transformation.

Now assume that we want a DSL to describe mathematical expressions. We would need to model the relationship between various components of mathematical expressions and represent these expressions in a syntax agnostic manner, i.e., we need to construct an ASM for this language. It is easy to see that the components of this ASM are essentially the same as the metamodel. For example, each literal number present in a mathematical expression will need to be represented as an instance of number type, which is one of the components we defined in the expression metamodel.

2.5 Model Transformations

A model transformation is a mapping from one model to another, defined on the metamodels but operated on the respective metamodel instances. This relationship is illustrated in Figure 2(b).

There are multiple methods of model transformation, and graph based model transformation is only one of them. For our work, we selected graphs as the primary representation of the models and hence limit our focus to graph transformations. Czarnecki et al. [3] provide an exhaustive list of transformational techniques used for model transformations.

Also, we do not consider a specific transformation implementation technique, say a rule based mechanism. Our focus is only on the conditions and special requirements that apply to these transformations, rather than how they are performed.
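The phrase "defined on the metamodels but operated on the instances" can be made concrete with a small Python sketch. Everything here is an assumption made for illustration: the dictionary-based model encoding, the `transform` function, and the target vertex names (`Literal`, `FunctionCall`, `FunctionName`) do not come from the paper. The rule table refers only to metamodel vertices, while the function is applied to a model instance.

```python
# Hypothetical rule table: source metamodel vertex -> target metamodel vertex.
EXPRESSION_TO_CALL_RULES = {
    "Number": "Literal",
    "Operator": "FunctionName",
    "BinaryExpression": "FunctionCall",
}


def transform(model, type_rules):
    """Apply a metamodel-level mapping to one model instance.

    `model` is a dict with "nodes" (node id -> source metamodel type)
    and "edges" (list of (source id, label, target id) triples).
    """
    missing = {t for t in model["nodes"].values() if t not in type_rules}
    if missing:
        # The mapping is not defined for every construct used by the model.
        raise ValueError(f"no rule for source types: {sorted(missing)}")
    return {
        "nodes": {n: type_rules[t] for n, t in model["nodes"].items()},
        "edges": list(model["edges"]),  # structure is preserved as-is
    }


# A model instance for the expression 1 + 2.
one_plus_two = {
    "nodes": {
        "e1": "BinaryExpression",
        "op": "Operator",
        "n1": "Number",
        "n2": "Number",
    },
    "edges": [("e1", "operator", "op"), ("e1", "left", "n1"), ("e1", "right", "n2")],
}

print(transform(one_plus_two, EXPRESSION_TO_CALL_RULES)["nodes"])
```

The sketch only relabels vertices; a realistic graph transformation would also rewrite structure, which is outside the scope of this illustration.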

2.6 Semantics of a DSL: Metamodeling vs Traditional Semantics

Metamodeling is the preferred way of establishing the semantics of a DSL. This is an alternative to the rigorous semantic representations used in traditional language construction. GPPLs are not confined to a domain and thus require domain independent specifications to establish their semantics formally. However, DSLs always represent a domain, and the domain metamodel is in fact sufficient to represent the semantics of the language. We direct the reader to Nordstrom [6] for a detailed discussion on the relationship of modeling and languages.

3 ABSTRACTION DRIVEN PORTABILITY

We now outline the use of abstractions in achieving cloud application portability. First we provide an overview of the process based on abstractions and then present a formal definition for a cloud application.

3.1 Overview of Using Abstractions

The essence of our approach is using an abstract specification, typically in the form of a DSL script, to generate platform specific but functionally equivalent executable applications. The high-level process is illustrated in Figure 3.

Fig. 3: Using domain driven abstractions to generate executable cloud programs

The source program (script) is composed using a DSL, taking a domain perspective. This program is free of any concept specific to a cloud environment. A transformation and code generation engine mechanically converts the DSL script to the target platform specific code. During this process, the specifics of the target platform remain transparent to the program composer, thus there is no lock-in. When the application needs to be ported to a different target platform, the composer simply reuses the original source program to regenerate a functionally equivalent application for the target platform, thus achieving application portability.

In reality, the abstractions may not provide complete coverage of all required features. The generated programs can provide generic functionality out-of-the-box by using sensible defaults, but they may not be able to exploit highly specialized features present in the target platform. As an alternative, this mechanism can be used to produce boilerplate code that covers the otherwise mundane work to be done by developers. The generated programs can have well-defined placeholders to support further customizations.
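The generation process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine used in the projects reported later: the specification format `SPEC`, the two generators, and the choice of targets (a Python data class and an SQL table definition) are all hypothetical. It shows one platform-neutral specification producing two target artifacts, one of which carries a placeholder for manual customization.

```python
# Hypothetical platform-neutral specification of a single domain entity.
SPEC = {"entity": "Task", "fields": {"title": "string", "done": "boolean"}}

# Per-target type mappings (assumptions made for this sketch).
PY_TYPES = {"string": "str", "boolean": "bool"}
SQL_TYPES = {"string": "TEXT", "boolean": "BOOLEAN"}


def generate_python(spec):
    """Generate source code for a first, hypothetical target platform."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "",
        "@dataclass",
        f"class {spec['entity']}:",
    ]
    for name, kind in spec["fields"].items():
        lines.append(f"    {name}: {PY_TYPES[kind]}")
    # Well-defined placeholder for hand-written, platform-specific code.
    lines.append("    # CUSTOMIZE: platform-specific behavior goes here")
    return "\n".join(lines) + "\n"


def generate_sql(spec):
    """Generate a table definition for a second, hypothetical target."""
    columns = ",\n".join(
        f"    {name} {SQL_TYPES[kind]}" for name, kind in spec["fields"].items()
    )
    return f"CREATE TABLE {spec['entity'].lower()} (\n{columns}\n);\n"


print(generate_python(SPEC))
print(generate_sql(SPEC))
```

Porting to another target in this sketch means adding one more generator function; `SPEC` is reused unchanged, which mirrors the regeneration step described in the text.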

3.2 Different Aspects of a Cloud Application

Given that a cloud application is a complex composite entity, first we identify what constitutes a cloud application, thereby establishing the different aspects to be modeled. We identified four types of semantic aspects for a cloud application, namely data, functional, non-functional (QoS) and system aspects [7].

1) Data aspects refer to the core data structures and the behaviors of data items in the application.

2) Functional aspects refer to the core logic of the program, i.e., the operations and data manipulations expected from the application.

3) Non-functional aspects refer to the quality of service (QoS) concerns such as security.

4) System aspects refer to the specific system level details relevant to the application.

We present an example application to illustrate the differences in these aspects. The application of choice is a two-tier numerical data processing program that we call spectra processor (SP). SP is a real application being used as a back-end processing component for an experimental tool for bioinformaticians.

SP consists of a service interface, exposed via HTTP. The core function of SP is performing a set of statistical operations over large amounts of data, which can be uploaded via files or character large objects (CLOBs). SP uses a Hadoop cluster in the back-end; thus, submitting a processing task to SP initiates the following sequence of operations.

1) Obtain the data via appropriate methods and place it in the Hadoop file system.

2) Start the Hadoop process to perform the requested numerical operations. A unique numeric token, which is to be used to retrieve the results of the computation, is issued to the job submitter at this point.

3) Collect the logs and the data output and place them in a non-distributed file system when the processing concludes.

4) Provide the log output and data output when the user requests them by passing the numeric token.
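The token-based interaction in the steps above can be sketched as follows. This is only an in-memory stand-in written for illustration: the class `SpectraProcessorSketch` and its methods are hypothetical, the Hadoop cluster and both file systems are replaced by a Python dictionary, and the mean and population standard deviation stand in for the unspecified statistical operations.

```python
import itertools
import statistics


class SpectraProcessorSketch:
    """In-memory stand-in for the SP workflow (no HTTP, no Hadoop)."""

    def __init__(self):
        self._tokens = itertools.count(1)  # source of unique numeric tokens
        self._store = {}                   # stands in for the result file system

    def submit(self, values):
        """Steps 1-3: accept data, run the computation, keep log and output."""
        token = next(self._tokens)
        output = {
            "mean": statistics.mean(values),
            "pstdev": statistics.pstdev(values),
        }
        log = f"processed {len(values)} values"
        self._store[token] = (log, output)
        return token

    def fetch(self, token):
        """Step 4: return the log and the data output for a token."""
        return self._store[token]


sp = SpectraProcessorSketch()
token = sp.submit([1, 2, 3, 4])
log, output = sp.fetch(token)
print(token, log, output["mean"])
```

In the real application the computation is asynchronous, so a retrieval request may arrive before the results exist; the sketch computes eagerly to stay short.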

A high-level architectural overview of SP is illustrated in Figure 4. Each of the aspects is highlighted over the actual application component. The core function expected from SP is numerical data processing, thus the implementation of this logic constitutes the functional aspects. The representation and storage of the numerical data are considered under the data aspects. QoS capabilities of the interface are considered part of the non-functional aspects and the system configuration is considered part of the system aspects.

These aspects are orthogonal to each other and thus can be addressed independently at design time. For example, non-functional aspects, such as security and privacy, are layered on top of the functional aspects and can be varied while other aspects remain unchanged. Similarly, the system aspects can change while the functional, data or non-functional aspects remain unchanged.

Note that the independence of these aspects may not be visible in the implementation. For example, secure access to the service interface in SP is a non-functional consideration at design time, but it requires a system level configuration change to enable an encrypted connection for the Web server. The same example also provides evidence of the relative independence of these aspects. Securing the endpoint does not affect the functional or data aspects at all, even in the implementation.

A similar notion of designing high level details first and using tools to insert necessary code changes is used by the Aspect Oriented Programming (AOP) [8] community. Although the concept of aspect in the AOP context is not the same as ours, the success of AOP provides evidence that such separation is usable in practice.

Fig. 4: Four types of aspects, highlighted for the Spectra Processor (SP) application

3.3 Formal Specification for a Cloud Application

Now we present a formal definition of a cloud application, based on the following assumptions.

1) Each semantic aspect can be expressed using a DSL.

2) The abstract syntax model (ASM) of each of these DSLs (i.e., the respective domain metamodels) can be represented using a graph.

Thus, we use the graph representation of a language ASM due to its generality and flexibility. Definition 1 presents this formally.

Definition 1. An Abstract Syntax Model (ASM) is a directed, labeled graph $G = (V, E, l_E, l_V)$, where

• $V$ is a set of vertices,
• $E$ is a set of ordered pairs of vertices, called edges,
• $l_E$ is a labeling function defined on $E$ that assigns labels, called edge labels, to the edges,
• $l_V$ is a labeling function defined on $V$ that assigns labels, called vertex labels, to the vertices.

The interpretation of the meaning of the vertices, edges and labels of an ASM graph is dependent upon the domain that is represented.
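Definition 1 maps directly onto a small data structure. The Python sketch below is one possible encoding, offered as an illustration only: the class name `ASMGraph`, its methods, and in particular the vertex names and edge labels used in `expression_metamodel` are assumptions, since the exact contents of Figure 1 are not reproduced in the text.

```python
from dataclasses import dataclass, field


@dataclass
class ASMGraph:
    """Directed, labeled graph G = (V, E, l_E, l_V) as in Definition 1."""
    vertices: set = field(default_factory=set)         # V
    edges: set = field(default_factory=set)            # E, ordered pairs (u, v)
    edge_labels: dict = field(default_factory=dict)    # l_E: E -> label
    vertex_labels: dict = field(default_factory=dict)  # l_V: V -> label

    def add_vertex(self, v, label):
        self.vertices.add(v)
        self.vertex_labels[v] = label

    def add_edge(self, u, v, label):
        if u not in self.vertices or v not in self.vertices:
            raise ValueError("both endpoints must already be vertices")
        self.edges.add((u, v))
        self.edge_labels[(u, v)] = label

    def is_well_formed(self):
        """Check that l_V is defined exactly on V and l_E exactly on E."""
        return (
            set(self.vertex_labels) == self.vertices
            and set(self.edge_labels) == self.edges
            and all(u in self.vertices and v in self.vertices for u, v in self.edges)
        )


def expression_metamodel():
    """A guess at the expression metamodel; edge labels are assumptions."""
    g = ASMGraph()
    for name in ("Expression", "UnaryExpression", "BinaryExpression", "Number", "Operator"):
        g.add_vertex(name, name)
    for sub in ("UnaryExpression", "BinaryExpression", "Number"):
        g.add_edge(sub, "Expression", "is-a")
    g.add_edge("UnaryExpression", "Operator", "has-operator")
    g.add_edge("BinaryExpression", "Operator", "has-operator")
    return g


g = expression_metamodel()
print(len(g.vertices), len(g.edges), g.is_well_formed())
```

Storing `E` as a set of ordered pairs follows the definition literally, which means at most one edge per direction between two vertices; a domain needing parallel edges would require a different encoding.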

Based on the above assumptions, we establish the following formal specification of a cloud application.

Definition 2. A cloud application $CA$ is represented by the four-tuple $CA = \langle G_{data}, G_{func}, G_{qos}, G_{sys} \rangle$, where

• $G_{data}$ is the ASM graph for the DSL representing data,
• $G_{func}$ is the ASM graph for the DSL representing functional details,
• $G_{qos}$ is the ASM graph for the DSL representing non-functional details,
• $G_{sys}$ is the ASM graph for the DSL representing the system configuration.


Definition 2, in simpler terms, establishes that a cloud application is a collection of four specifications, each describing a different aspect of the application. In practice, one may use a single DSL to describe more than one aspect (say Data and Function) and leave out certain aspects entirely, implying the use of defaults.
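The four-tuple and the notion of omitted aspects can be expressed as a short Python sketch. The names `Graph`, `CloudApplication`, and `defaulted_aspects` are hypothetical, and the graph type is reduced to vertices and edges (labels omitted) to keep the example brief; representing an omitted aspect as `None` is an assumption about how "use of defaults" might be encoded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Graph:
    """Reduced ASM graph: labels are omitted to keep the sketch short."""
    vertices: frozenset
    edges: frozenset  # ordered pairs (u, v)


@dataclass(frozen=True)
class CloudApplication:
    """The four-tuple of Definition 2; None marks an omitted aspect."""
    data: Optional[Graph] = None
    func: Optional[Graph] = None
    qos: Optional[Graph] = None
    sys: Optional[Graph] = None

    def as_tuple(self):
        return (self.data, self.func, self.qos, self.sys)

    def defaulted_aspects(self):
        """Names of the aspects left out, for which defaults would apply."""
        names = ("data", "func", "qos", "sys")
        return [n for n in names if getattr(self, n) is None]


# An application specified only by its data and functional aspects.
g_data = Graph(frozenset({"Spectrum"}), frozenset())
g_func = Graph(frozenset({"Mean", "Spectrum"}), frozenset({("Mean", "Spectrum")}))
app = CloudApplication(data=g_data, func=g_func)
print(app.defaulted_aspects())
```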

3.4 Addressing Cloud Specific Features

Cloud specific features, such as explicit parallelism, may be modeled as part of different semantic aspects of Definition 2. This also depends on the application. For example, the SP system may use a horizontally replicating configuration for load balancing, providing parallelism at the system level (such horizontal replication is common in IaaS cloud environments). However, this parallelism is not visible to (nor does it affect) any other aspect. Similarly, data storage may be performed on a distributed database, yet the rest of the system may well be completely isolated, and such distribution is transparent at design time.

3.5 Theoretical Aspects of Interest and Motivation

The key component in the abstraction driven application generation is the code generation process that uses model transformations to convert the domain model to a model suitable for a cloud environment. Thus, the theoretical details we are interested in are the properties of the model transformation from the user domain ASMs to cloud ASMs.

We assume that an ASM (and possibly the relevant CSMs) suitable for clouds exists. This is a reasonable assumption for two reasons:

1) Programming languages and models built for distributed environments exist. Two notable examples are ERLANG [10] and Chapel [11]. Distributed programming paradigms such as map-reduce [12] can also be considered as an ASM since one can define clear syntactic translations from abstract map-reduce models. Thus, a specification can be used to either generate a program using a distributed programming language or a distributed programming model (that can be translated to a program that runs on a software framework supporting the programming model, say Hadoop), when the cloud of choice offers explicit parallelism.

2) When the distributed nature of the cloud is transparent (as in a platform cloud), a supported general purpose programming language can be used as the target.

To justify the importance of these theoretical inves-tigations, we formulate our objectives into high-levelquestions. Our goal is to understand the applicabilityof DSLs in the context of clouds, thus there are threequestions we are interested in answering.1) To what kind of domains can we apply the DSL

based programming abstractions?

2) What savings in effort (and cost) can be achievedwhen the DSL based abstractions are used?

3) Is it possible to reverse engineer an existing pro-gram and create a DSL representation?

In order to answer these questions, one needs to understand the theoretical limitations of the transformations. Thus we are motivated to investigate the DSL transformations and understand their applicability and limitations in the context of clouds.

4 LANGUAGE TRANSFORMATIONS FOR THE CLOUD

In this section, we investigate the language transformational features in detail by using a symbolic representation.

Specifically, we focus on the transformation of the functional language ASM from the user's domain to the cloud environment, i.e., our focus is on the transformation of $G_{func}$, introduced in Definition 2 (Section 3.3). It is possible to address this transformation in isolation and without loss of generality; that is, some of the requirements relevant to functional specification transformations are also applicable to other graphs in Definition 2.

We make the following observation, which forms the core of our transformation strategy.

The transformation from the domain model to a cloud-supported implementation model depends heavily on the details of the domain metamodel. In other words, the domain metamodels must be detailed enough so that a meaningful transformation can be made. This realization leads to our primary principle that source metamodel graphs need different vertices for semantically distinct language constructs, regardless of their syntactic representation.

This is important since it is typical for ASMs to focus purely on giving an abstraction of the CSM, where syntactically similar constructs are modeled indistinctively. We use the simple expression metamodel, introduced in Figure 1 (Section 2.4), as an example. In this model, the concept operator is represented by a single vertex despite the fact that many semantically different operators may exist. For example, increment and decrement operators represent completely different tasks but are represented as a single vertex in the typical metamodel since their representations are similar in a concrete syntax. Figure 5 illustrates an enhanced metamodel that defines each operator as a distinct vertex. Usually this level of detail is considered excessive and unnecessary in syntactically driven ASMs, where the difference between operators only becomes a consideration during compiler construction.

Fig. 5: Enhanced Metamodel for Mathematical expressions, showing the sub expressions

IEEE TRANSACTIONS ON SERVICES COMPUTING. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

4.1 Requirements on ASM transformations for Cloud implementations

Now we state our requirements formally using requirements and rationales. While these are not as rigorous as theorems and proofs, they can be considered the governing principles.

Consider $G_d$ as the ASM of the domain, $G_c$ as the ASM of the cloud, and $G^{meta}_d$ and $G^{meta}_c$ as the respective metamodels, represented as graphs. Consider the transformation $T_{d-c}$, denoting the source and target as the domain and cloud respectively. Thus, $T_{d-c}$ is defined using $G^{meta}_d$ and $G^{meta}_c$ but applies to $G_d$ and $G_c$ respectively. The relationship between these components is illustrated in Figure 6.

Fig. 6: Relationship between metamodels, models and transformations, represented symbolically

Requirement 1. $G^{meta}_d$ must define distinct vertices for each semantically distinct domain concept.

Requirement 1 states that the source metamodel should have elements for each and every distinct domain concept that may implement a semantically distinct operation.

Rationale 1. Assume that there is a vertex $\lambda^{meta}$ in $G^{meta}_d$ that has two interpretations. Then there exist at least two vertices in $G_d$, say $\lambda_1$ and $\lambda_2$, that comply with $\lambda^{meta}$ but have different interpretations, and hence should map to two different vertices in $G_c$. However, since there is only one vertex in $G^{meta}_d$, only one mapping can exist for it in $T_{d-c}$.

Thus, trivially, $T_{d-c}$ cannot manage different mappings for $\lambda_1$ and $\lambda_2$ unless they map to two different meta concepts in $G^{meta}_d$.
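Rationale 1 can be illustrated with a small sketch. The Ruby code below is not taken from any tool described here; the vertex names, the target task names and the `transform` helper are all hypothetical. It treats the transformation as a lookup keyed by metamodel vertex, so a metamodel with a single `:operator` vertex cannot give increment and decrement different targets, whereas a metamodel in the style of Figure 5 can.

```ruby
# Hypothetical sketch: a transformation is a table keyed by metamodel vertex.

# Coarse metamodel: one vertex stands for every operator.
COARSE_MAPPING = { operator: :generic_operator_task }.freeze

# Enhanced metamodel: one vertex per semantically distinct operator.
ENHANCED_MAPPING = {
  increment: :add_one_task,
  decrement: :subtract_one_task
}.freeze

# Apply a mapping to every vertex of a domain model.
# A model is a Hash of { vertex_name => metamodel_vertex_it_complies_with }.
def transform(model, mapping)
  model.map { |name, meta_vertex| [name, mapping.fetch(meta_vertex)] }.to_h
end

coarse_model   = { lambda1: :operator,  lambda2: :operator }
enhanced_model = { lambda1: :increment, lambda2: :decrement }

coarse_result   = transform(coarse_model, COARSE_MAPPING)
enhanced_result = transform(enhanced_model, ENHANCED_MAPPING)

# Number of distinct targets reached by lambda1 and lambda2 in each case.
puts coarse_result.values.uniq.size
puts enhanced_result.values.uniq.size
```

With the coarse mapping both vertices collapse onto one target; only the enhanced metamodel keeps the two interpretations apart.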

Requirement 2. $T_{d-c}$ is a surjective (onto) mapping, i.e. all vertices in $G_c$ must be defined by the transformation.

Requirement 2 highlights the fact that the transformation must yield a complete target graph. This does not mean that all vertices in the source graph will be mapped. Rather, the result of the transformation, i.e. the target graph that gets created as a result, should be complete. This is a different way of specifying that the transformation should provide sensible defaults to avoid an incomplete target graph.

Rationale 2. Assume that the mapping is not surjective. Then $G_c$ is missing at least one vertex $\lambda$ needed to complete the model and thus $G_c$ is incomplete. Then $G_c$ cannot be converted to a working program.

Thus, in order to have an executable program, the mapping must be surjective.
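The role of defaults in Requirement 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the set of required target vertices, the default value and both helper methods are hypothetical and only stand in for whatever a real target metamodel would demand.

```ruby
require 'set'

# Hypothetical target metamodel: vertices a runnable cloud program needs.
REQUIRED_TARGET_VERTICES = Set[:input, :compute, :output, :deployment_config]

# Hypothetical defaults the transformation supplies for anything the
# domain model does not mention.
DEFAULTS = { deployment_config: 'single-instance' }.freeze

# Build the target model from a domain model, filling in defaults.
def transform_with_defaults(domain_model)
  DEFAULTS.merge(domain_model)
end

# Required target vertices that are still undefined in a target model.
def missing_vertices(target_model)
  REQUIRED_TARGET_VERTICES - target_model.keys.to_set
end

domain_model = {
  input: 'raw_values',
  compute: 'sum_normalize',
  output: 'processed_data'
}

without_defaults = domain_model
with_defaults    = transform_with_defaults(domain_model)

puts missing_vertices(without_defaults).to_a.inspect
puts missing_vertices(with_defaults).to_a.inspect
```

The domain model never mentions deployment, so the target graph is complete only when the transformation contributes a default for it.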

As a result of Requirement 2, we can derive Lemma 1.

Lemma 1. $T_{d-c}$ is not reversible.

Lemma 1 states that the transformation is not reversible. This can be trivially rationalized by considering the properties of a general surjective mapping, except for the special case of the mapping being bijective. The usual case in this context is that the target language is almost always at a lower level of abstraction, which makes the bijective case non-existent (the transformation can be considered bijective when the models being translated are at equal levels of abstraction, enabling a lossless conversion in both directions).
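One way to read Lemma 1 is that a many-to-one transformation has no unique inverse. The sketch below assumes a hypothetical mapping in which two domain operators share the same lower-level target construct; none of the names come from SCALE or MobiCloud.

```ruby
# Hypothetical many-to-one transformation: two domain-level operators are
# realized by the same lower-level target construct.
TRANSFORMATION = {
  sum_normalize: :map_then_reduce,
  auto_scale:    :map_then_reduce,
  load_file:     :read_input
}.freeze

# For each target construct, collect every domain concept that maps to it.
def preimages(transformation)
  transformation.group_by { |_domain, target| target }
                .transform_values { |pairs| pairs.map(&:first) }
end

inverse   = preimages(TRANSFORMATION)
ambiguous = inverse.select { |_target, domains| domains.size > 1 }

# Targets whose origin cannot be recovered from the target model alone.
puts ambiguous.inspect
```

Given only `:map_then_reduce` in a generated program, there is no way to tell which of the two domain operators produced it.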

4.2 Addressing Explicit Parallelism

These base requirements can be used to formulate more restricted requirements, applicable in domain dependent contexts. One such context is the translation to a map-reduce program, which is a common requirement when the parallel nature of infrastructure clouds needs to be exploited.

We first introduce the concept of the map-reduce task graph. A map-reduce task graph, sometimes called a physical plan, is a task graph representing the map and reduce task sequences for a given program. For almost all practical cases, a single map and reduce combination is insufficient and multiple map and reduce tasks must be combined. The map-reduce task graph represents this sequence of map and reduce tasks. Task graphs are especially useful when abstractions, such as in PIGLatin, are used to generate map-reduce programs. PIGLatin is a SQL-like DSL, targeted towards generating Hadoop based map-reduce jobs [13].

Figure 7 illustrates a map-reduce task graph generated by the PIG compiler for a statistical operation, sum normalization.

When Requirements 1 and 2 are applied to the special case of translating the DSL to a map-reduce program, we derive the following condition, applicable only when transforming a domain model to a map-reduce model.


Fig. 7: An example map-reduce task graph (plan) for a sum normalization operation

Requirement 3. All vertices of $G^{meta}_d$ must be mappable to the components of a map-reduce task model.

Requirement 3 simply states that every domain concept should have a translation that maps it to a map-reduce task graph.

4.3 A Practical Example

We present the applicability of Requirement 3 using an example from the Scalable Cloud Application Generator (SCALE) project [14]. SCALE is a DSL based solution to create statistical workflows for scientific data processing (we use SCALE for our evaluations in Section 6 as well). The DSL is used to describe the nature of the processing job required by the scientists and then converted to a program of their choice, running on either a cloud or a desktop. A simple example SCALE DSL script is illustrated in Listing 1. The SCALE metamodel and the conceptual model of the script are illustrated in Figure 8.

The metamodel of the SCALE DSL is illustrated in Figure 8(a), with unique vertices for semantically distinct operators. The ASM of the illustrated script (complying with the metamodel) is illustrated in Figure 8(b). Note that both these models are illustrated as directed labeled graphs rather than UML models.

Listing 1: Example SCALE script to sum normalize and auto scale a dataset

    # load values
    loaded_values = load_file(:raw_values)
    # normalize
    normalized = sum_normalize(loaded_values)
    # scale
    scaled = auto_scale(normalized)
    # store it back to a file
    store_file(:processed_data, scaled)

(a) Partial Metamodel for SCALE DSL

(b) Model of the computation (instance of the metamodel), as in Listing 1

Fig. 8: SCALE metamodel and the complying model for the computation

The transformation from the source metamodel to a map-reduce metamodel is illustrated in Figure 9. Note that this translation maps all operators we modeled to parts of a map-reduce task graph.
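The path from a script such as Listing 1 to a task graph can be sketched in a few lines of Ruby. This is not the SCALE implementation: the `ScriptRecorder` class, the `TASK_MAPPING` table and every task name in it are assumptions made for illustration, and the actual mapping is the one shown in Figure 9. The sketch records the script as a list of operator vertices and then applies Requirement 3 by refusing any operator that has no map-reduce translation.

```ruby
# Hypothetical internal DSL that records a script as a model
# (a list of operator vertices) instead of executing it.
class ScriptRecorder
  attr_reader :operations

  def initialize
    @operations = []
  end

  def load_file(name)
    record(:load_file, name)
  end

  def sum_normalize(input)
    record(:sum_normalize, input)
  end

  def auto_scale(input)
    record(:auto_scale, input)
  end

  def store_file(name, input)
    record(:store_file, name, input)
  end

  private

  # Append a vertex and return its index so later calls can refer to it.
  def record(operator, *args)
    @operations << { operator: operator, args: args }
    @operations.size - 1
  end
end

# Assumed mapping from metamodel vertices to map-reduce task components.
TASK_MAPPING = {
  load_file:     [:read_from_dfs],
  sum_normalize: [:map_sum, :reduce_sum, :map_divide],
  auto_scale:    [:map_stats, :reduce_stats, :map_scale],
  store_file:    [:write_to_dfs]
}.freeze

# Requirement 3: raises KeyError if an operator has no translation.
def to_task_graph(operations)
  operations.flat_map { |op| TASK_MAPPING.fetch(op[:operator]) }
end

recorder = ScriptRecorder.new
recorder.instance_eval do
  loaded_values = load_file(:raw_values)
  normalized    = sum_normalize(loaded_values)
  scaled        = auto_scale(normalized)
  store_file(:processed_data, scaled)
end

task_graph = to_task_graph(recorder.operations)
puts task_graph.inspect
```

Using `fetch` rather than `[]` is a deliberate choice in this sketch: an operator without a mapping fails loudly at generation time instead of silently producing an incomplete task graph.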

Fig. 9: SCALE metamodel Transformation

5 IMPACT OF THE TRANSFORMATIONAL CONDITIONS

In this section, we look at the practical impact of the conditions we identified in Section 4.1. In other words, we look at how these conditions determine the answers to the questions we posited in Section 3.5.

5.1 Domain Modeling Requirements

The first impact of these conditions can be seen in the effort required for domain modeling (tacitly assuming that the application specification can indeed be converted to a cloud environment). Condition 1 (Requirement 1 in Section 4.1) requires domain modelers to identify the semantically distinct concepts and incorporate them into the metamodel appropriately. Such a task requires more effort than typically anticipated and adds overhead at the design phase. While such extra work is feasible in restricted domains, it may require considerable effort in domains that have a larger scope, encapsulating a large number of concepts.

One method we suggest is to use the Effort Tradeoff as a means of determining the suitability of the domain. This requires a preliminary metamodel of the domain. Given the large number of modeling and DSL creation tools, we assume that a preliminary metamodel can be constructed quickly and without significant commitment. We describe a systematic method of determining the effort tradeoff in the next section.

5.2 Determining the Effort Tradeoff

To determine the tradeoff in effort, we introduce a single indicator R. The purpose of R is to determine the effort tradeoff in a single target platform, assuming the domain has already been modeled. We make the following assumptions.

1) The base code generation framework (parsing, syntax tree generation etc.) is in place. Thus the effort required is limited to the creation of templates.
2) The effort required to create the templates can be estimated by the lines of code (LOC) count. LOC does not indicate the complexity of the code. However, it can be measured easily and rapidly, making it suitable for this type of indicator.
3) The LOC of the DSL is always less than the LOC of the generated program.

Given these assumptions, we use the following statistics.

1) $LOC_{templates}$: LOC count of the templates.
2) $LOC_{generated}$: LOC of the generated code.
3) $LOC_{dsl}$: LOC of the DSL script.

We combine these metrics to form R, using Equation 1.

$$R = \left( \ln\frac{LOC_{dsl}}{LOC_{generated}} \cdot \ln\frac{1}{LOC_{templates}} \right)^{-1} \qquad (1)$$

The rationale behind organizing these metrics in Equation 1 is as follows. The ratio between $LOC_{dsl}$ and $LOC_{generated}$ is a direct indication of the saving in effort gained by using the DSL. However, the generation capability of the DSL comes via the templates and thus the effort required in creating the templates should be discounted from the gain. Both logarithms are negative, so their product is positive. Arranging these metrics using their logarithms ensures R is normalized and remains between 0 and 1, given our assumption that $LOC_{dsl} < LOC_{generated}$.

For practical cases, lower R values represent better applicability. Higher R values suggest either extensive template code efforts (with respect to the generated code) or the need for lengthy DSL scripts that would offset the advantage gained from introducing a DSL in the first place.
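Equation 1 is straightforward to compute. The sketch below implements it directly; the method name, its keyword arguments and the line counts passed to it are hypothetical and were chosen only to show how R moves when the DSL script grows relative to the generated code.

```ruby
# Effort tradeoff indicator R from Equation 1.
# All arguments are line counts. loc_dsl must be smaller than
# loc_generated (assumption 3) and loc_templates must exceed 1,
# otherwise one of the logarithms is zero or changes sign.
def effort_tradeoff(loc_dsl:, loc_generated:, loc_templates:)
  raise ArgumentError, 'DSL LOC must be less than generated LOC' unless loc_dsl < loc_generated
  raise ArgumentError, 'template LOC must be greater than 1' unless loc_templates > 1

  saving   = Math.log(loc_dsl.to_f / loc_generated) # negative
  template = Math.log(1.0 / loc_templates)          # negative
  1.0 / (saving * template)
end

# Hypothetical counts, for illustration only.
short_script = effort_tradeoff(loc_dsl: 10,  loc_generated: 500, loc_templates: 800)
long_script  = effort_tradeoff(loc_dsl: 250, loc_generated: 500, loc_templates: 800)

puts format('short DSL script: R = %.4f', short_script)
puts format('long DSL script:  R = %.4f', long_script)
```

With the template and generated counts held fixed, the longer DSL script yields the larger R, which matches the reading that higher values indicate a weaker case for the DSL.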

The evaluation in Section 6 presents an experiment that calculates the R values.

5.2.1 Limitations in Transforming to Distributed Models

Condition 3 (Requirement 3 in Section 4.2) states that a complete transformation from the domain model to the map-reduce task model should exist, in order to implement explicit parallelism. In other words, every domain concept represented in the DSL should have an equivalent operation or a combination of operations in map-reduce.

In practice this is hard to achieve, even in a limited domain. The reason for this can be attributed to the DSL feature gap, where the implemented features may not cover all desired features. It implies that it is always possible for some desired features not to be covered by the DSL.

An example of this can be found in the context of the SCALE project (presented in detail in Section 6). Some specialized statistical operators used in advanced NMR data processing, such as orthogonal projection on latent structures (OPLS) [15], have no parallel implementations yet (OPLS is an iterative process and it may be possible to convert it to an explicitly parallel version, but it has not been formulated as an explicitly parallel process yet). Thus, the original SCALE language does not support OPLS as an operator, even though it is highly desired.

The impact of this is felt when a desired feature becomes essential. In that case only a portion of the program can be distributed. In the case of SCALE, the code generators have modifications that allow them to incorporate sequential (non-distributed) code segments into the programs. This is very inefficient (for example, the data has to be pulled from the distributed file system to the local file system, processed and put back to the distributed file system when a non-distributed operation is interleaved), yet deemed necessary in extreme cases.

5.2.2 Reverse Engineering

Lemma 1 states that it is unreliable (or impossible in most cases) to reverse the transformation. This means that reverse engineering programs to generate a DSL representation is not possible in general.

In practice, one may be able to glean a reasonable set of abstractions from a limited set of existing applications. This however should not be taken as a general property. The abstractions in the scope of this research are domain focused and can only be converted to an executable program by incorporating significant (often assumed) details. It is simply not possible to take an arbitrary program and convert it to an abstract form.

6 EVALUATION

Now we present two of our research projects and evaluate them based on the code metrics.

MobiCloud [16], [17] presents a DSL driven approach to generate cloud-mobile hybrid (CMH) applications. A CMH application has a cloud based back-end as well as a mobile device based front-end. The current MobiCloud DSL has provisions for data and functional specifications. The QoS (non-functional) and system details are assumed by the generators, although the composer may tweak some of these parameters, either via metadata attributes or an extension mechanism [17]. The MobiCloud composer is available for public use (footnote 5).

SCALE, briefly introduced in Section 4.3, is a DSL driven solution for scientific programs [14]. A DSL is used to describe the nature of the processing job required by the scientists and then converted to a program of their choice, running on either a cloud or a desktop. The SCALE DSL is deliberately kept simple and tightly bound to the domain of interest since the primary users of this DSL are domain experts rather than cloud programmers. The SCALE composer is also available for public use (footnote 6).

5. http://mobicloud.knoesis.org
6. http://metabolink.knoesis.org/SCALE


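| Script | Description | DSL LOC (LOCdsl) | Models | Views | Controllers | Target | Generated LOC (LOCgen) | Ratio (LOCdsl:LOCgen) |
|---|---|---|---|---|---|---|---|---|
| TaskManager | A task manager application that stores and retrieves tasks | 11 | 1 | 2 | 1 | Android | 541 | 1:49 |
| | | | | | | Blackberry | 329 | 1:30 |
| | | | | | | EC2 | 156 | 1:15 |
| | | | | | | GAE | 346 | 1:31 |
| ShopManager | An application to keep track of jobs and customers for a mechanics shop | 17 | 2 | 4 | 2 | Android | 1244 | 1:73 |
| | | | | | | Blackberry | 592 | 1:35 |
| | | | | | | EC2 | 628 | 1:37 |
| | | | | | | GAE | 1021 | 1:60 |
| URL Fetcher | Fetches and displays values from a Yahoo Web service | 9 | 1 | 1 | 1 | Android | 486 | 1:54 |
| | | | | | | Blackberry | 100 | 1:11 |
| | | | | | | EC2 | 289 | 1:32 |
| | | | | | | GAE | 466 | 1:52 |
| Salesforce Contacts | Fetches and displays the contact list from a Salesforce account | 9 | 1 | 1 | 1 | Android | 794 | 1:88 |
| | | | | | | Blackberry | - | - |
| | | | | | | EC2 | - | - |
| | | | | | | GAE | 1377 | 1:153 |

TABLE 1: LOC counts of selected MobiCloud generated applications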

| Operation | DSL | Azure | Hadoop | Desktop |
|---|---|---|---|---|
| Sum normalization | 3 | 847 | 952 | 88 |
| Auto Scaling | 3 | 848 | 956 | 94 |
| Sum normalize then auto scale | 4 | 938 | 1022 | 106 |

TABLE 2: LOC counts of selected SCALE generated applications

6.1 Experiments

We performed two main categories of experiments.

1) To evaluate the saving of effort, we measured the lines of code (LOC) of selected programs against the LOC of the DSL.
2) To evaluate the effort to create the generation mechanism, we measured the LOC of the code generation templates.

The CLOC tool (footnote 7) was used to obtain the LOC counts in all the listed experiments. CLOC counts all code segments in a given directory recursively and reports separate counts per language (for example, when a project includes resources that use multiple languages, CLOC provides a breakdown of the code counts, excluding comments and white space). These experiments use the sum of all the code counts. For example, for a generated Android application, the LOC count (listed in Table 1) is the sum of the Java LOC and the XML LOC, as counted by CLOC. Android projects use an XML based language for user interface layout, apart from their Java code segments.

LOC is used as the primary metric due to its simplicity and ease of use. LOC depends on the choice of the language (scripting languages such as Ruby typically require fewer lines of code than a traditional programming language like Java for the same operation). However, we assume that the relative advantage in effort is minimally affected by the base language used, since the DSL script is in any case much more concise than the equivalent GPPL code.

7. http://cloc.sourceforge.net/

The next two sections provide details of these experiments.

6.2 Effort Comparison Using Generated Code

For the code comparison experiment, 4 MobiCloud programs were selected. Two of the programs use the extension capabilities while the other two are based on the base language (see [17] for a discussion on the difference between the base language and the extended language of MobiCloud). Table 1 outlines the types of selected applications and their code statistics. The same statistics are presented in Figure 10(a) as a graph.
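The ratio column of Table 1 can be reproduced from the two LOC columns. The short sketch below assumes the ratio is the generated LOC divided by the DSL LOC, rounded to the nearest integer; the helper name is hypothetical and only a few rows of Table 1 are copied in.

```ruby
# Generated lines per DSL line, rounded to the nearest integer.
def expansion_ratio(loc_dsl, loc_generated)
  (loc_generated.to_f / loc_dsl).round
end

# Selected rows from Table 1: [script, target, LOC_dsl, LOC_gen].
ROWS = [
  ['TaskManager',         'Android',    11, 541],
  ['ShopManager',         'GAE',        17, 1021],
  ['URL Fetcher',         'Blackberry',  9, 100],
  ['Salesforce Contacts', 'GAE',         9, 1377]
].freeze

ROWS.each do |script, target, loc_dsl, loc_gen|
  puts "#{script} (#{target}): 1:#{expansion_ratio(loc_dsl, loc_gen)}"
end
```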

To compare the effort saving in SCALE, three programs that represent either one or a combination of two operators were selected. The first two programs simply loaded a dataset, performed a single operation and wrote the results to a (distributed) file. The third program used a sequence of the two operations before the result is written to the file. The SCALE code statistics are presented in Table 2. A graph presentation of the same data is available in Figure 10(b).

6.2.1 Discussion

There are varying degrees of savings in terms of code creation effort, which depend heavily on the target platform (when the effort is considered to be a matter of writing the code). The saving in effort is not uniform (it ranges from 10 to 153 times in our experiments) because the cloud platforms provide programming primitives of different granularities.

In reality, there are three other types of effort that get introduced into program creation.

1) Effort in algorithm conversion: This is more prominent in instances where explicit parallelism is required, such as in the case of SCALE. There is significant effort saved by using the generators to convert the algorithms to map-reduce versions, which is not reflected in the LOC counts.
2) Effort in code organization: Some platforms require the software artifacts to be organized in a specific way. For example, GAE projects require a specific code organization that is expected by the GAE deployment tools. The effort saved in code organization is not reflected in the above statistics.
3) Effort in debugging API incompatibilities: A significant debugging effort is saved in some cases where the client and server are both generated from the same specification. Often the remote communications are the most error prone segments in a distributed programming environment. This saved effort is not indicated in the LOC comparison.

Despite the inability of the LOC metric to capture code translation burden and quality, the LOC count enables a rough assessment of the relative effort required to create a program. This is sufficient for us to rapidly calculate the effort savings. In other words, it is sufficient for us to find a lower bound assessment.

(a) MobiCloud LOC Comparison (b) SCALE LOC Comparison

Fig. 10: LOC statistics of MobiCloud and SCALE generated applications

| Target Platform | Template LOC | Resources LOC |
|---|---|---|
| Android | 1162 | 0 |
| EC2 | 519 | 10 |
| BlackBerry | 380 | 0 |
| GAE | 789 | 24 |

(a) MobiCloud Template LOC

| Target Platform | Template LOC | Resources LOC |
|---|---|---|
| Azure | 454 | 736 |
| Hadoop | 434 | 886 |
| Ruby | 133 | 90 |

(b) SCALE Template LOC

(c) MobiCloud Template LOC Comparison (d) SCALE Template LOC Comparison

Fig. 11: LOC statistics of MobiCloud and SCALE Generators

6.3 Effort Comparison for Templates

The objective of this evaluation is to quantify the effort in actually creating the code generation mechanism.

Both MobiCloud and SCALE are based on the same code generation engine. The only differences are the parser and the set of code templates. Thus, in this experiment, the code templates are analyzed, assuming the LOC counts in the templates are indicative of the effort to create them.

Figure 11 presents these statistics. Table 11(a) and Table 11(b) (Charts 11(c) and 11(d) respectively) include the LOC counts of templates and resources for MobiCloud and SCALE respectively. The counts under resources indicate static code files that get placed in the generated code without modifications. Resources are especially important when significant code can be inserted without modification. For example, in SCALE, most of the relevant mapper and reducer implementations are inserted without modification and wired together using a controller class, generated dynamically.

6.3.1 Discussion

Similar to the case of generated applications, effort in algorithm conversion and the effort in template conversion are not reflected in these numbers.

Additionally, the effort in creating the parser is not taken into account in this experiment. This does not affect the accuracy of our experiments since both DSLs are subsets of Ruby (i.e. internal DSLs that use Ruby as the host language). Thus, the existing Ruby parser is reused and no explicit effort is taken to construct a parser. In cases where external DSLs are used, the parser would also contribute to the development effort. It is uncommon however to construct a parser from scratch. One can usually use a parser generator tool, so the effort to create a parser for the DSL is negligible in most cases.

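6.4 R value Comparison

Using Equation 1, we calculate the R values for a selected set of MobiCloud and SCALE generated applications. These figures are presented in Table 3. In this table, LOCtpl is the sum of the template and resource LOC listed in Figure 11, while LOCdsl and LOCgen are the averages over the corresponding applications in Tables 1 and 2.

| Target | LOCtpl | LOCdsl | LOCgen | R |
|---|---|---|---|---|
| MobiCloud (Android) | 1162 | 12 | 766 | 0.034 |
| MobiCloud (GAE) | 813 | 12 | 802 | 0.036 |
| SCALE (Azure) | 1190 | 3 | 878 | 0.025 |
| SCALE (Hadoop) | 1320 | 3 | 977 | 0.024 |
| SCALE (Ruby) | 223 | 3 | 96 | 0.053 |

TABLE 3: R value calculations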

The R values are relatively small, indicating that the selected targets are a good fit for this approach. It is noticeable that the Ruby target has a higher R, indicating that the benefit of using the DSL is not as great as in the other cases. Indeed, the Ruby versions of the programs were the smallest (in terms of LOC) of the generated implementations.

Since R is a relative measure, there is no hard cut-off. It can be used at the discretion of the program developer to decide whether their own goals are met. There are other factors that are not reflected in R (since R is based on just LOC), thus it should not be used as the only decision factor.
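The listed R values can be recomputed from Equation 1. The sketch below reads the 1162/813/1190/1320/223 column of Table 3 as the template LOC, since these numbers equal the template plus resource counts in Figure 11, and the 766/802/878/977/96 column as the generated LOC, since these equal the per-target averages of Tables 1 and 2. That reading is an inference from the data rather than something stated in the table; under it, the computed values agree with the listed ones to within rounding.

```ruby
# Equation 1 applied to one row of counts.
def effort_tradeoff(loc_dsl, loc_generated, loc_templates)
  1.0 / (Math.log(loc_dsl.to_f / loc_generated) * Math.log(1.0 / loc_templates))
end

# [target, LOC_dsl, generated LOC, template LOC, R as listed in Table 3]
TABLE_3 = [
  ['MobiCloud (Android)', 12, 766, 1162, 0.034],
  ['MobiCloud (GAE)',     12, 802,  813, 0.036],
  ['SCALE (Azure)',        3, 878, 1190, 0.025],
  ['SCALE (Hadoop)',       3, 977, 1320, 0.024],
  ['SCALE (Ruby)',         3,  96,  223, 0.053]
].freeze

TABLE_3.each do |target, dsl, generated, templates, listed|
  computed = effort_tradeoff(dsl, generated, templates)
  puts format('%-20s computed %.4f, listed %.3f', target, computed, listed)
end
```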

7 PRACTICAL EXPERIENCE AND LESSONS LEARNED

In this section, we briefly present our experience in the research projects discussed in Section 6 and some of the lessons learned. These lessons are more practical in nature (i.e. they are not based on carefully designed objective experiments) and stem from the numerous discussions and interactions we had with practitioners. They outline the applicability of our suggested development process and highlight some of the important considerations in practice.

7.1 Experience from MobiCloud and SCALE

The most notable lessons we learned from the MobiCloud and SCALE projects are as follows:

1) The language and data transformations via a DSL into multiple platforms are practical, as long as the scope of the application domain is managed. In other words, the generated programs are functional but are not able to exploit every unique feature of a target platform. Adding special constructs to the DSL to exploit such features tends to take away the simplicity of the DSL and thus good control of the domain scope is essential. In the case of MobiCloud, the DSL is deliberately kept simple to avoid contaminating the core MVC structure.
2) It is possible to intertwine the functional, non-functional, data and system considerations in a single script, in a manner that is natural to the domain. Such composition helps the application author to define a single script with all the necessary details. Although designers may be tempted to use specialized DSLs to define different aspects of the program, this produces a difficult learning process, defeating the purpose of the DSL. The MobiCloud DSL is produced as a single coherent language covering data and functional aspects, which made it acceptable to many amateurs.
3) For domains driven primarily by domain experts, providing tools to run a complete process (develop, deploy and monitor) is very important. Program generation alone is not useful to these domain scientists unless there is an associated mechanism and tooling that lets them deploy and monitor these applications. This was clearly observed in the MobiCloud project, where the acceptance increased after automatic deployment tools were introduced. The tooling requirements are further discussed in Section 7.2.

7.2 Other Considerations in Practice

While the above mentioned facts highlight specific considerations in the highlighted projects, the following are some of the more general considerations we noted during the use of DSLs in cloud program generation.

User Perception

Even though abstractions implemented via a DSL introduce a streamlined development life-cycle, user (developer) perception of the language plays an important role in adoption.

For example, MobiCloud is targeted at the developer community and the introduction of the DSL-driven technique was not met with enthusiasm. A survey identified that this is mostly due to the apparent inability of the DSL to exploit certain platform specific features, in the mobile platforms as well as the cloud platforms. The underlying reason seems to be the perceived loss of fine grained control due to the introduction of the abstractions, although the time saved by using MobiCloud was obvious and the developers could tweak the code if they wished, much faster than building it from scratch. Hence, user perception plays an important role in introducing this type of abstraction driven programming paradigm. The polarity of the perception changes with the target user community and should be addressed accordingly.

Tools

Tools play an absolutely critical role in an abstraction driven development process. Properly featured tools are essential to mitigate the learning curve of a DSL, even when the textual form of the DSL is fairly simple, featuring a much simpler learning curve than a typical programming language. Both in MobiCloud and SCALE, the primary composer is graphical, i.e., one can drag and drop graphical symbols to a canvas to compose the respective program, reducing the learning effort. The graphical composers arose from necessity since many users, especially non-programmers, were not interested in learning a new programming language, even when it was only a matter of a few hours.

Also, tool support for other actions such as cloud deployments is expected by many users. The very first demonstration of the MobiCloud graphical composer was not well received since it did not support automatic deployments. The subsequent releases that had deployment support were received well by users and the utility of MobiCloud could be demonstrated clearly.

8 DISCUSSION

Although an abstraction driven methodology has significant advantages, there are certain factors that need more exploration for a comprehensive solution. We discuss three important considerations in this regard.

8.1 Data Management and Migration

We have deliberately omitted application data management considerations. In many instances, the accumulated data is considered an important asset and significant effort is spent on porting the data to the new platform.

Although not explicitly discussed, our abstraction driven, top-down approach forces the user programs to organize their data in a high level model, providing a methodical way to migrate data. The high level model is translated to the platform specific logical schema via a known translation process; thus, it is possible to mechanically generate a transformation to port data from one platform to another using the higher-level model as an intermediary.

Note that we tacitly assumed the data models have equal expressive power and the transformations can be performed losslessly. Investigating data model compatibility and the applicable limitations is a non-trivial task and outside the scope of the current work.

8.2 Deployment and Management of Applications

Application deployment (placing the application in a cloud) and management (updating the configurations, taking backups etc.) are important to the application's life-cycle. The programming abstractions become highly useful only when abstractions are provided over the deployment and management process.

We mention our related research for the sake of completeness. We considered the use of a middleware layer to provide abstractions over application deployment and management. This has been successfully demonstrated in the IBM Altocumulus research project [18]. Altocumulus allows users to deploy compatible applications to Amazon EC2, Google App Engine and IBM HiPODS, an IBM private cloud offering, using a uniform user interface. The procedural differences between the cloud deployment processes have been made transparent to the users via the middleware layer. The success of this strategy has been highlighted by its influence on the new IBM product, the IBM Workload Deployer, part of the IBM PureSystems private cloud solution.

8.3 Addressing Non-functional Aspects

In real-world applications, non-functional aspects are considered extremely important, sometimes as much as the core functionality of the application itself. For example, security and privacy are considered paramount in a number of industries (financial, online retail, etc.) and significant effort is spent on hardening and verifying application security. We have not focused on these issues, though the use of abstractions provides a clear way to incorporate such non-functional capabilities into the generated applications.

The other aspects, as discussed in Section 3.2, can be incorporated into the DSL, either by embedding fragments of other DSLs or by extending the DSL itself to provide them. It is entirely possible for the code generators to insert QoS-related code segments.

One example in this regard is the secure metadata attribute in MobiCloud. Setting the secure attribute to true generates code that forces all communications to happen using the HTTPS protocol. Although this requires changes across a number of components (client libraries, server configurations and service interface), it is a matter of setting one attribute in the DSL script. We have discussed at length how such modifications can be made to MobiCloud via an extension mechanism [17]. Such a mechanism would also be applicable to other domains that use DSL-based solutions.
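The effect of a single attribute on several generated components can be sketched as follows. This is not the MobiCloud generator. The function `generate_artifacts`, the `host` attribute, and the output structure are hypothetical, and only the `secure` attribute and the three affected components come from the description above.

```python
def generate_artifacts(metadata):
    """Derive settings for every generated component from one DSL attribute."""
    secure = bool(metadata.get("secure", False))
    scheme = "https" if secure else "http"
    host = metadata["host"]  # hypothetical attribute, used only for illustration
    return {
        # Client libraries call the service through the selected scheme.
        "client_library": {"base_url": f"{scheme}://{host}/api"},
        # The server configuration rejects plain connections when secure is set.
        "server_configuration": {"require_tls": secure},
        # The service interface advertises only the selected scheme.
        "service_interface": {"schemes": [scheme]},
    }


plain = generate_artifacts({"host": "example.org"})
hardened = generate_artifacts({"host": "example.org", "secure": True})
print(hardened["client_library"]["base_url"])
```

All three components read the same flag, so they cannot drift out of agreement, which is the practical benefit of expressing the concern once in the DSL script.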

9 CONCLUSION

We have explored the use of abstractions to support cloud programming. The driving principle is that the use of the cloud should take a user-driven perspective, rather than a provider-driven perspective. We investigated the use of DSLs to provide user-oriented abstractions for cloud programs and contributed a set of conditions on domain metamodel transformations applicable in this context.

Based on these conditions, we have devised an indicator, R, as a litmus test to determine whether the abstraction-driven methodology is applicable to a domain. Our experiments in two disparate domains indicate that DSL-based solutions are indeed applicable and provide a manageable way to generate programs targeted towards the cloud. While the solution space is limited to domains of interest, clouds are increasingly being used for domain-driven processing tasks and hence domain-based solutions are of great interest.

In summary, we conclude that using abstractions via DSLs is an effective and feasible method to provide a uniform programming methodology for clouds.

REFERENCES

[1] A. Kleppe, Software language engineering: creating domain-specific languages using metamodels. Addison-Wesley Professional, 2009.

[2] A. van Deursen, P. Klint, and J. Visser, “Domain-specific languages: an annotated bibliography,” SIGPLAN Not., vol. 35, no. 6, pp. 26–36, 2000.

[3] K. Czarnecki and S. Helsen, “Feature-based survey of model transformation approaches,” IBM Systems Journal, vol. 45, no. 3, pp. 621–645, 2006. [Online]. Available: http://bit.ly/w6fD4S

[4] G. Booch, J. Rumbaugh, and I. Jacobson, The Unified Modeling Language User Guide (Addison-Wesley Object Technology Series). Addison-Wesley Professional, 2005.

[5] J. Sprinkle, A. Ledeczi, G. Karsai, and G. Nordstrom, “The new metamodeling generation,” in Proceedings. Eighth Annual IEEE International Conference and Workshop On the Engineering of Computer Based Systems-ECBS 2001. IEEE Comput. Soc, pp. 275–279. [Online]. Available: http://bit.ly/wJLT6G

[6] G. Nordstrom, “Metamodeling - Rapid Design and Evolution of Domain-Specific Modeling Environments,” Ph.D. dissertation, Vanderbilt University, 1999. [Online]. Available: http://bit.ly/w09iDU

[7] A. Sheth and A. Ranabahu, “Semantic Modeling for Cloud Computing, Part 1,” IEEE Internet Computing, vol. 14, no. 3, pp. 81–83, May 2010. [Online]. Available: http://bit.ly/yGTv6D

[8] G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C. Lopes, J.-M. Loingtier, and J. Irwin, “Aspect-oriented programming,” ECOOP’97 Object-Oriented Programming, pp. 220–242, 1997.

[9] R. Laddad, AspectJ in action: practical aspect-oriented programming. Manning, 2003, vol. 512.

[10] J. Armstrong, R. Virding, C. Wikstrom, and M. Williams, Concurrent programming in ERLANG. Prentice Hall, 1996, vol. 2.

[11] B. L. Chamberlain, D. Callahan, and H. P. Zima, “Parallel programmability and the Chapel language,” International Journal of High Performance Computing Applications, vol. 21, no. 3, pp. 291–312, 2007.

[12] S. Ghemawat and J. Dean, “MapReduce: Simplified data processing on large clusters,” in Symposium on Operating System Design and Implementation (OSDI04), San Francisco, CA, USA, 2004.

[13] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins, “Pig Latin: A not-so-foreign language for data processing,” in Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008, pp. 1099–1110.

[14] A. Ranabahu, P. Anderson, and A. Sheth, “The Cloud Agnostic e-Science Analysis Platform,” IEEE Internet Computing, vol. 15, no. 6, pp. 85–89, Nov. 2011. [Online]. Available: http://bit.ly/HdAIqP

[15] P. Anderson, “Algorithmic techniques employed in the quantification and characterization of nuclear magnetic resonance spectroscopic data,” Ph.D. dissertation, Wright State University, 2010.

[16] A. Manjunatha, A. Ranabahu, A. Sheth, and K. Thirunarayan, “Power of Clouds in Your Pocket: An Efficient Approach for Cloud Mobile Hybrid Application Development,” 2010 IEEE Second International Conference on Cloud Computing Technology and Science, no. 2, pp. 496–503, 2010. [Online]. Available: http://bit.ly/zW2s4u

[17] A. Ranabahu, E. M. Maximilien, A. P. Sheth, and K. Thirunarayan, “A Domain Specific Language for Enterprise Grade Cloud-Mobile Hybrid Applications,” in 11th Workshop on Domain-Specific Modeling, 2011. [Online]. Available: http://bit.ly/ACKAzS

[18] E. Maximilien, A. Ranabahu, and K. Gomadam, “An Online Platform for Web APIs and Service Mashups,” IEEE Internet Computing, vol. 12, no. 5, pp. 32–43, 2008.

Ajith Ranabahu is an engineer with Amazon Web Services and earned his PhD in computer science at the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) at Wright State University. His primary research is focused on application and data portability in Cloud computing. Contact him at [email protected].

E. Michael Maximilien is a research staff member at IBM Research. He is active in a number of technical communities inside and outside of IBM. He is keenly interested in languages, systems, methods, practices, and techniques that make web computing easier and help make the web a trustable, social, and programmable platform and substrate for businesses and individuals. Contact him at [email protected].

Amit Sheth is the LexisNexis Ohio Eminent Scholar and the director of the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) at Wright State University. His research interests are Web 3.0, including Semantic Web, and semantics empowered Social Web, Sensor Web/Web of Things, Mobile Computing and Cloud Computing. Contact him at [email protected].

Krishnaprasad Thirunarayan is a Professor in the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) at Wright State University. His research interests are in Semantic Social and Sensor Data Analytics, Web 3.0, Information Retrieval/Extraction and Semantics of Trust. Contact him at [email protected].

IEEE TRANSACTIONS ON SERVICES COMPUTING
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.