
Developing a Cloud Computing Platform for Big Data: The OpenStack Nova case

Jose Teixeira
Information Systems Science Unit
University of Turku
Finland
jose.teixeira@utu.fi

Abstract: New developments in virtualization and cloud computing infrastructure continue to increase the value of Big Data for organizations. In this research, we explore how competing firms collaborate in the development of the OpenStack Nova open-source cloud computing infrastructure. We employ a mixed-methods approach that combines ethnography with computerized Social Network Analysis (SNA) that data-mines the OpenStack project source code and its repositories.

We take a longitudinal approach that allows us to observe how key events in the competitive cloud computing industry have affected the development of the OpenStack project over time. Our findings capture how companies such as HP, Rackspace, IBM, Mirantis, VMware, Citrix and many others simultaneously collaborate and compete in a highly networked open-source software project. After integrating our findings with the current body of theoretical knowledge in R&D Management, Software Engineering and Open-Source Software, we argue that established coopetition management theories provide powerful lenses for understanding the competitive and collaborative issues that are simultaneously present and interconnected in the OpenStack open-source project.

Keywords: Big Data; Cloud Computing; Open-Source; OpenStack; Open-Coopetition

I. INTRODUCTION

In an era of a recognizable software crisis [1], the move of firms towards geographically distributed, and often offshored, software development teams is being challenged by collaboration issues. On this matter, the open-source phenomenon may shed some light, as successful cases of distributed collaboration in the open-source community have been repeatedly reported [2], [3], [4]. While practitioners face difficulties with globally distributed software development teams, there remains a lack of academic research addressing the collaboration dynamics of large-scale and complex distributed software projects [5], [6]. In this paper, we address this gap by exploring the collaboration networks in the development of a complex cloud computing infrastructure for Big Data, the OpenStack Nova open-source project.

II. THE OPENSTACK PROJECT

OpenStack is an open-source cloud computing platform that is primarily deployed as an Infrastructure as a Service (IaaS) solution. It started as a joint project between Rackspace, an established IT web hosting company, and NASA, the US government agency responsible for the civilian space program, aeronautics and aerospace research. Today more than 200 firms jointly develop OpenStack while contributing to different open-source projects governed by the OpenStack Foundation. Both hardware and software developers affiliated with companies such as AT&T, AMD, Canonical, Cisco, Dell, EMC, Ericsson, HP, IBM, Intel, NEC, NASA and many others work together with independent non-affiliated developers in a scenario of pooled R&D in an open-source fashion.

The OpenStack Nova project, our unit of analysis, is a cloud computing fabric controller, the main part of an IaaS system. It is the largest and most central project governed by the OpenStack Foundation. The project originally started within the NASA Ames Research Laboratory, but it has since evolved into an inter-firm and highly networked open-source project developed by dozens of firms and thousands of developers. In the very competitive and expanding industry of cloud computing products and services, the OpenStack Nova project brought together volunteers and firm-sponsored software developers who collaborate over the Internet in an open and transparent manner while giving up many of the traditional intellectual property rights.

III. METHOD

We engaged in a multidisciplinary mixed-methods approach employing both qualitative methods and Social Network Analysis. Our network analysis emphasizes the visualization of collaborative activities, as in prior studies in Biomedicine [7], Innovation Studies [8], Information Systems [9] and Software Engineering [10].

We made use of ethnographic material to inform the Social Network Analysis (i.e. the collected ethnographic data guided the sampling and design of the more positivist network analysis) and vice versa (i.e. the network analysis raised new questions that required a qualitative ethnographic approach).

Table I lists the multidisciplinary seminal works that guided our research design.

2014 IEEE International Conference on Big Data

978-1-4799-5666-1/14/$31.00 ©2014 IEEE

Table I: Multidisciplinary approach

Employed approach                         Discipline(s)                       Seminal works
----------------------------------------  ----------------------------------  ----------------
Netnography                               Marketing                           [11][12]
Mining of software repositories           Software-Engineering                [13][14]
Network analysis of digital trace data    Information-Systems                 [15][9]
Network analysis with emphasis on the     Biomedicine, Bibliometrics,         [7][8][16]
visualization of collaborative            Innovation-Studies
activities
Network analysis of massive networked     Physics, Mathematics,               [17][18][19][20]
data. Use of clustering and               Computer-Science, Anthropology,
sub-community detection algorithms.       Bioinformatics

IV. RESULTS

By mining the OpenStack Nova source-code change-log with Social Network Analysis, we extracted information about software developers (network nodes) and their collaborative behaviors (network edges). Taking a longitudinal approach, we observed how the collaborative network evolved over time. As in [7], [8], [16], we emphasized the visualization of collaborative activities: we constructed visualizations that capture the evolution of the OpenStack collaborative network release after release. Figure 1 aggregates visualizations laid out by degree centrality (i.e. with the more networked developers at the center) that depict the evolution of code collaboration in the OpenStack Nova project.
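The extraction step described above can be illustrated with a minimal Python sketch. It is not the tooling used in this study: the change-log records, the developer names and the edge rule (two developers are linked when they changed the same file within the same release) are assumptions made purely for illustration. Degree centrality is computed in its standard normalized form, the number of collaborators divided by the number of other developers in that release's network.

```python
from itertools import combinations

# Hypothetical change-log records: (release, developer, file changed).
CHANGELOG = [
    ("Bexar", "alice", "nova/compute/api.py"),
    ("Bexar", "bob", "nova/compute/api.py"),
    ("Cactus", "alice", "nova/compute/api.py"),
    ("Cactus", "bob", "nova/compute/api.py"),
    ("Cactus", "carol", "nova/compute/api.py"),
    ("Cactus", "bob", "nova/virt/xenapi.py"),
    ("Cactus", "dave", "nova/virt/xenapi.py"),
]


def build_networks(changelog):
    """Return {release: {developer: set of collaborators}}.

    Assumed edge rule: two developers collaborate when both changed
    the same file within the same release.
    """
    authors = {}  # release -> file -> set of developers
    for release, developer, path in changelog:
        authors.setdefault(release, {}).setdefault(path, set()).add(developer)

    networks = {}
    for release, files in authors.items():
        neighbors = {}
        for developers in files.values():
            for developer in developers:
                # Register every developer as a node, even without collaborators.
                neighbors.setdefault(developer, set())
            for a, b in combinations(sorted(developers), 2):
                neighbors[a].add(b)
                neighbors[b].add(a)
        networks[release] = neighbors
    return networks


def degree_centrality(neighbors):
    """Normalized degree centrality: degree / (number of nodes - 1)."""
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


networks = build_networks(CHANGELOG)
for release, neighbors in networks.items():
    ranked = sorted(degree_centrality(neighbors).items(), key=lambda kv: -kv[1])
    print(release, ranked)
```

Keeping one network per release mirrors the longitudinal, release-after-release reading of the data; other edge rules (for example, co-editing within a time window) would fit the same structure.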

With very dense networks, the direct interpretation of the visualizations was difficult. Therefore, we performed an automated sub-community detection using the state-of-the-art Simmelian backbones extraction method [21]. We opted to use data from the latest OpenStack releases (Grizzly, Havana and Icehouse) due to higher project maturity and a steady decline of group cohesion (i.e. the tendency for sub-grouping), as plotted in Figure 2. The emergent sub-communities in Figure 3 surprisingly reveal a low degree of homophily in code collaboration.
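To give a feel for backbone-style filtering, the following Python sketch applies a deliberately simplified rule: it keeps only edges that are embedded in at least one triangle, then reads the connected components of what remains as sub-communities. This is not the full Simmelian backbones method of [21], which ranks and compares neighborhoods; it is a simpler triangle filter swapped in for illustration. The edges, developer names, affiliations and the same-affiliation share used as a crude homophily proxy are all hypothetical.

```python
# Hypothetical undirected collaboration edges between developers.
EDGES = [
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),  # first triangle
    ("dave", "erin"), ("dave", "frank"), ("erin", "frank"),  # second triangle
    ("carol", "dave"),                                       # bridge, in no triangle
]

# Hypothetical developer affiliations.
AFFILIATION = {
    "alice": "rackspace", "bob": "citrix", "carol": "hp",
    "dave": "ibm", "erin": "ibm", "frank": "redhat",
}


def adjacency(edges):
    """Build {node: set of neighbors} from an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def triangle_backbone(edges, min_triangles=1):
    """Keep edges whose endpoints share at least `min_triangles` neighbors."""
    adj = adjacency(edges)
    return [(a, b) for a, b in edges if len(adj[a] & adj[b]) >= min_triangles]


def components(edges):
    """Connected components of the edge list, each as a sorted list of nodes."""
    adj = adjacency(edges)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


def same_affiliation_share(edges, affiliation):
    """Fraction of edges joining developers affiliated with the same firm."""
    if not edges:
        return 0.0
    same = sum(1 for a, b in edges if affiliation[a] == affiliation[b])
    return same / len(edges)


backbone = triangle_backbone(EDGES)
groups = components(backbone)
share = same_affiliation_share(backbone, AFFILIATION)
print(groups, share)
```

In this toy data the bridge between the two groups is dropped and only one of the six remaining ties joins developers of the same firm, which is the kind of pattern a low degree of homophily would produce.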

V. CONTRIBUTION

Our social network visualizations capture how features of collaboration (among software developers affiliated with different companies) evolved over time, release after release, event after event. Our findings reinforce prior research addressing how rival firms simultaneously collaborate and compete in the open-source arena [22], [23], [9].

A set of our retrieved network visualizations is available, in the public domain, at our project website: http://www.jteixeira.eu/OpenStackSNA.

[Figure 1: Visualizations with degree centrality. Panels: (a) Bexar, (b) Cactus, (c) Diablo, (d) Essex, (e) Folsom, (f) Grizzly, (g) Havana, (h) Icehouse. Node labels mark developer affiliations, including Canonical, Citrix, Cloudscale, HP, IBM, Intel, Mirantis, Nebula, Rackspace, Red Hat and VMware.]

[Figure 2: Evolution of group size and group cohesion over time.]

[Figure 3: Sub-community detection using Simmelian backbones extraction. Node labels mark developer affiliations: Canonical, Citrix, Cloudscale, HP, IBM, Mirantis, Nebula, Rackspace, Red Hat and VMware.]

REFERENCES

[1] B. Fitzgerald, “Software Crisis 2.0,” Computer, vol. 45, no. 4, pp. 89–91, 2012.

[2] A. Bonaccorsi and C. Rossi, “Why open source software can succeed,” Research Policy, vol. 32, no. 7, pp. 1243–1258, 2003.

[3] E. Raymond, “The cathedral and the bazaar,” Knowledge, Technology & Policy, vol. 12, no. 3, pp. 23–49, 1999.

[4] S. Q. Mian, J. Teixeira, and E. Koskivaara, “Open-Source Software Implications in the Competitive Mobile Platforms Market,” in Building the e-World Ecosystem. Springer, 2011, pp. 110–128.

[5] B. Sengupta, S. Chandra, and V. Sinha, “A research agenda for distributed software development,” in Proceedings of the 28th International Conference on Software Engineering. ACM, 2006, pp. 731–740.

[6] M. Paasivaara and C. Lassenius, “Collaboration practices in global inter-organizational software development projects,” Software Process: Improvement and Practice, vol. 8, no. 4, pp. 183–199, 2003.

[7] A. Cambrosio, P. Keating, and A. Mogoutov, “Mapping collaborative work and innovation in biomedicine: A computer-assisted analysis of antibody reagent workshops,” Social Studies of Science, pp. 325–364, 2004.

[8] B.-A. Lundvall, “User-producer relationships, national systems of innovation and internationalisation,” National Systems of Innovation: Towards a Theory of Innovation and Interactive Learning, pp. 45–67, 1992.

[9] J. Teixeira and T. Lin, “Collaboration in the open-source arena: The WebKit case,” in Proceedings of the 52nd ACM SIGMIS Conference on Computers and People Research, Singapore, 2014.

[10] K. Crowston and J. Howison, “The social structure of free and open source software development,” First Monday, vol. 10, no. 2-7, 2005.

[11] R. V. Kozinets, Netnography: Doing ethnographic research online. Sage Publications Limited, 2009.

[12] R. V. Kozinets, “The field behind the screen: using netnography for marketing research in online communities,” Journal of Marketing Research, pp. 61–72, 2002.

[13] G. Robles, J. M. Gonzalez-Barahona, M. Michlmayr, and J. J. Amor, “Mining large software compilations over time: another perspective of software evolution,” in Proceedings of the 2006 International Workshop on Mining Software Repositories. ACM, 2006, pp. 3–9.

[14] H. Kagdi, M. L. Collard, and J. I. Maletic, “A survey and taxonomy of approaches for mining software repositories in the context of software evolution,” Journal of Software Maintenance and Evolution: Research and Practice, vol. 19, no. 2, pp. 77–131, 2007.

[15] J. Howison, A. Wiggins, and K. Crowston, “Validity issues in the use of social network analysis with digital trace data,” Journal of the Association for Information Systems, vol. 12, no. 12, 2011.

[16] W. Glanzel and A. Schubert, “Analysing scientific networks through co-authorship,” in Handbook of Quantitative Science and Technology Research. Springer, 2005, pp. 257–276.

[17] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.

[18] W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.

[19] M. E. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E, vol. 69, no. 2, p. 026113, 2004.

[20] B. Adamcsek, G. Palla, I. J. Farkas, I. Derenyi, and T. Vicsek, “CFinder: locating cliques and overlapping modules in biological networks,” Bioinformatics, vol. 22, no. 8, pp. 1021–1023, 2006.

[21] B. Nick, C. Lee, P. Cunningham, and U. Brandes, “Simmelian backbones: amplifying hidden homophily in Facebook networks,” in Advances in Social Networks Analysis and Mining (ASONAM), 2013 IEEE/ACM International Conference on. IEEE, 2013, pp. 525–532.

[22] J. Teixeira, “Open-coopetition in the Cloud computing Industry: the OpenStack NOVA case,” in Proceedings of the first European Social Networks Conference, ser. European Social Networks Conference, 2014.

[23] J. Teixeira and T. Lin, “Rivalry and collaboration in the open-source arena: The WebKit case,” in Sunbelt XXXIV, 2014.
