A Comparison of Two Physical Data Designs for Interactive Social
Networking Actions*
Sumita Barahmand, Shahram Ghandeharizadeh, Jason Yap
Database Laboratory Technical Report 2012-08
Computer Science Department, USC
Los Angeles, California 90089-0781
{barahman,shahram,jyap}@usc.edu
October 7, 2013
Abstract
This paper compares the performance of an SQL solution that implements a relational data model with a document
store named MongoDB. We report on the performance of a single node configuration of each data store and assume the
database is small enough to fit in main memory. We analyze utilization of the CPU cores and the network bandwidth
to compare the two data stores. Our key findings are as follows. First, for those social networking actions that read
and write a small amount of data, the join operator of the SQL solution is not slower than the JSON representation of
MongoDB. Second, with a mix of actions, the SQL solution provides either the same performance as MongoDB or
outperforms it by 20%. Third, a middle-tier cache enhances the performance of both data stores as query result look
up is significantly faster than query processing with either system.
A Introduction
There is an abundance of data stores with both the computer industry and the research arena contributing novel
architectures and data models. In [10], Cattell surveys and classifies 22 data stores to motivate a quantitative analysis
of the alternative designs and implementations. We study a specific aspect of this vast multi-faceted topic, namely, a
comparison of an industrial strength relational database management system (RDBMS) named SQL-X¹ and a NoSQL
document store named MongoDB. While SQL-X implements a relational data model [12], MongoDB implements a
JSON representation of data [14]. Each offers a rich set of design choices. We use the BG [5] benchmark to exercise
the different capabilities of each data store. This social networking benchmark consists of a database and eleven
actions (see Table 1) that either read or write a small amount of data from the database.

* A shorter version of this paper appeared in the ACM International Conference on Information and Knowledge
Management (CIKM), San Francisco, CA, Oct 2013.
¹ Due to licensing agreement, we cannot disclose the identity of this system.
While SQL-X does not scale horizontally, MongoDB scales to a large number of nodes. In addition to impacting
the performance of a single node instance of each data store, physical organization of data impacts the horizontal
scalability of MongoDB. While both are important, we focus on the performance of a single node instance of each
data store for the following reasons. First, it provides insights into the tradeoffs associated with two alternative logical
data designs, namely, relational and JSON. An interesting finding is that the use of the join operator is not slower than
the JSON representation, see Section D.
Second, while BG's interactive social networking actions are simple, they interact in complex ways to offer a
wide range of design choices. We show it is beneficial to move the work of read actions to write actions when the
workload is dominated by read actions. (According to Facebook, more than 99% of their workload consists of
queries [3, 28].) Materialized views are not appropriate because they provide either very low performance or a high
amount of stale data, see Section E.
Figure 1: Basic SQL-X database design, consisting of the Members, Friends(inviterID, inviteeID, status), Resource,
and Manipulation tables. The underlined attribute(s) denote the primary key of a table. Attributes with a hat denote
the indexed attributes.
Third, BG's social networking actions impose a small amount of work on a node and should be processed by a
single node of a multi-node data store. Otherwise, the overhead of parallelism limits the scalability of a data store [19,
15, 33]. By understanding factors that enhance the performance of a single node, we provide a solid foundation
to investigate alternative designs that impact the scalability of a data store. These alternatives include partitioning
strategies, replication, and secondary indexes [15, 29].
Our primary contribution is a quantitative comparison of two different data store architectures to provide insights
into their operation. Data store architects may use these results to enhance the performance of their existing
data store for social networking actions. Social networking sites may use these results to fine tune the performance of
their existing deployments. For example, when computing the friends of a member [3], obtained results suggest a join
of two tables might be fast enough as long as images are represented effectively, see Section C.
Related work: An experimental comparison of RDBMS solutions with the NoSQL systems for interactive data-serving
environments and decision support workloads is presented in [18]. Its key finding is that the SQL systems provide
significant performance advantages and that NoSQL systems are fairly competitive in many cases. It employs the
YCSB benchmark [13] for its evaluation of interactive environments. Our evaluation focuses on interactive social
networking actions and considers a richer conceptual data model and workload. We explore a subset of a large space
of possibilities including the use of caches, quantifying their tradeoffs.

Figure 2: Basic MongoDB design of BG's database.
The rest of this paper is organized as follows. Section B provides an overview of the BG social networking
benchmark. It includes an organization of data, termed Basic, with both SQL-X and MongoDB. Sections C to F
present physical design enhancements to the Basic design of each system. This discussion includes a quantitative
analysis to identify the best design decisions (termed Boosted) for each system. We use these designs to compare
SQL-X with MongoDB in Section G. Brief conclusions and future research directions are presented in Section I.
B Overview of BG Benchmark
BG [5] is a benchmark to rate data stores for interactive social networking actions and sessions. These actions and
sessions either read or write a very small amount of the entire data set. BG uses a thread to emulate a member of
a social networking site viewing either her own profile or that of another member, listing either her friends or those
of another member, inviting another member to be friends, viewing her top-k resources (image, posting), and others.
The first column of Table 1 lists all actions supported by BG. These actions are common to sites such as Facebook,
LinkedIn, Twitter, FourSquare, and others [5].
BG models a database consisting of a fixed number of members (M) with a registered profile. Each member
profile may consist of either zero or 2 images. With the latter, one image is a thumbnail and the second is a higher
resolution image. While thumbnails are displayed when listing friends of a member, the higher resolution image is
displayed when a member visits a user's profile. An experiment starts with a fixed number of friends (φ) and resources
per member (ρ). This study assumes a database of 10,000 profiles with 2 KByte thumbnail images and 12 KByte
profile images. We also consider databases with no images. All experiments start with 100 friends² and resources
per user, φ=ρ=100. (Loading the database is slightly faster with MongoDB than with SQL-X [7].) We have
conducted experiments with a 100K member database. The reported observations and trends do not change as long as
the benchmark database is smaller than the server memory.

BG Social Actions                        Type    Very Low (0.1%) Write   Low (1%) Write   High (10%) Write
View Profile (VP)                        Read    40%                     40%              35%
List Friends (LF)                        Read    5%                      5%               5%
View Friends Requests (VFR)              Read    5%                      5%               5%
Invite Friend (IF)                       Write   0.02%                   0.2%             2%
Accept Friend Request (AFR)              Write   0.02%                   0.2%             2%
Reject Friend Request (RFR)              Write   0.03%                   0.3%             3%
Thaw Friendship (TF)                     Write   0.03%                   0.3%             3%
View Top-K Resources (VTR)               Read    40%                     40%              35%
View Comments on Resource (VCR)          Read    9.9%                    9%               10%
Post Comment on a Resource (PCR)         Write   0%                      0%               0%
Delete Comment from a Resource (DCR)     Write   0%                      0%               0%
Table 1: Three mixes of social networking actions.
BG computes a Social Action Rating (SoAR) of a data store based on a pre-specified service level agreement
(SLA) by manipulating the number of threads (i.e., emulated members) that perform actions simultaneously. SoAR
is the maximum system throughput (actions per second) that satisfies the SLA. All SoAR ratings in this paper are
established with the following SLA: 95% of requests observe a response time of 100 milliseconds or faster with
unpredictable (stale) data lower than 0.1%. An ideal physical data design is one that maximizes SoAR of a system.
Data designs using materialized views and cache augmented data stores may produce stale data. The former is because
the RDBMS may propagate updates to the materialized view asynchronously. The latter is due to write-write race
conditions between the data store and the cache [20].
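
The SLA check and the resulting rating can be sketched in a few lines of Python. This is a minimal illustration of the
definition above, not BG's implementation: the Run record, its field names, and the idea of passing in one record per
thread count are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """Measurements from one hypothetical benchmark run at a fixed thread count."""
    threads: int
    throughput: float                # actions per second
    response_times_ms: List[float]   # one entry per request
    stale_fraction: float            # fraction of reads that observed stale data


def satisfies_sla(run: Run, pct: float = 0.95, max_ms: float = 100.0,
                  max_stale: float = 0.001) -> bool:
    """True when at least `pct` of requests finish within `max_ms`
    and the stale-read fraction stays below `max_stale`."""
    if not run.response_times_ms:
        return False
    fast = sum(1 for t in run.response_times_ms if t <= max_ms)
    return (fast / len(run.response_times_ms) >= pct
            and run.stale_fraction < max_stale)


def soar(runs: List[Run]) -> float:
    """Highest throughput among runs that satisfy the SLA; zero if none do."""
    return max((r.throughput for r in runs if satisfies_sla(r)), default=0.0)
```

A rating of zero falls out naturally when no run meets the SLA, which matches how zero-valued SoAR is interpreted in
this paper.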
Figure 1 shows the relational design of BG's database. The underlined attributes are the primary keys of the
identified tables. Index structures are constructed on these attributes to facilitate efficient processing of read actions.
For example, with a view profile action referencing a member with a specific userid, say 5, a hash index facilitates efficient
retrieval of the Member corresponding to this userid. The Members table may store images as BLOBs. Alternatives are
discussed in Section C. Computing either the list of friends or pending friends requires a join between the Members and
Friends tables. Section E explores the use of materialized views and their alternatives to migrate the work of read
actions to write actions for computing simple analytics. We report SoAR of these designs with SQL-X.
Figure 2 shows the JSON design of BG's database tailored for use with MongoDB. For each member M_i, this
design maintains three different arrays: 1) pendingFriends maintains the id of members who have extended a friend
invitation to M_i, 2) confirmedFriends maintains the id of members who are friends with M_i, and 3) wallResourceIds
maintains the id of resources (e.g., images) posted on M_i's profile. One may store the profile and thumbnail image of
each member either in the file system, MongoDB's GridFS, or as an array of bytes. Figure 2 shows the last two
choices. When images are stored in the GridFS, the profileimageid and thumbnailimageid are stored as attributes of
the Members collection (instead of the array of bytes shown in Figure 2). Section C shows one design provides a
SoAR significantly higher than the other two.

² Median Facebook friend count is 100 [35, 4].
In the next 3 sections, we provide additional details about BG's actions and their implementation using both the
relational and JSON representations. We discuss changes to the physical organization of data and their impact on
the SoAR of SQL-X and MongoDB. We analyze SoAR of SQL-X with different mixes of actions, see Table 1. Post
Comment and Delete Comment actions are eliminated because we have no improved designs to offer for these actions.
To simplify discussion, this paper classifies BG's actions into those that either read or write data. A read action is
one that queries data and retrieves data items without updating them. A write action is one that either inserts, deletes,
or updates data items. Column 2 of Table 1 identifies different read and write actions.
All reported SoAR numbers are based on a dedicated hardware platform consisting of six PCs connected using a
Gigabit Ethernet switch. Each PC consists of a 64 bit 3.4 GHz Intel Core i7-2600 processor (4 cores with 8 threads)
configured with 16 GB of memory, 1.5 TB of storage, and one Gigabit/second networking card³. One node acts as
our data store server (either MongoDB or SQL-X) at all times⁴. All other nodes are used as BGClients to generate
workload for this node. With all reported SoAR values greater than zero, either the disk, all cores, or the networking
card of the server hosting a data store becomes fully utilized. We report on the use of two networking cards to eliminate
the network as a limiting resource. A reported SoAR of zero means the design failed to satisfy the SLA.
C Manage Images Effectively
There is folklore that an RDBMS efficiently handles a large number of small images, while file systems are more
efficient for storage and retrieval of large images [31]. With BG, we show physical organization of profile and thumbnail
images in a data store impacts its SoAR dramatically. For example, if thumbnail images are not stored as a part
of the profile structure representing a member then the performance of the system for processing the List Friend (LF)
action is degraded significantly. This holds true with both MongoDB and SQL-X. Performance of SQL-X is further
enhanced when profile images are stored in the file system. The same does not hold true with MongoDB. Below, we
provide experimental results to demonstrate these observations.
The LF action of BG retrieves the thumbnail image and the profile information of each friend of a member (see
attributes shown in Figure 1). Figure 3 shows the SoAR of LF with SQL-X and MongoDB with 100 friends per
member. While SQL-X performs a join between two tables (Members and Friends of Figure 1) to perform this action,
MongoDB looks up an array of member identifiers (confirmedFriends of Figure 2 for the referenced Member JSON
instance) and retrieves the JSON object for each member. With SQL-X, we consider thumbnails stored in either the
file system or inline with the record representing the member. With MongoDB, we consider thumbnails stored in its
Grid File System (GridFS) or as an array of bytes in the JSON-like representation of a member. With both systems,
storing the thumbnail image as a part of the Member profile enhances SoAR of the system from zero to a few
hundred. In these experiments, the CPU of the data store becomes 100% utilized. Note that, with a single node, the
join operation of SQL-X is not necessarily slower than MongoDB's processing of the confirmedFriends array to retrieve
documents corresponding to the friends of the member.

Figure 3: SoAR of List Friends (LF) with different organization of 2 KB thumbnail image, M=10K, φ=100.
SoAR (actions/second): SQL-X inline BLOB 395, SQL-X FS 0; MongoDB inline array of bytes 296, MongoDB GridFS 0.

Figure 4: SoAR of SQL-X for processing a workload consisting of 100% View Profile (VP) action with images stored
as either BLOBs or in the FS, M=10K, φ=100.
SoAR (actions/second): 2 KB images, BLOB 37,902 and FS 12,300; 12 KB images, BLOB 182 and FS 7,695.

³ In some experiments, the server hosting either SQL-X or MongoDB is configured with two networking cards, each
is a one Gigabit/second card.
⁴ The same node is used as either SQL-X or MongoDB server in all experiments.
The performance of SQL-X for processing the View Profile (VP) action of BG is enhanced when large profile images
are not stored in the RDBMS, see Figure 4. An alternative is to store them in the file system with a member record
maintaining the name of the file containing the corresponding profile image [31, 8]. Figure 4 shows the SoAR of
SQL-X with these two alternatives for two different image sizes: 2 KB and 12 KB. (As a comparison, with no images,
SoAR of SQL-X is 119,746 for this workload.) A small image size, 2 KB, enables SQL-X to store the image inline
with the member record, outperforming the file system by a factor of 3. SQL-X stores images inline as long as they
are smaller than 4 KB. Beyond this, for example with our assumed 12 KB image size, the performance of SQL-X
diminishes dramatically, enabling the file system to outperform it by more than 40-fold.
MongoDB's GridFS provides effective support for images. Its SoAR is comparable to storing these images in the
file system. It outperforms the file system by more than a factor of two with very large profile images, e.g., 500 KB.
It is worth noting that SQL-X outperforms MongoDB with image sizes smaller than 4 KB by inlining them in profile
records. Beyond this limit, MongoDB outperforms SQL-X. Similar to the thumbnail discussions, if profile image sizes
are known to be small in advance then one may inline them with MongoDB by representing them as an array of bytes
in the Members collection, see Figure 9. Key considerations include MongoDB's limit of 16 Megabytes for the size
of a document and the impact of large documents on actions that do not require the retrieval of the profile image. For
example, the List Friend (LF) action does not require the profile image. MongoDB provides an interface to remove
some attribute values of a document while constructing a query. For example, one may query the Members collection
for a document with userid 1 and not retrieve the profile image of the qualifying document by issuing the following
expression: db.member.find({"userid":1}, {"profileimage":false}).

Member 1's number of friends
  One record per friendship:
    SELECT count(*) FROM Friends
    WHERE (inviterID=1 or inviteeID=1) and status='C'
  Two records per friendship:
    SELECT count(*) FROM Friends
    WHERE inviterID=1 and status='C'
Member 1's list of friends
  One record per friendship:
    SELECT m.* FROM Members m, Friends f
    WHERE ((f.inviterID=1 and m.userid=f.inviteeID) or
           (f.inviteeID=1 and m.userid=f.inviterID))
          and f.status='C'
  Two records per friendship:
    SELECT m.* FROM Members m, Friends f
    WHERE f.inviterID=1 and f.status='C' and m.userid=f.inviteeID
Member 1 invites Member 2
  Both representations:
    INSERT INTO Friends values (1, 2, 'P')
Member 2 accepts Member 1's invitation
  One record per friendship:
    UPDATE Friends SET status='C'
    WHERE inviterID=1 and inviteeID=2
  Two records per friendship:
    1. UPDATE Friends SET status='C'
       WHERE inviterID=1 and inviteeID=2
    2. INSERT into Friends (inviteeID, inviterID, status) values (1, 2, 'C')
Member 2 rejects Member 1's invitation
  Both representations:
    DELETE FROM Friends
    WHERE inviterID=1 and inviteeID=2 and status='P'
Member 1 thaws friendship with Member 2
  Both representations:
    DELETE FROM Friends
    WHERE ((inviterID=1 and inviteeID=2) or (inviterID=2 and inviteeID=1)) and status='C'
Table 2: One record and two record representation of a friendship with one table, Friends table of Figure 1.
D Friendship
The concept of friendship between two members is central to a social networking site. Most of BG's actions model
this concept, see Table 2. An important consideration is how to represent the thumbnail image of each member listed
as a friend of a referenced member. This was discussed in Section C. Hence, this section focuses on a BG database
configured with no images.

Member 1's number of friends
  One record per friendship:
    SELECT count(*) FROM Frds
    WHERE frdID1=1 or frdID2=1
  Two records per friendship:
    SELECT count(*) FROM Frds
    WHERE frdID1=1
Member 1's list of friends
  One record per friendship:
    SELECT m.* FROM Members m, Frds f
    WHERE ((f.frdID1=1 and m.userid=f.frdID2) or
           (f.frdID2=1 and m.userid=f.frdID1))
  Two records per friendship:
    SELECT m.* FROM Members m, Frds f
    WHERE f.frdID1=1 and m.userid=f.frdID2
Member 1 invites Member 2
  Both representations:
    INSERT INTO PdgFrds values (1, 2)
Member 2 accepts Member 1's invitation
  One record per friendship:
    1. DELETE FROM PdgFrds WHERE inviterID=1 and inviteeID=2
    2. INSERT into Frds (frdID1, frdID2) values (1, 2)
  Two records per friendship:
    1. DELETE FROM PdgFrds WHERE inviterID=1 and inviteeID=2
    2. INSERT into Frds (frdID1, frdID2) values (1, 2), (2, 1)
Member 2 rejects Member 1's invitation
  Both representations:
    DELETE FROM PdgFrds
    WHERE inviterID=1 and inviteeID=2
Member 1 thaws friendship with Member 2
  Both representations:
    DELETE FROM Frds
    WHERE (frdID1=1 and frdID2=2) or (frdID1=2 and frdID2=1)
Table 3: One record and two record representation of a friendship with two tables, Frds and PdgFrds tables of Figure 8.
D.1 Relational Design: A Tale of One or Two
With a relational design, one may represent pending and confirmed friendships as either one or two tables. With
each alternative, a friendship might be represented as either one or two rows. We elaborate on these designs below.
Subsequently, we establish their SoAR. Obtained results show that a two table design is superior to a one table
design.
Figure 1 shows a design that employs one table. It employs an attribute named "status" to differentiate between
pending and confirmed friendships: A 'C' value denotes a confirmed friendship while a 'P' value denotes a pending
friendship. The second column of Table 2 shows the SQL commands issued to implement the alternative BG actions
with this design. Note the use of disjuncts ("or") in the qualification list of the SQL queries. A designer may simplify
these queries and eliminate disjuncts by representing a friendship with two records. The resulting queries are shown
in the third column of Table 2. The design changes the implementation of the Accept Friendship Request action
(fourth row of Table 2) into a transaction consisting of two SQL statements. In our implementation, all transactions
are implemented as stored procedures in SQL-X.
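
The difference between the two representations can be tried out with a small, self-contained sketch. It uses Python's
sqlite3 module as a stand-in for SQL-X, which is an assumption made only so the example runs anywhere; the composite
primary key and the helper names are likewise hypothetical. The queries follow the first row of Table 2.

```python
import sqlite3


def friend_count_one_record(conn, member):
    # One row per friendship: the member may appear on either side.
    q = ("SELECT count(*) FROM Friends "
         "WHERE (inviterID=? OR inviteeID=?) AND status='C'")
    return conn.execute(q, (member, member)).fetchone()[0]


def friend_count_two_records(conn, member):
    # Two rows per friendship: the member always appears as inviterID.
    q = "SELECT count(*) FROM Friends WHERE inviterID=? AND status='C'"
    return conn.execute(q, (member,)).fetchone()[0]


def make_db(rows):
    # Assumed schema: only the three Friends attributes used in Table 2.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Friends (inviterID INTEGER, inviteeID INTEGER, "
                 "status TEXT, PRIMARY KEY (inviterID, inviteeID))")
    conn.executemany("INSERT INTO Friends VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


# Member 1 is friends with 2 and 3; member 4 has a pending invitation to 1.
one = make_db([(1, 2, "C"), (3, 1, "C"), (4, 1, "P")])
two = make_db([(1, 2, "C"), (2, 1, "C"),
               (3, 1, "C"), (1, 3, "C"), (4, 1, "P")])
```

Both functions report two friends for Member 1 on their matching data set. Running the disjunct-free query against the
one record data set undercounts, which is why the query and the representation have to be chosen together.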
An alternative to the one table design is to employ two different tables and separate pending friend invitations from
confirmed invitations, see physical design of Figure 8 and queries of Table 3. This eliminates the "status" attribute of
the one table design. However, the data designer is still faced with the dilemma of representing a friendship either as one
row or two rows in the table corresponding to the confirmed friends. The second and third columns of Table 3 show
the SQL commands with these two possibilities. A key difference is that SQL queries are simpler with the two record
design.
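
As an illustration of the two table design with two records per friendship, the following sketch runs the Accept Friend
Request steps of Table 3 as one transaction. It assumes SQLite (via Python's sqlite3) in place of SQL-X, and plain
Python functions in place of stored procedures; the primary keys and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PdgFrds (inviterID INTEGER, inviteeID INTEGER,
                          PRIMARY KEY (inviterID, inviteeID));
    CREATE TABLE Frds (frdID1 INTEGER, frdID2 INTEGER,
                       PRIMARY KEY (frdID1, frdID2));
""")


def invite(conn, inviter, invitee):
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT INTO PdgFrds VALUES (?, ?)", (inviter, invitee))


def accept(conn, inviter, invitee):
    # Remove the pending row and add both directions of the confirmed
    # friendship within a single transaction.
    with conn:
        conn.execute("DELETE FROM PdgFrds WHERE inviterID=? AND inviteeID=?",
                     (inviter, invitee))
        conn.execute("INSERT INTO Frds (frdID1, frdID2) VALUES (?, ?), (?, ?)",
                     (inviter, invitee, invitee, inviter))


def friend_count(conn, member):
    # No disjunct is needed because each friendship is stored in both directions.
    return conn.execute("SELECT count(*) FROM Frds WHERE frdID1=?",
                        (member,)).fetchone()[0]


invite(conn, 1, 2)
accept(conn, 1, 2)
```

After these two calls each member sees exactly one friend and the PdgFrds table is empty.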
When comparing the alternative designs, the two record design requires more storage space than the one record
design. However, its resulting SQL queries are simpler to author and reason about. With one user issuing requests
(single threaded BG), the larger number of records does not impact the service time of issued queries and update
commands because index structures facilitate retrieval and manipulation of the relevant records. In a multi-user setting
with a mix of read and write actions, see Table 1, the two table design outperforms the one table design when the
frequency of write actions is high enough to result in conflicts. Figure 5 shows SoAR of these two alternatives with
each friendship represented as two records. Observed SoAR with a mix of very low (0.1%) write actions is almost
identical for the two designs due to the use of index structures and a low conflict rate. With a mix of high (10%)
write actions, the two table design outperforms the one table design by more than 30%. We speculate this is due to
the ACID properties of transactions slowing down the one table design as it is used concurrently to process both pending and
confirmed friendship transactions. The two table design reduces this contention among concurrently executing actions.
For example, the query to compute the number of pending friend invitations for a member is no longer blocked by the
transaction that thaws friendship between two members.
Figure 5: SoAR of SQL-X with either one or two tables for pending and confirmed friendships with two workloads,
M=10K and φ=100. Each friendship is represented as two records.
SoAR (actions/second): 0.1% write, 1 table 22,781 and 2 tables 22,830; 10% write, 1 table 13,424 and 2 tables 17,887.
D.2 MongoDB: List Friends
With MongoDB, BG's List Friend (LF) action is most interesting because it must retrieve the documents pertaining to
the friends of a referenced member. These can be retrieved either one document at a time or all documents at once.
With the former, LF is implemented by issuing a query to retrieve the basic profile information for each confirmed
friend. With the latter, the entire list of friends is used with the $in operator to construct the query issued to MongoDB.
This operator selects all the documents whose identifiers match the values provided in the list. With an underutilized
system (a few BG threads), the second approach provides a response that is approximately 1.5 times faster than the
first. This is because the first approach incurs the overhead of issuing multiple queries across the network for each
document. The SoAR of these two alternatives is almost identical because the CPU of the server hosting MongoDB
becomes 100% utilized.
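
The difference in round trips between the two approaches can be made concrete with a small simulation. The
FakeCollection class below is an in-memory stand-in invented for this example, not a MongoDB driver API; its find_in
method plays the role of a single query that uses the $in operator.

```python
from typing import Dict, List


class FakeCollection:
    """In-memory stand-in for a document collection that counts round trips."""

    def __init__(self, docs: List[dict]):
        self._docs: Dict[int, dict] = {d["userid"]: d for d in docs}
        self.round_trips = 0

    def find_one(self, userid: int) -> dict:
        self.round_trips += 1          # one query per document
        return self._docs[userid]

    def find_in(self, userids: List[int]) -> List[dict]:
        self.round_trips += 1          # one query for the whole list
        return [self._docs[u] for u in userids if u in self._docs]


def list_friends_one_at_a_time(members: FakeCollection, userid: int) -> List[dict]:
    me = members.find_one(userid)
    return [members.find_one(f) for f in me["confirmedFriends"]]


def list_friends_in(members: FakeCollection, userid: int) -> List[dict]:
    me = members.find_one(userid)
    return members.find_in(me["confirmedFriends"])
```

With 100 friends per member, the first function issues 101 queries and the second issues 2, while both return the same
documents. The simulation says nothing about server-side CPU cost, which is what bounds SoAR in the experiments above.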
MongoDB supports a host of write concerns, see [27] for details. We investigate two, termed normal and safe
in MongoDB's documentation. Both are implemented by MongoDB's Java client. The normal write concern returns
control once the write is issued to the driver of the client. The safe write concern returns control once it receives an
acknowledgment from the server. With a low system load (BG with one thread), the normal write concern improves the
average response time of MongoDB by 13%. It does not, however, improve the processing capability of the MongoDB
server and has no impact on its SoAR when compared with the safe write concern. Moreover, it produced a very low
(<0.1%) amount of unpredictable reads.
E Migrate Work of Reads to Writes
Due to a high read to write ratio of the workload of social networking sites [28], one may enhance the average
service time of the system by migrating the workload of reads to writes. With RDBMSs, one way to realize this is
by using materialized views, MVs. Section E.1 discusses this approach and shows that it slows down write actions
so dramatically that it is difficult to argue they are interactive. It presents an alternative named Manual that does not
suffer this limitation. However, Manual requires additional software and incurs the overhead of a development life
cycle.

Figure 6: SoAR with the Basic SQL-X design of Figure 1, materialized views (MV) for aggregates as attributes with
both synchronous and asynchronous mode of refresh, and developer maintained (Manual) aggregates as attributes,
M=10K, φ=100, BG database has no images.
Bar labels (SoAR in actions/second) over the 0.1% writes and 10% writes workloads: MV Asynch 15,630 and 20,092;
MV Synch 665 and 0; Manual 23,733 and 16,221; Basic 13,224 and 22,781.

Figure 7: Alternative cache augmented SQL architectures: (a) Client-Server (CS) architecture, (b) Shared Address
Space (SAS) architecture.
E.1 Read Mostly Aggregates as Attributes
Social networking sites present their members with individualized "small analytics" [32]. These are aggregate
information such as a member's number of friends. BG models these using its View Profile (VP) action that provides
each member with her count of resources, friends, and pending friend invitations. One may implement these in two
ways: 1) Compute the aggregates each time the VP action is invoked, 2) Store the value of aggregates, look them up
to process VP, and maintain them up to date in the presence of write actions that impact their value. An example SQL
query that implements the former is illustrated in the first row of Table 2. The latter migrates the workload of read
actions to write actions. It is appropriate when write actions are infrequent. Below, we present two alternatives to
implement the second approach.
One may use Materialized Views (MVs) of SQL-X to store the value of BG's simple analytics and require the
RDBMS to maintain their value up to date. This was implemented as follows. First, we define one MV for each
aggregate of the VP action. The resulting 3 views have two columns: user-id and the corresponding aggregate
attribute value. Next, we author a MV that joins these three views with the original Member table (using the user-id
attribute value), implementing a table that consists of each member's attributes along with 3 additional attribute values
representing each aggregate for that member. This table is queried by the VP action to look up the value of its simple
analytic instead of computing it.
One may configure SQL-X to refresh MVs either synchronously or asynchronously in the presence of updates.
The asynchronous refresh is in the order of hours, causing the MV to contain stale data. BG quantifies these as
unpredictable reads. Below, we discuss this in combination with the observed SoAR.
With no profile image and a read workload that invokes the VP action only, the authored MV improves SoAR of
SQL-X more than six-fold, from 19,020 to 119,746 actions per second. With infrequent (0.1%) writes, asynchronous
mode of processing updates enables MVs to enhance SoAR of SQL-X by almost a factor of two, see Figure 6.
However, this causes 31% of read actions to observe unpredictable (stale) data. The amount of unpredictable data increases
to 72% with a high frequency (10%) of write actions, enhancing SoAR of SQL-X by a modest 11%.
The synchronous refresh mode of MVs eliminates unpredictable data. However, as shown in Figure 6, it diminishes
SoAR of SQL-X dramatically. This is because it slows down write actions. As an example, the service time of the
Accept Friend Request write action is slowed down from 1.7 milliseconds to 1.94 seconds⁵ with an under-utilized
system, i.e., one BG thread. These service times are not interactive, rendering MVs inappropriate for BG's workload.
An alternative to MVs, named Manual, is for a software developer to implement aggregates as attributes by
extending the Member table with 3 additional columns, one for each aggregate. When a member registers a profile, these
attribute values are initialized to zero. The developer authors additional software (either in the application software or
in the RDBMS in the form of stored procedures and triggers) for the write actions that impact these attribute values to
update them by either incrementing or decrementing their values by one. For example, the developer extends a write
action that invites Member 1 to be friends with Member 2 to increment the number of pending friends for Member 1
by one as a part of the transaction that updates the Friends table, see Section D.
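
A minimal sketch of the Manual approach follows, again with SQLite (Python's sqlite3) standing in for SQL-X. The
column names friendCount, pendingCount, and resourceCount are invented for the example, and the sketch assumes the
pending counter that changes is the one belonging to the member who receives the invitation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Members (userid INTEGER PRIMARY KEY,
                          friendCount INTEGER DEFAULT 0,
                          pendingCount INTEGER DEFAULT 0,
                          resourceCount INTEGER DEFAULT 0);
    CREATE TABLE Friends (inviterID INTEGER, inviteeID INTEGER, status TEXT,
                          PRIMARY KEY (inviterID, inviteeID));
    INSERT INTO Members (userid) VALUES (1), (2);
""")


def invite(conn, inviter, invitee):
    # The counter update is part of the same transaction as the Friends
    # insert, so both changes commit or roll back together.
    with conn:
        conn.execute("INSERT INTO Friends VALUES (?, ?, 'P')", (inviter, invitee))
        conn.execute("UPDATE Members SET pendingCount = pendingCount + 1 "
                     "WHERE userid=?", (invitee,))


def view_profile(conn, userid):
    # A single lookup replaces the profile query plus three count(*) queries.
    return conn.execute("SELECT userid, friendCount, pendingCount, resourceCount "
                        "FROM Members WHERE userid=?", (userid,)).fetchone()


invite(conn, 1, 2)
```

The same pattern extends to the accept, reject, and thaw actions, each adjusting the affected counters by one inside
its own transaction.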
Manual speeds up the VP action by transforming 4 SQL queries into one. The four queries include retrieval of
the referenced member's profile attribute values, count of friends, count of pending friend invitations, and count of
resources. In our experiments, Manual enhanced SoAR of SQL-X for processing the VP action by the same amount as
MVs with asynchronous update. However, it produces no stale reads. When compared with Basic, Manual provides
at most a 22% improvement as the VP action constitutes 35% to 40% of the workload, see Table 1.
A drawback of Manual is the additional software and its associated software development life cycle (design,
implementation, testing and debugging, and maintenance). Its key advantages include interactive response times for
both the read and write actions with no unpredictable reads.

⁵ A 1,141 fold slow down.
F Cache Augmented Database Management Systems, CADBMS
With both MongoDB and SQL-X, a developer may avoid issuing a query to the data store by caching its output (the value)
under its unique input (the key). This is the main motivation for middle tier caches [23, 11, 36, 17, 16, 25, 1, 2, 30, 22, 28,
21]. This section focuses on a specific subclass that employs in-memory Key-Value Stores (KVS) with a simple put,
get, delete interface [21, 28]. Its use case is as follows. The developer modifies each read action to convert its input
to a key and use this key to look up the KVS for a value. If the KVS returns a value then the value is produced as the
output of the action without executing the main body of the read action that issues data store queries. Otherwise, the
body of the read action executes, issues data store queries to compute a value (i.e., output of the read action), stores
the resulting key-value pair in the KVS for future use, and returns the output to BG.
The developer must modify each write action to invalidate key-value pairs that are impacted by its insert, delete,
update command to the data store. For example, the write action that enables Member 1 to accept Member 2's
friendship request must invalidate 5 key-value pairs. These correspond to Member 1's profile, list of friends and list of
pending friends, and Member 2's profile and list of friends.
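
The read and write paths described above can be sketched with a dictionary-backed store. The KVS class, the key
naming scheme ("profile:1", "friends:1", "pending:1"), and the helper names are assumptions made for illustration;
they do not reflect the interface of memcached, Ehcache, or BG.

```python
class KVS:
    """Minimal in-memory key-value store with put, get, delete."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


def cached_read(kvs, key, compute):
    """Return the cached value for `key`, computing and storing it on a miss."""
    value = kvs.get(key)
    if value is None:
        value = compute()          # stands in for the data store queries
        kvs.put(key, value)
    return value


def keys_invalidated_by_accept(invitee, inviter):
    """The five key-value pairs affected when `invitee` accepts `inviter`."""
    return [f"profile:{invitee}", f"friends:{invitee}", f"pending:{invitee}",
            f"profile:{inviter}", f"friends:{inviter}"]


def accept_friend_request(kvs, invitee, inviter, update_datastore):
    update_datastore()             # the write to the data store
    for key in keys_invalidated_by_accept(invitee, inviter):
        kvs.delete(key)
```

This single-threaded sketch ignores the race conditions between the data store and the cache mentioned in Section B;
under concurrency, a read that computes its value before the write and stores it after the invalidation would leave a
stale entry behind.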
The maximum number of unique key-value pairs is a function of the number of members/resources and read
actions. With a database of 10,000 members, the view profile action of BG may populate the KVS with 10,000
unique key-value pairs. With View Comment on Resource (VCR) action and 100 resources per member, the KVS
may consist of a million unique key-value pairs. The actual number of cached key-value pairs might be lower due to
a skewed pattern of data access, e.g., a workload that employs a Zipfian distribution [6] to reference data items.
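
The two counts quoted above follow from one key per distinct input of each read action. A small worked example,
with hypothetical function and key names:

```python
def max_unique_pairs(members: int, resources_per_member: int) -> dict:
    """Upper bound on distinct keys per read action, one key per distinct input."""
    return {
        # View Profile: one key per member.
        "view_profile": members,
        # View Comments on Resource: one key per resource in the database.
        "view_comments_on_resource": members * resources_per_member,
    }
```

Calling max_unique_pairs(10_000, 100) reproduces the 10,000 and one million figures for the database used in this study.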
There are two categories of in-memory KVSs: Client-Server (CS) and Shared Address Space (SAS) [21], see
Figure 7. With CS, the application server communicates with the cache via message passing. A popular CS KVS is
memcached [26, 28]. With SAS, the KVS runs in the address space of the application. Examples include Terracotta's
Ehcache [34] and JBoss Cache [9]. SAS KVSs implement the concept of a transaction to atomically update all replicas
of a key-value in different application instances. Both CS and SAS architectures may support replication of key-value
pairs and implement consistent hashing to enhance availability of data and implement elasticity. A discussion of
these topics is a digression from our focus. Instead, we focus on the performance of a single cache instance. With
memcached, the cache server is a process hosted on a different server than the one hosting the data store. With
Ehcache, the cache instance executes in the address space of the BGClient.
In the following, we focus on the impact of the KVS with a very low (0.1%) and a high (10%) frequency of writes.
With these workloads, both MongoDB and SQL-X provide comparable SoARs as either the CPU or the network
bandwidth of the server hosting the KVS becomes 100% utilized. Hence, without loss of generality, we present
SoARs observed with SQL-X using either memcached or Ehcache.
Table 5 presents SoAR of the alternative designs when the database is configured with either no images or 12
KB profile image sizes with two different mixes of workloads. These results show Ehcache provides the highest
SoAR, outperforming memcached by more than a factor of 13 (5) with images (no images). This is because it runs
in the same address space as the BGClient, avoiding the overhead of transmitting key-value pairs across the network
and deserializing them. In these experiments, the four core CPU of the server hosting BGClient (and the Ehcache)
becomes 100% utilized, dictating the overall system performance. (This bottleneck explains why there is no difference
between SQL-X and MongoDB once extended with Ehcache.) It is interesting to note that the SoAR of Ehcache with
12 KB images is almost half of that with no images. This is due to network transmission of images for
invalidated key-value pairs, increasing network utilization from 30% to 88%.

Figure 8: Boosted SQL-X database design with profile images stored in the file system and thumbnail images as inline
blobs, consisting of the Members, Frds(frdID1, frdID2), PdgFrds(inviterID, inviteeID), Resource, and Manipulation
tables. One record in the Frds table represents the friendship between two members. The underlined attribute(s) denote
the primary key. Attributes with a hat denote the indexed attributes.
With memcached, the four core CPU of its server becomes 100% utilized when there are no images, dictating
its SoAR rating. With 12 KB profile images, the network bandwidth becomes 100% utilized, dictating the SoAR of
memcached. In these experiments, memcached could produce key-value pairs at a rate of up to 2 Gbps as its server
was configured with two Gbps networking cards.
G A Comparison
Table 5 shows SoAR of the Basic SQL-X and MongoDB data designs when compared with their Boosted alternatives.
(See Figures 1 and 8 (2 and 9) for the Basic and Boosted SQL-X (MongoDB) data designs.) Boosted incorporates all
of the best practices presented in the previous sections except for the use of caches⁶. With both SQL-X and MongoDB,
the Basic data design is inferior to the Boosted alternative because it is inefficient and utilizes its 4 core CPU fully.
With Boosted and no images, the CPU of the server hosting the data store becomes 100% utilized, dictating its
SoAR. This is true with both SQL-X and MongoDB and the two workloads, 0.1% and 10% frequency of writes. These
results suggest SQL-X processes BG's workload more efficiently than MongoDB because its SoAR rating is more than
two times higher.
With 12 KB profile images, both SQL-X and MongoDB continue to utilize their CPU fully with the Basic data
design. With Boosted, the network becomes 100% utilized and dictates their SoAR rating. These results suggest
SQL-X transmits less data than MongoDB to process BG's workload because its SoAR rating is 50% higher.

⁶ The presented SoAR for memcached and Ehcache use the Boosted data design.

Figure 9: Boosted MongoDB design of BG's database.
We have conducted experiments with a 100K member database. The reported trends and observations hold true
for this and other databases as long as the available server memory is larger than the size of the benchmark database.
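The ratios discussed in this section can be recomputed directly from the SoAR values listed in Table 5. The snippet below is only a convenience sketch; the dictionary layout and the key names (`boosted_sqlx`, `boosted_mongodb`, and so on) are hypothetical labels chosen here, while the numbers are copied from the table.

```python
# SoAR values (actions per second) copied from Table 5.
SOAR = {
    ("no image", "0.1% write"): {
        "boosted_sqlx": 33694, "boosted_mongodb": 14665,
        "memcached": 55634, "ehcache": 271760,
    },
    ("no image", "10% write"): {
        "boosted_sqlx": 28503, "boosted_mongodb": 13117,
        "memcached": 49006, "ehcache": 286260,
    },
    ("12 KB image", "0.1% write"): {
        "boosted_sqlx": 11820, "boosted_mongodb": 7700,
        "memcached": 11888, "ehcache": 147845,
    },
    ("12 KB image", "10% write"): {
        "boosted_sqlx": 10977, "boosted_mongodb": 7438,
        "memcached": 10271, "ehcache": 144672,
    },
}


def ratio(config, numerator, denominator):
    """SoAR of one design divided by SoAR of another for a configuration."""
    row = SOAR[config]
    return row[numerator] / row[denominator]


for config in SOAR:
    sql_vs_mongo = ratio(config, "boosted_sqlx", "boosted_mongodb")
    eh_vs_mc = ratio(config, "ehcache", "memcached")
    print(config, f"SQL-X/MongoDB={sql_vs_mongo:.2f}", f"Ehcache/memcached={eh_vs_mc:.2f}")
```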
H Break-Even point
The average service time of a data store is a function of the mix of read and write actions that constitute its workload.
Both the use of caches (KVS of Section F) and an implementation of aggregates as attributes (Manual of Section E.1)
slow down write actions in order to speed up read actions. This is beneficial as long as both the speedup and the
frequency of read actions are high enough to compensate for the slowdown observed with write actions. Otherwise, the
migration of work may degrade (instead of enhance) overall system performance. An obvious question is how
the factor of slowdown interacts with the factor of speedup. We answer this question by characterizing what factor of
slowdown eclipses the observed speedup. This is the break-even point. A technique that slows down write actions by
a higher factor than this break-even point is undesirable because it does not enhance system performance. Derivation
of the break-even point is as follows.
Consider the average service time of a lightly loaded system that does not incur queuing delays. Let $\omega$ and
$W$ denote the frequency of write actions and their average service time, respectively. The probability of a read
action is $(1-\omega)$. Assuming $R$ denotes the average service time of read actions, the average service time of
the system, $\bar{S}$, is:

$$\bar{S} = \omega \times W + (1-\omega) \times R \qquad (1)$$

The values for $W$ and $R$ may vary with different physical data design changes such as those outlined in Sections E.1
and F.

Symbol       Definition
$\omega$     Frequency of write actions.
$R$          Avg service time of read actions with the Basic design.
$W$          Avg service time of write actions with the Basic design.
$\bar{S}$    Avg service time of a workload.
$\alpha$     Factor of speedup in $R$ with a change of data design.
$\delta$     Factor of slowdown in $W$ with a change of data design.
Table 4: Parameters and their definitions

Assume Equation 1 denotes average service time without a physical data design change. This is the Basic database
design of Figure 1. A physical data design change (say aggregates as attributes of Section E.1) enhances the value
of $R$ and increases $W$. Let $\alpha$ ($\delta$) denote the factor of speedup (slowdown) in the average read (write)
service time observed with this physical data design change. The new average service time is:

$$\bar{S} = \omega \times \delta \times W + (1-\omega) \times \frac{R}{\alpha} \qquad (2)$$

One may solve for values of $\delta$ that cause Equations 1 and 2 to break even:

$$\delta = \frac{\alpha-1}{\alpha} \times \frac{1-\omega}{\omega} \times \frac{R}{W} + 1 \qquad (3)$$

For example, if the average service time of read actions with the Basic design is 150 msec ($R$ = 150 msec) and a physical
data design technique, say use of memcached, enhances this time 8.6 fold ($\alpha$ = 8.6), then the same technique may
slow down write actions by a factor of 23,381 and provide the same service time as the Basic database design when the
frequency of write actions is 0.4% ($\omega$ = 0.004) and the average service time of an update is 1.4 msec ($W$ = 1.4 msec).
A value of $\delta$ higher (lower) than that computed using Equation 3 means the new database design is slower (faster)
than the Basic design. In our example, values of $\delta$ greater than 23,381 imply that the Basic design is faster than using
memcached.
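The derivation can be checked numerically. The sketch below evaluates Equation 3 and then confirms that, at the computed slowdown, the Basic design (Equation 1) and the changed design (Equation 2) have the same average service time. The function and argument names are hypothetical. Note that with the rounded inputs quoted in the text the formula evaluates to roughly 23,577 rather than the reported 23,381; we assume the reported figure was derived from unrounded measurements.

```python
def break_even_slowdown(write_freq, read_time, write_time, speedup):
    """Equation 3: write slowdown at which the new design matches Basic."""
    return ((speedup - 1) / speedup
            * (1 - write_freq) / write_freq
            * read_time / write_time
            + 1)


def avg_service_time(write_freq, read_time, write_time, speedup=1.0, slowdown=1.0):
    """Equation 2; with speedup = slowdown = 1 it reduces to Equation 1."""
    return (write_freq * slowdown * write_time
            + (1 - write_freq) * read_time / speedup)


# Rounded inputs from the example: 0.4% writes, 150 msec reads,
# 1.4 msec writes, and an 8.6 fold read speedup.
slowdown = break_even_slowdown(0.004, 150.0, 1.4, 8.6)
basic = avg_service_time(0.004, 150.0, 1.4)
changed = avg_service_time(0.004, 150.0, 1.4, speedup=8.6, slowdown=slowdown)

print(f"break-even slowdown: {slowdown:.0f}")
print(f"Basic: {basic:.4f} msec, changed design: {changed:.4f} msec")
```

Any slowdown above the computed value makes the changed design slower than Basic for this workload, and any slowdown below it makes the changed design faster.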
Equation 3 quantifies several trivial and intuitive observations:
1. A proposed physical data design technique outperforms the Basic when write actions are rare, i.e., small values
of $\omega$.
2. A proposed physical data design technique may incur a higher slowdown in service time of write actions (higher
values of $\delta$) and continue to outperform the Basic design as long as it enhances the service time of read actions
by a wider margin (higher values of $\alpha$). However, this observation has limits: A linear increase in value of
$\alpha$ does not compensate for a linear increase in value of $\delta$. In our example, speeding read actions up (value of
$\alpha$) from 8.6 to 100 fold enables a modest increase in slowdown of write actions (value of $\delta$) from 23,381 to
26,185 fold. Note that more than a 10 fold increase in $\alpha$ did not provide a 10 fold increase in $\delta$. And, this
observation becomes exaggerated with larger values of $\alpha$. In our example, increasing $\alpha$ from 100 to 1000
fold facilitates a negligible increase of $\delta$ from 26,185 to 26,423 fold. Equation 3 shows this phenomenon with
the component $\frac{\alpha-1}{\alpha}$ that approaches the value 1 (i.e., becomes irrelevant) with large values of $\alpha$.
3. The tolerable slowdown in write actions (value of $\delta$) is almost an inverse linear function of the frequency of
write actions exhibited by a workload (value of $\omega$). Thus, given a workload, if its frequency of write actions is
halved then the proposed physical data design may tolerate almost twice as much slowdown in average service
time of its write actions and outperform the Basic design. With Equation 3, this is captured with $\frac{1-\omega}{\omega}$,
which is nearly proportional to $1/\omega$ for small values of $\omega$.
4. The average service time of read actions ($R$) and write actions ($W$) that constitute the workload impacts the
tolerable slowdown in write actions ($\delta$) linearly. If read actions are more complex than write actions with a
significantly higher average service time then a proposed physical data design may slow down write actions by
higher factors and continue to outperform the Basic design. This is captured by $\frac{R}{W}$ in Equation 3.
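Observations 2 and 3 can be illustrated by sweeping the inputs of Equation 3. The sketch below uses the rounded example inputs from this section (150 msec reads, 1.4 msec writes, 0.4% writes), so its absolute values differ slightly from those reported above; the function and variable names are hypothetical.

```python
def break_even_slowdown(write_freq, read_time, write_time, speedup):
    """Equation 3: tolerable write slowdown for a given read speedup."""
    return ((speedup - 1) / speedup
            * (1 - write_freq) / write_freq
            * read_time / write_time
            + 1)


READ_MSEC, WRITE_MSEC = 150.0, 1.4

# Observation 2: larger read speedups yield diminishing gains in the
# tolerable write slowdown.
by_speedup = {s: break_even_slowdown(0.004, READ_MSEC, WRITE_MSEC, s)
              for s in (8.6, 100.0, 1000.0)}
for speedup, slowdown in by_speedup.items():
    print(f"speedup {speedup:>6}: tolerable slowdown {slowdown:,.0f}")

# Observation 3: halving the write frequency almost doubles the
# tolerable write slowdown.
full = break_even_slowdown(0.004, READ_MSEC, WRITE_MSEC, 8.6)
half = break_even_slowdown(0.002, READ_MSEC, WRITE_MSEC, 8.6)
print(f"halving write frequency scales the tolerable slowdown by {half / full:.3f}")
```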
The first observation is a property of social networking applications. While Observation 2 reasons about a proposed
physical data design change, Observation 3 provides intuition about different workloads with varying frequency of
write actions ($\omega$). Observation 4 focuses on the average service time of write and read actions with the RDBMS as
they impact the overall average service time of both the Basic design and the proposed physical data design change.
The simple analytical model of this section enables a database designer to reason about a proposed change in the
physical organization of data to estimate whether it provides an enhancement for the overall system.
I Conclusion and Future Research
This experimental paper compares organization of a social networking database with two alternative data store
architectures: an industrial strength SQL solution and a NoSQL document store named MongoDB. We used the BG
benchmark for this comparison. The observed SoAR ratings are impacted by two key parameters of BG. First, the mix
of actions that constitute the workload. Second, configuration of member profiles with either two images or no
images. We analyzed alternative enhancements to both the relational representation of SQL-X and JSON representation
of MongoDB. A summary of these enhancements is as follows, starting with SQL-X:
• Separate pending and confirmed friendships into two tables: This design modification provides a 33% improvement
in performance with a mixed workload that consists of a high (10%) fraction of write actions.
• Store profile images that cannot be inlined in the file system: This improves SoAR of SQL-X 40 fold. Note
that thumbnail images must be inlined and stored with the record representing a member. Otherwise, SoAR of
SQL-X drops to zero.
• Represent simple analytics using aggregate functions as attribute values and maintain them up to date using
additional software (either at the application level or as triggers).
A negative finding is that materialized views slow down the response time of write actions so dramatically that they
can no longer be considered interactive.

                              Basic    Basic     Boosted   Boosted
                              SQL-X    MongoDB   SQL-X     MongoDB   memcached   Ehcache
No Image        0.1% Write    12,322   6,512     33,694    14,665    55,634      271,760
                10% Write     13,976   5,895     28,503    13,117    49,006      286,260
12 KB Profile   0.1% Write    305      0         11,820    7,700     11,888      147,845
Image           10% Write     300      0         10,977    7,438     10,271      144,672
Table 5: SoAR of alternative designs, $M$ = 10K, $\phi$ = $\rho$ = 100.

With MongoDB, a key finding is to store the thumbnail image of a member as an array of bytes in the member's
JSON object. This results in a SoAR of seven thousand actions per second. If thumbnail images are stored in either
MongoDB's GridFS or the file system of the operating system, SoAR of MongoDB diminishes to zero.
With both MongoDB and SQL-X, one may extend the data store with a cache to look up query results instead of
processing queries to compute results. We investigated both memcached and Ehcache. While both enhance system
performance, the improvement is dramatic with Ehcache because it eliminates the overhead of transmitting key-values
across the network and deserializing them.
When comparing the best SQL-X and MongoDB physical data designs, SoAR of SQL-X is 2.5 times higher than
MongoDB when BG's database is configured with no images. With both systems, the CPU of the server hosting the
data store becomes 100% utilized. This suggests SQL-X processes BG's workload more efficiently than MongoDB.
When BG's database is configured with 12 KB images, the network (2 Gbps) becomes 100% utilized with both data
stores. In this scenario, SoAR of SQL-X is 30% higher than MongoDB. This suggests SQL-X transmits less data than
MongoDB when processing BG's workload.
A key feature of MongoDB, memcached, and Ehcache is their ability to horizontally scale to a large number of
nodes. VoltDB is an SQL solution that also scales to a large number of nodes [24, 29]. An ongoing research effort
is to quantify the scalability of these systems using BG. This will explore a host of new physical data designs such as
different partitioning strategies, replication, and secondary indexes [29]. It will include an interaction of these design
choices with the boosted designs presented in this study.
J Acknowledgments
We thank Mark Callaghan and the anonymous reviewers of CIKM 2013 for their insights and valuable comments.
References
[1] C. Amza, A. L. Cox, and W. Zwaenepoel. A Comparative Evaluation of Transparent Scaling Techniques for Dynamic Content
Servers. In ICDE, 2005.
[2] C. Amza, G. Soundararajan, and E. Cecchet. Transparent Caching with Strong Consistency in Dynamic Content Web Sites.
In Supercomputing, ICS '05, pages 264–273, New York, NY, USA, 2005. ACM.
[3] T. Armstrong, V. Ponnekanti, D. Borthakur, and M. Callaghan. LinkBench: A Database Benchmark Based on the Facebook
Social Graph. ACM SIGMOD, June 2013.
[4] L. Backstrom. Anatomy of Facebook, http://www.facebook.com/note.php?noteid=10150388519243859, 2011.
[5] S. Barahmand and S. Ghandeharizadeh. BG: A Benchmark to Evaluate Interactive Social Networking Actions. CIDR, January
2013.
[6] S. Barahmand and S. Ghandeharizadeh. D-Zipfian: A Decentralized Implementation of Zipfian. ACM SIGMOD DBTest
Workshop, June 2013.
[7] S. Barahmand and S. Ghandeharizadeh. Expedited Rating of Data Stores Using Agile Data Loading Techniques. CIKM,
2013.
[8] D. Beaver, S. Kumar, H. Li, J. Sobel, and P. Vajgel. Finding a Needle in Haystack: Facebook's Photo Storage. In OSDI.
USENIX, October 2010.
[9] JBoss Cache. JBoss Cache, http://www.jboss.org/jbosscache.
[10] R. Cattell. Scalable SQL and NoSQL Data Stores. SIGMOD Rec., 39:12–27, May 2011.
[11] J. Challenger, P. Dantzig, and A. Iyengar. A Scalable System for Consistently Caching Dynamic Web Data. In the 18th
Annual Joint Conference of the IEEE Computer and Communications Societies, 1999.
[12] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, 13(6), June 1970.
[13] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking Cloud Serving Systems with YCSB. In
Cloud Computing, 2010.
[14] D. Crockford. The Application/JSON Media Type for JavaScript Object Notation (JSON). Internet Engineering Task Force
(IETF), RFC 4627, July 2006.
[15] C. Curino, E. Jones, Y. Zhang, and S. Madden. Schism: A Workload-Driven Approach to Database Replication and
Partitioning. VLDB, 3(1-2), September 2010.
[16] A. Datta, K. Dutta, H. Thomas, D. VanderMeer, D. VanderMeer, K. Ramamritham, and D. Fishman. A Comparative Study of
Alternative Middle Tier Caching Solutions to Support Dynamic Web Content Acceleration. In VLDB, pages 667–670, 2001.
[17] L. Degenaro, A. Iyengar, I. Lipkind, and I. Rouvellou. A Middleware System Which Intelligently Caches Query Results. In
IFIP/ACM International Conference on Distributed systems platforms, 2000.
[18] A. Floratou, N. Teletria, D. J. DeWitt, J. M. Patel, and D. Zhang. Can the Elephants Handle the NoSQL Onslaught? In VLDB,
2012.
[19] S. Ghandeharizadeh and D. DeWitt. Hybrid-Range Partitioning Strategy: A New Declustering Strategy for Multiprocessor
Database Machines. VLDB, 1990.
[20] S. Ghandeharizadeh and J. Yap. Gumball: A Race Condition Prevention Technique for Cache Augmented SQL Database
Management Systems. In Second ACM SIGMOD Workshop on Databases and Social Networks, Scottsdale, Arizona, 2012.
[21] S. Ghandeharizadeh and J. Yap. Cache Augmented Database Management Systems. In ACM SIGMOD DBSocial Workshop,
June 2013.
[22] P. Gupta, N. Zeldovich, and S. Madden. A Trigger-Based Middleware Cache for ORMs. In Middleware, 2011.
[23] A. Iyengar and J. Challenger. Improving Web Server Performance by Caching Dynamic Data. In Proceedings of the USENIX
Symposium on Internet Technologies and Systems, pages 49–60, 1997.
[24] R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg,
and D. Abadi. H-Store: a High-Performance, Distributed Main Memory Transaction Processing System. VLDB, 1(2), 2008.
[25] A. Labrinidis and N. Roussopoulos. Exploring the Tradeoff Between Performance and Data Freshness in Database-Driven
Web Servers. The VLDB Journal, 2004.
[26] memcached. Memcached, http://www.memcached.org/.
[27] MongoDB. Class WriteConcern, http://api.mongodb.org/java/2.10.1/com/mongodb/WriteConcern.html.
[28] R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H. C. Li, R. McElroy, M. Paleczny, D. Peek, P. Saab, D. Stafford,
T. Tung, and V. Venkataramani. Scaling Memcache at Facebook. In Tenth USENIX Symposium on Networked Systems Design
and Implementation, April 2013.
[29] A. Pavlo, C. Curino, and S. Zdonik. Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems.
In SIGMOD, 2012.
[30] D. R. K. Ports, A. T. Clements, I. Zhang, S. Madden, and B. Liskov. Transactional Consistency and Automatic Management
in an Application Data Cache. In OSDI. USENIX, October 2010.
[31] R. Sears, C. V. Ingen, and J. Gray. To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem. Technical
Report MSR-TR-2006-45, Microsoft Research, 2006.
[32] M. Stonebraker. What Does 'Big Data' Mean? Communications of the ACM, BLOG@ACM, September 2012.
[33] M. Stonebraker and R. Cattell. 10 Rules for Scalable Performance in Simple Operation Datastores. Communications of the
ACM, 54, June 2011.
[34] Terracotta. Ehcache, http://ehcache.org/documentation/overview.html.
[35] J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The Anatomy of the Facebook Social Graph. CoRR, abs/1111.4503,
2011.
[36] K. Yagoub, D. Florescu, V. Issarny, and P. Valduriez. Caching Strategies for Data-Intensive Web Sites. In VLDB, pages
188–199, 2000.