middle tier

Upload: mehab

Post on 07-Apr-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/6/2019 Middle Tier

    1/24

    Middle-tierDatabaseCachingfore-Business

    QiongLuo1SaileshKrishnamurthy2C.Mohan3HamidPirahesh3

    HongukWoo4BruceG.Lindsay3JeffreyF.Naughton1

    Contactemail:{[email protected],[email protected]}

    Abstract

    Scaling up to the enormous and growing Internet population with unpredictable usage patterns, E-

    commerceapplicationsface severe challenges in cost andmanageability, especially fordatabaseservers

    that are deployed as those applications back-ends in a multi-tier configuration. Middle-tier database

    caching is one solution to this problem. In this paper, we present a simple extension to the existing

    federatedfeaturesinDB2UDB,whichenablesa"regular"DB2instancetobecomeaDBCachewithout

    any application modification. On deployment of aDBCache at an application server, arbitrary SQL

    statementsgeneratedfromtheunchangedapplicationhithertointendedforaback-enddatabaseserver,can

    beanswered:atthecache,attheback-enddatabaseserver,oratbothlocationsinadistributedmanner.The

    factorsthatdeterminethedistributionofworkloadincludetheSQLstatementtype,thecachecontent,the

    applicationrequirementondatafreshness,andcost-basedoptimizationatthecache.Wehavedevelopeda

    research prototype of DBCache, and conducted an extensive set of experiments with an E-Commerce

    benchmarktoshowthebenefitsofthisapproachandillustratetradeoffsincachingconsiderations.

    1 Introduction

    Variouscachingtechniqueshavebeendeployedtoincreasetheperformanceofmulti-tierweb-basedapplicationsin

    responseto theever-increasingscaleof theInternet. Suchapplications typically achieve a measureof scalability

    withapplication serversrunningon multiple (relativelycheaper)systems connecting toasingle databasesystem.

    This,however,doesnotsolvethescalabilityproblemforback-enddatabaseservers.Middle-tierdatabasecaching

    (shownasthegrayboxinFigure1)isoneofthesecachingtechniques,whichisdeployedinthemiddle,usuallyat

    theapplicationserver,ofa multi-tierwebsiteinfrastructure when theback-enddatabaseisa bottleneck.Example

    commercial products include the Database Cache of Oracle 9i Internet Application Server [19] and TimesTen's

    Front-Tier[22].

    Inamulti-tiere-Businessinfrastructure,middle-tierdatabasecachingisattractivebecauseofimprovementsto

    thefollowingattributes:

    WorkdoneatIBMAlmadenResearchCenter1ComputerSciencesDepartment,UniversityofWisconsin-Madison,Madison,WI53706

    2ComputerScienceDivision,DepartmentofEECS,UniversityofCaliforniaBerkeley,Berkeley,CA94720

    3IBMAlmadenResearchCenter,650HarryRoad,SanJose,CA95120

    4DepartmentofComputerSciences,UniversityofTexas-Austin,Austin,TX78712

  • 8/6/2019 Middle Tier

    2/24

    2

    (1) Scalability:bydistributingqueryworkloadfromback-endtomultiplecheapfront-endsystems.

    (2) Flexibility:withQoS(QualityOfService)controlwhereeachcachehostsdifferentpartsofthebackend

    data,e.g.,thedataofPlatinumcustomersiscachedwhilethatofordinarycustomersisnot.

    (3) Availability:bycontinuedserviceforapplicationsthatdependonlyoncachedtablesevenif theback-end

    serverisunavailable.

    (4) Performance:bypotentiallyrespondingtolocalitypatternsintheworkloadandsmoothingoutloadpeaks.

    Usingageneral-purposeindustrial-strengthDBMSformiddle-tierdatabasecachingisespeciallyattractivetoe-

    Businesses,eventhoughtherehavebeenspecial-purposesolutions(forexample,e-Bayusesitsownfront-enddata

    cache[18]).Thisismainlyduetocrucialbusinessrequirementssuchasreliability,scalability,andmanageability.

    Forinstance,an industrial-strengthDBMScloselytracksSQLenhancements,andprovidesa varietyof tools for

    application development. More importantly, it provides transactional support, multiple consistency levels, and

    efficient recoveryservices.Finally,anidealcacheshouldbe transparentto theapplicationthat uses it,andthis is

    difficulttoachievewithaspecial-purposesolution.

    Theexistingcommercialapproaches([19],[22])ofusingageneral-purposeindustrialstrengthDBMSinvolve

    anarchitecturesuchasinFigure2.Inthisapproach,thereisasmalllayer(CacheManager)thatdirectsapplication

    database queries to different databases, and the DBCache instance has no knowledge that it is being used as a

    DBCache.Inpractice,theCacheManagercouldbepartofacall-levelinterfacelibrary[19]orspecialpurposecode

    embeddedintheapplicationitself[22].Sincetheapplicationhastwoseparateconnectionstodifferentdatabases,we

    willcallthisthedouble-connectionapproachtodistinguishitfromtheconfigurationinFigure1,whichwewillcall

    thesingle-connectionapproach.Inthelattercase,theDBCacheisfullyawarethatitservesasaproxyfortheback-

    Figure1:Single-connectionApproachtoDatabaseCaching

    Figure2:Double-connectionApproachtoDatabaseCaching

    Web/App.

    Server

    Application

    DBServerBrowserHTTP

    Middle-tier

    DBCache

    Web/App.

    Server

    Application

    DBServerBrowserHTTP

    Cache

    Manager

    Middle-tier

    DBCache

  • 8/6/2019 Middle Tier

    3/24

    3

    enddatabaseserver.Wewillfurtherdistinguishtheformercaseintoapplication-awaredouble-connection[22]and

    application-transparentdoubleconnection [19].Aswewillseerepeatedlyin thispaper,thisdesignchoiceaffects

    manyaspectsofthedatabase.

    Manyresearchquestionsariseinusingafull-fledgeddatabaseengineformiddle-tierdatabasecaching,andthe

    answerstosomeofthemaffecttherelevanceofothers.Indecreasingorderofimportance,theseare:

    (1) Whataretheperformancebottlenecksine-Businessapplications,orinotherwords,areweaddressingthe

    rightproblemsbyfocusingondatabasecaching?

    (2) WillperformancebeacceptableusingacommercialDBMSasa middle-tierdatacache?Featuressuchas

    transactional semantics, consistency, and recovery come with some overhead. What features can be

    dispensedwithinsuchanenvironment?

    (3) Whatdatabasecachingschemesaresuitablefore-Businessapplications?

    (4) How can a databasecaching scheme beimplemented in a commercialdatabase engine and how does it

    performunderrealistice-Commerceworkloads?

    (5) Whatistheimpactofrunningadatabaseserverinthesamecomputerasanapplicationserver?

    (6) Howcanwegeneralizetheseresultstootherkindsofwebapplications?

    Mostofthesequestionsremainopen,partlyduetothediversityofe-Commerceapplicationsandthecomplexity

    ofthesesystems.

    Inthispaper,weattempttoanswersomeofthesequestions.Westartwithexaminingtheopportunitiesine-

    Commerceapplicationsformiddle-tierdatabasecachingbyrunningane-Commercebenchmarkontypicalwebsite

    architectures.Weobservethatthisbenchmarkgenerateda largenumberofsimpleOLTP-stylequeries,theirtable

    accesses were highly skewed on a few read-dominant tables, and there was a clear separation between write-

    dominanttablesand read-dominanttables.We findthatwebapplicationclonescouldscaleup toheavyloadsand

    thisleavesthebackenddatabaseservertoeventuallybecometheperformancebottleneckinthesystem.

    WethenexplorehowtoextendDB2sothatitcanbeusedasamiddle-tierdatabasecache.Wetakethesingle-

    connection approach,becausewe think itis important toleverage existing featuresof theDBMS engineandnot

    produce special purpose code outside of the engine. By extending DB2s federated features we turned a DB2

    instanceintoa tableleveldatabasecachewithoutchanginguserapplications.Thenoveltyofthisextensionisthat

    query plans at the cache may involve both the cache and the remote server based on cost estimation. Through

    experiments, we showed thatthe overhead of adding a full-strength DBMSas a middle-tierdatabasecache was

    insignificantfore-Commerceworkloads.Consequently,middle-tierdatabasecachingimprovedusersresponsetime

    significantlywhenthebackenddatabaseserverwasheavilyloaded.

    Theremainderofthepaperisorganizedasfollows.InSection2,wepresentourprototypemiddle-tierdatabase

    cache (calledDBCache) that leverages existing features of federated technology in a commercial DBMSengine

    (DB2). In Section3, wedescribeour overallevaluation methodologywith an e-Commerce benchmark. We then

  • 8/6/2019 Middle Tier

    4/24

    4

    presentourexperimentalresultsto showtheperformanceimpact ofmiddle-tierdatabasecaching(Section4). We

    discussrelatedworkinSection5andconcludeinSection6withouragendaforfutureresearch.

    2 TurningDB2intoaDBCache

    In this section, we will first discuss our design considerations for the DBCache. Then, we present our cache

    initializationtoolandourmodificationtotheDB2engine.

    2.1 DesignConsiderations

    ThegoalofourDBCacheistoimprovetheperformanceandscalability

    ofweb-basedapplicationsb ydistributingqueryprocessingto theclones

    ofthe applicationsand theunderlyingapplicationservers(as Figure3).

    WhentheDBCacheiscollocatedwiththeapplicationserver,itservesas

    adatabaseserverforthenodeitresidesin.EachDBCachecancachepart

    ofthedatainthebackenddatabaseserverandsharetheserverworkload.

    WithserverworkloaddistributedtomultipleDBCachenodes,theoverall

    userresponsetimeisimprovedwhenthebackendserverisunderheavy

    load.

    Withthisgoalinmind,weexaminethedesignrequirements,andour

    choiceofcachingschemesandexistingmechanisms.

    2.1.1 Requirements

    ThefirstrequirementinourdesignofDBCacheisthatneithertheapplication,northeunderlyingdatabaseschema

    shouldhavetochange.Firstly,itisdesiredthatthedecisiontodeployaDBCachecouldbemadeforanarbitrary

    shrink-wrapped application by local administrators who do not have access to the application source code.

    Secondly,requiringtheapplicationtobecognizantoftheDBCachewouldresultinincreasedcomplexity,whichis

    undesirableespeciallygiventhatcostandmaintainabilityarealreadymajorproblemsinsuchenvironments.Weaim

    to makeit easy for databaseadministrators to set up the cachedatabase schema, and makethe DBCache to be

    transparenttotheapplicationsatruntime.Noticethatin[22]anypartoftheapplicationthatusestheDBCachefor

    performance needs to retarget SQL queries to the specific dialect supported by the DBCache. In general, it is

    desirablethatthe DBCacheinstanceiscapableofunderstandinganySQLstatementthattheback-endcan handle.

    Withapplication-transparentdoubleconnection [19]thisisnotmuchofanissueastheCacheManagercaneasily

    redirectspecialcasequeriestotheback-endserver.

    The second requirement is to support reasonable update semantics. Update transactions increase resource

    Figure3:DeployingDBCache

    DBServer

    Browser

    NetworkDispatcher

    Web/App.

    Server

    Application

    Web/App.

    Server

    Application

    DBCache DBCache

    BrowserBrowser

  • 8/6/2019 Middle Tier

    5/24

    5

    contentionatthebackenddatabaseserveraswellasthecostforcacheconsistencymaintenance.Fortunately,E-

    commerceapplicationshave"highbrowse-to-buy"ratios(read-dominant)andhightoleranceforslightlyout-of-date

    data. This allows us to defer update propagation to the cache so that it affects on-line transactions as little as

    possible.Nevertheless,time limits on thedeferralof cache synchronization arenecessaryto ensure "reasonable"

    freshnessofcacheddata.

    Otherrequirementsincludesupportforfailoverofincomingrequests toa failedcachenodeto anothercache-

    node, and dynamic cache node addition and removal. These requirements are aimed at increasing system

    availability,manageability,andincrementalchangestocapacity.

    2.1.2 ChoiceofCachingSchemes

    Given the requirements, we have the task of choosing a caching scheme for DBCache. We categorize caching

    schemesby the unitof logicaldata(baseor derived) thatis cached asfollows: fulltable, asubsetof atable, an

    intermediatequeryresult,orafinalqueryresult.Although(full)tablelevelcachingcanbeviewedasafulltable

    scan query, it is the only scheme among the four that needs only schema information of the cached table. In

    contrast,otherschemesneedtoknowextrainformation,suchasthequerydefinitionsthatcorrespondtothecurrent

    cachecontent.So,weregardthelatterthreeasqueryresultcachingandconsidercachingasubsetofatableasa

    specialcaseofanintermediatequeryresult.

    Tablelevelcachinghasseveraladvantages,withthemostdefinitiveonesbeingeasydeploymentandtheability

    toanswerarbitraryqueriesoncachedtables.However,updatesonatablemustreachallnodesthatcachethistable

    withinareasonableamountoftime.Ifthequeriesareexpensive,tablelevelcachingdoesnotsaveanycomputation

    evenonacachehitunlesstheaccesspathsaredifferentand/orthecachenodeislessloaded.Incomparison,queryresult caching schemes may save expensive computation on a cache hit. But the downside is that they require

    complex schema definitions for deployment, complex query rewriting at runtime, and complex update logic to

    minimizethenumberofcachenodestosynchronize.

    Note: Wedo not consider the single-connection anddouble-connection approaches to be different schemes.

    Theyaremerelydifferentdesignsforthesameschemei.e.,tablelevelcaching.

    The relative performance of these caching schemes is determined by the characteristics of the data and

    workload. From our observations on an E-Commerce benchmark and anecdotal knowledge of real world e-

    Businesses,tablelevelcachingseemstobesufficientfortheseapplications.ThesimpleOLTP-typequeriesdonot

    needcomplexintermediateresultcaching,thesmallnumberoffrequentlyqueriedtablesserveaseasycandidatesfor

    tablelevelcaching;andthecleanseparationofread-dominanttablesandwrite-dominanttablesenablesselectively

    cachingtablestoreduceupdatepropagationcosts.

    Accordingly,asthefirststepofturningDB2intoamiddle-tierdatacache,weexploretablelevelcaching.Other

    techniquessuchassubsetcachingandintermediateresultcachingintheformofmaterializedviewsandfinalquery

    resultcachingatacall-levelinterfacelibrary(suchasaJDBCdriver)arealsoon-goingworkatIBMbutareoutof

  • 8/6/2019 Middle Tier

    6/24

    6

    thescopeofthispaper.

    2.1.3 LeveragingExistingMechanisms

    Having decided on table level caching, our problem is the following: given a web application that cannot bechanged,itsbackenddatabaseschema,anditsdatabaseworkload,generatethemiddle-tiercachedatabaseschema,

    andprocesstheSQLstatementsutilizingboththecacheandthebackenddatabase.Asdiscussedearlier,wehave

    twoalternatives:oneistobuilda specialpurposecachequeryprocessorfromscratch,andtheotheris toleverage

    existingtechnologyin DB2.Inourresearchprototype,we chosethelatterrouteforthefollowingreasons.Firstly,

    thereisaninterestingmatchbetweenexistingfederatedfeaturesandthequeryroutingfunctionneededinourcache.

    Secondly,weprefertoreuseexistingfeaturesandnotbuildyetanotherqueryprocessor.Finally,ourapproachalso

    allowsustohandledistributequerieseffectively,whereitispossiblefortheoptimizertodecidewhatportionofthe

    queryshouldbeprocessedinthefrontendandwhatportionisinthebackend.Incomparison,intheapplication-

    transparent double-connection approach, new functionality is limited to the Cache Manager and outside the

    databaseengineproper.

    ThefederatedfeaturesinDB2V7 firstappearedinIBMsDataJoiner[10]product.UserscanaccessIBMand

    some non-IBM databases, relational dataand non-relational data,as well as local data or remote data througha

    singlefederatedDB2database.Thelocaldatabase,calledafederator,translatesauserqueryoveralocalaliasfor

    remotedataintoa distributed query toremotedatasources. When settingup a federateddatabase, users needto

    createreferencesinthelocaldatabasetoremotedatasources,forexample,anodetoidentifyaremotehost,aserver

    foraremotedatabaseonthehost,andanicknameforatableorviewintheremotedatabase.

    Ifweuseafederatortomodelacache,andaremotedatasourcetobetheback-enddatabase,weinstantlyhavealmostallthedesiredqueryroutingcapabilitythatweneed.Wecantherefore,designthecacheschematobe such

    that all cached tables are local tables, and all un-cached tables to be nicknames of the back-end tables. SQL

    statementssubmittedtothecachearecompiledasusual;ifastatementinvolvesnicknames,thefederatedfeaturesof

    DB2willestimatethecostofqueryexecutionattheremoteserver,decideonpredicatepushdownbasedonthecost

    estimation,andgenerateadistributedqueryplan.

    Ourprototypecurrentlydoesnotsupportupdatesinthefront-end.Instead,usersneedtoissuethe setpassthru

    command and then issue the updates on the referenced remote tables. Inpassthru mode, the

    federator blindlyforwards thestatementstothespecified remoteserver.This allows users toupdate remotedata

    sources directly and frees the federator from maintaining multi-site updates. As we will see, this restriction

    necessitatessomemodificationtotheDB2engineinordertouseitasamiddle-tierDBCachewithoutchanginguser

    applications.Nevertheless,evenwithoutthisrestriction,wewouldstillneedsomethingspecialtorouteallupdates

    tothebackendevenwhentheaffectedtablesexistinthecache.Itshouldbenotedthatthepassthrufunctionalityis

    presentinDataJoinertohandlethosecaseswhereusersmaychooseto entirelybypassthefrondendandsendthe

    requestsdirectlytothebackend.DBMSsoftenhaveuniqueextensionstothestandardSQL.Usingpassthru,onecan

  • 8/6/2019 Middle Tier

    7/24

    7

    exploitthesefeatures.

    Like the approaches in [19] and [22], we aim to keep the data in our DBCache consistent using standard

    replicationtechniques.Inourapproach,weuseDataPropagator/Relational(DPropR)[11].DPropRisIBMstoolfor

    asynchronousdatareplication forrelational databases. In conjunction with DataJoiner, DPropR canalso support

    non-relational data replication and replication between non-IBM and IBM databases. DPropR consists of three

    independentprograms:anadministrationprogram,adata changecaptureprogram,andanupdateapply program.

    The three programs communicate with one another through a set of tables called control tables. Users can

    subscribe replication requests using GUI tools andthesubscriptioninformation is stored in thecontrol tables.

    Subscriptionscanbeonasetoftables,possiblywithsomeselectionpredicatesontables.Userscanalsospecifythe

    frequency of update propagation, the minimum size of each data transfer, among other options. For IBM data

    sources, DPropR uses a log reader tocapturechanges; for non-IBM data sources, DPropR uses triggerson the

    sourcedatabasetocapturechanges.Theapplyprogramisahumbledatabaseapplicationoperatingonthecontrol

    tablesandthetargettables.

    Inourapproach,asin[19],allupdates musthappen inthe back-end,and so wechose a replication scheme

    (basedonDPropR)forupdatepropagationbetweentheback-enddatabaseandtheDBCache.Thisensuresthatall

    write operations take place at the same location with no possibilities for asynchronously-performed conflicting

    updates.Infuturework,wewilladdressotheroptionsforhandlingupdatestoimproveavailabilityandperformance.

    Inaddition,evenifanactivetransactionholdsawritelockagainstacacheddataitemintheback-end,thatdoesnot

    precludeanothertransactionfromreadingatransaction-consistent olderversionofthesamedataiteminanother

    front-end.Thisway,weexploitrelaxedconsistencyrequirementsofe-Businessapplicationsandavoidtheoverhead

    oftwo-phasecommittransactionalconsistencymaintenance.

    2.2 CacheInitialization

    ImplementingtablelevelcachingusingDB2consistsof twopiecesof work. Thefirstisatoolforinitializingthe

    cache, including choosing tables to cache, and setting up update propagation tools for them. The second is to

    modifytheDB2enginesothatitcanconsideruserspecifieddatafreshnessrequirementsandrouterequeststothe

    cacheortothebackenddatabase.Inaddition,forreasonsexplainedbelow,DPropRalsohastobemodifiedslightly.

    OneofourdesignrequirementsforDBCacheisthattheexistingwebapplicationsandtheirdatabaseschemashould

    notchangewhendeployinga DBCache.Asthisisabasicrequirementforusability,weimplementeda toolcalled

    DBCacheInittoautomaticallycreatethedatabaseschemaforacachedatabaseandinitializeit.

    First,thetoolcollectsnecessaryaccessauthorizationinformationaboutthebackenddatabase,suchastheserver

    name, thebackenddatabasename,anduser/passwordinformation.Then, itusesthat information toexaminethe

    catalogof thebackenddatabase,and furthercollectsinformationaboutexistingtables,views, indexes,referential

    integrity constraints, and so on. Information about triggers is not collected, as triggers are only involved with

    updates,andinthisversion,wewantallupdatestohappenonlyattheback-end.StoredProceduresmaycontain

  • 8/6/2019 Middle Tier

    8/24

    8

    update statements and so we dont deploy themat theDBCache either. InDB2, user-defined functions maynot

    updatedatabasecontent.However,ingeneral,theback-endandfront-endnodesmaybeondifferentplatforms,and

    soforsimplicitythetooldoesnotattempttore-deployuser-definedfunctionsinthefront-end.

    Ideally,aftercollectinginformationaboutthebackenddatabase,thetoolshouldexamineasnapshotofatypical

    workloadconsistingofSQLqueries,anddecidewhichtablestocache.Thisisa similarproblemtothatsolvedby

    DB2's IndexAdvisor,which recommends indexes basedon query workloadand availabledisk space. So, inour

    tool,wepresumethatselectionofthecachedtablesversusuncachedtablesisprovideda-priori.

    Oncethetablestocachearedetermined,thetoolthencreatesacachedatabasewiththesamenameasthatofthe

    backenddatabase,unlessspecifiedotherwise,andreplicatestheto-be-cachedtablesatthecachedatabase.Foreach

    tablethatisnottobecached,thetoolcreatesanicknameforitatthecachedatabase,withthesamenameasthe

    correspondingtableattheback-end.Inaddition,allviewsintheback-endarerecreatedinthefront-end.Bysetting

    upnamesofcachedtablesandnicknamesidenticaltotheircounterpartsatthebackenddatabase,theuserapplication

    doesnotneedtochangeorevenbeawareoftheexistenceofthecachedatabase.TheDB2federatedqueryprocessor

    willdecidehowtoprocessqueries.

    Inaddition,indexesoncachedtablescanbereplicatedtothecachedatabase.Indexesonnon-cachedtables,are

    replicated as"indexspecification only", where actual indexesare notcreated, butthe cache databaseis informed

    abouttheexistenceofarealindexattheback-end.Thisletsthecachedatabasechooseapotentiallybetterplan.

    Finally,forthecachedtables,weneedtoloadtheirinitialdata.Wealsoneedtosetupreplicationsubscriptions

    for them so that when the tables in the backend database change, the cached tables will be brought up to date

    asynchronously.WeuseDPropR forthispurpose.When thecachedtablesaresubscribedfor updatepropagation,

    andthecaptureandapplyprogramsstartrunning,thecachedtablesareloadedwiththedatafromtheircounterparts

    inthebackenddatabaseandareupdatedasynchronouslyatthespecifiedfrequency.

    2.3 InsideDBCache

    2.3.1 DBCacheMode

    BecausewearemodifyingaregularDB2intoaDBCache,weneedawaytoindicatetotheDBMSinstancethatit

    willbeusedasacache.Wehavetwochoicesoverwheretosetthiscachemode:eitherattheDBMSinstancelevel,

    oronaper-databasebasis.Acachemodeattheinstancelevelissimplertoimplementwhileacachemodeatthe

    databaselevelismoreflexible.Ifthecachemodeisonaperdatabasebasis,theDBMSwillmakequeryrouting

    decisionsaccording to which databasethe queriesareon. Thisallowsa DBCache to managesboth reallocal

    databasesandlocaldatabasesthatserveasacacheofaremotedatabase.Thedownsideisthatitislessefficientthan

    aDBMSlevelsettingiftheinstanceisdeployedpurelyasa cache.Inourcurrentprototype,weuseaDBMSlevel

    cachemodesetting,andmaychangetoadatabaselevelsettinginthefuture.

    AnotherissuerelatedtothecachemodesettingiswhetherweallowacacheDBMSinstancetobeservedby

  • 8/6/2019 Middle Tier

    9/24

    9

    multipleremoteDBMSinstances.Ideally,thisshouldbethecasesincefederatedDB2canservequeriesthataccess

    multipleremotedatasources.Unfortunately,federatedDB2doesnotsupportmultiplesiteupdatesyetwhileour

    DBCache will facethis problemif itallows formultiple remote servers.Therefore, wesupportonlyone remote

    serverperDBCacheinstanceinthecurrentstage.Thecachedatabaseisawareoftheidentityof theremoteserver

    andusesitinitsqueryprocessingdecisions.

    Finally, even in the cache mode, applications sometimes may need to access the most-up-to-date data (for

    instance, shoppingcart), and sometimes need toissue updates thatare targetedto the cachedatabaseitself (for

    example, the applyprogram ofDPropR). For the case of accessing the most-up-to-date data, we allow users to

    changethevalueofanexistingspecialregistercalledREFRESHAGEtospecifytheirdatacurrencyrequirement.

    Fortheoperationstargetedatthelocaldatabase,weprovideacommand setpassthrulocalsothatanyoperations

    afterthatandbeforeasetpassthruresetareexecutedlocallyeveninacachemode.Wewillpresentmoredetails

    forpassthruinthefollowingsection.

    2.3.2 Auto-Passthru

    When the DBCache instance detects the cachemodesetting, itwillroute requeststo the cachedatabase, tothe

    backenddatabase,ortobothplacesfordistributedqueryprocessing.Weachievethisbyintroducinganautomatic

    passthru,orauto-passthrumechanismbasedonDB2sexistingpassthrumechanism.

    The existing mechanism relies on the commands set passthru and set passthru

    reset.Allstatementssubmittedafterapassthrusessionhasbeenturnedonandbeforeithasbeenreset,aresentto

    thespecifiedremoteserverdirectly.Theexceptiontothisisanothersetpassthrucommand.Ifausersetspassthru

    toServerA andthensetspassthrutoServerBbeforeresettingpassthru,thestatementsbeforesetpassthruBaresenttoServerAdirectly,andaftersetpassthruBtoServerBdirectly,implicitlyendingthepassthrutoServerA.

    Whena setpassthruresetisissuedlater,thepassthrumodetoB isthenended.Notethatthismodelisdifferent

    fromatrulynestedorastackmodel.

    Unliketheexistingpassthrumechanism,wedonotdependonexplicitpassthrusetandresetcommands,asthat

    will requireapplic\ationmodification.Instead,auto-passthru takes place inthecache mode. Three factorsaffect

    whereastatementisexecuted:(1)statementtype,whetheritisaUDI(Update/Delete/Insert)oraquery(Select),(2)

    thecurrentvalueoftheREFRESHAGEregister,whichindicatestheuserstoleranceforout-of-datedata,and(3)

    anynicknamesinthequery.

    Morespecifically,ifastatementisaUDI,auto-passthrusendsitthroughtotheremoteserver.Thisistoavoid

    conflictsduetomulti-siteupdatesaswellastobypasstheupdaterestrictionsonnicknames.Ifastatementisaread-

    onlyquery, auto-passthruexaminesthecurrentvalueofREFRESHAGEtodecidefurther:ifthevalueis0,it

    means that the user has requested the most-up-to-date data. In this case, auto-passthru will send the statement

    Thisvaluecanbesetorchangedwithoutnecessarilymodifyingtheapplicationprogram.

  • 8/6/2019 Middle Tier

    10/24

    10

    throughtothebackenddatabaseservertoensurethefreshnessofthedata.Otherwise,theauto-passthrumechanism

    willallowthequerytobeexecutedlocallyatthecache.Interestingly,ifa queryisroutedtothelocaldatabasebut

    involvesnicknames,thentheexistingfederatedqueryprocessingtakesover:ifthequeryinvolvesanycachedtables,

    thenadistributedqueryplanisgenerated;otherwise,aremote-onlyplanischosen.Finally,statementsotherthana

    UDIoraquery,suchasDataDefinitionLanguage(DDL)statements,aredirectlypassedtothebackenddatabaseon

    theassumptionthatitiswhattheuserdesired.Thephilosophyhereisthattheuserisingeneralunawareofthe

    existenceofthecache,andsoanyschemachangeshouldbeeffectedattheback-enddatabase.

    Forsituationswhereauserisawareofthecachesexistence(suchascreationoflocalindexes),weneedaway

    tocapturetheusersintent.Anexampleofasituationwhereoperationsareexplicitlytargetedatthecachedatabase

    is the DPropR apply program. This application

    propagatesdataupdatedontheback-enddatabasetothe

    cachedatabase. SinceDpropRreusestheSQLAPI,the

    cacheDBMSenginehasnowayofknowingthatthese

    updates are targeted at the cache database, and so we

    havetoensurethatauto-passthrudoesnotsendthemto

    the back-end database again. Other administration

    activitiesoverthecachedatabasefacethesameproblem.

    We solve this problem byprovidingan SQLstatement

    setpassthrulocal.Applicationsusethiscommandto

    indicate that the following statements should be

    executedlocally evenin thecache mode. Like a normal

    passthru, this can be turned off with the set passthru

    resetcommand.InourcasewemodifiedtheDPropRapplyprogramtoletitissue"setpassthrulocal"command

    rightafteritsetsupaconnectiontothecachedatabaseforapplyingchanges.

    Finally,wealsoneedtosolvetheproblemofanexplicitpassthruthatisspecifiedbytheuserapplication.In

    other words, the use of passthru for a DBCache should not be intrusive. The following scenario (in Figure 4)

    illustratestheproblem.Supposea userapplicationusesitsbackenddatabaseHasa federatorforsomeotherdata

    source A andhas a setpassthru statement set passthru A.Now theadministrator decides todeploy a database

    cacheCinfrontofitsbackenddatabaseH.WhenthesetpassthruAcommandcomestoC,auto-passthruneedsto

    recordthisinformation,andpassthisstatementitselfandthefollowingstatementstothebackenddatabaseH.This

    isbecausethecachedatabaseCshouldnotbypassthebackenddatabaseHandforwardsubsequentSQLrequestsdirectly to A. In fact, the cache database C might have no knowledge of the data source A. In a real-world

    environment,webapplicationserversoftenliveinaDMZ(De-MilitarizedZone)andmaynotbeabletoaccessthe

    sameserversthattheoriginaldatabaseHcan.

    Toputittogether,weshowthestatetransitiongraphcomparingtheoriginalpassthruandourauto-passthruin

    Figure 5. Note: (1)The italics(e.g. read-only) above each boxdenotes which SQLrequests(excluding passthru

    Figure4:HandlingUser-SpecifiedPassthru

    H

    setpassthruA;

    updateA_table;

    A

    updateA_table;

    H

    setpassthruA;updateA_table;

    A

    updateA_table;

    C

    setpassthruA;

    updateA_table;

    W/OCache WithCache

  • 8/6/2019 Middle Tier

    11/24

    11

    set/reset and commit/rollback, which are processed specially) are handled at the cache node. The remaining

    commandsarehandledatthebackendnode.(2)ThePTDS(i*)statemeansamodeofpassingthrutothemost

    recentlyspecifiedpassthrudatasource.

    DB2Local PTDS(i*)setpassthruDS(u)

    setpassthru resetsetpassthruDS(v)

    DBCache

    db2set/resetDBCACHE

    DBCacheLocalsetpassthrulocal

    setpassthrureset

    DBCache PTDS(j*)

    setpassthruDS(x)

    setpassthrureset setpassthruDS(y)

    setpassthrulocal

    read-only

    selective

    readsonly

    none

    everything

    none

    setpassthrulocal

    SetpassthruDS(z)

    OriginalPassthruinFederatedDB2

    Auto-Passthru inDBCache

    Figure5:ComparisonofPassthruStateTransition

    ThetwostatesabovethedottedlineareforDB2initsnon-DBCachemode(regularfederatedfeature).Inthe

    DB2Localstate,thelocalDB2 processesalmostallkindsofSQLstatements,exceptgivingerrormessagesfor

    UDIoperationsonnicknames. OnasetpassthruDS(u)request,thelocalDB2 transitsintoaPT DS(i*)state

    wherei*=u.Inthisstate,thelocalDB2passeseveryfollowingrequestto thebackendDS(i*)untila setpassthru

    resetrequestcomes.IfanothersetpassthruDS(v)requestcomesinthisstate,thelocalDB2staysinthePT

    DS(i*)statewherei*=v.Inthefederatedenvironment,thelocalDB2talkstoonedatasourceatanypointoftime

    (norealnesting)andremembersonlythemostrecentlyspecifieddatasourceforpassthru.Onasetpassthrureset,

    thelocalDB2willgobacktotheDB2Localstate.

    ThethreestatesbelowthedottedlineareforDB2initsDBCachemode.AftertheDBAissuesthedbcache-

    mode-settingcommandviadb2set,thelocalDB2enterstheDBCachestate,inwhichthelocalDB2willhandle

    readsonlocaltablesselectively(onlywhenrefresh-age=any).Uponasetpassthrulocalcommand,thelocalDB2

    enters the DBCache Local state, in which all later requests including UDIs are handled locally. From this

    DBCacheLocalstate,uponasetpassthruDS(z),thelocalDB2enterstheDBCachePTDS(j*)state,inwhich

    thelocalDB2passescommandstothebackenddatabaseandthebackenddatabaseinturnpassesthecommandsto

    thespecifieddatasource.TheDBCachePTDS(j*)statecanalsobereachedfromtheDBCachestateupona

    Set Pasthru DS(x) command. Proper handling of application issued set passthru is important since, as we

    explained before, even in DataJoiner passthru was supported to allow exploitation of non-standard features of

    remotedatasources.

  • 8/6/2019 Middle Tier

    12/24

    12

    3 EvaluationMethodology

    Thereareatleasttwoalternativewaystoexaminetheperformanceimpactofmiddle-tierdatabasecachingine-

    Businessapplications.Onealternativeistoapplyourprototypeinthefieldandperformcasestudies.Unfortunately,

    thisisseldomviableforvariousbusinessreasons.Moreover,withthediversityoftheseapplicationsandworkloads,

    itmaybedifficulttogaininsightsfromcasestudies.Theotheralternativeistopursuesimulationstudies.The

    problem there is that it may be extremely difficult to model thecomplex running environmentsof e-Commerce

    applications.

    Therefore, we chose to pursue a middle-of-the-road approach to test out our ideas. We used hardware and

    softwarecomponentsthatarepopularinreale-Commerceapplicationstobuildourtestingenvironment.Wechose

    an e-Commerce benchmark called ECDW (Electronic Commerce Division Workload) to be the test target

    application. This benchmark was developed and is used internally by the WebSphere Commerce Server

    PerformancegroupatIBMTorontoLab.ItissimilartotheTPC-Wbenchmark[23],buthasmorefeaturesthataretypicaline-Commerceapplications.

    WetestedtheECDWworkloadinseveraltypicalserver-sideconfigurations.WechosetouseproductionDB2

    inallconfigurations,insteadofusingproductionDB2forsomeconfigurationsandusingourDBCacheprototypefor

    someotherconfigurationswithamiddle-tierdatabasecache.Themainreasonwasthatwe wantedto comparethe

    caching scheme results with the non-caching results without worrying about the effects of implementation

    differences.Therefore,forconfigurationswithmiddle-tierdatabasecaching,wecreatedspecialdatabaseschemaat

    the middle-tier to simulate the effects of caching. By simulating table level caching using the existing DB2

    federatedfeatures,itissufficienttoshowhowthisschemeperforms.

    Disclaimer:TheusageoftheECDWbenchmarkthroughoutthispaperisforustogaininsightsinane-Commerce

    applicationandtestourideas.Itwasneitherintendedfornorshouldbeinanywayviewedasthebestpossible

    performanceresultsforanyspecificIBMornon-IBMproducts.

    3.1 BenchmarkDescription

    TheECDWbenchmarkwasdesignedthroughcloseinteractionswitha widerangeofcustomersinordertoreflect

    thekeycharacteristicsofrealworlde-Commerceapplications.Itsimulateswebusersaccessinganon-lineshopping

    mall.TheapplicationusedisIBMsWebSphereCommerceSuite(WCS)[13],andSegueSoftwaresSilkPerformer

    [21]tooltosimulatetheuserworkloadandtestperformance.WedescribeWCS,SilkPerformer,andECDWitselfin

    order.

    WCSisanintegratedsolutionusedby e-commercesitesinvariousindustries.A samplesetofcustomersare:

    BuyUSA, InfinityQS, Mazdas Competition Parts Program, Milwaukee Electronic Corporation, and IBMs own

    shopIBM site (www.ibm.com). It provides services for creating, customizing, running, and maintaining on-line

  • 8/6/2019 Middle Tier

    13/24

    13

    storesthroughouttheirentirelifespanofoperations.Onthedatabaseside,ithasmorethan500tables,manyindexes,

    constraints,andtriggers.Therearetoolstocreatethedatabaseschemaandloadindata.Ontheapplicationside,it

    uses an Enterprise JavaBean (EJB) framework so thatdevelopers canprogram databaseaccesseswithout being

    directlyboundtotheunderlyingdatabaseschema.Furthermore,customerscananddoextendthedatabaseschema

    aswellastheapplication,bycreatingnewcolumnsandtablesandnewEJBs.

    SilkPerformer isa loadandperformancetesting toolfor e-business webapplications. Itemulates workloads

    thattestersspecify,suchasnumberofconcurrentusers,testingperiod,testingscenarios,andotheroptions.Thetool

    warms up thetesting environment, doesthemeasurement,and finisheswith a proper shutdown process. Allthe

    measurementsareperformedontheclientside;inthenormalconfiguration,noinstrumentationisdoneat theweb

    server,applicationserver,orbackenddatabaseserver.Themeasurementoutputincludesthroughput,responsetime,

    user-definedcounters,user-definedtimers,andothernumbers,bothonarunningbasisandonanaggregationbasis.

    TheECDWbenchmarkusesaround300tablesintheWCSschema.Thedatabasesizecanbesmall(10,000

    items,650MB),medium(30,000items,2GB),orlarge(50,000items,3.5GB).The"regularshoppingscenario"is

    definedinaSilkPerformerscript,whichdependsontwovariablesusertypeandshopflow.Theusertypesare:

    new registering user (5%), existing registered user (10%), and guest user (85%). The shop flowsare: browsing

    (88%), browsing and adding toa shoppingcart(5%), browsingand preparing anorder (2%),and browsing and

    buying (5%). These ratios were obtained through customer interactions, and thus attempt to mimic common

    "browsetobuy"ratiosatrealshoppingsites.

    The benchmark measures web interactions and web transactions. Aweb interaction corresponds to a user

    conductingaspecificoperationatabrowserthatmayinvolveafewmouseclicksandpossiblysomeuserinput.For

    instance,aLogOnwebinteractionisoneinwhicharegistered

    userclicksonthe"Logon"link,fillsoutherinformation,and

    clicksonthesubmitbutton.Oraguestuserclicksonalinkto

    a product item to view the details of that item. A web

    transactioncorrespondstoanHTTP sessionaseriesofuser

    operations,whichincludesmultiplewebinteractions.

    Figure 6 shows the regular shopping scenario of the

    benchmark. Each box represents a web interaction. A web

    transaction in the regular shopping scenario consists of the

    following sequence of web interactions: 1) Go to the front

    pageof the store. 2)Iftheuseris anew registered shopper,

    register with the site; if the user is an existing registered

    shopper,logontothesite;otherwise(aguestshopper),goto

    thenextstepdirectly.3)Browseafewitems.4)Ifthecurrent

    shopflowis notjustbrowsing,butalsopreparinganorderor

    buying,loopafewtimesaddingsomeproductsbrowsedinto

    Figure6:RegularShoppingScenario

    Register

    Registereduser

    StoreFront

    LogOn

    AddToShopCart

    Guest Newregisteringuser

    Browse

    ShoppingBrowsingonly Continue

    browsing

    GuestorderingRegistereduserordering

    DisplayOrderInfo FillInAddressBook

    ShippingDetails

    Buying

    Order

  • 8/6/2019 Middle Tier

    14/24

    14

    theshoppingcart,andbrowsingafewitemsagain.5)Ifthecurrentshopflowneedstoprepareanorder,openand

    fillouttheaddressbookforaguestshopperordirectlydisplayorderinformationforregisteredusers,andgenerate

    thedetailedshippinginformationforallusers.6)Ifthecurrentshopflowistobuy,finishtheorder.

    Sincetherearefourdifferentshopflowsintheregularshoppingscenario,awebtransactionmayendafterone

    ofthefollowingfourwebinteractions(grayedboxesasinFigure6):Browse,AddingToShopCart,ShippingDetails,

    andOrdercorrespondingly.

    3.2 Server-sideTopologies

    We tested on five server-side topologies, two of

    them with a middle-tierdatabasecache,and three

    ofthem without.Thethree topologies that do not

    have a middle-tier cache(shown in Figure 7) are

    the following: (1) single box, in which the

    web/application server and the backend database

    serverareonthesamemachine;(2)remoteDB,in

    whichthesetwocomponentsareontwomachines;

    (3) clustered remote DB, in which there are

    multiple web/application server machines to

    communicatewiththesamebackenddatabaseserver.

    The HTTP requests are distributed to the web

    application server machines in round-robin

    fashion.throughanetworkdispatcherorsomemechanismslikethat.

    Figure7:ThreeNon-CachingServer-sideTopologies

    Figure8:TwoCachingServer-sideTopologies

    Web/App.

    Server

    Application

    DBServer

    Browser

    HTTP

    DBServerDBServer

    Web/App.

    Server

    Application

    Browser

    HTTP

    Browser

    HTTP

    NetworkDispatcher

    Web/App.

    Server

    Application

    Web/App.

    Server

    Application

    SingleBox ClusteredRemoteDBRemoteDB

    DBServerDBServer

    Browser

    HTTP

    Browser

    HTTP

    NetworkDispatcher

    ClusteredDBCacheDBCache

    Web/App.

    Server

    Application

    DBCache

    Web/App.

    Server

    Application

    DBCache

    Web/App.

    Server

    Application

    DBCache

  • 8/6/2019 Middle Tier

    15/24

    15

    Thetwotopologiesthatdohavemiddle-tiercachingareshowninFigure8.DBCachetopologyaddsamiddle-

    tierdatabasecachetotheRemoteDBtopology,andClusteredDBCacheaddsamiddle-tierdatabasecachetoeach

    web application server in the Clustered Remote DB topology. Note that the single box topology in Figure 7 is

    essentiallyaDBCachetopologywitha100%cachehitratio.

    3.3 TestEnvironmentDetails

    We used six computers in the tests. Four of them were IBM Netfinity 3500 server machines with an 800MHz

    PentiumIIICPUand1GBmemory,andtwoofthemwereIBMIntelliStationworkstationswith930MHzPentium

    IIICPUand512MBmemory.ThefourservermachineshadWindows2000Serverandthetwoworkstationshad

    Windows2000Professional.Allmachineshad20-30GBdiskspace.AllmachineswereonaLANwithbandwidth

    of100Mbits/second.

    We installed an instance of IBM WebSphere Commerce Suite (WCS) V5.1 on each server system, which

    includestheIBMHTTP Server(repackagedApache),WebSphereApplicationServer(WAS),andDB2V7.1.We

    also deployedthe ECDW storeapplication (JSP files, HTML files, Java classfiles, EJBfiles,etc.) inthe WCS

    instanceoneachseversystem,andcreatedthestoredatabasewiththelargedatasize(3.5GB)usingthescriptsand

    datathatcomewiththebenchmark. Ononeworkstationcomputer,weinstalledSilkPerformerV3.5to beusedas

    thetestdriver(webclient).Ontheotherworkstationmachine,weinstalledIBMWebSphereEdgeServerV3.6tobe

    usedasanetworkdispatchertodistributeHTTPrequeststomultipleWASservers.

    WeconfiguredDB2asspecifiedbytheECDWbenchmarkwithappropriatebufferpoolandlogbuffersizes.To

    intensifythetestingforthedatabaseserver,wesetthethinktime(waitingtimebetweenwebinteractions)tobezero

    intheECDWclienttestdriver.

    3.4 DatabaseWorkloadDetails

    Weareespeciallyinterestedinthecharacteristicsofthedatabaseworkloadfromtheapplication.ThisWCS-based

    benchmarkisacannedapplication.WeusedDB2sdynamicSQLstatementsnapshottool[12]tocapturetheSQL

    executioninformationatthedatabaseserverside.ItreportstheSQLstatementtext(with?representingparameter

    markers input variables that are bound at query run-time as opposed to compile-time), the total number of

    executions,totalexecutiontime,numberofrowsread,andotherinformation.

    WeexaminedtheSQLstatementsthatwerecapturedthroughthesnapshottool.Forthe1-userregularshopping

    workload, there were151 distinct query templates (with parameter markersand literals),consisting of 125 read

    queries,14insertstatements,and11updatestatements.Forthe30-userregularshoppingworkload,therewere388

    distinctquerytemplates,butstillthesame14insertsand11updatesasinthe1-userworkload.Thiswasbecause

    whilealltheinsertionsand updateswereissuedaspreparedstatementswithparametermarkers,somequeries(for

    example,checkingordersofaparticularregistereduser)wereissuedwithliteralsandnotparametermarkers.Asa

  • 8/6/2019 Middle Tier

    16/24

    16

    result, thedifferent workloadshad a fixed numbersof insert andupdatetemplates, buthad differentnumbers of

    selectiontemplates.Thislargeandvaryingnumberof querytemplates madeitdifficultforustoanalyzetheSQL

    query characteristics of the workloads. Since ordering and registering comprised only a small fraction of the

    workload,inlaterexperimentswefocusedonbrowsing-onlyworkloads,whichwecreatedbymodifyingtheoriginal

    regularshoppingworkloads.

    In a browsing-only scenario, all users are guest shoppers and all transactions are browsing only. We also

    examinedtheSQLsnapshotsof1-userand30-userbrowsing-onlyworkloads.Thequerytemplatesinthebrowsing-

    onlyworkloadswerefixed. Therewerea totalof 47querytemplates,with27ofthemhavingparametermarkers,

    andtheother20not.AllofthemweresimpleOLTPstylequeries,withonly15ofthemhavingjoinsamongtwoto

    fourtablesandtheother32beingsingle-tableselectionqueries.Intotal,thebrowsingonlyworkloadsinvolved51

    tables.

    Thenumberofexecutionsofthesequerytemplatesinbrowsing-onlyworkloadisshowninFigure9.Thetop12

    mostaccessedquerytemplatesallhadparametermarkersinthem.Fourofthemwerejoinsandtheothereightwere

    selection queries.Collectivelythey accessed15 tables(less than1/3 oftheinvolved tables ofthe workload) and

    consistedof88%ofthetotalnumberofSQLexecutions.

    Moreover,inregularshoppingworkloads,weobservedthattherewasalargedegreeofoverlapbetweentables

    withinsertsandupdatesofthe11tableswith

    updates and 14 with inserts (there was no

    deletion in the workload), 9 had both inserts

    andupdates.Weobservedthattherewaslittle

    overlapbetweenread-onlytableswithqueries

    andtableswithupdatesandinserts.Onlytwo

    tables(userregandusers)weresubjectto

    inserts, updates and selects, only one table

    (member) was both queried from and

    inserted into, and one table (keys) was

    queried from and updated to. We also

    examined if any updates/inserts happened on the tables that were involved in the top 12 most frequent query

    templates.Wefoundthattherewasonlyonesuchtable(thetableusersaccessedbythe6th

    mostfrequentquery

    template).

    In summary, we observed that e-Commerce workloads had short query execution time, highly skewed

    popularityoftables,andcleanseparationof read-dominantand write-dominant tables.Thesecharacteristics make

    middle-tierdatabasecachingveryattractive.

    Figure9:NumberofExecutionsofQueryTemplatesinBrowsing

    #executionsofquerytemplates/xact

    0

    20

    40

    60

    querytemplates

    #ex

    ecutions

  • 8/6/2019 Middle Tier

    17/24

    17

    4 ExperimentalResults

    First, we compared performance of regular shopping scenarios with that of browse-only scenarios. Then, we

    measuredtheoverheadofaddingamiddle-tierdatabasecachebyusingadatabasecachewith0%hitrate.Then,we

    cached tables for the top 6 most frequent queries at the middle-tier and measured its performance varying the

    workloadonthebackenddatabaseserver.Wethenexploredupdatepropagationcostinthecachingscheme.Finally,

    weexaminedtheperformanceofclusteredtopologies.

    4.1 ComparingWorkloadCharacteristics

    Wetestedtheregularshoppingscenariointhesingleboxtopology.Wevariedthenumberofconcurrentusersatthe

    simulatorandmeasuredeachuserexecuting100webtransactions.Thebackenddatabasewasrestoredaftereachrun

    sothateachrunstartedwiththesamedatabasecontent.The throughputisreportedintermsof averagenumberof

    webtransactionspersecond,andtheaverageresponsetimeisreportedintermsofthenumberofsecondsperweb

    transaction. We also report the average response times (in seconds) perweb interaction as tBrowse, tBuy,

    tOthers,andtOverAll.tBrowsereferstothetimespentinbrowsing.tBuyincludesthetimespentinadding

    itemstoshoppingcarts,filling outaddressbook,displaying orderinformation, workingout shippingdetails, and

    ordering. tOthersreferstothetimespentinregistereduserloggingonandnew usersregistering.tOverAllis

    theweightedaverageresponsetimeperwebinteractionforalltypesofwebinteractions,withtheweightsbeingthe

    occurrences of each type of interaction. These numbers are shown in Table 1. We also ran the browsing only

    workloadonasingleboxservertopologyvaryingthenumberofconcurrentusersasshowninTable2.

    #users 1 5 10 30

    #xacts/sec 0.5 0.8 0.8 1.0

    secs/xact 1.9 6.2 11.2 30.1

    TBrowse 0.1 0.5 0.8 3.0

    TBuy 1.0 1.9 6.1 5.5

    TOthers 0.2 1.0 3.7 3.0

    TOverAll 0.2 0.7 1.2 3.2

    Notsurprisingly,boththethroughputandresponsetimes(perwebtransactionandperwebinteraction)ofthebrowsing-onlyworkloadimprovedoverthoseoftheregular shoppingworkload. Butbothscenariosfollowedthe

    same pattern: the throughput increased slightly from 1 user to 30 users, while the response time consistently

    increased proportionalto theincreaseof thenumberof users.Sincebrowsingrepresentsthe majority ofthe total

    workload (tOverAll in regular shopping scenario follows closely with the tBrowse value), and browsing-only

    scenarioismuchsimplertotest,mostofthefollowingexperimentsarefocusedonbrowsing-onlyworkloads.

    #users 1 5 10 30

    #xacts/sec 1.3 1.6 1.6 1.6

    secs/xact 0.8 3.1 6.4 19.5

    tOverAll 0.1 0.4 0.8 2.5

    Table1:RegularShoppingScenarioonSingleBox Table2:Browsing-onlyScenarioonSingleBox

  • 8/6/2019 Middle Tier

    18/24

    18

    4.2 ExaminingOverheadofAddingaFrontEndCache

    Adding a middle-tier database cache at the application server creates overhead by consuming resources on the

    application server machine. This isa common concern,especially when themiddletier databasecache isa full-

    strengthDBMSnotalightweightqueryprocessor,soweexaminethisoverhead.

    WeconfiguredWAS(WebSphereApplicationServer)onaservermachinetoletitusealocalDB2server,and

    usedtheDBCacheInittooltocreateadatabaseofallnicknamesinthelocalDB2referencingtheback-enddatabase

    inaremoteDB2onanotherservermachine.ThismakesthelocalDB2actasamiddle-tierDBCachewitha0%

    cachehitrate.Wecomparedtheperformanceofthis dbcache0configurationandtheremoteDBconfigurationto

    examinetheoverhead.

    Wecomparethethroughputandwebtransactionresponsetimeofthesethreeconfigurationsforvaryingnumber

    ofconcurrentusersinFigure10.TheremoteDB configurationisshownasremote,andthedatabasecachewith

    allnicknamesdbcache0.Asexpected,dbcache0wasalwaysworsethantheremoteDBcase,becausethebackend

    serverwasnotoverloaded.Thisisbecauseallthequeriesatthecachearemissesandeveryquerygoesthroughtwo

    databaseservers.Thismadetheperformanceofdbcache0aroundonehalfoftheremoteDBcaseunderalightload

    (lessthan10users).Butwhenthenumberofconcurrentusersincreasedto30,thedifferencebecamemuchless

    significant. This shows that although using a full strength DBMS as a middle-tier database cache adds some

    overhead,thisoverheadisinsignificantwhentheserverisfullyloaded.

    4.3 ExaminingServerWorkloadSharing

    Inrealworldscenarios,e-businessapplicationshavealargenumberofonlineusers,andtheloadcanbevarybya

    factorof100indailyoperations[5].Whenthebackenddatabaseserverismoreheavilyloaded,cachinginthefront

    endsisevenmoreimportanttoimproveusers'responsetime.Therefore,wesetupamiddle-tiercachedatabaseata

    WASmachineasthefrontendandmeasureditsperformancevaryingtheworkloadonthebackendserver.

    Figure10:OverheadofAddingaFrontEndCachewitha0%CacheHitRate

    0

    0.5

    1

    1.5

    2

    1 5 10 30 100#usersThroughput(xacts/sec)

    remote dbcache0

    0

    20

    40

    60

    80

    1 5 10 30 100#users

    Respons

    eTime

    (secs/x

    act)

    remote dbcache0

  • 8/6/2019 Middle Tier

    19/24

    19

    From previousinvestigation onthebrowse-onlyworkloads, weobservedthatthe accessesof different query

    templateswerehighlyskewed.Weselectedthetop6mostusedquerytemplatesandcachedinthelocaldatabasethe

    eighttablesthattheyaccessed:inthelocaldatabase.Queriesontheseeighttablesconsistedof 71%ofthedatabase

    queriesinthebrowse-onlyscenario.Inthelaterexperiments,wealsousedthiscacheconfigurationforallDBCache

    cases.

    ThesetupforvaryingthebackendserverworkloadintheDBCachetopologyisshowninFigure11.Thegoalis

    tovarythebackendserverworkload and examine how cachingcan help.We achievedthisby sending anextra

    browsingonlyworkloadtothebackenddirectly.ThesetupforvaryingthebackendserverworkloadintheRemote

    topologyissimilarexceptthatthereisnomiddle-tierdbcacheatthefrontend.Boththefrontendresponsetimeand

    thebackendresponsetimeweremeasuredatthetestdriver(webclientside).Wecomparetheresponsetimesinthe

    dbcachecasewiththoseintheremotecase.

    InFigure12,weseethatwhentheextraworkloadonthebackenddatabasewas10or50users,addingacache

    at theapplicationserverdid nothelp. Butwhen thenumberofextra users on thebackendreached100,caching

    startedtomakeadifference.Thefrontendresponsetimewasimprovedbecausethecachesheltereditsusersfrom

    theoverloadedbackenddatabaseserver,andthebackendresponsetimewasimprovedbecausethecachesharedits

    serverworkload.Duetoresourceconstraints,wewerenotabletotestmorethan100users.Butwebelievethatthis

    cachingbenefitwillbeevenmoresignificantwhenthebackenddatabaseserverismoreheavilyloaded.

    Figure11:SetupforVaryingServerWorkload

    Figure12:CachingEffectWithVaryingServerWorkload

    UsersConnectingtotheFrontEnd

    0

    2

    4

    6

    8

    10

    10users 50users 100users

    ExtraWorkloadatBackend

    ResponseTime

    (secs/xact)

    w/dbcache w/odbcache

    UsersConnectingtotheBackEnd

    0

    20

    40

    60

    80

    100

    10users 50users 100users

    ExtraWorkloadatBackend

    ResponseTime

    (secs/xact)

    w/dbcache w/odbcache

    Web/App.

    Server

    Application

    Backend

    DBServerBrowserMiddle-tier

    DBCache

    Browser Web/App.

    Server

    Application

    HTTP

    HTTP

    ExtraWorkload

    10-userload

  • 8/6/2019 Middle Tier

    20/24

    20

    4.4 ExaminingUpdatePropagationCost

    Thisexperimentwastoexaminehowmuchperformanceimpacttheasynchronousupdatepropagationprocesshad

    ontheon-linequeryperformance.WesetupDPropRonthebackenddatabaseserverandtheWASserverwitha

    DBCache.The captureprogramwasrunningon thebackenddatabaseserver,andtheapplyprogramonthecache

    database. The cache database still had the eight tables cached and the other tables uncached. Since the cached

    "users" table was updated frequently in a regular-shopping scenario to update the "lastSession" field with the

    timestampofthelastlog-onsessionforeachregistereduser,wesubscribedthe"users"tableforupdatepropagation

    withtheminimumfrequencyof1minute.

    Westillusedthesameextraworkloadsonthebackendasinthepreviousexperiment,andexaminedthe

    updatepropagationcostinthissetting. Besidessendinga 10-userbrowsing-onlyworkloadtothefrontend

    WASserverwithaDBCacheandsendingextrabrowsing-onlyworkloadtothebackenddatabaseserverwith

    aWASclonedirectly,wealsosentanupdateworkloadonthe"lastSession"fieldofthe"users"tabletothe

    backend database server (shown in Figure 13). This "lastSession" field was updated with the current

    timestamptosimulatetheactualupdateinaregular-shoppingscenariowhenaregistereduserloggedon.The

    Figure13:SetupofDPropRwithVaryingServerWorkload

    Figure14:UpdatePropagationCostwithVaryingServerWorkload

    Web/App.Server

    Application

    BackendDBServerBrowserMiddle-tierDBCache

    Browser Web/App.

    Server

    Application

    FE BE

    HTTP

    HTTP

    ExtraWorkload

    10-userLoad

    TestDriver

    Ca

    pture

    Apply

    UpdaterUpdateWorkload

    UsersConnectingtotheFrontEnd

    02468

    10

    10users 50users 100users

    ExtraWorkloadatBackend

    ResponseTime

    (secs/xact)

    w/odpropr w/dpropr

    UsersConnectingtotheBackend

    0

    50

    100

    10users 50users 100users

    ExtraWorkloadatBackend

    ResponseTime

    (secs/xact)

    w/odpropr w/dpropr

  • 8/6/2019 Middle Tier

    21/24

    21

    updateworkloadwasexecutedwithoutanywaitingtimebetweenconsecutiveupdates.Theupdatethroughputwas

    measuredtobe40-60updates/seconddependingontheserverload.Wemeasuredtheperformanceimpactofupdate

    propagationonthebrowsingworkloadsbymeasuringtheresponsetimesatthesimulatedbrowsers.

    Wecomparedthewith-dpropr-runningcasewiththe without-dpropr-runningcasewhenthesameupdaterwas

    updatingthebackenddatabaseserver.Figure14showsthatingeneralasynchronousupdatepropagationdidnotadd

    significantoverheadto theresponse time, althoughthe captureprogramon thebackenddatabase serverincurred

    around20%overheadtothequeryworkloadwhentheextraloadontheserverwas100users.Theoverheadcaused

    by the apply program was low because the apply program batched up updates for each propagation interval (1

    minute),andeachexperimentlasted20-30minutes.Theoverheadcausedbythecaptureprogram wasreadingthe

    logand,whenalogrecordrelatingtoasubscribedtableisfound,performinganSQLinsertintothe"changeddata"

    table(oneofthecontroltablesusedbyDPropR).

    4.5 ClusteringWebApplicationServers

    Finally,wecomparedtheperformanceonclusteredtopologies("2-WAS"and"3-WAS")withthatoncorresponding

    non-clusteredtopologies("1-WAS")toseehowtheywerescalingwiththenumberofWASmachines.Figure15

    shows the throughput and response times of a 30-user browsing-only workload when the backend database is

    heavilyloadedunderaCPUhogprogram.

    WhenthenumberofWASmachinesincreased,bothclusteredtopologiesimprovedtheuserresponsetime,but

    clustereddbcacheimprovedthethroughputmorethanclusteredremoteDBtopology.Thisimpliesthat(1)Forthe

    clustered remote DB topology,simply increasing the number of application servers doesnot scale up the entiresystem under a heavy load, and causes the back-end database server to become the bottleneck. (2) Clustered

    DBCachetopologysharesthebackenddatabaseserverworkload,anditcanscaleupthroughputbetterbyadding

    morecachenodes.WeareinterestedinaddingmoreWASnodestofurtherexaminethescale-upeffectforclustered

    topologies.

    4.6 Discussion

    Figure15:VaryingNumberofWASmachines

    0

    1

    2

    3

    4

    1-WAS 2-WAS 3-WASThroughput(xacts/second)

    remote dbcache

    0

    5

    10

    15

    20

    25

    30

    1-WAS 2-WAS 3-WAS

    Respons

    eTimes

    (secs

    /xact)

    r emote dbc ac he

  • 8/6/2019 Middle Tier

    22/24

    22

    Byrunningthe benchmarkand itsmodificationsin variousconfigurations,weshow that bothapplication servers

    andback-enddatabaseservers canbe bottlenecksunder different workloads. Applicationserversare mostlyCPU

    intensiveundere-Commerceworkloads,buttheycanscaletoalargenumberofusersbyreplicating(togetherwith

    replicatedapplications)tomultiplenodes.Ingeneral,singlenodecommercialdatabaseserversconsumemuchless

    CPUresourcethanapplicationservers,buttheycanalsobecomeabottleneckunderheavyloads.

    Oneapproachtoscalingtheback-enddatabasewhenitisabottleneckistouseamoreexpensiveSMP/MPP

    system while this approach helps increase the scalability of the system, it does not address the performance,

    flexibility and availability concerns listed in Section 1. It is also more expensive compared to the DBCache

    approachwherecheaperandlessreliablemachinescanbeusedtoruntheapplicationserverswithDBCache.

    Duetoresourceconstraints,wewerenotabletotestmorethan100simulatedusers,morethan3WCSnodes,or

    separatingtheapplicationserversfromtheDBserverbyawideareanetwork.However,fromthetrendsshownin

    the experiments, we believe that middle-tier database caching on the application servers can improve server

    scalability. If these datacachesaredeployedwith edgeservers,they canalso bring contentcloser to users and

    improveperformanceintermsofresponsetime.Performancecanalsobeenhancedwhentheapplicationserverand

    thedatabasearegeographicallyseparatedbya wide-areanetwork,asiscommonformanycustomers.Finally,by

    continuingtoprovidelimitedservicebasedoncacheddata,thisapproachalsoincreasesavailabilityofthewebsite.

    Manyissuesrelatingtodatabasecachinginapplicationandedgeserversarediscussedin[18].

    5 RelatedWork

    ProductsmostrelevanttooursaretheDatabaseCacheofOracles9iIAS(InternetApplicationServer)[19]and

    TimesTen's Front-Tier [22]. Oracle's Database Cachecaches full tables using a full-fledgedOracle DBMS, and

    reliesonreplicationtoolstoasynchronouslypropagateupdatesfromthebackenddatabasetothecache.TimesTen's

    Front-Tierisacachingproductbasedontheirin-memorydatabasetechnology.OneadvancedfeatureofFront-Tier

    isthatuserscancreatecacheviewsattheFront-Tier,whichcanbea subsetoftablesandjoinviews.UnlikeOracle

    and our DBCache, updates are performed at the Front-Tier cache, and propagated to the backend database at

    transactioncommittime(orthepropagationtothebackendcanalsobedoneasynchronously).

    A major difference between our work and these existing products is that our cache has distributed query

    processing capability. Thisis because weleverageDB2's federatedfeatures so that query plans at thecache can

    involvebothsitesinacost-basedmanner[20].Incontrast,OraclesqueryroutinghappensattheOCIlayerbeforea

    statement reaches the cache database. Consequently, the statement is either entirely executed at the backend

    databaseorentirelyatthecachedatabase.Similarly,applicationsusingTimesTen'sFront-Tiermustbeawareofthe

    cachecontentandissuequeriesoncachedcontentandonthebackenddatabaseseparately.

    Caching for data-intensive web sites have been recently studied in [2], [3], [7], [16], [17], and [24]. They

    focused on cachingdynamically generatedweb pages, HTMLfragments, XMLfragments, or query resultsfrom

    outside of a DBMS(except [24]investigated usingthe backend database to cache intermediate query results as

  • 8/6/2019 Middle Tier

    23/24

    23

    materializedviews). Ourfocusis toengineera full-strengthDBMSintoa middle-tierdatabasecachefrominside

    out,andimproveavailabilityandperformanceforapplicationswithoutmakinganychangestothem.

    Finally,previousworkonmaterializedviews[9]andcachingforheterogeneoussystems([1],[4]),clientserver

    databasesystems([6],[14]),andOLAPsystems[8]arealsorelevanttoourwork.Mostofthetechniquesproposed

    inthesepapersaresuitableforspecifictypesofapplications,forexample,keywordbasedsearch,mobilenavigation,

    orcomputationintensiveOLAPqueries.Comparetotheseapplications,e-Commerceapplicationsareusuallysimple

    OLTP-stylequeriesbutrequiresreliability,scalability,and maintainability. Consequently,we choosesimpletable

    levelcachingusinganindustrialstrengthDBMS.

    6 ConclusionsandFutureWork

    Wehaveexaminedtheopportunitiesine-Commerceapplicationsformiddle-tierdatabasecachingbyrunningane-

    Commercebenchmark on typical website architectures.We observedthat e-Commerceapplications generated a

    largenumberofsimpleOLTP-stylequeries,theirtableaccessesarehighlyskewedona fewread-dominanttables,

    andthere wasa clear separation between write-dominant tables andread-dominant tables. We demonstratedthat

    web application clones could scale up to heavy loads and the backend database server eventually becomes the

    performancebottleneckinthesystem.

    We have presented our prototype implementation of a middle-tier database cache. By extending DB2s

    federatedfeaturesweturnedaDB2instanceintoaDBCachewithoutchanginguserapplications.Thenoveltyofthis

    extensionisthatqueryplansatthecachemayinvolveboththecacheandtheremoteserverbasedoncostestimation.

    Throughexperiments,weshowedthattheoverheadofaddingafull-strengthDBMSasamiddle-tierdatabasecache

    wasinsignificantfore-Commerceworkloads.Consequently,middle-tierdatabasecachingimprovedusersresponse

    timesignificantlywhenthebackenddatabaseserverwasheavilyloaded.

    FutureworkincludesextendingtheDBCacheprototypetohandlespecialSQLdatatypes,statements,anduser

    defined functions. We are also investigating alternatives for handling updates. Usability enhancements, such as

    cache performance monitoring, and dynamically identification of candidate tables for caching are important

    directionsforustopursue.

    Acknowledgements

    WewouldliketothankMehmetAltinelforhishelpfulsuggestionsonthepaper,LarryBrownforsettingupandtestingtheprototypeonNTplatforms,andKiranMehtaandDanWolfsonfordiscussionsaboutourarchitecture.

    We would also like to thank Satya Dash, Joseph Fung, Jim Kleewein, Peter Schwarz, and David Tolleson for

    answeringourquestionsontheECDWbenchmark,DB2federatedfeatures,andDPropR.

    References

  • 8/6/2019 Middle Tier

    24/24

    [1] Sibel Adali, K. Seluk Candan, Yannis Papakonstantinou, and V. S. Subrahmanian. Query Caching andOptimizationinDistributedMediatorSystems.SIGMODConference1996:137-148.

    [2] K.SelukCandan,Wen-SyanLi,QiongLuo,Wang-PinHsiung,andDivyakantAgrawal.EnablingDynamicContentCachingforDatabase-DrivenWebSites.SIGMODConference2001.

    [3] JimChallenger,ArunIyengar,andPaulDantzig.AScalableSystemforConsistentlyCachingDynamicWebData.IEEEINFOCOM99.

    [4] Boris Chidlovskii, Claudia Roncancio, and Marie-Luise Schneider. Cache Mechanism for HeterogeneousWebQuerying.Proc.8thWorldWideWebConference(WWW8),1999.

    [5] Mike Conner, George Copeland and Greg Flurry. Scaling Up e-Business Applications with Caching.DeveloperToolbox Technical Magazine, August 2000.

    http://service2.boulder.ibm.com/devtools/news0800/art7.htm

    [6] ShaulDar,MichaelJ.Franklin,BjrnrJnsson,andDiveshSrivastava,MichaelTan.DataCachingandReplacement.VLDB1996.

    [7] AnindayaDatta,KaushikDutta,HelenM.Thomas,DebraE.VanderMeer,KrithiRamamritham,andDanFishman. A Comparative Study of Alternative Middle Tier Caching Solutions to Support Dynamic Web

    ContentAcceleration.VLDB2001.

    [8] Prasad Deshpande, Karthikeyan Ramasamy, Amit Shukla, and Jeffrey F. Naughton. CachingMultidimensionalQueriesUsingChunks.SIGMODConference1998:259-270.

    [9] AshishGuptaandInderpalSinghMumick(Editors).MaterializedViews:Techniques,Implementations,and

    Applications.TheMITPress,1999.[10] IBM.DB2DataJoiner.http://www-4.ibm.com/software/data/datajoiner/[11] IBM.DB2DataPropagator.http://www-4.ibm.com/software/data/DPropR/[12] IBM. DB2 System Monitor Guide and Reference. http://www-4.ibm.com/cgi-

    bin/db2www/data/db2/udb/winos2unix/support/document.d2w/report?fn=db2v7f0frm3toc.htm

    [13] IBM.WebSphereCommerceSuite.http://www-4.ibm.com/software/webservers/commerce/wcs51.html[14] Arthur M. Keller and Julie Basu. A Predicate-based Caching Scheme for Client-Server Database

    Architectures.VLDBJournal5(1):35-47(1996).

    [15] Donald Kossmann, Michael J. Franklin, and Gerhard Drasch. Cache Investment: Integrating QueryOptimizationandDistributedDataPlacement.ACMTODS,December2000

    [16] AlexandrosLabrinidisandNickRoussopoulos.WebViewMaterialization.SIGMODConference2000.[17] Qiong Luo and Jeffrey F. Naughton. Form-Based Proxy Caching forDatabase-Backed Web Sites.VLDB

    2001.

    [18] C. Mohan. Caching Technologies for Web Applications. VLDB 2001.http://www.almaden.ibm.com/u/mohan/Caching_VLDB2001.pdf

    [19] Oracle Corporation. Oracle Internet Application Server Documentation Library.http://technet.oracle.com/docs/products/ias/doc_index.htm

    [20] Mary TorkRoth, Fatma Ozcan, Laura M. Haas: CostModelsDO Matter:Providing CostInformationforDiverseDataSourcesinaFederatedSystem.VLDB1999

    [21] SegueSoftware,Inc.SilkPerformer.http://www.segue.com/html/s_solutions/s_performer/s_performer.htm[22] TimesTen.TimesTenFront-Tier.http://www.timesten.com/products/fronttier/index.html[23] TransactionProcessingPerformanceCouncil.TPC-WBenchmark.http://www.tpc.org/tpcw/default.asp[24] KhaledYagoub,DanielaFlorescu,ValrieIssarny,and PatrickValduriez.Buildingand CustomizingData-

    IntensiveWebSitesUsingWeave.VLDB2000.