
The 7th International Conference for Internet Technology and Secured Transactions (ICITST-2012)

The Survey of Cache Management in the Shared Storage Environment

Junfeng Xu, Bo Sun

National Computer Network Emergency Response Technical Team Coordination Center of China

CNCERT/CC, Beijing, China

Abstract-With the development of cloud computing, storage services need high performance, availability and scalability. In a shared storage environment based on an open storage architecture, the cache located on access paths accelerates performance but, because cache management is poorly adapted to this environment, limits the availability and scalability of the cloud storage system. This paper studies previous cache management technology for cluster and distributed systems, with emphasis on availability and consistency, the two factors that bring down the availability and scalability of cloud storage systems. Moreover, this paper analyzes the adaptability of previous cache management technology to the shared storage environment.

Keywords-shared storage; cache management; cloud computing

I. INTRODUCTION

With the rapid growth of information and data scale, the computing ability of IT systems is facing challenges. The emergence of cloud computing [18][19][20][30] may solve the problem. Cloud computing is a technology that uses the Internet and central remote servers to maintain data and applications. It provides users with hardware, platform, software and storage services. Data analysis includes both data computing and data storage, so the storage system is very important. With the explosive growth of data scale, cloud computing places high demands on the capacity, performance, reliability and scalability of the storage system. Cloud storage systems [21][23][24] based on an open storage architecture have attracted wide attention. They usually employ general-purpose hardware and open source software to construct the storage system.

Cache is an effective method to improve the I/O performance of a storage system. It exploits data access locality to improve performance and reduce the access latency of applications. In a cloud storage environment, a cache located on the access paths not only improves storage performance, but also has an important effect on the availability and scalability of the cloud storage system. To be specific:

(1) Effect on system availability. In order to improve performance, cached data is usually written to the back-end storage asynchronously or periodically. Because of this asynchronous update between the cache and the back-end storage, if a hardware or software error causes a system crash, the cached data will be lost. This may make data inaccessible and interrupt application services; therefore, system availability declines.

(2) Effect on system scalability. In a cloud storage environment, the back-end storage is usually accessed concurrently through several I/O paths in order to improve processing capacity. As a result, there are several copies of the same data on different paths; however, every copy is managed independently, which results in cache inconsistency across multiple paths. A copy modified on one path is known to the local cache but is invisible to the copies on other paths.

97B190B3200B/7/$2.002012 IEEE

Chengxiang Si

National Computer Network Emergency Response Technical Team Coordination Center of China

CNCERT/CC, Beijing, China

In summary, with the development of cloud computing, storage services need high performance, availability and scalability. In a cloud storage environment based on an open storage architecture, the cache located on access paths accelerates performance but, because cache management is poorly adapted to this environment, limits the availability and scalability of the cloud storage system. This paper studies previous cache management technology for cluster and distributed systems, with emphasis on availability and consistency, the two factors that bring down the availability and scalability of cloud storage systems. Moreover, this paper analyzes the adaptability of previous cache management technology to the cloud storage environment.

II. PREVIOUS RESEARCH ON CACHE MANAGEMENT OF CLUSTER

In this section, we study previous cache management technology for cluster and distributed systems, paying attention to availability and consistency. We divide cache management into the following types according to usage mode.

A. Cache management of cluster file system

According to the cooperative relation between cache nodes, we divide cache management into two types: cooperative cache and non-cooperative cache.

(1) Non-cooperative cache

NFS [5] is widely used in industry and has been ported to all major operating systems. The attribute information of files and directories is kept in the client cache for a certain time. Note that the client never keeps file data in its cache. NFS adopts a timeout mechanism to decide whether a client cache entry is still valid. Specifically, once data has stayed in the cache longer than a certain time, NFS refreshes the cached data on the next access. The cache reliability of NFS is weak, and the timeout mechanism fundamentally cannot guarantee cache consistency across multiple clients.
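The timeout mechanism described above can be sketched as follows. This is a minimal illustration, not NFS client code: the class name `AttributeCache`, the `fetch_attrs` callable and the 3-second window are assumptions made for the example.

```python
import time


class AttributeCache:
    """Client-side cache of file attributes with a fixed validity window."""

    def __init__(self, fetch_attrs, timeout_s=3.0, clock=time.monotonic):
        self._fetch = fetch_attrs   # hypothetical callable: path -> attributes from the server
        self._timeout = timeout_s   # assumed validity window, not an NFS default
        self._clock = clock
        self._entries = {}          # path -> (attributes, time cached)

    def getattr(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[1] < self._timeout:
            return entry[0]         # still inside the window: trust the cached copy
        attrs = self._fetch(path)   # expired or missing: refresh on this access
        self._entries[path] = (attrs, now)
        return attrs
```

Within the window a client keeps returning its cached attributes even if another client has changed the file on the server, which is why the scheme gives only weak consistency.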

Reference [3] adopted NVRAM to improve the write performance and reliability of the distributed file system. It added NVRAM to both the client and server sides as a temporary cache for write data. Because of cache consistency constraints, this approach is only appropriate for distributed file systems that buffer write data in clients, such as Sprite and AFS. NVRAM can effectively guarantee the performance and reliability of the cache, but it needs special hardware, which results in weak generality. Moreover, NVRAM has long been very expensive, with low cost performance. For cache consistency, it followed the cache consistency protocol of its host file system. For example, Sprite adopted the "concurrent write sharing" protocol: when multiple clients open a file concurrently and one of them modifies the file, the server notifies every client to invalidate its cache.

Clients of the Coda file system [4][11] adopt local disks as data cache, which guarantees persistent data access when the shared storage fails. When clients fail, data that has already been written to the cache disks remains reliable because disk media are non-volatile. However, data that has not yet been flushed to the cache disks is not protected. For cache consistency, there are two strategies: the positive and the passive. The cache consistency of the former is relatively strong: it prevents concurrent partition writes and limits concurrent reads and writes to a single partition in order to resolve access conflicts. The latter employs the usual lease method to keep the cache valid, assigning an upper time limit to each lease. When a lease exceeds the limit, clients can no longer access the corresponding objects.
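A lease with an upper limit, as in the passive strategy, might look like the sketch below. It is an illustration under assumed names (`LeasedCache`, `LeaseExpired`, `max_lease_s`) and is not taken from the Coda implementation.

```python
class LeaseExpired(Exception):
    """Raised when a cached object is accessed after its lease has run out."""


class LeasedCache:
    def __init__(self, max_lease_s):
        self._max_lease_s = max_lease_s   # assumed upper limit applied to every lease
        self._entries = {}                # key -> (value, lease expiry time)

    def put(self, key, value, now, lease_s):
        granted = min(lease_s, self._max_lease_s)   # a lease never exceeds the limit
        self._entries[key] = (value, now + granted)

    def get(self, key, now):
        value, expires_at = self._entries[key]
        if now >= expires_at:
            del self._entries[key]
            raise LeaseExpired(key)       # the client must not use the object any more
        return value
```

A client that asks for a 60-second lease under a 10-second limit can read the object at second 9 but gets `LeaseExpired` from second 10 onward.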

(2) Cooperative cache

Reference [6] indicated that, with the development of networks, using the cache of other clients is feasible for a distributed file system. In that work, remote client memory is regarded as one level of the storage hierarchy: client memory, remote client memory, server memory and server storage together form the entire storage architecture. A dedicated manager is responsible for global cache management and consistency. Because write operations in the cooperative cache are unreliable, it adopted a synchronous-update strategy and kept only read data in the cache. For cache consistency, it employed a read-write token mechanism, with the central manager responsible for token allocation and reclamation. When a cache block is modified, it adopted the "write-invalidate" strategy to notify the corresponding clients to invalidate the cache block.
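The combination of a central token manager, read-only client caches, synchronous update and write-invalidate can be sketched in a few lines. The classes `TokenManager` and `Client` and the dict standing in for server storage are hypothetical; the design in [6] is considerably richer.

```python
class TokenManager:
    """Central manager that tracks which clients hold a read token per block."""

    def __init__(self):
        self._readers = {}   # block id -> set of clients caching the block

    def grant_read(self, client, block):
        self._readers.setdefault(block, set()).add(client)

    def revoke_readers(self, block, writer):
        for client in self._readers.pop(block, set()):
            if client is not writer:
                client.cache.pop(block, None)   # write-invalidate: drop the stale copy


class Client:
    def __init__(self, name, manager):
        self.name = name
        self.manager = manager
        self.cache = {}      # block id -> data, read-only copies

    def read(self, block, storage):
        if block not in self.cache:
            self.manager.grant_read(self, block)
            self.cache[block] = storage[block]
        return self.cache[block]

    def write(self, block, data, storage):
        self.manager.revoke_readers(block, writer=self)
        storage[block] = data          # synchronous update straight to storage
        self.cache.pop(block, None)    # only read data is kept in the cache
```

Every write passes through the single manager, which keeps the example simple but also shows where a central bottleneck would appear as the number of clients grows.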

zFS [12][13] was developed by the IBM Haifa lab. The entire file system consists of clients, file managers, cooperative cache managers, lease managers, transaction managers and object-based storage devices. It employed the memory of all nodes as the cache of the entire file system to improve performance. Similar to reference [6], the cooperative cache in zFS can only buffer read data, and cache consistency is guaranteed by the lease and distributed transaction mechanisms.

B. Cache management of network applications

This section introduces the cache management strategies of several popular network applications, such as the web and streaming media. The Internet is a widely distributed system, and cache technology has been used extensively in network services. Many kinds of cache are distributed along the access paths from user terminals to network data sources, such as browsers, gateway servers, the proxy servers of ISPs, switches and routers. According to the location of the cache server, there are the following two cases.

(1) Cache cluster for the back-end database

This type is usually deployed at the gateway or inside the local network. Memcached [7] is a general-purpose distributed cache system, which is used to reduce the load on the back-end database. It employs the memory of the storage cluster to provide cache service for web application nodes. Memcached makes full use of all the memory of the storage cluster to construct cache objects, which are shared and managed centrally. The front-end web nodes manage all the memory of the cluster nodes through a global hash table, which avoids duplication of cache objects in the cluster and improves cache utilization of the entire storage cluster system. The goal of memcached is to improve cache utilization of the entire system, without consideration for cache reliability. In addition, every cache object is unique in the system, so cache consistency does not need to be considered. Reference [10] added a data persistence mechanism to memcached to guarantee cache reliability.
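The idea that a hash on the key decides the single node holding each object can be illustrated as below. The sketch does not use the memcached client API; `pick_node`, `ClusterCache` and the CRC32-modulo placement are assumptions chosen for brevity.

```python
import zlib


def pick_node(key, nodes):
    """Map a key to exactly one node (assumed placement rule: CRC32 modulo node count)."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


class ClusterCache:
    def __init__(self, node_names):
        self._names = list(node_names)
        self._stores = {name: {} for name in self._names}   # one in-memory store per node

    def set(self, key, value):
        self._stores[pick_node(key, self._names)][key] = value

    def get(self, key):
        return self._stores[pick_node(key, self._names)].get(key)

    def copies(self, key):
        # Number of nodes holding the key; at most one by construction.
        return sum(key in store for store in self._stores.values())
```

Because every front-end node applies the same placement rule, an object lives on one node only, so no copy-to-copy consistency protocol is needed; the price is that the object is lost if that node fails.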

Figure 1. proxy cache cluster

(2) Proxy cache cluster

Web cache is widely used and is distributed across the entire Internet. Its implementation is usually simple, and its distribution along access paths resembles a tree. As shown in Fig. 1, it employs the ICP [14] protocol to carry out queries across cache levels. The basic principle is as follows: if a query hits in the cache at some level, the result is returned immediately. Otherwise, the query is forwarded to cache nodes at the upper level, and finally to the content server. The cooperative work of the tree-level proxy cache servers reduces the amount of communication with the external network and meets users' requirements. Since there are multiple paths to reach the web server, the caching proxy service remains available when a cache node fails.

When data on the web server is updated while the corresponding data in the cache is still old, cache inconsistency appears. How to keep caches consistent has been widely studied. Previous consistency algorithms are divided into two categories: weak consistency mechanisms and strong consistency mechanisms. Weak consistency technologies include TTL (Time-To-Live), adaptive TTL and user query. Similar to the timeout mechanism of NFS, TTL is an expiration time assigned to a cache copy. If the current time has not passed the expiration time, the cached data is regarded as fresh; otherwise, the cache copy is regarded as stale. In a strong consistency mechanism, the server records which clients have accessed the data. When the data is modified on the server, it notifies the clients that have cached the data to invalidate their copies. Strong consistency mechanisms are relatively complicated and increase access latency.
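The tree-shaped lookup with TTL-based freshness can be sketched as follows. This is not the ICP message exchange; `ProxyNode`, the `origin` callable and the TTL values are assumptions made for the illustration.

```python
class ProxyNode:
    """One level of a cache hierarchy; misses are forwarded to the parent."""

    def __init__(self, parent=None, origin=None, ttl_s=60.0):
        self.parent = parent    # next cache level up, or None at the top
        self.origin = origin    # hypothetical callable url -> content, used only at the top
        self.ttl_s = ttl_s      # assumed time-to-live for every cached copy
        self.store = {}         # url -> (content, time fetched)

    def get(self, url, now):
        entry = self.store.get(url)
        if entry is not None and now - entry[1] < self.ttl_s:
            return entry[0]                       # fresh hit at this level
        if self.parent is not None:
            content = self.parent.get(url, now)   # miss or stale: forward upward
        else:
            content = self.origin(url)            # top level: ask the content server
        self.store[url] = (content, now)
        return content
```

A copy younger than its TTL is served without contacting the upper levels, so an update on the content server stays invisible until the copy expires, which is the weak-consistency behaviour described above.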

C. Cache management of network storage system

(1) Clients of network storage system

STICS [8] is a cache system for network storage devices, designed to perform efficient conversion from the SCSI protocol to the IP protocol. It employs a two-level cache to carry out the protocol conversion and effectively narrows the performance gap between the two. STICS includes two parts: a read cache and a write cache. The former consists of a large DRAM, and the latter is a two-level cache made up of a large log disk and a small NVRAM. Modified data is written into the small NVRAM. When the amount of data in the NVRAM reaches a certain value, the data is flushed to the large log disk. STICS provides a lock manager for every shared device and maintains a lock status table that records the lock status of each shared data segment. It provides a shared read lock and an exclusive write lock. Cache consistency in STICS is therefore guaranteed through locking and the write-invalidate mechanism.
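The two-level write path (small NVRAM first, large log disk once a threshold is reached) can be modelled roughly as below. The class `TwoLevelWriteCache`, the block-count threshold and the in-memory stand-ins for NVRAM and the log disk are assumptions, not the STICS implementation.

```python
class TwoLevelWriteCache:
    def __init__(self, nvram_limit):
        self.nvram_limit = nvram_limit   # assumed threshold, counted in blocks
        self.nvram = {}                  # small, fast level: block -> data
        self.log_disk = []               # large level: append-only list of (block, data)

    def write(self, block, data):
        self.nvram[block] = data
        if len(self.nvram) >= self.nvram_limit:
            self.flush()

    def flush(self):
        self.log_disk.extend(self.nvram.items())   # sequential append to the log
        self.nvram.clear()

    def read(self, block):
        if block in self.nvram:
            return self.nvram[block]               # newest data lives in NVRAM
        for logged_block, data in reversed(self.log_disk):
            if logged_block == block:
                return data                        # latest logged version wins
        return None
```

Appending to the log turns small random writes into sequential ones, which is one plausible reason for pairing a small NVRAM with a large log disk.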

(2) Access path of network storage system

A cache located on the I/O path has a significant impact on the front-end applications and the back-end storage.

SANBoost [1][2] implemented a cache system based on solid state disks for a virtual server. Its goal is to resolve performance interference between logical storage volumes and to guarantee quality of service. To keep its cache exclusive of the underlying disk array cache, it buffers only hot data whose access frequency has reached a certain value. Viewed as a whole, SANBoost and the disk array cache in fact form a two-level cooperative cache. It employs a distributed lock to synchronize shared access between clients and SANBoost. In terms of cache reliability, solid state disks are persistent, which ensures that write cache data is not lost in the event of a system failure.
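Admitting only data whose access frequency has crossed a threshold can be sketched as follows. `HotDataCache`, the dict standing in for the disk array and the threshold of three accesses are illustrative assumptions, not SANBoost parameters.

```python
from collections import Counter


class HotDataCache:
    def __init__(self, backend, hot_threshold=3):
        self.backend = backend               # hypothetical mapping standing in for the disk array
        self.hot_threshold = hot_threshold   # assumed access count that marks a block as hot
        self.accesses = Counter()            # block -> number of reads served by the backend
        self.cache = {}                      # admitted hot blocks only

    def read(self, block):
        if block in self.cache:
            return self.cache[block]
        self.accesses[block] += 1
        data = self.backend[block]
        if self.accesses[block] >= self.hot_threshold:
            self.cache[block] = data         # admit only once the block has proven hot
        return data
```

Cold blocks are served by the lower level and never occupy the upper cache, so the two levels hold largely different data.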


m-cache [9], developed by IBM, employs the local disk as a device cache for a network storage system. Its main feature is that the cache system is customized and managed according to a session-semantics strategy. Every session is associated with the deployment and execution of an application. The corresponding cache is established at the beginning of the session and deleted after the session ends. From the viewpoint of high availability, if each server has its own independent cache disk, recovering a data server from failure becomes more complex. m-cache did not consider cache consistency, and left the problem to the upper-level application.

III. CONCLUSIONS

In this section, we summarize the related research and analyze the applicability of previous technology to the cloud storage environment.

A. Perspective of availability

As stated above, there are two ways to guarantee cache reliability. (1) Synchronized update. The data is synchronized to the back-end storage to ensure data consistency, which supports failover well. (2) The media property. NVRAM or other persistent media can effectively protect the written data, but cannot guarantee continuity of the cache service. If data in the cache has not been flushed to the back-end storage before a cache node fails, the latest data becomes inaccessible, which may interrupt business, for example when critical data cannot be reached. Therefore, the availability of the entire system declines.
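The difference between synchronized and asynchronous update when a cache node fails can be made concrete with a small model. `CacheNode` and its `crash` method are hypothetical; the dict plays the role of the back-end storage.

```python
class CacheNode:
    def __init__(self, backend, synchronous):
        self.backend = backend          # hypothetical dict standing in for back-end storage
        self.synchronous = synchronous  # True: synchronized update; False: asynchronous
        self.dirty = {}                 # writes not yet flushed to the back end

    def write(self, key, value):
        if self.synchronous:
            self.backend[key] = value   # on the back end before the write returns
        else:
            self.dirty[key] = value     # fast, but lives only in the cache for now

    def flush(self):
        self.backend.update(self.dirty)
        self.dirty.clear()

    def crash(self):
        lost = sorted(self.dirty)       # keys whose latest value never reached the back end
        self.dirty.clear()
        return lost
```

With synchronized update a crash loses nothing and another node can take over from the back end; with asynchronous update every key still in `dirty` is unavailable until the failed node is recovered, or lost if its medium is volatile.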

Availability of a storage system demands that, when part of the hardware or software fails, the system can still ensure data availability and integrity and keep applications running. Therefore, redundancy is employed to eliminate single points of failure. When a fault happens, failure detection and failover mechanisms enable work to continue.

B. Perspective of consistency

The cause of cache inconsistency is that there are multiple copies of a data block. Cache consistency technology guarantees a consistent view of the same data. There are two ways to maintain cache consistency: write-invalidate and write-update. In the former, if one copy has been modified, the other copies are invalidated; when another node needs the data, it must reacquire the latest version from the server. In the latter, a modification to one copy is propagated to the others. Previous studies mainly employ timeout-update or write-invalidate mechanisms to maintain different levels of cache coherency. The timeout-update mechanism guarantees weak consistency, which is suitable for applications with low consistency demands. The write-invalidate mechanism employs distributed locks to achieve different levels of cache consistency, but the semantics of distributed locks are very complex. The distributed lock mechanism usually needs a lock manager, which may be distributed or centralized. A centralized lock manager usually becomes the bottleneck of system performance, which reduces system scalability. The management complexity of a distributed lock manager is usually high, so scalability is limited as the system scale increases.
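The two propagation policies can be contrasted in a short sketch. Plain dicts stand in for the server and the cached copies, and the function names `propagate_write` and `read` are assumptions; locking is deliberately left out.

```python
def propagate_write(server, replicas, writer, key, value, policy):
    """Apply a write from `writer` and bring the other cached copies in line."""
    server[key] = value
    writer[key] = value
    for replica in replicas:
        if replica is writer:
            continue
        if policy == "write-invalidate":
            replica.pop(key, None)      # stale copy dropped; the next read refetches
        elif policy == "write-update":
            if key in replica:
                replica[key] = value    # copy refreshed in place
        else:
            raise ValueError("unknown policy: %r" % policy)


def read(server, replica, key):
    if key not in replica:
        replica[key] = server[key]      # miss: reacquire the latest version
    return replica[key]
```

Write-invalidate sends only a small notice per copy but makes the next reader pay for a refetch, while write-update ships the new value to every holder whether or not it will be read again.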

Aimed at the inapplicability of cache management in a shared storage environment based on an open storage architecture, this paper studied previous cache management technology. We analyzed the design features, management technology, cache protection and cache consistency of several typical cache systems, and summarized the related research from the perspectives of cache availability and consistency. From this summary, we conclude that previous technologies have the following shortcomings: cache protection with low performance, weak support for system availability, and cache consistency mechanisms that limit system scalability. In future work, we will study efficient cache protection and consistency maintenance techniques for the cloud storage environment. On the basis of guaranteeing performance, we will try to enhance the availability and scalability of the entire system.

IV. REFERENCES

[1] Ismail Ari, Melanie Gottwals, Dick Henze. SANBoost: automated SAN-level caching in storage area network. In Proceedings of the 1st IEEE International Conference on Autonomic Computing, pp. 164-171, New York, NY, USA, May 2004.

[2] Ismail Ari, Melanie Gottwals, Dick Henze. Performance Boosting and Workload Isolation in Storage Area Networks with SANCache. In Proceedings of the 23rd IEEE / 14th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST), May 2006.

[3] Baker M., Asami S., Deprit E., Ousterhout J., Seltzer M. Non-volatile Memory for Fast, Reliable File Systems. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 10-22, Boston, MA, USA, October 1992.

[4] Peter J. Braam. The Coda Distributed File System. Linux Journal, June 1998.

[5] B. Callaghan, B. Pawlowski, P. Staubach, Sun Microsystems, Inc. NFS Version 3 Protocol Specification, June 1995.

[6] Michael D. Dahlin, Randolph Y. Wang, Thomas E. Anderson, David Patterson. Cooperative Caching: Using Remote Client Memory to Improve File System Performance. In Proceedings of the 1st USENIX Conference on Operating Systems Design and Implementation, Monterey, CA, USA, November 1994.

[7] Brad Fitzpatrick. Distributed caching with memcached. Linux Journal, August 2004.

[8] X. He, M. Zhang, and Q. Yang. STICS: SCSI-to-IP cache for storage area networks. Journal of Parallel and Distributed Computing, pp. 1069-1085, Vol. 64, No. 9, September 2004.

[9] E. V. Hensbergen and M. Zhao. Dynamic policy disk caching for storage networking. IBM Technical Report, RC24123, IBM Research Division, Austin Research Laboratory, 2006.

[10] Memcachedb, http://memcachedb.org (Access date: 18 August 2012)

[11] Mahadev Satyanarayanan, James J. Kistler, Puneet Kumar, Maria E. Okasaki, Ellen H. Siegel, and David C. Steere. Coda: A Highly Available File System for a Distributed Workstation Environment. IEEE Transactions on Computers, pp. 447-459, Vol. 39, No. 4, April 1990.


[12] Ohad Rodeh, Uri Schonfeld, Avi Teperman. zFS: A Scalable Distributed File System Using OSD. In Proceedings of the 20th IEEE / 11th NASA Goddard Conference on Mass Storage Systems and Technologies, San Diego, CA, April 2003.

[13] A. Teperman, A. Weit. Improving Performance of a Distributed File System Using OSDs and Cooperative Cache. In Proceedings of the 18th International Conference on Parallel and Distributed Computing Systems, Las Vegas, Nevada, USA, September 2005.

[14] Duane Wessels, Kimberly Claffy. Application of Internet Cache Protocol (version 2), RFC 2187, September 1997.

[15] Chen G., Wang C. L., Lau F. C. M. A Scalable Cluster-based Web Server with Cooperative Caching Support. Concurrency and Computation: Practice and Experience, Vol. 15, No. 8, June-July 2003.

[16] Brian C. Forney, Andrea Carol Arpaci-Dusseau, Remzi Hussein Arpaci-Dusseau. Storage-Aware Caching: Revisiting Caching for Heterogeneous Storage Systems. In Proceedings of the 1st USENIX Conference on File and Storage Technologies, pp. 28-30, Monterey, CA, USA, January 2002.

[17] Jehan-Francois Paris, Theodore Haining, Darrell D. E. Long. A Stack Model based Replacement Policy for a Non-volatile Cache. In Proceedings of the IEEE Symposium on Mass Storage Systems, pp. 217-224, College Park, MD, USA, March 2000.

[18] A history of cloud computing, http://www.computerweekly.com/feature/A-history-of-cloud-computing (Access date: 15 August 2012)

[19] Google Trends, http://www.google.com/trends (Access date: 10 August 2012)

[20] The NIST Definition of Cloud Computing, http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf (Access date: 5 August 2012)

[21] Cloud storage, http://en.wikipedia.org/wiki/Cloud_storage (Access date: 10 August 2012)

[22] James A. Gardner, Darrell G. Freeman. Asymmetric data mirroring. United States Patent US7941694B, May 2011.

[23] Cloud Storage for Cloud Computing, http://ogf.org/Resources/documents/CloudStorageForCloudComputing.pdf

[24] Cloud Data Management Interface, http://snia.org/sites/default/files/CDMI_SNIA_Architecture_v...pdf

[25] Chuanpeng Li and Kai Shen. Managing Prefetch Memory for Data-Intensive Online Servers. In Proceedings of the 4th USENIX Conference on File and Storage Technologies, San Francisco, CA, USA, 2005.

[26] Reza Azimi, Livio Soares, Michael Stumm, Thomas Walsh, Angela Demke Brown. PATH: Page Access Tracking to Improve Memory Management. In Proceedings of the International Symposium on Memory Management, pp. 31-42, Montreal, Quebec, Canada, October 2007.

[27] Liton Chakraborty and Ajit Singh. A Utility-based Approach to Cost-Aware Caching in Heterogeneous Storage Systems. In Proceedings of the IEEE International Parallel & Distributed Processing Symposium, pp. 1-10, Long Beach, CA, USA, March 2007.

[28] Laura M. Grupp, John D. Davis, Steven Swanson. The Bleak Future of NAND Flash Memory. In Proc. of the 10th USENIX Conference on File and Storage Technologies (FAST '12), San Jose, CA: USENIX Association, 2012.

[29] Doug Beaver, Sanjeev Kumar, Harry C. Li, et al. Finding a needle in Haystack: Facebook's photo storage. In Proc. of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI '10), Vancouver, BC: USENIX Association, 2010.

[30] Gartner. Personal Cloud Services Will Be Integrated in Most Connected Devices, http://www.gartner.com/it/page.jsp?id=1942015
