
Cloud File System Security and Dependability with SafeCloud-FS

Miguel P. Correia, KTH – Stockholm – June 2017

Joint work with Alysson Bessani, B. Quaresma, F. André, P. Sousa, R. Mendes, T. Oliveira, N. Neves, M. Pasin, P. Verissimo, M. Pardal, E. Silva, F. Apolinário, D. Matos

Clouds are complex, so they fail


These faults can stop services, corrupt state and execution: Byzantine/malicious faults


Cloud-of-Clouds

• Consumer runs a service on a set of clouds forming a virtual cloud, what we call a cloud-of-clouds

• Related to the notion of federation of clouds
– Federation of clouds – a virtual cloud created by cloud providers; requires cooperation between providers
– Cloud-of-clouds – an ad-hoc virtual cloud created by consumers; no cooperation between clouds needed


Cloud-of-Clouds dependability + security

• There is redundancy and diversity between clouds
• so even if some clouds fail, a cloud-of-clouds that implements replication can still guarantee:
– Availability – if some stop, the others are still there
– Integrity – if some corrupt data, the data is still at the others
– Disaster-tolerance – clouds can be geographically far
– No vendor lock-in – several clouds anyway

• plus, although not specific to cloud-of-clouds:
– Confidentiality (from clouds) – encryption
– Confidentiality/integrity (from users) – access control


Outline

• DepSky – file storage in clouds-of-clouds

• SCFS – file system in clouds-of-clouds

• SafeCloud-FS – file system in clouds-of-clouds


DEPSKY: FILE STORAGE IN CLOUDS-OF-CLOUDS


DepSky

• Client-side library for cloud-of-clouds storage
– File storage, similar to Amazon S3: read/write files, etc.

• Uses storage cloud services (S3, etc.) as they are:
– All code at the client

• Data is updatable
– Requires Byzantine quorum replication protocols for consistency


[Diagram: four storage clouds – Amazon S3, Nirvanix, Rackspace, Windows Azure – labelled Cloud A–D]

Write protocol

[Diagram: write protocol timeline – the client sends WRITE FILE with the data D to the clouds A–D, waits for their ACKs, and only then sends WRITE METADATA to the clouds]
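A minimal client-side sketch of this write path, assuming a hypothetical CloudStore interface rather than DepSky's real API (which additionally signs the metadata and handles concurrent writers): the data is pushed to all n = 3f+1 clouds in parallel, the client waits for acknowledgments from n−f of them, and only then writes the metadata that makes the new version visible.

```java
import java.security.MessageDigest;
import java.util.List;
import java.util.concurrent.*;

// Hypothetical minimal interface for one storage cloud (not DepSky's real API).
interface CloudStore {
    void put(String key, byte[] value) throws Exception;
    byte[] get(String key) throws Exception;
}

class DepSkyStyleWriter {
    private final List<CloudStore> clouds;      // n = 3f + 1 clouds
    private final int f;                        // max faulty clouds tolerated
    private final ExecutorService pool = Executors.newCachedThreadPool();

    DepSkyStyleWriter(List<CloudStore> clouds, int f) {
        this.clouds = clouds;
        this.f = f;
    }

    /** Writes the data and then the metadata, each waiting for n - f acks. */
    void write(String du, long version, byte[] data) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
        // Phase 1: store the value under a version-specific key on all clouds.
        waitForQuorum(c -> c.put(du + "/value-" + version, data));
        // Phase 2: only after a quorum acked the data, publish the metadata
        // (version number + digest), making the new version visible to readers.
        byte[] metadata = (version + ":" + bytesToHex(digest)).getBytes();
        waitForQuorum(c -> c.put(du + "/metadata", metadata));
    }

    /** Runs op on every cloud in parallel and returns once n - f succeeded. */
    private void waitForQuorum(CloudOp op) throws InterruptedException {
        CountDownLatch acks = new CountDownLatch(clouds.size() - f);
        for (CloudStore c : clouds) {
            pool.submit(() -> {
                try { op.apply(c); acks.countDown(); }
                catch (Exception ignored) { /* faulty or slow cloud */ }
            });
        }
        acks.await();
    }

    private interface CloudOp { void apply(CloudStore c) throws Exception; }

    private static String bytesToHex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) sb.append(String.format("%02x", x));
        return sb.toString();
    }
}
```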


Read protocol

[Diagram: read protocol timeline – the client sends REQUEST METADATA to the clouds A–D, picks the metadata with the highest version number, then sends REQUEST FILE to one cloud (the fastest or cheapest one); the file is fetched from other clouds if the signature doesn't match the file]
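The read path mirrors it. The sketch below reuses the hypothetical CloudStore interface from the write sketch: it collects metadata from the clouds, keeps the highest version number, fetches the file from one cloud and falls back to the others if the digest doesn't match. Real DepSky verifies signed metadata returned by a quorum; here only a SHA-256 digest is checked, for illustration.

```java
import java.security.MessageDigest;
import java.util.List;

// Minimal sketch of the read path, reusing the hypothetical CloudStore interface
// from the write sketch above.
class DepSkyStyleReader {
    private final List<CloudStore> clouds;
    private final int f;

    DepSkyStyleReader(List<CloudStore> clouds, int f) { this.clouds = clouds; this.f = f; }

    byte[] read(String du) throws Exception {
        // 1. Ask the clouds for metadata and keep the highest version seen
        //    among at least n - f replies.
        long bestVersion = -1;
        String bestDigest = null;
        int replies = 0;
        for (CloudStore c : clouds) {
            try {
                String[] md = new String(c.get(du + "/metadata")).split(":");
                long version = Long.parseLong(md[0]);
                if (version > bestVersion) { bestVersion = version; bestDigest = md[1]; }
                replies++;
            } catch (Exception ignored) { /* faulty or slow cloud */ }
        }
        if (replies < clouds.size() - f) throw new IllegalStateException("no metadata quorum");

        // 2. Fetch the file for that version from one cloud; fall back to the
        //    others if the digest doesn't match (a faulty cloud may lie).
        for (CloudStore c : clouds) {
            try {
                byte[] data = c.get(du + "/value-" + bestVersion);
                byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
                if (bytesToHex(digest).equals(bestDigest)) return data;
            } catch (Exception ignored) { /* try the next cloud */ }
        }
        throw new IllegalStateException("no cloud returned a matching file");
    }

    private static String bytesToHex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) sb.append(String.format("%02x", x));
        return sb.toString();
    }
}
```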

DepSky-A: limitations

[Diagram: each of the clouds A–D stores a full copy of the data]

• Data is accessible by cloud providers
• Requires n×|Data| storage space


DepSky-CA: combining erasure codes and secret sharing

[Diagram: the data is encrypted with a key K; K is split into shares S1–S4 (secret sharing) and the encrypted data is dispersed into fragments F1–F4 (erasure coding); each cloud A–D stores one fragment–share pair Fi+Si]

The inverse process reconstructs the file from f+1 shares/fragments. Encrypted, so data can't be read at a cloud!

Only ~2x the size of storage, not 4x!

Only for data, not metadata
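A hedged sketch of the write-side preparation in this scheme: the data is encrypted with a fresh AES key, the key is split into n shares and the ciphertext into n erasure-coded fragments, and each cloud receives one fragment–share pair. The AES-GCM calls use the standard javax.crypto API, but the SecretSharing and ErasureCode helpers are made-up placeholders for the schemes DepSky actually uses (Shamir secret sharing and an optimal erasure code); any f+1 pairs are enough to rebuild the key and the data.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;
import java.util.List;

class DepSkyCaSketch {
    // Hypothetical helpers standing in for Shamir secret sharing and an
    // erasure code: split into n parts, any k of which rebuild the input.
    interface SecretSharing { List<byte[]> split(byte[] secret, int n, int k); }
    interface ErasureCode  { List<byte[]> encode(byte[] data, int n, int k);  }

    /** One fragment + one key share, to be stored together on one cloud. */
    record CloudBlock(byte[] fragment, byte[] keyShare, byte[] iv) {}

    static List<CloudBlock> prepareWrite(byte[] data, int f,
                                         SecretSharing sharing,
                                         ErasureCode coder) throws Exception {
        int n = 3 * f + 1, k = f + 1;

        // 1. Encrypt the data with a fresh AES-GCM key, so no single cloud
        //    (or even all of them together) can read the plaintext.
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(data);

        // 2. Split the key into n shares (k needed to rebuild it) and the
        //    ciphertext into n fragments (k needed to rebuild it). Each cloud
        //    gets one pair, so each fragment is roughly |data|/k: about 2x
        //    total storage for f = 1 instead of 4x with full replication.
        List<byte[]> shares    = sharing.split(key.getEncoded(), n, k);
        List<byte[]> fragments = coder.encode(ciphertext, n, k);

        return java.util.stream.IntStream.range(0, n)
                .mapToObj(i -> new CloudBlock(fragments.get(i), shares.get(i), iv))
                .toList();
    }
}
```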

Consistency proportionality

• The consistency provided by DepSky is the same as that of the base storage clouds
– If the weakest-consistency cloud provides eventual consistency, DepSky provides eventual consistency
– If the weakest-consistency cloud provides regular storage, DepSky provides regular storage
– …


DepSky latency – 100KB files, clients in PlanetLab nodes

DepSky's read latency is close to the cloud with the best latency

DepSky's write latency is close to the cloud with the worst latency

DepSky perceived availability
• perceived availability = no. of files read / no. of read attempts
• impacted by the cloud and Internet availability


SCFS: FILE SYSTEM IN CLOUDS-OF-CLOUDS


Storage vs. File System (DepSky vs. SCFS)

• Storage (DepSky)
– API: simple operations over data blocks
– same consistency as clouds
– create(id)
– read(fd)
– write(fd, block)
– delete(fd)
– lock(fd)
– unlock(fd)
– setACL(fd)

• File system (SCFS)
– API: ~POSIX, so it's mounted and unmodified apps can use it (uses FUSE)
– strong consistency
– open(path, flags)
– read(fd, buffer, length, offset)
– write(fd, buffer, length, offset)
– chmod(path, mode)
– mkdir(path, mode)
– flush, fsync, link, rmdir, symlink, chown, ...
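To make the contrast concrete, here is a hypothetical Java rendering of the storage-style API above, followed by how an unmodified application uses SCFS: since the file system is mounted through FUSE, the application just performs ordinary file I/O on the mount point (the path /mnt/scfs is invented for the example).

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical Java view of the DepSky-style block-storage API listed above.
interface BlockStorage {
    String create(String id) throws Exception;            // returns fd
    byte[] read(String fd) throws Exception;
    void   write(String fd, byte[] block) throws Exception;
    void   delete(String fd) throws Exception;
    void   lock(String fd) throws Exception;
    void   unlock(String fd) throws Exception;
    void   setACL(String fd, String acl) throws Exception;
}

// With SCFS there is no special API: the file system is mounted via FUSE,
// so an unmodified application just uses ordinary file operations.
class ScfsClientExample {
    public static void main(String[] args) throws Exception {
        Path report = Path.of("/mnt/scfs/reports/june.txt");   // hypothetical mount point
        Files.createDirectories(report.getParent());
        Files.writeString(report, "monthly numbers...\n");     // maps to write(fd, buffer, ...)
        System.out.println(Files.readString(report));          // maps to read(fd, buffer, ...)
    }
}
```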


Shared Cloud-backed File System – SCFS

[Diagram: several SCFS Agents share files (DATA) through cloud storage]

Client-based: uses existing cloud storage services
Controlled sharing: access control for security and concurrency
Pay-per-ownership: each client pays for the files it creates
Strong consistency, redundant cloud services

SCFS architecture

[Diagram: SCFS Agents, each with a local cache, store file data in the storage clouds and metadata in a coordination service running on computing clouds; the coordination service also provides the lock service and access control]


Features

• Data layout / access pattern
– Each file is an object (single-block file)
– Multiple versions of the files are maintained
– Always write, avoid reading (exploiting free writes)

• Caching (see the cache sketch below)
– File cache: persistent (to avoid reading)
• Local storage is used to hold copies of all client files (that fit)
• Opened files are also maintained in main memory
– Metadata cache: short-lived, in main memory
• To deal with bursts of metadata requests
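A minimal sketch of these two caches (not SCFS's actual classes): a persistent on-disk file cache holding copies of the client's files, and a short-lived in-memory metadata cache whose entries expire after a configurable TTL to absorb bursts of metadata requests.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class ScfsCachesSketch {
    // Persistent file cache: copies of file contents kept on local disk
    // so that reads can be served without going to the clouds.
    static class FileCache {
        private final Path dir;
        FileCache(Path dir) throws IOException { this.dir = Files.createDirectories(dir); }
        void put(String fileId, byte[] data) throws IOException {
            Files.write(dir.resolve(fileId), data);
        }
        Optional<byte[]> get(String fileId) throws IOException {
            Path p = dir.resolve(fileId);
            return Files.exists(p) ? Optional.of(Files.readAllBytes(p)) : Optional.empty();
        }
    }

    // Short-lived, in-memory metadata cache: entries expire after a small TTL,
    // just enough to absorb bursts of metadata lookups.
    static class MetadataCache {
        private record Entry(String metadata, long expiresAtMillis) {}
        private final Map<String, Entry> entries = new ConcurrentHashMap<>();
        private final long ttlMillis;
        MetadataCache(long ttlMillis) { this.ttlMillis = ttlMillis; }
        void put(String path, String metadata) {
            entries.put(path, new Entry(metadata, System.currentTimeMillis() + ttlMillis));
        }
        Optional<String> get(String path) {
            Entry e = entries.get(path);
            if (e == null || e.expiresAtMillis() < System.currentTimeMillis()) {
                entries.remove(path);
                return Optional.empty();
            }
            return Optional.of(e.metadata());
        }
    }
}
```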

Features

• Consistency
– Consistency-on-close semantics (see the close() sketch below)
• when a user closes a file, all updates he did become observable by the rest of the users
– Locks to avoid write-write conflicts

• Modular coordination
– Metadata is stored in a coordination service
• e.g., Apache Zookeeper (crash fault-tolerant), our own DepSpace (Byzantine/intrusion-tolerant)
– Also used for managing file locks
– Separate data from metadata
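Roughly, opening a file for writing acquires its lock in the coordination service, and closing it pushes the data to the clouds, publishes the new metadata and releases the lock; only then do other clients observe the update. A sketch under these assumptions, with made-up CoordinationService and CloudStorage interfaces:

```java
import java.security.MessageDigest;

// Hypothetical interfaces: a coordination service holding locks and metadata,
// and a cloud storage backend holding file data (e.g., DepSky underneath).
interface CoordinationService {
    boolean tryLock(String path, String clientId);
    void unlock(String path, String clientId);
    void putMetadata(String path, String metadata);   // version, size, data hash, ...
}
interface CloudStorage {
    void put(String key, byte[] data) throws Exception;
}

class ConsistencyOnCloseSketch {
    private final CoordinationService coord;
    private final CloudStorage storage;
    private final String clientId;

    ConsistencyOnCloseSketch(CoordinationService c, CloudStorage s, String id) {
        this.coord = c; this.storage = s; this.clientId = id;
    }

    /** open-for-write: take the lock so there are no write-write conflicts. */
    void openForWrite(String path) {
        if (!coord.tryLock(path, clientId))
            throw new IllegalStateException(path + " is locked by another client");
    }

    /** close: push the data to the clouds, publish the metadata, release the lock.
     *  Only now do the updates become observable by the other users. */
    void close(String path, long newVersion, byte[] localCopy) throws Exception {
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(localCopy);
        storage.put(path + "#" + newVersion, localCopy);          // blocking upload
        coord.putMetadata(path, newVersion + ":" + java.util.HexFormat.of().formatHex(hash));
        coord.unlock(path, clientId);
    }
}
```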


Consistency anchor

• Problem: how to provide strong consistency on top of weak-consistency storage clouds? (typically eventual consistency)

• Key property: the composite storage's consistency is the same as the consistency anchor's (typically atomic consistency)

• Coordination service serves as consistency anchor (sketched below)

[Diagram: clients write/read through a composite storage algorithm that combines a consistency anchor (strong consistency) with a storage service (weak consistency); the composite storage offers strong consistency]
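A sketch of the composite storage algorithm under these assumptions (the two interfaces are invented for illustration): a write first stores the data in the weakly consistent cloud storage under a key derived from its hash and only then registers that hash in the strongly consistent anchor; a read obtains the current hash from the anchor and keeps polling the storage until data matching that hash shows up.

```java
import java.security.MessageDigest;
import java.util.HexFormat;

// Made-up interfaces: the anchor is strongly consistent (e.g., a coordination
// service), the store is only eventually consistent (e.g., a storage cloud).
interface ConsistencyAnchor {
    void setCurrentHash(String id, String hash);
    String getCurrentHash(String id);
}
interface WeakStore {
    void put(String key, byte[] data) throws Exception;
    byte[] get(String key) throws Exception;   // may return null until propagated
}

class CompositeStorageSketch {
    private final ConsistencyAnchor anchor;
    private final WeakStore store;

    CompositeStorageSketch(ConsistencyAnchor anchor, WeakStore store) {
        this.anchor = anchor;
        this.store = store;
    }

    void write(String id, byte[] data) throws Exception {
        String hash = sha256(data);
        store.put(id + "/" + hash, data);   // 1. data goes to the weak storage first
        anchor.setCurrentHash(id, hash);    // 2. the anchor atomically points to it
    }

    byte[] read(String id) throws Exception {
        String hash = anchor.getCurrentHash(id);     // strong read of the current hash
        while (true) {
            byte[] data = store.get(id + "/" + hash);
            if (data != null && sha256(data).equals(hash)) return data;
            Thread.sleep(100);   // not visible yet on the weak store: retry
        }
    }

    private static String sha256(byte[] data) throws Exception {
        return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
    }
}
```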

• SCFS can use different configurations/backends

• Operation: blocking, non-blocking and non-sharing

With PNSs, the amount of storage used in the coordination service is proportional to the percentage of shared files in the system. Previous work shows traces with 1 million files where only 5% of them are shared [33]. Without PNSs, the metadata for these files would require 1 million tuples of around 1KB, for a total size of 1GB of storage (the approximate size of a metadata tuple is 1KB, assuming 100B file names). With PNSs, only 50 thousand tuples plus one PNS tuple per user would be needed, requiring a little more than 50MB of storage. Even more importantly, by resorting to PNSs, it is possible to reduce substantially the number of accesses to the coordination service, allowing more users and files to be served.

3 SCFS Implementation

SCFS is implemented in Linux as a user-space file system based on FUSE-J, which is a wrapper to connect the SCFS Agent to the FUSE library. Overall, the SCFS implementation comprises 6K lines of commented Java code, excluding any coordination service or storage backend code. We opted to develop SCFS in Java mainly because most of the backend code (the coordination and storage services) was written in Java, and the high latency of cloud accesses makes the overhead of using a Java-based file system comparatively negligible.

3.1 Modes of Operation

Our implementation of SCFS supports three modes of operation, based on the consistency and sharing requirements of the stored data.

The first mode, blocking, is the one described up to this point. The second mode, non-blocking, is a weaker version of SCFS in which closing a file does not block until the file data is on the clouds, but only until it is written locally and enqueued to be sent to the clouds in the background. In this mode, the file metadata is updated and the associated lock released only after the file contents are updated in the clouds, and not when the close call returns (so mutual exclusion is preserved). Naturally, this mode leads to a significant performance improvement at the cost of a reduction of the durability and consistency guarantees. Finally, the non-sharing mode is interesting for users that do not need to share files, and represents a design similar to S3QL [7], but with the possibility of using a cloud-of-clouds instead of a single storage service. This version does not require the use of the coordination service, and all metadata is saved on a PNS.
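The non-blocking close can be pictured as returning once the data is durable on the local disk, with the upload, the metadata update and the lock release performed later by a background thread. A rough sketch with made-up backend interfaces (the real SCFS Agent is more careful about retries and ordering):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class NonBlockingCloseSketch {
    // Minimal made-up backends, in the spirit of the earlier sketches.
    interface Coordination { void putMetadata(String path, String md); void unlock(String path, String client); }
    interface Storage { void put(String key, byte[] data) throws Exception; }

    private final Coordination coord;
    private final Storage storage;
    private final String clientId;
    private final Path queueDir;
    private final ExecutorService uploader = Executors.newSingleThreadExecutor();

    NonBlockingCloseSketch(Coordination c, Storage s, String id, Path dir) {
        this.coord = c; this.storage = s; this.clientId = id; this.queueDir = dir;
    }

    /** Returns as soon as the data is durable on the local disk; the upload,
     *  the metadata update and the lock release happen later in the background,
     *  so mutual exclusion is still preserved. */
    void close(String path, long version, byte[] data) throws IOException {
        Path local = queueDir.resolve(path.replace('/', '_') + "-" + version);
        Files.write(local, data);                  // 1. durable local copy, then return
        uploader.submit(() -> {
            try {
                storage.put(path + "#" + version, Files.readAllBytes(local));   // 2. push to clouds
                coord.putMetadata(path, Long.toString(version));                // 3. publish update
                coord.unlock(path, clientId);                                   // 4. only now visible
                Files.deleteIfExists(local);
            } catch (Exception e) {
                /* a real implementation would retry; kept minimal here */
            }
        });
    }
}
```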

3.2 Backends

SCFS can be plugged into several backends, including different coordination and cloud storage services. This paper focuses on the two backends of Figure 5. The first one is based on Amazon Web Services (AWS), with an EC2 VM running the coordination service and file data being stored in S3. The second backend makes use of the cloud-of-clouds (CoC) technology, recently shown to be practical [9, 12, 15]. A distinct advantage of the CoC backend is that it removes any dependence on a single cloud provider, relying instead on a quorum of providers. It means that data security is ensured even if f out of 3f+1 of the cloud providers suffer arbitrary faults, which encompasses unavailability and data deletion, corruption or creation [15]. Although cloud providers have their means to ensure the dependability of their services, the recurring occurrence of outages, security incidents (with internal or external origins) and data corruption [19, 24] justifies the need for this sort of backend in several scenarios.

[Figure 5: SCFS with Amazon Web Services (AWS) and Cloud-of-Clouds (CoC) backends – in the AWS backend the SCFS Agent uses DepSpace (DS) on EC2 and stores data in S3; in the CoC backend DepSpace is replicated with BFT-SMaRt and data is stored via DepSky in S3, Rackspace (RS), Windows Azure (WA) and Google Storage (GS)]

Coordination services. The current SCFS prototype supports two coordination services: Zookeeper [29] and DepSpace [13] (in particular, its durable version [16]). These services are integrated at the SCFS Agent with simple wrappers, as both support storage of small data entries and can be used for locking. Moreover, these coordination services can be deployed in a replicated way for fault tolerance. Zookeeper requires 2f + 1 replicas to tolerate f crashes through the use of a Paxos-like protocol [30], while DepSpace uses either 3f + 1 replicas to tolerate f arbitrary/Byzantine faults or 2f + 1 to tolerate crashes (like Zookeeper), using the BFT-SMaRt replication engine [17]. Due to the lack of hierarchical data structures in DepSpace, we had to extend it with support for triggers to efficiently implement file system operations like rename.

Cloud storage services. SCFS currently supports Amazon S3, Windows Azure Blob, Google Cloud Storage, Rackspace Cloud Files, and all of them forming a cloud-of-clouds backend. The implementation of single-cloud backends is simple: we employ the Java library made available by the providers, which accesses the cloud storage service using a REST API over SSL. To implement the cloud-of-clouds backend, we resort to an extended version of DepSky [15] that supports a new operation which, instead of reading the last version of a data unit, reads the version with a given hash, if available (to implement the consistency anchor algorithm – see §2.4). The hashes of all versions of the data are stored in DepSky's internal metadata object, stored in the clouds.

SCFS configurations

Intrusion-tolerant configuration (uses DepSky)


Sharing latency: SCFS vs Dropbox

[Plot: latency (s) from the instant host A closes a file until host B sees it, for the non-blocking and blocking modes over Amazon S3, Google Storage, Rackspace Files and Windows Azure Blob, compared with Dropbox]

Cloud-of-clouds doesn't increase latency
Blocking good for latency (in this sense)
SCFS slightly better than Dropbox

Benchmarking unmodified desktop applications

OpenOffice Writer

Micro-benchmark    #Operations  File size   SCFS-AWS       SCFS-CoC       S3FS  S3QL  LocalFS
                                            NS  NB    B    NS  NB    B
sequential read         1         4MB        1   1    1     1   1    1      6     1       1
sequential write        1         4MB        1   1    1     1   1    1      2     1       1
random 4KB-read       256k        4MB       11  11   15    11  11   11     15    11      11
random 4KB-write      256k        4MB       35  39   39    35  35   36     52   152      37
create files           200       16KB        1 102  229     1  95  321    596     1       1
copy files             100       16KB        1 137  196     1  94  478    444     1       1

Table 3: Latency of several Filebench micro-benchmarks for SCFS (six variants), S3QL, S3FS and LocalFS (in seconds).

The benchmark follows the behavior observed in traces of a real system, which are similar to other modern desktop applications [25]. Typically, the files managed by the cloud-backed file system are just copied to a temporary directory on the local file system, where they are manipulated as described in [25]. Nonetheless, as can be seen in the benchmark definition (Figure 7), these actions (especially save) still impose a lot of work on the file system.

Open Action: 1 open(f,rw), 2 read(f), 3-5 open-write-close(lf1), 6-8 open-read-close(f), 9-11 open-read-close(lf1)

Save Action: 1-3 open-read-close(f), 4 close(f), 5-7 open-read-close(lf1), 8 delete(lf1), 9-11 open-write-close(lf2), 12-14 open-read-close(lf2), 15 truncate(f,0), 16-18 open-write-close(f), 19-21 open-fsync-close(f), 22-24 open-read-close(f), 25 open(f,rw)

Close Action: 1 close(f), 2-4 open-read-close(lf2), 5 delete(lf2)

Figure 7: File system operations invoked in the file synchronization benchmark, simulating an OpenOffice document's open, save and close actions (f is the odt file and lf is a lock file).

Figure 8 shows the average latency of each of the three actions of our benchmark for SCFS, S3QL and S3FS, considering a file of 1.2MB, which corresponds to the average file size observed in 2004 (189KB) scaled up 15% per year to reach the expected value for 2013 [11].

[Figure 8: Latency of the file synchronization benchmark actions (see Figure 7) with a file of 1.2MB, broken down into open, save and close – panel (a) non-blocking variants (AWS-NB, CoC-NB, CoC-NS, S3QL; latencies up to ~2.5 s) and panel (b) blocking variants (AWS-B, CoC-B, S3FS; latencies up to ~25 s). The (L) variants maintain lock files in the local file system. All labels starting with CoC or AWS represent SCFS variants.]

Figure 8(a) shows that SCFS-CoC-NS and S3QL exhibit the best performance among the evaluated file systems, having latencies similar to a local file system (where a save takes around 100 ms). This shows that the added dependability of a cloud-of-clouds storage backend does not prevent a cloud-backed file system from behaving similarly to a local file system, if the correct design is employed.

Our results show that SCFS-*-NB requires substantially more time for each phase due to the number of accesses to the coordination service, especially to deal with the lock files used in this workload. Nonetheless, saving a file in this system takes around 1.2 s, which is acceptable from the usability point of view. A much slower behavior is observed in the SCFS-*-B variants, where the creation of a lock file makes the system block waiting for this small file to be pushed to the clouds.

We observed that most of the latency comes from the manipulation of lock files. However, the files accessed did not need to be stored in the SCFS partition, since the locking service already prevents write-write conflicts between concurrent clients. We modified the benchmark to represent an application that writes lock files locally (in /tmp), just to avoid conflicts between applications in the same machine. The (L) variants in Figure 8 represent results with such local lock files. These results show that removing the lock files makes the cloud-backed system much more responsive. The takeaway here is that the usability of blocking cloud-backed file systems could be substantially improved if applications take into consideration the limitations of accessing remote services.

Sharing files. Personal cloud storage services are often used for sharing files in a controlled and convenient way [20]. We designed an experiment to compare the time it takes for a shared file written by a client to become available for reading by another client, using SCFS-*-{NB,B}. We did the same experiment considering a Dropbox shared folder (creating random files to avoid deduplication). We acknowledge that the Dropbox design [20] is quite different from SCFS, but we think it is illustrative to show how a cloud-backed file system compares with a popular file synchronization service.

The experiment considers two clients A and B deployed in our cluster. We measured the elapsed time between the instant client A closes a variable-size file that it wrote to a shared folder and the instant it receives a UDP ACK from client B informing that the file was available. Clients A and B are Java programs running in the same LAN, with a ping latency of around 0.2 ms, which is negligible considering the latencies of reading and writing. Figure 9 shows the results of this experiment for different file sizes.

The results show that the latency of sharing in SCFS-*-B is much smaller than what people experience in current personal storage services. These results do not consider …

[Slide annotations on the benchmark results (1.2MB file): 80%, 40% and 55% of the time goes to lock file ops, which could maybe be done locally]

Lots of operations; doing this remotely...
Doing locks locally reduces the latency a lot
Cloud-of-clouds per se doesn't increase latency much


SAFECLOUD-FS – AN ENHANCED CLOUD-OF-CLOUDS FILE SYSTEM

SCFS: opportunities

• Coordination service stores metadata (e.g., file names, directories) in the clear

• Integrity verification of data stored in a cloud requires first downloading the data

• Intrusion recovery – when a user account is compromised and data corrupted, recovery has to be done manually


SafeCloud-FS

• Based on SCFS, with the features just explained, but:
• Coordination service → HomomorphicSpace
– Based on DepSpace but supports homomorphic operations
– Based on the MorphicLib library (Java)
• Operations: searchable, order-preserving, summable, multipliable (a sketch of a summable scheme follows below)
– Stores file metadata encrypted
• Integrity verification: SafeAudit – integrity verification of stored data without downloading it, using homomorphic signatures
• Intrusion recovery done automatically with SafeRCloud
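HomomorphicSpace's actual schemes are not detailed here, but the "summable" operation can be illustrated with a textbook additively homomorphic cryptosystem (Paillier): whoever holds only ciphertexts, such as the coordination service, can compute an encryption of the sum of two values without ever decrypting them. This is purely an illustrative sketch of the kind of operation involved, not SafeCloud-FS or MorphicLib code, and the key size is toy-sized.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Toy Paillier cryptosystem: additively homomorphic, i.e. the product of two
// ciphertexts decrypts to the sum of the plaintexts. Illustrative only.
class PaillierSketch {
    private static final SecureRandom RNG = new SecureRandom();
    final BigInteger n, nSquared, g;       // public key
    private final BigInteger lambda, mu;   // private key

    PaillierSketch(int bits) {
        BigInteger p = BigInteger.probablePrime(bits / 2, RNG);
        BigInteger q = BigInteger.probablePrime(bits / 2, RNG);
        n = p.multiply(q);
        nSquared = n.multiply(n);
        g = n.add(BigInteger.ONE);                               // g = n + 1
        lambda = p.subtract(BigInteger.ONE).multiply(q.subtract(BigInteger.ONE));
        mu = l(g.modPow(lambda, nSquared)).modInverse(n);
    }

    BigInteger encrypt(BigInteger m) {
        BigInteger r = new BigInteger(n.bitLength(), RNG)
                .mod(n.subtract(BigInteger.ONE)).add(BigInteger.ONE);   // random r in [1, n-1]
        return g.modPow(m, nSquared).multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    BigInteger decrypt(BigInteger c) {
        return l(c.modPow(lambda, nSquared)).multiply(mu).mod(n);
    }

    /** Homomorphic addition: Enc(a) * Enc(b) mod n^2 decrypts to a + b. */
    static BigInteger addEncrypted(BigInteger ca, BigInteger cb, BigInteger nSquared) {
        return ca.multiply(cb).mod(nSquared);
    }

    private BigInteger l(BigInteger u) {          // L(u) = (u - 1) / n
        return u.subtract(BigInteger.ONE).divide(n);
    }

    public static void main(String[] args) {
        PaillierSketch keys = new PaillierSketch(512);
        BigInteger a = BigInteger.valueOf(120), b = BigInteger.valueOf(42);
        BigInteger sum = addEncrypted(keys.encrypt(a), keys.encrypt(b), keys.nSquared);
        System.out.println(keys.decrypt(sum));    // prints 162 without decrypting a or b
    }
}
```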


WRAP-UP


Conclusions

• Masking faults/intrusions using clouds-of-clouds
• DepSky: storage in clouds-of-clouds
– Availability, integrity, disaster-tolerance, no vendor lock-in, confidentiality
– Faults in clouds + versions, so Byzantine quorum system protocols
– Same consistency as the storage clouds
– Erasure codes to reduce the size of data stored
– Secret sharing to store cryptographic keys in clouds


Conclusions

• SCFS: a cloud-backed file system
– Based on DepSky and providing similar guarantees, but with a near-POSIX API
– so it needs the strong consistency provided by a coordination service
– caching and careful design allow good performance

• SafeCloud-FS: an enhanced cloud-backed file system
– Encrypted metadata
– Integrity verification
– Intrusion removal


Thank you

• Papers:
– DepSky: Dependable and Secure Storage in a Cloud-of-Clouds. ACM Transactions on Storage, 2013 (also EuroSys 2011)
– SCFS: a Shared Cloud-backed File System. Usenix Annual Technical Conference (ATC), 2014

• Code:
– DepSky: http://cloud-of-clouds.github.io/depsky/
– SCFS: http://cloud-of-clouds.github.io/SCFS/

• My web: http://www.gsd.inesc-id.pt/~mpc/