K. Gopinath, IISc - Storage Networking Industry … – PigMix benchmark: ... – Amazon RDS, Amazon...

Post on 23-Apr-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: K. Gopinath IISc - Storage Networking Industry …€“ PigMix benchmark: ... –Amazon RDS, Amazon DynamoDB, ... K Gopinath, Chiranjib Bhattacharyya, Sai Susarla, Nagesh P. C., "LoadIQ:

Cloud Storage Security

K. Gopinath, IISc


A Very Brief History of Cloud

● Meghadūtam (मेघदूतम्, “The Cloud Messenger”) – a cloud messaging platform by Kālidāsa

● AWS in 2006

– IaaS

● S3: A Cloud Storage Service

● Now Microsoft and others

– PaaS, AaaS


2014 celebrity photo leaks

● images obtained via the online storage

– offered by Apple's iCloud platform for automatically backing up photos from iOS devices, such as iPhones

● (Apple) victims' iCloud account info obtained using "a very targeted attack on user names, passwords and security questions", such as phishing and brute-force guessing

– no specific vulnerability in the iCloud service itself?

● Aggregation issue...


Outline

● “Regular” security?

– “Cloud Ecosystem” security (vulnerabilities w.r.t. certificates, DNS, etc.)

● Includes all “systems” insecurities

– Poor key mgmt (but connected with cloud)

● Amazon EC2 keys stored in open-source projects?

● But may need better solutions (RBAC/RBE)

– DDoS

● vs “Cloud” security?

– Assume all pkts encrypted in the cloud.

– How much info is still leaking? Or, exploitable?

– Security/Privacy in the context of Data aggregation

● Correlation Analysis, Traffic analysis


Ecosystem Issues

● Bugs in Infrastructure

– Recent examples (ShellShock, HeartBleed, USB)

– Very basic issues in IPMI 2.0

● In servers for LOM (“lights out mgmt”)

● Similar firmware insecurity in other components

● Trust wrt Certificates

● DDoS Mitigation


Trusting Certificates

● Added trust with “Public-key pinning” where possible

– mechanism for sites to specify which certificate authorities have issued valid certs for that site, and for user-agents to reject TLS connections to those sites if the certificate is not issued by a known-good CA

– 2011: attempted SSL man-in-the-middle (MITM) attacks against Google users, whereby someone tried to get between users (primarily located in Iran) and encrypted Google services.

● attacker used a fraudulent SSL cert issued by DigiNotar, a root certificate authority that should not issue certs for Google (and has since revoked it)

– Firefox has supported pinning since version 32 (2014); Chrome since 2011

● Also, stolen private keys of device vendors (“Stuxnet”)

– Issue during updates of drivers, etc.

– Need the trust wrt certificates (CMU work ~'08)
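
The pinning check described on this slide can be sketched as follows. This is only a minimal illustration under stated assumptions, not how any browser implements it: it pins the SHA-256 fingerprint of the whole DER-encoded certificate, whereas public-key pinning proper hashes only the public-key part of the certificate. The names `fingerprint` and `is_pinned` and the pin set are hypothetical.

```python
import hashlib
import hmac


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint (hex) of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def is_pinned(der_cert: bytes, pinned: set) -> bool:
    """Accept the certificate only if its fingerprint is in the pinned set."""
    fp = fingerprint(der_cert)
    # compare_digest avoids leaking how many leading characters matched.
    return any(hmac.compare_digest(fp, pin) for pin in pinned)
```

In a live client the DER bytes could come from `ssl.SSLSocket.getpeercert(binary_form=True)`; a connection whose certificate fails `is_pinned` would be dropped even if a trusted CA had signed it.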


“Cloud” Security

● Secrecy of “sustained communication” across cloud

– Is information leaking and how much?

– Discussed in this talk!

● Computation on Encrypted Objects?

– Homomorphic Encryption (HE) suitable for cloud!?

● Fully HE: a cryptosystem that supports arbitrary computation on ciphertexts

● Partially HE: only certain ops (deterministic, +, *, order-preserving, ...):

– Std, AHE (Paillier)/MHE (ElGamal)/...

– Fully HE “impractical” while Partially HE may be feasible

● 1st worries about how to plug the leaks, 2nd worries about how to exploit the structure of the “crypto” (“regulated” leak!)


Program Analysis and Partially Homomorphic Encryption

● Recent work uses program analysis on Map-Reduce programs to select specific Partially Homomorphic Encryption (PHE)

– “Practical Confidentiality Preserving Big Data Analysis”, Stephen et al (HotCloud'14)

● Based on Pig Latin, a high level data flow language in the Hadoop system

● Generate Data Flow Graph (DFG) from source

● Identify encryption scheme

● Transform

– Generate new DFG using available encryption schemes

– Replace operations by their cryptographic equivalents

● RESULTS:

– PigMix benchmark: 3× overhead (avg) in terms of latency

– PHE overhead extremely low compared to FHE.


Example (from Stephen et al, HotCloud'14)

doc1: (3,X), (133,Y), ...

doc2: (56,Q), (344,R), (47,Y), ...

group: ((133,Y), (47,Y)), ...

add: (180,Y), ...
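
The group-then-add example can be reproduced on encrypted values with a toy additively homomorphic scheme. The sketch below is an assumption-laden illustration, not the system of Stephen et al.: it uses textbook Paillier with tiny fixed primes (insecure, chosen for readability) and a keyed HMAC tag as a stand-in for deterministic encryption of the grouping key. All function names are hypothetical.

```python
import hashlib
import hmac
import math
import secrets

# Toy Paillier parameters: tiny primes for readability only, NOT secure.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)  # valid because the generator is g = n + 1


def encrypt(m: int) -> int:
    """Paillier encryption: c = g^m * r^n mod n^2 with random r coprime to n."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Paillier decryption: m = L(c^lambda mod n^2) * mu mod n."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def det_tag(key: bytes, label: str) -> str:
    """Deterministic tag so equal labels can be grouped without revealing them."""
    return hmac.new(key, label.encode(), hashlib.sha256).hexdigest()


def client_encrypt(records, key):
    """Client side: encrypt each (value, label) pair before upload."""
    return [(encrypt(value), det_tag(key, label)) for value, label in records]


def server_group_add(encrypted_records):
    """Server side: group by tag and add values by multiplying ciphertexts."""
    totals = {}
    for c, tag in encrypted_records:
        totals[tag] = (totals[tag] * c) % N2 if tag in totals else c
    return totals
```

The server never sees 133, 47 or 180; it only multiplies ciphertexts that share a tag, and the client decrypts the product to recover the sum.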


AWS

● Compute & Networking

– Amazon EC2, Auto Scaling, Elastic Load Balancing (ELB), Amazon VPC, Amazon Route 53, AWS Direct Connect

● Storage & Content Delivery Network

– Amazon S3, Amazon Glacier, Amazon EBS, AWS Import/Export, AWS Storage Gateway, Amazon CloudFront

● Databases

– Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, Amazon Redshift

● Application Services

– Amazon CloudSearch, Amazon SWF, Amazon SQS, Amazon SES, Amazon SNS, Amazon FPS, Amazon Elastic Transcoder

● Deploy & Management

– AWS Management Console, AWS Identity and Access Management (IAM), Amazon CloudWatch, AWS Elastic Beanstalk, AWS CloudFormation, AWS Data Pipeline, AWS OpsWorks, AWS CloudHSM

● AWS Marketplace & Software


Storage APIs

● POSIX: Read (fd, buffer, count)

– Partial writes to a file OK (appends, overwrites, etc.)

– mmap available

● NFS: Read (fd, offset, buffer, count)

– Partial writes and mmap available

● Amazon S3: “storage” service

– Key Value store; no features like partial write or mmap!


S3 Interface: Key Value Store

● Amazon S3 stores data in named buckets

– Each bucket is a flat namespace, containing keys associated with objects (but not another bucket)

– Max size 5 GB per single PUT (larger objects, up to 5 TB, need multipart upload). Partial writes to objects not allowed (must be uploaded in full), but partial reads OK

● Storage API

– create bucket

– put bucket, key, object

– get bucket, key

– delete bucket, key

– delete bucket

– list keys in bucket

● $ aws s3 ls s3://mybucket

– list all buckets

● Example (recursive upload): $ aws s3 cp myfolder s3://mybucket/myfolder --recursive
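
To make the contrast with POSIX and NFS concrete, here is a small in-memory sketch of the bucket/key/object interface listed on this slide. It is a hypothetical stand-in written for illustration, not the real S3 API: `put` always replaces the whole object, `get` allows a byte range, and a POSIX-style overwrite has to be emulated by reading, patching and re-uploading the full object.

```python
class ToyObjectStore:
    """In-memory stand-in for a bucket/key/object store (not the real S3 API)."""

    def __init__(self):
        self._buckets = {}

    def create_bucket(self, bucket):
        self._buckets.setdefault(bucket, {})

    def put(self, bucket, key, data: bytes):
        # Whole-object replace: there is no offset argument, so no partial write.
        self._buckets[bucket][key] = bytes(data)

    def get(self, bucket, key, start=0, end=None):
        # Partial reads are allowed via a byte range.
        return self._buckets[bucket][key][start:end]

    def delete(self, bucket, key):
        del self._buckets[bucket][key]

    def delete_bucket(self, bucket):
        if self._buckets[bucket]:
            raise ValueError("bucket not empty")
        del self._buckets[bucket]

    def list_keys(self, bucket):
        return sorted(self._buckets[bucket])

    def list_buckets(self):
        return sorted(self._buckets)


def overwrite_range(store, bucket, key, offset, patch: bytes):
    """Emulate a partial write: read all, patch locally, re-upload all."""
    old = store.get(bucket, key)
    store.put(bucket, key, old[:offset] + patch + old[offset + len(patch):])
```

The cost of `overwrite_range` grows with the object size rather than the patch size, which is the practical consequence of having no partial write.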


Security Model

● Auth between (EC2) Machine Instance M and Storage Bucket Y

– Admin, e.g., creates a role X such that only instances (such as M) can assume role X, with read-only permissions for some bucket Y

– Programmer creates instance M with role X

– Application P (running on M) retrieves temporary credentials from M

● after expiry, renewed automatically if developed with the SDK
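
The role X in this flow is defined by two policy documents: one saying who may assume the role, one saying what the role may do. The sketch below builds both as Python dictionaries in the IAM policy JSON layout. The bucket name is hypothetical, and `allows` is a deliberately naive helper for this example only; it does not reproduce IAM's actual evaluation logic.

```python
import json

BUCKET = "example-bucket-y"  # hypothetical bucket name standing in for Y

# Trust policy: only the EC2 service (i.e. instances such as M) may assume role X.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permission policy: read-only access to bucket Y.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
    }],
}


def allows(policy, action):
    """Naive check: does any Allow statement list this exact action?"""
    for stmt in policy["Statement"]:
        actions = stmt["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if stmt["Effect"] == "Allow" and action in actions:
            return True
    return False


policy_json = json.dumps(read_only_policy, indent=2)
```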


RBAC and RBE

● AWS uses a form of RBAC

● Better: Crypto-enhanced RBAC models

● e.g., “Achieving Secure Role-based Access Control on Encrypted Data in Cloud Storage”, Lan Zhou, Vijay Varadharajan, and Michael Hitchens. Uses Weil pairing.

– Role-based encryption (RBE)

● Design req.: users only need to keep a single key for decryption, and system operations are efficient regardless of the complexity of the role hierarchy and user membership in the system

● Add an information flow model?

– eg. all network pkts are at a particular “net” label (low) level

– encrypted info can further be declassified (in crypto-enhanced RBAC models) by those with decryption keys (high level) instead of keeping it at “net” label level


What could be a deeper “cloud security” problem?

● (Ecosystem problems)

● Analysis on streams of cmds and responses between Instances and Storage objs, aka “traffic analysis”

– Can we infer something purely from the cmd streams?

● Compression: LBX extension to X reduces BW reqd.

– Even with encrypted streams? Comparable with cryptosecurity

● “crypto” considered broken with 2^40 or so ops

● Split Problem to 2 parts

– Workload Identification based on cmds/responses

– Identification of cmds/responses even if encrypted


Workload Identification

● “Discovery of Application Workloads from Network File Traces,” Yadwadkar, Bhattacharyya, Gopinath, Niranjan, Susarla, FAST 2010

– Work at IISc/NetApp. Uses only cmd name info and no other fields


Variability in the traces belonging to same class

● Additions, deletions and mutations

● Issue with the conventional HMMs

● Need profile HMMs


Analogy with Problem in Computational Biology

● In Computational Biology

– Need for multiple Alignment

– Divergence due to chance mutations

– Conserving critical parts

● Workload identification essentially statistical

– Probabilistic models appropriate

– With Markov property due to statistical regularities: Markov Models with Hidden States (HMMs)

– Need a profile to represent multiple alignment of variable traces

– Proposal: use profile HMM for representing profile of a workload


Pairwise Alignments

● Global Alignment

– compute two equal length seqs such that matches are maximized and insertions/deletions are minimized

● Local Alignment

– locate two subsequences one from each string such that they are very similar
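
The two kinds of pairwise alignment can be sketched with the standard dynamic programs: Needleman-Wunsch for global and Smith-Waterman for local alignment. This is a generic textbook version that only returns scores; the scoring constants (match +1, mismatch -1, gap -1) are assumptions for the example, not values from the talk.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: best score aligning all of a against all of b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: best score over all pairs of subsequences of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best
```

Both functions take sequences of command names, so two traces that differ by one inserted command still score close to a perfect match.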


An Example Multiple Alignment

● Need for multiple alignment of sequences

– Detecting similarity between more than two sequences

● Multiple alignment of 10 `edit' traces

● Unfortunately,

– Extending DP has complexity (time & space) O(n^r) for r sequences of length n


Transition Structure of a profile HMM

● Essentially left-to-right (L-R) HMMs


Workload Identification

● Pre-train profile HMM for each workload

– globally align the profile with the unknown sequence

● Trace Annotation

– compute a local alignment of each profile with the test trace
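
The "one pre-trained model per workload, pick the best-scoring one" idea can be sketched with a much simpler model than a profile HMM. The code below swaps in a first-order Markov chain over command names with add-one smoothing; it has no insert or delete states, so it is only a stand-in for the approach in the talk. The workload name "listing" and the example traces are made up for illustration.

```python
import math
from collections import Counter

START = "<start>"


def train_chain(traces):
    """Count command-to-command transitions over a workload's training traces."""
    counts = {}
    for trace in traces:
        prev = START
        for cmd in trace:
            counts.setdefault(prev, Counter())[cmd] += 1
            prev = cmd
    return counts


def log_likelihood(trace, counts, vocab_size, alpha=1.0):
    """Log-probability of a trace under a chain, with additive smoothing."""
    total, prev = 0.0, START
    for cmd in trace:
        row = counts.get(prev, Counter())
        total += math.log((row[cmd] + alpha) / (sum(row.values()) + alpha * vocab_size))
        prev = cmd
    return total


def identify(trace, models, vocab_size):
    """Return the workload whose model gives the trace the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(trace, models[name], vocab_size))
```

As in the talk's setting, only the command names are used; no sizes, offsets or file handles enter the model.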


Automated Learning on Real Traces

● Small snippets of traces are sufficient for identifying many workloads


Workload Identification: Summary

● Profile HMM approach successful at discovering the application-level behavioural characteristics

● A diverse, representative sample of workloads required for training

● Not able to handle higher level of concurrency

● Multiple Kernel based SVM Methods can improve accuracy (HotStorage'12)

– Uses all fields instead of only cmd name

● Pankaj Pipada, Achintya Kundu, K Gopinath, Chiranjib Bhattacharyya, Sai Susarla, Nagesh P. C., "LoadIQ: Online learning to label program phases using storage traces," HotStorage Jun 2012

– But encryption makes it again not that useful!


Encrypted Command Streams: How secure are they?

• Here, we use NFS instead of S3

– Also, encrypted NFS commands

• Extraction of feature vectors from Network Traces to form training data

• Application of supervised and unsupervised machine learning techniques on the training data to predict the NFS command


Approach

• Set up NFS client and server using SSH tunneling

• Tap encrypted/decrypted network trace using Wireshark

• Extract feature vectors for each NFS command from encrypted NFS trace

• Features considered:

– Request Packet Size

– Reply Packet Size

– Round Trip Time

– Ratio of Reply and Request Size
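
The four features above can be computed from one request/reply pair as in this sketch. The `Exchange` record and its field names are hypothetical; in practice the values would be taken from the captured trace, and pairing requests with replies is itself non-trivial when they are not consecutive.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """One request/reply pair as seen on the encrypted link (hypothetical record)."""
    request_time: float  # seconds
    reply_time: float    # seconds
    request_size: int    # bytes
    reply_size: int      # bytes


def feature_vector(x: Exchange):
    """Return (request size, reply size, round trip time, reply/request ratio)."""
    round_trip_time = x.reply_time - x.request_time
    ratio = x.reply_size / x.request_size
    return (x.request_size, x.reply_size, round_trip_time, ratio)
```

None of these features needs the plaintext: sizes and timestamps remain visible to an observer even when the payload is encrypted.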


Trace Scenario

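[Figure: NFS server and NFS client linked by a secure communication channel. The trace taken at the NFS client is decrypted; the intruder on the link sees only the encrypted trace.]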


Trace Extraction

• Some challenges involved in extracting the feature vector from the encrypted dump, e.g.

– Non-consecutive request and reply sequences

• Extraction of true labels (NFS command) from the decrypted dump for

1. Feature set validation using supervised learning

2. Finding classification error in unsupervised learning


NFS Commands in Traces

i. GetAttr

ii. Access

iii. Read

iv. ReadDirPlus

v. FSStat

vi. Lookup

No. of features = 4

No. of output classes = 6

No. of training examples = 7103


Machine Learning Techniques Used

Feature Set Validation using Supervised Learning (needs labelled examples, i.e. both encrypted and decrypted pkts, to learn mapping function)

1. Hashing and Analysis

2. Decision Tree

Unsupervised Learning: needed as only encrypted pkts are available in practice

3. K-Means

4. Hidden Markov Model

5. Online Hidden Markov Model

Relative comparison of accuracy of the methods


Supervised Techniques

Technique 1:

• Analyze the pattern

• Find out similarity among the data corresponding to the same command in the decrypted trace

• Hard code the rules of classification

• Accuracy: 99+%

Technique 2: Decision Tree

• Generates classification tree based on feature vectors to predict NFS cmds at leaf of tree

• Uses principle of “minimization of entropy”

• Accuracy: 94%
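
The "minimization of entropy" principle behind Technique 2 can be shown for a single split. The sketch below picks the threshold on one numeric feature that minimizes the weighted entropy of the two resulting groups; a full decision tree applies this recursively over all features. The function names and example values are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (weighted entropy, threshold) of the best split on one feature."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if weighted < best[0]:
            best = (weighted, threshold)
    return best
```

For example, if reply sizes cleanly separate two commands, the chosen threshold falls between the two groups and the weighted entropy drops to zero.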


Decision Tree


Unsupervised learning

• No access to labels from decrypted data.

• Only from timestamp and packet size of encrypted commands

• Assumptions:

1. The number of files and their sizes in a directory on the server follow a Gaussian distribution.

2. Naïve Bayes model of the feature vector (the features, i.e. the effects, are conditionally independent given the command, i.e. the cause)

3. Data is generated from a Gaussian distribution of feature vectors.

4. Synchronous requests in a single-client, single-server environment


Technique 3: K-Means

• Clustering based technique

• Classifies the training data into 6 classes corresponding to NFS commands

– Use offline access to decrypted data to infer properties of pkts that can be used to predict labels on clusters

• Accuracy: 87%

• “Hard Clustering”

• Converges in a finite number of iterations.
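
A minimal sketch of the hard-clustering loop (Lloyd's algorithm) follows. It assumes the caller supplies the initial centroids, which keeps the example deterministic; the talk does not say how centroids were initialised, and mapping cluster indices to NFS command names is left out.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: returns (cluster index per point, final centroids)."""
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (hard clustering).
        new_assignment = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids
```

With the features from the encrypted trace, `points` would hold one 4-tuple per command and six centroids would be used, one per NFS command class.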


Technique 4: Hidden Markov Model

• The actual cause (NFS command) is unobserved, but the effect (encrypted n/w trace) is observed

• Hidden variable: NFS command

• Observed variable: packet size and RTT

• Why use HMM?

– Because there is correlation between successive commands. No independence assumptions

• “Soft Clustering”

• Accuracy: 82% on training data.
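
Once an HMM's parameters are known, the most likely hidden command sequence for an observed trace can be recovered with the Viterbi algorithm, sketched below. This covers decoding only; learning the parameters without labels (for instance with Baum-Welch) is not shown. The discrete observations "small" and "large" and all probabilities are invented for the example, whereas the talk uses packet size and RTT.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a list of discrete observations."""
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}
    back = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this step.
            prev = max(states, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (scores[prev] + math.log(trans_p[prev][s])
                             + math.log(emit_p[s][obs]))
            pointers[s] = prev
        scores = new_scores
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))
```

The transition table is where the correlation between successive commands enters: a command that is unlikely in isolation can still be chosen if it commonly follows the previous one.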


Technique 5: Online HMM

• Why online?

– Realistic way to model online generation of traces in an NFS network.

– Prevents over-application of Markov structure in data.

• A sequence of n/w trace for 1000 consecutive NFS commands is collected.

• Repeated for every new 1000 n/w traces

• Accuracy: 84%


Comparison of Models

Technique        Accuracy (%)
Hashing          99
Decision Tree    94
K-Means          87
HMM              82
Online HMM       84


Summary (Leakage of Info)

Unsupervised more useful

• Realistic scenario in most cases

• Robust (soft clustering)

• More general and easily extended to other netw frameworks

• But accuracy lower

• Can handle missing data or labels

• Useful? in new Android attacks that can predict pwd input

Since accuracy of 80% and above, need to obfuscate

• Add variable padding

– But Goldwasser et al prove no “secure” obfuscation!

– No “perfect” crime
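
Variable padding can be sketched as rounding each message up to a block boundary and then adding a random number of extra blocks, so that the observed size no longer maps one-to-one to a command. The block size and the maximum number of extra blocks are arbitrary assumptions here, and, in line with the caveat above, this only blurs the size feature; timing and ordering still leak.

```python
import secrets


def padded_size(size, block=256, max_extra_blocks=4):
    """Size on the wire after rounding up to a block and adding random blocks."""
    base = -(-size // block) * block  # round up to a multiple of block
    return base + secrets.randbelow(max_extra_blocks + 1) * block


def pad(payload: bytes, block=256, max_extra_blocks=4) -> bytes:
    """Pad with zero bytes; a real protocol would also carry the true length."""
    target = padded_size(len(payload), block, max_extra_blocks)
    return payload + b"\x00" * (target - len(payload))
```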


Conclusions

● Cloud storage security: “cloudy” with some chances of rain!

● “Ecosystem” security and “real” cloud security aspects intertwined

– Detect pwd input thru side channels

– Aggregation attacks

– Traffic analysis attacks