1
A Framework for Developing Privacy Middleware for Cloud Data Services
Mamadou H. Diallo
2
Outline
Overview/Motivation
Approach: a framework for developing privacy middleware
  Abstract Service Model
  Privacy Middleware Architecture
  Data Protection Model
Implementation
  Based on a proxy: adaptation of Sahi
  Web application: Google Calendar
  Google Calendar Service Model
  Data protection: cryptographic algorithms
Implementation Status
  Implemented Features
  Remaining Features
3
Overview/Motivation
Growth of web-based data services
  Benefits: improved service, accessibility, availability, low cost, etc.
  Examples: Google Calendar, Microsoft Live Mesh, Yahoo Briefcase, etc.
Privacy issues
  Outsider attacks (Internet hackers)
  Insider attacks (dishonest employees)
  Lack of support for privacy enforcement in web applications
Current approaches
  Assumption: cooperative servers
  Algorithms and protocols supported by the servers
  Drawback: web service providers are not willing to cooperate
Proposed approach: privacy middleware
  Assumption: uncooperative servers
  Technique: encryption
  Advantages: addresses insider attacks, policy-based
  Challenges:
    Service abstraction, service adaptation
    Query processing with privacy enforcement
    Sharing: key distribution and revocation
    Support for other servers
4
Approach: A Framework for Privacy Middleware
Standard web application architecture: three logical layers
  Client layer: implemented in a browser
  Presentation and business logic layer: implemented in a web server
  Data layer: implemented as a database
5
Approach: A Framework for Privacy Middleware
New logical layer: the privacy enforcement layer
  Implemented in a privacy middleware
  Design and implementation based on proxy technology
6
Abstract Service Model: Data Model
Data modeled as objects
  Object: O = {(A1, V1), (A2, V2), …, (An, Vn)}, where (Ai, Vi) is an attribute/value pair and n is the total number of pairs
  Granularity of objects depends on the data type
    Event-based: unit = event
    File-based: unit = file
Data categories
  Structured: e.g., events in a calendar, database entries
  Unstructured: e.g., text documents, video files, audio files
Data types
  Ordered data: e.g., dates, numerical data
  Non-ordered data: e.g., text documents, presentation documents
  Other data: categorical data (list of choices), boolean data (YES/NO)
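A minimal Python sketch of this data model, under stated assumptions: the names DataKind, DataObject and classify are illustrative and not part of the framework, and classify is a hypothetical helper that guesses the data type from the Python type of a value. Categorical strings cannot be told apart from free text that way, so the caller has to flag them.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from enum import Enum


class DataKind(Enum):
    """The three data types listed on the slide."""
    ORDERED = "ordered"          # e.g. dates, numerical data
    NON_ORDERED = "non-ordered"  # e.g. free text
    OTHER = "other"              # e.g. categorical or boolean data


@dataclass
class DataObject:
    """O = {(A1, V1), ..., (An, Vn)}: an object as a list of attribute/value pairs."""
    object_id: str
    pairs: list[tuple[str, object]] = field(default_factory=list)

    def attributes(self) -> list[str]:
        return [attr for attr, _ in self.pairs]

    def __len__(self) -> int:
        # n, the total number of attribute/value pairs
        return len(self.pairs)


def classify(value: object, categorical: bool = False) -> DataKind:
    """Hypothetical helper: infer the data type of a single value."""
    if categorical or isinstance(value, bool):  # bool must be checked before int
        return DataKind.OTHER
    if isinstance(value, (int, float, date)):   # datetime is a subclass of date
        return DataKind.ORDERED
    if isinstance(value, str):
        return DataKind.NON_ORDERED
    return DataKind.OTHER


# An event-based object (unit = event)
event = DataObject("Event1", [
    ("What", "Meeting with Bob"),
    ("When", datetime(2009, 1, 1, 10, 0)),
    ("Repeats", "weekly"),
])
```

Here `classify(event.pairs[2][1], categorical=True)` returns `DataKind.OTHER`, matching the "Repeats" choice list.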
7
Abstract Service Model: Operations
Operations modeled as functions (inputs, processing, outputs)
Create/store and modify objects
  Inputs: object, privacy policies
  Processing: encryption, tagging
  Outputs: encrypted object with tags
Fetch/retrieve objects
  Inputs: HTML pages with encrypted data
  Processing: decryption, untagging
  Outputs: HTML pages with no encrypted data
Query objects
  Inputs: query parameters
  Processing: encryption
  Outputs: encrypted parameters
Share objects
  Inputs: object ID, sharing policies
  Processing: encryption
  Outputs: encrypted data (object ID, keys, and metadata for decrypting the object)
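The create, fetch and query functions can be sketched end to end in Python. This is a toy under explicit assumptions: objects are plain dicts instead of HTML pages, the randomized cipher is HMAC-SHA256 in counter mode standing in for a real block cipher such as AES, tags are truncated HMACs of lower-cased words, and every function name is hypothetical. Sharing is not shown.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy PRF-based keystream (HMAC-SHA256 in counter mode); stands in for AES."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: str) -> bytes:
    """Randomized encryption: a fresh 16-byte nonce is prepended to the ciphertext."""
    nonce = os.urandom(16)
    data = plaintext.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def decrypt(key: bytes, blob: bytes) -> str:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


def tag(key: bytes, word: str) -> str:
    """Deterministic tag, so the server can match it without seeing the word."""
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def create(obj: dict[str, str], enc_key: bytes, tag_key: bytes) -> dict:
    """Create/store: encrypt every value and attach one tag per word."""
    return {attr: {"ct": encrypt(enc_key, value),
                   "tags": {tag(tag_key, w) for w in value.split()}}
            for attr, value in obj.items()}


def fetch(stored: dict, enc_key: bytes) -> dict[str, str]:
    """Fetch/retrieve: decrypt every value and drop the tags."""
    return {attr: decrypt(enc_key, cell["ct"]) for attr, cell in stored.items()}


def query(words: list[str], tag_key: bytes) -> set[str]:
    """Query: replace the plaintext query parameters by their tags."""
    return {tag(tag_key, w) for w in words}


def server_match(stored: dict, query_tags: set[str]) -> bool:
    """What the server can evaluate: does some attribute carry all query tags?"""
    return any(query_tags <= cell["tags"] for cell in stored.values())
```

Because `encrypt` draws a fresh nonce, the same plaintext never produces the same ciphertext twice; only the deterministic tags are comparable on the server.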
8
Data Protection Model: Approaches
Based on cryptographic techniques: encryption/decryption mechanisms
Challenges
  Supporting web application services
  Issue: accessing encrypted multi-data sets
  Examples:
    Searching text, range searching, etc.
    Sharing personal data, sharing documents, etc.
    Collaboration, integration, etc.
Available techniques
  Inefficient encryption: more security but poor performance
    Example: randomized encryption (all data must be retrieved for each query)
  Efficient searchable encryption: less security but better performance
    Examples: order-preserving encryption, bucketization-based encryption
9
Data Protection Model: Encryption Strategy
Ordered data
  Order-preserving encryption schemes
  Example: order-preserving encryption (OPE)
Non-ordered data
  Searchable encryption schemes
  Example: keyword-based encryption
Other data
  May not be encrypted
  Examples: categorical data, boolean data
Key management
  Storage and retrieval
    Keys and metadata stored on the server (portability)
    Encrypted using the owner's master key
    Retrieved once per web session
  Representation
    XML Schema
    Needs to be flattened before storing
    Extensibility
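A small Python sketch of the key representation step, assuming a made-up schema (the element names KeyStore, Key, Object and WrappedKey are hypothetical; the slides do not give the schema). It builds the XML tree with the standard `xml.etree.ElementTree` module and flattens it to a single string for storage on the server. Wrapping each key under the owner's master key is assumed to have happened already and is not shown.

```python
import base64
import xml.etree.ElementTree as ET


def keys_to_xml(owner: str, entries: list[dict]) -> str:
    """Build the key/metadata tree and flatten it to one string for server storage."""
    root = ET.Element("KeyStore", owner=owner)
    for entry in entries:
        key = ET.SubElement(root, "Key", id=entry["id"], method=entry["method"])
        ET.SubElement(key, "Object").text = entry["object"]
        # Assumed to be already encrypted (wrapped) under the owner's master key.
        ET.SubElement(key, "WrappedKey").text = base64.b64encode(entry["wrapped_key"]).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def xml_to_keys(flat: str) -> tuple[str, list[dict]]:
    """Parse the flattened string back, e.g. once at the start of a web session."""
    root = ET.fromstring(flat)
    entries = [
        {
            "id": key.get("id"),
            "method": key.get("method"),
            "object": key.findtext("Object"),
            "wrapped_key": base64.b64decode(key.findtext("WrappedKey")),
        }
        for key in root.findall("Key")
    ]
    return root.get("owner"), entries
```

Adding a new attribute or child element to `Key` leaves older parsers working, which is one way to read the "extensibility" point.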
10
Privacy Policies: Definition (illustration)
PP = <PolicyID, CreationDate, ExpirationDate, Statements>
Statement = <Object, Attribute, EncryptionMethod>
Example (Google Calendar): "Hide my meeting with Bob on 01/01/2009"
Encoding: {Policy1, 1/1/2010, 12/31/2010, {Event1, Event1.What, KDE1}, {Event1, Event1.When, OPE1}, {Event1, Event1.Where, KDE1}, {Event1, Event1.Description, KDE1}}, where KDE = keyword-based encryption and OPE = order-preserving encryption
Policy enforcement
  Attribute-level: encrypt all attributes or none
  Object-level: more flexible, but more challenging (information leakage)
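The policy definition maps directly onto two small record types. The Python sketch below encodes the "Hide my meeting with Bob" example; the class names and the helper hide_event (which hard-codes the slide's choice of KDE1 for text attributes and OPE1 for the date) are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Statement:
    obj: str        # Object, e.g. "Event1"
    attribute: str  # Attribute, e.g. "Event1.What"
    method: str     # EncryptionMethod, e.g. "KDE1" or "OPE1"


@dataclass(frozen=True)
class PrivacyPolicy:
    policy_id: str
    creation_date: date
    expiration_date: date
    statements: tuple[Statement, ...]

    def is_active(self, today: date) -> bool:
        return self.creation_date <= today <= self.expiration_date

    def method_for(self, attribute: str) -> Optional[str]:
        """Encryption method for an attribute, or None if it stays in plaintext."""
        for s in self.statements:
            if s.attribute == attribute:
                return s.method
        return None


def hide_event(policy_id: str, event_id: str, start: date, end: date) -> PrivacyPolicy:
    """Hypothetical helper reproducing the example: free text -> KDE1, date -> OPE1."""
    methods = {"What": "KDE1", "When": "OPE1", "Where": "KDE1", "Description": "KDE1"}
    statements = tuple(Statement(event_id, f"{event_id}.{attr}", m) for attr, m in methods.items())
    return PrivacyPolicy(policy_id, start, end, statements)
```

For example, `hide_event("Policy1", "Event1", date(2010, 1, 1), date(2010, 12, 31))` yields four statements, and an attribute with no statement (such as "Repeats") stays in plaintext.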
11
Framework Architecture
Privacy middleware: 7 components
Communication: HTTP messages
  Trusted: messages cannot be intercepted by others
  Untrusted: messages may be intercepted by others
12
Implementation Approach
Proxy-based, browser independent
Web application: Google Calendar
Adapted from Sahi
Sahi
  Automation and testing tool for web applications
  Open-source application
  Based on proxy server technology
  Browser independent
  Developed in Java and JavaScript
  Some features:
    Injects JavaScript code into web pages to help record and play back events in the browser
    Provides support for database-based testing, file read/write APIs for data-driven testing, HTTP and HTTPS
13
Google Calendar Model
Data model
  Calendar: a set of events
  Event: composed of parameters <What, When, Repeats, Where, Who, Calendar, Description, Attachment>
  What: string (non-ordered data)
  When: ordered data
    Start/end date: date
    Start/end time: (xx:xx am/pm)
  Repeats: categorical (daily, weekly, etc.)
  Where: string (non-ordered data)
  Who (guests):
    Guest ID: email
    Permission: choices (modify event, invite others, see guest list)
  Calendar (owner): string (non-ordered data)
  Description: string (non-ordered data)
14
Google Calendar Services
Query events
  Basic query: any text in any parameter, operation (AND)
  Advanced query: specific parameters, range query, operations (AND, NOT)
Sharing and invitations
  Sharing a calendar, publishing a calendar (embedded, public calendars)
  Event invitations (invite guests, allow guests to modify events, allow guests to see the guest list)
Notifications
  Types: create, change, cancel invitations
  SMS (text messaging): mobile phones
Sync events
  Microsoft Outlook: options (1-way, 2-way)
  Other calendars: Apple iCal, Mozilla Sunbird
  Mobile devices: Windows Mobile, iPhone, BlackBerry
Others
  Support for many languages
15
Technique 1: Keyword-based Searchable Encryption
Basic approach: keyword encryption, using a hash function to bucketize the keywords
Original plaintext
  Parse the original text into a set of words W = {W1, W2, …, Wn}, where Wi is a dictionary word
Keyword generation and bucketization
  Generate keywords from W: Kw = {Kw1, Kw2, …, Kwm}, where Kwi is a keyword selected from W
  Bucketize the keywords using a hash function H: {0,1}* → {0,1}^l
  HV = {HV1, …, HVk}
Encryption
  Encrypt W using a non-deterministic encryption scheme, E(W)
    Block-cipher-based encryption, e.g., AES, Blowfish
  Encrypt the bucketized keywords HV using a deterministic encryption scheme, E(HV)
    Example: RSA
  Tag E(HV) to E(W)
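The keyword pipeline can be sketched in Python as follows, with several labeled assumptions: the keyword rule (distinct words minus a small, made-up stop-word list) is hypothetical; H is a keyed SHA-256 (HMAC) truncated to l bits rather than a plain hash; and a second HMAC stands in for the deterministic E(HV). Unlike RSA, an HMAC cannot be decrypted, but the tags only need to be compared. E(W) itself (e.g. AES) is omitted, and the plaintext dict in the usage below plays the role of the client's decrypted E(W).

```python
import hashlib
import hmac
import re

STOP_WORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "with"}  # assumption


def parse_words(text: str) -> list[str]:
    """W = {W1, ..., Wn}: the words of the original plaintext, lower-cased."""
    return re.findall(r"[a-z0-9]+", text.lower())


def generate_keywords(words: list[str]) -> set[str]:
    """Kw: here simply the distinct words that are not stop words (assumed rule)."""
    return {w for w in words if w not in STOP_WORDS}


def bucketize(keyword: str, hash_key: bytes, l_bits: int) -> int:
    """H: {0,1}* -> {0,1}^l, a keyed SHA-256 truncated to its top l bits."""
    if not 1 <= l_bits <= 32:
        raise ValueError("l_bits must be between 1 and 32")
    digest = hmac.new(hash_key, keyword.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - l_bits)


def det_encrypt(hv: int, tag_key: bytes) -> str:
    """Stand-in for the deterministic E(HV): equal bucket values give equal tags."""
    return hmac.new(tag_key, hv.to_bytes(4, "big"), hashlib.sha256).hexdigest()


def build_tags(text: str, hash_key: bytes, tag_key: bytes, l_bits: int = 8) -> set[str]:
    """The tags E(HV) that would be attached to E(W)."""
    return {det_encrypt(bucketize(k, hash_key, l_bits), tag_key)
            for k in generate_keywords(parse_words(text))}


def server_search(word: str, index: dict[str, set[str]], hash_key: bytes,
                  tag_key: bytes, l_bits: int = 8) -> list[str]:
    """Server side: IDs whose tags contain the query tag (may include false positives)."""
    query_tag = det_encrypt(bucketize(word.lower(), hash_key, l_bits), tag_key)
    return [doc_id for doc_id, tags in index.items() if query_tag in tags]


def client_filter(word: str, candidates: list[str], decrypted: dict[str, str]) -> list[str]:
    """Client side: after decrypting E(W), drop candidates that only shared a bucket."""
    return [d for d in candidates if word.lower() in parse_words(decrypted[d])]
```

A small l makes different keywords collide in the same bucket more often, which hides more from the server but returns more false positives for the client to discard.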
16
Technique 2: Order-preserving Encryption (OPE)
Definition
  Deterministic encryption schemes that preserve numerical order
  For A, B ⊆ N with |A| <= |B|, f: A → B is order-preserving if for all i, j in A, f(i) > f(j) iff i > j
  SE = (K, Enc, Dec) is order-preserving if Enc(k, ·) is an order-preserving function for every k output by K
Security
  IND-OCPA, the generalization of IND-DCPA: does not work (not achievable by practical OPE schemes)
  POPF-CCA: based on the approach used to define PRPs (note: order-preserving functions are injective)
    POPF: pseudorandom order-preserving function
    SE = (K, Enc, Dec), A an adversary against SE
    Lazy sampling of a random order-preserving function (ROPF)
Lazy sampling
  Connection between a random order-preserving function and the hypergeometric (HG) probability distribution
  Use the HG distribution to lazy-sample an ROPF and its inverse
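To make the definition concrete, here is a Python sketch of a random order-preserving function. It is not the lazy HG-based sampler on the slide: it samples the whole function eagerly, using the fact that a uniformly random order-preserving injection from {1..M} to {1..N} corresponds to a uniformly random M-element subset of the range listed in increasing order. That is only practical for small domains. The seeded `random.Random` is an assumption for illustration and is not cryptographically secure.

```python
import bisect
import random


def sample_ropf(domain_size: int, range_size: int, seed: int) -> list[int]:
    """Eagerly sample a random order-preserving f: {1..M} -> {1..N}; table[i - 1] == f(i)."""
    if domain_size > range_size:
        raise ValueError("an order-preserving injection needs |D| <= |R|")
    rng = random.Random(seed)  # NOT cryptographically secure: illustration only
    return sorted(rng.sample(range(1, range_size + 1), domain_size))


def enc(table: list[int], i: int) -> int:
    """Enc: look up f(i)."""
    return table[i - 1]


def dec(table: list[int], c: int) -> int:
    """Dec: invert f by binary search, which works because the table is sorted."""
    pos = bisect.bisect_left(table, c)
    if pos == len(table) or table[pos] != c:
        raise ValueError("not a valid ciphertext")
    return pos + 1
```

Since `f` is strictly increasing, comparing ciphertexts gives the same answer as comparing plaintexts, which is exactly what lets the server evaluate range queries.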
17
Technique 2: OPE of Dates
Approach
  Uses an order-preserving symmetric encryption (OPE) scheme based on the hypergeometric distribution
  Maps dates from a domain D to a range R
    Domain D: set of dates; range R: set of dates
    F: D → R, where |D| <= |R|
    D = {D1, D2, …, Dm}, R = {R1, R2, …, Rn}, m <= n
  Example:
    D = [01/01/2009 1:00am, 12/31/2009 1:00am], R = [01/01/2009 1:00am, 12/31/2011 1:00am]
    Plaintext: 06/06/2009 → ciphertext: 08/15/2010
    Plaintext: 06/07/2009 → ciphertext: 10/25/2010
OPE uses consecutive numbers, so dates are mapped to numbers
  1 unit = 30 minutes: a duration of Y minutes maps to X = Y / 30
  Examples: 3h30mn = 7, 1 day = 48
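The date-to-number step can be sketched in Python with the standard `datetime` module. The function names are hypothetical; the 30-minute unit and the domain start come from the example above.

```python
from datetime import datetime, timedelta

SLOT = timedelta(minutes=30)  # 1 unit = 30 minutes


def to_number(d: datetime, domain_start: datetime) -> int:
    """Map a date/time to a consecutive integer: 30-minute slots since the domain start."""
    delta = d - domain_start
    if delta < timedelta(0) or delta % SLOT:
        raise ValueError("date must be on a 30-minute boundary at or after the domain start")
    return delta // SLOT


def to_date(n: int, domain_start: datetime) -> datetime:
    """Inverse mapping, applied after OPE decryption."""
    return domain_start + n * SLOT
```

With the example domain starting at 01/01/2009 1:00am, 3h30mn after the start maps to 7, one day maps to 48, and the domain end 12/31/2009 1:00am maps to 364 × 48 = 17,472, so the OPE scheme only ever sees integers.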
18
Technique 2: OPE Proposed Improvement
Approach: use a bucketization technique
Domain and range
  D = [SD, ED], where SD = start domain date, ED = end domain date
  R = [SR, ER], where SR = start range date, ER = end range date
Process
  Bucketization: break the domain and range into smaller ones
    D = {D1, D2, …, Dn}, R = {R1, R2, …, Rm}, n <= m
    Sub-ranges don't have to be consecutive
  Mapping buckets: use a pseudorandom function to deterministically map each domain bucket to a range bucket, Di → Rj
Example: domain = January 2009, range = 2009
  D = {D1, D2, D3}, R = {R1, …, R10}
  D1 = [1/1/2009, 1/10/2009], …; R1 = [1/1/2009, 2/15/2009], …
  D1 → R4, D2 → R10, D3 → R1
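A Python sketch of the bucketized variant, under assumptions: plaintexts and ciphertexts are integers (for example the 30-minute slot numbers of the previous step), buckets are equal-width, and the keyed pseudorandom function is approximated by seeding Python's `random.Random` with an HMAC of the key (deterministic per key but not cryptographically secure). BucketOPE and split are hypothetical names. Inside a bucket pair a random order-preserving map is sampled eagerly.

```python
import bisect
import hashlib
import hmac
import random


def split(lo: int, hi: int, parts: int) -> list[tuple[int, int]]:
    """Split the integer interval [lo, hi] into `parts` contiguous buckets."""
    size = hi - lo + 1
    bounds = [lo + size * i // parts for i in range(parts + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(parts)]


class BucketOPE:
    """Order is preserved inside each bucket only; buckets are mapped by a keyed PRNG."""

    def __init__(self, key: bytes, domain: tuple[int, int], codomain: tuple[int, int],
                 n: int, m: int):
        if n > m:
            raise ValueError("need n <= m")
        self.d_buckets = split(*domain, n)
        self.r_buckets = split(*codomain, m)
        # Seed derived from HMAC(key): the same key always gives the same mapping.
        seed = int.from_bytes(hmac.new(key, b"bucket-map", hashlib.sha256).digest(), "big")
        prng = random.Random(seed)
        self.assign = prng.sample(range(m), n)  # Di -> R_assign[i], injective, not monotone
        self.tables = []
        for i, (dlo, dhi) in enumerate(self.d_buckets):
            rlo, rhi = self.r_buckets[self.assign[i]]
            if rhi - rlo < dhi - dlo:
                raise ValueError("range bucket smaller than domain bucket")
            # Random order-preserving map from bucket Di into its range bucket.
            self.tables.append(sorted(prng.sample(range(rlo, rhi + 1), dhi - dlo + 1)))

    def encrypt(self, x: int) -> int:
        for i, (dlo, dhi) in enumerate(self.d_buckets):
            if dlo <= x <= dhi:
                return self.tables[i][x - dlo]
        raise ValueError("plaintext outside the domain")

    def decrypt(self, c: int) -> int:
        for j, (rlo, rhi) in enumerate(self.r_buckets):
            if rlo <= c <= rhi and j in self.assign:
                i = self.assign.index(j)
                pos = bisect.bisect_left(self.tables[i], c)
                if pos < len(self.tables[i]) and self.tables[i][pos] == c:
                    return self.d_buckets[i][0] + pos
        raise ValueError("not a valid ciphertext")
```

As in the example (D1 → R4, D3 → R1), order across buckets is deliberately not preserved, so a range query that spans several domain buckets would have to be split into one ciphertext range per bucket.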
19
Technique 3: Bucketization Approach
Relation: R = (V, F), where V is a set of values sorted in increasing order and F is the set of corresponding frequencies of V in R
Domain: D = {V1, V2, …, Vn}, Vi < Vj for all i < j
Buckets: divide D into k blocks, B = {B1, B2, …, Bk}, |Bi| = |D|/k
Codes: used to represent buckets; set of codes C = {C1, C2, …, Cl}
Mapping buckets to codes
  Requirement: each bucket is mapped to between 1 and l codes
  Mapping: C(Bi) = {Ci, …, Cj} (in increasing order)
  Number of mappings per bucket: NM(Bi) = C(l,1) + C(l,2) + … + C(l,l) = 2^l - 1 = N
  Number of possible mappings for all buckets: N^k
Bucketization scheme: select one mapping out of the N^k; goal: maximizing privacy
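The counting argument can be checked numerically in Python with `math.comb`; the function names are illustrative.

```python
from math import comb


def mappings_per_bucket(l: int) -> int:
    """NM(Bi): number of non-empty subsets of the l codes, C(l,1) + ... + C(l,l)."""
    return sum(comb(l, i) for i in range(1, l + 1))


def mappings_for_all_buckets(l: int, k: int) -> int:
    """N^k: each of the k buckets independently gets one of the N code subsets."""
    return mappings_per_bucket(l) ** k
```

For l = 3 codes there are 7 = 2^3 - 1 subsets per bucket and, with k = 2 buckets, 49 candidate mappings; with l = 8 and k = 10 the count is already 255^10 (more than 10^24), so the mapping has to be selected by a rule rather than by enumerating all of them.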
20
Technique 3: Bucketization: Choosing a Mapping
The mapping scheme needs to enforce the privacy definition
Operations on the scheme
  Insertion (encryption)
    Convert data Wi to bucket ID Bi: Bi(Wi)
    Map bucket ID Bi to the corresponding code IDs
    Result: Wi → Bi → {Ci, …, Cj}, of size q
  Retrieval/query (decryption)
    Find bucket Bi for the data Wi
    Generate the q codes for Bi
    Search for and retrieve all data stored under those codes
    Filter out the false positives
  Range query
    Find all the buckets in the data range
    Generate a query for each bucket
    OR the results of the queries after filtering them
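The insertion, point-query and range-query operations can be sketched together in Python. Several points are assumptions: each bucket gets q codes drawn by a seeded PRNG (a real scheme would derive them from a secret key and choose them to meet the privacy definition), each inserted value is stored under one of its bucket's codes chosen at random, and stored values are kept in the clear as stand-ins for their ciphertexts so the client-side filter can be shown. CodeBucketIndex is a hypothetical name.

```python
import bisect
import random


class CodeBucketIndex:
    """Toy bucketization index: values -> buckets -> q codes, with client-side filtering."""

    def __init__(self, domain: list[int], k: int, codes: list[str], q: int, seed: int):
        values = sorted(domain)
        # First value of each of the k (roughly) equal blocks of the sorted domain.
        self.starts = [values[len(values) * i // k] for i in range(k)]
        rng = random.Random(seed)  # a real scheme would derive this from a secret key
        self.code_map = [sorted(rng.sample(codes, q)) for _ in range(k)]  # C(Bi), size q
        self.server = []  # what the server stores: (code, ciphertext stand-in)

    def bucket_of(self, v: int) -> int:
        return max(0, bisect.bisect_right(self.starts, v) - 1)

    def insert(self, v: int) -> None:
        """Insertion: Wi -> Bi -> one of the q codes of Bi (chosen at random here)."""
        code = random.choice(self.code_map[self.bucket_of(v)])
        self.server.append((code, v))  # v stands in for E(v)

    def _fetch(self, bucket: int) -> list[int]:
        """Server side: everything stored under any of the bucket's codes."""
        codes = set(self.code_map[bucket])
        return [v for code, v in self.server if code in codes]

    def query(self, v: int) -> list[int]:
        """Point query; the equality check is the client's false-positive filter."""
        return [x for x in self._fetch(self.bucket_of(v)) if x == v]

    def range_query(self, lo: int, hi: int) -> list[int]:
        """One query per bucket in [lo, hi]; the filtered results are OR-ed together."""
        result = []
        for b in range(self.bucket_of(lo), self.bucket_of(hi) + 1):
            # Keep in-range values that really belong to bucket b (buckets may share codes).
            result += [x for x in self._fetch(b) if lo <= x <= hi and self.bucket_of(x) == b]
        return sorted(result)
```

The `bucket_of(x) == b` check keeps a value from being counted twice when two buckets happen to share a code.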
21
Implementation Status
Implemented features
  HTTP proxy server
  HTTP parser
  Operations: create, modify, query events
  Two cryptographic algorithms: KDE, OPE
Remaining features
  Sharing data
  Policy management
  Service adapter
  Mobile access
  More encryption algorithms: bucketization
22
Questions?
23
Data Storage Model
Service provider storage
  Client application: embeds application-specific queries in HTTP request messages, both for storing and for retrieving data
  Server: uses HTTP response messages to respond to application requests
HTTP request messages
  Request message: <request line, headers, empty line, body (optional)>
  Methods: HEAD, GET, POST, PUT, DELETE, TRACE, OPTIONS, CONNECT
  Data: attribute-value pairs (attribute=value)
  Sources: query string (request-line URL), data string (POST body), cookie string (HTTP Cookie header)
HTTP response messages
  Response message: <status line, response header fields, content body>
  Data: plaintext (content body)
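A Python sketch of how a proxy could collect the attribute=value pairs from the three request sources, using the standard `urllib.parse` and `http.cookies` modules. It assumes a form-encoded (`application/x-www-form-urlencoded`) POST body; the parameter names in the usage below are hypothetical and are not Google Calendar's actual protocol.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlsplit


def extract_pairs(target: str, body: str = "", cookie_header: str = "") -> dict[str, list[tuple[str, str]]]:
    """Collect attribute=value pairs from the query string, the POST body and the Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {
        # query string of the request-line URL; '+' and %XX escapes are decoded
        "query": parse_qsl(urlsplit(target).query, keep_blank_values=True),
        # data string of a form-encoded POST body (assumed encoding)
        "body": parse_qsl(body, keep_blank_values=True),
        "cookie": [(name, morsel.value) for name, morsel in cookie.items()],
    }
```

For a hypothetical request `/calendar/event?action=create&text=Meeting+with+Bob`, the query source yields `("text", "Meeting with Bob")`, which is the kind of value the middleware would encrypt before forwarding the request.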
24
Data Model: Representation
Objects: hierarchies
Representation: XML tree
  Data (attribute/value): resides at the leaf nodes (shown as rectangles in the figure)
  Metadata: internal nodes only
[Figure: example XML trees, File-Oriented and Event-Oriented]
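A Python sketch of the leaf/internal distinction on a small event-oriented tree. The XML layout (calendar, event, what, when, start, end) is an assumption made for illustration; the traversal yields one (path, value) pair per leaf, while internal nodes contribute only their tag to the path and their attributes (owner, id) act as metadata.

```python
import xml.etree.ElementTree as ET

# Hypothetical event-oriented object hierarchy
EVENT_XML = """<calendar owner="bob">
  <event id="Event1">
    <what>Meeting with Bob</what>
    <when><start>2009-01-01T10:00</start><end>2009-01-01T11:00</end></when>
  </event>
</calendar>"""


def leaf_pairs(elem: ET.Element, path: str = ""):
    """Yield (path, value) for leaf nodes; internal nodes only extend the path."""
    here = f"{path}/{elem.tag}"
    children = list(elem)
    if not children:
        yield here, (elem.text or "").strip()
    for child in children:
        yield from leaf_pairs(child, here)
```

`list(leaf_pairs(ET.fromstring(EVENT_XML)))` gives the attribute/value pairs that the data protection model would encrypt, keyed by their position in the hierarchy.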
25
Services: Query Model
Simple query (structure or content)
  Q := set of words = {w1, w2, …, wn}, where wi is a word
  Data types: Number, String, Date
  Operations: AND, OR, NOT, EXACT
Complex query (content)
  General
    Q = set of attribute/predicate pairs = {<a1, p1>, …, <ak, pk>}, where ai is the attribute and pi is the predicate
    Data types: Number, String, Date
    Operations: AND, OR, NOT, EXACT
  Range query
    Q = set of attribute/predicate pairs = {<a1, p1>, …, <ak, pk>} containing at least one range
    Range: defined by two pairs <ai, pl>, <ai, ph>, where pl = lower bound and ph = upper bound
    Range data types: Number, Date
    Non-range data types: any
    Non-range operations: AND, OR, NOT, EXACT
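A Python sketch of the complex and range query semantics, evaluated on a plaintext object (the view the client has after decryption). Representing a predicate as an (attribute, operator, value) triple is an assumption; only AND, EXACT and the two range bounds are implemented, while OR and NOT are omitted.

```python
import operator
from datetime import date

# EXACT plus the two comparisons needed for a range's lower and upper bound
OPS = {"EXACT": operator.eq, ">=": operator.ge, "<=": operator.le}


def matches(obj: dict, query: list[tuple[str, str, object]]) -> bool:
    """AND of (attribute, operator, value) predicates; OR and NOT are omitted."""
    return all(attr in obj and OPS[op](obj[attr], value) for attr, op, value in query)


def range_predicates(attr: str, low: object, high: object) -> list[tuple[str, str, object]]:
    """A range on one attribute: the pairs <ai, pl> and <ai, ph>."""
    return [(attr, ">=", low), (attr, "<=", high)]
```

A query mixing a date range with an EXACT text predicate is just the concatenation of the two predicate lists, which mirrors the "at least one range" definition.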
26
Services: Sharing and Collaboration
Objects
  Shared based on user ID, e.g., an email address
  Can be shared at any internal level of the hierarchy, e.g., a single event or an entire calendar
  An object can be shared with multiple users, e.g., an event for a meeting can be shared by all participants
  Policies are used to set permissions, e.g., view only the object, edit the object, share the object with others
27
Sharing and Collaboration: Approach
Key management (encryption)
  Object encryption: individual or group
    Model: <OwnerID, Object, Kenc>
    Example: <Bob, Meeting 1, K1>
  Object sharing: individual or group
    Model: <Owner, Target, Object, Keys, Policies>
    Example: <Bob, Alice, April-Events, K1, P1>
  Sharing an object multiple times
    Same object and same policies
      Example: <Bob, Alice, April-Events, K1, P1>, <Bob, John, April-Events, K1, P1>
    Same object and different policies
      Example: <Bob, Alice, April-Events, K1, P1>, <Bob, John, April-Events, K2, P2>
Objective
  Minimize the number of encryption keys while enforcing the sharing policies and ensuring the confidentiality of data at the server.
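One simple way to reproduce the two sharing examples is to issue one key per (owner, object, policy) group, so that targets with the same object and the same policy share a key. The Python sketch below does only that; it is a hypothetical grouping heuristic, not a proof that the key count is minimal for hierarchical objects, and `os.urandom` merely stands in for real key generation.

```python
import os
from collections import defaultdict


def assign_keys(shares: list[tuple[str, str, str, str]]):
    """shares: (owner, target, object, policy). One key per (owner, object, policy) group."""
    keys: dict[tuple[str, str, str], bytes] = {}
    targets: dict[tuple[str, str, str], list[str]] = defaultdict(list)
    for owner, target, obj, policy in shares:
        group = (owner, obj, policy)
        if group not in keys:
            keys[group] = os.urandom(16)  # fresh random key for a new group
        targets[group].append(target)
    return keys, dict(targets)
```

With the same policy P1 for Alice and John, a single key (K1) is issued; giving John policy P2 instead yields a second key (K2), matching the examples above.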
28
Sharing and Collaboration: Approach
Objective
  Minimize the number of keys while enforcing the sharing policies and ensuring the confidentiality of data at the server
Approach
  Data: set of documents D = {D1, D2, …, Dn}
  Document: D = {O1, O2, …, Om}
  Model: Dt = {N1, N2, …, Nn} (internal nodes and leaf nodes)
  K = {K1, K2, …, Kn}
  Complete encryption: Enc(K)[D] = D* = {N1*, N2*, …, Ni*}
  Partial encryption: Enc(K)[D] = D* = {N1, N2, …, Ni} ∪ {N1*, N2*, …, Nj*}, where Ni ∈ NODES* ∪ NODES