Big Data Security with Hadoop
TRANSCRIPT
![Page 1: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/1.jpg)
1
Big Data Security
Joey Echeverria | Principal Solutions Architect
[email protected] | @fwiffo
©2013 Cloudera, Inc.
![Page 2: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/2.jpg)
Big Data Security
EARLY DAYS
2
![Page 3: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/3.jpg)
Hadoop File Permissions
• Added in HADOOP-1298
• Hadoop 0.16
• Early 2008
• Authorization without authentication
• POSIX-like RWX bits
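A minimal sketch of what "POSIX-like RWX bits" with "authorization without authentication" amounts to. The `FileStatus` class and `is_allowed` function are hypothetical names for illustration, not HDFS APIs. The key point is in the comment: the check itself is sound, but the user name it checks is simply whatever the client claims.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class FileStatus:
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o640


def is_allowed(status, user, groups, wanted):
    """Select one permission class (owner, group, other) and test its bits.

    `user` and `groups` are taken on faith from the caller: this is
    authorization without authentication.
    """
    if user == status.owner:
        bits = (status.mode >> 6) & 7
    elif status.group in groups:
        bits = (status.mode >> 3) & 7
    else:
        bits = status.mode & 7
    return (bits & wanted) == wanted


report = FileStatus(owner="joe", group="analysts", mode=0o640)
print(is_allowed(report, "joe", {"analysts"}, READ | WRITE))
print(is_allowed(report, "bob", {"ops"}, READ))
```

Anyone able to claim the name `joe` passes the first check, which is why these bits guard against accidents rather than attackers.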
3
![Page 4: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/4.jpg)
MapReduce ACLs
• Added in HADOOP-3698
• Hadoop 0.19
• Late 2008
• ACLs per job queue
• Set a list of allowed users or groups per operation
• Job submission
• Job administration
• No authentication
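The per-queue, per-operation ACL idea can be sketched as follows. This is an illustration only: the queue name, the operation names, and the "comma-separated users, a space, comma-separated groups" string format are assumptions made for the example, and `parse_acl`, `permits`, and `check` are hypothetical helpers rather than Hadoop functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    users: frozenset
    groups: frozenset
    allow_all: bool = False


def parse_acl(spec):
    """Parse 'user1,user2 group1,group2'; '*' means everyone (assumed format)."""
    if spec.strip() == "*":
        return Acl(frozenset(), frozenset(), True)
    # A leading space means "no users, only groups".
    user_part, _, group_part = spec.partition(" ")
    users = frozenset(u for u in user_part.split(",") if u)
    groups = frozenset(g for g in group_part.strip().split(",") if g)
    return Acl(users, groups)


def permits(acl, user, groups):
    return acl.allow_all or user in acl.users or bool(acl.groups & set(groups))


# Hypothetical queue configuration: one ACL per operation per queue.
QUEUE_ACLS = {
    "research": {
        "submit-job": parse_acl("joe,ann analysts"),
        "administer-jobs": parse_acl(" admins"),
    },
}


def check(queue, operation, user, groups):
    """Deny by default when the queue or operation has no ACL."""
    acl = QUEUE_ACLS.get(queue, {}).get(operation)
    return acl is not None and permits(acl, user, groups)


print(check("research", "submit-job", "bob", ["analysts"]))
print(check("research", "administer-jobs", "joe", ["analysts"]))
```

As with file permissions, the user and group names are unverified at this stage, so the ACL only constrains honest clients.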
4
![Page 5: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/5.jpg)
Securing a Cluster Through a Gateway
• Hadoop cluster runs on a private network
• Gateway server dual-homed (Hadoop network and public network)
• Users SSH onto gateway
• Optionally can create an SSH proxy for jobs to be submitted from the client machine
• Provides minimum level of protection
5
![Page 6: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/6.jpg)
Big Data Security
WHY SECURITY MATTERS
6
![Page 7: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/7.jpg)
Prevent Accidental Access
• Don’t let users shoot themselves in the foot
• Main driver for early features
• Not security per se, but a critical first step
• Doesn’t require strong authentication
7
![Page 8: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/8.jpg)
Stop Malicious Users
• Early features were necessary, but not sufficient
• Security has to get real
• Hadoop runs arbitrary code
• Implicit trust doesn’t prevent the insider threat
8
![Page 9: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/9.jpg)
Co-mingle All Your Data
• Often overlooked
• Big data means getting rid of stovepipes
• Scalability and flexibility are only 50% of the problem
• Trust your data in a multi-tenant environment
• Most critical driver
9
![Page 10: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/10.jpg)
Big Data Security
AN EVOLVING STORY
10
![Page 11: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/11.jpg)
Authorization
• Files
• MapReduce/YARN job queues
• Service-level authorization
• Whitelists and blacklists of hosts and users
11
![Page 12: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/12.jpg)
Authentication
• HADOOP-4487
• Hadoop 0.22 and 0.20.205
• Late 2010
• Based on Kerberos and internal delegation tokens
• Provides strong user authentication
• Also used for service-to-service authentication
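To illustrate the delegation-token idea: a user authenticates once with Kerberos, the service issues a token whose secret is a keyed hash of the token's identifier, and later requests (for example from tasks running as that user) present the token instead of Kerberos credentials. The sketch below shows only that principle. The field names, the JSON encoding, and the use of SHA-256 are assumptions for the example, and the real protocol proves possession of the secret through a challenge-response exchange rather than sending it.

```python
import hashlib
import hmac
import json
import secrets
import time

# Known only to the issuing service (hypothetical in-memory key).
MASTER_KEY = secrets.token_bytes(32)


def issue_token(owner, renewer, lifetime_s, now=None):
    """Called after the owner has authenticated by other means (e.g. Kerberos)."""
    now = time.time() if now is None else now
    ident = json.dumps(
        {"owner": owner, "renewer": renewer, "expires": now + lifetime_s},
        sort_keys=True,
    ).encode()
    password = hmac.new(MASTER_KEY, ident, hashlib.sha256).digest()
    return ident, password


def verify_token(ident, password, now=None):
    """Return the owner if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    expected = hmac.new(MASTER_KEY, ident, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, password):
        return None
    fields = json.loads(ident)
    if now > fields["expires"]:
        return None
    return fields["owner"]


ident, password = issue_token("joe", "jobtracker", lifetime_s=60)
print(verify_token(ident, password))
```

Because the service can recompute the secret from the identifier and its own key, it needs no per-token state beyond the key to check authenticity, and a tampered identifier fails verification.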
12
2.2 High Level Use Cases
1. Applications accessing files on HDFS clusters. Non-MapReduce applications, including hadoop fs, access files stored on one or more HDFS clusters. The application should only be able to access files and services they are authorized to access. See figure 1. Variations:
(a) Access HDFS directly using HDFS protocol.
(b) Access HDFS indirectly through HDFS proxy servers via the HFTP FileSystem or HTTP get.
[Figure 1: HDFS High-level Dataflow. Diagram labels: Application, MapReduce Task, Name Node, Data Node, kerb(joe), kerb(hdfs), delg(joe), block token.]
2. Applications accessing third-party (non-Hadoop) services. Non-MapReduce applications and MapReduce tasks accessing files or operations supported by third-party services. An application should only be able to access services they are authorized to access. Examples of third-party services:
(a) Access NFS files
(b) Access ZooKeeper
3. User submitting jobs to MapReduce clusters. A user submits jobs to one or more MapReduce clusters. Jobs can only be submitted to queues the user is authorized to use. The user can disconnect after job submission and may re-connect to get job status. Jobs may need to access files stored on HDFS clusters as the user, as described in case 1. The user needs to specify the list of HDFS clusters for a job at job submission. Jobs should be able to access only those HDFS files or third-party services authorized for the submitting user. See figure 2. Variations:
(a) Job is submitted via JobClient protocol
(b) Job is submitted via Web Services protocol (Phase 2)
![Page 13: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/13.jpg)
Encryption
• Over the wire encryption for some socket connections
• RPC encryption added soon after Kerberos
• Shuffle encryption (HTTPS) added in Hadoop 2.0.2-alpha, back ported to CDH4 MR1
• HDFS block streamer encryption added in Hadoop 2.0.2-alpha
• Volume-level encryption for data at rest
13
![Page 14: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/14.jpg)
Big Data Security
SECURITY FOR KEY VALUE STORES
14
![Page 15: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/15.jpg)
Apache Accumulo
• Robust, scalable, high performance data storage and retrieval system
• Built by NSA, now an Apache project
• Based on Google’s BigTable
• Built on top of HDFS, ZooKeeper and Thrift
• Iterators for server-side extensions
• Cell labels for flexible security models
15
![Page 16: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/16.jpg)
Data Model
• Multi-dimensional, persistent, sorted map
• Key/Value store with a twist
• A single primary key (Row ID)
• Secondary key (Column) internal to a row
• Family
• Qualifier
• Per-cell timestamp
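The "sorted map" with a compound key can be sketched in a few lines. This is an in-memory toy, not Accumulo's client API: `SortedCellMap` and its methods are hypothetical names. It also assumes something the slide does not state, namely that timestamps sort in descending order so the newest version of a cell comes first.

```python
import bisect


class SortedCellMap:
    """Toy sorted map keyed by (row, family, qualifier, timestamp)."""

    def __init__(self):
        self._keys = []    # kept sorted at all times
        self._values = {}

    @staticmethod
    def _sort_key(row, family, qualifier, timestamp):
        # Negate the timestamp so newer versions sort first (assumption).
        return (row, family, qualifier, -timestamp)

    def put(self, row, family, qualifier, timestamp, value):
        key = self._sort_key(row, family, qualifier, timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_row(self, row):
        """Yield (row, family, qualifier, timestamp, value) for one row, in key order."""
        start = bisect.bisect_left(self._keys, (row,))
        for key in self._keys[start:]:
            if key[0] != row:
                break
            r, family, qualifier, neg_ts = key
            yield (r, family, qualifier, -neg_ts, self._values[key])


cells = SortedCellMap()
cells.put("row1", "attr", "name", 1, "old")
cells.put("row1", "attr", "name", 2, "new")
cells.put("row1", "attr", "age", 1, "30")
for cell in cells.scan_row("row1"):
    print(cell)
```

Because the row ID is the leading component of the key, all cells of a row are contiguous, and a row scan is a single range read.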
16
![Page 17: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/17.jpg)
Cell-Level Security
• Labels stored per cell
• Labels consist of Boolean expressions (AND, OR, nesting)
• Labels associated with each user
• Cell labels checked against user’s labels with a built-in iterator
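The check of a cell's Boolean label against a user's labels can be sketched with a small recursive-descent evaluator. The `&` / `|` / parentheses syntax, the rule that mixing `&` and `|` requires parentheses, and the treatment of an empty label as visible to everyone are assumptions about the expression format made for this example. `is_visible` is a hypothetical function, not the built-in iterator itself.

```python
import re

_TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, authorizations):
    """Evaluate a cell label expression against a user's set of labels."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: an unlabeled cell is visible to everyone
    auths = set(authorizations)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    def parse_expr():
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()  # always parse, then combine
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


print(is_visible("admin|(audit&finance)", {"audit", "finance"}))
print(is_visible("admin|(audit&finance)", {"audit"}))
```

A scan then returns only the cells for which this check passes, so two users reading the same row may see different columns.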
17
![Page 18: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/18.jpg)
Pluggable Authentication
• Currently supports username/password authentication backed by ZooKeeper
• ACCUMULO-259
• Targeted for Accumulo 1.5.0
• Authentication info replaced with generic tokens
• Supports multiple implementations (e.g. Kerberos)
18
![Page 19: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/19.jpg)
Application Level
• Accumulo often paired with application level authentication/authorization
• Accumulo users created per application
• Each application granted access level of most permitted user
• Application authenticates users, grabs user authorizations, passes user labels with requests
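The application-level pattern can be sketched as follows: the application's own Accumulo account holds the broadest set of labels, and each request is narrowed to the labels of the end user who made it. All names here (`APP_AUTHORIZATIONS`, `USER_DIRECTORY`, `scan_authorizations`) are hypothetical, and the user directory stands in for whatever system the application actually authenticates against.

```python
# Labels granted to the application's own account: those of its most permitted user.
APP_AUTHORIZATIONS = frozenset({"public", "finance", "audit"})

# Hypothetical directory mapping authenticated end users to their labels.
USER_DIRECTORY = {
    "joe": {"public", "finance"},
    "ann": {"public"},
    "eve": {"public", "topsecret"},
}


def scan_authorizations(end_user):
    """Return the labels to pass with a request made on behalf of end_user."""
    labels = USER_DIRECTORY.get(end_user)
    if labels is None:
        raise PermissionError(f"unknown user: {end_user}")
    excess = labels - APP_AUTHORIZATIONS
    if excess:
        # The application cannot pass labels its own account does not hold.
        raise PermissionError(f"labels exceed application grant: {sorted(excess)}")
    return frozenset(labels)


print(sorted(scan_authorizations("joe")))
```

The trade-off is that the data store trusts the application to pass the right labels: a bug or compromise in the application exposes everything its account can read.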
19
![Page 20: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/20.jpg)
Apache HBase
• Also based on Google’s BigTable
• Started as a Hadoop contrib project
• Supports column-level ACLs
• Kerberos for authentication
• Discussion and early prototypes of cell-level security ongoing
20
![Page 21: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/21.jpg)
Big Data Security
FUTURE
21
![Page 22: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/22.jpg)
Encryption for Data at Rest
• Need multiple levels of granularity
• Encryption keys tied to authorization labels (like Accumulo labels or HBase ACLs)
• APIs for file-level, block-level, or record-level encryption
22
![Page 23: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/23.jpg)
Hive Security
• Column-level ACLs
• Kerberos authentication
• AccessServer
23
![Page 24: Big Data Security with Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022051000/55d4faa8bb61eb3f428b460b/html5/thumbnails/24.jpg)