security bigdata

Big Data Security Top 5 Security Risks and Best Practices Jitendra Chauhan Head R&D, iViZ Security [email protected]

Upload: jitendra-chauhan

Posted on 01-Jul-2015


DESCRIPTION

Big Data is the "next" big thing in technology and business, and Hadoop is one of the most important Big Data frameworks. Hadoop is currently used by Yahoo, eBay and hundreds of other organisations. As Big Data use cases grow, the security of Big Data technologies, solutions and applications will become extremely important. In this presentation, I describe the top 5 key security challenges in developing Big Data solutions and applications.

TRANSCRIPT

Page 1: Security bigdata

Big Data Security

Top 5 Security Risks and Best Practices

Jitendra Chauhan

Head R&D, iViZ Security

[email protected]

Page 2

Agenda

• Key Insights of Big Data Architecture

• Top 5 Big Data Security Risks

• Top 5 Best Practices

Page 3

Key Insights of Big Data Architecture

Page 4

Distributed Architecture (Hadoop as an Example)

• Data Partition, Replication and Distribution

• Auto-tiering

• Move the Code
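To make "data partition, replication and distribution" concrete, here is a minimal Python sketch assuming a toy cluster: a file is cut into fixed-size blocks and each block is copied to several distinct nodes. The 128 MB block size and replication factor of 3 match the usual HDFS defaults, but the round-robin placement and the names `split_into_blocks` and `place_replicas` are hypothetical simplifications, not HDFS's actual rack-aware placement policy.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default HDFS block size in Hadoop 2.x and later
REPLICATION = 3                 # default HDFS replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block; the last block may be partial."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Toy round-robin placement: each block is stored on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        placement[block_id] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    print(len(blocks))  # 3 blocks: 128 MB, 128 MB and 44 MB
    print(place_replicas(len(blocks), ["node1", "node2", "node3", "node4"]))
```

Each replica is another copy of potentially sensitive data on another machine, which is one reason storage security becomes harder in a distributed architecture.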

Page 5

Real Time, Streaming and Continuous Computation


Integration Patterns

• Real Time

• Variety of Input Sources

• Ad hoc Queries
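A batch job computes its result once, whereas a streaming computation keeps its result up to date as each event arrives. Below is a minimal Python sketch of one such continuous computation, a sliding-window event count. The class name and window semantics are hypothetical, and it assumes events arrive in timestamp order; engines such as Storm distribute this kind of work across spouts and bolts instead of running it in a single object.

```python
from collections import deque
from typing import Deque


class SlidingWindowCounter:
    """Continuously maintained count of events seen in the last `window_s` seconds."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self.events: Deque[float] = deque()

    def add(self, timestamp: float) -> int:
        """Record one event and return the current in-window count."""
        self.events.append(timestamp)
        # Evict events that have fallen out of the window (assumes in-order timestamps).
        while self.events and self.events[0] <= timestamp - self.window_s:
            self.events.popleft()
        return len(self.events)
```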

Page 6

Parallel & Powerful Programming Framework

Example:

• 16 TB Data

• 128 MB Chunks

• 82,000 Maps
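As a rough cross-check of this example, assuming one map task per input split and binary units (1 TB = 1024^4 bytes), the number of map tasks is a ceiling division of the data size by the split size. The helper name `num_map_tasks` is hypothetical. With one 128 MB split per chunk this gives 131,072 tasks; the slide does not say how its figure of 82,000 maps was derived, so the difference presumably comes from details of that particular job.

```python
TB = 1024 ** 4
MB = 1024 ** 2


def num_map_tasks(data_bytes: int, split_bytes: int) -> int:
    """One map task per input split; a trailing partial split still needs its own task."""
    return -(-data_bytes // split_bytes)  # integer ceiling division


print(num_map_tasks(16 * TB, 128 * MB))  # 131072
```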

Java vs. SQL / PL/SQL

Frameworks:

• MapReduce

• Storm Topology (Spouts & Bolts)
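The MapReduce model listed above can be illustrated with the classic word-count example. This is a single-process Python sketch, not Hadoop's Java API: the function names are hypothetical, and on a real cluster the framework would run many mappers in parallel (one per input split) and perform the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between the two phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["big data security", "big data"]))
    # {'big': 2, 'data': 2, 'security': 1}
```

In Hadoop the map and reduce functions are user-supplied code shipped to the nodes that hold the data ("move the code, not the data"), which is why the computation itself becomes a security concern later in this deck.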

Page 7

Big Data Architecture: No Single Silver Bullet

• Hadoop is already unsuitable for many Big Data problems

• Real-time analytics

o Cloudscale, Storm

• Graph computation

o Giraph and Pregel (examples of graph computation include shortest paths and degrees of separation)

• Low-latency queries

o Dremel
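"Degrees of separation" is a shortest-path query on an unweighted graph. The sketch below uses a plain single-machine breadth-first search in Python, chosen here for illustration; Giraph and Pregel distribute this kind of computation across many workers with a vertex-centric model, which this sketch does not attempt. The function name and the example graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def degrees_of_separation(graph: Dict[str, List[str]], src: str, dst: str) -> Optional[int]:
    """Breadth-first search; returns the number of hops from src to dst, or None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour == dst:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


friends = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
}
print(degrees_of_separation(friends, "alice", "dave"))  # 3
```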

Page 8

Top 5 Security Risks

Page 9

Insecure Computation

An untrusted computation program running against sensitive info (e.g., health data) can cause:

• Information Leak

• Data Corruption

• DoS
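One basic mitigation for the leak and DoS risks above is to run submitted job code in a separate process that receives none of the platform's secrets and is killed after a time limit. The Python sketch below assumes a POSIX system and a job expressed as a Python snippet; the function name `run_untrusted` is hypothetical. This is not a sandbox: the job can still read any file the OS user can read, so a real deployment would add OS-level isolation (a dedicated user, containers, resource limits), which this sketch omits.

```python
import subprocess
import sys
from typing import Optional


def run_untrusted(code: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run a job snippet in a separate interpreter with an empty environment.

    Returns the job's stdout, or None if it exceeded the time limit.
    """
    try:
        result = subprocess.run(
            # -I: isolated mode, ignores PYTHON* env vars and the user site-packages directory
            [sys.executable, "-I", "-c", code],
            env={},               # do not pass the parent's credentials or tokens to the job
            capture_output=True,
            text=True,
            timeout=timeout_s,    # subprocess.run kills the child if it runs too long
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout
```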

Page 10

Input Validation and Filtering

• Input Validation

o What kind of data is untrusted?

o What are the untrusted data sources?

• Data Filtering

o Filter rogue or malicious data

• Challenges

o GBs or TBs of continuous data

o Signature-based data filtering has limitations

o How do you filter on the behavioral aspect of data?
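The three kinds of checks above can be combined in one filtering pass. This Python sketch assumes a hypothetical record schema, a tiny illustrative list of "bad" regex signatures and a per-batch volume threshold per source; all of these names and values are assumptions. The short signature list also shows the limitation the slide mentions: it only catches patterns someone has already written down, which is why a behavioral check (here, a crude volume limit) is added alongside it.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical schema: every record must carry these fields with these types.
SCHEMA = {"source": str, "user_id": str, "payload": str}

# Hypothetical "signature" filter: known-bad patterns, far from complete.
BAD_SIGNATURES = [
    re.compile(r"<script", re.IGNORECASE),
    re.compile(r";\s*drop\s+table", re.IGNORECASE),
]


def is_valid(record: dict) -> bool:
    """Input validation: required fields are present and have the expected types."""
    return all(isinstance(record.get(name), kind) for name, kind in SCHEMA.items())


def matches_signature(record: dict) -> bool:
    return any(p.search(record["payload"]) for p in BAD_SIGNATURES)


def filter_batch(records: List[dict], max_per_source: int) -> Tuple[List[dict], List[dict]]:
    """Split a batch into (accepted, rejected).

    A record is rejected if it fails the schema, matches a known-bad signature,
    or comes from a source that sent more than `max_per_source` records in this
    batch (a crude behavioral check).
    """
    volume = Counter(r.get("source") for r in records)
    accepted, rejected = [], []
    for r in records:
        if not is_valid(r) or matches_signature(r) or volume[r["source"]] > max_per_source:
            rejected.append(r)
        else:
            accepted.append(r)
    return accepted, rejected
```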

Page 11

Granular Access Controls

• Designed for performance, with almost no security in mind

• Security in Big Data is still an area of ongoing research

• Table-, row- or cell-level access control is missing

• Ad hoc queries pose additional challenges

• Access control is disabled by default
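Cell-level access control can be sketched by attaching security labels to each cell and filtering every read against the caller's authorizations. The Python sketch below is similar in spirit to Apache Accumulo's per-cell visibility labels, but it uses a simpler rule of its own (a user must hold all of a cell's labels) instead of boolean label expressions; the class and function names and the example data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    labels: FrozenSet[str] = frozenset()  # every label listed here is required to read the cell


def visible(cell: Cell, authorizations: FrozenSet[str]) -> bool:
    """A cell is readable only if the user holds all of its labels."""
    return cell.labels <= authorizations


def scan(cells: List[Cell], authorizations: FrozenSet[str]) -> Dict[str, Dict[str, str]]:
    """Return row -> {column: value}, silently dropping cells the user may not see."""
    result: Dict[str, Dict[str, str]] = {}
    for cell in cells:
        if visible(cell, authorizations):
            result.setdefault(cell.row, {})[cell.column] = cell.value
    return result
```

Because the check is applied inside `scan`, every query path, including ad hoc queries, goes through the same filter instead of relying on each query to restrict itself.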

Page 12

Insecure Data Storage

• With data spread across many nodes, authentication, authorization and encryption are challenging

• Auto-tiering moves cold data to less secure media

o What if the cold data is sensitive?

• Encryption of real-time data can have a performance impact

• Secure communication among nodes, middleware and end users is disabled by default
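The auto-tiering concern above can be addressed by making the tiering policy aware of data sensitivity. This Python sketch assumes three hypothetical tiers and a hypothetical 30-day "cold" threshold; real auto-tiering products have their own policy configuration. The point it shows is that age alone decides hot versus cold, while sensitivity decides whether cold data may go to the weaker tier.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HOT = "hot"                  # fast primary storage
    COLD_SECURE = "cold-secure"  # cheaper storage with encryption at rest
    COLD_CHEAP = "cold-cheap"    # cheapest archive with weaker protection (e.g. no encryption)


@dataclass
class DataSet:
    name: str
    days_since_access: int
    sensitive: bool


def choose_tier(ds: DataSet, cold_after_days: int = 30) -> Tier:
    """Age-based tiering that never moves sensitive data to the weakly protected tier."""
    if ds.days_since_access < cold_after_days:
        return Tier.HOT
    return Tier.COLD_SECURE if ds.sensitive else Tier.COLD_CHEAP
```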

Page 13

Privacy Concerns in Data Mining and Analytics

• Monetization of Big Data generally involves data mining and analytics

• Sharing of results involves multiple challenges

o Invasion of privacy

o Invasive marketing

o Unintentional disclosure of information

• Examples

o AOL released anonymized search logs, yet users could easily be identified

o Netflix faced a similar problem

Page 14

Top 5 Best Practices

• Secure your Computation Code

o Implement access control, code signing and dynamic analysis of computational code

o Have a strategy to protect data in case of untrusted code

• Implement Comprehensive Input Validation and Filtering

o Implement validation and filtering of input data from internal or external sources

o Evaluate the input validation and filtering of your Big Data solution
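The code-signing practice can be sketched as "refuse to schedule any job artifact whose signature does not verify". This Python sketch uses an HMAC-SHA256 tag with a shared secret as a stand-in for code signing; real code signing normally uses public-key signatures, so the cluster only needs the verifying key and cannot itself sign code. The function names and the key handling are hypothetical.

```python
import hashlib
import hmac


def sign_job(job_bytes: bytes, key: bytes) -> str:
    """Produce a hex HMAC-SHA256 tag over the job artifact (e.g. a JAR's bytes)."""
    return hmac.new(key, job_bytes, hashlib.sha256).hexdigest()


def verify_job(job_bytes: bytes, tag: str, key: bytes) -> bool:
    """Constant-time comparison, so the check does not leak timing information."""
    return hmac.compare_digest(sign_job(job_bytes, key), tag)


def submit_job(job_bytes: bytes, tag: str, key: bytes) -> str:
    """Hypothetical submission gate: only verified artifacts are handed to the cluster."""
    if not verify_job(job_bytes, tag, key):
        raise PermissionError("job artifact failed signature check; refusing to run it")
    return "scheduled"  # placeholder for handing the job to the scheduler
```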

Page 15

Top 5 Best Practices

• Implement Granular Access Control

o Review the role and privilege matrix

o Review permissions to execute ad hoc queries

o Enable access control

• Secure your Data Storage and Computation

o Sensitive data should be segregated

o Enable data encryption for sensitive data

o Audit administrative access on data nodes

o API security
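A role and privilege matrix with deny-by-default semantics and an audit trail can be sketched in a few lines. In this Python sketch the roles, privilege strings (including a separate privilege for running ad hoc queries) and the logger name are hypothetical examples, not taken from any particular Big Data product.

```python
import logging
from typing import Dict, Set

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("audit")

# Hypothetical role -> privilege matrix; anything not listed is denied.
ROLE_PRIVILEGES: Dict[str, Set[str]] = {
    "analyst": {"read:aggregates"},
    "data_scientist": {"read:aggregates", "read:raw", "run:adhoc_query"},
    "admin": {"read:aggregates", "read:raw", "run:adhoc_query", "admin:datanode"},
}


def is_allowed(role: str, privilege: str) -> bool:
    """Deny by default: unknown roles or privileges get no access."""
    return privilege in ROLE_PRIVILEGES.get(role, set())


def authorize(user: str, role: str, privilege: str) -> bool:
    allowed = is_allowed(role, privilege)
    # Record every decision, so administrative access on data nodes can be audited later.
    audit_log.info("user=%s role=%s privilege=%s allowed=%s", user, role, privilege, allowed)
    return allowed
```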

Page 16

Top 5 Best Practices

• Review and Implement Privacy-Preserving Data Mining and Analytics

o Analytics data should not disclose sensitive information

• Get your Big Data implementation audited
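One well-known privacy-preserving check before sharing analytics data is k-anonymity: every combination of quasi-identifier values (attributes such as a generalized ZIP code and age band that could be linked to outside data) must be shared by at least k rows. In this Python sketch the column names, example rows and k are hypothetical. k-anonymity alone is not sufficient; for example, if every row in a group has the same diagnosis, that diagnosis is still disclosed.

```python
from collections import Counter
from typing import Dict, List, Sequence


def smallest_group(rows: List[Dict[str, str]], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def is_k_anonymous(rows: List[Dict[str, str]], quasi_identifiers: Sequence[str], k: int) -> bool:
    if not rows:
        return True  # an empty release identifies no one
    return smallest_group(rows, quasi_identifiers) >= k


rows = [
    {"zip": "560*", "age": "30-39", "diagnosis": "flu"},
    {"zip": "560*", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "110*", "age": "40-49", "diagnosis": "flu"},
]
print(is_k_anonymous(rows, ["zip", "age"], k=2))  # False: the 110*/40-49 row is unique
```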

Page 17

Thank You

[email protected]

http://www.ivizsecurity.com/blog/

Page 18

Big Data Architecture: Key Insights

• Distributed Architecture & Auto-tiering

• Real Time, Streaming and Continuous Computation

• Ad hoc Queries

• Parallel and Powerful Computation Language

• Move the Code, Not the Data

• Non-Relational Data

• Variety of Input Sources

Page 19

Top 5 Security Risks

• Insecure Computation

• End Point Input Validation and Filtering

• Granular Access Control

• Insecure Data Storage and Communication

• Privacy-Preserving Data Mining and Analytics