Fighting Cyber Fraud with Hadoop

Upload: niel-dunnage

Post on 20-Aug-2015

TRANSCRIPT

Page 1: Fighting cyber fraud with hadoop

Fighting Cyber Fraud with Hadoop
Niel Dunnage
Senior Solutions Architect

Page 2: Fighting cyber fraud with hadoop

©2014 Cloudera, Inc. All rights reserved.

Summary

• Big Data is an increasingly powerful enterprise asset. This talk explores the relationship between big data and cyber security, and how we preserve privacy whilst exploiting the advantages of data collection and processing. Big Data technologies give both governments and corporations powerful tools to offer more efficient and personalized services, and their rapid adoption has created tremendous social benefits. An unwanted side effect is the rich pickings now available to those with malicious intentions. Increasingly, sophisticated cyber attackers exploit the rich array of public data to build detailed profiles of their adversaries.

Page 3: Fighting cyber fraud with hadoop

Agenda

• Data: the new oil
• Defend your data
• The security value of Big Data

Source: Grant Thornton LLP 2014 Corporate General Counsel Survey, conducted by American Lawyer Media

Page 4: Fighting cyber fraud with hadoop

Cyber Security: Data is a valuable commodity

• DDoS
• Data exfiltration
  • Confidential customer records
  • Transaction data
• Reputation attack
  • False flag
  • Fake data
• Insider threat

False flag: operations designed to deceive in such a way that the operations appear as though they are being carried out by entities, groups or nations other than those who actually planned and executed them. http://en.wikipedia.org/wiki/False_flag

@security_511 has continued to support OpSaudi, claiming further attacks on websites connected to Saudi Aramco.

The @SQLiNairb hacker has released a database dump from a US fantasy football website (http://www.fftoday.com/), claiming that it was timed to coincide with the NFL draft.

Anonymous Italy and Operation Green Rights (OpGR) have released the contents of an email account connected to an Italian steel producer, in connection with accusations of pollution against the company.

Page 5: Fighting cyber fraud with hadoop

Typical Security Layers

Type                     Example
Access                   Physical (lock and key), Virtual (firewalls, VLANs)
Authentication           Logins – verify users are who they say they are
Authorization            Permissions – verify what a user can do
Encryption at rest       Data protection for files on disk
Encryption in transport  Data protection on the wire
Auditing                 Keep track of who accessed what
Policy / Procedure       Protect against human error and social engineering

Page 6: Fighting cyber fraud with hadoop

Cloudera’s Approach to Hadoop Security

Comprehensive
• Standards-based authentication
• Centralized, granular authorization
• Native data protection
• End-to-end data audit and lineage

Compliance-Ready
• Meet compliance requirements
• HIPAA, PCI-DSS, …
• Encryption and key management

Transparent
• Security at the core
• Minimal performance impact
• Compatible with new components
• Insight with compliance

Page 7: Fighting cyber fraud with hadoop

Defense: Security Features

• Hadoop security: simplified Kerberos deployment with Cloudera Manager
• Sentry: provides unified authorization with a single policy for Hive, Impala and Search
• HDFS extended ACLs and HBase cell-level access control
• Navigator Encrypt and Key Trustee deliver compliant data security
  • Via the Gazzang acquisition
• Navigator provides a data management layer including audit, access control reviews, data classification and discovery, and lineage

Page 8: Fighting cyber fraud with hadoop

Kerberos Security

Perimeter Security
• Guarding access to the cluster itself
• Technical concepts:
  • Authentication
  • Network isolation

Kerberos
• Kerberos: a computer network authentication protocol that works on the basis of tickets to allow nodes to prove their identity to each other in a secure manner, using encryption extensively
• Messages are exchanged between:
  • Client
  • Server
  • Kerberos Key Distribution Center (KDC)
• Note that the KDC is not part of Hadoop, but most Linux distros come with the MIT Kerberos KDC
• Passwords are not sent across the network; instead, passwords are used to compute encryption keys
• Authentication status is cached (no need to send credentials with each request)
• Timestamps are essential to Kerberos (make sure system clocks are synchronized!)
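
Two of the Kerberos points above, keys computed from passwords and the reliance on synchronized clocks, can be illustrated with a small Python sketch. It is not the Kerberos protocol: it keeps only a PBKDF2 step (the AES encryption types build on PBKDF2 before further derivation), and it uses an HMAC tag as a stand-in for the encrypted timestamp. The principal, password, salt and the 300-second tolerance are illustrative assumptions.

```python
import hashlib
import hmac

MAX_CLOCK_SKEW_SECONDS = 300  # assumption: a commonly used default tolerance


def derive_key(password, salt, iterations=4096):
    """Turn a password into a 256-bit key; the password itself is never transmitted."""
    return hashlib.pbkdf2_hmac("sha1", password.encode(), salt.encode(),
                               iterations, dklen=32)


def make_authenticator(key, principal, client_time):
    """Client side: bind the principal to a timestamp using the derived key."""
    message = "{}|{:.0f}".format(principal, client_time).encode()
    return principal, client_time, hmac.new(key, message, hashlib.sha256).digest()


def verify_authenticator(key, authenticator, server_time):
    """Server side: reject stale timestamps first, then check the keyed tag."""
    principal, client_time, tag = authenticator
    if abs(server_time - client_time) > MAX_CLOCK_SKEW_SECONDS:
        return False  # clocks too far apart: this is why time sync matters
    message = "{}|{:.0f}".format(principal, client_time).encode()
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


if __name__ == "__main__":
    key = derive_key("s3cret", "EXAMPLE.COMalice")        # hypothetical credentials
    auth = make_authenticator(key, "alice@EXAMPLE.COM", 1000.0)
    print(verify_authenticator(key, auth, 1100.0))        # within the skew window
    print(verify_authenticator(key, auth, 1000.0 + 301))  # outside the skew window
```

A request whose timestamp falls outside the tolerance is rejected even when the key is correct, which is the practical reason for running NTP on every node of a secured cluster.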

Page 9: Fighting cyber fraud with hadoop

Apache Sentry

Access Security
• Access: defining what users and applications can do with data
• Technical concepts:
  • Permissions
  • Authorization

Sentry
• Sentry provides unified authorization across multiple access paths
• A single authorization policy is enforced for Impala, Hive and Search
• Role-based access at Server, Database, Table or View granularity
• Multi-tenant: separate policies for each database / schema
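
The role-based model described above can be sketched in a few lines of Python. This is only an illustration of the idea (groups map to roles, roles carry privileges scoped to a server, database or table); it is not Sentry's policy format or API, and every group, role and object name below is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Privilege:
    server: str
    database: str = "*"    # "*" means every database on the server
    table: str = "*"       # "*" means every table in the database
    action: str = "select"  # "all" grants every action


# Hypothetical policy: one role scoped to a table, one to a whole database.
ROLE_PRIVILEGES = {
    "analyst": [Privilege("server1", "fraud", "transactions", "select")],
    "fraud_admin": [Privilege("server1", "fraud", action="all")],
}
GROUP_ROLES = {
    "fraud_analysts": ["analyst"],
    "fraud_admins": ["fraud_admin"],
}


def is_allowed(group, server, database, table, action):
    """Return True if any role held by the group grants the requested action."""
    for role in GROUP_ROLES.get(group, ()):
        for p in ROLE_PRIVILEGES.get(role, ()):
            if (p.server == server
                    and p.database in ("*", database)
                    and p.table in ("*", table)
                    and p.action in ("all", action)):
                return True
    return False  # default deny


if __name__ == "__main__":
    print(is_allowed("fraud_analysts", "server1", "fraud", "transactions", "select"))
    print(is_allowed("fraud_analysts", "server1", "fraud", "transactions", "insert"))
```

Because the same check runs regardless of which engine issues the query, one policy can cover several access paths, which is the point of the "single authorization policy" bullet.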

Page 10: Fighting cyber fraud with hadoop

Cloudera Navigator

Visibility
• Visibility: reporting on where data came from and how it’s being used
• Technical concepts:
  • Auditing
  • Lineage

Cloudera Navigator
• Auditing and access management
  • View, grant and revoke permissions across the Hadoop stack
  • Identify access to a data asset around the time of a security breach
  • Generate an alert when a restricted data asset is accessed
• Lineage
  • Given a data set, trace back to the original source
  • Understand the downstream impact of purging/modifying a data set
• Metadata tagging and discovery
  • Search through metadata to find data sets of interest
  • Given a data set, view schema, metadata and policies
• Lifecycle management
  • Automate periodic ingestion of data
  • Compress/encrypt a data set at rest
  • Purge a data set / replicate a data set to a remote site
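
The two lineage questions above (where did a data set come from, and what is affected if it changes) are both graph traversals. The following Python sketch shows the idea on a hand-written lineage graph; the data set names are hypothetical, and this is not how Navigator stores or exposes lineage.

```python
from collections import deque

# Hypothetical lineage edges: data set -> data sets derived directly from it.
DERIVED = {
    "raw_weblogs": ["sessions"],
    "raw_transactions": ["daily_totals"],
    "sessions": ["fraud_features"],
    "daily_totals": ["fraud_features"],
    "fraud_features": ["fraud_report"],
}


def reachable(start, edges):
    """Breadth-first search: every node reachable from start, excluding start."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(edges):
    """Flip the graph so that each data set points at its direct parents."""
    parents = {}
    for parent, children in edges.items():
        for child in children:
            parents.setdefault(child, []).append(parent)
    return parents


def downstream_impact(dataset, edges):
    """Everything that would be affected by purging or modifying dataset."""
    return reachable(dataset, edges)


def original_sources(dataset, edges):
    """Trace back to the data sets that have no parents of their own."""
    parents = invert(edges)
    candidates = reachable(dataset, parents) | {dataset}
    return {d for d in candidates if d not in parents}


if __name__ == "__main__":
    print(sorted(original_sources("fraud_report", DERIVED)))
    print(sorted(downstream_impact("sessions", DERIVED)))
```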

Page 11: Fighting cyber fraud with hadoop

Page 12: Fighting cyber fraud with hadoop

©Gazzang gazzang.com/products/cloudencrypt-for-aws

Encryption at rest

[Diagram: a Linux server / VM running the Encrypt client (Linux file, directory; AES-256 encryption; process-based ACLs), GPG, and a Linux server / VM running the Key Trustee Server]

Navigator Encrypt and Key Trustee
• Encrypt any file or directory
  • AES-256 encryption
• Unique access controls
  • Process based, NOT users / groups
• 100% transparent
• Separation of duties
• Key management
  • AES encryption keys stored on a separate Key Trustee server
  • Key manager breach, data is safe
  • Data server breach, data is safe

Page 13: Fighting cyber fraud with hadoop

Our Design Strategy: The Enterprise Data Hub

• One pool of data
• One metadata model
• One security framework
• One set of system resources

[Diagram: A fully integrated Hadoop ecosystem]
• Engines
  • Batch processing: Spark, MapReduce, Hive & Pig
  • Stream processing: Spark Streaming
  • Interactive SQL: Cloudera Impala
  • Interactive search: Cloudera Search
  • Machine learning: Spark MLlib, Mahout, Oryx
  • Math & statistics: SAS, R
• Resource management: YARN
• Storage: HDFS (text, RCFile, Parquet, Avro, etc.), HBase / Accumulo (records)
• Integration: REST (WebHDFS), File (FUSE), Flume, Sqoop
• Metadata: Navigator
• Security: Navigator, Sentry

graph.vertices.filter { case (id, _) => id == 13669222 }.collect

SELECT CPU_Met FROM application WHERE (USAGE > 1000) LEFT OUTER JOIN ON application_ID WHERE application_type IS Non_Critical

Page 14: Fighting cyber fraud with hadoop

Enterprise Data Hub Use Cases

Operational Efficiency: perform existing workloads faster, cheaper, better
Innovation and Advantage: ask bigger questions in the pursuit of discovering something incredible

©2013 Cloudera, Inc. All Rights Reserved.

• ETL Acceleration
• EDW Optimization
• Active Archive
• OSINT Analysis
• Fraud Detection
• Deep Exploratory BI
• Historical Compliance
• Log Processing
• Performance Management
• Risk Management

Page 15: Fighting cyber fraud with hadoop

Offence: Fraud Detection

Use Cases
• Distributed parallel execution with chained joins
• Historical processing at scale
• Machine learning: malware/anomaly detection, spam filters, etc.
• Combined real-time and batch predictors

Fully automated at scale

Page 16: Fighting cyber fraud with hadoop

Big Data Economics: Ask bigger questions

• Predictably process large data sets
• Linear scaling
• Robust and economic crypto security
• Creative fail-fast innovation
• Powers productivity insights

• Increasing infrastructure ROI
• Increasing business ROI
• Defeating fraudulent activity
• Evaluating risk

[Diagram: Ingest, Discover, Predict, Innovate]

Page 17: Fighting cyber fraud with hadoop

Data Ingest

• NRT ingest
  • Flume
    • Optimized to flow real-time event data into the Hadoop cluster
  • Spark Streaming for near-real-time micro-batch aggregations
  • Twitter streaming
  • Kafka
  • Log
  • API
• Bulk load
  • Sqoop for structured data
  • FUSE file system access
  • API
  • Web / Hue
• Data enrichment
  • Flume interceptors
  • Kite Morphlines module
    • Configuration-based interceptors that can enrich data, for example by extracting facets, extracting entities or applying regulatory tags

[Diagram: four clients and three agents; stages labelled collect, enrich, buffer, store]
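
The enrichment step listed above can be pictured as a function that reads an event body and attaches extracted fields and tags as headers before the event is stored. The Python sketch below shows that idea for an Apache access-log line; it is not the Flume interceptor or Morphlines API, and the regular expression, header names and the "possible-slowloris" tag are assumptions made for illustration.

```python
import re

# Assumed layout: Apache common log format (client, identity, user, time, request, status, size).
LOG_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def enrich(event_body):
    """Return the event with extracted facets and tags added as headers."""
    headers = {}
    match = LOG_PATTERN.match(event_body)
    if match:
        headers.update(match.groupdict())  # facets: client_ip, status, ...
        if headers["status"] == "408":
            headers["tag"] = "possible-slowloris"  # hypothetical tag name
    # Unparseable lines pass through untouched, with no headers added.
    return {"headers": headers, "body": event_body}


if __name__ == "__main__":
    line = '203.0.113.7 - - [20/Aug/2014:10:15:32 +0000] "-" 408 -'
    print(enrich(line)["headers"])
```

Tagging at ingest time means downstream queries can filter on a header instead of re-parsing every raw log line.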

Page 18: Fighting cyber fraud with hadoop

Near Real Time Access to Threats

• View the geographic distribution of a Slowloris DDoS attack, taken from Apache web server logs
• Help isolate unpatched servers
• Identify source of attacks

LogUtils.createStream(...)
  .filter(_.getText.contains("408 Error"))
  .countByWindow(Seconds(10))

stream.join(historicCounts).filter {
  case (word, (curCount, oldCount)) => curCount > oldCount
}
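
The logic of the streaming snippet above (count timeout errors per short window, join against historic counts, keep what exceeds the baseline) can be followed without a cluster in plain Python. The sketch below is an assumption-laden stand-in, not Spark code: events are hand-made tuples, counts are keyed by source IP so that the source of an attack can be identified, and the 10-second window mirrors the slide.

```python
from collections import Counter

WINDOW_SECONDS = 10  # same window length as the slide's Seconds(10)


def count_by_window(events, window_seconds=WINDOW_SECONDS):
    """Count HTTP 408 responses per (window start, source IP).

    events: iterable of (epoch_seconds, source_ip, status_code) tuples.
    """
    counts = Counter()
    for timestamp, source_ip, status in events:
        if status == 408:
            window_start = int(timestamp) // window_seconds * window_seconds
            counts[(window_start, source_ip)] += 1
    return counts


def suspicious_sources(window_counts, historic_counts):
    """Inner join on source IP; keep windows whose count exceeds the baseline."""
    return sorted(
        (window_start, source_ip)
        for (window_start, source_ip), current in window_counts.items()
        if source_ip in historic_counts and current > historic_counts[source_ip]
    )


if __name__ == "__main__":
    events = [  # hypothetical log events
        (100, "203.0.113.7", 408), (101, "203.0.113.7", 408),
        (103, "203.0.113.7", 408), (104, "198.51.100.2", 408),
        (105, "198.51.100.2", 200), (112, "203.0.113.7", 408),
    ]
    historic = {"203.0.113.7": 2, "198.51.100.2": 5}  # assumed per-window baselines
    print(suspicious_sources(count_by_window(events), historic))
```

As with an inner join in the streaming version, a source with no historic baseline is dropped here; a real deployment would need a policy for previously unseen addresses.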

Page 19: Fighting cyber fraud with hadoop

Machine Learning

Real-time, large-scale machine learning and predictive analytics infrastructure built on Hadoop
• Collaborative filtering and recommendation
• Classification and regression
• Clustering