9 – Fighting Cyber Fraud with Hadoop


Post on 13-Feb-2017

TRANSCRIPT

Page 1: 9 – Fighting Cyber Fraud with Hadoop

Fighting Cyber Fraud with Hadoop Niel Dunnage

Senior Solutions Architect

Page 2: 9 – Fighting Cyber Fraud with Hadoop

©2014 Cloudera, Inc. All rights reserved.

Summary

Big Data is an increasingly powerful enterprise asset with many potential use cases. Here we'll explore the relationship between big data and cyber security.

Page 3: 9 – Fighting Cyber Fraud with Hadoop

Quick facts

• Founded: 2008, by former employees of …
• Employees: Over 750
• Global 24x7 Support: Follow-the-sun capability; proactive and predictive support programs; dedicated support engineers; support centers in NA, Europe & Asia
• Professional Services: World-class services delivery teams worldwide
• Mission Critical: Thousands of enterprise customers rely on Cloudera: 50% of the Fortune 50; 65% of the Fortune 500; top defense & intelligence agencies
• The Largest Ecosystem: Over 1200 members of our partner program, Cloudera Connect
• Cloudera University: Over 40,000 people trained around the world
• Open Source Leaders: Cloudera employees are founders of most of the Apache Hadoop ecosystem projects, and leading contributors to all of them

Page 4: 9 – Fighting Cyber Fraud with Hadoop

Leading the way in data management powered by Hadoop

• 2008: Cloudera founded by Mike Olson, Amr Awadallah & Jeff Hammerbacher
• 2009: Hadoop creator Doug Cutting joins Cloudera
• 2009: Cloudera releases CDH, the first commercial Apache Hadoop distribution
• 2010: Cloudera Manager, the first management application for Hadoop
• 2011: Cloudera reaches 100 production customers
• 2011: Cloudera University expands to 140 countries
• 2012: Cloudera Enterprise 4, the standard for Hadoop in the enterprise
• 2012: Cloudera Connect reaches 300 partners
• 2013: Cloudera Impala, Cloudera Navigator, Cloudera Search
• 2013: Tom Reilly joins as CEO
• Over 800 partners in Cloudera Connect
• 2014: The Enterprise Data Hub launched

Product labels on the slide: CDH, Cloudera Manager, Cloudera Enterprise, Enterprise Data Hub. Tagline: Ask Bigger Questions.

Page 5: 9 – Fighting Cyber Fraud with Hadoop

Agenda

Data: The new oil

• How to scale out with unreliable workers

• How enterprises pool and share storage and computation resources

• Enterprise data governance and grown-up Hadoop security

• Deploying machine learning at scale

• Empowering creative data science

Page 6: 9 – Fighting Cyber Fraud with Hadoop

This Morning

Page 7: 9 – Fighting Cyber Fraud with Hadoop

Cyber Security: Data is a valuable commodity

• DDoS
• Data Exfiltration
  • Confidential customer records
  • Transaction data
• Reputation attack
  • False flag
  • Fake data
• Insider Threat

False flag: "Operations designed to deceive in such a way that the operations appear as though they are being carried out by entities, groups or nations other than those who actually planned and executed them" (http://en.wikipedia.org/wiki/False_flag)

@security_511 has continued to support OpSaudi, claiming further attacks on websites connected to Saudi Aramco.

The @SQLiNairb hacker has released a database dump from a US fantasy football website (http://www.fftoday.com/), claiming that it was timed to coincide with the NFL draft.

Anonymous Italy and Operation Green Rights (OpGR) have released the contents of an email account connected to an Italian steel producer, in connection to accusations of pollution against the company.

Page 8: 9 – Fighting Cyber Fraud with Hadoop

Cloudera’s Approach to Hadoop Security

Comprehensive
• Standards-based Authentication
• Centralized, Granular Authorization
• Native Data Protection
• End-to-End Data Audit and Lineage

Compliance-Ready
• Meet compliance requirements
• HIPAA, PCI-DSS, …
• Encryption and key management

Transparent
• Security at the core
• Minimal performance impact
• Compatible with new components
• Insight with compliance

Page 9: 9 – Fighting Cyber Fraud with Hadoop

Operational Efficiency: Perform existing workloads faster, cheaper, better

Innovation and Advantage: Ask bigger questions in the pursuit of discovering something incredible

©2013 Cloudera, Inc. All Rights Reserved.

Enterprise Data Hub Use Cases

• ETL Acceleration
• EDW Optimization
• Active Archive
• OSINT Analysis
• Fraud Detection
• Deep Exploratory BI
• Historical Compliance
• Log Processing
• Performance Management
• Risk Management

Page 10: 9 – Fighting Cyber Fraud with Hadoop

Our Design Strategy: The Enterprise Data Hub

• One pool of data
• One metadata model
• One security framework
• One set of system resources

A fully integrated Hadoop ecosystem (architecture diagram):

• Storage: HDFS, HBase / Accumulo; text, RCFile, Parquet, Avro, etc.; records
• Integration: REST (WebHDFS), File (FUSE), Flume, Sqoop
• Resource Management: YARN
• Metadata: Navigator
• Security: Navigator, Sentry
• Engines:
  • Batch Processing: Spark, MapReduce, Hive & Pig
  • Stream Processing: Spark Streaming
  • Interactive SQL: Cloudera Impala
  • Interactive Search: Cloudera Search
  • Machine Learning: Spark MLlib, Mahout, Oryx
  • Math & Statistics: SAS, R

graph.vertices.filter{case(id, _) => id==13669222}.collect

Select CPU_Met from application WHERE (USAGE > 1000) LEFT OUTER JOIN ON application_ID where application_type IS Non_Critical

Page 11: 9 – Fighting Cyber Fraud with Hadoop

Offence: Fraud Detection

Use Cases

• Distributed parallel execution with chained joins

• Historical processing at scale

• Machine Learning, malware/anomaly detection, spam filters etc

• Combined real time and batch predictors

Fully Automated at scale

Page 12: 9 – Fighting Cyber Fraud with Hadoop

Big Data Economics: Ask bigger questions

• Agile (2 week cycle)
• Linear scaling
• Robust and economic crypto security
• Creative fail-fast innovation
• Powers productivity insights

• Increasing infrastructure ROI
• Increasing business ROI
• Defeating fraudulent activity
• Evaluating risk

Cycle: Ingest, Discover, Predict, Innovate

Page 13: 9 – Fighting Cyber Fraud with Hadoop

Data Ingest

• NRT Ingest
  • Flume
    • Optimized to flow real-time event data into the Hadoop cluster
  • Spark Streaming for near-real-time micro-batch aggregations
    • Twitter streaming
    • Kafka
    • Log
    • API
• Bulk Load
  • Sqoop for structured data
  • FUSE file system access
  • API
  • Web / Hue
• Data Enrichment
  • Flume interceptors
  • Kite Morphlines module
    • Configuration-based interceptors that can enrich data, for example extracting facets, entity extraction, applying regulatory tags
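The enrichment step described above, a configured chain of interceptors that each add headers or tags to an event, can be sketched in plain Python. This is not the Flume or Kite Morphlines API; the `Event` type, the interceptor names and both regular expressions are hypothetical stand-ins chosen for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    """A log event: raw body plus headers that interceptors may enrich."""
    body: str
    headers: Dict[str, str] = field(default_factory=dict)


Interceptor = Callable[[Event], Event]

# Hypothetical patterns, chosen only for this illustration.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
CARD_PATTERN = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")


def extract_source_ip(event: Event) -> Event:
    """Entity extraction: copy the first IPv4-looking token into a header."""
    match = IP_PATTERN.search(event.body)
    if match:
        event.headers["source_ip"] = match.group(0)
    return event


def tag_regulated_data(event: Event) -> Event:
    """Regulatory tagging: flag events that appear to contain a card number."""
    if CARD_PATTERN.search(event.body):
        event.headers["regulatory_tag"] = "PCI"
    return event


def run_chain(event: Event, chain: List[Interceptor]) -> Event:
    """Apply each interceptor in order, as a configured pipeline would."""
    for interceptor in chain:
        event = interceptor(event)
    return event


chain = [extract_source_ip, tag_regulated_data]
enriched = run_chain(Event("10.0.0.7 paid with 4111 1111 1111 1111"), chain)
print(enriched.headers)
```

Keeping each interceptor a small function of one event is what makes the chain configuration-driven: the list can be reordered or extended without touching the ingest code.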

Diagram labels: Client (×4), Agent (×3); stages: collect, enrich, buffer, store.

Page 14: 9 – Fighting Cyber Fraud with Hadoop

Near-Real-Time Access to Threats

• View the geographic distribution of Slowloris DDOS taken from Apache web server logs

• Help isolate unpatched servers

• Identify source of attacks

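// Count "408 Error" log lines in a 10-second window
LogUtils.createStream(...)
  .filter(_.getText.contains("408 Error"))
  .countByWindow(Seconds(10))

// Keep only keys whose current count exceeds the historic count
stream.join(historicCounts).filter {
  case (word, (curCount, oldCount)) => curCount > oldCount
}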
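The slide's snippet is Spark Streaming pseudocode. The same idea, counting request timeouts in a trailing window and flagging windows that exceed a historic count, can be sketched without a cluster. This is a minimal single-process illustration, not the Spark API. It assumes events arrive in time order as `(timestamp, log_line)` pairs and that a timeout shows up as the status code ` 408 ` in an access-log line; the function name and the sample lines are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

WINDOW_SECONDS = 10  # mirrors Seconds(10) in the slide's snippet


def windows_exceeding_baseline(
    events: Iterable[Tuple[float, str]],
    historic_count: int,
) -> List[Tuple[float, int]]:
    """Return (timestamp, count) whenever the number of 408 lines seen in
    the trailing window exceeds the historic count for a window that size.

    `events` is an iterable of (unix_timestamp, log_line) in time order.
    """
    recent: Deque[float] = deque()
    alerts: List[Tuple[float, int]] = []
    for timestamp, line in events:
        if " 408 " not in line:
            continue
        recent.append(timestamp)
        # Drop timeouts that have fallen out of the trailing window.
        while recent and recent[0] <= timestamp - WINDOW_SECONDS:
            recent.popleft()
        if len(recent) > historic_count:
            alerts.append((timestamp, len(recent)))
    return alerts


# Hypothetical access-log lines: one 408 per second for 5 seconds.
sample = [(float(t), '198.51.100.9 - - "GET / HTTP/1.1" 408 0') for t in range(5)]
print(windows_exceeding_baseline(sample, historic_count=3))
```

In a real deployment the historic count would come from batch processing of older logs, which is the join against `historicCounts` in the slide.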

Page 15: 9 – Fighting Cyber Fraud with Hadoop

Machine Learning

Real-time, large-scale machine learning and predictive analytics infrastructure built on Hadoop:

• Collaborative filtering and recommendation
• Classification and regression
• Clustering

Page 16: 9 – Fighting Cyber Fraud with Hadoop

VARs and Monte Carlo Simulations

“Under reasonable circumstances, how much can you expect to lose?”

• “Monte Carlo simulation involves posing thousands or millions of random market scenarios and observing how they tend to affect a portfolio of financial instruments”
• VAR (Value at Risk) is based on time period, portfolio and confidence level
• This technique is easily parallelizable and as such is a great fit for Hadoop, and Spark in particular
• Until recently it required complex MPI C++ code
• Easily implemented in Hadoop and feasible across hierarchies of financial instruments (P&L accounts)
• Backtest to validate the VAR
• Curation of market factors is important
• Can shape portfolio investments for instruments that trial as loss making
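As a rough illustration of the technique described above, the sketch below estimates a one-day VAR by simulating many random scenarios and reading off a loss quantile. It is a minimal single-machine example under strong simplifying assumptions (independent, normally distributed daily returns per position); the function names, position sizes and volatilities are hypothetical.

```python
import random
from typing import List


def simulate_losses(
    position_values: List[float],
    daily_volatilities: List[float],
    trials: int,
    seed: int = 0,
) -> List[float]:
    """Simulate one-day portfolio losses under independent normal returns.

    Independence and normality are simplifying assumptions; a real model
    would draw correlated market factors.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(trials):
        pnl = sum(
            value * rng.gauss(0.0, vol)
            for value, vol in zip(position_values, daily_volatilities)
        )
        losses.append(-pnl)
    return losses


def value_at_risk(losses: List[float], confidence: float) -> float:
    """Return the loss that is not exceeded in `confidence` of the trials."""
    ordered = sorted(losses)
    index = min(int(confidence * len(ordered)), len(ordered) - 1)
    return ordered[index]


losses = simulate_losses([1_000_000.0, 500_000.0], [0.01, 0.02], trials=20_000)
print(f"1-day 95% VAR: {value_at_risk(losses, 0.95):,.0f}")
print(f"1-day 99% VAR: {value_at_risk(losses, 0.99):,.0f}")
```

Each trial is independent of the others, which is why the workload splits cleanly across a cluster: every worker can simulate its own share of trials and only the resulting losses need to be combined before taking the quantile.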

Page 17: 9 – Fighting Cyber Fraud with Hadoop

Applying VAR Techniques to Cyber Threat Monitoring with Hadoop

• Historical event data processing at scale

• Hadoop as a service shared with financial governance applications

• Treat £££s spent on vendors and software like instruments and portfolios?

• Anomaly detection of network traffic by learning what is normal

• Siloed applications have previously made it hard to put a tangible value on financial risk.

• Risk calculations tend towards the subjective, i.e. low (FIS APT), high (insider threat)
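The bullet on anomaly detection "by learning what is normal" can be made concrete with a very small example. The sketch below learns a baseline (mean and standard deviation) from historical traffic counts and flags values far from it. A z-score threshold is only one simple choice among many detectors and is used here as an assumption; the traffic numbers are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def learn_baseline(history: List[float]) -> Tuple[float, float]:
    """Summarise 'normal' traffic as the mean and sample standard deviation."""
    return mean(history), stdev(history)


def is_anomalous(value: float, baseline: Tuple[float, float], threshold: float = 3.0) -> bool:
    """Flag a value more than `threshold` standard deviations from the mean."""
    centre, spread = baseline
    if spread == 0:
        return value != centre
    return abs(value - centre) / spread > threshold


# Hypothetical requests-per-minute counts for one host.
normal_traffic = [98.0, 102.0, 100.0, 101.0, 99.0, 100.0, 103.0, 97.0]
baseline = learn_baseline(normal_traffic)
print(is_anomalous(101.0, baseline))
print(is_anomalous(450.0, baseline))
```

The historical processing at scale mentioned above is what supplies the baseline; the check itself is cheap enough to run on live traffic.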

Page 18: 9 – Fighting Cyber Fraud with Hadoop

Internal Threat Dashboard

Ranked List of High Risk Personnel:

Name            Risk Score
Kim Burgess     94
Guy Hughes      93
Jeff Maclaen    87
Ed Snowden      86
Mary Smith      82

Customers with Risk Scores that Recently Changed:

Name            Old Score   New Score
John Smith      34          94
Rob Jones       26          93
Jim Fisher      17          87
Henry Johnson   45          86
Sue Leefield    12          82

Overall Risk Assessment:

Risk Per Category:
• Online Banking Access
• Public Records
• Financial transaction rate
• Online Activity
• Social Media Activity
• Regular purchases
• Foreign Travel

Open Cases:

Name                        Risk Score   Customers
Dodgy Ecomm.biz             94           John Smith, Rob Jones
Brentford Shopping Centre   93           Jim Fisher, Henry Johnson

Page 19: 9 – Fighting Cyber Fraud with Hadoop

Analytics

Page 20: 9 – Fighting Cyber Fraud with Hadoop

Defense: Security Features

• Hadoop Security: Kerberos, with simplified deployment through Cloudera Manager

• Sentry: provides unified authorization with a single policy for Hive, Impala and Search

• HDFS extended ACLs and HBase cell-level access control

• Navigator Encrypt and Key Trustee deliver compliant data security
  • Via the Gazzang acquisition

• Navigator provides a data management layer including audit, access control reviews, data classification and discovery, and lineage

Page 21: 9 – Fighting Cyber Fraud with Hadoop

Kerberos Security

Perimeter Security

• Guarding access to the cluster itself
• Technical Concepts:
  • Authentication
  • Network isolation

Kerberos

• Kerberos: a computer network authentication protocol that works on the basis of tickets to allow nodes to prove their identity to each other in a secure manner, using encryption extensively
• Messages are exchanged between:
  • Client
  • Server
  • Kerberos Key Distribution Center (KDC)
  • Note: the KDC is not part of Hadoop, but most Linux distros come with the MIT Kerberos KDC
• Passwords are not sent across the network; instead, passwords are used to compute encryption keys
• Authentication status is cached (no need to send credentials with each request)
• Timestamps are essential to Kerberos (make sure system clocks are synchronized!)
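Two of the points above, that passwords are turned into keys rather than sent over the network, and that timestamps must agree, can be illustrated with the standard library alone. This is a simplified sketch and not a Kerberos implementation: real Kerberos AES encryption types apply a further key-derivation step after PBKDF2, and the five-minute skew tolerance is a commonly used default that is assumed here. The function names, realm and principal are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Assumption: 5 minutes is the commonly used default clock-skew tolerance.
MAX_CLOCK_SKEW = timedelta(minutes=5)


def derive_key(password: str, realm: str, principal: str, iterations: int = 4096) -> bytes:
    """Derive a 256-bit key from a password (simplified).

    Real Kerberos AES encryption types add a further derivation step after
    PBKDF2; this only shows that the password itself never needs to travel.
    """
    salt = (realm + principal).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha1", password.encode("utf-8"), salt, iterations, dklen=32)


def timestamp_is_fresh(client_time: datetime, server_time: datetime) -> bool:
    """Accept an authenticator only if the two clocks roughly agree."""
    return abs(server_time - client_time) <= MAX_CLOCK_SKEW


key = derive_key("correct horse", "EXAMPLE.COM", "alice")
now = datetime(2014, 1, 1, 12, 0, tzinfo=timezone.utc)
print(len(key))
print(timestamp_is_fresh(now - timedelta(minutes=2), now))
print(timestamp_is_fresh(now - timedelta(minutes=9), now))
```

The second check is why unsynchronized clocks cause authentication failures: a request stamped outside the tolerance is rejected even when the key is correct.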

Page 22: 9 – Fighting Cyber Fraud with Hadoop

Apache Sentry

Access Security: Sentry

• Sentry provides unified authorization across multiple access paths

• A single authorization policy is enforced for Impala, Hive and Search

• Role-based access at Server, Database, Table or View granularity

• Multi-tenant: separate policies for each database / schema

• Access
  • Defining what users and applications can do with data

• Technical Concepts:
  • Permissions
  • Authorization
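The role-based, hierarchical model described above can be sketched as a small policy check. This is not Sentry's API or policy file format; it is a minimal illustration, with hypothetical roles, groups and resource paths, of how one policy with server / database / table scopes can answer authorization questions for any engine.

```python
from typing import Dict, List, Set, Tuple

# A privilege is (scope, action). Scopes are hierarchical paths such as
# "server1", "server1/sales" or "server1/sales/orders".
Privilege = Tuple[str, str]

# Hypothetical roles and group mappings for illustration.
ROLE_PRIVILEGES: Dict[str, Set[Privilege]] = {
    "analyst": {("server1/sales", "select")},
    "fraud_admin": {("server1/fraud", "all")},
}
GROUP_ROLES: Dict[str, List[str]] = {
    "sales_team": ["analyst"],
    "fraud_team": ["analyst", "fraud_admin"],
}


def scope_covers(granted: str, requested: str) -> bool:
    """A grant on a server or database covers everything beneath it."""
    return requested == granted or requested.startswith(granted + "/")


def is_authorized(group: str, resource: str, action: str) -> bool:
    """Check one policy regardless of which engine issued the request."""
    for role in GROUP_ROLES.get(group, []):
        for scope, allowed in ROLE_PRIVILEGES.get(role, set()):
            if scope_covers(scope, resource) and allowed in (action, "all"):
                return True
    return False


print(is_authorized("sales_team", "server1/sales/orders", "select"))
print(is_authorized("sales_team", "server1/fraud/cases", "select"))
print(is_authorized("fraud_team", "server1/fraud/cases", "insert"))
```

Because the check takes only a group, a resource and an action, the same function could sit behind a SQL engine or a search engine, which is the point of a single unified policy.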

Page 23: 9 – Fighting Cyber Fraud with Hadoop

Cloudera Navigator

Visibility: Cloudera Navigator

• Auditing and Access Management
  • View, grant and revoke permissions across the Hadoop stack
  • Identify access to a data asset around the time of a security breach
  • Generate an alert when a restricted data asset is accessed

• Lineage
  • Given a data set, trace back to the original source
  • Understand the downstream impact of purging or modifying a data set

• Metadata Tagging and Discovery
  • Search through metadata to find data sets of interest
  • Given a data set, view schema, metadata and policies

• Lifecycle Management
  • Automate periodic ingestion of data
  • Compress or encrypt a data set at rest
  • Purge a data set, or replicate a data set to a remote site

• Visibility
  • Reporting on where data came from and how it’s being used

• Technical Concepts:
  • Auditing
  • Lineage
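The two lineage questions above, tracing a data set back to its original sources and finding what is affected downstream, are both graph traversals. The sketch below shows them over a small in-memory lineage map. It is not Navigator's API; the data set names and the `DERIVED_FROM` structure are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage edges: each data set maps to the inputs it was derived from.
DERIVED_FROM: Dict[str, List[str]] = {
    "fraud_scores": ["enriched_transactions"],
    "enriched_transactions": ["raw_transactions", "customer_records"],
    "weekly_report": ["fraud_scores"],
}


def original_sources(dataset: str) -> Set[str]:
    """Trace back to the data sets that have no recorded inputs."""
    sources: Set[str] = set()
    seen: Set[str] = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        parents = DERIVED_FROM.get(current, [])
        if not parents:
            sources.add(current)
        queue.extend(parents)
    return sources


def downstream_impact(dataset: str) -> Set[str]:
    """Find every data set that directly or indirectly depends on `dataset`."""
    impacted: Set[str] = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for child, parents in DERIVED_FROM.items():
            if current in parents and child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted


print(sorted(original_sources("weekly_report")))
print(sorted(downstream_impact("raw_transactions")))
```

Running the downstream query before purging or modifying a data set gives the list of reports and models that would need to be rebuilt or reviewed.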

Page 24: 9 – Fighting Cyber Fraud with Hadoop

Page 25: 9 – Fighting Cyber Fraud with Hadoop

©Gazzang gazzang.com/products/cloudencrypt-for-aws

Encryption at rest: Navigator Encrypt and Key Trustee

Diagram labels: Linux Server / VM running the Encrypt client (Linux file, directory; AES-256 encryption; process-based ACLs; GPG) and a separate Linux Server / VM running the Key Trustee Server.

• Encrypt any file or directory
  • AES-256 encryption
• Unique access controls
  • Process based, NOT users / groups
• 100% transparent
• Separation of duties
• Key management
  • AES encryption keys stored on a separate Key Trustee server
  • Key manager breach: data is safe
  • Data server breach: data is safe