achieve big data analytic platform with lambda architecture on cloud

Achieve Big Data Analytic Platform with Lambda Architecture on Cloud
SPN Infra., Trend Micro
Scott Miao & SPN infra.
9/10/2016

Upload: scott-miao

Post on 08-Jan-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Achieve big data analytic platform with lambda architecture on cloud


Page 2: Achieve big data analytic platform with lambda architecture on cloud

Who am I

• Scott Miao
• RD, SPN, Trend Micro
• Hadoop ecosystem about 6 years
• AWS for BigData about 3 years
• Expertise in HDFS/MR/HBase/AWS EMR
• @takeshimiao
• @slideshare

Page 3: Achieve big data analytic platform with lambda architecture on cloud

Agenda

• Why go on Cloud
• Common Cloud Services in Trend
• Lambda Architecture on Cloud
• Serving Layer as-a Service
• What we learned

Page 4: Achieve big data analytic platform with lambda architecture on cloud

Why go on Cloud

Page 5: Achieve big data analytic platform with lambda architecture on cloud

Data volume increases 1.5 ~ 2x every year

Growth becomes 2x

Page 6: Achieve big data analytic platform with lambda architecture on cloud

Return on Investment

• On traditional infra., we put a lot of effort into service operations

• On the Cloud, we can leverage its elasticity to automate our services

• More focus on innovation!

[Chart: Revenue and Cost (Money) plotted over Time]

Page 7: Achieve big data analytic platform with lambda architecture on cloud

Why AWS ?

Page 8: Achieve big data analytic platform with lambda architecture on cloud

AWS is a leader among IaaS platforms

https://www.gartner.com/doc/reprints?id=1-2G2O5FC&ct=150519&st=sb
Source: Gartner (May 2015)

Page 9: Achieve big data analytic platform with lambda architecture on cloud

AWS Evaluation

Cost acceptable

Functionalities satisfied

Performance satisfied

Page 10: Achieve big data analytic platform with lambda architecture on cloud

Common Cloud Services in Trend
ANALYTIC ENGINE + CLOUD STORAGE

Page 11: Achieve big data analytic platform with lambda architecture on cloud

Common Services on the Cloud

Cloud CI/CD

Common Auth

Analytic Engine

Cloud Storage

Page 12: Achieve big data analytic platform with lambda architecture on cloud

AE + CS

Analytic Engine
• Computation service for Trenders
• Based on AWS EMR
• Simple RESTful API calls
• Computing on demand
  – Short-lived
  – Long-running
• No operation effort
• Pay by computing resources

Cloud Storage
• Storage service for Trenders
• Based on AWS S3
• Simple RESTful API calls
• Share data to all in one place
• Metadata search for files
• No operation effort
• Pay by storage size used

Page 13: Achieve big data analytic platform with lambda architecture on cloud

Analytic Engine is a…
A common Big Data computation service on Cloud (AWS)

Page 14: Achieve big data analytic platform with lambda architecture on cloud

Major Features in a nutshell

[Diagram labels: AE, CS, EMR; API calls: submitJob, createCluster]

• Input from
  – cs path
  – cs metadata search
• Pig UDFs support
• Output to CS with metadata
• Visibility (UIs)
  – Cost visibility (AWS Cost Explorer)
  – Client logs (SumoLogic)
  – Cluster info. (Proxy Gateway)
• Fully HA
• Fully automated
• Auto recovery

Page 15: Achieve big data analytic platform with lambda architecture on cloud

Supported usecases
1. User creates a cluster
2. User can create multiple clusters as he/she needs
3. User submits a job to a target cluster to run
4. AE delivers the job to a secondary cluster if the target cluster is down
5. Users from a different group are not allowed to submit jobs to cluster(s)
6. Users from a different group are not allowed to delete a cluster
7. Only users from the same group are allowed to delete a cluster
8. User wants to know what their current cost is
9. User wants to troubleshoot his/her submitted job
10. User wants to observe his/her cluster status

Page 16: Achieve big data analytic platform with lambda architecture on cloud

Usecase#3 – User submits job to target cluster to run (1/4)

1. User invokes submitJob
2. Auth service checks the user's credential
3. AE knows the user name and group
4. AE matches the job and delivers it to the target cluster
5. AE pulls data from CS
6. Job runs on the target cluster
7. AE outputs the result to CS
8. AE sends a msg to the SNS topic if the user specified one

[Diagram labels: users, submitJob, AE SaaS, Auth Service, EMR, Cloud Storage; arrows numbered 1 to 8 as in the steps above]

• clusterCriteria: [['sched:adhoc', 'env:prod'], ['env:prod']]
• validUser is SPN group
• EMR cluster: group:SPN, tag: 'sched:routine', 'env:prod'
• EMR cluster: group:SPN, tag: 'sched:adhoc', 'env:prod'
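Step 4 (matching a job to a cluster by tags) can be sketched as follows. This is a minimal illustration, not AE's or Genie's actual code: the `Cluster` class, the `alive` flag, and the rule "first criterion that matches a live cluster of the same group wins" are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical view of an EMR cluster as AE might track it."""
    name: str
    group: str
    tags: frozenset
    alive: bool = True


def match_cluster(user_group, cluster_criteria, clusters):
    """Try each criterion in order; return the first live cluster that
    belongs to the user's group and carries every tag of that criterion."""
    for wanted in cluster_criteria:
        for cluster in clusters:
            if (cluster.alive
                    and cluster.group == user_group
                    and set(wanted) <= cluster.tags):
                return cluster
    return None


clusters = [
    Cluster("routine", "SPN", frozenset({"sched:routine", "env:prod"})),
    Cluster("adhoc", "SPN", frozenset({"sched:adhoc", "env:prod"})),
]
criteria = [["sched:adhoc", "env:prod"], ["env:prod"]]

chosen = match_cluster("SPN", criteria, clusters)
print(chosen.name if chosen else "no cluster matched")
```

Because the criteria are tried in order, the second, looser criterion (`env:prod` only) acts as the fallback to a secondary cluster when the preferred one is down.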

Page 17: Achieve big data analytic platform with lambda architecture on cloud

Usecase#3 – User submits job to target cluster to run (2/4)

• Sample payload of submitJob API

{
  "clusterCriterias": [
    { "tags": [ "sched:adhoc", "env:prod" ] },
    { "tags": [ "env:prod" ] }
  ],
  "commandArgs": "$inputPaths $outputPaths",
  // see below

Page 18: Achieve big data analytic platform with lambda architecture on cloud

Usecase#3 – User submits job to target cluster to run (3/4)

  // see previous
  "fileDependencies": "s3://path/to/my/main.sh,s3://path/to/my/test.pig",
  "inputPaths": [
    "cs://path/to/my/input/data"
    // or you can use metadata search for input data
    // "csq://first_entry_date:['2016-05-30T09:00:000Z','2016-05-30T09:01:000Z'}"
  ],
  "name": "SubmitJob_pig_cs_to_cs_csq",
  "outputPaths": [ "cs://path/to/my/output/result" ],
  "tags": [ "env:my-test" ],
  "notifyTo": "arn:aws:sns:us-east-1:123456789123:my-sns"
}
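A client could assemble this payload programmatically. The sketch below only builds and serializes the JSON body; the helper name `build_submit_job_payload` is hypothetical, and the AE endpoint URL and authentication are not shown in the slides, so no HTTP call is made. Note that strict JSON has no comments, so the `//` remarks in the sample must be dropped before sending.

```python
import json


def build_submit_job_payload(name, input_paths, output_paths, file_dependencies,
                             cluster_criteria, tags=(), notify_to=None):
    """Build a submitJob body using the field names from the sample payload."""
    payload = {
        "name": name,
        # One {"tags": [...]} entry per criterion, in priority order.
        "clusterCriterias": [{"tags": list(t)} for t in cluster_criteria],
        "commandArgs": "$inputPaths $outputPaths",
        # The sample passes dependencies as one comma-separated string.
        "fileDependencies": ",".join(file_dependencies),
        "inputPaths": list(input_paths),
        "outputPaths": list(output_paths),
        "tags": list(tags),
    }
    if notify_to is not None:
        payload["notifyTo"] = notify_to
    return payload


payload = build_submit_job_payload(
    name="SubmitJob_pig_cs_to_cs_csq",
    input_paths=["cs://path/to/my/input/data"],
    output_paths=["cs://path/to/my/output/result"],
    file_dependencies=["s3://path/to/my/main.sh", "s3://path/to/my/test.pig"],
    cluster_criteria=[["sched:adhoc", "env:prod"], ["env:prod"]],
    tags=["env:my-test"],
    notify_to="arn:aws:sns:us-east-1:123456789123:my-sns",
)
body = json.dumps(payload, indent=2)
print(body)
```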

Page 19: Achieve big data analytic platform with lambda architecture on cloud

Usecase#3 – User submits job to target cluster to run (4/4)

• All existing job types used on-premises are supported
  – Pure MR
  – Pig and UDFs
  – Hadoop streaming (Python, Ruby, etc.)

Page 20: Achieve big data analytic platform with lambda architecture on cloud

Usecase#8 – User wants to know what their current cost is (1/2)

• Billing & Cost Management -> Cost Explorer -> Launch Cost Explorer
• Filtered by
  – tags: "sys = ae" and "comp = emr" and "other = <your-cluster-name>"
• Group by Service

Page 21: Achieve big data analytic platform with lambda architecture on cloud


Usecase#8 – User wants to know what their current cost is (2/2) - Billing and Cost Analysis

• Attach tags to your AWS resources

Tag Key   Tag Value (sample)   Description
name      aesaas-s-11-api      *optional* for AWS cost explorer
stack     aesaas-s-11          *optional* for AWS cost explorer
service   aesaas               *optional* for AWS cost explorer
owner     spn                  *required* the bill is under whose budget
env       prod|stg|dev         *required* environment type
sys       ae                   *required* the system name
comp      api-server|emr       *required* the subcomponent name
other     spn-stg              *optional* an optional tag that is free for other usage
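The required/optional split in the tag table lends itself to a check before resources are provisioned. The following is a small sketch of such a check; the function name and the idea of validating tags in code are assumptions for illustration, not something the slides describe.

```python
# Required tag keys and allowed env values, taken from the tag table.
REQUIRED_TAGS = ("owner", "env", "sys", "comp")
ALLOWED_ENVS = {"prod", "stg", "dev"}


def tag_problems(tags):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing required tag: {key}"
                for key in REQUIRED_TAGS if not tags.get(key)]
    env = tags.get("env")
    if env and env not in ALLOWED_ENVS:
        problems.append(f"invalid env: {env}")
    return problems


good = {"owner": "spn", "env": "prod", "sys": "ae", "comp": "emr",
        "other": "spn-stg"}
bad = {"owner": "spn", "env": "test", "sys": "ae"}

print(tag_problems(good))
print(tag_problems(bad))
```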

Page 22: Achieve big data analytic platform with lambda architecture on cloud

Why we use AE instead of EMR directly?
• Abstraction
  – Avoid lock-in
  – Hide implementation details behind the scenes
• AWS EMR was not designed for long-running jobs
  – >= AMI-3.1.1: 256 ACTIVE or PENDING jobs (STEPs)
  – < AMI-3.1.1: 256 jobs in total
• Better integrated with other common services
  – Keep our hands off AWS native code
• Centralized Authentication & Authorization
  – Leverage our internal LDAP server
  – No AWS tokens for users

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/AddingStepstoaJobFlow.html
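The step limits above are one reason a layer in front of EMR is useful: it can decide whether a cluster can still take a job. A minimal sketch of that check, using only the numbers quoted on the slide (the function and its parameters are hypothetical):

```python
MAX_STEPS = 256  # limit quoted on the slide


def parse_version(version):
    """Turn '3.1.1' into (3, 1, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def can_accept_step(ami_version, active_or_pending, total_submitted):
    """AMI >= 3.1.1 limits ACTIVE or PENDING steps; older AMIs limit
    the total number of steps over the cluster's lifetime."""
    if parse_version(ami_version) >= (3, 1, 1):
        return active_or_pending < MAX_STEPS
    return total_submitted < MAX_STEPS


print(can_accept_step("3.1.1", active_or_pending=10, total_submitted=1000))
print(can_accept_step("3.0.4", active_or_pending=0, total_submitted=256))
```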

Page 23: Achieve big data analytic platform with lambda architecture on cloud

Lambda Architecture on Cloud

Page 24: Achieve big data analytic platform with lambda architecture on cloud

Next Phase

[Roadmap: Cloud Infra. -> AE-v1.0 -> AE + CS (v1.1~) -> Lambda arch.]

Page 25: Achieve big data analytic platform with lambda architecture on cloud

What is Lambda (λ) Architecture


Page 26: Achieve big data analytic platform with lambda architecture on cloud

[Diagram: Lambda architecture
• Data Ingestion feeds both the Batch Layer and the Speed Layer
• Batch Layer: Master Dataset -> Batch Processing -> Batch View
• Speed Layer: Streaming Processing -> Real-Time View
• Serving Layer: Merged View -> Data Access API
• Labels: Batch Layer as-a Service, Serving Layer as-a Service]

A data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods

https://en.wikipedia.org/wiki/Lambda_architecture
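How the serving layer combines the two views can be sketched with plain dictionaries. This assumes, for illustration only, that both views hold additive counters and that the real-time view covers only events not yet absorbed into the batch view; the slides do not specify the merge rule.

```python
def merged_view(batch_view, realtime_view):
    """Combine a precomputed batch view with the real-time increments."""
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged


# Counts computed by the last batch run over the master dataset.
batch_view = {"malware.example": 120, "phish.example": 7}
# Counts from events that arrived after that batch run.
realtime_view = {"phish.example": 2, "new.example": 1}

print(merged_view(batch_view, realtime_view))
```

When the next batch run finishes, the batch view is replaced and the corresponding real-time increments are discarded, which keeps the merge rule this simple.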

Page 27: Achieve big data analytic platform with lambda architecture on cloud

Serving Layer as-a Service
METADATA STORE

Page 28: Achieve big data analytic platform with lambda architecture on cloud

Goals
Help everyone to easily access metadata shared by several teams
• Access data in one place
• Avoid storage duplication
• Share immediately to all
• Provide unified intelligence

Common metadata storage for several services (on AWS)
• Abstract to hide infra & ops
• Customize for different needs

Page 29: Achieve big data analytic platform with lambda architecture on cloud

Usecase
• Store all threat entities in one place from the moment they are newly born
  – Every team can leverage contributions from other teams at a very early stage

Page 30: Achieve big data analytic platform with lambda architecture on cloud

Features

[Diagram: Metadata Store Service; other label: Unified Intelligence Threat Monitor]
• Random Writes
• Bulk Writes
• Sync Query
• Async Query
• Automatic Provision
• Customizable Schema

Page 31: Achieve big data analytic platform with lambda architecture on cloud

Borrow idea from Star Schema
• A schema design widely used in data warehousing

[Diagram: star schema]
• Fact table: historical data, i.e. measurements or metrics for a specific event
• Dimension tables: descriptive attributes, i.e. characteristics to describe and select the fact data

Page 32: Achieve big data analytic platform with lambda architecture on cloud

Basic Idea

• Refer to Star Schema design
  – Fact table
    • Put all records into this table (Single Source of Truth)
    • Affordable for random and bulk load of writes
    • Fast random reads by rowkey
  – Dimension table
    • Fast and flexible info. discovery
    • Get rowkey of records stored in Fact table
    • Then retrieve records by rowkey
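The two-step lookup (find rowkeys in a dimension table, then fetch records from the fact table) can be sketched in memory. The class and field names below are hypothetical; a real deployment would back the fact table with a key-value store and the dimension tables with a search or SQL engine.

```python
from collections import defaultdict


class MetadataStore:
    """Toy fact table plus one inverted index per indexed field."""

    def __init__(self, indexed_fields):
        self.fact = {}  # rowkey -> full record (single source of truth)
        self.dimension = {f: defaultdict(set) for f in indexed_fields}

    def put(self, rowkey, record):
        old = self.fact.get(rowkey)
        if old is not None:
            self._unindex(rowkey, old)  # drop stale index entries first
        self.fact[rowkey] = dict(record)
        for field, index in self.dimension.items():
            if field in record:
                index[record[field]].add(rowkey)

    def _unindex(self, rowkey, record):
        for field, index in self.dimension.items():
            if field in record:
                index[record[field]].discard(rowkey)

    def get(self, rowkey):
        """Fast random read by rowkey."""
        return self.fact.get(rowkey)

    def find(self, field, value):
        """Discover rowkeys via the dimension index, then read the facts."""
        rowkeys = sorted(self.dimension[field].get(value, ()))
        return [self.fact[key] for key in rowkeys]


store = MetadataStore(indexed_fields=["type"])
store.put("r1", {"type": "url", "value": "http://bad.example"})
store.put("r2", {"type": "file", "value": "abc123"})
print(store.find("type", "url"))
```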

Page 33: Achieve big data analytic platform with lambda architecture on cloud

Reference Implementation – Part 1
• This Star Schema concept can be fulfilled by different implementations
• A famous one is HBase + Indexer + Solr

http://www.hadoopsphere.com/2013/11/the-evolving-hbase-ecosystem.html
https://community.hortonworks.com/articles/1181/hbase-indexing-to-solr-with-hdp-search-in-hdp-23.html

Page 35: Achieve big data analytic platform with lambda architecture on cloud

Propagate data to dimension storage

[Diagram: Random Writes and Bulk Writes go to the Fact Tables (Dynamo DB); Dynamo DB Streams feed the Propagator, which uses the Schema and Propagation Rules to update the Dimension Tables (Eventually Consistent)]

Dimension Tables engines:
• Elastic Search
• MySQL (RDS)
• Dynamo DB
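The propagator's job can be sketched as a loop over change records that applies each propagation rule to its target dimension store. The record shape (`event`, `rowkey`, `item`) is a simplification invented for this example and is not the real DynamoDB Streams record format; the rule format and in-memory stores are likewise assumptions.

```python
def propagate(stream_records, rules, dimension_stores):
    """Apply every rule to every change record from the fact table."""
    for record in stream_records:
        key = record["rowkey"]
        for rule in rules:
            store = dimension_stores[rule["target"]]
            if record["event"] == "REMOVE":
                store.pop(key, None)
            else:  # INSERT or MODIFY: copy only the fields the rule names
                item = record["item"]
                store[key] = {f: item[f] for f in rule["fields"] if f in item}


# Two hypothetical dimension stores, standing in for a search engine
# and a relational database.
stores = {"search": {}, "sql": {}}
rules = [
    {"target": "search", "fields": ["type", "family"]},
    {"target": "sql", "fields": ["type"]},
]
changes = [
    {"event": "INSERT", "rowkey": "r1",
     "item": {"type": "url", "family": "x", "raw": "payload"}},
]
propagate(changes, rules, stores)
print(stores)
```

Since the dimension stores are updated only after the change record arrives, a query may briefly miss a fresh write, which is the eventual consistency noted on the slide.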

Page 36: Achieve big data analytic platform with lambda architecture on cloud
Page 39: Achieve big data analytic platform with lambda architecture on cloud

What we learned
FROM BIG DATA ON CLOUD

Page 40: Achieve big data analytic platform with lambda architecture on cloud

Pros & Cons

Aspects               IDC                                    AWS
Data Capacity         Limited by physical rack space         No limitation within a reasonable amount
Computation Capacity  Limited by physical rack space         No limitation within a reasonable amount
DevOps                Hard, due to physical machine/VM farm  Easy, because code is everything (CI/CD)
Scalability           Hard, due to physical machine/VM farm  Easy, relies on ELB and Auto Scaling groups from AWS

Page 41: Achieve big data analytic platform with lambda architecture on cloud

Pros & Cons

Aspects            IDC                                    AWS
Disaster Recovery  Hard, due to physical machine/VM farm  Easy, because code is everything
Data Location      Limited due to IDC location            Various and easy due to multiple AWS regions
Cost               Implied in Total Cost of Ownership     Acceptable cost with Cost Conscious Design

Some more details…

Page 42: Achieve big data analytic platform with lambda architecture on cloud
Page 43: Achieve big data analytic platform with lambda architecture on cloud

We Are Hiring !

Page 44: Achieve big data analytic platform with lambda architecture on cloud

Backup

Page 45: Achieve big data analytic platform with lambda architecture on cloud

AE SaaS Architecture Design

Page 46: Achieve big data analytic platform with lambda architecture on cloud

High Level Architecture Design

[Architecture diagram. Recoverable labels:
• Sites: Oregon (us-west-2), SJC1, IDC, Internet
• SPN VPC with AZa, AZb, AZc
• AE API servers, Private ELB, RDS, Eureka
• workers, Multi-AZs Auto Scaling group, Time based Auto Scaling group
• EMR, Cross-account S3 buckets, Cloud Storage, Amazon SNS
• services, peering, VPN
• Protocols: HTTPS, HTTPS/HTTP Basic
• CI slave, Splunk forwarder, Splunk]

Page 47: Achieve big data analytic platform with lambda architecture on cloud

What is Netflix Genie

• A practice from Netflix
• A Hadoop client to submit jobs to EMR
• Flexible data model design to adapt to different kinds of clusters
• Flexible job/cluster matching design (based on tags)
• Cloud characteristics built into the design
  – e.g. auto-scaling, load balancing, etc.
• Its goal is plain & simple
• We use it as an internal component

https://github.com/Netflix/genie/wiki

Page 48: Achieve big data analytic platform with lambda architecture on cloud

What is Netflix Eureka
• Is a RESTful service
• Built by Netflix
• A critical component for Genie to do load balancing and failover

[Diagram labels: Genie, API, API, API]

Page 49: Achieve big data analytic platform with lambda architecture on cloud

05/02/2023

Confidential | Copyright 2016 TrendMicro Inc. 49

AWS EMR (Elastic MapReduce)

Page 52: Achieve big data analytic platform with lambda architecture on cloud


Page 53: Achieve big data analytic platform with lambda architecture on cloud


Lessons Learned on AWS details

Page 54: Achieve big data analytic platform with lambda architecture on cloud

Different types of Auto-scaling group

Service: OpsWorks
  Provision: CloudFormation (AWS::OpsWorks::Instance.AutoScalingType)
  Deploy/Config Method: chef recipe
  Auto Scaling Group Type and Features:
  • 24/7
    – manual creation/deletion
    – configure one instance for one AZ
  • time-based
    – can specify time slot(s) in hour units, on every day or any day of the week
    – configure one instance for one AZ
  • load-based
    – can specify CPU/MEM/workload avg. based on an OPS layer
    – UP: when to increase instances
    – Down: when to decrease instances
    – No max./min. # of instances setting
    – configure one instance for one AZ

Service: EC2
  Provision: CloudFormation (AWS::AutoScaling::AutoScalingGroup, AWS::AutoScaling::LaunchConfiguration)
  Deploy/Config Method: user-data
  Features:
  • can set max./min. for # of instances
  • Multi-AZs support

Page 55: Achieve big data analytic platform with lambda architecture on cloud

ELB + Auto-Scaling Group

• ELB
  – Health Check
    • Determining the route for incoming requests
• Auto-Scaling Groups
  – Monitoring EC2 instances by CloudWatch
  – If an EC2 instance is abnormal, terminate it and start a new one
• ELB + Auto-Scaling Group
  – Auto attach/detach EC2 instance(s) to ELB if the Auto-Scaling Group launches/terminates EC2

http://docs.aws.amazon.com/autoscaling/latest/userguide/autoscaling-load-balancer.html

Page 56: Achieve big data analytic platform with lambda architecture on cloud

Auto Recovery based on Monit
• OpsWorks already uses Monit for Auto Recovery
  – Leverage the Monit on EC2
  – Have practices on-premises

https://mmonit.com/monit/

[Diagram: Auto Scaling group across AZ1 and AZ2, with an API server in each]

Auto Scaling group
• Instance check by CloudWatch
• Process check by Monit
  – No process: restart process
  – Process health check failed: terminate EC2
  – Terminate EC2 -> Auto Scaling group launches new EC2
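The process-level recovery rules can be written as a small decision function. This is a sketch of the logic only; in the setup described above the checks are performed by Monit and the replacement by the Auto Scaling group, and the names used here are hypothetical.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESTART_PROCESS = "restart process"
    TERMINATE_INSTANCE = "terminate EC2"


def recovery_action(process_running, health_check_ok):
    """Map the two process-level checks to a recovery action."""
    if not process_running:
        return Action.RESTART_PROCESS
    if not health_check_ok:
        # The Auto Scaling group then launches a replacement instance.
        return Action.TERMINATE_INSTANCE
    return Action.NONE


print(recovery_action(process_running=False, health_check_ok=False).value)
print(recovery_action(process_running=True, health_check_ok=False).value)
```

Restarting is tried first because it is cheaper than replacing the instance; termination is reserved for a process that is running but unhealthy.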