BACK END ANALYTIC PLATFORMS

Nguyen Thi Kim Tuyen

Version 1.0, 01/2013

Upload: sentifi

Post on 15-Apr-2017

Agenda

- Targets
- Approaches
- Analytic platforms
- Map/Reduce
- GNT Game analytic system (current & testing)
- Appendix
  - Hadoop / Hadoop components
  - HBase
  - Collectors (FlumeNG/Chukwa/Scribe)

Targets

- Analytic systems
- KPI
- Monitoring (access log, error log)
- Real-time analytics / batch-processing analytical tasks for the client
- Depends on GNT infrastructure
- Scalability

Approaches

- Refer to the log platform architectures of Facebook, Twitter, ...
- Community reviews of each component
- Adapt to our needs

Analytic platform (1/10)

- Tracker
  - Action log, error log (nginx)
  - Web log (Play framework)
  - Game user activity log (event-driven logs)
  - Database log (Cassandra, Redis, commit log)
  - Page tagging / logfile analytics
- Collector
- ETL
- Analyzer
- Reporter
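
The Tracker, Collector, ETL, Analyzer, and Reporter stages can be read as a simple pipeline. The sketch below is only an illustration, assuming nginx writes its default "combined" access-log format. The function names (`track`, `collect`, `etl`, `analyze`, `report`) and the sample log lines are hypothetical and are not taken from the GNT system.

```python
import re
from collections import Counter

# Assumption: nginx "combined" log format; the real GNT log format is not shown in the slides.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def track():
    """Tracker: emit raw log lines (hard-coded samples here)."""
    return [
        '10.0.0.1 - - [15/Jan/2013:10:00:00 +0700] "GET /game/start HTTP/1.1" 200 512 "-" "Mozilla"',
        '10.0.0.2 - - [15/Jan/2013:10:00:01 +0700] "GET /game/start HTTP/1.1" 200 512 "-" "Mozilla"',
        '10.0.0.1 - - [15/Jan/2013:10:00:02 +0700] "POST /game/score HTTP/1.1" 500 0 "-" "Mozilla"',
        'garbage line that does not parse',
    ]

def collect(lines):
    """Collector: gather lines from all trackers into one stream."""
    yield from lines

def etl(lines):
    """ETL: parse each line into a dict, dropping lines that do not match."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            rec = m.groupdict()
            rec["status"] = int(rec["status"])
            yield rec

def analyze(records):
    """Analyzer: count hits per path and server errors (status >= 500) per path."""
    hits, errors = Counter(), Counter()
    for rec in records:
        hits[rec["path"]] += 1
        if rec["status"] >= 500:
            errors[rec["path"]] += 1
    return hits, errors

def report(hits, errors):
    """Reporter: render a plain-text summary, one line per path."""
    return [f"{path} hits={n} errors={errors[path]}" for path, n in sorted(hits.items())]

hits, errors = analyze(etl(collect(track())))
summary = report(hits, errors)
```

Each stage only consumes the output of the previous one, so any stage could be swapped for a real component (for example, a Flume agent as the collector) without touching the others.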

Analytic platform (2/10): Facebook

- Facebook Web -> Scribe -> Ptail -> Puma -> HBase
  http://www.slideshare.net/slrash/2011-0630hadoopsummit-v5-8469751
- => Collection layer (Flume/Scribe) → Filter layer (Flume) → Batching layer (Coprocessor)

Analytic platform (3/10): Facebook

Analytic platform (4/10): Facebook

Calligraphus-HDFS-ZK = { HDFS, ZooKeeper, HBase, Hive }

Analytic platform (5/10): Facebook

- Ptail = Parallel Tail
- Concurrent read: HDFS 2.0 adds sync, which lowers write-to-read latency
- Ptail: reads data blocks while they are still being written, with < 10 s latency

Analytic platform (6/10): Facebook, PUMA write-path

Analytic platform (7/10): Facebook, PUMA read-path

Analytic platform (8/10): Twitter

Analytic platform (9/10): Twitter

Analytic platform (10/10)

Map/Reduce (1/2)

Map/Reduce (2/2)
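
As a rough illustration of the Map/Reduce model named in these slides, here is a minimal single-process sketch of the map, shuffle, and reduce phases, counting events per user. It only shows the programming model and does not use Hadoop. The event tuples and the function names (`map_event`, `shuffle`, `reduce_counts`, `run_job`) are made up for the example.

```python
from collections import defaultdict

def map_event(event):
    """Map: turn one (user, action) event into (key, value) pairs."""
    user, _action = event
    yield user, 1

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce: fold the values of one key into a single result."""
    return key, sum(values)

def run_job(events):
    """Run map, shuffle, and reduce in sequence over an in-memory list of events."""
    mapped = (pair for event in events for pair in map_event(event))
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())

events = [("alice", "login"), ("bob", "login"), ("alice", "buy_item"), ("alice", "logout")]
counts = run_job(events)
```

In a real cluster the map and reduce functions run on different machines, and the shuffle step is the network transfer that brings all values for one key to the same reducer.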

GNT Game Analytic System (1/2): current

GNT Game Analytic System (2/2): current

- Limitations:
  - JavaScript implementation: limited to 1 JS execution per server at a time
  - Scalability: does not scale except when sharding
- Improvement: integrate Mongo + Hadoop
  http://www.slideshare.net/iammutex/the-elephant-in-the-room-mongo-db-hadoop

GNT Game Analytic System (1/4)

GNT Game Analytic System (2/4): testing FlumeNG

flume.conf : 192.168.30.183

# Components declared for agent t-game-web183
t-game-web183.sources = tail-nginx tail-play
t-game-web183.sinks = avro-sink-nginx183 avro-sink-play183
t-game-web183.channels = mem-channel-nginx183 mem-channel-play183

# Source: follow the nginx access log
t-game-web183.sources.tail-nginx.type = exec
t-game-web183.sources.tail-nginx.command = tail -F /var/log/nginx/access.log
t-game-web183.sources.tail-nginx.channels = mem-channel-nginx183

# Channel: in-memory buffer between source and sink
t-game-web183.channels.mem-channel-nginx183.type = memory

# Sink: forward events over Avro to the agent on 192.168.30.185
t-game-web183.sinks.avro-sink-nginx183.type = avro
t-game-web183.sinks.avro-sink-nginx183.hostname = 192.168.30.185
t-game-web183.sinks.avro-sink-nginx183.port = 10183
t-game-web183.sinks.avro-sink-nginx183.channel = mem-channel-nginx183

# Settings for tail-play, avro-sink-play183 and mem-channel-play183 are not shown on the slide.
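
Flume NG configs follow a fixed naming scheme: `<agent>.sources`, `<agent>.sinks`, and `<agent>.channels` declare component names, and each name is then configured under `<agent>.<kind>.<name>.*`, starting with its `type`. The helper below is a hypothetical sanity check, not part of Flume: it lists declared components that have no `type` property. Run against the excerpt for 192.168.30.183, it flags the Play components whose settings do not appear on the slide.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def missing_types(props, agent):
    """Return (kind, name) for each declared component lacking '<agent>.<kind>.<name>.type'."""
    missing = []
    for kind in ("sources", "channels", "sinks"):
        for name in props.get(f"{agent}.{kind}", "").split():
            if f"{agent}.{kind}.{name}.type" not in props:
                missing.append((kind, name))
    return missing

# Only the declaration and type lines from the slide are needed for this check.
CONF = """
t-game-web183.sources = tail-nginx tail-play
t-game-web183.sinks = avro-sink-nginx183 avro-sink-play183
t-game-web183.channels = mem-channel-nginx183 mem-channel-play183
t-game-web183.sources.tail-nginx.type = exec
t-game-web183.channels.mem-channel-nginx183.type = memory
t-game-web183.sinks.avro-sink-nginx183.type = avro
"""

gaps = missing_types(parse_properties(CONF), "t-game-web183")
```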

GNT Game Analytic System (3/4): testing FlumeNG

flume.conf : 192.168.30.185

t-game-cass185.sinks.hdfs-sink-nginx183.type = hdfs
t-game-cass185.sinks.hdfs-sink-nginx183.hdfs.path = hdfs://namenode/flume/webdata/nginx183

GNT Game Analytic System (4/4): testing Hadoop

Mixed Solutions

- Case 1: old system: Mongo + Hadoop
- Case 2: FlumeNG + Hadoop + HBase
- Case 3: batch processing: Hadoop HDFS (without FlumeNG)

GNT Game Analytic system proposal

Hadoop (1/19 to 19/19)

Hadoop components (1/2)

Hadoop components (2/2)

- Flume
- HBase
- Sqoop
- ZooKeeper
- Hue
- Oozie
- Pig
- Whirr
- Hive
- Snappy
- Mahout

HBase

- Facebook and Twitter use it. Why?
- New feature: Coprocessor

HBase

HBase data model

- Table
- Row
- Column
- Cell
- Versions
- Row-key design
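
Row-key design matters because HBase stores rows sorted by key and scans them in that order. The sketch below shows one commonly used layout, assumed here for illustration rather than taken from the slides: a small hash-based salt to spread writes across regions, the user id, and a reversed timestamp so that a user's newest events sort first. The field widths, the bucket count, and the `make_row_key` name are all assumptions.

```python
import hashlib
import struct

NUM_BUCKETS = 16      # assumption: number of salt buckets
MAX_TS = 2**63 - 1    # largest signed 64-bit value, used to reverse timestamps

def make_row_key(user_id: str, ts_millis: int) -> bytes:
    """Build salt (1 byte) + user_id + NUL + reversed timestamp (8 bytes, big-endian)."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    salt = digest[0] % NUM_BUCKETS          # same user always lands in the same bucket
    reversed_ts = MAX_TS - ts_millis        # newer events get smaller values
    return bytes([salt]) + user_id.encode("utf-8") + b"\x00" + struct.pack(">q", reversed_ts)

older = make_row_key("user42", 1_357_000_000_000)
newer = make_row_key("user42", 1_357_000_060_000)
```

Because the two keys share the same salt and user prefix, byte-wise comparison is decided by the reversed timestamp, so `newer` sorts before `older`. The trade-off of salting is that a scan over all users must fan out across every bucket.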

HBase vs Cassandra

TODO

Collectors

- Flume NG (Collector + ETL)
- Scribe
- Chukwa

Which one is suitable for us?

Flume/FlumeNG (1/10): architecture

Flume/FlumeNG (2/10): concepts

- Network stream: Avro / Syslog / Netcat
- Source / Channel / Sink
- Decorator
- Flow
- Event
- Flume agent
- Flume Avro client / log4j appender

Flume/FlumeNG (3/10)

- Flume source:
  - Avro
  - Netcat
  - Syslog (TCP/UDP)
  - Exec
  - Thrift/Avro legacy
  - Custom

Flume/FlumeNG (4/10)

- Flume sink:
  - HDFS
  - Avro
  - Logger
  - …
  - FileRoll
  - Custom
- Flume channel:
  - Memory
  - JDBC channel
  - Recoverable memory channel
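
To make the source / channel / sink vocabulary concrete, here is a toy in-process model. It is not the Flume API: the class names and methods are invented for the example, and a real channel would also provide transactions and capacity limits.

```python
from collections import deque

class MemoryChannel:
    """Toy stand-in for a memory channel: a FIFO buffer between source and sink."""
    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        """Return the oldest buffered event, or None if the channel is empty."""
        return self._events.popleft() if self._events else None

class ListSource:
    """Toy source: wraps each input line as an event (headers + body) and buffers it."""
    def __init__(self, lines, channel):
        self.lines, self.channel = lines, channel

    def run(self):
        for line in self.lines:
            self.channel.put({"headers": {}, "body": line})

class CollectingSink:
    """Toy sink: drains the channel and records the bodies it delivered."""
    def __init__(self, channel):
        self.channel, self.delivered = channel, []

    def run(self):
        while (event := self.channel.take()) is not None:
            self.delivered.append(event["body"])

channel = MemoryChannel()
ListSource(["GET /a", "GET /b"], channel).run()
sink = CollectingSink(channel)
sink.run()
```

The source and sink never reference each other, only the channel. That decoupling is what lets the agent pair any source type with any sink type from the lists above.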

Flume/FlumeNG (5/10): consolidation

Flume/FlumeNG (6/10): multiplexing

Flume/FlumeNG (7/10): reliability & failure handling

Flume/FlumeNG (8/10): failure handling

Flume/FlumeNG (9/10)

Flume/FlumeNG (10/10): plugin (decorator)

TODO: Flume with HBase sink
https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/5ee135ad0e720ea9/c5bffc83f97fdd3c?hl=vi&lnk=gst&q=flume-ng#c5bffc83f97fdd3c

Chukwa (1/2)

Chukwa (2/2)

Flume vs Scribe

TODO

Future issues

- Manage analytic jobs
- Message queue: Kafka, ZeroMQ
- Monitoring: memory, Flume agent, Hadoop cluster, ...
- Scalability
