wisdom from crowds of machines - snia from crowds of machines analytics and big data summit...

Download Wisdom from Crowds of Machines - SNIA  from Crowds of Machines Analytics and Big Data Summit September 19, 2013 Chetan Conikee Irfan Ahmad Wednesday, September 18, 13

Post on 25-Apr-2018

215 views

Category:

Documents

3 download

Embed Size (px)

TRANSCRIPT

  • Wisdom from Crowds of MachinesAnalytics and Big Data Summit

    September 19, 2013

    Chetan Conikee Irfan Ahmad

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    About UsCloudPhysics' mission is to discover the underlying principles that govern systems behavior and use this knowledge to transform computing.

    Chetan ConikeeChief Data Officer

    Founder, SeekVence Data Architect - Intuit Founding Engineer - Business Signatures

    (Entrust), CashEdge (Fiserv), Smartpipes (Sophos)

    Irfan AhmadChief Technology Officer

    VMware tech lead - DRS (Distributed Resource Scheduling) Team

    Transmeta (microprocessors)

    Wednesday, September 18, 13

  • Modeling Rules Engines

    Card Card Card Card CardApps

    ParsersAnalyzersParsers

    HTTPS

    CloudPhysics, Inc. Copyright 2013. All rights reserved.

    HTTPSHigh-resolutionperformance,config, fault,change, event,task data

    Perf StatDatastore

    InventoryDatastore

    Event, TaskDatastore

    Secure multi-tenant architecture

    Schema Normalization

    Web/App Servers

    PredictiveAnalytics Benchmarks

    Scale Out Query Engine

    www.CloudPhysics.com

    Collector vApp

    Scalable Cache

    Ingest

    Collective Intelligence

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    We collect

    100 billion+ data points /day Avg 1.3 million properties/datacenter Continuous time-series data stream VMs, servers, networks, storage Configuration, state, tasks, events

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    Our dialect is

    ProductiveFastFunctionalStatically typed

    Actors - an elegant way to handle concurrency

    ... andinter-op Java with Scala

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    Our Software ToolSet

    Apache Kafka

    Wednesday, September 18, 13

    http://kafka.apache.org/http://kafka.apache.org/

  • CloudPhysics Copyright 2013. All Rights Reserved.

    Data Pipeline Criteria

    Secure Data Retention NO data-loss guarantee Scale out consumption process on demand Query near-to-real-time Query historical data on demand (replay)

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    Secure Data Retention

    Data organized and partitioned by time Stored directly in highly scalable, distributed

    object storage

    Encryption for data at rest - AES-256 Processing pipeline starts from object

    storage

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    How do we scale?

    courtesy: http://www.capricemotorinn.com/Dam1_.html

    Wednesday, September 18, 13

    http://www.capricemotorinn.com/Dam1_.htmlhttp://www.capricemotorinn.com/Dam1_.html

  • CloudPhysics Copyright 2013. All Rights Reserved.

    Circa 2012

    A cluster of Play! + Akka app servers TechCrunch moment at VMworld 2012 Exponential customer growth

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    DOJO Data Collection/Processing Pipeline

    Generation-1

    Object Storage

    2

    AWS S3-Object Store

    HTTP Play! AkkaCluster

    HTTP Play! AkkaCluster

    CloudPhysicsObserver(s)

    HTTP Cluster(Message Router)

    HTTP Cluster(Message Processer)

    3

    415

    5

    5Event Metrics Store

    Configuration Store

    Performance Metrics Store

    KairosDB Cassandra Cluster

    MongoDB Sharded Cluster

    KairosDB Cassandra Cluster

    Storage

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    HTTP Endpoints - Play! , Akka and ScalaHTTP Cluster

    RouterMessageBox Dispatcher

    Actor(s)

    12 3 4

    56

    1 Enqueue

    2 Dequeue

    3 Dispatch

    4 Process

    5 Complete

    6 Next (Repeat 1)

    Request

    Lightweight Objects with Universal Concurrency primitives Message Passing Concurrency & no shared state Location Transparent & distributable by design Scale UP and Scale OUT on demand You get a PERFECT FABRIC for the CLOUD

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    Actor Model Design Issues MessageBox and MessageProcessing units

    are tightly coupled

    Increased backlog of streamed messages into the MessageBox can overload an Actors memory footprint

    Negatively Impact responsiveness leading to application crashes

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    courtesy: http://www.earthyreport.com

    Wednesday, September 18, 13

    http://www.earthyreport.com/http://www.earthyreport.com/

  • CloudPhysics Copyright 2013. All Rights Reserved.

    Lets discuss design patterns

    Work-Stealing or Work-Pull Pattern

    https://twitter.com/derekwyatt

    Akka ConcurrencyDerek Wyatt

    A must read for serious Akka hAkkers

    https://twitter.com/jamie_allen

    Effective AkkaJamie Allen

    Wednesday, September 18, 13

    https://twitter.com/derekwyatthttps://twitter.com/derekwyatthttps://twitter.com/jamie_allenhttps://twitter.com/jamie_allen

  • CloudPhysics Copyright 2013. All Rights Reserved.

    Re-Design Use Akka durable unbounded mailbox

    (filesystem or Redis based)

    Use Akka work-stealing pattern and distribute load based on workers processing queue depth (deployed zookeeper for cluster co-ordination and service discovery)

    Akka Cluster was early in development (http://doc.akka.io/docs/akka/snapshot/common/cluster.html#cluster)

    Wednesday, September 18, 13

    http://doc.akka.io/docs/akka/snapshot/common/cluster.html#clusterhttp://doc.akka.io/docs/akka/snapshot/common/cluster.html#cluster

  • CloudPhysics Copyright 2013. All Rights Reserved.

    Decoupling Messaging from Processing - Why?

    Redundancy Scalability Resiliency Ordering

    Guarantee

    Asynchronous Communication

    Replay

    Choices

    JMS AWS SQS ZeroMQ RabbitMQ Kafka

    Producer Queue

    Consumer(s)

    Push

    Push

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    We chose Apache Kafka

    Broker-1

    Topic-1/Part-1 /Part-2

    Topic-2/Part-1

    Broker-2

    Topic-1/Part-1 /Part-2

    Topic-2/Part-1

    Broker-3

    Topic-1/Part-1 /Part-2

    Topic-2/Part-1

    Producer Producer

    Consumer-GroupConsumer 1 ... n

    Consumer-GroupConsumer 1

    1 PUSH

    2 PULL

    Zookeeper

    Kafka has:

    Broker(s) Topic(s)

    Partition(s) Replication

    Zookeeper Ensemble Consumer Grouping Buffering of Messages in

    Producer & Consumer

    Open Sourced by LinkedIn SNA 2011 and written in Scala

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    Why Kafka? Disk Layout

    Simple storage Logs in BrokerTopic-1:Partition-1

    Segment-1

    Segment-n

    Topic-1:Partition-2

    Segment-1

    Segment-n

    Zoom

    Message-2

    Message-n

    Message-1

    ..

    Index

    read()append()

    Batch writes and readsZero-copy transfer from files to socketCompression(Batched)- GZIP and Snappy

    Sequential file I/O on commit log

    Message caching in file system page cache- Avoid double buffering and GC overhead

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    DOJO Data Collection/Processing Pipeline

    Generation-2

    HTTP Cluster(Producers)

    Object Storage

    Kafka - Broker

    T1

    T2

    T3

    HTTP Cluster(Consumers)

    2

    3

    4

    5

    AWS S3-Object Store

    HTTP Play! AkkaCluster

    Auto ScalingGroup

    CloudPhysicsObserver(s)

    HTTP Play! AkkaCluster

    Auto ScalingGroup

    1

    6

    6

    HDFS

    Spark Cluster Compute

    4

    Collective IntelligenceKafka

    HDFS Consumer

    ALPHA

    Event Metrics Store

    Configuration Store

    Performance Metrics Store

    KairosDB Cassandra Cluster

    MongoDB Sharded Cluster

    KairosDB Cassandra Cluster

    Storage

    6

    Pull

    Pull

    PullStorage

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    Kafka Tuning Considerations

    Advantages Increases I/O parallelism for writes Increases degree of parallelism for consumers Tune memory of consumer based on (number of threads in consumer group) * (queuedchunks.max) * (fetch.size)

    Increase Heap size Reduce consumer thread count Lower fetch size or max queue size

    Overheads More open file handles More offsets to be check-pointed increasing load on zookeeper ensemble

    Increased Partition Configuration (for high volume topics)

    Wednesday, September 18, 13

  • CloudPhysics Copyright 2013. All Rights Reserved.

    Kafka Performance Benchmarks

    courtesy:http://kafka.apache.org

    Message size : 200 bytesBatch size : 200 messages

    Fetch size : 1 MB Flush interval : 600 messages

    Consumer Throughput

    No. of Consumers : 50Data Consumed : 100 MB

    No. of messages consumed : 6000000

    Wednesday, September 18, 13

    http://kafka.apache.orghttp://kafka.apache.org

  • CloudPhysics Copyright 2013. All Rights Reserved.

    Event Metrics Store

    Configuration Store

    Performance Metrics Store

    Query Engine

    In-MemoryCache

    with TTL