Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack
TRANSCRIPT
![Page 1: Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack](https://reader035.vdocuments.site/reader035/viewer/2022062503/58e4a1621a28aba3458b60c5/html5/thumbnails/1.jpg)
© 2016 DataTorrent
Chinmay Kolhatkar
Committer, Apache Apex
Engineer, DataTorrent
March 23, 2017
Apache Apex-Bigtop
![Page 2: Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack](https://reader035.vdocuments.site/reader035/viewer/2022062503/58e4a1621a28aba3458b60c5/html5/thumbnails/2.jpg)
Agenda
• About Apache Apex
• Apex Platform Overview
• Apex - Native Hadoop Integration
• Apex Malhar Library
• Apex as a Bigtop component
• Installing Bigtop Apex
• Apex Docker sandbox
• Apex Docker sandbox Demo
![Page 3: Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack](https://reader035.vdocuments.site/reader035/viewer/2022062503/58e4a1621a28aba3458b60c5/html5/thumbnails/3.jpg)
About Apache Apex
• Platform and runtime engine that enables development of scalable and fault-tolerant distributed applications
• Hadoop native (Hadoop >= 2.2)
    • No separate service to manage stream processing
    • Streaming engine built into Application Master and Containers
• Process streaming or batch big data
• High throughput and low latency
• Library of commonly needed business logic
• Write any custom business logic in your application
![Page 4: Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack](https://reader035.vdocuments.site/reader035/viewer/2022062503/58e4a1621a28aba3458b60c5/html5/thumbnails/4.jpg)
Apex Platform Overview
![Page 5: Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack](https://reader035.vdocuments.site/reader035/viewer/2022062503/58e4a1621a28aba3458b60c5/html5/thumbnails/5.jpg)
An Apex Application is a DAG (Directed Acyclic Graph)
A DAG is composed of vertices (Operators) and edges (Streams).
A Stream is a sequence of data tuples that connects operators at end-points called Ports.
An Operator takes one or more input streams, performs computations, and emits one or more output streams.
• Each operator is the user's business logic or a built-in operator from our open source library
• An operator may have multiple instances that run in parallel
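
The operator/stream/port vocabulary above can be sketched in a few lines of plain Java. This is a toy model for illustration only, not the Apex API: the names `MiniDag`, `OutputPort`, `EvenFilter`, `Doubler`, and `Collector` are hypothetical, and everything runs in a single thread with no partitioning or fault tolerance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Toy model of the DAG vocabulary (operators, ports, streams).
// All class names here are hypothetical; this is NOT the Apex API.
class MiniDag {

    /** Output port: an operator emits tuples here; connect() attaches the stream (edge). */
    static final class OutputPort<T> {
        private Consumer<T> sink = t -> { };   // an unconnected port drops its tuples

        void connect(Consumer<T> inputPort) {
            sink = inputPort;
        }

        void emit(T tuple) {
            sink.accept(tuple);
        }
    }

    /** Operator with custom business logic: forwards even numbers only. */
    static final class EvenFilter {
        final OutputPort<Integer> output = new OutputPort<>();

        void input(Integer tuple) {            // input port
            if (tuple % 2 == 0) {
                output.emit(tuple);
            }
        }
    }

    /** Operator: doubles every tuple it receives. */
    static final class Doubler {
        final OutputPort<Integer> output = new OutputPort<>();

        void input(Integer tuple) {            // input port
            output.emit(tuple * 2);
        }
    }

    /** Terminal operator: collects the tuples that reach the end of the DAG. */
    static final class Collector {
        final List<Integer> received = new ArrayList<>();

        void input(Integer tuple) {            // input port
            received.add(tuple);
        }
    }

    /** Builds the DAG filter -> doubler -> collector and pushes the source tuples through it. */
    static List<Integer> run(List<Integer> source) {
        EvenFilter filter = new EvenFilter();
        Doubler doubler = new Doubler();
        Collector collector = new Collector();

        // Streams: each connects an output port to a downstream input port.
        filter.output.connect(doubler::input);
        doubler.output.connect(collector::input);

        for (Integer tuple : source) {
            filter.input(tuple);
        }
        return collector.received;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3, 4)));   // prints [4, 8]
    }
}
```

In Apex itself, the equivalent wiring is declared on the `DAG` object handed to `StreamingApplication.populateDAG` (via `addOperator` and `addStream`), and the engine, not the application, decides how operators are deployed and run in parallel; the Apex documentation has the exact signatures.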
![Page 6: Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack](https://reader035.vdocuments.site/reader035/viewer/2022062503/58e4a1621a28aba3458b60c5/html5/thumbnails/6.jpg)
Apex - Native Hadoop Integration
• YARN is the resource manager
• HDFS used for storing any persistent state
![Page 7: Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack](https://reader035.vdocuments.site/reader035/viewer/2022062503/58e4a1621a28aba3458b60c5/html5/thumbnails/7.jpg)
Apex Malhar Library
• RDBMS: Vertica, MySQL, Oracle, JDBC
• NoSQL: Cassandra, HBase, Aerospike, Accumulo, Couchbase/CouchDB, Redis, MongoDB, Geode
• Messaging: Kafka, Solace, Flume, ActiveMQ, Kinesis, NiFi
• File Systems: HDFS/Hive, NFS, S3
• Parsers: XML, JSON, CSV, Avro, Parquet
• Transformations: Filters, Rules, Expression, Dedup, Enrich
• Analytics: Dimensional Aggregations (with state management for historical data + query)
• Protocols: HTTP, FTP, WebSocket, MQTT, SMTP
• Other: Elasticsearch, Script (JavaScript, Python, R), Solr, Twitter
![Page 8: Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack](https://reader035.vdocuments.site/reader035/viewer/2022062503/58e4a1621a28aba3458b60c5/html5/thumbnails/8.jpg)
Apex as a Bigtop component
• Uses Bigtop framework for ease of deployment
    • Deployment using Puppet recipes and Vagrant
    • Can spawn multi-node clusters for Docker, VM & OpenStack
• Generates deployable binaries for Apex engine
    • RPM - CentOS 6 & 7, Fedora 25, openSUSE 42.1
    • DEB - Ubuntu 16.04, Debian 8
• Allows validating installations
    • Package Test
    • Smoke Test
![Page 9: Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack](https://reader035.vdocuments.site/reader035/viewer/2022062503/58e4a1621a28aba3458b60c5/html5/thumbnails/9.jpg)
Installing Bigtop Apex
Bigtop 1.1.0 (Current)
• Add Bigtop Repository
    • http://www.apache.org/dist/bigtop/bigtop-1.1.0/repos/
• Install bigtop-hadoop
    • For Debian: apt-get install hadoop\*
    • For RPM: yum install hadoop\*
• Download bigtop-apex from Bigtop CI
    • https://ci.bigtop.apache.org/job/Bigtop-trunk-packages/
• Install Apex
    • For Debian: dpkg -i apex_3.4.0-1_all.deb
    • For RPM: rpm -i apex-3.5.0-1.el7.noarch.rpm
![Page 10: Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack](https://reader035.vdocuments.site/reader035/viewer/2022062503/58e4a1621a28aba3458b60c5/html5/thumbnails/10.jpg)
Installing Bigtop Apex
Bigtop 1.2.0 (Next Release)
• Add Bigtop Repository (Future URL)
    • http://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/
• Install apex
    • For Debian: apt-get install apex
    • For RPM: yum install apex
![Page 11: Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack](https://reader035.vdocuments.site/reader035/viewer/2022062503/58e4a1621a28aba3458b60c5/html5/thumbnails/11.jpg)
Apex Docker sandbox
• A quick starter Apex Docker image: https://hub.docker.com/r/apacheapex/sandbox/
• Preconfigured and running components
    • HDFS (namenode, secondarynamenode, datanode)
    • YARN (resourcemanager, nodemanager, timelineserver)
• Preconfigured and installed component
    • Apex
• Get started
    • Step 1: docker pull apacheapex/sandbox
    • Step 2: docker run -it apacheapex/sandbox
![Page 12: Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack](https://reader035.vdocuments.site/reader035/viewer/2022062503/58e4a1621a28aba3458b60c5/html5/thumbnails/12.jpg)
Apex Docker sandbox (contd.)
![Page 13: Apache Apex (Next Gen Hadoop) as a part of Bigtop Deployment Stack](https://reader035.vdocuments.site/reader035/viewer/2022062503/58e4a1621a28aba3458b60c5/html5/thumbnails/13.jpg)
Resources
• Apache Apex website - http://apex.apache.org/
• Subscribe - http://apex.apache.org/community.html
• Download - http://apex.apache.org/downloads.html
• Twitter - @ApacheApex; Follow - https://twitter.com/apacheapex
• Facebook - https://www.facebook.com/ApacheApex/
• Meetup - http://www.meetup.com/topics/apache-apex
• SlideShare - http://www.slideshare.net/ApacheApex/presentations
• More Examples - https://github.com/DataTorrent/examples
• Startup Program - Free Enterprise License for Startups, Educational Institutions, Non-Profits - https://www.datatorrent.com/startups/
• Cloud Trial - https://www.datatorrent.com/download/cloud-trial/