Download - Data Distribution Patterns with Apache NiFi
![Page 1: Data Distribution Patterns with Apache NiFi](https://reader036.vdocuments.site/reader036/viewer/2022083119/586e8d1c1a28aba0038b874b/html5/thumbnails/1.jpg)
Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Distribution Patterns with Apache NiFi
March 2016
Bryan Bende – Member of Technical Staff
![Page 2: Data Distribution Patterns with Apache NiFi](https://reader036.vdocuments.site/reader036/viewer/2022083119/586e8d1c1a28aba0038b874b/html5/thumbnails/2.jpg)
Page 2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Overview
• Frequently Asked Question: “How do I distribute data across my NiFi cluster?”
• Answer: “It depends…”
• Typically based on whether the data is being pushed or pulled
![Page 3: Data Distribution Patterns with Apache NiFi](https://reader036.vdocuments.site/reader036/viewer/2022083119/586e8d1c1a28aba0038b874b/html5/thumbnails/3.jpg)
Page 3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Pushing to Listeners
NCM
Node 1
ListenHTTP
Node 2
ListenHTTP
Load Balancer
Data Producer
Data Producer
Data Producer
• Processors listening for incoming data on each node in the cluster
• Load balancer sitting in front of the cluster pointing at listeners
• Data producers push data to load balancer which distributes data across the cluster
• Same approach works for ListenSyslog, ListenUDP, and HandleHttpRequest
![Page 4: Data Distribution Patterns with Apache NiFi](https://reader036.vdocuments.site/reader036/viewer/2022083119/586e8d1c1a28aba0038b874b/html5/thumbnails/4.jpg)
Page 4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Pulling on All Nodes
NCM
Node 1
GetKafka
Node 2
GetKafka
Kafka Topic
• Works when data source ensures each request gets a unique piece of data
• Kafka sees each GetKafka processor as the same client
• Each GetKafka processor gets different data
![Page 5: Data Distribution Patterns with Apache NiFi](https://reader036.vdocuments.site/reader036/viewer/2022083119/586e8d1c1a28aba0038b874b/html5/thumbnails/5.jpg)
Page 5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Pulling With List & Fetch OperationsNCM
Node 1 (Primary)
ListHDFS
HDFS
FetchHDFS
RPG
Input Port
Node 2
ListHDFS
FetchHDFS
RPG
Input Port
• Perform “list” operation on primary node
• Send results to Remote Process Group pointing back to same cluster
• Redistributes results to all nodes to perform “fetch” in parallel
• Same approach for ListFile + FetchFile and ListSFTP + FetchSFTP
![Page 6: Data Distribution Patterns with Apache NiFi](https://reader036.vdocuments.site/reader036/viewer/2022083119/586e8d1c1a28aba0038b874b/html5/thumbnails/6.jpg)
Page 6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Site-To-Site Push
NCM
Node 1
Input Port
Node 2
Input Port
Standalone NiFi
RPG
• Site-To-Site makes a direct connection between NiFi instances
• Either side can be a cluster or standalone instance
• In a push scenario, the source connects a Remote Process Group to an Input Port on the destination
• If pushing to a cluster, Site-To-Site takes care of load balancing across the nodes in the cluster
![Page 7: Data Distribution Patterns with Apache NiFi](https://reader036.vdocuments.site/reader036/viewer/2022083119/586e8d1c1a28aba0038b874b/html5/thumbnails/7.jpg)
Page 7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Site-To-Site Pull
NCM
Node 1
RPG
Node 2
RPG
Standalone NiFi
Output Port
• In a pull scenario, the destination connects a Remote Process Group to an Output Port on the source
• Each node will pull different data from the source
• If the source was a cluster, each node would pull from each node in the source cluster
![Page 8: Data Distribution Patterns with Apache NiFi](https://reader036.vdocuments.site/reader036/viewer/2022083119/586e8d1c1a28aba0038b874b/html5/thumbnails/8.jpg)
Page 8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Summary
Several mechanism for distributing data across a cluster…• Push to Listeners with a load balancer in front
• Pull from a source that provides a queue
• Pull with a List + Fetch approach
• Push with Site-to-Site
• Pull with Site-to-Site
Contact Info• [email protected]
• Twitter - @bbende