data distribution patterns with apache nifi

8
Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Data Distribution Patterns with Apache NiFi March 2016 Bryan Bende – Member of Technical Staff

Upload: bryan-bende

Post on 16-Apr-2017

1.957 views

Category:

Software


1 download

TRANSCRIPT

Page 1: Data Distribution Patterns with Apache NiFi

Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Data Distribution Patterns with Apache NiFi

March 2016

Bryan Bende – Member of Technical Staff

Page 2: Data Distribution Patterns with Apache NiFi

Page 2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Overview

• Frequently Asked Question: “How do I distribute data across my NiFi cluster?”

• Answer: “It depends…”

• Typically based on whether the data is being pushed or pulled

Page 3: Data Distribution Patterns with Apache NiFi

Page 3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Pushing to Listeners

NCM

Node 1

ListenHTTP

Node 2

ListenHTTP

Load Balancer

Data Producer

Data Producer

Data Producer

• Processors listening for incoming data on each node in the cluster

• Load balancer sitting in front of the cluster pointing at listeners

• Data producers push data to load balancer which distributes data across the cluster

• Same approach works for ListenSyslog, ListenUDP, and HandleHttpRequest

Page 4: Data Distribution Patterns with Apache NiFi

Page 4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Pulling on All Nodes

NCM

Node 1

GetKafka

Node 2

GetKafka

Kafka Topic

• Works when data source ensures each request gets a unique piece of data

• Kafka sees each GetKafka processor as the same client

• Each GetKafka processor gets different data

Page 5: Data Distribution Patterns with Apache NiFi

Page 5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Pulling With List & Fetch OperationsNCM

Node 1 (Primary)

ListHDFS

HDFS

FetchHDFS

RPG

Input Port

Node 2

ListHDFS

FetchHDFS

RPG

Input Port

• Perform “list” operation on primary node

• Send results to Remote Process Group pointing back to same cluster

• Redistributes results to all nodes to perform “fetch” in parallel

• Same approach for ListFile + FetchFile and ListSFTP + FetchSFTP

Page 6: Data Distribution Patterns with Apache NiFi

Page 6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Site-To-Site Push

NCM

Node 1

Input Port

Node 2

Input Port

Standalone NiFi

RPG

• Site-To-Site makes a direct connection between NiFi instances

• Either side can be a cluster or standalone instance

• In a push scenario, the source connects a Remote Process Group to an Input Port on the destination

• If pushing to a cluster, Site-To-Site takes care of load balancing across the nodes in the cluster

Page 7: Data Distribution Patterns with Apache NiFi

Page 7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Site-To-Site Pull

NCM

Node 1

RPG

Node 2

RPG

Standalone NiFi

Output Port

• In a pull scenario, the destination connects a Remote Process Group to an Output Port on the source

• Each node will pull different data from the source

• If the source was a cluster, each node would pull from each node in the source cluster

Page 8: Data Distribution Patterns with Apache NiFi

Page 8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Summary

Several mechanism for distributing data across a cluster…• Push to Listeners with a load balancer in front

• Pull from a source that provides a queue

• Pull with a List + Fetch approach

• Push with Site-to-Site

• Pull with Site-to-Site

Contact Info• [email protected]

• Twitter - @bbende