Apache NiFi Crash Course Intro
Joe Percivall - @JPercivall
Hadoop Summit – Melbourne
1 Sept 2016
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
About Me
• Software Engineer at Hortonworks
• Apache NiFi committer and PMC member
• GitHub: github.com/JPercivall
Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Live Demo
Community
Let’s Connect A to B
Producers, a.k.a. Things: Anything AND Everything
Internet!
Consumers
• User
• Storage
• System
• …More Things
Moving data effectively is hard
Standards: http://xkcd.com/927/
Why is moving data effectively hard?
• Standards
• Formats
• “Exactly Once” Delivery
• Protocols
• Veracity of Information
• Validity of Information
• Ensuring Security
• Overcoming Security
• Compliance
• Schemas
• Consumers Change
• Credential Management
• “That [person|team|group]”
• Network
Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Let’s consider the needs of a courier service
[Diagram of the courier service: a Physical Store with a Gateway Server, Mobile Devices, and Registers; a Distribution Center and the Core Data Center at HQ, each with a Server Cluster; and Trucks and Deliverers on delivery routes]
Icon credits (The Noun Project):
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/
Great! I am collecting all this data! Let’s use it!
Finding our needles in the haystack
[Diagram: the same courier-service sites (Physical Store, Distribution Center, Core Data Center at HQ, Trucks and Deliverers on delivery routes), now with two Kafka deployments, two Storm / Spark / Flink / Apex deployments, and Others added]
Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Oh, that courier service is global
Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Live Demo
Community
Apache NiFi
• Web-based user interface for creating, monitoring, and controlling dataflows
• Directed graphs of data routing and transformation
• Highly configurable: modify the dataflow at runtime, dynamically prioritize data
• Easily extensible through development of custom components
• Data provenance tracks data through the entire system
[1] https://nifi.apache.org/
Apache NiFi Key Features
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow-specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security*
• Designed for extension
• Clustering
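Data buffering, backpressure, pressure release, and prioritized queuing all live in the queue between two components. The Python sketch below is a toy model of that idea under stated assumptions, not NiFi's implementation: the class `SketchConnection`, its `backpressure_object_threshold` and `expiration_seconds` parameters, and the use of a numeric `priority` attribute are hypothetical stand-ins for NiFi's per-connection back pressure thresholds, FlowFile expiration, and prioritizers.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class FlowFile:
    # Hypothetical minimal FlowFile: attributes plus the time it entered the queue.
    attributes: dict
    entry_time: float


class SketchConnection:
    """Toy queue between two processors; not NiFi's real Connection class."""

    def __init__(self, backpressure_object_threshold, expiration_seconds=None):
        self.threshold = backpressure_object_threshold
        self.expiration = expiration_seconds  # None means data never expires
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within the same priority

    def is_full(self):
        # Backpressure: once the threshold is reached, upstream must wait.
        return len(self._heap) >= self.threshold

    def offer(self, flowfile):
        """Enqueue a FlowFile; returns False (nothing is dropped) under backpressure."""
        if self.is_full():
            return False
        # Prioritized queuing: a lower "priority" attribute value is served first.
        priority = int(flowfile.attributes.get("priority", 10))
        heapq.heappush(self._heap, (priority, next(self._order), flowfile))
        return True

    def poll(self, now):
        """Return the highest-priority unexpired FlowFile, or None if none remain."""
        while self._heap:
            _, _, flowfile = heapq.heappop(self._heap)
            expired = (self.expiration is not None
                       and now - flowfile.entry_time > self.expiration)
            if expired:
                continue  # Pressure release: expired data is discarded, not delivered.
            return flowfile
        return None
```

With `SketchConnection(backpressure_object_threshold=2, expiration_seconds=60)`, a third `offer` returns `False` until something is polled, and `poll` returns a FlowFile with `priority` `"1"` before one with `"5"`. Refusing the offer (rather than dropping data) is what keeps delivery guaranteed in this sketch; expiration is the deliberate, configured exception, trading loss tolerance for latency.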
Revisit: Courier service from the perspective of NiFi
[Diagram: the same courier-service sites (Physical Store with Gateway Server, Mobile Devices, and Registers; Distribution Center and Core Data Center at HQ with Server Clusters; Trucks and Deliverers on delivery routes), now with NiFi instances deployed across the sites]
Courier service from the perspective of NiFi & MiNiFi
[Diagram: the same courier-service sites, now with a mix of Client Libraries, MiNiFi agents, and NiFi instances across the sites, including on delivery routes]
Apache NiFi MiNiFi Key Features
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow-specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Designed for extension
• Design and Deploy
• Warm re-deploys
Visual Command and Control vs. Design and Deploy
Apache NiFi Managed Dataflow
[Diagram: Sources, Regional Infrastructure, Core Infrastructure]
NiFi is based on Flow-Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
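The FBP-to-NiFi mapping can be made concrete with a small, single-threaded Python sketch. All class and method names here (`Connection`, `Processor.trigger`, `FlowController.run_once`) are hypothetical illustrations of the roles in the table, not NiFi's Java API; a real Flow Controller schedules processors on a managed thread pool rather than in a simple loop.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FlowFile:
    """Information packet: an attribute map plus content."""
    attributes: dict
    content: bytes


class Connection:
    """Bounded buffer: a queue linking two processors."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.queue = deque()

    def has_room(self):
        return len(self.queue) < self.capacity


class Processor:
    """Black box: takes one FlowFile from its input, transforms it, passes it on."""

    def __init__(self, name, transform, inbound, outbound=None):
        self.name = name
        self.transform = transform  # function: FlowFile -> FlowFile
        self.inbound = inbound
        self.outbound = outbound    # None makes this processor a sink

    def trigger(self):
        if not self.inbound.queue:
            return False  # nothing to do
        if self.outbound is not None and not self.outbound.has_room():
            return False  # downstream buffer is full, so wait
        result = self.transform(self.inbound.queue.popleft())
        if self.outbound is not None:
            self.outbound.queue.append(result)
        return True


class FlowController:
    """Scheduler: knows every processor and decides when each one runs."""

    def __init__(self, processors):
        self.processors = processors

    def run_once(self):
        """Trigger each processor once, in order; return how many did work."""
        worked = 0
        for processor in self.processors:
            if processor.trigger():
                worked += 1
        return worked
```

Because each `Connection` is bounded, a fast upstream processor simply stops being able to trigger when its outbound queue is full, which is how processes "interact at differing rates". A Process Group could be modeled in the same spirit as a named collection of processors and connections exposed through input and output ports.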
FlowFiles & Data Agnosticism
NiFi is data agnostic! But NiFi was designed with the understanding that users can care about specifics, so it provides tooling to interact with specific formats, protocols, etc.
ISO 8601 - http://xkcd.com/1179/
Robustness principle:
“Be conservative in what you do, be liberal in what you accept from others.”
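Applied to the ISO 8601 joke, the robustness principle might look like the following Python sketch: accept timestamps in several common layouts, but always emit one ISO 8601 form. The function `normalize_timestamp` and its list of accepted layouts are hypothetical examples, not something NiFi ships.

```python
from datetime import datetime

# Hypothetical layouts we are willing to accept ("be liberal").
ACCEPTED_FORMATS = [
    "%Y-%m-%dT%H:%M:%S",   # 2016-06-17T17:15:04
    "%Y-%m-%d %H:%M:%S",   # 2016-06-17 17:15:04
    "%m/%d/%Y %H:%M:%S",   # 06/17/2016 17:15:04
    "%d %b %Y %H:%M:%S",   # 17 Jun 2016 17:15:04
]


def normalize_timestamp(text):
    """Parse any accepted layout and emit ISO 8601 ("be conservative")."""
    cleaned = text.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).isoformat()
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"unrecognized timestamp: {text!r}")
```

For example, `normalize_timestamp("06/17/2016 17:15:04")` and `normalize_timestamp("17 Jun 2016 17:15:04")` both return `"2016-06-17T17:15:04"`. Note that `%b` depends on the process locale, so the month-name layout assumes an English or C locale.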
FlowFiles are like HTTP data
HTTP Data
Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Content:
Hello world!

FlowFile
Attribute Map (Standard FlowFile Attributes):
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
Content:
Binary Content *
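The analogy can be sketched in Python: a FlowFile pairs a string-to-string attribute map (like HTTP headers) with opaque binary content (like an HTTP body). The `SketchFlowFile` class and `create_flowfile` helper are hypothetical; only the attribute names `entryDate`, `lineageStartDate`, `fileSize`, `filename`, and `path` come from the slide. As a simplifying assumption, the sketch uses a naive local time, so its date strings omit the time zone shown above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SketchFlowFile:
    attributes: dict  # string -> string, like HTTP headers
    content: bytes    # opaque payload; the framework does not interpret it


def create_flowfile(content, filename, path="./", now=None):
    """Build a FlowFile carrying the standard attributes listed on the slide."""
    now = now or datetime.now()
    stamp = now.strftime("%a %b %d %H:%M:%S %Y")  # e.g. "Fri Jun 17 17:15:04 2016"
    attributes = {
        "entryDate": stamp,
        "lineageStartDate": stamp,  # a brand-new FlowFile starts its own lineage
        "fileSize": str(len(content)),
        "filename": filename,
        "path": path,
    }
    return SketchFlowFile(attributes, content)
```

Wrapping the 13-byte body `b"Hello world!\n"` this way yields a `fileSize` of `"13"`, mirroring the `Content-Length: 13` header in the HTTP example. Keeping attributes as plain strings is what lets routing decisions be made on metadata without reading the content.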
Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Live Demo
Community
NiFi Architecture 0.x line
[Diagram: a standalone NiFi instance. On the OS/host, a JVM runs the Web Server and the Flow Controller with Processor 1 through Extension N; the FlowFile Repository, Content Repository, and Provenance Repository reside on local storage.]
[Diagram: a 0.x cluster. A master NiFi Cluster Manager (NCM), whose JVM runs a Web Server and the NiFi Cluster Manager – Request Replicator, fronts slave NiFi Nodes, each with the same OS/host, JVM, Flow Controller, Web Server, processors/extensions, repositories, and local storage as a standalone instance.]
NiFi Architecture 1.x line
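NiFi Architecture – Repositories – Pass by reference
Excerpt of demo flow… What’s happening inside the repositories…
(F1 → C1 means FlowFile F1 references content C1.)
State | FlowFile Repository | Content Repository | Provenance Repository
BEFORE | F1 → C1 | C1 | P1: F1 – Create
AFTER | F1 → C1; F2 → C1 | C1 | P1: F1 – Create; P2: F1 – Route; P3: F2 – Clone (F1)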
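The table shows why cloning is cheap: the new FlowFile record points at the existing content instead of copying it. The Python sketch below is an assumption-laden toy model, not NiFi's repository implementation: the class `Repositories` keeps reference-counted content claims in a dict, FlowFile records that hold only a claim ID, and an append-only provenance list whose event names mirror the slide (CREATE, ROUTE, CLONE).

```python
class Repositories:
    """Hypothetical in-memory stand-ins for the FlowFile, Content, and Provenance repositories."""

    def __init__(self):
        self.flowfiles = {}   # FlowFile ID -> content claim ID
        self.content = {}     # claim ID -> [bytes, reference count]
        self.provenance = []  # (event ID, FlowFile ID, event type, details)

    def _record(self, ff_id, kind, details=""):
        event_id = f"P{len(self.provenance) + 1}"
        self.provenance.append((event_id, ff_id, kind, details))

    def create(self, data):
        claim = f"C{len(self.content) + 1}"
        self.content[claim] = [data, 1]
        ff_id = f"F{len(self.flowfiles) + 1}"
        self.flowfiles[ff_id] = claim
        self._record(ff_id, "CREATE")
        return ff_id

    def route(self, ff_id, relationship):
        # Routing touches neither the FlowFile record nor its content.
        self._record(ff_id, "ROUTE", relationship)

    def clone(self, ff_id):
        # Pass by reference: the clone points at the SAME content claim.
        claim = self.flowfiles[ff_id]
        self.content[claim][1] += 1
        new_id = f"F{len(self.flowfiles) + 1}"
        self.flowfiles[new_id] = claim
        self._record(new_id, "CLONE", f"parent={ff_id}")
        return new_id
```

Running `create`, `route`, then `clone` reproduces the AFTER row: two FlowFile records (`F1`, `F2`) that both reference `C1`, a single stored copy of the bytes, and provenance events `P1` through `P3`.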
NiFi Architecture – Repositories – Copy on Write
Excerpt of demo flow… What’s happening inside the repositories…
State | FlowFile Repository | Content Repository | Provenance Repository
BEFORE | F1 → C1 | C1 | P1: F1 – CREATE
AFTER | F1 → C1; F1.1 → C2 | C1 (plaintext); C2 (encrypted) | P1: F1 – CREATE; P2: F1.1 – MODIFY
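Copy-on-write can be modeled the same way. In this hypothetical Python sketch (`CowRepositories` and its methods are illustrative names, not NiFi APIs), modifying a FlowFile never overwrites its content: the result is written to a new claim (`C2`), a new FlowFile version (`F1.1`) points at it, and a MODIFY event is logged while `C1` stays intact. The `transform` argument is a placeholder; `bytes.upper` stands in for encryption purely for illustration.

```python
class CowRepositories:
    """Hypothetical in-memory FlowFile, Content, and Provenance repositories."""

    def __init__(self):
        self.flowfiles = {}   # FlowFile version ID -> content claim ID
        self.content = {}     # claim ID -> bytes (never mutated once written)
        self.provenance = []  # (event ID, FlowFile version ID, event type)

    def _new_claim(self, data):
        claim = f"C{len(self.content) + 1}"
        self.content[claim] = data
        return claim

    def _record(self, ff_id, kind):
        self.provenance.append((f"P{len(self.provenance) + 1}", ff_id, kind))

    def create(self, ff_id, data):
        self.flowfiles[ff_id] = self._new_claim(data)
        self._record(ff_id, "CREATE")

    def modify(self, ff_id, new_id, transform):
        # Copy on write: read the old claim, write the result to a NEW claim.
        old_claim = self.flowfiles[ff_id]
        new_claim = self._new_claim(transform(self.content[old_claim]))
        # The old record is kept so the before/after state can be inspected.
        self.flowfiles[new_id] = new_claim
        self._record(new_id, "MODIFY")
        return new_claim
```

After `create("F1", b"plaintext")` and `modify("F1", "F1.1", bytes.upper)`, claim `C1` still holds the original bytes and `C2` holds the transformed ones. Because old content is never mutated in place, it remains available for inspection or replay as long as it is retained.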
NiFi vs MiNiFi Java Processes
MiNiFi: NiFi Framework + Components
NiFi: NiFi Framework + User Interface + Components
NiFi Java Processes
[Diagram: the Bootstrap process reads bootstrap.conf and starts the NiFi process; NiFi (with its UI) reads nifi.properties and reads & modifies flow.xml.gz]
MiNiFi Java Processes
[Diagram: the Bootstrap process, with its Configuration Change Notifier(s), reads bootstrap.conf, transforms config.yml into nifi.properties and flow.xml.gz, and starts the MiNiFi process; MiNiFi reads nifi.properties and flow.xml.gz]
Simple config.yml
Tail a rolling file -> Site to Site
Config Change Notifiers
Two implementations:
– RestChangeNotifier (HTTP(S))
– FileChangeNotifier
Configured in bootstrap.conf
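A file-based change notifier essentially watches a path and tells its listeners when the file changes. The Python sketch below polls the file's modification time; the class `PollingFileNotifier`, its `register` and `check` methods, and the listener callback signature are all hypothetical, and this is not MiNiFi's FileChangeNotifier or its bootstrap.conf settings.

```python
import os


class PollingFileNotifier:
    """Hypothetical notifier: passes the new file contents to each listener on change."""

    def __init__(self, path):
        self.path = path
        self.listeners = []
        self._last_mtime = self._mtime()

    def _mtime(self):
        try:
            return os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return None  # a missing file is treated as "no change"

    def register(self, listener):
        self.listeners.append(listener)

    def check(self):
        """Call periodically (e.g. from a timer thread); returns True if a change was seen."""
        current = self._mtime()
        if current is None or current == self._last_mtime:
            return False
        self._last_mtime = current
        with open(self.path, "rb") as handle:
            new_config = handle.read()
        for listener in self.listeners:
            listener(new_config)
        return True
```

A REST-based notifier would replace the polling loop with an HTTP(S) endpoint that receives the new configuration in a request, but would hand it to the same kind of listener.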
Change notifier update
[Diagram: MiNiFi, Bootstrap, Configuration Change Notifiers]
1. Initial state
– Both running
Change notifier update
[Diagram: MiNiFi, Bootstrap, Configuration Change Notifiers; the user creates a new configuration]
2. User sends the update through a notifier
– HTTP(S) POST request
– Change to the watched file
Change notifier update
[Diagram: the user creates a new configuration; the Bootstrap validates the new configuration]
3. Bootstrap validation
– Basic validation
– REST notifier will respond accordingly
– Results logged
Change notifier update
[Diagram: after validation, the Bootstrap saves the new config.yml and transforms it into nifi.properties and flow.xml.gz]
4. Bootstrap saves and transforms
– Copy old config.yml to a swap file
Change notifier update
[Diagram: the Bootstrap attempts a restart; MiNiFi reads the regenerated nifi.properties and flow.xml.gz]
5. Bootstrap attempts restart
– MiNiFi reads in the new nifi.properties and flow.xml.gz
Change notifier update
[Diagram: the full cycle, from the user creating a new configuration through validation, save and transform, and the restart attempt]
6. Success or failure
– Successful restart: continue processing
– Failure: roll back to the old config
– Existing data is mapped or orphaned
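Steps 3 through 6 amount to a validate, swap, transform, restart, and roll-back-on-failure sequence. The Python sketch below models only that control flow; the function `apply_config_update` and its `validate`, `transform`, and `restart` callables are hypothetical placeholders rather than MiNiFi bootstrap APIs, and real file handling (a swap file on disk, logging) is reduced to a dict.

```python
def apply_config_update(state, new_config, validate, transform, restart):
    """Hypothetical sketch of the bootstrap's update sequence.

    state: dict with keys "config.yml", "nifi.properties", "flow.xml.gz".
    validate(cfg) -> list of error strings (empty means valid).
    transform(cfg) -> (nifi_properties, flow_xml).
    restart(state) -> True if MiNiFi came back up with the given files.
    Returns (ok, messages).
    """
    # 3. Basic validation; a REST notifier would report these errors to the caller.
    errors = validate(new_config)
    if errors:
        return False, errors

    # 4. Keep the old configuration in a swap slot, then save and transform the new one.
    swap = dict(state)
    state["config.yml"] = new_config
    state["nifi.properties"], state["flow.xml.gz"] = transform(new_config)

    # 5. Attempt a restart so MiNiFi reads the regenerated files.
    if restart(state):
        return True, ["restarted with new configuration"]  # 6a. continue processing

    # 6b. Failure: restore the swapped-out configuration and restart with it.
    state.clear()
    state.update(swap)
    restart(state)
    return False, ["restart failed; rolled back to previous configuration"]
```

Taking the swap copy before anything is overwritten is the key design choice: whatever happens during the transform or restart, the last known-good configuration can be restored.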
Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Demo
Community
Brief history of the Apache NiFi Community
• Contributors from government and several commercial industries
• Releases on a 6-8 week schedule
Timeline:
2006: Code developed at NSA
2006-2014: Matured at NSA
November 2014: Code available open source (ASL v2)
July 2015: Achieved TLP status in just 7 months
Today
MiNiFi Prospective Plans – Centralized Command and Control
• Design at a centralized place, deploy on the edge
  – Flow deployment
  – NAR deployment
  – Agent deployment
• Version control of flows
• Agent status monitoring
• Bi-directional command and control
• Centralized management console with a UI
Hortonworks DataFlow
[Diagram labels: Ingestion, Simple Event Processing Engine, Stream Processing Destination, Data Bus]
Questions?
Learn more and join us!
Apache NiFi site: http://nifi.apache.org
Subproject MiNiFi site: http://nifi.apache.org/minifi/
Subscribe to and collaborate [email protected]@nifi.apache.org
Submit ideas or issues: https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter: @apachenifi
Thank you!