Dataflow with Apache NiFi
TRANSCRIPT
Dataflow with Apache NiFi
Aldrin Piri - @aldrinpiri
Apache NiFi Crash Course
DataWorks Summit 2017 – Munich
6 April 2017
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Key: 'Apache NiFi' Value: 'PMC Member'
Key: 'Work' Value: 'Sr. Member of Technical Staff @ Hortonworks'
Key: 'Working with NiFi Since' Value: '2010'
Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Live Demo
Community
Let's Connect A to B
Producers A.K.A. Things
Anything AND Everything
Internet!
Consumers
• User
• Storage
• System
• … More Things
Moving data effectively is hard
Standards: http://xkcd.com/927/
Why is moving data effectively hard?
→ Standards
→ Formats
→ "Exactly Once" Delivery
→ Protocols
→ Veracity of Information
→ Validity of Information
→ Ensuring Security
→ Overcoming Security
→ Compliance
→ Schemas
→ Consumers Change
→ Credential Management
→ "That [person|team|group]"
→ Network
→ "Exactly Once" Delivery
Let's Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Let's consider the needs of a courier service
Physical Store
Gateway Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Core Data Center at HQ
Server Cluster
On Delivery Routes
Trucks
Deliverers
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/
Great! I am collecting all this data! Let's use it!
Finding our needles in the haystack
Physical Store
Gateway Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Kafka
Core Data Center at HQ
Server Cluster
Others
Storm / Spark / Flink / Apex
Kafka
Storm / Spark / Flink / Apex
On Delivery Routes
Trucks
Deliverers
Why is moving data effectively hard when scoped internally?
→ Standards
→ Formats
→ "Exactly Once" Delivery
→ Protocols
→ Veracity of Information
→ Validity of Information
→ Ensuring Security
→ Overcoming Security
→ Compliance
→ Schemas
→ Consumers Change
→ Credential Management
→ "That [person|team|group]"
→ Network
→ "Exactly Once" Delivery
Let's Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Oh, that courier service is global
Why is moving data effectively hard when scoped globally?
→ Standards
→ Formats
→ "Exactly Once" Delivery
→ Protocols
→ Veracity of Information
→ Validity of Information
→ Ensuring Security
→ Overcoming Security
→ Compliance
→ Schemas
→ Consumers Change
→ Credential Management
→ "That [person|team|group]"
→ Network
→ "Exactly Once" Delivery
The Unassuming Line: A Case Study
We've seen a few lines show up in the wild thus far
Internet!
Inter- & Intra-connections in our global courier enterprise
Spotlight: Arthur Lacôte, https://thenounproject.com/turo/
Dataflow Line Anatomy 101
Let's dissect what this line typically represents
Fig 1. Lineus Worldwidewebus. Common Name: Internet!
Script or Application
Script or Application
Data Data
Disparate Transport Mechanisms
Dataflow Line Anatomy 201
Sometimes that transport is just more lines
Fig 1. Lineus Worldwidewebus. Common Name: Internet!
Script or Application
Script or Application
Line Inception
Data Data
Dataflow Line Anatomy 301
But those lines could also have components…
Fig 1. Lineus Worldwidewebus. Common Name: Internet!
Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Live Demo
Community
Apache NiFi Key Features
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery / recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable / multi-role security
• Designed for extension
• Clustering
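To make the "Data buffering" and "Prioritized queuing" bullets concrete, here is a minimal sketch in Python. It is not NiFi code: the `Connection` class, its `backpressure_threshold` parameter, and the `priority_of` callback are hypothetical names chosen for illustration. In this sketch the queue simply refuses new items once the threshold is reached, which is a simplification of how a real flow would hold back the upstream component.

```python
import heapq
import itertools


class Connection:
    """Hypothetical queue between two components with a back-pressure threshold."""

    def __init__(self, backpressure_threshold, priority_of):
        self.backpressure_threshold = backpressure_threshold
        self.priority_of = priority_of      # lower value is dequeued first
        self._heap = []
        self._counter = itertools.count()   # tie-breaker keeps FIFO order per priority

    def backpressure_engaged(self):
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, item):
        """Enqueue the item unless back pressure is engaged; report acceptance."""
        if self.backpressure_engaged():
            return False
        heapq.heappush(self._heap, (self.priority_of(item), next(self._counter), item))
        return True

    def poll(self):
        """Return the highest-priority item, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=3, priority_of=lambda item: item["priority"])
accepted = [
    conn.offer({"name": name, "priority": priority})
    for name, priority in [("bulk", 5), ("alert", 1), ("bulk2", 5), ("late", 1)]
]
drained = [conn.poll()["name"] for _ in range(3)]
```

The fourth item is refused because the queue already holds three entries, and the "alert" item overtakes the bulk items that arrived before it.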
Apache NiFi Subproject: MiNiFi
→ Let me get the key parts of NiFi close to where data begins and provide bidirectional communication
→ NiFi lives in the data center. Give it an enterprise server or a cluster of them.
→ MiNiFi lives as close as possible to where data is born and is a guest on that device or system
Let's revisit our courier service from the perspective of NiFi
Physical Store
Gateway Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Kafka
Core Data Center at HQ
Server Cluster
Others
Storm / Spark / Flink / Apex
Kafka
Storm / Spark / Flink / Apex
On Delivery Routes
Trucks
Deliverers
Client Libraries
Client Libraries
MiNiFi
MiNiFi
NiFi NiFi NiFi NiFi NiFi NiFi
Client Libraries
Apache NiFi Managed Dataflow
SOURCES
REGIONAL INFRASTRUCTURE
CORE INFRASTRUCTURE
NiFi is based on Flow Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
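The FBP vocabulary above can be sketched as a toy model in Python. This is an illustration of the concepts only, not the NiFi API: the class names mirror the table, while `on_trigger`, `run_until_idle`, and the two example work functions are hypothetical names invented for this sketch. The buffer bound is omitted to keep it short.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information Packet: attributes plus content."""
    attributes: dict
    content: bytes


@dataclass
class Connection:
    """Bounded Buffer: a queue linking two processors (the bound is omitted here)."""
    queue: deque = field(default_factory=deque)


class Processor:
    """Black Box: takes one FlowFile and emits zero or more FlowFiles."""

    def __init__(self, name, work, inbound, outbound):
        self.name = name
        self.work = work
        self.inbound = inbound
        self.outbound = outbound

    def on_trigger(self):
        """Process one queued FlowFile; return False if there was nothing to do."""
        if not self.inbound.queue:
            return False
        flowfile = self.inbound.queue.popleft()
        self.outbound.queue.extend(self.work(flowfile))
        return True


class FlowController:
    """Scheduler: knows the processors and decides when each one runs."""

    def __init__(self, processors):
        self.processors = processors

    def run_until_idle(self):
        progressed = True
        while progressed:
            progressed = False
            for processor in self.processors:
                progressed = processor.on_trigger() or progressed


def to_upper(flowfile):
    # Transformation: new content, plus one extra attribute.
    return [FlowFile({**flowfile.attributes, "transformed": "true"}, flowfile.content.upper())]


def drop_empty(flowfile):
    # Routing: FlowFiles with empty content are dropped.
    return [flowfile] if flowfile.content else []


source, middle, sink = Connection(), Connection(), Connection()
source.queue.extend([FlowFile({"filename": "a"}, b"hello"), FlowFile({"filename": "b"}, b"")])
FlowController([
    Processor("ToUpper", to_upper, source, middle),
    Processor("DropEmpty", drop_empty, middle, sink),
]).run_until_idle()
```

Because each processor only sees its inbound and outbound connections, the two can run at different rates, and a group of them could be wrapped behind a pair of connections to act as a single component, which is the Process Group idea from the table.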
FlowFiles & Data Agnosticism
→ NiFi is data agnostic!
→ But NiFi was designed with the understanding that users can care about specifics, and it provides tooling to interact with specific formats, protocols, etc.
ISO 8601 - http://xkcd.com/1179/
Robustness principle
"Be conservative in what you do, be liberal in what you accept from others"
FlowFiles are like HTTP data

HTTP Data
Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Content:
Hello world!

FlowFile
Header:
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
FlowFile Attribute Map Content
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
Content:
Binary Content*
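The header/content analogy can be shown with a short Python sketch. The helper `split_http_message` is a hypothetical name written for this illustration, and the FlowFile is modelled as a plain dictionary rather than any NiFi class. The sketch assumes a well-formed response with CRLF line endings and string-valued attributes.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP response into status line, header map, and body bytes."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        # Split on the first colon only, so values such as dates stay intact.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status_line, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 13\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"Hello world!\n"
)
status_line, headers, body = split_http_message(raw)

# The same split, expressed as a FlowFile-like record: a map of string
# attributes (the "header") next to opaque bytes (the "content").
flowfile = {
    "attributes": {
        "filename": "15650246997242",
        "path": "./",
        "fileSize": str(len(body)),
    },
    "content": body,
}
```

In both cases the metadata can be read and routed on without ever interpreting the payload, which is what lets the content stay opaque bytes.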
Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Live Demo
Community
Extension / Integration Points
NiFi Term | Description
FlowFile Processor | Push/Pull behavior. Custom UI
Reporting Task | Used to push data from NiFi to some external service (metrics, provenance, etc.)
Controller Service | Used to enable reusable components / shared services throughout the flow
REST API | Allows clients to connect to pull information, change behavior, etc.
Architecture
OS/Host
JVM
Flow Controller
Web Server
Processor 1
Extension N
FlowFile Repository
Content Repository
Provenance Repository
Local Storage
Standalone
Cluster
NiFi Architecture – Repositories – Pass by reference
Excerpt of demo flow… What's happening inside the repositories…
FlowFile | Content | Provenance
BEFORE
F1 → C1 | C1 | P1 → F1
AFTER
F2 → C1 | C1 | P3 → F2 – Clone (F1)
F1 → C1 |    | P2 → F1 – Route
        |    | P1 → F1 – Create
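The BEFORE/AFTER picture above can be reproduced with a small Python model. This is a sketch of the idea only, not NiFi's repository implementation: the `Repositories` class and its `create`, `route`, and `clone` methods are hypothetical, and the F/C/P identifiers simply mirror the labels on the slide.

```python
class Repositories:
    """Toy model of the three repositories, with content passed by reference."""

    def __init__(self):
        self.content = {}      # claim id -> bytes
        self.flowfiles = {}    # FlowFile id -> claim id (a pointer, not a copy)
        self.provenance = []   # (event id, FlowFile id, event type)

    def _record(self, flowfile_id, event_type):
        self.provenance.append((f"P{len(self.provenance) + 1}", flowfile_id, event_type))

    def create(self, data):
        claim_id = f"C{len(self.content) + 1}"
        self.content[claim_id] = data
        flowfile_id = f"F{len(self.flowfiles) + 1}"
        self.flowfiles[flowfile_id] = claim_id
        self._record(flowfile_id, "CREATE")
        return flowfile_id

    def route(self, flowfile_id):
        # Routing moves the pointer between queues; nothing is rewritten.
        self._record(flowfile_id, "ROUTE")

    def clone(self, flowfile_id):
        # The clone gets its own FlowFile record but shares the same claim.
        clone_id = f"F{len(self.flowfiles) + 1}"
        self.flowfiles[clone_id] = self.flowfiles[flowfile_id]
        self._record(clone_id, "CLONE")
        return clone_id


repos = Repositories()
f1 = repos.create(b"payload")
repos.route(f1)
f2 = repos.clone(f1)
```

After these three steps there are two FlowFile records and three provenance events, but still only one stored copy of the content.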
NiFi Architecture – Repositories – Copy on Write
Excerpt of demo flow… What's happening inside the repositories…
FlowFile | Content | Provenance
BEFORE
F1 → C1 | C1 | P1 → F1 - CREATE
AFTER
F1 → C1   | C2 (encrypted) | P2 → F1.1 - MODIFY
F1.1 → C2 | C1 (plaintext) | P1 → F1 - CREATE
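A similar Python sketch shows the copy-on-write step. Again this is a hypothetical model rather than NiFi's implementation: `CopyOnWriteRepositories`, `modify`, and the `.1` version suffix are illustrative names echoing the slide, and `toy_encrypt` is a stand-in transformation (a byte-wise XOR), not real encryption.

```python
class CopyOnWriteRepositories:
    """Toy model: modifying content writes a new claim and keeps the old one."""

    def __init__(self):
        self.content = {}      # claim id -> bytes
        self.flowfiles = {}    # FlowFile id -> claim id
        self.provenance = []   # (event id, FlowFile id, event type)

    def _record(self, flowfile_id, event_type):
        self.provenance.append((f"P{len(self.provenance) + 1}", flowfile_id, event_type))

    def _write_claim(self, data):
        claim_id = f"C{len(self.content) + 1}"
        self.content[claim_id] = data
        return claim_id

    def create(self, flowfile_id, data):
        self.flowfiles[flowfile_id] = self._write_claim(data)
        self._record(flowfile_id, "CREATE")
        return flowfile_id

    def modify(self, flowfile_id, transform):
        # Read the original claim, write the result to a NEW claim, and point
        # a new FlowFile version at it. The original claim is left untouched.
        original = self.content[self.flowfiles[flowfile_id]]
        new_id = f"{flowfile_id}.1"
        self.flowfiles[new_id] = self._write_claim(transform(original))
        self._record(new_id, "MODIFY")
        return new_id


def toy_encrypt(data: bytes) -> bytes:
    # Stand-in for an encryption processor; XOR is NOT real encryption.
    return bytes(b ^ 0x5A for b in data)


repos = CopyOnWriteRepositories()
repos.create("F1", b"plaintext")
modified = repos.modify("F1", toy_encrypt)
```

Keeping C1 around is what allows the provenance record for F1 to still resolve to the original plaintext after the modified version exists.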
Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Demo
Community
Learn, Share at Birds of a Feather
IOT, STREAMING & DATA FLOW
Thursday, April 6, 5:50 pm, Room 5
Why NiFi?
→ Moving data is multifaceted in its challenges and these are present in different contexts at varying scopes
  – Think of our courier example and organizations like it: inter vs. intra, domestically, internationally
→ Provide common tooling and extensions that are commonly needed but be flexible for extension
  – Leverage existing libraries and expansive Java ecosystem for functionality
  – Allow organizations to integrate with their existing infrastructure
→ Empower folks managing your infrastructure to make changes and reason about issues that are occurring
  – Data Provenance to show context and data's journey
  – User Interface/Experience a key component
Learn more and join us!
Apache NiFi site: http://nifi.apache.org
Subproject MiNiFi site: http://nifi.apache.org/minifi/
Subscribe to and collaborate at [email protected]@nifi.apache.org
Submit Ideas or Issues: https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter: @apachenifi
Our Lab for Today
→ We will be exploring some examples to work through creating a dataflow with Apache NiFi
→ Use Case: An urban planning board is evaluating the need for a new highway, dependent on current traffic patterns, particularly as other roadwork initiatives are under way. Integrating live data poses a problem because traffic analysis has traditionally been done using historical, aggregated traffic counts. To improve traffic analysis, the city planner wants to leverage real-time data to get a deeper understanding of traffic patterns. NiFi was selected for this real-time data integration.
→ Labs are available at http://tinyurl.com/nificrashcourse