Dataflow with Apache NiFi/MiNiFi
Andy LoPresto - @yolopey
Intelligently Collecting Data at the Edge
Apache NiFi PMC
FOSDEM '17 - Brussels
04 Feb 2017
© Hortonworks Inc. 2011–2016. All Rights Reserved
Agenda
⬢ What is dataflow and what are the challenges?
⬢ Apache NiFi
⬢ Apache MiNiFi
⬢ Architecture
⬢ Community
Let's Connect A to B
Producers A.K.A. Things: Anything AND Everything
Internet!
Consumers
• User
• Storage
• System
• … More Things
Moving data effectively is hard
Standards: http://xkcd.com/927/
Why is moving data effectively hard?
⬢ Standards
⬢ Formats
⬢ “Exactly Once” Delivery
⬢ Protocols
⬢ Veracity of Information
⬢ Validity of Information
⬢ Ensuring Security
⬢ Overcoming Security
⬢ Compliance
⬢ Schemas
⬢ Consumers Change
⬢ Credential Management
⬢ “That [person|team|group]”
⬢ Network*
⬢ “Exactly Once” Delivery
Connecting A to B to C
Easy enough with Bash scripts, Ruby/Python/Groovy, etc.
Log files
SQL
Big Data
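The simple case on this slide really is a few lines of script. A minimal sketch, assuming a hypothetical log line format of `LEVEL message` and an in-memory SQLite table named `events` (neither comes from the talk):

```python
import sqlite3

def load_log_lines(lines, conn):
    """Parse 'LEVEL message' lines and insert them into the events table."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (level TEXT, message TEXT)")
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        level, _, message = line.partition(" ")
        rows.append((level, message))
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)

# Hypothetical sample data standing in for a real log file.
conn = sqlite3.connect(":memory:")
sample = ["INFO parcel scanned", "", "ERROR scanner offline"]
inserted = load_log_lines(sample, conn)
```

Nothing in this script deals with retries, delivery guarantees, or a changing schema, which are the kinds of challenges listed on the earlier slide.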
Let's Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Let's consider the needs of a courier service
Physical Store
Gateway Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Core Data Center at HQ
Server Cluster
On Delivery Routes
Trucks
Deliverers
Icon credits: Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/ · Deliverer: Rigo Peter, https://thenounproject.com/rigo/ · Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/ · Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/
Great! I am collecting all this data! Let's use it!
Finding our needles in the haystack
Physical Store
Gateway Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Kafka
Core Data Center at HQ
Server Cluster
Others
Storm / Spark / Flink / Apex
Kafka
Storm / Spark / Flink / Apex
On Delivery Routes
Trucks
Deliverers
Let's Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Raise your hand if you want to maintain Python scripts for the rest of your life
Apache NiFi Key Features
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow-specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable, multi-tenant security
• Designed for extension
• Clustering
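Two of the features above, backpressure and prioritized queuing, can be illustrated together. The sketch below is not NiFi's implementation: the `Connection` class, its method names, and the numeric priorities are hypothetical, and a full queue here simply refuses new items.

```python
import heapq
import itertools

class Connection:
    """A queue between two processors with a backpressure threshold (hypothetical API)."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def is_full(self):
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, item, priority):
        """Enqueue unless backpressure applies; return True if the item was accepted."""
        if self.is_full():
            return False  # the upstream side should stop producing for now
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def poll(self):
        """Return the item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

queue = Connection(backpressure_threshold=3)
accepted = [queue.offer(name, prio) for name, prio in
            [("routine-a", 5), ("urgent", 1), ("routine-b", 5), ("overflow", 5)]]
drained = [queue.poll() for _ in range(3)]
```

The fourth item is refused because the threshold of three is reached, and the urgent item leaves the queue first even though it arrived second.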
NiFi is based on Flow Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
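The roles in this table can be mimicked in a few lines. This is a toy sketch of the FBP idea, not NiFi's API: the class names reuse the NiFi terms for readability, the information packets are plain strings, and the scheduling loop is an assumption made for illustration.

```python
from collections import deque

class Connection:
    """Bounded buffer linking two processors."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def has_room(self):
        return len(self.queue) < self.capacity

class Processor:
    """Black box: takes one packet from its source, transforms it, emits to its sink."""
    def __init__(self, fn, source, sink):
        self.fn, self.source, self.sink = fn, source, sink

    def can_run(self):
        return bool(self.source.queue) and self.sink.has_room()

    def run_once(self):
        self.sink.queue.append(self.fn(self.source.queue.popleft()))

class FlowController:
    """Scheduler: knows the processors and runs whichever can make progress."""
    def __init__(self, processors):
        self.processors = processors

    def run(self):
        progressed = True
        while progressed:
            progressed = False
            for processor in self.processors:
                if processor.can_run():
                    processor.run_once()
                    progressed = True

inbound, middle, outbound = Connection(10), Connection(1), Connection(10)
inbound.queue.extend(["a", "b", "c"])
flow = FlowController([
    Processor(str.upper, inbound, middle),
    Processor(lambda s: s + "!", middle, outbound),
])
flow.run()
result = list(outbound.queue)
```

The capacity-one connection in the middle lets the two processors proceed at differing rates without either one knowing about the other.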
FlowFiles & Data Agnosticism
⬢ NiFi is data agnostic!
⬢ But, NiFi was designed understanding that users can care about specifics and provides tooling to interact with specific formats, protocols, etc.
ISO 8601 - http://xkcd.com/1179/
Robustness principle
“Be conservative in what you do, be liberal in what you accept from others”
FlowFiles are like HTTP data

HTTP Data
Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Content:
Hello world!

FlowFile
Header:
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
FlowFile Attribute Map Content
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
Content:
Binary Content*
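The analogy can be made concrete by splitting an HTTP response into the same two parts. A minimal sketch, assuming a hypothetical `FlowFile` dataclass and a made-up attribute key `http.status.line`; NiFi's own FlowFile type is a Java interface and is not shown here.

```python
from dataclasses import dataclass, field

@dataclass
class FlowFile:
    """Header/content split: a map of string attributes plus opaque bytes."""
    attributes: dict = field(default_factory=dict)
    content: bytes = b""

def from_http_response(raw: bytes) -> FlowFile:
    """Turn HTTP headers into attributes and the HTTP body into content."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    attributes = {"http.status.line": lines[0]}  # hypothetical attribute name
    for line in lines[1:]:
        name, _, value = line.partition(":")
        attributes[name.strip().lower()] = value.strip()
    return FlowFile(attributes, body)

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Length: 13\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"Hello world!\n")
flowfile = from_http_response(raw)
```

The content is never interpreted by this code, which mirrors the data-agnostic point made earlier: only the attributes are inspected.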
Data Provenance
▪ Constrained
▪ High-latency
▪ Localized context

▪ Hybrid – cloud/on-premises
▪ Low-latency
▪ Global context

Origin – attribution
Replay – recovery
Evolution of topologies
Long retention
Types of Lineage
• Event
• Configuration
Deeper Ecosystem Integration: 180+ Processors
Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, RouteContent, RouteContext, RouteText, ControlRate, DistributeLoad, GenerateTableFetch, JoltTransformJSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch
HTTP, Syslog, HTML, Image, HL7, FTP, UDP, XML, SFTP, AMQP, WebSocket
All Apache project logos are trademarks of the ASF and the respective projects.
Apache NiFi Subproject: MiNiFi
⬢ Let me get the key parts of NiFi close to where data begins and provide bidirectional communication
⬢ NiFi lives in the data center: give it an enterprise server or a cluster of them
⬢ MiNiFi lives as close as possible to where data is born and is a guest on that device or system
⬢ IoT
⬢ Connected car
⬢ Legacy hardware
⬢ S2S client libs
Why build MiNiFi?
⬢ NiFi is big
  ⬢ 1.1.0 release is 726 MB compressed
  ⬢ Can be modified to run in restricted environments, but requires manual surgery
  ⬢ Provides UI, provenance query, etc.
  ⬢ Runs on dedicated machines/clusters; “owns the box”
⬢ MiNiFi lives at the edge
  ⬢ No UI
  ⬢ 0.1.0 Java binary is 45 MB, C++ binary is 746 KB
  ⬢ “Good guest”
What does MiNiFi provide?
⬢ Data tagging/provenance
⬢ Governance from edge (geopolitical restrictions)
⬢ Security (encryption, certificate-based authentication)
⬢ Low latency (immediate reactions & decision-making)
Connected Car Reference Platform Box
Tuner + DSRC Card
Connectivity Card
MiNiFi on a Connected Car
Comprehension
Collection
CAN Bus
Gateway
MCU, MCU, MCU
Ethernet / Ethernet AVB
Local Interconnect Network
Yet to be established protocol
Listen Ethernet, Listen LIN, Listen CAN, Listen <>
Parse CAN, Parse Ethernet, Parse LIN, Parse <>
Processing/Synthesis
Route
Transmit, Execute, Prioritize, Filter
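The listen, parse, and route stages in this diagram can be sketched for the CAN case. This is an illustration only: it assumes frames arrive in the 16-byte Linux SocketCAN `can_frame` layout (the talk does not say how frames are captured), and the CAN IDs and routing rules are made up.

```python
import struct

# SocketCAN can_frame layout: 32-bit ID, 1-byte data length, 3 padding bytes, 8 data bytes.
CAN_FRAME = struct.Struct("=IB3x8s")

def parse_can(raw):
    """Parse stage: raw bytes to a small dict, keeping the 11-bit standard ID."""
    can_id, length, data = CAN_FRAME.unpack(raw)
    return {"id": can_id & 0x7FF, "data": data[:length]}

HIGH_PRIORITY_IDS = {0x101}  # hypothetical ID that must be sent right away
IGNORED_IDS = {0x7FF}        # hypothetical ID filtered out at the edge

def route(frame):
    """Route stage: decide at the edge what happens to each frame."""
    if frame["id"] in IGNORED_IDS:
        return "drop"
    if frame["id"] in HIGH_PRIORITY_IDS:
        return "transmit-now"
    return "batch"

# Stand-in for the listen stage: three frames packed by hand.
raw_frames = [
    CAN_FRAME.pack(0x101, 2, b"\x01\x02"),
    CAN_FRAME.pack(0x7FF, 0, b""),
    CAN_FRAME.pack(0x200, 1, b"\x09"),
]
routes = [route(parse_can(raw)) for raw in raw_frames]
```

Making the filter and priority decisions next to the bus is what keeps uninteresting frames off the constrained uplink.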
MiNiFi Feature Proposals
⬢ Flow Versioning
⬢ Develop flows for class of MiNiFi instances
⬢ Command & Control (C2) API
⬢ File Change Ingestor
⬢ REST API Ingestor
⬢ Pull HTTP Ingestor
Let's revisit our courier service from the perspective of NiFi
Physical Store
Gateway Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Kafka
Core Data Center at HQ
Server Cluster
Others
Storm / Spark / Flink / Apex
Kafka
Storm / Spark / Flink / Apex
On Delivery Routes
Trucks
Deliverers
Client Libraries
Client Libraries
MiNiFi
MiNiFi
NiFi, NiFi, NiFi, NiFi, NiFi, NiFi
Client Libraries
Apache NiFi Managed Dataflow
SOURCES | REGIONAL INFRASTRUCTURE | CORE INFRASTRUCTURE
Extension / Integration Points
NiFi Term | Description
FlowFile Processor | Push/Pull behavior. Custom UI
Reporting Task | Used to push data from NiFi to some external service (metrics, provenance, etc.)
Controller Service | Used to enable reusable components / shared services throughout the flow
REST API | Allows clients to connect to pull information, change behavior, etc.
Architecture
OS/Host
JVM
Flow Controller
Web Server
Processor 1
Extension N
FlowFile Repository
Content Repository
Provenance Repository
Local Storage
Standalone
Cluster
NiFi Architecture – Repositories – Pass by reference
Excerpt of demo flow… What's happening inside the repositories…

BEFORE
FlowFile: F1 → C1
Content: C1
Provenance: P1 → F1

AFTER
FlowFile: F1 → C1, F2 → C1
Content: C1
Provenance: P3 → F2 – Clone (F1), P2 → F1 – Route, P1 → F1 – Create
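The before/after state on this slide can be replayed with three small in-memory repositories. A sketch under stated assumptions: the `Repositories` class and its method names are hypothetical, and real NiFi repositories are persistent stores, not Python dicts.

```python
class Repositories:
    """Toy stand-ins for the FlowFile, content, and provenance repositories."""

    def __init__(self):
        self.content = {}     # claim id -> bytes
        self.flowfiles = {}   # FlowFile id -> claim id (a reference, not a copy)
        self.provenance = []  # (event type, FlowFile id), oldest first

    def create(self, flowfile_id, claim_id, data):
        self.content[claim_id] = data
        self.flowfiles[flowfile_id] = claim_id
        self.provenance.append(("CREATE", flowfile_id))

    def route(self, flowfile_id):
        # Routing moves the reference between queues; the content is untouched.
        self.provenance.append(("ROUTE", flowfile_id))

    def clone(self, parent_id, child_id):
        # The clone points at the same claim, so no bytes are copied.
        self.flowfiles[child_id] = self.flowfiles[parent_id]
        self.provenance.append(("CLONE", child_id))

repos = Repositories()
repos.create("F1", "C1", b"parcel record")  # P1
repos.route("F1")                           # P2
repos.clone("F1", "F2")                     # P3
```

After the clone there are two FlowFile entries and three provenance events, but still only one piece of content.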
NiFi Architecture – Repositories – Copy on Write
Excerpt of demo flow… What's happening inside the repositories…

BEFORE
FlowFile: F1 → C1
Content: C1
Provenance: P1 → F1 - CREATE

AFTER
FlowFile: F1 → C1, F1.1 → C2
Content: C2 (encrypted), C1 (plaintext)
Provenance: P2 → F1.1 - MODIFY, P1 → F1 - CREATE
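Copy on write means a modification writes a new content claim and leaves the old one alone. A minimal sketch of that idea; the function names are hypothetical and `toy_encrypt` is a reversible XOR used only as a placeholder, not real encryption.

```python
content = {"C1": b"plaintext parcel record"}  # claim id -> bytes
flowfiles = {"F1": "C1"}                      # FlowFile id -> claim id
provenance = [("CREATE", "F1")]

def toy_encrypt(data):
    """Placeholder transform (XOR with a fixed byte); NOT real encryption."""
    return bytes(b ^ 0x5A for b in data)

def modify(flowfile_id, new_flowfile_id, new_claim_id, transform):
    """Write the transformed bytes under a new claim instead of overwriting."""
    original = content[flowfiles[flowfile_id]]
    content[new_claim_id] = transform(original)
    flowfiles[new_flowfile_id] = new_claim_id
    provenance.append(("MODIFY", new_flowfile_id))

modify("F1", "F1.1", "C2", toy_encrypt)
```

Because C1 is still present after the modification, the earlier version of the data stays available, which is what makes replay from a provenance event plausible.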
Why NiFi?
⬢ Moving data is multifaceted in its challenges and these are present in different contexts at varying scopes
  – Think of our courier example and organizations like it: inter vs intra, domestically, internationally
⬢ Provide common tooling and extensions that are needed but be flexible for extension
  – Leverage existing libraries and expansive Java ecosystem for functionality
  – Allow organizations to integrate with their existing infrastructure
⬢ Empower folks managing your infrastructure to make changes and reason about issues that are occurring
  – Data Provenance to show context and data's journey
  – User Interface/Experience a key component
Learn more and join us
Apache NiFi site https://nifi.apache.org
Subproject MiNiFi site https://nifi.apache.org/minifi/
Subscribe to and collaborate at [email protected] [email protected]
Submit Ideas or Issues https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter @apachenifi
Thank You
I'm sticking around for discussions/questions
@yolopey / @[email protected]:70ECB3E598A65A3FD3C4BACE3C6EF65B2F7DEF69