Apache NiFi: Ingesting Enterprise Data at Scale
TRANSCRIPT
1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Timothy Spann, 2017. Future of Data - Princeton Meetup, hosted by TRAC Intermodal
Apache NiFi: Ingesting Enterprise Data @ Scale
DataWorks Summit / Hadoop Summit, June 13–15, 2017, San Jose McEnery Convention Center
Register now and save $1,000
dataworkssummit.com
Agenda
• Apache NiFi RDBMS, EDI, JSON, CSV, Sensors
• EDI
• https://community.hortonworks.com/content/kbentry/59975/ingesting-edi-into-hdfs-using-hdf-20.html
• https://github.com/tspannhw/EnterpriseNIFI
HDF 2.1 – Data in Motion Platform
[Diagram. Labels: Flow Management; Flow Management + Stream Processing; Data in Motion; Data at Rest; IoT Data Sources; MiNiFi; NiFi; Kafka; Storm; Others; AWS; Azure; Google Cloud; Hadoop. Enterprise Services: Ambari, Ranger, Other services]
Actionable Insights Architecture
[Diagram. Labels: Ingestion; Simple Event Processing; Engine; Complex Event Processing; Destination; Data Bus; Build Predictive Model From Historical Data; Deploy Predictive Model For Real-time Insights; Perishable Insights; Historical Insights]
Actionable Intelligence Transforms Industrial, Transportation & Utilities
Data sources: Asset Data, Customer Surveys, Weather & Environmental, Service Fleet GPS Data, Smart Meter Streams, Commodity Prices, Social Media, GIS Data, SCADA, Outage Histories, CIS Records, EDW
Use cases: Revenue Protection, Single View of Customer, Predictive Equipment Maintenance, Conservation Voltage Reduction, Commodity Trading
What is Apache NiFi?
• Created to address the challenges of global enterprise dataflow
• Key features:
– Visual Command and Control
– Data Lineage (Provenance)
– Data Prioritization
– Data Buffering / Back-Pressure
– Control Latency vs. Throughput
– Secure Control Plane / Data Plane
– Scale-Out Clustering
– Extensibility
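The buffering and back-pressure feature listed above can be illustrated with a small sketch. This is not NiFi code: `BoundedConnection`, `offer`, `poll`, and the threshold of 3 are hypothetical names and values chosen for illustration. It assumes back-pressure means that an upstream producer is refused once a queue reaches a configured object count, and can resume once a consumer drains it.

```python
from collections import deque


class BoundedConnection:
    """Hypothetical queue between two steps with an object-count threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.items = deque()

    def is_full(self):
        return len(self.items) >= self.threshold

    def offer(self, item):
        """Accept the item unless back-pressure is engaged."""
        if self.is_full():
            return False
        self.items.append(item)
        return True

    def poll(self):
        """Hand the oldest item to the consumer, or None if empty."""
        return self.items.popleft() if self.items else None


def demo():
    conn = BoundedConnection(threshold=3)
    accepted = [conn.offer(n) for n in range(5)]  # producer tries 5 items
    first = conn.poll()                           # consumer drains one
    retry = conn.offer(99)                        # producer can resume
    return accepted, first, retry, len(conn.items)
```

Calling `demo()` shows the last two of five offers being refused, then one more being accepted after the consumer takes an item. Refusing instead of dropping is the design choice that matters here: nothing is lost, and the producer simply waits.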
Apache NiFi
What is Apache NiFi used for?
• Reliable and secure transfer of data between systems
• Delivery of data from sources to analytic platforms
• Enrichment and preparation of data:
– Conversion between formats
– Extraction / Parsing
– Routing decisions
What is Apache NiFi NOT used for?
• Distributed Computation
• Complex Event Processing
• Complex Rolling Window Operations
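The three enrichment and preparation steps named above (conversion, extraction, routing) can be sketched in plain Python. In NiFi these steps would be configured as processors in the UI rather than written by hand. Everything here is an assumption for illustration: the function names, the `sensor.id` and `sensor.temp` attribute keys, the `id` and `temp` CSV columns, and the 30.0 threshold.

```python
import csv
import io
import json


def csv_to_json_records(csv_text):
    """Conversion between formats: one JSON string per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row, sort_keys=True) for row in reader]


def extract_attributes(json_record):
    """Extraction/parsing: pull selected fields out of the content."""
    parsed = json.loads(json_record)
    return {"sensor.id": parsed["id"], "sensor.temp": float(parsed["temp"])}


def route(attributes, limit=30.0):
    """Routing decision: pick a named route from the attributes."""
    return "alert" if attributes["sensor.temp"] > limit else "normal"


def run(csv_text):
    """Chain the three steps and collect sensor ids per route."""
    routed = {"alert": [], "normal": []}
    for record in csv_to_json_records(csv_text):
        attrs = extract_attributes(record)
        routed[route(attrs)].append(attrs["sensor.id"])
    return routed
```

For example, `run("id,temp\na1,21.5\nb2,35.0\n")` places `b2` on the `alert` route and `a1` on the `normal` route. Each step looks at one record at a time, which is consistent with the slide's point that windowed or distributed computation belongs elsewhere.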
NiFi Terminology
FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)
Processor
• Performs the work, can access FlowFiles
Connection
• Links between processors
• Queues that can be dynamically prioritized
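The three terms above can be modeled in a few lines of Python to make their relationship concrete. This is a toy model, not the NiFi API: the class and method names (`FlowFile`, `Connection`, `set_prioritizer`, `uppercase_processor`) and the `priority` and `size` attribute keys are hypothetical. It assumes "dynamically prioritized" means the ordering rule on a queue can be changed while items are already waiting.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: content bytes plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Queue between processors; the prioritizer can be swapped at runtime."""

    def __init__(self, prioritizer):
        self.prioritizer = prioritizer
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def put(self, flowfile):
        key = self.prioritizer(flowfile)
        heapq.heappush(self._heap, (key, next(self._counter), flowfile))

    def set_prioritizer(self, prioritizer):
        """Re-rank everything already queued under the new prioritizer."""
        self.prioritizer = prioritizer
        queued = [e[2] for e in sorted(self._heap, key=lambda e: e[:2])]
        self._heap = []
        for flowfile in queued:
            self.put(flowfile)

    def get(self):
        """Remove and return the FlowFile with the lowest key."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def uppercase_processor(flowfile):
    """Processor: does the work on a FlowFile and returns a new one."""
    attrs = dict(flowfile.attributes, processed="true")
    return FlowFile(flowfile.content.upper(), attrs)
```

A connection created with `Connection(lambda ff: int(ff.attributes["priority"]))` hands out the lowest priority number first. Calling `set_prioritizer(lambda ff: -int(ff.attributes["size"]))` afterwards reorders the waiting FlowFiles so the largest comes out first.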
Contact:
[email protected]/futureofdata-princeton
community.hortonworks.com/users/9304/tspann.html
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories
Community Engagement
Participate now at: community.hortonworks.com
4,000+ Registered Users
10,000+ Answers
15,000+ Technical Assets
One Website!