You Can't Search Without Data

Post on 22-Jan-2018

You Can’t Search Without Data

Bryan Bende – Staff Software Engineer @ Hortonworks

NYC Solr/Lucene Meetup – December 7th 2017

© Hortonworks Inc. 2011–2016. All Rights Reserved

Agenda

→ The Problem
→ Apache NiFi Overview
→ Integration between NiFi & Solr
→ Recent & Future Work
→ Demo Cool Stuff!
→ Q&A

About Me

→ Staff Software Engineer @ Hortonworks
→ Apache NiFi PMC & Committer
→ Contributed Solr processors in March 2015 – https://issues.apache.org/jira/browse/NIFI-461
→ bbende@hortonworks.com / Twitter @bbende / bryanbende.com

The Problem

It starts out so simple…

Team 2

Hey! We have some important data to send you!

Cool! Your data is really important to us!

Team 1

This should be easy right?...

But what about formats & protocols?

Team 2

We can publish Avro records to a Kafka topic, does that work?

Oh, well we have a REST service that accepts JSON…

Team 1

And what about security & authentication?

Team 2

Hmm what about security? We can authenticate via Kerberos

Sorry, we only support 2-Way TLS with certificates

Team 1

And what about all these devices at the edge?

We also need to grab data from all these devices, how are we going to do that?

Team 2

Wouldn’t it be nice if there was a tool that could help these teams?

Enter Apache NiFi…

Apache NiFi

• Created to address the challenges of global enterprise dataflow
• Key features:
  – Visual Command and Control
  – Data Lineage (Provenance)
  – Data Prioritization
  – Data Buffering/Back-Pressure
  – Control Latency vs. Throughput
  – Secure Control Plane/Data Plane
  – Scale Out Clustering
  – Extensibility

NiFi Core Concepts

FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.

Visual Command & Control

• Drag & drop processors to build a flow
• Start, stop, & configure components in real-time
• View errors & corresponding messages
• View statistics & health of the dataflow
• Create shareable templates of common flows

Provenance/Lineage

• Tracks data at each point as it flows through the system
• Records, indexes, and makes events available for display
• Handles fan-in/fan-out, i.e. merging and splitting data
• View attributes and content at given points in time

Prioritization

• Configure a prioritizer per connection
• Determine what is important for your data – time based, arrival order, importance of a data set
• Funnel many connections down to a single connection to prioritize across data sets
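
As a rough illustration of a per-connection prioritizer, the sketch below models a connection as a queue ordered by a pluggable key function. The names `FlowFile`, `Connection`, `oldest_first`, and `by_priority_attribute` are hypothetical and only mirror the idea on this slide; NiFi's real prioritizers are Java classes selected on a connection in the UI, and this is not how they are implemented.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical stand-in for a FlowFile: a name, an entry time, and attributes."""
    name: str
    entry_time: float
    attributes: dict = field(default_factory=dict)


def oldest_first(ff):
    # Time based: FlowFiles that entered the flow earlier are dequeued first.
    return ff.entry_time


def by_priority_attribute(ff):
    # Importance based: a lower "priority" attribute value is dequeued first.
    # FlowFiles without the attribute sort last (an assumption of this sketch).
    return int(ff.attributes.get("priority", 1_000_000))


class Connection:
    """A queue between two processors, ordered by the configured prioritizer."""

    def __init__(self, prioritizer):
        self._prioritizer = prioritizer
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, ff):
        heapq.heappush(self._heap, (self._prioritizer(ff), next(self._arrival), ff))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]


conn = Connection(by_priority_attribute)
conn.enqueue(FlowFile("bulk.csv", 1.0, {"priority": "9"}))
conn.enqueue(FlowFile("alert.json", 2.0, {"priority": "1"}))
conn.enqueue(FlowFile("misc.txt", 3.0))
order = [conn.dequeue().name for _ in range(3)]
print(order)
```

Swapping `by_priority_attribute` for `oldest_first` changes the ordering policy without touching the queue itself, which is the point of configuring the prioritizer per connection.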

Back-Pressure

• Configure back-pressure per connection
• Based on number of FlowFiles or total size of FlowFiles
• Upstream processor no longer scheduled to run until below threshold
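
The back-pressure behavior described above can be sketched as a queue with two thresholds, one on object count and one on total size. Everything here (`Connection`, `run_upstream_once`, the threshold values) is a hypothetical model for illustration, not NiFi's scheduler code.

```python
from collections import deque


class Connection:
    """Hypothetical queue with back-pressure thresholds on count and total size."""

    def __init__(self, max_count=10_000, max_bytes=1_000_000_000):
        self.max_count = max_count
        self.max_bytes = max_bytes
        self._queue = deque()
        self._queued_bytes = 0

    def offer(self, content: bytes):
        self._queue.append(content)
        self._queued_bytes += len(content)

    def poll(self) -> bytes:
        content = self._queue.popleft()
        self._queued_bytes -= len(content)
        return content

    def back_pressure_applied(self) -> bool:
        # Reaching either threshold is enough to apply back-pressure.
        return (len(self._queue) >= self.max_count
                or self._queued_bytes >= self.max_bytes)


def run_upstream_once(connection, produce):
    """Run the upstream processor only while the queue is below its thresholds."""
    if connection.back_pressure_applied():
        return False  # not scheduled this round
    connection.offer(produce())
    return True


conn = Connection(max_count=2, max_bytes=1024)
results = [run_upstream_once(conn, lambda: b"record") for _ in range(3)]
conn.poll()  # downstream consumes one FlowFile, dropping below the threshold
results.append(run_upstream_once(conn, lambda: b"record"))
print(results)
```

The third attempt is skipped because the queue already holds two FlowFiles; once the downstream side drains one, the upstream processor runs again.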

Latency vs. Throughput

• Choose between lower latency, or higher throughput on each processor
• Higher throughput allows framework to batch together all operations for the selected amount of time for improved performance
• Processor developer determines whether to support this by using @SupportsBatching annotation

Architecture - Standalone

[Diagram: OS/Host running a JVM with Web Server, Flow Controller, Processor 1 … Extension N, and the FlowFile, Content, and Provenance Repositories on Local Storage]

→ FlowFile Repository
  – Write Ahead Log
  – State of every FlowFile
  – Pointers to content repository (pass-by-reference)
→ Content Repository
  – FlowFile content
  – Copy-on-write
→ Provenance Repository
  – Write Ahead Log + Lucene Indexes
  – Store & search lineage events

Architecture - Cluster

[Diagram: three nodes, each an OS/Host running a JVM with Web Server, Flow Controller, Processor 1 … Extension N, and the FlowFile, Content, and Provenance Repositories on Local Storage, plus ZooKeeper]

→ Same dataflow on each node, data partitioned across cluster
→ Access the UI from any node
→ ZooKeeper for auto-election of Cluster Coordinator & Primary Node
→ Cluster Coordinator receives heartbeats from other nodes, manages joining/disconnecting
→ Primary Node for scheduling processors on a single node

NiFi & Solr

NiFi Solr Processors

→ Support Solr Cloud and stand-alone Solr instances
→ Leverage SolrJ (CloudSolrClient & HttpSolrClient)
→ GetSolr – Extract new documents
→ PutSolrContentStream – Stream FlowFile content to an update handler

PutSolrContentStream

→ Choose Solr Type – Cloud or Standard
→ Specify ZooKeeper hosts, or the Solr URL with core
→ Specify the Solr path for the update handler
→ Dynamic Properties sent as key/value pairs on request
→ Relationships for success, failure, and connection failure
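
To make these settings concrete, here is a minimal sketch of how a standalone Solr URL with core, an update handler path, and dynamic properties could combine into a single request URL. The function `build_update_url`, the host, and the core name `users` are assumptions for illustration; the processor itself goes through SolrJ rather than building URLs this way.

```python
from urllib.parse import urlencode


def build_update_url(solr_location, content_stream_path, dynamic_properties):
    """Combine the Solr URL (with core), the update handler path, and request parameters."""
    base = solr_location.rstrip("/") + "/" + content_stream_path.lstrip("/")
    if not dynamic_properties:
        return base
    return base + "?" + urlencode(dynamic_properties)


url = build_update_url(
    "http://localhost:8983/solr/users",  # hypothetical standalone Solr URL with core
    "/update/json/docs",                 # update handler path for JSON documents
    {"commitWithin": "1000"},            # dynamic property sent as a key/value pair
)
print(url)
```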

GetSolr

→ Incrementally extract new documents
→ Main query is *:*, Solr Query is optional filter query
→ Date Field used as filter query, from last execution or initial value
→ Sorted by date field and unique key
→ Cursor mark used behind the scenes
→ Specify return fields, or all if blank
→ Output Solr XML, or Records
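
The bullets above map fairly directly onto standard Solr request parameters (`q`, `fq`, `sort`, `cursorMark`, `fl`, `rows`). The sketch below assembles them for one request. The helper `build_get_solr_params`, the field names `registered` and `id`, and the exclusive lower bound on the date range are assumptions, not the processor's actual code.

```python
from urllib.parse import urlencode


def build_get_solr_params(date_field, unique_key, last_run, filter_query=None,
                          return_fields=None, cursor_mark="*", rows=100):
    """Assemble request parameters for an incremental, cursor-based extraction."""
    params = [
        # The main query matches everything.
        ("q", "*:*"),
        # Only documents newer than the last execution (or the initial value).
        ("fq", f"{date_field}:{{{last_run} TO *]"),
        # Cursor paging needs a sort that includes the unique key as a tie-breaker.
        ("sort", f"{date_field} asc, {unique_key} asc"),
        # "*" starts a new cursor; later requests pass the returned nextCursorMark.
        ("cursorMark", cursor_mark),
        ("rows", str(rows)),
    ]
    if filter_query:
        params.append(("fq", filter_query))  # optional user-supplied filter query
    if return_fields:
        params.append(("fl", ",".join(return_fields)))  # omitted means all fields
    return params


params = build_get_solr_params(
    date_field="registered", unique_key="id",
    last_run="2017-12-01T00:00:00Z", return_fields=["id", "email"],
)
print(urlencode(params))
```

Each follow-up request would reuse the same parameters with `cursor_mark` set to the `nextCursorMark` value from the previous response, stopping when the mark no longer changes.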

Interacting With a Secure Solr

→ Basic Auth
  – Provide username/password
→ Kerberos
  – Set JAAS system property in bootstrap.conf
  – Provide name of JAAS entry for processor to use
→ TLS/SSL
  – Provide an SSLContextService
  – One-way TLS with Truststore only
  – Two-way TLS with Keystore + Truststore

Recent & Future Work

Problem – Conversion Between Data Formats

→ Specialized processors to operate on different data types
→ Sometimes missing conversions
→ Sometimes missing a specific function for a data type
→ Sometimes implemented with different libraries causing inconsistencies

Solution – Record Processing

→ Introduce the concept of a “record”
  – Released in Apache NiFi 1.2.0 (May 2017), improvements in 1.3.0 and 1.4.0
→ Centralize the logic for reading/writing records into controller services
  – Readers/Writers for CSV, JSON, Avro, etc.
→ Provide standard processors that operate on records
  – ConvertRecord, QueryRecord, PartitionRecord, UpdateRecord, etc.
→ Provide integration with schema registries
  – Local Schema Registry, Hortonworks Schema Registry, Confluent Schema Registry
→ Can still handle arbitrary data, but process records when appropriate

Problem – Variable Handling

→ Need to parametrize values in the flow per environment
  – Connection strings, URLs, file system paths, etc.
→ Can set variables in bootstrap.conf
  – -Dmy.var=foo
→ Can set a properties file in nifi.properties
  – nifi.variable.registry.properties=production.properties
→ Both require command line access
→ Both require restart to pick up changes

Solution – First Class Variable Registry

→ Variables associated with a process group, released in 1.4.0
→ Right-click on canvas to view variables for current group
→ Hierarchical order of precedence, resolve closest reference to component
→ Editing variables automatically restarts any components referencing the variables
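
The hierarchical order of precedence can be pictured as a lookup that starts at the component's own process group and walks up through its parents until a match is found. This is a minimal sketch with a hypothetical `ProcessGroup` class and made-up variable names, not NiFi's implementation.

```python
class ProcessGroup:
    """Hypothetical process group holding its own variables and a link to its parent."""

    def __init__(self, name, parent=None, variables=None):
        self.name = name
        self.parent = parent
        self.variables = dict(variables or {})

    def resolve(self, variable_name):
        # Walk from the closest group up toward the root; the first match wins.
        group = self
        while group is not None:
            if variable_name in group.variables:
                return group.variables[variable_name]
            group = group.parent
        return None  # unresolved in this sketch


root = ProcessGroup("root", variables={"solr.collection": "users-dev",
                                       "zk.hosts": "localhost:2181"})
ingest = ProcessGroup("ingest", parent=root,
                      variables={"solr.collection": "users-prod"})
print(ingest.resolve("solr.collection"), ingest.resolve("zk.hosts"))
```

A component inside `ingest` sees the group's own `solr.collection` value, while `zk.hosts` falls through to the root group.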

Problem – How do I deploy my flow?

→ Most organizations want the classic development lifecycle (dev -> int -> prod)
→ Can copy flow.xml.gz between environments
  – Requires copying entire dataflow
  – Can’t tell what changed, hard to diff if you put in version control
  – Requires all environments use the same encryption key for sensitive properties
→ Can make templates for portions of the flow
  – Script creation of template and deployment to next environment
  – Requires stopping flow and removing components, then re-instantiating template
  – No easy way to see changes, hard to roll back

Solution – NiFi Registry

→ DISCLAIMER - UNDER DEVELOPMENT & NOT RELEASED YET!
→ Complementary application, sub-project of Apache NiFi
  – https://github.com/apache/nifi-registry
  – https://issues.apache.org/jira/projects/NIFIREG
→ Central location for storage/management of shared resources across NiFi instances
→ Initial capability to store and retrieve “versioned flows”
→ A versioned flow is a snapshot of a process group at a given point in time
→ Potentially store extensions, shared data sets, and more in the future

DEMO!!

Example Scenario

→ User data
  – https://randomuser.me
→ Initially in CSV format
  – name.title,name.first,name.last,email,registered
  – mr,dennis,reyes,dennis.reyes@example.com,2012-04-10 01:54:19
  – miss,carole,gomez,carole.gomez@example.com,2002-12-17 22:15:49
→ Requirements
  – Convert CSV to JSON
  – Add a full_name field with first name + last name
  – Add a gender field based on title (i.e. if title == mr then MALE)
  – Ingest to different Solr collections depending on environment
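
Outside of NiFi, the first three requirements amount to a small record transformation, sketched below with the two sample rows from this slide. This is not the flow shown in the demo. Only the `mr` to `MALE` rule comes from the slide; the other title mappings and the `UNKNOWN` fallback are assumptions, and the function name `convert` is hypothetical.

```python
import csv
import io
import json

CSV_INPUT = """name.title,name.first,name.last,email,registered
mr,dennis,reyes,dennis.reyes@example.com,2012-04-10 01:54:19
miss,carole,gomez,carole.gomez@example.com,2002-12-17 22:15:49
"""

# Only "mr" -> "MALE" is stated on the slide; the remaining entries are assumptions.
GENDER_BY_TITLE = {"mr": "MALE", "miss": "FEMALE", "mrs": "FEMALE", "ms": "FEMALE"}


def convert(csv_text):
    """Read CSV rows as records and add the full_name and gender fields."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = dict(row)
        record["full_name"] = f"{row['name.first']} {row['name.last']}"
        record["gender"] = GENDER_BY_TITLE.get(row["name.title"], "UNKNOWN")
        records.append(record)
    return records


records = convert(CSV_INPUT)
print(json.dumps(records, indent=2))
```

The last requirement, choosing a Solr collection per environment, is a configuration concern rather than a transformation, which is where a per-environment variable would come in.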

Questions?

Learn more and join us!

Apache NiFi site: http://nifi.apache.org

Subscribe to and collaborate at: dev@nifi.apache.org, users@nifi.apache.org

Submit Ideas or Issues: https://issues.apache.org/jira/browse/NIFI

Follow us on Twitter: @apachenifi

Thank you!
