CWIN17 Frankfurt / talend_nlp
![Page 1: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/1.jpg)
© 2017 Talend Inc
CWIN17 - Natural Language Processing
Armin Wallrab | Director PreSales Central & [email protected]
![Page 2: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/2.jpg)
Natural Language Processing
• What is natural language processing?
• Text tokenization
• Sentence splitting
• Part-of-Speech tagging
  http://www.clips.ua.ac.be/pages/mbsp-tags
• Syntactic parsing
• Shallow parsing (aka chunking)
• Named Entity Recognition
• Co-reference resolution
• Dependency parsing
• Sentiment analysis
Play at http://nlp.stanford.edu:8080/corenlp/process
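The first two tasks on this slide, sentence splitting and text tokenization, can be illustrated with a minimal sketch. It assumes plain regular expressions from the Python standard library, not the Stanford CoreNLP or Talend implementation; the function names `split_sentences` and `tokenize` are hypothetical, and the rules are deliberately naive (abbreviations such as "e.g." would be split incorrectly).

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Naive tokenizer: words (keeping inner hyphens/apostrophes) or single punctuation marks."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)


if __name__ == "__main__":
    text = "Talend supports NLP. Does it run on Spark? Yes!"
    for sentence in split_sentences(text):
        print(tokenize(sentence))
```

A production tokenizer handles many more cases (abbreviations, URLs, numbers with separators), which is one reason to rely on a library rather than hand-written rules.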
![Page 3: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/3.jpg)
Where can this be useful?
• Extract useful information from textual resources (such as forums, notes in Salesforce, etc.)
  • Names of persons
  • Names of companies (competitors...)
  • Names of tools (competing tools...)
• Classify discussions by topic
  • Group discussions together
  • Find discussions where people are mentioned but don't participate in the discussion
• Entity linking
  • Links between profiles and mentions in the text
  • Links between persons and organizations
  • Links between persons and any other information that may be used for re-identification
![Page 4: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/4.jpg)
Where can this be useful?
![Page 5: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/5.jpg)
Relationship with data quality?
• Use textual data to get more information about your structured data
• Analyze CRM notes
  • Extract contact names
  • Get information about their status (left the company, new phone number, got married and changed name…)
• Compare them with the current values in your structured data
  • Contact information up to date?
  • Name changed?
  • Phone changed?
  • Address changed?
  • …
Self-healing customer data quality issues through interpretation of unstructured data (Chandrasekaran, K.; Clement, D.)
http://ualr.edu/informationquality/iciq-proceedings/iciq-2015/
![Page 6: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/6.jpg)
Great! How does it work in Talend?
• Prepare text sample
  • Remove clutter (e.g. HTML tags)
  • Tokenize & normalize
• Train a model
  • Design the features
  • Label entities
  • Validate the model (e.g. K-fold cross-validation)
• Use the model
  • Apply on full text
Use Spark Batch
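The preparation and validation steps above can be sketched in plain Python. This is not the Talend or Spark implementation: the helper names `remove_clutter`, `tokenize_and_normalize` and `k_fold_indices` are hypothetical, normalization is assumed to mean lowercasing, and the K-fold split is a simple unshuffled partition of sample indices.

```python
import html
import re


def remove_clutter(raw):
    """Strip HTML tags, decode entities and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", html.unescape(no_tags)).strip()


def tokenize_and_normalize(text):
    """Split into word and punctuation tokens, lowercased (assumed normalization)."""
    return [token.lower() for token in re.findall(r"\w+|[^\w\s]", text)]


def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    indices = list(range(n_samples))
    # Distribute the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


if __name__ == "__main__":
    clean = remove_clutter("<p>Call <b>Anna</b> back</p>")
    print(tokenize_and_normalize(clean))
    for train, validation in k_fold_indices(5, 2):
        print(train, validation)
```

In each round the model would be trained on the `train` indices and scored on the `validation` indices; averaging the scores over all K rounds gives a more stable quality estimate than a single split.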
![Page 7: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/7.jpg)
Component workflow
![Page 8: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/8.jpg)
Text transformations
Convert to CoNLL-2003 format, add optional features and label tokens
Extract named entities with <PER> labels
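The two transformations on this slide can be sketched as below. The sketch assumes a simplified two-column, tab-separated layout (token and label, one token per line); the original CoNLL-2003 files carry four columns (word, part-of-speech tag, chunk tag, named-entity tag), and the exact layout produced by the Talend components is not shown in the slides. The names `to_conll` and `extract_entities` and the plain `PER`/`O` labels are hypothetical.

```python
def to_conll(tokens, labels):
    """Render one sentence as token<TAB>label lines (simplified CoNLL-style layout)."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return "\n".join(f"{token}\t{label}" for token, label in zip(tokens, labels))


def extract_entities(tokens, labels, wanted="PER"):
    """Join consecutive tokens carrying the wanted label into entity strings."""
    entities, current = [], []
    for token, label in zip(tokens, labels):
        if label == wanted:
            current.append(token)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:  # entity running up to the end of the sentence
        entities.append(" ".join(current))
    return entities


if __name__ == "__main__":
    tokens = ["Jane", "Doe", "works", "at", "Talend", "with", "Anna"]
    labels = ["PER", "PER", "O", "O", "ORG", "O", "PER"]
    print(to_conll(tokens, labels))
    print(extract_entities(tokens, labels))
```

Because adjacent tokens with the same label are merged, two different persons mentioned back to back would be joined into one entity; schemes with begin/inside prefixes exist to avoid that ambiguity.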
![Page 9: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/9.jpg)
The Stanford CoreNLP Library
![Page 10: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/10.jpg)
Semantic Analysis
http://nlp.stanford.edu:8080/corenlp/
![Page 11: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/11.jpg)
Meaning of the tags
https://www.clips.uantwerpen.be/pages/mbsp-tags
![Page 12: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/12.jpg)
Sentiment Analysis & Sentiment Tree
http://corenlp.run/
http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
![Page 13: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/13.jpg)
Let's do some NLP with Talend!
![Page 14: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/14.jpg)
Capturing Twitter Messages
![Page 15: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/15.jpg)
Analysis of text messages with Talend
![Page 16: CWIN17 Frankfurt / talend_nlp](https://reader033.vdocuments.site/reader033/viewer/2022052706/5a64c5a77f8b9ac21c8b5b85/html5/thumbnails/16.jpg)
Summary
• Natural Language Processing (NLP) components are available in Spark Batch and Streaming
• What can it be used for?
  • Extract useful information from textual resources (people names, companies, tools…)
  • Classify discussions by topics (group discussions together, find discussions where people are mentioned)
  • Entity linking (e.g. linking persons and organizations, links between persons and any other information that may be used for re-identification)
• What are the typical industry use cases?
  • Intelligent Search
  • Sentiment Analysis
  • Marketing Personalization
  • GDPR
  • …
• Talend comes with support for NLP
  • Model Preparation
  • Model Training
  • Model Evaluation