Toward the “intelligent” chat. Bernardo Magnini, Fondazione Bruno Kessler, Trento


Posted on 15 April 2017


Page 1: Convcomp2016: Toward the “intelligent chat”: research in Natural Language Processing and Machine Learning


Page 2

Outline

• Natural Language Processing
  • The role of Machine Learning

• Scenario 1: question answering over structured data
  • The role of knowledge modeling

• Scenario 2: FAQ retrieval
  • The role of text-to-text technologies

• Toward the intelligent chat
  • Learning complex dialogues

Page 3

Pipeline of NLP tools

• HTML cleaner
• Tokenizer and sentence splitter
• POS tagging
• Morphological analysis
• Chunking and parsing
• Named entities
• Temporal expressions
• Key concepts
• Geo-coding

Ongoing:
• Sentiment analysis
• NER for German

Semantic annotations

TextPro is freely distributed for research purposes

http://textpro.fbk.eu/

Page 4

A Pipeline of NLP taggers

[Figure: the TextPro pipeline of taggers; modules shown: CleanPro, TokenPro, SentencePro, MorphoPro, EntityPro, LemmaPro, GeoCoder, SyntaxPro, TagPro, KX, ChunkPro, TimePro]

TextPro:
- a cascade of “annotators”
- Input: plain text (UTF-8)
- Output: tabular format (IOB annotation), XML
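To make the “cascade of annotators” idea concrete, here is a minimal Python sketch. It is not TextPro code: the regex tokenizer and the capitalization-based `toy_entity_tagger` are hypothetical stand-ins. What it shows is the shape of such a pipeline: every stage reads the shared token table, adds its own column, and the result is written out as a tab-separated table.

```python
import re
from typing import Callable, Dict, List

Row = Dict[str, str]                     # one token = one row of the output table
Annotator = Callable[[List[Row]], None]  # a stage that adds columns in place

def tokenize(text: str) -> List[Row]:
    """First stage: split plain text into word and punctuation tokens with offsets."""
    rows = []
    for i, m in enumerate(re.finditer(r"\w+|[^\w\s]", text), start=1):
        rows.append({"token": m.group(), "tokenid": str(i),
                     "tokenstart": str(m.start()),
                     "tokenend": str(m.end())})  # end offset is exclusive
    return rows

def toy_entity_tagger(rows: List[Row]) -> None:
    """Hypothetical annotator: tags runs of capitalized tokens as one IOB entity."""
    inside = False
    for row in rows:
        if row["token"][0].isupper():
            row["entity"] = "I-ENT" if inside else "B-ENT"
            inside = True
        else:
            row["entity"] = "O"
            inside = False

def run_pipeline(text: str, annotators: List[Annotator]) -> str:
    """Tokenize, let each annotator add its column in turn, then emit a TSV table."""
    rows = tokenize(text)
    for annotate in annotators:
        annotate(rows)
    columns = list(rows[0]) if rows else []
    lines = ["\t".join(columns)]
    lines += ["\t".join(row[c] for c in columns) for row in rows]
    return "\n".join(lines)

print(run_pipeline("President Barack Obama delayed a decision", [toy_entity_tagger]))
```

Adding a stage means appending one more function to the list; each stage depends only on columns written before it, which is what allows taggers to be chained in a fixed order.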

Page 5

Output of an NLP Pipeline

token      tokenid  tokenstart  tokenend  pos  lemma       chunk  entity  timex
US         1        0           2         NP0  __NULL__    B-NP   B-GPE   O
President  2        3           12        NP0  president   I-NP   O       O
Barack     3        13          19        NP0  __NULL__    I-NP   B-PER   O
Obama      4        20          25        NP0  __NULL__    I-NP   I-PER   O
delayed    5        26          33        VVD  delay       B-VP   O       O
a          6        34          35        AT0  a           B-NP   O       O
decision   7        36          44        NN1  decision    I-NP   O       O
on         8        45          47        PRP  on          B-PP   O       O
potential  9        48          57        NN1  potential   B-NP   O       O
military   10       58          66        AJ0  military    I-NP   O       O
action     11       67          73        NN1  action      I-PP   O       O
against    12       74          81        PRP  against     B-PP   O       O
Syria      13       82          87        NP0  syria       B-NP   B-LOC   O
after      14       88          93        CJS  after       B-PP   O       O
September  15       94          103       NP0  september   B-NP   O       B-DATE
9          16       104         105       CRD  __NULL__    I-NP   O       I-DATE

Provided by: TokenPro, TagPro, LemmaPro, ChunkPro, EntityPro, TimePro
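Downstream components usually need entity spans rather than per-token tags. A minimal sketch of decoding the IOB columns of the table above, assuming whitespace-separated rows in the header's column order and the usual IOB reading (B- opens a span, I- continues it, O closes it); the function name `iob_spans` is hypothetical.

```python
from typing import List, Tuple

COLUMNS = ["token", "tokenid", "tokenstart", "tokenend", "pos", "lemma", "chunk", "entity", "timex"]

TABLE = """\
US 1 0 2 NP0 __NULL__ B-NP B-GPE O
President 2 3 12 NP0 president I-NP O O
Barack 3 13 19 NP0 __NULL__ I-NP B-PER O
Obama 4 20 25 NP0 __NULL__ I-NP I-PER O
delayed 5 26 33 VVD delay B-VP O O
a 6 34 35 AT0 a B-NP O O
decision 7 36 44 NN1 decision I-NP O O
on 8 45 47 PRP on B-PP O O
potential 9 48 57 NN1 potential B-NP O O
military 10 58 66 AJ0 military I-NP O O
action 11 67 73 NN1 action I-PP O O
against 12 74 81 PRP against B-PP O O
Syria 13 82 87 NP0 syria B-NP B-LOC O
after 14 88 93 CJS after B-PP O O
September 15 94 103 NP0 september B-NP O B-DATE
9 16 104 105 CRD __NULL__ I-NP O I-DATE
"""

def iob_spans(rows: List[dict], column: str) -> List[Tuple[str, str, int, int]]:
    """Collect (label, text, start, end) spans from the IOB tags of one column."""
    spans = []
    current = None  # [label, tokens, start, end] of the span being built
    for row in rows:
        tag = row[column]
        opens = tag.startswith("B-") or (
            tag.startswith("I-") and (current is None or current[0] != tag[2:]))
        if opens:  # a stray I- without a matching open span is treated like B-
            if current:
                spans.append(current)
            current = [tag[2:], [row["token"]], int(row["tokenstart"]), int(row["tokenend"])]
        elif tag.startswith("I-"):
            current[1].append(row["token"])
            current[3] = int(row["tokenend"])
        else:  # "O" closes any open span
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(label, " ".join(toks), start, end) for label, toks, start, end in spans]

rows = [dict(zip(COLUMNS, line.split())) for line in TABLE.strip().splitlines()]
print(iob_spans(rows, "entity"))
print(iob_spans(rows, "timex"))
```

The same function works for the chunk column, since chunks use the same B-/I- convention.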

Page 6

Graph-based Representation

Abstract Meaning Representation (AMR) format

The AMR shown on the slide can be expressed in English in several ways:

The boy wants the girl to believe him.

The boy wants to be believed by the girl.

The boy has a desire to be believed by the girl.

The boy’s desire is for the girl to believe him.

The boy is desirous of the girl believing him.
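The AMR graph itself is not part of this transcript. As an assumption, the sketch below encodes the AMR usually given for these sentences, `(w / want-01 :ARG0 (b / boy) :ARG1 (b2 / believe-01 :ARG0 (g / girl) :ARG1 b))`, as concept instances plus labeled edges. It shows why AMR is a graph rather than a tree: the boy node is reached twice, as the one who wants and as the one believed, and this single structure is what the five paraphrases share.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed AMR for "The boy wants the girl to believe him." (PENMAN notation):
# (w / want-01 :ARG0 (b / boy) :ARG1 (b2 / believe-01 :ARG0 (g / girl) :ARG1 b))
INSTANCES: Dict[str, str] = {"w": "want-01", "b": "boy", "b2": "believe-01", "g": "girl"}
EDGES: List[Tuple[str, str, str]] = [
    ("w", ":ARG0", "b"),   # the boy is the one who wants
    ("w", ":ARG1", "b2"),  # what is wanted: a believing event
    ("b2", ":ARG0", "g"),  # the girl is the believer
    ("b2", ":ARG1", "b"),  # the boy is the one believed: same node as above
]

def root(instances: Dict[str, str], edges: List[Tuple[str, str, str]]) -> str:
    """The root is the single variable that never appears as an edge target."""
    targets = {target for _, _, target in edges}
    (r,) = [v for v in instances if v not in targets]
    return r

def reentrant_nodes(edges: List[Tuple[str, str, str]]) -> List[str]:
    """Variables with more than one incoming edge; they make AMR a graph, not a tree."""
    incoming = Counter(target for _, _, target in edges)
    return sorted(v for v, n in incoming.items() if n > 1)

print(INSTANCES[root(INSTANCES, EDGES)], [INSTANCES[v] for v in reentrant_nodes(EDGES)])
```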

Page 7

United States presidential election of 2008, scheduled for Tuesday November 4, 2008, will be the 56th consecutive quadrennial United States presidential election and will select the President and the Vice President of the United States. The Republican Party has chosen John McCain, the senior United States Senator from Arizona, as its nominee; the Democratic Party has chosen Barack Obama, the junior United States Senator from Illinois, as its nominee.

United_B-LOC States_I-LOC presidential_O election_O of_O 2008_O ,_O scheduled_O for_O Tuesday_O November_O 4_O ,_O 2008_O ,_O will_O be_O the_O 56th_O consecutive_O quadrennial_O United_B-LOC States_I-LOC presidential_O election_O and_O will_O select_O the_O President_B-PER and_O the_O Vice_B-PER President_I-PER of_I-PER the_I-PER United_I-PER States_I-PER. The_O Republican_B-ORG Party_I-ORG has_O chosen_O John_B-PER McCain_I-PER ,_O the_O senior_O United_B-PER States_I-PER Senator_I-PER from_O Arizona_B-LOC as_O its_O

United,null,null,null,States,presidential,election,1,1,1,1,0,1,0,0,1,1,1,1,1,0,1,0,1,1,ed,ted,Uni,Un

Named Entity Classification
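The feature vector above appears to follow a common pattern for token classification: the token, a window of three tokens on each side (`null` past the sentence boundary), a block of binary features, then suffixes and prefixes (`ed`, `ted`, `Uni`, `Un`). A minimal sketch of that pattern follows; the slide does not document its binary columns, so the four shape features here are hypothetical placeholders.

```python
from typing import List, Optional, Union

Feature = Union[str, int, None]

def window(tokens: List[str], i: int, size: int = 3) -> List[Optional[str]]:
    """The token, then `size` tokens to its left and `size` to its right (None past the edges)."""
    left = [tokens[j] if j >= 0 else None for j in range(i - size, i)]
    right = [tokens[j] if j < len(tokens) else None for j in range(i + 1, i + 1 + size)]
    return [tokens[i]] + left + right

def shape(tokens: List[str], i: int) -> List[int]:
    """Hypothetical binary features: capitalized, all caps, contains a digit, sentence-initial."""
    tok = tokens[i]
    return [int(tok[:1].isupper()), int(tok.isupper()),
            int(any(c.isdigit() for c in tok)), int(i == 0)]

def affixes(token: str) -> List[str]:
    """Suffixes of length 2 and 3, then prefixes of length 3 and 2, in the slide's order."""
    return [token[-2:], token[-3:], token[:3], token[:2]]

def features(tokens: List[str], i: int) -> List[Feature]:
    return window(tokens, i) + shape(tokens, i) + affixes(tokens[i])

tokens = "United States presidential election of 2008".split()
print(",".join("null" if f is None else str(f) for f in features(tokens, 0)))
```

For the first token this prints `United,null,null,null,States,presidential,election,1,0,0,1,ed,ted,Uni,Un`, i.e. the same layout as the slide's vector with a shorter binary block.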

[Figure: supervised learning. TRAIN DATA feeds a Learning Algorithm, which produces a Trained Machine; the Trained Machine is applied to TEST DATA and returns an answer.]

Machine Learning Development Cycle

• Typical development process: task definition, dataset (training and test), feature extraction, evaluation

• Advantages: low cost, good performance

• Drawbacks: limited domain portability, poor control over results
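For the evaluation step of this cycle, named entity recognizers are commonly scored with precision, recall and F1 over exactly matching spans. A minimal sketch, with hypothetical gold and predicted spans (not data from the slides):

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (label, first token index, last token index)

def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scoring: a predicted span counts only if label and both boundaries match."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical test-set annotations for one document.
gold = {("LOC", 0, 1), ("PER", 10, 11), ("ORG", 15, 16)}
predicted = {("LOC", 0, 1), ("PER", 10, 11), ("PER", 20, 22), ("LOC", 24, 24)}
print(precision_recall_f1(gold, predicted))  # precision 2/4, recall 2/3, F1 4/7
```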

Page 8

Active Learning

• A technique for selecting the training examples that are most likely to correct wrong classifications

• Active learning selection vs. random selection

• Involves the user in the learning process

• Improves performance by correcting the system's errors

• Significantly reduces development time
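The slide does not say which selection criterion is used. One common choice is uncertainty sampling; the sketch below uses its margin variant with hypothetical model probabilities: the items whose two most probable labels are closest are sent to the user for annotation first.

```python
from typing import Dict, List

def margin(probs: Dict[str, float]) -> float:
    """Gap between the two most probable labels; a small gap means the model is unsure."""
    top = sorted(probs.values(), reverse=True)
    return top[0] - (top[1] if len(top) > 1 else 0.0)

def select_for_annotation(pool: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Margin sampling: return the k unlabeled items the current model is least sure about."""
    return sorted(pool, key=lambda item: margin(pool[item]))[:k]

# Hypothetical model probabilities for four unlabeled tokens.
pool = {
    "Washington": {"PER": 0.48, "LOC": 0.45, "O": 0.07},
    "Trento": {"PER": 0.02, "LOC": 0.95, "O": 0.03},
    "Apple": {"ORG": 0.40, "O": 0.55, "PER": 0.05},
    "the": {"O": 0.99, "PER": 0.01},
}
print(select_for_annotation(pool, 2))
```

In a full loop, the user labels the selected items, they are added to the training set, the model is retrained, and the remaining pool is re-scored. Random selection would instead spend annotation effort on items such as "the" that the model already classifies confidently.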

Page 9

[Figure: fragment of an ontology for cultural events. Classes: Cinema, Event, Movie, Director, Contact, PostalAddress, Destination. Relations: hasInfrastructure, hasName, hasEvent, isInSite, hasContact, hasPostalAddress, isInDestination, hasEventContent, hasDirector, duration. Example instances: the cinema Astra; the movie Titanic (director Cameron, duration 194 mins); the movie E.T. (director Spielberg, duration 164 mins).]

• Task: given a question, find a precise answer contained in a database or knowledge base

• Domain knowledge must be modeled: the ontology

• E.g. an ontology for cultural events (movies, sport events, etc.)

Scenario 1: Question Answering on Structured Data
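Modeling the domain means, among other things, fixing which relations may connect which classes, so that ill-formed facts can be rejected. A minimal sketch, assuming a toy schema loosely based on the classes and relations in the figure; the event identifier `ev1`, the class assigned to `Trento` and the `violations` function are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: relation -> (domain class, range class).
SCHEMA: Dict[str, Tuple[str, str]] = {
    "hasEvent": ("Cinema", "Event"),
    "hasEventContent": ("Event", "Movie"),
    "hasDirector": ("Movie", "Director"),
    "isInDestination": ("Cinema", "Destination"),
}

# Instance -> class ("ev1" and Trento-as-Destination are assumptions).
TYPES: Dict[str, str] = {"Astra": "Cinema", "ev1": "Event", "Titanic": "Movie",
                         "Cameron": "Director", "Trento": "Destination"}

FACTS: List[Tuple[str, str, str]] = [
    ("Astra", "hasEvent", "ev1"),
    ("ev1", "hasEventContent", "Titanic"),
    ("Titanic", "hasDirector", "Cameron"),
    ("Astra", "isInDestination", "Trento"),
]

def violations(facts, schema, types):
    """Return facts whose subject or object class disagrees with the relation's domain/range."""
    bad = []
    for subj, rel, obj in facts:
        domain, range_ = schema[rel]
        if types.get(subj) != domain or types.get(obj) != range_:
            bad.append((subj, rel, obj))
    return bad

print(violations(FACTS, SCHEMA, TYPES))                             # -> []
print(violations([("Titanic", "hasEvent", "ev1")], SCHEMA, TYPES))  # -> [('Titanic', 'hasEvent', 'ev1')]
```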

Page 10

Which cinema is showing Titanic by Cameron in Trento?

Scenario 1: Question Answering on Structured Data

Madrid, June 1, 2010 - Bernardo Magnini

Titanic is showing today in Trento at Cinema Astra at 8 p.m. Price of the ticket is 10 Euros.

Page 11

Which cinema is showing Titanic by Cameron in Trento?

EAT (expected answer type): ?CINEMA:X
CONSTRAINT: Movie-hasCinema(“Movie:Titanic”, Cinema:X), Movie-director(“Titanic”, “Cameron”), Cinema-loc(Cinema:X, Trento)
CONTEXT: Context-Time(Q, December 14th), Context-LOC(Q, Trento)

Scenario 1: Question Answering on Structured Data
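The CONSTRAINT part of this representation can be read as a conjunctive query over a knowledge base of facts. A minimal sketch, assuming facts stored as (predicate, argument, argument) tuples and variables written with a leading `?`; only the Astra facts come from the slides, the others are hypothetical distractors. The CONTEXT part (time and place of the question) is ignored here.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Fact = Tuple[str, str, str]  # (predicate, first argument, second argument)

KB: List[Fact] = [
    ("Movie-hasCinema", "Titanic", "Astra"),    # from the slides
    ("Movie-director", "Titanic", "Cameron"),   # from the slides
    ("Cinema-loc", "Astra", "Trento"),          # from the slides
    ("Movie-hasCinema", "Titanic", "CinemaB"),  # hypothetical distractor
    ("Cinema-loc", "CinemaB", "Rovereto"),      # hypothetical distractor
    ("Movie-director", "E.T.", "Spielberg"),    # hypothetical distractor
]

def unify(pattern: Fact, fact: Fact, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Match one constraint against one fact; return the extended binding or None."""
    new = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if new.setdefault(p, f) != f:
                return None  # variable already bound to another value
        elif p != f:
            return None
    return new

def solve(constraints: List[Fact], kb: List[Fact],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Backtracking search: yield every binding that satisfies all constraints."""
    binding = {} if binding is None else binding
    if not constraints:
        yield binding
        return
    first, rest = constraints[0], constraints[1:]
    for fact in kb:
        extended = unify(first, fact, binding)
        if extended is not None:
            yield from solve(rest, kb, extended)

# Which cinema is showing Titanic by Cameron in Trento?  (EAT: ?X, a cinema)
question: List[Fact] = [
    ("Movie-hasCinema", "Titanic", "?X"),
    ("Movie-director", "Titanic", "Cameron"),
    ("Cinema-loc", "?X", "Trento"),
]
print(sorted({b["?X"] for b in solve(question, KB)}))
```

The distractor shows why every constraint matters: CinemaB also shows Titanic, but it fails the location constraint.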

Page 12

Which cinema is showing Titanic by Cameron in Trento?

EAT: ?CINEMA:X
CONSTRAINT: Movie-hasCinema(“Titanic”, Cinema:X), Movie-director(“Titanic”, “Cameron”), Cinema-loc(Cinema:X, Trento)
CONTEXT: Context-Time(Q, December 14th), Context-LOC(Q, Trento)

CORE: CINEMA:Astra
JUSTIFICATION: Movie-hasCinema(“Titanic”, Cinema:Astra), Movie-director(“Titanic”, “Cameron”), Cinema-loc(Cinema:Astra, Trento)
COMPLEMENTARY: Movie-time(“Titanic”, 8 pm), Movie-price(“Titanic”, 10 euros)

Scenario 1: Question Answering on Structured Data
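The three-part answer can then be turned into a natural-language reply. A minimal template-based sketch, assuming the slide's predicates written as tuples; the templates and the `verbalize` function are hypothetical, and the word “today” assumes the time context matches the show date.

```python
# The three parts of the answer, transcribed from the slide as (predicate, arg1, arg2) tuples.
answer = {
    "core": ("CINEMA", "Astra"),
    "justification": [
        ("Movie-hasCinema", "Titanic", "Astra"),
        ("Movie-director", "Titanic", "Cameron"),
        ("Cinema-loc", "Astra", "Trento"),
    ],
    "complementary": [
        ("Movie-time", "Titanic", "8 pm"),
        ("Movie-price", "Titanic", "10 euros"),
    ],
}

def verbalize(answer: dict) -> str:
    """Hypothetical templates: state the core answer, then add complementary facts if present."""
    facts = {pred: (a1, a2) for pred, a1, a2 in answer["justification"] + answer["complementary"]}
    _, cinema = answer["core"]
    movie, _ = facts["Movie-hasCinema"]
    _, city = facts["Cinema-loc"]
    text = f"{movie} is showing today in {city} at Cinema {cinema}"
    if "Movie-time" in facts:
        text += f" at {facts['Movie-time'][1]}"
    text += "."
    if "Movie-price" in facts:
        text += f" Price of the ticket is {facts['Movie-price'][1]}."
    return text

print(verbalize(answer))
```

Keeping complementary facts separate lets the system add useful information the user did not ask for (time, price) without mixing it with the facts that justify the answer.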

Page 13

Which cinema is showing Titanic by Cameron in Trento?

EAT: ?CINEMA:X
CONSTRAINT: Movie-hasCinema(“Titanic”, Cinema:X), Movie-director(“Titanic”, “Cameron”), Cinema-loc(Cinema:X, Trento)
CONTEXT: Context-Time(Q, December 14th), Context-LOC(Q, Trento)

Titanic is showing today in Trento at Cinema Astra at 8 p.m. Price of the ticket is 10 Euros.

CORE: CINEMA:Astra
JUSTIFICATION: Movie-hasCinema(“Titanic”, Cinema:Astra), Movie-director(“Titanic”, “Cameron”), Cinema-loc(Cinema:Astra, Trento)
COMPLEMENTARY: Movie-time(“Titanic”, 8 pm), Movie-price(“Titanic”, 10 euros)

Scenario 1: Question Answering on Structured Data

Page 14

Scenario 2: FAQ Retrieval

• Task: retrieve the FAQ most similar to the user's question

• No need for deep text interpretation

• Text-to-text approaches

• The role of learning technologies
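As a simple baseline for this task, FAQs can be ranked by the cosine similarity between bag-of-words vectors, cos(q, f) = q·f / (|q| |f|). This is a sketch of a lexical baseline, not a learned text-to-text model; the FAQ texts and the stopword list are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical stopword list.
STOPWORDS = {"the", "a", "an", "is", "to", "on", "of", "in", "i", "can", "do", "my", "for", "how"}

def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts with stopwords removed."""
    return Counter(w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, faqs: List[str], k: int = 1) -> List[Tuple[float, str]]:
    """Rank FAQs by cosine similarity with the user question and return the top k."""
    q = bag_of_words(question)
    scored = [(cosine(q, bag_of_words(faq)), faq) for faq in faqs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

faqs = [  # hypothetical FAQ collection
    "How can I get a refund for a cancelled train?",
    "Is food served on board the train?",
    "Can I bring my bicycle on the train?",
]
print(retrieve("is there food on the train", faqs))
```

A purely lexical score fails when the user and the FAQ use different words for the same thing; learned similarity and entailment models target exactly that gap.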

Page 15

TOPIC: Reasons for dissatisfaction in railway service

Int-448: Efficient service. Quick through security and check in. But leg room in standard class was quite poor.

Int-202: Everything ran smoothly and well. Only complaint is lack of leg room with seating with tables.

Int-275: Seating is very cramped – my journey has been very uncomfortable with the person next to me taking up most of the space we have.

Extracting Fragments from Interactions

Page 16

[Figure: fragments extracted from customer interactions, organised by topic, shown in English and in Italian.]

English fragments:
• Not happy with the catering: coffee is awful; coffee in economy is awful; disgusting coffee is served; they have horrible coffee; no refreshments; food on train is too expensive; you charge too much for sandwiches; sandwiches are overpriced; sandwiches are too expensive; food quality is disappointing; food is bad; bad food in premier; not enough food selection; provide veggie meals; no vegetarian food; expand meal options
• Not happy with the service: journey is too slow; no clear information
• Not happy with the staff: staff is unfriendly

Italian fragments (translated):
• Catering not good: awful coffee; the coffee was terrible; they served bad coffee; coffee in economy class is bad; no refreshments; meals in first class too expensive; sandwiches too expensive; very expensive sandwiches; you pay too much for the sandwiches; poor-quality food; the quality of the food is low; poor food in first class; not enough choice of food; expand the menu; no vegetarian food; add a vegetarian option
• Unsatisfactory service: journey too slow; unclear information
• Not satisfied with the staff: unfriendly staff

Organising Customer Interactions
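One simple way to group fragments like these is leader clustering on word overlap. A minimal sketch, using some of the English fragments above, Jaccard similarity over content words, and a hypothetical stopword list and threshold of 0.3:

```python
from typing import List, Set

# Hypothetical stopword list: function words that should not make fragments look similar.
STOPWORDS = {"is", "are", "in", "the", "they", "have", "too"}

def content_words(fragment: str) -> Set[str]:
    return {w for w in fragment.lower().split() if w not in STOPWORDS}

def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def leader_clustering(fragments: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Single pass: join the first cluster whose seed is similar enough, else start a new one."""
    clusters: List[List[str]] = []
    for frag in fragments:
        words = content_words(frag)
        for cluster in clusters:
            if jaccard(words, content_words(cluster[0])) >= threshold:
                cluster.append(frag)
                break
        else:  # no cluster was similar enough
            clusters.append([frag])
    return clusters

fragments = ["coffee is awful", "coffee in economy is awful", "they have horrible coffee",
             "sandwiches are overpriced", "sandwiches are too expensive", "staff is unfriendly"]
for cluster in leader_clustering(fragments):
    print(cluster)
```

Purely lexical overlap has clear limits: “food on train is too expensive” shares no content word with the sandwich group and would start its own cluster. Recognizing such paraphrases is what semantic similarity and entailment techniques are for.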

Page 17

Text-to-Text Approaches

[Figure: architecture of a text-to-text inference platform. Algorithms: distance-based (EDITS), classification-based (TIE), transformation-based (BIUTEE + AdArte), alignment-based (P1EDA). Components: distance component (edit distance), scoring component (bag-of-words similarity), alignment component, lexical component (entailment rules), configurator; UIMA-CAS also shown. Linguistic preprocessing for Italian, German, English and Bulgarian (tokenization, lemma, POS, dependency parsing). Resources: WordNet (Italian, German, English), Wikipedia (Italian, English), distributional similarity (English, German, Italian), derivational morphology (Italian, English, German), phrase tables (Italian, English, German).]

• A platform for text-to-text inferences:
  • Semantic alignment
  • Similarity
  • Entailment
  • Contradiction

• Based on machine learning

• FAQ retrieval at Evalita 2016: http://qa4faq.github.io
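To illustrate the distance-based family (EDITS), a text T can be taken to entail a hypothesis H when H is cheap to obtain from T through token edits. A minimal sketch in that spirit; the costs (free deletion of T words, cost 1 for insertion and substitution) and the 0.15 threshold are hypothetical, whereas a trained system would estimate such parameters from annotated pairs.

```python
from typing import List

def edit_distance(text: List[str], hyp: List[str],
                  ins: float = 1.0, dele: float = 0.0, sub: float = 1.0) -> float:
    """Weighted Levenshtein distance for turning the token list `text` into `hyp`."""
    prev = [j * ins for j in range(len(hyp) + 1)]  # cost of building hyp[:j] from nothing
    for i, t in enumerate(text, start=1):
        curr = [i * dele]                          # cost of deleting text[:i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + dele,                           # delete t
                            curr[j - 1] + ins,                        # insert h
                            prev[j - 1] + (0.0 if t == h else sub)))  # keep or substitute
        prev = curr
    return prev[-1]

def entails(text: str, hyp: str, threshold: float = 0.15) -> bool:
    """Predict entailment when the edit cost, normalized by the length of H, is under the threshold."""
    t, h = text.lower().split(), hyp.lower().split()
    return edit_distance(t, h) / max(len(h), 1) <= threshold

T = "Titanic is showing today in Trento at Cinema Astra"
print(entails(T, "Titanic is showing in Trento"))  # every H word appears in T, in order
print(entails(T, "Titanic is showing in Rome"))    # "Rome" must be inserted or substituted
```

Making deletions free encodes the intuition that T may contain extra information not mentioned in H, while every H word that T does not support adds cost.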

Page 18

Toward the Intelligent Chat

• From text-to-text (FAQ retrieval) to sequence-to-sequence learning

• Learning from chats using neural networks

• First results: a very realistic dialogue “style”

• Can we learn dialogue schemas?

• How can specific knowledge (e.g. stored in a database) be integrated?

Page 19

Links

• Associazione Italiana di Linguistica Computazionale (Italian Association for Computational Linguistics): ai-lc.it

• CLIC-it, the Third Italian Conference on Computational Linguistics

• Evalita, evaluation of written and spoken language technologies for Italian: evalita.it

• NLP at FBK: http://hlt.fbk.eu