CSA3050: Natural Language Algorithms Finite State Devices

Upload: jerome-craig

Post on 13-Dec-2015

Page 1: CSA3050: Natural Language Algorithms Finite State Devices

CSA3050: Natural Language Algorithms

Finite State Devices

October 2004 CSA3050 NLP Algorithms 2

Sources

• Blackburn & Striegnitz Ch. 2

Parsers vs. Recognisers

• Recognizers tell us whether a given input is accepted by some finite state automaton.

• Often we would like to have an explanation of why it was accepted.

• Parsers give us that kind of explanation.

• What form does it take?

Finite State Parser

• The output of a finite state parser is a sequence of nodes and arcs. If we gave the input [h,a,h,a,!] to a parser for our first laughing automaton, it should give us [1,h,2,a,3,h,2,a,3,!,4].

• The technique in Prolog for turning a recognizer into a parser is to add one or more extra arguments to keep track of the structure that was found.

Base Case

Recogniser

recognize1(Node, []) :-
    final(Node).

Parser

parse1(Node, [], [Node]) :-
    final(Node).

Recursive Case

Recogniser

recognize1(Node1, String) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    recognize1(Node2, NewString).

Parser

parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).
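
The base and recursive clauses can be combined into a small runnable sketch. The arcs below are reconstructed from the path [1,h,2,a,3,h,2,a,3,!,4] quoted on the earlier slide, and the definition of traverse1/3 is an assumption (the slides call it without showing it): it consumes one input symbol equal to the arc label.

```prolog
% Laughing automaton, reconstructed from the path
% [1,h,2,a,3,h,2,a,3,!,4] (an assumption, not copied from a slide).
initial(1).
final(4).
arc(1,2,h).
arc(2,3,a).
arc(3,2,h).
arc(3,4,!).

% Assumed definition: consume one symbol equal to the arc label.
traverse1(Label, [Label|Symbols], Symbols).

% Base case: input exhausted in a final node.
parse1(Node, [], [Node]) :-
    final(Node).
% Recursive case: record the node and label, then parse the rest.
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).

% ?- parse1(1, [h,a,h,a,!], Path).
% Path = [1,h,2,a,3,h,2,a,3,!,4]
```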

Complex Labels

• So far we have only considered transitions with single-character labels.

• More complex labels are possible – e.g. symbols comprising several characters.

• We can construct an FSA recognizing English noun phrases that can be built from the words:

the, a, wizard, witch, broomstick, hermione, harry, ron, with, fast.

FSA for Noun Phrases

FSA for NPs in Prolog

initial(1).
final(3).
arc(1,2,a).
arc(1,2,the).
arc(2,2,brave).
arc(2,2,fast).
arc(2,3,witch).
arc(2,3,wizard).
arc(2,3,broomstick).
arc(2,3,rat).
arc(1,3,harry).
arc(1,3,ron).
arc(1,3,hermione).
arc(3,1,with).

Parsing a Noun Phrase

testparse1(Symbols, Parse) :-
    initial(Node),
    parse1(Node, Symbols, Parse).

?- testparse1([the,fast,wizard], Z).

Z = [1, the, 2, fast, 2, wizard, 3]
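
As a further illustration, the sketch below assembles the noun phrase automaton and the parser into one file and parses a phrase that uses the arc from node 3 back to node 1. The definition of traverse1/3 is an assumption, since the slides do not show it; everything else follows the clauses given on the slides.

```prolog
% NP automaton as given on the slide "FSA for NPs in Prolog".
initial(1).
final(3).
arc(1,2,a).
arc(1,2,the).
arc(2,2,brave).
arc(2,2,fast).
arc(2,3,witch).
arc(2,3,wizard).
arc(2,3,broomstick).
arc(2,3,rat).
arc(1,3,harry).
arc(1,3,ron).
arc(1,3,hermione).
arc(3,1,with).

% Assumed definition: consume one word equal to the arc label.
traverse1(Label, [Label|Words], Words).

parse1(Node, [], [Node]) :-
    final(Node).
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).

testparse1(Symbols, Parse) :-
    initial(Node),
    parse1(Node, Symbols, Parse).

% ?- testparse1([harry,with,the,fast,broomstick], Z).
% Z = [1,harry,3,with,1,the,2,fast,2,broomstick,3]
```

The arc labelled with is what allows a second noun phrase to follow the first, so the path passes through node 1 twice.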

Rewriting Categories

• It is also possible to obtain a more abstract parse, e.g.

?- testparse2([the,fast,wizard],Z).

Z=[1, det, 2, adj, 2, cn, 3]

• What changes are required to obtain this behaviour?

1. Changes to the FSA

% FSA              % Lexicon
initial(1).        lex(a,det).
final(3).          lex(the,det).
arc(1,2,det).      lex(fast,adj).
arc(2,2,adj).      lex(brave,adj).
arc(2,3,cn).       lex(witch,cn).
arc(1,3,pn).       lex(wizard,cn).
arc(3,1,prep).     lex(broomstick,cn).
                   lex(rat,cn).
                   lex(harry,pn).
                   lex(hermione,pn).
                   lex(ron,pn).
                   lex(with,prep).

Changes to the Parser

Parse1

parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).

Parse2

parse2(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse2(Label, String, NewString),
    parse2(Node2, NewString, Path).

traverse2(Label, [Symbol|Symbols], Symbols) :-
    lex(Symbol, Label).
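
Putting the category-level automaton, the lexicon and parse2 together gives the following runnable sketch. The base clause of parse2/3 and the wrapper testparse2/2 are not shown on the slides; they are assumed here to mirror parse1/3 and testparse1/2. With this lexicon, common nouns carry the label cn.

```prolog
% Category-level automaton and lexicon from the slides.
initial(1).
final(3).
arc(1,2,det).
arc(2,2,adj).
arc(2,3,cn).
arc(1,3,pn).
arc(3,1,prep).

lex(a,det).          lex(the,det).
lex(fast,adj).       lex(brave,adj).
lex(witch,cn).       lex(wizard,cn).
lex(broomstick,cn).  lex(rat,cn).
lex(harry,pn).       lex(hermione,pn).
lex(ron,pn).
lex(with,prep).

% An arc labelled with a category is traversed by any word
% that the lexicon assigns to that category.
traverse2(Label, [Symbol|Symbols], Symbols) :-
    lex(Symbol, Label).

% Base case: assumed to mirror parse1/3.
parse2(Node, [], [Node]) :-
    final(Node).
parse2(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse2(Label, String, NewString),
    parse2(Node2, NewString, Path).

% Wrapper: assumed to mirror testparse1/2.
testparse2(Symbols, Parse) :-
    initial(Node),
    parse2(Node, Symbols, Parse).

% ?- testparse2([the,fast,wizard], Z).
% Z = [1,det,2,adj,2,cn,3]
```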

Handling Jumps

traverse3('#', String, String).

traverse3(Cat, [Word|Words], Words) :-
    lex(Word, Cat).
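
To see the jump clause at work, here is a minimal sketch. The automaton is hypothetical: it is a cut-down version of the category-level NP automaton with one added arc labelled '#' that lets the parser skip the determiner. The predicates parse3/3 and testparse3/2 are likewise assumptions, modelled on parse2/3.

```prolog
% Hypothetical automaton: arc(1,2,'#') is a jump arc that
% moves from node 1 to node 2 without consuming a word.
initial(1).
final(3).
arc(1,2,det).
arc(1,2,'#').
arc(2,2,adj).
arc(2,3,cn).

lex(the, det).
lex(fast, adj).
lex(wizard, cn).

% Jump: the input is left untouched.
traverse3('#', String, String).
% Otherwise: consume one word of the arc's category.
traverse3(Cat, [Word|Words], Words) :-
    lex(Word, Cat).

% Assumed parser, modelled on parse2/3.
parse3(Node, [], [Node]) :-
    final(Node).
parse3(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse3(Label, String, NewString),
    parse3(Node2, NewString, Path).

testparse3(Symbols, Parse) :-
    initial(Node),
    parse3(Node, Symbols, Parse).

% ?- testparse3([fast,wizard], Z).
% Z = [1,#,2,adj,2,cn,3]
```

This sketch terminates because the jump arc is not part of a cycle. An automaton in which jump arcs form a loop could send this simple parser into infinite recursion.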

Finite State Transducers

• A finite state transducer is essentially a finite state automaton that works on two (or more) tapes.

• The most common way to think about transducers is as a kind of "translating machine" which works by reading from one tape and writing onto the other.

A Translator from a to b

• initial state: arrowhead

• final state: double circle

• a:b: read a from the first tape and write b to the second tape

Prolog Representation

:- op(250,xfx,:).

initial(1).
final(1).
arc(1,1,a:b).

Modes of Operation

• generation mode: It writes a string of a's on one tape and a string of b's on the other tape. Both strings have the same length.

• recognition mode: It accepts when the word on the first tape consists of exactly as many a's as the word on the second tape consists of b's.

• translation mode (left to right): It reads a's from the first tape and writes a b onto the second tape for every a that it reads.

• translation mode (right to left): It reads b's from the second tape and writes an a onto the first tape for every b that it reads.

Transducers and Jumps

• Transducers can make jumps going from one state to another without doing anything on either one or on both of the tapes.

• So, transitions of the form a:# or #:a or #:# are possible.

Simple Transducer in Prolog

transduce1(Node, [], []) :-
    final(Node).

transduce1(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse1(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce1(Node2, NewTape1, NewTape2).

Traverse for FST

traverse1(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

testtrans1(Tape1, Tape2) :-
    initial(Node),
    transduce1(Node, Tape1, Tape2).
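
The four modes of operation described earlier can be tried out by loading the a-to-b transducer together with transduce1/3 and traverse1/5. The sketch below only assembles clauses that appear on the slides; the queries in the comments are illustrative.

```prolog
:- op(250, xfx, :).

% The a-to-b translator: one state, one looping arc.
initial(1).
final(1).
arc(1,1,a:b).

transduce1(Node, [], []) :-
    final(Node).
transduce1(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse1(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce1(Node2, NewTape1, NewTape2).

% Read L1 from the first tape and L2 from the second tape.
traverse1(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

testtrans1(Tape1, Tape2) :-
    initial(Node),
    transduce1(Node, Tape1, Tape2).

% Recognition:                ?- testtrans1([a,a], [b,b]).    succeeds
% Translation, left to right: ?- testtrans1([a,a,a], T).      T = [b,b,b]
% Translation, right to left: ?- testtrans1(T, [b,b]).        T = [a,a]
% Generation:                 ?- testtrans1(X, Y).            first answer X = [], Y = []
```

The same clauses serve all four modes because Prolog unification does not distinguish input from output arguments; which mode is used depends only on which tapes are instantiated in the query. In generation mode, backtracking enumerates ever longer pairs of strings.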

Handling Jumps: 4 Cases

• Jump on both tapes.

• Jump on the first but not on the second tape.

• Jump on the second but not on the first tape.

• Jump on neither tape (this is what traverse1 does).

4 Corresponding Clauses

traverse2('#':'#',Tape1,Tape1,Tape2,Tape2).

traverse2('#':L2,Tape1,Tape1,[L2|RestTape2],RestTape2).

traverse2(L1:'#',[L1|RestTape1],RestTape1,Tape2,Tape2).

traverse2(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).
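
The four clauses can be exercised with a small transducer that jumps on one tape at a time. The automaton below is hypothetical and chosen only for illustration: it reads an a without writing anything, copies any number of b's, and finally writes a c without reading anything. The predicates transduce2/3 and testtrans2/2 are assumed to mirror transduce1/3 and testtrans1/2.

```prolog
:- op(250, xfx, :).

% Hypothetical transducer with jumps.
initial(1).
final(3).
arc(1,2,a:'#').    % read a, write nothing
arc(2,2,b:b).      % copy b
arc(2,3,'#':c).    % read nothing, write c

traverse2('#':'#', Tape1, Tape1, Tape2, Tape2).
traverse2('#':L2, Tape1, Tape1, [L2|RestTape2], RestTape2).
traverse2(L1:'#', [L1|RestTape1], RestTape1, Tape2, Tape2).
traverse2(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

% Assumed to mirror transduce1/3.
transduce2(Node, [], []) :-
    final(Node).
transduce2(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse2(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce2(Node2, NewTape1, NewTape2).

testtrans2(Tape1, Tape2) :-
    initial(Node),
    transduce2(Node, Tape1, Tape2).

% ?- testtrans2([a,b,b], T).
% T = [b,b,c]       (first answer)
```

Note that the last traverse2 clause also matches labels containing '#', treating it as an ordinary symbol. On backtracking, the query above therefore yields a further answer with a literal '#' on the second tape; taking only the first answer, or guarding the last clause against '#', avoids this.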

Morphological Analysis with FSTs

• Morphology is concerned with the internal structure of words.
  – How can a word be decomposed into morphemes?
  – How do the morphemes combine?
  – What are legitimate combinations?

• The basic idea is to write FSTs that map the surface form of a word to a description of the morphemes that constitute that word, or vice versa.

• Example: wizard+s to wizard+PL or kiss+ed to kiss+PAST.

Plural Nouns in English

• Regular forms
  – add an s, as in wizard+s.
  – add es, as in witch+es.

• These are handled with morpho-phonological rules that insert an e whenever the morpheme preceding the s ends in s, x, ch or another fricative.

• Irregular forms
  – mouse/mice
  – automaton/automata

• These are handled on a case-by-case basis.

• We require a transducer that translates wizard+s into wizard+PL, witch+es into witch+PL, mice into mouse+PL and automata into automaton+PL.

FST for English Plurals

FST in Prolog

lex(wizard:wizard,'STEM-REG1').
lex(witch:witch,'STEM-REG2').
lex(automaton:automaton,'IRREG-SG').
lex(automata:'automaton-PL','IRREG-PL').
lex(mouse:mouse,'IRREG-SG').
lex(mice:'mouse-PL','IRREG-PL').
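
One way such lexicon entries might be used is sketched below. This is not the transducer from the slide "FST for English Plurals", whose arcs are not reproduced in this transcript. The arcs, the final states, the morpheme-level tapes such as [wizard,s], and the predicates traverse_m/5, transduce_m/3 and analyse/2 are all hypothetical. The idea is that an arc labelled with a stem class is traversed by any surface:analysis pair the lexicon lists for that class.

```prolog
:- op(250, xfx, :).

% Lexicon entries as on the slide.
lex(wizard:wizard, 'STEM-REG1').
lex(witch:witch, 'STEM-REG2').
lex(automaton:automaton, 'IRREG-SG').
lex(automata:'automaton-PL', 'IRREG-PL').
lex(mouse:mouse, 'IRREG-SG').
lex(mice:'mouse-PL', 'IRREG-PL').

% Hypothetical automaton over morpheme sequences.
initial(1).
final(2).
final(3).
final(4).
arc(1,2,'STEM-REG1').
arc(1,3,'STEM-REG2').
arc(1,4,'IRREG-SG').
arc(1,4,'IRREG-PL').
arc(2,4,s:'PL').
arc(3,4,es:'PL').

% A pair label reads one symbol from each tape.
traverse_m(L1:L2, [L1|T1], T1, [L2|T2], T2).
% A class label is resolved through the lexicon.
traverse_m(Class, [Surface|T1], T1, [Analysis|T2], T2) :-
    lex(Surface:Analysis, Class).

transduce_m(Node, [], []) :-
    final(Node).
transduce_m(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse_m(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce_m(Node2, NewTape1, NewTape2).

analyse(Surface, Analysis) :-
    initial(Node),
    transduce_m(Node, Surface, Analysis).

% ?- analyse([wizard,s], A).    A = [wizard,'PL']
% ?- analyse([witch,es], A).    A = [witch,'PL']
% ?- analyse([mice], A).        A = ['mouse-PL']
```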