grammar engineering: what is it good for? miriam butt (university of konstanz) and martin forst...

16
Grammar Engineering: Grammar Engineering: What is it good for? What is it good for? Miriam Butt Miriam Butt (University of Konstanz) (University of Konstanz) and and Martin Forst (NetBase Martin Forst (NetBase Solutions) Solutions) Colombo 2014 Colombo 2014

Upload: jocelyn-jordan

Post on 13-Jan-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

Grammar Engineering:Grammar Engineering:What is it good for?What is it good for?

Miriam Butt Miriam Butt (University of Konstanz)(University of Konstanz)and and Martin Forst (NetBase Solutions)Martin Forst (NetBase Solutions)

Colombo 2014Colombo 2014

Page 2: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

OutlineOutline

What is a deep grammar and why would you want one? XLE and ParGram Robustness techniques Generation and Disambiguation Some Applications:

– Question-Answering System – Murrinh-Patha Translation System– Computer Assisted Language Learning (CALL)– [Text Summarization]

Page 3: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

Deep grammarsDeep grammars

Provide detailed syntactic/semantic analyses– HPSG (LinGO, Matrix), LFG (ParGram)– Grammatical functions, tense, number, etc.

Mary wants to leave.

subj(want~1,Mary~3)

comp(want~1,leave~2)

subj(leave~2,Mary~3)

tense(leave~2,present)

Usually manually constructed

Page 4: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

Why don't people use them?Why don't people use them? Time consuming and expensive to write

– shallow parsers can be induced automatically from a training set

Brittle– shallow parsers produce something for everything

Ambiguous– shallow parsers rank the outputs

Slow– shallow parsers are very fast (real time)

Page 5: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

Why should one pay attention now?Why should one pay attention now?

Robustness: – Integrated Chunk Parsers/Fragment Grammars– Bad input always results in some (possibly good) output

Ambiguity:

– Integration of stochastic methods– Optimality Theory used to rank/pick alternatives

Speed: getting closer to shallow parsers Accuracy and information content:

– far beyond the capabilities of shallow parsers.

New Generation of Large-Scale Grammars:

Page 6: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

XLE at PARCXLE at PARC

Platform for Developing Large-Scale LFG Grammars

LFG (Lexical-Functional Grammar)– Invented in the 1980s

(Joan Bresnan and Ronald Kaplan)– Theoretically stable Solid Implementation

XLE is implemented in C, used with emacs, tcl/tk XLE includes a parser, generator and transfer (XFR)

component.

Page 7: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

Project StructureProject Structure Languages: Arabic, Chinese, Danish, English,

French, Georgian, German, Hungarian, Irish Gaelic, Indonesian, Japanese, Malagasy, Murrihn-Patha, Norwegian, Polish, Tigrinya, Turkish, Urdu, Welsh, Wolof…

Theory: Lexical-Functional Grammar Platform: XLE

– parser– generator– machine translation

Loose organization: no common deliverables, but common interests.

Page 8: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

Grammar ComponentsGrammar Components

Each Grammar contains:• Annotated Phrase Structure Rules (S --> NP VP)

• Lexicon (verb stems and functional elements)

• Finite-State Morphological Analyzer

• A version of Optimality Theory (OT):

used as a filter to restrict ambiguities and/or parametrize the grammar.

Page 9: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

The Parallel in ParGramThe Parallel in ParGram Analyze languages to a degree of abstraction that

reflects the common underlying structure (i.e., identiy the subject, the object, the tense, mood, etc.)

Even at this level, there is usually more than one way to analyze a construction

The same theoretical analysis may have different possible implementations

The ParGram Project decides on common analyses and implementations (via meetings and the feature committee)

Page 10: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

The Parallel in ParGramThe Parallel in ParGram

Analyses at the level of c-structure are allowed to differ (variance across languages)

Analyses at f-structure are held as parallel as possible across languages (crosslinguistic invariance).

Theoretical Advantage: This models the idea of UG.

Applicational Advantage: machine translation is made easier; applications are more easily adapted to new languages (e.g., Kim et al. 2003).

Page 11: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

Basic LFGBasic LFG Constituent-Structure: tree Functional-Structure: Attribute Value Matrix universal

NP

PRON they

S

VP

Vappear

PRED 'pro'

PERS 3

NUM pl

SUBJ

TENSE pres

PRED 'appear<SUBJ>'

Page 12: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

Basic LFG (cont.)Basic LFG (cont.)Three basic wellformedness principles:

ConsistencyEvery attribute can only have one value.

CompletenessAll grammatical functions listed in a subcategorization frame must be present, and if thematic, they must have a PRED.

CoherenceAll grammatical functions in an f-structure must be licensed by a subcategorization frame.

Page 13: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

Syntactic rulesSyntactic rules

Annotated phrase structure rules Category --> Cat1: Schemata1;

Cat2: Schemata2;

Cat3: Schemata3.

S --> NP: (^ SUBJ)=!

(! CASE)=NOM;

VP: ^=!.

Page 14: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

Another sample ruleAnother sample rule

"indicate comments"

VP --> V: ^=!; "head"

(NP: (^ OBJ)=! "() = optionality"

(! CASE)=ACC)

PP*: ! $ (^ ADJUNCT). "$ = set"

VP consists of:

a head verb

an optional object

zero or more PP adjuncts

Page 15: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

LexiconLexicon

Basic form for lexical entries:word Category1 Morphcode1 Schemata1; Category2 Morphcode2 Schemata2.

walk V * (^ PRED)='WALK<(^ SUBJ)>'; N * (^ PRED) = 'WALK' . girl N * (^ PRED) = 'GIRL'.

kick V * { (^ PRED)='KICK<(^ SUBJ)(^ OBJ)>' |(^ PRED)='KICK<(^ SUBJ)>'}.

the D * (^ DEF)=+.

Page 16: Grammar Engineering: What is it good for? Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) Colombo 2014

Parsing a stringParsing a string

create-parser grammar1.lfg parse ”the dog eats bones” Ungrammatical via Unification, etc.

Simple Demo