NLP
Introduction to NLP
Text Generation
Basic NLP Pipeline
• (U)nderstanding: Language → Computer
• (G)eneration: Computer → Language
Definition
• Natural language generation is the process of deliberately constructing a natural language text in order to meet specified communicative goals.
[McDonald 1992]
What is NLG?
• Mapping meaning to text
• Stages:
– Content selection
– Lexical choice
– Sentence structure: aggregation, referring expressions
– Discourse structure
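The stages above can be sketched as a toy pipeline. This is only an illustration of the stage boundaries, not any real NLG system; all function and variable names here are hypothetical, and the weather domain echoes the FOG example below.

```python
# Toy sketch of the classic NLG stages (all names hypothetical).

facts = [('temp', 22), ('wind', 'low'), ('humidity', 80)]

def select_content(facts):
    # Content selection: keep only facts that serve the communicative goal.
    return [f for f in facts if f[0] in ('temp', 'wind')]

def lexicalize(fact):
    # Lexical choice: map data values to words and phrases.
    key, val = fact
    if key == 'temp':
        return f'a temperature of {val} degrees'
    return f'{val} wind'

def aggregate(phrases):
    # Sentence structure: aggregate the phrases into one sentence.
    return 'Expect ' + ' and '.join(phrases) + '.'

print(aggregate([lexicalize(f) for f in select_content(facts)]))
# → Expect a temperature of 22 degrees and low wind.
```

A real generator would also handle referring expressions and discourse structure across multiple sentences; this sketch stops at a single aggregated sentence.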
Example of an NLG System
• FOG (Goldberg et al. 1994)
• Weather forecast reports for the Canadian Weather Service
• Input
– Numerical simulation data annotated by humans
Plandoc
• Function
– Produces a report describing the simulation options that an engineer has explored
• Input
– A simulation log file
• Developer
– Bellcore and Columbia University
Input for Plandoc
• RUNID fiberall FIBER 6/19/93 act yes
• FA 1301 2 1995
• FA 1201 2 1995
• FA 1401 2 1995
• FA 1501 2 1995
• ANF co 1103 2 1995 48
• ANF 1201 1301 2 1995 24
• ANF 1401 1501 2 1995 24
• END. 856.0 670.2
Output
• This saved fiber refinement includes all DLC changes in Run-ID ALLDLC. RUN-ID FIBERALL demanded that PLAN activate fiber for CSAs 1201, 1301, 1401 and 1501 in 1995 Q2. It requested the placement of a 48-fiber cable from the CO to section 1103 and the placement of 24-fiber cables from section 1201 to section 1301 and from section 1401 to section 1501 in the second quarter of 1995. For this refinement, the resulting 20 year route PWE was $856.00K, a $64.11K savings over the BASE plan and the resulting 5 year IFC was $670.20K, a $60.55K savings over the BASE plan.
Considerations
• NLG is about choices
– Content
– Coherence
– Style
– Media
– Syntax
– Aggregation
– Referring expressions
– Lexical choice
Introduction to NLP
Features and Unification
Need for feature-based grammars
• Example
– The dogs bites (agreement)
• Example
– many water (count/mass nouns)
• Idea
– S → NP VP (if the person of the NP is equal to the person of the VP)
Unification Grammars
• Types of unification grammars
– LFG, HPSG, FUG
• Handle agreement
– e.g., number, gender, person
• Unification
– Two constituents can be combined only if their features can unify
Feature unification
[CAT NP, PERSON 3, NUMBER SINGULAR] ∪ [CAT NP, NUMBER SINGULAR, PERSON 3]
= [CAT NP, NUMBER SINGULAR, PERSON 3]
Feature unification
[CAT NP, PERSON 3, NUMBER SINGULAR] ∪ [CAT NP, PERSON 1]
= FAILURE (PERSON 3 conflicts with PERSON 1)
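The two examples above can be reproduced with a minimal sketch, assuming flat attribute-value dictionaries and using `None` to signal a clash (function and variable names are hypothetical, not from any real unification library):

```python
def unify_flat(a, b):
    """Unify two flat feature dicts; return None on a feature clash."""
    out = dict(a)
    for feat, val in b.items():
        if feat in out and out[feat] != val:
            return None          # same feature, incompatible values
        out[feat] = val
    return out

fd1 = {'CAT': 'NP', 'PERSON': 3, 'NUMBER': 'SINGULAR'}
fd2 = {'CAT': 'NP', 'NUMBER': 'SINGULAR', 'PERSON': 3}
fd3 = {'CAT': 'NP', 'PERSON': 1}

print(unify_flat(fd1, fd2))  # identical features in either order: unifies
print(unify_flat(fd1, fd3))  # None: PERSON 3 clashes with PERSON 1
```

Real feature structures are nested and may share values by reference, so practical unifiers recurse into embedded structures; this flat version only shows the success and failure cases from the slides.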
Agreement
• S → NP VP
  {NP PERSON} = {VP PERSON}
• S → Aux NP VP
  {Aux PERSON} = {NP PERSON}
• Verb → bites
  {Verb PERSON} = 3
• Verb → bite
  {Verb PERSON} = 1
Subcategorization
• VP → Verb
  {VP SUBCAT} = {Verb SUBCAT}
  {VP SUBCAT} = INTRANS
• VP → Verb NP
  {VP SUBCAT} = {Verb SUBCAT}
  {VP SUBCAT} = TRANS
• VP → Verb NP NP
  {VP SUBCAT} = {Verb SUBCAT}
  {VP SUBCAT} = DITRANS
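The subcategorization constraints above amount to checking that the verb's SUBCAT value licenses the number of NP complements. A minimal sketch, with a hypothetical toy lexicon (the verb entries are illustrative, not from the slides):

```python
# Hypothetical lexicon mapping verbs to their SUBCAT value.
LEXICON = {'sleep': 'INTRANS', 'bites': 'TRANS', 'give': 'DITRANS'}

# How many NP complements each SUBCAT value licenses.
ARITY = {'INTRANS': 0, 'TRANS': 1, 'DITRANS': 2}

def vp_ok(verb, num_objects):
    """True iff a VP with this verb and this many NP objects unifies."""
    return ARITY[LEXICON[verb]] == num_objects

print(vp_ok('bites', 1))  # True: VP → Verb NP with a TRANS verb
print(vp_ok('sleep', 1))  # False: an INTRANS verb takes no object
```

In a real unification grammar this check falls out of unifying {VP SUBCAT} with {Verb SUBCAT}, rather than from an explicit arity table.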
Systemic Grammars
• Language is viewed as a resource for expressing meaning in context (Halliday, 1985)
• Layers: mood, transitivity, theme
Example: "The system will save the document"

              The system   will     save         the document
Mood          subject      finite   predicator   object
Transitivity  actor                 process      goal
Theme         theme        rheme
Example
(:process save-1
 :actor system-1
 :goal document-1
 :speechact assertion
 :tense future)
• Input is underspecified
The Functional Unification Formalism (FUF)
• Based on Kay's (83) formalism
• Partial information, declarative, uniform, compact
• Same framework used for all stages: syntactic realization, lexicalization, and text planning
Functional Analysis
• Functional vs. structural analysis
• "John eats an apple"
• Actor (John), affected (apple), process (eat)
• Suitable for generation
Partial vs. Complete Specification
• Voice: An apple is eaten by John
• Tense: John ate an apple
• Mode: Did John eat an apple?
• Modality: John must eat an apple

action = eat
actor = John
object = apple
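One way to see why the partial specification above suffices: the grammar fills in defaults for every feature the input leaves open, and the input overrides them where it does care. A minimal sketch, assuming flat dicts (the default values and names are hypothetical):

```python
# Hypothetical defaults the grammar supplies for unspecified features.
DEFAULTS = {'voice': 'active', 'mood': 'declarative', 'tense': 'present'}

def complete(fd):
    """Fill in defaults; input features override them."""
    return {**DEFAULTS, **fd}

spec = complete({'action': 'eat', 'actor': 'John', 'object': 'apple'})
print(spec['voice'])   # 'active': filled in by default
print(complete({'action': 'eat', 'tense': 'future'})['tense'])  # 'future': kept from the input
```

So the same three-feature input can surface as "John eats an apple", or, with voice, tense, mode, or modality specified, as any of the variants listed above.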
Unification
• Target sentence
• Input FD
• Grammar
• Unification process
• Linearization process
Path notation
• View an FD as a tree
• To specify features, use a path
– {feature feature … feature} value
– e.g. {prot number}
• Also use relative paths
– {^ number} value = the feature number for the current node
– {^ ^ number} value = the feature number for the node above the current node
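Treating an FD as a nested dict, path lookup can be sketched as a walk that follows feature names downward and treats `^` as a step up to the parent node. This is a toy re-implementation of the idea, not FUF's actual resolver; the function name is hypothetical.

```python
def resolve(stack, path):
    """Follow a feature path in an FD tree.

    stack: list of nodes from the root FD down to the current node.
    path:  sequence of feature names; '^' steps up one level.
    """
    stack = list(stack)              # don't mutate the caller's stack
    for feat in path:
        if feat == '^':
            stack.pop()              # move to the node above
        else:
            stack.append(stack[-1][feat])
    return stack[-1]

fd = {'prot': {'n': {'lex': 'john'}, 'number': 'sing'}}

# Absolute path {prot number}, starting at the root:
print(resolve([fd], ('prot', 'number')))                            # sing
# Relative path {^ number}, starting at the node fd['prot']['n']:
print(resolve([fd, fd['prot'], fd['prot']['n']], ('^', 'number')))  # sing
```

Real FUF paths can also denote structure sharing (two paths unified to the same value), which a plain dict walk does not capture.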
Sample input
((cat s)
 (prot ((n ((lex john)))))
 (verb ((v ((lex like)))))
 (goal ((n ((lex mary))))))
Sample Grammar
((alt top (((cat s)
            (prot ((cat np)))
            (goal ((cat np)))
            (verb ((cat vp)
                   (number {prot number})))
            (pattern (prot verb goal)))
           ((cat np)
            (n ((cat noun)
                (number {^ ^ number})))
            (alt (((proper yes)
                   (pattern (n)))
                  ((proper no)
                   (pattern (det n))
                   (det ((cat article)
                         (lex "the")))))))
           ((cat vp)
            (pattern (v))
            (v ((cat verb))))
           ((cat noun))
           ((cat verb))
           ((cat article)))))
Sample Output
((cat s)
 (goal ((cat np)
        (n ((cat noun)
            (lex mary)
            (number {goal number})))
        (pattern (n))
        (proper yes)))
 (pattern (prot verb goal))
 (prot ((cat np)
        (n ((cat noun)
            (lex john)
            (number {verb number})))
        (number {verb number})
        (pattern (n))
        (proper yes)))
 (verb ((cat vp)
        (pattern (v))
        (v ((cat verb)
            (lex like))))))
Unification Example
Unify Prot
Unify Goal
Unify VP
Unify Verb
Finish
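The walkthrough above (unify prot, goal, VP, verb, then linearize) can be sketched end to end in Python. This is a toy re-implementation, not FUF: the grammar is simplified to proper nouns only, with no alts, number agreement, or error handling, and all names are hypothetical.

```python
def unify(a, b):
    """Recursively merge two feature values; None signals failure."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None
    out = dict(a)
    for feat, val in b.items():
        if feat in out:
            merged = unify(out[feat], val)
            if merged is None:
                return None
            out[feat] = merged
        else:
            out[feat] = val
    return out

# Toy grammar: each category contributes a fragment to unify in.
GRAMMAR = {
    's':  {'prot': {'cat': 'np'}, 'goal': {'cat': 'np'},
           'verb': {'cat': 'vp'}, 'pattern': ['prot', 'verb', 'goal']},
    'np': {'pattern': ['n']},     # proper nouns only, for brevity
    'vp': {'pattern': ['v']},
}

def enrich(fd):
    """Unify fd with its category's grammar fragment, recursively."""
    fd = unify(fd, GRAMMAR.get(fd.get('cat'), {}))
    for feat, val in fd.items():
        if isinstance(val, dict):
            fd[feat] = enrich(val)
    return fd

def linearize(fd):
    """Read words off the pattern features, left to right."""
    if 'lex' in fd:
        return [fd['lex']]
    words = []
    for slot in fd.get('pattern', []):
        words.extend(linearize(fd[slot]))
    return words

input_fd = {'cat': 's',
            'prot': {'n': {'lex': 'john'}},
            'verb': {'v': {'lex': 'like'}},
            'goal': {'n': {'lex': 'mary'}}}

print(' '.join(linearize(enrich(input_fd))))  # → john like mary
```

Unifying the `s` fragment assigns `cat np` to prot and goal and `cat vp` to the verb; the recursive `enrich` calls then pull in each constituent's own pattern, and linearization walks the patterns to emit the lexical items, mirroring the Unify Prot → Unify Goal → Unify VP → Unify Verb → Finish sequence above.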
The SURGE grammar (Elhadad)
• Syntactic realization front-end
• Variable level of abstraction
• 5,600 branches and 1,600 alts
Lexical chooser → (Lexicalized FD) → SURGE → (Syntactic FD) → Linearizer/Morphology → Text