adpositional grammars
Introduction Theory Application Implementation Machine translation Conclusions
Adpositional Grammars
A multilingual grammar formalism for NLP
Federico Gobbo
federico.gobbo@uninsubria.it
Università dell’Insubria, Varese
CC BY-NC-SA
Varese, 9 January 2009
Two key questions posed by Leibniz
1. How are the laws of human thought made?
2. How can linguistic knowledge be formalised?
Two current approaches for question 2
Since Chomsky 1956: generative-transformational grammars.
Pro: well formalised, good algorithms.
Con: constituents are not linguistically adherent.
Since Tesnière 1959: dependency-based grammars.
Pro: linguistically adherent and rich.
Con: difficult to render computationally.
Pennacchietti’s research
Since 1974, studies on prepositions.
moving from the Tesnièrian school;
unsatisfied by the vague notion of ‘dependency’;
looking for a clear notion;
found in Langacker’s trajector/landmark (tr/lm) dichotomy.
The tr/lm dichotomy is derived from Gestalt psychology.
Pennacchietti’s main results
Prepositions build the structure of natural languages (NL):
expressing the relation between a trajector (yellow)...
...and an active landmark (red);
...or a passive landmark (blue).
A pseudo-formal system is built:
2 directions (tr ↔ lm);
4 configurations in a Cartesian space;
each NL is depicted by a prepositional space.
The prepositional space of English (specimen)
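[Figure: the prepositional space of English, a Cartesian plane with the axes non-dimensional/dimensional and applicative/retroapplicative. Sample prepositions, one group per quadrant: to, ...; between, ...; with, ...; from, ...]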
Trajector/landmark at a glance 1/4
&(A book) is between "(two men).
&(Un libro) è tra "(due uomini).
Trajector/landmark at a glance 2/4
&(Two men) hold ! "(a book).
&(Due uomini) tengono ! "(un libro).
Trajector/landmark at a glance 3/4
&(Two men) are with $(a book).
&(Due uomini) sono con $(un libro).
Trajector/landmark at a glance 4/4
&(A book) is with '(two men).
&(Un libro) è con '(due uomini).
This structure is composable and recursive
&(Mr. C) receives ! "(the book) from %(Mr. A).
&(Il Sig. C) riceve ! "(un libro) da- %(-l Sig. A).
Foundation of adtrees
Adpositional trees (adtrees) represent that structure. Caveats:
adtrees are always Porphyrian, i.e., binary;
prepositions are generalised into adpositions;
each tree has a hook, i.e., the place where the adposition stands;
the basic unit of NLs is the morpheme, not the word;
adpositional morphemes form morphosyntax;
non-adpositional morphemes are lexemes.
Adtrees represent morphosyntax and some pragmatic phenomena.
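The properties listed above (binary trees, an adposition at the hook, lexemes as leaves) can be sketched as a small data structure. This is only an illustrative sketch under assumptions: the class name `Adtree`, its fields and the `leaves` helper are hypothetical and do not come from the original work, and the nesting in the second example is not the author's analysis of the sentence.

```python
from dataclasses import dataclass
from typing import Union

# A child is either another adtree or a lexeme (a plain string).
# All names here are hypothetical, chosen for illustration only.
Node = Union["Adtree", str]


@dataclass
class Adtree:
    adposition: str  # the morpheme at the hook; "" stands for a zero adposition
    left: Node
    right: Node

    def leaves(self) -> list[str]:
        """Return the lexemes of the tree, from left to right."""
        out: list[str] = []
        for child in (self.left, self.right):
            if isinstance(child, Adtree):
                out.extend(child.leaves())
            else:
                out.append(child)
        return out


# A minimal adtree: one adposition, two lexemes.
book = Adtree("between", "a book", "two men")

# Adtrees are recursive: a child can itself be an adtree.
# (Illustrative nesting only, not the author's analysis.)
nested = Adtree("from", Adtree("", "Mr. C", "the book"), "Mr. A")
```

Because every node has exactly two children, any traversal only ever needs to handle the hook and its left and right subtrees.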
The abstract minimal adtree types
[Figure 2.8: The abstract minimal adtree types. Four binary trees, each with the adposition (adp) at the hook and a landmark (lm) and a trajector (tr) as children, in either order.]
[Figure 2.9: The minimal anonymous adtree. One binary tree with adp at the hook and the children tr|lm and lm|tr.]
[Figure 2.10: The relation between adtrees and Ceccato’s translation system. Nested adtrees with the hooks e, in and da over the lexemes carne, pesce, frigorifero and ieri.]
The hook is derived from Ceccato’s work in machine translation.
Semantics
Lexemes form semantics;
there are four grammar characters;
each lexeme has a fundamental grammar character;
lexemes can be transferred to another grammar character;
each lexeme has a valence value.
Adpositional grammar = adtrees + dictionary.
The 4 grammar characters
From Whorf and Tesnière:
stativation (O);
adjunctivation, i.e., what modifies stativation (A);
verbification (I);
circumstantiation, i.e., what modifies verbification (E).
This structure of semantics is cross-lingually valid (Whorf).
How the dictionary is built (example)
transference  English    Italian      French         German
A             long       lung-o       long           lang
A>O           length     lungh-ezz-a  longu-eur      Länge
A>E           long       lung-amente  longu-ement    ent-lang
A>I           length-en  al-lung-are  (r)al-long-er  ver-läng-ern
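The transference table can be read as a lookup from a lexeme and a target grammar character to a surface form. The following is a minimal sketch restricted to the English column; the `DICTIONARY` layout and the `transfer` function are assumptions made for illustration, not the dictionary format of the original system.

```python
# English column of the transference table for the lexeme "long",
# whose fundamental grammar character is A (adjunctive).
# The key layout (lexeme, transference) is a hypothetical choice.
DICTIONARY = {
    ("long", "A"): "long",
    ("long", "A>O"): "length",
    ("long", "A>E"): "long",
    ("long", "A>I"): "length-en",
}


def transfer(lexeme: str, target: str, fundamental: str = "A") -> str:
    """Return the form of `lexeme` once transferred to grammar character `target`."""
    if target == fundamental:
        key = fundamental                 # no transference needed
    else:
        key = f"{fundamental}>{target}"   # e.g. "A>O"
    return DICTIONARY[(lexeme, key)]


stative = transfer("long", "O")   # adjunctive -> stative
verbal = transfer("long", "I")    # adjunctive -> verbal
```

Keeping the transference as part of the key makes it explicit that the same lexeme yields different morpheme sequences depending on the grammar character it is moved to.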
Valence and actants
Valence is derived from Tesnière. Some examples:
nevica (Italian for ‘it snows’; valence = 0);
Carl smiled (valence = 1);
Liza reads a book (valence = 2);
Liza gives a kiss to Paul (valence = 3).
Actants express pragmatics:
Carl and Liza are Agents (Na);
Paul is an Experiencer (Ne);
a book is a Patient (Np).
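As a small sketch of how a valence value constrains a sentence, the check below compares the number of actants supplied with the valence of the verb, using the four examples above. The `VALENCE` table and the `saturated` function are hypothetical names introduced only for this illustration.

```python
# Valence values taken from the examples above; the table itself is illustrative.
VALENCE = {"nevica": 0, "smiled": 1, "reads": 2, "gives": 3}


def saturated(verb: str, actants: list[str]) -> bool:
    """Return True when the number of actants equals the valence of the verb."""
    return len(actants) == VALENCE[verb]


ok = saturated("reads", ["Liza", "a book"])          # bivalent verb, two actants
incomplete = saturated("gives", ["Liza", "a kiss"])  # trivalent verb, one actant missing
```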
A man is with a book
• (49-en.) A man is !(with) a book.
• (49-it.) Un uomo è !(con) un libro.
• (49-tu.) O kitap!-lı adam-dır.
Again, in example 49 the English and Italian adtrees are very similar (see Figure 2.55). The adpositions with : con are false, as the bivalent verb is to be with : essere con, a situation already explained in section 2.9.1. In contrast, the Turkish adtree is far simpler (see Figure 2.56). Example 49 is built morphologically as follows:
O     kitap-  -lı  adam-  -dır
That  book    his  man    is
[Figure 2.55: The English and Italian adtrees of A man is with a book (49). English: A man (Na), with, a book (Np), is. Italian: un uom-o (Na), con, un libr-o (Np), è.]
Interestingly, Turkish uses the determiner O as the syntactic subject S, instead of the Agent adam-, which is part of the second valence, unlike English and Italian.
2.11.4 The relation of possession
Let us see how the convention for PhAdS acts in NLs like Italian and English, while in Turkish it is not so heavily needed. In conceptual space 50, the
Why Esperanto?
Structural facts:
linguistic phenomena of English, German, Russian and French, all in one;
highly regular, like the quasi-natural languages (QNLs) produced by children;
the morphology is considerably smaller than that of NLs;
grammar characters are always expressed, even redundantly.
Basic sociolinguistic facts:
Launched in 1887, it survived two world wars.
Stable speech community, approx. 50,000 active speakers worldwide.
Free web corpora available (e.g., Le Monde Diplomatique).
Phraseology is flexible
English, German and Italian:
Alice had a shower.
Alice duscht.
Alice fece una doccia.
The same sentence in Esperanto:
Alico havis duŝon. (English-like)
Alico faris duŝon. (Italian-like)
Alico duŝis. (German-like)
What is a QNL?
Esperanto is child-like in its grammatical regularity, although planned.
Quasi-Natural English vs. English:
Two child-s run-ed towards the mouse-s.
Two children run towards the mice.
Quasi-Natural Italian vs. Italian:
I due uov-i and-ano cuoci-uti più bene.
Le due uova vanno cotte meglio.
Application at a sentence level
%-junctives (example):
&(Alfredo povas pagi), ĉar li estas rica.
&(Al can pay), because he is rich.
&(Alfredo può pagare), poiché è ricco.
$-junctives (example):
Alfredo povas pagi, do &(li estas rica).
Al is rich, therefore &(he can pay).
Alfredo è ricco, dunque &(può pagare).
The architecture
strings → words → tokens;
a token is a set of tags;
the parser builds the adtrees;
adtrees are the data structures.
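The first stages of this architecture can be sketched as follows, assuming that words are separated by spaces and that a token is a set of string tags. The tag names (`Rep(...)`, `ACC`, `PL` and the grammar characters O, A, I, E), the `FINALS` list and the function names are assumptions made for this illustration; the heuristic is deliberately naive and would mislabel function words such as li or do.

```python
# Hypothetical word finals marking the grammar character in Esperanto.
# This naive list is an illustration, not the lexical analyser of the original system.
FINALS = [("as", "I"), ("is", "I"), ("os", "I"), ("us", "I"),
          ("i", "I"), ("u", "I"), ("o", "O"), ("a", "A"), ("e", "E")]


def to_words(string: str) -> list[str]:
    """strings -> words: lowercase, drop the final full stop, split on spaces."""
    return string.lower().rstrip(".").split()


def to_token(word: str) -> frozenset[str]:
    """words -> tokens: a token is a set of tags."""
    tags = {f"Rep({word})"}
    stem = word
    if stem.endswith("n"):   # accusative ending
        tags.add("ACC")
        stem = stem[:-1]
    if stem.endswith("j"):   # plural ending
        tags.add("PL")
        stem = stem[:-1]
    for ending, character in FINALS:
        if stem.endswith(ending):
            tags.add(character)
            break
    return frozenset(tags)


def tokenize(string: str) -> list[frozenset[str]]:
    """Run the whole pipeline: string -> words -> tokens."""
    return [to_token(word) for word in to_words(string)]


tokens = tokenize("Alico havis duŝon.")
```

A parser would then consume this list of tag sets and build adtrees from it; that step is not sketched here.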
How it is implemented
Von Neumann’s virtual machine with enough memory;
C-like pseudo-language for the lexical analyser;
non-determinism simulated through backtracking;
parsing algorithm as a set of derivation rules;
complex logic syntax;
suitable for a logical framework (e.g., Isabelle).
The description of parsing (specimen)
S.  Prefix*  Spec    {-o-|-a-|-e-|-i-|∅}  Atom    Suffix*  final
E.           demand  -o-                  sign             -as
    kun-                                  vojaĝ            -u
                                          frenez  -ig-     -os
             daŭr    -i-                  pov              -is
P.  I5*      I4      I3                   I2      I1*      I0
M.  *        +       +                    ,       *        ,
The parsing of the verbal group in Esperanto.
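The pattern for the verbal group, together with the idea of simulating non-determinism through backtracking, can be sketched with Python generators: each generator yields every alternative, and an alternative that leads nowhere is simply abandoned. This is a sketch under assumptions, not the parsing algorithm of the original system. The tiny lexicon is built only from the four examples, the specifier is assumed to be optional (as in kun-vojaĝ-u), and all function names are hypothetical.

```python
# Tiny hypothetical lexicon covering only the four examples of the slide.
PREFIXES = {"kun"}
ROOTS = {"demand", "sign", "vojaĝ", "frenez", "daŭr", "pov"}
LINKS = {"o", "a", "e", "i", ""}   # "" is the empty linking vowel
SUFFIXES = {"ig"}
FINALS = {"as", "is", "os", "us", "u", "i"}


def starts(word, pos, options):
    """Yield (morpheme, new_pos) for every option that matches at pos."""
    for morpheme in options:
        if word.startswith(morpheme, pos):
            yield morpheme, pos + len(morpheme)


def star(word, pos, options, fmt):
    """Kleene star: yield (morphemes, new_pos) for zero or more matches."""
    yield [], pos
    for morpheme, p in starts(word, pos, options):
        for rest, q in star(word, p, options, fmt):
            yield [fmt.format(morpheme)] + rest, q


def spec(word, pos):
    """Optional specifier (assumption) followed by a possibly empty linking vowel."""
    yield [], pos
    for root, p in starts(word, pos, ROOTS):
        for link, q in starts(word, p, LINKS):
            yield [root] + ([f"-{link}-"] if link else []), q


def verbal_groups(word):
    """Yield every segmentation matching Prefix* Spec link Atom Suffix* final."""
    for pre, p in star(word, 0, PREFIXES, "{}-"):
        for sp, q in spec(word, p):
            for atom, r in starts(word, q, ROOTS):
                for suf, s in star(word, r, SUFFIXES, "-{}-"):
                    for final, t in starts(word, s, FINALS):
                        if t == len(word):   # the whole word must be consumed
                            yield pre + sp + [atom] + suf + ["-" + final]


analyses = list(verbal_groups("demandosignas"))
```

The nested loops play the role of choice points: when an inner loop yields nothing, control returns to the enclosing loop, which tries its next alternative.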
The predicate of the verbal group (specimen)
recognising an I-group is as follows.
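\[
\begin{aligned}
\text{I-group}(u, v, n, d, c) \equiv{} & (I \in v \wedge \mathrm{Rep}(d) \in v \wedge \text{I-comp-group}(u, \overleftarrow{v}, n)) \;\vee \\
& (\exists c'.\,(c = c' \cdot \text{-n} \vee c = c') \wedge n = 1 \wedge \mathrm{Rep}(\text{est-}) \in u \wedge I \in \overrightarrow{u} \wedge \mathrm{Rep}(d) \in \overrightarrow{u} \wedge {} \\
& \quad (\text{Adj-group}(\overrightarrow{\overrightarrow{u}}, v, \ast, c') \vee \text{e-group}(\overrightarrow{\overrightarrow{u}}, v) \vee \text{nom-stative-group}(\overrightarrow{\overrightarrow{u}}, v)))
\end{aligned}
\tag{7.7}
\]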
Except for extracting the valence from the lexeme acting as the root of the verb, the I-comp-group is a !-group.
I-comp-group(u, v, n) ! 'a.!-atom-group(u, v, n, a) & (7.8)& ('[u,v)w.'l.!-spec-group(u, w, l, a) #
# !-atom-group(%)w , v, n)) .
7.4.3 O-predicates
A stative group is a noun with a precise function within a phrase. Stative groups can be distinguished into valence arguments and extra arguments. The valence stative groups are:
1. the first valence argument (subject S);
2. the second valence argument (object O);
3. the third valence argument (dative D).
In Esperanto, extra arguments can be stative groups, where the hook can be a preposition. As this case is deeply relevant in order to write predicates and rules, I call these particular stative groups prepositional clauses: analogously to correlative clauses, prepositional clauses are pseudophrases acting as adjectives or circumstantials (see section 5.3.9 for C-correlatives).
The most fundamental stative group is an O-group, so I call this set of predicates ‘O-predicates’. A stative group is either a simple stative group (O-S-group), or a composition of stative groups. The p parameter represents the main adposition. More precisely, it is the one that determines the adtype: in Esperanto, this is mostly the preposition. From a formal point of view, the main adposition is the one that is used to attach the adtree corresponding to the stative group to the governing adtree. Visually, the main adposition is never ‘pushed down’ by a left triangle (◁) or a right one
The rule for the verb esti
tural rule takes care of concluding the derivation (section 6.3).
[Rule (7.68): the suffix rule over a suffix range (u, v); it builds an adtree whose children are s and T.]
7.5.3 The I-group
The I-group describes a verb. Thus, this set of rules parses an I-range, corresponding to a sequence of tokens satisfying the I-group predicate.
The general I-range gets parsed by the following rule. The extraction of the verbal final is left to the rules that build the adtree for a phrase.
[Rule I1 (7.69): the general rule for I-range(u, v).]
As said before, the verb esti is an exception (section 5.5.5), so the following rules apply when it is used in conjunction with an adjective, a circumstantial or a nominal stative, respectively.
[Rule I2 (7.70): the rule for I-range(u, v) when esti is used with an adjective, stated in terms of Adj-group and Adj-range; it builds an adtree with hook a whose children are A and est-.]
The machine translation architecture
rule-based, transfer system;
from Esperanto to English/Chinese.
Two steps are performed:
1. the first step: adtree transformation (metataxis);
2. the second step: substitution (i.e., lexeme-by-lexeme).
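The two steps can be sketched on a toy adtree encoded as nested tuples `(adposition, left, right)`. Everything in this sketch is an assumption made for illustration: the tuple encoding, the idea of expressing metataxis as a set of hooks whose children are swapped, the `LEXICON` (built from the Esperanto and English example sentences of this talk) and the function names. It is not the transfer system of the original work.

```python
# Toy Esperanto -> English lexicon, taken from the example sentences of this talk.
LEXICON = {"Alfredo": "Al", "povas": "can", "pagi": "pay", "ĉar": "because",
           "li": "he", "estas": "is", "rica": "rich"}

# An adtree as nested tuples (adposition, left, right); "" is a zero adposition.
# The bracketing is illustrative, not the author's analysis.
TREE = ("ĉar",
        ("", "Alfredo", ("", "povas", "pagi")),
        ("", "li", ("", "estas", "rica")))


def metataxis(tree, swap):
    """Step 1: transform the adtree; here, swap the children of the hooks in `swap`."""
    if isinstance(tree, str):
        return tree
    adp, left, right = tree
    left, right = metataxis(left, swap), metataxis(right, swap)
    return (adp, right, left) if adp in swap else (adp, left, right)


def substitute(tree, lexicon):
    """Step 2: replace every morpheme, lexeme by lexeme."""
    if isinstance(tree, str):
        return lexicon.get(tree, tree)
    adp, left, right = tree
    return (lexicon.get(adp, adp), substitute(left, lexicon), substitute(right, lexicon))


def linearise(tree):
    """Read the tree left to right, placing the adposition between its children."""
    if isinstance(tree, str):
        return [tree]
    adp, left, right = tree
    return linearise(left) + ([adp] if adp else []) + linearise(right)


# No structural change is assumed for this sentence, so `swap` is empty.
english = " ".join(linearise(substitute(metataxis(TREE, set()), LEXICON)))
```

Keeping the two steps as separate functions mirrors the slide: the structural transformation is decided before any lexeme is replaced.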
English
[Figure 2.47: The English and Italian adtrees of The library where I... (46). English morphemes: The, where, often, I, literature, study, library, far away, is. Italian morphemes: La, dove, spesso, letteratur-, studi-, bibliotec-, molto lontan-, è.]
Italian
From Esperanto to “Chinese-anto”
“Chinese-anto”
Chinese
[Figure 2.48: The Chinese and Japanese adtrees of wǒ jīng... (46). Chinese morphemes: de, wǒ, jīng cháng, wén xué, xué xí, tú shū guǎn, lí, zhèr, hěn yuǎn. Japanese morphemes: suru, ga, watashi, wo, bun gaku, benkyoo, wa, desu, toshokan, tooi.]
Table 2.1: Pragmatic analysis of the cross-lingual example
Actants       English  Italian  Turkish
Agent (Na)    man      uom-     adam-
Patient (Np)  book     libr-    kitab-
Action (V)    read-    legg-    oku-
...
Main results of this dissertation
Adgrams are:
cognitively grounded (on the dichotomy trajector/landmark);
cross-linguistically valid (English, Italian, Chinese...);
not domain-specific (full Esperanto implementation: 179 logic formulas, 56 predicates);
formally robust (described in a complex logic syntax);
computationally well founded (suitable for up-to-date logical frameworks).
Adtree macros implemented in LaTeX will be released on CTAN.
Pennacchietti’s essential references
Pennacchietti, Fabrizio A. 2008. The Prepositional System of Classic Syriac and that of Sureth. J/Simtha. Forthcoming.
Pennacchietti, Fabrizio A. 2006. Come classificare le preposizioni? Una nuova proposta. Quaderni del Laboratorio di Linguistica. 6. Normale: Pisa.
Pennacchietti, Fabrizio A. 2006. Propono klasifiki la prepoziciojn de Esperanto. IKU 2006. UEA: Firenze.
Pennacchietti, Fabrizio A. 1976. La prepozicia sistemo de Esperanto. Esperantologiaj Kajeroj I. ELTE: Budapest.
Pennacchietti, Fabrizio A. 1974. Appunti per una storia comparata dei sistemi preposizionali semitici. Annali. Istituto Orientale: Napoli.
Gobbo’s essential references
Gobbo, Federico. 2008. Pianificare il lessico scientifico internazionale. Giuseppe Peano and his School. Università degli Studi: Torino.
Gobbo, Federico. 2006. L’esperanto e la traduzione automatica, in Borbone et al. Loquentes linguis. Harrassowitz: Wiesbaden.
Gobbo, Federico. 2005. The digital way to spread conlangs. ICIL 2005. Universitat Jaume I: Castellón (Spain).
Gobbo, Federico. 2005. The European Union’s Need for an International Auxiliary Language. Journal of Universal Language. 6.
Gobbo, Federico. 1998. Il dilemma dell’esperanto. Master’s thesis. Università degli Studi: Torino.