embedding nomlex-br into openwn-pt
DESCRIPTION
Slides not presented at GWN2014
TRANSCRIPT
+
Embedding NomLex-BR into OpenWN-PT Valeria de Paiva (joint work with Alexandre Rademaker, Gerard de Melo and Livy Real)
+NomLex?
http://nlp.cs.nyu.edu/nomlex/index.html
+NomLex
n a dictionary of English nominalizations, developed under Catherine Macleod
n relate the nominal complements to the arguments of the corresponding verb
n 1025 entries of several types of lexical nominalizations
n first version on January 15, 1999, latest version October 2001
n Developed into NomLex-Plus and NomBank
n downloadable from http://nlp.cs.nyu.edu/nomlex/index.html
Alexander’s destruction of the city happened in 330 BC.
+NOMLEX-BR?
n a dictionary of Portuguese nominalizations
n Relate nominals to corresponding verbs
n Over 1000 entries of several types of lexical nominalizations
n first version of NOMLEX-BR in 2011, much expanded 2013
n downloadable https://github.com/arademaker/nomlex-br
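Construção da rodovia Transamazônica, na década de 70, pelo governo Médici, uma das obras faraônicas da ditadura militar.
(Construction of the Trans-Amazonian highway, in the 1970s, by the Médici government, one of the pharaonic works of the military dictatorship.)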
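As a rough illustration of what such a dictionary records, the sketch below pairs each nominal with its corresponding verb. The class name `NomlexEntry`, its fields, and the verb forms `destroy` and `construir` are assumptions made for this example; this is not the actual NOMLEX or NOMLEX-BR file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NomlexEntry:
    """Hypothetical, simplified entry: a nominal and the verb it relates to."""
    noun: str
    verb: str
    lang: str  # "en" for NOMLEX, "pt" for NOMLEX-BR


# Nouns come from the example sentences on the slides;
# the verbs are added here for illustration only.
ENTRIES = [
    NomlexEntry(noun="destruction", verb="destroy", lang="en"),
    NomlexEntry(noun="construção", verb="construir", lang="pt"),
]


def verb_for(noun: str, lang: str) -> Optional[str]:
    """Return the verb linked to a nominal, or None if it is not listed."""
    for entry in ENTRIES:
        if entry.noun == noun and entry.lang == lang:
            return entry.verb
    return None
```

A lookup such as `verb_for("construção", "pt")` returns the related verb, while an unlisted nominal yields `None`.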
+Nominalizations in Portuguese
n Nominalizations are difficult to deal with in KR systems, as it is hard to recover the arguments of the nominal predicate
n NOMLEX project (Macleod et al., 1998) provides a well-established, open access baseline
n NOMLEX covers nominalizations with the suffixes -ion, -ment and -er, which carry over well to Portuguese
n E.g. construction/construção, adjournment/adiamento and writer/escritor
n 90% of the original resource was easily translated by hand.
+ Into OpenWordnet-PT? Why?
We need a Portuguese Wordnet for our work, as complete and accurate as we can get it.
NomLex-BR improves the completeness and accuracy of OpenWN-PT.
+OpenWordNet-PT…
n data is freely available
n correspondence with Princeton WordNet
n Derived from Universal WordNet (de Melo and Weikum, 2009): high recall, with high precision for the more salient words
n Useful embedding: checking that the nominalizations from NOMLEX-BR were related to their corresponding verbs exposed issues in OpenWN-PT.
https://github.com/arademaker/wordnet-br
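The kind of cross-check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `check_pairs` and the toy word sets are invented here, and the real comparison against OpenWN-PT may work quite differently.

```python
def check_pairs(pairs, wordnet_nouns, wordnet_verbs):
    """Report nominalization pairs whose noun or verb is missing from the wordnet."""
    problems = []
    for noun, verb in pairs:
        if noun not in wordnet_nouns:
            problems.append((noun, verb, "noun missing"))
        if verb not in wordnet_verbs:
            problems.append((noun, verb, "verb missing"))
    return problems


# Toy data, invented for illustration; not taken from NOMLEX-BR or OpenWN-PT.
pairs = [("construção", "construir"), ("adiamento", "adiar")]
wordnet_nouns = {"construção"}
wordnet_verbs = {"construir", "adiar"}

report = check_pairs(pairs, wordnet_nouns, wordnet_verbs)
```

With this toy data, `report` contains a single entry flagging `adiamento` as a noun missing from the wordnet.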
+ OpenWN-PT: what does it look like?
n Typical good entry with minor manual improvements.
n Automatic processing produces candidate Portuguese words for some of the WN 3.0 synsets.
n Check suggested words and add Portuguese gloss and examples.
+ OpenWN-PT: what does it look like?
Not very useful, but sense exists
No single verb in Portuguese for this synset…
+OpenWN-PT: some issues…
Capitalized items, plurals, duplicates, a few gender issues, missing items…
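Two of the issues listed above, capitalized items and duplicates, lend themselves to a simple automated check. The sketch below is only an assumption about how one might flag them; the function name `find_issues` and the sample list are hypothetical, and a capitalized form may of course be a legitimate proper noun.

```python
from collections import Counter


def find_issues(words):
    """Flag capitalized items and case-insensitive duplicates in a word list."""
    issues = []
    for word in words:
        if word[:1].isupper():
            issues.append((word, "capitalized"))
    # Count forms ignoring case, so "Construção" and "construção" collide.
    counts = Counter(word.lower() for word in words)
    for word, count in counts.items():
        if count > 1:
            issues.append((word, "duplicate"))
    return issues


# Invented sample, for illustration only.
sample = ["Construção", "adiamento", "construção", "escritor"]
issues = find_issues(sample)
```

Plurals and gender issues are not covered here, since detecting them reliably needs morphological information rather than string tests.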
+OpenWN-PT: RDF Representation
n OpenWN-PT encoded and distributed in RDF/OWL.
n Both the data model and the actual data are in the same format, so existing data processing tools apply, including databases (“triple stores”) with SQL-like query interfaces (SPARQL).
n Standard W3C encoding of WordNet in RDF since 2006. OpenWN-PT is modelled after and fully interoperable with Princeton WordNet.
n One can find Portuguese equivalents for specific English word senses and vice versa.
n OpenWN-PT is part of a large ecosystem of compatible resources, including domain identifiers and mappings to Wikipedia.
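To make the idea of triples and cross-lingual lookup concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: the synset identifiers, the `hasWord` predicate and the `lang:word` encoding are invented, and the real data uses the RDF vocabulary of the W3C WordNet encoding and is queried with SPARQL rather than with this hand-written matcher.

```python
# Toy triples (subject, predicate, object); all identifiers are invented.
TRIPLES = {
    ("synset:A", "hasWord", "en:construction"),
    ("synset:A", "hasWord", "pt:construção"),
    ("synset:B", "hasWord", "en:writer"),
    ("synset:B", "hasWord", "pt:escritor"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def equivalents(word, source, target):
    """Words in the target language that share a synset with the given word."""
    results = set()
    for synset, _, _ in match(p="hasWord", o=f"{source}:{word}"):
        for _, _, obj in match(s=synset, p="hasWord"):
            if obj.startswith(target + ":"):
                results.add(obj.split(":", 1)[1])
    return sorted(results)
```

Because both languages attach their words to the same synset identifiers, the lookup works in either direction, for example `equivalents("construction", "en", "pt")` or `equivalents("escritor", "pt", "en")`.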
+A small Experiment…
n Accuracy: Since the lexicon was manually created, it is mostly accurate. Minor typos and bugs are caught when comparing it to OpenWN-PT.
n Coverage: DHBB was used to extend NOMLEX-BR; this work was completed after submission
n A more systematic effort is needed, but the results were encouraging
+ Conclusions
n We presented NomLex-BR, a lexicon of nominalizations in Brazilian Portuguese.
n NomLex-BR is embedded into OpenWordNet-PT and shares its RDF representation.
n Recent improvements include better coverage: additional suffixes and the incorporation of Nomage.
n The data is freely available from http://github.com/arademaker/wordnet-br/ and through a SPARQL endpoint at logics.emap.fgv.br:10035.
n Browsing via Open Multilingual Wordnet (www.casta-net.jp/~kuribayashi/cgi-bin/wn-multi.cgi) is fun
+ NomLex-BR: next steps?..
n Work with Claudia Freitas on leveraging Linguateca’s PAPEL, ACDC and Floresta Sintá(c)tica.
n Corpus-based lists from Linguateca’s resources complement NomLex-BR and make sure our resource is not simply a translation.
n Classification of nominalizations?
n Adding the Portuguese terms that satisfy different relations? OpenVerbNet-PT?
n Glosses?
+
Thanks!
+References
Revisiting a Brazilian Wordnet. Valeria de Paiva and Alexandre Rademaker. 2012. In Proceedings of the Global Wordnet Conference, Global Wordnet Association, Matsue.
OpenWordNet-PT: An Open Brazilian WordNet For Reasoning. Valeria de Paiva, Alexandre Rademaker and Gerard de Melo. In Proceedings of the 24th International Conference on Computational Linguistics. http://hdl.handle.net/10438/10274.
OpenWordNet-PT: A Project Report. Alexandre Rademaker, Valeria de Paiva, Gerard de Melo, Livy Real and Maira Gatti. 2014. In Proceedings of the 7th Global Wordnet Conference, Global Wordnet Association, Tartu, Estonia.
Embedding NomLex-BR Nominalizations Into OpenWordnet-PT. Livy Maria Real Coelho, Alexandre Rademaker, Valeria de Paiva and Gerard de Melo. 2014. In Proceedings of the 7th Global WordNet Conference, Tartu, Estonia.
+ OpenWN-PT: true lexical gaps?...
+Other stuff to add in?…
n Onto.PT, ES wordnet?
n Editing interfaces?
n BabelNet?
n NER issues?
n Temporal issues?
n Work with Claudia Freitas?…Leonel?
n Work on implicatives/factives in Portuguese?
n FOIS workshop
+References
Towards a Universal Wordnet by Learning from Combined Evidence. Gerard de Melo and Gerhard Weikum. 2009. 18th ACM Conference on Information and Knowledge Management (CIKM 2009), Hong Kong, China.
Bridges from Language to Logic: Concepts, Contexts and Ontologies. Valeria de Paiva. 2010. Logical and Semantic Frameworks with Applications, LSFA'10, Natal, Brazil.
A Basic Logic for Textual Inference. AAAI Workshop on Inference for Textual Question Answering, 2005.
Textual Inference Logic: Take Two. CONTEXT 2007.
Precision-focused Textual Inference. Workshop on Textual Entailment and Paraphrasing, 2007.
PARC's Bridge and Question Answering System. Proceedings of Grammar Engineering Across Frameworks, 2007.
+ Simplifying the PARC’s Bridge Architecture
Idea: Simplify and reproduce components in PORTUGUESE
[Architecture diagram. Labels: Text, Sources, Question, Parsing, Grammar, Stanford Parser, F-structure semantics, Term rewriting, KR Mapping, KR mapping rules, OpenWN-PT, SUMO-PT, KR, Assertions, Query, Inference Engines, Textual Inference logics]