universal networking language: advances in theory and applications

Upload: shamim-h-ripon

Post on 07-Jul-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    1/465

    Universal Networking Language:Advances in Theory and Applications

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    2/465

    Research on Computing Science

    Series Editorial BoardComité Editorial de la Serie 

    Chief Editors:Editores en Jefe 

     Juan Luis Díaz de León S. (Mexico)Gerhard Ritter (USA)

     Jean Serra (France)Ulises Cortés (Spain)

    Associate Editors:Editores Asociados 

     Jesús Angulo (Frane)Oscar Camacho (Mexico)

     Jihad El-Sana (Israel) Jesús Figueroa (Mexico) Alexander Gelbukh (Russia) Ioannis Kakadiaris (USA)Serguei Levachkine (Russia)

     Petros Maragos (Greece) Julian Padget (UK) Miguel Torres (Mexico) Mateo Valero (Spain)Cornelio Yáñez (Mexico)

    Editorial Coordination:Coordinación Editorial 

     José Ángel Cu Tinoco Miguel R. Silva Millán

    Format:Formación 

    Sulema Torres Ramos Hiram Calvo Castro

    Research on Computing Science  es una publicación trimestral, de circulación internacional, editada por el

    Centro de Investigación en Computación del IPN, para dar a conocer los avances de investigación científica ydesarrollo tecnológico de la comunidad científica internacional. Volumen 12, febrero, 2005. Tiraje: 500

    ejemplares. Certificado de Reserva de Derechos al Uso Exclusivo del Título No. 04-2004-062613250000-102,expedido por el Instituto Nacional de Derecho de Autor. Certificado de Licitud de Título No. 12897, Certificadode licitud de Contenido  No. 10470, expedidos por la Comisión Calificadora de Publicaciones y Revistas

    Ilustradas. El contenido de los artículos es responsabilidad exclusiva de sus respectivos autores. Queda prohibidala reproducción total o parcial, por cualquier medio, sin el permiso expreso del editor, excepto para uso personal

    o de estudio haciendo cita explícita en la primera página de cada documento. Impreso en la Ciudad de México,

    en los Talleres Gráficos del IPN – Dirección de Publicaciones, Tres Guerras 27, Centro Histórico, México, D.F.

    Distribuida por el Centro de Investigación en Computación, Av. Juan de Dios Bátiz S/N, Esq. Av. Miguel Othónde Mendizábal, Col. Nueva Industrial Vallejo, C.P. 07738, México, D.F. Tel. 57 29 60 00, ext. 56571.

    Editor Responsable:  Juan Luis Díaz de León Santiago, DISJ 690604

     Research on Computing Science  is published by the Centre for Computing Research of IPN. Volume 12 , February, 2005. Printing 500. The authors are responsible for the contents of their articles. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by anymeans, electronic, mechanical, photocopying, recording or otherwise, without prior permission of Centre forComputing Research. Printed in Mexico City, February, 2005, in the IPN Graphic Workshop – PublicationOffice.

    Volume 12Volumen 12 

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    3/465

    Universal Networking Language:Advances in Theory and Applications

    Volume Editors:Editores del Volumen 

     Jesús Cardeñosa Alexander Gelbukh Edmundo Tovar

    Instituto Politécnico Nacional

    Centro de Investigación en Computación

    México 2005 

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    4/465

    ISBN: 970-36-0226-6ISSN: 1665-9899

    Copyright © Instituto Politécnico Nacional 2005

    Copyright © by Instituto Politécnico Nacional

    Instituto Politécnico Nacional (IPN)Centro de Investigación en Computación (CIC)

    Av. Juan de Dios Bátiz s/n esq. M. Othón de Mendizábal

    Unidad Profesional “Adolfo López Mateos”, Zacatenco

    07738, México D.F., México

    http://www.ipn.mx

    http://www.cic.ipn.mx

    Printing: 500Impresiones 

    Printed in MexicoImpreso en México 

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    5/465

    Preface

    This volume1  is an attempt to compile and illustrate all the open lines of research

    within the UNL initiative. The included papers constitute a selection of the most sig-

    nificant papers presented in several international conferences and workshops during

    the last four years that served as a meeting point for the UNL consortium. In general,

    papers are not restricted to UNL although they are clearly predominant; they clearlyillustrate the wideness and flexibility of this UNL initiative, launched by the United

    Nations aiming at the elimination of linguistic barriers.

    Since the starting of the UNL project in 1996, the participants in the project from

    initially 15 languages have made substantial progress in technical matters and the or-

    ganizational aspects involved as well. This book attempts to provide a survey on theapproaches and theoretical studies around UNL, since research on UNL is not only

    devoted to studies on interlinguas, MT or any NLP related issues, the intrinsic proper-

    ties of UNL make it a firm candidate to support a wide variety of applications ranging

    from e-learning platforms to management of multilingual document bases. Such a va-

    riety of applications, their theoretical basis and subsequent methodological inquiries

    are at core of this volume.

    What is UNL? Its motivation and purpose

    The emerging needs and use of Internet for cultural and educational dissemination

    and commercial expansion of the peoples collide with linguistic diversity, which inprinciple diminishes the potential of Internet as a vehicle of knowledge for everybody.

    Aware of this problem, the Institute of Advanced Studies of the University of the

    United Nations University (UNU/IAS) launched the UNL project in 1996 with the

    initial participation of 15 languages (German, Arab, Chinese, Spanish, French, Hindi,

    Indonesian, English, Italian, Japanese, Latvian, Mongol, Portuguese, Russian, Thai).

    In short, the UNL Programme was initially conceived to support multilingual services

    in Internet being an alternative to classical machine translation systems.

    The UNL system revolves around a unique artificial language (Universal Network-

    ing Language) that pretends to capture the meaning of written documents. This lan-

    guage is based on the representation of concepts and its relations. The definition of

    this language has been possible thanks to the collaboration of more than one hundred

    people, prestigious researchers, and scientists of all around the world, that worked

    during the first three years of the project to produce a final version of the UNL speci-fications2.

    1  Earlier versions of the papers at pages 10, 109, 117, 125, 145, 215, 230, 254, 268, 276, 309,

    347, 359, 370, 380 have been published in the Proceedings of Convergences’03, Alexandria,

    Egypt. Earlier versions of the papers at pages 3, 10, 27, 38, 101, 261, 326 have been pub-

    lished in the Proceedings of LREC-2002.2  UNL Specifications, v.3.1 available at

    http://www.undl.org/unlsys/unl/UNL%20Specifications.htm

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    6/465

    The UNL organization

    The UNL initiative has often been regarded as “hidden organization”. The first years

    of the project (1996-2000) were devoted to the definition of the interlingua and to the

    development of the essential components required to undertake the basic process in

    UNL (mainly dictionaries and language generators). During this period, the organiza-

    tion was closed and limited to a number of participants, because of the need to define

    the specifications of the language.

    By the end of this period, the UNL project reached a significant degree of quality

    in the development of components, linguistic resources and technical specifications;

    and the specifications were finally produced. Once the specifications were finished,

    they were made public and accessible to all the international community, so that col-

    laboration and participation in this initiative is completely open. As a consequence of this degree of development, the Board of the United Nations

    University, in its fifth meeting in 2000, agreed on the creation of a new institution re-

    sponsible for the organization and promotion of the UNL in the future under the um-

    brellas of the United Nations. This new entity was the UNDL Foundation, with head-

    quarters in Geneva3. The development of the components of different languages was

    assigned to the so-called Language Centres, constituted by the initial teams in each

    country in charge of the development of the essential components of UNL.

    The year 2004 represents a turning point in the evolution of UNL for two main rea-

    sons. First, it is the year where a new period coordinated and fostered by the Lan-

    guage Centres starts for the debugging, updating and expansion of linguistic resources

    and developed components of their representative languages, in order to respond to

    the institutional and marketable challenges at a pre-competitive level in the support of

    multilingual services. Second, it is the year where the UNL patent has been approvedin USA for the UN (US Patent No. 6704700 B1, March 2004). It has been the first

    software patent of the United Nations.

    Open nature and scientific dissemination of UNL

    Since 2002, an open annual conference around the convergence of language, culture

    and knowledge is being held as a meeting point for researchers, politicians, linguists

    and engineers. The most recent edition of this open conference was Convergences’03,

    held in Alexandria, Egypt. The most significant papers from this conference have se-

    lected and included in this volume. Additionally, an international workshop on UNL

    and Interlinguas was organized in 2002 (International Workshop on UNL, other Inter-linguas and their Applications, held at Las Palmas de Gran Canaria, May, 2002), pa-

    pers from this workshop are also compiled in this volume. Finally, we include the pa-

    pers of the current edition of the UNL Workshop, held in Mexico D.F, February,

    2005.

    These conferences and workshops try to be a forum where all the interested people

    in this initiative find a vehicle for communication and exchange of knowledge. The

    3  www.undl.org

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    7/465

    UNL is a great initiative that could never succeed and advance if the number of par-

    ticipants is limited to the initial ones. The heterogeneity of the authors and languages

    involved in this selection of papers shows the open nature of UNL.

    Research on UNL: Current Trends

    Apart from the mere applied studies of UNL, there is a current important trend ontheoretical studies of UNL, even though there is a final version of the specifications of

    the language, dating to July 2003.

    The rationale for such theoretical research is the need for standardization and ho-

    mogenization on the use of the Interlingua both at the applied level and at the theo-

    retical level. The UNL Specifications turned out to be subject to different personal in-terpretations, thus creating own UNL dialects. This is not desirable for an interlingua,

    that claims to be language independent and that, in fact, turned out to be “person-

    dependent”. For this reason, it is important and desirable to foment theoretical studies

    on UNL, both from the linguistic point of view and the knowledge point of view.

    From a scientific point of view, UNL follows the approach of the concept of Inter-

    lingua, as an “artificial” language aiming at the neutral representation of linguistic

    meaning. In this sense its roots can be sought in the tradition of MT interlinguas andin the tradition of Knowledge Representation formalisms.

    When viewed as an interlingua, UNL differs from some of its predecessors and

    current Interlinguas in the generality of appliance, that is, UNL is not restricted to a

    number of languages or to a given domain. Thus, its design pretended to show the

    highest degree of language independence while retaining natural language expres-

    siveness in order to support multilingual generation tasks.Of course, the staging of UNL is such a general enterprise that requires research

    and efforts. This process can be divided into several periods:

     –   Creation of deconversion and enconversion modules, (see Part 3) that is, devel-opment of the basic tools to undertake the basic architecture of the UNL system

    (enconversion and generation), along with dictionaries. Although basic, it is con-

    ditio sine qua non  to have powerful generation systems. This a fruitful trend in

    the UNL consortium, with three different approaches:

    1.  The official one: those using a common engine provided by the UNL Center.

    2.  The integrative ones: those that have integrated UNL into pre-existing MT

    systems, following the transfer-based architecture, showing the flexilibility

    of UNL with good results.

    3. 

    The new ones: those that have noticed the drawbacks of the official compo-

    nents, and have decided to create new architectures for generation

    It should be noticed that emphasis is put on the deconversion process, quantita-

    tively proven by the number of papers devoted to generation. Teams usually de-

    velop generation systems, not so much enconversion systems, although the integra-

    tive usually includes both processes in UNL.

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    8/465

     –  

    Application of UNL in other contexts (see Part 3). Should UNL be considered as

    an interlingua, it can be applied in fields and tasks other than multilingual genera-

    tion, being the main one Knowledge representation and Knowledge Management.

     –   Use of external lexical and ontological resources. It is important as well, and fol-lowing the spirit of the integrative approaches, the use of external lexical re-

    sources such as Wordnet to enhance some of the processes of UNL, especially in

    the lexicographic part (see Part 3, also). This is also a trend and the philosophy of

    UNL: integration and complementation of resources is encouraged, rather than

    confrontation. And this is the spirit of the consortium and of every work in UNL. 

    From an engineering point of view, research is taken on:

     –  

    Creation of methodologies in the workflow. –  

    Standardization of UNL, integration of UNL into current standards.

    Why such studies methodologies and standards? Because of the heterogeneity and

    diversity of the current consortium, it is needed such a process of standardization and

    methodologies, since the short and medium term objective of UNL is its staging in the

    market, where standards and methodologies are required in order to pursue higher

    productivity and quality. The areas of linguistic engineering together with knowledge

    engineering are claiming for such methodologies and processes of standardization. 

    The Future

    After some time developing components and systems to support the multilingual ser-vices, UNL researchers and new teams have discovered that the UNL could be sup-

    port of other applications as crosslingual information retrieval, knowledge reposito-ries, automatic building of ontologies from texts once repressented in UNL and much

    more. UNL could be useful in new possible applications in areas where a common

    conceptual representation is needed, independent of any particular language. For do-

    ing it, new necessities emerge; particularly when putting together semantics and mul-

    tilingualism. More theoretical studies are needed, along with the tuning up of re-

    sources and tools, the proper standardization of the interlingua and processes for

    enconverting and deconverting, and of course the integration and definition of the

    lexical component of UNL.

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    9/465

    The Structure of the Book

    The volume is divided into four parts.

    Part 1. Introduction

    This fist part is an introduction to the language itself, and its purpose is to set up the

    reader in the UNL context. These introductory papers posit the general philosophy of

    the language (paper at page 3) and provide a general introduction to the language it-

    self and to the context of multilingual generation, one of the main and most basic

    “applications” supported by UNL (paper at page 10).

    Part 2. Fundamentals

    This part is dedicated to theoretical studies on UNL. As already said, UNL is mainly

    an interlingua. There are many aspects that have to be taken into account when de-

    signing an interlingua, such as its expressiveness, degree of language-independency,

    accuracy and formality of the language, etc. Most of these issues are covered in this

    part. Thus, the part opens up with an experiment on the common understandability of

    UNL by different humans and the admissible degree of indeterminacy and ambiguity

    in an Interlingua (paper at page 27). Pure theoretical studies on the universality of

    UNL and its adequacy from a representational and linguistic point of view follow

    (papers at pages 51 to 101). It has to be pointed out that this part is not exclusively

    devoted to UNL, but to the field of interlinguas in general (paper at page 38; paper atpage 109).

    All these papers point at the proper designs of the Interlingua. However, there is

    another important aspect worth of consideration in any artificial language, namely, the

    syntactic formalism of the formal language and its adequacy to the declared purpose.

    These topics are addressed in papers at page 117 and at page 125, where the emphasis

    is put on the syntactic properties of UNL expressions and its consequences to other is-sues such as analysis or proper deconversion. Finally, there is a (recurrent) thematic

    shift; UNL is not viewed as an interlingua to support linguistic tasks, but as a lan-

    guage for knowledge representation (papers at page 138 and at page 145).

    These two sides of UNL (an interlingua to support linguistic tasks and a as knowl-

    edge representation language) determine the nature of the applications dealt with in

    Part 3.

    Part 3. Applications

    The core applications of UNL are those that support the tasks of NL analysis and gen-

    eration (enconversion and deconversion in the UNL jargon). When dealing with NLP

    tasks, the scene is quite heterogeneous: from the use of common generation tools pro-

    vided by the UNL Center (as shown in papers at pages 215 and 241), to the integra-

    tion of existing MT translation systems based on the transfer architecture to support

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    10/465

    an Interlingua architecture (papers at pages 157 and 230). Other languages are sup-

    ported with new tools, but differs in their configuration and architecture (maybe re-

    flecting language variety, maybe reflecting different ways to support generation and

    of course, as an advanced over common tools, like Deco). Chinese, Brazilian Portu-

    guese, Arabic or Armenia are example of this, where very different paradigms are il-

    lustrated in order to undertake the generation task (papers at pages 167, 175, 195, and

    210, respectively).

    Papers at pages 254 to 276 illustrate the development of workbenches to support

    the processes of edition, generation and training and with the creation of multilingual

    platforms within the UNL framework.

    In parallel with the theoretical studies of Part 2, UNL also presents and applied di-

    mension when conceived as a language for knowledge representation (papers at pages

    337 and 359). These papers present the use of UNL as an extension (or complementa-

    tion) to the expressiveness of standard languages such as XML (illustrated in papers

    at pages 300 and 309), as the communication language among agents, developed in

    paper at page 326, or as the support of case-based reasoning systems (paper at page

    347). It is also remarkable the possibility of complementation and integration with

    other lexical and ontological resources such as WordNet (papers at pages 370 and

    380) to the enhancement of the processes of knowledge acquisition and representation

    within the UNL context. Finally, paper at page 286 shows how to extend the expres-

    sivity of UNL in order to represent and formalize meaning coming for oral sources.

    Part 4. Methodologies

    Finally, the volume ends up with the methodological work. Methodologies target at

    the creation of methodologies to support multilingual services (papers at pages 395

    and 413) and for the optimization of knowledge intensive tasks (paper at page 430).

    Needless to say, methodologies conforms an integral part of the UNL R+D activities,

    as long as productivity, quality and a real consolidation of UNL are pursued both at

    the scientific and commercial levels.

    Mexico D.F, 16th February 2005

    Jesús Cardeñosa

    Alexander Gelbukh

    Edmundo TovarEditors

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    11/465

    Table of ContentsÍndice 

    INTRODUCTION: Setting up UNL

    A Rationale for Using UNL as an Interlingua and More in Various Domains ............3

    Christian Boitet

    Standardization of the Generation Process in a Multilingual Environment ............... 10 Jesús Cardeñosa, Carolina Gallardo and Edmundo Tovar

    FOUNDATIONS

    The UNL Distinctive Features: Inferences from a NL-UNL Enconverting Task.......27 Ronaldo Teixeira Martins, Lúcia Helena Machado Rino, Maria dasGraças Volpe Nunes, Osvaldo Novais Oliveira Jr.

    Issues in Generating Text from Interlingua Representations......................................38Stephan Busemann

    On the Aboutness of UNL................................................................... ....................... 51 Ronaldo Teixeira Martins, Maria das Graças Volpe Nunes

    A Comparative Evaluation of unl Participant Relations using a Five-LanguageParallel Corpus .............................................................. ............................................. 64

     Brian Murphy and Carl Vogel

    Some Controversial Issues of UNL: Linguistic Aspects.............................................77

     Igor Boguslavsky

    Some Lexical Issues of UNL....................................................................................101 Igor Boguslavsky

    The Representation of Complex Telic Predicates in Wordnets: the Case ofLexical-Conceptual Structure Deficitary Verbs .......................................................109

     Palmira Marrafa

    Remaining Issues that Could Prevent UNL to be Accepted as a Standard ............... 117Gilles Sérasset and Étienne Blanc

    Semantic Analysis through Ant Algorithms, Conceptual Vectors and Fuzzy

    UNL Graphs ............................................................ ................................................. 125 Mathieu Lafourcade

    Term-Based Ontology Alignment ................................................................... ......... 138Virach Sornlertlamvanich, Canasai Kruengkrai, Shisanu Tongchim, Prapass Srichaivattana, and Hitoshi Isahara

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    12/465

    Universal Networking Language: A Tool for Language IndependentSemantics?................................................................................................................145

     Amitabha Mukerjee, Achla M Raina, Kumar Kapil, Pankaj Goyal, Pushpraj Shukla

    APPLICATIONS

    About and Around the French Enconverter and the French Deconverter.................157 Étienne Blanc

    A UNL Deconverter for Chinese..............................................................................167 Xiaodong Shi, Yidong Chen

    Flexibility, Configurability and Optimality in UNL Deconversion viaMultiparadigm Programming .................................................... ............................... 175

     Jorge Marques Pelizzoni, Maria das Graças Volpe Nunes

    Arabic Generation in the Framework of the Universal Networking Language ........ 195 Daoud Maher Daoud

    Development of the User Interface Tools for Creation of National LanguageModules ............................................................. ....................................................... 210

    Tigran Grigoryan, Vahan Avetisyan

    Universal Networking Language Based Analysis and Generation for Bengali

    Case Structure Constructs.........................................................................................215 Kuntal Dey, Pushpak Bhattacharyya

    Interactive Enconversion by Means of the Etap-3 System ....................................... 230 Igor M. Boguslavsky, Leonid L. Iomdin and Victor G. Sizov

    Prepositional Phrase Attachment and Interlingua.....................................................241 Rajat Kumar Mohanty, Ashish Francis Almeida, and Pushpak Bhattacharyya

    Hermeto: A NL–UNL Enconverting Environment...................................................254 Ronaldo Martins, Ricardo Hasegawa and M. Graças V. Nunes

    A Platform for Experimenting UNL (Universal Networking Language) ................. 261Wang-Ju TSAI

    A Framework for the Development of Universal Networking Language E-Learning User Interfaces...........................................................................................268

     Alejandro Martins, Gabriela Tissiani and Ricardo Miranda BarciaA WEB Platform Using UNL: CELTA’s Showcase ............................................... 276 Lumar Bértoli Jr., Rodolfo Pinto da Luz and Rogério Cid Bastos

    Studies of Emotional Expressions in Oral Dialogues towards an Extension ofUniversal Networking Language .............................................................................286

     Mutsuko Tomokiyo, Gérard Chollet, Solange Hollard

    An XML-UNL Model for Knowledge-Based Annotation........................................300 Jesús Cardeñosa, Carolina Gallardo and Luis Iraola

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    13/465

    A Pivot XML-Based Architecture for Multilingual, Multiversion Documents:Parallel Monolingual Documents Aligned Through a Central CorrespondenceDescriptor and Possible Use of UNL........................................................................309

     Najeh Hajlaoui, Christian Boitet

    UCL - Universal Communication Language ............................................................ 326Carlos A. Estombelo-Montesco and Dilvan A. Moreira

    Knowledge Engineering Suite: A Tool to Create Ontologies for an AutomaticKnowledge Representation in Intelligent Systems ................................................... 336

    Tânia C. D. Bueno, Hugo C. Hoeschl, Andre Bortolon, Eduardo S. Mattos, Cristina Santos, Ricardo M. Barcia

    Using Semantic Information to Improve Case Retrieval in Case-Based

    Reasoning Systems...................................................... ............................................. 346 J. Akshay Iyer and Pushpak Bhattacharyya

    Facilitating Communication Between Languages and Cultures: aComputerized Interface and Knowledge Base..........................................................358

    Claire-Lise Mottaz Jiang, Gabriela Tissiani, Gilles Falquet, Rodolfo Pinto da Luz

    Using WordNet for linking UWs to the UNL UW System ...................................... 369 Luis Iraola

    Automatic Generation of Multilingual Lexicon by Using Wordnet ......................... 379 Nitin Verma and Pushpak Bhattacharyya

    METHODOLOGIESGradable Quality Translations Through Mutualization of Human Translationand Revision, and UNL-Based MT and Coedition .................................................. 393

    Christian Boitet

    Towards a systematic process in the use of UNL to support multilingualservices ................................................................ ..................................................... 411

     Jesús Cardeñosa, Carolina Gallardo, Edmundo Tovar

    Knowledge Representation Issues and Implementation of Lexical Data Bases ....... 428 F. Sáenz and A. Vaquero

    Author Index ............................................................................................................443

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    14/465

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    15/465

    Prologue

    UNL is an ongoing worldwide initiative starting in 1996. Almost 10 years have

    passed a big span of time for a project. We could say that UNL didn’t meet its expec-

    tations. But let’s have a closer look to UNL, the project, its basics and objectives. A

    closer look at its objective will reveal that this affirmation is gratuitous and unmoti-

    vated.

    The Problem: Linguistic Diversity

    UNL was launched by IAS/UNU to erase linguistic barriers. Linguistic barriers col-lide with the enhancement of linguistic diversity and the value that native languages

    as one of the main vehicles to express one’s cultural identity. Apart from socio-

    cultural issues, linguistic diversity also knows an economic and political dimension.

    Institutions like the United Nations or the European Union have to face everyday with

    the barriers that linguistic diversity imposes. It is well known the enormous amount of

    documentation that these institutions produce everyday, which have to be produced inall their official languages: 6 for the UN, 25 for the European Union. It is simply un-

    feasible to rely on human translators for the production of all these amount of docu-

    mentation.

    Aware of this, the IAS/UNU launched the UNL project, aiming at the real access

    of information in the own native language and not recurrent to dominant languages.

    UNL is basically an artificial language where contents expressed in natural languages

    can be converted to and subsequently, contents written in UNL can be generated intoany natural language, provided that the adequate tools are built.

    MT and Multilinguality

    From the technological point of view, multilinguality has been tackled by Machine

    Translation. In the evolution of the area of MT, there is variety of architectures to un-

    dertake the task of translating the contents of one text written in a given language into

    another language. Transfer-based systems could be regarded as the most productive

    and of better quality. But they are hindered by the exponential growth in the modules

    to be developed when the number of involved languages increases. A transfer-based

    system involving N languages need to develop N*(N-1) modules. An astronomic

    number to create real multilingual platforms.Further, although there are some very good systems, the quality of these systems

    seem to be limited, since after years of refinement, the MT system does not surpass a

    given degree of quality. Besides, the development of transfer based MT systems is

    usually reduced to the so-called majority languages (English, French, German and

    even Spanish or Italian), but it is fairly rare to find a good quality and wide coverage

    MT system covering English and Polish, let’s say.

    Transfer based MT is not the only option, Interlingua-based systems represents an

    alternative to transfer systems. Interlingua-based MT does not work on pair of lan-

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    16/465

    guages, but translation is carries out to and from an artificial language that serves as a

    pivot for all the natural languages involved in the system. This architecture tries to

    overcome the exponential growth of transfer-based systems, since the number of

    modules to develop for N languages is 2*N and the inclusion of new languages into

    the system does not affect the other language modules. In this way, UNL follows the

    architecture of Interlingua-based MT systems.

    Usually, Interlinguas are abstract formal (or semi formal) languages that captures

    the meaning of texts in a language independent way. Ideally, the Interlingua should

    not be close to a given particular language and should not include linguistic devices

    proper of natural languages. In this way, Interlingua-based systems seem the most

    plausible (and even the unique) option to tackle massive multilinguality.

    But Interlinguas has been often rejected within the scientific community and since

    their boom in the 80ies, there have no commercial application of Interlinguas and the

    systems developed under this trend were laboratory products. Why is this so? Let’s

    have a look at the properties of interlinguas.

    Problems with Interlinguas

    Interlinguas are semantic languages designed to represent the meaning of any given

    text, ideally satisfying the following conditions:

    (a)  They are language neutral.

    (b) 

    They are precise, unambiguous, formal languages

    Being so, they usually show the following characteristics:

    − Interlinguas are intimately tied up with ideas about the representation of meaning,being meaning the most abstract and deepest level of linguistic analysis (that

    should be common to all languages, far enough from surface representation of lan-

    guages).

    −  An Interlingua is “another language” in the sense that it has autonomy and thus its

    components need to be defined: vocabulary and “relations” mainly. Besides, and

    Interlingua is an artificial language that should be as expressive as natural lan-

    guages.

    Here we find the main bottleneck of interlinguas: its proper design and definition.

    Defining an Interlingua involves the following parameters:

    (a) 

    A language whose “atoms” are not dependent on any given natural language

    so that the ambiguity of natural languages is eliminated.

    (b) 

    A language whose “atoms” are not dependent on a given natural language so

    that the concepts and ideas expressed in different natural languages can be

    easily and naturally expressed in the Interlingua.

    (c) 

    A language that is as expressive as a natural language so that what can be ex-

    pressed in natural languages can be transposed to the Interlingua, and from

    the interlingua to other natural languages.

    These three conditions make interlinguas hard to design. It is quite difficult to find

    the equilibrium between language independency, degree of abstraction and expres-

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    17/465

     

    siveness in a formal device such an Interlingua. Maybe this difficulty in the design of

    interlinguas is the reason why they have not been successful at least in open domains

    within massively multilingual environments. The examples of interlingua-based sys-

    tems are domain dependent and quite limited in the number of languages.

    Is UNL a Viable Solution?

    The panorama appears quite despairing. While Interlinguas are theoretically biased

    and difficult to put into practice, transfer based systems have proved to be unattain-

    able when dealing with massive multilinguality. Maybe the concept of Interlingua

    should be revisited, and re-adapted to real necessities and to real scenarios. This is the

    spirit of UNL. UNL, by its definition and by its most basic architecture is definitelyan Interlingua-based system. Its targets are the support of multilinguality, not re-

    stricted to a given domain or to a given family of languages. Thus, the design of a in-

    terlingua like UNL encounters all the possible barriers that an Interlingua may en-

    counter (especially to find a real language independent representation).

    So why we could considered UNL as different, as a new viable technology if inter-

    linguas were rejected a long time ago? First, let’s remember the main objective of

    UNL:

    −  to generate and produce contents in any natural language in any domain.

    −  to support multilingual services.

    That is, there is a primacy of generation and coverage of languages and domains,

    which means that a very expressive formalism has to be designed in order to repre-

    sent such a variety of contents coming from any natural language.Let’s illustrate this fact by have a closer look at the vocabulary of the Interlingua,

    one of the most difficult and polemic issues of UNL and of any Interlingua. UNL util-

    izes the so-called Universal Words as the semantic atoms of the Interlingua (no de-

    composable). They exhibit the following main characteristic:

    They are based on English headwords.

    From this very simple definition, we can conclude that UNL is language biased

    (English) and thus:

    1. UNL is based on a natural language:

    2. 

    It hinders logical relations and inferences (facilitated by primitive based solutions)

    3.  Its vocabulary is a potential source of ambiguity

    4.  Its vocabulary fosters lexical and conceptual mismatches among languages.

    So is there any advantage in the UW system and in the overall essence of UNL?

    Well, if theoretical reasons do not support the design of open-domain interlinguas,

    let’s look at the practical or pragmatic ones.

    (a) 

    UNL is based on a natural language. At first sight could be a drawback,

    however, the expressiveness of a natural language is inherited by the Inter-

    lingua, thus allowing for the representation of a variety of domains and con-

    cepts.

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    18/465

    (b) 

    UNL shows an English oriented vocabulary. At this moment, English is the

    lingua franca, the most accessible to work with for Indo-Europeans, Semitic,

    Japanese, Chinese, etc. Bilingual dictionaries usually have English as one of

    their target/source languages, thus the development of lexicographic re-

    sources is facilitated by choosing English as the most basic atoms of the lan-

    guage.

    Of course, this approach (although supported by pragmatism) is far from perfect.

    Even at first sight, it can be considered as naïve, since it merely “suggest” well known

    problems in lexical semantics (like support verbs, compounds expressions, connota-

    tional meaning, etc). For this reason, theoretical research on the UNL as a language it-

    self should be fostered within the Consortium, while respecting the basic nature of the

    language.

    That is, UNL should be viewed rather than a perfect Interlingua as the pillars tosupport multilingual services. Its natural language orientation (apparently, its weak-

    est points as an Interlingua) turns the language as a candidate to the support of multi-

    linguality and facilitates converting contents to and from UNL. There are several as-

    pects that support it. First, the creation of generators of medium quality (where post-

    edition is possible) is rather straightforward. Second, its flexibility and language ori-

    entation makes it possible to integrate UNL into other pre-existent MT systems (be it

    transfer-based be it another architecture) which extends the range of application of

    UNL and makes possible to alleviate the problem of exponential growth in transfer-

    based systems. And last, but not least, the processes of enconverting and deconverting

    are independent so that if generation is taken as a priority, generators are constructed

    first; the process of enconversion can be done manually, due to the human readability

    of the language.

    At this point in the evolution of UNL, there appears a contradiction, UNL is stillnot theoretically mature, but from an applied perspective, it is. In the short term there

    is priority for the UNL Consortium to get feedback from previous experiences in In-

    terlinguas, from Linguistic Theory (semantics, logic, and lexical semantics) in order

    for UNL to grow and find a place in the scientific community and, why not, in the

    market as a real approach to support multilinguality, once the applications and utilities

    are clear and defined within the UNL Programme.

    Prospective

    So is it worth another attempt? Definitely yes, the real need to overcome linguistic

    barriers (be it at the institutional level, be it at the social level) claims for a solution to

    the problem of multilinguality. Transfer based systems simply are out of question ifisolated . This doesn’t mean that they are useless: they are not. An interlingua like

    UNL is conceived as another autonomous languages, close enough to the superficial

    form of natural languages, thus integration of the Interlingua into the transfer system

    is possible and not a contradiction in terminis.

    After several years of experience, we know that knowledge and language genera-

    tion do not go on a par . Thus the final design have to be done bearing the ultimate

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    19/465

     

    purpose of the interlingua (the closer to language semantics is, the better to generate

    languages) and probably will lead to the success of the interlingua.

    A Final Word

    I would like to thank the editors of this book for their invitation to write a prologue tothis work and to collaborate with them in the selection and revision of the selected

    papers presented in this volume. Hopefully it will provide a thorough understanding

    of the UNL Programme, its meaning, its evolution, its shortages and its strengths.

    Carolina Gallardo

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    20/465

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    21/465

     

    INTRODUCTION: SETTING UP UNL

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    22/465

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    23/465

     

    A Rationale for Using UNL as an Interlingua and More

    in Various Domains

    Christian Boitet

    GETA, CLIPS, IMAG385, Av. de la Bibliothèque, BP 53F-38041 Grenoble cedex 9, [email protected]

    Abstract. The UNL language of semantic graphs may be called as a "seman-tico-linguistic" interlingua. As a successor of the technically and commerciallysuccessful ATLAS-II and PIVOT interlinguas, its potential to support variouskinds of text MT is certain, even if some improvements would be welcome, asalways. It is also a strong candidate to be used in spoken dialogue translationsystems when the utterances to be handled are not only task-oriented and oflimited variety, but become more free and truly spontaneous. Finally, althoughit is not a true representation language such as KRL and its frame-based andlogic-based successors, and although its associated "knowledge base" is not atrue ontology, but rather a kind of immense thesaurus of (interlingual) sets ofword senses, it seems particularly well suited to the processing of multilingualinformation in natural language (information retrieval, abstracting, gisting,etc.).The UNL  format  of multilingual documents aligned at the level of utter-ances is currenly embedded in html (call it UNL-html), and used by various

    tools such as the UNL viewer. By using a simple transformation, one obtainsthe UNL-xml format, and profit from all tools currently developed aroundXML. In this context, UNL may find another application in the localization ofmultilingual textual resources of software packages (messages, menu items,help files, and examples of use in multilingual dictionaries.)

    1 Introduction

    UNL is the name of a project, of a meaning representation language, and of a formatfor "perfectly aligned" multilingual documents. There is some hefty controversyabout the use of the UNL language as an "interlingua", be it for translation or forother applications such as cross-lingual information retrieval. On the other hand,

    there is almost no discussion on the UNL format, in its current form, embedded inHTML, or some directly derivable form, embedded in XML.We argue that the UNL language is indeed a good interlingua for automated trans-

    lation, ranging from fully automatic MT to interactive MT of several kinds through,we believe, spoken translation of non task-oriented dialogues. It is also more thanthat, due to the associated "knowledge base", and has a great potential in textual in-formation processing applications.

    © J. Cardeñosa, A. Gelbukh, E. Tovar (Eds.)Universal Network Language: Advances in Theory and Applications.Research on Computing Science 12, 2005, pp. 3–9.

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    24/465

    4 Christian Boitet

    We will first give our view of what the UNL language is, and then develop a "ra-tionale" for using the UNL language UNL along the previous lines. We will then de-scribe some interesting potential uses of the UNL format in an "XML-ized" form.

    2 The UNL language

    The UNL representation is made of "semantic graphs" where a graph expresses themeaning of some natural language utterance. Nodes contain lexical units and attribu-tes, arcs bear semantic relations. Connex subgraphs may be defined as "scopes", sothat a UNL graph may be a hypergraph. Figure 1 illustrates a UNL graph.

    agt

    insplt

    objmod

    Ronaldo head(pof>body)

    corner

    left

    goal(icl>thing)

    score(icl>event,agt>human,fld>sport).@entry.@past.@complete

    obj

    pos

     Fig. 1. A possible UNL graph for “Ronaldo has headed the ball into the left corner of the goal”

    The lexical units, called Universal Words (in French, not "mot universel" but bet-ter "Unité de Vocabulaire Virtuel" or UVV or UW), represent word meanings, some-thing less ambitious than concepts. Their denotations are built to be intuitively under-stood by developers knowing English, that is, by all developers in NLP. A UW is anEnglish term or pseudo-term possibly completed by semantic restrictions.

    A UW such as "process" represents all word meanings of that lemma, seen as cita-tion form (verb or noun here). The UW "process(icl>do, agt>person)" covers the ver-bal meanings of processing, working on, etc.

    The attributes are the (semantic) number, genre, time, aspect, modality, etc.The 40 or so semantic relations are traditional "deep cases" such as agent, (deep)

    object, location, goal, time, etc.One way of looking at a UNL graph corresponding to an utterance U-L in lan-

    guage L is to say that it represents the abstract structure of an equivalent English ut-terance U-E as "seen from L", meaning that semantic attributes not necessarily ex-pressed in L may be absent (e.g., aspect coming from French, determination ornumber coming from Japanese, etc.).

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    25/465

    A Rationale for Using UNL as an Interlingua and More in Various Domains  5

    3 Some arguments for using the UNL language in various contexts

    To show that using UNL is not only a workable but a good or perhaps the best idea atthe moment, we can say that

    −  the "pivot" technique HAS BEEN not only experimented but deployed success-fully (ATLAS, PIVOT, ULTRA, KANT).

    −  in particular, ATLAS-II (Fujitsu) is built on the basis of a pivot from which theUNL representation has evolved. The main designer of UNL, H. Uchida, was alsothe main designer of ATLAS-II.

    −  ATLAS-II has been recognized as the best EJ/JE MT system in Japan for over 10

    years and has a very large coverage (586,000 words in English and Japanese).−  interlingual representations can not in principle be used (alone) to achieve the

    highest quality achievable by transfer systems, BUT they can give quite high qual-ity as demonstrated by ATLAS-II.

    −  due to the precise nature of UNL, it is possible for human non-specialists to im-prove a UNL representation interactively, a posteriori, from any UNL-related lan-guage, and on demand (meaning partially — think of "lazy improvement").

    −  in many contexts other than translation, an interlingual, semantic-oriented repre-sentation like UNL is actually the best solution. For example, all applications re-lated to information processing in multilingual contexts don't need a very preciserepresentation of the FORM of the information, they need a precise ENOUGHrepresentation of the INFORMATION CONTENT of the information.

    −  applications such as information retrieval and abstracting have already been proto-typed successfully with UNL. It is far easier to generate SQL or SQL-like queriesand answers from a UNL form than from text in many languages.

    4 Applications of the UNL format

    The UNL format  of multilingual documents aligned at the level of utterances is cur-renly embedded in html (call it UNL-html). A sentence is represented between the [S]and [/S] tags. Its original text is contained between {org:el} (English, here) and{/org}, its UNL graph between {unl} and {/unl}, each French version between {fr}and {/fr}, and analogously for other languages. Atrtibutes such as version, date, loca-tion, author, etc. may appear in the tags. Here is a slightly simplified example of a filein UNL-html format.

    Example 1 El/UNL

    [D:dn=Mar Example 1, on= UNL French,[email protected]]

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    26/465

    6 Christian Boitet

    [P][S:1]{org:el}I ran in the park yesterday.{/org}{unl}agt(run(icl>do).@entry.@past,i(icl>person)) plc(run(icl>do).@entry.@past,park(icl>place).@def)tim(run(icl>do).@entry.@past,yesterday){/unl}{cn dtime=20020130-2030, deco=man}

    我昨天在公園裡跑步{/cn}

    {de dtime=20020130-2035, deco=man}Ich lief gestern im Park. {/de}

    {es dtime=20020130-2031, deco=UNL-SP}Yo corri ayer en el parque.{/es}{fr dtime=20020131-0805, deco=UNL-FR}J’ai couru dans le parc hier. {/fr}[/S][S:2]{org:el}My dog barked at me.{/org}{unl}agt(bark(icl>do).@entry.@past,dog(icl>animal))gol(bark(icl>do).@entry.@past,i(icl>person)) pos(dog(icl>animal),i(icl>person)){/unl}{de dtime=20020130-2036, deco=man}Mein Hund bellte zu mir.{/de}

    {fr dtime=20020131-0806, deco=UNL-FR}Mon chien aboya pour moi.[/S] [/P][/D]

    The French versions have been produced automatically while the German and Chi-nese versions have been translated manually.

    The output of the UNL viewer for French is:

    Example 1 El/UNL

    J’ai couru dans le parc hier.Mon chien aboya pour moi.

    and will probably be displayed by a browser as:

    Example 1 El/UNLJ’ai couru dans le parc hier. Mon chien aboya pourmoi.

    and similarly for all other languages.

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    27/465

    A Rationale for Using UNL as an Interlingua and More in Various Domains  7

    The UNL viewer produces on demand as many html files as languages selectedand sends them to any available browser.

    The UNL-html format predates XML, hence the special tags like [S] and {unl},but it is easy to derive from it an XML format and to transform the documents into anequivalent "UNL-xml" format. Then, using DOM and javaScript, it is possible to pro-duce various views, including that of a classical viewer, a bilingual or multilingualeditable presentation, and a revision interface where not only the text but the UNLgraph and possibly other structures may be directly manipulated.

    Let us take an example from an experiment performed for the "Forum Barcelona2004" on documents in Spanish, Italian, Russian, French and Hindi. Hindi and Rus-sian are not shown. The XML form is simplified (see figure 2).

    agt(retrieve.@entry.@future, city) tim(retrieve.@entry.@future, after) obj (after. Forum) obj(retrieve.@entry.@future, zone.@indef) mod(zone.@indef, coastal) After a Forum, a city will retrieve a coastal zone Una ciudad recuperará una zona de costa después de Forum

    Una cité retrouvera une zone côtière après un forum Città ricuperarà une zone costiera dopo Forum

    Fig. 2. Simplified XML form. Correct sentences are produced by the deconverters from cor-rect and complete UNL graphs. Suppose for the sake of illustration that some UNL graph has

    been produced from a Chinese version, and does not contain definiteness and aspectual infor-mation. All results may be wrong wrt articles, and some wrt aspect.

    The idea of "coedition" is applicable if there is a UNL graph associated with asegment one wants to modify. The goal is to share the revisions across languages, byreflecting them on the UNL graph, e.g.

    •  add ".@def" on the nodes containing "city", "Forum".

    •  replace "retrieve" by "recover" and add ".@complete" on the node containing it.It is not possible in principle to deduce the modification on the graph from a modi-

    fication on the text. For example, replacing "un" ("a") by "le" ("the") does not entailthat the following noun is determined (.@def), because it can also be generic ("ilaime la montagne" = "he likes mountains"). Hence, the technique envisaged is that:

    • 

    revision is not done by modifying directly the text, but by using a menu system,

    •  the menu items have a "language side" and a hidden "UNL side",

    •  when a menu item is chosen, only the graph is transformed, and the action to bedone on the text is stored and shown next to its focus in the "To Do" zone,

    •  at any time, the new graph may be sent to the L0 deconverter and the result shown.If is is satisfactory, that shows that errors were due to the graph and not to the de-converter, and the graph may be sent to deconverters in other languages. Versions

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    28/465

    8 Christian Boitet

    in some other languages known by the user may be displayed, so that improve-ment sharing is visible and encouraging.New versions will be added with appropriate tags and attributes in the original

    multilingual document in UNL-xml format, or in a DBMS, so that nothing is everlost, and cooperative working on a document is feasible. UNL may find another ap-plication in the localization of multilingual textual resources of software packages(messages, menu items, help files, and examples of use in multilingual dictionaries.)

    Apart of the "coedition", there are many other portential applications of UNL, suchas:

    •  crosslingual information retrieval, on which we are currently working,

    •  abstracting & gisting, which has been prototyped at NecTec and in India,

    • 

    localization of software packages: messages in multiple languages could be cre-ated from UNL graphs produced from a graphical interface or by enconversion,and then sent to appropriate deconverters.

    For this last point, we have found how to represent messages including variables(such as integers, file names etc.), but not yet how to handle messages including mor-phological or even lexical variants (as "4 goda / 5 let" for "4 years / 5 years" in Rus-sian).

    5 Conclusion

    The UNL language is an artificial interlingua, embeddable in html or xml formats for

    multilingual document representation and processing. Because of its both abstract andlinguistic nature, the UNL language offers many more interesting potential applica-tions than other types of interlingua such as task and/or domain specific interlingua.

    The history of MT shows that UNL will also be usable in the context of high-quality MT, quality being obtained through typology specialization and/or interactiveimprovement, a priori (interactive disambiguation after all-path robust analysis)and/or a posteriori by coedition of the text in any language and the correspondingUNL graph.

    References

    Blanc É. & Guillaume P. (1997)  Developing MT lingware through Internet : ARIANE and the

    CASH interface. Proc. of Pacific Association for Computational Linguistics 1997 Confer-ence (PACLING'97), Ohme, Japon, 2-5 September 1997, 1/1, pp. 15-22.Blanchon H. (1994) Perspectives of DBMT for monolingual authors on the basis of LIDIA-1,

    an implemented mockup. Proc. of 15th International Conference on Computational Linguis-tics, COLING-94, 5-9 Aug. 1994, 1/2, pp. 115—119.

    Boitet C., Guillaume P. & Quézel-Ambrunaz M. (1982)  ARIANE-78, an integrated environ-ment for automated translation and human revision. Proc. of COLING-82, Prague, July1982, North-Holland, Ling. series 47, pp. 19—27.

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    29/465

    A Rationale for Using UNL as an Interlingua and More in Various Domains  9

    Boitet C. (1994)  Dialogue-Based MT and self-explaining documents as an alternative to MAHT and MT of controlled languages. Proc. of Machine Translation 10 Years On, 11-14Nov. 1994, Cranfield University Press, pp. 22.21—29.

    Boitet C. & Blanchon H. (1994)  Multilingual Dialogue-Based MT for Monolingual Authors:the LIDIA Project and a First Mockup. Machine Translation, Vol. 9, N° 2, pp. 99—132.

    Boitet C. (1997) GETA's MT methodology and its current development towards personal net-working communication and speech translation in the context of the UNL and C-STAR pro- jects. Proc. of PACLING-97, Ohme, 2-5 September 1997, Meisei University, pp. 23-57.

    Boitet C., Réd. (1982) "DSE-1"— Le point sur ARIANE-78 début 1982. Contrat ADI/CAP-Sogeti/Champollion (3 vol.), GETA, Grenoble, janvier 1982, 616 p.

    Brown R. D. (1989) Augmentation. (Machine Translation), Vol., N° 4, pp. 1299-1347.Ducrot J.-M. (1982) TITUS IV . In Information research in Europe. Proc. of the EURIM 5 conf.

    (Versailles), edited by Taylor P. J., London, ASLIB.Kay M. (1973) The MIND system. In Courant Computer Science Symposium 8: Natural Lan-

    guage Processing, edited by Rustin R., New York, Algorithmics Press, Inc., pp. 155-188.Lafourcade M. (2001)  Lexical sorting and lexical transfer by conceptual vectors. Proc. ofMMA'01, 29-31/1/01, SigMatics & NII, Tokyo, 10 p.

    Lafourcade M. & Prince V. (2001) Synonymies et vecteurs conceptuels. Proc., 29-31/1/01,SigMatics & NII, Tokyo, 10 p.

    Maruyama H., Watanabe H. & Ogino S. (1990)  An Interactive Japanese Parser for MachineTranslation. Proc. of COLING-90, 20-25 août 1990, ACL, 2/3, pp. 257-262.

    Melby A. K., Smith M. R. & Peterson J. (1980)  ITS : An Interactive Translation System. Proc.of COLING-80, Tokyo, 30/9-4/10/80, pp. 424—429.

    Moneimne W. (1989) TAO vers l'arabe. Spécification d'une génération standard de l'arabe. Réalisation d'un prototype anglais-arabe à partir d'un analyseur existant . Nouvelle thèse,UJF.

    Nirenburg S. & al. (1989) KBMT-89 Project Report ., Center for Machine Translation, CarnegieMellon University, Pittsburg, April 1989.

    Nyberg E. H. & Mitamura T. (1992) The KANT system: Fast, Accurate, High-Quality Transla-

    tion in Practical Domains. Proc. of COLING-92, 23-28 July 92, ACL, 3/4, pp. 1069—1073.Sérasset G. & Boitet C. (2000) On UNL as the future "html of the linguistic content" & the re-use of existing NLP components in UNL-related applications with the example of a UNL-French deconverter . Proc. of COLING-2000, Saarbrücken, 31/7—3/8/2000, ACL, 7 p.

    Slocum J. (1984)  METAL: the LRC Machine Translation system. In  Machine Translation to-day: the state of the art (Proc. third Lugano Tutorial, 2–7 April 1984) , edited by King M.,Edinburgh University Press (1987).

    Wehrli E. (1992) The IPS System. Proc. of COLING-92, 23-28 July 1992, 3/4, pp. 870-874.

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    30/465

     

    Standardization of the Generation Process

    in a Multilingual Environment

    Jesús Cardeñosa, Carolina Gallardo and Edmundo Tovar 

    Universidad Politécnica de Madrid, 28660 Madrid.{carde,carolina,edmundo}@opera.dia.fi.upm.es

    Abstract.  Natural language generation has received less attention within the

    field of Natural language processing than natural language understanding. One

    possible reason for this could be the lack of standardization of the inputs to

    generation systems. This fact makes the systematic planning of the process of

    developing generation systems to become difficult. The authors propose the use

    of the UNL (Universal Networking Language) as a possible standard for the

    normalization of inputs to generation processes.

    1 Introduction

    In natural language processing (from now on NLP) two areas can be differentiated:

    analysis and generation. However, one has not received the same attention as the

    other from the scientific community, that is why generation can be considered as the

    “poor brother” of the NLP. The reason for this minor development is the different na-ture of the input to the analysis and generation systems. The input to the analysis sys-

    tems is always natural language, whose casuistic and phenomenology are known;

    while in a generation system, the output is always known, but not what it is going to

    generate from [1].

    The input to a generation system varies depending on whether it is monolingual

    generation (dialogue systems) or a multilingual system (mainly machine translation

    systems). In dialogue systems it is difficult to establish appropriate characteristics

    common to all inputs, because “the problem” of generation is usually solved with so-

    lutions ad hoc, depending on the application and the system language. In machine

    translation systems, there are also many differences in the inputs to the generation

    subcomponents, conditioned by the nature of system architecture (transfer, interlin-

    gua, etc.), the kind of grammars being used (declaratives vs. procedural) [2], or the

    number of languages in the system.This difference in the input to the generators makes a systematic planning of their

    development process impossible (main cause of the minor development of generation

    compared to analysis). It is necessary then, that the input to the “generator” can be

    supported with an appropriate model of contents representation, separated from the

    format or language that ensures a standard process for the development of generation

    systems.

    In this article we propose the UNL as a possible standard for the generation inputs.

    To achieve this, in section 2 we will introduce the main generation architectures. Sec-

    © J. Cardeñosa, A. Gelbukh, E. Tovar (Eds.)

    Universal Network Language: Advances in Theory and Applications.

    Research on Computing Science 12, 2005, pp. 10–24.

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    31/465

    Standardization of the Generation Process in a Multilingual Environment 11

    tion 3 will describe in detail the UNL system, its qualities and basic architecture for

    generation. Section 4 will establish the conditions required by any technology in order

    to be considered a standard and which ones are fulfilled by the UNL. The article will

    end with the description of a real massively multilingual system (HEREIN) where

    UNL has been formally studied and proposed as a de-facto standard for generation of

    contents in natural languages.

    2 Generation Architectures

    2.1 Dialogue Systems

    Dialogue systems represent one of the main applications of natural language genera-

    tion. This kind of systems have as their most important target “to present information

    to the users in an easy to understand format” [3] in very specific fields where the user

    generally interacts with the system in the same language. The user asks the system

    specific information; once obtained, the system can show it through an answer in

    natural language. This answer is very frequently obtained (with certain success)

    through the generation of a “built” language from a series of templates that keep a

    predefined relationship with the templates that support the questions [4]; this means

    the generation process takes as input a representation that depends on the way the user

    makes the question. It could be said that there is not a thorough analysis of the text,

    nor an abstract representation of the information that should be given to the user. The

    great dependency of the source language and the domain restrain the construction of

    multilingual dialogue systems and the reuse of these systems in other domains.

    2.2 Machine translation systems

    Machine translation systems (from now on MT) are essentially multilingual because

    their target is the “transformation” of a text written in language A into an equivalent

    text in language B. In this section main architectures of MT systems will be de-

    scribed, because each architecture sets a series of conditions over the appropriate

    characteristics of the inputs to the generation process.

    2.2.1 Transfer systems

    The basic tasks in a transfer system are analysis, transfer and generation. The analysiscomponent produces a syntactic representation (sort of thorough) depending of the

    source language. This syntactic representation is the input to the transfer module

    whose task is to transform that representation into a closer structure to the target lan-

    guage. The output of the transfer module shapes the input to the generation system

    module which finally produces the phrase in the target language. In transfer systems,

    the components, inputs and outputs are strongly oriented to the source and target lan-guages. 

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    32/465

    12 Jesús Cardeñosa, Carolina Gallardo, and Edmundo Tovar

    The main problem of the transfer systems is the almost impossibility to reuse the

    existing resources (transfer modules) and components in order to include new lan-

    guage pairs in the system. In fact, if it were necessary to increase the number of lan-

    guage pairs, a new system would have to be built. Generally, the great orientation of

    the “transfer” systems towards the target language involves great accuracy in the out-

    put, and a considerable difficulty in reusing components to include new languages in

    the system.

    2.2.2 Interlingua based Systems

    Interlingua based systems form the second great systems’ paradigm of machine trans-

    lation. Included into the systems based in interlingua are the “traditional” ATLAS-II[5], PIVOT [6] as much as the knowledge-based  ones such as KANT [7] or Mikro-

    kosmos [8]. Their defining characteristics are:

    •  Unique intermediate representation. The abstract representation, result from the

    analysis, “feeds” directly the generation module. This intermediate representation

    is the component named “interlingua”. 

    •  Elimination of the transfer process. The system carries out two basic tasks:

    analysis and generation.

    The systems based on interlingua are oriented to cover the largest possible number

    of languages, given that the number of components that requires a system based in in-

    terlingua for n languages is 2*n, it is remarkably inferior to n*(n-1) that transfer sys-

    tems require for the same number of languages.The basic architecture of the generation in a interlingua system is shown in the next

    figure.

    Interlingua based systems offer an important advantage over the transfer ones; thearchitecture facilitates the inclusion of new languages and are reusable. However, dur-

    ing the conversion process to the interlingua, it is possible that some significant

    grammar information for the generation may be lost, that is, the interlingua may have

    less information (grammatical, not conceptual) than a syntactic representation. To

    sum up, the systems based on interlingua offer a larger number of languages at the

    expense of lesser precision in the generated texts.

    Fig. 1. Generation in interlingua systems

    Interlingua

    Generator A Generator B Generator C 

    Language A Language BLanguage C

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    33/465

    Standardization of the Generation Process in a Multilingual Environment 13

    2.2.3 Fusion

    Without any doubt, multilingualism is an added value for any generation system. The

    transfer-interlingua dichotomy seems to imply an opposition between precision vs.

    number of languages. To take advantage from every one, some transfer systems have

    “interlingued” their architectures to support a larger number of languages [9] [10].

    The common characteristic in these systems is the existence of a deep syntactic repre-

    sentation that has some amount of independence from the source language. The proc-

    ess to combine the interlingua architecture in a transfer system requires the construc-

    tion of a transfer module between the deep syntactic structure and an interlingua

    representation [11].

    3 The UNL approach

    3.1 The UNL system

    UNL [12] is an artificial language designed to reproduce the content of texts written

    in any natural language. The UNL is provided with specifications that formally define

    the language. A UNL expression is an hyper graph consisting of:

    •  Universal words. They define the vocabulary of the language, i.e., they can be

    considered the lexical items of UNL. To be able to express any concept occurring

    in a natural language, the UNL proposes the use of English words modified by a

    series of semantic restrictions that eliminate the innate ambiguity of the vocabu-

    lary in natural languages. In this way, the language gets an expressive richnessfrom the natural languages but without their ambiguity. Take, for example, the

    English word “construction” meaning “the action of constructing” and the “final

    product”. Thus, the word “construction” will be paired with two different univer-

    sal words: 

    construction1 construction(icl>action) 

    construction2  construction(icl>concrete thing) 

    where “icl” is the abbreviation for “included”. 

    •  Relations. These are a group of 41 relations that define the semantic relations

    among concepts. They include argumentative (agent, object, goal), circumstantial

    (purpose, time, place), logic (conjunction, and disjunction) relations, etc. For ex-

    ample, in a sentence like “The boy eats potatoes in the kitchen”, there is a main

    predicate (“eats”) and three arguments, two of them are instances of argumenta-tive relations (“boy” is the agent  of the predicate “eats”, whereas “potatoes” is

    the object ) and one circumstantial relation (“kitchen” is the place where the ac-

    tion described in the sentence takes place). 

    •  Attributes. They express the semantic information resulting from the morphol-

    ogic flexion and the functional elements of the phrase (auxiliary verbs, articles,

    etc.). They are put together with the universal words to complete their meaning

    when they appear in a specific context. The attributes include information about

    time or aspect of the event, number, polarity, modality, etc. In the previous sen-

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    34/465

    14 Jesús Cardeñosa, Carolina Gallardo, and Edmundo Tovar

    tence, attributes are needed to express plurality in the object (“potatoes”), definite

    reference in both the agent (“boy”) and the place (“kitchen”) and finally and spe-

    cial attribute denoting which UW is the head of the whole expression. (the entry 

    node).

    Formally, a UNL expression has the form of a semantic net, where the nodes

    (universal words) are linked by labeled arcs with the UNL concept relations. The

    graphical representation of the sentence “the boy eats potatoes in the kitchen” in UNL

    is shown in figure 2.

    This sentence is written in UNL in the following manner:

     agt( eat(icl>do).@entry, boy(icl>person).@ def  )

     obj( eat(icl>do).@entry, potato(icl>food).@ pl  )

     plc( eat(icl>do).@entry, kitchen(icl>facilities).@ def  ) 

    3.2 Basic characteristics of UNL

    The UNL system represents a generic framework for the massive generation of multi-

    lingual contents. Its main goal is the contents’ representation of a document, web

    page, data base, etc., in a consensual and normalized structure  that may be trans-

    formed into a text in a natural language. The defining characteristics of the UNL sys-

    tem are: 

    a) It is a system oriented to the generation of multilingual contents. A document writ-

    ten in the UNL has its “own identity” and can be stored in a document data base, etc.

    b) The UNL does not involve the use of specific components or tools. The tools and

    components, as well as the processes that may be defined to accomplish the editionand the generation in the UNL vary from one language to another. The use of the

    UNL only involves the standardization of the input into a generation system [13].

    Fig. 2. Representation of a UNL expression.

    place

    agent

    objectpotato(icl>food)@pl eat(icl>do).@entry

    boy(icl>person).@def  

    kitchen(icl>facilties).@def  

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    35/465

    Standardization of the Generation Process in a Multilingual Environment 15

    In spite of the emphasis given to the language generation in the system, the UNL

    framework includes the editing process of natural language into the UNL, named “en-

    conversion” as well as the generation into natural languages or “deconversion” (see

    figure 3).

    The UNL is an interlingua in essence, that is, an appropriate language for the rep-

    resentation of the meaning in an independent way from the natural languages. The

    UNL is not restricted to a specific domain (as can be the KANT or Mikrokosmos in-

    terlinguas); the fact of not restricting the input in the vocabulary collection of the in-

    terlingua guarantees the UNL adaptation for the representation of contents in any lan-

    guage or domain.

    3.3 Generation in the UNL framework.

    There are several architectures for the generation of natural language from the UNL.Next, the two generation architectures within the UNL framework will be described in

    detail.

    3.3.1 Direct Generation

    The UNDL Foundation (http://www.undl.org) supplies a module that carries out the

    generation process through a unique process. This module is known as DeCo (stand-ing for DeConverter). This module is completely language independent, since all the

    Fig. 3. Architecture of the UNL system.

    generation module 

    edition

    module

    EditorUNL

    Target lan-

    guage gene-rator

    Originaltext (any

    language)

    GeneratedText

    UNL

    Document

    Base

    UNL – locallanguage Dic-

    tionaries

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    36/465

    16 Jesús Cardeñosa, Carolina Gallardo, and Edmundo Tovar

    necessary grammar knowledge for the generation of the target language is included in

    the dictionary and the rules’ set proper of the language.

    Given that this module directly transforms the semantic UNL representation into

    the morphological realization (that is, a sentence in natural language), the dictionary

    must contain the best detailed information in the following aspects: 

    •  Grammar category and subcategories: the more organized by hierarchies the

    lexical level, the better quality will be expected from the generation.

    •  Argument structure and prepositions required by verbs, nouns, and adjectives.

    •  Semantic information that may be relevant for the syntactic configuration in the

    target language.

    With the help of the information included in the dictionary, the generation rules

    have, as their main task, to transform the UNL expression into a phrase in the naturallanguage. Basically the following tasks are being carried out:

    •  Matching of the UNL relations with the grammar relations of the language.

    In the previous sentence, the agent of the predicate in UNL corresponds to thegrammatical subject in English or Spanish.

    •  “Translation” of the UNL attributes into their appropriate morphologic or

    syntactic realization. For example, the attribute “plural” has to be morphologi-

    cally realized as a plural noun in Spanish. The attribute “definite reference” is

    translated into Spanish through the insertion of a definite article. Not always there

    is a direct translation between UNL attributes and morphological/pragmatic in-

    formation in natural languages. For instance, when dealing with time, UNL only

    offers three possibilities (past, present and future). It would be “competence” of

    the generation rules of each natural language to correctly select the tense and ver-

    bal moods applicable to the languages that do not have this kind of time system

    (for instance, Spanish).

    •  Generation of pronouns and anaphoric expressions. The UNL expression is

    devoid of anaphoric elements, all concepts in UNL should be stated explicitly. It

    is the task of the generation rules to insert pronouns and other anaphoric elements

    in the generated texts.

    •  Morphologic synthesis. Finally, generation rules should tackle aspects such as

    agreement between verb and subject, or between adjectives and nouns, word or-

    der or the expression of the correct verb tense.

    Figure 4 shows the architecture for direct generation, there it can be seen how

    the “bilingual” dictionary Natural Language-UNL and the generation rules feeds theDeCo module in order to carry out the generation of UNL text into a natural language

    text.

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    37/465

    Standardization of the Generation Process in a Multilingual Environment 17

    3.3.2 Combined Generation (reuse of transfer components)

    The treatment for Russian and French languages inside the UNL system is the perfect

    example of the combined generation within the UNL framework. Both teams have in-

    tegrated the UNL system into their transfer systems, ETAP in Russian case [11], and

    Ariane for the French one [14]. These systems have chosen to reuse the available generators of the target languages

    and to develop an additional module that allows the conversion of the UNL represen-

    tation into a friendly format through the generators of their “transfer” systems. An ex-

    ample of combined architecture would be exemplified in figure 5.

    The so called “UNL transfer module” is with no doubt a new component to de-velop. However, the experience in the already mentioned systems has shown that the

    development costs of this module are cheaper than the costs for developing a new

    generator that could have the UNL code as its direct input.

    Fig. 4. Architecture of the direct generation in the UNL

    framework

    DeCo

    text in

    naturallanguage

    UNLdocumentbase

    diction-

    aryLn -

    generation

    rules

    new module reused generator

    text in

    naturallanguage

    UNLdocumentbase

    Transfer ge-nerator

    UNL transfermodule

    Fig. 5. Combined Generation

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    38/465

    18 Jesús Cardeñosa, Carolina Gallardo, and Edmundo Tovar

    4 Can the UNL be a generation standard?

    4.1 What is a standard?

    If we try to avoid the formal definitions for “standard”, it could be said that a standard

    is a set of rules, criteria and recommendations that allow to build a product or to de-sign and offer a service in a proper way that assures:

    •  The universalization of the work, that is, a unique way of doing something that at

    the same time can be independently evaluated, no matter who does it or when.

    •  The quality. When products or services have been carried out following a stan-

    dard, there is a certainty that the processes are well implemented and the product

    quality is not at risk.•  The assessment of the product or service provisions, meaning that it could be de-

    termined through a unique way, when a product or a service fulfills the specifica-

    tions it has been designed and built for.

    Many more could be enumerated, but we are focusing in these three that may be

    the most intuitive. As it has already been mentioned, the lesser development of some

    products (in this case, language generators) is due to the lack of standards that could

    assure these characteristics. The diversification and extremely disperse casuistic of the

    inputs to a generator cause that the output become the only way to assess it to estab-

    lish a subjective evaluation.

    Although there are some researchers that have not neglected this side of the gen-

    eration [15], this standard has not been yet established, neither formally nor de facto.

    4.2 The UNL as a standard

    Technically speaking, the lack of uniformity in the inputs to language generators is

    almost the only reason that restrains a bigger development. Therefore, the support

    systems to multilingual services see their action limited only to specific languages

    where translation services may be offered, either automatic or not. However, the lan-

    guage expansion is an unapproachable road with these methods. If the input to lan-

    guage generators is not standardized, this problem will not be solved in a global way.

    The only standardization would then be the choice of a content support that could ex-

    press itself in a unique way, with a specific language. Actually, this concept has ex-

    isted for many years, and it is the Interlingua concept. It is within this context where

    the UNL can play a role. The UNL has not been conceived as an interlingua, but itcan be used as one. The interlinguas had their historic moment when they faced the

    same problems as the other systems created for machine translation during the 80’s.

    At the beginning of the 90’s it was clear that the subject of the languages was much

    more complex than it seemed during the technological development of the 80’s and

    the exaggerated optimism of the time.

    It is not the purpose of this paper to describe the economic advantages of an inter-

    lingua over the traditional systems of machine translation regarding many languages

    (a traditional machine system requires 90 systems to support 10 languages, while one

  • 8/18/2019 Universal Networking Language: Advances in Theory and Applications

    39/465

    Standardization of the Generation Process in a Multilingual Environment 19

    based in interlingua requires only 20). In fact, the crossing point between systems

    takes place at three languages. For more than three languages interlingua is cheaper.

    However, historic matters at the beginning of the 90’s buried the interlinguas

    (mainly those developed in Japan and the USA) because while the interlingua based

    systems were not well defined, the “transfer” machine translation systems began to

    offer more positive results. Even so, within the group of language technologies, ma-

    chine translation became kind of discredited. At the end of the 90’s, the United Na-

    tions opted for models based on interlingua approximations to define the multilingual

    support systems for the Internet. The result is the today’s named UNL, already de-

    scribed in this chapter. Apparently, it would be the ideal system to solve the problem

    of the absence of a standard input to language generators. Nevertheless, a standard is

    something else than a technological solution. It could be summarized like this: a stan-

    dard is evaluated through the maturity concept that to sum up means that it would be

    associated to the organized and organizational maturity, that is, there has to be an or-

    ganization behind the standard that may be able to maintain, modify, allow the study

    of its acceptance and real use for it, and other factors. Currently, it could be said that

    the UNL has weak and strong points to formally become a standard [16]:

    Weak points:

     –   Relatively recent technology

     –   Not too much implemented

     –   Quality system not implemented Strong points:

     –   Worldwide organization behind (dissemination assured)

     –  

    Business expectations increased by the incorporation of minority lan-

    guages  – 

     

    Quality system defined 

    However, independently of the global factors, the technological approach is nowa-

    days the only one able to solve the problem of automatic multilingual generation sys-

    tems. Regarding the business approach, the expansion of multilingual systems in the

    Internet requires much more than traditional systems of machine translation. This is

    why the UNL is not just an interlingua, but a language to support knowledge reposito-ries, different ontological approaches, and other matters. Summarizing, the UNL (or

    something similar to it) is necessary and needed by others.

    5 A real experience: HEREIN and UNL

    5.1 |Herein and standardization of form and structure.

    The Herein system (IST-2000-29355) [17] is a perfect example of a massively multi-

    lingual environment. It constitutes an Internet-based facility for improving cultural