first international workshop on free/open-source rule ... · sal representation language...

27
First International Workshop on Free/Open-Source Rule-Based Machine Translation 2 nd and 3 rd November 2009 Universitat d'Alacant, Alacant - Spain OpenLogos OpenLogos Bernard Scott Bernard Scott Anabela Barreiro Anabela Barreiro [email protected] [email protected] Rule-driven Machine Translation

Upload: vuongdung

Post on 25-Jul-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

First International Workshop on Free/Open-Source Rule-Based Machine Translation

2nd and 3rd November 2009Universitat d'Alacant, Alacant - Spain

OpenLogosOpenLogos

Bernard Scott Bernard Scott Anabela BarreiroAnabela [email protected] [email protected]

Rule-driven Machine Translation

Page 2: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

�Background on OpenLogos

�System Pipeline Architecture

�SAL Representation Language

Presentation OutlinePresentation Outline

2

�SAL Representation Language

�Classic problems with rule-driven systems

�How SAL benefits translation

�Exploiting OpenLogos MT for new applications

�Free resources of OpenLogos

Page 3: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

�Adapted from Logos System (1970-2000) by DFKI

– Developed in US, Germany, Italy

Background to OpenLogosBackground to OpenLogos

– 25-100 development staff for 30 years

�Language pairs: E-G, E-F, E-S, E-I, E-P

G-E, G-F, G-I

� Industrial strength MT used in 12 countries

3

Page 4: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

�Multi-target System

�Pipeline Architecture

�Language-neutral Software

OpenLogos CharacteristicsOpenLogos Characteristics

�Language-neutral Software

– All linguistic knowledge is in form of data, stored in a relational databaserelational database

�SAL representation Language

–– SemantoSemanto--Syntactic Abstraction LanguageSyntactic Abstraction Language

�Semantic Processing

– Thousands of transformation rules in Semantic Table (= Semtab)Semantic Table (= Semtab)

4

Page 5: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

FormatRES1

RES2P1

P2

Semtab

SAL Rules

Input

OpenLogos Pipeline ArchitectureOpenLogos Pipeline Architecture

5

P2P3

P4

ST4

T3

T1T2

GEN

FormatTarget Rules Semtab

Semtab

Semtab

Target Rules

Target Rules

• Highly Modular

• Incremental Process

• Multi-Target System

• Bottom-up Analysis

• Deterministic Parse

Output

Page 6: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

Format

SAL Rules

SEMTAB

Enter

Pipeline

Incremental Source Analysis Incremental Source Analysis -- 11

6

Clausal Segmentation Clausal Segmentation waysways ofof cookingcooking lentilslentils -- VV

Homograph Resolution Homograph Resolution typestypes of of cookingcooking utensilsutensils -- ADJADJ

Deterministic parsing Deterministic parsing requires that all ambiguous PoS be resolvedrequires that all ambiguous PoS be resolved

RES2

RES1

Page 7: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

Parse1

Parse3

Parse2

Simple NP

SAL RulesSemtab

Incremental Source Analysis Incremental Source Analysis -- 22

7

Parse3

Parse4

S

• Simple NP

• Semantic

resolution • NP Prep

NP

• Relative

Clauses

• Semantic

resolution

• Complex

NP

• Simple

Clauses

• Semantic

resolution

• Complex

sentences

• Semantic

resolution

E.g: a book on the presidency

on = about; concerning

≠ a book on the table 7

Page 8: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

SALSAL stands for SSemantoemanto--syntactic syntactic AAbstraction bstraction LLanguageanguage

� A Semanto-Syntactic Taxonomy: SupersetsSupersets / SetsSets / SubsetsSubsets

� Semantic-Syntactic continuum from NL word to Word Class

SAL Representation LanguageSAL Representation Language

– Literal word: airport

– Head morph: port

– SAL Subset: agfunc (agentive functional location)

– SAL Set: func (functional location)

– SAL Superset: PL (place)

– Word Class: N

Both Pipeline Input Stream and Rulebases are in SAL

8

Page 9: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

SAL Noun SupersetsSAL Noun Supersets

9

E.g: two pieces of cake

NP parse must have:

- Plural morphology of pieces

- Semantics of cake

Developed:- inductively- by trial and error- over a period of years- by the development team

Page 10: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

Abstract Noun Superset ����

Non-verbal Abstract Set ����

Non-verbalSubsets

Classifications

Abstract Noun TaxonomyAbstract Noun Taxonomy

10

Verbal Abstract Set ����

Verbal Subsets

Classifications

Methods / Procedures

Page 11: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

Is the word Is the word cookingcooking a verb or an adjective?a verb or an adjective?

waysways of cooking lentilsof cooking lentils

typestypes of cooking utensilsof cooking utensils

Use of SAL Codes to Resolve HomographsUse of SAL Codes to Resolve Homographs

11

typestypes of cooking utensilsof cooking utensils

waysways �� N(AB/N(AB/methodmethod)) �� parser parser verbverb biasbias

typestypes �� N(AB/N(AB/classclass)) �� nonnon--verbverb biasbias

SAL contributes to

the resolution of

the homograph

The SAL code N(AB/method) in the rule

matches on a similar code in the SAL input

stream. The effect of such a match is to

resolve cooking as a verb

Page 12: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

Rules Have Five ComponentsRules Have Five Components

�� SAL PatternSAL Pattern– PARSE2 Example: N(IN/data;u) Prep(“on”;u) N(u;u) (a book on the presidency)

�� ConstraintsConstraints– Match only if conditions are true or false

What SAL Rules Look LikeWhat SAL Rules Look Like

�� Source ActionsSource Actions– RES Rulebase: Resolves syntactic ambiguity

– PARSE Rulebase: Creates parse tree

– SEMTAB Rules: Effect semantic disambiguation

�� Target Action (optional)Target Action (optional)– Effect syntactic and/or semantic transfer

�� Comment LineComment Line– PARSE2 Example: NP(info) Prep(“on”) NP � N1 “about” N2

E.g., E.g., bookbook onon political political satire satire �� book book aboutabout ........

12

Page 13: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

�Complexity– Logic Saturation

– Performance Degradation

– Difficult Maintainability

– System Improvability Stasis

Classic Problem of RBMTClassic Problem of RBMT

– System Improvability Stasis

�Ambiguity– Quality/Accuracy of Output – depends on effective disambiguation

�Classic Dilemma of the Developer– Reduce rulebase size to relieve complexity weakens disambiguation

– Increase rulebase size to address ambiguities increases complexity

13

Page 14: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

�Complexity

– Rules and input stream are both expressed as SAL patterns

– Homogeneous apples-to-apples matching

How OpenLogos Addresses Complexity and How OpenLogos Addresses Complexity and AmbiguityAmbiguity

14

– Homogeneous apples-to-apples matching

– SAL input stream serves as search argument to SAL rulebase

– Rules are SAL patterns stored in an indexed pattern dictionary

– No limit on rule size, no impact on performance

– Rules are self organizing, like a dictionary

– Rulebase is easy to maintain

Page 15: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

V2 V3 V4 V5 V6 V7

S

V1

How Rules Are AppliedHow Rules Are Applied

Metaphor: biological neural net As the analysis

progresses:

1- cells

become fewer fewer

(abstract

nature of the

parse)

15

R1 R2 P1 P2 P3 P4

– Vectors labeled V1-V6 = SAL input stream of the pipeline

– Cells in input vectors = SAL elements/words to which the NL input stream has been

converted

– In this network, R1 through P4 = hidden layers containing SAL rules

– R1 represents RES1, P1 represents Parse1 and so on.

– Each hidden layer contains between 2-4 thousand rules, organized by their SAL

pattern, as in a dictionary.

2- vectors

become lighterlighter

(semantic

dismbiguation)

Page 16: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

�Ambiguity

– Syntactic Homograph Resolution

How OpenLogos Addresses Complexity and How OpenLogos Addresses Complexity and AmbiguityAmbiguity

16

– Scoping of adjectives, prepositions

–– PolysemyPolysemy

Page 17: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

Resolution of Polysemy in OpenLogosResolution of Polysemy in OpenLogos

Key to Polysemy HandlingKey to Polysemy Handling: :

1. SAL Representation Language 1. SAL Representation Language

2. Semantic Table2. Semantic Table

17

2. Semantic Table2. Semantic Table

NL NL StringString SemSem. Table . Table RuleRule FrenchFrench TransferTransfer

raise a childraise a child �� V(‘V(‘raise’raise’) N() N(ANdesANdes)) �� éleverélever. . . . . .

raise cornraise corn �� V(‘V(‘raise’raise’) N() N(MAedibMAedib) ) �������� cultivercultiver. . . .. .

raise the rentraise the rent �� V(‘V(‘raise’raise’) N() N(MEabsMEabs) ) �� augmenteraugmenter. . . . . .

Page 18: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

Deep Deep Structure Rules of Semantic TableStructure Rules of Semantic Table

A A Single DeepSingle Deep--Structure Rule Structure Rule

Matches Multiple SurfaceMatches Multiple Surface--StructuresStructures

And Produces Appropriate Target TransfersAnd Produces Appropriate Target Transfers

18

he he raisedraised the rentthe rent � il il a a augmentéaugmenté le loyerle loyer V+ObjectV+Object

the the raisingraising of of the rentthe rent � l’l’augmetationaugmetation du du loyerloyer GerundGerund

the rent, the rent, raisedraised byby …… � le le loyerloyer,, augmentéaugmenté de …de … Part. ADJPart. ADJ

a a rentrent raiseraise � une une augmentationaugmentation de loyerde loyer NounNoun

Page 19: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

How SAL Benefits TranslationHow SAL Benefits Translation

Examples showing

target voice transformations

EN passive voice >>> FR active voice

19

The situation was alluded to The situation was alluded to by my friend in his letterby my friend in his letter..

Mon amiMon ami a fait allusion à la situation dans a fait allusion à la situation dans sasa lettre.lettre.

The situation was alluded to The situation was alluded to in their letterin their letter..

OnOn a fait allusion à la situation dans a fait allusion à la situation dans leurleur lettre.lettre.

Page 20: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

��ReEscreve (ReWriter)ReEscreve (ReWriter)– Language composition tool

– Helps authors to improve text quality and consistency

– Provides paraphrases that can substitute the content of existing text, standardize text

style or optimize meaning

Applications Employing OpenLogos ResourcesOpenLogos EN-PT dictionary data were adapted and enhanced

with new properties (derivational, etc.) to create a new resource:

Port4NooJ (http://www.linguateca.pt/Repositorio/Port4NooJ/).

The new applications created for Portuguese NLP use Port4NooJ.

Contribution of OpenLogos Resources for New NLP Contribution of OpenLogos Resources for New NLP ApplicationsApplications

20

style or optimize meaning

– Based on NooJ’s technology (http://ww.nooj4nlp.net/)

– Publicly available at: http://www.linguateca.pt/ReEscreve/

��ParaMTParaMT– Bilingual/multilingual paraphraser (translator prototype)

– Uses similar methodology to that employed by ReEscreve

– Uses bilingual data

– Directly applicable to MT

��CorpógrafoCorpógrafo– Multilingual corpora management tool

– Available at: http://www.linguateca.pt/corpografo/

Page 21: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

ReEscreve: Suggestions for Text RewritingReEscreve: Suggestions for Text Rewriting

Paraphrases of SVC

presented by

ReEscreve’s

paraphrasing system

21

Page 22: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

ReEscreve: a Rewritten TextReEscreve: a Rewritten Text

Text rewritten based

on the user’s

preferences

Users can suggest

new expressions!

22

Page 23: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

MACHINE

ParaMT: a Paraphraser Applicable to MTParaMT: a Paraphraser Applicable to MT

EN verbsPT support verb construction >

23

Recognition of

Portuguese SVC

and translation

into English verbs

MACHINE

TRANSLATION

Page 24: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

� The Language Technology Lab of DFKI has adapted OpenLogos from the commercial Logos System

� OpenLogos is available in LinuxLinux platform using PostgreSQLPostgreSQL

� OpenLogos employs onlyonly open source components

OpenLogos Resources at DFKIOpenLogos Resources at DFKI

24

� OpenLogos employs onlyonly open source components

– Use of open source development tools and compilers, such as GCCGCC

– Replacement of non-open code and libraries

– Use of open source databases instead of a commercial database. All

language specific resources have been converted to an open source

database, such as PostgreSQLPostgreSQL

– Use of open standards instead of vendor specific protocols

– As a proof of concept for the software migration, Linux is used as target

platform for the first open source release of Logos

Page 25: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

DFKI hosts an open OpenLogos mailing list dedicated to

discussion and exchange of information concerning

OpenLogos developments and problems at

DFKI User Assistance with OpenLogosDFKI User Assistance with OpenLogos

OpenLogos developments and problems at

http://www.dfki.de/mailman/listinfo/openlogos-list

25

Page 26: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

SAL representation languageSAL representation language

� Described in depth

Detailed information aboutDetailed information about

Supplemental Free Resources at Logos System Supplemental Free Resources at Logos System Archives WebsiteArchives Website

26

Detailed information aboutDetailed information about

� System Architecture

� System Flow

Technical papersTechnical papers

On the Internet at: On the Internet at: http://logossystemarchives.homestead.comhttp://logossystemarchives.homestead.com

Page 27: First International Workshop on Free/Open-Source Rule ... · SAL Representation Language Presentation Outline 2 ... V2 V3 V4 V5 V6 V7 S V1 How Rules Are Applied ... language specific

First International Workshop on Free/Open-Source Rule-Based Machine Translation

2nd and 3rd November 2009Universitat d'Alacant, Alacant - Spain

Bernard Scott Bernard Scott Anabela BarreiroAnabela [email protected] [email protected]

Bernard Scott Bernard Scott Anabela BarreiroAnabela [email protected] [email protected]