Post on 08-Oct-2020

First Machine Translation Marathon in the Americas

10-15 May 2015 Champaign, Illinois, USA

Machine Translation Marathon

• Week-long Summer school

• Tutorial labs

• Open source hack-a-thon

Machine Translation Marathon

• 2007 Edinburgh, Scotland

• 2008 Wandlitz, Germany

• 2009 Prague, Czech Republic

• 2010 Dublin, Ireland

• 2010 Le Mans, France

• 2011 Trento, Italy

• 2012 Edinburgh, Scotland

• 2013 Prague, Czech Republic

• 2014 Trento, Italy

Sponsors

• Bloomberg

• University of Illinois at Urbana-Champaign School of Literatures, Cultures & Linguistics

• University of Illinois at Urbana-Champaign Department of Linguistics

Welcome

First Martian MT Marathon

University of Sabishii - 2 Feb 17, 2148

Introduction to Machine Translation

The world is badly in need of translators. Almost nobody denies this. The number of pairs of languages between which translations must be made and the number and types of documents involved is constantly increasing.

Martin Kay (1980)

The New Tower Empowering Readers CAT Tools MT for Post-Editing Conclusion References

Workshop on Interactive and Adaptive Machine Translation Lane Schwartz, Isabel Lacruz

But we are fortunate to be children of the age of computers and it is to them that we naturally turn. A computer is a device that can be used to magnify human productivity.

Martin Kay (1980)

Properly used, it does not dehumanize by imposing its own Orwellian stamp on the products of the human spirit and the dignity of human labor but, by taking over what is mechanical and routine, it frees human beings for what is essentially human.

Martin Kay (1980)

196 BC

Rosetta Stone

• Text in translation

• Parallel text

1950s

Not (yet) on github

Translation, from one language into another, presents subtle, important, ancient, and difficult problems.

Warren Weaver (1955)

That the problems are subtle no one can doubt. We are told, by those who are sensitive to all the beauties of the Russian language, that it is completely futile to try to translate the poetry of Pushkin into any other language - futile not for a computer, but futile for the most able bilingual poet.

Warren Weaver (1955)

Adam names the animals

The Tower of Babel

Students of languages and of the structures of languages, the logicians who design computers, the electronic engineers who build and run them - and specially the rare individuals who share all of these talents and insights - are now engaged in erecting a new Tower of Anti-Babel.

Warren Weaver (1955)

This new tower is not intended to reach to Heaven. But it is hoped that it will build part of the way back to that mythical situation of simplicity and power when men could communicate freely together, and when this contributed so notably to their effectiveness.

Warren Weaver (1955)

No reasonable person thinks that a machine translation can ever achieve elegance and style. Pushkin need not shudder. And the kinds of questions that enter in connection with the translation of the Bible will continue to require at least fifty learned men.

Warren Weaver (1955)

There is now reason to hope … [that this] new tower may well have more than just a workshop basement. A few stories above ground would not afford, from this new tower, a dramatic far view of great aesthetic value, but it could be very useful for loading trucks with information content.

Warren Weaver (1955)

This, in fact, is the reasonable purpose of this effort. Not to charm or delight, not to contribute to elegance or beauty; but to be of wide service in the work-a-day task of making available the essential content of documents in languages which are foreign to the reader.

Warren Weaver (1955)

This is a limited, but a very important, aspect of translation. A new tower of this sort would by no means reach to Heaven - but it would be aimed in a good direction.

Warren Weaver (1955)

Pre-history of Machine Translation:

Universal Language in the 17th Century

1600s

Francis Bacon

• Advancement of Learning (1605)

• De Augmentis Scientiarum (1623)

John Comenius

• Via Lucis (1641)

Isaac Newton

• Unpublished notes (1661)

Marin Mersenne

René Descartes

• Letter to Mersenne (1629)

John Wilkins

• An Essay towards a Real Character and a Philosophical Language (1668)

Gottfried Leibniz

• De Arte Combinatoria (1666)

Thomas Hobbes

• Leviathan (1651)

• De Corpore (1655)

Jonathan Swift

• Gulliver’s Travels (1726)

Voltaire

• Candide (1759)

The History of Machine Translation

1948

ПРЕАМБУЛА

Принимая во внимание, что признание достоинства, присущего всем членам человеческой семьи, и равных и неотъемлемых прав их является основой свободы, справедливости и всеобщего мира; и

принимая во внимание, что пренебрежение и презрение к правам человека привели к варварским актам, которые возмущают совесть человечества, и что создание такого мира, в котором люди будут иметь свободу слова и убеждений и будут свободны от страха и нужды, провозглашено как высокое стремление людей; и

принимая во внимание, что необходимо, чтобы права человека охранялись властью закона в целях обеспечения того, чтобы человек не был вынужден прибегать, в качестве последнего средства, к восстанию против тирании и угнетения; и

принимая во внимание, что необходимо содействовать развитию дружественных отношений между народами; и

принимая во внимание, что народы Объединенных Наций подтвердили в Уставе свою веру в основные права человека, в достоинство и ценность человеческой личности и в равноправие мужчин и женщин и решили содействовать социальному прогрессу и улучшению условий жизни при большей свободе; и

принимая во внимание, что государства-члены обязались содействовать, в сотрудничестве с Организацией Объединенных Наций, всеобщему уважению и соблюдению прав человека и основных свобод; и

принимая во внимание, что всеобщее понимание характера этих прав и свобод имеет огромное значение для полного выполнения этого обязательства,

Генеральная Ассамблея,

провозглашает настоящую Всеобщую декларацию прав человека в качестве задачи, к выполнению которой должны стремиться все народы и государства с тем, чтобы каждый человек и каждый орган общества, постоянно имея в виду настоящую Декларацию, стремились путем просвещения и образования содействовать уважению этих прав и свобод и обеспечению, путем национальных и международных прогрессивных мероприятий, всеобщего и эффективного признания и осуществления их как среди народов государств-членов Организации, так и среди народов территорий, находящихся под их юрисдикцией.

PREAMBLE

Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world,

Whereas disregard and contempt for human rights have resulted in barbarous acts which have outraged the conscience of mankind, and the advent of a world in which human beings shall enjoy freedom of speech and belief and freedom from fear and want has been proclaimed as the highest aspiration of the common people,

Whereas it is essential, if man is not to be compelled to have recourse, as a last resort, to rebellion against tyranny and oppression, that human rights should be protected by the rule of law,

Whereas it is essential to promote the development of friendly relations between nations,

Whereas the peoples of the United Nations have in the Charter reaffirmed their faith in fundamental human rights, in the dignity and worth of the human person and in the equal rights of men and women and have determined to promote social progress and better standards of life in larger freedom,

Whereas Member States have pledged themselves to achieve, in co-operation with the United Nations, the promotion of universal respect for and observance of human rights and fundamental freedoms,

Whereas a common understanding of these rights and freedoms is of the greatest importance for the full realization of this pledge,

Now, Therefore THE GENERAL ASSEMBLY proclaims THIS UNIVERSAL DECLARATION OF HUMAN RIGHTS as a common standard of achievement for all peoples and all nations, to the end that every individual and every organ of society, keeping this Declaration constantly in mind, shall strive by teaching and education to promote respect for these rights and freedoms and by progressive measures, national and international, to secure their universal and effective recognition and observance, both among the peoples of Member States themselves and among the peoples of territories under their jurisdiction.

[Diagram: source language words; target language words]

1950s

The First Conference on Machine Translation

June 1952

Massachusetts Institute of Technology

Who’s Who in 1952?

• Yehoshua Bar-Hillel (MIT)

• Andrew Booth (Birkbeck College, London)

• Leon Dostert (Georgetown University)

• Erwin Reifler (University of Washington)

• Victor Oswald (UCLA)

Some Issues Discussed

• OCR

• Document summarization

• Limitations of computing hardware

• Pre-editing, post-editing, and constrained language

• Problem of translating idioms

• Domain adaptation

• Out-of-vocabulary words and Zipfian distributions

• Assumed importance of morphology
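The out-of-vocabulary issue in the list above can be illustrated with a minimal Python sketch. It counts word frequencies in a tiny training text, ranks them (in a Zipfian distribution a few words are very frequent and most are rare), and measures how many tokens of a new text were never seen. The function names `rank_frequency` and `oov_rate` and the toy sentences are assumptions invented for this illustration; they do not come from the 1952 conference or from any MT system.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(tokens)
    # Sort by descending count, breaking ties alphabetically for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training tokens."""
    if not test_tokens:
        return 0.0
    vocabulary = set(train_tokens)
    unseen = sum(1 for tok in test_tokens if tok not in vocabulary)
    return unseen / len(test_tokens)


# Toy data, invented for illustration only (hypothetical corpus).
train = "the cat sat on the mat and the dog sat on the rug".split()
test = "the bird sat on the fence".split()

table = rank_frequency(train)
rate = oov_rate(train, test)

if __name__ == "__main__":
    for rank, word, count in table[:3]:
        print(rank, word, count)
    print(f"OOV rate: {rate:.2f}")
```

Even in this toy case most word types occur only once, so a new sentence readily contains words that the training text never covered.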

[Diagram, built up in two steps: source language words and target language words; then source morphemes and target morphemes]

1954

Machine Translation Meets the Public

1958

The sponsors start asking questions

Bar-Hillel’s Verdict in 1958?

• World-wide MT spending in 1954 estimated at $3 million ($24.5 million in 2015 dollars)

• The problems solved to date were the easy ones

• Fully automatic, high quality machine translation is unattainable “not only in the near future, but altogether”

1960s

1966

The ALPAC Report

ALPAC Report

• Automatic Language Processing Advisory Committee - National Academy of Sciences

• “There is no emergency in the field of translation.”

• Funding should focus on fundamental research in computational linguistics

• Led to a dramatic decrease in U.S. Government funding of machine translation for the next 20 years

1970s - 1980s

[Diagram, built up level by level: source and target language words; source and target morphemes; source and target syntax; source and target semantics; interlingua (language-independent semantics)]

1988

Voltaire

• Candide (1759)

Statistical MT

• Candide (1988)

[Diagram: source language words; target language words]

Historical Note

• Why IBM Models?

Fred Jelinek (1932-2010)

Some of us started to wonder in the mid 1980s whether our [speech recognition] methods could be successfully applied to new fields. Bob Mercer and I spent many of our after-lunch “periphery” walks discussing possible candidates. We soon came up with two: machine translation and stock market modeling.

“The validity of a statistical (information theoretic) approach to MT has indeed been recognized, as the authors mention, by Weaver as early as 1949. And was universally recognized as mistaken by 1950 (cf. Hutchins, MT – Past, Present, Future, Ellis Horwood, 1986, p. 30ff and references therein). The crude force of computers is not science. The paper is simply beyond the scope of COLING.”

if length(output_dir)>0 then trc_init('Candide6',output_dir);
else trc_init('Candide6');
read_ddinf;
ddinf_dump;
allocate_buffers;
read_checkpoint;
open_anthology;
trc_write('q_document:'||char(extract_text_q)||' q_decoding:' || char(q_decoding)||' q_timing: '||char(q_timing) || ' use_pos_attrs: '||q_use_pos_attrs);
open_tex_file(output_unit,output_buffer,'output_file');
if q_input_text then open_tex_file(input_unit,input_text_buffer,'intext_file');
n_processed_this_time = 0;
write_trc_header;
u_display_virtual_memory_used('Candide: start of loop through sentences');
do while(target_reade(analysis_len,analysis_values));
  if q_input_text then do;
    if ^na_x_get_text(input_attr,input_text) then error('Jack');
    massage_input_text(input_text);
    trc_write('Input: '||input_text);
  end;
  u_display_virtual_memory_used('Candide: pre-decoding');
  if ^na_x_get_values(analysis_attr,analysis_len,analysis_values) then error("Candide: Can't Read Target");
  trc_write('Analysis: '||spelling_string_of_attr(analysis_attr));
  if ^na_x_get_values(transfer_attr,transfer_len,transfer_values) then error('Candide: Bosco');
  if ^na_x_get_values(transfer_align_attr,transfer_align_len,transfer_align_values) then error('Candide: Bosco');
  u_display_virtual_memory_used('Candide: post-decoding, pre-synthesis');
  voc_free_unused_spellings(4);
  if ^na_x_get_text(synthesis_attr,synthesis_text) then error('Candide: Nabisco');
  u_display_virtual_memory_used('Candide: post-synthesis');
  voc_free_unused_spellings(4);
  if q_human_word &* ^na_x_get_values(human_attr,human_len,human_values) then error('Candide: Triscut');
  if q_use_pos_attrs then do;
    if ^na_x_get_values(start_pos_attr, start_pos_len,start_pos_values) then error('Candide: Biscut');
    if ^na_x_get_values(end_pos_attr, end_pos_len,end_pos_values) then error('Candide: Uniscut');
    if end_pos_len ^= start_pos_len then error('Candide: Zeroscut');
  end;
  if q_timing then elapsed_time = na_x_time_of_action(decoder_action_unit);
  display_answer;
  if extract_text_q then do;
    extract_prev_interline_text(interline_string);
    write_to_tex_file(output_unit, output_buffer, interline_string);
  end;
  if q_input_text then write_to_tex_file(input_unit,input_text_buffer,input_text);
  write_to_tex_file(output_unit,output_buffer,synthesis_text);
  write_topn_and_candidate_files;
  n_processed_this_time = n_processed_this_time + 1;
  last_sentence_processed = last_sentence_processed+1;
  last_item_processed = current_item;
  checkpoint;
  display('Candide: checkpointed');
end;
if extract_text_q then do;
  extract_final_text(interline_string);
  write_to_tex_file(output_unit, output_buffer, interline_string);
end;
if q_input_text then close_tex_file(input_unit,input_text_buffer);
close_tex_file(output_unit,output_buffer);
display('Candide: ALL DONE !');
trc_time;
return(0);
