machine translation for manufacturing: a case study at ford motor

14
Machine translation (MT) was one of the first applications of artificial intelligence technology that was deployed to solve real-world problems. Since the early 1960s, researchers have been building and uti- lizing computer systems that can translate from one language to another without requiring extensive human intervention. In the late 1990s, Ford Vehicle Operations began working with Systran Software Inc. to adapt and customize its machine-translation tech- nology in order to translate Ford’s vehicle assembly build instructions from English to German, Spanish, Dutch, and Portuguese. The use of machine transla- tion was made necessary by the vast amount of dynamic information that needed to be translated in a timely fashion. The assembly build instructions at Ford contain text written in a controlled language as well as unstructured remarks and comments. The MT system has already translated more than 7 million instructions into these languages and is an integral part of the overall manufacturing process-planning system used to support Ford’s assembly plants in Europe, Mexico and South America. In this paper, we focus on how AI techniques, such as knowledge rep- resentation and natural language processing can improve the accuracy of machine translation in a dynamic environment such as auto manufacturing. F ord Motor Company has been manufac- turing and selling automobiles outside the United States since the early 1900s. As a global company, Ford currently has assembly plants located throughout the world, including many locations where English is not spoken by our assembly employees. These requirements have motivated us to explore new technolo- gies, such as machine translation (MT), in order to translate and disseminate critical information in languages other than English. Since 1998, Ford Vehicle Operations has been utilizing a machine-translation system in order to translate our process assembly build instruc- tions from English to German, Spanish, Por- tuguese, and Dutch. This system was developed in conjunction with Systran Software Inc. and is an integral part of our worldwide process- planning system for manufacturing assembly. The input to our system is a set of process build instructions that are written using a controlled language known as Standard Language. The process sheets are read by an artificial intelli- gence (AI) system that parses the instructions and creates detailed work tasks for each step of the assembly process (Rychtyckyj 1999). These work tasks are then released to the assembly plants where specific workers are allocated to perform each task. In order to support the assembly of vehicles at plants where the work- ers do not speak English, we utilize MT tech- nology to translate these instructions into the native languages of these workers. Standard Language is primarily a restricted subset of Eng- lish and contains a limited vocabulary of about 5,000 words that also include acronyms, abbre- viations, proper nouns, and other Ford-specif- ic terminology. In addition, Standard Language allows the process sheet writers to embed com- ments within Standard Language sentences and to attach explanatory remarks to the instructions. These comments and remarks are ignored by the AI system during processing, but have to be translated by the MT system. Standard Language also utilizes some structures that are grammatically incorrect, which creates problems for the MT translation process. Based on our experience, we have concentrated on two different approaches to improve the qual- ity of machine translation: (1) develop a Articles FALL 2007 31 Copyright © 2007, American Association for Artificial Intelligence. All rights reserved. ISSN 0738-4602 Machine Translation for Manufacturing: A Case Study at Ford Motor Company Nestor Rychtyckyj AI Magazine Volume 28 Number 3 (2007) (© AAAI)

Upload: trinhphuc

Post on 01-Jan-2017

224 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Machine Translation for Manufacturing: A Case Study at Ford Motor

■ Machine translation (MT) was one of the firstapplications of artificial intelligence technology thatwas deployed to solve real-world problems. Since theearly 1960s, researchers have been building and uti-lizing computer systems that can translate from onelanguage to another without requiring extensivehuman intervention. In the late 1990s, Ford VehicleOperations began working with Systran Software Inc.to adapt and customize its machine-translation tech-nology in order to translate Ford’s vehicle assemblybuild instructions from English to German, Spanish,Dutch, and Portuguese. The use of machine transla-tion was made necessary by the vast amount ofdynamic information that needed to be translated ina timely fashion. The assembly build instructions atFord contain text written in a controlled language aswell as unstructured remarks and comments. The MTsystem has already translated more than 7 millioninstructions into these languages and is an integralpart of the overall manufacturing process-planningsystem used to support Ford’s assembly plants inEurope, Mexico and South America. In this paper, wefocus on how AI techniques, such as knowledge rep-resentation and natural language processing canimprove the accuracy of machine translation in adynamic environment such as auto manufacturing.

Ford Motor Company has been manufac-turing and selling automobiles outside theUnited States since the early 1900s. As a

global company, Ford currently has assemblyplants located throughout the world, includingmany locations where English is not spoken byour assembly employees. These requirementshave motivated us to explore new technolo-gies, such as machine translation (MT), inorder to translate and disseminate criticalinformation in languages other than English.Since 1998, Ford Vehicle Operations has been

utilizing a machine-translation system in orderto translate our process assembly build instruc-tions from English to German, Spanish, Por-tuguese, and Dutch. This system was developedin conjunction with Systran Software Inc. andis an integral part of our worldwide process-planning system for manufacturing assembly.The input to our system is a set of process buildinstructions that are written using a controlledlanguage known as Standard Language. Theprocess sheets are read by an artificial intelli-gence (AI) system that parses the instructionsand creates detailed work tasks for each step ofthe assembly process (Rychtyckyj 1999). Thesework tasks are then released to the assemblyplants where specific workers are allocated toperform each task. In order to support theassembly of vehicles at plants where the work-ers do not speak English, we utilize MT tech-nology to translate these instructions into thenative languages of these workers. StandardLanguage is primarily a restricted subset of Eng-lish and contains a limited vocabulary of about5,000 words that also include acronyms, abbre-viations, proper nouns, and other Ford-specif-ic terminology. In addition, Standard Languageallows the process sheet writers to embed com-ments within Standard Language sentencesand to attach explanatory remarks to theinstructions. These comments and remarks areignored by the AI system during processing,but have to be translated by the MT system.Standard Language also utilizes some structuresthat are grammatically incorrect, which createsproblems for the MT translation process. Basedon our experience, we have concentrated ontwo different approaches to improve the qual-ity of machine translation: (1) develop a

Articles

FALL 2007 31Copyright © 2007, American Association for Artificial Intelligence. All rights reserved. ISSN 0738-4602

Machine Translationfor Manufacturing:

A Case Study at Ford Motor Company

Nestor Rychtyckyj

AI Magazine Volume 28 Number 3 (2007) (© AAAI)

Page 2: Machine Translation for Manufacturing: A Case Study at Ford Motor

process and methodology to build, test, andmaintain translation glossaries that containthe specific terminology that needs to be trans-lated; and (2) develop a process to analyze andconvert the source text into a format that ismuch more understandable to the MT systemand produces more accurate results.

Problem DescriptionFord Motor Company operates vehicle assem-bly plants all over the world, including loca-tions in Germany, Spain, Belgium, Mexico,Brazil, and Venezuela. The assembly-line work-ers at these plants generally do not speak Eng-lish, but the assembly build instructions arealways written in English. Therefore, theseinstructions need to be translated into thehome language of the workers who will actual-ly be following these instructions. The stan-dard process-planning document, the processsheet, is the primary means for conveying theassembly information from the initial process-planning activity to the assembly plant. Aprocess sheet contains the detailed instructionsneeded to build a portion of a vehicle. A single

vehicle may require several thousand processsheets to describe its assembly. An engineerwrites a process sheet describing a portion ofthe assembly work utilizing a restricted subsetof English known as Standard Language. Stan-dard Language allows an engineer to write clearand concise assembly instructions that aremachine readable. The process sheets also con-tain embedded comments and associatedremarks that need to be translated. In addition,changes to the process build instructions arefrequent and this necessitates the retranslationof those instructions. In a typical month, wemay need to translate more than 150,000records from English into our target languagesof Spanish, Portuguese, Dutch, and German.Figure 1 displays our monthly translation met-rics from 2004–2006. Since the initial deploy-ment of this system, we have translated morethan 7 million records. The sheer volume,quick turnaround, and cost required precludedthe use of human translators on this project.The use of a controlled language, such as Stan-dard Language, also gave us impetus to find anautomated solution. The specific terminologyrequired to describe the automotive assembly

Articles

32 AI MAGAZINE

Machine Translation Metrics

0

50000

100000

150000

200000

250000

300000

350000

400000

Decem

ber-0

4

Janua

ry-0

5

Febr

uary

-05

Marc

h-05

April-0

5

May

-05

June-0

5

July-0

5

Augus

t-05

Sept

embe

r-05

Octobe

r-05

Novem

ber-0

5

Decem

ber-0

5

Janua

ry-0

6

Febr

uary

-06

Marc

h-06

April-0

6

May

-06

June-0

6

July-0

6

Month

Num

ber

of

Tran

slat

ion

s

PortugueseDutchSpanishGerman

Figure 1. Monthly Translation Counts.

Page 3: Machine Translation for Manufacturing: A Case Study at Ford Motor

and engineering methodology at Ford requiredus to develop technical glossaries that couldaccurately translate text containing theseterms. However, we also learned that machine-translation accuracy can be greatly improvedby analyzing and modifying the source text toimprove the quality of the translation output.In the next section, I will describe in moredetail how we combined natural language pro-cessing and knowledge representation and rea-soning to build and deploy a machine-transla-tion system.

Application DescriptionThe machine-translation system utilized atFord is integrated into the Global Study ProcessAllocation System (GSPAS). The goal of GSPASis to incorporate a standardized methodologyand a set of common business practices for the

design and assembly of vehicles to be used byall assembly plants throughout the world.GSPAS allows for the integration of parts, tools,process descriptions, and all other informationrequired to build a motor vehicle into one sys-tem. It also provides the engineering and man-ufacturing communities with a common plat-form and toolset for manufacturing processplanning. GSPAS utilizes Standard Language asa requirement for writing process build instruc-tions, and we have deployed an MT solutionfor the translation of these process buildinstructions.

The translation process at Ford for our man-ufacturing build instructions is fully automatedand does not require human manual interven-tion. All of the process build instructions arestored within an Oracle database; they are writ-ten in English and validated by the AI system.AI validation consists of parsing the Standard

Articles

FALL 2007 33

Figure 2. Machine Translation in GSPAS.

Page 4: Machine Translation for Manufacturing: A Case Study at Ford Motor

Language sentence, analyzing it, and creatingthe appropriate work description based on theinformation in the knowledge base. The systemthen creates an output set of work instructionsand assigns their associated MODAPTS (Modu-lar Arrangement of Predetermined Time Stan-dards) codes. MODAPTS codes are used to cal-culate the time required to perform theseactions. MODAPTS is an industrial measure-ment system used around the world (Carey2001). A more complete description of theGSPAS AI system can be found in Rychtyckyj(1999). A sample of a GSPAS process sheet isshown in figure 2.

After a process sheet is validated and the AIsystem generates the appropriate MODAPTScodes and times, a process engineer will releasethe process sheet to the appropriate assemblyplants. A vehicle that is built at multiple plantsneeds to have these process sheets sent to eachof these assembly plants. The informationabout each local plant is stored in the database,and those plants that require translation areidentified by the system. The system thenselects the process sheets that require transla-tion and starts the daily translation process foreach language. Currently we translate theprocess build instructions for 32 different vehi-cles into the appropriate language. English-Spanish is the most commonly used languagepair, as it is utilized at our assembly plants inSpain, Mexico, and South America. However,we have recently developed and deployed aseparate technical glossary for the English-Spanish translation system for our plants inMexico due to the differences in the translatedterminology between “Mexican Spanish” andregular Spanish.

The machine-translation system was inte-grated into GSPAS through the development ofan interface to the Oracle database. Our trans-lation programs extract the data from an Ora-cle database, modify the source text to improvetranslation accuracy, utilize the SYSTRAN sys-tem to perform some postprocessing, and thensend the data back to the Oracle database.

Our user community is located globally. Thetranslated text is displayed on the user’s PC orworkstation through the use of a graphical userinterface to the GSPAS system. The Ford multi-targeted customized dictionary that containsFord technical terminology was developed inconjunction with Systran and Ford, based oninput from engineers and linguists familiarwith Ford’s terminology.

One of the most difficult issues in deployingany translation is the need to obtain consistentand accurate evaluation with regard to thequality of translations (both human and

machine). We are using the J2450 metric devel-oped by the Society of Automotive Engineers(SAE) as a guide for our translation evaluators(SAE 2002). The J2450 metric was developed byan SAE committee consisting of representativesfrom the automobile industry and the transla-tion community as a standard measurementthat can be applied to grade the translationquality of automotive service information. Thismetric provides guidelines for evaluators to fol-low, describes a set of error categories, specifiesthe weight of the errors found, and calculates ascore for a given document. The metric doesnot attempt to grade style, but focuses primari-ly on the understandability of the translatedtext. The utilization of the SAE J2450 metrichas given us a consistent and tangible methodto evaluate translation quality and identifywhich areas require the most improvement.

We have also spent substantial effort in ana-lyzing the source text in order to identifywhich terms are used most often in StandardLanguage so that we can concentrate ourresources to correctly translate those mostcommon terms (Manning and Schulze 2000).This process was accomplished by using theparser from our AI system to store parsed sen-tences into the database. Periodically, we runan analysis of our parsed sentences and createa table where our terminology is listed in orderof usage frequency. This table is then comparedto the technical glossary to ensure that themost commonly used terms are being translat-ed correctly. The frequency analysis also allowsus to calculate the number of terms that needto be translated correctly to meet a given trans-lation accuracy threshold. For example, we cancalculate that 80 percent translation accuracy(based on terminology) requires that the most-frequently used 200 terms need to be insertedinto the translation glossary. An example ofthis type of analysis is shown in figure 3. Weperform this analysis on individual terms andon distinct noun phrases that are identified inthe system.

A machine-translation system, such as theone we utilize from Systran, translates text sen-tence by sentence. In Standard Language, eachsentence is self-contained, and users cannotuse pronouns to refer back to objects that mayhave been described in a previous sentence. Asingle term by itself cannot be translated accu-rately because it may correspond to differentparts of speech depending on the context.Therefore, it is necessary to build sample testcases for each word or phrase that we will needto test for translation accuracy. This test caseutilizes that term in its correct usage within thesentence. A file containing these translated

Articles

34 AI MAGAZINE

Page 5: Machine Translation for Manufacturing: A Case Study at Ford Motor

sentences (known as a test corpus) is used as abaseline for regression testing of the translationdictionaries. After the dictionary is updated,the test corpus of sentences is retranslated andcompared against the baseline. Any discrepan-cies are examined and a correction is made toeither the baseline (if the new translation iscorrect) or to the dictionary (if the new trans-lation is incorrect). We also designate a personfor each language who has the final responsi-bility for the given language pair; any discrep-ancies or differences as to the correct transla-tion will be decided by this languagecoordinator.

Our system allows the users to override anymachine-generated translation with a manualtranslation. This manual translation willremain current until the underlying Englishtext is modified. When the English text ischanged, the system automatically deletes allthe existing translations. We keep a copy of themanual translations and spend considerabletime in analyzing these manual translations todetermine if they could be used to improve the

machine-translation quality. Unfortunately,there are several problems with trying to useunedited manual translations. Many of theusers would be inconsistent in their usage ter-minology for the same English word. A morecritical problem would result when users wouldadd or delete content from an English sentenceas part of the translation process. This wouldbe done on an ad hoc basis and would makethe manual translations extremely difficult touse. We found that the manual translationprocess would need to be strictly regulated toproduce usable results and this is not feasiblein our production environment. Therefore, inpractice, our translation system automaticallytranslates all of the assembly build instructionsrequired for a given assembly plant withoutany manual human intervention.

Uses of AI TechnologyIt has been known that improving machine-translation quality can often be done mosteffectively by focusing on the source text

Articles

FALL 2007 35

Translation Frequency UsageNoun Phrases Sorted by Usage Count Pct of Total Running Pct

SPOT 10441 3.786071203 3.786071203STOCK 9678 3.509395374 7.295466578PART 7850 2.846533756 10.14200033FIXTURE 6966 2.52598142 12.66798175SCREW 4719 1.711183795 14.37916555SPOT-WELD GUN 4701 1.704656712 16.08382226HOLE 4663 1.690877313 17.77469957BRACKET 3844 1.393895001 19.16859457NUT 3504 1.270605641 20.43920021SPOT-WELD-GUN 3293 1.194093714 21.63329393BOLT 3112 1.128460261 22.76175419PALM BUTTON 2782 1.008797058 23.77055125VEHICLE 2557 0.927208511 24.69775976CLAMP 2552 0.925395432 25.62315519CLIP 2461 0.892397398 26.51555259HAND-TOOL 2270 0.823137787 27.33869038ASSEMBLY 2171 0.787238826 28.1259292BODY 1610 0.583811382 28.70974058

Figure 3. Translation Frequency Usage.

Page 6: Machine Translation for Manufacturing: A Case Study at Ford Motor

(Hutchins and Somers 1992). In most cases, thepreediting of text is performed by a human edi-tor, who verifies and modifies the text before itis sent to the translation system. In our case,the source text is a combination of a controlledlanguage and free-form text. Each of thesemust be treated in a somewhat different fash-ion in order to get the most accurate transla-tion results. This can be done by applying nat-ural language processing along with knowledgerepresentation and reasoning to convert thesource text to an equivalent form that can beprocessed more accurately by the machine-translation engine.

The first step in applying MT technology isto analyze the existing text in order to under-stand exactly what terminology needs to be

translated and how the source text is struc-tured. The terminology analysis is performedby running all of the source text through a pro-gram that retrieves each individual token andlooks up the token in the automotive ontologythat we have developed for Standard Languageas part of the GSPAS project. The automotiveontology utilized is a semantic network thatcontains more than 10,000 concepts related toautomotive assembly at Ford Motor Company.All of the associated knowledge about StandardLanguage, tools, parts, and everything elseassociated with the automobile assemblyprocess, is contained in the DLMS knowledgebase or ontology (Rychtyckyj 2006). Thisknowledge base structure is derived from theKL-ONE family of semantic network structures

Articles

36 AI MAGAZINE

Figure 4. A Portion of the GSPAS Automotive Ontology.

Page 7: Machine Translation for Manufacturing: A Case Study at Ford Motor

(Brachman and Schmolze 1985) and is an inte-gral component of the GSPAS system. Figure 4shows a portion of the GSPAS automotiveontology.

A Standard Language sentence can be parsedand understood by the GSPAS AI system; there-fore, each token in the sentence has the rele-vant information (part of speech, usage, size,and so on) available in the ontology. In addi-tion, the ontology provides us with a methodto identify phrases that need to be translated asan entity rather than as a collection of singlewords. The analysis of free-form text is sub-stantially more difficult. We have discoveredthat a vast majority of the terms (87 percent)can be identified using the GSPAS ontology;however most of the free-form comments andremarks cannot be parsed successfully.

Along with the need for special technicalglossaries for translation, we utilize a variety ofapproaches that take advantage of the naturallanguage-processing and the knowledge repre-sentation technologies to convert the sourcetext into a form that is much more likely tolead to a better translation. This is based on thefact that MT systems expect source text to con-form to some specific rules including the fol-lowing five: (1) Simple, unambiguous sentencestructures (shorter sentences usually translatemuch better than long, complicated sen-tences). Many authoring systems put a strictlimit on the length of a sentence. (2) It ispreferable to put articles in front of nouns andnoun phrases as it helps the MT system identi-fy the proper part of speech and create a moreunderstandable translation. (3) The regulargrammar rules of capitalization and punctua-tion need to be observed. In general, a sentencethat is written according to the structured rulesof English grammar will be translated moreaccurately than one that is not. (4) Acronyms,abbreviations, and proper nouns need to beidentified unambiguously; this is where theontology is most useful. For example, the sys-tem needs to know that “ABS” is an abbrevia-tion for ANTI-LOCK BRAKE SYSTEM and notfor ABSOLUTE. (5) The MT system will utilizeany additional information about the sourcetext that can be gleaned from the system; inour case we utilize XML tags to identify certainproperties of the source text, such as part ofspeech and its usage in this context.

Therefore, we have deployed a pretransla-tion component into our system that reads inthe source text as it is written by the processengineers, converts the source text into a moreMT-friendly form, and then submits the refor-mulated text to the translation engine. Thisreformulation process begins by using the

ontology and AI parser to process the inputtext. At this point, the ontology is referencedto determine if any acronyms, abbreviations,or terms need to be replaced by a synonym,which will always translate correctly. Otherchanges to the Standard Language text are alsoperformed to enhance the structure of thesource text. For instance, articles are added intothe text in front of noun phrases except in cir-cumstances where the noun phrase would nev-er expect an article. The sentence “SECUREBRACKET TO BUMPER” is converted to“SECURE THE BRACKET TO THE BUMPER,”but “DRIVE VEHICLE 60 FEET” is not convert-ed to “DRIVE THE VEHICLE THE 60 FEET.”

In Standard Language, we allow the engi-neers to use ungrammatical structure in somecases, and this needs to be corrected before thesentence can be translated. A process writermay put a size adjective after a part to overridethe existing size of the part as in the followingexample: “OBTAIN BRACKET ASSEMBLY VERY-LARGE.” In this case, the system uses the term“VERY-LARGE” to override the existing size ofthe “BRACKET ASSEMBLY.” The sentence isthen converted to “OBTAIN THE VERY-LARGEBRACKET ASSEMBLY” before it is sent to thetranslation engine. Similar types of text refor-mulation are performed when handling plu-rals, numeric constants, and special caseswhere the Standard Language text cannot betranslated accurately.

As I mentioned previously, we also need totranslate embedded remarks and commentsthat are not in Standard Language and containfree-form text. In this case, we rely on embed-ded XML tags to assist the MT program in thetranslation process (Senellart, Boitet, andRomary 2003). First, we identify the free-formremarks that are embedded in the StandardLanguage text. We then utilize the ontology toanalyze the terminology that is containedwithin the remarks and replace any abbrevia-tions or acronyms with the proper unambigu-ous Standard Language term. The system thenlooks at the length of the embedded remarkand places the appropriate tag around theremark; we have found that very short remarks(one or two) words are generally modifiers,while longer remarks are self-contained phras-es that should be translated as such. In effect,the XML tagging uses the benefits of the natu-ral language processing and ontology from theAI system to assist the MT program in creatinga more accurate translation. We are currentlyworking on expanding the scope of the taggingprocess to incorporate additional information,such as part of speech tagging, to furtherenhance the translation accuracy.

Articles

FALL 2007 37

Page 8: Machine Translation for Manufacturing: A Case Study at Ford Motor

Application Use and PayoffThe machine-translation system has beendeployed at Ford for more than seven years. Theimpact of this system can be summarized as fol-lows. First, we have translated more than 7 mil-lion records from English to Spanish, German,Portuguese, and Dutch. Second, the user com-munity has access to translations of assemblyinstructions in their home language within 24hours of the process sheet being written andcompleted. Third, we have created Ford-specif-ic translation glossaries for each of the languagepairs for which we need to translate our assem-bly instructions. The translation glossaries con-tain a significant number of part descriptionphrases that need to be translated as a singleentity and, consequently, contain up to 6,000entries. Fourth, we have worked with Systran todeploy a web-based process that makes it possi-ble for us to maintain and update the Ford-spe-cific technical glossaries on a timely basis. Fifth,we have built a process that allows the assemblyplant personnel to manually override the trans-lations when necessary. These human transla-tions will remain in the system as along as theunderlying English source text is not modified.Finally, we have developed a process to retrans-late the process sheets when an updated tech-nical glossary is deployed; this ensures that theusers will have the benefit of the latest versionof the translations available.

The easiest way to calculate the benefits ofusing the machine translation is to comparethe costs of human translation versus the costof developing an MT solution that can generatetranslations with the same accuracy. Amachine-translation system, even in a semi-controlled setting, will not generate transla-tions that are as accurate as those completed bya trained human translator. We can developtranslations that are highly accurate (our Eng-lish-German is more than 90 percent correct),but this is directly dependent on the involve-ment of the bilingual technical people with thecreation of technical glossaries. The English-German glossary is much more complete thanEnglish-Portuguese, so our translations aremore accurate into German than into Por-tuguese. However, the huge amount of datathat we need to translate precludes the use ofhuman translators. Our goal in this project wasto develop translations that are understandableto the operators at the assembly plants. Thesetranslations may not be as natural as those pro-vided by human translators, but they will pro-vide the correct information to the users. SinceStandard Language is always evolving, thetechnical glossaries must always be modified tokeep them current. The main payoff for this

project is that we are able to provide “under-standable” translations to our users around theworld in a timely manner without utilizing anydirect human intervention.

Application Developmentand Deployment

The artificial intelligence development for ourapplications here at Ford Manufacturing Engi-neering Systems is based on the Hewlett-Packard UNIX (HP-UX) platform utilizing theLispworks and Knowledgeworks tools fromLispworks Inc.1 We have found that this toolprovides a flexible and powerful developmentenvironment while providing access to ourOracle database through an SQL interface. Wehave worked closely with Systran in seamlesslyintegrating their translation programs into ourtranslation process. The largest amount ofeffort that we spent was to develop the cus-tomized translation glossaries for each of thefour language pairs that we need to translate.This development work required the efforts ofinternal Ford bilingual subject matter experts,the use of retired and external people whounderstand Ford and automotive technical ter-minology, the use of linguistic experts fromSystran, and our own expertise in bringing allof these knowledge sources together.

The actual translation process is shown infigure 5; the entire process is fully automated.Each evening, a batch run scans the databasefor those process sheets that need to be trans-lated based on the assembly plant in which thevehicle is built. At this point, the element texthas been reformulated into a more translation-friendly format by the AI system, and ourtranslation programs selects the records fromthe database that need to be translated. Theappropriate XML tags are added, and therecord is then translated for each target lan-guage. The translated record is then writteninto the database. The translation process uti-lizes three different glossaries: a customizedFord-specific glossary, a generic automotiveglossary, and a general-purpose glossary. The“translation parameters” file contains specificinformation about the translation processingfor each language. For example, English-Ger-man is translated in imperative form whileEnglish-Spanish is translated in infinite form.

The initial application deployment anddevelopment took about six months to accom-plish; this included writing the software thatwould interface with the translation enginesand update the database as needed. These ini-tial translations were of very poor quality andwere not acceptable to the user community. At

Articles

38 AI MAGAZINE

Page 9: Machine Translation for Manufacturing: A Case Study at Ford Motor

that point we started working to improve thetranslation quality by building up the technicalglossaries and building a process to improvethe source text before it is translated. This wasaccomplished by creating utilities to analyzethe source text and identify the terminologythat was causing translation problems.Changes were also implemented to the transla-tion process to allow our users the ability tooverride the automated translations manuallywhen necessary. Another important issue thathad to be addressed was to ensure that thetranslated text could be properly displayed tothe users because of the special characters thatare required and the extra space that is oftenneeded. The accuracy of the translationsincreased as we built up the technical glossariesand improved the text reformulation process.Over the next few years, Systran spent consid-erable time and effort to streamline andimprove their translation programs, and as aresult, we deployed more than 10 versions ofthe technical glossaries. Our translation accu-racy improved noticeably with the English-German and English-Spanish, as it was mucheasier to find people who could work on theseglossaries. The amount of maintenancerequired is also directly proportional to the sizeof the technical glossaries. This system hasbeen in production since 1998, but we are still

spending a considerable amount of effortmaintaining and enhancing the system boththrough advances in technology and with thecreation of more complete technical glossaries.

We have also studied the possibility ofexpanding the machine-translation approachbeyond just manufacturing assembly instruc-tions. There are other types of automotiveinformation, such as technical service bulletinsand warranty claims that need to be translatedin a timely and accurate manner. This type ofsource information is much less controlled andcontains more ambiguity than the assemblybuild instructions. In addition, the terminolo-gy glossaries will need to be refined and updat-ed to improve the quality of these translations.However, we believe that further advance-ments in MT technology, including “part ofspeech tagging,” statistical analysis, and learn-ing techniques will increase the use of machinetranslation for other less-structured problemdomains and applications.

MaintenanceAs previously discussed, we have spent consid-erable time and effort to create a set of cus-tomized technical glossaries that are used dur-ing the translation process. These glossarieswere developed in conjunction with Systran

Articles

FALL 2007 39

Oracle Database

GSPAS Translation Program:English/GermanEnglish/SpanishEnglish/DutchEnglish/Portugese

Source TextTranslation Software

SYSTRAN

Target Text

Ford Customer Dictionary

Subject Glossary

Main SYSTRAN Dictionary

TranslationParameters

Figure 5. Actual Translation Process.

Page 10: Machine Translation for Manufacturing: A Case Study at Ford Motor

and with subject matter experts from FordMotor Company. However, since Standard Lan-guage and Ford terminology are always evolv-ing, it soon became obvious that we needed todevelop a process to modify and add terminol-ogy to our technical glossaries in a timely man-ner.The initial release of our MT system was

designed so that all updates to the technicalglossaries required Systran to create and com-pile a new set of dictionaries that wouldinclude the new changes. Systran would needto test these dictionaries through their internalquality control program and then deliver theupdates to Ford. We would also need to test theupdates against our internal benchmarks anddeploy them into production if the result ofthe testing was acceptable. The entire processwould be delayed if any problems were discov-ered during testing. This approach was toocumbersome and time-consuming and was notviable for the long term.

Systran developed a web-based systemknown as the Systran Review Manager (SRM)(Costa and Panissod 2003) that addressed all ofthese shortcomings. The SRM was deployed ona Ford internal server that allowed us to con-trol and monitor the access to the application.Figure 6 shows a screen print of the SRM thatdisplays how a user would deal with a term thatwas not found in the translation glossary. Fig-ure 7 demonstrates how the user can review asample corpus for translation accuracy. Ouruser community was trained to use this tool,and it gave them the a number of benefits,such as automation of the testing process (auser could make a change to the technical glos-saries and immediately run a translation thatwould test to see how the change would impactthe translation quality). The SRM allows usersto create and modify different versions of user-defined dictionaries without impactingchanges that are being made by a different user.Test corpora can be loaded and analyzed direct-

Articles

40 AI MAGAZINE

Figure 6. SYSTRAN Review Manager.

Page 11: Machine Translation for Manufacturing: A Case Study at Ford Motor

ly within the SRM. The web-based architectureof the SRM allows our users to access the sys-tem without any additional software or hard-ware requirements. The SRM provides veryquick turnaround time for the process of mod-ifying and deploying an updated translationglossary.

Another important facet in dictionary main-tenance involves the analysis and customiza-tion of the source text. We have previouslydescribed some of the techniques we have beenusing to clean up the source text to improvetranslation quality. In this section we will dis-cuss additional capabilities we have added intothe system to improve the translation of thefree-form text. A Standard Language elementmay contain embedded free-form text that isignored by the AI system; however this textmust be translated and sent to the assemblyplants. This free-form text usually consists ofadditional information that may be useful to

the operator on the assembly line. Theseembedded remarks may contain non–StandardLanguage terminology or they may be separatephrases or sentences that describe specific cir-cumstances for this process work. Our analysishas shown that the embedded remarks neededto be treated separately from the Standard Lan-guage text in order to create accurate transla-tions. In many cases, a single embeddedremark that looks innocuous inside a StandardLanguage element would lead to an incorrecttranslation. Therefore, we decided that the bestsolution would be to separate the embeddedremarks from the Standard Language text andtranslate them separately. The following exam-ple shows how this process would take place.

PLACE TWO MOULDINGS INSIDE HEATER{TAPE SIDE UP}

The text inside the curly brackets {TAPE SIDEUP} is not really part of the sentence; it actual-ly describes the position of the “mouldings.”

Articles

FALL 2007 41

Figure 7. Review of a Sample Corpus for Translation Accuracy

Page 12: Machine Translation for Manufacturing: A Case Study at Ford Motor

Therefore, a translation system that processesthis sentence as one entity would not generatean accurate translation. We need to be able totell the system that the clause inside the curlybrackets should be treated independently fromthe rest of the sentence. This problem is solvedby embedding tags into the source text beforeit gets translated. These tags identify commentsand provide the translation program withinformation about how these commentsshould be translated. Short comments areprocessed differently from long commentswithin Standard Language regarding transla-tion parameters (dictionaries and segmenta-tion). The above comment with embedded tagswill look like the following:

PLACE TWO MOULDINGS INSIDE HEATER<comment> TAPE SIDE UP </comment>

Another facet of system maintenanceaddresses the underlying software architecturethat supports our translation system. Transla-tion in GSPAS involves a set of programs thatcommunicate with a database as well as withthe translation engines and technical glos-saries. Most changes to the translation engineprocessing also require changes to the transla-tion preprocessing programs. In addition,modifications to the database model orupgrades to the operating system requireextensive testing and validation of the transla-tion results. The testing needs to identify boththe translation issues in both the Standard Lan-guage and the non–Standard Language com-ponents of the source text. The results of thetranslation tests focus on two types of poten-tial problems: terminology and grammar. Ter-minology errors are almost always fixed just byadding the correct translation for the problemterm into the appropriate translation glossary.The grammar errors are more complex; theymay require changes to the translation engineitself.

Conclusions and Future Work

In this article I discussed some of the issuesrelated to the maintenance of a machine-trans-lation application at Ford Motor Company.This application has been in place since 1998,and we have translated more than 7 millionrecords describing build instructions for vehi-cle assembly at our plants in Europe, Mexico,and South America. The source text for ourtranslation consists of a controlled language,known as Standard Language, but we also needto translate free-form text comments that areembedded within the assembly instructions.The most difficult issue in the development of

this system was the construction of technicalglossaries that describe the manufacturing andengineering terminology in use at Ford. Ourapplication uses a customized version of theSystran translation system coupled with a setof Ford-specific dictionaries that are used dur-ing the translation process. The automotiveindustry is very dynamic, and we need to beable to keep our technical glossaries currentand to develop a process for updating our sys-tem in a timely fashion.

The solution to our maintenance issues wasthe development and deployment of the Sys-tran Review Manager. This web-based toolallows our users the capability to test andupdate the technical glossaries as needed. Thishas reduced our turnaround time for deployingchanges to the dictionaries from two monthsto less than 48 hours. The Systran Review Man-ager runs on an internal Ford server and isavailable for use by our internal customers.

System maintenance is an ongoing issue. Westill require additional capabilities to improveour translation accuracy and to expand our sys-tem to other types of source data, includingpart and tool descriptions. We have alreadyintroduced XML tagging into our free-formcomment translation and are working withSYSTRAN to enhance that capability andimprove translation accuracy. Our current AIsystem in GSPAS already parses Standard Lan-guage into its components, and we plan to passthe information obtained during parsing to thetranslation system to improve the sentenceunderstanding that should lead to higher accu-racy. One of the unique advantages that wehave on this project is the automotive ontol-ogy that we have developed for our manufac-turing processes at Ford. This ontology allowsus to retrieve knowledge and infer contextinformation about the source text that needs tobe translated. Our challenge is to leverage thisbackground knowledge and integrate the con-text information into the translation process.

This project has given us a unique perspec-tive into the culture and business processes ofour fellow Ford employees around the world.We allow the users to override the translationsmanually when they are unacceptable and alsoprovide a feedback mechanism to measure theaccuracy of these translations. We have beensurprised to see that, in many cases, our usersprefer that we utilize an English acronym orterm rather than the correct translated word.We have also discovered that even in a techni-cal domain such as automobile assembly, therestill exists some variation between Spanish inSpain, Mexico, Argentina, and Venezuela. Theproliferation of free web-based translation

Articles

42 AI MAGAZINE

Page 13: Machine Translation for Manufacturing: A Case Study at Ford Motor

engines has proven to be both a blessing and acurse for our project. In some cases, userswould not even consider using MT after tryingout these web services; in other cases users wereperfectly satisfied with the quality of thesetranslations and did not see the need for anycustomization work. Perhaps one of our biggestchallenges is to properly educate and managethe expectations of the user community whenexposing them to this technology.

Our experience with machine-translationtechnology at Ford has been positive; we haveshown that customization of a commercialtranslation system can lead to very positiveresults. It is also essential to put a process inplace that allows for the timely testing andupgrades to the technical glossaries. We areconfident that further enhancements to thetechnology, such as tagging of terminology,will lead to better results in the future andimprove the use and acceptance of machinetranslation in the corporate world.

AcknowledgementsI thank the AI Magazine reviewers for theirinsightful comments; in addition, I would liketo thank Mike Rosen and Rosemarie Janissefrom Ford and Christiane Pannisod, John PaulBarazza, and Jean Senellart from Systran Soft-ware Inc. for their work on this project. I wouldalso like to thank Erica Klampfl and Reba Sitzerfor their assistance in the preparation of thisarticle.

Note1. www.lispworks.com.

ReferencesBrachman, R., and Schmolze, J. 1985. An Overviewof the KL-ONE Knowledge Representation System.Cognitive Science 9(2): 171–216.

Carey, P.; Farrell, J.; Hui, M.; and Sullivan, B. 2001.Heyde’s Modapts: A Language of Work. Brighton, Vic-toria, Australia: Heyde Dynamics Pty. Ltd.

Costa, J.-C., and Panissod, C. 2003. SYSTRAN ReviewManager. In Proceedings of the Ninth Machine Transla-tion Summit, 451–454. Stroudsburg, PA: Associationfor Machine Translation in the Americas.

Gazdar, G., and Mellish, C. 1989. Natural LanguageProcessing in LISP. Reading, MA: Addison-Wesley.

Hutchins, W., and Somers, H. 1992. Introduction toMachine Translation. London: Academic Press.

Iwanska, L., and Shapiro, S., eds. 2000. Natural Lan-guage Processing and Knowledge Representation: Lan-guage for Knowledge and Knowledge for Language. Men-lo Park, CA: AAAI Press.

Manning, D., and Schutze, H. 2000. Foundations ofStatistical Natural Language Processing. Cambridge,MA: The MIT Press.

Rychtyckyj, N. 1999. DLMS: Ten Years of AI for Vehi-

cle Assembly Process Planning. In Proceedings of theSixteenth National Conference on Artificial Intelligenceand the Eleventh Innovative Applications of ArtificialIntelligence Conference, 821–828. Menlo Park, CA:AAAI Press.

Rychtyckyj, N. 2002. An Assessment of MachineTranslation for Vehicle Assembly Process Planning atFord Motor Company. In Machine Translation: FromResearch to Real Users, Proceedings of the Fifth Confer-ence of the Association for Machine Translation in theAmericas, 207–215,. Berlin: Springer-Verlag.

Rychtyckyj, N. 2004. Maintenance Issues forMachine Translation Systems. In Machine Translation:From Real Users to Research: Proceedings of the SixthConference for the Association of Machine Translation inthe Americas, Lecture Notes in Computer Science Vol.3265, 252–261. Berlin: Springer-Verlag,.

Rychtyckyj, N. 2006. Measuring Long-Term Ontol-ogy Quality: A Case Study from the AutomotiveIndustry. In Proceedings of the Nineteenth InternationalFLAIRS Conference (FLAIRS-2006), 147–152. MenloPark, CA: AAAI Press.

Senellart, J., Boitet, C., Romary, L. 2003. SYSTRANNew Generation: The XML Translation Workflow. InProceedings of the Ninth Machine Translation Summit,338–345. Stroudsburg, PA: Association for MachineTranslation in the Americas.

Society of Automotive Engineers. 2002. J2450 Quali-ty Metric for Language Translation. Warrendale, PA:SAE International.

Nestor Rychtyckyj is a technicalexpert in artificial intelligence atFord Motor Company in Dear-born, Michigan, in advanced andmanufacturing engineering sys-tems. He received his Ph.D. incomputer science from WayneState University in Detroit, Michi-gan. His research focuses on the

application of knowledge-based systems for vehicleassembly process planning, ergonomics, and adap-tive in-vehicle systems. Currently his responsibilitiesinclude the development of automotive ontologies,intelligent manufacturing systems, controlled lan-guages, machine translation, and corporate termi-nology management. He is a member of AAAI, ACM,and the IEEE Computer Society. His email address [email protected].

Articles

FALL 2007 43

The IAAI-07 Paper Deadline Is January 22, 2007

Details:www.aaai.org/Conferences/IAAI/

Page 14: Machine Translation for Manufacturing: A Case Study at Ford Motor

Advertisements

44 AI MAGAZINE

The Tenth International Symposium on

General ChairMartin Charles GolumbicConference ChairFrederick HoffmanProgram Co-ChairsBerthe Y. Choueiry and Bob GivanPublicity ChairMehran Sahami

Submission deadlineOctober 21, 2007

David McAllesterFrancesca RossiNaftali Tishby

Special Sessions

Michael Kaminski and Mirek Truszczynski organizers

Toby Walsh organizer

The classic book on data mining

Available from AAAI Press / The MIT Press

http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=8132