
Post on 26-Oct-2014

Creating a termbase. Terminology Management Report

Juan Yborra Golpe


This report describes how we built a termbase for a Terminology Management module at Swansea University.

The first step we took was to read the book Terminology: Theory, methods and applications, by María Teresa Cabré. The book was extremely helpful, as it explains step by step how to build a professional termbase. We also made use of the PowerPoint presentations we were given in the module.

We were asked to develop two monolingual termbases (one in English and one in Spanish), along with a correspondence record for the terms.

The main problem we faced was the departure of one member of the group, which meant that only two people were doing all the work. Luckily, we were told that, given these special circumstances, we did not need to include as many terms as the other groups in our termbase.

Our first task was to choose the domain we were going to work on. After some discussion and several proposals, we eventually chose the vestibular system as our research domain.

Division of work

Once we reached that point, we decided to carry out a small experiment. My teammate would search for the English terms on her own, while I would do the same with the Spanish ones.

A couple of days later, we met again to compare our results and were surprised to find that almost 50% of the terms we had extracted were the same, although obviously in different languages.

After that, we decided to divide the work and do it during the holidays, when we had more spare time.

Creating the termbase

Documentation

First of all, we concluded that good documentation work, that is, choosing reliable resources to build the text corpus, was crucial.

To achieve this, we agreed that the best way to find reliable medical journals was through official institutions. As I was a student at the University of Granada for a few years, I know its electronic library quite well and I know that it gives access to many prestigious journals.

Thus, we used a VPN (Virtual Private Network) connection to access the University's electronic library. Once there, we began by searching for English articles and journals.

Then, we used the PubMed database to access MEDLINE (Medical Literature Analysis and Retrieval System Online), one of the best bibliographic databases for life sciences and biomedical information. Through it, we found four or five extremely useful articles from prestigious journals for our work.

Unfortunately, finding Spanish articles proved notably harder, as far fewer articles about the vestibular system are available in Spanish than in English. We also had to change databases, because the previous ones covered English documents only.

Luckily, we found a very useful journal: Revista de otorrinolaringología y cirugía de cabeza y cuello, hosted on SciELO (Scientific Electronic Library Online), a reliable bibliographic database supported by the State of São Paulo (Brazil) that contains hundreds of scientific journals in English, Spanish and Portuguese.

Thanks to this exhaustive research, we gathered a good number (approx. 4-5) of texts in each language and, more importantly, we managed to build a highly reliable text corpus for our termbase.

Software choices

Before extracting terms from the text corpus, we decided which software we were going to use. I had previous experience with SDL Multiterm at the University of Granada, so we chose it for building the termbase. For the extraction process, we agreed to use SDL Multiterm Extract: as both tools are made by the same company, we expected no conversion or integration problems when exporting the extracted terms from the text corpus to the termbase creator (SDL Multiterm).

Finally, we decided to use Microsoft Excel to create the correspondence records between terms, because we were both familiar with it.

Extraction process

As mentioned above, we used SDL Multiterm Extract, which was already installed in Lab. C of the University, to extract all the terms needed for our termbases. Before the extraction could take place, we had to convert the documents from PDF to a readable format, because Multiterm Extract does not support PDF files (we decided to convert everything to RTF).

After that, a little editing was needed, because Microsoft Word had placed a paragraph break at the end of every line. To solve this, we followed the instructions provided in one of the documents on Blackboard and used Search/Replace to replace the paragraph breaks with spaces.
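The same clean-up can be scripted once the text is available as plain text. The following is only a minimal sketch, not the Blackboard procedure itself: the function name `unwrap_lines` is hypothetical, and it assumes that a blank line marks a real paragraph boundary while a single line break is a leftover from the conversion.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line separates real paragraphs; any other
    line break is an artefact of the PDF conversion.
    """
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on blank lines to find the real paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside each paragraph, collapse every run of whitespace
    # (including the unwanted line breaks) into one space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


sample = (
    "The vestibular system\nhelps maintain balance.\n\n"
    "It lies in the\ninner ear."
)
print(unwrap_lines(sample))
```

If the converted file has no blank lines at all, this collapses the whole text into a single paragraph, which matches what a global Search/Replace would do.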

Once all these problems had been solved, we ran the extraction, but we did not get as many results as expected, so we slightly adjusted the search parameters to allow a little more noise in the extraction process. This time we obtained more candidate terms, and we carefully chose those that had an unambiguous equivalent in both languages, to avoid problems when creating the correspondence record later. We also chose to include synonyms, initialisms and terminological phrases to make the termbase more interesting and challenging.
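The trade-off between noise and missed terms can be illustrated with a toy extractor. This is not how SDL Multiterm Extract works internally, which the report does not describe; it is a swapped-in, naive frequency count over two-word sequences, and the stopword list, the `min_freq` parameter and the sample corpus are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real resource).
STOPWORDS = {"the", "of", "and", "in", "is", "a", "to", "by"}


def candidate_terms(text: str, min_freq: int) -> list[str]:
    """Return two-word candidates that occur at least min_freq times."""
    words = re.findall(r"[a-z]+", text.lower())
    bigrams = Counter(
        f"{first} {second}"
        for first, second in zip(words, words[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    )
    return sorted(term for term, count in bigrams.items() if count >= min_freq)


corpus = (
    "The semicircular canals detect rotation. The semicircular canals "
    "open into the utricle. Hair cells in the utricle detect tilt."
)
strict = candidate_terms(corpus, min_freq=2)  # fewer candidates, less noise
loose = candidate_terms(corpus, min_freq=1)   # more candidates, more noise
print(strict)
print(len(loose), "candidates with the looser setting")
```

Lowering the threshold returns more candidates, including non-terms such as word pairs that straddle a sentence boundary, which is why the looser run still needs manual validation.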

As for the examples of each term in actual use that we had to cite in the termbase, we did not always take the first entries offered by SDL Multiterm. We preferred those that also contained other terms from the termbase, so as to create a strong network among all the terms by means of hyperlinks and cross-references.

Finally, we exported the chosen terms to a Word file, selecting the tab-delimited format, and removed the non-validated candidate terms.
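Filtering a tab-delimited export can also be done programmatically. The sketch below assumes a layout with a header row and two columns named `term` and `validated`; these column names are hypothetical and do not reflect the actual export format of SDL Multiterm Extract.

```python
import csv
import io


def validated_terms(tsv_text: str) -> list[str]:
    """Keep only the candidates marked as validated.

    Assumes hypothetical columns "term" and "validated" (yes/no).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["term"]
        for row in reader
        if row["validated"].strip().lower() == "yes"
    ]


# Illustrative export content, not real output from the tool.
export = "term\tvalidated\nutricle\tyes\nfigure\tno\nsaccule\tyes\n"
print(validated_terms(export))
```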

We made this the last step before the holidays because, although the demo version of Multiterm gave us no trouble at all when creating the termbase, the demo version of Multiterm Extract did not allow us to export the extracted terms, so we had to do the export using the campus facilities.

Recording the terms into a termbase

After all this preparatory work, we proceeded to record the extracted terms in the two monolingual termbases in SDL Multiterm. Our main aim was to create a compact, closely linked termbase, in which every entry would have at least one cross-reference to another term.

We were asked to build a termbase that specified the following information:

Citation form: Almost all entries were nouns and we had little trouble here, because all the terms we chose appeared in the texts in a form that follows lexicographic conventions (e.g. nouns in the singular, adjectives in the masculine singular form in languages with gender marking on adjectives, etc.).

Definition: We needed to provide a proper terminological definition. To do so, we used online medical dictionaries and the information we found in the text corpus. One of our concerns was to avoid circular definitions and entries, as we read in Cabré's book:

“Definitions should not be circular:

dense: having relatively high density

density: the quality or condition of being dense”

To avoid this, we made sure that definitions used familiar words and that any more specialised word they contained was itself a term defined in the same termbase, for example:

utricle: The part of the vestibule of the ear into which the semicircular canals open.

vestibule: The parts of the membranous labyrinth comprising the utricle and the saccule and

contained in the cavity of the bony labyrinth.

Notes on the definition: We had to give the sources of each definition and comment on their degree of authority. As noted above, we had been extremely careful during the documentation step.

Examples: Of the term in actual use. We provided examples and, where possible, chose ones in which other terms of the termbase appeared.

Semantic relationships: Most of them were holonymy/meronymy relationships, except for one case in which we found a hyperonymy/hyponymy one. It was impossible for us to relate all the words in the termbase semantically, although we managed to do so for most of them. In any case, we managed to connect all the terms through their definitions, each of which contained at least one other term from the termbase, thus developing an interesting network.

Synonyms: We included synonyms and initialisms in our termbase, indicating which ones were predominant and why.
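The requirement that every entry be connected to another through its definition can be checked automatically. The following is a minimal sketch under stated assumptions: the termbase is represented as a plain dictionary rather than an SDL Multiterm file, the function name `definition_links` is hypothetical, the two real definitions are the ones quoted above, and the `saccule` entry is a placeholder added only to show how an unlinked entry is reported.

```python
import re


def definition_links(termbase: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the other termbase terms named in its definition."""
    links = {}
    for term, definition in termbase.items():
        links[term] = {
            other
            for other in termbase
            if other != term
            # Whole-word match, tolerating a simple plural "s".
            and re.search(rf"\b{re.escape(other)}s?\b", definition, re.IGNORECASE)
        }
    return links


termbase = {
    "utricle": "The part of the vestibule of the ear into which the "
               "semicircular canals open.",
    "vestibule": "The parts of the membranous labyrinth comprising the "
                 "utricle and the saccule and contained in the cavity of "
                 "the bony labyrinth.",
    # Hypothetical placeholder entry with no real definition text.
    "saccule": "(definition not reproduced here)",
}

links = definition_links(termbase)
isolated = sorted(term for term, found in links.items() if not found)
print(links)
print("Entries without a link:", isolated)
```

An entry listed as isolated would need either a revised definition or an explicit cross-reference to stay part of the network.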


Regarding the software, everything worked quite well apart from, truth be told, some minor problems with SDL Multiterm. We had some trouble creating the cross-references and hyperlinks until we realised that we had to do it in editing mode, but without opening the field containing the word we wanted to turn into a hyperlink. Beyond that, one of the worst things about SDL Multiterm is that it is almost impossible to edit the parameters and field labels after a new termbase has been created. Apart from that editing problem, SDL Multiterm is, in our opinion, an extremely efficient and useful tool for a terminologist.

Correspondence Records

We used Microsoft Excel to create the Correspondence Record. We used three columns: the first with the English term, the second with the Spanish term and the last with notes on the differences in correspondence between the terms. As we had previously chosen similar terms in both languages, we did not find many problems, just some details that we mentioned in the “notes” column.
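The three-column layout can also be produced as a CSV file that Excel opens directly. This is only a sketch of the structure described above: the column headings, the function name `write_correspondence` and the note text are assumptions, and the two rows are illustrative rather than copied from our actual record.

```python
import csv
import io

# Assumed column headings for the three-column layout.
HEADER = ["English term", "Spanish term", "Notes"]


def write_correspondence(rows: list[tuple[str, str, str]]) -> str:
    """Serialise (English, Spanish, notes) rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ("utricle", "utrículo", ""),
    ("vestibule", "vestíbulo", "illustrative note (hypothetical)"),
]
print(write_correspondence(rows))
```

When saving to disk for Excel, opening the file with `encoding="utf-8-sig"` is a common way to keep the accented Spanish characters intact.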

Conclusion

Although it was a long and hard job, we tried to do it as well as we could by following a meticulous process. We started with an in-depth search for reliable sources during the documentation phase, and followed it with a careful choice of terms during the extraction. The last challenge came when trying to link all the terms in the termbase and thus create a good network of terms from the same domain. Most importantly, I learned a lot through the process and I am sure it will be really helpful for me in the future.