Creating a Termbase: Terminology Management Report
Juan Yborra Golpe
This is a report on how we built a termbase for a Terminology Management module at
Swansea University.
The first step we took was to read the book Terminology: Theory, Methods and
Applications, by María Teresa Cabré. The book was extremely helpful, as it explains how to
build a professional termbase step by step. We also made use of the
PowerPoint presentations we were given in the module.
We were asked to develop two monolingual termbases (one in English and one
in Spanish), along with a correspondence record for the terms.
The main problem we had to face was the departure of one of the members of the group,
which meant that only two people did all the work. Luckily, we were told that, given
these special circumstances, we did not need to include as many terms as the other groups
in the termbase.
Our first task was to choose the domain we were going to work on. After some discussion
and several proposals, we eventually chose the vestibular system as our research domain.
Division of work
Once we reached that point, we decided to carry out a small experiment. My teammate would
search for the English terms on her own, while I would do the same for the
Spanish ones.
A couple of days later, we met again to compare our results and were surprised to find that almost
50% of the terms we had extracted were the same, although obviously in different
languages.
After that, we decided to divide the work and do it during the holidays, when we had more
spare time.
Creating the termbase
Documentation
First of all, we concluded that careful documentation, that is, choosing
reliable sources to build the textual corpus, was crucial.
To achieve this, we agreed that the best way to find reliable medical journals was
through official institutions. As I was a student at the University of Granada for a few
years, I know its electronic library quite well and I know that many prestigious journals
can be accessed through it.
We therefore used a VPN (Virtual Private Network) connection to enter the electronic library of
the University. Once there, we began by searching for English articles and journals.
We then used the PubMed database to access MEDLINE (Medical Literature Analysis and
Retrieval System Online), one of the best bibliographic databases for life sciences and biomedical
information. Thanks to it, we found four or five extremely useful articles from prestigious journals for
our work.
Unfortunately, finding Spanish articles was notably harder, as the number of
articles about the vestibular system in Spanish is low, or at least much lower than in English. We
also had to change databases, because the previous ones covered English documents only.
Luckily, we found a very useful journal: Revista de otorrinolaringología y cirugía de cabeza y
cuello, hosted on SciELO (Scientific Electronic Library Online), a reliable bibliographic database
supported by the State of São Paulo (Brazil) that contained hundreds of scientific journals in
English, Spanish and Portuguese.
Thanks to this exhaustive research, we gathered a good number (approximately four or five) of texts in each of
English and Spanish and, more importantly, we managed to build a highly reliable
textual corpus for our termbase.
Software choices
Before extracting terms from the textual corpus, we decided which
software we were going to use. I had previous experience with SDL Multiterm at the University of Granada,
so we chose it for building the termbase. As for the extraction
process, we agreed to use SDL Multiterm Extract because, as both tools are made by the same
company, we would not need to worry about conversion or integration problems
when exporting the extracted terms from the text corpus to the termbase creator (SDL
Multiterm).
Finally, we decided to use Microsoft Excel to create the correspondence records between
terms, because we were both familiar with it.
Extraction process
As mentioned above, we used SDL Multiterm Extract, which was already installed in Lab. C of
the University, to extract all the terms needed for our termbases. Before the extraction
could take place, we had to convert the documents from PDF to a supported format, because
Multiterm Extract does not support PDF files (we decided to convert everything to RTF).
After that, a little editing was needed, because Microsoft Word had placed a paragraph
break at the end of every line. To solve this, we simply followed the instructions
provided in one of the documents on Blackboard and used Search/Replace to replace the paragraph breaks with
spaces.
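The clean-up described above was done with Word's Search/Replace, but the same idea can be sketched in a few lines of Python for plain-text exports. This is only an illustration under stated assumptions: the function name `unwrap_paragraphs` is hypothetical, and the sketch assumes that a blank line marks a genuine paragraph boundary while a single line break is an unwanted one.

```python
import re

def unwrap_paragraphs(text):
    """Join hard-wrapped lines; blank lines are kept as paragraph boundaries."""
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: one or more blank lines separate real paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    # Within a paragraph, collapse every run of whitespace (line breaks included) to one space.
    joined = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in joined if p)

sample = (
    "The vestibular system\nis located in the\ninner ear.\n\n"
    "It helps control\nbalance."
)
print(unwrap_paragraphs(sample))
```

Words hyphenated across a line break are not rejoined by this sketch and would still need to be checked by hand.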
Once all the problems had been solved, we ran the extraction, but we did not get as many
results as expected, so we slightly adjusted the search parameters, allowing a little
more noise in the extraction process. This time we obtained more candidate terms, so we carefully
chose those that had an unambiguous equivalent in both languages, to avoid
problems when creating the correspondence record later. We also chose to include synonyms,
initialisms and terminological phrases in order to make the termbase more interesting and
challenging.
As for the examples of each term in actual use that we had to cite in the termbase, we
did not simply take the first entries offered by SDL Multiterm. We tried to use those that
included other terms from the termbase, so as to create a strong network among all the terms by
means of hyperlinks and cross-references.
Finally, we exported the chosen terms to a Word file, selecting the tab-delimited format, and removed the
non-validated candidate terms.
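Removing non-validated candidates from a tab-delimited export can also be done programmatically. The following is a minimal sketch only: the column names `term` and `status`, the status value `validated` and the function name `keep_validated` are all hypothetical, since the report does not describe the actual layout of the exported file.

```python
import csv
import io

def keep_validated(tsv_text):
    """Return the terms whose (hypothetical) status column says 'validated'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["term"]
        for row in reader
        if row["status"].strip().lower() == "validated"
    ]

# Illustrative export; the header and status labels are assumptions.
export = (
    "term\tstatus\n"
    "utricle\tvalidated\n"
    "figure\tcandidate\n"
    "saccule\tvalidated\n"
)
print(keep_validated(export))
```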
This was the last step we completed before the holidays. Although we had no
trouble whatsoever using the demo version of Multiterm to create the termbase,
the demo version of Multiterm Extract did not allow us to export the terms once
extracted, so we had to do the export using the campus facilities.
Recording the terms into a termbase
After all this preparatory work, we proceeded to record the extracted terms in the two
monolingual termbases in SDL Multiterm. Our main aim was to create an extremely compact
and closely linked termbase, where every entry would have at least one
cross-reference to another term.
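The aim that every entry should link to at least one other entry can be checked mechanically. The sketch below is an assumption-laden illustration, not the procedure we followed in SDL Multiterm: it treats a mention of another term inside a definition as a cross-reference, and the function names are hypothetical. The first two definitions are taken from this report; the third is a placeholder added only to show what an unlinked entry looks like.

```python
import re

def cross_references(definitions):
    """Map each term to the other termbase terms mentioned in its definition."""
    links = {}
    for term, definition in definitions.items():
        found = set()
        for other in definitions:
            if other == term:
                continue
            # Whole-word, case-insensitive match; the optional "s" covers simple plurals.
            pattern = r"\b" + re.escape(other) + r"s?\b"
            if re.search(pattern, definition, flags=re.IGNORECASE):
                found.add(other)
        links[term] = found
    return links

def unlinked_terms(definitions):
    """List the terms whose definitions mention no other term."""
    links = cross_references(definitions)
    return sorted(term for term, refs in links.items() if not refs)

definitions = {
    "utricle": "The part of the vestibule of the ear into which the "
               "semicircular canals open.",
    "vestibule": "The parts of the membranous labyrinth comprising the utricle "
                 "and the saccule and contained in the cavity of the bony labyrinth.",
    # Placeholder text, not a real definition: it deliberately has no cross-reference.
    "saccule": "placeholder definition",
}
print(unlinked_terms(definitions))
```

Any term reported by `unlinked_terms` would need a definition or example that mentions another entry before the network is complete.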
We were asked to build up a termbase that specified the following information:
Citation form: Almost all entries were nouns and we had little trouble here, because all the
terms we chose appeared in the texts in a proper form according to the conventions of
lexicography (e.g. nouns in the singular, adjectives in the singular masculine form in languages
with gender marking on adjectives, etc.)
Definition: We needed to provide a proper terminological definition. To achieve that,
we used online medical dictionaries and the information we found in the text corpus.
One of our concerns was to avoid circular definitions and entries, as we read in Cabré’s
book:
“Definitions should not be circular:
dense: having relatively high density
density: the quality or condition of being dense”
To avoid this, we made sure that definitions used commonly known words and that, whenever a more specific
word was needed, that word was itself a term defined in the same termbase, for example:
utricle: The part of the vestibule of the ear into which the semicircular canals open.
vestibule: The parts of the membranous labyrinth comprising the utricle and the saccule and
contained in the cavity of the bony labyrinth.
Notes on the definition: We had to give the sources of each definition and comment on their degree
of authority; as mentioned earlier, we had been extremely careful when choosing sources during the documentation
step.
Examples: These show the term in actual use. We provided examples and, where possible, chose
ones in which other terms of the termbase appeared.
Semantic relationships: Most of them were holonymy/meronymy relationships, except for one
case in which we found a hyperonymy/hyponymy one. It was impossible for us to
relate all the words in the termbase semantically, although we managed to do so for most of
them. In any case, we managed to connect all the terms through their definitions, each of which contained at least one
of the other terms in the termbase, thus developing an interesting network.
Synonyms: We included synonyms and initialisms in our termbase, indicating which ones are
predominant and why.
Regarding the software, everything worked quite well apart from, admittedly, some minor
problems with SDL Multiterm. We had some trouble creating the cross-references and
hyperlinks until we realised that they had to be created in
editing mode, but without opening the field containing the word we were going to use for the
hyperlink. One of the worst things about SDL Multiterm is that it is almost
impossible to edit the parameters and field labels after you have created a new termbase.
Apart from that editing problem, SDL Multiterm is, in our opinion, extremely efficient and useful for a
terminologist.
Correspondence Records
We used Microsoft Excel to create the Correspondence Record. We used three columns: the
first with the English term, the second with the Spanish term and the last with
notes on the differences in correspondence between the terms. As we had previously chosen
similar terms in both languages, we did not encounter many problems, just a few details that we
recorded in the “notes” column.
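The three-column layout described above is simple enough to reproduce outside Excel. The sketch below writes such a record as CSV text; it is only an illustration, the function name `build_record` is hypothetical, and the term pairs and the note are illustrative placeholders rather than rows from our actual record.

```python
import csv
import io

COLUMNS = ["English term", "Spanish term", "Notes"]

def build_record(rows):
    """Return the correspondence record as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for english, spanish, notes in rows:
        writer.writerow([english, spanish, notes])
    return buffer.getvalue()

# Illustrative rows only; the note is a placeholder.
rows = [
    ("utricle", "utrículo", ""),
    ("vestibule", "vestíbulo", "placeholder note on correspondence"),
]
print(build_record(rows))
```

A file written this way can be opened directly in Excel, so the notes column can still be edited by hand.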
Conclusion
Although it was a long and hard job, we tried to do it as well as we could by following a
meticulous process. We started with an in-depth search for reliable sources during the
documentation phase, and then made a careful choice of terms
during the extraction. The last challenge came when trying to link all the terms in the termbase
and thus create a good network of terms from the same domain. More importantly, I
learned a lot through the process and I am sure it will be really helpful for me in the future.