Non-technical computer thesaurus versus specialized computer thesaurus

Olena Siruk
Laboratory for Computational Linguistics, Institute of Philology
National Taras Schevchenko University of Kyiv, Ukraine
olebosi@gmail.com



DESCRIPTION

This presentation offers a comparative analysis of the Computer Thesaurus of Ukrainian Verbs and the Specialized Thesaurus of Computer Ideography. The two dictionaries are representative examples of a general-language (non-technical) computer thesaurus and a specialized computer thesaurus, respectively. We focus on the entries of each thesaurus, its macrostructure and microstructure, and its compilation and use.

TRANSCRIPT

Page 1

NON-TECHNICAL COMPUTER THESAURUS VERSUS SPECIALIZED COMPUTER THESAURUS

Olena Siruk
Laboratory for Computational Linguistics, Institute of Philology
National Taras Schevchenko University of Kyiv, Ukraine
olebosi@gmail.com

Page 2

Metalanguage and encoding scheme design for digital lexicography, 15-16.04.2009, Bratislava, Slovakia

1. Relevance of the research

Compilation of general and specialised (terminological) thesauri

• Development of Ukrainian lexicography
• Users' requirements for integrated information
• Development of computer technologies

• Development of formalised principles of thesaurus modelling
• Systematisation of terms
• Standardisation of definitions

Page 3

Non-technical (Computer Thesaurus of Ukrainian Verbs): tested on the semantic field of speech

The thesaurus groups terms according to the conceptual principle

Specialized (Specialized Thesaurus of Computer Ideography)

Page 4

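2. CT units (CT of CI versus CT of UV, i.e. the Specialized Thesaurus of Computer Ideography versus the Computer Thesaurus of Ukrainian Verbs)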

Quantity – 75 terms (the set is considered complete) / about 2,000 units in the semantic field of speech

Type – nouns, noun-noun and noun-adjective compounds / verbs

Unit length – from 1 to 4 words per term / one LSV (lexico-semantic variant)

Content – from highly specialised terms to terms related to other linguistic disciplines / verbs of the semantic field of speech

Page 5

3. CT of Nouns versus CT of Verbs

It is precisely the noun that dominates ideographical dictionaries of different languages.

The semantic scheme for nouns is based on objective extralinguistic reality.

Verbs are included in thesauri of all types considerably less often than nouns, and especially rarely in terminological thesauri.

Significative (sense-related) rather than denotative semantics prevails in the meaning of a verb.

Page 6

4. CT of Nouns

Consequently, for nouns:

1) an external, denotative selection of concepts is characteristic;

2) a deductive approach to structuring the material is mostly applied;

3) the word-formation and valency potential of a noun are of little importance for creating the synoptic scheme;

4) whole–part relations are essential, and taxonomy prevails.


Page 7

5. CT of Verbs

Accordingly, for verbs:

1) an internal, significative concept selection strategy based on the analysis of meaning is more appropriate;

2) an inductive approach to ordering lexemes is more adequate;

3) relations based on word-formation type (derivational hyponymy) and valency potential (a basis for connections between parts of speech) are essential;

4) taxonomy and whole–part relations are irrelevant.

Page 8

6. CT macrostructure

Synoptic scheme represented as a term index

Maximum depth – 6 levels of hierarchy

Page 9

Both types of CT share a number of common features:

1) both dictionaries represent the relations between units more or less completely;

2) each dictionary either has an explicit synoptic scheme, i.e. a division of the universe into thematic classes, or contains such a scheme implicitly;

3) the rubric (a class of synonymous words in non-technical thesauri, a descriptor article in specialized thesauri) serves as an interpretation, or as context, in both dictionaries;

4) both dictionaries contain cross-references between entries.

The specific lexical semantics of verbs accounts for the difference in external structure (macrostructure) between an ideographical dictionary of nouns and an analogous dictionary of verbs. Verbs have been categorized primarily on a semantic basis, using component analysis and the method of stepwise identification of verbal meanings.
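The method of stepwise identification is commonly described as tracing a verb's dictionary definition to its identifier (the generic verb of that definition), then tracing the identifier in turn, until a verb is reached that is not defined through another verb. The Python sketch below follows that reading. The toy dictionary `IDENTIFIER`, its English glosses and the helpers `identification_chain` and `group_by_dominant` are hypothetical illustrations, not data or software from the CT.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: each verb (English gloss) maps to its identifier,
# i.e. the generic verb used in its definition; None marks a verb that is
# not defined through another verb in this toy dictionary.
IDENTIFIER: Dict[str, Optional[str]] = {
    "whisper": "say",
    "mumble": "say",
    "dictate": "say",
    "say": "speak",
    "speak": None,
    "notify": "inform",
    "report": "inform",
    "inform": None,
}


def identification_chain(verb: str, identifier: Dict[str, Optional[str]]) -> List[str]:
    """Follow identifiers step by step from `verb` until no further identifier exists."""
    chain = [verb]
    while (nxt := identifier.get(chain[-1])) is not None:
        if nxt in chain:  # guard against circular definitions
            raise ValueError(f"circular definition involving {nxt!r}")
        chain.append(nxt)
    return chain


def group_by_dominant(identifier: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group verbs by the last verb of their identification chain."""
    groups: Dict[str, List[str]] = {}
    for verb in identifier:
        dominant = identification_chain(verb, identifier)[-1]
        groups.setdefault(dominant, []).append(verb)
    return groups


print(identification_chain("whisper", IDENTIFIER))  # ['whisper', 'say', 'speak']
print(group_by_dominant(IDENTIFIER)["inform"])      # ['notify', 'report', 'inform']
```

Grouping lexemes by the verb their chains end in works bottom-up, from individual lexemes towards group dominants, which is one way to picture the inductive ordering described for verbs. A real dictionary would key the mapping by LSV rather than by word form, since each meaning of a polysemous verb can have its own identifier.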

Page 10

Computer lexicography (CL) – level 0

Computer ideography (CI) – level 1

Relations between CT units – level 2
Computer thesaurus (CT) – level 2
CT units – level 2
CT compilation – level 2

CT database – level 3
Linguistic processor – level 3

Linguistic algorithm – level 4
Algorithm flowchart – level 5

CT macrostructure – level 3
CT compilation methodology – level 3

Deductive method – level 4
Inductive method – level 4
Component analysis method – level 4
Stepwise identification method – level 4

CT microstructure – level 3

Synoptic scheme of the CT
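The macrostructure slide describes the synoptic scheme as a term index with a maximum depth of six levels. The Python sketch below assumes the simplest form of such an index: a mapping from each term to its level, filled with English glosses of the terms in the scheme. The helper names are hypothetical. Because the slide does not state parent-child links explicitly, the sketch stores levels only and does not try to rebuild a tree.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (term, level) pairs glossed from the synoptic scheme of the CT,
# in slide order; this is a flat term index, not a parent-child tree.
SCHEME: List[Tuple[str, int]] = [
    ("computer lexicography", 0),
    ("computer ideography", 1),
    ("relations between CT units", 2),
    ("computer thesaurus", 2),
    ("CT units", 2),
    ("CT compilation", 2),
    ("CT database", 3),
    ("linguistic processor", 3),
    ("linguistic algorithm", 4),
    ("algorithm flowchart", 5),
    ("CT macrostructure", 3),
    ("CT compilation methodology", 3),
    ("deductive method", 4),
    ("inductive method", 4),
    ("component analysis method", 4),
    ("stepwise identification method", 4),
    ("CT microstructure", 3),
]


def build_term_index(scheme: List[Tuple[str, int]]) -> Dict[str, int]:
    """Map each term to its hierarchy level, rejecting duplicate terms."""
    index: Dict[str, int] = {}
    for term, level in scheme:
        if term in index:
            raise ValueError(f"duplicate term: {term!r}")
        index[term] = level
    return index


def terms_by_level(index: Dict[str, int]) -> Dict[int, List[str]]:
    """Invert the index: level -> terms at that level, levels in ascending order."""
    by_level: Dict[int, List[str]] = defaultdict(list)
    for term, level in index.items():
        by_level[level].append(term)
    return dict(sorted(by_level.items()))


def depth(index: Dict[str, int]) -> int:
    """Count hierarchy levels, requiring levels numbered from 0 without gaps."""
    levels = sorted(set(index.values()))
    if levels != list(range(len(levels))):
        raise ValueError(f"levels are not contiguous from 0: {levels}")
    return len(levels)


index = build_term_index(SCHEME)
print(depth(index))              # 6
print(terms_by_level(index)[5])  # ['algorithm flowchart']
```

Levels 0 to 5 give six levels, which matches the stated maximum depth.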

Page 11

Page 12

CT fragment (online version)

Page 13

Activity – level 0
Speech activity – level 1

Expressing a thought / feeling (висловлювати, 'to express') – level 2

Exchange of thoughts (розмовляти, 'to converse') – level 2
Features of pronunciation (вимовляти, 'to pronounce') – level 2

* a lot, meaninglessly, about unimportant things – level 3
* in a bass voice – level 3
* inserting one's own words into someone else's speech – level 3
* rudely – level 3
* for another person to write down – level 3
* at length, carried away by the conversation – level 3
* wittily – level 3
* very loudly – level 3

* * with a negative consequence – level 4
* * with a positive consequence – level 4
* * once – level 4
* * time after time – level 4
* * constantly – level 4
[…]

* clearly – level 3
[…]
Communicating information (повідомляти, 'to inform') – level 2

[…]
Ability, capacity, skill – level 1
[…]

Synoptic scheme of speech verbs in the CT

Page 14

Page 15

7. CT microstructure (CT of CI versus CT of UV)

Title – term / verb

Definition – genus-species (for a term) or close to encyclopaedic (for a concept) / interpretation

Relations – genus-species and synonymic / the same, plus manner-of-action relations and relations between the verb and other parts of speech

Page 16

Page 17

8. Semantic relations in CT

Hierarchical interverbal relations: hyponymy, in particular derivational hyponymy, represented by hyperonyms, hyponyms and verbs of manner of action (VMA).

Same-level interverbal relations: synonymy, represented by complete (absolute) synonyms, in particular phonetic variants of verbs, and by stylistic and derivational synonyms; and antonymy, represented by antonyms.

Relations between the verb and other parts of speech, based on verbal derivation across parts of speech and on the valency potential of the verb.
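The three groups of relations can be modelled as fields of an entry record, and the cross-references between entries can then be checked mechanically. The Python sketch below is an illustration under stated assumptions: the `VerbEntry` fields, the English toy verbs and the `inconsistencies` check are hypothetical, not the CT's actual entry format. It relies only on general properties of the relations: hyponymy is the converse of hyperonymy, while synonymy and antonymy are symmetric, so a linked entry should point back.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class VerbEntry:
    """Hypothetical CT entry; field names are illustrative, not the CT's own schema."""
    headword: str
    hyperonyms: List[str] = field(default_factory=list)  # hierarchical: more general verbs
    hyponyms: List[str] = field(default_factory=list)    # hierarchical: more specific verbs
    vma: List[str] = field(default_factory=list)         # verbs of manner of action
    synonyms: List[str] = field(default_factory=list)    # same-level relation
    antonyms: List[str] = field(default_factory=list)    # same-level relation
    other_pos: List[str] = field(default_factory=list)   # related words of other parts of speech


# Each checked relation and the relation that must point back from the target.
CONVERSE = {"hyponyms": "hyperonyms", "hyperonyms": "hyponyms",
            "synonyms": "synonyms", "antonyms": "antonyms"}


def inconsistencies(entries: Dict[str, VerbEntry]) -> List[Tuple[str, str, str]]:
    """List (headword, relation, target) triples whose converse link is missing.
    Targets that have no entry of their own are skipped."""
    problems = []
    for word, entry in entries.items():
        for relation, converse in CONVERSE.items():
            for target in getattr(entry, relation):
                other = entries.get(target)
                if other is not None and word not in getattr(other, converse):
                    problems.append((word, relation, target))
    return problems


entries = {
    "speak": VerbEntry("speak", hyponyms=["whisper", "shout"], synonyms=["talk"],
                       other_pos=["speech", "speaker"]),
    "whisper": VerbEntry("whisper", hyperonyms=["speak"]),
    "shout": VerbEntry("shout", hyperonyms=["speak"]),
    "talk": VerbEntry("talk"),  # synonym link back to "speak" deliberately missing
}
print(inconsistencies(entries))  # [('speak', 'synonyms', 'talk')]
```

Adding "speak" to the synonyms of "talk" makes the report empty.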

Page 18

Example of a CT entry

Page 19

Example of a CT entry

Page 20

9. Application of CT

• As an inquiry (reference) system
• For teaching purposes

Page 21

10. Audience

The Specialized Thesaurus of Computer Ideography is intended for:

• specialists in philology;
• students of philology.

The Computer Thesaurus of Ukrainian Verbs has a wider audience: owing to its structure, it can be used as a multi-level information system and as a basis for further linguistic research.

Page 22

11. How to use CT

• Computer program
• Paper project
• Computer version – on the linguistic portal MOVA.info, in the dictionary section

Page 23

Thank you!

Contact information:

Olena Siruk, Laboratory for Computational Linguistics, National Taras Schevchenko University of Kyiv, olebosi@gmail.com