
Automatic Creation of a Semantic Network Encoding part_of Relations

Michael Zock1 and Debela Tesfaye2

1 LIF-CNRS, 163 Avenue de Luminy, 13288 Marseille, France
2 ITPHD PROGRAM, Addis Ababa University, Addis Ababa, Ethiopia

[email protected], [email protected]

We describe here the principles underlying the automatic creation of a semantic map to support navigation in a lexicon. Whenever we read a book, write a letter, or launch a query on Google, we always use words, the shorthand labels for more or less well-specified thoughts. The problem is that words may refuse to come to our mind when we need them most, at the very moment of speaking or writing. This is when we tend to reach for a dictionary. Yet, even dictionaries may fail to reveal the target word, although they contain it. This is not only a problem of input (poor query word), but also a problem of design: the way words are organized and the kind of information associated with each one of them. We will consider in this paper one of the most original hand-crafted resources, WordNet, discussing its relative strengths and weaknesses with respect to word access. We will then describe an attempt to build a subset of this resource automatically, and conclude with the presentation of an approach meant to help authors (speakers/writers) to overcome the tip-of-the-tongue problem (TOT) even in cases where other resources, including WordNet or Roget’s Thesaurus, would fail.

Keywords: Lexical access, navigation, word association, lexical graphs, automatic link extraction, part-of relations, meronyms.

Journal of Cognitive Science 16-4: 431-491, 2015
Date submitted: 05/06/15  Date reviewed: 07/05/15  Date confirmed for publication: 08/16/15
©2015 Institute for Cognitive Science, Seoul National University


1. Context and Problem

When speaking or writing we basically encounter one of the two following situations: one where everything works automatically, somehow like magic, words popping up one after another like a fountain spring, leading to a discourse flowing naturally like a (more or less) quiet river. The other situation is much less peaceful or harmonious: the discourse contains hesitations, the author being blocked somewhere along the road, forced to look deliberately and often painstakingly for a specific, possibly known word (Zock et al. 2010).

We will be concerned here with this latter situation. More specifically, we are concerned with a speaker/writer using a dictionary (paper/electronic) to look for a word. While there are many kinds of dictionaries, most of them are not very useful for the language producer (authors, i.e. writers/speakers). The great majority of dictionaries are semasiological, that is, alphabetically organized lists of words or word-forms. Alas, this kind of organization is not well suited for lookup by the language producer, whose starting points (input) are meanings —i.e. concepts, elements of the word’s definition, or conceptually related elements (collocations, associations: elephant-Africa)— and whose end points (output) are the target words.

While it is true that most dictionaries have been built with the reader in mind, one must admit that attempts have also been made to assist the writer. The best known example is, of course, Roget’s Thesaurus (Roget, 1852), but there are other tools like the BBI (Benson et al., 2010), the Oxford Collocations Dictionary, Longman’s Language Activator (Summers, 1993), reverse dictionaries (Bernstein, 1975; Kahn, 1989; Edmonds, 1999), or hybrid forms combining lexical and encyclopedic information in a single resource (OneLook: http://onelook.com). There is also MEDAL (Rundell and Fox, 2002), a thesaurus produced with the help of Sketch Engine (Kilgarriff et al., 2004). For an excellent overview concerning semantic aspects in the computational lexicon, see (Frank & Padó, 2012).

Last, but not least, there is WordNet (http://wordnet.princeton.edu), a very special kind of ‘dictionary’ integrating in a single volume information ‘normally’ spread over different resources. Rather than creating different dictionaries for different tasks (revealing the meaning of a term, an equivalent form or its opposite, etc.), WordNet (henceforth WN) has been built as a single database allowing for multiple access routes (Miller, 1990; Fellbaum, 1998). As this resource is closest to what we have in mind, we will focus on it in this paper, commenting on its strengths and weaknesses with respect to word access. We will also show how to create a subset of it automatically, and conclude with a proposal describing the steps to be performed in order to create an enhanced version. This latter is meant to help authors to overcome the tip-of-the-tongue problem (Brown & McNeill, 1966),1 even in cases where other resources, including WN or Roget’s Thesaurus, would fail, as they have not been built accordingly.

2. WordNet

WordNet is a large, handcrafted lexical database of English. Given its originality and usefulness it has become a major resource in computational linguistics, as well as a blueprint for building equivalent resources in other languages: http://globalwordnet.org/wordnets-in-the-world/. It should be noted though that despite the fact that WN is based on psycholinguistic principles (Beckwith et al., 1991),2 and despite the fact that initially it was

1 The tip-of-the-tongue problem (TOT) is characterized by the fact that the author (speaker/writer) has only partial access to the word s/he is looking for. The typically missing parts are phonological: syllables, phonemes (Aitchison, 2003). Since all information except this last one seems to be available, and since this is the one preceding articulation, we say: the word is stuck on the tip of the tongue.
2 The building of the resource has been inspired by work on aphasia (Caramazza & Berndt, 1978), word associations (Deese, 1965), priming (Meyer & Schvaneveldt, 1971), and Quillian's seminal work concerning the way our mind stores and organizes words (Quillian, 1967 and 1968; Collins & Quillian, 1969). Note that WN uses basically only three kinds of lexical categories: nouns, verbs and adjectives, adverbs being derived from adjectives. Note also that only nouns and verbs are organized hierarchically; adjectives are organized differently. Both choices are motivated on empirical grounds. The absence of function words is based on speech error data (Garrett, 1982), suggesting that they are part of the syntactic component rather than of the lexicon. Prepositions are inserted into the syntactic frame after the choice of the main categories: nouns, verbs, adjectives, adverbs. The hypothesis that syntactic categories differ in terms of organization is based on word association studies. Indeed, when being asked to give the first word coming to


meant to support dictionary look-up —“The initial idea was to provide an aid to use in searching dictionaries conceptually, rather than merely alphabetically.” (Miller et al., 1993)— it was never built accordingly. Due to various circumstances, it ended up becoming a resource used mainly by machines: “WordNet is an online lexical database designed for use under program control.” (Miller, 1995, p. 39).3 This being said, WN can nevertheless be used for consultation, all the more so as it is quite good at it under certain conditions. We will describe these conditions briefly below, showing how WN, or at least a subset of it, can be built automatically, and how some of its limitations could be overcome.

2.1 Brief Description of WN

Conventional dictionaries associate information with lexical entries, which they present in alphabetical order. WN is different: words are linked via semantic relations. Given some input, say ‘dog’, you can get any of the following forms by clicking on the appropriate hyperlink: canine, domestic animal (hypernym), puppy/pooch (hyponym), flag (part meronym), pack (member holonym), ... Note though that you will not get ‘bone, tail, muzzle or snout’.
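For readers who want to reproduce this kind of link-based lookup programmatically, here is a minimal sketch using NLTK's WordNet interface (our illustration, not part of WN itself; it assumes the WordNet data has been fetched via nltk.download('wordnet')):

from nltk.corpus import wordnet as wn

dog = wn.synsets('dog', pos=wn.NOUN)[0]  # first noun sense of 'dog'

print([l.name() for s in dog.hypernyms() for l in s.lemmas()])        # e.g. canine, domestic_animal
print([l.name() for s in dog.hyponyms() for l in s.lemmas()])         # e.g. puppy, pooch, ...
print([l.name() for s in dog.part_meronyms() for l in s.lemmas()])    # e.g. flag (a hound's tail)
print([l.name() for s in dog.member_holonyms() for l in s.lemmas()])  # e.g. pack, Canis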

WN is mainly a lexical graph emphasizing meaning rather than form. The lexicon is structured according to three viewpoints: part of speech (POS), equivalence– lexical units expressing the same concept are grouped into

mind in response to some stimulus, adults usually provide a term of the same category as the probe (Fillenbaum and Jones, 1965). While adults answer with paradigmatically related words, children tend to answer with a syntagmatically related word (Brown & Berko, 1960; Ervin, 1961). In sum, the two seem to organize words differently in their respective mental lexicon.
3 Note that there are two ways of using WN. One is via algorithms, which generally need to be written; the other is via the graphical user interface: http://wordnetweb.princeton.edu/perl/webwn. Our comments here concern mainly this latter mode, as ordinary users will hardly ever conceive algorithms for using this resource. Also, among the many potential uses of WN, we are interested here mainly in lexical access, and to some extent in clustering, i.e. classifying or organizing words. See the discussion at the end of this paper.


synsets– and relations. Synsets, WN’s basic building blocks, are linked via conceptual (hyperonymy, meronymy,...) and lexical relations (synonymy, antonymy, ...).

The fact that nouns and verbs are grouped hierarchically makes WN resemble a thesaurus or a light-weight ontology. Yet, strictly speaking, WN is neither of the two. It is similar to both in that it emphasizes meaning or meaning relations rather than word forms, but unlike an ontology it deals with words rather than with concepts. WN is not a thesaurus either, as a thesaurus like Roget’s clusters words by topic (means of transportation, law, medicine, ...), while WN clusters them by POS, equivalence and relations. Also, Roget’s thesaurus offers an index to guide search, while WN does not: words are accessed via links.

Probably the most important feature of WN is the fact that the lexicon is organized semantically. Indeed, its creators were more concerned with the

Table 1. Some typical relations in WN


relational structure of the lexicon, i.e. the way words are linked, than with the words’ forms. Word meaning is defined in terms of synonyms and glosses, or the word’s position in the network, i.e. its relation(s) to its direct neighbor(s).

In sum, rather than organizing word forms alphabetically, WN organizes lexical entries in terms of synsets, i.e. sets of equivalents (cop-policeman), which in turn are linked to each other in various ways (see table 1). This makes the resource ‘basically’ a network,4 hence its name, WN. The fact that the user is given a semantic network rather than a list of alphabetically organized words has obvious consequences for navigation: word-forms can now (also) be accessed via relations, more precisely, via a word and a relation {([boy]+synonym) → [lad]}; {([dog]+hypernym) → [domestic animal])} ..., which together are the input on the basis of which the resource provides an output.

2.2 Under What Conditions is WN Really Useful for Consultation?

When does WN allow us to find the elusive word (target)? As mentioned already, WN has not been built for consultation by people. Despite this fact it can be used for that purpose, all the more so as it is quite good at this task provided that the three following conditions are met:

(a) The author knows the link holding between the source word (input) and the target (e.g. ([dog]+synonym = [?] → [bitch]); ([dog]+hypernym = [?] → [canine])). Of course, any input will yield some output, more precisely, all the senses of the keyed-in word; but to go beyond that, one has to tell WN which direction to go (up, down or sideways), which is done via the links, which, once chosen, display the hypernym, hyponym, meronym, etc.

(b) The input (source word) and the target are direct neighbors in the resource. For example, [seat]-[leg] (meronym); [talk]-[whisper]

4 We wrote 'basically' to draw the reader's attention to the fact that WN is not a single network, but a whole set of networks. There are 25 for nouns, and at least one for all the other parts of speech. Being based on different organizing principles, the different nets cannot really communicate with each other, which in our case (lexical access) poses a serious problem.


(troponym), etc.;
(c) The link is part of WN’s database (hyponym, hypernym, ...).

We will get back to these points in the conclusion, but before doing so, we will describe our attempt to build automatically the semantic net within which search takes place. Note that WN has been built manually, which is a very time- and energy-consuming process relying on human intuition. We try to do this automatically, using corpora. This should be useful for various people: those who want to build this kind of resource from scratch, those wishing to improve the current version of WN, or those who want to develop this kind of database for an under-resourced language. This being said, to build such a net with all the links contained in WN is a very difficult task. This is why we will consider here only a small subset, meronyms, to see whether the same approach could be used for other kinds of links.

3. Why Care about Meronyms?

Parts/wholes, or, more technically speaking, meronyms and holonyms, play an important part not only in language (see below), but also in human cognition at large. Whenever we look for our socks, shoes or keys (door, vault, luggage) we always take into account the larger object of which they may be part or the scene in which they may play a role. Hence encoding and expressing the different parts properly is vital for making ourselves understood. Obviously, ‘food for thought’ and ‘thought for food’ do not refer to the same ‘thing’. Parts and wholes also play an important role in composition (music, architecture, or making a painting). Suppose you were to sketch a specific African mammal. In order to be able to ‘re’-present this concept visually you would first need to have a fairly clear idea in your mind concerning the whole (pattern, rough shape of the animal) and then fill in the appropriate details. Otherwise you may well end up with an unrecognizable form, or possibly some kind of hybrid creature, whose head is from the animal you have in mind (giraffe), the body from some other animal (horse) and the tail from still another creature (donkey). This is probably not quite what you want, since you intended to refer to a specific, well-known animal, living in Africa, known for its size (neck, legs, body), horns (hair-covered; ossicones) and coat pattern which resembles that of an okapi or a leopard.

Since language expresses knowledge, and since meronyms play an important role in cognition (knowledge), they necessarily also play an important role in language. For example, they allow us

• to inform someone about the location of an object: (a) ‘The beer is in the fridge’ or (b) an address (country>city>street>name, or the opposite);

• to signal the scope of some event : ‘I had a flat tire’ meaning that (only) part of the car was broken;

• to give directions : ‘Enter the building and go to room 108 at the end of the corridor.’

• ...

Using this relation or its arguments (terms) wrongly produces noise, amusement or bewilderment (Africa is a big country.), and possibly even mistakes (see below) or misunderstandings. Put differently, learning to use meronyms properly is important. This had already been realized by the ancient Greeks, who considered the part-whole relation not only a fundamental ontological category, but also a rhetorical device in their figures of speech (metonymy, synecdoche).

While meronymy seems fairly straightforward when both terms are known and well understood (head-mouth; body-legs), things can quickly become complicated in the opposite case. For example, if you do not have a minimal understanding of viruses, bacteria, and microbes, it is not obvious at all to decide which of them can be the ‘part’ or the ‘whole’ in a given sentence.

Note that appreciating meronymic roles is important not only for language understanding, but also for language production. For example, to produce properly a sentence like “put the beer into the trunk” (put <x> into <y>), you must know which entities can assume the respective roles (x: part vs. y: whole). Yet, this can depend on the context, i.e. the second term. For example, a ‘city’ can play either role, as it can be part of a larger entity (country), as well as be the place people live in. We can say, ‘Rome is in Italy’, but we can also write ‘John is (now) in Rome’. In both cases the part precedes the whole, and while we may claim that the role is signalled via the verb, we cannot always predict which term assumes which role, as the ‘city’ mentioned can play either one. Reference generation, i.e. the way to refer to an object, is another example where meronyms play an important role. Imagine a household with one kitchen and one fridge. When you say “go to the kitchen and open the fridge”, it would be very odd to use the indefinite article, ‘a’, yielding ‘a fridge’. First of all, there is only one kitchen and one fridge, but, more importantly, having introduced the container (kitchen), it is perfectly normal to assume that the listener is able to identify the part (fridge) to which the speaker is referring.

Understanding the role played by a given term may be vital in math. Yet, this is not always obvious and has to be learned. For example, some children have difficulties acquiring the notion of proportion, i.e. the comparison of a specific quantity, the ratio of a part, with respect to the whole. Being shown a set of sweets of which three are red and two are blue, many of them will give the wrong answer when asked for the proportion of the blue sweets, their answer being 2/3 rather than 2/5. The mistake is due to the fact that they compare the two colors rather than a subset to the whole set, the small one (the blue: 2) to the whole, i.e. the red and blue together (2+3=5).

Meronymic links are also useful information in dictionaries. Suppose you were searching for the terms ‘admiral’ or ‘general’, yet all that came to your mind were ‘navy’ or ‘army’. A dictionary containing meronyms would be able to provide you with the elusive word, as it ‘knows’ that ‘admirals’ and ‘generals’ are part of the ‘navy’ and ‘army’ and that both are part of the armed forces, i.e. the military.

As one can see, Part-Whole relations (PT-WHRs) are important. This being so, several scholars have proposed taxonomies of them (Winston et al., 1987; Pribbenow, 1995; Gerstl & Pribbenow, 1995; Vieu & Aurnague, 2007; Keet & Artale, 2008). We will follow Winston’s classical proposal:

component – integral object    handle – cup
member – collection            tree – forest
portion – mass                 grain – salt
stuff – object                 steel – bike
feature – activity             paying – shopping
place – area                   oasis – desert


• Integral objects have a structure; their components can be torn apart, and their elements have a functional relation with respect to the whole. For example, ‘kitchen–apartment’ or ‘aria–opera’.

• ‘Tree-forest’ and ‘chairman-committee’ are typical representatives of Member-Collection relations.

• Portion-Mass captures the relations between portions, masses, objects and physical dimensions. For example: ‘meter-kilometer’.

• The Stuff-Object category encodes the relations between an object and the stuff it is made of. For example, ‘steel-car’ or ‘snow-snowball’.

• Feature-activity encodes the fact that something is inevitably part of another thing. Visiting the ‘Eiffel Tower’ can be considered as being part of ‘visiting Paris’.

• Place-Area captures the relation between an area and a sub-area like ‘Ethiopia-Addis Ababa’.

Meronymic relations can also be categorized as typical or accidental. The former are always true (roof-house), while the latter are episodic (cucumber-sandwich): they happen only occasionally and are not always predictable. We focus here only on the first type. To do so, we must take the meaning of the meronymic arguments into account (part_of (roof, house)).

To capture the meaning of words we relied on the intuition that meanings depend to some extent on a word’s neighborhood, be it direct (black coffee) or indirect (the color of coffee is generally black). Words occurring in similar contexts tend to have similar meanings (Harshman, 1970). This idea, known as the ‘distributional hypothesis,’5 has been proposed by various scholars (Firth, 1957; Harris, 1954; see also Lenci, 2008; Baroni & Lenci, 2010). It implies that the meaning of a word depends on the way it is used (Wittgenstein, 1922), that is, its context, i.e. its more or less direct neighbors. A word’s meaning cannot be fully grasped unless one takes the context in which it occurs into account. Meaning and context can be captured in terms of proximity or neighborhood, i.e. words co-occurring within a defined window (phrase, sentence, paragraph). We operationalized this notion via a vector space model (VSM) which we will briefly introduce in section

5 http://en.wikipedia.org/wiki/Distributional_hypothesis


5.1 together with some enhancements (section 5.2). Yet, before doing so, we would like to present some related work (section 4) and our approach (section 5).

4. Related Work

A great amount of work has been done on lexical networks, relation identification and vector space models.

4.1 Related Work on Wordnets

WN and Wikipedia are arguably (among) the most used resources in computational linguistics. They are precious for many tasks —word sense disambiguation, semantic tagging, information retrieval, etc.— and their combination (Navigli & Ponzetto, 2010; Ruiz-Casado et al., 2005) probably even more so.6 To get an idea of the impact of WN it suffices to look at the size of the community it has succeeded in creating, the number of publications, the number of similar resources inspired by the original one,7 or the number of applications relying on it. For example, Rosenzweig et al. (2007)8 list more than 860 papers describing applications using WN, and this was nearly 10 years ago. The two following websites9 show related projects, or corpora annotated with the help of WN. Additional information concerning the success of this resource is given in (Morato et al., 2004), who try to explain its causes.

As one can see, a resource like WN is important.10 Hence efforts have

6 For a similar approach, see UBY (http://www.ukp.tu-darmstadt.de/data/lexical-resources/uby).
7 70, according to the following website (http://www.globalwordnet.org). For well-known examples, see: EuroWN (Vossen, 1998), BalkaNet (Tufis, 2000) and MultiWN (Pianta et al., 2002), or the Open Multilingual WordNet created by F. Bond (http://compling.hss.ntu.edu.sg/omw/), which links open-source wordnets in over 20 languages (see also Bond et al., 2008).
8 http://wordnet.princeton.edu/wordnet/publications
9 http://wordnet.princeton.edu/wordnet/related-projects/ and http://globalwordnet.org/wordnet-annotated-corpora/


been made to extend WN or to build an equivalent resource. Yet building such a resource manually takes time, is expensive and is possibly error-prone. Even though built by humans, chances are that the result is biased by a number of a priori decisions its creators are not even aware of. This being so, various alternatives have been considered: (a) use the original version (Farreres et al., 1998) or a combination of WNs (Bond et al., 2008); (b) combine multiple resources (Fiser & Sagot, 2008); (c) use parallel corpora in concert with WN (Fiser, 2009; Diab, 2004); (d) use traditional dictionaries together with some computational tools (Vetulani, 2009), etc. Atserias et al. (1997) provide a good summary of methods used to build multilingual wordnets, trying to account also for the reasons of their success.

Other strategies have been proposed: (a) build the resource interactively (Piasecki et al., 2009), (b) use crowd-sourcing (Lafourcade & Joubert, 2015; Lafourcade, 2007),11 or (c) use corpora and computational methods for identifying similarities (Bullinaria & Levy, 2007) or semantically related words (Turney, 2006). Note that this last technique has reached a degree of maturity coming close to the performance of native speakers. This is why it is used in AutoWordnet,12 a Marie-Curie project, conceived and carried out by Rapp, who tries to build this resource automatically. For a short description of the underlying methodology see (Rapp & Zock, 2012). Note that this is not the only attempt to build WN automatically. For other approaches with a similar goal see (Barbu et al., 2007; Saveski & Trajkovski, 2010).

4.2 Work on Automatic Relation Extraction and Vector Space Models

Semantic relations can be gathered in many ways, for example, by gleaning co-occurrence information about words or word pairs from corpora. Pattern-based methods use lexico-syntactic templates to identify individual relations (Hearst 1992). Distributional methods record the co-occurrence of words with their context, allowing thus to represent terms as vectors which can then be compared in terms of angular distance (cosine similarity) to

10 For an interesting alternative, see (Borin et al., 2013).
11 See also http://russianword.net/publications.
12 http://www.ftsk.uni-mainz.de/user/rapp/autowordnet/


determine semantic relatedness (Schütze 1993).

A lot of work has been done on identifying semantic information or relations (McDonald, 1995; Auger & Barrière, 2008; Bach & Badaskar). These works lie along a continuum, ranging from exclusively hand-crafted patterns (Hearst, 1998) to rule-based and probabilistic methods (Bullinaria & Levy, 2012). Figure 1 below sketches the major steps in this kind of approach.

For example, Finin (1980) relied exclusively on manually built rules. Beamer (2008) used a knowledge-intensive approach by drawing on resources like WN and annotated corpora. The work of (Hage, 2006) and (Harshman, 1970) is domain-dependent, while the one of (Girju et al., 2005) and (Matthew & Charniak, 1999) is language-dependent, as it relies on syntactic structures.

Resource-intensive approaches like the ones using WN are not suitable for languages devoid of such a resource, i.e. under-resourced languages. These approaches use texts tagged with WN information, for example word senses. Unfortunately, this kind of approach cannot be applied to applications relying on real-world data: normal texts are never tagged with WN information (senses, type of link, etc.). In addition, most of the above-mentioned approaches are highly language

Figure 1. Acquisition based on patterns or word distributions (inspired by Frank & Padó, 2012)


dependent. The classification features used to build the rules are extracted from a specific language. For example, Hearst (1998) uses syntactic features occurring frequently in sentences and various kinds of texts. However, such syntactic structures are rare, their coverage is small and their effectiveness greatly depends on the type of semantic relation extracted. Indeed, the author reports better results for hyponyms than for meronyms which may be due to the frequent ambiguity of syntactic structures encoding meronymic relations. This is why we may need to take a different approach, and we believe that among the available techniques the Vector Space Model (VSM) is a very good candidate.
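Before turning to the VSM, here is a concrete toy illustration of the kind of lexico-syntactic patterns just discussed. The template below is our own simplified example in the spirit of Hearst-style patterns, adapted to part-whole phrases such as "the engine of the car"; real pattern inventories are both larger and noisier:

import re

# one toy part-whole template: "the <part> of the/a/an <whole>"
PART_OF = re.compile(r"\bthe (\w+) of (?:the|a|an) (\w+)\b", re.IGNORECASE)

text = "He checked the engine of the car before repairing the door of the garage."
for part, whole in PART_OF.findall(text):
    print(f"part_of({part}, {whole})")
# -> part_of(engine, car)
# -> part_of(door, garage)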

While not being in the limelight as much as WN, the Vector Space Model (VSM) has nevertheless had a strong impact on the community and on the way work is carried out these days. This is reflected in the words of Turney & Pantel (2010), who write:

“The success of the VSM for information retrieval has inspired researchers to extend the VSM to other semantic tasks in natural language processing, with impressive results. For instance, Rapp (2003) used a vector-based representation of word meaning to achieve a score of 92.5% on multiple-choice synonym questions from the Test of English as a Foreign Language (TOEFL), whereas the average human score was 64.5%. Turney (2006) used a vector-based representation of semantic relations to attain a score of 56% on multiple-choice analogy questions from the SAT college entrance test, compared to an average human score of 57%.”

Latent Semantic Analysis (LSA) is a method for extracting and representing hidden meanings (the contextual, usage-based meaning of words) by using a vector space model. LSA is in many respects similar to neural nets (Landauer & Dumais, 1997), for example in its parallelism, but it relies on singular value decomposition, a mathematical matrix-decomposition technique akin to factor analysis, which makes it possible to reduce vector dimensions. This technique works remarkably well for determining the relative similarity between terms (tea, coffee vs. bread). Actually, LSA achieved scores similar to those of non-natives (64% vs. 64.5%) on the TOEFL vocabulary test (Landauer and Dumais, 1997). This being said, similarity and relatedness are not the same. Similarity is generally captured via synonyms, whereas relatedness is expressed via antonyms, hyponyms, meronyms, etc., which are, of course, not the same as synonyms. Yet LSA, like other vector space models, considers all of them in a similar fashion, providing the degree of similarity between word pairs regardless of the type of relationship holding between them.
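As an aside, the SVD step at the heart of LSA can be illustrated in a few lines. The following toy sketch (our own, with invented counts, not the original LSA setup) reduces a small term-by-document matrix to two latent dimensions and compares terms by cosine similarity in the reduced space:

import numpy as np

# rows = terms (tea, coffee, bread), columns = toy documents (invented counts)
X = np.array([[3., 2., 0., 0.],
              [2., 3., 1., 0.],
              [0., 0., 2., 3.]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
terms_k = U[:, :k] * s[:k]   # term vectors in the k-dimensional latent space

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(terms_k[0], terms_k[1]))   # tea vs. coffee: relatively high
print(cos(terms_k[0], terms_k[2]))   # tea vs. bread: lower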

ADW, standing for ‘Align, Disambiguate, and Walk’ (Pilehvar et al., 2013), is a graph-based approach measuring the semantic similarity of linguistic items at various levels (word senses, texts, ...). Linguistic items are represented in a unified way via semantic signatures, i.e. probability distributions over concepts, word senses, or words in a lexicon. To measure the similarity between words ADW considers their senses. To this end it starts by disambiguating words using the context in which they are used. Hence, a given form, say ‘mouse’, may convey different meanings depending on the context in which it occurs. To evaluate the quality of the similarity measures the authors used a correlation task on several well-known datasets: RG-65, MC-30, WordSim-353 similarity, and YP-130. ADW achieved excellent results compared to state-of-the-art approaches (Pilehvar et al., 2013).

NASARI (Novel Approach to a Semantically Aware Representation of Items) is a vector representation technique drawing on two complementary resources: WN, a lexicographic database, and Wikipedia, an encyclopedic resource (Camacho-Collados et al., 2015). The authors used BabelNet (Navigli and Ponzetto, 2012) as a bridge for mapping WordNet synsets to Wikipedia pages. BabelNet is a high-coverage multilingual encyclopedic dictionary and a semantic network merging, among others, Wikipedia and WordNet. Being able to exploit a unified version of several resources, NASARI outperforms its competitors with respect to word similarity detection and sense clustering.

5. General Description of Our Approach

As mentioned already, our goal is to build a semantic network encoding part_of relations. To this end we devised a weakly supervised method for extracting meronymic relations (component–integral object; part-whole) automatically.13 Our method depends very little on language and is completely domain-independent. Nevertheless, we do need a part-of-speech tagger or a part-of-speech tagged corpus, a dependency parser, and a small list of meronyms consisting of 10 component–integral object (part-whole) pairs. In this respect our work differs from that of other scholars, as we do not require a resource like WN. Hence, our approach can be used even for under-resourced languages, i.e. languages lacking a resource like WN. Put differently, the method is sufficiently general to be applicable to languages other than the one for which it was initially designed.

Since word meaning is represented as a vector based on the n-gram values of all co-occurring words, we need a corpus. To build the vectors we used COHA (Corpus of Historical American English), an n-gram, part-of-speech tagged corpus (Davies, 2011)14 of 400 million words. For languages lacking this kind of resource, i.e. a tagged corpus, plain text can be used, as the system is able to identify the words’ co-occurrences. Again, this feature is very convenient for under-resourced languages, as it makes their preparation (pre-processing) easier than if we had to annotate the corpus manually.

Starting with the first word in the corpus, the system extracts all associated words expressing a PT-WHR to continue then with the next word until it has reached the end. Actually, the system performs six operations.

Being basically interested in meronyms, we start by tagging the constituent words of sentences with their parts of speech (step 1), to extract at step two the Noun-Noun co-occurrences (NN co-occ). For example: car-engine; airplane-engine; search-engine... We call the first noun, i.e. the one to the left, the head (car) and the other (engine) the tail.

The NN co-occ will then be grouped on the basis of their tail noun (step-3), that is, we cluster NN co-occurrences if they have the same tail noun: engine [airplane, search, car]. These clusters will provide us with all the

13 Supervised learning means that the examples on the basis of which the system learns are labeled, i.e. they specify explicitly which forms are correct and which are not. In unsupervised learning, examples are not labeled; the system clusters the data into classes, giving the latter some arbitrary name.
14 http://www.ngrams.info


nouns paired with the tail (for details, see section 6, step 3).

At step four we cluster the noun pairs grouped at the preceding step again, but this time on the basis of the degree of semantic similarity of their head nouns. That is, we cluster semantically related head nouns if they share the same tail noun (engine {[airplane, car] [search]}).

At step five we compute the similarity values of the NN co-occurrences in order to decide on the nature of the relationship holding between the two arguments (NN co-occ), i.e. in order to decide whether they are linked via a meronymic relationship. To this end we used an enhanced version of the vector space model (VSM), as explained in section 5.2. The underlying assumption here is that specific vectors can be built on the basis of the relation holding between nouns, meronymy in our case (please refer to sections 5.1 and 5.2 for a more detailed description of the vector construction).

At step 6, the program determines whether a given noun pair is linked via a meronymic relation or not by aggregating all the information obtained during steps 1 to 5. Given a training set composed of both meronymic and non-meronymic noun pairs, the algorithm learns the optimal combination of the outputs from steps one to five which helps the system classify all noun pairs of the training set into the proper groups: meronymic or not. This optimal combination of outputs is based on the results obtained at steps 3 and 4 (clustering) and step 5 (similarity values). The learned patterns are then expressed as production rules to allow for further extraction of noun pairs linked via a meronymic relation.
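For readers who prefer code, here is a rough sketch of the first three steps of this procedure (function and variable names are ours, and steps 4 to 6 are only stubbed out in comments):

from collections import defaultdict

def extract_nn_pairs(tagged_sents):
    """Steps 1-2: from POS-tagged sentences, keep adjacent noun-noun pairs (head, tail)."""
    pairs = []
    for sent in tagged_sents:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if t1.startswith('NN') and t2.startswith('NN'):
                pairs.append((w1.lower(), w2.lower()))   # e.g. ('car', 'engine')
    return pairs

def group_by_tail(pairs):
    """Step 3: group pairs sharing the same tail noun, e.g. engine -> {airplane, search, car}."""
    groups = defaultdict(set)
    for head, tail in pairs:
        groups[tail].add(head)
    return groups

# Steps 4-6 (not shown): cluster the heads of each group by semantic similarity,
# compute the similarity values of section 5.2 for every pair, and learn from a
# small labelled seed set which combination of these scores signals a part_of relation.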

5.1 Brief Description of VSM

Since we try to capture meaning via word similarity, the question arises of how to operationalize this notion. One way of doing so is to create a vector space composed of the target word and its neighbors (Lund and Burgess, 1996). This approach, known as the vector space model (VSM), was developed for information retrieval (Salton et al., 1975). The idea was to represent all documents of a collection as points in a space, i.e. as vectors in a vector space. Semantic similarity is expressed via the distance between two points: closely related points express similarity, while distant points signal unrelated or only remotely related words. In sum, the user's query and the document are represented as points in some space. The documents are then sorted in terms of relative similarity with respect to the query, the most similar ones appearing at the top of the list, and the remainder (the less similar ones) further down depending on their rank.

The success of the VSM for information retrieval has inspired researchers to use it for other tasks, for example the computation of word similarity or the representation of word meaning. To measure the semantic relatedness of a pair of words, the meaning of each word is represented as a vector in a word space. This vector is built on the basis of co-occurring words. Consider the following example which measures the proximity between car and vehicle by using the co-occurrence information gleaned from the corresponding Wikipedia pages, yielding the matrix shown here below. “A car is a wheeled, self-powered motor vehicle used for transportation.” (http://en.wikipedia.org/wiki/car) and “A vehicle is a mobile machine that transports people or cargo.” (http://en.wikipedia.org/wiki/vehicle)

Table 2. Sample word space matrix. I (index), (vector for term1, ‘car’)

The vectors are built by carrying out the four following steps:

1. Extract the words co-occurring with car in a predefined window. While in our experiments we considered the window to be the six content-bearing words of the sentence containing the target (i.e. three terms preceding and three following the term car), in the example here above we considered, for illustration purposes, only the first five words to the right-hand side of the target. Hence, the term ‘car’ yields the following list of words: wheeled, self-powered, motor-vehicle, use and transportation.

2. Do the same for the word vehicle which will yield: mobile machine, transport, people, cargo.

3. Build the corresponding vectors for each word (car, vehicle) based on its co-occurrences (see the table here above). In its simplest form, the vector for each word is built on the basis of its co-occurrences. A term (car, vehicle) receives frequency values depending on the number of times it co-occurs with another term within the defined window, which here yields the following vectors: car [1, 1, 1, 1, 1, 0, 0, 0, 0] and vehicle [0, 0, 0, 0, 0, 1, 1, 1, 1]. Note that the words here are referred to via their index, i.e. their position in the word space. Hence, the first position refers to the first term, the second to the second, and the last to the last term in the matrix, the word ‘cargo’ in the case of ‘vehicle’. Unlike in the example given here above, we use in our experiments the weighted frequency of the words’ co-occurrences rather than binary values.

4. Measure the distance between the vectors. In order to quantify the similarity between two words, we have to compute the cosine similarity of their respective vector representations. This is done in the following way:

Take a pair of words, say car (T1) and vehicle (T2), with their respective vectors and weights, and carry out the steps described here below. For example, the vector for T1 comprises the terms (wheeled, self-powered, motor vehicle, use, transportation, mobile machine, transport, people, cargo), and is written as $\vec{T}_1$. To determine the similarity between ‘car’ and ‘vehicle’ we compute their cosine similarity, i.e. the cosine of the angle between their vectors. The cosine similarity value is computed in the following way:

1. sum the products of the weights of the terms in the respective vectors $\vec{T}_1$ and $\vec{T}_2$;
2. sum the squares of the weights of the terms of $\vec{T}_1$ and, separately, of $\vec{T}_2$;
3. take the square root of the product of the two results of step 2;
4. divide the result of step 1 by the result of step 3.

In other words, $\mathrm{cossim}(\vec{T}_1, \vec{T}_2) = \frac{\sum_i w_{1i}\, w_{2i}}{\sqrt{\sum_i w_{1i}^2}\,\sqrt{\sum_i w_{2i}^2}}$, where $w_{1i}$ and $w_{2i}$ are the weights of the i-th term in $\vec{T}_1$ and $\vec{T}_2$ respectively.
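For concreteness, here is a small sketch of steps 1-4 on the two definition sentences, using binary co-occurrence vectors as in the illustration above (our toy code; the actual experiments use weighted frequencies rather than binary values):

import math

car_ctx     = ['wheeled', 'self-powered', 'motor vehicle', 'use', 'transportation']
vehicle_ctx = ['mobile machine', 'transport', 'people', 'cargo']

vocab = car_ctx + vehicle_ctx                                # the word space (position = index)
vec_car     = [1 if w in car_ctx else 0 for w in vocab]      # [1,1,1,1,1,0,0,0,0]
vec_vehicle = [1 if w in vehicle_ctx else 0 for w in vocab]  # [0,0,0,0,0,1,1,1,1]

def cossim(a, b):
    dot  = sum(x * y for x, y in zip(a, b))                                        # step 1
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))     # steps 2-3
    return dot / norm if norm else 0.0                                             # step 4

print(cossim(vec_car, vec_vehicle))   # 0.0 here, since the two toy contexts share no word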


VSMs have been quite successful for a number of tasks (see section 5.2), but none of them, not even LSA, can provide us with the kind of information needed: the name of the relationship holding between two concepts or words. Put differently, associations must not only be identified, as in LSA, they must also be labeled in terms of their relation type. Note that this is different from naming the clusters created on the basis of directly associated words (query word, prime), i.e. direct neighbors of the input. Despite its limitations, we believe that the VSM can be used to extract PT-WHRs. This has two major advantages: it requires little manpower and few resources (corpora), at least far less than Girju's approach (Girju et al., 2005), which relies heavily on WN and annotated corpora.

The underlying idea of this work is that the type of relationship holding between two concepts/words can be inferred from data (e.g. corpora) by computing the similarity values of their co-occurrences. This allows us to extract part_of relations, which can then be clustered on the basis of the similarity of words. The similarity value can be obtained in different ways (Dagan et al., 1999), and it may depend on the type of relation to be identified. Put differently, the vectors used for encoding, say, a part-whole relation are different from those encoding hyponyms. The n-gram information used to extract the vectors is also specific to the type of semantic relation to be encoded (see section 5.2 for details).

5.2 Specificities of Our Approach for Measuring Directional Similarity

As shown in section 5.1, vector space models measure the relative proximity of words mainly on the basis of their co-occurrences. For example, to appreciate whether and to what extent ‘car’ and ‘engine’ are related, they consider all words co-occurring with ‘car’ and ‘engine’ and then build a vector for each of them. The distance between the vectors is equivalent to the words' degree of relatedness. The same principle can be used for any pair of words, relatedness being generally expressed via a numeric value ranging from 0 to 1.

This approach looks very promising and has indeed yielded quite interesting results. Alas, this model also has some shortcomings. For


instance, it does not take context into account. Yet, words occur in various contexts and they are ‘related’ in ‘various ways’, one of them being semantic: term equivalence (synonymy), hierarchical relations, i.e. more general or more specific (part_of, meronymy; hyponymy), etc. Yet, the VSM described here above measures the degree of relatedness between, say, ‘car’ and ‘engine’, in the same way as it would between ‘buy’ and ‘purchase’ or ‘car’ and ‘vehicle’. Yet each of these two couples encodes a quite different relation, synonymy in one case, a hierarchical relation (hypernym/hyponym) in the other. The question one might ask then is: is it reasonable to use the same procedure or the same vector space model for word pairs linked via different semantic relations?

In a similar vein, the degree of relatedness may depend on the inherent strength of the words under consideration. Some words are strong antonyms, or strong synonyms, etc., while others are not. Indeed, experiments done by psychologists (Paradis, 2009) showed that ‘good-bad’ and ‘long-short’ are considered to be better antonyms than ‘obscure-clear’ and ‘speedy-slow’. Moreover, antonyms exhibit a higher degree of co-occurrence similarity than synonyms (Scheible et al., 2013). Yet, VSMs tend to yield strong similarity values both for synonyms and antonyms, making their distinction very hard. To avoid this, we suggest taking into account the semantic nature of the words under consideration. In other words, the vector design for synonyms should be different from the one for meronyms, hyponyms, etc., since the relations they encode are totally different. In order to design the vectors accordingly, i.e. relative to a given semantic relation type, we made certain assumptions which we present together with the designed vectors here below.

Synonyms like ‘buy’ and ‘purchase’ are words with the same or similar meanings. Hence, being often used one for the other, they are competing in many respects (which incidentally may lead to word creations or blends, a kind of speech error where parts of one lemma are concatenated with parts of another to form a ‘new’ word: spoon + fork → foon/spork; smoke + fog → smog). Such words occur in similar contexts. Hence, a vector space model measuring the degree of relatedness between a pair of synonyms can be built by using all words co-occurring in various contexts, which can be anything from a sentence, to a paragraph, or an entire document. This has been done for synonyms with quite some success (Turney & Pantel, 2010; Rapp, 2004). Note that this does not hold for other relations like ‘source-product’ or ‘whole-part’ (meronyms), whose terms are related to each other only in some respects. For instance, ‘engine’ is related to its whole ‘car’ only in some respects (mainly functional), while ‘car’ and ‘automobile’ are related to each other in nearly all respects, as they can substitute each other. This being so, the performance of such models is much better for extracting synonyms than it is for extracting other relations, for example meronyms. The reason for this seems to lie in the fact that the model fits well the very nature of synonyms: being potential substitutes, their similarity can be measured by considering the contexts of the synonymic pairs via their co-occurrences. In other words, all the contexts of the synonyms can be modeled via the (entire) set of their co-occurrences. The question remains whether we can do the same for terms related differently, say, for words linked via meronymic or hyponymic relations. Should we consider all word co-occurrences to model the vector space for them, or should we design different types of vector space models using only specific sets of word co-occurrences depending on the nature of the semantic relation? We have taken the latter option.

In other relations, say, source-product or part-whole (meronyms), the linked concepts are not substitutes. They do not compete at all with each other. Quite to the contrary, they often complete each other, or, if they are related, they are so only in some respects. For instance, part-whole relations of integral objects, the relation we are mostly interested in in this work, have this characteristic. Their components are separable, entertaining a functional relation with respect to the whole (Winston et al., 1987). Hence, the ‘part’ has only a functional relation with respect to the ‘whole’, but not the opposite. This is why the ‘part’ is generally mentioned in the context of referring to the ‘whole’ object, while the opposite is rarely the case.

Besides that, holistic entities generally have many other components apart from the ones just mentioned. Hence their similarity, derived, say, from co-occurrence, is not restricted to only one of the parts. However, the part is totally and exclusively linked to the whole(s) and (in general) to its functions. This may lead us to the conclusion that the context of the part is a subset of the context of the whole. For instance, ‘engine’ and ‘car’ are


meronyms. The function of an ‘engine’ is linked to its whole, say, ‘car’ or ‘airplane’. The context of the term ‘engine’ can be captured by extracting its co-occurrences when it is mentioned (as is generally the case) in the context of holistic objects like a ‘car’ or ‘airplane’.

The reverse does not hold true. The term ‘car’ (holistic object) is not always mentioned when we talk about its parts, ‘engine’, ‘muffler’, ‘tires’, ..... Hence, if our assumptions are correct, we can determine the meaning of ‘car’ in the context of ‘engine’ via word distribution. This kind of evidence can be obtained from big corpora, representing a great variety of domains.

Obviously, the constructed vectors need to be in line with our assumptions, the goal being to capture the underlying nature of the relation at hand, namely component-integral part whole relations. To achieve this goal we have identified four types of similarity values:

• DsimH (directional similarity of the head) to measure the degree of similarity of the head and the tail nouns in the context of the head noun;

• DsimT (directional similarity of the tail) to measure the degree of similarity of the tail and head nouns in the context of the tail noun;

• NsimHT (nondirectional similarity between the head and the tail) to measure the degree of similarity between the head and the tail nouns in all contexts assumed by the head and tail.

• We have also calculated the similarity between the head and the respective cluster, which we called NsimHC (nondirectional similarity between the head and the cluster). To this end we computed the non directional similarity between the given head and all the other heads in the cluster, taking then their average.

In order to get the similarity values we constructed four types of vectors:

• The NvecH (nondirectional vector for the head) is constructed by taking all the co-occurrences of the head noun.
• The NvecT (nondirectional vector for the tail) is constructed by taking all the co-occurrences of the tail noun.
• DvecH (directional vector for the head) and DvecT (directional vector for the tail) are constructed by taking the intersection of NvecH and NvecT. The vector terms in the DvecH are the terms appearing in both NvecH and NvecT, and the values are the vector values of those terms in the NvecH. The size of the DvecH is equal to the size of the NvecH. Likewise, the vector terms in the DvecT are the terms appearing in both NvecT and NvecH, and the values are the respective vector values of those terms in the NvecT. The size of the DvecT is equal to the size of the NvecT.

Note that these vectors are different from the ones used in traditional vector space models. The DvecT vector for the tail noun is built here only on the basis of words co-occurring with the tail noun, in order to capture its context independently. Words co-occurring only with the head noun will not be included in the directional vector for the tail (DvecT). Hence, the size of the vector is equal to the number of words co-occurring with the tail. Likewise, the vector for the head noun (DvecH) is built only on the basis of words co-occurring with the head noun, to capture its context as well. If a word co-occurs both with the head and the tail noun, its value is recorded in both vectors; otherwise the respective vector value is the weight, i.e. the frequency of the tail/head noun, and zero for the other noun.

The DsimH is calculated by taking the cosine of the directional vector for the head and the nondirectional vector for the tail: DsimH = cossim(DvecH, NvecT). To compute the DsimT we take the cosine of the directional vector for the tail and the nondirectional vector for the head: DsimT = cossim(DvecT, NvecH). Finally, the NsimHT is calculated by taking the cosine of the nondirectional vector for the tail and the nondirectional vector for the head: NsimHT = cossim(NvecT, NvecH).

Table 3. Sample vectors for the meronymic pair 'car-engine'
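To make the construction tangible, here is a minimal sketch in Python of the vectors and similarity values just defined; the toy co-occurrence counts and the helper names (cosine, directional_vector) are ours and purely illustrative, not part of the system described above.

import math
from collections import Counter

def cosine(u, v):
    # cosine similarity between two term-weight dictionaries
    shared = set(u) & set(v)
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def directional_vector(nvec_a, nvec_b):
    # DvecA: keep NvecA's weights for terms shared with NvecB, zero elsewhere
    return {t: (w if t in nvec_b else 0.0) for t, w in nvec_a.items()}

# NvecH / NvecT: raw co-occurrence counts of the head and tail nouns (toy data)
nvec_head = Counter({"wheel": 4, "road": 3, "fuel": 2, "driver": 5})   # e.g. 'car'
nvec_tail = Counter({"fuel": 6, "piston": 4, "car": 3, "wheel": 1})    # e.g. 'engine'

dvec_head = directional_vector(nvec_head, nvec_tail)   # DvecH
dvec_tail = directional_vector(nvec_tail, nvec_head)   # DvecT

DsimH  = cosine(dvec_head, nvec_tail)   # DsimH  = cossim(DvecH, NvecT)
DsimT  = cosine(dvec_tail, nvec_head)   # DsimT  = cossim(DvecT, NvecH)
NsimHT = cosine(nvec_head, nvec_tail)   # NsimHT = cossim(NvecH, NvecT)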

6. Our Approach in More Detail

Four problems need to be solved for building the resource (the semantic network). We need to get a representative corpus, index lexical entries in terms of associations (i.e. build an association matrix), rank the terms and label the links.

To address the first task we used the Brown corpus and Wikipedia. Next, we developed a system, i.e. a pipeline comprising six components (see figure 2) to address the remaining problems explained below:

• Step 1: This component identifies the part of speech categories of the sentence elements. Since part-whole relations connect only nouns, we need only a tagger able to identify this POS. We extracted a sample of sentences from the Brown corpus and tagged them with their respective POS by using the Stanford POS-tagger.15

Input: Car carries its own engine.
Output: Car/NNP carries/VBZ its/PRP$ own/JJ engine/NN ./.

• Step 2: The next component extracts Noun-Noun co-occurrences (N-N sequences) from the tagged corpus, for example, 'corolla car', 'door of car', 'car engine', 'engine of car', 'car design', 'network design', 'airplane engine', 'search engine', etc. Noun phrases are not included in our current version. There are two types of co-occurrences: nouns occurring directly next to each other ('car engine') and nouns whose co-occurrence is mediated by another type of word occurring between them ('key of the car'), often a verb, adjective or preposition. Both types need to be identified. To this end we extracted N-N co-occurrences linked via the patterns presented in table 4 below.

Figure 2. System information flow

15 http://nlp.stanford.edu/software/tagger.shtml

• These six patterns (see table 4) are the most productive patterns encoding meronymic relations. To identify them we used the following procedure:

○ Starting with a list of 10 meronymic pairs (e.g. car engine), extract the sentences containing them from the Brown corpus and Wikipedia;
○ Parse the sentences (we used the Stanford parser);
○ Extract the words linking the meronymic pairs in the dependency tree of the sentences and put them in a bag;
○ Count the frequency of the patterns obtained at the previous step, sort them and select the most frequent ones.

Noun pairs can be easily extracted, regardless of their distance to each other and regardless of the type and number of words in between them. In order to do so, we mine the co-occurrence information of the noun pairs using the Stanford dependency parser.16 This kind of parser reveals not only the needed relational information, but is also able to capture long-distance dependencies between nouns. Note that, while being similar to other co-occurrence extraction algorithms, our method does not rely on the linear order of words. Instead we determine co-occurrences on the basis of the dependency tree generated by the parser. This allows us to extract Noun-Noun co-occurrences, regardless of their position in the sentence, provided they are linked via any of the six meronymic patterns listed above.

Table 4. The six most frequent meronymic patterns

16 http://nlp.stanford.edu/software/lexparser.shtml

Having two nouns (car-engine; engine of car), we signal their respective roles via names, calling the first one the head and the second one the tail. 'Car' and 'engine' are respectively the head and the tail in the 'car-engine' co-occurrence, while the roles are reversed in the 'engine-car' example. Hence, whether the part appears before or after the whole object, we will retrieve the holistic entity, i.e. the whole object. Since it is risky to conclude at this point that a noun expresses the 'part' or the 'whole', as this may turn out to be incorrect, we have decided to delay this decision until the very end.

Running a dependency parser on a POS-tagged sentence allows us to extract N-N co-occurrences regardless of their distance in the sentence. Consider the following example:

Car/NNP carries/VBZ its/PRP$ own/JJ engine/NN ./.

Typed dependencies

1. nsubj(carries-2, car-1)
2. root(ROOT-0, carries-2)
3. poss(engine-5, its-3)
4. amod(engine-5, own-4)
5. dobj(carries-2, engine-5)

The POS-tagger allows us to identify the nouns occurring in the sentence via any of the six meronymic patterns, while the dependency parser extracts in addition their syntactic role. In the example here above the function of 'car' and 'engine' is clearly shown in the dependency tree (see 1 and 5 here above), i.e. 'car' and 'engine' are linked via the noun-verb-noun meronymic pattern.
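As an illustration of this kind of dependency-based extraction, the following sketch uses spaCy as a stand-in for the Stanford tools mentioned above; the three patterns shown (noun compound, noun-preposition-noun, subject-verb-object) are only a subset of the six patterns of table 4, and all names are our own.

import spacy  # assumption: spaCy stands in for the Stanford tagger/parser used above

nlp = spacy.load("en_core_web_sm")

def noun_pairs(sentence):
    # collect candidate N-N co-occurrences from a dependency parse,
    # regardless of the distance between the two nouns in the sentence
    doc = nlp(sentence)
    pairs = []
    for tok in doc:
        if tok.pos_ not in ("NOUN", "PROPN"):
            continue
        # pattern 'car engine': noun compound
        if tok.dep_ == "compound" and tok.head.pos_ in ("NOUN", "PROPN"):
            pairs.append((tok.text, tok.head.text))
        # pattern 'engine of car': noun -> preposition -> noun
        for prep in (c for c in tok.children if c.dep_ == "prep"):
            for obj in (c for c in prep.children if c.dep_ == "pobj"):
                pairs.append((tok.text, obj.text))
        # pattern 'car carries engine': subject and object of the same verb
        if tok.dep_ == "nsubj" and tok.head.pos_ == "VERB":
            for obj in (c for c in tok.head.children if c.dep_ == "dobj"):
                pairs.append((tok.text, obj.text))
    return pairs

print(noun_pairs("The car carries its own engine."))   # [('car', 'engine')]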

• Step 3: The N-N co-occurrences identified at the preceding step are then clustered on the basis of their tail noun. For example, 'corolla car' and 'door of car' belong to one cluster, since both of them have the same tail noun, 'car', while 'car design' and 'network design' belong to another cluster. The same holds true for 'airplane engine', 'search engine' and 'car engine'. The rationale behind this step is to group noun pairs sharing the same tail. Hence, all the possible nouns paired with a given tail noun (like 'engine') will be placed in the same category, providing us with multiple sets of head nouns (for instance, 'airplane', 'car' vs. 'search') paired with the tail noun, for example 'engine'. We counted the frequency of the head-tail noun co-occurrences, sorted them and selected the 50 most frequent ones only. At the next step we will further cluster the head nouns.

car [corolla, door], design [car, network], engine [ship, car, game, aircraft]

As can be seen from the above example, the noun pairs are grouped on the basis of their tail noun.
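A minimal sketch of this grouping step (our own illustration, with the pairs taken from the example above):

from collections import defaultdict

def group_by_tail(pairs):
    # Step 3: group (head, tail) co-occurrences by their tail noun
    clusters = defaultdict(list)
    for head, tail in pairs:
        clusters[tail].append(head)
    return dict(clusters)

pairs = [("corolla", "car"), ("door", "car"),
         ("car", "design"), ("network", "design"),
         ("ship", "engine"), ("car", "engine"), ("game", "engine"), ("aircraft", "engine")]
print(group_by_tail(pairs))
# {'car': ['corolla', 'door'], 'design': ['car', 'network'], 'engine': ['ship', 'car', 'game', 'aircraft']}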

• Step 4: the noun pairs of the clusters created in step three are clustered again, but this time on the basis of the degree of semantic similarity of their head nouns.

car { [corolla] [door] }
design { [car] [network] }
engine { [airplane, car] [search] }

Table 5. Frequencies of some head nouns co-occurring with the tail 'engine'

In order to calculate the similarity between the head nouns, we used the cosine value of their vectors. To build the vectors used to measure this similarity, we followed the procedure corresponding to the S2-type vector explained in section 5.2. Hence, the vectors are created by taking all words co-occurring with the nouns. Head nouns whose vectors are similar (high cosine value) are clustered together. Accordingly, 'airplane' and 'car' are grouped together, while 'search' is placed in its own group within the engine cluster.
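The following sketch illustrates one way such a similarity-based grouping of head nouns could be carried out; the greedy strategy, the threshold of 0.5 and the toy context vectors are our assumptions, not the exact procedure of the system.

import math

def cos(u, v):
    # cosine similarity between two term-weight dictionaries
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def cluster_heads(head_vectors, threshold=0.5):
    # greedy grouping: join the first cluster whose seed word is similar enough,
    # otherwise start a new cluster
    clusters = []
    for head, vec in head_vectors.items():
        for cluster in clusters:
            if cos(vec, head_vectors[cluster[0]]) > threshold:
                cluster.append(head)
                break
        else:
            clusters.append([head])
    return clusters

# toy context vectors; the real ones are the S2-type vectors described in section 5.2
heads = {"airplane": {"engine": 4, "travel": 5, "fuel": 3, "wing": 2},
         "car":      {"engine": 4, "travel": 4, "fuel": 3, "road": 2},
         "search":   {"query": 6, "web": 5, "result": 4}}
print(cluster_heads(heads))   # [['airplane', 'car'], ['search']]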

• Step 5: This component computes the similarity between the head and the tail noun. In order to do so we used the vectors constructed as described in section 5.2. As explained there, we have built different vectors to capture the underlying nature of the meronym at hand and produced the four types of similarity values: DsimH, DsimT, NsimHT and NsimHC.

The underlying idea is that the tail nouns of the noun pairs presenting a 'Component-Integral object' or 'Part-Whole' relation have a DsimT value above a given threshold α with respect to their head nouns in their clusters, a NsimHT above a given threshold β and a NsimHC above a given threshold θ. The method used for tuning the thresholds α, β and θ is explained in section 7, when we describe the evaluation. Using this procedure reveals that terms like 'airplane' and 'car' are very similar to 'engine', while 'search' shares only a small subset of terms with the other members of the cluster (airplane-engine, search-engine, car-engine). At step 6 below we provide the production rules (if-then rules) learned from the DsimT, NsimHC and NsimHT thresholds (α, β, θ) in order to discriminate meronymic from other kinds of relations.

• Step 6: the last module identifies whether two nouns are linked via an integral component Part-Whole relation (PT-WHR) or not. To do so, the system draws on information provided by the above-mentioned modules. Given some cluster(s) (built in steps 3 and 4) and a set of similarity values (α, β, θ identified in the training corpus, step 5), the system extracts automatically a production rule: if <condition> then <action>. This latter is used to decide whether two words are linked via an integral component PT-WHR or not. In order to achieve this goal, we ran the algorithm described in steps 1 to 5 on the training corpus.

The training set contains sentences linking noun pairs entertaining a part-of relationship and sentences linking noun pairs devoid of this kind of relationship. Accordingly, noun pairs are tagged as “T” (true) if they exhibit a part_of relationship and as “F” (false) otherwise. We used as training set the text collection of SemEval (Girju et al. 2007). Since the number of integral component PT-WHR in the SemEval is very small (14), we have added more positive examples from WN (86 examples) to raise the training set to 100 positive examples.

The system groups noun pairs according to their similarity values in order to learn the optimal range of these values automatically. As explained in step 5, the system calculates the four similarity values (DsimT, DsimH, NsimHT and NsimHC) for every noun co-occurrence in the training set and takes then the range of values exhibited by the majority of part_of noun co-occurrences in the training corpus. In order to determine this range, we calculated an error rate for all possible similarity ranges obtained for all N-N co-occurrences in the training corpus and selected then the one with the lowest error rate using the procedure described in algorithm 1 (see also section 7).

Here below is a subset of the production rules created in order to discriminate meronymic relations from other kinds of relations (the values of α, β, θ are tuned in section 7):

Given the pairs of nouns as described in steps 3 and 4 here above:

If the similarity value NsimHT > β && the similarity value DsimT > α
    If the noun pairs occurred at least once as a compound noun
        Then the head noun refers to the whole and the tail to the part
Else if the average similarity value between the N and the other Ns in the cluster (NsimHC) > θ
    If one of the Ns in the cluster has S2 > NsimHT and DsimT > α
        If the N-pairs occurred at least once as a compound noun
            Then the head noun refers to the whole and the tail to the part
Else the relationship between the Ns is not a whole-part relation

The rule stipulating that the 'noun pairs occurred at least once as a compound noun' does not imply that the noun referring to the 'part' is always the second noun, and the 'whole' the first. Indeed, the two may be separated by words of another type, for example, a preposition. In this case we swap positions, the 'part' preceding the 'whole'. Both cases are handled as discussed in step 2. Having extracted the nouns for both cases, we can check whether the pair occurs at least once as a compound noun in a well-balanced corpus. For example, having extracted 'engine of car' via the method described in step 2, the system will interpret the pair as 'part-whole'.
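Expressed as a function, the decision logic looks roughly as follows; this is a simplified sketch in which the cluster-level branch of the rule is collapsed into a single NsimHC test, and the thresholds are passed in as parameters to be learned as in section 7.

def classify_pair(head, tail, DsimT, NsimHT, NsimHC, seen_as_compound,
                  alpha, beta, theta):
    # simplified rendering of the step-6 production rule
    if NsimHT > beta and DsimT > alpha and seen_as_compound:
        return {"whole": head, "part": tail}
    if NsimHC > theta and DsimT > alpha and seen_as_compound:
        return {"whole": head, "part": tail}
    return None  # not an integral-component part-whole relation

# example call with illustrative values
print(classify_pair("car", "engine", DsimT=0.82, NsimHT=0.41, NsimHC=0.6,
                    seen_as_compound=True, alpha=0.8, beta=0.4, theta=0.5))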

6.1 A Walk Through

In this section, we will explain our approach via some examples. We start with a high level presentation, followed by a more detailed description using an example extracted from real text.

The example here below illustrates the functioning of the algorithm at a high level: at step 2 the algorithm lists N-N occurrences like car-engine, train-engine, airplane-engine, benzine-engine, gasoline-engine, and search-engine. These N-N occurrences are put in the same cluster as they have the same tail noun: engine (step 3). In step 4 the cluster is further classified into three sub-clusters: cluster 1, cluster 2 and cluster 3:

Cluster 1: Vehicles [car-engine, train-engine, airplane-engine]
Comment: We have an integral component Part-Whole relation, as 'engine' is part of a holistic entity: vehicles (car, train, and airplane).
Cluster 2: Oil [benzine-engine, gasoline-engine]
Comment: 'Engine' is not part of 'oil' ('benzine' or 'gasoline').
Cluster 3: search-engine

The clusters mentioned above are created within the cluster having 'engine' as tail noun. They are identified on the basis of the similarity value of the head nouns. Since 'car', 'train' and 'airplane' have a strong similarity value, they are put in the same cluster. Likewise, 'benzine' and 'gasoline' are put into another cluster, and so is 'search'. At step 6 the system separates cluster 1 from the rest, as the vector similarity of 'engine' with 'oil' on the one hand and with 'search' on the other is below a given threshold value, while the one of 'engine' and 'vehicle' is above it.

Let us explain our approach now in more detail via an example extracted from real text. Suppose the following input :

The Japanese government decided to raise taxes for the export of Toyota cars. This is not the only problem Toyota had to face during the last few months. Indeed, the motors of their new car models having problems, the company decided to revise for free all the recently sold cars......

The POS-tagger identifies in step-1 the words' part of speech:

The/DT Japanese/JJ government/NN decided/VBD to/TO raise/VB taxes/NNS for/IN the/DT export/NN of/IN Toyota/NNP cars/NNS./ This/DT is/VBZ not/RB the/DT only/JJ problem/NN Toyota/NNP had/VBD to/TO face/VB during/IN the/DT last/JJ few/JJ months/NNS./. Indeed/RB,/, the/DT motors/NNS of/IN their/PRP$ new/JJ car/NN models/NNS having/VBG problems/NNS,/, the/DT company/NN decided/VBN to/TO revise/VB for/IN free/VBP all/PDT the/DT recently/RB cars/NNS sold/VBD.

At the next step we extract N-N co-occurrences: Toyota-car, motors-car, car-models. At step-3 we cluster these co-occurrences according to their tail noun: {[Toyota-car, motors-car], [car-models]}. At step-4, the head nouns are clustered according to their similarity value. This latter is based on the distance between the vectors of the head nouns (the nouns appearing first). This yields the following results: Toyota, motors and car. We also calculate at step-4 the cosine (similarity) of the vectors of the head nouns. Words with related vectors will be grouped in the same cluster. At step-5, we identify the similarity values (DsimT, DsimH, NsimHT and NsimHC) for the head and the tail noun as shown in the table:

This is how the vectors are built:

• The vector value is the weighted frequency of words co-occurring with 'Toyota' and 'car' (the intersection of 'Toyota' and 'car'), and 0 for words that, while not co-occurring with 'Toyota', do co-occur with 'car'. This allows us to create the vector DvecH for the head, 'Toyota'.

• The DsimH value for 'Toyota' is calculated by taking the distance (cosine) of the DvecH vector of 'Toyota' and NvecT, built on the basis of words co-occurring with 'car'.

• Likewise, the vector value is the weighted frequency of words co-occurring with 'car' and 'Toyota', and 0 for words that, while not co-occurring with 'car', do co-occur with 'Toyota'. This allows us to build the DvecT for 'car' (since 'car' is the tail in the example).

• The DsimT value for 'car' is calculated by taking the distance (cosine) of the DvecT vector for 'car' and a vector built on the basis of words co-occurring with 'Toyota'.

• The NsimHT similarity value is calculated by taking the cosine of the NvecH (of 'Toyota') and NvecT (of 'car').

Let us now show how we decide whether a relationship is of the kind ‘part_whole’ (step-6).

The rules use the similarity values of the table depicted above in order to decide whether there is a meronymic relation between the two nouns, and to determine their respective roles (which is the ‘whole’, which is the ‘part’). This is how the rule works. DsimH is 0.74 for ‘Toyota’ and 0.13 for ‘car’, NsimHT being 0.10. Likewise, DsimT is 0.82 for ‘motor’ and 0.52 for ‘car’, the value of NsimHT being 0.40.

Assume that N1 and N2 are respectively the first and the second noun. The production rule checks now the similarity values against the threshold learned from the training set, the thresholds being the ranges of the similarity values exhibited by most of the meronyms in the training set.

if (DsimT >= 0.8 and NsimHT >= 0.4) then print: N1 <part>; N2 <whole>
if (DsimH >= 0.8 and NsimHT >= 0.4) then print: N2 <part>; N1 <whole>

In the 'Toyota-car' co-occurrence, 'Toyota' and 'car' are respectively N1 and N2. Substituting the values in the rule would yield:

if (0.74 >= 0.8 and 0.10 >= 0.4) then print ('Toyota' is the <part> and 'car' is the <whole>)
if (0.13 >= 0.8 and 0.099 >= 0.4) then print ('car' is the <part> and 'Toyota' is the <whole>)

Since none of the above apply, the relationship between the nouns is not of the meronymic kind. Let us do the same for 'motor-car':

if (0.821 >= 0.8 and 0.40 >= 0.4) then print ('motor' is the <part> and 'car' is the <whole>)

The condition stated in the rule is satisfied by the similarity values of the noun pair. Hence, we do have a meronymic relationship, with 'motor' being the <part> and 'car' being the <whole>.

if (0.51 >= 0.8 and 0.402 >= 0.4) then print ('car' is the <part> and 'motor' is the <whole>), which is false.

The steps just described are performed for all N-N co-occurrences in the paragraph.

6.2 Identification of the Links’ Senses

The concepts and the links holding between them are thus extracted from the corpus as explained above. However, there is one other problem that needs to be addressed. A word may be polysemic, that is, it may have several meanings. For example, the word-form (lemma) ‘mouse’ may stand for a ‘rodent’ (animal) or a ‘computer device’.

Likewise, the noun ‘table’ has various senses. WN17 lists among others the following four:

• S1 (n) table, tabular array (a set of data arranged in rows and columns). Example: ‘mathematical table’

• S2 (n) table (a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs). Example : ‘it was a sturdy table’

• S3 (n) table (a piece of furniture with tableware for a meal laid out on it). Example: 'I reserved a table at my favorite restaurant'

• S4 (n) table (a company of people assembled at a table for a meal or game). Example: 'He entertained the whole table with his witty remarks.'

17 http://poets.notredame.ac.jp/cgi-bin/wn

In order to identify the senses, we start by listing all the parts of the concepts and then cluster the extracted parts on the basis of the cosine value between their vectors constructed from their n-grams. Polysemous words, that is concepts/words with several senses, will have several clusters. The links/associations holding between the concepts are marked on the basis of their senses. Hence, the link between two concepts encodes two types of information: the nature of the semantic relationship and the sense. In our current version we have only one type of relation, i.e. meronymy, and the senses are not labeled semantically.

The senses are learned from the number of clusters built on the basis of the parts of the concepts. For example, 'table' has the parts: columns, rows, legs, tabletop and tableware. The cosine value of each part is compared with all other parts to identify the clusters. To this end we used the k-means clustering technique.18 In our 'table' example, 'column and row' on the one hand and 'leg, tabletop and tableware' on the other are grouped together given their respective vectors.
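A small sketch of this clustering step, using scikit-learn's k-means implementation; the toy part vectors are invented for illustration, the real ones are derived from the parts' n-gram co-occurrences as described above.

import numpy as np
from sklearn.cluster import KMeans

# toy co-occurrence vectors for the parts of 'table'; the feature dimensions
# are invented (they might correspond to context words such as 'data', 'cell',
# 'wood', 'chair', ...)
parts = ["column", "row", "leg", "tabletop", "tableware"]
vectors = np.array([
    [9, 8, 0, 1, 0],   # column
    [8, 9, 0, 0, 1],   # row
    [0, 1, 7, 6, 2],   # leg
    [1, 0, 6, 8, 3],   # tabletop
    [0, 1, 2, 4, 9],   # tableware
], dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
for label in sorted(set(kmeans.labels_)):
    members = [p for p, l in zip(parts, kmeans.labels_) if l == label]
    print(f"sense cluster {label}: {members}")
# expected grouping: {column, row} vs. {leg, tabletop, tableware}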

Like others (Rapp, 2004; Diab & Resnik, 2002; Kaji, 2003; Pinto et al., 2007) we use a clustering method to identify senses. However, our task is narrower than theirs, as the clusters are formed only from a small set of words, associated with a given word at a time. Also, we have considered meronymic word senses only, i.e. senses that affect PT-WHRs. The extracted wholes and their parts are organized into a network.

Figure 3. Sample of the semantic map for two senses

18 http://en.wikipedia.org/wiki/K-means_clustering

Concepts are organized hierarchically, i.e. going from the whole to its parts. For example, 'tooth' is part of 'gear', which is part of an 'engine', which is part of a 'car'. In this case, 'car' is the root. Concepts which are parts of several concepts are connected via several links. For example, 'engine', being part of 'car' and 'train', has two incoming links (see figure 4).

7. Evaluation

7.1 Introduction

We have tested our system for its ability to extract PT-WHRs by using the text collection of SemEval (Girju et al. 2007) and some examples of our own taken from WN. The SemEval test set is POS-tagged and annotated in terms of WN senses. The corpus has positive and negative semantic relations. The part-whole relations extracted by the system were validated by comparing them with the valid relations labeled in the test set answer key. The format of the test set is described in the sample here below:

"Some sophisticated <e2>tables</e2> have three <e1>legs</e1>."
WordNet(e1) = "n3", WordNet(e2) = "n2"; Part-Whole(e1, e2) = "true"

This format has been defined by Girju et al. (2007). Since it does not correspond to a real text format, we have changed the corpus accordingly, to obtain the following text: "Some sophisticated tables have three legs". To evaluate the performance of our system we used the 'precision', 'recall' and 'F-measure' metrics defined below.

As the number of Component-Integral object part-whole relations is small in the SemEval test set (9 examples), we have added some meronyms (41 examples) from WN. The resulting number of meronymic relation pairs (50 examples) now accounts for 50% of our test set. The other 50% of the test set contains negative examples coming either from WN or from the SemEval test set (all of them).

Figure 4. Sample of the semantic map showing multiple links

We defined 'recall' as the percentage of correctly retrieved relations out of the correct relations available in the test set, while 'precision' is for us the percentage of correctly retrieved relations out of the retrieved relations. We obtained 80% for precision, 74% for recall and 77% for the F-measure. The PT-WHRs extracted by the system were validated by comparing them with the valid relations labeled in the test set answer key. Each test set has an answer key, which allowed us to count (manually) the correctly retrieved relations.
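Spelled out, these definitions correspond to the standard formulas (our rendering):

recall = |correctly retrieved relations| / |correct relations in the test set|
precision = |correctly retrieved relations| / |retrieved relations|
F-measure = 2 · precision · recall / (precision + recall)

With the reported precision (80%) and recall (74%) the F-measure indeed comes out at 2 · 0.80 · 0.74 / (0.80 + 0.74) ≈ 0.77.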

All the encountered errors are hyponym and attribute relations. However, this does not imply that all hyponyms in the test set are incorrectly retrieved as part-whole relations. Actually, only 12% of them are incorrectly retrieved as such.

7.2 Comparison with Other Systems

We compared the performance of our system against state-of-the-art approaches. This being said, we could not perform a direct comparison of our approach with other systems for the following reason. As already discussed in section 5.2, vector space model approaches are widely used to measure the degree of similarity between linguistic items (words, sentences, paragraphs, documents). However, in our case, we used them to measure relatedness (the meronymic relation). Nevertheless, we were able to compare the similarity values produced by our model against the similarity values produced by three other state-of-the-art vector space models. To this end we selected three approaches, one based on distributional models like LSA (Stefanescu et al., 2014) and the other two based on lexical resources, ADW (Taher and Navigli, 2015) and NASARI (Camacho-Collados et al., 2015). The descriptions of these approaches are provided in section 4.2. Table 6 (here below) shows the similarity values of 30 meronymic pairs produced by our algorithm as opposed to the other three vector space models.

As can be seen from table 6, our approach produced better similarity values for the meronymic pairs than the other three models. All the pairs in the table are meronymic pairs. Hence strong similarity values are expected for all of them. Indeed, all systems recognized them, but our system scored best, producing similarity values greater than 0.6 for 70% of the meronymic pairs, followed by NASARI, which produced similarity values in this range (i.e. 0.6-1.0) for 50% of the meronyms. The average similarity values of the meronyms produced by the respective approaches are presented in the last row of the table. Again, our approach provided the best result (0.69), followed by NASARI (0.58).

Concerning the extraction of meronymic relations we have also compared our approach with ADW, LSA, and NASARI. To this end, and in order to be consistent, we have harmonized the parameters α, β, θ for the three approaches, following the same procedure as the one taken in our approach. The parameters’ values of the respective approaches are presented in table 7. Next we applied the same evaluation steps for all the approaches on the test set. The overall performances (precision, recall and f-measure) of our approach and the other three are presented in table 8. As can be seen, our approach achieved again good results compared to its competitors.

We have also evaluated the performance of the system with respect to determining the senses of a concept. To do so we used the clustering technique described above. Word forms expressing several senses have several clusters. We evaluated the results against the gold standard of meronymic word senses taken from WN (Miller, 1990).

Our clustering is based on the distance between the vectors of the parts of a given concept. We defined precision as the percentage of words assigned to their actual WN meronymic senses out of the total number of words assigned to the output clusters. Recall is the ratio of words assigned to their actual WN meronymic senses out of the correct sense assignments available in the test set. We achieved the following results: precision (89%), recall (86%) and 87.47% for the F-measure.

Table 6. The similarity values of selected noun pairs

Parameter tuning: the thresholds (α for the tail, α for the head, β, θ) for the similarity values (DsimT, DsimH, NsimHT and NsimHC respectively) are tuned according to the following procedure, where x refers to the number of 'negative relations retrieved' and y to the 'number of positive relations excluded'.

Suppose a corpus containing six N-N occurrences of which the first three are negative and the remaining three are positive examples. Suppose further that the noun pairs have respectively the following values for DsimT (0.2, 0.3, 0.6, 0.8, 0.85, 0.9) and NsimHT (0.1, 0.3, 0.4, 0.45, 0.5, 0.55). In this case our algorithm yields 0.8 and 0.4 for α and β respectively, since all the positive meronymic examples here above have a DsimT equal to or greater than 0.8 and a NsimHT greater than 0.4.

We may now get to the last part of this paper. Again, the goal is to build a resource allowing people to overcome the tip-of-the-tongue problem with the help of a computer. Before presenting our ideas of how to go beyond WN, we would like to draw the reader's attention to the role knowledge plays in this dictionary of the future. To this end we try to clarify what we mean by knowledge and why it is so important, in particular for the design of the tool we have in mind.

Table 7. The tuned values of α, β, θ for the respective approaches

Table 8. Performance evaluation of our approach compared to ADW, LSA and NASARI


Algorithm 1. Parameter tuning

Input: DsimT, DsimH, NsimHT
Output: α, β
α ← ∅
β ← ∅
fm (minimum f-measure) ← ∅
for DsimT/DsimH = 0; DsimT/DsimH <= 1; DsimT/DsimH += 0.05
    for NsimHT = 0; NsimHT <= 1; NsimHT += 0.05
        x ← the percentage of negative relations in the range
        y ← the percentage of positive relations outside the range
        fs ← f-measure of x and y
        if (fs < fm)
            fm ← fs
            α ← DsimT/DsimH
            β ← NsimHT
return α, β
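Read as a grid search, Algorithm 1 can be sketched in Python as follows. The exact scoring of a candidate threshold pair (the 'f-measure of x and y' above) and the tie-breaking between equally good candidates are not fully specified, so the combined error score and all helper names below are our assumptions.

def tune_thresholds(pairs, step=0.05):
    # grid search over candidate thresholds: alpha for DsimT/DsimH, beta for NsimHT;
    # pairs is a list of (dsim, nsimht, is_positive) tuples from the training set
    positives = [(d, n) for d, n, pos in pairs if pos]
    negatives = [(d, n) for d, n, pos in pairs if not pos]
    best, best_err = (0.0, 0.0), float("inf")
    steps = int(round(1.0 / step)) + 1
    for i in range(steps):
        a = i * step
        for j in range(steps):
            b = j * step
            # x: negative pairs wrongly falling inside the candidate range
            x = sum(1 for d, n in negatives if d >= a and n >= b) / max(len(negatives), 1)
            # y: positive pairs wrongly falling outside the candidate range
            y = sum(1 for d, n in positives if not (d >= a and n >= b)) / max(len(positives), 1)
            err = x + y   # assumption: combined error score stands in for the paper's measure
            if err < best_err:
                best_err, best = err, (a, b)
    return best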

8. Lexical Access, a Knowledge-Based Process

Like many other tasks, language (processing) is strongly knowledge-based. Obviously, before being able to use any word we must have acquired it. It is only then that it becomes part of our knowledge. Ubiquitous as it may be, lexical knowledge can mean many things.

1° Declarative knowledge is what we acquire when learning words (meaning, form, spelling, grammar, usage), and this is also what is generally encoded in dictionaries.

2° Next, there is metaknowledge, which also needs to be acquired. Though hardly ever taught, it plays a great role in word access. Being generally unavailable for inspection (or introspection), metaknowledge reveals itself nevertheless in various forms. For example:

(a) when we fail to access a word. There are generally two cases. Either we don't know the word, or we do, but fail to access it when needed. In the first case we have not learned the form. Hence, it is normal that we cannot produce it. Despite our ignorance, we do know something, namely that we do not know this word. In general people know which words are part of their vocabulary. For example, we all know that we don't know 'akdelbret' which, while sounding like a possible word, never entered our lexicon, because we have never encountered it.

Failing to produce a known word implies different kinds of knowledge. We know not only that we know the word, but we know a lot more than that. People in the TOT state know, for example, when and where they've used the word for the last time, its first and last syllable, etc. All this is metaknowledge.

(b) via the query we provide when contacting the lexicon. Unlike in reading, where we know the word about which we ask for additional information, we cannot assume this to always be the case in speaking or writing. Of course, there are situations where we do know the form, in which case we consult the dictionary mainly for confirmation or for additional information (spelling, grammar, usage). The situation is entirely different though if we cannot retrieve the form for which we want additional information. In this case we have to find it first, and this can only be done indirectly, i.e. via a somehow related form.

While in a fully connected lexical graph all words are accessible from anywhere, experience shows that most people provide inputs closely related to the target. Indeed, it is very rare that the distance between the source and the target is greater than 2 or 3 steps. In other words, people have knowledge concerning the organization of their lexicon. This kind of metaknowledge or, in this case, relational knowledge, is nicely exploited by WN which offers the user the possibility to start search from a direct neighbor of the target word, inviting him then to specify via link choice which specific neighbor he is looking for. This second step is necessary, as an input may yield various (direct) neighbors, and since the system cannot guess which one the user is looking for, it is up to him to convey this information by making a choice among the various links (see also section 2.1).

As we have seen, this works quite well (section 2.2), but only because the user knows how words are related. What are the direct neighbors of a given word? Interestingly, users know more than that. They seem to have even some knowledge, vague and subconscious as it may be, about the neighbors of the neighbors. Indirect neighborhood is evidenced by the fact that, in case of failure in the first round (none of the system's answers corresponding to the user's goal, i.e. the target), the user knows with which of the proposed candidates to continue the search. Put differently, he knows which of the proposed candidates is most closely related to the target. Suppose that the entry was 'black' and the system answered with 'night, charcoal, coffee'; then chances are that the user would pick 'coffee' if his target were 'espresso', as 'coffee' is closer to 'espresso' than 'night' or 'charcoal'.

3° The last kind of knowledge, called knowledge state, refers to the knowledge that is activated at a given point in time, in our case, the very moment of launching a search. What has been primed? What is available in the user's mind? Not all information stored in our mind is equally available or prominent at any time. The fact that people's knowledge states vary is important, as it co-determines the way a user proceeds in order to find the information he is looking for. This being so, it should be taken into consideration by the system designer.

9. WordNet and Beyond

9.1 Limitations of WN

Having shown how a subset of WN could be produced automatically, we will now get back to word access, our starting point. Lexical access basically means the following: reduce the entire lexicon to a single word (the target), whatever the size of the resource may be. Obviously, this kind of reduction should be performed quickly and naturally, requiring as little time and effort (minimal number of steps) as possible on the user's side.

We have touched upon the issue of lexical access for language production, discussing the conditions under which WN is a good resource for consultation. We found that it was actually very good under certain conditions, implying that there are also cases where it will not provide an optimal solution. This is likely to occur when:

(a) The input ('play') and the target ('tennis') belong to different parts of speech. While Roget groups under one heading all concepts typically used in a given domain, WN scatters them throughout the resource. "Tennis players are in the noun.person file, tennis equipment is in noun.artifact, the tennis court is in noun.location, the various strokes are in noun.act, and so on." (Miller in Fellbaum, 1998: 34). Chaffin called this the 'tennis problem' (Fellbaum, 1998: 10). The fact that typically related words are not contained in WN as such produces a number of side-effects. In particular, it severely limits WN's potential to reveal a word, even though it contains it. Indeed, due to the absence of syntagmatic links we can use as (query) words only those belonging to the same POS category as the target word. Hence, we cannot use 'eat' or 'yellow' to access 'knife' or 'banana'. Yet, in both cases the noun is frequently associated with the verb ('eat') or the adjective ('yellow'). For example, 'banana' and 'yellow' have the following rankings in E.A.T. (http://www.eat.rl.ac.uk/cgi-bin/eat-server), an association thesaurus built in Edinburgh: banana-yellow (3rd position); yellow-banana (5th position). The problem could be overcome if 'eat' or 'yellow' were contained in the glosses of 'knife' or 'banana', which in our example is not the case. Storage does not guarantee access, neither for humans (see the TOT studies, Brown & McNeill, 1966), nor for machines. Indeed, we may fail to find a word in WN even though it contains it and even though the input is clearly a related word in the real world. For example, we tried 'wine' and 'harvest' to find 'vintage', but we failed, while we succeeded easily via a lexical resource bootstrapped from Wikipedia because of its numerous syntagmatic links (for details see Zock & Schwab, 2016). Hence, even though close, we may fail to find the target. As a result, we may turn in circles and end up forgetting our goal, as we sometimes do when using Google. Not knowing where we are, nor which direction to go, we may be forced to give up sooner or later. The absence of syntagmatic links also reduces considerably the number of words among which to choose, as well as the number of words via which a target can be accessed. In consequence, an input will generally yield only a small set of words among which to choose. This set may be too small to contain the target or any of its direct neighbors so that we could proceed from there. Of course, the WN community is very well aware of these problems, but despite the number of proposals made to overcome the 'tennis problem',19 more work is needed.

(b) The source (input) and the target are only indirectly related, the distance between the two being bigger than 1. This would be the case when the target ('Steffi Graf') cannot be found directly in response to some input ('tennis player'), but only via an additional step, say, 'has instance': [tennis player] → [has instance] → [Steffi Graf]. Note that this is a potential problem for any association network. Note also that, even though Named Entities (NEs) are generally not contained in a lexicon, some of them have made it into WN. This is the case for some sports champions.

(c) The prime and the target are linked via a syntagmatic association (‘smoke’-’cigar’). Since the majority of relations used by WN connect words from the same part of speech, word access is next to impossible if the output (target) belongs to a different part of speech than the input (prime). See also point ‘a’, here above.

(d) The user does not know the link, he cannot name it, or the link is not part of WN's repertory. This holds true (at least) for nearly all syntagmatic associations. Note that this problem does not arise if navigation is possible via other criteria, for example, general categories as in Roget's Thesaurus. See also our proposal further below.

Since our network (sections 5-6) has basically the same characteristics as WN, it is subject to the same kind of criticisms. This being so, let us see how to go beyond this. To this end we present here briefly a navigational tool meant to help authors to overcome the TOT problem.

9.2 Some Ideas of How to Go Beyond WN

To find something deliberately at least two conditions must be fulfilled: existence and findability.

19 Gliozzo & Strapparava, 2008; Boyd-Graber et al. 2006; Rayson et al. 2004; Lofberg et al. 2004; Steuten et al. 2001; Agirre et al. 2000; Harabagiu & Moldovan, 1998; Hirst & St-Onge, 1998; Al-Halimi & Kazman, 1998; Bentivogli et al. 2004. See also the work of the group in Trento (http://wndomains.fbk.eu) who mapped WordNet Domains to Wikipedia categories.

The first refers to the fact that the target one is looking for exists. Did we learn and store the word we are looking for? Humans acquire vocabulary incrementally. Yet, lexical competency is never complete. Hence, some words may be unavailable at the moment of search. We start here from the assumption that the word seeker has acquired the word he is looking for. He just is not able to access it at the very moment of speaking or writing.

The second condition, findability, refers to the fact that one has an effective method of organizing and indexing words in order to find them when needed. Existence does not guarantee findability. Something may be there, yet we cannot find it. One reason for this may be that we are drowned in information. The target is there, but hidden like a needle in a haystack. Think of the endless flat lists that you get when searching with Google. Another reason for our failure may be that we receive too much noise, i.e. information unrelated to our goal. Again, browsing the web can serve as an example.

One other important factor contributing to the success of search is knowledge, or, more precisely, the cognitive state we are in when initiating the search. This point is important as it co-determines the strategy we decide upon in order to carry out the search. In what (query) terms should we ask the question? How should we describe the problem? etc. The TOT problem nicely illustrates the variability of knowledge states. As studies have shown (Brown, 1991), people in this state always know something concerning the target word, but what is known varies considerably: the meaning, form (syllables: beginning or end), origin, related words, etc.

The TOT-problem is a bit like a nearly completed puzzle. It has everything apart from some elements that are still missing. In the case of a puzzle we can provide these parts if ever we know the complete picture. Otherwise we are condemned to trial and error. In the case of word finding the situation is different. We cannot provide the missing parts of a word whose form is eluding us. Not being able to name the target,20 we do not know what the full picture looks like. This being so, we cannot provide the lacking part(s), all the more as we don't even know which parts precisely are missing.

20 Because, if we could, there wouldn't be a search problem to begin with.

A similar problem arises for connectionist networks. Psychologists working within this paradigm ascribe word-access failure to a lack of energy of some links (Levelt et al. 1999; Dell, 1986). Yet, just as in the case of our word puzzle where we cannot provide the missing parts, we cannot 'heat' the energy-lacking links of a word built according to connectionist principles. In both cases we do not know the final result (target), as this is precisely what we are looking for. Hence we do not know what to supply or what links, i.e. what phoneme or sequence of phonemes, to heat.

Psychologists studying the mental lexicon have used connectionist networks to simulate human performance (speed, accuracy, i.e. errors). The results are truly impressive. Alas, the method seems to have certain flaws, making it inapplicable to our goal. Starting from the assumed result (target word), psychologists have tuned their systems accordingly. Yet, how can one assume the target to be known, while claiming to find it? Another reason why we cannot use this approach is linked to representation. Connectionists represent data via numbers (0-1) which are generally meaningless for people, who prefer symbols. It is via a category name that we know that the bag called 'fruit' may contain 'pears', 'bananas' or 'apples', but certainly not 'cutlery', 'pens', or 'shoes'.

This being said, even if we cannot directly transpose the methods used by psychologists, functionally speaking (word finding) we can achieve something similar. Note that our focus is not on the time course of lexical access, but rather on navigation in a lexical network. Hence access is deliberate rather than automatic, it is interactive and fairly slow. Let us now turn to describe certain features of our tool for the future. The goal is to help humans to overcome the TOT problem. To this end we plan to build a map, i.e. a densely populated associative network, and a tool to support navigation within the map, i.e. a categorial tree. Both are meant to be built automatically.

As just mentioned, the map is basically an association network. Words (nouns, verbs, adjectives) are connected if they evoke each other. Put differently, (directly) associated terms are (direct) neighbors. Since this process is recursive (our neighbors having other neighbors, 'who' in turn also have different neighbors, etc.) we end up with a huge, fully connected lexical graph. This connectivity has various consequences. Since all words are connected, we can enter the graph at any point, all words (including the target) being accessible from anywhere, at least in theory (see below).

Flexibility is a crucial feature of our future resource. It is important, as there is no way to predict the user's knowledge state, which may vary considerably and in unpredictable ways with respect to a given target. It varies from person to person and from moment to moment, either because of topic changes in discourse (bank: geography vs. finance) or because of changes in the real world.21 Just think of some striking event that has happened in the recent past (Greece, Syria, Charlie/Paris). Given this fluctuation and the high unpredictability of knowledge states, it is advisable not to build systems with a single (universally valid) starting point.22 Quite to the contrary. What we need is a resource allowing the user to initiate search from any point, providing whatever information is available at that very moment. While the map (association network) defines the space within which search takes place, the input (starting point) reveals the knowledge available at the onset of search.

Obviously, for words to be accessible they must be part of the network, but this is not enough. Findability requires existence on the resource side and quality of input on the user’s side. Not all queries are equal. Direct neighbors are better query terms than unrelated or only remotely related terms. Another point is population. The denser the graph, i.e. the greater the number of words linking to a target, the greater its accessibility. Again, not all links are equal. For an association to be useful it must be shared by a large number of users.

21 This being so, associations change accordingly. Just think of the ideas associated with Dominique Strauss-Kahn, one of the top candidates during the last presidential campaign in France. While the associations prior to May 18, 2011 were probably IMF, Anne Sinclair, or politics, the ones after the Sofitel event were probably quite different, shifting from the initially primed 'election' towards a similar sounding, though, semantically speaking, quite different word.

22 For example, a single lexical tree, covering all the words of a language. This kind of approach has been attempted in the early days of language generation (Goldman, 1975). For a criticism, see Zock et al. (2010).


Since our graphs are dense, most nodes being connected with many other nodes, we need a tool to travel within this space without losing sight of our goal (avoiding loops), while keeping things under control and getting as quickly as possible from some input to the desired target word (final output). The tool we will present further below is a categorial tree, which plays the same role as signposts do in a city or on the road: they help the user to decide on the direction to go.

9.3 The Roadmap

Let us now see quickly how to make this idea work. Imagine an author wishing to convey the name of a beverage commonly found in coffee shops (target : ’mocha’). Failing to do so, he reaches for a lexicon. Since dictionaries are too huge to be scanned from cover (letter A) to cover (Z), we suggest a dialog between the user and the computer to reduce incrementally the search space.

At the engineering level two steps need to be performed: (a) creation of a resource within which search takes place (a semantic network, i.e. an association network), and (b) a method of organizing the list of words given in response to the user's input. This second step is vital in order to speed up navigation.

The dialogue between the user and the system works as follows. The process is initiated via some input (query),23 which leads to the activation of a list of directly associated words. Since this list is too big to be scanned from beginning to end, we propose to present it as a categorial tree whose leaves are the words directly associated with the input (potential target words), and whose nodes are the category names of the words they subsume: 'cats' and 'dogs' are subsumed under the category 'animal', 'tree' and 'flower' under 'plants', etc. For example, the following input {Afghanistan, Africa, Asia, birds, black, brown, China, cows, curry, dog, Japan, Pakistan, Persia} could yield the following output: color (black, brown, curry); animals (cows, birds, dog); countries (Afghanistan, China, Japan, Pakistan, Persia); continents (Africa, Asia). Note that, unlike WN, which refrains from using hypernyms for adjectives, we use the term 'color' as a category name.

23 This latter can be a single word ('coffee' in the case of target 'mocha') or a set of words, which in a normal communicative setting would yield a sentence, where the information seeker asks someone else to help him to find the elusive word.

As one can see in figure 5 (next page), the process basically consists of three steps: (a) user input (query), (b) system output (answer), (c) the user's choices concerning the target (does the list contain it?), or choice of the word to continue the search with. Concretely speaking, this leads to the following kind of dialogue. The user starts by providing her input, that is, any word coming to her mind, a word somehow connected to the target (U1, figure 5).24 The system then presents in a clustered and labeled form (categorial tree) all direct associates (I2, figure 5).25 The user navigates in this tree, deciding on the category within which to look for the target (U2-3, figure 5), and if he cannot find it in any of them, in what direction to go (U4, figure 5). If he can find the target, the search stops; otherwise the user will pick one of the associated terms or provide an entirely new word and the whole process iterates. The system will come up with a new set of proposals.

As one can see, this method is quite straightforward, considerably reducing the time and space needed for navigation and search. Suppose that you had to locate a word in a resource of 50,000 words. If your input triggered 100 direct associates, one of them being the target, then we would have reduced the search space by 99.8% in a single step, limiting navigation and search to a very small list. Suppose that our hundred words were evenly spread over 5 groups; then search would consist in spotting the target in a list of 25 items: 5 being category names and 20 being words within the chosen group.
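A toy sketch of the clustering-and-labeling step behind such a categorial tree, using the example word list given above; the hard-coded category lookup is a placeholder for the resources (WN, Roget, NEs, etc.) mentioned in figure 5.

from collections import defaultdict

# hypothetical category lookup; in practice the labels would come from WN,
# Roget's Thesaurus, a named-entity gazetteer, or a combination of resources
CATEGORY = {"black": "color", "brown": "color", "curry": "color",
            "cows": "animal", "birds": "animal", "dog": "animal",
            "Afghanistan": "country", "China": "country", "Japan": "country",
            "Pakistan": "country", "Persia": "country",
            "Africa": "continent", "Asia": "continent"}

def categorial_tree(associates):
    # group the words directly associated with the input under labeled categories,
    # so that the user scans category names first and their members second
    tree = defaultdict(list)
    for word in associates:
        tree[CATEGORY.get(word, "other")].append(word)
    return dict(tree)

associates = ["Afghanistan", "Africa", "Asia", "birds", "black", "brown",
              "China", "cows", "curry", "dog", "Japan", "Pakistan", "Persia"]
print(categorial_tree(associates))
# groups the 13 associates under 'country', 'continent', 'animal' and 'color'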

24 Note that, in order to determine the initial search space properly (step-1), we must already have disambiguated the input [mouse1/mouse2 (rodent/device)]; otherwise our list will contain a lot of noise, presenting 'cat, cheese' together with 'computer, mouse pad' {cat, cheese, computer, mouse pad}, which is not quite what we want, since some of these candidates are irrelevant, i.e. beyond the scope of the user's goal.

25 This labeling is necessary to allow for realistic navigation, as the list produced in response to the input may be very long and words of the same kind may be far apart from each other in it. Hence it makes sense to structure words into groups with appropriate (i.e. understandable) names so that the user, rather than scanning the entire list of words, searches only within a specific bag labeled by a category.


[Figure 5. Lexical access as a three-step dialogue: provide input, navigate, then choose among the possible outputs. A: entire lexicon (a hypothetical lexicon containing 60,000 words). B: reduced search space; given some input (here, 'coffee'), the system displays all directly associated words, i.e. its direct neighbors in the graph (E.A.T., collocations derived from corpora), ordered by some criterion or not. C: categorial tree designed for navigational purposes (reduction of the search space); the leaves contain potential target words and the nodes the names of their categories, so that the user, rather than going through the whole list of words, navigates the tree (top-to-bottom, left-to-right), choosing first the category and then its members, to check whether any of them corresponds to the desired target word ('mocha'). D: chosen word. Pre-processing: ambiguity detection via WN, interactive disambiguation (coffee: 'beverage' / 'color'?). Post-processing: ambiguity detection via WN, disambiguation via clustering. Clustering and labeling: via computation, via a resource, or via a combination of resources (WN, Roget, NEs, etc.). S: system, U: user, I: interaction.]


9.4 Some Ideas and Challenges for Building this Resource

Concerning step-1, we could use the E.A.T. (Edinburgh Associative Thesaurus) to respond to the input with all directly associated terms. This would considerably reduce the initial search space, i.e. the entire lexicon. Obviously, resources other than the E.A.T. are possible, and this is precisely one of the points we would like to experiment with in the future: checking which knowledge source (corpus, association thesaurus, lexical resource) produces the best set of candidates, i.e. the best search space and the best structure to navigate.
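
By way of illustration, here is a minimal sketch of such a step-1 lookup. The data structure is an assumption: we suppose the association thesaurus has been preprocessed into a mapping from a cue word to its responses with their association strengths; the values shown simply mirror the 'coffee' example of figure 5.

```python
# Hypothetical, preprocessed association table (cue -> {response: strength}).
eat = {
    "coffee": {"tea": 0.39, "cup": 0.07, "black": 0.05, "espresso": 0.04,
               "cappuccino": 0.02, "milk": 0.02, "mocha": 0.01, "drink": 0.01},
}

def direct_associates(cue, min_strength=0.0):
    """Return the cue's associates, strongest first, as the initial search space."""
    responses = eat.get(cue, {})
    ranked = sorted(responses.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, strength in ranked if strength >= min_strength]

print(direct_associates("coffee"))   # ['tea', 'cup', 'black', 'espresso', ...]
```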

Since our plan is to extract associations from corpora, it is important to note that we are looking for typically associated (related/evoked) terms. Hence, extracting all co-occurrences from a given corpus is not a good solution. Take for example the Wikipedia page on 'pandas' (https://en.wikipedia.org/wiki/Giant_panda). Of all the words contained in this document we are interested only in "bear, China, bamboo; black and white patches, diplomatic gift; ...", as they are likely to evoke the word 'panda'. The rest is mainly noise. As one can see, only very few words are really relevant for our task, and the challenge is to make sure to get precisely those.
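
One generic way to keep only typically associated terms, rather than all co-occurrences, is to score candidate pairs with an association measure such as pointwise mutual information (PMI) and to retain only the top-scoring ones. The sketch below is merely an illustration of this idea, not a commitment to a particular measure; the counts are assumed to come from a large corpus.

```python
import math

def pmi(pair_count, w1_count, w2_count, total):
    """PMI = log2( p(w1, w2) / (p(w1) * p(w2)) ), estimated from raw counts."""
    p_xy = pair_count / total
    p_x, p_y = w1_count / total, w2_count / total
    return math.log2(p_xy / (p_x * p_y))

def salient_associates(target, cooc_counts, word_counts, total, top_n=10):
    """Rank the words co-occurring with 'target' by PMI and keep the best ones."""
    scored = [(w, pmi(c, word_counts[target], word_counts[w], total))
              for w, c in cooc_counts[target].items()
              if c > 1]                      # ignore one-off co-occurrences
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_n]
```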

Concerning the output, the challenge is mainly how to cluster terms and how to give them appropriate names, i.e. names ordinary people can understand. For example, WN lists the following sequence of hypernyms for horse (horse → equine → odd-toed ungulate → ungulate → placental mammal → mammal → vertebrate → chordate → animal → organism → entity). While this list captures many of the phylogenetic details a biologist would want to see recorded, most of these terms mean next to nothing to an ordinary dictionary user. Metalanguage is definitely a problem, and this is also one of the reasons why we cannot rely on Roget's Thesaurus, which otherwise is fairly close to what we want.
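
To illustrate the point, the hypernym chain above can be reproduced with NLTK's WordNet interface, together with a crude, purely illustrative heuristic for picking a label an ordinary user might understand (here: the deepest hypernym not exceeding a fixed depth). Choosing good category names automatically is, of course, exactly the open problem.

```python
from nltk.corpus import wordnet as wn   # requires nltk and its 'wordnet' data

horse = wn.synsets("horse", pos=wn.NOUN)[0]      # horse.n.01
chain = horse.hypernym_paths()[0]                # entity -> ... -> horse
print(" -> ".join(s.lemma_names()[0] for s in chain))

def readable_label(synset, max_depth=6):
    """Pick the deepest hypernym on the first path whose depth is <= max_depth."""
    path = synset.hypernym_paths()[0][:-1]       # exclude the synset itself
    candidates = [s for s in path if s.min_depth() <= max_depth]
    return (candidates[-1] if candidates else path[0]).lemma_names()[0]

print(readable_label(horse))                     # e.g. 'animal'
```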

While the creation of the network (system, step-1) and the navigational tools (system, step-2) are not trivial tasks, the solution of the latter is certainly more of a challenge than the former (step-1), which has been solved to a large extent (see the work on the extraction of co-occurrence lists). Again, putting words into clusters is one thing; giving them names an ordinary dictionary user can understand is quite another. Yet, arguably this is a


crucial step, as it is on this basis that the user navigates. Of course, one could question the very need for labels, and perhaps this is not too much of an issue if we have only, say, 3-4 categories. We are nevertheless strongly convinced that the problem becomes real as soon as the number of categories (hence of words to be classified) grows.

To conclude, we think it is fair to say that the first stage is within reach, while the automatic construction of the categorial tree remains a true challenge despite the vast literature devoted to this topic or to strongly related problems (Zhang et al., 2012; Biemann, 2012; Everitt et al., 2011).

10. Conclusion

We started the paper by asking whether WN could be used to help authors (speakers, writers) find an elusive word. It appears that, even though it was not really conceived for this task, it is quite good under certain circumstances. Next we described a method to build a subset of WN automatically, by considering meronyms. In the last section we discussed the conditions under which WN is not very good, presenting a proposal of how to go beyond it. The idea was to build a semantic map automatically, i.e. a densely populated association network (step-1), and a categorial tree (step-2) based on the outputs produced in response to some input (query).

The main challenge ahead of us is the extraction of salient co-occurrences from a corpus, i.e. words likely to evoke a potential target word (step-1). For example, terms like 'bear, China, bamboo, black and white patches, diplomatic gift', etc. seem very appropriate to evoke the word 'panda'.

Since any input (query) is likely to yield many outputs (think of Google), the user may get drowned if we present the result as a flat list. To make the task more manageable we propose to cluster the outputs and to give a name to each group, thus allowing the user to navigate in a categorial tree rather than in a huge unstructured list of words. This reduces the cognitive load while increasing the efficiency of control and search.


References

Agirre, E., Ansa, O., Martinez, D., & Hovy, E. 2000. Enriching very large ontologies with topic signatures. Proc. of the workshop on Ontology Learning held in conjunction with ECAI.

Aitchison, J. 2003. Words in the Mind: an introduction to the mental lexicon. Oxford, Blackwell.

Al-Halimi, R., & Kazman, R. 1998. Temporal indexing through lexical chaining. In C. Fellbaum (ed.), WordNet: An electronic lexical database. Cambridge, MA: MIT Press, 333–351.

Atserias J., Climent S., Farreres X., Rigau G., & Rodriguez H. 1997. Combining multiple methods for the automatic construction of multilingual WordNets. Proceedings of the International Conference "Recent Advances on Natural Language Processing" RANLP'97, Tzigov Chark, Bulgaria

Auger, A., & Barrière, C. 2008. Pattern-based approaches to semantic relation extraction: A state-of-the-art. Introduction to the special issue of Terminology, 14(1). John Benjamins, Amsterdam.

Bach, N., & Badaskar, S. 2007. A Review of Relation Extraction. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.188.7674

Barbu, E., Barbu Mititelu, V., & Dei Molini, S. 2007. Automatic building of Wordnets. In Proceedings of Recent Advances in Natural Language Processing IV. John Benjamins, Amsterdam. pp. 217-226

Baroni M., & Lenci, A. 2010. Distributional Memory: A General Framework for Corpus-Based Semantics. Computational Linguistics, 36 (4), pp. 673-721

Beamer, B., Rozovskaya, A., & Girju, R. 2008. Automatic semantic relation extraction with multiple boundary generation. Association for the Advancement of Artificial Intelligence.

Beckwith, R., Fellbaum, C., Gross, D., & Miller, G. A. 1991. WordNet: A lexical database organized on psycholinguistic principles. Lexical acquisition: Exploiting on-line resources to build a lexicon, 211-231.

Benson, M., Benson, E., & Ilson, R. 2010. The BBI Combinatory dictionary of English. John Benjamins, Philadelphia.

Bentivogli, L., Forner, P., Magnini, B., & Pianta, E. 2004, August. Revising the wordnet domains hierarchy: semantics, coverage and balancing. In Proceedings of the Workshop on Multilingual Linguistic Ressources (pp. 101-108). Association for Computational Linguistics.

Bernstein, T. 1975. Bernstein's Reverse Dictionary. Crown, New York.

Biemann, C. 2012. Structure discovery in natural language. Springer.

Bond, F., Isahara, H., Kanzaki, K., & Uchimoto, K. 2008. Boot-strapping a WordNet using multiple existing WordNets. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of LREC. pp. 1619-1624. Marrakech.

Borin, L., Forsberg, M., & Lönngren, L. 2013. SALDO: a touch of yin to WordNet's yang. Language Resources & Evaluation, 47, pp. 1191–1211.

Boyd-Graber, J., Fellbaum, C., Osherson, D., & Schapire, R. 2006. Adding dense, weighted connections to WordNet. In Proceedings of the third international WordNet conference (pp. 29-36).

Brown, A. 1991. A review of the tip of the tongue experience. Psychological Bulletin, 10, 204-223

Brown, R., & McNeill, D. 1966. The tip of the tongue phenomenon. In: Journal of Verbal Learning and Verbal Behaviour, 5:325-337.

Brown, R., & Berko, J. 1960. Word association and the acquisition of grammar. Child Development, 1-14.

Bullinaria, J.A., & Levy, J.P. 2007. Extracting Semantic Representations from Word Co-occurrence Statistics: A Computational Study. Behavior Research Methods, 39, 510-526.

Bullinaria, J.A., & Levy, J.P. 2012. Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD. Behavior Research Methods, 44(3):890-907.

Camacho-Collados, J., Pilehvar, M.T., & Navigli, R. 2015. NASARI: a Novel Approach to a Semantically-Aware Representation of Items, Human Language Technologies: The Annual Conference of the North American Chapter of the ACL, pp 567–577, Denver, Colorado, USA.

Caramazza, A., & Berndt, R. S. 1978. Semantic and Syntactic Processes in Aphasia: A Review of the Literature. Psychological Bulletin 85: 898-918.

Collins, A. M., & Quillian, M. R. 1969. Retrieval Time From Semantic Memory. Journal of Verbal Behavior and Verbal Learning 8: 240-247.

Dagan, I., Lee, L., & Pereira, F. 1999. Similarity-based models of word cooccurrence probabilities. Machine Learning 34(1–3):43–69.

Davies, M. 2011. N-grams and word frequency data from the Corpus of Historical American English (COHA).

Deese, J. 1965. The structure of associations in language and thought. Baltimore.

Dell, G.S. 1986. A spreading-activation theory of retrieval in sentence production. Psychological Review, 93, 283-321.

Diab, M., & Resnik, P. 2002. An unsupervised method for word sense tagging using parallel corpora. In Proc. of ACL.

Edmonds, D. (editor). 1999. The Oxford Reverse Dictionary. Oxford University Press, Oxford.


Evert, S. 2004. The Statistics of Word Co-occurrences: Word Pairs and Collocations. Ph.D. thesis, University of Stuttgart.

Ervin, S. 1961. Changes with age in the verbal determinants of word association. American Journal of Psychology 74: 361–72.

Everitt, B., Landau, S., Leese, M., & Stahl, D. 2011. Cluster analysis. John Wiley and Sons.

Farreres, X., Rigau, G., & Rodriguez, H. 1998. Using WordNet for Building WordNets. In: Proceedings of COLING-ACL Workshop on Usage of WordNet in Natural Language Processing Systems, Montreal, Canada

Fellbaum, C. editor. 1998. WordNet: An electronic lexical database and some of its applications. MIT Press.

Fillenbaum, S., & Jones, L. V. 1965. Grammatical Contingencies in Word Association. Journal of Verbal Learning and Verbal Behavior 4: 248-255.

Finin, T. 1980. The semantic interpretation of compound nominals. Ph.D. Dissertation, University of Illinois at Urbana-Champaign.

Firth, J.R. 1957. A synopsis of linguistic theory 1930-1955. In Studies in Linguistic Analysis, pp. 1-32. Oxford: Philological Society.

Fiser, D. 2009. Leveraging Parallel Corpora and Existing Wordnets for Automatic Construction of the Slovene Wordnet. Z. Vetulani & H. Uszkoreit, Human Language Technology. Challenges of the Information Society. LNAI 5603, Springer, 359-368.

Fiser, D., & Sagot, B. 2008. Combining Multiple Resources to Build Reliable Wordnets. In Proceedings of Text, Speech and Dialogue (LNCS 2546), Springer 2008, pp. 61--68, Berlin: Heidelberg.

Frank, A., & Padó, S. 2012. Semantics in computational lexicons. In Maienborn, C., von Heusinger, K., and Portner, P. (eds.), Semantics (HSK 33.3), de Gruyter, pp. 63-93.

Garrett, M. F. 1982. Production of Speech: Observations from Normal and Pathological Language Use. In A. Ellis (ed.). Normality and Pathology in Cognitive Functions. London: Academic Press.

Gerstl, P. & Pribbenow, S. 1995. Midwinters, end games, and body parts: a classification of part-whole relations. International Journal of Human-Computer Studies, 43, 865-889.

Girju R., Moldovan D., Tatu, M., & Antohe, D. 2005. Automatic discovery of Part–Whole relations. ACM 32(1)

Girju, R., Hearst, M., Nakov, P., Nastase, V., Szpakowicz; S., Turney, P., & Yuret D. 2007. Classification of semantic relations between nominals: Dataset for Task 4 in SemEval, 4th International Workshop on Semantic Evaluations, Prague, Czech Republic.

Gliozzo, A., & Strapparava, C. 2008. Semantic domains in computational linguistics. Springer.

Goldman, N. M. 1975. Conceptual generation. In Schank, R. (ed.), Conceptual information processing. Elsevier, North Holland, pp. 289-371.

Hage, W., Kolb, H., & Schreiber, G. 2006. A method for learning part-whole relations. TNO Science & Industry Delft, Vrije Universiteit Amsterdam.

Harabagiu, S., & Moldovan, D. 1998. Knowledge processing on an extended wordnet. WordNet: An electronic lexical database, 305, 381-405.

Harris, Z. 1954. Distributional structure. Word 10 (23), 46–162.

Harshman, R. 1970. Foundations of the parafac procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics, 16.

Hearst M. A. 1998. Automated discovery of WordNet relations. In Fellbaum, C. (Ed.) WordNet: An Electronic Lexical Database and Some of its Applications, MIT Press. pp. 131-151

Hirst, G., & St-Onge, D. 1998. Lexical chains as representations of context for the detection and correction of malapropisms. WordNet: An electronic lexical database, 305, 305-332.

Kahn, J. 1989. Reader's Digest Reverse Dictionary. Reader's Digest, London.

Kaji, H. 2003. Word sense acquisition from bilingual comparable corpora. In Proceedings of NAACL.

Keet, C.M., & Artale, A. 2008. Representing and Reasoning over a Taxonomy of Part-Whole Relations. Applied Ontology, 3(1-2): 91-110.

Kilgarriff, A., Rychlý, P., Smrž, P., & Tugwell, D. 2004. The Sketch Engine. In: Williams, G. and S. Vessier (eds.), Proceedings of the Eleventh EURALEX International Congress, EURALEX 2004. Lorient: Université de Bretagne Sud. 105–116.

Lafourcade, M. 2007. Making people play for Lexical Acquisition with the JeuxDeMots prototype. In 7th International Symposium on Natural Language Processing, Pattaya, Chonburi, Thailand.

Lafourcade, M., & Joubert, A. 2015. TOTAKI: A help for lexical access on the TOT Problem. In Gala, N., Rapp, R., & Bel-Enguix, G. éds. (2015), Language Production, Cognition, and the Lexicon. Festschrift in honor of Michael Zock. Series Text, Speech and Language Technology XI. Dordrecht, Springer, pp. 95-112

Landauer, T. K., & Dumais, S. T. 1997. A solution to Plato’s problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.

Lenci, A. 2008. Distributional approaches in linguistic and cognitive research. Italian Journal of Linguistics, 20(1):1–31.

Levelt, W.J.M., Roelofs, A., & Meyer, A.S. 1999. A theory of lexical access in speech production. Behavioral and Brain Sciences, 22, 1-75.

Lofberg, L., Juntunen, J. P., Nykanen, A., Varantola, K., Rayson, P., & Archer, D. 2004. Using a semantic tagger as dictionary search tool. In 11th EURALEX (European Association for Lexicography), pp. 127-134.

Lund, K., & Burgess, C. 1996. Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, and Computers, 28(2), 203–208.

Matthew, B., & Charniak, E. 1999. Finding parts in very large corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pp. 57–64, University of Maryland.

McDonald, R. 2005. Extracting relations from unstructured text. UPenn CIS Technical Report, MS-CIS-05-06

Meyer, D., & Schvaneveldt, R. 1971. Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 90, 227–234.

Miller, G. A. (ed.) 1990. WordNet: An on-Line lexical data-base. International Journal of Lexicography, 3(4), 235-312.

Miller, G. A. 1995. WordNet : A lexical database for English. Communications of the ACM, 38 (11), 39–41.

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. 1990. Introduction to wordnet: An on-line lexical database. International journal of lexicography, 3(4), 235-244.

Morato, J., Marzal, M., Llorens, J., & Moreiro, J. 2004. WordNet Applications. Proceedings of the Second Global WordNet Conference. Pp. 270–278. (http://www.fi.muni.cz/gwc2004/proc/105.pdf)

Navigli, R., & Ponzetto, S.P. 2010. BabelNet: Building a very large multilingual semantic network. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden, pages 216-225.

Paradis, C. 2009. Good and bad opposites: using textual and psycholinguistic techniques to measure antonym canonicity. C. Paradis, C. Willners & S. Jones. The Mental Lexicon, 4.3. 380–429.

Pianta, E., Bentivogli, L., & Girardi, C. 2002. MultiWordNet: developing an aligned multilingual database. In: Proceedings of the First International Conference on Global WordNet, Mysore, India

Piasecki M., Szpakowicz S., & Broda B. 2009. A Wordnet from the Ground Up. Wrocław, Oficyna Wydawnicza Politechniki Wroclawskiej.

Pilehvar, M. T., Jurgens, D., & Navigli, R. 2013. Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, pp. 1341–1351.


Pinto, D., Rosso, P., & Jimenez-Salazar, H. 2007. Word sense induction using self-term expansion. In Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval-2007), 430–433.

Pribbenow, S. 1995. Parts and Wholes and their Relations. Habel, C. & Rickheit, G. (eds.): Mental Models in Discourse Processing and Problem Solving. John Benjamins Publishing Company, Amsterdam

Quillian, M. R. 1967. Word Concepts: A Theory and Simulation of Some Basic Semantic Capabilities. Behavioral Science 12: 410-430.

Quillian, M. R. 1968. Semantic Memory. In Minsky, M. (ed.). Semantic Information Processing. Cambridge, Mass.: MIT Press.

Rapp, R. 2003. Word sense discovery based on sense descriptor dissimilarity. In: Proceedings of the Ninth Machine Translation Summit, New Orleans, pp. 315–322.

Rapp, R. 2004. A practical solution to the problem of automatic word sense induction. In proceedings of the ACL 2004 on Interactive poster and demonstration sessions.

Rapp, R., & Zock, M. 2012 The design of a system for the automatic extraction of a lexical database analogous to WordNet from raw text. Proceedings of the Sixth Workshop on Analytics for Noisy Unstructured Text Data, Coling, Mumbai, India

Rayson, P., Archer, D., Piao, S., & McEnery, A. M. 2004. The UCREL semantic analysis system.

Roget, P. 1852. Thesaurus of English words and phrases. Longman, London.

Rosenzweig, J., Mihalcea, R., & Csomai, A. 2007. WordNet bibliography. (http://wordnet.princeton.edu/wordnet/)

Ruiz-Casado, M., Alfonseca, E., & Castells, P. 2005. Automatic Assignment of Wikipedia Encyclopedic Entries to WordNet Synsets. In Advances in Web Intelligence, Volume 3528 of LNCS, pp. 380–386. Springer Verlag.

Rundell, M., & Fox, G. (Eds.). 2002. Macmillan English Dictionary for Advanced Learners. Macmillan, Oxford.

Salton, G., & McGill, M. 1983. Introduction to Modern Information Retrieval. McGraw-Hill

Salton, G., Wong, A., & Yang, C.-S. 1975. A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.

Saveski, M., & Trajkovski, I. 2010. Automatic construction of wordnets by using machine translation and language modeling. In T. Erjavec et al. (Eds.), Proceedings of seventh language technologies conference, 13th international multiconference information society. Ljubljana, Slovenia.

Scheible, S., Schulte im Walde, S., & Springorum, S. 2013. Uncovering distributional differences between synonyms and antonyms in a Word Space


Model. In: Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP). Nagoya, Japan, pp. 489-497

Schütze, H. 1993. Word space. In: Hanson, S., Cowan, J. & Giles, C. (eds.). Advances in Neural Information Processing Systems, vol. 5. San Francisco, CA: Morgan Kaufmann. Pp. 895–902.

Stefanescu, D., Banjade, R., & Rus, V. 2014. Latent Semantic Analysis Models on Wikipedia and TASA. The 9th Language Resources and Evaluation Conference (LREC 2014), 26-31 May, Reykjavik, Iceland

Steuten, A. A., Dehne, F., & van de Riet, R. P. 2001. WordNet++: A Lexicon Supporting the Color-X Method. In Bouzeghoub, M., Kedad, Z. and Métais, E. (Eds.). Natural Language Processing and Information Systems. Springer Berlin Heidelberg, pp. 1-16

Summers, D. 1993. Language Activator: the world’s first production dictionary. Longman, London.

Tufis, D. 2000. BalkaNet - Design and Development of a Multilingual Balkan WordNet. Romanian Journal of Information Science and Technology Special Issue 7(1-2)

Turney, P., & Pantel, P. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.

Turney, P.D. 2006. Similarity of Semantic Relations. Computational Linguistics, 32(3), 379–416.

Vetulani, Z., Walkowska, J., Obrębski, T., Marciniak, J., Konieczka, P., & Rzepecki, P. 2009. An Algorithm for Building Lexical Semantic Network and Its Application to PolNet - Polish Wordnet Project. In: Z. Vetulani & H. Uszkoreit, Human Language Technology. Challenges of the Information Society. LNAI 5603, Springer, 369-381.

Vieu, L., & Aurnague, M. 2007. Part-of relations, functionality and dependence. In Aurnague, M., Hickmann, M. and Vieu, L. (eds.). The Categorization of Spatial Entities in Language and Cognition, pp. 307–336. J. Benjamins, Amsterdam

Vitevitch, M. S. 2008. What can graph theory tell us about word learning and lexical retrieval? Journal of Speech, Language, and Hearing Research, 51, 408–422.

Vitevitch, M., Goldstein, R., Siew, C., and Castro, N. 2014. Using complex networks to understand the mental lexicon. Yearbook of the Poznań Linguistic Meeting 1, pp. 119–138

Vossen, P. (ed.) 1998. EuroWordNet: a multilingual database with lexical semantic networks for European Languages. Kluwer, Dordrecht

Widdows, D. 2004. Geometry of Meaning. University of Chicago Press.

Winston, M., Chaffin, R., & Hermann, D. 1987. Taxonomy of part-whole relations. Cognitive Science, 11(4), 417–444.

Wittgenstein, L. 1922. Tractatus Logico-Philosophicus. London: Routledge & Kegan Paul.

Zhang, Z., Gentile, A., & Ciravegna, F. 2012. Recent advances in methods of lexical semantic relatedness – a survey. Journal of Natural Language Engineering, Cambridge University Press, 19(4):411–479.

Zock, M., & Schwab, D. 2016. WordNet and beyond. In Fellbaum, C., Vossen, P., Mititelu, V., and Forăscu, D. (Eds.), 8th International Global WordNet conference. Bucharest (http://gwc2016.racai.ro)

Zock, M., Ferret, O., & Schwab, D. 2010. Deliberate word access: an intuition, a roadmap and some preliminary empirical results. In A. Neustein (Ed.), International Journal of Speech Technology, 13(4), pp. 107-117. Springer Verlag.