Classifying Semantic Relations in German Nominal Compounds using a Hybrid Annotation Scheme

Daniil Sorokin, Corina Dima and Erhard Hinrichs

University of Tübingen, Germany {firstname.lastname}@uni-tuebingen.de

This paper reports on novel results for the automatic classification of semantic relations that hold between the constituents of nominal compounds in German. It utilizes a hybrid annotation scheme that models semantic relations using a combination of prepositional paraphrases and semantic properties. The machine learning (ML) experiments use the support vector machine (SVM) implementation in Weka for single-label prediction tasks and Weka SVMs in conjunction with the Mulan library for multi-label prediction.

Keywords: Classification of Semantic Relations, Compound Analysis, Multi-Label Classification, Support Vector Machines

1. Introduction

The fact that the interpretation and generation of compound nouns pose a major challenge for natural language processing has been known for quite some time (Sparck Jones, 1985; Sag et al. 2002). This challenge is particularly evident for languages like English and German, where compounding is a highly productive process of word formation. Apart from splitting the compound into its constituent parts, recognizing the semantic relation that holds between the constituent parts is key to the correct understanding of compounds for humans and machines alike.

Journal of Cognitive Science 16-3: 261-286, 2015
Date submitted: 03/12/15; Date reviewed: 05/04/15; Date confirmed for publication: 09/07/15
©2015 Institute for Cognitive Science, Seoul National University

Table 1. Germanic compounds and their paraphrase-based translations into Romance languages.

Germanic languages                                       Romance languages
English                     German                       French                     Romanian
children's book             Kinderbuch                   livre pour les enfants     carte pentru copii
(book for children)         (Buch für Kinder)
tennis ball                 Tennisball                   balle de tennis            minge de tenis
(ball for tennis)           (Ball für Tennis)
summer vacation             Sommerferien                 vacances d'été             vacanţă de vară
(vacation in summer)        (Ferien im Sommer)
tree house                  Baumhaus                     cabane dans l'arbre        casa din copac
(house in a tree)           (Haus im Baum)

Popović et al. (2006) have pointed out that splitting German compound nouns into their constituent parts is essential for improving statistical machine translation (SMT) involving German as a source or target language. Nakov (2008) has shown that using paraphrases to characterize the meaning of compounds improves SMT when translating a Germanic language to a Romance language, where the translation crucially involves the use of prepositions connecting the constituent parts of the corresponding nominal compound in the Germanic language. Table 1 shows examples of this kind for the Germanic languages English and German and for the Romance languages French and Romanian.

The prepositional paraphrase Buch für Kinder ‘book for children’ in the Germanic source language helps in aligning the corresponding compound Kinderbuch to its translations into French and Romanian. The same preposition for also leads to the most natural paraphrases of the compound Tennisball in English and German, namely Ball für Tennis ‘ball for tennis’. Notice, however, that while the preposition for was used for paraphrasing both compounds above in English and German, two different prepositions had to be used for the translation into French and Romanian: pour/pentru and de respectively. The compounds Sommerferien ‘summer vacation’ and Baumhaus ‘tree house’ exhibit the same pattern: both can be paraphrased in German and English by the use of the preposition im/in, while the translations into the Romance languages split between the locative sense of these prepositions, as in casa din copac, and their temporal sense, used to refer to the time span when a certain event occurs, as in vacances d’été. These findings have direct implications for the semantic analysis of compounds in the context of natural language processing. The examples in Table 1 suggest that the semantics of compounds should be rendered both by prepositional paraphrases and by reference to a set of semantic properties. The prepositional paraphrases are necessary for the multilingual alignment of compounds, but are not always sufficient to uniquely determine the correct translation in the target language. A similar argument can be made for monolingual NLP tasks such as natural language generation, which require a mapping from an abstract meaning representation to a suitable linguistic realization. However, practical experiments in machine translation and natural language generation are beyond the scope of the current paper. We appeal to these NLP applications only insofar as they provide justification for the novel annotation scheme for nominal compounds presented in this paper.

Information retrieval is another example of a natural language processing problem where a syntactic and semantic analysis of compounds is helpful (Karlgren, 2005; Nastase et al. 2013). Pedersen (2007) demonstrates that the analysis of the internal structure of compounds combined with paraphrasing can improve search results for queries containing compounds.

The purpose of the present paper is twofold: (i) to present an annotation scheme for nominal compounds that builds on the insights to be gained from applications like machine translation in that it classifies nominal compounds in terms of prepositional paraphrases and semantic properties; (ii) to present a set of machine learning experiments that make use of this hybrid annotation scheme and that demonstrate the disambiguation potential of both prepositions and semantic properties on a dataset of German noun-noun compounds.

The remainder of this paper is structured as follows: Section 2 situates the present research with respect to the state of the art in compound annotation and in automatic classification of semantic relations for compounds. The hybrid annotation scheme is introduced in Section 3; Section 4 describes the German dataset used in the ML experiments, motivates the choice of features used and describes the experimental setup; Section 5 presents the results of the single-label and multi-label experiments, which are then discussed in more detail in Section 6; the overall conclusions are presented in Section 7.

2. Related Work

2.1 Annotation Schemes

Several annotation schemes have been proposed for the semantics of compounds in theoretical and computational linguistics. Levi (1978) devises a predicate-based annotation scheme for compound-internal relations, according to which a compound can be formed either via predicate deletion using a fixed set of predicates (CAUSE, HAVE, MAKE, USE, BE, IN, FOR, FROM and ABOUT), or via predicate nominalization. Warren (1978) proposes a larger taxonomy, where two-place category labels encode ontological distinctions about the constituents of the compound (e.g. SOURCE-RESULT, PART-WHOLE, ORIGIN-OBJ, COMPARANT-COMPARED, etc.), but also makes a survey of the prepositional paraphrases that can be considered typical for each of the categories (e.g. of, with, from and like, respectively, for the categories listed above). Downing (1977) and Finin (1980) postulate that there is an infinite number of possible relations. More recent work presents annotation schemes based on prepositional paraphrases (Lauer, 1996), verbal paraphrases (Ó Séaghdha, 2008), or semantic categories (Rosario and Hearst, 2001; Girju et al., 2005; Tratz and Hovy, 2010).

2.2 Automatic Classification

One of the earliest computational approaches to the classification of compound nouns is due to Lauer (1996), who reports an accuracy of 47% at predicting one of 8 possible prepositions using a set of 385 compounds. Rosario and Hearst (2001) obtain 60% accuracy at the task of predicting one of 18 relations using neural networks and a dataset of 1660 compounds. The domain-specific inventory they use was obtained through iterative refinement by considering a set of 2245 extracted compounds and looking for commonalities among them. Girju et al. (2005) use WordNet-based models and SVMs to classify noun compounds according to an inventory containing 35 semantic relations, and obtain accuracies ranging from 37% to 64%. Kim and Baldwin (2005) report 53% accuracy on the task of identifying one of 20 semantic relations using a WordNet-based similarity approach, given a dataset containing 2169 noun compounds. Ó Séaghdha and Copestake (2013) experiment with the dataset of 1443 compounds introduced in Ó Séaghdha (2008) and obtain 65.4% accuracy when predicting one of 6 possible classes using SVMs and a combination of various types of kernels. Tratz and Hovy (2010) classify English compounds using a new taxonomy with 43 semantic relations, and obtain 79.3% accuracy using a Maximum Entropy classifier on their dataset comprising 17509 compounds and 63.6% accuracy on the Ó Séaghdha (2008) data.

All these efforts have concentrated on English compounds, despite the fact that compounding is a pervasive linguistic phenomenon in many other languages. Recent work (Verhoeven, Daelemans, & van Huyssteen, 2012; Verhoeven & Daelemans, 2013) applied the guidelines proposed by Ó Séaghdha (2008) to annotate compounds in Dutch and Afrikaans. The reported F-scores are 49.0% on the Dutch dataset of 1447 compounds and 51.1% on the Afrikaans dataset of 1439 compounds.

3. Annotation Scheme

This section presents a hybrid annotation scheme that attempts to combine the relative strengths of the property- and the paraphrase-based approaches. This is a revised version of the initial annotation scheme introduced in Dima et al. (2014). The annotation scheme allows the specification of compound-internal semantics via a combined label, typically one preposition and one semantic property. This is motivated by insights from applications such as machine translation that suggest that the combination of two labels is necessary to represent the semantics of a compound in sufficient detail (see Section 1). The set of possible prepositions is language-dependent and has to be instantiated each time the annotation scheme is applied to a new language. The semantic properties are, in contrast, language-independent, and can be used directly for annotating nominal compounds in new languages.

Table 2. Annotating compounds headed by Haus ‘house’ with prepositions and semantic properties.

German compound    Preposition & Property    English translation
Autohaus           [für ‘for’ goods]         ‘car dealership’ lit. ‘car house’
Möbelhaus          [für ‘for’ goods]         ‘furniture store’ lit. ‘furniture house’
Modehaus           [für ‘for’ goods]         ‘fashion house’
Konzerthaus        [für ‘for’ usage]         ‘concert hall’ lit. ‘concert house’
Auktionshaus       [für ‘for’ usage]         ‘auction house’
Gästehaus          [für ‘for’ user]          ‘guest house’
Armenhaus          [für ‘for’ user]          ‘poor house’
Waisenhaus         [für ‘for’ user]          ‘orphanage’ lit. ‘orphan house’
Holzhaus           [aus ‘of’ material]       ‘wooden house’
Steinhaus          [aus ‘of’ material]       ‘stone house’
Schneehaus         [aus ‘of’ material]       ‘igloo’ lit. ‘snow house’
Baumhaus           [in ‘in’ location]        ‘tree house’
Eckhaus            [in ‘in’ location]        ‘corner house’
Landhaus           [in ‘in’ location]        ‘country house’

Apart from the hybrid nature of the annotation scheme that combines property and paraphrase-based labels, another novel aspect of the annotation scheme is that the annotation is performed on a per head basis rather than on a per compound basis. Thus, the annotation task is defined as follows: given a set of compounds with the same head, identify and group together similar compounds. Table 2 illustrates the annotation process for German compounds with the head Haus ‘house’.

Prepositional paraphrases are one method of defining such similarity. In this case, all the compounds that can be paraphrased using the same preposition will belong to the same group. While this type of grouping seems to do very well in the case of the preposition aus ‘of’, where compounds with the meaning ‘houses made of material’ are clustered together, it is less useful for the other prepositions. In the case of the preposition für ‘for’, the clustered compounds can be further differentiated: Konzerthaus and Auktionshaus are ‘houses used for concerts or auctions’, while Autohaus and Möbelhaus are ‘buildings where certain goods are sold’, like cars and furniture.

In contrast, the compounds paraphrased with the prepositions in, an and auf all refer to ‘a type of house specified by a location’, and should be grouped together. This type of analysis justifies the complementary annotation with a semantic property, in addition to the intuitive but potentially more ambiguous annotation with prepositions.

The choice of semantic properties is guided by the qualia structure of a noun as proposed by Pustejovsky (1995). The semantic relations in our inventory are refinements of the four basic roles that make up the qualia of a noun: AGENTIVE, CONSTITUTIVE, FORMAL and TELIC (see Table 3).

The AGENTIVE role refers to “factors involved in the origin or ‘bringing about’ of an object” (Pustejovsky 1995, p. 86). Such factors include the creator of the object denoted by the noun, whether the object is an artifact or a natural kind or what causal chain is involved in the creation of the object. In our inventory of semantic properties the relations CAUSE, ORIGIN and PRODUCTION METHOD further differentiate Pustejovsky’s AGENTIVE role.

The CONSTITUTIVE qualia role refers to “the relation between an object and its constituents” (Pustejovsky 1995, p. 85). These relations include the material that the object denoted by the noun is made of, the weight of the object or what parts or component elements the object has. The relations COMPONENT, INGREDIENT, MATERIAL and PART in our inventory are refinements of the CONSTITUTIVE qualia role.

The FORMAL qualia role “distinguishes the object within a larger domain” (Pustejovsky 1995, p. 85) by specifying its orientation, magnitude, shape, dimensionality, color and position. The formal qualia role has the most refinements in our inventory, with 17 semantic properties that further differentiate it. Some of the semantic properties are direct correspondents of the role specifications, like APPEARANCE, MEASURE and SHAPE, while others like TIME POINT and TIME SPAN specify a more specialized ‘position in time’ (see the complete list in Table 3).

The TELIC qualia role refers “to the purpose that an agent has in performing an act” or “the built-in function or aim which specifies certain activities” (Pustejovsky 1995, p. 86) related to the object denoted by the noun. Our inventory contains 11 semantic properties that are refinements of the TELIC qualia role, among them ACTIVITY, OCCASION and USAGE.

Table 3. Refining the qualia structure of a noun via semantic properties. The third column presents an approximate mapping to the inventory proposed by Ó Séaghdha (2008): ⊆ inclusion, ∞ some overlap.

Semantic property        Example                                   Relation to other inv.

Agentive
cause                    Vulkaninsel ‘volcanic island’             ⊆ OS:REL (2.1.7.1)
cause-1                  Regenwolke ‘rain cloud’                   ⊆ OS:REL (2.1.7.1)
origin                   Seewasser ‘sea water’                     ∞ OS:IN2 (2.1.3.1/2)
origin-1                 Erdbebenherd ‘epicentre’                  ∞ OS:IN1 (2.1.3.1/2)
production method        Pfannkuchen ‘pancake’                     ⊆ OS:REL (2.1.7.1)

Constitutive
component                Chlorwasser ‘chlorine water’              ⊆ OS:HAVE2 (2.1.2.4)
ingredient               Gurkensalat ‘cucumber salad’              ⊆ OS:HAVE2 (2.1.2.4)
material                 Holzlöffel ‘wooden spoon’                 ⊆ OS:BE1 (2.1.1.2)
part                     Siegelring ‘seal ring’                    ⊆ OS:HAVE1 (2.1.2.4)
part-1                   Kinderhand ‘child’s hand’                 ⊆ OS:HAVE2 (2.1.2.4)

Formal
access-1                 Gartentür ‘garden gate’                   ⊆ OS:HAVE2 (2.1.2.4)
appearance               Marmorkuchen ‘marble cake’                ⊆ OS:BE1 (2.1.1.3)
diet                     Ameisenbär ‘anteater’ lit. ‘ant bear’     ⊆ OS:ACTOR2 (2.1.4.2)
delimiter                Pensionsalter ‘retirement age’            ⊆ OS:REL (2.1.7.1)
consistency              Panzerglas ‘bullet-proof glass’           ⊆ OS:HAVE2 (2.1.2.3)
construction method      Blockhaus ‘block house’                   ⊆ OS:REL (2.1.7.1)
content                  Sportzeitung ‘sports magazine’            ⊆ OS:ABOUT2 (2.1.6.1)
eponym                   Sachertorte ‘Sacher cake’                 ⊆ OS:REL (2.1.7.1)
hyponym                  Lachsfisch ‘salmon fish’                  ⊆ OS:BE1 (2.1.1.1)
location                 Berghütte ‘mountain hut’                  ⊆ OS:IN2 (2.1.3.1)
manner of functioning    Gasherd ‘gas stove’                       ⊆ OS:REL (2.1.7.1)
measure                  Literflasche ‘litre bottle’               ⊆ OS:HAVE2 (2.1.2.3)
measure-1                Cholesterinspiegel ‘cholesterol level’    ⊆ OS:HAVE1 (2.1.2.3)
occasion                 Abendkleid ‘evening dress’                ⊆ OS:IN2 (2.1.3.3)
owner                    Stadtarchiv ‘city archive’                ⊆ OS:HAVE1 (2.1.2.1)
owner-1                  Sternekoch ‘star chef’                    ⊆ OS:HAVE2 (2.1.2.1)
place of use             Wandkalender ‘wall calendar’              ⊆ OS:IN2 (2.1.3.1)
shape                    Kirschtomate ‘cherry tomato’              ⊆ OS:BE2 (2.1.1.2)
shape-1                  Eisberg ‘iceberg’ lit. ‘ice mountain’     ⊆ OS:BE1 (2.1.1.2)
storage                  Taschenkamm ‘pocket comb’                 ⊆ OS:IN2 (2.1.3.1/2)
time point               Abendessen ‘evening meal’                 ⊆ OS:IN2 (2.1.3.3/4)
time span                Saisonarbeiter ‘seasonal worker’          ⊆ OS:IN2 (2.1.3.3/4)

Telic
activity                 Laufschuhe ‘running shoes’                ⊆ OS:INST2 (2.1.5.1)
function                 Chefarzt ‘head physician’                 ⊆ OS:BE1 (2.1.1.1)
function-1               Hausbrücke ‘bridge house’                 ⊆ OS:BE2 (2.1.1.1)
goods                    Schuhfabrik ‘shoe factory’                ⊆ OS:ACTOR2 (2.1.4.2)
usage                    Tennisball ‘tennis ball’                  ⊆ OS:INST2 (2.1.5.1)
user                     Cowboyhut ‘cowboy hat’                    ⊆ OS:ACTOR1 (2.1.4.2)

The qualia structure of a noun is an abstraction over the possible relations that the noun might have with other nouns. A compound is built around the qualia structure of the head noun, by implicitly selecting one relation and explicitly stating its second argument in the form of the modifier (see Johnston and Busa, 1999). For example, the qualia structure of the noun Löffel contains the relation MATERIAL. The semantics of the noun can, therefore, be extended by specifying the argument of the MATERIAL relation. The compound Holzlöffel ‘wooden spoon’ presents a possible extension where the modifier Holz ‘wood’ is the argument of the MATERIAL relation. The purpose of the annotation effort presented here is to recover the implicit relation between the head and the modifier by labeling it with a semantic property from our inventory.

For certain semantic relations it is possible that the roles that the head and the modifier play in the relation are reversed. For example, in the compound Apfelbaum ‘apple tree’ the modifier apple (part) is PART of the head tree (whole), whereas in Kinderhand ‘child’s hand’ the head hand (part) is PART of the modifier child (whole). Rather than introducing new properties for specifying the semantics of such compounds, we reuse the existing semantic properties by marking the inverse direction with the -1 superscript. Thus, Kinderhand should be annotated as PART-1, meaning that it has the semantics of the property PART but the roles played by the head and the modifier are reversed.

3.1 Inter-annotator Agreement Results

An inter-annotator agreement (IAA) study was conducted using a sample of 500 nominal compounds headed by concrete nouns from GermaNet. Written guidelines were given to two student annotators, native speakers of German, who performed the annotation independently. They had previously been trained on the compound annotation task, but had never seen any of the compounds that were part of the study.

Separate IAA scores were computed for the property labeling task, for the preposition labeling task, as well as for the task of assigning a combined (property, preposition) label. The property annotation resulted in a percentage of agreement of 76.4% and a Kappa score (Cohen, 1960) of 0.74, while the preposition annotation resulted in a percentage of agreement of 79.5% and a Kappa score of 0.75. It is noteworthy that the amount of agreement is roughly the same for both property and preposition labeling. We conjecture that this similar agreement is due to the parallel annotation, as the property labeling helped to disambiguate the preposition labeling and vice versa. Our findings regarding the agreement levels for the preposition and property labels are in stark contrast with the IAA results of Girju et al. (2005). In a similar two-label annotation experiment, they report a Kappa of 0.80 for annotation with the 8 prepositions proposed by Lauer (1996) and 0.58 for the annotation with their inventory of 35 semantic relations.
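
The agreement figures above follow the standard definitions; as an illustration, the observed agreement and Cohen's Kappa can be computed directly from two annotators' label sequences. The following minimal Python sketch uses invented labels, not the actual annotations from the study:

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Observed agreement and Cohen's (1960) Kappa for two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under chance, from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum((freq_a[label] / n) * (freq_b[label] / n) for label in freq_a)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa

# Invented property labels for five compounds from two annotators.
annotator_1 = ["material", "usage", "part", "location", "usage"]
annotator_2 = ["material", "usage", "part-1", "location", "usage"]
print(cohen_kappa(annotator_1, annotator_2))  # (0.8, ~0.74)
```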

The agreement measured for combined property and preposition assignment resulted in a percentage of agreement of 68.6%. All of the IAA results reported in this section correspond to substantial agreement according to the classification of Kappa coefficients proposed by Landis and Koch (1977). A more thorough discussion of the reported IAA results can be found in Dima et al. (2014).

3.2 Relation to other Annotation Schemes

The hybrid annotation scheme presented in this paper builds on the annotation schemes that were previously proposed in the literature. In particular, it combines Lauer’s (1996) suggestion of using prepositional paraphrases for compounds with proposals that encode compound semantics via two-place semantic properties (Warren, 1978; Ó Séaghdha, 2008; Tratz & Hovy, 2010; Rosario & Hearst, 2001). The third column in Table 3 contains an approximate mapping between the semantic properties in our inventory and the ones proposed by Ó Séaghdha (2008), which were also adapted for Dutch by Verhoeven and Daelemans (2013). A common aspect of the two inventories is the use of directionality: roof window and block diagram entail the same PART relation, but the position of the word specifying the part is different in the two compounds (position 2 for window and position 1 in the case of block, respectively). Ó Séaghdha (2008) annotates compounds in context (there is a support sentence for each compound, which is used to disambiguate the semantic relation). In contrast, our annotation scheme considers compounds out of context, and focuses on labeling the interactions between a head word and the range of modifiers that it can form compounds with.

4. Experiments

4.1 Dataset

The experiments described in this section use a dataset of German compounds that was obtained by extracting compounds headed by concrete nouns from the German wordnet GermaNet (Hamp and Feldweg, 1997; Henrich and Hinrichs, 2010). The dataset contains 5082 compounds, but only a subset of 4607 compounds was used for the classification experiments. Strongly lexicalized compounds such as Eselsbrücke (‘mnemonic’, lit. ‘donkey bridge’) that were assigned neither a property nor a preposition were removed, as were compounds annotated with multiple prepositions or properties, because both cases require special treatment. The remaining set of 4607 compounds contains 2171 distinct modifiers (2.1 compounds per modifier on average) and 360 distinct heads (12.8 compounds per head on average). A unique modifier appears on average with 1.3 different prepositions and 1.4 different properties; a unique head appears on average with 3.5 different prepositions and 4.4 different properties.
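
The per-modifier and per-head statistics above are simple aggregates over the annotated compound list; a minimal sketch of how such figures can be derived (the compound pairs shown are illustrative, not taken from the dataset):

```python
from collections import defaultdict

# Invented (modifier, head) pairs; the real dataset contains 4607 annotated compounds.
compounds = [("Kinder", "Buch"), ("Tennis", "Ball"), ("Sommer", "Ferien"),
             ("Baum", "Haus"), ("Auto", "Haus"), ("Holz", "Haus")]

by_modifier, by_head = defaultdict(list), defaultdict(list)
for modifier, head in compounds:
    by_modifier[modifier].append(head)
    by_head[head].append(modifier)

print(len(by_modifier), "distinct modifiers,",
      round(len(compounds) / len(by_modifier), 2), "compounds per modifier on average")
print(len(by_head), "distinct heads,",
      round(len(compounds) / len(by_head), 2), "compounds per head on average")
```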

This dataset was labeled using the annotation scheme described in Section 3 which in its instantiation for German contains 17 prepositions and 38 semantic properties. Each compound in the dataset was annotated by two student annotators. An experienced lexicographer inspected and in some cases post-corrected all candidate annotations and adjudicated cases of disagreement in order to arrive at a gold standard. In terms of size, the German dataset is comparable with English datasets surveyed by Tratz and Hovy (2010), and is, to the best of our knowledge, the largest German noun-noun compound dataset annotated with compound-internal relations.

4.2 Feature Selection and Compound Modeling

The experiments are based on the assumption that the meaning of a compound can be predicted based on the semantic characteristics of its constituents. The models used in the experiments make use of two types of features: distributional features extracted from the German web-news corpus (Versley and Panchenko, 2012) and knowledge-based features extracted from the German wordnet GermaNet. Compound similarity is captured using two complementary approaches, described originally in Ó Séaghdha (2008): (i) the lexical similarity approach, which considers pairwise similarities between constituents (e.g. plastic knife and metal spoon are similar because the pairs (plastic, metal) and (knife, spoon) are similar); (ii) the relational similarity approach, which states that word pairs appearing in similar contexts will have similar semantic relations (e.g. “The knife was made of cheap plastic.”; “Spoons are typically made of metal.”; given the similarity of the contexts, the relation between knife and plastic is assumed to be similar to the relation between spoon and metal). Table 4 presents an overview of the features used to model the compounds in our experiments.

Distributional features for lexical similarity are obtained by extracting co-occurrence information separately for the modifier and the head; relational similarity is modeled by considering the contexts where the modifier and the head appear as individual words in the same sentence. In both cases, two lists of reference elements are used for collecting the co-occurrence information from a fixed-size context: a corpus-derived list of the 1000 most frequent German words1 and the list of 17 prepositions defined by the annotation scheme. The motivation behind extracting the co-occurrences with the prepositions is that in many cases the choice of a correct preposition depends on the lexical associations between the constituents and particular prepositions. Lemmas are used as a basis for extracting co-occurrence counts. The context size is fixed to three tokens on the right and on the left of the target word in the lexical similarity setup.

1 http://wortschatz.uni-leipzig.de/html/wliste.html
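
As an illustration of the window-based co-occurrence extraction described above, the following Python sketch counts reference lemmas in a ±3-token window around a target lemma; the reference list, tokenization and corpus handling are simplified placeholders, not the authors' actual pipeline:

```python
from collections import Counter

REFERENCE = ["der", "die", "das", "für", "aus", "in"]  # stand-in for the 1000-word list plus the 17 prepositions
WINDOW = 3  # three tokens to the left and to the right of the target lemma

def cooccurrence_counts(sentences, target_lemma):
    """Count reference lemmas within a fixed window around the target lemma."""
    counts = Counter()
    for sentence in sentences:          # each sentence is a list of lemmas
        for i, lemma in enumerate(sentence):
            if lemma != target_lemma:
                continue
            window = sentence[max(0, i - WINDOW):i] + sentence[i + 1:i + 1 + WINDOW]
            counts.update(w for w in window if w in REFERENCE)
    return counts

sentences = [["das", "Buch", "für", "die", "Kinder"]]
print(cooccurrence_counts(sentences, "Buch"))  # Counter({'das': 1, 'für': 1, 'die': 1})
```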

Table 4. List of features used in the experiments.

Distributional features (similarity type in brackets):
- Co-occurrence distributions (PMI) of each individual constituent with the 1000 most frequent German words (lemma) [lexical]
- Co-occurrence distributions (PMI) of each individual constituent with the 17 prepositions [lexical]
- Co-occurrence distributions (PMI) of the two constituents with the 1000 most frequent German words (lemma) [relational]
- Co-occurrence distributions (PMI) of the two constituents with the 17 prepositions [relational]

Knowledge-based features:
- Beginner category from GermaNet for each constituent [lexical]
- Binary indicators for hypernyms for each constituent [lexical]
- Gloss terms (binary indicators for the 1000 most frequent German words) for each constituent [lexical]
- Binary indicators for relations between constituents [relational]
- Hirst-St.Onge relatedness measure between constituents [relational]
- Beginner category of the least common subsumer of the two constituents [relational]

The relational similarity setup takes into account the sentences where both constituents appear together. The extracted context includes, in this case, the three tokens on the right and on the left of the two constituents, as well as all the tokens between them. The extracted raw observations are transformed by computing the pointwise mutual information (PMI) scores between the target word / target word pair and the reference element.
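
The PMI weighting itself is the standard formulation; a minimal sketch with placeholder counts (the normalization by a single total is a simplification of the actual counting setup):

```python
import math

def pmi(cooc_count, target_count, ref_count, total):
    """Pointwise mutual information between a target (or target pair) and a reference element."""
    if cooc_count == 0:
        return 0.0  # a common convention for unseen combinations
    p_joint = cooc_count / total
    p_target = target_count / total
    p_ref = ref_count / total
    return math.log2(p_joint / (p_target * p_ref))

# e.g. a modifier observed 50 times near the preposition 'aus' in 10,000 extracted contexts
print(pmi(cooc_count=50, target_count=400, ref_count=600, total=10_000))
```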

The knowledge-based features are extracted from GermaNet. The constituents in our dataset are not sense-disambiguated, so in order to compute the feature values for a constituent we have to first identify its GermaNet sense. This is trivial for more than half of the constituents, because there is only one possible sense. For the compound constituents that are ambiguous in GermaNet we select the sense that has the shortest distance to the second constituent of the same compound or to the compound itself. Once the constituent senses are fixed we can extract different sets of features to represent them. Lexical similarity is modeled using three sets of features.
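
A sketch of this sense selection step, assuming the GermaNet relations have been loaded into an undirected graph (the graph library, node names and edges are assumptions for illustration; the authors do not specify their implementation):

```python
import networkx as nx

def pick_sense(relation_graph, candidate_senses, anchor_senses):
    """Choose the candidate sense with the shortest graph distance to any sense of the other constituent."""
    best_sense, best_distance = None, float("inf")
    for candidate in candidate_senses:
        for anchor in anchor_senses:
            try:
                distance = nx.shortest_path_length(relation_graph, candidate, anchor)
            except nx.NetworkXNoPath:
                continue
            if distance < best_distance:
                best_sense, best_distance = candidate, distance
    return best_sense

# Toy relation graph with two hypothetical senses of 'Messer' and one sense of 'Holz'.
graph = nx.Graph()
graph.add_edges_from([("Messer_1", "Werkzeug"), ("Messer_2", "Person"),
                      ("Werkzeug", "Artefakt"), ("Holz", "Material"), ("Material", "Artefakt")])
print(pick_sense(graph, ["Messer_1", "Messer_2"], ["Holz"]))  # Messer_1
```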

The first feature set considers the GermaNet gloss of the head/modifier and records which of the 1000 most frequent German words occur in the gloss in the form of binary indicators (e.g. if the word house occurs in the gloss of a constituent, the indicator is 1, otherwise it is 0). A second set of binary indicators encodes which of the 940 top concepts in GermaNet are hypernyms of the head/modifier. The third set of features specifies which one of the 17 unique beginner categories in GermaNet (Place, Artifact, Person, etc.) includes the head/modifier. Relational similarity, i.e. the connection between the head and the modifier, is modeled using indicators for two-place GermaNet relations such as hypernymy, antonymy, meronymy etc. and the Hirst-St. Onge relatedness measure (Hirst and St-Onge, 1998). A last relational feature specifies the GermaNet beginner category for the least common subsumer of the two constituents of the compound.

In all the classification experiments described in the next sections a compound is represented by a 6943-dimensional feature vector that consists of blocks of concatenated features. The vector contains 3051 (43.9%) distributional features and 3892 (56.1%) knowledge-based features. 5916 (85.2%) of the features capture the lexical similarity between constituents, and 1027 (14.8%) capture their relational similarity. All features have the same weight in the vector.
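
A sketch of how such a fixed-length representation can be assembled by concatenating the individual feature blocks. The extraction functions are placeholders, and the individual block sizes below are an assumption chosen only so that the totals match the figures reported above:

```python
import numpy as np

# Placeholder extractors: in the real setup these would return the PMI-based and
# GermaNet-based feature blocks described above. The block sizes are assumptions.
def lexical_pmi_features(word):
    return np.zeros(1017)       # 1000 frequent words + 17 prepositions

def relational_pmi_features(modifier, head):
    return np.zeros(1017)

def germanet_lexical_features(word):
    return np.zeros(1941)       # gloss terms, hypernym indicators, beginner category

def germanet_relational_features(modifier, head):
    return np.zeros(10)         # relation indicators, Hirst-St.Onge score, LCS category

def compound_vector(modifier, head):
    """Concatenate all feature blocks into one fixed-length vector."""
    return np.concatenate([
        lexical_pmi_features(modifier), lexical_pmi_features(head),
        relational_pmi_features(modifier, head),
        germanet_lexical_features(modifier), germanet_lexical_features(head),
        germanet_relational_features(modifier, head),
    ])

print(compound_vector("Holz", "Löffel").shape)  # (6943,)
```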

4.3 Experimental Setup

The hybrid annotation scheme described in Section 3 provides the opportunity to conduct: (i) single-label experiments, in order to assess the predictive strength of the prepositional paraphrases and the semantic properties in isolation; (ii) a multi-label experiment which attempts to simultaneously predict the prepositional paraphrases and the semantic properties in question.

The experiments were carried out using Support Vector Machines. SVMs have been successfully applied to a variety of natural language processing tasks including compound interpretation for English and Dutch (Ó Séaghdha, 2008; Verhoeven et al., 2012). We use the SVM implementation from the Weka Data Mining Software with a simple linear kernel (Witten et al., 2011) for the single-label classification tasks. The linear kernel is effective and much faster on datasets with a large number of extracted features compared to other types of kernels. Another advantage is the simple tuning process, as the soft-margin parameter C is the only one that has to be optimized for the linear kernel (Ben-Hur and Weston, 2010). In this paper we only report on results obtained with SVMs with a linear kernel. Exploring other types of kernels that have been proposed in the literature, in particular the string kernels proposed by Ó Séaghdha (2008) and Ó Séaghdha and Copestake (2013), is left to future research.

The Mulan library (Tsoumakas et al. 2011) was chosen to transform the multi-label classification task to a single-label format that can be used directly by the Weka SVM implementation. The library offers a convenient way to use an existing dataset in the Weka ARFF format in multi-label classification experiments without modifying it. It also includes a set of evaluation measures targeted at multi-label learning such as label-based and instance-based accuracy.

Most importantly, Mulan defines a number of problem transformation methods that adapt a multi-label dataset for single-label classification algorithms such as SVMs. The two most straightforward transformations are the Binary Relevance (BR) and Label Powerset (LP) methods. If we denote our set of possible labels as L and q = |L|, then BR learns q binary classifiers, one for each label in L. Thus, BR does not take into account label correlations that are relevant for our task (e.g. the fact that a property typically occurs with a small subset of the possible prepositions, and not with all the prepositions in the inventory). LP, on the other hand, is very effective for capturing dependencies between labels. It considers each unique combination of labels that exists in the dataset as a new class of the constructed single-label classification task. A classifier trained on the transformed dataset outputs a combination of labels for each instance. The LP method leads to an increase in computational complexity, especially for large datasets, as the number of classes equals the number of all label combinations in the dataset. The upper bound for the number of labels is min(m, 2^q), where m is the size of the dataset and q = |L|. In our case each entry from the dataset is annotated with exactly two labels and each label comes from a separate known inventory. The upper bound for the number of LP labels is therefore min(m, p*s), where m is the size of the dataset, p is the size of the preposition inventory and s is the size of the semantic property inventory. This is a much lower estimate, and the actual number of classes is even smaller in practice, thus making the LP transformation method the best choice for our experiments.
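
A minimal sketch of the Label Powerset idea in this two-label setting, showing how each observed (preposition, property) combination becomes one class of an ordinary single-label problem (the annotations are invented examples; the actual experiments use Mulan's implementation):

```python
# Each instance is annotated with exactly one preposition and one semantic property.
annotations = [
    ("für", "goods"), ("für", "usage"), ("aus", "material"),
    ("in", "location"), ("für", "goods"),
]

# Label Powerset: every distinct label combination observed in the data becomes one class.
lp_classes = sorted(set(annotations))
class_index = {combo: i for i, combo in enumerate(lp_classes)}
single_label_targets = [class_index[combo] for combo in annotations]

print(lp_classes)            # [('aus', 'material'), ('für', 'goods'), ('für', 'usage'), ('in', 'location')]
print(single_label_targets)  # [1, 2, 0, 3, 1]
```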

The Mulan library also implements some more complex transformation methods that try to overcome some of the shortcomings of LP and BR (e.g. random k-labelsets). These transformations are not considered in the reported experiments and are left for future work.

All the conducted experiments use a 10-fold cross-validation setup. For each fold, the SVM C parameter was optimized through 5-fold cross-validation on the training set.
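
The same evaluation protocol can be expressed with any SVM toolkit; the experiments reported here use the Weka SVM, but the following scikit-learn sketch illustrates the nested setup, with C tuned by an inner 5-fold grid search inside each of the 10 outer folds (the data are random placeholders):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV, cross_val_score

# Placeholder data: 200 instances with 50 features and 4 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 4, size=200)

# Inner 5-fold grid search over the soft-margin parameter C of a linear-kernel SVM.
inner = GridSearchCV(LinearSVC(), param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5)

# Outer 10-fold cross-validation reporting accuracy.
scores = cross_val_score(inner, X, y, cv=10, scoring="accuracy")
print(scores.mean())
```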

5. Results

Table 5 and Table 6 summarize the results of the single-label and multi-label experiments respectively. In order to estimate the difficulty of the tasks, the tables also include a most-frequent-label baseline.

For both the single-label and multi-label setups, the prediction of prepositional paraphrases achieves better results than the prediction of semantic properties. However, it has to be kept in mind that the set of property labels used in the annotation of the German dataset is more than twice as large as the set of prepositions. Hence, the classification task involving semantic properties is considerably harder, and it is noteworthy that there is only a small difference in accuracy and F-score between the two.

The results in Table 5 also clearly show that multi-label prediction outperforms single-label prediction. The accuracy and the weighted averaged F-score for preposition label prediction increase from 62.32% to 65.58% and from 0.616 to 0.639 respectively. For property label prediction the accuracy increases from 60.74% to 61.59% while the F-score remains unchanged. This suggests that the simultaneous prediction of both labels aids in the correct prediction of preposition labels, and has no negative impact on the property prediction.

Table 5. Single-label (SL) experiment results.

Classifier                   Accuracy preposition   F-score preposition   Accuracy property   F-score property
Baseline                     34.62%                 0.182                 22.66%              0.084
Single-label preposition     62.32%                 0.616                 -                   -
Single-label property        -                      -                     60.74%              0.601
Multi-label classifier       65.58%                 0.639                 61.59%              0.601

The deeper reason for conducting the multi-label annotation and the corresponding multi-label experiments derives from the disambiguation requirements for natural language processing applications involving compounds. As discussed in Section 1, applications such as machine translation, where a compound in the source language can correspond to a prepositional phrase in the target language or vice versa, require mutual disambiguation of prepositional paraphrases and semantic properties.

The most significant result of our experiments is that using a multi-label classifier we obtain an increase of more than 10 percentage points in combined label (instance-based) accuracy (from 48.44% to 59.61%, see Table 6), i.e. on the task of predicting both the semantic property and the prepositional paraphrase correctly for each compound. By using the hybrid annotation scheme we are able to give a more accurate specification of the compound-internal relation while improving over the results of automatic classification experiments that use single-label annotation schemes. Thus, this kind of annotation scheme is not only beneficial for applications such as machine translation, as discussed in Section 1, but also aids in the machine learning classification tasks.
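
Instance-based (combined label) accuracy counts an instance as correct only if both of its labels are predicted correctly; a small sketch with invented predictions:

```python
def combined_label_accuracy(gold, predicted):
    """Fraction of instances whose (preposition, property) pair is predicted exactly."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

gold      = [("für", "goods"), ("aus", "material"), ("in", "location")]
predicted = [("für", "goods"), ("aus", "part"),     ("in", "location")]
print(combined_label_accuracy(gold, predicted))  # 0.666...
```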

Notice also that the multi-label annotation setup can be seen as an instance of the more general scenario of multi-task learning (Caruana, 1997). The results obtained in the experiments reported here corroborate the claim of Caruana (1997) that using a shared representation for multiple learning tasks enables the fine-tuning of classifiers by taking into account patterns that generalize across individual learning tasks.

Table 6. Multi-label experiment results.

Classifier                                         Combined label accuracy
Baseline                                           22.66%
Single-label preposition + single-label property   48.44%
Multi-label classifier                             59.61%

To the best of our knowledge, the results reported in this paper are the first for the task of automatically classifying the semantic relations for German nominal compounds. While it is always difficult to compare results across different datasets, languages and learning algorithms, the accuracy obtained in our study can be regarded as state-of-the-art when compared to the studies mentioned in Section 2. The highest result for any dataset of nominal compounds thus far (79.3%) was obtained by Tratz and Hovy (2010) on their dataset containing 17509 instances.

6. Error Analysis

In this section we will take a more in-depth look at the overall results described in the previous section, by comparing the F-scores obtained for individual semantic properties. Figure 1 plots the frequency of the 38 semantic properties in the dataset against the F-score obtained for these properties in the multi-label classification experiments. Due to space limitations, we do not present an in-depth discussion of the correlation between the frequency of prepositional paraphrases and their F-scores. The two are strongly correlated (r = 0.70). However, due to the inherent ambiguity of prepositions and their more limited inventory, prepositions do not offer the same type of fine-grained semantic resolution offered by the set of semantic properties.

Figure 1 shows that the semantic properties exhibit a highly skewed distribution, with 3 properties (USAGE, PART and PART-1) accounting for more than 40% of all data instances, and with the other properties forming a long tail distributed over the remaining approximately 60% of the data instances. It is important to note that the skewed distribution of semantic properties reflects the overall patterns of productivity exhibited in the usage of existing nominal compounds and the formation of novel ones. USAGE and part-whole relations like PART and PART-1 are generally applicable properties and thus lead to large clusters of compounds. This observation has been corroborated by Moldovan et al. (2004) and Girju et al. (2005), who report that the part-whole relation is the most frequent (19.86% of all occurrences) among the 35 distinct semantic relations in their corpus of annotated compound-internal relations for English. Similarly for German, 34% of all part-whole relations recorded in GermaNet release 6.0 involve nominal compounds (Hinrichs et al. 2013).

Figure 1. F-score (●) and relative frequency (▲) per property

Figure 1 also shows that there is no strong correlation between the relative frequency of a semantic property and its F-score (Pearson’s correlation coefficient r = 0.22). While relative frequency is not strongly correlated with the F-score, it turns out that the degree of semantic cohesiveness among compound modifiers for a given semantic property plays an important role. The focus on the modifiers rather than on the heads of compounds is motivated by the fact that the number of distinct modifiers (2171) far exceeds the number of distinct heads (360) in the compound dataset used for the experiments. Moreover, since the modifiers come from many different semantic fields, they display a much higher degree of semantic diversity.

As described in Section 4, the classification experiments model compounds using two types of information: distributional information obtained from large corpora and semantic field information obtained from GermaNet. The same two knowledge sources were exploited to measure the diversity of the modifiers assigned to the same property. We focus our discussion on the 21 semantic properties that occur with a relative frequency of 0.5% or higher (see threshold in Figure 1). A relative frequency of 0.5% corresponds to 23 instances in the dataset. The semantic properties below this threshold do not have enough instances for computing reliable diversity measures and are therefore not considered in the discussion.

Figure 2. Correlation between the pairwise distance between the modifiers and the F-score for a property

Given the set of modifiers for a given semantic property, a distance score can be obtained by measuring all pairwise distances between the vectors that represent these modifiers. Figure 2 plots the per-property mean of these distances against the F-score of each semantic property. There is a considerable negative correlation (r = -0.468) between the cohesiveness of the set of modifiers for a given semantic property, as measured by the pairwise distances, and the F-score. The properties that achieve a modest F-score tend to be located in the bottom right side of the plot, which corresponds to higher distances between the modifiers. Interestingly, this type of correlation can provide an explanation for the fact that some properties with similar frequencies have very different F-scores. This is the case for the properties FUNCTION, ORIGIN, MANNER OF FUNCTIONING and OWNER (marked with solid circles in Figure 2), which have roughly the same frequency (see Figure 1) but show considerable variance in F-score: 0.38, 0.56, 0.45 and 0.62 respectively. The difference in F-score for these four properties correlates with their position on the horizontal axis in Figure 2. The modifiers assigned to the property OWNER have the lowest mean distance compared to the other three properties, and therefore OWNER achieves the highest F-score among the four properties in question. The property FUNCTION yields the lowest F-score and accordingly appears rightmost in Figure 2, i.e. it has the highest mean distance among the modifiers of the four properties under consideration.

Figure 3. Correlation between the entropy of the set of modifiers and the F-score for a given property

The second diversity measure uses the entropy of the semantic class membership of the set of modifiers for a given semantic property in order to determine the degree of uncertainty inherent in the set of modifiers. Semantic class membership is determined by assigning to each modifier one of the 17 unique beginner categories from GermaNet (e.g. Place, Artifact, Person, etc.). Figure 3 plots the entropy against the F-score for each semantic property. There is a strong negative correlation (r = -0.509) between the F-score and the entropy. Most of the properties with a high F-score are located in the upper left part of the plot, i.e. there is little uncertainty about the semantic class of the modifiers assigned to these semantic properties. The entropy measure provides us with even better insights for some of the results than the corpus-based measure discussed above. This is the case for the properties INGREDIENT, LOCATION and MATERIAL (marked with solid circles in Figure 3), which have similar frequencies but exhibit a high variance in F-score: 0.88, 0.47 and 0.78 respectively. The difference in their F-scores cannot be explained using the noisier corpus-based measure (see Figure 2) but is well accounted for by the more refined entropy measure (see Figure 3).
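
A sketch of the two diversity measures discussed above, assuming each modifier of a property is available as a feature vector and a GermaNet beginner category; the distance metric (Euclidean here) and the toy data are assumptions for illustration:

```python
import math
import numpy as np
from collections import Counter
from itertools import combinations

def mean_pairwise_distance(vectors):
    """Mean Euclidean distance over all modifier pairs of one semantic property."""
    distances = [np.linalg.norm(a - b) for a, b in combinations(vectors, 2)]
    return float(np.mean(distances))

def category_entropy(categories):
    """Entropy (in bits) of the beginner-category distribution of the modifiers."""
    counts = Counter(categories)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Placeholder modifiers of one property: three small vectors and their beginner categories.
vectors = [np.array([0.1, 0.2]), np.array([0.15, 0.1]), np.array([0.9, 0.8])]
print(mean_pairwise_distance(vectors))
print(category_entropy(["Artifact", "Artifact", "Plant"]))  # ~0.92 bits
```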

The results of measuring the semantic diversity of the set of modifiers for a given property strongly confirm the intuition that the difficulty of the automatic classification task correlates with the degree of distributional diversity and/or with the degree of uncertainty (entropy). Moreover, these measures of semantic diversity can also be used as an exploratory tool for assessing the complexity inherent in a given annotation scheme. The described correlations provide insights both for designing more effective compound representations for machine learning and for refining the taxonomy in order to minimize the diversity of the modifiers assigned to the same property. The resulting guidelines for the annotation scheme, as well as the extended dataset, will be made available via the project’s web page2.

7. Conclusions and Future Work

This paper has reported on novel results for the automatic classification of semantic relations that hold between the constituents of nominal compounds in German. The experiments use a dataset with a hybrid annotation scheme that models semantic relations using a combination of prepositional paraphrases and semantic properties. To the best of our knowledge, it is the first study of its kind for German and its results are comparable to state-of-the-art results obtained for English on the same task.

An anonymous reviewer suggested using more advanced distributional approaches to represent the semantic information associated with the constituents of a compound. One possibility is to use distributed representations created by neural language models (word embeddings), which have already been shown to provide near state-of-the-art results for the automatic interpretation of English noun compounds (Dima & Hinrichs, 2015).

2 http://www.uni-tuebingen.de/en/research/core-research/collaborative-research-centers/sfb-833/section-a-context/a3-hinrichs.html

Another line of work that requires further investigation is the treatment of ambiguous compounds, i.e. compounds that can be assigned multiple properties and prepositions. Their analysis would greatly benefit from the use of multiple support sentences in order to distinguish among contexts that license specific semantic interpretations.

Acknowledgements

The second and third authors of the present paper would like to thank Verena Henrich and Christina Hoppermann for joint work on the dataset of German compounds used in this paper. We are very grateful to our student assistants Kathrin Adlung, Nadine Balbach, and Tabea Sanwald, who helped us with the annotations reported in this paper. Financial support was provided by the German Research Foundation (DFG) as part of the Collaborative Research Center ‘Emergence of Meaning’ (SFB 833) and by the German Federal Ministry of Education and Research (BMBF) as part of the research grant CLARIN-D.

References

Ben-Hur, A., & Weston, J. (2010). A user's guide to support vector machines. Methods in Molecular Biology 609, 23-39.

Caruana, R. (1997). Multitask learning. Machine Learning 28 (1), 41-75.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20 (1), 37-46.

Dima, C., & Hinrichs, E. (2015). Automatic Noun Compound Interpretation using Deep Neural Networks and Word Embeddings. 11th International Conference on Computational Semantics (IWCS 2015). London, UK.

Dima, C., Henrich, V., Hinrichs, E., & Hoppermann, C. (2014). How to Tell a Schneemann from a Milchmann: An Annotation Scheme for Compound-Internal Relations. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) (pp. 1194-1201). Reykjavik, Iceland: European Language Resources Association.

Downing, P. A. (1977). On the Creation and Use of English Compound Nouns. Language 53 (4), 810-842.

Finin, T. (1980). The semantic interpretation of compound nominals. Ph.D. dissertation, University of Illinois.

Girju, R., Moldovan, D., Tatu, M., & Antohe, D. (2005). On the semantics of noun compounds. Computer Speech and Language 19 (4), 479--496.

Hamp, B., & Feldweg, H. (1997). GermaNet - a Lexical-Semantic Net for German. Proceedings of ACL workshop Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications (pp. 9-15). Madrid, Spain: Association for Computational Linguistics.

Henrich, V., & Hinrichs, E. (2010). GernEdiT - the GermaNet editing tool. Proceedings of LREC 2010, Main Conference (pp. 2228-2235). Valletta, Malta: European Language Resources Association.

Hinrichs, E., Henrich, V., & Barkey, R. (2013). Using part-whole relations for automatic deduction of compound-internal relations in GermaNet. Language Resources and Evaluation 47 (3), 839-858.

Hirst, G., & St-Onge, D. (1998). Lexical chains as representations of context for the detection and correction of malapropisms. In C. Fellbaum (Ed.), WordNet - An Electronic Lexical Database (pp. 305-332). Cambridge, MA, USA: MIT Press.

Johnston, M., & Busa, F. (1999). Qualia structure and the compositional interpretation of compounds. In Breadth and depth of semantic lexicons (pp. 167-187). Netherlands: Springer.

Karlgren, J. (2005). Compound terms and their constituent elements in information retrieval. 15th Nordic Conference of Computational Linguistics (NODALIDA-05). Joensuu, Finland.

Kim, S. N., & Baldwin, T. (2005). Automatic Interpretation of Noun Compounds Using WordNet Similarity. Proceedings of the Second International Joint Conference on Natural Language Processing (pp. 945-956).

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics 33 (1), 159-174.

Lauer, M. (1996). Designing Statistical Language Learners: Experiments on Noun Compounds. Ph.D. dissertation, Macquarie University.

Levi, J. N. (1978). The Syntax and Semantics of Complex Nominals. New York: Academic Press.

Moldovan, D., Badulescu, A., Tatu, M., Antohe, D., & Girju, R. (2004). Models for the Semantic Classification of Noun Phrases. HLT-NAACL 2004: Workshop on Computational Lexical Semantics (pp. 60-67). Boston, MA, USA: Association for Computational Linguistics.

Nakov, P. (2008). Noun compound interpretation using paraphrasing verbs: Feasibility study. Lecture Notes in Computer Science 5253, 103-117. Varna, Bulgaria: Springer.

Nastase, V., Nakov, P., Ó Séaghdha, D., & Szpakowicz, S. (2013). Semantic Relations Between Nominals. Morgan & Claypool.

Ó Séaghdha, D. (2008). Learning compound noun semantics. Ph.D. dissertation, Computer Laboratory, University of Cambridge.

Ó Séaghdha, D., & Copestake, A. (2013). Interpreting compound nouns with kernel methods. Natural Language Engineering 19 (3), 331-356.

Pedersen, B. S. (2007). Using shallow linguistic analysis to improve search on Danish compounds. Natural Language Engineering 13 (1), 75-90.

Popović, M., Stein, D., & Ney, H. (2006). Statistical Machine Translation of German Compound Words. In FinTAL - 5th International Conference on Natural Language Processing (pp. 616-624). Springer.

Pustejovsky, J. (1995). The generative lexicon (Vol. 17). Cambridge, Massachusetts: The MIT Press.

Rosario, B., & Hearst, M. (2001). Classifying the semantic relations in noun compounds via a domain-specific lexical hierarchy. Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (pp. 82-90).

Sag, I., Baldwin, T., & Bond, F. (2002). Multiword expressions: A pain in the neck for NLP. In Proc. of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (pp. 1-15). Springer.

Sparck Jones, K. (1985). Compound noun interpretation problems. In F. Fallside, & W. A. Woods (Eds.), Computer Speech Processing (pp. 363-381). Englewood Cliffs, NJ, USA: Prentice-Hall.

Tratz, S., & Hovy, E. (2010). A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 678-687). Uppsala, Sweden.

Tsoumakas, G., Spyromitros-Xioufis, E., Vilcek, J., & Vlahavas, I. (2011). MULAN: A Java library for multi-label learning. Journal of Machine Learning Research 12, 2411-2414.

Verhoeven, B., & Daelemans, W. (2013). Semantic classification of Dutch noun-noun compounds. A distributional approach. Computational Linguistics in the Netherlands (CLIN) (3), 2-18.

Verhoeven, B., Daelemans, W., & van Huyssteen, G. B. (2012). Classification of Noun-Noun Compound Semantics in Dutch and Afrikaans. Proceedings of the Twenty-Third Annual Symposium of the Pattern Recognition Association of South Africa (PRASA 2012), (pp. 121-125). Pretoria, South Africa.

Versley, Y., & Panchenko, Y. (2012). Not Just Bigger: Towards Better-Quality Web Corpora. Proceedings of the Seventh Web as Corpus Workshop (pp. 44-52). Lyon, France.

Warren, B. (1978). Semantic patterns of noun-noun compounds. Göteborg: Acta Universitatis Gothoburgensis.

Witten, I. H., Frank, E., & Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques (3rd ed.). Amsterdam: Morgan Kaufmann.