

Decision Support Systems 98 (2017) 10–25

Contents lists available at ScienceDirect

Decision Support Systems

journal homepage: www.elsevier.com/locate/dss

Taxo-Semantics: Assessing similarity between multi-word expressions for extending e-catalogs

Heiko Angermann*, Zeeshan Pervez, Naeem Ramzan
University of the West of Scotland, School of Engineering and Computing, High Street, Paisley PA1 2BE, United Kingdom

ARTICLE INFO

Article history:
Received 2 September 2016
Received in revised form 31 March 2017
Accepted 1 April 2017
Available online 8 April 2017

Keywords:
Concept similarity
Electronic commerce
Electronic catalog
Logic programming

ABSTRACT

Taxonomies, also named directories, are utilized in e-catalogs to classify goods in a hierarchical manner with the help of concepts. If new concepts have to be created when modifying the taxonomy, the semantic similarity between the existing concepts has to be assessed properly. Existing semantic similarity assessment techniques lack comprehensive support for e-commerce, as they do not support multi-word expressions, multilingualism, import/export to relational databases, and supervised user involvement. This paper proposes Taxo-Semantics, a decision support system that builds on the progress in taxonomy matching to match each expression against various sources of background knowledge. The similarity assessment is based on three different matching strategies: a lexical-based strategy named Taxo-Semantics-Label, the strategy Taxo-Semantics-Bk, which uses different sources of background knowledge, and the strategy Taxo-Semantics-User, which provides user involvement. The proposed system includes a translation service to analyze non-English concepts with the help of the WordNet lexicon, can parse taxonomies from relational databases, supports user involvement to match single sequences with WordNet, and is capable of analyzing each sequence as a (sub)-taxonomy. The three proposed matching strategies significantly outperformed existing techniques. Taxo-Semantics-Label improved the accuracy by more than 7% compared to state-of-the-art lexical techniques. Taxo-Semantics-Bk improved the accuracy compared to structure-based techniques by more than 8%. And Taxo-Semantics-User additionally increased the accuracy by 23% on average.

© 2017 Elsevier B.V. All rights reserved.

1. Introduction

Taxonomies are subcategories of ontologies, using hierarchically ordered concepts to model a field of interest in a formal way [1]. While keyword search is known as a quick solution for finding specific products, a hierarchical representation of a domain has its merits for navigation and for exploring similar items [2]. The taxonomy is created either by expert(s) who know the domain in detail, by extracting the taxonomy from a text corpus, or by referring to a standard taxonomy [3] (e.g. the Global Language of Business (GS1), the United Nations Standard Products and Services Code (UNSPSC), or the Classification and Product Description (eCl@ss)). Such standard taxonomies, as well as custom e-commerce taxonomies, often use multi-word expressions (MWEs) as labels for the concepts, combining semantically more or less similar sequences into a label.

In the age of the digital customer, it is now possible to derive the customers' preferences from the concepts [4,5]. The preferences can be used for providing personalized directories (e.g. in Angermann and Ramzan [6]), but new concepts then have to be created accordingly, as the semantics of the taxonomy changes. The new concepts can again be created by the expert, or by systems performing a semantic similarity assessment (SSA). Lexicon-based SSA measures take into account literal values, i.e. the terminological heterogeneity existing between concepts [7,8]. In contrast, structure-based SSA techniques detect similarity with the help of the taxonomy's is-A hierarchy [9]. Hereby, the conceptual heterogeneity is used to derive the path-length distance between two concepts based on their depth inside the taxonomy. Information-based SSA measures also use the conceptual heterogeneity, but exploit the additional content assigned to the concepts and their probability of occurring inside the taxonomy. For inferring further similarities between concepts, the semantic lexicon WordNet is widely used [10]. The authors in Ma et al. [11] manually map the concepts of the source taxonomy to the synsets in WordNet before applying a lexical-based measure. Another approach, in Kim et al. [12], determines the path of the concepts in WordNet before computing structure-based similarity. The approach in Nguyen and Conrad [13] analyzes the indirect connections between the WordNet entities. However, as recent approaches do not consider the comparison between MWEs in e-commerce, the need for creating new concepts, the output of the assessment result in relational database format, or the support of multilingualism as provided in other domains (e.g. Hogenboom et al. [14]), the proposed systems are not comprehensive enough to be used in the e-catalog/e-commerce domain.

* Corresponding author.
E-mail addresses: [email protected] (H. Angermann), [email protected] (Z. Pervez), [email protected] (N. Ramzan).

http://dx.doi.org/10.1016/j.dss.2017.04.001
0167-9236/© 2017 Elsevier B.V. All rights reserved.

To provide a system for comparing similarity between concepts in e-catalogs, and to use the comparison results for extending the taxonomy, the decision support system Taxo-Semantics is presented. The proposed system is built upon the progress made in the related research field of taxonomy matching (TM) and differs from existing techniques in five ways. Firstly, non-English taxonomies cannot be analyzed with the help of WordNet when a translator is missing, as WordNet only contains English synsets. To overcome multilingualism, a dictionary can be included in Taxo-Semantics to match non-English sequences. Secondly, for parsing real-world e-commerce taxonomies, the approach must be capable of processing taxonomies captured in the Structured Query Language (SQL). In Taxo-Semantics, the SQL import/export format Comma Separated Values (CSV) can be parsed to analyze and extend taxonomies coming from relational databases. Thirdly, although the latest Ontology Alignment Evaluation Initiative (OAEI)¹ campaigns show that user involvement can significantly affect the assessment quality [15], either no user involvement is supported in recent systems, or the provided user involvement is limited to identifying the most related WordNet synset. In Taxo-Semantics, user involvement is supported in a supervised way. Through this, it helps non-expert users, as well as expert users, to detect the related WordNet synset for each MWE sequence. In addition, the user involvement is used as a further technique to affect the SSA. Fourthly, recent approaches focus on single labels, although most real-world e-commerce applications like Amazon² use MWEs. To compare concepts using MWEs, Taxo-Semantics provides three different and scalable matching strategies. And fifthly, recent approaches do not consider the usage of the assessment result for creating new concepts.
Taxo-Semantics is capable of using the assessment result for creating mediator concepts that generalize the semantically similar concepts. In the end, Taxo-Semantics provides the following contributions to the field of decision support systems:

• A decision support system that helps (non-)experts to semantically compare concepts of taxonomies used in e-commerce.

• A decision support system that performs the assessment process as a matching operation.

• A decision support system that involves the user during the matching/assessment process.

• A decision support system that makes use of the matching result to extend the taxonomy.

The remainder of the paper is organized as follows. In Section 2, the problems of SSA and TM are formulated, as well as the usage of MWEs in e-catalogs. The decision support system Taxo-Semantics is presented in Section 3. In Section 4, the system is evaluated with the help of three e-catalogs. The work concludes in Section 5.

2. Problem formulation

Formally, a Taxonomy T is an out-tree, see Eq. (1):

T = {C, E}, (1)

which uses a set of concepts C for describing terms with the help of a label, and a set of edges E connecting less general with more general concepts. A less general concept is formally named a Sub Concept of its super-ordinated concept. In the example shown in Fig. 1, the concept "Car Accessories & Parts" is-A sub concept of "Car & Motorbike", which is-A sub concept of "Shop by Department". The super-ordinated concept is named Super Concept. Consequently, "Shop by Department" is-A super concept of "Car & Motorbike", which is-A super concept of "Car Accessories & Parts", as well as of "Tools & Equipment", "Sat Nav & Car Electronics", and "Motorbike Accessories & Parts". Concepts sharing the same super concept are referred to as Sibling Concepts. A Root Concept has no more generalized super concept. In the illustrated taxonomy, "Shop by Department" is the root concept of the e-catalog. Optionally, each concept can have additional information like a description (e.g. "tools for repairing a motorbike"), and a set of properties (e.g. "color", "power consumption", "size").

1 http://oaei.ontologymatching.org/.
2 http://www.amazon.com.

In recent e-commerce applications, the labels mainly consist of a combination of different word sequences, named multi-word expressions (MWEs). MWEs are defined as idiosyncratic interpretations that cross word boundaries [16]. For example, the Amazon concept "Car & Motorbike" includes four less generalized concepts that all consist of MWEs. Or, the Walmart concept "Auto Detailing & Car Care" includes five concepts, which are throughout labelled with MWEs, see Fig. 2. However, for WordNet, the most widely used background resource for inferring semantic similarity, only 41% of the synsets are MWEs [16].

Semantic similarity assessment (SSA) is the task of finding the cognitive and methodical homogeneity between a pair of concepts, formally sim(C1, C2). For example, between the concepts "Car Accessories & Parts" and "Tools & Equipment", see Fig. 3. Two terms are semantically similar if their meanings are close, or if the concepts share common attributes [17]. Taxonomy matching (TM) has a similar aim, namely to find the correspondences between concepts of two taxonomies. The correspondences are computed during a matching operation [18], which includes a matching strategy that defines how the similarities should be assessed. Current matching systems combine different techniques to increase the assessment accuracy [19]. In contrast to SSA, such approaches are annually evaluated by the OAEI.

3. Proposed system Taxo-Semantics

This section explains the system Taxo-Semantics. The explanation starts by detailing the methodology used to detect similarity between concepts. Afterwards, its implementation in the logic programming language Prolog is presented.

3.1. Taxo-Semantics method

As the hierarchical structure inside the taxonomy already provides different concept types, these relationships can be used to create concept pairs. In Taxo-Semantics, the sibling sub concepts are taken into account as concept pairs. Thus, each concept detailing a super concept is compared with every sibling concept. Because of the by nature flexible number of concepts, each super concept can have N pairs, see Eq. (2):

N = n! / ((n − k)! ∗ k!), (2)

Fig. 1. A subset of the Amazon product taxonomy: the root concept "Shop by Department" has the sub concept "Car & Motorbike", which in turn has the sub concepts "Car Accessories & Parts", "Tools & Equipment", "Sat Nav & Car Electronics", and "Motorbike Accessories & Parts".



Fig. 2. A subset of the Walmart product taxonomy: under "Department: Auto & Tires", the concept "Auto Detailing & Car Care" has the sub concepts "Pressure Workers", "Detailing Kits", "Cleaning Tools", "Car Wash & Polish", and "Car Washers & Cleaner".

where n is the number of sibling sub concepts, and k is the number of concepts being part of a concept pair. In Taxo-Semantics, each pair consists of two concepts. For example, the super concept "Car & Motorbike" given in Fig. 1 would create six pairs. The sub concept "Car Accessories & Parts" would be compared with "Tools & Equipment", "Sat Nav & Car Electronics", and "Motorbike Accessories & Parts". Its sibling concept "Tools & Equipment" would be compared with "Sat Nav & Car Electronics", as well as with "Motorbike Accessories & Parts". Finally, the sub concept "Sat Nav & Car Electronics" would be compared with "Motorbike Accessories & Parts". Through this, each sibling concept is compared with each other sibling concept. By default, Taxo-Semantics compares the sub concepts belonging to the identical super concept. However, a comparison between all sub concepts of a taxonomy, no matter to which super concept they belong, would also be possible. Then, the root concept has to be considered as the super-ordinated concept, instead of using the super concept. This is necessary if redundant sub concepts exist and the assessment should be performed over all concepts. For assessing the similarity between a pair, Taxo-Semantics does not use a static assessment procedure. Taxo-Semantics is based on the progress in the field of taxonomy matching, and performs similar to a matching system, e.g. in Faria et al. [20] and Jimenez-Ruiz et al. [21]. Such matching systems usually provide different matching techniques inside a library to be used in a flexible matching strategy according to the user's selection. Through this, a matching strategy can, in contrast to a static assessment procedure, react to the characteristics of the domain, i.e. the quality of the taxonomy. For example, if the difference between the concepts is mainly based on the literal values, the matching strategy should focus on string- or lexical-based techniques.
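The pair enumeration described above can be sketched briefly. The sibling labels are taken from Fig. 1; the use of Python here is only illustrative, as the authors implement the system in Prolog.

```python
from itertools import combinations

# Hypothetical sibling sub concepts of the super concept "Car & Motorbike"
# (labels taken from Fig. 1 of the paper).
siblings = [
    "Car Accessories & Parts",
    "Tools & Equipment",
    "Sat Nav & Car Electronics",
    "Motorbike Accessories & Parts",
]

# Each pair consists of k = 2 concepts, so N = n! / ((n - k)! * k!) pairs.
pairs = list(combinations(siblings, 2))
print(len(pairs))  # 4 siblings yield C(4, 2) = 6 pairs
```

With four siblings this yields exactly the six pairs enumerated in the text.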
Or, if the difference between the concepts is mainly based on conceptual values, the matching strategy should focus on structure-based techniques. This flexibility is considered in Taxo-Semantics by providing three different matching strategies, which in turn can use techniques of the other matching strategies: Taxo-Semantics-Label, in the following shortened to TS-Label, Taxo-Semantics-Bk, shortened to TS-Bk, and Taxo-Semantics-User, shortened to TS-User. TS-Label compares the labels of the concepts with a lexical-based technique. TS-Bk is based on background knowledge. To do so, each meaningful sequence of the MWEs is searched in WordNet. After that, for each related synset, the (sub)-taxonomy is created based on its super concepts in WordNet, named hypernyms. The concepts included in the (sub)-taxonomy are compared using the shortest path distance existing between the two concepts [22]. Hereby, the path distance is based on the depth of the less general synset for the two concepts, not on the complete WordNet taxonomy. In addition, this strategy can be combined with the lexical-based technique of the former strategy, as well as with a language-based technique performed on the WordNet gloss, and a content-based technique using the properties of the concepts. TS-User is based on TS-Bk, but additionally supports user involvement. Hereby, the user can state whether a similarity between the concepts exists or not, which is afterwards combined with the results of the other used techniques. For all strategies, the user can define a dynamic threshold. If the value returned by the system is equal to or higher than the threshold, the concepts inside the pair are considered as similar.

Fig. 3. Semantic similarity assessment between concepts of the Amazon taxonomy: "Car Accessories & Parts" is compared with "Tools & Equipment", "Sat Nav & Car Electronics", and "Motorbike Accessories & Parts".

3.1.1. Background sources

Background sources are used to help infer further relationships between concepts and thus help assess similarity between concepts/taxonomies [19]. The latest OAEI campaigns show that taxonomy matching systems perform better the more resources of background knowledge they use [15]. In Taxo-Semantics, four different sources of background knowledge are used: the semantic lexicon WordNet, a dictionary containing all English root words, a list containing punctuation marks, as well as a thesaurus.

The semantic lexicon WordNet, its synsets and hypernyms, is used for performing a structure-based analysis between the pair. A hypernym, in the form of Hypernym = (Synset, Synset), specifies that the second argument is a super-ordinated synset of the first synset, see Fig. 4. Thus, it represents the relationships analogous to the is-A hierarchy. In addition, Taxo-Semantics is capable of making use of the gloss that is assigned to every WordNet synset, in the form of Gloss = (Synset, Text). A gloss gives a brief definition of the synset and, in most cases, one or more short sentences illustrating the use of synset members [23]. With the help of the gloss, a language-based comparison on a larger text corpus can be performed, even if the taxonomy initially has no concept descriptions, as is the case in most e-commerce applications. The Summer Institute of Linguistics dictionary³ is a resource that contains all English root words. In Taxo-Semantics, it is used for supporting the before-mentioned language-based technique. Punctuation marks are also considered as irrelevant sequences having no semantically rich influence on the label of a concept, or on the gloss. A thesaurus can be used to translate between the input language and English. This is necessary as WordNet only contains English synsets. In Taxo-Semantics, the Google translator is used to produce a list containing the initial sequence, as well as the sequence expressed in English (for further details see Aiken et al. [24]).
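The Hypernym = (Synset, Synset) facts and the (sub)-taxonomy construction described above can be sketched as follows. The synset identifiers and the root "entity" are hypothetical stand-ins for WordNet entries; this is an illustrative sketch, not the authors' Prolog representation.

```python
# Toy stand-ins for WordNet hypernym and gloss facts; identifiers are assumed.
hypernyms = {
    "coupe": "car",
    "car": "motor_vehicle",
    "motor_vehicle": "entity",   # assumed root for this toy example
}

glosses = {
    "car": "a motor vehicle with four wheels; usually propelled by "
           "an internal combustion engine",
}

def hypernym_chain(synset):
    """Follow direct and inherited hypernyms up to the root, mirroring the
    (sub)-taxonomy that Taxo-Semantics builds for each verified synset."""
    chain = [synset]
    while synset in hypernyms:
        synset = hypernyms[synset]
        chain.append(synset)
    return chain

print(hypernym_chain("coupe"))  # ['coupe', 'car', 'motor_vehicle', 'entity']
```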

3.1.2. Matching operation

Taxo-Semantics performs the matching operation firstly between the concepts and the background knowledge, and secondly between the sibling concepts. The workflow to assess the similarity consists of seven steps.

1. Select Matching Strategy defines the strategy to be used for performing the match, as well as the threshold to state whether a similarity exists or not. As explained above, a user can choose between three matching strategies. The complete strategy is defined as a six-tuple, see Eq. (3):

ST = {STR, THH, BKL, BKG, BKP, UIV}, (3)

3 http://www.sil.org/.



Fig. 4. An exemplary result for querying "Coupe" in WordNet: the result represented as a graph including an inferred is-A relationship. "Coupe", "Beach Wagon", and "Cruiser" are sibling terms sharing the direct hypernym {Car, Auto, Automobile, Machine, Motorcar}, which has the inherited hypernym {Motor Vehicle, Automotive Vehicle}.

where STR defines the strategy, THH the threshold, and UIV whether user involvement should be supported or not. The variables BKL (lexical-based), BKG (language-based), and BKP (content-based) define whether the matching strategy TS-Bk should combine the structure-based technique with other techniques.
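The six-tuple of Eq. (3) can be sketched as a named tuple. The field names follow the paper; the concrete values below (strategy TS-Bk, threshold 0.7) are illustrative assumptions, not defaults from the paper.

```python
from collections import namedtuple

# ST = {STR, THH, BKL, BKG, BKP, UIV}, Eq. (3)
Strategy = namedtuple("Strategy", ["STR", "THH", "BKL", "BKG", "BKP", "UIV"])

# A TS-Bk strategy with an assumed threshold of 0.7 that combines the
# structure-based technique with the lexical- and language-based ones,
# without user involvement.
st = Strategy(STR="TS-Bk", THH=0.7, BKL=True, BKG=True, BKP=False, UIV=False)
print(st.STR, st.THH)  # TS-Bk 0.7
```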

2. Input Source Taxonomy asks the user to choose the input taxonomy, and to define the natural language of the taxonomy. The concepts of the input taxonomy are parsed according to its file format. Hereby, the concepts are investigated against their super and sub concepts, with the result that a concept having no super-ordinated concept is defined as the root concept, and concepts having a super-ordinated concept are sub concepts. In case a concept includes properties, those are added accordingly. Through this, each concept is defined in a four-tuple, see Eq. (4):

TA = {SIS, SNS, SIP, [PRO1, . . . , PRON]}, (4)

where SIS is the identifier of the concept, SNS is the label of the concept, SIP is the identifier of the super-ordinated concept, and PRO includes the properties. Through this, Taxo-Semantics is not limited to a specific number of levels, because a super concept can of course again be a sub concept of a more general concept. On the basis of the defined concepts, the concept pairs are created as a three-tuple, see Eq. (5):

CP = {SIS1, SIS2, SIP}, (5)

where SIS1 and SIS2 are two sibling concepts of SIP. If the language of the taxonomy is not English, the concepts are translated using a thesaurus in the form of a three-tuple, see Eq. (6):

TR = {LAN, SNS, SNL}, (6)

where LAN defines the language, SNS is the label expressed in the initial language, and SNL is the label in English.

3. Preprocess Background Knowledge loads the necessary background resources if the strategy TS-Bk or TS-User is chosen. As explained above, the synsets, hypernyms, and glosses of WordNet are used as the main background resources. Furthermore, the dictionary and the punctuation marks (semicolon, comma, full stop, colon, quotation mark, question mark, exclamation mark, brackets, hyphen, dashes, apostrophe, braces, slash) are used for supporting the language-based technique, and to help detect the most related WordNet synsets. The system establishes for each concept one or multiple related WordNet synset(s). To do so, Taxo-Semantics reduces the label to its root word by removing stop words and abbreviations that are not found in the dictionary. For example, the label "Pendulum Jigsaw CARVEX PS 420" is reduced to "pendulum jigsaw", because the provider-specific sequences "CARVEX", "PS", and "420" won't be found within WordNet. Note that the abbreviations and numbers are only removed for matching with WordNet; for the lexical-based technique, the abbreviations and numbers remain. This is important if the main difference between concepts is caused by a company-specific number (e.g. "Boeing 737" and "Boeing 777"). After the reduction to the root word, the single root word, and the root word consisting of MWEs, are searched in the WordNet synsets as follows:

(a) Firstly, the complete root word is searched in WordNet, as WordNet also contains a small portion of MWEs.

(b) Secondly, if no related synset is found, the single sequences are queried. For example, the term "pendulum jigsaw" cannot be found as a WordNet synset. However, there is a synset for "pendulum", and another synset for "jigsaw".

(c) Thirdly, if the first and second search fail, the sequence is assigned to the most general subsumer in WordNet. Alternatively, the single sequence of the MWE can be ignored.

For each detection, the synset is displayed to the user along with its associated gloss. For example, the sequence "car" can be found several times in WordNet. Each synset containing the word "car" is displayed to the user along with its synonyms (e.g. "car", "automobile"), and the gloss (e.g. "a motor vehicle with four wheels; usually propelled by an internal combustion engine"). Through this, the user can choose the most related WordNet synset for each sequence. After defining the synset(s), each concept is assigned accordingly with one or multiple two-tuples, see Eq. (7):

CS = {SIS, SIY}, (7)

where SIY is one of the related WordNet synset(s) for a concept. Afterwards, Taxo-Semantics creates (sub)-taxonomies for each verified synset. The (sub)-taxonomies have the direct hypernym of the detected synset, and the inherited hypernyms as more general concepts. Consequently, the most general concept is the WordNet root concept.
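The three-stage synset lookup of steps (a)–(c) can be sketched against a tiny stand-in for WordNet. The word lists, the fallback value "entity", and the simple whitespace tokenization are assumptions for illustration; the real system queries WordNet and the SIL root-word dictionary.

```python
# Toy stand-ins: the dictionary check simulates removing provider-specific
# sequences ("CARVEX", "PS", "420"); no MWE synset exists in this example.
dictionary = {"pendulum", "jigsaw"}        # stand-in for the SIL root-word list
wordnet_synsets = {"pendulum", "jigsaw"}   # stand-in for WordNet synset labels

def reduce_label(label):
    """Keep only sequences found in the dictionary (stop words and
    abbreviations not in the dictionary are removed)."""
    return [t for t in label.lower().split() if t in dictionary]

def lookup(label):
    tokens = reduce_label(label)
    mwe = " ".join(tokens)
    if mwe in wordnet_synsets:             # (a) complete root word as MWE synset
        return [mwe]
    found = [t for t in tokens if t in wordnet_synsets]
    if found:                              # (b) fall back to single sequences
        return found
    return ["entity"]                      # (c) assumed most general subsumer

print(lookup("Pendulum Jigsaw CARVEX PS 420"))  # ['pendulum', 'jigsaw']
```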



4. Perform Similarity Assessment computes the semantic similarity according to the chosen strategy. TS-Label compares the initial labels utilizing the ISub⁴ library. The lexical-based comparison result is captured inside a four-tuple, see Eq. (8):

IS = {SIS1, SIS2, SIP, SIS}, (8)

where SIS is the similarity result. TS-Bk starts by comparing the (sub)-taxonomies using the Shortest Path Measure, a structure-based variant of the technique provided in Rada et al. [25], see Eq. (9):

TAS = 2 ∗ deepmax − len(SIY1, SIY2), (9)

where len(SIY1, SIY2) is the path length existing between the two synsets SIY1 and SIY2, and deepmax is the depth inside WordNet of the synset appearing deeper inside WordNet. To obtain the path length, a search for the first common subsumer is performed. If the common subsumer is very close to both synsets, both synsets are considered as similar. And if the subsumer is not very close to both synsets, the synsets are considered as not similar, because of the longer path distance. This comparison result is captured inside a four-tuple, see Eq. (10):

SE = {SIS1, SIS2, SIP, TAS}, (10)

where TAS expresses the path closeness. If TS-Bk additionally uses the language-based analysis, the glosses are compared after elimination and tokenization. Again, the comparison result is stored inside a four-tuple, see Eq. (11):

JA = {SIS1, SIS2, SIP, JAS}, (11)

where JAS is the Jaccard similarity existing between two WordNet glosses. If TS-Bk should be combined with the lexical-based technique, the algorithm of TS-Label is performed. And finally, if a content-based technique is desired, the properties of the concepts are compared using Jaccard similarity. The comparison result between the two sets is captured inside a four-tuple, see Eq. (12):

PR = {SIS1, SIS2, SIP, PAS}, (12)

where PAS is the Jaccard similarity result between the two compared lists of properties. If the matching is supported by user involvement, the user can additionally define whether a semantic similarity exists between the two concepts or not. The user has to assess the similarity with the help of both initial labels. The assessment is captured in a four-tuple, see Eq. (13):

UI = {SIS1, SIS2, SIP, UIV}, (13)

where UIV is the rating of the user, which can be zero ("0") for stating not similar, or one ("1") for stating that both labels, respectively the concepts, are similar.
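The two automatic comparisons at the core of this step, the Shortest Path Measure of Eq. (9) and the Jaccard similarity used for glosses and property lists in Eqs. (11) and (12), can be sketched in Python. This is an illustration only; the paper implements these as Prolog rules, and the function names and list-based hypernym paths below are our own assumptions:

```python
def shortest_path_similarity(path1, path2):
    """Structure-based similarity after Rada et al.:
    TAS = 2 * deep_max - len(SIY1, SIY2).

    path1/path2 are hypernym paths ordered from the WordNet root down
    to each synset, e.g. ['entity', ..., 'trousers', 'shorts'].
    """
    # First common subsumer: the deepest concept shared by both paths.
    common = [c for c in path1 if c in path2]
    if not common:
        return 0  # no shared subsumer: treat as maximally dissimilar
    subsumer = common[-1]
    # The path between the two synsets runs through the subsumer.
    length = (len(path1) - 1 - path1.index(subsumer)) + \
             (len(path2) - 1 - path2.index(subsumer))
    deep_max = max(len(path1), len(path2)) - 1  # depth of the deeper synset
    return 2 * deep_max - length

def jaccard(a, b):
    """Jaccard similarity between two token sets (glosses or properties)."""
    a, b = set(a), set(b)
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)
```

A close common subsumer yields a short path and therefore a high TAS value, matching the intuition described above.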

5. Combine Strategy Results is necessary if the similarity assessment is performed using the strategy TS-Bk, and the strategy is using a combination of multiple techniques. Consequently, it calculates the average mean of the different measures. Hereby, each technique is considered with the same weight.

4 http://www.swi-prolog.org/.
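The equal-weight combination of this step amounts to an average mean over the enabled techniques, which can be sketched as (an illustrative Python stand-in for the paper's Prolog rule):

```python
def combine_results(scores):
    """Step 5: average the measures of all enabled techniques with
    equal weight.

    `scores` maps a technique name to its similarity in [0, 1];
    techniques that were not enabled are simply absent.
    """
    if not scores:
        raise ValueError("at least one technique must be enabled")
    return sum(scores.values()) / len(scores)
```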

6. Filter Irrelevant Similarities compares the final similarity result with the threshold to state if both concepts are semantically similar or not.

7. Output Assessed Pairs exports the final result in CSV data format, more precisely, the included five-tuples, see Eq. (14):

S = {SIP, SIS1, SIS2, THH, SAS}, (14)

where SAS identifies if both concepts are semantically similar ("yes"), or not ("no").
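Steps 6 and 7, threshold filtering and CSV export of the five-tuples of Eq. (14), could look as follows in Python. This is a sketch under our own naming; the paper performs both steps with Prolog rules:

```python
import csv

def export_assessed_pairs(pairs, threshold, path):
    """Compare each combined similarity score against the threshold and
    export the five-tuples {SIP, SIS1, SIS2, THH, SAS} as CSV rows.

    `pairs` is an iterable of (super_id, sub_id_1, sub_id_2, score).
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["SIP", "SIS1", "SIS2", "THH", "SAS"])
        for sip, sis1, sis2, score in pairs:
            # SAS states whether both concepts are semantically similar.
            sas = "yes" if score >= threshold else "no"
            writer.writerow([sip, sis1, sis2, threshold, sas])
```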

3.1.3. Extending taxonomies

Taxo-Semantics can use the output of the assessment to help the user define more complex relationships between sub-ordinated and super-ordinated concepts, instead of connecting those with a common super concept. In the following, this type of concept is referred to as mediator concept. Formally, a Mediator Concept is a sub-ordinated concept of a super concept, and super-ordinates N sub-ordinated concepts of the super concept. For example, the concepts "Car Washers & Cleaner" and "Car Wash & Polish" can have the mediator concept "Car Washing & Cleaning Material", or the concepts "Cleaning Tools" and "Pressure Workers" can have the mediator concept "Cleaning and Pressure Equipment", see Fig. 5. This presupposes that the sub concepts of a mediator concept are detected as semantically similar, and the sub concepts that are not part of the mediator concept are detected as not similar. For example, the concept "Detailing Kits" has no similar sibling concept, which is why no mediator concept is created. Creating such mediator concepts is required when there is a need to detail the taxonomy, for example when providing personalized directories. The creation of mediator concepts makes use of the user and the assessment result, and performs in two steps:

1. For each concept, the label is displayed to the user. According to the results of the similarity comparison, the semantically similar sibling concepts are detected. Each label is displayed to the user to define a label for the mediator concept. Finally, the mediator is added to the knowledge base as a four-tuple, see Eq. (15):

M = {SIS, NBS, SIP, MED}, (15)

where NBS is a list containing all semantically similar sibling concepts, and MED is the label of the mediator concept.

2. Similar to the last step of the matching operation, the mediator concepts are output in CSV.
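The two-step mediator creation can be sketched in Python (a hypothetical function, not the paper's Prolog rules; `label_fn` stands in for the user, who supplies the generalizing label, and overlapping sibling groups are merged transitively):

```python
def build_mediators(similar_pairs, label_fn):
    """Group semantically similar sibling concepts under mediator concepts.

    `similar_pairs` holds (super_id, sibling_a, sibling_b) tuples for
    pairs assessed as semantically similar; `label_fn` returns a
    generalizing label for a sorted list of sibling labels. The result
    contains four-tuples analogous to {SIS, NBS, SIP, MED} of Eq. (15).
    """
    groups = {}  # super concept -> list of sets of similar siblings
    for sup, a, b in similar_pairs:
        merged, rest = {a, b}, []
        for g in groups.get(sup, []):
            if g & merged:
                merged |= g  # merge transitively overlapping groups
            else:
                rest.append(g)
        groups[sup] = rest + [merged]
    return [(sorted(g)[0], sorted(g), sup, label_fn(sorted(g)))
            for sup, gs in groups.items()
            for g in gs]
```

A concept without any similar sibling, such as "Detailing Kits" in Fig. 5, never appears in `similar_pairs` and therefore yields no mediator.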

[Figure: taxonomy tree — Department: Auto & Tires → Auto Detailing & Car Care, with sub concepts Detailing Kits, Cleaning and Pressure Equipment (Pressure Workers, Cleaning Tools), and Car Washing & Cleaning Material (Car Wash & Polish, Car Washers & Cleaner).]

Fig. 5. A subset of the exemplary extended Walmart product taxonomy.


H. Angermann et al. / Decision Support Systems 98 (2017) 10–25 15

3.2. Taxo-Semantics implementation

To illustrate Taxo-Semantics, the logic programming language Prolog is used. In Prolog, a knowledge base consisting of facts, rules, and queries is used to structure the e-catalog and the system. Facts are predicates, which, in contrast to rules, do not query other predicates. Each predicate (e.g. main(ID, Name)) includes a functor, i.e. the name of the predicate, and a set of arguments captured between brackets (e.g. (ID, Name)). The short form to represent predicates is to write the number of arguments behind the functor (e.g. main/2). The arguments can be variables (starting with a capital letter), lists, atoms, or strings. A list (written as [. . . ]) can include multiple atoms, strings, or other lists. Through its data structure, logic approaches are geared to deal efficiently with larger sets of concepts, but also with small business taxonomies, and provide an effective approach for knowledge management in e-commerce, especially for taxonomical engineering [26]. The final query in Taxo-Semantics (assess/0) performs with the help of seven foregone rules, see Listing 1.

The input of the taxonomy is done using rule input/0, see Listing 2. The used rule inputdefine/3 defines the taxonomy (PAT), its data model (POS), and the natural language of the taxonomy (LAN). Another rule, inputprolog/3, uses identifypairs/1 for identifying the super concepts having a minimum of two sub concepts. For each super concept, the concept pairs are asserted to the knowledge base (cp/3) using predicate createpairs/1. If the taxonomy is stored in CSV format, rule inputcsv/3 satisfies instead of inputprolog/3. And finally, if the language is not English, a dictionary must be defined in translate/1.

The strategy, including the threshold, is queried with the rule strategy/0, see Listing 3. In techniques/4, the user can define if the proposed technique should be used (1) or not (0) to support the background-knowledge-based strategy: a lexicon-based technique performing on the labels (BKL), a language-based technique performing on the WordNet gloss assigned to the synset(s) (BKG), and a content-based technique using the concepts' properties (BKP). Furthermore, the user has to define in UIV if user-involvement should be supported or not. The final strategy is captured in st/6.

The preprocessing loads the background resources using rule preprocess/0, see Listing 4. It also searches for each concept the related WordNet synset(s) to calculate the (sub)-taxonomies.

Rule wordnet/1 consults the WordNet resources into Prolog (WOR) and the English dictionary in CSV file for removing irrelevant tokens and sequences (PAL). With rule synset/0, the related WordNet synset(s) is/are detected for each concept. To do so, the label is divided into single sequences, sequences not found in the dictionary are removed, and the length of the remaining root word is measured with rule rootword/4. This is necessary to define if the remaining root word consists of one or multiple sequences. Root words consisting of only one sequence are detected with synset1/3. Root words containing multiple sequences are searched in WordNet using synsetN/3. If no synset is found for the compound root word, each sequence is searched in WordNet with synsetF/4. Each matched synset is displayed along with its synonyms and its gloss to the user with synsetU/2. If no synset is found, synset1N/3, synset0N/2, or synsetZ/1 satisfies. If a single sequence of the MWE should be ignored, or the complete label should be ignored, the user can skip the detection. For each verified synset, its (sub)-taxonomies are resulted using rule subtaxonomies/0, along with the included rule hypernym/2 for detecting the (inherited) hypernyms.
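The compound-first lookup with per-sequence fallback described above can be sketched in Python (the paper implements this as Prolog rules; the `dictionary` set and the `lookup` function below are illustrative stand-ins for the consulted dictionary and WordNet resources):

```python
def find_synsets(label, dictionary, lookup):
    """Sketch of the synset detection for a (multi-word) label.

    `dictionary` is the set of relevant sequences; `lookup` maps a root
    word to its candidate synsets (e.g. a thin wrapper around WordNet)
    and returns an empty list when nothing matches.
    """
    # Divide the label into sequences and drop irrelevant ones.
    sequences = [s for s in label.lower().split() if s in dictionary]
    if not sequences:
        return []                    # label skipped entirely
    if len(sequences) == 1:
        return lookup(sequences[0])  # single-sequence root word
    compound = "_".join(sequences)   # WordNet-style compound form
    synsets = lookup(compound)
    if synsets:
        return synsets               # compound root word found as MWE
    # Fall back to searching each sequence separately.
    return [s for seq in sequences for s in lookup(seq)]
```

In the full system, the matched candidates would then be shown to the user for verification, analogous to synsetU/2.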

The matching to detect similarity between pairs is performed with matching/0, see Listing 5.

The rule matchlabels/0 performs a comparison based on the initial labels of the two concepts. Hereby, the built-in predicate isub/4 is used. After the comparison, the predicate is/4 is created

Listing 1. s/0.

Listing 2. si/0.


Listing 3. ss/0.

Listing 4. sp/0.

holding the comparison result ISS. The glosses are compared using Jaccard similarity in the rule matchlanguage/0. To do so, each detected synset is matched with the WordNet gloss (GLO). For each gloss, stop words and punctuation marks are removed before storing the comparison result in the fact ja/4. The (sub)-taxonomies of the synsets are compared in rule matchpath/0. Hereby, it first searches for the first common subsumer in both paths utilizing the built-in predicate member/2. Afterwards, it analyzes the position of the subsumer in both paths by exploiting the index with the built-in predicate index/3. Finally, both weights are subtracted and the result is captured in the fact se/4. If the concepts are assigned with properties, a comparison between the two sets is performed using Jaccard similarity in rule matchcontent/0, before asserting the predicate pr/4 including the similarity result. With the rule matchuser/0, the user can, in addition to the involved techniques, also indicate if the concepts are similar. For each pair, the user can define if a matching exists (1), or not (0), which is later compared with the result(s) provided through the automatically performing matching techniques.

As each concept can have multiple synsets, and the strategy can also consist of different techniques, the combined results are detected with rule combine/0, see Listing 6.

The results based on the structure of WordNet are combined with combinestructure/0. If the structure-based analysis is supported through analyzing the synset gloss, these results are combined in combinegloss/0. All results of different techniques are combined with combineall/1. Finally, the predicate re/4 is asserted, which includes the identifier of the super concept, the identifiers of the two compared sub concepts, and the combined similarity result SIM.

To filter if the pair is similar or not, the rule sf/0 is used, see Listing 7. To do so, the similarity result for each pair is compared against the threshold defined in the strategy. If user-involvement is supported, the similarity result computed by the system and the user rating are combined in filteruser/2. If user-involvement is not supported, only the similarity score detected by the system is compared in filtermachine/2. In both cases, the predicate s/5 is asserted.

Finally, each generated predicate s/5 detailing the similarity of the pair is output in CSV format using rule output/0, see Listing 8. If the assessment result should be used to extend the taxonomy, the rule extend/0 is used in addition, including two foregone rules. In rule extendsiblings/0, for each concept, the semantically similar sibling concepts are resulted using rule siblingssimilar/2. Afterwards, the user has to define a label generalising the included concepts, or the user can ignore the creation of the mediator concept, using rule similarslabel/0. Finally, each mediator concept is added to the knowledge base, respectively output to CSV file format, using rule extendoutput/0.

4. Experimental evaluation

To demonstrate the efficiency of Taxo-Semantics when comparing concepts in e-catalogs, the system is evaluated on three e-commerce databases, see Table 1. The Adventureworks and the Northwind database are available through Microsoft's hosting site CodePlex.5

5 https://www.codeplex.com/.


Listing 5. sm/0.

The Festool database is provided through a German retailing firm. All three databases are e-commerce applications, which are using taxonomies (i.e. e-catalogs) to categorize goods. For all three databases, the labels used to describe the concepts mainly consist of MWEs. The databases were used as provided with minimal modification.

Concepts that have been deactivated by the marketing expert, i.e. not shown to the customers anymore, have been removed from the analysis, as given for the Festool database. In addition, labels in plural have been modified into their singular form, and for the matching with WordNet, all characters have been transferred to lowercase.

Listing 6. sc/0.


Listing 7. sf/0.

Listing 8. so/0.

Each taxonomy is considered to consist of three hierarchies (root concept, super concept, sub concept) and each sub concept contains a maximum of three properties. The properties for the Festool database are already provided. For the Adventureworks database, the properties were taken from the demo platform Componentone.6

No demo is provided for Northwind, which is why no properties are provided for this database. The task is to define the similarity between the concepts of the third hierarchy (sub-concept) analogous to the judgement provided by an expert user. The assessment is performed using WordNet. For example, the concepts "shorts" and "gloves" inside the Adventureworks database belong to the super concept "clothing". However, for both concepts there exists a more specific hypernym, namely "trousers" and "hand wear". Through this, "shorts" and "gloves" are considered as not similar. In contrast, a "bib-short" and "short" should be detected as similar, as both have the common hypernym "trousers". The experiments are divided into three directions to provide the most comprehensive and reproducible analysis:

• TS-Label and TS-Bk (without user-involvement) are compared with other well-known related techniques, see Table 2. This helps to identify which strategy performs best for which database characteristics. TS-Label is compared with other lexical-based approaches, namely the Hamming and

6 http://www.componentone.com/.

Levenshtein distance. TS-Bk is compared with structure-based approaches, the Wu Palmer distance, the distance metrics presented by Ahsae, as well as the structure-based methods presented by Li, and by Liu. Furthermore, the used structure-based technique inside TS-Bk is added to this category. To do so, TS-Bk is considered without the flexibility to add further techniques. According to the literature, it is in the following named Shortest Path Measure. Each technique is implemented in Prolog based on the formula presented in Lingling et al. [22], and a cross-verification is performed using WordNet Similarity for Java (WS4J7). Where necessary, the technique was normalized using the maximum length of the label/concept. Each technique was modified to be applicable for assessing MWEs, analogous to our strategies.

• As TS-Bk allows to combine multiple techniques inside one strategy, different single variants of the strategy are also investigated, see Table 3. This means that each combination of techniques is considered separately. This helps to identify which combination of different techniques helps to improve the matching quality result. In addition to the techniques having the aim to semantically compare the sibling concepts, the provided translation service is also evaluated. To do so, the database that is provided not only in English (Festool) is used.

7 http://ws4jdemo.appspot.com/.


Table 1
Characteristics of the three databases used for experimental evaluation.

Characteristic                                      Adventureworks   Northwind      Festool
Number root concepts                                1                1              1
Number super concepts                               4                8              9
Number sub concepts                                 37               22             44
Average mean of sequences per sub concept           1.19 (±0.40)     1.32 (±0.48)   2.98 (±1.37)
Average mean of irrelevant sequences per sub        0 (±0)           0.18 (±0.39)   0.44 (±0.88)
Number of properties assigned to sub concepts       48               –              132
Number of concept pairs                             188              27             115
Number of concept pairs being similar               37               7              57
Number of concept pairs being dissimilar            151              20             58

More precisely, as Festool is a German company, we compare the translation of the German taxonomy into the English taxonomy performed by the expert with the translation of the German taxonomy into the English taxonomy performed by the machine. To better distinguish between the results, we use the adequacy of the translation (all meaning, most meaning, much meaning, little meaning, none meaning).

• The strategy using user-involvement, i.e. TS-User, is compared with TS-Bk. Through this, it can be stated if involving the non-expert user can improve the matching quality result. Furthermore, as the best performing variant of TS-Bk is used for comparison, it states the minimal improvement to be expected. However, as the choice of the most related WordNet synset affects our strategies, we also investigate the probability to choose the correct synset. To do so, we firstly investigate the number of matched synsets shown to the expert, as well as shown to the non-expert. As the number of synsets shown to the users has an influence on choosing the correct synset, but also has an influence on the time required to choose the correct synset, this measure states two characteristics. To better distinguish between the results, we compute the average number of synsets shown to the user per concept, but also investigate their distribution. Hereby, a concept resulting in exactly one synset to be chosen is referred to as 1:1 synset, a concept resulting in two synsets is referred to as 1:2 synset, and so on. Secondly, we investigate the possible information loss when the expert has to assess the most related synset(s), compared to when the non-expert has to assess the most related synset(s). To do so, we investigate the percentage of synsets being MWE, the percentage of synsets performing as compound synsets for a single label, as well as the percentage of single synsets for a single label. This is because non-expert users tend to skip

Table 2
Compared techniques used for assessing semantic similarity between concepts.

Technique/strategy        Description
Levenshtein [27]          Lexical-based
Hamming                   Lexical-based
Shortest Path Measure     Structure-based
Wu and Palmer [28]        Structure-based
Ahsae et al. [9]          Structure-based
Li et al. [17]            Structure-based
Liu et al. [29]           Structure-based
TS-Label                  Lexical-based
TS-Bk                     Background-knowledge
TS-User                   TS-Bk, user-involvement

Table 3
Characteristics of the subvariants used for strategy Taxo-Semantics-Bk.

Variant           Label   Language   Structure   Content
TS-Bk-Label       Yes     No         Yes         No
TS-Bk-Content     No      No         Yes         Yes
TS-Bk-Language    No      Yes        Yes         No
TS-Bk-Textual     Yes     Yes        Yes         No
TS-Bk-Complete    Yes     Yes        Yes         Yes

sequences, if they have already defined another synset for another sequence.

• Finally, an analytical comparison and a statistical significance measure are performed to highlight the strengths and weaknesses of the presented decision support system. Hereby, the proposed strategies are compared with existing techniques in a theoretical manner by investigating the characteristics of the used techniques, their benefits, and their drawbacks.

The three proposed matching strategies, their variants, as well as the above-mentioned related lexical-based and structure-based techniques, are evaluated using the standard metrics used in information retrieval, namely the Balanced Accuracy (shortened as BACC), given in Eq. (16):

BACC = (TPR + TNR) / 2, (16)

where TPR is the true positive rate, and TNR the true negative rate. The TPR, also referred to as Sensitivity, measures the proportion of correctly identified positives, as given in Eq. (17):

TPR = TP / (TP + FN), (17)

where TP is a true positive, and FN is a false negative statement. The TNR, also referred to as Specificity, measures the proportion of correctly identified negatives, as given in Eq. (18):

TNR = TN / (TN + FP), (18)

where TN is a true negative, and FP is a false positive statement. A statement is true if the assessment provided through the system is equal to the assessment provided through the expert (positive, or negative). A statement is false when both assessments are unequal. The maximum accuracy is 1, the minimum accuracy is 0. For each technique/strategy/variant, the average BACC is computed for each e-catalog to analyze the overall result, as given in Eq. (19):

BACC = (BACC_TH1 + . . . + BACC_THN) / N_TH, (19)

where BACC_TH is the balanced accuracy for a single threshold TH, and N_TH is the number of thresholds. In total, 21 thresholds were investigated in steps of 0.05: 0.00, 0.05, 0.10, . . . , 1.00. In addition, the BACC for each of the 21 thresholds is computed and highlighted to analyze the consistency of the technique/strategy/variant (a graphical summary for the most meaningful thresholds is presented in Figs. 9, 10, and 11). The improvement (shortened as INC) of the strategies compared to existing techniques is compared by analyzing the relative increase of balanced accuracy, as given in Eq. (20):

INC = (BACC_Strategy / BACC_Technique ∗ 100) − 100, (20)


where BACC_Strategy stands for the BACC result of a strategy of Taxo-Semantics, and BACC_Technique stands for the BACC result of an existing technique. This formula is also used to measure the improvement when combining different techniques inside the strategy TS-Bk, as well as when using user-involvement in the strategy TS-User. For the latter, we follow the most recent progress of the OAEI for evaluating user-involvement, namely, by defining a 10%, 20%, and 30% error rate for positive and negative statements. Hereby, the mentioned percentage of pairs is classified incorrectly by the users to simulate the error that naturally occurs when involving non-experts. The average result of different users is finally taken as balanced accuracy.
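The metrics of Eqs. (16)-(20) translate directly into code; the following Python sketch uses our own function names:

```python
def balanced_accuracy(tp, fn, tn, fp):
    """BACC = (TPR + TNR) / 2, per Eqs. (16)-(18)."""
    tpr = tp / (tp + fn)  # Sensitivity, Eq. (17)
    tnr = tn / (tn + fp)  # Specificity, Eq. (18)
    return (tpr + tnr) / 2

def average_bacc(bacc_per_threshold):
    """Average BACC across all evaluated thresholds, per Eq. (19)."""
    return sum(bacc_per_threshold) / len(bacc_per_threshold)

def relative_increase(bacc_strategy, bacc_technique):
    """Relative increase INC in percent, per Eq. (20)."""
    return bacc_strategy / bacc_technique * 100 - 100
```

For example, a strategy with BACC 0.66 against a technique with BACC 0.60 yields an INC of +10%.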

4.1. Comparison of strategies with existing approaches

The obtained results give evidence that, for the three e-catalogs, the Taxo-Semantics matching strategies could outperform existing lexical- as well as structure-based techniques, see Figs. 6, 7, and 8, and Table 4.

TS-Label could increase the accuracy compared to existing lexical-based techniques by on average +7.31%. Using the ISub library instead of the best performing lexical-based technique, Levenshtein, results in an increase of on average +5.45% across all three databases. This improvement is achieved as ISub normalizes the two labels to be compared. More precisely, all sequences of MWEs are mapped to lowercase, and stopwords inside the labels are automatically removed. This is important, as in real-world databases, the labels often contain provider-specific abbreviations, or the labels only differ in having different adjectives before a noun. Consequently, the highest improvement was achieved for the Northwind and the Festool database. Both are using irrelevant sequences to label the concepts, and Festool is in addition using provider-specific sequences.
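The normalization effect described here can be illustrated with a minimal sketch (the stop-word list below is illustrative only, not ISub's actual list):

```python
STOPWORDS = {"and", "or", "of", "the", "for"}  # illustrative stop list

def normalize_label(label):
    """Normalize an MWE label before lexical comparison: map all
    sequences to lowercase and remove stop words."""
    tokens = [t for t in label.lower().replace("&", "and").split()
              if t not in STOPWORDS]
    return " ".join(tokens)
```

After such normalization, labels like "Car Wash & Polish" and "car wash polish" compare as identical, which is exactly why irrelevant sequences no longer depress the similarity score.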

TS-Bk could outperform existing structure-based techniques by on average +8.80%. Compared to the best performing existing structure-based techniques, an increase of on average +2.81% was achieved. The highest improvement was achieved for the Festool database. This is because this database is using the highest portion of relevant sequences inside the MWEs. These results demonstrate the importance of semantically rich labels for sibling sub concepts. However, in contrast to the lexical-based techniques, not a single

[Figure: grouped bar chart of Sensitivity, Specificity, and Balanced Accuracy per technique (Levenshtein, Hamming, Rada, Wu-Palmer, Ahsae, Li, Liu, TS-Label, TS-Bk, TS-User).]

Fig. 6. Sensitivity, Specificity, and Balanced Accuracy results achieved for the Adventureworks e-catalog.

[Figure: grouped bar chart of Sensitivity, Specificity, and Balanced Accuracy per technique (Levenshtein, Hamming, Rada, Wu-Palmer, Ahsae, Li, Liu, TS-Label, TS-Bk, TS-User).]

Fig. 7. Sensitivity, Specificity, and Balanced Accuracy results achieved for the Northwind e-catalog.

existing technique performed second-best. This shows the inconsistency of existing techniques, which can be overcome by using TS-Bk, respectively a combination of multiple techniques. For the Adventureworks database, an increase of +5.61% was achieved compared to when only using the Shortest Path Measure. The obtained results for the Northwind database are equal to the Li distance result. And, for the Festool database, the accuracy could be increased by +2.84% compared to the Shortest Path Measure. The results for the three databases in addition demonstrate that the best structure-based technique was used inside TS-Bk. When not combining the Shortest Path Measure with additional techniques, an average balanced accuracy of 0.60 was achieved. The other structure-based techniques achieved a lower balanced accuracy: 0.54 (Wu and Palmer), 0.57 (Ahsae), 0.59 (Li), and 0.56 (Liu).

When comparing the strategy TS-Label with TS-Bk, it is obvious that the lexical-based strategy shows almost the same balanced accuracy result. However, this is affected by a very high specificity

[Figure: grouped bar chart of Sensitivity, Specificity, and Balanced Accuracy per technique (Levenshtein, Hamming, Rada, Wu-Palmer, Ahsae, Li, Liu, TS-Label, TS-Bk, TS-User).]

Fig. 8. Sensitivity, Specificity, and Balanced Accuracy results achieved for the Festool e-catalog.

Fig. 8. Sensitivity, Specificity, and Balanced Accuracy results achieved for the Festoole-catalog.


[Figure: balanced accuracy plotted over thresholds 0.30-0.70 for each technique (Levenshtein, Hamming, Rada, Wu-Palmer, Ahsae, Li, Liu, TS-Label, TS-Bk, TS-User).]

Fig. 9. Balanced accuracy results for various thresholds achieved for the Adventureworks e-catalog.

compared to a low sensitivity. The reason for this is that the MWEs get more similar the deeper the concepts occur inside the taxonomy. Through this, a lower result can be expected when using a higher level of the taxonomy for the experiments. In contrast, the structure-based techniques show a higher harmony when comparing the sensitivity and specificity.

4.2. Comparison of additional techniques inside combined strategies

When comparing the different variants of TS-Bk, multiple observations can be highlighted. Combining the lexical-based technique with structure-based techniques inside the variant TS-Bk-Label has not increased accuracy. The adding of a content-based analysis showed an inconsistent improvement. For the Adventureworks database, the accuracy could be increased by +8.65%. For the Festool database, a decrease was observed. The reason is that the Festool database is using very similar properties across all concepts. Through

[Figure: balanced accuracy plotted over thresholds 0.30-0.70 for each technique (Levenshtein, Hamming, Rada, Wu-Palmer, Ahsae, Li, Liu, TS-Label, TS-Bk, TS-User).]

Fig. 10. Balanced accuracy results for various thresholds achieved for the Northwind e-catalog.

[Figure: balanced accuracy plotted over thresholds 0.30-0.70 for each technique (Levenshtein, Hamming, Rada, Wu-Palmer, Ahsae, Li, Liu, TS-Label, TS-Bk, TS-User).]

Fig. 11. Balanced accuracy results for various thresholds achieved for the Festool e-catalog.

this, a higher similarity result is computed, which negatively affects the accuracy. The Adventureworks database, in contrast, is using more diverse sets of properties to more precisely distinguish between concepts. The highest improvement was achieved when combining the language-based technique with the structure-based technique inside TS-Bk-Language, with a further increase of on average +0.92%.

When summarizing the results regarding the translation service, it can be stated that most of the concepts are translated with all meaning or most meaning (26.19% and 35.71%). One difficulty regarding the translation was that Festool is using provider-specific translations, i.e. alternative synonyms, for some terms. For that reason, some MWE have been translated with much meaning (23.81%). A small number of concepts has been translated with little or none meaning (9.52% and 4.76%), which highlights that for most of the MWE a meaningful translation has been performed using Google Translator. See Table 5.

4.3. Comparison of user-involvement with background-knowledge

The strategy Taxo-Semantics-User is based on the strategy Taxo-Semantics-Bk, but in addition supports user-involvement. The comparison between both strategies allows to state if user-involvement can additionally affect the matching quality result. To demonstrate the minimal expected increase of accuracy, the best performing subvariant of TS-Bk is compared with the accuracy result when taking the comparison result of non-experts into account. For all three databases, the adding of user-involvement could significantly improve the matching quality result.

On average, an increase of +22.99% was achieved. The highest improvement was achieved for the Northwind database (+32.75%), as this database showed the lowest accuracy result across all existing techniques and proposed strategies. This highlights that supervised user-involvement not only increases the matching quality result itself, but also the consistency of the used strategy across different e-catalogs, see Figs. 9, 10, and 11. The probability to choose the correct synset rises from 40.29% to 55.99%, as shown in Table 6. For the Adventureworks and the Northwind database, a maximum of two sequences has been searched for the concepts inside WordNet. Depending on the position of the sequence, the probability to choose the correct synset further increases. For the Festool database,


Table 4
Improvement of Taxo-Semantics for different databases compared to existing techniques.

Existing           Taxo-Semantics   Adventureworks   Northwind   Festool   Average
Lexical-based      TS-Label         1.14             11.04       9.73      7.31
Best               TS-Label         0.19             8.35        7.83      5.45
Structure-based    TS-Bk            6.60             3.05        16.94     8.80
Best               TS-Bk            5.61             0.00        2.84      2.81
TS-User            TS-Bk            24.48            32.75       11.72     22.99

Table 5
Improvement of Taxo-Semantics when combining TS-Bk with additional techniques.

Technique       Subvariant        Adventureworks   Northwind   Festool   Average
Shortest path   TS-Bk-Lexical     0.00             −4.39       0.00      −1.46
Shortest path   TS-Bk-Content     8.65             –           −7.09     0.78
Shortest path   TS-Bk-Language    −0.96            0.88        2.84      0.92
Shortest path   TS-Bk-Textual     0.00             −1.75       1.42      −0.11
Shortest path   TS-Bk-Complete    3.85             –           −2.13     0.86

a maximum of three valid sequences has been searched for the concepts inside WordNet. Similar as for the other two databases, the probability increases with the position of the sequence. A possibility to reduce the time required to choose the correct synset, and to increase the probability of choosing it correctly, would be to remember the user's choice for repeating sequences. For example, in the Festool database, many concepts are using labels that overlap with other concepts and only differ in a single sequence. However, this would increase the cost of computation and decrease the flexibility of the decision support system. When comparing the expert users with the non-expert users regarding the actually chosen synsets, it is further clear that the difference mainly consists in skipping sequences, see Table 7. As the non-expert users tend to identify the same MWE inside WordNet as the expert user, the MWE that cannot be found in WordNet are slightly more error-prone. Through this, some compound terms are becoming labels consisting of a single sequence after the matching with WordNet.

4.4. Analytical comparison and statistical significance

In addition to the metrics used in information retrieval, a statistical significance measure is performed using the Receiver Operating Characteristic (ROC) curve and the resulting Area Under Curve (AUC) measure. The ROC curve compares, for each of the above-mentioned thresholds, the false positive rate (FPR = 1 − TNR) against the TPR. Through this, the diagram can clearly identify whether the technique/strategy performs well or badly in terms of a binary classification task. The AUC measure represents the area under the ROC curve. Consequently, a technique/strategy having a higher AUC measure is considered better than a technique/strategy having a lower AUC. To better distinguish between the different values, the traditional academic point system is used. It classifies an AUC between 0.9 and 1.0 as excellent, an AUC between 0.8 and 0.9 as good, an AUC between 0.7 and 0.8 as fair, an AUC between 0.6 and 0.7 as poor, and an AUC between 0.5 and 0.6 as fail.
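The ROC/AUC construction described here can be reproduced in a few lines of plain Python: each threshold turns a similarity score into a binary decision, TPR and FPR are counted, and the AUC is the trapezoidal area under the resulting curve. The scores, labels, and thresholds below are toy data for illustration, not values from the experiments.

```python
# Toy illustration of the ROC/AUC computation described in the text.
# The scores, labels and thresholds are made-up data, not experiment values.

def roc_points(scores, labels, thresholds):
    """One (FPR, TPR) point per threshold, with FPR = 1 - TNR."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / neg, tp / pos))
    return sorted(pts)

def auc(points):
    """Trapezoidal area under the ROC curve, with (0,0) and (1,1) added."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

scores = [0.9, 0.8, 0.7, 0.55, 0.4, 0.3, 0.2, 0.1]  # similarity scores
labels = [1, 1, 0, 1, 0, 1, 0, 0]                    # 1 = actually similar
points = roc_points(scores, labels, [0.05, 0.25, 0.45, 0.65, 0.85])
print(auc(points))  # → 0.8125
```

Under the academic point system above, this toy result would be classified as good (between 0.8 and 0.9).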

According to the statistical significance measure, the Shortest Path Measure, as used in TS-Bk, is the only single technique that shows a fair result, see Table 8. The lexical-based techniques and strategies overall show a poor result, or the technique fails. The structure-based techniques and strategies show a better statistical result. When comparing TS-Bk with the Shortest Path Measure, only a slight increase is achieved. The strategy using user-involvement is the only technique achieving an excellent result. See Fig. 12.

From an analytical perspective, the Hamming and Levenshtein distances analyze the similarity based on the strings of the labels, similar to the strategy proposed in Taxo-Semantics-Label [27]. The Hamming distance describes the number of characters being different at the same index. Each substitution necessary to transform the initial character into the target character increases the distance result [30]. The Levenshtein distance characterizes the minimum number of edits required to transform one string into the other. Each edit is qualified by the kind of transformation necessary: whether a single character has to be added (insertion), deleted (deletion), or substituted (substitution). Compared to the technique in Taxo-Semantics-Label, the Hamming distance requires that the strings to be compared have the same length. Consequently, if the strings do not have the same length, the strings have to be normalized, which incurs further computational costs and a decrease of the similarity value. However, if the strings merely differ in adjectives or provider-specific additional sequences (e.g. “Car” compared to “Car Big”), the computation is meaningful, especially when weighting the adjectives lower than the main word. When comparing the Levenshtein distance with Taxo-Semantics-Label, the difference is smaller, as the used ISub library is based on the Levenshtein distance [27]. However, the ISub library is capable of automatically converting each character into lower case. Through this, such characters are considered similar, because they have the same semantic weight. The distance measures presented in Wu and Palmer [28], Ahsae et al. [9], Li et al. [17], and Liu et al. [29] are structure-based techniques. The Wu and Palmer [28] measure is referred to as an edge-based measure, because it uses the least common subsumer of the two concepts to be compared, respectively the relative depth of the concepts. This is important for taxonomies consisting of multiple levels, further arbitrary relationships (e.g. meronyms), and if each concept belongs to exactly one super concept.
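Both string distances can be stated compactly. The padding of the shorter string in the Hamming variant below is one possible normalization (the text only notes that some normalization is required), not necessarily the one used in the cited implementations.

```python
# Hamming distance: substitutions at the same index. Strings of unequal
# length are padded with spaces here -- one possible normalization.
def hamming(a, b):
    n = max(len(a), len(b))
    a, b = a.ljust(n), b.ljust(n)
    return sum(ca != cb for ca, cb in zip(a, b))

# Levenshtein distance: minimum number of insertions, deletions and
# substitutions needed to turn one string into the other (dynamic programming).
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(hamming("Car", "Car Big"))      # → 3 (padded spaces match the space)
print(levenshtein("Car", "Car Big"))  # → 4 (insert " Big")
```

Lower-casing before comparison, as ISub does, removes spurious case differences: `levenshtein("Car".lower(), "CAR".lower())` is 0.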
However, e-commerce taxonomies are usually limited to is-a knowledge. In addition, e-catalogs mainly consist of a maximum of four levels with redundant sub concepts. The technique proposed in Ahsae et al. [9] is built upon the aforementioned measure, but it also measures the difference of the depth. In addition, their approach is capable of weighting each sequence of an MWE differently. However, in e-commerce this would require that the user has a high expertise about the domain, or further user-involvement would be required, which incurs further computational costs. The measure of Li et al. [17] maps the concepts into a concept space. This has the benefit that the context of the term can be considered over a larger semantic network. However, the approach does not consider using the knowledge also for other measures (e.g. language-based), as provided in Taxo-Semantics-Bk. The work presented in

8 http://www.swi-prolog.org/.



Table 6
Probability for expert and non-expert users to choose the most related synset for three different databases.

Sequence/synset      Adventureworks   Northwind       Festool

Synset 1
  Average            4.43 ± 3.39      2.91 ± 2.22     3.98 ± 2.79
  1:1–1:3            56.75            63.63           52.39
  1:3–1:6            21.62            31.82           30.95
  1:7–1:9            10.81            0.00            14.28
  1:10–1:12          8.11             4.55            2.38
  1:12–1:15          2.70             0.00            0.00
  Probability        40.29 ± 31.81    55.99 ± 35.75   42.77 ± 32.49

Synset 2
  Average            7.00 ± 5.59      1.33 ± 0.58     2.94 ± 2.09
  1:1–1:3            33.33            100.00          78.13
  1:3–1:6            16.67            0.00            15.63
  1:7–1:9            0.00             0.00            3.13
  1:10–1:12          50.00            0.00            3.13
  1:12–1:15          0.00             0.00            0.00
  Probability        41.67 ± 45.64    83.33 ± 26.87   47.01 ± 32.49

Synset 3
  Average            –                –               3.1 ± 3.21
  1:1–1:3            –                –               80.00
  1:3–1:6            –                –               0.00
  1:7–1:9            –                –               20.00
  1:10–1:12          –                –               0.00
  1:12–1:15          –                –               0.00
  Probability        –                –               63.89 ± 39.65

Table 7
Information loss for non-expert users to choose the most related synset for three different databases.

User         Label      Adventureworks   Northwind   Festool
Expert       MWE        2.70             5.00        0.00
             Compound   13.51            0.00        45.45
             Single     83.78            95.00       54.54
Non-expert   MWE        2.70             4.55        0.00
             Compound   0.00             0.00        36.36
             Single     97.30            95.45       63.63

Liu et al. [29] combines the Shortest Path Measure with the depth of the concepts in a non-linear function, and is the technique most related to the work presented at hand. Their approach distinguishes between background knowledge consisting of merely the is-a structure (trees) and background knowledge including more arbitrary relationships (graphs). However, merely WordNet is used as background knowledge.
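The edge-based measures compared in this section can be illustrated side by side over a small invented is-a taxonomy: the Shortest Path Measure counts is-a edges between two concepts, while the Wu and Palmer [28] measure relates the depth of their least common subsumer to their own depths. The taxonomy below is a made-up example, not one of the evaluated e-catalogs.

```python
# Toy is-a taxonomy (child -> parent), invented for illustration.
PARENT = {
    "tool": None,
    "power tool": "tool",
    "hand tool": "tool",
    "drill": "power tool",
    "sander": "power tool",
    "hammer": "hand tool",
}

def path_to_root(c):
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path  # e.g. ["drill", "power tool", "tool"]

def lcs(a, b):
    """Least common subsumer: the deepest shared ancestor."""
    ancestors = set(path_to_root(a))
    for c in path_to_root(b):
        if c in ancestors:
            return c

def depth(c):
    return len(path_to_root(c))  # the root has depth 1

def shortest_path(a, b):
    """Number of is-a edges between a and b via their LCS."""
    s = lcs(a, b)
    return (depth(a) - depth(s)) + (depth(b) - depth(s))

def wu_palmer(a, b):
    """2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

print(shortest_path("drill", "sander"))  # → 2 (via "power tool")
print(shortest_path("drill", "hammer"))  # → 4 (via "tool")
print(wu_palmer("drill", "sander"))      # ≈ 0.667
print(wu_palmer("drill", "hammer"))      # ≈ 0.333
```

The example also shows why the measures diverge on shallow e-catalog taxonomies: with only a few levels, many concept pairs share the root as their least common subsumer and receive identical scores.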

5. Conclusions

This work presented Taxo-Semantics, a system to assess similarity between concepts in e-catalogs, and to use the assessment result for extending e-catalogs. Taxo-Semantics differs from existing approaches in five directions. Firstly, MWEs can be analyzed through providing three different matching strategies. Secondly, the system includes a translation service for matching concepts not expressed in English to WordNet. Thirdly, the proposed system can import/export the taxonomies, respectively the assessment result, in relational database format. Fourthly, Taxo-Semantics provides optional user-involvement to support the matching with WordNet and to affect the assessment result. Fifthly, the assessment result can be used to create mediator concepts for semantically similar sibling concepts. The extensive computational experiments performed on three e-catalogs highlight the efficiency of Taxo-Semantics and the supported strategies. On average, the accuracy could be increased by +7.31%, +8.80%, and +22.99% for the lexical-based strategy, the background-based strategy, and the strategy supporting user-involvement, respectively.

Future work on Taxo-Semantics can be divided into three directions. Firstly, standard and upper-level taxonomies like eCl@ss or GS1 could be used. Those provide a more specific view on concepts across e-commerce domains and provide a further description for the concepts. This would allow enriching the matching process by integrating a further language-based analysis in addition to using the WordNet gloss. Secondly, as the single sequences of MWEs usually have a different semantic weight, this could be considered for the background-based strategy. And thirdly, as taxonomies suffer from semiotic heterogeneity, i.e. misinterpretation of concepts, implicit feedback could be used to further analyze the goodness of the semantics of the taxonomy.

Acknowledgments

Thanks to Festool GmbH & Co. KG for providing real-world data to analyze in our experiments.

Table 8
Semantic assessment area under curve comparison results for three databases.

Technique/strategy      Adventureworks   Northwind   Festool   Mean          Point
Levenshtein [27]        0.61             0.51        0.73      0.62 ± 0.11   Poor
Hamming                 0.59             0.43        0.61      0.54 ± 0.10   Fail
Shortest Path Measure   0.63             0.58        0.88      0.70 ± 0.16   Fair
Wu and Palmer [28]      0.58             0.63        0.67      0.63 ± 0.05   Poor
Ahsae et al. [9]        0.60             0.66        0.72      0.66 ± 0.06   Poor
Li et al. [17]          0.61             0.65        0.83      0.70 ± 0.12   Fair
Liu et al. [29]         0.60             0.65        0.72      0.66 ± 0.06   Poor
TS-Label                0.54             0.55        0.83      0.64 ± 0.16   Poor
TS-Bk                   0.62             0.62        0.89      0.71 ± 0.16   Fair
TS-User                 0.90             0.99        0.99      0.96 ± 0.05   Excellent



[Fig. 12: ROC curves plotting sensitivity against 1-specificity for Levenshtein, Hamming, Shortest Path, Wu-Palmer, Ahsae, Li, Liu, Taxo-Semantics-Label, Taxo-Semantics-Bk, and Taxo-Semantics-User on each of the three databases.]

Fig. 12. Semantic assessment receiver operating characteristic comparison results for three databases. (a) Adventureworks database. (b) Northwind database. (c) Festool database.

References

[1] D. Sanchez, M. Batet, A semantic similarity method based on information content exploiting multiple ontologies, Expert Syst. Appl. 40 (2013) 1393–1399.

[2] H. Angermann, N. Ramzan, E-commerce with SmartStore.NET – a review of the through Microsoft distributed Shop-System, Vis. Studio One 44 (2016) 39–42.

[3] K. Meijer, F. Frasincar, F. Hogenboom, A semantic approach for extracting domain taxonomies from text, Decis. Support. Syst. 62 (2014) 78–93.

[4] A. Sharma, S. Dey, Mining marketing intelligence from online reviews using sentiment analysis, Int. J. Intercul. Inf. Manag. 5 (2015) 57–82.

[5] H. Angermann, N. Ramzan, Preference analysis with Microsoft Social Engagement – big data and Web 3.0 inside Microsoft Dynamics CRM, Vis. Studio One 44 (2016) 39–42.

[6] H. Angermann, N. Ramzan, Taxo-Publish: towards a solution to automatically personalize taxonomies in e-catalogs, Expert Syst. Appl. 66 (2016) 76–94.

[7] S. Chuang, L. Chien, Enriching Web taxonomies through subject categorization of query terms from search engine logs, Decis. Support. Syst. 35 (2003) 113–127.

[8] B. Furlan, V. Batanovic, B. Nikolic, Semantic similarity of short texts in languages with a deficient natural language processing support, Decis. Support. Syst. 55 (2013) 710–719.

[9] M.G. Ahsae, M. Naghibzadeh, S.E. Yasrebi Naeini, Semantic similarity assessment of words using weighted WordNet, Int. J. Mach. Learn. Cybern. 5 (2012) 479–490.

[10] G.A. Miller, WordNet: a lexical database for English, Commun. ACM 38 (1995) 39–41.

[11] Y. Ma, J. Liu, Z. Yu, Concept name similarity calculation based on WordNet and ontology, J. Soft. 8 (2013) 746–753.

[12] D. Kim, H. Wang, A.H. Oh, Context-dependent conceptualization, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence Press, Palo Alto, CA, USA, 2013, pp. 2654–2661.

[13] T. Nguyen, S. Conrad, A semantic similarity measure between nouns based on the structure of WordNet, Proceedings of International Conference on Information Integration and Web-based Applications & Services, Association for Computing Machinery, New York City, NY, USA, 2013, pp. 605–609.

[14] A. Hogenboom, B. Heerschop, F. Frasincar, U. Kaymak, F. de Jong, Multilingual support for lexicon-based sentiment analysis guided by semantics, Decis. Support. Syst. 62 (2013) 43–53.

[15] M. Cheatham, Z. Dragisic, J. Euzenat, D. Faria, A. Ferrara, G. Flouris, et al., Results of the ontology alignment evaluation initiative 2015, Proceedings of the 10th International Workshop on Ontology Matching, CEUR Workshop Proceedings, Aachen, Germany, 2016, pp. 60–115.

[16] I.A. Sag, T. Baldwin, F. Bond, A. Copestake, D. Flickinger, Multiword expressions: a pain in the neck for NLP, Proceedings of the 3rd International Conference on Computational Linguistics and Intelligent Text Processing, Springer Publishing Company, Incorporated, Berlin/Heidelberg, Germany, 2002, pp. 1–15.

[17] P. Li, H. Wang, K.Q. Zhu, Z. Wang, X. Wu, Computing term similarity by large probabilistic isA knowledge, Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, Association for Computing Machinery, New York City, NY, USA, 2013, pp. 1401–1410.

[18] E. Peukert, J. Eberius, E. Rahm, A self-configuring schema matching system, Proceedings of the 28th IEEE International Conference on Data Engineering, Institute of Electrical and Electronics Engineers, New York City, NY, USA, 2012, pp. 306–317.

[19] P. Shvaiko, J. Euzenat, Ontology matching: state of the art and future challenges, Trans. Knowl. Data Eng. 25 (2013) 158–176.

[20] D. Faria, C. Martins, A. Nanavaty, A. Taheri, C. Pesquita, E. Santos, et al., AgreementMakerLight results for OAEI 2014, Proceedings of the 9th International Workshop on Ontology Matching, CEUR Workshop Proceedings, Aachen, Germany, 2014, pp. 1–8.

[21] E. Jimenez-Ruiz, B.C. Grau, W. Xia, A. Solimando, X. Chen, V. Cross, et al., LogMap family results for OAEI 2014, Proceedings of the 9th International Workshop on Ontology Matching, CEUR Workshop Proceedings, Aachen, Germany, 2014, pp. 1–9.

[22] M. Lingling, H. Runqing, G. Junzhong, A review of semantic similarity measures in WordNet, Int. J. Hybrid Inf. Technol. 6 (2013) 1–12.

[23] C. Fellbaum, WordNet: An Electronic Lexical Database, MIT Press, Cambridge, MA, USA, 1998.

[24] M. Aiken, M. Park, L. Simmons, T. Lindblom, Automatic translation in multilingual electronic meetings, Translation J. 13 (2009).

[25] R. Rada, H. Mili, E. Bicknell, M. Blettner, Development and application of a metric on semantic nets, Trans. Syst. Man Cybern. 19 (1989) 17–30.

[26] A. Gomez-Perez, M. Fernández-López, O. Corcho, Ontological Engineering: With Examples From the Areas of Knowledge Management, e-Commerce and the Semantic Web, Springer Science & Business Media, 2006.

[27] V.I. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals, Sov. Phys. Dokl. 10 (1966) 707–710.

[28] Z. Wu, M. Palmer, Verbs semantics and lexical selection, Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, Association for Computing Machinery, New York City, NY, USA, 1994, pp. 133–138.

[29] H. Liu, H. Bao, D. Xu, Concept vector for semantic similarity and relatedness based on WordNet structure, J. Syst. Soft. 85 (2012) 370–381.

[30] S.Z. Li, Encyclopedia of Biometrics, Springer Publishing Company, Incorporated, Heidelberg/Berlin, Germany, 2009.

Dr. Heiko Angermann received the B.Eng. (Hons.) in digital publishing from Stuttgart Media University, Germany, and PhD in Computer Science from University of the West of Scotland in 2014 and 2016, respectively. He is currently Head of Projects at a German e-commerce consulting house. In his research he is focusing on the adaptation and evolution of user-specific taxonomies for supporting personalized publishing in an omni-channel context. He has been involved in various research and commercial projects to increase the security and manageability of high-risk information, especially formally structured metadata. He has defined a concept for the German Federal Printing Office to minimize the risk of storing highly sensitive information, and developed a database for the processing of error-prone information.

Dr Zeeshan Pervez received his Master in Information Technology from National University of Sciences and Technology (NUST), Pakistan, and PhD in Computer Engineering from Kyung Hee University (KHU), South Korea. His research interests are security and privacy aspects of internet-of-things, cloud computing, cyber security and knowledge management. Since 2013, he has been with University of the West of Scotland (UWS) as an Assistant Professor. At UWS he is leading postgraduate programmes on Advanced Computing (IoT and Big Data) and Information and Network Security.

Dr Naeem Ramzan received the M.Sc. in telecommunication from University of Brest, France, and PhD in electronic engineering from Queen Mary University of London in 2004 and 2008, respectively. He is currently Professor in the School of Engineering and Computing, University of the West of Scotland. He has investigated/co-investigated/participated in several projects funded by EU and UK research councils. He has authored or co-authored over 120 research publications including journals, book chapters and standardization contributions. He co-edited a book on Social Media Retrieval (Springer 2013) and served as Guest Editor in a number of special issues in technical journals. He is a senior member of IEEE, fellow of HEA, and member of IET. He is co-chair of the UltraHD group of the Video Quality Experts Group (VQEG) and co-Editor-in-Chief of the VQEG E-Letter.